Open beta for ICLab TagTeam

I’m pleased to announce the open beta test of ICLab’s clearinghouse for data about censored websites, (temporary hostname, will be moved under Real Soon Now). This site will aggregate manual and automated test reports, facilitate more efficient use of automated test resources, and help policy analysts draw conclusions about what gets censored in particular countries.

… Well, that’s the aspiration, anyway. Right now what we have is a slightly reskinned instance of the Berkman Center’s TagTeam software, loaded up with a set of sites reported as censored in leaks and so on (mostly about five years old) and the automated topic analysis I described in my PETS paper last year, and taking one ongoing input feed, from Herdict. I said it was a beta test. :-)

If any of the above sounds interesting to you, there are a bunch of ways you can help:

  • The most important thing I need right now is additional inputs:

    • Ongoing, manually curated reports of censored websites in a specific country (e.g. Engelli Web,
    • Ongoing crowdsourced reports of inaccessible websites (like Herdict).
    • Recent, credible one-time leaks of the actual blacklist used in some country, or shipped with some specific commercial filtering software.
    • Control groups: relatively low-volume feeds of long-tail material that isn’t particularly likely to get censored. (We already have the tall head.)

    The optimal format for a continuously updated data source is an RSS feed that can be directly added to TagTeam as an input. If that’s not available, the next best thing is a screen-scraper that takes the existing website or whatever and converts it to an RSS feed (we already have infrastructure for this; send a pull request to, adding a program to the input-feeds directory, and I’ll take it from there).

    The optimal format for a one-time source is whatever you have; I’m going to have to write a custom import script for it regardless. :-/

  • The second most helpful thing would be manual verification of the topic labels assigned by my old analysis. Simply create an account on the site, and then go through the sites that already have a topic:something tag and add more tags indicating whether that is accurate. Please get in touch with me first so we can coordinate efforts.

    This task does not require a lot of technical skill, but it does need a lot of time and patience, and a strong stomach for the nasty underbelly of the Internet, ranging from garden-variety pornography all the way up to active advocacy for genocide. Fluency in diverse natural languages will also be helpful; the top five after English are Chinese, Japanese, Russian, Arabic, and Persian. Finally, many sites have been taken over by spam and/or malware, so you’ll want to use a disposable and locked-down browser instance.

  • General poking at the site, kicking the tires, finding things that don’t work and telling me about them is also very helpful. (I already know about the missing documentation.)

  • If you have any experience hacking Ruby on Rails, I need all the help I can get upstreaming my changes to TagTeam and developing further extensions that we’re going to need.

  • If you have any nonzero level of skill with web, graphic, and/or UI design, I also need help improving the presentation of the site.

  • Anyone who runs ongoing, automated monitoring for censorship, on any scale from one city to the whole world, is invited to get in touch to talk about how my data might help you do it better.

  • If you have ideas for interesting uses for a large collection of possibly-censored websites with extracted text and topic labels, or interesting analyses we could run on it, please also get in touch.
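For anyone considering the screen-scraper route mentioned above, here is a minimal sketch of the "convert it to an RSS feed" half, using only the Python standard library. The function name, the feed metadata, and the choice of item fields are my assumptions for illustration; the actual layout expected by the input-feeds directory and by TagTeam is not described here, so treat this as a starting point rather than the required format.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


def build_rss(feed_title, feed_link, reports):
    """Render (url, first_seen) pairs as an RSS 2.0 document.

    `reports` is whatever the scraper extracted from the source site;
    which fields TagTeam actually needs per item is an assumption.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Scraped reports of inaccessible websites"
    for url, first_seen in reports:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = url
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url
        # pubDate uses the RFC 822 style date format.
        ET.SubElement(item, "pubDate").text = format_datetime(first_seen)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical scraped data, for illustration only.
example_reports = [
    ("http://blocked.example/", datetime(2015, 3, 2, tzinfo=timezone.utc)),
    ("http://also-blocked.example/", datetime(2015, 3, 3, tzinfo=timezone.utc)),
]
feed_xml = build_rss("Example censorship reports", "http://reports.example/", example_reports)
```

The scraping half is necessarily specific to each source site, which is why it is left out.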

Please note that account creation is manual right now—after filling out the sign-up form, email me at and tell me the handle you picked plus a little about who you are and what you propose to do with the account.

Reproduction and dissemination of this announcement is encouraged.

A simple ritual for laying to rest domestic ghosts

In honor of the feast of All Souls, I thought I might put on a costume, as it were, and write a blog post as if I were an old English cunning man and you, my readers, came to me for advice on supernatural problems, rather than computational ones.

So your house is haunted. You don’t know who the ghosts were in life, and you’re maybe a bit scared to find out, but you would like to gently encourage them to let go of their troubles and move on. I have for you a simple ritual involving a little of the old rune-magic.


Call for Volunteers: Active Geolocation

For the past few months I’ve been working on a research study of active geolocation algorithms. These attempt to determine where in the world a computer is, by measuring how long it takes network messages from that computer to reach other computers in known locations.
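To make the idea concrete, here is a rough sketch of the simplest kind of delay-based constraint. It is not any of the specific algorithms under study. It assumes signals travel no faster than about two-thirds the speed of light (a common rule of thumb for optical fiber, and an assumption on my part), so each round-trip time puts an upper bound on how far the target can be from that landmark. All function names are made up for illustration.

```python
import math

C_KM_PER_MS = 299.792458     # speed of light in vacuum, km per millisecond
FIBER_FACTOR = 2.0 / 3.0     # assumed propagation speed in fiber, relative to c
EARTH_RADIUS_KM = 6371.0     # mean Earth radius


def max_distance_km(rtt_ms):
    """Upper bound on distance: the one-way trip takes at most half the RTT."""
    return (rtt_ms / 2.0) * C_KM_PER_MS * FIBER_FACTOR


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def consistent(candidate, measurements):
    """True if `candidate` (lat, lon) lies within every landmark's distance bound.

    `measurements` is a list of (landmark_lat, landmark_lon, rtt_ms) tuples.
    """
    lat, lon = candidate
    return all(
        haversine_km(lat, lon, m_lat, m_lon) <= max_distance_km(rtt_ms)
        for m_lat, m_lon, rtt_ms in measurements
    )
```

Real measurements include queueing and routing detours, so the bound is usually very loose; that looseness is a large part of why these algorithms need testing against computers whose locations are actually known.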

In order to test some of these algorithms thoroughly, I need volunteers who are willing to run my measurement software on their computers, and tell me where they are. I’m especially interested in data reported from computers that are in neither Europe nor North America, but data from anywhere is useful. Currently, running the software takes a fair bit of technical skill; if you’re not comfortable with the Unix command line, please wait for the friendlier web-based version, which is in development.

If you’re interested, please go to for further instructions.

(For legal reasons, you must be at least 18 years old to volunteer.)

(Reproduction and dissemination of this call for volunteers is encouraged.)

Using GPG2 with a read-only .gnupg directory

Another bulletin funded by the I Just Blew An Entire Morning On This Foundation:

Suppose you want to encrypt and sign files using gpg, but without giving it ownership or write access to its own keystore. For instance, this might be necessary if the gpg process is going to be run from a low-privilege CGI user and you don’t have root privileges on the webserver. This is relatively straightforward with the classic version 1, although there’s an error message that’s harmless but impossible to suppress, but version 2 made some architectural changes that make it harder, and does not document the necessary tricks. Below the fold, how you do it.


2016 Hugo Award nominations

Let’s talk about something more fun, shall we? These were my nominations for the 2016 Hugo Awards. The final ballot will be announced on April 26. Hugo nominations, unlike final ballots, are not ranked. I’d be happy to see any of these things win their categories.

I read a lot of good stuff at novel-length this year, but not enough shorter fiction to fill all five nomination slots per category. Something to work harder on next year, I suppose. (It didn’t help that I spent most of January and February in paper crunch mode.) I don’t even try to nominate outside the fiction categories.

Links go to the full text of the work and to authors’ websites when possible, otherwise to Goodreads pages.


  • Zen Cho, Sorcerer to the Crown. If you, like me, have been wishing for a sequel to Jonathan Strange & Mr Norrell since the day you finished reading it, you will like this book.

  • Naomi Novik, Uprooted. Polish folktale crossed with supernatural horror. Online reviews tend to be all about the characters (whom they either love or hate) but the really compelling aspect of this one, IMNSHO, is the evil magic forest.

  • Kazuo Ishiguro, The Buried Giant. It takes some doing to achieve a new take on the Matter of Britain nowadays; Ishiguro has pulled it off.

  • Judith Tarr, Forgotten Suns. Three words: space opera archaeology. Why haven’t people done more of that? Yeah, Stargate, but it was almost never central to the plot.

  • Jo Walton, The Just City. Pallas Athene decides to create the allegorical city from Plato’s Republic in real life, basically to see what happens.



Short Story

Graphic Story

  • Sydney Padua, The Thrilling Adventures of Lovelace and Babbage. “What if Ada Lovelace and Charles Babbage had successfully constructed an Analytical Engine?” is not new territory (Bruce Sterling did it back in the 80s, and it’s often implied background for steampunk Victoriana), but doing it as a humorous graphic novel which is also a detailed work of historical research, with footnotes and references and everything: that deserves recognition.

  • Abbadon, Kill Six Billion Demons. This is worth reading just for the art. And the incredibly vast world that has been built. The plot is set up like your standard “everygirl rescues love interest in distress, taking several levels in badass along the way,” but I doubt that’s where it’s going.

  • Ru Xu, Saint for Rent. So often you see time travel stories where the time travel is just a way to put people into the interesting historical or futuristic situations, and not actually used to its full power. This is not like that.

  • Pascale Lepas, Wilde Life. Oscar rented an old house off craigslist, then things got weird… Creepy rural Vermont and creepy rural Arizona are both well-traveled paths, but how often do you see cheerful-yet-creepy rural Oklahoma?

  • Dave Kellett, Drive. Relatively straightforward space opera, but lots of fun detail and manages to remain tongue-in-cheek while also running a deadly serious plot.

Do not do business with Northwest Talent Search

A depressing number of computer industry recruiters cannot be bothered to read the very first paragraph of my LinkedIn profile (which is now also the first paragraph of the contact information page of this very website—probably should have done that years ago), or else they think they are ~special snowflakes~ and it does not apply to them. For reference, this paragraph reads


I get unwanted solicitations about once a month, and I reply with a polite but acerbic note about how they should’ve noticed the paragraph in ALL CAPS that says don’t contact me, and usually that’s the end of it.

Last week I got one from an outfit calling itself Northwest Talent Search, Inc. (They don’t have a website.) It does just about everything wrong:

Hello Zack

I am working with one of the fastest growing startups in the world on a Aspiring Software Engineering Manager search. They just landed a major partnership with a fortune 500 company. If you have an interest in joining a world class team and an incredible opportunity what would be a good time for a phone call and a good number to reach you at?


Besides ignoring the request not to contact me: Why would anyone not want to know which startup, which megacorp, and at least the executive summary of the concrete job description? If you’re going to cold-contact people with job offers, these things should always appear in the initial message. And anyone who has done their due diligence on me should know that I’m not the right candidate for any sort of engineering management position and I’m allergic to startups. So I was less polite than I usually am, when replying:

Thank you for your interest, however:

  1. I have made it abundantly clear, both on my personal website, and everywhere recruiters typically trawl for interesting people, that I am not looking for a job and do not want to be cold-contacted with job offers.

  2. I have neither any interest nor any qualifications for an engineering management position, and I do not understand how you could possibly have gotten the impression that I might be an appropriate candidate for such.

  3. As a matter of basic courtesy, in your initial message you should have stated the name of the company you are recruiting for and given a couple sentences’ description of what business they are in and what the job responsibilities are.

Never contact me again. Do not even reply to this message.

Now, if that had been the end of it, you wouldn’t be reading this post. Today I received this:

Hello Zack

I am working with one of the fastest growing startups in the world on a Backend Engineer search. They just landed a major partnership with a fortune 500 company. If you have an interest in joining a world class team and an incredible opportunity what would be a good time for a phone call and a good number to reach you at?


The only change is the job title. Backend Engineer is less wrong than Aspiring Software Engineering Manager, but it’s still wrong. And sending another instance of what is evidently a form letter, after having been told not to contact me again, is both disrespectful and unprofessional.

Hence what I dearly hope will be my final reply to them, and this post.

You sent me a message last week which was word-for-word identical but for the job title. In my reply, I made it plain that I was not interested and I did not want to hear from you ever again.

Your continued solicitations are unprofessional, as are the vagueness of your cold-contact messages (as explained in the previous reply) and your clear lack of research on me prior to contact.

I have directed [MY MAIL CLIENT] to treat all further messages from anyone at your company as spam, and I have filed an abuse report with [YOUR BULKMAIL SERVICE]. I will also be publishing all of our communications on my website as a warning to others not to do business with your company.

My previous reply, for reference: [etc]

If you’re a company looking to hire: Don’t do business with these clowns; there are people who will do much better by you.

If you’re also getting these: I strongly suspect you don’t want any of the jobs they are soliciting for.

Bootstrapping trust in compilers

The other week, an acquaintance of mine was kvetching on Twitter about how the Rust compiler is written in Rust, and so to get started with the language you have to download a binary, and there’s no way to validate it—you could use the binary plus the matching compiler source to recreate the binary, but that doesn’t prove anything, and also if the compiler were really out to get you, you would be screwed the moment you ran the binary.

This is not a new problem, nor is it a Rust-specific problem. I recall having essentially the same issue back in 2000, give or take, with GNAT, the Ada front-end for GCC. It is written in Ada, and (at the time, anyway) not just any Ada compiler would do; you had to have a roughly contemporaneous version of … GNAT. It was especially infuriating compared to the rest of GCC, which (again, at the time) bent over backward to be buildable with any C compiler you could get your hands on, even a traditional one that didn’t support all of the 1989 language standard. But even that is problematic for someone who would rather not trust any machine code they didn’t verify themselves.

One way around the headache is diverse recompilation, in which you compile the same compiler with two different compilers, then recompile it with itself-as-produced-by-each, and compare the results. But this requires you to have two different compilers in the first place. As of this writing there is only one Rust compiler. There aren’t that many complete implementations of C++ out there, either, and you need one of those to build LLVM (which Rust depends on). I think you could devise a compiler virus that could propagate itself via both LLVM and GCC, for instance.
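The procedure is easier to see written out. What follows is a toy model, not a real build script: "compilers" are Python functions from source text to bytes, and the hypothetical `load` function stands in for executing a compiler binary. The only property it models is the one that matters here, namely that an infected binary reproduces its payload in whatever it compiles. It also assumes the compiler's output is deterministic, which real builds often are not without extra effort.

```python
import hashlib


def load(binary):
    """Toy stand-in for running a compiler binary.

    The resulting compiler's behaviour depends only on the source it was
    built from, unless the binary carries the (pretend) EVIL payload.
    """
    infected = b"EVIL" in binary

    def compiler(source):
        payload = b"EVIL" if infected else b""
        return payload + b"stage2:" + source.encode()

    return compiler


def diverse_recompile(source, compiler_a, compiler_b, run):
    """Build `source` with two compilers, rebuild with each result, compare."""
    stage1_a = compiler_a(source)        # may differ: different code generators
    stage1_b = compiler_b(source)
    stage2_a = run(stage1_a)(source)     # should be bit-for-bit identical
    stage2_b = run(stage1_b)(source)
    return hashlib.sha256(stage2_a).digest() == hashlib.sha256(stage2_b).digest()


# Three pretend bootstrap compilers with different code generators.
def honest_a(source):
    return b"A-codegen:" + source.encode()


def honest_b(source):
    return b"B-codegen:" + source.encode()


def evil_a(source):
    return b"EVIL A-codegen:" + source.encode()


def evil_b(source):
    return b"EVIL B-codegen:" + source.encode()


SOURCE = "compiler source text"
```

With `honest_a` and `honest_b` the comparison succeeds; swap in `evil_a` and it fails. With `evil_a` and `evil_b` together it succeeds again, which is exactly the worry about a virus that can ride along in both bootstrap compilers.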

What’s needed, I think, is an independent root of correctness. A software environment built from scratch to be verifiable, maybe even provably correct, and geared specifically to host independent implementations of compilers for popular languages. They need not be terribly good at optimizing, because the only thing you’d ever use them for is to be one side of a diversely-recompiled bootstrap sequence. It has to be a complete and isolated environment, though, because it wouldn’t be impossible to propagate a compiler virus through the operating system kernel, which can see every block of I/O, after all.

And it seems to me that this environment naturally divides into four pieces. First, a tiny virtual machine. I’m thinking a FORTH interpreter, which is small enough that one programmer can code it by hand in assembly language, and having done that, another programmer can audit it by hand. You need multiple implementations of this, so you can check them against each other to guard against malicious lower layers—it could run on the bare metal, maybe, but the bare metal has an awful lot of clever embedded in it these days. But hopefully this is the only thing you need to implement more than once.

Second, you use the FORTH interpreter as the substratum for a more powerful language. If there’s a language in which each program is its own proof of correctness, that would be the obvious choice, but my mental allergy to arrow languages has put me off following that branch of PL research. Lisp is generally a good language to write compilers in, so a small dialect of that would be another obvious choice. (Maybe leave out the call/cc.)

Third, you write compilers in the more powerful language, with both the FORTH interpreter and more conventional execution environments as code-generation targets. These compilers can then be used to compile other stuff to run in the environment, and conversely, you can build arbitrary code within the environment and export it to your more conventional OS.

The fourth and final piece is a way of getting data in and out of the environment. I imagine it as strictly batch-oriented, not interactive at all, simply because that cuts out a huge chunk of complexity from the FORTH interpreter; similarly it does not have any business talking to the network, nor having any notion of time, maybe not even concurrency—most compile jobs are embarrassingly parallel, but again, huge chunk of complexity. What feels not-crazy to me is some sort of trivial file system: ar archive level of trivial, all files write-once, imposed on a linear array of disk blocks.
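To give a sense of just how trivial I mean, here is a sketch of such a file system in Python, with a bytearray standing in for the linear array of disk blocks. The block size, the 16-byte name limit, and the header layout are arbitrary choices made for this illustration; nothing about them is a settled design.

```python
import struct

BLOCK_SIZE = 512
# Per-file header: 16-byte NUL-padded name, then 32-bit little-endian length.
HEADER = struct.Struct("<16sI")


class WriteOnceStore:
    """Append-only, write-once files laid end to end on a linear block array."""

    def __init__(self):
        self.blocks = bytearray()  # stands in for the disk

    def _entries(self):
        """Yield (name, data_offset, length) by walking headers from block 0."""
        offset = 0
        while offset < len(self.blocks):
            raw_name, length = HEADER.unpack_from(self.blocks, offset)
            yield raw_name.rstrip(b"\0").decode("ascii"), offset + HEADER.size, length
            # Each record occupies a whole number of blocks.
            record_blocks = -(-(HEADER.size + length) // BLOCK_SIZE)
            offset += record_blocks * BLOCK_SIZE

    def read(self, name):
        """Return the file's contents, or None if there is no such file."""
        for entry_name, start, length in self._entries():
            if entry_name == name:
                return bytes(self.blocks[start:start + length])
        return None

    def add(self, name, data):
        """Append a new file; existing names can never be overwritten."""
        raw_name = name.encode("ascii")
        if not raw_name or len(raw_name) > 16:
            raise ValueError("name must be 1 to 16 ASCII bytes")
        if self.read(name) is not None:
            raise FileExistsError(name)
        record = HEADER.pack(raw_name, len(data)) + data
        padding = -len(record) % BLOCK_SIZE
        self.blocks += record + b"\0" * padding
```

There is no deletion, no directory tree, and no metadata beyond name and length, so there is very little for an auditor to read.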

It is probably also necessary to reinvent Make, or at least some sort of batch job control language.

Operating system selection for $PROJECT, mid-2015

Presented without context, for amusement purposes only, a page from my notes:

| | FreeBSD | NetBSD | Linux |
|---|---|---|---|
| Per-process default route | Poorly documented, possibly incomplete | Probably not | Poorly documented, |
| Can compile PhantomJS | Probably | Probably | Yes |
| Jails | Yes | No | Not really |
| Xen paravirtual guest | Incomplete | Yes | Yes |
| System call tracing | truss | None? | strace |
| pipe2 | Yes | Yes | Yes |
| closefrom | Yes | Yes | No |
| sysctl | Yes | Yes | No |
| getauxval | No | No | Yes |
| signalfd | No | No | Yes |
| execvpe | No | Yes | Yes |
| Reference documentation | Acceptable (YMMV¹) | Acceptable (YMMV) | Major gaps |
| Tutorial documentation | Terrible | Terrible | Terrible |
| Package management | Broken as designed | Broken as designed | Good |
| System maintenance automation | I can’t find any | I can’t find any | Acceptable |
| QA reputation | Excellent | Good | Good |
| Security reputation | Good | Good | Debatable |
| Development community | Unknown to me | Unknown to me | Full of assholes |

¹ It makes sense to me, but then, I taught myself Unix system programming and administration by reading the SunOS 4 manpages.

Google Voice Search and the Appearance of Trustworthiness

Last week there were several bug reports [1] [2] [3] about how Chrome (the web browser), even in its fully-open-source Chromium incarnation, downloads a closed-source, binary extension from Google’s servers and installs it, without telling you it has done this, and moreover this extension appears to listen to your computer’s microphone all the time, again without telling you about it. This got picked up by the trade press [4] [5] [6] and we rapidly had a full-on Internet panic going.

If you dig into the bug reports and/or the open source part of the code involved, which I have done, it turns out that what Chrome is doing is not nearly as bad as it looks. It does download a closed-source binary extension from Google, install it, and hide it from you in the list of installed extensions (technically there are two hidden extensions involved, only one of which is closed-source, but that’s only a detail of how it’s all put together). However, it does not activate this extension unless you turn on the voice search checkbox in the settings panel, and this checkbox has always (as far as I can tell) been off by default. The extension is labeled, accurately, as having the ability to listen to your computer’s microphone all the time, but of course it does not get to do this until it is activated.

As best anyone can tell without access to the source, what the closed-source extension actually does when it’s activated is monitor your microphone for the code phrase OK Google. When it detects this phrase it transmits the next few words spoken to Google’s servers, which convert it to text and conduct a search for the phrase. This is exactly how one would expect a voice search feature to behave. In particular, a voice-activated feature intrinsically has to listen to sound all the time, otherwise how could it know that you have spoken the magic words? And it makes sense to do the magic word detection with code running on the local computer, strictly as a matter of efficiency. There is even a non-bogus business reason why the detector is closed source; speech recognition is still in the land where tiny improvements lead to measurable competitive advantage.
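In rough outline, the behavior described above amounts to a small gate that sits between the microphone and the network. The sketch below is a toy model and not Google's code: it operates on already-recognized words instead of audio, and the function name and the four-word window are invented for illustration.

```python
def phrases_to_send(words, enabled, hotword=("ok", "google"), window=4):
    """Return the phrases that would leave the machine.

    Nothing is sent unless the feature is enabled, and even then only the
    few words immediately following the hotword.
    """
    if not enabled:
        return []
    sent = []
    n = len(hotword)
    i = 0
    while i < len(words):
        if tuple(w.lower() for w in words[i:i + n]) == hotword:
            query = words[i + n:i + n + window]
            if query:
                sent.append(" ".join(query))
            i += n + len(query)  # skip past the hotword and the query
        else:
            i += 1               # heard locally, never transmitted
    return sent


overheard = "so anyway OK Google weather in Boston today please and thanks".split()
```

The point of the model is the `else` branch: the detector does hear everything, but hearing is not the same as transmitting.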

So: this feature is not actually a massive privacy violation. However, Google could and should have put more care into making this not appear to be a massive privacy violation. They wouldn’t have had mud thrown at them by the trade press about it, and the general public wouldn’t have had to worry about it. Everyone wins. I will now dissect exactly what was done wrong and how it could have been done better.

It was a diagnostic report, intended for use by developers of the feature, that gave people the impression the extension was listening to the microphone all the time. Below is a screen shot of this diagnostic report. You can see it on your own copy of Chrome by typing chrome://voicesearch into the URL bar; details will probably differ a little (especially if you’re not using a Mac).

Screen shot of Google Voice Search diagnostic report, taken on Chrome 43 running on MacOS X. The most important lines of text are 'Microphone: Yes', 'Audio Capture Allowed: Yes', 'Hotword Search Enabled: No', and 'Extension State: ENABLED'.

Google’s first mistake was not having anyone check this over for what it sounds like it means to someone who isn’t familiar with the code. It is very well known that when faced with a display like this, people who aren’t familiar with the code will pick out whatever bits they think they understand and ignore everything else, even if that means they completely misunderstand it. [7] In this case, people see “Microphone: Yes” and “Audio Capture Allowed: Yes” and maybe also “Extension State: ENABLED” and assume that this means the extension is actively listening right now. (What the developers know it means is “this computer has a microphone, the extension could listen to it if it had been activated, and it’s connected itself to the checkbox in the preferences so it can be activated.” And it’s hard for them to realize that anyone could think it would mean something else.)

They didn’t have anyone check it because they thought, well, who’s going to look at this who isn’t a developer? Thing is, it only takes one person to look at it, decide it looks hinky, mention it online, and now you have a media circus on your hands. Obscurity is no excuse for not doing a UX review.

Now, mistake number two becomes evident when you consider what this screen ought to say in order not to scare people who haven’t turned the feature on (and maybe this is the first they’ve heard of it even): something like

Voice Search is inactive.

(A couple of sentences about what Voice Search is and why you might want it.) To activate Voice Search, go to the preferences screen and check the box.

It would also be okay to have a duplicate checkbox right there on this screen, and to have all the same debugging information show up after you check the box. But wait—how do developers diagnose problems with downloading the extension, which happens before the box has been checked? And that’s mistake number two. The extension should not be downloaded until the box is checked. I am not aware of any technical reason why that couldn’t have been the way it worked in the first place, and it would go a long way to reassure people that this closed-source extension can’t listen to them unless they want it to. Note that even if the extension were open source it might still be a live question whether it does anything hinky. There’s an excellent chance that it’s a generic machine recognition algorithm that’s been trained to detect OK Google, which training appears in the code as a big lump of meaningless numbers—and there’s no way to know whether those numbers train it to detect anything besides OK Google. Maybe if you start talking about bombs the computer just quietly starts recording…

Mistake number three, finally, is something they got half-right. This is not a core browser feature. Indeed, it’s hard for me to imagine any situation where I would want this feature on a desktop computer. Hands-free operation of a mobile device, sure, but if my hands are already on a keyboard, that’s faster and less bothersome for other people in the room. So, Google implemented this frill as a browser extension—but then they didn’t expose that in the user interface. It should be an extension, and it should be visible as such. Then it needn’t take up space in the core preferences screen, even. If people want it they can get it from the Chrome extension repository like any other extension. And that would give Google valuable data on how many people actually use this feature and whether it’s worth continuing to develop.


I’d like to announce my new project,, where I will be reading and reviewing papers from the academic literature mostly (but not exclusively) about information security. I made a false start at this near the end of 2013 (it is the same site that’s been linked under readings in the top bar since then) but now I have a posting queue and a rhythm going. Expect three to five reviews a week. It’s not going to be syndicated to Planet Mozilla, but I may mention it here when I post something I think is of particular interest to that audience.

Longtime readers of this blog will notice that it has been redesigned and matches readings. That process is not 100% complete, but it’s close enough that I feel comfortable inviting people to kick the tires. Feedback is welcome, particularly regarding readability and organization; but unfortunately you’re going to have to email it to me, because the new CMS has no comment system. (The old comments have been preserved.) I’d also welcome recommendations of comment systems which are self-hosted, open-source, database-free, and don’t involve me manually copying comments out of my email. There will probably be a technical postmortem on the new CMS eventually.

(I know about the pages that are still using the old style sheet.)