HTML &c

Long, chewy articles about things related to the World Wide Web, mostly written during my time at Mozilla.

Google Voice Search and the Appearance of Trustworthiness

Last week there were several bug reports [1] [2] [3] about how Chrome (the web browser), even in its fully-open-source Chromium incarnation, downloads a closed-source, binary extension from Google’s servers and installs it, without telling you it has done this, and moreover this extension appears to listen to your computer’s microphone all the time, again without telling you about it. This got picked up by the trade press [4] [5] [6] and we rapidly had a full-on Internet panic going.

If you dig into the bug reports and/or the open source part of the code involved, which I have done, it turns out that what Chrome is doing is not nearly as bad as it looks. It does download a closed-source binary extension from Google, install it, and hide it from you in the list of installed extensions (technically there are two hidden extensions involved, only one of which is closed-source, but that’s only a detail of how it’s all put together). However, it does not activate this extension unless you turn on the voice search checkbox in the settings panel, and this checkbox has always (as far as I can tell) been off by default. The extension is labeled, accurately, as having the ability to listen to your computer’s microphone all the time, but of course it does not get to do this until it is activated.

As best anyone can tell without access to the source, what the closed-source extension actually does when it’s activated is monitor your microphone for the code phrase OK Google. When it detects this phrase it transmits the next few words spoken to Google’s servers, which convert it to text and conduct a search for the phrase. This is exactly how one would expect a voice search feature to behave. In particular, a voice-activated feature intrinsically has to listen to sound all the time, otherwise how could it know that you have spoken the magic words? And it makes sense to do the magic word detection with code running on the local computer, strictly as a matter of efficiency. There is even a non-bogus business reason why the detector is closed source; speech recognition is still in the land where tiny improvements lead to measurable competitive advantage.

So: this feature is not actually a massive privacy violation. However, Google could and should have put more care into making this not appear to be a massive privacy violation. They wouldn’t have had mud thrown at them by the trade press about it, and the general public wouldn’t have had to worry about it. Everyone wins. I will now dissect exactly what was done wrong and how it could have been done better.

It was a diagnostic report, intended for use by developers of the feature, that gave people the impression the extension was listening to the microphone all the time. Below is a screen shot of this diagnostic report. You can see it on your own copy of Chrome by typing chrome://voicesearch into the URL bar; details will probably differ a little (especially if you’re not using a Mac).

[Screen shot of the Google Voice Search diagnostic report, taken on Chrome 43 running on Mac OS X. The most important lines of text are 'Microphone: Yes', 'Audio Capture Allowed: Yes', 'Hotword Search Enabled: No', and 'Extension State: ENABLED'.]

Google’s first mistake was not having anyone check this over for what it sounds like it means to someone who isn’t familiar with the code. It is very well known that when faced with a display like this, people who aren’t familiar with the code will pick out whatever bits they think they understand and ignore everything else, even if that means they completely misunderstand it. [7] In this case, people see Microphone: Yes and Audio Capture Allowed: Yes and maybe also Extension State: ENABLED and assume that this means the extension is actively listening right now. (What the developers know it means is this computer has a microphone, the extension could listen to it if it had been activated, and it’s connected itself to the checkbox in the preferences so it can be activated. And it’s hard for them to realize that anyone could think it would mean something else.)

They didn’t have anyone check it because they thought, well, who’s going to look at this who isn’t a developer? Thing is, it only takes one person to look at it, decide it looks hinky, mention it online, and now you have a media circus on your hands. Obscurity is no excuse for not doing a UX review.

Now, mistake number two becomes evident when you consider what this screen ought to say in order not to scare people who haven’t turned the feature on (and who may be hearing of it for the first time): something like

Voice Search is inactive.

(A couple of sentences about what Voice Search is and why you might want it.) To activate Voice Search, go to the preferences screen and check the box.

It would also be okay to have a duplicate checkbox right there on this screen, and to have all the same debugging information show up after you check the box. But wait—how do developers diagnose problems with downloading the extension, which happens before the box has been checked? And that’s mistake number two. The extension should not be downloaded until the box is checked. I am not aware of any technical reason why that couldn’t have been the way it worked in the first place, and it would go a long way to reassure people that this closed-source extension can’t listen to them unless they want it to.

Note that even if the extension were open source it might still be a live question whether it does anything hinky. There’s an excellent chance that it’s a generic machine recognition algorithm that’s been trained to detect OK Google, which training appears in the code as a big lump of meaningless numbers—and there’s no way to know whether those numbers train it to detect anything besides OK Google. Maybe if you start talking about bombs the computer just quietly starts recording…

Mistake number three, finally, is something they got half-right. This is not a core browser feature. Indeed, it’s hard for me to imagine any situation where I would want this feature on a desktop computer. Hands-free operation of a mobile device, sure, but if my hands are already on a keyboard, that’s faster and less bothersome for other people in the room. So, Google implemented this frill as a browser extension—but then they didn’t expose that in the user interface. It should be an extension, and it should be visible as such. Then it needn’t take up space in the core preferences screen, even. If people want it they can get it from the Chrome extension repository like any other extension. And that would give Google valuable data on how many people actually use this feature and whether it’s worth continuing to develop.

I Will File Bugs For You

This post prompted by Aaron Klotz’s Diffusion of Responsibility and Sumana Harihareswara’s Inessential Weirdnesses in Open Source.

One of the most common ways to start interacting with a free software project, as opposed to just using the software produced by that project, is when you trip over a bug or a missing feature and now you need to go tell the developers about it. Unfortunately, that process is often incredibly off-putting. If there’s a bug tracking system, it is probably optimized for people who spend all day every day working with it, and may appear to demand all kinds of information you have no idea how to supply. If there isn’t, you’re probably looking at signing up for some sort of mailing list (mailing list! how retro!). Either way, it may not be easy to find, and there’s a nonzero chance that some neckbeard with a bad attitude is going to yell at you. It shouldn’t be so, but it is.

So, I make this offer to you, the general public, as I have been doing for close friends for many years: if you don’t want to deal with that shit, I will file bugs for you. I’ve been on the Internet since not quite the elder days, and I’ve been hacking free software almost as long; I know how to find these people and I know how to talk to them. We’ll have a conversation and we’ll figure out exactly what’s wrong and then I’ll take it from there. I’m best at compilers and Web browsers, but I’ll give anything a shot.

THE FINE PRINT: If you want to take me up on this, please do so only via email; my address is on the Contact page. Please allow up to one week for an initial response, as this service is provided in my copious free time.

Offer valid only for free software (also known as open source) (as opposed to software that you are not allowed to modify or redistribute, e.g. Microsoft Word). Offer also only valid for problems which I can personally reproduce; it’s not going to go well for anyone involved if I have to play telephone with you and the developers. Offer specifically not valid for operating system kernels or device drivers of any kind, both because those people are even less pleasant to work with than the usual run of neckbeards, and because that class of bugs tends to be hardware-dependent and therefore difficult for me to personally reproduce on account of I don’t have the exact same computer as you.

The management cannot guarantee this service will cause bugs to actually get fixed in any kind of timely fashion, or, in fact, ever.

Should new web features be HTTPS only?

I doubt anyone who reads this will disagree with the proposition that the Web needs to move toward all traffic being encrypted always. Yet there is constant back pressure in the standards groups: people keep proposing network-level innovations that provide only some of the three fundamental guarantees of a secure channel—maybe you can have integrity but not confidentiality or authenticity, for instance. I can personally see a case for an authenticated channel that provides integrity and authenticity but not confidentiality, but I don’t think it’s useful enough to justify backing off the principle that everything should be encrypted always.

So here’s a way browser vendors could signal that we will not stand for erosion of secure channels: starting with a particular, documented and well-announced, version, all new content features are only usable for fully HTTPS pages. Everything that worked prior to that point continues to work, of course. I am informed that there is at least some support for this within the Chrome team. It might be hard to sell Microsoft on it. What does the fox think?
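
This is not part of the proposal itself, but for illustration, here is roughly what the gate looks like from a page’s point of view; window.isSecureContext is an existing browser property that reports whether the page was delivered over a secure channel:

if (window.isSecureContext) {
  // the shiny new feature is available here
} else {
  console.warn("This feature only works on fully https:// pages.");
}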

HTTP application layer integrity/authenticity guarantees

Note: These are half-baked ideas I’ve been turning over in my head, and should not be taken all that seriously.

Best available practice for mutually authenticated Web services (that is, both the client and the server know who the other party is) goes like this: TLS provides channel confidentiality and integrity to both parties; an X.509 certificate (countersigned by some sort of CA) offers evidence that the server is who the client expects it to be; all resources are served from https:// URLs, so the channel’s integrity guarantee can be taken to apply to the content; and the client identifies itself to the server with either a username and password, or a third-party identity voucher (OAuth, OpenID, etc.), which is exchanged for a session cookie.

Nobody can impersonate the server without either subverting a CA or stealing the server’s private key, but all of the client’s proffered credentials are bearer tokens: anyone who can read them can impersonate the client to the server, probably for an extended period. TLS’s channel confidentiality ensures that no one in the middle can read the tokens, but there are an awful lot of ways they can leak at the endpoints. Security-conscious sites have lately been adding one-time passwords and/or computer-identifying secondary cookies, but the combination of session cookie and secondary cookie is still a bearer token (the attacker may also have to spoof the client’s IP address).
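
To make the bearer token problem concrete: a captured session cookie can simply be replayed from any machine, and the server cannot tell the difference. A minimal sketch (Node.js; the hostname, path, and cookie value are all made up):

const https = require("https");

// Replay a captured session cookie. Nothing about this request distinguishes
// the attacker from the legitimate client.
const req = https.request({
  host: "example.com",                       // hypothetical site
  path: "/account",
  headers: { Cookie: "session=9f2c1e8a" },   // value captured at an endpoint
}, res => res.pipe(process.stdout));
req.end();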

Here are some design requirements for a better scheme:

  • Identify clients to servers using something that is not a bearer token: that is, even if client and server are communicating on an open (not confidential) channel, no eavesdropper gains sufficient information to impersonate client to server.
  • Provide application-layer message authentication in both directions: that is, both receivers can verify that each HTTP query and response is what the sender sent, without relying on TLS’s channel integrity assurance.
  • The application layer MACs should be cryptographically bound to the TLS server certificate (server→client) and the long-term client identity (when available) (client→server).
  • Neither party should be able to forge MACs in the name of their peer (i.e. server does not gain ability to impersonate client to a third party, and vice versa).
  • The client should not implicitly identify itself to the server when the user thinks they’re logged out.
  • Must afford at least as much design flexibility to site authors as the status quo.
  • Must gracefully degrade to the status quo when only one party supports the new system.
  • Must minimize number of additional expensive cryptographic operations on the server.
  • Must minimize server-held state.
  • Must not make server administrators deal with X.509 more than they already do.
  • Compromise of any key material that has to be held in online storage must not be a catastrophe.
  • If we can build a foundation for getting away from the CA quagmire in here somewhere, that would be nice.
  • If we can free sites from having to maintain databases of hashed passwords, that would be really nice.

The cryptographic primitives we need for this look something like the following (a rough sketch of possible stand-ins appears after the list):

  • A dirt-cheap asymmetric (verifier cannot forge signatures) message authentication code.
  • A mechanism for mutual agreement to session keys for the above MAC.
  • A reasonably efficient zero-knowledge proof of identity which can be bootstrapped from existing credentials (e.g. username+password pairs).
  • A way to bind one party’s contribution to the session keys to other credentials, such as the TLS shared secret, long-term client identity, and server certificate.
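
For concreteness, here is a rough sketch of off-the-shelf stand-ins for the first two primitives (Node.js crypto; a cryptographer might make very different choices, and an ordinary signature scheme may not qualify as dirt-cheap):

const crypto = require("crypto");

// Stand-in for the asymmetric MAC: an Ed25519 signature. The verifier holds
// only the public key, so it can check tags but cannot forge them.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");
const msg = Buffer.from("GET /index.html HTTP/1.1");
const tag = crypto.sign(null, msg, privateKey);
console.log(crypto.verify(null, msg, publicKey, tag)); // true

// Stand-in for session key agreement: X25519 Diffie-Hellman. Each side
// combines its own private key with the peer's public key and derives the
// same shared secret.
const client = crypto.generateKeyPairSync("x25519");
const server = crypto.generateKeyPairSync("x25519");
const a = crypto.diffieHellman({ privateKey: client.privateKey,
                                 publicKey: server.publicKey });
const b = crypto.diffieHellman({ privateKey: server.privateKey,
                                 publicKey: client.publicKey });
console.log(a.equals(b)); // true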

And here are some preliminary notes on how the protocol might work (the per-request signing step is sketched after the list):

  • New HTTP query and response headers, sent only over TLS, declare client and server willingness to participate in the new scheme, and carry the first steps of the session key agreement protocol.
  • More new HTTP query and response headers sign each query and response once keys are negotiated.
  • The server always binds its half of the key agreement to its TLS identity (possibly via some intermediate key).
  • Upon explicit login action, the session key is renegotiated with the client identity tied in as well, and the server is provided with a zero-knowledge proof of the client’s long-term identity. This probably works via some combination of HTTP headers and new HTML form elements (<input type="password" method="zkp"> perhaps?)
  • Login provides the client with a ticket which can be used for an extended period as backup for new session key negotiations (thus providing a mechanism for automatic login for new sessions). The ticket must be useless without actual knowledge of the client’s long-term identity. The server-side state associated with this ticket must not be confidential (i.e. learning it is useless to an attacker) and ideally should be no more than a list of serial numbers for currently-valid tickets for that user.
  • Logout destroys the ticket by removing its serial number from the list.
  • If the client side of the zero-knowledge proof can be carried out in JavaScript as a fallback, the server need not store passwords at all, only ZKP verifier information; in that circumstance it would issue bearer session cookies instead of a ticket + renegotiated session authentication keys. (This is strictly an improvement over the status quo, so the usual objections to crypto in JS do not apply.) Servers that want to maintain compatibility with old clients that don’t support JavaScript can go on storing hashed passwords server-side.
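
Purely as an illustration of the per-request signing step, here is a sketch; the header names and the canonical form are invented for this example, and the Ed25519 key stands in for whatever the key agreement would actually produce:

const crypto = require("crypto");

// Sign a canonical form of the request with the negotiated session key and
// attach the result as (hypothetical) headers.
function signRequest(method, path, body, sessionKeyId, sessionPrivateKey) {
  const canonical = method + "\n" + path + "\n" +
      crypto.createHash("sha256").update(body).digest("base64");
  const sig = crypto.sign(null, Buffer.from(canonical), sessionPrivateKey);
  return {
    "Session-Key-Id": sessionKeyId,              // invented header name
    "Request-Signature": sig.toString("base64"), // invented header name
  };
}

// The server recomputes the canonical form and verifies the signature with
// the session public key; responses would be signed the same way in the
// other direction, under a key bound to the server's TLS identity.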

I know all of this is possible except maybe the dirt-cheap asymmetric MAC, but I don’t know what cryptographers would pick for the primitives. I’m also not sure what to do to make it interoperable with OpenID etc.

HTML Fragment Parser with Substitution and Syntactic Sugar

This is a little off my usual beaten path, but what the heck.

This is two related proposals: one for a new DOM feature, document.parseDocumentFragment, and one for JS syntactic sugar for that feature. It is a response to Ian Hickson’s E4H Strawman, and is partially inspired by the general quasi-literal proposal for ES-Harmony.

Compared to Hixie’s proposal, this avoids embedding a subset of the HTML grammar in the JS grammar, while at the same time being more likely to conform with author expectations, since the HTML actually gets parsed by the HTML parser. It should have at least equivalent expressivity and power.

Motivating Example

function addUserBox(userlist, username, icon, attrs) {
  var section = h`<section class="user" {attrs}>
                    <h1>{username}</h1>
                  </section>`;
  if (icon)
    section.append(h`<img src="{icon}" alt="">`);
  userlist.append(section);
}
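
This is not the proposal itself, but to give a feel for the idea, here is an approximation using today’s tagged template literals (which use ${...} holes rather than the {...} holes shown above) and a <template> element, so the literal markup really is parsed by the HTML parser and the substituted values are treated as text rather than markup:

function h(strings, ...values) {
  // Escape substituted values so they become text or attribute values,
  // never additional markup.
  const esc = v => String(v).replace(/&/g, "&amp;")
                            .replace(/</g, "&lt;")
                            .replace(/"/g, "&quot;");
  const tmpl = document.createElement("template");
  tmpl.innerHTML = strings.reduce((html, chunk, i) => html + esc(values[i - 1]) + chunk);
  return tmpl.content;  // a DocumentFragment
}

// e.g. section.append(h`<h1>${username}</h1>`);

Note that this sketch cannot splice in a whole attribute set the way {attrs} does above; that is part of what the real parser-level feature would add.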

Continued…

Breaking things every six weeks

Attention conservation notice: 900 words of inside baseball about Mozilla. No security content whatsoever.

The Mozilla Project has been taking a whole lot of flak recently over its new rapid release cycle, in which there is a new major version of Firefox (and Thunderbird) every six weeks, and it potentially breaks all your extensions. Especially the big complicated extensions like Firebug that people cannot live without. One might reasonably ask, what the hell? Why would any software development team in their right mind—especially a team developing a critical piece of system infrastructure, which is what Web browsers are these days, like it or not—inflict unpredictable breakage on all their users at six-week intervals?

Continued…

A Zany Scheme for Compact Secure Hashes

Lots of current and near-future tech relies heavily on secure hashes as identifiers; these are usually represented as hexadecimal strings. For instance, in a previous post I threw out the strawman h: URN scheme that looks like this:

<!-- jQuery 1.5.2 -->
<script src="h:sha1,b8dcaa1c866905c0bdb0b70c8e564ff1c3fe27ad"></script>

Now the problem with this is, these hexadecimal strings are inconveniently long and are only going to get longer. SHA-1 (as shown above) produces 160-bit hashes, which take 40 characters to represent in hex. That algorithm is looking kinda creaky these days; the most convenient replacement is SHA-256. As the name implies, it produces 256-bit hashes, which take 64 characters to write out in hex. The next generation of secure hash algorithms, currently under development at NIST, will also produce 256-bit (and up) hashes. The inconvenience of these lengthy hashes becomes even worse if we want to use them as components of a URI with structure to it (as opposed to being the entirety of a URN, as above). Clearly some encoding other than hex, with its 2x expansion, is desirable.

Hashes are incompressible, so we can’t hope to pack a 256-bit hash into fewer than 32 characters, or a 160-bit hash into fewer than 20 characters. And we can’t just dump the raw binary string into our HTML, because HTML is not designed for that—there is no way to tell the HTML parser the next 20 characters are a binary literal. However, what we can do is find 256 printable, letter-like characters within the first few hundred Unicode code points and use them as an encoding of the 256 possible bytes. Continuing with the jQuery example, that might look something like this:

<script src="h:sha1,пՎЦbηúFԱщблMπĒÇճԴցmЩ"></script> <!-- jQuery 1.5.2 -->

See how we can fit the annotation on the same line now? Even with sha256, it’s still a little shorter than the original in hex:

<!-- jQuery 1.5.2 -->
<script src="h:sha256,ρKZհνàêþГJEχdKmՌYψիցyԷթνлшъÁÐFДÂ"></script>

Here’s my proposed encoding table:

    0              0 1              1
    0123456789ABCDEF 0123456789ABCDEF
 00 ABCDEFGHIJKLMNOP QRSTUVWXYZÞabcde
 20 fghijklmnopqrstu vwxyzþ0123456789
 40 ÀÈÌÒÙÁÉÍÓÚÂÊÎÔÛÇ ÄËÏÖÜĀĒĪŌŪĂĔĬŎŬÐ
 60 àèìòùáéíóúâêîôûç äëïöüāēīōūăĕĭŏŭð
 80 αβγδεζηθικλμνξπρ ςστυφχψωϐϑϒϕϖϞϰϱ
 A0 БГДЖЗИЙЛПФЦЧШЩЪЬ бгджзийлпфцчшщъь
 C0 ԱԲԳԴԵԶԷԸԹԺԻԽԾԿՀՁ ՂՃՄՅՆՇՈՉՊՋՌՍՎՐՑՒ
 E0 աբգդեզէըթժիխծկհձ ղճմյնշոչպջռսվրցւ

All of the characters in this table have one- or two-byte encodings in UTF-8. Every punctuation character below U+007F is given special meaning in some context or other, so I didn’t use any of them. This unfortunately does mean that only 62 of the 256 bytes get one-byte encodings, but storage compactness is not the point here, and it’s no worse than hex, anyway. What this gets us is display compactness: a 256-bit hash will occupy exactly 32 columns in your text editor, leaving room for at least a few other things on the same line.
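
Encoding and decoding with such a table is trivial; here is a sketch in JavaScript, where TABLE stands for a 256-character string laid out in the order above (all of the characters are in the Basic Multilingual Plane, so single-index string operations are safe):

// Map each byte of the hash to its character, and back.
function encodeHash(bytes, TABLE) {
  return Array.from(bytes, b => TABLE[b]).join("");
}

function decodeHash(str, TABLE) {
  return Uint8Array.from(str, ch => {
    const b = TABLE.indexOf(ch);
    if (b < 0) throw new Error("not a hash character: " + ch);
    return b;
  });
}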

Choosing the characters is a little tricky. A whole lot of the code space below U+07FF is taken up by characters we can’t use for this purpose—composing diacritics, control characters, punctuation, and right-to-left scripts. I didn’t want to use diacritics (even in precomposed form) or pairs of characters that might be visually identical to each other in some (combination of) fonts. Unfortunately, even with the rich well of Cyrillic and Armenian to work with, I wasn’t able to avoid using a bunch of Latin-alphabet diacritics. Someone a little more familiar with the repertoire might be able to do better.

Legibility of embedded Web fonts

It’s recently become possible to embed fonts in your website, so that you aren’t limited to using the same old fonts that everyone already has on their computer. Yay! Unfortunately, there are a lot of gotchas. Lots of people discuss the technical gotchas, but when you get past that, you’ve still got to worry about legibility.

Continued…

Strawman: MIME type for fonts

For a little while now, it’s been possible for websites to embed fonts that all major browsers will pick up on. This of course implies fonts being served as HTTP resources. But it turns out that nobody has bothered to assign any of the common font formats a MIME type.[1]

Fonts being embedded on the web nowadays come in two flavors and three kinds of container: you either have TrueType or PostScript CFF-style outline glyphs, and they are in a bare OpenType (really sfnt) container, or else compressed with either WOFF or EOT. (I am ignoring SVG fonts, which are spottily supported and open several cans of worms that I don’t want to get into right now.) In the future, people might also want to embed TTC font collections, which are also in a sfnt container and could thus also be compressed with WOFF—not sure about EOT there—and bare PostScript Type 1 fonts, but neither of these is supported in any browser at present, as far as I know.

There is no official MIME type for any of these combinations; therefore, people deploying fonts over HTTP have been making them up. Without trying very hard, I found real sites using all of: application/ttf, application/otf, application/truetype, application/opentype, application/woff, application/eot, any of the above with an x- prefix, or any of the above in font/ instead of application/ (with or without the x-). There is no top-level font MIME category, making this last particularly egregious.

All of these made-up types work because browsers don’t pay any attention to the content type of a web-embedded font; they look at the data stream, and if it’s recognizably a font, they use it. Such sniffing has historically caused serious problems—recall my old post regarding CSS data theft—so you might expect me to be waving red flags and arguing for the entire feature to be pulled until we can get a standard MIME category for fonts, standard subtypes for the common ones, and browsers to start ignoring fonts served with the wrong type. But I’m not. I have serious misgivings about the whole “the server-supplied Content-Type header is gospel truth, content sniffing is evil” thing, and I think the font situation makes a nice test case for moving away from that model a bit.

Content types are a security issue because many of the file formats used on the web are ambiguous. You can make a well-formed HTML document that is simultaneously a well-formed CSS style sheet or JavaScript program, and attackers can and have taken advantage of this. But this isn’t necessarily the case for fonts. The sfnt container and its compressed variants are self-describing, unambiguously identifiable binary formats. Browsers thoroughly validate fonts before using them (because an accidentally malformed font can break the OS’s text drawing code), and don’t allow them to do anything but provide glyphs for text. A good analogy is to images: browsers also completely ignore the server’s content-type header for anything sent down for an <img>, and that doesn’t cause security holes—because images are also in self-describing binary formats, are thoroughly validated before use, and can’t do anything but define the appearance of a rectangle on the screen. We do not need filtering on the metadata, because we have filtering on the data itself.

Nonetheless, there may be value in having a MIME label for fonts as opposed to other kinds of binary blobs. For instance, if the server doesn’t think the file it has is a font, shouldn’t it be able to convince the browser of that, regardless of whether the contents of the file are indistinguishable from a font? (Old hands may recognize this as one of the usual rationales for not promoting text/plain to text/html just because the HTTP response body happens to begin with <!DOCTYPE.) The current draft standard algorithm for content sniffing takes this attitude with images, recommending that browsers only treat HTTP responses as images if their declared content type is in the image/ category, but ignore the subtype and sniff for the actual image format. With that in mind, here’s my proposal: let’s standardize application/font as the MIME type for all fonts delivered over the Internet, regardless of their format. Browsers should use only fonts delivered with that MIME type, but should detect the actual format based on the contents of the response body.
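
Under this proposal, serving a font would look something like the sketch below (Node.js; the file name is made up, and application/font is the strawman type being proposed here, not a registered one):

const http = require("http");
const fs = require("fs");

http.createServer((req, res) => {
  if (req.url === "/fonts/body.woff") {               // hypothetical font resource
    // One MIME type for every font container; the browser is expected to
    // sniff the actual format (sfnt, WOFF, EOT) from the response body.
    res.setHeader("Content-Type", "application/font");
    fs.createReadStream("fonts/body.woff").pipe(res);
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(8080);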

I can think of two potential problems with this scheme. First, it would be good if browsers could tell servers (using the normal Accept: mechanism) which specific font formats they understand. Right now, it’s reasonable to insist that browsers should be able to handle either TrueType or PostScript glyph definitions, in either bare sfnt or compressed WOFF containers, and ignore the other possibilities, but that state won’t endure forever. SVG fonts might become useful someday (if those cans of worms can be resolved to everyone’s satisfaction), or someone might come up with a new binary font format that has genuine advantages over OpenType. I think this should probably be handled with accept parameters, for instance Accept: application/font;container=sfnt could mean “I understand all OpenType fonts but no others.”

The other problem is, what if someone comes up with a font format that can’t reliably be distinguished from an OpenType font based on the file contents? Well, this is pretty darn unlikely, and we can put it into the RFC defining application/font that future font formats need to be distinguishable or else get their own MIME type. The sfnt container keeps its magic number (and several other things that ought to be in the file header) in the wrong place, but as long as all the other font formats that we care about put their magic number at the beginning of the file where it belongs, that’s not a problem.


[1] To be precise, there is a standard MIME type for a font format: RFC 3073 defines application/font-tdpfr for the Bitstream PFR font format, which nobody uses anymore, except possibly some proprietary television-related products. Bitstream appear to have been trying to get it used for web fonts back in the days of Netscape 4, and then to have given up on it, probably because the font foundries’ attitude was NO YOU CAN’T HAS LICENSE FOR WEBS until just last year.

Data theft with CSS

Mozilla has released security updates to Firefox 3.5 and 3.6 that include defenses for an old, little-known, but serious security hole: cross-site data theft using CSS. These defenses have a small but significant chance of breaking websites that rely on quirks mode rendering and use a server in another DNS domain (e.g. a CDN) for their style sheets.

In this article I’ll describe the attack, what we’re doing about it, how you can ensure that your site will continue to work, and how you can protect your users who have not upgraded their browsers yet.

Continued…