CCS 2012 Conference Report

The ACM held its annual Conference on Computer and Communications Security two weeks ago today in Raleigh, North Carolina. CCS is larger than Oakland and has two presentation tracks; I attended fewer than half of the talks, and my brain was still completely full afterward. Instead of doing one exhaustive post per day like I did with Oakland, I’m just going to highlight a handful of interesting papers from across the entire conference, plus the pre-conference Workshop on Privacy in the Electronic Society.

Note: paper links may go to expanded technical reports rather than the as-presented papers, since obviously I am not going to link to the official editions behind ACM’s paywall. There were some talks that I didn’t write up, despite their interestingness, because I couldn’t find an unencumbered paper to link to—cavete auctores!

Monday (WPES)

An Approach for Identifying JavaScript-loaded Advertisements through Static Analysis

Right now the state of the art for blocking ads on the Web is gigantic URL-based blacklists—the popular EasyList for AdBlock Plus contains 18,000 entries according to the speaker, with new entries added at a rate of five to fifteen a week, and obsolete entries hardly ever removed. This paper proposes instead to use static analysis and machine learning to detect ad-related JavaScript and prevent it from executing. The claim is that this will be easier to maintain, more robust, and scale better. They wrote a browser extension that preprocesses incoming JavaScript through some basic optimizations (constant folding, mostly) and then looks for a handful of features that are more likely to appear in ad-loading JavaScript. There are a number of open problems around what to do with a script once it has been classified (see the paper), but as a proof of concept it seems to work quite well, with classification accuracy in the 98% range. It has trouble with analytics and HTML-generation libraries, both of which share features with ad-loading scripts.
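
To make the idea concrete, here is a tiny sketch of the feature-extraction-plus-classifier pattern. This is my own illustration, not the authors' extension: the marker strings and the toy training set are invented, and a real system would of course run after constant folding and use far richer features.

```python
# Minimal sketch (not the authors' extension): score a script with a few
# hand-picked lexical features and an off-the-shelf classifier.  The marker
# strings and the toy training set are invented for illustration.
from sklearn.tree import DecisionTreeClassifier

MARKERS = ["document.write", "createElement('iframe'", "adserver",
           "Math.random()", "document.referrer"]

def features(js_source):
    """Count how often each marker string appears in the script text."""
    return [js_source.count(m) for m in MARKERS]

# (script text, is_ad_loader) pairs
training = [
    ("var u='//adserver.example/show?r='+Math.random();"
     "document.write('<scr'+'ipt src=\"'+u+'\"></scr'+'ipt>');", 1),
    ("function add(a,b){return a+b;} console.log(add(2,3));", 0),
    ("var f=document.createElement('iframe');f.src='//adserver.example';"
     "document.body.appendChild(f);", 1),
    ("document.querySelector('#menu').addEventListener('click', toggle);", 0),
]

clf = DecisionTreeClassifier().fit([features(s) for s, _ in training],
                                   [label for _, label in training])

print(clf.predict([features("document.write('<iframe src=//adserver.example>')")]))
```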

In the question period, someone asked whether they thought they could keep up with the rapidly evolving ad ecosystem, and they said well, this general approach works pretty well for spam filtering, which I thought was telling—there is, after all, substantial overlap. They also said that they thought the same general approach would work for tracking protection but it would require its own classifier.

What Do Online Behavioral Advertising Privacy Disclosures Communicate to Users?

Online behaviorally-targeted advertising is often tagged with a little icon and/or short phrase linking to a landing page that talks about the behavioral targeting and may offer the opportunity to disable ad targeting (but not the associated behavioral tracking). This is part of an industry self-regulatory program which is supposed to make behavioral targeting more palatable. The study investigated what, if anything, these tags actually communicate to end users, and how they react. Participants were shown a variety of ads, with between-subjects randomized tags, and then quizzed about what they thought the tags meant and what the landing pages communicated. Takeaways include:

  • People mostly don’t even notice these tags.
  • The icons used are meaningless, and most of the short phrases do not communicate that this is something clickable.
  • After participants’ attention was drawn to the tags, more than half of them thought that clicking on the tags would cause more ads to pop up, increase ad frequency overall, and/or signal interest in the product currently being advertised. Some of the short phrases read like an offer to buy advertising on the current website.
  • The landing pages do not clearly make the distinction between disabling ad targeting (which is offered) and disabling behavioral tracking (which is not offered).

The speaker carefully avoided the elephant in this particular room, i.e. that advertisers are motivated to make their disclosure tags and landing pages as nonobvious and unfriendly as possible, because they don’t want people to disable behavioral ad targeting.

Changing of the Guards: A Framework for Understanding and Improving Entry Guard Selection in Tor

Entry guards are a designated subset of Tor relays that are considered reliable and probably-nonmalicious enough to use as entry nodes. The Tor directory authorities maintain a large list of potential entry guards; Tor clients pick a smaller set of nodes off the list, and route all circuits through them. (This is done to reduce the probability that the first relay in the chain will be malicious; a malicious entry node can do rather more damage to client anonymity than a malicious node later in the chain.)
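
In case the mechanism isn't clear, here is a much-simplified sketch of what a client does. This is my own illustration, not Tor's actual algorithm, which weights by measured bandwidth from the consensus and applies many more constraints; the relay data is made up.

```python
# Much-simplified guard selection (not Tor's actual algorithm): sample a small,
# persistent set of guards, biased toward higher bandwidth, and enter the
# network through one of them for every circuit.  Relay data is made up.
import random

guard_candidates = [("relayA", 5000), ("relayB", 200), ("relayC", 9000),
                    ("relayD", 1200), ("relayE", 700)]   # (name, bandwidth)

def pick_guard_set(candidates, k=3):
    """Choose k distinct guards with probability proportional to bandwidth."""
    pool, chosen = list(candidates), []
    for _ in range(k):
        total = sum(bw for _, bw in pool)
        r = random.uniform(0, total)
        acc = 0
        for i, (name, bw) in enumerate(pool):
            acc += bw
            if r <= acc:
                chosen.append(name)
                del pool[i]
                break
    return chosen

my_guards = pick_guard_set(guard_candidates)  # picked once, kept for a long time

def entry_for_new_circuit():
    """Every circuit this client builds starts at one of the same few guards."""
    return random.choice(my_guards)

print(my_guards, entry_for_new_circuit())
```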

This paper is an empirical investigation of how well this scheme works in practice, and whether it can be improved. They only have preliminary conclusions, but some of those are pretty telling: long-lived entry guards accumulate clients over time, and long-lived malicious nodes are likely to become guards. It’s unclear how to do better than the present set of heuristics, though.

I’m highlighting this paper as much because of its clever methodology as anything else: experiments were run entirely in simulation, but the simulated Tor network is configured to match the real network, according to the public relay directory. This seems like an effective strategy that could be applied to other sorts of network simulation experiments.

Tuesday

The Most Dangerous Code in the World: Validating SSL Certificates in Non-Browser Software

SSL (also known as TLS) is the most widely deployed implementation of the cryptographic primitive known as a secure channel, which is supposed to deliver three security properties: confidentiality (nobody can eavesdrop on data in transit), integrity (nobody can modify data in transit), and authenticity (the party at the other end of the channel is who you think it is). Authenticity is critical to real-world security, because the other two properties by themselves do not protect against a man-in-the-middle attack. (How does someone get to be in the middle, you might wonder? One popular technique is to load malware onto the local network hub, wireless router, etc.)

SSL provides authenticity via certificates of identity, which at least one side transmits for the other to verify before any application data is exchanged. Verification is a complicated process that must be done correctly or authenticity is lost. The point of this talk is that, while Web browsers (by dint of fifteen years of bug fixing) usually get certificate verification right, most of the other software that uses SSL has not had the benefit of that scrutiny, and so frequently gets it wrong. They audited a wide variety of middleware libraries and applications, found lots of bugs, and made the strong claim that basically all non-browser SSL-using applications are insecure against an active man-in-the-middle attack.
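
For a sense of what "done correctly" involves, here is what the two essential checks look like in one API I happen to know (Python's ssl module, which is not one of the stacks audited in the paper); the hostname is just a placeholder. Either check on its own is not enough.

```python
# The two essential checks, spelled out with Python's ssl module (not one of
# the stacks audited in the paper); "example.com" is just a placeholder host.
import socket
import ssl

ctx = ssl.create_default_context()      # loads the system's trusted CA roots
ctx.verify_mode = ssl.CERT_REQUIRED     # check 1: the chain must verify
ctx.check_hostname = True               # check 2: the cert must name this host

with socket.create_connection(("example.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
        print(tls.getpeercert()["subject"])
```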

Why so terrible? Well, the authors blame the ridiculous complexity of both the certificate scheme itself and the library APIs involved. One worked example stuck with me: Amazon Payments provides a client library in PHP. That code calls into a C library (libcurl), which calls another C library (libssl) to perform the actual crypto. libssl has dozens of options, all of which are faithfully reflected up through libcurl to the PHP bindings that the Amazon Payments library uses. Many of those options are intended only for debugging, but the author of the Amazon code zealously set them all, and set one of them to a value that defeats security, without realizing it.
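
If I read the example right, the trap is of roughly this shape. The sketch below is my reconstruction, using Python's pycurl binding rather than PHP, and the URL is a placeholder.

```python
# My reconstruction of the shape of the trap, using Python's pycurl binding
# instead of PHP.  CURLOPT_SSL_VERIFYHOST is numeric, and the "obvious"
# boolean value is the wrong one.
import pycurl

c = pycurl.Curl()
c.setopt(pycurl.URL, "https://example.com/")   # placeholder URL

c.setopt(pycurl.SSL_VERIFYPEER, 1)   # verify the certificate chain: good
c.setopt(pycurl.SSL_VERIFYHOST, 2)   # 2 = cert must name this host: good

# The fatal variant: passing True.  In the libcurl of that era, 1 meant only
# "the certificate names *some* host", not "it names *this* host", which
# quietly reopens the man-in-the-middle hole.
# c.setopt(pycurl.SSL_VERIFYHOST, True)

c.perform()
c.close()
```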

Why Eve and Mallory Love Android: An Analysis of Android SSL (In)Security

On the same theme as the previous talk: Android’s stock runtime libraries implement certificate validation correctly. What could possibly go wrong? Well, you can disable validation, and lots of people have found it easier to disable validation than to arrange for their servers to have good certificates all the time. They analyzed 13,500 apps from the Android Market and found just over 1000 instances where validation had been disabled. A manual audit of a smaller set of apps found 41 out of 100 made some kind of related mistake which also destroys security. They demoed injecting a malicious update to a virus scanner’s signature base, causing the scanner to detect itself as malware and delete itself.

They didn’t talk at all about why this happens; I would speculate it’s an operational problem at root, rather than a coding mistake. Not only are certificates ridiculously complicated; getting them and deploying them to all the necessary servers is also difficult. If you’re an app developer and you’re under time pressure and your company’s sysadmins are taking forever to get around to setting up the server correctly… disabling verification may be the path of least resistance.
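
For comparison, here is what the anti-pattern (and its fix) looks like in Python with the requests library. This is my analogy, not code from the paper (whose examples are Android TrustManagers written in Java), and the URL and file name are made up.

```python
# The same anti-pattern in Python with the requests library (my analogy; the
# paper's examples are Android TrustManagers written in Java).  URLs and file
# names here are made up.
import requests

def fetch_signature_update_broken():
    # Convenient against a test server with a self-signed certificate, but
    # verify=False disables certificate checking entirely, so a man in the
    # middle can hand the app whatever "signature update" it likes.
    return requests.get("https://updates.example.com/signatures.db", verify=False)

def fetch_signature_update_fixed():
    # Keep verification on; if the server uses a private CA, ship that CA
    # certificate with the app and point requests at it.
    return requests.get("https://updates.example.com/signatures.db",
                        verify="bundled-ca.pem")
```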

Routing Around Decoys

Decoy routing is a scheme for censorship evasion, in which the end-user’s machine sends out traffic overtly intended for an innocuous site; routers somewhere in the backbone are programmed to notice a covert message in that traffic, and divert it to the censored site that the user actually wanted. This paper points out that the adversary in this scheme is normally in control of the routing infrastructure for the evasive user’s AS and can therefore control how that user’s packets get routed. This allows them to pull a variety of TCP-level tricks to detect decoy routing, and then disrupt it simply by choosing BGP routes that don’t go through the decoy routers.

Thus, for decoy routing to work, there have to be a bunch of important overt destinations that, from the censoring AS’s perspective, are reachable only through paths that traverse decoy routers. Running the numbers for the usual suspect ASes indicates that you have to get an impractically huge number of backbone providers to deploy decoy routers.
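
The underlying argument is easy to state as a graph problem: the censor wins if even one usable path to the overt destination avoids every decoy-deploying AS. A toy illustration with a made-up topology:

```python
# Toy version of the argument, with a made-up AS-level topology: the censor
# only needs one path to the overt destination that avoids every
# decoy-deploying AS.
from collections import deque

as_links = {
    "censor": ["as1", "as2"],
    "as1":    ["censor", "decoy1", "dest"],
    "as2":    ["censor", "as3"],
    "as3":    ["as2", "dest"],
    "decoy1": ["as1", "dest"],
    "dest":   ["as1", "as3", "decoy1"],
}
decoy_ases = {"decoy1"}

def path_exists_avoiding(graph, src, dst, forbidden):
    """Breadth-first search for a path from src to dst that skips forbidden ASes."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph[node]:
            if nxt not in seen and nxt not in forbidden:
                seen.add(nxt)
                queue.append(nxt)
    return False

# True here means route selection alone defeats decoy routing to "dest".
print(path_exists_avoiding(as_links, "censor", "dest", decoy_ases))
```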

Wednesday

Operating System Framed in Case of Mistaken Identity

This is a modern user study on one of the oldest problems in the computer security book: If you are prompted to type your password, how do you know that the program prompting you is entitled to know your password? In addition to the well-known phishing sites that try to steal credentials for a particular site, malware is known to try to steal local account passwords in hopes that they are also passwords for high-value online services. The user study presented itself (to MTurk users) as an opinion poll of various online games, but one of the games in the sequence reported a missing browser plugin and popped up a fake OS installation-permission dialog, prompting for an administrative password. The visual deception was not perfect (notably, Windows always dims out the rest of the screen when it puts up a legitimate request for administrative credentials, which is impossible to fake from inside a webpage) but it appears that the majority of participants did not notice. It’s unclear how many people were genuinely deceived, since of course there is no way for the experimenters to tell whether any password entered was real. Only 20% of participants admitted to having typed in a real password, but the majority of participants claimed to have thought the prompt was real, and rejected the request on other grounds (e.g. not wanting to install plugins).

No solutions are offered, but considering how old and thorny this problem is, we can’t really complain.

The Devil is in the (Implementation) Details: An Empirical Analysis of OAuth SSO Systems

OAuth is a widely adopted federated authentication scheme. It’s quite complicated, and the 2.0 revision is even more complicated, to the point where its spec editors are quitting in disgust. Its security depends, of course, on implementation correctness.

This study did a deep dive on a hand-picked set of very popular websites that use OAuth (if these guys get it wrong, what can we expect of everyone else?) and found all kinds of security-breaking errors. 32% of the relying parties in their study are vulnerable to a network eavesdropper stealing an access token (which is not supposed to be sent to the relying site in cleartext, but people do it anyway; site developers may be under the misapprehension that OAuth makes SSL unnecessary). 64% of RPs misuse public identifiers (e.g. Facebook account IDs) as credentials, allowing impersonation by anyone who knows the public identifier. And nearly all RPs have inadequate defense-in-depth against an XSS exploit stealing access tokens. (It is not clear to me whether this is a flaw in the relying sites or in OAuth itself; successful XSS is generally considered game over anyway, but if it allows an attacker to escalate a credential for one site into a pluripotent single-sign-on credential, that’s much worse.)
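
To make the second error concrete, here is a sketch of the broken and the checked versions of a login handler on the relying party’s backend. The provider endpoint, parameter names, and app ID are hypothetical, not any real identity provider’s API.

```python
# Sketch of the "public identifier as credential" mistake and the obvious fix,
# on the relying party's backend.  The provider endpoint, parameter names, and
# app ID are hypothetical, not any real identity provider's API.
import requests

def login_broken(request_args, sessions):
    # WRONG: the browser simply asserts an account ID.  Anyone who knows a
    # victim's public ID can send the same request and be logged in as them.
    user_id = request_args["user_id"]
    sessions[user_id] = True
    return user_id

def login_checked(request_args, sessions):
    # Better: take the access token and ask the identity provider, over HTTPS,
    # whose token it is and whether it was issued to *this* application.
    token = request_args["access_token"]
    info = requests.get("https://idp.example/token_info",   # hypothetical endpoint
                        params={"token": token}, timeout=5).json()
    if not info.get("valid") or info.get("app_id") != "my-app-id":
        raise PermissionError("token was not issued to this app")
    user_id = info["user_id"]
    sessions[user_id] = True
    return user_id
```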

They didn’t have time to go into it in the talk, but the paper has a number of suggestions for how identity providers can improve their APIs so that it’s harder for RPs to get things wrong. I approve of this approach; I don’t know enough about the problem space to assess whether their particular suggestions are helpful.

Strengthening User Authentication through Opportunistic Cryptographic Identity Assertions

This proposes a better user experience for two-factor authentication using a smartphone as a second factor. Right now some sites (notably Google) will send you a text message with a numeric code you type back into the site, or else offer an application that shows you a numeric code that changes every minute, which again you have to type in. Instead, they propose to have the computer talk directly to the phone over unpaired Bluetooth, eliminating all user actions after pressing login. Bluetooth is notoriously slow but they claim that it is still faster than reading the number off the phone and typing it in, and regardless it seems like it would be a more pleasant user experience. However, I couldn’t tell you which of my computers actually speak Bluetooth, and if you were on a machine with an old browser you might be hosed.
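
The cryptographic core of such an assertion is simple enough to sketch. This is my simplification of the idea, not the paper’s protocol, and it omits the Bluetooth transport and the key-enrollment UX entirely.

```python
# The cryptographic core of the idea, heavily simplified (not the paper's
# protocol): the site issues a fresh challenge, the phone signs it with a
# per-account key, the site verifies.  Transport and enrollment UX omitted.
import os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Enrollment, done once: the phone makes a keypair, the site keeps the public half.
phone_key = Ed25519PrivateKey.generate()
site_public_key = phone_key.public_key()

# Login: the site's challenge travels browser -> Bluetooth -> phone and back.
challenge = os.urandom(32)              # fresh per login attempt
assertion = phone_key.sign(challenge)   # computed on the phone, nothing to type

# verify() raises InvalidSignature if the assertion is bogus.
site_public_key.verify(assertion, challenge)
print("second factor accepted")
```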

Question from the audience: don’t most people leave Bluetooth off all the time because it drains the batteries? Answer: dunno, hasn’t that been fixed by now?

Touching from a Distance: Website Fingerprinting Attacks and Defenses

Fingerprinting attacks have been around for a while. The game is: suppose a victim loads a website via an anonymizing service, which provides an encrypted channel to a generic IP address. An attacker sees all the traffic on the encrypted channel, but can’t read it and can’t observe its ultimate destination. (Whether the anonymizing service is a simple proxy or a mix network is moot, because the attacker is snooping directly on the victim.) Can the attacker still deduce what website is being visited? Maybe. The attacker can still observe the size and direction of each packet, and the interval between successive packets, so the idea is to record the patterns of packets generated by known page loads, then try to match those against traffic going to the anonymizing service. Most of the literature only uses packet size and direction. Per-page accuracy in the 60-80% range, within a closed world of 100 to 2000 pages (almost always site front pages), is considered good.

This paper tries to improve fingerprint accuracy for individual pages by using Damerau-Levenshtein edit distance as the distance metric for a support vector machine, but the more interesting idea in the paper (unfortunately not covered in the talk) is to use hidden Markov models to generalize from individual pages to entire sites. If the victim is looking at a particular page, it’s more likely that they will load one of its outgoing hyperlinks next. The attacker builds a hidden Markov model of each site of interest, and uses it to predict a typical pattern of page loads, which in turn adjusts the per-page classifiers’ thresholds.
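
Here is a rough reconstruction of the per-page classifier idea (mine, not the authors’ code): traces as sequences of signed packet sizes, edit distance as the similarity measure, and an SVM over a precomputed kernel. I use plain Levenshtein for brevity where the paper uses Damerau-Levenshtein, and the traces are invented.

```python
# Rough reconstruction of the per-page classifier (mine, not the authors' code):
# traces are sequences of signed packet sizes, similarity is an edit distance,
# and an SVM consumes a precomputed kernel built from it.  Plain Levenshtein is
# used for brevity where the paper uses Damerau-Levenshtein; traces are invented.
import numpy as np
from sklearn.svm import SVC

def edit_distance(a, b):
    """Dynamic-programming Levenshtein distance over two packet sequences."""
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a), len(b)]

# Positive numbers = outgoing packets, negative = incoming.
traces = [[+600, -1500, -1500, +52], [+600, -1500, -1400, +52],
          [+300, -800, +300, -800], [+300, -900, +300, -800]]
labels = [0, 0, 1, 1]   # two pages, two example loads of each

def kernel(X, Y):
    """Turn edit distances into similarities the SVM can consume."""
    return np.array([[np.exp(-edit_distance(x, y) / 10.0) for y in Y] for x in X])

clf = SVC(kernel="precomputed").fit(kernel(traces, traces), labels)
test = [[+600, -1500, -1500, +60]]
print(clf.predict(kernel(test, traces)))   # expected: page 0
```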

Thursday

You Are What You Include: Large-scale Evaluation of Remote JavaScript Inclusions

Problem statement: We know cross-site inclusion of JS is ubiquitous; who is trusted to provide JS libraries? How hard would it be to attack a JS library provider? Are there attack vectors that are non-obvious?

They give a few examples of actual exploits of JS library providers, then move on to an analysis of a 3.3-million-page JS-aware web crawl, within which they find 300,000 unique scripts loaded from 20,000 remote hosts. There is, unsurprisingly, a Zipf-ish distribution of script popularity. Five of the ten most-frequently-included scripts belong to Google, and another three belong to behind-the-scenes analytics agencies that are invisible to end users. (The remaining two are the Facebook and Twitter APIs.)

Common, exploitable errors include:

  • Requesting JS from localhost, i.e. the host running the browser, often on high port numbers. Malware can bind that port and serve script of its choosing into the including site, even in the presence of local privilege barriers (e.g. a malicious Android app normally cannot poke the browser directly, but it can listen on a local port).
  • Similarly, requesting JS from private IP space—now the malware just has to be on the same network as the browser.
  • Requesting JS from a site whose domain registration has expired; anyone could reregister it.
  • Similarly, requesting JS from a mistyped domain (they gave the example of googlesyndicatio.com with the final n left off) or from an IP address that has been reassigned.

They also pointed out that coarse-grained sandboxing won’t help because the intended scripts need too many privileges, and that it’s unusual for the scripts to change more often than once a week, so maintaining local copies might be feasible, given sufficient operational will and manpower.
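
The first two items on that list are easy enough to check for mechanically; here is a sketch of the sort of audit one might run over a page’s script URLs. The example inclusions are made up, and expired or typo-squatted domains need registration data, which this ignores.

```python
# Sketch of a mechanical check for the first two problems on that list;
# expired and typo-squatted domains need registration data, which is ignored
# here.  The example inclusions are made up.
import ipaddress
from urllib.parse import urlparse

def risky_script_sources(script_urls):
    findings = []
    for url in script_urls:
        host = urlparse(url).hostname or ""
        if host in ("localhost", "127.0.0.1", "::1"):
            findings.append((url, "loaded from the user's own machine"))
            continue
        try:
            if ipaddress.ip_address(host).is_private:
                findings.append((url, "loaded from private IP space"))
        except ValueError:
            pass   # ordinary domain name; would need DNS/WHOIS checks
    return findings

page_scripts = [
    "http://localhost:8080/helper.js",
    "http://10.0.0.5/tracker.js",
    "https://ajax.example-cdn.com/jquery.js",
]
print(risky_script_sources(page_scripts))
```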

Scriptless Attacks—Stealing the Pie Without Touching the Sill

This paper demonstrates a variety of XSS-style attacks that don’t require any scripting at all, bypassing existing XSS filters, CSP, NoScript—some even work in HTML-enabled mail readers. (This is your periodic reminder that nobody should ever send or accept HTML in email.)

The attacks, in general, work by exploiting some other feature of the Web platform that can conditionally trigger tailored requests to a malicious server; even when scripting is unavailable, it may still be possible to inject these features. HTML form validation can be applied to hidden form fields and can trigger URL loads if regular expressions match. Invisible SVG files can use <set> elements to capture keystrokes (this is the one that works in Thunderbird). Custom fonts (using SVG, or OpenType’s discretionary ligatures) can control the size of the viewport, which, together with media queries, can trigger URL loads. (This one seems a bit too baroque to be practical, but you never do know with these things.)