The ethics of preventing third-party net filtering

I haven’t posted anything research-related in a while because I’ve been on a project that I’m not supposed to talk about till it’s done, and it’s not done yet. I can say, though, that it’s about ways to get around country-scale filtration of the Internet. I’m writing it up now, starting with the threat model, as you do:

Alice Arishat wishes to publish things for Brutus to read. Cato does not approve of what Arishat has to say, and seeks to prevent her from publishing anything.

Most online discussion of “censorship” starts from the premise that Cato is automatically in the wrong here. That’s one of the cypherpunk premises that underpin most discussion of theoretical Internet security. I want to play devil’s advocate today, though, and explore circumstances where we might choose to support Cato. In the offline world, we trade off “free speech” against all sorts of other values every day:

Continue reading

Posted in Research | 14 Comments

unearthed arcana (music division)

Some time ago—I don’t remember how long precisely—I started working on a mixtape. I got as far as writing down a bunch of songs in categories, and then I lost interest, and the list has been cluttering up my desk ever since. The category tags no longer make a great deal of sense and I’m not even sure who sings some of these songs anymore, but if I put it into the computer then I can get rid of the paper cluttering up my desk, and maybe the magic of the internets will do something with it.

Continue reading

Posted in Uncategorized | 3 Comments

test your file locking

This PUBLIC SERVICE ANNOUNCEMENT is brought to you by the I JUST WASTED AN HOUR ON THAT Foundation:

Do you suffer from mysteriously hanging autotools processes? Or perhaps other mysteriously hanging processes? If so, you may have a problem with your file locking, and the IJWAHOT Foundation recommends you compile and run this program on the computer with the problem, preferably under strace or equivalent. If it, too, hangs, then you do indeed have a problem with your file locking. The Foundation does not presently know the cause of this problem, but we suspect that it is NFS’s fault somehow. If you do know the cause of this problem, we would love to hear about it in the comments.
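The linked program isn’t reproduced here. As a rough stand-in, a test of the same general kind might look like the Python sketch below. The name `check_locking` and the scratch-file approach are assumptions made for illustration, not the Foundation’s actual program, and it relies on POSIX `fcntl` locks, so it only runs on Unix-like systems.

```python
import fcntl
import os
import tempfile


def check_locking(directory="."):
    """Take and release an exclusive POSIX lock on a scratch file.

    Nobody else holds a lock on the file, so the request should be
    granted immediately. If the call never returns, file locking on
    that filesystem is probably broken.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocking request: a hang would happen here
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Run this from a directory on the suspect filesystem.
    print("locking ok" if check_locking() else "locking failed")
```

Run it from a directory on the filesystem you suspect (an NFS mount, say), preferably under strace as suggested above, so you can see which system call it is stuck in.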

Posted in Possibly Useful | 1 Comment

Breaking things every six weeks

Attention conservation notice: 900 words of inside baseball about Mozilla. No security content whatsoever.

The Mozilla Project has been taking a whole lot of flak recently over its new “rapid release cycle”, in which there is a new major version of Firefox (and Thunderbird) every six weeks, and it potentially breaks all your extensions. Especially the big complicated extensions like Firebug that people cannot live without. One might reasonably ask, what the hell? Why would any software development team in their right mind—especially a team developing a critical piece of system infrastructure, which is what Web browsers are these days, like it or not—inflict unpredictable breakage on all their users at six-week intervals?

Continue reading

Posted in HTML &c | 25 Comments

Icons of the Future City

Way back at the 2010 Mozilla Summit, one of the keynote speakers showed us an amazing demo flythrough of a 3D-rendered futuristic city, with embedded video, tweets, and the like, all running live inside a Firefox 4 beta thanks to awesome new tech like WebGL and JägerMonkey. (Note: in the linked video, the city only appears about a minute in.) That’s not what I want to talk about, though.

It occurred to me while I was watching, that there is a standard futuristic city used in demos like this one. It’s night. You can’t see the ground. Skyscrapers stretch all the way to the horizon. Said skyscrapers are glass oblongs, for the most part; this demo mixed it up quite a bit with interesting cross-sections, but still had hardly any ornamentation, terracing, or what-have-you. All the skyscrapers’ windows are lit up. There may be flying vehicles between or around the towers, but there is no sign of any other type of transportation. It is, in short, the future of the Futurists of the nineteen-teens, the city of Metropolis, Blade Runner, and Neuromancer.

Now the thing is, no city in the real world has ever looked like that. Even in the densest and most skyscraper-ful urban areas—have a look at these aerial videos of Manhattan and Hong Kong, for instance—there are buildings that are less than ten stories tall (these are in fact the majority in Manhattan, although possibly not in Hong Kong); there are parks and other open spaces; and by no means are all of the buildings boring oblongs. Furthermore, people doing actual urban design argue, vehemently, over whether or not dense skyscraper-ful cities are best (e.g.: pro, con) and I think nobody would argue, anymore, that open space is unnecessary.

And yet, when we want an icon of the city of the Future, the Futurists’ vision is what we turn to. Why? Perhaps because it’s instantly recognizable, or because it’s easy to build 3D models for. But I claim this is causing this discredited vision to occupy a share of the casual imagination that it does not deserve anymore. It crowds out other visions with its readiness to hand. Let’s invent some new icons for the future city. Let’s make the next demo flythrough be of something like this or this or this. (But watch out for the just-as-discredited “Radiant City” vision, please.)

Posted in Fiction | 3 Comments

A Zany Scheme for Compact Secure Hashes

Lots of current and near-future tech relies heavily on secure hashes as identifiers; these are usually represented as hexadecimal strings. For instance, in a previous post I threw out the strawman h: URN scheme that looks like this:

 <!-- jQuery 1.5.2 -->
 <script src="h:sha1,b8dcaa1c866905c0bdb0b70c8e564ff1c3fe27ad"></script>

Now the problem with this is, these hexadecimal strings are inconveniently long and are only going to get longer. SHA-1 (as shown above) produces 160-bit hashes, which take 40 characters to represent in hex. That algorithm is looking kinda creaky these days; the most convenient replacement is SHA-256. As the name implies, it produces 256-bit hashes, which take 64 characters to write out in hex. The next generation of secure hash algorithms, currently under development at NIST, are also going to produce 256-bit (and up) hashes. The inconvenience of these lengthy hashes becomes even worse if we want to use them as components of a URI with structure to it (as opposed to being the entirety of a URN, as above). Clearly some encoding other than hex, with its 2x expansion, is desirable.
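To make the lengths concrete, here is a small Python sketch using the standard `hashlib` module. The helper name `hex_length` is made up for this example.

```python
import hashlib


def hex_length(algorithm, data=b"example"):
    """Return how many hex characters a digest from `algorithm` occupies."""
    digest = hashlib.new(algorithm, data).digest()
    return len(digest.hex())


for name in ("sha1", "sha256"):
    bits = hashlib.new(name).digest_size * 8
    print(f"{name}: {bits} bits, {hex_length(name)} hex characters")
```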

Hashes are incompressible, so we can’t hope to pack a 256-bit hash into fewer than 32 characters, or a 160-bit hash into fewer than 20 characters. And we can’t just dump the raw binary string into our HTML, because HTML is not designed for that—there is no way to tell the HTML parser “the next 20 characters are a binary literal”. However, what we can do is find 256 printable, letter-like characters within the first few hundred Unicode code points and use them as an encoding of the 256 possible bytes. Continuing with the jQuery example, that might look something like this:

<script src="h:sha1,пՎЦbηúFԱщблMπĒÇճԴցmЩ"></script><!-- jQuery 1.5.2 -->

See how we can fit the annotation on the same line now? Even with sha256, it’s still a little shorter than the original in hex:

<!-- jQuery 1.5.2 -->
<script src="h:sha256,ρKZհνàêþГJEχdKmՌYψիցyԷթνлшъÁÐFДÂ"></script>

Here’s my proposed encoding table:

    0              0 1              1
    0123456789ABCDEF 0123456789ABCDEF
 00 ABCDEFGHIJKLMNOP QRSTUVWXYZÞabcde
 20 fghijklmnopqrstu vwxyzþ0123456789
 40 ÀÈÌÒÙÁÉÍÓÚÂÊÎÔÛÇ ÄËÏÖÜĀĒĪŌŪĂĔĬŎŬÐ
 60 àèìòùáéíóúâêîôûç äëïöüāēīōūăĕĭŏŭð
 80 αβγδεζηθικλμνξπρ ςστυφχψωϐϑϒϕϖϞϰϱ
 A0 БГДЖЗИЙЛПФЦЧШЩЪЬ бгджзийлпфцчшщъь
 C0 ԱԲԳԴԵԶԷԸԹԺԻԽԾԿՀՁ ՂՃՄՅՆՇՈՉՊՋՌՍՎՐՑՒ
 E0 աբգդեզէըթժիխծկհձ ղճմյնշոչպջռսվրցւ

All of the characters in this table have one- or two-byte encodings in UTF-8. Every punctuation character below U+007F is given special meaning in some context or other, so I didn’t use any of them. This unfortunately does mean that only 62 of the 256 bytes get one-byte encodings, but storage compactness is not the point here, and it’s no worse than hex, anyway. What this gets us is display compactness: a 256-bit hash will occupy exactly 32 columns in your text editor, leaving room for at least a few other things on the same line.

Choosing the characters is a little tricky. A whole lot of the code space below U+07FF is taken up by characters we can’t use for this purpose—composing diacritics, control characters, punctuation, and right-to-left scripts. I didn’t want to use diacritics (even in precomposed form) or pairs of characters that might be visually identical to each other in some (combination of) fonts. Unfortunately, even with the rich well of Cyrillic and Armenian to work with, I wasn’t able to avoid using a bunch of Latin-alphabet diacritics. Someone a little more familiar with the repertoire might be able to do better.
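For anyone who wants to play with this, here is a minimal Python sketch of an encoder and decoder built from the table above. The names `ROWS`, `ALPHABET`, `encode`, and `decode` are my own choices for illustration; the only thing taken from the proposal is the table itself, copied row by row.

```python
# Each string is one row of the table: 32 consecutive byte values.
ROWS = [
    "ABCDEFGHIJKLMNOPQRSTUVWXYZÞabcde",  # 0x00-0x1F
    "fghijklmnopqrstuvwxyzþ0123456789",  # 0x20-0x3F
    "ÀÈÌÒÙÁÉÍÓÚÂÊÎÔÛÇÄËÏÖÜĀĒĪŌŪĂĔĬŎŬÐ",  # 0x40-0x5F
    "àèìòùáéíóúâêîôûçäëïöüāēīōūăĕĭŏŭð",  # 0x60-0x7F
    "αβγδεζηθικλμνξπρςστυφχψωϐϑϒϕϖϞϰϱ",  # 0x80-0x9F
    "БГДЖЗИЙЛПФЦЧШЩЪЬбгджзийлпфцчшщъь",  # 0xA0-0xBF
    "ԱԲԳԴԵԶԷԸԹԺԻԽԾԿՀՁՂՃՄՅՆՇՈՉՊՋՌՍՎՐՑՒ",  # 0xC0-0xDF
    "աբգդեզէըթժիխծկհձղճմյնշոչպջռսվրցւ",  # 0xE0-0xFF
]
ALPHABET = "".join(ROWS)

# Sanity check: one distinct character per byte value.
assert len(ALPHABET) == 256 and len(set(ALPHABET)) == 256

DECODE = {ch: value for value, ch in enumerate(ALPHABET)}


def encode(data: bytes) -> str:
    """Map each byte to its character in the table."""
    return "".join(ALPHABET[b] for b in data)


def decode(text: str) -> bytes:
    """Inverse of encode(); raises KeyError on a character not in the table."""
    return bytes(DECODE[ch] for ch in text)


sha1_hex = "b8dcaa1c866905c0bdb0b70c8e564ff1c3fe27ad"
print(encode(bytes.fromhex(sha1_hex)))
```

Fed the SHA-1 from the jQuery example, this prints the same 20-character string used in the one-line `<script>` tag earlier.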

Posted in HTML &c | 13 Comments

How To Choose Passwords

When I talk to people who aren’t security researchers about history sniffing, they want to know whether they should worry about it, and I say no: the only thing you can do to protect yourself is use the latest version of your favorite browser, which you should do anyway; besides, the interactive attacks will probably never appear in the wild. But if I only ever talk about computer security topics that are only relevant to researchers, I’m not helping people as much as I could, and I’m scaring them about things they can’t control. So this post is about something you should worry about, because it’s under your direct control; lots of people do it poorly and that does make them less safe online; and it’s easy to do well. That thing is choosing passwords.

You have probably heard that you shouldn’t reuse the same password on many different websites, and that your passwords should be long, contain numbers and punctuation, and avoid dictionary words. But you probably haven’t heard anyone explain why, and you probably have noticed that these two pieces of advice are hard to follow at the same time, because long gibberish passwords are hard to remember even if you only have one of them. I’m going to tell you why you should do these things, and how to do them without too much grief.

Don’t use the same password on many different websites

No matter how good your password is, the bad guys might discover what it is. For instance, if you log into an unencrypted website over an unencrypted wireless network, anyone else on the same wireless network can listen in on the radio traffic and discover your password. (It’s just like eavesdropping on a private conversation.) Or you might accidentally type your password into a website that looks like the real thing but is actually a fake created to trick you.

Suppose the bad guys have discovered your password for a Web forum. That’s not a big deal, because someone impersonating you on one forum probably isn’t a big deal. You might have to apologize to some people for letting some schmuck insult them while pretending to be you. But the bad guys know that people often use the same password on many different websites, so they’re going to try to log into your email with that password, and your bank, and so on. If they succeed—if you did use the same password—they might be able to ruin your life, or at least steal some of your money. But if you always use different passwords on different websites, the bad guys have to discover the password you use for your bank (and nothing else) in order to steal your money.

How do you manage to remember lots of different passwords, especially when (as I’m about to explain) they all need to be long and complicated? The best way is to let the computer—specifically, your browser’s password manager—do it for you. This may seem unsafe, but it’s actually much safer than using the same password for everything. The password manager cannot be fooled by phishing sites, and it has no trouble remembering lots of long complicated passwords. Yes, all the passwords are in a file on your computer. But the only way the bad guys can get at that is by physically stealing your computer, or installing spyware on it remotely. If you keep your computer up to date with security patches, you don’t have to worry about spyware much. If your computer is in danger of being physically stolen (e.g. it’s a laptop) you should use the master password mode of your browser’s password manager, so that the file on your computer is encrypted. Whether or not you have to worry about theft, you should enable Sync, or equivalent feature, even if you have no other computer to sync with; that way, if your computer breaks, there’s still a backup of all your passwords out there in the cloud (safely encrypted).

Use long, complicated passwords

The other way the bad guys discover passwords is by breaking into servers that store entire databases of them. If these databases have been designed correctly, that doesn’t tell them anything by itself, because the passwords are hashed. Hashing deserves a little explanation: suppose my password on some site is “12345” (the kind of thing that an idiot would have on his luggage). The server doesn’t store “12345” in its database, it stores “827ccb0eea8a706c4c34a16891f84e7b”, which is the result of running “12345” through a cryptographic hash, in this case MD5. It’s easy to convert a password into its hash, but it’s prohibitively hard to do the reverse. MD5 is old and no longer considered a good choice for passwords (or anything, for that matter), but the fastest computer ever built would still take so long to recover “12345” from “827ccb0eea8a706c4c34a16891f84e7b” that the Sun would burn out before it was done.
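If you want to see the hashing step for yourself, Python’s standard `hashlib` module will do it. This is only a sketch of the bare hash described above; the helper name `md5_hex` is made up here, and a real site’s storage scheme would differ.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Return the MD5 hash of a password as a hex string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


print(md5_hex("12345"))  # 827ccb0eea8a706c4c34a16891f84e7b
```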

So the bad guys can’t just read the passwords from a database once they have it. But they can guess passwords, run the guesses through MD5 (or whatever was used), and compare the results to the database entries. (They can guess passwords even if they haven’t stolen a database, by feeding the guesses to the site’s login form—but that’s much slower and the site admins are likely to notice.) “12345” isn’t a good password because it’s easy to guess—but so is any five-digit number: a cheap laptop can calculate the MD5 of all 100,000 five-digit (or smaller) numbers in less than a second. There are something like 250,000 words in English—that’s maybe five seconds’ worth of work for the same laptop—so any word in the dictionary is bad, too. You can buy a 40-million-entry word list for $30 that has not only all the words in 20 different languages, but mangled versions of them (e.g. “f0od”)—that might take an hour or two to process.
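The guess-and-compare loop is short enough to sketch in Python. The function name `crack_numeric` is hypothetical, and the sketch assumes the database stores a bare MD5 of the password, as in the example above. It tries the 100,000 numbers from 0 to 99999 as written, without leading zeros.

```python
import hashlib


def md5_hex(password: str) -> str:
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def crack_numeric(target_hash: str, max_digits: int = 5):
    """Hash every number with up to `max_digits` digits and compare.

    Returns the matching password, or None if it is not such a number.
    """
    for n in range(10 ** max_digits):
        guess = str(n)
        if md5_hex(guess) == target_hash:
            return guess
    return None


stolen = "827ccb0eea8a706c4c34a16891f84e7b"  # the hash of "12345" from above
print(crack_numeric(stolen))
```

A dictionary attack is the same loop with a word list in place of `range(...)`.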

The longer and more complicated your password is, the harder it is to guess; but that makes it harder to remember as well. Adding punctuation and numbers doesn’t help as much as one would like. There are 95 characters that you can type on a US keyboard, so there are 95^8, or about 6.6 quadrillion (short scale), possible eight-character passwords, if you use all those characters. That many possibilities is out of the reach of a cheap laptop, but it’s a few weeks’ effort for a small cluster of beefy computers; a determined bad guy could do this for maybe $25,000.

The good news is, you can have passwords that can’t be guessed this way but are still easy to remember. The trick is to use phrases rather than words. One random English word is 250,000 possibilities. Two random English words are 62.5 billion possibilities (250,000 squared). That’s still not enough. But ten random English words is 250,000^10 ≈ 10^54 possibilities, which is safely in “still guessing when the Sun burns out” territory.
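The arithmetic in the last few paragraphs is easy to check. A short Python sketch, using the same round figures as the text (95 typeable characters, 250,000 English words); the function name `search_space` is just for illustration:

```python
import math

KEYBOARD_CHARS = 95         # characters typeable on a US keyboard
DICTIONARY_WORDS = 250_000  # rough count of English words used in the text


def search_space(symbols: int, length: int) -> int:
    """Number of distinct sequences of `length` items drawn from `symbols` choices."""
    return symbols ** length


cases = [
    ("eight typeable characters", search_space(KEYBOARD_CHARS, 8)),
    ("two random words", search_space(DICTIONARY_WORDS, 2)),
    ("ten random words", search_space(DICTIONARY_WORDS, 10)),
]
for label, size in cases:
    print(f"{label}: {size:,} (about 10^{math.log10(size):.1f})")
```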

You can’t take just any phrase, though. The bad guys could easily try every phrase in the Concise Oxford Dictionary of Quotations, because there are only 9000 of them. I haven’t worked out the math, but I think guessing every sentence in the complete works of Shakespeare is doable. But nobody has a database of every sentence in every work of literature that was written with the Latin alphabet. A phrase taken from somewhere in the middle of an obscure but lengthy book is a good choice. Or you could follow this procedure:

  1. Go to Wikipedia and click on “random article”. (You can use any site with a “random article” feature for this step, if you’d rather.)
  2. Copy the URL of the page you get, and paste it into the Eater of Meaning. Leave the drop-down on “Eat word endings.”
  3. Choose ten consecutive words from the result. They don’t have to all come from the same sentence.

Don’t worry about finding a sentence that you can remember yourself, because you’re going to have the password manager do it (unless you’re trying to pick the master password).

Some sites have limits on the length of their passwords. This is bad, and you should complain; but until they fix it, just use the first letter of each word in your ten-word phrase, with some numbers and punctuation if they insist on numbers and punctuation. That kind of password is theoretically crackable, as I said earlier, but it’s likely to be better than lots of other passwords in the database. So if the bad guys get the database, they will crack so many other people’s passwords before they get to yours that they don’t feel they have to bother cracking yours. (It’s kind of like the joke about how fast you need to run away from a lion.)

If there’s no limit on the length of the password, but the site still insists on numbers and/or punctuation, put them in between the words; that’s easier to type.
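If you would rather have the computer pick the words, here is a minimal Python sketch. Note that this is a different technique from the Wikipedia-plus-Eater-of-Meaning procedure above: it draws words uniformly at random from a list using the standard `secrets` module. The function names and the tiny `SAMPLE_WORDS` list are placeholders invented for this example; a real list would need something like the 250,000 entries assumed earlier.

```python
import secrets

# Tiny stand-in word list, for illustration only.
SAMPLE_WORDS = [
    "lantern", "gravel", "otter", "violin",
    "marble", "thistle", "copper", "harbor",
]


def random_passphrase(words, count=10):
    """Pick `count` words independently and uniformly at random."""
    return " ".join(secrets.choice(words) for _ in range(count))


def initials(phrase):
    """First letter of each word, for sites that cap password length."""
    return "".join(word[0] for word in phrase.split())


phrase = random_passphrase(SAMPLE_WORDS)
print(phrase)
print(initials(phrase))
```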

Posted in Research | 8 Comments

PSA: “like” buttons

Because I hit “empty spam” just a little too fast, erasing the question about this: There are no Facebook “like” buttons on this site because I myself barely ever use Facebook and don’t really see the point; the same goes for Digg, reddit, etc. If you like something you read here enough to want to promote it, please consider mentioning it somewhere you can put in a few words to explain why people should click through (Twitter, Facebook wall, that sort of thing). Or write a full-sized response article and link back.

Posted in Administrivia | Comments Off

Interactive history sniffing and its relatives

Readers of this blog will probably already know that, up till the middle of last year, it was possible to “sniff” browsing history by clever tricks involving CSS, JavaScript, and the venerable tradition of drawing hyperlinks to already-visited URLs in purple instead of blue. Last year, though, David Baron came up with a defense against history sniffing which has now been adopted by every major browser except Opera. One fewer thing to worry about when visiting the internets, hooray? Not so fast.

Imagine for a moment that the next time you visited an unfamiliar website and you wanted to leave a comment without creating an account, instead of one of those illegibly distorted codes that you have to type back in, you saw this:

Please click on all the chess pawns.
A six-by-six checkerboard grid with chess pawns in random locations. One of the pawns is green and has a mouse-cursor arrow pointing to it.

As you click on the pawns, they turn green. Nifty, innit? Much easier than an illegibly distorted code. Also easy for a spambot equipped with image processing software, but it turns out the distorted codes are not that hard for spambots anymore either and probably no one’s written the necessary image processing code for this one yet. Possibly also easier on people with poor eyesight, and there could still be a link to an audio challenge for people with no eyesight.

… What’s this got to do with history sniffing? That chessboard isn’t really a CAPTCHA. All the squares have pawns on them. But each one is a hyperlink, and the pawns linked to sites you haven’t visited are being drawn in the same color as the square, so they’re invisible. You only click on the pawns you can see, of course, and so you reveal to the site which of those URLs you have visited. A little technical cleverness is required—the pawns have to be Unicode dingbats, not images; all the normal interactive behavior of hyperlinks has to be suppressed; etcetera—but nothing too difficult. I and three other researchers with CMU Silicon Valley’s Web Security Group have tested this and a few other such fake CAPTCHAs on 300 people. We found them to be practical, although you have to be careful not to make the task too hard; for details please see our paper (to be presented at the IEEE Symposium on Security and Privacy, aka “Oakland 2011”).

An attacker obviously can’t use an “interactive sniffing” attack like this one to find out which sites out of the entire Alexa 10K your victim has visited—nobody’s going to work through that many chessboards—and for the same reason, deanonymization attacks that require the attacker to probe hundreds of thousands of URLs are out of reach. However, an attacker could reasonably probe a couple hundred URLs with an interactive attack, and according to Dongseok Jang’s study of actual history sniffing (paper), that’s about how many URLs real attackers want to sniff. It seems that the main thing real attackers want to know about your browsing history is which of their competitors you patronize, and that’s never going to need more than a few dozen URLs.

On the other hand, CAPTCHAs are such a hassle for users that they cause 10% to 33% attrition in conversion rates. And users don’t expect to see them on every visit to a site—just the first, usually, or each time they submit an anonymous comment. Even websites that were sniffing history when it was possible to do so automatically, and want to keep doing it, may consider that too high a price. But we can imagine similar attacks on higher-value information, where even a tiny success rate would be worth it. For instance, a malicious site could ask you to type a string of gibberish to continue—which happens to be your Amazon Web Services secret access key, IFRAMEd in from their management console. Amazon has taken steps to make this precise scenario difficult, but I’m not prepared to swear that it’s impossible, and other cloud services providers may have been less cautious.

Going forward, we also need to think carefully about how new web-platform capabilities might enable attackers to make similar end-runs around the browser’s security policies. In the aforementioned research project, we were also able to sniff history without user interaction by using a webcam to detect the color of the light reflecting off the user’s face; even with our remarkably crude image processing code this worked great as long as the user held still. It’s not terribly practical, because the user has to grant access to their webcam, and it involves putting an annoying flashing box on the screen, but it demonstrates the problem. We are particularly concerned about WebGL right now, since its “shader programs” can perform arbitrary computations and have access to cross-domain content that page JavaScript cannot see; there may well be a way for them to communicate back to page JavaScript that avoids the <canvas> element’s information leakage rules. Right now it’s not possible to put the rendering of a web page into a GL texture, so this couldn’t be used to snoop on browsing history, but there are legitimate reasons to want to do that, so it might become possible in the future.

Posted in Research | 14 Comments

Classical Mechanics Interlude: Acceleration to stop in a constant distance

Over on twitter, @MegaManSE asked

does anyone know the equation to find the acceleration to stop a moving object in a constant distance given some random starting velocity?

I didn’t, at the time, know … but I do know how to work it out from first principles, and it makes a decent little classical mechanics exercise, and also an excuse to figure out how to get MathJax hooked up on this blog, which might be useful in the future. So here’s how it’s done.

The first step in solving one of these problems is to rewrite the question as formally as possible:

At time \(t=0\) an object is at position \(x=0\) and moving with velocity \(\nu=v\). Find the constant acceleration \(a\) such that at some future time \(t=T\), when the object is at position \(x=d\), its velocity will be zero.

Now how do we do that? It’s time for just a little bit of integral calculus. Velocity is the rate at which a moving object’s position changes, as a function of time, and acceleration is the rate at which a moving object’s velocity changes, also as a function of time. The calculus was invented to answer the question, if I know what one of these is, what are the other two? It has a somewhat-deserved reputation for being confusing, but mostly that’s because it’s hard to explain how you come up with its rules. If you know the rules, they’re pretty easy to apply. The acceleration in this problem is constant, \(a\), and we know at time \(0\) the velocity is \(v\) and the position is \(0\). Therefore, the velocity at time \(t\) is

$$\nu(t) = v + \int_0^t a\; \text{d}t = v + at$$

and the position is

$$x(t) = 0 + \int_0^t (v + at)\; \text{d}t = 0 + vt + \frac{at^2}{2}$$

These are both functions of time, but we want to solve for acceleration as a function of distance and starting velocity. But that’s just a matter of algebra. We want \(\nu(T) = 0\), so we plug that into the first of these equations and solve for \(T\):

$$0 = v + aT \quad\rightarrow\quad T = \frac{-v}{a}$$

And we want \(x(T) = d\), so we plug both that and the formula for \(T\) into the second equation:

$$d = v\frac{-v}{a} + \frac{a}{2}\left(\frac{-v}{a}\right)^2$$

Now all we have to do is solve for \(a\):

$$d = \frac{-v^2}{a} + \frac{v^2}{2a}$$

$$d = \frac{-2v^2 + v^2}{2a}$$

$$2ad = -v^2$$

$$a = \frac{-v^2}{2d}$$

Wait, the acceleration comes out to be negative?! Yes. That’s how you know the object is slowing down rather than speeding up. (If the object weren’t moving in a straight line, its position, velocity, and acceleration would all have to be treated as 2- or 3-dimensional vectors, but the calculations would wind up being very nearly the same, only with more boldface. Also, if the velocity were negative, it would mean the object was moving backward. This is, in fact, the difference between velocity and speed: speed is the magnitude of velocity, without the direction, so it can never be negative.)
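As a sanity check on the algebra, here is a small Python sketch that plugs numbers into the formulas above. The function names and the sample values (10 m/s, 25 m) are made up for illustration; any consistent units would do.

```python
def stopping_acceleration(v, d):
    """Constant acceleration that brings initial velocity v to zero over distance d."""
    if d == 0:
        raise ValueError("stopping distance must be nonzero")
    return -v ** 2 / (2 * d)


def velocity(v, a, t):
    """Velocity at time t: v + a*t."""
    return v + a * t


def position(v, a, t):
    """Position at time t, starting from x = 0: v*t + a*t^2/2."""
    return v * t + a * t ** 2 / 2


v0, d = 10.0, 25.0               # sample values: 10 m/s, stop within 25 m
a = stopping_acceleration(v0, d)
T = -v0 / a                      # time at which the velocity reaches zero
print(a, T, velocity(v0, a, T), position(v0, a, T))
```

With these sample values the acceleration is -2 m/s², the object stops after 5 s, and its position at that moment is the requested 25 m.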

Posted in Uncategorized | 7 Comments