information theory | JasonMorrison.net

There’s an old joke about the Internet that’s important for two reasons. First, the joke:

“On the Internet, nobody knows you’re a dog.”

It’s important because it illustrates a key cultural and technological underpinning of the Internet: anonymity. The second reason it’s important is that it’s so old, printed in the New Yorker in 1993, which is basically Old Testament times in Internet years. So for decades, the web has allowed people to browse without telling or proving who they are. Though many sites would love it if you created an account and logged in, the vast majority are perfectly happy to serve up pages to you without even knowing if you’re a person or a dog.

But there are many reasons to want to track a user from page to page or from site to site, and there are various ways to do it. The most common way involves cookies. Web developers need a way to create user sessions or else things users like (shopping carts, preferences, the ability to update your profile picture) are impossible to implement.

Cookies are pretty well understood, and users can turn them off or clear them out if they really want. Google Chrome, for example, has “Incognito Mode,” which allows you to surf without saving cookies, history, etc. from session to session. Even with cookies off, though, maintaining a user session within a particular site by passing around a session ID isn’t too hard. It’s trivial to do in PHP, for example.
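To make the "passing around a session ID" idea concrete, here is a minimal sketch in Python rather than PHP. It assumes the ID travels in a query-string parameter; the parameter name `sid`, the in-memory `SESSIONS` store, and the function names are all made up for illustration, and a real site would add expiry and server-side persistence.

```python
import secrets
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

# Hypothetical in-memory session store; a real site would persist this.
SESSIONS = {}

def start_session():
    """Create a new session and return its hard-to-guess ID."""
    sid = secrets.token_urlsafe(16)
    SESSIONS[sid] = {}
    return sid

def add_session_id(url, sid, param="sid"):
    """Rewrite a link so the session ID travels in the query string."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query[param] = [sid]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

def session_from_url(url, param="sid"):
    """Look up the session named in an incoming request URL, if any."""
    values = parse_qs(urlparse(url).query).get(param)
    if not values:
        return None
    return SESSIONS.get(values[0])
```

The site rewrites every link it serves with `add_session_id`, so each click hands the ID back and `session_from_url` can find the visitor's preferences again, no cookie required.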

Most users are pretty comfortable with this state of affairs – Facebook knows who I am because I logged in, but I trust them. Amazon knows who I am but that’s cool because I’m shopping. Some other site doesn’t know who I am, but it knows that I’m the same person who clicked on the widget to change the language a couple minutes ago.

People start getting uncomfortable when you start tracking them across sites. People become even more uncomfortable when they no longer have control over their anonymity. Three recent techniques violate both of those comfort zones in limited ways.

1. The EFF’s Panopticlick project.

Follow the link above and click the “test me” button. Is your browser silently betraying you? This is a very clever hack based on the fact that browsers almost always send some information to web servers in HTTP headers (the user agent, what type of content the browser is willing to accept, etc.). For years, people have misused the user agent header to try to get JavaScript working across multiple browsers. Panopticlick also checks for available plugins and fonts. Add all this data up and there’s enough variability from one browser to the next that you can apparently identify individual browsers reliably. The EFF has a great post on the information theory behind the project.
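Here is a rough sketch of the idea, not Panopticlick's actual code: combine the attributes a browser reveals into one hash, and measure how identifying a single observation is in bits (the "surprisal", which is the negative base-2 log of its probability). The function names and the choice of SHA-256 are my own assumptions for illustration.

```python
import hashlib
import math

def fingerprint(user_agent, accept, plugins, fonts):
    """Hash browser-supplied attributes into one stable identifier.

    Hypothetical helper: plugins and fonts are sorted so the same set
    always produces the same hash, regardless of reporting order.
    """
    parts = [
        user_agent,
        accept,
        ",".join(sorted(plugins)),
        ",".join(sorted(fonts)),
    ]
    # Join with a control character unlikely to appear in the values.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def surprisal_bits(probability):
    """Bits of identifying information in an observation this likely."""
    return -math.log2(probability)
```

A value shared by one browser in 1,024 carries `surprisal_bits(1 / 1024)`, or 10 bits. If the attributes are independent, their bits simply add, which is why a handful of individually boring details can narrow things down so quickly.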

This doesn’t mean sites will know who you are, but they could use this information to know that you visited web pages A, B, and C whether or not you want them to. An ad network could use this info to track you across many sites. An unscrupulous site could sell this info, giving your browsing history away for cash, and if you log into a site that has personally identifying info about you (email, shipping addresses, etc.), the history could potentially be tied back to a person.
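As a toy illustration of the ad network scenario, assume each page view has already been reduced to some fingerprint string. Stitching a browsing history together is then just a matter of grouping. The record layout, the `fp-1` style identifiers, and the `.example` site names below are invented for the sketch.

```python
from collections import defaultdict

def histories_by_fingerprint(visits):
    """Group (fingerprint, site, page) records into per-browser histories."""
    histories = defaultdict(list)
    for fp, site, page in visits:
        histories[fp].append((site, page))
    return dict(histories)

# Hypothetical log entries an ad network might collect from member sites.
visits = [
    ("fp-1", "news.example", "/a"),
    ("fp-2", "shop.example", "/b"),
    ("fp-1", "shop.example", "/c"),
]

histories = histories_by_fingerprint(visits)
```

Here `histories["fp-1"]` lists visits on two different sites, even though no cookie was ever set and nobody logged in.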

Next post, I’ll talk about another way to track users without cookies and a way for a site to tell if you’ve visited other sites in the past. I’ll also tell you why you shouldn’t panic, though I admit a better writer would have told you that first.


Three Ways Sites Can Track Visitors Without Cookies
