Posts Tagged ‘Javascript’

Three Ways Sites Can Track Visitors Without Cookies, Part 2

Wednesday, February 10th, 2010

In part 1, I wrote about the EFF’s Panopticlick project and the implications for anonymity. I’ve got two more methods up my sleeve.

2. Use the cache.

Cookies aren’t the only thing your browser downloads and keeps around, and for good reason. Logos and other images with stable filenames don’t tend to change very often, so instead of re-downloading them each time you revisit a site, your browser caches them on disk. Other external files, like Javascript, can also be cached. This makes surfing the web a lot faster for everyone.

Any time someone is able to send you a file that sticks around, though, they’ve got a way to figure out if you’ve been there before. And as Josh Duck outlined in his blog post, Abusing the Cache: Tracking Users without Cookies, it’s not too tough to embed a tracking code in a cached file and follow a user’s sessions whether or not they clear their cookies.
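To make the idea concrete, here is a minimal sketch of one well-known variant, assumed here rather than taken from Duck’s post: the server gives each new visitor a small file whose `ETag` validator contains a unique ID, and the browser echoes that tag back in an `If-None-Match` header whenever it revalidates its cached copy. It’s written as a plain function instead of a real server so it runs anywhere; `handleTrackerRequest` and `mintId` are hypothetical names.

```javascript
// Minimal sketch of ETag-based cache tracking (a hypothetical handler, not
// code from Duck's post). `headers` is a plain object with lower-case keys,
// the way Node's http module exposes request headers.
function handleTrackerRequest(headers, mintId) {
  const tag = headers['if-none-match'];
  if (tag) {
    // The browser already has our file cached and is revalidating it:
    // the tag it sends back is the ID we assigned on the first visit.
    const visitorId = tag.replace(/^W\//, '').replace(/"/g, '');
    return { status: 304, headers: { ETag: tag }, body: '', visitorId, returning: true };
  }
  // First visit (or a cleared cache): assign a fresh ID and bake it into the ETag.
  const visitorId = mintId();
  return {
    status: 200,
    headers: {
      ETag: '"' + visitorId + '"',
      // no-cache = store it, but revalidate on every use, so the tag comes back.
      'Cache-Control': 'private, no-cache',
      'Content-Type': 'application/javascript'
    },
    body: '/* tracker */',
    visitorId,
    returning: false
  };
}

// Usage: simulate two visits from the same browser.
let counter = 0;
const mintId = () => 'visitor-' + (++counter);
const first = handleTrackerRequest({}, mintId);
const second = handleTrackerRequest({ 'if-none-match': first.headers.ETag }, mintId);
console.log(first.visitorId, second.visitorId, second.returning);
```

Clearing the cache throws away the stored tag, so the next request arrives without `If-None-Match` and gets a brand-new ID, which is why clearing the cache defeats it.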

This isn’t too terrible – users can always clear their cache, and this is generally most useful for tracking individual users’ visits to a single site. If you could convince enough site owners to add your widget to their sites, though, you might be able to follow users across all of those sites.

3. Check which links the user has visited.

This isn’t a new technique, at least by web standards; you can see examples from Jeremiah Grossman as early as 2006. CSS gives you the ability to set up custom styles for links: the default style, the style when the user hovers or clicks, and, most importantly for this hack, the style after the user has visited the link. Browsers styled visited links differently even back in the ancient days of the web, turning blue link text purple to help you navigate.

Any site can create a list of links to other sites and, with a bit of Javascript, tell whether you’ve visited those sites in the recent past. The list of links can be hidden from the user’s view, so they might not even notice what’s going on. Spyjax is one example implementation with source code.
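The classic check looked roughly like this: style `:visited` links with a distinctive color, add a link, and read its computed color back. This is a sketch, not Spyjax’s code; `sniffVisited` and the marker color are hypothetical, and `doc`/`win` are passed in so the logic can run against stand-in objects outside a browser. Browsers have since closed this hole: they now make `getComputedStyle` report the unvisited style for visited links, so this no longer works in a current browser.

```javascript
// Sketch of the CSS :visited history-sniffing trick as it worked circa 2010.
// Modern browsers report the unvisited style for :visited links, so this
// no longer leaks anything.
const VISITED_COLOR = 'rgb(255, 0, 0)'; // hypothetical marker color

function sniffVisited(urls, doc, win) {
  // Style visited probe links with a color we can recognize later.
  const style = doc.createElement('style');
  style.textContent = 'a.probe:visited { color: ' + VISITED_COLOR + '; }';
  doc.head.appendChild(style);

  const visited = [];
  for (const url of urls) {
    const a = doc.createElement('a');
    a.className = 'probe';
    a.href = url;
    doc.body.appendChild(a); // no link text, so there is nothing to see
    if (win.getComputedStyle(a).color === VISITED_COLOR) {
      visited.push(url);
    }
    doc.body.removeChild(a);
  }
  doc.head.removeChild(style);
  return visited;
}

// Usage outside a browser: fake document/window objects that pretend
// the user has been to facebook.com.
const fakeHistory = new Set(['http://www.facebook.com/']);
const fakeEl = () => ({ children: [], appendChild(c) { this.children.push(c); }, removeChild() {} });
const fakeDoc = { head: fakeEl(), body: fakeEl(), createElement: () => ({}) };
const fakeWin = {
  getComputedStyle: (a) => ({ color: fakeHistory.has(a.href) ? VISITED_COLOR : 'rgb(0, 0, 238)' })
};
console.log(sniffVisited(['http://www.facebook.com/', 'http://www.myspace.com/'], fakeDoc, fakeWin));
```

Every site you want to ask about needs its own probe link, which is exactly the limitation described next.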

This is limited, since you have to explicitly check for each potentially visited site. So you might be able to check whether they’ve been to Facebook, but not get a list of every social networking site they’ve ever been to. On the other hand, with browsers like Chrome and Firefox getting faster all the time, checking lots of links by brute force is becoming more practical. Users can always limit or clear their browsing history to make this technique less effective.

Should I panic yet?

Not quite, but it’s always a good idea to be on the lookout for things that undermine the assumptions of privacy and anonymity that people tend to have while surfing the web.

We’ve looked at clever ways to track a user from visit to visit and from site to site, and to find out which other sites they’ve visited. But each can be defeated, so if you want more anonymity you can still have it. To be honest, I worry more about malware stealing passwords, phishing sites tricking people into giving away bank account info, and companies that hold lots of sensitive info being hacked or ordered by the government to divulge it. None of those problems relies on new Javascript hacks or can be fixed by clearing the browser cache.

Found a new clever hack for tracking users? Got even more important privacy concerns that I missed? Please post in the comments below.

Three Ways Sites Can Track Visitors Without Cookies

Friday, January 29th, 2010

There’s an old joke about the Internet that’s important for two reasons. First the joke:

On the internet, nobody knows you're a dog

It’s important because it illustrates a key cultural and technological underpinning of the Internet: anonymity. The second reason it’s important is that it’s so old: it was printed in The New Yorker in 1993, which is basically Old Testament times in Internet years. So for nearly two decades, the web has let people browse without telling or proving who they are. Though many sites would love it if you created an account and logged in, the vast majority are perfectly happy to serve up pages to you without even knowing whether you’re a person or a dog.

But there are many reasons to want to track a user from page to page or from site to site, and there are various ways to do it. The most common way involves cookies. Web developers need a way to create user sessions or else things users like (shopping carts, preferences, the ability to update your profile picture) are impossible to implement.

Cookies are pretty well understood, and users can turn them off or clear them out if they really want to. Google Chrome, for example, has “Incognito Mode”, which lets you surf without saving cookies, history, etc. from session to session. Even with cookies off, though, maintaining a user session within a particular site by passing around a session id isn’t too hard. It’s trivial to do in PHP, for example.
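Passing a session id around without cookies usually means rewriting every internal link so it carries the id, which PHP can do for you automatically (its `session.use_trans_sid` setting). A rough sketch of the rewriting step in Javascript; the parameter name `sid`, the helper names, and the example host are all hypothetical.

```javascript
// Sketch: carry a session id in the query string instead of a cookie.
// The "sid" parameter name and the example host are hypothetical.
function withSessionId(href, sessionId, siteOrigin) {
  const url = new URL(href, siteOrigin); // resolve relative links
  if (url.origin !== siteOrigin) {
    return href; // never hand our session id to another site
  }
  url.searchParams.set('sid', sessionId);
  return url.toString();
}

// On the server side, read the id back out of the requested URL.
function readSessionId(requestUrl, siteOrigin) {
  return new URL(requestUrl, siteOrigin).searchParams.get('sid');
}

const site = 'https://shop.example';
const link = withSessionId('/cart?item=42', 'abc123', site);
console.log(link);                      // https://shop.example/cart?item=42&sid=abc123
console.log(readSessionId(link, site)); // abc123
console.log(withSessionId('https://other.example/', 'abc123', site)); // left unchanged
```

The catch is that an id in the URL leaks through copied links and `Referer` headers, which is one reason sites prefer cookies when they can get them.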

Most users are pretty comfortable with this state of affairs – Facebook knows who I am because I logged in, but I trust them. Amazon knows who I am but that’s cool because I’m shopping. Some other site doesn’t know who I am, but it knows that I’m the same person who clicked on the widget to change the language a couple minutes ago.

People start getting uncomfortable when you start tracking them across sites. People become even more uncomfortable when they no longer have control over their anonymity. Three recent techniques violate both of those comfort zones in limited ways.

1. The EFF’s Panopticlick project.

Follow the link above and click the “test me” button. Is your browser silently betraying you? This is a very clever hack based on the fact that browsers almost always send some information to web servers in HTTP headers (the user agent, what types of content the browser is willing to accept, etc.). People have been misusing user agent headers to try to get Javascript working in multiple browsers for years. Panopticlick also checks for available plugins and fonts. Add all this data up and there’s enough variability from one browser to the next that you can apparently identify individual browsers reliably. The EFF has a great post on the information theory behind the project.
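Roughly, the idea is to glue together every attribute that varies between browsers and hash the result, then measure how identifying each trait is in bits: a trait shared by one browser in N carries log2(N) bits. A sketch with made-up attribute values; `fingerprint` and `bitsOfIdentity` are hypothetical helpers, not Panopticlick’s code, and the FNV-1a hash is my choice to keep the example dependency-free.

```javascript
// Sketch only: a hypothetical fingerprint() helper, not Panopticlick's code.
// FNV-1a (32-bit) keeps the example dependency-free; a real tracker would
// likely use a longer hash to avoid collisions.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

function fingerprint(attrs) {
  // Sort the keys so the same attributes always give the same string.
  const canonical = Object.keys(attrs).sort()
    .map((k) => k + '=' + attrs[k])
    .join('\n');
  return fnv1a(canonical);
}

// Surprisal: a trait shared by 1 in `oneIn` browsers is worth log2(oneIn) bits.
function bitsOfIdentity(oneIn) {
  return Math.log2(oneIn);
}

// Made-up attribute values.
const browser = {
  userAgent: 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) Firefox/3.6',
  accept: 'text/html,application/xhtml+xml',
  plugins: 'Flash 10.0; QuickTime 7.6',
  fonts: 'Arial, Georgia, Verdana',
  screen: '1280x800x24'
};
console.log(fingerprint(browser));
// Independent traits multiply probabilities, so their bits add:
// 1 in 16 and 1 in 64 together single out 1 browser in 1024 (10 bits).
console.log(bitsOfIdentity(16) + bitsOfIdentity(64), bitsOfIdentity(1024));
```

Since 2^33 is about 8.6 billion, roughly 33 bits is enough to tell apart everyone on Earth, which is why a handful of ordinary-looking traits can add up to a unique ID.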

This doesn’t mean sites will know who you are, but they could use this information to know that you visited web pages A, B and C whether or not you want them to. An ad network could use this info to track you across many sites. An unscrupulous site could sell this info, giving your browsing history away for cash, and if you log into a site that has personally identifying info about you (email, shipping addresses, etc.), the history could potentially be tied back to a person.
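Once every browser has a fingerprint, linking visits across sites is just a matter of grouping. A sketch with an invented log format and invented site names; `historiesByFingerprint` is a hypothetical helper.

```javascript
// Hypothetical ad-network log: each record is one page view reported by an
// embedded ad, keyed by the browser fingerprint instead of a cookie.
const adLog = [
  { fingerprint: 'a3f9', site: 'news.example' },
  { fingerprint: '77c1', site: 'news.example' },
  { fingerprint: 'a3f9', site: 'shoes.example' },
  { fingerprint: 'a3f9', site: 'clinic.example' }
];

// Group page views into one browsing history per fingerprint.
function historiesByFingerprint(records) {
  const histories = new Map();
  for (const { fingerprint, site } of records) {
    if (!histories.has(fingerprint)) histories.set(fingerprint, new Set());
    histories.get(fingerprint).add(site);
  }
  return histories;
}

const histories = historiesByFingerprint(adLog);
console.log([...histories.get('a3f9')]); // news, shoes and clinic, all for one browser
```

If that same fingerprint ever turns up alongside a login on one of those sites, the whole set can be tied to a name.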

In my next post, I’ll talk about another way to track users without cookies and a way for a site to tell if you’ve visited other sites in the past. I’ll also tell you why you shouldn’t panic, though I admit a better writer would have told you that first.

TinyUrl Trouble: Greasemonkey drops the location header in GM_xmlhttpRequest

Thursday, May 21st, 2009

I get a lot of ideas. Most of them wander aimlessly in my head until they become obsolete, but once in a while I’ll get an idea that seems useful and simple enough to do in my free time.

If you’ve used Twitter, you’ve seen the myriad URL shortening services like TinyUrl and Bit.ly. URL shortening services are a kludge, and they break one useful, built-in feature of the web: the ability to know where you’re going when you click a link.

So I thought, this is something I could fix in an hour or so with a Greasemonkey script. If you have no idea what I’m talking about, Greasemonkey is a Firefox add-on that lets you run your own Javascript on the pages you load. Greasemonkey comes with a handy-dandy AJAX function called GM_xmlhttpRequest.

I figured all I had to do was grab all the anchors on the page, see if they matched a list of shortener URLs, do an xmlhttpRequest for each one, and grab the final location (after the service finishes its redirecting) from the headers.

Something along these lines:

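// Fetch a shortened URL and log the response, hoping to find the
// Location header the shortening service sends back with its redirect.
function getTargetUrl(short_url) {

  GM_log('Getting ' + short_url);

  GM_xmlhttpRequest({
      method: 'GET',
      url: short_url,
      headers: {
          'User-agent': 'Mozilla/4.0 (compatible) Greasemonkey',
          'Accept': 'text/html'
      },
      // Called once the request completes: log the status and raw headers.
      // (The trouble in the title: no Location header shows up here.)
      onload: function(responseDetails) {
          GM_log('Done.  Status ' + responseDetails.status +
                ' Text ' + responseDetails.statusText + '\n\n' +
                ' Headers:\n' + responseDetails.responseHeaders);
      }
  });
}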
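The first step of the plan, picking the shortened links out of the page, might look like the sketch below. The host list is a hypothetical, incomplete sample, and the filtering is a plain function over href strings so it can be tested outside Firefox. It uses the `URL` class, which the Firefox of 2009 did not have; in a user script you would feed it the `href` of every element in `document.links` and pass the results to `getTargetUrl`.

```javascript
// Hypothetical, incomplete list of URL-shortener hosts.
const SHORTENER_HOSTS = ['tinyurl.com', 'bit.ly', 'is.gd'];

function isShortUrl(href) {
  let host;
  try {
    host = new URL(href).hostname.toLowerCase();
  } catch (e) {
    return false; // relative or malformed href
  }
  // Match the host itself or any subdomain, e.g. www.tinyurl.com.
  return SHORTENER_HOSTS.some((h) => host === h || host.endsWith('.' + h));
}

function shortLinks(hrefs) {
  // De-duplicate so each short URL is only resolved once.
  return [...new Set(hrefs.filter(isShortUrl))];
}

// In the user script:
//   shortLinks(Array.from(document.links, (a) => a.href)).forEach(getTargetUrl);
console.log(shortLinks([
  'http://tinyurl.com/abc',
  'http://example.com/page',
  'http://bit.ly/xyz',
  'http://tinyurl.com/abc'
])); // the tinyurl.com and bit.ly links, once each
```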

How to link to an individual question in Google Moderator

Saturday, March 28th, 2009

The Obama administration just finished “Open for Questions”, where the President answered questions suggested and voted on by the general public over the web. This is pretty cool – political openness, interaction, and democracy via the web. It’s also interesting to me because the site uses Google Moderator, a product we use at work all the time.

What’s not quite so cool is that Moderator apparently doesn’t play well with the rest of the web. I’m not sure why it was designed this way (and if I did know, I probably couldn’t tell you anyway). The design is the exact opposite of unobtrusive Javascript. That’s fine for highly interactive web apps, but it would be nice to see the mostly text content in Moderator made searchable and linkable just like any other collection of web pages.

An interesting use of Greasemonkey – Troubleshooting other people’s sites

Wednesday, June 11th, 2008

I’ve played around with Firefox’s Greasemonkey add-on here and there but never really delved into it until recently. I found most of the common uses for it to be either too specific to someone else’s habits or already covered by other extensions. For example, there are probably a million ad-blocking scripts out there, but I already have Adblock.

I’ve grown to appreciate Greasemonkey a lot more since I learned that you can make AJAX calls in scripts – now we can do some real damage. But this post isn’t about that; it’s about a totally different use case that I hadn’t thought of before.

If you’re a web developer with any friends or family, you’ve probably heard this one before:

“Something’s wrong with my web site, can you take a look?”

Often, though, you won’t have access to a dev server, database, or even a copy of the server-side code.  All you can see is the HTML and Javascript source and the HTTP transactions going back and forth.

Greasemonkey can’t rewrite PHP code on someone else’s server but it does make it really, really easy for you to alter forms, delete and change cookie values, and patch and debug Javascript on the site you’re looking at, without changing any other variables.
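Two small helpers of the kind that come in handy here, as a sketch: the names are hypothetical and neither is a Greasemonkey API. One turns a `document.cookie` string into assignments that expire each cookie; the other wraps a page function so every call is logged before the original runs.

```javascript
// Build assignments that delete every cookie in a document.cookie string by
// giving each one an expiry date in the past. Only cookies whose path and
// domain match these assignments are removed; "path=/" covers the common case.
function cookieDeletions(cookieString) {
  return cookieString
    .split(';')
    .map((pair) => pair.trim().split('=')[0])
    .filter((name) => name.length > 0)
    .map((name) => name + '=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/');
}

// Wrap obj[name] so each call is logged with its arguments and return value.
function logCalls(obj, name, log) {
  const original = obj[name];
  obj[name] = function (...args) {
    log(name + '(' + args.map((a) => JSON.stringify(a)).join(', ') + ')');
    const result = original.apply(this, args);
    log(name + ' returned ' + JSON.stringify(result));
    return result;
  };
}

// In a user script you might run (validateForm is a hypothetical page function):
//   cookieDeletions(document.cookie).forEach((c) => { document.cookie = c; });
//   logCalls(unsafeWindow, 'validateForm', GM_log);

// Usage outside a browser:
const page = { add(a, b) { return a + b; } };
logCalls(page, 'add', console.log);
page.add(2, 3); // logs "add(2, 3)" then "add returned 5"
console.log(cookieDeletions('session=abc; theme=dark'));
```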

This can be really, really useful in some situations.  So now it’s officially added to my volunteer/web-developer/brother-in-law toolbelt.

A Scary, but Fascinating Idea – Javascript and CSS hack to see where your users have been

Friday, May 30th, 2008

I just ran across this post on Aza Raskin’s blog about a technique for cutting down the number of social bookmarking links displayed to users. I’m sure you’ve seen them – the 20 or so colorful buttons that have popped up at the bottom of every blog post on the web, for Digg, Del.icio.us and similar sites. On my blog they are hidden behind the ShareThis widget, but Raskin had a better idea – why not just display the ones each user actually uses?

Impossible? Not so fast – think about what happens when you visit a site. After your visit, any links to the site change color, usually from blue to purple. We can put up links to each social bookmarking site and then use Javascript and CSS to check whether each link has been visited. If so, display the button; if not, hide it.
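The display logic on top of such a check is simple. In this sketch `isVisited` is passed in (in 2008 it would have been the `:visited` color check, which current browsers no longer allow), the button list is a made-up sample, and the fallback rule is an assumption, not part of Raskin’s description.

```javascript
// Sketch of the button-filtering idea: show only the bookmarking buttons for
// services the visitor appears to use. The button list is a made-up sample.
const BUTTONS = [
  { name: 'Digg', url: 'http://digg.com/' },
  { name: 'Del.icio.us', url: 'http://del.icio.us/' },
  { name: 'Reddit', url: 'http://www.reddit.com/' }
];

function buttonsToShow(buttons, isVisited, fallback) {
  const used = buttons.filter((b) => isVisited(b.url));
  // Assumed fallback: if nothing was detected, show the first few buttons
  // rather than none at all.
  return used.length > 0 ? used : buttons.slice(0, fallback);
}

const seen = new Set(['http://del.icio.us/']);
console.log(buttonsToShow(BUTTONS, (u) => seen.has(u), 2).map((b) => b.name)); // [ 'Del.icio.us' ]
console.log(buttonsToShow(BUTTONS, () => false, 2).map((b) => b.name));        // [ 'Digg', 'Del.icio.us' ]
```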

This is a very cool way to manage buttons but the technique has wider privacy implications.  I could, for example, put links to…  questionable sites, and then use some Ajax to collect that information about users.  If I had other information about you (say you logged into my site or otherwise gave me an email address) I could link it together and build a database.

On the other hand, it’s not like I can grab your entire browsing history or follow you around after you leave my site – I have to specifically create a link and check it for every site I want to know about.  And unlike your browser history this info is cleared every time you close your browser.  So it’s not spyware or anything as intrusive as, say, the Alexa toolbar.

I can think of a bunch of cool ways to apply this technique, but I’m not sharing until I implement one.  Feel free to post any ideas (or misgivings) in the comments below.