Monthly Archives: February 2010

How my site disappeared from Google search

Seen my personal blog lately? Probably not, if you were searching via Google. Major sections of my site have been disappearing from the search index over the past three weeks. My homepage, my blog and many of the most recent articles on it no longer show up in result pages. I’m no Matt Cutts, but I get a fair number of people coming to my site when searching for info about Google search, avoiding scams, and how to name their baby. All that traffic has been slipping away.

You can probably imagine how you would feel if this were happening to you. Does Google hate me? Was my site hacked? What do I do, and how much will it cost to get this fixed?

I will answer all of those questions, starting with the first:

My site is falling out of the index, does Google hate me?

Probably not. My situation is actually a useful illustration: I’m pretty sure Google doesn’t hate me and isn’t unfairly slapping my site down because, well, I work at Google.

That’s right, Google was kicking pages from one of its own employees out of search results. I’m sure I’m not the first. Google doesn’t treat my site any differently than anyone else’s. BTW, standard disclaimers apply to this post.

So I knew there was probably a logical reason for the dropped pages, which brings me to the next question:

Continue reading

Three Ways Sites Can Track Visitors Without Cookies, Part 2

In part 1, I wrote about the EFF’s Panopticlick project and the implications for anonymity. I’ve got two more methods up my sleeve.

2. Use the cache.

Cookies aren’t the only thing your browser downloads and keeps around, and for good reason. Logos and other images with stable filenames don’t tend to change very often, so instead of re-downloading them each time you revisit a site, your browser caches them on disk. Other external files like JavaScript can also be cached. This makes surfing the web a lot faster for everyone.

Any time someone is able to send you a file that sticks around, though, they’ve got a way to figure out if you’ve been there before. And as Josh Duck outlined in his blog post, Abusing the Cache: Tracking Users without Cookies, it’s not too tough to embed a tracking code in a cached file and follow your users’ sessions whether or not they clear their cookies.

This isn’t too terrible: users can always clear their cache, and this is generally most useful for tracking an individual user’s visits to a single site. If you could convince enough site owners to add your widget to their site, though, you might be able to get more interesting data.
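To make the idea concrete, here is a minimal sketch of one way a server could do this, using the ETag validator that browsers store alongside a cached file. This is not necessarily the approach in Josh Duck's post. The request and response shapes, the function name serveTrackerScript, the visitor-N IDs and the window.trackerVisitorId variable are all made up for illustration, and a real tracker would use random IDs rather than a counter.

```typescript
// Hypothetical request/response shapes, not tied to any real framework.
interface TrackRequest {
  headers: Record<string, string | undefined>;
}

interface TrackResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
  visitorId: string;  // what the server would log for this hit
  returning: boolean; // true when the browser already had the file cached
}

let nextId = 1; // illustration only; a real tracker would use a random ID

function serveTrackerScript(req: TrackRequest): TrackResponse {
  // A browser that has the file cached sends its stored ETag back in
  // If-None-Match when it revalidates. Cookies are not involved.
  const etag = req.headers["if-none-match"];

  if (etag !== undefined) {
    // Returning visitor: the ETag itself carries the ID.
    const visitorId = etag.replace(/"/g, "");
    return {
      status: 304, // "Not Modified": the browser keeps its cached copy
      headers: { ETag: etag, "Cache-Control": "private, no-cache" },
      body: "",
      visitorId,
      returning: true,
    };
  }

  // First visit (or cleared cache): mint an ID and hide it in the ETag.
  const visitorId = `visitor-${nextId++}`;
  return {
    status: 200,
    headers: {
      ETag: `"${visitorId}"`,
      // "no-cache" lets the browser store the file but makes it
      // revalidate on every use, so the server sees each visit.
      "Cache-Control": "private, no-cache",
      "Content-Type": "application/javascript",
    },
    body: `window.trackerVisitorId = ${JSON.stringify(visitorId)};`,
    visitorId,
    returning: false,
  };
}
```

Clearing cookies changes nothing here, because the ID lives in the cache entry. Clearing the cache removes the stored ETag, and the next request is treated as a brand new visitor.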

3. Check which links the user has visited.

This isn’t a new technique, at least by web standards: you can see examples as early as 2006 by Jeremiah Grossman. CSS gives you the ability to set up custom styles for links – the default style, the style when the user hovers or clicks, and, most importantly for this hack, the style after the user has visited the link. Browsers styled visited links differently even back in the ancient days of the web, turning blue link text to purple to help you navigate.

Any site can create a list of links to other sites and, with a bit of JavaScript, tell if you’ve visited those sites in the recent past. The list of links can be hidden from the user’s view, so they might not even notice what’s going on. Spyjax is one example implementation with source code.

This is limited, since you have to explicitly check for each potentially visited site. So you might be able to check to see if they’ve been to Facebook, but not get the list of every social networking site they’ve ever been to. On the other hand, with browsers like Chrome and Firefox getting faster all the time, checking lots of links by brute force is becoming more feasible. Users can always limit or clear their browsing history to make this technique less effective.
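The brute-force check described above can be sketched roughly as follows. This is not Spyjax's actual code. The PageLike interface, the findVisited function and the red "visited" color are assumptions made for illustration, and the DOM is hidden behind a small interface so the sketch type-checks outside a browser. It also assumes the page's stylesheet contains a rule like a:visited { color: rgb(255, 0, 0); }. Browsers have since tightened this up and deliberately report the unvisited style to scripts, so treat it as a picture of how the trick worked, not something that works today.

```typescript
// Minimal stand-ins for the DOM pieces the trick relies on (assumed shapes).
interface AnchorLike {
  href: string;
}

interface PageLike {
  createAnchor(): AnchorLike;            // e.g. document.createElement("a")
  attach(a: AnchorLike): void;           // e.g. append to a hidden container
  detach(a: AnchorLike): void;           // e.g. remove it again
  computedColor(a: AnchorLike): string;  // e.g. getComputedStyle(a).color
}

// Assumes the page's CSS styles visited links in this exact color.
const VISITED_COLOR = "rgb(255, 0, 0)";

// Returns the subset of candidate URLs that the browser styles as visited.
function findVisited(candidates: string[], page: PageLike): string[] {
  const visited: string[] = [];
  for (const url of candidates) {
    // Each candidate must be probed one by one; there is no way to ask
    // for the whole history at once.
    const a = page.createAnchor();
    a.href = url;
    page.attach(a);
    if (page.computedColor(a) === VISITED_COLOR) {
      visited.push(url);
    }
    page.detach(a);
  }
  return visited;
}
```

The cost grows with the length of the candidate list, which is why faster browsers made longer lists practical.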

Should I panic yet?

Not quite, but it’s always a good idea to be on the lookout for things that undermine the assumptions of privacy and anonymity that people tend to have while surfing the web.

We’ve looked at clever ways to track a user from visit to visit, from site to site, and to get information about other sites they’ve visited. But each can be defeated, so if you want more anonymity you can still have it. To be honest, I worry more about malware stealing passwords, phishing sites tricking people into giving away bank account info, and companies that hold lots of sensitive info being hacked or ordered to divulge it by a government. None of those problems rely on new JavaScript hacks or can be fixed by clearing the browser cache.

Found a new clever hack for tracking users? Got even more important privacy concerns that I missed? Please post in the comments below.