Archive for the ‘Blog’ Category


Three Ways Sites Can Track Visitors Without Cookies, Part 2

Wednesday, February 10th, 2010

In part 1, I wrote about the EFF’s Panopticlick project and the implications for anonymity. I’ve got two more methods up my sleeve.

2. Use the cache.

Cookies aren’t the only thing your browser downloads and keeps around, and for good reason. Logos and other images with stable filenames don’t tend to change very often, so instead of re-downloading them each time you revisit a site your browser caches them on disk. Other external files like Javascript can also be cached. This makes surfing the web a lot faster for everyone.

Any time someone is able to send you a file that sticks around, though, they’ve got a way to figure out if you’ve been there before. And as Josh Duck outlined in his blog post, Abusing the Cache: Tracking Users without Cookies, it’s not too tough to embed a tracking code to track your user sessions whether or not you clear your cookies.

This isn’t too terrible – users can always clear their cache, and this is generally most useful for tracking individual users’ visits to a single site. If you could convince enough site owners to add your widget to their site, though, you might be able to get more interesting data.
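Here’s a rough sketch of one way cache-based tracking could work. It uses the HTTP ETag header as the carrier for an identifier, which is an assumption on my part and may not be the mechanism described in Josh Duck’s post. The function name and the plain-dict request format are hypothetical, and a real server would also need to handle header-name casing.

```python
import uuid


def handle_tracking_request(request_headers):
    """Sketch of a handler for a tiny cacheable tracking resource.

    request_headers is a plain dict of header name -> value (an assumption
    for illustration; real frameworks expose headers differently).
    Returns (status_code, response_headers, visitor_id).
    """
    # When a browser revalidates a cached file, it echoes the stored ETag
    # back to the server in the If-None-Match header.
    visitor_id = request_headers.get("If-None-Match", "").strip('"')
    if visitor_id:
        # Returning visitor: say "not modified" so the browser keeps its
        # cached copy, and with it the same identifier.
        return 304, {"ETag": f'"{visitor_id}"'}, visitor_id

    # First visit, or the cache was cleared: mint a fresh identifier and
    # hand it out as the ETag of the response.
    visitor_id = uuid.uuid4().hex
    response_headers = {
        "ETag": f'"{visitor_id}"',
        # Ask the browser to store the file but check back on every use.
        "Cache-Control": "private, no-cache",
    }
    return 200, response_headers, visitor_id
```

Clearing cookies does nothing here, because the identifier lives in the cache. Clearing the cache drops the stored ETag, and the next request looks like a brand new visitor.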

3. Check which links the user has visited.

This isn’t a new technique, at least by web standards: you can see examples as early as 2006 by Jeremiah Grossman. CSS gives you the ability to set up custom styles for links – the default style, the style when the user hovers or clicks, and most importantly for this hack the style after the user has visited the link. Browsers styled visited links differently even back in the ancient days of the web, turning blue link text to purple to help you navigate.

Any site can create a list of links to other sites and, with a bit of Javascript, tell if you’ve visited those sites in the recent past. The list of links can be hidden from the user’s view, so they might not even notice what’s going on. Spyjax is one example implementation with source code.

This is limited since you have to explicitly check for each potentially-visited site. So you might be able to check to see if they’ve been to Facebook, but not get the list of every social networking site they’ve ever been to. On the other hand, with browsers like Chrome and Firefox getting faster all the time, checking lots of links by brute force is becoming more feasible. Users can always limit or clear their browsing history to make this technique less effective.

Should I panic yet?

Not quite, but it’s always a good idea to be on the lookout for things that undermine the assumptions of privacy and anonymity that people tend to have while surfing the web.

We’ve looked at clever ways to track a user from visit to visit, from site to site, and to get information about other sites they’ve visited. But each can be defeated, so if you want more anonymity you can still have it. To be honest I worry more about malware stealing passwords, phishing sites tricking people into giving away bank account info, and companies that have lots of sensitive info being hacked or ordered to divulge info by government. None of those problems rely on new Javascript hacks or can be fixed by clearing the browser cache.

Found a new clever hack for tracking users? Got even more important privacy concerns that I missed? Please post in the comments below.

Three Ways Sites Can Track Visitors Without Cookies

Friday, January 29th, 2010

There’s an old joke about the Internet that’s important for two reasons. First the joke:

On the internet, nobody knows you're a dog

It’s important because it illustrates a key cultural and technological underpinning of the Internet: anonymity. The second reason it’s important is that it’s so old, printed in the New Yorker in 1993, which is basically Old Testament times in Internet years. So for decades, the web has allowed people to browse without telling or proving who they are. Though many sites would love it if you created an account and logged in, the vast majority are perfectly happy to serve up pages to you without even knowing if you’re a person or a dog.

But there are many reasons to want to track a user from page to page or from site to site, and there are various ways to do it. The most common way involves cookies. Web developers need a way to create user sessions or else things users like (shopping carts, preferences, the ability to update your profile picture) are impossible to implement.

Cookies are pretty well understood, and users can turn them off or clear them out if they really want. Google Chrome, for example, has “Incognito Mode” which allows you to surf without saving cookies, history, etc. from session to session. Even with cookies off, though, maintaining a user session within a particular site by passing around a session id isn’t too hard. It’s trivial to do in PHP for example.
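To make the “passing around a session id” idea concrete, here is a minimal sketch in Python rather than PHP. It assumes the site rewrites every internal link to carry the id in a query parameter. The parameter name `sid` and both function names are made up for illustration.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def new_session_id():
    """Generate a hard-to-guess session id (32 hex characters)."""
    return secrets.token_hex(16)


def add_session_id(url, session_id, param="sid"):
    """Return url with the session id attached as a query parameter.

    Any existing value for the same parameter is replaced, so links that
    already carry an id are not tagged twice.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `add_session_id("/cart?item=42", "abc123")` gives `/cart?item=42&sid=abc123`. As long as every link on the page is rewritten this way, the server can recognize the same visitor from click to click without setting a cookie.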

Most users are pretty comfortable with this state of affairs – Facebook knows who I am because I logged in, but I trust them. Amazon knows who I am but that’s cool because I’m shopping. Some other site doesn’t know who I am, but it knows that I’m the same person who clicked on the widget to change the language a couple minutes ago.

People start getting uncomfortable when you start tracking them across sites. People become even more uncomfortable when they no longer have control over their anonymity. Three recent techniques violate both of those comfort zones in limited ways.

1. The EFF’s Panopticlick project.

Follow the link above and click the “test me” button. Is your browser silently betraying you? This is a very clever hack based on the fact that browsers almost always send some information to web servers in HTTP headers (the user agent, what type of content the browser is willing to accept, etc.). People have been misusing user agent headers to try to get Javascript working in multiple browsers for years. Panopticlick also checks for available plugins and fonts. Adding all this data up, there’s enough variability from one browser to the next that you can apparently reliably identify individuals. The EFF has a great post on the information theory behind the project.

This doesn’t mean sites will know who you are, but they could use this information to know that you visited web pages A, B and C whether or not you want them to. An ad network could use this info to track you across many sites. An unscrupulous site could sell this info, giving your browsing history away for cash, and if you log into a site that has personally-identifying info about you (email, shipping addresses, etc.) the history could potentially be tied back to a person.
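The fingerprinting idea can be sketched in a few lines. This is an illustration of the general approach, not Panopticlick’s actual code: the attribute names are hypothetical, and the hashing scheme is just one reasonable choice. The second function is the standard information-theory measure of how identifying a single value is.

```python
import hashlib
import math


def fingerprint(attributes):
    """Combine browser attributes into one stable identifier.

    attributes is a dict such as {"user_agent": ..., "fonts": ...}; the
    keys are placeholders, not the fields Panopticlick really collects.
    """
    # Sort the keys so the same attributes always produce the same hash,
    # regardless of the order they were gathered in.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def surprisal_bits(fraction_of_browsers):
    """Bits of identifying information carried by one attribute value.

    fraction_of_browsers is the share of browsers that report the same
    value, between 0 (exclusive) and 1.
    """
    return -math.log2(fraction_of_browsers)
```

A value shared by 1 in 1,024 browsers carries 10 bits, since `surprisal_bits(1 / 1024)` is 10. If the attributes are independent, the bits add up, which is why a handful of individually boring details can single out one browser.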

Next post, I’ll talk about another way to track users without cookies and a way for a site to tell if you’ve visited other sites in the past. I’ll also tell you why you shouldn’t panic, though I admit a better writer would have told you that first.

Important post on the Google blog about Google’s future in China

Wednesday, January 13th, 2010

If you haven’t heard, there’s big news on the Google Blog about malicious attacks and Google’s future in China. Please take a minute to read the post.

I wanted to add three things:

  1. I work for Google fighting abuse, but I’m not involved in this so I can’t tell you anything more than what you see on the blog. If I were involved, then I definitely couldn’t tell you more. Standard disclaimers apply.
  2. I am very proud to work for a company with such a commitment to openness and free speech.
  3. I’ve worked with some folks from our Beijing office, and in my experience they are smart, capable people committed to serving users and helping people get the information they are searching for. I hope everything works out for them.

 

How to get Google search results for academic research

Tuesday, January 12th, 2010

A few years ago, before I was a Googler, I was a grad student doing research on information retrieval. I wanted to compare the results of Google and other search engines with folksonomies from social bookmarking sites. It sounds pretty simple – Google does lots of internal search quality studies, so it’s not too surprising that outside researchers would want to execute lots of queries and use the results in their data.

The way I did it was… not optimal, to say the least. I wrote a bunch of PHP code, spaced out participant sessions, etc. to make sure I could get results back. Google tries to make sure that spammers aren’t scraping search results to generate webspam, so any kind of scraping with cURL, Beautiful Soup, etc. can result in a big pile of failure.

The way I did it wasn’t the right way or the easy way, so when I got the job I made a mental note to ask around for the best way to get search results. Then I forgot all about it until an email exchange with Gary Warner of CyberCrime & Doing Time fame.

It turns out Google has a great University research program and API. You have to apply for registration and let us know who you are, what school you’re affiliated with, and what you plan to study. Assuming everything checks out you’ll get access to a pretty nice API. There’s some example Python code but you could just as easily use PHP, Java, or whatever to consume the XML responses.

And that research I was doing? I recently noticed that my paper has been cited 7 or 8 times, according to Google Scholar. I used to joke that I had written the least influential paper in the history of academic publishing, but I guess I can’t claim the title anymore. Scopus only shows 4 citations so I will remain humble anyway.

Five Reasons To Get A Nexus One

Thursday, January 7th, 2010

I’ve had a Nexus One for a few weeks and I can finally talk about it. It’s really nice – I’ve had a Palm Treo, an iPhone, and a G1 and this is definitely the best mobile device I’ve ever owned.

If you’re like me you’re probably tired of hearing about how every new phone is or is not an “iPhone killer.” To be honest, I really like the iPhone — I used to have one and my wife has one now. I’m not on the Android team, but I doubt they’re trying to “kill” any other devices – most Googlers like any mobile device with a full-fledged web browser.

That said, if you’re wondering which phone to buy, I think the Nexus One has the edge. Here are five reasons why:

1. The screen is really, really nice. This might sound a bit superficial, but the truth is I spend much more time surfing the web and reading than I do actually making phone calls. In my experience the higher the resolution, the less eyestrain. I also often use my phone to show people photos, and the Nexus One screen really does the photos justice.

I remember getting my iPhone and being amazed by the 480 x 320 pixel screen at 163 ppi. The Nexus One has a slightly larger screen, but much higher resolution, 800 x 480.

2. Voice input is awesome. Every time you need to type something, whether it’s an email, text message, or blog post, you always have the option of saying it. Today I texted my wife to let her know I was running late as I walked down the stairs from work – no need to look at the phone or spend time tapping out the message. That’s a pretty trivial example, but I find myself using it more and more in lots of situations just like that.

3. The video is actually adequate. This is the first mobile phone I’ve seen that produces video that’s good enough to share with others. Here’s an example, and note that the lighting wasn’t exactly optimal:

We have a Canon video camera that we almost never use because it’s yet another device to lug around and getting videos off of it is a huge pain. I always have my Nexus One on me, and I can upload the videos directly to YouTube right after I take them. This means I’m getting a ton more video of my 1-year-old daughter and sharing it with family all around the country.

4. The photo gallery is nice, with great Picasa integration. I mentioned that showing off photos is a big use case for me, and the photo gallery is easy to navigate, fast, and looks cool too. It’s pushing me to use Picasa more even though I still prefer Flickr.

5. Multitasking is more useful than I thought. When the iPhone came out, I dismissed a lot of the criticism that it didn’t allow multitasking. How many different things do you expect to do at the same time on a small device? But as time went on, little task-switching annoyances started to add up.

I won’t run through all the possibilities, but my friend Wysz has a pretty good demonstration – he was able to get turn-by-turn GPS directions while listening to MP3s and streaming live video to the web. All on one phone. That’s pretty amazing.

Anything else? The Android Market is really starting to fill with cool apps, though it’s not quite as extensive as the iTunes App Store. I expect that to change as more people get Android phones. I wish I could write more about how developer-friendly the Android OS is, but I’m a bit ashamed to admit I haven’t made time to write a single line of code.

So, if you’re looking for a new phone, I completely recommend the Nexus One. If you really prefer a physical keyboard, take a look at the Droid, which has comparable specs to the Nexus One in a lot of ways. And honestly iPhones are still pretty cool, too, and I wouldn’t mind playing around with the Palm Pre for a bit. This is the great thing about competition – right now we have a bunch of great mobile devices and mobile operating systems to choose from, and each is pushing the others to do better. If only we could say the same thing about the carriers.

If you have any Nexus One or Android questions, feel free to ask in the comments below.

More news about Google money scams and work from home scams

Wednesday, December 9th, 2009

You may have seen advertisements, web sites, or emails saying that Google is hiring – paying people thousands of dollars to work from home. It’s not true, and the companies behind these ads are using Google’s name without permission.

We’ve been trying to get word out, and we’ve got a new post on the Official Google Blog about how the company is fighting fraud by taking “Google Money” scammers to court.

The story has really been taking off. Here’s a story from CBS News (I’m featured near the end):

Another story from a local NBC affiliate in Utah goes into a little more detail on one of the companies named in the suit:

Video Courtesy of KSL.com

And here’s a story from KQED radio news – this is pretty exciting for me, since I listen to NPR on my way to and from work every day:

In general, if it looks too good to be true, it probably is. If you see an offer promising thousands of dollars for very little effort or investment, be skeptical.

MSNBC Article About Scams Promising Money From Google

Thursday, September 24th, 2009

I did another interview a couple weeks ago about all the scams and get-rich-quick schemes using Google’s name and logo to fool people. Here’s the article at MSNBC; take a look.

Here’s a quote that sums up the scammers pretty well:

“They prey on people who are desperate,” says Ohio truck driver Robert Anderson, who fell for a home-based job opportunity that appeared to be from Google. “They make money by lying to people, promising them the world and giving them a guarantee they have no intention of honoring.”

I don’t have much more to add except that this sort of thing is showing up everywhere – email spam, spam accounts on Twitter, various ad networks, etc. Much of it is automated but the profit margins are high enough that they can afford to have actual people creating accounts, solving captchas, etc. so any company getting spammed with these schemes has to take a proactive approach.

As Shipwreck so presciently explained in 1985, the other half of the battle is knowing. Please feel free to send this article, or the one from ABC, or the Google Blog post around to any friends and family wondering about making money online. The truth is, many people make money with their websites, but it takes hard work. There’s no secret kit that returns cash with no effort and no investment.

Monopoly City Streets – Gaming on Google Maps

Sunday, September 13th, 2009

I’m a bit of a map geek, and in my youth I spent many hours playing Sim City, so it’s no surprise I was excited to play Monopoly City Streets. The game is a heavily-modified version of the classic board game, played out across the map of the world. You compete with players from all over the planet to buy streets, build properties, and amass as much cash as possible.


The first thing I noticed was that the map display uses Google Maps, but the site actually uses OpenStreetMap data for gameplay. Actually, the first thing I noticed was that their servers were being absolutely crushed by all the people rushing in to play the game, but I digress. OpenStreetMap is a really cool project to build mapping data using the same model as Wikipedia – interested volunteers add and verify data and everything is covered by a Creative Commons License.


Talking about Google work from home scams on Good Morning America

Tuesday, September 1st, 2009

I was on Good Morning America this morning, interviewed for a story about the many online work-at-home scams that are floating around the web these days. Here’s a link to the story on the web site, and here’s a direct link to the video. Edit: I guess GMA’s urls don’t really work, the actual video at the link keeps changing. I even used their “link to this” button! I’ll try to find a more permanent url. I would embed the video for you here but unfortunately ABCNews doesn’t seem to allow embedding.

This is a very important story – these scams employ a number of social engineering techniques to seem trustworthy and they are tricking people who are desperate for work out of their money. The folks at GMA did a good job explaining the issue without getting too technical and confusing – please forward this on to anyone you know who’s wondering about schemes like these.


Twelve dollars for five words? What is the Associated Press thinking?

Monday, August 3rd, 2009

I saw this on Reddit and had to comment. Follow this link. From now on if you want to quote an AP story in your blog, link to an AP headline, or email an article to your grandmother, this is the page they want you to see – including a price list that I’ll quote here:

Words         Fees
5 – 25        $12.50
26 – 50       $17.50
51 – 100      $25.00
101 – 250     $50.00
251 and up    $100.00

No, I am not making this up. The AP is really asking people to pay them $12.50 if they quote more than four words of any story. A nice long sentence with 26 words will cost you $17.50. Presumably this includes headlines, meaning the AP could come looking for cash if you even link to one of their stories with the relevant text.
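If you want to check the math yourself, the price list translates into a simple lookup. This is only a sketch of the table quoted above. The function name is made up, and treating quotes under five words as free is an assumption, since the list doesn’t mention them.

```python
from decimal import Decimal

# (lowest word count, highest word count or None for "and up", fee in dollars)
AP_FEE_TIERS = [
    (5, 25, Decimal("12.50")),
    (26, 50, Decimal("17.50")),
    (51, 100, Decimal("25.00")),
    (101, 250, Decimal("50.00")),
    (251, None, Decimal("100.00")),
]


def ap_quote_fee(word_count):
    """Look up the listed fee for quoting word_count words."""
    for low, high, fee in AP_FEE_TIERS:
        if word_count >= low and (high is None or word_count <= high):
            return fee
    # Assumption: quotes shorter than the first tier are not on the price
    # list, so they are treated as costing nothing.
    return Decimal("0.00")
```

So `ap_quote_fee(5)` is $12.50 and `ap_quote_fee(26)` is $17.50, matching the examples above.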

Let’s put aside the fundamental misunderstanding of how the web works for a moment, and put it in terms that most journalists should understand: This pricing scheme amounts to prior restraint on any substantive criticism of the AP. One of the most important reasons we have fair use rights is to excerpt material for commentary or criticism. The AP says that this effort is directed against copyright infringement and sites that scrape and monetize their stories, but quoting 5, 25, even 50 words from an article is in most cases not copyright infringement–it’s attribution.

The AP runs stories on medical topics all the time. If a doctor wants to point out an error in an AP story on their blog, we had better hope they have the cash. Could you imagine being misidentified as the suspect in a crime, only to have the AP bill you when you point out the correction? It’s not like the AP never makes mistakes.

Heaven forbid someone working for a competing news agency wants to criticize the AP’s coverage for bias or political slant. That would require quoting sentences from many articles – perhaps thousands of dollars.

The AP even has the gall to quote prices for educational use. I’m sure the many Supreme Court justices who have decided fair use cases over the years would be pleased – it goes against all precedent, but hey, a coupon!

Outside of criticism, the AP’s own guidelines tell you why this is so important:

We should give the full name of a source and as much information as needed to identify the source and explain why he or she is credible. Where appropriate, include a source’s age; title; name of company, organization or government department; and hometown.

If we quote someone from a written document – a report, e-mail or news release — we should say so.

It’s just as important for writers in other media to use proper attribution. Writers should use direct quotes when it’s the fairest way to represent what someone else has said or written. How would the AP operate if every source they quoted demanded payment up front?

But, of course, they would never:

It means we don’t pay newsmakers for interviews, to take their photographs or to film or record them.

This makes me sad. I’m a big proponent of professional journalism and I hate to see newspapers in such dire straits. But if this is representative of what the industry plans to do, I’m not sure they have much of a chance.