Posts Tagged ‘internet’


Three Ways Sites Can Track Visitors Without Cookies, Part 2

Wednesday, February 10th, 2010

In part 1, I wrote about the EFF’s Panopticlick project and the implications for anonymity. I’ve got two more methods up my sleeve.

2. Use the cache.

Cookies aren’t the only thing your browser downloads and keeps around, and for good reason. Logos and other images with stable filenames don’t tend to change very often, so instead of re-downloading them each time you revisit a site your browser caches them on disk. Other external files like Javascript can also be cached. This makes surfing the web a lot faster for everyone.

Any time someone is able to send you a file that sticks around, though, they’ve got a way to figure out if you’ve been there before. And as Josh Duck outlined in his blog post, Abusing the Cache: Tracking Users without Cookies, it’s not too tough to embed a tracking code to track your user sessions whether or not you clear your cookies.

This isn’t too terrible – users can always clear their cache, and this is generally most useful for tracking an individual user’s visits to a single site. If you could convince enough site owners to add your widget to their sites, though, you might be able to get more interesting data.
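To make the cache trick a bit more concrete, here is a minimal sketch of one way it could work. It is not necessarily the method from Josh Duck’s post. It assumes the server hides a visitor ID in the ETag header, the validator a browser sends back in If-None-Match when it checks whether its cached copy is still fresh. The function name, the plain dict of headers, and the idea of a tracker file are all made up for illustration.

```python
import uuid


def handle_tracker_request(request_headers):
    """Return (status, response_headers, visitor_id) for a request to a
    hypothetical cacheable file such as a tracking script.

    request_headers is assumed to be a plain dict with exact-case keys;
    a real web framework would hand you something richer.
    """
    # A browser that already has the file cached revalidates it by echoing
    # the ETag it was given earlier. That echoed value is the visitor's ID.
    etag = request_headers.get("If-None-Match")
    if etag:
        visitor_id = etag.strip('"')
        # 304 Not Modified: the browser keeps its cached copy, and its ID.
        return 304, {"ETag": etag}, visitor_id

    # First visit, or the cache was cleared: mint a new ID and send it as
    # the ETag of a response the browser may store but must revalidate.
    visitor_id = uuid.uuid4().hex
    response_headers = {
        "ETag": f'"{visitor_id}"',
        "Cache-Control": "private, no-cache",
    }
    return 200, response_headers, visitor_id


if __name__ == "__main__":
    # First request has no cached copy, so a new ID is issued.
    status, headers, first_id = handle_tracker_request({})
    # A later request echoes the ETag back, and the same ID is recovered.
    status, _, second_id = handle_tracker_request({"If-None-Match": headers["ETag"]})
    print(status, first_id == second_id)
```

Notice that cookies never enter into it. Clearing the cache throws away the stored ETag, which is why that defeats this particular sketch.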

3. Check which links the user has visited.

This isn’t a new technique, at least by web standards: you can see examples by Jeremiah Grossman from as early as 2006. CSS gives you the ability to set up custom styles for links – the default style, the style when the user hovers or clicks, and most importantly for this hack, the style after the user has visited the link. Browsers styled visited links differently even back in the ancient days of the web, turning blue link text to purple to help you navigate.

Any site can create a list of links to other sites and, with a bit of Javascript, tell if you’ve visited those sites in the recent past. The list of links can be hidden from the user’s view, so they might not even notice what’s going on. Spyjax is one example implementation with source code.

This is limited since you have to explicitly check for each potentially-visited site. So you might be able to check to see if they’ve been to Facebook, but not get the list of every social networking site they’ve ever been to. On the other hand, with browsers like Chrome and Firefox getting faster all the time, checking lots of links by brute force is becoming more feasible. Users can always limit or clear their browsing history to make this technique less effective.

Should I panic yet?

Not quite, but it’s always a good idea to be on the lookout for things that undermine the assumptions of privacy and anonymity that people tend to have while surfing the web.

We’ve looked at clever ways to track a user from visit to visit, from site to site, and to get information about other sites they’ve visited. But each can be defeated, so if you want more anonymity you can still have it. To be honest I worry more about malware stealing passwords, phishing sites tricking people into giving away bank account info, and companies that have lots of sensitive info being hacked or ordered to divulge info by government. None of those problems rely on new Javascript hacks or can be fixed by clearing the browser cache.

Found a new clever hack for tracking users? Got even more important privacy concerns that I missed? Please post in the comments below.

Three Ways Sites Can Track Visitors Without Cookies

Friday, January 29th, 2010

There’s an old joke about the Internet that’s important for two reasons. First the joke:

On the internet, nobody knows you're a dog

It’s important because it illustrates a key cultural and technological underpinning of the Internet: anonymity. The second reason it’s important is that it’s so old, printed in the New Yorker in 1993, which is basically Old Testament times in Internet years. So for well over a decade, the web has allowed people to browse without telling or proving who they are. Though many sites would love it if you created an account and logged in, the vast majority are perfectly happy to serve up pages to you without even knowing if you’re a person or a dog.

But there are many reasons to want to track a user from page to page or from site to site, and there are various ways to do it. The most common way involves cookies. Web developers need a way to create user sessions or else things users like (shopping carts, preferences, the ability to update your profile picture) are impossible to implement.

Cookies are pretty well understood, and users can turn them off or clear them out if they really want. Google Chrome, for example, has “Incognito Mode” which allows you to surf without saving cookies, history, etc. from session to session. Even with cookies off, though, maintaining a user session within a particular site by passing around a session id isn’t too hard. It’s trivial to do in PHP for example.
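Here is a rough sketch of the cookie-free session idea: stick a session ID in the query string of every link you hand out, then read it back on the next request. It is written in Python rather than PHP, and the parameter name sid and the helper names are made up for illustration.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

SESSION_PARAM = "sid"  # hypothetical query-string parameter name


def new_session_id():
    """Generate a random, hard-to-guess session ID."""
    return secrets.token_hex(16)


def add_session_id(url, session_id):
    """Return url with the session ID added to (or replaced in) its query string."""
    parts = urlparse(url)
    # Keep every existing parameter except a stale session ID.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    query.append((SESSION_PARAM, session_id))
    return urlunparse(parts._replace(query=urlencode(query)))


def read_session_id(url):
    """Return the session ID carried by url, or None if there isn't one."""
    return dict(parse_qsl(urlparse(url).query)).get(SESSION_PARAM)


if __name__ == "__main__":
    link = add_session_id("http://example.com/cart?item=42", "abc123")
    print(link)                   # the link to put in the page
    print(read_session_id(link))  # what the server sees on the next click
```

The server would run every link on the page through add_session_id before sending it out. The session only lasts as long as the user keeps following those links, which is part of why this works within one site but not across sites.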

Most users are pretty comfortable with this state of affairs – Facebook knows who I am because I logged in, but I trust them. Amazon knows who I am but that’s cool because I’m shopping. Some other site doesn’t know who I am, but it knows that I’m the same person who clicked on the widget to change the language a couple minutes ago.

People start getting uncomfortable when you start tracking them across sites. People become even more uncomfortable when they no longer have control over their anonymity. Three recent techniques violate both of those comfort zones in limited ways.

1. The EFF’s Panopticlick project.

Follow the link above and click the “test me” button. Is your browser silently betraying you? This is a very clever hack based on the fact that browsers almost always send some information to web servers in HTTP headers (the user agent, what type of content the browser is willing to accept, etc.). People have been misusing user agent headers to try to get Javascript working in multiple browsers for years. Panopticlick also checks for available plugins and fonts. Add all this data up and there’s enough variability from one browser to the next that you can apparently reliably identify individuals. The EFF has a great post on the information theory behind the project.

This doesn’t mean sites will know who you are, but they could use this information to know that you visited web pages A, B and C whether or not you want them to. An ad network could use this info to track you across many sites. An unscrupulous site could sell this info, giving your browsing history away for cash, and if you log into a site that has personally-identifying info about you (email, shipping addresses, etc.) the history could potentially be tied back to a person.
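To get a feel for why a handful of boring browser details can add up to an identifier, here is a small sketch. It is not Panopticlick’s actual code. It hashes a set of attributes into a single fingerprint and adds up the bits of identifying information using the standard surprisal formula, -log2(share of browsers with that value). The attribute names and the shares are invented, and adding the bits together assumes the attributes are independent, which real ones are not.

```python
import hashlib
import math


def fingerprint(attributes):
    """Hash a browser's observable attributes into one short identifier."""
    # Sort the keys so the same attributes always produce the same hash.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def surprisal_bits(share):
    """Bits of identifying information in a value reported by `share` of browsers."""
    return -math.log2(share)


# Invented numbers: the fraction of browsers that report the same value
# as this visitor for each attribute.
shares = {"user_agent": 1 / 1024, "plugins": 1 / 512, "fonts": 1 / 256}

# Summing is only valid if the attributes are independent (an assumption).
total_bits = sum(surprisal_bits(share) for share in shares.values())

if __name__ == "__main__":
    browser = {
        "user_agent": "ExampleBrowser/1.0",  # hypothetical values
        "plugins": "Flash;QuickTime",
        "fonts": "Arial;Comic Sans MS",
    }
    print(fingerprint(browser))
    print(total_bits)  # 10 + 9 + 8 bits
```

With these made-up shares the total is 27 bits, enough to single out one browser among roughly 134 million (2 to the 27th power).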

Next post, I’ll talk about another way to track users without cookies and a way for a site to tell if you’ve visited other sites in the past. I’ll also tell you why you shouldn’t panic, though I admit a better writer would have told you that first.

Twelve dollars for five words? What is the Associated Press thinking?

Monday, August 3rd, 2009

I saw this on Reddit and had to comment. Follow this link. From now on if you want to quote an AP story in your blog, link to an AP headline, or email an article to your grandmother, this is the page they want you to see – including a price list that I’ll quote here:

Words        Fee
5 – 25       $12.50
26 – 50      $17.50
51 – 100     $25.00
101 – 250    $50.00
251 and up   $100.00

No, I am not making this up. The AP is really asking people to pay them $12.50 if they quote more than four words of any story. A nice long sentence with 26 words will cost you $17.50. Presumably this includes headlines, meaning the AP could come looking for cash if you even link to one of their stories with the relevant text.

Let’s put aside the fundamental misunderstanding of how the web works for a moment, and put it in terms that most journalists should understand: This pricing scheme amounts to prior restraint on any substantive criticism of the AP. One of the most important reasons we have fair use rights is to excerpt material for commentary or criticism. The AP says that this effort is directed against copyright infringement and sites that scrape and monetize their stories, but quoting 5, 25, even 50 words from an article is in most cases not copyright infringement – it’s attribution.

The AP runs stories on medical topics all the time. If a doctor wants to point out an error in an AP story on their blog, we had better hope they have the cash. Could you imagine being misidentified as the suspect in a crime, only to have the AP bill you when you point out the correction? It’s not like the AP never makes mistakes.

Heaven forbid someone working for a competing news agency wants to criticize the AP’s coverage for bias or political slant. That would require quoting sentences from many articles – perhaps thousands of dollars.

The AP even has the gall to quote prices for educational use. I’m sure the Supreme Court justices who have decided fair use cases over the years would be pleased – it goes against all precedent, but hey, a coupon!

Outside of criticism, the AP’s own guidelines tell you why this is so important:

We should give the full name of a source and as much information as needed to identify the source and explain why he or she is credible. Where appropriate, include a source’s age; title; name of company, organization or government department; and hometown.

If we quote someone from a written document – a report, e-mail or news release – we should say so.

It’s just as important for writers in other media to use proper attribution. Writers should use direct quotes when it’s the fairest way to represent what someone else has said or written. How would the AP operate if every source they quoted demanded payment up front?

But, of course, they would never:

It means we don’t pay newsmakers for interviews, to take their photographs or to film or record them.

This makes me sad. I’m a big proponent of professional journalism and I hate to see newspapers in such dire straits. But if this is representative of what the industry plans to do, I’m not sure they have much of a chance.

Wisdom of the Crowds – What To Do When Colbert Wins

Tuesday, March 24th, 2009

I saw an AP story on MSNBC titled Oops: Colbert wins space station name contest. I’m a bit of an expert when it comes to letting the internet vote on a name, if there is such a field of expertise, and the article strikes me as wrongheaded.

It’s not an “oops” that Colbert won, nor is it a problem or a mistake. Assuming the result is due to voting viewers and the web’s general affection for Colbert, and not a voting bot, this is exactly what NASA wants. Or at least, what it should have wanted.

The point of putting something up for a vote online is to involve people in a fun way and come out with a result you might not have otherwise. You can’t have the wisdom of the crowds without expecting a bit of whimsy.

Here’s to NASA naming their module after Colbert.

[Video: The Colbert Report, “Space Module: Colbert – Vote Now” (comedycentral.com)]

I Love Hospitals With WiFi, or Twittering Childbirth

Tuesday, December 2nd, 2008

When we were looking for hospitals and doctors’ offices for little Athena, wifi wasn’t really on the list; we cared more about reputation, compatibility with our insurance, and other concerns.  In retrospect, though, thank goodness Stanford Hospital and Palo Alto Medical Foundation have wifi.

We live more than 2,000 miles from most of our family.  Not all of them could make the flight to California for the birth.  We also have too many friends around the country to possibly make all the phone calls we’d have liked to have made that night.  In addition, we had several thousand people all over the world wondering which name we would pick for our baby.

Because of internet connectivity, I was able to do a fair job of including all of them in the process:

1) With my iPhone, I was able to take and post photos during labor and delivery.  Photos of my mom’s new granddaughter were available for her, on Flickr, within minutes of birth:

Wrapped and swaddled

I’m not sure I can properly express here how much it meant to her and the rest of our family to be able to see Athena so quickly.

2)  Using the Twitterrific App on my iPhone, I was able to post updates to Twitter throughout the whole labor.  This is a perfect example of what Twitter is good for.  Liveblogging while my wife endures the pains of childbirth would be ridiculously insensitive, but there were always minutes of downtime here and there to tap out a few words describing what’s going on.

live-twittering

3)  Using the Twitter App for Facebook, my updates showed up on my Facebook status as well.  This was a big help, since so many more friends and family use Facebook than Twitter.

A fourth option, which we didn’t use but might have had the labor been longer, was videoconferencing with Skype.  We’ve been using Skype to keep in touch with family for some time.  It is currently my grandmother’s favorite thing to do.  Since we’ve been back home Athena has become the star of many family video sessions.

One final thing I have to mention is YouTube – we certainly weren’t going to share the gooey miracle of life with the world in streaming video, but my wife followed the videos of several other women during pregnancy, up to and including labor.  We don’t know a lot of other couples having kids right now, so that gave Ann a personal connection with their stories and helped her through some of the tougher times during the last 9 months.  She could see that other people were going through the same things she was and that was an important comfort.

The common theme here, which I think goes a long way toward explaining the growth of the internet as a whole, is communication.  Because of almost universal connectivity, we were able to turn a deep personal experience into a social experience as well.