Tag Archives: social-bookmarking

Scientific proof that Reddit should add a tagging system

First, a disclaimer: the title of this post is obviously exaggerated. Proof is an awfully big word to throw around, and although I used pretty good experimental design practices and statistical checks, I can’t really prove that Reddit should do this or that. But I can show that what they are doing now is not working, at least when it comes to search.

So, I got an email the other day letting me know that my article, Tagging and Searching: Search Retrieval Effectiveness of Folksonomies on the World Wide Web, is being published in the July 2008 issue of Information Processing and Management (here’s the official DOI link to the article). In the study I compared search performance between traditional search engines (like Google), subject directories (like Open Directory), and social bookmarking systems (like Reddit) and their folksonomies.

What’s a folksonomy? The word is a play on the term taxonomy – a taxonomy is a system of organizing and categorizing things, like the Dewey Decimal System. Taxonomies usually follow very strict rules and are controlled by experts. A folksonomy is a system of organization built by large numbers of regular users, who add things to the collection, evaluate them, and usually tag them with keywords.
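
To make the idea concrete, here is a minimal TypeScript sketch of a folksonomy as a data structure. It assumes every act of tagging is stored as a (user, URL, tag) record and ranks a tag search by how many distinct users applied that tag. The names `TagAssignment`, `buildTagIndex`, and `searchByTag` are hypothetical; this is an illustration, not how Del.icio.us or any other site actually implements tagging.

```typescript
// Hypothetical record: one user tagged one URL with one keyword.
interface TagAssignment {
  user: string;
  url: string;
  tag: string;
}

// tag -> url -> set of distinct users who applied that tag to that url
type TagIndex = Map<string, Map<string, Set<string>>>;

// Normalize tags so "Search" and " search " count as the same keyword.
function normalizeTag(tag: string): string {
  return tag.trim().toLowerCase();
}

function buildTagIndex(assignments: TagAssignment[]): TagIndex {
  const index: TagIndex = new Map();
  for (const { user, url, tag } of assignments) {
    const key = normalizeTag(tag);
    let byUrl = index.get(key);
    if (byUrl === undefined) {
      byUrl = new Map<string, Set<string>>();
      index.set(key, byUrl);
    }
    let users = byUrl.get(url);
    if (users === undefined) {
      users = new Set<string>();
      byUrl.set(url, users);
    }
    // A Set ignores the same user applying the same tag twice.
    users.add(user);
  }
  return index;
}

// URLs carrying the tag, ordered by how many distinct users applied it
// (ties broken alphabetically so the output is deterministic).
function searchByTag(index: TagIndex, tag: string): string[] {
  const byUrl = index.get(normalizeTag(tag));
  if (byUrl === undefined) return [];
  return Array.from(byUrl.entries())
    .sort((a, b) => b[1].size - a[1].size || a[0].localeCompare(b[0]))
    .map(([url]) => url);
}
```

Counting distinct users, rather than raw tag events, is one simple way to let the crowd's agreement drive the ranking.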


In my study, the social bookmarking systems that use tagging did surprisingly well: Del.icio.us was more precise than Open Directory, and at a cutoff of 20 results its precision was fairly close to that of the search engines.
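
Precision at a cutoff is simple arithmetic: of the first k results a system returns, what fraction were judged relevant? The TypeScript sketch below is illustrative and not necessarily the exact procedure used in the paper; in particular, it assumes that when a system returns fewer than k results, the empty slots count as non-relevant. The function names are hypothetical.

```typescript
// Precision at cutoff k: the fraction of the top k results judged relevant.
// Assumption: if fewer than k results were returned, the missing slots
// count as non-relevant (we still divide by k).
function precisionAtK(judgments: boolean[], k: number): number {
  if (!Number.isInteger(k) || k <= 0) {
    throw new RangeError("k must be a positive integer");
  }
  const relevant = judgments.slice(0, k).filter(Boolean).length;
  return relevant / k;
}

// Mean precision at k across queries, one list of judgments per query.
function meanPrecisionAtK(queries: boolean[][], k: number): number {
  if (queries.length === 0) return 0;
  const total = queries.reduce((sum, q) => sum + precisionAtK(q, k), 0);
  return total / queries.length;
}
```

For instance, with k = 20, a query whose top 20 results include 10 relevant pages scores 0.5. Under the assumption above, a system that returns only 5 results, all relevant, scores 5/20 = 0.25.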

Reddit, however, did not fare so well. It consistently had the lowest precision, meaning that a smaller share of the results its searches returned were relevant. There could be many reasons for this, but the biggest difference between Reddit and the others is its lack of tags.

Now, it’s possible that the folks at Reddit have no interest in search, or information retrieval in general. I think Reddit is very effective at bringing out new and interesting links on a daily basis and encouraging commentary (just my opinion, no stats to back that up). But I think it’s a big missed opportunity not to add tagging and see where it leads.

(One last disclaimer: this post is my personal opinion as someone who enjoys using Reddit and does not reflect the views of my employer. It refers to research I did independently as a grad student.)

A Scary but Fascinating Idea: JavaScript and CSS hack to see where your users have been

I just ran across this post on Aza Raskin’s blog about a technique for cutting down the number of social bookmarking links displayed to users. I’m sure you’ve seen them: the 20 or so colorful buttons for Digg, Del.icio.us and similar sites that have popped up at the bottom of every blog post on the web. On my blog they are hidden behind the ShareThis widget, but Raskin had a better idea: why not just display the ones each user actually uses?

Impossible? Not so fast. Think about what happens when you visit a site: afterward, any links to it change color, usually from blue to purple. We can put up links to each social bookmarking site and then use JavaScript and CSS to check whether each link has been visited. If so, display the button; if not, hide it.
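
The show/hide half of Raskin's idea is easy to sketch. In the TypeScript below, the visited check is passed in as a function rather than implemented: the original trick read the computed style of each link, and browsers have since stopped reporting :visited styling to scripts because of the privacy problem discussed next. `ShareButton`, `partitionButtons`, and the example URLs are hypothetical.

```typescript
// Hypothetical description of one social bookmarking button.
interface ShareButton {
  name: string;
  siteUrl: string; // link whose visited state decides visibility
}

// Stand-in for the browser check. The original hack compared a link's
// computed color against its :visited style; current browsers no longer
// expose that to scripts, so the check is injected, not implemented.
type VisitedCheck = (url: string) => boolean;

// Show a button only if its site link counts as visited; hide the rest.
function partitionButtons(
  buttons: ShareButton[],
  isVisited: VisitedCheck,
): { show: ShareButton[]; hide: ShareButton[] } {
  const show: ShareButton[] = [];
  const hide: ShareButton[] = [];
  for (const button of buttons) {
    (isVisited(button.siteUrl) ? show : hide).push(button);
  }
  return { show, hide };
}
```

Keeping the decision logic separate from the detection step means the same filter would work with any signal the user explicitly provides, such as a saved preference.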

This is a very cool way to manage buttons but the technique has wider privacy implications.  I could, for example, put links to…  questionable sites, and then use some Ajax to collect that information about users.  If I had other information about you (say you logged into my site or otherwise gave me an email address) I could link it together and build a database.

On the other hand, it’s not like I can grab your entire browsing history or follow you around after you leave my site – I have to specifically create a link and check it for every site I want to know about.  And unlike your browser history this info is cleared every time you close your browser.  So it’s not spyware or anything as intrusive as, say, the Alexa toolbar.

I can think of a bunch of cool ways to apply this technique, but I’m not sharing until I implement one.  Feel free to post any ideas (or misgivings) in the comments below.

Social software and the problem of trust

Although you don’t hear about it much, trust is an extremely important issue in the software world.  A common example is eBay – how could eBay stay in business if millions of anonymous buyers and sellers didn’t have a certain level of trust?

Andy Brice, a software developer, gives a really interesting example of the problem of trust on his blog. He became concerned that his software products were getting a ridiculous number of awards and 5-star ratings from shareware download sites, so he devised an experiment: if you create a text file, change the file extension to .exe, and submit it to 700 download sites, how many awards would you get?

It turns out you would get tons of awards. A large percentage of these sites, which ostensibly evaluate shareware and freeware for their users, are in reality just trying to skim ad revenue.

Social software, if applied correctly with enough participation, can help to solve this problem.  It is much harder to fake 1000 del.icio.us bookmarks than it is to make an authoritative-looking award banner.

Many of us work on internal company projects where we don’t confront these issues directly on a day-to-day basis. Large companies can generate billions of pages of documents and code each year, and on top of that we use billions of external web pages as reference material. Tools such as social bookmarking can help build this kind of network of trust and filter out less useful resources, even on intranets.

So now that we have the tools available, all we need is participation.  You’re reading this, so I’m probably already preaching to the choir.  Trust is a really interesting issue, though, so I’ll be writing about it here and there in the future.