Tag Archives: Taxonomies

Scientific proof that Reddit should add a tagging system

First, a disclaimer: the title of this post is obviously exaggerated. Proof is an awfully big word to throw around, and although I used pretty good experimental design practices and statistical checks, I can’t really prove that Reddit should do this or that. But I can show that what they are doing now is not working, at least when it comes to search.

So, I got an email the other day letting me know that my article, Tagging and Searching: Search Retrieval Effectiveness of Folksonomies on the World Wide Web, is being published in the July 2008 issue of Information Processing and Management (here’s the official DOI link to the article). In the study I compared search performance between traditional search engines (like Google), subject directories (like Open Directory), and social bookmarking systems (like Reddit) and their folksonomies.

What’s a folksonomy? The word is a play on the term taxonomy – a taxonomy is a system of organizing and categorizing things, like the Dewey Decimal System. Taxonomies usually follow very strict rules and are controlled by experts. A folksonomy is a system of organization built by large numbers of regular users, who add things to the collection, evaluate them, and usually tag them with keywords.

[Figure: IR system precision at result cutoffs 1 to 20]

In my study, the social bookmarking systems with tagging systems did surprisingly well – Del.icio.us was more precise than Open Directory, and at a cutoff of 20 results its precision was fairly close to that of the search engines.
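For anyone who hasn't run into precision at a cutoff before, here is a minimal Python sketch of the general idea. This is not the code or the exact scoring procedure from the study. The function name `precision_at_k` and the relevance judgments are made up purely for illustration, and I'm assuming the common convention of dividing by the cutoff itself, so a system that returns fewer than k results is penalized for the missing ones.

```python
def precision_at_k(judgments, k):
    """Share of the top-k results that were judged relevant.

    judgments: list of 1 (relevant) / 0 (not relevant), in ranked order.
    Assumption: the denominator is always k, so missing results
    count the same as non-relevant ones.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return sum(judgments[:k]) / k


# Hypothetical judgments for one query's top 20 results (invented data).
example_judgments = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
                     0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

for cutoff in (5, 10, 20):
    print(cutoff, precision_at_k(example_judgments, cutoff))
```

With these invented judgments, precision falls as the cutoff grows, because most of the relevant hits sit near the top of the list.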

Reddit, however, did not fare so well. It consistently had the lowest precision, meaning that searches returned very few relevant results. There could be many reasons for this, but the biggest difference between Reddit and the others is the lack of tags.

Now, it’s possible that the folks at Reddit have no interest in search, or information retrieval in general. I think Reddit is very effective at bringing out new and interesting links on a daily basis and encouraging commentary (just my opinion, no stats to back that up). But I think it’s a big missed opportunity not to add tagging and see where it leads.

(One last disclaimer: this post is my personal opinion as someone who enjoys using Reddit and does not reflect on my employer. This post refers to research done independently as a grad student.)

Tagging and Searching: Search Retrieval Effectiveness of Folksonomies on the World Wide Web

To complete my MS in Information Architecture and Knowledge Management at Kent State I did some research on folksonomies and how they can support information retrieval. I compared social bookmarking systems with search engines and directories. I’m hoping to see the results published in an academic journal. In the meantime, you can see a pre-publication copy of my results:

Tagging and searching [pdf, 989K]

Notes on “Vocabulary as a central concept in Information Science” and additional readings

Vocabulary as a Central Concept in Information Science, Michael Buckland (1999)

The role of classification in knowledge representation and discovery, BH Kwasnik – Library Trends, 1999

One good point in the Buckland article was that vocabulary can differ between the catalogers, the authors and the searchers, even if everyone is within the same field. I’ve read some about these differences before, but they almost always seem to take the form of novice searcher vocabulary vs. expert author vocabulary or natural searcher vocabulary vs. structured system vocab. Those are probably the clearest ways to look at these distinctions. To tell you the truth, looking at subtle differences between five different vocabularies does not seem like that much fun to me.

This article gets back to some of the same points we’ve already discussed in class when talking about synonym rings and taxonomies. Even though the author comes at it from a vocabulary point of view, he’s saying the same things everyone else is. If your users want to search for “Vietnam War” but your system uses “Vietnam Conflict” and doesn’t point the user in the right direction, no purpose has been served. You can be as correct and specific in your phrasing as you want but that’s no guarantee you’ll have a usable system.
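To make the synonym ring idea concrete, here is a minimal Python sketch of how a search front end might expand a query. It is only an illustration, not something from the Buckland article. The names `SYNONYM_RINGS` and `expand_query` are hypothetical, and the ring holds just the one example pair from the paragraph above.

```python
# Hypothetical synonym rings: each set holds terms treated as equivalent.
SYNONYM_RINGS = [
    {"vietnam war", "vietnam conflict"},
]


def expand_query(query, rings=SYNONYM_RINGS):
    """Return every term that should be searched for the given query.

    If the normalized query belongs to a ring, all members of that ring
    are returned (sorted for a stable order); otherwise only the
    normalized query itself comes back.
    """
    normalized = query.strip().lower()
    for ring in rings:
        if normalized in ring:
            return sorted(ring)
    return [normalized]


print(expand_query("Vietnam War"))
print(expand_query("greeting cards"))
```

A searcher typing “Vietnam War” gets matched against “Vietnam Conflict” as well, without having to know which term the catalogers chose.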

The Kwasnik reading was really good at pointing out the strengths and weaknesses of hierarchies, trees and other organization schemes. In doing the AG assignment I ran into the “Lack of complete and comprehensive knowledge” barrier quite often. That’s one of the biggest problems with not just hierarchies, but any project like this where we have some knowledge of the domain (everyone has seen greeting cards) but not of the entire body of AG’s product line or even a representative subset. I wouldn’t want to construct a taxonomy of content objects before people started entering data. I would have it be built as the database grew, with specific people in charge of keeping it consistent.