Tag Archives: folksonomies


How to get Google search results for academic research

A few years ago, before I was a Googler, I was a grad student doing research on information retrieval. I wanted to compare the results of Google and other search engines with folksonomies from social bookmarking sites. It sounds pretty simple – Google does lots of internal search quality studies, so it’s not too surprising that outside researchers would want to execute lots of queries and use the results in their data.

The way I did it was… not optimal, to say the least. I wrote a bunch of PHP code, spaced out participant sessions, etc. to make sure I could get results back. Google tries to make sure that spammers aren’t scraping search results to generate webspam, so any kind of scraping with cURL, Beautiful Soup, etc. can result in a big pile of failure.

The way I did it wasn’t the right way or the easy way, so when I got the job I made a mental note to ask around for the best way to get search results. Then I forgot all about it until an email exchange with Gary Warner of CyberCrime & Doing Time fame.

It turns out Google has a great University research program and API. You have to apply for registration and let us know who you are, what school you’re affiliated with, and what you plan to study. Assuming everything checks out you’ll get access to a pretty nice API. There’s some example Python code, but you could just as easily use PHP, Java, or whatever to consume the XML responses.
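To give a rough idea of what consuming XML responses looks like, here is a minimal Python sketch. It is not the program's example code, and the response layout (`results`, `result`, `url`, `title`) is an assumption made up for illustration; the real API's element names may well differ. The `overlap` helper is likewise hypothetical, showing one simple way to compare a result list against URLs collected from a bookmarking site.

```python
import xml.etree.ElementTree as ET

# Hypothetical response shape, invented for this sketch.
# The real API's element and attribute names may differ.
SAMPLE_RESPONSE = """\
<results query="folksonomy">
  <result><url>http://example.com/a</url><title>Page A</title></result>
  <result><url>http://example.com/b</url><title>Page B</title></result>
</results>
"""


def parse_results(xml_text):
    """Return a list of (rank, url, title) tuples in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for rank, node in enumerate(root.findall("result"), start=1):
        url = node.findtext("url", default="").strip()
        title = node.findtext("title", default="").strip()
        rows.append((rank, url, title))
    return rows


def overlap(results, bookmarked_urls):
    """Fraction of returned URLs that also appear in a set of bookmarked URLs."""
    if not results:
        return 0.0
    hits = sum(1 for _, url, _ in results if url in bookmarked_urls)
    return hits / len(results)


if __name__ == "__main__":
    rows = parse_results(SAMPLE_RESPONSE)
    for rank, url, title in rows:
        print(rank, url, title)
    print(overlap(rows, {"http://example.com/a"}))
```

Keeping the parsing separate from the comparison means the same `overlap` function can be reused with results from other search engines, as long as each one is reduced to the same tuple shape first.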

And that research I was doing? I recently noticed that my paper has been cited 7 or 8 times, according to Google Scholar. I used to joke that I had written the least influential paper in the history of academic publishing, but I guess I can’t claim the title anymore. Scopus only shows 4 citations so I will remain humble anyway.

New social news site – NewsTrust.net

I happened across NewsTrust.net, a new social news aggregation site.  I’m a big fan of other sites in the category like Reddit, despite their flaws, and NewsTrust includes a tagging system so I feel obligated to investigate it like any other folksonomy.

So I created an account to give it a try.  The big difference between this site and others is the emphasis on quality journalism.  NewsTrust asks for your real name, and in addition to giving weight to users who write good reviews and get votes from other users, it adds factors like experience as a journalist to the mix.  It makes specific distinctions between mainstream media sources and alternative media sources.

It’s an interesting idea, and it’s good to see journalists working together with programmers and web developers to make use of some of the social software techniques that newspaper websites so often catch on the trailing edge.  The site’s features seem geared toward providing users with the best that professional journalism has to offer with a dash of brilliant amateur writing thrown in – even the page layout looks more like a newspaper site than a Digg or Del.icio.us clone.

But I’m not sure it will work, at least not without some tweaking.  I don’t know if they put a lot of weight into the “experience” of users, but it didn’t require any verification of my 5-9 years of journalism experience (for the record, that’s four years in college plus more than a year of stringing here and there).  Here’s the problem of trust again, though hopefully mitigated by fellow users’ reviews.

The other issue is interaction design.  The widgets and buttons all work just fine, but when you rate a story you’re asked to score on six dimensions: Recommendation, Trust, Information, Fairness, Sources, and Context.  Only the first is required, but give users options and they are bound to feel obligated to exercise them.  Give them too many tasks and they will tend to give up.  So the simple interaction model of Reddit, where users don’t even have to click through to rate a story, might be information-poor but participation-rich in comparison.

Still, I will play with the site more and I wish them luck.  I think they have some promising ideas.  For example, in their blog they talk about gathering sources from other countries based on big world news events, specifically the Russian invasion of Georgia.  Reddit is only fleetingly that reflective, and few sites use temporary peaks in interest to get long-term data on source credibility.

Update to Altocumulus WordPress Tagging Plugin – version 0.2

Screenshot of my tag cloud WordPress plugin in action

Everyone has tag clouds all over the web, but are they really useful?  Altocumulus is an attempt to use tag clouds as a real navigational system in WordPress blogs.

Install the plugin and it will automatically put a cloud of related tags at the top of all your Category and Tag pages.  Hopefully this will serve two purposes:

  1. Users who end up on a general category page can click through to a more specific (or more relevant) tag page, and
  2. It should give users a general idea of the topic of the posts on that archive page, increasing the information scent.
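
For readers curious how a "related tags" cloud might be computed, here is a small sketch of one plausible approach: count which tags co-occur with the current tag, then scale font sizes by those counts. This is an illustration only and not the plugin's actual code (a WordPress plugin would be written in PHP); the function names, the input format, and the font-size range are all assumptions.

```python
from collections import Counter


def related_tags(posts, current_tag, limit=10):
    """Count tags that co-occur with current_tag across posts.

    posts is a list of tag collections, one per post (hypothetical input).
    """
    counts = Counter()
    for tags in posts:
        tags = set(tags)
        if current_tag in tags:
            counts.update(tags - {current_tag})
    # Sort by count descending, then by name, so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


def font_sizes(ranked, smallest=10, largest=24):
    """Map each tag's count linearly onto a font size in points."""
    if not ranked:
        return {}
    values = [count for _, count in ranked]
    low, high = min(values), max(values)
    span = high - low
    sizes = {}
    for tag, count in ranked:
        # If every tag has the same count, show them all at the largest size.
        scale = (count - low) / span if span else 1.0
        sizes[tag] = round(smallest + scale * (largest - smallest))
    return sizes


if __name__ == "__main__":
    posts = [
        ["folksonomies", "tagging", "search"],
        ["folksonomies", "tagging"],
        ["wordpress", "plugin"],
        ["folksonomies", "wordpress"],
    ]
    ranked = related_tags(posts, "folksonomies")
    print(font_sizes(ranked))
```

Tags that share more posts with the current archive page end up larger, which is what lets the cloud hint at the topic of the page as well as link onward.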

In the next version I’ll add an options screen where you can change the number of tags, their placement, etc.

Please drop me a note if you run into any bugs or are using it on your blog.  Let me know if you have any ideas you’d like to see implemented, too – I am all about implementing and studying folksonomies.  The more folks who are interested, the more likely I am to add features.  Thanks.

Download the Plugin Here