To complete my MS in Information Architecture and Knowledge Management at Kent State I did some research on folksonomies and how the can support information retrieval. I compared social bookmarking systems with search engines and directories. I’m hoping to see the results published in an academic journal. In the mean time, you can see a pre-publication copy of my results:
Can, F., Nuray, R., & Sevdik, A. B. (2004). Automatic performance evaluation of web search engines. Information Processing & Management, 40(3), 495-514.
Although virtually all Internet users utilize search engines to find information on the web evaluation of search engines is often difficult. A large number of searches would need to be tested and each one would need to be judged subjectively by human participants. The authors of this paper have devised a new way to test search engines and have tested their method against evaluations done by human judges, and found their automatic Web search engine evaluation method (AWSEEM) significantly predicted the subjective judgments. In the human-evaluation control, users were given a list of resources called up by the various search engines with no idea which engine each came from and were asked to rank the relevance of each. In AWSEEM, each query was run and the top 200 results for each engine were compiled into a collection of vectors which are then ranked by their similarity to the “the user information-needs” (including the question, the query, and a description of the need). The system then looks at the top 20 ranked pages for each engine and counts how many are in the top s (50 and 100 are used) commonly retrieved pages. These are assumed to be relevant.
One possible issue with this system is that it requires a little more human interaction than first assumed—the query providers must provide more than just a query. A bigger issue, though, is the choice of measure for relevancy. AWSEEM assumes that if a result appears in the results of multiple engines, it is relevant. This may be reasonable, but does raise the question—what if all the engines studied are wrong? For a simple example, searching for my own name online will retrieve a large number of results that are the same in many search engines but have nothing to do with the particular Jason Morrison who sits here typing this. Another interesting thing to note is that they did not find much of a statistically significant difference between the performance of the different search engines using either method (although more so with the human-judgment method). Very few scholarly articles (and even fewer popular press articles) bother to do this when pitting search engines against each other. Is it possible that the very notion of the “best” search engine has been statistically meaningless for some time?
The authors make a good point about the difficulty in using real users for search engine evaluation. An automated approach is one answer, but there is another—the problem is that too much time and effort is required of a small number of users. Instead, if tiny amounts of time and effort were spread across thousands or millions of users, similar results could be achieved while still using subjective measures. For example, if every time a user got results on any search engine they were presented with a simple “rate these results on a scale of 1 to 5 stars” input, they could quickly and effortlessly contribute data toward a shootout-type study. Cooperation of the search engines would not necessarily be needed, if one could use a university’s proxy to substitute or add the input for popular search engines, for example, or if a generic search page was set up to produce results from randomized (double-blind) engines. It would be interesting to try this, AWSEEM, and individual evaluation in one study to see if there was a statistical correlation.
Borgman, C.L. (1996). Why are online catalogs still hard to use? Journal of the American Society for Information Science, 47 (7): 493-503.
In this 1996 study, Borgman revisits a 1986 study of online library catalogs. In the original study, computer interfaces and online catalogs were still fairly new—the study looked at how the design of traditional card catalogs could inform the design of new online catalogs. By the time of this study online catalogs were common but still not easy to use. Three kinds of knowledge were seen as necessary for online catalog searching: conceptual knowledge about the information retrieval process in general, semantic knowledge of how to query the particular system, and technical knowledge including basic computer skills. Semantic knowledge and technical knowledge differ here in the same way as semantic and syntactic knowledge in computer science. The study also covers specific concepts like action, access points, search terms, boolean logic, and file organization. In the short term, Borgman recommends training and help facilities to help users gain the skills they need to use current systems. In the long run, though, libraries must employ the findings of information-seeking process research if they are ever going to create usable interfaces.
The study does point out a number of reasons why online catalogs are difficult for users, whether it’s because they lack computer skills or semantic knowledge. One good example is from a common type of query language. Even if the user knows that “FI” means “find” and “AU” means author, they may not know whether to use “FI AU ROBERT M. HAYES,” “FI AU R M HAYES,” “FI AU HAYES, ROBERT M,” etc., and how the results will differ. Unfortunately the article lacks clear instructions or examples of how to make the systems better. The conclusion that different types of training materials could be helpful seems to me like a bandage rather than a cure.
I think a lot of the criticisms are still true, but that modern cataloging and searching systems have become easier. I’m not so sure it’s because catalog designers have started applying information-seeking research in their interfaces, though. It almost seems like library systems are being made easier in self-defense. Users are getting more and more used to a Google or Yahoo type interface—a simple search box that looks at full text and uses advanced algorithms to find relevant results. I think part of this is due to the fact that people in the library field have experience with complicated, powerful structure search systems and are used to a lot of manual encoding of records. Web developers, lacking this background, have been more free to think in terms of searching massive amounts of unstructured data and automating the collection and indexing process. I also think that simple things such as showing the results, including summaries of each item, in a scrollable, clickable list, have helped a great deal to support the information seeking process. Things like search history and “back” and “forward” buttons, “search within these results,” automatic spell checking, etc. are becoming pretty standard as well.