Tag Archives: information-retrieval


Notes: Why are online catalogs still hard to use?

Borgman, C.L. (1996). Why are online catalogs still hard to use? Journal of the American Society for Information Science, 47(7): 493-503.

In this 1996 study, Borgman revisits a 1986 study of online library catalogs. In the original study, computer interfaces and online catalogs were still fairly new; it looked at how the design of traditional card catalogs could inform the design of the new online catalogs. By the time of the follow-up study, online catalogs were common but still not easy to use. Borgman identifies three kinds of knowledge as necessary for online catalog searching: conceptual knowledge of the information retrieval process in general, semantic knowledge of how to query the particular system, and technical knowledge, including basic computer skills. Semantic and technical knowledge differ here in roughly the way semantic and syntactic knowledge differ in computer science. The study also covers specific concepts like actions, access points, search terms, Boolean logic, and file organization. In the short term, Borgman recommends training and help facilities to give users the skills they need for current systems. In the long run, though, libraries must apply the findings of information-seeking research if they are ever going to create usable interfaces.

The study points out a number of reasons why online catalogs are difficult for users, whether because they lack computer skills or semantic knowledge. One good example comes from a common type of query language. Even if the user knows that “FI” means “find” and “AU” means author, they may not know whether to type “FI AU ROBERT M. HAYES,” “FI AU R M HAYES,” or “FI AU HAYES, ROBERT M,” or how the results will differ. Unfortunately, the article lacks clear instructions or examples of how to make the systems better. The conclusion that different types of training materials could be helpful seems to me like a bandage rather than a cure.
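As an illustration of the kind of fix the article never gets to, here is a minimal sketch (my own, not from Borgman) of how a catalog could collapse those author-name variants into a single key before matching, so the user’s choice of form stops mattering:

    import re

    def author_key(query: str) -> tuple[str, ...]:
        """Collapse author-name variants into one comparable key: (SURNAME, INITIAL, ...)."""
        cleaned = re.sub(r"\.", " ", query).upper()
        if "," in cleaned:
            # "HAYES, ROBERT M" -- surname given explicitly before the comma
            surname, given = cleaned.split(",", 1)
        else:
            # "ROBERT M. HAYES" or "R M HAYES" -- assume the surname comes last
            parts = cleaned.split()
            surname, given = parts[-1], " ".join(parts[:-1])
        initials = tuple(word[0] for word in given.split())
        return (surname.strip(),) + initials

    # All three forms from the example map to the same key, ('HAYES', 'R', 'M'):
    assert author_key("ROBERT M. HAYES") == ("HAYES", "R", "M")
    assert author_key("R M HAYES") == ("HAYES", "R", "M")
    assert author_key("HAYES, ROBERT M") == ("HAYES", "R", "M")

The specifics are beside the point; what matters is that the system, rather than the user, can absorb the syntactic variation.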

I think a lot of the criticisms still hold, but modern cataloging and searching systems have become easier. I’m not so sure it’s because catalog designers have started applying information-seeking research in their interfaces, though. It almost seems like library systems are being made easier in self-defense: users are getting more and more accustomed to a Google or Yahoo style interface, a simple search box that searches full text and uses advanced algorithms to find relevant results. I think part of this is because people in the library field have experience with complicated, powerful structured search systems and are used to a lot of manual encoding of records. Web developers, lacking this background, have been freer to think in terms of searching massive amounts of unstructured data and automating the collection and indexing process. I also think simple things, such as showing results with summaries of each item in a scrollable, clickable list, have done a great deal to support the information-seeking process. Features like search history, “back” and “forward” buttons, “search within these results,” and automatic spell checking are becoming pretty standard as well.

Notes: Looking for information

Case, D.O. (2002). Looking for information: A survey of research on information seeking, needs, and behavior. New York: Academic Press. Chapter 9: Methods: Examples by type.

In this chapter, Case reviews the different methodologies employed in research on information seeking, use, and sense-making. Although he notes a few studies that cast a wide net and report overall proportions, the chapter is not a survey of all the literature; instead it gathers representative examples of each type of research. The types covered include case studies, formal and field experiments, mail and Internet surveys, face-to-face and phone interviews, focus groups, ethnographic and phenomenological studies, diaries, historical studies, and content analysis, along with multiple-method studies and meta-analyses. Case also writes about the limitations of the different methodologies; case studies, for example, involve a limited set of variables, focus on one item or event to the exclusion of others, and are limited in time as well. He concludes that most studies assume people make rational choices and that specific variables matter more than context. Qualitative methods are becoming more popular, but their findings cannot be generalized.

The author did a particularly good job of finding studies to examine. The best example of this is the experiments: very few laboratory experiments have been conducted specifically on information use, but there have been many on consumer behavior, and here Case includes consumer behavior studies that involved information gathering for decision making. Another choice I found particularly interesting was the historical research by Colin Richmond, which looked at the dissemination of information in England during the Hundred Years’ War. Usually when I think of historical research in social science, I think of things like comparing content analyses of newspapers from the 1950s and today. It was interesting to see things from a historian’s point of view, and it was also a good reminder that people did not just start needing information with the invention of the Internet. A good, though dense, book on this topic is A Social History of Knowledge by Peter Burke.

The most immediate application of this chapter is in suggesting methodologies to use in different situations. When I’m doing research, I tend to have a bias toward sources that conducted experiments or survey research. Reading through these cases reminded me of the usefulness of approaches like case studies and content analysis. Another interesting application of the chapter is in suggesting topics for further study. Although the author doesn’t really build to any general conclusion on the research topics at hand (there is no overall theme to the research), looking at the conclusions of the different types of studies suggests some interesting questions. For example, since the study by Covell, Uman, and Manning suggested that doctors report using books or journals first but in reality turn to colleagues first, how should we reexamine studies that relied on self-reporting, such as the case study or the surveys? Perhaps some of the tactics used in the consumer research experiments would be a valuable addition.

Notes: Helping people find what they don’t know

Belkin, N.J. (2000). Helping people find what they don’t know. Communications of the ACM, 43(8): 58-61.

In this article, Belkin argues that people generally start searching for information precisely when they don’t know much about a subject. It is therefore problematic that many search systems require knowledge of the domain in order to get good results, for example when users know neither the specific keywords nor the controlled vocabulary of the system. His group feels that the best way around this is for the system to make suggestions along the way. Two techniques can be used: system-controlled, where the user’s query is enhanced automatically by the system using measures like word frequency, and user-controlled, where the user is given the results of their query along with suggestions for making it more effective. The author’s team found that suggestions were most effective when the user could control which suggestions were applied, knew how the suggestions were generated, and was comfortable with the results.
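To make the two techniques concrete, here is a rough sketch of how the hybrid might look, with the system deriving candidate terms from word frequency in the top results and the user deciding which to apply. The function and its simplifications (no stemming, a token stopword list) are my own, not Belkin’s:

    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}

    def suggest_terms(query: str, top_documents: list[str], k: int = 5) -> list[str]:
        """System-controlled part: rank candidate expansion terms by how often
        they occur in the top-ranked results, skipping stopwords and words
        already in the query."""
        query_words = set(query.lower().split())
        counts = Counter()
        for doc in top_documents:
            for word in re.findall(r"[a-z]+", doc.lower()):
                if word not in STOPWORDS and word not in query_words:
                    counts[word] += 1
        return [word for word, _ in counts.most_common(k)]

    # User-controlled part: show the suggestions, let the user accept or
    # reject each one, then rerun the query with the accepted terms added.
    docs = ["Online catalogs and subject headings", "Subject access in online catalogs"]
    print(suggest_terms("catalog search", docs))  # ['online', 'catalogs', 'subject', ...]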

The author’s findings seem both intuitive and promising. It makes sense that, in an interactive structured search system, giving users suggestions and allowing them to take them or leave them would work well, and that the suggestions should be neither bizarre nor mysterious. But with the rise of the World Wide Web, I think it’s pretty clear that users with less domain knowledge prefer less-structured searching environments. In my experience, users who are new to a system will type unstructured keyword queries into anything that even looks like a search box, even if it is clearly labeled as a field for author name, product code, or start date. Power users, on the other hand, often have more knowledge about the data than the system’s programmers, so for these sorts of suggestions to be useful, the algorithm would need to do more than just call up synonyms. The article makes it clear that these findings are early, so I would be interested to see what the group has come up with since 2000.

These ideas could be applied to both structured and unstructured searching environments, though my guess is that they would be easier to implement in more structured environments, where the structure of the system can be used to generate the suggestions. There certainly have been a number of projects that have tried to provide something like this for general web searching: rudimentary systems like Google Suggest (a simple form of which is sketched below) and more advanced ones like Teoma show off the potential. Notice, however, that neither of these has exactly taken the search engine industry by storm, which suggests people are happy to muddle along with plain keyword searching and advanced ranking algorithms. I do wonder whether the finding that users liked to have some idea of how suggestions were generated would apply here as well: would users be happier with Google if they were told why PageRank picked a certain site as the number one result? Since the algorithms used by Google, Yahoo, MSN, and others are trade secrets, I doubt we’ll see anything like that in the near future. On the other hand, Amazon.com’s recommendation engine does tell the user why a certain book was suggested and allows the user to remove certain suggestions. Although it is not really a search tool, it follows the precepts discussed here and seems to be successful.
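For what it’s worth, the rudimentary end of this spectrum takes very little machinery. The toy sketch below (the query log and names are invented for illustration) completes a prefix against a sorted list of past queries, which is roughly the shape of a Google Suggest style feature:

    import bisect

    # A hypothetical log of past queries, kept sorted so prefixes can be
    # located with binary search.
    query_log = sorted([
        "information retrieval", "information architecture",
        "information seeking", "online catalogs", "boolean search",
    ])

    def complete(prefix: str, limit: int = 3) -> list[str]:
        """Return up to `limit` logged queries that start with `prefix`."""
        start = bisect.bisect_left(query_log, prefix)
        matches = []
        for q in query_log[start:]:
            if not q.startswith(prefix):
                break
            matches.append(q)
            if len(matches) == limit:
                break
        return matches

    print(complete("information "))
    # ['information architecture', 'information retrieval', 'information seeking']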