Archive for the ‘Academic Papers’ Category


Notes: Looking for information

Monday, June 20th, 2005

Case, D.O. (2002). Looking for information: A survey of research on information seeking, needs, and behavior.  New York: Academic Press.  Chapter 9: Methods: Examples by type.

In this chapter Case reviews the different methodologies employed in research on information seeking, use, and sense-making. Although he notes a few broad studies that cast a wide net to find overall proportions, the chapter is not a survey of all the literature. It instead gathers relevant examples of different types of research. The types of research included case studies, formal and field experiments, mail and Internet surveys, face-to-face and phone interviews, focus groups, ethnographic and phenomenological studies, diaries, historical studies, and content analysis. There were also multiple-method studies and meta-analyses. Case writes about some of the limitations of the different methodologies. For example, case studies have limited variables, focusing on one item or event to the exclusion of others, and they are limited in terms of time as well. The author concludes that most studies assume people make rational choices and that specific variables are more important than context. Qualitative measures are becoming more popular, but their results cannot be generalized.

The author did a particularly good job of finding studies to examine. The best example of this is the experiments. Very few laboratory experiments have been conducted specifically on information use, but there have been many on consumer behavior, and here we see consumer behavior studies that involved information gathering for decision making. Another choice I found particularly interesting was the historical research by Colin Richmond that looked at the dissemination of information in England during the Hundred Years’ War. Usually when I think of historical research in social science I think of things like comparing content analysis of newspapers of the 1950s and today. It was interesting to see things from a historian’s point of view, and also a good reminder that people did not just start needing information with the invention of the Internet. A good, though dense, book on this topic is A Social History of Knowledge by Peter Burke.

The most immediate application of this chapter is in suggesting methodologies to use in different situations. When I’m doing research, I tend to have a bias toward sources that conducted experiments or did survey research. Reading through these cases reminded me of the usefulness of things like case studies and content analysis. Another interesting application of the chapter is in suggesting topics for further study. Although the author doesn’t really build to any general conclusion on the research topics at hand (there is no overall theme to the research), looking at the different conclusions of the different types of studies suggests some interesting questions. For example, since the study by Covell, Uman and Manning suggested that doctors report using books or journals first but in reality turn to colleagues first, how can we reexamine the studies that relied on self-reporting, such as the case study or the surveys? Perhaps some of the tactics used in the consumer research experiments would be a valuable addition.

Notes: Helping people find what they don’t know

Tuesday, June 14th, 2005

Nicolas J. Belkin, Helping people find what they don’t know, Communications of the ACM, v.43 n.8, p.58-61, Aug. 2000

In this article, Belkin argues that people generally start searching for information when they don’t know much about a subject. It is therefore problematic that many search systems require knowledge of the domain in order to get good results, for example when users know neither the specific keywords nor the controlled vocabulary of the system. His group feels that the best way around this is for the system to make suggestions along the way. There are two techniques that can be used: system-controlled, where the user’s query is enhanced automatically by the system using algorithms based on measures such as word frequency, and user-controlled, where the user is given the results of their query along with suggestions to make it more effective. The author’s team found that suggestions were most effective when the user was able to control which suggestions were used and when the user knew how the suggestions were generated and was comfortable with the results.
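To make the difference between the two techniques concrete, here is a minimal Python sketch. It is not Belkin's system: the toy documents, the stopword list, and the function names are all assumptions made up for illustration, and the "word frequency" measure is simply a count of words that co-occur with the query terms.

```python
from collections import Counter

# Hypothetical toy collection and stopword list, for illustration only.
DOCS = [
    "radiation exposure treatment for first responders",
    "radiation shielding and exposure limits",
    "treatment of burns in emergency rooms",
]
STOPWORDS = {"for", "and", "of", "in", "the"}


def suggest_terms(query, docs, k=3):
    """Suggest up to k terms that co-occur with the query terms, by frequency."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        if query_terms & set(words):  # the document matches at least one query term
            counts.update(
                w for w in words if w not in query_terms and w not in STOPWORDS
            )
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def system_controlled(query, docs, k=3):
    """System-controlled: every suggestion is silently added to the query."""
    return query.split() + suggest_terms(query, docs, k)


def user_controlled(query, docs, accepted, k=3):
    """User-controlled: the user sees the suggestions and keeps only some."""
    suggestions = suggest_terms(query, docs, k)
    expanded = query.split() + [t for t in suggestions if t in accepted]
    return expanded, suggestions
```

In the user-controlled version the suggestions are returned alongside the expanded query, so an interface could show them and explain that they came from co-occurrence counts, which is the kind of transparency the article's findings favor.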

The author’s findings seem both intuitive and promising. It makes sense that in an interactive structured searching system giving the user suggestions and allowing them to take them or leave them would work well, and the suggestions should be neither bizarre nor mysterious. But with the rise of the World Wide Web, I think it’s pretty clear that users with less domain knowledge prefer less-structured searching environments. In my experience, users who are new to a system will type unstructured keyword queries into anything that even looks like a search box, even if it is clearly labeled as a field for author name, product code, or start date. Power users, on the other hand, often have more knowledge about the data than the system’s programmers, so for these sorts of suggestions to be useful, the algorithm would need to do more than just call up synonyms. The article makes it clear that these findings are early, so I would be interested to see what they have come up with since 2000.

These ideas could be applied to both structured and unstructured searching environments, though my guess is that they would be easier to implement in more structured environments because the structure of the system can be used to generate the suggestions. There certainly have been a number of projects which have tried to provide something like this with general web searching. Rudimentary systems like Google Suggest or more advanced ones like Teoma show off the potential. Notice, however, that neither of these has exactly taken the search engine industry by storm, meaning people are apparently happy to muddle along with plain keyword searching and advanced ranking algorithms. I do wonder if their finding that users liked to have some idea about how suggestions were found would apply here as well: would users be happier with Google if they were told why PageRank picked a certain site as the number one result? Since the algorithms used by Google, Yahoo, MSN and others are trade secrets I doubt we’ll see anything like that in the near future. On the other hand, Amazon.com’s recommendation engine does tell the user why a certain book was suggested, and allows the user to remove certain suggestions. Although it is not really a search tool, it follows the precepts discussed here and seems to be successful.

The information economics of price aggregation web sites

Monday, May 2nd, 2005

Introduction

Just as the Internet has had an impact on the market for information goods and services, it has also had an impact on the information necessary for markets to function. Perfectly competitive markets, upon which models of economics are based, require four key characteristics:

  • Many sellers.
  • Nearly identical products.
  • Easy market entry (and exit).
  • Buyers and sellers have perfect information.

The last point is possibly the most difficult. Good information is hard to come by, let alone perfect information, for both buyers and sellers. Buyers are perhaps at a disadvantage, but the rise of the online marketplace and specifically price aggregating web sites has created an interesting change.

(more…)

You and your third dimension… it’s cute. Beneath the surface of Aqua Teen Hunger Force’s Mooninites

Friday, December 10th, 2004

Cartoon Network’s Adult Swim lineup of shows has become a real force in pop culture. Its ratings now demolish late night mainstays like The Tonight Show and Late Show With David Letterman among 18- to 24-year-olds (by 24 and 56 percent, respectively).1 Aqua Teen Hunger Force, created by Matt Maiellaro and Dave Willis, is an illustrative example of the kind of programming drawing viewers from more traditional fare to Cartoon Network. In the show, animated anthropomorphic fast food items Frylock, Master Shake and Meatwad deal with an equally colorful array of enemies, including the alien Mooninites, Ignignokt and Err. The three protagonists live in a house in New Jersey, next door to Carl, their human and not particularly friendly neighbor.

 

The show has recurring characters but little in the way of overarching themes, continuity, or logic. It commonly employs foul language (although the worst of it is beeped), explosions, and gross-out humor. It would be easy to dismiss it as yet another artifact of the steady decline of western civilization – although that attitude is probably premature. People have been bemoaning the decline of civilization at least since Socrates was put to death for corrupting the youth.2 There is more to this show than a surface reading would betray, and the characters of the Mooninites provide a good example of why.

 

The Mooninites are very popular among the show’s fans. Proof can be found in online discussion forums – in one, they are voted funniest villains by four out of nine posters.3 The characters were obviously inspired by early arcade and Atari games. Their spaceship, for example, would fit in perfectly in Space Invaders, and the sounds made when they walk, jump, or fire their lasers seem to come directly from games like Pac-Man. Their bodies are squared and pixelated, as if they were rendered with limited processing power. The theme of alien enemies descending randomly from space is seen in many classic games, from Space Invaders to Galaga.

 (more…)

Metadata Schema for Radiological Terrorism Research

Friday, April 30th, 2004

Note: this was a project for a graduate course in Knowledge Organization Systems

 

Metadata schema for radiological terrorism research (MSRTR)

Terrorism research is a complex field dealing with a number of entities, each with its own metadata requirements. This document is an introduction to the kinds of schema that will be necessary for proper cataloging, identification, and retrieval in the radiological terrorism subfield. Schema for radioactive material sources and radiological terrorism responses are presented below, followed by sample records and a crosswalk between the two schema and the Dublin Core. Schema were made as simple as possible (8 and 6 main fields, with several qualifiers, respectively) in order to make application quick, easy and consistent.
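For readers unfamiliar with crosswalks, the idea can be sketched as a simple lookup table from one schema's fields to another's. In the Python sketch below the Dublin Core element names (title, description, coverage, date, identifier) are real, but every source field name is a hypothetical placeholder and not a field from the MSRTR schema itself.

```python
# Dublin Core element names are real; every source field name below is a
# hypothetical placeholder, not a field taken from the actual schema.
CROSSWALK = {
    "source_name": "title",
    "source_description": "description",
    "source_location": "coverage",
    "record_date": "date",
    "record_id": "identifier",
}


def to_dublin_core(record, crosswalk=CROSSWALK):
    """Translate a record into Dublin Core, collecting fields with no mapping."""
    mapped, unmapped = {}, {}
    for field_name, value in record.items():
        if field_name in crosswalk:
            mapped[crosswalk[field_name]] = value
        else:
            unmapped[field_name] = value
    return mapped, unmapped
```

Returning the unmapped fields separately makes the usual cost of a crosswalk visible: anything the target schema has no element for is lost unless it is handled some other way.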

Fields are described in the following format:

(more…)

Ontology for Radiological Terrorism Research

Friday, April 30th, 2004

Domain

The ontology was created from the Radiological Terrorism Research Thesaurus, specifically constrained to the portions under the term “material sources” and “consequence management” (now called response). Other classes not found in these areas, but referenced by fields in these areas, are included, but not developed—this includes Organization, Event, Expertise, Person, and Material and their subclasses.

Background

Terrorism is an incredibly important issue, and agencies within the US and worldwide need to meet the challenge of compiling and organizing research in a number of fields in order to counter this very real threat. In addition, agencies have been criticized in the past for not sharing information, or maintaining knowledge organization systems (KOS) which are incompatible with each other. Work is often duplicated, and often vital information will be unavailable to some agencies even though it has already been archived by others.

Clearly, there is a need for a large-scale KOS that can be used to organize information efficiently and correctly, allow for complex analysis of information, and allow for easy knowledge sharing between agencies. The most flexible and powerful KOS, and therefore the most appropriate, is an ontology. Classes, subclasses and relationships are developed and then appropriate fields are created for each. This allows for faceted search and display, automated search, hierarchical organization of information, and interoperability with other systems.

Users

This is just a sample of the larger, more complete ontology. The complete ontology would be useful for virtually any person or agency dealing with anti-terrorism, counterterrorism, intelligence or consequence management. The ontology will allow risk assessment officers, for example, to see a list of every high-level material source in the United States and Canada and their coordinates. Medical first responders could use it to catalog and retrieve proper treatments for specific bioterrorism agents. And if widely adopted, it would greatly reduce the barriers to efficient knowledge-sharing. If the Department of Energy were to license a new uranium mine in Montana, the information would be immediately available to risk-assessment officers, instead of requiring time for the paperwork to make its way over to the Department of Homeland Security.

 

View and navigate the ontology

Notes on “Ontologies Come of Age”

Friday, April 30th, 2004

Ontologies Come of Age, Deborah L. McGuinness (2002)

One thing I noticed about this reading is the ample use of examples. If you look through all of the points below “Structured Ontologies and Their Uses” you can see what I mean. I find that to be a big problem with a lot of the things I’ve read about ontologies or the semantic web—there’s a lot of terminology and very little illustration. So in that regard, this was a good reading.

On the other hand, the more I play with Protege and read about ontologies, the more it seems to me that all the information science and library science people are moving closer and closer to the way relational databases work, without actually knowing it. For example, each of the classes in an ontology could be thought of as a table. The class/subclass relationship is like a one-to-many foreign key relationship, and since you can have more than one parent for any particular class you can have many-to-one and many-to-many relationships as well. Each of the fields or “slots” is just like a field in a relational table. There are only a few ways in which ontologies and relational databases differ, and they’re only really cosmetic differences. Relational databases have no notion of inheritance, for example, so fields for a table called “Thing” are not passed down to other tables that have “thing_id” as a foreign key. But database applications and users create views which join the tables and do something similar. Also, Protege allows you to use a class or an instance as the type for a slot, whereas in relational databases it really only makes sense to use an instance.
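The point about views standing in for inheritance can be sketched with Python's built-in sqlite3 module. The table and column names here are assumptions chosen for illustration ("thing" and "thing_id" echo the example above, and "organization" borrows a class name from the ontology post); this is one way to emulate inheritance, not a description of how Protege stores anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "Parent class": fields shared by every kind of thing.
    CREATE TABLE thing (
        thing_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    -- "Subclass": only its own fields, plus a foreign key to its parent row.
    CREATE TABLE organization (
        organization_id INTEGER PRIMARY KEY,
        thing_id        INTEGER NOT NULL REFERENCES thing(thing_id),
        country         TEXT
    );
    -- The view joins the tables so the subclass appears to inherit "name".
    CREATE VIEW organization_full AS
        SELECT o.organization_id, t.name, o.country
        FROM organization AS o
        JOIN thing AS t ON t.thing_id = o.thing_id;

    INSERT INTO thing (thing_id, name) VALUES (1, 'Department of Energy');
    INSERT INTO organization (organization_id, thing_id, country)
        VALUES (10, 1, 'US');
""")

# Querying the view returns the inherited field next to the subclass field.
rows = conn.execute("SELECT name, country FROM organization_full").fetchall()
```

The "name" column lives only in the parent table, yet anyone querying organization_full sees it as if it belonged to the organization, which is roughly what slot inheritance gives you in an ontology editor.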

There must be other people who have noticed this, and since a lot of web pages have relational database back-ends I have to assume semantic web pages will as well.

Notes on “Metadata Principles and Practicalities”

Friday, April 30th, 2004

Metadata Principles and Practicalities, Duval, Erik, Wayne Hodgins, Stuart Sutton, Stuart L. Weibel (2002).

This was a pretty straightforward reading. I did like the Lego metaphor for metadata, but it would be nice if it were elaborated on a little bit more. So, kids have no problem combining space ship Lego parts with medieval castle parts, but just because it is possible to do so, is it beneficial or useful in any way? I understand they are just talking about modularity as a quality at that point, but it also gives the impression that metadata schema, properly constructed, can be mixed and matched willy-nilly.

One thing I have not seen discussed very much is where the line is drawn between metadata and regular data. For example, most schema have some sort of author/creator field. I see how the author could be data describing an article, but if you look at articles in a journal or on a web page, the author is almost always presented along with the body text. A clearer example of what I’m trying to say is an article’s abstract. I’ve seen some schema that have the abstract as a piece of metadata, but does that mean it is not part of the data (the article) itself? Or is it both?

It all depends on your point of view. From a database designer’s perspective, all of these metadata fields are just data, and metadata is what describes the structure of the database: field types, lengths, foreign key relationships, etc. I’m not saying that everything should be stored in one big lump. I guess I’m just concerned that depending on the point of view, metadata could mean anything. Really, all of this is just a matter of properly separating different data elements from each other. Obviously author/creator should be a separate field from article body text. And date, titles, etc. should be separate as well. But that doesn’t really separate them from the thing itself; they are still all aspects of the thing.
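The database designer's view can be shown in a few lines of Python using the built-in sqlite3 module. The article table and its columns are hypothetical; the only real API used is SQLite's PRAGMA table_info, which reports a table's structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE article (
        article_id INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        author     TEXT,
        abstract   TEXT,
        body       TEXT
    )
""")
conn.execute(
    "INSERT INTO article (title, author, abstract, body) VALUES (?, ?, ?, ?)",
    ("Example title", "Example author", "Example abstract", "Example body text"),
)

# Data: what the table holds, including the "metadata" fields of a schema.
data = conn.execute("SELECT title, author, abstract FROM article").fetchall()

# Metadata in the designer's sense: the structure of the table itself.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(article)")]
```

Here title, author, and abstract come back from an ordinary SELECT like any other data, while the column names and types come from a separate query about the table's structure.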

A Thesaurus for Radiological Terrorism Research

Thursday, April 15th, 2004

Changes in this Edition

A number of changes have been made in this revision. Changes to scope notes, terms, and related terms are highlighted throughout this document. These changes should clarify the precise meaning and use. Structural changes to broader and narrower term relationships are explained below.

One of the major structural changes is the removal of “radiological terrorism” as a root word for the entire thesaurus. Putting everything under one term was not my initial idea, but the use of the hierarchical display for both input and output led me to think that was the preferred structure. I have removed “combating radiological terrorism,” “environmental effects,” “radiation protection,” “radioactive isotopes,” “radioactive material sources,” and “radiological injuries” from under “radiological terrorism.”

Still, I think “radiological terrorism goals,” “radiological terrorism scenarios,” and “radiological terrorism requirements” are necessary parts of “radiological terrorism,” so I have kept the first two in the hierarchy and added the third. This leads to multiple inheritance for “radiological terrorism requirements,” which is a necessary part of both “radiological terrorism” and “intelligence.”

Introduction

The CTRS Radiological Terrorism Thesaurus contains descriptive terms used throughout radiological terrorism literature. The terms, their relationships, and their use were culled from several documents, including:

The thesaurus is presented in three forms: first, an alphabetical display of all included terms, including preferred terms and synonyms, broader, narrower and related terms, and any scope notes; second, a hierarchical display of preferred terms only; and third, a rotated display of all terms.

Several relationships may be defined for any term in the thesaurus. Scope Notes (SN) are more detailed descriptions of a term’s use when necessary. A preferred term (USE) is a synonym for the term that has been selected for most uses—non-preferred terms do not show up in the hierarchical view. A non-preferred term (UF) is a synonym that may be found in the literature but is not used in the hierarchy. Broader terms (BT) are terms that represent more general classes of the current term. Narrower terms (NT) represent more specific instances or parts of the current term. Finally, related terms (RT) are related to the current term but not in any of the ways already noted.
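The relationships above map naturally onto a small data structure. The following Python sketch is only an illustration under my own assumptions: the Term class, its attribute names, and the build and hierarchy helpers are hypothetical and are not how the thesaurus was actually produced. The term names are the ones discussed under "Changes in this Edition."

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    scope_note: str = ""                          # SN
    use: str = ""                                 # USE: set only on non-preferred terms
    used_for: list = field(default_factory=list)  # UF
    broader: list = field(default_factory=list)   # BT
    narrower: list = field(default_factory=list)  # NT
    related: list = field(default_factory=list)   # RT


def build(pairs):
    """Build a thesaurus from (narrower, broader) pairs, keeping BT and NT in sync."""
    terms = {}
    for child, parent in pairs:
        for name in (child, parent):
            terms.setdefault(name, Term(name))
        terms[child].broader.append(parent)
        terms[parent].narrower.append(child)
    return terms


def hierarchy(terms, root, depth=0):
    """Return the hierarchical display under root as a list of indented lines."""
    lines = ["  " * depth + root]
    for child in sorted(terms[root].narrower):
        lines.extend(hierarchy(terms, child, depth + 1))
    return lines


thesaurus = build([
    ("radiological terrorism goals", "radiological terrorism"),
    ("radiological terrorism scenarios", "radiological terrorism"),
    ("radiological terrorism requirements", "radiological terrorism"),
    ("radiological terrorism requirements", "intelligence"),
])
```

Because "radiological terrorism requirements" has two broader terms, it shows up in the hierarchical display under both "radiological terrorism" and "intelligence," which is the multiple inheritance described earlier.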

View the Thesaurus [pdf]

Notes on the UMLS Semantic Network

Thursday, April 8th, 2004

UMLS Semantic Network

This was a really interesting reading. Not interesting like a novel or movie, of course, but interesting because I keep hearing about semantic webs without seeing any worthwhile examples.

One thing I was a little surprised to see was the ASCII codes for creating a flat-file database. I would have thought they would have specified it either in XML or in something a little more modern. And I kind of cringe whenever I hear anyone call an ASCII file a database. Even though it’s technically true, to me ‘database’ means database management system (DBMS), with some mechanisms put into place to allow multiple users, referential integrity, etc. If everything is stored in ASCII files, then any idiot can ruin the whole system and it’s really, really easy to let data get corrupted. You have to do all sorts of extra work to make sure updates to one field cascade through the rest of the file.
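To show what a DBMS gives you that a flat file does not, here is a small Python sketch using the built-in sqlite3 module. The table names, columns, and codes are made up for illustration and do not reflect the actual UMLS file layout; the point is only that the database, not the user, keeps references consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE semantic_type (
        code TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        type_code  TEXT NOT NULL
            REFERENCES semantic_type(code) ON UPDATE CASCADE
    );
    INSERT INTO semantic_type VALUES ('T1', 'Example type');
    INSERT INTO concept VALUES (1, 'Example concept', 'T1');
""")

# Changing the code in one place updates every row that refers to it.
conn.execute("UPDATE semantic_type SET code = 'T2' WHERE code = 'T1'")
updated = conn.execute("SELECT type_code FROM concept").fetchone()[0]

# A reference to a code that does not exist is rejected outright.
try:
    conn.execute("INSERT INTO concept VALUES (2, 'Orphan', 'T9')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

With a flat ASCII file, both the cascading update and the check for dangling references would have to be written and run by hand.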

Even though this was designed specifically for the medical field, it’s surprising how much their relationships and semantic types could be useful for almost any semantic web. I could only think of a few relationships that were missing, the chief one being requires. This is a relationship that’s very common in computer science, but I think it might apply fairly often in other fields as well. When an entity requires another it cannot exist without it. It’s kind of a mix between part_of and precedes.