Archive for the ‘Academic Papers’ Category

Ontology for Radiological Terrorism Research

Friday, April 30th, 2004

Domain

The ontology was created from the Radiological Terrorism Research Thesaurus, specifically constrained to the portions under the terms “material sources” and “consequence management” (now called response). Other classes not found in these areas, but referenced by fields in these areas, are included but not developed: Organization, Event, Expertise, Person, and Material and their subclasses.

Background

Terrorism is an incredibly important issue, and agencies within the US and worldwide need to meet the challenge of compiling and organizing research in a number of fields in order to counter this very real threat. In addition, agencies have been criticized in the past for not sharing information, or maintaining knowledge organization systems (KOS) which are incompatible with each other. Work is often duplicated, and often vital information will be unavailable to some agencies even though it has already been archived by others.

Clearly, there is a need for a large-scale KOS that can be used to organize information efficiently and correctly, allow for complex analysis of information, and allow for easy knowledge sharing between agencies. The most flexible and powerful KOS, and therefore the most appropriate, is an ontology. Classes, subclasses and relationships are developed and then appropriate fields are created for each. This allows for faceted search and display, automated search, hierarchical organization of information, and interoperability with other systems.

Users

This is just a sample of the larger, more complete ontology. The complete ontology would be useful for virtually any person or agency dealing with anti-terrorism, counterterrorism, intelligence or consequence management. The ontology will allow risk assessment officers, for example, to see a list of every high-level material source in the United States and Canada and their coordinates. Medical first responders could use it to catalog and retrieve proper treatments for specific bioterrorism agents. And if widely adopted, it would greatly reduce the barriers to efficient knowledge-sharing. If the Department of Energy were to license a new uranium mine in Montana, the information would be immediately available to risk-assessment officers, instead of requiring time for the paperwork to make its way over to the Department of Homeland Security.

 

View and navigate the ontology

Notes on “Ontologies Come of Age”

Friday, April 30th, 2004

Ontologies Come of Age, Deborah L. McGuinness (2002)

One thing I noticed about this reading is the ample use of examples. If you look through all of the points below “Structured Ontologies and Their Uses” you can see what I mean. I find that to be a big problem with a lot of the things I’ve read about ontologies or the semantic web—there’s a lot of terminology and very little illustration. So in that regard, this was a good reading.

On the other hand, the more I play with Protege and read about ontologies, the more it seems to me that all the information science and library science people are moving closer and closer to the way relational databases work, without actually knowing it. For example, each of the classes in an ontology could be thought of as a table. The class/subclass relationship is like a one-to-many foreign key relationship, and since you can have more than one parent for any particular class you can have many-to-one and many-to-many relationships as well. Each of the fields or “slots” is just like a field in a relational table. There are only a few ways in which ontologies and relational databases differ, and they’re only really cosmetic differences. Relational databases have no notion of inheritance, for example, so fields for a table called “Thing” are not passed down to other tables that have “thing_id” as a foreign key. But database applications and users create views which join the tables and do something similar. Also, Protege allows you to use a class or an instance as the type for a slot, whereas in relational databases it really only makes sense to use an instance.
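To make the comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample row are my own assumptions for illustration, not part of the actual ontology: a parent "class" table, a "subclass" table keyed on thing_id, and a view that stands in for inheritance.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Parent "class": fields that every Thing has.
    CREATE TABLE thing (
        thing_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    -- "Subclass": only its own fields, plus a foreign key to the parent.
    CREATE TABLE material_source (
        thing_id  INTEGER PRIMARY KEY REFERENCES thing(thing_id),
        latitude  REAL,
        longitude REAL
    );
    -- The view does by hand what inheritance does automatically:
    -- it exposes the parent's fields alongside the subclass's fields.
    CREATE VIEW material_source_full AS
        SELECT t.thing_id, t.name, m.latitude, m.longitude
        FROM material_source AS m
        JOIN thing AS t ON t.thing_id = m.thing_id;
""")

conn.execute("INSERT INTO thing (thing_id, name) VALUES (1, 'Example mine')")
conn.execute(
    "INSERT INTO material_source (thing_id, latitude, longitude) VALUES (1, 47.0, -110.0)"
)

row = conn.execute(
    "SELECT name, latitude, longitude FROM material_source_full"
).fetchone()
print(row)
```

The view is the part doing the work here: the subclass table never stores name, but anything querying material_source_full sees it as if it had been inherited.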

There must be other people who have noticed this, and since a lot of web pages have relational database back-ends I have to assume semantic web pages will as well.

Notes on “Metadata Principles and Practicalities”

Friday, April 30th, 2004

Metadata Principles and Practicalities, Duval, Erik, Wayne Hodgins, Stuart Sutton, Stuart L. Weibel (2002).

This was a pretty straightforward reading. I did like the Lego metaphor for metadata, but it would be nice if it were elaborated on a little bit more. So, kids have no problem combining space ship Lego parts with medieval castle parts, but just because it is possible to do so, is it beneficial or useful in any way? I understand they are just talking about modularity as a quality at that point, but it also gives the impression that metadata schema, properly constructed, can be mixed and matched willy-nilly.

One thing I have not seen discussed very much is where the line is drawn between metadata and regular data. For example, most schema have some sort of author/creator field. I see how the author could be data describing an article, but if you look at articles in a journal or on a web page, the author is almost always presented along with the body text. A clearer example of what I’m trying to say is an article’s abstract. I’ve seen some schema that have the abstract as a piece of metadata, but does that mean it is not part of the data (the article) itself? Or is it both?

It all depends on your point of view. From a database designer’s perspective, all of these metadata fields are just data, and metadata is what describes the structure of the database: field types, lengths, foreign key relationships, etc. I’m not saying that everything should be stored in one big lump. I guess I’m just concerned that depending on the point of view, metadata could mean anything. Really, all of this is just a matter of properly separating different data elements from each other. Obviously author/creator should be a separate field from article body text. And date, titles, etc. should be separate as well. But that doesn’t really separate them from the thing itself; they are still all aspects of the thing.

A Thesaurus for Radiological Terrorism Research

Thursday, April 15th, 2004

Changes in this Edition

A number of changes have been made in this revision. Changes to scope notes, terms, and related terms are highlighted throughout this document. These changes should clarify the precise meaning and use of each term. Structural changes to broader and narrower term relationships are explained below.

One of the major structural changes is the removal of “radiological terrorism” as a root word for the entire thesaurus. Putting everything under one term was not my initial idea, but the use of the hierarchical display for both input and output led me to think that was the preferred structure. I have removed “combating radiological terrorism,” “environmental effects,” “radiation protection,” “radioactive isotopes,” “radioactive material sources,” and “radiological injuries” from under “radiological terrorism.”

Still, I think “radiological terrorism goals,” “radiological terrorism scenarios,” and “radiological terrorism requirements” are necessary parts of “radiological terrorism,” so I have kept the first two in the hierarchy and added the third. This leads to multiple inheritance for “radiological terrorism requirements,” which is a necessary part of both “radiological terrorism” and “intelligence.”

Introduction

The CTRS Radiological Terrorism Thesaurus contains descriptive terms used throughout radiological terrorism literature. The terms, their relationships, and their use were culled from several documents, including:

The thesaurus is presented in three forms: first, an alphabetical display of all included terms, including scope notes, preferred terms and synonyms, and broader, narrower and related terms; second, a hierarchical display of preferred terms only; and third, a rotated display of all terms.

Several relationships may be defined for any term in the thesaurus. Scope Notes (SN) are more detailed descriptions of a term’s use when necessary. A preferred term (USE) is a synonym for the term that has been selected for most uses—non-preferred terms do not show up in the hierarchical view. A non-preferred term (UF) is a synonym that may be found in the literature but is not used in the hierarchy. Broader terms (BT) are terms that represent more general classes of the current term. Narrower terms (NT) represent more specific instances or parts of the current term. Finally, related terms (RT) are related to the current term but not in any of the ways already noted.
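As a rough illustration of how these relationships fit together, a term record could be sketched in Python as below. This is only a sketch under my own assumptions: the Term class and its field names are invented here, and the non-preferred entry "radiological terrorism needs" is a made-up example rather than a term from the thesaurus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    name: str
    scope_note: str = ""                               # SN
    use: Optional[str] = None                          # USE: set only on non-preferred terms
    used_for: List[str] = field(default_factory=list)  # UF
    broader: List[str] = field(default_factory=list)   # BT
    narrower: List[str] = field(default_factory=list)  # NT
    related: List[str] = field(default_factory=list)   # RT


terms = [
    Term("radiological terrorism",
         narrower=["radiological terrorism requirements"]),
    Term("intelligence",
         narrower=["radiological terrorism requirements"]),
    # Two broader terms: the multiple-inheritance case.
    Term("radiological terrorism requirements",
         broader=["radiological terrorism", "intelligence"],
         used_for=["radiological terrorism needs"]),
    # Hypothetical non-preferred term, invented for illustration.
    Term("radiological terrorism needs",
         use="radiological terrorism requirements"),
]


def hierarchical_display_terms(all_terms):
    """Return preferred terms only, as the hierarchical display does."""
    return sorted(t.name for t in all_terms if t.use is None)


print(hierarchical_display_terms(terms))
```

Storing BT and NT as lists rather than single values is what allows a term to sit under more than one parent.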

View the Thesaurus [pdf]

Notes on the UMLS Semantic Network

Thursday, April 8th, 2004

UMLS Semantic Network

This was a really interesting reading. Not interesting like a novel or movie, of course, but interesting because I keep hearing about semantic webs without seeing any worthwhile examples.

One thing I was a little surprised to see was the ASCII codes for creating a flat-file database. I would have thought they would have specified it in XML or something a little more modern. And I kind of cringe whenever I hear anyone call an ASCII file a database. Even though it’s technically true, to me ‘database’ means database management system (DBMS), with some mechanisms put into place to allow multiple users, referential integrity, etc. If everything is stored in ASCII files, then any idiot can ruin the whole system and it’s really, really easy to let data get corrupted. You have to do all sorts of extra work to make sure updates to one field cascade through the rest of the file.

Even though this was designed specifically for the medical field, it’s surprising how much their relationships and semantic types could be useful for almost any semantic web. I could only think of a few relationships that were missing, the chief one being requires. This is a relationship that’s very common in computer science, but I think it might apply fairly often in other fields as well. When an entity requires another, it cannot exist without it. It’s kind of a mix between part_of and precedes.

Notes on “Vocabulary as a central concept in Information Science” and additional readings

Thursday, March 18th, 2004

Vocabulary as a Central Concept in Information Science, Michael Buckland (1999)

The role of classification in knowledge representation and discovery, BH Kwasnik - Library Trends, 1999

 

One good point in the Buckland article was that vocabulary can differ between those who are doing the cataloging, the authors and the searcher, even if everyone is within the same field. I’ve read some about these differences before, but they almost always seem to take the form of novice searcher vocabulary vs. expert author vocabulary or natural searcher vocabulary vs. structured system vocab. Those are probably the most clear ways to look at these distinctions—to tell you the truth looking at subtle differences between five different vocabularies does not seem like that much fun to me.

This article gets back to some of the same points we’ve already discussed in class when talking about synonym rings and taxonomies. Even though the author comes at it from a vocabulary point of view, he’s saying the same things everyone else is. If your users want to search for “Vietnam War” but your system uses “Vietnam Conflict,” without pointing the user in the right direction, no purpose has been served. You can be as correct and specific in your phrasing as you want but that’s no guarantee you’ll have a usable system.
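The “Vietnam War” case is easy to sketch in code. The snippet below is a minimal, assumed implementation of an entry-term lookup, not anything from the Buckland article: the ENTRY_TERMS table and the resolve function are names I made up, and a real system would load the table from the controlled vocabulary rather than hard-code it.

```python
# Map every entry term (lowercased) to the term the system actually uses.
# The table contents are illustrative only.
ENTRY_TERMS = {
    "vietnam war": "Vietnam Conflict",
    "vietnam conflict": "Vietnam Conflict",
}


def resolve(query):
    """Return the system's preferred term for a query, or None if unknown."""
    return ENTRY_TERMS.get(query.strip().lower())


print(resolve("Vietnam War"))
print(resolve("Korean War"))
```

The point is the first lookup: the searcher never has to know which phrasing the catalogers chose.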

The Kwasnik reading was really good at pointing out the strengths and weaknesses of hierarchies, trees and other organization schemes. In doing the AG assignment I ran into the “Lack of complete and comprehensive knowledge” barrier quite often. That’s one of the biggest problems with not just hierarchies, but any project like this where we have some knowledge of the domain (everyone has seen greeting cards) but not of the entire body of AG’s product line or even a representative subset. I wouldn’t want to construct a taxonomy of content objects before people started entering data. I would have it be built as the database grew, with specific people in charge of keeping it consistent.

Knowledge Organization System for a Greeting Card Company’s Design Studio Archives

Thursday, March 18th, 2004

Note: this was a project for a graduate course in Knowledge Organization Systems

Introduction

The goal of this project is to create a Knowledge Organization System (KOS) for a Greeting Card Company Studio archive so that designers are able to find source artwork and previous designs. This is no small task: Greeting Card Company has been in operation for nearly 100 years and has at least partial archives from the entire period, and today the company employs hundreds of designers and produces thousands of products. There is no question that without an inclusive, accurate, and easy-to-use archive, designers are unable to build on each other’s ideas and a great deal of work is being duplicated. Also, intellectual property needs to be properly managed and licensed artwork needs to be tracked and protected from accidental misuse.

Currently, all archives are stored in protective containers in the Studio, shelved by year. In addition, a vast number of digital files have been compiled on the Studio’s servers and CD and tape backups. This project does not address the physical process of collection and digitization, but instead offers a road map to how items will be classified as they are entered into the system. This KOS also provides a framework for the database and the ultimate user interface.

Below is an analysis of the users and groups, followed by a description of the overall structure of the KOS. After that is a description of each facet, followed by pick lists, synonym rings, and taxonomies for each where applicable.

 

Users

In this analysis three distinct user groups were identified: Archivists, Designers, and Management/Administration. Archivists include the company’s current information professionals as well as the interns and temp workers who will be doing the digitization and data entry under their supervision. The KOS has been set up under the assumption that most data entry personnel will be able to properly classify perhaps 80 to 90 percent of all items within each facet, forwarding the rest to more skilled information professionals. The professionals include skilled librarians, art historians, and other researchers who should be adequately prepared to train data entry personnel and classify more difficult items.

The designer group includes artists and graphic designers of varying skill and experience. Nearly all, however, have completed at least a two-year program and the majority have completed a four-year college degree. Taxonomies were developed with this level of expertise in mind. Designers were surveyed and a wide range of thinking about art objects and designs was found. The facets below were designed to cover virtually every way in which a designer might want to look for a piece.

Management and administration also have specific needs. It is for them primarily that the Designer entity described below as well as most facets dealing with licensing and sales have been created.

 

Organization

The archive needs to be broken down into four different logical entities: Art Elements (such as clip art, photographs, sculptures, etc.), Products (such as individual greeting cards, e-cards, etc.), Digital Files, and Designers. Each entity will have a number of associated facets which roughly correspond to the fields in the database and will allow multiple methods of search and organization.

The entity relationships will be defined in the database so that searches will cascade upward. For example, someone searching for art elements will be able to find those done by a specific AG department, because Art Elements are related to Products, which are related to Designers, who have the Department/Team facet. All of this is relatively simple to do with SQL and can be hidden in the interface to make searching easier.
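A cascading search of this kind might look roughly like the sketch below, written with Python's built-in sqlite3 module. The schema is an assumption made for illustration (one designer per product, one product per art element, invented column names and sample rows); the real database would be defined from the facets in the KOS.

```python
import sqlite3

# Hypothetical, simplified schema: each art element belongs to one product,
# and each product to one designer who carries the Department/Team facet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE designer (
        designer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        department  TEXT NOT NULL
    );
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        designer_id INTEGER NOT NULL REFERENCES designer(designer_id)
    );
    CREATE TABLE art_element (
        element_id  INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        product_id  INTEGER NOT NULL REFERENCES product(product_id)
    );
    INSERT INTO designer VALUES (1, 'Designer A', 'Humor');
    INSERT INTO designer VALUES (2, 'Designer B', 'Seasonal');
    INSERT INTO product VALUES (10, 'Birthday card', 1);
    INSERT INTO product VALUES (11, 'Holiday card', 2);
    INSERT INTO art_element VALUES (100, 'Cartoon dog sketch', 10);
    INSERT INTO art_element VALUES (101, 'Snowflake photograph', 11);
""")


def art_elements_by_department(department):
    """Find art elements by walking element -> product -> designer."""
    rows = conn.execute(
        """
        SELECT a.description
        FROM art_element AS a
        JOIN product  AS p ON p.product_id  = a.product_id
        JOIN designer AS d ON d.designer_id = p.designer_id
        WHERE d.department = ?
        ORDER BY a.element_id
        """,
        (department,),
    ).fetchall()
    return [r[0] for r in rows]


print(art_elements_by_department("Humor"))
```

The user interface would only ever expose the department pick list; the two joins stay hidden behind the function.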

Each facet has an associated type, whether that be a simple constraint on an open text field, a pick list, or a taxonomy. Where lists and taxonomies have been developed the list’s page number is noted as well.

View the KOS, including the entities and their facets, pick lists, and taxonomies [pdf]

Notes on “A Taxonomy Primer,” “Ten Taxonomy Myths,” and additional readings

Thursday, March 4th, 2004

A Taxonomy Primer, Warner, Amy J. (2002)

Ten Taxonomy Myths, Montague Institute (2002)

The Intellectual Foundation of Information Organization By Elaine Svenonius (2002)

 

The Taxonomy Primer was pretty straightforward, but the Myths were more interesting. I especially liked myths 1 and 2, because I think when most people think taxonomy they think of a single, giant, all-encompassing tree that everything fits into exactly. It can be very useful to have a number of taxonomies for the same information, and there are some great examples on the web, where a site may be organized by product type but then also by region or customer group, allowing browsing from each perspective.

One image I found particularly enlightening was in the Svenonius article, where taxonomies were described as “elaborate Victorian edifices” and contrasted with “jerrybuilt systems [that] could meet the needs of most users most of the time.” This is an excellent description of where library people and web people seem to have a disconnect. Coming at things more from the web side myself, I often think of grand schemes to classify everything and put everything into neatly labeled boxes, like Dewey or the Library of Congress Classification Schemes, as too big, too elaborate, and too old. I think this is why many of the people who first started organizing information on web sites and the like don’t look to library science for inspiration, despite the wealth that is there. Most of the web people have only worked with systems that are small enough to be informal, personal enough to be idiosyncratic, or targeted enough to simply model how current users talk about the information already. In other words, jerrybuilt.

Later in the chapter, though, the writer states that organizing information is different from organizing anything else, and is in particular not to be done with “routine application of the database modeling techniques” used in business. While I agree that organizing information would be substantially different from organizing employees, the rationale given (something to do with works and differences in editions of them) lends itself really well to more-or-less common relational database structures. I think there are important issues, but too often the issues I see brought up are superficial.

Notes on “Creating a Controlled Vocabulary”

Thursday, February 19th, 2004

Creating a Controlled Vocabulary

 Fast, Karl, Fred Leise and Mike Steckel (2003)

 

This was a good rundown of the general process of creating a controlled vocabulary, but a lot of this seems pretty apparent to me. I guess I shouldn’t assume that this stuff is obvious, though, given how many companies make web sites or intranets without really bothering to find out how their users use vocabulary for their domain, or even establishing a vocabulary, for that matter.

The two most important points, to me, are number 5, “Establish a record of the rules you are using if you are creating a large thesaurus” and number 8, “Go back and refine. What can be improved?” In fact I think the whole notion of controlled vocabulary is misguided if there’s no clear rationale for it and no ongoing effort to update and maintain the terms. Language in any field is constantly changing, and the pace of change is always accelerating. Anyone who was building a directory of Internet services would have left off the World Wide Web in 1989, and any list about self-publishing on the web would probably have left off the term “blog” in 1998. How useful would those pick lists be today?

Controlled vocabulary can be damaging if there’s no mechanism for change, or that mechanism is left unused. I don’t know why, but humanity seems to have some undying urge to compile things around ourselves into grand lists and hierarchies that are supposed to encompass all of what is or ever has been, ignoring our complete ignorance of what the future will bring. It’s not that classification in and of itself is bad, it’s that there’s a tendency to get to the “end” and say, “there, it’s done, and set in stone forever.”

 

 

 

Software Comparison: ASP.NET vs PHP

Tuesday, February 17th, 2004

ASP.NET and PHP

Virtually every medium or large web site now uses some kind of server-side scripting to generate web pages and interactive features instead of static HTML. A number of technologies are used for this purpose, including PHP, ASP.NET, Perl, ColdFusion, and JSP. This paper will look at Microsoft’s ASP.NET and an open-source alternative, PHP, and compare them in terms of cost, performance, support, features and ease of use for web development.

 

Comparing ASP and PHP can be difficult because they are not exactly the same class of software. PHP is simply a server-side scripting language. The PHP homepage describes it as “a widely-used general-purpose scripting language that is especially suited for Web development and can be embedded into HTML.”1 ASP, more properly ASP.NET, is not a language per se, and allows users to program Microsoft Internet Information Services (IIS) in Jscript, Vbscript, and C#, among others. ASP.NET is a little harder to define than PHP. ASP stands for Active Server Pages, and .NET, according to Microsoft, “is a set of Microsoft software technologies for connecting information, people, systems, and devices. It enables a high level of software integration through the use of Web services—small, discrete, building-block applications that connect to each other as well as to other, larger applications over the Internet.”2

 

Despite major structural differences, the two can and should be compared because they can be used to create the same kinds of medium-to-large, dynamic, often database-driven web sites. Server-side scripting allows sites to easily edit and update information, offer interactive features like forums and personalization, and track user traffic.

  (more…)

Notes on “Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files”

Thursday, January 22nd, 2004

Systems of Knowledge Organization for Digital Libraries:

Beyond Traditional Authority Files

(G Hodge - 2000)

One thing I liked was this definition:

“A KOS serves as a bridge between the user’s information need and the material in the collection. With it, the user should be able to identify an object of interest without prior knowledge of its existence.”

 

I like the notion that a KOS helps users find resources they’re not even aware of. I think that’s an important goal.

 

An impression I get from a lot of LIS people is a mild disdain for the web. Obviously the web is in many ways unstructured and can be difficult to use in ways that library systems are not. At one point the article states that “Someone recently compared the Web with a large room filled with books that were scattered all over the floor.”

 

The description above is an example of the kind of lame metaphors this disdain fosters. If the web is a large room filled with books, it is the largest room that has ever existed; the vast majority of books are available virtually for free; and although they are scattered all over the floor, thousands of people will freely provide you with maps to find books on certain subjects, and everyone is provided with magical binoculars that let them see deep inside books and find a single phrase.

 

I’m not saying that bringing better standards to the web and devising better KOSs to organize web resources is bad, just that it seems like many LIS people take the existence of the web for granted.

 

One thing mentioned throughout this article is the high cost of indexing and cataloging or merging different cataloging schemes together. I think the costs may be exaggerated in some ways. For example, if you wish to catalog web resources for educators and for medical professionals, two groups that probably have different terminology for similar concepts, you don’t need to pay thousands of grad students to index everything under one scheme, then the other. Instead, develop a mapping system that translates between the two types of terminology. The mapping system would be a big project and have to be very robust, but once it’s built it can run behind the scenes when anyone does any kind of searching. The article mentions cases where this has been done (with MeSH terms, for example) but insists that it is a high-cost venture.
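To show what I mean by a mapping system, here is a toy sketch in Python. Everything in it is assumed for illustration: the crosswalk entries, the document IDs, and the function names are invented, and a real mapping between vocabularies would be far larger and would have to handle one-to-many and partial matches.

```python
# Hypothetical crosswalk: lay/educator term -> medical term.
CROSSWALK = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}
# Reverse direction, so a query in either vocabulary can be translated.
REVERSE = {medical: lay for lay, medical in CROSSWALK.items()}

# Hypothetical index: each resource was cataloged once, under one scheme only.
INDEX = {
    "heart attack": ["doc-42"],
    "myocardial infarction": ["doc-17"],
}


def expand(term):
    """Return the query term plus its equivalent in the other vocabulary."""
    term = term.strip().lower()
    terms = {term}
    if term in CROSSWALK:
        terms.add(CROSSWALK[term])
    if term in REVERSE:
        terms.add(REVERSE[term])
    return terms


def search(term):
    """Look up resources under every equivalent term, behind the scenes."""
    hits = []
    for t in sorted(expand(term)):
        hits.extend(INDEX.get(t, []))
    return hits


print(search("Heart Attack"))
```

Nothing gets re-indexed here; the translation happens at query time, which is where I think the cost savings would come from.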

 

Similarly, what’s wrong with using the users of the indexing system as the workforce? Logs of search terms and phrases and how they are used together can be analyzed. Users can be tracked to see which titles or abstracts they click on when searching for certain terms, how long they spend at each resource, etc. Users can even be asked to rate resources and search results. If you are in the market for a hard drive or digital camera, I recommend you go to bizrate.com, pricegrabber.com, or any of a dozen services that allow users to rate both products and merchants, making it easy to find a good LCD monitor at a reputable dealer despite the massive anonymity of the Internet and the ease of creating fly-by-night stores or selling junk merchandise online. Something similar could be done to winnow out junk information and organize information resources.

Ignorance, paralysis or expense: The problem of software and business method patents for information architects and web designers

Sunday, December 14th, 2003

[Note: this is a paper prepared for a graduate course in the IAKM program at Kent State University.]

What is the first step an information architect or web designer takes when designing a web site? Designers are often worried about principles such as giving users control, being consistent, providing feedback, or not relying on users’ memory (Dumas). Or, they are mentally checking off any one of a thousand “top ten mistakes” lists available in books and on the web. They may even start by discussing requirements with clients or conducting usability tests. But chances are, they aren’t at the US Patent Office web site. The last thing most information architects and web designers think about before creating a web site is doing a patent search. And this is becoming a big problem.

The introduction and explosion of software and business method patents relating to website design features presents a major problem to those who design sites, and designers and companies find themselves in three positions: ignorant of the issue, caught up in the “defensive” patent race themselves, or perhaps ultimately paralyzed and unable to continue work.

(more…)

Usability test of the Kent State IAKM home page

Thursday, December 11th, 2003

Note: this report shows the results of a usability test of the Information Architecture and Knowledge Management program web site at Kent State University in 2003. The site has since been redesigned.

1. Introduction

In a usability study of the IAKM web site I found a number of serious problems. Current IAKM students were asked to complete a series of tasks using the site. Although participants were able to complete the tasks 91.67 percent of the time, they met all performance goals for each task only 36.11 percent of the time. The site is not fundamentally broken, but clearly there is room for improvement. Through statistical analysis, observations of the students, and remarks made by the students, a number of issues were uncovered.

Many of the problems were global problems with site navigation and labeling, but there were also a number of prominent local problems. The severity of each problem was rated using three categories:

  • Severe—prevents the user from completing a task or results in catastrophic loss of data or time.
  • Moderate—significantly hinders task completion but users can find a work-around.
  • Minor—irritating to the user but does not significantly hinder task completion. (Artim, 1).

Problems are also rated by scope. Any problem can be either global, meaning it applies to most pages or the site as a whole, or local, meaning it is particular to a page or specific section. Global problems are generally more pressing than local ones.

Findings are presented first in order of importance, followed by a description of the study methods.

  (more…)

Information visualizations and spatial maps on the web – Usability concerns

Thursday, October 23rd, 2003

Visualizing the web

Although web technologies are constantly changing, most users still browse the web the same way they did back in 1995—typing keywords into search boxes, clicking from home page, to section, to subsection on a navigation bar, or following link, to link, to link. The fact that it is called a “web” suggests that there should be other ways of navigating websites, and there are a number of projects attempting to employ information visualizations and spatial maps to do so.

All web pages organize information visually, but “information visualization centers around helping people explore or explain data that is not inherently spatial, such as that from the domains of bioinformatics, data mining and databases, finance and commerce, telecommunications and networking, information retrieval from large text corpora, software, and computer-supported cooperative work.” (“InfoVis 2003 Symposium”) Spatial metaphors are used to communicate different levels of information. A simple, static example would be a personal homepage built to look like the designer’s home, with links to favorite movies in the living room and recipes in the kitchen. A more advanced example would be a customer relationship management system for a large company which, instead of presenting a list of technical support problems and solutions, displays an interactive map of problems, with more common problems in a larger font size, and recent problems in red. In both cases, users get an immediate grasp of complex information.

Such visualizations are intended to help solve two current web usability problems: the lack of a wide view to web structure, and the lack of query refinement based on relationships of retrieved pages (Ohwada 548). But they must do so without creating additional usability barriers. This paper will describe three current information visualization projects and describe some of the usability issues these sorts of projects face.

Many visualizations, including the three below, are not designed for specialists but instead are “targeted toward guiding the public through newly accessible oceans of on-line information.” (Morse 637) This means that many of their target users will be unfamiliar with both the interface and the particular information they are looking for.

(more…)

Usability Study: Kent State School of Library Science Website

Wednesday, September 17th, 2003

Kent State University School of Library Science Web Site

Site Design

The most basic level of usability is accessibility. Although it is beyond the scope of this analysis to consider problems that disabled users may have, it is useful to look at the site through the eyes of the Javascript-disabled or the DSL-disabled, those who do not have the latest, most up-to-date browsers with all the options turned on. One thing in the KSU SLIS site’s favor is the lack of any necessary plugins, like Flash or QuickTime VR, which some users might not have installed. The home page and the site’s navigation bar do use Javascript, which some users may have turned off, but disabling Javascript does not completely break the site’s navigation. It does, however, mean the users only have access to the first level of the navigation hierarchy from the homepage, which might make it a little more difficult to figure out which section is the appropriate one to go to.

On the plus side, the site is fairly slow-connection friendly. The entire homepage, including the Javascript rollover images, is only about 163K. The site makes appropriate use of alt tags for images, so anyone using a text-only browser like Lynx or surfing with images off will still be able to get around. Again, they will miss the descriptive second-tier categories for each section. The site is fully navigable in a text-only browser, but there are two problems: first, the homepage has no descriptive text, and second, there’s not always a link back to the homepage, probably because the image that links back has no alt text on most pages.

  (more…)