Category Archives: Academic Papers

Notes: Web site usability, design, and performance metrics

Palmer, J.W. (2002). Web site usability, design, and performance metrics. Information Systems Research, 13(2), 151-167.

In this study Palmer looks at three different ways to measure web site design, usability, and performance. Rather than testing specific sites or trying out specific design elements, this paper looks at the validity of the measurements themselves. Any metrics must exhibit at least construct validity and reliability—meaning that the metrics must measure what they say they measure, and they must continue to do so in other studies. Constructs measured included download delay, navigability, site content, interactivity, and responsiveness (to user questions). The key measures of the user’s success with the web site included frequency of use, user satisfaction, and intent to return. Three different methods were used: a jury, third-party rankings (via Alexa), and a software agent (WebL). The paper examines the results of three studies, one in 1997, one in 1999, and one in 2000, involving corporate web sites. The measures were found to be reliable, meaning jurors could answer a question the same way each time, and valid, in that different jurors and methods agreed on the answers to questions. In addition, the measures were found to be significant predictors of success.

This is an interesting article because in my experience, usability studies are often all over the place, with everything from cognitive psychology and physical ergonomics to studies of server logs to formal usability testing to “top ten usability tips” lists. Some of this can be attributed to the fact that it is a young field, and some of it is due to the different motives fueling research (commercial versus academic). One thing in the article I worry about, however, is any measure of “interactivity” as a whole. Interactivity is not a simple concept to control, and adding more interactivity is not always a good idea. Imagine a user trying to find the menu on a restaurant’s web site—do they want to be personally guided through it via an interactive Flash cartoon of the chef, or do they want to just see the menu? Palmer links interactivity to the theory of media richness, which has a whole body of research behind it that I am no expert on. But I would word my jury questionnaires to reflect a rating of appropriate interactivity.

The most important impact of this study is that it helps put usability studies on a more academically sound footing. It is very important to have evidence that you are measuring what you think you are measuring. It would be interesting to see if other studies have adopted these particular metrics because of the strong statistical evidence in this study.

The most straightforward metric, download delay, is also one that has been discounted lately. The thinking is that with so many users switching to broadband access, download speed is no longer the issue it used to be. That assumption breaks down for sites with information-seeking interfaces, which are often highly dynamic and rely on database access. No amount of bandwidth will help if your site’s database server is overloaded.
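As a rough illustration (my own sketch, not from the paper, and the URL is just a placeholder), download delay can be split into time-to-first-byte, which is dominated by server-side work such as database queries, and body transfer time, which is the part broadband actually speeds up:

```python
# Minimal sketch: separate server-side latency from transfer time for one page.
# The URL is a placeholder; point it at a real, dynamic page to see the split.
import time
import urllib.request

def measure_delay(url):
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        first_byte = time.perf_counter()  # headers received: mostly server-side work
        body = resp.read()                # remaining time: mostly bandwidth-bound transfer
    end = time.perf_counter()
    return {
        "time_to_first_byte_s": first_byte - start,
        "transfer_s": end - first_byte,
        "total_s": end - start,
        "bytes": len(body),
    }

print(measure_delay("http://example.com/"))
```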

Notes: Design of interfaces for information seeking

Marchionini, G., & Komlodi, A. (1998). Design of interfaces for information seeking. Annual Review of Information Science and Technology (ARIST), 33, 89-130.

In this chapter Marchionini and Komlodi examine the state of user interfaces for information seeking. Interfaces are defined as the conjunctions and boundaries where different physical and conceptual human constructs meet; the interface sits at the center of information science and of fields such as human-computer interaction (HCI) and human factors. The chapter looks at advances in technology and research, summarizes the developments of the first two generations of user interfaces, and examines current (as of 1998) developments in the field. One way to look at the chapter is shown in figure 1, with technology, information seeking, and interface design research and development shifting from mainframes to PCs to the web, from professionals to literate end users to universal access, and from ASCII characters to graphics to multimedia, respectively. Some early developments remain important today, such as the components of an interactive system – task, user, terminal and content (with context added later). Another milestone was the development of the GOMS (goals, operators, methods and selection) model, the first formal model of HCI. Two themes throughout the chapter are the interdependent nature of research in this area and the importance of human-centered concepts and design.

This is a really good summary of the history of HCI with an eye specifically toward searching and information use. It’s not surprising that many of the names we have seen on articles this semester show up here as well. The only real regret I have is that there are no pictures. User interfaces often rely on visual display for interaction, so in addition to all the description it would be really interesting to see examples of the different generations of user interfaces. One other criticism is that little attention is paid to the interfaces of video games—I have read a lot of articles about interface design that ignore this field as well.

Although it is a little out of date, there’s a lot to be taken from this chapter’s historical perspective. I found three things in particular that were talked about in relationship to third-generation user interfaces that were particularly interesting. First was the move toward universal access or ubiquitous computing. It is in some ways a measure of success that researchers now worry about the lack of computers in Sub-Saharan Africa—this wouldn’t be a problem if information seeking computer interfaces were not so available, useful, and approachable. Second was the notion that the advance of the web in some ways slowed the advance of user interface design, although the apparent disadvantage quickly disappeared. This is something I’ve run into in a different form as a web designer—clients complaining that their web site did not look exactly like their brochure. Again, in some ways this was an embarrassment of riches—the web site cost nothing to distribute, could be found by search engines, acted as a storefront, but the lack of a particular font face was a step backward? Finally, the notion that the whole field is really interdisciplinary is important to always keep in mind.

Notes: Automatic performance evaluation of web search engines

Can, F., Nuray, R., & Sevdik, A. B. (2004). Automatic performance evaluation of web search engines. Information Processing & Management, 40(3), 495-514.

Although virtually all Internet users rely on search engines to find information on the web, evaluation of search engines is often difficult. A large number of searches would need to be tested, and each one would need to be judged subjectively by human participants. The authors of this paper have devised a new way to test search engines, have tested their method against evaluations done by human judges, and found that their automatic Web search engine evaluation method (AWSEEM) significantly predicted the subjective judgments. In the human-evaluation control, users were given a list of resources called up by the various search engines, with no idea which engine each came from, and were asked to rank the relevance of each. In AWSEEM, each query was run and the top 200 results for each engine were compiled into a collection of vectors, which were then ranked by their similarity to “the user information-needs” (including the question, the query, and a description of the need). The system then looks at the top 20 ranked pages for each engine and counts how many are in the top s (50 and 100 are used) commonly retrieved pages; these are assumed to be relevant.
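To restate that procedure compactly, here is a rough sketch in code. It is my own simplification with made-up names and toy data, not the authors’ implementation, and details such as what counts as “commonly retrieved” reflect my reading of the description above:

```python
# Rough sketch of the AWSEEM scoring idea as described above; my own
# simplification, not the authors' implementation. Result lists and
# similarity scores (page vectors vs. the user information-need) are
# assumed to have been computed elsewhere.

def awseem_scores(results, sim_to_need, s=50, top_k=20):
    """results: {engine: [url, ...]}, each engine's top-ranked pages.
    sim_to_need: {url: similarity to the user information-need}."""
    # Pages retrieved by more than one engine ("commonly retrieved").
    common = {url for url in sim_to_need
              if sum(url in urls for urls in results.values()) > 1}

    # Rank the common pages by similarity and assume the top s are relevant.
    assumed_relevant = set(sorted(common, key=sim_to_need.get, reverse=True)[:s])

    # Score each engine by how many of its top-k pages land in the assumed-relevant set.
    return {engine: sum(1 for url in urls[:top_k] if url in assumed_relevant)
            for engine, urls in results.items()}

# Toy data, invented purely for illustration:
results = {"engine_a": ["u1", "u2", "u3"], "engine_b": ["u2", "u4", "u1"]}
sim_to_need = {"u1": 0.9, "u2": 0.7, "u3": 0.4, "u4": 0.2}
print(awseem_scores(results, sim_to_need, s=2, top_k=3))  # {'engine_a': 2, 'engine_b': 2}
```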

One possible issue with this system is that it requires a little more human interaction than first assumed—the query providers must provide more than just a query. A bigger issue, though, is the choice of measure for relevancy. AWSEEM assumes that if a result appears in the results of multiple engines, it is relevant. This may be reasonable, but it does raise the question—what if all the engines studied are wrong? For a simple example, searching for my own name online will retrieve a large number of results that are the same in many search engines but have nothing to do with the particular Jason Morrison who sits here typing this. Another interesting thing to note is that they did not find much of a statistically significant difference between the performance of the different search engines using either method (although more so with the human-judgment method). Very few scholarly articles (and even fewer popular press articles) bother to test for statistical significance when pitting search engines against each other. Is it possible that the very notion of the “best” search engine has been statistically meaningless for some time?

The authors make a good point about the difficulty in using real users for search engine evaluation. An automated approach is one answer, but there is another—the problem is that too much time and effort is required of a small number of users. Instead, if tiny amounts of time and effort were spread across thousands or millions of users, similar results could be achieved while still using subjective measures. For example, if every time a user got results on any search engine they were presented with a simple “rate these results on a scale of 1 to 5 stars” input, they could quickly and effortlessly contribute data toward a shootout-type study. Cooperation of the search engines would not necessarily be needed, if one could use a university’s proxy to substitute or add the input for popular search engines, for example, or if a generic search page was set up to produce results from randomized (double-blind) engines. It would be interesting to try this, AWSEEM, and individual evaluation in one study to see if there was a statistical correlation.
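As a purely hypothetical sketch of what that aggregation might look like (the engine names and star ratings below are invented just to show the shape of the analysis), the per-engine summary could be as simple as a mean rating with a rough confidence interval; with thousands of real users the intervals would be tight enough to say whether any “best engine” difference is statistically meaningful:

```python
# Hypothetical sketch of aggregating crowd-sourced 1-5 star ratings per engine.
# The engines and ratings below are made up purely for illustration.
import math
import statistics

ratings = {
    "engine_a": [4, 5, 3, 4, 4, 5, 2, 4],
    "engine_b": [3, 4, 3, 5, 2, 3, 4, 3],
}

for engine, scores in ratings.items():
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))  # standard error of the mean
    # Rough 95% confidence interval; a real study would also run a proper
    # significance test between engines before declaring a "winner".
    print(f"{engine}: mean {mean:.2f}, 95% CI ±{1.96 * sem:.2f}")
```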