Palmer, J.W. (2002). Web site usability, design, and performance metrics. Information Systems Research, 13(2), 151-167.
In this study Palmer looks at three different ways to measure web site design, usability, and performance. Rather than testing specific sites or trying out specific design elements, this paper looks at the validity of the measurements themselves. Any metrics must exhibit at least construct validity and reliability, meaning that the metrics must measure what they say they measure, and they must continue to do so in other studies. Constructs measured included download delay, navigability, site content, interactivity, and responsiveness (to user questions). The key measures of the user’s success with the web site included frequency of use, user satisfaction, and intent to return. Three different methods were used: a jury, third-party rankings (via Alexa), and a software agent (WebL). The paper examines the results of three studies, one in 1997, one in 1999, and one in 2000, involving corporate web sites. The measures were found to be reliable, meaning jurors could answer a question the same way each time, and valid, in that different jurors and methods agreed on the answers to questions. In addition, the measures were found to be significant predictors of success.
This is an interesting article because, in my experience, usability studies are often all over the place, with everything from cognitive psychology and physical ergonomics to studies of server logs to formal usability testing to “top ten usability tips” lists. Some of this can be attributed to the fact that it is a young field, and some of it is due to the different motives fueling research (commercial versus academic). One thing in the article I worry about, however, is any measure of “interactivity” as a whole. Interactivity is not a simple concept to control, and adding more interactivity is not always a good idea. Imagine a user trying to find the menu on a restaurant’s web site: do they want to be personally guided through it via an interactive Flash cartoon of the chef, or do they want to just see the menu? Palmer links interactivity to the theory of media richness, which has a whole body of research behind it that I am no expert on. But I would word my jury questionnaires to reflect a rating of appropriate interactivity.
The most important impact of this study is that it helps put usability studies on a more academically sound footing. It is very important to have evidence that you are measuring what you think you are measuring. It would be interesting to see if other studies have adopted these particular metrics because of the strong statistical evidence in this study.
The most straightforward metric, download delay, is also one that has been discounted lately. The thought is that with so many users switching to broadband access, download speed is no longer the issue it used to be. That assumption is especially wrong for sites with information-seeking interfaces, which are often very dynamic and rely on database access. No amount of bandwidth will help if your site’s database server is overloaded.