Julie Albertson

The problem with usability testing...

mail@juliealbertson.com

... as it's being practiced in corporate America -- and probably at a newspaper near you

Usability testing can offer important insights; no question there. It's particularly valuable in the middle stages of design, when a workable mockup is available but there's still plenty of time to correct the problems it turns up. That's because usability testing as it should be practiced (observing as representative users perform representative tasks) is an excellent tool for detecting oversights and design flaws.

Unfortunately, many companies don't employ this tool during the design process. They rush through, skipping this important step, and then bring usability testing into play as an after-the-fact evaluative measure.

When someone finally decides that "usability testing" (I use that term loosely) should be conducted, a group of executives rushes out and hires a professional usability testing company. The company promptly conducts a large-scale study, often without the designers even knowing about it, and a couple of months later returns to present its findings, complete with numbers, recommendations and fancy charts.

Why this is bad:

Usability testing isn't all about percentages.

These companies are digging for quantitative data to support their recommendations. To be fair, this is probably as much as anything an effect of the clients who hire them demanding hard facts in return for all the money they've spent on testing. The problem is that pages of numbers don't show the design team how people use the site. Designers need to see people using a prototype. We need to see whether or not they're scrolling. If a user pauses for a full minute, we need to be able to ask why -- are they reading, or are they hopelessly lost? The fact that, say, 19 percent of people tested report that the site's organization is confusing may sound very authoritative, but it does nothing for me as a designer. So what? Where were they when they got confused? What had they been asked to do?

Designers need to be involved, even if that just means observing or, in the worst case, watching a videotape of users interacting with the site after the fact. Directly observing this interaction is far more important than any list of "facts" (again, I use that term very loosely).

Usability guru Jakob Nielsen: "... to evaluate interaction designs you must closely observe individual users as they perform tasks with the user interface."

The testing premise is all wrong

Every set of professionally conducted usability test results I've come across admits to starting with an instruction along the lines of "Take a few minutes to familiarize yourself with the site, just look at it...." Clearly someone, somewhere along the line, decided this was a good idea, because it seems to be fairly ubiquitous. But how many people, as they surf the Web, come to a new page and then stop to spend a few minutes surveying it? None, that's how many. That's not how people use the Web. We all know this, so why is anyone pretending that test subjects can tell us with any accuracy how they would use the site after they've just been forced to use it in a way they never would?

Our users are brilliant and cultured -- and probably all beautiful too

What these studies end up creating is the usability focus group. Much of the information that comes out of poorly designed usability test questions is akin to focus group results. Theater? Yes, I'm "very likely" to return to this site for theater reviews. Celebrity gossip? It is "somewhat unlikely" that I would read celebrity gossip. Hmmm... ah, yes, there we go, world affairs -- that's probably what I'd look at first, and then independent movie reviews. Known to researchers as demand characteristics, this perfectly human tendency to present the best version of ourselves when we know someone is looking typically goes unchecked in usability testing, even though evaluators are pulling for quantitative data. And I have yet to come across usability test results that made any effort to cross-validate this type of data.

The fact is, usability testing isn't particularly useful for collecting quantitative data about how visitors use your site. Oh, you can collect the numbers all right, and then type them up and present the charts, and that room full of executives will 'hmmm' and 'ahhh' and nod along, but the accuracy of all those numbers is suspect because ...

a) If people have been brought into a lab for the study, they aren't using the site in the frame of mind they normally would (a problem that even properly performed usability tests must overcome). They are also subject to "stage fright syndrome," a.k.a. "I know I'm being watched and suddenly I'm nervous." Even when no cameras are present during testing, most people don't react well to someone staring over their shoulder as they work. All in all, not terribly indicative of real-world use.

b) If people are answering questions in an online intercept study, they're mostly telling you how they think they use the site -- and people are generally *terrible* self-evaluators.

Nielsen's "Basic rules of usability"
1. Watch what people actually do.
2. Do not believe what people say they do.
3. Definitely don't believe what people predict they may do in the future.

c) Perhaps most importantly for news organizations, the information on a news site changes every day. You can't take one day's site (or even a couple of days' worth) and present it to a group of people to find out how they generally use the site. The art in, say, your entertainment tease might be particularly compelling the day after the MTV VMAs (i.e., "the kiss") and immediately pull users to that content, whereas on any other day your lead news headline will get the pull. On the basis of this kind of testing, you just don't know.

The convenience/volunteer sample can't help things either

Many Web usability tests use an intercept method (usually a pop-up invitation or a temporary replacement page) to recruit users from your site. The people who respond are probably not very representative of your site's visitors as a whole, if only because the method leaves out everyone who blocks pop-ups, everyone who ignores ads, and everyone who, without even looking, automatically closes anything that pops up or gets in the way of their surfing. Among those who do look, it is more likely to pull in people who already have something they want to say about the site. It is more likely to pull in users who aren't as busy at the moment. It is more likely to pull in users with any number of characteristics that aren't representative of your users as a whole. People use the Web in very different ways, and you'll never know whether the skew in your sample is leaving out an important segment of your audience. You're unlikely to produce quantitative data that is either valid or reliable using such a methodology.

If it's quantitative data you want ...

Chances are you're already sitting on a mountain of it. If you want to know what visitors are using on your site as it currently exists, dig into your stats. If your statistics don't tell you what you need to know, a) check with the company tracking them to make sure the data really isn't there, rather than being there in a form you don't yet know how to use, and b) if it really isn't there, go with a new company. The tracking software that comes free with my $35/year hosting package tells me exactly how visitors are using my site, from how they were referred to every single file they accessed, in the order they accessed it, including elements within a page. Your stats provider is tracking every page view. If the software they're using can't turn that data into the information you need, find a provider who can.
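
Most hosting stats packages are built on the Web server's access log, so even a few lines of code can show what people actually do on a site. The Python sketch below is illustrative only: it assumes the Apache/NCSA "combined" log format (your host's format may differ), and the helper names `parse_line` and `summarize` and the sample log lines are hypothetical. It counts successful page views and lists each visitor's pages in the order requested, using the IP address as a rough stand-in for a visitor.

```python
import re
from collections import Counter, defaultdict

# Assumes the Apache/NCSA "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Split one log line into named fields; return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def summarize(lines):
    """Count successful page views per path and record each host's paths in order."""
    page_views = Counter()
    paths_by_host = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["status"] != "200":
            continue  # skip malformed lines and anything other than a 200 response
        page_views[entry["path"]] += 1
        # Servers append lines as requests arrive, so list order is visit order.
        # The host (IP address) is only a rough proxy for an individual visitor.
        paths_by_host[entry["host"]].append(entry["path"])
    return page_views, paths_by_host

# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [29/Aug/2003:10:15:32 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '192.0.2.10 - - [29/Aug/2003:10:16:05 -0500] "GET /entertainment/vma.html HTTP/1.1" 200 8400 "http://www.example.com/index.html" "Mozilla/4.0"',
    '198.51.100.7 - - [29/Aug/2003:10:17:44 -0500] "GET /index.html HTTP/1.1" 200 5120 "http://www.google.com/search?q=theater+reviews" "Mozilla/4.0"',
    '198.51.100.7 - - [29/Aug/2003:10:18:02 -0500] "GET /missing.html HTTP/1.1" 404 210 "-" "Mozilla/4.0"',
]

page_views, paths_by_host = summarize(SAMPLE_LOG)
print(page_views.most_common())
for host, paths in paths_by_host.items():
    print(host, " -> ".join(paths))
```

A real stats package does much more (sessions, cookies, filtering out robots), but even this much shows the path people take through the site rather than what they say they do.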

If you want to know what people want or can't find by navigating your site, dig through your search queries. Your site search is the first place they'll go once they've given up on your navigation. Then dig through your referral search queries. Are people searching for things you've buried? For things you don't offer? Users are giving you all kinds of hints about how they want to use your site; you just need to look for them.
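
Both kinds of queries usually sit in the same log data: site-search requests appear as request paths, and searches on outside engines appear in the referrer field. The sketch below pulls the search phrase out of a URL's query string using Python's standard `urllib.parse` and tallies the results. The parameter names it checks, the `/search` path and the function names are assumptions for illustration, not what any particular site or engine is guaranteed to use.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Query-string parameters assumed to carry search phrases: "q" and "query"
# are common choices, and Yahoo used "p". Check what your own site search
# and your referring engines actually use; on some sites "p" means page.
SEARCH_PARAMS = ("q", "query", "p")

def search_phrase(url):
    """Return the search phrase in a URL's query string (lowercased), or None."""
    params = parse_qs(urlparse(url).query)  # also decodes "+" and %XX escapes
    for name in SEARCH_PARAMS:
        if name in params:
            return params[name][0].strip().lower()
    return None

def top_search_phrases(urls, n=10):
    """Tally search phrases across URLs and return the n most frequent."""
    tally = Counter()
    for url in urls:
        phrase = search_phrase(url)
        if phrase:
            tally[phrase] += 1
    return tally.most_common(n)

# Hypothetical inputs: one internal site-search request plus referrer URLs.
urls = [
    "/search?q=Crossword",
    "http://www.google.com/search?q=theater+reviews&hl=en",
    "http://search.yahoo.com/search?p=Theater+Reviews",
    "http://www.example.com/index.html",
]
print(top_search_phrases(urls))  # [('theater reviews', 2), ('crossword', 1)]
```

Lowercasing merges "Theater Reviews" and "theater reviews" into one entry, which is usually what you want when looking for topics people can't find.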

While a usability test geared toward pulling quantitative data might nail some problems (usually the most glaring ones that your staffers and more vocal users have probably been trying to point out all along anyway), how do you think the statistical data from maybe a couple hundred respondents who know their actions are being tracked compares to the exact data from every single user?

You don't need all those numbers to detect design problems.

Finally, some good news. Most design problems can be detected and fixed using good old-fashioned (OK, not that old), relatively inexpensive, small-sample usability tests: the kind where you observe as representative users perform representative tasks. You'll save resources (read: money), your designers can be involved or possibly even run the testing, and the results will be superior.

Nielsen: "To identify a design's most important usability problems, testing five users is typically enough. Rather than run a big, expensive study, it's a better use of resources to run many small tests and revise the design between each one so you can fix the usability flaws as you identify them."
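
Nielsen's five-user figure rests on a simple model he published with Tom Landauer: if each user has probability L of running into a given problem, n users are expected to uncover a share 1 - (1 - L)^n of the problems. The Python sketch below evaluates that formula with L = 0.31, the average Nielsen reported across the projects studied. Treat that value as an assumption, since it varies from design to design; the function name is hypothetical.

```python
def share_found(n_users, detection_rate=0.31):
    """Expected share of a design's usability problems found by n_users testers.

    Nielsen-Landauer model: 1 - (1 - L) ** n, where L (detection_rate) is the
    probability that one user runs into a given problem. 0.31 is the average
    Nielsen reported; your own L may be higher or lower.
    """
    return 1 - (1 - detection_rate) ** n_users

for n in (1, 2, 3, 5, 10):
    print(f"{n:>2} users: {share_found(n):.0%}")
```

Under these assumptions, five users turn up roughly 84 percent of the problems, and each additional user adds less than the one before. That diminishing return is the reasoning behind running several small rounds with a redesign in between rather than one large study.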

August 29, 2003