What You Really Get From a Heuristic Evaluation
Every user experience researcher I know gets requests to do heuristic evaluations. But it isn't always clear that the requester actually knows what is involved in doing a heuristic evaluation. That happens. If I had a dollar for every time someone called asking for a focus group when what they ended up needing was a usability test, I could take a very nice holiday on Aruba.
They've heard the buzzword. They might have heard that Jakob Nielsen had something to do with the method, so that adds something to the appeal (though I'm not sure what). And they know that someone who they hope is a "usability expert" can just tell them what's wrong. Right now.
Some clients who have asked me to do heuristic evaluations have picked up the term somewhere but often are not clear on the details. Typically, they have mapped "heuristic evaluation" to "usability audit," or something like that. It's close enough to start a conversation.
Unfortunately, the request usually suggests that a heuristic evaluation can substitute for usability tests. I chat with the person, starting by talking about what a heuristic evaluation is, what you get out of it, and how it compares to what you find out in a usability test.
How do you do a heuristic evaluation?
Let's talk about what a "classic" heuristic evaluation is. When Jakob Nielsen and Rolf Molich published the method in 1990, these two really smart guys were trying to distill some of the basic principles that make a user interface usable to its audience. They came up with 10 "accepted usability principles" (heuristics) that, when multiple evaluators applied them to any UI, should reveal gaps in the design of the UI that could cause problems for users.
Armed with the Nielsen checklist of accepted usability principles (the heuristics), someone who had never seen the target UI before, and who was not necessarily knowledgeable about the domain, should be able to determine whether any UI complied with these 10 commandments of usable UI design. If three or four or five people sat down for an hour or two and inspected an interface separately, they could come up with piles of problems. Then they could compare their lists, normalize the issues, and hand a single list off to the engineers to go fix.
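For the mechanically minded, the compare-and-normalize step might be sketched roughly like this in Python. This is only an illustration, not part of the published method: the function name, the evaluator labels, and the sample findings are all made up, and the "normalizing" here is nothing smarter than ignoring case and extra spaces. In practice, deciding whether two evaluators found the same problem is a judgment call that people make together.

```python
from collections import defaultdict


def merge_findings(findings_by_evaluator):
    """Combine per-evaluator lists of (heuristic, issue) pairs into one list.

    Hypothetical helper: two findings count as the same issue only if the
    heuristic and the issue text match after lowercasing and collapsing
    whitespace.
    """
    reporters = defaultdict(set)
    for evaluator, findings in findings_by_evaluator.items():
        for heuristic, issue in findings:
            key = (
                " ".join(heuristic.lower().split()),
                " ".join(issue.lower().split()),
            )
            reporters[key].add(evaluator)

    merged = [
        {"heuristic": h, "issue": i, "reported_by": sorted(names)}
        for (h, i), names in reporters.items()
    ]
    # Issues noticed by more evaluators come first; ties sort alphabetically.
    merged.sort(
        key=lambda row: (-len(row["reported_by"]), row["heuristic"], row["issue"])
    )
    return merged


# Made-up findings from two evaluators who inspected the UI separately.
sample_findings = {
    "evaluator_a": [
        ("Recognition rather than recall", "Promo code must be retyped on step 3"),
        ("Flexibility and efficiency of use", "No saved addresses for return customers"),
    ],
    "evaluator_b": [
        ("recognition rather than recall", "promo code must be retyped  on step 3"),
    ],
}

merged_list = merge_findings(sample_findings)
```

Notice what the merged list tells you: which heuristics were flagged and how many evaluators flagged them. It says nothing about whether any of it matters to actual users.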
What do you get out of a heuristic evaluation?
Let's say that the person who called me the other day was asking for a review in the form of a heuristic evaluation to resolve a conflict on the team. The conflict on this team was about the page flow: What should the order of steps in the process be? The same as site X, or the same as site Y? Should the up-sell be at the beginning or the end of the purchase process? "Could you please review the UI and just tell us what to do because we don't have time and money to do a usability test?"
Several of the Nielsen heuristics might apply. Some probably don't. For example, did the success of the page flow require users to remember things from step to step (recognition rather than recall)? Were there any shortcuts for return customers (flexibility and efficiency of use)? Where might users get bogged down, distracted, or lost (aesthetic and minimalist design)? By applying these heuristics, what have we found out?
The flow might require people to remember something from one step to another. The way the heuristic is written, requiring this of users is always bad. But it might not be.
The flow might not have shortcuts for expert users. The way the heuristic is written, not having shortcuts is bad. But it might not be.
There may be places in the flow that slow people down. The way the heuristic is written, you always want users to be able to do tasks quickly. But you might not.
And I don't think we have resolved the conflict on the team.
When applying what I call "checklist usability" in a heuristic evaluation to learn what the flaws and frustrations of a design might be, the outcome is a determination of whether the UI complies with the heuristics. It is an inspection, not an evaluation. It is not about the user experience. It's not even about task performance, which is what the underlying question was in the team's conflict: Will users do better with this flow versus that flow? If we interrupt them, will they still complete a purchase? Any inspection method that claims to answer those kinds of questions is just guessing.
A team may learn about some design flaws, but the frustrations could remain stubbornly hidden—unless the reviewer has already observed many, many users trying to reach goals using this site or process, or something very like it in the same domain. Even then, there's a huge risk that a single inspector or even a small group of inspectors—who are applying very general guidelines, are not actually using the design as part of the inspection, and are not like the users—will miss flaws that will be task-stoppers. Worse, they may identify things that don't comply with the heuristics that should not be changed.
How does heuristic evaluation compare to usability testing?
Heuristic evaluation was codified around 1990, at a time when it was expensive to get access to users. It was common for people to have to be trained to use the technology being evaluated before they could sit down in a usability lab to perform some tasks. The whole concept of even having an interface for end-users was pretty new. Conventions were just settling into place.
Usability testing has been around since at least the 1980s, but it began to be widely practiced at about the same time Nielsen and Molich published their heuristic evaluation method. While usability testing probably needs some updating as a method, the basic process still works well. Today, it is pretty inexpensive to get access to users. User interfaces are everywhere. For most of the applications of technology that I test, users don't need special training.
Heuristic evaluation may help a team know whether their UI complies with someone else's guidelines. But observing people using a design in a usability test gives a team primary data for making design decisions for their users using their design, especially in a world evolved far beyond command line entry and simple GUIs to options like touchscreens, social media, and ubiquitous connectivity. Separately and in combination, these and other design options present subtle, complex problems of usability. For me, observing people using a design will always trump an inspection or audit for getting solid evidence to determine a design direction. There is nothing like that "aha!" moment when a user does something unexpected to shed light on how well a design works.