Every user experience researcher I know gets requests to do heuristic evaluations. But it isn't always clear that the requester actually knows what is involved in doing one. That happens. If I had a dollar for every time someone called asking for a focus group when what they ended up needing was a usability test, I could take a very nice holiday in Aruba.

They've heard the buzzword. They might have heard that Jakob Nielsen had something to do with the method, so that adds something to the appeal (though I'm not sure what). And they know that someone who they hope is a "usability expert" can just tell them what's wrong. Right now.

Some clients who have asked me to do heuristic evaluations have picked up the term somewhere but often are not clear on the details. Typically, they have mapped "heuristic evaluation" to "usability audit," or something like that. It's close enough to start a conversation.

Unfortunately, the request usually suggests that a heuristic evaluation can substitute for usability tests. I chat with the person, starting by talking about what a heuristic evaluation is, what you get out of it, and how it compares to what you find out in a usability test.

How do you do a heuristic evaluation?

Let's talk about what a "classic" heuristic evaluation is. When Jakob Nielsen and Rolf Molich published the method in 1990, these two really smart guys were trying to distill some of the basic principles that make a user interface usable to its audience. They came up with 10 "accepted usability principles" (heuristics) that, when multiple evaluators applied them to any UI, should reveal gaps in the design of the UI that could cause problems for users.

Armed with the Nielsen checklist of accepted usability principles—heuristics—someone who had never seen the target UI before, and who was not necessarily knowledgeable about the domain, should be able to determine whether any UI complied with these 10 commandments of usable UI design. If three or four or five people sat down for an hour or two and inspected an interface separately, they could come up with piles of problems. Then they could compare their lists, normalize the issues, and hand a consolidated list off to the engineers to fix.
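That compile-and-normalize step is mechanical enough to sketch in a few lines of code. This is just an illustration, not part of the method as published; the evaluator names and issue labels are hypothetical, and real normalization involves judgment calls about which differently worded findings describe the same problem.

```python
from collections import Counter

# Hypothetical findings from three evaluators who inspected the same UI
# separately; each string names a suspected heuristic violation.
findings = {
    "evaluator_a": ["no undo on step 3", "jargon in error message", "no shortcuts"],
    "evaluator_b": ["jargon in error message", "no shortcuts", "hidden system status"],
    "evaluator_c": ["no undo on step 3", "jargon in error message"],
}

# Count how many evaluators reported each issue, so the consolidated
# list handed to the engineers is ordered by evaluator agreement.
counts = Counter(issue for issues in findings.values() for issue in issues)
consolidated = counts.most_common()

for issue, n in consolidated:
    print(f"{n}/{len(findings)} evaluators: {issue}")
```

The ordering-by-agreement is one common convention; it rests on the (debatable) assumption that an issue several inspectors caught independently is more likely to be real.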

What do you get out of a heuristic evaluation?

Let's say that the person who called me the other day was asking for a review in the form of a heuristic evaluation to resolve a conflict on the team. The conflict on this team was about the page flow: What should the order of steps in the process be? The same as site X, or the same as site Y? Should the up-sell be at the beginning or the end of the purchase process? "Could you please review the UI and just tell us what to do because we don't have time and money to do a usability test?"

Several of the Nielsen heuristics might apply. Some probably don't. For example, did the success of the page flow require users to remember things from step to step (recognition rather than recall)? Were there any shortcuts for return customers (flexibility and efficiency of use)? Where might users get bogged down, distracted, or lost (aesthetic and minimalist design)? By applying these heuristics, what have we found out?

The flow might require people to remember something from one step to another. The way the heuristic is written, requiring this of users is always bad. But it might not be.

The flow might not have shortcuts for expert users. The way the heuristic is written, not having shortcuts is bad. But it might not be.

There may be places in the flow that slow people down. The way the heuristic is written, you always want users to be able to do tasks quickly. But you might not.

And I don't think we have resolved the conflict on the team.

When applying what I call "checklist usability" in a heuristic evaluation to learn what the flaws and frustrations of a design might be, the outcome is a determination of whether the UI complies with the heuristics. It is an inspection, not an evaluation. It is not about the user experience. It's not even about task performance, which is what the underlying question was in the team's conflict: Will users do better with this flow versus that flow? If we interrupt them, will they still complete a purchase? Any inspection method that claims to answer those kinds of questions is just guessing.
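The team's underlying question is a task-performance question, and that takes task-performance data. As a minimal sketch (with made-up numbers, since no test was run), here is the kind of comparison a usability test produces and an inspection cannot:

```python
# Hypothetical completion counts for the same purchase task run with two
# candidate page flows, eight participants each. An inspection yields a
# compliance checklist; a test yields observed behavior like this.
flow_a = {"completed": 7, "participants": 8}
flow_b = {"completed": 3, "participants": 8}

def completion_rate(result):
    """Share of participants who finished the purchase task."""
    return result["completed"] / result["participants"]

# With samples this small, the point is observed behavior and the
# reasons behind it, not statistical proof.
print(f"Flow A: {completion_rate(flow_a):.0%}")
print(f"Flow B: {completion_rate(flow_b):.0%}")
```

Even a crude comparison like this is grounded in what users actually did, which is what settles a flow-A-versus-flow-B argument.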

A team may learn about some design flaws, but the frustrations could remain stubbornly hidden—unless the reviewer has already observed many, many users trying to reach goals using this site or process, or something very like it in the same domain. Even then, there's a huge risk that a single inspector or even a small group of inspectors—who are applying very general guidelines, are not actually using the design as part of the inspection, and are not like the users—will miss flaws that will be task-stoppers. Worse, they may identify things that don't comply with the heuristics that should not be changed.

How does heuristic evaluation compare to usability testing?

Heuristic evaluation was codified around 1990, at a time when it was expensive to get access to users. It was common for people to have to be trained to use the technology being evaluated before they could sit down in a usability lab to perform some tasks. The whole concept of even having an interface for end-users was pretty new. Conventions were just settling into place.

Usability testing has been around since at least the 1980s, but began to be widely practiced about the same time Nielsen and Molich published their heuristic evaluation method. While usability testing probably needs some updating as a method, the basic process still works well. It is pretty inexpensive to get access to users. UIs to technology are everywhere. For most of the applications of technology that I test, users don't need special training.

Heuristic evaluation may help a team know whether their UI complies with someone else's guidelines. But observing people using a design in a usability test gives a team primary data for making design decisions for their users using their design—especially in a world evolved far beyond command line entry and simple GUIs to options like touchscreens, social media, and ubiquitous connectivity. Separately and in combination, these and other design decisions present subtle, complex problems of usability. For me, observing people using a design will always trump an inspection or audit for getting solid evidence to determine a design direction. There is nothing like that "ah ha!" moment when a user does something unexpected to shed light on how well a design works.


Thank you for writing this article, Dana. It was very helpful for me today. 

This is a really great article, Dana. Thanks for taking the time to write this up. I especially like the part about how something in an assessment may say that one thing is bad, but in reality (after real testing), it turns out just fine. I like to remember the tone of the presentation of the information. I think it's really critical to paint a gentle picture of how we are "teaming up" to find "possible usability focus points," but not necessarily good/bad parts.

How would your blog interface adhere to these principles? Can you explain?

I use a different checklist depending on the UI element. That's how rich UI design is becoming. I comb through high-level heuristics like Nielsen's, then lower-level, specific heuristics.

Nielsen's heuristics were designed for desktop software. They are not fully applicable to web interfaces. I have a reduced set that I use for web evaluations, coupled with an assessment of the frequency and severity of each issue. So it is not just "this is a list of problems and they all must be fixed," but here are a number of things which, as Harry indicates, can be prioritized, because we know this will be a problem simply in its construct in relation to the target users and the context. I completely agree that calling in a consultant to do heuristic evaluation is neither going to be a good use of the consultant's time nor get the desired results to improve the web application, unless they have a way to really understand the domain and the users.

To do a good heuristic evaluation it is helpful to have personas and scenarios. Without those you are just asking for opinions. Certainly some opinions are better than others, which is why heuristics in the end are no replacement for usability tests. As anyone who has done both knows, it is always interesting to see how little users are bothered by one thing as compared to what they really get hung up on when performing a task. While doing almost any type of user testing is better than none, some heuristics can be detrimental if you end up working on things that, while likely problems, are not significant problems for your users.

As a counterpoint, I loathe being hired in to do some usability testing only to find the product is riddled with schoolboy errors that I could easily have uncovered in a quick UX review of some kind beforehand. Waste of time, waste of client's money.

For me, the classic definition of a heuristic evaluation involves the notion that somewhere out there exists a "perfect interface" for the task at hand, and the job of an interaction or UI designer is to strive toward that perfect model, never attaining it of course. I find many clients seem to think that if you hand off a couple of wireframes or a prototype to a usability expert, he or she will look up all these usability rules in her big book and come back with a kind of score or something. I think that basic concept is all wrong. Designing interfaces (or any product) is all about context, tasks, and limits, and these change over time and even within the time scope of a project. I totally agree that an inspection is incredibly valuable and cost-efficient if it is done by an experienced usability expert. Evaluating during sprints in an agile development process is spot on. To me, heuristic evaluations are to be considered a high-resolution, narrow-focus tool that should be used often but in short bursts. Usability testing is a lower-frequency, lower-resolution, broader tool that has the benefit (and pitfalls) of relying on trying to make sense of complex human behavior. It's a whole other ball game.

You talk about "a small group of inspectors — who are... not actually using the design as part of the inspection". Wouldn't a heuristic evaluation typically involve the evaluators using the system under test? How else can they do it?

And I do think that "heuristic evaluation" is a good name for the technique. "Heuristic" implies that the evaluation is made with guidelines -- rules of thumb, not precise measurements. "Usability audit" carries a whiff of accounting, which misleadingly implies that the results will be exact. "Your application is 87.3% usable."

I think much more common than full-blown heuristic assessments are the requests for advice and opinions in the corridor, over the phone, in meetings. People expect you to know stuff - and responding "It depends" because you haven't had the opportunity to conduct research yet (if at all) just doesn't fly with management or colleagues and makes you sound amateur and spineless.

It's tough to balance dispensing unsubstantiated advice with putting questions off till you have done your research but it is a compromise that needs to be made and you are going to have to draw on general design principles, your instincts and experience.

Great article, Dana.

I agree with you and Chris Rourke when you say we shouldn't follow the rules of heuristic evaluation too closely. In an expert evaluation, you should look at context and what is really important for a particular product or website.

As Joe has rightly said: the quality of expertise of the usability expert is key.

But then again, expert evaluations are no substitute for user testing. Experts don't know everything, not even usability experts.

Great Article.

I think there is value in both. Heuristics by professionals with a ton of experience (working with users in the field and in labs on a large variety of sites) can be invaluable. It can be especially useful during the design process if you want some basic feedback on wires/rough comps. It can help squash some of the bigger gotchas and get a design to a point where you can use user testing to answer more specific questions and not waste time/money on broad problems.

User testing is obviously invaluable, but I think it must really be done in a controlled environment with professionals who know how to interpret user reactions/behavior meaningfully. I've seen UX teams doing their own testing (even with real users) and drawing big shiny conclusions from anecdotal user behavior. There is a definite danger in oversimplifying a user's psychology or in simply seeing what you want to see based on a single user's behavior.

In both methods, I think quality of expertise and professional interpretation are key. I love scrappy, quick and dirty hallway testing as much as the next person - but if everyone in your hallway is also a designer, your results aren't worth much. Bottom line, I think you need to get your design out of your own mind to be interpreted by a third party with fresh eyes.

Hi Dana - thanks for the good summary. I agree that there are drawbacks to the method, starting with its name (alienating, academic & strange) and the fact it was developed for software rather than the web (some of the 10 are awkward to apply - help & documentation), especially for social media where the issue is often more the context than what you see. But the biggest risk is that the results can be dismissed as 'just an opinion' even when well conducted by experienced people. I was part of the CUE studies by Rolf Molich a few years ago, and the results comparing usability tests to heuristic evaluations are interesting in their high degree of nonconformity of findings. And there is no chance for the killer quote / video clip to seal the deal by showing what a barrier some issue is.
On the plus side, for quick turnaround they will always be around, though perhaps without the straitjacket of the 10 rules, applied instead as a 'usability audit,' 'expert evaluation,' or similar.

Chris -

You raise some excellent points, which I'm going to address in succeeding articles.

Without spoiling the plot, I think that heuristic reviews are better when a) the heuristics are developed from data collected for that product for that context of use, and b) when they're done by people who have a lot of experience observing users in that domain. And that's where a quick turn-around works really well. 

Also, if you're embedded in a team that trusts you, you can do reviews all the time - every sprint, say - but you still have to test with real users.

Thanks for a great article, Dana. The term "heuristic evaluation" frankly has always made me gag because, as you note, it's a buzzword that sounds credibly scientific but no one actually seems to understand the meaning or purpose of it. There's a section of my book that discusses how quantitative or quasi-objective methods of evaluating a product aren't anywhere near as effective as just talking to and observing users. Those methods can indicate that there might be a problem, possibly, somewhere... user feedback/observation tells you that there is a problem, and specifically where.

Jonathan -

A proven way to make your method seem more appealing is to give it an inscrutable name. But I'm sure in 1990 doing that made sense to Jakob and Rolf. "Usability engineering" was looking for credibility.

As far as observing people use a design, what I like is you get to find out why an issue is an issue. This is a really hard thing to find out from other methods where you're not observing directly (even if you can get users to write comments).