There has been a lot of debate and confusion about how many participants should be included in usability testing. Sometimes business teams think sample size in usability testing is just a matter of personal preference and may request dozens of participants for “a good sample set.” Other times, they want to include participants who meet specific demographic requirements—requirements that often satisfy internal politics more than they serve the research. There are also debates in the user research field about whether it’s better to conduct one big test with many participants and find all usability issues, or to conduct a series of smaller tests with fewer participants, identifying fewer issues but allowing for iterative design. And there is always the question of funding and the best use of money. It makes you wonder, “Where do I begin to get a good answer?”
Some say five participants are enough (e.g., Jakob Nielsen), and some say 15 are necessary. There are also mathematical formulas that determine the “right” number to test to find all of the issues. In some ways, each of these approaches is correct, which is why I think the question of how many users are needed for the ideal usability test needs to be reframed.
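The formulas mentioned above are typically variations on a simple problem-discovery model often attributed to Nielsen and Landauer: the share of problems found by n participants is 1 − (1 − p)^n, where p is the chance that one participant hits a given problem. A minimal sketch, assuming the commonly cited average of p ≈ 0.31 (your product’s actual rate will differ):

```python
def share_found(n: int, p: float = 0.31) -> float:
    """Expected proportion of usability problems uncovered by n participants,
    per the 1 - (1 - p)^n discovery model. p = 0.31 is the often-cited
    average per-participant discovery rate, not a universal constant."""
    return 1 - (1 - p) ** n

# Illustrates why "five is enough" and "you need 15" can both be defended:
for n in (1, 5, 10, 15):
    print(f"{n:2d} participants -> {share_found(n):.1%} of problems found")
```

With p = 0.31, five participants uncover roughly 84% of problems and 15 push past 99%, which is where both camps get their numbers.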
One of the reasons for having a larger set of test subjects is to discover all of the nuances of an application that may cause usability issues. This is great for one-shot testing; you get it all done at once and implement all the feedback. But in most development teams, not all of the feedback can be implemented; feedback from tests is prioritized in a general queue for future releases. If much of the user feedback isn’t implemented, it makes you wonder, “Why did we test in the first place, and how much input do users really have into the design of the applications they are using?” In this case, the answer is very little, which doesn’t make much sense.
The determination of the right number of users to test is based less on a "golden number," and more on the goals for testing, what is being tested, and if you want to consider your users as stakeholders.
Single, All-Encompassing Tests Do Not Include Users as Stakeholders
Usability testing is sometimes seen as an independent activity, something that’s done once or twice during a project lifecycle to ensure users “get it,” and done separately and in isolation from the team. It’s not part of development or design; it’s done to prove that a concept may be used. This results in the perception that users are "them" and ironically doesn't include them in the development process, although users are really the major stakeholder. With one-shot testing, your key audience—users—spends a few hours with a product that developers have worked on full-time for weeks. How can you get a decent opinion in such an aggressive timeframe? You need to include users throughout the development process.
Fix What Matters Most for Your Stakeholders
When you find 95% of the problems with a sample set of 15 users, you get a long list of potential enhancements. From my experience, most of the items on such lists never get fixed. Some of these issues get deprioritized to the bottom of the list while other more critical fixes, such as database optimization enhancements, take precedence. How can a graphic adjustment compete with a database optimization if the product manager has a limited number of hours available to improve the application? Which is more important to usability, a faster system or a better-looking interface? In most cases, users and product managers will pick system optimizations because, honestly, users will make do with a poorly designed application if they get the results they expect.
Since the long list of usability problems unearthed in large-sample testing often never gets fixed, does this type of testing provide good value? Isn’t it better to discover the few key issues that matter most to the users, fix those issues, and test again to validate that those issues are truly fixed?
Confirm You Solved Your Stakeholders’ Problems
Another challenge with usability testing is being able to confirm that your changes are truly solutions. Let’s say you have the budget to support testing with 15 people and you decide to have a single test session. After the test you create a list of issues that you recommend be fixed. You find solutions for some of the issues and get them implemented. But how do you determine if your “solutions” actually fixed the problem unless you have a second test? And what if one solution adversely affects another solution or, worse yet, negatively affects something that had already worked well? You need another test to evaluate the success and overall effect of your implemented solutions.
Often with the larger tests there isn’t a follow-up test, just an assumption that what the designers changed will work. Yet the larger issues that come out of testing can carry a number of design implications that affect usability in other areas, and you won’t know the full impact unless you complete a second test.
If you can’t afford a second test, then you can’t confirm that the issues identified in the first test have been fixed. You did one usability test and hope that things are working better, but that is making business decisions on hope—not a great way to run a successful business. And your major stakeholder wasn’t included in reviewing the changes; is that really a good way to treat a stakeholder?
Identify Your Users/Stakeholders Based on Activity Rather Than Demographics
Most people use applications in the same way. People may have different opinions and insights, but the general usage pattern is the same. For example, on a shopping website, people visit the site to “window shop” or with the intent of making a purchase. The user could be a high spender or a low spender; it really doesn’t matter. What matters most is the type of activity the user is doing—in this case, online shopping. Both user types will require similar shopping features to complete their tasks. The visitors who come less often may actually be better test participants; they know how shopping sites typically work and are looking for the familiar steps to shop, but may not be so savvy that they can easily overlook usability issues. Either way, the key point here is that the users are shoppers.
Do demographics matter? Not really. Sure, it’s nice to have participants who are engaged customers, but at the end of the day you are testing functionality and use patterns. It’s not a focus group, and the only profile requirement is really just “people who shop.”
How Do I Use These Ideas in My Own Tests?
Before determining who should be included in your test, you may want to determine how many tests you can afford. I suggest you run at least two tests, and there are many ways to get your budget to accommodate them. You will need at least five people per session, but plan on at least two not showing up, so invite about eight people for good measure. You could also use an online testing tool such as Userlytics, which provides a guaranteed set of five participants; that’s a great option as well.
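The invite-eight-to-seat-five rule of thumb generalizes to any expected no-show rate. A back-of-envelope helper, where the 30% rate is an assumption roughly matching the “plan on at least two of eight not showing up” estimate above:

```python
import math

def invites_needed(target_participants: int, no_show_rate: float) -> int:
    """Invitations required so the expected turnout meets the target,
    rounded up. no_show_rate is the fraction expected to cancel."""
    return math.ceil(target_participants / (1 - no_show_rate))

# Assuming ~30% no-shows (an estimate, not a measured rate):
print(invites_needed(5, 0.30))  # -> 8, matching the article's rule of thumb
```

If your recruiting history shows a different cancellation rate, plug that in instead; the point is simply to overbook enough to protect the five-person session.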
Get a Solid Definition of What the Product Does
Knowing the main goals of the product will help you find participants based on usage patterns. Traditionally, the best test subjects are those who would be engaged customers if they found the test product on their own because they can suggest ways to improve the product for that market. For example, if you are testing a sheet music site, contact test subjects who are musicians who regularly buy sheet music. They could give advice on how they like to research and select sheet music.
However, if finding the engaged customer type is difficult for your test, remember that you are testing general usability. If you can’t find musicians, then at a minimum find users who like to do online shopping in general. For testing a shopping site, it’s about the process of shopping; the general shopping process is the same whether it’s for video games or for designer dresses. An engaged customer will offer more tips and suggestions to make the site better for their specific segment, but if you are testing the shopping process you are mainly testing the shopping cart, indicating shipment to multiple locations, purchasing, and whether information is easy to find. Whether you are a Halo or Hermes fan, you should be able to find the product you want and the specs you need to make a decision, and be able to easily purchase the item on the site.
As a different example, let’s say you are testing a system for administering corporate health insurance. You will need to find test subjects who do these administration tasks. What may be surprising is that although the participants may be in large or small companies, the general needs for an administrator are the same no matter what the size of the company because this is about how administrators communicate with an insurance company. The tasks that are part of the role may be divided among several people, but the administrator(s) still have to enter new applications, change information, and track statuses. You may get some interesting data depending on the industry the people are from (e.g., in construction where computers are less accessible, or banking where everyone is on multiple terminals). But at the end of the day, HR administrators and office managers from any industry and company size are able to provide valuable feedback on a system based on what they need to accomplish.
Let’s say you are requested to test the usability of Product B as well as how users of Product A will accept Product B. Your instinct may be to have two groups represented in the usability test: the usability test group (those who meet the general user profile) and a second group of existing Product A users to confirm acceptance. But is this the right approach? In the end, usability is usability. If a product isn’t generally usable, it won’t be accepted or adopted anyway. It’s more important to confirm Product B’s general usability because if it’s usable, it will be accepted. If you still need to prove more specifically whether users of Product A will accept Product B, recommend a usability test for Product B and complete a heuristic evaluation of the two products. With this approach, you confirm that Product B is generally usable, and can use that data in the evaluation to support why users of Product A will adopt Product B.
There’s no question that using engaged customers as participants will provide great results in any study. However, in a pinch, participant selection criteria based on tasks and activities will meet your needs as well. Keep in mind the goals of your test, remember that you are testing usability, and focus on finding the significant usability issues to address in the next iteration. And your users should be treated as stakeholders, just as involved as any other major stakeholder on your team.
Approaching usability testing with iterations of smaller groups who are targeted based on usage patterns isn’t just better for the development team; it’s better for the users and better for the business. User feedback is heard, ideas are incorporated, and problems are fixed; it’s a better use of resources; there is more direct user dialog; and the business will have a higher quality, shippable product. And, consistent with agile methodologies, users are included in the development process as stakeholders. Everyone benefits. The product manager now has a clear understanding of the users’ perceptions of the product, the designers are able to focus on addressing well-defined and verified usability problems instead of trying to resolve issues that may not exist except in a team member’s perception, and the developers understand why certain issues are so important for the final product. Through iterative testing, the issues that are raised are always timely and relevant. Conducting larger tests to find all the issues is a concept that only works when there is a generous budget and an unlimited schedule, a combination that exists only in engineering fantasies. And it isolates users from the process, when in fact they are major stakeholders.