
Testing Dialog Design in a Speech Application

by Stephen Keller
6 min read

Speech-recognition apps require specialized testing methods that can be applied to other interfaces as well.

Designing applications that use speech recognition as their primary user experience is challenging, and the challenge is compounded by the difficulty of testing an application that provides a speech interface.

In some ways, a voice user interface (VUI) has all the challenges of other interface design and testing, such as subjective feedback from users (“I don’t like this”), along with the additional challenge of having to account for the error rate associated with speech recognition.

Here we’ll discuss some of the methods for validating the UX of a speech application’s dialog.

Dialog Design and Testing

Dialog is the common term for the UX of a speech application, because most speech applications take the form of a dialog between human and machine. Even multi-modal applications such as Apple’s Siri still largely model interactions as a series of questions and answers. Because of this, test plans for speech applications often resemble literal scripts, like those you might see for a play or a movie. A simple test script might look something like this:

COMPUTER: Thank you for calling the Acme Bank. What would you like to do today?

HUMAN: Transfer funds.

COMPUTER: Do you want to transfer funds from your savings account or your checking account?

HUMAN: Checking account.

The purpose of the test script, at a basic level, is to verify that simple functionality is working as intended. These scripts are similar to the unit tests used by developers to test specific pieces of code in an application. They are an important part of testing speech applications (if “transfer funds” is not a valid verb in the main menu prompt, that is an important bug to uncover), but they do not really test the design except to prove that the simplest use cases work as intended.
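As a concrete illustration, a script like the one above can be automated as a unit test. The sketch below assumes a hypothetical DialogSession harness (not any particular vendor’s API); in a real project, say() would feed audio or text to the recognizer rather than matching strings.

```python
# A minimal sketch of a scripted dialog test. DialogSession is a toy,
# hypothetical stand-in for a real speech-application test harness.

class DialogSession:
    def __init__(self):
        self.state = "main_menu"

    def prompt(self):
        # The prompt the application would play in the current state.
        return {
            "main_menu": "Thank you for calling the Acme Bank. "
                         "What would you like to do today?",
            "transfer_source": "Do you want to transfer funds from your "
                               "savings account or your checking account?",
        }[self.state]

    def say(self, utterance):
        # A real harness would send audio (or text) to the recognizer here.
        if self.state == "main_menu" and utterance == "transfer funds":
            self.state = "transfer_source"
        elif self.state == "transfer_source" and utterance == "checking account":
            self.state = "confirm_transfer"
        else:
            raise AssertionError(f"{utterance!r} is not valid in {self.state}")

def test_transfer_funds_script():
    session = DialogSession()
    assert "What would you like to do" in session.prompt()
    session.say("transfer funds")    # fails if the phrase is missing from the menu
    assert "savings account or your checking" in session.prompt()
    session.say("checking account")
    assert session.state == "confirm_transfer"

test_transfer_funds_script()
print("Script passed: the simple transfer-funds path works as intended.")
```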

The Importance of a Blank Slate

When we refer to testing dialog design, we’re really referring to testing the interface. To do this, the application must be exposed to users who have no scripts or preconceived notions about the interface. Using the play analogy again, every prompt may be understood differently than the author intended, and there are a vast number of responses that can be elicited for even commonly understood prompts.

All of these different lines in the play need to be tested in the real world; what is prompted, what is understood, and what is wanted will all vary with word choice. A good user experience should be somewhat intuitive and self-explanatory, with the interface providing the information novice users need to quickly understand how to accomplish their goals. In dialog design, this means giving the user a clear mental model of the application through prompts that spell out what the user can do at any menu and how to navigate to other menus.

If a speech application is only exposed to users who follow test scripts, there is no way to verify that the interface meets these requirements. As any interface designer knows, humans are not robots and are often unpredictable. In order to know whether a dialog design provides a good experience, it must be exposed to users who are allowed to react naturally.

These “blank slate” testers should be brought in to test the application as early as possible—often while it is still in an alpha or beta state. Since a speech application’s UX design is so tightly woven with its programming, getting feedback about the dialog design early on helps developers make and test any changes to an application’s codebase.

Running Usability Tests

There are a number of ways to perform usability tests of a speech application’s dialog design. The most common ways leverage the following techniques:

  • Informal or formal user feedback
  • Speech tuning response files
  • Listening to recordings of entire sessions or phone calls
  • “Wizard of Oz” interactions

User Feedback

The simplest sort of testing is to allow users to experience a dialog and provide feedback. This may be informal (asking users to just write up their experiences) or it may be formal (asking users to fill out a detailed evaluation form). Allowing users to provide direct feedback is always useful with any kind of UX design, as it allows the designer to know what really matters to users and what doesn’t.

User feedback is most helpful when it is combined with other types of testing, such as those discussed below. For example, by combining user feedback with full-call recording, it’s possible to use free-form feedback (“It didn’t seem to understand me much”) to pinpoint the exact spot where users had trouble (“I had to repeat myself three times at this particular prompt”).
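As a sketch of how that combination works in practice, the snippet below counts failed recognition attempts per prompt from a simple interaction log. The (call_id, prompt_name, recognized) log format is invented here purely for illustration.

```python
# A minimal sketch of pinpointing trouble spots from interaction logs.

from collections import Counter

interaction_log = [
    ("call-001", "main_menu", True),
    ("call-002", "transfer_source", False),  # user had to repeat themselves
    ("call-002", "transfer_source", False),
    ("call-002", "transfer_source", True),
    ("call-003", "transfer_source", False),
    ("call-003", "transfer_source", True),
]

# Count failed recognition attempts per prompt to find where users struggled.
failures = Counter(prompt for _, prompt, ok in interaction_log if not ok)

for prompt, count in failures.most_common():
    print(f"{prompt}: {count} failed attempts")
# "transfer_source: 3 failed attempts" corroborates feedback like
# "it didn't seem to understand me much" and locates the problem prompt.
```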

Capturing Response Files

Every commercial speech recognizer on the market can save what are called response or utterance files: recordings of what a user said, paired with the recognition result for that utterance. In conjunction with speech tuning utilities (such as the LumenVox Speech Tuner), designers can listen to all of the interactions with an application’s users, transcribe the speech, and generate accuracy metrics that indicate how well the application is performing.

During this process, the designer can identify a number of issues. One of the main things that can be ascertained, especially if a lot of data is collected, is which problems stem from the design of the dialog and which from the speech recognizer. For instance, if users give appropriate responses to a prompt but are not recognized, the recognizer probably needs to be tuned or the grammars changed. On the other hand, if users are speaking out-of-grammar phrases, the prompts and dialog options should probably be reworked so that users have a better idea of what is expected of them.
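A rough sketch of that triage, assuming utterances have already been transcribed by hand and the grammar is reduced to a set of valid phrases (real tuning tools such as the LumenVox Speech Tuner do far more than this):

```python
# A minimal sketch of separating recognizer problems from dialog-design
# problems. The data and grammar here are illustrative, not a real API.

grammar = {"transfer funds", "check balance", "pay bill"}

# Each record pairs a human transcription of an utterance file with the
# recognizer's result (None means no recognition result was returned).
records = [
    {"transcript": "transfer funds", "result": "transfer funds"},
    {"transcript": "transfer funds", "result": "check balance"},  # recognizer error
    {"transcript": "move my money",  "result": None},             # out-of-grammar
]

correct = misrecognized = out_of_grammar = 0
for r in records:
    if r["transcript"] not in grammar:
        out_of_grammar += 1   # design problem: rework prompts and options
    elif r["result"] == r["transcript"]:
        correct += 1
    else:
        misrecognized += 1    # recognizer problem: tune the engine or grammars

print(f"in-grammar accuracy: {correct}/{correct + misrecognized}")
print(f"out-of-grammar utterances: {out_of_grammar}/{len(records)}")
```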

Full Session Recording

Recording and listening to entire phone calls or sessions can be costly in terms of time required, but can reveal issues that may not otherwise present themselves. For instance, if users are getting frustrated and quitting the application, this can be obvious when the entire call is heard in context but may not be clear if only individual interactions are heard.
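Simple log analysis can help decide which recordings deserve a full listen. The sketch below flags calls that ended in a hang-up before the task completed; the event-log format is invented for illustration.

```python
# A minimal sketch of flagging sessions worth listening to in full.
# Each call is reduced to an ordered list of dialog events.

sessions = {
    "call-001": ["main_menu", "transfer_source", "task_complete"],
    "call-002": ["main_menu", "transfer_source", "transfer_source",
                 "transfer_source", "hangup"],  # repeated prompt, then quit
}

for call_id, events in sessions.items():
    # A hang-up with no completed task suggests a frustrated user who quit.
    if events[-1] == "hangup" and "task_complete" not in events:
        print(f"{call_id}: abandoned after {len(events)} turns; "
              f"review the full recording in context")
```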

“Wizard of Oz” Interactions

The most time-intensive form of usability testing uses “Wizard of Oz” interactions, in which a person “behind the curtain” listens to live users interacting with the system in real time. The Wizard, who stands in for the automated system in the background, listens to the human talk to the system and then inputs the user’s choice through another mechanism (usually by making an entry in a form with a mouse and keyboard on the back-end). In this sense, the human replaces the speech recognizer, allowing the designer to evaluate the clarity of prompts and the range and wording of responses (this was also the subplot of a Seinfeld episode in which Kramer erroneously provided movie start times by reading from a newspaper).

The Wizard of Oz model is a good mechanism for testing a dialog design before much programming or grammar design has been done. It allows the intended UX to be evaluated with actual users instead of just testers, since the system will still appear to work correctly and will not upset paying customers who might not otherwise appreciate beta testing a new speech recognition system.
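As a sketch, the Wizard’s back-end form can be as simple as a console that logs what was heard next to the option the Wizard selected; entries that match no option accumulate evidence that a prompt needs rework. Everything here is illustrative; a real setup would be wired into live telephony.

```python
# A minimal sketch of a Wizard of Oz back-end console. The Wizard hears
# the caller live, types what was said, and selects the matching option;
# both are logged. Entirely illustrative.

import json
import time

MENU_OPTIONS = ["transfer funds", "check balance", "pay bill", "(no match)"]

def wizard_turn(prompt_name):
    heard = input("What did the caller say? ")
    for i, option in enumerate(MENU_OPTIONS):
        print(f"  [{i}] {option}")
    selected = MENU_OPTIONS[int(input("Select the matching option: "))]
    # "(no match)" entries mark prompts that elicit unexpected responses.
    record = {"time": time.time(), "prompt": prompt_name,
              "heard": heard, "selected": selected}
    print(json.dumps(record))
    return selected

if __name__ == "__main__":
    wizard_turn("main_menu")
```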

Conclusion

The best testing plan for speech applications will combine the methods above or will be a variation of one or more of them. When collecting user feedback on a speech application, it’s usually a good idea to capture response files at the same time in order to perform more in-depth speech tuning. Full recordings should be enabled when doing Wizard of Oz testing, and so on. These methods will allow the designer to understand how real-world users interact with a speech system, and provide instructive input for improving and enhancing the quality of the dialog design.

More generally, the same testing methodologies can be adapted to other types of user interfaces beyond speech recognition, including the UX for web transactions, web chat, call center scripting, kiosk interfaces, and other systems where user input may be open-ended or require semantic interpretation. The more real-world testing that can be performed before a system is fully built, the better the launched product will serve its intended purpose right out of the gate, and the less rework will be required.

Stephen Keller
A Solutions Architect at LumenVox, a leading provider of speech recognition and text-to-speech software, Stephen specializes in the design and development of complex voice user interfaces (VUIs) as part of the company’s client services team, working to improve the user experience for anyone interfacing with speech technologies. He has worked on dozens of speech applications, ranging from simple yes/no-style interactions to complex, multi-slot natural language interfaces. His role at LumenVox encompasses a variety of functions, from pure design and consultation work to development and training.
