Testing Dialog Design in a Speech Application

by Stephen Keller
6 min read

Speech-recognition apps require specialized testing methods that can be applied to other interfaces as well.

Designing applications that use speech recognition as their primary user experience is a challenge, and it is compounded by the difficulty of testing an application that provides a speech interface.

In some ways, a voice user interface (VUI) has all the challenges of other interface design and testing, such as subjective feedback from users (“I don’t like this”), along with the additional challenge of having to account for the error rate associated with speech recognition.

Here we’ll discuss some of the methods for validating the UX of a speech application’s dialog.

Dialog Design and Testing

Dialog is the common term for the UX of a speech application, because most speech applications take the form of a dialog between human and machine. Even multi-modal applications such as Apple’s Siri still largely model their interactions as a series of questions and answers. Because of this, test plans for speech applications often resemble literal scripts, like you might see for a play or a movie. A simple test script might look something like this:

COMPUTER: Thank you for calling the Acme Bank. What would you like to do today?

HUMAN: Transfer funds.

COMPUTER: Do you want to transfer funds from your savings account or your checking account?

HUMAN: Checking account.

The purpose of the test script, at a basic level, is to verify that simple functionality is working as intended. These scripts are similar to the unit tests used by developers to test specific pieces of code in an application. They are an important part of testing speech applications (if “transfer funds” is not a valid verb in the main menu prompt, that is an important bug to uncover), but they do not really test the design except to prove that the simplest use cases work as intended.
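
As a rough illustration, here is a minimal sketch of how such a script could be automated, assuming a hypothetical SpeechSession test harness with next_prompt() and say() methods; the class, its methods, and the prompt fragments are illustrative stand-ins for whatever drives the real application, not a real API.

```python
# Minimal scripted dialog test. SpeechSession, next_prompt(), and say() are
# hypothetical stand-ins for whatever harness drives the real application.

def run_script(session, script):
    """Drive the app through (expected_prompt_fragment, spoken_response) pairs."""
    for expected, response in script:
        prompt = session.next_prompt()  # text of the next system prompt
        assert expected in prompt, f"expected {expected!r}, got {prompt!r}"
        if response is not None:
            session.say(response)  # feed the scripted utterance

banking_script = [
    ("What would you like to do today", "Transfer funds"),
    ("savings account or your checking account", "Checking account"),
]
```

A failure at the first step, such as “transfer funds” not being accepted at the main menu, is exactly the kind of simple functional bug these scripts are meant to catch.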

The Importance of a Blank Slate

When we refer to testing dialog design, we’re really referring to testing the interface. To do this, the application must be exposed to users who have no scripts or preconceived notions about the interface. Using the play analogy again, every prompt may be understood differently than the author intended, and there are a vast number of responses that can be elicited for even commonly understood prompts.

All of these different lines in the play need to be tested in the real world, because word choice affects what is prompted, what is understood, and what users want to say in response. All good user experiences should be somewhat intuitive and self-explanatory, with the interface providing the information necessary for novice users to quickly understand how to accomplish their goals. In dialog design, this means giving the user a clear mental model of the application through prompts that spell out what the user can do at any menu and how to navigate to other menus.

If a speech application is only exposed to users who follow test scripts, there is no way to verify that the interface meets these requirements. As any interface designer knows, humans are not robots and are often unpredictable. In order to know whether a dialog design provides a good experience, it must be exposed to users who are allowed to react naturally.

These “blank slate” testers should be brought in to test the application as early as possible—often while it is still in an alpha or beta state. Since a speech application’s UX design is so tightly woven with its programming, getting feedback about the dialog design early on helps developers make and test any changes to an application’s codebase.

Running Usability Tests

There are a number of ways to perform usability tests of a speech application’s dialog design. The most common approaches leverage the following techniques:

  • Informal or formal user feedback
  • Speech tuning response files
  • Listening to recordings of entire sessions or phone calls
  • “Wizard of Oz” interactions

User Feedback

The simplest sort of testing is to allow users to experience a dialog and provide feedback. This may be informal (asking users to just write up their experiences) or it may be formal (asking users to fill out a detailed evaluation form). Allowing users to provide direct feedback is always useful with any kind of UX design, as it allows the designer to know what really matters to users and what doesn’t.

User feedback is most helpful when it is combined with other types of testing, such as those discussed below. For example, by combining user feedback with full-call recording, it’s possible to use free-form feedback (“It didn’t seem to understand me much”) to pinpoint the exact spot where users had trouble (“I had to repeat myself three times at this particular prompt”).
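
As a sketch of how that pairing might work, the snippet below tallies no-match and no-input retries per prompt from a session event log; the (prompt_name, event) log format is an assumption made for illustration, not any particular platform’s output.

```python
from collections import Counter

def retries_per_prompt(events):
    """Count no-match/no-input retries at each prompt in one session's log."""
    retries = Counter()
    for prompt_name, event in events:
        if event in ("no_match", "no_input"):
            retries[prompt_name] += 1
    return retries

# One caller's session: three tries at the transfer-source prompt.
session_log = [
    ("main_menu", "match"),
    ("transfer_source", "no_match"),
    ("transfer_source", "no_match"),
    ("transfer_source", "match"),
]
print(retries_per_prompt(session_log))  # Counter({'transfer_source': 2})
```

A prompt that accumulates retries across many sessions is a strong candidate for the source of the “it didn’t seem to understand me” feedback.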

Capturing Response Files

Every commercial speech recognizer on the market allows for the saving of what are called response or utterance files, which record what a user said and what the recognition result for that utterance was. In conjunction with speech tuning utilities (such as the LumenVox Speech Tuner), designers can listen to all of the interactions with an application’s users, transcribe the speech, and generate accuracy metrics that indicate how well the application is performing.

During this process, the designer can identify a number of issues. One of the main things that can be ascertained, especially if a lot of data is collected, is whether problems stem from the design of the dialog or from the speech recognizer itself. For instance, if users are giving appropriate responses to a prompt but are not being recognized, the recognizer probably needs to be tuned or the grammars changed. On the other hand, if users are speaking out-of-grammar phrases, then the prompts and dialog options probably should be reworked so that users have a better idea of what is expected of them.
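
A sketch of this triage, assuming each tuned utterance has been reduced to a (transcript, recognition_result, in_grammar) tuple; that format is illustrative rather than the output of any specific tuning tool.

```python
def triage_utterances(utterances):
    """Split tuned utterances into recognizer problems vs. dialog problems."""
    correct, missed_in_grammar, out_of_grammar = 0, [], []
    for transcript, recognized, in_grammar in utterances:
        if in_grammar and transcript == recognized:
            correct += 1  # recognizer got a valid response right
        elif in_grammar:
            missed_in_grammar.append(transcript)  # valid speech, bad result: tune recognizer/grammar
        else:
            out_of_grammar.append(transcript)  # unanticipated speech: rework the prompts
    in_grammar_total = correct + len(missed_in_grammar)
    print(f"in-grammar accuracy: {correct}/{in_grammar_total}, "
          f"out-of-grammar utterances: {len(out_of_grammar)}/{len(utterances)}")
    return missed_in_grammar, out_of_grammar
```

A low in-grammar accuracy points at the recognizer or the grammar; a high out-of-grammar rate points at the prompts.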

Full Session Recording

Recording and listening to entire phone calls or sessions can be costly in terms of time required, but can reveal issues that may not otherwise present themselves. For instance, if users are getting frustrated and quitting the application, this can be obvious when the entire call is heard in context but may not be clear if only individual interactions are heard.
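
If basic metadata is logged alongside the audio, even a simple filter can prioritize which recordings to listen to first; the session fields and retry threshold below are assumptions for illustration.

```python
def flag_for_review(sessions, max_retries=3):
    """Return ids of sessions that ended without task completion or with many retries."""
    return [sid for sid, (completed, retries) in sessions.items()
            if not completed or retries >= max_retries]

sessions = {
    "call-001": (True, 0),   # (task completed, total retries)
    "call-002": (False, 1),  # caller quit mid-task: listen to this one
    "call-003": (True, 4),   # finished, but struggled along the way
}
print(flag_for_review(sessions))  # ['call-002', 'call-003']
```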

“Wizard of Oz” Interactions

The most time-intensive form of usability testing involves “Wizard of Oz” interactions, in which a person “behind the curtain” listens to live users interacting with the system in real time. The Wizard, who can be thought of as the automated system in the background, listens to the human talk to the system and then inputs the user’s choice through another mechanism (usually by making an entry in a form with a mouse and keyboard on the back end). In this sense, the human replaces the speech recognizer, making it possible to evaluate the clarity of prompts and the range and wording of responses (this was also the subplot of a Seinfeld episode in which Kramer erroneously provided movie start times by reading a newspaper).

Using the Wizard of Oz model is a good mechanism for testing a dialog design before much programming or grammar design has been done. It allows the intended UX to be evaluated with actual users instead of just testers, since the system will still appear to work correctly and will not upset paying customers who may not appreciate beta testing a new speech recognition system.
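
A bare-bones sketch of the wizard’s side of such a test, with a console form standing in for the recognizer; the intent list and the app_respond() dialog callback are hypothetical.

```python
INTENTS = ["transfer_funds", "check_balance", "pay_bill", "operator"]

def wizard_loop(app_respond):
    """Console loop where a human plays the recognizer for one live session."""
    while True:
        heard = input(f"Caller's intent {INTENTS} (blank to end session): ").strip()
        if not heard:
            break
        if heard not in INTENTS:
            heard = "no_match"  # unlisted entries become a no-match event
        print("SYSTEM SAYS:", app_respond(heard))  # dialog logic picks the next prompt
```

Because the caller only ever hears the system’s prompts, the full dialog can be exercised before a single grammar is written.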

Conclusion

The best testing plan for speech applications will combine the methods above or will be a variation of one or more of them. When collecting user feedback on a speech application, it’s usually a good idea to capture response files at the same time in order to perform more in-depth speech tuning. Full recordings should be enabled when doing Wizard of Oz testing, and so on. These methods will allow the designer to understand how real-world users interact with a speech system, and provide instructive input for improving and enhancing the quality of the dialog design.

More generally, the same testing methodologies can be adapted to other types of user interfaces outside of speech recognition, including the UX for web transactions, web chat, call center scripting, kiosk interfaces, and other systems where user input may be open-ended or require semantic interpretation. The more real-world testing that can be performed before a system is fully built, the better the launched product will serve its intended purpose right out of the gate, and the less rework will be required.

Stephen Keller
A Solutions Architect at LumenVox, a leading provider of speech recognition and text-to-speech software, Stephen specializes in the design and development of complex voice user interfaces (VUIs) as part of its client services team, working to improve the user experience for anyone interfacing with speech technologies. He has worked on dozens of speech applications, ranging from simple yes/no-style interactions to complex, multi-slot natural language interfaces. His role at LumenVox encompasses a variety of functions, including pure design and consultation work as well as development and training.
