Checking Under the Hood of your ASR Engine

by Jo Appleton
7 min read

Automated speech recognition (ASR) engines need to be tuned using diagnostic tools for optimum performance, just like cars.

If you put your foot on the accelerator and listen to your car rev before finally kicking into gear, you can usually tell when it’s time to tune up your engine. The same is true for an automated speech recognition (ASR) engine.

If calls into a speech application start sputtering their way into an interaction with a customer service rep because words become “unrecognizable,” it is probably time to tune up your ASR software.

Like the diagnostic tool a mechanic plugs into your dashboard that reveals almost everything about your car’s performance, a speech tuner is a powerful software tool that lets ASR users evaluate their speech application and get feedback on how it is performing. Other tuning tools are available, but this article focuses on the LumenVox Speech Tuner as an example of one possible tool for tuning a speech application.

The Engines of Today

Speech recognition engines developed by today’s speech technology leaders are designed to be robust enough to recognize natural language, with a fair amount of tuning flexibility already built into the software. Yet many of you reading this can undoubtedly recall at least one instance when you gave up trying to talk to a machine and said “operator,” or pressed the zero key so many times that the system gave up and routed you to customer service.

You then likely decide that, in the future, it’s best not to go through the automated attendant to contact that organization. It was too much trouble, and speaking to a live customer service representative seemed the only way to satisfy the original purpose of your call. You might even tell your friends about it so they don’t get stuck in the system, too. To the organization, this equates to increased labor costs as more calls are routed to human attendants.

“In most cases, the fix can be simple,” says Axel An, a senior core technology engineer in the speech industry who helped develop one of the first speech tuners nearly a decade ago. “Both businesses and customers could benefit from tuning. It can reduce the call length and the number of people handling calls. And that can translate directly to a monetary return.”

From an efficiency standpoint, the car engine and the speech engine are the same: a regularly tuned-up car gets good gas mileage and produces cleaner emissions, while a regularly tuned-up speech engine handles higher call volumes with less human work. Although the speech recognition industry estimates that about 40-50% of total development and deployment time should be spent on the tuning process, speech tuners are greatly under-utilized.

What separates tuning from the normal process of testing an application is that tuning generally relies on actual production data collected from call logs. It provides a statistical method—rather than an anecdotal one—for evaluating the speech engine, making tuning easier and more precise.
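
To see what “statistical rather than anecdotal” means in practice, here is a minimal sketch in Python. The record format and field names are invented for illustration; the Speech Tuner computes this kind of figure for you:

```python
# Minimal sketch: score a batch of transcribed calls against the engine's
# decodes to get an aggregate accuracy figure instead of anecdotes.
# The record format and field names here are hypothetical, not LumenVox's.
calls = [
    {"decoded": "sales",   "transcript": "sales"},
    {"decoded": "billing", "transcript": "billing"},
    {"decoded": "sails",   "transcript": "sales"},    # misrecognition
    {"decoded": None,      "transcript": "operator"}, # no-match
]

correct = sum(c["decoded"] == c["transcript"] for c in calls)
print(f"Recognized {correct} of {len(calls)} calls correctly "
      f"({correct / len(calls):.0%})")
```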

The Tuner Interface

Our latest LumenVox Speech Tuner has a user interface that is surprisingly easy to use. Six colorful icons make up most of a toolbar on the left of the screen when the tuner starts up:

  • The Summary page
  • Call Browser
  • Grammar Editor
  • Transcriber
  • Text-to-Speech
  • Tester

The difficulty of tuning a speech engine can range from rather simple to quite intricate, depending on the performance indicators from the ASR application. For instance, it might be as basic as adjusting the “grammars” to include new utterances. A grammar is a vocabulary group made up of the words or numbers that the ASR will expect callers to say, such as all the names of the employees in an organization. Tuning can also be as involved as rewriting a prompt to sound clearer or, in some cases, redesigning the entire application to better guide callers through the flow of a call. In addition to grammar tuning, the tuner can perform transcriptions, instant parameter tuning, and version upgrade testing of any speech application.
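
To make the idea of a grammar concrete, here is a toy sketch in Python. Production grammars are written in formal formats such as W3C SRGS, so treat this purely as a model of the concept of extending a vocabulary:

```python
# Toy model of a grammar: the set of utterances the recognizer will accept.
# Real grammars use formal formats (e.g., W3C SRGS); this sketch only
# illustrates the concept of extending a vocabulary with new utterances.
directory_grammar = {"john smith", "mary jones", "operator"}

# Suppose tuning reveals callers also say "front desk", so we add it:
directory_grammar |= {"front desk", "reception"}

def in_grammar(utterance: str) -> bool:
    """Anything outside the grammar is 'out of vocabulary' (OOV)."""
    return utterance.lower() in directory_grammar

print(in_grammar("Front Desk"))  # True after the update
print(in_grammar("help me"))     # False: would be flagged as OOV
```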

The tuning process is essentially the same for simple or sophisticated improvements to a speech engine. The process is also cyclical, meaning that after changes have been made to the application, the results are reviewed and more changes are made. Returning to the analogy of tuning a car, this is when the mechanic plugs the vehicle back into the diagnostic machine after changing the spark plugs and adjusting the timing to see if it is running better, or if more adjustments are needed. This feedback loop turns out to be an excellent way for ASR users and application developers to see the effects of tuning in real time.

Here are the five basic steps to tuning an automated speech recognizer (a sketch of the full loop follows the list):

  1. Deploy the application
  2. Collect the data
  3. Transcribe the data
  4. Analyze the data
  5. Test the changes (by redeploying the application)
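
As a hedged sketch of how those five steps form a loop rather than a one-off task, every function below is a stub standing in for a manual or tool-assisted step, not a real LumenVox API:

```python
# Sketch of the cyclical tuning loop. Every helper is a placeholder stub;
# none of this is a real LumenVox API.
def deploy(app): print(f"deploying {app}")
def collect_call_logs(app): return ["call-001", "call-002"]  # stub data
def transcribe(calls): return {c: "sales" for c in calls}    # stub transcripts
def evaluate(transcripts): return 0.90                       # stub accuracy
def apply_changes(app): print("updating grammars / prompts")

def tune(app, target_accuracy=0.95, max_rounds=5):
    deploy(app)                               # 1. deploy; live calls get logged
    for _ in range(max_rounds):
        logs = collect_call_logs(app)         # 2. collect production data
        transcripts = transcribe(logs)        # 3. verify what callers said
        accuracy = evaluate(transcripts)      # 4. analyze accuracy and OOV
        if accuracy >= target_accuracy:
            break                             # good enough: stop tuning
        apply_changes(app)                    # adjust grammars, prompts, params
        deploy(app)                           # 5. redeploy to test the changes

tune("auto-attendant")
```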

[Screenshot: LumenVox Speech Tuner]

Deploy an application

There are many places within an ASR application to make changes, but grammars are generally the easiest and most effective place to start tuning. To begin using the LumenVox Speech Tuner, you must first deploy your speech application so it can record live incoming calls and build up a dataset.


Collect the Data

The data is then imported into the Speech Tuner in the form of a recording file, or utterance file, that contains a list of every call into the speech application. Once the logs are loaded into the Speech Tuner, they appear in the summary screen separated by grammar type. For example, one grammar might be a company’s directory of employee names and extensions; another might be made up of city and state names. It depends on the type of business for which the auto attendant is taking calls.
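
A hypothetical sketch of what “separated by grammar type” looks like as data follows. The log format below is invented for illustration; real utterance files are in a LumenVox-specific format that the tuner parses for you:

```python
from collections import defaultdict

# Hypothetical call-log records; field names are invented for illustration.
call_log = [
    {"grammar": "employee_names", "audio": "call_001.wav", "decoded": "mary jones"},
    {"grammar": "employee_names", "audio": "call_002.wav", "decoded": "john smith"},
    {"grammar": "city_state",     "audio": "call_003.wav", "decoded": "san diego"},
]

# Group calls by grammar, mirroring the Speech Tuner's summary screen.
by_grammar = defaultdict(list)
for call in call_log:
    by_grammar[call["grammar"]].append(call)

for grammar, calls in by_grammar.items():
    print(f"{grammar}: {len(calls)} call(s)")
```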

[Screenshot: LumenVox Speech Tuner]

Transcribe the Data

Now that you have a dataset to work with, the next step in the tuning process is to transcribe the data. Start by selecting a grammar set from the “Grammars” list in the summary screen, then click the Transcriber tab on the toolbar (pictured above). The list of calls shows in the top half of this screen, and a waveform of each call’s audio is displayed along a time scale in the section below. Select each call one by one, press play, and listen to the audio.

[Screenshot: LumenVox Speech Tuner]

If what the caller says matches what the ASR engine interpreted them as saying, confirm it by selecting the green check mark next to the text box or by pressing enter, which confirms and plays the next call on the list. If the caller says something other than what was decoded, type in the word the caller actually said and then press enter. Sometimes a caller’s response is not clear or is muddled by background noise. When this happens, select the red “X” or press escape to reject the call as “garbage” and play the next call.
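
The decision logic behind those three keystrokes is simple. Here is a sketch of what each choice records; the statuses and fields are invented, not the tuner’s internal representation:

```python
# Sketch of the three transcription outcomes described above. The status
# labels and record fields are illustrative, not the tuner's internals.
def review(decoded: str | None, heard: str | None) -> dict:
    if heard is None:
        # Red X / escape: unintelligible or drowned in background noise
        return {"status": "garbage"}
    if decoded is not None and decoded.lower() == heard.lower():
        # Green check / enter: the engine matched the caller
        return {"status": "correct", "transcript": heard}
    # Typed correction: the engine decoded something else (or nothing)
    return {"status": "corrected", "transcript": heard}

print(review("sales", "sales"))  # confirmed
print(review("sails", "sales"))  # corrected
print(review("sales", None))     # rejected as garbage
```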

[Screenshot: LumenVox Speech Tuner]

Analyze the Data

After all calls have been transcribed, you have an aggregate dataset that can be easily analyzed. Navigate to the “Tester” screen, where you can see at a glance how well, or how poorly, your ASR engine has performed. The “statistics” box displays the percentage of words the ASR engine recognized correctly, how confident it was in those results, and how many words or utterances were flagged “OOV” (out of vocabulary, or out of grammar). Evaluating the numbers will also give you an idea of how callers are using the system and help to identify problem areas.
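
A sketch of the three headline figures in that statistics box, computed from hypothetical transcribed calls (the fields are invented for illustration):

```python
# Compute the statistics-box figures over a set of transcribed calls.
# Records and field names are hypothetical, not the tuner's format.
calls = [
    {"decoded": "sales",   "transcript": "sales",    "confidence": 0.92, "oov": False},
    {"decoded": "billing", "transcript": "billing",  "confidence": 0.88, "oov": False},
    {"decoded": "sails",   "transcript": "sales",    "confidence": 0.41, "oov": False},
    {"decoded": None,      "transcript": "operator", "confidence": 0.00, "oov": True},
]

correct = [c for c in calls if c["decoded"] == c["transcript"]]
print(f"word accuracy:  {len(correct) / len(calls):.0%}")
print(f"avg confidence: {sum(c['confidence'] for c in correct) / len(correct):.2f}")
print(f"OOV rate:       {sum(c['oov'] for c in calls) / len(calls):.0%}")
```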

[Screenshot: LumenVox Speech Tuner]

For example, if you find that most of the out-of-vocabulary (OOV) calls are saying the same word, you might want to add that word to the grammar so the ASR engine will recognize it in future calls. This is the point at which a developer is brought into the tuning process, because editing a grammar can be a highly technical task. Nevertheless, their input will be worth its weight in gold, as this will not only increase the ASR’s word accuracy, it will also decrease the number of calls routed to a human attendant to complete.
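
Finding which words to add is essentially a frequency count over the OOV transcripts; a brief sketch with invented data:

```python
from collections import Counter

# Transcripts of calls the engine flagged as OOV (invented examples).
oov_transcripts = ["front desk", "reception", "front desk", "front desk", "help"]

# Rank OOV utterances by frequency: frequent ones are grammar candidates.
for utterance, count in Counter(oov_transcripts).most_common(3):
    print(f"{count}x {utterance!r}")
# "front desk" appears most often, making it the obvious grammar addition.
```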

Test the Data

While the tester displays detailed information about the loaded dataset and provides accuracy information for the ASR engine, its main purpose is to test changes. The tester can save those changes and iterate on further ones until the application is optimized and ready for production.

[Screenshot: LumenVox Speech Tuner]

Crossing the Finish Line

In essence, what is outlined here is “How to Tune a Speech Recognition Engine 101.” There are other, more in-depth ways to quantify the accuracy of an ASR engine, and this article has only skimmed the surface. A more comprehensive tune-up would measure the engine’s correct-accept and correct-reject rates, that is, how often it properly accepted or rejected a user’s response. Conversely, there are false-accept and false-reject rates that can be factored into calculating a speech engine’s accuracy; all of these can be influenced by the speech designer.
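
Those four rates fall out of a simple two-by-two breakdown of the engine’s accept/reject decisions against whether the caller’s response was actually in grammar. A worked example with invented counts:

```python
# Worked example of the four rates with invented counts.
in_grammar_accepted = 180   # correct accepts (CA)
in_grammar_rejected = 20    # false rejects (FR)
oov_rejected        = 45    # correct rejects (CR)
oov_accepted        = 5     # false accepts (FA)

in_grammar = in_grammar_accepted + in_grammar_rejected  # 200 in-grammar calls
oov        = oov_rejected + oov_accepted                # 50 out-of-grammar calls

print(f"correct-accept rate: {in_grammar_accepted / in_grammar:.0%}")  # 90%
print(f"false-reject rate:   {in_grammar_rejected / in_grammar:.0%}")  # 10%
print(f"correct-reject rate: {oov_rejected / oov:.0%}")                # 90%
print(f"false-accept rate:   {oov_accepted / oov:.0%}")                # 10%
```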

Like cars, no two speech applications are exactly alike, so it is important to measure the accuracy of each speech application differently. But there is one thing most would agree on: it doesn’t matter how well tuned your car or speech engine is, you can always do better.
