Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding

04/05/2021
by Suyoun Kim, et al.

Word Error Rate (WER) has been the predominant metric used to evaluate the performance of automatic speech recognition (ASR) systems. However, WER is sometimes not a good indicator for downstream Natural Language Understanding (NLU) tasks, such as intent recognition, slot filling, and semantic parsing in task-oriented dialog systems. This is because WER considers only literal correctness, not semantic correctness, and the latter is typically more important for these downstream tasks. In this study, we propose a novel Semantic Distance (SemDist) measure as an alternative evaluation metric for ASR systems to address this issue. We define SemDist as the distance between a reference and hypothesis pair in a sentence-level embedding space. To represent the reference and the hypothesis as sentence embeddings, we use RoBERTa, a state-of-the-art pre-trained deep contextualized language model based on the transformer architecture. We demonstrate the effectiveness of our proposed metric on various downstream tasks, including intent recognition, semantic parsing, and named entity recognition.
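
To make the contrast between the two metrics concrete, the following is a minimal sketch rather than the authors' implementation. It computes WER as word-level edit distance divided by reference length, and a SemDist-style score as a distance between two sentence vectors. Several things here are assumptions not stated in the abstract: cosine distance is used as the distance function, and `toy_embed` with its tiny `TOY_VOCAB` is a hypothetical bag-of-words stand-in for a RoBERTa sentence embedding, included only so the example runs without a model.

```python
import math
from typing import Callable, List, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity. Assumed here; the abstract only says 'distance'."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def semantic_distance(
    reference: str,
    hypothesis: str,
    embed: Callable[[str], Sequence[float]],
) -> float:
    """Distance between reference and hypothesis in the space defined by `embed`."""
    return cosine_distance(embed(reference), embed(hypothesis))


# Hypothetical placeholder vocabulary and embedding, for illustration only.
TOY_VOCAB = ["set", "an", "alarm", "for", "seven", "eleven", "play", "music"]


def toy_embed(text: str) -> List[float]:
    """Bag-of-words counts over TOY_VOCAB; NOT a contextual sentence embedding."""
    words = text.lower().split()
    return [float(words.count(term)) for term in TOY_VOCAB]
```

In practice, `embed` would be replaced by a function that returns a sentence-level vector from a pre-trained model such as RoBERTa; how token outputs are pooled into one vector is a design choice the abstract does not specify. With the toy embedding the score only reflects word overlap, so it illustrates the interface and not the semantic behaviour the paper reports.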

research
11/03/2022

Hybrid-SD (H_SD): A new hybrid evaluation metric for automatic speech recognition tasks

Many studies have examined the shortcomings of word error rate (WER) as ...
research
06/03/2021

Semantic-WER: A Unified Metric for the Evaluation of ASR Transcript for End Usability

Recent advances in supervised, semi-supervised and self-supervised deep ...
research
10/11/2021

Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric

Measuring automatic speech recognition (ASR) system quality is critical ...
research
09/09/2023

Leveraging Large Language Models for Exploiting ASR Uncertainty

While large language models excel in a variety of natural language proce...
research
04/22/2022

WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment

Historically lower-level tasks such as automatic speech recognition (ASR...
research
09/27/2021

Using Pause Information for More Accurate Entity Recognition

Entity tags in human-machine dialog are integral to natural language und...
research
12/03/2019

Fast Intent Classification for Spoken Language Understanding

Spoken Language Understanding (SLU) systems consist of several machine l...
