Hybrid-SD (H_SD): A new hybrid evaluation metric for automatic speech recognition tasks

11/03/2022
by Zitha Sasindran, et al.

Many studies have examined the shortcomings of word error rate (WER) as an evaluation metric for automatic speech recognition (ASR) systems, particularly when used for spoken language understanding tasks such as intent recognition and dialogue systems. In this paper, we propose Hybrid-SD (H_SD), a new hybrid evaluation metric for ASR systems that takes into account both semantic correctness and error rate. To generate sentence dissimilarity (SD) scores, we built a fast and lightweight SNanoBERT model using distillation techniques. Our experiments show that the SNanoBERT model is 25.9x smaller and 38.8x faster than SRoBERTa while achieving comparable results on well-known benchmarks, making it suitable for deployment alongside ASR models on edge devices. We also show that H_SD correlates more strongly with downstream tasks such as intent recognition and named-entity recognition (NER).
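
To make the idea of a hybrid metric concrete, the sketch below combines a standard word-level WER with a sentence dissimilarity score. It is only an illustration under stated assumptions: the abstract does not give the H_SD formula, so the convex combination in `hybrid_score` and its `alpha` weight are hypothetical, and `toy_sentence_dissimilarity` uses bag-of-words cosine distance as a stand-in for the SNanoBERT embeddings described above.

```python
from collections import Counter
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def toy_sentence_dissimilarity(reference: str, hypothesis: str) -> float:
    """1 - cosine similarity of bag-of-words counts.

    Placeholder only: the paper derives SD from a distilled sentence
    embedding model (SNanoBERT), which is not reproduced here.
    """
    a, b = Counter(reference.split()), Counter(hypothesis.split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def hybrid_score(reference: str, hypothesis: str, alpha: float = 0.5) -> float:
    """Hypothetical convex combination of SD and WER.

    This is NOT the paper's H_SD definition; alpha is an assumed weight.
    WER is clipped to 1.0 so both terms share the range [0, 1].
    """
    wer = min(word_error_rate(reference, hypothesis), 1.0)
    sd = toy_sentence_dissimilarity(reference, hypothesis)
    return alpha * sd + (1.0 - alpha) * wer
```

With a real sentence encoder in place of the toy function, two hypotheses with the same WER could receive different hybrid scores depending on how much each error changes the meaning, which is the behaviour the abstract motivates.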


