RED-ACE: Robust Error Detection for ASR using Confidence Embeddings

03/14/2022
by Zorik Gekhman, et al.

ASR Error Detection (AED) models post-process the output of Automatic Speech Recognition (ASR) systems to detect transcription errors. Modern approaches usually use text-based input, consisting solely of the ASR transcription hypothesis, and disregard additional signals from the ASR model. Instead, we propose using the ASR system's word-level confidence scores to improve AED performance. Specifically, we add an ASR Confidence Embedding (ACE) layer to the AED model's encoder, allowing us to jointly encode the confidence scores and the transcribed text into a contextualized representation. Our experiments show the benefits of ASR confidence scores for AED, their complementary effect over the textual signal, and the effectiveness and robustness of ACE for combining these signals. To foster further research, we publish a novel AED dataset consisting of ASR outputs on the LibriSpeech corpus with annotated transcription errors.
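The idea of jointly encoding confidence scores and text can be illustrated with a minimal sketch. The abstract does not specify how the ACE layer is built, so everything below is an assumption made for illustration: confidence scores are taken to lie in [0, 1], they are quantized into a fixed number of buckets, each bucket has its own embedding vector, and that vector is added to the token embedding before the encoder. The names `confidence_to_bucket`, `embed_with_confidence`, `NUM_BUCKETS`, and `HIDDEN_DIM` are hypothetical, and the randomly initialized tables stand in for parameters that would be learned.

```python
import random

NUM_BUCKETS = 10  # assumption: scores in [0, 1] are quantized into 10 buckets
HIDDEN_DIM = 8    # assumption: toy embedding size, chosen for readability


def confidence_to_bucket(score, num_buckets=NUM_BUCKETS):
    """Map a confidence score to an integer bucket id in [0, num_buckets - 1]."""
    clipped = min(max(score, 0.0), 1.0)  # guard against out-of-range scores
    return min(int(clipped * num_buckets), num_buckets - 1)


def make_table(num_rows, dim, rng):
    """Create a randomly initialized embedding table (stand-in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def embed_with_confidence(token_ids, confidences, token_table, conf_table):
    """Sum each token embedding with the embedding of its confidence bucket."""
    if len(token_ids) != len(confidences):
        raise ValueError("expected one confidence score per token")
    combined = []
    for token_id, score in zip(token_ids, confidences):
        bucket = confidence_to_bucket(score)
        combined.append(
            [t + c for t, c in zip(token_table[token_id], conf_table[bucket])]
        )
    return combined


rng = random.Random(0)
token_table = make_table(100, HIDDEN_DIM, rng)        # hypothetical 100-token vocabulary
conf_table = make_table(NUM_BUCKETS, HIDDEN_DIM, rng)  # one vector per confidence bucket

token_ids = [5, 17, 42]           # hypothetical token ids for an ASR hypothesis
confidences = [0.98, 0.31, 0.75]  # hypothetical per-token confidence scores

encoder_inputs = embed_with_confidence(token_ids, confidences, token_table, conf_table)
```

In this sketch the resulting `encoder_inputs` would be passed to the encoder in place of plain token embeddings, so the contextualized representation can depend on both signals. Summation and bucketing are only one possible design; the paper's full text should be consulted for the actual construction.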

research
05/25/2023

ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

In Speech Emotion Recognition (SER), textual data is often used alongsid...
research
10/05/2021

ASR Rescoring and Confidence Estimation with ELECTRA

In automatic speech recognition (ASR) rescoring, the hypothesis with the...
research
06/30/2021

Sequence-level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models

Scores from traditional confidence classifiers (CCs) in automatic speech...
research
06/22/2017

Automatic Quality Estimation for ASR System Combination

Recognizer Output Voting Error Reduction (ROVER) has been widely used fo...
research
11/16/2018

Investigating the Effects of Word Substitution Errors on Sentence Embeddings

A key initial step in several natural language processing (NLP) tasks in...
research
08/04/2021

Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation

Fine-tuning pretrained language models (LMs) is a popular approach to au...
research
04/13/2020

Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

Automatic Speech Recognition (ASR) systems introduce word errors, which ...