ASR Error Detection via Audio-Transcript entailment

by Nimshi Venkat Meripo, et al.

Despite the improved performance of the latest Automatic Speech Recognition (ASR) systems, transcription errors are still unavoidable. These errors can have a considerable impact in critical domains such as healthcare, where ASR is used to help with clinical documentation. Therefore, detecting ASR errors is a critical first step in preventing further error propagation to downstream applications. To this end, we propose a novel end-to-end approach for ASR error detection using audio-transcript entailment. To the best of our knowledge, we are the first to frame this problem as an end-to-end entailment task between the audio segment and its corresponding transcript segment. Our intuition is that there should be a bidirectional entailment between audio and transcript when there is no recognition error, and vice versa. The proposed model utilizes an acoustic encoder and a linguistic encoder to model the speech and transcript, respectively. The encoded representations of both modalities are fused to predict the entailment. Since doctor-patient conversations are used in our experiments, a particular emphasis is placed on medical terms. Our proposed model achieves classification error rates (CER) of 26.2% on all transcription errors and 23% on medical errors, improving upon a strong baseline by 12%.
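The pipeline described above (encode each modality, fuse, classify) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the encoders are replaced by hand-written placeholder vectors, fusion is taken to be simple concatenation, the classifier is a single logistic layer with made-up weights, and classification error rate is assumed to mean the fraction of segments labelled incorrectly.

```python
import math
from typing import List, Sequence


def fuse(audio_vec: Sequence[float], text_vec: Sequence[float]) -> List[float]:
    """Fuse the two modality representations by concatenation.

    Concatenation is an assumption; the abstract does not specify the fusion method.
    """
    return list(audio_vec) + list(text_vec)


def entailment_probability(fused: Sequence[float],
                           weights: Sequence[float],
                           bias: float) -> float:
    """Logistic layer over the fused vector: P(audio and transcript entail each other)."""
    if len(fused) != len(weights):
        raise ValueError("fused vector and weights must have the same length")
    score = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def classification_error_rate(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of segments whose predicted label differs from the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for p, g in zip(predicted, gold) if p != g)
    return wrong / len(gold)


# Placeholder outputs standing in for the acoustic and linguistic encoders (hypothetical).
audio_vec = [0.2, -0.1]
text_vec = [0.4, 0.3]

# Hypothetical classifier parameters; a real model would learn these.
weights = [1.0, 0.5, -0.5, 2.0]
bias = -0.1

fused = fuse(audio_vec, text_vec)
prob = entailment_probability(fused, weights, bias)

# No entailment (probability below 0.5) is read as "this segment contains an ASR error".
has_error = prob < 0.5
print(f"entailment probability: {prob:.3f}, flagged as error: {has_error}")
```

In this sketch, a segment is flagged when the predicted entailment probability falls below 0.5; `classification_error_rate` can then compare such flags against gold labels over a set of segments.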




