Residual Energy-Based Models for End-to-End Speech Recognition

03/25/2021
by Qiujia Li, et al.

End-to-end models with auto-regressive decoders have shown impressive results for automatic speech recognition (ASR). These models formulate the sequence-level probability as a product of the conditional probabilities of all individual tokens given their histories. However, the performance of locally normalised models can be sub-optimal because of factors such as exposure bias. Consequently, the model distribution differs from the underlying data distribution. In this paper, the residual energy-based model (R-EBM) is proposed to complement the auto-regressive ASR model and close the gap between the two distributions. R-EBMs can also be regarded as utterance-level confidence estimators, which may benefit many downstream tasks. Experiments on a 100hr LibriSpeech dataset show that R-EBMs can reduce the word error rates (WERs) by 8.2% [...] curves of confidence scores by 12.6% [...]. Furthermore, on a state-of-the-art model using self-supervised learning (wav2vec 2.0), R-EBMs still significantly improve both the WER and confidence estimation performance.
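
The factorisation described in the abstract, and one way a residual energy term could be combined with it, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the scoring rule log P_AR(y|x) - E(x, y) is assumed here as the usual unnormalised residual form, the function names `sequence_log_prob` and `rescore` are hypothetical, and all probabilities and energies are made-up toy values.

```python
import math
from typing import List, Sequence, Tuple


def sequence_log_prob(token_probs: Sequence[float]) -> float:
    """Log of the product of per-token conditional probabilities."""
    return sum(math.log(p) for p in token_probs)


def rescore(
    hyps: List[Tuple[str, Sequence[float], float]]
) -> List[Tuple[str, float]]:
    """Rank hypotheses by log P_AR(y|x) - E(x, y), highest first.

    Each hypothesis is (text, per-token probabilities, energy). The energy
    is a placeholder for the output of a trained residual model (assumed).
    """
    scored = [
        (text, sequence_log_prob(probs) - energy)
        for text, probs, energy in hyps
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 2-best list with made-up numbers (not taken from the paper).
n_best = [
    ("the cat sat", [0.9, 0.8, 0.7], 1.5),
    ("the cat sad", [0.9, 0.8, 0.6], 0.2),
]
ranked = rescore(n_best)
```

In this toy list the auto-regressive score alone prefers the first hypothesis, while the lower assumed energy of the second moves it to the top after rescoring, which is the kind of sequence-level correction the abstract attributes to the residual model.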

research
10/22/2020

Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition

For various speech-related tasks, confidence scores from a speech recogn...
research
01/14/2021

An evaluation of word-level confidence estimation for end-to-end automatic speech recognition

Quantifying the confidence (or conversely the uncertainty) of a predicti...
research
10/07/2021

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

As end-to-end automatic speech recognition (ASR) models reach promising ...
research
04/26/2021

Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

Confidence scores are very useful for downstream applications of automat...
research
05/29/2023

Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning

Self-supervised learning (SSL) of speech has shown impressive results in...
research
10/08/2021

Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

In end-to-end automatic speech recognition (ASR), a model is expected to...
research
05/18/2023

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

Estimating confidence scores for recognition results is a classic task i...
