Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances

02/12/2020
by   Phillip Keung, et al.

We discuss the problem of echographic transcription in autoregressive sequence-to-sequence attentional architectures for automatic speech recognition, where a model produces very long sequences of repetitive outputs when presented with out-of-domain utterances. We decode audio from the British National Corpus with an attentional encoder-decoder model trained solely on the LibriSpeech corpus. We observe that there are many 5-second recordings that produce more than 500 characters of decoding output (i.e. more than 100 characters per second). A frame-synchronous hybrid (DNN-HMM) model trained on the same data does not produce these unusually long transcripts. These decoding issues are reproducible in a speech transformer model from ESPnet, and to a lesser extent in a self-attention CTC model, suggesting that these issues are intrinsic to the use of the attention mechanism. We create a separate length prediction model to predict the correct number of wordpieces in the output, which allows us to identify and truncate problematic decoding results without increasing word error rates on the LibriSpeech task.
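The abstract's detection criterion (5-second recordings yielding more than 500 output characters, i.e. over 100 characters per second) can be sketched as a simple rate check. This is an illustrative helper, not the paper's actual length-prediction model; the function name and the default threshold of 100 characters per second are taken from the figures quoted above.

```python
def is_echographic(transcript: str, audio_seconds: float,
                   max_chars_per_second: float = 100.0) -> bool:
    """Flag a decoding result whose character rate exceeds a plausible
    bound, mirroring the paper's observation that out-of-domain inputs
    can produce >100 characters of output per second of audio.

    Note: this is a hypothetical post-hoc filter for illustration; the
    paper itself trains a separate model to predict the correct number
    of output wordpieces and truncates accordingly.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return len(transcript) / audio_seconds > max_chars_per_second


# Example: a 5-second clip decoded to 600 characters is flagged,
# while an ordinary short transcript is not.
runaway = is_echographic("a" * 600, audio_seconds=5.0)
normal = is_echographic("hello world", audio_seconds=5.0)
```

A rate check like this only identifies runaway decodes after the fact; the separate length-prediction model described in the abstract goes further by estimating where to truncate the wordpiece sequence.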


Related research

- 12/06/2019 · Synchronous Transformers for End-to-End Speech Recognition: For most of the attention-based sequence-to-sequence models, the decoder...
- 12/08/2016 · Towards better decoding and language model integration in sequence to sequence models: The recently proposed Sequence-to-Sequence (seq2seq) framework advocates...
- 10/26/2022 · Monotonic segmental attention for automatic speech recognition: We introduce a novel segmental-attention model for automatic speech reco...
- 06/02/2023 · Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation: Punctuated text prediction is crucial for automatic speech recognition a...
- 01/22/2019 · Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition: Self-attention has demonstrated great success in sequence-to-sequence ta...
- 10/28/2019 · Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding: In this paper, we investigate the benefit that off-the-shelf word embedd...
- 05/21/2020 · Large scale evaluation of importance maps in automatic speech recognition: In this paper, we propose a metric that we call the structured saliency ...
