Do End-to-End Speech Recognition Models Care About Context?

02/17/2021
by Lasse Borgholt, et al.

The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and by evaluating how the models perform when the amount of contextual information in the audio input is constrained. We find that the AED model is indeed more context sensitive, but that the gap can be closed by adding self-attention to the CTC model. Furthermore, the two models perform similarly when contextual information is constrained. Finally, in contrast to previous research, our results show that the CTC model is highly competitive on WSJ and LibriSpeech without the help of an external language model.
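
As background for why AED models are expected to learn an implicit language model more readily, the snippet below sketches standard greedy (best-path) CTC decoding. It is not the authors' code: the function name `ctc_greedy_decode` and the choice of index 0 as the blank symbol are illustrative assumptions. The point it shows is that CTC picks each frame's label without conditioning on previously emitted labels, whereas an AED decoder conditions every output token on the tokens it has already produced.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: take the argmax label of every frame independently,
    then merge consecutive repeated labels and drop blanks."""
    # Each frame's label is chosen without looking at earlier outputs, which
    # mirrors CTC's conditional-independence assumption across frames.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    decoded: List[int] = []
    previous: Optional[int] = None
    for label in best_path:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Six frames of scores over the labels {blank=0, 1, 2}.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.3, 0.1, 0.6],
]
print(ctc_greedy_decode(frames))  # best path 1,1,0,1,2,2 -> [1, 1, 2]
```

The blank between the two runs of label 1 is what lets CTC emit the same label twice in a row; any dependence between output labels has to be captured by the encoder itself (for example through the self-attention layers the abstract mentions) rather than by a decoder.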

research
07/24/2017

Exploring Neural Transducers for End-to-End Speech Recognition

In this work, we perform an empirical comparison among the CTC, RNN-Tran...
research
03/15/2018

Advancing Connectionist Temporal Classification With Attention Modeling

In this study, we propose advancing all-neural speech recognition by dir...
research
04/19/2023

CB-Conformer: Contextual biasing Conformer for biased word recognition

Due to the mismatch between the source and target domains, how to better...
research
09/19/2023

End-to-End Speech Recognition Contextualization with Large Language Models

In recent years, Large Language Models (LLMs) have garnered significant ...
research
05/03/2021

On the limit of English conversational speech recognition

In our previous work we demonstrated that a single headed attention enco...
research
12/17/2020

CIF-based Collaborative Decoding for End-to-end Contextual Speech Recognition

End-to-end (E2E) models have achieved promising results on multiple spee...
research
06/27/2019

EmotionX-KU: BERT-Max based Contextual Emotion Classifier

We propose a contextual emotion classifier based on a transferable langu...