On the limit of English conversational speech recognition

05/03/2021
by Zoltán Tüske, et al.

In our previous work we demonstrated that a single headed attention encoder-decoder model is able to reach state-of-the-art results in conversational speech recognition. In this paper, we further improve the results for both Switchboard 300 and 2000. Through the use of an improved optimizer, speaker vector embeddings, and alternative speech representations, we reduce the recognition errors of our LSTM system on Switchboard-300 by 4% relative. Compensation of the decoder model with the probability ratio approach allows more efficient integration of an external language model, and we report 5.9% and 11.5% WER on the SWB and CHM parts of Hub5'00 with very simple LSTM models. Our study also considers the recently proposed conformer and more advanced self-attention based language models. Overall, the conformer shows similar performance to the LSTM; nevertheless, their combination and decoding with an improved LM reach a new record on Switchboard-300, 5.0% and 10.0% WER on SWB and CHM. Our findings are also confirmed on Switchboard-2000, and a new state of the art is reported, practically reaching the limit of the benchmark.
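
The probability ratio approach mentioned above is commonly realized as density-ratio style language model fusion: during decoding, an external LM trained on extra text is added to the hypothesis score while an LM trained only on the source-domain transcripts is subtracted, approximately cancelling the decoder's implicit language model. The following is a minimal sketch of that scoring rule; the function names, weights, and length reward are illustrative assumptions, not the paper's implementation.

    # Hedged sketch of probability-ratio (density-ratio) LM fusion for hypothesis scoring.
    # All names and weight values below are illustrative assumptions, not the paper's code.
    def fused_score(y, x, log_p_s2s, log_p_ext_lm, log_p_src_lm,
                    lambda_ext=0.6, lambda_src=0.3, length_reward=0.1):
        """Score a hypothesis y (token sequence) given acoustics x.

        log_p_s2s(y, x) : log P(y | x) from the attention encoder-decoder
        log_p_ext_lm(y) : log P(y) from the external LM trained on additional text
        log_p_src_lm(y) : log P(y) from an LM trained only on the ASR transcripts,
                          used as a proxy for the decoder's internal language model
        """
        score = log_p_s2s(y, x)
        score += lambda_ext * log_p_ext_lm(y)   # add external language knowledge
        score -= lambda_src * log_p_src_lm(y)   # compensate the internal LM
        score += length_reward * len(y)         # optional reward against short hypotheses
        return score

In practice the interpolation weights are tuned on held-out data, and the same fused score is used to rank competing hypotheses during beam search.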
