High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

03/22/2020
by   Thai-Son Nguyen, et al.

Recently, sequence-to-sequence models have started to achieve state-of-the-art performance on standard speech recognition tasks when processing audio data in batch mode, i.e., when the complete audio data is available at the start of processing. However, when it comes to performing run-on recognition on an input stream of audio data while producing recognition results in real time and with low word-based latency, these models face several challenges. For many techniques, the whole audio sequence to be decoded needs to be available at the start of processing, e.g., for the attention mechanism or for the bidirectional LSTM (BLSTM). In this paper we propose several techniques to mitigate these problems. We introduce an additional loss function controlling the uncertainty of the attention mechanism, a modified beam search identifying partial, stable hypotheses, ways of working with BLSTMs in the encoder, and the use of chunked BLSTMs. Our experiments show that with the right combination of these techniques it is possible to perform run-on speech recognition with low word-based latency without sacrificing performance in terms of word error rate.
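One way to picture the "partial, stable hypotheses" idea is that, during run-on decoding, any token prefix on which all surviving beam hypotheses agree can be emitted immediately, since no future pruning can change it. The sketch below illustrates that notion in plain Python; it is an assumption for illustration only, and the paper's actual stability criterion and beam-search modification may differ.

```python
def stable_prefix(beam):
    """Return the longest common token prefix shared by all beam hypotheses.

    Illustrative sketch: in run-on decoding, tokens in this prefix can be
    output with low latency because no surviving hypothesis disagrees on
    them. This is not the paper's exact criterion, just the underlying idea.
    """
    if not beam:
        return []
    prefix = []
    # zip(*beam) walks the hypotheses position by position, stopping at
    # the length of the shortest hypothesis.
    for tokens_at_pos in zip(*beam):
        if all(t == tokens_at_pos[0] for t in tokens_at_pos):
            prefix.append(tokens_at_pos[0])
        else:
            break
    return prefix


beam = [
    ["the", "cat", "sat", "on"],
    ["the", "cat", "sat", "in"],
    ["the", "cat", "sits"],
]
print(stable_prefix(beam))  # ['the', 'cat']
```

As decoding advances and weaker hypotheses are pruned, the agreed-upon prefix grows, so recognized words can be released incrementally instead of waiting for the utterance to end.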

