Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection

05/22/2020
by Danni Liu, et al.

Encoder-decoder models provide a generic architecture for sequence-to-sequence tasks such as speech recognition and translation. While offline systems are often evaluated on quality metrics like word error rates (WER) and BLEU, latency is also a crucial factor in many practical use-cases. We propose three latency reduction techniques for chunk-based incremental inference and evaluate their efficiency in terms of accuracy-latency trade-off. On the 300-hour How2 dataset, we reduce latency by 83% while sacrificing 1% WER. Although our experiments use the Transformer, the hypothesis selection strategies are applicable to other encoder-decoder models. To avoid expensive re-computation, we use a unidirectionally-attending encoder. After an adaptation procedure to partial sequences, the unidirectional model performs on-par with the original model. We further show that our approach is also applicable to low-latency speech translation. On How2 English-Portuguese speech translation, we reduce latency to 0.7 second (-84% rel.) compared to the offline system.
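To make the chunk-based setup concrete, the sketch below shows one plausible partial hypothesis selection strategy in the family the abstract describes: after each incoming audio chunk, the growing prefix is re-decoded, and only the prefix on which two consecutive hypotheses agree is committed to the output. The `decode` callable is a hypothetical stand-in for any encoder-decoder beam search; this is an illustration of the general idea, not the paper's exact implementation.

```python
from typing import Callable, List

def incremental_decode(
    chunks: List[List[float]],
    decode: Callable[[List[float]], List[str]],
) -> List[str]:
    """Chunk-based incremental inference with an agreement-based
    partial hypothesis selection strategy: only the prefix on which
    two consecutive hypotheses agree is emitted early."""
    audio: List[float] = []
    prev_hyp: List[str] = []
    committed: List[str] = []  # tokens shown to the user, never retracted

    for chunk in chunks:
        audio.extend(chunk)
        hyp = decode(audio)  # re-decode the growing audio prefix

        # Length of the longest common prefix of the last two hypotheses.
        agreed = 0
        while (agreed < len(prev_hyp) and agreed < len(hyp)
               and prev_hyp[agreed] == hyp[agreed]):
            agreed += 1

        if agreed > len(committed):
            committed = hyp[:agreed]  # commit newly stabilized tokens
        prev_hyp = hyp

    # Once the input is complete, the final hypothesis is emitted in full.
    return prev_hyp
```

Note that this loop re-runs `decode` on the full prefix at every chunk; that re-computation cost is what the unidirectionally-attending encoder mentioned in the abstract mitigates, since with causal attention the encoder states for earlier frames can be cached instead of recomputed.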


