Efficient Sequence Training of Attention Models using Approximative Recombination

10/18/2021
by Nils-Philipp Wynands, et al.

Sequence discriminative training is a great tool to improve the performance of an automatic speech recognition system. It does, however, necessitate a sum over all possible word sequences, which is intractable to compute in practice. Current state-of-the-art systems with unlimited label context circumvent this problem by limiting the summation to an n-best list of relevant competing hypotheses obtained from beam search. This work proposes to perform (approximative) recombinations of hypotheses during beam search, if they share a common local history. The error that is incurred by the approximation is analyzed and it is shown that using this technique the effective beam size can be increased by several orders of magnitude without significantly increasing the computational requirements. Lastly, it is shown that this technique can be used to effectively perform sequence discriminative training for attention-based encoder-decoder acoustic models on the LibriSpeech task.
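
The recombination idea can be illustrated with a small sketch. The abstract does not spell out how merged hypotheses are scored, so the following assumes (as an illustration only, not necessarily the authors' exact procedure) that hypotheses whose last `history_len` labels coincide are merged into one, that the merged hypothesis keeps the label sequence of its best-scoring member, and that its score is the log-sum of the members' scores. The names `recombine` and `history_len` are hypothetical.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def recombine(hyps, history_len):
    """Merge beam hypotheses that share the same local label history.

    hyps: list of (labels, log_score), where labels is a tuple of labels.
    history_len: number of most recent labels that define the shared
                 history (an assumed parameter, not taken from the source).
    """
    groups = defaultdict(list)
    for labels, log_score in hyps:
        # Guard against labels[-0:], which would return the full tuple.
        key = labels[-history_len:] if history_len > 0 else ()
        groups[key].append((labels, log_score))

    merged = []
    for members in groups.values():
        # Assumption: the best member represents the merged hypothesis...
        best_labels, _ = max(members, key=lambda h: h[1])
        # ...and carries the summed probability mass of all members.
        merged.append((best_labels, logsumexp([s for _, s in members])))
    return sorted(merged, key=lambda h: h[1], reverse=True)


# Toy beam: two hypotheses end in ("b", "c") and are merged for history_len=2.
hyps = [
    (("a", "b", "c"), math.log(0.4)),
    (("x", "b", "c"), math.log(0.2)),
    (("a", "b", "d"), math.log(0.3)),
]
merged = recombine(hyps, history_len=2)
```

In this toy beam, three hypotheses collapse to two while the merged entry still accounts for the probability mass of both of its members. This is the sense in which recombination can enlarge the effective beam without adding beam entries. The merge is only approximate for a model with unlimited label context, because the two merged histories would in general lead to different future scores.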

research
11/12/2018

Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition

Attention-based encoder decoder network uses a left-to-right beam search...
research
12/05/2017

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Attention-based encoder-decoder architectures such as Listen, Attend, an...
research
07/24/2023

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

Although frame-based models, such as CTC and transducers, have an affini...
research
08/02/2018

Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting

Speech recognition is a sequence prediction problem. Besides employing v...
research
12/12/2020

Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging

End-to-end models that condition the output label sequence on all previo...
research
02/16/2019

A Fully Differentiable Beam Search Decoder

We introduce a new beam search decoder that is fully differentiable, mak...
research
04/13/2021

Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept

With the advent of direct models in automatic speech recognition (ASR), ...
