A Fully Differentiable Beam Search Decoder

02/16/2019
by   Ronan Collobert, et al.

We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g., acoustic and language models). Because it considers all possible alignments between input and target sequences, it can be used when the two are not aligned. We demonstrate that our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models. The system is end-to-end, with gradients flowing through the whole architecture from the word-level transcriptions. Recent research efforts have shown that deep neural networks with attention-based mechanisms are powerful enough to successfully train an acoustic model from the final transcription, while implicitly learning a language model. Instead, we show that it is possible to discriminatively train an acoustic model jointly with an explicit and possibly pre-trained language model.
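To make the idea of "considering all possible alignments" concrete, here is a minimal sketch of the general principle. It is not the decoder described in the paper. It scores a target sequence by summing, in log space, over every monotonic alignment to the input frames, using a forward recursion. Because the sum uses log-sum-exp rather than a hard max, the result is a smooth function of the frame scores. The function names, the `emissions` layout, and the alignment rule (each target token covers one or more consecutive frames, with no blank symbol) are all assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_score_all_alignments(emissions, target):
    """Log of the summed score over all monotonic alignments.

    emissions[t][k] is a log-score for token k at frame t (hypothetical layout).
    target is a list of token indices. Each target token must cover at least
    one frame, and frames are consumed left to right (assumed alignment rule).
    """
    num_frames, num_tokens = len(emissions), len(target)
    if num_tokens == 0 or num_frames < num_tokens:
        return NEG_INF  # no valid alignment exists

    # alpha[j]: log-sum of scores of all partial alignments that end at the
    # current frame on target position j.
    alpha = [NEG_INF] * num_tokens
    alpha[0] = emissions[0][target[0]]

    for t in range(1, num_frames):
        new_alpha = [NEG_INF] * num_tokens
        for j in range(num_tokens):
            stay = alpha[j]                              # same token as previous frame
            advance = alpha[j - 1] if j > 0 else NEG_INF  # move to the next token
            prev = logaddexp(stay, advance)
            if prev != NEG_INF:
                new_alpha[j] = prev + emissions[t][target[j]]
        alpha = new_alpha

    return alpha[-1]  # every alignment must end on the last target token


# Usage: three frames, two tokens, target sequence [0, 1].
frames = [
    [math.log(0.6), math.log(0.4)],
    [math.log(0.5), math.log(0.5)],
    [math.log(0.3), math.log(0.7)],
]
score = log_score_all_alignments(frames, [0, 1])
```

Replacing `logaddexp` with `max` in the recursion would recover the score of the single best alignment; the soft version is what lets gradients reach every frame score that participates in some alignment.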

research
09/14/2023

Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

Attention-based encoder-decoder (AED) speech recognition model has been ...
research
08/07/2015

An End-to-End Neural Network for Polyphonic Piano Music Transcription

We present a supervised neural network model for polyphonic piano music ...
research
11/25/2019

Independent language modeling architecture for end-to-end ASR

The attention-based end-to-end (E2E) automatic speech recognition (ASR) ...
research
08/16/2018

Improved Chord Recognition by Combining Duration and Harmonic Language Models

Chord recognition systems typically comprise an acoustic model that pred...
research
08/18/2023

OCR Language Models with Custom Vocabularies

Language models are useful adjuncts to optical models for producing accu...
research
10/18/2021

Efficient Sequence Training of Attention Models using Approximative Recombination

Sequence discriminative training is a great tool to improve the performa...
research
10/18/2021

Automatic Learning of Subword Dependent Model Scales

To improve the performance of state-of-the-art automatic speech recognit...
