Comparison of Decoding Strategies for CTC Acoustic Models

08/15/2017
by   Thomas Zenkel, et al.
0

Connectionist Temporal Classification has recently attracted a lot of interest as it offers an elegant approach to building acoustic models (AMs) for speech recognition. The CTC loss function maps an input sequence of observable feature vectors to an output sequence of symbols. Output symbols are conditionally independent of each other under CTC loss, so a language model (LM) can be incorporated conveniently during decoding, retaining the traditional separation of acoustic and linguistic components in ASR. For fixed vocabularies, Weighted Finite State Transducers provide a strong baseline for efficient integration of CTC AMs with n-gram LMs. Character-based neural LMs provide a straight forward solution for open vocabulary speech recognition and all-neural models, and can be decoded with beam search. Finally, sequence-to-sequence models can be used to translate a sequence of individual sounds into a word string. We compare the performance of these three approaches, and analyze their error patterns, which provides insightful guidance for future research and development in this important area.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/23/2018

Acoustic-to-Word Recognition with Sequence-to-Sequence Models

Acoustic-to-Word recognition provides a straightforward solution to end-...
research
08/07/2018

Deep context: end-to-end contextual speech recognition

In automatic speech recognition (ASR) what a user says depends on the pa...
research
05/28/2023

RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

Modern public ASR tools usually provide rich support for training variou...
research
03/03/2018

On Modular Training of Neural Acoustics-to-Word Model for LVCSR

End-to-end (E2E) automatic speech recognition (ASR) systems directly map...
research
08/02/2018

Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting

Speech recognition is a sequence prediction problem. Besides employing v...
research
08/02/2018

Linguistic Search Optimization for Deep Learning Based LVCSR

Recent advances in deep learning based large vocabulary con- tinuous spe...
research
10/02/2020

Differentiable Weighted Finite-State Transducers

We introduce a framework for automatic differentiation with weighted fin...

Please sign up or login with your details

Forgot password? Click here to reset