Powerful and Extensible WFST Framework for RNN-Transducer Losses

03/18/2023
by Aleksandr Laptev, et al.

This paper presents a framework based on Weighted Finite-State Transducers (WFST) that simplifies the development of modifications of the RNN-Transducer (RNN-T) loss. Existing RNN-T implementations rely on CUDA-related code, which is hard to extend and debug. WFSTs are easy to construct and extend, and they allow debugging through visualization. We introduce two WFST-powered RNN-T implementations: (1) "Compose-Transducer", based on a composition of the WFST graphs from the acoustic and textual schemas, which is computationally competitive and easy to modify; and (2) "Grid-Transducer", which constructs the lattice directly for further computations and is the most compact and computationally efficient. We illustrate the ease of extensibility by introducing a new W-Transducer loss, an adaptation of Connectionist Temporal Classification with wild cards. W-Transducer (W-RNNT) consistently outperforms the standard RNN-T in a weakly supervised data setup with parts of transcriptions missing at the beginning and end of utterances. All RNN-T losses are implemented with the k2 framework and are available in the NeMo toolkit.
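To make the composition idea behind "Compose-Transducer" concrete, the sketch below builds a toy RNN-T lattice with the k2 framework in Python. It is a minimal illustration under assumed toy settings (blank = 0, target units [1, 2], T = 3 frames); the schema layout, sizes, and file name are illustrative and not the paper's actual code, and in the real loss the arc scores come from the joint network's log-probabilities rather than being zero.

import k2

# Toy "unit" (textual) schema for target units [1, 2]: move through the target
# in order, with a blank (0) self-loop at every text position.
# k2 text format: one "src dst label score" arc per line, final arcs carry
# label -1, and the last line is the final state id.
unit_schema = k2.Fsa.from_str(
    "0 0 0 0.0\n"
    "0 1 1 0.0\n"
    "1 1 0 0.0\n"
    "1 2 2 0.0\n"
    "2 2 0 0.0\n"
    "2 3 -1 0.0\n"
    "3"
)

# Toy "temporal" schema for T = 3 frames: a blank arc advances to the next
# frame, while unit labels (1, 2) keep the current frame.
temporal_schema = k2.Fsa.from_str(
    "0 0 1 0.0\n"
    "0 0 2 0.0\n"
    "0 1 0 0.0\n"
    "1 1 1 0.0\n"
    "1 1 2 0.0\n"
    "1 2 0 0.0\n"
    "2 2 1 0.0\n"
    "2 2 2 0.0\n"
    "2 3 0 0.0\n"
    "3 4 -1 0.0\n"
    "4"
)

# Intersection of the two acceptors plays the role of composition here.
# Blank (0) must be matched as an ordinary symbol, not as epsilon, hence
# treat_epsilons_specially=False; arc-sorting the inputs beforehand is safe.
lattice = k2.intersect(
    k2.arc_sort(unit_schema),
    k2.arc_sort(temporal_schema),
    treat_epsilons_specially=False,
)
lattice = k2.connect(lattice)

# Every path through `lattice` is one valid RNN-T alignment: three blanks and
# the units 1, 2 in order. Visual debugging, as the abstract advertises
# (requires graphviz):
lattice.draw("rnnt_lattice.svg", title="toy RNN-T lattice")

With per-arc log-probabilities attached from the network output, the loss is the negative total path score of such a lattice in the log semiring (for example via Fsa.get_tot_scores with log_semiring=True), which k2 differentiates automatically; "Grid-Transducer" obtains the same grid without the composition step by constructing the lattice directly.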


