EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding

07/29/2015
by Yajie Miao, et al.

The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages, and significant expertise. This paper presents our Eesen framework, which drastically simplifies the existing pipeline for building state-of-the-art ASR systems. Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) that predicts context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, we adopt the connectionist temporal classification (CTC) objective function to infer the alignments between speech and label sequences. A distinctive feature of Eesen is a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables the efficient incorporation of lexicons and language models into CTC decoding. Experiments show that, compared with standard hybrid DNN systems, Eesen achieves comparable word error rates (WERs) while significantly speeding up decoding.
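
To make the role of CTC in the abstract concrete, here is a minimal sketch of how the objective scores a label sequence without frame-level labels: it sums the probabilities of every frame-level path that collapses to the target once repeats are merged and blanks removed. This is not Eesen's code. The names `BLANK`, `collapse`, and `ctc_prob` and the toy probability table are assumptions made for illustration, and the brute-force enumeration stands in for the dynamic-programming recursion that practical implementations use.

```python
from itertools import product

BLANK = "-"  # hypothetical symbol chosen here to stand for the CTC blank


def collapse(path):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return "".join(out)


def ctc_prob(frame_probs, target):
    """Sum the probability of every path that collapses to `target`.

    frame_probs: list of dicts, one per frame, mapping symbol -> probability.
    Brute force over all paths, so only usable for tiny toy inputs.
    """
    symbols = list(frame_probs[0].keys())
    total = 0.0
    for path in product(symbols, repeat=len(frame_probs)):
        if collapse(path) == target:
            p = 1.0
            for t, s in enumerate(path):
                p *= frame_probs[t][s]
            total += p
    return total


# Toy example: two frames, one label "a" plus the blank.
toy = [{"a": 0.6, BLANK: 0.4}, {"a": 0.5, BLANK: 0.5}]
p_a = ctc_prob(toy, "a")      # paths "aa", "a-", "-a"
p_empty = ctc_prob(toy, "")   # path "--"
```

In this toy case three different alignments ("aa", "a-", "-a") all yield the label sequence "a", which is why training on the summed probability needs no pre-generated frame labels.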


research
10/31/2018

Towards End-to-End Code-Switching Speech Recognition

Code-switching speech recognition has attracted an increasing interest r...
research
09/27/2019

End-to-End Code-Switching ASR for Low-Resourced Language Pairs

Despite the significant progress in end-to-end (E2E) automatic speech re...
research
11/01/2021

Sequence Transduction with Graph-based Supervision

The recurrent neural network transducer (RNN-T) objective plays a major ...
research
03/17/2022

Prediction of speech intelligibility with DNN-based performance measures

This paper presents a speech intelligibility model based on automatic sp...
research
07/10/2017

Feature Joint-State Posterior Estimation in Factorial Speech Processing Models using Deep Neural Networks

This paper proposes a new method for calculating joint-state posteriors ...
research
04/07/2015

Transferring Knowledge from a RNN to a DNN

Deep Neural Network (DNN) acoustic models have yielded many state-of-the...
research
08/29/2017

Information Theoretic Analysis of DNN-HMM Acoustic Modeling

We propose an information theoretic framework for quantitative assessmen...
