Integration of TensorFlow based Acoustic Model with Kaldi WFST Decoder

06/21/2019
by Minkyu Lim, et al.

While the Kaldi framework provides state-of-the-art components for speech recognition, such as feature extraction, deep neural network (DNN)-based acoustic models, and a weighted finite-state transducer (WFST)-based decoder, it is difficult to implement new, flexible DNN models in it. By contrast, a general-purpose deep learning framework such as TensorFlow makes it easy to build various types of neural network architectures using tensor-based computation, but such models are difficult to apply to WFST-based speech recognition. In this study, a TensorFlow-based acoustic model is integrated with the WFST-based Kaldi decoder to combine the two frameworks. The features and alignments used in Kaldi are converted into a form that TensorFlow can consume, and the DNN-based acoustic model is then trained on them. In the integrated Kaldi decoder, the posterior probabilities are calculated by querying the trained TensorFlow model, and a beam search is performed to generate the lattice. The advantages of the proposed one-pass decoder include the application of various types of neural networks to WFST-based speech recognition and WFST-based online decoding using a TensorFlow-based acoustic model. The TensorFlow-based acoustic models trained on the RM, WSJ, and LibriSpeech datasets show the same level of performance as models trained using the Kaldi framework.
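To make the decoding step more concrete, the following is a minimal sketch of how per-frame posteriors from an external acoustic model might be turned into scores for a WFST beam search. It is an illustration only, not the authors' implementation: the abstract does not say how posteriors are post-processed, so the prior-division step (common in hybrid DNN-HMM systems) is an assumption, and the function names `estimate_log_priors`, `softmax`, and `pseudo_log_likelihoods` are hypothetical. Only the Python standard library is used; the model query itself is replaced by a fixed list of logits.

```python
import math
from collections import Counter


def estimate_log_priors(alignments, num_pdfs, floor=1e-10):
    """Estimate log state priors from frame-level alignments.

    `alignments` is a list of utterances, each a list of integer state ids
    (one per frame). This mirrors the idea of reusing Kaldi alignments as
    training targets; the data layout here is an assumption.
    """
    counts = Counter(pdf for utt in alignments for pdf in utt)
    total = sum(counts.values())
    return [math.log(max(counts.get(i, 0) / total, floor))
            for i in range(num_pdfs)]


def softmax(logits):
    """Numerically stable softmax over one frame of model outputs."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def pseudo_log_likelihoods(posteriors, log_priors, floor=1e-10):
    """Convert state posteriors to scaled log-likelihoods.

    log p(x|s) is proportional to log p(s|x) - log p(s); a decoder would
    consume these values as acoustic scores (assumed step, not confirmed
    by the abstract).
    """
    return [math.log(max(p, floor)) - lp
            for p, lp in zip(posteriors, log_priors)]


# Toy example: two utterances aligned to three states.
alignments = [[0, 0, 1, 2], [1, 1, 2, 2]]
log_priors = estimate_log_priors(alignments, num_pdfs=3)

# Stand-in for one frame of output from the trained acoustic model.
frame_posteriors = softmax([2.0, 1.0, 0.1])
frame_scores = pseudo_log_likelihoods(frame_posteriors, log_priors)
```

In a real integration the logits would come from querying the trained TensorFlow model for each frame (or batch of frames), and `frame_scores` would be handed to the decoder in place of the scores produced by a native Kaldi acoustic model.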


research
10/31/2021

Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model

In typical multi-talker speech recognition systems, a neural network-bas...
research
08/27/2018

Augmenting Bottleneck Features of Deep Neural Network Employing Motor State for Speech Recognition at Humanoid Robots

As for the humanoid robots, the internal noise, which is generated by mo...
research
04/13/2018

Language Recognition using Time Delay Deep Neural Network

This work explores the use of a monolingual Deep Neural Network (DNN) mo...
research
10/27/2018

Fabrik: An Online Collaborative Neural Network Editor

We present Fabrik, an online neural network editor that provides tools t...
research
03/15/2020

Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models

In this paper we investigate the GMM-derived (GMMD) features for adaptat...
research
07/22/2015

Discriminative Segmental Cascades for Feature-Rich Phone Recognition

Discriminative segmental models, such as segmental conditional random fi...
research
12/04/2018

Auto-tuning TensorFlow Threading Model for CPU Backend

TensorFlow is a popular deep learning framework used by data scientists ...
