LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition

10/21/2020
by Xie Chen, et al.

LSTM language models (LSTM-LMs) have proven powerful and have yielded significant performance improvements over count-based n-gram LMs in modern speech recognition systems. Because of their unbounded history states and computational load, most previous studies apply LSTM-LMs only in the second pass, for rescoring. Recent work shows that it is feasible and computationally affordable to adopt LSTM-LMs in first-pass decoding within a dynamic (or tree-based) decoder framework. In this work, the LSTM-LM is composed with a WFST decoder on the fly for first-pass decoding. Furthermore, motivated by the long-term history nature of LSTM-LMs, the use of context beyond the current utterance is explored for first-pass decoding in conversational speech recognition. The context information is captured by the hidden states of the LSTM-LM, which are carried across utterances, and can be used to guide the first-pass search effectively. Experimental results on our internal meeting transcription system show that incorporating the contextual information with LSTM-LMs in first-pass decoding yields significant performance improvements compared to applying the contextual information in second-pass rescoring.
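
The idea of carrying LSTM hidden states across utterances can be illustrated with a minimal sketch. This is not the authors' system: it omits the acoustic model and the WFST composition entirely, the weights are random and untrained, and the names `TinyLSTMLM`, `score_conversation`, `VOCAB`, and `CONVERSATION` are hypothetical. It only shows how the language model score of a later utterance changes when the state left by the previous utterance is kept instead of being reset.

```python
import math
import random

# Hypothetical toy vocabulary; a real system would use a full word list.
VOCAB = ["<s>", "the", "budget", "meeting", "is", "approved", "</s>"]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMLM:
    """Single-layer LSTM language model with random (untrained) weights."""

    def __init__(self, vocab, hidden=4, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.index = {w: i for i, w in enumerate(vocab)}
        self.hidden = hidden
        n_in = len(vocab) + hidden  # one-hot input concatenated with h
        # Four weight matrices: input, forget, output gates and candidate.
        self.w = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(hidden)] for _ in range(4)]
        self.out = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)]
                    for _ in range(len(vocab))]

    def initial_state(self):
        # (hidden vector h, cell vector c), both zero.
        return [0.0] * self.hidden, [0.0] * self.hidden

    def step(self, word, state):
        """Consume one word; return next-word log-probs and the new state."""
        h, c = state
        x = [0.0] * len(self.vocab)
        x[self.index[word]] = 1.0
        z = x + h

        def gate(k, act):
            return [act(sum(wi * zi for wi, zi in zip(row, z)))
                    for row in self.w[k]]

        i, f, o = gate(0, _sigmoid), gate(1, _sigmoid), gate(2, _sigmoid)
        g = gate(3, math.tanh)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        logits = [sum(wi * hi for wi, hi in zip(row, h_new))
                  for row in self.out]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        return [l - log_z for l in logits], (h_new, c_new)

    def score(self, words, state):
        """Log-probability of words[1:] given words[0] and the start state."""
        total = 0.0
        for prev, nxt in zip(words, words[1:]):
            log_probs, state = self.step(prev, state)
            total += log_probs[self.index[nxt]]
        # Also consume the final token so the returned state has seen
        # the whole utterance before it is handed to the next one.
        _, state = self.step(words[-1], state)
        return total, state


def score_conversation(lm, utterances, carry_context):
    """Score each utterance, optionally keeping the state between them."""
    state = lm.initial_state()
    scores = []
    for words in utterances:
        if not carry_context:
            state = lm.initial_state()  # per-utterance reset
        log_prob, state = lm.score(words, state)
        scores.append(log_prob)
    return scores


CONVERSATION = [
    ["<s>", "the", "budget", "meeting", "</s>"],
    ["<s>", "the", "budget", "is", "approved", "</s>"],
]

if __name__ == "__main__":
    lm = TinyLSTMLM(VOCAB)
    print("reset  :", score_conversation(lm, CONVERSATION, False))
    print("carried:", score_conversation(lm, CONVERSATION, True))
```

With untrained weights the scores themselves are meaningless; the point is that the first utterance scores identically in both modes while the second does not, because its starting state differs. In a first-pass decoder this state would have to be tracked per search hypothesis, which the sketch does not attempt.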
