Language model fusion for streaming end to end speech recognition

04/09/2021
by Rodrigo Cabrera, et al.

Streaming processing of speech audio is required for many contemporary practical speech recognition tasks. Even with the large corpora of manually transcribed speech data available today, such corpora cannot adequately cover the long tail of linguistic content that is important for tasks such as open-ended dictation and voice search. We seek to address both the streaming and the tail recognition challenges by using a language model (LM) trained on unpaired text data to enhance the end-to-end (E2E) model. We extend shallow fusion and cold fusion approaches to the streaming Recurrent Neural Network Transducer (RNNT), and also propose two new competitive fusion approaches that further enhance the RNNT architecture. Our results on multiple languages with varying training set sizes show that these fusion methods improve streaming RNNT performance by introducing extra linguistic features. Cold fusion works consistently better on streaming RNNT with up to an 8.5
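As a rough illustration of the simplest of the fusion approaches named above, shallow fusion is commonly described as a log-linear combination of the E2E model's token scores with the scores of a separately trained LM at decoding time. The sketch below is a toy example under that assumption, not the authors' implementation: the function names, the three-word vocabulary, the logits, and the `lm_weight` value are all hypothetical, and the handling of the RNNT blank symbol is left out.

```python
import math


def log_softmax(logits):
    """Convert raw scores into log-probabilities (numerically stable)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def shallow_fusion_scores(e2e_log_probs, lm_log_probs, lm_weight=0.3):
    """Log-linear interpolation: log p_E2E(y|x) + lm_weight * log p_LM(y).

    lm_weight is a hypothetical tuning parameter; in practice it would be
    chosen on held-out data.
    """
    if len(e2e_log_probs) != len(lm_log_probs):
        raise ValueError("both models must score the same vocabulary")
    return [a + lm_weight * b for a, b in zip(e2e_log_probs, lm_log_probs)]


def pick_best(vocab, scores):
    """Return the vocabulary entry with the highest fused score."""
    return vocab[max(range(len(vocab)), key=scores.__getitem__)]


# Toy decoding step: the acoustic evidence slightly prefers "their",
# while the text-only LM strongly prefers "there" in this context.
vocab = ["their", "there", "they're"]
e2e = log_softmax([2.0, 1.9, 0.5])  # hypothetical E2E model scores
lm = log_softmax([0.2, 2.5, 0.1])   # hypothetical LM scores

best_without_lm = pick_best(vocab, shallow_fusion_scores(e2e, lm, lm_weight=0.0))
best_with_lm = pick_best(vocab, shallow_fusion_scores(e2e, lm, lm_weight=0.5))
```

With the weight set to zero the E2E model's choice is unchanged; a positive weight lets the LM break near-ties between acoustically similar words. In a beam search the same combination would be applied to each hypothesis extension.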


research
04/05/2021

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion

How to leverage dynamic contextual information in end-to-end speech reco...
research
10/26/2020

Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer

Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech...
research
08/24/2020

Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus

End-to-end (E2E) automatic speech recognition (ASR) systems lack the dis...
research
12/22/2019

End-to-end training of a large vocabulary end-to-end speech recognition system

In this paper, we present an end-to-end training framework for building ...
research
11/15/2018

Streaming End-to-end Speech Recognition For Mobile Devices

End-to-end (E2E) models, which directly predict output character sequenc...
research
11/09/2022

Adaptive Multi-Corpora Language Model Training for Speech Recognition

Neural network language model (NNLM) plays an essential role in automati...
research
06/16/2022

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization

We report on aggressive quantization strategies that greatly accelerate ...
