Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

09/13/2022
by   Chao Zhang, et al.
0

Language identification is critical for many downstream tasks in automatic speech recognition (ASR), and is beneficial to integrate into multilingual end-to-end ASR as an additional task. In this paper, we propose to modify the structure of the cascaded-encoder-based recurrent neural network transducer (RNN-T) model by integrating a per-frame language identifier (LID) predictor. RNN-T with cascaded encoders can achieve streaming ASR with low latency using first-pass decoding with no right-context, and achieve lower word error rates (WERs) using second-pass decoding with longer right-context. By leveraging such differences in the right-contexts and a streaming implementation of statistics pooling, the proposed method can achieve accurate streaming LID prediction with little extra test-time cost. Experimental results on a voice search dataset with 9 language locales shows that the proposed method achieves an average of 96.2 including oracle LID in the input.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/02/2023

Streaming Speech-to-Confusion Network Speech Recognition

In interactive automatic speech recognition (ASR) systems, low-latency r...
research
11/05/2022

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

End-to-end formulation of automatic speech recognition (ASR) and speech ...
research
10/27/2020

Cascaded encoders for unifying streaming and non-streaming ASR

End-to-end (E2E) automatic speech recognition (ASR) models, by now, have...
research
01/25/2022

Improving the fusion of acoustic and text representations in RNN-T

The recurrent neural network transducer (RNN-T) has recently become the ...
research
03/29/2022

Streaming parallel transducer beam search with fast-slow cascaded encoders

Streaming ASR with strict latency constraints is required in many speech...
research
07/08/2020

Streaming End-to-End Bilingual ASR Systems with Joint Language Identification

Multilingual ASR technology simplifies model training and deployment, bu...
research
08/03/2021

Amortized Neural Networks for Low-Latency Speech Recognition

We introduce Amortized Neural Networks (AmNets), a compute cost- and lat...

Please sign up or login with your details

Forgot password? Click here to reset