LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

06/05/2022
by Jinchuan Tian, et al.

Despite rapid progress in automatic speech recognition (ASR) research, recognizing multilingual speech with a unified ASR system remains highly challenging. Previous work on multilingual speech recognition mainly focuses on two directions: recognizing monolingual speech in multiple languages, or recognizing code-switched speech that uses different languages interchangeably within a single utterance. However, a pragmatic multilingual recognizer is expected to be compatible with both directions. In this work, a novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information and generating frame-level language-aware representations during encoding. In the LAE, the primary encoding is implemented by the shared block, while the language-specific blocks are used to extract specific representations for each language. To learn language-specific information discriminatively, a language-aware training method is proposed to optimize the language-specific blocks in the LAE. Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level and shows superior performance on both monolingual and multilingual ASR tasks. With either a real-recorded or simulated code-switched dataset, the proposed LAE achieves statistically significant improvements on both CTC and neural transducer systems. Code is released.
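To make the shared-block and language-specific-block layout described in the abstract more concrete, here is a minimal toy sketch. It is not the authors' implementation: the class name `ToyLanguageAwareEncoder`, the single linear-plus-ReLU layers standing in for each block, the feature sizes, and the choice to sum the per-language outputs into one frame-level representation are all assumptions made for illustration. The language-aware training method is not shown.

```python
import math
import random


def linear(frames, weights):
    """Apply a (d_in x d_out) weight matrix to every frame of a (T x d_in) input."""
    return [
        [sum(x * w for x, w in zip(frame, column)) for column in zip(*weights)]
        for frame in frames
    ]


def relu(frames):
    """Element-wise ReLU over a (T x d) list of frames."""
    return [[max(0.0, value) for value in frame] for frame in frames]


def rand_matrix(rng, rows, cols):
    """Random (rows x cols) matrix with a simple 1/sqrt(rows) scale."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


class ToyLanguageAwareEncoder:
    """Hypothetical stand-in: one shared block followed by one block per language."""

    def __init__(self, d_in, d_model, languages, seed=0):
        rng = random.Random(seed)
        # Shared block: sees every frame regardless of language.
        self.shared = rand_matrix(rng, d_in, d_model)
        # One language-specific block per language, all fed by the shared output.
        self.specific = {lang: rand_matrix(rng, d_model, d_model) for lang in languages}

    def __call__(self, frames):
        h_shared = relu(linear(frames, self.shared))
        per_lang = {lang: relu(linear(h_shared, w)) for lang, w in self.specific.items()}
        # Assumption: combine the per-language frame-level outputs by summation.
        combined = [
            [sum(values) for values in zip(*lang_frames)]
            for lang_frames in zip(*per_lang.values())
        ]
        return per_lang, combined


# Usage: 5 frames of 8-dimensional features, two languages.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
encoder = ToyLanguageAwareEncoder(d_in=8, d_model=4, languages=["mandarin", "english"])
per_lang, combined = encoder(frames)
print(len(combined), len(combined[0]))
```

Each entry of `per_lang` holds one frame-level representation per language, which is the kind of output a frame-level language-discriminative objective could be attached to; `combined` is what a downstream CTC or transducer decoder would consume in this toy setup.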

research
03/30/2022

Code Switched and Code Mixed Speech Recognition for Indic languages

Training multilingual automatic speech recognition (ASR) systems is chal...
research
07/12/2023

Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition

Multilingual speech recognition for both monolingual and code-switching ...
research
05/13/2020

DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation

In previous works, only parameter weights of ASR models are optimized un...
research
05/01/2022

Bilingual End-to-End ASR with Byte-Level Subwords

In this paper, we investigate how the output representation of an end-to...
research
06/09/2020

Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition

Recognizing code-switched speech is challenging for Automatic Speech Rec...
research
07/21/2023

Prompting Large Language Models with Speech Recognition Abilities

Large language models have proven themselves highly flexible, able to so...
research
06/01/2020

Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

This paper presents our modeling and architecture approaches for buildin...
