Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss

08/11/2023
by   Mohammad Soleymanpour, et al.
0

We introduce a bilingual solution to support English as secondary locale for most primary locales in hybrid automatic speech recognition (ASR) settings. Our key developments constitute: (a) pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and subsequently bilingual streaming transformer model, (c) a parallel encoder structure with language identification (LID) loss, (d) parallel encoder with an auxiliary loss for monolingual projections. We conclude that in comparison to LID loss, our proposed auxiliary loss is superior in specializing the parallel encoders to respective monolingual locales, and that contributes to stronger bilingual learning. We evaluate our work on large-scale training and test tasks for bilingual Spanish (ES) and bilingual Italian (IT) applications. Our bilingual models demonstrate strong English code-mixing capability. In particular, the bilingual IT model improves the word error rate (WER) for a code-mix IT task from 46.5 monolingual IT model (9.5

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/01/2020

Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition

Recently, there has been significant progress made in Automatic Speech R...
research
06/09/2020

Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition

Recognizing code-switched speech is challenging for Automatic Speech Rec...
research
06/14/2023

Towards training Bilingual and Code-Switched Speech Recognition models from Monolingual data sources

Multilingual Automatic Speech Recognition (ASR) models are capable of tr...
research
11/02/2022

Monolingual Recognizers Fusion for Code-switching Speech Recognition

The bi-encoder structure has been intensively investigated in code-switc...
research
11/30/2020

Transformer-Transducers for Code-Switched Speech Recognition

We live in a world where 60 languages fluently. Members of these communi...
research
11/29/2021

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Conversational bilingual speech encompasses three types of utterances: t...
research
11/02/2022

Conversation-oriented ASR with multi-look-ahead CBS architecture

During conversations, humans are capable of inferring the intention of t...

Please sign up or login with your details

Forgot password? Click here to reset