Dual Script E2E framework for Multilingual and Code-Switching ASR

by   Mari Ganesh Kumar, et al.

India is home to multiple languages, and training automatic speech recognition (ASR) systems for languages is challenging. Over time, each language has adopted words from other languages, such as English, leading to code-mixing. Most Indian languages also have their own unique scripts, which poses a major limitation in training multilingual and code-switching ASR systems. Inspired by results in text-to-speech synthesis, in this work, we use an in-house rule-based phoneme-level common label set (CLS) representation to train multilingual and code-switching ASR for Indian languages. We propose two end-to-end (E2E) ASR systems. In the first system, the E2E model is trained on the CLS representation, and we use a novel data-driven back-end to recover the native language script. In the second system, we propose a modification to the E2E model, wherein the CLS representation and the native language characters are used simultaneously for training. We show our results on the multilingual and code-switching tasks of the Indic ASR Challenge 2021. Our best results achieve 6 system for the multilingual and code-switching tasks, respectively, on the challenge development data.



There are no comments yet.


page 1

page 2

page 3

page 4


Multilingual and code-switching ASR challenges for low resource Indian languages

Recently, there is increasing interest in multilingual automatic speech ...

A Configurable Multilingual Model is All You Need to Recognize All Languages

Multilingual automatic speech recognition (ASR) models have shown great ...

Phonological Features for 0-shot Multilingual Speech Synthesis

Code-switching—the intra-utterance use of multiple languages—is prevalen...

Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR

With the advent of globalization, there is an increasing demand for mult...

Bilingual End-to-End ASR with Byte-Level Subwords

In this paper, we investigate how the output representation of an end-to...

Multilingual Adaptation of RNN Based ASR Systems

A large amount of data is required for automatic speech recognition (ASR...

Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech

This study investigates whether the phonological features derived from t...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.