Language-agnostic Multilingual Modeling

04/20/2020
by   Arindrima Datta, et al.
4

Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of data-rich and data-scarce languages in a single model. This enables data and parameter sharing across languages, which is especially beneficial for the data-scarce languages. However, most state-of-the-art multilingual models require the encoding of language information and therefore are not as flexible or scalable when expanding to newer languages. Language-independent multilingual models help to address this issue, and are also better suited for multicultural societies where several languages are frequently used together (but often rendered with different writing systems). In this paper, we propose a new approach to building a language-agnostic multilingual ASR system which transforms all languages to one writing system through a many-to-one transliteration transducer. Thus, similar sounding acoustics are mapped to a single, canonical target sequence of graphemes, effectively separating the modeling and rendering problems. We show with four Indic languages, namely, Hindi, Bengali, Tamil and Kannada, that the language-agnostic multilingual model achieves up to 10 language-dependent multilingual model.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/13/2022

Learning ASR pathways: A sparse multilingual ASR model

Neural network pruning can be effectively applied to compress automatic ...
research
05/24/2023

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition

Traditional Multilingual Text Recognition (MLTR) usually targets a fixed...
research
10/30/2022

DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set

In a multilingual country like India, multilingual Automatic Speech Reco...
research
05/31/2023

The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR

Building a multilingual Automated Speech Recognition (ASR) system in a l...
research
05/19/2023

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

Data scarcity is a crucial issue for the development of highly multiling...
research
06/02/2023

Multilingual Conceptual Coverage in Text-to-Image Models

We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a techni...
research
07/13/2021

A Configurable Multilingual Model is All You Need to Recognize All Languages

Multilingual automatic speech recognition (ASR) models have shown great ...

Please sign up or login with your details

Forgot password? Click here to reset