The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR

05/31/2023
by Kaousheik Jayakumar, et al.

Building a multilingual Automated Speech Recognition (ASR) system in a linguistically diverse country like India is challenging because the languages use different scripts and speech data is scarce. This problem can be addressed by exploiting the fact that many of these languages are phonetically similar: they can be converted into a Common Label Set (CLS) by mapping similar sounds to common labels. In this paper, new approaches are explored and compared to improve the performance of a CLS-based multilingual ASR model. Language-specific information is infused into the ASR model either by supplying a language ID or by applying a CLS-to-native-script converter on top of the CLS multilingual model. These methods give a significant improvement in Word Error Rate (WER) over the CLS baseline. They are also evaluated on out-of-distribution data to check their robustness.
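
The idea of mapping similar sounds to common labels, and of attaching a language ID, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three-entry mapping tables, the Latin label names, the `<hi>`/`<ta>` tag format, and the function names `to_cls`, `tag_with_language`, and `from_cls` are hypothetical and are not the label set or the model interface used in the paper. The sketch also ignores vowels and vowel signs, which a real label set would have to handle.

```python
# Toy illustration only: the label names and these tiny mapping tables are
# hypothetical, not the Common Label Set used in the paper.
TO_CLS = {
    "hi": {"क": "k", "म": "m", "ल": "l"},  # Devanagari script (Hindi)
    "ta": {"க": "k", "ம": "m", "ல": "l"},  # Tamil script
}


def to_cls(text, lang):
    """Map native-script characters to common labels, skipping unknown ones."""
    table = TO_CLS[lang]
    return [table[ch] for ch in text if ch in table]


def tag_with_language(labels, lang):
    """Prepend a language tag so the label sequence carries a language ID."""
    return [f"<{lang}>"] + labels


def from_cls(labels, lang):
    """Convert common labels back to the native script of `lang`."""
    inverse = {label: ch for ch, label in TO_CLS[lang].items()}
    return "".join(inverse[label] for label in labels if label in inverse)


hindi_labels = to_cls("कमल", "hi")
tamil_labels = to_cls("கமல", "ta")
print(hindi_labels == tamil_labels)            # both scripts share one label sequence
print(tag_with_language(tamil_labels, "ta"))   # label sequence with a language tag
print(from_cls(hindi_labels, "ta"))            # common labels rendered in Tamil script
```

Because both scripts collapse to the same label sequence, a single model can share training data across languages; the language tag or the back-conversion step is what restores the language-specific output.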

research
10/30/2022

DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set

In a multilingual country like India, multilingual Automatic Speech Reco...
research
06/02/2021

Dual Script E2E framework for Multilingual and Code-Switching ASR

India is home to multiple languages, and training automatic speech recog...
research
07/23/2021

OLR 2021 Challenge: Datasets, Rules and Baselines

This paper introduces the sixth Oriental Language Recognition (OLR) 2021...
research
06/23/2020

One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble

The task of grapheme-to-phoneme (G2P) conversion is important for both s...
research
09/27/2016

Multi-task Recurrent Model for True Multilingual Speech Recognition

Research on multilingual speech recognition remains attractive yet chall...
research
04/20/2020

Language-agnostic Multilingual Modeling

Multilingual Automated Speech Recognition (ASR) systems allow for the jo...
research
12/10/2022

Punctuation Restoration for Singaporean Spoken Languages: English, Malay, and Mandarin

This paper presents the work of restoring punctuation for ASR transcript...
