A Survey of Multilingual Models for Automatic Speech Recognition

02/25/2022
by Hemant Yadav, et al.

Although Automatic Speech Recognition (ASR) systems have achieved human-like performance for a few languages, the majority of the world's languages do not have usable systems due to the lack of large speech datasets to train these models. Cross-lingual transfer is an attractive solution to this problem, because low-resource languages can potentially benefit from higher-resource languages either through transfer learning or by being jointly trained in the same multilingual model. The problem of cross-lingual transfer has been well studied in ASR; however, recent advances in self-supervised learning are opening up avenues for unlabeled speech data to be used in multilingual ASR models, which can pave the way for improved performance on low-resource languages. In this paper, we survey the state of the art in multilingual ASR models that are built with cross-lingual transfer in mind. We present best practices for building multilingual models from research across diverse languages and techniques, discuss open questions, and provide recommendations for future work.

research
05/28/2022

Adaptive Activation Network For Low Resource Multilingual Speech Recognition

Low resource automatic speech recognition (ASR) is a useful but thorny t...
research
10/07/2021

Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

We propose a simple and effective cross-lingual transfer learning method...
research
06/02/2023

DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model

Multilingual self-supervised speech representation models have greatly e...
research
04/01/2022

Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition

Aphasia is a common speech and language disorder, typically caused by a ...
research
05/27/2022

Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning

Automatic Speech Recognition (ASR) systems typically produce unpunctuate...
research
03/15/2021

XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition

In this paper, we propose a weakly supervised multilingual representatio...
research
07/28/2020

Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?

Phones, the segmental units of the International Phonetic Alphabet (IPA)...