A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English

08/03/2021
by Saida Mussakhojayeva, et al.

We study training a single end-to-end (E2E) automatic speech recognition (ASR) model for three languages used in Kazakhstan: Kazakh, Russian, and English. We first describe the development of multilingual E2E ASR based on Transformer networks and then assess it extensively on these three languages. We also compare two ways of constructing the output grapheme set: combined and independent. Furthermore, we evaluate the impact of language models (LMs) and data augmentation techniques on the recognition performance of the multilingual E2E ASR. In addition, we present several datasets for training and evaluation. Experimental results show that the multilingual models achieve performance comparable to that of the monolingual baselines with a similar number of parameters. Our best monolingual and multilingual models achieved 20.9 respectively. To ensure the reproducibility of our experiments and results, we share our training recipes, datasets, and pre-trained models.
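The abstract does not spell out how the two grapheme sets are built, so the following is only a minimal sketch of one plausible reading: a combined set takes the union of characters across languages (so letters shared by Kazakh and Russian appear once), while an independent set keeps a separate, language-tagged copy of each character. The function name `build_grapheme_sets`, the `_kk`/`_ru`/`_en` tagging scheme, and the toy transcripts are assumptions made for illustration and are not taken from the paper.

```python
def build_grapheme_sets(corpora):
    """Build two candidate output vocabularies from per-language transcripts.

    corpora: dict mapping a language code to a list of transcript strings.
    Returns (combined, independent); both are lists of output symbols.
    This is an illustrative assumption, not the authors' recipe.
    """
    # Distinct non-space characters seen in each language.
    per_lang = {
        lang: sorted({ch for text in texts for ch in text.lower() if not ch.isspace()})
        for lang, texts in corpora.items()
    }
    # Combined: one shared symbol per character, regardless of language.
    combined = sorted(set().union(*per_lang.values()))
    # Independent: every language gets its own tagged copy of each character.
    independent = [f"{ch}_{lang}" for lang in sorted(per_lang) for ch in per_lang[lang]]
    return combined, independent


# Toy transcripts (hypothetical data, one word per language).
corpora = {
    "kk": ["сәлем"],
    "ru": ["привет"],
    "en": ["hello"],
}
combined, independent = build_grapheme_sets(corpora)
print(len(combined), len(independent))
```

Under this reading, the independent set is larger whenever languages share characters, which is the trade-off such a comparison would probe.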

research
03/30/2022

Code Switched and Code Mixed Speech Recognition for Indic languages

Training multilingual automatic speech recognition (ASR) systems is chal...
research
10/11/2022

Scaling Up Deliberation for Multilingual ASR

Multilingual end-to-end automatic speech recognition models are attracti...
research
05/01/2022

Bilingual End-to-End ASR with Byte-Level Subwords

In this paper, we investigate how the output representation of an end-to...
research
05/13/2020

DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation

In previous works, only parameter weights of ASR models are optimized un...
research
09/16/2023

Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints

We propose a first step toward multilingual end-to-end automatic speech ...
research
05/25/2023

Mixture-of-Expert Conformer for Streaming Multilingual ASR

End-to-end models with large capacity have significantly improved multil...
research
02/22/2023

UML: A Universal Monolingual Output Layer for Multilingual ASR

Word-piece models (WPMs) are commonly used subword units in state-of-the...
