A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English

08/03/2021

We study training a single end-to-end (E2E) automatic speech recognition (ASR) model for three languages used in Kazakhstan: Kazakh, Russian, and English. We first describe the development of multilingual E2E ASR based on Transformer networks and then perform an extensive assessment on these three languages. We also compare two variants of output grapheme set construction: combined and independent. Furthermore, we evaluate the impact of language models (LMs) and data augmentation techniques on the recognition performance of the multilingual E2E ASR. In addition, we present several datasets for training and evaluation purposes. Experimental results show that the multilingual models achieve performance comparable to that of the monolingual baselines with a similar number of parameters. Our best monolingual and multilingual models achieved 20.9 respectively. To ensure the reproducibility of our experiments and results, we share our training recipes, datasets, and pre-trained models.
