CLSRIL-23: Cross Lingual Speech Representations for Indic Languages

07/15/2021
by Anirudh Gupta, et al.

We present CLSRIL-23, a self-supervised audio pre-trained model that learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations and jointly learns a quantization of the latents shared across all languages. We compare the language-wise loss during pretraining to study the effects of monolingual versus multilingual pretraining. Performance on downstream fine-tuning tasks for speech recognition is also compared, and our experiments show that multilingual pretraining outperforms monolingual training, both in learning speech representations that encode the phonetic similarity of languages and in performance on downstream tasks. A decrease of 5% in WER is observed when a multilingual pretrained model is used for fine-tuning in Hindi. All the code and models are open sourced. CLSRIL-23 is trained on 23 languages and almost 10,000 hours of audio data to facilitate research in speech recognition for Indic languages. We hope that new state-of-the-art systems will be created using this self-supervised approach, especially for low-resource Indic languages.
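As background for the contrastive task the abstract refers to, here is a minimal NumPy sketch (not code from the paper): at each masked time step, a wav2vec 2.0-style model must identify the true quantized latent among a set of distractors, given the context network's output. The function name, the plain cosine/InfoNCE form, and all parameters are illustrative; the actual implementation also includes Gumbel-softmax quantization and a codebook diversity loss, omitted here.

```python
# Illustrative InfoNCE-style contrastive loss for one masked time step.
# Simplified sketch; not the paper's implementation.
import numpy as np

def contrastive_loss(context, target, distractors, temperature=0.1):
    """Score the true quantized latent `target` against `distractors`,
    given the context vector `context` for a masked time step."""
    def cos(a, b):
        # Cosine similarity between two vectors.
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    pos = np.exp(cos(context, target) / temperature)
    neg = sum(np.exp(cos(context, d) / temperature) for d in distractors)
    # Negative log-probability of picking the true target.
    return -np.log(pos / (pos + neg))

# Usage: a context vector aligned with its target yields a low loss,
# while a context vector aligned with a distractor yields a high one.
rng = np.random.default_rng(0)
target = rng.normal(size=16)
distractors = [rng.normal(size=16) for _ in range(5)]
good = contrastive_loss(target, target, distractors)
bad = contrastive_loss(distractors[0], target, distractors)
```

Because the quantized targets are shared across all 23 languages, phonetically similar sounds in different languages compete for the same codebook entries, which is one intuition for why multilingual pretraining can encode cross-lingual phonetic similarity.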

Related research

06/24/2020 · Unsupervised Cross-lingual Representation Learning for Speech Recognition
This paper presents XLSR which learns cross-lingual speech representatio...

03/14/2023 · Learning Cross-lingual Visual Speech Representations
Cross-lingual self-supervised learning has been a growing research topic...

02/07/2020 · Unsupervised pretraining transfers well across languages
Cross-lingual and multi-lingual training of Automatic Speech Recognition...

11/15/2021 · Joint Unsupervised and Supervised Training for Multilingual ASR
Self-supervised training has shown promising gains in pretraining models...

04/03/2019 · Massively Multilingual Adversarial Speech Recognition
We report on adaptation of multilingual end-to-end speech recognition mo...

11/17/2021 · XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
This paper presents XLS-R, a large-scale model for cross-lingual speech ...

01/19/2021 · UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
In this paper, we propose a unified pre-training approach called UniSpee...
