Sequence-based Multi-lingual Low Resource Speech Recognition

02/21/2018
by   Siddharth Dalmia, et al.

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends, we show that end-to-end multi-lingual training of sequence models is effective on context-independent models trained using Connectionist Temporal Classification (CTC) loss. We show that our model improves performance on Babel languages by over 6% absolute in terms of word/phoneme error rate when compared to mono-lingual systems built in the same setting for these languages. We also show that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data. We also show that training on multiple languages is important for very low resource cross-lingual target scenarios, but not for multi-lingual testing scenarios. Here, it appears beneficial to include large, well-prepared datasets.


