Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

07/07/2022
by Muhammad Umar Farooq, et al.

Multilingual speech recognition has drawn significant attention as an effective way to compensate for data scarcity in low-resource languages. End-to-end (e2e) modelling is preferred over conventional hybrid systems, mainly because it requires no lexicon. However, hybrid DNN-HMMs still outperform e2e models in limited-data scenarios. Furthermore, the problem of manual lexicon creation has been alleviated by publicly available trained models for grapheme-to-phoneme (G2P) conversion and text-to-IPA transliteration for many languages. In this paper, a novel approach to fusing hybrid DNN-HMM acoustic models is proposed in a multilingual setup for low-resource languages. Posterior distributions from different monolingual acoustic models, computed on a target-language speech signal, are fused together. A separate regression neural network is trained for each source-target language pair to transform posteriors from the source acoustic model to the target language. These networks require very little data compared with ASR training. Posterior fusion yields a relative gain of 14.65 [...] monolingual baselines, respectively. Cross-lingual model fusion shows that comparable results can be achieved without using posteriors from the language-dependent ASR.
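
The pipeline described in the abstract (one mapping network per source-target pair, followed by fusion of the mapped posteriors) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `MappingNet` class, its single hidden layer and softmax output, the state counts, and the weighted-average fusion rule in `fuse_posteriors`. The networks are left untrained, so the sketch only shows the data flow and tensor shapes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MappingNet:
    """Hypothetical one-hidden-layer network mapping source-language
    posteriors to the target-language state space (untrained here)."""

    def __init__(self, n_src, n_tgt, n_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_src, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_tgt))
        self.b2 = np.zeros(n_tgt)

    def __call__(self, src_post):
        # src_post: (frames, n_src) -> returns (frames, n_tgt), rows sum to 1.
        h = np.tanh(src_post @ self.W1 + self.b1)
        return softmax(h @ self.W2 + self.b2)

def fuse_posteriors(posteriors, weights=None):
    """Assumed fusion rule: weighted average of per-frame distributions."""
    stacked = np.stack(posteriors)              # (n_models, frames, n_tgt)
    if weights is None:
        weights = np.ones(len(posteriors))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()           # normalise the model weights
    fused = np.tensordot(weights, stacked, axes=1)  # (frames, n_tgt)
    return fused / fused.sum(axis=-1, keepdims=True)

# Toy data: 5 frames, two source acoustic models with 40 and 50 states,
# and a target language with 30 states (all sizes are made up).
rng = np.random.default_rng(1)
frames, n_tgt = 5, 30
src_posts = [softmax(rng.normal(size=(frames, n))) for n in (40, 50)]
mappers = [MappingNet(40, n_tgt, seed=2), MappingNet(50, n_tgt, seed=3)]

# Map each source model's posteriors into the target space, then fuse.
mapped = [net(p) for net, p in zip(mappers, src_posts)]
fused = fuse_posteriors(mapped)
print(fused.shape)
```

In a real system the mapping networks would first be trained on paired source and target posteriors, and the fused distribution would be passed to the target-language decoder. The cross-lingual variant mentioned in the abstract corresponds to fusing only the mapped posteriors, with no posteriors from the target language's own acoustic model.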

