Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation

06/09/2020
by   Changhan Wang, et al.

Transfer learning from high-resource languages is known to be an efficient way to improve end-to-end automatic speech recognition (ASR) for low-resource languages. Pre-trained or jointly trained encoder-decoder models, however, do not share the language modeling component (the decoder) for the same language, which is likely to be inefficient for distant target languages. We introduce speech-to-text translation (ST) as an auxiliary task to incorporate additional knowledge of the target language and enable transferring from that target language. Specifically, we first translate high-resource ASR transcripts into a target low-resource language, with which an ST model is trained. Both ST and target ASR share the same attention-based encoder-decoder architecture and vocabulary. The former task then provides a fully pre-trained model for the latter, bringing up to 24.6% word error rate (WER) reduction to the baseline (direct transfer from high-resource ASR). We show that training ST with human translations is not necessary. ST trained with machine translation (MT) pseudo-labels brings consistent gains. It can even outperform those using human labels when transferred to target ASR by leveraging only 500K MT examples. Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction to the baseline.
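The two steps described in the abstract (pseudo-labeling high-resource ASR audio with MT output, then initializing the target ASR model from the resulting ST model) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `Utterance`, `build_pseudo_st_corpus`, `init_target_asr`, the dictionary-based toy "MT system", and the representation of model parameters as plain lists are all hypothetical names and simplifications introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Utterance:
    audio_id: str    # hypothetical handle to the audio of one utterance
    transcript: str  # transcript in the high-resource language


def build_pseudo_st_corpus(
    asr_corpus: List[Utterance],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each high-resource audio clip with an MT translation of its
    transcript, giving (audio, target-language text) pseudo-labels for ST."""
    return [(u.audio_id, translate(u.transcript)) for u in asr_corpus]


def init_target_asr(
    st_params: Dict[str, List[float]],
    asr_params: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Initialize every target-ASR parameter from the trained ST model.

    Assumes both models share architecture and vocabulary, so parameter
    names and sizes match one-to-one (encoder and decoder alike).
    """
    if st_params.keys() != asr_params.keys():
        raise ValueError("ST and target ASR must share the same architecture")
    for name, values in st_params.items():
        if len(values) != len(asr_params[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
    # Copy so that fine-tuning the ASR model leaves the ST model untouched.
    return {name: list(values) for name, values in st_params.items()}


# Toy usage: a lookup table stands in for a real MT system.
toy_mt = {"hello": "bonjour", "thank you": "merci"}
corpus = [Utterance("utt1", "hello"), Utterance("utt2", "thank you")]
pseudo_st = build_pseudo_st_corpus(corpus, lambda text: toy_mt[text])

st_model = {"encoder.w": [0.1, 0.2], "decoder.w": [0.3]}
fresh_asr = {"encoder.w": [0.0, 0.0], "decoder.w": [0.0]}
target_asr = init_target_asr(st_model, fresh_asr)
```

In this sketch the decoder weights are copied along with the encoder weights, which is only possible because the ST model already decodes into the target language with the shared vocabulary.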


