Pre-training on high-resource speech recognition improves low-resource speech-to-text translation

09/05/2018
by Sameer Bansal, et al.

We present a simple approach to improve direct speech-to-text translation (ST) when the source language is low-resource: we pre-train the model on a high-resource automatic speech recognition (ASR) task, and then fine-tune its parameters for ST. We demonstrate that our approach is effective by pre-training on 300 hours of English ASR data to improve Spanish-English ST from 10.8 to 20.2 BLEU when only 20 hours of Spanish-English ST training data is available. Through an ablation study, we find that the pre-trained encoder (acoustic model) accounts for most of the improvement, which is surprising since the shared language in these tasks is the target language (text), and not the source language (audio). Applying this insight, we show that pre-training on ASR helps ST even when the ASR language differs from both source and target ST languages: pre-training on French ASR also improves Spanish-English ST. Finally, we show that the approach improves a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU.
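
The pre-train-then-fine-tune recipe described above can be illustrated with a minimal sketch of the parameter hand-off between the two models. This is not the authors' implementation: the function name `init_from_pretrained`, the `encoder.` / `decoder.` naming scheme, and the representation of parameters as plain lists of floats are all assumptions made for illustration. The sketch copies parameters from a pre-trained ASR model into a freshly initialized ST model wherever the name and size match, and lets the caller choose which components to transfer, which is the kind of switch an ablation over encoder and decoder would need.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a model's parameters: name -> flat list of weights.
Params = Dict[str, List[float]]


def init_from_pretrained(
    st_params: Params,
    asr_params: Params,
    prefixes: Iterable[str] = ("encoder.",),
) -> Params:
    """Return a copy of st_params with selected ASR parameters copied in.

    A parameter is transferred only if its name starts with one of the
    given prefixes, exists in the ST model, and has the same size.
    Everything else keeps its original ST initialization.
    """
    prefixes = tuple(prefixes)
    out = {name: list(values) for name, values in st_params.items()}
    for name, values in asr_params.items():
        if (
            name.startswith(prefixes)
            and name in out
            and len(out[name]) == len(values)
        ):
            out[name] = list(values)
    return out


# Toy example: the encoder shapes match, the decoder embedding does not
# (for instance because the target vocabularies differ in size).
asr = {
    "encoder.conv.weight": [0.5, -0.2],
    "decoder.embed.weight": [0.1, 0.3],
}
st = {
    "encoder.conv.weight": [0.0, 0.0],
    "decoder.embed.weight": [0.0, 0.0, 0.0],
}

# Transfer only the encoder (acoustic model), the default.
st_encoder_only = init_from_pretrained(st, asr)

# Attempt to transfer both; the mismatched decoder entry is skipped.
st_both = init_from_pretrained(st, asr, prefixes=("encoder.", "decoder."))
```

After this initialization step, all ST parameters would be fine-tuned on the ST data as usual. Skipping size-mismatched entries is a design choice of this sketch; it keeps the transfer usable when the ASR language differs from the ST target language and the output vocabularies do not line up.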


