Cross-Modal Transfer Learning for Multilingual Speech-to-Text Translation

10/24/2020
by Chau Tran, et al.

We propose an effective approach that uses pretrained speech and text models for speech-to-text translation (ST). Our recipe for cross-modal and cross-lingual transfer learning (XMTL) is simple and generalizable: an adaptor module bridges modules pretrained in different modalities, and an efficient finetuning step leverages the knowledge in the pretrained modules while adapting them to a drastically different downstream task. With this approach, we build a multilingual speech-to-text translation model from a pretrained audio encoder (wav2vec) and a pretrained multilingual text decoder (mBART). It achieves a new state of the art on the CoVoST 2 ST benchmark [1] for English into 15 languages and for 6 Romance languages into English, with average gains of +2.8 BLEU and +3.9 BLEU, respectively. On low-resource languages (less than 10 hours of training data), our approach significantly improves translation quality, with +9.0 BLEU on Portuguese-English and +5.2 BLEU on Dutch-English.
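The adaptor idea can be pictured as a small trainable module that reshapes the speech encoder's hidden states into the form the text decoder expects. Below is a minimal PyTorch sketch of such an adaptor; the class name LengthAdaptor, the dimensions (768 for a wav2vec-style encoder, 1024 for an mBART-style decoder), and the convolutional downsampling scheme are illustrative assumptions, not the exact module described in the paper.

```python
import torch
import torch.nn as nn

class LengthAdaptor(nn.Module):
    """Hypothetical adaptor: downsamples and projects speech-encoder states
    so they match the text decoder's hidden size and rough sequence scale."""
    def __init__(self, speech_dim=768, text_dim=1024,
                 kernel_size=3, stride=2, num_layers=3):
        super().__init__()
        layers = []
        dim = speech_dim
        for _ in range(num_layers):
            # Each strided conv roughly halves the sequence length.
            layers.append(nn.Conv1d(dim, text_dim, kernel_size,
                                    stride=stride, padding=kernel_size // 2))
            layers.append(nn.GELU())
            dim = text_dim
        self.convs = nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch, time, speech_dim) hidden states from the audio encoder
        x = x.transpose(1, 2)      # (batch, speech_dim, time) for Conv1d
        x = self.convs(x)          # shorten the sequence, project to text_dim
        return x.transpose(1, 2)   # (batch, time', text_dim) for the decoder

# Assumed usage, plugged between pretrained modules (APIs are placeholders):
# audio_states = wav2vec_encoder(waveform)          # (B, T, 768)
# bridged = LengthAdaptor()(audio_states)           # (B, T', 1024)
# logits = mbart_decoder(tgt_tokens, encoder_hidden_states=bridged)
```

During the finetuning step, such an adaptor (and typically parts of the pretrained modules) would be trained on the ST data, so the speech encoder and text decoder can cooperate on a task neither was pretrained for.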


Related research:

06/01/2023 | Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Research in multilingual speech-to-text translation is topical. Having a...

05/24/2023 | ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
Joint speech-language training is challenging due to the large demand fo...

07/12/2021 | Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task
Pretraining and multitask learning are widely used to improve the speech...

06/21/2021 | Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling
Multi-head attention has each of the attention heads collect salient inf...

07/14/2021 | FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task
In this paper, we describe our end-to-end multilingual speech translatio...

06/13/2023 | NAVER LABS Europe's Multilingual Speech Translation Systems for the IWSLT 2023 Low-Resource Track
This paper presents NAVER LABS Europe's systems for Tamasheq-French and ...

03/09/2023 | MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Multi-media communications facilitate global interaction among people. H...
