Cross-Modal Transfer Learning for Multilingual Speech-to-Text Translation

10/24/2020
by Chau Tran, et al.

We propose an effective approach for using pretrained speech and text models to perform speech-to-text translation (ST). Our recipe for cross-modal and cross-lingual transfer learning (XMTL) is simple and generalizable: an adaptor module bridges the modules pretrained in different modalities, and an efficient finetuning step leverages the knowledge in the pretrained modules while making it work on a drastically different downstream task. With this approach, we built a multilingual speech-to-text translation model from a pretrained audio encoder (wav2vec) and a multilingual text decoder (mBART). It achieves a new state of the art on the CoVoST 2 ST benchmark [1] for English into 15 languages and for 6 Romance languages into English, with average gains of +2.8 BLEU and +3.9 BLEU, respectively. On low-resource languages (with less than 10 hours of training data), our approach significantly improves the quality of speech-to-text translation, with +9.0 BLEU on Portuguese-English and +5.2 BLEU on Dutch-English.
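The abstract does not say how the adaptor module is built. One plausible design, assumed here purely for illustration, is a small stack of strided 1-D convolutions that shortens the audio encoder's frame sequence before the text decoder attends to it. The sketch below only works out the resulting sequence length; the layer count, kernel size, stride, padding, and both function names are hypothetical and not taken from the abstract.

```python
# Hypothetical sketch: sequence-length arithmetic for an adaptor made of
# strided 1-D convolutions. All hyperparameters below are assumptions,
# not values stated in the abstract.


def conv1d_output_length(length: int, kernel_size: int = 3,
                         stride: int = 2, padding: int = 1) -> int:
    """Standard output-length formula for a 1-D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1


def adaptor_output_length(num_frames: int, num_layers: int = 3) -> int:
    """Length of the sequence after passing through the assumed conv stack."""
    for _ in range(num_layers):
        num_frames = conv1d_output_length(num_frames)
    return num_frames


if __name__ == "__main__":
    # Example: 500 audio-encoder frames entering the assumed adaptor.
    print(adaptor_output_length(500))
```

With these assumed defaults, 500 encoder frames shrink to 63 positions, roughly an 8x reduction, which brings the audio sequence closer to the lengths a text decoder is pretrained on.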

05/24/2022

T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation

We present a new approach to perform zero-shot cross-modal transfer betw...
07/14/2021

FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task

In this paper, we describe our end-to-end multilingual speech translatio...
07/12/2021

Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task

Pretraining and multitask learning are widely used to improve the speech...
06/21/2021

Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling

Multi-head attention has each of the attention heads collect salient inf...
09/16/2020

NABU - Multilingual Graph-based Neural RDF Verbalizer

The RDF-to-text task has recently gained substantial attention due to co...
09/06/2022

Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation

We propose a two-stage training approach for developing a single NMT mod...
01/11/2022

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation

We introduce CVSS, a massively multilingual-to-English speech-to-speech ...