FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task

07/14/2021
by Yun Tang, et al.

In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign for the Multilingual Speech Translation shared task. Our system is built by leveraging transfer learning across modalities, tasks, and languages. First, we leverage general-purpose multilingual modules pretrained on large amounts of unlabelled and labelled data. We then enable knowledge transfer from the text task to the speech task by training the two tasks jointly. Finally, our multilingual model is fine-tuned on speech translation task-specific data to achieve the best translation results. Experimental results show that our system outperforms previously reported systems, including both end-to-end and cascade-based approaches, by a large margin. In some translation directions, our speech translation results on the public Multilingual TEDx test set are even comparable with those of a strong text-to-text translation system that uses the oracle speech transcripts as input.


