Leveraging Out-of-Task Data for End-to-End Automatic Speech Translation

09/14/2019
by Juan Pino, et al.

For automatic speech translation (AST), end-to-end approaches are outperformed by cascaded models that transcribe with automatic speech recognition (ASR), then translate with machine translation (MT). A major cause of the performance gap is that, while existing AST corpora are small, massive datasets exist for both the ASR and MT subsystems. In this work, we evaluate several data augmentation and pretraining approaches for AST, comparing all on the same datasets. Simple data augmentation by translating ASR transcripts proves most effective on the English–French augmented LibriSpeech dataset, closing the performance gap from 8.2 to 1.4 BLEU, compared to a very strong cascade that could directly utilize copious ASR and MT data. The same end-to-end approach plus fine-tuning closes the gap on the English–Romanian MuST-C dataset from 6.7 to 3.7 BLEU. In addition to these results, we present practical recommendations for augmentation and pretraining approaches. Finally, we decrease the performance gap to 0.01 BLEU using a Transformer-based architecture.
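
The augmentation step described above, translating ASR transcripts to obtain extra AST training pairs, might be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the names `AsrExample`, `AstExample`, `augment_with_mt`, `build_training_set`, and `toy_translate` are hypothetical, and the word-lookup "translator" only stands in for a real MT model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class AsrExample:
    """One ASR training pair: source audio and its source-language transcript."""
    audio_path: str
    transcript: str


@dataclass(frozen=True)
class AstExample:
    """One AST training pair: source audio and a target-language translation."""
    audio_path: str
    translation: str
    synthetic: bool  # True when the translation came from MT, not a human


def augment_with_mt(
    asr_corpus: Iterable[AsrExample],
    translate: Callable[[str], str],
) -> List[AstExample]:
    """Turn ASR pairs into synthetic AST pairs by machine-translating transcripts."""
    return [
        AstExample(ex.audio_path, translate(ex.transcript), synthetic=True)
        for ex in asr_corpus
        if ex.transcript.strip()  # skip empty transcripts
    ]


def build_training_set(
    real_ast: Iterable[AstExample],
    asr_corpus: Iterable[AsrExample],
    translate: Callable[[str], str],
) -> List[AstExample]:
    """Concatenate the real AST data with the MT-generated pairs."""
    return list(real_ast) + augment_with_mt(asr_corpus, translate)


def toy_translate(text: str) -> str:
    """Assumption: a word-lookup stand-in for a trained MT system."""
    lexicon = {"hello": "bonjour", "world": "monde"}
    return " ".join(lexicon.get(word, word) for word in text.lower().split())


real = [AstExample("ast/0001.wav", "merci", synthetic=False)]
asr = [AsrExample("asr/0001.wav", "Hello world"), AsrExample("asr/0002.wav", "   ")]
train = build_training_set(real, asr, toy_translate)
```

Keeping a `synthetic` flag on each pair is one way to leave room for a later fine-tuning pass on the real AST data only, in the spirit of the fine-tuning step the abstract mentions for MuST-C.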


research
09/14/2019

Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade

For automatic speech translation (AST), end-to-end approaches are outper...
research
06/30/2021

IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task

This paper describes the submission to the IWSLT 2021 Low-Resource Speec...
research
03/22/2023

Selective Data Augmentation for Robust Speech Translation

Speech translation (ST) systems translate speech in one language to text...
research
12/11/2022

End-to-End Speech Translation of Arabic to English Broadcast News

Speech translation (ST) is the task of directly translating acoustic spe...
research
09/20/2021

MeetDot: Videoconferencing with Live Translation Captions

We present MeetDot, a videoconferencing system with live translation cap...
research
04/11/2022

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

Neural transducers have been widely used in automatic speech recognition...
research
07/10/2023

The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task

This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-sp...
