Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade

09/14/2019
by Juan Pino, et al.

For automatic speech translation (AST), end-to-end approaches are outperformed by cascaded models that first transcribe with automatic speech recognition (ASR) and then translate with machine translation (MT). A major cause of the performance gap is that existing AST corpora are small, whereas massive datasets exist for both the ASR and MT subsystems. In this work, we evaluate several data augmentation and pretraining approaches for AST, comparing them all on the same datasets. Simple data augmentation by translating ASR transcripts proves most effective on the English–French augmented LibriSpeech dataset, narrowing the performance gap from 8.2 to 1.4 BLEU relative to a very strong cascade that could directly utilize copious ASR and MT data. The same end-to-end approach plus fine-tuning narrows the gap on the English–Romanian MuST-C dataset from 6.7 to 3.7 BLEU. In addition to these results, we present practical recommendations for augmentation and pretraining approaches. Finally, we decrease the performance gap to 0.01 BLEU using a Transformer-based architecture.
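The augmentation idea in the abstract, translating ASR transcripts with an MT system to obtain extra speech-to-translation pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: the `ASRExample` and `ASTExample` types, the `augment_with_mt` helper, the file paths, and the toy word-for-word `toy_translate` function are all hypothetical stand-ins, and a real setup would call a trained MT model instead.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ASRExample:
    """One ASR training pair: audio plus its source-language transcript."""
    audio_path: str
    transcript: str


@dataclass(frozen=True)
class ASTExample:
    """One AST training pair: audio plus a target-language translation."""
    audio_path: str
    translation: str
    synthetic: bool  # True when the translation was produced by MT


def augment_with_mt(
    asr_corpus: Iterable[ASRExample],
    translate: Callable[[str], str],
) -> List[ASTExample]:
    """Turn ASR pairs into synthetic AST pairs by translating each transcript."""
    return [
        ASTExample(ex.audio_path, translate(ex.transcript), synthetic=True)
        for ex in asr_corpus
    ]


def toy_translate(text: str) -> str:
    """Hypothetical placeholder for an MT system (word-for-word lookup)."""
    lexicon = {"hello": "bonjour", "world": "monde"}
    return " ".join(lexicon.get(tok, tok) for tok in text.lower().split())


# Small real AST corpus plus a (normally much larger) ASR-only corpus.
real_ast = [ASTExample("ast/0001.wav", "merci", synthetic=False)]
asr_only = [ASRExample("asr/0001.wav", "Hello world")]

# The end-to-end model would then be trained on the combined set.
train_set = real_ast + augment_with_mt(asr_only, toy_translate)
```

Keeping a `synthetic` flag on each pair is one way to allow a later fine-tuning pass restricted to the real AST data, in the spirit of the "plus fine-tuning" variant the abstract mentions; the exact recipe used in the paper is not specified here.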

research
09/14/2019

Leveraging Out-of-Task Data for End-to-End Automatic Speech Translation

For automatic speech translation (AST), end-to-end approaches are outper...
research
10/21/2020

Cascaded Models With Cyclic Feedback For Direct Speech Translation

Direct speech translation describes a scenario where only speech inputs ...
research
03/22/2023

Selective Data Augmentation for Robust Speech Translation

Speech translation (ST) systems translate speech in one language to text...
research
10/19/2022

G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR

Data augmentation is a ubiquitous technique used to provide robustness t...
research
10/27/2022

Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation

Data augmentation is a technique to generate new training data based on ...
research
10/16/2020

Adaptive Feature Selection for End-to-End Speech Translation

Information in speech signals is not evenly distributed, making it an ad...
research
04/25/2020

Jointly Trained Transformers models for Spoken Language Translation

Conventional spoken language translation (SLT) systems are pipeline base...
