Self-Training for End-to-End Speech Translation

06/03/2020
by   Juan Pino, et al.
0

One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech translation model. This provides 8.3 and 5.7 BLEU gains over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the art performance. The effect of the quality of the pseudo-labels is investigated. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training by directly generating pseudo-labels with an end-to-end model instead of a cascade model.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/12/2022

The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task

This paper describes the submission of our end-to-end YiTrans speech tra...
research
02/21/2023

Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation

For end-to-end speech translation, regularizing the encoder with the Con...
research
09/19/2019

Self-Training for End-to-End Speech Recognition

We revisit self-training in the context of end-to-end speech recognition...
research
03/29/2022

Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment

Current leading mispronunciation detection and diagnosis (MDD) systems a...
research
05/02/2021

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks

End-to-end approaches for sequence tasks are becoming increasingly popul...
research
03/04/2021

An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies

This paper proposes a decoding strategy for end-to-end simultaneous spee...
research
12/20/2022

Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Self-training has been shown to be helpful in addressing data scarcity f...

Please sign up or login with your details

Forgot password? Click here to reset