Large-Scale Self- and Semi-Supervised Learning for Speech Translation

04/14/2021
by Changhan Wang, et al.

In this paper, we improve speech translation (ST) by effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways. We explore both pretraining and self-training using the large Libri-Light speech corpus, together with language modeling on CommonCrawl text. Our experiments improve over the previous state of the art by 2.6 BLEU on average across all four considered CoVoST 2 language pairs, via a simple recipe that combines wav2vec 2.0 pretraining, a single iteration of self-training, and decoding with a language model. Unlike existing work, our approach does not rely on any supervision other than ST data. Code and models will be publicly released.
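The recipe has two decoupled ingredients beyond pretraining: one round of pseudo-labeling unlabeled speech with the current ST model (self-training), and combining ST and language-model scores at decode time (commonly done via shallow fusion). The sketch below illustrates both steps; it is a minimal illustration, not the authors' released code. The names (`pseudo_label`, `fused_score`, `lm_weight`, `min_confidence`) and the weight values are assumptions for exposition, not settings reported in the paper.

```python
# Illustrative sketch (not the authors' code) of one self-training round and
# language-model shallow fusion, assuming an ST model initialized from
# wav2vec 2.0 pretraining is already available.

from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# A translation hypothesis: token sequence plus its average ST log-probability.
Hypothesis = Tuple[List[str], float]


@dataclass
class SelfTrainingConfig:
    lm_weight: float = 0.3        # LM interpolation weight (placeholder value)
    min_confidence: float = -1.0  # drop pseudo-labels with lower avg log-prob


def pseudo_label(
    st_decode: Callable[[object], Hypothesis],
    unlabeled_audio: Iterable[object],
    cfg: SelfTrainingConfig,
) -> List[Tuple[object, List[str]]]:
    """One self-training iteration: translate unlabeled speech with the
    current ST model and keep only reasonably confident hypotheses."""
    pseudo_pairs = []
    for audio in unlabeled_audio:
        tokens, avg_logprob = st_decode(audio)
        if avg_logprob >= cfg.min_confidence:
            pseudo_pairs.append((audio, tokens))
    return pseudo_pairs


def fused_score(st_logprob: float, lm_logprob: float, cfg: SelfTrainingConfig) -> float:
    """Shallow fusion: add the (weighted) external LM score to the ST score
    when ranking hypotheses during beam search."""
    return st_logprob + cfg.lm_weight * lm_logprob
```

In practice, the pseudo-labeled pairs would be mixed with the original CoVoST 2 training data to fine-tune the pretrained model once, and the fused score would replace the plain ST score inside beam search, matching the single self-training iteration and LM decoding described in the abstract.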

