Simple and Effective Unsupervised Speech Translation

10/18/2022
by Changhan Wang, et al.

The amount of labeled data available to train models for speech tasks is limited for most languages. This scarcity is exacerbated for speech translation, which requires labeled data covering two different languages. To address this issue, we study a simple and effective approach to building speech translation systems without labeled data by leveraging recent advances in unsupervised speech recognition, machine translation and speech synthesis, either in a pipeline approach or to generate pseudo-labels for training end-to-end speech translation models. Furthermore, we present an unsupervised domain adaptation technique for pre-trained speech models which improves the performance of downstream unsupervised speech recognition, especially in low-resource settings. Experiments show that unsupervised speech-to-text translation outperforms the previous unsupervised state of the art by 3.2 BLEU on the Libri-Trans benchmark. On CoVoST 2, our best systems outperform the best supervised end-to-end models (without pre-training) from only two years ago by an average of 5.0 BLEU over five X-En directions. We also report competitive results on the MuST-C and CVSS benchmarks.
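
The two uses of the cascade described in the abstract (direct pipeline translation, and pseudo-label generation for training an end-to-end model) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the `Audio` type, and the toy `toy_asr`/`toy_mt` stand-ins are hypothetical and are not the authors' implementation or any library's API.

```python
from typing import Callable, Iterable, List, Tuple

Audio = List[float]  # hypothetical placeholder for a waveform


def cascade_translate(
    audio: Audio,
    asr: Callable[[Audio], str],
    mt: Callable[[str], str],
) -> str:
    """Pipeline approach: transcribe source speech, then translate the transcript."""
    return mt(asr(audio))


def make_pseudo_labels(
    unlabeled_audio: Iterable[Audio],
    asr: Callable[[Audio], str],
    mt: Callable[[str], str],
) -> List[Tuple[Audio, str]]:
    """Pair each unlabeled utterance with the cascade output as its pseudo-label.

    The resulting (audio, translation) pairs could then serve as training
    data for an end-to-end speech translation model.
    """
    return [(a, cascade_translate(a, asr, mt)) for a in unlabeled_audio]


# Toy stand-ins for the unsupervised ASR and MT systems (assumptions only).
def toy_asr(audio: Audio) -> str:
    return "bonjour" if len(audio) > 2 else "merci"


def toy_mt(text: str) -> str:
    return {"bonjour": "hello", "merci": "thank you"}[text]


pairs = make_pseudo_labels([[0.1, 0.2, 0.3], [0.5]], toy_asr, toy_mt)
```

In a real system the two callables would wrap trained unsupervised ASR and MT models; the sketch only shows how their outputs are chained and reused as training targets.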


research
06/30/2021

IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task

This paper describes the submission to the IWSLT 2021 Low-Resource Speec...
research
04/06/2022

Simple and Effective Unsupervised Speech Synthesis

We introduce the first unsupervised speech synthesis system based on a s...
research
05/12/2023

Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation

Most of the speech translation models heavily rely on parallel data, whi...
research
01/28/2020

Unsupervised Pre-training of Bidirectional Speech Encoders via Masked Reconstruction

We propose an approach for pre-training speech representations via a mas...
research
11/20/2019

A Comparative Study on End-to-end Speech to Text Translation

Recent advances in deep learning show that end-to-end speech to text tra...
research
07/27/2021

Unsupervised Domain Adaptation for Hate Speech Detection Using a Data Augmentation Approach

Online harassment in the form of hate speech has been on the rise in rec...
research
06/09/2022

Revisiting End-to-End Speech-to-Text Translation From Scratch

End-to-end (E2E) speech-to-text translation (ST) often depends on pretra...
