DUB: Discrete Unit Back-translation for Speech Translation

05/19/2023
by   Dong Zhang, et al.

How can speech-to-text translation (ST) perform as well as machine translation (MT)? The key is to bridge the modality gap between speech and text so that useful MT techniques can be applied to ST. Recently, representing speech with unsupervised discrete units has offered a new way to ease the modality problem. This motivates us to propose Discrete Unit Back-translation (DUB) to answer two questions: (1) Is it better to represent speech with discrete units than with continuous features in direct ST? (2) How much benefit can useful MT techniques bring to ST? With DUB, back-translation can be successfully applied to direct ST, yielding an average gain of 5.5 BLEU on MuST-C En-De/Fr/Es. In the low-resource language scenario, our method achieves performance comparable to existing methods that rely on large-scale external data. Code and models are available at https://github.com/0nutation/DUB.

research
05/15/2023

Back Translation for Speech-to-text Translation Without Transcripts

The success of end-to-end speech-to-text translation (ST) is often achie...
research
10/15/2022

Generating Synthetic Speech from SpokenVocab for Speech Translation

Training end-to-end speech translation (ST) systems requires sufficientl...
research
05/07/2021

Learning Shared Semantic Space for Speech-to-Text Translation

Having numerous potential applications and great impact, end-to-end spee...
research
02/24/2019

The ARIEL-CMU Systems for LoReHLT18

This paper describes the ARIEL-CMU submissions to the Low Resource Human...
research
04/06/2022

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation

Direct speech-to-speech translation (S2ST) models suffer from data scarc...
research
05/23/2023

Improving speech translation by fusing speech and text

In speech translation, leveraging multimodal data to improve model perfo...
research
05/18/2022

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

Direct Speech-to-speech translation (S2ST) has drawn more and more atten...
