Direct speech-to-speech translation with discrete units

07/12/2021
by Ann Lee, et al.

We present a direct speech-to-speech translation (S2ST) model that translates speech in one language into speech in another language without relying on intermediate text generation. Previous work addresses the problem by training an attention-based sequence-to-sequence model that maps source speech spectrograms to target spectrograms. To tackle the challenge of modeling the continuous spectrogram features of the target speech, we instead propose to predict self-supervised discrete representations learned from an unlabeled speech corpus. When target text transcripts are available, we design a multitask learning framework with joint speech and text training that enables the model to generate dual-mode output (speech and text) simultaneously in the same inference pass. Experiments on the Fisher Spanish-English dataset show that predicting discrete units and joint speech and text training improve model performance by 11 BLEU compared with a baseline that predicts spectrograms, and bridge 83% of the performance gap towards a cascaded system. When trained without any text transcripts, our model achieves performance similar to that of a baseline that predicts spectrograms and is trained with text data.
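
As a rough illustration of what "discrete units" can look like in practice, the toy sketch below maps each continuous feature frame to the index of its nearest centroid and then merges consecutive repeats. The abstract does not say how the units are obtained, so everything here is an assumption made for illustration: the nearest-centroid quantizer, the repeat-merging step, the hand-picked centroids, and the helper names `nearest_centroid`, `to_units`, and `collapse_repeats` are hypothetical and not the authors' implementation.

```python
from itertools import groupby
from typing import List, Sequence


def nearest_centroid(frame: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `frame` (squared Euclidean)."""
    def sq_dist(centroid: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(frame, centroid))

    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k]))


def to_units(frames: Sequence[Sequence[float]],
             centroids: Sequence[Sequence[float]]) -> List[int]:
    """Quantize every frame to a discrete unit ID (assumed: one unit per frame)."""
    return [nearest_centroid(frame, centroids) for frame in frames]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units into a single unit."""
    return [unit for unit, _ in groupby(units)]


# Toy data: three hand-picked 2-D centroids and five feature frames.
# Real systems would use learned centroids over high-dimensional features.
centroids = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
frames = [[0.1, -0.1], [0.2, 0.0], [0.9, 1.2], [3.8, 0.1], [4.1, -0.2]]

units = to_units(frames, centroids)    # [0, 0, 1, 2, 2]
reduced = collapse_repeats(units)      # [0, 1, 2]
print(units, reduced)
```

Under these assumptions, the translation model's target becomes a sequence of integer IDs, which can be trained with a standard classification loss rather than a regression loss over spectrogram frames.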
