Direct simultaneous speech to speech translation

10/15/2021
by   Xutai Ma, et al.
0

We present the first direct simultaneous speech-to-speech translation (Simul-S2ST) model, with the ability to start generating translation in the target speech before consuming the full source speech content and independently from intermediate text representations. Our approach leverages recent progress on direct speech-to-speech translation with discrete units. Instead of continuous spectrogram features, a sequence of direct representations, which are learned in a unsupervised manner, are predicted from the model and passed directly to a vocoder for speech synthesis. The simultaneous policy then operates on source speech features and target discrete units. Finally, a vocoder synthesize the target speech from discrete units on-the-fly. We carry out numerical studies to compare cascaded and direct approach on Fisher Spanish-English dataset.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/12/2021

Direct speech-to-speech translation with discrete units

We present a direct speech-to-speech translation (S2ST) model that trans...
research
12/12/2022

Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Speech-to-speech translation directly translates a speech utterance to a...
research
09/14/2023

Direct Text to Speech Translation System using Acoustic Units

This paper proposes a direct text to speech translation system using dis...
research
09/27/2022

Direct Speech Translation for Automatic Subtitling

Automatic subtitling is the task of automatically translating the speech...
research
06/14/2020

UWSpeech: Speech to Speech Translation for Unwritten Languages

Existing speech to speech translation systems heavily rely on the text o...
research
06/11/2021

Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Simultaneous speech-to-text translation is widely useful in many scenari...
research
10/02/2019

Speech-to-speech Translation between Untranscribed Unknown Languages

In this paper, we explore a method for training speech-to-speech transla...

Please sign up or login with your details

Forgot password? Click here to reset