Direct Text to Speech Translation System using Acoustic Units

09/14/2023
by   Victoria Mingote, et al.

This paper proposes a direct text-to-speech translation system based on discrete acoustic units. The framework takes text in different source languages as input and generates speech in the target language without requiring text transcriptions in that language. Motivated by the success of acoustic units in previous work on direct speech-to-speech translation, we use the same pipeline to extract the units: a speech encoder combined with a clustering algorithm. Once the units are obtained, an encoder-decoder architecture is trained to predict them, and a vocoder then generates speech from the predicted units. We tested our approach on the new CVSS corpus, using two different text mBART models as initialisation. The systems presented achieve competitive performance for most of the language pairs evaluated. In addition, results show a remarkable improvement when the proposed architecture is initialised with a model pre-trained on more languages.
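
The unit-extraction step described above (speech encoder followed by clustering) can be illustrated with a minimal sketch. The abstract does not name the encoder or the clustering algorithm, so everything here is an assumption made for illustration: frame features are stand-in 2-D vectors rather than real encoder outputs, units are assigned by nearest centroid as a k-means-style clustering would do, and the collapsing of consecutive repeated units is a common convention in unit-based pipelines, not something the abstract states. The function names `frames_to_units` and `collapse_repeats` are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def frames_to_units(
    frames: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
) -> List[int]:
    """Map each frame feature vector to the index of its nearest centroid.

    The centroid index plays the role of the discrete acoustic unit.
    (Assumption: nearest-centroid assignment, as in k-means.)
    """
    units = []
    for frame in frames:
        distances = [squared_distance(frame, c) for c in centroids]
        units.append(distances.index(min(distances)))
    return units


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units into a single unit.

    (Assumption: the abstract does not say whether repeats are collapsed.)
    """
    collapsed: List[int] = []
    for unit in units:
        if not collapsed or collapsed[-1] != unit:
            collapsed.append(unit)
    return collapsed


# Toy data: three hypothetical centroids and six hypothetical frame features.
centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frames = [
    [0.1, -0.1],
    [0.2, 0.0],
    [0.9, 1.2],
    [4.8, 5.1],
    [5.2, 4.9],
    [1.1, 0.8],
]

units = frames_to_units(frames, centroids)
reduced = collapse_repeats(units)
print(units)    # one unit per frame
print(reduced)  # consecutive duplicates merged
```

In a setup like the one the abstract describes, a sequence such as `reduced` would serve as the prediction target for the encoder-decoder model, and the vocoder would convert predicted unit sequences back into a waveform.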

research
09/21/2020

TED: Triple Supervision Decouples End-to-end Speech-to-text Translation

An end-to-end speech-to-text translation (ST) takes audio in a source la...
research
10/21/2022

A Textless Metric for Speech-to-Speech Comparison

This paper proposes a textless speech-to-speech comparison metric that a...
research
12/15/2022

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Direct speech-to-speech translation (S2ST), in which all components can ...
research
09/27/2022

Direct Speech Translation for Automatic Subtitling

Automatic subtitling is the task of automatically translating the speech...
research
10/15/2021

Direct simultaneous speech to speech translation

We present the first direct simultaneous speech-to-speech translation (S...
research
05/27/2019

Specific polysemy of the brief sapiential units

In this paper we explain how we deal with the problems related to the co...
research
04/10/2023

Enhancing Speech-to-Speech Translation with Multiple TTS Targets

It has been known that direct speech-to-speech translation (S2ST) models...
