Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

12/12/2022
by   Junhui Zhang, et al.
0

Speech-to-speech translation directly translates a speech utterance to another between different languages, and has great potential in tasks such as simultaneous interpretation. State-of-art models usually contains an auxiliary module for phoneme sequences prediction, and this requires textual annotation of the training dataset. We propose a direct speech-to-speech translation model which can be trained without any textual annotation or content information. Instead of introducing an auxiliary phoneme prediction task in the model, we propose to use bottleneck features as intermediate training objectives for our model to ensure the translation performance of the system. Experiments on Mandarin-Cantonese speech translation demonstrate the feasibility of the proposed approach and the performance can match a cascaded system with respect of translation and synthesis qualities.

READ FULL TEXT
research
10/15/2021

Direct simultaneous speech to speech translation

We present the first direct simultaneous speech-to-speech translation (S...
research
10/31/2022

Textless Direct Speech-to-Speech Translation with Discrete Speech Representation

Research on speech-to-speech translation (S2ST) has progressed rapidly i...
research
04/15/2019

Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation

Speech translation has traditionally been approached through cascaded mo...
research
10/28/2022

Efficient Speech Translation with Dynamic Latent Perceivers

Transformers have been the dominant architecture for Speech Translation ...
research
05/16/2023

Towards Speech Dialogue Translation Mediating Speakers of Different Languages

We present a new task, speech dialogue translation mediating speakers of...
research
10/24/2022

Does Joint Training Really Help Cascaded Speech Translation?

Currently, in speech translation, the straightforward approach - cascadi...
research
08/03/2022

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

In human speech, the attitude of a speaker cannot be fully expressed onl...

Please sign up or login with your details

Forgot password? Click here to reset