Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer

09/14/2023
by Yongqi Wang, et al.

Direct speech-to-speech translation (S2ST) with discrete self-supervised representations has achieved remarkable accuracy, but it cannot preserve the speaker timbre of the source speech during translation. Meanwhile, the scarcity of high-quality speaker-parallel data poses a challenge for learning style transfer between source and target speech. We propose an S2ST framework with an acoustic language model based on discrete units from a self-supervised model and a neural codec for style transfer. The acoustic language model leverages self-supervised in-context learning, acquiring the ability for style transfer without relying on any speaker-parallel data and thereby overcoming the issue of data scarcity. Trained on extensive data, our model achieves zero-shot cross-lingual style transfer on previously unseen source languages. Experiments show that our model generates translated speech with high fidelity and style similarity. Audio samples are available at http://stylelm.github.io/ .
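The pipeline the abstract describes can be sketched as four stages: discretize the source speech into semantic units with a self-supervised encoder, translate those units to the target language, then have an acoustic language model emit neural-codec tokens conditioned on codec tokens of the source speech as an in-context style prompt, and finally decode the tokens to a waveform. The toy code below illustrates only this data flow; every function body is a stand-in placeholder, not the authors' implementation.

```python
# Illustrative sketch of the S2ST-with-style-transfer pipeline.
# All components are toy placeholders for the real models
# (self-supervised encoder, unit translator, acoustic LM, neural codec).

def speech_to_units(source_wave):
    """Stand-in for a self-supervised encoder that discretizes
    source speech into semantic units (cluster indices)."""
    return [int(x * 4) % 4 for x in source_wave]  # toy 4-cluster bucketing

def translate_units(src_units):
    """Stand-in for the speech-to-unit translation model mapping
    source-language units to target-language units."""
    return [(u + 1) % 4 for u in src_units]  # toy deterministic mapping

def acoustic_lm(tgt_units, style_prompt_codes):
    """Stand-in for the acoustic language model: given target semantic
    units plus codec tokens of the *source* speech as an in-context
    style prompt, emit codec tokens carrying the source timbre."""
    style = sum(style_prompt_codes) % 4          # toy "style" summary
    return [u + style * 10 for u in tgt_units]   # tokens reflect the prompt

def codec_decode(codec_tokens):
    """Stand-in for the neural codec decoder rendering a waveform."""
    return [t / 40.0 for t in codec_tokens]

# End-to-end: translated speech rendered in the source speaker's style.
source_wave = [0.1, 0.4, 0.7, 0.9]
style_prompt = [3, 1, 2]                 # codec tokens of the source speech
units = speech_to_units(source_wave)
tgt_units = translate_units(units)
codec_tokens = acoustic_lm(tgt_units, style_prompt)
wave_out = codec_decode(codec_tokens)
```

The key design point the abstract highlights is the in-context style prompt: because the acoustic LM only ever continues a prompt drawn from the same utterance's codec stream during training, it needs no speaker-parallel data, and at inference the prompt can come from an unseen source-language speaker.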


research · 05/28/2023
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
Direct speech-to-speech translation (S2ST) has gradually become popular ...

research · 07/30/2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Despite rapid progress in the voice style transfer (VST) field, recent z...

research · 10/06/2021
Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer
Expert-layman text style transfer technologies have the potential to imp...

research · 05/25/2022
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
Direct speech-to-speech translation (S2ST) systems leverage recent progr...

research · 05/18/2022
Exploiting Social Media Content for Self-Supervised Style Transfer
Recent research on style transfer takes inspiration from unsupervised ne...

research · 12/13/2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
Cross-speaker style transfer in speech synthesis aims at transferring a ...

research · 10/10/2020
Semi-supervised Formality Style Transfer using Language Model Discriminator and Mutual Information Maximization
Formality style transfer is the task of converting informal sentences to...
