StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation

05/28/2023
by Kun Song, et al.

Direct speech-to-speech translation (S2ST) has gradually gained popularity because it offers several advantages over cascaded S2ST. However, current research focuses mainly on the accuracy of semantic translation and ignores the transfer of speaking style from the source language to the target language. The lack of high-fidelity, expressive parallel data makes such style transfer challenging, especially in the more practical zero-shot scenario. To address this problem, we first build a parallel corpus using a multilingual, multi-speaker text-to-speech (TTS) system, and then propose StyleS2ST, a model that adds a style adaptor to a direct S2ST framework to give it cross-lingual speech style transfer ability. Through training on the parallel corpus and augmentation with non-parallel TTS data, the acoustic model learns a continuous style space, which allows StyleS2ST to capture the cross-lingual mapping of acoustic features from the source language to the target language. Experiments show that StyleS2ST achieves good style similarity and naturalness in both in-set and out-of-set zero-shot scenarios.
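To make the role of a style adaptor more concrete, here is a minimal Python sketch of one common way such a module can work. It assumes the adaptor summarizes the source utterance's acoustic frames into a single utterance-level style vector (mean pooling followed by a linear projection) and adds that vector to every frame of the target-side hidden states. The function names `mean_pool`, `linear`, and `style_adaptor`, the pooling choice, and the additive conditioning are hypothetical illustrations, not StyleS2ST's published design; in a real model the projection weights would be learned rather than passed in by hand.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # each row maps the input vector to one output dimension


def mean_pool(frames: List[Vector]) -> Vector:
    """Average frame-level features over time into one utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine projection y = Wx + b (weights would be learned in practice)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def style_adaptor(
    source_frames: List[Vector],
    hidden_states: List[Vector],
    weight: Matrix,
    bias: Vector,
) -> List[Vector]:
    """Hypothetical adaptor: pool source-speech features into a style vector
    and add it to every frame of the target-side hidden states."""
    style = linear(weight, bias, mean_pool(source_frames))
    if any(len(frame) != len(style) for frame in hidden_states):
        raise ValueError("style vector and hidden states must share a dimension")
    return [[h + s for h, s in zip(frame, style)] for frame in hidden_states]


# Toy example: 2 source frames, 3 target frames, identity projection.
src = [[1.0, 3.0], [3.0, 5.0]]
hid = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
out = style_adaptor(src, hid, weight=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(out)  # [[2.0, 4.0], [3.0, 5.0], [4.0, 6.0]]
```

Pooling over time yields a fixed-length style vector, so the same style can be applied to target hidden states of any length. This matters in S2ST because the translated utterance usually differs in length from the source utterance.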


