Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing

05/09/2023
by   Jingbei Li, et al.
0

Automatic dubbing, which generates a corresponding version of the input speech in another language, could be widely utilized in many real-world scenarios such as video and game localization. In addition to synthesizing the translated scripts, automatic dubbing needs to further transfer the speaking style in the original language to the dubbed speeches to give audiences the impression that the characters are speaking in their native tongue. However, state-of-the-art automatic dubbing systems only model the transfer on duration and speaking rate, neglecting the other aspects in speaking style such as emotion, intonation and emphasis which are also crucial to fully perform the characters and speech understanding. In this paper, we propose a joint multi-scale cross-lingual speaking style transfer framework to simultaneously model the bidirectional speaking style transfer between languages at both global (i.e. utterance level) and local (i.e. word level) scales. The global and local speaking styles in each language are extracted and utilized to predicted the global and local speaking styles in the other language with an encoder-decoder framework for each direction and a shared bidirectional attention mechanism for both directions. A multi-scale speaking style enhanced FastSpeech 2 is then utilized to synthesize the predicted the global and local speaking styles to speech for each language. Experiment results demonstrate the effectiveness of our proposed framework, which outperforms a baseline with only duration transfer in both objective and subjective evaluations.

READ FULL TEXT

page 1

page 3

page 5

page 10

research
11/02/2022

Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement

Disentanglement of a speaker's timbre and style is very important for st...
research
05/28/2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation

Direct speech-to-speech translation (S2ST) has gradually become popular ...
research
06/16/2021

Global Rhythm Style Transfer Without Text Transcriptions

Prosody plays an important role in characterizing the style of a speaker...
research
10/16/2020

Anisotropic Stroke Control for Multiple Artists Style Transfer

Though significant progress has been made in artistic style transfer, se...
research
06/27/2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

Cross-lingual timbre and style generalizable text-to-speech (TTS) aims t...
research
12/08/2022

All-to-key Attention for Arbitrary Style Transfer

Attention-based arbitrary style transfer studies have shown promising pe...

Please sign up or login with your details

Forgot password? Click here to reset