Cross-speaker Emotion Transfer by Manipulating Speech Style Latents

03/15/2023
by Suhee Jo, et al.

In recent years, emotional text-to-speech has shown considerable progress. However, it requires a large amount of labeled data, which is not easily accessible. Even when an emotional speech dataset can be acquired, controlling emotion intensity remains limited. In this work, we propose a novel method for cross-speaker emotion transfer and manipulation using vector arithmetic in latent style space. By leveraging only a few labeled samples, we generate emotional speech from reading-style speech without losing the speaker identity. Furthermore, emotion strength is readily controllable using a scalar value, providing an intuitive way for users to manipulate speech. Experimental results show that the proposed method achieves superior expressiveness, naturalness, and controllability while preserving speaker identity.
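The idea of vector arithmetic in a latent style space with a scalar intensity can be illustrated with a minimal sketch. The abstract does not say how the emotion direction is obtained, so the difference-of-means direction below is an assumption made for illustration, and the function names (`mean_vector`, `emotion_direction`, `shift_style`) and the toy 3-dimensional style vectors are hypothetical rather than taken from the paper.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def emotion_direction(emotional, neutral):
    """Unit vector pointing from the neutral mean to the emotional mean.

    Assumption: a difference-of-means direction; the source does not
    specify how the direction is actually estimated.
    """
    mu_e = mean_vector(emotional)
    mu_n = mean_vector(neutral)
    diff = [e - n for e, n in zip(mu_e, mu_n)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def shift_style(style, direction, intensity):
    """Move a style latent along the emotion direction by a scalar amount."""
    return [s + intensity * d for s, d in zip(style, direction)]


# Toy style latents from a few labeled samples (hypothetical values).
neutral = [[0.0, 0.0, 1.0], [0.2, 0.0, 1.0]]
happy = [[0.1, 2.0, 1.0], [0.1, 4.0, 1.0]]

direction = emotion_direction(happy, neutral)

# Style latent of a different, reading-style speaker (hypothetical).
target = [0.5, -1.0, 0.3]
mild = shift_style(target, direction, 0.5)    # weaker emotion
strong = shift_style(target, direction, 2.0)  # stronger emotion
```

In this sketch, only the component of the target latent along the emotion direction changes, which is one simple way to picture how a scalar could control intensity while the rest of the style vector, and with it the speaker-related information, is left untouched.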


research
07/13/2022

Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS

Expressive text-to-speech has shown improved performance in recent years...
research
08/16/2023

AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis

Affect is an emotional characteristic encompassing valence, arousal, and...
research
06/29/2022

iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre

The capability of generating speech with specific type of emotion is des...
research
10/08/2021

Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

In expressive speech synthesis, there are high requirements for emotion ...
research
06/21/2021

UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control

We propose a novel high-fidelity expressive speech synthesis model, UniT...
research
11/17/2020

Controllable Emotion Transfer For End-to-End Speech Synthesis

Emotion embedding space learned from references is a straightforward app...
research
11/17/2022

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

Although current neural text-to-speech (TTS) models are able to generate...
