Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

08/04/2021
by   Julian Zaidi, et al.

This paper presents Daft-Exprt, a multi-speaker acoustic model that advances the state of the art in inter-speaker and inter-text prosody transfer. The improvement comes from FiLM conditioning layers combined with adversarial training that encourages disentanglement between prosodic information and speaker identity. The acoustic model inherits attractive qualities from FastSpeech 2, such as fast inference and prediction of local prosody attributes for finer-grained control over generation. Experimental results show that Daft-Exprt significantly outperforms strong baselines on prosody transfer tasks, while yielding naturalness comparable to state-of-the-art expressive models. Moreover, results indicate that adversarial training effectively discards speaker identity information from the prosody representation, which ensures Daft-Exprt will consistently generate speech with the desired voice. We publicly release our code and provide speech samples from our experiments.
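
As a rough illustration of the FiLM conditioning mentioned above, the sketch below applies a feature-wise affine modulation (a per-channel scale gamma and shift beta, both predicted from a conditioning vector) to a sequence of feature frames. It is a minimal pure-Python sketch under assumptions, not the released Daft-Exprt code: the function names `linear` and `film`, the toy dimensions, and the use of a single linear map per parameter are all hypothetical choices made for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(vec: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine map; `weights` has one row per output channel."""
    return [
        sum(w * v for w, v in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def film(
    features: Matrix,   # one row per frame, one column per channel
    prosody: Vector,    # conditioning vector (hypothetical prosody embedding)
    w_gamma: Matrix,
    b_gamma: Vector,
    w_beta: Matrix,
    b_beta: Vector,
) -> Matrix:
    """Feature-wise linear modulation: out[t][c] = gamma[c] * x[t][c] + beta[c]."""
    # Assumption: gamma and beta each come from one linear layer.
    gamma = linear(prosody, w_gamma, b_gamma)
    beta = linear(prosody, w_beta, b_beta)
    # The same per-channel scale and shift is applied to every frame.
    return [
        [g * x + b for x, g, b in zip(frame, gamma, beta)]
        for frame in features
    ]


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0]]   # two frames, two channels
    prosody_vec = [1.0]                 # toy one-dimensional embedding
    out = film(
        frames,
        prosody_vec,
        w_gamma=[[2.0], [0.5]],
        b_gamma=[0.0, 0.0],
        w_beta=[[1.0], [-1.0]],
        b_beta=[0.0, 0.0],
    )
    print(out)
```

Because the conditioning vector only enters through gamma and beta, swapping in a different prosody embedding changes how the features are scaled and shifted without touching the features themselves.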

