Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency

10/25/2019
by Matt Whitehill, et al.

Current multi-reference style transfer models for Text-to-Speech (TTS) perform sub-optimally on disjoint datasets, where one dataset contains only a single style class for one of the style dimensions. These models generally fail to produce style transfer for the dimension that is underrepresented in the dataset. In this paper, we propose an adversarial cycle consistency training scheme with paired and unpaired triplets to ensure the use of information from all style dimensions. During training, we incorporate unpaired triplets with randomly selected reference audio samples and encourage the synthesized speech to preserve the appropriate styles using adversarial cycle consistency. We use this method to transfer emotion from a dataset containing four emotions to a dataset with only a single emotion. This results in a 78% transfer rate (based on emotion classification) with minimal reduction in fidelity and naturalness. In subjective evaluations, our method was consistently rated as closer to the reference style than the baseline. Synthesized speech samples are available at: https://sites.google.com/view/adv-cycle-consistent-tts
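The core idea behind the cycle-consistency term can be illustrated with a toy sketch: for an unpaired triplet, synthesize speech using the style of a randomly chosen reference, then re-encode the style of the synthesized output and require it to match the reference's style. The sketch below is a minimal numpy illustration under stated assumptions; the linear "encoder" and "synthesizer" and all function names are hypothetical stand-ins, and the paper enforces the style match adversarially via a classifier rather than with the L2 penalty used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy models: a fixed linear style encoder and a mixing synthesizer.
# These stand in for the neural style encoder / TTS decoder in the paper.
W_style = rng.normal(size=(8, 8))

def style_encoder(audio):
    """Map an 'audio' vector to a style embedding (toy linear encoder)."""
    return np.tanh(W_style @ audio)

def synthesize(text_emb, style_emb):
    """Toy synthesizer: blend content and style into an output 'audio' vector."""
    return 0.5 * text_emb + 0.5 * style_emb

def cycle_consistency_loss(text_emb, ref_audio):
    """Unpaired-triplet cycle: synthesize with a random reference's style,
    re-encode the output, and penalize mismatch with the reference style.
    (The paper applies this constraint adversarially; L2 is used here for
    illustration only.)"""
    target_style = style_encoder(ref_audio)
    out = synthesize(text_emb, target_style)
    recovered_style = style_encoder(out)
    return float(np.mean((recovered_style - target_style) ** 2))

# Usage: one unpaired triplet with a randomly selected reference.
text = rng.normal(size=8)
ref = rng.normal(size=8)
loss = cycle_consistency_loss(text, ref)
```

Because the reference is drawn at random rather than paired with the text, no ground-truth audio exists for the combination; the cycle term is what supplies a training signal for the underrepresented style dimension.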


