YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

12/04/2021
by Edresson Casanova, et al.

YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieve state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language for which only a single-speaker dataset is available, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, the YourTTS model can be fine-tuned with less than 1 minute of speech to achieve SOTA voice similarity with reasonable quality. This is important for synthesizing speech for speakers whose voice or recording characteristics differ greatly from those seen during training.


