Cloning one's voice using very limited data in the wild

by Dongyang Dai, et al.

With the increasing popularity of speech synthesis products, the industry has raised new requirements for personalized speech synthesis: (1) how to clone a person's voice from low-resource, easily accessible data, and (2) how to clone a person's voice while controlling style and prosody. To address these two problems, we propose the Hieratron model framework, in which prosody and timbre are modeled separately by two modules. This separation allows timbre to be controlled independently of the other characteristics of the audio when generating speech. In practice, with very limited in-the-wild data from the target speaker, Hieratron shows clear advantages over the traditional method: in addition to allowing control over the style and language of the generated speech, it improves the mean opinion score (MOS) for speech quality by more than 0.2 points.
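The two-module separation described in the abstract can be illustrated with a toy sketch. This is not the Hieratron implementation: the names `Bottleneck`, `prosody_module`, `timbre_module`, and `synthesize`, the style labels, and the numeric "features" are all hypothetical stand-ins. It only shows the structural idea that a first stage decides timing and style without seeing the speaker, and a second stage adds speaker identity without changing timing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bottleneck:
    """Hypothetical speaker-independent intermediate features, one value per frame."""
    frames: List[float]


def prosody_module(text: str, style: str) -> Bottleneck:
    """Toy stage 1: maps text and style to frames. It never sees the speaker."""
    # Assumption: style only changes how many frames each character lasts.
    frames_per_char = {"neutral": 1, "slow": 2}[style]
    frames: List[float] = []
    for ch in text:
        frames.extend([float(ord(ch) % 7)] * frames_per_char)
    return Bottleneck(frames)


def timbre_module(bottleneck: Bottleneck, speaker_offset: float) -> List[float]:
    """Toy stage 2: adds a speaker-specific offset. It never changes timing."""
    return [value + speaker_offset for value in bottleneck.frames]


def synthesize(text: str, style: str, speaker_offset: float) -> List[float]:
    """Chains the two stages: prosody first, then timbre."""
    return timbre_module(prosody_module(text, style), speaker_offset)


if __name__ == "__main__":
    same_style_a = synthesize("hi", "neutral", speaker_offset=0.0)
    same_style_b = synthesize("hi", "neutral", speaker_offset=10.0)
    slow_style_a = synthesize("hi", "slow", speaker_offset=0.0)
    # Changing the speaker keeps the frame count; changing the style does not.
    print(len(same_style_a), len(same_style_b), len(slow_style_a))
```

In this sketch, swapping the speaker leaves the frame count untouched while swapping the style changes it, which mirrors the independent control of timbre and prosody that the abstract claims. In a setup like this, only the second stage would need to see the target speaker's data, which is one plausible reason a two-module design could suit very limited recordings.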

