Cloning one's voice using very limited data in the wild

10/07/2021
by   Dongyang Dai, et al.

As speech synthesis products grow in popularity, the industry has raised new requirements for personalized speech synthesis: (1) how to clone a person's voice from low-resource, easily accessible data, and (2) how to clone a person's voice while controlling style and prosody. To address both problems, we propose the Hieratron model framework, in which prosody and timbre are modeled separately by two modules, so that timbre can be controlled independently of the other characteristics of the audio when generating speech. Our results show that, with very limited target-speaker data in the wild, Hieratron has clear advantages over the traditional method: besides allowing control over the style and language of the generated speech, it improves the mean opinion score for speech quality by more than 0.2 points.
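The two-module split described in the abstract can be illustrated with a toy sketch. This is not the Hieratron implementation: the abstract does not specify the intermediate representation or the module internals, so every name below (`ProsodyFrame`, `prosody_module`, `timbre_module`, the `style` strings, and the use of a single base-pitch number as a stand-in for speaker timbre) is a hypothetical placeholder. The sketch only shows why separating the stages makes timbre independently controllable: the first stage never sees the speaker, and the second stage never changes the timing or the relative pitch contour.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProsodyFrame:
    """Hypothetical speaker-independent unit passed between the two stages."""
    token: str
    duration: int        # number of output frames for this token
    pitch_shape: float   # relative pitch offset, not an absolute pitch


def prosody_module(text: str, style: str) -> List[ProsodyFrame]:
    """Stage 1 (toy): text + style -> speaker-independent prosody sequence."""
    calm = style == "calm"
    return [
        ProsodyFrame(
            token=word,
            duration=3 if calm else 2,
            pitch_shape=0.0 if calm else 0.5,
        )
        for word in text.split()
    ]


def timbre_module(frames: List[ProsodyFrame], speaker_base_pitch: float) -> List[float]:
    """Stage 2 (toy): prosody sequence + speaker identity -> per-frame pitch.

    A single base pitch stands in for the speaker's timbre (an assumption
    made purely for illustration).
    """
    output: List[float] = []
    for frame in frames:
        value = speaker_base_pitch * (1.0 + frame.pitch_shape)
        output.extend([value] * frame.duration)
    return output


# Same prosody, two speakers: timing is shared, only the speaker-dependent values differ.
shared_prosody = prosody_module("hello world", style="calm")
speaker_a = timbre_module(shared_prosody, speaker_base_pitch=100.0)
speaker_b = timbre_module(shared_prosody, speaker_base_pitch=200.0)

# Same speaker, different style: only stage 1 changes.
lively_a = timbre_module(prosody_module("hello world", style="lively"), 100.0)
```

In this arrangement, only the second stage would need to be adapted to a new speaker, which is one plausible reading of why such a split could help when target-speaker data is very limited.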


05/26/2020

Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement

With the popularity of deep neural network, speech synthesis task has ac...
10/26/2019

Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens

Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GS...
10/29/2019

Disentangling Timbre and Singing Style with Multi-singer Singing Synthesis System

In this study, we define the identity of the singer with two independent...
02/19/2018

Voice Impersonation using Generative Adversarial Networks

Voice impersonation is not the same as voice transformation, although th...
07/20/2017

VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop

We present a new neural text to speech (TTS) method that is able to tran...
02/18/2019

Securing Voice-driven Interfaces against Fake (Cloned) Audio Attacks

Voice cloning technologies have found applications in a variety of areas...
12/14/2020

Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis

The style of the speech varies from person to person and every person ex...