Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning

02/10/2021
by Giuseppe Ruggiero et al.

Deep learning models are becoming predominant in many fields of machine learning, and Text-to-Speech (TTS), the process of synthesizing artificial speech from text, is no exception. A deep neural network for TTS is usually trained on a corpus of several hours of recorded speech from a single speaker. Producing the voice of a speaker other than the one learned is expensive and requires considerable effort, since a new dataset must be recorded and the model retrained. This is the main reason why TTS models are usually single-speaker. The proposed approach aims to overcome these limitations by building a system able to model a multi-speaker acoustic space. This allows the generation of speech audio similar to the voices of different target speakers, even if they were not observed during the training phase.
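The abstract does not describe the architecture in detail. One widely used way to let a single TTS model cover many voices, including unseen ones, is to condition it on a fixed-size speaker embedding computed from a short reference recording. This follows the line of work on transfer learning from speaker verification. The sketch below is a minimal NumPy illustration under that assumption only. `speaker_embedding` and `condition_encoder_outputs` are hypothetical helpers, and the random frame features stand in for the outputs of a pretrained speaker encoder. It should not be read as the authors' implementation.

```python
import numpy as np


def speaker_embedding(frame_features: np.ndarray) -> np.ndarray:
    """Hypothetical utterance-level speaker embedding.

    Averages per-frame features (assumed to come from a pretrained
    speaker-verification encoder) and L2-normalizes the result.
    frame_features: array of shape (num_frames, emb_dim).
    """
    emb = frame_features.mean(axis=0)
    norm = np.linalg.norm(emb)
    return emb / norm if norm > 0 else emb


def condition_encoder_outputs(encoder_out: np.ndarray, spk_emb: np.ndarray) -> np.ndarray:
    """Hypothetical conditioning step: append the same speaker embedding
    to every text-encoder timestep.

    encoder_out: (num_tokens, enc_dim); spk_emb: (emb_dim,)
    returns: (num_tokens, enc_dim + emb_dim)
    """
    tiled = np.tile(spk_emb, (encoder_out.shape[0], 1))
    return np.concatenate([encoder_out, tiled], axis=1)


# Usage with placeholder data: 50 reference frames of an unseen speaker
# and 12 encoded text tokens (all values are random stand-ins).
rng = np.random.default_rng(0)
ref_frames = rng.normal(size=(50, 8))
emb = speaker_embedding(ref_frames)
enc = rng.normal(size=(12, 16))
cond = condition_encoder_outputs(enc, emb)
print(cond.shape)  # (12, 24)
```

Because the synthesizer only sees the embedding vector, a new voice requires a short reference clip rather than a new recorded corpus and a retrained model. That is the cost the abstract identifies as the main obstacle.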
