NAUTILUS: a Versatile Voice Cloning System

05/22/2020
by Hieu-Thi Luong, et al.

We introduce NAUTILUS, a novel speech synthesis system that can generate speech in a target voice from either a text input or a reference utterance by an arbitrary source speaker. All requisite encoders and decoders are first trained on a multi-speaker speech corpus in the initial training stage; the system can then clone unseen voices from untranscribed speech of the target speakers by means of the backpropagation algorithm. Moreover, depending on the data available for the target speaker, the cloning strategy can be adjusted to take advantage of additional data and to modify the behavior of the text-to-speech (TTS) and/or voice conversion (VC) systems accordingly. We test the proposed framework by using deep convolutional layers to model the encoders, the decoders, and the WaveNet vocoder. Evaluations show that it achieves quality comparable to state-of-the-art TTS and VC systems when cloning with just five minutes of untranscribed speech. We also demonstrate that the framework can switch between TTS and VC with high speaker consistency, which will be useful for many applications.
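The abstract gives no implementation details, so the following is only a toy sketch of the general idea, not the authors' method. It assumes that cloning means fine-tuning a shared speech decoder by gradient descent so that it reconstructs the target speaker's untranscribed frames from the speech encoder's latent, and that the text encoder maps into the same latent space. Every name, the scalar "frames", the linear decoder, and all numbers are hypothetical.

```python
# Toy sketch only: scalar "frames", a linear decoder, hand-written gradients.
# All names and values are hypothetical and are not taken from the paper.

def speech_encoder(frame):
    """Frozen encoder (assumed): strips the speaker offset, keeps the content."""
    return float(round(frame))

def text_encoder(symbol):
    """Frozen encoder (assumed): maps a text symbol into the same latent space."""
    return {"a": 1.0, "b": 2.0, "c": 3.0}[symbol]

def decode(latent, params):
    """Shared speech decoder with an adaptable scale and speaker bias."""
    return params["w"] * latent + params["b"]

def adapt(frames, params, lr=0.05, steps=2000):
    """Fine-tune the decoder on untranscribed frames by gradient descent
    on the mean squared reconstruction error (encoder stays frozen)."""
    params = dict(params)
    n = len(frames)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x in frames:
            z = speech_encoder(x)
            err = decode(z, params) - x
            grad_w += 2.0 * err * z / n
            grad_b += 2.0 * err / n
        params["w"] -= lr * grad_w
        params["b"] -= lr * grad_b
    return params

# Decoder as it might look after the initial multi-speaker training stage.
pretrained = {"w": 1.0, "b": 0.0}

# Untranscribed target-speaker frames: content 1, 2, 3 plus a voice offset of 0.3.
target_frames = [1.3, 2.3, 3.3]
adapted = adapt(target_frames, pretrained)

# TTS path: text -> latent -> adapted decoder.
tts_out = decode(text_encoder("b"), adapted)

# VC path: a source-speaker frame (content 3, offset -0.2) -> latent -> same decoder.
vc_out = decode(speech_encoder(2.8), adapted)

print(round(tts_out, 3), round(vc_out, 3))
```

In this toy setup, both paths end in the same adapted decoder, so text input and converted source speech come out with the same target offset. That mirrors the speaker consistency between TTS and VC that the abstract reports, though only under the assumptions stated above.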


research
06/04/2018

Voice Imitating Text-to-Speech Neural Networks

We propose a neural text-to-speech (TTS) model that can imitate a new sp...
research
09/14/2019

Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech

Voice conversion (VC) and text-to-speech (TTS) are two tasks that share ...
research
09/15/2023

Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech

In this work, we introduce a framework for cross-lingual speech synthesi...
research
08/15/2018

Investigation of Using Disentangled and Interpretable Representations for One-shot Cross-lingual Voice Conversion

We study the problem of cross-lingual voice conversion in non-parallel s...
research
12/22/2017

On Using Backpropagation for Speech Texture Generation and Voice Conversion

Inspired by recent work on neural network image generation which rely on...
research
09/18/2023

Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders

We propose a novel framework for electrolaryngeal speech intelligibility...
research
09/16/2023

FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

This paper integrates graph-to-sequence into an end-to-end text-to-speec...
