Voice Imitating Text-to-Speech Neural Networks

06/04/2018
by Younggun Lee, et al.

We propose a neural text-to-speech (TTS) model that can imitate a new speaker's voice using only a small speech sample. We demonstrate voice imitation from a speech sample only 6 seconds long, without any other information such as transcripts. Our model also imitates a new voice instantly, with no additional training. We implemented the voice-imitating TTS model by combining a speaker embedder network with a state-of-the-art TTS model, Tacotron. The speaker embedder network takes a new speaker's speech sample and returns a speaker embedding. The speaker embedding and a target sentence are fed to Tacotron, which generates speech in the new speaker's voice. We show that the speaker embeddings extracted by the speaker embedder network can represent the latent structure of different voices. Speech generated by our model has voice quality comparable to that of existing multi-speaker TTS models.
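The two-stage pipeline described above (a speaker embedder that turns a reference clip into a fixed-size vector, which then conditions the TTS model) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the mean-pooled `tanh` projection in the hypothetical `embed_speaker` and the per-step concatenation in the hypothetical `condition_encoder` are illustrative choices, and the input "frames" merely stand in for spectrogram frames of the reference sample. The sketch shows the two properties the abstract relies on: the embedding size does not depend on clip length, and it is computed from audio alone, with no transcript.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix) -> Vector:
    """Multiply row vector x (length d_in) by matrix w (d_in x d_out)."""
    d_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(d_out)]


def embed_speaker(frames: List[Vector], w: Matrix) -> Vector:
    """Hypothetical speaker embedder (assumption, not the paper's network):
    project each frame, apply tanh, then average over time. The output size
    depends only on w, not on how many frames the clip has, and no
    transcript is used."""
    if not frames:
        raise ValueError("need at least one frame")
    projected = [[math.tanh(v) for v in linear(f, w)] for f in frames]
    n = len(projected)
    return [sum(p[j] for p in projected) / n for j in range(len(w[0]))]


def condition_encoder(states: List[Vector], speaker: Vector) -> List[Vector]:
    """Hypothetical conditioning (assumption): append the same speaker
    embedding to every text-encoder time step before decoding."""
    return [s + speaker for s in states]


if __name__ == "__main__":
    # Toy 2-dim "spectral" frames mapped to a 3-dim speaker embedding.
    w = [[0.5, -0.5, 0.0], [0.0, 0.5, 1.0]]
    short_clip = [[1.0, 0.0], [0.0, 1.0]]
    long_clip = short_clip * 50  # a longer clip of the same "voice"

    e_short = embed_speaker(short_clip, w)
    e_long = embed_speaker(long_clip, w)
    print(len(e_short), len(e_long))  # same embedding size for both clips

    # Two text-encoder steps of width 2 become width 2 + 3 = 5.
    conditioned = condition_encoder([[0.1, 0.2], [0.3, 0.4]], e_short)
    print([len(row) for row in conditioned])
```

Pooling over time is what lets a clip of any duration map to one vector; because that vector is produced by a forward pass alone, a new voice can be used without retraining, which matches the "instant" imitation the abstract describes.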
