Deep Voice 2: Multi-Speaker Neural Text-to-Speech

05/24/2017
by Sercan Arik, et al.

We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a pipeline similar to that of Deep Voice 1 but is constructed with higher-performance building blocks and demonstrates a significant audio quality improvement over Deep Voice 1. We improve Tacotron by introducing a post-processing neural vocoder, and demonstrate a significant audio quality improvement. We then demonstrate our technique for multi-speaker speech synthesis for both Deep Voice 2 and Tacotron on two multi-speaker TTS datasets. We show that a single neural TTS system can learn hundreds of unique voices from less than half an hour of data per speaker, while achieving high audio quality synthesis and preserving the speaker identities almost perfectly.
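
The idea of low-dimensional trainable speaker embeddings can be illustrated with a minimal sketch. The abstract does not say where or how the embeddings enter the network, so everything below is an assumption made for illustration: the names `SpeakerEmbeddingTable`, `project`, and `condition_frames` are hypothetical, the vectors are randomly initialized rather than trained, and the conditioning is a simple speaker-dependent bias added to each frame of hidden features. It is not the Deep Voice 2 or Tacotron architecture.

```python
import random


class SpeakerEmbeddingTable:
    """Hypothetical lookup table: one low-dimensional vector per speaker.

    In a real model these vectors would be trained jointly with the
    network; here they are only randomly initialized.
    """

    def __init__(self, num_speakers, dim, seed=0):
        rng = random.Random(seed)
        self.vectors = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_speakers)
        ]

    def lookup(self, speaker_id):
        return self.vectors[speaker_id]


def project(vec, weights):
    """Linear projection; weights is a list of rows, shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def condition_frames(frames, speaker_vec, weights):
    """Add the same speaker-dependent bias to every frame of hidden features.

    This is an assumed conditioning scheme, chosen for simplicity.
    """
    bias = project(speaker_vec, weights)
    return [[h + b for h, b in zip(frame, bias)] for frame in frames]


if __name__ == "__main__":
    table = SpeakerEmbeddingTable(num_speakers=3, dim=4)
    # Projection from the 4-dim embedding to a 2-dim hidden feature space.
    weights = [[0.5, -0.2, 0.1, 0.3], [0.0, 0.4, -0.1, 0.2]]
    frames = [[0.0, 0.0], [1.0, 1.0]]  # two frames of hidden features
    for speaker_id in range(3):
        print(speaker_id, condition_frames(frames, table.lookup(speaker_id), weights))
```

The same input frames produce different conditioned features for each speaker ID, which is the sense in which a single shared model can generate several voices.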


