Guided-TTS:Text-to-Speech with Untranscribed Speech

11/23/2021
by   Heeseung Kim, et al.
0

Most neural text-to-speech (TTS) models require <speech, transcript> paired data from the desired speaker for high-quality speech synthesis, which limits the usage of large amounts of untranscribed data for training. In this work, we present Guided-TTS, a high-quality TTS model that learns to generate speech from untranscribed speech data. Guided-TTS combines an unconditional diffusion probabilistic model with a separately trained phoneme classifier for text-to-speech. By modeling the unconditional distribution for speech, our model can utilize the untranscribed data for training. For text-to-speech synthesis, we guide the generative process of the unconditional DDPM via phoneme classification to produce mel-spectrograms from the conditional distribution given transcript. We show that Guided-TTS achieves comparable performance with the existing methods without any transcript for LJSpeech. Our results further show that a single speaker-dependent phoneme classifier trained on multispeaker large-scale data can guide unconditional DDPMs for various speakers to perform TTS.

READ FULL TEXT
research
05/30/2022

Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data

We propose Guided-TTS 2, a diffusion-based generative model for high-qua...
research
02/16/2022

ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

Expressive text-to-speech (TTS) has become a hot research topic recently...
research
06/05/2023

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

Lip-to-speech involves generating a natural-sounding speech synchronized...
research
03/29/2022

CycleGAN-Based Unpaired Speech Dereverberation

Typically, neural network-based speech dereverberation models are traine...
research
08/28/2022

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

Transfer tasks in text-to-speech (TTS) synthesis - where one or more asp...
research
10/26/2022

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection

This paper proposes a method for selecting training data for text-to-spe...
research
07/07/2020

X-vectors: New Quantitative Biomarkers for Early Parkinson's Disease Detection from Speech

Many articles have used voice analysis to detect Parkinson's disease (PD...

Please sign up or login with your details

Forgot password? Click here to reset