Discrete acoustic space for an efficient sampling in neural text-to-speech

10/24/2021
by Marek Strelec, et al.

We present an SVQ-VAE architecture for neural text-to-speech (NTTS) that uses a split vector quantizer, as an enhancement to the well-known VAE and VQ-VAE architectures. Compared to these previous architectures, our proposed model retains the benefits of an utterance-level bottleneck while reducing the associated loss of representation power. We train the model on recordings from the highly expressive task-oriented dialogues domain and show that SVQ-VAE achieves a statistically significant improvement in naturalness over the VAE and VQ-VAE models. Furthermore, we demonstrate that the SVQ-VAE acoustic space is predictable from text, reducing the gap between standard constant-vector synthesis and vocoded recordings by 32%.
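The abstract does not give implementation details, but the core idea of a split vector quantizer is to partition the latent vector into sub-vectors and quantize each one against its own small codebook, which grows the effective number of representable codes multiplicatively while keeping each codebook small. A minimal NumPy sketch of that quantization step (all names, dimensions, and codebook sizes here are illustrative assumptions, not the paper's actual configuration):

```python
import numpy as np

def split_vector_quantize(z, codebooks):
    """Quantize latent vector z by splitting it into len(codebooks) equal
    sub-vectors and snapping each sub-vector to its nearest codeword
    (Euclidean distance) in the corresponding per-split codebook.

    Returns the concatenated quantized vector and the chosen code indices.
    """
    splits = np.split(z, len(codebooks))
    quantized, indices = [], []
    for sub, book in zip(splits, codebooks):
        dists = np.linalg.norm(book - sub, axis=1)  # distance to every codeword
        k = int(np.argmin(dists))                   # nearest-codeword index
        indices.append(k)
        quantized.append(book[k])
    return np.concatenate(quantized), indices

# Toy example: an 8-dim utterance-level latent, split into 4 sub-vectors,
# each quantized with its own 16-entry codebook of 2-dim codewords.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 2)) for _ in range(4)]
z = rng.normal(size=8)
z_q, idx = split_vector_quantize(z, codebooks)
```

With 4 splits of 16 codewords each, the quantizer can represent 16^4 = 65,536 distinct utterance-level codes while storing only 64 codewords in total, which illustrates how splitting can reduce the representation-power loss of a single-codebook VQ-VAE bottleneck.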

Related research

05/25/2017
Investigation of Using VAE for i-Vector Speaker Verification
New system for i-vector speaker recognition based on variational autoenc...

01/25/2019
Unsupervised speech representation learning using WaveNet autoencoders
We consider the task of unsupervised extraction of meaningful latent rep...

12/06/2018
β-VAEs can retain label information even at high compression
In this paper, we investigate the degree to which the encoding of a β-VA...

08/13/2021
Enhancing audio quality for expressive Neural Text-to-Speech
Artificial speech synthesis has made a great leap in terms of naturalnes...

07/27/2023
Online Clustered Codebook
Vector Quantisation (VQ) is experiencing a comeback in machine learning,...

11/04/2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech
In this paper, we introduce Kathaka, a model trained with a novel two-st...

06/10/2019
Using generative modelling to produce varied intonation for speech synthesis
Unlike human speakers, typical text-to-speech (TTS) systems are unable t...
