Maximizing Mutual Information for Tacotron

08/30/2019
by Peng Liu, et al.

End-to-end speech synthesis methods such as Tacotron, Tacotron2, and Transformer-TTS already achieve performance close to human quality. However, compared to HMM-based methods or NN-based frame-to-frame regression methods, they are prone to bad cases such as missing words, repeated words, and incomplete synthesis. More seriously, we cannot know whether such errors exist in a synthesized waveform unless we listen to it. We attribute the comparatively high sentence error rate to the local information preference of conditional autoregressive models. Inspired by the success of InfoGAN in learning interpretable representations through mutual information regularization, in this paper we propose to maximize the mutual information between the predicted acoustic features and the input text in end-to-end speech synthesis, in order to address the local information preference problem and avoid such bad cases. As a byproduct, our method also provides an indicator for detecting errors in the predicted acoustic features. Experimental results show that our method reduces the rate of bad cases and provides a reliable indicator to detect them automatically.
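One common way to realize such a mutual-information objective, as in InfoGAN, is a variational lower bound: train an auxiliary network q(text | mel) alongside the synthesizer and add its negative log-likelihood to the training loss. The sketch below illustrates that idea in PyTorch; the TextRecognizer module, the CTC-based recognizer objective, the assumed tacotron forward signature, and the lambda_mi weight are illustrative assumptions, not the paper's actual implementation.

    import torch.nn as nn
    import torch.nn.functional as F

    class TextRecognizer(nn.Module):
        # Hypothetical auxiliary network approximating q(text | mel); maximizing
        # its log-likelihood gives a variational lower bound on I(text; mel),
        # in the spirit of InfoGAN's Q-network.
        def __init__(self, n_mels=80, hidden=256, n_symbols=100):
            super().__init__()
            self.rnn = nn.GRU(n_mels, hidden, batch_first=True, bidirectional=True)
            self.proj = nn.Linear(2 * hidden, n_symbols + 1)  # +1 for CTC blank (index 0)

        def forward(self, mel):                  # mel: (batch, frames, n_mels)
            h, _ = self.rnn(mel)
            return self.proj(h).log_softmax(-1)  # (batch, frames, n_symbols + 1)

    def training_step(tacotron, recognizer, batch, lambda_mi=0.1):
        # `tacotron` is assumed to return the predicted mels and its usual
        # reconstruction/stop-token loss; the exact signature is illustrative.
        text, text_lens, mel_target, mel_lens = batch
        mel_pred, taco_loss = tacotron(text, text_lens, mel_target, mel_lens)

        # MI regularizer: -E[log q(text | predicted mel)], computed with CTC so
        # that no frame-level alignment between text and mel is required.
        log_probs = recognizer(mel_pred).transpose(0, 1)   # (frames, batch, classes)
        mi_loss = F.ctc_loss(log_probs, text, mel_lens, text_lens)

        # At inference time, the same mi_loss evaluated on generated mels can
        # serve as an indicator that words were dropped or repeated.
        return taco_loss + lambda_mi * mi_loss

In this sketch the recognizer loss plays both roles described in the abstract: as a regularizer during training it pushes the generated acoustic features to retain the full input text, and at synthesis time a high recognizer loss flags a likely bad case without requiring anyone to listen to the audio.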
