Teacher-Student Training for Robust Tacotron-based TTS

11/07/2019
by Rui Liu, et al.

While neural end-to-end text-to-speech (TTS) is superior to conventional statistical methods in many ways, the exposure bias problem in autoregressive models remains an issue to be resolved. Exposure bias arises from the mismatch between the training and inference processes, which results in unpredictable performance on out-of-domain test data at run-time. To overcome this, we propose a teacher-student training scheme for Tacotron-based TTS that introduces a distillation loss function in addition to the feature loss function. We first train a Tacotron2-based TTS model whose decoder is always provided with natural speech frames; this serves as the teacher model. We then train another Tacotron2-based model as the student model, whose decoder takes the predicted speech frames as input, just as the decoder does during run-time inference. Through the distillation loss, the student model learns the output probabilities of the teacher model, a process known as knowledge distillation. Experiments show that the proposed training scheme consistently improves voice quality on out-of-domain test data in both Chinese and English systems.
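
Since the abstract describes the training procedure only at a high level, a minimal PyTorch sketch of the two-pass scheme may help. It substitutes a toy single-cell recurrent decoder for the full Tacotron2 decoder and uses an L2 distillation term between teacher and student frame outputs; all names, shapes, and hyperparameters below are illustrative assumptions rather than the paper's actual implementation.

import torch
import torch.nn as nn

class SimpleDecoder(nn.Module):
    # Toy autoregressive decoder: maps the previous mel frame to the next.
    # Stands in for the full Tacotron2 decoder purely for illustration.
    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        self.rnn = nn.GRUCell(n_mels, hidden)
        self.proj = nn.Linear(hidden, n_mels)

    def forward(self, mels, teacher_forcing=True):
        # mels: (batch, T, n_mels) natural mel-spectrogram frames
        B, T, n_mels = mels.shape
        h = mels.new_zeros(B, self.rnn.hidden_size)
        prev = mels.new_zeros(B, n_mels)  # all-zero <GO> frame
        outputs = []
        for t in range(T):
            h = self.rnn(prev, h)
            pred = self.proj(h)
            outputs.append(pred)
            if teacher_forcing:
                prev = mels[:, t]     # teacher: condition on the natural frame
            else:
                prev = pred.detach()  # student: condition on its own prediction,
                                      # matching inference (detached for simplicity)
        return torch.stack(outputs, dim=1)

def student_step(student, teacher, mels):
    # Feature loss against natural speech, plus a distillation loss that
    # pulls the free-running student toward the teacher-forced teacher.
    with torch.no_grad():
        teacher_out = teacher(mels, teacher_forcing=True)
    student_out = student(mels, teacher_forcing=False)
    feature_loss = nn.functional.mse_loss(student_out, mels)
    distill_loss = nn.functional.mse_loss(student_out, teacher_out)
    return feature_loss + distill_loss

# Usage: train the teacher first, then initialize and train the student.
teacher = SimpleDecoder()
student = SimpleDecoder()
student.load_state_dict(teacher.state_dict())  # warm-start from the teacher
mels = torch.randn(4, 100, 80)                 # dummy batch of mel frames
loss = student_step(student, teacher, mels)
loss.backward()

The design choice mirrored here is that the teacher always consumes natural frames while the student consumes its own predictions, so the distillation term directly penalizes the train/inference mismatch that causes exposure bias.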

Related research

06/08/2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Advanced text to speech (TTS) models such as FastSpeech can synthesize s...

04/20/2021
Knowledge Distillation as Semiparametric Inference
A popular approach to model compression is to train an inexpensive stude...

06/05/2021
Bidirectional Distillation for Top-K Recommender System
Recommender systems (RS) have started to employ knowledge distillation, ...

06/14/2021
CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition
We propose a simple yet effective method to compress an RNN-Transducer (...

11/05/2021
Oracle Teacher: Towards Better Knowledge Distillation
Knowledge distillation (KD), best known as an effective method for model...

04/09/2019
A New GAN-based End-to-End TTS Training Algorithm
End-to-end, autoregressive model-based TTS has shown significant perform...

05/12/2018
I Have Seen Enough: A Teacher Student Network for Video Classification Using Fewer Frames
Over the past few years, various tasks involving videos such as classifi...