Speaker-adaptive neural vocoders for statistical parametric speech synthesis systems

11/08/2018
by   Eunwoo Song, et al.
0

This paper proposes speaker-adaptive neural vocoders for statistical parametric speech synthesis (SPSS) systems. Recently proposed WaveNet-based neural vocoding systems successfully generate a time sequence of speech signal with an autoregressive framework. However, building high-quality speech synthesis systems with limited training data for a target speaker remains a challenge. To generate more natural speech signals with the constraint of limited training data, we employ a speaker adaptation task with an effective variation of neural vocoding models. In the proposed method, a speaker-independent training method is applied to capture universal attributes embedded in multiple speakers, and the trained model is then fine-tuned to represent the specific characteristics of the target speaker. Experimental results verify that the proposed SPSS systems with speaker-adaptive neural vocoders outperform those with traditional source-filter model-based vocoders and those with WaveNet vocoders, trained either speaker-dependently or speaker-independently.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/09/2018

ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems

This paper proposes a WaveNet-based neural excitation model (ExcitNet) f...
research
08/16/2021

GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints

Few-shot speaker adaptation is a specific Text-to-Speech (TTS) system th...
research
06/29/2021

GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis

Recent advances in neural multi-speaker text-to-speech (TTS) models have...
research
07/28/2018

Analysing Shortcomings of Statistical Parametric Speech Synthesis

Output from statistical parametric speech synthesis (SPSS) remains notic...
research
02/25/2018

Multi-channel Adaptive Dereverberation Tracing Abrupt Position Change of Target Speaker

Adaptive algorithm based on multi-channel linear prediction is an effect...
research
07/31/2018

Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems

Most neural-network based speaker-adaptive acoustic models for speech sy...
research
02/07/2022

Building Synthetic Speaker Profiles in Text-to-Speech Systems

The diversity of speaker profiles in multi-speaker TTS systems is a cruc...

Please sign up or login with your details

Forgot password? Click here to reset