A Cyclical Post-filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-speech Systems

05/18/2020
by Yi-Chiao Wu, et al.

Recent work has shown that text-to-speech (TTS) systems combined with neural vocoders can generate high-fidelity speech. However, collecting the required training data and building these advanced systems from scratch is time- and resource-consuming. A more economical approach is to develop a neural vocoder that enhances the speech generated by existing TTS systems. Nonetheless, this approach usually suffers from two issues: 1) temporal mismatches between TTS and natural waveforms and 2) acoustic mismatches between training and testing data. To address these issues, we adopt a cyclic voice conversion (VC) model to generate temporally matched pseudo-VC data for training the neural vocoders and acoustically matched enhanced data for testing them. Because the framework is general, it can be applied to arbitrary neural vocoders. In this paper, we apply the proposed method to a state-of-the-art WaveNet vocoder for two different TTS systems, and both objective and subjective experimental results confirm the effectiveness of the proposed framework.


research
07/13/2022

A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System

Neural-based text-to-speech (TTS) systems achieve very high-fidelity spe...
research
03/26/2020

Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression

In this paper, we integrate a simple non-parallel voice conversion (VC) ...
research
12/05/2019

Towards Robust Neural Vocoding for Speech Generation: A Survey

Recently, neural vocoders have been widely used in speech synthesis task...
research
11/27/2018

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

This paper presents a refinement framework of WaveNet vocoders for varia...
research
04/30/2018

Collapsed speech segment detection and suppression for WaveNet vocoder

In this paper, we propose a technique to alleviate quality degradation c...
research
11/24/1998

Text-To-Speech Conversion with Neural Networks: A Recurrent TDNN Approach

This paper describes the design of a neural network that performs the ph...
research
07/01/2019

Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation

In this paper, we propose a quasi-periodic neural network (QPNet) vocode...
