WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation

04/05/2019
by   Kou Tanaka, et al.

WaveCycleGAN was recently proposed to bridge the gap between natural and synthesized speech waveforms in statistical parametric speech synthesis. It provides fast inference by using a moving average model rather than an autoregressive model, and high-quality speech synthesis through adversarial training. However, the human ear can still distinguish the processed speech waveforms from natural ones. One possible cause of this distinguishability is the aliasing that the down/up-sampling modules introduce into the processed speech waveform. To address the aliasing and provide higher-quality speech synthesis, we propose WaveCycleGAN2, which 1) uses generators without down/up-sampling modules and 2) combines discriminators of the waveform domain and the acoustic parameter domain. The results show that the proposed method 1) alleviates the aliasing well, 2) is useful for speech waveforms generated by both analysis-and-synthesis and statistical parametric speech synthesis, and 3) achieves a mean opinion score comparable to those of natural speech and of speech synthesized by WaveNet (open WaveNet) and WaveGlow, while processing speech samples at a rate of more than 150 kHz on an NVIDIA Tesla P100.
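The aliasing mentioned in the abstract can be illustrated with a small, generic signal-processing sketch. This is not the WaveCycleGAN or WaveCycleGAN2 architecture; it only shows how naive downsampling (keeping every second sample with no anti-aliasing filter) folds a high-frequency component to a lower frequency. The 16 kHz sample rate, the 6 kHz test tone, and the helper name `dominant_frequency` are assumptions chosen for illustration.

```python
import numpy as np


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])


# Assumed values for illustration only (not taken from the paper).
sample_rate = 16000   # Hz
tone_hz = 6000        # Hz, below the original Nyquist limit of 8000 Hz

# One second of a pure tone.
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * tone_hz * t)

# Naive downsampling by 2: keep every second sample, no low-pass filter.
decimated = tone[::2]
decimated_rate = sample_rate // 2   # 8000 Hz, so the Nyquist limit is 4000 Hz

original_peak = dominant_frequency(tone, sample_rate)
aliased_peak = dominant_frequency(decimated, decimated_rate)

print(f"peak before downsampling: {original_peak:.0f} Hz")
print(f"peak after downsampling:  {aliased_peak:.0f} Hz")
```

Because 6 kHz lies above the 4 kHz Nyquist limit of the downsampled signal, the tone reappears at 8000 - 6000 = 2000 Hz. Distortion of this kind is the sort of artifact that a generator without down/up-sampling modules avoids by construction.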

research
09/25/2018

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks

We propose a learning-based filter that allows us to directly modify a s...
research
06/12/2021

Continuous Wavelet Vocoder-based Decomposition of Parametric Speech Waveform Synthesis

To date, various speech technology systems have adopted the vocoder appr...
research
11/21/2022

Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System

This paper integrates a classic mel-cepstral synthesis filter into a mod...
research
07/11/2020

Fast Griffin Lim based Waveform Generation Strategy for Text-to-Speech Synthesis

The performance of text-to-speech (TTS) systems heavily depends on spect...
research
01/25/2021

High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion

This Ph.D. thesis focuses on developing a system for high-quality speech...
research
04/10/2019

RawNet: Fast End-to-End Neural Vocoder

Neural networks based vocoders have recently demonstrated the powerful a...
research
11/29/2018

LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis

We propose a linear prediction (LP)-based waveform generation method via...
