Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression

03/26/2020
by Yi-Chiao Wu et al.

In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. Recent works have confirmed the effectiveness of WN as a vocoder for generating high-fidelity speech waveforms from acoustic features. However, when the WN vocoder is combined with a VC system, distorted acoustic features, acoustic and temporal mismatches, and exposure bias usually cause significant speech quality degradation, and WN generates some very noisy speech segments, called collapsed speech. To tackle this problem, we take speech generated by a conventional vocoder as the reference and derive from it a linear predictive coding distribution constraint (LPCDC) that prevents collapsed speech. Furthermore, to mitigate the negative effects introduced by the LPCDC, we propose a collapsed speech segment detector (CSSD), which ensures that the LPCDC is applied only to the problematic segments, so that any loss of quality is confined to short periods. Objective and subjective evaluations confirm the effectiveness of the proposed method, which further improves the speech quality of our previous non-parallel VC system submitted to the Voice Conversion Challenge 2018.
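
To make the idea of segment-level detection concrete, the following is a minimal sketch of how noisy segments might be flagged by comparing the WN output against time-aligned reference speech from a conventional vocoder. The abstract does not state the criterion the CSSD uses, so everything here is an assumption for illustration: the frame-wise log-power comparison, the function names `frame_log_power` and `detect_collapsed_frames`, and the parameters `frame_len`, `hop`, and `threshold_db` are all hypothetical and should not be read as the authors' detector.

```python
import math
import random


def frame_log_power(signal, frame_len, hop):
    """Return the log power (in dB) of each full frame of `signal`."""
    powers = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on silent frames.
        powers.append(10.0 * math.log10(mean_sq + 1e-10))
    return powers


def detect_collapsed_frames(wn_speech, ref_speech, frame_len=400, hop=200,
                            threshold_db=10.0):
    """Flag frames where the WN output is much louder than the reference.

    Hypothetical heuristic (an assumption, not the paper's CSSD): a frame is
    suspicious when its log power exceeds that of the time-aligned reference
    frame by more than `threshold_db`.
    """
    wn_pow = frame_log_power(wn_speech, frame_len, hop)
    ref_pow = frame_log_power(ref_speech, frame_len, hop)
    return [w - r > threshold_db for w, r in zip(wn_pow, ref_pow)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy reference: a quiet 100 Hz sine sampled at 16 kHz.
    ref = [0.1 * math.sin(2 * math.pi * 100 * n / 16000) for n in range(4000)]
    # Toy "WN output": the same signal with a loud noise burst inserted.
    wn = list(ref)
    for n in range(1600, 2400):
        wn[n] = rng.uniform(-1.0, 1.0)
    flags = detect_collapsed_frames(wn, ref)
    # Only the flagged frames would receive a constraint such as the LPCDC.
    print([i for i, flagged in enumerate(flags) if flagged])
```

In a system of the kind described above, a mask like `flags` would decide where the constraint is applied, leaving unflagged frames to be generated by the WN vocoder without it.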
