Towards Robust Neural Vocoding for Speech Generation: A Survey

12/05/2019
by Po-chun Hsu, et al.

Recently, neural vocoders have been widely used in speech synthesis tasks, including text-to-speech and voice conversion. However, when the data distribution differs between training and inference, neural vocoders trained on real data often produce degraded voice quality in unseen scenarios. In this paper, we train three commonly used neural vocoders (WaveNet, WaveRNN, and WaveGlow) alternately on five different datasets. To study the robustness of neural vocoders, we evaluate the models using acoustic features from seen/unseen speakers, seen/unseen languages, a text-to-speech model, and a voice conversion model. We find that WaveNet is more robust than WaveRNN, especially when the training and testing data are inconsistent. Our experiments also show that WaveNet is more suitable for text-to-speech models, and WaveRNN is more suitable for voice conversion applications. Furthermore, we present subjective human evaluation results that can serve as a reference for future studies.
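To make the scale of this experimental design concrete, the sketch below enumerates the train/test grid the abstract describes: three vocoders, each trained on five datasets, evaluated on six sources of input features. It is a minimal illustration, not the authors' code. The dataset names (`dataset_1` … `dataset_5`) and condition labels are hypothetical placeholders, and the sketch assumes a full cross of every model with every condition, which the paper may not actually evaluate.

```python
from itertools import product

# Vocoders named in the abstract.
VOCODERS = ["WaveNet", "WaveRNN", "WaveGlow"]

# Hypothetical placeholders: the abstract says five datasets but does not name them here.
TRAIN_SETS = [f"dataset_{i}" for i in range(1, 6)]

# Sources of input acoustic features listed in the abstract (labels are illustrative).
TEST_CONDITIONS = [
    "seen_speaker",
    "unseen_speaker",
    "seen_language",
    "unseen_language",
    "tts_output",
    "vc_output",
]


def build_grid():
    """Return every (vocoder, training set, test condition) triple, assuming a full cross."""
    return list(product(VOCODERS, TRAIN_SETS, TEST_CONDITIONS))


def count_models():
    """Count distinct trained models: one per (vocoder, training set) pair."""
    return len({(vocoder, train_set) for vocoder, train_set, _ in build_grid()})


if __name__ == "__main__":
    grid = build_grid()
    print(count_models(), "trained models")
    print(len(grid), "evaluation cells")
```

Under these assumptions the design yields 3 × 5 = 15 trained models and 15 × 6 = 90 evaluation cells, which shows why subjective listening tests at this scale are costly.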


research
03/31/2022

HiFi-VC: High Quality ASR-Based Voice Conversion

The goal of voice conversion (VC) is to convert input voice to match the...
research
02/19/2019

Data Efficient Voice Cloning for Neural Singing Synthesis

There are many use cases in singing synthesis where creating voices from...
research
05/10/2022

Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts

Adapting one's voice to different ambient environments and social intera...
research
05/18/2020

A Cyclical Post-filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-speech Systems

Recently, the effectiveness of text-to-speech (TTS) systems combined wit...
research
12/08/2022

SpeechLMScore: Evaluating speech generation using speech language model

While human evaluation is the most reliable metric for evaluating speech...
research
11/24/1998

Text-To-Speech Conversion with Neural Networks: A Recurrent TDNN Approach

This paper describes the design of a neural network that performs the ph...
research
08/24/2023

Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion

There are growing implications surrounding generative AI in the speech d...
