Robust universal neural vocoding

11/15/2018
by Jaime Lorenzo-Trueba, et al.

This paper introduces a robust universal neural vocoder trained on 74 speakers of both genders from 17 languages. The vocoder generates speech of consistently good quality (a relative mean MUSHRA of 98% with respect to natural speech), whether the input spectrogram comes from a speaker, style or recording condition seen during training or from an out-of-domain scenario. Alongside the vocoder, we present a full text-to-speech robustness analysis of several implemented systems. Their complexity ranges from a convolutional neural network (CNN)-based system conditioned on linguistic features to a recurrent neural network (RNN)-based system conditioned on mel-spectrograms. The analysis shows that CNN-based systems are prone to occasional instabilities, while the recurrent approaches are significantly more stable and provide the robustness needed for universal vocoding.
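The quality figure above is a relative mean MUSHRA score. The following minimal sketch assumes that relative MUSHRA is the system's mean listener score expressed as a percentage of the mean score given to the natural recordings in the same test. The helper name `relative_mushra` and the listener scores are hypothetical, chosen only to illustrate the arithmetic; they are not data from the paper.

```python
from statistics import mean


def relative_mushra(system_scores, natural_scores):
    """Hypothetical helper: mean MUSHRA score of a system expressed as a
    percentage of the mean score given to natural (reference) speech.
    Assumes this ratio definition of 'relative MUSHRA'."""
    if not system_scores or not natural_scores:
        raise ValueError("score lists must be non-empty")
    natural_mean = mean(natural_scores)
    if natural_mean == 0:
        raise ValueError("natural speech mean score must be non-zero")
    return 100.0 * mean(system_scores) / natural_mean


# Made-up listener scores on the 0-100 MUSHRA scale, for illustration only.
natural = [90, 85, 95, 88]
vocoded = [88, 84, 92, 86]
print(round(relative_mushra(vocoded, natural), 1))  # prints 97.8
```

Normalizing by the natural-speech score makes results comparable across listening tests whose listeners rate the hidden reference differently; a value near 100% means listeners found the vocoded speech close to the natural recordings.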
