High Fidelity Speech Synthesis with Adversarial Networks

09/25/2019
by Mikołaj Bińkowski, et al.

Generative adversarial networks have seen rapid development in recent years and have led to remarkable improvements in generative modelling of images. However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech. To address this paucity, we introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech. Our architecture is composed of a conditional feed-forward generator producing raw speech audio, and an ensemble of discriminators which operate on random windows of different sizes. The discriminators analyse the audio both for general realism and for how well it corresponds to the utterance that should be pronounced. To measure the performance of GAN-TTS, we employ both subjective human evaluation (Mean Opinion Score, MOS) and novel quantitative metrics (Fréchet DeepSpeech Distance and Kernel DeepSpeech Distance), which we find to be well correlated with MOS. We show that GAN-TTS is capable of generating high-fidelity speech with naturalness comparable to state-of-the-art models, and, unlike autoregressive models, it is highly parallelisable thanks to an efficient feed-forward generator. Listen to GAN-TTS reading this abstract at http://tiny.cc/gantts.
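The two quantitative metrics named in the abstract follow the same pattern as FID and KID for images: compare statistics of features that a pretrained network extracts from real and generated samples. Here that network would be a DeepSpeech speech-recognition model. The sketch below is a minimal NumPy illustration under stated assumptions. It takes feature arrays that have already been extracted; the extraction step and the choice of DeepSpeech layer are not shown. It computes a Fréchet distance between Gaussians fitted to each set. It then computes an unbiased squared MMD with a cubic polynomial kernel, which is the KID convention and is assumed here rather than taken from this text. The function names `frechet_distance` and `kernel_distance` are hypothetical and do not come from the authors' code.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_clips, feature_dim), assumed to be
    features already extracted from a DeepSpeech-style model (not shown).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for PSD covariances;
    # tiny negative or imaginary parts from floating-point error are discarded.
    eigvals = np.linalg.eigvals(cov_a @ cov_b).real
    trace_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


def kernel_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Unbiased squared MMD with a cubic polynomial kernel (KID-style).

    The kernel k(x, y) = (x.y / d + 1)^3 is an assumption borrowed from KID.
    """
    d = feats_a.shape[1]
    m, n = len(feats_a), len(feats_b)

    def poly_kernel(x: np.ndarray, y: np.ndarray) -> np.ndarray:
        return (x @ y.T / d + 1.0) ** 3

    k_aa = poly_kernel(feats_a, feats_a)
    k_bb = poly_kernel(feats_b, feats_b)
    k_ab = poly_kernel(feats_a, feats_b)
    # Unbiased estimator: drop the diagonal (self-similarity) terms
    # of the within-set kernel matrices.
    term_aa = (k_aa.sum() - np.trace(k_aa)) / (m * (m - 1))
    term_bb = (k_bb.sum() - np.trace(k_bb)) / (n * (n - 1))
    term_ab = k_ab.mean()
    return float(term_aa + term_bb - 2.0 * term_ab)
```

Both functions return values near zero when the two feature sets come from the same distribution, and they grow as the generated features drift away from the real ones. Lower is therefore better for synthesised speech. The kernel variant uses an unbiased estimator, so it can dip slightly below zero on small samples.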
