Parallel WaveNet: Fast High-Fidelity Speech Synthesis

11/28/2017
by   Aaron van den Oord, et al.

The recently developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural-sounding than any previous system across many different languages. However, because WaveNet generates audio sequentially, one sample at a time, it is poorly suited to today's massively parallel computers and is therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system generates high-fidelity speech samples more than 20 times faster than real time and is deployed online by Google Assistant, where it serves multiple English and Japanese voices.
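The abstract does not spell out the distillation objective. As a toy illustration only, the sketch below assumes the student (the parallel network) is trained to minimize a KL divergence from its own distribution to the teacher's (the trained WaveNet), estimated from samples drawn from the student. Both distributions are replaced here by one-dimensional Gaussians so the estimate can be checked against the closed form; the function names and parameters are hypothetical and are not taken from the paper.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log density of a 1-D Gaussian N(mean, std^2) at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def kl_student_teacher_mc(s_mean, s_std, t_mean, t_std, n=20000, seed=0):
    """Monte Carlo estimate of KL(student || teacher).

    Samples are drawn from the student, then scored under both the
    student and the (assumed fixed) teacher. Toy stand-in: a real
    student and teacher would be neural networks, not Gaussians.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(s_mean, s_std)           # sample from the student
        total += (gaussian_logpdf(x, s_mean, s_std)
                  - gaussian_logpdf(x, t_mean, t_std))
    return total / n


def kl_gaussians_exact(s_mean, s_std, t_mean, t_std):
    """Closed-form KL(N(s_mean, s_std^2) || N(t_mean, t_std^2))."""
    return (math.log(t_std / s_std)
            + (s_std ** 2 + (s_mean - t_mean) ** 2) / (2.0 * t_std ** 2)
            - 0.5)


if __name__ == "__main__":
    estimate = kl_student_teacher_mc(0.5, 0.8, 0.0, 1.0)
    exact = kl_gaussians_exact(0.5, 0.8, 0.0, 1.0)
    print(f"Monte Carlo: {estimate:.4f}  closed form: {exact:.4f}")
```

The point of the sketch is that the estimate needs only samples from the student and density evaluations from the teacher. Under the stated assumption, the teacher never has to generate audio sample by sample during training, which is what makes a parallel student attractive.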


research
10/12/2020

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Several recent works on speech synthesis have employed generative adversa...
research
12/03/2019

WaveFlow: A Compact Flow-based Model for Raw Audio

In this work, we present WaveFlow, a small-footprint generative flow for...
research
10/21/2021

CaloFlow II: Even Faster and Still Accurate Generation of Calorimeter Showers with Normalizing Flows

Recently, we introduced CaloFlow, a high-fidelity generative model for G...
research
11/03/2020

StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization

In recent years, neural vocoders have surpassed classical speech generat...
research
02/23/2018

Efficient Neural Audio Synthesis

Sequential models achieve state-of-the-art results in audio, visual and ...
research
06/20/2022

WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis

Recently, GAN-based neural vocoders such as Parallel WaveGAN, MelGAN, Hi...
research
04/01/2021

Fast DCTTS: Efficient Deep Convolutional Text-to-Speech

We propose an end-to-end speech synthesizer, Fast DCTTS, that synthesize...
