SpeedySpeech: Efficient Neural Speech Synthesis

08/09/2020
by Jan Vainer, et al.

While recent neural sequence-to-sequence models have greatly improved the quality of speech synthesis, there has not been a system capable of fast training, fast inference, and high-quality audio synthesis at the same time. We propose a student-teacher network capable of high-quality, faster-than-real-time spectrogram synthesis, with low computational requirements and fast training. We show that self-attention layers are not necessary for generating high-quality audio. We use simple convolutional blocks with residual connections in both the student and teacher networks, and only a single attention layer in the teacher model. Coupled with a MelGAN vocoder, our model's voice quality was rated significantly higher than that of Tacotron 2. Our model can be trained efficiently on a single GPU and runs in real time even on a CPU. We provide our source code and audio samples in our GitHub repository.
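The abstract describes both networks only as stacks of simple convolutional blocks with residual connections. As a rough illustration of that building block, here is a minimal pure-Python sketch of a 1D residual convolution block computing y = x + ReLU(conv(x)). The kernel size, the dilation parameter, the ReLU activation, the absence of normalization, and the function names `conv1d_same` and `residual_conv_block` are all illustrative assumptions, not the authors' exact block; the actual implementation is in their GitHub repository.

```python
from typing import List

Signal = List[List[float]]  # layout: [channel][time step]


def conv1d_same(x: Signal, weights: List[List[List[float]]],
                bias: List[float], dilation: int = 1) -> Signal:
    """Dilated 1D convolution with zero padding that keeps the sequence length.

    weights has shape [out_channels][in_channels][kernel_size]; kernel_size
    must be odd so that the padding is symmetric (an assumption of this sketch).
    """
    kernel_size = len(weights[0][0])
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd for symmetric 'same' padding")
    length = len(x[0])
    pad = dilation * (kernel_size - 1) // 2
    out: Signal = []
    for w_out, b in zip(weights, bias):
        row = []
        for t in range(length):
            acc = b
            for x_ch, w_ch in zip(x, w_out):
                for j, w in enumerate(w_ch):
                    idx = t + j * dilation - pad  # input position seen by tap j
                    if 0 <= idx < length:         # outside positions are zero padding
                        acc += w * x_ch[idx]
            row.append(acc)
        out.append(row)
    return out


def residual_conv_block(x: Signal, weights, bias, dilation: int = 1) -> Signal:
    """Hypothetical residual block: y = x + ReLU(conv(x))."""
    if len(weights) != len(x):
        raise ValueError("residual sum needs out_channels == in_channels")
    h = conv1d_same(x, weights, bias, dilation)
    # Skip connection: add the activated convolution output back onto the input.
    return [[xv + max(0.0, hv) for xv, hv in zip(x_row, h_row)]
            for x_row, h_row in zip(x, h)]


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0]]       # one channel, four frames
    identity = [[[0.0, 1.0, 0.0]]]   # kernel that copies its input
    print(residual_conv_block(x, identity, [0.0]))  # [[2.0, 4.0, 6.0, 8.0]]
```

The skip connection is why the block insists on equal input and output channel counts: the input is summed element-wise with the convolution output. Because the layer keeps the sequence length, such blocks can be stacked freely, and every output frame depends only on a fixed local window, which is one reason convolutional models parallelize well at inference time.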

Related research
05/11/2023

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

Denoising diffusion probabilistic models (DDPMs) have shown promising pe...
02/23/2018

Efficient Neural Audio Synthesis

Sequential models achieve state-of-the-art results in audio, visual and ...
04/14/2022

Streamable Neural Audio Synthesis With Non-Causal Convolutions

Deep learning models are mostly used in an offline inference fashion. Ho...
05/30/2018

Marian: Cost-effective High-Quality Neural Machine Translation in C++

This paper describes the submissions of the "Marian" team to the WNMT 20...
08/30/2021

Neural HMMs are all you need (for high-quality attention-free TTS)

Neural sequence-to-sequence TTS has achieved significantly better output...
05/15/2020

WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

In this paper, we propose WG-WaveNet, a fast, lightweight, and high-qual...
10/31/2018

WaveGlow: A Flow-based Generative Network for Speech Synthesis

In this paper we propose WaveGlow: a flow-based network capable of gener...