SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

01/16/2020
by Bohan Zhai, et al.

Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize because each generated sample is conditioned on previous samples. WaveGlow is a flow-based, feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, it is too computationally expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x to 214x fewer multiply-accumulate operations (MACs). Code, trained models, and generated audio are publicly available at https://github.com/tianrengao/SqueezeWave.
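
To make the MAC comparison concrete, the following is a minimal sketch of how multiply-accumulate operations are commonly counted for a single 1D convolution layer, and how a reduction factor between two configurations is computed. The function names and the layer sizes are hypothetical, chosen only for illustration; they are not taken from WaveGlow or SqueezeWave.

```python
def conv1d_macs(in_channels, out_channels, kernel_size, output_length):
    """Count multiply-accumulates for one dense 1D convolution (bias ignored).

    Each output element needs in_channels * kernel_size multiply-accumulates,
    and there are out_channels * output_length output elements.
    """
    return output_length * out_channels * in_channels * kernel_size


def reduction_factor(baseline_macs, reduced_macs):
    """Return how many times fewer MACs the reduced configuration needs."""
    return baseline_macs / reduced_macs


# Hypothetical layer sizes (assumptions, not values from the paper).
baseline = conv1d_macs(in_channels=256, out_channels=256,
                       kernel_size=3, output_length=2000)

# Same layer applied to a sequence one quarter as long.
shorter = conv1d_macs(in_channels=256, out_channels=256,
                      kernel_size=3, output_length=500)

print(f"baseline MACs: {baseline:,}")
print(f"shorter MACs:  {shorter:,}")
print(f"reduction:     {reduction_factor(baseline, shorter):.1f}x")
```

A whole-model figure such as the 61x to 214x range quoted above would come from summing counts like these over every layer in each model and dividing the totals.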


research
07/18/2023

OxfordVGG Submission to the EGO4D AV Transcription Challenge

This report presents the technical details of our submission on the EGO4...
research
03/11/2019

Deep Text-to-Speech System with Seq2Seq Model

Recent trends in neural network based text-to-speech/speech synthesis pi...
research
11/25/2020

FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge

Nowadays more and more applications can benefit from edge-based text-to-...
research
11/08/2022

PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping

Previous generative adversarial network (GAN)-based neural vocoders are ...
research
10/31/2018

WaveGlow: A Flow-based Generative Network for Speech Synthesis

In this paper we propose WaveGlow: a flow-based network capable of gener...
research
09/14/2021

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

This paper presents fairseq S^2, a fairseq extension for speech synthesi...
research
04/23/2021

DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data

Deep neural speech and audio processing systems have a large number of t...
