FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge

11/25/2020
by   Bichen Wu, et al.
0

Nowadays more and more applications can benefit from edge-based text-to-speech (TTS). However, most existing TTS models are too computationally expensive and are not flexible enough to be deployed on the diverse variety of edge devices with their equally diverse computational capacities. To address this, we propose FBWave, a family of efficient and scalable neural vocoders that can achieve optimal performance-efficiency trade-offs for different edge devices. FBWave is a hybrid flow-based generative model that combines the advantages of autoregressive and non-autoregressive models. It produces high quality audio and supports streaming during inference while remaining highly computationally efficient. Our experiments show that FBWave can achieve similar audio quality to WaveRNN while reducing MACs by 40x. More efficient variants of FBWave can achieve up to 109x fewer MACs while still delivering acceptable audio quality. Audio demos are available at https://bichenwu09.github.io/vocoder_demos.

READ FULL TEXT

page 2

page 3

research
01/16/2020

SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

Automatic speech synthesis is a challenging task that is becoming increa...
research
03/27/2022

Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge

Text-to-Speech (TTS) services that run on edge devices have many advanta...
research
05/23/2023

EfficientSpeech: An On-Device Text to Speech Model

State of the art (SOTA) neural text to speech (TTS) models can generate ...
research
07/08/2022

FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis

Unconstrained lip-to-speech synthesis aims to generate corresponding spe...
research
06/17/2021

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

This paper introduces WaveGrad 2, a non-autoregressive generative model ...
research
05/16/2023

SoundStorm: Efficient Parallel Audio Generation

We present SoundStorm, a model for efficient, non-autoregressive audio g...
research
08/30/2021

Neural HMMs are all you need (for high-quality attention-free TTS)

Neural sequence-to-sequence TTS has achieved significantly better output...

Please sign up or login with your details

Forgot password? Click here to reset