FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

05/12/2020
by Qiao Tian, et al.

In this paper, we propose FeatherWave, a variant of the WaveRNN vocoder that combines multi-band signal processing with linear predictive coding. LPCNet, a recently proposed neural vocoder that exploits the linear predictive characteristics of speech signals within the WaveRNN architecture, can generate high-quality speech faster than real-time on a single CPU core. However, LPCNet is still not efficient enough for online speech generation tasks. To address this issue, we adopt multi-band linear predictive coding for the WaveRNN vocoder. The multi-band method enables the model to generate several speech samples in parallel in a single step, which significantly improves the efficiency of speech synthesis. The proposed model with 4 sub-bands requires less than 1.6 GFLOPS for speech generation. In our experiments, it generates 24 kHz high-fidelity audio 9x faster than real-time on a single CPU, which is much faster than the LPCNet vocoder. Furthermore, our subjective listening test shows that FeatherWave generates speech of better quality than LPCNet.
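To make the efficiency argument in the abstract concrete, the following is a minimal sketch of the arithmetic behind it, not the authors' implementation. It assumes that each autoregressive step emits one sample per sub-band, so the number of sequential steps per second of audio drops by the number of bands; the function names are hypothetical and chosen for illustration only. The last helper shows the generic linear-prediction formula, in which the next sample is predicted as a weighted sum of previous samples; the coefficients used in the example are made up.

```python
# Figures quoted in the abstract.
SAMPLE_RATE = 24_000  # 24 kHz audio
NUM_BANDS = 4         # 4 sub-bands
SPEEDUP = 9.0         # 9x faster than real-time


def steps_per_second(sample_rate: int, num_bands: int) -> int:
    """Sequential steps per second of audio, ASSUMING one sample per band per step."""
    return sample_rate // num_bands


def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of audio."""
    return audio_seconds / speedup


def lpc_predict(history: list[float], coeffs: list[float]) -> float:
    """Generic linear prediction: p_t = sum_k a_k * s_{t-k}.

    history[-1] is the most recent sample s_{t-1}; coeffs[0] is a_1.
    """
    recent = reversed(history[-len(coeffs):])
    return sum(a * s for a, s in zip(coeffs, recent))


if __name__ == "__main__":
    print(steps_per_second(SAMPLE_RATE, 1))          # full-band: one step per sample
    print(steps_per_second(SAMPLE_RATE, NUM_BANDS))  # multi-band: fewer sequential steps
    print(synthesis_time(9.0, SPEEDUP))              # time to synthesize 9 s of audio
    print(lpc_predict([1.0, 2.0, 3.0], [0.5, 0.25])) # illustrative coefficients only
```

Under these assumptions, a 4-band model needs a quarter of the sequential steps of a full-band autoregressive model for the same duration of audio, which is the source of the speedup the abstract describes.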


research
05/11/2020

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

In this paper, we propose multi-band MelGAN, a much faster waveform gene...
research
03/14/2023

Native Multi-Band Audio Coding within Hyper-Autoencoded Reconstruction Propagation Networks

Spectral sub-bands do not portray the same perceptual relevance. In audi...
research
09/04/2019

DurIAN: Duration Informed Attention Network For Multimodal Synthesis

In this paper, we present a generic and robust multimodal synthesis syst...
research
06/15/2021

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Most neural vocoders employ band-limited mel-spectrograms to generate wa...
research
04/01/2021

Fast DCTTS: Efficient Deep Convolutional Text-to-Speech

We propose an end-to-end speech synthesizer, Fast DCTTS, that synthesize...
research
01/30/2021

Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time LPCNet

In this work, a robust and efficient text-to-speech system, named Triple...
research
10/07/2021

Towards Universal Neural Vocoding with a Multi-band Excited WaveNet

This paper introduces the Multi-Band Excited WaveNet, a neural vocoder fo...
