RawNet: Fast End-to-End Neural Vocoder

04/10/2019
by Yunchao He, et al.

Neural network-based vocoders have recently demonstrated a powerful ability to synthesize high-quality speech. These models usually generate samples by conditioning on spectral features, such as the Mel-spectrum. However, these features are extracted by a speech analysis module that includes processing based on human knowledge. In this work, we propose RawNet, a truly end-to-end neural vocoder, which uses a coder network to learn a higher-level representation of the signal and an autoregressive voder network to generate speech sample by sample. The coder and voder together act like an auto-encoder network and can be jointly trained directly on the raw waveform without any human-designed features. Experiments on copy-synthesis tasks show that RawNet achieves synthesized speech quality comparable to that of LPCNet, with a smaller model architecture and faster speech generation at inference.
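The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the frame size, code dimension, random untrained weights, and single-layer `tanh` functions below are all assumptions made for illustration, and the names `coder`, `voder`, `FRAME_SIZE`, and `CODE_DIM` are hypothetical. The sketch only shows the shape of the idea: a coder turns each frame of raw waveform into a compact code, and an autoregressive voder emits one sample at a time, conditioned on the previous sample and the current frame's code.

```python
import math
import random

FRAME_SIZE = 4  # samples per frame (illustrative value, not from the paper)
CODE_DIM = 2    # size of the per-frame code (illustrative value, not from the paper)


def coder(waveform, w_enc):
    """Map each frame of raw samples to a code vector (stand-in for the coder network)."""
    codes = []
    for start in range(0, len(waveform) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = waveform[start:start + FRAME_SIZE]
        codes.append([math.tanh(sum(w * x for w, x in zip(row, frame)))
                      for row in w_enc])
    return codes


def voder(codes, w_dec):
    """Generate samples one at a time, each conditioned on the
    previously generated sample and the current frame's code."""
    out, prev = [], 0.0
    for code in codes:
        for _ in range(FRAME_SIZE):
            features = [prev] + code
            prev = math.tanh(sum(w * f for w, f in zip(w_dec, features)))
            out.append(prev)
    return out


# Random, untrained weights: the output is not speech, only a shape check.
rng = random.Random(0)
w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(FRAME_SIZE)]
         for _ in range(CODE_DIM)]
w_dec = [rng.uniform(-0.5, 0.5) for _ in range(CODE_DIM + 1)]

waveform = [math.sin(2 * math.pi * n / 8) for n in range(16)]
codes = coder(waveform, w_enc)
reconstruction = voder(codes, w_dec)
```

In a trained system of this kind, both sets of weights would be optimized jointly so that the voder's output matches the input waveform, which is what makes the pair behave like an auto-encoder. The inner loop also shows why autoregressive generation is sequential: each sample depends on the one before it.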


research
04/05/2019

WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation

WaveCycleGAN has recently been proposed to bridge the gap between natura...
research
02/23/2022

End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation

Neural vocoders have recently demonstrated high quality speech synthesis...
research
11/06/2020

Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis

We describe a sequence-to-sequence neural network which can directly gen...
research
04/13/2020

From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech

This work seeks the possibility of generating the human face from voice ...
research
04/08/2021

Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features

Neural sequence-to-sequence text-to-speech synthesis (TTS), such as Taco...
research
08/15/2022

Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0

Neural network-based Text-to-Speech has significantly improved the quali...
research
04/22/2021

Restoring degraded speech via a modified diffusion model

There are many deterministic mathematical operations (e.g. compression, ...
