High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling

05/20/2021
by   Patrick Lumban Tobing, et al.

This paper presents a novel high-fidelity and low-latency universal neural vocoder framework based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling (MWDLP). MWDLP employs a coarse-fine bit WaveRNN architecture for 10-bit mu-law waveform modeling. It uses a sparse gated recurrent unit with a relatively large number of hidden units, and multiband modeling is deployed to achieve real-time, low-latency operation. A novel technique for data-driven linear prediction (LP) with discrete waveform modeling is proposed, in which the LP coefficients are estimated in a data-driven manner. In addition, a novel loss function based on the short-time Fourier transform (STFT) is proposed for discrete waveform modeling with Gumbel approximation. The experimental results demonstrate that the proposed MWDLP framework generates high-fidelity synthetic speech for seen and unseen speakers and/or languages when trained on data from 300 speakers, including clean and noisy/reverberant conditions, with the number of training utterances limited to 60 per speaker. It also allows real-time low-latency processing on a single core of a ∼2.1–2.7 GHz CPU, with a real-time factor of ∼0.57–0.64 including input/output and feature extraction.
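
To make the "10-bit mu-law" and "coarse-fine bit" terms in the abstract concrete, the following is a minimal sketch of mu-law companding to 1024 discrete levels, followed by a split of each index into a coarse part and a fine part. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the companding constant mu = 1023 is the usual choice for 10 bits, and the even 5-bit/5-bit split is assumed here because the abstract does not state how the 10 bits are divided.

```python
import math

MU = 1023  # assumed: standard mu for 10-bit mu-law (2**10 - 1)


def mulaw_encode(x, mu=MU):
    """Compand a sample in [-1, 1] and quantize it to an index in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map [-1, 1] onto [0, mu] and round half up to the nearest level.
    return int(math.floor((y + 1.0) / 2.0 * mu + 0.5))


def mulaw_decode(index, mu=MU):
    """Invert mulaw_encode: map an index in [0, mu] back to [-1, 1]."""
    y = 2.0 * index / mu - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def split_coarse_fine(index, fine_bits=5):
    """Split an index into (coarse, fine); the 5/5 split is an assumption."""
    return index >> fine_bits, index & ((1 << fine_bits) - 1)


def join_coarse_fine(coarse, fine, fine_bits=5):
    """Recombine the coarse and fine parts into the original index."""
    return (coarse << fine_bits) | fine


if __name__ == "__main__":
    for sample in (-1.0, -0.01, 0.25, 1.0):
        idx = mulaw_encode(sample)
        coarse, fine = split_coarse_fine(idx)
        print(sample, idx, coarse, fine, round(mulaw_decode(idx), 4))
```

With this kind of split, a model can predict two 32-way distributions per sample (coarse first, then fine conditioned on coarse) instead of a single 1024-way distribution, which is the general motivation for coarse-fine output layers in WaveRNN-style vocoders.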


