Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

01/24/2018
by Zhen-Hua Ling, et al.

This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Unlike conventional BWE methods, which predict spectral parameters and then reconstruct wideband speech waveforms with a vocoder, this method models and predicts waveform samples directly. Inspired by SampleRNN, an unconditional neural audio generator, the HRNN model represents the distribution of each wideband or high-frequency waveform sample, conditioned on the input narrowband waveform samples, using a neural network composed of long short-term memory (LSTM) layers and feed-forward (FF) layers. The LSTM layers form a hierarchical structure in which each layer operates at a specific temporal resolution, efficiently capturing long-span dependencies in the sample sequences. Furthermore, additional conditions, such as bottleneck (BN) features derived from the narrowband speech by a deep neural network (DNN)-based state classifier, are used as auxiliary input to further improve the quality of the generated wideband speech. Experiments comparing several waveform modeling methods show that the HRNN-based method achieves better speech quality and run-time efficiency than both the dilated convolutional neural network (DCNN)-based method and the plain sample-level recurrent neural network (SRNN)-based method. The proposed method also outperforms a conventional vocoder-based BWE method using LSTM-RNNs in terms of the subjective quality of the reconstructed wideband speech.
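To make the hierarchical structure concrete, below is a minimal sketch of a SampleRNN-style HRNN for BWE in PyTorch. This is not the authors' implementation: the class name HRNNSketch, the tier sizes (frame_size=16, big_frame_size=64), the hidden width, and the 256-way quantized output are illustrative assumptions. Two LSTM tiers run at progressively coarser temporal resolutions over the narrowband input, and a feed-forward sample tier outputs a categorical distribution over quantized amplitudes for every wideband sample.

```python
# Hedged sketch of a three-tier HRNN for bandwidth extension (assumed
# hyperparameters throughout; not the paper's exact configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HRNNSketch(nn.Module):
    def __init__(self, hidden=512, frame_size=16, big_frame_size=64, q_levels=256):
        super().__init__()
        assert big_frame_size % frame_size == 0
        self.frame_size = frame_size
        self.big_frame_size = big_frame_size
        # Slow LSTM tier: one step per big frame of narrowband samples.
        self.big_rnn = nn.LSTM(big_frame_size, hidden, batch_first=True)
        # Middle LSTM tier: one step per frame, conditioned on the slow tier.
        self.frame_rnn = nn.LSTM(frame_size + hidden, hidden, batch_first=True)
        # Sample-level FF tier: a distribution over quantized amplitudes
        # for each output sample.
        self.sample_ff = nn.Sequential(
            nn.Linear(frame_size + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, q_levels),
        )

    def forward(self, nb):
        # nb: (B, T) narrowband waveform in [-1, 1], T divisible by big_frame_size.
        B, T = nb.shape
        # Slow tier over non-overlapping big frames.
        big_in = nb.view(B, T // self.big_frame_size, self.big_frame_size)
        big_out, _ = self.big_rnn(big_in)                       # (B, T/bfs, H)
        # Upsample slow-tier states to the middle tier's frame rate.
        ratio = self.big_frame_size // self.frame_size
        big_cond = big_out.repeat_interleave(ratio, dim=1)      # (B, T/fs, H)
        # Middle tier over frames, conditioned on the slow tier.
        frame_in = nb.view(B, T // self.frame_size, self.frame_size)
        frame_out, _ = self.frame_rnn(torch.cat([frame_in, big_cond], dim=-1))
        # Upsample middle-tier states to the sample rate.
        samp_cond = frame_out.repeat_interleave(self.frame_size, dim=1)  # (B, T, H)
        # Each output sample also sees the most recent frame_size narrowband
        # samples (left-padded at the sequence start).
        recent = F.pad(nb, (self.frame_size - 1, 0)).unfold(1, self.frame_size, 1)
        logits = self.sample_ff(torch.cat([recent, samp_cond], dim=-1))
        return logits  # (B, T, q_levels): one distribution per wideband sample

model = HRNNSketch()
nb = torch.rand(2, 1024) * 2 - 1    # toy batch: 2 clips of 1024 samples
logits = model(nb)                  # torch.Size([2, 1024, 256])
```

In training, such a model would presumably be fit with cross-entropy between these per-sample distributions and quantized (e.g., mu-law) wideband targets, and the BN features described in the abstract could be concatenated to the conditioning at the frame tiers; both details are assumptions here rather than specifics taken from the paper.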

