Supervised Initialization of LSTM Networks for Fundamental Frequency Detection in Noisy Speech Signals

11/11/2019
by Marvin Coto-Jiménez, et al.

Fundamental frequency is one of the most important parameters of human speech, relevant to tasks such as speaker identification and the classification of accent, gender, age, and speaking style. Detecting this parameter reliably remains an important challenge for severely degraded signals. In previous work on detecting fundamental frequency in noisy speech with deep learning, networks such as Long Short-term Memory (LSTM) have been initialized with random weights and then trained with a back-propagation through time algorithm. This work presents a proposal for a more efficient initialization, based on supervised training using an auto-associative network. This initialization provides a better starting point for detecting fundamental frequency in noisy speech. Its advantages are noticeable in objective measures of detection accuracy and of network training, in the presence of additive white noise at different signal-to-noise levels.
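
The evaluation condition named in the abstract, additive white noise at a chosen signal-to-noise ratio (SNR), can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract gives no SNR values, sampling rate, or noise-generation details, so the 16 kHz rate, the 120 Hz test tone, the SNR list, and the helper names `power`, `add_white_noise`, and `measured_snr_db` are all assumptions made for illustration.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def add_white_noise(clean, snr_db, seed=0):
    """Return clean + white Gaussian noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    # From SNR_dB = 10 * log10(P_signal / P_noise), solve for the noise power.
    target_noise_power = power(clean) / (10.0 ** (snr_db / 10.0))
    # Rescale the drawn noise so its measured power equals the target.
    scale = math.sqrt(target_noise_power / power(noise))
    return [c + scale * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB, computed from the clean signal and its noisy version."""
    noise = [y - c for c, y in zip(clean, noisy)]
    return 10.0 * math.log10(power(clean) / power(noise))


# Assumed stand-in for voiced speech: 50 ms of a 120 Hz tone at 16 kHz.
sample_rate = 16000
f0 = 120.0
clean = [math.sin(2.0 * math.pi * f0 * n / sample_rate) for n in range(800)]

# Assumed SNR levels; the abstract only says "different" levels.
for snr_db in (-10, 0, 10, 20):
    noisy = add_white_noise(clean, snr_db)
    print(snr_db, round(measured_snr_db(clean, noisy), 3))
```

Because the noise is rescaled to the exact target power, the measured SNR matches the requested value up to floating-point error. A noisy/clean pair built this way is the kind of input and reference a detector would be evaluated on at each noise level.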

research
04/10/2019

Audio-noise Power Spectral Density Estimation Using Long Short-term Memory

We propose a method using a long short-term memory (LSTM) network to est...
research
12/22/2019

On the Initialization of Long Short-Term Memory Networks

Weight initialization is important for faster convergence and stability ...
research
08/31/2017

Joint Separation and Denoising of Noisy Multi-talker Speech using Recurrent Neural Networks and Permutation Invariant Training

In this paper we propose to use utterance-level Permutation Invariant Tr...
research
01/22/2022

Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals

In this work, we propose a bi-directional long short-term memory (BiLSTM...
research
05/08/2018

A Regression Model of Recurrent Deep Neural Networks for Noise Robust Estimation of the Fundamental Frequency Contour of Speech

The fundamental frequency (F0) contour of speech is a key aspect to repr...
research
07/02/2018

Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks

The fundamental frequency (F0) represents pitch in speech that determine...
research
10/20/2019

Deep speech inpainting of time-frequency masks

In particularly noisy environments, transient loud intrusions can comple...
