Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks

07/02/2018
by Akihiro Kato, et al.

The fundamental frequency (F0) represents the pitch of speech, which determines its prosodic characteristics, and is needed in various speech analysis and synthesis tasks. Despite decades of research on this topic, F0 estimation at low signal-to-noise ratios (SNRs) in unexpected noise conditions remains difficult. This work proposes a new approach to noise-robust F0 estimation using a recurrent neural network (RNN) trained in a supervised manner. Recent studies employ deep neural networks (DNNs) for F0 tracking, treating it as frame-by-frame classification into quantised frequency states; we instead propose waveform-to-sinusoid regression to achieve both noise robustness and accurate estimation with increased frequency resolution. Experimental results on the PTDB-TUG corpus contaminated by additive noise (NOISEX-92) demonstrate that the proposed method improves gross pitch error (GPE) rate and fine pitch error (FPE) by more than 35 and +10 dB compared with the well-known noise-robust F0 tracker PEFAC. Furthermore, the proposed method also outperforms state-of-the-art DNN-based approaches by more than 15 preceding SNR range.
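To make the idea of a sinusoidal regression target concrete, the following is a minimal sketch of how a frame-level F0 contour could be turned into a single sinusoid whose instantaneous frequency follows that contour. It is an illustration only, not the authors' implementation: the function name `f0_to_sinusoid`, the 10 ms frame shift, the 16 kHz sampling rate, the zero-order hold between frames, and the choice of silence for unvoiced frames are all assumptions made here.

```python
import numpy as np


def f0_to_sinusoid(f0_hz, frame_shift_s=0.01, sample_rate=16000):
    """Synthesise one sinusoid whose frequency tracks a frame-level F0 contour.

    Hypothetical helper: frames with F0 == 0 are treated as unvoiced and
    rendered as silence (an assumption, not taken from the paper).
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    samples_per_frame = int(round(frame_shift_s * sample_rate))
    # Hold each frame's F0 constant over its samples (zero-order hold).
    f0_per_sample = np.repeat(f0_hz, samples_per_frame)
    # Integrate the instantaneous frequency to obtain the phase in radians.
    phase = 2.0 * np.pi * np.cumsum(f0_per_sample) / sample_rate
    voiced = f0_per_sample > 0
    return np.sin(phase) * voiced


# Usage: a flat 100 Hz contour over ten frames gives a 100 Hz sinusoid,
# so the strongest FFT bin maps back to the contour's frequency.
target = f0_to_sinusoid([100.0] * 10)
spectrum = np.abs(np.fft.rfft(target))
peak_hz = np.argmax(spectrum) * 16000 / len(target)
```

Because such a target is a continuous waveform rather than a class label, its frequency is not restricted to a fixed set of quantised states, which is the motivation the abstract gives for regression over classification.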

research
05/08/2018

A Regression Model of Recurrent Deep Neural Networks for Noise Robust Estimation of the Fundamental Frequency Contour of Speech

The fundamental frequency (F0) contour of speech is a key aspect to repr...
research
02/23/2023

Frequency bin-wise single channel speech presence probability estimation using multiple DNNs

In this work, we propose a frequency bin-wise method to estimate the sin...
research
06/01/2018

Machines hear better when they have ears

Deep-neural-network (DNN) based noise suppression systems yield signific...
research
05/21/2019

DNN-Based Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Multi-frame approaches for single-microphone speech enhancement, e.g., t...
research
01/24/2018

Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

This paper presents a waveform modeling and generation method using hier...
research
01/05/2022

Formant Tracking Using Quasi-Closed Phase Forward-Backward Linear Prediction Analysis and Deep Neural Networks

Formant tracking is investigated in this study by using trackers based o...
research
11/11/2019

Supervised Initialization of LSTM Networks for Fundamental Frequency Detection in Noisy Speech Signals

Fundamental frequency is one of the most important parameters of human s...
