hf0: A hybrid pitch extraction method for multimodal voice

04/22/2019
by Pradeep Rengaswamy, et al.

Pitch, or fundamental frequency (f0), extraction is a fundamental problem that has been studied extensively for its potential applications in speech and clinical domains. In the literature, mode-specific (modal speech, singing voice, emotional/expressive speech, or noisy speech) signal processing and deep learning f0 extraction methods have been developed that exploit the quasi-periodic nature of the signal in time, its harmonic structure in the spectrum, or a combination of the two. Consequently, there is no single unified method that can reliably extract the pitch from the various modes of the acoustic signal. In this work, we propose a hybrid f0 extraction method that seamlessly extracts the pitch across modes of speech production with the very high accuracy required for many applications. The proposed hybrid model exploits the advantages of deep learning and signal processing methods to minimize the pitch detection error, and it adapts to various modes of the acoustic signal. Specifically, we propose an ordinal regression convolutional neural network that maps a periodicity-rich input representation to nominal pitch classes, which drastically reduces the number of classes required for pitch detection compared with other deep learning approaches. The accurate f0 is then estimated from the nominal pitch class labels by filtering and autocorrelation. We show that the proposed method generalizes to unseen modes of voice production and to various noises on large-scale datasets. The proposed hybrid model also significantly reduces the number of learnable parameters in the deep model compared to other methods. Furthermore, the evaluation measures show that the proposed method is significantly better than state-of-the-art signal processing and deep learning approaches.
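
The second stage described above (refining f0 from a coarse pitch class) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `refine_f0`, the `band_hz` parameter, and the band edges are hypothetical, the coarse band is assumed to be given (here it stands in for the output of the ordinal regression network), and the band-pass filtering step is omitted. The sketch only shows an autocorrelation peak search restricted to the lags that the coarse band allows.

```python
import math


def refine_f0(frame, sample_rate, band_hz):
    """Estimate f0 from the autocorrelation peak inside a coarse pitch band.

    band_hz is a hypothetical (low, high) range in Hz that stands in for
    a nominal pitch class; it is an assumption, not the paper's class layout.
    """
    f_lo, f_hi = band_hz
    # Convert the band edges (Hz) into a lag search range (samples).
    lag_min = max(1, int(math.floor(sample_rate / f_hi)))
    lag_max = min(len(frame) - 1, int(math.ceil(sample_rate / f_lo)))

    # Remove the DC offset so it does not bias the correlation.
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]

    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        n = len(x) - lag
        # Divide by the overlap length so longer lags are not penalized.
        score = sum(x[i] * x[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic 200 Hz tone, 50 ms at 8 kHz (illustrative values only).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(400)]
estimate = refine_f0(tone, SAMPLE_RATE, band_hz=(150.0, 250.0))
```

Restricting the lag range is what the coarse class buys in this sketch: for a pure tone the autocorrelation is equally strong at twice the true period, so an unrestricted search could report an octave error, whereas the band (150 Hz to 250 Hz here) excludes that lag.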
