DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals

02/11/2021
by Satwinder Singh, et al.

We propose a novel pitch estimation technique called DeepF0, which leverages the available annotated data to learn directly from raw audio in a data-driven manner. F0 estimation is important in various speech processing and music information retrieval applications. Existing deep learning models for pitch estimation have relatively limited learning capabilities due to their shallow receptive fields. The proposed model addresses this issue by extending the network's receptive field with dilated convolutional blocks. Increasing the dilation factor grows the receptive field exponentially without an exponential increase in the number of model parameters. To make training faster and more efficient, DeepF0 is augmented with residual blocks that use residual connections. Our empirical evaluation demonstrates that the proposed model outperforms the baselines in terms of raw pitch accuracy and raw chroma accuracy while using 77.4% fewer network parameters. We also show that our model estimates pitch reasonably well even under various levels of accompaniment noise.
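The claim that dilation grows the receptive field exponentially while parameters grow only linearly can be checked with simple arithmetic. The sketch below is a minimal illustration, not DeepF0's actual architecture: it assumes a stack of stride-1 1-D convolutions with a hypothetical kernel size of 3, 64 channels, and dilations doubling from 1 to 128. For stride-1 layers, each layer adds (k - 1) * d samples to the receptive field, while its parameter count depends only on kernel size and channel width, not on the dilation d.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in samples) of a stack of stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def conv_params(kernel_size: int, channels: int, n_layers: int) -> int:
    """Weights plus biases for n_layers conv layers mapping channels -> channels.

    Dilation does not appear here: a dilated kernel has the same number of taps.
    """
    return n_layers * (kernel_size * channels * channels + channels)


if __name__ == "__main__":
    k, ch = 3, 64                          # hypothetical settings for illustration
    dilated = [2 ** i for i in range(8)]   # 1, 2, 4, ..., 128
    plain = [1] * 8                        # same depth, no dilation

    print(receptive_field(k, dilated))     # 511 samples
    print(receptive_field(k, plain))       # 17 samples
    print(conv_params(k, ch, 8))           # 98816 parameters for either stack

    # Undilated layers needed to match the dilated stack's receptive field.
    layers_needed = (receptive_field(k, dilated) - 1) // (k - 1)
    print(layers_needed, conv_params(k, ch, layers_needed))  # 255 3149760
```

Under these assumed settings, eight dilated layers span 511 samples with 98,816 parameters, whereas an undilated stack would need 255 layers (about 3.1 million parameters) to cover the same span.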

research
02/17/2018

CREPE: A Convolutional Representation for Pitch Estimation

The task of estimating the fundamental frequency of a monophonic sound r...
research
07/16/2017

Optical Music Recognition with Convolutional Sequence-to-Sequence Models

Optical Music Recognition (OMR) is an important technology within Music ...
research
10/29/2018

Cascaded CNN-resBiLSTM-CTC: An End-to-End Acoustic Model For Speech Recognition

Automatic speech recognition (ASR) tasks are resolved by end-to-end deep...
research
06/27/2023

RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music

Vocal pitch is an important high-level feature in music audio processing...
research
09/04/2023

Raw Data Is All You Need: Virtual Axle Detector with Enhanced Receptive Field

Rising maintenance costs of ageing infrastructure necessitate innovative...
research
12/01/2018

SwishNet: A Fast Convolutional Neural Network for Speech, Music and Noise Classification and Segmentation

Speech, Music and Noise classification/segmentation is an important prep...
