TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

02/02/2022
by Ke Chen, et al.

Singing melody extraction is an important problem in the field of music information retrieval. Existing methods typically rely on frequency-domain representations to estimate the sung frequencies. However, this design does not reach human-level performance in perceiving the two components of melody information: tone (pitch class) and octave. In this paper, we propose TONet, a plug-and-play model that improves both tone and octave perception by leveraging a novel input representation and a novel network architecture. First, we present an improved input representation, the Tone-CFP, that explicitly groups harmonics by rearranging frequency bins. Second, we introduce an encoder-decoder architecture designed to produce a salience feature map, a tone feature map, and an octave feature map. Third, we propose a tone-octave fusion mechanism to improve the final salience feature map. We conduct experiments to verify the capability of TONet with various baseline backbone models. Our results show that tone-octave fusion with Tone-CFP significantly improves singing melody extraction performance across various datasets, with substantial gains in octave and tone accuracy.
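The abstract does not spell out how Tone-CFP rearranges frequency bins, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a log-frequency spectrogram whose bins are ordered octave by octave with a fixed number of bins per octave, and it reorders them so that all bins sharing a pitch class sit next to each other. The function names `regroup_by_tone` and `split_tone_octave`, and the parameter `bins_per_octave`, are hypothetical and do not come from the paper.

```python
import numpy as np


def regroup_by_tone(spec: np.ndarray, bins_per_octave: int) -> np.ndarray:
    """Reorder a log-frequency axis from octave-major to tone-major.

    Assumption (not from the paper): `spec` has shape (n_bins, n_frames),
    where bin index = octave * bins_per_octave + tone.
    After regrouping, bin index = tone * n_octaves + octave, so bins that
    share a pitch class become adjacent rows.
    """
    n_bins, n_frames = spec.shape
    if n_bins % bins_per_octave != 0:
        raise ValueError("n_bins must be a multiple of bins_per_octave")
    n_octaves = n_bins // bins_per_octave

    # Split the frequency axis into (octave, tone), then swap the two axes.
    grouped = spec.reshape(n_octaves, bins_per_octave, n_frames)
    grouped = grouped.transpose(1, 0, 2)
    return grouped.reshape(n_bins, n_frames)


def split_tone_octave(bin_index: int, bins_per_octave: int) -> tuple[int, int]:
    """Decompose a log-frequency bin index into (tone, octave)."""
    octave, tone = divmod(bin_index, bins_per_octave)
    return tone, octave


if __name__ == "__main__":
    # Toy input: 2 octaves x 3 bins per octave, a single time frame.
    toy = np.arange(6).reshape(6, 1)
    print(regroup_by_tone(toy, bins_per_octave=3).ravel())
    print(split_tone_octave(4, bins_per_octave=3))
```

In this toy layout, a salience target at a given bin can be read as a (tone, octave) pair, which is the kind of decomposition the tone and octave feature maps would be trained against. The actual Tone-CFP construction and fusion mechanism may differ from this sketch.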
