The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

10/20/2020
by Kin Wai Cheuk, et al.

Most state-of-the-art automatic music transcription (AMT) models break the main transcription task down into sub-tasks such as onset prediction and offset prediction, training them with onset and offset labels. These predictions are then concatenated and used as input to train another model with the pitch labels to obtain the final transcription. We attempt to use only the pitch labels (together with a spectrogram reconstruction loss) and explore how far such a model can go without introducing supervised sub-tasks. In this paper, we do not aim at achieving state-of-the-art transcription accuracy; instead, we explore the effect that spectrogram reconstruction has on our AMT model. Our proposed model consists of two U-nets: the first U-net transcribes the spectrogram into a posteriorgram, and the second U-net transforms the posteriorgram back into a spectrogram. A reconstruction loss is applied between the original spectrogram and the reconstructed spectrogram to constrain the second U-net to focus only on reconstruction. We train our model on three different datasets: MAPS, MAESTRO, and MusicNet. Our experiments show that adding the reconstruction loss generally improves note-level transcription accuracy compared to the same model without the reconstruction part. Moreover, it also boosts frame-level precision above that of state-of-the-art models. The feature maps learned by our U-net contain grid-like structures (not present in the baseline model), which implies that, in the presence of the reconstruction loss, the model is probably trying to count along both the time and frequency axes, resulting in higher note-level transcription accuracy.
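The training objective described above can be sketched as a sum of a pitch-label transcription loss and a spectrogram reconstruction loss. The following minimal numpy sketch uses simple linear maps as hypothetical stand-ins for the two U-nets (the paper's actual architecture is convolutional); the dimensions and weight names are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative): T time frames, F frequency bins, P = 88 piano pitches
T, F, P = 8, 16, 88

spectrogram = rng.random((T, F))
pitch_labels = (rng.random((T, P)) > 0.9).astype(float)  # sparse binary piano-roll labels

# Hypothetical stand-ins for the two U-nets: plain linear maps
W_transcribe = rng.standard_normal((F, P)) * 0.1   # "U-net 1": spectrogram -> posteriorgram
W_reconstruct = rng.standard_normal((P, F)) * 0.1  # "U-net 2": posteriorgram -> spectrogram

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

posteriorgram = sigmoid(spectrogram @ W_transcribe)   # frame-wise pitch probabilities
reconstruction = posteriorgram @ W_reconstruct        # spectrogram estimate from posteriorgram

# Transcription loss: binary cross-entropy against pitch labels
# (the only supervised labels used, per the abstract)
eps = 1e-9
bce = -np.mean(pitch_labels * np.log(posteriorgram + eps)
               + (1.0 - pitch_labels) * np.log(1.0 - posteriorgram + eps))

# Reconstruction loss: mean squared error between original and reconstructed spectrogram,
# which constrains the second network to focus only on reconstruction
mse = np.mean((spectrogram - reconstruction) ** 2)

total_loss = bce + mse
```

In an actual training loop, `total_loss` would be minimized jointly over both networks' parameters; the ablation in the paper corresponds to dropping the `mse` term and training on `bce` alone.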


