The Impact of Audio Input Representations on Neural Network based Music Transcription

01/25/2020
by Kin Wai Cheuk, et al.

This paper thoroughly analyses the effect of different input representations on polyphonic multi-instrument music transcription. We use our own GPU-based spectrogram extraction tool, nnAudio, to investigate the influence of using a linear-frequency spectrogram, log-frequency spectrogram, Mel spectrogram, and constant-Q transform (CQT). Our results show that an 8.33% increase in transcription accuracy and a 9.39% reduction in error can be obtained by choosing the appropriate input representation (log-frequency spectrogram with STFT window length 4,096 and 2,048 frequency bins in the spectrogram) without changing the neural network design (single layer fully connected). Our experiments also show that the Mel spectrogram is a compact representation for which we can reduce the number of frequency bins to only 512 while still keeping a relatively high music transcription accuracy.
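
To make the difference between the representations named in the abstract more concrete, the following is a rough sketch in plain Python of how bin centre frequencies are spaced on a linear, logarithmic, and Mel scale. It does not use nnAudio and is not the authors' implementation. The function names, the 44.1 kHz sample rate, the 50 Hz lower bound, and the HTK-style Mel formula are all assumptions chosen for illustration; only the window length of 4,096 and the bin count of 512 come from the abstract.

```python
import math

SAMPLE_RATE = 44100  # assumption: the abstract does not state a sample rate
N_FFT = 4096         # STFT window length mentioned in the abstract
F_MIN = 50.0         # assumption: illustrative lower bound for log and Mel scales


def stft_bin_spacing_hz(sample_rate, n_fft):
    """Spacing in Hz between adjacent bins of a plain STFT."""
    return sample_rate / n_fft


def linear_bin_centres(n_bins, f_max):
    """Centre frequencies spaced evenly in Hz from 0 to f_max."""
    return [f_max * k / (n_bins - 1) for k in range(n_bins)]


def log_bin_centres(n_bins, f_min, f_max):
    """Centre frequencies spaced geometrically (evenly in log-frequency)."""
    ratio = (f_max / f_min) ** (1.0 / (n_bins - 1))
    return [f_min * ratio ** k for k in range(n_bins)]


def hz_to_mel(f_hz):
    """HTK-style Hz-to-Mel conversion (one of several common variants)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_bin_centres(n_bins, f_min, f_max):
    """Centre frequencies spaced evenly on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bins - 1)
    return [mel_to_hz(m_min + k * step) for k in range(n_bins)]


def count_below(centres, cutoff_hz):
    """Number of bin centres that fall below cutoff_hz."""
    return sum(1 for f in centres if f < cutoff_hz)


if __name__ == "__main__":
    nyquist = SAMPLE_RATE / 2
    n_bins = 512  # the compact Mel setting mentioned in the abstract
    print("STFT bin spacing (Hz):", stft_bin_spacing_hz(SAMPLE_RATE, N_FFT))
    for name, centres in [
        ("linear", linear_bin_centres(n_bins, nyquist)),
        ("log", log_bin_centres(n_bins, F_MIN, nyquist)),
        ("mel", mel_bin_centres(n_bins, F_MIN, nyquist)),
    ]:
        print(name, "bins below 1 kHz:", count_below(centres, 1000.0))
```

Under these assumptions, the logarithmic spacing places the most bin centres below 1 kHz, the Mel spacing fewer, and the linear spacing the fewest. This is one way to see why the choice of frequency scale changes how much resolution is spent on the pitch range where most musical notes lie.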

research
06/19/2023

Multitrack Music Transcription with a Time-Frequency Perceiver

Multitrack music transcription aims to transcribe a music audio input in...
research
12/14/2017

DLR : Toward a deep learned rhythmic representation for music content analysis

In the use of deep neural networks, it is crucial to provide appropriate...
research
11/13/2017

Invariances and Data Augmentation for Supervised Music Transcription

This paper explores a variety of models for frame-based music transcript...
research
02/02/2022

TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

Singing melody extraction is an important problem in the field of music ...
research
09/06/2017

A Comparison on Audio Signal Preprocessing Methods for Deep Neural Networks on Music Tagging

Deep neural networks (DNN) have been successfully applied for music clas...
research
04/10/2019

Neuralogram: A Deep Neural Network Based Representation for Audio Signals

We propose the Neuralogram – a deep neural network based representation ...
research
06/19/2017

Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras

We introduce Kapre, Keras layers for audio and music signal preprocessin...
