An End-to-End Neural Network for Polyphonic Piano Music Transcription

08/07/2015
by Siddharth Sigtia, et al.

We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to that of speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network that estimates the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony. The acoustic and language model predictions are combined using a probabilistic graphical model, and inference over the output variables is performed using the beam search algorithm. We perform two sets of experiments: we investigate various neural network architectures for the acoustic model, and we investigate the effect of combining acoustic and music language model predictions using the proposed architecture. We compare the performance of the neural-network-based acoustic models with two popular unsupervised acoustic models. Results show that convolutional neural network acoustic models yield the best performance across all evaluation metrics. We also observe improved performance with the application of the music language models. Finally, we present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.
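To make the decoding idea concrete, the following is a minimal sketch of beam search that combines per-frame acoustic pitch probabilities with a language model score. It is not the authors' implementation. Everything named here is an assumption made for illustration: `frame_candidates`, `toy_language_model`, `beam_search`, the treatment of pitches as independent Bernoulli variables, and the fixed penalty of 0.5 per changed pitch. The toy language model looks only at the previous frame, whereas the abstract describes a recurrent network over the pitch history. The efficient beam search variant mentioned in the abstract is not reproduced.

```python
import itertools
import math


def frame_candidates(probs, max_candidates):
    """Return the most probable pitch combinations for one frame.

    Assumes pitches are independent Bernoulli variables. Exhaustive
    enumeration is only feasible for a handful of pitches.
    """
    eps = 1e-9
    scored = []
    for combo in itertools.product((0, 1), repeat=len(probs)):
        logp = 0.0
        for p, on in zip(probs, combo):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            logp += math.log(p if on else 1.0 - p)
        scored.append((logp, combo))
    scored.sort(reverse=True)
    return scored[:max_candidates]


def toy_language_model(prev, cur):
    """Hypothetical stand-in for the RNN: penalise pitch changes."""
    if prev is None:
        return 0.0
    changes = sum(a != b for a, b in zip(prev, cur))
    return -0.5 * changes


def beam_search(acoustic_probs, lm_score, beam_width=4, max_candidates=4):
    """Return (log score, pitch sequence) of the best hypothesis found."""
    beam = [(0.0, [])]  # (cumulative log score, list of pitch tuples)
    for probs in acoustic_probs:
        candidates = frame_candidates(probs, max_candidates)
        expanded = []
        for score, seq in beam:
            prev = seq[-1] if seq else None
            for acoustic_logp, combo in candidates:
                total = score + acoustic_logp + lm_score(prev, combo)
                expanded.append((total, seq + [combo]))
        # Keep only the highest-scoring hypotheses.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


if __name__ == "__main__":
    # One pitch, three frames; the middle frame is acoustically ambiguous.
    frames = [[0.9], [0.45], [0.9]]
    score, sequence = beam_search(frames, toy_language_model)
    print(sequence, round(score, 3))
```

In this example the acoustic model alone would switch the note off in the middle frame (0.45 is below 0.5), while the combined score keeps it on because the toy language model penalises the two changes that switching would require.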

research
01/02/2018

Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer

We investigate training end-to-end speech recognition models with the re...
research
02/16/2019

A Fully Differentiable Beam Search Decoder

We introduce a new beam search decoder that is fully differentiable, mak...
research
08/16/2018

Improved Chord Recognition by Combining Duration and Harmonic Language Models

Chord recognition systems typically comprise an acoustic model that pred...
research
10/08/2020

Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training

This paper describes a neural drum transcription method that detects fro...
research
08/16/2018

Automatic Chord Recognition with Higher-Order Harmonic Language Modelling

Common temporal models for automatic chord recognition model chord chang...
research
10/08/2021

MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

With the recent growth of remote and hybrid work, online meetings often ...
research
06/11/2018

Finding Syntax in Human Encephalography with Beam Search

Recurrent neural network grammars (RNNGs) are generative models of (tree...
