Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training

10/08/2020
by   Ryoto Ishizuka, et al.
0

This paper describes a neural drum transcription method that detects from music signals the onset times of drums at the tatum level, where tatum times are assumed to be estimated in advance. In conventional studies on drum transcription, deep neural networks (DNNs) have often been used to take a music spectrogram as input and estimate the onset times of drums at the frame level. The major problem with such frame-to-frame DNNs, however, is that the estimated onset times do not often conform with the typical tatum-level patterns appearing in symbolic drum scores because the long-term musically meaningful structures of those patterns are difficult to learn at the frame level. To solve this problem, we propose a regularized training method for a frame-to-tatum DNN. In the proposed method, a tatum-level probabilistic language model (gated recurrent unit (GRU) network or repetition-aware bi-gram model) is trained from an extensive collection of drum scores. Given that the musical naturalness of tatum-level onset times can be evaluated by the language model, the frame-to-tatum DNN is trained with a regularizer based on the pretrained language model. The experimental results demonstrate the effectiveness of the proposed regularized training method.

READ FULL TEXT
research
05/12/2021

Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

This paper describes an automatic drum transcription (ADT) method that d...
research
08/07/2015

An End-to-End Neural Network for Polyphonic Piano Music Transcription

We present a supervised neural network model for polyphonic piano music ...
research
06/27/2012

Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription

We investigate the problem of modeling symbolic sequences of polyphonic ...
research
02/14/2020

Phase reconstruction based on recurrent phase unwrapping with deep neural networks

Phase reconstruction, which estimates phase from a given amplitude spect...
research
10/24/2019

Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks

The present paper describes singing voice synthesis based on convolution...
research
08/16/2018

Improved Chord Recognition by Combining Duration and Harmonic Language Models

Chord recognition systems typically comprise an acoustic model that pred...
research
10/29/2018

A context encoder for audio inpainting

We studied the ability of deep neural networks (DNNs) to restore missing...

Please sign up or login with your details

Forgot password? Click here to reset