Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

05/12/2021
by Ryoto Ishizuka, et al.

This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal, in contrast to most conventional ADT methods, which estimate the frame-level onset probabilities of drums. To estimate a tatum-level score, we propose a deep transcription model that consists of a frame-level encoder for extracting latent features from a music signal and a tatum-level decoder for estimating a drum score from the latent features pooled at the tatum level. To capture the global repetitive structure of drum scores, which is difficult to learn with a recurrent neural network (RNN), we introduce a self-attention mechanism with tatum-synchronous positional encoding into the decoder. To mitigate the difficulty of training the self-attention-based model from an insufficient amount of paired data and to improve the musical naturalness of the estimated scores, we propose a regularized training method that uses a global structure-aware masked language (score) model with a self-attention mechanism pretrained on an extensive collection of drum scores. Experimental results showed that the proposed regularized model outperformed the conventional RNN-based model in terms of the tatum-level error rate and the frame-level F-measure, even when only a limited amount of paired data was available, a condition under which the non-regularized model underperformed the RNN-based model.
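As a rough illustration of the step between the frame-level encoder and the tatum-level decoder, the sketch below pools frame-level features within each tatum and adds a positional encoding indexed by tatum number. It is not the authors' implementation: the abstract does not name the pooling operator or the exact form of the tatum-synchronous positional encoding, so mean pooling and the standard sinusoidal encoding are assumptions here, and the names `pool_to_tatums`, `sinusoidal_encoding`, `frames`, and `boundaries` are hypothetical.

```python
import math


def pool_to_tatums(frames, boundaries):
    """Average frame-level feature vectors within each tatum interval.

    frames: list of equal-length feature vectors, one per frame.
    boundaries: increasing frame indices; tatum k spans
                frames[boundaries[k]:boundaries[k + 1]].
    Mean pooling is an assumption made for this sketch.
    """
    pooled = []
    for start, end in zip(boundaries, boundaries[1:]):
        if end <= start:
            raise ValueError("tatum boundaries must be strictly increasing")
        segment = frames[start:end]
        dim = len(segment[0])
        pooled.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return pooled


def sinusoidal_encoding(num_positions, dim):
    """Standard sinusoidal positional encoding, indexed here by tatum number.

    Using the sinusoidal form is an assumption made for this sketch.
    """
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Toy input: five frames of 2-dimensional features and two tatums
# covering frames 0-1 and frames 2-4.
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
boundaries = [0, 2, 5]

tatum_features = pool_to_tatums(frames, boundaries)
encoding = sinusoidal_encoding(len(tatum_features), 2)

# Position information is added per tatum, not per frame.
decoder_input = [
    [x + p for x, p in zip(feat, pe)]
    for feat, pe in zip(tatum_features, encoding)
]
```

Because the sequence handed to the decoder has one element per tatum rather than per frame, repeated rhythmic patterns fall at regular position offsets, which is plausibly what lets a self-attention layer relate them across the whole piece.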


research
10/08/2020

Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training

This paper describes a neural drum transcription method that detects fro...
research
11/11/2019

Visualizing and Understanding Self-attention based Music Tagging

Recently, we proposed a self-attention based music tagging model. Differ...
research
09/17/2020

Temporally Guided Music-to-Body-Movement Generation

This paper presents a neural network model to generate virtual violinist...
research
06/12/2019

Toward Interpretable Music Tagging with Self-Attention

Self-attention is an attention mechanism that learns a representation by...
research
12/28/2022

Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism

This paper proposes a novel sequence-to-sequence (seq2seq) model with a ...
research
08/22/2021

Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers

Recurrent neural network transducers (RNN-T) are a promising end-to-end ...
research
06/12/2023

Recurrent Attention Networks for Long-text Modeling

Self-attention-based models have achieved remarkable progress in short-t...
