Automatic Piano Transcription with Hierarchical Frequency-Time Transformer

07/10/2023
by   Keisuke Toyama, et al.
0

Taking long-term spectral and temporal dependencies into account is essential for automatic piano transcription. This is especially helpful when determining the precise onset and offset for each note in the polyphonic piano content. In this case, we may rely on the capability of self-attention mechanism in Transformers to capture these long-term dependencies in the frequency and time axes. In this work, we propose hFT-Transformer, which is an automatic music transcription method that uses a two-level hierarchical frequency-time Transformer architecture. The first hierarchy includes a convolutional block in the time axis, a Transformer encoder in the frequency axis, and a Transformer decoder that converts the dimension in the frequency axis. The output is then fed into the second hierarchy which consists of another Transformer encoder in the time axis. We evaluated our method with the widely used MAPS and MAESTRO v3.0.0 datasets, and it demonstrated state-of-the-art performance on all the F1-scores of the metrics among Frame, Note, Note with Offset, and Note with Offset and Velocity estimations.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/18/2021

SpecTNT: a Time-Frequency Transformer for Music Audio

Transformers have drawn attention in the MIR field for their remarkable ...
research
04/08/2022

Exploring Transformer's potential on automatic piano transcription

Most recent research about automatic music transcription (AMT) uses conv...
research
05/29/2022

Modeling Beats and Downbeats with a Time-Frequency Transformer

Transformer is a successful deep neural network (DNN) architecture that ...
research
10/20/2020

The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

Most of the state-of-the-art automatic music transcription (AMT) models ...
research
02/06/2022

Pole recovery from noisy data on imaginary axis

This note proposes an algorithm for identifying the poles and residues o...
research
03/07/2023

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer

In this paper, we propose an effective sound event detection (SED) metho...
research
10/29/2020

T-vectors: Weakly Supervised Speaker Identification Using Hierarchical Transformer Model

Identifying multiple speakers without knowing where a speaker's voice is...

Please sign up or login with your details

Forgot password? Click here to reset