Modeling Beats and Downbeats with a Time-Frequency Transformer

05/29/2022
by Yun-Ning Hung, et al.

The Transformer is a successful deep neural network (DNN) architecture that has shown its versatility not only in natural language processing but also in music information retrieval (MIR). In this paper, we present a novel Transformer-based approach to beat and downbeat tracking. This approach employs SpecTNT (Spectral-Temporal Transformer in Transformer), a Transformer variant that models both the spectral and temporal dimensions of a time-frequency representation of music audio. A SpecTNT model uses a stack of blocks, each of which consists of two levels of Transformer encoders. The lower-level (or spectral) encoder handles the spectral features and enables the model to attend to the harmonic components of each frame. Since downbeats indicate bar boundaries and are often accompanied by harmonic changes, this step may help downbeat modeling. The upper-level (or temporal) encoder aggregates useful local spectral information in order to attend to beat and downbeat positions. We also propose an architecture that combines SpecTNT with a state-of-the-art model, the Temporal Convolutional Network (TCN), to further improve performance. Extensive experiments demonstrate that our approach can significantly outperform TCN in downbeat tracking while maintaining comparable results in beat tracking.
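The two-level structure described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`self_attention`, `spectnt_like_block`), the weight-free single-head attention, the residual connections, and the use of mean pooling to summarize each frame are all assumptions made for illustration. It only shows the order of operations: attention across frequency bins within each frame, followed by attention across frames.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with no learned weights.

    tokens has shape (..., n_tokens, dim); attention runs over n_tokens.
    A real encoder would add learned query/key/value projections,
    multiple heads, layer normalization and a feed-forward layer.
    """
    dim = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(dim)
    return softmax(scores, axis=-1) @ tokens

def spectnt_like_block(x):
    """Hypothetical two-level block; x has shape (n_frames, n_freq_bins, dim)."""
    # Lower-level (spectral) step: within each frame, frequency bins
    # attend to one another.
    spectral = x + self_attention(x)
    # Assumption: summarize each frame by mean pooling over frequency.
    frame_summary = spectral.mean(axis=1)  # (n_frames, dim)
    # Upper-level (temporal) step: frame summaries attend across time.
    temporal = frame_summary + self_attention(frame_summary)
    return spectral, temporal

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 4))  # 8 frames, 16 frequency bins, 4 features
spectral, temporal = spectnt_like_block(x)
print(spectral.shape, temporal.shape)
```

In a beat tracker built along these lines, a per-frame output layer on top of `temporal` would produce beat and downbeat activations; that layer is omitted here.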

