A Convolutional-Attentional Neural Framework for Structure-Aware Performance-Score Synchronization

04/19/2022
by Ruchit Agrawal, et al.

Performance-score synchronization is an integral task in signal processing that entails generating an accurate mapping between an audio recording of a performance and the corresponding musical score. Traditional synchronization methods compute alignments using knowledge-driven and stochastic approaches, and typically do not generalize well to different domains and modalities. We present a novel data-driven method for structure-aware performance-score synchronization. We propose a convolutional-attentional architecture trained with a custom loss based on time-series divergence. We conduct experiments on the audio-to-MIDI and audio-to-image alignment tasks, which pertain to different score modalities. We validate the effectiveness of our method via ablation studies and comparisons with state-of-the-art alignment approaches. We demonstrate that our approach outperforms previous synchronization methods in a variety of test settings across score modalities and acoustic conditions. Our method is also robust to structural differences between the performance and score sequences, a common limitation of standard alignment approaches.
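
To make the notion of a "mapping" between performance and score concrete, the following is a minimal sketch of classic dynamic time warping (DTW), used here only as an assumed example of a traditional knowledge-driven alignment technique. The abstract does not name DTW, and this is not the method proposed in the paper; the function name `dtw_path` and the toy one-dimensional features are hypothetical.

```python
import numpy as np

def dtw_path(perf, score):
    """Align two feature sequences (frames x dims) with textbook DTW.

    Illustrative baseline only; not the convolutional-attentional model.
    Returns the warping path as (perf_frame, score_frame) pairs and the
    accumulated cost of that path.
    """
    perf = np.asarray(perf, dtype=float)
    score = np.asarray(score, dtype=float)
    n, m = len(perf), len(score)

    # Pairwise Euclidean distance between every performance/score frame.
    cost = np.linalg.norm(perf[:, None, :] - score[None, :, :], axis=-1)

    # Accumulated cost with a padded border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance in both sequences
                acc[i - 1, j],      # advance in the performance only
                acc[i, j - 1],      # advance in the score only
            )

    # Backtrack from the end to the start along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1], float(acc[n, m])

# Toy example: the performance holds the first score event for two frames.
perf_feats = [[0.0], [0.0], [1.0], [2.0]]
score_feats = [[0.0], [1.0], [2.0]]
path, total = dtw_path(perf_feats, score_feats)
```

Because the warping path in this sketch is monotonic and continuous, it cannot skip or revisit score positions, so repeats or jumps in a performance cannot be represented. This is the kind of structural limitation of standard alignment approaches that the abstract refers to.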

research
05/31/2022

Towards Context-Aware Neural Performance-Score Synchronisation

Music can be represented in multiple forms, such as in the audio form as...
research
11/15/2020

Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment

Audio-to-score alignment aims at generating an accurate mapping between ...
research
01/31/2021

Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks

The identification of structural differences between a music performance...
research
09/30/2020

Rethinking Evaluation Methodology for Audio-to-Score Alignment

This paper offers a precise, formal definition of an audio-to-score alig...
research
07/18/2023

Plug the Leaks: Advancing Audio-driven Talking Face Generation by Preventing Unintended Information Flow

Audio-driven talking face generation is the task of creating a lip-synch...
research
01/24/2019

Multi-Frequency Phase Synchronization

We propose a novel formulation for phase synchronization -- the statisti...
research
10/12/2021

Are you doing what I say? On modalities alignment in ALFRED

ALFRED is a recently proposed benchmark that requires a model to complet...
