Exploiting Temporal Dependencies for Cross-Modal Music Piece Identification

05/26/2021
by Luis Carvalho, et al.

This paper addresses the problem of cross-modal musical piece identification and retrieval: finding the appropriate recording(s) in a database given a sheet music query, and vice versa, working directly with audio and scanned sheet music images. The fundamental approach is to use a deep neural network to learn a cross-modal embedding space with a suitable similarity structure for audio and sheet image snippets, and then to identify candidate pieces by cross-modal nearest-neighbour search in this space. However, this method is oblivious to the temporal aspects of music. In this paper, we introduce two strategies that address this shortcoming. First, we present a strategy that aligns sequences of embeddings learned from sheet music scans and audio snippets. A series of experiments on whole-piece and fragment-level retrieval on 24 hours' worth of classical piano recordings demonstrates significant improvement. Second, we show that retrieval can be further improved by adding an attention mechanism to the embedding learning model, which reduces the effects of tempo variations in music. To conclude, we assess the scalability of our method and discuss potential measures to make it suitable for truly large-scale applications.
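The idea of ranking pieces by aligning sequences of embeddings can be illustrated with a minimal sketch. It assumes that snippet embeddings have already been computed by some model and that dynamic time warping (DTW) with cosine distance serves as the alignment method; the abstract does not specify the alignment technique, so this choice, along with the function names `cosine_distance_matrix`, `dtw_cost`, and `rank_pieces`, is hypothetical and used only for illustration.

```python
import numpy as np


def cosine_distance_matrix(a, b):
    """Pairwise cosine distances between rows of a (n, d) and b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def dtw_cost(query, candidate):
    """Length-normalised DTW cost between two embedding sequences.

    Assumption: DTW stands in for the alignment step, which the
    abstract leaves unspecified.
    """
    dist = cosine_distance_matrix(query, candidate)
    n, m = dist.shape
    # acc[i, j] holds the cheapest cost of aligning the first i query
    # embeddings with the first j candidate embeddings.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)


def rank_pieces(query, database):
    """Return piece names sorted from lowest to highest alignment cost."""
    costs = {name: dtw_cost(query, seq) for name, seq in database.items()}
    return sorted(costs, key=costs.get)
```

Because DTW allows one embedding to be matched against several consecutive embeddings in the other sequence, a query played at a different tempo than the reference can still align at low cost, which a purely snippet-wise nearest-neighbour vote cannot express. The quadratic cost of the alignment is one reason scalability needs separate attention.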


research
02/12/2019

Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies

There has been a rapid growth of digitally available music data, includi...
research
09/21/2023

Passage Summarization with Recurrent Models for Audio-Sheet Music Retrieval

Many applications of cross-modal music retrieval are related to connecti...
research
09/21/2023

Towards Robust and Truly Large-Scale Audio-Sheet Music Retrieval

A range of applications of multi-modal music information retrieval is ce...
research
06/26/2019

Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval

Connecting large libraries of digitized audio recordings to their corres...
research
09/15/2018

Attention as a Perspective for Learning Tempo-invariant Audio Queries

Current models for audio--sheet music retrieval via multimodal embedding...
research
05/12/2023

Music Rearrangement Using Hierarchical Segmentation

Music rearrangement involves reshuffling, deleting, and repeating sectio...
research
04/22/2020

Towards Linking the Lakh and IMSLP Datasets

This paper investigates the problem of matching a MIDI file against a la...
