Passage Summarization with Recurrent Models for Audio-Sheet Music Retrieval

09/21/2023
by Luis Carvalho, et al.

Many applications of cross-modal music retrieval involve connecting sheet music images to audio recordings. A typical recent approach is to learn, via deep neural networks, a joint embedding space that correlates short fixed-size snippets of audio and sheet music by means of an appropriate similarity structure. However, this strategy raises two challenges: it requires strongly aligned data to train the networks, and local and global tempo differences cause inherent discrepancies in musical content between audio and sheet music snippets. In this paper, we address these two shortcomings by designing a cross-modal recurrent network that learns joint embeddings that can summarize longer passages of corresponding audio and sheet music. Our method has two benefits: it requires only weakly aligned audio-sheet music pairs, and the recurrent network handles the non-linearities caused by tempo variations between audio and sheet music. We conduct a number of experiments on synthetic and real piano data and scores, showing that our proposed recurrent method leads to more accurate retrieval in all possible configurations.

research
09/21/2023

Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems

Linking sheet music images to audio recordings remains a key problem for...
research
09/21/2023

Towards Robust and Truly Large-Scale Audio-Sheet Music Retrieval

A range of applications of multi-modal music information retrieval is ce...
research
06/26/2019

Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval

Connecting large libraries of digitized audio recordings to their corres...
research
05/26/2021

Exploiting Temporal Dependencies for Cross-Modal Music Piece Identification

This paper addresses the problem of cross-modal musical piece identifica...
research
09/15/2018

Attention as a Perspective for Learning Tempo-invariant Audio Queries

Current models for audio-sheet music retrieval via multimodal embedding...
research
11/24/2017

Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval

Little research focuses on cross-modal correlation learning where tempor...
research
04/21/2020

MIDI Passage Retrieval Using Cell Phone Pictures of Sheet Music

This paper investigates a cross-modal retrieval problem in which a user ...