Towards Linking the Lakh and IMSLP Datasets
This paper investigates the problem of matching a MIDI file against a large database of piano sheet music images. Previous sheet-audio and sheet-MIDI alignment approaches have primarily focused on a 1-to-1 alignment task, which is not a scalable solution for retrieval from large databases. We propose a method for scalable cross-modal retrieval that might be used to link the Lakh MIDI dataset with IMSLP sheet music data. Our approach is to modify a previously proposed feature representation called a symbolic bootleg score to be suitable for hashing. On a database of 5,000 piano scores containing 55,000 individual sheet music images, our system achieves a mean reciprocal rank of 0.84 and an average retrieval time of 25.4 seconds.
READ FULL TEXT