In Search of a Dataset for Handwritten Optical Music Recognition: Introducing MUSCIMA++

03/14/2017
by Jan Hajič Jr., et al.

Optical Music Recognition (OMR) has long lacked an adequate dataset and ground truth for evaluating OMR systems, which has been a major obstacle to establishing a state of the art in the field. Furthermore, machine learning methods require training data. We analyze how the OMR processing pipeline can be expressed in terms of gradually more complex ground truth and, based on this analysis, design the MUSCIMA++ dataset of handwritten music notation, which addresses musical symbol recognition and notation reconstruction. Version 0.9 of MUSCIMA++ consists of 140 pages of handwritten music, with 91,255 manually annotated notation symbols and 82,261 explicitly marked relationships between symbol pairs. The dataset allows training and evaluating models for symbol classification, symbol localization, and notation graph assembly, both in isolation and jointly. Open-source tools are provided for manipulating the dataset, visualizing the data, and annotating further material, and the dataset itself is available under an open license.
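
The abstract describes ground truth made of annotated symbols plus marked relationships between symbol pairs, which together form a notation graph. A minimal sketch of such a structure follows. All names here (`Symbol`, `NotationGraph`, `related_to`, the label strings, and the bounding-box layout) are hypothetical and chosen for illustration only; they are not the dataset's actual schema or the API of its tools, and treating relationships as directed is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Symbol:
    """One annotated notation symbol (hypothetical schema, not the dataset's own)."""
    symbol_id: int
    label: str                        # illustrative class name, e.g. "notehead-full"
    bbox: Tuple[int, int, int, int]   # assumed layout: (top, left, bottom, right) in pixels


@dataclass
class NotationGraph:
    """Symbols as nodes, pairwise relationships as directed edges (an assumption)."""
    symbols: Dict[int, Symbol] = field(default_factory=dict)
    edges: List[Tuple[int, int]] = field(default_factory=list)

    def add_symbol(self, symbol: Symbol) -> None:
        self.symbols[symbol.symbol_id] = symbol

    def add_relationship(self, source_id: int, target_id: int) -> None:
        # Only allow relationships between symbols that have been annotated.
        if source_id not in self.symbols or target_id not in self.symbols:
            raise KeyError("both endpoints must be annotated symbols")
        self.edges.append((source_id, target_id))

    def related_to(self, source_id: int) -> List[Symbol]:
        # Symbols reached by an edge that starts at source_id.
        return [self.symbols[t] for s, t in self.edges if s == source_id]


# Toy example with made-up coordinates: a notehead linked to its stem.
graph = NotationGraph()
graph.add_symbol(Symbol(0, "notehead-full", (120, 40, 132, 54)))
graph.add_symbol(Symbol(1, "stem", (80, 52, 126, 55)))
graph.add_relationship(0, 1)
print([s.label for s in graph.related_to(0)])
```

Under this sketch, symbol classification and localization would predict `label` and `bbox`, and notation graph assembly would predict `edges`, so the three tasks named in the abstract can be evaluated separately or jointly against the same structure.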
