Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset

04/01/2020
by Lee Callender, et al.

Classifier metrics, such as accuracy and F-measure score, often serve as proxies for performance in downstream tasks. For the case of generative systems that use predicted labels as inputs, accuracy is a good proxy only if it aligns with the perceptual quality of generated outputs. Here, we demonstrate this effect using the example of automatic drum transcription (ADT). We optimize classifiers for downstream generation by predicting expressive dynamics (velocity) and show with listening tests that they produce outputs with improved perceptual quality, despite achieving similar results on classification metrics. To train expressive ADT models, we introduce the Expanded Groove MIDI dataset (E-GMD), a large dataset of human drum performances, with audio recordings annotated in MIDI. E-GMD contains 444 hours of audio from 43 drum kits and is an order of magnitude larger than similar datasets. It is also the first human-performed drum dataset with annotations of velocity. We make this new dataset available under a Creative Commons license along with open source code for training and a pre-trained model for inference.
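The gap between classification metrics and expressive quality can be illustrated with a toy example. The sketch below is not the authors' evaluation code; it assumes a simple greedy onset matcher, a 50 ms tolerance window, and made-up note data, all chosen only for illustration. It compares two hypothetical transcriptions that detect every onset correctly, so both get the same F-measure, but differ in how well they recover velocity.

```python
from typing import List, Tuple

# A note is (onset in seconds, MIDI drum pitch, MIDI velocity 1-127).
Note = Tuple[float, int, int]


def match_notes(ref: List[Note], est: List[Note], tol: float = 0.05):
    """Greedily pair each reference note with the closest unused estimate
    of the same pitch whose onset lies within `tol` seconds (assumed value)."""
    used = set()
    pairs = []
    for r in ref:
        best = None  # (time difference, index into est)
        for j, e in enumerate(est):
            if j in used or e[1] != r[1]:
                continue
            dt = abs(e[0] - r[0])
            if dt <= tol and (best is None or dt < best[0]):
                best = (dt, j)
        if best is not None:
            used.add(best[1])
            pairs.append((r, est[best[1]]))
    return pairs


def f_measure(ref: List[Note], est: List[Note], tol: float = 0.05) -> float:
    """Onset F-measure: harmonic mean of precision and recall over matched notes."""
    tp = len(match_notes(ref, est, tol))
    if tp == 0:
        return 0.0
    precision = tp / len(est)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def velocity_mae(ref: List[Note], est: List[Note], tol: float = 0.05) -> float:
    """Mean absolute velocity error over matched notes only."""
    pairs = match_notes(ref, est, tol)
    if not pairs:
        return float("nan")
    return sum(abs(r[2] - e[2]) for r, e in pairs) / len(pairs)


# Made-up data: General MIDI pitch 36 is a kick drum, 38 is a snare.
reference = [(0.0, 36, 110), (0.5, 38, 40), (1.0, 36, 100), (1.5, 38, 120)]
# Hypothetical model A: correct onsets, one fixed velocity for every hit.
flat = [(t, p, 100) for (t, p, _) in reference]
# Hypothetical model B: correct onsets, velocities close to the performance.
expressive = [(0.0, 36, 105), (0.5, 38, 45), (1.0, 36, 95), (1.5, 38, 115)]

for name, est in [("flat", flat), ("expressive", expressive)]:
    print(name, f_measure(reference, est), velocity_mae(reference, est))
```

Both transcriptions score an F-measure of 1.0 here, yet the fixed-velocity one turns the quiet snare hit at 0.5 s into a loud one. A classification metric alone cannot see that difference, while a velocity-aware measure or a listening test can.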


research
05/16/2022

Perceptual Evaluation on Audio-visual Dataset of 360 Content

To open up new possibilities to assess the multimodal perceptual quality...
research
10/26/2022

AVES: Animal Vocalization Encoder based on Self-Supervision

The lack of annotated training data in bioacoustics hinders the use of l...
research
09/10/2020

ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets and Testing Framework

The ICASSP 2021 Acoustic Echo Cancellation Challenge is intended to stim...
research
05/04/2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec

Audio codec models are widely used in audio communication as a crucial t...
research
07/23/2023

Downstream-agnostic Adversarial Examples

Self-supervised learning usually uses a large amount of unlabeled data t...
research
08/31/2022

Evaluating generative audio systems and their metrics

Recent years have seen considerable advances in audio synthesis with dee...
research
04/01/2019

HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

Generative models often use human evaluations to measure the perceived q...
