Composer's Assistant: Interactive Transformers for Multi-Track MIDI Infilling

01/29/2023
by Martin E. Malandro

We consider the task of multi-track MIDI infilling when the contents of arbitrary (track, measure) pairs have been deleted from a contiguous slice of measures of a MIDI file. We train two T5-like models to solve this task: one using a basic MIDI-like event vocabulary, and one using a joined, word-like version of that vocabulary. We introduce a new test set, created from the Lakh MIDI dataset, consisting of 9 multi-track MIDI infilling tasks. We evaluate our models on these tasks and find that each model outperforms the other on a subset of them. Our results have implications for the training of neural networks in other small-vocabulary domains, such as byte sequence modeling and protein sequence modeling. We release our source code, and we demonstrate that our models enable real-time human-computer interactive composition in the REAPER digital audio workstation.
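
For illustration, the following is a minimal sketch of how a T5-style (track, measure) infilling example might be constructed, and how basic event tokens might be fused into a joined, word-like vocabulary. All token names, the data layout, and the joining rule are illustrative assumptions and do not reflect the paper's actual vocabulary or code.

```python
# Hypothetical sketch of T5-style (track, measure) infilling example
# construction for a multi-track score. Token names, data layout, and the
# joining rule are illustrative assumptions, not the paper's vocabulary.

# Toy score: song[track][measure] is a list of basic MIDI-like event tokens.
song = {
    "piano": [
        ["note_on_60", "dur_4", "note_on_64", "dur_4"],  # measure 0
        ["note_on_67", "dur_8"],                          # measure 1
    ],
    "bass": [
        ["note_on_36", "dur_2"],                          # measure 0
        ["note_on_43", "dur_2"],                          # measure 1
    ],
}


def build_infilling_example(song, masked_cells):
    """Return (source, target) token lists in which every (track, measure)
    cell in masked_cells is replaced by a sentinel token in the source and
    its original events are moved to the target after that sentinel."""
    source, target = [], []
    sentinel_id = 0
    for track, measures in song.items():
        source.append(f"track_{track}")
        for m, events in enumerate(measures):
            source.append(f"measure_{m}")
            if (track, m) in masked_cells:
                sentinel = f"<mask_{sentinel_id}>"
                sentinel_id += 1
                source.append(sentinel)
                target.append(sentinel)
                target.extend(events)
            else:
                source.extend(events)
    return source, target


def join_events(events, group_size=2):
    """Toy 'joined, word-like' vocabulary: fuse fixed-size runs of basic
    event tokens into single longer tokens (e.g. 'note_on_60+dur_4')."""
    return ["+".join(events[i:i + group_size])
            for i in range(0, len(events), group_size)]


if __name__ == "__main__":
    src, tgt = build_infilling_example(
        song, masked_cells={("piano", 1), ("bass", 0)})
    print("source:", src)
    print("target:", tgt)
    print("joined events, piano measure 0:", join_events(song["piano"][0]))
```

In this sketch, each masked (track, measure) cell is replaced by a sentinel in the encoder input and its events are emitted after that sentinel in the decoder target, mirroring T5's span-corruption format. Joining basic events into longer tokens trades a larger vocabulary for shorter sequences, which appears to be the trade-off the two vocabularies are designed to probe.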
