Composer's Assistant: Interactive Transformers for Multi-Track MIDI Infilling
We consider the task of multi-track MIDI infilling in which arbitrary (track, measure) pairs have been deleted from a contiguous slice of measures of a MIDI file. We train two T5-like models to solve this task: one using a basic MIDI-like event vocabulary, and one using a joined, word-like version of that vocabulary. We introduce a new test set, created from the Lakh MIDI dataset, consisting of 9 multi-track MIDI infilling tasks. We evaluate our models on these tasks and find that each model outperforms the other on a subset of the tasks. Our results have implications for the training of neural networks in other small-vocabulary domains, such as byte sequence modeling and protein sequence modeling. We release our source code, and we demonstrate that our models enable real-time human-computer interactive composition in the REAPER digital audio workstation.
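To make the task setup concrete, the infilling objective described above can be sketched in the style of T5 span corruption: each deleted (track, measure) cell is replaced in the input by a sentinel token, and the model is trained to emit the deleted content after the matching sentinel. The function name, token strings, and piece representation below are illustrative assumptions, not the paper's actual preprocessing code.

```python
def make_infilling_example(piece, masked):
    """Build a T5-style (input, target) token pair for multi-track MIDI infilling.

    piece:  dict mapping (track, measure) -> list of event tokens
            (token names here are hypothetical placeholders).
    masked: set of (track, measure) cells whose contents were deleted.
    """
    input_toks, target_toks = [], []
    sentinel_id = 0
    for cell in sorted(piece):
        tokens = piece[cell]
        if cell in masked:
            # Replace the deleted cell with a sentinel in the input,
            # and emit sentinel + original tokens in the target.
            sentinel = f"<extra_id_{sentinel_id}>"
            sentinel_id += 1
            input_toks.append(sentinel)
            target_toks.append(sentinel)
            target_toks.extend(tokens)
        else:
            # Unmasked cells pass through to the input unchanged.
            input_toks.extend(tokens)
    return input_toks, target_toks


# Toy example: two tracks, two measures; mask track 0, measure 1.
piece = {
    (0, 0): ["N60", "D4"],
    (0, 1): ["N62", "D4"],
    (1, 0): ["N40", "D8"],
}
inp, tgt = make_infilling_example(piece, masked={(0, 1)})
```

In this sketch the model sees `inp` (with `<extra_id_0>` standing in for the deleted cell) and learns to produce `tgt`, which pairs each sentinel with the deleted tokens.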