MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

09/19/2023
by Xinda Wu, et al.

Pre-trained language models have achieved impressive results in a variety of music understanding and generation tasks. However, existing pre-training methods for symbolic melody generation struggle to capture the multi-scale, multi-dimensional structural information in note sequences, owing to the gap in domain knowledge between text and music. Moreover, the scarcity of large-scale symbolic melody datasets limits the improvements attainable through pre-training. In this paper, we propose MelodyGLM, a multi-task pre-training framework for generating melodies with long-term structure. We design melodic n-gram and long-span sampling strategies that create local and global blank-infilling tasks for modeling the local and global structure of melodies. Specifically, the melodic n-gram blank-infilling tasks incorporate pitch n-grams, rhythm n-grams, and their combinations to model the multi-dimensional structure of melodies. To address the data scarcity, we construct MelodyNet, a large-scale symbolic melody dataset containing more than 0.4 million melody pieces, which we use both for large-scale pre-training and for building the domain-specific n-gram lexicons. Both subjective and objective evaluations demonstrate that MelodyGLM surpasses standard and previous pre-training methods. In particular, subjective evaluations show that, on the melody continuation task, MelodyGLM achieves average improvements of 0.82, 0.87, 0.78, and 0.94 in consistency, rhythmicity, structure, and overall quality, respectively. Notably, MelodyGLM nearly matches the quality of human-composed melodies on the melody inpainting task.
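The abstract describes two corruption strategies: local blank infilling over melodic n-grams and global blank infilling over long spans. The sketch below is a minimal, hypothetical illustration of those two ideas, not the authors' implementation: the (pitch, duration) note encoding, the MASK token, the lexicon contents, and the masking ratio and span-length fractions are all placeholder assumptions, and the real framework mixes pitch, rhythm, and combined n-gram lexicons mined from MelodyNet.

```python
import random

MASK = "[MASK]"  # placeholder blank token, not the paper's actual vocabulary

def melodic_ngram_spans(notes, lexicon, n=3):
    """Find spans whose pitch or rhythm n-gram occurs in a domain lexicon.

    `notes` is a list of (pitch, duration) tuples; `lexicon` is a set of
    n-gram tuples (hypothetical here) mined from a large melody corpus,
    playing the role the abstract assigns to MelodyNet.
    """
    spans = []
    for i in range(len(notes) - n + 1):
        pitch_ngram = tuple(p for p, _ in notes[i:i + n])
        rhythm_ngram = tuple(d for _, d in notes[i:i + n])
        if pitch_ngram in lexicon or rhythm_ngram in lexicon:
            spans.append((i, i + n))
    return spans

def mask_melodic_ngrams(notes, lexicon, mask_ratio=0.15, n=3):
    """Local blank infilling: blank out musically meaningful n-grams so the
    model must reconstruct short local structures."""
    corrupted = list(notes)
    budget = max(1, int(len(notes) * mask_ratio))  # illustrative ratio
    for start, end in melodic_ngram_spans(notes, lexicon, n):
        if budget <= 0:
            break
        corrupted[start:end] = [MASK] * (end - start)
        budget -= end - start
    return corrupted

def mask_long_span(notes, min_frac=0.25, max_frac=0.5):
    """Global blank infilling: blank out one long contiguous span so the
    model must plan long-term structure around it."""
    span_len = max(1, random.randint(int(len(notes) * min_frac),
                                     int(len(notes) * max_frac)))
    start = random.randrange(len(notes) - span_len + 1)
    return notes[:start] + [MASK] * span_len + notes[start + span_len:]

if __name__ == "__main__":
    # Toy melody: (MIDI pitch, duration-in-beats) pairs.
    melody = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 1.0), (67, 1.0),
              (65, 0.5), (64, 0.5), (62, 1.0), (60, 2.0)]
    lexicon = {(60, 62, 64)}  # hypothetical frequent pitch trigram
    print(mask_melodic_ngrams(melody, lexicon))
    print(mask_long_span(melody))
```

In this sketch the n-gram spans come from a fixed lexicon lookup; building that lexicon, for example by frequency-filtering n-grams over a large corpus, is the role the abstract assigns to MelodyNet.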

Related research

06/24/2022 · MVP: Multi-task Supervised Pre-training for Natural Language Generation
Pre-trained language models (PLMs) have achieved notable success in natu...

07/10/2019 · LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
We are interested in the task of generating multi-instrumental music sco...

07/12/2021 · MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding
This paper presents an attempt to employ the mask language modeling appr...

12/15/2022 · Injecting Domain Knowledge in Language Models for Task-Oriented Dialogue Systems
Pre-trained language models (PLM) have advanced the state-of-the-art acr...

04/25/2023 · GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network
Music-driven 3D dance generation has become an intensive research topic ...

12/09/2020 · SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint
Automatic song writing aims to compose a song (lyric and/or melody) by m...

10/23/2020 · ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding
Coarse-grained linguistic information, such as named entities or phrases,...
