SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

12/09/2020
by   Zhonghao Sheng, et al.

Automatic song writing aims to compose a song (lyric and/or melody) by machine, and is an active topic in both academia and industry. Its two core tasks, lyric-to-melody generation and melody-to-lyric generation, usually face the same two challenges: 1) paired lyric and melody data are limited, which hurts generation quality, because the weak correlation between lyric and melody means that a large amount of paired training data is needed; 2) strict alignment is required between lyric and melody, which calls for explicit alignment modeling. In this paper, we propose SongMASS to address these challenges. SongMASS leverages masked sequence-to-sequence (MASS) pre-training and attention-based alignment modeling for lyric-to-melody and melody-to-lyric generation. Specifically, 1) we extend the original sentence-level MASS pre-training to the song level to better capture long contextual information in music, and use a separate encoder and decoder for each modality (lyric or melody); 2) we apply a sentence-level attention mask and a token-level attention constraint during training to enhance the alignment between lyric and melody. During inference, we use a dynamic programming strategy to obtain the alignment between each word/syllable in the lyric and each note in the melody. We pre-train SongMASS on unpaired lyric and melody datasets, and both objective and subjective evaluations demonstrate that SongMASS generates lyrics and melodies of significantly better quality than the baseline method without pre-training or the alignment constraint.
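
To make the sentence-level attention mask concrete, the following is a minimal sketch and not the authors' implementation. It assumes that every lyric token and every melody note carries the index of the sentence it belongs to, and that a target position may attend only to source positions in the same sentence. The function name `sentence_attention_mask` and the example sentence indices are hypothetical.

```python
from typing import List


def sentence_attention_mask(
    src_sent_ids: List[int], tgt_sent_ids: List[int]
) -> List[List[bool]]:
    """Build a boolean mask of shape [len(tgt), len(src)].

    mask[t][s] is True when target position t is allowed to attend to
    source position s, i.e. when both lie in the same sentence.
    This same-sentence rule is an assumption made for illustration.
    """
    return [[t_id == s_id for s_id in src_sent_ids] for t_id in tgt_sent_ids]


# Hypothetical lyric-to-melody example:
# lyric (source) has two sentences with 3 and 2 tokens,
# melody (target) has two sentences with 4 and 2 notes.
lyric_sent_ids = [0, 0, 0, 1, 1]
melody_sent_ids = [0, 0, 0, 0, 1, 1]

mask = sentence_attention_mask(lyric_sent_ids, melody_sent_ids)

for row in mask:
    # Print 1 where attention is allowed and 0 where it is blocked.
    print("".join("1" if allowed else "0" for allowed in row))
```

In an attention layer, positions where the mask is False would typically have their attention scores set to a large negative value before the softmax, so that each melody sentence is generated from its own lyric sentence only.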

