Modeling the Rhythm from Lyrics for Melody Generation of Pop Song

01/03/2023
by Daiyu Zhang, et al.

Creating a pop song melody according to pre-written lyrics is a typical practice for composers. A computational model of how lyrics are set as melodies is important for automatic composition systems, but an end-to-end lyric-to-melody model would require enormous amounts of paired training data. To mitigate the data constraints, we adopt a two-stage approach, dividing the task into lyric-to-rhythm and rhythm-to-melody modules. However, the lyric-to-rhythm task is still challenging due to its multimodality. In this paper, we propose a novel lyric-to-rhythm framework that includes part-of-speech tags to achieve better text setting, and a Transformer architecture designed to model long-term syllable-to-note associations. For the rhythm-to-melody task, we adapt a proven chord-conditioned melody Transformer, which has achieved state-of-the-art results. Experiments for Chinese lyric-to-melody generation show that the proposed framework is able to model key characteristics of rhythm and pitch distributions in the dataset, and in a subjective evaluation, the melodies generated by our system were rated as similar to or better than those of a state-of-the-art alternative.


