SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias

by Zihao Wang, et al.

Real-time music accompaniment generation has a wide range of applications in the music industry, such as music education and live performances. However, automatic real-time music accompaniment generation is still understudied and often faces a trade-off between logical latency and exposure bias. In this paper, we propose SongDriver, a real-time music accompaniment generation system with neither logical latency nor exposure bias. Specifically, SongDriver divides one accompaniment generation task into two phases: 1) The arrangement phase, where a Transformer model first arranges chords for the input melodies in real-time and caches the chords for the next phase instead of playing them out. 2) The prediction phase, where a CRF model generates playable multi-track accompaniments for the coming melodies based on the previously cached chords. With this two-phase strategy, SongDriver directly generates the accompaniment for the upcoming melody, achieving zero logical latency. Furthermore, when predicting chords for a timestep, SongDriver refers to the cached chords from the first phase rather than to its own previous predictions, which avoids the exposure bias problem. Since the input length is often constrained under real-time conditions, another potential problem is the loss of long-term sequential information. To compensate for this disadvantage, we extract four musical features from a long-term music piece before the current time step as global information. In the experiments, we train SongDriver on several open-source datasets and on an original aiSong Dataset built from Chinese-style modern pop music scores. The results show that SongDriver outperforms existing state-of-the-art (SOTA) models on both objective and subjective metrics, while significantly reducing the physical latency.
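The two-phase strategy above can be illustrated with a minimal sketch. The stub functions below are hypothetical stand-ins for the paper's Transformer (arrangement phase) and CRF (prediction phase); the point is the caching discipline: at each timestep the system plays an accompaniment derived from a chord cached at the previous step, so no model computation blocks playback, and the prediction phase always conditions on cached chords rather than on its own prior outputs.

```python
from collections import deque

def arrange_chord(melody_frame):
    # Phase 1 stub: the Transformer would arrange a chord for the
    # incoming melody frame; here we just return a placeholder.
    return f"chord_for_{melody_frame}"

def predict_accompaniment(cached_chord):
    # Phase 2 stub: the CRF would generate a playable multi-track
    # accompaniment from the *cached* chord, never from its own
    # earlier predictions, which is what avoids exposure bias.
    return f"accomp_from_{cached_chord}"

def run(melody_stream):
    cache = deque()  # chords arranged but not yet played
    output = []
    for frame in melody_stream:
        # Play immediately: the accompaniment for this frame comes
        # from the chord cached at the previous step, so playback is
        # never blocked by generation (zero logical latency).
        if cache:
            output.append(predict_accompaniment(cache.popleft()))
        # Arrange a chord for the upcoming melody and cache it.
        cache.append(arrange_chord(frame))
    return output

print(run(["m1", "m2", "m3"]))
```

The deque here is only a scheduling device; in the real system both phases run as learned models and the first frame would be handled by a warm-up step before playback begins.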


