Play as You Like: Timbre-enhanced Multi-modal Music Style Transfer

by Chien-Yu Lu, et al.

Style transfer of polyphonic music recordings is challenging because it requires modeling diverse, imaginative, and reasonable music pieces in a style different from the original one. To achieve this, it is critical to learn, in an unsupervised manner, stable multi-modal representations of both the domain-variant (i.e., style) and the domain-invariant (i.e., content) information of music. In this paper, we propose an unsupervised music style transfer method that does not require parallel data. To characterize the multi-modal distribution of music pieces, we adopt the Multi-modal Unsupervised Image-to-Image Translation (MUNIT) framework in the proposed system, which allows one to generate diverse outputs from the learned latent distributions representing content and style. Moreover, to better capture the granularity of sound, such as the perceptual dimensions of timbre and the nuances of instrument-specific performance, cognitively plausible features, including mel-frequency cepstral coefficients (MFCC), spectral difference, and spectral envelope, are combined with the widely used mel-spectrogram into a timbre-enhanced multi-channel input representation. The Relativistic average Generative Adversarial Network (RaGAN) is also used to achieve fast convergence and high stability. We conduct experiments on bilateral style transfer tasks among three different genres, namely piano solo, guitar solo, and string quartet. Results demonstrate the advantages of the proposed method in music style transfer, with improved sound quality and the ability for users to manipulate the output.
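The RaGAN objective named above scores each sample relative to the average critic score of the opposite class, rather than in isolation. The following is a minimal sketch of the standard sigmoid cross-entropy form of the relativistic average loss, written in plain Python over lists of raw critic outputs. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which RaGAN variant or discriminator architecture the paper uses, and the names `ragan_losses` and `_log_sigmoid` are hypothetical.

```python
import math
from statistics import fmean


def _log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)); avoids overflow for large |z|.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def ragan_losses(real_logits, fake_logits):
    """Hypothetical sketch of relativistic average GAN losses
    (sigmoid cross-entropy variant; the paper's exact variant is assumed).

    real_logits / fake_logits: raw (pre-sigmoid) critic outputs C(x)
    for a batch of real and generated samples.
    Returns (discriminator_loss, generator_loss).
    """
    mean_real = fmean(real_logits)
    mean_fake = fmean(fake_logits)
    # Relativistic logits: how much more "real" a sample looks than the
    # average sample of the opposite kind.
    rel_real = [c - mean_fake for c in real_logits]
    rel_fake = [c - mean_real for c in fake_logits]
    # Uses the identity log(1 - sigmoid(z)) == log(sigmoid(-z)).
    d_loss = (-fmean(_log_sigmoid(z) for z in rel_real)
              - fmean(_log_sigmoid(-z) for z in rel_fake))
    # The generator loss swaps the roles of real and fake samples.
    g_loss = (-fmean(_log_sigmoid(z) for z in rel_fake)
              - fmean(_log_sigmoid(-z) for z in rel_real))
    return d_loss, g_loss
```

When the critic cannot tell real from fake (all logits equal), both losses equal 2·ln 2. When it separates them confidently, the discriminator loss approaches zero while the generator loss grows. Because the generator loss also depends on the real samples' scores, the generator receives gradient signal from both halves of the batch, which is the property usually credited for RaGAN's stability.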

