MuLan: A Joint Embedding of Music Audio and Natural Language

08/26/2022, by Qingqing Huang, et al.

Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan: a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedding model trained using 44 million music recordings (370K hours) and weakly-associated, free-form text annotations. Through its compatibility with a wide range of music genres and text styles (including conventional music tags), the resulting audio-text representation subsumes existing ontologies while graduating to true zero-shot functionalities. We demonstrate the versatility of the MuLan embeddings with a range of experiments including transfer learning, zero-shot music tagging, language understanding in the music domain, and cross-modal retrieval applications.
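To make the two-tower idea concrete, here is a minimal sketch of a joint audio-text embedding trained with a symmetric contrastive objective, in the spirit of the model the abstract describes. The linear "towers", dimensions, batch size, and temperature below are illustrative assumptions; MuLan's actual towers are deep audio and text encoders trained on 44M recordings.

```python
import numpy as np

# Assumption: linear stand-ins for the audio and text towers; MuLan uses
# deep encoders. All sizes here are arbitrary, for illustration only.
rng = np.random.default_rng(0)
AUDIO_DIM, TEXT_DIM, EMBED_DIM, BATCH = 128, 64, 32, 8

W_audio = rng.normal(size=(AUDIO_DIM, EMBED_DIM)) / np.sqrt(AUDIO_DIM)
W_text = rng.normal(size=(TEXT_DIM, EMBED_DIM)) / np.sqrt(TEXT_DIM)

def embed(x, W):
    """Project one modality into the shared space and L2-normalize."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def logsumexp_rows(x):
    """Numerically stable log-sum-exp over each row, keeping dims."""
    m = x.max(axis=1, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=1, keepdims=True))

def symmetric_contrastive_loss(audio, text, temperature=0.07):
    """InfoNCE loss in both directions: audio->text and text->audio."""
    a = embed(audio, W_audio)            # (BATCH, EMBED_DIM), unit norm
    t = embed(text, W_text)              # (BATCH, EMBED_DIM), unit norm
    logits = (a @ t.T) / temperature     # scaled cosine similarity matrix
    # The i-th audio clip is weakly paired with the i-th text annotation.
    idx = np.arange(len(logits))
    log_p_a2t = logits - logsumexp_rows(logits)
    log_p_t2a = logits.T - logsumexp_rows(logits.T)
    return -(log_p_a2t[idx, idx].mean() + log_p_t2a[idx, idx].mean()) / 2

audio_batch = rng.normal(size=(BATCH, AUDIO_DIM))
text_batch = rng.normal(size=(BATCH, TEXT_DIM))
loss = symmetric_contrastive_loss(audio_batch, text_batch)
print(float(loss))
```

Because both modalities land in the same normalized space, tagging and retrieval reduce to cosine similarity between an audio embedding and the embeddings of free-form text queries, which is what enables the zero-shot uses the abstract lists.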


Related research

- 08/25/2022 · Contrastive Audio-Language Learning for Music
  "As one of the most intuitive interfaces known to humans, natural languag..."
- 08/24/2023 · Emotion-Aligned Contrastive Learning Between Images and Music
  "Traditional music search engines rely on retrieval methods that match na..."
- 07/05/2019 · Zero-shot Learning for Audio-based Music Classification and Tagging
  "Audio-based music classification and tagging is typically based on categ..."
- 06/20/2019 · Zero-shot Learning and Knowledge Transfer in Music Classification and Tagging
  "Music classification and tagging is conducted through categorical superv..."
- 11/26/2022 · Toward Universal Text-to-Music Retrieval
  "This paper introduces effective design choices for text-to-music retriev..."
- 09/20/2023 · Investigating Personalization Methods in Text to Music Generation
  "In this work, we investigate the personalization of text-to-music diffus..."
- 11/26/2021 · Emotion Embedding Spaces for Matching Music to Stories
  "Content creators often use music to enhance their stories, as it can be ..."
