MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

09/15/2023
by Zihao Deng, et al.

Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains relatively unexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained frozen music audio model MERT with the frozen LLaMA language model, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from MusicCaps, tailored for open-ended music inquiries. Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs. Our introduced dataset enables notable advancements beyond previous ones.
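
The alignment idea in the abstract (one trainable projection between a frozen music encoder and a frozen language model) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature widths, the random stand-in arrays, the helper names `project_music` and `build_llm_input`, and the choice to prepend the projected music frames to the text embeddings are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only; not taken from the paper.
D_MUSIC = 8    # width of the frozen music encoder's output features
D_LLM = 16     # width of the frozen language model's token embeddings

rng = np.random.default_rng(0)

# The only trainable parameters in this sketch: a single linear projection.
W = rng.normal(scale=0.02, size=(D_MUSIC, D_LLM))
b = np.zeros(D_LLM)


def project_music(features: np.ndarray) -> np.ndarray:
    """Map (frames, D_MUSIC) music features into the LLM embedding space."""
    return features @ W + b


def build_llm_input(music_features: np.ndarray, text_embeddings: np.ndarray) -> np.ndarray:
    """Prepend projected music frames to the text prompt embeddings (an assumed layout)."""
    return np.concatenate([project_music(music_features), text_embeddings], axis=0)


# Random stand-ins for the frozen encoder output and the frozen LLM's text embeddings.
music_features = rng.normal(size=(5, D_MUSIC))
text_embeddings = rng.normal(size=(3, D_LLM))

llm_input = build_llm_input(music_features, text_embeddings)
print(llm_input.shape)
```

In a setup like this, only `W` and `b` would receive gradient updates during training, while both pre-trained models stay fixed.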

research
08/22/2023

Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning

Text-to-music generation (T2M-Gen) faces a major obstacle due to the sca...
research
01/03/2023

Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Automatic music generation with artificial intelligence typically requir...
research
06/18/2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

In the era of extensive intersection between art and Artificial Intellig...
research
06/23/2023

DISCO-10M: A Large-Scale Music Dataset

Music datasets play a crucial role in advancing research in machine lear...
research
07/29/2020

Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining

This paper studies composer style classification of piano sheet music im...
research
07/12/2021

Codified audio language modeling learns useful representations for music information retrieval

We demonstrate that language models pre-trained on codified (discretely-...
research
03/16/2020

TensorFlow Audio Models in Essentia

Essentia is a reference open-source C++/Python library for audio and mus...
