LP-MusicCaps: LLM-Based Pseudo Music Captioning

07/31/2023
by Seungheon Doh, et al.

Automatic music captioning, which generates natural language descriptions for given music tracks, holds significant potential for improving the understanding and organization of large volumes of musical data. Despite its importance, progress is hampered by data scarcity: existing music-language datasets are limited in size because collecting them is costly and time-consuming. To address this issue, we propose using large language models (LLMs) to artificially generate description sentences from large-scale tag datasets. This yields approximately 2.2M captions paired with 0.5M audio clips. We call the result the Large Language Model based Pseudo music caption dataset, or LP-MusicCaps for short. We conduct a systematic evaluation of this large-scale music captioning dataset using various quantitative evaluation metrics from the field of natural language processing, as well as human evaluation. In addition, we train a transformer-based music captioning model on the dataset and evaluate it under zero-shot and transfer-learning settings. The results demonstrate that our proposed approach outperforms the supervised baseline model.
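To make the tag-to-caption idea concrete, here is a minimal sketch of the data-preparation side only, assuming a simple instruction-style prompt. The instruction wording, `build_caption_prompt`, and `pair_captions` are hypothetical illustrations, not the prompts or code used for LP-MusicCaps, and the actual LLM call is deliberately omitted. Since the dataset has roughly 2.2M captions for 0.5M clips (about 4.4 captions per clip), the sketch pairs one clip with several generated captions.

```python
from typing import Dict, List

# Hypothetical instruction text; the actual LP-MusicCaps prompts are not given here.
INSTRUCTION = (
    "Write a single-sentence description of a song that has the following tags: "
)


def build_caption_prompt(tags: List[str]) -> str:
    """Turn a clip's music tags into an LLM prompt (illustrative only)."""
    # Normalize tags: trim whitespace, lowercase, drop empties,
    # and de-duplicate while preserving the original order.
    seen = set()
    cleaned = []
    for tag in tags:
        t = tag.strip().lower()
        if t and t not in seen:
            seen.add(t)
            cleaned.append(t)
    return INSTRUCTION + ", ".join(cleaned) + "."


def pair_captions(clip_id: str, captions: List[str]) -> List[Dict[str, str]]:
    """Pair each generated caption with the audio clip it describes."""
    # One clip may receive several pseudo captions (e.g. from different prompts).
    return [{"clip_id": clip_id, "caption": c} for c in captions]


if __name__ == "__main__":
    prompt = build_caption_prompt(["Rock", " electric guitar", "male vocal", "rock"])
    print(prompt)
    # The prompt would then be sent to an LLM; that call is not shown here.
```

Running the script prints a prompt listing the tags `rock, electric guitar, male vocal` once each, in their original order.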

research
08/17/2016

Towards Music Captioning: Generating Music Playlist Descriptions

Descriptions are often provided along with recommendations to help users...
research
03/30/2023

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

The advancement of audio-language (AL) multimodal learning tasks has bee...
research
08/22/2023

Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning

Text-to-music generation (T2M-Gen) faces a major obstacle due to the sca...
research
04/24/2021

MusCaps: Generating Captions for Music Audio

Content-based music information retrieval has seen rapid progress with t...
research
05/15/2023

A Whisper transformer for audio captioning trained with synthetic captions and transfer learning

The field of audio captioning has seen significant advancements in recen...
research
07/21/2021

JS Fake Chorales: a Synthetic Dataset of Polyphonic Music with Human Annotation

High quality datasets for learning-based modelling of polyphonic symboli...
research
11/04/2021

MT3: Multi-Task Multitrack Music Transcription

Automatic Music Transcription (AMT), inferring musical notes from raw au...
