A Large-scale Dataset for Audio-Language Representation Learning

09/20/2023
by Luoyi Sun, et al.

The AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets. However, in the audio representation learning community, existing audio-language datasets suffer from limitations such as insufficient volume, simplistic content, and arduous collection procedures. To tackle these challenges, we present an automatic audio caption generation pipeline built on a series of public tools or APIs, and use it to construct a large-scale, high-quality audio-language dataset, named Auto-ACD, comprising over 1.9M audio-text pairs. To demonstrate the effectiveness of the proposed dataset, we train popular models on it and show performance improvements on several downstream tasks, namely audio-language retrieval, audio captioning, and environment classification. In addition, we establish a novel test set and provide a benchmark for audio-text tasks. The proposed dataset will be released at https://auto-acd.github.io/.

