Audio Retrieval with Natural Language Queries

05/05/2021
by   Andreea-Maria Oncescu, et al.
0

We consider the task of retrieving audio using free-form natural language queries. To study this problem, which has received limited attention in the existing literature, we introduce challenging new benchmarks for text-based audio retrieval using text annotations sourced from the Audiocaps and Clotho datasets. We then employ these benchmarks to establish baselines for cross-modal audio retrieval, where we demonstrate the benefits of pre-training on diverse audio tasks. We hope that our benchmarks will inspire further research into cross-modal text-based audio retrieval with free-form text queries.

READ FULL TEXT
research
12/17/2021

Audio Retrieval with Natural Language Queries: A Benchmark Study

The objectives of this work are cross-modal text-audio and audio-text re...
research
11/21/2022

TimbreCLIP: Connecting Timbre to Text and Images

We present work in progress on TimbreCLIP, an audio-text cross modal emb...
research
03/19/2023

Audio-Text Models Do Not Yet Leverage Natural Language

Multi-modal contrastive learning techniques in the audio-text domain hav...
research
10/06/2022

Matching Text and Audio Embeddings: Exploring Transfer-learning Strategies for Language-based Audio Retrieval

We present an analysis of large-scale pretrained deep learning models us...
research
06/01/2023

End-to-end Knowledge Retrieval with Multi-modal Queries

We investigate knowledge retrieval with multi-modal queries, i.e. querie...
research
03/28/2022

Text2Pos: Text-to-Point-Cloud Cross-Modal Localization

Natural language-based communication with mobile devices and home applia...
research
02/23/2023

Data leakage in cross-modal retrieval training: A case study

The recent progress in text-based audio retrieval was largely propelled ...

Please sign up or login with your details

Forgot password? Click here to reset