Audio Retrieval with WavText5K and CLAP Training

09/28/2022
by Soham Deshmukh, et al.

Audio-Text retrieval takes a natural language query to retrieve relevant audio files in a database. Conversely, Text-Audio retrieval takes an audio file as a query to retrieve relevant natural language descriptions. Most of the literature trains retrieval systems on a single audio captioning dataset, and the benefit of training with multiple datasets is underexplored. Moreover, retrieval systems have to learn the alignment between elaborate sentences and audio content of variable length, ranging from a few seconds to several minutes. In this work, we propose a new collection of web audio-text pairs and a new framework for retrieval. First, we provide a collection of about five thousand web audio-text pairs that we refer to as WavText5K. When used to train our retrieval system, WavText5K improved performance more than other audio captioning datasets. Second, our framework learns to connect language and audio content by using a text encoder, two audio encoders, and a contrastive learning objective. Combining both audio encoders helps to process variable-length audio. The two contributions beat state-of-the-art performance for AudioCaps and Clotho on Text-Audio retrieval by a relative 2
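To make the contrastive objective and the retrieval step concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two audio encoders are combined, so `combine_audio` simply averages two embeddings that are assumed to already live in the shared space, and the temperature of 0.07 is an arbitrary illustrative value. All function names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def combine_audio(emb_a, emb_b):
    # Assumption: average the outputs of the two audio encoders.
    # The abstract only says the encoders are combined, not how.
    return [(x + y) / 2 for x, y in zip(emb_a, emb_b)]

def log_softmax_at(row, idx):
    """Numerically stable log-softmax of row, evaluated at position idx."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse

def symmetric_contrastive_loss(text_embs, audio_embs, temperature=0.07):
    """Pair i of text and audio is the positive; all other pairs are negatives.

    The loss averages cross-entropy over rows (text against all audio)
    and over columns (audio against all text).
    """
    t = [l2_normalize(v) for v in text_embs]
    a = [l2_normalize(v) for v in audio_embs]
    n = len(t)
    logits = [[dot(t[i], a[j]) / temperature for j in range(n)] for i in range(n)]
    rows_loss = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    cols_loss = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (rows_loss + cols_loss)

def rank_by_similarity(query_emb, candidate_embs):
    """Return candidate indices sorted from most to least similar to the query."""
    q = l2_normalize(query_emb)
    scores = [dot(q, l2_normalize(c)) for c in candidate_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Once both modalities are embedded in the same space, retrieval in either direction reduces to `rank_by_similarity`: a text embedding can be ranked against audio embeddings, or an audio embedding against text embeddings.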

research
12/17/2021

Audio Retrieval with Natural Language Queries: A Benchmark Study

The objectives of this work are cross-modal text-audio and audio-text re...
research
06/16/2023

Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances

This paper explores grading text-based audio retrieval relevances with c...
research
10/30/2017

Content-based Representations of audio using Siamese neural networks

In this paper, we focus on the problem of content-based retrieval for au...
research
08/29/2023

Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?

Automated Audio Captioning (AAC) aims to develop systems capable of desc...
research
09/21/2021

Audio Interval Retrieval using Convolutional Neural Networks

Modern streaming services are increasingly labeling videos based on thei...
research
10/16/2022

Attention-Based Audio Embeddings for Query-by-Example

An ideal audio retrieval system efficiently and robustly recognizes a sh...
research
02/28/2023

Audio Retrieval for Multimodal Design Documents: A New Dataset and Algorithms

We consider and propose a new problem of retrieving audio files relevant...
