Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances

06/16/2023
by   Huang Xie, et al.
0

This paper explores grading text-based audio retrieval relevances with crowdsourcing assessments. Given a free-form text (e.g., a caption) as a query, crowdworkers are asked to grade audio clips using numeric scores (between 0 and 100) to indicate their judgements of how much the sound content of an audio clip matches the text, where 0 indicates no content match at all and 100 indicates perfect content match. We integrate the crowdsourced relevances into training and evaluating text-based audio retrieval systems, and evaluate the effect of using them together with binary relevances from audio captioning. Conventionally, these binary relevances are defined by captioning-based audio-caption pairs, where being positive indicates that the caption describes the paired audio, and being negative applies to all other pairs. Experimental results indicate that there is no clear benefit from incorporating crowdsourced relevances alongside binary relevances when the crowdsourced relevances are binarized for contrastive learning. Conversely, the results suggest that using only binary relevances defined by captioning-based audio-caption pairs is sufficient for contrastive learning.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/28/2022

Audio Retrieval with WavText5K and CLAP Training

Audio-Text retrieval takes a natural language query to retrieve relevant...
research
11/08/2022

On Negative Sampling for Contrastive Audio-Text Retrieval

This paper investigates negative sampling for contrastive learning in th...
research
10/16/2022

Attention-Based Audio Embeddings for Query-by-Example

An ideal audio retrieval system efficiently and robustly recognizes a sh...
research
07/20/2022

Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

The amount of audio data available on public websites is growing rapidly...
research
07/13/2016

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

Recently, sound recognition has been used to identify sounds, such as ca...
research
08/29/2023

Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?

Automated Audio Captioning (AAC) aims to develop systems capable of desc...
research
10/22/2020

Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning

Most of existing audio fingerprinting systems have limitations to be use...

Please sign up or login with your details

Forgot password? Click here to reset