Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering

04/20/2022
by Samuel Lipping, et al.

Audio question answering (AQA) is a multimodal translation task in which a system analyzes an audio signal and a natural language question to generate an appropriate natural language answer. In this paper, we introduce Clotho-AQA, a dataset for audio question answering consisting of 1991 audio files, each 15 to 30 seconds long, selected from the Clotho dataset [1]. For each audio file, we collect six different questions and corresponding answers by crowdsourcing on Amazon Mechanical Turk. The questions and answers are produced by different annotators. Of the six questions for each audio file, two are designed to be answered 'yes', two to be answered 'no', and the remaining two to have other single-word answers. For each question, we collect answers from three different annotators. We also present two baseline experiments to illustrate the use of our dataset for the AQA task: an LSTM-based multimodal binary classifier for 'yes' or 'no' answers and an LSTM-based multimodal multi-class classifier for 828 single-word answers. The binary classifier achieved an accuracy of 62.7% and the multi-class classifier achieved a top-1 accuracy of 54.2%. The dataset is freely available online at https://zenodo.org/record/6473207.
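
The structure described above (six questions per audio file, three annotator answers per question, and separate yes/no and single-word baselines) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dataset's official loader: the `QAItem` record, its field names, the majority-vote rule for combining the three answers, and the routing of an item to the binary or multi-class baseline are all hypothetical. The counts returned by `dataset_size` simply multiply the figures given in the abstract.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout; field names are illustrative and are NOT the
# official Clotho-AQA file format.
@dataclass
class QAItem:
    audio_file: str
    question: str
    answers: list[str]  # one answer from each of the three annotators

    def majority_answer(self) -> str | None:
        """Return the answer given by at least 2 of the 3 annotators, else None.

        Majority voting is an assumed aggregation rule, not necessarily the
        one used by the dataset authors.
        """
        normalized = [a.strip().lower() for a in self.answers]
        answer, count = Counter(normalized).most_common(1)[0]
        return answer if count >= 2 else None

    def is_yes_no(self) -> bool:
        # Assumed routing: items whose consensus answer is 'yes' or 'no' go to
        # the binary baseline; all others go to the multi-class baseline.
        return self.majority_answer() in {"yes", "no"}


def dataset_size(num_audio: int = 1991,
                 questions_per_audio: int = 6,
                 answers_per_question: int = 3) -> tuple[int, int]:
    """Return (question count, individual answer count) implied by the abstract."""
    questions = num_audio * questions_per_audio
    return questions, questions * answers_per_question


def top1_accuracy(predictions: list[str], targets: list[str]) -> float:
    """Fraction of items whose single predicted answer equals the target."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)
```

For example, `QAItem("example.wav", "Is a dog barking?", ["yes", "Yes", "no"]).majority_answer()` returns `'yes'` (the file name and question are made up), and `dataset_size()` returns `(11946, 35838)`, i.e. 1991 × 6 questions and 1991 × 6 × 3 annotator answers.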
