Attention-Based Methods For Audio Question Answering

Audio question answering (AQA) is the task of producing natural language answers when a system is provided with audio and natural language questions. In this paper, we propose neural network architectures based on self-attention and cross-attention for the AQA task. The self-attention layers extract powerful audio and textual representations. The cross-attention maps audio features that are relevant to the textual features to produce answers. All our models are trained on the recently proposed Clotho-AQA dataset for both binary yes/no questions and single-word answer questions. Our results clearly show improvement over the reference method reported in the original paper. On the yes/no binary classification task, our proposed model achieves an accuracy of 68.3%. On the single-word answer multiclass classifier, our model produces top-1 and top-5 accuracies of 57.9% and 99.8%, respectively. We further discuss some of the challenges in the Clotho-AQA dataset, such as the presence of the same answer word in multiple tenses and in singular and plural forms, and the presence of specific and generic answers to the same question. We address these issues and present a revised version of the dataset.
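
The cross-attention step described above can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes plain scaled dot-product attention with no learned projections, and the names `cross_attention`, `text_feats`, and `audio_feats` are hypothetical. Each text feature vector acts as a query over the audio frames, so that the audio frames most relevant to the question contribute most to the output.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_feats, audio_feats):
    """Scaled dot-product cross-attention (illustrative, no learned weights).

    text_feats:  list of T_text vectors of size d (queries)
    audio_feats: list of T_audio vectors of size d (keys and values)
    Returns (attended, weights), where attended has shape T_text x d
    and weights has shape T_text x T_audio.
    """
    d = len(text_feats[0])
    scale = math.sqrt(d)
    attended, weights = [], []
    for q in text_feats:
        # Similarity between this text query and every audio frame.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in audio_feats]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of audio frames for this query.
        attended.append([sum(wj * k[i] for wj, k in zip(w, audio_feats))
                         for i in range(d)])
    return attended, weights


# Toy inputs: 4 text tokens and 10 audio frames, both of dimension 8.
random.seed(0)
text_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
audio_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(10)]
attended, weights = cross_attention(text_feats, audio_feats)
```

In a trained model the queries, keys, and values would typically pass through learned linear projections first, and the attended features would feed a classifier head (binary for yes/no, multiclass for single-word answers); those parts are omitted here.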

research
04/20/2022

Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering

Audio question answering (AQA) is a multimodal translation task where a ...
research
06/24/2015

Ask Me Anything: Dynamic Memory Networks for Natural Language Processing

Most tasks in natural language processing can be cast into question answ...
research
10/31/2017

DCN+: Mixed Objective and Deep Residual Coattention for Question Answering

Traditional models for question answering optimize using cross entropy l...
research
09/02/2019

Answering questions by learning to rank -- Learning to rank by answering questions

Answering multiple-choice questions in a setting in which no supporting ...
research
11/01/2016

Solving Visual Madlibs with Multiple Cues

This paper presents an approach for answering fill-in-the-blank multiple...
research
09/29/2020

Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation

Neural models that independently project questions and answers into a sh...
research
05/26/2019

Gated Group Self-Attention for Answer Selection

Answer selection (answer ranking) is one of the key steps in many kinds ...
