Temporal Reasoning via Audio Question Answering

11/21/2019
by Haytham M. Fayek, et al.

Multimodal question answering tasks can be used as proxy tasks to study systems that can perceive and reason about the world. Answering questions about different types of input modalities stresses different aspects of reasoning, such as visual reasoning, reading comprehension, story understanding, or navigation. In this paper, we use the task of Audio Question Answering (AQA) to study the temporal reasoning abilities of machine learning models. To this end, we introduce the Diagnostic Audio Question Answering (DAQA) dataset, comprising audio sequences of natural sound events and programmatically generated questions and answers that probe various aspects of temporal reasoning. We adapt several recent state-of-the-art methods for visual question answering to the AQA task, and use DAQA to demonstrate that they perform poorly on questions that require in-depth temporal reasoning. Finally, we propose a new model, Multiple Auxiliary Controllers for Linear Modulation (MALiMo), that extends the recent Feature-wise Linear Modulation (FiLM) model and significantly improves its temporal reasoning capabilities. We envisage DAQA fostering research on AQA and temporal reasoning, and MALiMo as a step towards models for AQA.
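For readers unfamiliar with FiLM, the core operation MALiMo builds on is a per-channel affine transform of intermediate features, with scale (gamma) and shift (beta) predicted from a conditioning input such as the question. The sketch below is a minimal, dependency-free illustration under stated assumptions: features are a toy 1-D (time x channel) sequence rather than convolutional feature maps, the question embedding and projection weights are made-up toy values, and the helper names `linear` and `film` are hypothetical. It does not reproduce MALiMo's auxiliary controllers.

```python
from typing import List, Sequence


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    """Dense projection y = W x + b (hypothetical helper, toy sizes)."""
    return [sum(w * xj for w, xj in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features: Sequence[Sequence[float]], gamma: Sequence[float],
         beta: Sequence[float]) -> List[List[float]]:
    """Feature-wise linear modulation: channel c of every time step is
    scaled by gamma[c] and shifted by beta[c]."""
    return [[g * f + b for f, g, b in zip(frame, gamma, beta)]
            for frame in features]


# Toy audio features: 2 time steps x 2 channels (assumed values).
features = [[1.0, 2.0], [3.0, 4.0]]

# Toy question embedding (assumed); in FiLM it would come from a question encoder.
question = [1.0, 0.0]

# Assumed projection weights that map the question to gamma and beta.
gamma = linear([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.5], question)
beta = linear([[0.0, 0.0], [1.0, 0.0]], [1.0, 0.0], question)

out = film(features, gamma, beta)
print(gamma, beta, out)
```

Because gamma and beta depend only on the question, the same modulation is applied at every time step; this is one reason a purely question-conditioned FiLM may struggle with questions that hinge on the order or timing of events.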
