The meaning of "most" for visual question answering models

12/31/2018
by Alexander Kuhnle, et al.

The correct interpretation of quantifier statements in the context of a visual scene requires non-trivial inference mechanisms. For the example of "most", we discuss two strategies which rely on fundamentally different cognitive concepts. Our aim is to identify which strategy deep learning models for visual question answering learn when trained on such questions. To this end, we carefully design data to replicate experiments from psycholinguistics in which the same question was investigated for humans. Focusing on the FiLM visual question answering model, our experiments indicate that a form of approximate number system emerges, whose performance declines on more difficult scenes as predicted by Weber's law. Moreover, we identify confounding factors, such as the spatial arrangement of the scene, which impede the effectiveness of this system.
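The abstract does not name the two strategies for "most". As an illustration only, the sketch below assumes they are exact counting and approximate comparison via an approximate number system (ANS). The Gaussian model, in which each numerosity n is represented with standard deviation w·n for a Weber fraction w, is the standard one from the psycholinguistics literature and is not taken from this paper. The function names and the value w = 0.2 are hypothetical.

```python
import math


def most_by_counting(n_target: int, n_other: int) -> bool:
    """Exact strategy: compare the two cardinalities directly."""
    return n_target > n_other


def ans_accuracy(n_target: int, n_other: int, weber_fraction: float) -> float:
    """Expected accuracy of an approximate comparison (assumed Gaussian ANS model).

    Each numerosity n is modeled as a Gaussian with mean n and standard
    deviation weber_fraction * n, so accuracy depends only on the ratio of
    the two set sizes, as Weber's law predicts.
    """
    if n_target == n_other:
        return 0.5  # no signal to discriminate: chance level
    diff = abs(n_target - n_other)
    spread = weber_fraction * math.sqrt(n_target ** 2 + n_other ** 2)
    return 0.5 * (1.0 + math.erf(diff / (math.sqrt(2.0) * spread)))


if __name__ == "__main__":
    w = 0.2  # hypothetical Weber fraction, chosen only for illustration
    for a, b in [(10, 5), (10, 8), (10, 9), (20, 18)]:
        print(a, b, most_by_counting(a, b), round(ans_accuracy(a, b, w), 3))
```

Under this model, exact counting is always correct, while approximate accuracy falls toward chance as the ratio of the two set sizes approaches 1:1. That ratio dependence is the pattern the abstract describes as performance declining on more difficult scenes.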


research
02/09/2022

Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?

The task of Outside Knowledge Visual Question Answering (OKVQA) requires...
research
02/28/2019

From Visual to Acoustic Question Answering

We introduce the new task of Acoustic Question Answering (AQA) to promot...
research
09/11/2018

Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions

In-depth scene descriptions and question answering tasks have greatly in...
research
05/25/2018

Think Visually: Question Answering through Virtual Imagery

In this paper, we study the problem of geometric reasoning in the contex...
research
05/14/2018

Did the Model Understand the Question?

We analyze state-of-the-art deep learning models for three tasks: questi...
research
02/17/2017

Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision

People can refer to quantities in a visual scene by using either exact c...
research
04/07/2023

Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images

Aiming at answering questions based on the content of remotely sensed im...
