CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning

08/10/2022
by Adam Dahlgren Lindström, et al.

We introduce CLEVR-Math, a multi-modal dataset of simple math word problems involving addition and subtraction, each represented partly by a textual description and partly by an image illustrating the scenario. The text describes actions performed on the scene depicted in the image. Since the question posed may not be about the scene in the image, but about the state of the scene before or after the actions are applied, the solver must envision or imagine the state changes caused by these actions. Solving these word problems requires a combination of language, visual and mathematical reasoning. We apply state-of-the-art neural and neuro-symbolic models for visual question answering to CLEVR-Math and empirically evaluate their performance. Our results show that neither method generalises to chains of operations. We discuss the limitations of the two in addressing the task of multi-modal word problem solving.
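
To make the task concrete, the following is a minimal sketch of how such a problem could be solved symbolically once the scene is known. It is an illustration only, not the dataset's format or the authors' models: the `SceneObject` type, the helper functions, the example scene and the question wording are all hypothetical. The scene stands in for what a solver would first have to extract from the image, and the chained actions show why the answer depends on an imagined state rather than on the image itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical, simplified object description (shape and colour only)."""
    shape: str
    color: str


def matches(obj: SceneObject, shape: Optional[str], color: Optional[str]) -> bool:
    """True if the object satisfies every attribute filter that is not None."""
    return (shape is None or obj.shape == shape) and (color is None or obj.color == color)


def count(scene: List[SceneObject], shape: Optional[str] = None,
          color: Optional[str] = None) -> int:
    """Count the objects in the scene that match the filters."""
    return sum(1 for obj in scene if matches(obj, shape, color))


def remove_all(scene: List[SceneObject], shape: Optional[str] = None,
               color: Optional[str] = None) -> List[SceneObject]:
    """Subtraction action: return a new scene without the matching objects."""
    return [obj for obj in scene if not matches(obj, shape, color)]


def add_objects(scene: List[SceneObject], n: int, shape: str,
                color: str) -> List[SceneObject]:
    """Addition action: return a new scene with n extra objects."""
    return scene + [SceneObject(shape, color) for _ in range(n)]


# Assumed scene, standing in for the objects visible in the image.
scene = [
    SceneObject("cube", "red"),
    SceneObject("cube", "red"),
    SceneObject("sphere", "blue"),
    SceneObject("cylinder", "green"),
    SceneObject("cylinder", "green"),
    SceneObject("cylinder", "green"),
]

# "Remove all red cubes. How many objects are left?"
after_removal = remove_all(scene, shape="cube", color="red")
objects_left = count(after_removal)

# Chained actions: "Remove all red cubes. Add 2 blue spheres. How many spheres are there?"
after_chain = add_objects(after_removal, 2, "sphere", "blue")
spheres_after_chain = count(after_chain, shape="sphere")

print(objects_left, spheres_after_chain)
```

In this sketch each action returns a new scene rather than modifying the original, so the state before and after every step remains available, which mirrors questions that ask about the scene before or after the actions are applied.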
