InfographicVQA

04/26/2021
by Minesh Mathew, et al.

Infographics are documents designed to communicate information effectively through a combination of textual, graphical, and visual elements. In this work, we explore the automatic understanding of infographic images using Visual Question Answering techniques. To this end, we present InfographicVQA, a new dataset that comprises a diverse collection of infographics along with natural language question and answer annotations. The collected questions require methods to jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with emphasis on questions that require elementary reasoning and basic arithmetic skills. Finally, we evaluate two strong baselines based on state-of-the-art multi-modal VQA models, and establish baseline performance for the new task. The dataset, code and leaderboard will be made available at http://docvqa.org

04/23/2020

Visual Question Answering Using Semantic Information from Image Descriptions

Visual question answering (VQA) is a task that requires AI systems to di...
04/27/2021

Document Collection Visual Question Answering

Current tasks and methods in Document Understanding aim to process docu...
07/01/2020

DocVQA: A Dataset for VQA on Document Images

We present a new dataset for Visual Question Answering on document image...
10/25/2021

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

Current visual question answering (VQA) tasks mainly consider answering ...
10/19/2017

FigureQA: An Annotated Figure Dataset for Visual Reasoning

We introduce FigureQA, a visual reasoning corpus of over one million que...
03/23/2021

Multi-Modal Answer Validation for Knowledge-Based VQA

The problem of knowledge-based visual question answering involves answer...
02/18/2021

Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

We address the challenging problem of Natural Language Comprehension bey...