A Diagram Is Worth A Dozen Images

03/24/2016
by Aniruddha Kembhavi, et al.

Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPG) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for over 5,000 diagrams and 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs.
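
To make the idea of a graph over constituents and relationships concrete, here is a minimal sketch of a DPG-like data structure. It is only an illustration: the class names, fields, and the labels "blob" and "becomes" are assumptions made for this example, not the node and edge types defined in the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constituent:
    """A node: one visual element of a diagram (hypothetical fields)."""
    cid: str
    kind: str        # illustrative label such as "blob", "text", "arrow"
    label: str = ""


@dataclass(frozen=True)
class Relationship:
    """A directed edge between two constituents (hypothetical fields)."""
    source: str
    target: str
    kind: str


@dataclass
class DiagramParseGraph:
    """Toy container: constituents as nodes, relationships as edges."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_constituent(self, c):
        self.nodes[c.cid] = c

    def relate(self, source, target, kind):
        # Reject edges whose endpoints are not known constituents.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added first")
        self.edges.append(Relationship(source, target, kind))

    def neighbors(self, cid):
        # Outgoing relationships of one constituent as (kind, target) pairs.
        return [(e.kind, e.target) for e in self.edges if e.source == cid]


dpg = DiagramParseGraph()
dpg.add_constituent(Constituent("b1", "blob", "caterpillar"))
dpg.add_constituent(Constituent("b2", "blob", "butterfly"))
dpg.relate("b1", "b2", "becomes")
print(dpg.neighbors("b1"))
```

In this framing, syntactic parsing would amount to choosing which nodes and edges to keep for a given diagram, and question answering would read from the resulting graph. The sketch covers only the storage side.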

research
03/10/2021

RL-CSDia: Representation Learning of Computer Science Diagrams

Recent studies on computer vision mainly focus on natural images that ex...
research
10/25/2021

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

Current visual question answering (VQA) tasks mainly consider answering ...
research
11/27/2017

Dynamic Graph Generation Network: Generating Relational Knowledge from Diagrams

In this work, we introduce a new algorithm for analyzing a diagram, whic...
research
12/06/2021

MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering

Textbook Question Answering (TQA) is a complex multimodal task to infer ...
research
05/19/2023

RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing

Reaction diagram parsing is the task of extracting reaction schemes from...
research
05/19/2022

Neural network topological snake models for locating general phase diagrams

Machine learning for locating phase diagram has received intensive resea...
research
12/29/2022

GPTR: Gestalt-Perception Transformer for Diagram Object Detection

Diagram object detection is the key basis of practical applications such...
