Deep Reason: A Strong Baseline for Real-World Visual Reasoning

05/24/2019
by Chenfei Wu, et al.

This paper presents a strong baseline for real-world visual reasoning (GQA), which achieves 60.93% accuracy on GQA, a large dataset with 22M questions involving spatial understanding and multi-step inference. To help further research in this area, we identify three crucial components that improve performance: multi-source features, a fine-grained encoder, and a score-weighted ensemble. We provide a series of analyses of their impact on performance.
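
The abstract names a score-weighted ensemble but does not define it here. As a rough illustration only, one common reading is that each model's answer distribution is weighted by that model's validation score before the distributions are summed. The sketch below assumes this reading; the function name `score_weighted_ensemble`, the dictionary-based answer distributions, and the normalization of weights are all hypothetical choices, not the authors' implementation.

```python
from typing import Dict, List


def score_weighted_ensemble(
    predictions: List[Dict[str, float]],
    val_scores: List[float],
) -> str:
    """Combine per-model answer distributions, weighting each by its validation score.

    ASSUMPTION: this is one plausible form of a score-weighted ensemble,
    not the method described in the paper itself.
    """
    if len(predictions) != len(val_scores) or not predictions:
        raise ValueError("need one validation score per model prediction")

    total = sum(val_scores)
    if total <= 0:
        raise ValueError("validation scores must sum to a positive value")

    # Normalize scores so the weights sum to 1.
    weights = [score / total for score in val_scores]

    # Accumulate the weighted probability of every candidate answer.
    combined: Dict[str, float] = {}
    for probs, weight in zip(predictions, weights):
        for answer, p in probs.items():
            combined[answer] = combined.get(answer, 0.0) + weight * p

    # Return the answer with the highest combined score.
    return max(combined, key=lambda answer: combined[answer])


# Two hypothetical models disagree; the one with the higher validation score dominates.
weak_model = {"left": 0.55, "right": 0.45}
strong_model = {"left": 0.40, "right": 0.60}
answer = score_weighted_ensemble([weak_model, strong_model], [0.2, 0.8])
```

Weighting by validation score lets stronger models contribute more than a plain average would, while still letting weaker models break near ties.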


