SAViR-T: Spatially Attentive Visual Reasoning with Transformers

06/18/2022
by Pritish Sahu, et al.

We present a novel computational model, "SAViR-T", for the family of visual reasoning problems embodied in the Raven's Progressive Matrices (RPM). Our model considers the explicit spatial semantics of visual elements within each image in the puzzle, encoded as spatio-visual tokens, and learns both the intra-image and the inter-image token dependencies, which are highly relevant for the visual reasoning task. Token-wise relationships, modeled through the transformer-based SAViR-T architecture, extract group (row or column) driven representations by leveraging the group-rule coherence, and use this as the inductive bias to extract the underlying rule representations in the top two rows (or columns) per token in the RPM. We use these relation representations to locate the correct choice image that completes the last row or column of the RPM. Extensive experiments across synthetic RPM benchmarks, including RAVEN, I-RAVEN, RAVEN-FAIR, and PGM, and the natural image-based "V-PROM" demonstrate that SAViR-T sets a new state-of-the-art for visual reasoning, exceeding prior models' performance by a considerable margin.
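
To make the final selection step concrete, here is a minimal sketch of how rule representations from the top two rows might be used to pick an answer. The abstract does not say how candidates are scored, so everything here is an assumption for illustration: the averaging of the two row representations, the cosine-similarity score, and the names `cosine` and `pick_choice` are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pick_choice(rule_row1, rule_row2, rule_row3_per_choice):
    """Return (index of best choice, list of scores).

    Assumption: the shared rule is approximated by averaging the rule
    vectors of the two complete rows, and each candidate is scored by how
    closely the third row, completed with that candidate, matches it.
    """
    shared = [(a + b) / 2 for a, b in zip(rule_row1, rule_row2)]
    scores = [cosine(shared, r3) for r3 in rule_row3_per_choice]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy rule vectors (made-up numbers, not model outputs).
r1 = [1.0, 0.0, 0.5]
r2 = [0.9, 0.1, 0.5]
candidates = [
    [0.0, 1.0, 0.0],    # unrelated rule
    [1.0, 0.0, 0.5],    # consistent with rows 1 and 2
    [-1.0, 0.0, -0.5],  # opposite rule
]
best, scores = pick_choice(r1, r2, candidates)
print(best)
```

In the toy example the second candidate is selected because its completed-row vector points in nearly the same direction as the averaged rule of the first two rows. A learned model would produce these vectors from the spatio-visual tokens rather than take them as given.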

research
01/16/2023

Ae^2I: A Double Autoencoder for Imputation of Missing Values

The most common strategy of imputing missing values in a table is to stu...
research
03/01/2022

TableFormer: Robust Transformer Modeling for Table-Text Encoding

Understanding tables is an important aspect of natural language understa...
research
11/02/2020

Pairwise Relations Discriminator for Unsupervised Raven's Progressive Matrices

Abstract reasoning is a key indicator of intelligence. The ability to hy...
research
06/05/2020

Visual Transformers: Token-based Image Representation and Processing for Computer Vision

Computer vision has achieved great success using standardized image repr...
research
01/22/2022

Dual-Flattening Transformers through Decomposed Row and Column Queries for Semantic Segmentation

It is critical to obtain high resolution features with long range depend...
research
05/27/2022

Effective Abstract Reasoning with Dual-Contrast Network

As a step towards improving the abstract reasoning capability of machine...