HindSight: A Graph-Based Vision Model Architecture For Representing Part-Whole Hierarchies

04/08/2021
by Muhammad AbdurRafae, et al.

This paper presents a model architecture for encoding the representations of part-whole hierarchies in images in the form of a graph. The idea is to divide the image into patches at different levels and treat all of these patches as nodes of a fully connected graph. A dynamic feature extraction module extracts feature representations from these patches in each graph iteration, allowing the model to learn a rich graph representation of the image that captures its inherent part-whole hierarchical structure. With suitable self-supervised training techniques, such a model can be trained as a general-purpose vision encoder and then used for various downstream vision tasks (e.g., image classification, object detection, image captioning).
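As a rough illustration of the idea (not the authors' implementation), the PyTorch sketch below splits an image into patches at several scales, embeds every patch as a graph node, and runs a few rounds of fully connected message passing. The class name HierarchicalPatchGraph, the patch sizes, the embedding dimension, the self-attention update, and the single up-front embedding (used here in place of the paper's per-iteration dynamic feature extraction) are all assumptions made for illustration.

```python
# Minimal sketch, assuming multi-level patches as nodes of a fully connected
# graph and attention-based message passing; all hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalPatchGraph(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, patch_sizes=(32, 16, 8), dim=256, iterations=3):
        super().__init__()
        self.patch_sizes = patch_sizes
        self.iterations = iterations
        # One linear embedding per patch level (a simplification of the paper's
        # dynamic feature extraction module).
        self.extractors = nn.ModuleList(
            [nn.Linear(3 * p * p, dim) for p in patch_sizes]
        )
        # Fully connected message passing approximated by self-attention.
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def image_to_nodes(self, img):
        """Split the image into patches at every level; each patch is one node."""
        nodes = []
        for p, extractor in zip(self.patch_sizes, self.extractors):
            # (B, C*p*p, L) -> (B, L, C*p*p): L non-overlapping p x p patches.
            patches = F.unfold(img, kernel_size=p, stride=p).transpose(1, 2)
            nodes.append(extractor(patches))
        return torch.cat(nodes, dim=1)  # (B, total_nodes, dim)

    def forward(self, img):
        x = self.image_to_nodes(img)
        for _ in range(self.iterations):
            # Every node attends to every other node: a fully connected graph.
            msg, _ = self.attn(x, x, x)
            x = self.norm(x + msg)
        return x  # graph representation of the image

# Usage: a 224x224 RGB image yields 49 + 196 + 784 = 1029 nodes.
feats = HierarchicalPatchGraph()(torch.randn(1, 3, 224, 224))
print(feats.shape)  # torch.Size([1, 1029, 256])
```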


Related research

06/01/2022 · Vision GNN: An Image is Worth Graph of Nodes
Network architecture plays a key role in the deep learning-based compute...

02/07/2022 · Context Autoencoder for Self-Supervised Representation Learning
We present a novel masked image modeling (MIM) approach, context autoenc...

04/01/2023 · Mask Hierarchical Features For Self-Supervised Learning
This paper shows that Masking the Deep hierarchical features is an effic...

05/27/2022 · Architecture-Agnostic Masked Image Modeling – From ViT back to CNN
Masked image modeling (MIM), an emerging self-supervised pre-training me...

06/01/2022 · Where are my Neighbors? Exploiting Patches Relations in Self-Supervised Vision Transformer
Vision Transformers (ViTs) enabled the use of transformer architecture o...

09/15/2023 · BROW: Better featuRes fOr Whole slide image based on self-distillation
Whole slide image (WSI) processing is becoming part of the key component...

10/26/2022 · Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
Masked Autoencoders is a simple yet powerful self-supervised learning me...
