Graph Reasoning Transformer for Image Parsing

09/20/2022
by Dong Zhang, et al.

Capturing long-range dependencies has empirically proven effective on a wide range of computer vision tasks. Progressive advances on this topic have been made with the transformer framework and its multi-head attention mechanism. However, attention-based image-patch interaction potentially suffers from redundant interactions between intra-class patches and unoriented interactions between inter-class patches. In this paper, we propose a novel Graph Reasoning Transformer (GReaT) for image parsing that enables image patches to interact following a relation-reasoning pattern. Specifically, the linearly embedded image patches are first projected into a graph space, where each node represents the implicit visual center of a cluster of image patches and each edge reflects the relation weight between two adjacent nodes. Global relation reasoning is then performed on this graph. Finally, all nodes, enriched with the relation information, are mapped back into the original space for subsequent processing. Compared to the conventional transformer, GReaT offers higher interaction efficiency and a more purposeful interaction pattern. Experiments are carried out on the challenging Cityscapes and ADE20K datasets. Results show that GReaT achieves consistent performance gains with slight computational overhead over state-of-the-art transformer baselines.
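The block below is a minimal PyTorch sketch of the project-reason-reproject pattern the abstract describes: patch tokens are related to a small set of graph nodes, node features are propagated along learnable edge weights, and the reasoned features are mapped back onto the patches. The module name `GraphReasoningBlock`, the soft patch-to-node projection, the single learnable adjacency, and all layer sizes are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the project-reason-reproject pattern described in the
# abstract. All names, layer sizes, and the specific choice of a soft
# patch-to-node assignment with a learnable adjacency are illustrative
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class GraphReasoningBlock(nn.Module):
    def __init__(self, dim: int, num_nodes: int = 16, node_dim: int = 64):
        super().__init__()
        # Weights that relate patch tokens to graph nodes.
        self.to_assignment = nn.Linear(dim, num_nodes)
        # Reduce patch features to the node feature dimension.
        self.to_node_feat = nn.Linear(dim, node_dim)
        # Learnable edge (relation) weights between graph nodes.
        self.adjacency = nn.Parameter(torch.eye(num_nodes))
        # Node update applied after propagating features along the edges.
        self.node_update = nn.Linear(node_dim, node_dim)
        # Map reasoned node features back to the patch feature dimension.
        self.to_patch_feat = nn.Linear(node_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim) linearly embedded image patches.
        # Normalize over patches so each node aggregates a weighted average
        # of patch features (each node acts as an implicit visual center).
        assign = self.to_assignment(x).softmax(dim=1)           # (B, P, N)
        nodes = assign.transpose(1, 2) @ self.to_node_feat(x)   # (B, N, node_dim)

        # Global relation reasoning: propagate node features along the
        # learned edges, then update each node.
        nodes = torch.relu(self.node_update(self.adjacency @ nodes))

        # Map the reasoned node features back onto the patches and add them
        # residually to the original tokens.
        return x + assign @ self.to_patch_feat(nodes)           # (B, P, dim)


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 256)   # e.g. 14x14 patches, 256-d embeddings
    block = GraphReasoningBlock(dim=256)
    print(block(tokens).shape)          # torch.Size([2, 196, 256])
```

In this reading, such a block would sit alongside (or replace) multi-head attention inside a transformer layer, letting patches exchange information through a handful of graph nodes rather than through dense pairwise attention, which is how the abstract motivates the efficiency gain.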
