Graph-Based Global Reasoning Networks

by Yunpeng Chen, et al.

Globally modeling and reasoning over relations between regions can benefit many computer vision tasks on both images and videos. Convolutional Neural Networks (CNNs) excel at modeling local relations through convolution operations, but they are typically inefficient at capturing global relations between distant regions, which requires stacking multiple convolution layers. In this work, we propose a new approach for reasoning globally in which a set of features is globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be computed efficiently. After reasoning, relation-aware features are distributed back to the original coordinate space for downstream tasks. We further present a highly efficient instantiation of the proposed approach and introduce the Global Reasoning unit (GloRe unit), which implements the coordinate-interaction space mapping by weighted global pooling and weighted broadcasting, and the relation reasoning via graph convolution on a small graph in the interaction space. The proposed GloRe unit is lightweight, end-to-end trainable, and can be easily plugged into existing CNNs for a wide range of tasks. Extensive experiments show that our GloRe unit consistently boosts the performance of state-of-the-art backbone architectures, including ResNet, ResNeXt, SE-Net and DPN, for both 2D and 3D CNNs, on image classification, semantic segmentation and video action recognition tasks.
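The three steps named in the abstract (weighted global pooling, graph convolution on a small graph, weighted broadcasting) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `glore_sketch`, the weight names `w_proj`, `w_adj` and `w_state`, the reuse of the pooling weights for broadcasting, the plain linear graph convolution, and the residual connection are all assumptions made for illustration.

```python
import numpy as np


def glore_sketch(x, w_proj, w_adj, w_state):
    """Illustrative global-reasoning step (hypothetical, not the paper's code).

    x       : (L, C) features at L spatial (or spatio-temporal) locations.
    w_proj  : (C, N) assumed weights that score each location for N graph nodes.
    w_adj   : (N, N) assumed learned adjacency between the N nodes.
    w_state : (C, C) assumed weights that update each node's state.
    """
    # Weighted global pooling: every node is a weighted sum over ALL locations,
    # moving features from coordinate space (L rows) to interaction space (N rows).
    b = x @ w_proj                 # (L, N) per-location weights for each node
    v = b.T @ x                    # (N, C) node features

    # Graph convolution on the small graph: mix information between nodes,
    # then transform each node's state.
    z = (w_adj @ v) @ w_state      # (N, C) relation-aware node features

    # Weighted broadcasting: distribute node features back to every location
    # (assumption: the pooling weights are reused for the reverse mapping).
    y = b @ z                      # (L, C)

    # Assumed residual connection so the unit can be dropped into a backbone.
    return x + y


# Toy usage with random values: 49 locations (a 7x7 map), 16 channels, 4 nodes.
rng = np.random.default_rng(0)
L, C, N = 49, 16, 4
x = rng.standard_normal((L, C))
out = glore_sketch(
    x,
    w_proj=rng.standard_normal((C, N)) * 0.1,
    w_adj=rng.standard_normal((N, N)) * 0.1,
    w_state=rng.standard_normal((C, C)) * 0.1,
)
print(out.shape)
```

Because the graph has only N nodes, the reasoning step costs on the order of N x N node interactions rather than L x L pairwise location interactions, which is the efficiency argument the abstract makes for working in the interaction space.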


Context-Gated Convolution

As the basic building block of Convolutional Neural Networks (CNNs), the...

Relation-Aware Global Attention

Attention mechanism aims to increase the representation power by focusin...

Visual Concept Reasoning Networks

A split-transform-merge strategy has been broadly used as an architectur...

Broadcasting Convolutional Network

While convolutional neural networks (CNNs) are widely used for handling ...

A^2-Nets: Double Attention Networks

Learning to capture long-range relations is fundamental to image/video r...

Towards Efficient Scene Understanding via Squeeze Reasoning

Graph-based convolutional model such as non-local block has shown to be ...