An attention-driven hierarchical multi-scale representation for visual recognition

10/23/2021
by   Zachary Wharton, et al.
0

Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content. This is mainly due to their ability to break down an image into smaller pieces, extract multi-scale localized features and compose them to construct highly expressive representations for decision making. However, the convolution operation is unable to capture long-range dependencies such as arbitrary relations between pixels since it operates on a fixed-size window. Therefore, it may not be suitable for discriminating subtle changes (e.g. fine-grained visual recognition). To this end, our proposed method captures the high-level long-range dependencies by exploring Graph Convolutional Networks (GCNs), which aggregate information by establishing relationships among multi-scale hierarchical regions. These regions consist of smaller (closer look) to larger (far look), and the dependency between regions is modeled by an innovative attention-driven message propagation, guided by the graph structure to emphasize the neighborhoods of a given region. Our approach is simple yet extremely effective in solving both the fine-grained and generic visual classification problems. It outperforms the state-of-the-arts with a significant margin on three and is very competitive on other two datasets.

READ FULL TEXT

page 1

page 4

page 11

page 23

page 24

page 25

page 26

research
12/16/2022

DQnet: Cross-Model Detail Querying for Camouflaged Object Detection

Camouflaged objects are seamlessly blended in with their surroundings, w...
research
08/31/2022

MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition

Vision Transformer and its variants have demonstrated great potential in...
research
06/16/2015

Deep Convolutional Networks on Graph-Structured Data

Deep Learning's recent successes have mostly relied on Convolutional Net...
research
09/05/2022

SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained Image Categorization

Over the past few years, a significant progress has been made in deep co...
research
03/03/2023

PPCR: Learning Pyramid Pixel Context Recalibration Module for Medical Image Classification

Spatial attention mechanism has been widely incorporated into deep convo...
research
01/17/2021

Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification

Deep convolutional neural networks (CNNs) have shown a strong ability in...
research
08/30/2022

ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer

Generating robust and reliable correspondences across images is a fundam...

Please sign up or login with your details

Forgot password? Click here to reset