Multi-label Image Recognition by Recurrently Discovering Attentional Regions

11/08/2017
by   Zhouxia Wang, et al.

This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label image classification by developing a recurrent memorized-attention module. This module consists of two alternately performed components: i) a spatial transformer layer that locates attentional regions from the convolutional feature maps in a region-proposal-free way and ii) an LSTM (Long Short-Term Memory) sub-network that sequentially predicts semantic labeling scores on the located regions while capturing the global dependencies among these regions. The LSTM also outputs the parameters for computing the spatial transformer. On large-scale benchmarks of multi-label image classification (e.g., MS-COCO and PASCAL VOC 07), our approach demonstrates superior performance over existing state-of-the-art methods in both accuracy and efficiency.
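To make the region-locating step concrete, the following is a minimal sketch of how a spatial transformer can crop an attentional region from a feature map: an affine matrix maps each output cell to a source location, and bilinear interpolation reads the value there. It is an illustration under stated assumptions, not the authors' implementation. The function names (`affine_grid`, `bilinear_sample`, `extract_region`), the single-channel feature map, and the hand-set matrix `theta` are all hypothetical; in the module described above, the LSTM would supply the transformer parameters at each step.

```python
import math

def affine_grid(theta, out_h, out_w):
    """Map every output cell to source coordinates in [-1, 1].

    theta is a 2x3 affine matrix (assumed here to be given; in the
    abstract's module the LSTM would emit these parameters).
    """
    grid = []
    for i in range(out_h):
        # Normalised target y-coordinate in [-1, 1].
        yt = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            xt = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid

def bilinear_sample(fmap, xs, ys):
    """Read fmap at normalised location (xs, ys) by bilinear interpolation.

    Neighbours that fall outside the map contribute zero.
    """
    h, w = len(fmap), len(fmap[0])
    # Convert normalised coordinates to pixel coordinates.
    x = (xs + 1.0) * (w - 1) / 2.0
    y = (ys + 1.0) * (h - 1) / 2.0
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1.0 - abs(x - xi)) * (1.0 - abs(y - yi))
                value += weight * fmap[yi][xi]
    return value

def extract_region(fmap, theta, out_h, out_w):
    """Crop and resample the region of fmap selected by theta."""
    return [[bilinear_sample(fmap, xs, ys) for xs, ys in row]
            for row in affine_grid(theta, out_h, out_w)]

# Toy single-channel 4x4 "feature map" with value 4*row + col.
fmap = [[4.0 * r + c for c in range(4)] for r in range(4)]

# Hand-set transform: scale 0.5, shifted towards the bottom-right quadrant.
theta = [[0.5, 0.0, 0.5],
         [0.0, 0.5, 0.5]]
region = extract_region(fmap, theta, 2, 2)
```

Because the sampling is differentiable with respect to `theta`, a network that produces `theta` can be trained end to end without a separate region-proposal stage, which is the property the abstract relies on.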


research
12/20/2017

Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition

Recognizing multiple labels of images is a fundamental but challenging t...
research
07/03/2020

Multi-Label Image Recognition with Multi-Class Attentional Regions

Multi-label image recognition is a practical and challenging task compar...
research
07/16/2019

Relation Network for Multi-label Aerial Image Classification

Multi-label classification plays a momentous role in perceiving intricat...
research
01/01/2020

Residual Block-based Multi-Label Classification and Localization Network with Integral Regression for Vertebrae Labeling

Accurate identification and localization of the vertebrae in CT scans is...
research
08/28/2023

GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition

Multi-Label Image Recognition (MLIR) is a challenging task that aims to ...
research
08/21/2023

LDCSF: Local depth convolution-based Swim framework for classifying multi-label histopathology images

Histopathological images are the gold standard for diagnosing liver canc...
