Contextual Attention for Hand Detection in the Wild

by   Supreeth Narasimhaswamy, et al.

We present Hand-CNN, a novel convolutional network architecture for detecting hand masks and predicting hand orientations in unconstrained images. Hand-CNN extends MaskRCNN with a novel attention mechanism to incorporate contextual cues in the detection process. This attention mechanism can be implemented as an efficient network module that captures non-local dependencies between features. This network module can be inserted at different stages of an object detection network, and the entire detector can be trained end-to-end. We also introduce a large-scale annotated hand dataset containing hands in unconstrained images for training and evaluation. We show that Hand-CNN outperforms existing methods on several datasets, including our hand detection benchmark and the publicly available PASCAL VOC human layout challenge. We also conduct ablation studies on hand detection to show the effectiveness of the proposed contextual attention module.


page 1

page 3

page 5

page 6

page 8


Detecting Hands and Recognizing Physical Contact in the Wild

We investigate a new problem of detecting hands and recognizing their ph...

An Attention Module for Convolutional Neural Networks

Attention mechanism has been regarded as an advanced technique to captur...

One for All: An End-to-End Compact Solution for Hand Gesture Recognition

The HGR is a quite challenging task as its performance is influenced by ...

Interacting Attention Graph for Single Image Two-Hand Reconstruction

Graph convolutional network (GCN) has achieved great success in single h...

DCANet: Learning Connected Attentions for Convolutional Neural Networks

While self-attention mechanism has shown promising results for many visi...

An attention mechanism based convolutional network for satellite precipitation downscaling over China

Precipitation is a key part of hydrological circulation and is a sensiti...

Large-Field Contextual Feature Learning for Glass Detection

Glass is very common in our daily life. Existing computer vision systems...