Regional Attention Network (RAN) for Head Pose and Fine-grained Gesture Recognition

by Ardhendu Behera, et al.

Affect is often expressed via non-verbal body language such as actions and gestures, which are vital indicators of human behavior. Recent studies on recognizing fine-grained actions and gestures in monocular images have mainly focused on modeling the spatial configuration of body parts that represent body pose, human-object interactions, and variations in local appearance. The results of these studies show that this approach is brittle, since it relies on accurate detection of body parts and objects. In this work, we argue that there exist local discriminative semantic regions whose "informativeness" can be evaluated by an attention mechanism to infer fine-grained gestures and actions. To this end, we propose a novel end-to-end Regional Attention Network (RAN), a fully Convolutional Neural Network (CNN) that combines multiple contextual regions through an attention mechanism, focusing on the parts of an image that are most relevant to a given task. Our regions consist of one or more consecutive cells and are adapted from the strategies used to compute the HOG (Histogram of Oriented Gradients) descriptor. The model is extensively evaluated on ten datasets covering three scenarios: 1) head pose recognition, 2) driver state recognition, and 3) human action and facial expression recognition. The proposed approach outperforms the state of the art by a considerable margin across different metrics.
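The abstract describes regions built from one or more consecutive cells, in the spirit of HOG blocks, whose features are combined by attention. The sketch below illustrates that idea in NumPy under several assumptions that do not come from the paper: the backbone output is a C x H x W feature map, the cell grid size and the maximum region span (`max_cells`) are arbitrary, each region is summarized by mean pooling, and the attention is a plain softmax over linear scores from a hypothetical learned vector `w`. All function names are hypothetical; this is an illustration of the general technique, not the authors' RAN implementation or their attention formulation.

```python
import numpy as np


def hog_style_regions(grid_h, grid_w, max_cells=2):
    """Enumerate rectangular regions of consecutive cells on a grid_h x grid_w grid.

    Like HOG blocks sliding over a cell grid, each region spans 1..max_cells
    cells vertically and horizontally. Returns (row0, col0, n_rows, n_cols)
    tuples in cell units. (Hypothetical helper, not from the paper.)
    """
    regions = []
    for n_rows in range(1, max_cells + 1):
        for n_cols in range(1, max_cells + 1):
            for r in range(grid_h - n_rows + 1):
                for c in range(grid_w - n_cols + 1):
                    regions.append((r, c, n_rows, n_cols))
    return regions


def region_descriptors(feat, grid_h, grid_w, regions):
    """Mean-pool a C x H x W feature map over each region; returns an R x C array.

    Cell size is H // grid_h by W // grid_w; leftover border pixels are ignored.
    """
    C, H, W = feat.shape
    ch, cw = H // grid_h, W // grid_w  # cell size in feature-map pixels
    out = np.empty((len(regions), C))
    for i, (r, c, nr, nc) in enumerate(regions):
        patch = feat[:, r * ch:(r + nr) * ch, c * cw:(c + nc) * cw]
        out[i] = patch.mean(axis=(1, 2))
    return out


def attend(desc, w):
    """Softmax attention over region descriptors (R x C) with scoring vector w (C,).

    Returns the attention-weighted context vector (C,) and the weights (R,).
    The linear-score softmax is an assumed stand-in for the paper's attention.
    """
    scores = desc @ w
    scores = scores - scores.max()  # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ desc, alpha


# Usage: random 8-channel 12 x 12 feature map split into a 3 x 3 cell grid.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 12, 12))
regions = hog_style_regions(3, 3, max_cells=2)  # 9 + 6 + 6 + 4 = 25 regions
desc = region_descriptors(feat, 3, 3, regions)
w = rng.standard_normal(8)  # hypothetical scoring weights (learned in practice)
context, alpha = attend(desc, w)
print(len(regions), desc.shape, context.shape, round(alpha.sum(), 6))
```

Enumerating every span of up to `max_cells` cells mirrors how HOG groups neighboring cells into overlapping blocks; in a trained network the scoring weights would be learned end-to-end rather than drawn at random.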




