Attentional Pooling for Action Recognition

11/04/2017
by   Rohit Girdhar, et al.

We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human-object interaction tasks. Our proposed attention module can be trained with or without extra supervision, and gives a sizable boost in accuracy while keeping the network size and computational cost nearly the same. It leads to significant improvements over a state-of-the-art base architecture on three standard action recognition benchmarks across still images and videos, and establishes a new state of the art on MPII (12.5% relative improvement). We also perform an extensive analysis of our attention module, both empirically and analytically. In terms of the latter, we introduce a novel derivation of bottom-up and top-down attention as low-rank approximations of bilinear pooling methods (typically used for fine-grained classification). From this perspective, our attention formulation suggests a novel characterization of action recognition as a fine-grained recognition problem.
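The low-rank view mentioned in the abstract can be illustrated with a small numerical check. This is a minimal sketch, not the authors' implementation. It assumes a second-order (bilinear) pooling score of the form Tr(XᵀX Wᵀ) for a feature matrix X with one row per spatial location, and it assumes the weight matrix W is restricted to rank 1, W = a bᵀ. Under those assumptions the score factors into a sum over locations of two per-location projections, which can be read as an attention map multiplied by per-location scores. The names `bilinear_score`, `attention_score`, `a`, `b`, and the sizes `n` and `f` are hypothetical and chosen only for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def bilinear_score(X, W):
    """Assumed bilinear pooling score: Tr(X^T X W^T) = sum_ij (X^T X)_ij * W_ij."""
    f = len(W)
    score = 0.0
    for i in range(f):
        for j in range(f):
            gram_ij = sum(x[i] * x[j] for x in X)  # entry (i, j) of X^T X
            score += gram_ij * W[i][j]
    return score


def attention_score(X, a, b):
    """Rank-1 form: sum over locations of (x . a) * (x . b)."""
    return sum(dot(x, a) * dot(x, b) for x in X)


random.seed(0)
n, f = 6, 4  # hypothetical sizes: 6 spatial locations, 4 feature channels
X = [[random.gauss(0, 1) for _ in range(f)] for _ in range(n)]
a = [random.gauss(0, 1) for _ in range(f)]  # assumed projection vector
b = [random.gauss(0, 1) for _ in range(f)]  # assumed projection vector
W = [[a[i] * b[j] for j in range(f)] for i in range(f)]  # rank-1 W = a b^T

full = bilinear_score(X, W)
low_rank = attention_score(X, a, b)
print(abs(full - low_rank) < 1e-9)
```

In this sketch the rank-1 form never builds the f-by-f matrix XᵀX, which is consistent with the abstract's claim that the attention module keeps network size and computational cost nearly the same.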

research
04/02/2020

Knowing What, Where and When to Look: Efficient Video Action Modeling with Attention

Attentive video modeling is essential for action recognition in unconstr...
research
12/13/2021

Multi-Expert Human Action Recognition with Hierarchical Super-Class Learning

In still image human action recognition, existing studies have mainly le...
research
07/25/2021

Adaptive Recursive Circle Framework for Fine-grained Action Recognition

How to model fine-grained spatial-temporal dynamics in videos has been a...
research
08/20/2019

Action recognition with spatial-temporal discriminative filter banks

Action recognition has seen a dramatic performance improvement in the la...
research
08/19/2022

Hierarchical Compositional Representations for Few-shot Action Recognition

Recently action recognition has received more and more attention for its...
research
02/16/2021

Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries

We present EgoACO, a deep neural architecture for video action recogniti...
research
12/05/2018

Local Temporal Bilinear Pooling for Fine-grained Action Parsing

Fine-grained temporal action parsing is important in many applications, ...