Hierarchical Feature Aggregation Networks for Video Action Recognition

05/29/2019
by Swathikiran Sudhakaran, et al.

Most action recognition methods are based on either a) late aggregation of frame-level CNN features using average pooling, max pooling, or an RNN, among others, or b) spatio-temporal aggregation via 3D convolutions. The first assumes independence among frame features up to a certain level of abstraction and then performs higher-level aggregation, while the second extracts spatio-temporal features from grouped frames as early fusion. In this paper we explore the space between these two by letting adjacent feature branches interact as they develop into the higher-level representation. The interaction happens between feature differencing and averaging at each level of the hierarchy, and it has a convolutional structure that learns to select the appropriate mode locally, in contrast to previous works that impose one of the modes globally (e.g. feature differencing) as a design choice. We further constrain this interaction to be conservative, e.g. a local feature subtraction in one branch is compensated by an addition in another, such that the total feature flow is preserved. We evaluate our proposal on a number of existing models, i.e. TSN, TRN and ECO, to show its flexibility and effectiveness in improving action recognition performance.
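To make the conservation constraint concrete, the following is a minimal NumPy sketch of one possible interaction between two adjacent feature branches. It is not the authors' implementation: the function name `conservative_interaction`, the tanh gate, the 1x1 channel-mixing weights, and the 0.5 scaling are all assumptions chosen for illustration. The only property taken from the abstract is that whatever is subtracted from one branch is added to the other, so the sum of the two branches is unchanged.

```python
import numpy as np

def conservative_interaction(x_a, x_b, weight, bias):
    """Hypothetical conservative exchange between two feature branches.

    x_a, x_b : arrays of shape (C, H, W), features of adjacent branches.
    weight   : array of shape (C, 2C), an assumed 1x1 convolution that
               looks at both branches to produce a local gate.
    bias     : array of shape (C,).
    """
    # Stack both branches along channels so the gate sees both inputs.
    stacked = np.concatenate([x_a, x_b], axis=0)            # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over channels.
    pre_gate = np.einsum("oc,chw->ohw", weight, stacked)
    gate = np.tanh(pre_gate + bias[:, None, None])          # values in [-1, 1]
    # Amount of feature moved from branch a to branch b at each location.
    # gate -> 1 pulls both branches to their average;
    # gate < 0 pushes them apart, amplifying their difference.
    flow = 0.5 * gate * (x_a - x_b)
    # What leaves one branch enters the other, so x_a + x_b is preserved.
    return x_a - flow, x_b + flow

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
x_a = rng.standard_normal((C, H, W))
x_b = rng.standard_normal((C, H, W))
weight = 0.1 * rng.standard_normal((C, 2 * C))
bias = np.zeros(C)

y_a, y_b = conservative_interaction(x_a, x_b, weight, bias)
conserved = np.allclose(y_a + y_b, x_a + x_b)
```

In this sketch the gate is computed per channel and per spatial location, which is one way to read "select the appropriate mode locally"; the architecture in the paper may differ in how the gate is parameterized and where in the hierarchy the exchange is applied.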

