Interpretable Deep Feature Propagation for Early Action Recognition

07/11/2021
by He Zhao, et al.

Early action recognition (action prediction) from limited preliminary observations plays a critical role in streaming vision systems that demand real-time inference, because video actions often span long temporal extents that cause undesired latency. In this study, we address action prediction by investigating how action patterns evolve over time in a spatial feature space. Our system has three key components. First, we work with intermediate-layer ConvNet features, which abstract from raw data while retaining spatial layout. Second, instead of propagating features per se, we propagate their residuals across time, which yields a compact representation with reduced redundancy. Third, we employ a Kalman filter to combat error build-up and to unify across prediction start times. Extensive experimental results on multiple benchmarks show that our approach leads to competitive performance in action prediction. Notably, we investigate the learned components of our system to shed light on their otherwise opaque nature in two ways. First, we document that our learned feature propagation module works as a spatial shifting mechanism under convolution to propagate current observations into the future; thus, it captures flow-based image motion information. Second, the learned Kalman filter adaptively updates prior estimates to aid the sequence learning process.
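
As a rough illustration of the second and third components described in the abstract, the sketch below rolls feature maps forward by propagating their temporal residuals and corrects each prediction with a Kalman-style update while observations are still available. It is not the authors' implementation. The function names, the fixed scalar `gain`, and the use of a plain spatial shift (`np.roll`) in place of the learned convolutional propagation module are all assumptions made for illustration; in the paper both the propagation module and the Kalman filter are learned.

```python
import numpy as np


def propagate_residual(residual, shift=(0, 1)):
    """Hypothetical stand-in for the learned propagation module.

    Shifts a (C, H, W) residual map by `shift` pixels along (H, W).
    A fixed shift is an assumption; the paper learns this step.
    """
    return np.roll(residual, shift=shift, axis=(1, 2))


def kalman_update(predicted, observed, gain=0.5):
    """Kalman-style correction with an assumed fixed scalar gain."""
    return predicted + gain * (observed - predicted)


def rollout(observed, num_future, gain=0.5, shift=(0, 1)):
    """Predict `num_future` feature maps from observed ones.

    observed: array of shape (T, C, H, W) with T >= 1.
    Returns an array of shape (num_future, C, H, W).
    """
    estimate = observed[0]
    residual = np.zeros_like(estimate)

    # Observed phase: predict each next map, then correct it with the
    # actual observation so that errors do not accumulate.
    for t in range(1, len(observed)):
        predicted = estimate + propagate_residual(residual, shift)
        corrected = kalman_update(predicted, observed[t], gain)
        residual = corrected - estimate  # residual between consecutive estimates
        estimate = corrected

    # Future phase: no observations remain, so only residuals are propagated.
    future = []
    for _ in range(num_future):
        residual = propagate_residual(residual, shift)
        estimate = estimate + residual
        future.append(estimate)
    return np.stack(future)


if __name__ == "__main__":
    # Toy input: 4 observed maps with 8 channels on a 7x7 grid.
    feats = np.random.default_rng(0).normal(size=(4, 8, 7, 7))
    print(rollout(feats, num_future=3).shape)
```

Propagating the residual rather than the full feature map means that only the change between consecutive time steps is carried forward, which is the source of the compactness the abstract refers to.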

research
03/23/2021

Learning Comprehensive Motion Representation for Action Recognition

For action recognition learning, 2D CNN-based methods are efficient but ...
research
06/02/2018

Squeeze-and-Excitation on Spatial and Temporal Deep Feature Space for Action Recognition

Spatial and temporal features are two key and complementary information ...
research
02/22/2021

Deep Kalman Filter: A Refinement Module for the Rollout Trajectory Prediction Methods

Trajectory prediction plays a pivotal role in the field of intelligent v...
research
12/13/2015

Action Recognition with Image Based CNN Features

Most of human actions consist of complex temporal compositions of more s...
research
08/26/2020

Effective Action Recognition with Embedded Key Point Shifts

Temporal feature extraction is an essential technique in video-based act...
research
07/11/2021

Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction

Video predictive understanding encompasses a wide range of efforts that ...
research
07/28/2020

Kalman Filter-based Head Motion Prediction for Cloud-based Mixed Reality

Volumetric video allows viewers to experience highly-realistic 3D conten...
