Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling

07/12/2022
by Khoi-Nguyen C. Mac, et al.

Adaptive sampling that exploits the spatiotemporal redundancy in videos is critical for always-on action recognition on wearable devices with limited computing and battery resources. The commonly used fixed sampling strategy is not context-aware and may under-sample the visual content, which adversely impacts both computational efficiency and accuracy. Inspired by the concepts of foveal vision and pre-attentive processing in human visual perception, we introduce a novel adaptive spatiotemporal sampling scheme for efficient action recognition. Our system pre-scans the global scene context at low resolution and decides either to skip the frame or to request high-resolution features at salient regions for further processing. We validate the system on the EPIC-KITCHENS and UCF-101 datasets for action recognition, and show that our proposed approach can greatly speed up inference with a tolerable loss of accuracy compared with state-of-the-art baselines. Source code is available at https://github.com/knmac/adaptive_spatiotemporal.
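
The pre-scan-then-decide idea can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): it assumes grayscale frames with values in [0, 1], uses plain frame differencing as a stand-in for a learned saliency signal, and the names `downsample`, `adaptive_sample`, `skip_threshold`, and `crop` are hypothetical.

```python
import numpy as np


def downsample(frame, factor):
    """Average-pool a 2-D grayscale frame by an integer factor."""
    h, w = frame.shape
    h, w = h - h % factor, w - w % factor  # drop rows/cols that do not fit
    blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def adaptive_sample(frames, factor=8, skip_threshold=0.05, crop=32):
    """Return (frame_index, high_res_crop) for frames that are not skipped.

    Assumption: saliency is the absolute change of the low-resolution view
    relative to the last processed frame. Frames must be at least
    crop x crop pixels.
    """
    prev_low = None
    selected = []
    for idx, frame in enumerate(frames):
        low = downsample(frame, factor)  # cheap global pre-scan
        if prev_low is None:
            change = np.ones_like(low)  # always process the first frame
        else:
            change = np.abs(low - prev_low)

        # Temporal decision: skip frames whose global context barely changed.
        if change.mean() < skip_threshold:
            continue
        prev_low = low

        # Spatial decision: request high resolution only around the most
        # salient low-resolution cell.
        r, c = np.unravel_index(np.argmax(change), change.shape)
        cy, cx = r * factor + factor // 2, c * factor + factor // 2
        top = int(np.clip(cy - crop // 2, 0, frame.shape[0] - crop))
        left = int(np.clip(cx - crop // 2, 0, frame.shape[1] - crop))
        selected.append((idx, frame[top:top + crop, left:left + crop]))
    return selected
```

In this sketch the expensive recognition model would only ever see the returned crops, so the cost depends on how much the scene changes rather than on the raw frame rate. Comparing against the last processed frame, rather than the immediately preceding one, lets slow drift accumulate until it eventually triggers a high-resolution request.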


research
01/11/2018

Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition

A major emerging challenge is how to protect people's privacy as cameras...
research
07/20/2020

Context-Aware RCNN: A Baseline for Action Detection in Videos

Video action detection approaches usually conduct actor-centric action r...
research
07/31/2020

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

Action recognition is an open and challenging problem in computer vision...
research
12/15/2020

Towards Improving Spatiotemporal Action Recognition in Videos

Spatiotemporal action recognition deals with locating and classifying ac...
research
04/10/2019

Attentive Action and Context Factorization

We propose a method for human action recognition, one that can localize ...
research
08/30/2021

LIGAR: Lightweight General-purpose Action Recognition

Growing amount of different practical tasks in a video understanding pro...
research
06/09/2022

STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction

Although significant achievements have been achieved by recurrent neural...
