No frame left behind: Full Video Action Recognition

03/29/2021
by Xin Liu, et al.

Not all video frames are equally informative for recognizing an action. It is computationally infeasible to train deep networks on all video frames when actions develop over hundreds of frames. A common heuristic is to uniformly sample a small number of video frames and use these to recognize the action. Instead, here we propose full video action recognition and consider all video frames. To make this computationally tractable, we first cluster all frame activations along the temporal dimension based on their similarity with respect to the classification task, and then temporally aggregate the frames in the clusters into a smaller number of representations. Our method is end-to-end trainable and computationally efficient as it relies on temporally localized clustering in combination with fast Hamming distances in feature space. We evaluate on UCF101, HMDB51, Breakfast, and Something-Something V1 and V2, where we compare favorably to existing heuristic frame sampling methods.
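
The idea of clustering frame activations along time and then aggregating each cluster can be illustrated with a small sketch. This is not the authors' implementation: the sign-based binarization, the comparison against the running mean of the most recent cluster, the `max_dist` threshold, and the function names are all assumptions made for illustration. It only shows how a Hamming distance on binarized activations, restricted to the temporally adjacent cluster, can reduce many frames to a few averaged representations.

```python
from typing import List

Vector = List[float]


def binarize(activation: Vector) -> List[int]:
    # Assumed binarization: 1 where the activation is positive, else 0.
    return [1 if v > 0 else 0 for v in activation]


def hamming(a: List[int], b: List[int]) -> int:
    # Number of positions where the two binary codes differ.
    return sum(x != y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    # Element-wise average of equally sized vectors.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_and_aggregate(frames: List[Vector], max_dist: int) -> List[Vector]:
    """Group consecutive frames with similar binary codes, then average each group.

    Each frame is compared only with the most recent cluster, so the
    grouping stays temporally localized (an assumption of this sketch).
    """
    clusters: List[List[Vector]] = []
    for frame in frames:
        if clusters:
            current_code = binarize(mean(clusters[-1]))
            if hamming(binarize(frame), current_code) <= max_dist:
                clusters[-1].append(frame)
                continue
        # The frame differs too much (or is the first one): open a new cluster.
        clusters.append([frame])
    return [mean(cluster) for cluster in clusters]


if __name__ == "__main__":
    # Four toy per-frame activations; the first two and the last two are similar.
    toy_frames = [[1.0, -1.0], [0.5, -0.5], [-1.0, 1.0], [-0.5, 0.5]]
    print(cluster_and_aggregate(toy_frames, max_dist=0))
    # prints [[0.75, -0.75], [-0.75, 0.75]]
```

In this toy run, four frames collapse into two representations, so any later processing operates on two vectors instead of four. A trainable version would operate on network activations and backpropagate through the aggregation, which this sketch does not attempt.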


