AdaFrame: Adaptive Frame Selection for Fast Video Recognition

11/29/2018
by   Zuxuan Wu, et al.
4

We present AdaFrame, a framework that adaptively selects relevant frames on a per-input basis for fast video recognition. AdaFrame contains a Long Short-Term Memory network augmented with a global memory that provides context information for searching which frames to use over time. Trained with policy gradient methods, AdaFrame generates a prediction, determines which frame to observe next, and computes the utility, i.e., expected future rewards, of seeing more frames at each time step. At testing time, AdaFrame exploits predicted utilities to achieve adaptive lookahead inference such that the overall computational costs are reduced without incurring a decrease in accuracy. Extensive experiments are conducted on two large-scale video benchmarks, FCVID and AvtivityNet. AdaFrame matches the performance of using all frames with only 8.21 and 8.65 frames on FCVID and AvtivityNet, respectively. We further qualitatively demonstrate learned frame usage can indicate the difficulty of making classification decisions; easier samples need fewer frames while harder ones require more, both at instance-level within the same class and at class-level among different categories.

READ FULL TEXT

page 3

page 8

research
09/19/2017

Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions

In this paper, we present a novel deep learning based approach for addre...
research
12/29/2020

2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition

3D convolutional networks are prevalent for video recognition. While ach...
research
03/11/2019

Quality-Gated Convolutional LSTM for Enhancing compressed video

The past decade has witnessed great success in applying deep learning to...
research
05/22/2015

Efficient Large Scale Video Classification

Video classification has advanced tremendously over the recent years. A ...
research
03/15/2023

VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression

In content-based video retrieval (CBVR), dealing with large-scale collec...
research
09/30/2022

INT: Towards Infinite-frames 3D Detection with An Efficient Framework

It is natural to construct a multi-frame instead of a single-frame 3D de...
research
11/30/2017

Budget-Aware Activity Detection with A Recurrent Policy Network

In this paper, we address the challenging problem of effi- cient tempora...

Please sign up or login with your details

Forgot password? Click here to reset