FrameExit: Conditional Early Exiting for Efficient Video Recognition

04/27/2021
by   Amir Ghodrati, et al.
5

In this paper, we propose a conditional early exiting framework for efficient video recognition. While existing works focus on selecting a subset of salient frames to reduce the computation costs, we propose to use a simple sampling strategy combined with conditional early exiting to enable efficient recognition. Our model automatically learns to process fewer frames for simpler videos and more frames for complex ones. To achieve this, we employ a cascade of gating modules to automatically determine the earliest point in processing where an inference is sufficiently reliable. We generate on-the-fly supervision signals to the gates to provide a dynamic trade-off between accuracy and computational cost. Our proposed model outperforms competing methods on three large-scale video benchmarks. In particular, on ActivityNet1.3 and mini-kinetics, we outperform the state-of-the-art efficient video recognition methods with 1.3× and 2.1× less GFLOPs, respectively. Additionally, our method sets a new state of the art for efficient video understanding on the HVU benchmark.

READ FULL TEXT

page 1

page 3

page 5

page 11

research
01/12/2022

OCSampler: Compressing Videos to One Clip with Single-step Sampling

In this paper, we propose a framework named OCSampler to explore a compa...
research
12/29/2020

2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition

3D convolutional networks are prevalent for video recognition. While ach...
research
09/27/2022

AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition

Recent research has revealed that reducing the temporal and spatial redu...
research
12/03/2019

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition

This paper presents LiteEval, a simple yet effective coarse-to-fine fram...
research
09/26/2022

Rethinking Resolution in the Context of Efficient Video Recognition

In this paper, we empirically study how to make the most of low-resoluti...
research
01/27/2023

Semi-Parametric Video-Grounded Text Generation

Efficient video-language modeling should consider the computational cost...
research
02/16/2021

THIA: Accelerating Video Analytics using Early Inference and Fine-Grained Query Planning

To efficiently process visual data at scale, researchers have proposed t...

Please sign up or login with your details

Forgot password? Click here to reset