Efficient Video Classification Using Fewer Frames

02/27/2019
by   Shweta Bhardwaj, et al.

Recently, there has been a lot of interest in building compact models for video classification which have a small memory footprint (<1 GB). While these models are compact, they typically operate by repeated application of a small weight matrix to all the frames in a video. For example, recurrent neural network based methods compute a hidden state for every frame of the video using a recurrent weight matrix. Similarly, cluster-and-aggregate based methods such as NetVLAD have a learnable clustering matrix which is used to assign soft clusters to every frame in the video. Since these models look at every frame in the video, the number of floating point operations (FLOPs) is still large even though the memory footprint is small. We focus on building compute-efficient video classification models which process fewer frames and hence require fewer FLOPs. Similar to memory-efficient models, we use the idea of distillation, albeit in a different setting. Specifically, in our case, a compute-heavy teacher which looks at all the frames in the video is used to train a compute-efficient student which looks at only a small fraction of frames in the video. This is in contrast to a typical memory-efficient Teacher-Student setting, wherein both the teacher and the student look at all the frames in the video but the student has fewer parameters. Our work thus complements the research on memory-efficient video classification. We do an extensive evaluation with three types of models for video classification, viz., (i) recurrent models, (ii) cluster-and-aggregate models, and (iii) memory-efficient cluster-and-aggregate models, and show that in each of these cases, a see-it-all teacher can be used to train a compute-efficient see-very-little student. We show that the proposed student network can reduce the inference time by 30% and the number of FLOPs by approximately 90% with a negligible drop in performance.
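The core idea above — a teacher that aggregates all frames producing soft targets for a student that sees only a sampled subset — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the linear classifier over mean-pooled frame features stands in for the recurrent or NetVLAD aggregators, and the uniform frame sampling and loss names are assumptions for the sketch.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def frame_logits(frames, W):
    # hypothetical aggregator: mean-pool per-frame features, then a
    # linear classifier (stand-in for RNN / NetVLAD aggregation)
    return frames.mean(axis=0) @ W

def distillation_loss(teacher_probs, student_logits):
    # cross-entropy of the student against the teacher's soft targets
    log_p = np.log(softmax(student_logits) + 1e-12)
    return -(teacher_probs * log_p).sum()

rng = np.random.default_rng(0)
T, d, n_classes, k = 64, 16, 5, 8          # 64 frames; student sees only k = 8
frames = rng.normal(size=(T, d))            # per-frame features of one video
W = rng.normal(size=(d, n_classes))

# see-it-all teacher: soft targets computed from all T frames
teacher_probs = softmax(frame_logits(frames, W))

# see-very-little student: uniformly sampled subset of k frames
subset = frames[np.linspace(0, T - 1, k, dtype=int)]
student_logits = frame_logits(subset, W)

loss = distillation_loss(teacher_probs, student_logits)
```

During training, this distillation term would be minimized (typically alongside a standard cross-entropy loss on the true labels) with respect to the student's parameters only; at inference time the teacher is discarded and the student processes just the k sampled frames, which is where the FLOP savings come from.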
