Online Model Distillation for Efficient Video Inference

by   Ravi Teja Mullapudi, et al.

High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of achieving more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, we employ the technique of model distillation (supervising a low-cost student model using the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. Rather than learn a specialized student model on offline data from the video stream, we train the student in an online fashion on the live video, intermittently running the teacher to provide a target for learning. Online model distillation yields semantic segmentation models that closely approximate their Mask R-CNN teacher with 7 to 17x lower inference runtime cost (11 to 26x in FLOPs), even when the target video's distribution is non-stationary. Our method requires no offline pretraining on the target video stream, and achieves higher accuracy and lower cost than solutions based on flow or video object segmentation. We also provide a new video dataset for evaluating the efficiency of inference over long running video streams.


page 1

page 5

page 7


Delta Distillation for Efficient Video Processing

This paper aims to accelerate video stream processing, such as object de...

MaskedKD: Efficient Distillation of Vision Transformers with Masked Images

Knowledge distillation is a popular and effective regularization techniq...

Efficient Video Classification Using Fewer Frames

Recently,there has been a lot of interest in building compact models for...

Low Cost Edge Sensing for High Quality Demosaicking

Digital cameras that use Color Filter Arrays (CFA) entail a demosaicking...

ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference

Following the recent success of deep neural networks (DNN) on video comp...

I Have Seen Enough: A Teacher Student Network for Video Classification Using Fewer Frames

Over the past few years, various tasks involving videos such as classifi...

Toward Understanding Privileged Features Distillation in Learning-to-Rank

In learning-to-rank problems, a privileged feature is one that is availa...

Please sign up or login with your details

Forgot password? Click here to reset