Dynamic Network Quantization for Efficient Video Inference

08/23/2021
by   Ximeng Sun, et al.
5

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition. Motivated by the effectiveness of quantization for boosting efficiency, in this paper, we propose a dynamic network quantization framework, that selects optimal precision for each frame conditioned on the input for efficient video recognition. Specifically, given a video clip, we train a very lightweight network in parallel with the recognition network, to produce a dynamic policy indicating which numerical precision to be used per frame in recognizing videos. We train both networks effectively using standard backpropagation with a loss to achieve both competitive performance and resource efficiency required for video recognition. Extensive experiments on four challenging diverse benchmark datasets demonstrate that our proposed approach provides significant savings in computation and memory usage while outperforming the existing state-of-the-art methods.

READ FULL TEXT

page 1

page 3

page 8

page 14

research
07/31/2020

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

Action recognition is an open and challenging problem in computer vision...
research
05/11/2021

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

Multi-modal learning, which focuses on utilizing various modalities to i...
research
03/02/2021

All at Once Network Quantization via Collaborative Knowledge Transfer

Network quantization has rapidly become one of the most widely used meth...
research
07/10/2016

Intra-layer Nonuniform Quantization for Deep Convolutional Neural Network

Deep convolutional neural network (DCNN) has achieved remarkable perform...
research
02/15/2021

VA-RED^2: Video Adaptive Redundancy Reduction

Performing inference on deep learning models for videos remains a challe...
research
09/27/2022

AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition

Recent research has revealed that reducing the temporal and spatial redu...
research
06/11/2018

Massively Parallel Video Networks

We introduce a class of causal video understanding models that aims to i...

Please sign up or login with your details

Forgot password? Click here to reset