Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence

05/15/2021
by Robert A. Cohen, et al.

In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device, and the remaining portion of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN, without requiring any retraining of the network weights. Mathematical models for estimating the clipping and quantization error of ReLU and leaky-ReLU activations at this intermediate layer are developed and used to compute optimal clipping ranges for coarse quantization. We also present a modified entropy-constrained design algorithm for quantizing clipped activations. When applied to popular object-detection and classification DNNs, we were able to compress the 32-bit floating point intermediate activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. When compared to HEVC, we found that the lightweight codec consistently provided better inference accuracy, by up to 1.3%. The performance and simplicity of this lightweight compression technique make it an attractive option for coding an intermediate layer of a split neural network for edge/cloud applications.
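The abstract names two building blocks, clipping followed by coarse quantization, and entropy-constrained quantizer design, that are easy to illustrate. The first sketch below is a minimal Python illustration of clip-then-quantize for ReLU activations; the uniform quantizer, the hand-picked clipping value c_max, and the function names are assumptions for illustration only, since the paper derives its optimal clipping range from its error models rather than fixing it by hand. The empirical entropy computed at the end shows how nominally 2-bit codes can entropy-code to under 1 bit per element, consistent with the 0.6 to 0.8 bit figure above.

    import numpy as np

    def clip_and_quantize(x, c_max, n_bits):
        # Illustrative only: the paper computes the clipping range
        # analytically, rather than taking c_max as a given.
        n_levels = 2 ** n_bits
        step = c_max / (n_levels - 1)
        codes = np.round(np.clip(x, 0.0, c_max) / step).astype(np.uint8)
        return codes, codes.astype(np.float32) * step

    # Toy stand-in for an intermediate ReLU feature tensor.
    feats = np.maximum(np.random.randn(1, 64, 28, 28).astype(np.float32), 0.0)
    codes, recon = clip_and_quantize(feats, c_max=3.0, n_bits=2)
    mse = np.mean((feats - recon) ** 2)  # combined clipping + quantization error

    # Empirical entropy of the codes: the rate an entropy coder could
    # approach, often well below the nominal 2 bits per element.
    p = np.bincount(codes.ravel()) / codes.size
    bits_per_element = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    print(f"MSE {mse:.4f}, empirical rate {bits_per_element:.2f} bits/element")

For the quantizer-design step, this page gives no detail on the paper's modified algorithm, so the second sketch is only the textbook entropy-constrained scalar quantizer iteration: assign each sample to the codeword minimizing a Lagrangian distortion-plus-rate cost, then update the codewords and code-length estimates. The names and the fixed Lagrange multiplier lam are likewise assumptions.

    def ecsq_design(samples, n_levels, lam, n_iters=30):
        # Textbook entropy-constrained scalar quantizer (ECSQ) design,
        # not the paper's modified variant.
        centers = np.linspace(samples.min(), samples.max(), n_levels)
        probs = np.full(n_levels, 1.0 / n_levels)
        for _ in range(n_iters):
            lengths = -np.log2(np.maximum(probs, 1e-12))  # ideal code lengths
            cost = (samples[:, None] - centers[None, :]) ** 2 + lam * lengths
            assign = np.argmin(cost, axis=1)
            for k in range(n_levels):
                in_cell = assign == k
                probs[k] = in_cell.mean()
                if in_cell.any():
                    centers[k] = samples[in_cell].mean()
        return centers, probs

    # Design a 4-level quantizer for the clipped activations.
    centers, probs = ecsq_design(np.clip(feats, 0.0, 3.0).ravel(),
                                 n_levels=4, lam=0.05)

Sweeping lam trades bit rate against distortion, which is how a codec built this way could target an operating point such as 0.6 to 0.8 bits per element.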


Related research

05/12/2021
Lightweight compression of neural network feature tensors for collaborative intelligence
In collaborative intelligence applications, part of a deep neural networ...

10/11/2022
Edge-Cloud Cooperation for DNN Inference via Reinforcement Learning and Supervised Learning
Deep Neural Networks (DNNs) have been widely applied in Internet of Thin...

05/24/2022
Multi-Agent Collaborative Inference via DNN Decoupling: Intermediate Feature Compression and Edge Learning
Recently, deploying deep neural network (DNN) models via collaborative i...

08/24/2022
A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing
Split computing has emerged as a recent paradigm for implementation of D...

12/16/2018
Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge
Recently, deep neural networks (DNNs) have been widely applied in mobile...

02/12/2018
Deep feature compression for collaborative object detection
Recent studies have shown that the efficiency of deep neural networks in...

07/13/2022
DiverGet: A Search-Based Software Testing Approach for Deep Neural Network Quantization Assessment
Quantization is one of the most applied Deep Neural Network (DNN) compre...
