A High-Performance Adaptive Quantization Approach for Edge CNN Applications

07/18/2021
by   Hsu-Hsun Chin, et al.
0

Recent convolutional neural network (CNN) development continues to advance the state-of-the-art model accuracy for various applications. However, the enhanced accuracy comes at the cost of substantial memory bandwidth and storage requirements and demanding computational resources. Although in the past the quantization methods have effectively reduced the deployment cost for edge devices, it suffers from significant information loss when processing the biased activations of contemporary CNNs. In this paper, we hence introduce an adaptive high-performance quantization method to resolve the issue of biased activation by dynamically adjusting the scaling and shifting factors based on the task loss. Our proposed method has been extensively evaluated on image classification models (ResNet-18/34/50, MobileNet-V2, EfficientNet-B0) with ImageNet dataset, object detection model (YOLO-V4) with COCO dataset, and language models with PTB dataset. The results show that our 4-bit integer (INT4) quantization models achieve better accuracy than the state-of-the-art 4-bit models, and in some cases, even surpass the golden full-precision models. The final designs have been successfully deployed onto extremely resource-constrained edge devices for many practical applications.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/06/2023

Norm Tweaking: High-performance Low-bit Quantization of Large Language Models

As the size of large language models (LLMs) continues to grow, model com...
research
05/04/2021

One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment

As an effective technique to achieve the implementation of deep neural n...
research
10/13/2020

A Very Compact Embedded CNN Processor Design Based on Logarithmic Computing

In this paper, we propose a very compact embedded CNN processor design b...
research
04/19/2023

Post-Training Quantization for Object Detection

Efficient inference for object detection networks is a major challenge o...
research
12/18/2014

Compressing Deep Convolutional Networks using Vector Quantization

Deep convolutional neural networks (CNN) has become the most promising m...
research
05/30/2019

Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers

This paper presents a novel end-to-end methodology for enabling the depl...
research
03/07/2023

CAE-CNNLoc: An Edge-based WiFi Fingerprinting Indoor Localization Using Convolutional Neural Network and Convolutional Auto-Encoder

With the ongoing development of Indoor Location-Based Services, accurate...

Please sign up or login with your details

Forgot password? Click here to reset