A Quantization-Friendly Separable Convolution for MobileNets

by Tao Sheng, et al.

As deep learning (DL) is rapidly being pushed to edge computing, researchers have invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning and parameter compression. Quantization, as one of the key approaches, can effectively offload the GPU and makes it possible to deploy DL on a fixed-point pipeline. Unfortunately, not all existing network designs are friendly to quantization. For example, while the popular lightweight MobileNetV1 successfully reduces parameter size and computation latency with separable convolution, our experiments show that its quantized models have a large accuracy gap against their floating-point counterparts. To resolve this, we analyzed the root cause of the quantization loss and propose a quantization-friendly separable convolution architecture. Evaluated on the image classification task on the ImageNet2012 dataset, our modified MobileNetV1 model achieves an 8-bit inference top-1 accuracy of 68.03%, nearly closing the gap to the float pipeline.





1. Introduction

Quantization is crucial for DL inference on mobile/IoT platforms, which have very limited budgets for power and memory consumption. Such platforms often rely on fixed-point computational hardware blocks, such as the Digital Signal Processor (DSP), to achieve higher power efficiency than floating-point processors such as the GPU. For existing DL models such as VGGNet (VGGNet, ), GoogleNet (GoogleNet, ), and ResNet (ResNet, ), quantization may not impact inference accuracy because of their over-parameterized design, but it would still be difficult to deploy those models on mobile platforms due to their large computation latency. Many lightweight networks, however, trade off accuracy for efficiency by replacing conventional convolution with depthwise separable convolution, as shown in Figure 1(a)(b). For example, the MobileNets proposed by Google drastically shrink parameter size and memory footprint, and are thus becoming increasingly popular on mobile platforms. The downside is that the separable convolution core layer in MobileNetV1 causes a large quantization loss, resulting in significant feature representation degradation in the 8-bit inference pipeline.

Figure 1. Our proposed quantization-friendly separable convolution core layer design vs. separable convolution in MobileNets and standard convolution

To demonstrate the quantization issue, we selected the TensorFlow implementations of MobileNetV1 (mobilenetmodel, ) and InceptionV3 (gv3, ), and compared their accuracy on the float pipeline against the 8-bit quantized pipeline. The results are summarized in Table 1. The top-1 accuracy of InceptionV3 drops only slightly after applying 8-bit quantization, while the accuracy loss is significant for MobileNetV1.

Table 1. Top-1 accuracy on ImageNet2012 validation dataset
Networks Float Pipeline 8-bit Pipeline Comments
InceptionV3 78.00% 76.92% Only standard convolution
MobileNetV1 70.50% 1.80% Mainly separable convolution

There are a few ways to potentially address the issue. The most straightforward approach is quantization with more bits; for example, increasing from 8-bit to 16-bit quantization could boost the accuracy (dl-compression-survey, ), but this is largely limited by the capability of target platforms. Alternatively, we could re-train the network to generate a dedicated quantized model for fixed-point inference. Google proposed a quantized training framework (quanttraining, ) co-designed with the quantized inference to minimize the accuracy loss from quantization on inference models. The framework simulates quantization effects in the forward pass of training, whereas back-propagation still runs in the float pipeline. This re-training framework can reduce the quantization loss dedicatedly for the fixed-point pipeline, but at the cost of extra training, and the system needs to maintain multiple models for different platforms.

In this paper, we focus on a new architecture design for the separable convolution layer to build lightweight quantization-friendly networks. The proposed architecture requires only a single training pass in the float pipeline, and the trained model can then be deployed to different platforms with float or fixed-point inference pipelines with minimal accuracy loss. To achieve this, we look deep into the root causes of the accuracy degradation of MobileNetV1 in the 8-bit inference pipeline. Based on the findings, we propose a re-architected quantization-friendly MobileNetV1 that maintains competitive accuracy in the float pipeline, but achieves much higher inference accuracy in the quantized 8-bit pipeline. Our main contributions are:

  1. We identified that batch normalization and ReLU6 are the major root causes of the quantization loss for MobileNetV1.

  2. We proposed a quantization-friendly separable convolution, and empirically proved its effectiveness based on MobileNetV1 in both the float pipeline and the fixed-point pipeline.

2. Quantization Scheme and Loss Analysis

In this section, we explore the TensorFlow (TF) (tf, ) 8-bit quantized MobileNetV1 model and find the root cause of the accuracy loss in the fixed-point pipeline. Figure 2 shows a typical 8-bit quantized pipeline. A TF 8-bit quantized model is generated directly from a pre-trained float model, where all weights are first quantized offline. During inference, any float input is quantized to an 8-bit unsigned value before being passed to a fixed-point runtime operation, such as QuantizedConv2d, QuantizedAdd, or QuantizedMul. These operations produce a 32-bit accumulated result, which is converted down to an 8-bit output through an activation re-quantization step. Note that this output becomes the input to the next operation.

Figure 2. A fixed-point quantized pipeline

2.1. TensorFlow 8-bit Quantization Scheme

TensorFlow 8-bit quantization uses a uniform quantizer, in which all quantization steps are of equal size. Let $x$ represent the float value of signal $X$; the TF 8-bit quantized value $\tilde{x}$ can be calculated as

$$\tilde{x} = \mathrm{round}\!\left(\frac{x}{\Delta_x}\right) - \delta_x \quad (1)$$

$$\Delta_x = \frac{x_{max} - x_{min}}{2^b - 1} \quad (2)$$

where $\Delta_x$ represents the quantization step size; $b$ is the bit-width, i.e., $b = 8$; and $\delta_x = \mathrm{round}(x_{min}/\Delta_x)$ is the offset value such that the float value 0 is exactly represented, so that $x \approx \Delta_x(\tilde{x} + \delta_x)$. $x_{min}$ and $x_{max}$ are the min and max values of $x$ in the float domain, and $\mathrm{round}(\cdot)$ represents the nearest rounding operation. In the TensorFlow implementation, it is defined as

$$\mathrm{round}(x) = \mathrm{sgn}(x) \cdot \left\lfloor |x| + 0.5 \right\rfloor \quad (3)$$

where $\mathrm{sgn}(x)$ is the sign of the signal $x$, and $\lfloor\cdot\rfloor$ represents the floor operation.
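To make the scheme above concrete, the following is a minimal numpy sketch of equations (1)-(3). The function names and the clipping to the unsigned 8-bit range are illustrative choices, not TensorFlow's actual kernels:

```python
import numpy as np

def tf_round(x):
    # Equation (3): round half away from zero, sgn(x) * floor(|x| + 0.5)
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

def quantize(x, x_min, x_max, bits=8):
    # Equation (2): uniform step size over the float range [x_min, x_max]
    delta = (x_max - x_min) / (2 ** bits - 1)
    # Offset chosen so that the float value 0.0 maps to an exact level
    offset = tf_round(x_min / delta)
    # Equation (1): map float values onto unsigned b-bit integer levels
    q = tf_round(x / delta) - offset
    return np.clip(q, 0, 2 ** bits - 1), delta, offset

def dequantize(q, delta, offset):
    # Inverse mapping: x is approximated by delta * (q + offset)
    return delta * (q + offset)
```

Round-tripping a value through `quantize` and `dequantize` reproduces it to within one quantization step, and 0.0 is represented exactly, as the offset definition requires.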

Based on the definitions above, the accumulated result of a convolution operation between quantized inputs $\tilde{x}_i$ and quantized weights $\tilde{w}_i$ is computed by

$$\tilde{a} = \sum_{i} (\tilde{x}_i + \delta_x)(\tilde{w}_i + \delta_w) \approx \frac{1}{\Delta_x \Delta_w} \sum_{i} x_i w_i \quad (4)$$

Finally, given known min and max values of the output, by combining equations (1) and (4), the re-quantized output can be calculated by multiplying the accumulated result with $\frac{\Delta_x \Delta_w}{\Delta_a}$, and then subtracting the output offset $\delta_a$:

$$\tilde{a}_{out} = \mathrm{round}\!\left(\frac{\Delta_x \Delta_w}{\Delta_a}\,\tilde{a}\right) - \delta_a \quad (5)$$
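The accumulate-and-requantize steps of equations (4)-(5) can be sketched end to end in numpy. This is a simplified model of the fixed-point pipeline, not TensorFlow's implementation; the output range `a_min`/`a_max` is assumed known, as the text states:

```python
import numpy as np

def tf_round(x):
    # Round half away from zero, as in equation (3)
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

def quantize(x, lo, hi, bits=8):
    delta = (hi - lo) / (2 ** bits - 1)
    offset = tf_round(lo / delta)
    return np.clip(tf_round(x / delta) - offset, 0, 2 ** bits - 1), delta, offset

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 64)       # float inputs
w = rng.uniform(-0.5, 0.5, 64)       # float weights
a = float(np.dot(x, w))              # float reference accumulation

xq, dx, ox = quantize(x, -1.0, 1.0)
wq, dw, ow = quantize(w, -0.5, 0.5)

# Equation (4): integer accumulation of offset-corrected values
acc = np.dot(xq + ox, wq + ow)

# Equation (5): re-quantize the accumulator given the (assumed known)
# min/max of the output, then subtract the output offset
a_min, a_max = -8.0, 8.0
da = (a_max - a_min) / 255.0
oa = tf_round(a_min / da)
out_q = np.clip(tf_round(acc * dx * dw / da) - oa, 0, 255)

out_float = da * (out_q + oa)        # dequantized approximation of a
```

The dequantized 8-bit result `out_float` tracks the float accumulation `a` up to the combined input, weight, and re-quantization errors.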
2.2. Metric for Quantization Loss

As depicted in Figure 2, there are five types of loss in the fixed-point quantized pipeline: input quantization loss, weight quantization loss, runtime saturation loss, activation re-quantization loss, and possible clipping loss for certain non-linear operations, such as ReLU6. To better understand the contribution of each type, we use the Signal-to-Quantization-Noise Ratio (SQNR), defined as the power of the unquantized signal divided by the power of the quantization error, as a metric to evaluate the quantization accuracy at each layer output:

$$\mathrm{SQNR} = 10 \log_{10} \frac{E[x^2]}{E[n^2]} \;\; \mathrm{(dB)} \quad (6)$$
Since the average magnitude of the input signal $x$ is much larger than the quantization step size $\Delta_x$, it is reasonable to assume that the quantization error is zero-mean and uniformly distributed over one quantization step, with a probability density function that integrates to 1 (quant-theory, ). Therefore, for a $b$-bit linear quantizer, the noise power can be calculated by

$$E[n^2] = \frac{\Delta_x^2}{12} \quad (7)$$
Substituting equations (2) and (7) into equation (6), we get

$$\mathrm{SQNR} = 10 \log_{10} \frac{12\,(2^b - 1)^2\, E[x^2]}{(x_{max} - x_{min})^2} \;\; \mathrm{(dB)} \quad (8)$$

SQNR is tightly coupled with the signal distribution. From equation (8), it is obvious that SQNR is determined by two terms: the power of the signal $x$ and the quantization range. Therefore, increasing the signal power or decreasing the quantization range helps to increase the output SQNR.
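The closed-form SQNR of equation (8) can be checked against a direct measurement. Below is a small numpy sketch (function names are ours) that quantizes a uniformly distributed signal with an 8-bit linear quantizer and compares the empirical SQNR with the model:

```python
import numpy as np

def sqnr_db(x, x_min, x_max, bits=8):
    # Quantize with a uniform b-bit quantizer and measure SQNR empirically
    delta = (x_max - x_min) / (2 ** bits - 1)
    q = np.round((x - x_min) / delta) * delta + x_min
    noise = x - q
    return 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))

def sqnr_db_model(x, x_min, x_max, bits=8):
    # Equation (8): substitute step size (2) and noise power (7) into (6)
    num = 12 * (2 ** bits - 1) ** 2 * np.mean(x ** 2)
    return 10 * np.log10(num / (x_max - x_min) ** 2)

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 100_000)
emp = sqnr_db(x, -1.0, 1.0)       # measured
mod = sqnr_db_model(x, -1.0, 1.0) # predicted by equation (8)
```

For a uniform signal over the full quantization range, both values land near 48 dB, and the gap between measurement and model is a fraction of a decibel, consistent with the uniform-error assumption behind equation (7).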

2.3. Quantization Loss Analysis on MobileNetV1

2.3.1. BatchNorm in Depthwise Convolution Layer

As shown in Figure 1(b), a typical MobileNetV1 core layer consists of a depthwise convolution and a pointwise convolution, each followed by a Batch Normalization (batchnorm, ) and a non-linear activation function. In the TensorFlow implementation, ReLU6 (relu6, ) is used as the non-linear activation function. Consider a layer input $x$ with $d$ channels and $m$ elements in each channel within a mini-batch; the Batch Normalization Transform in the depthwise convolution layer is applied on each channel independently, and can be expressed for channel $i$ as

$$y_i = \gamma_i \hat{x}_i + \beta_i, \qquad \hat{x}_i = \frac{x_i - \mu_i}{\sqrt{\sigma_i^2 + \epsilon}} \quad (9)$$

where $\hat{x}_i$ represents the normalized value of $x_i$ on channel $i$; $\mu_i$ and $\sigma_i^2$ are the mean and variance over the mini-batch; and $\gamma_i$ and $\beta_i$ are the scale and shift. Note that $\epsilon$ is a given small constant; in the TensorFlow implementation, $\epsilon = 0.001$.

The Batch Normalization Transform can be further folded in the fixed-point pipeline. Let

$$\alpha_i = \frac{\gamma_i}{\sqrt{\sigma_i^2 + \epsilon}} \quad (10)$$

then equation (9) can be reformulated as

$$y_i = \alpha_i x_i + (\beta_i - \alpha_i \mu_i) \quad (11)$$

In the TensorFlow implementation, for each channel $i$, $\alpha_i$ can be combined with the weights and folded into the convolution operations to further reduce the computation cost.

Depthwise convolution is applied on each channel independently. However, the min and max values used for weight quantization are taken collectively from all channels, so an outlier in one channel can easily cause a huge quantization loss for the whole model due to the enlarged data range. Without correlation across channels, depthwise convolution is prone to producing all-zero values in one channel, leading to zero variance ($\sigma_i^2 = 0$) for that specific channel. This is commonly observed in MobileNetV1 models. Referring to equation (10), zero variance of channel $i$ produces a very large value of $\alpha_i$ due to the small constant $\epsilon$. Figure 3 shows observed $\alpha$ values across channels extracted from the first depthwise convolution layer of the MobileNetV1 float model. Notice that the outliers of $\alpha$ caused by the zero-variance issue largely increase the quantization range. As a result, quantization bits are wasted on preserving those large values, which all correspond to all-zero-value channels, while the small values corresponding to informative channels are not well preserved after quantization, which badly hurts the representation power of the model. From our experiments, without any retraining, properly handling the zero-variance issue, by replacing the variance of a channel with all-zero values with the mean of the variances of the remaining channels in that layer, dramatically improves the top-1 accuracy of the quantized MobileNetV1 on the ImageNet2012 validation dataset over the 1.80% baseline in the TF8 inference pipeline.
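The zero-variance workaround described above can be sketched in a few lines of numpy. The function names and example numbers are illustrative; the point is how a single zero-variance channel blows up the folding scale $\alpha$ of equation (10), and how substituting the mean of the other channels' variances tames it:

```python
import numpy as np

EPS = 0.001  # Batch Normalization epsilon used in the TensorFlow implementation

def fold_scale(gamma, var):
    # Equation (10): per-channel folding scale alpha_i = gamma_i / sqrt(var_i + eps)
    return gamma / np.sqrt(var + EPS)

def fix_zero_variance(var):
    # Replace the variance of all-zero channels with the mean variance of
    # the remaining channels in the layer (no re-training required)
    var = var.copy()
    zero = var == 0.0
    if zero.any() and not zero.all():
        var[zero] = var[~zero].mean()
    return var

var = np.array([0.0, 0.04, 0.09, 0.01])  # channel 0 produced all zeros
gamma = np.ones(4)
alpha_bad = fold_scale(gamma, var)                      # outlier at channel 0
alpha_fixed = fold_scale(gamma, fix_zero_variance(var)) # outlier removed
```

With the zero variance left in place, channel 0's $\alpha$ is about 31.6 (driven purely by $\epsilon$) and dominates the weight quantization range; after the fix it falls in line with the informative channels, while the other channels' scales are untouched.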

Figure 3. An example of $\alpha$ values across 32 channels of the first depthwise convolution layer from the MobileNetV1 float model

A standard convolution both filters and combines inputs into a new set of outputs in one step. In MobileNetV1, the depthwise separable convolution splits this into two layers: a depthwise layer for filtering and a pointwise layer for combining (mobilenetv1, ), thus drastically reducing computation and model size while preserving feature representations. Based on this principle, we can remove the non-linear operations, i.e., Batch Normalization and ReLU6, between the two layers, and let the network learn proper weights to handle the Batch Normalization Transform directly. This preserves all the feature representations while making the model quantization-friendly. To further understand the per-layer output accuracy of the network, we use SQNR, defined in equation (8), as a metric to observe the quantization loss in each layer. Figure 4 compares the averaged per-layer output SQNR of the original MobileNetV1 with $\alpha$ folded into the convolution weights (black curve) against a variant that simply removes Batch Normalization and ReLU6 in all depthwise convolution layers (blue curve); Batch Normalization and ReLU6 are kept in all pointwise convolution layers. Images are randomly selected from the ImageNet2012 validation dataset, one from each class. From our experiment, introducing Batch Normalization and ReLU6 between the depthwise convolution and the pointwise convolution in fact largely degrades the per-layer output SQNR.

2.3.2. ReLU6 or ReLU

In this section, we again use SQNR as a metric to measure the effect of choosing different activation functions in all pointwise convolution layers. Note that for a linear quantizer, SQNR is higher when the signal distribution is more uniform, and lower otherwise. Figure 4 shows the averaged per-layer output SQNR of MobileNetV1 using ReLU and ReLU6 as the activation function in all pointwise convolution layers. A huge SQNR drop is observed in the first pointwise convolution layer when using ReLU6. Based on equation (8), although ReLU6 helps to reduce the quantization range, the signal power also gets reduced by the clipping operation. Ideally, this should produce an SQNR similar to that of ReLU. However, clipping the signal at early layers can have the side effect of distorting the signal distribution to make it less quantization-friendly, as a result of compensating for the clipping loss during training. As we observed, this leads to a large SQNR drop from one layer to the next. Experimental results on the accuracy improvement from replacing ReLU6 with ReLU are shown in Section 4.

Figure 4. A comparison on the averaged per-layer output SQNR of MobileNetV1 with different core layer designs

2.3.3. L2 Regularization on Weights

Since SQNR is tightly coupled with the signal distribution, we further enable L2 regularization on the weights in all depthwise convolution layers during training. The L2 regularization penalizes weights with large magnitudes. Large weights could potentially increase the quantization range and make the weight distribution less uniform, leading to a large quantization loss. By enforcing a better weight distribution, a quantized model with an increased top-1 accuracy can be expected.
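The damage that a single large, unregularized weight does to the quantization range, the effect L2 regularization is meant to suppress, is easy to quantify with equation (2). A small numpy sketch (the magnitudes are illustrative):

```python
import numpy as np

def quant_step(w, bits=8):
    # Uniform quantization step implied by the weight range, equation (2)
    return (w.max() - w.min()) / (2 ** bits - 1)

rng = np.random.default_rng(2)
w = rng.normal(0.0, 0.1, 256)            # well-behaved layer weights
step_before = quant_step(w)

w_outlier = np.append(w, 10.0)           # one large unregularized weight
step_after = quant_step(w_outlier)       # the whole layer loses resolution
```

One outlier widens the min/max range by more than an order of magnitude, so every other weight in the layer is quantized with a correspondingly coarser step; penalizing large magnitudes during training keeps the range, and hence the step size, small.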

3. Quantization-Friendly Separable Convolution for MobileNets

Based on the quantization loss analysis in the previous section, we propose a quantization-friendly separable convolution framework for MobileNets. The goal is to solve the large quantization loss problem so that the quantized model can achieve accuracy similar to the float model while requiring no re-training for the fixed-point pipeline.

3.1. Architecture of the Quantization-friendly Separable Convolution

Figure 1(b) shows the separable convolution core layer in the current MobileNetV1 architecture, in which a Batch Normalization and a non-linear activation operation are introduced between the depthwise convolution and the pointwise convolution. From our analysis, due to the nature of depthwise convolution, this architecture leads to a problematic quantized model. Therefore, in Figure 1(c), three major changes are made to make the separable convolution core layer quantization-friendly.

  1. Batch Normalization and ReLU6 are removed from all depthwise convolution layers. We believe that a separable convolution shall consist of a depthwise convolution followed directly by a pointwise convolution, without any non-linear operation between the two. This not only well preserves the feature representations, but is also quantization-friendly.

  2. All ReLU6 activations are replaced with ReLU in the remaining layers. In the TensorFlow implementation of MobileNetV1, ReLU6 is used as the non-linear activation function. However, we think 6 is a rather arbitrary clipping value. Although (relu6, ) indicates that ReLU6 can encourage a model to learn sparse features earlier, clipping the signal at early layers may lead to a quantization-unfriendly signal distribution, and thus largely decreases the SQNR of the layer output.

  3. L2 regularization on the weights in all depthwise convolution layers is enabled during training.
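The resulting core layer of Figure 1(c) can be sketched directly in numpy. This is a minimal, unoptimized reference (stride 1, "same" padding, scalar BN parameters for brevity; all names are ours), showing that BN and the activation appear only after the pointwise convolution:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def quant_friendly_separable(x, dw_kernel, pw_kernel, gamma, beta, mean, var, eps=1e-3):
    """Quantization-friendly core layer, Figure 1(c):
    depthwise conv -> pointwise conv -> Batch Norm -> ReLU,
    with no BN/ReLU6 between the two convolutions."""
    H, W, C = x.shape
    k = dw_kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    # Depthwise: one k x k filter per input channel, no nonlinearity after it
    dw = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]
            dw[i, j, :] = np.sum(patch * dw_kernel, axis=(0, 1))
    # Pointwise: 1x1 convolution mixing channels, (H, W, C) @ (C, C_out)
    pw = dw @ pw_kernel
    # BN + ReLU applied only after the pointwise convolution
    bn = gamma * (pw - mean) / np.sqrt(var + eps) + beta
    return relu(bn)
```

A call with a 4x4x2 input, an averaging 3x3 depthwise kernel, and a 2-to-3-channel pointwise kernel produces a 4x4x3 non-negative output, matching the layer ordering described in the three changes above.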

3.2. A Quantization-Friendly MobileNetV1 Model

The layer structure of the proposed quantization-friendly MobileNetV1 model is shown in Table 2, which follows the overall layer structure defined in (mobilenetv1, ). The separable convolution core layer has been replaced with the quantization-friendly version described in the previous section. The model retains its efficiency in terms of computational cost and model size, while achieving high precision on fixed-point processors.

Input Operator Repeat Stride
224x224x3 Conv2d+ReLU 1 2
112x112x32 DC+PC+BN+ReLU 1 1
112x112x64 DC+PC+BN+ReLU 1 2
56x56x128 DC+PC+BN+ReLU 1 1
56x56x128 DC+PC+BN+ReLU 1 2
28x28x256 DC+PC+BN+ReLU 1 1
28x28x256 DC+PC+BN+ReLU 1 2
14x14x512 DC+PC+BN+ReLU 5 1
14x14x512 DC+PC+BN+ReLU 1 2
7x7x1024 DC+PC+BN+ReLU 1 2
7x7x1024 AvgPool 1 1
1x1x1024 Conv2d+ReLU 1 1
1x1x1000 Softmax 1 1
Table 2. Quantization-friendly modified MobileNetV1 (DC: depthwise convolution, PC: pointwise convolution, BN: Batch Normalization)

4. Experimental Results

We train the proposed quantization-friendly MobileNetV1 float models using the TensorFlow training framework. We follow the same training hyperparameters as MobileNetV1, except that we use one Nvidia GeForce GTX TITAN X card with a batch size of 128. The ImageNet2012 dataset is used for training and validation. Note that training is only required for the float models.

Figure 5 shows the experimental results of applying each change to the original MobileNetV1 model in both the float pipeline and the 8-bit quantized pipeline. In the float pipeline, our trained float model achieves top-1 accuracy similar to the original MobileNetV1 TF model. In the 8-bit pipeline, removing Batch Normalization and ReLU6 in all depthwise convolution layers dramatically improves the top-1 accuracy of the quantized model. Simply replacing ReLU6 with ReLU further improves the top-1 accuracy of 8-bit quantized inference. Furthermore, enabling L2 regularization on the weights in all depthwise convolution layers during training improves the overall accuracy of the 8-bit pipeline by another margin. From our experiments, the proposed quantization-friendly MobileNetV1 model achieves a top-1 accuracy of 68.03% in the 8-bit quantized pipeline, while maintaining accuracy comparable to the original MobileNetV1 in the float pipeline for the same model.

Figure 5. Top-1 accuracy with different core layer designs on ImageNet2012 validation dataset

5. Conclusion and Future Work

We proposed an effective quantization-friendly separable convolution architecture and integrated it into MobileNets for image classification. Without reducing accuracy in the float pipeline, the proposed architecture shows a significant accuracy boost in the 8-bit quantized pipeline. To generalize this architecture, we will continue applying it to more networks based on separable convolution, e.g., MobileNetV2 (mobilenetv2, ) and ShuffleNet (shufflenet, ), and verify their fixed-point inference accuracy. We will also apply the proposed architecture to object detection and instance segmentation applications, and measure the power and latency of the proposed quantization-friendly MobileNets on device.