DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables

04/18/2023
by   Darshan C. Ganji, et al.

Significant recent progress has been made in ultra-low-bit quantization, promising improvements in latency, memory footprint, and energy consumption on edge devices. Quantization methods such as Learned Step Size Quantization can achieve model accuracy comparable to full-precision floating-point baselines even with sub-byte quantization. However, deploying these ultra-low-bit quantized models on mainstream CPUs is extremely challenging because commodity SIMD (Single Instruction, Multiple Data) hardware typically does not support precisions below 8 bits. To overcome this limitation, we propose DeepGEMM, a lookup-table-based approach for executing ultra-low-precision convolutional neural networks on SIMD hardware. The proposed method precomputes all possible products of weights and activations, stores them in a lookup table, and efficiently accesses them at inference time to avoid costly multiply-accumulate operations. Our 2-bit implementation outperforms corresponding 8-bit integer kernels in the QNNPACK framework by up to 1.74x on x86 platforms.
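To make the core idea concrete, here is a minimal, hypothetical sketch of lookup-table-based GEMM as described in the abstract. It is not the authors' SIMD kernel (which uses in-register shuffles); all names, shapes, and quantization levels below are illustrative assumptions. With 2-bit weights and 2-bit activations there are only 4 × 4 = 16 possible products, so every product can be precomputed once and inner-loop multiplies replaced by table lookups and adds:

```python
def build_lut(w_levels, a_levels):
    """Precompute all products of dequantized weight and activation levels.

    For 2-bit codes, w_levels and a_levels each hold 4 values, so the
    table has only 16 entries.
    """
    return [[w * a for a in a_levels] for w in w_levels]


def lut_matmul(Wq, Aq, lut):
    """GEMM over quantized codes: table lookups + adds, no multiplies.

    Wq: M x K matrix of 2-bit weight codes (ints in 0..3)
    Aq: K x N matrix of 2-bit activation codes (ints in 0..3)
    """
    rows, inner, cols = len(Wq), len(Aq), len(Aq[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            row_lut = lut[Wq[i][k]]  # all products for this weight code
            for j in range(cols):
                # a lookup replaces the multiply in multiply-accumulate
                out[i][j] += row_lut[Aq[k][j]]
    return out
```

A scalar reference like this clarifies why the approach suits sub-byte precision: the table is tiny enough to live in registers or L1 cache, which is what makes a vectorized (e.g., byte-shuffle-based) version fast in practice.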


