Fixed-point quantization aware training for on-device keyword-spotting

03/04/2023
by Sashank Macha, et al.

Fixed-point (FXP) inference has proven suitable for embedded devices with limited computational resources, yet model training is still typically performed in floating point (FLP). FXP training has not been fully explored, and the non-trivial conversion from FLP to FXP introduces an unavoidable performance drop. We propose a novel method to train and obtain FXP convolutional keyword-spotting (KWS) models. We combine our methodology with two quantization-aware training (QAT) techniques - squashed weight distribution and absolute cosine regularization for model parameters - and propose techniques for extending QAT to transient variables, otherwise neglected by previous paradigms. Experimental results on the Google Speech Commands v2 dataset show that we can reduce model precision down to 4 bits with no loss in accuracy. Furthermore, on an in-house KWS dataset, we show that our 8-bit FXP-QAT models achieve a 4-6% improvement over full-precision FLP models. During inference, we argue that FXP-QAT eliminates q-format normalization and enables the use of low-bit accumulators while maximizing SIMD throughput to reduce user-perceived latency. We demonstrate that we can reduce execution time by 68% without compromising the KWS model's predictive performance or requiring model architectural changes. Our work provides novel findings that aid future research in this area and enable accurate and efficient models.
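For readers unfamiliar with fixed-point quantization, the sketch below shows the basic operation the abstract refers to: simulating a signed Qm.n fixed-point representation of weights during the forward pass. It is a minimal illustration under assumed settings (the function name, bit widths, and fractional-bit choices are ours for the example), not the authors' implementation.

```python
import numpy as np

def quantize_fixed_point(x, total_bits=8, frac_bits=6):
    """Simulate symmetric signed fixed-point (Qm.n) quantization.

    Values are rounded to the nearest multiple of 2**-frac_bits and
    clipped to the representable range of a signed `total_bits` word,
    then returned in dequantized (float) form for the forward pass.
    """
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale

# Example: the same weights quantized to 8-bit and 4-bit fixed point.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.25, size=(4, 4)).astype(np.float32)
w_q8 = quantize_fixed_point(w, total_bits=8, frac_bits=6)
w_q4 = quantize_fixed_point(w, total_bits=4, frac_bits=2)
print("max 8-bit rounding error:", np.abs(w - w_q8).max())
print("max 4-bit rounding error:", np.abs(w - w_q4).max())
```

In QAT, a quantizer of this form is typically applied in the forward pass while gradients bypass the rounding (a straight-through estimator); the squashed weight distribution and absolute cosine regularization mentioned in the abstract are ways of shaping the parameters so that this rounding loses less information at low bit widths.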


Related research

- Cheetah: Mixed Low-Precision Hardware Software Co-Design Framework for DNNs on the Edge (08/06/2019). Low-precision DNNs have been extensively explored in order to reduce the...
- F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization (02/10/2022). Neural network quantization is a promising compression technique to redu...
- Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers (11/01/2019). Recent emerged quantization technique (i.e., using low bit-width fixed-p...
- Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond (08/30/2020). Quantized Neural Networks (QNNs) use low bit-width fixed-point numbers f...
- AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference (09/29/2019). Conventional hardware-friendly quantization methods, such as fixed-point...
- U-Net Fixed-Point Quantization for Medical Image Segmentation (08/02/2019). Model quantization is leveraged to reduce the memory consumption and the...
- SYMOG: learning symmetric mixture of Gaussian modes for improved fixed-point quantization (02/19/2020). Deep neural networks (DNNs) have been proven to outperform classical met...
