Partitioning-Guided K-Means: Extreme Empty Cluster Resolution for Extreme Model Compression

06/24/2023
by Tianhong Huang, et al.

Compactness in deep learning can be critical to a model's viability in low-resource applications, and a common approach to extreme model compression is quantization. We consider Iterative Product Quantization (iPQ) with Quant-Noise to be state-of-the-art in this area, but this quantization framework suffers from preventable inference quality degradation due to prevalent empty clusters. In this paper, we propose several novel enhancements aiming to improve the accuracy of iPQ with Quant-Noise by focusing on resolving empty clusters. Our contribution, which we call Partitioning-Guided k-means (PG k-means), is a heavily augmented k-means implementation composed of three main components. First, we propose a partitioning-based pre-assignment strategy that ensures no initial empty clusters and encourages an even weight-to-cluster distribution. Second, we propose an empirically superior empty cluster resolution heuristic executed via cautious partitioning of large clusters. Finally, we construct an optional optimization step that consolidates intuitively dense clusters of weights to ensure shared representation. The proposed approach consistently reduces the number of empty clusters in iPQ with Quant-Noise by 100x on average, uses 8x fewer iterations during empty cluster resolution, and improves overall model accuracy by up to 12% when applied to RoBERTa on a variety of tasks in the GLUE benchmark.
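The abstract describes PG k-means only at a high level, so the following is a minimal illustrative sketch under stated assumptions, not the authors' implementation. It shows the two ideas named first in the abstract: a partitioning-based pre-assignment that yields k non-empty initial clusters, and empty-cluster resolution by partitioning a large cluster during Lloyd iterations. The names `partition_preassign`, `pg_style_kmeans`, and `_bisect`, and the rule "bisect the largest group at the median index of its widest dimension", are hypothetical stand-ins; the paper's own "cautious partitioning" heuristic and its optional dense-cluster consolidation step are not reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def _bisect(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Hypothetical split: cut a group of >= 2 points at the median index of
    its widest dimension. Splitting by index (not by value) keeps both halves
    non-empty even when many points are identical."""
    dims = range(len(points[0]))
    widest = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[widest])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def partition_preassign(points: List[Point], k: int) -> List[List[Point]]:
    """Hypothetical pre-assignment: bisect the largest group until there are k groups."""
    if len(points) < k:
        raise ValueError("need at least k points to avoid empty clusters")
    groups = [list(points)]
    while len(groups) < k:
        groups.sort(key=len, reverse=True)  # largest group first
        left, right = _bisect(groups.pop(0))
        groups.extend([left, right])
    return groups


def pg_style_kmeans(points: List[Point], k: int, iters: int = 20):
    """Lloyd iterations seeded by partitioning, with empty clusters refilled by bisection."""
    centroids = [_mean(g) for g in partition_preassign(points, k)]
    clusters: List[List[Point]] = []
    for _ in range(iters):
        # Standard assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Illustrative empty-cluster resolution: an empty cluster takes one
        # half of the currently largest cluster.
        for j in range(k):
            if not clusters[j]:
                largest = max(range(k), key=lambda c: len(clusters[c]))
                clusters[largest], clusters[j] = _bisect(clusters[largest])
        new_centroids = [_mean(c) for c in clusters]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Four identical weights with k=3: plain Lloyd iterations from these
    # coincident seeds would leave two clusters empty.
    cents, groups = pg_style_kmeans([(1.0,)] * 4, k=3)
    print([len(g) for g in groups])  # prints [1, 2, 1]
```

Bisecting by index rather than by a threshold value is the design choice that makes the guarantee hold: whenever a cluster is empty and there are at least k points, some other cluster holds two or more points, so the split always produces two non-empty halves. Duplicate weights, which make centroids coincide, are one common way plain k-means ends up with empty clusters, which is why the usage example uses them.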



research
04/15/2020

Training with Quantization Noise for Extreme Model Compression

We tackle the problem of producing compact models, maximizing their accu...
research
03/10/2021

Quantization-Guided Training for Compact TinyML Models

We propose a Quantization Guided Training (QGT) method to guide DNN trai...
research
04/15/2020

Training with Quantization Noise for Extreme Fixed-Point Compression

We tackle the problem of producing compact models, maximizing their accu...
research
08/08/2023

Optimal partitioning of directed acyclic graphs with dependent costs between clusters

Many statistical inference contexts, including Bayesian Networks (BNs), ...
research
08/22/2020

One Weight Bitwidth to Rule Them All

Weight quantization for deep ConvNets has shown promising results for ap...
research
04/12/2010

Feature Level Fusion of Face and Palmprint Biometrics by Isomorphic Graph-based Improved K-Medoids Partitioning

This paper presents a feature level fusion approach which uses the impro...
research
02/02/2021

Image Splicing Detection, Localization and Attribution via JPEG Primary Quantization Matrix Estimation and Clustering

Detection of inconsistencies of double JPEG artefacts across different i...
