DKM: Differentiable K-Means Clustering Layer for Neural Network Compression

08/28/2021
by   Minsik Cho, et al.

Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and its application to train-time weight clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the DNN parameters and clustering centroids. Unlike prior works that rely on additional regularizers and parameters, DKM-based compression keeps the original loss function and model architecture fixed. We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers a superior compression and accuracy trade-off on the ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy at a 29.4x model compression factor. For MobileNet-v1, which is a challenging DNN to compress, DKM delivers 62.8% top-1 ImageNet1k accuracy with a 0.74 MB model size (22.4x model compression factor). This result is 6.8% higher top-1 accuracy and a 33% smaller model size than current state-of-the-art DNN compression algorithms. Additionally, DKM enables compression of the DistilBERT model by 11.8x with minimal (1.1%) accuracy loss on the GLUE NLP benchmarks.
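The attention view of k-means described above can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the function name dkm_soft_assignment, the absolute-distance metric, and the temperature value are choices made for this example.

```python
# Minimal sketch (not the authors' code): soft k-means assignment as attention,
# assuming a temperature-controlled softmax over negative weight-to-centroid distances.
import torch

def dkm_soft_assignment(weights, centroids, temperature=0.01):
    """Soft-cluster a flat weight vector against k centroids.

    weights:   (n,) tensor of model parameters to be clustered
    centroids: (k,) tensor of learnable cluster centers
    Returns the attention matrix (n, k) and the soft-quantized weights (n,).
    """
    # Negative distance serves as the attention score between weights and centroids.
    dist = (weights.unsqueeze(1) - centroids.unsqueeze(0)).abs()   # (n, k)
    attn = torch.softmax(-dist / temperature, dim=1)               # (n, k), rows sum to 1
    # Soft-quantized weights: attention-weighted sum of centroids, differentiable
    # with respect to both the original weights and the centroids.
    w_q = attn @ centroids                                         # (n,)
    return attn, w_q

# Example usage: 16 centroids corresponds to roughly 4-bit weight clustering.
w = torch.randn(1024, requires_grad=True)
c = torch.linspace(-1.0, 1.0, steps=16, requires_grad=True)
attn, w_q = dkm_soft_assignment(w, c)
```

Used in the forward pass during training, the soft-quantized weights let the unchanged task loss back-propagate to both the weights and the centroids, which is what allows DKM to avoid extra regularizers or architectural changes; at deployment, each weight can be replaced by its nearest centroid.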


