NF4 Isn't Information Theoretically Optimal (and that's Good)

06/12/2023
by Davis Yoshida, et al.

This note shares some simple calculations and experiments related to absmax-based blockwise quantization, as used in Dettmers et al., 2023. Their proposed NF4 data type is said to be information theoretically optimal for representing normally distributed weights. I show that this can't quite be the case, as the distribution of the values to be quantized depends on the block size. I attempt to apply these insights to derive an improved code based on minimizing the expected L1 reconstruction error, rather than the quantile-based method. This leads to improved performance for larger quantization block sizes, while both codes perform similarly at smaller block sizes.
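
To make the setup concrete, the following is a minimal NumPy sketch of absmax-based blockwise quantization against a fixed 16-level code, together with a small demonstration of the point above: once each block is divided by its absolute maximum, the spread of the values being quantized shrinks as the block grows, so no single fixed code can be optimal for every block size. The uniform 16-entry code and the helpers `quantize_blockwise` / `dequantize_blockwise` are illustrative assumptions, not the actual NF4 levels or the bitsandbytes API.

```python
import numpy as np

def quantize_blockwise(weights, code, block_size=64):
    """Absmax blockwise quantization of a 1-D weight vector.

    Each block is divided by its absolute maximum so values land in [-1, 1],
    then every value is snapped to the nearest entry of `code`. Returns the
    code indices plus the per-block scales needed for dequantization.
    """
    w = weights.reshape(-1, block_size)            # (num_blocks, block_size)
    scales = np.abs(w).max(axis=1, keepdims=True)  # absmax per block
    normalized = w / scales                        # values now lie in [-1, 1]
    idx = np.abs(normalized[..., None] - code[None, None, :]).argmin(axis=-1)
    return idx, scales

def dequantize_blockwise(idx, scales, code):
    """Rebuild an approximation of the original weights."""
    return (code[idx] * scales).reshape(-1)

# Placeholder 16-level code on [-1, 1]; the real NF4 code places its levels at
# Gaussian quantiles rather than uniformly.
code = np.linspace(-1.0, 1.0, 16)

rng = np.random.default_rng(0)
weights = rng.standard_normal(1 << 16).astype(np.float32)

for block_size in (16, 64, 256, 1024):
    idx, scales = quantize_blockwise(weights, code, block_size)
    recon = dequantize_blockwise(idx, scales, code)
    normalized = weights.reshape(-1, block_size) / scales
    print(f"block_size={block_size:4d}  "
          f"std of normalized values={normalized.std():.3f}  "
          f"mean |w - w_hat|={np.abs(weights - recon).mean():.4f}")
```

One rough way to build a code that targets expected L1 reconstruction error (not necessarily the procedure used in the paper) is a one-dimensional k-medians iteration over samples of absmax-normalized Gaussian blocks, exploiting the fact that the median of each cluster minimizes its mean absolute error. The function `fit_l1_code` and all constants below are again illustrative assumptions.

```python
import numpy as np

def fit_l1_code(samples, num_levels=16, iters=25):
    """Fit quantization levels by 1-D k-medians (Lloyd-style alternation)."""
    # Initialize the levels at evenly spaced quantiles of the samples.
    code = np.quantile(samples, np.linspace(0.0, 1.0, num_levels))
    for _ in range(iters):
        # Assign every sample to its nearest level.
        idx = np.abs(samples[:, None] - code[None, :]).argmin(axis=1)
        # The median of each cluster minimizes that cluster's L1 error.
        for k in range(num_levels):
            assigned = samples[idx == k]
            if assigned.size:
                code[k] = np.median(assigned)
    return np.sort(code)

rng = np.random.default_rng(0)
block_size = 1024
blocks = rng.standard_normal((256, block_size))
samples = (blocks / np.abs(blocks).max(axis=1, keepdims=True)).ravel()
print(fit_l1_code(samples))
```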

