Quantizing neural networks is one of the most effective methods for achi...
Neural network pruning and quantization techniques are almost as old as ...
Recently, the idea of using FP8 as a number format for neural network tr...
Neural network quantization is frequently used to optimize model size, l...
When quantizing neural networks for efficient inference, low-bit integer...
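As a minimal sketch of what "low-bit integer" quantization means in practice, the following shows symmetric per-tensor INT8 quantization, the standard building block behind integer inference. This is a generic illustration, not the method of any particular paper listed here; all names are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to int8 codes with a single per-tensor scale
    (symmetric quantization: zero-point fixed at 0)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # int8 representable range is [-128, 127]
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [qi * scale for qi in q]

w = [0.5, -1.2, 0.03, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# w_hat approximates w; the per-element gap (at most scale/2 here)
# is the quantization error that low-bit methods try to control.
```

Real deployments add per-channel scales, asymmetric zero-points, and calibration of the clipping range, but the round-and-clamp core is the same.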
In this paper, we introduce a novel method of neural network weight comp...
Current methods for pruning neural network weights iteratively apply mag...
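The iterative magnitude-based pruning referred to above can be sketched in a few lines: at each round, zero out the smallest-magnitude fraction of weights, then (in a real pipeline) fine-tune before pruning further. This is a generic illustration of one pruning step, with hypothetical names, not the specific method of the truncated abstract.

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights.
    Ties at the threshold may prune slightly more than requested."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.1, -0.4, 0.05, 0.9, -0.02]
pruned = magnitude_prune(w, sparsity=0.4)
# The two smallest-magnitude entries (0.05 and -0.02) are zeroed,
# while the larger weights survive unchanged.
```

Iterative schemes simply alternate this step with retraining, increasing `sparsity` over several rounds rather than pruning to the target in one shot.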
While neural networks have advanced the frontiers in many applications, ...
We introduce Bayesian Bits, a practical method for joint mixed precision...
When quantizing neural networks, assigning each floating-point weight to...
We analyze the effect of quantizing weights and activations of neural ne...
We introduce a data-free quantization method for deep neural networks th...