Post-training quantization (PTQ) is the go-to compression technique for ...
This paper accelerates video perception, such as semantic segmentation a...
Quantizing neural networks is one of the most effective methods for achi...
Neural network pruning and quantization techniques are almost as old as neural networks themselves...
Transformer models have been widely adopted in various domains over the ...
Recently, the idea of using FP8 as a number format for neural network training...
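For a rough feel of what an FP8 grid looks like, the sketch below rounds values to a simplified IEEE-style E4M3 layout (4 exponent bits, 3 mantissa bits). This is an illustrative assumption, not the scheme of the paper above: real E4M3 reclaims part of the top exponent range to reach ±448 and has its own special-value rules, all of which this sketch ignores.

```python
import numpy as np

def quantize_fp8_e4m3(x, exp_bits=4, man_bits=3):
    """Round values to a simplified FP8-like grid (no NaN/Inf handling)."""
    bias = 2 ** (exp_bits - 1) - 1                 # 7 for E4M3
    max_exp = 2 ** exp_bits - 2 - bias             # largest normal exponent
    min_exp = 1 - bias                             # smallest normal exponent
    max_val = 2.0 ** max_exp * (2 - 2.0 ** -man_bits)

    x = np.clip(x, -max_val, max_val)
    # Exponent of each value, clamped so tiny values land in the subnormal range.
    exp = np.floor(np.log2(np.maximum(np.abs(x), 1e-45)))
    exp = np.clip(exp, min_exp, max_exp)
    # Spacing between representable numbers at this exponent.
    step = 2.0 ** (exp - man_bits)
    return np.round(x / step) * step
```

The key contrast with integer formats: the step size grows with the magnitude of the value, so FP8 spends its precision near zero rather than uniformly across the range.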
Neural network quantization is frequently used to optimize model size, latency...
Transformer language models such as GPT-2 are difficult to quantize beca...
When quantizing neural networks for efficient inference, low-bit integer...
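For context on what such low-bit integer formats mean in practice: the standard building block is uniform affine quantization, where a real-valued tensor is mapped to integers via a scale and zero-point. A minimal NumPy sketch of the textbook recipe (not the specific scheme of any paper listed here):

```python
import numpy as np

def affine_quantize(x, num_bits=8):
    """Uniform affine (asymmetric) quantization to unsigned integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def affine_dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1000).astype(np.float32)
q, s, z = affine_quantize(x)
err = np.abs(x - affine_dequantize(q, s, z)).max()   # worst-case rounding error
```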
In this paper, we introduce a novel method of neural network weight compression...
Federated Learning (FL) is a machine learning paradigm to distributively...
When training neural networks with simulated quantization, we observe th...
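The line above refers to simulated ("fake") quantization: weights stay in floating point, but the forward pass rounds them onto the integer grid while gradients bypass the rounding via a straight-through estimator. A minimal PyTorch sketch of that standard trick (the symmetric per-tensor scale is an illustrative choice, not this paper's method):

```python
import torch

def fake_quantize(w, num_bits=4):
    """Simulated quantization with a straight-through estimator (STE)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max() / qmax          # symmetric per-tensor scale
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Forward uses w_q; backward sees identity, so gradients flow to w.
    return w + (w_q - w).detach()

w = torch.randn(64, 64, requires_grad=True)
loss = fake_quantize(w).pow(2).sum()
loss.backward()                                    # grads reach w via the STE
```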
Current methods for pruning neural network weights iteratively apply magnitude-based pruning...
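For reference, iterative magnitude pruning in its simplest form repeatedly zeroes out the smallest-magnitude weights and re-trains the survivors. A bare-bones NumPy sketch (the sparsity schedule and fine-tuning step are placeholders, not the method of the paper above):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` is reached."""
    k = int(sparsity * w.size)
    threshold = np.sort(np.abs(w).ravel())[k]      # k-th smallest magnitude
    mask = np.abs(w) >= threshold
    return w * mask, mask

w = np.random.randn(256, 256)
for s in (0.5, 0.7, 0.9):                          # iteratively raise sparsity
    w, mask = magnitude_prune(w, s)
    # ...fine-tune the remaining weights here before the next pruning step...
```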
While neural networks have advanced the frontiers in many machine learni...
We propose a method to compress full-resolution video sequences with imp...
Transformer-based architectures have become the de-facto standard models...
While neural networks have advanced the frontiers in many applications, ...
Quantization techniques applied to the inference of deep neural networks...
We introduce Bayesian Bits, a practical method for joint mixed precision...
When quantizing neural networks, assigning each floating-point weight to...
Unlike ReLU, newer activation functions (like Swish, H-swish, Mish) that...
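The activation functions named above have simple closed forms, and unlike ReLU they can produce negative outputs, which changes the activation ranges a quantizer must cover. For reference, their standard definitions (not tied to any particular paper here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):            # a.k.a. SiLU: x * sigmoid(x)
    return x * sigmoid(x)

def hard_swish(x):       # piecewise-linear variant used in MobileNetV3
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

def mish(x):             # x * tanh(softplus(x))
    return x * np.tanh(np.log1p(np.exp(x)))

x = np.linspace(-4, 4, 9)
# Unlike ReLU, all three dip below zero for negative inputs.
```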
The success of deep neural networks in many real-world applications is l...
We introduce a data-free quantization method for deep neural networks th...
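One well-known ingredient of data-free quantization is cross-layer weight equalization: rescale the output channels of one layer and the matching input channels of the next so their per-channel weight ranges match, which leaves the network function unchanged when a positively homogeneous activation such as ReLU sits in between. A NumPy sketch for two linear layers (names are illustrative, and whether this matches the truncated abstract's actual method is an assumption):

```python
import numpy as np

def equalize_pair(w1, b1, w2):
    """Cross-layer equalization for y = ReLU(w1 @ x + b1); z = w2 @ y.

    Since ReLU(s * t) == s * ReLU(t) for s > 0, dividing channel i of
    layer 1 by s_i and multiplying the matching input of layer 2 by s_i
    preserves the output while equalizing per-channel weight ranges.
    """
    r1 = np.abs(w1).max(axis=1)        # per-output-channel range of layer 1
    r2 = np.abs(w2).max(axis=0)        # per-input-channel range of layer 2
    s = np.sqrt(r1 * r2) / r2          # assumes all ranges are non-zero
    return w1 / s[:, None], b1 / s, w2 * s[None, :]

rng = np.random.default_rng(0)
w1, b1, w2 = rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=(3, 8))
w1e, b1e, w2e = equalize_pair(w1, b1, w2)
x = rng.normal(size=4)
orig = w2 @ np.maximum(w1 @ x + b1, 0)
eq = w2e @ np.maximum(w1e @ x + b1e, 0)
assert np.allclose(orig, eq)           # the network function is unchanged
```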