DeepAI AI Chat
Log In Sign Up

With Shared Microexponents, A Little Shifting Goes a Long Way

by   Bita Rouhani, et al.

This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.


page 1

page 2

page 3

page 4


Rethinking Numerical Representations for Deep Neural Networks

With ever-increasing computational demand for deep learning, it is criti...

FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

Block Floating Point (BFP) can efficiently support quantization for Deep...

Training DNNs with Hybrid Block Floating Point

The wide adoption of DNNs has given birth to unrelenting computing requi...

Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines

Deep learning as a means to inferencing has proliferated thanks to its v...

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats

In the complex domain of large language models (LLMs), striking a balanc...

StatAssist GradBoost: A Study on Optimal INT8 Quantization-aware Training from Scratch

This paper studies the scratch training of quantization-aware training (...

A Study of BFLOAT16 for Deep Learning Training

This paper presents the first comprehensive empirical study demonstratin...