Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

08/13/2023
by Minsoo Kim, et al.

Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, their large model size poses challenges for practical deployment. To address this, Quantization-Aware Training (QAT) has become increasingly popular, yet existing QAT methods for generative models incur a noticeable loss of accuracy. To counteract this issue, we propose a novel knowledge distillation method designed specifically for GLMs. Our method, token-scaled logit distillation, prevents overfitting and enables superior learning from both the teacher model and the ground truth. This research marks the first evaluation of ternary weight quantization-aware training of large-scale GLMs, achieving less than 1.0 degradation in perplexity and no accuracy loss on a reasoning task.
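The abstract does not spell out how the per-token scaling is computed, so the sketch below is only one illustrative reading in PyTorch rather than the paper's formulation: each token's teacher-student KL term is weighted by the teacher's normalized prediction entropy (an assumption), and the result is combined with the standard ground-truth cross-entropy. A causal-LM logit layout of (batch, sequence, vocabulary) is assumed.

import torch
import torch.nn.functional as F

def token_scaled_distillation_loss(student_logits, teacher_logits, labels,
                                   temperature=1.0, pad_id=-100):
    """Hypothetical per-token-scaled logit distillation loss (illustrative only)."""
    # Per-token KL divergence between teacher and student distributions.
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    kl_per_token = (t_logprobs.exp() * (t_logprobs - s_logprobs)).sum(dim=-1)  # (batch, seq)

    # Assumed token-wise scale: weight each token by the teacher's normalized
    # prediction entropy, so confidently predicted (easy) tokens contribute less
    # and the distillation signal is not dominated by them.
    entropy = -(t_logprobs.exp() * t_logprobs).sum(dim=-1)                     # (batch, seq)
    scale = entropy / entropy.sum(dim=-1, keepdim=True).clamp_min(1e-8)

    # Mask out padding tokens before averaging the scaled KL terms.
    mask = (labels != pad_id).float()
    kd_loss = (scale * kl_per_token * mask).sum() / mask.sum().clamp_min(1.0)

    # Ground-truth cross-entropy, combined with the distillation term.
    ce_loss = F.cross_entropy(student_logits.transpose(1, 2), labels,
                              ignore_index=pad_id)
    return ce_loss + kd_loss

In a QAT setup, this loss would be applied to the quantized (e.g., ternary-weight) student's logits while the full-precision teacher runs in inference mode; the exact weighting scheme used in the paper may differ from this sketch.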


Related research

06/14/2023  Knowledge Distillation of Large Language Models
Knowledge Distillation (KD) is a promising technique for reducing the hi...

10/05/2020  Lifelong Language Knowledge Distillation
It is challenging to perform lifelong language learning (LLL) on a strea...

05/29/2023  LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Several post-training quantization methods have been applied to large la...

01/02/2023  Massive Language Models Can Be Accurately Pruned in One-Shot
We show for the first time that large-scale generative pretrained transf...

11/02/2021  LMdiff: A Visual Diff Tool to Compare Language Models
While different language models are ubiquitous in NLP, it is hard to con...

08/17/2023  Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach
Text recognition methods are gaining rapid development. Some advanced te...

06/01/2023  AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Large language models (LLMs) have shown excellent performance on various...
