Quantization of transformer language models faces significant challenges...
Transformer architecture has become the fundamental element of the wides...
Model binarization is an effective method of compressing neural networks...
Recently, generative data-free quantization emerges as a practical appro...
Quantization has emerged as one of the most prevalent approaches to comp...