Binary and Ternary Natural Language Generation

06/02/2023
by Zechun Liu, et al.

Ternary and binary neural networks enable multiplication-free computation and promise multiple orders of magnitude of efficiency gains over full-precision networks if implemented on specialized hardware. However, since both the parameter and the output space are highly discretized, such networks have proven very difficult to optimize. The difficulties are compounded for the class of transformer text generation models due to the sensitivity of the attention operation to quantization and the noise-compounding effects of autoregressive decoding in the high-cardinality output space. We approach the problem with a mix of statistics-based quantization for the weights and elastic quantization of the activations, and demonstrate the first ternary and binary transformer models on the downstream tasks of summarization and machine translation. Our ternary BART base achieves an R1 score of 41 on the CNN/DailyMail benchmark, which is merely 3.9 points behind the full model while being 16x more efficient. Our binary model, while less accurate, achieves a highly non-trivial score of 35.6. For machine translation, we achieve BLEU scores of 21.7 and 17.6 on the WMT16 En-Ro benchmark, compared with the full-precision mBART model's score of 26.8. We also compare our approach in the 8-bit activation setting, where our ternary and even binary weight models can match or outperform the best existing 8-bit weight models in the literature. Our code and models are available at: https://github.com/facebookresearch/Ternary_Binary_Transformer
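As a rough illustration of the two ingredients named in the abstract, the sketch below shows (i) a statistics-based ternarizer in the spirit of Ternary Weight Networks, which maps each weight to {-α, 0, +α} using a threshold and scale derived from the mean absolute weight, and (ii) a low-bit activation quantizer with a learnable ("elastic") step size trained through a straight-through estimator. The names (ternarize_weights, ElasticActQuant), the 0.7·E[|W|] threshold, and the default bit-width are illustrative assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn


def ternarize_weights(w: torch.Tensor) -> torch.Tensor:
    # Statistics-based ternarization sketch (TWN-style); the paper's exact
    # statistics may differ. Each weight is mapped to {-alpha, 0, +alpha},
    # so matrix products reduce to sign-selective additions plus one scale.
    delta = 0.7 * w.abs().mean()                     # threshold from mean |w|
    mask = (w.abs() > delta).float()                 # weights kept non-zero
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)  # per-tensor scale
    return alpha * torch.sign(w) * mask


class ElasticActQuant(nn.Module):
    """Illustrative low-bit activation quantizer with a learnable (elastic) scale.

    Rounds activations to a `bits`-bit grid whose step size is a trainable
    parameter; gradients pass through the rounding via a straight-through
    estimator.
    """

    def __init__(self, bits: int = 2):
        super().__init__()
        self.qmax = 2 ** (bits - 1) - 1
        self.scale = nn.Parameter(torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = torch.clamp(torch.round(x / self.scale), -self.qmax - 1, self.qmax)
        x_q = q * self.scale
        return x + (x_q - x).detach()  # straight-through estimator
```

In a quantization-aware training loop, a linear layer would typically apply ternarize_weights to its weight tensor and an ElasticActQuant module to its input on every forward pass, keeping full-precision shadow weights for the optimizer updates.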


