Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models

02/08/2023
by Mohammadreza Banaei et al.

Recent transformer language models achieve outstanding results on many natural language processing (NLP) tasks. However, their enormous size often makes them impractical on memory-constrained devices, requiring practitioners to compress them into smaller networks. In this paper, we explore offline compression methods, that is, computationally cheap approaches that do not require further fine-tuning of the compressed model. We challenge classical matrix-factorization methods by proposing a novel, better-performing autoencoder-based framework. We perform a comprehensive ablation study of our approach, examining its different aspects over a diverse set of evaluation settings. Moreover, we show that compressing certain modules jointly, which enables collaboration between modules across layers, positively impacts the final model's performance. Experiments on various NLP tasks demonstrate that our approach significantly outperforms commonly used factorization-based offline compression methods.
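To make the contrast between the two families of offline methods concrete, below is a minimal, illustrative sketch in Python. It compresses a single stand-in weight matrix with the classical truncated-SVD factorization baseline and then with a small autoencoder trained only to reconstruct that matrix. The matrix size, the encoder/decoder architecture, the training loop, and all hyperparameters are assumptions for illustration; the sketch does not reproduce the authors' actual framework or its cross-layer joint compression.

```python
# Illustrative sketch only: offline (no fine-tuning) compression of one
# weight matrix, first by truncated SVD, then by a toy autoencoder.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
W = rng.standard_normal((768, 3072)).astype(np.float32)  # stand-in for an FFN weight matrix

# Classical baseline: rank-r factorization W ~= A @ B via truncated SVD.
r = 64
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A, B = U[:, :r] * S[:r], Vt[:r, :]
print("factorized params:", A.size + B.size, "original:", W.size)

# Autoencoder-style alternative (hypothetical architecture): each row of W
# is mapped to a low-dimensional code; only the codes and the decoder are
# stored. The paper's real encoder/decoder design and objective may differ.
code_dim = 64
encoder = nn.Sequential(nn.Linear(W.shape[1], 256), nn.GELU(), nn.Linear(256, code_dim))
decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.GELU(), nn.Linear(256, W.shape[1]))
optim = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
W_t = torch.from_numpy(W)
for step in range(200):  # short reconstruction-only demo loop
    optim.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(W_t)), W_t)
    loss.backward()
    optim.step()

codes = encoder(W_t).detach()  # per-row codes kept offline; the host model is untouched
stored = codes.numel() + sum(p.numel() for p in decoder.parameters())
print("autoencoder storage:", stored, "reconstruction MSE:", loss.item())
```

Both routes are "offline" in the paper's sense: they operate on the frozen weights and require no further fine-tuning of the compressed language model, only a cheap per-matrix optimization.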

Related research

Direction is what you need: Improving Word Embedding Compression in Large Language Models (06/15/2021)
The adoption of Transformer-based models in natural language processing ...

On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression (08/27/2019)
Despite their ubiquity in NLP tasks, Long Short-Term Memory (LSTM) netwo...

Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition (04/21/2023)
In recent years, Large Language Models such as GPT-3 showed remarkable c...

Ouroboros: On Accelerating Training of Transformer-Based Language Models (09/14/2019)
Language models are essential for natural language processing (NLP) task...

Compressing Language Models using Doped Kronecker Products (01/24/2020)
Kronecker Products (KP) have been used to compress IoT RNN Applications ...

Compressibility of Distributed Document Representations (10/14/2021)
Contemporary natural language processing (NLP) revolves around learning ...

Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks (02/10/2023)
We propose a novel gradient-based attack against transformer-based langu...
