SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers

11/29/2022
by Ameet Deshpande, et al.

Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and PLM sizes have consequently been growing. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter-efficient (PE) and computationally fast architecture for edge devices that adds a hierarchically organized sparse memory after each Transformer layer. SPARTAN freezes the PLM parameters and fine-tunes only its memory, thus significantly reducing storage costs by re-using the PLM backbone across different tasks. SPARTAN contains two levels of memory: for each input, only a sparse subset of parent cells is chosen in the first level, and the children cells corresponding to those parents are used to compute an output representation. This sparsity, combined with other architecture optimizations, improves SPARTAN's throughput by over 90% during inference on a Raspberry Pi 4 compared to PE baselines (adapters), while also outperforming the latter by 0.1 points on the GLUE benchmark. Further, SPARTAN can be trained 34% faster while staying within 0.9 points of adapters. Qualitative analysis shows that different parent cells in SPARTAN specialize in different topics, thus dividing responsibility efficiently.
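As a rough illustration of the mechanism the abstract describes, the sketch below implements a two-level sparse memory block in PyTorch: parent keys score each token, only a small top-k subset of parents is kept, and just the child cells owned by those parents are attended over to produce the output. All names and hyperparameters here (SparseHierarchicalMemory, num_parents, children_per_parent, top_k) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a two-level sparse memory block, inserted after a frozen
# Transformer layer. Hypothetical design based on the abstract, not the
# authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseHierarchicalMemory(nn.Module):
    def __init__(self, d_model, num_parents=64, children_per_parent=16, top_k=4):
        super().__init__()
        # First level: one key vector per parent cell.
        self.parent_keys = nn.Parameter(torch.randn(num_parents, d_model))
        # Second level: each parent owns a block of child key/value cells.
        self.child_keys = nn.Parameter(torch.randn(num_parents, children_per_parent, d_model))
        self.child_values = nn.Parameter(torch.randn(num_parents, children_per_parent, d_model))
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq_len, d_model) hidden states from the frozen layer.
        parent_scores = torch.einsum("bsd,pd->bsp", x, self.parent_keys)

        # Keep only a sparse subset of parents per token.
        top_scores, top_idx = parent_scores.topk(self.top_k, dim=-1)  # (b, s, k)
        parent_weights = F.softmax(top_scores, dim=-1)

        # Gather the children belonging to the selected parents.
        child_k = self.child_keys[top_idx]    # (b, s, k, c, d)
        child_v = self.child_values[top_idx]  # (b, s, k, c, d)

        # Attend over the children within each selected parent.
        child_scores = torch.einsum("bsd,bskcd->bskc", x, child_k)
        child_weights = F.softmax(child_scores, dim=-1)
        per_parent = torch.einsum("bskc,bskcd->bskd", child_weights, child_v)

        # Mix per-parent outputs with the parent weights; residual connection.
        memory_out = torch.einsum("bsk,bskd->bsd", parent_weights, per_parent)
        return x + memory_out
```

In a setup like this, a memory block would sit after each frozen Transformer layer, so only the parent and child cells need to be stored and trained per task, and inference cost scales with the small number of selected parents rather than the full memory.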


Related research

04/26/2023 · PVP: Pre-trained Visual Parameter-Efficient Tuning
Large-scale pre-trained transformers have demonstrated remarkable succes...

05/26/2023 · Parameter-Efficient Fine-Tuning without Introducing New Latency
Parameter-efficient fine-tuning (PEFT) of pre-trained language models ha...

04/30/2023 · Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation
Recently, transformers have shown strong ability as visual feature extra...

08/28/2023 · EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Large Language Models (LLMs) such as GPTs and LLaMa have ushered in a re...

10/14/2021 · UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning
Conventional fine-tuning of pre-trained language models tunes all model ...

03/29/2022 · Fine-tuning Image Transformers using Learnable Memory
In this paper we propose augmenting Vision Transformer models with learn...

09/22/2022 · Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training
Recently, sparse training has emerged as a promising paradigm for effici...
