LoRA: Low-Rank Adaptation of Large Language Models

06/17/2021
by Edward J. Hu, et al.

The dominant paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, conventional fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying many independent instances of fine-tuned models, each with 175B parameters, is extremely expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times compared to full fine-tuning. LoRA performs on-par or better than fine-tuning in model quality on both GPT-3 and GPT-2, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release our implementation for GPT-2 at https://github.com/microsoft/LoRA.
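
To make the core idea concrete, below is a minimal sketch of a LoRA-style linear layer in PyTorch. It is not the released microsoft/LoRA implementation; the class name, the rank r, and the scaling factor alpha are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pre-trained weights; only the low-rank factors A and B are trained.
        for p in self.base.parameters():
            p.requires_grad = False
        # A is initialized with small random values, B with zeros, so the update starts at zero.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)   # r x d_in
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))         # d_out x r
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the rank-r update; after training, B A can be merged into W,
        # so serving the adapted model adds no inference latency.
        return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T
```

In this sketch, wrapping, for example, the attention projection matrices of each Transformer block with such a module keeps the large pre-trained weights frozen and shared across tasks, while only the small per-task A and B matrices need to be stored and swapped.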

Related research

06/09/2023
Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection
Self-supervised speech models are a rapidly developing research topic in...

10/08/2022
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
There are growing interests in adapting large-scale language models usin...

03/23/2022
A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization
Model fine-tuning and adaptation have become a common approach for model...

10/21/2021
Memory Efficient Adaptive Attention For Multiple Domain Learning
Training CNNs from scratch on new domains typically demands large number...

09/13/2023
Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning
The recent surge in large-scale foundation models has spurred the develo...

05/04/2023
Cuttlefish: Low-Rank Model Training without All the Tuning
Recent research has shown that training low-rank neural networks can eff...

03/18/2023
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Fine-tuning large pre-trained language models on downstream tasks has be...