Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF

09/16/2023
by   Simeng Sun, et al.
0

During the last stage of RLHF, a large language model is aligned to human intents via PPO training, a process that generally requires large-scale computational resources. In this technical report, we empirically investigate an efficient implementation of RLHF using low-rank adaptation (LoRA), which allows us to align the LLaMA 7B checkpoint on the Alpaca dataset using only two A100 GPUs instead of the eight required for full model fine-tuning. Despite tuning only 0.2 performance than the publicly-released AlpacaFarm checkpoint with full model fine-tuning. Next, we analyze several configurations of our LoRA-based PPO implementation, varying the form of the KL regularization term in the training objective. We find that (1) removing this penalty term does not harm performance on the AlpacaFarm evaluation set under our LoRA setup; (2) other regularizers, such as Jensen-Shannon divergence, lead to improved performance; and (3) while PPO training negatively impacts the factuality of model-generated responses, training with LoRA largely mitigates this effect. We release our code and pretrained checkpoints to facilitate future research on more efficient RLHF.

READ FULL TEXT
research
08/07/2023

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning

The low-rank adaptation (LoRA) method can largely reduce the amount of t...
research
12/20/2022

KronA: Parameter Efficient Tuning with Kronecker Adapter

Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream ...
research
04/30/2020

TRP: Trained Rank Pruning for Efficient Deep Neural Networks

To enable DNNs on edge devices like mobile phones, low-rank approximatio...
research
09/13/2023

Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning

The recent surge in large-scale foundation models has spurred the develo...
research
06/13/2023

INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation

We introduce a method that dramatically reduces fine-tuning VRAM require...
research
09/02/2022

Petals: Collaborative Inference and Fine-tuning of Large Models

Many NLP tasks benefit from using large language models (LLMs) that ofte...
research
10/14/2022

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation

With the ever-growing size of pre-trained models (PMs), fine-tuning them...

Please sign up or login with your details

Forgot password? Click here to reset