HyperTuning: Toward Adapting Large Language Models without Back-propagation

11/22/2022
by Jason Phang, et al.

Fine-tuning large language models for different tasks can be costly and inefficient, and even methods that reduce the number of tuned parameters still require full gradient-based optimization. We propose HyperTuning, a novel approach to model adaptation that uses a hypermodel to generate task-specific parameters for a fixed downstream model. We demonstrate a simple setup for hypertuning with HyperT5, a T5-based hypermodel that produces soft prefixes or LoRA parameters for a frozen T5 model from few-shot examples. We train HyperT5 in two stages: first, hyperpretraining with a modified conditional language modeling objective that trains a hypermodel to generate parameters; second, multi-task fine-tuning (MTF) on a large number of diverse language tasks. We evaluate HyperT5 on the P3, MetaICL, and Super-NaturalInstructions datasets, and show that it can effectively generate parameters for unseen tasks. Moreover, we show that using hypermodel-generated parameters as initializations for further parameter-efficient fine-tuning improves performance. HyperTuning can thus be a flexible and efficient way to leverage large language models for diverse downstream applications.


Related research

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (05/11/2022)
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning (10/31/2022)
Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain (07/06/2023)
PIP: Parse-Instructed Prefix for Syntactically Controlled Paraphrase Generation (05/26/2023)
Small Language Models Improve Giants by Rewriting Their Outputs (05/22/2023)
A Simple Recipe for Multilingual Grammatical Error Correction (06/07/2021)
Efficient Large Scale Language Modeling with Mixtures of Experts (12/20/2021)
