Knowledge is a Region in Weight Space for Fine-tuned Language Models

02/09/2023
by Almog Gueta et al.

Research on neural networks has largely focused on understanding a single model trained on a single dataset. However, relatively little is known about the relationships between different models, especially those trained or tested on different datasets. We address this by studying how the weight space and the underlying loss landscape of different models are interconnected. Specifically, we demonstrate that fine-tuned models optimized for high performance reside in well-defined regions in weight space, and vice versa: any model that resides anywhere in those regions also achieves high performance. We show that language models fine-tuned on the same dataset form a tight cluster in weight space, while models fine-tuned on different datasets from the same underlying task form a looser cluster. Moreover, traversing the region between models yields new models that perform comparably to, or even better than, models found via fine-tuning, even on tasks the original models were not fine-tuned on. Our findings provide insight into the relationships between models, demonstrating that a model positioned between two similar models can acquire the knowledge of both. We leverage this finding to design a method for picking a better starting point for efficient fine-tuning: starting from the center of the region is as good as or better than starting from the pre-trained model on 11 of 12 datasets, improving accuracy by 3.06 points on average.
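The last claim is concrete enough to sketch in code: a point near the center of a region can be obtained by averaging the weights of several models fine-tuned from the same pre-trained checkpoint, and that average can then serve as the initialization for further fine-tuning. The sketch below assumes PyTorch and Hugging Face Transformers, uses hypothetical checkpoint paths, and is an illustration of the idea rather than the authors' exact procedure.

```python
# Hedged sketch of the "start from the region center" idea: average the weights
# of several models fine-tuned from the same pre-trained checkpoint, then use
# the average as the starting point for further fine-tuning. Checkpoint paths
# are hypothetical; all models are assumed to share one architecture and head.
import torch
from transformers import AutoModelForSequenceClassification

finetuned_paths = [
    "checkpoints/finetuned-run-0",  # hypothetical fine-tuned checkpoints,
    "checkpoints/finetuned-run-1",  # all starting from the same pre-trained
    "checkpoints/finetuned-run-2",  # model
]

# Collect the parameters of each fine-tuned model.
state_dicts = [
    AutoModelForSequenceClassification.from_pretrained(path).state_dict()
    for path in finetuned_paths
]

# Element-wise mean of every parameter: a point near the center of the cluster
# that these fine-tuned models form in weight space.
center = {
    name: torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
    for name in state_dicts[0]
}

# Load the averaged weights into a model and fine-tune it on the target task
# as usual; the abstract reports this start point beats the pre-trained one.
model = AutoModelForSequenceClassification.from_pretrained(finetuned_paths[0])
model.load_state_dict(center)
```

Interpolating between just two fine-tuned models, e.g. (1 - alpha) * w_a + alpha * w_b for alpha in [0, 1], corresponds to the traversal between models described above.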

