Exploring Mode Connectivity for Pre-trained Language Models

10/25/2022
by Yujia Qin, et al.

Recent years have witnessed the prevalent application of pre-trained language models (PLMs) in NLP. From the perspective of parameter space, PLMs provide a generic initialization from which high-performance minima can be found. Although many works have studied how to adapt PLMs to high-performance minima effectively and efficiently, little is known about how the various minima reached under different adaptation configurations are connected. In this paper, we investigate the geometric connections between different minima through the lens of mode connectivity, which measures whether two minima can be connected by a low-loss path. We conduct empirical analyses to investigate three questions: (1) How do hyperparameters, specific tuning methods, and training data affect a PLM's mode connectivity? (2) How does mode connectivity change during pre-training? (3) How does a PLM's task knowledge change along the path connecting two minima? In general, exploring the mode connectivity of PLMs contributes to understanding the geometric connections between different minima, which may help us fathom the inner workings of PLM downstream adaptation.
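
To make the notion of a "low-loss path" concrete, the following is a minimal sketch of one common way to probe mode connectivity: evaluate the loss at evenly spaced points on the straight line between two minima and report the height of the barrier above the endpoints. It is an illustration under stated assumptions, not the procedure used in the paper. The helper names (`interpolate`, `loss_along_path`, `loss_barrier`) and the two-parameter `toy_loss` are hypothetical stand-ins for real model parameters and a real task loss.

```python
from typing import Callable, List, Sequence

Params = Sequence[float]


def interpolate(theta_a: Params, theta_b: Params, alpha: float) -> List[float]:
    """Return the point (1 - alpha) * theta_a + alpha * theta_b."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def loss_along_path(
    loss_fn: Callable[[Params], float],
    theta_a: Params,
    theta_b: Params,
    num_points: int = 11,
) -> List[float]:
    """Evaluate loss_fn at evenly spaced points on the segment theta_a -> theta_b."""
    alphas = [i / (num_points - 1) for i in range(num_points)]
    return [loss_fn(interpolate(theta_a, theta_b, alpha)) for alpha in alphas]


def loss_barrier(losses: Sequence[float]) -> float:
    """Height of the highest point on the path above the worse of the two endpoints."""
    return max(losses) - max(losses[0], losses[-1])


def toy_loss(theta: Params) -> float:
    """Hypothetical double-well loss with minima at (-1, 0) and (1, 0)."""
    x, y = theta
    return (x ** 2 - 1.0) ** 2 + y ** 2


if __name__ == "__main__":
    minimum_a = [-1.0, 0.0]
    minimum_b = [1.0, 0.0]
    losses = loss_along_path(toy_loss, minimum_a, minimum_b)
    print("losses along path:", [round(value, 4) for value in losses])
    print("barrier:", loss_barrier(losses))
```

A barrier close to zero suggests the two minima are linearly connected. A large barrier only rules out the straight-line path, since a curved low-loss path may still exist. In practice the parameters would be the flattened weights of two adapted models and the loss would be computed on held-out task data.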


