Dataless Knowledge Fusion by Merging Weights of Language Models

12/19/2022
by Xisen Jin, et al.

Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. Oftentimes fine-tuned models are readily available but their training data is not, due to data privacy or intellectual property concerns. This creates a barrier to fusing knowledge across individual models to yield a better single model. In this paper, we study the problem of merging individual models built on different training data sets to obtain a single model that performs well across all data set domains and generalizes to out-of-domain data. We propose a dataless knowledge fusion method that merges models in their parameter space, guided by weights that minimize prediction differences between the merged model and the individual models. Across a battery of evaluation settings, we show that the proposed method significantly outperforms baselines such as Fisher-weighted averaging and model ensembling. Further, we find that our method is a promising alternative to multi-task learning: it can preserve, and sometimes improve over, the performance of the individual models without access to their training data. Finally, model merging is more efficient than training a multi-task model, making it applicable to a wider range of scenarios.
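For a single linear layer, the "minimize prediction differences" objective in the abstract admits a closed-form solution. The sketch below is a minimal illustration of that idea, not the paper's exact implementation: the function name merge_linear_weights, the toy shapes, and the ridge term are illustrative assumptions. It merges per-model weights W_i using only the Gram matrices G_i = X_i^T X_i of each model's layer inputs, so no raw training data is required.

```python
import torch

def merge_linear_weights(weights, gram_matrices, ridge=1e-6):
    """Merge linear-layer weights from several fine-tuned models.

    Choosing the merged weight W to minimize the total prediction
    difference  sum_i || X_i W - X_i W_i ||_F^2  over each model's own
    layer inputs X_i gives the closed form

        W = (sum_i X_i^T X_i)^{-1} (sum_i X_i^T X_i W_i),

    which depends only on the Gram matrices G_i = X_i^T X_i.

    weights:       list of (d_in, d_out) tensors W_i, one per model
    gram_matrices: list of (d_in, d_in) tensors G_i = X_i^T X_i
    ridge:         small diagonal term to keep the sum invertible
                   (an assumption for numerical stability, not taken
                   from the paper)
    """
    numerator = sum(G @ W for G, W in zip(gram_matrices, weights))
    denominator = sum(gram_matrices)
    denominator = denominator + ridge * torch.eye(denominator.shape[0])
    # Solve (sum_i G_i) W = sum_i G_i W_i instead of forming an inverse.
    return torch.linalg.solve(denominator, numerator)

# Toy usage: two hypothetical "models" with random weights and inputs.
torch.manual_seed(0)
d_in, d_out = 8, 4
Ws = [torch.randn(d_in, d_out) for _ in range(2)]
Xs = [torch.randn(64, d_in) for _ in range(2)]  # per-model layer inputs
Gs = [X.T @ X for X in Xs]                      # Gram matrices
W_merged = merge_linear_weights(Ws, Gs)
print(W_merged.shape)  # torch.Size([8, 4])
```

Note that when every G_i is the identity this reduces to simple weight averaging, which is one way to see how the data-dependent weighting sharpens a plain parameter average.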


Related research

03/29/2020 · Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Pre-trained neural language models bring significant improvement for var...

10/18/2022 · Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models
Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attac...

03/14/2023 · Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies
Recent work has shown the promise of creating generalist, transformer-ba...

11/18/2021 · Merging Models with Fisher-Weighted Averaging
Transfer learning provides a way of leveraging knowledge from one task w...

07/07/2023 · Derivative Free Weight-space Ensembling
Recent work suggests that interpolating between the weights of two speci...

06/07/2021 · GAN Cocktail: mixing GANs without dataset access
Today's generative models are capable of synthesizing high-fidelity imag...

12/08/2019 · Individual predictions matter: Assessing the effect of data ordering in training fine-tuned CNNs for medical imaging
We reproduced the results of CheXNet with fixed hyperparameters and 50 d...
