Merging Models with Fisher-Weighted Averaging

11/18/2021
by   Michael Matena, et al.
0

Transfer learning provides a way of leveraging knowledge from one task when learning another task. Performing transfer learning typically involves iteratively updating a model's parameters through gradient descent on a training dataset. In this paper, we introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one. Our approach effectively involves computing a weighted average of the models' parameters. We show that this averaging is equivalent to approximately sampling from the posteriors of the model weights. While using an isotropic Gaussian approximation works well in some cases, we also demonstrate benefits by approximating the precision matrix via the Fisher information. In sum, our approach makes it possible to combine the "knowledge" in multiple models at an extremely low computational cost compared to standard gradient-based training. We demonstrate that model merging achieves comparable performance to gradient descent-based transfer learning on intermediate-task training and domain adaptation problems. We also show that our merging procedure makes it possible to combine models in previously unexplored ways. To measure the robustness of our approach, we perform an extensive ablation on the design of our algorithm.

READ FULL TEXT
research
09/21/2023

Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance

Stochastic Gradient Descent (SGD), a widely used optimization algorithm ...
research
12/19/2022

Dataless Knowledge Fusion by Merging Weights of Language Models

Fine-tuning pre-trained language models has become the prevalent paradig...
research
06/02/2023

Resolving Interference When Merging Models

Transfer learning - i.e., further fine-tuning a pre-trained model on a d...
research
02/03/2023

Interpretations of Domain Adaptations via Layer Variational Analysis

Transfer learning is known to perform efficiently in many applications e...
research
11/18/2021

Training Neural Networks with Fixed Sparse Masks

During typical gradient-based training of deep neural networks, all of t...
research
12/03/2018

Transferring Knowledge across Learning Processes

In complex transfer learning scenarios new tasks might not be tightly li...
research
07/28/2017

Human in the Loop: Interactive Passive Automata Learning via Evidence-Driven State-Merging Algorithms

We present an interactive version of an evidence-driven state-merging (E...

Please sign up or login with your details

Forgot password? Click here to reset