lo-fi: distributed fine-tuning without communication

10/19/2022
by Mitchell Wortsman et al.

When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node is fine-tuned independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step. We also observe that lo-fi matches the baseline's performance when fine-tuning OPT language models (up to 1.3B parameters) on Common Crawl. By removing the communication requirement, lo-fi reduces resource barriers for fine-tuning large models and enables fine-tuning in settings with prohibitive communication cost.
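In code, the recipe amounts to independent fine-tuning on each node followed by a single uniform average of the resulting weights. The sketch below illustrates this in PyTorch-style Python; it is an illustrative outline under assumptions, not the authors' implementation, and `make_pretrained_model`, `fine_tune_on_shard`, and `data_shards` are hypothetical placeholders for the reader's own model constructor, local training loop, and per-node data.

```python
# Minimal sketch of the lo-fi recipe described above (assumed names, not the paper's code).
import copy
import torch


def lofi_average(state_dicts):
    """Uniformly average the parameters of independently fine-tuned models."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        if torch.is_floating_point(avg[key]):
            stacked = torch.stack([sd[key].float() for sd in state_dicts])
            avg[key] = stacked.mean(dim=0).to(avg[key].dtype)
    return avg


def lofi_finetune(make_pretrained_model, fine_tune_on_shard, data_shards):
    # Each "node" fine-tunes its own copy of the pretrained model with no
    # gradient communication; here the nodes are simulated sequentially.
    finetuned_state_dicts = []
    for shard in data_shards:
        model = make_pretrained_model()      # same pretrained initialization on every node
        fine_tune_on_shard(model, shard)     # local fine-tuning, no all-reduce
        finetuned_state_dicts.append(model.state_dict())

    # The only communication happens once, at the end: gather and average the weights.
    merged = make_pretrained_model()
    merged.load_state_dict(lofi_average(finetuned_state_dicts))
    return merged
```

In a real multi-node setup each fine-tuning run would execute on separate hardware, so the only traffic is one transfer of each node's final weights for the averaging step, rather than gradient synchronization at every optimization step.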


Related research

02/21/2022  Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution
When transferring a pretrained model to a downstream task, two popular m...

08/26/2023  Adversarial Fine-Tuning of Language Models: An Iterative Optimisation Approach for the Generation and Detection of Problematic Content
In this paper, we tackle the emerging challenge of unintended harmful co...

02/14/2022  Orthogonalising gradients to speed up neural network optimisation
The optimisation of neural networks can be sped up by orthogonalising th...

09/10/2019  What do Deep Networks Like to Read?
Recent research towards understanding neural networks probes models in a...

08/29/2022  Assessing, testing and estimating the amount of fine-tuning by means of active information
A general framework is introduced to estimate how much external informat...

07/20/2022  Pretraining a Neural Network before Knowing Its Architecture
Training large neural networks is possible by training a smaller hyperne...

12/12/2022  CLIP Itself is a Strong Fine-tuner: Achieving 85.7% Accuracy with ViT-B and ViT-L on ImageNet
Recent studies have shown that CLIP has achieved remarkable success in p...
