ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning

12/02/2022
by Shachar Don-Yehiya, et al.

Pretraining has been shown to scale well with compute, data size, and data diversity. Multitask learning trains on a mixture of supervised datasets and yields improved performance compared to self-supervised pretraining. Until now, massively multitask learning required simultaneous access to all datasets in the mixture and heavy compute resources that are only available to well-resourced teams. In this paper, we propose ColD Fusion, a method that provides the benefits of multitask learning but leverages distributed computation, requires limited communication, and involves no sharing of data. Consequently, ColD Fusion can create a synergistic loop, where finetuned models can be recycled to continually improve the pretrained model they are based on. We show that ColD Fusion yields comparable benefits to multitask pretraining by producing a model that (a) attains strong performance on all of the datasets it was multitask trained on and (b) is a better starting point for finetuning on unseen datasets. We find that ColD Fusion outperforms RoBERTa and even previous multitask models. Specifically, when training and testing on 35 diverse datasets, the ColD Fusion-based model outperforms RoBERTa by 2.45 points on average without any changes to the architecture.
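To make the "recycling" loop concrete, here is a minimal sketch of one ColD Fusion round, assuming the fusion step is a plain parameter average of the contributors' independently finetuned models. The tiny MLP, synthetic per-contributor datasets, hyperparameters, and helper names (finetune, fuse) are illustrative stand-ins, not the authors' code or actual training setup.

```python
# Minimal sketch of iterative collaborative descent: each contributor finetunes a
# copy of the shared model on its own private data, and the finetuned models are
# fused (here: parameter-averaged) into the next shared model. Illustrative only.
import copy
import torch
from torch import nn

def finetune(model: nn.Module, data, epochs: int = 3, lr: float = 1e-3) -> nn.Module:
    """Locally finetune a copy of the shared model on one contributor's dataset."""
    model = copy.deepcopy(model)  # keep the shared model untouched
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

def fuse(models: list[nn.Module]) -> nn.Module:
    """Average the contributors' parameters to form the next shared model."""
    fused = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in fused.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))
    return fused

# Illustrative shared model and three contributors with private synthetic data.
shared = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
datasets = [
    [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(10)]
    for _ in range(3)
]

# Iterate: contributors finetune independently, then their models are fused.
for _ in range(5):
    finetuned = [finetune(shared, data) for data in datasets]
    shared = fuse(finetuned)
```

Note that only model parameters (and no data) move between contributors and the fusion step, which is what keeps communication limited and data private in the setting the abstract describes.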
