Derivative Free Weight-space Ensembling

07/07/2023
by Dean Ninalga, et al.

Recent work suggests that interpolating between the weights of two specialized language models can transfer knowledge between tasks in a way that multi-task learning cannot. However, few works have explored interpolation between more than two models, where each has a distinct knowledge base. In this paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue. Our framework creates a set of diverse expert language models trained on a predefined set of source tasks. Next, we finetune each of the expert models on the target task, approaching the target task from several distinct knowledge bases. Finally, we linearly interpolate between the model weights using a gradient-free optimization algorithm to efficiently find a good interpolation weighting. We demonstrate the effectiveness of the method on FETA-Friends, outperforming the standard pretrain-finetune approach.
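To make the procedure concrete, below is a minimal sketch of the interpolation search the abstract describes, assuming PyTorch-style models that share one architecture and SciPy's Nelder-Mead as the gradient-free optimizer (the abstract does not commit to a specific algorithm; the names dfwe_merge, interpolate_state_dicts, and eval_fn are hypothetical):

# Minimal sketch of a DFWE-style merge step (assumptions noted above).
# eval_fn(model) should return a validation loss (lower is better);
# Nelder-Mead stands in for the paper's unspecified gradient-free optimizer.
import copy
import numpy as np
from scipy.optimize import minimize

def interpolate_state_dicts(state_dicts, alphas):
    """Weight-space interpolation: theta = sum_i alpha_i * theta_i."""
    return {
        key: sum(a * sd[key].float() for a, sd in zip(alphas, state_dicts))
        for key in state_dicts[0]
    }

def dfwe_merge(experts, eval_fn):
    state_dicts = [m.state_dict() for m in experts]
    probe = copy.deepcopy(experts[0])  # scratch model for candidate merges

    def objective(raw):
        # Softmax reparameterization: unconstrained search variables map to
        # positive interpolation weights that sum to 1.
        alphas = np.exp(raw - raw.max())
        alphas /= alphas.sum()
        probe.load_state_dict(interpolate_state_dicts(state_dicts, alphas))
        return eval_fn(probe)

    result = minimize(objective, x0=np.zeros(len(experts)),
                      method="Nelder-Mead")
    best = np.exp(result.x - result.x.max())
    best /= best.sum()
    probe.load_state_dict(interpolate_state_dicts(state_dicts, best))
    return probe, best

The softmax reparameterization lets the optimizer search an unconstrained space while every candidate remains a convex combination of the experts, one natural reading of "linearly interpolate" among several models. Each objective evaluation costs one weight merge plus one validation pass, so with a handful of experts and a small validation set the derivative-free search stays cheap.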


Related research

09/09/2021 · MetaXT: Meta Cross-Task Transfer between Disparate Label Spaces
Albeit the universal representational power of pre-trained language mode...

12/19/2022 · Dataless Knowledge Fusion by Merging Weights of Language Models
Fine-tuning pre-trained language models has become the prevalent paradig...

06/04/2023 · RadLing: Towards Efficient Radiology Report Understanding
Most natural language tasks in the radiology domain use language models ...

10/10/2022 · Knowledge Prompts: Injecting World Knowledge into Language Models through Soft Prompts
Soft prompts have been recently proposed as a tool for adapting large fr...

08/26/2023 · Differentiable Weight Masks for Domain Transfer
One of the major drawbacks of deep learning models for computer vision h...

03/20/2022 · Hierarchical Inductive Transfer for Continual Dialogue Learning
Pre-trained models have achieved excellent performance on the dialogue t...

10/06/2021 · Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer
Expert-layman text style transfer technologies have the potential to imp...
