Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies

03/14/2023
by Daniel Lawson, et al.

Recent work has shown the promise of generalist, transformer-based policies for language, vision, and sequential decision-making problems. Creating such models generally requires centralized training objectives, data, and compute. It is therefore of interest whether we can create generalist policies more flexibly by merging multiple task-specific, individually trained policies. In this work, we take a preliminary step in this direction by merging, i.e., averaging in weight space, subsets of Decision Transformers trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. We also propose that merging yields better results when all policies start from a common, pre-trained initialization and are co-trained on shared auxiliary tasks during problem-specific finetuning. In general, we believe research in this direction can help democratize and distribute the process of forming generally capable agents.
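The core operation described in the abstract is simple: given several policies with identical architectures, average their parameters element-wise to form a single multi-task policy. Below is a minimal sketch of uniform weight averaging, assuming all task-specific policies share an architecture and, ideally, a common pre-trained initialization. The small placeholder model stands in for a Decision Transformer; it is not the paper's implementation, and the paper additionally studies merging subsets of policies rather than always averaging all of them.

```python
# A minimal sketch of merging task-specific policies by uniform weight
# averaging in parameter space. The model below is a hypothetical stand-in
# for a Decision Transformer; the paper's actual architecture, checkpoints,
# and training code are not reproduced here.
from collections import OrderedDict
import torch
import torch.nn as nn

def average_state_dicts(state_dicts):
    """Element-wise mean of compatible state dicts (same keys and shapes)."""
    merged = OrderedDict()
    for key in state_dicts[0]:
        merged[key] = torch.stack(
            [sd[key].float() for sd in state_dicts], dim=0
        ).mean(dim=0)
    return merged

# Placeholder architecture; in practice all task-specific policies must share
# the same architecture (and ideally a common pre-trained initialization) for
# weight averaging to be meaningful.
def make_policy():
    return nn.Sequential(nn.Linear(17, 128), nn.GELU(), nn.Linear(128, 6))

# Individually trained, task-specific policies (here: fresh stand-ins; in the
# paper these would be Decision Transformers trained on different MuJoCo tasks).
task_policies = [make_policy() for _ in range(3)]

# Merge their weights to form a single multi-task policy without any
# centralized training step.
merged_policy = make_policy()
merged_policy.load_state_dict(
    average_state_dicts([p.state_dict() for p in task_policies])
)
```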

