Scalable Transfer Learning with Expert Models

by Joan Puigcerver, et al.

Transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, representations used for transfer are usually generic, and are not tailored to a particular distribution of downstream tasks. We explore the use of expert representations for transfer with a simple, yet effective, strategy. We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task. This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer. Accordingly, it requires little extra compute per target task, and results in a speed-up of 2-3 orders of magnitude compared to competing approaches. Further, we provide an adapter-based architecture able to compress many experts into a single model. We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases.
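The selection step described above can be illustrated with a minimal sketch. The abstract only says that "cheap-to-compute performance proxies" are used, so the specific proxy here (leave-one-out 1-nearest-neighbour accuracy on frozen features) is an assumption made for illustration. The names `loo_1nn_accuracy` and `select_expert`, and the representation of experts as plain callables, are hypothetical. No expert is fine-tuned and no pre-training data is touched: each expert only embeds the target data once.

```python
import numpy as np


def loo_1nn_accuracy(features, labels):
    """Leave-one-out 1-nearest-neighbour accuracy on frozen features.

    Assumed proxy for downstream performance (illustrative choice).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances between all target examples.
    sq_norms = (features ** 2).sum(axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    # An example must not be its own neighbour.
    np.fill_diagonal(dists, np.inf)
    nearest = dists.argmin(axis=1)
    return float((labels[nearest] == labels).mean())


def select_expert(experts, inputs, labels):
    """Score every expert with the proxy and return the best one.

    `experts` is a hypothetical dict mapping an expert name to a callable
    that turns raw inputs into an (n_examples, n_features) array.
    """
    scores = {
        name: loo_1nn_accuracy(embed(inputs), labels)
        for name, embed in experts.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Under these assumptions, the cost per target task is one forward pass per expert plus a nearest-neighbour lookup, which is the kind of cheap computation that lets selection scale to many experts. Only the selected expert would then be fine-tuned on the target task.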




Scalable Neural Data Server: A Data Recommender for Transfer Learning

Absence of large-scale labeled data in the practitioner's target domain ...

Large Scale Learning of General Visual Representations for Transfer

Transfer of pre-trained representations improves sample efficiency and s...

Domain Adaptive Transfer Learning with Specialist Models

Transfer learning is a widely used method to build high performing compu...

Task-Specific Expert Pruning for Sparse Mixture-of-Experts

The sparse Mixture-of-Experts (MoE) model is powerful for large-scale pr...

Which Model to Transfer? Finding the Needle in the Growing Haystack

Transfer learning has been recently popularized as a data-efficient alte...

Finding Experts in Transformer Models

In this work we study the presence of expert units in pre-trained Transf...

Hyper-Representations for Pre-Training and Transfer Learning

Learning representations of neural network weights given a model zoo is ...