Deep Ensembles for Low-Data Transfer Learning

10/14/2020
by Basil Mustafa et al.

In the low-data regime, it is difficult to train good supervised models from scratch. Instead, practitioners turn to pre-trained models, leveraging transfer learning. Ensembling is an empirically and theoretically appealing way to construct powerful predictive models, but the predominant approach of training multiple deep networks with different random initialisations collides with the need for transfer via pre-trained weights. In this work, we study different ways of creating ensembles from pre-trained models. We show that the nature of pre-training itself is an effective source of diversity, and propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset. The approach is simple: use nearest-neighbour accuracy to rank pre-trained models, fine-tune the best ones with a small hyperparameter sweep, and greedily construct an ensemble to minimise validation cross-entropy. When evaluated together with strong baselines on 19 different downstream tasks (the Visual Task Adaptation Benchmark), the method achieves state-of-the-art performance at a much lower inference budget, even when selecting from over 2,000 pre-trained models. We also assess our ensembles on ImageNet variants and show improved robustness to distribution shift.
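The selection recipe in the abstract maps naturally to a short sketch. Below is a minimal, hypothetical Python illustration of step 1 (nearest-neighbour ranking of frozen models) and step 3 (greedy ensembling); step 2, fine-tuning the top-ranked models over a small hyperparameter sweep, is assumed to have produced each candidate's validation-set class probabilities offline. The names `rank_by_knn_accuracy`, `greedy_ensemble`, and the `extract_features` method are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the rank / fine-tune / greedily-ensemble recipe.
# Helper names and the `extract_features` API are hypothetical.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import log_loss


def rank_by_knn_accuracy(models, x_train, y_train, x_val, y_val, k=1):
    # Step 1: score each frozen pre-trained model by how well a kNN
    # classifier over its features predicts the validation labels.
    scores = []
    for model in models:
        f_train = model.extract_features(x_train)  # assumed feature API
        f_val = model.extract_features(x_val)
        knn = KNeighborsClassifier(n_neighbors=k).fit(f_train, y_train)
        scores.append(knn.score(f_val, y_val))
    return [models[i] for i in np.argsort(scores)[::-1]]  # best first


def greedy_ensemble(val_probs, y_val, max_members=5):
    # Step 3: val_probs[i] holds the validation-set class probabilities
    # of the i-th fine-tuned candidate (step 2 happens offline). Members
    # are added greedily, with replacement, whenever they lower the
    # cross-entropy of the averaged predictive distribution.
    chosen, best_loss = [], np.inf
    for _ in range(max_members):
        step_best, step_loss = None, best_loss
        for i, probs in enumerate(val_probs):
            avg = np.mean([val_probs[j] for j in chosen] + [probs], axis=0)
            loss = log_loss(y_val, avg)
            if loss < step_loss:
                step_best, step_loss = i, loss
        if step_best is None:  # no candidate improves the ensemble
            break
        chosen.append(step_best)
        best_loss = step_loss
    return chosen  # indices of selected (possibly repeated) members
```

Selecting with replacement lets the greedy step implicitly up-weight strong members, and stopping once no candidate lowers validation cross-entropy keeps the ensemble, and hence the inference budget, small.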

Related research

03/06/2023
To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning
Transfer learning and ensembling are two popular techniques for improvin...

02/22/2021
LogME: Practical Assessment of Pre-trained Models for Transfer Learning
This paper studies task adaptive pre-trained model selection, an underex...

06/08/2022
Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models
Transfer learning aims to leverage knowledge from pre-trained models to ...

06/24/2022
Out of distribution robustness with pre-trained Bayesian neural networks
We develop ShiftMatch, a new training-data-dependent likelihood for out ...

01/19/2022
Enhanced Performance of Pre-Trained Networks by Matched Augmentation Distributions
There exists a distribution discrepancy between training and testing, in...

06/08/2015
Learning to Select Pre-Trained Deep Representations with Bayesian Evidence Framework
We propose a Bayesian evidence framework to facilitate transfer learning...

03/07/2023
Introspective Cross-Attention Probing for Lightweight Transfer of Pre-trained Models
We propose InCA, a lightweight method for transfer learning that cross-a...
