Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability

03/10/2022
by Ruifei He, et al.

Large-scale pre-training has proven crucial for a wide range of computer vision tasks. However, as pre-training datasets grow, the number of model architectures multiplies, and some pre-training data become private or inaccessible, it is neither efficient nor always possible to pre-train every architecture on large-scale datasets. In this work, we investigate an alternative pre-training strategy, Knowledge Distillation as Efficient Pre-training (KDEP), which aims to efficiently transfer the learned feature representations of existing pre-trained models to new student models for future downstream tasks. We observe that existing Knowledge Distillation (KD) methods are ill-suited to pre-training because they typically distill the logits, which are discarded when the model is transferred to downstream tasks. To resolve this problem, we propose a feature-based KD method with non-parametric feature dimension alignment. Notably, our method performs comparably with supervised pre-training counterparts across 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time. Code is available at https://github.com/CVMI-Lab/KDEP.
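
The abstract's key technical point is that KDEP distills intermediate feature representations rather than logits, aligning teacher and student feature dimensions without extra learnable parameters. Below is a minimal PyTorch sketch of that idea; the SVD-based projection, the loss form, and the function names (align_teacher_features, kdep_feature_loss) are illustrative assumptions, not the authors' exact implementation (see the official repository linked above for that).

```python
import torch
import torch.nn.functional as F


def align_teacher_features(t_feat: torch.Tensor, d_student: int) -> torch.Tensor:
    """Project teacher features (N, d_teacher) down to d_student dimensions
    via truncated SVD -- a non-parametric alternative to a learned 1x1 conv
    or linear projection head. Shown per batch for brevity; in practice the
    projection would be estimated once over the whole dataset."""
    # Center the features, then keep the top d_student right singular vectors.
    t_centered = t_feat - t_feat.mean(dim=0, keepdim=True)
    U, S, Vh = torch.linalg.svd(t_centered, full_matrices=False)
    return t_centered @ Vh[:d_student].T  # (N, d_student)


def kdep_feature_loss(s_feat: torch.Tensor, t_feat: torch.Tensor) -> torch.Tensor:
    """MSE between L2-normalized student features and aligned, L2-normalized
    teacher features -- distilling features rather than logits."""
    t_aligned = align_teacher_features(t_feat, s_feat.shape[1]).detach()
    return F.mse_loss(F.normalize(s_feat, dim=1),
                      F.normalize(t_aligned, dim=1))


# Usage with pooled backbone features (shapes are hypothetical):
# s_feat = student_backbone(images)        # (N, 512)
# with torch.no_grad():
#     t_feat = teacher_backbone(images)    # (N, 2048)
# loss = kdep_feature_loss(s_feat, t_feat)
```

Because the alignment is non-parametric, no projection weights need to be trained and then thrown away: the distilled student backbone is directly reusable for downstream fine-tuning.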


