SMILE: Self-Distilled MIxup for Efficient Transfer LEarning

03/25/2021
by Xingjian Li, et al.

To improve the performance of deep learning, mixup has been proposed to encourage neural networks to favor simple linear behaviors in-between training samples. Performing mixup for transfer learning with pre-trained models, however, is not that simple: a high-capacity pre-trained model with a large fully-connected (FC) layer can easily overfit the target dataset even with sample-to-label mixup applied. In this work, we propose SMILE - Self-Distilled Mixup for EffIcient Transfer LEarning. With mixed images as inputs, SMILE regularizes the outputs of the CNN feature extractor to learn from the mixed feature vectors of the inputs (sample-to-feature mixup), in addition to the mixed labels. Specifically, SMILE incorporates a mean teacher, inherited from the pre-trained model, to provide the feature vectors of input samples in a self-distilling fashion, and mixes up the feature vectors accordingly via a novel triplet regularizer. The triplet regularizer balances the mixup effects in both feature and label spaces while bounding the linearity in-between samples for pre-training tasks. Extensive experiments verify the performance improvements made by SMILE in comparison with a wide spectrum of transfer learning algorithms, including fine-tuning, L2-SP, DELTA, and RIFLE, even when combined with mixup strategies. Ablation studies show that vanilla sample-to-label mixup strategies marginally increase the linearity in-between training samples but lack generalizability, while SMILE significantly improves the mixup effects in both label and feature spaces on both training and testing datasets. These empirical observations back up our design intuitions and purposes.
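
Below is a minimal, hypothetical PyTorch sketch of the sample-to-feature mixup idea described above, assuming a frozen mean-teacher copy of the pre-trained feature extractor. The function name smile_step, the mean-squared feature loss, and the lambda_feat weight are illustrative assumptions rather than the paper's exact triplet regularizer.

```python
import numpy as np
import torch
import torch.nn.functional as F

def smile_step(student, teacher, classifier, x, y, alpha=0.2, lambda_feat=1.0):
    """One hypothetical SMILE-style training step (illustrative sketch).

    student    - trainable CNN feature extractor, initialized from the pre-trained model
    teacher    - frozen "mean teacher" copy of the pre-trained feature extractor
    classifier - fully-connected head mapping features to target-task logits
    """
    # Sample the mixup coefficient and a random permutation of the batch.
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(x.size(0), device=x.device)

    # Mixup in input space (sample-to-sample).
    x_mixed = lam * x + (1.0 - lam) * x[perm]

    # Student features of the mixed inputs.
    f_mixed = student(x_mixed)

    # Teacher features of the unmixed inputs, mixed in feature space
    # (sample-to-feature mixup); no gradients flow through the teacher.
    with torch.no_grad():
        f_teacher = teacher(x)
        f_target = lam * f_teacher + (1.0 - lam) * f_teacher[perm]

    # Label-space mixup loss on the classifier outputs.
    logits = classifier(f_mixed)
    loss_label = lam * F.cross_entropy(logits, y) + (1.0 - lam) * F.cross_entropy(logits, y[perm])

    # Feature-space regularizer: pull student features of mixed inputs
    # toward the mixed teacher features (stand-in for the triplet regularizer).
    loss_feat = F.mse_loss(f_mixed, f_target)

    return loss_label + lambda_feat * loss_feat
```

In a standard fine-tuning loop, the returned loss would simply replace the usual cross-entropy term, followed by loss.backward() and an optimizer step over the student and classifier parameters.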


