MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks

09/17/2020
by Zhiqiang Shen, et al.

In this paper, we introduce a simple yet effective approach that can boost the vanilla ResNet-50 to 80%+ top-1 accuracy on ImageNet without tricks. Generally, our method is based on the recently proposed MEAL, i.e., ensemble knowledge distillation via discriminators. We further simplify it through 1) adopting the similarity loss and discriminator only on the final outputs and 2) using the average of softmax probabilities from all teacher ensembles as the stronger supervision for distillation. One crucial perspective of our method is that the one-hot/hard label should not be used in the distillation process. We show that such a simple framework can achieve state-of-the-art results without involving any commonly-used techniques, such as 1) architecture modification; 2) outside training data beyond ImageNet; 3) autoaug/randaug; 4) cosine learning rate; 5) mixup/cutmix training; 6) label smoothing; etc. On ImageNet, our method obtains 80.67% top-1 accuracy on the vanilla ResNet-50, outperforming the previous state-of-the-art results by a remarkable margin under the same network structure. Our result can be regarded as a new strong baseline on ResNet-50 using knowledge distillation. To our best knowledge, this is the first work that is able to boost vanilla ResNet-50 to surpass 80% top-1 accuracy on ImageNet without architecture modification or additional training data. Our code and models are available at: https://github.com/szq0214/MEAL-V2.
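The distillation recipe described above is compact enough to sketch directly. The following PyTorch-style snippet is an illustrative sketch only, not the authors' released code: the `Discriminator` architecture, the `distillation_losses` function name, and the unweighted sum of the loss terms are assumptions made here for clarity. It shows the two ingredients from the abstract: a similarity (KL-divergence) loss between the student's softmax and the averaged softmax of the teacher ensemble, with no one-hot labels anywhere, plus a small discriminator applied only to the final softmax outputs.

```python
# Minimal sketch of MEAL V2-style ensemble distillation (illustrative, not the official code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Tiny MLP that tries to tell teacher softmax outputs from student ones."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, 128),   # hidden size is an arbitrary choice here
            nn.ReLU(inplace=True),
            nn.Linear(128, 1),
        )

    def forward(self, probs):
        return self.net(probs)

def distillation_losses(student_logits, teacher_logits_list, discriminator):
    # 1) Average the softmax probabilities of all teachers (the "stronger supervision").
    teacher_probs = torch.stack(
        [F.softmax(t, dim=1) for t in teacher_logits_list], dim=0
    ).mean(dim=0)

    # 2) Similarity loss on the final outputs: KL divergence between the student
    #    distribution and the averaged teacher distribution. No one-hot/hard label is used.
    student_log_probs = F.log_softmax(student_logits, dim=1)
    kl_loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

    # 3) Discriminator on the final outputs only: trained to label teacher outputs
    #    as real (1) and student outputs as fake (0); the student tries to fool it.
    student_probs = student_log_probs.exp()
    d_teacher = discriminator(teacher_probs.detach())
    d_student = discriminator(student_probs.detach())
    d_loss = (
        F.binary_cross_entropy_with_logits(d_teacher, torch.ones_like(d_teacher))
        + F.binary_cross_entropy_with_logits(d_student, torch.zeros_like(d_student))
    )

    # Adversarial term for the student (non-saturating objective).
    g_loss = F.binary_cross_entropy_with_logits(
        discriminator(student_probs), torch.ones_like(d_student)
    )

    student_loss = kl_loss + g_loss  # relative weighting is a hyperparameter
    return student_loss, d_loss
```

In practice the two student-side terms would be weighted and the discriminator updated with its own optimizer; those training-loop details, and the exact discriminator used in the released MEAL V2 code, are omitted here.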


Related research:

VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale (05/25/2023)
A Fast Knowledge Distillation Framework for Visual Recognition (12/02/2021)
ResNet strikes back: An improved training procedure in timm (10/01/2021)
DisWOT: Student Architecture Search for Distillation WithOut Training (03/28/2023)
Paying more attention to snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation (06/20/2020)
Efficient One Pass Self-distillation with Zipf's Label Smoothing (07/26/2022)
Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results (04/07/2022)
