Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones

03/10/2021
by Cheng Cui, et al.

Recently, research efforts have concentrated on revealing how a pre-trained model makes a difference in neural network performance. Self-supervised and semi-supervised learning techniques have been extensively explored by the community and have proven to be of great potential for obtaining powerful pre-trained models. However, these models require huge training costs (i.e., hundreds of millions of images or training iterations). In this paper, we propose to improve existing baseline networks via knowledge distillation from off-the-shelf, pre-trained, big powerful models. Unlike existing knowledge distillation frameworks, which require the student model to be consistent with both the soft labels generated by the teacher model and the hard labels annotated by humans, our solution performs distillation by only driving the predictions of the student model to be consistent with those of the teacher model. Therefore, our distillation setting can dispense with manually labeled data and can be trained with extra unlabeled data to fully exploit the capability of the teacher model for better learning. We empirically find that such a simple distillation setting is extremely effective; for example, the top-1 accuracy on the ImageNet-1k validation set of MobileNetV3-large and ResNet50-D can be significantly improved from 75.2% and 79.1% to 79.0% and 83.0%, respectively. We have also thoroughly analyzed the dominant factors that affect distillation performance and how they make a difference. Extensive downstream computer vision tasks, including transfer learning, object detection, and semantic segmentation, can significantly benefit from the distilled pretrained models. All our experiments are implemented with PaddlePaddle; the code and a series of improved pretrained models with the ssld suffix are available in PaddleClas.

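The label-free objective described in the abstract, where the student is trained only to match the teacher's predictions on (possibly unlabeled) images, can be sketched as follows. The paper's actual implementation is based on PaddlePaddle/PaddleClas; the snippet below is a minimal, illustrative PyTorch-style sketch, and the temperature-scaled KL loss, the `train_step` helper, and the model/optimizer names are assumptions for illustration rather than the authors' exact code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Soft-label-only distillation: the student matches the teacher's
    softened prediction distribution; no ground-truth labels are used."""
    t = temperature
    teacher_prob = F.softmax(teacher_logits / t, dim=1)
    student_log_prob = F.log_softmax(student_logits / t, dim=1)
    # KL(teacher || student), scaled by t^2 as is conventional in distillation
    return F.kl_div(student_log_prob, teacher_prob, reduction="batchmean") * (t * t)

def train_step(student, teacher, images, optimizer, temperature=1.0):
    """One optimization step on a batch of images (labels are never needed)."""
    teacher.eval()
    with torch.no_grad():              # the big pre-trained teacher stays frozen
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because no ground-truth labels enter the loss, the same step can be run over any pool of extra unlabeled images, which is what allows the student to exploit the teacher's capability beyond the manually annotated training set.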
Related research

03/23/2021
Student Network Learning via Evolutionary Knowledge Distillation
Knowledge distillation provides an effective way to transfer knowledge v...

12/01/2018
Snapshot Distillation: Teacher-Student Optimization in One Generation
Optimizing a deep neural network is a fundamental task in computer visio...

10/04/2019
Distilling Transformers into Simple Neural Networks with Unlabeled Transfer Data
Recent advances in pre-training huge models on large amounts of text thr...

12/09/2022
Co-training 2^L Submodels for Visual Recognition
We introduce submodel co-training, a regularization method related to co...

12/05/2021
Safe Distillation Box
Knowledge distillation (KD) has recently emerged as a powerful strategy ...

04/13/2020
Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks
Deep neural models in recent years have been successful in almost every ...

02/24/2023
A Knowledge Distillation framework for Multi-Organ Segmentation of Medaka Fish in Tomographic Image
Morphological atlases are an important tool in organismal studies, and m...
