Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation

by   Xiaoyang Qu, et al.

Knowledge distillation refers to a technique of transferring the knowledge from a large learned model or an ensemble of learned models to a small model. This method relies on access to the original training set, which might not always be available. A possible solution is a data-free adversarial distillation framework, which deploys a generative network to transfer the teacher model's knowledge to the student model. However, data generation efficiency is low in data-free adversarial distillation. We add an activation regularizer and a virtual interpolation method to improve the data generation efficiency. The activation regularizer enables the student to match the teacher's predictions close to activation boundaries and decision boundaries. The virtual interpolation method can generate virtual samples and labels in-between decision boundaries. Our experiments show that our approach surpasses state-of-the-art data-free distillation methods. The student model can achieve 95.42% accuracy without any original training data. Our model's accuracy is 13.8% higher than the state-of-the-art data-free method on CIFAR-100.
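The virtual interpolation idea described above, generating samples and labels in-between decision boundaries, can be sketched as a mixup-style linear blend of two samples and their soft labels. This is only an illustrative sketch under that assumption; the function name, the Beta-distributed mixing weight, and the parameter `alpha` are hypothetical choices, not the authors' exact formulation.

```python
import numpy as np

def virtual_interpolation(x1, x2, y1, y2, alpha=1.0, rng=None):
    """Blend two samples and their (soft) labels with a single
    mixing weight lam drawn from Beta(alpha, alpha).

    The interpolated pair (x, y) lies on the line segment between
    the two inputs, i.e. in-between the decision boundaries the
    teacher has learned for them.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # virtual sample
    y = lam * y1 + (1.0 - lam) * y2       # virtual (soft) label
    return x, y, lam
```

Because the label is interpolated with the same weight as the sample, the virtual pair stays consistent: if `y1` and `y2` are probability vectors, the blended `y` is still a valid probability vector.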

