Improving Generalization and Robustness with Noisy Collaboration in Knowledge Distillation

10/11/2019
by Elahe Arani, et al.

Inspired by trial-to-trial variability in the brain, which can arise from multiple noise sources, we introduce variability through noise at different levels of a knowledge distillation framework. We introduce "Fickle Teacher," which provides variable supervision signals to the student for the same input. We observe that this response variability from the teacher yields a significant generalization improvement in the student. We further propose "Soft-Randomization," a novel technique for improving the student's robustness to input variability: it minimizes the dissimilarity between the student's distribution on noisy data and the teacher's distribution on clean data. We show that soft-randomization, even with low noise intensity, improves robustness significantly with minimal drop in generalization. Lastly, we propose "Messy-Collaboration," a new technique that introduces target variability, whereby the student and/or teacher are trained with randomly corrupted labels. We find that supervision from a corrupted teacher significantly improves the adversarial robustness of the student while preserving its generalization and natural robustness. Our extensive empirical results verify the effectiveness of adding constructive noise to the knowledge distillation framework for improving the generalization and robustness of the model.
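The abstract does not give implementation details, but as a rough illustration of the soft-randomization idea, the sketch below matches a student's softened predictions on noise-perturbed inputs to a teacher's predictions on the clean inputs. This is a minimal PyTorch sketch under assumptions: the function name, the Gaussian noise model, and the temperature/alpha weighting are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def soft_randomization_loss(student, teacher, x, y,
                            noise_std=0.1, temperature=4.0, alpha=0.9):
    """Illustrative soft-randomization-style objective: align the student's
    distribution on noisy inputs with the teacher's distribution on clean inputs.
    Hyperparameters and noise model are assumptions, not the paper's settings."""
    # Teacher sees the clean input; no gradients flow into the teacher.
    with torch.no_grad():
        teacher_logits = teacher(x)

    # Student sees a Gaussian-perturbed copy of the same input.
    x_noisy = x + noise_std * torch.randn_like(x)
    student_logits = student(x_noisy)

    # Distillation term: KL divergence between temperature-softened
    # distributions, scaled by T^2 as is customary in knowledge distillation.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Supervised term on the hard labels for the noisy input.
    ce = F.cross_entropy(student_logits, y)

    return alpha * kd + (1.0 - alpha) * ce
```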


research

02/22/2023
Distilling Calibrated Student from an Uncalibrated Teacher
Knowledge distillation is a common technique for improving the performan...

10/08/2019
Knowledge Distillation from Internal Representations
Knowledge distillation is typically conducted by training a small model ...

10/22/2022
Hard Gate Knowledge Distillation – Leverage Calibration for Robust and Reliable Language Model
In knowledge distillation, a student model is trained with supervisions ...

06/07/2023
Faithful Knowledge Distillation
Knowledge distillation (KD) has received much attention due to its succe...

09/15/2020
Noisy Self-Knowledge Distillation for Text Summarization
In this paper we apply self-knowledge distillation to text summarization...

06/09/2021
Reliable Adversarial Distillation with Unreliable Teachers
In ordinary distillation, student networks are trained with soft labels ...

10/15/2022
RoS-KD: A Robust Stochastic Knowledge Distillation Approach for Noisy Medical Imaging
AI-powered Medical Imaging has recently achieved enormous attention due ...
