MUSE: Feature Self-Distillation with Mutual Information and Self-Information

10/25/2021
by Yu Gong, et al.

We present a novel information-theoretic approach to introducing dependency among features of a deep convolutional neural network (CNN). The core idea of our proposed method, called MUSE, is to combine MUtual information and SElf-information to jointly improve the expressivity of all features extracted from different layers of a CNN. We present two variants of MUSE: Additive Information and Multiplicative Information. Importantly, we argue and empirically demonstrate that, compared to other feature discrepancy functions, MUSE is a more functional proxy for introducing dependency and effectively improving the expressivity of all features in the knowledge distillation framework. MUSE achieves superior performance over a variety of popular architectures and feature discrepancy functions for self-distillation and online distillation, and performs competitively with state-of-the-art methods for offline distillation. MUSE is also demonstrably versatile, which enables it to be easily extended to CNN-based models on tasks other than image classification, such as object detection.
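To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of a MUSE-style loss. The abstract does not specify the paper's mutual-information and self-information estimators, so this sketch substitutes common stand-ins: an InfoNCE lower bound for the mutual-information term and the entropy of a per-sample channel distribution for the self-information term. The function names (info_nce_lower_bound, self_information, muse_loss) and hyperparameters (temperature, alpha, beta) are illustrative assumptions, not the authors' API.

```python
# Hypothetical sketch of MUSE-style feature self-distillation in PyTorch.
# InfoNCE and softmax entropy are stand-in proxies for the paper's
# mutual-information and self-information terms, chosen for illustration.
import math
import torch
import torch.nn.functional as F

def info_nce_lower_bound(f_shallow, f_deep, temperature=0.1):
    # InfoNCE bound over a batch: I(s; d) >= log B - CE(logits, identity labels).
    # f_shallow, f_deep: (B, D) projected feature vectors.
    z_s = F.normalize(f_shallow, dim=1)
    z_d = F.normalize(f_deep, dim=1)
    logits = z_s @ z_d.t() / temperature                  # (B, B) pairwise similarities
    labels = torch.arange(z_s.size(0), device=z_s.device)
    return math.log(z_s.size(0)) - F.cross_entropy(logits, labels)

def self_information(f, eps=1e-8):
    # Mean entropy of each sample's channel distribution, used here as a
    # self-information proxy for feature expressivity.
    p = F.softmax(f, dim=1)
    return -(p * (p + eps).log()).sum(dim=1).mean()

def muse_loss(f_shallow, f_deep, variant="additive", alpha=1.0, beta=1.0):
    # Combine the two information terms either additively or multiplicatively,
    # mirroring the abstract's Additive/Multiplicative Information variants.
    # Both terms are negated so that minimizing the loss maximizes them.
    mi = info_nce_lower_bound(f_shallow, f_deep)
    si = self_information(f_shallow)
    if variant == "additive":
        return -(alpha * mi + beta * si)
    return -(mi * si)
```

In a self-distillation setup, f_shallow would typically be a pooled and projected intermediate-layer feature, f_deep the detached deepest-layer feature acting as the in-network teacher, and muse_loss would be added to the ordinary task loss.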


