Wasserstein Contrastive Representation Distillation

12/15/2020
by Liqun Chen, et al.

The primary goal of knowledge distillation (KD) is to encapsulate the knowledge learned by a teacher network into a more compact student network. Existing approaches, e.g., those using the Kullback-Leibler divergence for distillation, may fail to capture important structural knowledge in the teacher network and often yield features that generalize poorly, particularly when the teacher and student are built to address different classification tasks. We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both the primal and dual forms of the Wasserstein distance for KD. The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes a lower bound on the mutual information between the teacher and student networks. The primal form is used for local contrastive knowledge transfer within a mini-batch, effectively matching the feature distributions of the teacher and student networks. Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression, and cross-modal transfer.
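The abstract describes two complementary objectives: a dual-form (contrastive) term that bounds the mutual information between teacher and student features, and a primal-form term that matches the two feature distributions inside a mini-batch via optimal transport. The following is a minimal sketch of these two ingredients, not the authors' implementation: it assumes PyTorch, hypothetical feature tensors `f_s` and `f_t`, an illustrative bilinear critic, and a plain Sinkhorn solver standing in for whatever critic architecture, OT solver, and loss weighting the paper actually uses.

import torch
import torch.nn.functional as F


class BilinearCritic(torch.nn.Module):
    """Bilinear critic g(s, t) = s W t^T; one simple choice among many."""
    def __init__(self, dim_s, dim_t):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(dim_s, dim_t) * 0.01)

    def forward(self, f_s, f_t):
        # (batch, batch) score matrix; the diagonal holds matched pairs.
        return f_s @ self.W @ f_t.t()


def global_contrastive_loss(f_s, f_t, critic):
    """Dual-form / contrastive term: classify each student feature's matched
    teacher feature against mismatches from the same batch (an NCE-style
    lower bound on the mutual information between the two representations)."""
    scores = critic(f_s, f_t)
    labels = torch.arange(f_s.size(0), device=f_s.device)
    return F.cross_entropy(scores, labels)


def local_sinkhorn_loss(f_s, f_t, eps=0.1, n_iter=50):
    """Primal-form term: entropy-regularized optimal transport (Sinkhorn)
    between the student and teacher feature sets within one mini-batch."""
    # L2-normalize so the ground cost is bounded and the Gibbs kernel is stable.
    cost = torch.cdist(F.normalize(f_s, dim=1), F.normalize(f_t, dim=1), p=2)
    n = cost.size(0)
    mu = torch.full((n,), 1.0 / n, device=cost.device)  # uniform marginals
    K = torch.exp(-cost / eps)                          # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iter):                             # Sinkhorn fixed-point updates
        v = mu / (K.t() @ u)
        u = mu / (K @ v)
    plan = torch.diag(u) @ K @ torch.diag(v)            # approximate transport plan
    return (plan * cost).sum()


# Toy usage: random features stand in for teacher/student activations.
# In practice the student features would first pass through a small embedding
# head so that both sides share the same dimensionality.
f_t = torch.randn(32, 128)                              # teacher features (frozen)
f_s = torch.randn(32, 128, requires_grad=True)          # student features
critic = BilinearCritic(128, 128)
loss = global_contrastive_loss(f_s, f_t, critic) + local_sinkhorn_loss(f_s, f_t)
loss.backward()

In this sketch the contrastive term pulls each student feature toward its own teacher feature and away from the rest of the batch, while the Sinkhorn term aligns the two feature clouds as distributions; the actual method combines the two objectives with weights and network details given in the paper.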

Related research

10/23/2019  Contrastive Representation Distillation
Often we wish to transfer representational knowledge from one neural net...

10/29/2021  Estimating and Maximizing Mutual Information for Knowledge Distillation
In this work, we propose Mutual Information Maximization Knowledge Disti...

06/13/2023  Enhanced Multimodal Representation Learning with Cross-modal KD
This paper explores the tasks of leveraging auxiliary modalities which a...

03/14/2023  A Contrastive Knowledge Transfer Framework for Model Compression and Transfer Learning
Knowledge Transfer (KT) achieves competitive performance and is widely u...

01/06/2022  Contrastive Neighborhood Alignment
We present Contrastive Neighborhood Alignment (CNA), a manifold learning...

03/29/2021  Complementary Relation Contrastive Distillation
Knowledge distillation aims to transfer representation ability from a te...

03/28/2018  Probabilistic Knowledge Transfer for Deep Representation Learning
Knowledge Transfer (KT) techniques tackle the problem of transferring th...
