Decoupled Kullback-Leibler Divergence Loss

05/23/2023
by Jiequan Cui, et al.

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and observe that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss, which consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels. From our analysis of the DKL loss, we identify two areas for improvement. Firstly, we address the limitation of DKL in scenarios like knowledge distillation by breaking its asymmetry property in training optimization. This modification ensures that the wMSE component is always effective during training, providing extra constructive cues. Secondly, we introduce global information into DKL for intra-class consistency regularization. With these two enhancements, we derive the Improved Kullback-Leibler (IKL) Divergence loss and evaluate its effectiveness by conducting experiments on the CIFAR-10/100 and ImageNet datasets, focusing on adversarial training and knowledge distillation tasks. The proposed approach achieves new state-of-the-art performance on both tasks, demonstrating its substantial practical merits. Code and models will be available soon at https://github.com/jiequancui/DKL.
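For context, the quantity the abstract decomposes is the standard temperature-scaled KL divergence loss commonly used in knowledge distillation. The sketch below is a minimal PyTorch illustration of that baseline loss only, not the authors' DKL or IKL implementation; the function name, temperature value, and tensor shapes are illustrative assumptions rather than details taken from the paper or its repository.

import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # Standard temperature-scaled KL divergence between teacher and student
    # distributions; this is the loss the paper shows is equivalent to a
    # weighted MSE term plus a cross-entropy term with soft labels (DKL).
    # Function name and default temperature are illustrative, not from the paper.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean: sum KL over classes, average over the batch
    loss = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    # conventional T^2 scaling keeps gradient magnitudes comparable across temperatures
    return loss * temperature ** 2

# Usage sketch with random logits (e.g. a CIFAR-100-sized output: 100 classes)
student_logits = torch.randn(8, 100)
teacher_logits = torch.randn(8, 100)
print(kl_distillation_loss(student_logits, teacher_logits).item())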


