Collision Cross-entropy and EM Algorithm for Self-labeled Classification

03/13/2023
by Zhongwen Zhang, et al.

We propose "collision cross-entropy" as a robust alternative to the Shannon's cross-entropy in the context of self-labeled classification with posterior models. Assuming unlabeled data, self-labeling works by estimating latent pseudo-labels, categorical distributions y, that optimize some discriminative clustering criteria, e.g. "decisiveness" and "fairness". All existing self-labeled losses incorporate Shannon's cross-entropy term targeting the model prediction, softmax, at the estimated distribution y. In fact, softmax is trained to mimic the uncertainty in y exactly. Instead, we propose the negative log-likelihood of "collision" to maximize the probability of equality between two random variables represented by distributions softmax and y. We show that our loss satisfies some properties of a generalized cross-entropy. Interestingly, it agrees with the Shannon's cross-entropy for one-hot pseudo-labels y, but the training from softer labels weakens. For example, if y is a uniform distribution at some data point, it has zero contribution to the training. Our self-labeling loss combining collision cross entropy with basic clustering criteria is convex w.r.t. pseudo-labels, but non-trivial to optimize over the probability simplex. We derive a practical EM algorithm optimizing pseudo-labels y significantly faster than generic methods, e.g. the projectile gradient descent. The collision cross-entropy consistently improves the results on multiple self-labeled clustering examples using different DNNs.


Related research

- Semi-supervised Learning using Robust Loss (03/03/2022): The amount of manually labeled data is limited in medical applications, ...
- Revisiting Discriminative Entropy Clustering and its relation to K-means (01/26/2023): Maximization of mutual information between the model's input and output ...
- An Approach for Noisy, Crowdsourced Datasets Utilizing Ensemble Modeling, 'Human Softmax' Distributions, and Entropic Measures of Uncertainty (10/28/2022): Noisy, crowdsourced image datasets prove challenging, even for the best ...
- A Quantitative Comparison between Shannon and Tsallis Havrda Charvat Entropies Applied to Cancer Outcome Prediction (03/22/2022): In this paper, we propose to quantitatively compare loss functions based...
- Detection of elliptical shapes via cross-entropy clustering (11/24/2012): The problem of finding elliptical shapes in an image will be considered....
- Temperature check: theory and practice for training models with softmax-cross-entropy losses (10/14/2020): The softmax function combined with a cross-entropy loss is a principled ...
- Preserving Fine-Grain Feature Information in Classification via Entropic Regularization (08/07/2022): Labeling a classification dataset implies to define classes and associat...
