Entropy Penalty: Towards Generalization Beyond the IID Assumption

10/01/2019
by Devansh Arpit, et al.

It has been shown that, instead of learning actual object features, deep networks tend to exploit non-robust (spurious) discriminative features that are shared between training and test sets. Therefore, while they achieve state-of-the-art performance on such test sets, they generalize poorly on out-of-distribution (OOD) samples, where the IID (independent and identically distributed) assumption breaks and the distribution of non-robust features shifts. Through theoretical and empirical analysis, we show that this happens because maximum likelihood training (without appropriate regularization) leads the model to depend on all the correlations (including spurious ones) present between inputs and targets in the dataset. We then show evidence that the information bottleneck (IB) principle can address this problem. To this end, we propose a regularization approach based on IB, called Entropy Penalty, that reduces the model's dependence on spurious features, that is, features corresponding to such spurious correlations. This allows deep networks trained with Entropy Penalty to generalize well even under distribution shift of spurious features. As a controlled test-bed for evaluating our claim, we train deep networks with Entropy Penalty on a colored MNIST (C-MNIST) dataset and show that they generalize well on the vanilla MNIST, MNIST-M and SVHN datasets, in addition to an OOD version of C-MNIST itself. The baseline regularization methods we compare against fail to generalize on this test-bed. Our code is available at https://github.com/salesforce/EntropyPenalty.
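The abstract does not spell out how the colored test-bed is built, so the following is only a minimal sketch of the general idea of a spurious feature whose distribution shifts between training and OOD data. It assumes, purely for illustration, that color is tied to the digit label in the training split and drawn independently of the label in the OOD split. The `colorize` helper, the random `palette`, and the random stand-in images are all hypothetical and are not taken from the authors' repository.

```python
import numpy as np

def colorize(images, labels, palette, rng, spurious=True):
    """Turn grayscale images (N, H, W) in [0, 1] into RGB (N, H, W, 3).

    spurious=True : color index equals the label (assumed train-like split).
    spurious=False: color index is random, independent of the label
                    (assumed OOD-like split).
    Returns the colored images and the color index used for each sample.
    """
    n = images.shape[0]
    if spurious:
        color_idx = labels
    else:
        color_idx = rng.integers(0, len(palette), size=n)
    colors = palette[color_idx]  # shape (N, 3)
    # Broadcast (N, H, W, 1) against (N, 1, 1, 3) to tint every pixel.
    colored = images[..., None] * colors[:, None, None, :]
    return colored, color_idx

rng = np.random.default_rng(0)
num_classes = 10

# Hypothetical per-class colors; the real dataset may use a different scheme.
palette = rng.uniform(0.2, 1.0, size=(num_classes, 3))

# Random arrays standing in for MNIST digits, so the sketch needs no download.
images = rng.uniform(0.0, 1.0, size=(256, 28, 28))
labels = rng.integers(0, num_classes, size=256)

train_x, train_c = colorize(images, labels, palette, rng, spurious=True)
ood_x, ood_c = colorize(images, labels, palette, rng, spurious=False)

# Fraction of samples whose color still "agrees" with the label.
train_agreement = float(np.mean(train_c == labels))
ood_agreement = float(np.mean(ood_c == labels))
print(train_agreement, ood_agreement)
```

Under these assumptions, a classifier that reads the label off the color alone would fit the training split perfectly, yet its accuracy on the OOD split would fall to roughly the color-label agreement rate there. That gap is the failure mode the abstract attributes to unregularized maximum likelihood training.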

research
01/31/2021

Deep Deterministic Information Bottleneck with Matrix-based Entropy Functional

We introduce the matrix-based Renyi's α-order entropy functional to para...
research
09/26/2019

Stochastic Weight Matrix-based Regularization Methods for Deep Neural Networks

The aim of this paper is to introduce two widely applicable regularizati...
research
02/13/2020

The Conditional Entropy Bottleneck

Much of the field of Machine Learning exhibits a prominent set of failur...
research
07/25/2022

Domain Decorrelation with Potential Energy Ranking

Machine learning systems, especially the methods based on deep learning,...
research
11/28/2022

Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations

Spurious correlations, or correlations that change across domains where ...
research
10/16/2017

Generalization in Deep Learning

This paper explains why deep learning can generalize well, despite large...
research
09/30/2022

MaskTune: Mitigating Spurious Correlations by Forcing to Explore

A fundamental challenge of over-parameterized deep learning models is le...
