
Pareto-optimal data compression for binary classification tasks

by Max Tegmark, et al.

The goal of lossy data compression is to reduce the storage cost of a data set X while retaining as much information as possible about something (Y) that you care about. For example, what aspects of an image X contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping X→ Z≡ f(X) that maximizes the mutual information I(Z,Y) while the entropy H(Z) is kept below some fixed threshold. We present a method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable X (an image, say) drawn from a class Y∈{1,...,n} can be distilled into a vector W=f(X)∈R^(n-1) losslessly, so that I(W,Y)=I(X,Y); for example, for a binary classification task of cats and dogs, each image X is mapped into a single real number W retaining all information that helps distinguish cats from dogs. For the n=2 case of binary classification, we then show how W can be further compressed into a discrete variable Z=g_β(W)∈{1,...,m_β} by binning W into m_β bins, in such a way that varying the parameter β sweeps out the full Pareto frontier, solving a generalization of the Discrete Information Bottleneck (DIB) problem. We argue that the most interesting points on this frontier are "corners" maximizing I(Z,Y) for a fixed number of bins m=2,3,..., which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm.
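The core operation described above, compressing a scalar W into a discrete Z via binning and measuring how much class information survives, can be sketched in a few lines. The snippet below is an illustration only, not the authors' implementation: the synthetic Gaussian W, the equal-probability bin edges, and the helper `mutual_information` are all assumptions for the example.

```python
import numpy as np

def mutual_information(z, y):
    """Estimate I(Z;Y) in bits from two discrete label arrays."""
    z_vals, z_idx = np.unique(z, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(z_vals), len(y_vals)))
    for zi, yi in zip(z_idx, y_idx):
        joint[zi, yi] += 1
    joint /= joint.sum()                      # empirical joint p(z, y)
    pz = joint.sum(axis=1, keepdims=True)     # marginal p(z)
    py = joint.sum(axis=0, keepdims=True)     # marginal p(y)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (pz * py))
    return np.nansum(terms)                   # zero-probability cells contribute 0

# Synthetic binary task: a scalar W whose distribution depends on the class Y.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)
w = rng.normal(loc=y.astype(float), scale=1.0)

# Compress W into Z with m bins (here: equal-probability bins as one simple choice).
m = 3
edges = np.quantile(w, np.linspace(0, 1, m + 1)[1:-1])
z = np.digitize(w, edges)

print(mutual_information(z, y))  # retained class information, at most I(W,Y)
```

Sweeping m (and, in the paper's formulation, the bin boundaries via the parameter β) and recording the pairs (H(Z), I(Z,Y)) traces out candidate points whose upper envelope is the Pareto frontier; the "corner" for each m is the binning that maximizes I(Z,Y) at that bin count.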




