Pareto-optimal clustering with the primal deterministic information bottleneck

04/05/2022
by Andrew K. Tan, et al.

At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the Deterministic Information Bottleneck (DIB) formulation of lossy compression, which can be interpreted as a clustering problem. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied dual counterpart. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to most other two-objective clustering problems. We study general properties of the Pareto frontier, and give both analytic and numerical evidence for its logarithmic sparsity in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and we additionally propose a modification to the algorithm for use where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group-theory-inspired dataset. The results reveal interesting features of the frontier and demonstrate how its structure can be used for model selection, with a focus on points previously hidden by the cloak of the convex hull.
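For context, the trade-off in question can be written explicitly. In the standard (dual, or Lagrangian) DIB of Strouse and Schwab, a hard clustering f: X → Z is scored by an entropy-based compression cost against the information it retains about a relevance variable Y; the primal problem instead constrains one objective and optimizes the other, so its optima trace out the full Pareto frontier in the (H(Z), I(Y;Z)) plane. A standard way to write both forms is sketched below (the entropy-budget symbol H_0 is our notation, not taken from this page):

```latex
% Dual (Lagrangian) DIB: a single trade-off multiplier \beta
\[
  \min_{f\colon X \to Z} \; H(Z) \;-\; \beta\, I(Y;Z)
\]

% Primal DIB: constrained form; sweeping the budget H_0 traces the
% Pareto frontier in the (H(Z), I(Y;Z)) plane
\[
  \max_{f\colon X \to Z} \; I(Y;Z)
  \quad \text{subject to} \quad H(Z) \le H_0
\]
```

The frontier itself is simply the set of non-dominated (H(Z), I(Y;Z)) pairs over all candidate clusterings. As a minimal illustration of that bookkeeping (a generic two-objective non-domination filter, not the paper's search algorithm), a sketch in Python:

```python
from typing import List, Tuple

def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated points of a two-objective trade-off.

    Each point is (cost, value): here cost = H(Z), to be minimized,
    and value = I(Y;Z), to be maximized. A point is Pareto-optimal if
    no other point matches or improves it in both objectives, with at
    least one strict improvement.
    """
    # Sort by ascending cost, breaking ties by descending value.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    frontier = []
    best_value = float("-inf")
    for cost, value in ordered:
        # Keep a point only if it strictly beats the best value seen
        # at equal or lower cost; otherwise it is dominated.
        if value > best_value:
            frontier.append((cost, value))
            best_value = value
    return frontier

# Hypothetical cluster candidates evaluated at (entropy, relevant information):
candidates = [(2.0, 1.5), (1.0, 1.0), (2.5, 1.4), (0.5, 0.3), (1.0, 1.2)]
print(pareto_frontier(candidates))  # [(0.5, 0.3), (1.0, 1.2), (2.0, 1.5)]
```

This also makes the abstract's convex-hull remark concrete: sweeping β in the Lagrangian form can only recover points on the convex hull of this set, whereas the primal frontier can additionally contain points lying in its concavities.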


