
On Mutual Information Maximization for Representation Learning

by Michael Tschannen, et al.

Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to its invariance under arbitrary invertible transformations. Nevertheless, these methods have been repeatedly shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods might be only loosely attributed to the properties of MI, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning and argue that this interpretation may be a plausible explanation for the success of the recently introduced methods.
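The invariance mentioned in the abstract can be illustrated in a small discrete setting. The sketch below is not taken from the paper: it assumes discrete samples, uses a simple plug-in MI estimate rather than the neural estimators the paper studies, and treats a bijective relabelling (the hypothetical `relabel` mapping) as the discrete analogue of an invertible transformation. Under those assumptions, the estimated MI should be unchanged by the relabelling, even though the relabelled values carry no meaningful structure.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


# Toy paired samples (illustrative data, not from the paper).
xs = [0, 0, 1, 1, 2, 2, 2, 0]
ys = [0, 0, 1, 1, 1, 0, 1, 0]

# Hypothetical bijection on the values of X: an "invertible transformation".
relabel = {0: "c", 1: "a", 2: "b"}
xs_relabelled = [relabel[x] for x in xs]

mi_original = mutual_information(xs, ys)
mi_relabelled = mutual_information(xs_relabelled, ys)

print(f"MI before relabelling: {mi_original:.6f}")
print(f"MI after relabelling:  {mi_relabelled:.6f}")
```

Because MI depends only on the joint distribution of the pairs, any one-to-one recoding of a variable leaves it unchanged. This is why, as the abstract argues, a high MI value alone does not indicate that a representation is useful or disentangled.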




Maximizing Mutual Information Across Feature and Topology Views for Learning Graph Representations

Recently, maximizing mutual information has emerged as a powerful method...

A Mutual Information Maximization Perspective of Language Representation Learning

We show state-of-the-art word representation learning methods maximize a...

Mutual Information Gradient Estimation for Representation Learning

Mutual Information (MI) plays an important role in representation learni...

DiME: Maximizing Mutual Information by a Difference of Matrix-Based Entropies

We introduce an information-theoretic quantity with similar properties t...

Learning Representations by Maximizing Mutual Information Across Views

We propose an approach to self-supervised representation learning based ...

Which Mutual-Information Representation Learning Objectives are Sufficient for Control?

Mutual information maximization provides an appealing formalism for lear...

Hierarchical and Unsupervised Graph Representation Learning with Loukas's Coarsening

We propose a novel algorithm for unsupervised graph representation learn...