A Mutual Information Maximization Perspective of Language Representation Learning

10/18/2019
by   Lingpeng Kong, et al.

We show that state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing our theoretical understanding of these methods, our derivation leads to a principled framework that can be used to construct new self-supervised tasks. We provide an example by drawing inspiration from related methods based on mutual information maximization that have been successful in computer vision, and introduce a simple self-supervised objective that maximizes the mutual information between a global sentence representation and n-grams in the sentence. Our analysis offers a holistic view of representation learning methods, making it easier to transfer knowledge and translate progress across multiple domains (e.g., natural language processing, computer vision, audio processing).
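The objective described above is typically estimated with an InfoNCE-style contrastive bound: score a global sentence representation against its own n-gram representations (positives) and against n-grams from other sentences in the batch (negatives). The sketch below is illustrative only, assuming simple dot-product scoring and in-batch negatives; the function name, shapes, and hyperparameters are not from the paper.

```python
import numpy as np

def info_nce_lower_bound(global_reps, ngram_reps):
    """InfoNCE-style lower bound (up to an additive log-batch-size term)
    on the mutual information between paired representations.

    global_reps: (B, d) array, one global sentence vector per example.
    ngram_reps:  (B, d) array, the matching n-gram vector per example.
    Row i of each array forms a positive pair; all other rows in the
    batch serve as negatives.
    """
    # Pairwise dot-product scores: entry (i, j) scores sentence i
    # against n-gram j. The diagonal holds the positive pairs.
    scores = global_reps @ ngram_reps.T  # (B, B)

    # Numerically stable log-softmax over the candidate n-grams.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))

    # Average log-probability of the true pair; maximizing this
    # tightens the lower bound on the mutual information.
    return float(np.mean(np.diag(log_probs)))
```

Because the bound is the mean diagonal of a log-softmax, it is always non-positive, and it increases as the global representation becomes more predictive of its own n-grams relative to the in-batch negatives.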


