Masked prediction tasks: a parameter identifiability view

02/18/2022
by   Bingbin Liu, et al.

The vast majority of work in self-supervised learning, both theoretical and empirical (though mostly the latter), has focused on recovering good features for downstream tasks, with the definition of "good" often being intricately tied to the downstream task itself. This lens is undoubtedly interesting, but suffers from the problem that there is no "canonical" set of downstream tasks to focus on – in practice, this problem is usually resolved by competing on the benchmark dataset du jour. In this paper, we present an alternative lens: one of parameter identifiability. More precisely, we consider data coming from a parametric probabilistic model and train a self-supervised learning predictor with a suitably chosen parametric form. We then ask whether the ground-truth parameters of the probabilistic model can be read off from the optimal predictor. We focus on the widely used self-supervised learning method of predicting masked tokens, which is popular for both natural language and visual data. While incarnations of this approach have already been successfully applied to simpler probabilistic models (e.g., learning fully-observed undirected graphical models), we focus instead on latent-variable models capturing sequential structure – namely, Hidden Markov Models with both discrete and conditionally Gaussian observations. We show that there is a rich landscape of possibilities, in which some prediction tasks yield identifiability while others do not. Our results, born of a theoretical grounding of self-supervised learning, could thus usefully inform practice. Moreover, we uncover close connections with the uniqueness of tensor rank decompositions – a widely used tool for studying identifiability through the lens of the method of moments.
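As a rough illustration of the setup described above (this is not code from the paper; the HMM sizes and the helper masked_token_posterior are hypothetical), the sketch below samples a sequence from a discrete HMM and computes the Bayes-optimal masked-token predictor P(x_t | x_{-t}) via forward-backward messages. The identifiability question studied in the paper is whether the HMM parameters can be recovered from this optimal predictor; the sketch only makes the predictor itself concrete.

```python
# Minimal illustrative sketch (assumptions, not the paper's code): for a discrete
# HMM with parameters (pi, A, B), the Bayes-optimal masked-token predictor
# P(x_t | x_{-t}) is computed exactly with forward-backward messages.
import numpy as np

def masked_token_posterior(x, t, pi, A, B):
    """P(x_t = v | all other observed tokens), for a discrete HMM.

    x  : observed token sequence (ints); position t is treated as masked
    pi : (K,) initial hidden-state distribution
    A  : (K, K) transition matrix, A[i, j] = P(z_{s+1}=j | z_s=i)
    B  : (K, V) emission matrix, B[k, v] = P(x_s=v | z_s=k)
    """
    T, K = len(x), len(pi)

    # Forward messages up to (but excluding) position t: alpha[k] ∝ P(x_{<t}, z_t=k)
    alpha = pi.copy()
    for s in range(t):
        alpha = (alpha * B[:, x[s]]) @ A
    # Backward messages from the end down to t+1: beta[k] ∝ P(x_{>t} | z_t=k)
    beta = np.ones(K)
    for s in range(T - 1, t, -1):
        beta = A @ (B[:, x[s]] * beta)

    # Posterior over the hidden state at the masked position, then emit.
    gamma = alpha * beta
    gamma /= gamma.sum()
    return gamma @ B  # (V,) distribution over the masked token


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, V, T = 3, 5, 8  # hypothetical sizes: hidden states, vocabulary, length
    pi = rng.dirichlet(np.ones(K))
    A = rng.dirichlet(np.ones(K), size=K)
    B = rng.dirichlet(np.ones(V), size=K)

    # Sample one sequence from the HMM and query the masked-token posterior at t=4.
    z, x = np.zeros(T, dtype=int), np.zeros(T, dtype=int)
    z[0] = rng.choice(K, p=pi)
    x[0] = rng.choice(V, p=B[z[0]])
    for s in range(1, T):
        z[s] = rng.choice(K, p=A[z[s - 1]])
        x[s] = rng.choice(V, p=B[z[s]])
    print(masked_token_posterior(x, t=4, pi=pi, A=A, B=B))
```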
