Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation

01/23/2023
by Omar Chehab, et al.

Self-supervised learning is an increasingly popular approach to unsupervised learning, achieving state-of-the-art results. A prevalent approach consists of contrasting data points and noise points within a classification task; this requires a good noise distribution, which is notoriously hard to specify. While a comprehensive theory is missing, it is widely assumed that the optimal noise distribution should in practice be made equal to the data distribution, as in Generative Adversarial Networks (GANs). We challenge this assumption both empirically and theoretically. We turn to Noise-Contrastive Estimation (NCE), which grounds this self-supervised task as the estimation of an energy-based model of the data. This ties the optimality of the noise distribution to the sample efficiency of the estimator, rigorously defined as its asymptotic variance, or mean-squared error. In the special case where only the normalization constant is unknown, we show that NCE recovers a family of Importance Sampling estimators for which the optimal noise is indeed equal to the data distribution. However, in the general case where the energy is also unknown, we prove that the optimal noise density is the data density multiplied by a correction term based on the Fisher score. In particular, the optimal noise distribution is different from the data distribution, and even belongs to a different family. Nevertheless, we soberly conclude that the optimal noise may be hard to sample from, and that the gain in efficiency can be modest compared to simply choosing the noise distribution equal to the data's.
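To make the setup concrete, below is a minimal toy sketch (illustrative only, not the authors' code) on a 1-D Gaussian energy-based model: NCE estimates the unnormalized model by logistic classification of data against noise, and in the special case where only the normalization constant is unknown, importance sampling over the same noise estimates Z directly. All names here (log_model, nce_loss, the Gaussian noise parameters) are assumptions made for illustration.

```python
# Minimal toy sketch (illustrative, not the paper's code): NCE and the
# importance-sampling special case, for a 1-D Gaussian energy-based model.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=1.0, size=5_000)     # samples from the data distribution

def log_model(x, theta):
    """Unnormalized Gaussian log-density; theta = (mean, c), where c plays the role of -log Z."""
    mean, c = theta
    return -0.5 * (x - mean) ** 2 + c

# Noise distribution q (an arbitrary choice here): the abstract's question is
# precisely how this choice affects the asymptotic variance of the estimator.
noise = rng.normal(loc=0.0, scale=2.0, size=5_000)
log_q = lambda x: norm.logpdf(x, loc=0.0, scale=2.0)

def nce_loss(theta):
    """Logistic loss for classifying data vs. noise (equal sample sizes)."""
    g_data = log_model(data, theta) - log_q(data)     # posterior log-odds on data points
    g_noise = log_model(noise, theta) - log_q(noise)  # posterior log-odds on noise points
    return np.mean(np.logaddexp(0.0, -g_data)) + np.mean(np.logaddexp(0.0, g_noise))

theta_hat = minimize(nce_loss, x0=np.zeros(2)).x      # jointly estimates the mean and -log Z
print("NCE (mean, -log Z):", theta_hat, "truth:", (1.0, -0.5 * np.log(2 * np.pi)))

# Special case: energy known, only Z unknown. NCE then recovers a family of
# importance-sampling estimators; the classic one is Z = E_q[p_tilde(x) / q(x)].
p_tilde = lambda x: np.exp(-0.5 * (x - 1.0) ** 2)     # unnormalized density, known mean
z_hat = np.mean(p_tilde(noise) / np.exp(log_q(noise)))
print("IS estimate of Z:", z_hat, "truth:", np.sqrt(2 * np.pi))
```

Re-running this sketch with different noise locations and scales gives a rough empirical feel for how the choice of noise distribution affects the variance of both estimators, which is the question the paper analyzes rigorously.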


