Investigating Power laws in Deep Representation Learning

02/11/2022
by Arna Ghosh, et al.

Representation learning that leverages large-scale labelled datasets is central to recent progress in machine learning. Because access to task-relevant labels at scale is often scarce or expensive, there is strong motivation to learn from unlabelled datasets with self-supervised learning (SSL). Such large unlabelled datasets (with data augmentations) often provide good coverage of the underlying input distribution. However, evaluating the representations learned by SSL algorithms still requires task-specific labelled samples in the training pipeline, and the generalization of task-specific encodings is often sensitive to distribution shift. Inspired by recent advances in theoretical machine learning and vision neuroscience, we observe that the eigenspectrum of the empirical feature covariance matrix often follows a power law. For visual representations, we estimate the coefficient of the power law, α, across three key attributes that influence representation learning: learning objective (supervised, SimCLR, Barlow Twins and BYOL), network architecture (VGG, ResNet and Vision Transformer), and task (object and scene recognition). We observe that, under mild conditions, proximity of α to 1 is strongly correlated with downstream generalization performance. Furthermore, α ≈ 1 is a strong indicator of robustness to label noise during fine-tuning. Notably, α is computable from the representations without knowledge of any labels, thereby offering a framework to evaluate the quality of representations learned on unlabelled datasets.
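
As a rough illustration of the diagnostic described in the abstract, the sketch below (not the authors' code) estimates α from an (N, D) matrix of representations using NumPy: it forms the empirical feature covariance matrix, computes its eigenspectrum, and fits a power-law exponent via a log-log linear fit. The function name, the fitting window fit_range, and the choice of a simple least-squares fit are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def estimate_alpha(features, fit_range=(10, 100)):
    """Estimate the power-law exponent alpha of the eigenspectrum of the
    empirical feature covariance matrix.

    features : (num_samples, num_dims) array of representations.
    fit_range : (lo, hi) eigenvalue ranks used for the log-log fit
                (hypothetical choice; the paper's fitting procedure may differ).
    """
    # Center the features and form the empirical covariance matrix.
    X = features - features.mean(axis=0, keepdims=True)
    cov = X.T @ X / (X.shape[0] - 1)

    # Eigenvalues of the symmetric covariance, sorted in descending order.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    eigvals = eigvals[eigvals > 0]

    # A power law lambda_i ~ i^(-alpha) is a straight line in log-log space,
    # so alpha is minus the slope of log(lambda) against log(rank).
    lo, hi = fit_range
    ranks = np.arange(1, len(eigvals) + 1)
    slope, _ = np.polyfit(np.log(ranks[lo:hi]), np.log(eigvals[lo:hi]), deg=1)
    return -slope

# Hypothetical usage: feats is an (N, D) array of encoder outputs on
# unlabelled data; an estimated alpha near 1 would be the label-free
# indicator of representation quality that the abstract highlights.
# alpha = estimate_alpha(feats)
```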
