Probing the State of the Art: A Critical Look at Visual Representation Evaluation

11/30/2019
by Cinjon Resnick, et al.

Self-supervised representation learning has improved greatly over the past half decade, with much of the progress driven by objectives that are hard to compare quantitatively. These techniques include colorization, cyclical consistency, and noise-contrastive estimation over image patches. Consequently, the field has settled on a handful of benchmarks that rely on linear probes to adjudicate which approaches are best. Our first contribution is to show that this test is insufficient: models that perform poorly (strongly) on linear classification can perform strongly (weakly) on more involved tasks such as temporal activity localization. Our second contribution is an analysis of the capabilities of five different representations. Our third contribution is a much-needed new dataset for temporal activity localization.
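The linear-probe protocol the abstract critiques can be sketched as follows: features from a pretrained encoder are frozen, and only a single linear classifier is trained on top of them. The sketch below is a minimal illustration, not the paper's own code; the simulated Gaussian features stand in for what would, in practice, be embeddings from a self-supervised encoder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated stand-in for frozen encoder outputs: in a real evaluation these
# would be embeddings of labeled images from a pretrained network.
rng = np.random.default_rng(0)
n_samples, feat_dim, n_classes = 600, 32, 3
labels = rng.integers(0, n_classes, size=n_samples)
# Class-dependent mean shifts make the features partially linearly separable.
features = rng.normal(size=(n_samples, feat_dim)) + labels[:, None] * 0.5

# Linear probe: train only a linear classifier; the features stay frozen.
split = n_samples // 2
probe = LogisticRegression(max_iter=1000)
probe.fit(features[:split], labels[:split])
probe_accuracy = probe.score(features[split:], labels[split:])
print(f"linear probe accuracy: {probe_accuracy:.2f}")
```

The paper's point is that this single scalar can rank representations differently than downstream tasks like temporal activity localization, so a high probe accuracy does not guarantee a broadly useful representation.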


