On the surprising similarities between supervised and self-supervised models

10/16/2020
by Robert Geirhos et al.

How do humans learn to acquire a powerful, flexible and robust representation of objects? While much of this process remains unknown, it is clear that humans do not require millions of object labels. Excitingly, recent algorithmic advancements in self-supervised learning now enable convolutional neural networks (CNNs) to learn useful visual object representations without supervised labels, too. In light of this recent breakthrough, here we compare self-supervised networks to supervised models and to human behaviour. We tested models on 15 generalisation datasets for which large-scale human behavioural data is available (130K highly controlled psychophysical trials). Surprisingly, current self-supervised CNNs share four key characteristics of their supervised counterparts: (1) relatively poor noise robustness (with the notable exception of SimCLR), (2) non-human category-level error patterns, (3) non-human image-level error patterns (yet high similarity to supervised model errors) and (4) a bias towards texture. Taken together, these results suggest that the strategies learned through today's supervised and self-supervised training objectives end up being surprisingly similar, but distant from human-like behaviour. That being said, we are clearly just at the beginning of what could be called a self-supervised revolution of machine vision, and we are hopeful that future self-supervised models will behave differently from supervised ones and, perhaps, more like robust human object recognition.
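To make point (3) concrete: comparisons of image-level error patterns are typically quantified with error consistency, i.e. Cohen's kappa computed on two systems' trial-by-trial correctness, which measures whether they fail on the same individual images beyond what their accuracies alone would predict. The sketch below is illustrative only; the function name and the placeholder data are ours, not the authors' released code.

```python
import numpy as np

def error_consistency(correct_a, correct_b):
    """Cohen's kappa on trial-by-trial correctness of two observers.

    correct_a, correct_b: boolean arrays with one entry per image/trial,
    True where that observer (model or human) classified the image correctly.
    Returns kappa in [-1, 1]: 0 means the two agree only as often as expected
    by chance given their accuracies, 1 means they err on exactly the same images.
    """
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)

    # observed agreement: fraction of trials where both are right or both are wrong
    observed = np.mean(correct_a == correct_b)

    # expected agreement under independence, given each observer's accuracy
    p_a, p_b = correct_a.mean(), correct_b.mean()
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)

    return (observed - expected) / (1 - expected)

# Hypothetical usage: correctness vectors of a supervised and a
# self-supervised model on the same set of distorted test images.
rng = np.random.default_rng(0)
supervised = rng.random(1280) > 0.3        # placeholder data, not real results
self_supervised = rng.random(1280) > 0.35  # placeholder data, not real results
print(error_consistency(supervised, self_supervised))
```

A kappa near 0 indicates only chance-level overlap in which images two observers get wrong; a high supervised/self-supervised similarity, as reported in the abstract, corresponds to kappa well above that chance baseline.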

Related research

10/04/2021  Learning Online Visual Invariances for Novel Objects via Supervised and Self-Supervised Training
06/14/2021  Partial success in closing the gap between human and machine vision
10/12/2021  Trivial or impossible – dichotomous data difficulty masks model differences (on ImageNet and beyond)
09/23/2021  How much "human-like" visual experience do current self-supervised learning algorithms need to achieve human-level object recognition?
10/01/2021  Do Self-Supervised and Supervised Methods Learn Similar Visual Representations?
02/03/2021  Fast Concept Mapping: The Emergence of Human Abilities in Artificial Neural Networks when Learning Embodied and Self-Supervised
