How much "human-like" visual experience do current self-supervised learning algorithms need to achieve human-level object recognition?

09/23/2021
by A. Emin Orhan, et al.

This paper addresses a fundamental question: how good are our current self-supervised visual representation learning algorithms relative to humans? More concretely, how much "human-like", natural visual experience would these algorithms need to reach human-level performance on a complex, realistic visual object recognition task such as ImageNet? Using a scaling experiment, we estimate that the answer is on the order of a million years of natural visual experience — several orders of magnitude longer than a human lifetime. However, this estimate is quite sensitive to some underlying assumptions, underscoring the need for carefully controlled human experiments. We discuss the main caveats surrounding our estimate and the implications of this rather surprising result.
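The scaling-experiment logic described above can be sketched as follows. This is a minimal illustration, not the paper's actual procedure or data: the sample sizes, accuracies, human-level target, and the choice of a log-linear fit are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical scaling data (NOT the paper's measurements): amount of
# natural visual experience used for self-supervised pretraining vs.
# downstream object-recognition accuracy.
hours = np.array([10.0, 100.0, 1_000.0, 10_000.0])  # hours of visual experience
accuracy = np.array([0.05, 0.12, 0.22, 0.33])       # downstream top-1 accuracy

# Fit a log-linear scaling law: accuracy ~ m * log10(hours) + c
m, c = np.polyfit(np.log10(hours), accuracy, 1)

# Invert the fitted law at a hypothetical human-level accuracy target.
human_level = 0.80
hours_needed = 10 ** ((human_level - c) / m)

# Convert to years, assuming roughly 16 waking hours of visual
# experience per day.
years_needed = hours_needed / (16 * 365)
print(f"Estimated experience needed: {hours_needed:.2e} hours "
      f"(~{years_needed:.0f} years)")
```

With placeholder numbers like these, the extrapolated requirement lands many orders of magnitude beyond a human lifetime, which is the kind of gap the abstract reports; the exact figure depends heavily on the assumed functional form and the human-level target, which is why the authors stress the estimate's sensitivity.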


Related research:

- Scaling may be all you need for achieving human-level object recognition capacity with human-like visual experience (08/07/2023)
- On the surprising similarities between supervised and self-supervised models (10/16/2020)
- Reason from Context with Self-supervised Learning (11/23/2022)
- The developmental trajectory of object recognition robustness: children are like small adults but unlike big deep neural networks (05/20/2022)
- A degree of image identification at sub-human scales could be possible with more advanced clusters (08/09/2023)
- Partial success in closing the gap between human and machine vision (06/14/2021)
- GIFT: Generalizable Interaction-aware Functional Tool Affordances without Labels (06/28/2021)
