Learning Online Visual Invariances for Novel Objects via Supervised and Self-Supervised Training

10/04/2021
by   Valerio Biscione, et al.

Humans can identify objects across a variety of spatial transformations, such as changes in scale and viewpoint. This ability extends to novel objects after a single presentation at a single pose, a capacity sometimes referred to as online invariance. CNNs have been proposed as a compelling model of human vision, but their ability to identify objects across transformations is typically tested on held-out samples of trained categories after extensive data augmentation. This paper assesses whether standard CNNs can support human-like online invariance by training models to recognize images of synthetic 3D objects that undergo several transformations: rotation, scaling, translation, brightness, contrast, and viewpoint. Through an analysis of the models' internal representations, we show that standard supervised CNNs trained on transformed objects can acquire strong invariances for novel classes, even when trained on as few as 50 objects drawn from 10 classes. These results extend to a different dataset of photographs of real objects. We also show that these invariances can be acquired in a self-supervised way, by solving a same/different task. We suggest that this latter approach may be similar to how humans acquire invariances.
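The training setup described in the abstract can be sketched in outline: object images are perturbed with random transformations, and, for the self-supervised variant, paired into same/different examples. The following is a minimal illustrative sketch in NumPy; the function names, the pairing scheme, and the restriction to 2D translation, brightness, and contrast are assumptions for illustration (the paper's actual pipeline also includes rotation, scaling, and 3D viewpoint changes, which require a renderer or interpolation).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_transform(img, rng):
    """Apply a random translation, brightness shift, and contrast change
    to a 2D grayscale image with values in [0, 1].
    Illustrative subset of the transformations named in the abstract."""
    h, w = img.shape
    # Translation: shift the image by a random offset (wrap-around for simplicity).
    dy, dx = rng.integers(-h // 4, h // 4 + 1, size=2)
    out = np.roll(img, (dy, dx), axis=(0, 1))
    # Brightness: additive shift.
    out = out + rng.uniform(-0.2, 0.2)
    # Contrast: rescale around the image mean.
    out = (out - out.mean()) * rng.uniform(0.5, 1.5) + out.mean()
    return np.clip(out, 0.0, 1.0)

def make_pair(images, rng):
    """Build one example for the same/different task (hypothetical scheme):
    two independently transformed views, labelled 1 if both views come
    from the same object and 0 otherwise."""
    i = int(rng.integers(len(images)))
    if rng.random() < 0.5:
        j, label = i, 1                                   # same object, different views
    else:
        j = (i + 1 + int(rng.integers(len(images) - 1))) % len(images)
        label = 0                                         # two different objects
    return random_transform(images[i], rng), random_transform(images[j], rng), label
```

A network trained to predict `label` from the two views never sees class labels, which is what makes the same/different task self-supervised in the paper's sense.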


