Are Convolutional Neural Networks or Transformers more like human vision?

05/15/2021
by Shikhar Tuli, et al.

Modern machine learning models for computer vision exceed humans in accuracy on specific visual recognition tasks, notably on datasets like ImageNet. However, high accuracy can be achieved in many ways. The particular decision function found by a machine learning system is determined not only by the data to which the system is exposed, but also by the inductive biases of the model, which are typically harder to characterize. In this work, we follow a recent trend of in-depth behavioral analyses of neural network models that go beyond accuracy as an evaluation metric by looking at patterns of errors. Our focus is on comparing a suite of standard Convolutional Neural Networks (CNNs) and a recently proposed attention-based network, the Vision Transformer (ViT), which relaxes the translation-invariance constraint of CNNs and therefore represents a model with a weaker set of inductive biases. Attention-based networks have previously been shown to achieve higher accuracy than CNNs on vision tasks, and we demonstrate, using new metrics for examining error consistency with more granularity, that their errors are also more consistent with those of humans. These results have implications both for building more human-like vision models and for understanding visual object recognition in humans.
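
To make "error consistency" concrete, here is a minimal sketch of the standard Cohen's kappa formulation used in prior behavioral comparisons of humans and networks. It is not the finer-grained metrics introduced in this work, which the abstract does not specify. The function name `error_consistency` and the boolean per-trial input format are assumptions made for illustration.

```python
import math
from typing import Sequence


def error_consistency(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Cohen's kappa over per-trial correctness of two observers.

    Each input holds one boolean per trial (True = classified correctly),
    with both observers evaluated on the same trials in the same order.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    n = len(correct_a)
    p_a = sum(correct_a) / n  # accuracy of observer A
    p_b = sum(correct_b) / n  # accuracy of observer B

    # Observed agreement: both correct or both wrong on the same trial.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n

    # Agreement expected by chance from two independent observers
    # with these accuracies.
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)

    if c_exp == 1.0:
        # Kappa is undefined when both observers are always right
        # (or always wrong); returning NaN is a choice made here.
        return math.nan
    return (c_obs - c_exp) / (1 - c_exp)


if __name__ == "__main__":
    human = [True, True, False, False]
    model = [True, False, True, False]
    print(error_consistency(human, model))
```

A value of 0 means the two observers agree no more often than their accuracies alone would predict, while positive values mean they tend to fail on the same trials. Two models with identical accuracy can therefore differ in how human-like their errors are.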
