Peripheral Vision Transformer

06/14/2022
by Juhong Min, et al.

Human vision possesses a special type of visual processing system called peripheral vision. By partitioning the entire visual field into multiple contour regions based on the distance to the center of gaze, peripheral vision allows us to perceive different visual features in different regions. In this work, we take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition. We propose incorporating peripheral position encoding into the multi-head self-attention layers so that the network learns, from training data, to partition the visual field into diverse peripheral regions. We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception, showing that the network learns to perceive visual data similarly to the way human vision does. State-of-the-art performance on image classification across various model sizes demonstrates the efficacy of the proposed method.
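To make the idea of a position encoding inside multi-head self-attention more concrete, here is a minimal NumPy sketch. It is not the PerViT formulation, which the abstract does not spell out. It assumes a hypothetical design in which each head modulates its content-based attention with a Gaussian "ring" over the Euclidean distance between patches, so that different heads favor different distances from the query. The function names and the `centers` and `widths` parameters are illustrative assumptions; in a trained network such quantities would be learned rather than fixed.

```python
import numpy as np


def relative_distances(grid: int) -> np.ndarray:
    """Euclidean distance between every pair of patches on a grid x grid layout."""
    ys, xs = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)  # (N, 2)
    diff = coords[:, None, :] - coords[None, :, :]                      # (N, N, 2)
    return np.linalg.norm(diff, axis=-1)                                # (N, N)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_biased_attention(x, w_q, w_k, w_v, dist, centers, widths):
    """Multi-head attention whose weights are modulated by patch distance.

    x: (N, D) patch embeddings; w_q, w_k, w_v: (H, D, d) per-head projections;
    dist: (N, N) pairwise patch distances; centers, widths: (H,) per-head
    preferred distance and tolerance (hypothetical, fixed here for illustration).
    Returns the concatenated head outputs (N, H * d) and the attention (H, N, N).
    """
    q = np.einsum("nd,hde->hne", x, w_q)
    k = np.einsum("nd,hde->hne", x, w_k)
    v = np.einsum("nd,hde->hne", x, w_v)
    d = q.shape[-1]

    # Content-based attention, as in standard self-attention.
    content = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (H, N, N)

    # Assumed position term: each head prefers patches near its own distance.
    c = centers[:, None, None]
    w = widths[:, None, None]
    position = np.exp(-((dist[None] - c) ** 2) / (2.0 * w ** 2))  # (H, N, N)

    # Combine the two terms and renormalize each row to sum to one.
    attn = content * position
    attn = attn / attn.sum(axis=-1, keepdims=True)

    out = attn @ v  # (H, N, d)
    return out.transpose(1, 0, 2).reshape(x.shape[0], -1), attn
```

With `centers = np.array([0.0, 2.0])`, for instance, the first head concentrates on the query patch and its immediate neighbors, while the second favors a ring of patches about two grid cells away. This loosely mirrors the central and peripheral regions described above.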


