Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition

03/06/2018
by   Kevin Wu, et al.

Advancements in convolutional neural networks (CNNs) have made significant strides toward achieving high performance levels on multiple object recognition tasks. While some approaches utilize information from the entire scene to propose regions of interest, the task of interpreting a particular region or object is still performed independently of other objects and features in the image. Here we demonstrate that a scene's 'gist' can significantly contribute to how well humans can recognize objects. These findings are consistent with the notion that humans foveate on an object and incorporate information from the periphery to aid in recognition. We use a biologically inspired two-part convolutional neural network ('GistNet') that models the fovea and periphery to provide a proof-of-principle demonstration that computational object recognition can significantly benefit from the gist of the scene as contextual information. Our model yields accuracy improvements of up to 50% in certain object categories when incorporating contextual gist, while only increasing the original model size by 5%. This proposed model mirrors our intuition about how the human visual system recognizes objects, suggesting specific biologically plausible constraints to improve machine vision and building initial steps towards the challenge of scene understanding.
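
The fovea/periphery split described in the abstract can be illustrated with a small input-preparation sketch. This is not the authors' GistNet: the abstract gives no architectural details, so the function names, the bounding-box convention, the block-averaging downsample, and the simple concatenation below are all assumptions chosen for illustration. The idea shown is only that one stream sees the object at full resolution while the other sees a coarse view of the whole scene, and the two are combined before classification.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def crop_fovea(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Return the full-resolution patch inside box.

    box = (top, left, bottom, right), end-exclusive (hypothetical convention).
    """
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def downsample_periphery(image: Image, factor: int) -> Image:
    """Block-average the whole scene by `factor` to get a coarse 'gist' view."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(0, height - factor + 1, factor):
        out_row = []
        for c in range(0, width - factor + 1, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def flatten(image: Image) -> List[float]:
    """Flatten a 2D image into a single list of values, row by row."""
    return [value for row in image for value in row]


def two_stream_input(image: Image, box: Tuple[int, int, int, int],
                     factor: int) -> List[float]:
    """Concatenate foveal and peripheral pixels.

    Stand-in for concatenating the feature vectors of two CNN streams;
    a real model would pass each view through convolutional layers first.
    """
    fovea = flatten(crop_fovea(image, box))
    periphery = flatten(downsample_periphery(image, factor))
    return fovea + periphery
```

In a real two-stream network each view would be passed through its own convolutional layers before the features are merged; the raw-pixel concatenation here only stands in for that merge step.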


