Image interpretation by iterative bottom-up top-down processing

05/12/2021
by Shimon Ullman, et al.

Scene understanding requires the extraction and representation of scene components together with their properties and interrelations. We describe a model in which meaningful scene structures are extracted from the image by an iterative process, combining bottom-up (BU) and top-down (TD) networks, interacting through symmetric bi-directional communication (the counter-streams structure). The model constructs a scene representation by the iterative use of three components. The first model component is a BU stream that extracts selected scene elements, properties and relations. The second component (cognitive augmentation) augments the extracted visual representation based on relevant non-visual stored representations. It also provides input to the third component, the TD stream, in the form of a TD instruction, which instructs the model what task to perform next. The TD stream then guides the BU visual stream to perform the selected task in the next cycle. During this process, the visual representations extracted from the image can be combined with relevant non-visual representations, so that the final scene representation is based on both visual information extracted from the scene and relevant stored knowledge of the world. We describe how a sequence of TD instructions is used to extract structures of interest from the scene, including an algorithm to automatically select the next TD instruction in the sequence. The extraction process is shown to have favorable properties in terms of combinatorial generalization, generalizing well to novel scene structures and new combinations of objects, properties and relations not seen during training. Finally, we compare the model with relevant aspects of human vision, and suggest directions for using the BU-TD scheme for integrating visual and cognitive components in the process of scene understanding.
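The three-component cycle described above can be sketched in code. The following is a minimal, hypothetical illustration of the control flow only (BU extraction guided by a TD instruction, cognitive augmentation from stored knowledge, selection of the next TD instruction); all function names and the stubbed representations are assumptions for this sketch, not the authors' implementation.

```python
def bottom_up(image, td_instruction):
    """BU stream (stub): extract the scene elements, properties or
    relations selected by the current top-down instruction."""
    return {td_instruction: f"extracted({td_instruction})"}

def cognitive_augmentation(visual_rep, knowledge):
    """Augment the extracted visual representation with relevant
    stored, non-visual knowledge (here, a simple dict lookup)."""
    augmented = dict(visual_rep)
    for key in visual_rep:
        if key in knowledge:
            augmented[key + "_knowledge"] = knowledge[key]
    return augmented

def select_next_instruction(scene_rep, instruction_queue):
    """Choose the next TD instruction; in this sketch, simply the
    next queued task (the paper describes an automatic selection
    algorithm instead)."""
    return instruction_queue.pop(0) if instruction_queue else None

def interpret(image, instructions, knowledge):
    """Run BU-TD cycles until no instruction remains, accumulating a
    scene representation that mixes visual and stored information."""
    scene_rep = {}
    queue = list(instructions)
    td = select_next_instruction(scene_rep, queue)
    while td is not None:
        visual = bottom_up(image, td)  # BU pass guided by the TD stream
        scene_rep.update(cognitive_augmentation(visual, knowledge))
        td = select_next_instruction(scene_rep, queue)
    return scene_rep
```

For example, `interpret("img", ["find_person", "find_cup"], {"find_person": "people often hold objects"})` runs two BU-TD cycles and returns a representation containing both extracted visual entries and the attached stored knowledge.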


