Embodied Visual Recognition

04/09/2019
by Jianwei Yang, et al.

Passive visual systems typically fail to recognize objects in the amodal setting, where they are heavily occluded. In contrast, humans and other embodied agents can move in the environment and actively control the viewing angle to better understand object shapes and semantics. In this work, we introduce the task of Embodied Visual Recognition (EVR): an agent is instantiated in a 3D environment close to an occluded target object, and is free to move in the environment to perform object classification, amodal object localization, and amodal object segmentation. To address this task, we develop a new model called Embodied Mask R-CNN, with which agents learn to move strategically to improve their visual recognition abilities. We conduct experiments using the House3D environment. Experimental results show that: 1) agents with embodiment (movement) achieve better visual recognition performance than passive ones; 2) in order to improve visual recognition abilities, agents can learn strategic movement paths that differ from shortest paths.

research
11/17/2020

SeekNet: Improved Human Instance Segmentation via Reinforcement Learning Based Optimized Robot Relocation

Amodal recognition is the ability of the system to detect occluded objec...
research
05/11/2019

Robustness of Object Recognition under Extreme Occlusion in Humans and Computational Models

Most objects in the visual world are partially occluded, but humans can ...
research
07/17/2020

The Effect of Top-Down Attention in Occluded Object Recognition

This study is concerned with the top-down visual processing benefit in t...
research
11/16/2016

Unsupervised Learning of Important Objects from First-Person Videos

A first-person camera, placed at a person's head, captures, which object...
research
07/20/2019

Recurrent Connections Aid Occluded Object Recognition by Discounting Occluders

Recurrent connections in the visual cortex are thought to aid object rec...
research
03/16/2020

PS-RCNN: Detecting Secondary Human Instances in a Crowd via Primary Object Suppression

Detecting human bodies in highly crowded scenes is a challenging problem...
research
10/03/2018

Grounding the Experience of a Visual Field through Sensorimotor Contingencies

Artificial perception is traditionally handled by hand-designing task sp...
