Geometry-Aware Recurrent Neural Networks for Active Visual Recognition

by   Ricson Cheng, et al.

We present recurrent geometry-aware neural networks that integrate visual information across multiple views of a scene into 3D latent feature tensors, while maintaining an one-to-one mapping between 3D physical locations in the world scene and latent feature locations. Object detection, object segmentation, and 3D reconstruction is then carried out directly using the constructed 3D feature memory, as opposed to any of the input 2D images. The proposed models are equipped with differentiable egomotion-aware feature warping and (learned) depth-aware unprojection operations to achieve geometrically consistent mapping between the features in the input frame and the constructed latent model of the scene. We empirically show the proposed model generalizes much better than geometryunaware LSTM/GRU networks, especially under the presence of multiple objects and cross-object occlusions. Combined with active view selection policies, our model learns to select informative viewpoints to integrate information from by "undoing" cross-object occlusions, seamlessly combining geometry with learning from experience.


Learning Spatial Common Sense with Geometry-Aware Recurrent Networks

We integrate two powerful ideas, geometry and deep visual representation...

BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images

We present BlockGAN, an image generative model that learns object-aware ...

Embodied View-Contrastive 3D Feature Learning

Humans can effortlessly imagine the occluded side of objects in a photog...

Geometry-Aware Video Object Detection for Static Cameras

In this paper we propose a geometry-aware model for video object detecti...

Affordance segmentation of hand-occluded containers from exocentric images

Visual affordance segmentation identifies the surfaces of an object an a...

Disentangling 3D Prototypical Networks For Few-Shot Concept Learning

We present neural architectures that disentangle RGB-D images into objec...

Deep Learned Full-3D Object Completion from Single View

3D geometry is a very informative cue when interacting with and navigati...

Please sign up or login with your details

Forgot password? Click here to reset