3D Object Recognition By Corresponding and Quantizing Neural 3D Scene Representations

10/30/2020
by Mihir Prabhudesai, et al.

We propose a system that learns to detect objects and infer their 3D poses in RGB-D images. Many existing systems can identify objects and infer 3D poses, but they rely heavily on human labels and 3D annotations. The challenge is to achieve this without such strong supervision. To address it, we propose a model that maps RGB-D images to a set of 3D visual feature maps in a differentiable, fully convolutional manner, supervised by view prediction. The 3D feature maps are a featurization of the 3D scene depicted in the images. The 3D object feature representations are invariant to camera viewpoint changes and zooms, so feature matching can identify similar objects seen from different camera viewpoints. We compare the 3D feature maps of two objects by searching for the best alignment across scales and 3D rotations; this search also yields estimates of pose and scale changes without requiring 3D pose annotations. We cluster object feature maps into a set of 3D prototypes that represent familiar objects at canonical scales and orientations. We then parse images by inferring the prototype identity and 3D pose of each detected object. We compare our method to numerous baselines that either do not learn 3D visual feature representations or do not attempt to correspond features across scenes, and we outperform them by a large margin on object retrieval and object pose estimation. Because the object-centric feature maps are 3D, the visual similarity cues are invariant to 3D pose changes and small scale changes, which gives our method an advantage over 2D and 1D methods.
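The rotation search and prototype assignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each object is already represented as a cubic feature grid of shape (X, Y, Z, C), searches only quarter turns about the vertical axis instead of a fine grid of 3D rotations and scales, and scores alignment with cosine similarity. The names `cosine_similarity`, `best_rotation_match`, and `assign_prototype` are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two feature grids, flattened to vectors."""
    a = a.ravel()
    b = b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def best_rotation_match(query, target, n_quarter_turns=4):
    """Search quarter turns about the vertical (Y) axis for the best alignment.

    Assumption: both grids have shape (X, Y, Z, C) with X == Z, so a quarter
    turn in the X-Z plane preserves the grid shape.
    Returns (k, score): the number of quarter turns and the similarity reached.
    """
    best_k, best_score = 0, -np.inf
    for k in range(n_quarter_turns):
        # Rotate the spatial grid in the X-Z plane; channels are untouched.
        rotated = np.rot90(query, k=k, axes=(0, 2))
        score = cosine_similarity(rotated, target)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


def assign_prototype(query, prototypes):
    """Pick the prototype (and relative rotation) that best matches the query.

    Returns (prototype_index, k, score).
    """
    best = (-1, 0, -np.inf)
    for idx, proto in enumerate(prototypes):
        k, score = best_rotation_match(query, proto)
        if score > best[2]:
            best = (idx, k, score)
    return best
```

Calling `assign_prototype(object_features, prototypes)` returns both an identity (the prototype index) and a coarse relative pose (the number of quarter turns), which mirrors how the abstract obtains pose estimates as a by-product of the alignment search rather than from pose labels.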


research
10/02/2019

Embodied Language Grounding with Implicit 3D Visual Feature Representations

Consider the utterance "the tomato is to the left of the pot." Humans ca...
research
12/01/2020

Unsupervised Part Discovery via Feature Alignment

Understanding objects in terms of their individual parts is important, b...
research
11/06/2020

Disentangling 3D Prototypical Networks For Few-Shot Concept Learning

We present neural architectures that disentangle RGB-D images into objec...
research
05/28/2019

Cerberus: A Multi-headed Derenderer

To generalize to novel visual scenes with new viewpoints and new object ...
research
02/20/2015

Learning Descriptors for Object Recognition and 3D Pose Estimation

Detecting poorly textured objects and estimating their 3D pose reliably ...
research
08/16/2019

RIO: 3D Object Instance Re-Localization in Changing Indoor Environments

In this work, we introduce the task of 3D object instance re-localizatio...
research
12/31/2018

Learning Spatial Common Sense with Geometry-Aware Recurrent Networks

We integrate two powerful ideas, geometry and deep visual representation...
