Generic 3D Representation via Pose Estimation and Matching

10/23/2017
by   Amir R. Zamir, et al.
0

Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D has been limited. In this paper, we learn a generic 3D representation through solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the premise that by providing supervision over a set of carefully selected foundational tasks, generalization to novel tasks and abstraction capabilities can be achieved. We empirically show that the internal representation of a multi-task ConvNet trained to solve the above core problems generalizes to novel 3D tasks (e.g., scene layout estimation, object pose estimation, surface normal estimation) without the need for fine-tuning and shows traits of abstraction abilities (e.g., cross-modality pose estimation). In the context of the core supervised tasks, we demonstrate our representation achieves state-of-the-art wide baseline feature matching results without requiring apriori rectification (unlike SIFT and the majority of learned features). We also show 6DOF camera pose estimation given a pair local image patches. The accuracy of both supervised tasks come comparable to humans. Finally, we contribute a large-scale dataset composed of object-centric street view scenes along with point correspondences and camera pose information, and conclude with a discussion on the learned representation and open research questions.

READ FULL TEXT

page 2

page 5

page 8

page 11

page 12

page 13

page 14

research
03/13/2023

Object-Centric Multi-Task Learning for Human Instances

Human is one of the most essential classes in visual recognition tasks s...
research
02/01/2022

Sim2Real Object-Centric Keypoint Detection and Description

Keypoint detection and description play a central role in computer visio...
research
11/09/2022

A Solution for a Fundamental Problem of 3D Inference based on 2D Representations

3D inference from monocular vision using neural networks is an important...
research
03/30/2015

Globally Tuned Cascade Pose Regression via Back Propagation with Application in 2D Face Pose Estimation and Heart Segmentation in 3D CT Images

Recently, a successful pose estimation algorithm, called Cascade Pose Re...
research
06/04/2022

Nerfels: Renderable Neural Codes for Improved Camera Pose Estimation

This paper presents a framework that combines traditional keypoint-based...
research
07/08/2019

Introduction to Camera Pose Estimation with Deep Learning

Over the last two decades, deep learning has transformed the field of co...
research
03/13/2019

Neural Scene Decomposition for Multi-Person Motion Capture

Learning general image representations has proven key to the success of ...

Please sign up or login with your details

Forgot password? Click here to reset