3D Interpreter Networks for Viewer-Centered Wireframe Modeling

04/03/2018
by   Jiajun Wu, et al.
Understanding 3D object structure from a single image is an important but challenging task in computer vision, largely because real images lack 3D object annotations. Previous research tackled this problem either by searching for a 3D shape that best explains 2D annotations, or by training purely on synthetic data with ground-truth 3D information. In this work, we propose 3D INterpreter Networks (3D-INN), an end-to-end trainable framework that sequentially estimates 2D keypoint heatmaps and 3D object skeletons and poses. Our system learns from both 2D-annotated real images and synthetic 3D data. This is made possible mainly by two technical innovations. First, heatmaps of 2D keypoints serve as an intermediate representation that connects real and synthetic data. 3D-INN is trained on real images to estimate 2D keypoint heatmaps from an input image; it then predicts 3D object structure from the heatmaps using knowledge learned from synthetic 3D shapes. In this way, 3D-INN benefits from the variation and abundance of synthetic 3D objects without suffering from the domain difference between real and synthesized images, which is often caused by imperfect rendering. Second, we propose a Projection Layer that maps the estimated 3D structure back to 2D. During training, it ensures that 3D-INN predicts 3D structures whose projections are consistent with the 2D annotations of real images. Experiments show that the proposed system performs well on both 2D keypoint estimation and 3D structure recovery. We also demonstrate that the recovered 3D information is useful in a range of vision applications, such as image retrieval.
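The role of the Projection Layer can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a simple pinhole camera, a single rotation about the vertical axis, and a mean squared reprojection error, and every name in it (`rotate_y`, `project`, `reprojection_loss`) is hypothetical. It only shows the general idea of mapping estimated 3D keypoints back to 2D and comparing them with 2D annotations.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the y axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(points_3d, angle, translation, focal):
    """Project 3D keypoints to 2D (assumed pinhole camera model)."""
    tx, ty, tz = translation
    projected = []
    for p in points_3d:
        x, y, z = rotate_y(p, angle)
        x, y, z = x + tx, y + ty, z + tz
        # Perspective division; assumes every point lies in front of the camera.
        projected.append((focal * x / z, focal * y / z))
    return projected


def reprojection_loss(points_3d, angle, translation, focal, annotated_2d):
    """Mean squared distance between projected and annotated 2D keypoints."""
    projected = project(points_3d, angle, translation, focal)
    total = sum(
        (u - a) ** 2 + (v - b) ** 2
        for (u, v), (a, b) in zip(projected, annotated_2d)
    )
    return total / len(annotated_2d)


# Hypothetical two-keypoint skeleton, two units in front of the camera.
skeleton = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
annotations = project(skeleton, 0.0, (0.0, 0.0, 2.0), 2.0)
print(reprojection_loss(skeleton, 0.0, (0.0, 0.0, 2.0), 2.0, annotations))
```

In a training setup along these lines, a loss of this kind would be minimized with respect to the predicted 3D structure and pose, so that only 2D annotations are needed for supervision on real images.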

research
11/08/2017

MarrNet: 3D Shape Reconstruction via 2.5D Sketches

3D object reconstruction from a single image is a highly under-determine...
research
03/22/2015

Lifting Object Detection Datasets into 3D

While data has certainly taken the center stage in computer vision in re...
research
12/08/2016

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

Monocular 3D object parsing is highly desirable in various scenarios inc...
research
07/03/2019

Learning to Predict Robot Keypoints Using Artificially Generated Images

This work considers robot keypoint estimation on color images as a super...
research
10/21/2022

Real-time Detection of 2D Tool Landmarks with Synthetic Training Data

In this paper a deep learning architecture is presented that can, in rea...
research
12/13/2019

Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data

The estimation of viewpoints and keypoints effectively enhance object de...
research
06/15/2021

End-to-End Learning of Keypoint Representations for Continuous Control from Images

In many control problems that include vision, optimal controls can be in...
