iSPA-Net: Iterative Semantic Pose Alignment Network

by Jogendra Nath Kundu, et al.

Understanding and extracting 3D information about objects from monocular 2D images is a fundamental problem in computer vision. In 3D object pose estimation, recent data-driven approaches based on deep neural networks suffer from a scarcity of real images with 3D keypoint and pose annotations. Drawing inspiration from human cognition, where annotators use a 3D CAD model as a structural reference to acquire ground-truth viewpoints for real images, we propose an iterative Semantic Pose Alignment Network, called iSPA-Net. Our approach exploits semantic 3D structural regularity to solve fine-grained pose estimation by predicting the viewpoint difference between a given pair of images. Such an image-comparison-based approach also alleviates the problem of data scarcity and hence improves the scalability of the approach to novel object categories with minimal annotation. The fine-grained pose estimator is further aided by correspondence between learned spatial descriptors of the input image pair. The proposed pose alignment framework can refine its initial pose estimate over consecutive iterations by using an online rendering setup together with a non-uniform bin classification of the pose difference. This enables iSPA-Net to achieve state-of-the-art performance on various real-image viewpoint estimation datasets. Further, we demonstrate the effectiveness of the approach in multiple applications. First, we show results for active object viewpoint localization, which captures images from a similar pose given only a single image as pose reference. Second, we demonstrate the ability of the learned semantic correspondence to perform unsupervised part-segmentation transfer using only a single part-annotated 3D template model per object class. To encourage reproducible research, we have released the code for our proposed algorithm.
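
The iterative refinement described above can be illustrated with a minimal sketch. It is not the released iSPA-Net code: it tracks a single azimuth angle only, the bin edges are hypothetical values chosen for illustration, the online rendering step is omitted, and `oracle_predictor` is a stand-in that returns the correct difference bin where the real system would use a learned network comparing the real image with a rendered one. The sketch shows only why non-uniform bins (narrow near zero, wide far from zero) let a coarse first correction be followed by progressively finer ones.

```python
import bisect

# Hypothetical non-uniform bin edges (degrees) for the signed azimuth difference:
# narrow bins near 0 for fine corrections, wide bins far from 0 for coarse ones.
BIN_EDGES = [-180, -90, -45, -20, -10, -5, -2, 2, 5, 10, 20, 45, 90, 180]
BIN_CENTERS = [(lo + hi) / 2 for lo, hi in zip(BIN_EDGES[:-1], BIN_EDGES[1:])]


def wrap(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def to_bin(diff):
    """Return the index of the bin that contains a signed angle difference."""
    d = wrap(diff)
    return min(bisect.bisect_right(BIN_EDGES, d) - 1, len(BIN_CENTERS) - 1)


def oracle_predictor(true_azimuth, rendered_azimuth):
    """Stand-in for a learned network (assumption): returns the correct bin."""
    return to_bin(true_azimuth - rendered_azimuth)


def align(true_azimuth, init_azimuth=0.0, max_iters=10):
    """Iteratively move the estimated viewpoint by the predicted bin center."""
    est = init_azimuth
    for _ in range(max_iters):
        step = BIN_CENTERS[oracle_predictor(true_azimuth, est)]
        if step == 0:  # central bin: remaining difference is within 2 degrees
            break
        est = wrap(est + step)  # a real system would re-render at this pose
    return est


if __name__ == "__main__":
    for target in (100.0, -180.0, 37.0):
        estimate = align(target)
        print(target, estimate, abs(wrap(target - estimate)))
```

With these illustrative edges, each update leaves a residual of at most half the width of the predicted bin, so the residual shrinks until it falls in the central bin. A learned predictor would of course make errors that this idealized stand-in does not.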


