ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers

09/21/2023
by   Philipp Ausserlechner, et al.

As robotic systems increasingly encounter complex and unconstrained real-world scenarios, there is a growing demand for recognizing diverse objects. State-of-the-art 6D object pose estimation methods rely on object-specific training and therefore do not generalize to unseen objects. Recent novel object pose estimation methods address this issue using task-specific fine-tuned CNNs for deep template matching. This adaptation for pose estimation still requires expensive data rendering and training procedures. MegaPose, for example, is trained on a dataset of two million images showing 20,000 different objects to reach such generalization capabilities. To overcome this shortcoming, we introduce ZS6D for zero-shot novel object 6D pose estimation. Visual descriptors, extracted using pre-trained Vision Transformers (ViT), are used for matching rendered templates against query images of objects and for establishing local correspondences. These local correspondences yield geometric correspondences, which are used to estimate the object's 6D pose with RANSAC-based PnP. This approach shows that the image descriptors extracted by pre-trained ViTs are well suited to achieving a notable improvement over two state-of-the-art novel object 6D pose estimation methods, without the need for task-specific fine-tuning. Experiments are performed on LMO, YCBV, and TLESS. Compared to one of the two methods, we improve the Average Recall on all three datasets; compared to the second method, we improve on two datasets.
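The matching stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the short hand-written vectors stand in for ViT descriptors, cosine similarity and mutual nearest neighbours are assumed as the matching rules, and all function names are hypothetical. The final RANSAC-based PnP step is omitted because it needs 3D template coordinates and camera intrinsics.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_template(query_desc, template_descs):
    """Return (index, score) of the template most similar to the query."""
    scores = [cosine(query_desc, t) for t in template_descs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def mutual_nearest_neighbours(query_patches, template_patches):
    """Keep patch pairs (i, j) that are each other's nearest neighbour."""
    def nearest(src, dst):
        return [max(range(len(dst)), key=lambda j: cosine(s, dst[j]))
                for s in src]

    q_to_t = nearest(query_patches, template_patches)
    t_to_q = nearest(template_patches, query_patches)
    return [(i, j) for i, j in enumerate(q_to_t) if t_to_q[j] == i]


# Toy global descriptors (assumption: one vector per rendered template).
templates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
query = [0.5, 0.9, 0.1]
idx, score = best_template(query, templates)

# Toy local (patch) descriptors for the query and the selected template.
query_patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.6]]
template_patches = [[0.9, 0.1], [0.1, 0.9]]
matches = mutual_nearest_neighbours(query_patches, template_patches)

print(idx, round(score, 3), matches)
```

In a full pipeline, each matched template patch would be mapped to a 3D point on the object model, and the resulting 2D-3D pairs would be passed to a RANSAC-based PnP solver to recover the pose.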


