Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects

05/31/2023
by Stefan Thalhammer, et al.

Object pose estimation is important for object manipulation and scene understanding. In order to improve the general applicability of pose estimators, recent research focuses on providing estimates for novel objects, that is, objects unseen during training. Such works use deep template matching strategies to retrieve the template closest to a query image. This template retrieval implicitly provides object class and pose. Despite the recent success and improvements of Vision Transformers over CNNs for many vision tasks, the state of the art still relies on CNN-based approaches for novel object pose estimation. This work evaluates and demonstrates the differences between self-supervised CNNs and Vision Transformers for deep template matching. In detail, both types of approaches are trained using contrastive learning to match training images against rendered templates of isolated objects. At test time, such templates are matched against query images of known and novel objects under challenging settings, such as clutter, occlusion, and object symmetries, using masked cosine similarity. The presented results not only demonstrate that Vision Transformers improve matching accuracy over CNNs, but also that, in some cases, pre-trained Vision Transformers do not require fine-tuning to do so. Furthermore, we highlight the differences in optimization and network architecture when comparing these two types of networks for deep template matching.
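To make the described matching procedure concrete, below is a minimal sketch, not the authors' implementation, of the two ingredients mentioned in the abstract: an InfoNCE-style contrastive loss for matching training images against rendered templates, and masked cosine similarity for retrieving the closest template at test time. It assumes feature maps and embeddings have already been produced by an arbitrary self-supervised backbone (CNN or ViT); all function and tensor names are illustrative assumptions.

```python
# Illustrative sketch only; assumes pre-extracted features from any backbone.
import torch
import torch.nn.functional as F


def info_nce_loss(query_emb: torch.Tensor,
                  template_emb: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style contrastive loss: the i-th query should match the
    i-th (positive) template; other templates in the batch act as negatives."""
    q = F.normalize(query_emb, dim=1)                 # (B, D)
    t = F.normalize(template_emb, dim=1)              # (B, D)
    logits = q @ t.T / temperature                    # (B, B) cosine similarities
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)


def masked_cosine_similarity(query_feats: torch.Tensor,
                             template_feats: torch.Tensor,
                             template_masks: torch.Tensor) -> torch.Tensor:
    """Score one query feature map against N template feature maps.

    query_feats:    (C, H, W)    feature map of the query image
    template_feats: (N, C, H, W) feature maps of rendered templates
    template_masks: (N, H, W)    binary object masks of the templates
    Returns:        (N,)         one similarity score per template
    """
    # L2-normalise along the channel dimension so a dot product between
    # feature vectors equals their cosine similarity.
    q = F.normalize(query_feats, dim=0)               # (C, H, W)
    t = F.normalize(template_feats, dim=1)            # (N, C, H, W)

    # Per-location cosine similarity between the query and each template.
    sim = (q.unsqueeze(0) * t).sum(dim=1)             # (N, H, W)

    # Average similarity only over object pixels of each rendered template.
    masked = sim * template_masks
    return masked.sum(dim=(1, 2)) / template_masks.sum(dim=(1, 2)).clamp(min=1)


def retrieve_template(query_feats, template_feats, template_masks) -> int:
    """Return the index of the best-matching template, which implicitly
    provides the object class and an approximate 3D pose."""
    scores = masked_cosine_similarity(query_feats, template_feats, template_masks)
    return int(scores.argmax())


if __name__ == "__main__":
    # Toy example with random features: 1 query, 5 templates, 64 channels, 16x16 maps.
    q = torch.randn(64, 16, 16)
    t = torch.randn(5, 64, 16, 16)
    m = (torch.rand(5, 16, 16) > 0.5).float()
    print("best template:", retrieve_template(q, t, m))
```

Masking the background of the rendered templates keeps the score focused on object pixels, which is what makes the retrieval robust to the clutter and occlusion settings discussed above.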

Related research

09/21/2023 · ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers
As robotic systems increasingly encounter complex and unconstrained real...

11/26/2019 · Learning to Match Templates for Unseen Instance Detection
Detecting objects in images is a quintessential problem in computer visi...

11/21/2018 · Real-Time 6D Object Pose Estimation on CPU
We propose a fast and accurate 6D object pose estimation from a RGB-D im...

07/21/2023 · YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation
6D object pose estimation is a crucial prerequisite for autonomous robot...

03/29/2022 · OSOP: A Multi-Stage One Shot Object Pose Estimation Framework
We present a novel one-shot method for object detection and 6 DoF pose e...

03/11/2017 · A 3D Object Detection and Pose Estimation Pipeline Using RGB-D Images
3D object detection and pose estimation has been studied extensively in ...

11/29/2022 · Finer-Grained Correlations: Location Priors for Unseen Object Pose Estimation
We present a new method which provides object location priors for previo...