YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation

07/21/2023
by Arul Selvam Periyasamy, et al.

6D object pose estimation is a crucial prerequisite for autonomous robot manipulation applications. The state-of-the-art models for pose estimation are based on convolutional neural networks (CNNs). Lately, Transformers, an architecture originally proposed for natural language processing, have been achieving state-of-the-art results in many computer vision tasks as well. Equipped with the multi-head self-attention mechanism, Transformers enable simple single-stage end-to-end architectures that learn object detection and 6D object pose estimation jointly. In this work, we propose YOLOPose (short for You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose estimation method based on keypoint regression, and an improved variant of the YOLOPose model. In contrast to the standard heatmap-based approach to predicting keypoints in an image, we directly regress the keypoints. Additionally, we employ a learnable orientation estimation module to predict the orientation from the keypoints. Together with a separate translation estimation module, this makes our model end-to-end differentiable. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods. We analyze the role of object queries in our architecture and reveal that the object queries specialize in detecting objects in specific image regions. Furthermore, we quantify the accuracy trade-off of training our model on smaller datasets.
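The abstract does not say how the learnable orientation estimation module is built or how it parameterizes rotations. The following is only a minimal sketch under stated assumptions: a hypothetical, untrained two-layer MLP (`OrientationHead`, random weights) maps a set of directly regressed 2D keypoints, normalized to [0, 1], to the continuous 6D rotation representation. A Gram-Schmidt step (`rotation_from_6d`) then turns that representation into a valid rotation matrix. Every operation is differentiable away from degenerate inputs, which is the property that lets such a module be trained end to end with the keypoint regressor.

```python
import numpy as np


def rotation_from_6d(r6: np.ndarray) -> np.ndarray:
    """Turn a 6D vector (two stacked 3D vectors) into a 3x3 rotation matrix.

    Gram-Schmidt orthonormalization: normalize the first vector, remove its
    component from the second vector and normalize, then take the cross product.
    """
    a1, a2 = r6[:3], r6[3:6]
    b1 = a1 / np.linalg.norm(a1)
    a2_orth = a2 - np.dot(b1, a2) * b1  # drop the part of a2 parallel to b1
    b2 = a2_orth / np.linalg.norm(a2_orth)
    b3 = np.cross(b1, b2)  # completes a right-handed orthonormal frame
    return np.stack([b1, b2, b3], axis=1)  # b1, b2, b3 become the columns


class OrientationHead:
    """Hypothetical, untrained orientation module: 2D keypoints -> rotation matrix.

    Assumption for illustration only: a two-layer ReLU MLP with random weights
    that outputs a 6D rotation representation.
    """

    def __init__(self, num_keypoints: int, hidden: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.num_keypoints = num_keypoints
        self.w1 = rng.normal(scale=0.1, size=(2 * num_keypoints, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.1, size=(hidden, 6))
        # Bias the 6D output toward the identity rotation (first two columns of I)
        self.b2 = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])

    def __call__(self, keypoints: np.ndarray) -> np.ndarray:
        if keypoints.shape != (self.num_keypoints, 2):
            raise ValueError("expected keypoints of shape (num_keypoints, 2)")
        x = keypoints.reshape(-1)  # flatten (K, 2) -> (2K,)
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        r6 = h @ self.w2 + self.b2
        return rotation_from_6d(r6)


# Usage: 8 hypothetical keypoints with image coordinates normalized to [0, 1]
kps = np.random.default_rng(1).uniform(0.0, 1.0, size=(8, 2))
R = OrientationHead(num_keypoints=8)(kps)
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```

Initializing the output bias to (1, 0, 0, 0, 1, 0) is a choice made for this sketch: it biases the untrained head toward the identity rotation and keeps the Gram-Schmidt step away from a zero vector. Whatever the weights, the output is orthonormal with determinant 1, so the head always produces a valid rotation.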
