YOLOPose: Transformer-based Multi-Object 6D Pose Estimation using Keypoint Regression

by Arash Amini, et al.

6D object pose estimation is a crucial prerequisite for autonomous robot manipulation. State-of-the-art pose estimation models are based on convolutional neural networks (CNNs). Lately, Transformers, an architecture originally proposed for natural language processing, have been achieving state-of-the-art results in many computer vision tasks as well. Equipped with the multi-head self-attention mechanism, Transformers enable simple single-stage end-to-end architectures that learn object detection and 6D object pose estimation jointly. In this work, we propose YOLOPose (short for You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose estimation method based on keypoint regression. Instead of predicting keypoints in an image through the standard heatmaps, we regress the keypoints directly. Additionally, we employ a learnable orientation estimation module to predict the orientation from the keypoints. Together with a separate translation estimation module, this makes our model end-to-end differentiable. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
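To make the notion of image keypoints concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the keypoints are the eight corners of an object's 3D bounding box and that the camera follows a pinhole model with intrinsic matrix `K`; both choices, as well as the function names `box_corners` and `project_keypoints`, are illustrative assumptions that the abstract does not specify. Given a 6D pose (rotation `R`, translation `t`), the sketch computes the 2D keypoint locations that a direct regression approach would output as coordinates rather than as heatmaps.

```python
import numpy as np


def box_corners(extent):
    """Return the 8 corners of an axis-aligned box centred at the origin.

    `extent` is the (x, y, z) side length of the box (hypothetical input).
    """
    half = np.asarray(extent, dtype=float) / 2.0
    signs = np.array(
        [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
        dtype=float,
    )
    return signs * half  # shape (8, 3)


def project_keypoints(points_obj, R, t, K):
    """Project 3D object-frame points to 2D pixels with a pinhole camera."""
    points_cam = points_obj @ R.T + t   # object frame -> camera frame
    uvw = points_cam @ K.T              # apply camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]     # perspective division


# Assumed example values: a 20 cm cube placed 1 m in front of the camera.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 1.0])

corners = box_corners([0.2, 0.2, 0.2])
keypoints_2d = project_keypoints(corners, R, t, K)
print(keypoints_2d.shape)  # (8, 2): one (u, v) pair per keypoint
```

In a regression-based setup, 2D coordinates of this kind would serve as training targets, and an orientation module would then map the predicted keypoints back to a rotation.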








T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression

6D pose estimation is the task of predicting the translation and orienta...

End-to-End Trainable Multi-Instance Pose Estimation with Transformers

We propose a new end-to-end trainable approach for multi-instance pose e...

PE-former: Pose Estimation Transformer

Vision transformer architectures have been demonstrated to work very eff...

RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers

We propose a transformer-based neural network architecture for multi-obj...

PoseAgent: Budget-Constrained 6D Object Pose Estimation via Reinforcement Learning

State-of-the-art computer vision algorithms often achieve efficiency by ...

See the Difference: Direct Pre-Image Reconstruction and Pose Estimation by Differentiating HOG

The Histogram of Oriented Gradient (HOG) descriptor has led to many adva...

Pose Recognition with Cascade Transformers

In this paper, we present a regression-based pose recognition method usi...