YOLOPose: Transformer-based Multi-Object 6D Pose Estimation using Keypoint Regression

05/05/2022
by   Arash Amini, et al.
6D object pose estimation is a crucial prerequisite for autonomous robot manipulation applications. The state-of-the-art models for pose estimation are based on convolutional neural networks (CNNs). Lately, the Transformer, an architecture originally proposed for natural language processing, has been achieving state-of-the-art results in many computer vision tasks as well. Equipped with the multi-head self-attention mechanism, Transformers enable simple single-stage end-to-end architectures for learning object detection and 6D object pose estimation jointly. In this work, we propose YOLOPose (short for You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose estimation method based on keypoint regression. Instead of predicting keypoints in an image through the standard heatmaps, we regress the keypoints directly. Additionally, we employ a learnable orientation estimation module to predict the orientation from the keypoints. Together with a separate translation estimation module, this makes our model end-to-end differentiable. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
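
As a rough illustration of the quantities the abstract refers to, the sketch below applies a 6D pose (a rotation and a translation) to a set of 3D keypoints and projects them into the image with a pinhole camera model. It is not the YOLOPose model. The choice of bounding-box corners as keypoints, the camera intrinsics `fx`, `fy`, `cx`, `cy`, and all function names are assumptions made for this example only.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix for a rotation about the camera z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def transform_point(R, t, p):
    """Apply the rigid transform p_cam = R @ p + t to one 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def project_point(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def box_corners(half_extent):
    """Eight corners of an axis-aligned cube centred on the object origin.

    Using cube corners as keypoints is an assumption of this sketch.
    """
    h = half_extent
    return [[sx * h, sy * h, sz * h]
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def project_keypoints(R, t, keypoints_3d, fx, fy, cx, cy):
    """Project object-frame 3D keypoints into the image under pose (R, t)."""
    return [project_point(transform_point(R, t, p), fx, fy, cx, cy)
            for p in keypoints_3d]


# Hypothetical pose and intrinsics, chosen only to make the example concrete.
R = rotation_z(math.radians(30.0))
t = [0.0, 0.0, 1.0]  # object one unit in front of the camera
pixels = project_keypoints(R, t, box_corners(0.05),
                           fx=500.0, fy=500.0, cx=320.0, cy=240.0)
for u, v in pixels:
    print(f"{u:.1f}, {v:.1f}")
```

A keypoint-regression method predicts 2D locations like those in `pixels` directly, rather than reading them off per-keypoint heatmaps. Recovering the orientation and translation then amounts to inverting this projection, which the abstract says is handled by a learnable orientation module and a separate translation module.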

07/21/2023

YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation

6D object pose estimation is a crucial prerequisite for autonomous robot...
09/22/2021

T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression

6D pose estimation is the task of predicting the translation and orienta...
03/22/2021

End-to-End Trainable Multi-Instance Pose Estimation with Transformers

We propose a new end-to-end trainable approach for multi-instance pose e...
12/09/2021

PE-former: Pose Estimation Transformer

Vision transformer architectures have been demonstrated to work very eff...
12/12/2016

PoseAgent: Budget-Constrained 6D Object Pose Estimation via Reinforcement Learning

State-of-the-art computer vision algorithms often achieve efficiency by ...
11/21/2022

Simultaneous Multiple Object Detection and Pose Estimation using 3D Model Infusion with Monocular Vision

Multiple object detection and pose estimation are vital computer vision ...