Video based Object 6D Pose Estimation using Transformers

10/24/2022
by   Apoorva Beedu, et al.

We introduce VideoPose, a Transformer-based framework for 6D object pose estimation. It comprises an end-to-end attention-based architecture that attends to previous frames in order to estimate accurate 6D object poses in videos. Our approach leverages temporal information from a video sequence for pose refinement while remaining computationally efficient and robust. Compared to existing methods, our architecture captures and reasons over long-range dependencies efficiently, iteratively refining pose estimates across video sequences. Experimental evaluation on the YCB-Video dataset shows that our approach is on par with state-of-the-art Transformer methods and performs significantly better than CNN-based approaches. Further, at 33 fps, it is also more efficient and therefore suitable for applications that require real-time object pose estimation. Training code and pretrained models are available at https://github.com/ApoorvaBeedu/VideoPose
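To make the idea of attending to previous frames concrete, the following is a minimal sketch of causal scaled dot-product attention over per-frame feature vectors. It is an illustration under stated assumptions, not the VideoPose architecture: the function name `attend_to_previous_frames`, the toy 4-dimensional features, and the use of raw features as queries, keys, and values (with no learned projections) are all hypothetical choices made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_previous_frames(frame_features, t):
    """Fuse the feature of frame t with frames 0..t via dot-product attention.

    Assumption: features double as queries, keys, and values. A real model
    would apply learned linear projections before this step.
    """
    query = frame_features[t]
    dim = len(query)
    keys = frame_features[: t + 1]  # causal: only the current and earlier frames
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys
    ]
    weights = softmax(scores)
    fused = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
    return weights, fused


# Hypothetical per-frame features for a three-frame clip.
features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
weights, fused = attend_to_previous_frames(features, t=2)
```

The attention weights sum to one, and frames whose features resemble the current frame receive more weight. The fused vector is what a downstream pose head would consume in a design of this kind.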


research
08/03/2018

Real-Time Object Pose Estimation with Pose Interpreter Networks

In this work, we introduce pose interpreter networks for 6-DoF object po...
research
08/22/2019

Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation

Existing deep learning approaches on 3d human pose estimation for videos...
research
08/22/2022

PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling

Training state-of-the-art models for human pose estimation in videos req...
research
10/21/2022

CRT-6D: Fast 6D Object Pose Estimation with Cascaded Refinement Transformers

Learning based 6D object pose estimation methods rely on computing large...
research
09/01/2020

LiftFormer: 3D Human Pose Estimation using attention models

Estimating the 3D position of human joints has become a widely researche...
research
02/09/2023

HybrIK-Transformer

HybrIK relies on a combination of analytical inverse kinematics and deep...
research
10/11/2021

Adaptively Multi-view and Temporal Fusing Transformer for 3D Human Pose Estimation

In practical application, 3D Human Pose Estimation (HPE) is facing with ...
