PE-former: Pose Estimation Transformer

12/09/2021
by Paschalis Panteleris, et al.

Vision transformer architectures have been demonstrated to work very effectively for image classification tasks. Efforts to solve more challenging vision tasks with transformers typically rely on a convolutional backbone for feature extraction. In this paper we investigate the use of a pure transformer architecture (i.e., one with no CNN backbone) for the problem of 2D body pose estimation. We evaluate two ViT architectures on the COCO dataset. We demonstrate that using an encoder-decoder transformer architecture yields state-of-the-art results on this estimation problem.
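To illustrate the kind of design the abstract describes, the sketch below shows a minimal "pure transformer" encoder-decoder for 2D keypoint estimation in PyTorch: a ViT-style patch-embedding encoder (no convolutional backbone, only the standard strided linear patch projection) followed by a transformer decoder whose learned queries each regress one body keypoint. This is not the authors' implementation; all sizes (image and patch size, depths, widths, 17 COCO keypoints) are illustrative assumptions.

import torch
import torch.nn as nn


class PureTransformerPose(nn.Module):
    def __init__(self, img_size=192, patch_size=16, dim=256,
                 enc_layers=6, dec_layers=6, heads=8, num_keypoints=17):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # ViT-style patch embedding: a strided linear projection of RGB patches
        # (implemented as a conv for convenience; this is not a CNN backbone).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.randn(1, num_patches, dim) * 0.02)
        # Transformer encoder over the patch tokens.
        enc_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, enc_layers)
        # Learned queries, one per keypoint, decoded against the encoder memory
        # (DETR-style decoding).
        self.queries = nn.Parameter(torch.randn(1, num_keypoints, dim) * 0.02)
        dec_layer = nn.TransformerDecoderLayer(dim, heads, dim * 4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, dec_layers)
        # Each decoded query is mapped to a normalized (x, y) coordinate.
        self.coord_head = nn.Linear(dim, 2)

    def forward(self, images):                       # images: (B, 3, H, W)
        tokens = self.patch_embed(images)            # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)   # (B, N, dim)
        memory = self.encoder(tokens + self.pos_embed)
        queries = self.queries.expand(images.size(0), -1, -1)
        decoded = self.decoder(queries, memory)      # (B, K, dim)
        return self.coord_head(decoded).sigmoid()    # (B, K, 2), coords in [0, 1]


if __name__ == "__main__":
    model = PureTransformerPose()
    coords = model(torch.randn(2, 3, 192, 192))
    print(coords.shape)  # torch.Size([2, 17, 2])

In this sketch each of the 17 query embeddings learns to specialize for one body joint, so the decoder output can be supervised directly with a regression loss against ground-truth keypoint coordinates.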


