HeadPosr: End-to-end Trainable Head Pose Estimation using Transformer Encoders

02/07/2022
by Naina Dhingra, et al.

In this paper, HeadPosr is proposed to predict head poses from a single RGB image. HeadPosr uses a novel architecture that includes a transformer encoder. Concretely, it consists of: (1) a backbone; (2) a connector; (3) a transformer encoder; (4) a prediction head. The significance of using a transformer encoder for HPE is studied. An extensive ablation study is performed by varying (1) the number of encoders, (2) the number of heads, (3) the position embeddings, (4) the activations, and (5) the input channel size of the transformer used in HeadPosr. Further studies on (1) different backbones and (2) different learning rates are also presented. The experiments and ablation studies are conducted on three widely used open-source HPE datasets: 300W-LP, AFLW2000, and BIWI. The experiments show that, when trained on 300W-LP, HeadPosr outperforms all state-of-the-art methods, both landmark-free methods and those based on landmarks or depth estimation, on the AFLW2000 and BIWI datasets. It also outperforms them when averaging results across the compared datasets, thereby setting a benchmark for HPE and demonstrating the effectiveness of transformers over the state of the art.
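The four-stage pipeline named in the abstract (backbone, connector, transformer encoder, prediction head) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's actual model: the backbone is replaced by simple average pooling, the attention has no learned projections, the feed-forward block is a stand-in, and all layer sizes and weight initializations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, num_heads):
    """Toy multi-head self-attention (no learned Q/K/V projections)."""
    seq, d = x.shape
    dh = d // num_heads
    out = np.zeros_like(x)
    for h in range(num_heads):
        xh = x[:, h * dh:(h + 1) * dh]
        attn = softmax(xh @ xh.T / np.sqrt(dh))   # scaled dot-product weights
        out[:, h * dh:(h + 1) * dh] = attn @ xh
    return out

def transformer_encoder(x, num_layers=3, num_heads=4):
    for _ in range(num_layers):
        x = x + self_attention(x, num_heads)      # attention + residual
        x = x + np.tanh(x)                        # stand-in feed-forward + residual
    return x

def headposr_sketch(image, d_model=32, num_heads=4):
    # (1) backbone: a real CNN would go here; average pooling to an 8x8
    #     feature map is a stand-in
    h, w, c = image.shape
    feat = image.reshape(8, h // 8, 8, w // 8, c).mean(axis=(1, 3))  # (8, 8, c)
    tokens = feat.reshape(-1, c)                                     # (64, c)
    # (2) connector: project channels to d_model and add position embeddings
    w_proj = rng.standard_normal((c, d_model)) / np.sqrt(c)
    pos = rng.standard_normal((tokens.shape[0], d_model)) * 0.02
    x = tokens @ w_proj + pos
    # (3) transformer encoder over the token sequence
    x = transformer_encoder(x, num_heads=num_heads)
    # (4) prediction head: pool tokens, regress three angles (yaw, pitch, roll)
    w_head = rng.standard_normal((d_model, 3)) / np.sqrt(d_model)
    return x.mean(axis=0) @ w_head

angles = headposr_sketch(rng.standard_normal((64, 64, 3)))
print(angles.shape)  # (3,)
```

In the actual method, each stage would be learned end-to-end by regressing the predicted angles against ground-truth head poses; the sketch only shows how data flows through the four components.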


Related research

- LwPosr: Lightweight Efficient Fine-Grained Head Pose Estimation (02/07/2022). This paper presents a lightweight network for head pose estimation (HPE)...
- PE-former: Pose Estimation Transformer (12/09/2021). Vision transformer architectures have been demonstrated to work very eff...
- End-to-End Trainable Multi-Instance Pose Estimation with Transformers (03/22/2021). We propose a new end-to-end trainable approach for multi-instance pose e...
- AdaBins: Depth Estimation using Adaptive Bins (11/28/2020). We address the problem of estimating a high quality dense depth map from...
- A Conditional Generative Chatbot using Transformer Model (06/03/2023). A Chatbot serves as a communication tool between a human user and a mach...
- K-Order Graph-oriented Transformer with GraAttention for 3D Pose and Shape Estimation (08/24/2022). We propose a novel attention-based 2D-to-3D pose estimation network for ...
- Transformer-based encoder-encoder architecture for Spoken Term Detection (11/02/2022). The paper presents a method for spoken term detection based on the Trans...
