Gaze Estimation using Transformer

05/30/2021
by   Yihua Cheng, et al.
0

Recent work has proven the effectiveness of transformers in many computer vision tasks. However, the performance of transformers in gaze estimation is still unexplored. In this paper, we employ transformers and assess their effectiveness for gaze estimation. We consider two forms of vision transformer which are pure transformers and hybrid transformers. We first follow the popular ViT and employ a pure transformer to estimate gaze from images. On the other hand, we preserve the convolutional layers and integrate CNNs as well as transformers. The transformer serves as a component to complement CNNs. We compare the performance of the two transformers in gaze estimation. The Hybrid transformer significantly outperforms the pure transformer in all evaluation datasets with less parameters. We further conduct experiments to assess the effectiveness of the hybrid transformer and explore the advantage of self-attention mechanism. Experiments show the hybrid transformer can achieve state-of-the-art performance in all benchmarks with pre-training.To facilitate further research, we release codes and models in https://github.com/yihuacheng/GazeTR.

READ FULL TEXT
research
06/04/2021

Glance-and-Gaze Vision Transformer

Recently, there emerges a series of vision Transformers, which show supe...
research
03/03/2022

Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work

Vision Transformers (ViTs) are becoming more popular and dominating tech...
research
10/23/2022

Transformers For Recognition In Overhead Imagery: A Reality Check

There is evidence that transformers offer state-of-the-art recognition p...
research
06/12/2023

Mitigating Transformer Overconfidence via Lipschitz Regularization

Though Transformers have achieved promising results in many computer vis...
research
03/22/2023

Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction

Recently, vision transformers (ViT) have replaced convolutional neural n...
research
03/17/2022

SepTr: Separable Transformer for Audio Spectrogram Processing

Following the successful application of vision transformers in multiple ...
research
03/30/2022

Collaborative Transformers for Grounded Situation Recognition

Grounded situation recognition is the task of predicting the main activi...

Please sign up or login with your details

Forgot password? Click here to reset