Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective

05/25/2023
by Thanh-Dat Truong, et al.

Understanding action recognition in egocentric videos has emerged as a vital research topic with numerous practical applications. Because egocentric data collection remains limited in scale, learning robust deep learning-based action recognition models is still difficult. Transferring knowledge learned from large-scale exocentric data to egocentric data is challenging due to the differences between videos captured from the two views. Our work introduces a novel cross-view learning approach to action recognition (CVAR) that effectively transfers knowledge from the exocentric to the egocentric view. First, we introduce a novel geometric-based constraint into the self-attention mechanism of the Transformer, derived from analyzing the camera positions of the two views. Then, we propose a new cross-view self-attention loss, learned on unpaired cross-view data, that enforces the self-attention mechanism to learn to transfer knowledge across views. Finally, to further improve the performance of our cross-view learning approach, we present metrics that effectively measure the correlations between videos and attention maps. Experimental results on standard egocentric action recognition benchmarks, i.e., Charades-Ego, EPIC-Kitchens-55, and EPIC-Kitchens-100, have shown our approach's effectiveness and state-of-the-art performance.
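As a rough illustration of the cross-view self-attention loss idea on unpaired data, the sketch below compares batch-averaged Transformer attention maps from exocentric and egocentric clips using a symmetric KL divergence. This is a minimal sketch under stated assumptions: the function names, the use of mean attention maps as view-level statistics, and the KL-based correlation metric are illustrative choices, not the paper's actual formulation.

```python
# Illustrative sketch (not the authors' code): a cross-view attention loss that
# encourages attention maps computed on unpaired exocentric and egocentric clips
# to share a similar structure. Names and the choice of metric are assumptions.
import torch
import torch.nn.functional as F


def attention_map(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Standard scaled dot-product attention weights: (batch, tokens, tokens)."""
    scale = q.shape[-1] ** -0.5
    return torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)


def cross_view_attention_loss(attn_exo: torch.Tensor, attn_ego: torch.Tensor) -> torch.Tensor:
    """Match the batch-averaged attention distributions of the two views.

    Because the clips are unpaired, we compare view-level statistics
    (mean attention maps) rather than per-sample maps, and use a symmetric
    KL divergence as an assumed correlation metric.
    """
    mean_exo = attn_exo.mean(dim=0).clamp_min(1e-8)
    mean_ego = attn_ego.mean(dim=0).clamp_min(1e-8)
    kl_fwd = F.kl_div(mean_ego.log(), mean_exo, reduction="batchmean")
    kl_bwd = F.kl_div(mean_exo.log(), mean_ego, reduction="batchmean")
    return 0.5 * (kl_fwd + kl_bwd)


if __name__ == "__main__":
    # Toy usage: random queries/keys standing in for Transformer features
    # of unpaired exocentric and egocentric video tokens.
    q_exo, k_exo = torch.randn(4, 16, 64), torch.randn(4, 16, 64)
    q_ego, k_ego = torch.randn(4, 16, 64), torch.randn(4, 16, 64)
    loss = cross_view_attention_loss(attention_map(q_exo, k_exo),
                                     attention_map(q_ego, k_ego))
    print(loss.item())
```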
