Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment

06/03/2021
by   Mirco Planamente, et al.

First person action recognition is an increasingly researched topic because of the growing popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic environmental bias. This strongly affects the ability to generalize to unseen scenarios, limiting the application of current methods in real settings where trimmed labeled data are not available during training. In this work, we propose to leverage the intrinsic complementary nature of audio-visual signals to learn a representation that works well on data seen during training, while being able to generalize across different domains. To this end, we introduce an audio-visual loss that aligns the contributions from the two modalities by acting on the magnitude of their feature norms. This new loss, plugged into a minimal multi-modal action recognition architecture, leads to strong results in cross-domain first person action recognition, as demonstrated by extensive experiments on the popular EPIC-Kitchens dataset.
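
The abstract does not state the formula of the audio-visual loss, so the following is only a minimal sketch of one way to align feature norm magnitudes across two modalities. It assumes the loss is the squared deviation from 1 of the ratio between the mean L2 norm of the visual features and the mean L2 norm of the audio features. The function names, the `eps` parameter, and the toy feature values are all hypothetical and are not taken from the paper.

```python
import math


def l2_norm(vec):
    """L2 norm of a single feature vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def mean_feature_norm(batch):
    """Average L2 norm over a batch of feature vectors."""
    return sum(l2_norm(v) for v in batch) / len(batch)


def relative_norm_alignment_loss(visual_feats, audio_feats, eps=1e-8):
    """Assumed form of the loss: (mean visual norm / mean audio norm - 1)^2.

    The loss is zero when both modalities have the same mean feature norm
    and grows as one modality dominates the other in magnitude.
    `eps` (hypothetical) guards against division by zero.
    """
    ratio = mean_feature_norm(visual_feats) / (mean_feature_norm(audio_feats) + eps)
    return (ratio - 1.0) ** 2


# Toy features (made up for illustration).
visual = [[3.0, 4.0], [6.0, 8.0]]  # norms 5 and 10, mean 7.5
audio = [[0.6, 0.8], [1.2, 1.6]]   # norms 1 and 2, mean 1.5

loss_unbalanced = relative_norm_alignment_loss(visual, audio)  # ratio close to 5
loss_balanced = relative_norm_alignment_loss(visual, visual)   # ratio close to 1
```

In a training setup, a term like this would typically be added to the classification loss with a weighting coefficient, so that neither modality's features dominate the fused prediction purely through their magnitude.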


research
10/19/2021

Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition

First person action recognition is becoming an increasingly researched a...
research
03/03/2021

Domain and View-point Agnostic Hand Action Recognition

Hand action recognition is a special case of human action recognition wi...
research
08/22/2019

EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition

We focus on multi-modal fusion for egocentric action recognition, and pr...
research
08/26/2021

Learning Cross-modal Contrastive Features for Video Domain Adaptation

Learning transferable and domain adaptive feature representations from v...
research
07/20/2022

Collaborating Domain-shared and Target-specific Feature Clustering for Cross-domain 3D Action Recognition

In this work, we consider the problem of cross-domain 3D action recognit...
research
03/22/2018

Towards Universal Representation for Unseen Action Recognition

Unseen Action Recognition (UAR) aims to recognise novel action categorie...
research
03/03/2014

Multiview Hessian regularized logistic regression for action recognition

With the rapid development of social media sharing, people often need to...
