Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

01/27/2020
by   Jonathan Munro, et al.

Fine-grained action recognition datasets exhibit environmental bias, where multiple video sequences are captured from a limited number of environments. Training a model in one environment and deploying it in another results in a drop in performance due to an unavoidable domain shift. Unsupervised Domain Adaptation (UDA) approaches have frequently utilised adversarial training between the source and target domains. However, these approaches have not explored the multi-modal nature of video within each domain. In this work, we exploit the correspondence of modalities as a self-supervised alignment approach for UDA, in addition to adversarial alignment. We test our approach on three kitchens from our large-scale dataset, EPIC-Kitchens, using two modalities commonly employed for action recognition: RGB and Optical Flow. We show that multi-modal self-supervision alone improves the performance over source-only training by 2.4% on average. We then combine adversarial training with multi-modal self-supervision, showing that our approach outperforms other UDA methods by 3%.
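The abstract does not spell out how modality correspondence is turned into a training signal, so the following is only a minimal sketch under stated assumptions. It poses correspondence as a binary task: an RGB clip paired with its own Optical Flow is a positive, and the same RGB clip paired with Flow from a different clip is a negative. The function names, the one-negative-per-clip sampling, and the loss weights `lambda_adv` and `lambda_corr` are all hypothetical and are not taken from the paper.

```python
import random


def make_correspondence_pairs(num_clips, rng=None):
    """Build (rgb_index, flow_index, label) triples for a batch of clips.

    label 1: RGB and Flow come from the same clip (corresponding).
    label 0: Flow is drawn from a different clip (non-corresponding).
    Sampling one negative per clip is an assumption made for illustration.
    """
    if num_clips < 2:
        raise ValueError("need at least two clips to form a mismatched pair")
    rng = rng or random.Random(0)
    pairs = []
    for i in range(num_clips):
        pairs.append((i, i, 1))  # positive: matching modalities
        j = rng.randrange(num_clips - 1)
        if j >= i:
            j += 1  # shift past i so the negative never matches
        pairs.append((i, j, 0))  # negative: mismatched modalities
    return pairs


def combined_loss(classification, adversarial, correspondence,
                  lambda_adv=1.0, lambda_corr=1.0):
    """Weighted sum of the three loss terms (weights are hypothetical).

    Assumes the adversarial term is already signed appropriately,
    for example by a gradient reversal layer upstream.
    """
    return classification + lambda_adv * adversarial + lambda_corr * correspondence


if __name__ == "__main__":
    for rgb_idx, flow_idx, label in make_correspondence_pairs(4):
        print(rgb_idx, flow_idx, label)
```

Because the correspondence labels are derived from the data itself, pairs like these can be built for unlabelled target-domain clips as well as labelled source clips, which is what makes the signal usable for UDA.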


research
06/03/2022

Team VI-I2R Technical Report on EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021

In this report, we present the technical details of our approach to the ...
research
08/26/2021

Learning Cross-modal Contrastive Features for Video Domain Adaptation

Learning transferable and domain adaptive feature representations from v...
research
06/18/2021

EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report

In this report, we describe the technical details of our submission to t...
research
09/09/2022

PoliTO-IIT-CINI Submission to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition

In this report, we describe the technical details of our submission to t...
research
09/20/2019

Coupled Generative Adversarial Network for Continuous Fine-grained Action Segmentation

We propose a novel conditional GAN (cGAN) model for continuous fine-grai...
research
08/17/2023

A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

Body language (BL) refers to the non-verbal communication expressed thro...
research
10/18/2016

From Traditional to Modern : Domain Adaptation for Action Classification in Short Social Video Clips

Short internet video clips like vines present a significantly wild distr...
