Audio-Adaptive Activity Recognition Across Video Domains

03/27/2022
by   Yunhua Zhang, et al.

This paper addresses activity recognition under domain shift, caused for example by a change of scenery or camera viewpoint. The leading approaches reduce the shift in activity appearance through adversarial training and self-supervised learning. Unlike these vision-focused works, we leverage activity sounds for domain adaptation, as they vary less across domains and can reliably indicate which activities are not happening. We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation and address shifts in the semantic distribution. To further eliminate domain-specific features and include domain-invariant activity sounds for recognition, we propose an audio-infused recognizer, which models the cross-modal interaction across domains. We also introduce the new task of actor shift, with a corresponding audio-visual dataset, to challenge our method with situations where the activity appearance changes dramatically. Experiments on this dataset, EPIC-Kitchens and CharadesEgo show the effectiveness of our approach.
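The abstract does not say how sound is used to indicate "which activities are not happening". As a rough illustration only, and not the authors' implementation, one possible reading is to treat the classes an audio model scores lowest as pseudo-absent and to penalise a visual model for assigning probability to them. The function name `absent_activity_loss`, the parameter `k`, and the loss form below are all assumptions made for this sketch.

```python
import math

def absent_activity_loss(visual_probs, audio_probs, k=2, eps=1e-8):
    """Hypothetical sketch: penalise visual confidence on classes that
    the audio predictions rank as least likely (pseudo-absent classes).

    visual_probs, audio_probs: lists of per-sample class probabilities.
    k: assumed number of lowest-scoring audio classes treated as absent.
    """
    total, count = 0.0, 0
    for v_row, a_row in zip(visual_probs, audio_probs):
        # Classes with the k smallest audio probabilities.
        absent = sorted(range(len(a_row)), key=lambda c: a_row[c])[:k]
        for c in absent:
            # -log(1 - p) grows as the visual model favours an absent class.
            total += -math.log(max(1.0 - v_row[c], eps))
            count += 1
    return total / count

# Toy example with four activity classes and one clip.
audio = [[0.70, 0.20, 0.05, 0.05]]
visual_agrees = [[0.90, 0.10, 0.00, 0.00]]
visual_disagrees = [[0.10, 0.10, 0.40, 0.40]]
loss_agrees = absent_activity_loss(visual_agrees, audio)
loss_disagrees = absent_activity_loss(visual_disagrees, audio)
```

In this toy setting the loss is lower when the visual predictions put no mass on the classes the audio rules out, which is the intuition the abstract points to. How the paper actually selects absent activities and weights the loss is not stated here.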


