Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition

12/11/2019
by Jinwoo Choi, et al.

Human activities often occur in specific scene contexts, e.g., playing basketball on a basketball court. A model trained on existing video datasets thus inevitably captures and exploits this scene bias instead of the actual discriminative cues of the action. The learned representation may not generalize well to new action classes or different tasks. In this paper, we propose to mitigate scene bias for video representation learning. Specifically, we augment the standard cross-entropy loss for action classification with 1) an adversarial loss for scene types and 2) a human mask confusion loss for videos where the human actors are masked out. Together, these two losses encourage representations from which the scene type cannot be predicted, and from which the correct action cannot be predicted when the evidence for it (the human actor) is absent. We validate the effectiveness of our method by transferring our pre-trained model to three different tasks: action classification, temporal localization, and spatio-temporal action detection. Our results show consistent improvement over the baseline model without debiasing.
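The way the three loss terms combine can be illustrated with a small sketch. The abstract does not give the exact form of either auxiliary loss, so everything below is an assumption made for illustration: the adversarial term is modeled as the scene classifier's cross-entropy entering the feature extractor's objective with a flipped sign, the confusion term is modeled as the negative entropy of the action prediction on a human-masked video, and the names `lambda_scene`, `lambda_mask`, and `feature_extractor_objective` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard cross-entropy for one example with an integer class label."""
    return -math.log(softmax(logits)[label])


def negative_entropy(logits):
    """Negative entropy of the predicted distribution.

    Minimizing this pushes the prediction toward uniform, i.e. toward
    'confusion'. Assumed form; the abstract only names the loss.
    """
    return sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def feature_extractor_objective(action_logits, action_label,
                                scene_logits, scene_label,
                                masked_action_logits,
                                lambda_scene=0.5, lambda_mask=0.5):
    """Objective seen by the feature extractor (hypothetical weights).

    - action term: cross-entropy on the original video
    - scene term: subtracted, so the extractor is rewarded when the
      scene classifier fails (assumed adversarial formulation)
    - mask term: negative entropy of the action prediction on the
      human-masked video
    """
    l_action = cross_entropy(action_logits, action_label)
    l_scene = cross_entropy(scene_logits, scene_label)
    l_confusion = negative_entropy(masked_action_logits)
    return l_action - lambda_scene * l_scene + lambda_mask * l_confusion


# An uninformative prediction on the masked video gives a lower objective
# than a confident one, all else being equal.
confused = feature_extractor_objective([2.0, 0.0, 0.0], 0, [0.0, 0.0], 0,
                                       [0.0, 0.0, 0.0])
confident = feature_extractor_objective([2.0, 0.0, 0.0], 0, [0.0, 0.0], 0,
                                        [5.0, 0.0, 0.0])
print(confused < confident)
```

In an actual training loop the scene classifier itself would still minimize its own cross-entropy; only the feature extractor sees the sign-flipped term. Deep learning frameworks commonly realize this with a gradient reversal layer, which this plain-Python sketch does not attempt to reproduce.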


