Evaluating and Mitigating Static Bias of Action Representations in the Background and the Foreground

11/23/2022
by Haoxin Li, et al.

Deep neural networks for video action recognition easily learn to rely on shortcut static features, such as background and objects, instead of motion features. This results in poor generalization to atypical videos, such as soccer played on concrete surfaces (instead of soccer fields). However, due to the rarity of out-of-distribution (OOD) data, quantitative evaluation of static bias remains a difficult task. In this paper, we synthesize new benchmarks to evaluate the static bias of action representations, including SCUB for static cues in the background and SCUF for static cues in the foreground. Further, we propose a simple yet effective video data augmentation technique, StillMix, that automatically identifies bias-inducing video frames; unlike similar augmentation techniques, StillMix does not need to enumerate or precisely segment biased content. With extensive experiments, we quantitatively compare and analyze existing action recognition models on the created benchmarks to reveal their characteristics. We validate the effectiveness of StillMix and show that it improves TSM (Lin, Gan, and Han 2021) and Video Swin Transformer (Liu et al. 2021) by more than 10% in action recognition.
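The abstract does not spell out how StillMix works, so the following is only a rough illustration of the general idea its name suggests: blending a single still frame into every frame of a training clip, so that the added content carries static cues but no motion. The function name `mix_still_frame`, the blending weight `lam`, and the array layout are all hypothetical choices made for this sketch; the step that identifies bias-inducing frames is not modeled, and none of this should be read as the authors' implementation.

```python
import numpy as np


def mix_still_frame(video, still_frame, lam):
    """Blend one still frame into every frame of a clip (illustrative only).

    video:       float array of shape (T, H, W, C)
    still_frame: float array of shape (H, W, C)
    lam:         weight kept for the original clip, in [0, 1] (assumed name)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Broadcasting repeats the still frame along the time axis, so the
    # blended-in signal is identical in every frame and adds no motion.
    return lam * video + (1.0 - lam) * still_frame[None, ...]


# Toy data standing in for a real clip and a real still frame.
rng = np.random.default_rng(0)
video = rng.random((8, 4, 4, 3))
still = rng.random((4, 4, 3))
mixed = mix_still_frame(video, still, lam=0.7)
```

In this sketch the frame-to-frame differences of `mixed` are just the clip's own differences scaled by `lam`, which is the sense in which the still frame contributes only static content.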


research
12/07/2020

VideoMix: Rethinking Data Augmentation for Video Classification

State-of-the-art video action classifiers often suffer from overfitting....
research
11/09/2022

Extending Temporal Data Augmentation for Video Action Recognition

Pixel space augmentation has grown in popularity in many Deep Learning a...
research
06/06/2022

A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information

Deep spatiotemporal models are used in a variety of computer vision task...
research
11/03/2022

Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks

There is limited understanding of the information captured by deep spati...
research
09/03/2023

SOAR: Scene-debiasing Open-set Action Recognition

Deep learning models have a risk of utilizing spurious clues to make pre...
research
03/24/2018

VOS-GAN: Adversarial Learning of Visual-Temporal Dynamics for Unsupervised Dense Prediction in Videos

Recent GAN-based video generation approaches model videos as the combina...
research
09/15/2022

Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition

We present a learning algorithm for human activity recognition in videos...
