Large-scale Robustness Analysis of Video Action Recognition Models

07/04/2022
by   Madeline C. Schiappa, et al.

Video action recognition has made great progress in recent years. Several models based on convolutional neural networks (CNNs), along with some recent transformer-based approaches, provide state-of-the-art performance on existing benchmark datasets. However, the robustness of these models has not been studied at large scale, even though it is a critical aspect for real-world applications. In this work, we perform a large-scale robustness analysis of existing models for video action recognition. We focus mainly on robustness against distribution shifts caused by real-world perturbations rather than adversarial perturbations. We propose four benchmark datasets, HMDB-51P, UCF-101P, Kinetics-400P, and SSv2P, and study the robustness of six state-of-the-art action recognition models against 90 different perturbations. The study reveals some interesting findings: 1) transformer-based models are consistently more robust than CNN-based models against most of the perturbations; 2) pretraining helps transformer-based models to be more robust to different perturbations than CNN-based models; and 3) all of the studied models are robust to temporal perturbations on the Kinetics dataset but not on SSv2, which suggests that temporal information is much more important for action label prediction on the SSv2 dataset than on the Kinetics dataset. We hope that this study will serve as a benchmark for future research in robust video action recognition. More details about the project are available at https://rose-ar.github.io/.
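To make the kind of evaluation described above concrete, the following is a minimal sketch, not the authors' code. It assumes a clip is a NumPy array of shape (T, H, W, C) with pixel values in [0, 1]. The helper names (`shuffle_frames`, `add_gaussian_noise`, `relative_robustness`) and the relative-drop score are hypothetical illustrations; the abstract does not specify the 90 perturbations or the robustness metric actually used.

```python
import numpy as np


def shuffle_frames(clip, rng):
    """Temporal perturbation (illustrative): randomly permute frame order."""
    return clip[rng.permutation(clip.shape[0])]


def add_gaussian_noise(clip, sigma, rng):
    """Spatial perturbation (illustrative): add zero-mean Gaussian noise."""
    noisy = clip + rng.normal(0.0, sigma, size=clip.shape)
    return np.clip(noisy, 0.0, 1.0)


def accuracy(predict, clips, labels):
    """Fraction of clips for which predict(clip) matches the label."""
    preds = np.array([predict(c) for c in clips])
    return float(np.mean(preds == np.asarray(labels)))


def relative_robustness(acc_clean, acc_perturbed):
    """Hypothetical score: 1.0 means no accuracy drop, 0.0 means total loss."""
    if acc_clean == 0:
        return 0.0
    return 1.0 - (acc_clean - acc_perturbed) / acc_clean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: dark clips are class 0, bright clips are class 1.
    clips = [np.full((8, 4, 4, 3), v) for v in (0.2, 0.8, 0.2, 0.8)]
    labels = [0, 1, 0, 1]

    def predict(clip):
        # Toy stand-in for a model: ignores frame order entirely.
        return int(clip.mean() > 0.5)

    acc_clean = accuracy(predict, clips, labels)
    shuffled = [shuffle_frames(c, rng) for c in clips]
    acc_shuffled = accuracy(predict, shuffled, labels)
    print(relative_robustness(acc_clean, acc_shuffled))
```

The toy predictor ignores frame order, so frame shuffling cannot change its predictions. This mirrors the abstract's reading of the Kinetics result: a model that stays accurate under temporal perturbation is likely relying little on temporal information for that dataset.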


research
10/22/2020

Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

In recent years, a number of approaches based on 2D CNNs and 3D CNNs hav...
research
07/05/2022

Multi-modal Robustness Analysis Against Language and Visual Perturbations

Joint visual and language modeling on large-scale datasets has recently ...
research
08/09/2022

Sports Video Analysis on Large-Scale Data

This paper investigates the modeling of automated machine description on...
research
10/13/2021

Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions

The state-of-the-art deep neural networks are vulnerable to common corru...
research
12/22/2021

Recur, Attend or Convolve? Frame Dependency Modeling Matters for Cross-Domain Robustness in Action Recognition

Most action recognition models today are highly parameterized, and evalu...
research
02/19/2022

Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study

While action recognition (AR) has gained large improvements with the int...
research
04/25/2022

Temporal Relevance Analysis for Video Action Models

In this paper, we provide a deep analysis of temporal modeling for actio...
