Is Appearance Free Action Recognition Possible?

07/13/2022
by Filip Ilic, et al.

Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absence makes it difficult to understand how well contemporary architectures capitalize on dynamic vs. static information. We respond with a novel Appearance Free Dataset (AFD) for action recognition. AFD is devoid of static information relevant to action recognition in a single frame. Modeling of the dynamics is necessary for solving the task, as the action is only apparent through consideration of the temporal dimension. We evaluated 11 contemporary action recognition architectures on AFD as well as its related RGB video. Our results show a notable decrease in performance for all architectures on AFD compared to RGB. We also conducted a complementary study with humans that shows their recognition accuracy on AFD and RGB is very similar and much better than that of the evaluated architectures on AFD. Our results motivate a novel architecture that revives explicit recovery of optical flow within a contemporary design for best performance on AFD and RGB.
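To make the idea of a video whose content is only visible through time more concrete, here is a toy sketch. It is not the authors' construction of AFD, which the abstract does not describe; it only assumes that a patch of random noise sliding over a fixed random-noise background is a reasonable stand-in. The function names (`make_noise`, `appearance_free_clip`, `changed_columns`) and all parameters are hypothetical. Each individual frame is just noise, yet comparing consecutive frames localizes the moving region.

```python
import random


def make_noise(height, width, rng):
    """Return a height x width grid of independent random gray values."""
    return [[rng.randint(0, 255) for _ in range(width)] for _ in range(height)]


def appearance_free_clip(num_frames=4, height=16, width=16, box=4, seed=0):
    """Toy clip (illustrative assumption, not the AFD pipeline).

    A box x box noise patch moves one pixel to the right per frame
    over a fixed noise background. Requires box + num_frames - 1 <= width.
    """
    rng = random.Random(seed)
    background = make_noise(height, width, rng)
    patch = make_noise(box, box, rng)
    top = (height - box) // 2
    frames = []
    for t in range(num_frames):
        frame = [row[:] for row in background]  # copy the static background
        for r in range(box):
            for c in range(box):
                frame[top + r][t + c] = patch[r][c]  # paste patch at offset t
        frames.append(frame)
    return frames


def changed_columns(frame_a, frame_b):
    """Columns in which two frames differ; this is purely temporal evidence."""
    width = len(frame_a[0])
    return [
        c for c in range(width)
        if any(row_a[c] != row_b[c] for row_a, row_b in zip(frame_a, frame_b))
    ]
```

Calling `changed_columns(frames[0], frames[1])` on the output of `appearance_free_clip()` only ever returns columns covered by the patch in one of the two frames, while inspecting either frame alone gives no hint of where the patch is.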
