Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks

11/03/2022
by Matthew Kowal, et al.

There is limited understanding of the information captured by deep spatiotemporal models in their intermediate representations. For example, while evidence suggests that action recognition algorithms are heavily influenced by the visual appearance of single frames, no quantitative methodology exists for evaluating such static bias in the latent representation relative to bias toward dynamics. We address this challenge by proposing an approach for quantifying the static and dynamic biases of any spatiotemporal model, and apply it to three tasks: action recognition, automatic video object segmentation (AVOS), and video instance segmentation (VIS). Our key findings are: (i) most examined models are biased toward static information; (ii) some datasets assumed to be biased toward dynamics are actually biased toward static information; (iii) individual channels in an architecture can be biased toward static information, dynamic information, or a combination of the two; and (iv) most models converge to their final biases in the first half of training. We then explore how these biases affect performance on dynamically biased datasets. For action recognition, we propose StaticDropout, a semantically guided dropout that debiases a model from static information toward dynamics. For AVOS, we design a combination of fusion and cross-connection layers that improves on previous architectures.
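The abstract does not spell out how StaticDropout is implemented, but the general idea of a semantically guided dropout can be sketched as follows: given a per-channel static-bias score, drop only the channels whose score exceeds a threshold, with standard inverted-dropout rescaling so the expected activation is preserved. This is a minimal illustrative sketch, not the paper's method; the function name `static_dropout` and the `threshold` and `drop_prob` parameters are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def static_dropout(features, static_bias, threshold=0.7, drop_prob=0.5, training=True):
    """Dropout applied only to channels deemed static-biased.

    features:    (batch, channels) activations.
    static_bias: (channels,) scores in [0, 1]; higher = more static-biased
                 (how these scores are obtained is the paper's contribution
                 and is not reproduced here).
    """
    if not training:
        return features  # no-op at inference, like standard dropout
    # Channels eligible for dropping: those above the static-bias threshold.
    biased = static_bias > threshold
    # Drop each eligible channel independently with probability drop_prob.
    drop = biased & (rng.random(static_bias.shape) < drop_prob)
    mask = np.where(drop, 0.0, 1.0)
    # Inverted-dropout rescaling on eligible channels keeps their
    # expected activation unchanged; unbiased channels pass through as-is.
    scale = np.where(biased, 1.0 / (1.0 - drop_prob), 1.0)
    return features * mask * scale
```

For example, with `static_bias = [0.9, 0.1, 0.8, 0.2]` and the defaults above, channels 1 and 3 are never touched, while channels 0 and 2 are each either zeroed or doubled during training, nudging the model to rely more on the surviving (dynamics-oriented) channels.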


Related research

06/06/2022 - A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information
  Deep spatiotemporal models are used in a variety of computer vision task...

07/13/2022 - Is Appearance Free Action Recognition Possible?
  Intuition might suggest that motion and dynamic information are key to v...

11/23/2022 - Evaluating and Mitigating Static Bias of Action Representations in the Background and the Foreground
  Deep neural networks for video action recognition easily learn to utiliz...

11/23/2022 - Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
  Static appearance of video may impede the ability of a deep neural netwo...

01/21/2019 - Semantic Image Networks for Human Action Recognition
  In this paper, we propose the use of a semantic image, an improved repre...

10/22/2020 - Learning to Sort Image Sequences via Accumulated Temporal Differences
  Consider a set of n images of a scene with dynamic objects captured with...

11/18/2017 - Excitation Backprop for RNNs
  Deep models are state-of-the-art for many vision tasks including video a...
