A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information

06/06/2022
by   Matthew Kowal, et al.

Deep spatiotemporal models are used in a variety of computer vision tasks, such as action recognition and video object segmentation. Currently, there is a limited understanding of what information is captured by these models in their intermediate representations. For example, while it has been observed that action recognition algorithms are heavily influenced by visual appearance in single static frames, there is no quantitative methodology for evaluating such static bias in the latent representation compared to bias toward dynamic information (e.g. motion). We tackle this challenge by proposing a novel approach for quantifying the static and dynamic biases of any spatiotemporal model. To show the efficacy of our approach, we analyse two widely studied tasks, action recognition and video object segmentation. Our key findings are threefold: (i) Most examined spatiotemporal models are biased toward static information, although certain two-stream architectures with cross-connections show a better balance between the static and dynamic information captured. (ii) Some datasets that are commonly assumed to be biased toward dynamics are actually biased toward static information. (iii) Individual units (channels) in an architecture can be biased toward static information, dynamic information, or a combination of the two.
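The abstract does not detail the paper's estimator, but the general idea of probing a unit for static bias can be sketched as follows. This is a minimal illustrative example, not the authors' method: it assumes we have collected per-unit activations for a batch of original clips and for "static" counterparts (a single frame repeated over time), and it scores each unit by how strongly its activations correlate across the two conditions. A unit driven purely by appearance would score near 1; a unit driven by motion would score near 0.

```python
import numpy as np

def static_bias_score(acts_orig, acts_static):
    """Hypothetical per-unit static-bias probe (not the paper's estimator).

    acts_orig:   (num_clips, num_units) activations on original clips.
    acts_static: (num_clips, num_units) activations on the same clips with
                 a single frame repeated over time (all motion removed).
    Returns the Pearson correlation of each unit's activations across the
    two conditions: near 1 means the unit responds mainly to static
    appearance, near 0 means it responds mainly to dynamics.
    """
    a = acts_orig - acts_orig.mean(axis=0)
    b = acts_static - acts_static.mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0)) + 1e-8
    return num / den
```

In practice the activations would come from hooks on an intermediate layer of the spatiotemporal network; the correlation here stands in for the more careful information-theoretic accounting a real analysis would use.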


Related research:

- 11/03/2022 — Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks
- 07/13/2022 — Is Appearance Free Action Recognition Possible?
- 11/07/2016 — Spatiotemporal Residual Networks for Video Action Recognition
- 11/23/2022 — Evaluating and Mitigating Static Bias of Action Representations in the Background and the Foreground
- 11/18/2017 — Excitation Backprop for RNNs
- 01/04/2018 — What have we learned from deep representations for action recognition?
- 01/25/2022 — Semantically Video Coding: Instill Static-Dynamic Clues into Structured Bitstream for AI Tasks
