ST-ABN: Visual Explanation Taking into Account Spatio-temporal Information for Video Recognition

10/29/2021
by Masahiro Mitsuhara, et al.

It is difficult for humans to interpret the decision-making in the inference process of deep neural networks. Visual explanation is one approach to interpreting such decisions: for 2D CNNs, it visualizes an attention map that highlights the discriminative regions behind a prediction. Visual explanation for video recognition is more difficult than for still images because both spatial and temporal information must be considered. In this paper, we propose a visual explanation method for video recognition called the spatio-temporal attention branch network (ST-ABN), which provides visual explanations for both spatial and temporal information. ST-ABN estimates the importance of spatial and temporal information during network inference and feeds it back into the recognition process, improving both recognition performance and visual explainability. Experimental results on the Something-Something V1 and V2 datasets demonstrate that ST-ABN provides visual explanations that account for spatial and temporal information simultaneously while also improving recognition performance.
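The abstract describes ST-ABN as estimating the importance of spatial and temporal information during inference and feeding it back into recognition. The sketch below illustrates the general idea of spatio-temporal attention reweighting on a 3D-CNN feature map; the module structure, layer sizes, and residual-style reweighting are assumptions for illustration and are not the authors' ST-ABN architecture.

```python
# Minimal sketch of spatio-temporal attention reweighting for a feature map
# of shape (B, C, T, H, W) from a 3D CNN backbone. Illustrative only; this
# does not reproduce ST-ABN, and all layer choices here are assumptions.
import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Spatial branch: per-location importance map over (H, W) for each frame.
        self.spatial = nn.Sequential(
            nn.Conv3d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # Temporal branch: per-frame importance from globally pooled features.
        self.temporal = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W)
        spatial_map = self.spatial(x)                # (B, 1, T, H, W)
        pooled = x.mean(dim=(3, 4)).transpose(1, 2)  # (B, T, C)
        temporal_w = self.temporal(pooled)           # (B, T, 1)
        temporal_w = temporal_w.transpose(1, 2).unsqueeze(-1).unsqueeze(-1)  # (B, 1, T, 1, 1)
        # Residual-style reweighting so attention refines rather than replaces features.
        return x * (1.0 + spatial_map * temporal_w)

# Usage (hypothetical backbone producing (B, C, T, H, W) features):
#   feats = backbone(video)
#   refined = SpatioTemporalAttention(feats.shape[1])(feats)
```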

Related research

STH: Spatio-Temporal Hybrid Convolution for Efficient Action Recognition (03/18/2020)
Effective and Efficient spatio-temporal modeling is essential for action...

Spatial-temporal Concept based Explanation of 3D ConvNets (06/09/2022)
Recent studies have achieved outstanding success in explaining 2D image ...

DRIVE: Deep Reinforced Accident Anticipation with Visual Explanation (07/21/2021)
Traffic accident anticipation aims to accurately and promptly predict th...

Examining the Difference Among Transformers and CNNs with Explanation Methods (12/13/2022)
We propose a methodology that systematically applies deep explanation al...

Explaining Motion Relevance for Activity Recognition in Video Deep Learning Models (03/31/2020)
A small subset of explainability techniques developed initially for imag...

Discriminating Spatial and Temporal Relevance in Deep Taylor Decompositions for Explainable Activity Recognition (08/05/2019)
Current techniques for explainable AI have been applied with some succes...

Motion Compensated Frequency Selective Extrapolation for Error Concealment in Video Coding (07/01/2022)
Although wireless and IP-based access to video content gives a new degre...