Class Feature Pyramids for Video Explanation

09/18/2019
by Alexandros Stergiou et al.

Deep convolutional networks are widely used in video action recognition, and 3D convolutions are one prominent approach to handling the additional time dimension. While 3D convolutions typically lead to higher accuracies, the inner workings of the trained models are more difficult to interpret. We focus on creating human-understandable visual explanations that represent the hierarchical parts of spatio-temporal networks. We introduce Class Feature Pyramids, a method that traverses the entire network structure and incrementally discovers kernels at different network depths that are informative for a specific class. Our method does not depend on the network's architecture or the type of 3D convolutions: it supports grouped and depth-wise convolutions, convolutions in fibers, and convolutions in branches. We demonstrate the method on six state-of-the-art 3D convolutional neural networks (CNNs), using three action recognition datasets (Kinetics-400, UCF-101, and HMDB-51) and two egocentric action recognition datasets (EPIC-Kitchens and EGTEA Gaze+).
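The idea of traversing a network and incrementally picking out class-informative kernels at each depth can be illustrated with a small sketch. This is not the authors' implementation: the function name `class_kernel_pyramid`, the `links` matrices of kernel-to-kernel connection strengths, and the `keep_ratio` threshold are all assumptions made for illustration, and a real 3D CNN would need these strengths derived from its own weights and activations.

```python
import numpy as np

def class_kernel_pyramid(class_weights, links, keep_ratio=0.5):
    """Walk from one class back toward the input, keeping at each depth the
    kernels that contribute most to the kernels kept one layer above.

    class_weights: (K_top,) weights from the deepest layer's kernels to the class.
    links: list of (K_upper, K_lower) arrays, deepest layer first; links[i][u, l]
        is an assumed non-negative strength of lower kernel l feeding upper kernel u.
    keep_ratio: hypothetical threshold, as a fraction of the per-layer maximum score.
    Returns a list of kernel index arrays, deepest layer first.
    """
    # Only positive class weights count as evidence for the class.
    scores = np.maximum(np.asarray(class_weights, dtype=float), 0.0)
    selected = [np.flatnonzero(scores >= keep_ratio * scores.max())]
    for link in links:
        # Sum each lower kernel's strength over the kernels kept above it.
        scores = np.asarray(link, dtype=float)[selected[-1]].sum(axis=0)
        selected.append(np.flatnonzero(scores >= keep_ratio * scores.max()))
    return selected

# Toy example: 3 kernels in the deepest layer, 4 kernels in the layer below.
toy_class_weights = np.array([0.9, 0.1, 0.6])
toy_links = [np.array([[0.8, 0.1, 0.0, 0.3],
                       [0.5, 0.5, 0.5, 0.5],
                       [0.1, 0.0, 0.9, 0.2]])]
pyramid = class_kernel_pyramid(toy_class_weights, toy_links)
```

In this toy setup, `pyramid[0]` holds the kernels of the deepest layer kept for the class and `pyramid[1]` the kernels kept one layer below. Thresholding relative to the per-layer maximum is only one possible selection rule, chosen here to keep the sketch short.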
