Blockwise Temporal-Spatial Pathway Network

08/05/2022
by SeulGi Hong, et al.

Video action recognition requires modeling not only spatial information but also temporal relations, which remains challenging. We propose a 3D-CNN-based action recognition model, the blockwise temporal-spatial pathway network (BTSNet), which adjusts its temporal and spatial receptive fields through multiple pathways. The design is inspired by adaptive kernel selection, an image-recognition architecture for effective feature encoding that adaptively chooses spatial receptive fields. Extending this approach to the temporal domain, our model extracts temporal and channel-wise attention and uses it to fuse information from several candidate operations. We evaluated the model on the UCF-101, HMDB-51, SVW, and Epic-Kitchen datasets and showed that it generalizes well without pretraining. BTSNet also provides interpretable visualizations based on its spatiotemporal channel-wise attention, and these visualizations confirm that the blockwise temporal-spatial pathway yields a better representation for 3D convolutional blocks.
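Adaptive kernel selection, as described above, follows a "fuse then select" pattern: the outputs of several candidate pathways are summed and pooled into a descriptor, and a softmax across pathways then decides how much each one contributes. The Python sketch below illustrates that pattern with attention computed per channel and per time step, as one reading of "temporal and channel-wise attention." It is not BTSNet's implementation. The function name `select_pathways`, the `[channel][time][space]` layout, and the weight matrices `w_reduce` and `w_select` are hypothetical stand-ins for learned parameters, and the paper may pool, normalize, or apply attention differently. Plain lists are used instead of a tensor library so the sketch stays self-contained.

```python
import math
from typing import List

# Hypothetical layout: one pathway output is indexed [channel][time][space].
Tensor3 = List[List[List[float]]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: List[List[float]], v: List[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def select_pathways(paths: List[Tensor3],
                    w_reduce: List[List[float]],
                    w_select: List[List[List[float]]]) -> Tensor3:
    """Fuse K pathway outputs with selective-kernel-style attention.

    paths:    K tensors of identical shape [C][T][S].
    w_reduce: d x C matrix compressing each channel descriptor (hypothetical).
    w_select: K matrices of shape C x d, one logit per channel per pathway.
    """
    k = len(paths)
    c, t, s = len(paths[0]), len(paths[0][0]), len(paths[0][0][0])

    # Fuse: element-wise sum over pathways, then average over space -> [C][T].
    desc = [[sum(sum(p[ci][ti][si] for p in paths) for si in range(s)) / s
             for ti in range(t)] for ci in range(c)]

    out = [[[0.0] * s for _ in range(t)] for _ in range(c)]
    for ti in range(t):
        column = [desc[ci][ti] for ci in range(c)]
        z = [max(0.0, v) for v in matvec(w_reduce, column)]  # ReLU bottleneck
        logits = [matvec(w_select[ki], z) for ki in range(k)]  # K x C
        for ci in range(c):
            # Select: softmax across pathways for this channel and time step.
            attn = softmax([logits[ki][ci] for ki in range(k)])
            for si in range(s):
                out[ci][ti][si] = sum(attn[ki] * paths[ki][ci][ti][si]
                                      for ki in range(k))
    return out


# Two pathways, one channel, two time steps, two spatial positions.
p1 = [[[1.0, 1.0], [2.0, 2.0]]]
p2 = [[[3.0, 3.0], [4.0, 4.0]]]
print(select_pathways([p1, p2], [[0.0]], [[[0.0]], [[0.0]]]))
```

With all-zero weights every logit is zero, so the attention is uniform and the result is the plain average of the pathways, `[[[2.0, 2.0], [3.0, 3.0]]]`. Large positive weights for one pathway and negative weights for the other push the softmax toward selecting a single pathway, which is how this kind of block can effectively switch between receptive fields for each channel and time step.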

research
02/08/2020

CTM: Collaborative Temporal Modeling for Action Recognition

With the rapid development of digital multimedia, video understanding ha...
research
03/23/2021

Learning Comprehensive Motion Representation for Action Recognition

For action recognition learning, 2D CNN-based methods are efficient but ...
research
11/23/2021

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

Event analysis in untrimmed videos has attracted increasing attention du...
research
12/12/2022

Cross-Modal Learning with 3D Deformable Attention for Action Recognition

An important challenge in vision-based action recognition is the embeddi...
research
03/28/2020

CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Network

3D Convolution Neural Networks (CNNs) have been widely applied to 3D sce...
research
05/22/2023

Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition

This paper studies the computational offloading of video action recognit...
research
04/19/2023

SLIC: Self-Conditioned Adaptive Transform with Large-Scale Receptive Fields for Learned Image Compression

Learned image compression has achieved remarkable performance. Transform...