
Right on Time: Multi-Temporal Convolutions for Human Action Recognition in Videos

11/08/2020
by   Alexandros Stergiou, et al.

Variations in how human actions unfold over time in videos make them difficult to capture with the fixed-size convolution kernels used in CNNs. We present an approach that processes the input more flexibly, at multiple timescales. We introduce Multi-Temporal networks that model spatio-temporal patterns of different temporal durations at each layer. To this end, they employ novel 3D convolution (MTConv) blocks that consist of a short stream for local space-time features and a long stream for features spanning longer durations. By aligning the features of each stream with the global motion patterns using recurrent cells, we can discover temporally coherent spatio-temporal features of varying durations. We further introduce sub-streams within each of the block pathways to reduce the computational requirements. The proposed MTNet architectures outperform state-of-the-art 3D-CNNs on five action recognition benchmark datasets. Notably, we achieve 87.22 [...] Kinetics-700. We further demonstrate the favorable computational requirements. Using sub-streams, we can further achieve a drastic reduction in parameters (60 [...] generalization capabilities of the multi-temporal features.
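The idea of a short stream and a long stream looking at the same input over different temporal extents can be illustrated with a toy example. The sketch below is not the MTConv block: it drops the spatial dimensions, uses fixed averaging kernels instead of learned 3D kernels, and fuses the streams by a plain mean rather than the recurrent alignment described above. The function names, kernel lengths, and the example clip are all hypothetical choices made for illustration.

```python
def temporal_conv(signal, kernel):
    """1D 'same' cross-correlation over time with zero padding (odd kernel length)."""
    assert len(kernel) % 2 == 1, "kernel length must be odd"
    half = len(kernel) // 2
    padded = [0.0] * half + [float(x) for x in signal] + [0.0] * half
    return [
        sum(w * padded[t + k] for k, w in enumerate(kernel))
        for t in range(len(signal))
    ]


def two_timescale_features(signal, short_len=3, long_len=7):
    """Run a short and a long temporal stream and fuse them by averaging.

    The box (mean) kernels and the averaging fusion are illustrative
    assumptions, not the learned kernels or recurrent cells of the paper.
    """
    short_kernel = [1.0 / short_len] * short_len
    long_kernel = [1.0 / long_len] * long_len
    short_out = temporal_conv(signal, short_kernel)
    long_out = temporal_conv(signal, long_kernel)
    fused = [(s + l) / 2.0 for s, l in zip(short_out, long_out)]
    return short_out, long_out, fused


# A brief "action" (frames 4 and 5) inside a hypothetical 12-frame clip.
clip = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
short_out, long_out, fused = two_timescale_features(clip)
```

In this toy clip the short stream responds strongly but only near the active frames, while the long stream responds more weakly over a wider range of frames, which is the kind of complementary behaviour that processing at multiple timescales is meant to exploit.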


09/30/2019
Spatio-Temporal FAST 3D Convolutions for Human Action Recognition
Effective processing of video input is essential for the recognition of ...

10/05/2021
Efficient Modelling Across Time of Human Actions and Interactions
This thesis focuses on video understanding for human action and interact...

06/16/2020
Focus of Attention Improves Information Transfer in Visual Features
Unsupervised learning from continuous visual streams is a challenging pr...

07/22/2020
Depthwise Spatio-Temporal STFT Convolutional Neural Networks for Human Action Recognition
Conventional 3D convolutional neural networks (CNNs) are computationally...

01/11/2022
Representing Videos as Discriminative Sub-graphs for Action Recognition
Human actions are typically of combinatorial structures or patterns, i.e...

10/20/2021
GTM: Gray Temporal Model for Video Recognition
Data input modality plays an important role in video action recognition....

05/09/2017
Deep Spatio-temporal Manifold Network for Action Recognition
Visual data such as videos are often sampled from complex manifold. We p...

Code Repositories

Squeeze-and-Recursion-Temporal-Gates

Implementation of Squeeze and Recursion Temporal Gates blocks for action recognition
