TEA: Temporal Excitation and Aggregation for Action Recognition

04/03/2020
by Yan Li, et al.

Temporal modeling is key for action recognition in videos. It typically involves both short-range motions and long-range aggregations. In this paper, we propose a Temporal Excitation and Aggregation (TEA) block, comprising a motion excitation (ME) module and a multiple temporal aggregation (MTA) module, specifically designed to capture both short- and long-range temporal evolution. For short-range motion modeling, the ME module calculates feature-level temporal differences from spatiotemporal features and then uses these differences to excite the motion-sensitive channels of the features. In previous works, long-range temporal aggregation is typically achieved by stacking a large number of local temporal convolutions, each of which processes one local temporal window at a time. In contrast, the MTA module deforms the local convolution into a group of sub-convolutions, forming a hierarchical residual architecture. Without introducing additional parameters, the features are processed by a series of sub-convolutions, so each frame can complete multiple temporal aggregations with its neighbors. The equivalent receptive field along the temporal dimension is enlarged accordingly, which makes it possible to model long-range temporal relationships across distant frames. The two components of the TEA block are complementary in temporal modeling. Our approach achieves strong results at low FLOPs on several action recognition benchmarks, including Kinetics, Something-Something, HMDB51, and UCF101, which confirms its effectiveness and efficiency.
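The two ideas in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The function names, the use of four channel groups, the kernel size of 3, the fixed averaging weights, and the sigmoid gate on the difference magnitude are all assumptions chosen for illustration. In a real network the gate and the sub-convolutions would be learned layers.

```python
import math

# Illustrative sketch only. Group count, kernel size, fixed averaging weights,
# and the hand-written sigmoid gate are assumptions, not details from the abstract.

def motion_excitation(frames):
    """frames[t][c] is a spatially pooled feature for channel c at frame t.
    Channels whose value changes between adjacent frames get a larger gate."""
    num_frames = len(frames)
    excited = []
    for t in range(num_frames):
        if t + 1 < num_frames:
            diff = [nxt - cur for cur, nxt in zip(frames[t], frames[t + 1])]
        else:
            diff = [0.0] * len(frames[t])  # assumption: no motion after the last frame
        gate = [1.0 / (1.0 + math.exp(-abs(d))) for d in diff]
        # Residual form: keep the original feature and add the gated copy.
        excited.append([x + x * g for x, g in zip(frames[t], gate)])
    return excited

def temporal_conv(seq, kernel=(1 / 3, 1 / 3, 1 / 3)):
    """Same-length 1D convolution along time with zero padding."""
    half = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            s = t + k - half
            if 0 <= s < len(seq):
                acc += w * seq[s]
        out.append(acc)
    return out

def hierarchical_aggregate(groups):
    """groups[i] is the temporal signal of the i-th channel group.
    The first group passes through unchanged. Every later group is summed
    with the previous group's output before its own sub-convolution."""
    outputs = [list(groups[0])]
    prev = None
    for g in groups[1:]:
        inp = list(g) if prev is None else [a + b for a, b in zip(g, prev)]
        prev = temporal_conv(inp)
        outputs.append(prev)
    return outputs

def receptive_field(seq):
    """Number of frames spanned by the non-zero part of an impulse response."""
    nonzero = [t for t, v in enumerate(seq) if v != 0.0]
    return nonzero[-1] - nonzero[0] + 1 if nonzero else 0

# Feed a single-frame impulse into each of four groups and measure the spread.
impulse = [0.0] * 9
impulse[4] = 1.0
spans = [receptive_field(o) for o in hierarchical_aggregate([impulse] * 4)]
# spans == [1, 3, 5, 7]
```

Under these assumptions, the last group sees 7 frames even though every sub-convolution has a kernel of only 3, which is the sense in which the equivalent temporal receptive field grows without stacking extra layers.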

