Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition

08/03/2020
by M. Esat Kalfaoglu, et al.

In this work, we combine 3D convolution with late temporal modeling for action recognition. To this end, we replace the conventional Temporal Global Average Pooling (TGAP) layer at the end of the 3D convolutional architecture with a Bidirectional Encoder Representations from Transformers (BERT) layer in order to better utilize temporal information through BERT's attention mechanism. We show that this replacement improves the performance of many popular 3D convolution architectures for action recognition, including ResNeXt, I3D, SlowFast and R(2+1)D. Moreover, we provide state-of-the-art results on both the HMDB51 and UCF101 datasets with 83.99 respectively. The code is publicly available.
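
To make the contrast between the two pooling strategies concrete, the following is a minimal sketch and not the authors' implementation. It assumes the 3D CNN backbone has already produced T feature vectors of dimension D, and it stands in for the BERT layer with a single head of scaled dot-product attention read out at a BERT-style classification token. The identity query/key/value projections, the absence of positional embeddings, and the names `temporal_average_pool`, `attention_pool` and `cls_token` are all illustrative assumptions.

```python
import math


def temporal_average_pool(features):
    """TGAP baseline: average T feature vectors (each of length D) over time."""
    t = len(features)
    d = len(features[0])
    return [sum(f[i] for f in features) / t for i in range(d)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, cls_token):
    """Attention-based temporal pooling read out at a classification token.

    Assumptions (not taken from the source): one attention head, identity
    query/key/value projections, no positional embeddings, a single layer.
    """
    d = len(cls_token)
    # Prepend the classification token to the temporal sequence.
    tokens = [cls_token] + features
    # The classification token acts as the query against every token.
    scores = [
        sum(q * k for q, k in zip(cls_token, tok)) / math.sqrt(d)
        for tok in tokens
    ]
    weights = softmax(scores)
    # Weighted sum of the tokens gives the pooled clip representation.
    pooled = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return pooled, weights


# Toy input: T = 4 time steps, D = 2 feature channels.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
cls_token = [0.0, 0.0]

avg = temporal_average_pool(features)
pooled, weights = attention_pool(features, cls_token)
```

With an all-zero classification token every attention score is equal, so the weights are uniform and the sketch reduces to plain averaging over the five tokens. A non-zero (in practice, learned) token lets the weights differ per time step, which is the flexibility that averaging lacks.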


research
02/08/2020

CTM: Collaborative Temporal Modeling for Action Recognition

With the rapid development of digital multimedia, video understanding ha...
research
01/19/2020

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recogntion

To efficiently extract spatiotemporal features of video for action recog...
research
04/06/2023

Therbligs in Action: Video Understanding through Motion Primitives

In this paper we introduce a rule-based, compositional, and hierarchical...
research
04/25/2022

Temporal Relevance Analysis for Video Action Models

In this paper, we provide a deep analysis of temporal modeling for actio...
research
12/17/2020

Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN

3D Convolutional Neural Network (3D CNN) captures spatial and temporal i...
research
03/19/2018

Attention-based Temporal Weighted Convolutional Neural Network for Action Recognition

Research in human action recognition has accelerated significantly since...
research
12/22/2021

Recur, Attend or Convolve? Frame Dependency Modeling Matters for Cross-Domain Robustness in Action Recognition

Most action recognition models today are highly parameterized, and evalu...
