Attention-based Temporal Weighted Convolutional Neural Network for Action Recognition

03/19/2018
by   Jinliang Zang, et al.
0

Research in human action recognition has accelerated significantly since the introduction of powerful machine learning tools such as Convolutional Neural Networks (CNNs). However, effective and efficient methods for incorporation of temporal information into CNNs are still being actively explored in the recent literature. Motivated by the popular recurrent attention models in the research area of natural language processing, we propose the Attention-based Temporal Weighted CNN (ATW), which embeds a visual attention model into a temporal weighted multi-stream CNN. This attention model is simply implemented as temporal weighting yet it effectively boosts the recognition performance of video representations. Besides, each stream in the proposed ATW framework is capable of end-to-end training, with both network parameters and temporal weights optimized by stochastic gradient descent (SGD) with backpropagation. Our experiments show that the proposed attention mechanism contributes substantially to the performance gains with the more discriminative snippets by focusing on more relevant video segments.

READ FULL TEXT
research
04/01/2022

Vision Transformer with Cross-attention by Temporal Shift for Efficient Action Recognition

We propose Multi-head Self/Cross-Attention (MSCA), which introduces a te...
research
12/22/2021

Recur, Attend or Convolve? Frame Dependency Modeling Matters for Cross-Domain Robustness in Action Recognition

Most action recognition models today are highly parameterized, and evalu...
research
03/21/2023

Automatic evaluation of herding behavior in towed fishing gear using end-to-end training of CNN and attention-based networks

This paper considers the automatic classification of herding behavior in...
research
05/08/2018

Visual Attribute-augmented Three-dimensional Convolutional Neural Network for Enhanced Human Action Recognition

Visual attributes in individual video frames, such as the presence of ch...
research
08/03/2020

Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition

In this work, we combine 3D convolution with late temporal modeling for ...
research
08/25/2017

Hierarchical Multi-scale Attention Networks for Action Recognition

Recurrent Neural Networks (RNNs) have been widely used in natural langua...
research
01/22/2021

A Two-stream Neural Network for Pose-based Hand Gesture Recognition

Pose based hand gesture recognition has been widely studied in the recen...

Please sign up or login with your details

Forgot password? Click here to reset