VideoLSTM Convolves, Attends and Flows for Action Recognition

07/06/2016
by Zhenyang Li, et al.
We present a new architecture for end-to-end sequence learning of actions in video, which we call VideoLSTM. Rather than adapting the video to the peculiarities of established recurrent or convolutional architectures, we adapt the architecture to fit the requirements of the video medium. Starting from the soft-Attention LSTM, VideoLSTM makes three novel contributions. First, video has a spatial layout. To exploit the spatial correlation, we hardwire convolutions into the soft-Attention LSTM architecture. Second, motion not only informs us about the action content, but also better guides the attention towards the relevant spatio-temporal locations. We therefore introduce motion-based attention. Finally, we demonstrate how the attention from VideoLSTM can be used for action localization by relying on just the action class label. Experiments and comparisons on challenging datasets for action classification and localization support our claims.
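
As a rough illustration of the soft-attention idea the abstract builds on, the sketch below normalizes a map of per-location scores with a softmax and uses it to re-weight a feature map while keeping its spatial layout. It is not the authors' implementation: the function names, tensor shapes, and random inputs are assumptions made for the example, and in a trained model the scores would be learned rather than sampled.

```python
import numpy as np

def spatial_softmax(scores):
    """Normalize an (H, W) score map so it sums to 1 over all locations."""
    flat = scores.reshape(-1)
    flat = flat - flat.max()  # subtract the max for numerical stability
    exp = np.exp(flat)
    return (exp / exp.sum()).reshape(scores.shape)

def attend(features, scores):
    """Re-weight a (C, H, W) feature map with soft attention from (H, W) scores.

    Returns the attention map and the weighted features. The output keeps
    the (C, H, W) layout, so a convolutional unit could still consume it.
    """
    attention = spatial_softmax(scores)
    return attention, features * attention[None, :, :]

# Hypothetical inputs: random stand-ins for appearance features and for
# attention scores that a model might derive from motion features.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 7, 7))
motion_scores = rng.standard_normal((7, 7))

attention, attended = attend(features, motion_scores)
```

Because the attention map is a distribution over spatial locations, its peaks per frame could in principle be read as a coarse indication of where the action occurs, which is the intuition behind using attention for localization from class labels alone.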

research
08/29/2018

Top-down Attention Recurrent VLAD Encoding for Action Recognition in Videos

Most recent approaches for action recognition from video leverage deep a...
research
09/30/2019

Spatio-Temporal FAST 3D Convolutions for Human Action Recognition

Effective processing of video input is essential for the recognition of ...
research
05/09/2017

CHAM: action recognition using convolutional hierarchical attention model

Recently, the soft attention mechanism, which was originally proposed in...
research
04/03/2017

Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

General human action recognition requires understanding of various visua...
research
05/16/2021

MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions

Spatio-temporal action detection is an important and challenging problem...
research
10/24/2019

Controllable Attention for Structured Layered Video Decomposition

The objective of this paper is to be able to separate a video into its n...
research
04/06/2023

Therbligs in Action: Video Understanding through Motion Primitives

In this paper we introduce a rule-based, compositional, and hierarchical...
