DeepAI AI Chat
Log In Sign Up

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

by   Chih-Yao Ma, et al.

Recent two-stream deep Convolutional Neural Networks (ConvNets) have made significant progress in recognizing human actions in videos. Despite their success, methods extending the basic two-stream ConvNet have not systematically explored possible network architectures to further exploit spatiotemporal dynamics within video sequences. Further, such networks often use different baseline two-stream networks. Therefore, the differences and the distinguishing factors between various methods using Recurrent Neural Networks (RNN) or convolutional networks on temporally-constructed feature vectors (Temporal-ConvNet) are unclear. In this work, we first demonstrate a strong baseline two-stream ConvNet using ResNet-101. We use this baseline to thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting spatiotemporal information. Building upon our experimental results, we then propose and investigate two different networks to further integrate spatiotemporal information: 1) temporal segment RNN and 2) Inception-style Temporal-ConvNet. We demonstrate that using both RNNs (using LSTMs) and Temporal-ConvNets on spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance. However, each of these methods require proper care to achieve state-of-the-art performance; for example, LSTMs require pre-segmented data or else they cannot fully exploit temporal information. Our analysis identifies specific limitations for each method that could form the basis of future work. Our experimental results on UCF101 and HMDB51 datasets achieve state-of-the-art performances, 94.1 69.0


page 13

page 14

page 15


Learning from Videos with Deep Convolutional LSTM Networks

This paper explores the use of convolution LSTMs to simultaneously learn...

Activity Recognition based on a Magnitude-Orientation Stream Network

The temporal component of videos provides an important clue for activity...

Dual-stream spatiotemporal networks with feature sharing for monitoring animals in the home cage

This paper presents a spatiotemporal deep learning approach for mouse be...

Spatiotemporal Residual Networks for Video Action Recognition

Two-stream Convolutional Networks (ConvNets) have shown strong performan...

STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond

Video prediction aims to predict future frames by modeling the complex s...

Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

Despite the success in still image recognition, deep neural networks for...

Two-stream convolutional networks for end-to-end learning of self-driving cars

We propose a methodology to extend the concept of Two-Stream Convolution...

Code Repositories


Temporal Segments LSTM and Temporal-Inception for Activity Recognition

view repo


Temporal augmentation with two-stream ConvNet features on human action recognition

view repo