TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

03/30/2017
by   Chih-Yao Ma, et al.
0

Recent two-stream deep Convolutional Neural Networks (ConvNets) have made significant progress in recognizing human actions in videos. Despite their success, methods extending the basic two-stream ConvNet have not systematically explored possible network architectures to further exploit spatiotemporal dynamics within video sequences. Further, such networks often use different baseline two-stream networks. Therefore, the differences and the distinguishing factors between various methods using Recurrent Neural Networks (RNN) or convolutional networks on temporally-constructed feature vectors (Temporal-ConvNet) are unclear. In this work, we first demonstrate a strong baseline two-stream ConvNet using ResNet-101. We use this baseline to thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting spatiotemporal information. Building upon our experimental results, we then propose and investigate two different networks to further integrate spatiotemporal information: 1) temporal segment RNN and 2) Inception-style Temporal-ConvNet. We demonstrate that using both RNNs (using LSTMs) and Temporal-ConvNets on spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance. However, each of these methods require proper care to achieve state-of-the-art performance; for example, LSTMs require pre-segmented data or else they cannot fully exploit temporal information. Our analysis identifies specific limitations for each method that could form the basis of future work. Our experimental results on UCF101 and HMDB51 datasets achieve state-of-the-art performances, 94.1 69.0

READ FULL TEXT

page 13

page 14

page 15

research
04/09/2019

Learning from Videos with Deep Convolutional LSTM Networks

This paper explores the use of convolution LSTMs to simultaneously learn...
research
08/22/2017

Activity Recognition based on a Magnitude-Orientation Stream Network

The temporal component of videos provides an important clue for activity...
research
06/01/2022

Dual-stream spatiotemporal networks with feature sharing for monitoring animals in the home cage

This paper presents a spatiotemporal deep learning approach for mouse be...
research
11/07/2016

Spatiotemporal Residual Networks for Video Action Recognition

Two-stream Convolutional Networks (ConvNets) have shown strong performan...
research
04/20/2022

STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond

Video prediction aims to predict future frames by modeling the complex s...
research
04/10/2020

Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

Despite the success in still image recognition, deep neural networks for...
research
11/13/2018

Two-stream convolutional networks for end-to-end learning of self-driving cars

We propose a methodology to extend the concept of Two-Stream Convolution...

Please sign up or login with your details

Forgot password? Click here to reset