TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

03/30/2017
by Chih-Yao Ma, et al.

Recent two-stream deep Convolutional Neural Networks (ConvNets) have made significant progress in recognizing human actions in videos. Despite their success, methods extending the basic two-stream ConvNet have not systematically explored possible network architectures to further exploit spatiotemporal dynamics within video sequences. Further, such networks often use different baseline two-stream networks. Therefore, the differences and the distinguishing factors between various methods using Recurrent Neural Networks (RNNs) or convolutional networks on temporally-constructed feature vectors (Temporal-ConvNets) are unclear. In this work, we first demonstrate a strong baseline two-stream ConvNet using ResNet-101. We use this baseline to thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting spatiotemporal information. Building upon our experimental results, we then propose and investigate two different networks to further integrate spatiotemporal information: 1) temporal segment RNN and 2) Inception-style Temporal-ConvNet. We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance. However, each of these methods requires proper care to achieve state-of-the-art performance; for example, LSTMs require pre-segmented data or else they cannot fully exploit temporal information. Our analysis identifies specific limitations for each method that could form the basis of future work. Our experimental results on the UCF101 and HMDB51 datasets achieve state-of-the-art performance, 94.1% and 69.0%, respectively.
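The idea of feeding an LSTM pre-segmented data can be illustrated with a minimal sketch. It assumes the per-frame ConvNet features are already available as a list of equal-length vectors, splits them into contiguous temporal segments, and mean-pools each segment into one vector. The function name `segment_pool`, the choice of mean pooling, and the segment boundaries are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from typing import List


def segment_pool(features: List[List[float]], num_segments: int) -> List[List[float]]:
    """Split per-frame feature vectors into contiguous temporal segments
    and mean-pool each segment into a single vector.

    Hypothetical helper: mean pooling is an assumption, not taken from the paper.
    """
    num_frames = len(features)
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")

    pooled = []
    for s in range(num_segments):
        # Integer arithmetic spreads leftover frames across segments.
        start = s * num_frames // num_segments
        end = (s + 1) * num_frames // num_segments
        segment = features[start:end]
        dim = len(segment[0])
        # Average each feature dimension over the frames in this segment.
        pooled.append(
            [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]
        )
    return pooled


# Four frames with 2-dimensional features, pooled into two segments.
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
segments = segment_pool(frames, num_segments=2)
```

The resulting sequence of `num_segments` vectors is what a recurrent layer would then consume, one segment per time step, in place of the raw frame-level sequence.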

04/09/2019

Learning from Videos with Deep Convolutional LSTM Networks

This paper explores the use of convolution LSTMs to simultaneously learn...
08/22/2017

Activity Recognition based on a Magnitude-Orientation Stream Network

The temporal component of videos provides an important clue for activity...
06/01/2022

Dual-stream spatiotemporal networks with feature sharing for monitoring animals in the home cage

This paper presents a spatiotemporal deep learning approach for mouse be...
11/07/2016

Spatiotemporal Residual Networks for Video Action Recognition

Two-stream Convolutional Networks (ConvNets) have shown strong performan...
04/20/2022

STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond

Video prediction aims to predict future frames by modeling the complex s...
04/10/2020

Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

Despite the success in still image recognition, deep neural networks for...
11/13/2018

Two-stream convolutional networks for end-to-end learning of self-driving cars

We propose a methodology to extend the concept of Two-Stream Convolution...

Code Repositories

Activity-Recognition-with-CNN-and-RNN

Temporal Segments LSTM and Temporal-Inception for Activity Recognition

temporal-augmentation

Temporal augmentation with two-stream ConvNet features on human action recognition