Investigation on Combining 3D Convolution of Image Data and Optical Flow to Generate Temporal Action Proposals

03/11/2019
by Patrick Schlosser, et al.

In this paper, a novel two-stream architecture for the task of temporal action proposal generation in long, untrimmed videos is presented. Inspired by recent advances in human action recognition that combine 3D convolutions with two-stream networks, and building on the Single-Stream Temporal Action Proposals (SST) architecture, four different two-stream architectures are investigated, each using sequences of images on one stream and images of optical flow on the other. The four architectures fuse the two streams at different depths in the model; for each of them, a broad range of parameters is investigated systematically and an optimal parametrization is determined empirically. Experiments on action and sports datasets show that all four two-stream architectures outperform the original single-stream SST and achieve state-of-the-art results. Additional experiments show that the improvements are not restricted to a single method of calculating optical flow: replacing the originally used method of Brox with FlowNet2 still yields improvements.
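
To make the idea of fusing two streams "at different depths" concrete, the following is a minimal, hypothetical sketch and not the architecture from the paper. The abstract does not specify the fusion operations, so concatenation (for early fusion) and score averaging (for late fusion) are assumptions, as are all names (`fuse_early`, `fuse_late`, `mean_score`) and the toy feature values. The placeholder scorer merely stands in for a learned model.

```python
from typing import List, Sequence

Features = List[List[float]]  # one feature vector per time step


def fuse_early(rgb: Features, flow: Features) -> Features:
    """Early fusion (assumed): concatenate both streams' features per time step."""
    if len(rgb) != len(flow):
        raise ValueError("both streams must cover the same time steps")
    return [r + f for r, f in zip(rgb, flow)]


def fuse_late(rgb_scores: Sequence[float], flow_scores: Sequence[float]) -> List[float]:
    """Late fusion (assumed): average scores computed separately per stream."""
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both streams must cover the same time steps")
    return [(a + b) / 2.0 for a, b in zip(rgb_scores, flow_scores)]


def mean_score(features: Features) -> List[float]:
    """Placeholder scorer: mean of each feature vector (not a learned model)."""
    return [sum(v) / len(v) for v in features]


# Toy inputs: two time steps, two features per stream (illustrative values only).
rgb = [[0.2, 0.4], [0.6, 0.8]]    # image-stream features
flow = [[0.0, 0.2], [1.0, 0.6]]   # optical-flow-stream features

# Fuse first, then score once on the joint representation.
early = mean_score(fuse_early(rgb, flow))

# Score each stream on its own, then fuse the scores.
late = fuse_late(mean_score(rgb), mean_score(flow))
```

With this linear placeholder scorer and equally sized feature vectors, both variants produce the same per-step scores. With learned, non-linear layers the point of fusion changes what the model can represent, which is presumably why the fusion depth is treated as a design choice worth comparing.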

research
12/19/2018

D3D: Distilled 3D Networks for Video Action Recognition

State-of-the-art methods for video action recognition commonly use an en...
research
07/09/2021

RGB Stream Is Enough for Temporal Action Detection

State-of-the-art temporal action detectors to date are based on two-stre...
research
08/22/2017

Activity Recognition based on a Magnitude-Orientation Stream Network

The temporal component of videos provides an important clue for activity...
research
08/19/2019

Cross-Enhancement Transform Two-Stream 3D ConvNets for Pedestrian Action Recognition of Autonomous Vehicles

Action recognition is an important research topic in machine vision. It ...
research
12/22/2018

Temporal Hockey Action Recognition via Pose and Optical Flows

Recognizing actions in ice hockey using computer vision poses challenges...
research
05/18/2017

Learning Spatiotemporal Features for Infrared Action Recognition with 3D Convolutional Neural Networks

Infrared (IR) imaging has the potential to enable more robust action rec...
research
05/30/2019

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

Learning to represent videos is a very challenging task both algorithmic...
