Real-Time End-to-End Action Detection with Two-Stream Networks

02/23/2018
by Alaaeldin El-Nouby, et al.

Two-stream networks have been very successful for solving the problem of action detection. However, prior work using two-stream networks trains both streams separately, which prevents the network from exploiting regularities between the two streams. Moreover, unlike the visual stream, the dominant forms of optical flow computation typically do not maximally exploit GPU parallelism. We present a real-time end-to-end trainable two-stream network for action detection. First, we integrate the optical flow computation into our framework by using Flownet2. Second, we apply early fusion for the two streams and train the whole pipeline jointly end-to-end. Finally, for better network initialization, we transfer from the task of action recognition to action detection by pre-training our framework using the recently released large-scale Kinetics dataset. Our experimental results show that training the pipeline jointly end-to-end, with the optical flow fine-tuned for the action detection objective, improves detection performance significantly. Additionally, we observe an improvement when initializing with parameters pre-trained using Kinetics. Last, we show that by integrating the optical flow computation, our framework is more efficient, running at real-time speeds (up to 31 fps).
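
To make the early-fusion idea concrete, here is a minimal sketch. The abstract does not say at which layer or by which operation the two streams are fused, so this assumes the simplest variant: channel-wise concatenation of an RGB frame (3 channels) with a two-channel optical flow field (horizontal and vertical displacement) before any shared layers. The name `early_fuse` and the nested-list tensor layout are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List

# Hypothetical layout: frame[channel][row][column]
Frame = List[List[List[float]]]


def early_fuse(rgb: Frame, flow: Frame) -> Frame:
    """Stack appearance and motion channels into a single input tensor.

    Assumption: fusion is plain channel concatenation at the input;
    the paper may fuse at a different point or in a different way.
    """
    if not rgb or not flow:
        raise ValueError("both streams must contain at least one channel")
    height, width = len(rgb[0]), len(rgb[0][0])
    for channel in rgb + flow:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("all channels must share the same height and width")
    # Copy rows so the fused tensor does not alias the inputs.
    return [[row[:] for row in channel] for channel in rgb + flow]


# Toy 2x2 frame: 3 RGB channels plus 2 flow channels (dx, dy).
rgb = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(3)]
flow = [[[1.0, -1.0], [0.5, 0.0]] for _ in range(2)]
fused = early_fuse(rgb, flow)
print(len(fused), len(fused[0]), len(fused[0][0]))
```

Under this assumption, a single detection network sees both appearance and motion from its first layer onward, which is what allows a joint loss to reach the flow estimator during end-to-end training.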

