You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

11/15/2019
by Okan Köpüklü, et al.

Spatiotemporal action localization requires incorporating two sources of information into the designed architecture: (1) temporal information from the previous frames and (2) spatial information from the key frame. Current state-of-the-art approaches usually extract this information with separate networks and use an extra fusion mechanism to obtain detections. In this work, we present YOWO, a unified CNN architecture for real-time spatiotemporal action localization in video streams. YOWO uses a single neural network to extract temporal and spatial information concurrently and to predict bounding boxes and action probabilities directly from video clips in one evaluation. Since the whole architecture is unified, it can be optimized end-to-end. The YOWO architecture is fast, providing 34 frames per second on 16-frame input clips and 62 frames per second on 8-frame input clips. Remarkably, YOWO outperforms the previous state-of-the-art results on J-HMDB-21 (71.1%) and UCF101-24 (75.0%).
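To make the clip and key-frame relationship in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the key frame is the most recent frame of each clip and that one clip is evaluated per incoming frame; the names `sliding_clips` and `per_clip_budget_ms` are hypothetical helpers introduced only for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def sliding_clips(
    frames: Iterable[Frame], clip_len: int
) -> Iterator[Tuple[List[Frame], Frame]]:
    """Yield (clip, key_frame) pairs from a frame stream.

    Assumption: the key frame is the last frame of the clip, and the
    clip_len - 1 frames before it supply the temporal context.
    """
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")
    window: Deque[Frame] = deque(maxlen=clip_len)
    for frame in frames:
        window.append(frame)  # the oldest frame is dropped automatically
        if len(window) == clip_len:
            clip = list(window)
            yield clip, clip[-1]


def per_clip_budget_ms(clips_per_second: float) -> float:
    """Time available for one network evaluation at a given throughput.

    Assumption: one evaluation is run per incoming frame.
    """
    return 1000.0 / clips_per_second


if __name__ == "__main__":
    stream = range(20)  # integers stand in for decoded video frames
    for clip, key_frame in sliding_clips(stream, clip_len=16):
        print(f"key frame {key_frame}, context starts at frame {clip[0]}")
    for fps in (34, 62):  # throughput figures quoted in the abstract
        print(f"{fps} fps leaves {per_clip_budget_ms(fps):.1f} ms per clip")
```

Under these assumptions, a unified model would receive each `(clip, key_frame)` pair once and return boxes and action probabilities in a single evaluation, within the per-clip time budget implied by the reported frame rates.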

research
06/08/2015

You Only Look Once: Unified, Real-Time Object Detection

We present YOLO, a new approach to object detection. Prior work on objec...
research
11/05/2021

KORSAL: Key-point Detection based Online Real-Time Spatio-Temporal Action Localization

Real-time and online action localization in a video is a critical yet hi...
research
05/04/2017

Action Tubelet Detector for Spatio-Temporal Action Localization

Current state-of-the-art approaches for spatio-temporal action localizat...
research
04/09/2019

Learning from Videos with Deep Convolutional LSTM Networks

This paper explores the use of convolution LSTMs to simultaneously learn...
research
02/14/2019

Exploring Frame Segmentation Networks for Temporal Action Localization

Temporal action localization is an important task of computer vision. Th...
research
06/15/2021

Cascading Convolutional Temporal Colour Constancy

Computational Colour Constancy (CCC) consists of estimating the colour o...
research
10/24/2022

GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction

Many online action prediction models observe complete frames to locate a...
