DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding

05/19/2018
by   Xiaokai Chen, et al.

Many of the leading approaches to video understanding are data-hungry and time-consuming, and fail to capture the gist of spatial-temporal evolution efficiently. Recent research shows that convolutional neural networks (CNNs) can reason about static relations between entities in images. To further exploit their capacity for reasoning about dynamic evolution, we introduce a novel network module called DenseImage Network (DIN), with two main contributions. 1) A novel compact representation of video, called DenseImage, which distills the significant spatial-temporal evolution of a video into a matrix primed for efficient video encoding. 2) A simple yet powerful learning strategy for video understanding, based on DenseImage and a temporal-order-preserving CNN, which includes a local temporal correlation constraint that captures temporal evolution at multiple time scales using different filter widths. Extensive experiments on two recent challenging benchmarks demonstrate that our DenseImage Network can accurately capture the common spatial-temporal evolution between similar actions, even with enormous visual variations or different time scales. Moreover, we obtain state-of-the-art results in action and gesture recognition at much lower time and memory cost, indicating its immense potential for video representation and understanding.
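The abstract does not spell out how a DenseImage is built or how the CNN preserves temporal order, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each frame is summarized by one feature vector, that a DenseImage stacks those vectors as columns in temporal order, and that each filter spans the full feature dimension and slides along the time axis only. The function names (`build_dense_image`, `temporal_conv`, `encode`), the ReLU, and the max-over-time pooling are hypothetical choices made for illustration.

```python
import numpy as np


def build_dense_image(frame_features):
    """Stack per-frame feature vectors (length D each) as columns, in
    temporal order, giving a (D, T) matrix. Assumed construction."""
    return np.stack(frame_features, axis=1)


def temporal_conv(dense_image, filters):
    """Slide filters of shape (K, D, w) along the time axis only
    (valid mode, stride 1). Returns responses of shape (K, T - w + 1)."""
    d, t = dense_image.shape
    k, d_f, w = filters.shape
    if d != d_f or w > t:
        raise ValueError("filter shape does not fit the DenseImage")
    # windows has shape (D, T - w + 1, w): one temporal window per position.
    windows = np.lib.stride_tricks.sliding_window_view(dense_image, w, axis=1)
    return np.einsum("kdw,dtw->kt", filters, windows)


def encode(dense_image, filter_banks):
    """Apply one filter bank per temporal width, then ReLU and
    max-over-time pooling (both assumptions), and concatenate."""
    pooled = []
    for filters in filter_banks:
        response = np.maximum(temporal_conv(dense_image, filters), 0.0)
        pooled.append(response.max(axis=1))
    return np.concatenate(pooled)


# Toy usage with random stand-ins for per-frame features.
rng = np.random.default_rng(0)
frames = [rng.standard_normal(16) for _ in range(10)]   # T = 10, D = 16
dense = build_dense_image(frames)                        # shape (16, 10)
banks = [rng.standard_normal((4, 16, w)) for w in (1, 2, 3)]
code = encode(dense, banks)                              # shape (12,)
```

Because every filter covers all feature dimensions and moves only along time, the frames inside each window keep their order. Wider filters see longer stretches of the video, which is one way to read "multiple time scales with different filter widths".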
