HalluciNet-ing Spatiotemporal Representations Using 2D-CNN

12/10/2019
by Paritosh Parmar, et al.

Spatiotemporal representations learnt using 3D convolutional neural networks (CNNs) are currently the state of the art for action-related tasks. However, 3D-CNNs are notoriously memory- and compute-intensive. 2D-CNNs, on the other hand, are much lighter on computing resources and are faster, but their performance on action-related tasks is generally inferior to that of 3D-CNNs. Also, whereas 3D-CNNs simultaneously attend to appearance and salient motion patterns, 2D-CNNs are known to take shortcuts and recognize actions just by attending to the background, which is not very meaningful. Taking inspiration from the fact that we humans, through years of experience and a general understanding of "how the world works," can intuit how actors will act and how objects will be manipulated, we suggest a way to combine the best attributes of 2D- and 3D-CNNs: we propose to hallucinate spatiotemporal representations, as computed by 3D-CNNs, using a 2D-CNN. We believe that requiring the 2D-CNN to "see" into the future would encourage it to gain a deeper understanding of actions and of how scenes evolve, by providing a stronger supervisory signal. The hallucination task is treated as an auxiliary task, while the main task is any other action-related task, such as action recognition. Thorough experimental evaluation shows that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition. From a practical standpoint, being able to hallucinate spatiotemporal representations without an actual 3D-CNN would enable deployment in resource-constrained scenarios, such as lower-end phones and edge devices, and/or with lower bandwidth. This translates to the pervasion of Video Analytics Software as a Service (VA SaaS), e.g., automated physiotherapy options for financially challenged demographics.
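
The auxiliary-task setup described in the abstract can be illustrated with a minimal sketch of a joint training objective. The abstract does not specify the loss functions or their weighting, so everything here is an assumption made for illustration: the mean squared error used as the hallucination loss, the softmax cross-entropy used for the main task, the `aux_weight` parameter, and all function names are hypothetical, not the authors' implementation.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (main task, e.g. action recognition)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def hallucination_loss(predicted, target):
    """Mean squared error between the features predicted by the 2D-CNN
    and the spatiotemporal features computed by the 3D-CNN.
    (Assumed loss; the abstract does not name one.)"""
    if len(predicted) != len(target):
        raise ValueError("feature vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def joint_loss(logits, label, predicted, target, aux_weight=1.0):
    """Main-task loss plus a weighted auxiliary hallucination loss.
    aux_weight is a hypothetical hyperparameter."""
    return cross_entropy(logits, label) + aux_weight * hallucination_loss(predicted, target)


# Toy values standing in for network outputs.
logits = [2.0, 0.5, -1.0]          # class scores from the 2D-CNN head
predicted_feats = [0.2, 0.9, 0.4]  # hallucinated spatiotemporal features
teacher_feats = [0.0, 1.0, 0.5]    # features from a 3D-CNN
loss = joint_loss(logits, label=0, predicted=predicted_feats, target=teacher_feats)
```

Under this sketch, the 3D-CNN features serve only as training targets, so inference would need only the 2D-CNN, which is consistent with the deployment argument in the abstract.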

research
08/25/2020

Spatiotemporal Action Recognition in Restaurant Videos

Spatiotemporal action recognition is the task of locating and classifyin...
research
02/15/2021

Win-Fail Action Recognition

Current video/action understanding systems have demonstrated impressive ...
research
05/05/2015

Contextual Action Recognition with R*CNN

There are multiple cues in an image which reveal what action a person is...
research
08/07/2019

STM: SpatioTemporal and Motion Encoding for Action Recognition

Spatiotemporal and motion features are two complementary and crucial inf...
research
08/31/2020

Online Spatiotemporal Action Detection and Prediction via Causal Representations

In this thesis, we focus on video action understanding problems from an ...
research
04/19/2011

Hue Histograms to Spatiotemporal Local Features for Action Recognition

Despite the recent developments in spatiotemporal local features for act...
research
01/04/2018

What have we learned from deep representations for action recognition?

As the success of deep models has led to their deployment in all areas o...
