Video Action Recognition Via Neural Architecture Searching

07/10/2019
by Wei Peng, et al.

Deep neural networks have achieved great success in video analysis and understanding. However, designing a high-performance neural architecture requires substantial effort and expertise. In this paper, we make the first attempt to let an algorithm automatically design neural networks for video action recognition. Specifically, a spatio-temporal network is developed in a differentiable search space modeled by a directed acyclic graph, so that a gradient-based strategy can be used to search for an optimal architecture. The search remains computationally expensive, however, because evaluating each candidate architecture is still costly. To alleviate this issue on the input side, we introduce a temporal segment approach that reduces the computational cost without losing global video information. On the architecture side, we make the search space more efficient by introducing pseudo-3D operators. Experiments show that, when trained from scratch on the challenging UCF101 dataset, our architecture outperforms popular neural architectures while, surprisingly, using only around one percent of the parameters of its manually designed counterparts.
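
Two ideas in the abstract can be illustrated with a minimal sketch: sampling one frame from each of several equal temporal segments so that a short clip still spans the whole video, and relaxing the choice of operator on one graph edge into a softmax-weighted sum so that the choice becomes differentiable. This is not the authors' implementation. The function names, the uniform per-segment sampling rule, and the scalar toy operators are all assumptions made for illustration; the paper's candidate operators are pseudo-3D operators acting on video tensors.

```python
import math
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Pick one frame index from each of `num_segments` equal-length chunks.

    Hypothetical helper: the per-segment uniform sampling is an assumption.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need 0 < num_segments <= num_frames")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments        # inclusive
        end = (k + 1) * num_frames // num_segments    # exclusive
        indices.append(rng.randrange(start, end))
    return indices


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_edge_output(x, candidate_ops, alphas):
    """Softmax-weighted sum of candidate operator outputs on one graph edge.

    `alphas` are the architecture parameters a gradient-based search would
    update; here they are plain floats and no gradient is computed.
    """
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops))


if __name__ == "__main__":
    # A 300-frame video reduced to 8 frames that still cover its full length.
    print(sample_segment_indices(300, 8, random.Random(0)))

    # Toy scalar stand-ins for candidate operators (assumed, not from the paper).
    toy_ops = [lambda v: v, lambda v: 0.0, lambda v: 2.0 * v]
    print(mixed_edge_output(2.0, toy_ops, [0.0, 0.0, 0.0]))
```

With equal architecture parameters every candidate contributes equally; as one parameter grows during the search, its operator dominates the edge, and the final architecture can be read off by keeping the highest-weighted operator.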


research
12/05/2021

STSM: Spatio-Temporal Shift Module for Efficient Action Recognition

The modeling, computational cost, and accuracy of traditional Spatio-tem...
research
07/30/2018

Multi-Fiber Networks for Video Recognition

In this paper, we aim to reduce the computational cost of spatio-tempora...
research
10/29/2020

SAR-NAS: Skeleton-based Action Recognition via Neural Architecture Searching

This paper presents a study of automatic design of neural network archit...
research
09/23/2019

Scheduled Differentiable Architecture Search for Visual Recognition

Convolutional Neural Networks (CNN) have been regarded as a capable clas...
research
02/26/2019

STAR-Net: Action Recognition using Spatio-Temporal Activation Reprojection

While depth cameras and inertial sensors have been frequently leveraged ...
research
12/09/2021

Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search

Efficient video architecture is the key to deploying video recognition s...
research
08/30/2021

Searching for Two-Stream Models in Multivariate Space for Video Recognition

Conventional video models rely on a single stream to capture the complex...
