Searching for Two-Stream Models in Multivariate Space for Video Recognition

08/30/2021
by Xinyu Gong, et al.

Conventional video models rely on a single stream to capture complex spatial-temporal features. Recent two-stream video models, such as the SlowFast network and AssembleNet, prescribe separate streams to learn complementary features and achieve stronger performance. However, manually designing both streams, as well as the fusion blocks between them, is a daunting task that requires exploring a tremendously large design space. Such manual exploration is time-consuming and, when computational resources are limited and the exploration is insufficient, often yields sub-optimal architectures. In this work, we present a pragmatic neural architecture search approach that efficiently searches for two-stream video models in giant spaces. We design a multivariate search space with 6 search variables that capture a wide variety of choices in designing two-stream models. Furthermore, we propose a progressive search procedure that searches for the architecture of the individual streams, the fusion blocks, and the attention blocks one after the other. We demonstrate that two-stream models with significantly better performance can be discovered automatically in our design space. Our searched two-stream models, named Auto-TSNet, consistently outperform other models on standard benchmarks. On Kinetics, compared with the SlowFast model, our Auto-TSNet-L model reduces FLOPs by nearly 11 times while achieving the same accuracy of 78.9%. On Something-Something-V2, Auto-TSNet-M improves the accuracy by at least 2% over other methods that use less than 50 GFLOPs per video.
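The abstract describes the progressive search only at a high level. The following is a minimal sketch of the general idea of stage-wise (greedy) search over a multivariate space, under stated assumptions: the three stages follow the order given above (individual streams, then fusion blocks, then attention blocks), but the six variable names, their candidate values, the default starting configuration, and `toy_score` are all hypothetical stand-ins. In practice each score would come from training and evaluating a candidate network, and the paper's actual procedure may differ.

```python
from itertools import product
from math import prod
from typing import Callable, Dict, List, Sequence

Config = Dict[str, object]


def progressive_search(
    stages: Sequence[Dict[str, List[object]]],
    score: Callable[[Config], float],
    init: Config,
) -> Config:
    """Search one group of variables at a time, freezing earlier stages' choices.

    Each stage enumerates only its own sub-space, so the total number of
    evaluations is the sum (not the product) of the per-stage sub-space sizes.
    """
    best: Config = dict(init)
    for stage in stages:
        names = list(stage)
        best_score = float("-inf")
        best_cfg: Config = best
        # Exhaustive search over this stage's variables only; others stay fixed.
        for values in product(*(stage[n] for n in names)):
            candidate = {**best, **dict(zip(names, values))}
            s = score(candidate)
            if s > best_score:
                best_score, best_cfg = s, candidate
        best = best_cfg  # freeze this stage's winning choices
    return best


# HYPOTHETICAL variables, values, and defaults, for illustration only.
STAGES = [
    {"slow_width": [32, 64], "fast_width": [8, 16]},                     # streams
    {"fusion_op": ["sum", "concat"], "fusion_stages": [2, 3]},            # fusion
    {"attention": ["none", "se"], "attention_pos": ["early", "late"]},    # attention
]
DEFAULTS: Config = {"fusion_op": "sum", "fusion_stages": 2,
                    "attention": "none", "attention_pos": "early"}
PREFS = {
    "slow_width": {32: 0.0, 64: 1.0},
    "fast_width": {8: 0.5, 16: 0.0},
    "fusion_op": {"sum": 0.2, "concat": 0.4},
    "fusion_stages": {2: 0.1, 3: 0.3},
    "attention": {"none": 0.0, "se": 0.6},
    "attention_pos": {"early": 0.0, "late": 0.1},
}


def toy_score(cfg: Config) -> float:
    """HYPOTHETICAL additive proxy standing in for measured validation accuracy."""
    return sum(PREFS[k][v] for k, v in cfg.items())


if __name__ == "__main__":
    print(progressive_search(STAGES, toy_score, DEFAULTS))
    n_progressive = sum(prod(len(v) for v in s.values()) for s in STAGES)
    n_joint = prod(len(v) for s in STAGES for v in s.values())
    print(n_progressive, "evaluations vs", n_joint, "for a joint grid search")
```

With two options per variable, this evaluates 4 + 4 + 4 = 12 candidates instead of the 2^6 = 64 of a joint grid. The trade-off is that a greedy stage-wise search can miss combinations whose benefit only appears when variables from different stages change together.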
