X3D: Expanding Architectures for Efficient Video Recognition

04/09/2020
by Christoph Feichtenhofer, et al.

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes: space, time, width, and depth. Inspired by feature selection methods in machine learning, we employ a simple stepwise network expansion approach that expands a single axis in each step, such that a good accuracy-to-complexity trade-off is achieved. To expand X3D to a specific target complexity, we perform progressive forward expansion followed by backward contraction. X3D achieves state-of-the-art performance while requiring 4.8x fewer multiply-adds and 5.5x fewer parameters than previous work at similar accuracy. Our most surprising finding is that networks with high spatiotemporal resolution can perform well while being extremely light in terms of network width and parameters. We report competitive accuracy at unprecedented efficiency on video classification and detection benchmarks. Code will be available at: https://github.com/facebookresearch/SlowFast
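
The stepwise procedure described in the abstract (expand one axis per step, then contract to hit a target complexity) can be illustrated with a minimal sketch. This is not the authors' implementation: the cost and accuracy functions below are hypothetical toy proxies standing in for actually training and measuring a network, and the per-axis step factors, the 0.95 contraction rate, and all names (`toy_complexity`, `toy_accuracy`, `expand_to_target`, `STEP`) are assumptions made for illustration only.

```python
import math
from typing import Callable, Dict

Config = Dict[str, float]

# The four expansion axes named in the abstract.
AXES = ("time", "space", "width", "depth")

# Assumed per-axis factors, chosen so that each candidate step roughly
# doubles the toy cost below (space and width enter the cost squared).
STEP = {"time": 2.0, "space": math.sqrt(2.0), "width": math.sqrt(2.0), "depth": 2.0}


def toy_complexity(cfg: Config) -> float:
    """Hypothetical cost proxy: frames * pixels * width^2 * depth."""
    return cfg["time"] * cfg["space"] ** 2 * cfg["width"] ** 2 * cfg["depth"]


def toy_accuracy(cfg: Config) -> float:
    """Hypothetical accuracy proxy with diminishing returns on every axis."""
    weights = {"time": 1.0, "space": 0.8, "width": 0.6, "depth": 0.7}
    return sum(w * (1.0 - 1.0 / cfg[axis]) for axis, w in weights.items())


def expand_to_target(
    base: Config,
    target: float,
    accuracy: Callable[[Config], float] = toy_accuracy,
    complexity: Callable[[Config], float] = toy_complexity,
) -> Config:
    """Greedy forward expansion followed by backward contraction."""
    cfg = dict(base)
    last_axis = None

    # Forward expansion: try expanding each axis on its own and keep the
    # single expansion that gives the best (proxy) accuracy.
    while complexity(cfg) < target:
        candidates = []
        for axis in AXES:
            trial = dict(cfg)
            trial[axis] *= STEP[axis]
            candidates.append((accuracy(trial), axis, trial))
        _, last_axis, cfg = max(candidates, key=lambda c: c[0])

    # Backward contraction: the last step may overshoot the target, so
    # shrink the most recently expanded axis until the cost fits again.
    while last_axis is not None and complexity(cfg) > target:
        cfg[last_axis] *= 0.95

    return cfg


if __name__ == "__main__":
    base = {axis: 1.0 for axis in AXES}
    result = expand_to_target(base, target=10.0)
    print(result, toy_complexity(result))
```

In a real setting each candidate would have to be trained and evaluated, which is why expanding only one axis per step keeps the search affordable: the number of candidates per step equals the number of axes rather than growing with every combination of them.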


