VidConv: A modernized 2D ConvNet for Efficient Video Recognition

07/08/2022
by Chuong H. Nguyen, et al.

Since their introduction in 2020, Vision Transformers (ViTs) have steadily broken records on many vision tasks and are often described as "all-you-need" to replace ConvNets. Despite that, ViTs are generally computationally expensive, memory-intensive, and ill-suited to embedded devices. In addition, recent research shows that a standard ConvNet, if redesigned and trained appropriately, can compete favorably with ViTs in accuracy and scalability. In this paper, we adopt the modernized ConvNet structure to design a new backbone for action recognition. In particular, our main target is industrial product deployment, such as on FPGA boards where only standard operations are supported. Our network therefore consists solely of 2D convolutions, without any 3D convolution, long-range attention plugin, or Transformer block. Although trained for far fewer epochs (5x-10x fewer), our backbone surpasses methods that use (2+1)D and 3D convolutions and achieves results comparable to ViTs on two benchmark datasets.

research
01/10/2022

A ConvNet for the 2020s

The "Roaring 20s" of visual recognition began with the introduction of V...
research
11/30/2021

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

Built on top of self-attention mechanisms, vision transformers have demo...
research
08/05/2018

3D Depthwise Convolution: Reducing Model Parameters in 3D Vision Tasks

Standard 3D convolution operations require much larger amounts of memory...
research
03/08/2022

EdgeFormer: Improving Light-weight ConvNets by Learning from Vision Transformers

Recently, vision transformers started to show impressive results which o...
research
02/01/2021

Video Transformer Network

This paper presents VTN, a transformer-based framework for video recogni...
research
03/17/2023

LION: Implicit Vision Prompt Tuning

Despite recent competitive performance across a range of vision tasks, v...
research
12/25/2020

Inception Convolution with Efficient Dilation Search

Dilation convolution is a critical mutant of standard convolution neural...
