EdgeFormer: Improving Light-weight ConvNets by Learning from Vision Transformers

03/08/2022
by Haokui Zhang, et al.

Recently, vision transformers have begun to show impressive results that significantly outperform large convolution-based models. However, for small models targeting mobile or resource-constrained devices, ConvNets still hold advantages in both performance and model complexity. We propose EdgeFormer, a pure ConvNet-based backbone model that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose global circular convolution (GCC) with position embeddings, a light-weight convolution op that has a global receptive field while producing location-sensitive features as local convolutions do. We combine GCCs and squeeze-excitation ops to form a meta-former-like model block, which additionally has an attention mechanism as transformers do. This block can be used in a plug-and-play manner to replace relevant blocks in ConvNets or transformers. Experiment results show that the proposed EdgeFormer achieves better performance than popular light-weight ConvNets and vision-transformer-based models on common vision tasks and datasets, while having fewer parameters and faster inference speed. For classification on ImageNet-1k, EdgeFormer achieves 78.6% top-1 accuracy with about 5.0 million parameters, saving 11% parameters and 13% computational cost while gaining 0.2% higher accuracy and 23% faster inference speed compared with MobileViT, and it uses only 0.5 times the parameters while gaining 2.7% accuracy compared with DeiT. On MS-COCO object detection and PASCAL VOC segmentation tasks, EdgeFormer also shows better performance.
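The core idea of GCC is a depth-wise convolution whose kernel spans an entire spatial dimension, with circular padding so every output position sees the whole row or column, and a learnable position embedding so the output stays location sensitive. Below is a minimal PyTorch sketch of a height-oriented variant; the class name GCC_H, the initialization scale, and the fixed feature-map size are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a global circular convolution (GCC) along the height
# axis, assuming a fixed feature-map size known at construction time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCC_H(nn.Module):
    """Depth-wise circular convolution spanning the full height dimension."""
    def __init__(self, channels: int, size: int):
        super().__init__()
        # Kernel covers the entire height, so every output location
        # aggregates the whole column: a global receptive field.
        self.weight = nn.Parameter(torch.randn(channels, 1, size, 1) * 0.02)
        # Learnable position embedding keeps the op location sensitive.
        self.pos = nn.Parameter(torch.zeros(1, channels, size, 1))

    def forward(self, x):  # x: (B, C, H, W) with H == size
        x = x + self.pos
        # Circular padding: wrap the feature map along height so the
        # kernel slides around the column instead of running off the edge.
        x = torch.cat([x, x[:, :, :-1, :]], dim=2)
        # Depth-wise conv (groups == channels); output is (B, C, H, W).
        return F.conv2d(x, self.weight, groups=x.shape[1])

x = torch.randn(2, 64, 14, 14)
y = GCC_H(channels=64, size=14)(x)
print(y.shape)  # torch.Size([2, 64, 14, 14])
```

A width-oriented twin would use a (1, size) kernel with circular padding along the last axis; the paper pairs the two orientations, together with squeeze-excitation for channel attention, inside its meta-former-style block, but that block wiring is not reproduced here.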


Related research

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer (10/05/2021)
Light-weight convolutional neural networks (CNNs) are the de-facto for m...

DeLighT: Very Deep and Light-weight Transformer (08/03/2020)
We introduce a very deep and light-weight transformer, DeLighT, that del...

Fast-ParC: Position Aware Global Kernel for ConvNets and ViTs (10/08/2022)
Transformer models have made tremendous progress in various fields in re...

FalconNet: Factorization for the Light-weight ConvNets (06/10/2023)
Designing light-weight CNN models with little parameters and Flops is a ...

Residual Mixture of Experts (04/20/2022)
Mixture of Experts (MoE) is able to scale up vision transformers effecti...

Decoupling Semantic Context and Color Correlation with multi-class cross branch regularization (10/18/2018)
Success and applicability of Deep Neural Network (DNN) based methods for...

VidConv: A modernized 2D ConvNet for Efficient Video Recognition (07/08/2022)
Since being introduced in 2020, Vision Transformers (ViT) has been stead...
