
Mobile-Former: Bridging MobileNet and Transformer

08/12/2021
by Yinpeng Chen, et al. (Microsoft, USTC)

We present Mobile-Former, a parallel design of MobileNet and Transformer with a two-way bridge in between. This structure leverages the advantage of MobileNet at local processing and of the transformer at global interaction, and the bridge enables bidirectional fusion of local and global features. Different from recent work on vision transformers, the transformer in Mobile-Former contains very few tokens (e.g., fewer than 6) that are randomly initialized, resulting in low computational cost. Combined with the proposed lightweight cross attention that models the bridge, Mobile-Former is not only computationally efficient but also has stronger representation power, outperforming MobileNetV3 in the low-FLOP regime from 25M to 500M FLOPs on ImageNet classification. For instance, it achieves 77.9% top-1 accuracy at 294M FLOPs, gaining 1.3% over MobileNetV3 while saving 17% of the computation. When transferred to object detection, Mobile-Former outperforms MobileNetV3 by 8.6 AP.
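The abstract describes a parallel structure: a MobileNet-style convolutional branch for local features, a tiny transformer over a handful of global tokens, and a two-way cross-attention bridge between them. Below is a minimal PyTorch sketch of one such block. The layer choices, dimensions, and the simplified single-head bridge attention are illustrative assumptions, not the authors' exact implementation (see the linked repositories for faithful reproductions).

```python
# Illustrative Mobile-Former block (not the authors' exact configuration).
import torch
import torch.nn as nn


class MobileFormerBlock(nn.Module):
    def __init__(self, channels=64, token_dim=128):
        super().__init__()
        # Mobile branch: depthwise 3x3 + pointwise 1x1, a stand-in for the
        # inverted bottleneck used in MobileNet-style blocks.
        self.mobile = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU6(inplace=True),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Former branch: one small transformer layer over the global tokens.
        # Cheap because the number of tokens is tiny.
        self.former = nn.TransformerEncoderLayer(
            d_model=token_dim, nhead=4, dim_feedforward=2 * token_dim,
            batch_first=True,
        )
        # Mobile->Former bridge: tokens are projected to queries; the feature
        # map is used directly as keys/values (keeps the bridge lightweight).
        self.to_query = nn.Linear(token_dim, channels)
        self.from_feat = nn.Linear(channels, token_dim)
        # Former->Mobile bridge: pixels query the tokens; only the tokens are
        # projected to keys/values.
        self.token_key = nn.Linear(token_dim, channels)
        self.token_value = nn.Linear(token_dim, channels)

    def forward(self, x, tokens):
        # x: (B, C, H, W) local feature map; tokens: (B, M, token_dim).
        B, C, H, W = x.shape
        feat = x.flatten(2).transpose(1, 2)                                  # (B, HW, C)

        # Mobile -> Former: cross attention with tokens as queries.
        q = self.to_query(tokens)                                            # (B, M, C)
        attn = torch.softmax(q @ feat.transpose(1, 2) / C ** 0.5, dim=-1)    # (B, M, HW)
        tokens = tokens + self.from_feat(attn @ feat)                        # fuse local info into tokens

        # Former: self-attention + FFN over the few global tokens.
        tokens = self.former(tokens)

        # Mobile: local processing of the feature map.
        local = self.mobile(x)                                               # (B, C, H, W)

        # Former -> Mobile: cross attention with pixels as queries.
        k = self.token_key(tokens)                                           # (B, M, C)
        v = self.token_value(tokens)                                         # (B, M, C)
        attn2 = torch.softmax(feat @ k.transpose(1, 2) / C ** 0.5, dim=-1)   # (B, HW, M)
        global_ctx = (attn2 @ v).transpose(1, 2).reshape(B, C, H, W)
        return local + global_ctx, tokens                                    # bidirectional fusion


if __name__ == "__main__":
    block = MobileFormerBlock(channels=64, token_dim=128)
    x = torch.randn(2, 64, 28, 28)
    # In the full model these would be learnable, randomly initialized tokens
    # shared across the batch (the paper uses very few, e.g. six or fewer).
    tokens = torch.randn(2, 6, 128)
    y, t = block(x, tokens)
    print(y.shape, t.shape)  # torch.Size([2, 64, 28, 28]) torch.Size([2, 6, 128])
```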


Related Research

04/13/2023
Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space
We introduce Dynamic Mobile-Former(DMF), maximizes the capabilities of d...

05/25/2022
MoCoViT: Mobile Convolutional Vision Transformer
Recently, Transformer networks have achieved impressive results on a var...

11/14/2022
CabViT: Cross Attention among Blocks for Vision Transformer
Since the vision transformer (ViT) has achieved impressive performance i...

11/25/2021
Global Interaction Modelling in Vision Transformer via Super Tokens
With the popularity of Transformer architectures in computer vision, the...

03/23/2023
MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
Mobile monocular 3D object detection (Mono3D) (e.g., on a vehicle, a dro...

11/20/2021
Discrete Representations Strengthen Vision Transformer Robustness
Vision Transformer (ViT) is emerging as the state-of-the-art architectur...

08/30/2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
This paper presents Hire-MLP, a simple yet competitive vision MLP archit...

Code Repositories

Pytorch-implementation-of-Mobile-Former

Simple implementation of Mobile-Former in PyTorch

MobileFormer

MobileFormer in torch
