MoCoViT: Mobile Convolutional Vision Transformer

05/25/2022
by   Hailong Ma, et al.
0

Recently, Transformer networks have achieved impressive results on a variety of vision tasks. However, most of them are computationally expensive and not suitable for real-world mobile applications. In this work, we present Mobile Convolutional Vision Transformer (MoCoViT), which improves in performance and efficiency by introducing transformer into mobile convolutional networks to leverage the benefits of both architectures. Different from recent works on vision transformer, the mobile transformer block in MoCoViT is carefully designed for mobile devices and is very lightweight, accomplished through two primary modifications: the Mobile Self-Attention (MoSA) module and the Mobile Feed Forward Network (MoFFN). MoSA simplifies the calculation of the attention map through Branch Sharing scheme while MoFFN serves as a mobile version of MLP in the transformer, further reducing the computation by a large margin. Comprehensive experiments verify that our proposed MoCoViT family outperform state-of-the-art portable CNNs and transformer neural architectures on various vision tasks. On ImageNet classification, it achieves 74.5 147M FLOPs, gaining 1.2 COCO object detection task, MoCoViT outperforms GhostNet by 2.1 AP in RetinaNet framework.

READ FULL TEXT
research
10/05/2021

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

Light-weight convolutional neural networks (CNNs) are the de-facto for m...
research
05/30/2023

Vision Transformers for Mobile Applications: A Short Survey

Vision Transformers (ViTs) have demonstrated state-of-the-art performanc...
research
06/12/2021

Dynamic Clone Transformer for Efficient Convolutional Neural Netwoks

Convolutional networks (ConvNets) have shown impressive capability to so...
research
08/12/2021

Mobile-Former: Bridging MobileNet and Transformer

We present Mobile-Former, a parallel design of MobileNet and Transformer...
research
08/30/2021

Exploring and Improving Mobile Level Vision Transformers

We study the vision transformer structure in the mobile level in this pa...
research
04/16/2022

Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks

Despite the exciting performance, Transformer is criticized for its exce...
research
06/19/2023

Vision Transformer with Attention Map Hallucination and FFN Compaction

Vision Transformer(ViT) is now dominating many vision tasks. The drawbac...

Please sign up or login with your details

Forgot password? Click here to reset