MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features

09/30/2022
by   Shakti N. Wadekar, et al.
0

MobileViT (MobileViTv1) combines convolutional neural networks (CNNs) and vision transformers (ViTs) to create light-weight models for mobile vision tasks. Though the main MobileViTv1-block helps to achieve competitive state-of-the-art results, the fusion block inside MobileViTv1-block, creates scaling challenges and has a complex learning task. We propose changes to the fusion block that are simple and effective to create MobileViTv3-block, which addresses the scaling and simplifies the learning task. Our proposed MobileViTv3-block used to create MobileViTv3-XXS, XS and S models outperform MobileViTv1 on ImageNet-1k, ADE20K, COCO and PascalVOC2012 datasets. On ImageNet-1K, MobileViTv3-XXS and MobileViTv3-XS surpasses MobileViTv1-XXS and MobileViTv1-XS by 2 architecture removes fusion block and uses linear complexity transformers to perform better than MobileViTv1. We add our proposed fusion block to MobileViTv2 to create MobileViTv3-0.5, 0.75 and 1.0 models. These new models give better accuracy numbers on ImageNet-1k, ADE20K, COCO and PascalVOC2012 datasets as compared to MobileViTv2. MobileViTv3-0.5 and MobileViTv3-0.75 outperforms MobileViTv2-0.5 and MobileViTv2-0.75 by 2.1 on ImageNet-1K dataset. For segmentation task, MobileViTv3-1.0 achieves 2.07 and 1.1 PascalVOC2012 dataset respectively. Our code and the trained models are available at: https://github.com/micronDLA/MobileViTv3

READ FULL TEXT

page 17

page 18

page 19

page 20

research
10/05/2021

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

Light-weight convolutional neural networks (CNNs) are the de-facto for m...
research
07/17/2022

Performance degradation of ImageNet trained models by simple image transformations

ImageNet trained PyTorch models are generally preferred as the off-the-s...
research
03/30/2023

ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing

Recent studies have shown that higher accuracy on ImageNet usually leads...
research
04/07/2022

Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results

ImageNet serves as the primary dataset for evaluating the quality of com...
research
07/21/2022

SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks

Recent isotropic networks, such as ConvMixer and vision transformers, ha...
research
03/30/2021

Automated Cleanup of the ImageNet Dataset by Model Consensus, Explainability and Confident Learning

The convolutional neural networks (CNNs) trained on ILSVRC12 ImageNet we...
research
05/13/2023

Meta-Polyp: a baseline for efficient Polyp segmentation

In recent years, polyp segmentation has gained significant importance, a...

Please sign up or login with your details

Forgot password? Click here to reset