MobileViG: Graph-Based Sparse Attention for Mobile Vision Applications

07/01/2023
by Mustafa Munir, et al.

Traditionally, convolutional neural networks (CNNs) and vision transformers (ViTs) have dominated computer vision. However, recently proposed vision graph neural networks (ViGs) provide a new avenue for exploration. Unfortunately, for mobile applications, ViGs are computationally expensive due to the overhead of representing images as graph structures. In this work, we propose a new graph-based sparse attention mechanism, Sparse Vision Graph Attention (SVGA), that is designed for ViGs running on mobile devices. Additionally, we propose the first hybrid CNN-GNN architecture for vision tasks on mobile devices, MobileViG, which uses SVGA. Extensive experiments show that MobileViG beats existing ViG models and existing mobile CNN and ViT architectures in terms of accuracy and/or speed on image classification, object detection, and instance segmentation tasks. Our fastest model, MobileViG-Ti, achieves 75.7% top-1 accuracy on ImageNet-1K with 0.78 ms inference latency on iPhone 13 Mini NPU (compiled with CoreML), which is faster than MobileNetV2x1.4 (1.02 ms, 74.7% top-1) and MobileNetV2x1.0 (0.81 ms, 71.8% top-1). MobileViG-B obtains 82.6% top-1 accuracy and is faster and more accurate than the similarly sized EfficientFormer-L3 model (2.77 ms, 82.4% top-1). Hybrid CNN-GNN architectures can be a new avenue of exploration for designing models that are extremely fast and accurate on mobile devices. Our code is publicly available at https://github.com/SLDGroup/MobileViG.
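
To make the latency and accuracy comparison for MobileViG-Ti concrete, the figures quoted in the abstract can be turned into speedup factors and top-1 gains. The snippet below is only a small arithmetic sketch: the dictionary `reported` and the helper `compare` are hypothetical names introduced here for illustration and are not part of the MobileViG code base. It uses only the three models for which the abstract states both a latency and an accuracy.

```python
# Figures quoted in the abstract: (latency in ms, ImageNet-1K top-1 accuracy).
# The names `reported` and `compare` are illustrative, not from the MobileViG repo.
reported = {
    "MobileViG-Ti": (0.78, 75.7),
    "MobileNetV2x1.4": (1.02, 74.7),
    "MobileNetV2x1.0": (0.81, 71.8),
}


def compare(model, baseline, table=reported):
    """Return (speedup factor, top-1 gain in points) of `model` over `baseline`."""
    lat_model, acc_model = table[model]
    lat_base, acc_base = table[baseline]
    # Speedup > 1 means `model` has lower latency than `baseline`.
    return lat_base / lat_model, acc_model - acc_base


for baseline in ("MobileNetV2x1.4", "MobileNetV2x1.0"):
    speedup, gain = compare("MobileViG-Ti", baseline)
    print(f"vs {baseline}: {speedup:.2f}x faster, {gain:+.1f} points top-1")
```

On these numbers, MobileViG-Ti comes out roughly 1.31x faster than MobileNetV2x1.4 and roughly 1.04x faster than MobileNetV2x1.0, while also being more accurate than both.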

