Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space

04/13/2023
by Seokju Yun, et al.

We introduce Dynamic Mobile-Former (DMF), which maximizes the capabilities of dynamic convolution by harmonizing it with efficient operators. Dynamic Mobile-Former effectively exploits the advantages of Dynamic MobileNet (MobileNet equipped with dynamic convolution) using global information from lightweight attention. The Transformer in Dynamic Mobile-Former requires only a few randomly initialized tokens to compute global features, making it computationally efficient, and a bridge between Dynamic MobileNet and the Transformer allows bidirectional integration of local and global features. We also simplify the optimization of vanilla dynamic convolution by splitting the convolution kernel into an input-agnostic kernel and an input-dependent kernel; this allows optimization over a wider kernel space, resulting in enhanced capacity. By integrating lightweight attention with this enhanced dynamic convolution, Dynamic Mobile-Former achieves not only high efficiency but also strong performance. We benchmark Dynamic Mobile-Former on a series of vision tasks and show that it achieves impressive performance on image classification, COCO detection, and instance segmentation. For example, our DMF reaches 79.4% top-1 accuracy on ImageNet-1K, higher than PVT-Tiny by 4.3% while using fewer FLOPs. Additionally, our DMF-S model performs well on challenging vision datasets such as COCO, achieving 39.0 AP, above that of the Mobile-Former 508M model, despite using 3 GFLOPs less computation. Code and models are available at https://github.com/ysj9909/DMF
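To make the "few randomly initialized tokens" idea concrete, here is a minimal PyTorch sketch of the Mobile-to-Former direction of the bridge: a handful of learnable tokens cross-attend over the flattened feature map to gather global context cheaply. The class name GlobalTokenAttention, the token count, and the dimensions are illustrative assumptions, not the authors' exact module.

```python
import torch
import torch.nn as nn

class GlobalTokenAttention(nn.Module):
    """Sketch: M learnable tokens summarize a feature map via cross-attention."""
    def __init__(self, dim=128, num_tokens=6, num_heads=4):
        super().__init__()
        # A handful of randomly initialized, learnable global tokens.
        self.tokens = nn.Parameter(torch.randn(1, num_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat):                        # feat: (B, C, H, W), C == dim
        b, c, h, w = feat.shape
        kv = feat.flatten(2).transpose(1, 2)        # (B, H*W, C) spatial positions
        q = self.tokens.expand(b, -1, -1)           # (B, M, C) shared tokens
        # Tokens attend over all positions: cost O(M * H * W), cheap for small M.
        out, _ = self.attn(q, kv, kv)
        return out                                  # (B, M, C) global features
```

Because attention is computed only between M tokens and the feature map (never position-to-position), the cost stays linear in the spatial size, which is what keeps the Transformer side lightweight.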
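The kernel-space split can likewise be sketched in PyTorch. Under stated assumptions, the per-sample kernel is the sum of a shared, input-agnostic kernel (a residual connection in kernel space) and an input-dependent kernel mixed from K candidate kernels by attention weights; DecomposedDynamicConv and its hyperparameters are hypothetical names, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedDynamicConv(nn.Module):
    """Sketch: dynamic conv whose kernel = static (input-agnostic) + dynamic part."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4):
        super().__init__()
        self.kernel_size = kernel_size
        # Input-agnostic kernel, shared across all inputs.
        self.static_kernel = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        # K candidate kernels mixed per input to form the input-dependent part.
        self.candidate_kernels = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        # Lightweight attention over candidates, driven by globally pooled features.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, num_kernels))

    def forward(self, x):
        b, c, h, w = x.shape
        pi = F.softmax(self.attn(x), dim=1)                   # (B, K) mixing weights
        dyn = torch.einsum('bk,koiyx->boiyx', pi, self.candidate_kernels)
        kernel = self.static_kernel.unsqueeze(0) + dyn        # (B, O, I, k, k)
        out_ch = kernel.shape[1]
        # Batch the B per-sample convolutions as one grouped convolution.
        x = x.reshape(1, b * c, h, w)
        kernel = kernel.reshape(b * out_ch, c, self.kernel_size, self.kernel_size)
        y = F.conv2d(x, kernel, padding=self.kernel_size // 2, groups=b)
        return y.reshape(b, out_ch, y.shape[-2], y.shape[-1])

conv = DecomposedDynamicConv(32, 64)
y = conv(torch.randn(2, 32, 56, 56))   # -> (2, 64, 56, 56)
```

The grouped-convolution reshape is a standard trick for applying a different kernel to each sample in the batch; the static term guarantees a stable, input-independent component, so the attention branch only has to learn the residual, which is the simplified optimization the abstract describes.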

research · 08/12/2021
Mobile-Former: Bridging MobileNet and Transformer
We present Mobile-Former, a parallel design of MobileNet and Transformer...

research · 07/22/2019
MixNet: Mixed Depthwise Convolutional Kernels
Depthwise convolution is becoming increasingly popular in modern efficie...

research · 12/20/2021
Lite Vision Transformer with Enhanced Self-Attention
Despite the impressive representation capacity of vision transformer mod...

research · 06/08/2021
Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight
Vision Transformer (ViT) attains state-of-the-art performance in visual ...

research · 10/04/2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
This paper presents MOAT, a family of neural networks that build on top ...

research · 08/29/2023
Learning to Upsample by Learning to Sample
We present DySample, an ultra-lightweight and effective dynamic upsample...

research · 03/15/2021
Revisiting Dynamic Convolution via Matrix Decomposition
Recent research in dynamic convolution shows substantial performance boo...
