Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts

05/30/2023
by Rishov Sarkar, et al.

Computer vision researchers are embracing two promising paradigms: Vision Transformers (ViTs) and Multi-task Learning (MTL), both of which show strong performance but are computation-intensive, given the quadratic complexity of self-attention in ViTs and the need to activate an entire large MTL model for a single task. M^3ViT is the latest multi-task ViT model that introduces mixture-of-experts (MoE), where only a small portion of subnetworks ("experts") are sparsely and dynamically activated based on the current task. M^3ViT achieves better accuracy and over 80% computation reduction, but leaves challenges for efficient deployment on FPGA. Our work, dubbed Edge-MoE, solves these challenges to introduce the first end-to-end FPGA accelerator for multi-task ViT, with a collection of architectural innovations including (1) a novel reordering mechanism for self-attention, which requires only constant bandwidth regardless of the target parallelism; (2) a fast single-pass softmax approximation; (3) an accurate and low-cost GELU approximation; (4) a unified and flexible computing unit that is shared by almost all computational layers to maximally reduce resource usage; and (5) uniquely for M^3ViT, a novel patch reordering method to eliminate memory access overhead. Edge-MoE achieves 2.24x and 4.90x better energy efficiency compared with GPU and CPU, respectively. A real-time video demonstration is available online, along with our open-source code written using High-Level Synthesis.
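To make the abstract's second innovation concrete: a "single-pass" softmax computes its normalization statistics in one streaming sweep instead of a separate max-finding pass over the data. Below is a minimal C++ sketch of the standard online-softmax recurrence (a running maximum plus a rescaled running sum); it illustrates the general technique, not necessarily Edge-MoE's exact hardware implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Online softmax: a single streaming pass computes both the running
// maximum m and the running sum s of exp(x[i] - m), rescaling s whenever
// m grows. A final pass writes the normalized outputs, so no separate
// max-finding sweep over the input is needed.
std::vector<float> online_softmax(const std::vector<float>& x) {
    float m = -INFINITY;  // running maximum
    float s = 0.0f;       // running sum of exp(x[i] - m)
    for (float v : x) {
        float m_new = std::max(m, v);
        s = s * std::exp(m - m_new) + std::exp(v - m_new);  // rescale old sum
        m = m_new;
    }
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = std::exp(x[i] - m) / s;
    return y;
}
```

Similarly, for the low-cost GELU approximation (innovation 3), a common reference point is the tanh-based form of Hendrycks and Gimpel; the paper's FPGA-friendly variant may use a different, cheaper form, but this shows the kind of closed-form approximation that replaces the exact Gaussian CDF.

```cpp
#include <cmath>

// Tanh-based GELU approximation:
//   GELU(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
inline float gelu_approx(float x) {
    const float k = 0.7978845608f;  // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}
```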


Related research

10/26/2022
M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Multi-task learning (MTL) encapsulates multiple learned tasks in a singl...

05/25/2022
Eliciting Transferability in Multi-task Learning with Task-level Mixture-of-Experts
Recent work suggests that transformer models are capable of multi-task l...

10/18/2021
Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention
In recent years, transformer models have revolutionized Natural Language...

07/31/2023
Generative models for wearables data
Data scarcity is a common obstacle in medical research due to the high c...

07/16/2023
TransNuSeg: A Lightweight Multi-Task Transformer for Nuclei Segmentation
Nuclei appear small in size, yet, in real clinical practice, the global ...

05/25/2022
Real-Time Video Deblurring via Lightweight Motion Compensation
While motion compensation greatly improves video deblurring quality, sep...
