Light-Weight Vision Transformer with Parallel Local and Global Self-Attention

07/18/2023
by Nikolas Ebert et al.

While transformer architectures have dominated computer vision in recent years, these models cannot easily be deployed on resource-constrained hardware for autonomous-driving tasks that require real-time performance. Their computational complexity and memory requirements limit their use, especially for applications with high-resolution inputs. In this work, we redesign the powerful state-of-the-art Vision Transformer PLG-ViT into a much more compact and efficient architecture suitable for such tasks. We identify the computationally expensive blocks in the original PLG-ViT architecture and propose several redesigns aimed at reducing the number of parameters and floating-point operations. As a result, we reduce PLG-ViT in size by a factor of 5 with only a moderate drop in performance. We propose two variants, optimized for the best trade-offs between parameter count and runtime, and between parameter count and accuracy. With only 5 million parameters, we achieve 79.5% top-1 accuracy on the ImageNet-1K classification benchmark. Our networks also perform strongly on general vision benchmarks such as COCO instance segmentation. In addition, we conduct a series of experiments demonstrating the potential of our approach for various tasks tailored to the challenges of autonomous driving and transportation.
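The abstract does not spell out the attention scheme, but the parallel local/global idea named in the title can be illustrated with a minimal NumPy sketch. This is a simplified illustration, not the actual PLG-ViT block: it uses 1-D token sequences, a single head, and no learned Q/K/V projections, whereas the real architecture operates on 2-D feature maps with learned projections. The function names and the average-pooling choice for the global branch are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention
    scale = q.shape[-1] ** -0.5
    return softmax(q @ k.swapaxes(-1, -2) * scale) @ v

def parallel_local_global_attention(x, window=4, pool=4):
    """Illustrative sketch: each token attends locally within a
    non-overlapping window, while a parallel branch lets every token
    attend to average-pooled global summary tokens. The two branch
    outputs are summed (a simplification of the real fusion)."""
    n, d = x.shape
    # local branch: self-attention inside each window
    local = np.concatenate([
        attention(w, w, w) for w in x.reshape(n // window, window, d)
    ])
    # global branch: pooled tokens act as keys/values for all queries
    g = x.reshape(n // pool, pool, d).mean(axis=1)  # (n // pool, d)
    global_out = attention(x, g, g)
    return local + global_out

x = np.random.default_rng(0).normal(size=(16, 8))
y = parallel_local_global_attention(x)
print(y.shape)  # (16, 8)
```

The local branch keeps the cost linear in sequence length for a fixed window size, while the global branch scales with the pooled (much shorter) token count, which is the kind of trade-off that makes such designs attractive for high-resolution, real-time inputs.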


Related research

- Light-Weight RefineNet for Real-Time Semantic Segmentation (10/08/2018)
- Efficiency 360: Efficient Vision Transformers (02/16/2023)
- MiniViT: Compressing Vision Transformers with Weight Multiplexing (04/14/2022)
- Transformer-based models and hardware acceleration analysis in autonomous driving: A survey (04/21/2023)
- Fast Scene Understanding for Autonomous Driving (08/08/2017)
- A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking (09/05/2023)
- Less is More! A slim architecture for optimal language translation (05/18/2023)
