Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers

06/03/2023
by Chenyang Lu, et al.

This paper introduces Content-aware Token Sharing (CTS), a token reduction approach that improves the computational efficiency of semantic segmentation networks that use Vision Transformers (ViTs). Existing works have proposed token reduction approaches to improve the efficiency of ViT-based image classification networks, but these methods are not directly applicable to semantic segmentation, a gap we address in this work. We observe that, for semantic segmentation, multiple image patches can share a token if they contain the same semantic class, because such patches carry redundant information. Our approach leverages this by employing an efficient, class-agnostic policy network that predicts whether image patches contain the same semantic class and lets them share a token if they do. Through experiments, we explore the critical design choices of CTS and show its effectiveness on the ADE20K, Pascal Context and Cityscapes datasets, across various ViT backbones and different segmentation decoders. With Content-aware Token Sharing, we are able to reduce the number of processed tokens by up to 44
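The token-sharing idea described above can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions: patches are grouped into non-overlapping 2x2 blocks, a boolean `share_mask` stands in for the output of the class-agnostic policy network, and a shared token is formed by averaging the four patch embeddings. The helper names `share_tokens` and `unshare_tokens` are hypothetical.

```python
from typing import List, Optional, Tuple

Vector = List[float]
Cell = Tuple[int, int]


def share_tokens(
    patches: List[List[Vector]], share_mask: List[List[bool]]
) -> Tuple[List[Vector], List[List[Cell]]]:
    """Build a ViT token sequence from an H x W grid of patch embeddings.

    share_mask has shape (H // 2) x (W // 2); True means the (hypothetical)
    policy network predicts that the 2x2 block contains a single class.
    Returns the tokens and, for each token, the patch positions it covers.
    """
    height, width = len(patches), len(patches[0])
    tokens: List[Vector] = []
    coverage: List[List[Cell]] = []
    for bi in range(height // 2):
        for bj in range(width // 2):
            cells = [(2 * bi + di, 2 * bj + dj) for di in (0, 1) for dj in (0, 1)]
            if share_mask[bi][bj]:
                # Assumption: the shared token is the mean of the four embeddings.
                vecs = [patches[i][j] for i, j in cells]
                tokens.append([sum(dim) / len(vecs) for dim in zip(*vecs)])
                coverage.append(cells)
            else:
                # Keep one token per patch for blocks predicted to be mixed.
                for i, j in cells:
                    tokens.append(list(patches[i][j]))
                    coverage.append([(i, j)])
    return tokens, coverage


def unshare_tokens(
    tokens: List[Vector], coverage: List[List[Cell]], height: int, width: int
) -> List[List[Optional[Vector]]]:
    """Scatter (processed) tokens back to a full H x W grid for the decoder."""
    grid: List[List[Optional[Vector]]] = [[None] * width for _ in range(height)]
    for token, cells in zip(tokens, coverage):
        for i, j in cells:
            # Every patch covered by a shared token receives a copy of it.
            grid[i][j] = list(token)
    return grid


if __name__ == "__main__":
    # 4x4 grid of 1-D "embeddings" whose values are the patch indices 0..15.
    patches = [[[float(4 * i + j)] for j in range(4)] for i in range(4)]
    mask = [[True, False], [False, True]]
    tokens, coverage = share_tokens(patches, mask)
    print(len(tokens), "tokens instead of", 4 * 4)  # prints: 10 tokens instead of 16
    grid = unshare_tokens(tokens, coverage, 4, 4)
    print(grid[1][1], grid[3][3])  # prints: [2.5] [12.5]
```

For a 4x4 patch grid in which two of the four 2x2 blocks are marked as shareable, the ViT would process 10 tokens instead of 16. After the transformer, `unshare_tokens` copies each shared token's output back to every patch position it covers, so a decoder that expects one feature per patch can still consume the full grid.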


