Fully Transformer Networks for Semantic Image Segmentation

06/08/2021
by Sitong Wu, et al.

Transformers have shown impressive performance in various natural language processing and computer vision tasks, owing to their ability to model long-range dependencies. Recent work has shown that combining such transformers with CNN-based semantic image segmentation models is very promising. However, how well a pure transformer-based approach can perform on image segmentation has not yet been well studied. In this work, we explore a novel framework for semantic image segmentation: the encoder-decoder based Fully Transformer Networks (FTN). Specifically, we first propose a Pyramid Group Transformer (PGT) as the encoder for progressively learning hierarchical features while reducing the computational complexity of the standard visual transformer (ViT). Then, we propose a Feature Pyramid Transformer (FPT) as the decoder to fuse semantic-level and spatial-level information from multiple levels of the PGT encoder for semantic image segmentation. Surprisingly, this simple baseline achieves new state-of-the-art results on multiple challenging semantic segmentation benchmarks, including PASCAL Context, ADE20K and COCO-Stuff. The source code will be released upon the publication of this work.
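
As a rough illustration of why a pyramid of grouped attention can lower cost relative to global attention, the sketch below simply counts query-key pairs. It is not the authors' implementation: the abstract does not say how PGT reduces complexity, so the grouping scheme (attention restricted to equal-sized, non-overlapping token groups), the strides (4, 8, 16, 32), and every function name here are assumptions made purely for illustration.

```python
from typing import List, Sequence


def global_attention_pairs(num_tokens: int) -> int:
    """Query-key pairs scored when every token attends to every token."""
    return num_tokens * num_tokens


def grouped_attention_pairs(num_tokens: int, num_groups: int) -> int:
    """Query-key pairs scored when attention is confined to each group.

    Assumption: tokens are split into equal-sized, non-overlapping groups.
    """
    if num_groups <= 0 or num_tokens % num_groups != 0:
        raise ValueError("num_tokens must be divisible by a positive num_groups")
    tokens_per_group = num_tokens // num_groups
    return num_groups * tokens_per_group * tokens_per_group


def pyramid_token_counts(
    height: int, width: int, strides: Sequence[int] = (4, 8, 16, 32)
) -> List[int]:
    """Tokens per stage of a hypothetical feature pyramid.

    Assumption: each stage downsamples the input by the given stride.
    """
    return [(height // s) * (width // s) for s in strides]


if __name__ == "__main__":
    tokens = pyramid_token_counts(512, 512)
    for stage, n in enumerate(tokens, start=1):
        full = global_attention_pairs(n)
        grouped = grouped_attention_pairs(n, num_groups=64)
        print(f"stage {stage}: {n} tokens, global={full}, grouped={grouped}")
```

Under these assumptions, splitting N tokens into G equal groups reduces the pair count from N^2 to N^2 / G, which matters most at the high-resolution early stages where N is largest.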

research
03/18/2021

UNETR: Transformers for 3D Medical Image Segmentation

Fully Convolutional Neural Networks (FCNNs) with contracting and expansi...
research
01/04/2022

Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images

Semantic segmentation of brain tumors is a fundamental medical image ana...
research
09/20/2018

Recent progress in semantic image segmentation

Semantic image segmentation, which becomes one of the key applications i...
research
12/28/2022

Representation Separation for Semantic Segmentation with Vision Transformers

Vision transformers (ViTs) encoding an image as a sequence of patches br...
research
05/14/2019

Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images

Recent progress of deep image classification models has provided great p...
research
03/31/2022

ReSTR: Convolution-free Referring Image Segmentation Using Transformers

Referring image segmentation is an advanced semantic segmentation task w...
research
07/03/2023

MeT: A Graph Transformer for Semantic Segmentation of 3D Meshes

Polygonal meshes have become the standard for discretely approximating 3...
