
Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation

by Sunghwan Hong et al.

This paper presents a novel cost aggregation network, called Volumetric Aggregation with Transformers (VAT), for few-shot segmentation. The use of transformers can benefit correlation map aggregation through self-attention over a global receptive field. However, the tokenization of a correlation map for transformer processing can be detrimental, because the discontinuity at token boundaries reduces the local context available near the token edges and decreases inductive bias. To address this problem, we propose a 4D Convolutional Swin Transformer, where a high-dimensional Swin Transformer is preceded by a series of small-kernel convolutions that impart local context to all pixels and introduce convolutional inductive bias. We additionally boost aggregation performance by applying transformers within a pyramidal structure, where aggregation at a coarser level guides aggregation at a finer level. Noise in the transformer output is then filtered in the subsequent decoder with the help of the query's appearance embedding. With this model, a new state-of-the-art is set for all the standard benchmarks in few-shot segmentation. It is shown that VAT attains state-of-the-art performance for semantic correspondence as well, where cost aggregation also plays a central role.
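The core idea — smooth the 4D correlation volume with small-kernel convolutions so that every position carries local context before the volume is partitioned into windows for attention — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the kernel here is a fixed box filter rather than a learned one, the shapes (`conv4d`, `partition_windows`, an 8×8×8×8 volume, window side 4) are illustrative assumptions, and the attention that VAT applies inside each 4D window is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv4d(corr, kernel):
    """Naive 4D convolution (zero padding, stride 1) over a correlation
    volume of shape (Hq, Wq, Hs, Ws). Illustrative only: a fixed kernel
    stands in for the learned small-kernel convolutions in the paper."""
    pad = kernel.shape[0] // 2
    padded = np.pad(corr, pad)  # zero-pad every axis on both sides
    # Sliding 4D neighborhoods: shape (Hq, Wq, Hs, Ws, k, k, k, k)
    windows = sliding_window_view(padded, kernel.shape)
    # Weighted sum over each neighborhood = convolution output
    return np.einsum('...ijkl,ijkl->...', windows, kernel)

def partition_windows(corr, M):
    """Split a (Hq, Wq, Hs, Ws) volume into non-overlapping 4D windows
    of side M (the Swin-style tokenization), returning
    (num_windows, M, M, M, M)."""
    Hq, Wq, Hs, Ws = corr.shape
    x = corr.reshape(Hq // M, M, Wq // M, M, Hs // M, M, Ws // M, M)
    x = x.transpose(0, 2, 4, 6, 1, 3, 5, 7)
    return x.reshape(-1, M, M, M, M)

rng = np.random.default_rng(0)
corr = rng.standard_normal((8, 8, 8, 8))          # toy correlation volume

# Convolve first, so positions near window boundaries already carry
# context from neighboring windows before tokenization cuts them apart.
smoothed = conv4d(corr, np.ones((3, 3, 3, 3)) / 81.0)
wins = partition_windows(smoothed, 4)
print(wins.shape)  # (16, 4, 4, 4, 4)
```

If the volume were partitioned directly, positions on a window edge would attend only within their own window; applying the convolution first mixes in information from adjacent windows, which is the discontinuity the paper's convolutional stage is designed to remove.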




Cost Aggregation Is All You Need for Few-Shot Segmentation

We introduce a novel cost aggregation network, dubbed Volumetric Aggrega...

CATs++: Boosting Cost Aggregation with Convolutions and Transformers

Cost aggregation is a highly important process in image matching tasks, ...

Characterizing Renal Structures with 3D Block Aggregate Transformers

Efficiently quantifying renal structures can provide distinct spatial co...

nnFormer: Interleaved Transformer for Volumetric Segmentation

Transformers, the default model of choices in natural language processin...

Semantic Correspondence with Transformers

We propose a novel cost aggregation network, called Cost Aggregation wit...

Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence

We present a novel architecture for dense correspondence. The current st...

Edge-augmented Graph Transformers: Global Self-attention is Enough for Graphs

Transformer neural networks have achieved state-of-the-art results for u...