Cost Aggregation Is All You Need for Few-Shot Segmentation

by   Sunghwan Hong, et al.

We introduce a novel cost aggregation network, dubbed Volumetric Aggregation with Transformers (VAT), to tackle the few-shot segmentation task by using both convolutions and transformers to efficiently handle high dimensional correlation maps between query and support. In specific, we propose our encoder consisting of volume embedding module to not only transform the correlation maps into more tractable size but also inject some convolutional inductive bias and volumetric transformer module for the cost aggregation. Our encoder has a pyramidal structure to let the coarser level aggregation to guide the finer level and enforce to learn complementary matching scores. We then feed the output into our affinity-aware decoder along with the projected feature maps for guiding the segmentation process. Combining these components, we conduct experiments to demonstrate the effectiveness of the proposed method, and our method sets a new state-of-the-art for all the standard benchmarks in few-shot segmentation task. Furthermore, we find that the proposed method attains state-of-the-art performance even for the standard benchmarks in semantic correspondence task although not specifically designed for this task. We also provide an extensive ablation study to validate our architectural choices. The trained weights and codes are available at:


page 1

page 8

page 14

page 16

page 17

page 18

page 19

page 20


Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation

This paper presents a novel cost aggregation network, called Volumetric ...

Doubly Deformable Aggregation of Covariance Matrices for Few-shot Segmentation

Training semantic segmentation models with few annotated samples has gre...

Semantic Correspondence with Transformers

We propose a novel cost aggregation network, called Cost Aggregation wit...

Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence

We present a novel architecture for dense correspondence. The current st...

CATs++: Boosting Cost Aggregation with Convolutions and Transformers

Cost aggregation is a highly important process in image matching tasks, ...

Multi-Level Matching and Aggregation Network for Few-Shot Relation Classification

This paper presents a multi-level matching and aggregation network (MLMA...

Characterizing Renal Structures with 3D Block Aggregate Transformers

Efficiently quantifying renal structures can provide distinct spatial co...

Please sign up or login with your details

Forgot password? Click here to reset