
CATs++: Boosting Cost Aggregation with Convolutions and Transformers

by Seokju Cho, et al.

Cost aggregation is a critical step in image matching tasks; it aims to disambiguate noisy matching scores. Existing methods generally rely on hand-crafted or CNN-based techniques, which either lack robustness to severe deformations or inherit the limitations of CNNs, which fail to discriminate incorrect matches because of their limited receptive fields and lack of adaptability. In this paper, we introduce Cost Aggregation with Transformers (CATs), which addresses this by exploring global consensus among the initial correlation map, with architectural designs that let us fully exploit the global receptive fields of the self-attention mechanism. CATs has limitations of its own: a standard transformer incurs high computational costs because its complexity grows with the size of the spatial and feature dimensions, which restricts its applicability to limited resolutions and results in rather limited performance. To alleviate these limitations, we propose CATs++, an extension of CATs. Our proposed methods outperform the previous state-of-the-art methods by large margins, setting a new state of the art on all the benchmarks, including PF-WILLOW, PF-PASCAL, and SPair-71k. We further provide extensive ablation studies and analyses.
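To make the idea of aggregating a correlation map with self-attention concrete, here is a minimal NumPy sketch. It is not the CATs or CATs++ architecture: the function names (`correlation_map`, `self_attention_aggregate`), the single-head attention layer, the random weights, and the toy feature sizes are all assumptions chosen for illustration. It only shows the general pattern of building matching scores from two feature maps and letting every source position attend to every other one.

```python
import numpy as np

def correlation_map(feat_src, feat_tgt):
    """Cosine-similarity matching scores between two flattened feature maps.

    feat_src, feat_tgt: arrays of shape (N, C), one feature vector per pixel.
    Returns an (N_src, N_tgt) matrix of scores in [-1, 1].
    """
    src = feat_src / (np.linalg.norm(feat_src, axis=1, keepdims=True) + 1e-8)
    tgt = feat_tgt / (np.linalg.norm(feat_tgt, axis=1, keepdims=True) + 1e-8)
    return src @ tgt.T

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_aggregate(cost, w_q, w_k, w_v):
    """One single-head self-attention pass over the rows of the cost matrix.

    Each source pixel's row of scores is treated as a token, so every
    source position can attend to every other one (global receptive field).
    """
    q, k, v = cost @ w_q, cost @ w_k, cost @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))  # shape (N_src, N_src)
    return cost + attn @ v                          # residual connection

# Toy example with hypothetical sizes: 4x4 feature maps, 8 channels.
rng = np.random.default_rng(0)
h, w, c, d = 4, 4, 8, 16
n = h * w
feat_src = rng.standard_normal((n, c))
feat_tgt = rng.standard_normal((n, c))

cost = correlation_map(feat_src, feat_tgt)
w_q = rng.standard_normal((n, d)) * 0.1   # random stand-ins for learned weights
w_k = rng.standard_normal((n, d)) * 0.1
w_v = rng.standard_normal((n, n)) * 0.1

refined = self_attention_aggregate(cost, w_q, w_k, w_v)
matches = refined.argmax(axis=1)          # best target index per source pixel
```

In this sketch the attention matrix has `N_src * N_src` entries, so its memory and compute grow quadratically with the number of pixels. That is one way to see why a standard transformer over a correlation map becomes expensive as resolution increases.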


Semantic Correspondence with Transformers

We propose a novel cost aggregation network, called Cost Aggregation wit...

Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation

This paper presents a novel cost aggregation network, called Volumetric ...

Cost Aggregation Is All You Need for Few-Shot Segmentation

We introduce a novel cost aggregation network, dubbed Volumetric Aggrega...

TANet: A new Paradigm for Global Face Super-resolution via Transformer-CNN Aggregation Network

Recently, face super-resolution (FSR) methods either feed whole face ima...

Multi-scale Feature Aggregation for Crowd Counting

Convolutional Neural Network (CNN) based crowd counting methods have ach...

Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence

We present a novel architecture for dense correspondence. The current st...

Efficient Linear Attention for Fast and Accurate Keypoint Matching

Recently Transformers have provided state-of-the-art performance in spar...