MatchFormer: Interleaving Attention in Transformers for Feature Matching

03/17/2022
by   Qing Wang, et al.

Local feature matching is a computationally intensive task at the subpixel level. Detector-based methods coupled with feature descriptors struggle in low-texture scenes, while CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, enabling a human-intuitive extract-and-match scheme. Such a match-aware encoder relieves the overloaded decoder and makes the model highly efficient. Further, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or with less outdoor training data. Thanks to this strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer has only 45% GFLOPs, yet achieves a +1.3% precision gain and a 41% running speed boost. The large MatchFormer reaches state-of-the-art on four different benchmarks, including indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatch), and visual localization (InLoc). Code will be made publicly available at https://github.com/jamycheung/MatchFormer.
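The interleaving idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with identity projections (no learned weights), and it omits the multi-scale hierarchy, patch embedding, and any efficiency-oriented attention variant. The names `attend`, `residual`, `interleaved_stage`, and the `pattern` parameter are hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, sources):
    """Scaled dot-product attention with identity projections (an assumption).

    Each query row is replaced by a softmax-weighted average of the source rows.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in sources]
        weights = softmax(scores)
        out.append([sum(w * src[j] for w, src in zip(weights, sources))
                    for j in range(d)])
    return out


def residual(x, update):
    """Element-wise residual connection: x + update."""
    return [[a + b for a, b in zip(r, u)] for r, u in zip(x, update)]


def interleaved_stage(feat_a, feat_b, pattern=("self", "cross")):
    """Apply self- and cross-attention in the order given by `pattern`.

    "self":  each image attends to its own features (feature extraction).
    "cross": each image attends to the other image's features (matching).
    """
    for kind in pattern:
        if kind == "self":
            new_a = residual(feat_a, attend(feat_a, feat_a))
            new_b = residual(feat_b, attend(feat_b, feat_b))
        elif kind == "cross":
            new_a = residual(feat_a, attend(feat_a, feat_b))
            new_b = residual(feat_b, attend(feat_b, feat_a))
        else:
            raise ValueError(f"unknown attention kind: {kind}")
        feat_a, feat_b = new_a, new_b
    return feat_a, feat_b


if __name__ == "__main__":
    # Toy features: 3 tokens for image A, 2 tokens for image B, 2 channels each.
    a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [[0.5, 0.5], [2.0, -1.0]]
    out_a, out_b = interleaved_stage(a, b)
    print(len(out_a), len(out_b))
```

In this sketch, a `pattern` containing only `"self"` leaves each image's features independent of the other image, whereas adding `"cross"` lets the encoder itself exchange information between the two images, which is the extract-and-match behaviour the abstract describes.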


