WT-MVSNet: Window-based Transformers for Multi-view Stereo

05/28/2022
by Jinli Liao, et al.

Recently, Transformers have been shown to enhance the performance of multi-view stereo by enabling long-range feature interaction. In this work, we propose Window-based Transformers (WT) for local feature matching and global feature aggregation in multi-view stereo. We introduce a Window-based Epipolar Transformer (WET), which reduces matching redundancy by using epipolar constraints. Since point-to-line matching is sensitive to erroneous camera poses and calibration, we match windows near the epipolar lines. A second Shifted WT is employed to aggregate global information within the cost volume. We present a novel Cost Transformer (CT) to replace 3D convolutions for cost volume regularization. To better constrain the depth maps estimated from multiple views, we further design a novel geometric consistency loss (Geo Loss), which penalizes unreliable areas where multi-view consistency is not satisfied. Our WT multi-view stereo method (WT-MVSNet) achieves state-of-the-art performance across multiple datasets and ranks 1st on the Tanks and Temples benchmark.
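
The idea of matching windows near an epipolar line, rather than individual points on it, can be illustrated with a small geometric sketch. This is not the WET module itself and is not taken from the paper: it assumes a known fundamental matrix F between the reference and source views, non-overlapping square windows, and a distance threshold of half the window diagonal. The function names epipolar_line and windows_near_line are hypothetical.

```python
import numpy as np


def epipolar_line(F, x_ref):
    """Epipolar line in the source image for reference pixel x_ref = (u, v).

    Returns (a, b, c) with a*x + b*y + c = 0, scaled so that a^2 + b^2 = 1.
    Assumes the line is not degenerate (a and b are not both zero).
    """
    u, v = x_ref
    line = F @ np.array([u, v, 1.0])
    return line / np.hypot(line[0], line[1])


def windows_near_line(line, height, width, window):
    """Indices (row, col) of non-overlapping windows close to the line.

    A window is kept when its centre lies within half a window diagonal
    of the line (an illustrative threshold, not the paper's rule).
    """
    cy = (np.arange(height // window) + 0.5) * window  # window centre rows
    cx = (np.arange(width // window) + 0.5) * window   # window centre columns
    gx, gy = np.meshgrid(cx, cy)                       # shape: (rows, cols)
    dist = np.abs(line[0] * gx + line[1] * gy + line[2])
    return np.argwhere(dist <= window * np.sqrt(2.0) / 2.0)


# Rectified pair (pure horizontal translation): epipolar lines are image rows.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
line = epipolar_line(F, (5.0, 10.0))
candidates = windows_near_line(line, height=32, width=32, window=8)
```

In this toy setup only the row of windows whose centres are closest to y = 10 is selected, so attention would be restricted to that band. Widening the threshold trades more matching candidates for more tolerance to pose and calibration error.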


