MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

04/15/2022
by Xiaofeng Wang, et al.

Learning-based Multi-View Stereo (MVS) methods warp source images into the reference camera frustum to form 3D volumes, which are fused into a cost volume that is regularized by subsequent networks. The fusing step plays a vital role in bridging 2D semantics and 3D spatial associations. However, previous methods use extra networks to learn 2D information as fusing cues, which underuses 3D spatial correlations and adds computation cost. We therefore present MVSTER, which leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently. Specifically, the epipolar Transformer uses a detachable monocular depth estimator to enhance 2D semantics and cross-attention to construct data-dependent 3D associations along the epipolar line. Additionally, MVSTER is built in a cascade structure, where entropy-regularized optimal transport is leveraged to propagate finer depth estimations in each stage. Extensive experiments show that MVSTER achieves state-of-the-art reconstruction performance with significantly higher efficiency: compared with MVSNet and CasMVSNet, MVSTER achieves 34% and 14% reductions in running time. MVSTER also ranks first on Tanks and Temples-Advanced among all published works. Code is released at https://github.com/JeffWang987.
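
As a rough illustration of the entropy-regularized optimal transport mentioned in the abstract, the sketch below runs plain Sinkhorn iterations between a predicted depth distribution and a target distribution over the same depth hypotheses. It is not taken from the MVSTER code: the function names `depth_cost` and `sinkhorn_distance`, the normalized index-distance cost, and the values of `eps` and `n_iters` are all assumptions chosen for the example.

```python
import math


def depth_cost(num_depths):
    """Hypothetical ground cost: normalized distance between depth-bin indices."""
    scale = max(num_depths - 1, 1)
    return [[abs(i - j) / scale for j in range(num_depths)]
            for i in range(num_depths)]


def sinkhorn_distance(p, q, cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT between histograms p and q (each sums to 1).

    Returns the transport cost of the Sinkhorn plan and the plan itself.
    """
    n, m = len(p), len(q)
    # Gibbs kernel: K[i][j] = exp(-cost[i][j] / eps)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the plan's row sums match p
        u = [p[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the plan's column sums match q
        v = [q[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


if __name__ == "__main__":
    cost = depth_cost(4)
    target = [0.0, 0.0, 0.0, 1.0]   # one-hot target depth bin
    near = [0.0, 0.0, 1.0, 0.0]     # prediction one bin away
    far = [1.0, 0.0, 0.0, 0.0]      # prediction three bins away
    print(sinkhorn_distance(near, target, cost)[0])
    print(sinkhorn_distance(far, target, cost)[0])
```

Unlike a per-bin classification loss, a transport-based distance of this kind grows with how far the predicted depth bin is from the target bin, which is one plausible reason to prefer it for depth distributions.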


