VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion

12/01/2021
by Noah Stier, et al.

Recent volumetric 3D reconstruction methods can produce very accurate results, with plausible geometry even for unobserved surfaces. However, they face an undesirable trade-off when it comes to multi-view fusion. They can fuse all available view information by global averaging, thus losing fine detail, or they can heuristically cluster views for local fusion, thus restricting their ability to consider all views jointly. Our key insight is that greater detail can be retained without restricting view diversity by learning a view-fusion function conditioned on camera pose and image content. We propose to learn this multi-view fusion using a transformer. To this end, we introduce VoRTX, an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion. Our model is occlusion-aware, leveraging the transformer architecture to predict an initial, projective scene geometry estimate. This estimate is used to avoid backprojecting image features through surfaces into occluded regions. We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods. We also demonstrate generalization without any fine-tuning, outperforming the same state-of-the-art methods on two other datasets, TUM-RGBD and ICL-NUIM.
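The core idea, fusing per-voxel features from many views with attention conditioned on camera pose and image content rather than plain averaging, can be illustrated with a minimal sketch. This is not the paper's actual architecture (VoRTX is a full end-to-end network); the function names, feature shapes, and single-head attention here are illustrative assumptions only.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_views(view_feats, pose_enc, w_q, w_k):
    """Fuse per-view features at one voxel with pose-conditioned attention.

    Illustrative sketch, not the VoRTX implementation.
    view_feats: (V, C) image features backprojected to this voxel
    pose_enc:   (V, P) per-view camera-pose encoding
    w_q, w_k:   (C+P, D) learned query/key projections (hypothetical)
    """
    # Condition attention on both image content and camera pose.
    tokens = np.concatenate([view_feats, pose_enc], axis=1)  # (V, C+P)
    q = tokens @ w_q                                          # (V, D)
    k = tokens @ w_k                                          # (V, D)
    # Scaled dot-product attention across the V views.
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)    # (V, V)
    # Pool the attended features into one fused voxel feature.
    return (attn @ view_feats).mean(axis=0)                   # (C,)
```

Unlike global averaging, the attention weights let informative views dominate at each voxel; unlike heuristic view clustering, all views are still considered jointly.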

Related research

07/05/2021 | TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
We introduce TransformerFusion, a transformer-based 3D scene reconstruct...

10/17/2021 | 3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers
3D reconstruction aims to reconstruct 3D objects from 2D views. Previous...

05/26/2020 | SurfaceNet+: An End-to-end 3D Neural Network for Very Sparse Multi-view Stereopsis
Multi-view stereopsis (MVS) tries to recover the 3D model from 2D images...

07/05/2022 | Array Camera Image Fusion using Physics-Aware Transformers
We demonstrate a physics-aware transformer for feature-based data fusion...

05/05/2021 | FLEX: Parameter-free Multi-view 3D Human Motion Reconstruction
The increasing availability of video recordings made by multiple cameras...

08/17/2023 | V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints
We introduce a learning-based depth map fusion framework that accepts a ...

06/23/2021 | LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction
Most modern deep learning-based multi-view 3D reconstruction techniques ...
