TransformerFusion: Monocular RGB Scene Reconstruction using Transformers

07/05/2021
by   Aljaž Božič, et al.

We introduce TransformerFusion, a transformer-based 3D scene reconstruction approach. From an input monocular RGB video, the video frames are processed by a transformer network that fuses the observations into a volumetric feature grid representing the scene; this feature grid is then decoded into an implicit 3D scene representation. Key to our approach is the transformer architecture that enables the network to learn to attend to the most relevant image frames for each 3D location in the scene, supervised only by the scene reconstruction task. Features are fused in a coarse-to-fine fashion, storing fine-level features only where needed, which reduces memory requirements and enables fusion at interactive rates. The feature grid is then decoded to a higher-resolution scene reconstruction, using an MLP-based surface occupancy prediction from interpolated coarse-to-fine 3D features. Our approach results in an accurate surface reconstruction, outperforming state-of-the-art multi-view stereo depth estimation methods, fully-convolutional 3D reconstruction approaches, and approaches using LSTM- or GRU-based recurrent networks for video sequence fusion.
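
The idea of attending to the most relevant frames for a given 3D location can be illustrated with a toy example. The sketch below is not the authors' architecture: it assumes single-head scaled dot-product attention, a hypothetical learned query vector, and one feature vector per frame already sampled at the location. The names `fuse_views`, `query`, and `view_features` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(query, view_features):
    """Fuse per-frame features for one 3D location with dot-product attention.

    query: hypothetical learned query vector of length d (an assumption).
    view_features: one feature vector of length d per frame observing the location.
    Returns (fused_feature, attention_weights).
    """
    d = len(query)
    # Scaled dot-product score between the query and each frame's feature.
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
        for feat in view_features
    ]
    weights = softmax(scores)
    # The fused feature is the attention-weighted average of the frame features.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, view_features))
        for i in range(d)
    ]
    return fused, weights


# Toy data: three frames observe the same location; the first aligns with the query.
query = [1.0, 0.0]
views = [[2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
fused, weights = fuse_views(query, views)
```

In this toy setup the first frame receives the largest weight because its feature aligns best with the query. In a trained network the weights would be learned from the reconstruction loss, consistent with the abstract's statement that attention is supervised only by the scene reconstruction task.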

research
12/01/2021

VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion

Recent volumetric 3D reconstruction methods can produce very accurate re...
research
03/24/2022

RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers

We propose a transformer-based neural network architecture for multi-obj...
research
06/15/2022

PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed Monocular Videos

We present PlanarRecon – a novel framework for globally coherent detecti...
research
01/31/2023

Monocular Scene Reconstruction with 3D SDF Transformers

Monocular scene reconstruction from posed images is challenging due to t...
research
06/29/2022

GO-Surf: Neural Feature Grid Optimization for Fast, High-Fidelity RGB-D Surface Reconstruction

We present GO-Surf, a direct feature grid optimization method for accura...
research
04/21/2023

VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos

We propose VisFusion, a visibility-aware online 3D scene reconstruction ...
research
02/18/2019

Multi-layer Depth and Epipolar Feature Transformers for 3D Scene Reconstruction

We tackle the problem of automatically reconstructing a complete 3D mode...
