Multi-view 3D Reconstruction with Transformer

03/24/2021
by Dan Wang, et al.

Deep CNN-based methods have so far achieved state-of-the-art results in multi-view 3D object reconstruction. Despite this considerable progress, the two core modules of these methods, multi-view feature extraction and fusion, are usually investigated separately, and the relations of the object across different views are rarely explored. In this paper, inspired by the recent success of self-attention-based Transformer models, we reformulate multi-view 3D reconstruction as a sequence-to-sequence prediction problem and propose a new framework named 3D Volume Transformer (VolT) for this task. Unlike previous CNN-based methods with a separate design, we unify feature extraction and view fusion in a single Transformer network. A natural advantage of this design is that view-to-view relationships can be explored through self-attention over multiple unordered inputs. On ShapeNet, a large-scale 3D reconstruction benchmark dataset, our method achieves new state-of-the-art accuracy in multi-view reconstruction with 70% fewer parameters than other CNN-based methods. Experimental results also suggest that our method scales well. Our code will be made publicly available.
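To make the sequence-to-sequence formulation concrete, the sketch below treats patch tokens from all input views as one unordered sequence, fuses them with Transformer self-attention, and decodes a voxel occupancy grid. This is a minimal illustrative assumption of the idea described in the abstract; the module names, layer sizes, patch embedding, and pooling-plus-linear volume decoder are placeholders, not the authors' actual VolT architecture.

```python
# Minimal sketch: multi-view 3D reconstruction as sequence-to-sequence prediction.
# All sizes and module choices are illustrative assumptions, not the VolT design.
import torch
import torch.nn as nn

class MultiViewVolumeTransformer(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_layers=6, patch=16, vox=32):
        super().__init__()
        self.vox = vox
        # Shared patch embedding: each view becomes a set of feature tokens.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        # Self-attention runs jointly over the tokens of all views, so feature
        # extraction and view fusion happen in one network and view-to-view
        # relations are modeled directly; omitting a cross-view positional
        # encoding keeps the set of input views unordered.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Decode the pooled representation into a vox^3 occupancy grid.
        self.head = nn.Linear(d_model, vox ** 3)

    def forward(self, views):                       # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        x = self.patch_embed(views.flatten(0, 1))   # (B*V, D, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)            # (B*V, N, D) tokens per view
        x = x.reshape(b, v * x.shape[1], -1)        # concat tokens of all views
        x = self.encoder(x)                         # joint self-attention fusion
        x = x.mean(dim=1)                           # order-invariant pooling
        logits = self.head(x)                       # (B, vox^3)
        return logits.view(b, self.vox, self.vox, self.vox)

# Example: fuse 5 unordered views of 2 objects into 32^3 occupancy volumes.
if __name__ == "__main__":
    model = MultiViewVolumeTransformer()
    volumes = model(torch.randn(2, 5, 3, 224, 224))
    print(volumes.shape)  # torch.Size([2, 32, 32, 32])
```

Because the fused tokens are mean-pooled before decoding, the prediction is invariant to the order in which the views are supplied, which matches the unordered-input property highlighted in the abstract.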



Related research

06/23/2021  LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction
Most modern deep learning-based multi-view 3D reconstruction techniques ...

05/29/2022  3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction
Recently, the transformer model has been successfully employed for the m...

11/11/2022  An Improved End-to-End Multi-Target Tracking Method Based on Transformer Self-Attention
This study proposes an improved end-to-end multi-target tracking algorit...

02/27/2023  UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction
In recent years, many video tasks have achieved breakthroughs by utilizi...

11/04/2022  GARNet: Global-Aware Multi-View 3D Reconstruction Network and the Cost-Performance Tradeoff
Deep learning technology has made great progress in multi-view 3D recons...

08/21/2020  Single-Image Depth Prediction Makes Feature Matching Easier
Good local features improve the robustness of many 3D re-localization an...

08/05/2021  Semi- and Self-Supervised Multi-View Fusion of 3D Microscopy Images using Generative Adversarial Networks
Recent developments in fluorescence microscopy allow capturing high-reso...
