Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images

01/31/2019
by Hongxun Yao, et al.

Recovering the 3D representation of an object from single-view or multi-view RGB images with deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse the feature maps extracted from the input images. However, given the same set of input images in different orders, RNN-based approaches cannot produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit the input images to refine the reconstruction. To solve these problems, we propose Pix2Vox, a novel framework for single-view and multi-view 3D reconstruction. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A context-aware fusion module then adaptively selects high-quality reconstructions for each part (e.g., table legs) from the different coarse 3D volumes to obtain a fused 3D volume. Finally, a refiner further refines the fused 3D volume to generate the final output. Experimental results on the ShapeNet and Pascal 3D+ benchmarks indicate that Pix2Vox outperforms state-of-the-art methods by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. Experiments on unseen 3D categories from ShapeNet show the superior generalization ability of our method.
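The fusion step described in the abstract can be illustrated with a small sketch. The abstract does not say how the per-part quality of each coarse volume is scored or how the selection is made, so the code below makes two assumptions: each view comes with a per-voxel score (here simply given as input), and the scores are normalized across views with a softmax before taking a weighted sum. The function name `fuse_volumes` and the flattened-list layout of the volumes are hypothetical and chosen only for illustration. The sketch also shows why this kind of fusion does not depend on the order of the input views, unlike sequential RNN fusion.

```python
import math
from typing import List, Sequence


def fuse_volumes(volumes: Sequence[Sequence[float]],
                 scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-view coarse volumes with per-voxel softmax weights.

    volumes[v][i]: occupancy probability of voxel i predicted from view v.
    scores[v][i]:  assumed quality score of voxel i in view v (hypothetical;
                   the abstract does not specify how scores are obtained).
    Volumes are flattened to 1D lists to keep the sketch short.
    """
    n_views = len(volumes)
    n_voxels = len(volumes[0])
    fused = []
    for i in range(n_voxels):
        column = [scores[v][i] for v in range(n_views)]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in column]
        total = sum(exps)
        # Weighted sum over views; the weights for each voxel sum to 1.
        fused.append(sum(e / total * volumes[v][i]
                         for v, e in enumerate(exps)))
    return fused


# Two views of a 4-voxel volume: view 0 is trusted for voxels 0-1,
# view 1 for voxels 2-3 (illustrative numbers, not data from the paper).
volumes = [[0.9, 0.8, 0.4, 0.5],
           [0.5, 0.4, 0.9, 0.1]]
scores = [[5.0, 5.0, -5.0, -5.0],
          [-5.0, -5.0, 5.0, 5.0]]

fused = fuse_volumes(volumes, scores)

# Feeding the same views in reverse order gives the same fused volume,
# because the softmax and the sum are taken over the set of views.
swapped = fuse_volumes(volumes[::-1], scores[::-1])
```

With scores this far apart, the fused volume is close to taking each voxel from the higher-scored view. With equal scores, the fusion reduces to a plain average of the views.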


