Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images

by Hongxun Yao, et al.

Recovering the 3D representation of an object from single-view or multi-view RGB images with deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse multiple feature maps extracted from the input images. However, when given the same set of input images in different orders, RNN-based approaches are unable to produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit the input images to refine reconstruction results. To solve these problems, we propose a novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A context-aware fusion module then adaptively selects high-quality reconstructions for each part (e.g., table legs) from the different coarse 3D volumes to obtain a fused 3D volume. Finally, a refiner further refines the fused 3D volume to generate the final output. Experimental results on the ShapeNet and Pascal 3D+ benchmarks indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. Experiments on unseen 3D categories from ShapeNet show the superior generalization ability of our method.
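
To illustrate why fusing per-view volumes can be independent of input order, the following is a minimal sketch of one plausible fusion rule: a per-voxel softmax over view scores followed by a weighted sum of the coarse volumes. The abstract does not specify the mechanism, so the softmax weighting, the function name `fuse_volumes`, and the array shapes are assumptions made for illustration. In a trained model the scores would come from a learned module; here they are random placeholders.

```python
import numpy as np


def fuse_volumes(coarse_volumes, scores):
    """Fuse per-view coarse volumes with per-voxel softmax weights.

    Both arguments have shape (n_views, D, H, W). `scores` is a hypothetical
    input standing in for the output of a learned scoring module.
    """
    coarse_volumes = np.asarray(coarse_volumes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    if coarse_volumes.shape != scores.shape:
        raise ValueError("coarse_volumes and scores must have the same shape")

    # Softmax across the view axis, shifted by the max for numerical stability.
    shifted = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)

    # Each fused voxel is a convex combination of that voxel across views.
    return (weights * coarse_volumes).sum(axis=0)


# Placeholder data: 3 views of a 4x4x4 occupancy grid (assumed sizes).
rng = np.random.default_rng(0)
volumes = rng.random((3, 4, 4, 4))
scores = rng.normal(size=(3, 4, 4, 4))

fused = fuse_volumes(volumes, scores)

# Reordering the views (with their scores) leaves the fused volume unchanged
# up to floating-point rounding, unlike a sequential RNN-style fusion.
perm = [2, 0, 1]
fused_permuted = fuse_volumes(volumes[perm], scores[perm])
print(np.allclose(fused, fused_permuted))
```

Because the weights are computed per voxel, a view that scores highly on one part of the object (for example, the table legs) can dominate there while contributing little elsewhere.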




