1 Introduction
Shape representations are central to many of the recent advancements in 3D computer vision and computer graphics, ranging from neural rendering
Oechsle2019ICCV; Mildenhall2020ECCV; Sitzmann2019CVPR; Meshry2019CVPR to shape reconstruction Mescheder2019CVPR; Park2019CVPR; Chen2019CVPR; Peng2020ECCV; Jiang2020CVPR; Niemeyer2020CVPR; Yariv2020NEURIPS. While conventional representations such as point clouds and meshes are efficient and well-studied, they also suffer from several limitations: point clouds are lightweight and easy to obtain, but do not directly encode surface information; meshes, on the other hand, are usually restricted to fixed topologies. More recently, neural implicit representations Mescheder2019CVPR; Park2019CVPR; Chen2019CVPR have shown promising results for representing geometry due to their flexibility in encoding varied topologies and their easy integration with differentiable frameworks. However, as such representations encode surface information only implicitly, extracting the underlying surface is typically slow: it requires numerous network evaluations in 3D space to extract complete surfaces using marching cubes Mescheder2019CVPR; Park2019CVPR; Chen2019CVPR, or along rays for intersection detection in the context of volumetric rendering Niemeyer2020CVPR; Yariv2020NEURIPS; Mildenhall2020ECCV; Oechsle2021ARXIV.

In this work, we introduce a novel Poisson solver which performs fast GPU-accelerated Differentiable Poisson Surface Reconstruction (DPSR) and solves for an indicator function from an oriented point cloud in a few milliseconds. Thanks to the differentiability of our Poisson solver, gradients from a loss on the output mesh or a loss on the intermediate indicator grid can be efficiently backpropagated to update the oriented point cloud representation. This differentiable bridge between points, indicator functions, and meshes allows us to represent shapes as oriented point clouds. We therefore call this shape representation
Shape As Points (SAP). Compared to existing shape representations, Shape As Points has the following advantages (see also Table 1):

Table 1: Overview of shape representations. Columns: Point Cloud, Voxel, Mesh, Patches, Neural Implicit, SAP (Ours).

Efficiency  Grid Eval Time (↓)  n/a  n/a  n/a  n/a  0.33 s  0.012 s
Priors  Easy Initialization  ✔  ✔  ✔  ✘  ✘  ✔
Quality  Watertight  ✘  ✔  ✔  ✘  ✔  ✔
Quality  No Self-intersection  n/a  n/a  ✘  ✘  ✔  ✔
Quality  Topology-Agnostic  ✔  ✔  ✘  ✔  ✔  ✔
Efficiency: SAP has a low memory footprint as it only requires storing a collection of oriented point samples at the surface, rather than volumetric quantities (voxels) or a large number of network parameters for neural implicit representations. Using spectral methods, the indicator field can be computed efficiently (12 ms at a resolution of 128³ voxels¹), compared to the typically slow query time of neural implicit networks (330 ms using Mescheder2019CVPR at the same resolution). Accuracy: The resulting mesh can be generated at high resolutions, is guaranteed to be watertight, free from self-intersections, and also topology-agnostic. Initialization: It is easy to initialize SAP with a given geometry such as template shapes or noisy observations. In contrast, neural implicit representations are harder to initialize, except for a few simple primitives like spheres Atzmon2020CVPR.

¹On average, our method requires 12 ms for computing a 128³ indicator grid from 15K points on a single NVIDIA GTX 1080Ti GPU. Computing a 256³ indicator grid requires 140 ms.
To investigate the aforementioned properties, we perform a set of controlled experiments. Moreover, we demonstrate state-of-the-art performance in reconstructing surface geometry from unoriented point clouds in two settings: an optimization-based setting that does not require training and is applicable to a wide range of shapes, and a learning-based setting for conditional shape reconstruction that is robust to noisy point clouds and outliers. In summary, the main contributions of this work are:


We present Shape As Points, a novel shape representation that is interpretable, lightweight, and yields high-quality watertight meshes at low inference times.

The core of the Shape As Points representation is a versatile, differentiable, and generalizable Poisson solver that can be used for a range of applications.

We study various properties inherent to the Shape As Points representation, including inference time, sensitivity to initialization, and topology-agnostic representation capacity.

We demonstrate state-of-the-art reconstruction results from noisy, unoriented point clouds at a significantly reduced computational budget compared to existing methods.
2 Related Work
2.1 3D Shape Representations
3D shape representations are central to 3D computer vision and graphics. Shape representations can generally be categorized as either explicit or implicit. Explicit shape representations, and learning algorithms depending on them, directly parameterize the surface of the geometry, either as a point cloud Qi2017CVPR; Qi2017NIPS; Fan2017CVPR; Wang2019TOG; Yang2019ICCV, a parameterized mesh Wang2018ECCV; Jiang2019ICLR; Gupta2020NEURIPS; Jiang2020NEURIPS, or surface patches Yang2018CVPR; Groueix2018CVPR; Williams2019CVPR; Badki2020ARXIV; YangCVPR2020; Ma2021CVPR. Explicit representations are usually lightweight and require few parameters to represent the geometry, but suffer from discretization artifacts, the difficulty of representing watertight surfaces (point clouds, surface patches), or a restriction to a predefined topology (meshes). Implicit representations, in contrast, represent the shape as a level set of a continuous function over a discretized voxel grid Wu2016NIPS; Jiang2017ARXIV; Liao2018CVPR; Dai2018CVPR
or more recently parameterized as a neural network, typically referred to as neural implicit functions
Mescheder2019CVPR; Park2019CVPR; Chen2019CVPR. Neural implicit representations have been successfully used to represent geometries of objects Mescheder2019CVPR; Park2019CVPR; Chen2019CVPR; Wang2019NIPSa; Genova2019ARXIV; Sitzmann2020NEURIPS; Niemeyer2020CVPR; Tancik2020NEURIPS and scenes Peng2020ECCV; Jiang2020CVPR; Chabra2020ECCV; Sitzmann2020NEURIPS; Niemeyer2020GIRAFFE; Lionar2021WACV. Additionally, neural implicit functions are able to represent radiance fields which allow for highfidelity appearance and novel view synthesis Mildenhall2020ARXIV; Martinbrualla2020CVPR. However, extracting surface geometry from implicit representations typically requires dense evaluation of multilayer perceptrons, either on a volumetric grid or along rays, resulting in slow inference time. In contrast, SAP efficiently solves the Poisson Equation during inference by representing the shape as an oriented point cloud.
2.2 Optimization-based 3D Reconstruction from Point Clouds
Several works have addressed the problem of inferring continuous surfaces from a point cloud. They tackle this task by utilizing basis functions, set properties of the points, or neural networks. Early works in shape reconstruction from point clouds utilize the convex hull or alpha shapes Edelsbrunner1994TOG. The ball pivoting algorithm Bernardini1999TOVCG leverages the continuity property of spherical balls of a given radius. One of the most popular techniques, Poisson Surface Reconstruction (PSR) Kazhdan2006PSR; Kazhdan2013SIGGRAPH, solves the Poisson Equation and inherits smoothness properties from the basis functions used in the Poisson Equation. However, PSR is sensitive to the normals of the input points, which must be inferred in a separate preprocessing step. In contrast, our method does not require any normal estimation and is thus more robust to noise. More recent works take advantage of the continuous nature of neural networks as function approximators to fit surfaces to point sets
Williams2019CVPR; Hanocka2020SIGGRAPH; Gropp2020ICML; Metzer2021SIGGRAPH. However, these methods tend to be memory- and compute-intensive, while our method yields high-quality watertight meshes in a few milliseconds.

2.3 Learning-based 3D Reconstruction from Point Clouds
Learning-based approaches exploit a training set of 3D shapes to infer the parameters of a reconstruction model. Some approaches focus on local data priors Jiang2020CVPR; Badki2020ARXIV, which typically results in better generalization but suffers when large surfaces must be completed. Other approaches learn object-level Liao2018CVPR; Mescheder2019CVPR; Park2019CVPR or scene-level priors Dai2019ARIXV; Peng2020ECCV; Jiang2020CVPR; Dai2020CVPR. Most reconstruction approaches directly reconstruct a meshed surface geometry, though some works Guerrero2018CGF; Ben2019CVPR; Ben2020ECCV; Lenssen2020CVPR first predict point set normals and subsequently reconstruct the geometry via PSR Kazhdan2006PSR; Kazhdan2013SIGGRAPH. However, such methods fail to handle large levels of noise, since they are unable to move points or selectively ignore outliers. In contrast, our end-to-end approach addresses this issue by moving outlier points to the actual surface, or by selectively muting outliers, either through paired point clusters that cancel each other out or by reducing the magnitude of the predicted normals, which controls their influence on the reconstruction.
3 Method
At the core of the Shape As Points representation is a differentiable Poisson solver, which can be used for both optimization-based and learning-based surface estimation. We first introduce the Poisson solver in Section 3.1. Next, we investigate two applications using our solver: optimization-based 3D reconstruction (Section 3.2) and learning-based 3D reconstruction (Section 3.3).
3.1 Differentiable Poisson Solver
The key step in Poisson Surface Reconstruction Kazhdan2006PSR; Kazhdan2013SIGGRAPH involves solving the Poisson Equation. Let $x \in \mathbb{R}^3$ denote a spatial coordinate and $n \in \mathbb{R}^3$ its corresponding normal. The Poisson Equation arises from the insight that a set $\{(p_i, n_i)\}$ consisting of point coordinates and normals can be viewed as samples of the gradient $\nabla\chi$ of the underlying implicit indicator function $\chi$ that describes the solid geometry. We define the normal vector field as a superposition of pulse functions $v(x) = \sum_i \delta(x - p_i)\, n_i$, where $\delta$ denotes the Dirac delta function. By applying the divergence operator, the variational problem $\nabla\chi = v$ transforms into the standard Poisson equation:

$$\nabla^2 \chi := \nabla \cdot \nabla \chi = \nabla \cdot v \qquad (1)$$
In order to solve this set of linear Partial Differential Equations (PDEs), we discretize the function values and differential operators. Without loss of generality, we assume that the normal vector field $v$ and the indicator function $\chi$ are sampled at $r$ uniformly spaced locations along each dimension. Denoting the spatial dimensionality of the problem by $d$, we consider, without loss of generality, the three-dimensional case $d = 3$. We have the discretized indicator function $\chi \in \mathbb{R}^{r \times r \times r}$, the point normal field $v \in \mathbb{R}^{r \times r \times r \times 3}$, the gradient operator $\nabla$, the divergence operator $\nabla\cdot$, and the derived Laplacian operator $\nabla^2 = \nabla \cdot \nabla$. Under such a discretization scheme, solving for the indicator function amounts to solving the linear system $\nabla^2 \chi = \nabla \cdot v$ by inverting the divergence operator, subject to the boundary condition that surface points lie on the zero level set. Following Kazhdan2006PSR, we fix the overall scale to $|\chi| = m$ at $x = \mathbf{0}$:

$$\nabla^2 \chi = \nabla \cdot v \quad \text{s.t.} \quad \chi(p_i) = 0 \;\; \forall i \quad \text{and} \quad |\chi(\mathbf{0})| = m \qquad (2)$$
Point Rasterization: We obtain the uniformly discretized point normal field $v$ by rasterizing the point normals onto a uniformly sampled voxel grid. We perform point rasterization differentiably via inverse trilinear interpolation, similar to Kazhdan2006PSR; Kazhdan2013SIGGRAPH: we scatter the point normal values to the voxel grid vertices, weighted by the trilinear interpolation weights. The point rasterization process has $O(r^3)$ space complexity, linear in the number of grid cells, and $O(N)$ time complexity, linear in the number of points. See the supplementary for details.

Spectral Methods for Solving PSR: In contrast to the finite-element approach taken in Kazhdan2006PSR; Kazhdan2013SIGGRAPH, we solve the PDEs using spectral methods Canuto2007Springer. While spectral methods are commonly used in scientific computing for solving PDEs, and in some cases have been applied to computer vision problems Li2001ICMP, we are the first to apply them in the context of Poisson Surface Reconstruction. Unlike finite-element approaches that depend on irregular data structures such as octrees or tetrahedral meshes for discretizing space, spectral methods can be solved efficiently over a uniform grid, as they leverage highly optimized Fast Fourier Transform (FFT) operations that are well supported on GPUs, TPUs, and mainstream deep learning frameworks. Spectral methods decompose the original signal into a linear sum of sine / cosine basis functions whose derivatives can be computed analytically, which allows us to easily evaluate differential operators in spectral space. We denote spectral-domain signals with a tilde, i.e., $\tilde{\chi} = \text{FFT}(\chi)$. We first solve for the unnormalized indicator function $\chi'$, not accounting for boundary conditions:

$$\tilde{\chi}' = \tilde{g}_{\sigma,r}(u) \odot \frac{i\, u \cdot \tilde{v}}{-2\pi \|u\|^2}, \qquad \chi' = \text{IFFT}(\tilde{\chi}') \qquad (3)$$

where the spectral frequencies $u = (u, v, w)$ correspond to the spatial dimensions $(x, y, z)$, and $\text{IFFT}(\tilde{\chi}')$ represents the inverse fast Fourier transform of $\tilde{\chi}'$. $\tilde{g}_{\sigma,r}(u)$ is a Gaussian smoothing kernel of bandwidth $\sigma$ at grid resolution $r$ in the spectral domain, which mitigates the ringing effects caused by the Gibbs phenomenon when rasterizing the point normals. We denote the element-wise product as $\odot$, the L2 norm as $\|\cdot\|$, and the dot product as $\cdot$. Finally, we subtract the mean of the indicator function at the point set and rescale to obtain the solution to the PSR problem in Eq. (2):

$$\chi = \frac{m}{\,|\chi'(\mathbf{0}) - \mu|\,}\,\big(\chi' - \mu\big), \qquad \mu = \frac{1}{|P|}\sum_{c \in P} \chi'(c) \qquad (4)$$
A detailed derivation of our differentiable PSR solver is provided in the supplementary material.
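To make the spectral solve concrete, the forward pass of Eqs. (3) and (4) can be sketched in a few lines of NumPy. This is a simplified stand-in rather than the official implementation: it uses nearest-vertex scattering instead of trilinear scattering, per-cell FFT frequencies, and a normalization that fixes the zero level set at the input points and rescales by the value at the grid origin. With outward-pointing normals, this sign convention makes the indicator negative inside the shape.

```python
import numpy as np

def rasterize_nearest(points, normals, r):
    """Scatter point normals to the nearest voxel vertex.
    (The paper uses differentiable trilinear scattering instead.)"""
    grid = np.zeros((r, r, r, 3))
    idx = np.clip((points * r).astype(int), 0, r - 1)
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), normals)
    return grid, idx

def dpsr_forward(v_grid, sigma=2.0):
    """Spectral solve of nabla^2 chi = div(v) on a periodic grid (cf. Eq. 3)."""
    r = v_grid.shape[0]
    freq = np.fft.fftfreq(r)                                   # per-cell frequencies
    u = np.stack(np.meshgrid(freq, freq, freq, indexing="ij"), axis=-1)
    u2 = np.sum(u * u, axis=-1)
    v_tilde = np.fft.fftn(v_grid, axes=(0, 1, 2))
    div = 2j * np.pi * np.sum(u * v_tilde, axis=-1)            # spectral divergence
    lap = -4.0 * np.pi ** 2 * u2                               # spectral Laplacian
    lap[0, 0, 0] = 1.0                                         # guard the DC mode
    g = np.exp(-2.0 * sigma ** 2 * u2)                         # smoothing vs. ringing
    chi_tilde = g * div / lap
    chi_tilde[0, 0, 0] = 0.0                                   # zero-mean solution
    return np.real(np.fft.ifftn(chi_tilde))

# Toy example: points on a sphere of radius 0.25 with outward normals.
rng = np.random.default_rng(0)
n = rng.standard_normal((5000, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
pts = 0.5 + 0.25 * n
v_grid, idx = rasterize_nearest(pts, n, 64)
chi = dpsr_forward(v_grid)
# Normalization (cf. Eq. 4): zero level set at the points, then rescale by the
# value at the grid origin, a location far outside the shape.
chi -= chi[idx[:, 0], idx[:, 1], idx[:, 2]].mean()
chi *= 0.5 / (np.abs(chi[0, 0, 0]) + 1e-12)
# Under this convention, chi < 0 inside and chi > 0 outside the sphere.
```

Because every step (scatter, FFT, element-wise operations) is differentiable, gradients can flow from the indicator grid back to the point positions and normals, which is the property the SAP representation relies on.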
3.2 SAP for Optimization-based 3D Reconstruction
We can use the proposed differentiable Poisson solver for various applications. First, we consider the classical task of surface reconstruction from unoriented point clouds. The overall pipeline for this setting is illustrated in Fig. 1 (top). We now provide details about each component.
Forward pass: It is natural to initialize the oriented 3D point cloud serving as our 3D shape representation with the noisy 3D input points and corresponding (estimated) normals. However, to demonstrate the flexibility and robustness of our model, we purposefully initialize our model with a generic 3D sphere in our experiments. Given the oriented point cloud, we apply our Poisson solver to obtain an indicator function grid, which can be converted into a mesh using Marching Cubes Lorensen1987SIGGRAPH.
Backward pass: For every point $x$ sampled from the mesh, we calculate a bidirectional L2 Chamfer Distance $\mathcal{L}$ with respect to the input point cloud. To backpropagate the loss to a point $p$ in our source oriented point cloud, we decompose the gradient using the chain rule:

$$\frac{\partial \mathcal{L}}{\partial p} = \frac{\partial \mathcal{L}}{\partial x} \cdot \frac{\partial x}{\partial \chi} \cdot \frac{\partial \chi}{\partial p} \qquad (5)$$

All terms in (5) are differentiable except for the middle one, which involves Marching Cubes. However, this gradient can be effectively approximated by the inverse surface normal Remelli2020NEURIPS:

$$\frac{\partial x}{\partial \chi} = -n_x \qquad (6)$$

where $n_x$ is the normal of the point $x$. Different from MeshSDF Remelli2020NEURIPS, which uses the gradients to update the latent code of a pretrained implicit shape representation, our method updates the source point cloud through the proposed differentiable Poisson solver.
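The bidirectional Chamfer objective driving this backward pass can be sketched as a brute-force NumPy function (practical implementations use KD-trees or GPU kernels; `x` and `y` are (N, 3) and (M, 3) arrays):

```python
import numpy as np

def chamfer_l2(x, y):
    """Bidirectional (squared) L2 Chamfer distance between two point sets.
    Brute force with O(N*M) memory, for illustration only."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # pairwise squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

The distance is zero for identical sets and grows as points drift apart; averaging in both directions penalizes both spurious and missing geometry.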
Resampling:
To increase the robustness of the optimization process, we uniformly resample points and normals from the largest mesh component every 200 iterations and replace all points in the original point cloud with the resampled ones. This resampling strategy eliminates outlier points that drift away during optimization and enforces a more uniform distribution of points. We provide an ablation study in the supplementary.
Coarse-to-fine: To further decrease the runtime, we employ a coarse-to-fine strategy during optimization. More specifically, we start optimizing at an indicator grid resolution of 32³ for 1000 iterations, from which we obtain a coarse shape. Next, we sample from this coarse mesh and continue optimization at a resolution of 64³ for 1000 iterations. We repeat this process until we reach the target resolution of 256³, at which we acquire the final output mesh. See also the supplementary.
3.3 SAP for Learning-based 3D Reconstruction
We now consider the learning-based 3D reconstruction setting in which we train a conditional model that takes a noisy, unoriented point cloud as input and outputs a 3D shape. More specifically, we train the model to predict a clean, oriented point cloud, from which we obtain a watertight mesh using our Poisson solver and Marching Cubes. We leverage the differentiability of our Poisson solver to learn the parameters of this conditional model. Following common practice, we assume watertight meshes as ground truth and consequently supervise directly with the ground truth indicator grid obtained from these meshes. Fig. 1 (bottom) illustrates the pipeline of our architecture for the learning-based surface reconstruction task.
Architecture: We first encode the unoriented input point cloud into a feature $\phi$. The resulting feature should encapsulate both local and global information about the input point cloud. We utilize the convolutional point encoder proposed in Peng2020ECCV for this purpose. Note that in the following, we will use $\phi_p$ to denote the feature at point $p$, dropping the dependency of $\phi_p$ on the remaining points for clarity. Also, we use $\theta$ to refer to network parameters in general.
Given their features, we aim to estimate both offsets and normals for every input point $p$ in the point cloud. We use a shallow Multi-Layer Perceptron (MLP) $f_\theta$ to predict the offset for $p$:

$$\Delta p = f_\theta(p, \phi_p) \qquad (7)$$

where $\phi_p$ is obtained from the feature volume using trilinear interpolation. We predict $k$ offsets per input point. We add the offsets to the input point position and call the updated point position $\hat{p}$. Additional offsets allow us to densify the point cloud, leading to enhanced reconstruction quality. We choose $k = 7$ for all learning-based reconstruction experiments (see ablation study in Table 4). For each updated point $\hat{p}$, we use a second MLP $g_\theta$ to predict its normal:

$$\hat{n} = g_\theta(\hat{p}, \phi_{\hat{p}}) \qquad (8)$$
We use the same decoder architecture as in Peng2020ECCV for both $f_\theta$ and $g_\theta$. Each network comprises 5 ResNet blocks with a hidden dimension of 32.
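As an illustration of the two prediction heads, the following sketch uses tiny random-weight MLPs with hypothetical dimensions; the actual model uses the ConvONet-style encoder and the ResNet decoders described above, and `phi` stands in for the interpolated per-point features:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Tiny two-layer stand-in for the shallow ResNet decoders."""
    return np.maximum(x @ w1, 0.0) @ w2

N, F, k = 128, 32, 7                       # points, feature dim, offsets per point
p = rng.random((N, 3))                     # unoriented input point positions
phi = rng.standard_normal((N, F))          # per-point features (assumed given here)
x = np.concatenate([p, phi], axis=1)

# Offset head: k offsets per input point densify the cloud (k = 7 in the paper).
w1o, w2o = rng.standard_normal((3 + F, 64)), rng.standard_normal((64, 3 * k))
offsets = 0.1 * np.tanh(mlp(x, w1o, w2o)).reshape(N, k, 3)   # bounded offsets
p_hat = (p[:, None, :] + offsets).reshape(N * k, 3)          # updated positions

# Normal head: one unit-length normal per updated point.
w1n, w2n = rng.standard_normal((3 + F, 64)), rng.standard_normal((64, 3))
phi_hat = np.repeat(phi, k, axis=0)        # simplification: reuse parent features
n_hat = mlp(np.concatenate([p_hat, phi_hat], axis=1), w1n, w2n)
n_hat /= np.linalg.norm(n_hat, axis=1, keepdims=True) + 1e-9
```

The output is a densified, oriented point cloud (N·k points with normals) that can be fed to the differentiable Poisson solver.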
Training and Inference: During training, we obtain the estimated indicator grid $\hat{\chi}$ from the predicted point clouds using our differentiable Poisson solver. Since we assume watertight and noise-free meshes for supervision, we acquire the ground truth indicator grid $\chi$ by running PSR on densely sampled point clouds of the ground truth meshes with the corresponding ground truth normals. This avoids running Marching Cubes at every iteration and accelerates training. We use a Mean Square Error (MSE) loss on the predicted and ground truth indicator grids:
$$\mathcal{L}_{\text{DPSR}} = \|\hat{\chi} - \chi\|^2 \qquad (9)$$
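In code, this supervision is simply a mean squared error between the predicted and precomputed ground-truth indicator grids (a minimal sketch; in practice `chi_hat` comes from the differentiable Poisson solver, so the gradient flows back to the predicted points and normals):

```python
import numpy as np

def dpsr_loss(chi_hat, chi_gt):
    """Mean squared error between indicator grids (cf. Eq. 9)."""
    return np.mean((chi_hat - chi_gt) ** 2)
```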
We implement all models in PyTorch Pytorch2019NIPS and use the Adam optimizer Kingma2015ICML with a learning rate of $5 \times 10^{-4}$. During inference, we use our trained model to predict normals and offsets, use DPSR to solve for the indicator grid, and run Marching Cubes Lorensen1987SIGGRAPH to extract meshes.

4 Experiments
Following the exposition in the previous section, we conduct two types of experiments to evaluate our method. First, we perform single object reconstruction from unoriented point clouds. Next, we apply our method to learningbased surface reconstruction on ShapeNet Chang2015ARXIV, using noisy point clouds with or without outliers as inputs.
Datasets: We use the following datasets for optimization-based reconstruction: 1) Thingi10K Zhou2016ARXIV, 2) the Surface Reconstruction Benchmark (SRB) Williams2019CVPR, and 3) D-FAUST Bogo2017CVPR. Similar to prior works Gropp2020ICML; Williams2019CVPR; Hanocka2020SIGGRAPH, we use 5 objects per dataset. For learning-based object-level reconstruction, we consider all 13 classes of the ShapeNet Chang2015ARXIV subset, using the train/val/test split from Choy2016ECCV.
Figure 2: Optimization-based 3D reconstruction on Thingi10K, SRB, and D-FAUST. From left to right: Input, IGR Gropp2020ICML, Point2Mesh Hanocka2020SIGGRAPH, SPSR Kazhdan2013SIGGRAPH, Ours, GT mesh.
Table 2: Optimization-based 3D reconstruction results.
Dataset  Method  Chamfer (↓)  F-Score (↑)  Normal C. (↑)  Time (s, ↓)

Thingi10K  IGR Gropp2020ICML  0.440  0.505  0.692  1842.3 
Point2Mesh Hanocka2020SIGGRAPH  0.109  0.656  0.806  3714.7  
SPSR Kazhdan2013SIGGRAPH  0.223  0.787  0.896  9.3  
Ours  0.054  0.940  0.947  370.1  
SRB  IGR Gropp2020ICML  0.178  0.755  –  1847.6 
Point2Mesh Hanocka2020SIGGRAPH  0.116  0.648  –  4707.9  
SPSR Kazhdan2013SIGGRAPH  0.232  0.735  –  9.2  
Ours  0.076  0.830  –  326.0  
DFAUST  IGR Gropp2020ICML  0.235  0.805  0.911  1857.2 
Point2Mesh Hanocka2020SIGGRAPH  0.071  0.855  0.905  3678.7  
SPSR Kazhdan2013SIGGRAPH  0.044  0.966  0.965  4.3  
Ours  0.043  0.966  0.959  379.9 
Baselines: In the optimization-based reconstruction setting, we compare against the network-based methods IGR Gropp2020ICML and Point2Mesh Hanocka2020SIGGRAPH, as well as Screened Poisson Surface Reconstruction (SPSR) Kazhdan2013SIGGRAPH on plane-fitted normals (we use the official implementation from https://github.com/mkazhdan/PoissonRecon). To ensure that the predicted normals are consistently oriented for SPSR, we propagate the normal orientation using a minimum spanning tree Zhou2018OPEN3D. For learning-based surface reconstruction, we compare against the point-based Point Set Generation Network (PSGN) Fan2017CVPR, the patch-based AtlasNet Groueix2018CVPR, the voxel-based 3D-R2N2 Choy2016ECCV, and ConvONet Peng2020ECCV, which has recently reported state-of-the-art results on this task. We use ConvONet in its best-performing setting (3-plane encoders). SPSR is also used as a baseline. In addition, to evaluate the importance of our differentiable PSR optimization, we design another point-based baseline. This baseline uses the same network architecture to predict points and normals. However, instead of passing them to our Poisson solver and calculating the loss on the indicator grid, we directly supervise the point positions with a bidirectional Chamfer distance and the normals with an L1 loss, as done in Ma2021CVPR. During inference, we also feed the predicted points and normals to our PSR solver and run Marching Cubes to obtain meshes.
Metrics: We consider Chamfer Distance, Normal Consistency, and F-Score with its default threshold for evaluation, and also report optimization and inference time.
4.1 Optimization-based 3D Reconstruction
In this part, we investigate whether our method can be used for the single-object surface reconstruction task from unoriented point clouds or scans. We consider three different types of 3D inputs: point clouds sampled from synthetic meshes Zhou2016ARXIV with added Gaussian noise, real-world scans Williams2019CVPR, and high-resolution raw scans of humans with comparably little noise Bogo2017CVPR.
Fig. 2 and Table 2 show that our method achieves superior performance compared to both classical methods and networkbased approaches. Note that the objects considered in this task are challenging due to their complex geometry, thin structures, noisy and incomplete observations. While some of the baseline methods fail completely on these challenging objects, our method achieves robust performance across all datasets.
Figure 3: Learning-based reconstruction on ShapeNet under low noise, high noise, and outliers. From left to right: Input, SPSR Kazhdan2013SIGGRAPH, 3D-R2N2 Choy2016ECCV, AtlasNet Groueix2018CVPR, ConvONet Peng2020ECCV, Ours, GT mesh.
(a) Noise=0.005  (b) Noise=0.025  (c) Noise=0.005, Outliers=50%  

Chamfer (↓)  F-Score (↑)  Normal C. (↑)  Chamfer  F-Score  Normal C.  Chamfer  F-Score  Normal C.  Runtime  
SPSR Kazhdan2013SIGGRAPH  0.298  0.612  0.772  0.499  0.324  0.604  1.317  0.164  0.636   
PSGN Fan2017CVPR  0.147  0.259    0.151  0.247    0.736  0.007    0.010 s 
3D-R2N2 Choy2016ECCV  0.172  0.400  0.715  0.173  0.418  0.710  0.202  0.387  0.709  0.015 s 
AtlasNet Groueix2018CVPR  0.093  0.708  0.855  0.117  0.527  0.821  1.822  0.057  0.609  0.025 s 
ConvONet Peng2020ECCV  0.044  0.942  0.938  0.066  0.849  0.913  0.052  0.916  0.929  0.327 s 
Ours (w/o $\mathcal{L}_{\text{DPSR}}$)  0.044  0.942  0.935  0.067  0.841  0.907  0.085  0.819  0.903  0.068 s 
Ours  0.034  0.975  0.944  0.054  0.896  0.917  0.038  0.959  0.936  0.068 s 
In particular, Fig. 2 shows that IGR occasionally creates meshes in free space, as this is not penalized by its optimization objective when point clouds are unoriented. Both Point2Mesh and our method alleviate this problem by optimizing the Chamfer distance between the estimated mesh and the input point cloud. However, Point2Mesh requires an initial mesh as input, whose topology cannot be changed during optimization. It thus relies on SPSR to provide an initial mesh for objects with genus larger than 0 and suffers from inaccurate initializations Hanocka2020SIGGRAPH. Furthermore, compared to both IGR and Point2Mesh, our method converges faster.
While SPSR is even more efficient, it suffers from incorrect normal estimation on noisy input point clouds, which is a non-trivial task on its own. In contrast, our method demonstrates more robust behavior, as we optimize points and normals guided by the Chamfer distance. Note that in this single-object reconstruction task, our method is not able to complete large unobserved regions (e.g., the bottom of the person's feet in Fig. 2 is unobserved and hence not completed). This limitation can be addressed using learning-based object-level reconstruction as discussed next.
4.2 Learning-based Reconstruction on ShapeNet
To analyze whether our proposed differentiable Poisson solver is also beneficial for learning-based reconstruction, we evaluate our method on the single object reconstruction task using noise- and outlier-augmented point clouds from ShapeNet as input. We investigate the performance for three different noise levels: (a) Gaussian noise with zero mean and standard deviation 0.005, (b) Gaussian noise with zero mean and standard deviation 0.025, (c) 50% of the points carry the same noise as in (a) while the other 50% are outliers uniformly sampled inside the unit cube.
Fig. 3 and Table 3 show our results. Compared to the baselines, our method achieves similar or better results on all three metrics. The results show that, in comparison to directly using a Chamfer loss on point positions and an L1 loss on point normals, our DPSR loss produces better reconstructions in all settings, as it directly supervises the indicator grid, which implicitly determines the surface through the Poisson equation. SPSR fails when the noise level is high or when there are outliers in the input point cloud. We achieve significantly better performance than other representations such as point clouds, meshes, voxel grids, and patches. Moreover, we find that our method is robust to strong outliers. We refer to the supplementary for more detailed visualizations of how SAP handles outliers.
Table 3 also reports the runtime for setting (a) for all GPU-accelerated methods on a single NVIDIA GTX 1080Ti GPU, averaged over all objects of the ShapeNet test set. The baselines Choy2016ECCV; Fan2017CVPR; Groueix2018CVPR demonstrate fast inference but suffer in terms of reconstruction quality, while the neural implicit model Peng2020ECCV attains high-quality reconstructions but suffers from slow inference. In contrast, our method produces competitive reconstruction results at reasonably fast inference times. In addition, since ConvONet and our method share a similar reconstruction pipeline, we provide a more detailed breakdown of the runtime at resolutions of 128³ and 256³ voxels in Table 4. We use the default setup from ConvONet³. As we can see from Table 4, the difference in point encoding and Marching Cubes is marginal, but we gain more than a 20× speedup over ConvONet in evaluating the indicator grid. In total, we are roughly 5× and 8× faster in terms of total inference time at resolutions of 128³ and 256³ voxels, respectively.

³To be consistent, we use the Marching Cubes implementation from Van2014Scikit for both ConvONet and ours.
Enc.  Grid  MC  Total (128³)  Enc.  Grid  MC  Total (256³)  
ConvONet  0.010  0.280  0.037  0.327  0.010  3.798  0.299  4.107 
Ours  0.013  0.012  0.039  0.064  0.019  0.140  0.374  0.533 
Chamfer (↓)  F-Score (↑)  Normal C. (↑)  
Offset 1x  0.041  0.952  0.928 
Offset 3x  0.039  0.958  0.934 
Offset 5x  0.039  0.957  0.934 
Offset 7x  0.038  0.959  0.936 
2D Enc.  0.043  0.939  0.928 
3D Enc.  0.038  0.959  0.936 
4.3 Ablation Study
In this section, we investigate different architecture choices in the context of learningbased reconstruction. We conduct our ablation experiments on ShapeNet for the third setup (most challenging).
Number of Offsets: From Table 4 we notice that predicting more offsets per input point leads to better performance. This can be explained by the fact that with more points near the object surface, geometric details can be better preserved.
Point Cloud Encoder: Here we compare two different point encoder architectures proposed in Peng2020ECCV: a 2D encoder using 3 canonical feature planes and a 3D encoder using a feature volume. We find that the 3D encoder works best in this setting and hypothesize that this is due to its representational alignment with the 3D indicator grid.
5 Conclusion
We introduce Shape As Points, a novel shape representation that is lightweight, interpretable, and produces watertight meshes efficiently. We demonstrate its effectiveness for the task of surface reconstruction from unoriented point clouds in both optimization-based and learning-based settings. Our method is currently limited to small scenes due to the cubic memory requirements with respect to the indicator grid resolution. We believe that processing scenes in a sliding-window manner and using space-adaptive data structures (e.g., octrees) will enable extending our method to larger scenes. Point-cloud-based methods are broadly used in real-world applications ranging from household robots to self-driving cars, and hence share the same societal opportunities and risks as other learning-based 3D reconstruction techniques.
Acknowledgement: Andreas Geiger was supported by the ERC Starting Grant LEGO-3D (850533) and the DFG EXC number 2064/1, project number 390727645. The authors thank the Max Planck ETH Center for Learning Systems (CLS) for supporting Songyou Peng and the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Michael Niemeyer. We would also like to thank Xu Chen, Christian Reiser, and Rémi Pautrat for proofreading.
References
Supplementary Material for
Shape As Points: A Differentiable Poisson Solver
In this supplementary document, we first provide derivation details for our Differentiable Poisson Solver in Section A. In Section B, we provide implementation details for our optimization-based and learning-based methods. Additional results and ablations for optimization-based and learning-based reconstruction can be found in Section C and Section D, respectively.
Appendix A Derivations for Differentiable Poisson Solver
A.1 Point Rasterization
Given the origin $c_0$ of the voxel grid and the size $s$ of each voxel, we scatter the point normal values to the voxel grid vertices, weighted by the trilinear interpolation weights. For a given point $p$ with normal $n_p$, the indices of its eight neighboring voxel vertices are $\lfloor (p - c_0)/s \rfloor + t$ for $t \in \{0, 1\}^3$, where $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ denote the floor and ceiling operators for rounding to integers. We denote the trilinear sampling weight function as $w(p, c) = \prod_{d=1}^{3} \big(1 - |p_d - c_d|/s\big)$, where $p$ and $c$ denote the locations of the point and the grid vertex, respectively. The contribution from point $p$ to the voxel grid vertex $c$ can be computed as:

$$n_{p \to c} = w(p, c)\, n_p \qquad (10)$$

Hence, the grid value at vertex $c$ is computed by summing over all neighboring points:

$$\mathbf{v}(c) = \sum_{p \in \mathcal{N}(c)} w(p, c)\, n_p \qquad (11)$$

where $\mathcal{N}(c)$ denotes the set of point indices in the neighborhood of vertex $c$.
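The scatter step above can be sketched in NumPy as follows (a minimal version assuming points in the unit cube and a grid of r³ vertices; the official implementation may differ in indexing conventions):

```python
import numpy as np
from itertools import product

def rasterize_trilinear(points, normals, r):
    """Scatter point normals onto an r^3 vertex grid with trilinear weights."""
    grid = np.zeros((r, r, r, 3))
    x = points * (r - 1)                   # continuous grid coordinates
    lo = np.floor(x).astype(int)
    frac = x - lo
    for t in product((0, 1), repeat=3):    # the 8 surrounding vertices
        t = np.array(t)
        w = np.prod(np.where(t, frac, 1.0 - frac), axis=1)   # trilinear weight
        idx = np.clip(lo + t, 0, r - 1)
        np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), w[:, None] * normals)
    return grid
```

Since the eight weights of each point sum to one, the total normal "mass" is conserved, and every operation is differentiable with respect to both point positions and normals.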
A.2 Spectral Methods for Solving PSR
We solve the PDEs using spectral methods Canuto2007Springer. In three dimensions, the multidimensional Fourier Transform and Inverse Fourier Transform are defined as:

$$\tilde{f}(u, v, w) = \iiint f(x, y, z)\, e^{-2\pi i (ux + vy + wz)}\, dx\, dy\, dz \qquad (12)$$

$$f(x, y, z) = \iiint \tilde{f}(u, v, w)\, e^{2\pi i (ux + vy + wz)}\, du\, dv\, dw \qquad (13)$$

where $x, y, z$ are the spatial coordinates, and $u, v, w$ represent the frequencies corresponding to $x$, $y$, and $z$. Derivatives in the spectral space can be analytically computed, e.g., $\widetilde{\partial f / \partial x} = 2\pi i\, u\, \tilde{f}$.

In discrete form, we have the rasterized point normals $\mathbf{v} = (v_x, v_y, v_z)$. Hence, in the spectral domain, the divergence of the rasterized point normals can be written as:

$$\widetilde{\nabla \cdot \mathbf{v}} = 2\pi i\, (u \tilde{v}_x + v \tilde{v}_y + w \tilde{v}_z) \qquad (14)$$

The Laplacian operator can simply be written as:

$$\widetilde{\nabla^2 \chi} = -4\pi^2 (u^2 + v^2 + w^2)\, \tilde{\chi} \qquad (15)$$

Therefore, the unnormalized solution to the Poisson Equation, not accounting for boundary conditions, can be written as:

$$\tilde{\chi}' = \tilde{g}_{\sigma,r}(u, v, w) \odot \frac{2\pi i\, (u \tilde{v}_x + v \tilde{v}_y + w \tilde{v}_z)}{-4\pi^2 (u^2 + v^2 + w^2)} \qquad (16)$$

where $\tilde{g}_{\sigma,r}(u, v, w)$ is a Gaussian smoothing kernel of bandwidth $\sigma$ for grid resolution $r$ in the spectral domain, which mitigates the ringing effects resulting from the Gibbs phenomenon when rasterizing the point normals. The unnormalized indicator function in the physical domain can be obtained via the inverse Fourier Transform:

$$\chi' = \text{IFFT}(\tilde{\chi}') \qquad (17)$$

We further normalize the indicator field to incorporate the boundary conditions that the indicator field is valued at zero at the point locations and valued $\pm m$ inside and outside the shape:

$$\chi = \frac{m}{\,|\chi'(\mathbf{0}) - \mu|\,}\,\big(\chi' - \mu\big), \qquad \mu = \frac{1}{|P|}\sum_{c \in P} \chi'(c) \qquad (18)$$
Appendix B Implementation Details
In this section, we provide implementation details for baselines and our method for both settings, optimizationbased and the learningbased reconstruction.
Optimizationbased 3D reconstruction: We use the official implementation of IGR^{4}^{4}4https://github.com/amosgropp/IGR Gropp2020ICML and Point2Mesh^{5}^{5}5https://github.com/ranahanocka/point2mesh Hanocka2020SIGGRAPH. We optimize IGR for 15000 iterations on each object until convergence. For Point2Mesh, we follow the official implementation and use 6000 iterations for each object. We generate the initial mesh required by Point2Mesh following the description of the original paper. Specifically, the initial mesh is provided as the convex hull of the input point cloud for objects with a genus of zero. If the genus is larger than zero, we apply the watertight manifold algorithm Huang2018ARXIV2 using a lowresolution octree reconstruction on the output mesh of SPSR to obtain a coarse initial mesh.
For our method, we follow the coarsetofine and resampling strategy described in the main paper (Section 3.2). To smooth the output mesh as well as to stabilze the optimization process, we gradually increase the Gaussian smoothing parameter in Eq. (16) when increasing the grid resolution: for a grid resolution of and , when the grid resolution is . At the final resolution of , we use for objects with more details (e.g. objects in SRB Williams2019CVPR and DFAUST Bogo2017CVPR, and for the input points with noises (Thingi10K Zhou2016ARXIV). We use the Adam optimizer Kingma2015ICML with a learning rate decay. The learning rate is set to at the initial resolution of with a decay of after every increase of the grid resolution. Moreover, we run 1000 iterations at every grid resolution of , and , and 200 iterations for . 20000 source points and normals are used by our method to represent the final shapes for all objects.
Learningbased 3D reconstruction: For AtlasNet Groueix2018CVPR, we use the official implementation^{6}^{6}6https://github.com/ThibaultGROUEIX/AtlasNet with 25 parameterizations. We change the number of input points from 2500 (default) to 3000 for our setting. Depending on the experiment, we adde different noise levels or outlier points (see Section 4.2 in main paper). We train ConvONet Peng2020ECCV, PSGN Fan2017CVPR, and 3DR2N2 Choy2016ECCV for at least 300000 iterations, and use Adam optimizer Kingma2015ICML with a learning rate of for all methods.
We train our method as well as Ours (w/o ) for all 3 noise levels for 300000 iterations and use Adam optimizer with a learning rate of . To generate the ground truth PSR indicator field in Eq. (9) of the main paper, first we sample 100000 points and the corresponding point normals from the ground truth mesh, and input to our DPSR at a grid resolution of .
Thingi10K 


SRB  
DFAUST  
Input  IGR Gropp2020ICML  Point2Mesh Hanocka2020SIGGRAPH  SPSR Kazhdan2013SIGGRAPH  Ours  GT mesh 
Appendix C Optimizationbased 3D Reconstruction
c.1 Qualitative Comparison of All Objects in 3 Datasets
As a complementary to Fig. 2 in the main paper, Fig. 4 shows qualitative comparisons of the remaining 11 objects in the optimizationbased setting, including 3 objects in Thingi10K Zhou2016ARXIV, 4 in SRB Williams2019CVPR and 4 in DFAUST Bogo2017CVPR.
c.2 Ablation Study of Point Resampling Strategy
In Table 5 and Fig. 5, we compare the reconstructed shapes with and without the proposed resampling strategy. Our method is able to produce reasonable reconstructions even without the resampling strategy, but the shapes are much noisier. Since we directly optimize the source point positions and normals without any additional constraints, the optimized point clouds can be unevenly distributed as shown in Fig. 5. This limits the representational expressivity of the point clouds given the same number of points. The resampling strategy acts as a regularization to enforce a uniformly distributed point cloud, which leads to better surface reconstruction.
Dataset  Method  Chamfer ()  FScore ()  Normal C. () 

Thingi10K  Ours (w/o resampling)  0.061  0.897  0.902 
Ours  0.053  0.941  0.947  
DGP  Ours (w/o resampling)  0.077  0.813  – 
Ours  0.067  0.848  –  
DFAUST  Ours (w/o resampling)  0.044  0.964  0.952 
Ours  0.043  0.965  0.959 
Thingi10K 

SRB 

Point cloud  Mesh  Point cloud  Mesh  
Ours w/o resampling  Ours  GT 
c.3 Ablation Study of Gaussian Smoothing Parameter
We study the effect of Gaussian smoothing parameter at a resolution of . As visualized in Fig. 6, we can obtain faithful reconstructions given different values. Nevertheless, we can notice that lower can preserve details better but also is prone to noise, while high results in smooth shapes but can also lead to losing of details. In practice, can be chosen according to the noise level of the target point cloud. In the results depicted in Fig. 4 and Table 2 in main paper, we choose for SRB and DFAUST dataset and for Thingi10K dataset.
Thingi10K 

DFAUST 

GT 
Appendix D Learningbased 3D Reconstruction
d.1 Visualization of How SAP Handles Noise and Outliers
In this section, we visualize how our trained models handle noise and outliers during inference.
Noise Handling: We can see from the top row of Fig. 7 that, compared to the input point cloud, the updated SAP points are densified because we predict offsets per input point. More importantly, all SAP points are located roughly on the surface, which leads to enhanced reconstruction quality.
Outlier Handling: We also visualize how SAP handles outlier points at the bottom row of Fig. 7. The arrows’ length represents the magnitude of the predicted normals. There are two interesting observations: a) A large amount of outlier points in the input are moved near to the surface. b) Some outlier points still remain outliers. For these points, the network learns to predict normals with a very small magnitude/norm as shown in the zoomin view (we do not normalize the point normals to unit length). In this way, those outlier points are “muted” when being passed to the DPSR layer such that they do not contribute to the final reconstruction.
Input  Ours  point clouds  Ours  mesh  GT mesh 
d.2 Additional Results for Reconstruction from Noisy Point Clouds
In Fig. 8, we show additional qualitative comparison on ShapeNet. In Table 6, we provide quantitative results on all 13 classes of the ShapeNet subset of Choy et al. Choy2016ECCV.
High Noise  

Outliers  
Input  SPSR Kazhdan2013SIGGRAPH  3DR2N2 Choy2016ECCV  AtlasNet Groueix2018CVPR  ConvONet Peng2020ECCV  Ours  GT mesh 
Chamfer  FScore  Normal Consistency  

SPSR  PSGN  3DR2N2  AtlasNet  ConvONet  Ours  Ours  SPSR  PSGN  3DR2N2  AtlasNet  ConvONet  Ours  Ours  SPSR  PSGN  3DR2N2  AtlasNet  ConvONet  Ours  Ours  
category  (w/o )  (w/o )  (w/o )  
airplane  0.437  0.102  0.151  0.064  0.034  0.040  0.027  0.551  0.476  0.382  0.827  0.965  0.940  0.981  0.747    0.669  0.854  0.931  0.919  0.931 
bench  0.544  0.129  0.153  0.073  0.035  0.041  0.032  0.430  0.266  0.431  0.786  0.965  0.949  0.979  0.649    0.691  0.820  0.921  0.915  0.920 
cabinet  0.154  0.164  0.167  0.112  0.047  0.044  0.037  0.728  0.137  0.412  0.603  0.955  0.952  0.975  0.835    0.786  0.875  0.956  0.951  0.957 
car  0.180  0.132  0.197  0.099  0.075  0.061  0.045  0.729  0.211  0.348  0.642  0.849  0.875  0.928  0.783    0.719  0.827  0.893  0.886  0.897 
chair  0.369  0.168  0.181  0.114  0.046  0.047  0.036  0.473  0.152  0.393  0.629  0.939  0.939  0.979  0.715    0.673  0.829  0.943  0.940  0.952 
display  0.280  0.160  0.170  0.089  0.036  0.036  0.030  0.544  0.175  0.401  0.727  0.971  0.975  0.990  0.749    0.747  0.905  0.968  0.967  0.972 
lamp  0.278  0.207  0.243  0.137  0.059  0.069  0.047  0.586  0.204  0.333  0.562  0.892  0.897  0.959  0.765    0.598  0.759  0.900  0.899  0.921 
loudspeaker  0.148  0.205  0.199  0.142  0.063  0.058  0.041  0.731  0.107  0.405  0.516  0.892  0.900  0.957  0.843    0.735  0.867  0.938  0.935  0.950 
rifle  0.409  0.091  0.147  0.051  0.028  0.027  0.023  0.590  0.615  0.381  0.877  0.980  0.982  0.990  0.788    0.700  0.837  0.929  0.935  0.937 
sofa  0.227  0.144  0.160  0.091  0.041  0.039  0.032  0.712  0.184  0.427  0.717  0.953  0.960  0.982  0.826    0.754  0.888  0.958  0.957  0.963 
table  0.393  0.166  0.177  0.102  0.038  0.043  0.033  0.442  0.158  0.404  0.692  0.967  0.958  0.986  0.706    0.734  0.867  0.959  0.954  0.962 
telephone  0.281  0.110  0.130  0.054  0.027  0.026  0.023  0.674  0.317  0.484  0.867  0.989  0.992  0.997  0.805    0.847  0.957  0.983  0.982  0.984 
vessel  0.181  0.130  0.169  0.078  0.043  0.043  0.030  0.771  0.363  0.394  0.757  0.931  0.930  0.974  0.820    0.641  0.837  0.918  0.917  0.930 
mean  0.299  0.147  0.173  0.093  0.044  0.044  0.034  0.612  0.259  0.400  0.708  0.942  0.942  0.975  0.772    0.715  0.855  0.938  0.935  0.944 
(a) Noise = 0.005 Chamfer FScore Normal Consistency SPSR PSGN 3DR2N2 AtlasNet ConvONet Ours Ours SPSR PSGN 3DR2N2 AtlasNet ConvONet Ours Ours SPSR PSGN 3DR2N2 AtlasNet ConvONet Ours Ours category (w/o ) (w/o ) (w/o ) airplane 0.716 0.107 0.147 0.103 0.052 0.059 0.045 0.268 0.457 0.413 0.558 0.883 0.857 0.915 0.550  0.665 0.787 0.904 0.897 0.905 bench 0.661 0.133 0.154 0.101 0.056 0.060 0.050 0.296 0.255 0.446 0.587 0.872 0.862 0.905 0.551  0.683 0.797 0.887 0.879 0.885 cabinet 0.323 0.166 0.165 0.118 0.065 0.067 0.051 0.383 0.138 0.435 0.554 0.883 0.863 0.920 0.671  0.784 0.855 0.937 0.927 0.938 car 0.338 0.137 0.188 0.115 0.104 0.091 0.071 0.415 0.200 0.388 0.528 0.739 0.749 0.817 0.632  0.714 0.792 0.875 0.862 0.871 chair 0.524 0.176 0.191 0.126 0.071 0.073 0.058 0.263 0.141 0.387 0.527 0.818 0.799 0.882 0.585  0.666 0.811 0.915 0.905 0.920 display 0.409 0.166 0.167 0.111 0.057 0.057 0.047 0.321 0.164 0.431 0.554 0.889 0.881 0.925 0.600  0.743 0.884 0.946 0.944 0.951 lamp 0.457 0.210 0.261 0.146 0.090 0.101 0.076 0.319 0.195 0.329 0.455 0.754 0.734 0.841 0.617  0.588 0.737 0.866 0.859 0.881 loudspeaker 0.320 0.205 0.203 0.144 0.090 0.092 0.065 0.369 0.109 0.407 0.471 0.793 0.778 0.853 0.675  0.734 0.850 0.920 0.910 0.925 rifle 0.848 0.097 0.144 0.119 0.047 0.044 0.042 0.218 0.575 0.403 0.439 0.905 0.917 0.928 0.541  0.691 0.746 0.888 0.895 0.896 sofa 0.452 0.152 0.153 0.109 0.065 0.061 0.051 0.337 0.166 0.457 0.572 0.857 0.865 0.908 0.631  0.744 0.860 0.936 0.934 0.941 table 0.514 0.169 0.177 0.115 0.057 0.061 0.049 0.293 0.158 0.431 0.564 0.885 0.869 0.923 0.597  0.729 0.847 0.936 0.929 0.940 telephone 0.521 0.112 0.128 0.105 0.038 0.038 0.033 0.329 0.311 0.508 0.520 0.959 0.964 0.976 0.591  0.847 0.917 0.975 0.975 0.976 vessel 0.403 0.135 0.173 0.111 0.073 0.072 0.058 0.399 0.341 0.397 0.527 0.796 0.794 0.856 0.612  0.639 0.788 0.881 0.875 0.886 mean 0.499 0.151 0.173 0.117 0.066 0.067 0.054 0.324 0.247 0.418 0.527 0.849 0.841 0.896 0.604  0.710 0.821 0.913 
0.907 0.917 (b) Noise = 0.025 Chamfer FScore Normal Consistency SPSR PSGN 3DR2N2 AtlasNet ConvONet Ours Ours SPSR PSGN 3DR2N2 AtlasNet ConvONet Ours Ours SPSR PSGN 3DR2N2 AtlasNet ConvONet Ours Ours category (w/o ) (w/o ) (w/o ) airplane 1.573 0.745 0.164 2.113 0.041 0.213 0.031 0.093 0.011 0.405 0.027 0.938 0.667 0.970 0.621  0.650 0.561 0.920 0.864 0.923 bench 1.499 0.573 0.166 1.856 0.041 0.076 0.036 0.126 0.007 0.431 0.053 0.945 0.829 0.965 0.575  0.695 0.630 0.910 0.876 0.909 cabinet 1.060 0.712 0.175 1.472 0.052 0.065 0.042 0.248 0.004 0.399 0.083 0.938 0.868 0.960 0.659  0.778 0.639 0.950 0.925 0.950 car 1.262 0.536 0.200 1.844 0.087 0.092 0.057 0.177 0.009 0.351 0.059 0.812 0.764 0.893 0.634  0.711 0.620 0.884 0.863 0.888 chair 0.984 0.689 0.228 1.478 0.055 0.087 0.041 0.186 0.005 0.362 0.076 0.903 0.783 0.959 0.628  0.672 0.644 0.930 0.898 0.941 display 1.312 0.965 0.201 1.685 0.041 0.060 0.034 0.188 0.004 0.374 0.062 0.956 0.878 0.980 0.627  0.747 0.593 0.962 0.945 0.967 lamp 1.402 0.958 0.399 2.080 0.073 0.110 0.047 0.123 0.004 0.283 0.037 0.838 0.699 0.941 0.630  0.587 0.569 0.885 0.844 0.910 loudspeaker 0.930 0.905 0.224 1.392 0.075 0.093 0.050 0.264 0.003 0.376 0.093 0.861 0.792 0.927 0.673  0.732 0.652 0.930 0.904 0.940 rifle 1.689 0.479 0.163 2.442 0.037 0.055 0.026 0.066 0.022 0.386 0.012 0.953 0.888 0.985 0.627  0.679 0.519 0.916 0.907 0.929 sofa 1.267 0.607 0.172 1.656 0.047 0.064 0.037 0.211 0.006 0.412 0.080 0.934 0.866 0.967 0.655  0.756 0.666 0.950 0.933 0.956 table 1.159 0.913 0.202 1.581 0.044 0.082 0.037 0.166 0.004 0.405 0.081 0.950 0.810 0.972 0.618  0.737 0.672 0.951 0.915 0.954 telephone 1.458 0.851 0.146 1.890 0.030 0.036 0.025 0.173 0.005 0.461 0.047 0.983 0.971 0.994 0.668  0.839 0.589 0.980 0.975 0.982 vessel 1.530 0.639 0.189 2.200 0.054 0.072 0.036 0.108 0.009 0.387 0.032 0.890 0.814 0.956 0.652  0.635 0.568 0.907 0.886 0.921 mean 1.317 0.736 0.202 1.822 0.052 0.085 0.038 0.164 0.007 0.387 0.057 0.916 0.818 0.959 0.636  0.709 
0.609 0.929 0.903 0.936 (c) Noise = 0.005, Outliers = 50%
Comments
There are no comments yet.