Getting Topology and Point Cloud Generation to Mesh

12/08/2019 ∙ by Austin Dill, et al. ∙ Carnegie Mellon University

In this work, we explore the idea that effective generative models for point clouds under the autoencoding framework must acknowledge the relationship between a continuous surface, a discretized mesh, and a set of points sampled from the surface. This view motivates a generative model that works by progressively deforming a uniform sphere until it approximates the goal point cloud. We review the concepts from computer graphics and from topology in differential geometry that lead to this conclusion, and we model the generation process as a deformation parameterized by a deep neural network. Finally, we show that this view of the problem produces a model that can generate high-quality meshes efficiently.







1 Introduction

The proliferation of 3D point cloud data from LiDAR sensors on self-driving cars and from commercial sensors like Microsoft’s Kinect has increased interest in machine learning techniques for interpreting 3D data. Just as generative models for text and images gained traction following the success of classification models in those domains, the success of neural networks on point clouds (Qi et al., 2017a, b; Zaheer et al., 2017) has led to growing research into point cloud generative models (Yang et al., 2018; Li et al., 2018; Groueix et al., 2018b; Yang et al., 2019; Groueix et al., 2018a; Li et al., 2019).

In this paper, we study point cloud autoencoders, which are building blocks of a full point cloud generative model (Li et al., 2018; Yang et al., 2019). The decoder is itself a generative model that samples from the 3-dimensional distribution corresponding to the input point cloud. If we naively train a neural network to generate point clouds, the problem is severely under-constrained: the network can reach a low reconstruction loss while distributing points non-uniformly over the surface of the object, and its output still requires a separate meshing step.

To remedy this, we draw inspiration from computer graphics and topology and learn a generative model by way of deformation. We begin with a goal object we hope to reconstruct and an initial point cloud. At each step, we learn a slight modification of the initial point cloud so that the output moves closer to the goal point cloud. Because the point cloud is sampled from a solid object, we constrain our model to learn deformations that obey the rules governing valid topological and mesh deformations. As we will see, this limits our network to invertible deformations. As a side benefit, focusing on this type of deformation often lets us reuse a mesh of the initial point cloud as a template for the final object, simplifying the meshing pipeline.

2 Background

Traditional computer graphics approaches such as ball-pivoting, marching cubes, and Poisson reconstruction work by algorithmically building a mesh to fit a fixed set of points.

Ezuz et al. (2019) connect this generation of meshes to the problem of matching points on corresponding meshes. Their problem then becomes finding a plausible, smooth, and accurate map between triangular meshes. As their work shows, this requires a harmonic and reversible mapping between meshes, which can be found by optimizing a loss function for individual pairs of meshes with a true underlying correspondence. A benefit of this approach is that texture and other mesh properties can be transferred from the source mesh to the target mesh.

The closest work to ours in the deep learning community is FoldingNet (Yang et al., 2018). Their model learns a series of deformations (or folds) from an initial fixed 2D grid of points to a final object. It seems intuitive that starting from a 3D surface could make the learning problem easier, just as it is simpler to mold clay than to fold origami. AtlasNet (Groueix et al., 2018b) generalizes this idea by using multiple 2D grids with multiple generators. Its authors also explore sampling points from a sphere, but they do not consider the theoretical advantages that connect this approach to topology and the concept of deformations.

While our work focuses on the generation of point clouds, a similar line of work has been explored in mesh reconstruction by Wang et al. (2018) and Kanazawa et al. (2018). Both methods deform an initial mesh (given a priori and learned, respectively) into a final shape. However, both rely on graph-based representations that do not allow sampling an arbitrary number of points.

3 Theoretical Justification

To justify our claim that our model is a more natural generative model for point clouds, we must examine the complexity of transforming various initial distributions into the type of surfaces that point clouds encode. To simplify matters, we limit our discussion to maps from a 3-ball and from a 3-dimensional isotropic Gaussian to a 2-sphere.

Point clouds are typically sampled from the surfaces of real-world objects or realistic meshes. Intuitively, this means that the majority of points are located on the boundary of objects and not on the interior. If we hope to perfectly capture the surface of objects in point clouds with a continuous, invertible map as has become common practice in many generative models, we must consider the topology of our initial shape  (Rezende and Mohamed, 2015; Grathwohl et al., 2018; Behrmann et al., 2018).

Theorem 1.

There is no continuous invertible map between the 3-ball and the 2-sphere that respects the boundary.


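Proof. This follows from Brouwer’s fixed point theorem: a continuous map from the 3-ball onto the 2-sphere that fixes the boundary would be a retraction of the ball onto its boundary sphere, and the existence of such a retraction contradicts Brouwer’s theorem. ∎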

Theorem 2.

There is no continuous invertible map between ℝ³ and the 2-sphere that respects the boundary.


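Proof. This follows from the relationship between Hausdorff spaces and compact subspaces. The 2-sphere is compact and ℝ³ is not; since the continuous image of a compact space is compact, no continuous map from the 2-sphere can have all of ℝ³ as its image, so no continuous map with a continuous inverse exists between the two spaces. ∎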

These results show that if we wish to learn a transformation that is continuous, invertible, and achieves no error on the boundary, we must choose an initial point cloud that is topologically close to our goal point cloud; otherwise, the underlying topology will thwart our efforts. For these reasons, we choose a hollow sphere of points with radius 1 as our initial shape. We believe this is the most topologically similar structure that is also simple to sample from. As we will see in Section 4.2, this decision gives us additional advantages.
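Sampling the initial shape is straightforward. The sketch below shows one standard way to draw points uniformly from the unit sphere, assuming plain Python with no external libraries; the function name `sample_unit_sphere` is hypothetical. It also mirrors the contrast drawn by the theorems: a sample from the 3-dimensional isotropic Gaussian fills all of ℝ³, and only after normalization does it lie on the 2-sphere.

```python
import math
import random


def sample_unit_sphere(n, seed=None):
    """Draw n points uniformly from the surface of the unit sphere (radius 1).

    A 3D isotropic Gaussian sample has a density that depends only on its
    length, so dividing by that length gives a uniformly random direction.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm < 1e-12:  # skip the (practically impossible) near-zero vector
            continue
        points.append((x / norm, y / norm, z / norm))
    return points


sphere = sample_unit_sphere(2048, seed=0)
print(len(sphere))  # 2048
```

Rejection sampling from the cube followed by normalization would also work, but the Gaussian route needs no rejection loop in practice.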

4 Method

4.1 Architecture

The architecture for our network is built on the idea of repeated deformations to an initial point cloud based on the encoding generated by a Deep Set model  (Zaheer et al., 2017). This model takes inspiration from FoldingNet with its series of "folds" replaced by deformations and its graph-based encoder exchanged for a set-based encoder  (Yang et al., 2018).
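A minimal sketch of a Deep Sets style encoder follows, under our own assumptions: the names `DeepSetEncoder`, `phi`, and `rho`, the layer sizes, and the use of sum pooling with random, untrained weights are illustrative choices, not the architecture used in this paper. What it demonstrates is that pooling per-point features with a symmetric operation makes the shape code independent of the order of the points.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random (untrained) weights and biases for a dense layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.1, 0.1) for _ in range(n_out)]
    return weights, biases


def apply_layer(layer, v, activation):
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, v)) + b)
            for row, b in zip(weights, biases)]


class DeepSetEncoder:
    """Permutation-invariant encoder z = rho(sum_i phi(x_i))."""

    def __init__(self, hidden=16, latent=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.phi = make_layer(3, hidden, rng)       # applied to each point independently
        self.rho = make_layer(hidden, latent, rng)  # applied once to the pooled features

    def encode(self, points):
        pooled = [0.0] * self.hidden
        for p in points:
            features = apply_layer(self.phi, p, lambda t: max(0.0, t))  # ReLU
            pooled = [a + f for a, f in zip(pooled, features)]          # sum pooling
        return apply_layer(self.rho, pooled, math.tanh)


cloud = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
encoder = DeepSetEncoder()
z_forward = encoder.encode(cloud)
z_reversed = encoder.encode(list(reversed(cloud)))
# The two codes agree up to floating-point rounding from the different summation order.
print(max(abs(a - b) for a, b in zip(z_forward, z_reversed)))
```

Mean or max pooling would preserve the same invariance; sum pooling is the form used in the Deep Sets paper's main construction.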

On top of this basic framework, we introduce a forward deformation network (from a random sphere to the goal point cloud) and a backward deformation network (from the goal point cloud to a sphere). Training both networks simultaneously is meant to regularize the transformation toward invertibility, following the computer graphics community’s requirement of an invertible map, without limiting ourselves to models with an analytic inverse. The forward architecture is depicted in Figure 1; the backward architecture is identical.

Figure 1: A residual structure allows for gradual deformation from a random sphere to the final shape.
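The residual structure in Figure 1 can be sketched as a stack of steps x_{k+1} = x_k + g_k(x_k, z), where z is the shape code from the encoder. This is a hypothetical illustration: the class `ResidualDeformation`, the number of steps, the hidden width, the step `scale`, and the random untrained weights are all assumptions. Since the backward architecture is identical, it would be a second instance of the same structure.

```python
import math
import random


class ResidualDeformation:
    """K residual steps x_{k+1} = x_k + scale * g_k(x_k, z).

    Each g_k is a one-hidden-layer MLP (random, untrained weights here) that
    reads the current point concatenated with the shape code z and outputs
    a 3D offset for that point.
    """

    def __init__(self, latent_dim, steps=4, hidden=16, scale=0.1, seed=0):
        rng = random.Random(seed)
        n_in = 3 + latent_dim
        self.scale = scale
        self.layers = []
        for _ in range(steps):
            w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                  for _ in range(hidden)]
            w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                  for _ in range(3)]
            self.layers.append((w1, w2))

    @staticmethod
    def _matvec(w, v):
        return [sum(a * b for a, b in zip(row, v)) for row in w]

    def __call__(self, points, z):
        current = [tuple(p) for p in points]
        for w1, w2 in self.layers:
            updated = []
            for p in current:
                h = [math.tanh(a) for a in self._matvec(w1, list(p) + list(z))]
                offset = self._matvec(w2, h)
                # Residual update: each point moves by a small offset.
                updated.append(tuple(c + self.scale * o for c, o in zip(p, offset)))
            current = updated
        return current


octahedron = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
              (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
z = [0.5, -0.2, 0.1, 0.0]
deform = ResidualDeformation(latent_dim=len(z))
deformed = deform(octahedron, z)
```

With `steps=0` the map is exactly the identity, which is why a residual stack suits gradual deformation: each step only has to learn a small correction to the previous shape.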

4.2 Loss Function

Our loss function encodes our desire to minimize distortion. While the majority of point cloud generative models are trained using Chamfer distance (autoencoder models) or maximum likelihood (normalizing flow models), our loss function takes inspiration from the computer graphics community. In Equation 1, each input is a point on the initial sphere, and the function applied to it is our learned deformation.
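For reference, a brute-force sketch of the symmetric Chamfer distance is below. Conventions differ between papers (sums versus means, squared versus unsquared distances); this version, which averages squared nearest-neighbor distances in each direction and adds the two terms, is an assumption rather than the exact form used here.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a and b, brute force O(|a| * |b|).

    Assumed convention: mean squared nearest-neighbor distance from a to b,
    plus the same quantity from b to a.
    """
    a_to_b = sum(min(squared_distance(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(squared_distance(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(chamfer_distance(cloud_a, cloud_b))  # 0.3333333333333333: only (1, 1, 0) is unmatched
```

Because each direction only looks at nearest neighbors, many points can collapse onto the same target region without penalty, which is one way to see why Chamfer distance alone leaves the problem under-constrained.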


Equation 1 can be seen as an approximation to the Laplacian loss frequently used in mesh generation tasks (Wang et al., 2018). While the true Laplacian would require knowledge of the neighborhood of each point, we can use properties of the sphere to approximate it and then enforce that the neighborhood persists in the output point cloud. For each point on the sphere, its neighborhood may be approximated simply by its k nearest neighbors in Euclidean distance. This defines the neighborhood function required in Equation 1. Our final loss function is a weighted combination of the Chamfer loss in both directions and the deformation loss.
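As a sketch, assuming a Pixel2Mesh-style Laplacian form for the deformation loss (our assumption, not necessarily the exact Equation 1): neighborhoods N(i) are the k nearest neighbors computed once on the sphere, each point's Laplacian coordinate is δ_i = x_i − (1/|N(i)|) Σ_{j∈N(i)} x_j, and the penalty is the mean of ‖δ_i(f(X)) − δ_i(X)‖² over points. The function names and the default k are hypothetical. In a full objective, this term would be multiplied by a weight and added to the Chamfer terms.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_indices(points, k):
    """For each point, indices of its k nearest neighbors (excluding itself), brute force."""
    neighbors = []
    for i, p in enumerate(points):
        ranked = sorted((squared_distance(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in ranked[:k]])
    return neighbors


def laplacian_coordinates(points, neighbors):
    """delta_i = x_i minus the mean of its neighbors' positions."""
    coords = []
    for p, nbrs in zip(points, neighbors):
        mean = [sum(points[j][d] for j in nbrs) / len(nbrs) for d in range(3)]
        coords.append(tuple(p[d] - mean[d] for d in range(3)))
    return coords


def deformation_loss(sphere, deformed, k=6):
    """Assumed Laplacian-style penalty. Neighborhoods are computed once on the
    sphere and reused for the deformed points, so the loss measures how much
    each point's position relative to its sphere neighbors has changed."""
    neighbors = knn_indices(sphere, k)
    before = laplacian_coordinates(sphere, neighbors)
    after = laplacian_coordinates(deformed, neighbors)
    return sum(squared_distance(a, b) for a, b in zip(before, after)) / len(sphere)


octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
doubled = [tuple(2 * c for c in p) for p in octahedron]
print(deformation_loss(octahedron, octahedron, k=4))  # 0.0
print(deformation_loss(octahedron, doubled, k=4))     # 1.0
```

Under this form, rigid translations cost nothing, while stretching or folding the sphere is penalized in proportion to how much local neighborhoods are distorted.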

5 Experimental Results

5.1 Dataset

To train and test our model, we sample points uniformly from the surfaces of meshes in the ShapeNet dataset (Chang et al., 2015). Its 51,300 meshes cover 55 distinct categories, including airplanes, cars, lamps, and doors. All results in the following sections come from models trained on a portion of each category.
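Uniform sampling from a mesh surface is commonly done in two stages: choose a triangle with probability proportional to its area, then draw a uniform point inside it with barycentric coordinates. The sketch below assumes that procedure (the text does not specify its sampler), uses only the Python standard library, and leaves out loading ShapeNet files; `sample_mesh_surface` is a hypothetical name.

```python
import math
import random


def triangle_area(a, b, c):
    """Half the length of the cross product of two edge vectors."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_mesh_surface(vertices, faces, n, seed=None):
    """Draw n points uniformly (by area) from the surface of a triangle mesh."""
    rng = random.Random(seed)
    triangles = [(vertices[i], vertices[j], vertices[k]) for i, j, k in faces]
    areas = [triangle_area(*t) for t in triangles]
    chosen = rng.choices(triangles, weights=areas, k=n)  # larger faces get more points
    points = []
    for a, b, c in chosen:
        r1, r2 = rng.random(), rng.random()
        s = math.sqrt(r1)  # the square root keeps the density uniform inside the triangle
        w_a, w_b, w_c = 1.0 - s, s * (1.0 - r2), s * r2
        points.append(tuple(w_a * a[i] + w_b * b[i] + w_c * c[i] for i in range(3)))
    return points


# Usage: a unit square in the z = 0 plane, split into two triangles.
square_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
samples = sample_mesh_surface(square_vertices, square_faces, 1000, seed=0)
```

Weighting by area matters: sampling faces uniformly would oversample regions tessellated with many small triangles, producing exactly the kind of non-uniform point distribution the introduction warns about.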

5.2 Metrics and Results

Following previous work, we use Distance to Face (D2F) and Coverage to measure the quality of our reconstructions (Li et al., 2018). Lucic et al. (2018) use the same metrics, under the names precision and recall, to evaluate GANs.
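As an illustration only, the sketch below assumes one common formulation of the two metrics: Coverage as the fraction of reference points that are the nearest neighbor of at least one generated point, and D2F approximated by the distance from each generated point to a dense sample of the true surface. The exact D2F uses point-to-triangle distances against the mesh, and the precise definitions in Li et al. (2018) may differ in detail; all function names are hypothetical.

```python
import math


def nearest_index(p, cloud):
    """Index of the point in `cloud` closest to p (brute force)."""
    return min(range(len(cloud)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, cloud[j])))


def coverage(generated, reference):
    """Assumed definition: fraction of reference points that are the nearest
    neighbor of at least one generated point (higher is better)."""
    matched = {nearest_index(p, reference) for p in generated}
    return len(matched) / len(reference)


def d2f_proxy(generated, surface_samples):
    """Approximate distance to face: mean Euclidean distance from each generated
    point to its nearest neighbor in a dense sample of the true surface (lower is
    better). The exact metric uses point-to-triangle distances."""
    total = 0.0
    for p in generated:
        q = surface_samples[nearest_index(p, surface_samples)]
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return total / len(generated)


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
generated = [(0.1, 0.0, 0.0), (0.9, 0.0, 0.0)]
print(coverage(generated, reference))  # 0.5: only the first two reference points are matched
print(d2f_proxy(generated, reference))
```

The pairing mirrors the precision and recall reading: D2F penalizes generated points that stray from the surface, while Coverage penalizes parts of the surface that no generated point reaches.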


The results in Table 1 show that our topologically motivated approach is competitive with traditional methods for point cloud generation. In other words, we do not pay a substantial cost for incorporating topological similarity into our loss function.

While these metrics give insight into the quality of our reconstructions, they do not capture structure, which makes them poor measures of how accurately a point cloud describes the underlying surface. As Figure 2 shows, our model produces plausible meshes without a secondary meshing procedure; we simply feed the vertices of a sphere mesh into our pretrained network. Although we omit them here for brevity, our ablation experiments show that removing the deformation loss leads to under-constrained transformations that produce intersecting faces.
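The template trick can be sketched as follows, assuming a UV-sphere triangulation (the text does not say which sphere mesh it uses): only vertex positions pass through the network, and the sphere's faces are reused unchanged. The `deform` argument stands in for the pretrained forward network; the `stretch` function in the usage is a hypothetical placeholder, not a trained model.

```python
import math


def uv_sphere(n_lat=8, n_lon=16):
    """Triangle mesh of the unit sphere: returns (vertices, faces)."""
    vertices = [(0.0, 0.0, 1.0)]  # north pole
    for i in range(1, n_lat):
        theta = math.pi * i / n_lat
        for j in range(n_lon):
            phi = 2 * math.pi * j / n_lon
            vertices.append((math.sin(theta) * math.cos(phi),
                             math.sin(theta) * math.sin(phi),
                             math.cos(theta)))
    vertices.append((0.0, 0.0, -1.0))  # south pole
    south = len(vertices) - 1

    def ring(i, j):  # index of vertex j (wrapping around) on latitude ring i, 1-based
        return 1 + (i - 1) * n_lon + (j % n_lon)

    faces = []
    for j in range(n_lon):
        faces.append((0, ring(1, j), ring(1, j + 1)))                      # north cap
        faces.append((south, ring(n_lat - 1, j + 1), ring(n_lat - 1, j)))  # south cap
        for i in range(1, n_lat - 1):                                      # quads between rings
            a, b = ring(i, j), ring(i, j + 1)
            c, d = ring(i + 1, j), ring(i + 1, j + 1)
            faces.extend([(a, c, b), (b, c, d)])
    return vertices, faces


def mesh_from_template(deform, n_lat=8, n_lon=16):
    """Move only the vertices; the sphere's faces (connectivity) are reused unchanged."""
    vertices, faces = uv_sphere(n_lat, n_lon)
    return deform(vertices), faces


# Hypothetical stand-in for the pretrained network: stretch the sphere along z.
def stretch(vertices):
    return [(x, y, 1.5 * z) for x, y, z in vertices]


verts, faces = mesh_from_template(stretch)
print(len(verts), len(faces))  # 114 224
```

Because the connectivity never changes, the output keeps the sphere's genus-0 structure; whether the deformed faces avoid self-intersection depends on the learned map, which is what the deformation loss is meant to encourage.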

Category D2F (↓) Coverage (↑)
Airplanes 2.35 1.59 1.44 1.59 2.00 5.44 7.40 6.71
Benches 7.96 2.23 1.92 2.34 11.8 11.5 13.8 12.66
Cars 3.67 3.98 1.46 1.24 1.62 1.01 1.44 1.31
Chairs 8.80 7.35 7.55 7.73 17.2 13.2 14.0 14.57
Cups 7.11 1.97 2.11 2.21 19.1 12.5 15.0 14.73
Guitars 5.02 5.76 0.995 1.29 3.80 2.29 3.43 3.56
Lamps 12.9 10.7 2.84 3.17 8.94 6.85 12.7 12.29
Laptops 8.55 4.86 1.20 1.30 9.47 8.80 12.3 11.66
Sofas 6.31 6.15 1.87 1.66 11.6 8.11 11.1 9.95
Tables 10.9 9.88 9.86 10.1 13.1 12.0 12.3 13.5
Table 1: Our method is comparable to other methods for point cloud reconstruction.
Figure 2: Meshes automatically generated by our approach: (a) airplane, (b) car, (c) chair.

Because our model is limited to deformations between topologically similar objects, the standard reconstruction metrics show a decrease in performance compared to models trained on reconstruction loss alone. These shortcomings may be due to the deformation loss incentivizing the removal of smaller details.

6 Conclusions

Traditionally, generative models for point clouds have been based entirely on the properties of sets: permutation invariance and conditional independence of the points given the underlying shape. Although these properties are crucial for efficiently modeling point cloud distributions, they ignore the relationship between point cloud and mesh, making mesh generation less effective. Our preliminary results show that methods that incorporate this knowledge can be trained solely on point sets and yet produce a generative process for meshes. Our hope is that this progress motivates further research into how set models can benefit from external structure, either as regularization or as a means for improving downstream tasks.


References

  • J. Behrmann, W. Grathwohl, R. T. Q. Chen, D. Duvenaud, and J. Jacobsen (2018) Invertible residual networks. arXiv:1811.00995.
  • A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu (2015) ShapeNet: an information-rich 3D model repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University, Princeton University, and Toyota Technological Institute at Chicago.
  • D. Ezuz, J. Solomon, and M. Ben-Chen (2019) Reversible harmonic maps between discrete surfaces. ACM Transactions on Graphics 38 (2), pp. 1–12.
  • W. Grathwohl, R. T. Q. Chen, J. Bettencourt, I. Sutskever, and D. Duvenaud (2018) FFJORD: free-form continuous dynamics for scalable reversible generative models. arXiv:1810.01367.
  • T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry (2018a) 3D-CODED: 3D correspondences by deep deformation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 230–246.
  • T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry (2018b) AtlasNet: a papier-mâché approach to learning 3D surface generation. arXiv:1802.05384.
  • A. Kanazawa, S. Tulsiani, A. A. Efros, and J. Malik (2018) Learning category-specific mesh reconstruction from image collections. Lecture Notes in Computer Science, pp. 386–402.
  • C. Li, T. Simon, J. Saragih, B. Póczos, and Y. Sheikh (2019) LBS autoencoder: self-supervised fitting of articulated meshes to point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11967–11976.
  • C. Li, M. Zaheer, Y. Zhang, B. Poczos, and R. Salakhutdinov (2018) Point cloud GAN. arXiv:1810.05795.
  • M. Lucic, K. Kurach, M. Michalski, S. Gelly, and O. Bousquet (2018) Are GANs created equal? A large-scale study. In Advances in Neural Information Processing Systems, pp. 700–709.
  • C. R. Qi, H. Su, K. Mo, and L. J. Guibas (2017a) PointNet: deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660.
  • C. R. Qi, L. Yi, H. Su, and L. J. Guibas (2017b) PointNet++: deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pp. 5099–5108.
  • D. J. Rezende and S. Mohamed (2015) Variational inference with normalizing flows. arXiv:1505.05770.
  • N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y. Jiang (2018) Pixel2Mesh: generating 3D mesh models from single RGB images. Lecture Notes in Computer Science, pp. 55–71.
  • G. Yang, X. Huang, Z. Hao, M. Liu, S. Belongie, and B. Hariharan (2019) PointFlow: 3D point cloud generation with continuous normalizing flows. arXiv:1906.12320.
  • Y. Yang, C. Feng, Y. Shen, and D. Tian (2018) FoldingNet: point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 206–215.
  • M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola (2017) Deep sets. In Advances in Neural Information Processing Systems, pp. 3391–3401.