HyperFlow: Representing 3D Objects as Surfaces

06/15/2020 ∙ by Przemysław Spurek, et al. ∙ Jagiellonian University

In this work, we present HyperFlow - a novel generative model that leverages hypernetworks to create continuous 3D object representations in a form of lightweight surfaces (meshes), directly out of point clouds. Efficient object representations are essential for many computer vision applications, including robotic manipulation and autonomous driving. However, creating those representations is often cumbersome, because it requires processing unordered sets of point clouds. Therefore, it is either computationally expensive, due to additional optimization constraints such as permutation invariance, or leads to quantization losses introduced by binning point clouds into discrete voxels. Inspired by mesh-based representations of objects used in computer graphics, we postulate a fundamentally different approach and represent 3D objects as a family of surfaces. To that end, we devise a generative model that uses a hypernetwork to return the weights of a Continuous Normalizing Flows (CNF) target network. The goal of this target network is to map points from a probability distribution into a 3D mesh. To avoid numerical instability of the CNF on compact support distributions, we propose a new Spherical Log-Normal function which models density of 3D points around object surfaces mimicking noise introduced by 3D capturing devices. As a result, we obtain continuous mesh-based object representations that yield better qualitative results than competing approaches, while reducing training time by over an order of magnitude.

1 Introduction

Representing 3D objects efficiently is a prerequisite for a multitude of contemporary computer vision and machine learning applications, including robotic manipulation kehoe2015survey and autonomous driving yang2018pixor . 3D registration devices currently used to create those representations, such as LIDARs and depth cameras, sample object surfaces and output a set of 3D points called a point cloud.

Processing point clouds poses several challenges. First of all, the size of the point cloud can vary between objects and processing variable-size inputs is cumbersome for contemporary neural networks used in practical applications. Although one can subsample or upsample point clouds, it requires additional processing steps, continuous signed distance functions park2019deepsdf or even separate models yifan2019patch ; yu2018pu . Other solutions to that problem rely on discretizing 3D space into regular 3D voxel grids wu20153d ; wu2016learning , collections of images su2015multi or occupancy grids ji20123d ; maturana2015voxnet . These approaches, however, increase the memory footprint of object representations and lead to quantization losses. Secondly, processing point clouds with neural networks is challenging due to the lack of ordering within sets of 3D points. More precisely, permuting the points in the cloud can lead to inconsistent outputs. DeepSets zaheer2017deep and PointNet qi2017pointnet ; qi2017pointnet++ address this problem by including permutation invariant layers in neural network architectures. Nonetheless, the same modifications cannot be used when the task requires a model to produce outputs of various sizes, e.g. in the case of point cloud reconstruction tasks.

More recent methods that create representations of 3D objects from variable-size unordered point clouds rely on generative neural networks that treat point clouds as a sample from a 3D probability distribution yang2019pointflow ; stypulkowski2019conditional ; spurek2020hypernetwork . PointFlow yang2019pointflow returns probability distributions of the 3D object point cloud, instead of an exact set of points. Its main limitation, however, is a computationally expensive training process caused by conditioning the Continuous Normalizing Flow (CNF) module grathwohl2018ffjord of the network with the autoencoder latent space. As a consequence, PointFlow models require a significant number of parameters, which results in a high memory footprint of the model and a long training procedure. To reduce this burden and simplify the model, HyperCloud spurek2020hypernetwork uses a hypernetwork, instead of a CNF module as in PointFlow, to return weights of a fully-connected target network that maps a uniform distribution on a 3D ball to a 3D point cloud. Although the simplicity of this approach leads to increased efficiency of HyperCloud, the quantitative results obtained by the model are inferior to those of PointFlow, mostly because conventional fully-connected neural networks are not capable of modeling complex 3D point cloud structures. Even though using a more sophisticated CNF as a target network could address this shortcoming, the formulation of HyperCloud does not allow sampling from a prior with non-compact support, which the CNF requires to work.

Figure 1: HyperFlow method leverages a hypernetwork architecture to take a 3D point cloud as an input and return parameters of the Continuous Normalizing Flow (CNF) target network (Part A). To represent 3D objects as families of surfaces, we use CNF to parametrize density of point clouds around surfaces with non-compact support distribution called Spherical Log-Normal (Part B). Using this parametrization in the hypernetwork configuration, we can obtain high-quality point cloud reconstructions as well as 3D object meshes (Part C), at a fraction of the training cost required by the vanilla CNF model and with a significantly lower memory footprint.

In this paper, we take a fundamentally different approach to representing 3D objects and, inspired by mesh triangulation methods used in computer graphics edelsbrunner_2000 , we model objects as families of surfaces. More specifically, we consider a point cloud as a sample from a distribution on object surfaces with additive noise introduced by a registration device, such as LIDAR. To model this distribution, we propose a new Spherical Log-Normal function which mimics the topology of 3D objects and provides non-compact support. This, in turn, enables effective utilization of a CNF model as a part of a hypernetwork, instead of a fully-connected neural network as done in HyperCloud spurek2020hypernetwork .

The resulting generative model we introduce in this work, dubbed HyperFlow (the code is available at https://github.com/maciejzieba/HyperFlow), produces state-of-the-art generative results both for point clouds and mesh representations. Because we rely on a hypernetwork instead of conditioning a CNF with the autoencoder latent space, our model uses far fewer parameters of the CNF function. As a result, we reduce the training time and corresponding memory footprint of the model by over an order of magnitude with respect to the competing PointFlow.

Our contributions can be summarized as follows:

  • We introduce a new HyperFlow generative network that models 3D objects as families of surfaces and allows building state-of-the-art point cloud representations that can be transformed into 3D meshes by leveraging the generative properties of a target network.

  • We propose a new Spherical Log-Normal distribution which models a point cloud density with non-compact support and, hence, can be effectively used by a CNF model.

  • To the best of our knowledge, our work is the first approach to train a CNF as a target network which reduces its training time and memory footprint by over an order of magnitude, while preserving state-of-the-art generative capabilities.

2 Spherical Log-Normal distribution and the triangulation trick

In this section, we introduce a Spherical Log-Normal distribution that models the density of point clouds around surfaces of 3D objects and show how it can be used to generate meshes via the so-called triangulation trick.

Figure 2: Level sets and samples from the Spherical Log-Normal distribution with different parameters $\mu$ and $\sigma$. Since the Spherical Log-Normal distribution does not have compact support, it can be used in flow-based architectures.

Since our approach relies on flow-based models, a density distribution has to fulfil several conditions to be used in practice. First of all, flow-based methods cannot be trained on probability distributions with compact support. For instance, it is not possible to train a flow-based model on a 3D ball, as proposed in HyperCloud spurek2020hypernetwork , since computing the log-likelihood cost function used in flows would return infinity for this distribution. As a result, the model does not converge due to numerical instability. Secondly, we would like to model the probability distribution of the surface (mesh representation), which is two-dimensional (the border of a 3D object). Therefore, a Gaussian distribution in $\mathbb{R}^3$ is not a good choice, since it models only elements in 3D. Finally, the density distribution should be topologically coherent with the density of the modeled object. More precisely, because of the way registration devices sample space around object surfaces, point clouds are populated with the highest density around object edges and missing points within object structure. Modeling this density with a distribution that does not allow discontinuities is infeasible as per Theorem 2.1 theorem1 .

Theorem 2.1.

There is no continuous invertible map between the 3-ball and the 2-sphere that respects the boundary.

To model the surface of an object with a continuous, invertible map, one must consider the topology of the object rezende2015variational ; grathwohl2018ffjord ; behrmann2018invertible . To learn a transformation that is continuous, invertible and provides results close to the object boundary, one has to choose a prior that is topologically similar to the expected point cloud, i.e. has the same number of discontinuities. (Continuous normalizing flows, such as FFJORD grathwohl2018ffjord , are able to approximate discontinuous density functions. This, however, remains insufficient to model high-quality 3D point clouds while generating a continuous parametrization of object surfaces. Consequently, in our approach, we propose a density distribution without compact support and with a single discontinuity, which corresponds to the topology of 3D objects represented with point clouds.) Therefore, we construct a probability distribution on a sphere without compact support.

Spherical Log-Normal distribution on $\mathbb{R}^D$.

A probability distribution on a sphere in $\mathbb{R}^D$ can be constructed from a one-dimensional density $f$ defined on the positive real numbers. In such a case, we can define the spherical density as:

$$ f_S(x) = \frac{1}{S_{D-1}\,\|x\|^{D-1}}\, f(\|x\|), \qquad x \in \mathbb{R}^D \setminus \{0\}, \tag{1} $$

where $S_{D-1}$ is the surface area of the $(D-1)$-dimensional unit sphere and $f$ is a one-dimensional density defined on the positive real numbers. We use the same one-dimensional density along the radius in all directions. In our model, we use a Log-normal distribution

$$ f(r) = \frac{1}{r\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln r - \mu)^2}{2\sigma^2}\right), \qquad r > 0, $$

that is, a continuous probability distribution of a random variable whose logarithm is normally distributed and which, hence, provides non-compact support.

Spherical Log-Normal distribution in $\mathbb{R}^3$.

To develop an intuition behind the proposed distribution, we start with a simple visualization in $\mathbb{R}^2$. Fig. 2 shows level sets and samples from the Spherical Log-Normal distribution with different parameters $\mu$ and $\sigma$. The Spherical Log-Normal distribution does not have compact support and can therefore be used in a flow-based architecture. Furthermore, we can force the distribution to concentrate as close as possible to the boundary of a 2D sphere.

In $\mathbb{R}^3$, our Spherical Log-Normal distribution is defined as:

$$ \mathrm{SLN}(x; \mu, \sigma) = \frac{1}{4\pi \|x\|^{2}} \cdot \frac{1}{\|x\|\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln\|x\| - \mu)^2}{2\sigma^2}\right) \tag{2} $$

In order to use our distribution in a flow-based model, we need to compute its log-likelihood function:

$$ \ln \mathrm{SLN}(x; \mu, \sigma) = -\ln(4\pi) - 3\ln\|x\| - \ln\sigma - \tfrac{1}{2}\ln(2\pi) - \frac{(\ln\|x\| - \mu)^2}{2\sigma^2} \tag{3} $$

Finally, sampling elements from our Spherical Log-Normal distribution can be done by following a simple procedure. First sample $r$ from a one-dimensional Gaussian $N(\mu, \sigma^2)$, then sample $z$ from a three-dimensional Gaussian $N(0, I)$. A sample from the Spherical Log-Normal distribution is then obtained by the following equation:

$$ x = e^{r}\,\frac{z}{\|z\|} $$

We avoid numerical instabilities during training by applying a straightforward strategy for choosing the parameter $\sigma$: we start with an arbitrarily large value of $\sigma$ and reduce it linearly during training.
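The sampling procedure and the log-likelihood above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the closed form assumes the log-normal radial density in $\mathbb{R}^3$, and the values $\mu = 0$, $\sigma = 0.1$ are chosen only for the example.

```python
import numpy as np

def sln_sample(n, mu, sigma, dim=3, rng=None):
    """Draw n points with log-normal radius and uniformly random direction."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.normal(mu, sigma, size=(n, 1))            # log-radius ~ N(mu, sigma^2)
    z = rng.normal(size=(n, dim))                     # isotropic Gaussian
    direction = z / np.linalg.norm(z, axis=1, keepdims=True)
    return np.exp(r) * direction

def sln_log_density(x, mu, sigma):
    """Log-density in R^3: spherical factor 1/(4 pi |x|^2) times a log-normal in |x|."""
    log_r = np.log(np.linalg.norm(x, axis=-1))
    return (-np.log(4.0 * np.pi) - 3.0 * log_r - np.log(sigma)
            - 0.5 * np.log(2.0 * np.pi)
            - (log_r - mu) ** 2 / (2.0 * sigma ** 2))

# Illustrative parameters (assumed, not taken from the paper).
rng = np.random.default_rng(0)
samples = sln_sample(20000, mu=0.0, sigma=0.1, rng=rng)
radii = np.linalg.norm(samples, axis=1)
```

With a small $\sigma$ the radii concentrate tightly around $e^{\mu}$, so the samples form a thin shell around the sphere, while the density stays strictly positive everywhere except at the origin.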

Figure 3: We compare how the prior density is modified for the model with Gaussian prior (upper two rows) and Spherical Log-Normal (bottom two rows). In the first and third row we show how the flow model transforms the original density into the target dataset. The second and fourth row show the cross-sections along the plane depicted by red points. For the Gaussian, target space points are not distributed evenly across the object (a central part of Gaussian distribution is transformed into the bottom of the plane, while its tails are used to model wing tips). For the Spherical Log-Normal, target space points are distributed evenly, across the object, showcasing that our approach truly models the distribution of the points along object surfaces.

Triangulation trick

To model 3D object surfaces as meshes using HyperFlow generative model, we need to investigate the relationship between point clouds and object surfaces. In principle, a point cloud representing a 3D object can be considered a set of samples located on the surface of the object with additive noise introduced by a registration device. We use Spherical Log-Normal to model this distribution with peak density around object surfaces (in 2D, around circle edges, in 3D close to the surface of the sphere) and limited by the radius of the distribution. Once we obtain a parametrized distribution of a point cloud which models object surface together with a registration noise, we can produce a mesh with a simple operation which we call the triangulation trick.

The triangulation trick involves transferring vertices of a sphere mesh through a target network the same way as 3D points, as shown in Part C of Fig 1. Since the target network transforms a sample from a Spherical Log-Normal distribution into a 3D point cloud, when we feed it with a sphere triangulation, it outputs a mesh. In fact, when we substitute samples from Spherical Log-Normal distribution with sphere vertices, we effectively assume minimal registration noise. Processing vertices by the target network pre-trained on point clouds allows us to directly generate denoised mesh representation of object surfaces and obtain a high-quality 3D object rendering. The generative character of our HyperFlow model enables construction of the entire mesh by processing only vertices with a target network, without the need for information about the connections between them, as done in traditional rendering methods.
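The triangulation trick can be illustrated with a small NumPy sketch. It builds a triangulated unit sphere by subdividing an octahedron, pushes only the vertices through a map, and reuses the face list unchanged. The function `toy_target_network` is a hypothetical stand-in (a fixed smooth invertible map), not a trained CNF; in HyperFlow its role would be played by the target network.

```python
import numpy as np

def octahedron():
    """Coarsest sphere triangulation: 6 vertices, 8 triangular faces."""
    v = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
    f = np.array([[0, 2, 4], [2, 1, 4], [1, 3, 4], [3, 0, 4],
                  [2, 0, 5], [1, 2, 5], [3, 1, 5], [0, 3, 5]])
    return v, f

def subdivide(vertices, faces):
    """Split each triangle into four and project new vertices onto the unit sphere."""
    verts = [tuple(p) for p in vertices]
    cache = {}

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            m = (vertices[i] + vertices[j]) / 2.0
            verts.append(tuple(m / np.linalg.norm(m)))
            cache[key] = len(verts) - 1
        return cache[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [[a, ab, ca], [b, bc, ab], [c, ca, bc], [ab, bc, ca]]
    return np.array(verts), np.array(new_faces)

def toy_target_network(points):
    """Hypothetical stand-in for a trained target network (smooth, invertible)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([2.0 * x, 0.5 * y + 0.3 * x ** 2, z], axis=1)

sphere_v, faces = octahedron()
for _ in range(2):
    sphere_v, faces = subdivide(sphere_v, faces)

# Only vertices are transformed; the connectivity of the sphere is reused as-is.
mesh_v = toy_target_network(sphere_v)
```

Because the map is continuous, neighbouring sphere vertices stay neighbours after the transformation, which is why the original face list still describes a valid mesh.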

Fig. 3 presents reconstructions obtained using Gaussian and Spherical Log-Normal distributions. We look at the cross-sections of the reconstructions to observe the main differences on how the input distribution is transformed into a final model by a target network. For the Gaussian distribution, its tails are transformed into object details, such as wing tips and airplane rear aileron. Therefore, we cannot claim that the peak density models surfaces of the object, while its tails model the registration noise. For Spherical Log-Normal, its distribution tails are spread along object surfaces, modeling registration noise. This allows us to produce the final mesh through the triangulation trick, effectively denoising 3D mesh-based object representation and yielding high-quality results, as shown in Fig. 4.

Figure 4: Mesh representations generated by our HyperFlow method. Contrary to the existing methods that return point cloud representations sparsely distributed in 3D space, our approach allows creating a continuous 3D object representation in the form of high-quality meshes.

3 HyperFlow: hypernetwork and Continuous Normalizing Flows for generating 3D point clouds

In this section, we present our HyperFlow model that leverages a hypernetwork framework to train a Continuous Normalizing Flow grathwohl2018ffjord target network and generate 3D point clouds together with their mesh-based representations. Since HyperFlow builds on the previously introduced autoencoder-based PointFlow yang2019pointflow with conditioned continuous normalizing flow modules, and on the HyperCloud method spurek2020hypernetwork , which also leverages hypernetworks ha2016hypernetworks , we briefly describe these two approaches before presenting ours.

Autoencoder-based generative model for 3D Point Clouds

Let us first present the autoencoder architecture. The basic aim of an autoencoder is to transport the data through a typically, but not necessarily, lower-dimensional latent space while minimizing the reconstruction error. Thus, we search for encoder and decoder functions that minimize the reconstruction error. In the autoencoder-based generative model, we additionally ensure that the data transported to the latent space comes from the prior distribution (typically a Gaussian one) kingma2013auto ; tolstikhin2017wasserstein ; tabor2018cramer .

Continuous normalizing flow

Generative models are one of the fastest growing areas of deep learning. Variational Autoencoders (VAE) kingma2013auto and Generative Adversarial Networks (GAN) goodfellow2014generative are the most popular approaches. Another model that has gained popularity is the Normalizing Flow (NF) rezende2015variational . A flow-based generative model is constructed from a sequence of invertible transformations. Unlike the other two methods, the model explicitly learns the data distribution and therefore the loss function is simply the negative log-likelihood.

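Normalizing Flow (NF) rezende2015variational is able to model complex probability distributions. A normalizing flow transforms a simple prior distribution $P(y)$ (usually a Gaussian) into a complex one (represented by the data distribution $P(x)$) by applying a sequence of invertible transformation functions: $x = f_n \circ f_{n-1} \circ \dots \circ f_1(y)$. Flowing through this chain of transformations, we obtain a probability distribution of the final target variable.

The probability density of the output variable is then given by the change of variables formula:

$$ \log P(x) = \log P(y) - \sum_{k=1}^{n} \log\left|\det \frac{\partial f_k}{\partial y_{k-1}}\right| \tag{4} $$

where $y_0 = y$, $y_k = f_k(y_{k-1})$, and $y$ can be computed from $x$ using the inverse flow: $y = f_1^{-1} \circ \dots \circ f_n^{-1}(x)$. In such a framework, both the inverse map and the determinant of the Jacobian should be computable.

The continuous normalizing flow chen2018neural is a modification of the above approach, where instead of a discrete sequence of iterations we allow the transformation to be defined by a solution to a differential equation $\frac{\partial y(t)}{\partial t} = g_\theta(y(t), t)$, where $g_\theta$ is a neural network that has an unrestricted architecture. A Continuous Normalizing Flow (CNF) $F_\theta$ is the solution of this differential equation for the initial value problem $y(t_0) = y$, $\frac{\partial y(t)}{\partial t} = g_\theta(y(t), t)$. In such a case we have

$$ x = F_\theta(y(t_0)) = y(t_0) + \int_{t_0}^{t_1} g_\theta(y(t), t)\, dt \tag{5} $$

where $g_\theta$ defines the continuous-time dynamics of the flow $F_\theta$ and $y(t_1) = x$.

The log probability cost function with a prior distribution with density $P(y)$ can be computed by:

$$ \log P(x) = \log P\left(F_\theta^{-1}(x)\right) - \int_{t_0}^{t_1} \mathrm{Tr}\left(\frac{\partial g_\theta}{\partial y(t)}\right) dt \tag{6} $$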

In PointFlow yang2019pointflow authors show that CNF can be used for modeling 3D objects. Instead of directly parametrizing the distribution of points in a shape (fixed size 3D point cloud), PointFlow models this distribution as an invertible parameterized transformation of 3D points from a prior distribution (e.g., a 3D Gaussian). Intuitively, under this model, generating points for a given shape involves sampling points from a generic Gaussian prior, and then moving them according to this parameterized transformation to their new location in the target shape.
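To make the CNF log-density in Eq. (6) concrete, the toy sketch below uses the linear dynamics $g(y, t) = a\,y$, for which the trace of the Jacobian is known analytically, and integrates backwards in time with explicit Euler steps. This is an assumption-laden illustration with hypothetical function names: FFJORD instead relies on a black-box ODE solver and a stochastic trace estimator, and the dynamics are a neural network.

```python
import numpy as np

def dynamics(y, t, a):
    """Toy dynamics g(y, t) = a * y; the Jacobian trace equals a * dim."""
    return a * y

def cnf_log_density(x, a, t0=0.0, t1=1.0, steps=2000):
    """Map x back to the prior and subtract the integrated Jacobian trace."""
    dim = x.shape[-1]
    dt = (t1 - t0) / steps
    y = x.copy()
    trace_integral = 0.0
    t = t1
    for _ in range(steps):
        y = y - dt * dynamics(y, t, a)    # explicit Euler step backwards in time
        trace_integral += dt * a * dim    # Tr(dg/dy) for this linear field
        t -= dt
    # Standard Gaussian prior evaluated at the recovered starting point.
    log_prior = -0.5 * np.sum(y ** 2, axis=-1) - 0.5 * dim * np.log(2.0 * np.pi)
    return log_prior - trace_integral

x = np.array([0.5, -1.0, 0.3])
a = 0.5
approx = cnf_log_density(x, a)

# For this field the flow is x = exp(a) * y, so x ~ N(0, exp(2a) I) exactly.
exact = -0.5 * np.sum(x ** 2) * np.exp(-2.0 * a) - 1.5 * np.log(2.0 * np.pi) - 3.0 * a
```

The two values agree up to the discretization error of the Euler scheme, which shrinks as `steps` grows.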

Hypernetwork

Hypernetworks, introduced in ha2016hypernetworks , are defined as neural models that generate weights for a separate target network solving a specific task. Making an analogy between hypernetworks and generative models, the authors of sheikh2017stochastic , use this mechanism to generate a diverse set of target networks approximating the same function. Hypernetworks can also be used for functional representations of images klocek2019hypernetwork .

In the case of generating 3D point clouds, objects are represented by a neural network. An autoencoder-based architecture "produces" the neural network which transforms the prior distribution into elements of a point cloud. In HyperCloud spurek2020hypernetwork , an autoencoder-based architecture takes a point cloud as input and directly produces the weights of another neural network, which models elements of a 3D object.
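The hypernetwork idea can be sketched with untrained random weights in NumPy. All names and sizes below are hypothetical, and the target network is a tiny fully-connected map (closer to the HyperCloud setting) rather than a CNF; the point is only that one network outputs a flat vector that is reshaped into the weights of another.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, HIDDEN = 16, 8
# Number of weights and biases in a 3 -> HIDDEN -> 3 target MLP.
N_TARGET = 3 * HIDDEN + HIDDEN + HIDDEN * 3 + 3

W_enc = rng.normal(scale=0.5, size=(3, LATENT))         # per-point encoder layer
W_dec = rng.normal(scale=0.1, size=(LATENT, N_TARGET))  # decoder to target weights

def hypernetwork(cloud):
    """Point cloud (n, 3) -> flat weight vector of the target network."""
    features = np.tanh(cloud @ W_enc)   # applied to every point independently
    latent = features.max(axis=0)       # symmetric pooling: point order is irrelevant
    return latent @ W_dec

def target_network(points, theta):
    """Small MLP whose parameters are read from the hypernetwork output."""
    i = 0
    W1 = theta[i:i + 3 * HIDDEN].reshape(3, HIDDEN); i += 3 * HIDDEN
    b1 = theta[i:i + HIDDEN]; i += HIDDEN
    W2 = theta[i:i + HIDDEN * 3].reshape(HIDDEN, 3); i += HIDDEN * 3
    b2 = theta[i:i + 3]
    return np.tanh(points @ W1 + b1) @ W2 + b2

cloud = rng.normal(size=(100, 3))             # stand-in for an input point cloud
theta = hypernetwork(cloud)                   # object-specific target weights
out = target_network(rng.normal(size=(5, 3)), theta)
```

Since the object is stored entirely in `theta`, the target network can be evaluated on any number of prior samples, which is what allows variable-size outputs.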

HyperFlow

In this section, we present details of our novel model dubbed HyperFlow (we make our implementation available at https://github.com/maciejzieba/HyperFlow), which encompasses and extends prior works by training continuous normalizing flow modules to model 3D point cloud distributions within a hypernetwork framework. Our model is inspired by the Variational Autoencoder (VAE) kingma2013auto ; rezende2014stochastic framework that allows learning $P(X)$ from a dataset of observations of $X$. A VAE models the data distribution via a latent variable $z$ with a prior distribution $P_\psi(z)$, and a decoder $P_\theta(X|z)$ which reconstructs the distribution of $X$ conditioned on a given $z$. The model is trained together with an encoder $Q_\phi(z|X)$ by maximizing the lower bound on the log-likelihood of the observations (ELBO).

Instead of using a Gaussian prior over shape representations, we add another CNF to model a learnable prior $P_\psi(z)$, as done in yang2019pointflow . The corresponding ELBO cost function can be rewritten after yang2019pointflow as:

$$ \mathcal{L}(X; \phi, \psi, \theta) = \mathbb{E}_{Q_\phi(z|X)}\left[\log P_\psi(z) + \log P_\theta(X|z)\right] + H\left[Q_\phi(z|X)\right] \tag{7} $$

where $H$ is the entropy and $P_\psi(z)$ is the prior distribution with trainable parameters $\psi$.

We propose to adapt the above cost function to a hypernetwork framework. We therefore introduce our HyperFlow model that consists of two main parts, as shown in Fig. 1. The first one is a hypernetwork that outputs weights (Fig. 1 Part A) of another neural network. The second one is a target network (Fig. 1 Part B) which models the distribution of elements on the surface of a 3D object. Using autoencoder terminology, we define three elements: an encoder, a decoder and a prior distribution. The encoder reduces data dimensionality by mapping it to a lower-dimensional latent space. We follow achlioptas2017learning and use a simple permutation-invariant encoder to predict $Q_\phi(z|X)$.

We use the learnable prior $P_\psi(z)$ over shape representations proposed by PointFlow yang2019pointflow . The assumed probability distribution on the latent space can be more complex than the commonly used Gaussian and need not be given in an explicit form. In such a framework, we use an additional continuous normalizing flow, which transfers the latent space into a Gaussian prior. Finally, we propose to use a decoder that returns the weights $\theta$ of the target network, instead of 3D points as done in yang2019pointflow ; stypulkowski2019conditional . The resulting hypernetwork contains an encoder, a decoder and a flow (Fig. 1 Part A).

The hypernetwork takes a point cloud as input and returns the weights $\theta$ of $g_\theta$, which defines the continuous-time dynamics of the flow $F_\theta$. The CNF takes an element from the prior distribution and transfers it to an element on the surface of the object, see Part B: target network in Fig. 1. In our work, we use Free-form Jacobian of Reversible Dynamics (FFJORD) grathwohl2018ffjord as the transformation between the Spherical Log-Normal distribution and the 3D object. As presented in Sec. 2, this choice of distribution function allows one to create a continuous mesh representation with the triangulation trick.

The cost function of HyperFlow consists of two parts. The first one corresponds to the hypernetwork. This part of the architecture is similar to PointFlow. The second one is the cost function of the CNF corresponding to the target network. The final cost function of our HyperFlow model can be calculated using Eq. (7):

$$ \mathcal{L}(X; \phi, \psi, \theta) = H\left[Q_\phi(z|X)\right] + \mathbb{E}_{Q_\phi(z|X)}\left[\log P_\theta(X|z)\right] + \mathbb{E}_{Q_\phi(z|X)}\left[\log P_\psi(z)\right] $$

where $H$ is the entropy function, the second term is the CNF cost function between the point cloud and the Spherical Log-Normal density, and the third term is the CNF cost function between the latent representation and a Gaussian prior.

            Airplane                                Chair                                   Car
Method      JSD     MMD-CD  MMD-EMD COV-CD  COV-EMD JSD     MMD-CD  MMD-EMD COV-CD  COV-EMD JSD     MMD-CD  MMD-EMD COV-CD  COV-EMD
l-GAN       3.61    0.269   3.29    47.90   50.62   2.27    2.61    7.85    40.79   41.69   2.21    1.48    5.43    39.20   39.77
PC-GAN      4.63    0.287   3.57    36.46   40.94   3.90    2.75    8.20    36.50   38.98   5.85    1.12    5.83    23.56   30.29
PointFlow   4.92    0.217   3.24    46.91   48.40   1.74    2.42    7.87    46.83   46.98   0.87    0.91    5.22    44.03   46.59
HyperCloud  4.84    0.266   3.28    39.75   43.70   2.73    2.56    7.84    41.54   46.67   3.09    1.07    5.38    40.05   40.05
HyperFlow   5.39    0.226   3.16    46.66   51.60   1.50    2.30    8.01    44.71   46.37   1.07    1.14    5.30    45.74   47.44
Table 1: Generation results. MMD-CD scores are multiplied by $10^3$; MMD-EMD scores and JSDs are multiplied by $10^2$.

4 Experiments

In this section, we present the evaluation of our model against the competing methods on two tasks: 3D point cloud generation and 3D mesh generation. Furthermore, we test the efficiency of our approach in terms of training time and memory footprint. All experiments are done on a stationary unit with an NVIDIA GeForce GTX 1080 GPU. If not stated otherwise, default parameters are used.

Generating 3D point clouds

We compare the generative capabilities with competing approaches: latent-GAN achlioptas2017learning , PC-GAN li2018point , PointFlow yang2019pointflow , HyperCloud spurek2020hypernetwork . We follow the evaluation protocol of yang2019pointflow and train each model using point clouds from one of the three categories in the ShapeNet dataset shapenet : airplane, chair, and car. Tab. 1 presents the results and shows that HyperFlow obtains comparable or superior generative results to the state-of-the-art PointFlow method.

Generating 3D meshes

The main advantage of our method, when compared to the reference solutions, is the ability to generate high-quality 3D point clouds as well as meshes using the triangulation trick presented in Sec. 2. To evaluate the quality of the mesh representation, we follow the evaluation protocol of spurek2020hypernetwork . For PointFlow, we use the triangulation trick and create object meshes by feeding the target network a 3D sphere. For HyperCloud and our HyperFlow method we use a sphere with radius $R = 1$. As can be seen in Tab. 2, PointFlow, which uses a Gaussian distribution as a prior, provides results inferior to HyperCloud and HyperFlow, while our HyperFlow method offers the best performance, thanks to using Spherical Log-Normal as a prior instead of a compact support distribution function as in HyperCloud. More qualitative mesh results as well as a detailed description of the metrics used in our experiments can be found in the supplementary material.

                    Airplane                                Chair                                   Car
Method, sphere R    JSD     MMD-CD  MMD-EMD COV-CD  COV-EMD JSD     MMD-CD  MMD-EMD COV-CD  COV-EMD JSD     MMD-CD  MMD-EMD COV-CD  COV-EMD
PointFlow, R=2.795  22.26   0.49    6.65    44.69   20.74   19.28   4.28    13.38   36.85   20.84   16.59   1.6     8.00    20.17   17.04
PointFlow, R=3.136  26.46   0.60    6.89    39.50   19.01   22.52   4.89    14.47   32.47   17.22   20.21   1.75    7.80    21.59   17.3
PointFlow, R=3.368  29.65   0.68    6.84    40.49   16.79   24.68   5.36    14.97   31.41   17.06   24.10   1.96    8.35    18.75   17.04
HyperCloud, R=1     9.51    0.45    5.29    30.60   28.88   4.32    2.81    9.32    40.33   40.63   5.20    1.11    6.54    37.21   28.40
HyperFlow, R=1      6.55    0.38    3.65    40.49   48.64   4.26    3.33    8.27    41.99   45.32   5.77    1.39    5.91    28.40   37.21
Table 2: Quality measures of 3D representations obtained by sampling from a sphere of a given radius for airplane, chair and car shapes. HyperFlow generates point cloud representations of higher quality than those of PointFlow and HyperCloud.

Training time and memory footprint comparison

Fig. 5 displays a comparison between our HyperFlow method and the competing PointFlow. For a fair comparison we evaluated the architectures used in the previous sections that obtain best quantitative results. The models were trained on the car dataset. Our HyperFlow approach leads to a significant reduction in both training time and memory footprint due to a more compact flow architecture enabled by a hypernetwork framework.

Figure 5: Comparison of training times and GPU memory used by PointFlow and HyperFlow. Our HyperFlow method offers over an order of magnitude decrease in both training time and memory.

5 Conclusions

In this work, we introduce a novel HyperFlow method that uses a hypernetwork to model 3D objects as families of surfaces and, hence, allows building state-of-the-art point cloud reconstructions and mesh-based object representations. To model the distribution of a point cloud, we propose a new Spherical Log-Normal distribution with non-compact support that can be effectively used by a CNF model. Finally, we believe our work is the first approach to train a CNF as a target network, which reduces training cost and opens new research paths for modeling complex 3D structures, such as indoor scenes.

Broader Impact

This research can be beneficial for researchers and engineers working in the space of 3D point clouds and related registration devices, such as LIDARs and depth cameras. As such, the proposed methods can be used in the context of autonomous driving and robotics. Further extensions of this work can be beneficial for people with disabilities, especially those related to sensory disorders, such as shortsightedness or blindness, as 3D capturing devices can effectively extend their way of interacting with and perceiving the external world. On the other hand, robotic automation resulting from this work can potentially put at a disadvantage people whose livelihoods depend on manual execution of jobs that can be substituted with robotics. In case of system failure, the consequences include problems with handling outputs of registration devices, such as LIDARs and depth cameras. Our method does not leverage any biases in the data.

6 Supplementary material

In this supplementary material, we first present the full description of evaluation metrics used in the experiments. We then describe two experiments showing the relationship between Gaussian distribution and Spherical Log-Normal distribution proposed in our work. Finally, we show an extended set of visualizations obtained by HyperFlow.

6.1 Description of evaluation metrics

Following the methodology for evaluating generative fidelity and diversification among samples proposed in achlioptas2017learning and yang2019pointflow , we use the following evaluation metrics: Jensen-Shannon Divergence, Coverage, Minimum Matching Distance and 1-nearest Neighbor Accuracy.

Jensen-Shannon Divergence (JSD): a measure of the distance between two empirical distributions $P$ and $Q$, defined as:

$$ \mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2} D_{KL}(P \,\|\, M) + \tfrac{1}{2} D_{KL}(Q \,\|\, M), \qquad M = \tfrac{1}{2}(P + Q) $$

Coverage (COV): a measure of generative capabilities in terms of the richness of samples generated by the model. For two point cloud sets $S_1, S_2$, coverage is defined as the fraction of point clouds in $S_2$ that are, in the given metric, the nearest neighbor to some point cloud in $S_1$.

Minimum Matching Distance (MMD): since COV only takes the closest point clouds into account and does not depend on the distance between the matchings, an additional metric was introduced. For point cloud sets $S_1, S_2$, MMD is a measure of similarity between point clouds in $S_1$ and those in $S_2$.
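Coverage and Minimum Matching Distance can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the distance between point clouds is a simple symmetric Chamfer distance (one of several variants in use), generated clouds index the rows of the distance matrix and reference clouds its columns, and all function names are hypothetical rather than taken from the evaluation code used in the paper.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between two point clouds of shape (n, 3), (m, 3)."""
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def pairwise(gen, ref):
    """Distance matrix: rows index generated clouds, columns reference clouds."""
    return np.array([[chamfer(g, r) for r in ref] for g in gen])

def coverage(dist):
    """Fraction of reference clouds that are the nearest neighbour of some generated cloud."""
    return len(np.unique(dist.argmin(axis=1))) / dist.shape[1]

def mmd(dist):
    """Mean, over reference clouds, of the distance to the closest generated cloud."""
    return dist.min(axis=0).mean()

rng = np.random.default_rng(0)
ref = [rng.normal(size=(64, 3)) + 5.0 * i for i in range(3)]   # three distinct shapes
gen = [ref[0] + 0.01, ref[0] - 0.01]                           # generator covers one shape
dist = pairwise(gen, ref)
cov_value = coverage(dist)
mmd_value = mmd(dist)
```

In this toy setup the generator reproduces only one of the three reference shapes, so coverage is one third, while MMD is dominated by the two reference shapes that have no close generated match.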

We examine the generative capabilities of our HyperFlow model with respect to the existing reference approaches. We strictly follow the methodology presented in yang2019pointflow . We train each model using point clouds from one of the three categories in the ShapeNet dataset: airplane, chair, and car.

6.2 Scheduling parameters of Spherical Log-Normal

In our model we use the Spherical Log-Normal density with parameters $\mu$ and $\sigma$. Training with a small $\sigma$ might be unstable, since the density then has light tails, see Fig. 2 (in the main paper). At the beginning of training, the likelihood at some points might be close to zero, which makes the log-likelihood cost function numerically unstable.

Therefore, in the training procedure we start with a large $\sigma$ and reduce it to its final value. We use linear scheduling: given a starting value $\sigma_{\mathrm{start}}$, a final value $\sigma_{\mathrm{end}}$ and $n$ epochs, we reduce the parameter by $(\sigma_{\mathrm{start}} - \sigma_{\mathrm{end}})/n$ in each epoch.

Our model is approximately 10 times faster than PointFlow (see the experimental section in the main paper) and can be easily trained on the proposed density. The PointFlow architecture is larger, and it is difficult to train such a model on our distribution from scratch. This process can be accelerated by using a model pre-trained on the classical Gaussian distribution. In such a case we can start from a Spherical Log-Normal distribution with parameters $\mu$ and $\sigma$ which approximate the Gaussian distribution (see Theorem 6.1). In Fig. 6 we present a comparison between samples from a Gaussian distribution and a Spherical Log-Normal distribution with such parameters. Thanks to this solution, we can take a model already trained on a Gaussian distribution and train it further with our strategy.
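Linear scheduling of the scale parameter can be written in a few lines. The helper below is a hypothetical sketch, and the start value, end value and number of epochs in the example are illustrative choices, not the settings used in the experiments.

```python
def linear_sigma_schedule(sigma_start, sigma_end, n_epochs):
    """Return the sigma used at epochs 0..n_epochs, decreasing by a constant step."""
    step = (sigma_start - sigma_end) / n_epochs
    return [sigma_start - step * epoch for epoch in range(n_epochs + 1)]

# Illustrative values only: start wide, end narrow.
schedule = linear_sigma_schedule(sigma_start=1.0, sigma_end=0.1, n_epochs=9)
```

Starting wide keeps the log-likelihood finite for points far from the sphere early in training, and the narrow final value concentrates the prior near the surface.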

Figure 6: Comparison between samples from a Gaussian distribution (left) and Spherical Log-Normal distribution (right) which approximates normal distribution (with parameters from Theorem 6.1).
Figure 7: Object meshes generated for different radii. For the Gaussian prior, the quality and size of the mesh depend heavily on the radius, while for Spherical Log-Normal the quality and size remain stable across radii.
Figure 8: In the first row, we present a car from the dataset which contains elements inside the object. In the second row, we present reconstructions of the object. In the third row, we show meshes generated for radii containing different percentages of the density. As discussed in Section 6.3, a sphere of larger radius yields a better mesh in this case.
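Theorem 6.1.

The classical Gaussian distribution $N(0, I)$ in $\mathbb{R}^3$ can be approximated by the Spherical Log-Normal distribution (with log-normal distribution of the radius) with parameters:

$$\mu = \frac{1}{2}\ln\frac{64}{3\pi^2} \approx 0.385, \qquad \sigma = \sqrt{\ln\frac{3\pi}{8}} \approx 0.405.$$

Proof.

Observe that both the Gaussian and the Spherical Log-Normal distributions are spherical. This means that to compare them it is enough to consider the distributions of the radius. In the case of the Gaussian in $\mathbb{R}^3$, the radius follows the $\chi$ distribution with 3 degrees of freedom, which has mean and variance given by

$$m = 2\sqrt{\frac{2}{\pi}}, \qquad v = 3 - \frac{8}{\pi}.$$

On the other hand, the Log-Normal (LN) distribution with parameters $\mu$ and $\sigma$ has mean and variance given by

$$m = \exp\left(\mu + \frac{\sigma^2}{2}\right), \qquad v = \left(\exp(\sigma^2) - 1\right)\exp\left(2\mu + \sigma^2\right).$$

Equating the two means and the two variances and solving the resulting system of equations for $\mu$ and $\sigma$ gives $\sigma^2 = \ln\left(1 + v/m^2\right) = \ln\frac{3\pi}{8}$ and $\mu = \ln m - \frac{\sigma^2}{2} = \frac{1}{2}\ln\frac{64}{3\pi^2}$.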
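The moment matching described in the proof can be checked numerically. The sketch below assumes a standard Gaussian $N(0, I)$ in three dimensions, whose radius follows a chi distribution with 3 degrees of freedom, and uses the standard formulas for the mean and variance of a log-normal distribution; the function name is illustrative.

```python
import math

def match_log_normal_to_gaussian_radius():
    """Match the mean and variance of a log-normal to the chi(3) radius distribution."""
    # Mean and variance of the radius of a standard Gaussian in 3D (chi, 3 dof).
    mean = 2.0 * math.sqrt(2.0 / math.pi)
    var = 3.0 - 8.0 / math.pi
    # Invert mean = exp(mu + sigma^2 / 2) and var = (exp(sigma^2) - 1) * mean^2.
    sigma_sq = math.log(1.0 + var / mean ** 2)
    mu = math.log(mean) - 0.5 * sigma_sq
    return mu, math.sqrt(sigma_sq)

mu, sigma = match_log_normal_to_gaussian_radius()
```

Under these assumptions the matched parameters come out at roughly 0.385 for the location parameter and 0.405 for the scale parameter.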

6.3 Families of surfaces

In this section we describe in more detail how HyperFlow approximates objects by families of surfaces. Recall that Fig. 3 of the main paper compares how the prior density is modified for the model with a Gaussian prior and the model with a Spherical Log-Normal prior. For the Gaussian distribution, the tails are transformed into object details, such as wing tips and the rear aileron of an airplane. Therefore, we cannot claim that the peak density models the surfaces of the object while the tails model the registration noise, as is the case for our Spherical Log-Normal distribution. For Spherical Log-Normal, the distribution tails are spread along object surfaces, modeling registration noise. This allows us to produce the final mesh through the triangulation trick, effectively denoising the 3D mesh-based object representation and yielding high-quality results.

In HyperFlow we use a triangulation of the unit sphere. This is motivated by the fact that points on surfaces are affected by symmetric (Gaussian) noise. Nevertheless, we can also triangulate spheres of different radii, corresponding to different percentages of the density. To compare the models, for both of them we can draw the images of spheres that contain the same percentage of the data. In this way we obtain families of surfaces. In Fig. 7 we present meshes obtained for different radii, each containing a different percentage of the density. Spherical Log-Normal stabilizes the triangulation, while relatively high fluctuations can be observed for the model with a normal prior.
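The triangulation trick can be sketched as follows: triangulate a sphere of a chosen radius, push only the vertices through the trained flow, and reuse the face indices unchanged. In the sketch, `flow` is a hypothetical stand-in for the trained CNF target network, and an octahedron serves as the coarsest possible sphere triangulation; neither is taken from the paper, where a much finer triangulation would be needed.

```python
def octahedron(radius=1.0):
    """Coarse triangulation of a sphere: 6 vertices and 8 triangular faces."""
    r = radius
    vertices = [(r, 0, 0), (-r, 0, 0), (0, r, 0), (0, -r, 0), (0, 0, r), (0, 0, -r)]
    faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
             (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
    return vertices, faces

def triangulation_trick(flow, radius=1.0):
    """Map the vertices of a triangulated sphere through `flow`, keeping the faces."""
    vertices, faces = octahedron(radius)
    return [flow(v) for v in vertices], faces

def toy_flow(point):
    """Hypothetical placeholder for the trained flow: an anisotropic scaling."""
    x, y, z = point
    return (2.0 * x, y, 0.5 * z)

mesh_vertices, mesh_faces = triangulation_trick(toy_flow, radius=1.0)
```

Changing `radius` yields a different member of the family of surfaces while the connectivity of the mesh stays the same.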

Usually, it is enough to use a triangulation of the unit sphere, but in some cases we can obtain better meshes by changing the radius of the sphere. For instance, some elements of ShapeNet contain more than just the surfaces of objects: some cars include additional elements such as a steering wheel, see Fig. 8. In such cases, we can use the triangulation trick with a sphere of larger radius to obtain a better mesh representation.
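The radius of the sphere that contains a given percentage of the prior density can be computed in closed form if we assume, as in the proof of Theorem 6.1, that the radius of the Spherical Log-Normal distribution is log-normally distributed. The sketch below makes that assumption; the function name and the parameter values are illustrative placeholders.

```python
import math
from statistics import NormalDist

def radius_for_density_fraction(p, mu, sigma):
    """Radius of the centered sphere containing a fraction p of the prior mass.

    Assumes a log-normally distributed radius, so the answer is the
    p-quantile of LogNormal(mu, sigma).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

# Placeholder parameters, for illustration only.
radii = [radius_for_density_fraction(p, mu=0.0, sigma=0.2) for p in (0.25, 0.5, 0.75)]
```

Under this assumption, and with the location parameter set to zero, the unit sphere contains exactly half of the prior mass, and larger percentages correspond to larger radii.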

6.4 Visualization of mesh representation obtained by HyperFlow

Below we present:

  • Fig. 9: Mesh representations generated by our HyperFlow method - extended version of the meshes presented in the main paper.

  • Fig. 10 and Fig. 11: Visualizations of how the triangulation on the sphere is transformed into a mesh of an object.

  • Fig. 12: Visualizations of how the samples from our Spherical Log-Normal prior are transformed into points on objects.

Overall, our HyperFlow method offers stable and high-quality object meshes at a significantly lower computational cost than the competing point cloud generative models.

Figure 9: Mesh representations generated by our HyperFlow method. Contrary to the existing methods that return point cloud representations sparsely distributed in 3D space, our approach creates a continuous 3D object representation in the form of high-quality meshes.
Figure 10: We present how the triangulation on the sphere is transformed into a mesh of an object. As we can see, thanks to the triangulation trick we obtain a high-quality mesh. Since we use a CNF as a target network, we can visualize a continuous transformation between a uniform sphere and the surfaces of objects.
Figure 11: We show how the triangulation on the sphere is transformed into a mesh of an object. Thanks to the so-called triangulation trick, we obtain high-quality object meshes. Since we use a CNF as a target network, we can visualize a continuous transformation between a uniform sphere and the surfaces of objects.
Figure 12: We visualize how the samples from the prior (Spherical Log-Normal) are transformed into points on object surfaces. The transformation produces a point representation of a similar quality to PointFlow.

References