 # Direct Fitting of Gaussian Mixture Models

When fitting Gaussian Mixture Models to 3D geometry, the model is typically fit to point clouds, even when the shapes were obtained as 3D meshes. Here we present a formulation for fitting Gaussian Mixture Models (GMMs) directly to geometric objects, using the triangles of a triangular mesh instead of using points sampled from its surface. We demonstrate that this modification enables fitting higher-quality GMMs under a wider range of initialization conditions. Additionally, models obtained from this fitting method are shown to produce an improvement in 3D registration for both meshes and RGB-D frames.


## I Introduction

In robotics and computer vision, there exist many forms of representation for 3D geometric data. For example, some researchers use unordered point sets [mckaybesl], others require points with surface normals [SGP:SGP06:061-070] or dense volumetric representations such as signed distance fields [KinectFusion]. The variation in forms of representation is related to the wide variety of sources and uses for this data, from raw depth sensor measurements to Computer-Aided Design (CAD) models.

Many researchers have found that Gaussian Mixture Models provide a powerful representation, especially for use in registration between unknown poses [Jian2011, Eckart2015, Eckart2016, Eckart2018, wenniegmm]. Producing a Gaussian Mixture Model requires only unstructured point sets from the underlying geometric data, which makes them widely applicable. Our contribution in this work is to demonstrate how Gaussian Mixture Models can be constructed, evaluated on, and fit directly to geometric primitives, such as the triangles of a polygon mesh. This is done by incorporating the structural information from each primitive into the algorithm; for a visual example, see Figure 1.

Our contributions include a mathematical framework for how geometric primitives can be incorporated with probability distributions (Section II-B). We demonstrate how to obtain the structural properties for a triangular mesh (Section II) and how they can be generalized to other primitives (Section V-A). Incorporating structural information allows us to build Gaussian Mixture Models that not only converge faster and under more conditions (Section IV) but also provide a representation that produces higher-quality registration results when the models are used in practice (Section VI).

The code for all methods and experiments in this paper is available at https://github.com/leonidk/direct_gmm.

Fig. 1: Visual example of the Stanford Bunny, highlighting 8 triangles on the head. Each triangle is shown with its covariance (plotted to 1.5σ). We demonstrate how Gaussian Mixture Models can use the covariance information from given geometric structures (in this case, triangles) to fit models more efficiently, robustly and with higher fidelity.

## II Method

### II-A Gaussian Mixture Models

The Gaussian Mixture Model (GMM) is a well studied probability distribution. It is possible to fit these models to empirical data via Maximum Likelihood Estimation (MLE) [Hasselblad1966, Dempster1977]. The likelihood of any point under a Gaussian is given by

$$\mathcal{N}(x;\mu,\Sigma) = (2\pi)^{-k/2}\det(\Sigma)^{-\frac{1}{2}}\,e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)} \tag{1}$$

where $\mu$ is the mean and $\Sigma$ is the positive-definite covariance matrix. The log-likelihood of a given point is

$$\log\mathcal{N}(x;\mu,\Sigma) = -\tfrac{k}{2}\log(2\pi) - \tfrac{1}{2}\log(\det(\Sigma)) - \tfrac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu) \tag{2}$$

In a Gaussian Mixture Model with $K$ components, where the $\pi_i$ are mixing weights subject to $\pi_i \geq 0$ and $\sum_{i=1}^{K}\pi_i = 1$, the probability of a point is given by

$$g(x) = \sum_{i=1}^{K}\pi_i\,\mathcal{N}_i(x;\mu_i,\Sigma_i)$$
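As a concrete sketch of evaluating this mixture density $g(x)$, here is a minimal NumPy/SciPy example; the helper name and toy parameters are ours, not from the paper:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(x, weights, means, covs):
    """Per-point log g(x) under a GMM; x is (N, k)."""
    # (K, N) matrix of log(pi_i) + log N_i(x)
    log_probs = np.stack([
        np.log(w) + multivariate_normal.logpdf(x, mean=m, cov=c)
        for w, m, c in zip(weights, means, covs)
    ])
    # log-sum-exp over components for numerical stability
    return logsumexp(log_probs, axis=0)

# Toy example: two equally weighted 3D components
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
covs = np.array([np.eye(3) * 0.1, np.eye(3) * 0.1])
pts = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
ll = gmm_log_likelihood(pts, weights, means, covs)  # higher at a mode
```

Working in log space with a log-sum-exp avoids underflow when densities are small, which matters once mixtures have hundreds of components.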

### II-B Approximating a GMM over a triangular mesh

To approximate a Gaussian Mixture with $K$ components over a mesh of $M$ triangles with areas $\alpha_j$ and centroids $\mu_j$, let us first consider sampling $S$ points independently and uniformly from each triangle over the area of the mesh. Then our total likelihood of these samples is the product over all triangles, over the number of samples on each triangle:

$$L = \prod_{j=1}^{M}\prod_{k=1}^{S}\sum_{i=1}^{K}\pi_i\,\mathcal{N}_i(x_{jk};\mu_i,\Sigma_i) \tag{3}$$

We can analyze equation 3 in the limit of infinite samples ($S \to \infty$) with a product integral [feynman1951operator, guenther1983product, BASHIROV200836] over the bounds of each triangle ($\triangle$). Note that we take the geometric mean over all points in the triangle to account for the variable number of points sampled; otherwise, the result would trend to $0$ (in areas where $g(x) < 1$) or $\infty$ (in areas where $g(x) > 1$).

$$\begin{aligned}
L &= \prod_{j=1}^{M}\lim_{S\to\infty}\left[\prod_{k=1}^{S}\left(\sum_{i=1}^{K}\pi_i\,\mathcal{N}_i(x_{jk};\mu_i,\Sigma_i)\right)^{\frac{1}{S}}\right]\\
&= \prod_{j=1}^{M}\lim_{S\to\infty}\left[\exp\left(\log\prod_{k=1}^{S}\left(\sum_{i=1}^{K}\pi_i\,\mathcal{N}_i(x_{jk};\mu_i,\Sigma_i)\right)^{\frac{1}{S}}\right)\right]\\
&= \prod_{j=1}^{M}\lim_{S\to\infty}\left[\exp\left(\sum_{k=1}^{S}\frac{1}{S}\log\left(\sum_{i=1}^{K}\pi_i\,\mathcal{N}_i(x_{jk};\mu_i,\Sigma_i)\right)\right)\right]\\
&= \prod_{j=1}^{M}\exp\left(\int_{\triangle}\log\left(\sum_{i=1}^{K}\pi_i\,\mathcal{N}_i(x;\mu_i,\Sigma_i)\right)dx\right)
\end{aligned} \tag{4}$$

We note that analyzing the likelihood as a geometric mean over an infinite number of point samples has one very nice property: it is invariant to resampling. That is, the addition of a single new primitive is equivalent to adding extra surface to integrate; new primitives provide new independent measurements, which is equivalent to creating a new area to integrate. Whether a surface is represented as one large primitive with sophisticated integral bounds, or as many small disjoint pieces of surface with their own bounds, this formulation gives the same answer.

Following the use of Jensen's inequality, we get the following lower bound on the likelihood:

$$\begin{aligned}
L &= \exp\left(\sum_{j=1}^{M}\int_{\triangle}\log\left(\sum_{i=1}^{K}\pi_i\,\mathcal{N}_i(x;\mu_i,\Sigma_i)\right)dx\right)\\
\log(L) &= \sum_{j=1}^{M}\int_{\triangle}\log\left(\sum_{i=1}^{K}\pi_i\,\mathcal{N}_i(x;\mu_i,\Sigma_i)\right)dx\\
&\geq \sum_{j=1}^{M}\sum_{i=1}^{K}\int_{\triangle}\log\left(\pi_i\,\mathcal{N}_i(x;\mu_i,\Sigma_i)\right)dx
\end{aligned} \tag{5}$$

We also consider a simplified model, where each triangle is sampled only at its center of mass ($\mu_j$) and has weight corresponding to its area ($\alpha_j$). As combining probabilities is done with multiplication, we use a weighted geometric mean over all points, obtaining the following approximation

$$L \approx L_S = \prod_{j=1}^{M}\left(\sum_{i=1}^{K}\pi_i\,\mathcal{N}_i(\mu_j;\mu_i,\Sigma_i)\right)^{\frac{\alpha_j}{\sum_k \alpha_k}} \tag{6}$$
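The quantities in this approximation are straightforward to compute from mesh data. Below is a minimal sketch (function names and the toy mesh are ours) of the per-triangle centroids $\mu_j$, areas $\alpha_j$, and the area-weighted log-likelihood of eq. 6:

```python
import numpy as np
from scipy.stats import multivariate_normal

def triangle_centroids_areas(vertices, faces):
    """Per-triangle centroids and areas; vertices (V,3), faces (M,3) ints."""
    A, B, C = (vertices[faces[:, i]] for i in range(3))
    centroids = (A + B + C) / 3.0
    areas = 0.5 * np.linalg.norm(np.cross(B - A, C - A), axis=1)
    return centroids, areas

def approx_log_likelihood(centroids, areas, weights, means, covs):
    """Eq. 6: area-weighted geometric mean of the mixture density at centroids."""
    dens = sum(w * multivariate_normal.pdf(centroids, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))
    return np.sum(areas * np.log(dens)) / np.sum(areas)

# Toy mesh: a single unit right triangle in the z=0 plane
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
faces = np.array([[0, 1, 2]])
cen, ar = triangle_centroids_areas(verts, faces)
ll = approx_log_likelihood(cen, ar, np.array([1.0]),
                           np.array([[0., 0., 0.]]), np.array([np.eye(3)]))
```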

## III Modifying EM maximization to account for triangles

In this section we derive the expressions for fitting a GMM to a triangular mesh. We will represent each GMM component with parameters $(\pi_i, \mu_i, \Sigma_i)$ and each triangle with vertices $A_j, B_j, C_j$, centroid $\mu_j$ and area $\alpha_j$. Traditional EM optimization is possible by analyzing the lower bound of the log-likelihood. Following standard formulations (see [Dempster1977, jordan2009, sridharan2014gaussian]), we add mixture sampling probabilities $\eta_{ij}$ and move the logarithm inside the summation to obtain, by using Jensen's inequality, a valid lower bound to maximize.

To perform fitting, we need an M-step which obtains $(\pi_i, \mu_i, \Sigma_i)$ by maximizing the lower bound

$$LB = \sum_{j=1}^{M}\sum_{i=1}^{K}\eta_{ij}\log\left(\pi_i\,\mathcal{N}_i(x_j;\mu_i,\Sigma_i)\right) \tag{7}$$

To maximize this expression, we can take derivatives with respect to the variables of interest and obtain

$$\begin{aligned}
\frac{\partial LB}{\partial \mu_i} &= \frac{1}{2}\sum_{j=1}^{M}\Sigma_i^{-1}(x_j-\mu_i)\,\eta_{ij}\\
\frac{\partial LB}{\partial \Sigma_i^{-1}} &= \frac{1}{2}\sum_{j=1}^{M}\eta_{ij}\left(\Sigma_i-(x_j-\mu_i)(x_j-\mu_i)^T\right)
\end{aligned} \tag{8}$$

Now we can integrate these expressions over the two-dimensional surface of a triangle via a change-of-variables substitution, then set the result equal to zero and solve, thus obtaining the update equations. For clarity we will also define a weight variable and its corresponding normalization constant

$$w_{ij} = \eta_{ij}\,\alpha_j \qquad W_i = \sum_{j=1}^{M} w_{ij}$$

The resulting lower-bound likelihood can be written as

$$\log(L) \geq \frac{1}{2}\sum_{j=1}^{M}\sum_{i=1}^{K} w_{ij}\Big[2\log(\pi_i) - k\log(2\pi) - \log(\det(\Sigma_i)) - (\mu_j-\mu_i)^T\Sigma_i^{-1}(\mu_j-\mu_i) - \tfrac{1}{12}\big(A_j^T\Sigma_i^{-1}A_j + B_j^T\Sigma_i^{-1}B_j + C_j^T\Sigma_i^{-1}C_j - 3\mu_j^T\Sigma_i^{-1}\mu_j\big)\Big] \tag{9}$$

The new mean is obtained as a simple weighted mean of centroids. This update equation is identical to the one derived for the approximation in equation 6:

$$\mu_i = \frac{1}{W_i}\sum_{j=1}^{M} w_{ij}\,\mu_j$$

The same technique will provide the update equation for the covariance:

$$\Sigma_i = \frac{1}{W_i}\sum_{j=1}^{M} w_{ij}\Big[\underbrace{(\mu_j-\mu_i)(\mu_j-\mu_i)^T}_{\mathrm{cov}(\mu_j,\mu_i)} + \underbrace{\tfrac{1}{12}\big(A_jA_j^T + B_jB_j^T + C_jC_j^T - 3\mu_j\mu_j^T\big)}_{\mathrm{cov}(\triangle_j)}\Big]$$

The final update is surprisingly simple: it is the area-weighted average of the covariance obtained by using centroids as point measurements, plus the covariance of each triangle. That is, at every iteration, each mixture component is updated with some fraction of the structure of the triangles associated with it. A visual example of ellipsoids showing triangle covariance structures is shown in Fig. 1. Our derived expression for the covariance of a triangle is expressed in terms of vertices, but is consistent with the standard formulation in CGAL [gupta:inria-00327027].¹

¹ The reference document has a typographic error in the moment matrix for triangles, which should be a 5× multiple of the one for 3D tetrahedra. The CGAL source code correctly implements this matrix in practice.

The update equation for eq. 6 is similar, simply lacking the $\mathrm{cov}(\triangle_j)$ term.
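Since $\mathrm{cov}(\triangle_j)$ is just the covariance of a uniform distribution over a triangle, the vertex-based expression can be checked numerically against Monte Carlo samples; a minimal sketch (helper name is ours):

```python
import numpy as np

def triangle_covariance(A, B, C):
    """Covariance of the uniform distribution over triangle ABC (vertex form)."""
    mu = (A + B + C) / 3.0
    second = np.outer(A, A) + np.outer(B, B) + np.outer(C, C)
    return (second - 3.0 * np.outer(mu, mu)) / 12.0

# Monte Carlo sanity check on one triangle
A, B, C = np.array([0., 0., 0.]), np.array([1., 0., 0.]), np.array([0., 1., 0.])
rng = np.random.default_rng(0)
uv = rng.random((200_000, 2))
flip = uv.sum(axis=1) > 1.0
uv[flip] = 1.0 - uv[flip]           # fold square samples into the triangle
pts = A + uv[:, :1] * (B - A) + uv[:, 1:] * (C - A)
emp = np.cov(pts.T, bias=True)      # empirical covariance
```

For this unit right triangle the analytic covariance has diagonal entries $1/18$ and off-diagonal $-1/36$, which the empirical estimate reproduces to Monte Carlo accuracy.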

### III-A Evaluating the derived loss function

To evaluate the validity of the expression in equation 9, we compare its fitting fidelity numerically against a large number of sampled points. The results are shown in Figure 2. We can see that the lower-bound expression for triangles is equal to that obtained numerically from a large number of points. Since the lower-bound expression is all that is needed in EM optimization [Dempster1977], this agreement suggests we can use it in fitting real data.

Fig. 2: Comparison of fitting metrics. After fitting a 50-mixture GMM to 100,000 points randomly sampled from the res4 Stanford bunny, different log-likelihood expressions are compared. Triangle refers to equation 9, Centroid refers to using only the triangle centroids, and Area × Centroid refers to using our approximation in eq. 6. The results are on the res4 variant of the Stanford bunny, which has 948 faces and 453 vertices. The remaining bars show results using different numbers of points sampled from the mesh surface. The y-axis shows the sum of individual Gaussian component log-likelihoods, equivalent to the lower bound obtained from Jensen's inequality. The horizontal line shows the result of using all the points, our best approximation of the correct answer.

## IV Results

We performed experiments fitting Gaussian Mixture Models to triangular meshes. We swept a wide range of mixture components (from 6 to 400) and evaluated two different initialization schemes. The first initialization performs k-means++ [arthur2007k] clustering and uses those clusters as initial assignments for the EM method. The second method uses simple random assignments for initialization. We run 25 iterations of EM for all methods, with a tight tolerance to prevent an early exit from the optimization.

To evaluate the converged model, we use a densely sampled point cloud of the initial mesh (Figure 4(e)) and report its likelihood according to equation 1. Since all of our experiments tend to operate on 1,000 points or triangles, the use of 50,000 points for evaluating the model should provide a good test of GMM fidelity. In all of these cases, we focus on the Stanford Bunny. Visual examples of our input and output data are shown in Figure 5.

### IV-A Mesh Input Data

Fig. 3: GMMs fit to different data-types of the low-res Stanford Bunny. The graphs show fitting fidelity of the converged model. The dashed lines use triangle likelihood estimates, while the solid lines use traditional point loss. Exact refers to the M-step derived in Section III, while Approx refers to only using eq. 6. The results are on the res4 variant of the Stanford bunny, which has 948 faces and 453 vertices. Evaluation is performed by evaluating the likelihood of 50,000 points sampled from the res4 Stanford bunny.

The first set of experiments, shown in Figure 3, compares fitting GMM models to different input formats of the res4 Stanford Bunny. We compare our exact (eq. 9) and approximate (eq. 6) mesh loss equations against fitting a traditional point loss to the triangle centroids and the mesh vertices. The best results came from our exact mesh expression, with the second best being its approximation. The proposed methods performed well under both random and k-means initialization. On the other hand, the point-based methods often had a preferred initialization. A qualitative look at the resulting models shows that the mesh GMM (Fig. 4(h)) produces a fuller model of the Stanford Bunny than the center-of-mass GMM (Fig. 4(d)), even when the evaluated likelihood was numerically very similar.

### IV-B Mesh Decimation

While the above experiments used the low-resolution res4 Stanford Bunny, we also evaluated which GMM fitting strategy works best when subsampling high-resolution mesh data. In our case, we try two methods of point-sampling (Poisson and Random) and two methods of triangle decimation (Quadric and Clustering). Our experimental procedure involves fitting a GMM to a low-resolution mesh (1,000 faces) or point cloud (1,000 points) and evaluating the likelihood of the resulting GMM against a dense point sampling of the original shape (50,000 points). The different sampling strategies can be seen visually in Figure 5.

To generate a low-resolution mesh, we try two methods: clustering decimation [Rossignac1993], which is fast, and quadric error decimation [Garland1997], which is more accurate. To generate a low-resolution point cloud, we both randomly sample points on the surface of the mesh and use Poisson Disc sampling to ensure uniform samples [Cignoni2012]. As before, the former is faster while the latter produces better results. Poisson Disc sampling is also used to generate the high-resolution, "ground truth" point cloud used for evaluation.

The results of these experiments are reported in Figure 4. As before, the mesh-based fitting strategies are largely invariant to initialization method, while the point-based strategies often prefer k-means initialization (exceptions discussed in Sec. IV-C). The best results often came from the use of Poisson Disc sampling (with k-means), which ensures uniform coverage of the surface area. In contrast, using random samples generated the lowest-quality results. The mesh-based techniques proved to be reliable across all tests, regardless of mixture number and initialization.

Fig. 4: GMMs fit to different input formats of the Stanford Bunny. The graphs show fitting fidelity of the converged Gaussian Mixture Model. The dashed lines use the update equations of Section III, while the solid lines use traditional point loss. The results are on the Stanford bunny, which has been simplified to ≈1000 triangles or points respectively, each with two different methods: random or Poisson disc sampling [Cignoni2012] for points, and either clustering [Rossignac1993] or quadric-error decimation [Garland1997] for meshes. See Fig. 5 for visual examples of these formats. Evaluation is performed by evaluating the likelihood of a high-resolution point cloud sampled from the original Stanford bunny.

### IV-C Discussion

All of these experiments were run using the GaussianMixture code base from scikit-learn v0.20.0  [scikit-learn]. We modified the code to support additional weight and covariance terms (Sec. V-A), which were general enough to allow us to implement all proposed methods.

One surprising result in Figures 3 and 4 was that randomly sampled points performed better with random initialization than with k-means initialization (at high mixture numbers). As the EM algorithm only finds a local minimum, this suggests that k-means may not always be an ideal initialization technique. We believe that this occurred due to either a bad local minimum from initialization, or a fairly flat cost during optimization leading to an early exit condition being triggered (despite our tight tolerance). This behavior was never observed when using our proposed exact mesh formulation.

We used 25 iterations for all of these experiments. When 100 iterations were used, the point-based methods performed better (nearly as well as our proposed method). However, our method often converged in far fewer iterations, so we picked a lower iteration number for consistency in runtime.

## V Extensions

### V-A Generalization to other primitives

While equation 9 and the update equations above were derived specifically for triangles, they can be written more generally for any primitive (triangles, Gaussian mixtures, cuboids, etc.), using $\mu_p$, $\Sigma_p$, and $\alpha_p$ to denote the primitive's mean, covariance, and size (area or volume), respectively. Then the loss, mean update, and covariance update equations for a Gaussian Mixture can be written as equations 10, 11, and 12. The previous equations can be seen as a special case of these formulas, which provide an M-step update for any set of geometric primitives in fitting a Gaussian Mixture Model.

$$\log(L) \geq \frac{1}{2}\sum_{p=1}^{P}\sum_{i=1}^{K} w_{ip}\Big[2\log(\pi_i) - k\log(2\pi) - \log(\det(\Sigma_i)) - (\mu_p-\mu_i)^T\Sigma_i^{-1}(\mu_p-\mu_i) - \mathrm{tr}\big(\Sigma_i^{-1}\Sigma_p\big)\Big] \tag{10}$$

$$\mu_i = \frac{1}{W_i}\sum_{p=1}^{P} w_{ip}\,\mu_p \tag{11}$$

$$\Sigma_i = \frac{1}{W_i}\sum_{p=1}^{P} w_{ip}\left[(\mu_p-\mu_i)(\mu_p-\mu_i)^T + \Sigma_p\right] \tag{12}$$
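Under these general update equations, the M-step is a few lines of NumPy. The following sketch is our own rendering of eqs. 11 and 12 (names and array shapes are our conventions, not from the paper's code):

```python
import numpy as np

def m_step(resp, prim_means, prim_covs, prim_sizes):
    """Generalized M-step (eqs. 11-12).

    resp: (P, K) responsibilities; prim_means: (P, k) primitive means;
    prim_covs: (P, k, k) primitive covariances; prim_sizes: (P,) areas/volumes.
    """
    w = resp * prim_sizes[:, None]            # w_ip = eta_ip * alpha_p
    W = w.sum(axis=0)                         # normalization W_i
    pi = W / W.sum()                          # mixing weights
    means = (w.T @ prim_means) / W[:, None]   # eq. 11
    K, k = means.shape
    covs = np.empty((K, k, k))
    for i in range(K):
        d = prim_means - means[i]             # centroid deviations
        covs[i] = (np.einsum('p,pa,pb->ab', w[:, i], d, d)
                   + np.einsum('p,pab->ab', w[:, i], prim_covs)) / W[i]
    return pi, means, covs

# Example: three point primitives (zero covariance), one component
pm = np.array([[0., 0.], [2., 0.], [0., 2.]])
pc = np.zeros((3, 2, 2))
sz = np.array([1., 1., 2.])
pi, mu, cov = m_step(np.ones((3, 1)), pm, pc, sz)
```

With zero primitive covariances this reduces to an ordinary size-weighted point GMM M-step, which is the special case the text describes.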

We note that these are exactly the update equations used in fitting hierarchical Gaussian Mixture Models [NIPS1998_1543]. However, while previous work applied these equations to fitting GMMs to existing GMMs, we show how this update can be used for fitting geometric data. This general form allows for easy substitution of known structural information (in the form of a second moment) about any geometric primitive. Computing covariance structures for arbitrary polyhedra is a well-studied area of research [fastCovariance, gupta:inria-00327027].

### V-B Number of Mixtures

In our later experiments, we fix the number of mixture components. Unless otherwise stated, we use K=100. A visual example of this mixture can be seen in Figures 4(d) and 4(h). We picked this value as it matches the experimental conditions recommended for using GMMs for SLAM [wenniegmm]. In practice, there are many ways to select this number, including flatness of the distribution's KL-divergence [shobitGMMNum, dhawale2018fast], flatness of the mixtures themselves [Eckart2018], or evaluating an information criterion [Pelleg00x-means:extending]. We believe that when this technique is used in practice, this number can either be found through cross-validation [wenniegmm] on a registration dataset or by using external system information such as depth sensor noise models [Keselman_2017_CVPR_Workshops].

Fig. 6: Registration results on the Stanford Bunny, following the experimental setup described in [Eckart2018]. The experimental conditions for these tests are described in Section VI. We plot the results of 250 random rigid deformations. We show the actual data, along with a box-and-whisker plot showing the median and its confidence interval; the mean is plotted as a larger dot. mesh and points are the results of maximizing the likelihood of a set of points (N=453) against our fit Gaussian Mixture Models (K=100). The mesh result uses our proposed method, fitting a GMM to the mesh triangles, while points shows the results of fitting a GMM to the mesh vertices (N=453). icp is our implementation of point-to-point ICP [mckaybesl], and cpd [cpd] is from pyCPD [PyCPD]. Model size refers to the length of the diagonal of the model's bounding box, and we report our results in percent (so all reported methods have position error, on average, better than 1.1% of the model size). Some methods have outliers that converged to the wrong local minimum, and hence have a very large mean relative to their distribution.

## VI Applications

The experiments in section IV showed that fitting Gaussian Mixture Models using structural information tends to produce higher quality probability distributions. Some recent work has focused solely on the efficient nature of GMMs in representing shapes [Eckart2016]. Here we show that our improvements in model quality produce an appreciable performance improvement in actual 3D computer vision applications.

Gaussian Mixture Models have found wide use in the 3D registration literature. From the Normal Distributions Transform [5152538, doi:10.1177/0278364912460895], to variants of the L2 loss [pointsetregistration, L2LossNDT3D], and even Coherent Point Drift [cpd], many 3D registration methods utilize Gaussian Mixture Models. Their benefits include robustness to noise, smooth variation over 3D space, speed of evaluation, and straightforward control over model complexity. These models can provide results that are state-of-the-art in both runtime and registration accuracy [Eckart2018]. We show that applying our proposed mesh GMM fitting can produce an improvement in these results.

### VI-A Mesh Registration

We replicate the experimental setup of a recent paper [Eckart2018], demonstrating how Gaussian Mixture Models can be used for efficient 3D registration. As our experimental setup matches [Eckart2018], the registration methods compared in Figure 3 of that work can be directly compared against the results here. Their experiment operates by taking a large number of random deformations of the Stanford Bunny and evaluating the final quality of the fitting result. Our results can be seen in Figure 6.

To perform 3D registration, we first build a GMM for the res4 Stanford Bunny, as in Section IV-A, using 100 iterations of EM with a tight tolerance. We then sample as many 3D points as there are vertices from the surface of the mesh (N=453). While previous work has focused on a Point-to-Distribution (P2D) technique with a polynomial approximation of the likelihood function [Magnusson_2009], we perform straightforward P2D in our experiments. Our registration process consists of finding the rigid-body transformation that maximizes equation 1. We use the identity transformation for initialization and then perform gradient-based optimization to find the local minimum. We compute gradients using numerical differences. For the optimizer, we tried both Conjugate Gradients [cgog, Shewchuk94anintroduction] and BFGS [NoceWrig06], which produced similar results; we report the BFGS results as it ran faster.

To parameterize our rigid-body transformation, we perform the optimization over a translation $t \in \mathbb{R}^3$ and a quaternion $q \in \mathbb{R}^4$. Quaternions are well studied in the context of optimization for rigid-body transformation [Schmidt:2001:UQP:647260.718651, Hartley2009]. As we use numerical differences in our optimizer, we did not utilize the closed-form gradients for quaternions [quaterniongrads]. While many authors prefer the exponential map for optimizing rigid-body transformations [wenniegmm], our experiments using the rotation vector (a rotation angle around a unit vector) produced nearly identical final registration results.
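A minimal sketch of this P2D registration loop, using SciPy's BFGS with its default finite-difference gradients (the function name and toy model are ours; the paper's actual implementation may differ in details):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation
from scipy.stats import multivariate_normal

def p2d_register(points, weights, means, covs):
    """Find the rigid transform (t, R) maximizing the GMM likelihood
    of the transformed points: BFGS with numerical gradients."""
    def neg_log_lik(params):
        t, q = params[:3], params[3:]
        R = Rotation.from_quat(q / np.linalg.norm(q)).as_matrix()
        x = points @ R.T + t
        dens = sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
                   for w, m, c in zip(weights, means, covs))
        return -np.sum(np.log(dens + 1e-300))  # guard against underflow
    # Identity transform: zero translation, unit quaternion (x, y, z, w)
    x0 = np.array([0., 0., 0., 0., 0., 0., 1.])
    res = minimize(neg_log_lik, x0, method='BFGS')
    q = res.x[3:] / np.linalg.norm(res.x[3:])
    return res.x[:3], Rotation.from_quat(q).as_matrix()

# Toy problem: a small cloud offset from a single-component GMM at the origin
pts = np.array([[1., 0., 0.], [1.1, 0., 0.], [0.9, 0., 0.],
                [1., 0.1, 0.], [1., -0.1, 0.], [1., 0., 0.1], [1., 0., -0.1]])
t, R = p2d_register(pts, np.array([1.0]), np.array([[0., 0., 0.]]),
                    np.array([np.eye(3) * 0.05]))
```

Normalizing the quaternion inside the objective keeps every iterate a valid rotation even though the optimizer works in unconstrained coordinates.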

We performed these experiments on our mesh- and point-derived Gaussian Mixture Models, as well as two baselines. We implemented our own point-to-point Iterative Closest Point (ICP) method [mckaybesl] and used an existing implementation of Coherent Point Drift (CPD) [cpd] from pyCPD [PyCPD]. We adjusted pyCPD to run for 150 iterations to approximately match the run-time of our P2D GMM registration. We ran ICP for up to 50,000 iterations, or until the improvement in mean matching error fell below a small threshold. For consistency, all methods used in this paper were implemented in the Python programming language and only used the CPU.

### VI-B Analysis of Mesh Registration

The results in Figure 6 demonstrate that our mesh-derived Gaussian Mixture Model provides improved registration results when using P2D compared to the existing baselines, ICP and CPD. Not only does our method produce better registration on average, but it also demonstrates a better distribution of errors. Specifically, the small difference between the median and mean errors shows that our method is less prone to outliers. On the other hand, the point-based GMM P2D registration results had outliers that dragged the mean towards the worst quartile of results. The randomly initialized point-based GMM had a mean that was about three times its median result, suggesting that some experiments converged to an incorrect local minimum.

### VI-C Other 3D Models

We report results on additional 3D models in Table I. For consistency, we decimated each model to 1000 faces using [Garland1997] and then repeated our previous experiments exactly (except that the registration results are now the average of 25 runs). The likelihood column reports the per-sample average log-likelihood of ground truth, where larger numbers are better. The translation and rotation errors are reported as a percentage of the error obtained by ICP registration. In all our experiments, the mesh-based GMM always outperformed the point-based one, often significantly.

### VI-D Visual Odometry

Our proposed method can also be used for improved models of partial-view observations, such as those seen in simultaneous localization and mapping (SLAM). In this case, the geometric primitives represent not the exact surface, as with our mesh experiments above, but instead an uncertainty region for each 3D measurement.

We performed experiments using a GMM distribution-to-distribution (D2D) registration method for visual odometry [wenniegmm], reproducing an experiment on an RGB-D sequence from the TUM dataset (freiburg3 long office household) [sturm12iros]. We incorporate structural information into the fitting of the Gaussian Mixture Models by adding depth uncertainty information around each 3D point and applying equation 12 during GMM fitting. The results are shown in Figure 7.

For our registration experiments, we first subsampled the depth images to a lower resolution before performing frame-to-frame registration over the 2510-frame sequence. The ICP method used our aforementioned point-to-point ICP over 2,500 points randomly sampled from each point cloud (selected to roughly match the run-time performance of our method). The GMM method fit a GMM to the point cloud using our uncertainty model and performed registration using a D2D metric [wenniegmm]. We used the determinant-free method as it was much faster in our re-implementation. Additionally, our implementation used numerical gradients and BFGS [NoceWrig06] as the optimizer.

In this case, our primitive model was a simple one: a rectangle representing the size of each 3D measurement in the X and Y axes of the camera. This generated a trajectory whose absolute translational error (RMSE) was a small improvement over the APE produced from building GMMs without uncertainty primitives. Additionally, we found that D2D registration time was faster when using GMMs built from primitives.

Fig. 7: Top-down view of trajectories generated using different registration methods for visual odometry. The GMM method uses a per-pixel uncertainty primitive during GMM fitting. The ICP method is point-to-point ICP [mckaybesl]. Runtime for both methods was similar. For details, see Section VI-D. For additional baselines and experiments on this dataset, see [wenniegmm].
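One plausible construction of such a rectangle primitive (our assumed footprint model, not necessarily the paper's exact constants): a pixel at depth $z$ back-projects to a region roughly $z/f_x$ by $z/f_y$ in camera X/Y, and the variance of a uniform distribution of width $w$ is $w^2/12$. The resulting $\Sigma_p$ can then be fed into the covariance update of eq. 12:

```python
import numpy as np

def pixel_uncertainty_cov(z, fx, fy):
    """Rectangle-primitive covariance for a single depth pixel.

    Assumes (our model) the pixel's 3D measurement is uniformly
    distributed over a (z/fx)-by-(z/fy) rectangle in camera X/Y,
    with no modeled uncertainty along the ray (Z).
    """
    wx, wy = z / fx, z / fy
    # Variance of a uniform distribution of width w is w^2 / 12
    return np.diag([wx**2 / 12.0, wy**2 / 12.0, 0.0])

# Example: a pixel at 1 m depth with a 500-pixel focal length
cov = pixel_uncertainty_cov(1.0, 500.0, 500.0)
```

Because the footprint grows linearly with depth, far-away measurements automatically contribute larger covariances to the fitted mixture.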

## VII Conclusion

We have shown how to build Gaussian Mixture Models by incorporating structural information into their Expectation Maximization algorithm. We demonstrate theoretical and empirical equivalence with traditional techniques, along with providing a fast approximation to our proposed method. By using the covariance structure from the triangles of a mesh, we are able to build GMMs more quickly, more robustly, and to higher quality. Additionally, these models provide improved results in 3D registration. For a theoretical understanding of how geometric structures, point samples, and integrals interact, our product-integral derivation provides a model that is invariant to resampling (such as triangles being merged or split while retaining the same overall 3D structure). We believe this paper demonstrates that using structural information can lead to methods that are faster, more robust, and higher in quality.