I Introduction
Would it not be great to have one probabilistic tool to describe arbitrary distributions that arise from simultaneous localization and mapping (SLAM), tracking or even point set registration? We show how to include Gaussian mixture models into the typical nonlinear least squares solvers that are used for these problems today.
State estimation in robotics is based on deterministic models that describe the physical relation between measurements and states. Since perfect measurements do not exist, probabilistic models are additionally required to describe the related uncertainty. For these models, the Gaussian distribution is the first choice, since it allows efficient least squares optimization.
However, sensor information is often corrupted by outliers that do not follow a Gaussian: Vision can blur, odometry can slip and GNSS can be affected by reflections and multipath. Even if the sensor noise itself is Gaussian, ambiguity can lead to multimodal densities for SLAM or point set registration problems. Without better probabilistic models, state estimation is likely to fail in these cases.
One of the most popular models for robust estimation is the Gaussian mixture model (GMM), which is composed of a sum of multiple weighted Gaussian components. In contrast to the similarly common M-estimators, GMMs are preferred when outliers follow a systematic pattern or ambiguity has to be represented. They are flexible enough to represent asymmetry, heavy tails and skewness, but close enough to a single Gaussian to allow sufficient convergence using nonlinear least squares.
There already exist algorithms to include GMMs in nonlinear least squares optimization, like MaxMixture (MM) [15] or the approach proposed in [19], referred to in this work as SumMixture (SM). However, both of them come with individual drawbacks in terms of model accuracy or convergence, as we explain in Section II.
In this paper, we introduce a novel representation for Gaussian mixtures in least squares problems, such as state estimation using factor graphs or point set registration. Based on SumMixture, we derive a hybrid formulation of MaxMixture and SumMixture using robust numerical methods in Section III. It offers the combined advantages of both, without their individual disadvantages. In addition, we provide:

A discussion of how to reduce the impact of nonlinearity on the estimation process in Section III-B.

A comparison against MaxMixture and SumMixture in extensive Monte Carlo experiments, as shown in Figure 1, in Section IV-A.

A novel method to apply the proposed model to point set registration in Section IV-B.

A detailed analysis of the reasons why state-of-the-art approaches fail under certain conditions in Section V.
Furthermore, we provide open-source implementations¹ of our approach – as well as SM and MM – for two of the most common optimization frameworks in robotics: Ceres [3] and GTSAM [8].
¹ Available under www.mytuc.org/mix
II Prior Work
While the field of robust state estimation is remarkably broad, we focus on a brief overview of different robust probabilistic models for nonlinear least squares (NLS) and factor graph optimization. The common goal of all algorithms is to reduce the large impact of non-Gaussian outliers on least squares problems.
M-estimators are one of the standard solutions for robust estimation. They describe a set of loss functions that weight large residuals down. Frameworks like GTSAM or Ceres ship with a variety of M-estimators, like the classical Huber M-estimator [11] or the more recent Dynamic Covariance Scaling (DCS) [2]. Due to their symmetry, they are not suited to describe arbitrary multimodal distributions that arise from ambiguity or effects like reflected GNSS signals. Also, their performance depends strongly on the choice of specific parameters, which often leads to manual tuning. In more recent work like [1] and [7], the parametrization issue was resolved by self-tuning algorithms based on expectation-maximization. Still, they are limited by their symmetry constraint.
Switchable constraints [21] introduced weights for each residual that can be jointly optimized within the least squares problem. In this way, the optimizer can down-weight outliers individually. Although it is formulated differently, the authors of [2] showed that its solution is equivalent to an M-estimator.
Gaussian mixtures are quite flexible models but hard to integrate into nonlinear least squares. Hence, the authors of [18] proposed the idea of preselecting the dominant mode and using it as an approximation of the full GMM. Since the optimization problem was constructed fully Gaussian afterwards, this is just a limited workaround and no reselection is possible. Olson et al. went one step further in [15] and proposed a better approximation – a local maximum of multiple Gaussians. In contrast to the explicit selection in [18], the MaxMixture approach selects the dominant mode locally and implicitly during the optimization. MM is widely adopted in SLAM and factor graph optimization as an efficient and robust model. However, it is still a limited approximation of the true density; in particular, GMMs with overlapping components are not represented well.
With [19], Rosen et al. proposed a very general approach to represent almost arbitrary distributions in NLS problems. This idea was of fundamental importance for later work [17], where we first implemented an exact GMM for NLS. As we show in this publication, the SumMixture model from [17] is fundamentally ill-posed apart from some special cases. Nevertheless, both papers [15] and [19] inspired the novel approach that is presented in this work.
Besides least squares optimization, the authors of [20] applied Gaussian mixtures to solve probabilistic point set registration. They used the full Gaussian mixture, but approximated the product of probability densities by a sum of them, which violates fundamental assumptions of probability theory.
The authors of [10] and [9] introduced novel algorithms to represent ambiguity, including equivalents of Gaussian mixture models. However, they require different kinds of specialized solvers and come with considerable overhead.
We, instead, present a novel Gaussian mixture implementation for common nonlinear least squares solvers that can be applied to a variety of robotic applications. Our approach combines the well-behaved convergence of MaxMixture with the exact representation of SumMixture without adopting their particular weaknesses.
III Gaussian Mixture Models for Least Squares
To introduce our novel Gaussian mixture representation for least squares, we take a look at the Gaussian case before we extend it to mixtures.
The central idea of probabilistic state estimation is to find the most likely set of states $\mathcal{X}$ based on a set of perceived measurements $\mathcal{Z}$. The likelihood maximization is done by minimizing the negative log-likelihood, which leads to the common least squares formulation:

(1)   $\mathcal{X}^* = \underset{\mathcal{X}}{\operatorname{argmax}} \; P\left(\mathcal{X} \mid \mathcal{Z}\right)$

(2)   $\mathcal{X}^* = \underset{\mathcal{X}}{\operatorname{argmax}} \; \prod_i P\left(\mathbf{z}_i \mid \mathcal{X}\right)$

(3)   $\mathcal{X}^* = \underset{\mathcal{X}}{\operatorname{argmin}} \; \sum_i -\ln P\left(\mathbf{z}_i \mid \mathcal{X}\right)$

(4)   $\mathcal{X}^* = \underset{\mathcal{X}}{\operatorname{argmin}} \; \sum_i \left\| \mathbf{r}_i\left(\mathcal{X}, \mathbf{z}_i\right) \right\|^2_{\mathbf{\Sigma}_i}$
Please note that we use two notations for the squared Mahalanobis distance, which are

(5)   $\left\| \mathbf{r} \right\|^2_{\mathbf{\Sigma}} = \mathbf{r}^{\mathsf{T}} \mathbf{\Sigma}^{-1} \mathbf{r} = \left\| \mathbf{\Sigma}^{-1/2} \mathbf{r} \right\|^2$

where $\mathbf{\Sigma}$ is the measurement covariance matrix and $\mathbf{\Sigma}^{-1/2}$ is the square-root information matrix. The residual term $\mathbf{r}_i$ describes a nonlinear difference between a subset of states and measurements $\mathbf{z}_i$. The constant Gaussian log-normalization does not depend on the states $\mathcal{X}$, so it can be omitted. The remaining purely quadratic form is the direct result of assuming Gaussian distributed noise for the residual $\mathbf{r}$. Since a quadratic problem is sensitive to any kind of outlier, the residual term is often wrapped into a loss function $\rho$, which brings us to robust least squares:
(6)   $\mathcal{X}^* = \underset{\mathcal{X}}{\operatorname{argmin}} \; \sum_i \left\| \rho\left(\mathbf{r}_i\left(\mathcal{X}, \mathbf{z}_i\right)\right) \right\|^2$
This new cost function (6) is composed of the residual function $\mathbf{r}$ and the loss function $\rho$. The loss can be derived directly from the negative log-likelihood of the residual to apply different probabilistic models:
(7)   $\left\| \rho\left(\mathbf{r}\right) \right\|^2 = -\ln P\left(\mathbf{r}\right)$
The Mahalanobis distance (5) is a special case of this loss function. Please note that we define $\rho$ inside the square of (6) to allow asymmetric loss functions, which differs slightly from the literature. The main contribution of this paper can be condensed to the answer to a simple question: How can we represent the log-likelihood of a GMM using the loss function $\rho$?
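To make the two notations of (5) and the loss definition of (7) concrete, here is a minimal Python sketch (an illustration only, not part of the paper's implementation) that checks both Mahalanobis forms against each other:

```python
import numpy as np

def mahalanobis_sq(r, cov):
    """Both notations of the squared Mahalanobis distance in Eq. (5)."""
    info = np.linalg.inv(cov)
    info_sqrt = np.linalg.cholesky(info).T   # square-root information matrix
    quad_form = float(r @ info @ r)          # r^T Sigma^-1 r
    whitened = info_sqrt @ r                 # Sigma^(-1/2) r
    return quad_form, float(whitened @ whitened)

def gaussian_loss(r, cov):
    """Loss derived from the negative log-likelihood as in Eq. (7),
    with the constant log-normalization omitted."""
    return np.sqrt(0.5 * mahalanobis_sq(r, cov)[0])

r = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])
a, b = mahalanobis_sq(r, cov)
assert np.isclose(a, b)
```

For a pure Gaussian, the loss is simply the whitened residual, which is why plain least squares needs no explicit $\rho$.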
III-A MaxSumMixture – A Novel Approach
For a better understanding of the underlying mathematical structure, we introduce a compressed notation for the probability density of a Gaussian mixture with $M$ components:

(8)   $P\left(\mathbf{r}\right) = \sum_{m=1}^{M} s_m \cdot \exp\left(-e_m\right)$
The constant scaling $s_m$ contains the weight $w_m$ of a component and its square-root information matrix $\mathbf{\Sigma}_m^{-1/2}$:

(9)   $s_m = w_m \cdot \det\left(\mathbf{\Sigma}_m^{-1/2}\right)$
The exponent $e_m$ is defined as half of the squared Mahalanobis distance between the residual $\mathbf{r}$ and the mean $\boldsymbol{\mu}_m$:

(10)   $e_m = \frac{1}{2} \left\| \mathbf{r} - \boldsymbol{\mu}_m \right\|^2_{\mathbf{\Sigma}_m}$
The negative log-likelihood of (8) is a scaled version of the log-sum-exp function

(11)   $-\ln P\left(\mathbf{r}\right) = -\ln \sum_{m=1}^{M} s_m \cdot \exp\left(-e_m\right)$
with the set of scalings $\{s_m\}$ and exponents $\{e_m\}$. While it is not possible to push the logarithm inside the sum, it is indeed possible to pull one particular exponent $e_k$ outside:

(12)   $-\ln \sum_{m=1}^{M} s_m \cdot \exp\left(-e_m\right) = e_k - \ln \sum_{m=1}^{M} s_m \cdot \exp\left(e_k - e_m\right)$
This is a common approach to formulate a numerically robust log-sum-exp function, but we apply it algebraically to split the sum into a linear and a remaining nonlinear part. Since $e_k$ can be chosen freely, we choose the locally dominant component to get a large linear term. So the index $k$ is defined by

(13)   $k = \underset{m}{\operatorname{argmax}} \; s_m \cdot \exp\left(-e_m\right)$
Interestingly, this linear part is identical to the MaxMixture formulation from [15]. In contrast to MM, we keep the remaining nonlinear term in the least squares problem. Therefore, we redefine the set of exponents as

(14)   $\tilde{e}_m = e_m - e_k$
and apply the generalization approach of [19] by introducing a normalization constant $\gamma$:

(15)   $\rho_{\mathrm{nl}}\left(\mathbf{r}\right) = \sqrt{\ln \gamma - \ln \sum_{m=1}^{M} s_m \cdot \exp\left(-\tilde{e}_m\right)}$
The normalization guarantees that the expression inside the square root is always positive. As shown in detail in Appendix A, it can be set to

(16)   $\gamma = \delta \cdot M \cdot s_k$
The damping parameter $\delta \geq 1$ can be used to reduce the nonlinearity of (15) – its optimal choice is discussed in the next section. Now, everything can be put together to formulate our main contribution – the square root of the negative log-likelihood of a GMM:
(17)   $\rho_{\mathrm{MSM}}\left(\mathbf{r}\right) = \begin{bmatrix} \frac{1}{\sqrt{2}} \, \mathbf{\Sigma}_k^{-1/2} \left(\mathbf{r} - \boldsymbol{\mu}_k\right) \\ \sqrt{\ln \gamma - \ln \sum_{m=1}^{M} s_m \cdot \exp\left(-\tilde{e}_m\right)} \end{bmatrix}$
Please note that this is a vector expression with a dimensionality of $d + 1$, where $d$ is the dimension of the residual. The related Jacobian is provided in Appendix B.

III-B Damping the Nonlinearity
The previously introduced damping factor $\delta$ seems superfluous at first, but it can be used to control the impact of nonlinearities. Therefore, it is worth thinking about the best choice of $\delta$.

The effect of the damping factor on the optimization itself is rather small, since it only shifts the cost surface. However, as elaborated in [14, p. 22], shifting the cost can decrease the rate of convergence. Additionally, a high $\delta$ decreases the relative cost change, which can lead to an early termination of the optimization. Therefore, $\delta$ should not be too high.
The damping factor also affects the estimated covariance of a state, which can be recovered from the optimization problem. Covariance recovery can be done by inverting the Hessian $\mathbf{H}$, which is approximated by the squared Jacobian $\mathbf{J}$:

(18)   $\hat{\mathbf{\Sigma}} \approx \mathbf{H}^{-1} \approx \left(\mathbf{J}^{\mathsf{T}} \mathbf{J}\right)^{-1}$
We refer to this approximated Hessian as the pseudo-Hessian. In Figure 3, we illustrate the result of this approximation for different choices of the damping parameter. For a high $\delta$, the pseudo-Hessian of MSM becomes similar to the pseudo-Hessian of MM, which is slightly too big – and therefore slightly overconfident, as shown in Figure 3. The smaller the damping gets, the more overconfident the pseudo-Hessian becomes. So, $\delta$ should not be too low.
Based on practical experience, almost any value will work for the damping factor. Nevertheless, we could achieve the best results with values of $\delta$ between 1 and 1000. So, we recommend a value from this range and apply it for our evaluation. Figure 3 shows the resulting cost function compared to MaxMixture [15] and SumMixture [19].
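As a toy illustration of the covariance recovery in (18), assuming an already whitened and stacked residual Jacobian:

```python
import numpy as np

def recover_covariance(jacobian):
    """Approximate the state covariance from the stacked residual Jacobian,
    using the pseudo-Hessian H ~= J^T J of Eq. (18)."""
    H = jacobian.T @ jacobian
    return np.linalg.inv(H)

# toy example: two direct observations of a 1D state, each with sigma = 2;
# the whitened Jacobian rows are 1/sigma, so the recovered variance is 2
J = np.array([[0.5], [0.5]])
cov = recover_covariance(J)
assert np.isclose(cov[0, 0], 2.0)
```

This matches the textbook result that two independent measurements halve the variance of a single one.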
III-C Relation between SumMixture and MaxSumMixture
Since the proposed MaxSumMixture has a substantial algebraic overlap with SumMixture, we take a closer look at their mathematical relation. Both loss functions are provided together with MM in Table I and their Jacobians are given in Appendix B. For a fair evaluation, we apply the numerically robust log-sum-exp formulation for SM, even though this was not part of the original publication.
By comparing the scalar loss of SM with the intermediate step (12) of MSM, it becomes clear that both describe the same log-likelihood and their squared costs are directly proportional:

(19)   $\left\| \rho_{\mathrm{SM}}\left(\mathbf{r}\right) \right\|^2 \propto \left\| \rho_{\mathrm{MSM}}\left(\mathbf{r}\right) \right\|^2$
They differ only in two points: their normalization constants are slightly different, and MSM uses a vectorized loss function where the dominant component is pulled out.

Since the normalization constants barely influence the optimization, the major advantage of MSM stems from the vectorization. Separating the dominant linear part from the nonlinear one in (17) exposes a linear representation of the dominant mode to the optimizer. For SM, on the other hand, the optimizer has to calculate its steps based on the Jacobian of a completely nonlinear cost term. Thus, the actual steps of MSM behave better than those of SM, even though their squared costs are almost equal. In consequence, the proposed MSM is a much more efficient implementation of the concepts of [19] than the original scalar formulation. We will verify this advantage in the next section.
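To summarize the construction, here is a compact Python sketch of the vectorized MSM loss. It is a simplified illustration, not the reference implementation: $\gamma = \delta \cdot M \cdot s_k$ is assumed as one admissible normalization (consistent with the positivity argument of Appendix A), and $\delta = 10$ is an arbitrary value from the recommended range:

```python
import numpy as np

def msm_loss(r, means, info_sqrts, weights, delta=10.0):
    """Vectorized MaxSumMixture loss: a linear dominant part plus a damped
    nonlinear remainder (sketch; gamma = delta * M * s_k is an assumption)."""
    M = len(weights)
    s = np.array([w * np.linalg.det(A) for w, A in zip(weights, info_sqrts)])
    e = np.array([0.5 * np.sum((A @ (r - mu)) ** 2)
                  for A, mu in zip(info_sqrts, means)])
    k = np.argmax(np.log(s) - e)                        # dominant component
    linear = info_sqrts[k] @ (r - means[k]) / np.sqrt(2.0)
    gamma = delta * M * s[k]
    nonlinear = np.sqrt(np.log(gamma) - np.log(np.sum(s * np.exp(e[k] - e))))
    return np.concatenate([linear, [nonlinear]])        # dimension d + 1

r = np.array([0.3, -0.1])
means = [np.zeros(2), np.array([3.0, 3.0])]
info_sqrts = [np.eye(2), 0.5 * np.eye(2)]
weights = [0.8, 0.2]
cost = msm_loss(r, means, info_sqrts, weights)
assert cost.shape == (3,) and np.all(np.isfinite(cost))
```

Squaring and summing the returned vector reproduces the negative log-likelihood of the mixture up to the constant $\ln \gamma$, while the dominant mode stays exposed to the optimizer as a purely linear block.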
IV Monte Carlo Evaluation
In the following, we explain the motivation behind our simulative evaluation. Our experiments are designed to analyze the properties of the proposed MaxSumMixture formulation compared to state-of-the-art approaches from [15] and [19]. A good probabilistic loss function should be an accurate and efficient representation of the corresponding distribution. Therefore, we test all approaches for their convergence properties, their accuracy, their credibility and their computational efficiency.
We provide two types of synthetic Monte Carlo experiments to generalize our findings beyond specific applications: a plain optimization of a random set of Gaussian mixtures and a set of randomized point set registration problems. All experiments were executed on a standard AMD Ryzen 7 desktop PC.
IV-A Plain Optimization
We design the first synthetic experiment to test our approach for a wellbehaved convergence. This includes the ability to represent the true mode of a Gaussian mixture and a cost surface that allows the optimizer to converge fast.
Hence, we create the simplest possible optimization problem that involves a GMM – a single linear residual wrapped into a nonlinear loss function:

(20)   $x^* = \underset{x}{\operatorname{argmin}} \left\| \rho\left(x\right) \right\|^2$
The loss function $\rho$ represents the negative log-likelihood as described in (7). To allow a rough comparison with M-estimators, we also add DCS [2] to the symmetric scenarios. Starting from different initial positions, least squares optimization is applied to find the minimum of each cost function. By repeating the experiment for a set of different Gaussian mixtures, we obtain a large number of optimization runs in total.
Dataset Generation
To generate a variety of different models, we sample the mixture parameters randomly from a set of uniform distributions. All mixtures are composed of two components: the first one has zero mean and a uniformly drawn standard deviation; the second one has a uniformly drawn mean and the standard deviation of the first, multiplied by a random factor. The weight of the first component is drawn uniformly – the second weight is simply the difference to one. For two-dimensional models, we draw the values for both dimensions independently. To guarantee that converging to the global optimum is always possible, we detect models with multiple minima and reject them, using a combination of grid-based sampling and local maxima detection. The initial positions are linearly spaced to cover the relevant cost surface. We implement the evaluation as part of the libRSF [16], using the Ceres [3] version of the Levenberg-Marquardt algorithm.

Metrics

We evaluate the root-mean-square error (RMSE) as the distance between the true mode of the GMM and the optimized variable after convergence. The true mode is found by a grid search, followed by a local optimization. To check whether an optimization converged to the right minimum, we apply an error threshold and define the ratio of runs below this limit as the convergence success rate. All results are listed together with the averaged run time and number of iterations in Table III.
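The two metrics of this subsection can be written down directly; the threshold below is a placeholder, since the paper's exact limit did not survive extraction:

```python
import numpy as np

def convergence_metrics(estimates, true_mode, threshold=0.1):
    """RMSE over all runs plus the share of runs whose error stays below
    the threshold (the convergence success rate). The threshold value is
    an assumption for illustration."""
    errors = np.linalg.norm(np.atleast_2d(estimates) - true_mode, axis=1)
    rmse = np.sqrt(np.mean(errors ** 2))
    success_rate = np.mean(errors < threshold)
    return rmse, success_rate

est = np.array([[0.01], [0.02], [5.0]])   # one run stuck in a wrong minimum
rmse, rate = convergence_metrics(est, np.array([0.0]))
assert np.isclose(rate, 2 / 3)
```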
IV-B Point Set Registration
Our motivation behind the second simulated evaluation stems from ego-motion estimation using sensors that measure sets of spatial points, such as radars or lidars. This task can be interpreted as a robust point set registration problem, as shown for example by [5] with radar sensors. We apply the concept from [12] to represent two sets of points with Gaussian mixture models and register them in a distribution-to-distribution manner. By using a GMM representation, we can simply integrate a varying uncertainty of each target point (sometimes referred to as landmark) within the two point sets – crucial for having a credible estimator. Furthermore, compared to other point set registration algorithms, we can omit the usually required step of finding explicit correspondences.
However, with our focus on credibility, we differ from the implementation in [5]. They used a sum approximation to align both GMM distributions, which leads to robustness against outliers but results in wrong covariances. Their approach furthermore increases the possible number of local minima – we refer to future work for a more detailed investigation.
Theoretical Background
In general, point set registration is the task of finding the transformation $\mathbf{T}$ which aligns a current or moving point set $\mathcal{M}$ so that it overlaps with a reference or fixed set $\mathcal{F}$. We use the index $i$ for a point measurement $\mathbf{m}_i$ within $\mathcal{M}$ and $j$ for a point $\mathbf{f}_j$ in $\mathcal{F}$ respectively. To find the best transformation $\mathbf{T}^*$, we can maximize the joint distribution

(21)   $\mathbf{T}^* = \underset{\mathbf{T}}{\operatorname{argmax}} \; \prod_i P\left(\mathbf{m}_i \mid \mathbf{T}, \mathcal{F}\right)$
Following the idea of [12] and [20], we model the reference set as a GMM, which results in the following distribution for one point of $\mathcal{M}$:

(22)   $P\left(\mathbf{m}_i \mid \mathcal{F}\right) = \sum_j w_j \cdot \mathcal{N}\left(\mathbf{m}_i; \mathbf{f}_j, \mathbf{\Sigma}_j\right)$
Taking the transformation $\mathbf{T}$ into account and rewriting the normalization terms leads to the likelihood function

(23)   $P\left(\mathbf{m}_i \mid \mathbf{T}, \mathcal{F}\right) = \sum_j s_j \cdot \exp\left(-\frac{1}{2}\left\| \mathbf{r}_{ij}\left(\mathbf{T}\right) \right\|^2_{\mathbf{\Sigma}_{ij}}\right)$
The residual $\mathbf{r}_{ij}$ represents the transformation of point $\mathbf{m}_i$ into the reference frame with the current estimate of $\mathbf{T} = \left\{\mathbf{R}, \mathbf{t}\right\}$:

(24)   $\mathbf{r}_{ij}\left(\mathbf{T}\right) = \mathbf{R} \, \mathbf{m}_i + \mathbf{t} - \mathbf{f}_j$
Also, the covariance matrix $\mathbf{\Sigma}_j$ of $\mathcal{F}$ is transformed into the current frame, using the current estimate's rotation matrix $\mathbf{R}$. The resulting least squares problem is

(25)   $\mathbf{T}^* = \underset{\mathbf{T}}{\operatorname{argmin}} \; \sum_i \left\| \rho\left(\mathbf{r}_i\left(\mathbf{T}\right)\right) \right\|^2$
where the loss $\rho$ describes the negative log-likelihood of the GMM (23). Note that we can simply add an uncertain Gaussian component to the fixed frame, representing outliers, to obtain a robust version of the point set registration problem. We use this robust version in our last experiment with the Manhattan dataset. Again, Levenberg-Marquardt is applied to solve the optimization, in these experiments based on the GTSAM framework [8].
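A minimal sketch of the residual (24) and the rotated covariance for the 2D case (hypothetical values, SE(2) only):

```python
import numpy as np

def rot2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def residual(m_i, f_j, t, theta):
    """Residual of Eq. (24): moving point m_i transformed with the current
    2D pose estimate (R, t) and compared against reference point f_j."""
    return rot2d(theta) @ m_i + t - f_j

def rotated_cov(cov, theta):
    """Covariance of a point, rotated into the current frame."""
    R = rot2d(theta)
    return R @ cov @ R.T

# a point that matches its reference exactly under the true transformation
m = np.array([1.0, 0.0])
f = np.array([2.0, 1.0])
res = residual(m, f, t=np.array([2.0, 0.0]), theta=np.pi / 2)
assert np.allclose(res, 0.0)
```

Each moving point contributes one such residual per reference component, wrapped into the GMM loss of the previous section.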
Dataset Generation
Experiment                  2D     3D
Num. of configurations      100    100
Landmark generation:
  clustering (size)         –      –
  cluster spread            –      –
Runs per configuration      1000   1000
Transformation sampling     –      –
Measurement noise           –      –
(Remaining numeric parameter values were lost in extraction.)
To evaluate the different mixture models applied to the point set registration problem, we simulate different measurement pairs on different landmark configurations, each consisting of uniformly distributed landmarks in a square area. Following the typical characteristics of an automotive radar sensor, a share of the landmarks is duplicated to generate clusters. We then assign a measurement noise (in polar space) to each of the landmarks. Based on this, we sample a new measurement pair for each Monte Carlo run. The transformation between the measurements follows a uniform distribution in x- and y-direction as well as in the angle. The random set of transformations is the same for each of the landmark configurations.

In addition to the 2D dataset, a 3D variant with 6 degrees of freedom was generated in the same style, but with more landmarks to compensate for the higher degree of freedom. The parameters for both experiments are summarized in Table II, where the variables for radius, azimuth and (elevation) define parameters in polar and (spherical) coordinates. Please note that the radius symbol is not to be confused with the residual function. Furthermore, we use Cartesian coordinates and an Euler angle representation for rotations around the corresponding axes.

Since real-world applications have to deal with outliers and sets of not fully overlapping points, we include an additional evaluation based on the Manhattan dataset M3500b from [6]. A number of landmarks are added uniformly to the world with the same clustering and noise properties as before. Instead of random transformations, the dataset's trajectory is used – excluding loop closure transitions. Landmark measurements are simulated by a sensor with limited range and field of view, resulting in a substantial rate of outliers.
Metrics
As is common practice, we apply the RMSE separately to the translational and rotational errors to measure accuracy. Aside from accuracy, we also want to measure how well the estimator calculates its own uncertainty – also referred to as the estimator's credibility. Therefore, the average normalized estimation error squared (ANEES) is used as originally described in [4, Chp. 5.4], but normalized with the problem's dimension as stated in [13]. If the estimator is perfectly credible, the ANEES value equals one. Since the optimization problem is likely to have multiple local minima, we expect that none of the evaluated algorithms will be fully credible. Nevertheless, we use the ANEES as a comparative metric to evaluate which algorithm is the most credible under these rough conditions.
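The dimension-normalized ANEES can be sketched directly (an illustration with synthetic data, not the paper's evaluation code):

```python
import numpy as np

def anees(errors, covariances):
    """Average normalized estimation error squared, normalized with the
    problem dimension d so that a perfectly credible estimator yields 1."""
    d = errors.shape[1]
    nees = [e @ np.linalg.inv(P) @ e for e, P in zip(errors, covariances)]
    return np.mean(nees) / d

# errors drawn consistently with the reported covariance -> ANEES near 1
rng = np.random.default_rng(0)
errs = rng.standard_normal((5000, 2))
covs = [np.eye(2)] * 5000
value = anees(errs, covs)
assert abs(value - 1.0) < 0.1
```

Values below one indicate underconfidence (covariance too large), values above one overconfidence.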
[Table III: Plain optimization results – RMSE, convergence success rate, run time and iterations for DCS, MM, SM and MSM; the numeric values were lost in extraction.]
[Table IV: Point set registration results – translational and rotational RMSE and ANEES for MM, SM and MSM; the numeric values were lost in extraction.]
V Discussion
Based on the Monte Carlo experiments, we discuss the performance of our approach in the following section – for the numbers see Table III and Table IV.
In terms of accuracy, we have to distinguish between plain optimization and point set registration. In our plain optimization evaluation, we can see that the accuracy depends strongly on a well-behaved convergence. Due to its linearity, MaxMixture converges perfectly for symmetric mixture models. However, it fails in a large share of the asymmetric cases, where it converges to a false local minimum. Although we only include GMMs with a distinct global optimum, the MM approximation is prone to have multiple ones, as shown in Figure 3. This demonstrates that, albeit a close approximation, it behaves differently from the exact model.

SumMixture, on the other hand, is an exact model and performs well for one-dimensional problems. For two-dimensional problems, however, it fails, as shown in Figure 1 on the first page. Its low convergence success rate indicates that the underlying optimization problem is seriously ill-posed. This is caused by the problematic Jacobian that drops to zero, as shown in Figure 3. We cannot discuss this deeply here, but these singularities slow down the convergence in 1D and prevent it in 2D.

The proposed MaxSumMixture has none of these issues. It converges fast to the correct minimum in all evaluated cases and achieves a consistently low RMSE. The error is slightly higher than for well-converging cases of SM/MM, because of the early termination due to the remaining constant part of the cost function (see Section III-B). However, this effect is small enough to be negligible for practical applications. The M-estimator DCS shows the highest accuracy across all algorithms, which is expected, since it is perfectly quadratic in the final phase of convergence. Still, it is not applicable to asymmetric problems.
In contrast to the artificial example of plain optimization, all approaches are able to converge in our point set registration experiment. This is possible because the causes of failure – singularities of SM or spurious minima of MM – occur only locally in one of many cost terms. To give an example: a cost function with a Jacobian of zero does not contribute to the optimization, but if the other derivatives are non-zero, the algorithm can still converge. Nevertheless, the individual weaknesses of each approach degrade the accuracy, and the proposed MSM provides the smallest RMSE in translation as well as in rotation. SM achieves a similar accuracy in the 2D Monte Carlo scenario, but for 3D it has twice the error of MSM. MM is more consistent than SM across the different scenarios, but its RMSE is noticeably higher than that of our proposed approach. The comparatively low rotational error of MSM in the Manhattan world scenario supports these findings. Since variant M3500b is characterized by a large initial orientation error, MSM can profit from its well-behaved convergence.
Analyzing the credibility of the point set registration problem shows again the consequences of the problematic Jacobian of SM – a strongly underconfident result. It is worth mentioning that the ANEES is an inversely proportional value, so a small ANEES means the estimated covariance is much higher than the actual squared error. Hence, it is clear that in the 2D case, the covariance estimated by SM is far too high for practical applications. MSM is just slightly overconfident, which is a good result if we take into account the multiple minima that occur in point set registration. The estimated uncertainty of MM is similar to MSM (as shown by Figure 3), and since its error is higher, it is more optimistic (overconfident). For 3D, the results are slightly shifted: due to the increased number of local minima, all algorithms tend to have a higher ANEES. Since the ANEES of SM was too low in the 2D case, it now has the "best" ANEES for 3D. However, we have to mention that this is just the result of two undesirable effects that compensate each other – multiple minima and singular Jacobians. If any measure is taken to improve the convergence, like a better initialization or graduated non-convexity [22], it will become worse again. Therefore, we are still convinced that our approach is able to deliver the best credibility if unmodeled properties like multiple minima are taken into account.
By comparing the run time of all GMM-based algorithms, we can see that MM is the fastest-converging one – which is not surprising due to its perfect linearity. Interestingly, DCS is not noticeably faster, even though the evaluation of multiple GMM components comes with a bigger overhead. The proposed MSM is just slightly slower across all experiments, with a comparable number of iterations. SM, however, is far behind and converges several times slower. Its nonlinearity forces the optimizer to take many small or even unsuccessful steps.
In summary, we can say that the proposed MaxSumMixture algorithm is the most consistent one in our evaluation. It shows good convergence properties in the single cost function tests and the best accuracy combined with a good credibility in the point set registration experiment.
VI Conclusion
In this work, we proposed a novel approach to implement Gaussian mixture models for robust least squares optimization. These models provide the flexibility to describe almost arbitrary distributions in state estimation or to represent the ambiguity of assignment problems. We demonstrated how to derive a well-posed loss function that represents the log-likelihood of a GMM in a mostly linear fashion and without approximation.
By comparing against state-of-the-art methods, we showed the improved convergence properties of our MaxSumMixture approach. It was able to converge fast and accurately to the global minimum for a broad range of models, as our extensive Monte Carlo evaluation confirmed. With point set registration, a practical application supported these results and also showed the superior combination of accuracy, credibility and computational efficiency that our approach provides.
We think that many state estimation problems in robotics can benefit from a wellconverging Gaussian mixture formulation. Therefore, we want to emphasize again the available open source implementation of the proposed algorithm.
Appendix A Proof of the Normalization Term
In this section, we prove that our choice (16) of the normalization constant $\gamma$ guarantees a negative log-likelihood above zero.
Appendix B Jacobians
Here, we provide the Jacobian matrices for all cost functions that are given in Table I.
For MaxMixture [15] the Jacobian is
with as the zero matrix.
For SumMixture [19] the Jacobian is
with and .
The proposed MaxSumMixture uses the modified scalings
and exponents , which are defined in detail as and .
The Jacobian is
References
 [1] (2015) Self-tuning M-estimators. In Proc. of Int. Conf. on Robotics and Automation (ICRA), pp. 4628–4635. Cited by: §II.
 [2] (2013) Robust map optimization using dynamic covariance scaling. In Proc. of Int. Conf. on Robotics and Automation (ICRA), pp. 62–69. Cited by: §II, §II, §IV-A.
 [3] Ceres Solver. Note: http://ceres-solver.org Cited by: §I, §IV-A.
 [4] (2001) Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software. 1st edition, John Wiley & Sons, Inc. Cited by: §IV-B.
 [5] (2015) Joint spatial and Doppler-based ego-motion estimation for automotive radars. In Proc. of Int. Symposium on Intelligent Vehicles (IV), pp. 839–844. Cited by: §IV-B, §IV-B.
 [6] (2014) From Angular Manifolds to the Integer Lattice: Guaranteed Orientation Estimation With Application to Pose Graph Optimization. IEEE Transactions on Robotics 30 (2), pp. 475–492. Cited by: §IV-B.
 [7] (2021) Adaptive robust kernels for nonlinear least squares problems. Robotics and Automation Letters (RA-L). Cited by: §II.
 [8] GTSAM. Note: http://research.cc.gatech.edu/borg/gtsam Cited by: §I, §IV-B.
 [9] (2016) A nonparametric belief solution to the Bayes tree. In Proc. of Int. Conf. on Intelligent Robots and Systems (IROS), pp. 2189–2196. Cited by: §II.
 [10] (2019) MH-iSAM2: multi-hypothesis iSAM using Bayes tree and hypo-tree. In Proc. of Int. Conf. on Robotics and Automation (ICRA), Montreal, QC, Canada, pp. 1274–1280. External Links: Document Cited by: §II.
 [11] (1964) Robust estimation of a location parameter. The Annals of Mathematical Statistics 35, pp. 73–101. Cited by: §II.
 [12] (2011) Robust Point Set Registration Using Gaussian Mixture Models. IEEE Trans. on Pattern Analysis and Machine Intelligence 33 (8), pp. 1633–1645. External Links: ISSN 0162-8828, Document Cited by: §IV-B, §IV-B.
 [13] (2002) Estimator's credibility and its measures. In Proc. of Int. Federation of Automatic Control (IFAC). Cited by: §IV-B.
 [14] (2004) Methods for nonlinear least squares problems. Technical report, Technical University of Denmark. Cited by: §III-B.
 [15] (2013) Inference on networks of mixtures for robust robot mapping. Int. Journal of Robotics Research 32 (7), pp. 826–840. Cited by: Appendix B, §I, TABLE I, §II, §II, §III-A, §III-B, §IV.
 [16] libRSF. Note: https://github.com/TUC-ProAut/libRSF Cited by: §IV-A.
 [17] (2016) Robust factor graph optimization – a comparison for sensor fusion applications. In Proc. of Int. Conf. on Emerging Technologies and Factory Automation (ETFA), pp. 1–4. Cited by: §II.

 [18] (2012) Simultaneous localization and mapping with multimodal probability distributions. The Int. Journal of Robotics Research 32 (2), pp. 143–171. Cited by: §II.
 [19] (2013) Robust incremental online inference over sparse factor graphs: beyond the Gaussian case. In Proc. of Int. Conf. on Robotics and Automation (ICRA), pp. 1025–1032. Cited by: Appendix B, §I, TABLE I, §II, §III-A, §III-B, §III-C, §IV.
 [20] (2012) Fast and accurate scan registration through minimization of the distance between compact 3D NDT representations. The Int. Journal of Robotics Research 31 (12), pp. 1377–1393. Cited by: §II, §IV-B.
 [21] (2012) Switchable constraints for robust pose graph SLAM. In Proc. of Int. Conf. on Intelligent Robots and Systems (IROS), pp. 1879–1884. Cited by: §II.
 [22] (2019) Graduated non-convexity for robust spatial perception: from non-minimal solvers to global outlier rejection. Robotics and Automation Letters (RA-L), pp. 1127–1134. Cited by: §V.