
Advancing Mixture Models for Least Squares Optimization

by Tim Pfeifer, et al.
TU Chemnitz

Gaussian mixtures are a powerful and widely used tool to model non-Gaussian estimation problems. They are able to describe measurement errors that follow arbitrary distributions and can represent ambiguity in assignment tasks like point set registration or tracking. However, using them with common least squares solvers is still difficult. Existing approaches are either approximations of the true mixture or prone to convergence issues due to their strong nonlinearity. We propose a novel least squares representation of a Gaussian mixture, which is an exact and almost linear model of the corresponding log-likelihood. Our approach provides an efficient, accurate and flexible model for many probabilistic estimation problems and can be used as cost function for least squares solvers. We demonstrate its superior performance in various Monte Carlo experiments, including different kinds of point set registration. Our implementation is available as open source code for the state-of-the-art solvers Ceres and GTSAM.



I Introduction

Would it not be great to have one probabilistic tool to describe arbitrary distributions that arise from simultaneous localization and mapping (SLAM), tracking or even point set registration? We show how to include Gaussian mixture models into the typical nonlinear least squares solvers that are used for these problems today.

State estimation in robotics is based on deterministic models that describe the physical relation between measurements and states. Since perfect measurements do not exist, probabilistic models are additionally required to describe the related uncertainty. For these models, the Gaussian distribution is the first choice, since it allows efficient least squares optimization.

However, sensor information is often corrupted by outliers that do not follow a Gaussian: Vision can blur, odometry can slip and GNSS can be affected by reflections and multipath. Even if the sensor noise itself is Gaussian, ambiguity can lead to multimodal densities for SLAM or point set registration problems. Without better probabilistic models, state estimation is likely to fail in these cases.

One of the most popular models for robust estimation is the Gaussian mixture model (GMM), which is composed of a sum of multiple weighted Gaussian components. In contrast to the similarly common M-estimators, GMMs are preferred when outliers follow a systematic pattern or ambiguity has to be represented. They are flexible enough to represent asymmetry, heavy tails and skewness, but close enough to a single Gaussian to allow sufficient convergence with nonlinear least squares.

There already exist algorithms to include GMMs in nonlinear least squares optimization, like Max-Mixture (MM) [15] or the approach proposed in [19], referred to in this work as Sum-Mixture (SM). However, both of them come with individual drawbacks in terms of model accuracy or convergence, as we explain in Section II.

In this paper, we introduce a novel representation for Gaussian mixtures in least squares problems, such as state estimation using factor graphs or point set registration. Based on Sum-Mixture, we derive a hybrid formulation of Max-Mixture and Sum-Mixture using robust numerical methods in Section III. It offers the combined advantages of both, without their individual disadvantages. In addition, we provide:

  • A discussion of how to reduce the impact of nonlinearity on the estimation process in Section III-B.

  • A comparison against Max-Mixture and Sum-Mixture in extensive Monte Carlo experiments, as shown in Figure 1, in Section IV-A.

  • A novel method to apply the proposed model to point set registration in Section IV-B.

  • A detailed analysis of the reasons why state-of-the-art approaches fail under certain conditions in Section V.

Furthermore, we provide open source implementations of our approach – as well as SM and MM – for two of the most common optimization frameworks in robotics: Ceres [3] and GTSAM [8].

II Prior Work

While the field of robust state estimation is remarkably broad, we focus on a brief overview of different robust probabilistic models for nonlinear least squares (NLS) and factor graph optimization. The common goal of all algorithms is to reduce the huge impact of non-Gaussian outliers on least squares problems.

M-estimators are one of the standard solutions for robust estimation. They describe a set of loss functions that down-weight large residuals. Frameworks like GTSAM or Ceres ship with a variety of M-estimators, like the classic Huber M-estimator [11] or the more recent Dynamic Covariance Scaling (DCS) [2]. Due to their symmetry, they are not suited to describe arbitrary multimodal distributions that arise from ambiguity or effects like reflected GNSS signals. Also, their performance depends strongly on the choice of specific parameters, which often leads to manual tuning. In more recent work like [1] and [7], the parametrization issue was resolved by self-tuning algorithms based on expectation-maximization. But still, they are limited by their symmetry constraint.

Switchable constraints [21] introduced weights for each residual that can be jointly optimized within the least squares problem. In this way, the optimizer can down-weight outliers individually. Although it is formulated differently, the authors of [2] showed that its solution is equivalent to an M-estimator.

Gaussian mixtures are quite flexible models but hard to integrate into nonlinear least squares. The authors of [18] therefore proposed the idea of preselecting the dominant mode and using it as an approximation of the full GMM. Since the optimization problem is constructed as fully Gaussian afterwards, this is just a limited workaround and no reselection is possible. Olson et al. went one step further in [15] and proposed a better approximation – a local maximum of multiple Gaussians. In contrast to the explicit selection in [18], the Max-Mixture approach selects the dominant mode locally and implicitly during the optimization. MM is widely adopted in SLAM and factor graph optimization as an efficient and robust model. However, it is still a limited approximation of the true density; especially GMMs with overlapping components are not represented well.
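The difference between the exact mixture likelihood and the Max-Mixture approximation can be sketched in a few lines. The following is an illustrative 1-D Python sketch with our own naming, not the authors' implementation:

```python
import math

def gmm_neg_log_likelihood(r, weights, means, sigmas):
    """Exact negative log-likelihood of a 1-D Gaussian mixture at residual r."""
    total = sum(
        w / (math.sqrt(2 * math.pi) * s) * math.exp(-0.5 * ((r - m) / s) ** 2)
        for w, m, s in zip(weights, means, sigmas)
    )
    return -math.log(total)

def max_mixture_neg_log_likelihood(r, weights, means, sigmas):
    """Max-Mixture approximation: keep only the locally dominant component,
    i.e. replace the sum of weighted densities by their maximum."""
    best = max(
        w / (math.sqrt(2 * math.pi) * s) * math.exp(-0.5 * ((r - m) / s) ** 2)
        for w, m, s in zip(weights, means, sigmas)
    )
    return -math.log(best)
```

Since the maximum is at most the sum, the Max-Mixture negative log-likelihood is an upper bound on the exact one, and the two coincide for a single component.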

With [19], Rosen et al. proposed a very general approach to represent almost arbitrary distributions in NLS problems. This idea was of fundamental importance for later work [17], where we first implemented an exact GMM for NLS. As we show in this publication, the Sum-Mixture model from [17] is fundamentally ill-posed apart from some special cases. Nevertheless, both papers [15] and [19] inspired the novel approach that is presented in this work.

Besides least-squares optimization, the authors of [20] applied Gaussian mixtures to solve probabilistic point set registration. They used the full Gaussian mixture, but approximated the product of probability densities by a sum of them, which violates fundamental assumptions of probability theory.

The authors of [10] and [9] introduced novel algorithms to represent ambiguity, which include equivalents of Gaussian mixture models. However, they require different kinds of specialized solvers and come with a lot of overhead.

We, instead, present a novel Gaussian mixture implementation for common nonlinear least squares solvers that can be applied to a variety of robotic applications. Our approach combines the well-behaved convergence of Max-Mixture with the exact representation of Sum-Mixture, without adopting their particular weaknesses.

Algorithm Max-Mixture [15] Sum-Mixture [19] Max-Sum-Mixture
Cost Function
TABLE I: Gaussian Mixture Loss Functions for Least Squares

III Gaussian Mixture Models for Least Squares

To introduce our novel Gaussian mixture representation for least squares, we take a look at the Gaussian case before we extend it to mixtures.

The central idea of probabilistic state estimation is to find the most likely set of states based on a set of perceived measurements. The likelihood maximization is done by minimizing the negative log-likelihood, which leads to the common least squares formulation:

\hat{x} = \operatorname*{argmin}_{x} \sum_i \| r_i(x) \|^2_{\Sigma_i}

Please note that we use two notations for the squared Mahalanobis distance, which are

\| r \|^2_{\Sigma} = r^\top \Sigma^{-1} r = \| \Sigma^{-1/2} r \|^2

where \Sigma is the measurement covariance matrix and \Sigma^{-1/2} is the square-root information matrix. The residual term r describes a nonlinear difference between a subset of states and measurements. The constant Gaussian log-normalization does not depend on the states, so it can be omitted. The remaining pure quadratic form is the direct result of assuming Gaussian distributed noise for the residual r. Since a quadratic problem is sensitive to any kind of outlier, the residual term is often wrapped into a loss function \rho, which brings us to robust least squares:

\hat{x} = \operatorname*{argmin}_{x} \sum_i \| \rho_i( r_i(x) ) \|^2

This new cost function is composed of the residual function r and the loss function \rho. The loss can be derived directly from the negative log-likelihood of the residual to apply different probabilistic models:

\| \rho(r) \|^2 = -\ln \mathrm{p}(r) + \text{const}

The Mahalanobis distance is a special case of this loss function. Please note that we define \rho inside the square to allow asymmetric loss functions, which is slightly different from the literature. The main contribution of this paper can be condensed down to the answer to a simple question: How can we represent the log-likelihood of a GMM using the loss function \rho?
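This loss-function view can be sketched in simplified 1-D Python. The names and the factor conventions below are our own choices for illustration, not the paper's exact notation:

```python
import math

def gaussian_loss(r, sigma):
    """Gaussian case: the loss is the whitened residual, so its square
    is the squared Mahalanobis distance (up to the usual factor conventions)."""
    return r / sigma

def likelihood_loss(r, neg_log_likelihood, offset=0.0):
    """Generic case: the squared loss equals the (shifted) negative
    log-likelihood, so minimizing the squared loss maximizes the likelihood.
    `offset` must make the shifted value non-negative."""
    value = neg_log_likelihood(r) + offset
    if value < 0.0:
        raise ValueError("offset too small, square root undefined")
    return math.sqrt(value)
```

A least squares solver that minimizes the sum of squared losses then implicitly minimizes the chosen negative log-likelihood.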

III-A Max-Sum-Mixture – A Novel Approach

For a better understanding of the underlying mathematical structure, we introduce a compressed notation for the probability density of a Gaussian mixture with K components:

\mathrm{p}(r) = \sum_{k=1}^{K} s_k \exp(-e_k)

The constant scaling s_k contains the weighting w_k of a component and its square-root information matrix \Sigma_k^{-1/2}:

s_k = w_k \det( \Sigma_k^{-1/2} )

The exponent e_k is defined as half of the squared Mahalanobis distance between the residual r and the mean \mu_k:

e_k = \tfrac{1}{2} \| \Sigma_k^{-1/2} (r - \mu_k) \|^2
The negative log-likelihood of \mathrm{p}(r) is the scaled version of the log-sum-exp function

-\ln \mathrm{p}(r) = -\ln \sum_{k=1}^{K} s_k \exp(-e_k)

with the set of scalings \{s_k\} and exponents \{e_k\}. While it is not possible to push the logarithm inside the sum, it is indeed possible to pull one particular exponent outside:

-\ln \sum_{k=1}^{K} s_k \exp(-e_k) = e_{k^*} - \ln \sum_{k=1}^{K} s_k \exp(e_{k^*} - e_k)

This is a common approach to formulate a numerically robust log-sum-exp function, but we apply it algebraically to split the sum into a linear and a remaining nonlinear part. Since k^* can be chosen freely, we choose the locally dominant component to get a big linear term. So the index k^* is defined by

k^* = \operatorname*{argmax}_{k} \; s_k \exp(-e_k)
Interestingly, this part is identical to the Max-Mixture formulation from [15]. In contrast to MM, we keep the remaining nonlinear term in the least squares problem. Therefore, we redefine the set of exponents as

\tilde{e}_k = e_k - e_{k^*}

and apply the generalization approach of [19] by introducing a normalization constant \gamma:

\sqrt{ 2 ( \ln\gamma - \ln \sum_{k=1}^{K} s_k \exp(-\tilde{e}_k) ) }

The normalization guarantees that the expression inside the square root is always positive. As shown in detail in Appendix A, it can be set to

\gamma = \delta \cdot K \cdot \max_k s_k

The damping parameter \delta \ge 1 can be used to reduce the nonlinearity of this term – its optimal choice is discussed in the next section. Now, everything can be put together to formulate our main contribution – the square root of the negative log-likelihood of a GMM:

\rho_{\text{MSM}}(r) = \left[ \; \Sigma_{k^*}^{-1/2} (r - \mu_{k^*}) \; ; \; \sqrt{ 2 ( \ln\gamma - \ln \sum_{k=1}^{K} s_k \exp(-\tilde{e}_k) ) } \; \right]

Please note that this is a vector expression with a dimensionality of d + 1, where d is the dimension of the residual. The related Jacobian is provided in Appendix B.
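The algebraic split above mirrors the numerically robust log-sum-exp trick and can be sketched in 1-D Python. This is an illustrative sketch with our own naming, not the released implementation:

```python
import math

def neg_log_gmm(scalings, exponents):
    """-ln sum_k s_k * exp(-e_k), computed with the locally dominant
    component k* pulled out of the sum: the result splits into a
    (Max-Mixture-like) dominant part plus a bounded nonlinear remainder."""
    # dominant component: the one with the largest s_k * exp(-e_k)
    k_star = max(range(len(scalings)),
                 key=lambda k: math.log(scalings[k]) - exponents[k])
    e_star = exponents[k_star]
    linear = e_star  # dominant exponent, quadratic in the residual
    # all shifted exponents e_star - e_k stay small, so no overflow occurs
    remainder = -math.log(sum(s * math.exp(e_star - e)
                              for s, e in zip(scalings, exponents)))
    return linear + remainder
```

The split changes nothing mathematically, but the exponentials are evaluated only on shifted exponents, which keeps the computation stable even for very unlikely components.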

Fig. 2: The influence of the damping factor on the squared cost and the approximated Hessian of our approach. Each line represents a Max-Sum-Mixture cost function with a different damping value – the darker the line, the lower the damping. A higher damping decreases the effect of nonlinearities on the Hessian, but shifts the cost surface upwards.
Fig. 3: Comparison of cost functions and approximated Hessians of different mixture representations against the true Gaussian mixture. The dotted line represents the negative log-likelihood of the underlying GMM and its true Hessian. Solid lines represent the different approaches to describe this GMM for least squares. The Hessians in the lower plot are approximated by squared Jacobians. At the cost minimum, Max-Sum-Mixture and Max-Mixture are close to the true Hessian, while the Hessian of Sum-Mixture is singular.

III-B Damping the Nonlinearity

The previously introduced damping factor seems superfluous at first, but it can be used to control the impact of nonlinearities. Therefore, it is worth thinking about its best choice.

The effect of the damping factor on the optimization itself is rather small, since it only shifts the cost surface. However, as elaborated in [14, p. 22], shifting the cost can decrease the rate of convergence. Additionally, a high damping decreases the relative cost change, which can lead to an early termination of the optimization. Therefore, the damping should not be too high.

The damping factor also affects the estimated covariance of a state, which can be recovered from the optimization problem. Covariance recovery can be done by inverting the Hessian, which is approximated by the squared Jacobian:

\mathcal{H} \approx J^\top J

We refer to this approximated Hessian as pseudo-Hessian. In Figure 2, we illustrate the result of this approximation for different choices of the damping parameter. For a high damping, the pseudo-Hessian of MSM becomes similar to the pseudo-Hessian of MM, which is slightly too big – and therefore slightly overconfident, as shown in Figure 3. The smaller the damping gets, the more overconfident the pseudo-Hessian becomes. So, the damping should not be too low.

Based on practical experience, almost any value will work for the damping factor. Nevertheless, we achieved the best results with values between 1 and 1000. So, we recommend a value in this range and apply this for our evaluation. Figure 3 shows the resulting cost function compared to Max-Mixture [15] and Sum-Mixture [19].
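Covariance recovery via the squared Jacobian can be sketched in plain Python. This is an illustrative Gauss-Newton-style sketch with our own naming:

```python
def pseudo_hessian(jacobian):
    """Gauss-Newton approximation of the Hessian: H ~= J^T J.
    `jacobian` is given as a list of rows (one row per residual entry)."""
    n = len(jacobian[0])
    return [[sum(row[i] * row[j] for row in jacobian) for j in range(n)]
            for i in range(n)]

def covariance_2x2(hessian):
    """Recover the estimated covariance by inverting a 2x2 pseudo-Hessian."""
    (a, b), (c, d) = hessian
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]
```

A larger pseudo-Hessian thus directly translates into a smaller recovered covariance, i.e. a more confident estimate.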

III-C Relation between Sum-Mixture and Max-Sum-Mixture

Since the proposed Max-Sum-Mixture has a substantial algebraic overlap with Sum-Mixture, we take a closer look at their mathematical relation. Both loss functions are provided together with MM in Table I, and their Jacobians are given in Appendix B. For a fair evaluation, we apply the numerically robust log-sum-exp formulation for SM, even though this was not part of the original publication.

By comparing the scalar loss of SM with the intermediate step of MSM, it becomes clear that both describe the same log-likelihood and that their squared costs are directly proportional:


They differ only in two points: their normalization constants are slightly different and MSM uses a vectorized loss function where the dominant component is pulled out.

Since the normalization constants barely influence the optimization, the major advantage of MSM stems from the vectorization. Separating the dominant linear part from the nonlinear remainder exposes a linear representation of the dominant mode to the optimizer. For SM, on the other hand, the optimizer has to calculate its steps based on the Jacobian of a completely nonlinear cost term. Thus, the actual steps of MSM behave better than those of SM, even though their squared costs are almost equal. In consequence, the proposed MSM is a much more efficient implementation of the concepts of [19] than the original scalar formulation. We verify this advantage in the next section.
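The equality of the squared costs (up to the choice of normalization constant) can be checked numerically with a small sketch. The 1-D simplification, the naming, and the factor conventions below are ours; the paper's exact Table I entries differ in their constants:

```python
import math

def sum_mixture_sq_cost(scalings, exponents, gamma):
    """Scalar Sum-Mixture style squared cost: the shifted negative
    log-likelihood of the mixture."""
    p = sum(s * math.exp(-e) for s, e in zip(scalings, exponents))
    return 2.0 * (math.log(gamma) - math.log(p))

def max_sum_mixture_sq_cost(scalings, exponents, gamma):
    """Vectorized Max-Sum-Mixture style squared cost: squared norm of the
    linear (dominant) component plus the nonlinear remainder component."""
    k = max(range(len(scalings)),
            key=lambda i: math.log(scalings[i]) - exponents[i])
    e_star = exponents[k]
    linear_sq = 2.0 * e_star
    nonlinear_sq = 2.0 * (math.log(gamma) - math.log(
        sum(s * math.exp(e_star - e) for s, e in zip(scalings, exponents))))
    return linear_sq + nonlinear_sq
```

Both functions return the same value for the same normalization constant; the decisive difference lies in the Jacobians the optimizer sees, not in the cost itself.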

IV Monte Carlo Evaluation

In the following, we explain the motivation behind our simulative evaluation. Our experiments are designed to analyze the properties of the proposed Max-Sum-Mixture formulation compared to the state-of-the-art approaches from [15] and [19]. A good probabilistic loss function should be an accurate and efficient representation of the corresponding distribution. Therefore, we test all approaches for their convergence properties, their accuracy, their credibility and their computational efficiency.

We provide two types of synthetic Monte Carlo experiments to generalize our findings beyond specific applications: a plain optimization of a random set of Gaussian mixtures, and a set of randomized point set registration problems. All experiments were executed on a standard AMD Ryzen 7 desktop PC.

IV-A Plain Optimization

We design the first synthetic experiment to test our approach for a well-behaved convergence. This includes the ability to represent the true mode of a Gaussian mixture and a cost surface that allows the optimizer to converge fast.

Hence, we create the simplest possible optimization problem that involves a GMM – a single linear residual wrapped into a nonlinear loss function:

\hat{x} = \operatorname*{argmin}_{x} \| \rho_{\text{GMM}}(x - z) \|^2
The loss function represents the negative log-likelihood as described in Section III. To allow a rough comparison with M-estimators, we also add DCS [2] to the symmetric scenarios. Starting from different initial positions, least-squares optimization is applied to find the minimum of each cost function. By repeating the experiment for a set of different Gaussian mixtures, we run optimizations in total.

Dataset Generation

To generate a variety of different models, we sample the mixture parameters randomly from a set of uniform distributions. All mixtures are composed of two components, the first one with a zero mean and a standard deviation drawn between and . The second component has a random mean from the range of and the standard deviation of the first, multiplied by a random factor between and . A weight between and is used for the first component – the second weight is simply the difference to one. For two-dimensional models, we draw the values for both dimensions independently. To guarantee that converging to the global optimum is always possible, we detect models with multiple minima and reject them, using a combination of grid-based sampling and local maxima detection. The initial positions are linearly spaced in a range of to cover the relevant cost surface. We implement the evaluation as part of the libRSF [16], using the Ceres [3] version of the Levenberg-Marquardt algorithm.


We evaluate the root-mean-square error (RMSE) as the distance between the true mode of the GMM and the optimized variable after convergence. The true mode is found by a grid search, followed by a local optimization. To check whether an optimization converged to the right minimum, we apply an error threshold of and define the ratio of runs below this limit as the convergence success rate. All results are listed together with the averaged run time and number of iterations in Table III.
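The two metrics of this experiment can be sketched in a few lines of Python (our own minimal formulation of the standard definitions):

```python
import math

def rmse(estimates, truths):
    """Root-mean-square error between estimated and true modes."""
    n = len(estimates)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)

def convergence_success_rate(estimates, truths, threshold):
    """Fraction of runs whose absolute error stays below the threshold."""
    hits = sum(1 for e, t in zip(estimates, truths) if abs(e - t) < threshold)
    return hits / len(estimates)
```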

IV-B Point Set Registration

Our motivation behind the second simulated evaluation stems from ego-motion estimation using sensors that measure sets of spatial points, such as radars or lidars. This task can be interpreted as a robust point set registration problem, as shown for example by [5] with radar sensors. We apply the concept from [12] to represent two sets of points with Gaussian mixture models and register them in a distribution-to-distribution manner. By using a GMM representation, we can simply integrate a varying uncertainty of each target point (sometimes referred to as landmark) within the two point sets – crucial for a credible estimator. Furthermore, compared to other point set registration algorithms, we can omit the usually required step of finding explicit correspondences.

However, with focus on credibility, we differ from the implementation in [5]. They used a sum approximation to align both GMM distributions, which leads to robustness against outliers, but results in wrong covariances. Furthermore, their approach increases the possible number of local minima – we refer to future work for a more detailed investigation.

Theoretical Background

In general, point set registration is the task of finding the transformation which aligns a current or moving point set, so that it overlaps with a reference or fixed set. We use one index for a point measurement within the moving set and another for a point in the fixed set, respectively. For finding the best transformation, we can maximize the joint distribution


Following the idea of [12] and [20], we model the reference set as a GMM, which results in the following distribution for one point of the moving set:


Taking into account the transformation and rewriting the normalization terms leads to the likelihood function


The residual represents the transformation of a point into the reference frame, using the current estimate of the transformation:


Also, the covariance matrix of the point is transformed into the current frame, using the rotation matrix of the current estimate. The resulting least squares problem is


where the loss describes the negative log-likelihood of the GMM. Note that we can simply add an uncertain Gaussian component, representing outliers, to the fixed frame to obtain a robust version of the point set registration problem. We use this robust version in our last experiment with the Manhattan dataset. Again, Levenberg-Marquardt is applied to solve the optimization, in this experiment based on the GTSAM framework [8].
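The geometric part of the residual – transforming a moving point and its covariance into the reference frame – can be sketched for the 2D case. This is an illustrative plain-Python sketch with our own naming:

```python
import math

def transform_point_2d(point, theta, translation):
    """Apply the current rigid transform estimate: rotate, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + translation[0],
            s * x + c * y + translation[1])

def rotate_covariance_2d(cov, theta):
    """Transform a 2x2 point covariance into the current frame: R * Cov * R^T."""
    c, s = math.cos(theta), math.sin(theta)
    r = ((c, -s), (s, c))
    # R * Cov
    rc = [[sum(r[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    # (R * Cov) * R^T
    return [[sum(rc[i][k] * r[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

The transformed point is then compared against the GMM of the reference set, while the rotated covariance keeps the per-landmark uncertainty consistent with the current orientation estimate.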

Fig. 4: Example of one random point set registration problem; correspondence information – unknown to the algorithms – is shown for better visual association. The transformation of the ground truth points, before adding simulated measurement noise to both sets, was in x-direction and in rotation. Specific properties of the described experiment, like measurement noise based on polar coordinates and clustered landmarks, are apparent.

Dataset Generation

Experiment                  2D      3D
Num. of Configurations      100     100
Clustering (size)           ()      ()
Cluster spread
Runs per Configuration      1000    1000
Measurement noise
TABLE II: Parameters for the Point Set Registration Experiments.

To evaluate the different mixture models applied to the point set registration problem, we simulate different measurement pairs on different landmark configurations, each consisting of uniformly distributed landmarks in a square of . Following the typical characteristics of an automotive radar sensor, a share of the landmarks is duplicated two times, based on a standard deviation of , to generate clusters. Then, we assign a measurement noise (in polar space) with standard deviations of and to each of the landmarks. Based on this, we sample a new measurement pair for each Monte Carlo run. The transformation between the measurements follows a uniform distribution in the range of in x- and y-direction and for the angle. The random set of different transformations is the same for each of the landmark configurations.

In addition to the 2D dataset, a 3D variant with 6 degrees of freedom was generated in the same style, but with more landmarks to compensate for the higher degree of freedom. Parameters for both experiments are summarized in Table II, where the variables are used for radius, azimuth, and (in the spherical case) elevation to define parameters in polar and spherical coordinates. Please note that, in this context, the symbol denotes the radius, while elsewhere it denotes the residual function. Furthermore, we use Cartesian coordinates and an Euler angle representation for rotations around the corresponding axes.

Since real world applications have to deal with outliers and sets of not fully overlapping points, we include an additional evaluation based on the Manhattan dataset M3500b from [6]. A number of landmarks is added uniformly to the world with the same clustering and noise properties as before. Instead of random transformations, the dataset's trajectory is used – excluding loop closure transitions. Landmark measurements are simulated by a sensor with limited range and field of view, resulting in an average outlier rate of and a maximum of .


As is common practice, we apply the RMSE separately to the translational and rotational errors to measure accuracy. Aside from accuracy, we also want to measure how well the estimator calculates its own uncertainty – also referred to as the estimator's credibility. Therefore, the average normalized estimation error squared (ANEES) is used as originally described in [4, Chp. 5.4], but normalized with the problem's dimension as stated in [13]. If the estimator is perfectly credible, the ANEES value equals one. Since the optimization problem is likely to have multiple local minima, we expect that none of the evaluated algorithms will be fully credible. Nevertheless, we use the ANEES as a comparative metric to evaluate which algorithm is the most credible under these rough conditions.
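For scalar states, the ANEES reduces to a simple average, which can be sketched as follows (our own minimal 1-D formulation; the dimension-normalization of [13] is trivial here since the dimension is one):

```python
def anees(errors, variances):
    """Average normalized estimation error squared for scalar estimates:
    the mean of e_i^2 / sigma_i^2. A perfectly credible estimator yields 1;
    values below 1 indicate an underconfident (too large) covariance,
    values above 1 an overconfident (too small) one."""
    if len(errors) != len(variances):
        raise ValueError("mismatched inputs")
    return sum(e * e / v for e, v in zip(errors, variances)) / len(errors)
```

For multivariate states, the squared error would be replaced by the Mahalanobis distance with respect to the estimated covariance, divided by the state dimension.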

TABLE III: Results of the Single Cost Function Evaluation.
TABLE IV: Results of the Point Set Registration Experiments.

V Discussion

Based on the Monte Carlo experiments, we discuss the performance of our approach in the following section – for the numbers see Table III and Table IV.

In terms of accuracy, we have to distinguish between plain optimization and point set registration. In our plain optimization evaluation, we can see that the accuracy depends strongly on a well-behaved convergence. Due to its linearity, Max-Mixture converges perfectly for symmetric mixture models. However, it fails in almost of the asymmetric cases, where it converges to a false local minimum. Although we only include GMMs with a distinct global optimum, the MM approximation is prone to have multiple ones, as shown in Figure 3. This demonstrates that, albeit being a close approximation, it behaves differently from the exact model.

Sum-Mixture, on the other hand, is an exact model and performs well for one-dimensional problems. For two-dimensional problems, however, it fails, as shown in Figure 1 on the first page. A convergence success rate below indicates that the underlying optimization problem is seriously ill-posed. This is caused by the problematic Jacobian that drops to zero, as shown in Figure 3. We cannot discuss this deeply here, but these singularities slow down the convergence in 1D and prevent it in 2D.

The proposed Max-Sum-Mixture has none of these issues. It converges fast to the correct minimum in all evaluated cases and achieves a consistently low RMSE. The error is slightly higher than for well-converging cases of SM/MM because of the early termination due to the remaining constant part of the cost function (see Section III-B). However, with a magnitude of , this effect is usually negligible for practical applications. The M-estimator DCS shows the highest accuracy of all algorithms, which is to be expected, since it is perfectly quadratic in the final phase of convergence. Still, it is not applicable to asymmetric problems.

In contrast to the artificial example of plain optimization, all approaches are able to converge in our point set registration experiment. This is possible because the reasons for failure – singularities of SM or spurious minima of MM – occur just locally in one of many cost terms. To give an example: a single cost function with a Jacobian of zero does not contribute to the optimization, but if the other derivatives are non-zero, the algorithm can still converge. Nevertheless, the individual weaknesses of each approach degrade the accuracy, and the proposed MSM provides the smallest RMSE in translation as well as in rotation. SM achieves a similar accuracy in the 2D Monte Carlo scenario, but for 3D it has twice the error of MSM. MM is more consistent than SM across the different scenarios, but its RMSE is about higher than that of our proposed approach. The comparatively low rotational error of MSM in the Manhattan world scenario supports these findings. Since variant M3500b is characterized by a large initial orientation error, MSM can profit from its well-behaved convergence.

Analyzing the credibility of the point set registration problem shows again the consequences of the problematic Jacobian of SM – a strongly underconfident result. It is worth mentioning that the ANEES is an inversely proportional value, so a low ANEES means the estimated covariance is higher than the actual squared error. Hence, it is clear that in the 2D case, the covariance estimated by SM is way too high for practical applications. MSM is just slightly overconfident, which is a good result if we take into account the multiple minima that occur in point set registration. The estimated uncertainty of MM is similar to MSM (as shown by Figure 3) and, since its error is higher, it is more optimistic (overconfident). For 3D, the results are slightly shifted: due to the increased number of local minima, all algorithms tend to have a higher ANEES. Since the ANEES of SM was too low in the 2D case, it now has the "best" ANEES for 3D. However, we have to mention that this is just the result of two undesirable effects that compensate each other – multiple minima and singular Jacobians. If any measure is taken to improve the convergence, like a better initialization or graduated non-convexity [22], it will become worse again. Therefore, we are still convinced that our approach is able to deliver the best credibility if unmodeled properties like multiple minima are taken into account.

By comparing the run time of all GMM-based algorithms, we can see that MM is the most rapidly converging one – which is not surprising due to its perfect linearity. Interestingly, DCS is not noticeably faster, even though the evaluation of multiple GMM components comes with a bigger overhead. The proposed MSM is just slightly slower across all experiments, with a comparable number of iterations. SM, however, is far behind and converges between and times slower. Its nonlinearity forces the optimizer to iterate often, with many small or even unsuccessful steps.

In summary, the proposed Max-Sum-Mixture algorithm is the most consistent one in our evaluation. It shows good convergence properties in the single cost function tests, and the best accuracy combined with a good credibility in the point set registration experiment.

VI Conclusion

In this work, we proposed a novel approach to implement Gaussian mixture models for robust least squares optimization. These models provide the flexibility to describe almost arbitrary distributions in state estimation or to represent the ambiguity of assignment problems. We demonstrated how to derive a well-posed loss function that represents the log-likelihood of a GMM in a mostly linear fashion and without approximation.

By comparing against state-of-the-art methods, we showed the improved convergence properties of our Max-Sum-Mixture approach. It was able to converge quickly and accurately to the global minimum for a broad range of models, as our extensive Monte Carlo evaluation confirmed. With point set registration, a practical application supported these results and also showed the superior combination of accuracy, credibility and computational efficiency that our approach provides.

We think that many state estimation problems in robotics can benefit from a well-converging Gaussian mixture formulation. Therefore, we want to emphasize again the available open source implementation of the proposed algorithm.

Appendix A Proof of the Normalization Term

In this section, we prove that our choice of the normalization constant guarantees a negative log-likelihood above zero.


Define and ,
where as defined in (missing)
and as defined in (missing).

Appendix B Jacobians

Here, we provide the Jacobian matrices for all cost functions that are given in Table I.
For Max-Mixture [15] the Jacobian is

with as the zero matrix.
For Sum-Mixture [19] the Jacobian is

with and .
The proposed Max-Sum-Mixture uses the modified scalings and exponents , which are defined in detail as and . The Jacobian is
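Analytic Jacobians like the ones above can be sanity-checked against finite differences. A minimal sketch for a hypothetical 1-D two-component mixture measurement model z = x + noise (an illustration, not the paper's code):

```python
import numpy as np

def mixture_nll(x, z, weights, sigmas):
    """Negative log-likelihood of a 1-D GMM residual r = z - x,
    evaluated via log-sum-exp for numerical stability."""
    r = z - x
    log_terms = (np.log(weights) - np.log(sigmas * np.sqrt(2.0 * np.pi))
                 - 0.5 * (r / sigmas) ** 2)
    m = np.max(log_terms)
    return -(m + np.log(np.sum(np.exp(log_terms - m))))

def mixture_grad(x, z, weights, sigmas):
    """Analytic d(NLL)/dx: the responsibility-weighted sum of the
    per-component gradients, using r = z - x so that dr/dx = -1."""
    r = z - x
    log_terms = (np.log(weights) - np.log(sigmas * np.sqrt(2.0 * np.pi))
                 - 0.5 * (r / sigmas) ** 2)
    resp = np.exp(log_terms - np.max(log_terms))
    resp /= resp.sum()  # normalized component responsibilities
    return -r * np.sum(resp / sigmas ** 2)
```

A central finite difference of `mixture_nll` should match `mixture_grad` to high precision; such a check is a cheap guard against sign errors when implementing mixture Jacobians for a least squares solver.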


  • [1] G. Agamennoni, P. Furgale, and R. Siegwart (2015) Self-tuning m-estimators. In Proc. of Int. Conf. on Robotics and Automation (ICRA), pp. 4628–4635. Cited by: §II.
  • [2] P. Agarwal, G. D. Tipaldi, L. Spinello, C. Stachniss, and W. Burgard (2013) Robust map optimization using dynamic covariance scaling. In Proc. of Int. Conf. on Robotics and Automation (ICRA), pp. 62–69. Cited by: §II, §II, §IV-A.
  • [3] S. Agarwal, K. Mierle, et al. Ceres Solver. Cited by: §I, §IV-A.
  • [4] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan (2001) Estimation with Applications to Tracking and Navigation: Theory Algorithms and Software. 1 edition, John Wiley & Sons, Inc. Cited by: §IV-B.
  • [5] M. Barjenbruch, D. Kellner, J. Klappstein, J. Dickmann, and K. Dietmayer (2015) Joint spatial- and Doppler-based ego-motion estimation for automotive radars. In Proc. of Int. Symposium on Intelligent Vehicles (IV), pp. 839–844. Cited by: §IV-B, §IV-B.
  • [6] L. Carlone and A. Censi (2014) From Angular Manifolds to the Integer Lattice: Guaranteed Orientation Estimation With Application to Pose Graph Optimization. IEEE Transactions on Robotics 30 (2), pp. 475–492. Cited by: §IV-B.
  • [7] N. Chebrolu, T. Läbe, O. Vysotska, J. Behley, and C. Stachniss (2021) Adaptive robust kernels for non-linear least squares problems. Robotics and Automation Letters (RA-L). Cited by: §II.
  • [8] F. Dellaert et al. GTSAM. Cited by: §I, §IV-B.
  • [9] D. Fourie, J. Leonard, and M. Kaess (2016) A nonparametric belief solution to the bayes tree. In Proc. of Int. Conf. on Intelligent Robots and Systems (IROS), pp. 2189–2196. Cited by: §II.
  • [10] M. Hsiao and M. Kaess (2019) MH-isam2: multi-hypothesis isam using bayes tree and hypo-tree. In Proc. of Int. Conf. on Robotics and Automation (ICRA), Montreal, QC, Canada, pp. 1274–1280. External Links: Document Cited by: §II.
  • [11] P. J. Huber (1964) Robust estimation of a location parameter. The Annals of Mathematical Statistics 35, pp. 73–101. Cited by: §II.
  • [12] B. Jian and B. C. Vemuri (2011) Robust Point Set Registration Using Gaussian Mixture Models. IEEE Trans. on Pattern Analysis and Machine Intelligence 33 (8), pp. 1633–1645. External Links: ISSN 0162-8828, Document Cited by: §IV-B, §IV-B.
  • [13] X. R. Li, Z. Zhao, and V. P. Jilkov (2002) Estimator’s credibility and its measures. In Proc. of Int. Federation of Automatic Control (IFAC), Cited by: §IV-B.
  • [14] K. Madsen, H.B. Nielsen, and O. Tingleff (2004) Methods for nonlinear least squares problems. Technical report, Technical University of Denmark. Cited by: §III-B.
  • [15] E. Olson and P. Agarwal (2013) Inference on networks of mixtures for robust robot mapping. Int. Journal of Robotics Research 32 (7), pp. 826–840. Cited by: Appendix B, §I, TABLE I, §II, §II, §III-A, §III-B, §IV.
  • [16] T. Pfeifer et al. libRSF. Cited by: §IV-A.
  • [17] T. Pfeifer, P. Weissig, S. Lange, and P. Protzel (2016) Robust factor graph optimization – a comparison for sensor fusion applications. In Proc. of Int. Conf. on Emerging Technologies and Factory Automation (ETFA), pp. 1–4. Cited by: §II.
  • [18] M. Pfingsthorn and A. Birk (2012) Simultaneous localization and mapping with multimodal probability distributions. The Int. Journal of Robotics Research 32 (2), pp. 143–171. Cited by: §II.
  • [19] D. M. Rosen, M. Kaess, and J. J. Leonard (2013) Robust incremental online inference over sparse factor graphs: beyond the Gaussian case. In Proc. of Int. Conf. on Robotics and Automation (ICRA), pp. 1025–1032. Cited by: Appendix B, §I, TABLE I, §II, §III-A, §III-B, §III-C, §IV.
  • [20] T. Stoyanov, M. Magnusson, H. Andreasson, and A. J. Lilienthal (2012) Fast and accurate scan registration through minimization of the distance between compact 3D NDT representations. The Int. Journal of Robotics Research 31 (12), pp. 1377–1393. Cited by: §II, §IV-B.
  • [21] N. Sünderhauf and P. Protzel (2012) Switchable constraints for robust pose graph slam. In Proc. of Int. Conf. on Intelligent Robots and Systems (IROS), pp. 1879–1884. Cited by: §II.
  • [22] H. Yang, P. Antonante, V. Tzoumas, and L. Carlone (2019) Graduated non-convexity for robust spatial perception: from non-minimal solvers to global outlier rejection. Robotics and Automation Letters (RA-L), pp. 1127–1134. Cited by: §V.