Probabilistic state estimation is a core component of mobile robot navigation. While the estimation machinery is reasonably mature, there are robot model parameters that are difficult to determine from first principles and vary with each new platform and sensor. Our vision is to develop a learning framework that allows the deployment of a robot with arbitrary sensors onboard, and have it learn the model parameters required for estimation (and planning/control) solely from the sensor data. This can be viewed as a form of nonlinear system identification, although we will approach the problem using modern machine learning techniques.
In this paper, we show that we can learn the parameters of a nonlinear system in concert with a nonlinear batch state estimation framework, namely Exactly Sparse Gaussian Variational Inference (ESGVI) [1]. ESGVI exploits the fact that the joint likelihood between the observed measurements (data) and the latent state can be factored, which provides a family of scalable state estimation tools starting from a variational inference objective. To extend this to parameter learning, we use Expectation Maximization (EM). In the E-step, we fix all model parameters and optimize a bound on the data log-likelihood, the so-called Evidence Lower Bound (ELBO); this is equivalent to ESGVI latent state inference. In the M-step, we hold the latent state estimate fixed and optimize the ELBO for the parameters. Our method is general and applicable to any nonlinear system identification problem, even when the factorization of the joint likelihood has cycles (e.g., Simultaneous Localization and Mapping (SLAM)). Barfoot et al. [1] hint at the ESGVI extension to parameter learning, but do not demonstrate it in practice.
Our demonstration of parameter learning focuses on robot noise models. The noise models of the motion prior and observed measurements are often assumed to be known or tuned by trial and error. Our previous work demonstrated parameter learning for vehicle motion priors, but required accurate and complete (i.e., observation of the complete latent state) groundtruth [2]. However, collecting such groundtruth is often not possible or extremely expensive. We demonstrate the ability to learn these noise models from only noisy measurements. If groundtruth is available, we treat it simply as another (noisy) measurement that can be included in the framework. We also demonstrate that an Inverse-Wishart (IW) prior over the time-varying measurement covariances, using a Maximum A Posteriori (MAP) treatment in the variational setting, achieves outlier rejection in both parameter learning and latent state inference. We then demonstrate our parameter learning method on a real-world lidar dataset with 16 km of driving and validate on a 20 km long test set. We show that our parameter learning method is able to handle both noisy measurements and outliers during training and testing.
In summary, the main contribution of this paper is a detailed investigation and experimental demonstration of parameter learning as part of ESGVI. Our application focuses on trajectory estimation, where we show nonlinear system identification using noisy measurements, without groundtruth. We also include outlier rejection in the variational setting by placing an IW prior over covariances.
In Section II we review previous work. An overview of the ESGVI framework with parameter learning is provided in Section III. Section IV presents the noise models we use and how we learn their parameters. An experimental evaluation of our parameter learning method is presented in Section V. In Section VI, we provide concluding remarks and discuss future work.
II Related Work
System identification has been an active research area for decades [3, 4, 5, 6]. In the interest of space, we restrict our review of related work to the techniques that are most similar to our proposed approach. In the domain of parameter learning, the most common approach is to find parameters that maximize the likelihood of the data. One way to do this is to directly maximize the likelihood function with respect to the parameters [2, 7, 8]. This can be a difficult problem to solve, particularly when the model depends on missing or unobserved variables. In this case, an indirect approach can be taken by introducing a latent state to the problem, which can be estimated alongside the parameters. This is known as Expectation Maximization (EM), an iterative algorithm that alternates between optimizing for a distribution over the latent state and for the parameters.
Past work has shown how to estimate all the parameters of a linear dynamical system using EM, with Kalman smoothing in the E-step to update the states and analytic equations for the parameter updates in the M-step [9]. There have also been methods that attempt parameter learning for nonlinear systems with EM. Ghahramani and Roweis [10] learn a full nonlinear model using Gaussian Radial Basis Functions (RBFs) to approximate the nonlinear expectations that would otherwise be intractable to compute. This method was applied to learn a simple sigmoid nonlinearity. Other methods approximate the required expectation using particle smoothing or sigmapoint smoothing [7, 8, 12]. These methods, however, did not learn a full nonlinear model, but only learned parameters of a mostly predefined model (e.g., calibration parameters), and were tested only in simulation.
Unlike all these other methods, we use ESGVI within the EM parameter learning framework, which is a more general method not limited to problems with a specific factorization of the joint likelihood between the data and the latent state (e.g., smoothing problems with a block-tridiagonal inverse covariance). We also demonstrate a practical application of parameter learning by estimating the parameters of our motion prior and measurement noise models in a batch estimation framework.
While we are interested in batch estimation, previous work has investigated learning the noise model parameters of filters. Abbeel et al. [13] learn the noise model parameters of the Kalman filter offline. However, these parameters are assumed to be static and do not vary with time. One popular area of study that handles changing covariances is Adaptive Kalman Filtering, where the measurement covariance is updated in an online fashion based on the statistics of the measurement innovation [14, 15, 16]. The measurement covariance in these cases is updated based solely on the data seen during inference, whereas we incorporate a prior.
Ko and Fox [17] apply Gaussian process regression to learn robot measurement and motion models but, because they do not exploit sparsity, must resort to using their learned models in a filter estimation framework. We exploit sparsity for batch estimation. Recent methods take advantage of deep neural networks (DNNs) to learn the robot noise models [18, 19, 20], but in many cases require groundtruth to train the DNN. We bypass this requirement by simultaneously estimating a distribution over the latent state.
Barfoot et al. [1] show how to learn a constant covariance using ESGVI through EM. We learn a time-varying covariance by introducing an IW prior over our covariances. The IW distribution has been used as a prior over time-varying covariance matrices before, but the parameters have always been assumed to be known [21]. In our method, we seek to learn at least some of the parameters of the prior.
To the best of our knowledge, the work we present in this paper is the first attempt at wrapping parameter learning into the ESGVI framework to solve a practical problem, and shows that we can achieve a robust extension of ESGVI (with an outlier rejection scheme) by placing an IW prior on our measurement covariances. We also show comparable trajectory estimation performance between learning parameters with and without groundtruth.
III ESGVI with Parameter Learning
III-A Variational Setup
We begin with the maximum-likelihood problem for the given data, $\mathbf{y}$, which is expressed as

$\hat{\boldsymbol{\theta}} = \operatorname{argmax}_{\boldsymbol{\theta}}\, p(\mathbf{y} | \boldsymbol{\theta}),$

where $\boldsymbol{\theta}$ represents the parameters of our system that we wish to learn.
We define the loss that we wish to minimize as the negative log-likelihood of the data and introduce the latent state, $\mathbf{x}$. Applying the usual EM decomposition results in

$-\ln p(\mathbf{y} | \boldsymbol{\theta}) \leq \mathbb{E}_{q}\left[-\ln p(\mathbf{x}, \mathbf{y} | \boldsymbol{\theta})\right] + \mathbb{E}_{q}\left[\ln q(\mathbf{x})\right],$
where we define our approximate posterior as a multivariate Gaussian distribution, $q(\mathbf{x}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. We now proceed iteratively in two steps, the expectation step (E-step) and the maximization step (M-step). (We are working with the negative log-likelihood, so we are technically applying Expectation Minimization, but the acronym stays the same.)
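To make the E-step/M-step alternation concrete, here is a minimal scalar sketch (not the ESGVI implementation, and with made-up numbers): a linear-Gaussian model whose measurement variance is learned from the data alone, alternating an exact Gaussian E-step with a closed-form M-step.

```python
import math
import random

random.seed(1)

# Toy linear-Gaussian model (hypothetical numbers, not the paper's system):
# latent x_k ~ N(0, P), measurement y_k = x_k + n_k with n_k ~ N(0, R_true).
# EM learns R from the measurements alone, without groundtruth for x.
P, R_true, K = 4.0, 0.5, 5000
xs = [random.gauss(0.0, math.sqrt(P)) for _ in range(K)]
ys = [x + random.gauss(0.0, math.sqrt(R_true)) for x in xs]

R = 5.0  # deliberately poor initial guess
for _ in range(50):
    # E-step: with R fixed, the Gaussian posterior q(x_k) = N(mu_k, sigma2) is exact.
    sigma2 = P * R / (P + R)
    mus = [P / (P + R) * y for y in ys]
    # M-step: with q fixed, the optimal R is the expected squared error under q,
    # R = (1/K) sum_k E_q[(y_k - x_k)^2] = (1/K) sum_k ((y_k - mu_k)^2 + sigma2).
    R = sum((y - mu) ** 2 for y, mu in zip(ys, mus)) / K + sigma2

print(R)  # converges to the maximum-likelihood estimate, close to R_true
```

The posterior here is exact because the model is linear; in the nonlinear setting, ESGVI replaces the E-step with the sparse Newton-style Gaussian updates summarized below.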
As commonly done in the EM framework, in both the E-step and the M-step, we optimize the upper bound term in (III-A), which is also known as the (negative) Evidence Lower Bound (ELBO). Using the expression for the entropy of a Gaussian, $\mathbb{E}_{q}[\ln q(\mathbf{x})] = -\frac{1}{2}\ln\left((2\pi e)^{N}|\boldsymbol{\Sigma}|\right)$ (with $N$ the dimension of $\mathbf{x}$), and dropping constants, the upper bound term is written as the loss functional of ESGVI,

$V(q | \boldsymbol{\theta}) = \mathbb{E}_{q}\left[\phi(\mathbf{x}, \boldsymbol{\theta})\right] + \frac{1}{2}\ln\left(|\boldsymbol{\Sigma}^{-1}|\right),$

where we define $\phi(\mathbf{x}, \boldsymbol{\theta}) = -\ln p(\mathbf{x}, \mathbf{y} | \boldsymbol{\theta})$, $\mathbb{E}_{q}[\cdot]$ is the expectation conditioned on the distribution $q(\mathbf{x})$, and $|\cdot|$ is the matrix determinant. We drop $\mathbf{y}$ from the notation for convenience, as our expectation is over $\mathbf{x}$.
Taking the derivatives of the loss functional with respect to $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, Barfoot et al. [1] developed a Newton-style iterative optimizer to update our estimate of $q(\mathbf{x})$. We summarize the optimization scheme here as

$\left(\boldsymbol{\Sigma}^{(i+1)}\right)^{-1} = \sum_{k=1}^{K} \mathbf{P}_{k}^{T}\, \mathbb{E}_{q_{k}^{(i)}}\left[\frac{\partial^{2} \phi_{k}}{\partial \mathbf{x}_{k} \partial \mathbf{x}_{k}^{T}}\right] \mathbf{P}_{k}, \quad \text{(4a)}$

$\left(\boldsymbol{\Sigma}^{(i+1)}\right)^{-1} \delta\boldsymbol{\mu} = -\sum_{k=1}^{K} \mathbf{P}_{k}^{T}\, \mathbb{E}_{q_{k}^{(i)}}\left[\frac{\partial \phi_{k}}{\partial \mathbf{x}_{k}}\right], \qquad \boldsymbol{\mu}^{(i+1)} = \boldsymbol{\mu}^{(i)} + \delta\boldsymbol{\mu}, \quad \text{(4b)}$
where the superscript $(i)$ denotes variables at the $i$th iteration. We have exploited the factorization of the joint log-likelihood into $K$ factors as

$\phi(\mathbf{x}, \boldsymbol{\theta}) = \sum_{k=1}^{K} \phi_{k}(\mathbf{x}_{k}, \boldsymbol{\theta}). \quad \text{(3)}$
For generality we have each factor, $\phi_{k}(\mathbf{x}_{k}, \boldsymbol{\theta})$, affected by the entire parameter set, $\boldsymbol{\theta}$, but in practice it can be a subset. $\mathbf{P}_{k}$ is a projection matrix that extracts $\mathbf{x}_{k}$ from $\mathbf{x}$: $\mathbf{x}_{k} = \mathbf{P}_{k}\mathbf{x}$.
The marginal of $q(\mathbf{x})$ associated with $\mathbf{x}_{k}$ is $q_{k}(\mathbf{x}_{k}) = \mathcal{N}\left(\mathbf{P}_{k}\boldsymbol{\mu}, \mathbf{P}_{k}\boldsymbol{\Sigma}\mathbf{P}_{k}^{T}\right)$.
Critical to the efficiency of the ESGVI framework is the ability to compute the required marginals in (4a) and (4b) without ever constructing the complete (dense) covariance matrix, $\boldsymbol{\Sigma}$. A sparse solver based on the method of Takahashi et al. [22] is used to achieve this in [1].
The expectations in (4a) and (4b) can be approximated using Gaussian cubature samples (e.g., sigmapoints) of the marginal posterior. Importantly, approximating the expectations at only the mean of the posterior is equivalent to MAP batch optimization with Newton's method. Barfoot et al. [1] also provide a derivative-free optimization scheme using only Gaussian cubature, which we do not show here.
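As a concrete sketch of this approximation (with made-up numbers), the following compares a one-dimensional sigmapoint estimate of $\mathbb{E}_{q}[\phi(x)]$ against the exact value and against evaluating at the mean only, which corresponds to the MAP-equivalent behaviour noted above.

```python
import math

def sigmapoint_expectation(phi, mu, sigma2, kappa=2.0):
    """Approximate E[phi(x)] for x ~ N(mu, sigma2) with 1-D sigmapoints."""
    s = math.sqrt((1.0 + kappa) * sigma2)
    pts = [mu, mu + s, mu - s]
    wts = [kappa / (1.0 + kappa), 0.5 / (1.0 + kappa), 0.5 / (1.0 + kappa)]
    return sum(w * phi(x) for w, x in zip(wts, pts))

mu, sigma2 = 1.0, 0.25
phi = lambda x: 0.5 * x * x          # a quadratic factor, e.g., (1/2) e(x)^2

exact = 0.5 * (mu * mu + sigma2)     # E[x^2]/2 = (mu^2 + sigma^2)/2
approx = sigmapoint_expectation(phi, mu, sigma2)
map_only = phi(mu)                   # evaluating at the mean only (MAP-style)

print(approx, exact, map_only)       # cubature is exact for quadratics; MAP drops sigma2
```

For quadratic factors the cubature rule is exact, while the mean-only evaluation ignores the posterior covariance entirely; for general nonlinear factors, cubature gives a higher-order approximation.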
In the M-step, we hold $q(\mathbf{x})$ fixed and optimize the upper bound for the parameters, $\boldsymbol{\theta}$. We can optimize for $\boldsymbol{\theta}$ by taking the derivative of the loss functional as follows:

$\frac{\partial V(q | \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \sum_{k=1}^{K} \mathbb{E}_{q_{k}}\left[\frac{\partial \phi_{k}(\mathbf{x}_{k}, \boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right].$
In the last step, the expectation reduces from being over the full Gaussian, $q(\mathbf{x})$, to the marginal associated with the variables in each factor, $q_{k}(\mathbf{x}_{k})$. We can then set the derivative to zero and isolate $\boldsymbol{\theta}$ for a critical point, formulating an M-step. If isolation is not possible, we can use the gradient in (III-A) for a partial M-step, which is known as Generalized Expectation Maximization (GEM) [23].
III-B Alternate Loss Functional
In the E-step, we hold $\boldsymbol{\theta}$ fixed and optimize for the best possible Gaussian fit. Barfoot et al. [1] present an alternate, Gauss-Newton-style loss functional for when the negative log-likelihood takes the form

$\phi_{k}(\mathbf{x}_{k}) = \frac{1}{2}\mathbf{e}_{k}(\mathbf{x}_{k})^{T}\mathbf{W}_{k}^{-1}\mathbf{e}_{k}(\mathbf{x}_{k}) + \frac{1}{2}\ln\left(|\mathbf{W}_{k}|\right),$

where $\mathbf{W}_{k}$ is now a covariance matrix and $\mathbf{e}_{k}(\mathbf{x}_{k})$ is an error function. With Jensen's inequality [24], and dropping the second term since $\ln\left(|\mathbf{W}_{k}|\right)$ is a constant in the E-step, we can write

$\mathbb{E}_{q_{k}}\left[\frac{1}{2}\mathbf{e}_{k}(\mathbf{x}_{k})^{T}\mathbf{W}_{k}^{-1}\mathbf{e}_{k}(\mathbf{x}_{k})\right] \geq \frac{1}{2}\mathbb{E}_{q_{k}}\left[\mathbf{e}_{k}(\mathbf{x}_{k})\right]^{T}\mathbf{W}_{k}^{-1}\mathbb{E}_{q_{k}}\left[\mathbf{e}_{k}(\mathbf{x}_{k})\right].$

Motivated by this relationship, we can define a new loss functional for the E-step as

$V'(q) = \sum_{k=1}^{K}\frac{1}{2}\mathbb{E}_{q_{k}}\left[\mathbf{e}_{k}(\mathbf{x}_{k})\right]^{T}\mathbf{W}_{k}^{-1}\mathbb{E}_{q_{k}}\left[\mathbf{e}_{k}(\mathbf{x}_{k})\right] + \frac{1}{2}\ln\left(|\boldsymbol{\Sigma}^{-1}|\right),$

which is a conservative approximation of the original loss functional, appropriate for mild nonlinearities and/or concentrated posteriors. The alternate loss functional is simpler to implement in practice as it does not require the second derivative of the factors. (Another alternative is the derivative-free optimization scheme with cubature sampling, at the cost of requiring more cubature samples, i.e., more computation. A derivative-free scheme for the alternate loss functional is also possible [1].) Also note how evaluating the expectation only at the mean of the posterior is equivalent to MAP Gauss-Newton.
IV Parameter Learning for Robot Noise Models
IV-A Constant Covariance
Barfoot et al. [1] outline parameter learning (M-step) for constant covariance noise models, which we summarize here. Our loss functional is

$V(q | \mathbf{W}) = \mathbb{E}_{q}\left[\phi(\mathbf{x}, \mathbf{W})\right] + \frac{1}{2}\ln\left(|\boldsymbol{\Sigma}^{-1}|\right).$

This expression is similar to (9), but we can exploit the factorization of $\phi(\mathbf{x}, \mathbf{W})$ to write

$V(q | \mathbf{W}) = \sum_{k=1}^{K}\mathbb{E}_{q_{k}}\left[\frac{1}{2}\mathbf{e}_{k}(\mathbf{x}_{k})^{T}\mathbf{W}^{-1}\mathbf{e}_{k}(\mathbf{x}_{k})\right] + \frac{K}{2}\ln\left(|\mathbf{W}|\right) + \frac{1}{2}\ln\left(|\boldsymbol{\Sigma}^{-1}|\right),$

where the $K$ factors (in practice, it could be a subset) are affected by the unknown parameter $\mathbf{W}$, a constant covariance matrix. Evaluating the derivative, as shown in (III-A), with respect to $\mathbf{W}^{-1}$ and setting it to zero for computing a minimum results in the optimal $\mathbf{W}$:

$\mathbf{W} = \frac{1}{K}\sum_{k=1}^{K}\mathbb{E}_{q_{k}}\left[\mathbf{e}_{k}(\mathbf{x}_{k})\mathbf{e}_{k}(\mathbf{x}_{k})^{T}\right], \quad \text{(14)}$

which can be approximated with Gaussian cubature if the error functions, $\mathbf{e}_{k}(\mathbf{x}_{k})$, are nonlinear. Alternatively, we can choose to linearize at the mean of the posterior marginal, $q_{k}(\mathbf{x}_{k}) = \mathcal{N}(\boldsymbol{\mu}_{k}, \boldsymbol{\Sigma}_{kk})$, resulting in the following M-step:

$\mathbf{W} = \frac{1}{K}\sum_{k=1}^{K}\left(\mathbf{e}_{k}(\boldsymbol{\mu}_{k})\mathbf{e}_{k}(\boldsymbol{\mu}_{k})^{T} + \mathbf{G}_{k}\boldsymbol{\Sigma}_{kk}\mathbf{G}_{k}^{T}\right), \qquad \mathbf{G}_{k} = \left.\frac{\partial \mathbf{e}_{k}}{\partial \mathbf{x}_{k}}\right|_{\boldsymbol{\mu}_{k}}.$
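A scalar sketch of this covariance update, using a hypothetical nonlinear error function and made-up posterior marginals, comparing the cubature approximation of the expected squared error against the linearized alternative:

```python
import math

# One M-step covariance update, W = (1/K) sum_k E_q[e_k(x_k) e_k(x_k)^T],
# for a hypothetical scalar nonlinear error e(x), comparing cubature against
# linearization. All posterior marginals below are made-up numbers.

def e(x):       # hypothetical nonlinear error function
    return math.sin(x) - 0.1

def de(x):      # its Jacobian
    return math.cos(x)

# Posterior marginals q_k = N(mu_k, sigma2_k), as produced by an E-step.
marginals = [(0.2, 0.01), (-0.1, 0.02), (0.4, 0.015)]

def cubature_e2(mu, sigma2, kappa=2.0):
    """Sigmapoint approximation of E[e(x)^2] for x ~ N(mu, sigma2)."""
    s = math.sqrt((1 + kappa) * sigma2)
    pts = [(kappa / (1 + kappa), mu),
           (0.5 / (1 + kappa), mu + s),
           (0.5 / (1 + kappa), mu - s)]
    return sum(w * e(x) ** 2 for w, x in pts)

W_cubature = sum(cubature_e2(mu, s2) for mu, s2 in marginals) / len(marginals)

# Linearized alternative: E[e^2] ~= e(mu)^2 + de(mu) * Sigma * de(mu).
W_linear = sum(e(mu) ** 2 + de(mu) * s2 * de(mu) for mu, s2 in marginals) / len(marginals)

print(W_cubature, W_linear)  # nearly identical for these mildly nonlinear factors
```

With small posterior variances the two approximations agree closely; cubature becomes preferable as the nonlinearity or the posterior spread grows.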
IV-B White-Noise-On-Acceleration Prior on Latent State
We next demonstrate parameter learning for the situation where the covariance of a factor is indirectly estimated through another quantity. Consider the example of a white-noise-on-acceleration (WNOA) motion prior on the latent state, where the parameter we wish to estimate is the power-spectral density matrix, $\mathbf{Q}_{c}$. More specifically, let us study the application of the prior in $SE(3)$ [25, 26], which is defined as follows:

$\dot{\mathbf{T}}(t) = \boldsymbol{\varpi}(t)^{\wedge}\mathbf{T}(t), \qquad \dot{\boldsymbol{\varpi}}(t) = \mathbf{w}(t), \qquad \mathbf{w}(t) \sim \mathcal{GP}\left(\mathbf{0}, \mathbf{Q}_{c}\,\delta(t - t')\right),$

where $\mathbf{T}(t)$ is the pose expressed as a transformation matrix, $\boldsymbol{\varpi}(t)$ is the body-centric velocity, $\mathbf{w}(t)$ is a zero-mean, white-noise Gaussian process, and the operator, $(\cdot)^{\wedge}$, transforms an element of $\mathbb{R}^{6}$ into a member of the Lie algebra, $\mathfrak{se}(3)$. The state at time $t_{k}$ is $\mathbf{x}_{k} = \{\mathbf{T}_{k}, \boldsymbol{\varpi}_{k}\}$, and similarly, $\mathbf{x}_{k-1:k}$ is the state at two consecutive times, $t_{k-1}$ and $t_{k}$.
We express the factors of our loss functional from (3), but with only the WNOA prior factors for simplicity. The covariance of the prior, $\mathbf{Q}_{k}$, is defined as

$\mathbf{Q}_{k} = \begin{bmatrix} \frac{\Delta t_{k}^{3}}{3} & \frac{\Delta t_{k}^{2}}{2} \\ \frac{\Delta t_{k}^{2}}{2} & \Delta t_{k} \end{bmatrix} \otimes \mathbf{Q}_{c},$

where $\Delta t_{k} = t_{k} - t_{k-1}$ and $\otimes$ is the Kronecker product. Solving for the derivative with respect to $Q_{c,ij}$, the $ij$th matrix element of $\mathbf{Q}_{c}$, involves the marginal posterior at two consecutive times, $q_{k-1:k}(\mathbf{x}_{k-1:k})$. Setting the derivative to zero then yields the optimal estimate of our parameter.
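The Kronecker structure of $\mathbf{Q}_{k}$ is easy to verify numerically; here is a sketch with a hypothetical 2-DOF $\mathbf{Q}_{c}$ (the paper's state uses a 6-DOF power-spectral density):

```python
import numpy as np

# Build the WNOA prior covariance Q_k = [[dt^3/3, dt^2/2], [dt^2/2, dt]] (x) Q_c,
# where (x) is the Kronecker product. Q_c here is a made-up 2-DOF example.
def wnoa_Q(dt, Qc):
    blocks = np.array([[dt ** 3 / 3.0, dt ** 2 / 2.0],
                       [dt ** 2 / 2.0, dt]])
    return np.kron(blocks, Qc)

Qc = np.diag([0.1, 0.2])
Q = wnoa_Q(0.5, Qc)

print(Q.shape)  # (4, 4)
```

Because the $2\times 2$ time-interval matrix and $\mathbf{Q}_{c}$ are both positive definite, their Kronecker product is too, so $\mathbf{Q}_{k}$ is a valid covariance for any $\Delta t_{k} > 0$.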
IV-C Inverse-Wishart Prior on Covariance
We further extend covariance estimation by incorporating a prior. In order to do so, we redefine our joint likelihood as $p(\mathbf{x}, \mathbf{y}, \mathbf{R} | \boldsymbol{\theta})$, where now we also include the covariances, $\mathbf{R} = \{\mathbf{R}_{1}, \dots, \mathbf{R}_{K}\}$, as random variables. We also redefine our posterior estimate to be

$q(\mathbf{x}, \mathbf{R}) = q(\mathbf{x})\, q(\mathbf{R}),$

a product between a Gaussian, $q(\mathbf{x}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, and a posterior distribution for the covariances, $q(\mathbf{R})$.
The upper bound term in the EM decomposition of (III-A) can now be written as

$\mathbb{E}_{q(\mathbf{x}, \mathbf{R})}\left[-\ln p(\mathbf{x}, \mathbf{y}, \mathbf{R} | \boldsymbol{\theta})\right] + \mathbb{E}_{q(\mathbf{x})}\left[\ln q(\mathbf{x})\right] + \mathbb{E}_{q(\mathbf{R})}\left[\ln q(\mathbf{R})\right].$
We define the posterior over the covariances as

$q(\mathbf{R}) = \delta(\mathbf{R} - \mathbf{R}^{*}),$

where $\delta(\cdot)$ is the Dirac delta function (interpreted as a probability density function) and $\mathbf{R}^{*}$ is the set of optimal covariances. The upper bound now simplifies to

$\mathbb{E}_{q(\mathbf{x})}\left[-\ln p(\mathbf{x}, \mathbf{y}, \mathbf{R}^{*} | \boldsymbol{\theta})\right] + \mathbb{E}_{q(\mathbf{x})}\left[\ln q(\mathbf{x})\right] + \mathbb{E}_{q(\mathbf{R})}\left[\ln q(\mathbf{R})\right],$

where we have abused notation and write $\mathbf{R}$ for $\mathbf{R}^{*}$, and similarly will later write $\mathbf{R}_{k}$ for $\mathbf{R}_{k}^{*}$. We view our selection of the delta function as a convenient way of showing how we can approximate a Gaussian distribution for the trajectory and a MAP approximation of the covariances in a single variational framework. The last term is the differential entropy of a Dirac delta function, and because it is independent of our variational parameters, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, we choose to drop it from our loss functional.
We assume the joint likelihood factors into individual measurement likelihoods, $p(\mathbf{y}_{k} | \mathbf{x}_{k}, \mathbf{R}_{k})$, and covariance priors, $p(\mathbf{R}_{k} | \boldsymbol{\theta})$. We apply an IW prior over our covariances by defining

$p(\mathbf{R}_{k} | \boldsymbol{\theta}) = \frac{|\boldsymbol{\Psi}|^{\nu/2}}{2^{\nu d/2}\,\Gamma_{d}\left(\frac{\nu}{2}\right)}\, |\mathbf{R}_{k}|^{-\frac{\nu + d + 1}{2}} \exp\left(-\frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Psi}\mathbf{R}_{k}^{-1}\right)\right),$

where $d$ is the dimension of $\mathbf{R}_{k}$, $\boldsymbol{\Psi}$ is the scale matrix, $\nu$ is the degrees-of-freedom (DOF), and $\Gamma_{d}(\cdot)$ is the multivariate Gamma function. The IW distribution has been used as a prior over covariance matrices before, which led to outlier rejection at inference [27, 28], but the parameters of the prior were assumed to be known. We choose to estimate the scale matrix parameter, $\boldsymbol{\Psi}$, and leave the degrees-of-freedom as a metaparameter.
Now we define our factors as

$\phi_{k}(\mathbf{x}_{k}, \mathbf{R}_{k}, \boldsymbol{\theta}) = -\ln p(\mathbf{y}_{k} | \mathbf{x}_{k}, \mathbf{R}_{k}) - \ln p(\mathbf{R}_{k} | \boldsymbol{\theta}).$

Dropping constant terms, the loss functional can finally be written as

$V(q | \boldsymbol{\theta}) = \sum_{k=1}^{K}\left(\mathbb{E}_{q_{k}}\left[\frac{1}{2}\mathbf{e}_{k}(\mathbf{x}_{k})^{T}\mathbf{R}_{k}^{-1}\mathbf{e}_{k}(\mathbf{x}_{k})\right] + \frac{\nu + d + 2}{2}\ln\left(|\mathbf{R}_{k}|\right) + \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Psi}\mathbf{R}_{k}^{-1}\right) - \frac{\nu}{2}\ln\left(|\boldsymbol{\Psi}|\right)\right) + \frac{1}{2}\ln\left(|\boldsymbol{\Sigma}^{-1}|\right).$
In the E-step, we hold $\boldsymbol{\theta}$ fixed and optimize for $q(\mathbf{x})$ and $\mathbf{R}^{*}$. For each covariance, we take the derivative of the loss functional with respect to $\mathbf{R}_{k}^{-1}$ and set it to zero, which results in

$\mathbf{R}_{k} = \frac{\boldsymbol{\Psi} + \mathbb{E}_{q_{k}}\left[\mathbf{e}_{k}(\mathbf{x}_{k})\mathbf{e}_{k}(\mathbf{x}_{k})^{T}\right]}{\nu + d + 2}, \quad \text{(34)}$

where we see the optimal $\mathbf{R}_{k}$ is a weighted average between the mode of the IW distribution and the optimal static covariance estimate from (14) at a single marginal factor. Since our E-step in ESGVI is already iterative, we can seamlessly extend it by applying (34) as iteratively reweighted least squares (IRLS).
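A scalar ($d = 1$) sketch of this IRLS behaviour with made-up numbers: the covariance update inflates $R_{k}$ for the outlier, which downweights it in the reweighted estimate. (The posterior-variance term inside the expectation is omitted for brevity.)

```python
# Toy IRLS illustration of the IW-prior covariance update: for scalar errors,
# R_k = (psi + E[e_k^2]) / (nu + d + 2) grows for outliers, which downweights
# them in the next least-squares solve. All numbers below are made up.
psi, nu, d = 0.1, 4.0, 1

ys = [1.0, 1.1, 0.9, 1.05, 8.0]   # last measurement is an outlier
mu = sum(ys) / len(ys)            # initial estimate, corrupted by the outlier
Rs = [1.0] * len(ys)

for _ in range(20):
    # Covariance update (E-step style, posterior variance omitted).
    Rs = [(psi + (y - mu) ** 2) / (nu + d + 2) for y in ys]
    # Reweighted least-squares update of the estimate with the new covariances.
    wsum = sum(1.0 / R for R in Rs)
    mu = sum(y / R for y, R in zip(ys, Rs)) / wsum

print(mu)  # pulled toward the inlier cluster near 1.0; the outlier's R is large
```

The alternation converges quickly here: the outlier's covariance settles at a value orders of magnitude above the inliers', so its influence on the estimate becomes negligible.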
In the M-step, we hold $q(\mathbf{x})$ and $\mathbf{R}^{*}$ fixed and optimize for $\boldsymbol{\Psi}$, which we accomplish by taking the derivative of the loss functional with respect to $\boldsymbol{\Psi}$ and setting it to zero, yielding

$\boldsymbol{\Psi} = \nu K \left(\sum_{k=1}^{K} \mathbf{R}_{k}^{-1}\right)^{-1}.$
Even with this update in the M-step, we found that our optimization scheme was still ill-posed, and our covariance estimates tended toward the positive-definite boundary (i.e., the zero matrix). We propose constraining the determinant of $\boldsymbol{\Psi}$ to be a constant, $c$, which can be thought of as constraining the volume of the uncertainty ellipsoid of the corresponding measurements to be fixed. We accomplish this by scaling the latest update as follows:

$\boldsymbol{\Psi} \leftarrow \left(\frac{c}{|\boldsymbol{\Psi}|}\right)^{1/d} \boldsymbol{\Psi}.$
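The scaling step is a one-liner in practice; a sketch with a made-up $2\times 2$ scale matrix:

```python
import numpy as np

# Rescale the scale-matrix update so det(Psi) equals a fixed constant c:
# |a * Psi| = a^d * |Psi|, so a = (c / |Psi|)^(1/d) gives |a * Psi| = c.
def fix_determinant(Psi, c):
    d = Psi.shape[0]
    a = (c / np.linalg.det(Psi)) ** (1.0 / d)
    return a * Psi

Psi = np.array([[2.0, 0.3],
                [0.3, 1.5]])        # hypothetical M-step output
Psi_c = fix_determinant(Psi, c=1.0)

print(np.linalg.det(Psi_c))  # 1.0 up to floating-point rounding
```

Only the overall scale changes; the shape (eigenvector directions and eigenvalue ratios) of the uncertainty ellipsoid is preserved.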
We then rely on the noise models of the other factors (e.g., the motion prior) to adapt to our selection of $c$ during training.
V Experimental Validation
To evaluate our parameter learning method, we work with the vehicle dataset collected and used in our previous work [2]. The dataset consists of 36 km of driving, with Velodyne VLS-128 lidar data and an Applanix POS-LV positioning system. An image of the experimental vehicle setup is shown in Figure 2. There are two sources of 6-DOF vehicle pose measurements. The first is from the POS-LV system, which we treat as groundtruth. The second is from a lidar localization system from Applanix, which localizes the lidar data to a prebuilt high-definition map.
We use Route A (map available at https://tinyurl.com/rrjgxaj), our 16 km long training set, to learn the parameters of our noise models. For inference, we perform a batch trajectory optimization on Route B (map available at https://tinyurl.com/r5m78nq), our 20 km long test set, using the learned noise model parameters of our motion prior and measurements.
V-A Training With and Without Groundtruth
In Experiment A, our first experiment, we only use the lidar localization measurements to learn our model parameters (training without groundtruth). As a benchmark, we also learn another set of model parameters where we additionally include groundtruth poses in our training (training with incomplete groundtruth). This is different from our previous work [2], where the training method required groundtruth of the entire state (training with complete groundtruth), which for our problem setup is pose and body-centric velocity. Additionally, in that work, the measurement covariances were assumed to be known and not learned.
The loss functional corresponding to this experiment combines the WNOA prior factors, the groundtruth factors (when available), and the lidar measurement factors with an IW prior over the covariances; see (18), (13), (IV-C), and (32) for their definitions.
The WNOA error function (required for the prior factors) is shown in (19), and the error function for pose measurements is defined as

$\mathbf{e}_{m,k} = \ln\left(\mathbf{T}_{m,k}\mathbf{T}_{k}^{-1}\right)^{\vee},$

where $\mathbf{T}_{m,k}$ is the measured pose at time $t_{k}$ and $(\cdot)^{\vee}$ is the inverse of the $(\cdot)^{\wedge}$ operator. The estimation problem in this experiment can be represented by the factor graph in Figure 1, where we can train with or without the groundtruth factors, which are shown inside the dashed box.
We choose to fix the parameters $\nu$ and $c$, and learn the parameters $\mathbf{Q}_{c}$, the groundtruth measurement covariance (when groundtruth is available), and $\boldsymbol{\Psi}$. For both sets of learned parameters, we then perform trajectory estimation on our test set, where we only use the lidar localization measurements with our learned covariance model and our learned motion prior.
Figure 3 shows the error plots where we have trained without groundtruth for our estimated $x$, $y$, and $z$ positions, along with their covariance envelopes. As can be seen, the errors consistently remain within the covariance envelopes. We do, however, note that our estimator appears to be underconfident. We believe that this is a result of our decision to constrain the determinant of $\boldsymbol{\Psi}$ in order for our training method to work in practice. This decision is analogous to fixing the volume of the covariance ellipsoid to be constant. In doing so, we relied on the learned covariance of the motion prior to adjust relative to the measurement covariances. The posterior mean is unaffected by this choice, but the posterior covariance is not.
Table I shows the resulting mean translational errors from both training methods on all test sequences. We also include the results from our previous work where we trained using complete groundtruth for comparison.
While we achieve very similar errors across all training methods, the benefit is that we now do not require any groundtruth. None of the three training methods outperforms the others. We believe this is because our lidar localization measurements are quite accurate relative to groundtruth.
To further validate our method and show that we can indeed train with noisy measurements, we artificially add noise to the measurements, where the noise statistics are unknown to the training process. We use the perturbation scheme of [27, 29] to inject zero-mean Gaussian noise into the position portion of our pose measurements. We vary the noise standard deviation from 0.25 m to 1 m, injecting the same amount of noise into the test measurements and training measurements.
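A sketch of this kind of noise injection, perturbing only the translation of a homogeneous-transform pose measurement (the scheme of [27, 29] perturbs the full $SE(3)$ pose on the group; the pose and noise level below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Inject zero-mean Gaussian noise (std sigma, in metres) into the position
# portion of a 4x4 homogeneous-transform pose measurement. The rotation block
# is left untouched in this simplified sketch.
def inject_position_noise(T, sigma):
    T_noisy = T.copy()
    T_noisy[:3, 3] += rng.normal(0.0, sigma, size=3)
    return T_noisy

T = np.eye(4)
T[:3, 3] = [10.0, -2.0, 0.5]        # hypothetical measured position
T_noisy = inject_position_noise(T, sigma=0.25)

print(T_noisy[:3, 3] - T[:3, 3])    # perturbation on the order of sigma
```

Because the injected noise statistics are withheld from training, the learned covariances must absorb the extra uncertainty, which is exactly what Table II probes.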
Table II shows how our test errors change with increasing noise on the measurements in both our training and test sets. While the measurement error increases significantly, up to over 1.6 m, we are still able to achieve translational errors of below 0.5 m on our estimated trajectory. This shows that we can still learn reasonable parameters of our system without any groundtruth and with quite noisy measurements.
V-B Training and Testing With Measurement Outliers
In Experiment B, we show that our formulation of an IW prior on our measurement covariances results in outlier rejection. We artificially introduce outliers into our training and test sets: with 5% probability, we apply a large perturbation to the actual pose measurement.
Figure 4 shows an example of the measurement outliers on sequence 3 of our test set.
We now seek to compare the performance between the case where we treat the measurement covariance, $\mathbf{R}$, as a static parameter to be learned, and the case where we treat the measurement covariance at each time as a random variable and learn the parameter, $\boldsymbol{\Psi}$, of the IW prior.
The loss functional for the static measurement covariance uses the constant-covariance measurement factors of Section IV-A, whereas the loss functional for the IW prior on the measurement covariances is the same as in Experiment A.
Table III shows the resulting translational errors on our test trajectory. We can see that without the IW prior, the estimation framework fails to reject outliers, resulting in an overall translation error of above 5 m. However, using the IW prior, we see that the error is only 0.2365 m. When we did not have any outliers at all in our data, the error was 0.2335 m (Table I), meaning the average translational error on our test set only increased by 0.003 m.
From this experiment, we can see that using the IW prior allows for the handling of outliers in both training and testing.
V-C Training Without and Testing With Measurement Outliers
In Experiment B, we included outlier measurements in both the training and test set and saw that the IW prior allows us to achieve comparable errors to the case with no outliers. To see if this still holds even when we do not see any outliers in training, in Experiment C we train without any outliers but test with outliers. As the only difference between Experiment B and Experiment C is that we now train without any outliers, the loss functionals remain the same.
Table III shows that the resulting translational errors are again very high when we simply learn a static measurement covariance, but that we can still achieve reasonably low errors when learning the parameters of our IW prior. By incorporating the IW prior instead of learning a static measurement covariance, we decrease error from above 6 m to 0.2355 m. Compared to the error of 0.2335 m when there are no outliers in our test set (Table I), we see an increase in error of only 0.002 m with the IW prior. This experiment shows that we can indeed still benefit from the outlier rejection scheme that comes with using an IW prior even when there are no outliers in our training set.
Comparing the results from Experiments B and C, we see that in both cases, incorporating the IW prior helps to reject outliers in the test set, regardless of whether there were any outliers in the training set. While errors for the static measurement covariance were similarly poor in both experiments, we note that the concentration of the errors is different, as shown in Figure 5. When we train without any outliers (Experiment C), the errors are concentrated where the outliers are in the test set. This result is unsurprising given that no outliers were seen in training, which in turn is reflected in our learned noise model parameters. When we train with outliers (Experiment B), the errors still peak around the outliers, but are more spread out over the entire trajectory. Regardless of this difference, we can see that using the IW prior is robust to both cases and still results in low translational errors.
VI Conclusions and Future Work
In this paper, we presented parameter learning for ESGVI. We showed that our parameter learning method does not need groundtruth, and is robust to noisy measurements and outliers. This is desirable because in many cases, we do not have a way of obtaining accurate groundtruth of robot trajectories and must estimate robot parameters based on whatever sensors are available. We experimentally demonstrated our method on a 36 km vehicle dataset.
However, we still assumed two parameters to be known: $\nu$, the DOF parameter for the IW distribution, and $c$, the determinant constraint on the scale matrix, $\boldsymbol{\Psi}$. For future work, we will investigate how to additionally learn $\nu$ and eliminate the need to constrain the determinant of $\boldsymbol{\Psi}$ to be a constant, $c$.
In this work, we chose to learn the noise model parameters as a useful practical application of our framework. However, our future intention with ESGVI is to learn entire robot models that are represented by rich modelling techniques, such as DNNs.
This work was supported by Applanix Corporation and the Natural Sciences and Engineering Research Council of Canada (NSERC). The test vehicle was donated by General Motors (GM) Canada.
-  T. D. Barfoot, J. R. Forbes, and D. J. Yoon, “Exactly sparse Gaussian variational inference with application to derivative-free batch nonlinear state estimation,” arXiv:1911.08333 [cs.RO].
-  J. N. Wong, D. J. Yoon, A. P. Schoellig, and T. D. Barfoot, “A data-driven motion prior for continuous-time trajectory estimation on SE(3),” RA-L, vol. 5, no. 2, pp. 1429–1436, 2020.
-  K. J. Åström and P. Eykhoff, “System identification—a survey,” Automatica, vol. 7, no. 2, pp. 123–162, 1971.
-  M. Gevers, “A personal view of the development of system identification: A 30-year journey through an exciting field,” IEEE Control systems magazine, vol. 26, no. 6, pp. 93–105, 2006.
-  G. Kerschen, K. Worden, A. F. Vakakis, and J.-C. Golinval, “Past, present and future of nonlinear system identification in structural dynamics,” Mechanical systems and signal processing, vol. 20, no. 3, pp. 505–592, 2006.
-  L. Fu and P. Li, “The research survey of system identification method,” in Intl. Conf. on Intelligent Human-Machine Systems and Cybernetics, vol. 2. IEEE, 2013, pp. 397–401.
-  J. Kokkala, A. Solin, and S. Särkkä, “Expectation maximization based parameter estimation by sigma-point and particle smoothing,” in Intl. Conf. on Information Fusion. IEEE, 2014, pp. 1–8.
-  ——, “Sigma-point filtering and smoothing based parameter estimation in nonlinear dynamic systems,” Journal of Advances in Information Fusion, vol. 11, no. 1, pp. 15–30, 2016.
-  R. H. Shumway and D. S. Stoffer, “An approach to time series smoothing and forecasting using the EM algorithm,” Journal of Time Series Analysis, vol. 3, no. 4, pp. 253–264, 1982.
-  Z. Ghahramani and S. T. Roweis, “Learning nonlinear dynamical systems using an EM algorithm,” in NIPS, 1999, pp. 431–437.
-  T. B. Schön, A. Wills, and B. Ninness, “System identification of nonlinear state-space models,” Automatica, vol. 47, no. 1, pp. 39–49, 2011.
-  M. Gašperin and D. Juričić, “Application of unscented transformation in nonlinear system identification,” IFAC Proceedings Volumes, vol. 44, no. 1, pp. 4428–4433, 2011.
-  P. Abbeel, A. Coates, M. Montemerlo, A. Y. Ng, and S. Thrun, “Discriminative training of Kalman filters,” in RSS, vol. 2, 2005, p. 1.
-  A. Mohamed and K. Schwarz, “Adaptive Kalman filtering for INS/GPS,” Journal of Geodesy, vol. 73, no. 4, pp. 193–203, 1999.
-  C. Hu, W. Chen, Y. Chen, D. Liu, et al., “Adaptive Kalman filtering for vehicle navigation,” Journal of Global Positioning Systems, vol. 2, no. 1, pp. 42–47, 2003.
-  Y. Yang and W. Gao, “An optimal adaptive Kalman filter,” Journal of Geodesy, vol. 80, no. 4, pp. 177–183, 2006.
-  J. Ko and D. Fox, “Learning GP-BayesFilters via Gaussian process latent variable models,” Autonomous Robots, vol. 30, no. 1, pp. 3–23, 2011.
-  M. Brossard, A. Barrau, and S. Bonnabel, “AI-IMU dead-reckoning,” arXiv preprint arXiv:1904.06064, 2019.
-  K. Liu, K. Ok, W. Vega-Brown, and N. Roy, “Deep inference for covariance estimation: Learning Gaussian noise models for state estimation,” in ICRA, 2018. IEEE, 2018, pp. 1436–1443.
-  R. L. Russell and C. Reale, “Multivariate uncertainty in deep learning,” arXiv preprint arXiv:1910.14215, 2019.
-  A. G. Wilson and Z. Ghahramani, “Generalised Wishart processes,” in Proc. of Conf. on Uncertainty in Artificial Intelligence, ser. UAI’11. Arlington, Virginia, USA: AUAI Press, 2011, pp. 736–744.
-  K. Takahashi, J. Fagan, and M.-S. Chen, “A sparse bus impedance matrix and its application to short circuit study,” in Proc. of the PICA Conference, 1973.
-  C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
-  J. L. W. V. Jensen, “Sur les fonctions convexes et les inégalités entre les valeurs moyennes,” Acta mathematica, vol. 30, pp. 175–193, 1906.
-  T. D. Barfoot, C. H. Tong, and S. Särkkä, “Batch Continuous-Time Trajectory Estimation as Exactly Sparse Gaussian Process Regression,” in RSS. Citeseer, 2014.
-  S. Anderson and T. D. Barfoot, “Full STEAM Ahead: Exactly Sparse Gaussian Process Regression for Batch Continuous-Time Trajectory Estimation on SE(3),” in IROS, 2015. IEEE, 2015, pp. 157–164.
-  T. D. Barfoot, State Estimation for Robotics. Cambridge, 2017.
-  V. Peretroukhin, W. Vega-Brown, N. Roy, and J. Kelly, “PROBE-GK: Predictive robust estimation using generalized kernels,” in RA-L. IEEE, 2016, pp. 817–824.
-  T. D. Barfoot and P. T. Furgale, “Associating Uncertainty With Three-Dimensional Poses for Use in Estimation Problems,” IEEE Trans. on Robotics, vol. 30, no. 3, pp. 679–693, 2014.