I Introduction
At the core of most control algorithms in robotics is a model that captures the relationship between the state, the input, and the dynamics of a robotic system. The model can be used to optimize a reward function and to ensure that the system achieves its goals in a safe and reliable way [1, 2]. If the model for the system is partially unknown, the reward function can incorporate an element to encourage exploration of the system dynamics [3, 4]. This establishes a better mapping between the state, input, and dynamics, such that the controller can later exploit well-known, high-reward actions [3]. An accurate assessment of the risk associated with taking a control action, especially if it has not been taken before, is important during the exploration process [5, 6]. Using an assessment of this risk to ensure safety is known as safe learning.
Safe learning methods generally incorporate an approximate initial guess for the system dynamics with some bounds on the modelling error incurred in the approximation [7, 8]. A learning term then refines the initial guess over time using experience data to better approximate the true dynamics. The goal is to guarantee that the system does not violate safety constraints (e.g., limits on the control input or path tracking error) while achieving the control objective (e.g., following a path) and, at the same time, improving the model of the system and, consequently, its task performance over time. Most learning algorithms learn a single model for the system dynamics or use multiple models that are trained ahead of time based on appropriate training data from operating the robot in all relevant conditions [7, 4, 9, 10, 11, 12]. This presents a challenge for robots that are deployed into a wide range of operating conditions which may not all be known ahead of time.
This paper builds on work in [7], which proposes a robust, learning-based Model Predictive Controller (MPC) for repetitive path following using a vision-based localization algorithm [13]. The controller developed in [7] uses local Gaussian Process (GP) regression to construct a model for the dynamics at each timestep. These models are valid over a small section of the path around the vehicle’s current position. They are based on a fixed number of training points, or experiences, and so have fixed computational cost. Our contribution is to generalize this approach for long-term safe learning with large and small, repeated changes in the dynamics that depend on the operating conditions or physical configuration of the robot (see Fig. 1). Examples are weather, terrain conditions [14], or payload configuration [15]. The proposed method builds local GPs of the robot dynamics on each previous run to select experiences from the run with the most similar dynamics. These experiences are used to construct the GP used for control. This allows the GP used for control to leverage knowledge related to changes in dynamics that are both sudden and gradual as long as a similar change has been observed in the past. We use the same GP hyperparameters as the GP in the controller. Accordingly, we introduce no additional model parameters and are able to assess how likely it is that the GP constructed from previous experiences will satisfy the assumptions of the safe controller, namely that the $3\sigma$ bounds on uncertainty are an accurate upper bound on model error. The local GPs used by our method are inexpensive to compute, enabling the robot to learn over and leverage data from a large number of runs without having to ‘forget’ previous experiences. This is a significant advantage over previous methods.
II Related Work
Learning control has received a great amount of attention in recent years, most notably in the case of single-mode learning control. This is the broad class of learning methods that assumes the true (but initially unknown) mapping between the state at time $k$, $\mathbf{x}_k$, the input, $\mathbf{u}_k$, and the next state, $\mathbf{x}_{k+1}$, is one-to-one, or at least normally distributed according to some underlying process. Recent developments have contributed safety guarantees [8] and demonstrated impressive results in improved path following [16].
Multimodal safe control and path planning have also received a growing amount of attention. Applications include safely gliding a parafoil under a variety of wind conditions [10] and planning safe paths among uncertain agents such as pedestrians or automobiles [12]. The assumption and challenge in these cases is that the environment or obstacles in the environment have hidden states that change the dynamics and cannot be measured directly. Similar to our method, the algorithms try to infer this hidden state based on available observations and use this for safe planning. An additional feature of our approach is that we attempt to build this model online and do not explicitly require discrete changes in dynamics.
Recent results in single-mode, safe learning control have taken great steps to improve performance while maintaining bounds on modelling error and therefore safety. Approaches by [16, 7, 17, 18, 19] use GPs as corrective terms for approximate prior models and update them over time as more experience is gathered. A GP assumes a single underlying function with additive Gaussian noise, which makes it an excellent tool when there is only one mode for the dynamics or the mode can be measured directly. Bounds on the model error are used to allocate margin on safety constraints such that the system is robust to this model error. It is essential for safety that the bounds from the GP, usually some multiple of the standard deviation, bound the true model error. It is essential for high performance that the bounds are not unnecessarily conservative. However, if there are multiple dynamic modes (e.g., caused by driving both on snow and asphalt), the single GP must have overly conservative bounds to account for the dynamics in all modes (which may be significantly different), and may learn some combination of the dynamics in each mode which is suboptimal in either mode. This is of particular concern to algorithms that update the GP online.
One model that exhibits especially good real-time performance and has been demonstrated in several real-world examples is presented in [7]. This approach continually reconstructs the GP disturbance model based on a fixed number of data points to ensure the process model can be evaluated in constant time even if new experience is added. Storing the data in first-in-first-out bins of fixed size allows the algorithm to update the data used in the GP in real time. If the mode changes, the model unlearns the existing mode by overwriting all of that data and relearns the new mode. During this process, it suffers from the same problems related to hyperparameters as mentioned above: it either requires overconservative bounds to accommodate multiple modes, or has bounds that are realistic for a single mode but unsafe while the model transitions between modes and is using data from more than one mode. Our method aims to overcome these limitations by only choosing data that is relevant to the current mode.
In addition to the singlemode, safe learning controllers, multimodal algorithms exist which identify a number of dynamic modes ahead of time using labelled or unlabelled training data and switch to the most likely model during operation [20, 10, 12, 21, 22]. This allows them to maintain persistent knowledge of a robot’s dynamics across a wide range of operating conditions. Inferring the correct mode from measurements during operation allows them to maintain a high level of performance and robustness even when the mode is not directly observed. The method proposed in [23] for linear systems even infers the number of modes at training time. These approaches do, however, require that the number of modes and/or training data from each mode be available ahead of time, which can be a challenging task in robotics. In contrast, our method does not require the number of modes or training data from each mode to be available ahead of time. Rather, it learns new dynamics as they arise during operation.
In our previous work, we presented an approach for learning multimodal dynamics by combining GPs and the Dirichlet Process, which is used in Bayesian nonparametric clustering models [15]. This allowed the robot to learn a new GP model for novel operating conditions and leverage an existing GP when the robot revisited an operating condition. The GPs used in [15] represented the dynamics over the entire region of the state space and therefore required a large number of training points to be effective. These GPs could therefore not be used directly in the controller, which is limited to GPs with only a small number of training points. The proposed method overcomes this by keeping the size of the GP used for control constant. In addition, the previous approach only used a relative measure of model quality. That is, it would only switch to the prior, safe mode if that described the current dynamics better than the existing set of GP experts. In this work, we add an explicit check to ensure that the assumptions made by the safe controller are likely to be valid.
In light of the current approaches and their limitations, the goal of this paper is to present a method for adapting to multiple dynamic modes over a long period of time with guarantees on safety using a realistic and computationally efficient representation of the system dynamics (including predictive uncertainty). The aim is to design an algorithm for lifelong model learning that achieves high performance in the relevant operating conditions regardless of whether they are known ahead of time.
III Problem Statement
The goal of this work is to learn a model for the dynamics of a ground robot performing a repetitive, path-following task. The robot may be subjected to large changes in its dynamics due to factors such as payload, terrain, weather, or tyre pressure changes. We assume that these factors cannot be measured directly. The algorithm should scale to long-term operation and take advantage of repeated runs in the same operating conditions. The model should also include a reasonable estimate of model uncertainty that acts as an upper bound on model error at all times.
Further assumptions can be summarized as follows:

The mapping can be modelled as a GP for a single run.

The operating condition is constant over a short time horizon.

The number of operating conditions and the mapping for each is not known ahead of time.
A short time horizon could be similar to the horizon considered for MPC.
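The system can be modelled by some nominal dynamics, $\mathbf{f}(\mathbf{x}_k, \mathbf{u}_k)$, with additive, initially unknown dynamics, $\mathbf{g}_c(\mathbf{a}_k)$, that are specific to discrete or continuous operating conditions, $c$, and depend on features, $\mathbf{a}_k$, so
$$\mathbf{x}_{k+1} = \mathbf{f}(\mathbf{x}_k, \mathbf{u}_k) + \mathbf{g}_c(\mathbf{a}_k). \tag{1}$$
The unknown dynamics are assumed to be a deterministic function with additive, zero-mean, Gaussian noise,
$$\mathbf{g}_c(\mathbf{a}_k) = \bar{\mathbf{g}}_c(\mathbf{a}_k) + \boldsymbol{\epsilon}_k, \tag{2}$$
where $\boldsymbol{\epsilon}_k \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$, and $\boldsymbol{\Sigma}$ is the measurement noise covariance.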
IV Methodology
In this section, we present our approach for long-term, safe learning control. Our approach makes extensive use of local GPs to model the robot dynamics.
IV-A Gaussian Process (GP) Disturbance Model
We model the unknown dynamics, $\mathbf{g}_c(\cdot)$, as a GP based on past observations. We drop the $c$ for notational convenience because we learn a GP for each operating condition separately. Since there are many good references on GPs [24], here we provide only a high-level sketch. The learned model depends on previously gathered experiences which are assembled from measurements of the state, denoted by $\hat{\mathbf{x}}_k$, and the input, $\mathbf{u}_k$, using (1), so that
$$\hat{\mathbf{g}}(\mathbf{a}_k) = \hat{\mathbf{x}}_{k+1} - \mathbf{f}(\hat{\mathbf{x}}_k, \mathbf{u}_k). \tag{3}$$
The resulting pair, $(\mathbf{a}_k, \hat{\mathbf{g}}(\mathbf{a}_k))$, forms an individual experience. For simplicity, we model each dimension of the disturbance using a separate GP. Below, we derive the equations for a single dimension of $\mathbf{g}(\cdot)$ denoted by $g(\cdot)$.
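A GP is a distribution over functions given past experiences, $\mathcal{D} = \{(\mathbf{a}_i, \hat{g}(\mathbf{a}_i))\}_{i=1}^{n}$, and kernel hyperparameters. We assume the experiences are noisy observations of the true function $g(\cdot)$; that is, $\hat{g}(\mathbf{a}) = g(\mathbf{a}) + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$. The posterior distribution is characterized by a mean and variance which can be queried at any point $\mathbf{a}_*$ using
$$\mu(\mathbf{a}_*) = \mathbf{k}(\mathbf{a}_*)\, \mathbf{K}^{-1} \hat{\mathbf{g}}, \tag{4}$$
$$\sigma^2(\mathbf{a}_*) = k(\mathbf{a}_*, \mathbf{a}_*) - \mathbf{k}(\mathbf{a}_*)\, \mathbf{K}^{-1} \mathbf{k}(\mathbf{a}_*)^T, \tag{5}$$
where $\hat{\mathbf{g}} = [\hat{g}(\mathbf{a}_1), \dots, \hat{g}(\mathbf{a}_n)]^T$ is the vector of observed function values, the covariance matrix $\mathbf{K}$ has entries $K_{ij} = k(\mathbf{a}_i, \mathbf{a}_j) + \sigma_n^2 \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta, and the vector $\mathbf{k}(\mathbf{a}_*) = [k(\mathbf{a}_*, \mathbf{a}_1), \dots, k(\mathbf{a}_*, \mathbf{a}_n)]$ contains the covariances between the new test point $\mathbf{a}_*$ and the observed data points $\mathbf{a}_1, \dots, \mathbf{a}_n$. For this work, we use the squared exponential kernel,
$$k(\mathbf{a}_i, \mathbf{a}_j) = \sigma_\eta^2 \exp\left(-\tfrac{1}{2} (\mathbf{a}_i - \mathbf{a}_j)^T \mathbf{M}^{-2} (\mathbf{a}_i - \mathbf{a}_j)\right), \tag{6}$$
because of its success in modelling robot dynamics [16, 6, 7, 18]. The hyperparameters are the diagonal matrix, $\mathbf{M}$, of length-scales which are inversely related to the importance of each element of $\mathbf{a}$, and the process noise variance, $\sigma_\eta^2$, which is the variance of the prior family of functions represented by $k(\cdot, \cdot)$.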
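The GP posterior with a squared exponential kernel described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the names `se_kernel`, `gp_posterior`, `lengthscales`, `sigma_eta`, and `sigma_n` are hypothetical, the GP is assumed to have zero prior mean, and the returned variance is that of the latent function without observation noise.

```python
import numpy as np


def se_kernel(A, B, lengthscales, sigma_eta):
    """Squared exponential kernel between rows of A (n x d) and B (m x d)."""
    # Scale each feature by its length-scale before taking squared distances.
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return sigma_eta**2 * np.exp(-0.5 * np.sum(diff**2, axis=-1))


def gp_posterior(A_train, g_train, A_query, lengthscales, sigma_eta, sigma_n):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    # Covariance of the training data, with noise variance on the diagonal.
    K = se_kernel(A_train, A_train, lengthscales, sigma_eta)
    K += sigma_n**2 * np.eye(len(A_train))
    # Covariances between each query point and the training points.
    k_star = se_kernel(A_query, A_train, lengthscales, sigma_eta)
    # Solve with a Cholesky factor instead of forming the inverse explicitly.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, g_train))
    mean = k_star @ alpha
    v = np.linalg.solve(L, k_star.T)
    var = sigma_eta**2 - np.sum(v**2, axis=0)
    return mean, var
```

Far from any training data the kernel terms vanish, so the sketch returns the prior mean of zero and the prior variance `sigma_eta**2`, which is the conservative behaviour the controller falls back on when no relevant experience is available.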
As training data is added to a particular GP, uncertainty is reduced and the posterior distribution of the GP specializes to a particular family of functions which represents the system dynamics in a particular operating condition. The job of the experience recommendation, which is the contribution of this work, is to choose training data that results in the GP specializing to the functions that represent the robot dynamics in the current operating condition.
IV-B Controller and System Model
For this work, we use the robust model predictive controller from [7] and refer the reader to this paper for details.
IV-C Data Management
The purpose of our method is to construct the best possible model of the system dynamics for MPC. MPC uses the dynamics over the upcoming section of the path to compute the control. We use data from the recently traversed part of the path to determine which past runs are relevant and can be used for the MPC prediction model. Referring to Fig. 2, for each previous run, $r$, we use data, $\mathcal{D}_r$, from the recently traversed section of the path to construct a local GP, $\mathcal{GP}_r$. Each of these GPs is then used to generate predictions for the mean and variance, $(\mu_r, \sigma_r^2)$, at each of the $\mathbf{a}$’s in recent experience from the current run, $\mathcal{D}_{\text{live}}$. These estimates are used to identify the most similar run to the current run and reject runs that are substantially different from the current run.
To update the control GP (see Fig. 1) with new points to model the dynamics on the upcoming section of the path, we randomly draw ten experiences (if available) from the most similar run in a window ahead of the vehicle overlapping with the MPC prediction horizon. These experiences are added to the set of experiences already in the control GP. Fifty experiences (if available) are then randomly chosen from this combined set to get a new set of experiences for the control GP [15]. If no runs are recommended, we randomly remove ten data points from the set in the GP used for control. If this happens several times in a row, the control GP will quickly revert to the prior, which acts as a ‘safe mode’ since its uncertainty bounds are the most conservative. We only use experiences from the most similar run; however, we could easily make use of experiences from multiple runs if not enough experience was available from one run.
The control GP uses fifty points to allow the controller to run at 10 Hz with a 1.5 second look-ahead. Updating the GP incrementally helps the model change smoothly, which in turn results in smoother control actions. We found this to be helpful during our experiments.
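The incremental update of the control GP's experience set described above can be sketched as follows. This is an illustrative sketch only: the function name `update_control_experiences` and its arguments are hypothetical, experiences are treated as opaque list items, and passing `None` for the window is an assumed way of signalling that no run was recommended.

```python
import random


def update_control_experiences(control_set, recommended_window,
                               n_draw=10, n_max=50, rng=random):
    """Return the new list of experiences for the control GP.

    control_set: experiences currently used by the control GP.
    recommended_window: experiences from the most similar run in the window
        ahead of the vehicle, or None if every run was rejected.
    """
    if recommended_window is None:
        # No run recommended: randomly drop up to n_draw points so that
        # repeated rejections move the GP back toward the prior.
        n_keep = max(len(control_set) - n_draw, 0)
        return rng.sample(control_set, n_keep)
    # Draw up to n_draw new experiences from the recommended run.
    drawn = rng.sample(recommended_window, min(n_draw, len(recommended_window)))
    # Merge with the existing set and subsample back down to at most n_max.
    combined = control_set + drawn
    return rng.sample(combined, min(n_max, len(combined)))
```

Because at most ten points enter or leave per update, a full set of fifty points is emptied after five consecutive rejections, which matches the gradual reversion to the prior described in the text.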
IV-D Run Rejection Criterion
Our first step is to eliminate runs where the dynamics are so different from the current dynamics that using data from these runs is likely to result in model errors that violate assumptions made by the safe controller. In particular, we must ensure that the bounds on the prediction from the GP are a reasonable upper bound on the model error [7].
To do this for candidate run $r$, we test whether the proportion of samples from the recent live-run data, $\mathcal{D}_{\text{live}}$, that lie further than $3\sigma_r$ from the corresponding predicted mean, $\mu_r$, is significantly higher than would be expected by chance. We do this using the binomial test.
Let $n$ be the number of input-output pairs in $\mathcal{D}_{\text{live}}$ and $m$ be the number of outliers, points outside of the predicted $3\sigma$ bounds, according to the predictions from the local GP for run $r$. The binomial distribution describes the probability of drawing exactly $m$ outliers from $n$ independent samples where the probability of drawing an outlier is $p$. We calculate the probability of $m$ or more outliers using
$$P(X \geq m) = \sum_{i=m}^{n} \binom{n}{i} p^{i} (1 - p)^{n-i}. \tag{7}$$
We reject the run if $P(X \geq m) < \alpha$, where $\alpha$ is the significance level, or the probability of falsely rejecting a run. We chose a fixed significance level, $\alpha$, for our experiments. This may be reduced to avoid falsely rejecting runs, or increased to be more conservative. Since $3\sigma$ is such a conservative bound, changing $\alpha$ only changes the allowed number of outliers by one or two for the sample sizes we consider, so the algorithm is not very sensitive to this parameter.
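The binomial test above can be sketched in plain Python. The sketch rests on two labelled assumptions: the outlier probability is taken to be the two-sided Gaussian tail mass beyond three standard deviations (about 0.0027), and the significance level is left as an explicit argument because its value is not stated here. The names `outlier_probability`, `binomial_tail`, and `reject_run` are hypothetical.

```python
from math import comb, erfc, sqrt


def outlier_probability(n_sigma=3.0):
    """Chance that a Gaussian sample lies outside +/- n_sigma (assumed p)."""
    return erfc(n_sigma / sqrt(2.0))


def binomial_tail(n, m, p):
    """Probability of m or more outliers among n independent samples."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(m, n + 1))


def reject_run(residuals, sigmas, alpha, n_sigma=3.0):
    """Return True if a candidate run produces significantly too many outliers.

    residuals: measured disturbance minus the candidate GP's predicted mean.
    sigmas: the candidate GP's predicted standard deviation at each sample.
    alpha: significance level of the test (a free parameter in this sketch).
    """
    n = len(residuals)
    # Count the samples that fall outside the predicted n_sigma bounds.
    m = sum(abs(r) > n_sigma * s for r, s in zip(residuals, sigmas))
    return binomial_tail(n, m, outlier_probability(n_sigma)) < alpha
```

Under these assumptions and with 30 samples, the tail probability is roughly 0.078 for one or more outliers and roughly 0.003 for two or more, so a wide range of significance levels leads to the same decision, in line with the low sensitivity noted above.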
Any runs that make it past this step are considered as candidates for drawing experiences to model the dynamics over the upcoming section of the path.
IV-E Run Similarity Measure
Our next step is to identify which runs are most similar to the current run given recent data from the live run, $\mathcal{D}_{\text{live}}$, and corresponding predictions from the local GP for each candidate run, $r$.
We assume that the vehicle can be in a different operating condition for each run, and that one set of GP hyperparameters is sufficient to describe the robot dynamics in each operating condition when operating conditions are considered separately. We can then compute the posterior probability that the current dynamics are from the same operating condition, $c_r$, as run $r$ using
$$p(c_r \mid \mathcal{D}_{\text{live}}) \propto p(\mathcal{D}_{\text{live}} \mid c_r)\, p(c_r). \tag{8}$$
The second term on the right is the prior, which we assume to be equal for all runs. The first term on the right is the probability of recent experiences if the operating conditions are the same as run $r$. Assuming each experience is independent, this is
$$p(\mathcal{D}_{\text{live}} \mid c_r) = \prod_{j=1}^{n} \mathcal{N}\left(\hat{g}(\mathbf{a}_j) \mid \mu_r(\mathbf{a}_j), \sigma_r^2(\mathbf{a}_j)\right), \tag{9}$$
where $\mu_r(\mathbf{a}_j)$ and $\sigma_r^2(\mathbf{a}_j)$ are the predicted mean and variance of the GP for run $r$ evaluated at point $\mathbf{a}_j$, which is in $\mathcal{D}_{\text{live}}$. Similar to our previous work [15], we reject any run that has lower probability than the GP prior of generating $\mathcal{D}_{\text{live}}$. This is to ensure that experience added to the GP for control is likely to improve the performance beyond what could be achieved with no experience at all.
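The similarity measure and the comparison against the GP prior can be sketched as below, working with log-likelihoods for numerical stability. This is a sketch under assumptions: the names `gaussian_log_likelihood` and `recommend_run` are hypothetical, the prior is modelled as a zero-mean Gaussian with a single variance `prior_variance`, and the equal run priors are dropped because they do not affect the ranking.

```python
import math


def gaussian_log_likelihood(targets, means, variances):
    """Sum of log N(target | mean, variance) over independent experiences."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (t - mu) ** 2 / var)
        for t, mu, var in zip(targets, means, variances)
    )


def recommend_run(targets, predictions, prior_variance):
    """Return the id of the most likely run, or None if none beats the prior.

    targets: measured disturbances from the recent part of the live run.
    predictions: dict mapping run id -> (means, variances) predicted by that
        run's local GP at the live-run inputs.
    prior_variance: variance of the zero-mean GP prior (an assumed stand-in
        for evaluating the prior at each input).
    """
    n = len(targets)
    # Score of the GP prior; a run must exceed this to be recommended.
    best_run = None
    best_score = gaussian_log_likelihood(targets, [0.0] * n, [prior_variance] * n)
    for run_id, (means, variances) in predictions.items():
        score = gaussian_log_likelihood(targets, means, variances)
        if score > best_score:
            best_run, best_score = run_id, score
    return best_run
```

In a fuller pipeline, the candidate runs passed in `predictions` would be those that already passed the run rejection test.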
In our implementation, we use the log-probability to avoid numerical issues and do not normalize the posterior since we are only interested in finding the most likely run and not the actual probability distribution. We denote the unnormalized log-probability for run $r$ with $\ell_r$. The run with the largest $\ell_r$ is chosen as the recommended run.
IV-F Overview of the Algorithm
Putting the components above together, we arrive at the experience recommendation algorithm, Alg. 1.
V Experiments
Experiments were conducted on a 900 kg Clearpath Grizzly skid-steer ground robot shown in Fig. 3. We tested our algorithm with the Grizzly in three configurations. First, the nominal configuration, with no changes to the vehicle. Second, the loaded configuration, with six bags of gravel, weighing approximately 30 kg each, in a cargo carrier mounted on the Grizzly (see Fig. 3). Finally, the altered configuration, where the rotational rate commands were multiplied by 0.7. Compared to the nominal configuration, the loaded configuration results in oversteer and the altered configuration results in understeer.
Our first experiment was conducted in a parking lot on a 42 m long course. During this experiment, the configuration was switched between the nominal and altered configurations. We compare our proposed method to a baseline method, which uses only experiences from the most recent run. We conducted three runs in each configuration to allow the baseline method to converge to the dynamics in each configuration before switching. This serves as a simple case to demonstrate a few features of our algorithm.
Our second experiment was also conducted in a parking lot on a similar course. However, we switched between all three configurations after only two runs in each, over a total of 30 runs, to compare our method to the baseline method during long-term operation. This was to demonstrate that the proposed method continues to learn during long-term operation and maintains high performance in all three configurations regardless of the order of configuration switches.
V-A Implementation
Our algorithm was implemented in C++ and can process up to 300 runs in a single thread at 2 Hz on an Intel i7 2.70 GHz 8-core processor with 16 GB of RAM. This number is extrapolated based on the fact that the computational cost of the proposed method scales linearly with the number of runs. We consider the last three seconds of data (30 samples) from the live run for the live-run data set, $\mathcal{D}_{\text{live}}$. The experience recommendation process runs in a separate thread to the controller. Therefore, it does not add any computation time to the control loop and can run at a different rate to process more runs. Our controller relies on a vision-based system, Visual Teach and Repeat [13], for localization.
V-B System Model and Controller Parameters
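Our process model is the unicycle with an additive GP learning term [7],
$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \\ \theta_{k+1} \end{bmatrix} = \begin{bmatrix} x_k + \Delta t\, v_k^{\text{cmd}} \cos\theta_k \\ y_k + \Delta t\, v_k^{\text{cmd}} \sin\theta_k \\ \theta_k + \Delta t\, \omega_k^{\text{cmd}} \end{bmatrix} + \mathbf{g}(\mathbf{a}_k), \tag{10}$$
where $v_k^{\text{cmd}}$ and $\omega_k^{\text{cmd}}$ are the commanded speed and turn rate, $x_k$ and $y_k$ are the position in the reference frame, $\theta_k$ is the orientation, and $\Delta t$ is the timestep of the controller. The cost function for MPC is a quadratic penalty on lateral error, heading error, and further terms, including one that depends on the desired speed, $v_{\text{des}}$. The respective weights are 500, 35, 5, 4, 1000 and 500. The desired speed was set at 1.5 m/s for all of our experiments.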
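A unicycle process model with an additive learned term, together with the residual that forms one experience, can be sketched as follows. This is a minimal illustration: the forward-Euler discretization, the function names `unicycle_step` and `disturbance_experience`, and the callable `disturbance` standing in for the GP mean are all assumptions, and heading angles are not wrapped.

```python
import math


def unicycle_step(state, command, dt, disturbance=None):
    """Propagate (x, y, theta) one timestep under (v_cmd, omega_cmd)."""
    x, y, theta = state
    v_cmd, omega_cmd = command
    # Nominal unicycle dynamics, discretized with a forward-Euler step.
    nominal = (
        x + dt * v_cmd * math.cos(theta),
        y + dt * v_cmd * math.sin(theta),
        theta + dt * omega_cmd,
    )
    if disturbance is None:
        return nominal
    # Additive learned term, e.g. the mean prediction of the control GP.
    g = disturbance(state, command)
    return tuple(n + gi for n, gi in zip(nominal, g))


def disturbance_experience(state, command, next_state, dt):
    """Residual between a measured next state and the nominal prediction."""
    nominal = unicycle_step(state, command, dt)
    return tuple(m - n for m, n in zip(next_state, nominal))
```

As a usage example, scaling the turn rate command by 0.7, as in the altered configuration, corresponds in this sketch to a heading disturbance of `-0.3 * dt * omega_cmd`, which `disturbance_experience` recovers from a pair of consecutive states.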
V-C Model Predictive Performance
In order to evaluate the quality of the models constructed using our method, we first compared the multi-step prediction performance of a GP constructed using our experience recommendation method to a GP constructed using experiences from the most recent run. We consider the rotational dynamics because they differ the most between configurations.
To measure the accuracy of the prediction of the mean, we use the Multi-step RMS Error (MRMSE) between the prediction made over the look-ahead horizon in MPC and the measured state at these times. To measure the accuracy of the error bound, we use the Multi-step RMS Z-score (MRMSZ) of the prediction at the future timesteps.
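The MRMSZ for a prediction of $N$ timesteps is
$$\text{MRMSZ} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \frac{\left(\hat{\omega}_{k+i} - \mu(\mathbf{a}_{k+i})\right)^2}{\sigma^2(\mathbf{a}_{k+i})}}, \tag{11}$$
where $\mu(\mathbf{a}_{k+i})$ and $\sigma(\mathbf{a}_{k+i})$ are the mean and standard deviation of the predicted rotational rate evaluated at the predicted GP input $\mathbf{a}_{k+i}$. The true measurements of angular velocity, $\hat{\omega}_{k+i}$, are used for comparison. An MRMSZ around one is ideal. A larger MRMSZ around two indicates that the model is overconfident and an MRMSZ less than one indicates that the model is conservative.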
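Both metrics are straightforward to compute from a predicted horizon and the measurements recorded afterwards. The sketch below uses hypothetical function names `mrmse` and `mrmsz` and assumes the three input sequences have equal, non-zero length.

```python
import math


def mrmse(measured, predicted_means):
    """Multi-step RMS error between measurements and predicted means."""
    n = len(measured)
    return math.sqrt(
        sum((w - mu) ** 2 for w, mu in zip(measured, predicted_means)) / n
    )


def mrmsz(measured, predicted_means, predicted_stds):
    """Multi-step RMS z-score: errors scaled by the predicted std. dev."""
    n = len(measured)
    return math.sqrt(
        sum(((w - mu) / s) ** 2
            for w, mu, s in zip(measured, predicted_means, predicted_stds)) / n
    )
```

For instance, if every prediction is off by exactly one predicted standard deviation, `mrmsz` returns one; halving the predicted standard deviations while keeping the same errors doubles it, which is the overconfident case.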
Figure 4 shows that the MRMSE after a transition is up to 2.5 times higher when using experiences from the last run compared to the proposed method. The MRMSZ shows a similar trend with the proposed method continually closer to one than the baseline method. We ignore run one for this comparison because the robot did not have any experience.
V-D Closed Loop Performance
To assess the impact of our method on closed loop performance, we compare the speed, control cost, and tracking error obtained with our method to those obtained when the GP is constructed using experiences from the last run.
Figure 5 shows that after transitions from the nominal configuration to the altered configuration, the proposed method significantly lowers the control cost compared to the baseline method. This cost comes from large lateral tracking errors due to understeering. We do not observe the same difference when transitioning to the nominal configuration because the vehicle will tend to oversteer in this case, and the penalty function in MPC penalizes the magnitude of the turn rate commands, which naturally prevents the vehicle from oversteering. Choosing a different control cost may result in an increased cost for transitioning the configuration either way.
V-E Experience Recommendation by Configuration
Figure 6 shows that the proposed method prefers experiences from the same configuration as the vehicle’s current configuration and relies heavily on experiences several runs in the past. This demonstrates that it is beneficial to store not only the most recent experiences, but also many experiences from the past in order to leverage experiences from the same configuration.
Some of the time, experiences are chosen from runs with a different configuration. Figure 7 shows that this is primarily along straight sections of the path where the vehicle is not turning and therefore the dynamics are similar. On all sections of the path with sharp corners where the dynamics are the most different between configurations, the proposed method prefers experiences from the same configuration.
The proportion of time that the proposed algorithm rejects all previous runs decreases rapidly over time. The highest proportion is during the first three runs (Nominal 1 in Fig. 6) where during the first run, there are no previous runs to draw experiences from, and during the second and third run, the algorithm rejects all runs 17% and 11% of the time, respectively. After this initial phase of adaptation, the algorithm manages to find relevant experience over 89% of the time for each run.
V-F Closed Loop Performance, Long Term Experiment
Our second experiment was to demonstrate the benefit of the proposed method during long term operation with frequent changes in the operating conditions. The configuration was switched every two runs giving the baseline method one opportunity to adapt before changing the configuration again.
Figure 8 shows the cumulative control cost over thirty runs of a similar course to the first experiment. On average, the proposed method results in a 37% lower cumulative control cost compared to the baseline method primarily due to transitions to the altered configuration, which is the most distinct of the three.
In addition, the number of times the algorithm rejected all candidate runs decreased dramatically after the initial adaptation just like in the first experiment. For the first three pairs of runs, all runs were rejected 55%, 19% and 4% of the time, respectively. After this, the algorithm found matching experiences 97% of the time with the exception of the second pair of runs in the altered configuration, where it rejected all runs 9% of the time.
VI Discussion
Our method requires that the GP is a good model for the dynamics in each configuration to work well. We conducted experiments in more challenging off-road environments and the GP did not achieve lower MRMSE than the prior even when learning only from runs in the same configuration. In these cases, our approach could not improve performance simply by changing the experiences in the model. It is important to note that during these runs, the vehicle remained within path tracking constraints, so the model error did not jeopardize the safety of the vehicle; it simply did not improve the performance. Improving the underlying controller in this way is beyond the scope of this paper; however, our method is applicable to any model-based controller that relies on local probabilistic models of the dynamics.
In addition, our method requires that the dynamics over the previous section of the path are a good indicator of the dynamics on the upcoming section of the path. The operating conditions are always sampled discretely (by run) but can be continuous (e.g., the factor by which we multiply turn rate commands). Although the method infers the operating condition by run, which is discrete, it will still work for continuous variables that vary consistently between runs.
Our method also requires large changes in the dynamics to produce a noticeable improvement. The dynamics in the loaded and nominal configurations were actually quite similar, so using data from one or the other did not produce large enough changes to be noticeable over the course of a run. If a new configuration with half the load were added, our method would automatically determine that it was similar to the loaded or nominal configurations and leverage data from runs in those configurations.
VII Conclusion
In this paper, we presented a new, principled method for experience recommendation for long-term, GP-based, safe learning control. We demonstrated in closed loop experiments how this method can be used to improve the performance of a controller conducting repeated traverses of a path when the dynamics switch between distinct configurations. This enables the controller to maintain high performance when revisiting operating conditions that have been seen before and safely learn new dynamics when new operating conditions are encountered.
In future work, we aim to address the case where the dynamics along the previous section of the path match two or more runs with different dynamics on the upcoming section of the path. The current approach would take a moment to disambiguate the two during which the prediction performance would degrade. Ideally, we would like to avoid this and pass on this information to the controller.
References
 [1] J. Kober, J. Bagnell, and J. Peters. Reinforcement Learning in Robotics: A Survey. Intl. Journal of Robotics Research (IJRR), 32(11):1238–1274, 2013.
 [2] F. Berkenkamp and A. P. Schoellig. Safe and Robust Learning Control with Gaussian Processes. In Proc. of the European Control Conference (ECC), pages 2501–2506, 2015.
 [3] T. Moldovan, S. Levine, M. Jordan, and P. Abbeel. Optimism-Driven Exploration for Nonlinear Systems. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 3239–3246, 2015.
 [4] C. Xie, S. Patil, T. Moldovan, S. Levine, and P. Abbeel. Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 504–511, 2016.
 [5] F. Berkenkamp, A. P. Schoellig, and A. Krause. Safe Controller Optimization for Quadrotors with Gaussian Processes. In Proc. of the Intl. Conference on Intelligent Robots and Systems (IROS), pages 491–496, 2016.
 [6] C. Ostafew, A. P. Schoellig, and T. Barfoot. Conservative to Confident: Treating Uncertainty Robustly Within Learning-Based Control. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 421–427, 2015.
 [7] C. Ostafew, A. P. Schoellig, and T. Barfoot. Robust Constrained Learning-based NMPC Enabling Reliable Mobile Robot Path Tracking. Intl. Journal of Robotics Research (IJRR), 35(13):1547–1563, 2016.
 [8] A. Aswani, H. Gonzalez, S. Sastry, and C. Tomlin. Provably Safe and Robust Learning-based Model Predictive Control. Automatica, 49(5):1216–1226, 2013.
 [9] G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou. Information Theoretic MPC for Model-Based Reinforcement Learning. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 1714–1721, 2017.
 [10] B. Luders, I. Sugel, and J. How. Robust Trajectory Planning for Autonomous Parafoils Under Wind Uncertainty. In Proc. of the AIAA Conference on Guidance, Navigation and Control, 2013.
 [11] Q. Li, J. Qian, Z. Zhu, X. Bao, M. K. Helwa, and A. P. Schoellig. Deep Neural Networks for Improved, Impromptu Trajectory Tracking of Quadrotors. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 5183–5189, 2017.
 [12] G. Aoude, B. Luders, J. Joseph, N. Roy, and J. How. Probabilistically Safe Motion Planning to Avoid Dynamic Obstacles with Uncertain Motion Patterns. Autonomous Robots, 35(1):51–76, 2013.
 [13] M. Paton, F. Pomerleau, K. MacTavish, C. Ostafew, and T. Barfoot. Expanding the Limits of Vision-based Localization for Long-term Route-following Autonomy. Journal of Field Robotics (JFR), 34(1):98–122, 2017.
 [14] A. Angelova. Visual Prediction of Rover Slip: Learning Algorithms and Field Experiments. PhD thesis, California Institute of Technology, 2008.
 [15] C. D. McKinnon and A. P. Schoellig. Learning Multi-Modal Models for Robot Dynamics with a Mixture of Gaussian Process Experts. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), 2017.
 [16] C. Ostafew, A. P. Schoellig, and T. Barfoot. Learning-based Nonlinear Model Predictive Control to Improve Vision-Based Mobile Robot Path-tracking in Challenging Outdoor Environments. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 4029–4036, 2014.
 [17] J. Gillula and C. Tomlin. Reducing Conservativeness in Safety Guarantees by Learning Disturbances Online: Iterated Guaranteed Safe Online Learning. In Proc. of Robotics: Science and Systems (RSS), pages 81–88, 2012.
 [18] P. Bouffard, A. Aswani, and C. Tomlin. Learning-based Model Predictive Control on a Quadrotor: Onboard Implementation and Experimental Results. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 279–284, 2012.
 [19] J. Mahler, S. Krishnan, M. Laskey, S. Sen, A. Murali, B. Kehoe, S. Patil, J. Wang, M. Franklin, P. Abbeel, et al. Learning Accurate Kinematic Control of Cable-driven Surgical Robots using Data Cleaning and Gaussian Process Regression. In Proc. of the Intl. Conference on Automation Science and Engineering (CASE), pages 532–539, 2014.
 [20] K. Jo, K. Chu, and M. Sunwoo. Interacting Multiple Model Filter-based Sensor Fusion of GPS with In-vehicle Sensors for Real-time Vehicle Positioning. Transactions on Intelligent Transportation Systems, 13(1):329–343, 2012.
 [21] R. Calandra, S. Ivaldi, M. Deisenroth, E. Rueckert, and J. Peters. Learning Inverse Dynamics Models with Contacts. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 3186–3191, 2015.
 [22] R. Pautrat, K. Chatzilygeroudis, and J. Mouret. Bayesian Optimization with Automatic Prior Selection for Data-efficient Direct Policy Search. arXiv:1709.06919, 2017.
 [23] E. Fox, E. Sudderth, M. Jordan, and A. Willsky. Nonparametric Bayesian Learning of Switching Linear Dynamical Systems. In Proc. of Advances in Neural Information Processing Systems (NIPS), pages 457–464, 2009.
 [24] C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.