I Introduction
At the core of most control algorithms is a model that captures the relationship between the state, input, and dynamics of a robotic system. During tasks such as repetitive path following, which are common in mining [1], agriculture, and logistics, a robot continuously gathers data which can be used to refine this model. If model errors are repetitive, learning-based control can leverage data gathered over long periods of time to reduce the errors [2]. This can be combined with fast adaptation to respond quickly to novel scenarios [3, 4]. An accurate assessment of the risk associated with taking a control action, especially if it has not been taken before, is important during this process [5]. Using an assessment of this risk to ensure safety is known as safe learning.
Safe learning methods generally incorporate an approximate initial guess for the system dynamics with some bounds on the modelling error incurred in the approximation [5, 6]. A learning term then refines the initial guess over time using data to better approximate the true dynamics. The goal is to guarantee that the system does not violate safety constraints (e.g., limits on the control input or path tracking error) while achieving the control objective (e.g., following a path) and, at the same time, improving the model of the system and, consequently, its task performance over time. Most learning algorithms learn a single model for the system dynamics or use multiple models that are trained ahead of time based on appropriate training data from operating the robot in all relevant conditions [5, 7, 8, 9, 10, 11]. This presents a challenge for robots that are deployed into a wide range of operating conditions which may not all be known ahead of time.
In our previous work [2], we used Gaussian Processes (GPs) to learn the robot dynamics in a number of different operating conditions by leveraging experience gathered over multiple traverses of a path. However, we found that they have a number of limitations that make them difficult to apply in a wide range of operating conditions. First, they are computationally expensive, which limits the number of training points that can be used in the model for control [2]. This limits the region of the input space over which the GP is accurate. Second, it was difficult to identify a set of hyperparameters that resulted in good closed-loop performance. For this reason, we used a fixed set of hyperparameters, which limited the range of operating conditions where the learning was effective. Third, using a GP assumes that the unknown dynamics are globally homoscedastic, even though we only fit the model locally along the path, which further limits the effectiveness of a GP-based approach.
In this paper, we propose a new approach to address these limitations by using weighted Bayesian Linear Regression (BLR) to model the actuator dynamics locally along the path. First, BLR is computationally inexpensive to fit and evaluate. This enables us to use more previous experience to learn repetitive model errors and current data to adapt quickly to novel operating conditions. Second, the model uncertainty does not depend on hyperparameters that must be pre-selected, and so it can be adapted locally to each section of the path to allow for globally heteroscedastic model errors. This allows us to adapt to a wide range of operating conditions. Finally, the linear function does not introduce nonlinearity into the dynamics, which is beneficial for our Model Predictive Control (MPC)-based control strategy. In this paper, we also show how the model can be combined with Tube MPC to double the lookahead horizon of our previous approach to three seconds.
II Related Work
Learning control has received a great amount of attention in recent years, most notably in the case of single-mode learning control. This is the broad class of learning methods that assumes the true (but initially unknown) mapping from the state and control input at one time step to the state at the next time step is one-to-one or, at least, normally distributed according to some underlying process. Recent developments have contributed controller designs with safety guarantees [6] and demonstrated impressive results in improved path following [5, 2].
Most single-mode learning controllers use a nonlinear learning term to improve performance of a linear or nonlinear prior model over time using data [6, 5]. Common learning terms include Gaussian Processes (GPs) and various forms of locally linear regression [12, 13, 14, 5, 15, 16]. The main challenge with these methods is that they require many parameters to be identified ahead of time, and the performance depends greatly on the effort put into tuning these parameters [15]. This is problematic if we want to deploy a robotic system in a wide range of environments which may introduce operating conditions that require the parameters to be adjusted. In contrast to these approaches, we learn a linear model for the actuator dynamics with a nonlinear model for the robot dynamics. This greatly simplifies the learning problem, allowing us to generalize to a wide range of operating conditions while still controlling a nonlinear system. In addition, our approach is computationally efficient and does not introduce additional nonlinearity into the robot dynamics, which is beneficial for the optimization in our MPC control strategy.
In addition to single-mode learning control, multi-modal algorithms exist, which identify a number of dynamic modes ahead of time using labelled or unlabelled training data and switch to the most likely model during operation [17, 9, 11, 18, 19]. This allows them to maintain persistent knowledge of a robot’s dynamics across a wide range of operating conditions. They infer the correct mode from measurements during operation to maintain a high level of performance even when the mode is not directly observed. These approaches do, however, require that the number of modes and/or training data from each mode are available ahead of time, which can be a challenging task in real-world robotics applications. Our previous approach [2] addressed this by selecting data from the relevant operating condition and constructing the model online. However, to maintain real-time capability, we had to choose a few points from the most relevant previous run to include in the model and so did not make full use of all past experience. In addition, we handled new scenarios by reverting to a conservative model rather than adapting quickly. In this work, we use all previous data over the upcoming section of the path and incorporate fast adaptation to adapt quickly to new scenarios.
To adapt quickly to new scenarios, some authors have proposed a combination of fast adaptation and long-term learning [4, 3]. In [4], an adaptive controller keeps the system behaving like a constant reference system in the presence of non-repetitive disturbances and changes in the dynamics, while the (single-mode) iterative learning compensates for repetitive path tracking errors. The authors in [3] also used a fast, adaptive, linear term to adapt to new scenarios but used neural network priors for the robot dynamics as their long-term learning. This approach was able to achieve impressive results in experiment; however, the neural networks required several hours of training data, were not adapted online, and are not in themselves probabilistic models. In this work, we use fast adaptation and long-term learning in one, unified, probabilistic framework. Our long-term learning starts acting after the first run and implicitly learns errors local to each section of the path in a way that is tailored specifically to predictive control.
In light of the current approaches and their limitations, the contributions of the paper are (i) to present a model learning framework that supports fast adaptation and long-term learning, and is tailored to predictive control; (ii) to incorporate that model (and its model uncertainty estimate) in a safe and robust predictive control scheme; and (iii) to demonstrate the advantage of fast adaptation and long-term learning in path tracking experiments over challenging terrain.
III Problem Statement
The goal of this work is to learn a probabilistic model for the dynamics of a ground robot performing a repetitive task, and show how it can be integrated with a state-of-the-art path following controller for high performance control while maintaining a quantitative measure of safety. The robot may be subjected to changes in its dynamics due to factors such as payload, terrain, or tyre pressure. We assume that these factors cannot be measured directly. A good algorithm should scale to long-term operation, take advantage of repeated runs in the same operating conditions, and adapt quickly to new operating conditions. The model must include a reasonable estimate of model uncertainty that acts as an upper bound on model error at all times.
Further assumptions are that the plant has locally linear actuator dynamics, and the number of operating conditions and dynamics in each operating condition are not known ahead of time. Locally linear means that the true plant dynamics, , where is the actual actuation output, are well approximated by a function of the form:
(1) 
over the lookahead horizon used by the predictive controller, where is the state of the plant, is a vector containing the state, , and the control input that drives that actuator, , is a matrix of weights we wish to learn, and is additive, zero-mean, Gaussian noise with covariance matrix . The subscript, , indicates the time step. We have replaced with , where and are learned. This allows us to keep the dominant part of the nonlinear plant dynamics and greatly simplifies the learning problem.
The system is constrained by polytopic state and input constraints:
(2) 
We assume a Gaussian belief over the state at each time step and enforce constraints probabilistically using a chance-constrained formulation so that the probability of violating state and input constraints is kept below an acceptable threshold. Since enforcing these constraints jointly can lead to undesirable, conservative behaviour, we enforce them individually; see [12] for a detailed explanation.
IV Methodology
In this section, we present our approach for long-term, safe learning control with fast adaptation. Our approach makes extensive use of BLR to model the system dynamics. We assume a known nonlinear model for the plant with unknown linear actuator dynamics. We use weighted BLR to determine the coefficients of the actuator dynamics model and a measure of run similarity to determine the data weights. This allows us to compute the posterior for the model parameters in closed form, avoiding iterative approaches such as [13], which also optimizes the data weights. We then formulate the control problem as a Tube MPC problem following work in [12, 20] but using a modified ancillary controller.
IV-A Weighted Bayesian Linear Regression
In this section, we give a brief overview of weighted BLR, which is used to learn the actuator dynamics. It is an extension of BLR, as presented in [21], and a modification of [13], where we assume a known data weighting. The main advantages of BLR are that it is computationally inexpensive and does not introduce additional nonlinearity into the model, which is beneficial for MPC. We introduce BLR for a scalar system. Multiple scalar systems can be combined to get an equation of the form in (1).
We consider locally linear models of the form:
(3) 
to model the dynamics of each actuator, where and .
A dataset will consist of pairs , each with a scalar weight , where a weight of 0 means the point has no influence on the solution, and a weight of 1 means the point is fully included in the regression. Given a weighted dataset, , and the model (3), the likelihood of , assuming each data point is independent, is:
(4) 
where is a vector of stacked , and is a matrix with rows .
With this likelihood, the conjugate prior is a Normal Inverse Gamma (NIG) distribution, which gives us the following priors for and [21]:
(5)
(6)
where is the prior mean for the weights, is a prior inverse sum of squares of , and and are the parameters of the Inverse Gamma distribution, which are proportional to the effective number of data points in the prior and times the prior output variance.
The likelihood, (4), can be manipulated into a NIG distribution over so that (5) and (6) form a conjugate prior and the posterior joint distribution over and is:
(7)
where,
(8)  
(9)  
(10)  
(11) 
where is the trace operator and is a diagonal matrix of the data weights.
The posterior marginals are then:
(12)  
(13) 
where is a Student t distribution. This gives us all of the components we need to make predictions of the state at future timesteps. It is important to note that while the uncertainty in decreases as more data is added, the mean value for can increase or decrease to reflect the data. The model uncertainty is then passed to the robust controller. This is in contrast to a GP where the uncertainty only decreases to a value determined by the hyperparameters as data is added.
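As an illustration of the closed-form update described above, the following is a minimal sketch of weighted BLR with a Normal Inverse Gamma prior. It uses the standard conjugate update written in precision form, so `Lam0` is assumed to be the inverse of the "prior inverse sum of squares" matrix. All names (`weighted_blr_update`, `solve`, `w0`, `Lam0`, `a0`, `b0`, `l`) are chosen for this example and are not taken from the paper, whose exact notation in (8)-(11) may differ.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def weighted_blr_update(A, y, l, w0, Lam0, a0, b0):
    """Return the NIG posterior parameters (w_N, Lam_N, a_N, b_N).

    A: list of feature rows, y: scalar targets, l: data weights in [0, 1].
    Lam0 is a precision-like matrix: cov(w | sigma^2) = sigma^2 * inv(Lam0).
    """
    d = len(w0)
    # Lam_N = Lam0 + A^T L A, with L = diag(l)
    LamN = [[Lam0[i][j] + sum(li * a[i] * a[j] for a, li in zip(A, l))
             for j in range(d)] for i in range(d)]
    # w_N solves Lam_N w_N = Lam0 w0 + A^T L y
    rhs = [sum(Lam0[i][j] * w0[j] for j in range(d))
           + sum(li * a[i] * yi for a, yi, li in zip(A, y, l))
           for i in range(d)]
    wN = solve(LamN, rhs)

    def quad(w, Lam):
        return sum(w[i] * Lam[i][j] * w[j] for i in range(d) for j in range(d))

    aN = a0 + 0.5 * sum(l)  # tr(L) / 2: effective number of points
    bN = b0 + 0.5 * (sum(li * yi * yi for yi, li in zip(y, l))
                     + quad(w0, Lam0) - quad(wN, LamN))
    return wN, LamN, aN, bN
```

In this sketch a point with weight 0 leaves every posterior parameter unchanged, while a point with weight 1 contributes exactly as in unweighted BLR, which matches the role of the data weights described in the text.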
IV-B Data Management
The purpose of our method is to construct the best possible model of the system dynamics for MPC. MPC uses the dynamics over the upcoming section of the path to compute the control input. We use data from the recently traversed part of the path to determine which past runs are most similar to the current run. We then use data from these runs over the upcoming section of the path to build the prediction model for MPC. Since we are interested in learning dynamics of the form (3), individual experiences correspond to pairs .
Referring to Fig. 2, for each previous run, , we use data, , from the recently traversed section of the path to construct a local model for the actuator dynamics, . Each is then used to generate predictions for the mean and variance of the actuator output, , at each of the recent experiences from the current run, . These estimates are used to identify the most similar run to the current run and reject runs that are substantially different from the current run.
IV-B1 Fast Adaptation
This section describes how we use data from the current run to quickly adapt to non-repetitive changes or new dynamics that the system has not experienced before. This was not feasible in our previous approach, based on GPs, since the number of points in each GP model was limited due to computational constraints and so we focused entirely on modelling the dynamics along the upcoming section of the path using data from previous runs [2].
We recursively update the model parameters using data from the live run while keeping the strength of the prior fixed at . The value of determines how many effective data points we attribute to the prior. A large value for results in a smoother estimate for the model parameters, while a smaller value for allows the system to adapt to sudden changes more quickly. To update the model, we use the most recent data point from the live run and equations (8)-(11) with a data weight of 1.0. Then, we re-weight the model parameters using:
(14)  
(15) 
which is equivalent to constructing a new model using the sufficient statistics of the existing model weighted by a factor of . This can be derived by using (8)-(11), a non-informative prior, and , where is an identity matrix of the appropriate size.
This allows the model to quickly adapt to the dynamics on the current run. This is effective for changes to the dynamics such as adding a large payload and driving on a uniform surface since the model parameters are relatively constant for the entire run. The new model parameters are used as the prior for the long-term learning update and as the prior for the next timestep.
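The recursive update with a fixed prior strength can be sketched for a scalar model y = w a + noise as follows. This is an illustration only: the re-weighting factor `n0 / (n0 + 1)`, the function name `adapt_step`, and the state layout are assumptions made for this example, chosen so that the effective number of points stays at `n0` after every step; the paper's (14) and (15) may use a different but equivalent form.

```python
def adapt_step(state, a, y, n0):
    """One recursive update for the scalar model y = w * a + noise.

    state = (w, lam, alpha, beta): mean weight, precision-like term,
    and the two Inverse-Gamma parameters.
    n0 is the (assumed) prior strength in effective data points.
    """
    w, lam, alpha, beta = state
    # Standard conjugate update with a single new point of weight 1.0.
    lam_n = lam + a * a
    w_n = (lam * w + a * y) / lam_n
    alpha_n = alpha + 0.5
    beta_n = beta + 0.5 * (y * y + lam * w * w - lam_n * w_n * w_n)
    # Re-weight the sufficient statistics so the model never holds more
    # than n0 effective points. The mean w_n is unchanged by this scaling.
    f = n0 / (n0 + 1.0)
    return (w_n, f * lam_n, f * alpha_n, f * beta_n)


def adapt(state, inputs, outputs, n0):
    """Apply adapt_step to a stream of live (input, output) pairs."""
    for a, y in zip(inputs, outputs):
        state = adapt_step(state, a, y, n0)
    return state
```

With this choice, a smaller `n0` discounts old data faster, so the mean tracks a sudden change in the true weight more quickly at the cost of a noisier estimate, which is the trade-off the text describes.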
IV-B2 Long-term Learning
In this section, we explain how we select experiences gathered over an arbitrarily long period of time to improve the estimate of the robot dynamics over the upcoming section of the path. This is effective to account for changes in the dynamics due to factors such as terrain type which are local to each section of the path.
In the previous section, it was assumed that the data weights, , were known. In this section, we describe our algorithm for determining these weights. Keeping this as a separate step allows us to determine the posterior for the model parameters, , in closed form. For this, we rely heavily on our previous approach and give a brief outline here but refer the reader to [2] for details.
For each candidate run , we first test whether the proportion of recent experiences from the live run that lie further than expected from the predicted mean given the model from run is higher than would be expected by chance. We do this using the binomial test (see [2] for details). Runs that fail this test are not used for fitting the current local model.
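A minimal sketch of such a consistency check is given below. The 2-sigma threshold, the corresponding Gaussian outlier probability of about 0.0455, the significance level, and the function names are assumptions for illustration; the actual test statistic and thresholds are specified in [2].

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def run_is_consistent(errors, sigmas, z=2.0, p_out=0.0455, alpha=0.05):
    """Keep a candidate run unless too many live points are outliers.

    errors: prediction errors of the run's model on recent live data.
    sigmas: predicted standard deviations at the same points.
    A point is an outlier if it lies more than z sigma from the mean;
    p_out is the chance of that happening under a Gaussian (assumed).
    """
    n = len(errors)
    k = sum(abs(e) > z * s for e, s in zip(errors, sigmas))
    # Reject when seeing k or more outliers would be unlikely by chance.
    return binomial_tail(k, n, p_out) >= alpha
```

For example, with 30 recent samples, a single point beyond 2 sigma is unremarkable and the run is kept, whereas six such points would lead to rejection under these assumed thresholds.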
Our next step is to identify which runs are most similar to the current run, . We can compute the posterior probability of model using:
(16)
The second term on the right is the prior, which we assume to be equal for all runs; however, it could be informed by other sources such as computer vision, a weather report, or user input. The first term on the right is the probability of recent experiences from the live run if the operating conditions are the same as run . Assuming each experience is independent, this is:
(17)
Similar to our previous work [2], we reject any run that has lower probability than the prior of generating . This is to ensure that experience added to the BLR model is likely to improve the performance beyond what could be achieved with no additional experience.
The data weights for each experience in a run are set proportional to the run posterior (16) and normalised by their maximum value. This automatically satisfies and means that the effective number of points can increase with each additional run. If we simply used the run posterior probability, the effective number of points used to update the model would remain the average number of points in a run over the portion of the path ahead of the vehicle, which limits how much the model can improve over time.
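The weighting scheme can be sketched as follows, assuming Gaussian predictive distributions from each run's local model and equal prior probability for every run. The function names and the dictionary-based interface are hypothetical. Working with log-likelihoods avoids numerical underflow when many recent points are multiplied together.

```python
from math import exp, log, pi


def gaussian_loglik(y, means, variances):
    """Log-likelihood of independent observations under Gaussian predictions."""
    return sum(-0.5 * (log(2 * pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, means, variances))


def run_weights(y_live, run_preds, prior_pred):
    """Return {run_id: weight in (0, 1]} for the runs that are kept.

    run_preds: {run_id: (means, variances)} predicted by each run's model
    at the recent live experiences. prior_pred: the same for the prior model.
    """
    ll_prior = gaussian_loglik(y_live, *prior_pred)
    ll = {j: gaussian_loglik(y_live, m, v) for j, (m, v) in run_preds.items()}
    # Reject runs that explain the live data worse than the prior model.
    kept = {j: v for j, v in ll.items() if v >= ll_prior}
    if not kept:
        return {}
    top = max(kept.values())
    # Equal run priors cancel, so posterior / max posterior = exp(ll - max ll).
    return {j: exp(v - top) for j, v in kept.items()}
```

Because the weights are normalised by their maximum rather than by their sum, the most similar run always receives weight 1, so the effective number of points can grow as more similar runs accumulate.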
Once the weights have been computed, we update the parameters for the predictive model using (8)-(11). These parameters are only used to compute the control for one timestep and do not affect the prior at the next timestep because we will have measured the vehicle’s dynamics over the current section of the path and our estimate for might change.
IV-C Path Following MPC Controller Design
This section outlines our MPC formulation including the path parametrization, cost function, ancillary control design, and uncertainty propagation.
IV-C1 Path Parametrization
For our controller design, we closely follow the results presented in [20] on Model Predictive Contouring Control (MPCC). The central idea is to drive a path parameter, , that corresponds to a location along the path, using a virtual input, , that is solved as part of the MPC optimization problem. The path parameter has dynamics:
(18) 
and increases monotonically towards the end of the path. Reference states can then be accessed by querying the reference at the path parameter, , for a given timestep. Progress along the path can then be traded against tracking error and other performance metrics in the cost function.
IV-C2 Tracking Error
In contouring control, the position error is divided into two components [20]: (i) lag error, which is tangent to the path, and (ii) contouring error, which is perpendicular to the path. The other states are compared directly to their references. Let be the stacked states and inputs over the prediction horizon. The cost associated with tracking for a given trajectory over the prediction horizon is:
(19) 
where is a diagonal matrix of penalties for each component of error at each timestep in the prediction horizon and is the matrix that projects position error onto a path-aligned frame. We penalize the rate of change of the inputs to avoid exciting unmodelled dynamics using a quadratic penalty on the numerical derivatives of the inputs, .
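The projection of the position error onto a path-aligned frame can be sketched as a planar rotation by the reference heading. The sign convention for the contouring error (positive to the left of the path) and the function names are assumptions for this example.

```python
from math import cos, sin


def path_errors(x, y, x_ref, y_ref, heading_ref):
    """Project the position error onto a path-aligned frame.

    Returns (lag, contour): the error components tangent and
    perpendicular to the path at the reference point.
    """
    dx, dy = x - x_ref, y - y_ref
    lag = cos(heading_ref) * dx + sin(heading_ref) * dy
    contour = -sin(heading_ref) * dx + cos(heading_ref) * dy
    return lag, contour


def tracking_cost(errors, penalties):
    """Quadratic cost e^T Q e for a diagonal penalty matrix Q."""
    return sum(q * e * e for e, q in zip(errors, penalties))
```

For instance, with a reference heading of zero, a robot 1 m ahead of and 0.5 m to the left of the reference point has a lag error of 1 m and a contouring error of 0.5 m; with penalties of 50 and 200 on lag and contouring error, the position part of the cost at that step is 100.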
IV-C3 Uncertainty Propagation
We assume a Gaussian belief over the state at each time step and nonlinear dynamics for the plant. This allows us to use the Extended Kalman Filter (EKF) prediction equations to propagate our belief of the state into the future given a series of inputs [12]. For the prediction, we include uncertainty in the state, actuator model parameters, and actuator model output. Let be the Jacobian with respect to the stacked state and parameters, . The mean, , and covariance, , can be updated using the EKF prediction equations:
(20)
(21)  
(22) 
where is a block diagonal matrix containing the model weight covariance from (13) for each actuator, and is the process noise covariance. The only nonzero components in are the diagonal elements corresponding to uncertainty in the output of the actuators for which we use the posterior mean from (12). Alternatives to EKF such as the Sigma Point filter or Monte Carlo could be used here at slightly higher computational cost for a small increase in accuracy [5, 12].
The predicted uncertainty can be used to compute a robust set around the mean prediction from the model that the true system is guaranteed to lie within with high probability.
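For reference, a generic EKF prediction step with uncertain model parameters can be written as below. The notation is assumed for this sketch and is not necessarily that of (20)-(22): $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ are the state mean and covariance, $\bar{\mathbf{w}}$ and $\boldsymbol{\Sigma}_w$ are the posterior mean and covariance of the actuator weights, $\mathbf{z}$ is the stacked state and parameters, and $\mathbf{Q}$ is the process noise covariance. The block-diagonal form neglects correlation between the state and the parameters.

```latex
\begin{align}
\boldsymbol{\mu}_{k+1} &= \mathbf{f}\left(\boldsymbol{\mu}_k, \bar{\mathbf{w}}, \mathbf{u}_k\right), \\
\boldsymbol{\Sigma}_{z,k} &=
  \begin{bmatrix}
    \boldsymbol{\Sigma}_k & \mathbf{0} \\
    \mathbf{0} & \boldsymbol{\Sigma}_w
  \end{bmatrix}, \\
\boldsymbol{\Sigma}_{k+1} &= \mathbf{F}_k \boldsymbol{\Sigma}_{z,k} \mathbf{F}_k^{T} + \mathbf{Q},
\qquad
\mathbf{F}_k = \left.\frac{\partial \mathbf{f}}{\partial \mathbf{z}}\right|_{\boldsymbol{\mu}_k,\, \bar{\mathbf{w}},\, \mathbf{u}_k}.
\end{align}
```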
IV-C4 Ancillary State Feedback Controller
The method for uncertainty propagation in Sec. IV-C3 provides a robust bound on the prediction error, but does not take into account the fact that the controller can take corrective actions to reduce the predicted uncertainty [12]. The result is that the predicted uncertainty can grow quickly and without bound, greatly reducing the size of the predicted robust, feasible set [12]. A common approach to account for feedback when predicting uncertainty is to use Tube MPC [6] and to parametrize the control policy as a linear state feedback controller with a feedforward control input. We can then optimize over the feedforward control input to drive the mean state of the system while the feedback controller limits uncertainty growth around the mean [12].
In contrast to other approaches for tube MPC for nonlinear systems, we make use of the fact that our actuator dynamics are linear to design linear ancillary controllers for these states. This keeps the uncertainty in these states bounded, which limits the uncertainty growth in other states over the prediction horizon. Section V-B shows how we apply this to a unicycle-type robot.
IV-C5 Constraint Tightening
Since our plant model has uncertainty, we must tighten the constraints on the state and input to make sure the true system respects the true constraints (with high probability), and that the ancillary control policy remains robustly feasible for our choice of the inputs. Our treatment of the constraint tightening follows [12]. For contouring error, , our chance constraints can be re-expressed as:
(23)  
(24) 
where is the quantile of the Gaussian CDF corresponding to the small probability of violating the contouring constraint (e.g. 2.0 for ) [12], and is a unit vector perpendicular to the path at . Other constraints on the state may be treated analogously.
Analogous treatment of the input constraints yields:
(25)  
(26) 
where is the th element of , is an associated ancillary gain which acts on an error of our choosing, , and is the standard deviation associated with that error. Here, we can see that while the ancillary controller reduces the prediction uncertainty, it will also reduce the control input available for controlling the nominal state.
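The tightening can be sketched for a single one-sided constraint as follows, assuming a Gaussian belief and individually enforced chance constraints. The function names, argument names, and numerical values are illustrative and not taken from the paper.

```python
from statistics import NormalDist


def tightened_bound(limit, sigma, eps):
    """Shrink a one-sided limit so it holds with probability >= 1 - eps.

    sigma is the predicted standard deviation of the constrained quantity
    (for example, the contouring error at one step of the horizon).
    """
    z = NormalDist().inv_cdf(1.0 - eps)  # Gaussian quantile
    return limit - z * sigma


def tightened_input_bound(u_max, gain, sigma_err, eps):
    """Reserve input authority for the ancillary feedback term.

    The feedback contributes gain * error to the input, so its standard
    deviation is |gain| * sigma_err.
    """
    z = NormalDist().inv_cdf(1.0 - eps)
    return u_max - z * abs(gain) * sigma_err
```

With a 2 m limit on lateral error, a predicted standard deviation of 0.25 m, and a violation probability of about 0.023 (a quantile of roughly 2.0), the mean contouring error would be restricted to about 1.5 m. The second function makes the trade-off stated in the text explicit: a larger ancillary gain leaves less input available to the nominal controller.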
IV-D Optimal Control Problem
At each timestep, we wish to solve for the optimal states and inputs subject to a set of safety constraints derived from the model uncertainty, path tracking error and actuator constraints. The decision variable is .
This leads to the following optimization problem:
(27)  
subject to  (28)  
(29)  
(30) 
where is a stacked vector of small, acceptable probabilities of violating each state and input constraint. This problem must be solved at every timestep. To solve the nonlinear optimization problem efficiently, we linearise about the initial guess from the previous timestep and solve it as a sequential quadratic program.
V Application to a Ground Robot
This section outlines how to apply our method to the unicycle ground robot pictured in Fig. 3.
V-A System Model
We consider the robot system (see Fig. 3) to be well approximated by a unicycle model with first order translational and rotational dynamics:
(31) 
where and are the position of the vehicle at time step , is the heading, is the time step, and and are the forward speed and turn rate about the body axis. The control inputs are the commanded speed, , and turn rate, . The parameters of the model that we wish to learn are the coefficients of the first-order actuator dynamics, and , which are column vectors. With and , this is of the form (1). The process noise in (22) is the variance of and , so .
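One plausible discrete-time form of this model is sketched below, assuming a forward-Euler discretization of the kinematics and actuator features made up of the current actuator state and its command. The names `unicycle_step`, `w_v`, and `w_omega`, and the example weights, are hypothetical; the default time step of 0.1 s corresponds to the 10 Hz control rate reported in the experiments.

```python
from math import cos, sin


def unicycle_step(state, cmd, w_v, w_omega, dt=0.1):
    """Propagate the mean state one step (noise-free).

    state = (x, y, heading, speed, turn_rate)
    cmd   = (commanded_speed, commanded_turn_rate)
    w_v, w_omega: learned weights of the first-order actuator models,
    applied to the features [current value, command].
    """
    x, y, th, v, om = state
    u_v, u_om = cmd
    return (
        x + dt * v * cos(th),               # position, forward Euler
        y + dt * v * sin(th),
        th + dt * om,                        # heading
        w_v[0] * v + w_v[1] * u_v,           # linear speed dynamics
        w_omega[0] * om + w_omega[1] * u_om, # linear turn-rate dynamics
    )
```

Only the last two rows contain learned parameters, and they are linear in both the state and the weights, so the nonlinearity of the model is confined to the known kinematics.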
V-B Ancillary Control Design for the Unicycle with First Order Actuator Dynamics
The ancillary controller is meant to reduce uncertainty growth to maintain a large robust, feasible set over the prediction horizon. For the unicycle, lateral uncertainty growth (which is constrained) depends on heading uncertainty and speed. Keeping uncertainty in these states low therefore keeps the lateral uncertainty low, maintaining a large robust, feasible set. With a linear feedback controller on the heading and speed error, the speed and turn rate dynamics become:
(32) 
where is the difference between the state and the predicted mean at time step . These controllers keep the system close to the predicted speed and heading.
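The effect of the ancillary feedback on uncertainty growth can be illustrated for the speed state alone. The sketch assumes a scalar actuator model v[k+1] = w1 v[k] + w2 u[k] + noise, a feedback term -gain * (v - mean) added to the command, and known weights (parameter uncertainty is ignored to keep the example short). All names and numbers are hypothetical.

```python
def speed_variance(horizon, w1, w2, gain, q, var0=0.0):
    """Variance of the speed error at each step of the prediction horizon.

    The error e = v - mean evolves as e[k+1] = (w1 - w2 * gain) * e[k] + noise,
    where the noise has variance q.
    """
    phi = w1 - w2 * gain  # closed-loop pole of the error dynamics
    out = [var0]
    for _ in range(horizon):
        out.append(phi * phi * out[-1] + q)
    return out


# Compare open-loop (gain 0) and closed-loop (gain 1) growth over 30 steps.
open_loop = speed_variance(30, w1=0.9, w2=0.1, gain=0.0, q=0.01)
closed_loop = speed_variance(30, w1=0.9, w2=0.1, gain=1.0, q=0.01)
```

In this example the feedback moves the error pole from 0.9 to 0.8, which roughly halves the steady-state variance of the speed error. A smaller speed and heading variance in turn limits how quickly lateral uncertainty grows through the kinematics.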
VI Experiments
Experiments were conducted on a 900 kg Clearpath Grizzly skid-steer ground robot shown in Fig. 3. First, we compare the predictive performance of a GP to our proposed method on a dataset with varied payload and terrain type. Second, we demonstrate the effectiveness of each component of our algorithm in closed loop. Finally, we demonstrate the path tracking performance of our algorithm at high speed on a 175 m off-road course.
VI-A Implementation
Our algorithm was implemented in C++ on an Intel i7 2.70 GHz 8-core processor with 16 GB of RAM. Our controller relies on a vision-based system, Visual Teach and Repeat [23], for localization, which runs on the same laptop as the controller. The controller runs at 10 Hz with a three-second lookahead discretized by 30 points. The optimization problem (27)-(30) is re-linearized three times, taking an average of 70 ms to compute the control. The model updates (Sec. IV-B1 and IV-B2) are executed at every time step.
We consider the last three seconds of data (30 samples) from the live run for . The penalties on lag, contouring, heading, speed, and turn rate error are 50, 200, 200, 2, and 2 respectively. The penalties on the deviation of commanded speed, turn rate, and reference speed from their references are 1, 1, and 50 respectively. The penalties on rate of change of commands in the same order are 10, 15, and 5. The maximum lateral error is 2 m, is 1, and the ancillary controller gains are both . The prior strength, , was set to 100. For the high speed experiment, we increased the penalty on commanded turning acceleration from 15 to 20 to achieve smoother performance on the rough terrain.
VI-B Model Predictive Performance Comparison
In order to evaluate the suitability of the proposed method for predictive control, we first evaluate the predictive performance of each component of the proposed method (Sec. IV-B1 and IV-B2) and their combination compared to a GP trained on data from the upcoming section of the path from the previous run. We consider the rotational dynamics because they differ the most between configurations. We used the same GP hyperparameters as in our previous paper [24] since these have been tested extensively in closed loop. We consider a prediction horizon of 3 s, since this was used by the proposed controller in our closed-loop experiments.
To measure the accuracy of the prediction of the mean, we use the Multi-Step RMS Error (MRMSE) between the predictions made over the lookahead horizon and the measured state at these times [2]. To measure the accuracy of the error bound, we use the Multi-Step RMS Z-score (MRMSZ) of the prediction at the future time steps [2].
Referring to Figure 4, the proposed methods achieve lower MRMSE than the GP, indicating higher accuracy. The MRMSZ for the proposed method is below 1.0 for all of the configurations, which demonstrates that the proposed method maintains a good estimate of model uncertainty. Therefore, the proposed method is a good candidate for safe, predictive learning control in a wide range of operating conditions.
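The two metrics can be sketched for a single prediction horizon as follows. The precise definitions are given in [2]; the versions here, including the function names, are a plausible reading based on the descriptions above and should be treated as assumptions.

```python
from math import sqrt


def mrmse(pred, meas):
    """RMS error between multi-step predictions and measurements."""
    return sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / len(pred))


def mrmsz(pred, meas, sigma):
    """RMS z-score: errors normalised by the predicted standard deviation."""
    return sqrt(sum(((p - m) / s) ** 2
                    for p, m, s in zip(pred, meas, sigma)) / len(pred))
```

Under this reading, an MRMSZ below 1.0 means the prediction errors are, in an RMS sense, smaller than the predicted standard deviation, so the uncertainty estimate is not overconfident.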
VI-C Closed Loop Tracking Performance Comparison
To demonstrate the impact of each component of our method on closed-loop performance and show that our method can adapt to repetitive model errors, we drive the vehicle on a course that consists of two laps of a circular path and apply an artificial disturbance by multiplying the turn rate commands by 0.5 at the start of the second lap (vertex 100 in Fig. 5). We compare the tracking performance of each component of our algorithm over eight repeats of the path. For this experiment, the desired speed was 2 m/s.
Figure 5 shows that all methods achieve similar performance before the disturbance is applied because the model for all methods was a good representation of the vehicle dynamics over this portion of the path. After this point, the non-learning controller incurs a large lateral error because the model is no longer accurate. Long-term learning (Sec. IV-B2) similarly incurs a large path tracking error on the first run (see Figure 6) since there are no previous runs with experience. However, after the first run, it improves greatly but then converges slowly because it is constantly working against a static prior (the same model used for the non-learning comparison) that is incorrect after the disturbance is applied. When fast adaptation (Sec. IV-B1) is enabled, the controller incurs a large tracking error at the moment the disturbance is applied but adapts quickly to the new robot dynamics to achieve low error as expected. When both fast adaptation and long-term learning are enabled, the fast adaptation keeps the prior close to the true dynamics such that the long-term learning is able to reduce the transient error by leveraging data from the upcoming section of the path. This combination achieves the lowest path tracking error and the fastest convergence (see Fig. 6).
VI-D High Speed Tracking Performance
Finally, we evaluated the performance of our controller on a 175 m off-road course with tight turns and fast straights. The desired speed was 3 m/s and the controller achieved an average speed of 1.6 m/s with a top speed of 2.7 m/s and an RMS lateral error of 0.25 m. This is a significant improvement over our previous work, where the controller achieved an average speed around 1.0 m/s on pavement [2].
VII Conclusions
In this paper, we have proposed a new method for long-term, safe learning control based on local, weighted BLR. This method is computationally inexpensive, which enables fast model updates and allows us to leverage large amounts of data gathered over previous traverses of a path. This enables both fast adaptation to new scenarios and high-accuracy tracking in the presence of repetitive model errors. The model parameters can be determined reliably online, which enables our method to be applied in a wide range of operating conditions with little to no tuning. We have demonstrated the effectiveness of the proposed approach in a range of challenging, off-road experiments. We encourage the reader to watch our video at http://tiny.cc/fastslowlearn showing the experiments and datasets used in this paper.
References
 [1] L. G. Dekker, J. A. Marshall, and J. Larsson. Industrial-Scale Autonomous Wheeled-Vehicle Path Following by Combining Iterative Learning Control with Feedback Linearization. In Proc. of the Intl. Conference on Intelligent Robots and Systems (IROS), pages 2643–2648, 2017.
 [2] C. McKinnon and A. P. Schoellig. Experience-Based Model Selection to Enable Long-Term, Safe Control for Repetitive Tasks Under Changing Conditions. In Proc. of the Intl. Conference on Intelligent Robots and Systems (IROS), page accepted, 2018.
 [3] J. Fu, S. Levine, and P. Abbeel. One-shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors. In Proc. of the Intl. Conference on Intelligent Robots and Systems (IROS), pages 4019–4026, 2016.
 [4] K. Pereida, D. Kooijman, R. Duivenvoorden, and A. P. Schoellig. Transfer Learning for High-precision Trajectory Tracking Through L1 Adaptive Feedback and Iterative Learning. Intl. Journal of Adaptive Control and Signal Processing, 52(6):802–823, 2018.
 [5] C. Ostafew, A. P. Schoellig, and T. Barfoot. Robust Constrained Learning-based NMPC Enabling Reliable Mobile Robot Path Tracking. Intl. Journal of Robotics Research (IJRR), 35(13):1547–1563, 2016.
 [6] A. Aswani, H. Gonzalez, S. Sastry, and C. Tomlin. Provably Safe and Robust Learning-based Model Predictive Control. Automatica, 49(5):1216–1226, 2013.
 [7] C. Xie, S. Patil, T. Moldovan, S. Levine, and P. Abbeel. Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 504–511, 2016.
 [8] G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou. Information Theoretic MPC for Model-Based Reinforcement Learning. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 1714–1721, 2017.
 [9] B. Luders, I. Sugel, and J. How. Robust Trajectory Planning for Autonomous Parafoils Under Wind Uncertainty. In Proc. of the AIAA Conference on Guidance, Navigation and Control, 2013.
 [10] Q. Li, J. Qian, Z. Zhu, X. Bao, M. K. Helwa, and A. P. Schoellig. Deep Neural Networks for Improved, Impromptu Trajectory Tracking of Quadrotors. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 5183–5189, 2017.
 [11] G. Aoude, B. Luders, J. Joseph, N. Roy, and J. How. Probabilistically Safe Motion Planning to Avoid Dynamic Obstacles with Uncertain Motion Patterns. Autonomous Robots, 35(1):51–76, 2013.
 [12] L. Hewing and M. N. Zeilinger. Cautious Model Predictive Control Using Gaussian Process Regression. arXiv:1705.10702, 2017.
 [13] J. Ting, A. D’Souza, S. Vijayakumar, and S. Schaal. A Bayesian Approach to Empirical Local Linearization for Robotics. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 2860–2865, 2008.
 [14] V. Desaraju, A. Spitzer, and N. Michael. Experience-driven Predictive Control with Robust Constraint Satisfaction under Time-Varying State Uncertainty. In Proc. of Robotics: Science and Systems Conference (RSS), 2017.
 [15] G. Sicard, C. Salaün, S. Ivaldi, V. Padois, and O. Sigaud. Learning the Velocity Kinematics of ICUB for Model-based Control: XCSF versus LWPR. In Proc. of the Intl. Conf. on Humanoid Robots, pages 570–575, 2011.
 [16] Y. Gao, A. Gray, H. Tseng, and F. Borrelli. A Tube-based Robust Nonlinear Predictive Control Approach to Semiautonomous Ground Vehicles. Vehicle System Dynamics, 52(6):802–823, 2014.
 [17] K. Jo, K. Chu, and M. Sunwoo. Interacting Multiple Model Filter-based Sensor Fusion of GPS with In-vehicle Sensors for Real-time Vehicle Positioning. Transactions on Intelligent Transportation Systems, 13(1):329–343, 2012.
 [18] R. Calandra, S. Ivaldi, M. Deisenroth, E. Rueckert, and J. Peters. Learning Inverse Dynamics Models with Contacts. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 3186–3191, 2015.
 [19] R. Pautrat, K. Chatzilygeroudis, and J. Mouret. Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search. arXiv:1709.06919, 2017.
 [20] D. Lam, C. Manzie, and M. Good. Model Predictive Contouring Control. In Proc. of the Conf. on Decision and Control (CDC), pages 6137–6142, 2010.
 [21] K. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
 [22] M. Vitus and C. Tomlin. On Feedback Design and Risk Allocation in Chance Constrained Control. In Proc. of the Decision and Control and European Control Conference (CDC-ECC), pages 734–739, 2011.
 [23] M. Paton, F. Pomerleau, K. MacTavish, C. Ostafew, and T. Barfoot. Expanding the Limits of Vision-based Localization for Long-term Route-following Autonomy. Journal of Field Robotics (JFR), 34(1):98–122, 2017.
 [24] C. D. McKinnon and A. P. Schoellig. Learning Multi-Modal Models for Robot Dynamics with a Mixture of Gaussian Process Experts. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), 2017.