Learn Fast, Forget Slow: Safe Predictive Learning Control for Systems with Unknown, Changing Dynamics Performing Repetitive Tasks

10/15/2018 ∙ by Christopher D. McKinnon, et al.

We present a control method for improved repetitive path following for a ground vehicle that is geared towards long-term operation where the operating conditions can change over time and are initially unknown. We use weighted Bayesian Linear Regression to model the unknown actuator dynamics, and show how this simple model is more accurate in both its estimate of the mean behaviour and model uncertainty than Gaussian Process Regression and generalizes to novel operating conditions with little or no tuning. In addition, it allows us to use fast adaptation and long-term learning in one, unified framework, to adapt quickly to new operating conditions and learn repetitive model errors over time. This comes with the added benefit of lower computational cost, longer look-ahead, and easier optimization when the model is used in a robust Model Predictive Controller (MPC). In order to fully capitalize on the long prediction horizons that are possible with this new approach, we use Tube MPC to reduce predicted uncertainty growth. We demonstrate the effectiveness of our approach in experiments on a 900 kg ground robot, showing results over 2.7 km of driving with both physical and artificial changes to the robot's dynamics. All of our experiments are conducted using a stereo camera for localization.




I Introduction

At the core of most control algorithms is a model that captures the relationship between the state, input, and dynamics of a robotic system. During tasks such as repetitive path following, which are common in mining [1], agriculture, and logistics, a robot continuously gathers data which can be used to refine this model. If model errors are repetitive, learning-based control can leverage data gathered over long periods of time to reduce the errors [2]. This can be combined with fast adaptation to adapt quickly to novel scenarios [3, 4]. An accurate assessment of the risk associated with taking a control action, especially if it has not been taken before, is important during this process [5]. Using an assessment of this risk to ensure safety is known as safe learning.

Fig. 1: Block diagram showing the proposed model learning method in closed loop with a safe controller. The system dynamics can change from one run to another and over the course of a run. We use weighted Bayesian Linear Regression (BLR) to learn the actuator dynamics of the plant. This approach, which enables fast adaptation and long-term learning, is shown to be highly effective in experiments. We encourage the reader to watch our video showing the experiments and datasets used in this paper http://tiny.cc/fast-slow-learn.

Safe learning methods generally incorporate an approximate initial guess for the system dynamics with some bounds on the modelling error incurred in the approximation [5, 6]. A learning term then refines the initial guess over time using data to better approximate the true dynamics. The goal is to guarantee that the system does not violate safety constraints (e.g., limits on the control input or path tracking error) while achieving the control objective (e.g., following a path) and, at the same time, improving the model of the system and, consequently, its task performance over time. Most learning algorithms learn a single model for the system dynamics or use multiple models that are trained ahead of time based on appropriate training data from operating the robot in all relevant conditions [5, 7, 8, 9, 10, 11]. This presents a challenge for robots that are deployed into a wide range of operating conditions which may not all be known ahead of time.

In our previous work [2], we used Gaussian Processes (GPs) to learn the robot dynamics in a number of different operating conditions by leveraging experience gathered over multiple traverses of a path. However, we found that they have a number of limitations that make them difficult to apply in a wide range of operating conditions. First, they are computationally expensive, which limits the number of training points that can be used in the model for control [2]. This limits the region of the input space over which the GP is accurate. Second, it was difficult to identify a set of hyperparameters that resulted in good closed-loop performance. For this reason, we used a fixed set of hyperparameters, which limited the range of operating conditions where the learning was effective. Third, using a GP assumes that the unknown dynamics are globally homoscedastic, even though we only fit the model locally along the path, which further limits the effectiveness of a GP-based approach.

In this paper, we propose a new approach to address these limitations by using weighted Bayesian Linear Regression (BLR) to model the actuator dynamics locally along the path. First, BLR is computationally inexpensive to fit and evaluate. This enables us to use more previous experience to learn repetitive model errors and current data to adapt quickly to novel operating conditions. Second, the model uncertainty does not depend on hyperparameters that must be pre-selected, so it can be adapted locally to each section of the path to allow for globally heteroscedastic model errors. This allows us to adapt to a wide range of operating conditions. Finally, the linear function does not introduce nonlinearity into the dynamics, which is beneficial for our Model Predictive Control (MPC)-based control strategy. In this paper, we also show how the model can be combined with Tube MPC to double the look-ahead horizon of our previous approach to three seconds.

II Related Work

Learning control has received a great amount of attention in recent years, most notably in the case of single-mode learning control. This is the broad class of learning methods that assumes the true (but initially unknown) mapping between the state, $\mathbf{x}_k$, and the control input, $\mathbf{u}_k$, at time $k$, and the state at the next time step, $\mathbf{x}_{k+1}$, is one-to-one or, at least, normally distributed according to some underlying process. Recent developments have contributed controller designs with safety guarantees [6] and demonstrated impressive results in improved path following [5, 2].

Most single-mode learning controllers use a nonlinear learning term to improve performance of a linear or nonlinear prior model over time using data [6, 5]. Common learning terms include Gaussian Processes (GPs) and various forms of locally linear regression [12, 13, 14, 5, 15, 16]. The main challenge with these methods is that they require many parameters to be identified ahead of time, and the performance depends greatly on the effort put into tuning these parameters [15]. This is problematic if we want to deploy a robotic system in a wide range of environments which may introduce operating conditions that require the parameters to be adjusted. In contrast to these approaches, we learn a linear model for the actuator dynamics with a nonlinear model for the robot dynamics. This greatly simplifies the learning problem, allowing us to generalize to a wide range of operating conditions while still controlling a nonlinear system. In addition, our approach is computationally efficient and does not introduce additional nonlinearity into the robot dynamics, which is beneficial for the optimization in our MPC control strategy.

In addition to single-mode learning control, multi-modal algorithms exist, which identify a number of dynamic modes ahead of time using labelled or unlabelled training data and switch to the most likely model during operation [17, 9, 11, 18, 19]. This allows them to maintain persistent knowledge of a robot's dynamics across a wide range of operating conditions. They infer the correct mode from measurements during operation to maintain a high level of performance even when the mode is not directly observed. These approaches do, however, require that the number of modes and/or training data from each mode are available ahead of time, which can be a challenging requirement in real-world robotics applications. Our previous approach [2] addressed this by selecting data from the relevant operating condition and constructing the model online. However, to maintain real-time capability, we could include only a few points from the most relevant previous run in the model, so it did not make full use of all past experience. In addition, we handled new scenarios by reverting to a conservative model rather than adapting quickly. In this work, we use all previous data over the upcoming section of the path and incorporate fast adaptation to adapt quickly to new scenarios.

To adapt quickly to new scenarios, some authors have proposed a combination of fast adaptation and long-term learning [4, 3]. In [4], an adaptive controller keeps the system behaving like a constant reference system in the presence of non-repetitive disturbances and changes in the dynamics, while the (single-mode) iterative learning compensates for repetitive path tracking errors. The authors in [3] also used a fast, adaptive, linear term to adapt to new scenarios but used neural network priors for the robot dynamics as their long-term learning. This approach achieved impressive results in experiments; however, the neural networks required several hours of training data, were not adapted online, and are not in themselves probabilistic models. In this work, we use fast adaptation and long-term learning in one, unified, probabilistic framework. Our long-term learning starts acting after the first run and implicitly learns errors local to each section of the path in a way that is tailored specifically to predictive control.

In light of the current approaches and their limitations, the contributions of the paper are (i) to present a model learning framework that supports fast adaptation, long-term learning, and is tailored to predictive control; (ii) to incorporate that model (and its model uncertainty estimate) in a safe and robust predictive control scheme; and (iii) to demonstrate the advantage of fast adaptation and long-term learning in path tracking experiments over challenging terrain.

III Problem Statement

The goal of this work is to learn a probabilistic model for the dynamics of a ground robot performing a repetitive task, and show how it can be integrated with a state-of-the-art path following controller for high performance control while maintaining a quantitative measure of safety. The robot may be subjected to changes in its dynamics due to factors such as payload, terrain, or tyre pressure. We assume that these factors cannot be measured directly. A good algorithm should scale to long-term operation, take advantage of repeated runs in the same operating conditions, and adapt quickly to new operating conditions. The model must include a reasonable estimate of model uncertainty that acts as an upper bound on model error at all times.

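Further assumptions are that the plant has locally linear actuator dynamics, and that the number of operating conditions and the dynamics in each operating condition are not known ahead of time. Locally linear means that the true plant dynamics, $\mathbf{a}_{k+1} = \mathbf{g}(\mathbf{x}_k, \mathbf{u}_k)$, where $\mathbf{a}_k$ is the actual actuation output, are well approximated by a function of the form:

$$\mathbf{x}_{k+1} = \mathbf{f}(\mathbf{x}_k, \mathbf{a}_k), \qquad \mathbf{a}_{k+1} = \mathbf{W}^\top \boldsymbol{\phi}_k + \boldsymbol{\eta}_k, \qquad \boldsymbol{\eta}_k \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_\eta), \tag{1}$$

over the look-ahead horizon used by the predictive controller, where $\mathbf{x}_k$ is the state of the plant, $\boldsymbol{\phi}_k$ is a vector containing the actuator state, $\mathbf{a}_k$, and the control input that drives that actuator, $\mathbf{u}_k$, $\mathbf{W}$ is a matrix of weights we wish to learn, and $\boldsymbol{\eta}_k$ is additive, zero-mean, Gaussian noise with covariance matrix $\boldsymbol{\Sigma}_\eta$. The subscript, $k$, indicates the time step. We have replaced $\mathbf{g}(\mathbf{x}_k, \mathbf{u}_k)$ with $\mathbf{W}^\top \boldsymbol{\phi}_k + \boldsymbol{\eta}_k$, where $\mathbf{W}$ and $\boldsymbol{\Sigma}_\eta$ are learned. This allows us to keep the dominant part of the nonlinear plant dynamics and greatly simplifies the learning problem.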

The system is constrained by polytopic state and input constraints:

$$\mathcal{X} = \{\mathbf{x} \mid \mathbf{H}_x \mathbf{x} \le \mathbf{h}_x\}, \qquad \mathcal{U} = \{\mathbf{u} \mid \mathbf{H}_u \mathbf{u} \le \mathbf{h}_u\}. \tag{2}$$

We assume a Gaussian belief over the state at each time step and enforce constraints probabilistically using a chance-constrained formulation so that the probability of violating state and input constraints is kept below an acceptable threshold. Since enforcing these constraints jointly can lead to undesirable, conservative behaviour, we enforce them individually; see [12] for a detailed explanation.

IV Methodology

In this section, we present our approach for long-term, safe learning control with fast adaptation. Our approach makes extensive use of BLR to model the system dynamics. We assume a known nonlinear model for the plant with unknown linear actuator dynamics. We use weighted BLR to determine the coefficients of the actuator dynamics model and a measure of run similarity to determine the data weights. This allows us to compute the posterior for the model parameters in closed form, avoiding iterative approaches such as [13], which also optimizes the data weights. We then formulate the control problem as a Tube MPC problem following work in [12, 20] but using a modified ancillary controller.

IV-A Weighted Bayesian Linear Regression

In this section, we give a brief overview of weighted BLR, which is used to learn the actuator dynamics. It is an extension of BLR, as presented in [21], and a modification of [13], where we assume a known data weighting. The main advantages of BLR are that it is computationally inexpensive and does not introduce additional nonlinearity into the model, which is beneficial for MPC. We introduce BLR for a scalar system. Multiple scalar systems can be combined to get an equation of the form in (1).

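We consider locally linear models of the form:

$$y = \mathbf{w}^\top \boldsymbol{\phi} + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2), \tag{3}$$

to model the dynamics of each actuator, where $y \in \mathbb{R}$ and $\boldsymbol{\phi}, \mathbf{w} \in \mathbb{R}^{d}$.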

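A dataset will consist of $N$ pairs $(\boldsymbol{\phi}_i, y_i)$, each with a scalar weight $\lambda_i \in [0, 1]$, where $\lambda_i = 0$ means the point has no influence on the solution, and $\lambda_i = 1$ means the point is fully included in the regression. Given a weighted dataset, $\mathcal{D}$, and the model (3), the likelihood of $\mathbf{y}$, assuming each data point is independent, is:

$$p(\mathbf{y} \mid \boldsymbol{\Phi}, \mathbf{w}, \sigma^2) = \prod_{i=1}^{N} \mathcal{N}\!\left(y_i \mid \mathbf{w}^\top \boldsymbol{\phi}_i, \sigma^2\right)^{\lambda_i}, \tag{4}$$

where $\mathbf{y}$ is a vector of stacked $y_i$, and $\boldsymbol{\Phi}$ is a matrix with rows $\boldsymbol{\phi}_i^\top$.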

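With this likelihood, the conjugate prior is a Normal Inverse Gamma (NIG) distribution, which gives us the following priors for $\mathbf{w}$ and $\sigma^2$ [21]:

$$\mathbf{w} \mid \sigma^2 \sim \mathcal{N}(\mathbf{w}_0, \sigma^2 \mathbf{V}_0), \tag{5}$$

$$\sigma^2 \sim \mathcal{IG}(a_0, b_0), \tag{6}$$

where $\mathbf{w}_0$ is the prior mean for the weights, $\mathbf{V}_0$ is a prior inverse sum of squares of $\boldsymbol{\phi}$, and $a_0$ and $b_0$ are the parameters of the Inverse Gamma distribution, which are proportional to the effective number of data points in the prior and to that number times the prior output variance, respectively.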

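The likelihood, (4), can be manipulated into a NIG distribution over $(\mathbf{w}, \sigma^2)$ so that (5) and (6) form a conjugate prior, and the posterior joint distribution over $\mathbf{w}$ and $\sigma^2$ is:

$$p(\mathbf{w}, \sigma^2 \mid \mathcal{D}) = \mathcal{N}(\mathbf{w} \mid \mathbf{w}_N, \sigma^2 \mathbf{V}_N)\, \mathcal{IG}(\sigma^2 \mid a_N, b_N), \tag{7}$$

$$\mathbf{V}_N = \left(\mathbf{V}_0^{-1} + \boldsymbol{\Phi}^\top \boldsymbol{\Lambda} \boldsymbol{\Phi}\right)^{-1}, \tag{8}$$

$$\mathbf{w}_N = \mathbf{V}_N \left(\mathbf{V}_0^{-1} \mathbf{w}_0 + \boldsymbol{\Phi}^\top \boldsymbol{\Lambda} \mathbf{y}\right), \tag{9}$$

$$a_N = a_0 + \tfrac{1}{2} \operatorname{tr}(\boldsymbol{\Lambda}), \tag{10}$$

$$b_N = b_0 + \tfrac{1}{2}\left(\mathbf{y}^\top \boldsymbol{\Lambda} \mathbf{y} + \mathbf{w}_0^\top \mathbf{V}_0^{-1} \mathbf{w}_0 - \mathbf{w}_N^\top \mathbf{V}_N^{-1} \mathbf{w}_N\right), \tag{11}$$

where $\operatorname{tr}(\cdot)$ is the trace operator and $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \dots, \lambda_N)$ is a diagonal matrix of the data weights.

The posterior marginals are then:

$$\sigma^2 \mid \mathcal{D} \sim \mathcal{IG}(a_N, b_N), \tag{12}$$

$$\mathbf{w} \mid \mathcal{D} \sim \mathcal{T}\!\left(\mathbf{w}_N, \tfrac{b_N}{a_N} \mathbf{V}_N, 2a_N\right), \tag{13}$$

where $\mathcal{T}$ is a Student t distribution. This gives us all of the components we need to make predictions of the state at future time-steps. It is important to note that while the uncertainty in $\mathbf{w}$ decreases as more data is added, the mean value for $\sigma^2$ can increase or decrease to reflect the data. The model uncertainty is then passed to the robust controller. This is in contrast to a GP, where the uncertainty only decreases, to a value determined by the hyperparameters, as data is added.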
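To make the closed-form update concrete, here is a minimal NumPy sketch of the weighted update in equations (8)-(11) and of the Student t predictive distribution for a new regressor. The function names `weighted_blr_update` and `predictive` are hypothetical, and the explicit matrix inverses are chosen for readability rather than numerical robustness; this is an illustration under those assumptions, not the authors' C++ implementation.

```python
import numpy as np


def weighted_blr_update(Phi, y, lam, w0, V0, a0, b0):
    """Weighted Normal-Inverse-Gamma posterior update, equations (8)-(11).

    Phi: (N, d) regressors, y: (N,) targets, lam: (N,) data weights in [0, 1].
    Returns the posterior parameters (wN, VN, aN, bN).
    """
    Lam = np.diag(lam)
    V0_inv = np.linalg.inv(V0)
    VN_inv = V0_inv + Phi.T @ Lam @ Phi          # posterior precision (scaled by sigma^2)
    VN = np.linalg.inv(VN_inv)
    wN = VN @ (V0_inv @ w0 + Phi.T @ Lam @ y)    # posterior mean of the weights
    aN = a0 + 0.5 * np.trace(Lam)                # shape grows with the sum of weights
    bN = b0 + 0.5 * (y @ Lam @ y + w0 @ V0_inv @ w0 - wN @ VN_inv @ wN)
    return wN, VN, aN, bN


def predictive(phi, wN, VN, aN, bN):
    """Student t predictive for a new regressor: (mean, squared scale, dof)."""
    mean = phi @ wN
    scale2 = (bN / aN) * (1.0 + phi @ VN @ phi)
    return mean, scale2, 2.0 * aN
```

Because a weight of zero removes a point from every sum, zero-weighted outliers leave the posterior unchanged, which is what lets the data weights act as a soft selection of past experience.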

IV-B Data Management

Fig. 2: Runs 1 and 2 represent previous autonomous traverses of the path and the live run represents the current traverse. The filled black circle indicates the current position of the vehicle and the dotted circle indicates the predicted position of the vehicle at the end of the MPC prediction horizon. Data, $\mathcal{D}^j_i$, for each run, $j$, is stored in small sets at each vertex, $i$, represented by a circle. The goal is to use recent data from the live run (solid box) to assess the similarity of the current dynamics to the dynamics in all previous runs along the same section of the path (dashed box). This is used to recommend experiences from a run with similar dynamics (dotted box) and construct a predictive model for the dynamics on the upcoming section of the path. We also use data from the live run to recursively update the model and achieve fast adaptation.

The purpose of our method is to construct the best possible model of the system dynamics for MPC. MPC uses the dynamics over the upcoming section of the path to compute the control input. We use data from the recently traversed part of the path to determine which past runs are most similar to the current run. We then use data from these runs over the upcoming section of the path to build the prediction model for MPC. Since we are interested in learning dynamics of the form (3), individual experiences correspond to pairs $(\boldsymbol{\phi}_i, y_i)$.

Referring to Fig. 2, for each previous run, $j$, we use data, $\mathcal{D}^j_{rec}$, from the recently traversed section of the path to construct a local model for the actuator dynamics, $\mathcal{M}_j$. Each $\mathcal{M}_j$ is then used to generate predictions for the mean and variance of the actuator output, $y$, at each of the recent experiences from the current run, $\mathcal{D}^{live}$. These estimates are used to identify the most similar run to the current run and reject runs that are substantially different from the current run.

IV-B1 Fast Adaptation

This section describes how we use data from the current run to quickly adapt to non-repetitive changes or new dynamics that the system has not experienced before. This was not feasible in our previous approach, based on GPs, since the number of points in each GP model was limited due to computational constraints and so we focused entirely on modelling the dynamics along the upcoming section of the path using data from previous runs [2].

We recursively update the model parameters using data from the live run while keeping the strength of the prior fixed at $\nu$. The value of $\nu$ determines how many effective data points we attribute to the prior. A large value for $\nu$ results in a smoother estimate for the model parameters, while a smaller value for $\nu$ allows the system to adapt to sudden changes more quickly. To update the model, we use the most recent data point from the live run and equations (8)-(11) with a data weight of 1.0. Then, we re-weight the model parameters using:

$$\mathbf{w}_N \leftarrow \mathbf{w}_N, \qquad \mathbf{V}_N \leftarrow \gamma^{-1} \mathbf{V}_N, \qquad a_N \leftarrow \gamma\, a_N, \qquad b_N \leftarrow \gamma\, b_N,$$

which is equivalent to constructing a new model using the sufficient statistics of the existing model weighted by a factor of $\gamma$, chosen so that the effective number of data points attributed to the model returns to $\nu$. This can be derived by using (8)-(11), a non-informative prior, and $\boldsymbol{\Lambda} = \gamma \mathbf{I}$, where $\mathbf{I}$ is an identity matrix of the appropriate size.

This allows the model to quickly adapt to the dynamics on the current run. This is effective for changes to the dynamics such as adding a large payload and driving on a uniform surface since the model parameters are relatively constant for the entire run. The new model parameters are used as the prior for the long-term learning update and as the prior for the next time-step.
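The recursive update followed by re-weighting can be sketched as below. We assume, for illustration, that the effective number of data points is measured by $2a_N$ (which holds when starting from a non-informative prior), so the factor becomes `gamma = nu / (2 * a_new)`; the function name `fast_adapt_step` and this choice of factor are our assumptions rather than details taken from the paper.

```python
import numpy as np


def fast_adapt_step(wN, VN, aN, bN, phi, y, nu):
    """Add the newest live-run point with weight 1, then re-weight to strength nu.

    Assumption: the effective number of data points is 2 * aN, and gamma
    rescales it back to nu after every update.
    """
    VN_inv = np.linalg.inv(VN)
    # Single-point version of (8)-(11), using the current posterior as the prior.
    V_new_inv = VN_inv + np.outer(phi, phi)
    V_new = np.linalg.inv(V_new_inv)
    w_new = V_new @ (VN_inv @ wN + phi * y)
    a_new = aN + 0.5
    b_new = bN + 0.5 * (y * y + wN @ VN_inv @ wN - w_new @ V_new_inv @ w_new)
    # Scale all sufficient statistics by gamma: the mean is unchanged, while
    # the precision (inverse of V), shape and rate are multiplied by gamma.
    gamma = nu / (2.0 * a_new)
    return w_new, V_new / gamma, gamma * a_new, gamma * b_new
```

Each call discounts older data geometrically by roughly `nu / (nu + 1)`, so with the value of 100 used in Sec. VI-A the model effectively remembers the last hundred or so samples, which is what allows it to track a sudden change such as a new payload.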

IV-B2 Long-term Learning

In this section, we explain how we select experiences gathered over an arbitrarily long period of time to improve the estimate of the robot dynamics over the upcoming section of the path. This is effective to account for changes in the dynamics due to factors such as terrain type which are local to each section of the path.

In Sec. IV-A, it was assumed that the data weights, $\lambda_i$, were known. In this section, we describe our algorithm for determining these weights. Keeping this as a separate step allows us to determine the posterior for the model parameters, $(\mathbf{w}, \sigma^2)$, in closed form. For this, we rely heavily on our previous approach and give a brief outline here, but refer the reader to [2] for details.

For each candidate run, $j$, we first test whether the proportion of recent experiences from the live run that lie further than expected from the predicted mean, given the model from run $j$, is higher than would be expected by chance. We do this using the binomial test (see [2] for details). Runs that fail this test are not used for fitting the current local model.
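A hedged sketch of such a consistency check, using a one-sided binomial tail probability, is shown below. It is not the exact test of [2]: we assume a point counts as "further than expected" when its residual exceeds `n_sigma` predicted standard deviations, approximate the per-point exceedance probability with a Gaussian, and pick `n_sigma = 2` and `alpha = 0.01` purely for illustration.

```python
from math import comb
from statistics import NormalDist


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def run_is_consistent(residuals, stds, n_sigma=2.0, alpha=0.01):
    """Return False if too many live-run residuals fall outside the n_sigma band.

    residuals: measured minus predicted actuator output under run j's model.
    stds: predicted standard deviations at the same experiences.
    """
    # Probability that a single point lands outside the +/- n_sigma band.
    p_out = 2.0 * (1.0 - NormalDist().cdf(n_sigma))
    n_out = sum(abs(r) > n_sigma * s for r, s in zip(residuals, stds))
    # Reject run j if this many exceedances would be very unlikely by chance.
    return binomial_upper_tail(len(residuals), n_out, p_out) >= alpha
```

With 30 live-run samples, two points outside the band are still plausible under the model, whereas ten are not, so the run would be rejected in the second case.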

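Our next step is to identify which runs are most similar to the current run. We can compute the posterior probability of model $\mathcal{M}_j$ given the recent experiences from the live run, $\mathcal{D}^{live}$, using Bayes' rule:

$$p(\mathcal{M}_j \mid \mathcal{D}^{live}) \propto p(\mathcal{D}^{live} \mid \mathcal{M}_j)\, p(\mathcal{M}_j).$$

The second term on the right is the prior, which we assume to be equal for all runs; however, it could be informed by other sources such as computer vision, a weather report, or user input. The first term on the right is the probability of recent experiences from the live run if the operating conditions are the same as run $j$. Assuming each experience is independent, this is:

$$p(\mathcal{D}^{live} \mid \mathcal{M}_j) = \prod_{(\boldsymbol{\phi}_i, y_i) \in \mathcal{D}^{live}} p(y_i \mid \boldsymbol{\phi}_i, \mathcal{M}_j).$$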


Similar to our previous work [2], we reject any run that is less likely than the prior model to have generated the recent live-run experiences. This ensures that experience added to the BLR model is likely to improve the performance beyond what could be achieved with no additional experience.

The data weights for each experience in a run are set proportional to the run posterior (16) and normalised by their maximum value. This automatically satisfies $0 \le \lambda_i \le 1$ and means that the effective number of points can increase with each additional run. If we simply used the run posterior probability, the effective number of points used to update the model would remain the average number of points in a run over the portion of the path ahead of the vehicle, which limits how much the model can improve over time.
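The rejection and weighting rules can be sketched as follows, working in log space for numerical stability. We assume equal run priors (as stated above) and, for simplicity, a Gaussian approximation to each run's predictive distribution; `gaussian_log_likelihood` and `run_data_weights` are hypothetical helper names.

```python
import math


def gaussian_log_likelihood(y, means, variances):
    """log p(D_live | model), treating live-run experiences as independent Gaussians."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, means, variances)
    )


def run_data_weights(loglik_runs, loglik_prior):
    """Map run id -> data weight applied to all of that run's experiences.

    loglik_runs: run id -> log p(D_live | M_j), with equal run priors.
    loglik_prior: log-likelihood of D_live under the prior model.
    """
    # Reject runs that explain the live data worse than the prior model does.
    kept = {j: ll for j, ll in loglik_runs.items() if ll >= loglik_prior}
    if not kept:
        return {}
    # With equal priors the posterior is proportional to the likelihood, so
    # normalising by the maximum gives the most similar run a weight of 1.
    best = max(kept.values())
    return {j: math.exp(ll - best) for j, ll in kept.items()}
```

For example, with log-likelihoods of -10, -12 and -50 for three runs and -20 for the prior model, the third run is rejected, the first receives weight 1 and the second receives $e^{-2} \approx 0.14$, so every accepted run contributes data and the effective number of points grows with each similar run.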

Once the weights have been computed, we update the parameters for the predictive model using (8)-(11). These parameters are only used to compute the control for one time-step and do not affect the prior at the next time-step, because by then we will have measured the vehicle's dynamics over the current section of the path and our estimate for the data weights might change.

IV-C Path Following MPC Controller Design

This section outlines our MPC formulation including the path parametrization, cost function, ancillary control design, and uncertainty propagation.

IV-C1 Path Parametrization

For our controller design, we closely follow the results presented in [20] on Model Predictive Contouring Control (MPCC). The central idea is to drive a path parameter, $s_k$, that corresponds to a location along the path, using a virtual input, $v^s_k$, that is solved for as part of the MPC optimization problem. The path parameter has dynamics:

$$s_{k+1} = s_k + \Delta t\, v^s_k, \qquad v^s_k \ge 0,$$

and increases monotonically towards the end of the path. Reference states can then be accessed by querying the reference at the path parameter, $s_k$, for a given time-step. Progress along the path can then be traded against tracking error and other performance metrics in the cost function.

IV-C2 Tracking Error

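In contouring control, the position error is divided into two components [20]: (i) lag error, which is tangent to the path, and (ii) contouring error, which is perpendicular to the path. The other states are compared directly to their references. Let $\mathbf{z}$ be the stacked states and inputs over the prediction horizon. The cost associated with tracking for a given trajectory over the prediction horizon is:

$$J_{track}(\mathbf{z}) = \sum_{k=1}^{K} \mathbf{e}_k^\top \mathbf{S}_k\, \mathbf{e}_k, \qquad \mathbf{e}_k = \begin{bmatrix} \mathbf{R}(s_k)\left(\mathbf{p}_k - \mathbf{p}^{ref}(s_k)\right) \\ \mathbf{x}^{o}_k - \mathbf{x}^{o,ref}(s_k) \end{bmatrix},$$

where $\mathbf{S}_k$ is a diagonal matrix of penalties for each component of error at each time-step in the prediction horizon, $\mathbf{R}(s_k)$ is the matrix that projects position error onto a path-aligned frame, $\mathbf{p}_k$ is the vehicle position, and $\mathbf{x}^{o}_k$ collects the other states. We penalize the rate of change of the inputs to avoid exciting un-modelled dynamics using a quadratic penalty on the numerical derivatives of the inputs, $\Delta \mathbf{u}_k$.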
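The path parameter update and the projection of position error into lag and contouring components can be illustrated with a few lines of Python. The sign convention (contouring error positive to the left of the path) and the helper names are assumptions; the example penalties of 50 and 200 are the lag and contouring penalties reported in Sec. VI-A.

```python
import math


def advance_path_parameter(s, v_s, dt):
    """Path parameter update s_{k+1} = s_k + dt * v_s, with v_s >= 0."""
    if v_s < 0.0:
        raise ValueError("virtual input must be non-negative")
    return s + dt * v_s


def lag_contour_errors(p, p_ref, theta_ref):
    """Rotate the position error into the path-aligned frame at heading theta_ref.

    Returns (lag, contour): lag along the path tangent, contour along the
    left-pointing normal (an assumed sign convention).
    """
    dx, dy = p[0] - p_ref[0], p[1] - p_ref[1]
    c, s = math.cos(theta_ref), math.sin(theta_ref)
    return c * dx + s * dy, -s * dx + c * dy


def tracking_cost(errors, penalties):
    """Diagonal quadratic cost sum_i q_i * e_i**2 for a single time step."""
    return sum(q * e * e for e, q in zip(errors, penalties))
```

For a path heading along the $x$-axis, a vehicle at $(1, 2)$ relative to the reference point has 1 m of lag error and 2 m of contouring error, giving a cost of $50 \cdot 1 + 200 \cdot 4 = 850$ with the penalties above.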

IV-C3 Uncertainty Propagation

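We assume a Gaussian belief over the state at each time step and nonlinear dynamics for the plant. This allows us to use the Extended Kalman Filter (EKF) prediction equations to propagate our belief of the state into the future given a series of inputs [12]. For the prediction, we include uncertainty in the state, actuator model parameters, and actuator model output. Let $\mathbf{J}_k$ be the Jacobian with respect to the stacked state and parameters, $\boldsymbol{\xi}_k$. The mean, $\boldsymbol{\mu}_k$, and covariance, $\boldsymbol{\Sigma}_k$, can be updated using the EKF prediction equations:

$$\boldsymbol{\mu}_{k+1} = \mathbf{F}(\boldsymbol{\mu}_k, \mathbf{u}_k, \mathbf{W}_N),$$

$$\boldsymbol{\Sigma}_{k+1} = \mathbf{J}_k \begin{bmatrix} \boldsymbol{\Sigma}_k & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma}_W \end{bmatrix} \mathbf{J}_k^\top + \mathbf{Q},$$

where $\mathbf{F}$ denotes the combined model (1) evaluated at the posterior mean weights, $\mathbf{W}_N$, $\boldsymbol{\Sigma}_W$ is a block diagonal matrix containing the model weight covariance from (13) for each actuator, and $\mathbf{Q}$ is the process noise covariance. The only non-zero components in $\mathbf{Q}$ are the diagonal elements corresponding to uncertainty in the output of the actuators, for which we use the posterior mean from (12). Alternatives to the EKF, such as the Sigma Point filter or Monte Carlo sampling, could be used here at a slightly higher computational cost for a small increase in accuracy [5, 12].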
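A compact sketch of one EKF prediction step with parameter uncertainty is given below. It takes a generic model `f(x, theta)` (a hypothetical interface), augments the state with the model parameters, and uses a central finite-difference Jacobian for brevity; an analytic Jacobian would normally be preferred. As in the equations above, state and parameter uncertainty are treated as uncorrelated at each step.

```python
import numpy as np


def numerical_jacobian(g, xi, eps=1e-6):
    """Central finite-difference Jacobian of g at xi."""
    g0 = g(xi)
    J = np.zeros((g0.size, xi.size))
    for i in range(xi.size):
        d = np.zeros_like(xi)
        d[i] = eps
        J[:, i] = (g(xi + d) - g(xi - d)) / (2.0 * eps)
    return J


def ekf_predict(f, mu_x, Sigma_x, theta_mean, Sigma_theta, Q):
    """One EKF prediction step over the stacked state and parameters.

    f(x, theta) returns the next state; Sigma_theta is the parameter
    covariance and Q the process noise covariance.
    """
    nx, nth = mu_x.size, theta_mean.size
    xi = np.concatenate([mu_x, theta_mean])
    g = lambda z: f(z[:nx], z[nx:])
    J = numerical_jacobian(g, xi)
    # Block-diagonal covariance of the stacked state and parameters.
    S = np.block([[Sigma_x, np.zeros((nx, nth))],
                  [np.zeros((nth, nx)), Sigma_theta]])
    return g(xi), J @ S @ J.T + Q
```

For a scalar model $x_{k+1} = \theta x_k$ with $x = 2$, $\theta = 0.5$, state variance 0.1, parameter variance 0.01 and process noise 0.001, the predicted variance is $0.25 \cdot 0.1 + 4 \cdot 0.01 + 0.001 = 0.066$, showing how parameter uncertainty grows with the magnitude of the state it multiplies.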

The predicted uncertainty can be used to compute a robust set around the mean prediction within which the true system lies with high probability.

IV-C4 Ancillary State Feedback Controller

The method for uncertainty propagation in Sec. IV-C3 provides a robust bound on the prediction error, but does not take into account the fact that the controller can take corrective actions to reduce the predicted uncertainty [12]. As a result, the predicted uncertainty can grow quickly and without bound, greatly reducing the size of the predicted robust, feasible set [12]. A common approach to account for feedback when predicting uncertainty is to use Tube MPC [6] and to parametrize the control policy as a linear state feedback controller with a feed-forward control input. We can then optimize over the feed-forward control input to drive the mean state of the system while the feedback controller limits uncertainty growth around the mean [12].

In contrast to other approaches for tube MPC for non-linear systems, we make use of the fact that our actuator dynamics are linear to design linear ancillary controllers for these states. This keeps the uncertainty in these states bounded, which limits the uncertainty growth in other states over the prediction horizon. Section V-B shows how we apply this to a unicycle-type robot.

IV-C5 Constraint Tightening

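Since our plant model has uncertainty, we must tighten the constraints on the state and input to make sure the true system respects the true constraints (with high probability), and that the ancillary control policy remains robustly feasible for our choice of the inputs. Our treatment of the constraint tightening follows [12]. For contouring error, $e^c_k$, our chance constraints can be re-expressed as:

$$\left| \mathbf{n}_k^\top \left( \boldsymbol{\mu}^p_k - \mathbf{p}^{ref}(s_k) \right) \right| \le e^c_{max} - z_\epsilon \sqrt{\mathbf{n}_k^\top \boldsymbol{\Sigma}^p_k \mathbf{n}_k},$$

where $\boldsymbol{\mu}^p_k$ and $\boldsymbol{\Sigma}^p_k$ are the mean and covariance of the predicted position, $e^c_{max}$ is the maximum allowed contouring error, $z_\epsilon$ is the quantile of the Gaussian CDF corresponding to the small probability, $\epsilon$, of violating the contouring constraint (e.g. $z_\epsilon = 2.0$) [12], and $\mathbf{n}_k$ is a unit vector perpendicular to the path at $s_k$. Other constraints on the state may be treated analogously.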

Analogous treatment of the input constraints yields:

$$\left| \bar{u}^i_k \right| \le u^i_{max} - z_\epsilon \left| k^i \right| \sigma^{e^i}_k,$$

where $\bar{u}^i_k$ is the $i$th element of the nominal input $\bar{\mathbf{u}}_k$, $k^i$ is an associated ancillary gain which acts on an error of our choosing, $e^i_k$, and $\sigma^{e^i}_k$ is the standard deviation associated with that error. Here, we can see that while the ancillary controller reduces the prediction uncertainty, it also reduces the control input available for controlling the nominal state.
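Both tightening rules reduce to subtracting a quantile-scaled standard deviation from the nominal limit. The sketch below assumes a one-sided Gaussian quantile per half-space constraint and uses illustrative numbers; the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def gaussian_quantile(eps):
    """z such that P(N(0, 1) > z) = eps (one-sided)."""
    return NormalDist().inv_cdf(1.0 - eps)


def tightened_contour_limit(e_max, n, Sigma_p, eps):
    """Right-hand side of the tightened contouring constraint.

    n: unit normal to the path (2-vector); Sigma_p: 2x2 predicted position covariance.
    """
    var = sum(n[i] * Sigma_p[i][j] * n[j] for i in range(2) for j in range(2))
    return e_max - gaussian_quantile(eps) * math.sqrt(var)


def tightened_input_limit(u_max, gain, sigma_e, eps):
    """Input magnitude left for the nominal input after reserving room for feedback."""
    return u_max - gaussian_quantile(eps) * abs(gain) * sigma_e
```

With the 2 m lateral limit from Sec. VI-A, a lateral standard deviation of 0.2 m and $z_\epsilon = 2$, the nominal trajectory may deviate by at most 1.6 m. The input rule makes the trade-off in the text explicit: doubling the ancillary gain doubles the margin reserved for feedback.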

The feedback gain can be chosen as an infinite-horizon LQR controller with the same cost function as MPC [12, 16] or included in the optimization problem [22], but we found that a wide range of gains worked for our system, so we left the gain as a tuning parameter.

IV-D Optimal Control Problem

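At each time-step, we wish to solve for the optimal states and inputs subject to a set of safety constraints derived from the model uncertainty, path tracking error and actuator constraints. The decision variable is $\mathbf{z}$, the stacked states and inputs over the prediction horizon.

This leads to the following optimization problem, which must be solved at every time-step:

$$\begin{aligned} \min_{\mathbf{z}} \quad & J(\mathbf{z}) && (27) \\ \text{subject to} \quad & \text{mean and covariance propagation (Sec. IV-C3)}, && (28) \\ & \text{tightened state constraints (Sec. IV-C5)}, && (29) \\ & \text{tightened input constraints (Sec. IV-C5)}, && (30) \end{aligned}$$

where $J(\mathbf{z})$ combines the tracking cost and input-rate penalty of Sec. IV-C2, and the constraint tightening uses $\boldsymbol{\epsilon}$, a stacked vector of small, acceptable probabilities of violating each state and input constraint. To solve the nonlinear optimization problem efficiently, we linearise about the initial guess from the previous time-step and solve it as a sequential quadratic program.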

V Application to a Ground Robot

This section outlines how to apply our method to the unicycle ground robot pictured in Fig. 3.

Fig. 3: Clearpath Grizzly in the loaded configuration traversing a gravel mound at a target speed of 2.0 m/s with the proposed algorithm.

V-A System Model

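We consider the robot system (see Fig. 3) to be well approximated by a unicycle model with first-order translational and rotational dynamics:

$$\begin{aligned} x_{k+1} &= x_k + \Delta t\, v_k \cos\theta_k, \\ y_{k+1} &= y_k + \Delta t\, v_k \sin\theta_k, \\ \theta_{k+1} &= \theta_k + \Delta t\, \omega_k, \\ v_{k+1} &= \mathbf{w}_v^\top \begin{bmatrix} v_k & v^{cmd}_k \end{bmatrix}^\top + \eta^v_k, \\ \omega_{k+1} &= \mathbf{w}_\omega^\top \begin{bmatrix} \omega_k & \omega^{cmd}_k \end{bmatrix}^\top + \eta^\omega_k, \end{aligned}$$

where $x_k$ and $y_k$ are the position of the vehicle at time step $k$, $\theta_k$ is the heading, $\Delta t$ is the time step, and $v_k$ and $\omega_k$ are the forward speed and turn rate about the body $z$-axis. The control inputs are the commanded speed, $v^{cmd}_k$, and turn rate, $\omega^{cmd}_k$. The parameters of the model that we wish to learn are the coefficients of the first-order actuator dynamics, $\mathbf{w}_v$ and $\mathbf{w}_\omega$, which are column vectors. With $\mathbf{a}_k = [v_k, \omega_k]^\top$ and $\mathbf{u}_k = [v^{cmd}_k, \omega^{cmd}_k]^\top$, this is of the form (1). The process noise in (22) is the variance of $\eta^v_k$ and $\eta^\omega_k$, so $\mathbf{Q}$ is zero except for the diagonal entries corresponding to $v$ and $\omega$.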
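A noise-free (mean) step of this model can be written directly. The ordering of the coefficients, with the actuator state first and the command second, is our assumption about how $\mathbf{w}_v$ and $\mathbf{w}_\omega$ are laid out, and `unicycle_step` is a hypothetical helper.

```python
import math


def unicycle_step(state, cmd, w_v, w_omega, dt):
    """Mean (noise-free) step of the unicycle with first-order actuators.

    state = (x, y, theta, v, omega); cmd = (v_cmd, omega_cmd).
    w_v, w_omega: learned coefficients multiplying (v, v_cmd) and (omega, omega_cmd).
    """
    x, y, theta, v, omega = state
    v_cmd, omega_cmd = cmd
    return (
        x + dt * v * math.cos(theta),
        y + dt * v * math.sin(theta),
        theta + dt * omega,
        w_v[0] * v + w_v[1] * v_cmd,
        w_omega[0] * omega + w_omega[1] * omega_cmd,
    )
```

In this form, the artificial configurations of Sec. VI, which multiply the turn-rate commands by a constant such as 0.7 or 1.2, are equivalent to scaling the second element of $\mathbf{w}_\omega$ by the same constant, which is exactly the kind of change the linear actuator model can learn.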

V-B Ancillary Control Design for the Unicycle with First Order Actuator Dynamics

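The ancillary controller is meant to reduce uncertainty growth to maintain a large robust, feasible set over the prediction horizon. For the unicycle, lateral uncertainty growth (which is constrained) depends on heading uncertainty and speed. Keeping uncertainty in these states low therefore keeps the lateral uncertainty low, maintaining a large robust, feasible set. With a linear feedback controller on the heading and speed error, the speed and turn rate dynamics become:

$$v_{k+1} = \mathbf{w}_v^\top \begin{bmatrix} v_k \\ \bar{v}^{cmd}_k - k_v\, \delta v_k \end{bmatrix} + \eta^v_k, \qquad \omega_{k+1} = \mathbf{w}_\omega^\top \begin{bmatrix} \omega_k \\ \bar{\omega}^{cmd}_k - k_\theta\, \delta\theta_k \end{bmatrix} + \eta^\omega_k,$$

where $\bar{v}^{cmd}_k$ and $\bar{\omega}^{cmd}_k$ are the nominal (feed-forward) commands, $k_v$ and $k_\theta$ are the ancillary gains, and $\delta(\cdot)_k$ is the difference between the state and the predicted mean at time step $k$. These controllers keep the system close to the predicted speed and heading.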
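The effect of the heading feedback on uncertainty growth can be seen by propagating the covariance of the linearized heading and turn-rate errors. The coefficients 0.8 and 0.2, the 0.1 s step and the gain of 2 below are illustrative assumptions, not values identified on the Grizzly.

```python
import numpy as np


def heading_error_variance(k_theta, a_omega, b_omega, dt, sigma2, steps):
    """Propagate the covariance of (heading error, turn-rate error).

    Assumed linear error dynamics:
      dtheta_{k+1} = dtheta_k + dt * domega_k
      domega_{k+1} = a_omega * domega_k - b_omega * k_theta * dtheta_k + noise
    Returns the heading-error variance after each step.
    """
    A = np.array([[1.0, dt], [-b_omega * k_theta, a_omega]])
    Q = np.diag([0.0, sigma2])
    P = np.zeros((2, 2))
    history = []
    for _ in range(steps):
        P = A @ P @ A.T + Q
        history.append(P[0, 0])
    return history
```

Without feedback (`k_theta = 0`) the heading error is an integrated random process and its variance grows roughly linearly with the horizon, whereas with feedback the closed-loop error dynamics are stable and the variance settles to a constant, which is what keeps the lateral uncertainty, and hence the tightened constraints, manageable over a three-second look-ahead.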

VI Experiments

Experiments were conducted on a 900 kg Clearpath Grizzly skid-steer ground robot shown in Fig. 3. First, we compare the predictive performance of a GP to our proposed method on a dataset with varied payload and terrain type. Second, we demonstrate the effectiveness of each component of our algorithm in closed loop. Finally, we demonstrate the path tracking performance of our algorithm at high speed on a 175 m off-road course.

VI-A Implementation

Our algorithm was implemented in C++ on an Intel i7 2.70 GHz 8-core processor with 16 GB of RAM. Our controller relies on a vision-based system, Visual Teach and Repeat [23], for localization, which runs on the same laptop as the controller. The control runs at 10 Hz with a three-second look-ahead discretized by 30 points. The optimization problem (27)-(30) is re-linearized three times, taking an average of 70 ms to compute the control. The model updates (Sec. IV-B1 and IV-B2) are executed at every time step.

We use the last three seconds of data (30 samples) from the live run as the recent live-run experiences for assessing run similarity. The penalties on lag, contouring, heading, speed, and turn rate error are 50, 200, 200, 2, and 2 respectively. The penalties on commanded speed, turn rate, and reference speed from their references are 1, 1, and 50 respectively. The penalties on rate of change of commands in the same order are 10, 15, and 5. The maximum lateral error is 2 m, is 1, and the ancillary controller gains are both . The prior strength, $\nu$, was set to 100. For the high speed experiment, we increased the penalty on commanded turning acceleration from 15 to 20 to achieve smoother performance on the rough terrain.

VI-B Model Predictive Performance Comparison

In order to evaluate the suitability of the proposed method for predictive control, we first evaluate the predictive performance of each component of the proposed method (Sec. IV-B1 and IV-B2) and their combination compared to a GP trained on data from the upcoming section of the path from the previous run. We consider the rotational dynamics because they differ the most between configurations. We used the same GP hyperparameters as in our previous paper [24] since these have been tested extensively in closed loop. We consider a prediction horizon of 3 s, since this was used by the proposed controller in our closed-loop experiments.

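To measure the accuracy of the prediction of the mean, we use the Multi-Step RMS Error (M-RMSE) between the predictions made over the look-ahead horizon and the measured state at these times [2]. To measure the accuracy of the error bound, we use the Multi-Step RMS Z-score (M-RMSZ) of the prediction at the future time steps:

$$\text{M-RMSZ}_k = \sqrt{\frac{1}{K} \sum_{j=1}^{K} \frac{\left(x_{k+j} - \mu_{k+j|k}\right)^2}{\sigma^2_{k+j|k}}},$$

where $\mu_{k+j|k}$ and $\sigma^2_{k+j|k}$ are the mean and variance predicted at time $k$ for time $k+j$, $x_{k+j}$ is the measured state, and $K$ is the number of steps in the look-ahead horizon.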
Referring to Figure 4, the proposed methods achieve lower M-RMSE than the GP, indicating higher accuracy. The M-RMSZ for the proposed method is below 1.0 for all of the configurations, which demonstrates that the proposed method maintains a good estimate of model uncertainty. Therefore, the proposed method is a good candidate for safe, predictive learning control in a wide range of operating conditions.
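The two metrics can be sketched for the predictions made at a single time step over the horizon (30 samples for the 3 s look-ahead at 10 Hz). We assume unweighted averaging over the horizon; how the per-step values are aggregated across time steps and runs for Fig. 4 is not reproduced here, and the function names are hypothetical.

```python
import math


def m_rmse(pred_mean, measured):
    """Multi-step RMS error between predicted means and measurements."""
    n = len(pred_mean)
    return math.sqrt(sum((m - p) ** 2 for p, m in zip(pred_mean, measured)) / n)


def m_rmsz(pred_mean, pred_std, measured):
    """Multi-step RMS z-score; values below 1 indicate conservative uncertainty."""
    n = len(pred_mean)
    return math.sqrt(
        sum(((m - p) / s) ** 2 for p, s, m in zip(pred_mean, pred_std, measured)) / n
    )
```

An M-RMSZ near 1 means the predicted standard deviations match the observed errors, values below 1 mean the model over-estimates its uncertainty (safe but conservative), and values above 1 mean it is over-confident, which is the failure mode a safe controller must avoid.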

[Figure: Predictive Performance Comparison. Legend: GP, Fast Adaptation, Fast+Long-term, Long-term; x-axis: Run Number.]
Fig. 4: A comparison of M-RMSE and M-RMSZ for the rotational dynamics with the vehicle in five different configurations. The error bars indicate the 25th and 75th percentiles and the marker indicates the median. The 65 m path traversed sand, gravel, and concrete. Runs 2-4 are in the Loaded configuration, with 6 gravel bags in the rear of the Grizzly (see Fig. 3); runs 6-8 are in the Nominal configuration (no modification); runs 9-12 are in the Loaded & Understeer configuration, where the vehicle is loaded and the turn rate commands are multiplied by 0.7; runs 13-16 are in the Loaded & Oversteer configuration, where the vehicle is loaded and the turn rate commands are multiplied by 1.2; and runs 17-20 are in the Oversteer configuration, where the turn rate commands are multiplied by 1.2. This figure shows that the performance of the GP model depends strongly on the operating conditions and in some cases does not improve over time, whereas the BLR methods perform well in all operating conditions.

VI-C Closed Loop Tracking Performance Comparison

To demonstrate the impact of each component of our method on closed-loop performance and to show that our method can adapt to repetitive model errors, we drive the vehicle along a path consisting of two laps of a circular course and apply an artificial disturbance by multiplying the turn rate commands by 0.5 at the start of the second lap (vertex 100 in Fig. 5). We compare the tracking performance of each component of our algorithm over eight repeats of the path. For this experiment, the desired speed was 2 m/s.

Figure 5 shows that all methods achieve similar performance before the disturbance is applied because, over this portion of the path, the model used by every method is a good representation of the vehicle dynamics. After this point, the non-learning controller incurs a large lateral error because its model is no longer accurate. Long-term learning (Sec. IV-B2) similarly incurs a large path tracking error on the first run (see Figure 6) since there are no previous runs to draw experience from. After the first run, it improves greatly but then converges slowly because it is constantly working against a static prior (the same model used for the non-learning comparison), which is incorrect after the disturbance is applied. When fast adaptation (Sec. IV-B1) is enabled, the controller incurs a large tracking error at the moment the disturbance is applied but, as expected, adapts quickly to the new robot dynamics and achieves a low error. When both fast adaptation and long-term learning are enabled, fast adaptation keeps the prior close to the true dynamics so that long-term learning can reduce the transient error by leveraging data from the upcoming section of the path. This combination achieves the lowest path tracking error and the fastest convergence (see Fig.
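To illustrate why a static prior that is wrong slows convergence, consider a toy scalar Bayesian linear regression of an effective turn-rate gain θ in y = θu + noise, with a Gaussian prior centred on the nominal gain of 1.0. This is a hypothetical one-parameter example, not the paper's model. In particular, treating the prior strength as a number of pseudo-observations (prior variance = noise variance / strength) is our assumption, and the value 100 below only echoes the prior strength reported in the implementation details above.

```python
def scalar_blr_posterior(u, y, prior_mean, prior_strength, noise_var=1.0):
    """Posterior over theta for y = theta * u + Gaussian noise.

    Hypothetical parameterization: the prior is N(prior_mean,
    noise_var / prior_strength), so prior_strength acts like a count of
    pseudo-observations with unit input.
    Returns (posterior mean, posterior variance).
    """
    suu = sum(ui * ui for ui in u)                 # sum of u_i^2
    suy = sum(ui * yi for ui, yi in zip(u, y))     # sum of u_i * y_i
    mean = (prior_strength * prior_mean + suy) / (prior_strength + suu)
    variance = noise_var / (prior_strength + suu)
    return mean, variance
```

After the disturbance, 30 noiseless samples with u = 1 and y = 0.5 give a posterior mean of 16/31 ≈ 0.52 with prior strength 1, but 115/130 ≈ 0.88 with prior strength 100: the stronger prior holds the estimate far from the true gain of 0.5 until much more data accumulates, which is the behaviour that fast adaptation avoids by keeping the prior itself close to the current dynamics.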


[Figure: Lateral Error Distribution by Vertex. Legend: No Learning, Fast Adaptation, Long-term, Fast+Long-term; x-axis: Vertex ID; annotation: Disturbance Applied Here.]
Fig. 5: This figure shows the closed-loop performance of the controller when we introduce a large, repetitive disturbance at vertex 100 by multiplying the turn rate commands by 0.5 after this point. Such a disturbance resembles what one might expect if the vehicle were traversing a patch of ice. The solid line indicates the mean lateral tracking error over eight runs and the shaded region indicates one standard deviation. The proposed method with both fast adaptation and long-term learning achieves the lowest error and the fastest convergence.
[Figure: Lateral Error Distribution by Run. Legend: No Learning, Fast Adaptation, Long-term, Fast+Long-term; x-axis: Run Number (1-8).]
Fig. 6: Closed-loop performance of each element of the proposed algorithm in the presence of the large, repetitive disturbance described in Sec. VI-C. This figure shows that the proposed algorithm is able to adapt quickly to new scenarios and that the combination of fast adaptation (Sec. IV-B1) and long-term learning (Sec. IV-B2) achieves the best performance.

VI-D High Speed Tracking Performance

Finally, we evaluated the performance of our controller on a 175 m off-road course with tight turns and fast straights. The desired speed was 3 m/s, and the controller achieved an average speed of 1.6 m/s, a top speed of 2.7 m/s, and an RMS lateral error of 0.25 m. This is a significant improvement over our previous work, where the controller achieved an average speed of around 1.0 m/s on pavement [2].

[Figure: Vehicle Trajectory Colored by Forward Speed; axes in m.]
Fig. 7: This figure shows the path taken by the vehicle on five traverses of a 175 m course. The direction of travel is indicated by the black arrows. The maximum path tracking error, 0.7 m, occurs where the controller cuts a corner (dashed blue circle). The vehicle was in the Nominal configuration.

VII Conclusions

In this paper, we have proposed a new method for long-term, safe learning control based on local, weighted BLR. This method is computationally inexpensive, which enables fast model updates and allows us to leverage large amounts of data gathered over previous traverses of a path. This, in turn, enables both fast adaptation to new scenarios and high-accuracy tracking in the presence of repetitive model errors. The model parameters can be determined reliably online, which allows our method to be applied in a wide range of operating conditions with little to no tuning. We have demonstrated the effectiveness of the proposed approach in a range of challenging, off-road experiments. We encourage the reader to watch our video at http://tiny.cc/fast-slow-learn showing the experiments and datasets used in this paper.


  • [1] L. G. Dekker, J. A. Marshall, and J. Larsson. Industrial-Scale Autonomous Wheeled-Vehicle Path Following by Combining Iterative Learning Control with Feedback Linearization. In Proc. of the Intl. Conference on Intelligent Robots and Systems (IROS), pages 2643–2648, 2017.
  • [2] C. McKinnon and A. P. Schoellig. Experience-Based Model Selection to Enable Long-Term, Safe Control for Repetitive Tasks Under Changing Conditions. In Proc. of the Intl. Conference on Intelligent Robots and Systems (IROS), 2018 (accepted).
  • [3] J. Fu, S. Levine, and P. Abbeel. One-shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors. In Proc. of the Intl. Conference on Intelligent Robots and Systems (IROS), pages 4019–4026, 2016.
  • [4] K. Pereida, D. Kooijman, R. Duivenvoorden, and A. P. Schoellig. Transfer Learning for High-precision Trajectory Tracking Through L1 Adaptive Feedback and Iterative Learning. Intl. Journal of Adaptive Control and Signal Processing, 52(6):802–823, 2018.
  • [5] C. Ostafew, A. P. Schoellig, and T. Barfoot. Robust Constrained Learning-based NMPC Enabling Reliable Mobile Robot Path Tracking. Intl. Journal of Robotics Research (IJRR), 35(13):1547–1563, 2016.
  • [6] A. Aswani, H. Gonzalez, S. Sastry, and C. Tomlin. Provably Safe and Robust Learning-based Model Predictive Control. Automatica, 49(5):1216–1226, 2013.
  • [7] C. Xie, S. Patil, T. Moldovan, S. Levine, and P. Abbeel. Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 504–511, 2016.
  • [8] G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou. Information Theoretic MPC for Model-Based Reinforcement Learning. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 1714–1721, 2017.
  • [9] B. Luders, I. Sugel, and J. How. Robust Trajectory Planning for Autonomous Parafoils Under Wind Uncertainty. In Proc. of the AIAA Conference on Guidance, Navigation and Control, 2013.
  • [10] Q. Li, J. Qian, Z. Zhu, X. Bao, M. K. Helwa, and A. P. Schoellig. Deep Neural Networks for Improved, Impromptu Trajectory Tracking of Quadrotors. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 5183–5189, 2017.
  • [11] G. Aoude, B. Luders, J. Joseph, N. Roy, and J. How. Probabilistically Safe Motion Planning to Avoid Dynamic Obstacles with Uncertain Motion Patterns. Autonomous Robots, 35(1):51–76, 2013.
  • [12] L. Hewing and M. N. Zeilinger. Cautious Model Predictive Control Using Gaussian Process Regression. arXiv preprint arXiv:1705.10702, 2017.
  • [13] J. Ting, A. D’Souza, S. Vijayakumar, and S. Schaal. A Bayesian Approach to Empirical Local Linearization for Robotics. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 2860–2865, 2008.
  • [14] V. Desaraju, A. Spitzer, and N. Michael. Experience-driven Predictive Control with Robust Constraint Satisfaction under Time-Varying State Uncertainty. In Proc. of Robotics: Science and Systems (RSS), 2017.
  • [15] G. Sicard, C. Salaün, S. Ivaldi, V. Padois, and O. Sigaud. Learning the Velocity Kinematics of iCub for Model-based Control: XCSF versus LWPR. In Proc. of the Intl. Conf. on Humanoid Robots, pages 570–575, 2011.
  • [16] Y. Gao, A. Gray, H. Tseng, and F. Borrelli. A Tube-based Robust Nonlinear Predictive Control Approach to Semiautonomous Ground Vehicles. Vehicle System Dynamics, 52(6):802–823, 2014.
  • [17] K. Jo, K. Chu, and M. Sunwoo. Interacting Multiple Model Filter-based Sensor Fusion of GPS with In-vehicle Sensors for Real-time Vehicle Positioning. IEEE Transactions on Intelligent Transportation Systems, 13(1):329–343, 2012.
  • [18] R. Calandra, S. Ivaldi, M. Deisenroth, E. Rueckert, and J. Peters. Learning Inverse Dynamics Models with Contacts. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), pages 3186–3191, 2015.
  • [19] R. Pautrat, K. Chatzilygeroudis, and J. Mouret. Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search. arXiv preprint arXiv:1709.06919, 2017.
  • [20] D. Lam, C. Manzie, and M. Good. Model Predictive Contouring Control. In Proc. of the Conf. on Decision and Control (CDC), pages 6137–6142, 2010.
  • [21] K. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
  • [22] M. Vitus and C. Tomlin. On Feedback Design and Risk Allocation in Chance Constrained Control. In Proc. of the Conf. on Decision and Control and European Control Conference (CDC-ECC), pages 734–739, 2011.
  • [23] M. Paton, F. Pomerleau, K. MacTavish, C. Ostafew, and T. Barfoot. Expanding the Limits of Vision-based Localization for Long-term Route-following Autonomy. Journal of Field Robotics (JFR), 34(1):98–122, 2017.
  • [24] C. D. McKinnon and A. P. Schoellig. Learning Multi-Modal Models for Robot Dynamics with a Mixture of Gaussian Process Experts. In Proc. of the Intl. Conf. on Robotics and Automation (ICRA), 2017.