Learning based control for autonomous racing
Learning to race autonomously is a challenging problem. It requires perception, estimation, planning, and control to work together in synchronization while driving at the limit of a vehicle's handling capability. A fundamental challenge lies in predicting the vehicle's future states, such as position, orientation, and speed, with high accuracy, because it is inevitably hard to identify vehicle model parameters that capture the real nonlinear dynamics in the presence of lateral tire slip. We present a model-based planning and control framework for autonomous racing that significantly reduces the effort required in system identification. Our approach bridges the gap between the design in simulation and the real world by learning from on-board sensor measurements. Teams participating in autonomous racing competitions can thus start racing on new tracks without having to worry about tuning the vehicle model.
Learning from experience is essential to racing due to the repetitive nature of the task. It forms an integral part of the professional training of racing drivers and their preparation before a race, which we can describe in three steps. First, the drivers identify the best racing strategy in a simulator to minimize their lap time. Second, they practice in the simulator to execute the same strategy and produce the best lap time consistently. Third, they get out of the simulator and onto the real track to fine-tune their racing strategy to compensate for sim-to-real differences. These steps can be extended naturally to autonomous racing. First, we compute the racing line for a given track profile. Second, we design a motion planner and controller in a simulation (assuming some model of vehicle dynamics) that minimize the deviation from the precomputed racing line. Third, to optimize the performance of this controller on a real vehicle, we learn to compensate for the mismatch between the model used in the simulation and real vehicle dynamics.
Bridging this simulation-to-reality gap is challenging because it is hard to obtain a high-fidelity model of vehicle dynamics, especially at the limit of the vehicle's handling capability. While the kinematics of the vehicle are precisely known, the dynamics, specifically the lateral tire forces, are complex nonlinear functions whose identification requires several time-intensive experiments; see [Liniger2018] for an elaborate process of model tuning. A wrong choice of model parameters can severely affect the controller's performance in terms of lap times and meeting critical safety constraints. Moreover, since the tire forces strongly depend on the racing surface, one must repeat the process of system identification whenever the track is changed.
In this paper, we present a model-based planning and control framework for autonomous racing that significantly reduces the effort required in model identification by learning from prior experience.
Related work. Given the repetitive nature of the task, the racing problem is formulated as an iterative learning control problem in [Kapania2015]. First, the racing line is derived using professional driving techniques [Theodosis2011], and then a proportional derivative (PD) controller is used to track this racing line. The performance of the controller in the current lap is improved based on knowledge of the tracking error from the previous lap. This work falls in the realm of model-free control methods. Another example is end-to-end learning, which maps images from a camera directly to control actions like steering and throttle [Bojarski2016, Balaji2019]. Arguably, a model-based method like model predictive control (MPC) is more suitable for autonomous racing. MPC predicts the future states using a model of the vehicle dynamics and explicitly handles track constraints and obstacle avoidance, allowing the vehicle to pull off aggressive maneuvers while staying under control. MPC is implemented in the form of hierarchical receding horizon control (HRHC) in [Liniger2015], where first a trajectory that provides maximum progress along the track is generated using a motion planner, and then MPC is used for path tracking. An alternative is to combine motion planning and predictive control into a joint nonlinear optimization problem called model predictive contouring control (MPCC) [Liniger2015]. The performance of MPC can seriously deteriorate with an incorrect choice of model parameters. Thus, learning-based control algorithms play an important role in autonomous racing, where we seek to correct the inaccurate parameter estimates by collecting real-world data. In light of this, an iterative procedure that uses data from previous laps to identify an affine time-varying model of vehicle dynamics and reformulate the MPC problem with an updated terminal set and terminal cost is proposed in [Rosolia2019].
It is shown in [Hewing2018] that model mismatch to the tune of 15% can be fixed with the help of a Gaussian process (GP) in the MPCC problem. All the above variants of MPC [Liniger2015, Rosolia2019, Hewing2018] use the so-called dynamic model, which is too complex and time-intensive to tune.
In contrast, our approach requires a much simpler extended kinematic model that has only three tuning parameters; the unmodeled component of the dynamics is learned using three GP models. We provide an in-depth comparison of different types of vehicle models in Section 2.
Contributions. We show that using the extended kinematic model (all three of whose parameters, namely the mass and the distances of the center of gravity from the front and rear wheels, can be physically measured) as a nominal model, and thereafter using Gaussian processes to correct the model mismatch, we converge to a model that closely matches the real vehicle dynamics. These GP models for error correction are trained on real sensor measurements that can be obtained by driving the vehicle around with a model-free controller (like pure pursuit) or even manual control on any track; see Sections 4.1-4.2. We demonstrate the efficacy of our approach with the design of a motion planner (trajectory generator) and MPC for tracking pre-computed racing lines using this corrected model in Section 4.3. We show that the performance is further enhanced by updating the GP models with data generated by MPC in Section 4.4. Our learning procedure is essential to reducing the cost of system identification and thus enables rapid sim-to-real transfer. It is especially relevant to teams participating in autonomous racing competitions, who can design a competitive controller without spending time on model tuning. We present experiments in simulations with 1:43 scale miniature race cars at ETH Zürich.
Among many choices for the models of vehicle dynamics, the most widely used are kinematic and dynamic bicycle models, see expressions for a rear-wheel drive in Table 1 and more details in [Kong2015, Rajamani2012].
Notation. We use the following nomenclature throughout the paper. States, inputs, and forces: (x, y) are the coordinates in an inertial frame, ψ is the inertial heading, v and a are speed and acceleration in the inertial frame, vx and vy are velocities in the body frame, ω is the angular velocity, δ is the steering angle, Δδ is the change in the steering angle, Fx is the longitudinal force in the body frame, Ff,y and Fr,y are the lateral forces in the body frame with subscripts f and r denoting front and rear wheels, respectively, and αf and αr are the corresponding slip angles. Vehicle model parameters: m denotes the mass, Iz the moment of inertia about the vertical axis passing through the center of gravity, and lf and lr the distances of the center of gravity from the front and the rear wheels in the longitudinal direction. Bf, Cf, Df, Br, Cr, and Dr are track-specific parameters for the tire force curves.
[Table 1: Equations of the kinematic, extended kinematic, and dynamic bicycle models for a rear-wheel drive vehicle; the lateral forces in the dynamic model follow the Pacejka tire model.]
The kinematic model is preferred in some applications [Thrun2006, Kanayama1990] for its simplicity, as it requires only two tuning parameters, namely the lengths lf and lr, which can be physically measured. The kinematic model ignores the effect of tire slip and thus does not reflect the actual dynamics at high-speed cornering. Therefore, it is considered unsuitable for model-based control in autonomous racing.
The dynamic model, on the other hand, is more complex and painful to tune, as it requires several tests to identify tire, drivetrain, and friction parameters. The lateral forces are typically modeled using a Pacejka tire model; see Table 1 and [Bakker1987]. A complete procedure of system identification is available in [Liniger2018]. When well tuned, the dynamic model is considered suitable for autonomous racing in the MPC framework [Liniger2015, Rosolia2019, Hewing2018, Kabzan2019]. However, the model complexity makes the tuning procedure time prohibitive, especially when the tire slip curves must be re-calibrated for a new racing surface, which is indeed common for autonomous racing competitions.
Extended kinematic model. The essential difference between the kinematic and dynamic models is that three states, vx, vy, and ω, are not defined in the former. Thus, to easily measure the discrepancy between real measurements and model predictions, we consider a variant of the kinematic model that has the same states as the dynamic model. We call this the extended kinematic (e-kinematic) model; see its mathematical representation in Table 1. The advantage of using the e-kinematic model is that it has only three tuning parameters, namely m, lf, and lr, all of which can be physically measured. However, unlike the dynamic model, which is closer to the real dynamics, the e-kinematic model does not consider tire forces. Thus, using it in MPC in its standard form will result in undesirable errors. Specifically, the evolution of the first three states, x, y, and ψ, is exactly the same in the e-kinematic and dynamic models; the difference lies only in vx, vy, and ω. Our learning procedure presented in Section 4 is based on reducing the mismatch between the e-kinematic model and the real measurements (or estimates) of the states x, y, ψ, vx, vy, and ω. The e-kinematic model is used in [Kabzan2019] to approximate the vehicle dynamics at low speeds, where the Pacejka model is undefined due to division by vx.
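To make the no-slip structure concrete, here is a minimal sketch of one possible Euler discretization of the e-kinematic model (Table 1 gives the exact equations). The geometry values and the use of acceleration as a direct input are illustrative assumptions, not the paper's settings.

```python
import math

def ekin_step(state, inputs, params, Ts=0.02):
    """One Euler step of an extended kinematic bicycle model (sketch).

    state  = (x, y, psi, vx, vy, omega)
    inputs = (a, delta): longitudinal acceleration and steering angle.
    Under the no-slip assumption, the lateral states follow algebraically:
      omega = vx * tan(delta) / (lf + lr),   vy = lr * omega.
    """
    x, y, psi, vx, vy, omega = state
    a, delta = inputs
    lf, lr = params["lf"], params["lr"]

    x_next = x + Ts * (vx * math.cos(psi) - vy * math.sin(psi))
    y_next = y + Ts * (vx * math.sin(psi) + vy * math.cos(psi))
    psi_next = psi + Ts * omega
    vx_next = vx + Ts * a
    omega_next = vx_next * math.tan(delta) / (lf + lr)
    vy_next = lr * omega_next
    return (x_next, y_next, psi_next, vx_next, vy_next, omega_next)

# illustrative geometry for a small-scale car (not the paper's exact values)
params = {"lf": 0.029, "lr": 0.033}
s = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0)
s = ekin_step(s, (1.0, 0.0), params)   # one step of straight-line acceleration
```

With zero steering, the lateral states stay at zero and the model reduces to a point mass along the heading, which is why the mismatch against the dynamic model appears only once the car turns.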
Comparison. We compare the response of all three models to the same inputs in Figure 1. A constant acceleration of 1 m/s² is applied for 1 s starting from zero initial speed, while the steering angle is kept constant at 0.2 rad. The vehicle parameters are taken from [Liniger2015]. The impact of model mismatch is evident while turning, even at low speeds, as the nonlinear lateral tire forces start to dominate; the trajectories diverge with time. When the dynamic model is well tuned, the real vehicle dynamics are best represented by the orange curve.
The experiments are performed in simulations on the 1:43 scale autonomous racing platform at ETH Zürich [Liniger2015]. The real vehicle dynamics are simulated using the dynamic model. The model predictive controller uses the e-kinematic model with error correction to make real-time decisions for minimizing the lap time. This is graphically illustrated in Figure 2. In Section 4, we show how BayesRace learns this error correction function using Gaussian processes. We also compare BayesRace to two different scenarios: (1) WorstCase, when there is no correction for model mismatch, i.e., MPC uses the plain e-kinematic model in Figure 2, and (2) BestCase, when MPC has full knowledge of the real dynamics, i.e., MPC uses the dynamic model in Figure 2.
The vehicle (dynamic model) is powered by a DC electric motor. The longitudinal force is given by

Fx = (Cm1 - Cm2 vx) d - Cr0 - Cr2 vx²,

where Cm1 and Cm2 are the known coefficients of the motor model, Cr0 is the rolling resistance, Cr2 the drag resistance, and d the pulse width modulation (PWM) duty cycle for the motor. A positive d implies an acceleration and a negative d a deceleration. For the e-kinematic model, we further reduce the complexity by ignoring rolling and drag resistance:

Fx = (Cm1 - Cm2 vx) d.

Thus, with this definition, the states of both models are (x, y, ψ, vx, vy, ω) and the inputs are (d, δ). We denote the discrete-time representation of the e-kinematic model by fekin. We assume that the car is equipped with the relevant sensors needed for state estimation, mapping, and localization. For further details, we refer the reader to [Kabzan2019, Valls2018].
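As a sketch, the two longitudinal-force variants can be written side by side; the coefficient values in the usage lines are made up for illustration, not identified parameters.

```python
def f_long_dynamic(vx, d, Cm1, Cm2, Cr0, Cr2):
    """Motor model of the dynamic vehicle: drivetrain force minus
    rolling resistance (Cr0) and aerodynamic drag (Cr2 * vx**2)."""
    return (Cm1 - Cm2 * vx) * d - Cr0 - Cr2 * vx ** 2

def f_long_ekin(vx, d, Cm1, Cm2):
    """Simplified force used with the e-kinematic model: the rolling
    and drag resistance terms are dropped."""
    return (Cm1 - Cm2 * vx) * d

# illustrative coefficients (not the identified values from the paper)
Fx_dyn = f_long_dynamic(1.0, 0.5, Cm1=0.3, Cm2=0.05, Cr0=0.05, Cr2=0.0004)
Fx_kin = f_long_ekin(1.0, 0.5, Cm1=0.3, Cm2=0.05)
```

The simplified force always upper-bounds the dynamic one at positive speed, and this systematic bias is part of the mismatch the GP models later absorb.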
We break down our approach into four steps: (1) data capture, (2) training of Gaussian process models, (3) predictive controller design, and (4) model update by exploration.
We begin by collecting sensor measurements and actuation data from the vehicle by driving it around using a simple controller. A pure pursuit controller [Coulter1992] is a popular choice for path tracking and requires little tuning effort; it was reportedly used by three teams in the DARPA Urban Challenge [Buehler2009]. For a known track, we compute the racing line using [JainCDC2020] and then track it using the pure pursuit controller. The controller gain and look-ahead distance are intentionally not well tuned, which keeps the maneuvers non-aggressive. We collect the data sampled every 20 ms in the form of state-action-state tuples (x(k), u(k), x(k+1)), for k = 0, ..., N-1, where N is the length of the trajectory. The racing line and the trajectory taken by the car are shown in Figure 3. As discussed in Section 3 and Figure 2, the measured next state x(k+1) comes from the dynamic model. In practice, one could drive the vehicle on a track using manual controls, or use a similar pure pursuit controller to drive it autonomously, to collect the real-world data.
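A pure pursuit steering law is simple enough to sketch in a few lines. The function below follows the standard geometric formulation from [Coulter1992]; the wheelbase value in the test usage is arbitrary, and the choice of the look-ahead point on the path is left to the caller.

```python
import math

def pure_pursuit_steering(pose, target, wheelbase):
    """Pure pursuit steering: steer along the circular arc that passes
    through a look-ahead point on the reference path.

    pose   = (x, y, psi) of the vehicle; target = (tx, ty) look-ahead point.
    Returns the steering angle in radians.
    """
    x, y, psi = pose
    tx, ty = target
    # angle to the look-ahead point, expressed in the body frame
    alpha = math.atan2(ty - y, tx - x) - psi
    ld = math.hypot(tx - x, ty - y)  # look-ahead distance
    # curvature of the connecting arc is 2*sin(alpha)/ld
    return math.atan2(2.0 * wheelbase * math.sin(alpha), ld)

# a target straight ahead commands zero steering
delta = pure_pursuit_steering((0.0, 0.0, 0.0), (1.0, 0.0), wheelbase=0.062)
```

Because the law has only the gain-like look-ahead distance to choose, a rough setting already produces the gentle, slightly sloppy tracking that is desirable for collecting non-aggressive training data.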
Training. We use the collected data to address the model mismatch between the dynamic and e-kinematic models. Since the parameters of the e-kinematic model are known, we generate a new dataset that captures its response when excited with the same inputs starting from the same initializations: the tuples (x(k), u(k), fekin(x(k), u(k))), where x(k) and u(k) come from the captured data. Together, these pairs form the training dataset. Our next goal is to learn the single-step model mismatch error:

e(k) = x(k+1) - fekin(x(k), u(k)).
Note that, based on the description in Table 1, the measured next state and the e-kinematic prediction differ in only three states, namely vx, vy, and ω. Thus, the error is of the form e = (0, 0, 0, *, *, *), where * denotes the nonzero terms. For each state with nonzero error, we learn a Gaussian process model of the form

e^j ~ GP(μ^j, K^j), (4)
where j equal to 4, 5, 6 corresponds to the model mismatch in the states vx, vy, and ω, respectively. Each mean function μ^j and covariance function K^j is a function of the training data, and their closed-form expressions are known; for more details see [Rasmussen2006]. Now the corrected model that is suitable for controller design is related to the e-kinematic model as

x(k+1) = fekin(x(k), u(k)) + e(x(k), u(k)). (5)
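The idea of regressing the one-step mismatch with a GP can be illustrated with a self-contained posterior computation. This sketch fixes the squared-exponential hyperparameters by hand instead of optimizing them, and the toy inputs stand in for the real (state, input) features and measured errors.

```python
import numpy as np

def rbf(A, B, length=1.0, sigma_f=1.0):
    """Squared-exponential kernel between row-stacked input sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f ** 2 * np.exp(-0.5 * d2 / length ** 2)

def gp_fit_predict(Z_train, e_train, Z_test, noise=1e-4):
    """GP posterior mean and variance for one mismatch component, e.g. the
    vx error between the measured next state and the e-kinematic prediction.
    Standard Cholesky-based GP regression; hyperparameters are fixed here."""
    K = rbf(Z_train, Z_train) + noise * np.eye(len(Z_train))
    Ks = rbf(Z_test, Z_train)
    Kss = rbf(Z_test, Z_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, e_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - (v ** 2).sum(0)
    return mean, var

# toy mismatch data: error grows with speed (purely illustrative)
Z = np.linspace(0.0, 2.0, 20)[:, None]
e = 0.1 * Z[:, 0] ** 2
mean, var = gp_fit_predict(Z, e, Z)
```

Evaluating at the training inputs recovers the errors almost exactly, while the predictive variance, which is what the validation step below visualizes as confidence intervals, stays near the noise floor there and grows away from the data.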
Validation. We validate the trained GP models on a new track, shown in Figure 4, but this time we drive the car with a more aggressive controller. In practice, we will never know the real vehicle dynamics; for the purpose of testing the quality of the trained models, however, we consider a trajectory from the BestCase scenario, in which an MPC controller is designed to minimize lap time using full knowledge of the dynamics. This trajectory is more aggressive than the one obtained with the pure pursuit controller used for training and thus also captures high-speed cornering. The mean predictions and 95% confidence intervals for all three erroneous states are shown in Figure 5. The regions where the predictions have high uncertainty are marked on the track in Figure 4. The GP models have high uncertainty mostly during high-speed cornering and while braking before corners.
Controller. Our goal is to design a predictive controller that tracks the racing line using the corrected e-kinematic model. To reduce the computational complexity of the controller, we eliminate stochasticity in (5) by approximating the probability distributions of the errors e^j by their mean estimates μ^j. Thus, the corrected e-kinematic model used in the controller design is given by

x(k+1) = fekin(x(k), u(k)) + μ(x(k), u(k)). (6)
We know the analytical (non-convex) expressions of all the mean functions μ^j from the training step. At any time, given the current state estimate, we solve the following nonlinear program recursively in a receding horizon manner.
Here, the cost uses weighted quadratic norms, with fixed tracking, actuation, and slack penalty weights. The reference trajectory is generated using the motion planner described in the following paragraph. The set of constraints in (7d) comes from the track boundary, approximated by two hyperplanes for each time step in the horizon. These hyperplanes are parallel to the direction of the centerline at the projection of the reference on the centerline. The slack variables are introduced to prevent infeasibilities. Actuation constraints are defined in (7e)-(7g). The optimization problem is solved every 20 ms using IPOPT [Waechter2009] with CasADi [Andersson2018].
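The paper solves the nonlinear program with IPOPT through CasADi. As a self-contained stand-in, the toy receding-horizon tracker below uses scipy.optimize on a reduced e-kinematic model (position, heading, and longitudinal speed only); the horizon length, input bounds, and weights are illustrative, and the track-boundary and slack terms of (7) are omitted.

```python
import math
import numpy as np
from scipy.optimize import minimize

Ts, N = 0.02, 10           # sampling time and horizon (illustrative)
lf, lr = 0.029, 0.033      # illustrative geometry, not the paper's values

def step(s, u):
    """Euler step of a reduced e-kinematic model with no-slip lateral states."""
    x, y, psi, vx = s
    a, delta = u
    omega = vx * math.tan(delta) / (lf + lr)
    return np.array([x + Ts * vx * math.cos(psi),
                     y + Ts * vx * math.sin(psi),
                     psi + Ts * omega,
                     vx + Ts * a])

def cost(uflat, s0, ref):
    """Quadratic tracking cost over the horizon plus a small input penalty."""
    u = uflat.reshape(N, 2)
    s, J = s0.copy(), 0.0
    for k in range(N):
        s = step(s, u[k])
        J += np.sum((s[:2] - ref[k]) ** 2) + 1e-3 * np.sum(u[k] ** 2)
    return J

s0 = np.array([0.0, 0.0, 0.0, 1.0])
# reference: keep driving straight ahead at the current 1 m/s
ref = [np.array([Ts * (k + 1), 0.0]) for k in range(N)]
res = minimize(cost, np.zeros(2 * N), args=(s0, ref),
               bounds=[(-2.0, 2.0), (-0.4, 0.4)] * N)
```

In closed loop, only the first input of the solution would be applied before re-solving, which is the receding-horizon pattern the paper uses every 20 ms.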
Motion planner. The reference trajectory at each time step in (7) is based on the racing line computed using Bayesian optimization [JainCDC2020]. This racing line provides not only the path followed around the track but also the optimal speed profile vref along the path as a function of the distance s traveled along the track. For each time step in the horizon we compute

s(k+1) = s(k) + Ts vref(s(k)), (8)

where the initial distance is obtained from the projection of the current position on the racing line and Ts is the sampling time, equal to 20 ms. Any other trajectory generator, like the lattice planner in [Howard2007], can also be used.
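One plausible implementation of this arc-length propagation is sketched below; the function and variable names are ours, and linear interpolation of the racing-line coordinates is an assumption.

```python
import numpy as np

def reference_horizon(raceline_xy, raceline_s, v_ref, s_now, N, Ts=0.02):
    """Sample N reference points by propagating arc length along the racing
    line, s_next = s + Ts * v_ref(s), then interpolating the path coordinates
    at each distance. v_ref maps distance traveled to the optimal speed."""
    s = s_now
    total = raceline_s[-1]
    refs = []
    for _ in range(N):
        s = (s + Ts * v_ref(s)) % total          # wrap around at lap end
        x = np.interp(s, raceline_s, raceline_xy[:, 0])
        y = np.interp(s, raceline_s, raceline_xy[:, 1])
        refs.append((s, x, y))
    return refs

# toy racing line: a 10 m straight, tracked at a constant 1 m/s
line = np.array([[0.0, 0.0], [10.0, 0.0]])
dist = np.array([0.0, 10.0])
refs = reference_horizon(line, dist, lambda s: 1.0, s_now=0.0, N=5)
```

Because the speed profile enters the propagation, the spacing of reference points automatically contracts before corners (low vref) and stretches on straights, so the MPC receives a time-consistent reference.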
Effect of model correction. We show the path followed by the vehicle with the BayesRace controller (7) in Figure 6. We compare this to the WorstCase scenario, when MPC uses the e-kinematic model without error correction, in Figure 7. In both figures, after every 0.5 s, we also compare the solution of the optimization solver (in red) to the open-loop trajectory obtained by applying the same inputs to the vehicle, in our case the dynamic model (in green). The higher the deviation between the red and green curves, the higher the model mismatch. If the optimization solver used the exact model of the real vehicle dynamics, the only source of discrepancy would be discretization. We illustrate how correction with GP models in Figure 6 reduces the model mismatch between the solution returned by the optimization and the open-loop trajectory. As a result, we also observe a reduction in lap times by over 0.5 s. Next, we show a comparison of the BayesRace controller (7) against the BestCase scenario, when MPC uses full knowledge of the dynamics and there is no model mismatch, in Figure 8. The corresponding sets of optimal inputs are shown in Figure 9. Although the inputs show the same pattern, the curves drift apart with time because some model mismatch still persists in the corrected model (6).
Figures 6 and 7 show that error correction with GPs, and the resulting reduction in model mismatch, improves the performance to a large extent. However, when compared to the best-case scenario in Figures 8 and 9, we observe that there is still scope for improvement. We bridge this gap further by performing a model update in Section 4.4.
As the final step, we use the data generated by running the BayesRace controller (7) on the vehicle for one lap to update the GP models (4). These data are again state-action-state tuples, with length equal to that of the one-lap trajectory. As in Section 4.2, we also generate a corresponding dataset from the e-kinematic model. Now, to perform the model update, we simply combine the original dataset obtained by running the pure pursuit controller with the new dataset generated by MPC, and then re-train the GP models on the combined data. As in (6), the updated GP models are used to correct the e-kinematic model; we denote the resulting vehicle model with a superscript indicating the number of laps completed with MPC. The controller (7) is updated accordingly with the re-trained model, yielding the updated problem (9).
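The update step amounts to concatenating the two datasets before re-training, optionally keeping only MPC samples with high predictive uncertainty under the current GP; all names here are illustrative.

```python
import numpy as np

def update_dataset(Z_pp, e_pp, Z_mpc, e_mpc, sigma=None, thresh=None):
    """Combine pure-pursuit and MPC-lap training data for GP re-training.

    Z_* are row-stacked (state, input) features; e_* are the matching
    one-step mismatch errors for a single error component. If per-sample
    predictive standard deviations sigma and a threshold are given, only
    the informative MPC samples (sigma > thresh) are kept.
    """
    if sigma is not None and thresh is not None:
        keep = sigma > thresh
        Z_mpc, e_mpc = Z_mpc[keep], e_mpc[keep]
    return np.vstack([Z_pp, Z_mpc]), np.concatenate([e_pp, e_mpc])

# toy usage: 3 pure-pursuit samples plus 2 MPC samples
Z_all, e_all = update_dataset(np.zeros((3, 2)), np.zeros(3),
                              np.ones((2, 2)), np.ones(2))
```

Filtering by predictive uncertainty corresponds to the sample-selection variant mentioned at the end of this section, and keeps the GP training set from growing with every lap.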
As in Figure 5, we again use the data generated by the BestCase MPC with full knowledge of the vehicle dynamics to validate the updated GP models and regenerate the error plots; these are shown in Figure 10. A simple model update after only one lap with MPC suppresses the prediction uncertainty observed in Figure 5 in most regions of the track; however, a little uncertainty persists at the start and the last corner. For practical purposes, the updated model represents the real vehicle dynamics closely. We verify this in Figures 11 and 12 by driving a lap with the BayesRace controller (9) and comparing the solution against the BestCase MPC with full knowledge of the vehicle dynamics. Note that, to focus only on the effect of model mismatch, we relaxed the penalty on the slack variables for this comparison (only) to reduce the effect of the boundary constraints on the optimization; thus, the dashed curve in Figure 8 differs slightly from Figure 11. While we used all of the new data to update the GP models, one could also select specific samples based on the prediction uncertainty on the MPC data.
We present a learning-based planning and control algorithm that significantly reduces the effort required in system identification of an autonomous race car. The real vehicle dynamics are highly nonlinear and complex to model due to lateral tire forces. Starting with a kinematic model whose only three parameters can be physically measured, our algorithm uses measurements from the vehicle to gradually correct the initial model of the vehicle dynamics. This allows racing teams to first design an aggressive model predictive controller in simulation, without worrying about tuning the vehicle model parameters, and then implement it on the real car with minimal sim-to-real effort. We demonstrate our approach in simulations on the 1:43 scale autonomous racing platform at ETH Zürich and will test it on the real platform in the future.