1 Introduction
Reinforcement learning has become an important approach to the planning and control of autonomous agents in complex environments. However, the recent surge of interest in reinforcement learning is yet to be reflected in robotics applications, possibly due to the specific challenges these applications pose.
In robotic systems, reinforcement learning methods must deal with continuous, and potentially high-dimensional, state and control spaces. For example, the dynamics of autonomous vehicles are most naturally described in terms of continuous variables like position, velocity, orientation and steering angle. Data efficiency is also critical, since collecting experience, or evaluating control policies, with a robot in the loop can be costly and time consuming. Poorly modelled dynamics, due to a slow learning rate, may result in unstable trajectories and dangerous collisions. The system also needs to handle physical constraints, from joint and actuator limitations to the boundaries defined by obstacles. Finally, in fast-paced applications rapid decisions, requiring real-time planning, must be made.
In this paper we propose a method which tackles all of the above challenges simultaneously. Our method achieves this through the marriage of two key components:

a learned dynamics model based on sparse spectrum Gaussian processes (GPs), and

a planner based on receding horizon control (RHC) and the structure-exploiting interior-point solver FORCES (Domahidi and Jerez, 2014).
Sparse spectrum GPs allow us to update the learned model online, and the RHC framework provides the ability to plan in real-time while naturally handling constraints. This means that we can learn and plan online in constrained environments where data collection is expensive. We believe that these are important features for real-world applications of reinforcement learning, particularly in safety-critical systems. Our proposed method will be referred to as Gaussian Process Receding Horizon Control (GP-RHC hereafter).
The remainder of the paper is structured as follows. In section 2 we provide an overview of related approaches in model-based reinforcement learning. The receding horizon control framework is presented in section 3. This is followed, in section 4, by a discussion on the application of Gaussian process regression to model learning. In section 5 we apply GP-RHC to a cart-pole swing-up environment and a challenging autonomous racing task. The results demonstrate that (a) models can be learned quickly from limited data, (b) complex nonlinear constraints can be handled in real-time, and (c) online updates improve the rate of learning and result in more consistent performance.
2 Related Work
GP-RHC falls into the class of model-based reinforcement learning methods. These are generally the methods of choice in robotics, primarily due to their impressive data efficiency (Kober et al., 2013). Model-based approaches can be broadly classified according to (a) how the dynamics model is learned and (b) the choice of planner.
PILCO (Deisenroth and Rasmussen, 2011), for example, combines policy search for planning with a Gaussian process model of the dynamics. The policy search relies on analytic gradients of closed-form solutions to the long-term expected cost. This requires the cost function and policy to take specific functional forms, making it difficult or impossible to incorporate general constraints. In a similar vein, Kim et al. (2004) use policy search with locally weighted linear regression to learn a controller for helicopter flight.
As an alternative, trajectory optimization based on differential dynamic programming is often used for planning. Methods such as PDDP (Pan and Theodorou, 2014) and AGP-iLQR (Boedecker et al., 2014) make use of this idea by combining learned dynamics models with either an iterative linear quadratic regulator or an iterative linear quadratic Gaussian as the planner. These planners can take simple box constraints into account but cannot handle general nonlinear constraints.
Our work is most closely related to the RL-RCO method (Andersson et al., 2015), which leverages sparse Gaussian processes for learning the dynamics and trajectory optimization based on sequential quadratic programming for planning. We improve upon RL-RCO by proposing an approach which allows the dynamics model to be updated online as the agent interacts with the environment. Furthermore, in contrast to the work of Andersson et al. (2015), we present results that highlight the ability to handle nonlinear constraints. We show how these enhancements improve data efficiency, learning rate and constraint handling, rendering GP-RHC more applicable to realistic scenarios overall.
3 Receding Horizon Control
In this paper, we consider an agent operating in a continuous environment described by a set of differential equations, $\dot{x} = f(x, u)$, where $x$ represents the agent's state and $u$ the control signal. The objective of the agent is to select a control to minimize a cost function (1a), while conforming to the system dynamics and additional constraints (1b)-(1d).
We address this overall problem using receding horizon control, where the idea is to plan over a finite horizon by iteratively solving an optimization problem. Given a measurement of the current state $x(t)$, a trajectory through the state-control space is calculated, and the first step of the control signal is applied to the system. The horizon is then shifted forward and the process repeats. Formally, at each time step the agent must minimize the cost function
(1a)  $J = \int_{t}^{t+T} \ell(x(\tau), u(\tau)) \, d\tau + \ell_T(x(t+T))$
where $t$ is the current time, $T$ denotes the length of the planning horizon, $\ell$ is an intermediate cost function and $\ell_T$ is a terminal cost function. In order to ensure that the trajectory is feasible, the optimization is subject to the constraints:
(1b)  $u_{\min} \le u(\tau) \le u_{\max}$
(1c)  $x_{\min} \le x(\tau) \le x_{\max}$
(1d)  $h(x(\tau), u(\tau)) \le 0$
Inequalities (1b) and (1c) specify box constraints on the control and state spaces (described by upper and lower bounds), and can represent operational limits on actuators or joints. On the other hand, (1d) can express nonlinear, non-convex constraints through the function $h$. This can be used, for example, to define road boundaries or obstacles in an autonomous driving task.
The receding horizon formulation differs from other reinforcement learning approaches in a few important ways. First, instead of explicitly maintaining a representation of a policy or value function, the control is recalculated (over the shifting horizon) at each time step. This has the advantage of not requiring a representation for the entire problem a priori, but incurs the additional computational burden of repeatedly replanning. Second, hard constraints can be explicitly specified and handled through (1d). This allows agents to safely learn by avoiding dangerous regions of the state and control spaces.
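The receding horizon loop just described can be sketched in a few lines. The sketch below is illustrative only: `solve_ocp` and `step` are our own placeholder names, standing in for the constrained trajectory optimization of problem (1) and the real system (or a simulator), respectively.

```python
import numpy as np

def receding_horizon_loop(x0, solve_ocp, step, n_steps):
    """Plan over a finite horizon, apply the first control, shift the
    horizon and repeat.  `solve_ocp` returns a planned control sequence
    for the current state; `step` advances the system by one time step."""
    x = np.asarray(x0, dtype=float)
    applied = []
    for _ in range(n_steps):
        u_plan = solve_ocp(x)   # planned controls over the horizon
        u0 = u_plan[0]          # only the first control is applied
        x = step(x, u0)         # system advances one step
        applied.append(u0)
    return x, applied
```

With a trivial one-step "planner" on a scalar integrator, the loop contracts the state toward the origin, which illustrates the shift-and-replan structure without any of the constraint handling of the full method.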
Without some simplifying assumptions we cannot solve problem (1) directly. However, efficient approximate solutions can be found by following the “first discretize, then optimize” approach described in the sections below.
3.1 Direct Multiple Shooting
Using direct multiple shooting (Bock and Plitt, 1984), problem (1) can be transformed into a structured nonlinear program (NLP). First, the time horizon $[t, t+T]$ is partitioned into $N$ equal subintervals $[\tau_k, \tau_{k+1}]$ for $k = 0, \dots, N-1$. Then, taking a piecewise constant approximation of the control signal over each interval, a sequence of initial value problems,
(2)  $\dot{x}_k(\tau) = f(x_k(\tau), u_k), \quad x_k(\tau_k) = s_k$
can be set up for the state trajectory. Here, the variables $s_k$ have been added as initial values. Each of these problems can then be integrated to obtain a discretized trajectory. However, in order to enforce continuity over the planning horizon, matching constraints,
$s_{k+1} = x_k(\tau_{k+1}; s_k, u_k)$
are placed at the boundary of each subinterval. The functions $x_k(\tau; s_k, u_k)$ represent the solutions to the initial value problems (2) at time $\tau$.
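The matching constraints can be checked numerically by integrating each shooting interval under its constant control and comparing with the next interval's initial value. The sketch below uses our own function names and a classical fourth-order Runge-Kutta integrator, which is one common choice rather than something prescribed by the method.

```python
import numpy as np

def rk4_step(f, x, u, dt):
    """Classical fourth-order Runge-Kutta step for dx/dt = f(x, u)."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def matching_residuals(f, s, u, dt, substeps=4):
    """Residuals of the multiple-shooting matching constraints
    s_{k+1} = x_k(tau_{k+1}; s_k, u_k): integrate each subinterval from
    its initial value s_k under constant control u_k and compare with
    the next initial value.  Zero residuals mean a continuous trajectory."""
    res = []
    for k in range(len(u)):
        x = s[k]
        h = dt / substeps
        for _ in range(substeps):
            x = rk4_step(f, x, u[k], h)
        res.append(s[k + 1] - x)
    return np.array(res)
```

For dynamics $\dot{x} = u$ a consistent set of shooting variables gives residuals of exactly zero, since the integration is exact for a constant derivative.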
Finally, the cost function is discretized over each time interval, resulting in the following NLP:
(3a)  $\min_{s, u} \; \sum_{k=0}^{N-1} \ell_k(s_k, u_k) + \ell_T(s_N)$
subject to
(3b)  $s_0 = x(t)$
(3c)  $s_{k+1} = x_k(\tau_{k+1}; s_k, u_k), \quad k = 0, \dots, N-1$
(3d)  $u_{\min} \le u_k \le u_{\max}$
(3e)  $x_{\min} \le s_k \le x_{\max}$
(3f)  $h(s_k, u_k) \le 0$
In principle, any nonlinear programming method can be used to solve the above problem (Nocedal and Wright, 2006). However, the computational burden of solving an NLP at each time step is a severe limitation in real-time applications. To address this issue, there has been much interest in efficient optimization for receding horizon control (Domahidi et al., 2012; Vukov et al., 2013). A particularly successful approach has been the combination of sequential quadratic programming (SQP) with structure-exploiting stage-wise solvers (Kouzoupis et al., 2015).
3.2 Sequential Quadratic Programming
In the SQP framework, the NLP (3) is linearized about a given nominal trajectory $(\bar{s}, \bar{u})$. The trajectory can then be improved according to the update (Nocedal and Wright, 2006),
(4)  $(\bar{s}, \bar{u}) \leftarrow (\bar{s}, \bar{u}) + \alpha \, (\Delta s, \Delta u)$
where $\alpha$ is the step size and $(\Delta s, \Delta u)$ denotes the solution to the following quadratic program:
(5a)  $\min_{\Delta s, \Delta u} \; \sum_{k=0}^{N} \tfrac{1}{2} \Delta z_k^\top H_k \Delta z_k + g_k^\top \Delta z_k, \qquad \Delta z_k = (\Delta s_k, \Delta u_k)$
subject to the constraints
(5b)  $\Delta s_0 = x(t) - \bar{s}_0$
(5c)  $\Delta s_{k+1} = A_k \Delta s_k + B_k \Delta u_k + r_k$
(5d)  $u_{\min} \le \bar{u}_k + \Delta u_k \le u_{\max}$
(5e)  $x_{\min} \le \bar{s}_k + \Delta s_k \le x_{\max}$
(5f)  $h(\bar{s}_k, \bar{u}_k) + \nabla h(\bar{s}_k, \bar{u}_k)^\top \Delta z_k \le 0$
The objective function (5a) is a quadratic approximation of the discretized cost function (3a) and the constraints correspond to linearizations of (3b) to (3f).
Provided that the Hessian of the quadratic approximation is positive semidefinite in the states and strictly positive definite in the controls, the trajectory can be shown to converge to a local optimum by iteratively linearizing and optimizing (Nocedal and Wright, 2006). This is guaranteed for the least squares cost functions popular in tracking and stabilization tasks. When a more general cost function is desired, a positive definite approximation to the Hessian can be obtained using BFGS updates.
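The iterate-linearize-optimize pattern can be illustrated on an unconstrained problem, where each step minimizes a local quadratic model of the cost and takes a damped step of size $\alpha$. This is only a sketch of the pattern; in the full method the step comes from the constrained quadratic program (5), and all names below are our own.

```python
import numpy as np

def sqp_like_iteration(cost_grad_hess, z0, alpha=1.0, iters=50):
    """At each iteration, build a local quadratic model of the cost from
    its gradient g and (positive definite) Hessian H, minimize it, and
    take a damped step.  Constraints are omitted in this illustration."""
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        g, H = cost_grad_hess(z)
        dz = -np.linalg.solve(H, g)   # minimizer of the quadratic model
        z = z + alpha * dz
    return z
```

On a purely quadratic cost the local model is exact, so with a full step ($\alpha = 1$) the iteration reaches the optimum immediately; on a nonlinear cost the damping trades progress per iteration against robustness, mirroring the efficiency/accuracy balance discussed below.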
In this paper we use FORCES (Domahidi and Jerez, 2014) as the stage-wise solver for the quadratic program (5). FORCES is an interior-point method tailored to problems arising in RHC. In particular, it exploits the block diagonal structure of the Hessian in (5a), and the fact that the states are only directly coupled to the previous time step, to achieve computational complexity that is linear in the planning horizon.
It is often unnecessary to iterate to convergence before a reasonable improvement is found. In real-time settings this is important because we need to maintain a balance between efficiency and accuracy. An example of this trade-off can be seen in figure 1, where, given an initial trajectory, the car must maximize its progress along the race track.
4 Dynamics Model Learning
Typically, trajectory optimization based on receding horizon control relies on an analytic model of the system’s dynamics. However, in the setting of reinforcement learning the dynamics are unknown and must be learned by interacting with the environment. In order to effectively plan over the horizon, an accurate model of the system’s dynamics needs to be learned quickly.
Gaussian process (GP) regression is a nonparametric, Bayesian approach to model learning that has demonstrated impressive data efficiency on both simulated and real robotic systems (Deisenroth and Rasmussen, 2011). Unfortunately, GPs do not scale well to large data sets, limiting their applicability in practice. There are, however, a number of approximation schemes that significantly reduce computational costs (Quiñonero-Candela and Rasmussen, 2005). In this paper, we use a sparse spectrum approximation of the kernel function (Lázaro-Gredilla et al., 2010). Sparse spectrum GPs were chosen because (a) online data can be incorporated through incremental updates (Gijsberts and Metta, 2013), and (b) the learned model can be efficiently linearized to form the sequential quadratic programs.
4.1 Gaussian Process Regression
Given an input set of state-control pairs $\tilde{x}_i = (x_i, u_i)$ and the resulting (possibly noisy) state transitions, GP regression can be used to learn a model of the underlying dynamics. Following Deisenroth and Rasmussen (2011), we train conditionally independent GPs on each component of the state transition vector, so for the remainder of this section we will represent target data as scalars $y_i$.
Formally, a Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution (Rasmussen and Williams, 2006). To completely specify a GP we need to choose a mean function and a covariance function, parametrized by a set of hyperparameters. A common but flexible choice for the covariance function is the squared exponential kernel
$k(\tilde{x}, \tilde{x}') = \sigma_f^2 \exp\left(-\tfrac{1}{2} (\tilde{x} - \tilde{x}')^\top \Lambda^{-1} (\tilde{x} - \tilde{x}')\right)$
where the signal variance $\sigma_f^2$ and the characteristic length scales in $\Lambda = \mathrm{diag}(\lambda_1^2, \dots, \lambda_D^2)$ constitute the kernel's set of hyperparameters. When modelling noisy targets, an additive noise term with variance $\sigma_n^2$ is included in the covariance function. Since we model the relative rather than absolute state transitions, we choose a zero mean prior. This means that in the absence of data, the state is expected to remain unchanged regardless of the control input.
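A minimal implementation of this kernel, with per-dimension (ARD) length scales, might look as follows; the function name and vectorized form are our own.

```python
import numpy as np

def sq_exp_kernel(X1, X2, sigma_f, lengthscales):
    """Squared exponential kernel with ARD length scales.
    X1: (n1, D), X2: (n2, D).  Returns the (n1, n2) matrix of
    k(x, x') = sigma_f^2 * exp(-0.5 * (x - x')^T Lambda^{-1} (x - x'))
    with Lambda = diag(lengthscales^2)."""
    Z1 = X1 / lengthscales   # scale each input dimension by its length scale
    Z2 = X2 / lengthscales
    sq = (Z1**2).sum(1)[:, None] + (Z2**2).sum(1)[None, :] - 2 * Z1 @ Z2.T
    return sigma_f**2 * np.exp(-0.5 * np.maximum(sq, 0.0))
```

The clamp to non-negative squared distances guards against tiny negative values from floating-point cancellation; on the diagonal the kernel returns exactly the signal variance $\sigma_f^2$.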
After defining the input matrix $X = [\tilde{x}_1, \dots, \tilde{x}_n]$ and the corresponding target vector $y$, the posterior predictive distribution for a set of test points, $X_*$, is a multivariate Gaussian with mean
(6)  $\mu_* = K_* (K + \sigma_n^2 I)^{-1} y$
and covariance
(7)  $\Sigma_* = K_{**} - K_* (K + \sigma_n^2 I)^{-1} K_*^\top$
where $K = k(X, X)$, $K_* = k(X_*, X)$ and $K_{**} = k(X_*, X_*)$ denote the matrices of covariances evaluated at all pairs of input points. After an initial $O(n^3)$ factorization, computing the predictive mean and covariance at a test point are $O(n)$ and $O(n^2)$ operations respectively.
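Equations (6) and (7) are usually evaluated via a Cholesky factorization rather than an explicit inverse; this is standard numerical practice rather than a detail given in the text. The sketch below assumes the covariance matrices have already been computed.

```python
import numpy as np

def gp_predict(K, K_star, K_star_star, y, noise_var):
    """Posterior predictive mean and covariance, equations (6)-(7),
    computed with a Cholesky factor L of (K + noise_var * I) so that no
    matrix is explicitly inverted."""
    n = K.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + s^2 I)^{-1} y
    mean = K_star @ alpha
    V = np.linalg.solve(L, K_star.T)
    cov = K_star_star - V.T @ V
    return mean, cov
```

The factorization costs $O(n^3)$ once; each subsequent mean evaluation is a single matrix-vector product, matching the complexities quoted above.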
A common approach for learning the hyperparameters and the noise variance is to maximize the marginal likelihood (Rasmussen and Williams, 2006). The negative log marginal likelihood, given by
(8)  $-\log p(y \mid X) = \tfrac{1}{2} y^\top (K + \sigma_n^2 I)^{-1} y + \tfrac{1}{2} \log |K + \sigma_n^2 I| + \tfrac{n}{2} \log 2\pi$
can be minimized using gradient-based optimizers. Since $K + \sigma_n^2 I$ needs to be inverted each time the log marginal likelihood is evaluated, which is an $O(n^3)$ operation in general, hyperparameter inference represents the main bottleneck for GP regression.
4.2 Sparse Spectrum Approximation
In this section, we assume that the covariance function is stationary, i.e. that $k(\tilde{x}, \tilde{x}')$ is a function of $\delta = \tilde{x} - \tilde{x}'$. In this case, Bochner's theorem (Rudin, 2011) states that $k(\delta)$ can be represented as the Fourier transform,
(9)  $k(\delta) = \int e^{i \omega^\top \delta} \, d\mu(\omega)$
of a positive finite measure $\mu$. If $\mu$ has a density $S(\omega)$, then it is called the power spectrum of the covariance function and, by the Wiener-Khintchine theorem (Carlson et al., 2009), $S(\omega)$ is the Fourier dual of $k$. In particular, this means that $S(\omega)$ is proportional to some probability density $p(\omega)$, and so equation (9) can be rewritten as an expectation
$k(\delta) = \alpha \, \mathbb{E}_{p(\omega)}\!\left[e^{i \omega^\top \delta}\right]$
where $\alpha$ is the constant of proportionality.
To approximate the expectation, we draw $m$ sample frequencies $\omega_1, \dots, \omega_m$ from $p(\omega)$ and take averages. Since the power spectrum is symmetric about zero, we also include $-\omega_r$ for each sample frequency, in order to guarantee that $k$ is real valued for all $\delta$. This results in the sparse spectrum approximation (Lázaro-Gredilla et al., 2010) of the covariance function:
(10)  $k(\tilde{x}, \tilde{x}') \approx \frac{\alpha}{m} \sum_{r=1}^{m} \cos\!\left(\omega_r^\top (\tilde{x} - \tilde{x}')\right)$
In the particular case of the squared exponential kernel, $\alpha = \sigma_f^2$ and the frequencies are drawn from the normal distribution $\mathcal{N}(0, \Lambda^{-1})$, where $\Lambda = \mathrm{diag}(\lambda_1^2, \dots, \lambda_D^2)$. For convenience, we define the feature mapping $\phi$ by
$\phi(\tilde{x}) = \sqrt{\tfrac{\alpha}{m}} \left[\cos(\omega_1^\top \tilde{x}), \sin(\omega_1^\top \tilde{x}), \dots, \cos(\omega_m^\top \tilde{x}), \sin(\omega_m^\top \tilde{x})\right]^\top$
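The feature mapping can be sketched directly; `ss_features` is our own name, and the cos/sin stacking is chosen so that an inner product of features recovers the approximation (10). With enough sampled frequencies the inner product converges to the exact squared exponential kernel.

```python
import numpy as np

def ss_features(X, omegas, sigma_f):
    """Sparse spectrum feature map for the squared exponential kernel:
    stacked cos/sin features scaled so that phi(x)^T phi(x')
    = (sigma_f^2 / m) * sum_r cos(omega_r^T (x - x')).
    X: (n, D) inputs; omegas: (m, D) frequencies drawn from N(0, Lambda^{-1})."""
    m = omegas.shape[0]
    proj = X @ omegas.T                 # (n, m) inner products omega_r^T x
    scale = sigma_f / np.sqrt(m)
    return scale * np.hstack([np.cos(proj), np.sin(proj)])  # (n, 2m)
```

Note that $\phi(\tilde{x})^\top \phi(\tilde{x}) = \sigma_f^2$ exactly, since $\cos^2 + \sin^2 = 1$ for every frequency; only the cross terms are Monte Carlo estimates.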
In order to make use of the approximation (10), the matrix inversion lemma is applied to equations (6) and (7), giving
$\mu_* = \phi(X_*)^\top A^{-1} \Phi y, \qquad \Sigma_* = \sigma_n^2 \, \phi(X_*)^\top A^{-1} \phi(X_*)$
where $A = \Phi \Phi^\top + \sigma_n^2 I$, and $\Phi$ is the matrix obtained by applying $\phi$ to each column of $X$. Instead of inverting the $n \times n$ matrix $K + \sigma_n^2 I$, we now only require the inverse of the $2m \times 2m$ matrix $A$, which constitutes a significant saving in computational cost if $2m \ll n$. Importantly, the size of $A$ is independent of the number of training points, which makes it amenable to incremental updates (Gijsberts and Metta, 2013).
Applying the same idea to the negative log likelihood (8) gives the expression
(11)  $\frac{1}{2\sigma_n^2} \left( y^\top y - y^\top \Phi^\top A^{-1} \Phi y \right) + \frac{1}{2} \log |A| - m \log \sigma_n^2 + \frac{n}{2} \log 2\pi \sigma_n^2$
Again, the smaller size of $A$ results in reduced computational complexity for hyperparameter inference. In particular, each step of the gradient-based optimization is $O(nm^2)$.
4.3 Incremental Updates
To incrementally handle a stream of data, the matrix $A$ and the vector $b = \Phi y$ need to be updated in real time. Given a new sample, $(\tilde{x}, y)$, the updates are computed according to the rules:
(12)  $A \leftarrow A + \phi(\tilde{x}) \phi(\tilde{x})^\top, \qquad b \leftarrow b + \phi(\tilde{x}) \, y$
Since $A$ remains positive definite after each update, we do not need to store it explicitly. Instead, we can keep track of its upper triangular Cholesky factor. This allows us to make use of fast, numerically stable rank-1 Cholesky updates (Gijsberts and Metta, 2013).
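The rank-1 Cholesky update can be implemented directly. The sketch below is the standard textbook algorithm (as in MATLAB's `cholupdate`), written for an upper triangular factor $R$ with $R^\top R = A$; it is our own transcription rather than the code of Gijsberts and Metta (2013).

```python
import numpy as np

def chol_rank1_update(R, v):
    """Given upper triangular R with R^T R = A, return R' with
    R'^T R' = A + v v^T.  Runs in O(p^2) per new sample, versus O(p^3)
    for refactorizing A from scratch."""
    R = R.copy().astype(float)
    v = v.copy().astype(float)
    p = len(v)
    for k in range(p):
        r = np.hypot(R[k, k], v[k])     # new diagonal entry
        c, s = r / R[k, k], v[k] / R[k, k]
        R[k, k] = r
        if k + 1 < p:
            R[k, k+1:] = (R[k, k+1:] + s * v[k+1:]) / c
            v[k+1:] = c * v[k+1:] - s * R[k, k+1:]
    return R
```

Applied with $v = \phi(\tilde{x})$, one call per incoming sample keeps the factor of $A$ current without ever forming $A$ explicitly.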
5 Experiments
In the following sections we evaluate the performance of GP-RHC on a cart-pole swing-up task and an autonomous racing scenario. In all the experiments GP-RHC was able to run in real-time with appropriate choices of the sparsity $m$ and planning horizon $N$. The choice of $m$ essentially involves a trade-off between model accuracy and the computational costs of both the learning and the prediction routines. Empirically, we found that a value between 20 and 100 allowed us to learn a sufficiently accurate dynamics model while remaining within the real-time constraints. Similarly, the choice of $N$ involves a balance between computational costs and the quality of the controller. Ideally, one would choose a large value of $N$ so that the controller can optimally react to future changes in dynamics or constraints.
5.1 Cart-pole Swing-up
Cart-pole experiments are a common benchmark in both reinforcement learning and control theory (Kober et al., 2013). The basic setup consists of a cart, with an attached pendulum, running along a track. In the swing-up task, the pendulum initially points downwards and the objective is to apply horizontal forces to the cart in order to swing the pendulum up and balance it above the cart in the center of the track. This is a relatively difficult control problem as the dynamics are highly nonlinear. Additionally, a long planning horizon is required because the cart must be pushed back and forth in order to build up enough momentum to swing the pendulum up.
The state of the system is described by the position of the cart, the velocity of the cart, the angle of the pendulum and its angular velocity. A bounded horizontal force (in newtons) can be applied to the cart at a fixed sampling rate. The cost function is a least squares objective penalizing the distance from the upright set point.
In the first experiment we compare GP-RHC against PILCO (Deisenroth and Rasmussen, 2011) and the ground-truth analytical model. To initialize each trial, 80 data points were collected from an episode with random control inputs. For the sparse spectrum approximation, 50 sample frequencies were drawn and a planning horizon of 50 time steps was used. After each episode the GP models were retrained on all the preceding data, using 10 random restarts to avoid poor local minima. PILCO can have problems with least squares costs (Deisenroth, 2010), so we used its preferred saturating cost function and post-processed the results. Figure 2 shows that GP-RHC is competitive with PILCO both in terms of sample efficiency and overall performance.
In the second experiment the length of the track is limited, constraining the cart's position to lie within fixed bounds. Since PILCO does not handle constraints, we compare GP-RHC only to the analytical model. The GP models were initialized with 20 points of training data collected from an episode with random inputs.
As seen in figures 2 and 3, GP-RHC learns to solve the cart-pole swing-up task in just a few episodes. In particular, by updating the dynamics model online we can achieve performance comparable to the optimal analytic model at least a full episode earlier. Another noticeable feature is the reduced variability in cost: GP-RHC with online updates performs more consistently and is less susceptible to variations in the initial training data. Finally, table 1 shows that fewer constraint violations can be expected when using online updates. In fact, after the first episode only a single constraint violation was encountered, in contrast to the method without updates, which incurred significantly more violations. These results indicate that GP-RHC enables fast learning and planning in safety-critical conditions.
Table 1: Constraint violations (%) per episode on the constrained cart-pole swing-up task, with and without online model updates.
EPISODE   UPDATES   NO UPDATES
1         100.0     100.0
2           4.5      18.2
3           0.0      13.6
4           0.0       4.5
5           0.0       0.0
6           0.0       0.0
7           0.0       0.0
5.2 Autonomous Racing
In this section we apply GP-RHC to the autonomous racing of 1:43 scale remote control cars. Liniger et al. (2015) originally investigated this problem using an analytical model of the cars incorporated into a contouring control framework (see section 5.2.2). The objective is to maximize progress along the race course (depicted in figure 6) while remaining within the track boundaries.
5.2.1 Bicycle Model
Following Liniger et al. (2015), the cars are modeled using a bicycle model: the cars are treated as rigid bodies, and symmetry is used to approximate the pairs of front and back tyres as single wheels. Pitch and roll dynamics are neglected, so only in-plane motion is considered. In our experiments the additional complexity of tyre dynamics is ignored by assuming a no-slip model.
The state space is described by the vector $x = (X, Y, v, \varphi)$, where $X$ and $Y$ denote the position of the car, $v$ the longitudinal velocity of the car, and $\varphi$ the car's orientation. The control signal consists of the PWM duty cycle of the electric drive train motor and the steering angle of the front wheels. The duty cycle is constrained to a fixed interval and the steering angle cannot exceed 18 degrees.
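A no-slip bicycle model of the kind described above can be sketched as follows. The state is the position, longitudinal velocity and orientation of the car; the mapping from duty cycle to a longitudinal acceleration, the wheelbase value, and all names are our own simplifying assumptions, not the identified model of Liniger et al. (2015).

```python
import numpy as np

def bicycle_noslip(state, control, wheelbase=0.08):
    """Kinematic (no-slip) bicycle model.
    state = (X, Y, v, phi): position, longitudinal velocity, heading.
    control = (a, delta): longitudinal acceleration (assumed to stand in
    for the drive train input) and steering angle.  Returns dstate/dt."""
    X, Y, v, phi = state
    a, delta = control
    return np.array([
        v * np.cos(phi),                # X position rate
        v * np.sin(phi),                # Y position rate
        a,                              # longitudinal acceleration
        v * np.tan(delta) / wheelbase,  # yaw rate under the no-slip assumption
    ])
```

Driving straight ahead (zero heading, zero steering) reduces the model to a point mass accelerating along the X axis, which is a quick sanity check on the kinematics.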
5.2.2 Contouring Control
Contouring control was originally designed for industrial applications like machine tool control and laser profiling (Lam et al., 2010). The objective of the controller is to track a given reference path while maximizing some measure of progress. Often these are competing interests and we need to find a balance between speed and tracking accuracy. In contrast to standard tracking approaches, the reference path is described only in terms of spatial coordinates. By leaving velocities and orientations unspecified, the contouring controller is free to determine how the path is followed.
Here we assume that the reference path is given by an arc length parametrized curve $\sigma : [0, L] \to \mathbb{R}^2$, where $L$ is the total length of the path. In our experiments, the center line of the race track is used as the reference path. To find an arc length parametrization, the center line is interpolated by a cubic spline, using the method described by Wang et al. (2002).
Let $p(t)$ denote the position of the car at time $t$. Then, the contouring error, $e_c$, is defined as the normal deviation from the path, $e_c = n(\theta^*)^\top (p(t) - \sigma(\theta^*))$, where $\theta^*$ is the value of the path parameter which minimizes the distance between the point $p(t)$ and the path, and $n(\theta^*)$ is the unit normal to $\sigma$ at $\theta^*$.
Calculating the contouring error requires us to determine the value of $\theta^*$ at each point along the planned trajectory. This is too computationally intensive to use as a cost function in the iterative SQP framework. To address this issue, approximations $\theta_a$ to $\theta^*$ at each point are introduced into the state. The dynamics are then augmented by the equation
$\dot{\theta}_a = v_a$
where $\theta_a(t)$ denotes the approximation to $\theta^*$ at time $t$, and $v_a$ is a virtual control input. Since the path is parameterized by arc length, $v_a$ can be thought of as the velocity of the car along the center line. For the auxiliary state $\theta_a$ to be a useful approximation we introduce a lag error term $e_l$, defined as the distance between the points $\sigma(\theta_a)$ and $\sigma(\theta^*)$ along the reference path.
Since neither the contouring nor lag error can be used directly in the cost function, approximations defined only in terms of $p(t)$ and $\theta_a$ are made. The approximate contouring error $\hat{e}_c$ and approximate lag error $\hat{e}_l$ are defined as the orthogonal and tangential components of the error between the points $p(t)$ and $\sigma(\theta_a)$,
$\hat{e}_c = n(\theta_a)^\top (p(t) - \sigma(\theta_a)), \qquad \hat{e}_l = s(\theta_a)^\top (p(t) - \sigma(\theta_a))$
where $s(\theta_a)$ is the unit tangent to $\sigma$ at $\theta_a$. It is clear from figure 4 that $\hat{e}_c$ approaches $e_c$, and $\hat{e}_l$ approaches $e_l$, as the lag error is reduced. Therefore, in order to get a good approximation, the lag error is heavily penalized in the cost function (13).
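The tangential/orthogonal decomposition behind the approximate errors can be computed in a few lines. The sketch below is our own illustration: it returns an unsigned contouring error (the norm of the orthogonal remainder), whereas a signed version would project onto the path normal.

```python
import numpy as np

def approx_errors(p, sigma_a, tangent):
    """Approximate lag and contouring errors: decompose the error between
    the car position p and the path point sigma(theta_a) into its
    tangential component (lag) and orthogonal remainder (contouring).
    `tangent` must be the unit tangent to the path at theta_a."""
    err = p - sigma_a
    e_lag = tangent @ err                   # tangential component
    e_contour_vec = err - e_lag * tangent   # orthogonal remainder
    return np.linalg.norm(e_contour_vec), e_lag
```

For a car one unit ahead of and one unit beside the path point, with the path running along the x axis, both errors come out as one, as expected from the geometry.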
Using the approximate contouring and lag errors, an intermediate cost function can be defined:
(13)  $\ell(x, u) = q_c \, \hat{e}_c^2 + q_l \, \hat{e}_l^2 - \gamma \, v_a$
where $v_a$ is the virtual control introduced above (the velocity along the center line). The term $\gamma v_a$ can be thought of as a reward for progressing along the track, and the weights $\gamma$ and $q_c$ represent the relative importance of fast progress and accurate path tracking.
5.2.3 Track Constraints
To ensure that the car remains within the track, limits are placed on the $X$ and $Y$ components of the car's state. Each point on the planned trajectory is constrained to lie within two half-spaces defined by the left and right track boundaries (see figure 5). The relevant tangent lines are found by projecting the point $\sigma(\theta_a)$ (an approximation to the closest point on the center line) onto the track boundaries.
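The pair of half-space constraints can be assembled from the center-line point and its tangent. The sketch below assumes a locally constant track half-width and uses our own names; the paper instead projects onto the actual track boundaries, which need not be symmetric about the center line.

```python
import numpy as np

def track_halfspaces(center_pt, tangent, half_width):
    """Two half-space constraints keeping a planar point p on the track:
    n.p <= n.c + w and -n.p <= -(n.c - w), where n is the unit normal to
    the center line at the (approximate) closest point c and w is the
    track half-width.  Returns (A, b) such that A @ p <= b means 'inside'."""
    n = np.array([-tangent[1], tangent[0]])   # unit normal to the unit tangent
    A = np.vstack([n, -n])
    b = np.array([n @ center_pt + half_width, -(n @ center_pt - half_width)])
    return A, b
```

Both rows are linear in the position, so they slot directly into the linearized constraints (5d)-(5f) of the stage-wise quadratic program.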
The planned trajectory often runs along the limit of these constraints, so in order to avoid infeasibility problems in practice, the constraints are softened by adding slack variables. By penalizing the slack variables heavily in the cost function, the original solution of the hard constrained problem is recovered wherever that problem admits a solution.
5.2.4 Results
Initially, 70 points of data were collected from a demonstrated trajectory around a simple oval track. For the sparse spectrum approximation, 100 sample frequencies were chosen. A planning horizon of 20 time steps was used at a sampling rate of 0.03 s. We found that we could plan in real time by limiting the number of SQP iterations to 30. The race track is shown in figure 6 along with the driven trajectory. The car starts at rest and the race is completed after one full lap. The track is 7.23 m long (measured at the center line).
Table 2: Lap times (in seconds) for the analytic model and for GP-RHC with and without online updates.
ANALYTIC   UPDATES   NO UPDATES
2.565      2.639     2.745
Using a learned dynamics model, GP-RHC quickly accelerates from rest and drives trajectories that satisfy the complex track constraints (see figure 6). The sparse spectrum Gaussian processes are very data efficient, learning a sufficiently accurate model of the car's dynamics from just 70 data points. In particular, although the training data was collected by driving around a much simpler oval track, the learned dynamics generalized well enough to navigate the sharp left turn in the new race course. Table 2 shows the lap times of GP-RHC with and without online updates, compared to the baseline analytic model. By updating the model online we improve the lap time by more than 0.1 s (a relative improvement of about 3.86 percent). This represents a significant saving considering the high speeds and small length scales involved in the problem. In fact, naively scaling the domain up to full size results in a 311 m track with a lap time improvement of about 4.56 seconds. This corresponds to a 16.4 km/h increase in average speed around the track when using online updates.
In addition to the racing task, we also consider an obstacle avoidance problem, depicted in figure 7. Static obstacles, represented by the blue cars, are included by adjusting the track constraints. Liniger et al. (2015) determine these adjustments using a high-level planner based on dynamic programming. In our experiments, we manually specify the obstacle constraints, but a high-level planner could be employed in principle. Since the dynamics are independent of the constraints and cost function, GP-RHC can implicitly take advantage of all the information gained in the racing task by simply reusing the learned model with no further learning. This could be useful in safety-critical tasks, where costly trials can be avoided by safely learning a model of the dynamics in a simpler, safer environment.
6 Conclusion and Future Work
In this paper we introduce GP-RHC for online learning and planning in continuous environments with nonlinear constraints. This is achieved by combining receding horizon control for planning with data-efficient sparse spectrum Gaussian processes for model learning. We show that incorporating online data results in faster convergence to optimal behaviour while significantly reducing the number of constraint violations during learning, an important feature for safety-critical applications. We demonstrate our approach on a complex autonomous racing task, showing that GP-RHC enables learning from only a few training points and the ability to apply the learned model to new tasks with no additional training. This method provides a promising approach to deploying online reinforcement learning algorithms on complex systems such as robots.
In future work we plan to apply GP-RHC to real robotic systems such as quadrotors or manipulators. To achieve this goal, a possible improvement to the current method would be the ability to handle input noise. GP regression methods typically assume that the training inputs are noise free. However, in real robotic systems, sensors and filtering algorithms can introduce noise into the state estimation. This issue could be addressed by incorporating ideas from McHutchon and Rasmussen (2011), for example, to make GP-RHC more robust.
Acknowledgements
We thank the reviewers for their helpful insights and feedback.
References

Olov Andersson, Fredrik Heintz, and Patrick Doherty. Model-based reinforcement learning in continuous environments using real-time constrained optimization. In Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), 2015.
Hans Georg Bock and Karl-Josef Plitt. A multiple shooting algorithm for direct solution of optimal control problems. In Proceedings of the IFAC World Congress, 1984.
Joschka Boedecker, Jost Tobias Springenberg, Jan Wülfing, and Martin Riedmiller. Approximate real-time optimal control based on sparse Gaussian process models. In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 1–8. IEEE, 2014.
A. B. Carlson, P. Crilly, and P. B. Crilly. Communication Systems. McGraw-Hill Education, 2009.
Marc Deisenroth and Carl E. Rasmussen. PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 465–472, 2011.
Marc Peter Deisenroth. Efficient reinforcement learning using Gaussian processes. KIT Scientific Publishing, 2010.
Alexander Domahidi and Juan Jerez. FORCES Professional. embotech GmbH (http://embotech.com/FORCESPro), July 2014.
Alexander Domahidi, Aldo U. Zgraggen, Melanie N. Zeilinger, Manfred Morari, and Colin N. Jones. Efficient interior point methods for multistage problems arising in receding horizon control. In 2012 IEEE 51st Conference on Decision and Control (CDC), pages 668–674. IEEE, 2012.
Arjan Gijsberts and Giorgio Metta. Real-time model learning using incremental sparse spectrum Gaussian process regression. Neural Networks, 41:59–69, 2013.
H. J. Kim, Michael I. Jordan, Shankar Sastry, and Andrew Y. Ng. Autonomous helicopter flight via reinforcement learning. In Advances in Neural Information Processing Systems 16, pages 799–806. MIT Press, 2004.
Jens Kober, J. Andrew Bagnell, and Jan Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 2013.
D. Kouzoupis, A. Zanelli, Helfried Peyrl, and Hans Joachim Ferreau. Towards proper assessment of QP algorithms for embedded model predictive control. In 2015 European Control Conference (ECC), pages 2609–2616. IEEE, 2015.
Denise Lam, Chris Manzie, and Malcolm Good. Model predictive contouring control. In 49th IEEE Conference on Decision and Control (CDC), pages 6137–6142. IEEE, 2010.
Miguel Lázaro-Gredilla, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, and Aníbal R. Figueiras-Vidal. Sparse spectrum Gaussian process regression. The Journal of Machine Learning Research, 11:1865–1881, 2010.
Alexander Liniger, Alexander Domahidi, and Manfred Morari. Optimization-based autonomous racing of 1:43 scale RC cars. Optimal Control Applications and Methods, 36(5):628–647, 2015.
Andrew McHutchon and Carl E. Rasmussen. Gaussian process training with input noise. In Advances in Neural Information Processing Systems 24, pages 1341–1349. Curran Associates, Inc., 2011.
J. Nocedal and S. Wright. Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer New York, 2006.
Yunpeng Pan and Evangelos Theodorou. Probabilistic differential dynamic programming. In Advances in Neural Information Processing Systems, pages 1907–1915, 2014.
Joaquin Quiñonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6(Dec):1939–1959, 2005.
Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
Walter Rudin. Fourier Analysis on Groups. John Wiley & Sons, 2011.
Milan Vukov, Alexander Domahidi, Hans Joachim Ferreau, Manfred Morari, and Moritz Diehl. Auto-generated algorithms for nonlinear model predictive control on long and on short horizons. In 52nd IEEE Conference on Decision and Control, pages 5113–5118. IEEE, 2013.
Hongling Wang, Joseph Kearney, and Kendall Atkinson. Arc-length parameterized spline curves for real-time simulation. In Proceedings of the 5th International Conference on Curves and Surfaces, pages 387–396, 2002.