I Introduction
Reinforcement learning (RL) has been successfully applied to a variety of challenging tasks, such as the game of Go and robotic control [1, 2]. The increasing interest in RL is primarily stimulated by its data-driven nature, which requires little prior knowledge of the environmental dynamics, and by its combination with powerful function approximators such as deep neural networks. In spite of these advantages, purely data-driven RL often suffers from slow convergence in the continuous action spaces of stochastic systems, which hinders its widespread adoption in real-world applications [3, 4].

To alleviate this problem, researchers have investigated model-driven RL algorithms, which search for the optimal policy with known environmental models by employing the principle of Bellman optimality [5, 6, 7, 8]. Model-driven RL converges faster than its data-driven counterparts, since environmental models provide information about the environmental evolution over the whole state-action space; gradients can therefore be computed more easily and more accurately than from data samples alone [9]. To solve the Bellman equation in a continuous action space, most existing RL methods adopt an iterative technique that gradually approaches the optimum. One classic framework is policy iteration, which consists of two steps: 1) policy evaluation (PEV), which solves the self-consistency condition equation to evaluate the current policy, and 2) policy improvement (PIM), which seeks a better policy by optimizing the corresponding value function [10, 11].
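To make the PEV/PIM alternation concrete, the following is a minimal sketch of policy iteration on a tiny finite MDP; the transition matrices, stage costs, and discount factor are illustrative assumptions, not taken from this paper.

```python
import numpy as np

# Illustrative policy iteration on a tiny finite MDP (2 states, 2 actions).
# P[a] is the transition matrix under action a; c[s, a] is the stage cost.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
c = np.array([[1.0, 2.0], [0.5, 0.3]])
gamma = 0.9

policy = np.zeros(2, dtype=int)
for _ in range(50):
    # PEV: solve the linear self-consistency equations for V under the policy.
    Pp = np.array([P[policy[s], s] for s in range(2)])
    cp = np.array([c[s, policy[s]] for s in range(2)])
    V = np.linalg.solve(np.eye(2) - gamma * Pp, cp)
    # PIM: greedy improvement w.r.t. the evaluated value function.
    Q = c + gamma * np.einsum('ask,k->sa', P, V)
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
```

Because the MDP is finite, the loop terminates after a handful of PEV/PIM cycles with a greedy-stable policy.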
A number of prior works focus on improving the PEV step by model-driven value expansion, which corrects the cumulative return or the approximated value function using environmental models [12, 13]. However, due to inherent model inaccuracy, this technique is not suitable for long-term PEV. To partly solve this problem, the model-based value expansion algorithm proposed a hybrid scheme that uses the environmental dynamic model only to simulate a short-term horizon and utilizes the explored data to estimate the long-term value beyond the simulation horizon [14]. Nevertheless, the inaccuracy problem still hinders the application of environmental models in PEV.
So far, the environmental model has seen limited use in the PIM step due to two main issues: 1) the inaccuracy and overfitting of environmental dynamic models and 2) policy oscillation caused by time-varying models, since the system model is iteratively learned or updated during training [15, 16, 17]. Prior works address these problems with model ensemble techniques. For example, the model-ensemble trust region policy optimization (TRPO) algorithm [18] limits model overtraining by using an ensemble metric during policy search. The stochastic ensemble value expansion [19], an extension of model-based value expansion, interpolates between many different horizon lengths and different models to favor those that generate more accurate estimates. Although ensemble techniques effectively avoid overfitting, they bring extra computational overhead.
Facing the aforementioned challenges, this paper proposes a mixed reinforcement learning (mixed RL) algorithm that utilizes a dual representation of the environmental dynamics to improve both learning accuracy and training speed. The environmental model, either empirical or theoretical, is used as prior information to avoid overfitting, while the model error is iteratively compensated with measured state-action data using Bayesian estimation. The contributions of this paper are as follows:

A dual representation of environmental dynamics is used in RL by integrating the designer's knowledge with measured data. An iterative Bayesian estimator (IBE) with explored data is designed to improve model accuracy and computational efficiency.

A mixed RL algorithm is developed by embedding the iterative Bayesian estimator into policy iteration. We propose a sufficient condition for recursive stability and convergence, which bounds the estimation difference of the IBE between two consecutive iterations, and we prove that this condition holds with probability one after sufficiently many iterations.
The rest of this paper is organized as follows. Section II defines the mixed RL problem. Section III introduces the dual representation of environmental dynamics. Sections IV and V present the mixed RL algorithm and the parametrization of the policy and value function. Section VI evaluates the effectiveness of mixed RL on a double-lane change task with an automated vehicle, and Section VII concludes the paper.
II Problem Description
We consider a discrete-time environment with additive stochastic uncertainty, whose actual dynamics is described as

(1)  $x_{k+1} = f(x_k, u_k) + \xi_k,$

where $k$ is the current time, $x_k$ is the state, $u_k$ is the action, $f$ is the deterministic part of the environmental dynamics, and $\xi_k$ is the additive stochastic uncertainty with unknown mean $\mu$ and covariance $\Sigma$. In this study, we assume that the additive stochastic uncertainty follows the Gaussian distribution, $\xi_k \sim \mathcal{N}(\mu, \Sigma)$. The parameters $\mu$ and $\Sigma$ can be completely independent of $(x_k, u_k)$ or form a functional relationship with it. As shown in Fig. 1, the actual environmental dynamics contains both the deterministic part $f$ and the uncertain part $\xi$, where $p(\xi)$ is the probability density of $\xi$ and $p(x_{k+1} \mid x_k, u_k)$ is the probability density of the next state given $(x_k, u_k)$.
The objective of mixed RL is to minimize the expected cumulative cost under the distribution of the additive stochastic uncertainty $\xi$, shown in (2):

(2)  $\min_{\pi} V^{\pi}(x_0) = \mathbb{E}_{\xi}\Big[\textstyle\sum_{k=0}^{\infty} \gamma^{k}\, l(x_k, u_k)\Big],$

where $\pi$ is the policy, $V^{\pi}(x_0)$ is the state value as a function of the initial state $x_0$, $l(x, u)$ is the utility function, which is positive definite, $\gamma$ is the discount factor with $0 < \gamma < 1$, and $\mathbb{E}_{\xi}$ is the expectation w.r.t. the additive stochastic uncertainty $\xi$. Here, the policy is a deterministic mapping:

(3)  $u_k = \pi(x_k).$
The optimal cost function is defined as

(4)  $V^{*}(x_k) = \min_{\,u_k, u_{k+1}, \dots} \mathbb{E}_{\xi}\Big[\textstyle\sum_{i=k}^{\infty} \gamma^{\,i-k}\, l(x_i, u_i)\Big],$
where $\{u_k, u_{k+1}, \dots\}$ is the action sequence starting from time $k$. In mixed RL, the self-consistency condition (5) describes the relationship between the state values at the current time and the next time:

(5)  $V^{\pi}(x_k) = \mathbb{E}_{\xi_k}\big[\, l(x_k, \pi(x_k)) + \gamma V^{\pi}(x_{k+1}) \,\big].$
By Bellman's principle of optimality, we have the well-known Bellman equation:

(6)  $V^{*}(x_k) = \min_{u_k} \mathbb{E}_{\xi_k}\big[\, l(x_k, u_k) + \gamma V^{*}(x_{k+1}) \,\big].$
The Bellman equation implies that the optimal policy can be calculated in a step-by-step backward manner. Therefore, the optimal action is

(7)  $u_k^{*} = \pi^{*}(x_k) = \arg\min_{u_k} \mathbb{E}_{\xi_k}\big[\, l(x_k, u_k) + \gamma V^{*}(x_{k+1}) \,\big],$
where $\pi^{*}$ represents the optimal policy that maps an arbitrary state $x_k$ to its optimal action $u_k^{*}$. Similar to other indirect RL problems, mixed RL aims to find an optimal policy by minimizing the cost (2) subject to the constraints of the environmental dynamics. The search can be carried out by solving the Bellman equation iteratively. Obviously, the performance of the generated policy depends on the accuracy of the representation of the environmental dynamics. In fact, either an analytical model or state-action samples can serve as a useful representation, corresponding to the so-called model-driven RL and data-driven RL, respectively. The analytical model is usually inaccurate due to environmental uncertainties, which impairs the optimality of the generated policy. The state-action samples, on the other hand, have low sampling efficiency and slow down the training process.
III Dual Representation of Environmental Dynamics
In mixed RL, the environmental dynamics are dually represented by both an analytical model $\mathcal{M}$ and state-action data $\mathcal{D}$. The former represents the designer's knowledge about the environmental dynamics; it is defined over the whole state-action space and can be used to accelerate training. The latter comes from direct measurement of state-action pairs during learning; it is generally more accurate than $\mathcal{M}$ and can therefore improve the estimation of the uncertain part of the analytical model. Mixed RL uses this dual representation, i.e., both the analytical model $\mathcal{M}$ and the state-action data $\mathcal{D}$, to search for the optimal policy. The dual representation yields faster training than purely data-driven RL while achieving better policy accuracy than the purely model-driven counterpart.
The analytical model $\mathcal{M}$ is similar to (1):

(8)  $x_{k+1} = f(x_k, u_k) + \bar{\xi}_k, \qquad \bar{\xi}_k \sim \mathcal{N}(\bar{\mu}, \bar{\Sigma}),$

where the mean $\bar{\mu}$ and covariance $\bar{\Sigma}$ of $\bar{\xi}$ are given in advance by designers. The given distribution can be quite different from the actual distribution due to modelling errors. Here, $\bar{\mu}$ and $\bar{\Sigma}$ are taken as the prior knowledge of the environmental dynamics.
The state-action data, i.e., a sequence of triples $(x_i, u_i, x_i')$, is denoted by $\mathcal{D}$:

(9)  $\mathcal{D} = \{(x_i, u_i, x_i')\}_{i=1}^{N},$

where $x_i$ is the $i$-th state in $\mathcal{D}$, $u_i$ is the $i$-th action, $x_i'$ is the measured next state, and $N$ is the number of data samples. Obviously, the measured data also inherently contain the distribution information of $\xi$, and they are taken as the posterior knowledge of the environmental dynamics.
If the environmental dynamics were exactly known, the optimal policy could be computed using the dynamic model alone, which would also be the most efficient form of RL. However, an exact model is inaccessible in reality, so the generated policy might not converge to the optimum. Although collecting samples is less efficient, samples represent the environment accurately and can thus improve the generated policy. Therefore, the mixed representation is able to exploit the advantages of both model and data to improve training efficiency and policy accuracy.
Improving the model $\mathcal{M}$ by using data $\mathcal{D}$:

We utilize the data samples to improve the estimation of the additive stochastic uncertainty $\xi$ in the analytical model $\mathcal{M}$. The uncertainty that inherently exists in a state-action triple $(x_i, u_i, x_i')$ equals the model residual

(10)  $e_i = x_i' - f(x_i, u_i).$
A Bayesian estimator is adopted to fuse the distribution information of the additive stochastic uncertainty from both the model $\mathcal{M}$ and the data $\mathcal{D}$. The Bayesian estimator maximizes the posterior probability $p(\mu, \Sigma \mid \mathcal{D})$. In general, we introduce $p(\mu)$ and $p(\Sigma)$ as the prior distributions of $\mu$ and $\Sigma$; the maximum a posteriori problem then becomes

(11)  $\max_{\mu, \Sigma}\; p(\mu, \Sigma \mid \mathcal{D}) \propto \max_{\mu, \Sigma}\; p(\mathcal{D} \mid \mu, \Sigma)\, p(\mu)\, p(\Sigma).$

Under the assumption that the data are i.i.d., (11) can be rewritten in iterative form:

(12)  $p(\mu, \Sigma \mid e_{1:i}) \propto p(e_i \mid \mu, \Sigma)\, p(\mu, \Sigma \mid e_{1:i-1}).$
Therefore, we can build an iterative Bayesian estimator (IBE) of the following general form:

(13)  $(\hat{\mu}_i, \hat{\Sigma}_i) = \mathcal{B}(\hat{\mu}_{i-1}, \hat{\Sigma}_{i-1}, e_i),$

where $\mathcal{B}$ folds one new residual sample into the previous estimate.
Here, we discuss two simplified cases of the Bayesian estimator:
Case 1: Assume that the covariance $\Sigma$ is known and independent of $x$ and $u$. We introduce the Gaussian $\mathcal{N}(\bar{\mu}, \bar{\Sigma})$ provided by the model $\mathcal{M}$ as the prior distribution of $\mu$. Thus, the objective function of the Bayesian estimation becomes

(14)  $\max_{\mu}\; \log p(\mu) + \sum_{i=1}^{N} \log p(e_i \mid \mu) + C,$

where $p(\mu) = \mathcal{N}(\bar{\mu}, \bar{\Sigma})$ is the prior distribution and $C$ is a constant. The optimal estimate of $\mu$ is given by (15):

(15)  $\hat{\mu}_N = \big(\bar{\Sigma}^{-1} + N \Sigma^{-1}\big)^{-1} \Big(\bar{\Sigma}^{-1} \bar{\mu} + \Sigma^{-1} \textstyle\sum_{i=1}^{N} e_i\Big).$
The estimate $\hat{\mu}$ can be computed iteratively by the IBE. Defining the precisions $\Lambda_0 = \bar{\Sigma}^{-1}$ and $\Lambda_i = \Lambda_{i-1} + \Sigma^{-1}$, the iterative Bayesian estimator is

(16)  $\hat{\mu}_i = \Lambda_i^{-1} \big( \Lambda_{i-1}\, \hat{\mu}_{i-1} + \Sigma^{-1} e_i \big), \qquad \hat{\mu}_0 = \bar{\mu}.$
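Since the closed-form updates here may not match the paper's exact notation, the following sketches the standard conjugate-Gaussian recursion that Case 1 reduces to: a Gaussian prior on the mean, a known noise covariance, and a posterior-mean update that folds in one residual sample at a time. All symbol names (`mu0`, `Sigma0`, `Sigma`) are assumptions.

```python
import numpy as np

def make_ibe_case1(mu0, Sigma0, Sigma):
    """Iterative Bayesian estimator for the uncertainty mean, with the
    noise covariance Sigma known. Prior: mu ~ N(mu0, Sigma0); residual
    samples e_i ~ N(mu, Sigma). Returns an update function that folds in
    one residual at a time and returns the current posterior-mean estimate."""
    state = {
        "Lam": np.linalg.inv(Sigma0),            # running posterior precision
        "b": np.linalg.inv(Sigma0) @ mu0,        # precision-weighted mean
    }
    Lam_noise = np.linalg.inv(Sigma)             # per-sample data precision

    def update(e):
        state["Lam"] = state["Lam"] + Lam_noise
        state["b"] = state["b"] + Lam_noise @ e
        return np.linalg.solve(state["Lam"], state["b"])

    return update

# Stream residual samples from an (assumed) true distribution.
rng = np.random.default_rng(0)
true_mu = np.array([0.5, -0.2])
Sigma = 0.1 * np.eye(2)
update = make_ibe_case1(mu0=np.zeros(2), Sigma0=np.eye(2), Sigma=Sigma)
for _ in range(500):
    mu_hat = update(rng.multivariate_normal(true_mu, Sigma))
```

As more residuals arrive, the data precision dominates the prior and the estimate concentrates around the sample mean.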
Case 2: Assume that both the mean $\mu$ and covariance $\Sigma$ are unknown. The same prior as in Case 1 is applied to $\mu$. The covariance $\Sigma$ is estimated by maximum likelihood, since the parameters of a prior distribution for $\Sigma$ are inconvenient for a human designer to determine. The optimal estimates of $\mu$ and $\Sigma$ are then

(17)  $\hat{\mu}_N = \big(\bar{\Sigma}^{-1} + N \hat{\Sigma}_N^{-1}\big)^{-1} \Big(\bar{\Sigma}^{-1}\bar{\mu} + \hat{\Sigma}_N^{-1} \textstyle\sum_{i=1}^{N} e_i\Big), \qquad \hat{\Sigma}_N = \frac{1}{N} \textstyle\sum_{i=1}^{N} (e_i - \bar{e}_N)(e_i - \bar{e}_N)^{\top},$

where $\bar{e}_N$ is the sample mean of the residuals.
Defining the running sums $s_i = \sum_{j=1}^{i} e_j$ and $S_i = \sum_{j=1}^{i} e_j e_j^{\top}$, the estimates $\hat{\mu}_i$ and $\hat{\Sigma}_i$ can be computed iteratively by the following IBE:

(18)  $s_i = s_{i-1} + e_i, \quad S_i = S_{i-1} + e_i e_i^{\top}, \quad \hat{\Sigma}_i = \tfrac{1}{i} S_i - \tfrac{1}{i^2} s_i s_i^{\top}, \quad \hat{\mu}_i = \big(\bar{\Sigma}^{-1} + i\, \hat{\Sigma}_i^{-1}\big)^{-1}\big(\bar{\Sigma}^{-1}\bar{\mu} + \hat{\Sigma}_i^{-1} s_i\big).$
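Case 2 can likewise be sketched with recursive sample statistics: a running maximum-likelihood covariance estimate combined with the MAP mean update from Case 1. This is an assumed concrete form for illustration, not the paper's exact recursion.

```python
import numpy as np

class IBECase2:
    """Recursive estimates of the uncertainty mean (MAP, Gaussian prior)
    and covariance (running maximum-likelihood estimate)."""

    def __init__(self, mu0, Sigma0, dim):
        self.mu0 = mu0
        self.Lam0 = np.linalg.inv(Sigma0)   # prior precision on the mean
        self.n = 0
        self.sum_e = np.zeros(dim)
        self.sum_ee = np.zeros((dim, dim))

    def update(self, e):
        self.n += 1
        self.sum_e += e
        self.sum_ee += np.outer(e, e)
        mean_mle = self.sum_e / self.n
        # Covariance MLE around the current sample mean.
        Sigma_hat = self.sum_ee / self.n - np.outer(mean_mle, mean_mle)
        # MAP mean: the prior precision shrinks the sample mean toward mu0;
        # the small jitter keeps the early (rank-deficient) MLE invertible.
        Lam_data = self.n * np.linalg.inv(Sigma_hat + 1e-8 * np.eye(len(e)))
        mu_hat = np.linalg.solve(self.Lam0 + Lam_data,
                                 self.Lam0 @ self.mu0 + Lam_data @ mean_mle)
        return mu_hat, Sigma_hat

# Feed residual samples drawn from an (assumed) true distribution.
rng = np.random.default_rng(1)
true_mu = np.array([1.0, -0.5])
true_Sigma = np.array([[0.2, 0.05], [0.05, 0.1]])
est = IBECase2(mu0=np.zeros(2), Sigma0=np.eye(2), dim=2)
for _ in range(2000):
    mu_hat, Sigma_hat = est.update(rng.multivariate_normal(true_mu, true_Sigma))
```

Both estimates converge to the empirical statistics as the sample count grows, consistent with Lemma 2 below.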
For more general cases where $\xi$ is related to $x$ and $u$, i.e., $\xi \sim \mathcal{N}\big(\mu(x, u; \vartheta), \Sigma(x, u; \vartheta)\big)$, where $\mu(\cdot)$ and $\Sigma(\cdot)$ are general functions with parameter $\vartheta$, the negative log-posterior becomes (19) and the optimal estimate of $\vartheta$ is its minimizer:

(19)  $J(\vartheta) = -\log p(\vartheta) - \textstyle\sum_{i=1}^{N} \log p\big(e_i \mid \mu(x_i, u_i; \vartheta), \Sigma(x_i, u_i; \vartheta)\big).$
IV Mixed RL Algorithm
IV-A Mixed RL Algorithm Framework
Existing RL algorithms that compute the optimal policy via the Bellman equation are known as indirect RL and usually involve PEV and PIM steps. Different from traditional indirect RL algorithms, mixed RL consists of three alternating steps, i.e., IBE, PEV and PIM, as shown in Fig. 2. The IBE proposed in Section III estimates the mean and covariance of the additive stochastic uncertainty iteratively. PEV numerically solves a group of algebraic equations governed by the self-consistency condition (5) under the current policy, and PIM searches for a better policy by minimizing a "weak" Bellman equation.
In the first step, the IBE calculates $\hat{\mu}_k$ and $\hat{\Sigma}_k$ from the latest data, and the mixed model is updated accordingly, i.e.,

(20)  $x_{k+1} = f(x_k, u_k) + \hat{\xi}_k, \qquad \hat{\xi}_k \sim \mathcal{N}(\hat{\mu}_k, \hat{\Sigma}_k),$
where $(\hat{\mu}_k, \hat{\Sigma}_k)$ is computed by the IBE defined in (13). The optimal policy is searched by policy iteration with the mixed model (20). In the second step, PEV solves (21) under the estimated distribution of $\hat{\xi}$:

(21)  $V^{\pi_k}(x) = \mathbb{E}_{\hat{\xi}}\big[\, l(x, \pi_k(x)) + \gamma V^{\pi_k}(x') \,\big],$
where $\pi_k$ is the current policy at the $k$-th iteration and $V^{\pi_k}$ is the state value to be solved under $\pi_k$. In the third step, PIM computes an improved policy by minimizing (22):

(22)  $\pi_{k+1}(x) = \arg\min_{u}\, \mathbb{E}_{\hat{\xi}}\big[\, l(x, u) + \gamma V^{\pi_k}(x') \,\big],$
where $\pi_{k+1}$ is the new policy. The use of the estimated $\hat{\xi}$ naturally embeds both the analytical model and the state-action data into RL, which improves the accuracy of the additive stochastic uncertainty estimate and achieves high convergence speed. The mixed RL algorithm is summarized in Algorithm 1.
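The three alternating steps can be wired together as in the schematic below, with a Case-1-style IBE applied to model residuals and caller-supplied PEV/PIM routines; all function names and signatures here are assumptions for illustration.

```python
import numpy as np

def mixed_rl(f_model, data_stream, pev, pim, policy0, n_iters=10):
    """Schematic of the mixed RL loop: IBE -> PEV -> PIM each iteration.

    f_model(x, u): deterministic part of the (empirical) dynamics.
    data_stream(): yields measured (x, u, x_next) triples from the plant.
    pev(policy, mixed_step): returns a value estimate for `policy`.
    pim(value, mixed_step): returns an improved policy.
    The routines are caller-supplied; this loop only wires them together.
    """
    n, sum_e = 0, 0.0
    policy, value, mu_hat = policy0, None, 0.0
    for _ in range(n_iters):
        # IBE step (Case 1 flavor): refine the mean of the additive
        # uncertainty from model residuals e = x' - f(x, u).
        for x, u, x_next in data_stream():
            sum_e += x_next - f_model(x, u)
            n += 1
        mu_hat = sum_e / n
        mixed = lambda x, u, m=mu_hat: f_model(x, u) + m   # mixed model (20)
        value = pev(policy, mixed)    # solve the self-consistency condition
        policy = pim(value, mixed)    # minimize the "weak" Bellman equation
    return policy, value, mu_hat

# Toy scalar plant with an unknown constant bias as the uncertainty mean.
rng = np.random.default_rng(2)
bias = 0.3
f = lambda x, u: 0.5 * x + u

def stream():
    for _ in range(100):
        x, u = rng.normal(), rng.normal()
        yield x, u, f(x, u) + bias + 0.01 * rng.normal()

policy, value, mu_hat = mixed_rl(
    f, stream,
    pev=lambda p, m: 0.0,                 # stub evaluator
    pim=lambda v, m: (lambda x: 0.0),     # stub improver
    policy0=lambda x: 0.0)
```

Even with stub PEV/PIM routines, the IBE step recovers the residual bias, which is what the mixed model (20) feeds into the subsequent policy iteration.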
IV-B Recursive Stability and Convergence Under Fixed $\hat{\xi}$
In this section, we prove recursive stability and convergence under a fixed estimate $\hat{\xi}$ of the additive uncertainty.
IV-B1 Recursive Stability
Recursive stability means that $\pi_{k+1}$ can stabilize the plant so long as $\pi_k$ can. We call the closed-loop stochastic system stable in probability if, for any $\epsilon > 0$, the following holds:

(23)  $\lim_{x_0 \to 0} \Pr\Big\{ \sup_{k \ge 0} \|x_k\| > \epsilon \Big\} = 0.$
Lemma 1 (Lyapunov stability criterion [20]):
If there exists a positive definite Lyapunov sequence $V(x_k)$, $k = 0, 1, 2, \dots$, which satisfies

(24)  $\mathbb{E}\big[ V(x_{k+1}) \mid x_k \big] - V(x_k) \le -\alpha(\|x_k\|),$

then the stochastic system is stable in probability, where $\alpha$ is a continuous positive definite function.
Next, we prove the recursive stability criterion for the mixed RL algorithm under fixed $\hat{\xi}$ using Lemma 1.
Theorem 1 (Recursive stability theorem):
For any step $k$ in mixed RL, $\pi_{k+1}$ is stable in probability if $\pi_k$ is stable in probability and the discount factor $\gamma$ is selected appropriately under the mixed model.
Proof:
Since $\pi_{k+1}$ minimizes the "weak" Bellman equation while $\pi_k$ is non-optimal for the $k$-th step value, we have:

(25)  $\mathbb{E}_{\hat{\xi}}\big[\, l(x, \pi_{k+1}(x)) + \gamma V^{\pi_k}(x'_{+}) \,\big] \le \mathbb{E}_{\hat{\xi}}\big[\, l(x, \pi_k(x)) + \gamma V^{\pi_k}(x') \,\big] = V^{\pi_k}(x),$

where $x'_{+}$ is the next state under $u = \pi_{k+1}(x)$ in the mixed model, and $x'$ is the next state under $u = \pi_k(x)$. Therefore,

(26)  $\mathbb{E}_{\hat{\xi}}\big[ V^{\pi_k}(x'_{+}) \big] \le \tfrac{1}{\gamma}\Big( V^{\pi_k}(x) - \mathbb{E}_{\hat{\xi}}\big[ l(x, \pi_{k+1}(x)) \big] \Big).$
Since $\pi_k$ is stable in probability, $V^{\pi_k}$ is bounded along the closed-loop trajectory, and thus $\mathbb{E}_{\hat{\xi}}[V^{\pi_k}(x'_{+})]$ is bounded. Considering that $l$ is a positive definite function, $\mathbb{E}_{\hat{\xi}}[l(x, \pi_{k+1}(x))] > 0$ holds, except at $x = 0$, which is stable in probability trivially.
We choose $\gamma$ close enough to one to satisfy

(27)  $(1 - \gamma)\, V^{\pi_k}(x) \le \mathbb{E}_{\hat{\xi}}\big[ l(x, \pi_{k+1}(x)) \big].$

Combining (26) and (27), $V^{\pi_k}$ is monotonically decreasing w.r.t. time under the new policy $\pi_{k+1}$, i.e.,

(28)  $\mathbb{E}_{\hat{\xi}}\big[ V^{\pi_k}(x'_{+}) \big] \le V^{\pi_k}(x).$
In short, by Lemma 1 with $V^{\pi_k}$ as the Lyapunov sequence, the closed loop under $\pi_{k+1}$ is stable in probability.
∎
IV-B2 Convergence of Mixed RL
The convergence property describes whether the generated policy $\pi_k$ converges to the optimum under mixed RL. Here, we prove the convergence of the mixed RL algorithm under fixed $\hat{\xi}$.
Theorem 2 (State value decreasing theorem):
For any state $x$ under the additive stochastic uncertainty $\hat{\xi}$, the state value $V^{\pi_k}(x)$ is monotonically decreasing with respect to the iteration index $k$, i.e.,

(29)  $V^{\pi_{k+1}}(x) \le V^{\pi_k}(x).$
Proof:
The key is to examine, for all $x$ (except $x = 0$),

(30)  $V^{\pi_{k+1}}(x) \le V^{\pi_k}(x).$

At each RL iteration, we initialize the PEV value sequence for $\pi_{k+1}$ by $V_0 = V^{\pi_k}$. The first PEV iteration for $\pi_{k+1}$ is

(31)  $V_1(x) = \mathbb{E}_{\hat{\xi}}\big[\, l(x, \pi_{k+1}(x)) + \gamma V_0(x'_{+}) \,\big].$

With respect to (25), we know

(32)  $V_1(x) \le V_0(x) = V^{\pi_k}(x).$

For the following PEV iterations, we reuse inequality (32):

(33)  $V_2(x) = \mathbb{E}_{\hat{\xi}}\big[\, l(x, \pi_{k+1}(x)) + \gamma V_1(x'_{+}) \,\big] \le \mathbb{E}_{\hat{\xi}}\big[\, l(x, \pi_{k+1}(x)) + \gamma V_0(x'_{+}) \,\big] = V_1(x).$

Similarly, $V_{j+1}(x) \le V_j(x)$ for all $j$. Therefore, $\{V_j(x)\}$ is a monotonically decreasing sequence bounded below by 0, since $l \ge 0$ always holds. Finally, $V_j$ converges:

(34)  $\lim_{j \to \infty} V_j(x) = V^{\pi_{k+1}}(x).$

So we have $V^{\pi_{k+1}}(x) \le V^{\pi_k}(x)$.
∎
IV-C Recursive Stability and Convergence Under Varying $\hat{\xi}$
In this section, we discuss recursive stability and convergence under a varying additive uncertainty estimate $\hat{\xi}_k$ and propose a sufficient condition by bounding the difference between $\hat{\xi}_k$ and $\hat{\xi}_{k+1}$.
Under $\hat{\xi}_k$, the self-consistency condition is

(35)  $V^{\pi_k}_{\hat{\xi}_k}(x) = \mathbb{E}_{\hat{\xi}_k}\big[\, l(x, \pi_k(x)) + \gamma V^{\pi_k}_{\hat{\xi}_k}(x') \,\big].$

Since $\pi_{k+1}(x)$ is the optimal action with respect to $V^{\pi_k}_{\hat{\xi}_k}$ in the $k$-th iteration, we have

(36)  $\mathbb{E}_{\hat{\xi}_k}\big[\, l(x, \pi_{k+1}(x)) + \gamma V^{\pi_k}_{\hat{\xi}_k}(x'_{+}) \,\big] \le V^{\pi_k}_{\hat{\xi}_k}(x),$

which is the key inequality in the proofs of Section IV-B.
However, when the estimate is updated from $\hat{\xi}_k$ to $\hat{\xi}_{k+1}$, the variation of $\hat{\xi}$ should be bounded in the interest of stability and convergence. Here, we give a sufficient condition for recursive stability and convergence under varying $\hat{\xi}$, namely the maximum variation condition (MVC) of the additive stochastic uncertainty, stated in (38).
Define $V^{\pi}_{\hat{\xi}}$ as the expected cumulative cost under the additive stochastic uncertainty $\hat{\xi}$:

(37)  $V^{\pi}_{\hat{\xi}}(x_0) = \mathbb{E}_{\hat{\xi}}\Big[\textstyle\sum_{k=0}^{\infty} \gamma^{k}\, l(x_k, \pi(x_k))\Big].$
Theorem 3 (Sufficient condition for recursive stability and convergence):
For any step $k$ in mixed RL, $\pi_{k+1}$ is recursively stable and $V^{\pi_k}$ is monotonically decreasing with respect to $k$ if the following MVC is satisfied:

(38)  $\big| V^{\pi_{k+1}}_{\hat{\xi}_{k+1}}(x) - V^{\pi_{k+1}}_{\hat{\xi}_k}(x) \big| \le \Delta_k,$

where $\Delta_k$ is the decrease of the cumulative cost after PIM,

(39)  $\Delta_k = V^{\pi_k}_{\hat{\xi}_k}(x) - V^{\pi_{k+1}}_{\hat{\xi}_k}(x).$
The MVC requires that the change of $\hat{\xi}$ have less impact on the cumulative cost than the improvement achieved by PIM in the last iteration.
Proof:
When the MVC is satisfied, $V^{\pi_{k+1}}_{\hat{\xi}_{k+1}}(x) - V^{\pi_{k+1}}_{\hat{\xi}_k}(x) \le \Delta_k$; thus, we have

(40)  $V^{\pi_{k+1}}_{\hat{\xi}_{k+1}}(x) \le V^{\pi_{k+1}}_{\hat{\xi}_k}(x) + \Delta_k = V^{\pi_k}_{\hat{\xi}_k}(x).$
Next, we present Lemma 2, which will be used in the convergence analysis of the IBE, and then prove that the MVC is satisfied with probability one.
Lemma 2 (Convergence criterion of Bayesian estimation [21]):
In Bayesian estimation, if the empirical data and the parameter's prior distribution are Gaussian and the covariance matrix of the prior distribution is full rank, then the estimates $\hat{\mu}$ and $\hat{\Sigma}$ converge asymptotically to the sample mean and covariance.
Theorem 4 (MVC satisfied with probability one):
The MVC is satisfied with probability one after sufficiently many iterations, under the assumption that the IBE converges faster than PIM and PEV.
Proof:
Using the Kolmogorov strong law of large numbers [22], we have

(43)  $\Pr\Big\{ \lim_{N \to \infty} \|\bar{e}_N - \mu^{o}\| < \epsilon_1 \Big\} = 1, \qquad \Pr\Big\{ \lim_{N \to \infty} \|\hat{\Sigma}_N - \Sigma^{o}\| < \epsilon_2 \Big\} = 1,$

where $\epsilon_1$ and $\epsilon_2$ are arbitrarily small positive constants and $\mu^{o}$, $\Sigma^{o}$ are the true mean and covariance of $\xi$. Thus, using Lemma 2 and (43), we know that as $N \to \infty$, $\hat{\mu}$ and $\hat{\Sigma}$ converge to $\mu^{o}$ and $\Sigma^{o}$ with probability one [21], i.e.,

(44)  $\Pr\Big\{ \lim_{N \to \infty} \hat{\mu}_N = \mu^{o},\; \lim_{N \to \infty} \hat{\Sigma}_N = \Sigma^{o} \Big\} = 1.$

Since both $\hat{\xi}_k$ and $\hat{\xi}_{k+1}$ obey Gaussian distributions whose parameters converge, the KL-divergence between $\hat{\xi}_k$ and $\hat{\xi}_{k+1}$ converges to zero with probability one [23], i.e.,

(45)  $\Pr\Big\{ \lim_{k \to \infty} D_{\mathrm{KL}}\big( \mathcal{N}(\hat{\mu}_{k+1}, \hat{\Sigma}_{k+1}) \,\|\, \mathcal{N}(\hat{\mu}_k, \hat{\Sigma}_k) \big) = 0 \Big\} = 1.$

Thus, the variation of the cumulative cost caused by the change of $\hat{\xi}$ vanishes with probability one, so the MVC holds after sufficiently many iterations:

(46)  $\Pr\Big\{ \lim_{k \to \infty} \big| V^{\pi_{k+1}}_{\hat{\xi}_{k+1}}(x) - V^{\pi_{k+1}}_{\hat{\xi}_k}(x) \big| = 0 \Big\} = 1.$

∎
In general, the MVC indicates that an excessive difference between $\hat{\xi}_k$ and $\hat{\xi}_{k+1}$ should be avoided. In mixed RL, we update the distribution of the additive stochastic uncertainty by Bayesian estimation. As shown in Fig. 3, if a single data batch deviates substantially from the total data, the Bayesian estimator can reduce the deviation between the posterior distribution and the total data distribution by introducing an appropriate prior distribution of the parameters.
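The KL-divergence invoked in the proof has a closed form for Gaussians, which also gives a practical way to monitor how much the uncertainty estimate changes between iterations; a sketch (symbol names assumed):

```python
import numpy as np

def gaussian_kl(mu0, Sigma0, mu1, Sigma1):
    """KL( N(mu0, Sigma0) || N(mu1, Sigma1) ) in closed form."""
    d = len(mu0)
    Sigma1_inv = np.linalg.inv(Sigma1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(Sigma1_inv @ Sigma0)
                  + diff @ Sigma1_inv @ diff
                  - d
                  + np.log(np.linalg.det(Sigma1) / np.linalg.det(Sigma0)))

# Example: two consecutive uncertainty estimates differing only in mean.
kl = gaussian_kl(np.zeros(2), np.eye(2), np.array([1.0, 0.0]), np.eye(2))  # 0.5
```

As the IBE estimates converge, this quantity goes to zero, matching (45).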
V Mixed RL with Parameterized Functions
For large state spaces, both the value function and the policy are parameterized in mixed RL, as shown in (48). The parameterized value function $V(x; w)$ with parameter $w$ is called the "critic", and the parameterized policy $\pi(x; \theta)$ with parameter $\theta$ is called the "actor" [24]:

(48)  $V(x) \approx V(x; w), \qquad \pi(x) \approx \pi(x; \theta).$
The parameterized critic minimizes the average squared PEV residual (49), i.e.,

(49)  $\min_{w}\; J_c(w) = \tfrac{1}{2}\, \mathbb{E}\Big[ \big( V(x; w) - y \big)^2 \Big].$

The semi-gradient of the critic is

(50)  $\nabla_w J_c = \mathbb{E}\big[ \big( V(x; w) - y \big)\, \nabla_w V(x; w) \big],$

where $y = l(x, \pi(x; \theta)) + \gamma V(x'; w)$ is the bootstrap target, treated as a constant when taking the gradient, and $x' = f(x, \pi(x; \theta)) + \hat{\xi}$.
The parameterized actor minimizes the "weak" Bellman condition, i.e., the following objective function:

(51)  $\min_{\theta}\; J_a(\theta) = \mathbb{E}_{\hat{\xi}}\big[\, l(x, \pi(x; \theta)) + \gamma V(x'; w) \,\big],$

where $\hat{\mu}$ and $\hat{\Sigma}$ are the mean and covariance of $\hat{\xi}$. The gradient of $J_a$ is calculated by the chain rule through the mixed model:

(52)  $\nabla_\theta J_a = \mathbb{E}_{\hat{\xi}}\Big[ \Big( \frac{\partial l}{\partial u} + \gamma\, \frac{\partial V(x'; w)}{\partial x'} \frac{\partial f}{\partial u} \Big) \frac{\partial \pi(x; \theta)}{\partial \theta} \Big].$
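For a scalar linear-quadratic instance, the critic semi-gradient (50) and the actor gradient (52) can be written out analytically; the dynamics, features, and learning rates below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def phi(x):                              # critic features: linear + quadratic
    return np.array([x, x * x])

def critic_semigrad(w, x, u, x_next, cost, gamma):
    # Semi-gradient of the squared PEV residual: the bootstrap target
    # cost + gamma * V_w(x') is treated as a constant, so no gradient
    # flows through the next-state value (standard TD semi-gradient).
    target = cost + gamma * w @ phi(x_next)
    return (w @ phi(x) - target) * phi(x)

def actor_grad(theta, w, x, mu, a, b, gamma):
    # Gradient of the one-step "weak" Bellman objective for a linear
    # policy u = theta * x on x' = a*x + b*u + mu, with utility
    # l = x^2 + u^2 and the critic above (chain rule through the model).
    u = theta * x
    x_next = a * x + b * u + mu
    dV_dx = w[0] + 2.0 * w[1] * x_next
    return 2.0 * u * x + gamma * dV_dx * b * x

# One-gradient-step-per-cycle GPI on the scalar system.
rng = np.random.default_rng(3)
a, b, gamma, mu = 0.9, 1.0, 0.95, 0.0
w, theta = np.zeros(2), 0.0
for _ in range(3000):
    x = rng.uniform(-1.0, 1.0)
    u = theta * x
    x_next = a * x + b * u + mu
    cost = x * x + u * u
    w -= 0.1 * critic_semigrad(w, x, u, x_next, cost, gamma)
    theta -= 0.05 * actor_grad(theta, w, x, mu, a, b, gamma)
```

After training, `theta` settles at a negative stabilizing gain, so the closed-loop pole $a + b\theta$ lies inside the unit circle.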
In essence, this parameterized scheme is a form of generalized policy iteration (GPI). Different from traditional policy iteration, PEV and PIM each take only one gradient step per GPI cycle, which greatly improves computational efficiency when RL is combined with neural networks.
Since the gradient descent of PIM is carried out only once per GPI cycle, the maximum variation condition (MVC) may not be satisfied. We propose an adaptive GPI (AGPI) method to solve this problem. In every iteration, we check whether the PIM result satisfies the MVC. If not, the algorithm continues the gradient descent steps in PIM until the MVC is satisfied or the maximum number of internal iterations is reached. The mixed RL algorithm with AGPI is summarized in Algorithm 2.
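The AGPI inner loop can be sketched as follows; `pim_step` and `mvc_satisfied` are assumed caller-supplied hooks, not functions defined in the paper.

```python
def adaptive_pim(theta, pim_step, mvc_satisfied, max_inner=20):
    """AGPI inner loop: keep taking PIM gradient steps until the maximum
    variation condition (MVC) holds or the step budget is exhausted."""
    for _ in range(max_inner):
        theta = pim_step(theta)
        if mvc_satisfied(theta):
            break
    return theta

# Toy check: each PIM step halves the parameter; the (stand-in) MVC
# accepts once the iterate is small enough.
theta_new = adaptive_pim(1.0, lambda t: 0.5 * t, lambda t: abs(t) < 0.1)
```

The step budget guards against the MVC never being met in a single cycle, mirroring the maximum internal iteration count in Algorithm 2.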
VI Numerical Experiments
We consider a typical optimal control problem for stochastic non-affine nonlinear systems: the combined lateral and longitudinal control of an automated vehicle with stochastic disturbance (e.g., the influence of small road slopes and road bumps). The vehicle is subjected to a random longitudinal interference force during tracking, and the vehicle dynamics is given in (53) [25].
(53) 
where the state is $x = [v_y, r, \Delta v_x, \Delta\psi, \Delta y]^{\top}$: $v_y$ is the lateral velocity, $r$ is the yaw rate, $\Delta v_x$ is the difference between the longitudinal velocity and the desired velocity, $\Delta\psi$ is the yaw angle error, and $\Delta y$ is the distance between the vehicle's centroid and the target trajectory. The control input is $u = [\delta, a_x]^{\top}$, where $\delta$ is the front wheel angle and $a_x$ is the longitudinal acceleration. $F_{yf}$ and $F_{yr}$ are the lateral tire forces of the front and rear tires, respectively, calculated by the Fiala tire model [26]. In the tire model, the tire-road friction coefficient is set to 1.0. The front and rear wheel cornering stiffnesses are set to 88000 N/rad and 94000 N/rad, respectively. The mass is 1500 kg, the distances from the centroid to the front and rear axles are 1.14 m and 1.40 m, respectively, and the polar moment of inertia at the centroid is 2420 kg·m². The vehicle is subjected to a random longitudinal interference force $F_d$, and the desired velocity is set to 12 m/s [27].

For comparison, a double-lane change task was simulated with three different RL algorithms. The task is to track the desired trajectory in the lateral direction while maintaining the desired longitudinal velocity under the longitudinal interference $F_d$. Hence, the optimal control problem with the discretized stochastic system equation is given by
(54) 
where $\gamma$ is the discount factor, $f$ is the deterministic part of the discretized system equation of (53), and $\xi$ is the additive stochastic uncertainty; the simulation time interval is fixed. In this task, we compared the performance of mixed RL with both model-driven RL and data-driven RL. The data-driven RL computes the control policy only from state-action data with a typical data-driven algorithm (DDPG) [3]. The model-driven RL computes the policy by GPI [28], directly using the given empirical model
(55) 
where the prior distribution is set as $\bar{\xi} \sim \mathcal{N}(\bar{\mu}, \bar{\Sigma})$, with $\bar{\Sigma}$ a diagonal matrix.
The convergence performance of the three algorithms is compared in Fig. 4. The mixed RL and the model-driven RL converge in substantially fewer iterations than the data-driven RL under the same hyperparameters.
For control performance, we tested the policies from the three methods in the double-lane change task. As shown in Fig. 5, all three policies stably tracked the target trajectory, but with different control errors. In fact, as shown in Fig. 6, mixed RL has the minimum longitudinal speed error, since it enables the vehicle to decelerate rapidly at sharp turns and to recover appropriately after passing them. In contrast, due to model error, the model-driven RL has a higher speed error and decelerates insufficiently in turns. Due to its slow convergence, the data-driven RL arrives at a poor solution and has the largest speed error.
Mixed RL also outperforms the two benchmark methods in terms of lateral position error. As shown in Fig. 7, mixed RL has the minimum steady-state lateral position error, while data-driven RL has the largest lateral position error and frequent speed fluctuations.
The mean absolute errors of the three methods are compared in Table I. The longitudinal speed error of mixed RL is 77.41% smaller, and the lateral position error 33.77% smaller, than those of data-driven RL. Likewise, the longitudinal speed error of mixed RL is 58.82% smaller, and the lateral position error 15.64% smaller, than those of model-driven RL.
Method  Position error (m)  Speed error (m/s)
Mixed RL  0.151  0.021
Data-driven RL  0.228  0.093
Model-driven RL  0.179  0.051
In summary, mixed RL exhibits the fastest convergence during training and the best control performance in the double-lane change task. Model-driven RL converges at a similar speed to mixed RL but has higher control error due to model error. Data-driven RL has the slowest convergence and the largest control error, owing to the difficulty of finding the optimal policy from state-action data alone.
VII Conclusion
This paper proposes a mixed reinforcement learning approach with improved convergence speed and policy accuracy for nonlinear systems with additive Gaussian uncertainty. Mixed RL utilizes an iterative Bayesian estimator to model the environmental dynamics accurately by integrating the designer's knowledge with measured state transition data. The convergence and recursive stability of the learned policy were proved via Bellman's principle of optimality and Lyapunov analysis. Mixed RL achieves a faster convergence rate and a more stable training process than its data-driven counterpart; meanwhile, it has a lower policy error than the model-driven counterpart, since the environmental model is refined iteratively by Bayesian estimation. The benefits of mixed RL are demonstrated on a double-lane change task with an automated vehicle. The potential of mixed RL for more general environmental dynamics and non-Gaussian uncertainties will be investigated in the future.
References
 [1] E. Gibney, “Google AI algorithm masters ancient game of Go,” Nature News, vol. 529, no. 7587, p. 445, 2016.
 [2] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
 [3] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
 [4] J. Duan, Y. Guan, Y. Ren, S. E. Li, and B. Cheng, “Addressing value estimation errors in reinforcement learning with a state-action return distribution function,” arXiv preprint arXiv:2001.02811, 2020.
 [5] T. Bian, Y. Jiang, and Z.P. Jiang, “Adaptive dynamic programming and optimal control of nonlinear nonaffine systems,” Automatica, vol. 50, no. 10, pp. 2624–2632, 2014.
 [6] D. P. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 1995, vol. 1.
 [7] J. Duan, S. E. Li, Z. Liu, M. Bujarbaruah, and B. Cheng, “Generalized policy iteration for optimal control in continuous time,” arXiv preprint arXiv:1909.05402, 2019.
 [8] J. Duan, Z. Liu, S. E. Li, Q. Sun, Z. Jia, and B. Cheng, “Deep adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints,” arXiv preprint arXiv:1911.11397, 2019.
 [9] F. L. Lewis and D. Liu, Reinforcement learning and approximate dynamic programming for feedback control. John Wiley & Sons, 2013, vol. 17.
 [10] D. P. Bertsekas, “Approximate policy iteration: a survey and some new methods,” Journal of Control Theory and Applications, vol. 9, no. 3, pp. 310–335, 2011.
 [11] Y. Guan, S. E. Li, J. Duan, J. Li, Y. Ren, and B. Cheng, “Direct and indirect reinforcement learning,” arXiv preprint arXiv:1912.10600, 2019.
 [12] S. Bansal, R. Calandra, K. Chua, S. Levine, and C. Tomlin, “MBMF: Model-based priors for model-free reinforcement learning,” arXiv preprint arXiv:1709.03153, 2017.
 [13] A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, “Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 7559–7566.

 [14] V. Feinberg, A. Wan, I. Stoica, M. Jordan, J. Gonzalez, and S. Levine, “Model-based value expansion for efficient model-free reinforcement learning,” in Proceedings of the 35th International Conference on Machine Learning (ICML 2018), 2018.
 [15] S. Levine and P. Abbeel, “Learning neural network policies with guided policy search under unknown dynamics,” in Advances in Neural Information Processing Systems, 2014, pp. 1071–1079.
 [16] M. C. Yip and D. B. Camarillo, “Modelless feedback control of continuum manipulators in constrained environments,” IEEE Transactions on Robotics, vol. 30, no. 4, pp. 880–889, 2014.
 [17] R. Lioutikov, A. Paraschos, J. Peters, and G. Neumann, “Sample-based information-theoretic stochastic optimal control,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 3896–3902.
 [18] T. Kurutach, I. Clavera, Y. Duan, A. Tamar, and P. Abbeel, “Modelensemble trustregion policy optimization,” arXiv preprint arXiv:1802.10592, 2018.
 [19] J. Buckman, D. Hafner, G. Tucker, E. Brevdo, and H. Lee, “Sample-efficient reinforcement learning with stochastic ensemble value expansion,” in Advances in Neural Information Processing Systems, 2018, pp. 8224–8234.
 [20] H. Deng, M. Krstic, and R. J. Williams, “Stabilization of stochastic nonlinear systems driven by noise of unknown covariance,” IEEE Transactions on Automatic Control, vol. 46, no. 8, pp. 1237–1253, 2001.
 [21] P. Diaconis and D. Freedman, “On the consistency of Bayes estimates,” The Annals of Statistics, pp. 1–26, 1986.
 [22] K. Chung, “The strong law of large numbers,” Selected Works of Kai Lai Chung, pp. 145–156, 2008.

 [23] J. M. H. Lobato, “Expectation propagation for approximate Bayesian inference,” 2007.
 [24] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Advances in Neural Information Processing Systems, 2000, pp. 1057–1063.
 [25] S. E. Li, H. Chen, R. Li, Z. Liu, Z. Wang, and Z. Xin, “Predictive lateral control to stabilise highly automated vehicles at tire-road friction limits,” Vehicle System Dynamics, pp. 1–19, 2020.
 [26] Y.H. J. Hsu, S. M. Laws, and J. C. Gerdes, “Estimation of tire slip angle and friction limits using steering torque,” IEEE Transactions on Control Systems Technology, vol. 18, no. 4, pp. 896–907, 2009.
 [27] S. Xu, S. E. Li, B. Cheng, and K. Li, “Instantaneous feedback control for a fuelprioritized vehicle cruising system on highways with a varying slope,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 5, pp. 1210–1220, 2016.
 [28] D. Vrabie, O. Pastravanu, M. AbuKhalaf, and F. L. Lewis, “Adaptive optimal control for continuoustime linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, 2009.