I. Introduction
Model Predictive Control (MPC) is a well-known method for solving finite-horizon optimal control problems online and has been extensively investigated in various fields [1, 2, 3]. However, existing MPC algorithms still suffer from a major challenge: relatively low computational efficiency [4].
One well-known approach to this issue is the move blocking technique, which holds the control input constant over a fixed portion of the prediction horizon and thereby improves computational efficiency by reducing the number of optimization variables [5]. However, this approach cannot guarantee stability or constraint satisfaction. In addition, Wang and Boyd (2009) proposed an early-termination interior-point method that reduces the calculation time by limiting the maximum number of iterations per time step [6].
However, these methods still cannot meet the online computing requirements of nonlinear and large-scale systems. Some control algorithms instead compute a near-optimal explicit policy offline and then implement it online. Bemporad et al. (2002) first proposed the explicit MPC method, which partitions the constrained state space into several regions and computes an explicit feedback control law for each region [7]. During online implementation, the on-board computer only needs to select the state feedback control law corresponding to the current system state, thereby reducing the online computational burden to some extent. Such algorithms are only suitable for small-scale systems, since the required storage capacity grows exponentially with the state dimension [8].
Furthermore, significant efforts have been devoted to approximate MPC algorithms, which reduce the number of polyhedral state regions and simplify the explicit control laws. Geyer et al. (2008) provided an optimal merging approach that reduces the number of partitions by merging regions with the same control law [9]. Jones et al. (2010) proposed a polytopic approximation method using double description and barycentric functions to estimate the optimal policy, which greatly reduces the number of partitions and can be applied to any convex problem [10]. Wen et al. (2009) proposed a piecewise continuous grid function to represent the explicit MPC solution, which reduces the storage requirements and improves online computational efficiency [11]. Borrelli et al. (2010) proposed an explicit MPC algorithm that can be executed partially online and partially offline [12]. In addition, some MPC studies employ a parameterized function to approximate the MPC controller, updating the function parameters by minimizing the MPC cost function with a fixed prediction horizon through supervised learning or reinforcement learning [13, 14, 15, 16]. Note that both the policy performance and the computation time per step usually increase with the number of prediction steps. The above algorithms therefore have to trade off control performance against computation time constraints and select a conservative, fixed prediction horizon, even though the available on-board computing resources often change dynamically. As a result, these algorithms tend to cause calculation timeouts or to waste resources. In other words, they cannot adapt to the dynamic allocation of computing resources and make full use of the available computing time to select the longest possible prediction horizon.
In this paper, we propose an offline MPC algorithm, called Recurrent MPC (RMPC), for finite-horizon optimal control problems with large-scale nonlinearities and non-affine inputs. Our main contributions are summarized as follows:
- A recurrent function is employed to approximate the optimal policy, mapping the system states and reference values directly to the control inputs. Unlike previous algorithms that employ non-recurrent functions (such as multi-layer NNs) and must therefore fix the prediction horizon in advance [13, 14, 15, 16], the recurrent structure enables the algorithm to select an appropriate prediction horizon according to the currently available computing resources. In particular, the output of the learned policy function after $i$ recurrent cycles corresponds to the nearly optimal solution of $i$-step MPC.
- A policy optimization objective is designed by decomposing the MPC cost function according to Bellman's principle of optimality. The optimal recurrent policy can be obtained by directly minimizing this objective. Therefore, unlike traditional explicit MPC algorithms [7, 8, 9, 10, 11, 12], which can only handle linear systems, the proposed algorithm is applicable to general nonlinear and non-input-affine systems. Meanwhile, the proposed RMPC algorithm exploits the recursiveness of Bellman's principle: once the cost function for the longest prediction horizon is optimized, the cost functions for all shorter horizons are automatically optimal. Thus, the proposed algorithm can handle problems with different, shorter prediction horizons while being trained only with an objective function defined over the longest prediction horizon. Other MPC algorithms [13, 14, 15, 16, 7, 8, 9, 10, 11, 12] do not exploit this recursiveness: whenever the prediction horizon changes, the optimization problem must be reconstructed and the training or computing process must be re-executed.
The paper is organized as follows. In Section II, we provide the formulation of the MPC problem. Section III presents the RMPC algorithm and proves its convergence. In Section IV, we present experimental demonstrations that show the generality and effectiveness of the RMPC algorithm. Section V concludes this paper.
II. Preliminaries
Consider the general time-invariant discrete-time dynamic system

$$x_{k+1} = f(x_k, u_k), \qquad (1)$$

with state $x_k \in \mathcal{X}$, control input $u_k \in \mathcal{U}$, and system dynamics function $f: \mathcal{X} \times \mathcal{U} \to \mathcal{X}$. We assume that $f$ is Lipschitz continuous on the compact set $\mathcal{X}$ and that the system is stabilizable on $\mathcal{X}$.
Define the cost function of the $N$-step Model Predictive Control (MPC) problem as

$$V(x_0, N) = \sum_{i=0}^{N-1} l\big(x_i, x_i^r, u_i^N\big), \quad x_{i+1} = f\big(x_i, u_i^N\big), \qquad (2)$$

where $x_0$ is the initial state, $N$ is the length of the prediction horizon, $x^r = \{x_0^r, x_1^r, \dots\}$ is the reference trajectory, $V(x_0, N)$ is the $N$-step cost function of state $x_0$ with reference $x^r$, $u_i^N$ is the control input of the $i$th step in the $N$-step prediction, and $l$ is the utility function. The purpose of MPC is to find the optimal control sequence that minimizes the objective $V(x_0, N)$, which can be denoted as

$$\{u_0^{N*}, u_1^{N*}, \dots, u_{N-1}^{N*}\} = \arg\min_{u_0^N, \dots, u_{N-1}^N} V(x_0, N), \qquad (3)$$

where the superscript $*$ represents the optimal solution.
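To make the preceding notation concrete, the following Python sketch evaluates the $N$-step cost (2) for a candidate control sequence by rolling the model forward; the callables `f` and `l` stand for the dynamics and utility function defined above, and this interface is an illustrative assumption rather than the implementation used later in the paper. A numerical solution of (3) can then be obtained by minimizing this cost over the control sequence with any nonlinear programming solver.

```python
import numpy as np

def mpc_cost(x0, u_seq, x_ref, f, l):
    """Evaluate the N-step MPC cost (2) by rolling out the model.

    x0    : initial state, shape (n_x,)
    u_seq : candidate control sequence, shape (N, n_u)
    x_ref : reference trajectory, shape (N, n_x)
    f     : dynamics, x_{k+1} = f(x_k, u_k)
    l     : utility function, l(x_k, x_ref_k, u_k)
    """
    x = np.asarray(x0, dtype=float)
    cost = 0.0
    for i, u in enumerate(u_seq):
        cost += l(x, x_ref[i], u)   # accumulate the stage cost
        x = f(x, u)                 # propagate the model one step
    return cost
```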
III. Recurrent Model Predictive Control
III-A. Recurrent Policy Function
In practical applications, only the first control input $u_0^{N*}$ of the optimal sequence in (3) is executed at each time step. Given a control problem, assume that $N$ is the maximum feasible prediction horizon. Our aim is to make full use of the available computing resources and adaptively select the longest possible prediction horizon, which means that we need to calculate and store the optimal control inputs $u_0^{1*}, u_0^{2*}, \dots, u_0^{N*}$ of the $1$-step, $2$-step, $\dots$, $N$-step MPC problems in advance. This requires an efficient way to represent the policy and solve it offline.
We first introduce a recurrent function, denoted as $\pi_i(x_0, x^r; \theta)$, to approximate the control input $u_0^{i*}$, where $\theta$ is the vector of function parameters and $i$ is the number of recurrent cycles of the policy function. The goal of the proposed Recurrent MPC (RMPC) algorithm is to find the optimal parameters $\theta^*$ such that

$$\pi_i(x_0, x^r; \theta^*) = u_0^{i*}, \quad \forall i \in \{1, 2, \dots, N\}. \qquad (4)$$
The structure of the recurrent policy function is illustrated in Fig. 1. All recurrent cycles share the same parameters $\theta$, and $h_i$ denotes the vector of hidden states after the $i$th cycle. Each recurrent cycle is mathematically described as

$$h_i = \sigma_h\big(W_h\,[x_0;\, x^r;\, h_{i-1}] + b_h\big), \qquad \pi_i = \sigma_u\big(W_u\, h_i + b_u\big), \qquad (5)$$

where $W_h$, $W_u$, $b_h$, and $b_u$ are the weights and biases collected in $\theta$, and $\sigma_h$ and $\sigma_u$ are the activation functions of the hidden layer and output layer, respectively.

[Fig. 1: Structure of the recurrent policy function.]
As shown in Fig. 1, the recurrent policy function outputs a control input at each recurrent cycle. Assuming that the optimal parameters $\theta^*$ have been found, the output of the $i$th cycle satisfies $\pi_i(x_0, x^r; \theta^*) = u_0^{i*}$ for $i \in \{1, \dots, N\}$. This indicates that the more cycles are executed, the longer the corresponding prediction horizon. In practical applications, the calculation time of each cycle varies due to the dynamic change of computing resource allocation (see Fig. 2). At each time step, the total time assigned to the control input calculation is assumed to be $T$. Denoting the total number of recurrent cycles completed within $T$ as $j$, the applied control input is $\pi_j(x_0, x^r; \theta^*)$, where $j \le N$ is the largest number of cycles that fits into the available time.
Therefore, the recurrent policy is able to make full use of the computing resources and adaptively select the longest possible prediction horizon. In other words, the more computing resources are allocated, the longer the selected prediction horizon, which usually leads to better control performance.
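The adaptive selection of the number of cycles under a time budget can be sketched as follows; `policy_cycle` (one recurrent cycle of (5), returning a control and the next hidden state), the initial hidden state `h0`, and the budget `T` are illustrative assumptions rather than part of the original implementation.

```python
import time

def rmpc_control(x, x_ref, h0, policy_cycle, T, N_max):
    """Run as many recurrent cycles as the time budget T allows and return
    the control of the last completed cycle, i.e. the control associated
    with the longest prediction horizon that fits into the available time."""
    t_start = time.perf_counter()
    h, u = h0, None
    for i in range(N_max):
        u, h = policy_cycle(x, x_ref, h)        # output of cycle i+1 ~ (i+1)-step MPC
        if time.perf_counter() - t_start >= T:  # computing budget exhausted
            break
    return u
```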
Remark 1.
Previous MPC algorithms [13, 14, 15, 16] employ neural networks in a non-recurrent form, so the prediction horizon must be fixed before training and cannot be adjusted online. In contrast, the recurrent policy allows the prediction horizon to follow the number of cycles that can be completed within the currently available computing time.

[Fig. 2: Calculation time of the recurrent cycles under dynamically changing computing resource allocation.]
III-B. Objective Function
To find the optimal parameters $\theta^*$ offline, we first need to express the MPC cost function (2) in terms of $\theta$, denoted by $V(x_0, N; \theta)$. From (2) and Bellman's principle of optimality, the global minimum of the $N$-step cost can be expressed as

$$V^*(x_0, N) = \min_{u_0^N}\Big\{ l\big(x_0, x_0^r, u_0^N\big) + V^*\big(f(x_0, u_0^N),\, N-1\big) \Big\}.$$

It follows that

$$u_i^{N*} = \arg\min_{u_i^N}\Big\{ l\big(x_i^*, x_i^r, u_i^N\big) + V^*\big(f(x_i^*, u_i^N),\, N-i-1\big) \Big\}, \quad i = 0, \dots, N-1, \qquad (6)$$

where $x_i^*$ denotes the state reached from $x_0$ under the optimal control sequence. Therefore, for the same state and reference, it is clear that

$$u_i^{N*} = u_0^{(N-i)*}\big|_{x_0 = x_i^*}. \qquad (7)$$

This indicates that the $i$th optimal control input in (3) can be regarded as the first optimal control input of the $(N-i)$-step MPC problem with initial state $x_i^*$. Hence, by replacing every $u_i^N$ in (2) with $\pi_{N-i}(x_i, x^r; \theta)$, the $N$-step MPC problem can also be solved by minimizing over $\theta$. We thus obtain the $N$-step cost function in terms of $\theta$:

$$V(x_0, N; \theta) = \sum_{i=0}^{N-1} l\big(x_i, x_i^r, \pi_{N-i}(x_i, x^r; \theta)\big), \quad x_{i+1} = f\big(x_i, \pi_{N-i}(x_i, x^r; \theta)\big). \qquad (8)$$
To find the optimal parameters $\theta^*$ that make (4) hold, we construct the following objective function:

$$J(\theta) = \mathop{\mathbb{E}}_{x_0, x^r}\big[ V(x_0, N; \theta) \big]. \qquad (9)$$

Therefore, we can update $\theta$ by directly minimizing $J(\theta)$. The policy update gradient can be derived as

$$\frac{\mathrm{d}J(\theta)}{\mathrm{d}\theta} = \mathop{\mathbb{E}}_{x_0, x^r}\left[ \sum_{i=0}^{N-1} \frac{\mathrm{d}\, l_i}{\mathrm{d}\theta} \right], \qquad (10)$$
where $l_i = l\big(x_i, x_i^r, u_i\big)$ with $u_i = \pi_{N-i}(x_i, x^r; \theta)$, and the total derivative accounts for the dependence of both $u_i$ and $x_i$ on $\theta$ through (8). The derivatives $\mathrm{d}u_i/\mathrm{d}\theta$ and $\mathrm{d}x_i/\mathrm{d}\theta$ are obtained by backpropagating through the recurrent policy and the system dynamics. By defining two intermediate variables for these state and control sensitivities, we obtain their recursive formulas, so that (10) can be evaluated recursively from the last prediction step back to the first; the details of the gradient backpropagation are shown in Fig. 3.

[Fig. 3: Details of the gradient backpropagation through the recurrent policy and the model.]
Taking the gradient descent (GD) method as an example, the updating rule of the policy parameters is

$$\theta_{k+1} = \theta_k - \alpha\,\frac{\mathrm{d}J(\theta_k)}{\mathrm{d}\theta}, \qquad (11)$$

where $\alpha$ denotes the learning rate and $k$ indicates the $k$th iteration.
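The offline training procedure (cf. Algorithm 1) can be sketched in Python as below, with the gradient (10) obtained by automatic differentiation and Adam used in place of plain GD, as in Section IV-C. The names `policy`, `f`, `l`, and `sample_states` are placeholders for a differentiable recurrent policy module, the model, the utility function, and an initial-state sampler; they are assumptions for this sketch, not the authors' implementation.

```python
import torch

def train_rmpc(policy, f, l, sample_states, x_ref, N, epochs, lr=1e-3):
    """Minimize the objective J(theta) in (9) by rolling the recurrent
    policy through the differentiable model as in (8).

    policy : torch.nn.Module, policy(x0, x_ref, n_cycles) -> control input
    f, l   : model and utility built from torch operations
    """
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        x0_batch = sample_states()               # sample initial states
        loss = 0.0
        for x0 in x0_batch:
            x = x0
            for i in range(N):                   # N-step rollout, eq. (8)
                u = policy(x, x_ref, N - i)      # (N-i)-cycle output at prediction step i
                loss = loss + l(x, x_ref[i], u)  # accumulate the utility
                x = f(x, u)                      # propagate the differentiable model
        loss = loss / len(x0_batch)              # Monte-Carlo estimate of (9)
        optimizer.zero_grad()
        loss.backward()                          # gradient (10) via backpropagation
        optimizer.step()                         # parameter update, cf. (11)
    return policy
```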
Remark 2.
Traditional explicit MPC algorithms [7, 8, 9, 10, 11, 12] can only handle linear systems. The proposed RMPC algorithm instead uses an optimization objective obtained by decomposing the MPC cost function according to Bellman's principle of optimality, and the optimal recurrent policy is found by directly minimizing this objective, with no restriction on the form of the system. Meanwhile, the proposed algorithm exploits the recursiveness of Bellman's principle: once the cost function for the longest prediction horizon is optimized, the cost functions for all shorter horizons are automatically optimal. Thus, the algorithm can handle problems with different, shorter prediction horizons while being trained only with an objective defined over the longest horizon. Other MPC algorithms [13, 14, 15, 16, 7, 8, 9, 10, 11, 12] do not exploit this recursiveness: whenever the prediction horizon changes, the optimization problem must be reconstructed and the training or computing process re-executed.
III-C. Convergence and Optimality
There are many types of recurrent functions with the structure defined in (5), and recurrent neural networks (RNNs) are the most commonly used. In recent years, deep RNNs have been successfully applied in many fields, such as natural language processing and system control, owing to their ability to process sequential data [19, 20]. Next, we show that as the iteration index $k \to \infty$, the optimal policy that makes (4) hold can be obtained using Algorithm 1, as long as $\pi$ is an over-parameterized RNN. Over-parameterization means that the number of hidden neurons is sufficiently large. Before the main theorem, the following lemma and assumption need to be introduced.
Lemma 1.
(Universal Approximation Theorem [21, 22, 23]). Consider a finite sequence of continuous functions $\{g_i(z)\}_{i=1}^{N}$ defined on a compact set $\mathcal{Z} \subset \mathbb{R}^n$, where $n$ is the input dimension and each $g_i: \mathcal{Z} \to \mathbb{R}^m$ has output dimension $m$. Describe the RNN as in (5), where $i$ is the number of recurrent cycles, $W_h$, $W_u$, $b_h$, $b_u$ are the parameters, and $\sigma_h$, $\sigma_u$ are the activation functions. Supposing the RNN is over-parameterized, then for any $\epsilon > 0$ there exist parameters $\theta$ such that

$$\big\| \pi_i(z; \theta) - g_i(z) \big\| \le \epsilon, \quad \forall z \in \mathcal{Z},\ \forall i \in \{1, \dots, N\},$$

where $\epsilon$ is an arbitrarily small error.
Reported experimental results and theoretical proofs have shown that straightforward optimization methods, such as GD and stochastic GD (SGD), can find the global minima of most training objectives in polynomial time if the approximating function is an over-parameterized neural network or RNN [24, 25]. Based on this fact, we make the following assumption.
Assumption 1.
Given an over-parameterized RNN policy $\pi$, the global minimum $\theta^\dagger$ of the objective $J(\theta)$ in (9) can be found by repeatedly applying the update rule (11).
We now present our main result.
Theorem 1.
Suppose that $\pi$ is an over-parameterized RNN and that Assumption 1 holds. Then the parameters obtained by Algorithm 1 converge to the optimal parameters $\theta^*$ that make (4) hold, up to an arbitrarily small approximation error.
Proof. From Assumption 1, by repeatedly minimizing $J(\theta)$ using (11) we can always find

$$\theta^\dagger = \arg\min_{\theta} J(\theta).$$

According to the definition of $J(\theta)$ in (9), we have

$$\theta^\dagger = \arg\min_{\theta} \mathop{\mathbb{E}}_{x_0, x^r}\big[ V(x_0, N; \theta) \big].$$

By Lemma 1, there always exist parameters $\theta^\epsilon$ such that

$$\big\| \pi_i(x_0, x^r; \theta^\epsilon) - u_0^{i*} \big\| \le \epsilon, \quad \forall i \in \{1, \dots, N\},$$

for an arbitrarily small $\epsilon$. Since $\theta^\dagger$ is the global minimum of $J(\theta)$, it follows that

$$J(\theta^\dagger) \le J(\theta^\epsilon)$$

and hence $V(x_0, N; \theta^\dagger)$ is arbitrarily close to the optimal cost $V^*(x_0, N)$. Then, according to (6), (7) and Bellman's principle of optimality, $\theta^\dagger$ also makes (4) hold, i.e., $\theta^\dagger = \theta^*$.
Thus, we have proven that the RMPC algorithm converges to $\theta^*$. In other words, it can find the nearly optimal policy of MPC for different prediction horizons, whose output after the $i$th recurrent cycle corresponds to the nearly optimal solution of $i$-step MPC.
IV. Algorithm Verification
In order to evaluate the performance of the proposed RMPC algorithm, we choose the vehicle lateral control problem in a path-tracking task as an example [27].
IV-A. Overall Settings
The policy network is trained offline on a PC and then deployed to an industrial personal computer (IPC). The vehicle dynamics used for policy training differ from the controlled plant. For online application, the IPC controller sends the control signal to the plant according to the state information and the reference trajectory, and the plant feeds the state information back to the IPC controller, realizing the closed-loop control process. The feedback scheme of this hardware-in-the-loop (HIL) experiment is depicted in Fig. 5. The IPC controller is an ADLINK MXC-6401, equipped with an Intel i7-6820EQ CPU and 8 GB RAM, which is used as a vehicle on-board controller [28]. The plant is a real-time system simulated by the vehicle dynamics model of CarSim [29]. The longitudinal speed is assumed to be constant, and the expected trajectory is shown in Fig. 16. The system states and control inputs of this problem are listed in Table I, and the vehicle parameters are listed in Table II.

[Fig. 5: Feedback scheme of the HIL experiment.]
TABLE I: System states and control inputs.

| Mode | Name | Unit |
|---|---|---|
| state | Lateral velocity | m/s |
| | Yaw rate at center of gravity (CG) | rad/s |
| | Longitudinal velocity | m/s |
| | Yaw angle | rad |
| | Trajectory | m |
| input | Front wheel angle | rad |
TABLE II: Vehicle parameters.

| Name | Value | Unit |
|---|---|---|
| Front wheel cornering stiffness | -88000 | N/rad |
| Rear wheel cornering stiffness | -94000 | N/rad |
| Mass | 1500 | kg |
| Distance from CG to front axle | 1.14 | m |
| Distance from CG to rear axle | 1.40 | m |
| Polar moment of inertia at CG | 2420 | kg·m² |
| Tire-road friction coefficient | 1.0 | |
| Sampling frequency | 20 | Hz |
| System frequency | 20 | Hz |
IV-B. Problem Description
The offline policy is trained based on nonlinear and non-input-affine vehicle dynamics, in which the lateral tire forces of the front and rear tires, $F_{yf}$ and $F_{yr}$, are approximated by the Fiala tire model [30]. In the Fiala model, the lateral force of each tire is a nonlinear function of the tire slip angle, the tire load, and the tire-road friction coefficient, where the subscripts $f$ and $r$ denote the front and rear tires. The slip angles are calculated from the geometric relationship between the front/rear axle and the center of gravity (CG), and the loads on the front and rear tires are approximated by a static distribution of the vehicle weight.
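For reference, a minimal sketch of these relations in the standard single-track (bicycle) form is given below; the symbols $a$, $b$, $m$, $g$, $v_x$, $v_y$, $\omega$, and $\delta$ (front/rear axle distances, mass, gravitational acceleration, longitudinal and lateral velocities, yaw rate, and front wheel angle) are assumed notation consistent with Tables I and II, and the exact expressions used for training are not reproduced here.

```latex
% Standard single-track relations under the assumed notation:
% slip angles from the axle geometry and static axle loads.
\[
\begin{aligned}
  \alpha_f &= \arctan\!\left(\frac{v_y + a\,\omega}{v_x}\right) - \delta ,
  &\qquad
  \alpha_r &= \arctan\!\left(\frac{v_y - b\,\omega}{v_x}\right) , \\
  F_{zf} &= \frac{b}{a+b}\, m g ,
  &\qquad
  F_{zr} &= \frac{a}{a+b}\, m g .
\end{aligned}
\]
```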
The utility function $l$ of this problem is defined on the tracking error with respect to the reference trajectory and on the front wheel angle input. The policy optimization problem of this example is then formulated as in (9), with the vehicle dynamics above serving as the prediction model $f$ and with the states and input of Table I.
IV-C. Algorithm Details
The policy function is represented by a variant of the RNN, namely the gated recurrent unit (GRU). The input layer is composed of the system states, followed by 4 hidden layers using rectified linear units (ReLUs) as activation functions, and the output layer is a bounded activation layer scaled by the control limit to handle the bounded control input. We use the Adam method to update the network.

IV-D. Result Analysis
For nonlinear MPC problems, one can employ numerical optimization solvers such as ipopt [17] and bonmin [18], whose solutions can be regarded as approximately the numerical optimum.
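For context, a minimal CasADi sketch of such a baseline for the $N$-step problem (2)-(3) with the ipopt solver is shown below; the callables `f` and `l` are placeholders for CasADi-compatible dynamics and utility functions, and this is an illustrative setup rather than the exact baseline configuration used in the experiments.

```python
import casadi as ca

def solve_mpc_ipopt(f, l, x0, x_ref, N, n_x, n_u):
    """Solve the N-step problem (2)-(3) numerically with CasADi + ipopt.
    f and l must be built from CasADi-compatible operations."""
    opti = ca.Opti()
    X = opti.variable(n_x, N + 1)      # predicted state trajectory
    U = opti.variable(n_u, N)          # control sequence to optimize
    opti.subject_to(X[:, 0] == x0)     # initial condition
    cost = 0
    for i in range(N):
        cost = cost + l(X[:, i], x_ref[:, i], U[:, i])
        opti.subject_to(X[:, i + 1] == f(X[:, i], U[:, i]))  # model constraint
    opti.minimize(cost)
    opti.solver('ipopt')
    sol = opti.solve()
    return sol.value(U[:, 0])          # first control of the optimal sequence, as in (3)
```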
Fig. 6 compares the online calculation efficiency of RMPC and of the optimization solvers, implemented with the symbolic framework CasADi [31], under different numbers of prediction steps. The calculation time of the optimization solvers is clearly much longer than that of RMPC, and the gap increases with the number of prediction steps. Specifically, the fastest optimization solver, ipopt, is over 5 times slower than RMPC. This demonstrates the computational efficiency of the RMPC method.

[Fig. 6: Calculation time of RMPC and of the CasADi-based solvers under different prediction steps.]
We run Algorithm 1 ten times and, at each iteration, calculate the policy error between the RMPC policy and the solution of the ipopt solver for each number of prediction steps $i$. The error is normalized by the range (maximum minus minimum) of the optimal control over the evaluated states, so that it indicates the relative error of the control produced by the $i$-cycle network with respect to the optimum of the $i$-step prediction control problem.

[Fig. 7: Policy error during training. Solid lines are average values over 10 runs. Shaded regions correspond to the 95% confidence interval.]
In Fig. 7, we plot the policy error curves during training for different prediction steps $i$. All the policy errors decrease rapidly to a small value during the training process; in particular, after a sufficient number of iterations, the policy errors for all $i$ reduce to less than 2%. This indicates that Algorithm 1 is able to find the near-optimal policies of MPC problems with different prediction horizons.
Fig. 8 shows the control performance of the ipopt solver solution and of the learned policy with different prediction horizons. The performance is measured by the accumulated cost over 200 steps (10 s) of simulation starting from a randomly initialized state, i.e.,

$$\text{Cost} = \sum_{k=1}^{200} l\big(x_k, x_k^r, u_k\big). \qquad (12)$$
For all prediction horizons, the learned policy performs as well as the solution of the ipopt solver, and more recurrent cycles (i.e., longer prediction horizons) help reduce the accumulated cost.

[Fig. 8: Performance comparison between the learned policy and the ipopt solution for different prediction horizons.]

[Figs. 9 and 16: Control results of the learned policy with different recurrent cycles, compared with the ipopt controller.]
In detail, Fig. 16 presents the control results of the learned policy with different numbers of recurrent cycles, and Fig. 9 compares the control output of the learned policy (after 15 recurrent cycles) with that of the ipopt controller. The trajectory controlled by the RMPC controller almost overlaps with that of the ipopt controller, and the more recurrent cycles the learned policy executes, the smaller the trajectory tracking error. This is why we want to adaptively select the control law with the longest prediction horizon in real applications.
To summarize, the example demonstrates the optimality, efficiency and generality of the RMPC algorithm.
V. Conclusion
This paper proposes the Recurrent Model Predictive Control (RMPC) algorithm for solving general nonlinear finite-horizon optimal control problems. Unlike traditional MPC algorithms, it can make full use of the current computing resources and adaptively select the longest possible prediction horizon. The algorithm employs an RNN to approximate the optimal policy, mapping the system states and reference values directly to the control inputs; the output of the learned policy network after $i$ recurrent cycles corresponds to the nearly optimal solution of $i$-step MPC. A policy optimization objective is designed by decomposing the MPC cost function according to Bellman's principle of optimality, and the optimal recurrent policy is obtained by directly minimizing this objective, which makes the algorithm applicable to general nonlinear and non-input-affine systems. The convergence and optimality of RMPC are further proved. We demonstrate its optimality, generality and efficiency in an HIL experiment. Results show that RMPC is over 5 times faster than the traditional MPC algorithm, and that the control performance of the learned policy can be further improved as the number of recurrent cycles increases.
References
- [1] S. J. Qin and T. A. Badgwell, “A survey of industrial model predictive control technology,” Control engineering practice, vol. 11, no. 7, pp. 733–764, 2003.
- [2] S. Vazquez, J. Leon, L. Franquelo, J. Rodriguez, H. A. Young, A. Marquez, and P. Zanchetta, “Model predictive control: A review of its applications in power electronics,” IEEE Industrial Electronics Magazine, vol. 8, no. 1, pp. 16–31, 2014.
- [3] S. E. Li, Z. Jia, K. Li, and B. Cheng, “Fast online computation of a model predictive controller and its application to fuel economy–oriented adaptive cruise control,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 3, pp. 1199–1209, 2014.
- [4] J. H. Lee, “Model predictive control: Review of the three decades of development,” International Journal of Control, Automation and Systems, vol. 9, no. 3, p. 415, 2011.
- [5] R. Cagienard, P. Grieder, E. C. Kerrigan, and M. Morari, “Move blocking strategies in receding horizon control,” Journal of Process Control, vol. 17, no. 6, pp. 563–570, 2007.
- [6] Y. Wang and S. Boyd, “Fast model predictive control using online optimization,” IEEE Transactions on control systems technology, vol. 18, no. 2, pp. 267–278, 2009.
- [7] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, “The explicit linear quadratic regulator for constrained systems,” Automatica, vol. 38, no. 1, pp. 3–20, 2002.
- [8] B. Kouvaritakis, M. Cannon, and J. A. Rossiter, “Who needs QP for linear MPC anyway?” Automatica, vol. 38, no. 5, pp. 879–884, 2002.
- [9] T. Geyer, F. D. Torrisi, and M. Morari, “Optimal complexity reduction of polyhedral piecewise affine systems,” Automatica, vol. 44, no. 7, pp. 1728–1740, 2008.
- [10] C. N. Jones and M. Morari, “Polytopic approximation of explicit model predictive controllers,” IEEE Transactions on Automatic Control, vol. 55, no. 11, pp. 2542–2553, 2010.
- [11] C. Wen, X. Ma, and B. E. Ydstie, “Analytical expression of explicit mpc solution via lattice piecewise-affine function,” Automatica, vol. 45, no. 4, pp. 910–917, 2009.
- [12] F. Borrelli, M. Baotić, J. Pekar, and G. Stewart, “On the computation of linear model predictive control laws,” Automatica, vol. 46, no. 6, pp. 1035–1041, 2010.
- [13] B. M. Åkesson, H. T. Toivonen, J. B. Waller, and R. H. Nyström, “Neural network approximation of a nonlinear model predictive controller applied to a pH neutralization process,” Computers & Chemical Engineering, vol. 29, no. 2, pp. 323–335, 2005.
- [14] B. M. Åkesson and H. T. Toivonen, “A neural network model predictive controller,” Journal of Process Control, vol. 16, no. 9, pp. 937–946, 2006.
- [15] L. Cheng, W. Liu, Z.-G. Hou, J. Yu, and M. Tan, “Neural-network-based nonlinear model predictive control for piezoelectric actuators,” IEEE Transactions on Industrial Electronics, vol. 62, no. 12, pp. 7717–7727, 2015.
- [16] J. Duan, Z. Liu, S. E. Li, Q. Sun, Z. Jia, and B. Cheng, “Deep adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints,” arXiv preprint arXiv:1911.11397, 2019.
- [17] A. Wächter and L. T. Biegler, “On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming,” Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.
- [18] P. Bonami, L. T. Biegler, A. R. Conn, G. Cornuéjols, I. E. Grossmann, C. D. Laird, J. Lee, A. Lodi, F. Margot, N. Sawaya et al., “An algorithmic framework for convex mixed integer nonlinear programs,” Discrete Optimization, vol. 5, no. 2, pp. 186–204, 2008.
- [19] T. Mikolov, M. Karafiát, L. Burget, J. Černockỳ, and S. Khudanpur, “Recurrent neural network based language model,” in Eleventh annual conference of the international speech communication association, 2010.
- [20] S. Li, H. Wang, and M. U. Rafique, “A novel recurrent neural network for manipulator control with improved noise tolerance,” IEEE transactions on neural networks and learning systems, vol. 29, no. 5, pp. 1908–1918, 2017.
- [21] L. K. Li, “Approximation theory and recurrent networks,” in Proc. of IJCNN, vol. 2, pp. 266–271. IEEE, 1992.
- [22] A. M. Schäfer and H.-G. Zimmermann, “Recurrent neural networks are universal approximators,” International journal of neural systems, vol. 17, no. 04, pp. 253–263, 2007.
- [23] B. Hammer, “On the approximation capability of recurrent neural networks,” Neurocomputing, vol. 31, no. 1-4, pp. 107–123, 2000.
- [24] Z. Allen-Zhu, Y. Li, and Z. Song, “A convergence theory for deep learning via over-parameterization,” in International Conference on Machine Learning, pp. 242–252. Long Beach, California, USA: ICML, 2019.
- [25] S. Du, J. Lee, H. Li, L. Wang, and X. Zhai, “Gradient descent finds global minima of deep neural networks,” in International Conference on Machine Learning, pp. 1675–1685. Long Beach, California, USA: ICML, 2019.
- [26] Z. Allen-Zhu, Y. Li, and Z. Song, “On the convergence rate of training recurrent neural networks,” in Advances in Neural Information Processing Systems, pp. 6673–6685, 2019.
- [27] R. Li, Y. Li, S. E. Li, E. Burdet, and B. Cheng, “Driver-automation indirect shared control of highly automated vehicles with intention-aware authority transition,” in 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 26–32. Redondo Beach, California, USA: IEEE, 2017.
- [28] C. Chen, J. Pan, X. Chang, Q. Xu, and J. Wang, “System design and function verification of an extensible autonomous driving platform,” in 2019 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5. Sapporo, Japan: IEEE, 2019.
- [29] R. Benekohal and J. Treiterer, “Carsim: Car-following model for simulation of traffic in normal and stop-and-go conditions,” Transportation Research Record, vol. 1194, pp. 99–111, 1988.
- [30] J. Kong, M. Pfeiffer, G. Schildbach, and F. Borrelli, “Kinematic and dynamic vehicle models for autonomous driving control design,” in 2015 IEEE Intelligent Vehicles Symposium (IV), pp. 1094–1099. Seoul, South Korea: IEEE, 2015.
- [31] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl, “CasADi: a software framework for nonlinear optimization and optimal control,” Mathematical Programming Computation, vol. 11, pp. 1–36, 2018.