1 Introduction
Model predictive control (MPC) is a dynamic optimization technique widely used in industrial process applications such as oil refineries and chemical plants [qin2003survey]. Recently, MPC has found mainstream use in robotics for the control of ground [richter2018bayesian], aerial [Watterson2015SafeRH, bouffard2012learning], and humanoid [erez2013integrated] robots due to its versatility, robustness, and safety guarantees. The transition from the process industry to robotics brings additional challenges since the computation time is reduced from hours to milliseconds.
MPC techniques can be categorized into implicit and explicit MPC. Implicit MPC focuses on efficient online computation of an openloop control sequence, optimizing the system performance at its current state. Wang and Boyd [wang2010fast] exploit the MPC problem structure to design an interior point method that decreases the time complexity of solving the associated quadratic program (QP). Vichik and Borelli [vichik2014solving] experiment with an entirely different computation scheme and demonstrate that analog circuits can solve QPs in microseconds, but require custom hardware.
Rather than solving an optimization problem online, explicit methods manage the computational load by precomputing the optimal control law offline as a function of all feasible states , where the optimal control is known to be piecewise affine on polytopes. The drawback of explicit MPC approaches is that the computational complexity, measured by the number of polytopic regions, grows quickly with the number of constraints. As a result, computing the optimal explicit control law can become computationally intractable for large systems. In addition, even if this optimal control law can be computed, the process of determining which region contains the system state can require too much processing power or memory storage for realtime execution [kvasnica2012clipping].
One approach to address the computational limitations of optimal explicit MPC control laws is to compute an approximate suboptimal controller. Jones and Morari [jones2010polytopic] use a doubledescription method to build piecewise affine (PWA) approximations of the value function, and use barycentric functions on the polytopic regions of the approximate value function to compute an approximate control law. In addition, they are able to prove recursive feasibility and asymptotic stability of the suboptimal controller. These approaches, however, have not been demonstrated to scale to the same problem sizes as those handled by implicit MPC methods.
A promising approach that this paper investigates is the use of a neural network to approximate the MPC control law. There have been several recent works using neural network architectures for MPC design. Chen et al. [Chen2018ApproximatingEM] use a neural network with an orthogonal projection operation to approximate the optimal control law. Hertneck et al. [Hertneck2018] use a neural network in a robust MPC framework to provide statistical guarantees of feasibility and stability. Zhang et al. [zhang2019safe] use a neural network to approximate the primal and dual variables, and provide statistical guarantees as well as certificates of suboptimality. Our work extends these approaches by providing nonstatistical guarantees on recursive feasibility and asymptotic stability through the integration with an online solver. Moreover, we demonstrate for the first time that neuralnetwork based approximations of MPC control laws can scale to very large systems.
A few closely related works combine the strengths of machine learning for scalability with the analytical tractability of QP problems in an explicitimplicit MPC approach. Zeilinger et al.
[zeilinger2011real] compute a piecewise affine approximation of the optimal control law, combine it with an active set method, and provide criteria to terminate the active set method early while still obtaining guarantees on recursive feasibility and asymptotic stability. Klaučo et al. [klauvco2019machine] use classification trees and nearest neighbors to warm start an active set method and solve to the optimal solution. These approaches differ from our method because they do not utilize a neural network as the function approximator, and do not demonstrate the ability to scale to large systems.The challenge of generating a large training data set containing the initial feasible states and their corresponding optimal inputs, primal variables, or dual variables for highdimensional systems has not been widely studied in the MPC literature. Yet, generating such data is critical for scaling any learningbased explicit MPC approach to high dimensions because a naive approach would waste large amounts of computation to obtain feasible initial states. Current work either does not address this problem, uses gridding [klauvco2019machine, summers2011multiresolution] or random sampling [zhang2019safe, Hertneck2018]
methods that cannot scale to high dimensions, or relies on reinforcement and imitation learning techniques that perform closedloop simulations leading to weaknesses discussed in
[ross2011reduction]. Our work proposes an algorithm to efficiently generate large datasets in high dimensions based on theory from geometric random walks and quasi Monte Carlo (QMC) sequences.In summary, the goal of this work is to develop a neuralneworkbased suboptimal MPC control law that can scale to large system sizes, reduce online computation time relative to implicit methods, and provide guarantees on recursive feasibility and asymptotic stability. Specifically, our contributions include:

a primaldual loss function that incorporates state and input constraints to train a neural network approximation of explicit MPC control law;

a control law that corrects the network output using a primal active set solver and terminates early once necessary criteria for recursive feasibility and asymptotic stability have been reached;

a data set generation algorithm utilizing ideas from geometric random walks that efficiently generates feasible samples for large systesms; and

a demonstration that the proposed approach scales to large problems with states, inputs, and time horizon of .
2 Problem Statement
Consider a discretetime linear timeinvariant system,
(1) 
subject to a set of constraints,
(2)  
Assume that the pair is stabilizable. For a given state , our goal is to compute a sequence of control inputs to regulate the system to a desired state in the interior of the feasible set (2). Without loss of generality, this desired state will be assumed to be the origin. This problem, known as the constrained infinitehorizon LQR problem, is:
(3)  
where and are chosen to define the desired optimal behavior for the system. When optimal behavior is needed at multiple initial states, instead of recomputing the input sequence every time, it is desirable to obtain a control policy that specifies the optimal input for an arbitrary state .
3 Preliminaries
3.1 Model Predictive Control Formulation
Receding horizon control (RHC) obtains a suboptimal controller for Problem (3) by repeatedly solving the following finitehorizon problem:
(4)  
where is the horizon, is a prediction of the state at and is an optimization variable, while is the actual state at the current time and is a parameter. Note that a terminal cost, , that bounds the cost for the remaining time and a terminal constraint set, , have been added to the problem to ensure feasibility and asymptotic stability of the receding horizon controller as will be discussed in Sec. 3.3. In the following, we drop the time to simplify notation. The batch formulation of (4) is:
(5)  
s.t.  
with the following definitions
(6)  
where is the matrix of size with ones on the first subdiagonal and zeros elsewhere, is the
th standard basis vector,
is the vector of all ones of size , , , denote the number of constraints specified by the rows of , , , respectively, ”” denotes the vertical concatenation operation, and denotes the Kronecker product. When the parameter is fixed, Problem (5) is a quadratic program (QP) and the solution is a vector .A common choice for the terminal cost matrix is the solution to the Algebraic Riccati Equation:
(7) 
which corresponds to the optimal costtogo of an unconstrained infinitehorizon LQR problem. With this choice of terminal cost and a large enough , solving the finitehorizon problem (4) will actually yield the optimal solution to the infinitehorizon problem (3) [bemporad2002explicit]. A common choice for is the region is the maximal positively invariant set of the LQR controller , which is computable via reachability analysis [borrelli2017predictive, Ch.11].
System 1 (Double Integrator).
Throughout the paper, we will use the 2D double integrator system,
to illustrate various concepts. We are interested in stabilizing the system by solving (3) with cost terms , , subject to position and velocity constraints, , , and input constraints , for . We consider the RHC formulation in (4) with a planning horizon . We choose the terminal region to be , and the terminal cost . The constraint sizes are , , and . The primal variables have dimension , the dual variables corresponding to the inequality constraints have dimension , and the dual variables corresponding to the equality constraints have dimension .
3.2 Feasibility and Duality
For a given state parameter , QP (5) is a strictly convex QP because the matrix is positive definite. A decision variable is called primal feasible for a parameter if and .
Let be the set of parameters for which QP (5) is feasible [borrelli2017predictive, Defn. 6.3]. The set is a polyhedron but is difficult to compute because it requires expensive polyhedral projections or recursive backward reachable set computations [borrelli2017predictive, Ch. 6 and 11]. Given and an associated primal feasible , the suboptimality level of is:
(8) 
where is the unique minimizer (due to strict convexity) of QP (5). A common technique to handle the constraints in QP (5) is to use duality theory and introduce dual variables [boyd2004convex, Ch.5]. Define the Lagrangian associated with QP (5) as:
(9) 
The Lagrangian dual of a minimization QP is a maximization QP over the variables with the following objective function:
(10) 
where the minimization over can be done in closedform for a convex QP. The dual variables are called dual feasible if . For a given parameter , any primal feasible , any dual feasible , and optimal primaldual variables , strong duality holds because Slater’s conditions are satisfied [boyd2004convex, Ch.5]:
(11)  
Finally, define the feasible duality gap associated with , , , as
(12) 
which is an upper bound on the suboptimality level for any feasible due to Eqn. (11).
3.3 Receding Horizon Control
Fixing the state parameter yields QP (5) that minimizes over a vector of primal variables . We can instead view the analog, the multiparametric quadratic program (mpQP) that minimizes over functions that map to [borrelli2017predictive, Ch. 6].
[Planner] A planner is a function that maps a parameter to a decision variable . A planner is primal feasible if is a primal feasible variable for QP (5) . It is optimal if is the optimal solution .
The following definitions show how a planner can be implemented as a receding horizon controller (RHC) .
[Closed Loop System and RHC] Let be a planner. Let denote the first control input in , that is . Define the closed loop system
is known as the RHC corresponding to the planner .
There are two desirable properties that we would like to guarantee for our control law, which we define below.
[Recursive Feasiblity] The RHC is recursively feasible if .
Primal feasibility is a property of the planner and open loop optimization problem, while recursive feasibility is a property of the RHC and corresponding closed loop system. In general, primal feasibility of the open loop optimization problem does not imply recursive feasibility of the corresponding closed loop controller. We next define the notion of stability around an equilibrium point.
[Equilibrium Point] A point is called an equilibrium point if .
[Asymptotic Stability]
The equilibrium point is stable if such that .
In addition, is asymptotically stable if it is stable and can be chosen such that
Recursive feasibility of a control law is a necessary, but not sufficient, condition for asymptotic stability.
4 Technical Approach and Overview
Our primary motivation is learning a controller with guarantees for highdimensional problems with large state and input dimensions and long time horizons. Large problems are challenging since traditional explicit MPC approaches become too computationally expensive, and even implicit MPC may require custom architectures on FPGAs for realtime inference [jerez2014embedded]. We propose a hybrid explicitimplicit procedure that merges an offline trained neural network with an online primal active set solver to compute a planner . The neural network approximation of the planner enables our procedure to scale to these large problems. The primal active set solver then provides a few corrective steps to meet the conditions necessary to guarantee recursive feasibility and asymptotic stability.
In Sec. 5, we describe the offline neural network architecture training procedure. The neural network approximates a planner by mapping the state to a primal prediction
. We highlight the connections between neural networks with rectified linear unit (ReLu) activation functions and the optimal planner and RHC for MPC problems. We then introduce a primaldual loss function based on the Lagrangian function to train the neural network.
In Sec. 6, we describe the conditions necessary to obtain guarantees of recursive feasibility and asymptotic stability of the RHC corresponding to an optimal and suboptimal planner . These conditions center on both checking for primal feasibility and bounding the suboptimality of the planner. We then introduce a procedure to obtain certificates of primal feasibility and suboptimality for a given state parameter and the primal variable , without the need of additional inputs such as the dual variables .
In Sec. 7, we introduce primal active set methods. We describe how to merge these methods with the neural network in a manner that reduces the extra necessary online computation. The combination computes a new suboptimal planner that yields a RHC that is recursively feasible and asymptotically stable.
In Sec. 8, we highlight the practical issues that arise when scaling the proposed procedure to high dimensional problems. The main challenge is drawing samples from the domain of our planner to train and evaluate the neural network, as is difficult to compute for large problems. Instead we have a membership oracle in the form of a QP solver that given a state parameter , will determine whether the corresponding QP (5) has a solution. We demonstrate how standard sampling methods such as gridding or Monte Carlo rejection sampling will not scale to high dimensions, and introduce the theory of geometric random walks to surpass this problem. We then develop a practical method motivated by this theory to generate large datasets for large scale problems.
In Sec. 9, we present results and analysis on four systems that demonstrate the ability of our proposed approach to scale to large systems. We also provide illustrative metrics that highlight the encountered challenges that shaped the development of this method. The largest problem we analyze has states, inputs, and time horizon of , and our results indicate that our method can be applied to even larger systems. We did not analyze a larger system since the implementation we used to generate Sobol sequences (described in Sec. 8) was limited to dimensions.
5 Offline Neural Network Training
The first step of our approach is offline training of a deep neural network that provides a candidate solution for the QP optimization in (5). We choose to approximate the entire primal prediction over the planning horizon instead of the control law . Although this choice requires approximating variables compared to only , these additional predicted variables are used to obtain the desired guarantees through the online primal active set solver.
(Deep Neural Network) A deep neural network (DNN) with layers is a composition of affine functions
(13) 
each except the last one followed by a nonlinear activation function , so that:
where are the affine function parameters to be optimized, and is a fixed (not optimized) function, typically chosen as a sigmoid, hypertangent, or ReLu function (see [Goodfellowetal2016, Ch.6] for details). The depth of the DNN is the number of layers , and each layer has a width defined by the number of rows of and .
The ReLu activation function is , where the operation is applied elementwise. We restrict our search for a planner within the class of functions computed by a ReLu DNN . The goal is to design the architecture of , and then search for parameters to approximate the optimal planner .
5.1 Piecewise Affine Neural Networks
We show that there exists a ReLu DNN architecture and corresponding network parameters that can represent the optimal planner for the mpQP in (5) exactly. This justifies our choice to restrict the planner representation to a ReLu DNN. The solution to an mpQP is continuous and piecewiseaffine on polyhedra [borrelli2017predictive, Thm. 6.7]. In addition, this solution is unique due to the positive definiteness of . As a result, is a unique, continuous, and piecewiseaffine function on polyhedra. ReLu DNNs also represent continuous and piecewiseaffine functions on polyhedra [arora2018understanding]. These observations allow us to establish the following connection between a ReLu DNN and the solution to an mpQP.
(Exact Representation) Suppose for is a continuous and piecewiseaffine function on polyhedra. Then, can be represented exactly by a ReLu DNN with depth at most
(14) 
Follows directly from [arora2018understanding, Thm. 2.1] which states the above result for . Generalizing to is trivial as we can take such neural networks to represent the individual output dimensions, each bounded with the depth .
Thm. 5.1 provides a guideline for choosing an appropriate network depth to approximate the planner for a system with states. For example, a neural network architecture approximating for the double integrator system (Sys. 1) should have depth greater than .
Specifying an appropriate network architecture also requires a choice on the widths of each layer. It is possible to use ideas from [Chen2018ApproximatingEM] that connect additional ReLu DNN theory from [montufar2014number], [arora2018understanding], [hanin2019deep] with mpQP problems. These ideas provide valuable intuition on bounding the maximum number of regions representable by various architectures, but do not offer constructive procedures. As a result, we choose a network depth according to Thm. 5.1, and then use grid search to choose appropriate widths.
5.2 Neural Network Training
After designing the neural network architecture, the next step is to optimize the neural network parameters to approximate . We rely on a training data set
(supervised learning) to minimize a loss function over
. Since the neural network outputs the primal variables , a natural choice would be to use a dataset to optimize the primal leastsquares loss function:(15) 
While this approach is reasonable, a more accurate estimate of the optimal planner
may be obtained by minimizing the suboptimality level defined in Eqn. (8). Using the same dataset , the ideal loss function would be:(16)  
s.t. 
which includes a primal feasibility constraint on the neural network output. Including the primal feasibility constraint is necessary as there are potentially multiple infeasible primal variables , where and . The primal variables that minimize the suboptimality level and obey Eqn. (8) are unique only when we include the constraint of primal feasibility. However, minimizing Eqn. (16) is a constrained optimization problem that is difficult to minimize directly.
These challenges motivate a primaldual loss function based on the Lagrangian (9) associated with QP (5). Assuming access to optimal dual variables, the data set format is now augmented to be , and the loss function is:
(17) 
The dual variables in both the predicted and optimal Lagrangian are the optimal dual variables . By strong duality in Eqn. (11), the unique minimizers are the optimal primal variables . As a result, minimizing Eqn. (17) minimizes the original Eqn. (16). Although this loss function requires storing more information in the data set , there is no additional computation necessary as both the optimal primal and dual variables are calculated by a QP solver when generating the dataset. The least squares Lagrangian loss in Eqn. (17) thus has the strengths of both the primal least squares loss in Eqn. (15) and the ideal suboptimal loss in Eqn. (16) as it is an unconstrained minimization problem that is still based on the QP cost matrices.
With the loss function designed in Eqn. (17), we use a standard stochastic gradientbased optimization techniques such as Adam [kingma2014adam] or RMSProp [tieleman2012lecture]. These approaches train the network by sampling a subset of the training dataset (minibatch), computing the loss in Eqn. (17
) over the minibatch instead of the entire dataset, and then use backpropagation to compute the loss gradient to update the neural network parameters
. This sampling is continued until the dataset is exhausted, which concludes an epoch, and is then repeated until convergence.
6 Conditions and Certificates
The trained neural network represents an approximate planner that maps states to primal predictions . As a result, it can be implemented as a RHC by efficiently recomputing using and applying the corresponding first control input at each encountered state. We would like to guarantee recursive feasibility and asymptotic stability of this RHC. There are a few criteria that the planner has to satisfy in order to guarantee these properties.
6.1 Optimal RHC
We first analyze the conditions necessary to guarantee feasibility and stability of the optimal RHC. Recall that with a large enough and proper choice of terminal cost, solving the finitehorizon problem (4) yields the solution to the infinitehorizon LQR problem (3) [bemporad2002explicit]. The optimal RHC then corresponds to the optimal control law for problem (3), which is guaranteed to be recursively feasible and asymptotically stable. However, it is not always possible to choose a large , and the optimal RHC will then correspond to a suboptimal control law for problem (3), which is not guaranteed to be recursively feasible and asymptotically stable. The following conditions specify how to construct the finitehorizon problem so that the optimal RHC still obtains these guarantees.
[Control Invariant Set] A set is a control invariant set for system (1) subject to constraints (2) if
Examples of a control invariant set are , and the set containing only the equilibrium point.
[Control Lyapunov Function] Suppose the stage cost is a continuous, positivedefinite function. A function that is continuous, positive definite, and satisfies, for every ,
is called a control Lyapunov function for system (1) over the set .
The following theorem states that choosing the terminal constraint set to be a control invariant set and terminal cost to be a control Lyapunov function over will guarantee recursive feasibility and asymptotic stability of the RHC corresponding to the optimal planner .
Suppose , and contain the origin in their interior and are closed, and in addition that is a control invariant set. Suppose the stage cost is continuous and positive definite, and choose the terminal cost to be a control Lyapunov function over . Then, the optimal RHC corresponding to the optimal planner with domain is recursively feasible and asymptotically stable with domain of attraction for system (1) subject to constraints (2). See [borrelli2017predictive, Thm 12.1] for recursive feasibility and [borrelli2017predictive, Thm 12.2] for asymptotic stability.
The choice of cost satisfies the conditions of a control Lyapunov function where the stage cost has the form .
6.2 Approximate RHC
Recall that the trained neural network represents an approximate planner , not the optimal planner . In addition to the conditions stated in Thm. 6.1, an approximate planner must satisfy two more conditions to guarantee recursive feasibility and asymptotic stability of the corresponding suboptimal RHC.
The condition to guarantee recursive feasibility of the RHC is primal feasibility of the estimated for all feasible initial state parameters . The condition to guarantee asymptotic stability is bounding the suboptimality of the estimated for all feasible initial state parameters . Note that these conditions depend on the specific planner, and not just the construction of the finitehorizon problem (4) from the original infinitehorizon LQR problem (3). These conditions are formally stated in the following theorem.
Suppose , and contain the origin in their interior and are closed, and in addition that is a control invariant set. Suppose the stage cost is a continuous and positive definite function, and choose the terminal cost to be a control Lyapunov function over . Suppose that an approximate planner with domain is primal feasible, and there exists a function where ,
(18)  
then the corresponding approximate RHC is recursively feasible and asymptotically stable with domain of attraction for system (1) subject to constraints (2). A similar argument as for the optimal case proves recursive feasibility. See [borrelli2017predictive, Thm 13.1] for asymptotic stability.
6.3 Strategy To Obtain Guarantees
These theorems motivate the following strategy to guarantee recursive feasibility and asymptotic stability of our approximate RHC. We first choose to be a control invariant set, and the terminal cost to be a control Lyapunov function over . We then need to guarantee that our approximate planner is primal feasible, and the corresponding RHC will be recursively feasible.
We next set the function in Thm. 6.2 to be the feasible duality gap in Eqn. (12). We compute by using primal feasible variables to estimate corresponding dual feasible variables . By Eqn. (12), will satisfy the first row in Eqn. (18). To satisfy the second row, we need to bound by the first zero input stage cost .
We can thus guarantee recursive feasibility and asymptotic stability of by obtaining a certificate of primal feasibility and a certificate of suboptimality for . Sec. 6.4 details our method to determine whether given , the predicted from satisfies both certificates, and Sec. 7 discusses our strategy in using an online primal active set method to ensure that these certificates will be obtained .
6.4 Certificate Evaluation
Evaluating primal feasibility is trivial, since given an initial state parameter , we can check primal feasibility by directly evaluating constraint violation for the predicted . If we obtain a certificate of primal feasibility, the next step is find dual feasible variables that satisfy , which obtains the certificate of suboptimality.
One option is to additionally approximate the dual function [zhang2019safe] with another neural network, but it has a few drawbacks. First, this approach will substantially increase the size of the neural networks, and a smaller network is preferable to reduce inference speed. Second, the optimal dual function is typically not unique. The linear independence constraint qualification (LICQ) does not hold when the number of rows in constraint matrices and are more than the columns, which is typically the case in most problems. As a result, the Hessian of the concave objective in the dual problem is negative semidefinite [borrelli2017predictive, Ch. 3.3] and implies that the dual solution may not be unique. This nonuniquess requires extra care when generating the dual labels and designing the loss function to train a neural network. Finally, separately predicting both the primal and dual variables is redundant, as one can easily be derived from the other.
We use the KKT optimality conditions to derive a set of feasible dual variables from primal variables. For the optimal primal variables , this procedure will yield optimal dual variables . Using the predicted primal variables , the first step is forming a working set consisting of the active equality and inequality constraints. Alg. 1 demonstrates how to estimate this working set given a set of primal variables . Notice that for any primal feasible , , so Alg. 1 assumes that will contain all of the equality constraints.
Due to the complementary slackness KKT optimality condition, the dual variables corresponding to the inactive constraints should be set to . We can thus focus on the dual variables for the remaining active constraints in , which sets up the following KKT system:
(19) 
The matrices , , and are constructed using the rows of the active constraints, and are the dual variables corresponding to both equality and inequality active constraints. The matrix on the left is called the KKT matrix.
If is positive definite, and if the matrix has full row rank, the KKT matrix is invertible. Solving this KKT system yields primal and dual variables that satisfy the stationarity KKT condition, expressed in the first row, and primal feasibility KKT condition, expressed in the second row.
Given primal feasible , we can construct , and thus , by using Alg. 1 and ensuring that has full row rank. We thus obtain the following equation relating to :
(20) 
Since is positive definite and is full row rank, is positive definite, and thus (20) can be solved for using Cholesky factorization. It is simple to obtain the dual variables from based on whether the original dual variable or is active and set to the corresponding , or inactive and set to .
Unless corresponds to the active constraints at the optimal solution, the resulting will be dual infeasible and needs to be corrected. The pair of primal feasible and dual infeasible satisfy the complementary slackness, stationarity, and primal feasibility KKT conditions. If the dual feasibility KKT condition were also satisfied, then all the KKT optimality conditions are satisfied and thus must be optimal.
When is dual infeasible, we can obtain feasible dual variables since the feasible region for the inequality dual variables is the nonnegative orthant, and the Euclidean projection is particularly simple because it is the elementwise . The equality variables are unconstrained. After performing the projection, Eqn. (19) will no longer be satisfied, as we have swapped satisfying the stationarity condition with the dual feasibility KKT condition.
Given a primal feasible , this procedure will always yield dual feasible . We can then check if the feasible duality gap , which implies that , and thus satisfies the second row in the conditions for Eqn. (18). Successfully finding these dual feasible variables obtains a certificate of suboptimality.
The above procedure only checks whether the certificates of primal feasibility and suboptimality are satisfied, but does not guarantee satisfaction. The predicted primal variables may not necessarily be primal feasible and will not yield the certificate of primal feasibility. In addition, the procedure may not yield the certificate of suboptimality because the predicted primal variables or derived dual feasible variables are too conservative, and the resulting . The following Sec. 7 details how we guarantee satisfying both certificates by using an online primal active set method, and thus obtain the guarantees on recursive feasibility and asymptotic stability of the approximate RHC .
7 Obtaining Guarantees with Online Primal Active Set Solver
For a planner represented solely by an offline trained neural network , it is a challenging to guarantee that both certificates of primal feasibility and suboptimality are satisfied for all states . Most guarantees that can be provided on neural networks will be statistical [hertneck2018learning], as there will be a chance that the neural network outputs an extremely poor prediction.
Our approach is to instead use an online method to ensure satisfaction of these conditions for all states . We utilize a primal active set solver [wright1999numerical, Ch. 16] because it is distinguished among QP solver techniques in that it can be accelerated through warm starts from good initializations from the neural network, and it can also be terminated early once the certificates of primal feasibility and suboptimality are achieved. Methods such as interiorpoint methods are difficult to warm start [john2008implementation], and other nonprimal active set methods do not guarantee primal feasibility of intermediate iterates and cannot be terminated early [Ferreau2008]. Intuitively, the neural network removes the early iterations of the solver, and checking the criteria of Thm. 6.2 removes the final iterations.
7.1 Warm Starting and Early Termination
Active set methods solve inequalityconstrained QPs by solving a sequence of equalityconstrained quadratic subproblems [wright1999numerical]. These solvers are closely related to the concept of working sets introduced in Sec. 6, as they explicitly maintain and update , where each intermediate working set represents an equalityconstrained quadratic subproblem. The goal of these procedures is to determine the optimal working set , after which solving the corresponding KKT system (19) will yield the optimal primal and dual variables.
The general approach is to start with some initial guess of the active set , and proceed by adding and removing constraints from this set until is found. During the course of solving the QP, these active set algorithms maintain intermediate terms such as primal variables, dual variables, basis variables, and factorizations of the basis and Hessian, with the specific terms tracked being dependent on the particular solver chosen. Combined these intermediate terms constitute the solver state.
Active set methods are wellsuited to acceleration due to good initalization [wright1999numerical]. This strategy is called a warm start if part of the solver state is initialized, or a hot start if all of the solver state is initialized. Using Alg. 1 in Sec. 6 on the neural network primal prediction can yield a working set to initialize .
We reduce online computation by terminating the online solver once the criteria in Thm. 6.2 are satisfied. Primal active set solvers explicitly find and maintain primal feasibility of intermediate iterates, which is necessary to satisfy the certificate of primal feasibility prior to reaching optimality. These solvers are split into two phases. In Phase I, the solver first finds an initial primal feasible point by solving a linear feasibility program. Starting from this primal feasible point, Phase II updates the primal solution and the working active set while maintaining primal feasibility. As a result, once Phase I completes, we will obtain a certificate of primal feasibility. During Phase II, we check the duality gap at the intermediate iterates using the techniques described in Sec. 6 in order to obtain the certificate of suboptimality.
7.2 Suboptimal Planner with Guarantees
Using these ideas, we reach our final hybrid explicitimplicit algorithm that computes another approximate planner by following the procedure in Alg. 2. This procedure is guaranteed to terminate in finite time as the active set method will reach the optimal solution in finite time [wright1999numerical, Ch. 16]. In addition, for all , will also obtain certificates of primal feasibility and suboptimality. Finally, given that the terminal constraint and terminal cost were chosen to satisfy the conditions specified in Thm. 6.2, the resulting is guaranteed to be recursively feasible and asymptotically stable according to Thm. 6.2.
8 Scaling to Large Systems
The motivation for using a neural network is scaling to large systems and long time horizons. While the previous sections detailed our approach to compute an approximate planner with guarantees on the corresponding RHC, this section discusses the particular challenges of function approximation in large MPC problems which arise during the training data set generation process. Our approach to overcome these challenges draws on ideas from geometric random walks [vempala2005geometric] and quasi Monte Carlo (QMC) sequences [niederreiter1992random, sobol1967distribution].
The domain of the planner is the polyhedron . The main challenge is that, as previously mentioned in Sec. 3.2, computing an explicit (halfspace or vertex) representation of is computationally intractable for large systems. Instead, is defined by a membership oracle [vempala2005geometric] which on an input returns Yes if , and No otherwise. In mpQP problems, this membership oracle is a QP solver that given a parameter solves the corresponding QP (5).
We will generate a training data set to train the neural network by sampling data points from the domain . As a result, we need a method that can sample data points from a convex set defined by a membership oracle which will also scale to high dimensions. One method is rejection sampling, which involves sampling from a superset of and querying the membership oracle for each point. A point is added to the data set only if the oracle returns Yes. An example of a sampling superset is , as .
Unfortunately, rejection sampling will not scale to high dimensions. First, the cost of querying the membership oracle becomes more expensive with higher dimensions, as the corresponding QP (5
) becomes larger. More prohibitively, the probability of sampling a feasible state will, in general, become very small, which we empirically observe in the MPC application (see Fig.
8 in Sec. 9). This phenomenon can be understood by comparing , the volume of , with (or more generally to a sampling superset) as the time horizon, states, and inputs increase. With an increasing time horizon , will shrink [borrelli2017predictive, Rmk. 11.4] until becomes the maximal stabilizable set for control invariant set [borrelli2017predictive, Rmk. 11.3], and there is no guarantee that will become the maximal stabilizable set in finite time [borrelli2017predictive, Sec. 10.2]. This shrinking occurs since there will be more constraints in QP (5) with increasing . As a result, will typically decrease with increasing and the probability of a sampled point from lying inwill likewise decrease for typical distributions such as the uniform distribution over the convex set.
We can intuitively understand how behaves with increasing states and inputs by recalling the classic curse of dimensionality example , where is an ndimensional hypersphere of diameter and is an ndimensional hypercube with side length that is the minimum bounding hypercube for . With increasing dimensions, most of the volume in a cube is contained in the corners. A similar phenomenon is likely to occur in the MPC application as the set difference will contain initial states closer to the edges and corners of rather than the origin.
To surmount this problem, we develop an algorithm motivated by geometric random walks [vempala2005geometric], which sample from convex sets using the membership oracle. To sample from a given distribution, these methods set up a random walk whose steady state is the desired distribution to sample from. Examples of these random walks are the grid walk which starts from the interior of the convex set and moves to neighboring points on a grid, or the hit and run walk which picks a random line at the current point and goes to a random point on the chord . At each potential point , the random walk will query the membership oracle and move to if the oracle returns Yes, otherwise stay at the current point .
The states reached by this random walk will lie inside the convex set and overcome the previous sampling challenges. Since the desired distribution to sample is the stationary distribution, effective sampling with these methods requires the distribution of the current point to converge rapidly to its steady state, which is called mixing. Many of the technical details are beyond the scope of this manuscript and are described in [vempala2005geometric], but a key positive result in this theory is that for convex sets and distributions with logconcave density (which include most of the common distributions such as the uniform distribution over a convex set), various geometric random walk strategies will mix in time polynomial to the dimension of the problem. As a result, these techniques can generate samples efficiently even in high dimensions.
The geometric random walk is an example of a Monte Carlo method which randomly samples points to include in the data set. QMC methods instead generate a deterministic lowdiscrepancy sequence of points that are “well distributed” compared to random sampling in the sense that the generated points are spread apart [niederreiter1992random, leobacher2014introduction]. These techniques are widely used in mathematical finance to evaluate highdimensional integrals. In addition, the concept of lowdiscrepancy data sets has recently been explored in Analytical Learning Theory to provide bounds on generalization [kawaguchi2018generalization]. One example of a QMC lowdiscrepancy sequence is the Sobol sequence [sobol1967distribution], and there are efficient algorithms to construct these sequences [antonov1979economic].
8.1 Data Set Generation Algorithm
Our proposed data set generation algorithm merges ideas from both geometric random walks and QMC sequences. The full algorithm is described in Algs. 3 and 4, and Figs. 0(a) and 0(b) provide illustrations of its behavior on Sys. 1. While our proposed algorithm does not satisfy the criteria of a geometric random walk, it takes advantage of attractive behaviors from both theories, and empirically performs well on large systems.
Our algorithm initializes and updates a set of seed states that are known to be in . It is also given a set of goal states that correspond to the Sobol sequence. See Fig. 0(a) for an illustration of the first few points of the Sobol sequence in 2 dimensions. The algorithm proceeds by randomly selecting a seed state and pairing it with the next goal state in the Sobol sequence. It then performs the line solve subprocedure detailed in Alg. 3. The subprocedure queries the membership oracle (QP solver) along the line segment formed by the seed and goal state starting at the seed state, where the oracle returns the corresponding optimal primal and dual variables for states determined to be in .
If the membership oracle returns No during this process, the subprocedure terminates, as all remaining points in the line segment will lie outside of due to the convexity of . Otherwise the subprocedure terminates once it reaches the goal state. Upon termination in either case, the subprocedure returns the set of intermediate states that were determined to be feasible, as well as the corresponding optimal primal and dual variables.
Our algorithm receives this set of feasible intermediate states, and includes it into the aggregated data set. In addition, it updates the set of seed states with the last state that the subprocedure determined to be in . The intuition for using the last intermediate state is to spread out the states in the seed set. While geometric random walks typically are not intialized from a predetermined starting point [vempala2005geometric], we initialize our seed set with the equilibrium state, known to be inside , and its corresponding optimal solver state.
Our proposed algorithm is thus similar to the hit and run random walk method with a few differences. The hit and run random walk has a subprocedure similar to our line solve subprocedure. It starts from the current state, which corresponds to our seed state, and randomly chooses a line segment , which can be implemented by randomly choosing a goal state. On the other hand, our algorithm randomly chooses the seed state from a set of states that are updated throughout the algorithm, and deterministically chooses a goal state from a Sobol sequence that has been constructed to be spread apart. We made this modification to introduce the Sobol sequence and take advantage of its ability to generate goal points that are spread apart.
In addition, the subprocedure in hit and run only returns state randomly selected on the line segment, whereas our line solve subprocedure performs a search along this line segment and returns all the intermediate states determined to be feasible. By performing this line search, we can more efficiently query our membership oracle by solving a sequence of closely related points using hot starts described in Sec. 7.1. This behavior of moving to a successor state that is in close proximity to the current state is found in the grid walk, but not the hit and run walk, and solving for intermediate states on the line segment reintroduces this behavior.









Sys. 1  Total  s  s  ms  
(Sobol)  (2)  (0.4)  (0.4)  
Sys. 2  Total  m  h  ms  
(Sobol)  (200)  (40)  (40)  
Sys. 3  Total  m  h  ms  
(Sobol)  (20)  (4)  (4)  
Sys. 4  Total  h  h  ms  
(Sobol)  (400)  (40)  (40) 
Neural networks are typically trained on a train set, and evaluated on a test set. Ideally, the samples are i.i.d, and as a result the test set is completely independent of the train set, but drawn from the same underlying distribution. As a result, performance on the test set is a good indicator of expected performance on the underlying distribution. While geometric random walks provide a way to sample nearly independent points [vempala2005geometric, Thm. 7.3], our proposed algorithm does not satisfy the criteria of a geometric random walk. As a result, we take extra care when creating the train and test data set split, and the procedure is detailed in Alg. 4.
We introduce a buffer set the purpose of which is to generate additional seed states that do not appear in the train data set. The Sobol points are divided into three segments, denoted as train, buffer, and test. We first create the train data set by iterating through each of the train Sobol points and updating the train seed set. We then copy the train seed states generated from this process to form the buffer seed set, and iterate through the buffer Sobol points while updating the buffer seed set. After we exhaust the buffer Sobol points, we form the test seed set by taking the set difference between the buffer seed set and the train seed set. As a result, all of these seed points in the test seed set do not appear in the train data set, and the test Sobol points will also not appear in the train set. Using these test seed states and test goal states, we can then generate the test set. The illustration of the different trajectories generated in the train and test sets are depicted in Fig. 0(b).
9 Numerical Examples
We provide numerical examples of our approach on four systems. The purpose of these examples is to demonstrate that our proposed approach has the ability to scale to large problems, as well as highlight the challenges that arise when scaling to a high dimensional system. We use Tensorflow
[abadi2016tensorflow] to train and evaluate the neural networks, SQOPT [sqopt77] as our primal active set solver, and MPT3 [MPT3] to compute the terminal cost and constraints. Fig. 2 provides an overview of the data sets for each system we evaluated our approach on. These results were generated using dual Xeon E52683 v4 CPUs with cores for the data set generation, and two NVIDIA Titan X Pascal GPUs for the training.9.1 System Descriptions
The first system we evaluated is the double integrator defined in Sys. 1. It is a secondorder system that models the dynamics of a mass in 1D space under a force input.
System 2 (Quadrotor).
The differential flatness of quadrotor systems allows us to generate trajectories in the space of flat outputs (position, yaw and their derivatives) that can be followed by the underactuated quadrotor [mellinger2012trajectory, liu2017search]. We can apply MPC for quadrotor trajectory generation by considering the timeinvariant continuoustime system:
where
There are states and inputs. The submatrices correspond to position, velocity, acceleration, and jerk. We discretize this continuous time system using Euler discretization with a time step of . The constraints are given by , , and input constraints , . The cost matrices are and . For the receding horizon controller, we choose a time horizon , the terminal region to be , and the terminal cost . The number of constraints are , , and . The primal variables have dimension , the dual variables corresponding to the inequality constraints have dimension , and the dual variables corresponding to the equality constraints have dimension .
System  Layer Widths  Num. Parameters  
Sys. 1  2, 32, 32, 30  2,142  
Sys. 2  12, 32, 32, 300  89,900  
Sys. 3 

159,522  
Sys. 4 

1,668,554 
System 3 (Oscillating Masses).
The oscillating masses system, introduced in [wang2010fast], is a linear system that can be easily scaled to large dimensions by increasing the number of masses and springs in the system. We use masses with a mass of , and springs with a spring constant and damping constant . Let and . The system is defined as
where