In the past, diverse approaches have been used for robust, accurate and fast control [KJAstrom:PID:1995], [Bellman:1954], [AlbertoBemporad:2003], [Houska:2011]. Applied across a wide range of domains, from the chemical to the aerospace industry, one of the most powerful and practically useful approaches is NMPC [Allgower:2017]. Its main advantage is that it predicts the immediate future point under constraints while considering all predicted future points over a given horizon. In recent years, several efficient solutions to NMPC and improvements have been proposed [Qin:2000]. One of the most prominent methods for solving the NMPC problem is the real-time iteration (RTI) scheme [Diehl:2012]. The advantage of RTI is that it allows "on-the-fly" prediction updates: as new state estimates become available, an iterative solution using only a small number of iteration steps yields the new predictions.
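The RTI idea can be sketched on a toy problem (our illustration, not the scheme from [Diehl:2012] verbatim): for a linear model with quadratic cost, the per-sample linearized problem is a QP, and a single Newton step solves it exactly, so each sampling instant performs one such step and applies only the first predicted control.

```python
import numpy as np

def rti_update(x0, x_ref, N=10, dt=0.1, q=10.0, r=0.1):
    """One RTI update for a 1-D single-integrator model x_{k+1} = x_k + dt*u_k.
    The horizon cost q*sum (x_k - x_ref)^2 + r*sum u_k^2 is quadratic in u,
    so a single (Gauss-)Newton step solves the linearized problem exactly."""
    M = dt * np.tril(np.ones((N, N)))          # predicted states: x = x0 + M @ u
    H = q * M.T @ M + r * np.eye(N)            # Hessian of the QP
    g = q * M.T @ ((x_ref - x0) * np.ones(N))  # negative gradient at u = 0
    return np.linalg.solve(H, g)               # new control prediction

# Closed loop: re-run one iteration per sample as new estimates arrive.
x = 0.0
for _ in range(200):
    u = rti_update(x, x_ref=1.0)
    x = x + 0.1 * u[0]   # apply only the first predicted control
```

In a full RTI implementation the previous (shifted) solution warm-starts each iteration; for this quadratic toy problem the step lands at the QP optimum from any warm start.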
In the NMPC problem, the state and control weights of the corresponding costs significantly impact the control performance. Usually, fixed weights are carefully selected based on human expert knowledge (platform- and trajectory-specific tuning) for unmanned aerial vehicle navigation. Under many scenarios, this strategy is preferable: such weight selection takes stability-related properties into account and exploits the strengths of NMPC. However, whether this approach uses the advantages of NMPC to the full extent remains an open question.
Additionally, weight tuning might not generalize well across a set of diverse trajectories, in the sense that a single choice of weights might not allow tracking with the lowest positional error and "smooth" changes over the predicted controls. On the other hand, online and data-adaptive strategies for updating the cost weights during trajectory execution have not been studied extensively. In this line, besides the link between NMPC and online learning [Wagener:2019], other connections to known machine learning paradigms, with emphasis on the weight estimation, remain unexplored.
To utilize the full control potential of NMPC, we present an online weight-adaptive approach. We introduce a novel control problem formulation where, contrary to using predefined and fixed weights for the state cost, we include the weights of the state cost as a variable in our control problem. This allows us to propose an algorithm that can improve the state and control prediction by optimally updating the weights in an online fashion. Moreover, our approach provides a generalization for a class of weight cost priors and function approximations (including but not limited to neural networks). We also give connections of our approach to online learning [Shalev-Shwartz:2014], reinforcement learning [Sutton:1998], and metric learning [Kulis:2013].
In the following, we summarize our main contributions.
We introduce a novel variant of the well-known NMPC control problem, in which we address the joint prediction of the state and control variables and the estimation of the corresponding state and control weights.
We propose a two-stage alternating RTI algorithm. It consists of (i) prediction of the state and control variables and (ii) optimal update of the cost weights.
We validate our approach by numerical simulation on the task of quadrotor navigation. We demonstrate that our approach reduces the trajectory tracking error while predicting controls that have "smooth" changes over the executed trajectory. Compared to the solution of the commonly used NMPC, we show improvements of up to in the accuracy of the executed trajectory.
I-B Related Work
Adaptive Model Predictive Control (MPC) was studied in [An:2009], [Turki:2017], [Bujarbaruah:2018], [Zhao:2017] and [Wagener:2019]. In [An:2009], the authors proposed an adaptive multi-variable zone controller and gave robustness guarantees for the controlled process; as an adaptive component, they added weighted slack coefficients to the nominal weight coefficients. In [Turki:2017], the authors focused on an analytical approach for tuning the control horizon. Their idea consists of computing the value of the optimal control horizon that ensures numerical stability, targeting a wide set of linear controllable single-input single-output processes. In [Zhao:2017], the authors developed an adaptive cruise control system for vehicles. They utilized a hierarchical control architecture in which a low-level controller compensated for the nonlinear longitudinal vehicle dynamics, enabling tracking of the desired acceleration. They solved a multi-objective control problem by a real-time weight tuning strategy, adjusting each objective's weight for different operating conditions. The authors in [Bujarbaruah:2018] proposed an adaptive stochastic model predictive control strategy for multi-input multi-output systems that can be expressed by a finite impulse response model. In [Wagener:2019], a close connection between MPC and online learning was shown. The authors proposed a new algorithm based on dynamic mirror descent and thereby presented a general family of MPC algorithms that includes many existing techniques as special instances. The same authors also provided different MPC perspectives and suggested principled schemes for the design of new MPC algorithms.
In contrast to past work, as our main novelty, we consider the cost-related weights as an additional variable in the NMPC problem. In this regard, our problem formulation extends the common NMPC problem. We propose a novel algorithm as a solution, and along the way, we also give connections to machine learning paradigms with a focus on metric learning.
I-C Paper Organization
The rest of the paper is organized as follows. In Section II, we present the continuous control problem and its approximate discrete version. In Section III, we first present the approximate control problem formulation for our NMPC with online weight adaptation; we then propose our RTI solution in the form of a two-stage alternating algorithm, discuss the solution, and give connections to known learning paradigms. We devote Section IV to the numerical evaluation, and we conclude the paper in Section V.
II Robot Control
In this section, we first present the continuous problem formulation for robot control and then give its discrete version.
We assume that the dynamics are described by a set of differential equations $\dot{x} = f(x, u)$, where $x$ and $u$ denote the state and control variables. Furthermore, we assume that an action objective is given, which defines the action cost. The task of taking action can be expressed by the following optimization problem:
where the equality and inequality constraints represent conditions that the solution should satisfy when we take action. In order to solve (1), first, a cost function for taking action is defined. Then, (1) is discretized, transcribed, and linearized. In the following text, we go through these steps, which allow us to present the discretized version of (1).
II-A The Discrete Control Problem
As a common practice, the system dynamics (1) are discretized into $N$ system points by using a time step over a fixed time horizon. This results in state vectors $x_k$ and control vectors $u_k$ for $k = 0, \dots, N$. We assume that we have reference states $x_{r,k}$ and reference controls $u_{r,k}$, and we denote the state errors as $e_{x,k} = x_k - x_{r,k}$ and the control errors as $e_{u,k} = u_k - u_{r,k}$. We define our discrete objective as $\sum_k e_{x,k}^\top Q\, e_{x,k} + e_{u,k}^\top R\, e_{u,k}$, where $Q$ and $R$ denote the state and control weight matrices.
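For concreteness, a single stage of this quadratic objective can be evaluated as follows (the weight values are illustrative, not the ones used in our experiments):

```python
import numpy as np

def stage_cost(x, u, x_ref, u_ref, Q, R):
    """Quadratic tracking cost  e_x^T Q e_x + e_u^T R e_u  for one stage."""
    ex, eu = x - x_ref, u - u_ref
    return float(ex @ Q @ ex + eu @ R @ eu)

Q = np.diag([10.0, 10.0])   # state weights (assumed values)
R = np.diag([0.1])          # control weight (assumed value)
c = stage_cost(np.array([1.0, 0.0]), np.array([0.5]),
               np.zeros(2), np.zeros(1), Q, R)
# c = 10*1 + 0.1*0.25 = 10.025
```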
In (1), the equality constraints represent the model for the robot dynamics, while the inequality constraints represent the physical limitations of the robot platform. As is common practice, problem (1) is transcribed using the multiple shooting technique. Given the discrete dynamics and constraints over the coarse grid, for each time interval a boundary value problem is solved, where additional variables with continuity constraints are imposed. In addition, an explicit integrator is applied to forward-simulate the system dynamics along each interval.
Under the above considerations, (1) is usually sequentially approximated by quadratic problems (QPs). The solutions of the QPs are used as search directions in order to take steps that minimize the original continuous problem. Starting from the available guess (predictions), the iterations are repeated by taking (not necessarily full) Newton steps, where the step size guarantees that the update is in a descent direction. As an example, in the sequential QP approach, given a state estimate and predictions about the state and the control, the usual approximation of (1) is the following QP:
where the linear terms collect the partial derivatives of the dynamics and constraints with respect to the state and control variables, evaluated at the current predictions, and the quadratic term involves the Hessian of the Lagrangian for (1). One popular choice is the Gauss-Newton approximation to the Hessian [Sebastien:2016]. We note that, in order to simplify the problem description (2), we omitted the cost related to the last state prediction, but we nonetheless take it into account.
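The Gauss-Newton idea can be sketched on a small nonlinear least-squares toy problem (unrelated to the quadrotor model): the Hessian of the Lagrangian is replaced by $J^\top J$, built from the residual Jacobian $J$.

```python
import numpy as np

def gauss_newton_step(x, residual, jacobian):
    """One Gauss-Newton step: approximate the Hessian by J^T J and solve
    (J^T J) dx = -J^T r, i.e. a linearized least-squares problem."""
    r, J = residual(x), jacobian(x)
    dx = np.linalg.solve(J.T @ J, -J.T @ r)
    return x + dx

# Toy nonlinear residual r(x) = [x0^2 - 1, x1 - x0], root at (1, 1).
res = lambda x: np.array([x[0]**2 - 1.0, x[1] - x[0]])
jac = lambda x: np.array([[2.0 * x[0], 0.0], [-1.0, 1.0]])

x = np.array([2.0, 0.0])
for _ in range(20):
    x = gauss_newton_step(x, res, jac)
# converges to (1, 1)
```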
III Online Weight-Adaptive NMPC
In this section, we present the problem formulation for our online weight-adaptive NMPC. Then, we present our two-stage alternating algorithm. Afterward, we explain and discuss the related problems at each stage. Finally, we give connections to known learning principles.
III-A Min-Max Approximate Control Problem
We propose to jointly (i) predict the control and state variables and (ii) estimate the weight matrix. To that end, we present the following problem formulation:
where the state and control variables, together with the weight matrix, are the problem variables that we would like to optimally estimate, while the remaining variable is the Lagrangian (dual) variable. In general, the weight prior could be any weight-related cost function with its corresponding parameter. In the simplest form, we consider a diagonal weight matrix defined through a weight vector, with a prior centered at the one vector.
Our proposed formulation is a min-max problem with quadratic and bilinear cost functions. If we fix the weights, the reduced problem over the remaining state and control variables is convex. On the other hand, if we fix the state and control variables then the reduced problem over the weight variables can be converted again into a convex problem.
III-B The Algorithm
To solve (3), we propose an alternating algorithm that consists of two stages. In stage one, we fix the weights and predict the state and control variables. In stage two, we fix the state and control variables and estimate the weights. In the following, we describe the corresponding problems for the two stages and discuss their solutions.
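A minimal sketch of the alternating loop on a 1-D single-integrator toy model follows; the stage-two rule here is a simplified, hypothetical stand-in for the closed-form weight update, not the paper's exact expression:

```python
import numpy as np

def predict(x0, q, r, x_ref, N=10, dt=0.1):
    """Stage 1: with fixed weights, the control prediction is a convex QP
    (toy integrator model x_{k+1} = x_k + dt*u_k, solved in closed form)."""
    M = dt * np.tril(np.ones((N, N)))       # predicted states: x = x0 + M @ u
    return np.linalg.solve(q * M.T @ M + r * np.eye(N),
                           q * M.T @ ((x_ref - x0) * np.ones(N)))

def update_weight(q, err, alpha=0.5, q_min=1.0, q_max=100.0):
    """Stage 2: with fixed predictions, adapt the state weight from the
    observed tracking error (hypothetical rule for illustration)."""
    return float(np.clip(q + alpha * err**2, q_min, q_max))

x, q = 0.0, 5.0
for _ in range(100):
    u = predict(x, q, r=0.1, x_ref=1.0)   # stage one
    x = x + 0.1 * u[0]                    # apply first predicted control
    q = update_weight(q, 1.0 - x)         # stage two
```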
III-B1 State and Control Prediction
Let the weights be fixed, then problem (3) reduces to the following QP:
Since we have quadratic losses and linear equality and inequality constraints, problem (4) represents a convex quadratic program with linear constraints. Problem (4) is well known and explored [Boyd:2019]; for possible solvers, we refer to [Boyd:2019], [Sebastien:2016]. After (4) is solved, we use the resulting state solution as the state prediction and the resulting control solution as the control prediction.
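For the equality-constrained part, such a QP can be solved directly from its KKT system; the sketch below is a generic numpy routine (not a production solver, and without the inequality constraints):

```python
import numpy as np

def solve_eq_qp(H, g, A, b):
    """Solve  min 0.5 z^T H z + g^T z  s.t.  A z = b  via the KKT system
    [[H, A^T], [A, 0]] [z; lam] = [-g; b]."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-g, b]))
    return sol[:n]  # primal part (the Lagrange multipliers are sol[n:])

# min 0.5*(z0^2 + z1^2)  s.t.  z0 + z1 = 2   ->   z = (1, 1)
z = solve_eq_qp(np.eye(2), np.zeros(2), np.array([[1.0, 1.0]]), np.array([2.0]))
```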
III-B2 Weights Update
This stage enables us to adapt our cost function for taking actions by adjusting and updating the weights. In turn, this allows us to penalize future errors based on the error between the reference and the currently predicted state. To do so, we fix the control and state variables in (3). In the simplest form, we define the weight cost as a quadratic function (it is worthwhile to mention that we can consider a wide range of parametric functions; given data, we could also learn offline and estimate the parameters of our function). With this, (3) reduces to the following quadratic problem:
where the summation runs over a sub-horizon for the weight update. The main advantage of (5) is that it has a closed-form solution. Having the estimated weight vector, we update the weight matrix accordingly.
We point out that the estimated vector has non-negative values as long as the variations are bounded. Therefore, under bounded variations, we can ensure that our weight matrix is positive definite. Under arbitrary variations, a non-negativity constraint in (5) could be used to ensure that the weight matrix is positive definite (note that if we include the non-negativity constraint in (5), a closed-form solution is still available).
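A hedged sketch of such a stage-two update for a diagonal weight vector, assuming a simple ridge-style quadratic prior (the exact cost of (5) is not reproduced here): the regularized least-squares problem has a closed form, and the non-negativity projection keeps the diagonal weight matrix positive semidefinite.

```python
import numpy as np

def weight_update(q, e, beta=1.0, lam=10.0):
    """Illustrative stand-in for the closed-form stage-two update:
    minimize  beta*||w - e**2||^2 + lam*||w - q||^2  over w, then project
    onto the non-negative orthant. Here q is the current diagonal weight
    vector and e the per-dimension state error (assumed form)."""
    w = (beta * e**2 + lam * q) / (beta + lam)  # closed-form minimizer
    return np.maximum(w, 0.0)                   # non-negativity projection

q = np.array([10.0, 10.0, 10.0])
e = np.array([0.5, -2.0, 0.0])
q_new = weight_update(q, e)
# dimensions with larger error get a larger weight; all entries stay >= 0
```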
III-C Connection to Known Learning Paradigms
Our online weight update approach also represents a special case of metric learning [Kulis:2013], where, in our case, the state loss included in (3) represents the metric. In the following, we give connections to metric learning, online learning, and reinforcement learning.
III-C1 Metric Learning
The cost function defined for predicting the state is the general Mahalanobis distance metric between the predicted and reference states. The online update of our distance metric is equivalent to learning a linear mapping that transforms the data. After projecting the data onto the new space through this linear map, the corresponding distance metric is the usual Euclidean distance.
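The equivalence can be checked numerically: for a positive-definite weight matrix Q with Cholesky factor L (Q = L Lᵀ), the Mahalanobis distance equals the Euclidean distance after mapping by Lᵀ. The matrix values here are illustrative:

```python
import numpy as np

def mahalanobis_sq(x, y, Q):
    """Squared Mahalanobis distance (x - y)^T Q (x - y)."""
    d = x - y
    return float(d @ Q @ d)

Q = np.array([[4.0, 0.0], [0.0, 1.0]])  # illustrative weight matrix
L = np.linalg.cholesky(Q)               # Q = L @ L.T -> linear map L.T
x, y = np.array([1.0, 2.0]), np.array([0.0, 0.0])

d1 = mahalanobis_sq(x, y, Q)                 # metric in the original space
d2 = float(np.sum((L.T @ (x - y))**2))       # Euclidean after the mapping
# d1 == d2 == 4*1 + 1*4 = 8
```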
III-C2 Online Learning
Our algorithm can be viewed as an extension of the online learning approach to model predictive control [Wagener:2019]. Online learning concerns iterative interactions between a learner and an environment over several rounds. At each round (or time instance), the learner picks one decision out of a set of decisions. The environment then evaluates a loss function based on the learner's decision, and the learner suffers a cost. The learner's goal is to minimize the accumulated costs. As shown in [Wagener:2019], at time t (i.e., round t), an MPC algorithm optimizes a controller (i.e., the decision) over some cost function (i.e., the per-round loss). In this regard, we highlight the following connection to online adaptation and learning: our alternating algorithm observes the cost of the initial controller, then improves the controller by updating the cost and the controller, and only then executes a control based on the improved controller.
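The learner/environment loop can be sketched with online gradient descent on a stream of quadratic per-round losses (a generic toy, not the controller itself; the loss parameters are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
w, eta = 0.0, 0.1          # learner's decision and step size
losses = []
for t in range(500):
    target = 1.0 + 0.01 * rng.standard_normal()  # environment picks the loss
    loss = (w - target)**2                       # learner suffers the cost
    losses.append(loss)
    w -= eta * 2.0 * (w - target)                # gradient step for next round
# the average per-round loss shrinks as the learner adapts
```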
III-C3 Reinforcement Learning
Regarding the connection of our algorithm to the core principle behind reinforcement learning, we note the following. Upon observing a measurement, in the first stage of our algorithm, we generate a state and control prediction. In the light of reinforcement learning, we consider this as a sample from some underlying control policy. Afterward, in stage two, we update the weights. Thus, updating our cost metric translates into updating the policy after observing the error between the predicted and reference state.
IV Numerical Evaluation
In this section, we evaluate our approach. Our numerical experiments consider trajectory execution for a quadrotor. Therefore, in the following subsection, we first present the used dynamical model for quadrotor control. Afterward, we describe the experimental setup and discuss the results.
|  | e[m] | TV | e[m] | TV | e[m] | TV | e[m] | TV |
| [Houska:2011] | 0.44 | 1.57 | 0.66 | 1.29 | 1.9 | 1.21 | 0.87 | 1.19 |
|  | 0.76 | 2.25 | 0.95 | 1.58 | 1.55 | 1.56 | 1.93 | 1.58 |
|  | 0.78 | 2.26 | 1.19 | 1.52 | 1.36 | 1.43 | 1.70 | 1.38 |
|  | 0.79 | 1.89 | 1.39 | 1.48 | 1.58 | 1.43 | 1.82 | 1.41 |
| [Houska:2011] | 0.37 | 0.72 | 0.64 | 0.67 | 1.16 | 0.64 | 1.85 | 0.63 |
|  | 0.49 | 0.77 | 0.58 | 0.75 | 0.68 | 0.69 | 0.80 | 0.67 |
|  | 0.5 | 0.73 | 0.60 | 0.73 | 0.72 | 0.68 | 0.84 | 0.66 |
|  | 0.52 | 0.72 | 0.62 | 0.72 | 0.75 | 0.68 | 0.88 | 0.65 |
| [Houska:2011] | 0.34 | 0.03 | 0.60 | 0.02 | 2.11 | 0.02 | 4.49 | 0.02 |
|  | 1.22 | 0.38 | 0.98 | 0.04 | 1.89 | 0.04 | 3.87 | 0.03 |
|  | 0.94 | 0.20 | 0.73 | 0.04 | 1.82 | 0.03 | 3.81 | 0.03 |
|  | 0.68 | 0.86 | 0.57 | 0.04 | 1.76 | 0.03 | 3.72 | 0.03 |
| [Houska:2011] | 0.27 | 0.08 | 4.63 | 0.05 | 14.6 | 0.05 | 28.7 | 0.04 |
|  | 0.38 | 0.21 | 2.56 | 0.12 | 6.42 | 0.11 | 12.4 | 0.28 |
|  | 0.29 | 0.11 | 2.43 | 0.08 | 6.33 | 0.12 | 12.2 | 0.19 |
|  | 0.27 | 0.46 | 2.03 | 0.46 | 5.87 | 0.14 | 11.9 | 0.31 |
IV-A Used Model
In the following, we describe the used dynamical model.
IV-A1 Quadrotor Dynamics
Our state and control vectors of the quadrotor are defined as follows. The state comprises the position and the orientation (quaternion) of the body frame with respect to the world frame, expressed in the world frame, as well as the linear velocity of the body, expressed in the world frame, and its angular velocity, expressed in the body frame. The control vector contains the mass-normalized thrust, obtained from the thrust produced by each motor divided by the mass of the vehicle. We define the dynamical model for the quadrotor through the time derivatives of the position, linear velocity and quaternion, where the translational dynamics include the gravity vector. The operator denoting the multiplication between a quaternion and a vector appears in the quaternion kinematics: the time derivative of the quaternion is given by the quaternion product with the angular velocity, which can equivalently be written using the skew-symmetric matrix of the angular velocity vector.
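The quaternion kinematics above can be sketched as a self-contained toy integration, assuming a (w, x, y, z) quaternion convention and the Hamilton product (our illustration, not the integrator used in the experiments):

```python
import numpy as np

def quat_mult(q, p):
    """Hamilton product of two quaternions in (w, x, y, z) convention."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = p
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_derivative(q, omega_body):
    """qdot = 0.5 * q ⊗ (0, omega), with omega in the body frame."""
    return 0.5 * quat_mult(q, np.concatenate([[0.0], omega_body]))

def integrate_quat(q, omega_body, dt):
    q = q + dt * quat_derivative(q, omega_body)  # explicit Euler step
    return q / np.linalg.norm(q)                 # re-normalize to unit length

q = np.array([1.0, 0.0, 0.0, 0.0])
omega = np.array([0.0, 0.0, np.pi])   # constant yaw rate of pi rad/s
for _ in range(1000):
    q = integrate_quat(q, omega, 0.001)
# after 1 s, a rotation by pi about z: q close to (0, 0, 0, 1)
```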
|  | e[m] | TV | e[m] | TV | e[m] | TV | e[m] | TV |
| [Houska:2011] | 12.0 | 1.24 | 2.22 | 1.38 | 0.98 | 1.29 | 0.64 | 1.34 |
|  | 1.49 | 1.38 | 1.38 | 1.49 | 1.13 | 1.56 | 1.22 | 1.55 |
|  | 0 | 0 | 1.61 | 1.58 | 1.45 | 1.53 | 1.22 | 1.55 |
|  | 0 | 0 | 0 | 0 | 1.28 | 1.55 | 1.00 | 1.56 |
| [Houska:2011] | 3.03 | 0.64 | 1.19 | 0.66 | 0.71 | 0.67 | 0.47 | 0.68 |
|  | 0.97 | 0.69 | 0.73 | 0.71 | 0.62 | 0.72 | 0.53 | 0.74 |
|  | 0 | 0 | 0.75 | 0.70 | 0.64 | 0.72 | 0.55 | 0.73 |
|  | 0 | 0 | 0 | 0 | 0.65 | 0.72 | 0.56 | 0.73 |
| [Houska:2011] | 12.75 | 0.03 | 2.73 | 0.02 | 0.72 | 0.02 | 0.45 | 0.02 |
|  | 4.05 | 0.05 | 1.79 | 0.04 | 0.86 | 0.04 | 0.52 | 0.04 |
|  | 0 | 0 | 1.17 | 0.07 | 0.66 | 0.04 | 0.50 | 0.04 |
|  | 0 | 0 | 0 | 0 | 0.61 | 0.92 | 0.51 | 0.10 |
| [Houska:2011] | 59.62 | 0.06 | 15.7 | 0.05 | 5.73 | 0.05 | 2.07 | 0.05 |
|  | 24.69 | 0.07 | 6.90 | 0.08 | 2.94 | 0.08 | 1.44 | 0.08 |
|  | 0 | 0 | 5.48 | 0.08 | 2.27 | 0.56 | 1.19 | 0.42 |
|  | 0 | 0 | 0 | 0 | 1.95 | 0.74 | 1.15 | 0.73 |
IV-A2 Quadrotor Physical Constraints
By the inequality constraints, we model the physical limitations of the drone platform in order to attain feasible solutions. In our case, these are the minimum and maximum thrust as well as the minimum and maximum angular velocities, which we compactly express as box constraints.
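Enforcing such box constraints amounts to clipping (or, inside the solver, to linear inequality constraints); the bounds below are hypothetical placeholders, not the values used in our experiments:

```python
import numpy as np

# Assumed (hypothetical) actuator limits, for illustration only.
C_MIN, C_MAX = 2.0, 20.0    # mass-normalized thrust bounds [m/s^2]
W_MIN, W_MAX = -3.0, 3.0    # angular-velocity bounds [rad/s]

def clamp_control(u):
    """Project a control u = (c, wx, wy, wz) onto the box constraints."""
    u = np.asarray(u, dtype=float).copy()
    u[0] = np.clip(u[0], C_MIN, C_MAX)     # thrust component
    u[1:] = np.clip(u[1:], W_MIN, W_MAX)   # body-rate components
    return u

u = clamp_control([25.0, -4.0, 1.0, 0.0])
# -> [20.0, -3.0, 1.0, 0.0]
```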
IV-B Setup, Error Measures and Comparative Analysis
We generated four different trajectories, which were computed as proposed in [Mellinger:2011]. As shown in Figure 3, the trajectories have different geometries. The first and the second trajectory are aggressive, the third trajectory is a circle, and the fourth trajectory is a diamond.
We validated our approach under different setups. We experimented with different strengths for the state cost in the control problem, as well as with different lengths of the fixed horizon and different lengths of the sub-horizon used to update the weights in the cost. In addition, we validated our approach under additive white Gaussian noise perturbations of the available state estimate. In summary, we present simulation results under:
Different strengths of the state cost,
Different lengths of the prediction horizon, and
Noise perturbation with different noise levels in the available (measured) state.
Over all trajectory points, we measure and report the total accumulated error as the sum of the per-point errors between the simulated position after applying the predicted control and the reference position. In addition, we also measure the total variation (TV) over the commands, i.e., the sum of the norms of the differences between consecutive predicted commands, and consider it as an indicator for "smooth" changes in the predicted commands during trajectory execution.
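Both measures are straightforward to compute; a sketch with toy data:

```python
import numpy as np

def tracking_error(p_sim, p_ref):
    """Total accumulated positional error: sum over trajectory points of
    the Euclidean distance between simulated and reference positions."""
    return float(np.sum(np.linalg.norm(p_sim - p_ref, axis=1)))

def total_variation(u):
    """Total variation of the command sequence: sum ||u_{k+1} - u_k||."""
    return float(np.sum(np.linalg.norm(np.diff(u, axis=0), axis=1)))

p_ref = np.zeros((4, 3))
p_sim = np.array([[0.0, 0, 0], [1.0, 0, 0], [0.0, 0, 0], [0.0, 0, 0]])
u = np.array([[0.0], [1.0], [1.0], [0.0]])

e = tracking_error(p_sim, p_ref)   # 0 + 1 + 0 + 0 = 1
tv = total_variation(u)            # |1-0| + |1-1| + |0-1| = 2
```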
In the first two series of experiments, for different strengths of the state cost and for different lengths of the prediction horizon, we also experimented with different sub-horizon lengths. Regarding the noise corruption protocol for the third set of experiments, we implemented it as follows. We randomly selected the index of one point from the trajectory. Then, during execution, at the corresponding index, we corrupted the available state with additive white Gaussian noise (AWGN). In the same experiment, we compute the average error over multiple runs of this procedure. We compare our algorithm with the standard solution to NMPC with fixed weights, which we implemented using [Houska:2011].
In the standard NMPC, the whole weight matrix has to be tuned. In contrast, in our approach, there is only a single tuning parameter. During simulation, we also identified the weight update rule with the better performance, which we then used in all of the experiments.
IV-C Results Discussion
In Tables I, II and III, we present the results of our computer simulations. We show the resulting trajectory tracking errors and total variation errors of our algorithm and of the baseline method (the standard NMPC with fixed weights).
As we can see in Table I, the total error in position is reduced at the cost of a small increase in the total variation of the command prediction when compared to the common solution [Houska:2011] of the NMPC. Moreover, as shown in Table I, for very small values of the tuning parameter, the trajectory tracking accuracy of our algorithm is lower than the accuracy of the baseline algorithm [Houska:2011]. However, we note that at such small parameter values, the changes in the predicted controls during the trajectory execution are not "smooth"; in practice, a very small parameter might lead to potentially non-stable behavior. On the other hand, for larger parameter values, the changes in the predicted controls are more "smooth", and at the same values, we report improved performance: our approach achieves a lower total error compared to the common solution of the standard NMPC. When the sub-horizon has a larger length (our results are computed for sub-horizon lengths smaller than the horizon length), the resulting error is lower, but the TV is also higher.
Table II shows that the total error in position is consistently lower compared to [Houska:2011] over different horizon lengths. It is interesting to highlight that even for a low horizon length, the algorithm achieves high tracking accuracy. Both of the compared algorithms have their lowest errors for intermediate horizon lengths.
In Table III, we can see that, over the considered range of noise levels, the average positional error of our algorithm is lower compared to the same error for the standard NMPC [Houska:2011].
In summary, the simulation results demonstrate that with our approach, which requires no manual weight tuning, we can achieve accurate and stable quadrotor navigation. The execution of smooth as well as fast and rapidly changing reference trajectories benefits from online weight adaptation. The results also show that we can attain relatively good tracking performance even under a small horizon length. It is essential to point out that not all online weight update configurations are useful. In other words, not all algorithm setups (for different parameter and sub-horizon choices) provide improved accuracy with "smooth" changes in the predicted controls over the executed trajectories in the simulation.
In this paper, we presented a novel control problem formulation for NMPC with an online update of the cost weights. As a solution, we proposed a two-stage alternating algorithm, consisting of (i) prediction of the state and control variables and (ii) optimal update of the weights. Our evaluation by computer simulation demonstrated not only high accuracy for trajectory tracking but also robustness to noise perturbations. Comparing our approach to the common NMPC with fixed weights, we demonstrated a lower tracking error for the used reference trajectories. Our next step is to test the performance on a real drone platform.