I Introduction
The process of designing high-performance controllers for agile robotic systems that satisfy state and actuation constraints is challenging for systems with important nonlinear dynamical effects. Model predictive control (MPC) can capture appropriate performance objectives and constraints, and it can be used with nonlinear dynamics models if carefully implemented [Kouzoupis2018RecentControl, Gros2020FromIteration, Grandia2020NonlinearFunctions]. However, obtaining a sufficiently accurate dynamical model is crucial to achieve good performance with nonlinear MPC (NMPC). Many learning approaches have been proposed to capture a robot's complex mechanics and environmental interactions, reducing the need for time-consuming system identification (cf. [Rasmussen2018GaussianLearning, Shi2019NeuralDynamics, Taylor2019EpisodicSystems, Recht2019AControl, Kaiser2018SparseLimit]). However, NMPC design based on these models is typically intractable because of the model discretization and the expensive forward simulation needed to evaluate the nonlinear program. Building on our previous work [op_icra_21], we take a Koopman-centric approach to jointly learn a function dictionary and a lifted bilinear model of the dynamics that can readily be used with real-time NMPC to obtain nearly optimal controllers that respect state and actuation constraints.
We focus on learning control-affine dynamics of the form

(1)  $\dot{x} = f(x) + \sum_{j=1}^{m} g_j(x)\, u_j$

where $x \in \mathcal{X} \subseteq \mathbb{R}^n$ is the system state and $u \in \mathbb{R}^m$ is a vector of control inputs. This model characterizes a wide class of aerial and ground robots. Rather than describing a system's behavior by its state-space flows, Koopman methodologies study the evolution of
observables, which are functions over the state space. In this space, a dynamical system can be represented by a linear (but possibly infinite-dimensional) operator [Lan2013LinearizationSpectrum, Mauroy2016LinearOperator], which is approximated in practice by a finite-dimensional model. The extended dynamic mode decomposition (EDMD) [Schmid2010DynamicData, Williams2015ADecomposition, Korda2018LinearControl, Brunton2016DiscoveringSystems, Kaiser2021DatadrivenControl, Proctor2018GeneralizingControl] is a common identification technique. Most current EDMD methods model the unknown control system dynamics by a lifted linear model, which implicitly restricts the control vector fields to be state-invariant. This is a significant limitation, as many robotic systems, e.g. systems where input forces enter the dynamics through rotation matrices, are best described by nonlinear control-affine dynamics. Instead, we learn a model underpinned by the Koopman canonical transform (KCT) [Surana2016KoopmanSystems], which allows a large class of nonlinear control-affine dynamic models to be described by a bilinear (but possibly infinite-dimensional) dynamical system. Our previous work showed how to formulate an EDMD-type method to identify an approximate lifted bilinear model from data and how to utilize the bilinear model for NMPC [op_icra_21]. However, for realistic systems, such as the quadrotor drone considered in the experimental section, the dimension of the lifted model becomes too high when using a fixed function dictionary, rendering the NMPC intractable to implement in real-time.
As a result, we incorporate the bilinear Koopman structure in a neural network model, allowing the function dictionary and bilinear model to be learned jointly from data. This enables similar or improved prediction performance with far fewer observable functions, thus allowing real-time use. Additionally, the resulting model maintains the bilinear structure, making it partially interpretable and making it possible to simulate its behavior forward in time by evaluating the neural network only at the initial state and propagating the lifted state forward using the bilinear system matrices. Choosing a good function dictionary is a central challenge in Koopman-based methods, and using neural networks to jointly learn Koopman models and function dictionaries has been attempted previously. Multiple works have shown that using an encoder-decoder type architecture to parametrize the function dictionary allows many of the benefits of a Koopman-based model to be maintained while obtaining more compact models and/or improving prediction performance compared to fixed dictionaries [Li2017ExtendedOperator, Kaiser2021DatadrivenControl, Wang2021DeepRacing]. However, no existing method utilizes the proper bilinear models to accurately capture control-affine systems, nor has a view towards achieving high-performance control for robotic systems. As such, our contributions are:

We develop a flexible learning method underpinned by the Koopman canonical transformation to a bilinear form that can readily be used for NMPC design.

We show in experiments with a quadrotor that a controller based strictly on the data-driven model can outperform an NMPC based on a known, identified model, while needing limited training data.
This manuscript is organized as follows. Preliminaries on the KCT are presented in Section II, and the problem statement and assumptions are discussed in Section III. Then, the method to jointly learn the function dictionary and bilinear model is presented in Section IV before control design is discussed in Section V. Finally, the method is demonstrated experimentally on a quadrotor drone in Section VI followed by concluding remarks in Section VII.
II Koopman theory preliminaries
II-A Koopman spectral theory
Before considering the effects of control inputs, we introduce the Koopman operator, which is defined for autonomous continuous-time dynamical systems, $\dot{x} = f(x)$, with state $x \in \mathcal{X}$, where $f$ is assumed to be Lipschitz continuous on $\mathcal{X}$. The flow is denoted by $S_t(x)$, defined as $\frac{d}{dt} S_t(x) = f(S_t(x))$ for all $x \in \mathcal{X},\ t \geq 0$. The Koopman operator semigroup $(K_t)_{t \geq 0}$, from now on simply denoted as the Koopman operator, is defined as

(2)  $K_t \varphi = \varphi \circ S_t$

for all $\varphi \in C(\mathcal{X})$, where $C(\mathcal{X})$ is the space of continuous observables $\varphi : \mathcal{X} \to \mathbb{R}$, and $\circ$ denotes function composition. Each $K_t$ is a linear operator.
An eigenfunction of the Koopman operator associated to an eigenvalue $\lambda \in \mathbb{C}$ is any function $\phi \in C(\mathcal{X})$ that defines a coordinate evolving linearly along the flow of the system:

(3)  $(K_t \phi)(x) = \phi(S_t(x)) = e^{\lambda t} \phi(x).$
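The linear evolution in (3) can be checked numerically on a toy system. The sketch below uses the scalar system $\dot{x} = -x$, whose flow is known in closed form; the choice of system and the eigenfunction $\phi(x) = x^2$ (with eigenvalue $\lambda = -2$) are illustrative examples, not from the paper.

```python
import numpy as np

# Toy system: x_dot = -x, with closed-form flow S_t(x) = x * exp(-t).
# phi(x) = x**2 is a Koopman eigenfunction with eigenvalue lambda = -2,
# since phi(S_t(x)) = x**2 * exp(-2t) = exp(lambda * t) * phi(x), as in (3).
def flow(x0, t):
    return x0 * np.exp(-t)

def phi(x):
    return x ** 2

x0, t, lam = 1.5, 0.7, -2.0
lhs = phi(flow(x0, t))             # eigenfunction evaluated along the flow
rhs = np.exp(lam * t) * phi(x0)    # linear evolution predicted by (3)
assert np.isclose(lhs, rhs)
```

The same check fails for a generic observable (e.g. $\varphi(x) = x + x^2$), which is a quick way to test whether a candidate dictionary function is close to an eigenfunction.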
II-B The Koopman canonical transform
We now return to the control-affine dynamics (1) and recall when and how they can be transformed to a bilinear form through the Koopman canonical transform [Surana2016KoopmanSystems]. Let $\{(\lambda_i, \phi_i)\}_{i=1}^{N}$ be eigenvalue-eigenfunction pairs of the Koopman operator associated with the autonomous dynamics of (1). The KCT relies on the assumption that the state vector can be described by a finite number of eigenfunctions, i.e. that $x = \sum_{i=1}^{N} \phi_i(x)\, v_i$ for all $x \in \mathcal{X}$, where $v_i \in \mathbb{C}^n$. This is likely to hold if $N$ is large. If the states cannot be described exactly, they may be well approximated by eigenfunctions.

When this assumption holds, the KCT is defined as

(4)  $z = T(x) = [\phi_1(x), \ldots, \phi_N(x)]^\top, \qquad \dot{z} = \Lambda z + \sum_{j=1}^{m} u_j\, L_{g_j} T(x)$

where $L_{g_j}$ denotes the Lie derivative along the control vector field $g_j$, and $\Lambda$ is a diagonal matrix with entries $\lambda_1, \ldots, \lambda_N$.
Under certain conditions, system (4) is bilinearizable in a countable, possibly infinite basis. We restate the conditions for the existence of such a bilinear form in the following theorem, which underpins our learning method.
Theorem 1.
Hereafter, the finite basis of functions used in the KCT is termed the function dictionary.
Although the conditions of Theorem 1 may be hard to satisfy in a given problem, an approximation of the true system (1) can be obtained with sufficiently small approximation error by including adequately many eigenfunctions in the basis. As a result, the system can be expressed as the Koopman bilinear form (KBF) (see [Goswami2018GlobalApproach] for details):

(5)  $\dot{z} = \Lambda z + \sum_{j=1}^{m} u_j B_j z.$
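A practical benefit of the KBF is that forward simulation only requires matrix products once the lifted state is known. The following is a minimal numpy sketch of integrating a bilinear system of the form (5) with forward Euler; the matrices, dimensions, and step size are toy stand-ins, not identified from data.

```python
import numpy as np

# Sketch: forward-Euler simulation of a Koopman bilinear form (5),
# z_dot = Lambda z + sum_j u_j B_j z. All matrices below are illustrative.
rng = np.random.default_rng(0)
N, m, dt = 4, 2, 0.01
Lam = np.diag(-rng.uniform(0.5, 2.0, N))             # stable diagonal drift
B = [0.1 * rng.standard_normal((N, N)) for _ in range(m)]

def step(z, u):
    """One Euler step of the bilinear lifted dynamics."""
    zdot = Lam @ z + sum(u[j] * B[j] @ z for j in range(m))
    return z + dt * zdot

z = rng.standard_normal(N)
for _ in range(100):
    z = step(z, np.array([0.1, -0.2]))
```

Note that the network that produces the lifted state is evaluated only at the initial condition; all subsequent propagation happens in the lifted space, which is what makes multi-step prediction cheap inside an MPC loop.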
III System modeling and data collection
III-A Modeling assumptions
We model the quadrotor dynamics using states global position $p \in \mathbb{R}^3$, velocity $v \in \mathbb{R}^3$, attitude rotation matrix $R \in SO(3)$, and body angular velocity $\omega \in \mathbb{R}^3$, and consider the following dynamics:

(6)  $\dot{p} = v, \quad m\dot{v} = mg + R f_u + f_a, \quad \dot{R} = R S(\omega), \quad J\dot{\omega} = J\omega \times \omega + \tau_u + \tau_a$

where $m$ and $J$ are the vehicle mass and inertia matrix, respectively, $S(\cdot)$ is a skew-symmetric mapping, and $g$ is the gravitational force vector. The state- and control-dependent functions $f_a$ and $\tau_a$ capture unmodelled dynamic terms, such as the near-ground effect.
As is common, we abstract the system's control inputs to a total thrust in the body z-direction, $T$, and body torques $\tau_x, \tau_y, \tau_z$. The mapping from rotor speeds to the abstract controls is typically modeled by a linear combination of the squared rotor speeds:

(7)  $[T,\ \tau_x,\ \tau_y,\ \tau_z]^\top = B(c_T, c_Q, l)\, [n_1^2,\ n_2^2,\ n_3^2,\ n_4^2]^\top$

where $c_T$ and $c_Q$ are the propeller thrust and torque coefficients, $l$ is the distance from the vehicle's center of gravity to each propeller axle, and $n_i$ are the propeller rotation rates. Inverting this mapping enables the abstracted controls to be translated into propeller rotation rates.
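To make the mixing (7) and its inversion concrete, the sketch below builds an illustrative mixing matrix for a "+" rotor layout. Both the layout and the numeric coefficients are assumptions for demonstration only (the actual vehicle geometry and coefficients differ).

```python
import numpy as np

# Illustrative mixing (7) for an assumed "+" rotor layout. The coefficient
# values and sign conventions below are demonstration stand-ins only.
c_T, c_Q, l = 1.7e-5, 2.9e-7, 0.046   # thrust coeff, torque coeff, arm length
B = np.array([
    [ c_T,      c_T,     c_T,      c_T    ],   # total thrust T
    [ 0.0,      l * c_T, 0.0,     -l * c_T],   # roll torque tau_x
    [-l * c_T,  0.0,     l * c_T,  0.0    ],   # pitch torque tau_y
    [-c_Q,      c_Q,    -c_Q,      c_Q    ],   # yaw torque tau_z
])

def rotor_rates(u):
    """Invert the mixing to recover rotor rates n_i from abstract controls u."""
    n_sq = np.linalg.solve(B, u)               # squared rates
    return np.sqrt(np.clip(n_sq, 0.0, None))   # clip guards negative roundoff

u = np.array([0.33, 0.0, 0.0, 0.0])    # hover-like command: thrust only
n = rotor_rates(u)
assert np.allclose(B @ n**2, u)        # round trip back to abstract controls
```

The round-trip check is exact whenever the commanded wrench is achievable with non-negative squared rotor speeds; in practice saturation limits must also be respected.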
III-B The learning process and data collection
To collect the learning data set, we assume that $M$ data collection trajectories of length $T_s$, from initial conditions $x_0^{(j)}$, can be executed under a nominal controller. From each trajectory, states and control actions are sampled at a fixed time interval $\Delta t$, resulting in a data set

(8)  $\mathcal{D} = \big\{ \big( x_k^{(j)},\ u_k^{(j)},\ x_{k+1}^{(j)} \big) \big\}_{k=0,\ldots,K-1}^{\,j=1,\ldots,M}$

where $x_{k+1}$ is the state at the next timestep, i.e. $x_{k+1} = x(t_k + \Delta t)$.
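Assembling the data set (8) amounts to collecting transition triples from each sampled trajectory. A minimal sketch, with synthetic stand-in trajectories and illustrative shapes:

```python
import numpy as np

# Assemble transition triples (x_k, u_k, x_{k+1}) from sampled trajectories,
# mirroring the data set (8). Trajectory contents here are synthetic stand-ins.
def build_dataset(trajectories):
    """trajectories: list of (X, U) with X of shape (K+1, n), U of shape (K, m)."""
    data = []
    for X, U in trajectories:
        for k in range(U.shape[0]):
            data.append((X[k], U[k], X[k + 1]))
    return data

rng = np.random.default_rng(1)
trajs = [(rng.standard_normal((11, 3)), rng.standard_normal((10, 2)))
         for _ in range(4)]
dataset = build_dataset(trajs)
assert len(dataset) == 4 * 10
```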
Since NMPC design requires continuous-time models to be discretized, we learn a discrete-time lifted bilinear model directly, thereby avoiding potential numerical differentiation and discretization errors. This is further motivated by the existence of discretization procedures that maintain stability properties and the bilinear structure of the original system, such as the trapezoidal rule with zero-order hold [Phan2012DiscretetimeModels, Surana2018KoopmanEstimation].
IV Joint learning of the Koopman dictionary and model
To formulate the learning problem, we follow an approach analogous to EDMD methods, where we seek a dictionary of nonlinear transformations that lifts the state to a higher-dimensional space in which the system's dynamics can be described by a bilinear model [op_icra_21]. However, unlike classical EDMD, we parametrize the function dictionary by a neural network, and jointly learn the function dictionary and bilinear model matrices. As a result, one set of parameters defines the neural network representing the function dictionary, while another set represents the bilinear model matrices described in Section II.
The key idea of the learning method is that we can encode the function dictionary parametrization and the bilinear model structure in a single neural network. Then, once training is complete, the learned function dictionary and model matrices can be extracted to maintain the benefits of Koopman-based learning methods outlined in Section I, while improving the prediction performance and/or reducing the dimension of the function dictionary. To achieve this, we formulate a loss function to be minimized by empirical risk minimization over all the input-output pairs in the training data set $\mathcal{D}$:

(9)  $\mathcal{L} = \mathcal{L}_{\text{recon}} + \alpha\, \mathcal{L}_{\text{pred}} + \beta\, \mathcal{L}_{\text{lift}}$
where $\mathcal{L}_{\text{recon}}$ is the mean squared error (MSE) of the reconstruction obtained by lifting the system's state with the encoder and projecting back down to the original state with the projection matrix $C$, $\mathcal{L}_{\text{pred}}$ is the MSE of the model's one-step prediction error projected down to the original state space, and $\mathcal{L}_{\text{lift}}$ is the MSE of the one-step prediction error in the lifted space. Tunable hyperparameters $\alpha$ and $\beta$ determine the weights of the prediction and KCT losses, respectively, relative to the reconstruction loss. The learned model should realize good prediction performance in the original state space: minimizing $\mathcal{L}_{\text{pred}}$ encourages this goal. Loss terms $\mathcal{L}_{\text{recon}}$ and $\mathcal{L}_{\text{lift}}$ respectively promote accurate projection of the function dictionary to the original state space and an approximately bilinear dynamical system model in the lifted space. The lifted-space loss also promotes good prediction accuracy over multiple time steps, since multi-step prediction is performed in the lifted space and the result is projected to the original state space only at relevant times.
Figure 1 depicts the neural network architecture that implements loss (9). The autoencoder (Fig. 1a) passes the system state through the neural network to obtain the lifted state. Subsequently, the lifted state is projected back to the original state space through the projection matrix $C$, resulting in the reconstructed state, which is compared to the original state in $\mathcal{L}_{\text{recon}}$. The state prediction model (Fig. 1b) passes the system state through the neural network to obtain the lifted state before evolving the lifted state one timestep with the bilinear model to get the one-step-ahead lifted state. The lifted state prediction is projected to the original state space to get the one-step-ahead state prediction, which is compared to the true one-step-ahead state in $\mathcal{L}_{\text{pred}}$. Finally, the lifted state prediction model (Fig. 1c) follows the same forward pass through the state prediction model to get the one-step-ahead lifted state prediction. This is compared in $\mathcal{L}_{\text{lift}}$ to the true one-step-ahead lifted state, obtained by passing the true one-step-ahead state through the neural network.
This approach can be implemented using modern neural network software packages, with the basis functions modeled as a feedforward neural network having fully connected layers, and the projection and bilinear model matrices as single fully connected layers with no nonlinear activations. The entire model can be trained simultaneously using gradient-based learning algorithms applied to the loss function (9).
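The three loss terms above can be sketched compactly. The following numpy snippet uses a small random tanh network as a stand-in for the trained dictionary; all names, shapes, and values are illustrative (the paper's implementation uses PyTorch and trains these quantities).

```python
import numpy as np

# Numpy sketch of the loss terms in (9). phi plays the role of the learned
# dictionary network; C, F, G are the projection and bilinear model matrices.
# All shapes and values here are toy stand-ins for trained parameters.
rng = np.random.default_rng(2)
n, m, N = 3, 2, 8
W1, W2 = rng.standard_normal((N, n)), rng.standard_normal((N, N))

def phi(x):                        # dictionary: small tanh network
    return np.tanh(W2 @ np.tanh(W1 @ x))

C = rng.standard_normal((n, N))    # projection back to the state space
F = rng.standard_normal((N, N))    # lifted drift matrix
G = [rng.standard_normal((N, N)) for _ in range(m)]

def loss(x, u, x_next, alpha=1.0, beta=0.2):
    z = phi(x)
    z_pred = F @ z + sum(u[j] * G[j] @ z for j in range(m))   # bilinear step
    l_recon = np.mean((C @ z - x) ** 2)             # reconstruction MSE
    l_pred = np.mean((C @ z_pred - x_next) ** 2)    # state prediction MSE
    l_lift = np.mean((z_pred - phi(x_next)) ** 2)   # lifted prediction MSE
    return l_recon + alpha * l_pred + beta * l_lift

x, u, x_next = rng.standard_normal(n), rng.standard_normal(m), rng.standard_normal(n)
assert loss(x, u, x_next) >= 0.0
```

In training, this scalar would be averaged over mini-batches of transition triples and minimized with a gradient-based optimizer.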
V Nonlinear model predictive control design
V-A Design considerations
We formulate the NMPC problem using the bilinear model identified by the Koopman framework [op_icra_21]:

(10)  $\min_{\{u_k\},\{z_k\}} \ \sum_{k=0}^{N_p-1} \big( \| C z_k - x^{\text{ref}}_k \|_Q^2 + \| u_k \|_R^2 \big) + \| C z_{N_p} - x^{\text{ref}}_{N_p} \|_{Q_{N_p}}^2$
      $\text{s.t.} \quad z_{k+1} = F z_k + \sum_{j=1}^{m} u_{j,k} G_j z_k, \quad z_0 = \phi(x_0),$
      $\qquad\ \ D_x C z_k \leq b_x, \quad D_u u_k \leq b_u$

where $F$ and $G_j$ are the learned discrete-time bilinear model matrices, $C$ projects the lifted state to the original state space, and $x^{\text{ref}}_k$ is the desired trajectory to be tracked.
Although (10) models a quadratic cost function and linear state and actuation constraints, nonlinear objective and constraint terms can be included by adding them to the lifted state. For example, if it is desired to enforce a constraint $h(x) \leq b$, we can add $h$ to the lifted state and enforce $z_i \leq b$, where $i$ is the index corresponding to $h$ [Korda2018LinearControl].
V-B Sequential quadratic programming
The optimization constraint set is generally nonconvex due to the bilinear term between the lifted state and the control inputs in the lifted dynamic model. As a result, we solve the optimization problem (10) using sequential quadratic programming (SQP). The nonconvex optimization problem (10) is sequentially approximated by quadratic programs (QPs), whose solutions are Newton directions for performing steps toward the optimal solution of the nonlinear program (NLP). The sequence is initialized at a guessed solution $(\bar{z}, \bar{u})$, at which the following QP is iteratively solved. The guess is updated at each iteration $i$ until convergence:

(11)  $\min_{\{u_k\},\{z_k\}} \ \sum_{k=0}^{N_p-1} \big( \| C z_k - x^{\text{ref}}_k \|_Q^2 + \| u_k \|_R^2 \big) + \| C z_{N_p} - x^{\text{ref}}_{N_p} \|_{Q_{N_p}}^2$
      $\text{s.t.} \quad z_{k+1} = A_k z_k + B_k u_k + r_k, \quad z_0 = \phi(x_0),$
      $\qquad\ \ D_x C z_k \leq b_x, \quad D_u u_k \leq b_u$

where $A_k = F + \sum_{j=1}^{m} \bar{u}_{j,k} G_j$, $B_k = [G_1 \bar{z}_k \ \cdots \ G_m \bar{z}_k]$, and $r_k = -\sum_{j=1}^{m} \bar{u}_{j,k} G_j \bar{z}_k$.
As a result of the bilinear structure of the dynamics model, the linearization can be efficiently computed for a given initial guess by simple matrix multiplication and addition with the dynamics matrices of the Koopman model and the matrices containing the initial guesses of the lifted states and control inputs.
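The linearization above reduces to a few matrix products. A minimal numpy sketch (matrix names and dimensions are illustrative), assuming the discrete-time bilinear form $z_{k+1} = Fz_k + \sum_j u_{j,k} G_j z_k$:

```python
import numpy as np

# Linearize the bilinear dynamics z+ = F z + sum_j u_j G_j z around a guess
# (z_bar, u_bar): z+ ~ A z + B u + r. Only matrix products are required.
def linearize(F, G, z_bar, u_bar):
    A = F + sum(u_bar[j] * G[j] for j in range(len(G)))
    B = np.column_stack([Gj @ z_bar for Gj in G])
    r = -sum(u_bar[j] * G[j] @ z_bar for j in range(len(G)))
    return A, B, r

rng = np.random.default_rng(3)
N, m = 5, 2
F = rng.standard_normal((N, N))
G = [rng.standard_normal((N, N)) for _ in range(m)]
z_bar, u_bar = rng.standard_normal(N), rng.standard_normal(m)

A, B, r = linearize(F, G, z_bar, u_bar)
exact = F @ z_bar + sum(u_bar[j] * G[j] @ z_bar for j in range(m))
assert np.allclose(A @ z_bar + B @ u_bar + r, exact)  # exact at the guess
```

Because the dynamics are bilinear (linear in $z$ for fixed $u$ and vice versa), this affine approximation is exact at the linearization point and has no second-order cross terms beyond the single bilinear product, which is what keeps each SQP iteration cheap.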
V-C Warm start of SQP at each timestep
As discussed in Section V-B, the SQP algorithm requires an initial guess of the solution. Selecting an initial guess that is sufficiently close to the true optimal solution is essential for the algorithm to converge quickly and reliably [Gros2020FromIteration]. It is well known that the receding horizon nature of MPC can be exploited to obtain excellent initial guesses. At a given time instant, this can be achieved by shifting the NMPC solution from the previous timestep and updating the guess of the final control input. Under certain conditions, a locally stable controller enforcing state and actuation constraints can be designed, allowing feasibility of the initial guess to be guaranteed [Rawlings2012ModelDesign]. Typically, simpler approaches are taken, such as adding a copy of the final control signal and calculating the implied final state using the dynamics model:

(13)  $\bar{u}^{+} = [u_1^*, \ldots, u_{N_p-1}^*, u_{N_p-1}^*], \qquad \bar{z}^{+} = \big[z_1^*, \ldots, z_{N_p}^*,\ F z_{N_p}^* + \textstyle\sum_{j=1}^{m} u_{j,N_p-1}^* G_j z_{N_p}^*\big]$
If the previous solution is feasible, the shifted solution will also be feasible for all but the last timestep.
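The shift-and-append warm start can be sketched in a few lines; the array layout and names are illustrative, and the bilinear model matrices are toy stand-ins:

```python
import numpy as np

# Shift the previous NMPC solution one step and append a guess for the new
# final stage: repeat the last input and propagate the last lifted state
# through the discrete-time bilinear model, as in the shift described above.
def shift_warm_start(Z, U, F, G):
    """Z: (N+1, nz) lifted-state guess, U: (N, m) input guess."""
    u_last = U[-1]
    z_new = F @ Z[-1] + sum(u_last[j] * G[j] @ Z[-1] for j in range(len(G)))
    Z_next = np.vstack([Z[1:], z_new])     # drop first stage, append new one
    U_next = np.vstack([U[1:], u_last])    # repeat the final control input
    return Z_next, U_next

rng = np.random.default_rng(4)
nz, m, N = 4, 2, 10
F = rng.standard_normal((nz, nz))
G = [rng.standard_normal((nz, nz)) for _ in range(m)]
Z, U = rng.standard_normal((N + 1, nz)), rng.standard_normal((N, m))
Z2, U2 = shift_warm_start(Z, U, F, G)
assert Z2.shape == Z.shape and U2.shape == U.shape
```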
V-D Koopman NMPC algorithm
The initialization and closed-loop operation of the controller can be summarized as follows (see Algorithm 1). An initial guess of the state and control trajectories is chosen, and the states are lifted with the function dictionary at each timestep. Then, before task execution, the SQP algorithm with the Koopman QP subproblem (11) is executed to convergence to obtain a good initial guess of the solution. Subsequently, in closed-loop operation, the Koopman bilinear model is linearized along the current guess. The current state is obtained from the system and lifted using the function dictionary, and the QP subproblem is solved only once. Finally, the first control input of the optimal control sequence is deployed to the system, and the full solution is shifted one timestep and used as the initial guess at the next timestep.
VI Experiments
We conduct experiments to demonstrate the performance of the proposed method on a quadrotor drone. To capture the nonlinearity in the quadrotor dynamics, we perform 6-second-long "Figure 8" tracking maneuvers (see Fig. 2b) at very low altitudes above the ground, so that the "ground effect" corrupts the nominal model. The coordinates of the "Figure 8" trajectory require high roll and pitch maneuvers, and the low-altitude flights highlight the effect of unmodeled dynamics, encapsulated in the unmodeled force and torque terms in (6).
VI-A Implementation and experimental details
The experiments are conducted with a commercially available Crazyflie 2.1 quadrotor. Global position is measured via an OptiTrack motion capture system (tracking at 120 Hz) and fused with the onboard IMU data to get filtered state estimates. The all-up quadrotor weight is 33.6 g, with a thrust-to-weight ratio of 1.7. Control commands are computed on an offboard computer and communicated to the drone over radio; see Fig. 2a. All communication with the drone is done using the Crazyswarm Python API [Preiss2017Crazyswarm:Swarm].

To collect training data, waypoints were sampled uniformly at random within a bounded region and tracked with a PID position controller for a total of 4 minutes. The estimated position, attitude, linear and angular velocities, motor pulse width modulation (PWM) signals, and battery voltage were collected at 500 Hz. The state estimates were downsampled, and the PWMs and voltage were averaged over each 0.02 s interval, to obtain a dataset at 50 Hz, the target update rate for the controller. This smoothing process better captures the motor PWM inputs, as the raw data showed significant variability within each 0.02 s period. Finally, we map PWMs and voltage to thrust using the model identified in [Shi2020NeuralSwarm2:Interactions] (see Tab. 1 therein) and apply the thrust mixing in (7) to get the total thrust and torques around each axis, resulting in the inputs used in the learning process.
A discrete-time lifted bilinear model with lifting dimension 23 is learned as described in Section IV. Thirty percent of the data set is held out for validation and testing, and the hyperparameters are tuned to obtain good open-loop prediction performance over 0.5 seconds, the same prediction horizon used in the model predictive controllers. The final parameters used are included in Tab. I. The neural network is implemented using PyTorch [Paszke2019PyTorch:Library] (code available at github.com/Cafolkes/koopmanlearningandcontrol). To simplify encoding the objective and constraints of the NMPC, the state itself is added to the lifted state, making the projection matrix known. This makes the reconstruction loss in (9) redundant, and we omit it during training.
Two controllers are implemented for the task. First, a nominal NMPC using the dynamic model (6), with the residual terms set to zero and the system parameters described in Tab. II, is used as a performance baseline. Second, the Koopman NMPC described in Section V, using the learned model, is implemented. Both controllers are coded in Python using the OSQP quadratic program solver [Stellato2018OSQP:Programs]. The objective penalizes errors between the state and the desired trajectory in the position coordinates. The control inputs are constrained according to the vehicle's mechanical limitations. The position error penalties, Q, the control effort penalties, R, and the remaining control parameters are described in Tab. II. The control thrusts and desired attitude are calculated by the NMPCs and sent to the onboard drone controller, which decides the final motor control allocations. This architecture enables both controllers to cycle at 50 Hz while attitude tracking runs at a higher rate onboard the drone.
VI-B Results and discussion
Fig. 2c and Tab. III detail the tracking performance of the nominal and Koopman NMPCs tracking a 6-second-long "Figure 8" trajectory over 10 consecutive experiment runs. The yaw and pitch performance is comparable, indicating accurate extraction of the roll-pitch-yaw dynamics. Furthermore, the ground effect that induces large tracking errors in the nominal NMPC is successfully captured by the Koopman-learned dynamics. Note that the drone both starts and ends at zero velocity, causing increased tracking error at the beginning and end of the trajectory.
To further study this conjecture, we executed 5 "Figure 8" trajectories at decreasing altitudes, as shown in Fig. 3. At altitudes of 0.25 m and higher, the tracking performance of the nominal and Koopman NMPCs is similar. However, at lower altitudes the nominal NMPC loses stability, leading to catastrophic crashes. In contrast, the Koopman NMPC completed all five tests, albeit with increased tracking error as the altitude setpoint approaches 0.05 m above the ground. We also note that the Koopman NMPC exploits the added buoyancy resulting from the ground effect at lower altitudes to significantly reduce the thrust needed to complete the task.
TABLE I: Learning hyperparameters
# of encoder layers      2      Learning rate            1e-3
Layer width              100    KCT loss penalty         2e-1
Activation function      tanh   Regularization strength  5e-4
Total lifting dimension  23     # of epochs              200
TABLE II: Nominal model and controller parameters
Mass        33.6 g   State error penalty, Q   [10, 10, 10]
Max thrust  57.0 g   Control penalty, R       [1, 1, 1, 1]
[Landry2016AggressiveProgramming]   1.7e-5   NMPC prediction horizon   0.5 s
[Landry2016AggressiveProgramming]   1.7e-5   Controller timestep       0.02 s
[Landry2016AggressiveProgramming]   2.9e-5
TABLE III: Tracking performance over 10 runs
                                      Nominal NMPC   Koopman NMPC
Avg. position tracking error (MSE)    4.7e-3         4.8e-3
Avg. control effort (thrust norm)     10.7           10.5
VII Conclusion
The coupling of Koopman-based bilinear models and NMPC allows for real-time optimal control of robots that captures important nonlinearities, while respecting critical state and control limits. Function dictionary design is a key challenge in Koopman-based modeling and control methods. This paper presented a method to jointly learn a lifted Koopman bilinear model and KCT function dictionary strictly from data, enabling more compact models and/or better prediction performance compared to predefined function dictionaries. More compact models, i.e. lower lifting dimensions, are crucial for robotic applications where real-time control is needed. Data-driven models are valuable in cases where first-principles modeling may be difficult. Our quadrotor drone experiments demonstrate good prediction performance with a lifting dimension of only 23. The associated Koopman NMPC can match the performance of an NMPC based on the nominal model (6) far away from the ground and outperforms the nominal NMPC in near-ground regimes. Notably, the nominal controller is unable to maintain stable flight near the ground, whereas the Koopman NMPC maintains acceptable tracking performance down to 0.05 m altitude.
Future work includes deriving model error bounds for the learned model and exploiting the bounds to derive theoretical guarantees on controller performance. This can be achieved by extending our previous work to the bilinear model setting [op_acc_21].