Quadrotor Trajectory Tracking with Learned Dynamics: Joint Koopman-based Learning of System Models and Function Dictionaries

10/20/2021
by   Carl Folkestad, et al.

Nonlinear dynamical effects are crucial to the operation of many agile robotic systems. Koopman-based model learning methods can capture these nonlinear dynamical system effects in higher dimensional lifted bilinear models that are amenable to optimal control. However, standard methods that lift the system state using a fixed function dictionary before model learning result in high dimensional models that are intractable for real-time control. This paper presents a novel method that jointly learns a function dictionary and a lifted bilinear model purely from data by incorporating the Koopman model in a neural network architecture. Nonlinear MPC design utilizing the learned model can be performed readily. We experimentally realized this method on a multirotor drone for agile trajectory tracking at low altitudes, where the aerodynamic ground effect influences the system's behavior. Experimental results demonstrate that the learning-based controller achieves performance similar to that of a nonlinear MPC based on a nominal dynamics model at medium altitude. However, our learning-based system can reliably track trajectories in near-ground flight regimes, while the nominal controller crashes due to unmodeled dynamical effects that are captured by our method.



I Introduction

The process of designing high performance controllers for agile robotic systems that satisfy state and actuation constraints is challenging for systems with important nonlinear dynamical effects. Model predictive control (MPC) can capture appropriate performance objectives and constraints, and it can be used with nonlinear dynamics models if carefully implemented [Kouzoupis2018RecentControl, Gros2020FromIteration, Grandia2020NonlinearFunctions]. However, obtaining a sufficiently accurate dynamical model is crucial to achieving good performance with nonlinear MPC (NMPC). Many learning approaches have been proposed to capture a robot's complex mechanics and environmental interactions, reducing the need for time-consuming system identification (cf. [Rasmussen2018GaussianLearning, Shi2019NeuralDynamics, Taylor2019EpisodicSystems, Recht2019AControl, Kaiser2018SparseLimit]). However, NMPC design based on these models is typically intractable because of the model discretization and the expensive forward simulation needed to evaluate the nonlinear program. Building on our previous work [op_icra_21], we take a Koopman-centric approach to jointly learn a function dictionary and a lifted bilinear model of the dynamics that can readily be used with real-time NMPC to obtain nearly optimal controllers that respect state and actuation constraints.

We focus on learning control-affine dynamics of the form

    ẋ = f(x) + Σ_{i=1}^m g_i(x) u_i    (1)

where x ∈ X ⊆ R^n is the system state and u ∈ R^m is a vector of control inputs. This model characterizes a wide class of aerial and ground robots. Rather than describing a system's behavior by its state space flows, Koopman methodologies study the evolution of observables, which are functions over the state space. In this space, a dynamical system can be represented by a linear (but possibly infinite dimensional) operator [Lan2013LinearizationSpectrum, Mauroy2016LinearOperator], which is approximated in practice by a finite dimensional model. The extended dynamic mode decomposition (EDMD) [Schmid2010DynamicData, Williams2015ADecomposition, Korda2018LinearControl, Brunton2016DiscoveringSystems, Kaiser2021Data-drivenControl, Proctor2018GeneralizingControl] is a common identification technique.

Most current EDMD methods model the unknown control system dynamics by a lifted linear model, which implicitly restricts the control vector fields, g_i(x), to be state-invariant. This is a significant limitation, as many robotic systems, e.g. systems where input forces enter the dynamics through rotation matrices, are best described by nonlinear control-affine dynamics. Instead, we learn a model underpinned by the Koopman canonical transform (KCT) [Surana2016KoopmanSystems], which allows a large class of nonlinear control-affine dynamic models to be described by a bilinear (but possibly infinite dimensional) dynamical system. Our previous work showed how to formulate an EDMD-type method to identify an approximate lifted bilinear model from data and how to utilize the bilinear model for NMPC [op_icra_21]. However, for realistic systems, such as the quadrotor drone considered in the experimental section, the dimension of the lifted model becomes too high when using a fixed function dictionary, rendering the NMPC intractable to implement in real time.

As a result, we incorporate the bilinear Koopman structure in a neural network model, allowing the function dictionary and bilinear model to be learned jointly from data. This enables similar or improved prediction performance with far fewer observable functions, thus allowing real-time use. Additionally, the resulting model maintains the bilinear structure, making it partially interpretable and making it possible to simulate its behavior forward in time by evaluating the neural network only at the initial state and propagating the lifted state forward using the bilinear system matrices. Choosing a good function dictionary is a central challenge in Koopman-based methods, and using neural networks to jointly learn Koopman models and function dictionaries has been attempted previously. Multiple works have shown that using an encoder-decoder type architecture to parametrize the function dictionary allows many of the benefits of a Koopman-based model to be maintained while obtaining more compact models and/or improving prediction performance compared to fixed dictionaries [Li2017ExtendedOperator, Kaiser2021Data-drivenControl, Wang2021DeepRacing]. However, no existing method utilizes the proper bilinear models to accurately capture control-affine systems, nor is designed with a view toward achieving high performance control for robotic systems. As such, our contributions are:

  1. We develop a flexible learning method underpinned by the Koopman canonical transformation to a bilinear form that can readily be used for NMPC design.

  2. We show in experiments with a quadrotor that a controller based strictly on the data-driven model can outperform an NMPC based on a known, identified model, while needing limited training data.

This manuscript is organized as follows. Preliminaries on the KCT are presented in Section II, and the problem statement and assumptions are discussed in Section III. Then, the method to jointly learn the function dictionary and bilinear model is presented in Section IV before control design is discussed in Section V. Finally, the method is demonstrated experimentally on a quadrotor drone in Section VI followed by concluding remarks in Section VII.

II Koopman theory preliminaries

II-A Koopman spectral theory

Before considering the effects of control inputs, we introduce the Koopman operator, which is defined for autonomous continuous-time dynamical systems, ẋ = f(x), with state x ∈ X ⊆ R^n, where f is assumed to be Lipschitz continuous on X. The flow is denoted by τ_t, defined as (d/dt) τ_t(x) = f(τ_t(x)) for all x ∈ X and t ≥ 0. The Koopman operator semi-group (U_t)_{t≥0}, from now on simply denoted as the Koopman operator, is defined as:

    U_t φ = φ ∘ τ_t    (2)

for all φ ∈ C(X), where C(X) is the space of continuous observables φ : X → R, and ∘ denotes function composition. Each element U_t of the Koopman operator is a linear operator.

An eigenfunction of the Koopman operator associated to an eigenvalue λ ∈ C is any function φ_λ : X → C that defines a coordinate evolving linearly along the flow of the system:

    (U_t φ_λ)(x) = φ_λ(τ_t(x)) = e^{λt} φ_λ(x)    (3)
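As a toy illustration of the definitions above (our example, not from the paper): for the scalar linear system ẋ = λx, whose flow is τ_t(x) = e^{λt}x, the identity observable φ(x) = x is a Koopman eigenfunction with eigenvalue λ. A short numerical check:

```python
import numpy as np

lam = -0.5                                 # eigenvalue of the linear system x_dot = lam * x
flow = lambda t, x: np.exp(lam * t) * x    # closed-form flow tau_t(x)
phi = lambda x: x                          # candidate Koopman eigenfunction

# Check phi(tau_t(x)) = exp(lam * t) * phi(x) on a grid of states and times
xs = np.linspace(-2.0, 2.0, 9)
for t in (0.1, 0.5, 1.0):
    lhs = phi(flow(t, xs))             # eigenfunction evaluated along the flow
    rhs = np.exp(lam * t) * phi(xs)    # linear evolution predicted by the eigenfunction property
    assert np.allclose(lhs, rhs)
print("phi(x) = x is a Koopman eigenfunction with eigenvalue", lam)
```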

II-B The Koopman canonical transform

We now return to the control-affine dynamics (1) and recall when and how they can be transformed to a bilinear form through the Koopman canonical transform [Surana2016KoopmanSystems]. Let {(λ_i, φ_i)}_{i=1}^N be eigenvalue-eigenfunction pairs of the Koopman operator associated with the autonomous dynamics of (1). The KCT relies on the assumption that the state vector can be described by a finite number of eigenfunctions, i.e. that x = C T(x) for all x ∈ X, where T(x) = [φ_1(x), …, φ_N(x)]^T and C ∈ R^{n×N}. This is likely to hold if N is large. If not, the state may still be well approximated by a linear combination of eigenfunctions.

When this assumption holds, the KCT is defined as

    ż = Λz + ∇_x T(x) g(x) u,    x = Cz    (4)

where z = T(x), x = Cz, and Λ = diag(λ_1, …, λ_N) is a diagonal matrix with entries λ_i.

Under certain conditions, system (4) is bilinearizable in a countable, possibly infinite basis. We restate the conditions for the existence of such a bilinear form in the following theorem, which underpins our learning method.

Theorem 1.

[Goswami2018GlobalApproach] Suppose there exist N Koopman eigenfunctions φ_1, …, φ_N, where N ≥ n, of the autonomous dynamics of (1) whose span, span{φ_1, …, φ_N}, forms an invariant subspace of the Lie derivative operators L_{g_1}, …, L_{g_m}. Then, the system (1), and in turn system (4), are bilinearizable with an N-dimensional state space.

Hereafter, the finite basis of functions used in the KCT is termed the function dictionary.

Although the conditions of Theorem 1 may be hard to satisfy in a given problem, an approximation of the true system (1) can be obtained with sufficiently small approximation error by including adequately many eigenfunctions in the basis. As a result, the lifted state evolves approximately bilinearly, and the system can be expressed as the Koopman bilinear form (KBF) (see [Goswami2018GlobalApproach] for details):

    ż = Fz + Σ_{i=1}^m B_i z u_i,    x = Cz    (5)
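To make the role of the KBF concrete, the following sketch (toy matrices of our own, not a learned model) simulates a discrete-time analogue of (5), z⁺ = Fz + Σ_i B_i z u_i, lifting only once and projecting back with C at every step:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 6, 3, 2                      # lifted dim, state dim, input dim

F = np.eye(N) + 0.01 * rng.standard_normal((N, N))          # lifted drift (toy)
B = [0.01 * rng.standard_normal((N, N)) for _ in range(m)]  # bilinear input matrices
C = rng.standard_normal((n, N))        # projection back to the original state

def step(z, u):
    """One step of the lifted bilinear model: z+ = F z + sum_i B_i z u_i."""
    return F @ z + sum(Bi @ z * ui for Bi, ui in zip(B, u))

z = rng.standard_normal(N)
xs = []
for k in range(50):                    # roll the model forward without re-lifting
    u = np.array([np.sin(0.1 * k), np.cos(0.1 * k)])
    z = step(z, u)
    xs.append(C @ z)                   # project to the original state space
xs = np.array(xs)
print(xs.shape)   # (50, 3)
```

Note that the forward rollout only ever multiplies by the model matrices; the (possibly expensive) lifting map is evaluated once at the initial condition.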

III System modeling and data collection

III-A Modeling assumptions

We model the quadrotor dynamics using states global position p ∈ R^3, velocity v ∈ R^3, attitude rotation matrix R ∈ SO(3), and body angular velocity ω ∈ R^3, and consider the following dynamics:

    ṗ = v,    m v̇ = m g + R f_u + f_a,
    Ṙ = R S(ω),    J ω̇ = J ω × ω + τ_u + τ_a    (6)

where m and J are the vehicle mass and inertia matrix, respectively, S(·) is the skew-symmetric mapping, and g is the gravitational acceleration vector. The inputs f_u = [0, 0, T]^T and τ_u = [τ_x, τ_y, τ_z]^T are the abstracted control force and torques defined below. The state- and control-dependent functions f_a and τ_a capture unmodelled dynamic terms, such as the near-ground effect.

As is common, we abstract the system's control inputs to a total thrust in the body z-direction, T, and body torques τ_x, τ_y, τ_z. The mapping of rotor speeds to the abstract controls is typically modeled by a linear combination of the squared rotor speeds:

    [T, τ_x, τ_y, τ_z]^T = M [n_1^2, n_2^2, n_3^2, n_4^2]^T    (7)

where M ∈ R^{4×4} is an invertible mixing matrix whose entries are determined by the propeller thrust and torque coefficients, c_T and c_Q, and the distance l from the vehicle's center of gravity to the propeller axles, and n_1, …, n_4 are the propeller rotation rates. This mapping enables the abstracted controls to be translated into propeller rotational rates.
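A sketch of the mixing map (7): the matrix below uses an illustrative "+"-configuration sign pattern with made-up coefficient values (an X-frame like the Crazyflie permutes the signs), and shows that the map can be inverted to recover rotor rates from abstract controls:

```python
import numpy as np

# Illustrative values only: thrust coeff, torque coeff, arm length
c_T, c_Q, l = 1.7e-5, 2.9e-7, 0.046

# Linear map from squared rotor rates to [T, tau_x, tau_y, tau_z]
# for a "+" configuration; an X-frame permutes/changes the signs.
M = np.array([
    [ c_T,     c_T,    c_T,    c_T],
    [ 0.0,    -l*c_T,  0.0,    l*c_T],
    [-l*c_T,   0.0,    l*c_T,  0.0],
    [ c_Q,    -c_Q,    c_Q,   -c_Q],
])

def mix(n_sq):
    """Abstract controls [T, tau_x, tau_y, tau_z] from squared rotor rates."""
    return M @ n_sq

def unmix(wrench):
    """Invert the mixing to recover squared rotor rates from the abstract controls."""
    return np.linalg.solve(M, wrench)

n_sq = np.array([1.0e6, 1.1e6, 0.9e6, 1.0e6])   # squared rotor rates
w = mix(n_sq)
assert np.allclose(unmix(w), n_sq)               # the mixing matrix is invertible
```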

III-B The learning process and data collection

To collect the learning data set, we assume that M_t data collection trajectories of length T_s, from initial conditions x_0^{(k)}, k = 1, …, M_t, can be executed under a nominal controller. From each trajectory, states and control actions are sampled at a fixed time interval Δt, resulting in a data set

    D = { (x_j^{(k)}, u_j^{(k)}, x_j^{(k),+}) : j = 1, …, T_s/Δt, k = 1, …, M_t }    (8)

where x_j^{(k),+} is the state at the next timestep, i.e. x_j^{(k),+} = x_{j+1}^{(k)}.

Since the NMPC design requires continuous-time models to be discretized, we learn a discrete-time lifted bilinear model, thereby avoiding potential numerical differentiation and discretization errors. This is further motivated by the existence of discretization procedures that maintain stability properties and the bilinear structure of the original system, such as the trapezoidal rule with zero-order-hold [Phan2012Discrete-timeModels, Surana2018KoopmanEstimation].
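The assembly of the data set (8) from sampled trajectories can be sketched as follows (function and variable names are our own):

```python
import numpy as np

def build_dataset(trajectories):
    """Collect (x_j, u_j, x_j_plus) triples from a list of (X, U) trajectories,
    where X has shape (T+1, n) of states sampled every dt seconds and U has
    shape (T, m) of the controls applied between consecutive samples."""
    xs, us, xps = [], [], []
    for X, U in trajectories:
        xs.append(X[:-1])     # x_j
        us.append(U)          # u_j
        xps.append(X[1:])     # x_j^+ : state one sample later
    return np.concatenate(xs), np.concatenate(us), np.concatenate(xps)

# Toy usage with two random trajectories (n = 3 states, m = 2 inputs)
rng = np.random.default_rng(0)
trajs = [(rng.standard_normal((11, 3)), rng.standard_normal((10, 2))) for _ in range(2)]
x, u, xp = build_dataset(trajs)
print(x.shape, u.shape, xp.shape)   # (20, 3) (20, 2) (20, 3)
```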

(a) Autoencoder model
(b) State prediction model
(c) Lifted state prediction model
Fig. 1: Koopman neural network model architecture

IV Joint learning of the Koopman dictionary and model

To formulate the learning problem, we follow an analogous approach to the EDMD methods, where we seek to use a dictionary of nonlinear transformations to lift the state to a higher dimensional space in which the system's dynamics can be described by a bilinear model [op_icra_21]. However, unlike classical EDMD, we parametrize the function dictionary by a neural network and jointly learn the function dictionary and the bilinear model matrices. As a result, we use θ to parametrize the neural network representing the function dictionary φ_θ : R^n → R^N, and use F, B_1, …, B_m, C to represent the bilinear model matrices described in Section II.

The key idea of the learning method is that we can encode the function dictionary parametrization and the bilinear model structure in a single neural network. Then, once training is complete, the learned function dictionary and model matrices can be extracted to maintain the benefits of Koopman based learning methods outlined in Section I, while improving the prediction performance and/or reducing the dimension of the function dictionary. To achieve this, we formulate a loss function, L(θ, F, B_1, …, B_m, C), to be minimized by empirical risk minimization over all the input-output pairs in the training data set, D:

    L = L_recon + α L_pred + β L_lift    (9)

where L_recon is the MSE of the reconstruction error when lifting the system's state using the encoder and projecting back down to the original system state with the projection matrix C, L_pred is the MSE of the one-step prediction error of the model when projected down to the original state space, and L_lift is the MSE of the one-step prediction error in the lifted space. Tunable hyperparameters α and β determine the weights of the prediction and KCT losses, respectively, relative to the reconstruction loss.

The learned model should realize good prediction performance in the original state space: minimizing L_pred encourages this goal. Loss terms L_recon and L_lift respectively promote accurate projection of the function dictionary to the original state space and an approximately bilinear dynamical system model in the lifted space. Loss L_lift also promotes good prediction accuracy over multiple time steps, as multi-step prediction is performed in the lifted space before the result is projected to the original state space only at relevant times.

Figure 1 depicts the neural network architecture that implements loss (9). The autoencoder (Fig. 1a) passes the system state x through the neural network, φ_θ, to obtain the lifted state z = φ_θ(x). Subsequently, the lifted state is projected back to the original state space through the projection matrix C, resulting in the reconstructed state, x̂, which is compared to the original state, x, in L_recon. The state prediction model (Fig. 1b) passes the system state through the neural network to obtain the lifted state, z, before evolving the lifted state one timestep with the bilinear model based on F, B_1, …, B_m to get the one-step-ahead lifted state, ẑ⁺. The lifted state prediction is projected to the original state space to get the one-step-ahead state prediction, x̂⁺, which is compared to the true one-step-ahead state, x⁺, in L_pred. Finally, the lifted state prediction model (Fig. 1c) follows the same forward pass through the state prediction model to get the one-step-ahead lifted state prediction ẑ⁺. This is compared in L_lift to the true one-step-ahead lifted state, φ_θ(x⁺), obtained by passing the true one-step-ahead state, x⁺, through the neural network φ_θ.

This approach can be implemented using modern neural network software packages, with the basis functions φ_θ modeled as a feedforward neural network with fully connected layers, and F, B_1, …, B_m, C as single fully connected layers with no nonlinear activations. The entire model can be trained simultaneously using gradient-based learning algorithms applied to the loss function (9).
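A minimal PyTorch sketch of such a network, with hypothetical layer sizes and the three-term loss weighted by α and β; this illustrates the architecture, not the authors' released implementation:

```python
import torch
import torch.nn as nn

class KoopmanNet(nn.Module):
    """Joint dictionary/model network: an MLP encoder plays the role of the
    function dictionary, and bias-free linear layers hold F, B_1..B_m, C."""
    def __init__(self, n, m, N, width=100):
        super().__init__()
        self.encoder = nn.Sequential(                 # function dictionary phi_theta
            nn.Linear(n, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, N))
        self.F = nn.Linear(N, N, bias=False)          # lifted drift matrix
        self.B = nn.ModuleList(nn.Linear(N, N, bias=False) for _ in range(m))
        self.C = nn.Linear(N, n, bias=False)          # projection to the state

    def step(self, z, u):
        # z+ = F z + sum_i B_i z u_i  (bilinear lifted dynamics)
        return self.F(z) + sum(Bi(z) * u[:, i:i + 1] for i, Bi in enumerate(self.B))

    def loss(self, x, u, x_plus, alpha=1.0, beta=0.2):
        mse = nn.functional.mse_loss
        z, z_plus_true = self.encoder(x), self.encoder(x_plus)
        z_plus = self.step(z, u)
        l_recon = mse(self.C(z), x)             # autoencoder reconstruction
        l_pred = mse(self.C(z_plus), x_plus)    # one-step error in state space
        l_lift = mse(z_plus, z_plus_true)       # one-step error in lifted space
        return l_recon + alpha * l_pred + beta * l_lift

# One gradient step on random data (n = 3 states, m = 2 inputs, N = 8 lifted states)
net = KoopmanNet(n=3, m=2, N=8)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, u, xp = torch.randn(32, 3), torch.randn(32, 2), torch.randn(32, 3)
loss = net.loss(x, u, xp)
loss.backward()
opt.step()
```

After training, the weight matrices of `F`, `B`, and `C` can be read out and used directly in the controller, with the encoder evaluated only to lift measured states.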

V Nonlinear model predictive control design

V-A Design considerations

We formulate the NMPC problem using the bilinear model identified by the Koopman framework [op_icra_21]:

    min_{z_k, u_k}  Σ_{k=0}^{N_p−1} ( ||C z_k − r_k||_Q^2 + ||u_k||_R^2 ) + ||C z_{N_p} − r_{N_p}||_{Q_{N_p}}^2
    s.t.  z_{k+1} = F z_k + Σ_{i=1}^m B_i z_k u_{k,i},    z_0 = φ_θ(x_0),
          E_x C z_k ≤ b_x,    E_u u_k ≤ b_u    (10)

where r_k is the desired trajectory to be tracked, N_p is the prediction horizon, Q, R, Q_{N_p} are positive (semi)definite cost matrices, and E_x, b_x, E_u, b_u define the linear state and actuation constraints.

Although (10) models a quadratic cost function and linear state and actuation constraints, nonlinear objective and constraint terms can be included by adding them to the lifted state z. For example, if it is desired to enforce the constraint h(x) ≤ b, we can add h(x) to the lifted state and enforce z_j ≤ b, where j is the index of the lifted state corresponding to h(x) [Korda2018LinearControl].

V-B Sequential quadratic programming

The optimization constraint set is generally non-convex due to the bilinear terms, B_i z u_i, coupling the lifted state and control inputs in the dynamic model. As a result, we solve the optimization problem (10) using sequential quadratic programming (SQP). The non-convex optimization problem (10) is sequentially approximated by quadratic programs (QPs), whose solutions are Newton directions for performing steps toward the optimal solution of the nonlinear program (NLP). The sequence is initialized at a guessed solution, (z̄, ū), at which the following QP is iteratively solved, updating the guess at each iteration until convergence:

    min_{Δz_k, Δu_k}  Σ_k ( (1/2) [Δz_k; Δu_k]^T H_k [Δz_k; Δu_k] + J_k^T [Δz_k; Δu_k] )
    s.t.  Δz_{k+1} = A_k Δz_k + B_k Δu_k + c_k,
          E_x C (z̄_k + Δz_k) ≤ b_x,    E_u (ū_k + Δu_k) ≤ b_u    (11)

where Δz_k = z_k − z̄_k, Δu_k = u_k − ū_k, H_k is the Hessian of the Lagrangian of the NLP (10), J_k collects the cost gradients, and

    A_k = F + Σ_{i=1}^m B_i ū_{k,i},    B_k = [B_1 z̄_k ⋯ B_m z̄_k],
    c_k = F z̄_k + Σ_{i=1}^m B_i z̄_k ū_{k,i} − z̄_{k+1}    (12)

As a result of the bilinear structure of the dynamics model, the linearization can be efficiently computed for a given initial guess by simple matrix multiplication and addition with the dynamics matrices of the Koopman model and the matrices containing the initial guesses z̄_k, ū_k.
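For the bilinear dynamics, the linearization matrices indeed reduce to matrix products, as the following sketch (toy matrices, our own notation) verifies against a finite-difference Jacobian:

```python
import numpy as np

def linearize(F, B, z_bar, u_bar):
    """Jacobians of z+ = F z + sum_i B_i z u_i about a guess (z_bar, u_bar):
    A = F + sum_i B_i * u_bar_i,  Bmat[:, i] = B_i @ z_bar."""
    A = F + sum(Bi * ui for Bi, ui in zip(B, u_bar))
    Bmat = np.column_stack([Bi @ z_bar for Bi in B])
    return A, Bmat

# Check against a finite-difference Jacobian on a toy model
rng = np.random.default_rng(1)
N, m = 5, 2
F = rng.standard_normal((N, N))
B = [rng.standard_normal((N, N)) for _ in range(m)]
z_bar, u_bar = rng.standard_normal(N), rng.standard_normal(m)

step = lambda z, u: F @ z + sum(Bi @ z * ui for Bi, ui in zip(B, u))
A, Bmat = linearize(F, B, z_bar, u_bar)

eps = 1e-6
fd_A = np.column_stack([(step(z_bar + eps * e, u_bar) - step(z_bar, u_bar)) / eps
                        for e in np.eye(N)])
assert np.allclose(A, fd_A, atol=1e-4)   # state Jacobian matches
```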

Input: reference trajectory r, initial guess (z̄, ū)
while Controller is running do
        Form A_k, B_k, c_k using (12)
        Get and lift current state, z_0 = φ_θ(x)
        Solve the QP (11) to get the Newton direction (Δz, Δu)
        Update solution, (z̄, ū) ← (z̄ + Δz, ū + Δu)
        Deploy first input ū_0 to the system
        Construct the next initial guess using (13)
end while
Algorithm 1 [op_icra_21] Koopman NMPC (closed loop)
(a) Experiment set-up
(b) “Figure 8” tracking with position depicted at selected times
(c) Mean position coordinates (solid lines) +/- 3 std (shaded area) of 10 consecutive experiment runs with both controllers at 0.25 m altitude.
Fig. 2: Hardware configuration and experimental results from “Figure 8” trajectory tracking control experiment.

V-C Warm-start of SQP at each timestep

As discussed in Section V-B, the SQP algorithm requires an initial guess of the solution, (z̄, ū). Selecting an initial guess that is sufficiently close to the true optimal solution is essential for the algorithm to converge quickly and reliably [Gros2020FromIteration]. It is well known that the receding horizon nature of MPC can be exploited to obtain excellent initial guesses. At each timestep, this can be achieved by shifting the NMPC solution from the previous timestep and updating the guess of the final control input. Under certain conditions, a locally stable controller enforcing state and actuation constraints can be designed, allowing feasibility of the initial guess to be guaranteed [Rawlings2012ModelDesign]. Typically, simpler approaches are taken, such as appending a copy of the final control signal and calculating the implied final state using the dynamics model:

    ū_k^+ = ū_{k+1},    z̄_k^+ = z̄_{k+1},    ū_{N_p−1}^+ = ū_{N_p−1},
    z̄_{N_p}^+ = F z̄_{N_p} + Σ_{i=1}^m B_i z̄_{N_p} ū_{N_p−1,i}    (13)

If the previous solution is feasible, the shifted solution will also be feasible for all but the last timestep.
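The shift-and-append warm start can be sketched as follows (array shapes and names are our own; `step` stands in for the lifted bilinear dynamics):

```python
import numpy as np

def shift_guess(Z, U, step):
    """Shift the last SQP solution one timestep for warm-starting: drop the
    first entries, repeat the final control, and roll the final state forward
    with the (lifted, bilinear) dynamics `step`."""
    U_new = np.vstack([U[1:], U[-1]])    # append a copy of the final control
    z_last = step(Z[-1], U[-1])          # implied final lifted state
    Z_new = np.vstack([Z[1:], z_last])
    return Z_new, U_new

# Toy usage with a stable linear "lifted" model (inputs ignored, illustrative)
A = 0.9 * np.eye(4)
step = lambda z, u: A @ z
Z = np.ones((6, 4)); U = np.zeros((5, 2))    # horizon of 5 controls, 6 states
Z2, U2 = shift_guess(Z, U, step)
assert Z2.shape == Z.shape and U2.shape == U.shape
```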

V-D Koopman NMPC algorithm

The initialization and closed loop operation of the controller can be summarized as follows (see Algorithm 1). An initial guess of the state and control trajectories, (x̄_k, ū_k), is chosen, and the state guess is lifted with the dictionary of functions at each timestep to obtain z̄_k = φ_θ(x̄_k). Then, before task execution, the SQP algorithm with the Koopman QP subproblem (11) is executed to convergence to obtain a good initial guess of the solution. Subsequently, in closed loop operation, the Koopman bilinear model is linearized along the current guess, the current state is obtained from the system and lifted using the function dictionary, and the QP subproblem is solved only once. Finally, the first control input of the optimal control sequence is deployed to the system, and the full solution is shifted one timestep and used as the initial guess at the next timestep.
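The closed loop above can be caricatured with a one-step-horizon, unconstrained stand-in for the QP, solved in closed form below; the real controller uses a 0.5 s horizon and a constrained QP solver, so this toy (our own construction) only illustrates the lift-solve-apply cycle:

```python
import numpy as np

def koopman_mpc_step(z, r, A, Bm, C, Q, R):
    """One-step-horizon stand-in for the MPC update (no constraints):
    minimize ||C z1 - r||_Q^2 + ||u||_R^2 with z1 = A z + Bm u."""
    H = Bm.T @ C.T @ Q @ C @ Bm + R          # positive definite since R > 0
    g = Bm.T @ C.T @ Q @ (r - C @ A @ z)
    return np.linalg.solve(H, g)

# Closed loop on a toy lifted model tracking a constant reference
N, n, m = 4, 2, 2
A = 0.95 * np.eye(N)
Bm = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.2], [0.3, 0.1]])
C = np.hstack([np.eye(n), np.zeros((n, N - n))])    # state kept in the lift
Q, R = np.eye(n), 0.1 * np.eye(m)
r = np.array([1.0, -1.0])

z = np.zeros(N)
for _ in range(200):
    u = koopman_mpc_step(z, r, A, Bm, C, Q, R)      # solve the (toy) subproblem
    z = A @ z + Bm @ u                              # apply input, advance the lift
print(np.round(C @ z, 2))
```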

VI Experiments

We conduct experiments to demonstrate the performance of the proposed method on a quadrotor drone. To capture the nonlinearity in the quadrotor dynamics, we perform 6-second-long "Figure 8" tracking maneuvers (see Fig. 2b) at very low altitudes above the ground, so that the "ground effect" corrupts the nominal model. The "Figure 8" trajectory requires high roll and pitch maneuvers, and the low altitude flights highlight the effect of unmodeled dynamics, encapsulated in the f_a and τ_a terms in (6).

VI-A Implementation and experimental details

The experiments are conducted with a commercially available Crazyflie 2.1 quadrotor. Global position is measured via an OptiTrack motion capture system (tracking at 120 Hz) and fused with the onboard IMU data to get filtered state estimates. The all-up quadrotor weight is 33.6 g, with a thrust-to-weight ratio of 1.7. Control commands are computed on an offboard computer and communicated to the drone over radio, see Fig. 2a. All communication with the drone is done using the Crazyswarm Python API [Preiss2017Crazyswarm:Swarm].

To collect training data, waypoints were sampled uniformly at random within meters and tracked with a PID position controller for a total of 4 minutes. The estimated position, attitude, linear and angular velocities, motor pulse width modulation (PWM) signals, and battery voltage were collected at 500 Hz. The state estimates were downsampled, and the PWMs and voltage were averaged over each 0.02 s interval, to obtain a dataset at 50 Hz, the target update rate for the controller. This smoothing process better captures the motor PWM inputs, as the raw data showed significant variability within each 0.02 s period. Finally, we map PWMs and voltage to thrust using the model identified in [Shi2020Neural-Swarm2:Interactions] (see Tab. 1, [Shi2020Neural-Swarm2:Interactions]) and apply the thrust mixing in (7) to get the total thrust and torques around each axis, resulting in the inputs used in the learning process.

A discrete-time lifted bilinear model is learned as described in Section IV. Thirty percent of the data set is held out for validation and testing, and the hyperparameters are tuned to obtain good open loop prediction performance over 0.5 seconds, the same prediction horizon used in the model predictive controllers. The final parameters used are included in Tab. I. The neural network is implemented using PyTorch [Paszke2019PyTorch:Library] (code: github.com/Cafolkes/koopman-learning-and-control). To simplify encoding the objective and constraints of the NMPC, the state itself is added to the lifted state, z = [x^T, φ_θ(x)^T]^T, making the projection matrix known, C = [I, 0]. This makes the reconstruction loss L_recon in (9) redundant, and we set its weight to zero.

Two controllers are implemented for the task. First, a nominal NMPC using the dynamic model (6) with f_a = 0, τ_a = 0 and the system parameters described in Tab. II is used as a performance baseline. Second, the Koopman NMPC described in Section V, using the learned model, is implemented. Both controllers are coded in Python using the OSQP quadratic program solver [Stellato2018OSQP:Programs]. The objective penalizes errors between the state and the desired trajectory in the position coordinates. The control inputs are constrained according to the vehicle's mechanical limitations. The position error penalties, Q, the control effort penalties, R, and the remaining control parameters are described in Tab. II. The control thrusts and desired attitude are calculated by the NMPCs and sent to the onboard drone controller, which decides the final motor control allocations. This architecture enables both controllers to cycle at 50 Hz while attitude tracking runs at a higher rate onboard the drone.

Fig. 3: Average MSE and total thrust (dots) +/- 3 std (error bars) from 5 consecutive experiment runs with each of the controllers at decreasing altitudes. Stable flight not achieved by the nominal NMPC for altitudes below 0.25 m.

VI-B Results and discussion

Fig. 2c and Tab. III detail the tracking performance of the nominal and Koopman NMPC tracking a 6-second-long "Figure 8" trajectory over 10 consecutive experiment runs. The yaw and pitch performance is comparable, indicating accurate extraction of the roll-pitch-yaw dynamics. Furthermore, the large z-tracking error from the nominal NMPC induced by the ground effect is successfully captured by the Koopman learned dynamics. Note that the drone both starts and ends at zero velocity, causing increased tracking error at the beginning and end of the trajectory.

To further study this conjecture, we executed 5 "Figure 8" trajectories at decreasing altitudes, as shown in Fig. 3. At altitudes of 0.25 m and higher, the tracking performance of the nominal and Koopman NMPCs is similar. At lower altitudes, however, the nominal NMPC loses stability, leading to catastrophic crashes. In contrast, the Koopman NMPC completed all five tests, albeit with increased tracking error as the altitude setpoint approaches 0.05 m above the ground. We also note that the Koopman NMPC exploits the added lift resulting from the ground effect at lower altitudes to significantly reduce the thrust needed to complete the task.

# of encoder layers       2      Learning rate              1e-3
Layer width               100    KCT loss penalty, β        2e-1
Activation function       tanh   Regularization strength    5e-4
Total lifting dimension   23     # of epochs                200
TABLE I: Learning architecture and tuned hyperparameter values
Mass          33.6 g    State error penalty, Q     [10, 10, 10]
Max thrust    57.0 g    Control penalty, R         [1, 1, 1, 1]
J_xx [Landry2016AggressiveProgramming]    1.7e-5    NMPC prediction horizon    0.5 s
J_yy [Landry2016AggressiveProgramming]    1.7e-5    Controller timestep        0.02 s
J_zz [Landry2016AggressiveProgramming]    2.9e-5
TABLE II: Crazyflie system properties and NMPC parameters
                                       Nominal NMPC    Koopman NMPC
Avg. position tracking error (MSE)     4.7e-3          4.8e-3
Avg. control effort (thrust norm)      10.7            10.5
TABLE III: Tracking error and control effort of the "Figure 8" tracking experiment, averaged over 10 experiment runs at 0.25 m altitude.

VII Conclusion

The coupling of Koopman-based bilinear models and NMPC allows for real-time optimal control of robots that captures important nonlinearities while respecting critical state and control limits. Function dictionary design is a key challenge in Koopman-based modeling and control methods. This paper presented a method to jointly learn a lifted Koopman bilinear model and KCT function dictionary strictly from data, enabling more compact models and/or better prediction performance compared to predefined function dictionaries. More compact models, i.e. models with a lower lifting dimension, are crucial for robotic applications where real-time control is needed. Data-driven models are valuable in cases where first-principles modeling may be difficult. Our quadrotor drone experiments demonstrate good prediction performance with a lifting dimension of only 23. The associated Koopman NMPC can match the performance of an NMPC based on the nominal model (6) far away from the ground and outperforms the nominal NMPC in near-ground regimes. Notably, the nominal controller is unable to maintain stable flight near the ground, whereas the Koopman NMPC maintains acceptable tracking performance down to 0.05 m altitude.

Future work includes deriving model error bounds for the learned model and exploiting the bounds to derive theoretical guarantees on controller performance. This can be achieved by extending our previous work to the bilinear model setting [op_acc_21].

References