Safe and Fast Tracking Control on a Robot Manipulator: Robust MPC and Neural Network Control

12/22/2019 ∙ by Julian Nubert, et al. ∙ University of Stuttgart ∙ Max Planck Society ∙ ETH Zurich

Fast feedback control and safety guarantees are essential in modern robotics. We present an approach that achieves both by combining novel robust model predictive control (MPC) with function approximation via (deep) neural networks (NNs). The result is a new approach for complex tasks with nonlinear, uncertain, and constrained dynamics as are common in robotics. Specifically, we leverage recent results in MPC research to propose a new robust setpoint tracking MPC algorithm, which achieves reliable and safe tracking of a dynamic setpoint while guaranteeing stability and constraint satisfaction. The presented robust MPC scheme constitutes a one-layer approach that unifies the often separated planning and control layers, by directly computing the control command based on a reference and possibly obstacle positions. As a separate contribution, we show how the computation time of the MPC can be drastically reduced by approximating the MPC law with a NN controller. The NN is trained and validated from offline samples of the MPC, yielding statistical guarantees, and used in lieu thereof at run time. Our experiments on a state-of-the-art robot manipulator are the first to show that both the proposed robust and approximate MPC schemes scale to real-world robotic systems.


I Introduction

The need to handle complexity becomes more prominent in modern control design, especially in robotics. First of all, complexity often stems from tasks or system descriptions that are high-dimensional and nonlinear. Second, not only classic control properties such as nominal stability or step-response characteristics are of interest, but also additional guarantees such as stability under uncertain conditions or satisfaction of hard constraints on inputs and states. In particular, the ability to robustly guarantee safety becomes absolutely essential when humans are involved within the process, such as for automated driving or human-robot interaction (HRI). Finally, many robotic systems and tasks require fast acting controllers in the range of milliseconds, which is exacerbated by the need to run algorithms on resource-limited hardware.

Designing controllers for such challenging applications often involves the combination of several different conceptual layers. For example, classical robot manipulator control involves trajectory planning in the task space, solving for the inverse kinematics of a single point (i.e., the setpoint) or multiple points (task space trajectory), and the determination of required control commands in the state space [21]. These approaches can be affected by corner cases of one of the components; for example, solving for the inverse kinematics may not be trivial for redundant robots. For many complex scenarios, a direct approach is hence desirable for tracking of (potentially unreachable) reference setpoints in task space.

Fig. 1: Apollo robot with two LBR4+ arms (at MPI-IS Tübingen). The end effector tracks the reference encircled in green, while guaranteeing stability and constraint satisfaction at all times (e.g., avoiding obstacles).

In this paper, we propose a single-layer approach for robot tracking control that handles all aforementioned challenges. We achieve this by combining (and extending) recent robust model predictive control (RMPC) and function approximation via supervised learning with (deep) neural networks (NNs). The proposed RMPC can handle nonlinear systems, constraints, and uncertainty. In order to overcome the computational complexity inherent in the online MPC optimization, we present a solution that approximates the RMPC with supervised learning yielding a NN as an explicit control law and a speed improvement by two orders of magnitude. Through experiments on a KUKA LBR4+ robotic manipulator (see Figure 1), we demonstrate – for the first time – the feasibility of both the novel robust MPC and its NN approximation for robot control.

Related Work

MPC can handle nonlinear constraints and is applicable to nonlinear systems [20]; however, disturbances or uncertainty can compromise the safety guarantees of nominal MPC schemes. RMPC overcomes this by preserving safety and stability despite disturbances and uncertainty.

Recent advances in computationally efficient RMPC schemes allow for guaranteeing constraint satisfaction despite uncertainty. For instance, tube-based MPC does so by predicting a tube around the nominal (predicted) trajectory that confines the actual (uncertain) system trajectory. A robust constraint tightening scheme for linear systems is presented in [5]. In [25], an approach based on min-max differential inequalities is presented to achieve robustness for the nonlinear case. In this work, we build upon the novel nonlinear constraint tightening approach in [13], which provides slightly more conservative results than the approach in [25], but is far more computationally efficient.

We herein extend [13] to setpoint tracking. Setpoint tracking MPC, as introduced in [17] for nonlinear systems, enables the controller to track piece-wise constant output reference signals. A robust version for linear systems is presented in [16]. To obtain a robust version for nonlinear systems, we optimize the size of the terminal set around the artificial steady state online, similar to what is done in [15] for nominal MPC. None of the aforementioned robust or setpoint tracking MPC approaches has been applied to a real-world, safety-critical system of complexity similar to the robot arm herein.

Approximate MPC (AMPC) allows for running high-performance control on relatively cheap hardware by using supervised learning (e.g., NNs) to approximate the implicit solution of the optimization problem. Recently, theoretical approaches for AMPC for linear systems were presented in [3, 4, 12, 26], which use projection/active set iterations for feasibility [3, 4], statistical validation [12], and duality for performance bounds [26]. Herein, we leverage the AMPC approach for nonlinear systems, recently proposed in [9], which yields a NN control law that inherits the MPC’s guarantees (in a statistical sense) through robust design and statistical validation.

MPC for robotic manipulators is investigated, for example, in [7, 2]. However, both of these approaches assume a trajectory in the joint space to be given beforehand. In [6], reference tracking in the task space is proposed, using task scaling to solve for the inverse kinematics of a redundant manipulator while taking kinematic limits into account. None of these approaches considers safety guarantees or robustness under uncertainty. Approaches making use of robust MPC schemes are not widely used in robotics (yet), but tube and funnel approaches have recently been explored for robust robot motion planning [18, 22, 8]. However, to the best of our knowledge, no experimental implementation of an MPC design with theoretically guaranteed robustness exists yet for a robotic system.

Contributions

This paper makes contributions in three main directions: (i) robust setpoint tracking MPC, (ii) approximate MPC via supervised learning with NNs, and (iii) their application to real robotic systems.

(i) We present a new RMPC setpoint tracking approach that combines the RMPC [13] with the MPC setpoint tracking in [17] by proposing online optimized terminal ingredients to improve performance subject to safety constraints. The resulting robust approach provides safety guarantees in the face of disturbances and uncertainties while yielding fully integrated robot control in one layer (i.e., robust motion planning and feedback control). (ii) The presented AMPC builds and improves upon the approach in [9] by providing a novel, less conservative validation criterion that also considers model mismatch, which is crucial for robot experiments. The proposed AMPC considerably improves performance due to fast NN evaluation, while providing statistical guarantees on safety. (iii) Finally, this work comprises the first experimental implementations of both the RMPC based on [13] and the AMPC originating from [9]. To the best of our knowledge, this is the first experimental implementation of nonlinear tracking RMPC with safety properties theoretically guaranteed by design.

II Problem Formulation

We consider disturbed nonlinear continuous-time systems

(1)

with state , control input , output , nominal dynamics and model mismatch with some known compact set . For the nonlinear state, input and output constraint set , we consider

In the following, we denote and omit the time index when clear from context.

Objective

Given an output reference , the control goal is to exponentially stabilize the optimal reachable setpoint, while ensuring robust constraint satisfaction, i.e. . This should hold irrespective of the reference, and even for a non-reachable output reference . To meet requirements of modern robotics, the controller should operate at fast update rates, e.g., ideally at the order of milliseconds.

Such control problems are ubiquitous in robotics and other areas and combine the challenges of safe and fast tracking for complex (i.e., nonlinear, uncertain, constrained) systems.

III Methods: RMPC Setpoint Tracking & AMPC

In this section, we introduce the RMPC scheme based on [13] (Sec. III-A) and extend it to robust output tracking (Sec. III-B). Following this, we show how the online control can be accelerated by moving the optimization offline using AMPC (Sec. III-C) as an extension to the approach in [9].

III-A Robust MPC Design

To ensure fast feedback, the piece-wise constant MPC control input is combined with a continuous-time control law , i.e. the closed-loop input is given by

(2)

where denotes the sampling time of the RMPC, the sampling instance, and the piece-wise constant MPC control law. Denote , , and .
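The structure described above, a piece-wise constant MPC input combined with a continuously evaluated feedback, can be illustrated with a minimal simulation sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the double-integrator plant, the gains in `kappa`, the placeholder `mpc_law`, and the numerical values of the sampling time `h` and integration step `dt`.

```python
def kappa(x, k_pos=4.0, k_vel=4.0):
    """Hypothetical stabilizing state feedback for a double integrator."""
    pos, vel = x
    return -k_pos * pos - k_vel * vel


def simulate(x0, mpc_law, h=0.1, dt=0.001, t_end=1.0):
    """Simulate x'' = u with u(t) = v_k + kappa(x(t)).

    v_k = mpc_law(x(t_k)) is held constant on [t_k, t_k + h), while
    kappa is re-evaluated at every (much smaller) integration step dt.
    """
    steps_per_sample = round(h / dt)
    n_steps = round(t_end / dt)
    x = list(x0)
    v = 0.0
    log = []  # entries: (time, held MPC input, total closed-loop input)
    for i in range(n_steps):
        if i % steps_per_sample == 0:   # sampling instant t_k
            v = mpc_law(tuple(x))
        u = v + kappa(x)                # fast inner-loop correction
        log.append((i * dt, v, u))
        # explicit Euler step of the double integrator
        x = [x[0] + dt * x[1], x[1] + dt * u]
    return x, log


# Placeholder for the optimizer: any function of the sampled state works here.
final_state, log = simulate((0.0, 0.0), mpc_law=lambda x: 1.0 - x[0])
```

In the log, the held term changes only at the sampling instants, whereas the total input changes at every integration step, which is the point of adding the fast feedback.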

III-A1 Incremental Stability

For the design of the RMPC, we assume that the feedback ensures incremental exponential stability, similar to [13, Ass. 9].

Assumption 1.

There exists an incremental Lyapunov function and constants s.t. the following properties hold , :

(3a)
(3b)
(3c)

with , . Furthermore, the following norm-like inequality holds :

(4)

The first and third conditions ((3a), (3c)) formulate stability while the second is fulfilled for locally Lipschitz continuous . Incremental stability is a rather general condition, among others allowing for the usage of standard polytopic and ellipsoidal Lyapunov functions (i.e. ), which satisfy condition (4) due to the triangle inequality. Compare [13, Remark 1] for a general discussion.

III-A2 Tube

In this work, we use to characterize the tube around the nominal trajectory according to the system dynamics . The predicted tube is parameterized by , where denotes the nominal prediction and the tube size is a scalar. For the construction of the tube and hence, for the design of the RMPC controller, we use a characterization of the magnitude of occurring uncertainties.

III-A3 Disturbance Description

To characterize the magnitude of the uncertainties arising from the model mismatch , we need a (possibly constant) function . Given , it is possible to construct satisfying

(5)

The state and input dependency of can e.g. represent larger uncertainty in case of high dynamic operation due to parametric uncertainty. For simplicity, we only consider a positive constant in the following; for details regarding the general case, see [13].

III-A4 Tube Dynamics and Design Quantities

By using inequality (5), the tube propagation is given by , yielding . To allow for an efficient online optimization, we consider the discrete-time system , where is the discretization of with sampling time and denoting the discrete-time model mismatch. Given the sampling time , the corresponding discrete-time tube size is given by with , . The discrete-time model mismatch satisfies , . The contraction rate defines the growing speed of the tube while denotes the size of the tube around the nominal trajectory, which bounds the uncertainties.
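To make the relation between the continuous-time and the discrete-time tube concrete, the following sketch assumes a scalar linear tube dynamic s' = -rho * s + w_bar with s(0) = 0, which is consistent with the description of a contraction rate and a constant disturbance bound. The symbol names `rho`, `w_bar`, `rho_d`, `w_d` and all numerical values are assumptions for illustration, since the formulas themselves are not reproduced in the text above.

```python
import math


def tube_continuous(t, rho, w_bar):
    """Solution of s' = -rho * s + w_bar with s(0) = 0."""
    return w_bar / rho * (1.0 - math.exp(-rho * t))


def tube_discrete(n_steps, h, rho, w_bar):
    """Tube sizes s_0, ..., s_N at the sampling instants k * h."""
    rho_d = math.exp(-rho * h)             # per-step contraction factor in (0, 1)
    w_d = tube_continuous(h, rho, w_bar)   # tube growth over one sampling interval
    sizes = [0.0]
    for _ in range(n_steps):
        sizes.append(rho_d * sizes[-1] + w_d)
    return sizes


# Hypothetical values: 10 prediction steps of 40 ms each.
sizes = tube_discrete(n_steps=10, h=0.04, rho=2.0, w_bar=0.05)
```

Under these assumptions the recursion reproduces the continuous-time solution at the sampling instants, and the tube size grows monotonically towards the limit w_bar / rho.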

III-B Robust Setpoint Tracking

A standard MPC design (cf. [20]) minimizes the squared distance to some desired setpoint , which requires a feasible target reference in the state and input space. For the considered problem of (robust) setpoint tracking of the output (the end effector position in Sec. IV), this would require a (usually unknown) mapping of the form .

Remark 1.

In our specific use case of controlling a robotic manipulator, corresponds to the inverse kinematics. For MPC-based robot control such mappings are used in [7, 2], which we particularly avoid within our work.

The proposed approach is a combination of [13] and [17] and hence, can be seen as an extension of [15] to the robust case. The following optimization problem characterizes the proposed RMPC scheme for setpoint tracking and avoids the need of providing :

subject to
(6a)
(6b)
(6c)
(6d)
(6e)

with the objective function

(7)

. The terminal set is given as

(8)

The optimization problem (6) is solved at time with the initial state . The optimal input sequence is denoted by with the control law denoted as . The predictions along the horizon are done w.r.t. the nominal system description in (6a). Furthermore, the constraints in (6b) are tightened with tube size . In the following, we explain the considered objective function in (7) and the conditions for the terminal set in (6d), (6e) and (8) for setpoint tracking in more detail.

III-B1 Objective Function

To track the external output reference , we use the setpoint tracking formulation introduced by Limon et al. [17]. Additional decision variables are used to define an artificial steady-state (6c). The first part of the objective function ensures that the MPC steers the system to the artificial steady-state, while the term ensures that the output at the artificial steady-state tracks the desired output . In Theorem 1, we prove exponential stability of the optimal (safely reachable) steady-state, as an extension of [17, 15] to the robust setting.
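The two roles of the objective described above can be sketched as follows. This is only an illustration under assumptions: quadratic costs with diagonal weights `q`, `r`, and `t_offset` are hypothetical choices, all names are made up for this example, and the terminal cost is omitted.

```python
def weighted_sq(v, weights):
    """Weighted squared norm sum_i w_i * v_i**2 with diagonal weights."""
    return sum(w * vi * vi for w, vi in zip(weights, v))


def tracking_cost(xs, us, x_s, u_s, y_s, y_d, q, r, t_offset):
    """Stage costs relative to the artificial steady state (x_s, u_s)
    plus an offset cost pulling its output y_s towards the reference y_d."""
    stage = 0.0
    for x, u in zip(xs, us):
        dx = [a - b for a, b in zip(x, x_s)]
        du = [a - b for a, b in zip(u, u_s)]
        stage += weighted_sq(dx, q) + weighted_sq(du, r)
    dy = [a - b for a, b in zip(y_s, y_d)]
    return stage + weighted_sq(dy, t_offset)


# A prediction that already rests at the artificial steady state; the
# reference y_d differs from y_s, so only the offset term contributes.
cost = tracking_cost(
    xs=[[1.0, 0.0]] * 3, us=[[0.0]] * 3,
    x_s=[1.0, 0.0], u_s=[0.0],
    y_s=[1.0], y_d=[2.0],
    q=[1.0, 1.0], r=[0.1], t_offset=[10.0],
)
```

Because the artificial steady state is a decision variable, the optimizer can keep the problem feasible for an unreachable reference and simply pays the offset cost for the remaining output mismatch.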

III-B2 New Terminal Ingredients

The main approach in MPC design for ensuring stability and recursive feasibility is to introduce terminal ingredients, i.e. a terminal cost and a terminal set . Determining the setpoint online and the presence of disturbances further complicate their design.

The proposed approach determines the terminal set size online, using one additional scalar variable similar to [15], which is less conservative than the design in [17]. Furthermore, by parametrizing the terminal set with the incremental Lyapunov function , we can derive intuitive formulas that ensure robust recursive feasibility in terms of lower and upper bounds on  (6d). As a result, we improve and extend [17, 15] to the case of nonlinear robust setpoint tracking. The properties of the terminal ingredients are summarized in the following proposition.

Proposition 1.

The set of constraints (6c), (6d) and (6e) together with (8) and the terminal controller provides a terminal set that ensures the following properties needed for robust recursive feasibility (cf. [13, Ass. 7]).

  • The terminal set constraint is robust recursively feasible for fixed values .

  • The tightened state and input constraints (6b) are satisfied within the terminal region.

Proof.

The candidate (c.f. [13, Ass. 7]) satisfies the terminal constraint (6e) by using

Satisfaction of the tightened constraints (6b) inside the terminal set follows with

In addition to the presented terminal set, we consider some Lipschitz continuous terminal cost , which satisfies the following conditions in the terminal set with some

(9a)
(9b)

For the computation of the terminal cost for nonlinear systems with varying setpoints, we refer to [17, 15].

III-B3 Offline/Online Computations

The procedure for performing the offline calculations can be found in Algorithm 1. One approach to compute suitable functions using a quasi-LPV parametrization and linear matrix inequalities (LMIs) is described in [14]. The subsequent online calculations can then be performed according to Algorithm 2.

Algorithm 1 Offline calculations for RMPC design.
1: Determine a stabilizing feedback and a corresponding incremental Lyapunov function (Ass. 1).
2: Compute constant satisfying (5).
3: Compute constants satisfying (3b).
4: Define sampling time and compute , as described in Section III-A4.
5: Determine terminal cost satisfying (9).

Algorithm 2 Online calculations, executed at every time step , during the sampling time interval .
1: Solve the MPC problem from (6).
2: Apply input .

III-B4 Closed-Loop Properties

In the following, we derive the closed-loop properties of the proposed scheme. The set of safely reachable steady-state outputs is given by . The optimal (safely reachable) setpoint , is the minimizer to the steady-state optimization problem .

The following technical condition is necessary to ensure convergence to the optimal steady-state, compare [17], [15].

Assumption 2.

There exist (typically unknown) unique functions , that are Lipschitz continuous. Furthermore, the set of safe output references is convex.

Consequently, safe operation and convergence are guaranteed by the following theorem.

Theorem 1.

Let Assumption 1 hold and suppose that the optimization problem (6) is feasible at . Then the optimization problem (6) is recursively feasible and the posed constraints are satisfied for the resulting closed loop (Algorithm 2), i.e., the system operates safely. Suppose further that Assumption 2 holds and is constant. Then the optimal (safely reachable) setpoint is practically exponentially stable for the closed-loop system and the output practically exponentially converges to .

Proof.

The safety properties of the proposed scheme are due to the RMPC theory in [13], using the known contraction rate and the constant (bounding the uncertainty) to compute a safe constraint tightening in (6b). Proposition 1 ensures that the novel design of the terminal ingredients using (6d), (6e) and (8) also satisfies the conditions in [13, Ass. 7] for fixed values . The stability/convergence properties of the considered formulation are based on the non-empty terminal set () with corresponding terminal cost (9) and convexity of (Ass. 2), which allow for an incremental change in towards the desired output , compare [17, 15] for details. Thus, the Lyapunov arguments in [15] remain valid with a quadratically bounded Lyapunov function satisfying with a positive definite function from [13] bounding the effect of the model mismatch. This implies practical exponential stability of , and thus the output (practically) converges to a neighborhood of the optimal setpoint . ∎

Remark 2.

Practical stability implies that the system only converges to a neighborhood (with size depending on the model mismatch ) around the optimal setpoint .

Remark 3.

Convexity of and uniqueness of the functions , (Ass. 2) are strong assumptions for general nonlinear problems. In particular, for the considered redundant 7-DOF robotic manipulator (Sec. IV), the functions are not unique (potentially multiple optimal steady-states) and the feasible steady-state manifold is not convex (collision avoidance constraint). Nevertheless, the safety properties are not affected by Assumption 2 and in the experimental implementation, the RMPC typically converges to some (not necessarily unique) steady-state.

III-C Approximate MPC

In the following, we introduce the AMPC, which provides an explicit approximation of the RMPC control law , yielding a significant decrease in computational complexity. In particular, as demonstrated in the numerical study in [4, Sec. 9.4], approximate MPC without additional modifications will in general not satisfy the constraints. Consequently, the core idea of the presented AMPC approach is to compensate for inaccuracies of the approximation by introducing additional robustness within the RMPC design. In the following, we present a solution to obtain statistical guarantees (Sec. III-C2) for the application of the resulting AMPC. To that end, we introduce an improved validation criterion (Prop. 2, Sec. III-C1) compared to the one in [9], which is more suitable for real-world applications.

III-C1 Validation Criterion

The following proposition provides a sufficient condition for AMPC safety guarantees.

Proposition 2.

Let Assumption 1 hold. Suppose the model mismatch between the real and the nominal system satisfies

(10)

, . If is designed with some and the approximation satisfies

(11)

for any state with (6) being feasible, then the AMPC ensures the same properties as the RMPC in Theorem 1.

Proof.

We use the following bound on the perturbed AMPC:

Then, the properties follow from Theorem 1.∎

III-C2 Statistical Guarantees

In practice, guaranteeing a specified error for all possible values with a supervised learning approach is difficult, especially for deep NNs. However, it is possible to make statistical statements about using Hoeffding’s inequality [10]. For the statistical guarantees, we adopt the approach from [9] and use our improved validation criterion as introduced in Proposition 2.

Assumption 3.

The prestabilized, disturbed system dynamics characterize a deterministic (possibly unknown) map.

We validate full trajectories under the AMPC with independent and identically distributed (i.i.d.) initial conditions and setpoints. Due to Assumption 3, the trajectories themselves are also i.i.d. Specifically, we define a trajectory as

(12)

Further, we consider the indicator function based on (11)

The indicator measures whether, for any time step along the trajectory, there is a discrepancy larger than between the ideal trajectory with and the trajectory with the approximated input . The empirical risk is given as for sampled trajectories, while denotes the true expected value of the random variable. With Hoeffding’s inequality, the following lemma can be derived.
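The trajectory-wise check and the resulting empirical risk can be sketched as follows. This is an illustration under assumptions: the Euclidean norm, the threshold name `eta`, and the representation of a trajectory as two lists of input vectors are choices made here and are not taken from the paper.

```python
import math


def indicator(mpc_inputs, approx_inputs, eta):
    """Return 1 if the approximation error stays within eta at every step."""
    for u_mpc, u_nn in zip(mpc_inputs, approx_inputs):
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(u_mpc, u_nn)))
        if err > eta:
            return 0
    return 1


def empirical_risk(trajectories, eta):
    """Fraction of sampled trajectories that pass the check."""
    flags = [indicator(u_mpc, u_nn, eta) for u_mpc, u_nn in trajectories]
    return sum(flags) / len(flags)


# Two hypothetical trajectories with one-dimensional inputs.
good = ([[0.0], [1.0]], [[0.01], [0.99]])
bad = ([[0.0], [1.0]], [[0.0], [1.5]])
risk = empirical_risk([good, bad, good, good], eta=0.05)
```

A single step that exceeds the threshold marks the whole trajectory as failed, so the empirical value is a statement about full trajectories and not about individual samples.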

Lemma 1.

[9, Lemma 1] Suppose Assumption 3 holds. Then the condition , holds at least with confidence .

Remark 4.

In practice, it is not possible to check for infinite length trajectories . Since in our definition, the reference is fixed along the whole trajectory, we do the validation until a steady state is reached below a certain threshold.

We provide the following illustration: given a large enough number of successfully validated trajectories, we obtain a high empirical risk, e.g. . This result ensures that with confidence of e.g. , (11) holds at least with probability (e.g. %) for a new trajectory with initial condition . Thus, with high probability, the guarantees in Proposition 2 (safety and stability) hold.
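The kind of statement made in this illustration follows from the one-sided Hoeffding bound for i.i.d. random variables with values in [0, 1]: with p samples, the true mean is at least the empirical mean minus eps with confidence 1 - exp(-2 p eps^2). A minimal sketch of the arithmetic, with hypothetical function names and example numbers that are not taken from the paper:

```python
import math


def hoeffding_confidence(p, eps):
    """Confidence that the true mean is >= empirical mean - eps after p samples."""
    return 1.0 - math.exp(-2.0 * p * eps ** 2)


def samples_needed(eps, delta):
    """Smallest p such that exp(-2 p eps^2) <= delta."""
    return math.ceil(math.log(1.0 / delta) / (2.0 * eps ** 2))


# Hypothetical target: margin eps = 0.01 with confidence 1 - delta = 0.99.
p_required = samples_needed(eps=0.01, delta=0.01)
confidence = hoeffding_confidence(p_required, eps=0.01)
```

The required number of trajectories grows with 1 / eps^2, which is why tight margins make the validation expensive.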

III-C3 Algorithm

The overall procedure for the AMPC is summarized in Algorithm 3, based on Hertneck et al. [9].

Algorithm 3 Procedure for the AMPC.
1: Choose , determine and calculate .
2: Design the RMPC according to Algorithm 1.
3: Learn .
4: Validate according to Lemma 1.
5: If the validation fails, repeat the learning from step 3.
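The learn-validate loop in steps 3 to 5 can be sketched as a small driver function. The callables `sample_rmpc`, `train`, and `validate` are hypothetical placeholders for the offline RMPC sampling, the NN training, and the statistical validation; collecting additional samples before retraining is an assumption of this sketch, as is the bound on the number of rounds.

```python
def design_ampc(sample_rmpc, train, validate, max_rounds=5):
    """Repeat learning until validation passes (or give up).

    sample_rmpc() -> list of (state/reference, input) pairs from the RMPC
    train(dataset) -> candidate approximate controller
    validate(controller) -> True if the statistical check passes
    """
    dataset = sample_rmpc()
    for round_idx in range(1, max_rounds + 1):
        controller = train(dataset)
        if validate(controller):
            return controller, round_idx
        # Assumption: enlarge the training set before the next attempt.
        dataset = dataset + sample_rmpc()
    raise RuntimeError("validation failed in every round")


# Stub usage: the "controller" is just the dataset size, and validation
# passes once at least three samples were used for training.
controller, rounds = design_ampc(
    sample_rmpc=lambda: [((0.0,), (0.0,))],
    train=lambda data: len(data),
    validate=lambda ctrl: ctrl >= 3,
)
```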

IV Robot Experiments

We demonstrate the proposed RMPC and AMPC approaches on a KUKA LBR4+ robotic manipulator (Fig. 1).

IV-A Robotic System

Several works investigated the dynamics formulation of the KUKA LBR4+ and LBR iiwa robotic manipulators [11, 24], with dynamic equations of the form . Here, denotes the applied torque and the joint angle, joint velocity and joint acceleration [21].

IV-A1 System Formulation

In this work, we leverage existing low-level controllers as an inverse dynamics inner-loop feedback linearization ending up with a kinematic model that assumes direct control of joint accelerations, i.e., . Such a description is not uncommon for designing higher-level controllers in robotics, compare e.g. the MPC scheme in [2] based on a kinematic model. As the control objective, we aim for tracking a given reference in the task space with the manipulator end effector position, defined as . Since this position only depends on the first four joints, we consider those for our control design. The resulting nonlinear system with state is given by

(13)

The output is given by the forward kinematics of the robot:

(14)

where and denote the sine and cosine of , respectively, and .

IV-A2 Constraints

States and inputs are subject to the following polytopic constraints: joint angles can turn less than (exact values can be found in [19]), joint velocity , and joint acceleration .

More interestingly, we also impose constraints on the output function to ensure obstacle avoidance in the Cartesian space. We approximate the obstacles with differentiable functions, compare Figure 2. This allows for a simpler implementation and design.

Fig. 2: Visualization of the output constraints. We use (quadratic) differentiable functions to over-approximate the non-differentiable obstacles.

For example,

models the box-shaped obstacle, with the obstacle position , the end effector , and here . Similarly, we introduce a nonlinear constraint that prevents the robot from hitting itself (see Figure 2).
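One way to over-approximate a box with a differentiable quadratic function is an enclosing ellipsoid. The sketch below is not the constraint function used in the paper, which is not reproduced in the text above. It assumes an axis-aligned box with given half-widths and uses the ellipsoid with semi-axes sqrt(n) times the half-widths, which passes through the box corners and contains the whole box. All names are hypothetical.

```python
import math


def box_obstacle_constraint(y, center, half_widths):
    """Smooth constraint g(y) <= 0 that keeps y outside an ellipsoid
    enclosing the axis-aligned box with the given center and half-widths."""
    n = len(y)
    scaled = sum(
        ((yi - ci) / (math.sqrt(n) * ai)) ** 2
        for yi, ci, ai in zip(y, center, half_widths)
    )
    return 1.0 - scaled


# Hypothetical obstacle: a box centered at (0.5, 0.0, 0.3) in meters.
center = (0.5, 0.0, 0.3)
half_widths = (0.1, 0.2, 0.1)
g_far = box_obstacle_constraint((1.5, 0.0, 0.3), center, half_widths)
g_inside = box_obstacle_constraint(center, center, half_widths)
```

The constraint is conservative because points between the box surface and the ellipsoid are also excluded, but it has a smooth gradient everywhere, which suits gradient-based solvers.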

Remark 5.

This constraint formulation uses a simple (conservative) over-approximation and assumes static obstacles. Both limitations can be addressed by using the exact reformulation in [27] based on duality and using the robust extension in [23] for uncertain moving obstacles.

IV-B Robust MPC Design

In general, the dynamic compensation introduced in the previous subsection is not exact and hence, the resulting model mismatch needs to be addressed in the robust design.

IV-B1 Determination of Disturbance Level

For the determination of the disturbance level, we sample trajectories for a specified sampling time and compare the observed trajectory to the nominal prediction for each discrete-time step. The deviation of the two determines the disturbance bound introduced for the discrete-time case, i.e. . In Figure 3, a plot of the -norm of the observed disturbance with respect to the applied acceleration is shown.

Fig. 3: Observed disturbance with respect to the applied acceleration for a sampling rate of . Proportionality-like behavior is apparent.

The maximal observed model mismatch satisfies . As a precaution, we add some tolerance and use for our design. From the figure, it can be seen that the induced disturbance can be larger for higher accelerations. This behavior is not surprising, since the low-level controllers have more difficulty following the reference acceleration for more dynamic movements. Using instead of a constant bound could help to further decrease conservatism (compare [13]). Furthermore, the uncertainty could also be reduced by improving the kinematic model using data, as done, e.g., in [2] with an additional Gaussian process (GP) error model.
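The estimation of a constant disturbance bound from sampled data can be sketched as follows. This is an illustration under assumptions: a diagonally weighted Euclidean norm stands in for the measure of deviation used in the design, the safety margin is applied as a relative factor, and all names and numbers are hypothetical.

```python
import math


def disturbance_bound(observed_next, predicted_next, weights, tolerance=0.1):
    """Largest weighted one-step prediction error, inflated by a safety margin."""
    worst = 0.0
    for x_obs, x_pred in zip(observed_next, predicted_next):
        err = math.sqrt(
            sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_obs, x_pred))
        )
        worst = max(worst, err)
    return (1.0 + tolerance) * worst


# Two hypothetical samples: measured next state vs. nominal prediction.
w_d = disturbance_bound(
    observed_next=[[1.0, 0.0], [0.0, 2.0]],
    predicted_next=[[0.0, 0.0], [0.0, 0.0]],
    weights=[1.0, 1.0],
)
```

A bound of this kind is only as good as the coverage of the sampled trajectories, which is one reason for adding a margin on top of the largest observed value.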

IV-B2 Computations

The offline computations are done according to Algorithm 1. We consider a quadratic incremental Lyapunov function and a linear feedback , both computed using tailored LMIs (incorporating (5), (3b)), compare [19]. The terminal cost is given by the LQR infinite horizon cost. The online computations from Algorithm 2 are performed in a real-time C++ environment by deploying the CasADi C++ API for solving the involved optimization problem [1]. The feedback is updated with a rate of – hence, it can be considered as being continuous-time for all practical purposes. Furthermore, is evaluated every .

In general, is non-convex due to the collision avoidance constraints and hence Assumption 2 is not satisfied, compare Remark 3. Issues owing to local minima were not observed in the considered experiments.

IV-C Experimental Results RMPC

With the RMPC design, we demonstrate a reliable and safe way for controlling the end effector position of the robotic manipulator. An exemplary trajectory on the real system can be observed in Figure 1, where the end effector tracks the reference, which is set by the user.

Fig. 4: Experimental (solid) and simulation (dashed) data of RMPC (blue-colored) and AMPC (orange-colored) with the same reference . Reference is continuously moving for s; constant, but unreachable in the interval s; and moving again after a step for s. Left: Tracking error . Right: Relative closed-loop input , with and .

It can be seen that even though the direct way is obstructed by an obstacle, the controller obtains a solution while keeping a safe distance to the obstacle. In Fig. 4, the tracking error and closed-loop input of the RMPC for an exemplary use case can be observed. The controller is able to track the reference. However, due to the computational complexity and its induced delay, the controller has a larger tracking error in intervals of changing setpoints (interval s in Fig. 4). Note that the constraint tightening of the considered RMPC method only restricts future control actions and thus the scheme can in principle utilize the full input magnitude. However, due to the combination of the velocity constraint and the long sampling time , the full input is only utilized with the AMPC, with the faster sampling time. More experimental results can be observed in the supplementary video (https://youtu.be/c5EekdSl9To).

Integrating the tracking control within a single optimization problem and automatically resolving corner cases such as unreachable setpoints are particular features that make the deployment of the approach simple, safe, and reliable in practice. As expected by the considered robust design, in thousands of runs (one run corresponds to one initial condition and one output reference), the robot never came close to hitting any of the obstacles (e.g. in the video). This is the result of using the conservative bound on the model mismatch, implying safe but conservative operation. Furthermore, the controller is able to steer the end effector along interesting trajectories in order to avoid potential collisions in an optimal way (e.g. video: ).

IV-D AMPC Design

For the robot control, the AMPC is designed according to Algorithm 3. For this purpose, we first design an RMPC with a sampling time of , i.e., ten times faster than the previous RMPC. To simplify the learning problem, we only consider the self-collision avoidance constraint. Therefore, the MPC control law depends on the state and the desired reference , i.e., on parameters in total.

To obtain the necessary precision for the AMPC, interesting questions emerged regarding the structure of the used NN, its training procedure and the sampling of the (ground truth) RMPC. Regarding the depth of the network, our observations confirm insights in [12]: deep NNs are better suited to obtain an explicit policy representation.

A tradeoff exists between the higher expressiveness and the slower training of deeper networks. We decided to use a fully connected NN with 20 hidden layers, consecutively shrinking the layers from neurons in the first hidden layer to in the output layer. This results in roughly trainable parameters in total. All hidden neurons are ReLU-activated, whereas the output layer is activated linearly. Other techniques such as batch normalization, regularization, or skip connections did not help to improve the approximation.

The RMPC control law can become relatively large in magnitude, which makes the regression more difficult. We circumvent this problem by directly learning the applied input . This can be seen as a zero-centered normalization of the reference output, which allows us to achieve significantly smaller approximation errors. In addition, can be readily evaluated online, since is known.

For the training, we use a set of approximately datapoints which are obtained by offline sampling the RMPC. Our training corpus consists of a combination of random sampling and trajectory-based sampling of i.i.d. trajectories , with trajectory-wise random initial condition and reference . The former helps the network to get an idea of all areas, whereas the latter one represents the areas of high interest.

Given the AMPC design, we next aim to perform the validation as per Sec. III-C. We execute the validation in simulation, which is deterministic. We account for the model mismatch with a separate term during the validation (cf. Prop. 2). We found that for the considered system and controller tasks, performing the validation is demanding. Currently, we are able to satisfy criterion (11) for approximately % of all sampled points. While this is not fully satisfactory for high-probability guarantees on full trajectories, it is still helpful to understand the quality of the learned controller. While no failure cases were observed in the experiments reported herein, performing such a priori validation for the robot implementation is subject to future work.
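To make the statistical side of such a validation concrete, the sketch below applies the one-sided Hoeffding inequality [10] to i.i.d. pass/fail outcomes. It is an illustration only: the exact bound and constants used in Sec. III-C may differ, and the function names and numbers are assumptions introduced for this example.

```python
import math


def hoeffding_lower_bound(n_ok, n_total, delta):
    """Lower bound on the true success probability.

    For n_total i.i.d. pass/fail outcomes with n_ok passes, the one-sided
    Hoeffding inequality gives, with confidence at least 1 - delta,
    mu >= n_ok / n_total - sqrt(ln(1 / delta) / (2 * n_total)).
    """
    mu_emp = n_ok / n_total
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n_total))
    return max(0.0, mu_emp - eps)


def samples_needed(eps, delta):
    """Number of i.i.d. samples so that the Hoeffding margin is at most eps."""
    return math.ceil(math.log(1.0 / delta) / (2.0 * eps ** 2))


# Hypothetical numbers: 1000 validation samples, all passing, 99 % confidence.
bound = hoeffding_lower_bound(n_ok=1000, n_total=1000, delta=0.01)
n_req = samples_needed(eps=0.01, delta=0.01)
```

The margin shrinks only with the square root of the sample count, so a guarantee close to one requires both a very high empirical success rate and a large number of validation samples.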

IV-E Experimental Results AMPC

With the AMPC design, we are able to obtain a 10 times faster feedback on the robot while at the same time reducing the computational demand by a factor of compared to the used RMPC ( vs. evaluation time, times faster update rate). Due to the short evaluation of less than , the control input can be applied immediately for the current sampling interval of ms instead of performing the optimization for the predicted next state. This results in a response time in the interval for the AMPC instead of for the RMPC. The resulting, more aggressive input can be observed in Figure 4. Note that the AMPC sometimes violates the input constraints in the shown experiment. This is mainly due to a combination of a large control gain in the pre-stabilization and large measurement noise in the experiment. To circumvent this problem, the noise could be considered in the design or a less aggressive feedback could be used.

We emphasize that the results and achieved performance are significant, considering the parameters in the nonlinear MPC, while standard explicit MPC approaches are only applicable to small-medium scale linear problems.

V Conclusion

The approach developed in this paper achieves safe and fast tracking control on complex systems such as modern robots by combining robust MPC and NN control.

The proposed robust MPC ensures safe operation (stability, constraint satisfaction) despite uncertain system descriptions. What is more, the MPC scheme simplifies complex tracking control tasks to a single design step by joining otherwise often separate planning and control layers: real-time control commands are directly computed for given reference and constraints. Our experiments on a KUKA LBR4+ arm are the first to demonstrate such robust MPC on a real robotic system. The proposed RMPC thus provides a complete framework for tracking control of complex robotic tasks.

We tackled the computational complexity of MPC in fast robotics applications by proposing an approximate MPC. This approach replaces the online optimization with the evaluation of a NN, which is trained and validated in an offline fashion on a suitably defined robust MPC. The proposed approach demonstrates significant speed and performance improvements. Again, the presented experiments are the first to demonstrate the suitability of such NN-based control on real robots. Providing a priori statistical guarantees for such robot experiments by further improving the learning and validation procedures is a relevant topic for future work.

VI Acknowledgments

The authors thank A. Marco and F. Solowjow for helpful discussions, and their colleagues at MPI-IS who contributed to the Apollo robot platform.

References

  • [1] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl (2019) CasADi: A software framework for nonlinear optimization and optimal control. Mathematical Programming Computation 11. Cited by: §IV-B2.
  • [2] A. Carron, E. Arcari, M. Wermelinger, L. Hewing, M. Hutter, and M. N. Zeilinger (2019) Data-driven model predictive control for trajectory tracking with a robotic arm. IEEE Robotics and Automation Letters 4 (4), pp. 3758–3765. Cited by: §I, §IV-A1, §IV-B1, Remark 1.
  • [3] S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J. Pappas, and M. Morari (2018) Approximating explicit model predictive control using constrained neural networks. In Proc. American Control Conf. (ACC), pp. 1520–1527. External Links: Document, ISSN 2378-5861 Cited by: §I.
  • [4] S. W. Chen, T. Wang, N. Atanasov, V. Kumar, and M. Morari (2019) Large scale model predictive control with neural networks and primal active sets. arXiv preprint arXiv:1910.10835. Cited by: §I, §III-C.
  • [5] L. Chisci, J. A. Rossiter, and G. Zappa (2001) Systems with persistent disturbances: predictive control with restricted constraints. Automatica 37 (7), pp. 1019–1028. Cited by: §I.
  • [6] M. Faroni, M. Beschi, N. Pedrocchi, and A. Visioli (2019-02) Predictive inverse kinematics for redundant manipulators with task scaling and kinematic constraints. IEEE Transactions on Robotics 35 (1), pp. 278–285. External Links: Document, ISSN 1941-0468 Cited by: §I.
  • [7] T. Faulwasser, T. Weber, P. Zometa, and R. Findeisen (2017-07) Implementation of nonlinear model predictive path-following control for an industrial robot. IEEE Trans. on Control Systems Technology 25 (4), pp. 1505–1511. External Links: Document, ISSN 1063-6536 Cited by: §I, Remark 1.
  • [8] D. Fridovich-Keil, S. L. Herbert, J. F. Fisac, S. Deglurkar, and C. J. Tomlin (2018) Planning, fast and slow: a framework for adaptive real-time safe trajectory planning. In Proc. IEEE International Conf. on Robotics and Automation (ICRA), pp. 387–394. Cited by: §I.
  • [9] M. Hertneck, J. Köhler, S. Trimpe, and F. Allgöwer (2018-07) Learning an approximate model predictive controller with guarantees. IEEE Control Systems Letters 2 (3), pp. 543–548. External Links: Document, ISSN 2475-1456 Cited by: §I, §I, §III-C2, §III-C3, §III-C, §III, Lemma 1.
  • [10] W. Hoeffding (1963) Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58 (301), pp. 13–30. External Links: ISSN 01621459 Cited by: §III-C2.
  • [11] A. Jubien, M. Gautier, and A. Janot (2014) Dynamic identification of the Kuka LWR robot using motor torques and joint torque sensors data. In Proc. 19th IFAC World Congress, pp. 8391–8396. Cited by: §IV-A.
  • [12] B. Karg and S. Lucia (2018) Efficient representation and approximation of model predictive control laws via deep learning. arXiv preprint arXiv:1806.10644. Cited by: §I, §IV-D.
  • [13] J. Köhler, R. Soloperto, M. A. Müller, and F. Allgöwer (2019) A computationally efficient robust model predictive control framework for uncertain nonlinear systems. arXiv preprint arXiv:1910.12081. Cited by: §I, §I, §I, §III-A1, §III-A1, §III-A3, §III-B2, §III-B4, §III-B, §III, §IV-B1, Proposition 1.
  • [14] J. Köhler, M. A. Müller, and F. Allgöwer (2019) A nonlinear model predictive control framework using reference generic terminal ingredients. IEEE Trans. Autom. Control. Note: accepted Cited by: §III-B3.
  • [15] J. Köhler, M. A. Müller, and F. Allgöwer (2019) A nonlinear tracking model predictive control scheme for dynamic target signals. arXiv preprint arXiv:1911.03304. Cited by: §I, §III-B1, §III-B2, §III-B2, §III-B4, §III-B4, §III-B.
  • [16] D. Limon, I. Alvarado, T. Alamo, and E. Camacho (2010) Robust tube-based MPC for tracking of constrained linear systems with additive disturbances. J. Proc. Contr. 20 (3), pp. 248–260. Cited by: §I.
  • [17] D. Limon, A. Ferramosca, I. Alvarado, and T. Alamo (2018-11) Nonlinear MPC for tracking piece-wise constant reference signals. IEEE Trans. Autom. Control 63 (11), pp. 3735–3750. External Links: Document, ISSN 0018-9286 Cited by: §I, §I, §III-B1, §III-B2, §III-B2, §III-B4, §III-B4, §III-B.
  • [18] A. Majumdar and R. Tedrake (2017) Funnel libraries for real-time robust feedback motion planning. The International Journal of Robotics Research 36 (8), pp. 947–982. Cited by: §I.
  • [19] J. Nubert (2019) Learning-based Approximate Model Predictive Control With Guarantees - Joining Neural Networks with Recent Robust MPC. Master’s Thesis, ETH Zürich, 8092, Zürich. Note: https://www.research-collection.ethz.ch/handle/20.500.11850/385654 Cited by: §IV-A2, §IV-B2.
  • [20] J. B. Rawlings and D. Q. Mayne (2009) Model predictive control: theory and design. Nob Hill Pub.. Cited by: §I, §III-B.
  • [21] B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo (2008) Robotics: modelling, planning and control. 1st edition, Springer. External Links: ISBN 1846286417, 9781846286414 Cited by: §I, §IV-A.
  • [22] S. Singh, A. Majumdar, J. Slotine, and M. Pavone (2017) Robust online motion planning via contraction theory and convex optimization. In Proc. IEEE International Conf. on Robotics and Automation (ICRA), pp. 5883–5890. Cited by: §I.
  • [23] R. Soloperto, J. Köhler, M. A. Müller, and F. Allgöwer (2019) Collision avoidance for uncertain nonlinear systems with moving obstacles using robust model predictive control. In Proc. European Control Conf. (ECC), pp. 811–817. Cited by: Remark 5.
  • [24] Y. Stuerz, L. Affolter, and R. Smith (2017) Parameter identification of the Kuka LBR iiwa robot including constraints on physical feasibility. In Proc. 20th IFAC World Congress, pp. 6863–6868. Cited by: §IV-A.
  • [25] M. E. Villanueva, R. Quirynen, M. Diehl, B. Chachuat, and B. Houska (2017) Robust MPC via min–max differential inequalities. Automatica 77, pp. 311–321. Cited by: §I.
  • [26] X. Zhang, M. Bujarbaruah, and F. Borrelli (2019) Near-optimal rapid MPC using neural networks: a primal-dual policy learning framework. arXiv preprint arXiv:1912.04744. Cited by: §I.
  • [27] X. Zhang, A. Liniger, and F. Borrelli (2017) Optimization-based collision avoidance. arXiv preprint arXiv:1711.03449. Cited by: Remark 5.