Log In Sign Up

Neural Solvers for Fast and Accurate Numerical Optimal Control

Synthesizing optimal controllers for dynamical systems often involves solving optimization problems with hard real-time constraints. These constraints determine the class of numerical methods that can be applied: computationally expensive but accurate numerical routines are replaced by fast and inaccurate methods, trading inference time for solution accuracy. This paper provides techniques to improve the quality of optimized control policies given a fixed computational budget. We achieve the above via a hypersolvers approach, which hybridizes a differential equation solver and a neural network. The performance is evaluated in direct and receding-horizon optimal control tasks in both low and high dimensions, where the proposed approach shows consistent Pareto improvements in solution accuracy and control performance.


Implicit energy regularization of neural ordinary-differential-equation control

Although optimal control problems of dynamical systems can be formulated...

Optimal Control for Structurally Sparse Systems using Graphical Inference

Dynamical systems with a distributed yet interconnected structure, like ...

A Direct-Indirect Hybridization Approach to Control-Limited DDP

Optimal control is a widely used tool for synthesizing motions and contr...

Incremental Correction in Dynamic Systems Modelled with Neural Networks for Constraint Satisfaction

This study presents incremental correction methods for refining neural n...

Surrogate Models for Optimization of Dynamical Systems

Driven by increased complexity of dynamical systems, the solution of sys...

Learning a Family of Optimal State Feedback Controllers

Solving optimal control problems is well known to be very computationall...

1 Introduction

Optimal control of complex, high–dimensional systems requires computationally expensive numerical methods for differential equations (pytlak2006numerical; rao2009survey). Here, real–time and hardware constraints preclude the use of accurate and expensive methods, forcing instead the application of cheaper and less accurate algorithms. While the paradigm of optimal control has successfully been applied in various domains (vadali1999initial; lewis2012optimal; zhang2016optimal), improving accuracy while satisfying computational budget constraints is still a great challenge (ross2006issues; baotic2008efficient). To alleviate computational overheads, we detail a procedure for offline optimization and subsequent online application of hypersolvers (poli2020hypersolvers) to optimal control problems. These hybrid solvers achieve the accuracy of higher–order methods by augmenting numerical results of a base solver with a learning component trained to approximate local truncation residuals. When the cost of a single forward–pass of the learning component is kept sufficiently small, hypersolvers improve the computation–accuracy Pareto front of low–order explicit solvers (butcher2016numericalmethods). However, direct application of hybrid solvers to controlled dynamical system involves learning truncation residuals on the higher–dimensional spaces of state and control inputs. To extend the range of applicability of hypersolvers to controlled dynamical systems, we propose two pretraining strategies designed to improve, in the set of admissible control inputs, on the average or worst–case hypersolver solution. With the proposed methodology, we empirically show that Pareto front improvements of hypersolvers hold even for optimal control tasks. In particular, we then carry out performance and generalization evaluations in direct and model predictive control tasks. Here, we confirm Pareto front improvements in terms of solution accuracy and subsequent control performance, leading to higher quality control policies and lower control losses. In high–dimensional regimes, we obtain the same control policy as the one obtained by accurate high–order solvers with more than speedup.

2 Numerical Optimal Control

We consider control of general nonlinear systems of the form


with state , input defined on a compact time domain where is a finite set of free parameters of the controller. Solutions of (1) are denoted with for all . Given some objective function and a distribution of initial conditions with support in , we consider the following nonlinear program, constrained to the system dynamics:

subject to
Figure 1: Overview of the proposed method. [Left] The hypersolver is trained to approximate residuals given a distribution of control inputs and states. [Right] The pre–trained hypersolver model is then used to accelerate and improve the accuracy of numerical solutions used during optimization of control policies, leading to higher–quality controllers.

where the controller parameters are optimized. We will henceforth omit the subscript and write . Since analytic solutions of (2) exist only for limited classes of systems and objectives, numerical solvers are often applied to iteratively find a solution. For these reasons, problem 2 is often referred to as numerical optimal control.

Direct optimal control

If the problem (2) is solved offline by directly optimizing over complete trajectories, we call it direct optimal control. The infinite–dimensional optimal control problem is time–discretized and solved numerically: the obtained control policy is then applied to the real target system without further optimization.

Model predictive control

Also known in the literature as receding horizon control, Model Predictive Control (MPC) is a class of flexible control algorithms capable of taking into consideration constraints and nonlinearities (mayne1988receding; Garcia1989ModelPC). MPC considers finite time windows which are then shifted forward in a receding manner. The control problem is then solved for each window by iteratively forward–propagating trajectories with numerical solvers i.e. predicting the set of future trajectories with a candidate controller and then adjusting it iteratively to optimize the cost function (further details on the MPC formulation in Appendix B.2). The optimization is reiterated online until the end of the control time horizon.

2.1 Solver Residuals

Given nominal solutions of (1) we can define the residual of a numerical ODE solver as the normalized error accumulated in a single step size of the method, i.e.


where is the step size and is the order of the numerical solver corresponding to . From the definition of residual in (3), we can define the local truncation error which is the error accumulated in a single step; while the global truncation error represents the error accumulated in the first steps of the numerical solution. Given a –th order explicit solver, we have and (butcher2016numericalmethods).

3 Hypersolvers for Optimal Control

We extend the range of applicability of hypersolvers (poli2020hypersolvers) to controlled dynamical systems. In this Section we discuss the proposed hypersolver architectures and pre–training strategies of the proposed hypersolver methodology for numerical optimal control of controlled dynamical systems.

3.1 Hypersolvers

Given a –order base solver update map , the corresponding hypersolver is the discrete iteration


where is some parametric function with free parameters . The core idea is to select as some function with universal approximation properties and fit the higher-order terms of the base solver by explicitly minimizing the residuals over a set of state and control input samples. This procedure leads to a reduction of the overall local truncation error , i.e. we can improve the base solver accuracy with the only computational overhead of evaluating the function . It is also proven that, if is a –approximator of , i.e.


then , where depends on the hypersolver training results (poli2020hypersolvers, Theorem 1). This result practically guarantees that if is a good approximator for , i.e. , then the overall local truncation error of the hypersolved ODE is significantly reduced with guaranteed upper bounds.

3.2 Numerical Optimal Control with Hypersolvers

Our approach relies on the pre–trained hypersolver model for obtaining solutions to the trajectories of the optimal control problem (2). After the initial training stage, control policies are numerically optimized to minimize the cost function (see Appendix B.3 for further details). Figure 1 shows an overview of the proposed approach consisting in pre–training and system control.

4 Hypersolver Pre–training and Architectures

We introduce in Section 4.1loss functions which are used in the proposed pre–training methods of Section 4.2 and Section 4.3. We also check the generalization properties of hypersolvers with different architectures in Section 4.4. In Section 4.5 we introduce multi–stage hypersolvers

in which an additional first–order learned term is employed for correcting errors in the vector field.

Figure 2: Mean local residuals of the spring–mass system of (17) as a function of control inputs at different step sizes . HyperEuler (see Appendix A.1 for its explicit formulation) improves on the local residuals compared to the baseline Euler and even compared to higher-order ODE solvers at larger step sizes.

4.1 Loss Functions

Residual fitting

Training the hypersolver on a single nominal trajectory results in a supervised learning problem where we minimize point–wise the Euclidean distance between the residual (3) and the output of , resulting in an optimization problem minimizing a loss function of the form


which is also called residual fitting since the target of is the residual .

Trajectory fitting

The optimization can also be carried out via trajectory fitting as following


where corresponds to the exact one–step trajectory and is its approximation, derived via (4) for standard hypersolvers or via (11) for their multi–stage counterparts. This method can also be used to contain the global truncation error in the domain. We will refer to as a loss function of either residual or trajectory fitting types; we note that these loss functions may also be combined depending on the application. The goal is to train the hypersolver network to explore the state–control spaces so that it can effectively minimize the truncation error. We propose two methods with different purposes: stochastic exploration aiming at minimizing the average truncation error and active error minimization whose goal is to reduce the maximum error i.e., due to control inputs yielding high losses.

4.2 Stochastic Exploration

Stochastic exploration aims to minimize the average error of the visited state–controller space i.e., to produce optimal hypersolver parameters as the solution of a nonlinear program


where is a distribution with support in of the state and controller spaces and is the training loss function. In order to guarantee sufficient exploration of the state–controller space, we use Monte Carlo sampling (robert2013monte) from the given distribution. In particular, batches of initial conditions are sampled from and the loss function is calculated with the given system and step size

. We then perform backpropagation for updating the parameters of the hypersolver using a

stochastic gradient descent (SGD) algorithm e.g., (kingma2017adam)

and repeat the procedure for every training epoch. Figure

2 shows pre–training results with stochastic exploration for different step sizes (see Appendix C.2). We notice how higher residual values generally correspond to higher absolute values of control inputs. Many systems in practice are subject to controls that are constrained in magnitude either due to physical limitations of the actuators or safety restraints of the workspace. This property allows us to design an exploration strategy that focuses on worst-case scenarios i.e. largest control inputs.

4.3 Active Error Minimization

Figure 3: Mean Absolute Error (MAE) along trajectories with different pre–training techniques on the spring–mass system of (17). [Top] Stochastic exploration performs better on average i.e. . [Bottom] Active error minimization achieves better results in limit situations as in the case of a bang–bang controller i.e. , in which controllers yielding the highest residuals have been minimized.

The goal of active error minimization is to actively reduce the highest losses in terms of the control inputs, i.e., to obtain as the solution to a minmax problem:


Similarly to stochastic exploration, we create distribution with support in and perform Monte Carlo sampling of batches , from . Then, losses are computed pair–wisely for each state with each control input . We then take the first controllers yielding the maximum loss for each state. The loss is recalculated using these controller values with their respective states and SGD updates to hypersolver parameters are performed. Figure 3 shows a comparison of the pre–training techniques (further experimental details in Appendix C.2). The propagated error on trajectories for the hypersolver pre–trained via stochastic exploration is lower on average with random control inputs compared with the one pre–trained with active error minimization. The latter accumulates lower error for controllers yielding high residuals. [width=]Different exploration strategies may be used depending on the down–stream control task.

4.4 Generalization Properties of Different Architectures

Figure 4:

Generalization outside of the training region (red rectangle) in the state space of an inverted pendulum model with different hypersolver activation functions. Architectures containing activation functions with periodic components achieve better extrapolation properties compared to the others.

We have assumed the state and controller spaces to be bounded and that training be performed by sampling for their known distributions. While this is sufficient for optimal control problems given a priori known bounds, we also investigate how the system generalizes to unseen states and control input values. In particular, we found that activation functions have an impact on the end result of generalization beyond training boundaries. We take into consideration two commonly used activation functions, and , along with network architectures which employ activation functions containing periodic components: (sitzmann2020implicit) and (ziyin2020neuralsnake). We train hypersolver models with the different activation functions for the inverted pendulum model of (18) with common experimental settings (see Appendix C.3). Figure 4 shows generalization outside the training states (see Figure 9 in Appendix C.3 for generalization of controllers and step sizes). We notice that while and perform well on the training set of interest, performance degrades rapidly outside of it. On the other one hand, and manage to extrapolate the periodicity of the residual distribution even outside of the training region, thus providing further empirical evidence of the universal extrapolation theorem (ziyin2020neuralsnake, Theorem 3). Activation function choice plays an important role in Hypersolver performance and generalization.

4.5 Multi–stage Hypersolvers

We have so far considered the case in which the vector field (1) fully characterizes the system dynamics. However, if the model does not completely describe the actual system dynamics, first–order errors are introduced. We propose Multi-Stage Hypersolvers to correct these errors: an additional term is introduced in order to correct the inaccurate dynamics . The resulting procedure is a modified version of (4) in which the base solver does not iterate over the modeled vector field but over its corrected version :


where is a function with universal approximation properties. While the inner stage is a first–order error approximator, the outer stage further reduces errors approximating the –th order residuals:


We note that is continuously adjusted due to the optimization of . For this reason, it is not possible to derive the analytical expression of the residuals to train the stages with the residual fitting loss function (6). Instead, both stages can be optimized at the same time via backpropagation calculated on one–step trajectory fitting loss (7) which does not require explicit residuals calculation.

5 Experiments

We introduce the experimental results divided for each system into hypersolver pre–training and subsequent optimal control. We use as accurate adaptive step–size solvers the Dormand/Prince method (Dormand1980AFO) and an improved version of it by Tsitouras (TSITOURAS2011770) for training the hypersolvers and to test the control performance at runtime.

5.1 Direct optimal control of a Pendulum

Hypersolver pre–training

We consider the inverted pendulum model with a torsional spring described in (18). We select

as a uniform distribution with support in

where and to guarantee sufficient exploration of the state-controller space. Nominal solutions are calculated using with absolute and relative tolerances set to . We train the hypersolver on local residuals via stochastic exploration using the optimizer with learning rate of for epochs.

Direct optimal control

The goal is to stabilize the inverted pendulum in the vertical position . We choose and a step size for the experiment. The control input is assumed continuously time–varying. The neural controller is optimized via SGD with with learning rate of for epochs. Figure 5 shows nominal controlled trajectories of HyperEuler and other baseline fixed–step size solvers. Trajectories obtained with the controller optimized with HyperEuler reach final positions while Midpoint and RK4 ones and respectively. On the other hand, the controller optimized with the Euler solver fails to control some trajectories obtaining a final . HyperEuler considerably improved on the Euler baseline while requiring only more Floating Point Operations (FLOPs) and less compared to Midpoint. Further details are available in Appendix C.3.

Figure 5: Direct optimal control of the inverted pendulum in phase space. While the controller optimized with the Euler solver fails to control the system for some trajectories, the one obtained with HyperEuler improves the performance while introducing a minimal overhead with results comparable to higher–order solvers.

5.2 Model Predictive Control of a Cart-Pole System

Hypersolver pre–training

We consider the partial dynamics of the cart–pole system of (19) with wrong parameters for the frictions between cart and track as well as the one between cart and pole. We employ the multi–stage hypersolver approach to correct the first–order error in the vector field as well as base solver residual. We select as a uniform distribution with support in where and . Nominal solutions are calculated on the accurate system using instead of adaptive–step solvers due faster training times. We train our multi–stage Hypersolver (i.e. a multi–stage hypersolver with the second–order Midpoint as base solver with the partial dynamics) on nominal trajectories of the accurate system via stochastic exploration using the optimizer for epochs, where we set the learning rate to for the first epochs, then decrease it to for epochs and to for the last .

Model predictive control

The goal is to stabilize the cart–pole system in the vertical position around the origin, e.g. . We choose and a step size for the experiment. The control input is assumed piece-wise constant during MPC sampling times. The receding horizon is chosen as . The neural controller is optimized via SGD with with learning rate of for a maximum of iterations at each sampling time. Figure 6 shows nominal controlled trajectories of multi–stage Hypersolver and other baseline solvers. The Midpoint solver on the inaccurate model fails to stabilize the system at the origin position , while multi–stage Hypersolver manages to stabilize the cart–pole system and improve on final positions . Further details are available in Appendix C.4.

Figure 6: Model Predictive Control with constrained inputs on the cart–pole model. MPC with the Midpoint solver iterating on the partial dynamic model successfully swings up the pole but fails to reach the target position. Multi–stage Hypersolver with the Midpoint base solver has knowledge restricted to the inaccurate system, yet it manages to obtain a similar control performance compared to controllers with access to the nominal dynamics while also needing less control effort and absolute energy inflow compared to its base solver.

Multi–stage Hypersolvers can correct first–order errors on dynamic models and base solver residuals.

5.3 Model Predictive Control of a Quadcopter

Hypersolver pre–training

We consider the quadcopter model of (20). We select as a uniform distribution with support in where is chosen as a distribution of possible visited states and each of the four motors has control inputs . Nominal solutions are calculated on the accurate system using with relative and absolute tolerances set to and respectively. We train HyperEuler on local residuals via stochastic exploration using the optimizer with learning rate of for epochs.

Model predictive control

The control goal is to reach a final positions . We choose and a step size for the experiment. The control input is assumed piece–wise constant during MPC sampling times. The receding horizon is chosen as . The neural controller is optimized via SGD with with learning rate of for iterations at each sampling time. Figure 7 shows local residual distribution and control performance on the quadcopter over experiments starting at random initial conditions which are kept common for the different ODE solvers. HyperEuler requires a single function evaluation per step as for the Euler solver compared to two function evaluations per step for Midpoint and four for RK4. Controlled trajectories optimized with Euler, Midpoint and RK4 collect an error on final positions of , , respectively while HyperEuler achieves the lowest terminal error value of . Additional experimental details are available in Appendix C.5.

Figure 7: [Left] Local residual distribution for the quadcopter model for . [Center] Trajectories of controlled quadcopters with MPC whose receding horizon controller is optimized by solving the ODE with different methods. [Right] Final positions error distribution. The proposed approach with HyperEuler achieves lower average error compared to other baseline solvers while requiring a low overhead compared to higher–order solvers due to a smaller number of dynamics function evaluations.

5.4 Boundary Control of a Timoshenko Beam

Hypersolver pre–training

We consider the finite element discretization of the Timoshenko beam of (22). We create as a distribution with support in which is generated at training time via random walks from known boundary conditions in order to guarantee both physical feasibility and sufficient exploration of the state-controller space (see Appendix C.6 for further details). Nominal solutions are calculated using with absolute and relative tolerances set to . We train the hypersolver on local residuals via stochastic exploration using the optimizer for epochs, where we set the learning rate to for the first epochs, then decrease it to for epochs and to for the last .

Figure 8: Displacement variables and of the discretized Timoshenko beam as a function of position of the finite elements and time . The controller optimized with HyperEuler manages to stabilize the beam while the baseline solvers Euler and Midpoint fail, yet requiring less than a third in terms of runtime compared to RK4.

Boundary direct optimal control

The task is to stabilize the beam in the straight position, i.e. each of its elements have velocities and displacements equal to . We choose and step size for the experiment. The control input is assumed continuously time–varying. The neural controller is optimized via SGD with with learning rate of for epochs. Figure 8 shows nominal controlled trajectories for HyperEuler and other baseline fixed–step size solvers. Control policies trained with Euler and Midpoint obtain averaged final states of and thus failing to stabilize the beam, while HyperEuler and RK4 obtain and respectively. HyperEuler considerably improves on both the Euler and Midpoint baselines obtaining a very similar performance to RK4, while requiring less FLOPs; the mean runtime per training iteration was cut from for RK4 to just for HyperEuler. Further details on this experiment are available in Appendix C.6. Hypersolvers are even more impactful in complex high–dimensional controlled systems.

6 Related Work

This work is rooted in the broader literature on surrogate methods for speeding up simulations and solutions of dynamical systems (grzeszczuk1998neuroanimator; james2003precomputing; gorissen2010surrogate). Differently from these approaches, we investigate a methodology to enable faster solution during a downstream, online optimization problem involving a potential mismatch compared to data seen during pre–training. We achieve this through the application of the hypersolver (poli2020hypersolvers) paradigm. Modeling mismatches between approximate and nominal models is explored in (saveriano2017residualdynamics) where residual dynamics are learned efficiently along with the control policy while (fisac2018uncertainrobotics; taylor2019learningsafetycritical)

model systems uncertainties in the context of safety–critical control. In contrast to previous work, we model uncertainties with the proposed multi–stage hypersolver approach by closely interacting with the underlying ODE base solvers and their residuals to improve solution accuracy. The synergy between machine learning and optimal control continues a long line of research on introducing neural networks in optimal control

(hunt1992neuralautomatica), applied to modeling (lin1995new), identification (chu1990neural) or parametrization of the controller itself (lin1991neural). Existing surrogate methods for systems (grzeszczuk1998neuroanimator; james2003precomputing) pay a computational cost upfront to accelerate downstream simulation. However, ensuring transfer from offline optimization to the online setting is still an open problem. In our approach, we investigate several strategies for an accurate offline–online transfer of a given hypersolver, depending on desiderata on its performance in terms of average residuals and error propagation on the online application. Beyond hypersolvers, our approach further leverages the latest advances in hardware and machine learning software (paszke2019pytorch) by solving thousands of ODEs in parallel on graphics processing units (GPUs).

7 Conclusion

We presented a novel method for obtaining fast and accurate control policies. Hypersolver models were firstly pre–trained on distributions of states and controllers to approximate higher–order residuals of base fixed–step ODE solvers. The obtained models were then employed to improve the accuracy of trajectory solutions over which control policies were optimized. We verified that our method shows consistent improvements in the accuracy of ODE solutions and thus on the quality of control policies optimized through numerical solutions of the system. We envision the proposed approach to benefit the control field and robotics in both simulated and potentially real–world environments by efficiently solving high–dimensional space–continuous problems.

Code of Ethics

We acknowledge that all the authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Reproducibility Statement

We share the code used in this paper and make it publicly available on Github111Supporting reproducibility code is at
. The following appendix also supplements the main text by providing additional clarifications. In particular, Appendix A provides further details on the considered hypersolver models. We provide additional information on optimal control policy in Appendix B while in Appendix C we provide details on on the system dynamics, architectures and other experimental details. Additional explanations are also provided as comments in the shared code implementation.


Appendix A Additional Hypersolver Material

a.1 Explicit HyperEuler Formulation

Our analysis in the experiments takes into consideration the hypersolved version of the Euler scheme, namely HyperEuler. Since Euler is a first–order method, it requires the least number of function evaluations (NFE) of the vector field in (1) and yields a second order local truncation error . This error is larger than other fixed–step solvers and thus has the most room for potential improvements. The base solver scheme of (4) can be written as , which is approximating the next state by adding an evaluation of the vector field multiplied by the step size . We can write the HyperEuler update explicitly as


while we write its residual as


a.2 Hypersolvers for Time–invariant Systems

A time–invariant system with time–invariant controller can be described as following


in which and do not explicitly depend on time. The models considered in the experiments satisfy this property.

Appendix B Control Policy Details

b.1 Optimal Control Cost Function

The general form of the integral cost functional can be written as follows


where matrix is a penalty on deviations from the target of the last states, penalizes all deviations from the target of intermediate states while is a regulator for the control inputs. Evaluation of (15) usually requires numerical solvers such as the proposed hypersolvers of this work. Discretizations of the cost functional are also called cost function in the literature.

b.2 Model Predictive Control Formulation

The following problem is solved online and iteratively until the end of the time span

subject to

where is a cost function and is the receding horizon over which predicted future trajectories are optimized.

b.3 Neural Optimal Control

We parametrize the control policy of problem (2) as where is a finite set of free parameters. Specifically, we consider the case of neural optimal control in which controller

is a multi–layer perceptron. The optimal control task is to minimize the cost function

described in (15) and we do so by optimizing the parameters via SGD; in particular, we use the (kingma2017adam) optimizer for all the experiments.

Appendix C Experimental Details

In this section we include additional modeling and experimental details divided into the different dynamical systems.

c.1 Hypersolver Network Architecture

We design the hypersolver networks

as feed–forward neural networks. Table

LABEL:tab:parameters summarizes the parameters used for the considered controlled systems, where Activation denotes the activation functions, i.e. : , : and (ziyin2020neuralsnake).

Spring–Mass Inverted
Pendulum 222This architecture refers to the optimal control experiment. Details on hypersolver models for the generalization experiment on the inverted pendulum are available in Appendix C.3. Cart–Pole333In the multi–stage hypersolver experiment we consider both the inner stage and the outer stage with the same architecture and jointly trained (more information and ablation study in Appendix C.4). Quadcopter Timoshenko
Input Layer 5 5 9 28 322
Hidden Layer 1 32 32 32 64 256
Activation 1 Softplus Softplus Snake Softplus Snake
Hidden Layer 2 32 32 32 64 256
Activation 2 Tanh Tanh Snake Softplus Snake
Output Layer 2 2 4 12 160
Table 1: Hyper–parameters for the hypersolver networks in the experiments.

We also use the vector field as an input of the hypersolver, which does not require a further evaluation since it is pre–evaluated at runtime by the base solver . We emphasize that the size of the network should depend on the application: a too–large neural network may require more computations than just increasing the numerical solver’s order: Pareto optimality of hypersolvers also depends on their complexity. Keeping their neural network small enough guarantees that evaluating the hypersolvers is cheaper than resorting to more complex numerical routines.

c.2 Spring-mass System

System Dynamics

The spring-mass system considered is described in the Hamiltonian formulation by


where and .

Pre-training methods comparison

We select as a uniform distribution with support in where while . Nominal solutions are calculated on the accurate system using with relative and absolute tolerances set to and respectively. We train two separate HyperEuler models with different training methods on local residual for step size : stochastic exploration and active error minimization. The optimizer used is with learning rate of for epochs.

Hypersolvers with different step sizes

We select as a uniform distribution with support in where while . Nominal solutions are calculated on the accurate system using with relative and absolute tolerances set to and respectively. We train separate HyperEuler models with stochastic exploration with different step sizes . The optimizer used is with learning rate of for epochs.

c.3 Inverted Pendulum

System Dynamics

We model the inverted pendulum with elastic joint with Hamiltonian dynamics via the following:


where , , , , .

Figure 9: Generalization with different hypersolver activation functions (HyperEuler models are marked with ”HE”) on the inverted pendulum. [Left] Generalization for the controller space outside of the training region (red area). The architecture with Snake can to generalize better compared to other hypersolvers. [Right] Generalization for different time steps outside of the training step (red line). HyperEuler is able to improve the baseline Euler solver performance even for unseen .

Pre–training for the generalization study

We perform sampling via stochastic exploration from the uniform distribution with support in with and for the different architectures. We choose as a common time step ; the networks are trained for 100000 epochs with the optimizer and learning rate of . The network architectures share the same parameters as the inverted pendulum ones in LABEL:tab:parameters, while the activation functions are substituted by the ones in Figure 4. The architecture is chosen with 2 hidden layers of size 64. Figure 9 provides an additional empirical results on generalization properties across controller values and step sizes: we notice how can generalize to unseen control values better compared to other hypersolvers.

Additional visualization

Figure 10 provides an additional visualization of the inverted pendulum controlled trajectories from Figure 5 with positions and momenta over time.

Figure 10: Controlled trajectories of the inverted pendulum with controllers optimized via different solvers.

c.4 Cart-Pole

System Dynamics

We consider a continuous version of a cart–pole system additionally taking into account the full dynamic model in razvan2005cartpole. This formulation considers the friction coefficient between the track and the cart inducing a force opposing the linear motion as well as the friction generated between the cart and the pole , whose generated torque opposes the angular motion. The full cart–pole model is described by the four variables and the accelerations update is as following


where , , and . represents the normal force acting on the cart. For simulation purposes, we consider its sign to be always positive when evaluating the sign (sgn) function as the cart should normally not jump off the track. Setting , to results in the same dynamic model used in the OpenAI Gym (brockman2016openaigym) implementation.

Figure 11: One step Mean Absolute Error (MAE) for multi–stage hypersolvers and different solvers as well as correction schemes in the ablation study. Multi–stage Hypersolver (MS) with joint training and Midpoint base solver iterating on an inaccurate vector field agnostic of friction forces outperforms the Midpoint solver with full knowledge of the vector field.

Multistage training strategies

We study two different training strategies for the inner and outer networks and in (11). We first consider a joint training strategy in which both stages are trained at the same time via stochastic exploration. Secondly, we do a separated training in which only the inner stage network is trained first and then the outer stage network is added and only its parameters are trained in a finetuning process. We find that, as shown in Figure 11, joint training yields slightly better results. A further advantage of jointly training both stages is that only a single training procedure is required.

Ablation study

We consider the same model as our Multi–stage Hypersolver with base Midpoint solver but no first–stage , which corresponds to learning the residual dynamics only, and we train this model with stochastic exploration. We show in Figure 11 that while the residual dynamics model can improve the one–step error compared to the base solver on the inaccurate dynamics, it performs worse the Multi–stage Hypersolver scheme. We additionally study the contribution of each stage in the prediction error improvements by separately zeroing out the contributions of the inner and outer stage. While iterating over the inner stage only improves on the base–solver error, including the outer stage further contributes in improving the error. We notice how the excluding the inner–stage yields higher errors: this may be due to the fact that the inner–stage specializes in correcting the first–order vector field inaccuracies while the outer–stage corrects the one step base solver residual.

Additional experimental details

For the Multi–stage Hypersolver control experiment, we pre–train both inner and outer stage networks and in (11) at the same time using stochastic exploration. The base solver is chosen as the second–order Midpoint iterating on the partial dynamics (19) with , set to . The nominal dynamics considers non–null friction forces: we set the cart friction coefficient to and the one of the pole to . We note how the friction coefficients make the vector field (19) non–smooth: simulation through adaptive–step size solvers as results experimentally time–consuming, hence we resort to for training the hypersolver networks. Nonetheless, as shown in the error propagation of Figure 12, this does not degrade the performance of the trained multi–stage hypersolver scheme. All neural networks in the experiements, including the ablation study, are trained with the optimizer for epochs, where we set the learning rate to for the first epochs, then decrease it to for epochs and to for the last .

c.5 Quadcopter

System Dynamics

The quadcopter model is a suitably modified version of the explicit dynamics update in (panerati2021learningtofly)

for batched training in PyTorch. The following accelerations update describes the dynamic model


mwhere corresponds to the drone positions and to its angular positions; and are its rotation and inertial matrices respectively, is a function calculating the torques induced by the motor speeds , while arm length , mass , gravity acceleration constant along with and are scalar variables describing the quadcopter’s physical properties.

Figure 12: Symmetric Mean Absolute Percentage Error (SMAPE) propagation along controlled trajectories of the cart–pole system. The Multi–stage Hypersolver with knowledge limited to the inaccurate model manages to outperform the Euler solver iterating on the accurate dynamics in terms of positions and angular positions.

c.6 Timoshenko Beam

System Dynamics

We consider as a system from the theory of continuum dynamics the Timoshenko beam with no dissipation described in (Macchelli2004ModelingAC; massaroli2021differentiablemsl). The system can be described in the coenergy formulation by the following partial differential equation (PDE)


where is the mass density, is the cross section area, is the rotational inertia, and are the shear and bending compliance; the discretized state variables represent the translational and rotational velocities respectively while denote the translational and rotational displacements. cantilever beam. In order to discretize the problem, we implement a software routine based on the (alnaes2015fenics)open–source software suite to obtain the finite–elements discretization of the Timoshenko PDE of (21) given the number of elements, physical parameters of the model and initial conditions of the beam. We choose a 40 elements discretization of the PDE for a total of 160 dimensions of the discretized state and we initialize the beam at time as . The system can thus be reduced to the following controlled linear system


where the mass matrices , matrices , vectors are computed through the routine and boundary controllers and are the control torque and the control force applied at the free end of the beam.

Stochastic exploration strategy via random walks

We pre–train the hypersolver model via stochastic exploration of the state-controller space . We restrict the boundary control input values in

. As for the state space, naively generating a probability distribution with box boundaries on each of the 160 dimensions of

would require an inefficient search over this high-dimensional space: in fact, not every combination is physically feasible due to the Timoshenko beam’s structure. We solve this problem by propagating batched trajectories with RK4 from the initial boundary condition with random control actions sampled from a uniform distribution with support in applied for a time . We save the states and forward propagate from these states again by sampling from the controller and time distributions. We repeat the process times and obtain a sequence of batched initial conditions characterized by physical feasibility. Finally, we train the hypersolver with stochastic exploration by sampling from the generated distribution on local one–step residuals as described in Section 5.4. This initial state generation strategy is repeated every 100 epochs for guaranteeing an extensive exploration of all possible boundary conditions. Figure 13 shows the error propagation over controlled trajectories: the trained HyperEuler achieves the lowest error among baseline fixed–step solvers.

Figure 13: Mean Absolute Error (MAE) propagation on velocities and displacements for the finite elements of the discretized Timoshenko beam along controlled trajectories. While solutions from Euler and Midpoint quickly diverge due to the system’s stiffness, HyperEuler manages not only to contain errors but even outperform the fourth-order RK4 whilst requiring a fraction of the number of vector field evaluations.

Additional details on the results

We report additional details regarding the runtime of the experiments on the Timoshenko Beam. As for the training time, training the hypersolver for epochs takes around minutes, where the time for each training epoch slightly varies depending on the length of the sequence of initial condition batches obtained via random walks. As for the averaged runtime per training epoch during the control policy optimization, HyperEuler takes per training iteration, Euler , Midpoint and RK4 . Experiments were run on the CPU of the machine described in Section C.7.

c.7 Hardware and Software

Experiments were carried out on a machine equipped with an AMD Ryzen Threadripper 3960X CPU with threads and two NVIDIA RTX 3090 graphic cards. Software–wise, we used  (paszke2019pytorch)

for deep learning and the

(poli2020torchdyn) and (chen2019neural) libraries for ODE solvers. We additionally share the code used in this paper and make it publicly available on Github444Supporting reproducibility code is at