Robust Counterexample-guided Optimization for Planning from Differentiable Temporal Logic

by   Charles Dawson, et al.

Signal temporal logic (STL) provides a powerful, flexible framework for specifying complex autonomy tasks; however, existing methods for planning based on STL specifications have difficulty scaling to long-horizon tasks and are not robust to external disturbances. In this paper, we present an algorithm for finding robust plans that satisfy STL specifications. Our method alternates between local optimization and local falsification, using automatically differentiable temporal logic to iteratively optimize its plan in response to counterexamples found during the falsification process. We benchmark our counterexample-guided planning method against state-of-the-art planning methods on two long-horizon satellite rendezvous missions, showing that our method finds high-quality plans that satisfy STL specifications despite adversarial disturbances. We find that our method consistently finds plans that are robust to adversarial disturbances and requires less than half the time of competing methods. We provide an implementation of our planner at




I Introduction & Related Work

There is a substantial gap between how many users dream of interacting with intelligent robots and how those robots are programmed in reality. The dream is for the human user to instruct their robot in something not too far from natural language, e.g. “please visit both the gas station and the grocery store, and make sure you get back here within 30 minutes”, or “land at one of three landing pads, but stay clear of other aircraft”. Unfortunately, robots today usually expect much more concrete guidance, such as a specific trajectory or feedback controller. As the tasks we wish to assign our robots grow increasingly complex, there is a correspondingly increased need for flexible specification of robot programs and tools to automatically derive concrete plans from those specifications. Moreover, since the real world is unavoidably messy, any plan thus derived must also be robust to unforeseen variation in the environment; the robot must be able to accomplish its plan even when the environment changes.

Luckily, when it comes to flexibly specifying complex tasks, we have a convenient tool in the form of temporal logic. There are many flavors of temporal logic, but most relevant to many robotics problems is signal temporal logic (or STL), which provides a flexible language for specifying requirements for continuous real-valued signals [donze10, Sun2022, Pant2017]. STL allows a user to specify a wide range of planning problems by combining logical and temporal operators to express requirements about ordering and dependencies between subtasks. In addition, although the formal syntax of STL can seem opaque at first, it is often quite easy to translate STL formulae into readily-understood natural language. Due to its flexibility, STL is a common choice for specifying robotics problems such as trajectory planning [Pant2018, Pantazides2022] and combined task and motion planning [Plaku2016, Sun2022, Takano2021].

A number of classical methods exist for planning from STL specifications, the most common being abstraction-based methods [Plaku2016], mixed-integer optimization-based methods [Sun2022, yang20], and nonlinear optimization methods [Pant2018, Pantazides2022, Leung2020] (other approaches include sampling-based methods such as [kantaros20] and [vasile17]). Abstraction-based methods have the longest history; these methods first construct a discrete abstraction (in the form of a graph or automaton) of the continuous state space, then plan over this discrete abstraction [Plaku2016]. The drawback of abstraction-based methods is that the size of the discrete abstraction grows exponentially with the dimension of the state space, limiting the scalability of these methods.

Other methods, based on mixed-integer optimization, exploit the fact that STL specifications can be expressed as linear constraints with integer variables, and the resulting optimization formulation provide soundness and completeness guarantees. Unfortunately, although mixed-integer optimization is sound and complete, these mixed-integer programs quickly become intractable as the planning horizon increases [raman15, sadraddini15, yang20]. Some works reduce the size of the program by using timed waypoints instead of a receding horizon [Sun2022], but this requires assumptions (such as access to a bounded-error tracking controller) that can be restrictive.

A more recent line of work has focused on solving STL planning problems using nonlinear optimization [Pant2017, Pant2018, Pantazides2022, Leung2020, Takano2021]. In these approaches, the STL specification is replaced with a continuously differentiable approximation and optimized using local gradient-based methods. These approaches achieve increased generality and scalability by sacrificing completeness and optimality guarantees.

A significant gap in the state of the art is that existing optimization-based STL planners [Pant2017, Pant2018, Takano2021, Leung2020, Pantazides2022, Sun2022] do not explicitly consider the effects of environmental disturbances while planning. These approaches include some amount of robustness implicitly, typically by maximizing the margin by which a plan satisfies the STL specification, but this is often not sufficient in practice to prevent the plan from failing in response to small changes in the environment. Some methods do explicitly consider robustness to disturbances [raman15], but our experiments show that they yield mixed-integer problems that are intractable in practice.

In this paper, we fill this gap by developing a robust planner that uses counterexamples (examples of environmental changes that cause the plan to fail) to refine its plan using nonlinear optimization. This planner relies on an iterative optimization process, inspired by solution methods for multi-player games, that alternates between finding a plan that performs well for all counterexamples seen so far and finding new counterexamples to guide the optimization process. Our framework relies on differentiable simulation and differentiable temporal logic to derive gradients of the plan’s performance with respect to both the planning parameters and the environmental disturbance, enabling an efficient search for new plans and counterexamples.

We compare our approach against state-of-the-art methods, including both mixed-integer methods [raman15] and nonlinear optimization methods [Pant2018, Pantazides2022, Leung2020]. We find that our method not only finds plans that succeed despite worst-case disturbances from the environment, but it also requires less than half the time of the next-most-successful method. Our approach easily scales to handle long-horizon tasks with complex STL specifications that are not tractable for mixed-integer programming, and the plans found using our method are consistently more robust than those found using existing methods.

II Preliminaries

We begin by introducing the syntax and semantics of signal temporal logic, or STL. STL defines properties about real-valued functions of time called signals. For our purposes, a signal $s$ is defined by a finite number of sampled points $(t_i, s_i)$, and we assume that the signal is piecewise-affine in between sampled points and constant after the last sample. Syntactically, an STL formula is constructed from predicates based on functions $\mu : \mathbb{R}^n \to \mathbb{R}$, logical connectives, and temporal operators [donze13]. The syntax of an STL formula $\phi$ is defined inductively as:

$$\phi ::= \top \mid \mu(s) \geq 0 \mid \neg \phi \mid \phi_1 \wedge \phi_2 \mid \phi_1\, \mathcal{U}_I\, \phi_2$$

where $I$ is a closed (but potentially unbounded) time interval and $\mathcal{U}_I$ is the “until” operator (read as: within interval $I$, $\phi_1$ must be true until $\phi_2$ becomes true). For convenience, when $I$ is omitted it is assumed to be $[0, \infty)$. Additional temporal operators such as eventually ($\Diamond_I$) and always ($\Box_I$) follow from this basic syntax, as do logical operators such as $\vee$ and $\implies$.

For any signal $s$, an STL formula is satisfied at a given time $t$ according to the following Boolean semantics [donze13]:

$$\begin{aligned}
s_t &\models \mu(s) \geq 0 &&\Leftrightarrow\ \mu(s(t)) \geq 0 \\
s_t &\models \neg\phi &&\Leftrightarrow\ s_t \not\models \phi \\
s_t &\models \phi_1 \wedge \phi_2 &&\Leftrightarrow\ s_t \models \phi_1 \text{ and } s_t \models \phi_2 \\
s_t &\models \phi_1\, \mathcal{U}_I\, \phi_2 &&\Leftrightarrow\ \exists\, t' \in t + I \text{ s.t. } s_{t'} \models \phi_2 \text{ and } s_{t''} \models \phi_1\ \forall\, t'' \in [t, t']
\end{aligned}$$

A useful feature of STL is that, in addition to the Boolean semantics defined above, it also admits a quantitative semantics giving the margin of satisfaction (or robustness margin) of an STL formula, denoted $\rho(\phi, s, t)$. The formula is satisfied when $\rho(\phi, s, t) > 0$ and not satisfied when $\rho(\phi, s, t) < 0$. The robustness margin can also be defined inductively:

$$\begin{aligned}
\rho(\top, s, t) &= \rho_{\max} \\
\rho(\mu(s) \geq 0, s, t) &= \mu(s(t)) \\
\rho(\neg\phi, s, t) &= -\rho(\phi, s, t) \\
\rho(\phi_1 \wedge \phi_2, s, t) &= \min\{\rho(\phi_1, s, t),\ \rho(\phi_2, s, t)\} \\
\rho(\phi_1\, \mathcal{U}_I\, \phi_2, s, t) &= \sup_{t' \in t + I} \min\Big\{\rho(\phi_2, s, t'),\ \inf_{t'' \in [t, t']} \rho(\phi_1, s, t'')\Big\}
\end{aligned}$$

where $\rho_{\max}$ is a constant taken to be greater than all other real values. In practice, linear-time algorithms exist for evaluating $\rho(\phi, s, t)$ given a piecewise-affine signal $s$ [donze13].
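To make the quantitative semantics concrete, the following is a minimal sketch (not the paper's implementation) of robustness evaluation on sampled signals. It assumes predicates of the form $\mu(s) \geq 0$, evaluates robustness only at sample times, and omits the piecewise-affine interpolation and time-interval bounds of the full algorithm in [donze13]; the reverse running min/max makes each temporal operator linear-time in the trace length.

```python
# Minimal sketch of STL quantitative semantics on sampled signals.
# rho_* functions return the robustness margin at every sample time.

def rho_pred(mu, signal):
    # Robustness of the predicate mu(s) >= 0 at each sample.
    return [mu(s) for s in signal]

def rho_not(rho):
    return [-r for r in rho]

def rho_and(rho1, rho2):
    return [min(a, b) for a, b in zip(rho1, rho2)]

def rho_always(rho):
    # "always from time t onward": reverse running minimum (linear time).
    out = [0.0] * len(rho)
    running = float("inf")
    for i in range(len(rho) - 1, -1, -1):
        running = min(running, rho[i])
        out[i] = running
    return out

def rho_eventually(rho):
    # "eventually from time t onward": reverse running maximum.
    out = [0.0] * len(rho)
    running = float("-inf")
    for i in range(len(rho) - 1, -1, -1):
        running = max(running, rho[i])
        out[i] = running
    return out
```

For a signal sampled as `[0.5, 1.0, -0.2, 0.3]`, the margin of "always $x \geq 0$" at time zero is the worst excursion ($-0.2$), while "eventually $x \geq 0$" has margin $1.0$.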

It is important to make a distinction between the robustness margin of the specification, $\rho(\phi, s, t)$, and the robustness of a plan designed to satisfy that specification. $\rho$ measures the margin by which the specification is met for a particular execution of a plan, but it does not provide much information about whether the specification will hold across multiple executions, particularly when external disturbances can affect those executions. In the next section, we formalize the robust planning problem, which aims at finding a plan that will satisfy the STL specification even when affected by external disturbances.

STL syntax may appear opaque at first glance, but its myriad symbols belie the fact that it is often straightforward to translate an STL formula into easily-understood natural language. For example, $\Diamond_{[10, 20]}\big( (x \geq 0)\, \mathcal{U}\, (y \leq 0) \big)$ can be read as “between 10 and 20 seconds from now, $x$ must be positive until $y$ becomes negative.” We provide more examples of STL formulae for robotics problems in Section V.

III Problem Statement

In this paper, we focus on the problem of robust planning from an STL specification, which we view as a sequential two-player zero-sum game between the planner and its environment. In the first step of this game (planning time), the planner has the opportunity to tune a set of design parameters $\theta$, but in the second step (run-time) the environment can change a distinct set of exogenous parameters $\chi$ to degrade the performance of the plan. Together, $\theta$ and $\chi$ define the behavior of an autonomous system $\xi = S(\theta, \chi)$, which we assume is a known simulator function mapping design and exogenous parameters to a length-$T$ trace of states. We assume that $S$ is deterministic, so all uncertainty must be imported via $\chi$, but we assume that $\chi$ may be chosen adversarially to degrade the performance of our chosen $\theta$ as much as possible. We also assume that the designer must commit to a choice of $\theta$ before the adversary chooses $\chi$.

The performance of a plan is given by a cost function $J$ assigning a scalar cost to a behavior trace. To accommodate STL specifications, we deal mainly with cost functions of the form

$$J(\theta, \chi) = -\rho\big(\phi, S(\theta, \chi)\big) + \lambda\, J_{other}\big(S(\theta, \chi)\big)$$

where $\rho$ is the robustness margin of the behavior trace with respect to a given STL specification $\phi$. We negate $\rho$ so that minimizing $J$ maximizes the robustness margin, and the term $J_{other}$ permits us to consider other factors in the plan’s performance (e.g. fuel use). The scaling factor $\lambda$ is typically small to prioritize satisfying the STL specification.
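A cost of this form can be sketched in a few lines. Everything below is an illustrative stand-in: `simulate` plays the role of $S$ on a toy 1-D system, `robustness` implements a toy "always $x \geq 0$" margin, and `fuel_cost` and the weight `LAMBDA` are made-up examples of $J_{other}$ and $\lambda$.

```python
# Hedged sketch of the cost J = -rho + lambda * J_other from Section III.

LAMBDA = 0.01  # small weight: satisfying the specification dominates

def simulate(theta, chi):
    # Toy "simulator": a 1-D trace driven by design parameter theta
    # and pushed downward over time by the disturbance chi.
    return [theta - chi * t for t in range(5)]

def robustness(trace):
    # Margin of a toy "always x >= 0" specification: the worst sample.
    return min(trace)

def fuel_cost(trace):
    # Stand-in secondary objective (e.g. fuel use).
    return sum(abs(x) for x in trace)

def cost(theta, chi):
    trace = simulate(theta, chi)
    return -robustness(trace) + LAMBDA * fuel_cost(trace)
```

Note that a larger disturbance `chi` increases the cost of the same plan `theta`, which is exactly the lever the adversary exploits in the game below.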

Since we assume that $\chi$ can vary adversarially to impose worst-case performance for any plan $\theta$, our goal is to find $\theta$ that is robust to this variation. Concretely, our goal is to solve an optimization problem representing a two-step sequential zero-sum game with two players:

$$\min_{\theta} \max_{\chi}\ J(\theta, \chi) \tag{1}$$
To make this discussion concrete, consider a simple example of path planning for an aerial robot. In this case, $\phi$ might specify that we eventually ($\Diamond$) reach a goal and always ($\Box$) avoid some obstacles, $\theta$ might represent the locations of waypoints along the path and the parameters of a trajectory-tracking controller to follow those waypoints, and $\chi$ might represent the force from wind that attempts to drive the robot off course. The behavior $S$ might be a function that simulates the dynamics of the robot flying through wind, and the additional cost $J_{other}$ might impose a small penalty on large control inputs to conserve battery life. We provide more in-depth examples in Section V.

Our formulation differs from that presented in [Pant2018] and [Pantazides2022]; although both of these works seek to maximize the robustness margin $\rho$, neither considers the effect of disturbances $\chi$. Our formulation is also distinct from the mixed-integer formulation in [Sun2022], since we consider $\rho$ as part of an objective rather than as a constraint. Our unconstrained approach does not provide the same completeness guarantees as mixed-integer constrained optimization (used in [raman15, sadraddini15, Sun2022]), but empirical results in Section V demonstrate that our approach scales much better.

Of course, solving (1) to global optimality in the general nonlinear case is intractable. Instead, we take advantage of this game structure to design an iterative algorithm to find the generalized Nash equilibrium: the design parameters $\theta$ and corresponding $\chi$ such that neither the planner nor the adversary has an incentive to change its choice [Facchinei2007]. The next section describes this iterative algorithm, which we implement using nonlinear programming with differentiable simulation and differentiable temporal logic.

IV Approach

To solve the robust STL planning problem (1), we need to address two key points. First, we must develop a meta-heuristic to find a generalized Nash equilibrium of the sequential game (1), taking care that we do not overfit to any particular value of $\chi$. We solve this challenge by developing an iterative counterexample-guided nonlinear optimization framework. Second, in order to solve this problem using nonlinear optimization, we need an efficient way to compute gradients of $J$ with respect to both $\theta$ and $\chi$, which requires us to differentiate not only the behavior function $S$ but also the robustness margin computation $\rho$. We address this challenge using differentiable programming, which we discuss next before introducing our high-level counterexample-guided optimization strategy.

IV-A Differentiable Simulation and Temporal Logic

Although it is possible to solve nonlinear optimization problems without access to the gradients of the objective or constraint functions, either by estimating gradients [suh2021_bundled_gradients] or using zero-order methods [nevergrad], it is often much faster to use exact gradient information when it is available. However, exact gradients can be difficult to derive symbolically for complex optimization problems. Instead, recent works have turned to automatic differentiation using differentiable programming to automatically compute gradients in problems such as 3D shape optimization [cascaval2021differentiable], aircraft design optimization [sharpe_thesis], robot design optimization [dawson2022architect1, du2021underwater], and machine learning.
Inspired by this trend, we implement the simulator $S$ using the JAX framework for automatic differentiation [jax2018github], yielding a differentiable simulation of the underlying autonomous system. For a system whose behavior is defined by continuous-time dynamics, implementing numerical integration in a differentiable language such as JAX allows us to automatically back-propagate through the simulator to find the gradients of the resulting trace with respect to $\theta$ and $\chi$. These gradients can typically be computed much more quickly using automatic differentiation than by finite-difference methods [dawson2022architect1].
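As a concrete sketch of differentiable simulation (with made-up double-integrator dynamics, a constant thrust standing in for the design parameters, and a constant wind force standing in for the disturbance), back-propagating through an Euler integrator in JAX might look like:

```python
import jax
import jax.numpy as jnp

# Sketch: Euler-integrate toy double-integrator dynamics and
# differentiate the final position with respect to both a design
# parameter (constant thrust) and a disturbance (constant wind).
# The dynamics and parameterization are illustrative, not the paper's.

DT, STEPS = 0.1, 50

def simulate(thrust, wind):
    def step(state, _):
        pos, vel = state
        vel = vel + DT * (thrust + wind)  # acceleration from thrust + wind
        pos = pos + DT * vel
        return (pos, vel), pos
    (_, _), trace = jax.lax.scan(step, (0.0, 0.0), None, length=STEPS)
    return trace  # positions over time

def final_position(thrust, wind):
    return simulate(thrust, wind)[-1]

# Back-propagate through the integrator for exact gradients.
grad_theta = jax.grad(final_position, argnums=0)
grad_chi = jax.grad(final_position, argnums=1)
```

Because the final position here is linear in the total acceleration, the two gradients coincide; in a realistic simulator they differ, and both are obtained from the same backward pass.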

We can use a similar differentiable programming approach to obtain gradients through the quantitative semantics of an STL specification. Before doing so, we must replace the discontinuous $\max$ and $\min$ operators used to compute $\rho$ with smooth approximations:

$$\widetilde{\max}(x_1, \ldots, x_n) = \frac{1}{k} \log \sum_{i=1}^{n} e^{k x_i}, \qquad \widetilde{\min}(x_1, \ldots, x_n) = -\widetilde{\max}(-x_1, \ldots, -x_n)$$

where $k$ is a smoothing parameter and $\widetilde{\max} \to \max$, $\widetilde{\min} \to \min$ as $k \to \infty$. This differentiable relaxation was introduced in [Pant2017] and later used in [Pant2018, Pantazides2022]; [Leung2020] uses a slightly different approximation.
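The log-sum-exp relaxation above is a few lines of code; this sketch shifts by the true maximum for numerical stability, which leaves the result unchanged. The approximation over-estimates the true maximum by at most $\log(n)/k$, so larger $k$ gives a tighter (but less smooth) fit.

```python
import math

# Log-sum-exp relaxation of max and min, as in the smooth
# approximations above. k controls tightness: softmax -> max as k grows.

def softmax(xs, k=10.0):
    # Shift by the true max so the exponentials cannot overflow.
    m = max(xs)
    return m + math.log(sum(math.exp(k * (x - m)) for x in xs)) / k

def softmin(xs, k=10.0):
    # min(x) = -max(-x), so softmin reuses softmax.
    return -softmax([-x for x in xs], k)
```

For example, `softmax([0.0, 1.0, 2.0])` lies just above the true maximum of 2.0, within the `log(3)/k` bound.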

Using these smooth approximations, we implement the fast, linear-time algorithms for computing the robustness margin proposed by [donze13], using the JAX framework to enable efficient automatic differentiation. In contrast to [Leung2020], our method achieves computational complexity that is linear in the length $T$ of the state trace (the complexity of the stlcg framework in [Leung2020] is quadratic in $T$ for the until operator).

By combining smooth approximations of STL quantitative semantics with differentiable programming, we can efficiently compute the gradients of the robustness margin $\rho$ with respect to the state trace. By combining these gradients with those found using differentiable simulation, we can efficiently compute the gradient of the objective $J$ with respect to both the design parameters $\theta$ and the adversary’s response $\chi$. Usefully, our use of differentiable programming means that we are not restricted to considering trajectory planning separately from the design of a tracking controller, as in [Pant2018] and [Sun2022]. Instead, we can consider an end-to-end gradient that combines the planned trajectory and controller parameters in $\theta$ and optimizes them jointly (see Section V for an example of this end-to-end optimization). In the next section, we discuss how end-to-end gradients enable an iterative algorithm for counterexample-guided robust optimization.
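The end-to-end composition can be sketched by chaining a differentiable simulator with a smoothed robustness margin and differentiating through both at once. The dynamics, the "always $x \geq 0$" specification, and the scalar parameterization below are toy stand-ins, not the paper's models.

```python
import jax
import jax.numpy as jnp

# End-to-end gradient sketch: smoothed STL robustness of a simulated
# trace, differentiated with respect to planner and adversary parameters.

DT, STEPS, K = 0.1, 30, 10.0

def simulate(theta, chi):
    # Toy 1-D trace: start at theta, drift downward at disturbance rate chi.
    t = jnp.arange(STEPS) * DT
    return theta - chi * t

def smooth_robustness(trace):
    # Smoothed "always x >= 0": soft-min over the trace via log-sum-exp.
    return -jax.nn.logsumexp(-K * trace) / K

def objective(theta, chi):
    # J = -rho, so minimizing J maximizes the (smoothed) margin.
    return -smooth_robustness(simulate(theta, chi))

# One backward pass per parameter set gives both gradients.
g_theta = jax.grad(objective, argnums=0)(1.0, 0.5)
g_chi = jax.grad(objective, argnums=1)(1.0, 0.5)
```

Raising `theta` lifts the whole trace, so its gradient on the objective is exactly $-1$; raising `chi` deepens the worst excursion, so its gradient is positive — the two signs the planner and adversary each follow.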

IV-B Counterexample-guided Optimization

To solve the planning problem in (1), we need to find a generalized Nash equilibrium between the planner and the adversary; i.e. values of $\theta$ and $\chi$ where neither we nor the adversary has any local incentive to change. A common solution strategy for such problems is the family of nonlinear Gauss-Seidel-type methods [Facchinei2007]. These methods solve max-min problems like (1) by alternating between $\theta$ and $\chi$, tuning one set of parameters while keeping the other constant; i.e. alternating between the two optimization problems:

$$\theta_{i+1} = \operatorname*{arg\,min}_{\theta}\ J(\theta, \chi_i) \tag{2a}$$
$$\chi_{i+1} = \operatorname*{arg\,max}_{\chi}\ J(\theta_{i+1}, \chi) \tag{2b}$$
Although these methods are not guaranteed to converge, it is known that if they do, then the convergence point is a Nash equilibrium [Facchinei2007].
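To illustrate the alternating scheme, consider a made-up zero-sum quadratic game $J(\theta, \chi) = \theta^2 + \theta\chi - \chi^2$ with its unique saddle point at $(0, 0)$; both best responses have closed forms, so each Gauss-Seidel round is exact. This toy is for intuition only and is not the paper's planner.

```python
# Toy nonlinear Gauss-Seidel iteration on J = theta^2 + theta*chi - chi^2.
# Planner minimizes over theta; adversary maximizes over chi.

def best_response_theta(chi):
    # argmin_theta (theta^2 + theta*chi)  =>  theta = -chi / 2
    return -chi / 2.0

def best_response_chi(theta):
    # argmax_chi (theta*chi - chi^2)  =>  chi = theta / 2
    return theta / 2.0

def gauss_seidel(chi=1.0, iters=20):
    theta = 0.0
    for _ in range(iters):
        theta = best_response_theta(chi)  # planner's turn
        chi = best_response_chi(theta)    # adversary's turn
    return theta, chi
```

Each full round scales the iterate by $1/4$, so the alternation converges geometrically to the saddle point here; in general such convergence is not guaranteed, which motivates the counterexample dataset introduced next.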

A risk of applying such a simple alternating scheme is that the nonlinear optimization for both $\theta$ and $\chi$ can easily get caught in local minima. Such local minima not only reduce the performance of the optimized plan, but also increase the risk of “overfitting” to a particular value of $\chi$. This risk is particularly salient because the planner must commit to a choice of $\theta$ before the adversary has a final opportunity to choose $\chi$. To mitigate this risk and improve the robustness of our optimized plan, we extend a standard Gauss-Seidel method with two ideas from the machine learning and optimization literature. First, we take inspiration from the success of domain randomization in robust machine learning [tobin2017]: instead of optimizing $\theta$ with respect to a single fixed $\chi$, we can maintain a dataset $X = \{\chi_1, \ldots, \chi_N\}$ and optimize the performance of $\theta$ across all of these samples:

$$\theta_{i+1} = \operatorname*{arg\,min}_{\theta}\ \frac{1}{|X|} \sum_{\chi_j \in X} J(\theta, \chi_j) \tag{3a}$$
$$\chi_{i+1} = \operatorname*{arg\,max}_{\chi}\ J(\theta_{i+1}, \chi) \tag{3b}$$
Incorporating domain randomization into the Gauss-Seidel method has the potential to improve the robustness of the resulting equilibria, but it is relatively sample inefficient; it may require a large number of random samples $\chi_j$. To address this sample inefficiency, we take inspiration from a second idea in the optimization and learning literature: learning from counterexamples [Chang2019]. The key insight here is that we can do better than simply randomly sampling $\chi$; we can use the values of $\chi$ found during successive iterations of the Gauss-Seidel process as high-quality counterexamples to guide the optimization of $\theta$. This insight results in our counterexample-guided Gauss-Seidel optimization method, which is outlined in pseudocode in Algorithm 1.

Our algorithm proceeds as follows. We begin by initializing the dataset $X$ with i.i.d. examples of $\chi$, then we alternate between solving the two optimization problems in (3). At each iteration, we add our current estimate of the adversary’s best response to the dataset, and we stop either when the algorithm reaches a fixed point (the adversary’s best response after solving (3b) is the same as the best response from the previous round) or when a maximum number of iterations is reached. As we show experimentally in Section V, this counterexample-guided optimization achieves a higher sample efficiency than simple domain randomization, in that it finds plans that are more robust to adversarial disturbance while considering a much smaller dataset. Although our use of nonlinear optimization means that our algorithm is not complete, we find empirically that it succeeds in finding a satisfactory plan in the large majority of cases.

It is important to note that this algorithm is fundamentally enabled by the automatic differentiation approach detailed in Section IV-A; without access to the gradients of $J$ it would be much more difficult to solve the optimization subproblems in Algorithm 1. Although some previous approaches obtain gradients of STL satisfaction with respect to $\theta$ using standard trajectory optimization formulations, as in [Pant2018], we are not aware of any approaches that make use of gradients with respect to the disturbance parameters $\chi$. There has been some work on using counterexamples to guide mixed-integer planning [raman15], but our experiments in the next section demonstrate that these mixed-integer programs are intractable for long-horizon problems. Specifically, we find that solving even a single mixed-integer program can take more than an hour, so solving multiple programs to derive counterexamples is not a practical solution. In the next section, we demonstrate that our gradient-based counterexample-guided approach outperforms these existing approaches, not only finding more robust plans but requiring substantially less computation time to do so.

Input: Starting dataset size $N_0$
Input: Maximum number of iterations $M$
Output: Optimized design parameters $\theta$
Output: Dataset of counterexamples $X$
1  $X \leftarrow N_0$ examples of $\chi$ sampled uniformly i.i.d.
2  for $i = 1, \ldots, M$ do
3      $\theta_i \leftarrow \operatorname{arg\,min}_{\theta} \frac{1}{|X|} \sum_{\chi_j \in X} J(\theta, \chi_j)$
4      $\chi_i \leftarrow \operatorname{arg\,max}_{\chi} J(\theta_i, \chi)$
5      if $\chi_i \in X$ then
6          break
7      Append $\chi_i$ to $X$
8  return $\theta_i$, $X$
Algorithm 1 Counterexample-guided Gauss-Seidel method for solving robust planning problems
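The counterexample-guided loop can be sketched on a toy problem with closed-form inner solvers standing in for the gradient-based ones in the paper: the planner picks $\theta$ to minimize the mean of $J(\theta, \chi) = (\theta - \chi)^2$ over the counterexample dataset, and the adversary picks $\chi \in [-1, 1]$ to maximize $J$. Both subproblem solutions below are made-up illustrations, not the paper's optimizers.

```python
# Sketch of the counterexample-guided Gauss-Seidel loop on a toy game.

def planner_step(dataset):
    # argmin_theta of the mean of (theta - chi)^2 over the dataset
    # is simply the dataset mean.
    return sum(dataset) / len(dataset)

def adversary_step(theta):
    # argmax over chi in [-1, 1] of (theta - chi)^2 is the far endpoint.
    return -1.0 if theta > 0 else 1.0

def counterexample_guided(dataset, max_iters=10):
    theta = planner_step(dataset)
    for _ in range(max_iters):
        theta = planner_step(dataset)       # optimize against all counterexamples
        chi = adversary_step(theta)         # adversary's best response
        if chi in dataset:                  # fixed point: nothing new to learn
            break
        dataset.append(chi)                 # record the new counterexample
    return theta, dataset
```

Starting from a single random sample, the loop discovers both worst-case endpoints as counterexamples and then terminates at a fixed point, with the plan pulled toward the robust choice $\theta \approx 0$.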

V Experiments

We validate our approach by means of two case studies involving the satellite rendezvous problem posed in [Jewison2016]. We benchmark against state-of-the-art planning algorithms to show the robustness and scalability benefits of our approach.

In this satellite rendezvous problem, the goal is to maneuver a chaser satellite to catch a target satellite. We can express this problem in the Clohessy-Wiltshire-Hill coordinate frame [Jewison2016], which assumes that the target’s orbit is circular and constructs a coordinate frame with the origin at the target, the $x$-axis pointing away from the Earth, the $y$-axis pointing along the target’s orbit, and the $z$-axis pointing out of the orbital plane. In this frame, the chaser’s dynamics are approximately linear, with positions $x$, $y$, $z$ and velocities $\dot{x}$, $\dot{y}$, $\dot{z}$ varying according to controlled thrust in each direction $u_x$, $u_y$, $u_z$: