Reach-avoid problems are prevalent in many engineering applications, especially those involving strategic or safety-critical systems. In these situations, one aims to find a control strategy that guarantees reaching a desired set of states while satisfying certain state constraints, all while accounting for unknown disturbances, which may be used to model adversarial agents. Reach-avoid sets capture the set of states from which the above task is guaranteed to be successful. Reach-avoid problems are challenging to analyze due to the asymmetric goals of the control and disturbance, leading to non-convex, max-min cost functions [1, 2, 3]. Due to the complexity of the cost function, dynamic programming-based methods for computing reach-avoid sets on a grid representing a state space discretization, such as Hamilton-Jacobi (HJ) formulations, have been popular and successful [4, 1, 2].
One specific class of reach-avoid problems is the reach-avoid game, in which the system consists of two adversarial players or teams. The first player, the attacker, assumes the role of the controller, and aims to reach some goal. The other player, the defender, assumes the role of the disturbance, and tries to prevent the attacker from achieving its goal. In [5, 6], the authors analyzed the two-player game of capture-the-flag by formulating it as a reach-avoid game, and then obtaining optimal strategies and winning regions for each player. The authors in [7, 8] extended previous results by analyzing the multiplayer case in which each team has an arbitrary number of players. Other dynamic programming-based methods for stochastic systems also exist [9, 10]. Other important applications of reach-avoid problems include motion planning in the presence of moving obstacles [11, 12, 13, 14]. In particular, the multi-vehicle motion planning problem is analyzed from the perspective of reach-avoid problems and solved using dynamic programming in .
Because reach-avoid problems are difficult to analyze, dynamic programming methods have enjoyed great success thanks to their optimality and generality when the state space has fewer than six dimensions. For larger state spaces, computational burden becomes the main challenge. To address this challenge, heuristics typically based on information pattern simplifications and multi-agent structural assumptions have been proposed. In, the authors consider an open-loop formulation of the reach-avoid game in which one of the players declares its strategy at the beginning of the game; in addition, the player’s dynamics are assumed to be single integrators in 2D. An open-loop framework is also used in  for pursuit-evasion. A semi-open-loop approach for 2D single integrator players, based on the idea of “path defense,” has been proposed in . In the context of multi-vehicle motion planning, priority-based heuristics have been used . More broadly, methods for analyzing reach-avoid problems and related multi-agent systems make trade-offs among generality of system dynamics, generality of the problem setup, conservatism of solutions, and computational efficiency [18, 19, 20, 21, 22, 23, 24, 25].
Since the action of an opposing agent is modeled as a disturbance in the joint system, reach-avoid problems are closely related to robust planning, in which disturbance rejection is of primary concern. In this context, methods that produce value functions with Lyapunov-like properties have been very effective. When the system dynamics are nonlinear in general but have small state spaces, HJ methods are able to produce Lyapunov-like functions through dynamic programming [4, 2]. Computational complexity has been alleviated in specific scenarios using decomposition techniques [26, 27]; however, these are not applicable to general reach-avoid problems.
When the system dynamics and functions representing sets are polynomial, the search for Lyapunov-like functions can be done more efficiently by leveraging sum-of-squares (SOS) programs, which can be converted to semidefinite programs  and solved using standard optimization toolboxes . SOS programs involve checking whether polynomial functions can be written as SOS, which is a sufficient condition for non-negativity or positivity, and thereby establishing Lyapunov-like properties. In addition, complex problem statements involving sets and implications can be written as SOS constraints. SOS programs have been used extensively in robust planning with methods involving barrier certificates and robust funnels [30, 31, 32, 33]. Other methods that utilize nonlinear optimization also exist; for example,  and  utilize nonlinear optimization techniques for motion planning through dynamic environments for a single vehicle and a flock of vehicles, respectively.
Statement of contributions: In this paper, we propose a method for computing the reach-avoid set and synthesizing a feedback controller that is guaranteed to drive the system into a target set while staying out of an avoid set. Our approach combines dynamic programming and SOS optimization: the reach-avoid set, represented by a Lyapunov-like value function, is obtained backwards one time step at a time as in dynamic programming, and at each time step a SOS program is solved. Building on previous SOS-based work such as [32, 33], we explicitly encode the avoidance constraint so that a single value function guarantees both reaching and avoidance, as in HJ methods [15, 6, 8]. Compared to previous dynamic programming-based work such as [15, 6, 8], we trade off optimality of solution for computational complexity: although our method produces conservative reach-avoid sets, we are able to analyze systems with higher-dimensional state spaces. Unlike analytic approaches such as [17, 20, 22], our approach applies much more broadly to different problem setups and system dynamics. We demonstrate our method in simulations.
The remainder of this paper is structured as follows:
In Section II, we formulate the reach-avoid problem and provide some background about SOS programming.
In Section III, we derive the SOS constraints for computing the reach-avoid set based on its basic properties.
In Section IV, we propose a dynamic programming approach for solving the SOS program more efficiently.
In Section V, we numerically validate our theory by comparing our method to the HJ method, and present, to the best of our knowledge, the first 6D reach-avoid set computed using a general numerical method.
In Section VI, we conclude and provide suggestions for future directions.
II-A Problem Formulation
Consider a system which evolves according to its dynamics, given by the following ordinary differential equation:
$$\dot{x} = f(x, u, d), \tag{1}$$
where $x$ is the system state, $u$ is its control, and $d$ is the disturbance. We assume that the control must be of a time-varying state feedback form, $u(t, x)$, and that $u$ and $d$ are measurable. Importantly, we make no assumption on the disturbance other than that it is bounded. We denote the function spaces from which $u$ and $d$ are drawn as $\mathbb{U}$ and $\mathbb{D}$, respectively.
We would like to compute the set of joint states from which the attacker wins the game of time horizon . This is captured by the reach-avoid set, defined as follows:
The avoid set is the set of states that the system must avoid while reaching the target. Note the following property:
Final condition. .
II-B SOS Programming Background
In this section, we provide a brief introduction to SOS programs. For a more detailed discussion, please refer to  and . In this paper we will represent a set of states as : . This allows us to transform set-based constraints such as to constraints of the form in our proposed optimization problem. Note that such a constraint is generally non-convex.
When is a polynomial in , checking non-positivity over is still NP-hard . However, in this case the constraint can be relaxed to the SOS condition , where are polynomials. This condition is equivalent to , where
is a vector of polynomial basis functions of degree at most half the degree of, and is a positive semidefinite matrix of appropriate size. Note that this constraint is satisfied by matching coefficients on the left- and right-hand sides. A short-hand for the above constraint is “”.
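As a concrete illustration of the Gram-matrix form of the SOS condition, the following minimal sketch checks a candidate certificate for a univariate polynomial by matching coefficients and testing positive semidefiniteness. The polynomial $x^4 - 2x^2 + 1$, the monomial basis $[1, x, x^2]$, the matrix `Q`, and the function names are assumptions made for illustration. An actual SOS program searches for `Q` with a semidefinite solver rather than verifying a given one.

```python
import numpy as np

def gram_to_coeffs(Q):
    """Coefficients of z(x)^T Q z(x) for z(x) = [1, x, ..., x^(n-1)], lowest degree first."""
    n = Q.shape[0]
    coeffs = np.zeros(2 * n - 1)
    for i in range(n):
        for j in range(n):
            coeffs[i + j] += Q[i, j]  # x^i * x^j contributes to the x^(i+j) term
    return coeffs

def is_sos_certificate(coeffs, Q, tol=1e-9):
    """True if Q is symmetric, positive semidefinite, and reproduces the polynomial."""
    Q = np.asarray(Q, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    if coeffs.shape != (2 * Q.shape[0] - 1,):
        return False
    symmetric = np.allclose(Q, Q.T)
    psd = np.linalg.eigvalsh(Q).min() >= -tol
    matches = np.allclose(gram_to_coeffs(Q), coeffs)
    return bool(symmetric and psd and matches)

# Hypothetical example: p(x) = x^4 - 2 x^2 + 1 = (x^2 - 1)^2.
p = [1.0, 0.0, -2.0, 0.0, 1.0]
Q = np.array([[1.0, 0.0, -1.0],
              [0.0, 0.0, 0.0],
              [-1.0, 0.0, 1.0]])
print(is_sos_certificate(p, Q))  # True
```

The check fails for a polynomial such as $x^2 - 1$ with the diagonal Gram matrix `[[-1, 0], [0, 1]]`, since that matrix has a negative eigenvalue.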
One is often interested in guaranteeing non-positivity over a subset of state space. For example, one may desire over the set where . A constraint in the form of where can be written as .
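A small worked instance may help. Using illustrative symbols that are not necessarily this paper's notation, suppose we want $p(x) \le 0$ wherever $g(x) \le 0$. A sufficient condition is the existence of an SOS multiplier $L(x)$ such that $-p(x) + L(x)\,g(x)$ is SOS: wherever $g(x) \le 0$, the term $L(x)\,g(x)$ is non-positive, so $-p(x) \ge -L(x)\,g(x) \ge 0$. For the assumed example $p(x) = x - 1$, $g(x) = x^2 - 1$ (the interval $|x| \le 1$), and constant multiplier $L = \tfrac{1}{2}$:

```latex
\begin{aligned}
-p(x) + L\,g(x) &= (1 - x) + \tfrac{1}{2}\,(x^2 - 1) \\
                &= \tfrac{1}{2}x^2 - x + \tfrac{1}{2}
                 = \tfrac{1}{2}\,(x - 1)^2 ,
\end{aligned}
```

which is a sum of squares, certifying that $x - 1 \le 0$ on $|x| \le 1$.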
III Solution via SOS Programming
We now formulate a SOS program whose solution characterizes the reach-avoid set defined in Eq. (2), and provides a feedback controller that guarantees reaching and avoidance.
As in earlier literature , let be characterized by the sublevel set of some function :
III-A The Value Function
Lyapunov-like property. By definition of , if and , then there exists such that for all , for any arbitrarily small .
In terms of the value function, and with the convention , Property 2 becomes
This Lyapunov-like property states that if is not in and is on the boundary of , then there must exist some control , over which the SOS program will optimize, such that regardless of the chosen disturbance , will remain non-positive. The boundary of is described by the condition , and the non-positivity condition on ensures the continued non-positivity of .
Avoidance property. By the definition of , if and , then it cannot be in .
III-B Control Parametrization and Bounds
As long as is linear, the set is semi-algebraic for a given .
III-C The SOS Program
Putting all of the above into consideration, we arrive at the following optimization problem in Eq. (6), which maximizes the volume of the reach-avoid set while enforcing the constraints described above. We will describe how volume can be maximized in Section IV-C.
IV Solving the SOS Program via Dynamic Programming
The optimization in (7) involves polynomials in continuous time and state space. In this section, we discretize time so that the problem can be solved in a dynamic programming fashion, one time step at a time, akin to what is done in HJ reachability. This way, we avoid optimizing over an entire time horizon, which is computationally expensive.
IV-A Time Discretization
We define time samples . All quantities dependent on time are now indexed by : for example, and . In addition, we approximate the derivatives of and with respect to (discretization error can be reduced with higher-order schemes):
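The simplest approximation of this kind is a first-order finite difference. Writing $V_k(x)$ for a time-dependent quantity sampled at time $t_k$ (illustrative symbols, with the forward form shown here being an assumption), it reads:

```latex
\frac{\partial V}{\partial t}(t_k, x) \;\approx\; \frac{V_{k+1}(x) - V_k(x)}{t_{k+1} - t_k}
```

The approximation error of such a scheme is first order in the step size $t_{k+1} - t_k$.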
Now, the optimization problem becomes (9).
IV-B Dynamic Programming
The optimization in (9) involves decision variables in the entire time horizon. However, the structure of the optimization program allows us to break it down into smaller problems, each representing one time step. This allows the computational complexity of our proposed method to scale linearly with the number of time discretization points.
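The backward, one-step-at-a-time structure can be illustrated on a toy problem. The sketch below is a grid-based, discrete-time analogue and not the SOS program of this paper: states are integers on a line, the controller picks a step, the disturbance perturbs it, and the reach-avoid set is grown backwards from the final time. The function name, the dynamics, and all numerical values are hypothetical.

```python
def backward_reach_avoid(n_states, target, avoid, controls, disturbances, n_steps):
    """Return [RA_0, ..., RA_K] for x_next = clip(x + u + d) on states 0..n_states-1."""
    def step(x, u, d):
        return min(max(x + u + d, 0), n_states - 1)

    final = set(target) - set(avoid)   # final condition: in the target, outside the avoid set
    sets = [final]
    for _ in range(n_steps):           # one small problem per time step
        nxt = sets[0]
        current = set(final)           # states already in the target remain winning
        for x in range(n_states):
            if x in avoid:
                continue
            # Exists a control such that, for all disturbances, the next state is winning.
            if any(all(step(x, u, d) in nxt for d in disturbances) for u in controls):
                current.add(x)
        sets.insert(0, current)
    return sets

sets = backward_reach_avoid(
    n_states=10, target={7, 8, 9}, avoid={3},
    controls=[-2, -1, 0, 1, 2], disturbances=[-1, 0, 1], n_steps=5)
print(sorted(sets[0]))  # [4, 5, 6, 7, 8, 9]
```

Each pass of the outer loop only needs the set from the following time step, so the work grows linearly with the number of time steps. In this toy example, states 0 to 2 never enter the set because the disturbance can always force a landing on the avoid state 3.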
Given or with some arbitrary , the optimization program starts at and decrements after every optimization of the form
IV-C Volume Maximization
Maximizing the volume of a polynomial sublevel set is potentially intractable . We therefore substitute the objective of the previously defined optimization problems with a heuristic. Using cost heuristics does not remove any guarantees from our approach.
Since represents , one heuristic for maximizing volume is to restrict to be SOS, minimize the integral of over a region of interest as in , and maximize as in . We will later also use to slightly relax the SOS program in Section IV-D. Similarly to what is described in , we write where represents the coefficients of and a monomial basis. This allows us to write the integral as a linear function of , making it amenable to SOS optimization:
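The linearity of the integral in the polynomial coefficients can be checked numerically. In the minimal sketch below, the region of interest is assumed to be an axis-aligned box, so that each monomial integrates in closed form. The helper names, the box $[-1, 1]^2$, and the example polynomial $1 + x^2 + y^2$ are illustrative assumptions.

```python
import itertools
import numpy as np

def monomial_exponents(n_vars, max_degree):
    """All exponent tuples with total degree <= max_degree."""
    return [e for e in itertools.product(range(max_degree + 1), repeat=n_vars)
            if sum(e) <= max_degree]

def box_monomial_integrals(exponents, lower, upper):
    """Integral of each monomial x^e over the box [lower, upper]."""
    integrals = []
    for e in exponents:
        value = 1.0
        for k, a, b in zip(e, lower, upper):
            value *= (b ** (k + 1) - a ** (k + 1)) / (k + 1)
        integrals.append(value)
    return np.array(integrals)

exponents = monomial_exponents(n_vars=2, max_degree=2)
ell = box_monomial_integrals(exponents, lower=[-1.0, -1.0], upper=[1.0, 1.0])

# Hypothetical polynomial V(x, y) = 1 + x^2 + y^2, stored as a coefficient vector c.
c = np.zeros(len(exponents))
for e, coeff in {(0, 0): 1.0, (2, 0): 1.0, (0, 2): 1.0}.items():
    c[exponents.index(e)] = coeff

integral = ell @ c  # the integral is a dot product, hence linear in c
print(integral)     # approximately 6.6667 (= 20/3)
```

Because the vector `ell` depends only on the basis and the region, it can be computed once and reused as the linear cost of every subsequent optimization.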
IV-D Implementation Details
Since the optimization (10) is bilinear and thus non-convex, we propose heuristics for obtaining useful solutions. This section describes the heuristics, provides the final optimization program (13), and presents Alg. 1 for solving it.
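A bilinear problem becomes convex once one block of variables is held fixed, which is what makes alternating between blocks workable. The following toy sketch is not the optimization program of this paper. It applies the same idea to a rank-one factorization, minimizing $\|M - a b^\top\|_F$, which is bilinear in `a` and `b` jointly but an ordinary least-squares problem in each block separately. The matrix, the names, and the iteration count are hypothetical.

```python
import numpy as np

def alternating_rank_one(M, n_iters=20):
    """Alternate exact least-squares updates of a and b for M ~ a b^T."""
    M = np.asarray(M, dtype=float)
    a = np.ones(M.shape[0])  # initial guess, assumed not orthogonal to the solution
    history = []
    for _ in range(n_iters):
        b = M.T @ a / (a @ a)  # best b with a fixed (convex subproblem)
        a = M @ b / (b @ b)    # best a with b fixed (convex subproblem)
        history.append(np.linalg.norm(M - np.outer(a, b)))
    return a, b, history

M = np.array([[4.0, 1.0],
              [2.0, 3.0]])
a, b, history = alternating_rank_one(M)
print(history[0], history[-1])
```

Each update minimizes the cost exactly over its own block, so the residual cannot increase from one iteration to the next. As with any alternation on a non-convex problem, the result in general depends on the initial guess.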
Value function invariance: One property that also arises from the definition of in Eq. (2) is as follows:
In terms of the value function, this implies that . Using a slack variable , we encoded this property as an additional soft constraint in Eq. (13e) to guide the optimization. The value of is minimized in the objective (13a) with weight . In our experience, without , the optimization does not reliably produce value functions that represent non-empty sets.
Region of interest: To facilitate the search for polynomials and constant , we relax the constraints so that they only apply in some region of interest , which can be chosen to either be large enough to contain the reach-avoid set , or in general contain the region of the state space that one wishes to consider. This is done by adding the terms , , , in the constraints (13b), (13c), (13d), (13e), respectively.
Initialization: To obtain a feasible initial guess, we introduce a slack variable to allow to be initially non-negative in (13b). Throughout alternations of the optimization, we minimize in the objective (13a). In addition, we increase the weight by some factor to drive the value of down, as described in Alg. 1. Once is below some threshold , we consider the solution to be numerically feasible and stop optimizing over . In practice, we are consistently able to obtain feasible solutions.
Final optimization problem: All of the above considerations lead to the final optimization program for each time step in (13). The bilinear terms are and in (13b), in (13d), and in (13e). Therefore, we can optimize the three sets of variables , , and in an alternating fashion, where . Note that as mentioned in Section IV-C, cannot be optimized at the same time as due to the volume maximization heuristic. The optimization algorithm is outlined in Algorithm 1.
where is a constant vector of weights, and
V Numerical Example: The Reach-Avoid Game
We now demonstrate our approach for computing reach-avoid sets by analyzing the reach-avoid game, which involves an attacker with state trying to reach a target while avoiding capture by the defender with state . Let the joint state be denoted . The joint dynamics are given in Eq. (1), where we use to model the control of the attacker, and to model the control of the defender.
For convenience, we will use to denote the positions of the attacker and the defender, respectively. In general will be subsets of the player states , respectively.
In the context of reach-avoid problems, the target set is the set of joint states such that the attacker is at the target:
The avoid set is the set of joint states in which the attacker is captured by the defender. In this paper we will assume that the attacker is captured by the defender when the positions of the two players are within some capture radius:
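Membership in these two sets is straightforward to test numerically. The sketch below assumes planar positions, a circular target of radius `target_radius` centered at `target_center`, and a capture radius `capture_radius`. These names, shapes, and default values are illustrative assumptions rather than the sets used in the examples that follow.

```python
import math

def in_target_set(p_attacker, target_center=(0.0, 0.0), target_radius=0.1):
    """True if the attacker position lies in the (assumed circular) target."""
    return math.dist(p_attacker, target_center) <= target_radius

def in_avoid_set(p_attacker, p_defender, capture_radius=0.05):
    """True if the attacker is within the capture radius of the defender."""
    return math.dist(p_attacker, p_defender) <= capture_radius

# The joint state is winning for the attacker only if it is in the target and not captured.
p_a, p_d = (0.05, 0.0), (0.5, 0.5)
print(in_target_set(p_a) and not in_avoid_set(p_a, p_d))  # True
```

Both tests depend only on the player positions, which is why the sets are defined on the joint state even when each player has additional states such as heading.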
We now present reach-avoid set computations for two examples of reach-avoid games. The first example involves two single integrator players moving in 2D space; the joint state space dimension is 4D; we compare our computation results with those obtained from HJ reachability, which is the most general method that provides the optimal solution up to small numerical errors. The second example involves two kinematic car players moving in 2D; the joint state space dimension is 6D, and computation using HJ reachability is intractable.
V-A Two single integrator players
Consider the following player dynamics:
Traditionally, the reach-avoid set for these player dynamics can be computed using HJ reachability [6, 8]. In special scenarios, such as those in which players have the same maximum speed, analytic methods may be employed. Like HJ reachability, our SOS-based approach is applicable in a general setting. In this section, by comparing our results with those of HJ reachability, we demonstrate that our numerical results are conservative approximations of the optimal reach-avoid set, and therefore maintain reaching and avoidance guarantees.
For this example, we have chosen the maximum speeds to be , and the target set to be approximately a square of length centered at the origin, . The hyper-parameters of the optimization are , , , , . The maximum degree of was set to , that of set to , maxIter set to , the region of interest set to , and the time discretizations set to .
To visualize the 4D reach-avoid set , we fix the defender position and show 2D slices of over several values of in Figure 2. Contrary to what one would expect, the growth of over time is not uniform. This can be attributed to the solution of the SOS program being suboptimal, since the problem is non-convex. However, it is important to note that, given the constraints of the SOS program, any feasible solution offers reaching and avoidance guarantees.
Figure 3 shows computations of sliced at various defender positions . The outer magenta boundary is the computation result from HJ reachability, and represents the true reach-avoid set up to small numerical errors. The solid blue boundary is the computation result from our SOS-based method. HJ reachability is better suited for this smaller 4D system, as the optimal solution can be obtained. However, any state inside the set computed using SOS programming is inside the set computed using HJ reachability, which means our computation results, although conservative, maintain reaching and avoidance guarantees.
Computations for this example were done on a desktop computer with an Intel Core i7 2600K CPU and 16 GB of RAM. The SOS computations took approximately 17 minutes with the above parameters using the spotless toolbox  and Mosek , and the HJ computations took approximately 25 minutes on a grid with points using the level set toolbox  and the helperOC library . Computational time varies greatly with the maximum degree of polynomials chosen in the case of the SOS-based method, and with the number of grid points in the HJ method.
V-B Two kinematic car players
In this section, we demonstrate our method on a system involving two kinematic cars. The joint dynamics are
Here we apply our SOS-based approach to derive and certify a controller for one of the cars that enables it to reach a goal in state space no matter what control action the other car might perform. This is useful in the context of, for example, designing a safe lane-following controller that makes no assumption on the policy of other drivers, except for control bounds. Even though our controller and reach-avoid set are once again quite conservative, they provide formal guarantees on a system that is typically too high-dimensional for treatment with state-of-the-art approaches such as HJ-based methods.
For this example, the maximum speed and turning rates of the attacker are and . We assume that the other car, the defender, will not try to collide with us on purpose and is therefore limited to speeds of and for its velocity and turn rate. Such an assumption can be made when one assumes, for example, that the other driver is simply trying to stay inside his lane. The target set, representing our desire to have the attacker stay on the road without colliding with the other car, is a circle in the middle of the road () on the right side of the region of interest (a box from to in , and from to in ,). We also bound and to be inside the interval , which allows us to use a Chebyshev approximation with an accuracy of for and for , namely:
The approximation makes our dynamics polynomial, and therefore amenable to SOS optimization. The hyper-parameters of the optimization are , , , , . The maximum degree of was set to , that of set to . The maximum number of iterations (maxIter) was set to , and the time discretizations set to .
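The idea of replacing $\sin$ and $\cos$ with low-degree polynomials on a bounded heading interval can be sketched with NumPy's Chebyshev interpolation. The interval of $\pm 1$ rad and the degrees 3 and 2 below are assumptions chosen for illustration, not the values used in this example.

```python
import numpy as np
from numpy.polynomial import Chebyshev, Polynomial

THETA_MAX = 1.0  # assumed bound on |theta| in radians (hypothetical value)
domain = [-THETA_MAX, THETA_MAX]

# Interpolate at Chebyshev points on the assumed interval.
sin_cheb = Chebyshev.interpolate(np.sin, deg=3, domain=domain)
cos_cheb = Chebyshev.interpolate(np.cos, deg=2, domain=domain)

# Convert to ordinary power-basis polynomials in theta for use in polynomial dynamics.
sin_poly = sin_cheb.convert(kind=Polynomial)
cos_poly = cos_cheb.convert(kind=Polynomial)

# Measure the worst-case error on a dense grid over the interval.
theta = np.linspace(-THETA_MAX, THETA_MAX, 2001)
err_sin = np.max(np.abs(sin_poly(theta) - np.sin(theta)))
err_cos = np.max(np.abs(cos_poly(theta) - np.cos(theta)))
print(sin_poly.coef, err_sin)
print(cos_poly.coef, err_cos)
```

The error bounds hold only inside the chosen interval, which is why the heading angles must be constrained to it. Any guarantee derived from the polynomial dynamics applies to the original system only up to this approximation error.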
The right column of Figure 4 shows slices of the reach-avoid set for the states corresponding to , , and the indicated defender position. The left column shows the result of applying the resulting controller in simulation. Even though the sets are conservative, none of the constraints are violated in simulation. Also note that the policy used by the defender is simply to maximize its velocity in some direction. Importantly though, the controller returned by our algorithm is robust to any policy the defender might use. A video of this example is available online at https://www.youtube.com/watch?v=gUytdFHkjYY.
VI Conclusions and Future Work
We presented a novel method for computing reach-avoid sets and synthesizing a feedback controller that guarantees reaching and avoidance when the system starts inside the set. Our method utilizes sum-of-squares (SOS) optimization to trade off optimality of the solution against computational complexity, allowing us to compute, for the first time to the best of our knowledge, 6D reach-avoid sets, although our solution is conservative. By combining SOS optimization with dynamic programming, we also greatly reduce the computational complexity of solving the SOS program.
Future work includes investigating ways to reduce both the degree of conservatism in our solutions and the computation time by improving the optimization algorithm. In addition, since our method is applicable to polynomial system dynamics and general problem setups, there is much room for the exploration of potential applications of our work. Lastly, we would like to test our algorithm through hardware implementation.
The authors would like to thank Anirudha Majumdar for his guidance on sum-of-squares programming. This work was supported by the Office of Naval Research YIP program (Grant N00014-17-1-2433) and by the Toyota Research Institute (TRI). This article solely reflects the opinions and conclusions of its authors and not ONR, TRI or any other Toyota entity.
- [1] K. Margellos and J. Lygeros, “Hamilton-Jacobi Formulation for Reach-Avoid Differential Games,” IEEE Trans. Autom. Control, vol. 56, no. 8, pp. 1849–1861, Aug. 2011.
- [2] J. F. Fisac, M. Chen, C. J. Tomlin, and S. S. Sastry, “Reach-avoid problems with time-varying dynamics, targets and constraints,” in Proc. ACM Int. Conf. Hybrid Syst.: Computation and Control, 2015.
- [3] I. Yang, “A dynamic game approach to distributionally robust safety specifications for stochastic systems,” Jan. 2017. [Online]. Available: http://arxiv.org/abs/1701.06260
- [4] I. Mitchell, A. Bayen, and C. Tomlin, “A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games,” IEEE Trans. Autom. Control, vol. 50, no. 7, pp. 947–957, Jul. 2005.
- [5] H. Huang, J. Ding, W. Zhang, and C. J. Tomlin, “A differential game approach to planning in adversarial scenarios: A case study on capture-the-flag,” in Proc. IEEE Int. Conf. Robotics and Automation, 2011.
- [6] ——, “Automation-Assisted Capture-the-Flag: A Differential Game Approach,” IEEE Trans. Control Syst. Technol., vol. 23, no. 3, pp. 1014–1028, May 2015.
- [7] M. Chen, Z. Zhou, and C. J. Tomlin, “Multiplayer reach-avoid games via low dimensional solutions and maximum matching,” in Proc. Amer. Control Conf., 2014.
- [8] ——, “Multiplayer Reach-Avoid Games via Pairwise Outcomes,” IEEE Trans. Autom. Control, vol. 62, no. 3, pp. 1451–1457, Mar. 2017.
- [9] P. Mohajerin Esfahani, D. Chatterjee, and J. Lygeros, “The stochastic reach-avoid problem and set characterization for diffusions,” Automatica, vol. 70, pp. 43–56, Aug. 2016.
- [10] B. HomChaudhuri, M. Oishi, M. Shubert, M. Baldwin, and R. S. Erwin, “Computing reach-avoid sets for space vehicle docking under continuous thrust,” in Proc. IEEE Conf. Decision and Control, 2016.
- [11] J. van den Berg, Ming Lin, and D. Manocha, “Reciprocal Velocity Obstacles for real-time multi-agent navigation,” in Proc. IEEE Int. Conf. Robotics and Automation, 2008.
- [12] A. Wu and J. P. How, “Guaranteed infinite horizon avoidance of unpredictable, dynamically constrained obstacles,” Autonomous Robots, vol. 32, no. 3, pp. 227–242, Apr. 2012.
- [13] A. Giese, D. Latypov, and N. M. Amato, “Reciprocally-Rotating Velocity Obstacles,” in Proc. IEEE Int. Conf. on Robotics and Automation, 2014.
- [14] M. Chen, S. Bansal, J. F. Fisac, and C. J. Tomlin, IEEE Trans. Control Syst. Technol.
- [15] Z. Zhou, R. Takei, H. Huang, and C. J. Tomlin, “A general, open-loop formulation for reach-avoid games,” in Proc. IEEE Conf. Decision and Control, 2012.
- [16] Shih-Yuan Liu, Zhengyuan Zhou, C. Tomlin, and K. Hedrick, “Evasion as a team against a faster pursuer,” in Proc. Amer. Control Conf., 2013.
- [17] M. Chen, Z. Zhou, and C. J. Tomlin, “A path defense approach to the multiplayer reach-avoid game,” in Proc. IEEE Conf. Decision and Control, 2014.
- [18] Z. Zhou, W. Zhang, J. Ding, H. Huang, D. M. Stipanović, and C. J. Tomlin, “Cooperative pursuit with Voronoi partitions,” Automatica, vol. 72, pp. 64–72, 2016.
- [19] M. Kothari, J. G. Manathara, and I. Postlethwaite, “Cooperative Multiple Pursuers against a Single Evader,” J. Intelligent & Robotic Syst., vol. 86, no. 3-4, pp. 551–567, Jun. 2017.
- [20] A. Pierson, Z. Wang, and M. Schwager, “Intercepting Rogue Robots: An Algorithm for Capturing Multiple Evaders With Multiple Pursuers,” IEEE Robotics and Automation Lett., vol. 2, pp. 530–537, Apr. 2017.
- [21] B. Xue, A. Easwaran, N.-J. Cho, and M. Franzle, “Reach-Avoid Verification for Nonlinear Systems Based on Boundary Analysis,” IEEE Trans. Autom. Control, vol. 62, no. 7, pp. 3518–3523, Jul. 2017.
- [22] W. Zha, J. Chen, Z. Peng, and D. Gu, “Construction of Barrier in a Fishing Game With Point Capture,” IEEE Trans. Cybernetics, vol. 47, no. 6, pp. 1409–1422, Jun. 2017.
- [23] R. Yan, Z. Shi, and Y. Zhong, “Escape-avoid games with multiple defenders along a fixed circular orbit,” in Proc. IEEE Int. Conf. Control & Automation, 2017.
- [24] W. Li, “Escape Analysis on the Confinement-Escape Problem of a Defender Against an Evader Escaping From a Circular Region,” IEEE Trans. Cybernetics, vol. 46, no. 9, pp. 2166–2172, Sep. 2016.
- [25] ——, “A Dynamics Perspective of Pursuit-Evasion: Capturing and Escaping When the Pursuer Runs Faster Than the Agile Evader,” IEEE Trans. Autom. Control, vol. 62, no. 1, pp. 451–457, Jan. 2017.
- [26] M. Chen, S. Herbert, and C. J. Tomlin, “Fast reachable set approximations via state decoupling disturbances,” in Proc. IEEE Conf. Decision and Control, 2016.
- [27] M. Chen, S. L. Herbert, M. S. Vashishtha, S. Bansal, and C. J. Tomlin, “Decomposition of Reachable Sets and Tubes for a Class of Nonlinear Systems,” IEEE Trans. Autom. Control (to appear). [Online]. Available: http://arxiv.org/abs/1611.00122
- [28] P. A. Parrilo, “Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization,” Ph.D. Dissertation, California Institute of Technology, 2000. [Online]. Available: http://resolver.caltech.edu/CaltechETD:etd-05062004-055516
- [29] Mosek APS, “The MOSEK optimization software,” http://www.mosek.com, 2010.
- [30] R. Tedrake, I. R. Manchester, M. Tobenkin, and J. W. Roberts, “LQR-trees: Feedback Motion Planning via Sums-of-Squares Verification,” Int. J. Robotics Research, vol. 29, no. 8, pp. 1038–1052, Jul. 2010.
- [31] A. J. Barry, A. Majumdar, and R. Tedrake, “Safety verification of reactive controllers for UAV flight in cluttered environments using barrier certificates,” in Proc. IEEE Int. Conf. Robotics and Automation, 2012.
- [32] A. Majumdar, A. A. Ahmadi, and R. Tedrake, “Control design along trajectories with sums of squares programming,” in Proc. IEEE Int. Conf. Robotics and Automation, 2013.
- [33] A. Majumdar and R. Tedrake, “Funnel libraries for real-time robust feedback motion planning,” Int. J. Robotics Research, vol. 36, no. 8, pp. 947–982, Jul. 2017.
- [34] F. Gao and S. Shen, “Quadrotor trajectory generation in dynamic environments using semi-definite relaxation on nonconvex QCQP,” in Proc. IEEE Int. Conf. Robotics and Automation, 2017.
- [35] J. Alonso-Mora, S. Baker, and D. Rus, “Multi-robot formation control and object transport in dynamic environments via constrained optimization,” Int. J. Robotics Research, vol. 36, no. 9, pp. 1000–1021, Aug. 2017.
- [36] E. A. Coddington and N. Levinson, Theory of Ordinary Differential Equations. New York: McGraw-Hill Inc., 1955.
- [37] F. Dabbene and D. Henrion, “Minimum volume semialgebraic sets for robust estimation,” arXiv preprint arXiv:1210.3183, 2012.
- [38] “spotless Library,” https://github.com/spot-toolbox/spotless, 2018.
- [39] I. M. Mitchell, “The Flexible, Extensible and Efficient Toolbox of Level Set Methods,” J. Scientific Computing, vol. 35, no. 2-3, pp. 300–329, Jun 2008.
- [40] “helperOC Library,” https://github.com/HJReachability/helperOC, 2018.