Adversarial Planning

by Valentin Vie, et al.
Penn State University

Planning algorithms are used in computational systems to direct autonomous behavior. In a canonical application, for example, planning for autonomous vehicles is used to automate static or continuous planning towards performance, resource management, or functional goals (e.g., arriving at the destination, managing fuel consumption). Existing planning algorithms assume non-adversarial settings; a least-cost plan is developed based on available environmental information (i.e., the input instance). Yet, it is unclear how such algorithms will perform in the face of adversaries attempting to thwart the planner. In this paper, we explore the security of planning algorithms used in cyber- and cyber-physical systems. We present two adversarial planning algorithms, one static and one adaptive, that perturb input planning instances to maximize cost (often substantially so). We evaluate the performance of the algorithms against two dominant planning algorithms used in commercial applications (D* Lite and Fast Downward) and show both are vulnerable to extremely limited adversarial action. Here, experiments show that an adversary is able to increase plan costs in 66.9% of instances by removing only a single action from the action space (D* Lite) and render 70% of instances from an international planning competition unsolvable by removing only three actions (Fast Downward). Finally, we show that finding an optimal perturbation in any search-based planning system is NP-hard.






1. Introduction

The science of planning is about creating a plan—a set of actions—to achieve a goal. Applications of planning algorithms are found in robotics, aerospace, and industrial processes, where they are used to find an optimal solution to a given problem. For example, planning algorithms have been used for unmanned vehicles (Stentz and Hebert, 1995), natural language generation (Koller and Hoffmann, 2010), greenhouse logistics (Helmert and Lasinger, 2010), manufacturing (Bullers et al., 1980), network vulnerability analysis (Boddy et al., 2005), and navigation (Koenig and Likhachev, 2002).

Plans are, in most instances, sequences of steps called actions. Each action represents an atomic operation that an agent performs to achieve some sub-goal within the domain, e.g., moving a step forward or loading or unloading a device. A planning algorithm receives an input planning problem consisting of states and operations and outputs a sequence of actions, which, when executed from the initial state, gets the agent to a goal state. The plan cost is the sum of the costs of the actions in the plan. Planning algorithms are designed to minimize the plan cost (which can include multiple metrics such as distance traveled, resources, etc.). A common visualization is to create a graph with states as nodes and actions as edges connecting two states (Fig. 1).
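The graph view above translates directly to code. A minimal sketch (our own encoding, not the paper's: `actions` maps each state to (action name, next state, cost) triples, with illustrative names) finds the least-cost plan with Dijkstra's algorithm:

```python
import heapq

def least_cost_plan(actions, start, goal):
    """Dijkstra over the state-space graph. Returns (cost, plan) for the
    cheapest action sequence from start to goal, or None if unreachable."""
    frontier = [(0, start, [])]          # (cost so far, state, plan prefix)
    best = {start: 0}
    while frontier:
        cost, state, plan = heapq.heappop(frontier)
        if state == goal:
            return cost, plan
        if cost > best.get(state, float("inf")):
            continue                      # stale entry, already improved
        for name, nxt, c in actions.get(state, []):
            if cost + c < best.get(nxt, float("inf")):
                best[nxt] = cost + c
                heapq.heappush(frontier, (cost + c, nxt, plan + [name]))
    return None                           # no plan exists

# A toy instance: nodes are states, labelled edges are actions (cf. Fig. 1).
actions = {
    "s0": [("a1", "s1", 1), ("a2", "s2", 4)],
    "s1": [("a3", "s2", 1)],
    "s2": [("a4", "sg", 1)],
}
print(least_cost_plan(actions, "s0", "sg"))  # (3, ['a1', 'a3', 'a4'])
```

The plan cost here is simply the sum of edge costs along the returned action sequence, matching the definition above.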

In practice, planners (often called agents) are embedded in larger systems of components including sensors (e.g., motion detection, LIDAR), classification or logic systems (e.g., machine learning-based image recognition, monotonic reasoning), logic/software-driven mechanical actuators (e.g., brake systems, batteries), and embedded operating systems (Behere and Torngren, 2015; Zong et al., 2018). For example, autonomous vehicles simultaneously use local and global planners to make short-term (e.g., motion planning through a busy intersection) and long-term (e.g., route planning) decisions on how to safely direct the vehicle to its destination (Schwarting et al., 2018). While the security of many of these components and the system as a whole has been explored in many contexts (Thing and Wu, 2016; Amoozadeh et al., 2015), few, if any, previous efforts have attempted to understand how these systems can be subverted by an adversary attacking the planning process.

This work considers an adversary attempting to subvert a system by attacking the planner with the goal of reducing the effectiveness of the system (e.g., inducing long indirect routing) or, in the degenerate case, preventing the successful completion of the plan entirely (e.g., preventing the vehicle from reaching its destination). Here, we explore a threat model in which an adversary is able to remove a number of actions from the action space available to the agent. We consider two adversarial strategies: one where the adversary perturbs the elements of the input planning instance (called a static or offline attack) and another where the adversary adaptively counteracts an executing plan (called an online attack). Here we build upon security efforts at identifying worst-case inputs to complex systems and algorithms, e.g., in adversarial machine learning (Carlini and Wagner, 2017; Papernot et al., 2016; Kurakin et al., 2016) or fuzzing (Chen and Chen, 2018; Aschermann et al., 2019).

The consequences of inefficient or unachievable plans (the outcome of an attack) can be dire. For example, a poorly designed motion plan in an autonomous vehicle can lead to unsafe conditions or accidents (Paden et al., 2016). Poor or sub-optimal planning in chemical manufacturing (called chemical production scheduling) can lead to production quality problems, induce equipment failures, or reduce efficiency (Kidam and Hurme, 2013). Planning failures in transportation systems or logistics can cause widespread outages and lead to travelers being stranded (Wall and Slider, 2019) (see Section 5.2 in which we simulate a realistic attack on the Munich airport planning systems). In short, we posit any system that depends on a viable and efficient plan can be undermined by an adversary with the ability to make small perturbations to the plan space. Similar to work in adversarial machine learning, the adversary's goal is to (a) find a perturbation that achieves the negative outcome (increasing plan execution cost) while (b) minimizing the size of the perturbation (number of actions removed).

Figure 1. State-space viewed as a graph with actions linking the different states. A planner finds a way to reach a goal state.

In this paper, we develop and evaluate two adversarial algorithms that manipulate real-world planning systems to induce sub-optimal (i.e., more costly) plans. We introduce the window-heuristic as an approximation function to predict the expected plan cost impact made by removing part of the action space (called an adversarial perturbation of the input planning instance). Several adversarial algorithms are presented, and intuition and examples are provided.

We empirically evaluate our approach on the D* Lite algorithm (Koenig and Likhachev, 2002) (a path-finding algorithm) and the Fast Downward planner (Helmert, 2006), two of the most widely used planning algorithms in industrial systems. The D* Lite algorithm has been widely used for autonomous vehicle navigation for its capacity to adapt to changes in the environment. The original version (D*) was, for example, used by DARPA for its Unmanned Ground Vehicle program (Stentz and Hebert, 1995) and for Mars rover prototypes (Koenig and Likhachev, 2002). The Fast Downward planner is a classical planning system based on forward heuristic search (Helmert, 2006) and was used by two winning planners of the sequential satisficing track (finding low-cost plans in a fixed amount of time) of the 9th International Planning Competition (IPC).

Our experiments on D* Lite planning show that 66.9% of randomly selected planning instances in a maze (for reference, routing in this grid is equivalent to real-world taxi route planning during periods of high congestion (e.g., New Year's Eve) in mid-town New York City, i.e., bounded by 8th Avenue (W), 59th Street (N), the East River (E), and Times Square (S)) have an increased cost or become unsolvable with a single perturbation (82.7% with two). Unsolvable is defined as the inability of the agent to find a valid path to the goal state. In a second set of experiments, we evaluate the performance of the window-heuristic on the 2014 International Planning Competition (IPC) instances. Interestingly, for some competition domains, we found that 70% of instances become unsolvable if an adversary can remove only three actions out of over 500 available.

We make the following contributions in this work:

  • We develop an algorithm to find adversarial changes in STRIPS-written tasks and path-finding problems. We use our window-heuristic to create a table of adversarial changes which we use to perturb a planning instance (Section 4).

  • The attacks are applied, in real-world settings, to two of the dominant planning systems used in commercial applications: D* Lite and the Fast Downward planner (Section 5).

  • The online attack achieved an 82% success rate at inducing additional cost, while the offline attack reached up to a 100% success rate, depending upon the domain considered.

Section 6 further presents the high-level results of an extended survey of the security of fielded planning systems. In this, we explore the realism of the proposed threat model and provide concrete examples of how (and why) such perturbations can be achieved in real systems and applications such as autonomous vehicles, manufacturing, and data center management.

2. Background

Planning Algorithms - The objective of a planning algorithm is to output a plan containing different actions to achieve a goal. This objective can take different forms depending on the problem. For example, given a Rubik's cube, the goal is to have one color on each face. For the air cargo transportation domain, the goal is to deliver all packages to their destination airports (Fig. 2). A planning algorithm outputs a sequence of actions a_1, ..., a_n which, when executed from the initial state s_0, gets the agent to a goal state s_g. A state is defined as the configuration of the environment. There can be as many states as the environment requires. For example, a Rubik's cube contains more than 43 quintillion different states. If we call γ the transition function, then we have the following state trajectory: s_{i+1} = γ(s_i, a_{i+1}) for i = 0, ..., n−1, with s_n = s_g. The solution is said to be optimal if the total cost Σ_i c(s_i, a_{i+1}) is minimized, where c(s, a) is the cost of action a from state s. The cost function c is chosen by the user depending on the criterion under optimization, e.g., time, resources, etc.

We distinguish two categories of planning systems: online and offline. We define a planning system as offline when the state trajectory to the goal state is computed before the execution begins. The entire plan is computed beforehand, which leaves little room for adaptation but enough time to optimize the solution. On the other hand, with an online planner, an agent can react to changes in the environment (miscalculation, error, obstacle discovery, etc.). The plan is updated as the agent moves. Examples of these two types of planning systems are the Mars rover's navigation (offline) and a Tesla's navigation system (online).
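The distinction can be illustrated with a toy simulation (our own sketch; the graph and state names are illustrative). The offline agent commits to a path computed before execution, while the online agent replans from its current state after each observed change:

```python
from collections import deque

def plan(graph, start, goal):
    """Breadth-first search returning a list of states from start to goal."""
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

graph = {"s0": ["s1", "s2"], "s1": ["sg"], "s2": ["s3"], "s3": ["sg"]}

# Offline: the entire plan is fixed before execution begins.
offline_path = plan(graph, "s0", "sg")   # ['s0', 's1', 'sg']

# Mid-execution, the agent discovers the edge s1 -> sg is gone.
graph["s1"] = []

# Online: replan from the current state after every step taken.
state, online_path = "s0", ["s0"]
while state != "sg":
    state = plan(graph, state, "sg")[1]  # take one step of a fresh plan
    online_path.append(state)

print(offline_path)  # now invalid: it relies on the removed edge
print(online_path)   # ['s0', 's2', 's3', 'sg']
```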

Figure 2. Air cargo transportation: planning to deliver the packages to the destination with the minimum number of actions (Load, Unload, Fly).

(:action LOAD
  :parameters (?c - cargo ?p - plane ?a - airport)
  :precondition (and (At ?c ?a) (At ?p ?a))
  :effect (and (In ?c ?p) (not (At ?c ?a))))

(:action LOAD_c1_p1_LAX
  :parameters (c1 - cargo p1 - plane LAX - airport)
  :precondition (and (At c1 LAX) (At p1 LAX))
  :effect (and (In c1 p1) (not (At c1 LAX))))

Figure 3. (Top) Non-grounded Load action for the air cargo transportation domain. (Bottom) A grounded Load operator.

STRIPS Notation - STRIPS is a standard language to describe classical planning problems. Simply put, a planning problem articulated in STRIPS is a triple (s_0, G, A): an initial state, a goal specification, and a set of grounded actions. See Appendix A for an example of a STRIPS instance.

The first item is the initial state s_0, represented as a collection of variables known as atoms or predicates. For example, to specify that cargo1 is at LAX airport (Fig. 2), we would describe the state with the predicate (At, cargo1, LAX). The second element is a characterization of the goal state, defined as a set of predicates G. In air cargo transportation, the goal is the final position of the cargo. The last item is a finite set A of grounded actions. A grounded action is an action in which all parameters have been assigned to real objects, i.e., the action has been fully specified. A non-grounded action has at least one parameter not bound to a specific object. We call actions operators that can transform a state into (potentially) another state while impacting the cost of the plan. Only deterministic actions are considered, meaning there is no uncertainty about the outcome of an action. Three sets of predicates define an operator: parameters, preconditions, and effects. The parameters specify the objects' types. As demonstrated in Figure 3, c has to be a cargo, p a plane, and a an airport. The preconditions must be satisfied for the action to be executed. For example, in Figure 3, in order to load a unit of cargo into a plane, the cargo must be at the same airport as the plane. The effects modify the input state's predicates; they define the consequences of the action on the state. The set A can contain thousands of grounded operators. For example, in Figure 2, there would be one grounded Load operator for each combination of cargo, plane, and airport. Consequently, we mostly use non-grounded actions to define a planning task. For example, in Figure 3, a could be any airport (SFO, LAX, etc.). If two planning problems share the same non-grounded actions, they belong to the same domain, e.g., instances of the Rubik's cube have different starting positions, but are always solved by rotating the faces (i.e., the same operators).
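The precondition/effect semantics above can be sketched directly in code. This is our own encoding of the grounded Load operator from Fig. 3, with effects split into add and delete sets:

```python
def applicable(state, action):
    """An action can execute only when all its preconditions hold."""
    return action["pre"] <= state

def apply_action(state, action):
    """STRIPS semantics: remove the delete effects, add the add effects."""
    return (state - action["del"]) | action["add"]

# The grounded Load operator of Fig. 3, encoded as sets of predicates.
load_c1_p1_lax = {
    "pre": {("At", "c1", "LAX"), ("At", "p1", "LAX")},
    "add": {("In", "c1", "p1")},
    "del": {("At", "c1", "LAX")},
}

s0 = {("At", "c1", "LAX"), ("At", "p1", "LAX")}
assert applicable(s0, load_c1_p1_lax)
s1 = apply_action(s0, load_c1_p1_lax)
print(s1)  # {('At', 'p1', 'LAX'), ('In', 'c1', 'p1')}
```

Note that after loading, the cargo is no longer At the airport, so the same action is not applicable a second time.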

Finally, within the scope of this paper, we introduce an adversarial change as a grounded operator being removed from the set of grounded operators, with the goal of increasing the cost. The goal for the adversary is to find grounded actions to remove from this set to increase the cost of the plan. In the offline attack, the removal happens before the agent starts planning, whereas in the online attack, the removal happens while the agent is executing the plan. We consider the number of removed actions to never exceed ten, while the set of operators can have an arbitrary size (usually thousands of grounded actions).

General Purpose Planners - Since we develop a general heuristic to find effective adversarial changes in any domain, we work with domain-independent planners. General purpose planners generally perform worse than domain-specific ones because they lack the domain-specific knowledge to prune the search for a solution (Junghanns and Schaeffer, 1999). However, they offer a convenient way to solve planning tasks without domain or planning expertise. These planners are divided into three main categories (Russell and Norvig, 2009). Currently, the most popular and effective approaches to solving a deterministic planning problem are: (1) performing a search using a planning graph (Blum and Furst, 1997), (2) translating it to a Boolean satisfiability (SAT) problem and using a SAT solver (Kautz and Selman, 1998), or (3) executing a forward or backward search in the state-space with a heuristic (Bonet and Geffner, 2001). We build the window-heuristic (Section 4) on top of the last category: forward heuristic search planners. We also test the attack on forward heuristic search planners, although our techniques apply to other classes of planners as well.

In more detail, the planners we focus on here perform a forward or backward search in the state-space: a graph where the nodes are the states and the directed edges are the grounded actions (see Fig. 1). The goal is to find a minimum-cost path from the initial state to the goal state in the state-space. The search is guided by a heuristic that estimates the distance from a particular state to the goal state. Heuristics can be extracted from the problem directly or can be specific to a domain. For path-finding problems, a common practice is to use the Manhattan distance or the Euclidean distance. The competition planner Fast Downward (Helmert, 2006) can run with different search algorithms (e.g., A* and best-first search) and different heuristics (e.g., the FF heuristic and the additive heuristic). While this work focuses on forward/backward search, our preliminary analysis of other general-purpose planners suggests vulnerability to adversarial manipulation is a function of the instance and less of the specific planning algorithm. We defer that analysis to future work.

3. Threat Model

We develop an algorithm to adversely influence planning systems. Given a planning task, we want to output a set of adversarial changes that increases the cost of the initial plan. The metric used to measure the cost of the plan varies across the different applications of planning systems. It can be the computation time, the algorithmic complexity, the resource requirements, or the cost function. For instance, complexity is critical for online planning algorithms such as on-board planners in unmanned vehicles, where energy resources are limited (McGann et al., 2007).

Specifically, an adversary will seek to come up with adversarial changes that impact the cost of the plan. The computation time needed to find a plan may also be affected by the adversarial changes, since the planning system may have to explore deeper in the state-space to find an acceptable solution. The ultimate goal for the adversary is to ensure the agent never reaches the goal state, without the agent knowing it; in this way, the cost of the plan is infinite, and the planner can loop almost forever. Note that perturbing a plan is not always feasible; it depends on the adversary's capabilities and the instance considered.

3.1. Adversarial Capabilities

The strength of an adversary is defined by the information and capabilities at their disposal. We consider two kinds of attacks: online and offline. In the offline case, the adversary perturbs the input planning instance given to the agent's planner. For example, given a task in the air cargo transportation domain, the agent needs to find a series of actions reaching the goal without, say, (Load cargo1 in plane1 at SFO) and (Fly plane1 from SFO to JFK). In the online case, the adversary removes actions from the set of grounded actions during the execution of the plan (while the agent is interacting with the environment).

We call h the search heuristic used by the agent. Additionally, for an online planning system, we define s_t as the current state of the agent at time t and s_{t+1} as the next state of the agent at time t+1. These values do not exist for an offline planner because the plan is computed beforehand. We explore adversary threat models including:

  • Agent's Heuristic and Informed, Online - This adversary knows the search heuristic used by the agent's planner (Euclidean, Manhattan, etc.). Knowing the next state of the agent, s_{t+1}, means our adversary is informed. The adversary knows h, s_t, and s_{t+1}.

  • Agent's Heuristic, Online - This adversary knows the search heuristic used by the agent. However, s_{t+1} is unknown. The adversary can only guess s_{t+1} using the state with the best cost estimate given by h. This guess can be incorrect if the agent's search algorithm is non-deterministic. Here, the adversary knows h and s_t.

  • Black-Box, Online - This adversary does not know anything concerning the agent's planner. Here, the adversary knows only the planning task and s_t.

  • Agent's Heuristic, Offline - This adversary knows the heuristic used by the planning system of the agent, i.e., the planning task and h.

  • Black-Box, Offline - This adversary does not know anything concerning the agent's planner, only the planning task. The adversarial changes are specified at the beginning of the agent's computation. The agent needs to find a plan taking these changes into account.

4. Approach

This section presents the window-heuristic to find adversarial changes. The window-heuristic outputs a set of grounded actions to remove from the instance in order to increase the cost of the plan.

One could use the min-cut algorithm to find a minimum cut of the state-space, partitioning the initial state from the goal state(s). The min-cut algorithm removes edges, which represent grounded actions in the state-space. When all the edges from the cut are removed by an adversary, there is no path between the initial state and the goal state(s), meaning no plan exists and the adversary has succeeded. Unfortunately, the min-cut algorithm does not guarantee a cut with fewer edges than the number of grounded actions an adversary is able to prevent. The min-cut problem can be solved in polynomial time in the size of the input graph (Karger, 1993). However, in the general case, the state-space has a number of nodes exponential in the length of the task definition, thus making the min-cut algorithm inappropriate for finding adversarial examples.
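The budget argument can be checked on a toy state-space. The sketch below (our own code, not the paper's) computes the min-cut size via max-flow with unit capacities; when the cut is larger than the adversary's removal budget, removing that many actions cannot disconnect the goal:

```python
from collections import deque

def min_cut_size(edges, s, t):
    """Edmonds-Karp max-flow on a unit-capacity directed graph.
    By max-flow/min-cut duality, the result is the size of a minimum
    s-t edge cut, i.e., the fewest actions whose removal disconnects t."""
    cap, nbrs = {}, {}
    for u, v in edges:
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap.setdefault((v, u), 0)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:   # BFS for an augmenting path
            u = queue.popleft()
            for v in nbrs.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:       # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

# Three edge-disjoint routes from s0 to the goal: the min cut is 3, so an
# adversary limited to two removals cannot make this task unsolvable.
edges = [("s0", "a"), ("s0", "b"), ("s0", "c"),
         ("a", "sg"), ("b", "sg"), ("c", "sg")]
print(min_cut_size(edges, "s0", "sg"))  # 3
```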

One might alternately try all possible changes and keep the most adversarial one, i.e., brute-force. More formally, given a task T, we do the following: (1) Compute a plan for task T. (2) For every action a in this plan, compute a solution of T without a. (3) Keep the most adversarial action a*, i.e., the one that maximally increases the cost of the initial plan. This brute-force approach is guaranteed to find the best adversarial change if the plan computed in (1) is optimal. However, it requires that we run a planner n times, where n is the length of the initial plan. This is not practical because the search for a solution is generally computationally intensive (Erol et al., 1995; Bylander, 1994).

Even the most sophisticated planners can fail to find an existing solution. Moreover, the brute-force approach only outputs a single adversarial change. In order to output two adversarial changes, an adversary would have to run the planner once for every pair of actions in the initial plan, i.e., O(n²) times for a plan of n actions. To output k adversarial changes, the adversary would have to run their planner O(n^k) times. Additionally, if the adversary does not find a solution to the initial problem during step (1), it is impossible to run step (2). Hence, brute-forcing is not practical in the general case: it is computationally intensive and assumes that an adversary has a planner as sophisticated as the agent's.
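Steps (1)-(3) of the brute-force baseline are easy to state in code. The sketch below (our own, with a breadth-first planner, unit action costs, and illustrative names) replans once per action of the initial plan and keeps the removal with the largest cost increase, counting an unsolvable instance as infinite cost:

```python
from collections import deque
import math

def solve(actions, start, goal):
    """Breadth-first search: returns the shortest action sequence or None."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for name, nxt in actions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None

def best_adversarial_action(actions, start, goal):
    """Steps (1)-(3): replan once per action of the initial plan and keep
    the removal that maximally increases the plan cost."""
    plan = solve(actions, start, goal)                 # step (1)
    best_action, best_cost = None, len(plan)
    for name in plan:                                  # step (2)
        pruned = {s: [(n, t) for n, t in acts if n != name]
                  for s, acts in actions.items()}
        new_plan = solve(pruned, start, goal)
        cost = math.inf if new_plan is None else len(new_plan)
        if cost > best_cost:                           # step (3)
            best_action, best_cost = name, cost
    return best_action, best_cost

actions = {
    "s0": [("a1", "s1"), ("a2", "s2")],
    "s1": [("a3", "sg")],
    "s2": [("a4", "s1")],
}
print(best_adversarial_action(actions, "s0", "sg"))  # ('a3', inf)
```

Here removing a1 only forces the longer detour a2, a4, a3, but removing a3 leaves no path to the goal at all, so it is the most adversarial change.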

We develop an approximation algorithm to output adversarial changes. We formally show that finding the optimal adversarial changes is NP-Hard (Appendix B). The window-heuristic enables us to bring planning instances of arbitrary size into a tractable scope. The intuition is as follows: our strategy is to store known successful attacks from smaller (tractable) problems and project them onto larger (intractable) planning instances. Similar methods have been shown to enhance planning algorithms by reducing the number of nodes explored (Culberson and Schaeffer, 1998). We model a large problem as a graph; the goal for an adversary is to identify a small region of that graph (i.e., a window) without being able to see the entire graph. To achieve this, the adversary walks through the large graph to observe whether the current region matches a previously observed region. Once a match is found, the adversary executes a known attack on the region.

Formally, we define a window as a connected sub-graph of the state-space, parameterized by its size m (a state-space is a graph where the nodes are the states and the directed edges are the grounded actions). Hence, a window contains m nodes (i.e., states) and at least m−1 edges (i.e., grounded actions) as it is connected. Figure 4(c) is a visual representation of a window for the air cargo transportation domain. The formulation of our attack is divided into two parts: a generation phase and an execution phase.

Window Generation - The adversary creates a table of windows that are known to be adversarial (i.e., when applied, the cost of the plan increases). To create the table, the adversary generates several simpler planning instances (all from the same domain). The instances should be simple enough for the adversary to brute-force them within the reduced state-space, i.e., run steps (1) and (2) of the brute-force algorithm explained above. When the most adversarial action is found, we extract a window around it (Sections 4.1 and 4.2).

Window-Heuristic Execution - Once the table is generated, the adversary is given a planning task and searches for adversarial changes with the window-heuristic (as described in Section 4.3). The adversary runs their own planning algorithm, computes a solution, and applies the window-heuristic: the adversary slides the windows from the table over the state-space. A match occurs when the windows are isomorphically equivalent (further discussed in Section 4.2), and we output the associated grounded action (as shown in Fig. 7).

4.1. The Window-View

An adversary links a window and an adversarial change. We say we apply a window when we remove the grounded action associated with it from the set of grounded actions. Intuitively, when we attack a planning instance, we apply a window when we see a matching one in the state-space of the arbitrary problem. Windows can take different shapes and sizes depending on the domain.

For path-finding domains, we choose a window to be a local view of the agent's surroundings (Fig. 4(a), 4(b)). We define a wall to be a node the agent cannot reach. The adversarial change associated with the view is a wall at the center of the window. We apply the window when we see the same arrangement of walls in the environment. In this sense, to apply a window means to add a wall at the center of the window.
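Matching "the same arrangement of walls" up to rotation (the equivalence used for path-finding windows) can be sketched as follows. This is our own encoding: a window is a square occupancy grid with 1 for a wall and 0 for free space, and the patterns are illustrative:

```python
def rotations(window):
    """All four 90-degree rotations of a square occupancy window."""
    out = [window]
    for _ in range(3):
        w = out[-1]
        out.append(tuple(zip(*w[::-1])))  # rotate 90 degrees clockwise
    return out

def matches(region, window):
    """A region of the maze matches a stored window up to rotation."""
    return any(region == r for r in rotations(window))

stored = ((1, 1, 1),
          (0, 0, 0),
          (1, 0, 1))
seen = ((1, 0, 1),   # the same wall pattern, rotated a quarter turn
        (1, 0, 0),
        (1, 0, 1))
print(matches(seen, stored))  # True
```

When a match is found, the adversary places a wall at the center cell of the matched region.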

For STRIPS-written problems, a window is a succession of m states linked by m−1 actions. We apply a window when we see an equivalent succession of its first m−1 states in the state-space. The adversary applies a window by removing the last grounded action in that window (i.e., the (m−1)-th action) from the set of grounded actions. As shown in Figure 4(c), preventing (previously loaded) cargo from being unloaded is likely to be adversarial.

Figure 4. (a) An adversarial window in path-finding. Placing a wall where the "X" is will prevent the agent from reaching the goal if it approaches from the bottom or the right of the window. As the only way to reach the goal is then from the left, the overall path length is likely increased. (b) This (non-adversarial) window is not likely to increase the cost of the plan. (c) An adversarial window (four states, three actions) from the air cargo domain: removing the last action, Unload c5 from p2 at PHX, makes planning fail.

Window-size is chosen empirically. Ideally, it should be a function of the considered domain to maximize the success rate of the adversary. In the rest of this paper, we use one fixed window size for path-finding domains and another for STRIPS domains. Assuming a fixed number of entries in the table, a larger window size makes finding an equivalent window less likely, because the match has to hold over the entire sub-graph described by the window. On the other hand, with larger window sizes, the probability of perturbing the plan when a match is found increases. Indeed, the larger the window, the more alike the sub-graph on which we match the window has to be; the adversarial change within the window is more specialized and thus has a better chance of being adversarial.

4.2. The Table of Advantageous Windows

To create a table containing the most effective adversarial windows, the adversary generates several random simple problems and extracts the most adversarial windows with an exhaustive search. Then, the adversary adds a window to the table only if no equivalent (i.e., isomorphic) window is already in the table. Without an equivalence relation, the adversary could end up with an exponential number of entries in the table. The adversary can also limit the number of predicates in each node of a window using a normalization process. When we extract a window for STRIPS instances, we capture a succession of states. Each of those states contains hundreds of predicates describing the entire environment. However, because the environment does not change drastically within a window, many predicates remain unchanged across the states. Finally, while creating the table, we also compute how frequently a window is adversarial. We can threshold the table to keep only the windows with the highest frequency. In doing so, we get a higher probability of increasing the cost of the plan when we apply a window. Intuitively, adversarial windows that are frequently observed in smaller problems are more likely to increase the cost in larger problems.

Note that selecting random examples to generate windows is appropriate when the adversary has no knowledge of the instances expected at run time. In truth, table creation could be improved (perhaps vastly) by using examples of instances (or similar instances) likely to be encountered by the target at run time. Indeed, in practice, an intelligent adversary would collect known instances and "train" the window-heuristic generation to find advantageous windows representative of those encountered by the victim system. We leave investigating other training approaches to future work.

Graph Isomorphism - The predicates describing a state are grounded, meaning they do not contain any free variable. A state would contain the predicate (At, p1, JFK) instead of (At, plane, airport). This notation depends on how we choose to name the airports and the planes, so we need an equivalence relation that does not rely on the objects' names. Consider the predicates (At, p1, JFK) and (At, plane1, airport1). Renaming p1 to plane1 and JFK to airport1 gives us the equivalence. For STRIPS-written tasks, we say that a window w1 is isomorphically equivalent to a window w2 if (1) there is a bijection between the names of the objects (e.g., plane1 mapped to p1) and (2) w1 and w2 share the same non-grounded operators (Fig. 5). For path-finding, we say that two windows are equivalent if one is a rotation of the other.
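For STRIPS windows, the renaming bijection can be checked by canonicalizing object names in order of first appearance: two windows that differ only in object names then produce identical canonical forms. A minimal sketch (our own simplification, matching ordered action sequences rather than general sub-graphs; the grounded actions are illustrative):

```python
def canonical(window):
    """Rename objects in order of first appearance, so two windows that
    differ only in object names map to the same canonical form.
    A window is a list of grounded actions: (operator, obj1, obj2, ...)."""
    names, out = {}, []
    for action in window:
        op, *objs = action
        renamed = []
        for o in objs:
            if o not in names:
                names[o] = f"x{len(names)}"   # first appearance gets next id
            renamed.append(names[o])
        out.append((op, *renamed))
    return tuple(out)

w1 = [("Fly", "p1", "JFK", "PHX"), ("Unload", "c5", "p1", "PHX")]
w2 = [("Fly", "plane2", "SFO", "LAX"), ("Unload", "cargo1", "plane2", "LAX")]
assert canonical(w1) == canonical(w2)   # isomorphic: a renaming exists

w3 = [("Fly", "p1", "JFK", "PHX"), ("Unload", "c5", "p2", "PHX")]
assert canonical(w1) != canonical(w3)   # a different plane unloads: no bijection
```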

Figure 5. Two equivalent windows for the air cargo transportation domain. The actions are similar (Fly, Fly, Unload) and there is a bijection between the objects of the two windows.

Normalizing - Normalizing is a process we only perform for STRIPS windows because path-finding states are not defined with predicates. We normalize to remove constant predicates in an extracted window. Consider a window containing four states and three grounded actions extracted from the air cargo transportation domain. The planning task can contain an arbitrary number of plane objects. Each plane needs a predicate to indicate its position: (At, plane, airport). However, if a plane does not move, the predicates concerning its position do not change and are repeated across the four states. Thus, they are not relevant to the evolution of the environment within the four states. In Figure 6, none of the three actions affects plane p1, so we remove all the predicates concerning p1 from the window. The only predicates that change between the four states are the ones modified (i.e., added, deleted) by the three grounded actions. More formally, we call s_1, ..., s_4 the four states in the window. Each s_i is a set of predicates. We introduce the set of constant predicates C = s_1 ∩ s_2 ∩ s_3 ∩ s_4 and we define s'_i = s_i \ C. The s'_i are the new normalized states where we removed the redundant predicates.
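The normalization step (intersect the window's states and strip the shared predicates) translates directly into code. This is our own sketch; the predicates are illustrative, in the spirit of Fig. 6:

```python
def normalize(states):
    """Drop predicates shared by every state of the window: they do not
    change inside the window, so they carry no information for matching."""
    constant = set.intersection(*states)
    return [s - constant for s in states]

# Four window states; the idle plane p1 never moves.
s1 = {("At", "p1", "PHX"), ("At", "p2", "JFK"), ("In", "c5", "p2")}
s2 = {("At", "p1", "PHX"), ("At", "p2", "LAX"), ("In", "c5", "p2")}
s3 = {("At", "p1", "PHX"), ("At", "p2", "PHX"), ("In", "c5", "p2")}
s4 = {("At", "p1", "PHX"), ("At", "p2", "PHX"), ("At", "c5", "PHX")}

# (At, p1, PHX) appears in every state, so normalization removes it.
print(normalize([s1, s2, s3, s4]))
```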

Figure 6. The window before normalization (top) and after (bottom). All the predicates that were not modified by the three actions are removed, e.g., (At, p1, PHX) is removed.
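The normalization step reduces to a set intersection followed by a set difference. The window below is an illustrative example we constructed for the air cargo domain, in which plane p1 never moves:

```python
def normalize(states):
    """Remove predicates present in every state of the window; only the
    predicates added or deleted by the window's actions remain."""
    constant = set.intersection(*states)
    return [s - constant for s in states]

# Plane p1 never moves, so (At, p1, PHX) is constant across the window.
window = [
    {("At", "p1", "PHX"), ("At", "p2", "SFO"), ("At", "c1", "SFO")},
    {("At", "p1", "PHX"), ("At", "p2", "SFO"), ("In", "c1", "p2")},
    {("At", "p1", "PHX"), ("At", "p2", "JFK"), ("In", "c1", "p2")},
    {("At", "p1", "PHX"), ("At", "p2", "JFK"), ("At", "c1", "JFK")},
]
normalized = normalize(window)
assert all(("At", "p1", "PHX") not in s for s in normalized)
```

Only the predicates touched by the Load, Fly, and Unload actions survive in the normalized states.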

Thresholding - With the normalizing process and isomorphisms, we drastically decrease the size and the number of entries needed in the table. We introduce a threshold in order to keep only the windows that appear most often in the table. Indeed, one window might be adversarial for 50% of the problems while another is adversarial for only 1% of them, yet both would appear in the table with equal weight. To address this, once the table is filled, we empirically select only the most frequently observed windows. If a window is adversarial only once across a thousand tasks, there is no need to record it: it is too specific.
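A minimal sketch of the thresholding step (the window names and counts are invented for illustration):

```python
def apply_threshold(table, min_count):
    """Keep only windows observed at least min_count times; a window
    adversarial for one task in a thousand is too specific to store."""
    return {w: n for w, n in table.items() if n >= min_count}

counts = {"win_A": 500, "win_B": 10, "win_C": 1}   # occurrences over 1000 tasks
assert apply_threshold(counts, 10) == {"win_A": 500, "win_B": 10}
```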

The formal process to create the table is the following (Algorithm 1). (1) Generate several random simple tasks from the domain considered. (2) Find the most malicious windows using a brute-force approach for each of those tasks. (3) Extract a window around the adversarial change with the highest cost increase. (4) Normalize the window if needed and add it to the table. Finally, (5) threshold the table to keep only the most adversarial windows.

1  for i ← 1 to N do
2         Generate a random problem P;
3         Solve P with a planner;
4         Let c be its cost and π the plan;
5         Let c′ be the best adversarial cost;
6         Let a′ be the best adversarial action;
7         foreach action a in the plan π do
8                Solve P without the action a allowed;
9                Update c′ and a′;
10        end foreach
11        if c′ > c then
12               Take a window W around a′;
13               Normalize the window W;
14               if W has an equivalent in the table then
15                      Add to the number of occurrences of that equivalent;
16              else
17                      Add W to the table;
18               end if
19        end if
20 end for
21 Threshold the table;
22 return the table;
Algorithm 1 Constructing a table of advantageous adversarial windows using N random simple problems. For each of those problems we extract the most adversarial window and add it to the table.
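Algorithm 1 can be sketched as follows; generate_problem, solve, and extract_window are assumed interfaces standing in for the paper's domain-specific machinery (extract_window is assumed to return a normalized, canonical window), and the threshold keeps only windows seen at least min_count times:

```python
def build_table(num_problems, min_count, generate_problem, solve, extract_window):
    """Sketch of Algorithm 1. solve(problem, banned) -> (cost, plan);
    extract_window(problem, action) -> a hashable normalized window."""
    table = {}
    for _ in range(num_problems):
        problem = generate_problem()                        # (1) random task
        base_cost, plan = solve(problem, banned=frozenset())
        best_cost, best_action = base_cost, None
        for action in plan:                                 # (2) brute force
            cost, _ = solve(problem, banned=frozenset({action}))
            if cost > best_cost:
                best_cost, best_action = cost, action
        if best_action is not None:                         # (3)+(4) record
            window = extract_window(problem, best_action)
            table[window] = table.get(window, 0) + 1
    # (5) threshold: keep only the most frequently observed windows
    return {w: n for w, n in table.items() if n >= min_count}
```

With a toy solver whose cost rises whenever an action is banned, three generated problems all contribute the same window, which survives the threshold.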

4.3. The Window-heuristic

We now explain how to use the table to run the window-heuristic and output adversarial changes. An adversary and the agent are given a problem of arbitrary size. The goal for the agent is to find an optimal plan; the goal for the adversary is to output the best set of grounded actions to perturb the plan. We distinguish the offline case (STRIPS-written problems) from the online case (path-finding). The two cases differ in the attack scenario: in the offline case, the goal of the adversary is to output adversarial changes before the agent starts planning. In the online case, the adversary applies the changes directly, perturbing the agent's environment while the agent is planning. That means the agent can react (find another plan) to the adversary's changes.

Offline Window-heuristic - This procedure is detailed in Algorithm 2. The adversary explores the state space by running a separate planner to find a solution to the input task (recall that the adversary knows at least the initial and goal states). In doing so, the adversary expects to expand the state space in the same direction(s) as the agent. If the adversary successfully predicts the state-space expansion, applying windows that hinder that expansion will be adversarial for the agent. The search starts from the initial state and stops either when the adversary's planner reaches the goal or when the allowed number of adversarial changes has been found. Essentially, when the adversary recognizes a window from the table during the state-space expansion, the associated adversarial change is applied. During the search, we extract a window around the next state to be expanded by the adversary's planner and check whether this window has a match in the table. If it does, we apply the adversarial change to the planning task. In Figure 7, we show how the adversarial change associated with the recognized window is removed from the plan.

Input: Number of adversarial changes allowed k;
                The initial state s_init;
                The goal state characterization s_goal;
                The set of operators O;
                The advantageous windows table T;
Output: A set of grounded actions A;
1  A ← ∅; s ← s_init;
2  while s is not a goal state and |A| < k do
3         Find the next state s_next to expand with the search algorithm;
4         Take a window W around s_next;
5         Normalize W;
6         Search for an equivalent of W in T;
7         if an equivalent has been found then
8                Extract the adversarial change from T;
9                Add it to the set A;
10               Remove the action from O;
11               Find a new s_next with the search algorithm;
12        end if
13        s ← s_next;
14 end while
15 return A;
Algorithm 2 Offline window-heuristic: given an offline planning instance, it outputs a set of adversarial changes.
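A compact sketch of Algorithm 2; is_goal, expand_next, and extract_window are assumed helper interfaces, with expand_next standing in for the adversary's own search planner:

```python
def offline_window_heuristic(k, s_init, is_goal, expand_next, table, extract_window):
    """Sketch of Algorithm 2. expand_next(state, changes) returns the
    next state the adversary's planner would expand, given the grounded
    actions already removed; extract_window returns a normalized window."""
    changes = set()
    s = s_init
    while not is_goal(s) and len(changes) < k:
        window = extract_window(s)            # normalized window around s
        change = table.get(window)            # look for a match in the table
        if change is not None:
            changes.add(change)               # grounded action to remove
        s = expand_next(s, changes)           # continue the state-space search
    return changes
```

On a toy chain of states 0 → 1 → 2 → 3 where only the window around state 1 matches the table, the heuristic returns the single associated change.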

Online Window-heuristic - For the online case, we do not need to pre-determine a set of adversarial changes. The adversarial changes are applied at run-time and the agent needs to adapt. The window-heuristic for an online planning task runs in much the same way, except the adversary does not run a separate planner. There is no need to predict the agent's entire state-space expansion because we assume the adversary knows the agent's current state s at all times: s is the position of the agent and is updated as the agent moves (i.e., executes actions in the environment). We only need to predict s_next, the next state to be visited by the agent after s.

Each time the agent moves and updates s, the adversary computes an estimation of s_next. To estimate s_next, the adversary uses the search heuristic h. The state predicted by the adversary is the neighbor of s with the lowest heuristic value:

s_next ≈ argmin_{s' ∈ neighbors(s)} h(s')

Once an estimation of s_next is calculated, the adversary applies the same mechanism as in the offline case: if it recognizes a window around the predicted s_next, it applies the associated adversarial change.

Input: Number of adversarial changes allowed k;
                The current state of the agent s;
                The goal state characterization s_goal;
                The set of operators O;
                The lookup table T;
1  n ← 0;
2  while s is not a goal state and n < k do
3         Estimate s_next using s, s_goal, and h;
4         Take a window W around s_next;
5         Normalize W;
6         Search for an equivalent of W in T;
7         if an equivalent has been found then
8                Extract the adversarial change from T;
9                Apply the adversarial change;
10               n ← n + 1;
                 // The agent has to re-plan.
11        end if
12       Wait for s to change;
         // The agent has re-planned and moved.
13 end while
Algorithm 3 Online window-heuristic: given an online planning instance, it perturbs the agent's plan at run-time.
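The next-state prediction at the heart of the online case reduces to an argmin over the neighbors of the agent's current state. A minimal sketch on a 4-connected grid with a Manhattan-distance heuristic (both chosen here for illustration):

```python
def predict_next_state(s, goal, neighbors, h):
    """The adversary's estimate of the agent's next state: the neighbor
    of the current state s with the lowest heuristic value h."""
    return min(neighbors(s), key=lambda n: h(n, goal))

def grid_neighbors(s):
    """4-connected grid moves: up, down, left, right."""
    x, y = s
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

h = lambda n, g: abs(n[0] - g[0]) + abs(n[1] - g[1])   # Manhattan distance
assert predict_next_state((0, 0), (3, 0), grid_neighbors, h) == (1, 0)
```

From (0, 0) with the goal at (3, 0), the neighbor (1, 0) has the lowest heuristic value, so the adversary predicts the agent will move right.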
Figure 7. An adversary expands the state space starting from the initial state and discovers its successors. The successor with the lowest heuristic value is the next state to expand. After discovering that state's successors, the adversary checks whether the windows around them have a match in the table. If no match is found, it continues the expansion until a match is discovered.

5. Evaluation

In this section, we evaluate our approach on several planners and domains for both online and offline attacks. Table 1 summarizes the experiments and results. We begin by exploring D* Lite.

Planning type Domain Agent’s planner Threat model Success Rate
Online Maze D* Lite Agent’s heuristic and Informed 66.86% - 82.65%
Online Maze D* Lite Agent’s heuristic 57.53% - 79.73%
Online Maze D* Lite Black-box 60.41% - 82.75%
Offline Barman Fast Downward Black-box 85.71%
Offline Floortile Fast Downward Black-box 95.00%
Offline Hiking Fast Downward Black-box 75.00%
Offline Tetris Fast Downward Black-box 70.59%
Offline Airport Fast Downward Black-box 100.00%
Offline Openstacks Fast Downward Black-box 100.00%
Offline Data-network Fast Downward Black-box 41.67%

Table 1. All threat models and domains considered in the evaluation. Ranges in the success rates give the one-wall to two-wall success rates, while single values correspond to an adversary allowed to remove up to 4 grounded actions.

5.1. D*Lite

Figure 8. The three bar graphs show the success of an adversary placing walls in mazes. The threat models are the following (a) Agent’s heuristic and Informed (b) Agent’s heuristic (c) Black-box. The bars on the left (resp. right) describe the success rate of an adversary capable of placing one wall (resp. two walls).

We evaluate the success rate of the window-heuristic against an agent using the path-finding algorithm D* Lite (Koenig and Likhachev, 2002) in a maze. The details of the experimental setup are as follows:

  • The agent uses D* Lite combined with the Euclidean distance to guide the search for a solution: h(s) = sqrt((x_s − x_g)^2 + (y_s − y_g)^2). When the agent's heuristic is unknown (black-box scenario), the adversary uses the Manhattan distance: h(s) = |x_s − x_g| + |y_s − y_g|.

  • There is only one goal state in the maze. The goal for the agent is to reach it from the initial state.

  • The agent is not allowed to move diagonally; the only actions authorized are to move up, down, left, or right.
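The two distance heuristics used in this setup can be written directly:

```python
import math

def euclidean(s, goal):
    """The agent's heuristic: straight-line distance."""
    return math.hypot(s[0] - goal[0], s[1] - goal[1])

def manhattan(s, goal):
    """The adversary's substitute heuristic in the black-box scenario."""
    return abs(s[0] - goal[0]) + abs(s[1] - goal[1])

assert euclidean((0, 0), (3, 4)) == 5.0
assert manhattan((0, 0), (3, 4)) == 7
```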

The goal of the adversary is to select the optimal location in a maze to place a wall so as to maximally increase the length of the path. The first step is to create the table of windows explained in Section 4.2; recall that the creation of the table is an offline process. To do so, we generate 500 random mazes and brute-force the best adversarial wall: for each tile on the initial path, we try to place a wall and compute the path cost of the modified maze. We extract a window around the most adversarial wall and add it to the table. Appendix C shows the empirical settings for the generation phase (maze size, wall frequency) that gave the best experimental results (success rate, average path cost increase).
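The brute-force wall search used to fill the table can be sketched as follows; solve(maze, blocked) -> (cost, path) is an assumed maze-solver interface (returning None for the path when blocking makes the maze unsolvable):

```python
def best_wall(maze, solve):
    """Brute-force the most adversarial single wall: block each tile on
    the unperturbed optimal path in turn and keep the costliest result."""
    base_cost, path = solve(maze, blocked=frozenset())
    best_cost, best_tile = base_cost, None
    for tile in path[1:-1]:                  # never block start or goal
        cost, new_path = solve(maze, blocked=frozenset({tile}))
        if new_path is not None and cost > best_cost:
            best_cost, best_tile = cost, tile
    return best_cost, best_tile              # (new cost, wall position or None)
```

A toy solver whose only detour is triggered by blocking (1, 0) illustrates the search: the wall at (1, 0) raises the path cost from 2 to 4.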

Once the table of advantageous adversarial windows is created, we can run the window-heuristic. The agent is given a maze instance and tries to reach a goal position. The adversary places walls as the agent moves, increasing the number of steps required for the agent. We distinguish three different online scenarios.

Agent’s Heuristic and Informed, Online - We start with a powerful adversary that knows the agent’s heuristic and its position at all times. Figure 8(a) shows that adding one wall increases the cost of the plan in 66.86% of all cases. Note that not all mazes can have their cost perturbed; sometimes one wall is not enough if two or more disjoint but equally optimal paths exist. We reach a success rate of 82.65% when the adversary has the opportunity to place two walls in the way of the agent. On average, the path length increases with one wall and increases further when the adversary is allowed to place up to two walls.

Agent’s Heuristic, Online - This scenario is harder for the adversary and more realistic. The adversary only has access to the heuristic the agent is using, its current position, and the goal state. Since the adversary does not know which way the agent is going in this case, it has to estimate the agent’s next move using the heuristic. With one wall we perturb on average 57.53% of the plans (Fig. 8(b)) and 79.73% with two walls (all maze sizes considered). As expected, the efficiency drops, but only by 5-10%, which still leaves a notable success rate: statistically, the adversary still manages to increase the cost, both with one wall and with two.

Black-Box, Online - The adversary does not have access to the heuristic the agent is using; it has to create the table and predict the agent’s next state using a different heuristic, for which we choose the Manhattan distance. This answers the question of the transferability of the attack between the two heuristics. The agent, however, solves the maze using the Euclidean distance. With one wall the adversary achieves a success rate of 60.41% and increases the cost on average (Fig. 8(c)). With two walls we have a high success rate of 82.75% and a further cost increase.

Takeaways - To summarize, we found that an adversary equipped with the window-heuristic is able to successfully increase the plan cost, and needs little knowledge to perturb these kinds of path-finding problems. The difference between the success rates of the black-box and informed scenarios is small. Intuitively, the estimation of the agent’s next state in the black-box scenario is often correct, given the limited number of actions the agent can take. The window-heuristic also scales well with the problem size: the success rate stays roughly constant across maze sizes.

All of the experiments above considered an online setting. We now move to an evaluation of the offline setting, with an adversary running the window-heuristic before the agent starts to plan.

5.2. Fast Downward

We show the efficiency of the window-heuristic on diverse tasks solved by the Fast Downward (FD) planner. The planner solves STRIPS instances with a large variety of settings: the user can specify the search strategy (e.g., A*, greedy search, hill-climbing) and the search heuristic, called an evaluator (e.g., FF, CEA, and Landmark-count). Fast Downward also translates the STRIPS instance into other data structures to enhance the search for a solution. The planner solves a planning instance in three phases: translation, knowledge compilation, and search. During the translation phase, FD grounds the predicates and operators (Helmert, 2006), creating the set of grounded operators. Concretely, our algorithm removes grounded actions from this set at the translation phase; in practice, an adversary would have to prevent those actions from being available.

The planner the adversary runs is a simple forward-search planner using breadth-first search (BFS) or A* coupled with a single heuristic (e.g., Additive Cost (Bonet and Geffner, 2001)). This self-made planner is far less sophisticated than FD. We evaluate an attack in which the adversary does not know how the FD planner works; only the domain specification, initial state, and goal are known. This attack is therefore black-box and offline.

We evaluate the window-heuristic in the following steps: (1) We generate tables as described in Table 2. (2) We run the FD planner to output a plan without the adversary interfering; the agent runs FD with the FF heuristic (Hoffmann and Nebel, 2001) and the context-enhanced additive heuristic (Helmert and Geffner, 2008), with lazy best-first search and preferred operators. (This was one of the best configurations available according to the benchmarks run by Helmert in 2006 (Helmert, 2006).) (3) We run the window-heuristic with the adversary’s planner to output the adversarial change(s), i.e., the grounded operator(s) that will be removed from the operator set. (4) We run FD again (same settings) with the limited set of operators. (5) Finally, we compare the cost of the plan with and without the adversary.

Domain Problem Threshold Window Algorithm
Barman 200 10 5 A* Additive cost
Floortile 200 1 35 A* Additive cost
Hiking 100 3 11 BFS
Tetris 200 1 24 A*
Airport 15 5 8 A* Additive cost
Openstacks 10 0 34 A* Additive cost
Data-network 15 0 19 A*
Table 2. The generation process settings across domains. The problem column gives the number of problems generated to fill the table, the threshold is the value used for the thresholding step of Section 4.2, the window column gives the number of windows kept in the table, and the algorithm is the search algorithm used by the adversary.

We benchmark tasks and domains from the 2014 IPC competition (on which several FD-based planners performed best) as well as several real-world applications:

  • The airport domain tasks aim to control the ground traffic at an airport. Airplanes must reach their destination gates. There is outbound and inbound traffic; the former are airplanes that must take off, the latter are airplanes that have just landed and have to park (Edelkamp et al., 2006). The instances we ran were based on the Munich Airport. (Hatzack developed a realistic simulation tool, which he supplied to the IPC organizers to generate the domain instances. The simulator included the Frankfurt, Zurich, and Munich airports; Frankfurt and Zurich proved too large for IPC purposes (Edelkamp et al., 2006).)

  • The data-network domain tackles distributed-computing problems. In a given network of servers, each server can produce data by processing some existing data and send it to other servers on the network. The goal is to process data dispatched across servers with variable hardware and connection capabilities while minimizing the processing cost.

  • Finally, the Openstacks domain is based on the “minimum maximum simultaneous open stacks” combinatorial optimization problem: a manufacturer has orders for combinations of different products and can only make one product at a time; scheduling order completion is NP-hard.

Figure 9. Domains: (a) barman, (b) floortile, (c) hiking, (d) tetris and (e) data-network. We benchmarked the window-heuristic using all the tasks available in the optimizing track. We show the success rate of the window-heuristic depending on the number of adversarial changes allowed (the adversarial budget).

The results of our experiments are multifaceted. First, the most readily perturbed domains (i.e., most vulnerable to attack) were the ones where the goal cannot be divided into independent sub-goals. Assuming a planning instance can be divided into multiple independent sub-planning tasks with sub-goals, an agent can find a sub-plan to reach each sub-goal independently. The agent then assembles the sub-plans in an arbitrary order to create the final plan. An adversarial change is likely to perturb the sub-plan for one of the sub-goals, but not the other sub-plans. For example, consider a task in the air cargo transportation domain with two packages to deliver (c1, c2). Each delivery is a sub-goal: we can deliver c1 first and then c2, or the opposite. An adversarial change only perturbs one of the sub-goals; the agent’s planner just has to find another sub-plan for that sub-goal, which requires less work than finding another global plan. This is why the success rate in the data-network domain remains around 50-60%. On the other hand, tasks from the floortile and hiking domains cannot be divided into independent sub-goals and are consequently highly perturbed (Figure 9). Figure 9 excludes the airport and openstacks domains, as their instances become unsolvable with only a few adversarial changes.

Because removing an action from the plan may have a real cost (see Section 6), the number of changes may represent an important success parameter. For example, in Figure 9, the data-network domain does not appear to be a successful attack (58.32% success rate with 4 grounded actions removed). However, on the data-network instances that are perturbed, an adversary still increases the plan cost on average with four adversarial changes.

Takeaways - We find that an adversary can be successful without knowing the agent’s planner. The success rate of the adversary decreases if the tasks perturbed can be divided into sub-tasks with sub-goals. Overall, two adversarial changes are generally sufficient to efficiently perturb an agent’s task.

6. Realizing Attacks

One of the key questions one might ask is how an adversary can practically alter the plan instance. Here we highlight the results of our ongoing survey of the security of real-world planning systems and demonstrate scenarios in which planning instances can be (and have been) targeted by adversaries.

Transportation systems - Next-generation transportation systems use vehicle area networks to exchange motion, location, and hazard information (e.g., V2X (35)). These vehicle-to-vehicle messages are used by the internal planning systems of autonomous cars to make local and global decisions (Zong et al., 2018; Behere and Torngren, 2015). However, such messages can simply misreport the state of the environment, allowing selfish behavior (Petit and Shladover, 2015). The misreports alter the maneuver (action) space of the receiving victim cars, and thereby alter the planning inputs as posited throughout. Anecdotally, a recent low-tech attack on route planning was demonstrated by the performance artist Simon Weckert, who created a virtual traffic jam by carrying 99 phones down an otherwise empty street. This fake “congestion” (a perturbation of the plan space) caused the mapping software of drivers in the nearby area to route around the street (Cox, 2020).

Motion planning - Motion planning in vision-based robotic systems is used at multiple scales to plan the movement or manipulation of objects within the environment (Gupta and Pobil, 1998). However, vision systems used in robotics can be profoundly affected by changes in light. In particular, darkness, shadows, and reflections, possibly caused by an adversary, can inhibit scene or object interpretation and perception, which can vastly alter the plan space (Hollerbach et al., 1999).

Chemical manufacturing - Similar to other industrial manufacturing planning systems, chemical production scheduling is the process of scheduling, delivering and retrieving chemical components through a plant. Such planning is highly dependent on the correct understanding of the available resource inventory and equipment states as recorded in plant databases (Kidam and Hurme, 2013). An adversary who is able to compromise the DB server (through techniques such as phishing, APT, or exploiting host vulnerabilities) can alter the database to manipulate the planning of the production schedule.

Data center/cloud management - Data centers migrate virtual machines and containers to, among other goals, balance load, reduce resource usage, and provide isolation for sensitive computation. These migrations are most often coordinated using a discrete or continuous resource planner (Usmani and Singh, 2016). Any adversary who is able to occupy a VM host or surrounding infrastructure and generate network and/or computational load will change the resource signature and alter the data center planning instance.

8. Conclusion

This paper has explored adversarial capabilities in planning systems. We introduced an adversarial heuristic and algorithm that identify malicious modifications to the environment that disrupt the plan, inducing high cost or preventing the goal state from being reached. The algorithm can be adapted to any kind of deterministic, single-goal planning problem, online or offline, and scales with the size of the planning instance. For some domains the approach is successful in 60% to 95% of instances when the adversary knows the desired goal and state of the agent.

In future work, we plan to explore defenses and measures of robustness. We will also explore more complicated planning domains (e.g., multiple goals, irreversible actions), methods for training adversarial planning heuristics, and the practical impacts of manipulated planners in situ. By experimenting with and measuring the real-world impacts of manipulated planners, we can understand the methods and degree to which planning systems are vulnerable in the wild.


  • M. Amoozadeh, A. Raghuramu, C. Chuah, D. Ghosal, H. M. Zhang, J. Rowe, and K. Levitt (2015) Security vulnerabilities of connected vehicle streams and their impact on cooperative driving. IEEE Communications Magazine 53 (6), pp. 126–132. Cited by: §1.
  • N. Arshad, D. Heimbigner, and A. L. Wolf (2004) A planning based approach to failure recovery in distributed systems. In Proceedings of the 1st ACM SIGSOFT Workshop on Self-managed Systems, WOSS ’04, New York, NY, USA, pp. 8–12. External Links: Document, ISBN 1-58113-989-6, Link Cited by: Appendix D.
  • C. Aschermann, S. Schumilo, T. Blazytko, R. Gawlik, and T. Holz (2019) REDQUEEN: fuzzing with input-to-state correspondence.. In NDSS, Vol. 19, pp. 1–15. Cited by: §1.
  • A. Bar-Noy, S. Khuller, and B. Schieber (1995) The complexity of finding most vital arcs and nodes. Technical report University of Maryland at College Park. Note: Univ. of Maryland Institute for Advanced Computer Studies Report No. UMIACS-TR-95-96 Cited by: Appendix B.
  • C. Bazgan, T. Fluschnik, A. Nichterlein, R. Niedermeier, and M. Stahlberg (2018) A more fine-grained complexity analysis of finding the most vital edges for undirected shortest paths. CoRR abs/1804.09155. External Links: 1804.09155, Link Cited by: Appendix B.
  • S. Behere and M. Torngren (2015) A functional architecture for autonomous driving. In 2015 First International Workshop on Automotive Software Architecture (WASA), pp. 3–10. Cited by: §1, §6.
  • A. L. Blum and M. L. Furst (1997) Fast planning through planning graph analysis. Artificial Intelligence 90 (1), pp. 281 – 300. External Links: Document, ISSN 0004-3702, Link Cited by: §2.
  • M. S. Boddy, J. Gohde, T. Haigh, and S. A. Harp (2005) Course of action generation for cyber security using classical planning.. In ICAPS, pp. 12–21. Cited by: §1.
  • B. Bonet and H. Geffner (2001) Planning as heuristic search. Artificial Intelligence 129, pp. 5–33. Cited by: §2, §5.2.
  • W. I. Bullers, S. Y. Nof, and A. B. Whinston (1980) Artificial intelligence in manufacturing planning and control. A I I E Transactions 12 (4), pp. 351–363. Cited by: §1.
  • T. Bylander (1994) The computational complexity of propositional STRIPS planning. Artificial Intelligence 69 (1), pp. 165 – 204. External Links: Document, ISSN 0004-3702, Link Cited by: §4.
  • N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. Cited by: §1.
  • P. Chen and H. Chen (2018) Angora: efficient fuzzing by principled search. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 711–725. Cited by: §1.
  • K. Cox (2020) How to virtually block a road: Take a walk with 99 phones. (en-us). External Links: Link Cited by: §6.
  • J. C. Culberson and J. Schaeffer (1998) Pattern databases. Computational Intelligence 14 (3), pp. 318–334. Cited by: §4.
  • S. Edelkamp, R. Englert, J. Hoffmann, F. dos S. Liporace, S. Thiébaux, and S. Trüg (2006) Engineering benchmarks for planning: the domains used in the deterministic part of IPC-4. J. Artif. Intell. Res. 26, pp. 453–541. Cited by: 1st item, footnote 3.
  • K. Erol, D. S. Nau, and V.S. Subrahmanian (1995) Complexity, decidability and undecidability results for domain-independent planning. Artificial Intelligence 76 (1), pp. 75 – 88. Note: Planning and Scheduling External Links: Document, ISSN 0004-3702, Link Cited by: §4.
  • K. Gupta and A. P. Pobil (1998) Practical motion planning in robotics: current approaches and future directions. John Wiley & Sons, Inc., USA. External Links: ISBN 047198163X Cited by: §6.
  • M. Helmert and H. Geffner (2008) Unifying the causal graph and additive heuristics. In Proceedings of the Eighteenth International Conference on Automated Planning and Scheduling, ICAPS 2008, Sydney, Australia, September 14-18, 2008, pp. 140–147. External Links: Link Cited by: §5.2.
  • M. Helmert and H. Lasinger (2010) The scanalyzer domain: greenhouse logistics as a planning problem. In Proceedings of the 20th International Conference on Automated Planning and Scheduling, ICAPS 2010, Toronto, Ontario, Canada, May 12-16, 2010, pp. 234–237. External Links: Link Cited by: §1.
  • M. Helmert (2006) The fast downward planning system. Journal of Artificial Intelligence Research 26, pp. 191–246. Cited by: §1, §5.2, footnote 2.
  • J. Hoffmann and B. Nebel (2001) The ff planning system: fast plan generation through heuristic search. J. Artif. Int. Res. 14 (1), pp. 253–302. External Links: ISSN 1076-9757, Link Cited by: §5.2.
  • J. M. Hollerbach, W. B. Thompson, and P. Shirley (1999) The convergence of robotics, vision, and computer graphics for user interaction. The International Journal of Robotics Research 18 (11), pp. 1088–1100. Cited by: §6.
  • A. E. Howe (1995) Improving the reliability of artificial intelligence planning systems by analyzing their failure recovery. IEEE Transactions on Knowledge & Data Engineering 7, pp. 14–25. External Links: Document, ISSN 1041-4347, Link Cited by: Appendix D.
  • A. Junghanns and J. Schaeffer (1999) Domain-dependent single-agent search enhancements. In Proceedings of the 16th International Joint Conference on Artifical Intelligence - Volume 1, IJCAI’99, San Francisco, CA, USA, pp. 570–575. External Links: Link Cited by: §2.
  • M. T. Karen Scarfone (June 2014) Description of participant planners of the deterministic track. The Eighth International Planning Competition. External Links: Link Cited by: §2.
  • D. Karger (1993) Global min-cuts in RNC and other ramifications of a simple mincut algorithm. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 21–30. External Links: Document Cited by: §4.
  • H. Kautz and B. Selman (1998) BLACKBOX: a new approach to the application of theorem proving to problem solving. In AIPS98 Workshop on Planning as Combinatorial Search, Vol. 58260, pp. 58–60. Cited by: §2.
  • K. Kidam and M. Hurme (2013) Analysis of equipment failures as contributors to chemical process accidents. Process Safety and Environmental Protection 91 (1), pp. 61 – 78. External Links: ISSN 0957-5820, Document, Link Cited by: §1, §6.
  • S. Koenig and M. Likhachev (2002) D*lite. In Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, July 28 - August 1, 2002, Edmonton, Alberta, Canada., pp. 476–483. External Links: Link Cited by: §1, §1, §5.1.
  • A. Koller and J. Hoffmann (2010) Waking up a sleeping rabbit: on natural-language sentence generation with FF. In ICAPS, Cited by: §1.
  • A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236. Cited by: §1.
  • P. Masters and S. Sardina (2017) Deceptive path-planning. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 4368–4375. External Links: Document, Link Cited by: Appendix D.
  • C. McGann, F. Py, K. Rajan, H. Thomas, R. Henthorn, and R. Mcewen (2007) T-REX: a model-based architecture for AUV control. The International Conference on Automated Planning and Scheduling. External Links: Link Cited by: §3.
  • [35] (2016-03) On-board system requirements for v2v safety communications. External Links: Document, Link Cited by: §6.
  • B. Paden, M. Čáp, S. Z. Yong, D. Yershov, and E. Frazzoli (2016) A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Transactions on Intelligent Vehicles 1. Cited by: §1.
  • N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami (2016) The limitations of deep learning in adversarial settings. 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. Cited by: §1.
  • J. Petit and S. E. Shladover (2015) Potential cyberattacks on automated vehicles. IEEE Transactions on Intelligent Transportation Systems 16 (2), pp. 546–556. Cited by: §6.
  • S. Russell and P. Norvig (2009) Artificial intelligence: a modern approach. 3rd edition, Prentice Hall Press, Upper Saddle River, NJ, USA. External Links: ISBN 0136042597, 9780136042594 Cited by: §2.
  • W. Schwarting, J. Alonso-Mora, and D. Rus (2018) Planning and decision-making for autonomous vehicles. Annual Review of Control, Robotics, and Autonomous Systems 1. Cited by: §1.
  • A. Stentz and M. Hebert (1995) A complete navigation system for goal acquisition in unknown environments. Autonomous Robots 2 (2), pp. 127–145. External Links: Document, Link Cited by: §1, §1.
  • V. L. L. Thing and J. Wu (2016) Autonomous vehicle security: a taxonomy of attacks and defences. In 2016 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 164–170. Cited by: §1.
  • Z. Usmani and S. Singh (2016) A survey of virtual machine placement techniques in a cloud data center. Procedia Computer Science 78, pp. 491–498. External Links: Document Cited by: §6.
  • R. Wall and A. Slider (2019) U.S. airlines report delays caused by system fault: FAA said it was aware that several airlines were experiencing issues with a flight-planning program. The Wall Street Journal. Cited by: §1.
  • W. Zong, C. Zhang, Z. Wang, J. Zhu, and Q. Chen (2018) Architecture design and implementation of an autonomous vehicle. IEEE Access 6, pp. 21956–21970. Cited by: §1, §6.

Appendix A Sample task from the air cargo transportation domain

The air cargo transportation domain specifies a class of tasks in which cargo must be delivered to a destination airport while minimizing the cost of the plan. The domain is described by three operators, Load, Unload, and Fly, each with unit cost. The Load operator loads a cargo item into an airplane; Unload is its inverse. Finally, Fly moves a plane (together with any cargo loaded into it) from one airport to another.

(:action LOAD
  :parameters (?c - cargo ?p - plane ?a - airport)
  :precondition (and (At ?c ?a) (At ?p ?a))
  :effect (and (In ?c ?p) (not (At ?c ?a))))

(:action UNLOAD
  :parameters (?c - cargo ?p - plane ?a - airport)
  :precondition (and (In ?c ?p) (At ?p ?a))
  :effect (and (At ?c ?a) (not (In ?c ?p))))

(:action FLY
  :parameters (?p - plane ?from - airport ?to - airport)
  :precondition (and (At ?p ?from))
  :effect (and (At ?p ?to) (not (At ?p ?from))))

Figure 10. The air cargo transportation domain.

Now we define a sample task belonging to the domain by giving the initial state and a specification of the goal state. Figure 11 describes an initial state with two cargo items, one at SFO and one at JFK. The goal is to swap their positions using the two planes.

(:objects p1 p2 - plane
          c1 c2 - cargo
          SFO JFK - airport)
(:init (At c1 SFO) (At c2 JFK)
       (At p1 SFO) (At p2 JFK))
(:goal (and (At c1 JFK) (At c2 SFO)))

Figure 11. A problem from the air cargo transportation domain.
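The task in Figure 11 admits a six-step, unit-cost plan: each cargo item is loaded into the plane at its airport, flown across, and unloaded. The sketch below simulates that plan with a minimal STRIPS-style state-update loop; the operator and object names mirror Figures 10 and 11, but the Python encoding itself is our own illustrative choice, not the paper's implementation.

```python
# Minimal STRIPS-style simulation of the air cargo task (illustrative sketch).
# States are sets of ground atoms; each operator checks its preconditions
# and applies its add/delete effects, mirroring the PDDL in Figure 10.

def load(state, c, p, a):
    # Precondition: cargo and plane are at the same airport.
    assert ("At", c, a) in state and ("At", p, a) in state
    return (state - {("At", c, a)}) | {("In", c, p)}

def unload(state, c, p, a):
    assert ("In", c, p) in state and ("At", p, a) in state
    return (state - {("In", c, p)}) | {("At", c, a)}

def fly(state, p, src, dst):
    assert ("At", p, src) in state
    return (state - {("At", p, src)}) | {("At", p, dst)}

# Initial state from Figure 11.
init = {("At", "c1", "SFO"), ("At", "c2", "JFK"),
        ("At", "p1", "SFO"), ("At", "p2", "JFK")}

# A six-step plan that swaps the two cargo items.
s = load(init, "c1", "p1", "SFO")
s = fly(s, "p1", "SFO", "JFK")
s = unload(s, "c1", "p1", "JFK")
s = load(s, "c2", "p2", "JFK")
s = fly(s, "p2", "JFK", "SFO")
s = unload(s, "c2", "p2", "SFO")

goal = {("At", "c1", "JFK"), ("At", "c2", "SFO")}
print(goal <= s)  # True: both goal atoms hold in the final state
```

With unit costs this plan costs 6; no shorter plan exists, since each cargo item needs at least a load, a flight, and an unload.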

Appendix B Finding adversarial examples is NP-hard

Given a planning instance, we show that finding adversarial changes that increase the cost of the optimal plan is an NP-hard problem. We define the Adversarial Change Problem (ADVCP) as follows:
Input - A planning instance Π, where O is the set of grounded operators with non-negative costs, and an integer k.
Output - A set of k grounded actions whose removal from O maximizes the cost increase of the optimal plan (the adversarial changes).
The decision problem associated with ADVCP (D-ADVCP) is the following: "Given a planning instance Π and two integers k and c, are there k grounded actions whose removal from O makes the cost of the optimal plan at least c?"

We also introduce the Most Vital Arcs Problem (MVAP):
Input - A graph G = (V, E) (directed or undirected), an integer k, and two nodes s, t in V. Each edge in E has a non-negative cost.
Output - A set of k edges (arcs) whose removal maximizes the length increase of the shortest path between s and t in G.

The decision problem associated with MVAP (D-MVAP) is the following: "Given G, k, s, t, and an integer c, are there k edges whose removal makes the length of the shortest path from s to t at least c?"
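To make the D-MVAP definition concrete, the sketch below answers the decision question by brute force on a toy instance: it tries every k-subset of edges and checks whether some removal pushes the s-to-t shortest-path length to at least c. The graph, function names, and edge-list encoding are all our own illustrative assumptions; the exhaustive search over subsets is precisely the kind of computation whose hardness the reduction establishes.

```python
# Brute-force decision procedure for D-MVAP on a toy instance (sketch).
from itertools import combinations
import heapq

def shortest(edges, s, t):
    # Dijkstra over a directed edge list [(u, v, cost), ...].
    adj = {}
    for u, v, c in edges:
        adj.setdefault(u, []).append((v, c))
    dist, heap = {s: 0}, [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, c in adj.get(u, []):
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return dist.get(t, float("inf"))

def d_mvap(edges, k, s, t, c):
    # "Yes" iff removing some k edges makes the s-t distance at least c.
    return any(shortest([e for e in edges if e not in rm], s, t) >= c
               for rm in combinations(edges, k))

# Toy graph: a cheap direct edge s->t and an expensive detour s->a->t.
edges = [("s", "t", 1), ("s", "a", 2), ("a", "t", 2)]
print(d_mvap(edges, 1, "s", "t", 4))  # True: drop (s,t), detour costs 4
print(d_mvap(edges, 1, "s", "t", 5))  # False: no single removal reaches 5
```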

We prove that ADVCP is NP-hard by reducing D-MVAP to D-ADVCP; D-MVAP is known to be NP-hard [4] [5]. We first introduce the polynomial-time reduction between instances of the two decision problems. Second, we show that the answer for an instance of D-ADVCP is "yes" if and only if the answer for the corresponding instance of D-MVAP is "yes".

Suppose we are given an instance of D-MVAP consisting of a graph G = (V, E), two integers k and c, and two nodes s and t. Let n = |V| and assume the nodes in V are labelled 1, ..., n. We create the following equivalent D-ADVCP instance (planning instance).

  • We define the initial state as the state containing the single predicate (Node, s).

  • A state is recognized as a goal state if its predicates include the predicate (Node, t).

  • If node i is connected to node j by an edge in E, we add the following operator Oij to O. The cost of this operator is set to the cost of the i-to-j edge.

    (:grounded-action Oij
      :precondition (Node, i)
      :effect (and (Node, j) (not (Node, i))))

The resulting planning instance can be constructed in time polynomial in the size of G and k. We now show that the answer for an instance of D-ADVCP is "yes" if and only if the answer for the corresponding D-MVAP instance is "yes".
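The reduction above can be sketched as a short function: each edge (i, j) of the graph becomes a grounded operator Oij with precondition (Node, i), add effect (Node, j), delete effect (Node, i), and the edge's cost. The dictionary encoding of an operator is our own illustrative choice, not a standard planner format.

```python
# Sketch of the D-MVAP -> D-ADVCP reduction: edges become grounded operators.

def reduce_mvap_to_advcp(edges, s, t):
    # edges: list of (i, j, cost) tuples; s, t: source and target nodes.
    operators = [{
        "name": f"O{u}{v}",          # Oij, one grounded action per edge
        "pre": [("Node", u)],         # agent currently at node u
        "add": [("Node", v)],         # moving along the edge reaches v
        "del": [("Node", u)],         # ...and leaves u
        "cost": c,                    # same cost as the u-to-v edge
    } for u, v, c in edges]
    init = {("Node", s)}              # initial state: at node s
    goal = {("Node", t)}              # goal: reach node t
    return operators, init, goal

ops, init, goal = reduce_mvap_to_advcp([(1, 2, 3), (2, 4, 1)], 1, 4)
print(len(ops))        # 2: one grounded action per edge
print(ops[0]["name"])  # O12
```

Because the state-space graph of the resulting instance is exactly G, removing the operator Oij is equivalent to removing the edge (i, j), which is what the equivalence argument below exploits.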

Assume the answer is "yes" for a D-ADVCP instance Π whose initial state is (Node, s) and whose goal is (Node, t). Then removing k grounded actions from O makes the cost of the optimal plan at least c. Let G be the graph given by the inverse reduction; G is exactly the state-space graph generated by Π, so the cost of Π's optimal plan equals the cost of an optimal path from s to t in G. Hence the equivalent D-MVAP instance, given by G, k, c, s, and t, is a "yes" instance.

Conversely, if the D-MVAP instance given by G, k, c, s, and t is a "yes" instance, there exist k edges whose removal makes the length of the shortest path from s to t at least c. The equivalent planning instance Π defined by the reduction has the same state-space graph as G. Then, removing the k equivalent grounded actions from O in Π makes the cost of the optimal plan at least c.

Hence ADVCP is an NP-hard problem. Note that we defined ADVCP such that the set O of grounded operators is given as an input. In practice, however, O is not given; instead, we are given a set of non-grounded operators together with a set of objects with which to ground them. Unfortunately, O can grow exponentially with the number of non-grounded operators and the number of objects. In the end, finding adversarial changes means solving an NP-hard problem on an input of exponential size.

Appendix C Table generation parameters

Threat model        Threshold  Window  Algorithm
AHI                 0          231     D* Lite
Agent's heuristic   0          231     D* Lite
Black-box           10         45      D* Lite
Table 4. The table generation process' settings for the different scenarios in the Maze domain. The window is the number of windows kept in the table, the algorithm is the one used by the adversary, and AHI denotes the setting where the adversary uses the agent's heuristic and is informed.

Appendix D Defenses

It is natural to ask what defenses would mitigate this kind of adversarial planning. Consider that the agent always follows what appears to be the shortest way to reach the goal. If we were to implement a new kind of planner that resists adversarial changes, we would face two contradictory incentives: (1) find the shortest plan to the goal state, in which case the plan becomes predictable, or (2) find a less predictable plan that may be worse in terms of resources than the original one. Most planners follow the first option, and we showed that it is vulnerable to an adversary. The second option is exactly what an adversary wants, because the resulting plan is already worse than the original one. The tension between these two contradictory goals makes it hard to devise a defense against adversarial changes.

Deception and secrecy might be the only ways to prevent an adversary from interfering with the plan. In [33], researchers worked on deceptive path planning: finding a path such that an observer cannot determine which goal the agent wants to reach until the last steps. Without a clear description of the goal state, an adversary is unable to predict the agent's plan.

If an adversary still succeeds in finding an effective adversarial change, a way to limit its impact is to use failure-recovery techniques. Howe [24] studied plan resilience and error recovery at planning and execution time. Arshad et al. [2] investigated failure recovery for distributed systems using planning. Their approach automates failure recovery by defining an acceptable recovered state as a goal; their system then runs a planner to get from the current failure state to the recovered state. However, that planner may itself be vulnerable to adversarial changes.