In competitive environments, agents commonly try to prevent opponents from achieving their goals, as in security [DBLP:conf/aips/BoddyGHH05, DBLP:conf/atal/PitaJMOPTWPK08], real-time strategy games [DBLP:journals/tciaig/OntanonSURCP13], or air combat [DBLP:conf/flairs/BorckKAA15]. Different approaches exist to frame and solve these kinds of problems [DBLP:journals/ai/Carbonell81, DBLP:conf/iaw/Rowe03, DBLP:books/daglib/0040483, DBLP:conf/aaai/SpeicherS00K18]. However, most of these works assume the opponent’s goal is known a priori, which is not true in many real-world problems.
For example, consider a police control domain like the one shown in Figure 1, where two agents act concurrently in the same environment. A terrorist has committed an attack and wants to escape, while the police aim to stop the terrorist before she leaves the city. The terrorist has carefully designed her escape plan by buying bus tickets; she therefore needs to go to the bus station and take the bus. Before that, she needs to make a call to a partner. However, she is afraid that her phone is tapped by the police, so she needs to make the call from any of the phone booths distributed over the city. Once she reaches the bus station having made the call from a non-tapped phone, she will be out of reach of the police. On the other hand, the police do not know which means of transport from a given known set (e.g. bus, train or plane) the terrorist is going to use to escape. The police can move around the city using a patrol car and set controls in the white tiles (blue tiles are obstacles, such as a river). They can also tap the phone booths from the police station. The police have control over some of the city cameras, located at different key points around the city, which help them identify the terrorist’s executed actions. To use most of the existing techniques, we would need to group all the possible terrorist goals into one, thus rendering the problem unsolvable, i.e., no strategy would allow the police to prevent the terrorist from achieving any of the goals.
A different approach called domain-independent Counterplanning [DBLP:conf/ijcai/PozancoEFB18] was recently proposed to prevent opponents from achieving their goal in scenarios like the above, where that goal is unknown a priori. Counterplanning combines different techniques to generate counterplans: (1) planning-based goal recognition [DBLP:conf/ijcai/RamirezG09] to identify the actual opponent’s goal from a predefined set of candidate goals; (2) landmarks [DBLP:journals/jair/HoffmannPS04] to identify subgoals the opponent will need to achieve to reach its goal; and (3) classical automated planning to generate plans that delete these subgoals before the opponent stops needing them.
However, the work of Pozanco et al. presents a key drawback: the agent stands still observing the opponent until it infers the opponent’s goal, at which point it starts computing and executing a counterplan. This passiveness (reactiveness) of the agent renders many problems unsolvable, since by that point the opponent might be closer to her goal, and therefore no counterplan could prevent her from achieving it. This is the case in the scenario depicted in Figure 1, where current counterplanning techniques could not produce any valid counterplan. Another limitation of previous works is that they generated plans in an off-line manner.
This paper overcomes these problems by making the agent proactive in on-line settings. A proactive agent starts executing actions that allow it to be in a better spot for stopping the opponent once it infers the opponent’s goal. In the case of Figure 1, the police patrol should start moving west regardless of the opponent’s first action. Thus, they would be closer on average to the opponent’s potential goals, and therefore closer to stopping it. We argue that planning centroids [DBLP:conf/aips/PozancoEFB19, karpasCentroids] are these better spots, being the states towards which the agent should start moving before inferring the opponent’s goal.
The main contributions of this paper are: (1) adaptation of previous counterplanning techniques to an on-line setting; and (2) definition of anticipatory strategies for counterplanning. Experimental results on several domains show the benefits of using anticipation over reactive behavior.
Automated Planning is the task of choosing and organizing a sequence of actions such that, when applied in a given initial state, it results in a goal state [DBLP:books/daglib/0014222]. Formally:
A single-agent strips planning task can be defined as a tuple $\Pi = \langle F, A, I, G \rangle$, where $F$ is a set of propositions, $A$ is a set of instantiated actions, $I \subseteq F$ is an initial state, and $G \subseteq F$ is a set of goals.
A state consists of a set of propositions that are true at a given time. A state is totally specified if it assigns truth values to all the propositions in $F$, as the initial state $I$ of a planning task. A state is partially specified (partial state) if it assigns truth values to only a subset of the propositions in $F$, as the conjunction of propositions in the goals $G$ of a planning task. Each action $a \in A$ is described by a set of preconditions (pre($a$)), which represent literals that must be true in a state to execute the action, and a set of effects (eff($a$)), which are the literals that are added (add($a$) effects) or removed (del($a$) effects) from the state after the action execution. The definition of each action might also include a cost $c(a)$ (the default cost is one). The execution of an action $a$ in a state $s$ is defined by a function $\gamma$ such that $\gamma(s,a) = (s \setminus \text{del}(a)) \cup \text{add}(a)$ if pre($a$) $\subseteq s$, and $\gamma(s,a) = s$ otherwise (it cannot be applied). The output of a planning task is a sequence of actions, called a plan, $\pi = (a_1, \ldots, a_n)$. The execution of a plan $\pi$ in a state $s$ can be defined as: $\Gamma(s, \pi) = \gamma(\ldots\gamma(\gamma(s, a_1), a_2)\ldots, a_n)$.
A plan $\pi$ is valid if $G \subseteq \Gamma(I, \pi)$. The plan cost is commonly defined as $c(\pi) = \sum_{a_i \in \pi} c(a_i)$. A plan with minimal cost is called optimal. We will use $h^{*}(s, G)$ to denote the optimal cost of reaching a goal state from a given state $s$ of a planning task. Finally, we will use the function Planner($\Pi$) to refer to an algorithm that computes a plan $\pi$ from a planning task $\Pi$.
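The state-progression semantics above can be sketched in a few lines of Python. This is an illustrative toy under assumed representations (states as frozensets of proposition strings, a hypothetical `Action` record), not any planner's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """Illustrative STRIPS action record (names are assumptions)."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset
    cost: int = 1

def apply_action(state, action):
    """gamma(s, a): progress the state; leave it unchanged if inapplicable."""
    if not action.pre <= state:
        return state
    return (state - action.delete) | action.add

def execute_plan(state, plan):
    """Gamma(s, pi): iteratively apply every action of the plan."""
    for a in plan:
        state = apply_action(state, a)
    return state

def is_valid(init, plan, goals):
    """A plan is valid iff the goals hold after executing it from init."""
    return goals <= execute_plan(init, plan)

def plan_cost(plan):
    return sum(a.cost for a in plan)
```

A one-action usage example: moving from `a` to `b` deletes `at-a` and adds `at-b`, so the plan is valid for the goal `{at-b}` and costs 1.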
When several agents act in the same environment, it becomes non-deterministic due to agents not knowing in advance which actions the other agents are executing. Classical plans are no longer valid solutions for these problems, and we need to extend these definitions. We follow the formulations of Bowling, Jensen and Veloso (DBLP:conf/ijcai/BowlingJV03), and Brafman and Domshlak (DBLP:conf/aips/BrafmanD08):
A multi-agent planning (MAP) task is a tuple $\langle N, \{F_i\}_{i \in N}, \{I_i\}_{i \in N}, \{A_i\}_{i \in N}, \{G_i\}_{i \in N} \rangle$, where $N$ is a set of agents, $F_i$ is the set of propositions of agent $i$, $I_i$ is the initial state of agent $i$, $A_i$ is the set of actions agent $i$ can execute, and $G_i$ is the goal of agent $i$.
The set of actions of each agent always includes a no-op action. A solution to a MAP task is a set of plans that are jointly executed. To ensure that joint actions have well-defined effects, it is necessary to impose concurrency constraints that model whether a set of actions can be performed in parallel. We will assume the proposition-based concurrency constraints introduced in PDDL 2.1 [DBLP:journals/jair/FoxL03], and only two agents executing one action each at every time step. Two actions $a_i$ and $a_j$ can be applied concurrently in a state $s$ iff both are applicable in $s$ and they do not interfere [DBLP:journals/jair/FoxL03]. We will use $a_i \parallel a_j$ to represent the joint execution of two actions.
The joint execution of two actions in a state results in a new state given by Equation 1.
Similarly, we define the joint execution of two plans as the iterative joint execution of the actions of those plans.
The joint execution of two plans in a state results in a new state given by Equation 2.
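The joint execution just defined can be sketched as follows. The `Action` record, and padding the shorter plan with no-ops so both plans advance in lockstep, are illustrative assumptions consistent with the no-op action introduced above.

```python
from collections import namedtuple

# Minimal action record: preconditions, add and delete effects (frozensets).
Action = namedtuple("Action", "name pre add delete")
NOOP = Action("no-op", frozenset(), frozenset(), frozenset())

def apply_joint(state, a_i, a_j):
    """Sketch of Equation 1: remove both delete lists, then apply both add
    lists. Assumes the actions are applicable and do not interfere."""
    assert a_i.pre <= state and a_j.pre <= state
    return (state - (a_i.delete | a_j.delete)) | a_i.add | a_j.add

def execute_joint(state, plan_i, plan_j):
    """Sketch of Equation 2: step through both plans in lockstep,
    padding the shorter plan with no-op actions."""
    n = max(len(plan_i), len(plan_j))
    pad = lambda p: list(p) + [NOOP] * (n - len(p))
    for a, b in zip(pad(plan_i), pad(plan_j)):
        state = apply_joint(state, a, b)
    return state
```

For example, if each agent executes one movement action, the resulting state reflects both movements; if one agent's plan is empty, only the other agent's effects apply.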
Following Cimatti et al. terminology (DBLP:journals/ai/CimattiPRT03) we define a strong plan in the context of two agents MAP as follows:
A plan $\pi_i$ that solves an agent’s planning task $\Pi_i$ is a strong plan iff its joint execution with any sequence of actions that an agent with planning task $\Pi_j$ can execute always achieves the goal $G_i$:
Any other plan that does not meet this criterion will be weak.
Goal Recognition is the task of inferring another agent’s goals through the observation of its interactions with the environment. The problem has captured the attention of several computer science communities [DBLP:journals/ai/AlbrechtS18]. Among them, planning-based goal recognition approaches have been shown to be a valid domain-independent alternative to infer agents’ goals [DBLP:conf/ijcai/RamirezG09, DBLP:conf/aaai/KaminkaVA18, DBLP:journals/ai/PereiraOM20]. Ramírez and Geffner (DBLP:conf/aaai/RamirezG10) developed an approach that assumes observations are actions, and formally defined a planning-based goal recognition problem as:
A goal recognition problem is a tuple $T = \langle P, \mathcal{G}, O, Prob \rangle$, where $P = \langle F, A, I \rangle$ is a planning domain and initial conditions, $\mathcal{G}$ is the set of possible goals, $O = (o_1, \ldots, o_m)$ is an observation sequence with each $o_i$ being an action in $A$, and $Prob$ is a prior probability distribution over the goals in $\mathcal{G}$.
The solution to a goal recognition problem is a probability distribution over the set of goals $\mathcal{G}$ giving the relative likelihood of each goal. In this work we assume that $Prob$ is uniform. We use Recognize($T$) to refer to an algorithm that solves the goal recognition problem. This function returns the set of goals that are most probable according to the set of observations $O$.
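A simplified variant of this cost-based recognition can be sketched as follows, assuming precomputed optimal costs are available as dictionaries: one cost per goal when forced to comply with the observations, and one unconstrained optimal cost. The exponential weighting is a simplification of Ramírez and Geffner's Boltzmann-style likelihood, and all names are illustrative.

```python
import math

def recognize(goals, cost_with_obs, cost_optimal, beta=1.0):
    """Return the set of most likely goals under a uniform prior.

    For each goal, the likelihood decreases with the extra cost that
    complying with the observations imposes over the goal's optimal cost
    (a simplification of Ramirez & Geffner's formulation).
    """
    likelihood = {}
    for g in goals:
        delta = cost_with_obs[g] - cost_optimal[g]
        likelihood[g] = math.exp(-beta * delta)
    z = sum(likelihood.values())
    posterior = {g: l / z for g, l in likelihood.items()}
    best = max(posterior.values())
    # Return all goals tied for the highest posterior probability.
    return {g for g, p in posterior.items() if abs(p - best) < 1e-9}
```

In the running example, observations consistent with the bus and train stations leave both as most likely, while an escape by plane becomes costlier to explain and drops out of the returned set.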
Planning centroids [DBLP:conf/aips/PozancoEFB19, karpasCentroids] are the states that minimize the average distance (cost) to a given set of possible goals. The setting is similar to the single-agent strips planning task with multiple goals, like in a goal recognition problem, and a weight associated to each of the goals, i.e., $\mathcal{G} = \{(G_1, w_1), \ldots, (G_n, w_n)\}$. The weighted average cost from a state $s$ to the possible goals is computed as: $\frac{\sum_{i} w_i \cdot h^{*}(s, G_i)}{\sum_{i} w_i}$
where $w_i$ is a real number denoting the weight or importance of the goal $G_i$. We adhere to the previous definition of planning centroids.
A state $s$ is a centroid state of such a task iff (1) it is reachable from the initial state; and (2) it minimizes the weighted average cost to the set of possible goals $\mathcal{G}$.
Planning centroids can be computed exactly or approximately using different techniques [DBLP:conf/aips/PozancoEFB19, karpasCentroids]. We will use the function ExtractCentroids to refer to an algorithm that computes a set of centroids from such a planning task.
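For illustration, the weighted average cost and an exact (enumeration-based) centroid extraction can be sketched as follows, assuming a cost function `h(s, g)` from a state to a goal is available. A real implementation would avoid enumerating all reachable states; this toy simply makes the definition concrete.

```python
def weighted_avg_cost(state, weighted_goals, h):
    """sum_i w_i * h(state, G_i) / sum_i w_i over weighted candidate goals."""
    total = sum(w for _, w in weighted_goals)
    return sum(w * h(state, g) for g, w in weighted_goals) / total

def extract_centroids(reachable, weighted_goals, h):
    """Centroids: reachable states minimizing the weighted average cost."""
    scores = {s: weighted_avg_cost(s, weighted_goals, h) for s in reachable}
    best = min(scores.values())
    return {s for s, c in scores.items() if c == best}
```

Usage example on a toy corridor of grid cells with Manhattan distance as `h`: with goal (0,0) weighted twice as much as goal (4,0), the centroid collapses onto the heavier goal's side.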
In Automated Planning, landmarks were initially defined as sets of propositions that have to be true at some time in every solution plan [DBLP:journals/jair/HoffmannPS04]. Although computing all the landmarks of a planning task has been proven to be equivalent to solving the planning task [DBLP:journals/jair/HoffmannPS04], efficient techniques have been developed to find a large number of landmarks in reasonable time [DBLP:conf/aaai/RichterHW08, DBLP:conf/ecai/KeyderRH10]. We use ExtractLandmarks($\Pi$) to refer to an algorithm that computes a set of landmarks from a planning task $\Pi$.
In this section we describe the counterplanning setting we assume. This setting is similar to the one used in previous works [DBLP:conf/ijcai/PozancoEFB18], and we will highlight any difference throughout the section.
We consider two planning agents acting concurrently in the same environment. A seeking agent, seek, which wants to achieve a goal; and a preventing agent, prev, which wants to stop the seeker from achieving its goal.
seek will try to achieve a goal $G_{seek}$ through a rational (optimal or 'least suboptimal') plan $\pi_{seek}$. Goal and plan will not change during the counterplanning episode.
prev’s goal $G_{prev}$ is initially set to empty; hence, it does not have an initial plan. It will try to formulate a goal and a plan during the counterplanning episode to prevent seek from achieving $G_{seek}$.
prev knows seek’s model, but not seek’s actual goal $G_{seek}$ or plan $\pi_{seek}$.
prev’s and seek’s models are coupled, i.e., they share some propositions. More specifically, prev can delete (add) some propositions appearing in the preconditions of seek’s actions.
prev knows a set of potential goals $\mathcal{G}$ that seek might try to achieve. seek’s actual goal is always within that set, $G_{seek} \in \mathcal{G}$.
Both agents have full observability of each other’s actions.
Both agents execute only one action at each time step, and the duration of each action is one time step.
Most of these assumptions are common either in goal recognition research or in real-world applications. For instance, in most real-world domains where counterplanning can be useful (e.g. police control, cyber security, strategy games), the preventing agent knows her enemy’s model and a set of potential goals she is interested in. The rationality assumption is common in goal recognition research [DBLP:conf/ijcai/MastersS17]. Considering the above assumptions, we can formally define a counterplanning task.
A counterplanning task is defined by a tuple $C = \langle \Pi_{seek}, \Pi_{prev}, O, \mathcal{G} \rangle$, where:
$\Pi_{seek}$ is the planning task of seek.
$\Pi_{prev}$ is the planning task of prev.
$O$ is a set of observations in the form of executed actions that prev receives from the execution of seek’s plan $\pi_{seek}$. The notation differentiates between observations (seek’s previously executed actions), $o_i$, and future actions to be executed by seek, $a_i$.
$\mathcal{G}$ is the set of goals that prev currently thinks seek can be potentially pursuing.
The word currently in the definition of $\mathcal{G}$ indicates that this set changes according to the set of observations $O$. In fact, given that we are assuming the agent behaves rationally to achieve its goal, the size of $\mathcal{G}$ will tend to decrease with each observation. In other words, prev will consider fewer (or the same number of) potential goals for seek as seek executes more actions of her plan.
As we have discussed, at the beginning of the counterplanning task prev has not performed any action (her goal and plan are empty). Therefore, the new composite state after receiving the set of observations $O$ is the result of executing those observations from the initial state. The solution to a counterplanning task is a preventing agent’s plan $\pi_{prev}$, namely a counterplan. From now on we will use the terms counterplan and valid counterplan interchangeably. We define valid counterplans as follows:
A counterplan $\pi_{prev}$ is a valid counterplan iff its joint execution with the remainder of seek’s actual plan $\pi_{seek}$ results in a state from which seek cannot achieve any of the goals in $\mathcal{G}$, and therefore its actual goal $G_{seek}$. Formally:
Note that the definition of a valid counterplan is quite strict: it must prevent seek from achieving any of the goals prev thinks it is currently trying to achieve. Moreover, it will only be a valid counterplan with respect to the actual plan $\pi_{seek}$ seek is executing, which prev does not know. Therefore, the validity of a counterplan can only be tested a posteriori. Going back to our running example, in the limit case where the terrorist has not started moving, $O = \emptyset$, the police would need to compute a counterplan that blocks the achievement of any of the terrorist’s goals. If such a counterplan does not exist, the police should wait until they infer the terrorist’s true goal by observing more actions. Other approaches would involve betting on one of the goals and setting a control at one of the stations. However, we are aiming at domains such as police control where we want to provide some guarantees about the opponent being stopped.
In automated planning, the only way of ensuring that a goal is not achievable is to thwart one of the planning landmarks involved in it (as a reminder, goals are landmarks by definition). If those landmarks cannot be achieved again, as we are assuming here, this prevents seek from achieving the goal regardless of the plan it follows. prev does not know seek’s actual goal, only a set of potential goals $\mathcal{G}$ she might be trying to achieve.
Given a counterplanning task $C$, we refer to the set of all the potential planning tasks that prev currently thinks seek might be solving as $\mathcal{P}$.
Therefore, prev must find a counterplan that deletes (or adds) any of the fact landmarks that are common to all the planning tasks in $\mathcal{P}$. We refer to this set of landmarks as counterplanning landmarks.
Given a counterplanning task $C$, a fact $l$ is a counterplanning landmark iff:
If $l$ is a positive literal, there is an action of prev that deletes it; if $l$ is a negative literal, there is an action of prev that adds it; and
prev can delete (add) $l$ by applying fewer actions from its initial state than the last step of an optimal plan in which seek needs $l$. Given the set that contains all the optimal plans that achieve any of the goals in $\mathcal{G}$, and a function laststep($\pi$, $l$) that returns the last step at which $l$ appears in any precondition of a plan $\pi$:
This definition of counterplanning landmark is different from the one used by Pozanco et al. (DBLP:conf/ijcai/PozancoEFB18). In their work, a counterplanning landmark can be deleted by prev in fewer steps than seek needs to achieve it. However, that definition artificially restricts the number of counterplanning landmarks. For example, landmarks that are true in the initial state would have a cost of zero for seek, and therefore could not be considered counterplanning landmarks. We propose a different definition that allows us to correctly compute all the counterplanning landmarks of a counterplanning task. We compute all the optimal plans that solve each of the planning tasks in $\mathcal{P}$, and extract the minimum last step amongst all the optimal plans at which seek stops needing $l$. This is needed in order to keep some stopping guarantees, considering a worst-case seek agent following the plan that stops needing the landmarks as soon as possible. We refer to the set of counterplanning landmarks of a counterplanning task as CPL, and ExtractCPL($C$) as the function that computes them. It returns tuples pairing each landmark with the cost of a plan that falsifies it. As we will see next, prev will set any of these counterplanning landmarks as her goal $G_{prev}$, computing a counterplan that deletes (adds) it, making it impossible for seek to achieve her goal $G_{seek}$.
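The laststep function and the resulting counterplanning-landmark test can be sketched as follows. The `Action` record and the `prev_falsify_cost` oracle (the number of prev actions needed to falsify a landmark) are hypothetical names, and comparing that cost against the earliest last-need step over all optimal seek plans is a simplification of the definition above.

```python
from collections import namedtuple

# Illustrative action record; only preconditions matter for laststep.
Action = namedtuple("Action", "name pre")

def laststep(plan, landmark):
    """Last time step at which landmark appears in a precondition of plan
    (-1 if the plan never needs it)."""
    steps = [t for t, a in enumerate(plan) if landmark in a.pre]
    return max(steps) if steps else -1

def is_counterplanning_landmark(landmark, optimal_seek_plans, prev_falsify_cost):
    """Sketch: prev must be able to falsify the landmark in fewer steps than
    the earliest (worst-case over optimal plans) last step at which seek
    needs it."""
    earliest_last_need = min(laststep(p, landmark) for p in optimal_seek_plans)
    return prev_falsify_cost(landmark) < earliest_last_need
```

For instance, if one optimal seek plan still needs `free` at step 2 but another stops needing it at step 0, the worst case is step 0 and prev cannot beat it; with only the first plan, falsifying `free` in 1 step suffices.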
We now adapt the Domain-independent Counterplanning (dicp) algorithm of Pozanco et al. (DBLP:conf/ijcai/PozancoEFB18) to an online setting. dicp is shown in Algorithm 1 in black lettering, while the teal lettering corresponds to the new components of the anticipatory counterplanning algorithm we will discuss later.
The algorithm receives a counterplanning task as input and returns prev’s plan as output. The algorithm loops until no more observations are received, i.e., seek has reached its goal, or until prev finds a counterplan to block seek. It first calls the Recognize function (line 5). Given a planning domain, initial conditions, a set of candidate goals, and a set of observations, this function updates the set of candidate goals $\mathcal{G}$. After that, it extracts the set of counterplanning landmarks (line 6, see Definition 11). If this set is not empty, a counterplan might exist and the algorithm proceeds to find it. Otherwise, the counterplanning task is unsolvable, i.e., no counterplan exists given the current observations, and we advance the simulator, i.e., update the composite state. This is done in lines 16-18, where it perceives seek’s next action, inserts it into the set of observations received by prev, and updates the composite state.
If the set of counterplanning landmarks is not empty, the algorithm first selects a goal $G_{prev}$ from CPL using the SelectGoal function. As discussed by Pozanco et al. (DBLP:conf/ijcai/PozancoEFB18), there exist different ways of selecting this counterplanning landmark. For example, it could return the counterplanning landmark that is closest to prev, i.e., the one it can achieve with the lowest cost; or the one closest to seek, therefore stopping it as soon as possible. Finally, it uses a planner to compute a counterplan that achieves $G_{prev}$. Again, there exist different ways of generating the counterplan. For example, we could use a planner that only returns strong counterplans, i.e., counterplans that would guarantee the opponent is blocked; or optimal counterplans, which would guarantee minimal cost of prev’s plan. If such a (counter)plan exists, the algorithm returns it, concatenating it with the actions executed before by the preventing agent. Otherwise, it removes that counterplanning landmark from CPL (line 12) and tries to select a new one to set as $G_{prev}$. This is done until CPL is empty, in which case it advances the simulator (lines 16-18).
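The reactive loop just described can be sketched at a high level, with Recognize, ExtractCPL, SelectGoal and the planner injected as callables. This abstracts away all planning machinery and is not the authors' implementation; it only shows the control flow of Algorithm 1.

```python
def dicp_loop(obs_stream, recognize, extract_cpl, select_goal, planner, advance):
    """High-level sketch of the dicp loop (reactive parts of Algorithm 1).

    obs_stream yields seek's executed actions; all helpers are injected.
    Returns prev's plan, or None if seek reaches its goal unopposed.
    """
    executed = []        # prev actions executed so far (empty for plain dicp)
    observations = []
    for obs in obs_stream:
        observations.append(obs)
        goals = recognize(observations)       # update candidate goals
        cpl = extract_cpl(goals)              # common counterplanning landmarks
        while cpl:
            g_prev = select_goal(cpl)
            counterplan = planner(g_prev)
            if counterplan is not None:
                return executed + counterplan # concatenate with past actions
            cpl.remove(g_prev)                # try the next landmark
        advance()                             # no counterplan yet: advance state
    return None
```

A toy run: with two candidate goals, the first observation leaves no common landmark (prev waits); the second narrows recognition to one goal, a landmark appears, and the planner's counterplan is returned.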
Many counterplanning tasks are not solvable because seek is closer than prev to all the landmarks in CPL. If this happens when seek has not even moved, i.e., a counterplanning task with $O = \emptyset$, there is little prev can do to block its opponent. However, in some counterplanning episodes this handicap comes from the fact that prev stands still observing seek’s actions. Even if prev was able to stop seek at the beginning of the counterplanning episode, now it cannot, since seek is closer to all the landmarks.
We can see these two cases in Figure 2, which depicts two different counterplanning tasks using our police control running example. The counterplanning task depicted in the left figure is unsolvable regardless of prev’s actions. The opponent has not executed any action, and there is no counterplanning landmark that prev can falsify before seek stops needing it. On the other hand, the counterplanning task on the right would have a solution if the agent had started moving in the right direction at the same time as seek executed its first action, instead of standing still and watching.
Previous counterplanning works are reactive, assuming prev only generates a (counter)plan when it has inferred seek’s goal. We can see that in lines 15-18 of Algorithm 1, where no action is executed by prev () in case there are no common counterplanning landmarks among the most likely goals. In this work we want to go further, making prev proactive, i.e., it will start executing actions before inferring seek’s goal in a way that increases its chances of blocking the opponent once its goal has been inferred. The key insight is that we can use previous work on computing centroids to guide prev towards a reasonable state when it lacks any direction to move according to dicp. Hence, we have modified dicp to use planning centroids to drive prev’s actions until a counterplan is found. We call this new algorithm adicp (Anticipatory Domain-independent Counterplanning), and the two additions needed in Algorithm 1 are represented with teal lettering.
The first addition is line 7, where it calls a new function ExtractListOfCPL. This function computes the individual counterplanning landmarks, i.e., the counterplanning landmarks of each planning task, rather than only the common ones, i.e., those that appear in all of them. The second addition is line 14, where it calls a new function anticipate. This will make prev execute the action prescribed by anticipate until a counterplan is found. anticipate returns the next action to be executed by prev based on the computation of the centroid of all the counterplanning landmarks. Algorithm 2 details how anticipate works.
It receives as input prev’s planning task , the list of individual counterplanning landmarks CPList, and the current composite state . The algorithm ranks the counterplanning landmarks in CPList to generate the set of candidate goals ExtractCentroids receives as input. This ranking is computed using the following formula in the Rank function (line 2). We use set as a function that maps a list into a set.
Hence, counterplanning landmarks appearing in multiple goals will have a higher weight, and therefore ExtractCentroids will prioritize those regions of the state space. Finally, the algorithm calls GetFirstAction, which returns the action that prev can execute from the current state that minimizes the cost (steps) of achieving the centroid. We compute only one action rather than a full plan to achieve the centroid, since the plan might change in the next iteration of adicp after observing a new action executed by seek. This action is the action returned by anticipate.
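The Rank and GetFirstAction steps of anticipate can be sketched as follows. Weighting each distinct landmark by its multiplicity in CPList is an assumption consistent with the running example (the paper's exact formula may differ), and the state/heuristic interfaces are hypothetical.

```python
from collections import Counter

def rank(cp_list):
    """Sketch of Rank: weight each distinct counterplanning landmark by the
    number of candidate goals it appears for, i.e., its multiplicity in
    CPList (duplicates encode landmarks shared by several goals)."""
    return dict(Counter(cp_list))

def get_first_action(actions, state, centroid, h, apply_action):
    """Sketch of GetFirstAction: the applicable action whose successor state
    minimizes the estimated cost h to the centroid."""
    return min(actions, key=lambda a: h(apply_action(state, a), centroid))
```

For example, a landmark common to the bus and train escape routes appears twice in CPList and gets weight 2, while a landmark unique to one station gets weight 1; prev then takes the single action that most reduces the distance to the resulting centroid.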
Let us exemplify how adicp works in the police control counterplanning task depicted in Figure 3.
The start of the counterplanning episode (and the algorithm) is shown in the left image. seek has not performed any action (no observation has been received), so all goals are equally likely. The algorithm then extracts the list and set of counterplanning landmarks. Since there is no common counterplanning landmark among the most likely goals (line 8), no valid counterplan can be produced yet, and the algorithm calls anticipate with the list of individual counterplanning landmarks CPList (line 14). The green cells depict the individual counterplanning landmarks. Some landmarks are repeated in CPList, i.e., they are common to several goals. The counterplanning landmarks at each station appear only once in CPList, so they are assigned a lower weight. On the other hand, the three counterplanning landmarks in the middle of the image ((not (free c3-2)), (not (free c3-3)), (not (free c3-4))) appear twice, for seek escaping through the train and bus stations, so their associated weight is higher. This is the set of weighted goals returned by the Rank function from the list of counterplanning landmarks:
Then, anticipate calls ExtractCentroids and GetFirstAction, which return the next action that prev should execute: moving east from c5-2 to c4-2. This action moves the agent closer to most of the individual counterplanning landmarks. After that, the composite state is updated with seek’s next action and the action prescribed by anticipate. This update ends the first iteration of adicp, and prev gets a new observation.
The second iteration is shown in the right image of Figure 3. seek has moved north, so now the most likely goals are the terrorist escaping through the bus and train stations. In this case there are counterplanning landmarks common to all the most likely goals, and thus the algorithm can start counterplanning (a part shared with dicp). The first counterplanning landmark that prev can delete, which is also the closest to seek, is (free c3-2), so its negation is set as $G_{prev}$. Finally, a plan to achieve $G_{prev}$ is computed and the algorithm returns the counterplan. Neither this counterplan nor any other would have existed had prev not moved towards the centroid of the counterplanning landmarks in the first place. In other words, this is a counterplanning task that dicp cannot solve, but adicp can.
On the other hand, had seek’s goal been to escape through the airport, approaching the centroid would have been a bad decision, since the police patrol could have just moved south to block seek. It is even possible to artificially generate counterplanning tasks where approaching the centroid of the counterplanning landmarks renders the task unsolvable. However, adicp prescribes the strategy that maximizes the chances of stopping seek on average, as we will see in the next section.
We compare four different algorithms: (1) dicp; (2) adicp; (3) random-adicp, which is a variation of adicp that executes random actions in the anticipate function; and (4) random-goal-adicp, which executes actions to achieve a random goal from $\mathcal{G}$ in the anticipate function. The four algorithms can be instantiated with many different combinations of planners, goal recognition, landmarks extraction, and planning centroid techniques. However, for reasons of space we run them with the same fixed configuration, leaving the study of those different combinations for future work. For the Recognize function, we use Ramírez and Geffner’s (DBLP:conf/aaai/RamirezG10) probabilistic goal recognition approach. We use the seq-opt-lmcut configuration of Fast Downward [DBLP:journals/jair/Helmert06] to optimally solve the compiled goal recognition problems using the lmcut admissible heuristic [DBLP:conf/aips/HelmertD09]. We also use this optimal planner for the Planner function. The (counter)plans in this case will be weak, and will only stop seek under some circumstances. For the ExtractCPL and ExtractListOfCPL functions, we use (1) the landmarks extraction algorithm of Keyder, Richter and Helmert (DBLP:conf/ecai/KeyderRH10) as implemented in Fast Downward; and (2) the top-quality planner of Katz, Sohrabi and Udrea (DBLP:conf/aaai/KatzSU20) to compute all the optimal plans. Finally, we use the greedy centroid computation of Pozanco et al. (DBLP:conf/aips/PozancoEFB19) to get the next action to execute in the anticipate function.
Table 1: Comparison of the four counterplanning algorithms in the four considered domains. Numbers shown are averages and standard deviations over the set of problems. Bold figures indicate the best performance in the given metric.
We compared the four algorithms in the following planning domains. For each domain, we randomly generated 100 counterplanning tasks.
police control. Our running example domain. Maps are grids with obstacles and 10 randomly distributed booths. The set of candidate goals is .
painted blocks-words. In this domain a robotic arm (seek) is trying to build some words using a set of blocks. It can stack and unstack blocks as long as their tops are not painted. There is another agent (prev) that can paint the top part of clear blocks, i.e., blocks that do not have other blocks on top of them. The paint buckets are randomly distributed over several connected rooms. prev is also randomly placed in one of these rooms. To paint a block, prev needs to have the paint and be in the room where seek is building the words. Problems in this domain contain blocks and rooms. Blocks are initially piled randomly into several towers. The set of candidate goals is , i.e., seek might be building five words. The words seek needs to build range from to blocks (letters).
rovers & martians. This is a game in which a robotic agent called rover (seek) has to conduct several experiments on Mars. It has to navigate to different locations, collect samples, analyze them, and communicate the results. There is another agent, a martian (prev), that does not want intruders on its planet. It can destroy the rover’s experiments by stealing the samples or interrupting its communications. While the rover can only move through visible locations (i.e., it may need to apply a set of actions to go from A to B), the martian can move between any two points of the map with just one action. Problems in this domain contain 10 locations and 6 samples, in addition to a varying number of objectives and cameras. Both agents are randomly placed on the map. The set of candidate goals is , i.e., the rover might be trying to get/communicate the results of different experiments.
human resources. In this domain a company (seek) is trying to hire a set of recent graduates for a set of teams, given some budget limit for each team. The company can execute some actions to increase the available budget of a team. The graduates come from different universities, and have preferences over the type of teams they would like to join, their salaries, and their interview availability. The company can establish contacts with universities in order to schedule interviews in different time slots and send offers to graduates. There is another company (prev) also interested in hiring these graduates. It can schedule interviews with the graduates in the same time slots, as well as thwart the contacts seek tries to establish with universities. Problems in this domain contain 10 universities, 50 graduates, 4 teams and 5 time slots in which the interviews can be scheduled. Teams’ budgets and salaries are discretized into 10 consecutive bins. The set of candidate goals is , i.e., the different graduates that seek might be trying to hire.
For all the counterplanning tasks, we randomly select one goal and set it as seek’s true goal. Then we compute an optimal plan to achieve it () using the Planner function. We measure the following quality and performance metrics:
: if seek is stopped, otherwise. To compute this number, we jointly execute from the beginning () and as returned by the algorithms. If the joint execution of both plans from does not allow seek to achieve its goal, .
: ratio of that seek can execute until it is blocked by prev. Lower numbers are better, and means that seek has not been blocked.
: counterplan length (cost) for prev.
: ratio of actions from that are part of , i.e., that were generated before the counterplan was computed.
: time to return a solution for a counterplanning task. We compute this time as the average over all the iterations of the algorithms.
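As an illustration, the stopped and executed-ratio metrics above can be obtained by jointly executing seek’s plan and prev’s counterplan. The following is a minimal sketch assuming a STRIPS-like (preconditions, add, delete) action representation and a simple alternating execution semantics; these modeling choices are our own simplifying assumptions, not the exact evaluation harness used in the paper.

```python
def applicable(state, action):
    """An action is applicable if its preconditions hold in the state."""
    pre, add, dele = action
    return pre <= state

def apply(state, action):
    """Apply a STRIPS-like action: remove delete effects, add add effects."""
    pre, add, dele = action
    return (state - dele) | add

def joint_execute(init, seek_plan, prev_plan, seek_goal):
    """Alternate prev's and seek's actions; seek is blocked as soon as its
    next action becomes inapplicable. Returns (stopped, executed_ratio)."""
    state = set(init)
    executed = 0
    for i, seek_act in enumerate(seek_plan):
        if i < len(prev_plan):                       # prev moves first
            state = apply(state, prev_plan[i])
        if not applicable(state, seek_act):          # seek is blocked here
            return True, executed / len(seek_plan)
        state = apply(state, seek_act)
        executed += 1
    stopped = not (seek_goal <= state)               # goal reached?
    return stopped, executed / len(seek_plan)

# Toy example: prev taps the phone, deleting a fact seek's plan relies on.
init = {"at-a", "phone-ok"}
seek_plan = [(frozenset({"at-a"}), {"at-b"}, {"at-a"}),
             (frozenset({"at-b", "phone-ok"}), {"escaped"}, set())]
prev_plan = [(frozenset(), set(), {"phone-ok"})]
print(joint_execute(init, seek_plan, prev_plan, {"escaped"}))  # blocked at step 2
```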
For all the metrics (except for ) we only report the numbers in those cases where the given algorithm could find a valid counterplan, i.e., we do not report those metrics when the counterplanning task is unsolvable or when the generated (weak) counterplan is not valid. The experiments were run on Intel Core i5-8400 CPU 2.80GHz machines with a time limit of 600s and a memory limit of 8GB per algorithm iteration. Domains and problems are available upon request.
Table 1 summarizes the results of our evaluation. As we can see, adicp dominates the other algorithms in all domains without much overhead in terms of computation time. It returns valid counterplans () in more counterplanning tasks, and also stops seek earlier, i.e., allows it to execute a smaller fraction of its plan (). random-goal-adicp achieves comparable results in domains like rovers & martians. However, the performance gap widens in others such as painted blocks-words, highlighting that adicp’s strategy of approaching the centroid of the candidate goals is better than simply moving towards one of the candidate goals. A key observation is that even random-adicp tends to outperform dicp. Thus, in most counterplanning tasks, moving randomly until a counterplan is found is usually better than doing nothing, as dicp does.
The average length of the plans of the three algorithms with an anticipatory component tends to be greater than that of dicp. This is especially noticeable in police control, where the counterplans generated by adicp triple the length of those generated by dicp. Two factors explain this. First, these algorithms solve more problems than dicp. Second, the problems that dicp could not solve are the harder tasks, i.e., problems where ’s counterplanning landmarks are further away from prev’s initial state. The of prev’s plan with anticipatory actions differs across domains and algorithms, but it represents around - of the plan length.
Different approaches solve problems where agents try to prevent opponents from achieving their goals [DBLP:journals/ai/Carbonell81, DBLP:conf/iaw/Rowe03]. Stackelberg games [stackelberg1952theory] are among the most successful and prolific ones [DBLP:books/daglib/0040483]: the leader (defender) moves first, followed by the follower (attacker), and a solution is an equilibrium pair that minimizes the defender’s loss given optimal attacker (opponent) play. Stackelberg planning [DBLP:conf/aaai/SpeicherS00K18, DBLP:conf/aaai/TorralbaSKS021] has recently been introduced as a way of computing solutions to these games using automated planning. However, unlike ours, these works assume that (1) agents act sequentially, i.e., one after the other; and (2) the opponent’s goal is known a priori. In adversarial scenarios where agents execute concurrently, Jensen and Veloso (DBLP:journals/jair/JensenV00) present two algorithms: optimistic and strong cyclic adversarial planning. These algorithms return universal plans, which can be seen as policies that associate with each state the best action to apply to achieve the goal. However, they also assume the opponent’s goal is known, while we use planning-based goal recognition to infer it from a set of candidate goals.
In this paper, agents can reason about the goal they should pursue based on the environment [DBLP:conf/aaai/MolineauxKA10a]. The prev agent changes its goal at each step, first approaching the centroid of the counterplanning landmarks (which changes over time), and later changing its behavior once it can compute a counterplan. We also endow agents with proactive behavior that allows them to start planning before their actual goal appears (as a reaction to inferring the opponent’s goal). This kind of anticipatory planning has been explored in the past in the context of single-agent planning [DBLP:conf/aips/BurnsBRYD12, DBLP:journals/aicom/FuentetajaBR18], in settings where goals arrive dynamically and the planner must generate plans to start achieving them before they appear.
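The anticipatory behavior described above can be sketched as a simple loop: while no counterplan can be computed, prev moves toward the centroid of the current counterplanning landmarks, and it commits to the counterplan as soon as one becomes available. All names below (step_toward, try_counterplan, the grid-position representation) are hypothetical stand-ins for the algorithm’s actual components, used only to illustrate the control flow.

```python
def centroid(points):
    """Centroid of a set of 2D landmark positions."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def step_toward(pos, target):
    """Move one grid cell along the dominant axis toward target."""
    (x, y), (tx, ty) = pos, target
    dx, dy = tx - x, ty - y
    if abs(dx) >= abs(dy) and dx != 0:
        return (x + (1 if dx > 0 else -1), y)
    if dy != 0:
        return (x, y + (1 if dy > 0 else -1))
    return pos

def anticipate(pos, candidate_landmarks, try_counterplan):
    """Approach the landmarks' centroid until try_counterplan(pos)
    returns a plan; then return the final position and that plan."""
    while True:
        plan = try_counterplan(pos)
        if plan is not None:          # a counterplan exists: commit to it
            return pos, plan
        pos = step_toward(pos, centroid(candidate_landmarks))

# Toy example: a counterplan only exists once prev reaches (2, 2),
# which happens to be the centroid of the two candidate landmarks.
def try_counterplan(pos):
    return ["set-control"] if pos == (2, 2) else None

print(anticipate((0, 0), [(1, 3), (3, 1)], try_counterplan))
```

In the real algorithm the set of candidate landmarks shrinks as goal recognition discards candidate goals, so the centroid (and hence prev’s heading) changes over time.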
Conclusions and Future Work
In this paper we have shown that some counterplanning tasks are unsolvable due to the preventing agent’s inactivity while it waits to infer its opponent’s goal. To overcome this limitation, we have introduced adicp, a new algorithm that combines a set of planning techniques (goal recognition, landmarks, and centroids) to yield proactive plans that greatly increase the chances of blocking an opponent in different adversarial planning domains.
In future work, we would like to study in more depth the impact of some key aspects on the performance of anticipation: for instance, deceptive seeking agents [DBLP:conf/aaai/KulkarniSK19] that actively try to mislead the preventer, as well as problem distribution (placement of initial states and goals) and domain definition (action costs on both sides).