In its purest form, single-agent heuristic search is concerned with the problem of finding a least-cost path between two states (start and goal) in a state space given a heuristic function $h(t,g)$ that estimates the cost to reach the goal state $g$ from any state $t$. Standard algorithms for single-agent heuristic search such as IDA* [Korf 1985] are guaranteed to find optimal paths if $h(t,g)$ is admissible, i.e. never overestimates the actual cost to the goal state from $t$, and their efficiency is heavily influenced by the accuracy of $h(t,g)$. Considerable research has therefore investigated methods for defining accurate, admissible heuristics.
A common method for defining admissible heuristics, which has led to major advances in combinatorial problems [Culberson & Schaeffer 1998; Hernádvölgyi 2003; Korf 1997; Korf & Taylor 1996] and planning [Edelkamp 2001], is to “abstract” the original state space to create a new, smaller state space with the key property that for each path $p$ in the original space there is a corresponding abstract path whose cost does not exceed the cost of $p$. Given an abstraction, $h(t,g)$ can be defined as the cost of the least-cost abstract path from the abstract state corresponding to $t$ to the abstract state corresponding to $g$. The best heuristic functions defined by abstraction are typically based on several abstractions, and are equal to either the maximum, or the sum, of the costs returned by the abstractions [Korf & Felner 2002; Felner et al. 2004; Holte et al. 2006].
The sum of the costs returned by a set of abstractions is not always admissible. If it is, the set of abstractions is said to be “additive”. The main contribution of this paper is to identify general conditions for abstractions to be additive. The new conditions subsume most previous notions of “additive” as special cases. The greater generality allows additive abstractions to be defined for state spaces that had no additive abstractions according to previous definitions, such as Rubik’s Cube, TopSpin, the Pancake puzzle, and related real-world problems such as the genome rearrangement problem described in [genome]. Our definitions are fully formal, enabling rigorous proofs of the admissibility and consistency of the heuristics defined by our abstractions. A heuristic $h(t,g)$ is consistent if $h(t,g) \le \mathit{OPT}(t,u) + h(u,g)$ for all states $t$, $g$ and $u$, where $\mathit{OPT}(t,u)$ is the cost of the least-cost path from $t$ to $u$.
The usefulness of our general definitions is demonstrated experimentally by defining additive abstractions that substantially reduce the CPU time needed to solve TopSpin and the Pancake puzzle. For example, the use of additive abstractions allows the 17-Pancake puzzle to be solved three orders of magnitude faster than previous state-of-the-art methods.
Additional experiments show that additive abstractions are not always the best abstraction method. The main reason for this is that the solution cost calculated by an individual additive abstraction can sometimes be very low. In the extreme case, which actually arises in practice, all problems can have abstract solutions that cost 0. The final contribution of the paper is to introduce a technique that is sometimes able to identify that the sum of the costs of the additive abstractions is provably too small (“infeasible”).
The remainder of the paper is organized as follows. An informal introduction to abstraction is given in Section 2. Section 3 presents formal general definitions for abstractions that extend to general additive abstractions. We provide lemmas proving the admissibility and consistency of both standard and additive heuristics based on these abstractions. This section also discusses the relation to previous definitions. Section 4 describes successful applications of additive abstractions to TopSpin and the Pancake puzzle. Section 5 discusses the negative results. Section 6 introduces “infeasibility” and presents experimental results showing its effectiveness on the sliding tile puzzle and TopSpin. Conclusions are presented in Section 7.
2 Heuristics Defined by Abstraction
To illustrate the idea of abstraction and how it is used to define heuristics, consider the well-known 8-puzzle (the $3 \times 3$ sliding tile puzzle). In this puzzle there are 9 locations in the form of a $3 \times 3$ grid and 8 tiles, numbered 1–8, with the 9th location being empty (or blank). A tile that is adjacent to the empty location can be moved into the empty location; every move has a cost of 1. The most common way of abstracting this state space is to treat several of the tiles as if they were indistinguishable instead of being distinct [Culberson & Schaeffer 1996]. An extreme version of this type of abstraction is shown in Figure 1. Here the tiles are all indistinguishable from each other, so an abstract state is entirely defined by the position of the blank. There are therefore only 9 abstract states, connected as shown in Figure 1. The goal state in the original puzzle has the blank in the upper left corner, so the abstract goal is the state shown at the top of the figure. The number beside each abstract state is the distance from the abstract state to the abstract goal. For example, an abstract state labeled 2 in Figure 1 is 2 moves from the abstract goal. A heuristic function $h(t,g)$ for the distance from state $t$ to $g$ in the original space is computed in two steps: (1) compute the abstract state corresponding to $t$ (in this example, this is done by determining the location of the blank in state $t$); and then (2) determine the distance from that abstract state to the abstract goal. The calculation of the abstract distance can either be done in a preprocessing step to create a heuristic lookup table called a pattern database [Culberson & Schaeffer 1994; Culberson & Schaeffer 1996] or at the time it is needed [Holte et al. 1996; Holte et al. 2005; Felner & Adler 2005].
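The two-step computation described above can be sketched in Python for the blank-only abstraction. This is a minimal illustration, not the authors' implementation: the state encoding (a row-major tuple of 9 entries with 0 for the blank) and the names `build_blank_pdb` and `h_blank` are assumptions made here.

```python
from collections import deque

def build_blank_pdb(goal_blank=0, side=3):
    """Breadth-first search over the 9 abstract states (blank positions).

    Every move is reversible, so distances measured from the abstract goal
    equal distances to the abstract goal.
    """
    dist = {goal_blank: 0}
    queue = deque([goal_blank])
    while queue:
        pos = queue.popleft()
        row, col = divmod(pos, side)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + dr, col + dc
            if 0 <= r < side and 0 <= c < side:
                nxt = r * side + c
                if nxt not in dist:
                    dist[nxt] = dist[pos] + 1
                    queue.append(nxt)
    return dist

def h_blank(state, pdb):
    """Step 1: map the state to its abstract state (the blank's position).
    Step 2: look up the abstract distance in the pattern database."""
    return pdb[state.index(0)]

pdb = build_blank_pdb()
print(h_blank((1, 2, 3, 4, 0, 5, 6, 7, 8), pdb))  # blank in the centre of the grid
```

Because there are only 9 abstract states, the table is tiny; the same lookup pattern applies to larger pattern databases.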
Given several abstractions of a state space, the heuristic $h_{max}(t,g)$ can be defined as the maximum of the abstract distances for $t$ given by the abstractions individually. This is the standard method for defining a heuristic function given multiple abstractions [Holte et al. 2006]. For example, consider state $A$ of the $3 \times 3$ sliding tile puzzle shown in the top left of Figure 2 and the goal state $g$ shown below it. The middle column shows an abstraction of these two states ($A_1$ and $g_1$) in which tiles 1, 3, 5, and 7, and the blank, are distinct while the other tiles are indistinguishable from each other. We refer to the distinct tiles as “distinguished tiles” and the indistinguishable tiles as “don’t care” tiles. The right column shows the complementary abstraction, in which tiles 1, 3, 5, and 7 are the “don’t cares” and tiles 2, 4, 6, and 8 are distinguished. The arrows in the figure trace out a least-cost path to reach the abstract goal $g_i$ from state $A_i$ in each abstraction. The cost of solving $A_1$ is 16 and the cost of solving $A_2$ is 12. Therefore, $h_{max}(A,g)$ is 16, the maximum of these two abstract distances.
2.1 Additive Abstractions
Figure 3 illustrates how additive abstractions can be defined for the sliding tile puzzle [Korf & Felner 2002; Felner et al. 2004; Korf & Taylor 1996]. State $A$ and the abstractions are the same as in Figure 2, but the costs of the operators in the abstract spaces are defined differently. Instead of all abstract operators having a cost of 1, as was the case previously, an operator only has a cost of 1 if it moves a distinguished tile; such moves are called “distinguished moves” and are shown as solid arrows in Figures 2 and 3. An operator that moves a “don’t care” tile (a “don’t care” move) has a cost of 0 and is shown as a dashed arrow in the figures. Least-cost paths in abstract spaces defined this way therefore minimize the number of distinguished moves without considering how many “don’t care” moves are made. For example, the least-cost path for $A_1$ in Figure 3 contains fewer distinguished moves (9 compared to 10) than the least-cost path for $A_1$ in Figure 2, and is therefore lower cost according to the cost function just described, but contains more moves in total (18 compared to 16) because it has more “don’t care” moves (9 compared to 6). As Figure 3 shows, 9 distinguished moves are needed to solve $A_1$ and 5 distinguished moves are needed to solve $A_2$. Because no tile is distinguished in both abstractions, a move that has a cost of 1 in one space has a cost of 0 in the other space, and it is therefore admissible to add the two distances. The heuristic calculated using additive abstractions is referred to as $h_{add}$; in this example, $h_{add}(A,g) = 9 + 5 = 14$. Note that $h_{add}(A,g)$ is less than $h_{max}(A,g) = 16$ in this example, showing that heuristics based on additive abstractions are not always superior to the standard, maximum-based method of combining multiple abstractions even though in general they have proven very effective on the sliding tile puzzles [Korf & Felner 2002; Felner et al. 2004; Korf & Taylor 1996].
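As a hedged sketch of how such additive pattern databases could be built for the 8-puzzle, the following Python runs a 0-1 breadth-first search from each abstract goal, charging 1 for distinguished moves and 0 for “don’t care” moves, and sums the two lookups. The goal layout `GOAL`, the encoding (0 for the blank, -1 for “don’t care”), and all function names are assumptions for illustration; the state of Figure 3 is not reproduced here.

```python
from collections import deque

SIDE = 3
# Grid positions adjacent to each position p (row-major indexing).
NEIGHBORS = {
    p: [q for q in range(SIDE * SIDE)
        if abs(p // SIDE - q // SIDE) + abs(p % SIDE - q % SIDE) == 1]
    for p in range(SIDE * SIDE)
}

def abstract(state, group):
    """Keep the blank (0) and the tiles in `group`; rename all others to -1."""
    return tuple(x if x == 0 or x in group else -1 for x in state)

def build_pdb(goal, group):
    """0-1 BFS from the abstract goal: distinguished moves cost 1, others 0.

    Each abstract move can be undone at the same cost, so distances from
    the abstract goal equal distances to it.
    """
    start = abstract(goal, group)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        d = dist[s]
        blank = s.index(0)
        for q in NEIGHBORS[blank]:
            cost = 0 if s[q] == -1 else 1
            nxt = list(s)
            nxt[blank], nxt[q] = nxt[q], nxt[blank]
            nxt = tuple(nxt)
            if d + cost < dist.get(nxt, float("inf")):
                dist[nxt] = d + cost
                if cost == 0:
                    queue.appendleft(nxt)   # zero-cost edges go to the front
                else:
                    queue.append(nxt)
    return dist

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)        # assumed goal layout, blank upper left
GROUPS = [{1, 3, 5, 7}, {2, 4, 6, 8}]     # the two tile groups used in the text
PDBS = [build_pdb(GOAL, g) for g in GROUPS]

def h_add(state):
    """Sum of the minimum numbers of distinguished moves in each abstraction."""
    return sum(pdb[abstract(state, g)] for pdb, g in zip(PDBS, GROUPS))

print(h_add((1, 2, 0, 3, 4, 5, 6, 7, 8)))
```

The sum is admissible here only because each tile is distinguished in exactly one group, so every move is charged in at most one abstract space.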
The general method defined by Korf, Felner, and colleagues [Korf & Felner 2002; Felner et al. 2004; Korf & Taylor 1996] creates a set of additive abstractions by partitioning the tiles into disjoint groups and defining one abstraction for each group by making the tiles in that group distinguished in the abstraction. An important limitation of this and most other existing methods of defining additive abstractions is that they do not apply to spaces in which an operator can move more than one tile at a time, unless there is a way to guarantee that all the tiles that are moved by the operator are in the same group.
An example of a state space that has no additive abstractions according to previous definitions is the Pancake puzzle. In the $N$-Pancake puzzle, a state is a permutation of $N$ tiles ($0, 1, \dots, N-1$) and has $N-1$ successors, with the $l^{th}$ successor formed by reversing the order of the first $l+1$ positions of the permutation ($1 \le l \le N-1$). For example, in the 4-Pancake puzzle shown in Figure 4, the state at the top of the figure has three successors, which are formed by reversing the order of the first two tiles, the first three tiles, and all four tiles, respectively. Because the operators move more than one tile and any tile can appear in any location there is no non-trivial way to partition the tiles so that all the tiles moved by an operator are distinguished in just one abstraction. Other common state spaces that have no additive abstractions according to previous definitions, for similar reasons, are Rubik’s Cube and TopSpin.
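The successor rule just described can be written down directly. The sketch below assumes a state is encoded as a Python tuple of tiles; the function name and the example state are illustrative choices, not taken from Figure 4.

```python
def pancake_successors(state):
    """Return the N-1 successors of a Pancake state.

    The l-th successor (1 <= l <= N-1) reverses the first l+1 positions.
    """
    n = len(state)
    return [state[:l + 1][::-1] + state[l + 1:] for l in range(1, n)]

# A hypothetical 4-Pancake state; it has three successors.
for successor in pancake_successors((0, 1, 2, 3)):
    print(successor)
```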
The general definition of additive abstractions presented in the next section overcomes the limitations of previous definitions. Intuitively, abstractions will be additive provided that the cost of each operator is divided among the abstract spaces. Our definition provides a formal basis for this intuition. There are numerous ways to do this even when operators move many tiles (or, in other words, make changes to many state variables). For example, the operator cost might be divided proportionally across the abstractions based on the percentage of the tiles moved by the operator that are distinguished in each abstraction. We call this method of defining abstract costs “cost-splitting”. For example, consider two abstractions of the 4-Pancake puzzle, one in which tiles 0 and 1 are distinguished, the other in which tiles 2 and 3 are distinguished. Then the middle operator in Figure 4 would have a cost of $\frac{2}{3}$ in the first abstract space and $\frac{1}{3}$ in the second abstract space, because of the three tiles this operator moves, two are distinguished in the first abstraction and one is distinguished in the second abstraction.
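A minimal sketch of cost-splitting for the Pancake puzzle follows, using exact fractions. The state `(0, 1, 2, 3)` is a hypothetical example chosen so that the operator reversing the first three positions moves two tiles from the first group and one from the second; the function name and parameters are assumptions.

```python
from fractions import Fraction

def cost_splitting(state, l, groups, op_cost=1):
    """Split the cost of the operator that reverses the first l+1 positions.

    Each abstraction receives the share of op_cost equal to the fraction of
    moved tiles that are distinguished in it.
    """
    moved = state[:l + 1]
    return [Fraction(op_cost * sum(tile in group for tile in moved), len(moved))
            for group in groups]

groups = [{0, 1}, {2, 3}]
shares = cost_splitting((0, 1, 2, 3), 2, groups)
print(shares, sum(shares))
```

Because the groups partition the tiles, the shares always sum to the original operator cost.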
A different method for dividing operator costs among abstractions focuses on a specific location (or locations) in the puzzle and assigns the full cost of the operator to the abstraction in which the tile that moves into this location is distinguished. We call this a “location-based” cost definition. In the Pancake puzzle it is natural to use the leftmost location as the special location since every operator changes the tile in this location. The middle operator in Figure 4 would have a cost of 0 in the abstract space in which tiles 0 and 1 are distinguished and a cost of 1 in the abstract space in which tiles 2 and 3 are distinguished because the operator moves tile 2 into the leftmost location.
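The location-based rule can be sketched in the same style. As before, the example state `(0, 1, 2, 3)` is hypothetical (chosen so that reversing the first three positions brings tile 2 to the leftmost location), and the names are assumptions.

```python
def location_based_costs(state, l, groups, op_cost=1, location=0):
    """Charge the full operator cost to the abstraction in which the tile
    that ends up in `location` is distinguished; all others pay 0."""
    successor = state[:l + 1][::-1] + state[l + 1:]
    arriving_tile = successor[location]
    return [op_cost if arriving_tile in group else 0 for group in groups]

groups = [{0, 1}, {2, 3}]
print(location_based_costs((0, 1, 2, 3), 2, groups))
```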
Both these methods apply to Rubik’s Cube and TopSpin, and many other state spaces in addition to the Pancake puzzle, but the $h_{add}$ heuristics they produce are not always superior to the $h_{max}$ heuristics based on the same tile partitions. The theory and experiments in the remainder of the paper shed some light on the general question of when $h_{add}$ is preferable to $h_{max}$.
3 Formal Theory of Additive Abstractions
In this section, we give formal definitions and lemmas related to state spaces, abstractions, and the heuristics defined by them, and discuss their meanings and relation to previous work. The definitions of state space etc. in Section 3.1 are standard, and the definition of state space abstraction in Section 3.2 differs from previous definitions only in one important detail: each state transition in an abstract space has two costs associated with it instead of just one. The main new contribution is the definition of additive abstractions in Section 3.3.
The underlying structure of our abstraction definition is a directed graph (digraph) homomorphism. For easy reference, we quote here standard definitions of digraph and digraph homomorphism [Hell & Nesetril 2004].
Definition 3.1. A digraph $G$ is a finite set $V = V(G)$ of vertices, together with a binary relation $E = E(G)$ on $V$. The elements $(u,v)$ of $E$ are called the arcs of $G$.
Definition 3.2. Let $G$ and $H$ be any digraphs. A homomorphism of $G$ to $H$, written as $f: G \to H$, is a mapping $f: V(G) \to V(H)$ such that $(f(u), f(v)) \in E(H)$ whenever $(u,v) \in E(G)$.
Note that the digraphs $G$ and $H$ may have self-loops, $(u,u)$, and a homomorphism is not required to be surjective in either vertices or arcs. We typically refer to arcs as edges, but it should be kept in mind that, in general, they are directed edges, or ordered pairs.
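For small explicit digraphs, the homomorphism condition can be checked mechanically. The sketch below assumes arcs are stored as a set of ordered pairs and the mapping as a dictionary; these representations and the example graphs are illustrative assumptions.

```python
def is_homomorphism(f, arcs_g, arcs_h):
    """True if every arc (u, v) of G is mapped to an arc (f[u], f[v]) of H."""
    return all((f[u], f[v]) in arcs_h for (u, v) in arcs_g)

# G is a directed 3-cycle; H has two vertices and a self-loop on "x".
arcs_g = {(1, 2), (2, 3), (3, 1)}
arcs_h = {("x", "x"), ("x", "y"), ("y", "x")}
f = {1: "x", 2: "x", 3: "y"}

print(is_homomorphism(f, arcs_g, arcs_h))
```

The arc `(1, 2)` is mapped to the self-loop `("x", "x")`, which is why the self-loop must be present in H for this mapping to be a homomorphism.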
3.1 State Space
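A state space is a weighted directed graph $S = \langle T, \Pi, C \rangle$ where $T$ is a finite set of states, $\Pi \subseteq T \times T$ is a set of directed edges (ordered pairs of states) representing state transitions, and $C: \Pi \to \mathbb{N} = \{0, 1, 2, \dots\}$ is the edge cost function.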
In typical practice, $S$ is defined implicitly. Usually each distinct state in $T$ corresponds to an assignment of values to a set of state variables. $\Pi$ and $C$ derive from a successor function, or a set of planning operators. In some cases, $T$ is restricted to the set of states reachable from a given state. For example, in the 8-puzzle, the set of edges $\Pi$ is defined by the rule “a tile that is adjacent to the empty location can be moved into the empty location”, and the set of states $T$ is defined in one of two ways: either as the set of states reachable from the goal state, or as the set of permutations of the tiles and the blank, in which case $S$ consists of two components that are not connected to one another. The standard cost function $C$ for the 8-puzzle assigns a cost of 1 to all edges, but it is easy to imagine cost functions for the 8-puzzle that depend on the tile being moved or the locations involved in the move.
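A path from state $t$ to state $g$ is a sequence of edges beginning at $t$ and ending at $g$. Formally, $p$ is a path from state $t$ to state $g$ if $p = \langle \pi^1, \dots, \pi^n \rangle$, $\pi^j \in \Pi$, where $\pi^j = (t^{j-1}, t^j)$, $j \in \{1, \dots, n\}$, and $t^0 = t$, $t^n = g$. Note the use of superscripts rather than subscripts to distinguish states and edges within a state space. The length of $p$ is the number of edges $n$ and its cost is $C(p) = \sum_{j=1}^{n} C(\pi^j)$. We use $\mathit{Paths}(S,t,g)$ to denote the set of all paths from $t$ to $g$ in $S$.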
The optimal (minimum) cost of a path from state $t$ to state $g$ in $S$ is defined by
$$\mathit{OPT}(t,g) = \min_{p \in \mathit{Paths}(S,t,g)} C(p).$$
A pathfinding problem is a triple $\langle S, t, g \rangle$, where $S$ is a state space and $t, g \in T$, with the objective of finding the minimum cost of a path from $t$ to $g$, or in some cases finding a minimum cost path $p \in \mathit{Paths}(S,t,g)$ such that $C(p) = \mathit{OPT}(t,g)$. Having just one goal state may seem restrictive, but problems having a set of goal states can be accommodated with this definition by adding a virtual goal state to the state space with zero-cost edges from the actual goal states to the virtual goal state.
3.2 State Space Abstraction
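An Abstraction System is a pair $\langle S, \mathcal{X} \rangle$ where $S = \langle T, \Pi, C \rangle$ is a state space and $\mathcal{X} = \{\langle S_i, \psi_i \rangle \mid \psi_i: S \to S_i,\ 1 \le i \le k\}$ is a set of abstractions, where each abstraction is a pair consisting of an abstract state space and an abstraction mapping, where “abstract state space” and “abstraction mapping” are defined below.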
Note that these abstractions are not intended to form a hierarchy and should be considered a set of independent abstractions.
An abstract state space is a directed graph with two weights per edge, defined by a four-tuple $S_i = \langle T_i, \Pi_i, C_i, R_i \rangle$.
$T_i$ is the set of abstract states and $\Pi_i$ is the set of abstract edges, as in the definition of a state space. In an abstract space there are two costs associated with each $\pi_i \in \Pi_i$, the primary cost $C_i: \Pi_i \to \mathbb{N}$ and the residual cost $R_i: \Pi_i \to \mathbb{N}$. The idea of having two costs per abstract edge, instead of just one, is inspired by the practice, illustrated in Figure 3, of having two types of edges in the abstract space and counting distinguished moves differently than “don’t care” moves. In that example, our primary cost is the cost associated with the distinguished moves, and our residual cost is the cost associated with the “don’t care” moves. The usefulness of considering the cost of “don’t care” moves arises when the abstraction system is additive, as suggested by Lemmas 3.6 and 3.10 below. These indicate when the additive heuristic is infeasible and can be improved, the effectiveness of which will become apparent in the experiments reported in Section 6.
Like edges, each abstract path $p_i = \langle \pi_i^1, \dots, \pi_i^n \rangle$ in $S_i$ has a primary and residual cost: $C_i(p_i) = \sum_{j=1}^{n} C_i(\pi_i^j)$, and $R_i(p_i) = \sum_{j=1}^{n} R_i(\pi_i^j)$.
An abstraction mapping $\psi_i: S \to S_i$ between state space $S$ and abstract state space $S_i$ is defined by a mapping between the states of $S$ and the states of $S_i$, $\psi_i: T \to T_i$, that satisfies the two following conditions.
The first condition is that the mapping is a homomorphism and thus connectivity in the original space is preserved, i.e.,
$$\forall (u,v) \in \Pi,\ (\psi_i(u), \psi_i(v)) \in \Pi_i. \qquad (1)$$
In other words, for each edge in the original space $S$ there is a corresponding edge in the abstract space $S_i$. Note that if $u \ne v$ and $\psi_i(u) = \psi_i(v)$ then a non-identity edge in $S$ gets mapped to an identity edge (self-loop) in $S_i$. We use the shorthand notation $t_i = \psi_i(t)$ for the abstract state in $T_i$ corresponding to $t \in T$, and $\pi_i = \psi_i(\pi) = (\psi_i(u), \psi_i(v))$ for the abstract edge in $\Pi_i$ corresponding to $\pi = (u,v) \in \Pi$.
The second condition that the state mapping must satisfy is that abstract edges must not cost more than any of the edges they correspond to in the original state space, i.e.,
$$\forall \pi \in \Pi,\ C_i(\pi_i) + R_i(\pi_i) \le C(\pi). \qquad (2)$$
As a consequence, if multiple edges in the original space map to the same abstract edge $\rho \in \Pi_i$, as is usually the case, $C_i(\rho) + R_i(\rho)$ must be less than or equal to all of them, i.e.,
$$\forall \rho \in \Pi_i,\ C_i(\rho) + R_i(\rho) \le \min_{\pi \in \Pi,\ \psi_i(\pi) = \rho} C(\pi).$$
Note that if no edge maps to an edge in the abstract space, then no bound on the cost of that edge is imposed.
For example, the state mapping used to define the abstraction in the middle column of Figure 3 maps an 8-puzzle state to an abstract state by renaming tiles 2, 4, 6, and 8 to “don’t care”. This mapping satisfies condition (1) because “don’t care” tiles can be exchanged with the blank whenever regular tiles can. It satisfies condition (2) because each move is either a distinguished move ($C_i(\pi_i) = 1$ and $R_i(\pi_i) = 0$) or a “don’t care” move ($C_i(\pi_i) = 0$ and $R_i(\pi_i) = 1$) and in both cases $C_i(\pi_i) + R_i(\pi_i) = 1$, the cost of the edge $\pi$ in the original space.
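When both spaces are small enough to be stored explicitly, conditions (1) and (2) can be verified edge by edge. The following is a sketch under that assumption; the dictionary-based representation, the function name, and the toy graph are illustrative and not part of the original presentation.

```python
def check_abstraction(psi, cost, primary, residual):
    """Check conditions (1) and (2) for one abstraction on explicit graphs.

    psi:      dict mapping original states to abstract states
    cost:     dict mapping original edges (u, v) to their cost
    primary:  dict mapping abstract edges to their primary cost
    residual: dict mapping abstract edges to their residual cost
              (assumed to have the same keys as `primary`)
    """
    for (u, v), c in cost.items():
        abstract_edge = (psi[u], psi[v])
        if abstract_edge not in primary:
            return False          # condition (1): the abstract edge must exist
        if primary[abstract_edge] + residual[abstract_edge] > c:
            return False          # condition (2): abstract cost must not exceed c
    return True

# Toy example: states a and b collapse to X, so edge (a, b) becomes a self-loop.
cost = {("a", "b"): 1, ("b", "c"): 1}
psi = {"a": "X", "b": "X", "c": "Y"}
primary = {("X", "X"): 0, ("X", "Y"): 1}
residual = {("X", "X"): 1, ("X", "Y"): 0}
print(check_abstraction(psi, cost, primary, residual))
```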
The set of abstract states $T_i$ is usually equal to $\psi_i(T) = \{\psi_i(t) \mid t \in T\}$, but it can be a superset, in which case the abstraction is said to be non-surjective [Hernádvölgyi & Holte 2000]. Likewise, the set of abstract edges $\Pi_i$ is usually equal to $\psi_i(\Pi) = \{\psi_i(\pi) \mid \pi \in \Pi\}$ but it can be a superset even if $T_i = \psi_i(T)$. In some cases, one deliberately chooses an abstract space that has states or edges that have no counterpart in the original space. For example, the methods that define abstractions by dropping operator preconditions must, by their very design, create abstract spaces that have edges that do not correspond to any edge in the original space (e.g. [Pearl 1984]). In other cases, non-surjectivity is an inadvertent consequence of the abstract space being defined implicitly as the set of states reachable from the abstract goal state by applying operator inverses. For example, if a tile in the $3 \times 3$ sliding tile puzzle is mapped to the blank in the abstract space, the puzzle now has two blanks and states are reachable in the abstract space that have no counterpart in the original space [Hernádvölgyi & Holte 2000]. For additional examples and an extensive discussion of non-surjectivity see the previous paper [robistvan04].
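Lemma 3.1. For any path $p \in \mathit{Paths}(S,t,g)$ in $S$, there is a corresponding abstract path $\psi_i(p)$ from $t_i$ to $g_i$ in $S_i$ and $C_i(\psi_i(p)) + R_i(\psi_i(p)) \le C(p)$.
Proof: By definition, $p \in \mathit{Paths}(S,t,g)$ in $S$ is a sequence of edges $\langle \pi^1, \dots, \pi^n \rangle$, $\pi^j \in \Pi$, where $\pi^j = (t^{j-1}, t^j)$, $j \in \{1, \dots, n\}$, and $t^0 = t$, $t^n = g$. Because $\Pi_i \supseteq \psi_i(\Pi)$, each of the corresponding abstract edges exists ($\pi_i^j \in \Pi_i$). Because $\pi_i^1 = (t_i, t_i^1)$ and $\pi_i^n = (t_i^{n-1}, g_i)$, the sequence $\psi_i(p) = \langle \pi_i^1, \dots, \pi_i^n \rangle$ is a path from $t_i$ to $g_i$.
By definition, $C(p) = \sum_{j=1}^{n} C(\pi^j)$. For each $\pi^j$, Condition (2) ensures that $C(\pi^j) \ge C_i(\pi_i^j) + R_i(\pi_i^j)$, and therefore $C(p) \ge \sum_{j=1}^{n} \left( C_i(\pi_i^j) + R_i(\pi_i^j) \right) = C_i(\psi_i(p)) + R_i(\psi_i(p))$.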
For example, consider state $A$ and goal $g$ in Figure 3. Because of condition (1), any path from state $A$ to $g$ in the original space is also a path from abstract state $A_1$ to abstract goal state $g_1$ and from abstract state $A_2$ to $g_2$ in the abstract spaces. Because of condition (2), the cost of the path in the original space is greater than or equal to the sum of the primary cost and the residual cost of the corresponding abstract path in each abstract space.
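We use $\mathit{Paths}(S_i, u, v)$ to mean the set of all paths from $u$ to $v$ in space $S_i$.
The optimal abstract cost from abstract state $u$ to abstract state $v$ in $S_i$ is defined as
$$\mathit{OPT}_i(u,v) = \min_{q \in \mathit{Paths}(S_i,u,v)} \left( C_i(q) + R_i(q) \right).$$
We define the heuristic obtained from abstract space $S_i$ for the cost from state $t$ to $g$ as
$$h_i(t,g) = \mathit{OPT}_i(t_i, g_i).$$
Note that in these definitions, the path minimizing the cost is not required to be the image, $\psi_i(p)$, of a path $p$ in $S$.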
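Lemma 3.2. $h_i(t,g) \le \mathit{OPT}(t,g)$ for all $t, g \in T$ and all $i \in \{1, \dots, k\}$.
Proof: By Lemma 3.1, $C(p) \ge C_i(\psi_i(p)) + R_i(\psi_i(p))$, and therefore
$$\min_{p \in \mathit{Paths}(S,t,g)} C(p) \ \ge \min_{p \in \mathit{Paths}(S,t,g)} \left( C_i(\psi_i(p)) + R_i(\psi_i(p)) \right).$$
The left hand side of this inequality is $\mathit{OPT}(t,g)$ by definition, and the right hand side is proved in the following Claim 3.2.1 to be greater than or equal to $h_i(t,g)$. Therefore, $\mathit{OPT}(t,g) \ge h_i(t,g)$.
Claim 3.2.1. $\min_{p \in \mathit{Paths}(S,t,g)} \left( C_i(\psi_i(p)) + R_i(\psi_i(p)) \right) \ge h_i(t,g)$ for all $t, g \in T$. The claim holds because, by Lemma 3.1, every image path $\psi_i(p)$ belongs to $\mathit{Paths}(S_i, t_i, g_i)$, so a minimum taken over the image paths cannot be smaller than the minimum over all abstract paths, which is $\mathit{OPT}_i(t_i, g_i) = h_i(t,g)$.
Lemma 3.3. $h_i(t,g) \le \mathit{OPT}(t,u) + h_i(u,g)$ for all $t, u, g \in T$ and all $i \in \{1, \dots, k\}$.
Proof: By the definition of $\mathit{OPT}_i(t_i,g_i)$ as a minimization and the definition of $h_i(u,g)$, it follows that $h_i(t,g) = \mathit{OPT}_i(t_i,g_i) \le \mathit{OPT}_i(t_i,u_i) + \mathit{OPT}_i(u_i,g_i) = \mathit{OPT}_i(t_i,u_i) + h_i(u,g)$.
To complete the proof, we observe that by Lemma 3.2, $\mathit{OPT}(t,u) \ge \mathit{OPT}_i(t_i,u_i)$.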
The heuristic $h_{max}$ from state $t$ to state $g$ defined by an abstraction system $\langle S, \mathcal{X} \rangle$ is
$$h_{max}(t,g) = \max_{i=1}^{k} h_i(t,g).$$
3.3 Additive Abstractions
In this section, we formalize the notion of “additive abstraction” that was introduced intuitively in Section 2.1. The example there showed that $h_{add}(t,g)$, the sum of the heuristics for state $t$ defined by multiple abstractions, was admissible provided the cost functions in the abstract spaces only counted the “distinguished moves”. In our formal framework, the “cost of distinguished moves” is captured by the notion of primary cost.
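For any pair of states $t, g \in T$ the additive heuristic given an abstraction system is defined to be
$$h_{add}(t,g) = \sum_{i=1}^{k} C_i^*(t_i, g_i),$$
where
$$C_i^*(t_i, g_i) = \min_{q \in \mathit{Paths}(S_i,t_i,g_i)} C_i(q)$$
is the minimum primary cost of a path in the abstract space from $t_i$ to $g_i$.
In Figure 3, for example, $C_1^*(A_1,g_1) = 9$ and $C_2^*(A_2,g_2) = 5$ because the minimum number of distinguished moves to reach $g_1$ from $A_1$ is 9 and the minimum number of distinguished moves to reach $g_2$ from $A_2$ is 5.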
Intuitively, $h_{add}$ will be admissible if the cost of edge $\pi$ in the original space is divided among the abstract edges $\pi_i$ that correspond to $\pi$, as is done by the “cost-splitting” and “location-based” methods for defining abstract costs that were introduced at the end of Section 2.1. This leads to the following formal definition.
An abstraction system $\langle S, \mathcal{X} \rangle$ is additive if $\forall \pi \in \Pi,\ \sum_{i=1}^{k} C_i(\pi_i) \le C(\pi)$.
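For explicitly stored spaces, the additivity condition (the primary costs of the abstract edges corresponding to an original edge sum to at most that edge's cost) can be checked directly. This sketch assumes each abstraction is given as a pair of dictionaries; the representation, names, and toy example are assumptions for illustration.

```python
def is_additive(cost, abstractions):
    """Check the additivity condition on explicit graphs.

    cost:         dict mapping original edges (u, v) to their cost
    abstractions: list of (psi, primary) pairs, where psi maps original
                  states to abstract states and primary maps abstract
                  edges to their primary cost
    """
    return all(
        sum(primary[(psi[u], psi[v])] for psi, primary in abstractions) <= c
        for (u, v), c in cost.items()
    )

# Toy example: one original edge of cost 1 seen through two abstractions.
cost = {("a", "b"): 1}
first = ({"a": 0, "b": 1}, {(0, 1): 1})    # charges the full cost
second = ({"a": 0, "b": 0}, {(0, 0): 0})   # charges nothing
print(is_additive(cost, [first, second]))
```

If the second abstraction also charged 1 for its edge, the sum would be 2 and the system would no longer be additive.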
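Lemma 3.4. If $\langle S, \mathcal{X} \rangle$ is additive then $h_{add}(t,g) \le \mathit{OPT}(t,g)$ for all $t, g \in T$.
Proof: Assume that $\mathit{OPT}(t,g) = C(p)$, where $p = \langle \pi^1, \dots, \pi^n \rangle \in \mathit{Paths}(S,t,g)$. Therefore, $\mathit{OPT}(t,g) = \sum_{j=1}^{n} C(\pi^j)$. Since $\langle S, \mathcal{X} \rangle$ is additive, it follows by definition that
$$\sum_{j=1}^{n} C(\pi^j) \ \ge\ \sum_{j=1}^{n} \sum_{i=1}^{k} C_i(\pi_i^j) \ =\ \sum_{i=1}^{k} \sum_{j=1}^{n} C_i(\pi_i^j) \ \ge\ \sum_{i=1}^{k} C_i^*(t_i,g_i) \ =\ h_{add}(t,g),$$
where the last line follows from the definitions of $C_i^*$ and $h_{add}$.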
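Lemma 3.5. If $\langle S, \mathcal{X} \rangle$ is additive then $h_{add}(t,g) \le \mathit{OPT}(t,u) + h_{add}(u,g)$ for all $t, u, g \in T$.
Proof: $C_i^*(t_i,g_i)$ obeys the triangle inequality: $C_i^*(t_i,g_i) \le C_i^*(t_i,u_i) + C_i^*(u_i,g_i)$ for all $t, u, g \in T$. It follows that $\sum_{i=1}^{k} C_i^*(t_i,g_i) \le \sum_{i=1}^{k} C_i^*(t_i,u_i) + \sum_{i=1}^{k} C_i^*(u_i,g_i)$.
Because $\sum_{i=1}^{k} C_i^*(t_i,g_i) = h_{add}(t,g)$ and $\sum_{i=1}^{k} C_i^*(u_i,g_i) = h_{add}(u,g)$, it follows that $h_{add}(t,g) \le \sum_{i=1}^{k} C_i^*(t_i,u_i) + h_{add}(u,g)$.
Since $\langle S, \mathcal{X} \rangle$ is additive, by Lemma 3.4, $\sum_{i=1}^{k} C_i^*(t_i,u_i) \le \mathit{OPT}(t,u)$.
Hence $h_{add}(t,g) \le \mathit{OPT}(t,u) + h_{add}(u,g)$ for all $t, u, g \in T$.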
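We now develop a simple test that has important consequences for additive heuristics. Define $P_i^*(t_i,g_i) = \{q \mid q \in \mathit{Paths}(S_i,t_i,g_i)$ and $C_i(q) = C_i^*(t_i,g_i)\}$, the set of abstract paths from $t_i$ to $g_i$ whose primary cost is minimal.
The conditional optimal residual cost is the minimum residual cost among the paths in $P_i^*(t_i,g_i)$:
$$R_i^*(t_i,g_i) = \min_{q \in P_i^*(t_i,g_i)} R_i(q).$$
Note that the value of ($C_i^*(t_i,g_i) + R_i^*(t_i,g_i)$) is sometimes, but not always, equal to the optimal abstract cost $\mathit{OPT}_i(t_i,g_i)$. In Figure 3, for example, $\mathit{OPT}_1(A_1,g_1) = 16$ (a path with this cost is shown in Figure 2) and $C_1^*(A_1,g_1) + R_1^*(A_1,g_1) = 18$, while $C_2^*(A_2,g_2) + R_2^*(A_2,g_2) = \mathit{OPT}_2(A_2,g_2) = 12$. As the following lemmas show, it is possible to draw important conclusions about $h_{add}$ by comparing its value to ($C_i^*(t_i,g_i) + R_i^*(t_i,g_i)$).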
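Lemma 3.6. Let $\langle S, \mathcal{X} \rangle$ be any additive abstraction system and let $t, g \in T$ be any states. If $h_{add}(t,g) \ge C_j^*(t_j,g_j) + R_j^*(t_j,g_j)$ for all $j \in \{1, \dots, k\}$, then $h_{add}(t,g) \ge h_{max}(t,g)$.
Proof: By the definition of $\mathit{OPT}_j(t_j,g_j)$, $C_j^*(t_j,g_j) + R_j^*(t_j,g_j) \ge \mathit{OPT}_j(t_j,g_j) = h_j(t,g)$ for all $j \in \{1, \dots, k\}$. Therefore, $h_{add}(t,g) \ge \max_{1 \le j \le k} h_j(t,g) = h_{max}(t,g)$.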
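Lemma 3.7. For an additive $\langle S, \mathcal{X} \rangle$ and path $p \in \mathit{Paths}(S,t,g)$ with $C(p) = \sum_{i=1}^{k} C_i^*(t_i,g_i)$, $C_j(\psi_j(p)) = C_j^*(t_j,g_j)$ for all $j \in \{1, \dots, k\}$.
Proof: Suppose for a contradiction that there exists some $i_1$, such that $C_{i_1}(\psi_{i_1}(p)) > C_{i_1}^*(t_{i_1},g_{i_1})$. Then because $C(p) = \sum_{i=1}^{k} C_i^*(t_i,g_i)$, there must exist some $i_2$, such that $C_{i_2}(\psi_{i_2}(p)) < C_{i_2}^*(t_{i_2},g_{i_2})$, which contradicts the definition of $C_{i_2}^*$. Therefore, such an $i_1$ does not exist and $C_j(\psi_j(p)) = C_j^*(t_j,g_j)$ for all $j \in \{1, \dots, k\}$.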
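Lemma 3.8. For an additive $\langle S, \mathcal{X} \rangle$ and a path $p \in \mathit{Paths}(S,t,g)$ with $C(p) = \sum_{i=1}^{k} C_i^*(t_i,g_i)$, $R_j(\psi_j(p)) \ge R_j^*(t_j,g_j)$ for all $j \in \{1, \dots, k\}$.
Proof: Following Lemma 3.7 and the definition of $P_j^*(t_j,g_j)$, $\psi_j(p) \in P_j^*(t_j,g_j)$ for all $j \in \{1, \dots, k\}$. Because $R_j^*(t_j,g_j)$ is the smallest residual cost of paths in $P_j^*(t_j,g_j)$, it follows that $R_j(\psi_j(p)) \ge R_j^*(t_j,g_j)$.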
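Lemma 3.9. For an additive $\langle S, \mathcal{X} \rangle$ and a path $p \in \mathit{Paths}(S,t,g)$ with $C(p) = \sum_{i=1}^{k} C_i^*(t_i,g_i)$, $\sum_{i=1}^{k} C_i^*(t_i,g_i) \ge C_j^*(t_j,g_j) + R_j^*(t_j,g_j)$ for all $j \in \{1, \dots, k\}$.
Proof: By Lemma 3.1, $C(p) \ge C_j(\psi_j(p)) + R_j(\psi_j(p))$, and by Lemmas 3.7 and 3.8 the right hand side is at least $C_j^*(t_j,g_j) + R_j^*(t_j,g_j)$.
Lemma 3.10. Let $\langle S, \mathcal{X} \rangle$ be any additive abstraction system and let $t, g \in T$ be any states. If $h_{add}(t,g) < C_j^*(t_j,g_j) + R_j^*(t_j,g_j)$ for some $j \in \{1, \dots, k\}$, then $h_{add}(t,g) \ne \mathit{OPT}(t,g)$.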
Proof: This lemma follows directly as the contrapositive of Lemma 3.9.
Lemma 3.6 gives a condition under which $h_{add}$ is guaranteed to be at least as large as $h_{max}$ for specific states $t$ and $g$. If this condition holds for a large fraction of the state space $T$, one would expect search using $h_{add}$ to be at least as fast as, and possibly faster than, search using $h_{max}$. This will be seen in the experiments reported in Section 4. The opposite is not true in general, i.e., failing this condition does not imply that $h_{max}$ will result in faster search than $h_{add}$. However, as Lemma 3.10 shows, there is an interesting consequence when this condition fails for state $t$: we know that the value returned by $h_{add}$ for $t$ is not the true cost to reach the goal from $t$. Detecting this is useful because it allows the heuristic value to be increased without risking it becoming inadmissible. Section 6 explores this in detail.
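The test suggested by Lemma 3.10 can be sketched for explicitly stored abstract spaces. The code below finds, for each abstraction, the minimum primary cost and then the minimum residual cost among primary-optimal paths by running Dijkstra's algorithm with lexicographically ordered (primary, residual) pairs, and reports whether the additive value falls below primary plus residual for some abstraction. The function names, the edge representation, and the toy spaces are assumptions; the abstraction system is assumed to be additive and every abstract goal reachable.

```python
import heapq

def primary_then_residual(edges, src, dst):
    """Return (minimum primary cost, minimum residual cost among the
    primary-optimal paths) from src to dst, or None if dst is unreachable.

    edges: dict mapping abstract edges (u, v) to (primary, residual) pairs.
    """
    adj = {}
    for (u, v), weights in edges.items():
        adj.setdefault(u, []).append((v, weights))
    best = {src: (0, 0)}
    heap = [((0, 0), src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > best[u]:
            continue                      # stale queue entry
        for v, (c, r) in adj.get(u, []):
            nd = (d[0] + c, d[1] + r)     # tuples compare lexicographically
            if v not in best or nd < best[v]:
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return None

def is_infeasible(spaces, starts, goals):
    """True if the additive value is smaller than primary + residual
    for at least one abstraction."""
    pairs = [primary_then_residual(e, s, g)
             for e, s, g in zip(spaces, starts, goals)]
    h_add = sum(c for c, _ in pairs)
    return any(h_add < c + r for c, r in pairs)

# Two toy abstract spaces (hypothetical, not from the paper's figures).
space1 = {("a", "b"): (1, 5), ("b", "g"): (0, 0), ("a", "g"): (2, 0)}
space2 = {("x", "y"): (0, 1)}
print(is_infeasible([space1, space2], ["a", "x"], ["g", "y"]))
```

In the toy example the additive value is 1, while the first abstraction needs primary plus residual cost 6, so the additive value cannot be the true cost and may be increased safely.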
3.4 Relation to Previous Work
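The aim of the preceding formal definitions is to identify fundamental properties that guarantee that abstractions will give rise to admissible, consistent heuristics. We have shown that the following two conditions guarantee that the heuristic $h_i(t,g)$ defined by an abstraction is admissible and consistent
$$(P1)\quad \forall (u,v) \in \Pi,\ (\psi_i(u), \psi_i(v)) \in \Pi_i$$
$$(P2)\quad \forall \pi \in \Pi,\ C_i(\pi_i) + R_i(\pi_i) \le C(\pi)$$
and that a third condition
$$(P3)\quad \forall \pi \in \Pi,\ \sum_{i=1}^{k} C_i(\pi_i) \le C(\pi)$$
guarantees that $h_{add}(t,g)$ is admissible and consistent.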
Previous work has focused on defining abstraction and additivity for specific ways of representing states and transition functions. These are important contributions because ultimately one needs computationally effective ways of defining the abstract state spaces, abstraction mappings, and cost functions that our theory takes as given. The importance of our contribution is that it should make future proofs of admissibility, consistency, and additivity easier, because one will only need to show that a particular method for defining abstractions satisfies the three preceding conditions. These are generally very simple conditions to demonstrate, as we will now do for several methods for defining abstractions and additivity that currently exist in the literature.
3.4.1 Previous Definitions of Abstraction
The use of abstraction to create heuristics began in the late 1970s and was popularized in Pearl’s landmark book on heuristics [Pearl 1984]. Two abstraction methods were identified at that time: “relaxing” a state space definition by dropping operator preconditions [Gaschnig 1979; Guida & Somalvico 1979; Pearl 1984; Valtorta 1984], and “homomorphic” abstractions [Banerji 1980; Kibler 1982]. These early notions of abstraction were unified and extended by [mostow89] and [machinediscovery], producing a formal definition that is the same as ours in all important respects except for the concept of “residual cost” that we have introduced. (Footnote 1: Prieditis’s definition allows an abstraction to expand the set of goals. This can be achieved in our definition by mapping non-goal states in the original space to the same abstract state as the goal.)
Today’s two most commonly used abstraction methods are among the ones implemented in Prieditis’s Absolver II system [Prieditis 1993]. The first is “domain abstraction”, which was independently introduced in the seminal work on pattern databases [Culberson & Schaeffer 1994; Culberson & Schaeffer 1998] and then generalized [Hernádvölgyi & Holte 2000]. It assumes a state is represented by a set of state variables, each of which has a set of possible values called its domain. An abstraction on states is defined by specifying a mapping from the original domains to new, smaller domains. For example, an 8-puzzle state is typically represented by 9 variables, one for each location in the puzzle, each with the same domain of 9 elements, one for each tile and one more for the blank. A domain abstraction that maps all the elements representing the tiles to the same new element (“don’t care”) and the blank to a different element would produce the abstract space shown in Figure 1. The reason this particular example satisfies property (P1) is explained in Section 3.2. In general, a domain abstraction will satisfy property (P1) as long as the conditions that define when state transitions occur (e.g. operator preconditions) are guaranteed to be satisfied by the “don’t care” symbol whenever they are satisfied by one or more of the domain elements that map to “don’t care”. Property (P2) follows immediately from the fact that all state transitions in the original and abstract spaces have a primary cost of 1.
The other major type of abstraction used today, called “drop” by [machinediscovery], was independently introduced for abstracting planning domains represented by grounded (or propositional) STRIPS operators [Edelkamp 2001]. In a STRIPS representation, a state is represented by the set of logical atoms that are true in that state, and the directed edges between states are represented by a set of operators, where each operator $a$ is described by three sets of atoms, $P_a$, $A_a$, and $D_a$. $P_a$ lists $a$’s preconditions: $a$ can be applied to state $t$ only if all the atoms in $P_a$ are true in $t$ (i.e., $P_a \subseteq t$). $A_a$ and $D_a$ specify the effects of operator $a$, with $A_a$ listing the atoms that become true when $a$ is applied (the “add” list) and $D_a$ listing the atoms that become false when $a$ is applied (the “delete” list). Hence if operator $a$ is applicable to state $t$, the state it produces when applied to $t$ is the set of atoms $(t - D_a) \cup A_a$.
In this setting, Edelkamp defined an abstraction of a given state space by specifying a subset of the atoms and restricting the abstract state descriptions and operator definitions to include only atoms in the subset. Suppose $V_i$ is the subset of the atoms underlying abstraction mapping $\psi_i: S \to S_i$, where $S$ is the original state space and $S_i$ is the abstract state space based on $V_i$. Two states in $S$ will be mapped to the same abstract state if and only if they contain the same subset of atoms in $V_i$, i.e., $\psi_i(t) = \psi_i(u)$ iff $t \cap V_i = u \cap V_i$. This satisfies property (P1) because operator $a$ being applicable to state $t$ ($P_a \subseteq t$) implies abstract operator $a_i = \psi_i(a)$ is applicable to abstract state $t_i$ ($P_a \cap V_i \subseteq t \cap V_i$) and the resulting state $a(t) = (t - D_a) \cup A_a$ is mapped by $\psi_i$ to $a_i(t_i)$ because set intersection distributes across set subtraction and union ($V_i \cap ((t - D_a) \cup A_a) = ((V_i \cap t) - (V_i \cap D_a)) \cup (V_i \cap A_a)$). Again, property (P2) follows immediately from the fact that all operators in the original and abstract spaces have a primary cost of 1.
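The argument for property (P1) can be illustrated with Python sets. This is a small sketch with made-up atoms and a made-up operator: it applies a STRIPS operator in the original space, applies the projected operator in the abstract space, and compares the two results. The function names and all atoms are hypothetical.

```python
def apply_operator(state, pre, add, delete):
    """Apply a STRIPS operator to a state (a frozenset of atoms).

    Returns None if the preconditions are not all true in the state.
    """
    if not pre <= state:
        return None
    return (state - delete) | add

def project(atoms, kept):
    """Restrict a set of atoms to the subset `kept` defining the abstraction."""
    return frozenset(atoms) & kept

kept = frozenset({"q", "s"})                       # atoms kept by the abstraction
state = frozenset({"p", "q", "r"})
pre, add, delete = frozenset({"p"}), frozenset({"s"}), frozenset({"q"})

concrete_result = apply_operator(state, pre, add, delete)
abstract_result = apply_operator(project(state, kept), project(pre, kept),
                                 project(add, kept), project(delete, kept))

# Projecting the concrete successor gives the abstract successor.
print(project(concrete_result, kept) == abstract_result)
```

The equality holds for any choice of sets because intersection distributes over set difference and union, which is exactly the identity used in the text.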
Recently, Helmert et al. [ICAPS2007Helmert] described a more general approach to defining abstractions for planning based on “transition graph abstractions”. A transition graph is a directed graph in which the arcs have labels, and a transition graph abstraction is a directed graph homomorphism that preserves the labels. (Footnote 2: “Homomorphism” here means the standard definition of a digraph homomorphism (Definition 3.2), which permits non-surjectivity (as discussed in Section 3.2), as opposed to Helmert et al.’s definition of “homomorphism”, which does not allow non-surjectivity.) Hence, Helmert et al.’s method is a restricted version of our definition of abstraction and therefore satisfies properties (P1) and (P2). Helmert et al. make the following interesting observations that are true of our more general definition of abstractions:
the composition of two abstractions is an abstraction. In other words, if is an abstraction of