Planning with Partial Preference Models

01/12/2011 ∙ by Tuan Nguyen, et al. ∙ Arizona State University, PARC, IBM, Free University of Bozen-Bolzano, University of Brescia

Current work in planning with preferences assumes that the user's preference models are completely specified and aims to search for a single solution plan. In many real-world planning scenarios, however, the user probably cannot provide any information about her desired plans, or in some cases can only express partial preferences. In such situations, the planner has to present not just one plan but a set of plans to the user, with the hope that some of them are similar to the plan she prefers. We first propose the use of different measures to capture the quality of plan sets that are suitable for such scenarios: domain-independent distance measures defined based on plan elements (actions, states, causal links) if no knowledge of the user's preferences is given, and the Integrated Convex Preference measure in case the user's partial preference is provided. We then investigate various heuristic approaches to find sets of plans according to these measures, and present empirical results demonstrating the promise of our approach.


1 Introduction

Most work in automated planning takes as input a complete specification of domain models and/or user preferences, and the planner searches for a single solution satisfying the goals, possibly optimizing some objective function. In many real-world planning scenarios, however, the user’s preferences on desired plans are either unknown or at best partially specified (cf. Kambhampati (2007)). In such cases, the planner’s job changes from finding a single optimal plan to finding a set of representative solutions (“options”) and presenting them to the user with the hope that she will find one of them desirable. As an example, in adaptive web services composition, the causal dependencies among some web services might change at execution time, and as a result the web service engine wants to have a set of diverse plans/compositions such that if there is a failure while executing one composition, an alternative may be used which is less likely to fail simultaneously (Chafle et al., 2006). However, if a user is helping to select the compositions, the planner could first be asked for a set of plans that takes into account the user’s trust in some particular sources, and when she selects one of them, it is next asked to find plans that are similar to the selected one. The requirement of searching for a set of plans is also considered in intrusion detection (Boddy et al., 2005), where a security analyst needs to analyze a set of attack plans that might be attempted by a potential adversary, given limited (or unknown) information about the adversary’s model (e.g., his goals, capabilities, habits, …); the resulting analysis can then be used to set up defensive strategies against potential future attacks. Another example can be found in Memon et al. (2001), in which test cases for graphical user interfaces (GUIs) are generated as a set of distinct plans, each corresponding to a sequence of actions that a user could perform, given the user’s unknown preferences on how to interact with the GUI to achieve her goals. The capability of synthesizing multiple plans would also have potential application in case-based planning (e.g., Serina (2010)), where it is important to have a plan set satisfying a case instance. These plans can differ in terms of criteria such as resources, makespan and cost that can only be specified in the retrieval phase. In the problem of travel planning for individuals of a city in a distributed manner while also optimizing public resources (e.g., roads, traffic police personnel), the availability of a number of plans for each person’s goals could make the plan merging phase easier and reduce the conflicts among individual plans.

In this work, we investigate the problem of generating a set of plans in order to deal with planning situations where the preference model is not completely specified. In particular, we consider the following scenarios:

  • Even though the planner is aware that the user has some preferences on solution plans, it is not provided with any of that knowledge.

  • The planner is provided with incomplete knowledge of the user’s preferences. In particular, the user is interested in some plan attributes (such as the duration and cost of a flight, or whether all packages with priority are delivered on time in a logistics domain), each with a different but unknown degree of importance (represented by weight or trade-off values). Normally, it is quite hard for a user to indicate the exact trade-off values; she is more likely to state that one attribute is more (or less) important than some others. For instance, a businessman would consider the duration of a flight much more important than its cost. This kind of incomplete preference specification can be modeled with a probability distribution over weight values, and is therefore assumed to be given as an input (together with the attributes) to the planner. [Footnote 2: Even if we do not have any special knowledge about this probability distribution, we can always start by initializing it to be uniform, and gradually improve it based on interaction with the user.]

Even though, in principle, the user would have a better chance of finding her desired plan in a larger plan set, there are two problems to consider: one computational, the other one of comprehension. The computational problem is that synthesis of a single plan is often quite costly already, and therefore it is even more challenging to search for a large plan set. As for the second problem, it is unclear whether the user will be able to inspect a large set of plans to identify the plan she prefers. What is clearly needed, therefore, is the ability to generate a set of plans, among all sets with a bounded (small) number of plans, with the highest chance of including the user’s preferred plan. An immediate challenge is formalizing what makes a set of plans meaningful, in other words what the quality measure of plan sets should be given an incomplete preference specification.

We propose different quality measures for the two scenarios listed above. In the extreme case when the user cannot provide any knowledge of her preferences, we define a spectrum of distance measures between two plans based on their syntactic features in order to define the diversity measure of plan sets. These measures can be used regardless of the user’s preference, and by maximizing the diversity of a plan set we increase the chance that the set is uniformly distributed in the unknown preference space, and therefore likely contains a plan that is close to the user’s desired one.

This measure can be further refined when some knowledge of the user’s preferences is provided. As mentioned above, we assume that the user’s preference is specified by a convex combination of plan attributes, and incomplete in the sense that the distribution of trade-off weights is given, not their exact values. The whole set of best plans (i.e. the ones with the best value function) can be pictured as the lower convex-hull of the Pareto set on the attribute space. To measure the quality of any (bounded) set of plans on the whole optimal set, we adapt the idea of Integrated Preference Function (IPF) (Carlyle et al., 2003), in particular its special case Integrated Convex Preference (ICP). This measure was developed in the Operations Research (OR) community in the context of multi-criteria scheduling, and is able to associate a robust measure of representativeness for any set of solution schedules (Fowler et al., 2005).

Armed with these quality measures, we can then formulate the problem of planning with partial preference models as finding a bounded set of plans that has the best quality value. Our next contribution therefore is to investigate effective approaches for using quality measures to bias a planner’s search to find a high quality plan set efficiently. For the first scenario, when the preference specification is not provided, two representative state-of-the-art planning approaches are considered. The first, gp-csp (Do and Kambhampati, 2001), typifies the issues involved in generating diverse plans in bounded horizon compilation approaches, while the second, lpg (Gerevini et al., 2003), typifies the issues involved in modifying heuristic search planners. Our investigations with gp-csp allow us to compare the relative difficulties of enforcing diversity with each of the three different distance measures (elaborated in a later section). With lpg, we find that the proposed quality measure makes it more effective in generating plan sets over large problem instances. For the second case, when part of the user’s preferences is provided, we also present a spectrum of approaches for solving this problem efficiently. We implement these approaches on top of Metric-LPG (Gerevini et al., 2008). Our empirical evaluation compares these approaches both among themselves and against the methods for generating diverse plans that ignore the partial preference information, and the results demonstrate the promise of our proposed solutions.

Figure 1: An overview picture of planning with respect to knowledge of user’s preferences.

Our work can be considered a complement to current research in planning with preferences, as shown in Figure 1. From the perspective of planning with preferences, most current work in planning synthesizes a single solution plan, or a single best one, in situations where the user has no preferences, or where complete knowledge of the preferences is given to the planner. We, on the other hand, address the problem of synthesizing a set of plans when knowledge of the user’s preferences is either completely unknown or partially specified.

The paper is organized as follows. Section 2 gives fundamental concepts in preferences, and formal notations. In Section 3, we formalize quality measures of plan set in the two scenarios. Sections 4 and 5 discuss our various heuristic approaches to generate plan sets, together with the experimental results. We discuss related work in Section 6, future work and conclusion in Section 7.

2 Background and Notation

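Given a planning problem with the set of solution plans $\mathcal{S}$, a user preference model is a transitive, reflexive relation $\preceq$ in $\mathcal{S} \times \mathcal{S}$, which defines an ordering between two plans $p$ and $p'$ in $\mathcal{S}$. Intuitively, $p \preceq p'$ means that the user prefers $p$ at least as much as $p'$. Note that this ordering can be either partial (i.e. it is possible that neither $p \preceq p'$ nor $p' \preceq p$ holds; in other words, they are incomparable), or total (i.e. either $p \preceq p'$ or $p' \preceq p$ holds). A plan $p$ is considered more preferred than a plan $p'$, denoted by $p \prec p'$, if $p \preceq p'$ and $p' \not\preceq p$, and they are equally preferred if $p \preceq p'$ and $p' \preceq p$. A plan $p$ is an optimal (i.e., most preferred) plan if $p \preceq p'$ for any other plan $p'$. A plan set $\mathcal{P} \subseteq \mathcal{S}$ is considered more preferred than $\mathcal{P}' \subseteq \mathcal{S}$, denoted by $\mathcal{P} \ll \mathcal{P}'$, if $p \prec p'$ for any $p \in \mathcal{P}$ and $p' \in \mathcal{P}'$, and they are incomparable if there exist $p \in \mathcal{P}$ and $p' \in \mathcal{P}'$ such that $p$ and $p'$ are incomparable.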

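The ordering $\preceq$ implies a partition of $\mathcal{S}$ into disjoint plan sets (or classes) $\mathcal{S}_1$, $\mathcal{S}_2$, … ($\mathcal{S}_1 \cup \mathcal{S}_2 \cup \ldots = \mathcal{S}$, $\mathcal{S}_i \cap \mathcal{S}_j = \emptyset$) such that plans in the same set are equally preferred, and for any sets $\mathcal{S}_i$, $\mathcal{S}_j$, either $\mathcal{S}_i \ll \mathcal{S}_j$, $\mathcal{S}_j \ll \mathcal{S}_i$, or they are incomparable. The partial ordering between these sets can be represented as a Hasse diagram (Birkhoff, 1948) where the sets are vertices, and there is an (upward) edge from $\mathcal{S}_i$ to $\mathcal{S}_j$ if $\mathcal{S}_i \ll \mathcal{S}_j$ and there is no $\mathcal{S}_k$ in the partition such that $\mathcal{S}_i \ll \mathcal{S}_k \ll \mathcal{S}_j$. We denote by $l(\mathcal{S}_i)$ the “layer” of the set $\mathcal{S}_i$ in the diagram, assuming that the most preferred sets are placed at layer 0, and $l(\mathcal{S}_j) = l(\mathcal{S}_i) + 1$ if there is an edge from $\mathcal{S}_i$ to $\mathcal{S}_j$. A plan in a set at a layer with a smaller value is, in general, either more preferred than or incomparable with ones at higher-value layers. [Footnote 3: If $\preceq$ is a total ordering, then plans at a smaller layer are more preferred than ones at a higher layer.] Figure 2 shows examples of Hasse diagrams representing a total and a partial preference ordering between plans.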

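Figure 2: The Hasse diagrams and layers of plan sets implied by two preference models. In (a), $\mathcal{S}_1 \ll \mathcal{S}_2 \ll \mathcal{S}_3$, and any two plans are comparable. In (b), on the other hand, $\mathcal{S}_1 \ll \mathcal{S}_2 \ll \mathcal{S}_4$, $\mathcal{S}_1 \ll \mathcal{S}_3$, and each plan in $\mathcal{S}_3$ is incomparable with plans in $\mathcal{S}_2$ and $\mathcal{S}_4$.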

When the preference model is explicitly specified, answering queries such as comparing two plans, finding a most preferred (optimal) plan becomes an easy task. This is possible, however, only if the set of plans is small and known upfront. Many preference languages, therefore, have been proposed to represent the relation in a more compact way, and serve as starting points for algorithms to answer queries. Most preference languages fall into the following two categories:

  • Quantitative languages define a value function $V: \mathcal{S} \rightarrow \mathbb{R}$ which assigns a real number to each plan, with a precise interpretation that $p \preceq p' \iff V(p) \leq V(p')$. Although this function is defined differently in many languages, at a high level it combines the user’s preferences on various aspects of a plan that can be measured quantitatively. For instance, in the context of decision-theoretic planning (Boutilier et al., 1999), the value function of a policy is defined as the expected reward of the states that are visited when the policy executes. In partial satisfaction (over-subscription) planning (PSP) (Smith, 2004; Van Den Briel et al., 2004), the quality of a plan is defined as the total reward of the soft goals it achieves minus its total action cost. In PDDL2.1 (Fox and Long, 2003), the value function is an arithmetic function of numerical fluents such as plan makespan, fuel used etc., and in PDDL3 (Gerevini et al., 2009) it is enhanced with individual preference specifications defined as formulae over the state trajectory using linear temporal logic (LTL) (Pnueli, 1977).

  • Qualitative languages provide qualitative statements that are more intuitive for lay users to specify. A commonly used language of this type is CP-networks (Boutilier et al., 2004), where the user can specify her preference statements on values of plan attributes, possibly given specification of others (for instance, “Among tickets with the same prices, I prefer airline A to airline B.”). Another example is LPP (Bienvenu et al., 2006) in which the statements can be specified using LTL formulae, and possibly being aggregated in different ways.

Figure 3 shows the conceptual relation of preference models, languages and algorithms. We refer the reader to the work by Brafman and Domshlak (2009) for a more detailed discussion on this metamodel, and by Baier and McIlraith (2009) for an overview of different preference languages used in planning with preferences.

Figure 3: The metamodel (Brafman and Domshlak, 2009).

From the modeling point of view, in order to design a suitable language capturing the user’s preference model, the modeler should be provided with some knowledge of the user’s interest that affects the way she evaluates plans (for instance, flight duration and ticket cost in a travel planning scenario). Such knowledge in many cases, however, cannot be completely specified. Our purpose therefore is to present a bounded set of plans to the user in the hope that it will increase the chance that she can find a desired plan. In the next section, we formalize the quality measures for plan sets in two situations where either no knowledge of the user’s preferences or only part of them is given.

3 Quality Measures for Plan Sets

3.1 Syntactic Distance Measures for Unknown Preference Cases

We first consider the situation in which the user has some preferences for solution plans, but the planner is not provided with any knowledge of such preferences. It is therefore impossible for the planner to assume any particular form of preference language representing the hidden preference model. There are two issues that need to be considered in formalizing a quality measure for plan sets:

  • What are the elements of plans that can be involved in a quality measure?

  • How should a quality measure be defined using those elements?

For the first question, we observe that even though users are normally interested in some high level features of plans that are relevant to them, many of those features can be considered as “functions” of base level elements of plans. For instance, the set of actions in the plan determines the makespan of a (sequential) plan, and the sequence of states when the plan executes gives the total reward of goals achieved. We consider the following three types of base level features of plans which could be used in defining a quality measure, independently of the domain semantics:

  • Actions that are present in plans, which define various high level features of the plans such as its makespan, execution cost etc. that are of interest to the user whose preference model could be represented with preference languages such as in PSP and PDDL2.1.

  • Sequence of states that the agent goes through, which captures the behaviors resulting from the execution of plans. In many preference languages defined using high level features of plans such as the reward of goals collected (e.g., PSP), of the whole state (e.g., MDP), or the temporal relation between propositions occurring in states (e.g., PDDL3, $\mathcal{PP}$ (Son and Pontelli, 2006) and LPP (Fritz and McIlraith, 2006)), the sequence of states can affect the quality of the plan as evaluated by the user.

  • The causal links representing how actions contribute to the goals being achieved, which measure the causal structures of plans. [Footnote 4: A causal link $a_1 \rightarrow p - a_2$ records that a predicate $p$ is produced by $a_1$ and consumed by $a_2$.] These plan elements can affect the quality of plans with respect to the languages mentioned above, as the causal links capture both the actions appearing in a plan and the temporal relation between actions and variables.

A similar conceptual separation of features has also been considered recently in the context of case-based planning by Serina (2010), in which planning problems were assumed to be well classified, in terms of costs to adapt plans of one problem to solve another, in some unknown high level feature space. The similarity between problems in the space was implicitly defined using kernel functions of their domain-independent graph representations. In our situation, we aim to approximate the quality of plan sets on the space of features that the user is interested in using the distance between plans with respect to the base level features of plans mentioned above (see below).

Table 1 gives the pros and cons of using the different base level elements of plan. We note that if actions in the plans are used in defining quality measure of plan sets, no additional problem or domain theory information is needed. If plan behaviors are used as base level elements, the representation of the plans that bring about state transition becomes irrelevant since only the actual states that an execution of the plan will take is considered. Hence, we can now compare plans of different representations, e.g., four plans where the first is a deterministic plan, the second is a contingent plan, the third is a hierarchical plan and the fourth is a policy encoding probabilistic behavior. If causal links are used, then the causal proximity among actions is now considered rather than just physical proximity in the plan.

Basis          Pros                                                                                         Cons

Actions        Does not require problem information                                                         No problem information is used
States         Not dependent on any specific plan representation                                            Needs an execution simulator to identify states
Causal links   Considers causal proximity of state transitions (action) rather than positional (physical) proximity   Requires domain theory

Table 1: The pros and cons of different base level elements of plan.

Given those base level elements, the next question is how to define a quality measure of plan sets using them. Recall that without any knowledge about the user’s preferences, there is no way for the planner to assume any particular preference language; the motivation behind a choice of quality measure should therefore come from the hidden user’s preference model. Given a Hasse diagram induced from the user’s preference model, a $k$-plan set that will be presented to the user can be considered to be randomly selected from the diagram. The probability of having one plan in the set classified in a class at the optimal layer would increase when the individual plans are more likely to be at different layers, and this chance in turn will increase if they are less likely to be equally preferred by the user. [Footnote 5: To see this, consider a diagram with $\mathcal{S}_1 = \{p_1, p_2\}$ at layer 0, $\mathcal{S}_2 = \{p_3\}$ and $\mathcal{S}_3 = \{p_4\}$ at layer 1, and $\mathcal{S}_4 = \{p_5\}$ at layer 2. Assume that we randomly select a set of 2 plans. If those plans are known to be at the same layer, then the chance of having one plan at layer 0 is $1/2$. However, if they are forced to be at different layers, then the probability will be $3/4$.] On the other hand, the effect of base level elements of a plan on high level features relevant to the user suggests that plans similar with respect to base level features are more likely to be close to each other on the high level feature space determining the user’s preference model.
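The intuition that forcing plans onto different layers raises the chance of hitting the optimal layer can be checked by brute-force enumeration. The following is a minimal sketch under an assumed toy diagram (two plans at layer 0, two at layer 1, one at layer 2); the plan names and the `layers` mapping are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical layer assignment: plan name -> layer in the Hasse diagram.
layers = {"p1": 0, "p2": 0, "p3": 1, "p4": 1, "p5": 2}

def chance_of_optimal(pairs):
    """Fraction of the given 2-plan sets that contain a layer-0 plan."""
    pairs = list(pairs)
    hits = sum(1 for a, b in pairs if layers[a] == 0 or layers[b] == 0)
    return hits / len(pairs)

all_pairs = list(combinations(layers, 2))
same_layer = [(a, b) for a, b in all_pairs if layers[a] == layers[b]]
diff_layer = [(a, b) for a, b in all_pairs if layers[a] != layers[b]]

print(chance_of_optimal(same_layer))  # 0.5
print(chance_of_optimal(diff_layer))  # 0.75
```

In this toy setting, restricting the selection to plans at different layers raises the chance of presenting a most preferred plan from 0.5 to 0.75.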

In order to define a quality measure using base level features of plans, we proceed with the following assumption: plans that are different from each other with respect to the base level features are less likely to be equally preferred by the user; in other words, they are more likely to be at different layers of the Hasse diagram. With the purpose of increasing the chance of having a plan that the user prefers, we propose to use the diversity of a plan set as its quality measure, defined using the distance between two plans in the set with respect to a base level element. More formally, the quality measure $\zeta(\mathcal{P})$ of a plan set $\mathcal{P}$ can be defined as either the minimal, maximal, or average distance between plans:

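  • Minimal distance:

    $$\zeta_{\min}(\mathcal{P}) = \min_{p, p' \in \mathcal{P}} \delta(p, p') \tag{1}$$

  • Maximal distance:

    $$\zeta_{\max}(\mathcal{P}) = \max_{p, p' \in \mathcal{P}} \delta(p, p') \tag{2}$$

  • Average distance:

    $$\zeta_{avg}(\mathcal{P}) = \binom{|\mathcal{P}|}{2}^{-1} \times \sum_{p, p' \in \mathcal{P}} \delta(p, p') \tag{3}$$

where $\delta(p, p') \in [0, 1]$ is the distance measure between two plans $p$ and $p'$, and the minimum, maximum and sum range over unordered pairs of distinct plans.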
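The three set-level measures can be sketched in a few lines once a pairwise distance is fixed. The block below is an illustration only: `jaccard_distance` is an assumed stand-in for the pairwise distance (plans are modeled as plain sets of element names), and the function and parameter names are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Assumed pairwise distance in [0, 1] between plans given as sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def diversity(plans, dist=jaccard_distance, mode="min"):
    """Minimal, maximal, or average pairwise distance of a plan set."""
    pair_dists = [dist(p, q) for p, q in combinations(plans, 2)]
    if not pair_dists:
        raise ValueError("need at least two plans")
    if mode == "min":
        return min(pair_dists)
    if mode == "max":
        return max(pair_dists)
    if mode == "avg":
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown mode: {mode}")

# Three toy plans, each a set of (hypothetical) action names.
plans = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}]
for mode in ("min", "max", "avg"):
    print(mode, diversity(plans, mode=mode))
```

The minimal variant is the most conservative of the three: a single pair of near-identical plans pulls it down even when the rest of the set is well spread.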

3.1.1 Distance measures between plans

There are various choices on how to define the distance measure between two plans using plan actions, sequence of states or causal links, and each way can have different impact on the diversity of plan set on the Hasse diagram. In the following, we propose distance measures in which a plan is considered as (i) a set of actions and causal links, or (ii) sequence of states the agent goes through, which could be used independently of plan representation (e.g. total order, partial order plans).

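  • Plan as a set of actions or causal links: given a plan $p$, let $A(p)$ and $C(p)$ be the set of actions or causal links of $p$. The distance between two plans $p$ and $p'$ can be defined as the ratio of the number of actions (causal links) that do not appear in both plans to the total number of actions (causal links) appearing in one of them:

    $$\delta_A(p, p') = 1 - \frac{|A(p) \cap A(p')|}{|A(p) \cup A(p')|} \tag{4}$$

    $$\delta_{CL}(p, p') = 1 - \frac{|C(p) \cap C(p')|}{|C(p) \cup C(p')|} \tag{5}$$

  • Plan as a sequence of states: given two sequences of states $(s_0, s_1, \ldots, s_k)$ and $(s'_0, s'_1, \ldots, s'_{k'})$ resulting from executing two plans $p$ and $p'$, assume that $k' \leq k$. Since the two sequences of states may have different lengths, there are various options for defining the distance measure between $p$ and $p'$, and we consider here two simple options. In the first one, it can be defined as the average of the distances between state pairs $(s_i, s'_i)$ ($1 \leq i \leq k'$), where each state $s_{k'+1}, \ldots, s_k$ is considered to contribute maximally (i.e. one unit) to the difference between the two plans:

    $$\delta_S(p, p') = \frac{1}{k} \times \left[ \sum_{i=1}^{k'} \Delta(s_i, s'_i) + k - k' \right] \tag{6}$$

    On the other hand, we can assume that the agent continues to stay at the goal state $s'_{k'}$ in the next $(k - k')$ time steps after executing $p'$, and the measure can be defined as follows:

    $$\delta_S(p, p') = \frac{1}{k} \times \left[ \sum_{i=1}^{k'} \Delta(s_i, s'_i) + \sum_{i=k'+1}^{k} \Delta(s_i, s'_{k'}) \right] \tag{7}$$

    The distance measure $\Delta(s, s')$ between two states $s$, $s'$ used in those two measures is defined as

    $$\Delta(s, s') = 1 - \frac{|s \cap s'|}{|s \cup s'|} \tag{8}$$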
Figure 4: Example illustrating plans with base-level elements. and denote dummy actions producing the initial state and consuming the goal propositions, respectively (see text for more details).

Example: Figure 4 shows three plans , and for a planning problem where the initial state is and the goal propositions are . The specification of actions are shown in the table. The action sets of the first two plans ( and ) are quite similar (), but the causal links which involve (, ) and (, ) make their difference more significant with respect to causal-link based distance (). Two other plans and , on the other hand, are very different in terms of action sets (and therefore the sets of causal links): , but they are closer in term of state-based distance ( as defined in the equation 6, and if defined in the equation 7).
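To make the set-based and state-based distances concrete, here is a small sketch that follows the prose description above. It assumes that the set-based distance is one minus the ratio of shared elements to all elements, that states are sets of propositions, and that sequences list the states reached after each step (the shared initial state is omitted). The plans and proposition names are hypothetical and unrelated to Figure 4.

```python
def set_distance(x, y):
    """1 - |x & y| / |x | y|; defined as 0.0 when both sets are empty."""
    union = x | y
    return 0.0 if not union else 1.0 - len(x & y) / len(union)

def state_sequence_distance(seq_a, seq_b, stay_at_goal=False):
    """Average per-step state distance between two state sequences.

    stay_at_goal=False: each unmatched trailing state counts as one unit.
    stay_at_goal=True: the shorter plan is assumed to remain in its last
    state, and trailing states are compared against that state.
    """
    longer, shorter = (seq_a, seq_b) if len(seq_a) >= len(seq_b) else (seq_b, seq_a)
    if not longer:
        return 0.0
    total = sum(set_distance(s, t) for s, t in zip(longer, shorter))
    extra = longer[len(shorter):]
    if stay_at_goal and shorter:
        total += sum(set_distance(s, shorter[-1]) for s in extra)
    else:
        total += len(extra)
    return total / len(longer)

# Hypothetical plans: action sets and the states they pass through.
plan_a_actions = {"a1", "a2", "a3"}
plan_b_actions = {"a1", "a2", "a4"}
seq_a = [{"r1", "r2"}, {"r2", "r3"}, {"r3", "r4"}]
seq_b = [{"r1", "r2"}, {"r3", "r4"}]

print(set_distance(plan_a_actions, plan_b_actions))  # 0.5
print(state_sequence_distance(seq_a, seq_b))
print(state_sequence_distance(seq_a, seq_b, stay_at_goal=True))
```

In this toy case the "stay at goal" variant yields a smaller distance, because the longer plan ends in the same state in which the shorter plan is assumed to wait.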

3.2 Integrated Preference Function (IPF) for Partial Preference Cases

We now discuss a quality measure for plan sets in the case when the user’s preference is partially expressed. In particular, we consider scenarios in which the preference model can be represented by some quantitative language with an incompletely specified value function of high level features. As an example, the quality of plans in PDDL2.1 (Fox and Long, 2003) and PDDL3 (Gerevini and Long, 2005) is represented by a metric function combining metric fluents and preference statements on the state trajectory with parameters representing their relative importance. While providing a convenient way to represent preference models, such parameterized value functions raise the issue of obtaining reasonable values for the relative importance of the features. A common approach to modeling this type of incomplete knowledge is to consider those parameters as a vector of random variables, whose values are assumed to be drawn from a distribution. This is the representation that we will follow.

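To measure the quality of plan sets, we propose the use of the Integrated Preference Function (IPF) (Carlyle et al., 2003), which has been used to measure the quality of a solution set in a wide range of multi-objective optimization problems. The IPF measure assumes that the user’s preference model can be represented by two factors: (1) a probability distribution $h(\alpha)$ of the parameter vector $\alpha$ such that $\int_{\alpha} h(\alpha)\, d\alpha = 1$ (in the absence of any special information about the distribution, $h(\alpha)$ can be assumed to be uniform), and (2) a value function $f(p, \alpha): (f_1, \ldots, f_m) \rightarrow \mathbb{R}$ that combines $m$ different objective functions into a single real-valued quality measure for plan $p$. This incomplete specification of the value function represents a set of candidate preference models, for each of which the user will select a different plan, the one with the best value, from a given plan set $\mathcal{P} \subseteq \mathcal{S}$. The IPF value of the solution set $\mathcal{P}$ is defined as:

$$IPF(\mathcal{P}) = \int_{\alpha} h(\alpha)\, f(p_{\alpha}, \alpha)\, d\alpha \tag{9}$$

where $p_{\alpha} = \arg\min_{p \in \mathcal{P}} f(p, \alpha)$ is the best solution according to $f(p, \alpha)$ for each given $\alpha$ value. Let $p_{\alpha}^{-1}$ be its inverse function, specifying the range of $\alpha$ values for which $p$ is an optimal solution according to $f(p, \alpha)$. As $p_{\alpha}$ is piecewise constant, the $IPF(\mathcal{P})$ value can be computed as:

$$IPF(\mathcal{P}) = \sum_{p \in \mathcal{P}} \left[ \int_{\alpha \in p_{\alpha}^{-1}} h(\alpha)\, f(p, \alpha)\, d\alpha \right] \tag{10}$$

Let $\mathcal{P}^{*} = \{p \in \mathcal{P} : p_{\alpha}^{-1} \neq \emptyset\}$; then we have:

$$IPF(\mathcal{P}) = IPF(\mathcal{P}^{*}) = \sum_{p \in \mathcal{P}^{*}} \left[ \int_{\alpha \in p_{\alpha}^{-1}} h(\alpha)\, f(p, \alpha)\, d\alpha \right] \tag{11}$$

Since $\mathcal{P}^{*}$ is the set of plans that are optimal for some specific parameter vector, $IPF(\mathcal{P})$ can now be interpreted as the expected value that the user can get by selecting the best plan in $\mathcal{P}$. Therefore, the set of solutions (known as the lower convex hull of $\mathcal{P}$) with the minimal IPF value is most likely to contain the desired solutions that the user wants and is in essence a good representative of the plan set $\mathcal{P}$.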

While our work is applicable to more general planning scenarios, to make our discussion on generating plan sets concrete, we will concentrate on metric temporal planning where each action $a \in A$ has a duration $d_a$ and execution cost $c_a$. The planner needs to find a plan $p = \{a_1 \ldots a_n\}$, which is a sequence of actions that is executable and achieves all goals. The two most common plan quality measures are: makespan, which is the total execution time of $p$; and plan cost, which is the total execution cost of all actions in $p$. Both of them are high level features that can be affected by the actions in the plan. In most real-world applications, these two criteria compete with each other: shorter plans usually have higher cost and vice versa. We use the following assumptions:

  • The desired objective function involves minimizing both components: $time(p)$ measures the makespan of the plan $p$ and $cost(p)$ measures its execution cost.

  • The quality of a plan $p$ is a convex combination: $f(p, w) = w \times time(p) + (1 - w) \times cost(p)$, where the weight $w \in [0, 1]$ represents the trade-off between the two competing objective functions.

  • The belief distribution of $w$ over the range $[0, 1]$ is known. If the user does not provide any information or we have not learnt anything about the preference on the trade-off between time and cost of the plan, then the planner can assume a uniform distribution (and improve it later using techniques such as preference elicitation).

Given that the exact value of $w$ is unknown, our purpose is to find a bounded representative set of non-dominated plans minimizing the expected value of $f(p, w)$ with regard to the given distribution of $w$ over $[0, 1]$. [Footnote 6: A plan $p_1$ is dominated by $p_2$ if $time(p_1) \geq time(p_2)$ and $cost(p_1) \geq cost(p_2)$ and at least one of the inequalities is strict.]

Figure 5: Solid dots represents plans in the pareto set (). Connected dots represent plans in the lower convex hull () giving optimal ICP value for any distribution on trade-off between cost and time.

Example: Figure 5 shows our running example in which there are a total of 7 plans with their and values as follows: , , , , , , and . Among these 7 plans, 5 of them belong to a pareto optimal set of non-dominated plans: . The other two plans are dominated by some plans in : is dominated by and is dominated by . Plans in are depicted in solid dots, and the set of plans that are optimal for some specific value of is highlighted by connected dots.
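The dominance test used to separate the Pareto set from dominated plans is easy to state in code. The sketch below is illustrative: the plan names and the (time, cost) values are assumptions made for this example, chosen so that five of seven plans are non-dominated, and the function names are hypothetical.

```python
def dominates(q, p):
    """True if plan q dominates plan p; plans are (time, cost) tuples."""
    no_worse = q[0] <= p[0] and q[1] <= p[1]
    strictly_better = q[0] < p[0] or q[1] < p[1]
    return no_worse and strictly_better

def pareto_set(plans):
    """Keep the plans that no other plan dominates."""
    return {
        name: value
        for name, value in plans.items()
        if not any(dominates(other, value)
                   for other_name, other in plans.items()
                   if other_name != name)
    }

# Assumed (time, cost) values, for illustration only.
plans = {
    "p1": (4, 25), "p2": (6, 22), "p3": (7, 15), "p4": (8, 20),
    "p5": (10, 12), "p6": (11, 14), "p7": (12, 5),
}

print(sorted(pareto_set(plans)))  # ['p1', 'p2', 'p3', 'p5', 'p7']
```

With these values, p4 is dominated by p3 and p6 by p5. Membership in the Pareto set does not by itself imply that a plan is optimal for some trade-off weight; that stronger property characterizes the lower convex hull.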

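IPF for Metric Temporal Planning: The user preference model in our target domain of temporal planning is represented by a convex combination of the time and cost quality measures, and the IPF measure now is called Integrated Convex Preference (ICP). Given a set of plans $\mathcal{P} = \{p_1, p_2, \ldots, p_n\}$, let $t_p$ and $c_p$ be the makespan and total execution cost of a plan $p \in \mathcal{P}$. The ICP value of $\mathcal{P}$ with regard to the objective function $f(p, w) = w \times t_p + (1 - w) \times c_p$ and the parameter vector $\alpha = (w, 1 - w)$ ($w \in [0, 1]$) is defined as:

$$ICP(\mathcal{P}) = \sum_{i=1}^{k} \int_{w_{i-1}}^{w_i} h(w) \left( w \times t_{p_i} + (1 - w) \times c_{p_i} \right) dw \tag{12}$$

where $w_0 = 0$, $w_k = 1$ and $p_i = \arg\min_{p \in \mathcal{P}} f(p, w)$ for all $w \in [w_{i-1}, w_i]$. In other words, we divide $[0, 1]$ into $k$ non-overlapping regions such that in each region $(w_{i-1}, w_i)$ there is a single solution $p_i$ that has a better $f(p_i, w)$ value than all other solutions in $\mathcal{P}$.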
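For a uniform distribution over the trade-off weight, the ICP value is the integral over [0, 1] of the lower envelope of the lines w·time + (1 − w)·cost, one line per plan. The following is a minimal sketch under that uniform assumption; the function name and the (time, cost) values are hypothetical. Because the envelope is linear between consecutive pairwise crossing points, the trapezoid rule on those points is exact.

```python
def icp_uniform(plans):
    """ICP of a list of (time, cost) plans, assuming h(w) = 1 on [0, 1]."""
    def best(w):
        # Value of the plan the user would pick for trade-off weight w.
        return min(w * t + (1 - w) * c for t, c in plans)

    # Candidate breakpoints: weights where two plans have equal value.
    points = {0.0, 1.0}
    for i, (t1, c1) in enumerate(plans):
        for t2, c2 in plans[i + 1:]:
            denom = (t1 - c1) - (t2 - c2)
            if denom != 0:
                w = (c2 - c1) / denom
                if 0.0 < w < 1.0:
                    points.add(w)

    ws = sorted(points)
    # The envelope is linear on each [a, b], so the trapezoid is exact.
    return sum((b - a) * (best(a) + best(b)) / 2 for a, b in zip(ws, ws[1:]))

# Assumed (time, cost) values, for illustration only.
hull = [(4, 25), (7, 15), (12, 5)]
with_extras = hull + [(6, 22), (8, 20), (10, 12), (11, 14)]

print(icp_uniform(hull))
print(icp_uniform(with_extras))   # same value: the extras are never optimal
print(icp_uniform([(7, 15)]))     # a single plan scores worse (higher)
```

Adding plans that are never optimal for any weight leaves the value unchanged, which is the reason only plans on the lower convex hull matter for this measure.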

We select the IPF/ICP measure to evaluate our solution set due to its several nice properties:

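  • If $\mathcal{P}_1, \mathcal{P}_2 \subseteq \mathcal{S}$ and $ICP(\mathcal{P}_1) < ICP(\mathcal{P}_2)$ then $\mathcal{P}_1$ is probabilistically better than $\mathcal{P}_2$ in the sense that for any given $w$, with $p_1 = \arg\min_{p \in \mathcal{P}_1} f(p, w)$ and $p_2 = \arg\min_{p \in \mathcal{P}_2} f(p, w)$, the probability of $f(p_1, w) \leq f(p_2, w)$ is higher than the probability of $f(p_1, w) > f(p_2, w)$.

  • If $\mathcal{P}_1$ is obviously better than $\mathcal{P}_2$, then the ICP measure agrees with the assertion. More formally: if for every $p_2 \in \mathcal{P}_2$ there exists $p_1 \in \mathcal{P}_1$ such that $p_2$ is dominated by $p_1$, then $ICP(\mathcal{P}_1) < ICP(\mathcal{P}_2)$.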

Empirically, extensive results on scheduling problems in Fowler et al. (2005) have shown that ICP measure “evaluates the solution quality of approximation robustly (i.e., similar to visual comparison results) while other alternative measures can misjudge the solution quality”.

In Sections 4 and 5, we investigate the problem of generating high quality plan sets for the two cases mentioned: when no knowledge about the user’s preferences is given, and when part of it is given as input to the planner.

4 Generating Diverse Plan Set in the Absence of Preference Knowledge

In this section, we describe approaches to searching for a set of diverse plans with respect to a measure defined with base level elements of plans as discussed in the previous section. In particular, we consider the quality measure of a plan set as the minimal pair-wise distance between any two plans, and generate a set of plans containing $k$ plans with a quality of at least a predefined threshold $d$. As discussed earlier, by diversifying the set of plans on the space of base level features, it is likely that plans in the set will cover a wide range of the space of unknown high level features, increasing the possibility that the user can select a plan close to the one that she prefers. The problem is formally defined as follows:

dDISTANTkSET: Find $\mathcal{P}$ with $\mathcal{P} \subseteq \mathcal{S}$, $|\mathcal{P}| = k$ and $\zeta(\mathcal{P}) = \min_{p, q \in \mathcal{P}} \delta(p, q) \geq d$

where any distance measure between two plans formalized in Section 3.1.1 can be used to implement $\delta(p, q)$.
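The problem statement can be read as a simple acceptance test on a candidate set. The sketch below checks that test and adds a naive greedy filter over an already generated pool of plans. The greedy filter is an assumption made purely for illustration and is not one of the search procedures studied in this section, which bias the planners themselves. Plans are modeled as sets of hypothetical action names.

```python
from itertools import combinations

def action_distance(p, q):
    """Assumed set-based distance between two plans given as action sets."""
    union = p | q
    return 0.0 if not union else 1.0 - len(p & q) / len(union)

def is_d_distant_k_set(plans, k, d, dist=action_distance):
    """True if `plans` has exactly k plans with pairwise distance >= d."""
    return len(plans) == k and all(
        dist(p, q) >= d for p, q in combinations(plans, 2)
    )

def greedy_d_distant(candidates, k, d, dist=action_distance):
    """Scan candidates in order, keeping each plan that is at least d away
    from all plans kept so far. Returns None if fewer than k are kept."""
    chosen = []
    for plan in candidates:
        if all(dist(plan, other) >= d for other in chosen):
            chosen.append(plan)
            if len(chosen) == k:
                return chosen
    return None

candidates = [frozenset("abc"), frozenset("abd"),
              frozenset("xyz"), frozenset("ayz")]
result = greedy_d_distant(candidates, k=2, d=0.6)
print(result, is_d_distant_k_set(result, 2, 0.6))
print(greedy_d_distant(candidates, k=3, d=0.6))  # None
```

A greedy scan can miss a valid set that a different candidate order would find, so a `None` result here does not prove that no d-distant k-set exists in the pool.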

We now consider two representative state-of-the-art planning approaches in generating diverse plan sets. The first one is gp-csp (Do and Kambhampati, 2001), representing constraint-based planning approaches, and the second one is lpg (Gerevini et al., 2003), which uses an efficient local-search based approach. We use gp-csp to compare the relation between different distance measures in diversifying plan sets. With lpg, on the other hand, we stick to the action-based distance measure, which is shown experimentally to be the most difficult measure for enforcing diversity (see below), and investigate the scalability of heuristic approaches in generating diverse plans.

4.1 Finding Diverse Plan Set with GP-CSP

The gp-csp planner (Do and Kambhampati, 2001) converts Graphplan’s planning graph into a CSP encoding, and solves it using a standard CSP solver. The solution of the encoding represents a valid plan for the original planning problem. In the encoding, the CSP variables correspond to the predicates that have to be achieved at different levels in the planning graph (different planning steps) and their possible values are the actions that can support the predicates. For each CSP variable representing a predicate $p$, there are two special values: i) $\bot$: indicates that the predicate $p$ is not supported by any action and is false at a particular level/planning-step; ii) “noop”: indicates that the predicate $p$ is true at a given level $i$ because it was made true at some previous level $j < i$ and no other action deletes $p$ between $j$ and $i$. Constraints encode the relations between predicates and actions: 1) mutual exclusion relations between predicates and actions; and 2) the causal relationships between actions and their preconditions.

4.1.1 Adapting GP-CSP to Different Distance Metrics

When the above planning encoding is solved by any standard CSP solver, it will return a solution containing $\langle var, value \rangle$ pairs of the form $\{\langle x_1, y_1 \rangle, \ldots, \langle x_n, y_n \rangle\}$. The collection of $x_i$ where $y_i \neq \bot$ represents the facts that are made true at different time steps (plan trajectory) and can be used as a basis for the state-based distance measure (we implement the state-based distance between plans as defined in Equation 6); the set of $y_i$ that are neither $\bot$ nor “noop” represents the set of actions in the plan and can be used for the action-based distance measure; lastly, the assignments $\langle x_i, y_i \rangle$ themselves represent the causal relations and can be used for the causal-based distance measure.

However, there are some technical difficulties we need to overcome before a specific distance measure between plans can be computed. First, the same action can be represented by different values in the domains of different variables. Consider a simple example in which there are two facts $p$ and $q$, both supported by two actions $a_1$ and $a_2$. When setting up the CSP encoding, we assume that the CSP variables $x_1$ and $x_2$ are used to represent $p$ and $q$. The domains for $x_1$ and $x_2$ are $\{v_{11}, v_{12}\}$ and $\{v_{21}, v_{22}\}$, both representing the two actions $\{a_1, a_2\}$ (in that order). The assignments $\{\langle x_1, v_{11} \rangle, \langle x_2, v_{22} \rangle\}$ and $\{\langle x_1, v_{12} \rangle, \langle x_2, v_{21} \rangle\}$ have a distance of 2 in traditional CSP because different values are assigned for each variable $x_1$ and $x_2$. However, they both represent the same action set $\{a_1, a_2\}$ and thus lead to a plan distance of 0 if we use the action-based distance in our plan comparison. Therefore, we first need to translate the set of values in all assignments back to the set of action instances before doing comparison using action-based distance. The second complication arises for the causal-based distance. A causal link between two actions $a_1$ and $a_2$ indicates that $a_1$ supports the precondition $p$ of $a_2$. However, the CSP assignment $\langle p, a_1 \rangle$ only provides the first half of each causal-link. To complete the causal-link, we need to look at the values of other CSP assignments to identify the action $a_2$ that occurs at a later level in the planning graph and has $p$ as its precondition. Note that there may be multiple “valid” sets of causal-links for a plan, and in the implementation we simply select causal-links based on the CSP assignments.
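The first difficulty can be illustrated with a small sketch. It assumes a hypothetical encoding with two variables whose domain values map back to the same two actions; the value names, the mapping table, and the distance helper are illustrative assumptions, not part of gp-csp.

```python
def csp_distance(sol_a, sol_b):
    """Traditional CSP distance: number of variables assigned different values."""
    return sum(1 for var in sol_a if sol_a[var] != sol_b[var])

def to_action_set(solution, value_to_action):
    """Translate CSP values back to action instances.

    Values without an entry in the mapping (for example the special
    'unsupported' and 'noop' values) are skipped.
    """
    return {value_to_action[v] for v in solution.values() if v in value_to_action}

def action_distance(acts_a, acts_b):
    """Assumed set-difference distance between two action sets."""
    union = acts_a | acts_b
    return 1.0 - len(acts_a & acts_b) / len(union) if union else 0.0

# Hypothetical encoding: v11 and v21 both stand for a1; v12 and v22 for a2.
value_to_action = {"v11": "a1", "v12": "a2", "v21": "a1", "v22": "a2"}
sol_1 = {"x1": "v11", "x2": "v22"}
sol_2 = {"x1": "v12", "x2": "v21"}

raw = csp_distance(sol_1, sol_2)
translated = action_distance(
    to_action_set(sol_1, value_to_action),
    to_action_set(sol_2, value_to_action),
)
```

The two assignments differ on every variable, yet after translation they describe the same action set, which is why the comparison has to happen on action instances rather than on raw CSP values.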

4.1.2 Making GP-CSP Return a Set of Plans

To make gp-csp return a set of plans satisfying the dDISTANTkSET constraint using one of the three distance measures, we add “global” constraints to each original encoding to enforce d-diversity between every pair of solutions. When each global constraint is called upon by the normal forward checking and arc-consistency checking procedures inside the default solver to check if the distance between two plans is over a predefined value $d$, we first map each set of assignments to an actual set of actions (action-based), predicates that are true at different plan-steps (state-based) or causal-links (causal-based) using the method discussed in the previous section. This mapping is done through a call to the planning graph, which is outside of the CSP solver but works closely with the general purpose CSP solver in gp-csp. The comparison is then done within the implementation of the global constraint to decide if two solutions are diverse enough.

We investigate two different ways to use the global constraints:

  1. The parallel strategy to return the set of $k$ plans all at once. In this approach, we create one encoding that contains $k$ identical copies of each original planning encoding created using the gp-csp planner. The copies are connected together using pair-wise global constraints. Each global constraint between the $i$-th and $j$-th copies ensures that two plans represented by the solutions of those two copies will be at least $d$ distant from each other. If each copy has $n$ variables, then this constraint involves $2n$ variables.

  2. The greedy strategy to return plans one after another. In this approach, the $k$ copies are not set up in parallel up-front, but sequentially. We add to the $i$-th copy one global constraint to enforce that the solution of the $i$-th copy should be $d$-diverse from any of the earlier $i-1$ solutions. The advantage of the greedy approach is that each CSP encoding is significantly smaller in terms of the number of variables ($n$ vs. $k \cdot n$), smaller in terms of the number of global constraints (1 vs. $k(k-1)/2$), and each global constraint also contains fewer variables ($n$ vs. $2n$), although each constraint is more complicated because it encodes the $(i-1)$ previously found solutions. Thus, each encoding in the greedy approach is easier to solve. However, because each solution depends on all previously found solutions, the encoding can be unsolvable if the previously found solutions comprise a bad initial solution set.
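The two strategies can be contrasted with a small sketch. It does not call a CSP solver: the greedy strategy is imitated by filtering a stream of candidate plans, which is a simplifying stand-in for adding one global constraint per new encoding. The size comparison assumes one pairwise constraint per pair of copies; all names are hypothetical.

```python
def action_distance(p, q):
    """Assumed set-difference distance between two plans given as action sets."""
    union = p | q
    return 1.0 - len(p & q) / len(union) if union else 0.0

def greedy_diverse_set(candidates, d, k):
    """Pick plans one at a time; each must be at least d from all earlier picks."""
    chosen = []
    for plan in candidates:
        if all(action_distance(plan, prev) >= d for prev in chosen):
            chosen.append(plan)
            if len(chosen) == k:
                break
    return chosen

def encoding_sizes(n, k):
    """(variables, global constraints) per encoding for each strategy.

    Assumes n variables per copy and one pairwise constraint per pair of copies.
    """
    parallel = (k * n, k * (k - 1) // 2)
    greedy = (n, 1)
    return parallel, greedy

# Hypothetical candidate plans, in the order a solver might return them.
candidates = [frozenset("abc"), frozenset("abd"), frozenset("aef"), frozenset("xyz")]
selected = greedy_diverse_set(candidates, d=0.6, k=3)
sizes = encoding_sizes(n=10, k=4)
```

Because each pick is fixed once made, an unlucky early plan can block later ones, which mirrors the caveat about a bad initial solution set.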

4.1.3 Empirical Evaluation

We implemented the parallel and greedy approaches discussed earlier for the three distance measures and tested them with the benchmark set of Logistics problems provided with the Blackbox planner (Kautz and Selman, 1998). All experiments were run on a Linux Pentium 4, 3 GHz machine with 512 MB RAM. For each problem (log-easy = prob1, rocket-a = prob2, log-a = prob3, log-b = prob4, log-c = prob5, log-d = prob6), we test with different $d$ values ranging from 0.01 (1%) to 0.95 (95%), in increments of 0.01 from 0.01 to 0.1 and of 0.05 thereafter, and $k$ increases from 2 to $n$, where $n$ is the maximum value for which gp-csp can still find solutions within the plan horizon. The horizon (parallel plan steps) limit is 30.

We found that the greedy approach outperformed the parallel approach and solved a significantly higher number of problems. Therefore, we focus on the greedy approach hereafter. For each combination of $d$, $k$, and a given distance measure, we record the solving time and output the average/min/max pairwise distances of the solution sets.

Baseline Comparison: As a baseline comparison, we have also implemented a randomized approach. In this approach, we do not use global constraints but use random value ordering in the CSP solver to generate $k$ different solutions without enforcing them to be pairwise $d$-distance apart. For each distance $d$, we continue running the random algorithm until we find $k_{max}$ solutions, where $k_{max}$ is the maximum value of $k$ that we can solve for the greedy approach for that particular $d$ value. In general, we want to compare with our approach of using global constraints to see if the random approach can effectively generate a diverse set of solutions by looking at: (1) the average time to find a solution in the solution set; and (2) the maximum/average pairwise distances between randomly generated solutions.

            Prob1   Prob2    Prob3   Prob4    Prob5     Prob6
$\delta_a$  0.087   7.648    1.021   6.144    8.083     178.633
$\delta_s$  0.077   9.354    1.845   6.312    8.667     232.475
$\delta_c$  0.190   6.542    1.063   6.314    8.437     209.287
Random      0.327   15.480   8.982   88.040   379.182   6105.510
Table 2: Average solving time (in seconds) to find a plan using the greedy approach (first 3 rows) and the random approach (last row).
            Prob1       Prob2        Prob3        Prob4         Prob5        Prob6
$\delta_a$  0.041/0.35  0.067/0.65   0.067/0.25   0.131/0.1*    0.126/0.15   0.128/0.2
$\delta_s$  0.035/0.4   0.05/0.8     0.096/0.5    0.147/0.4     0.140/0.5    0.101/0.5
$\delta_c$  0.158/0.8   0.136/0.95   0.256/0.55   0.459/0.15*   0.346/0.3*   0.349/0.45
Table 3: Comparison of the diversity in the plan sets returned by the random and greedy approaches (random/greedy in each cell). Cases where the random approach is better than the greedy approach are marked with (*).

Table 2 shows the comparison of average solving time to find one solution in the greedy and random approaches. The results show that on average, the random approach takes significantly more time to find a single solution, regardless of the distance measure used by the greedy approach. To assess the diversity in the solution sets, Table 3 shows the comparison of: (1) the average pairwise minimum distance between the solutions in sets returned by the random approach; and (2) the maximum $d$ for which the greedy approach can still find a set of diverse plans. The comparisons are done for all three distance measures. For example, the first cell in Table 3, 0.041/0.35, implies that the minimum pairwise distance averaged over all solvable $k$ using the random approach is 0.041 while it is 0.35 (i.e. 8x more diverse) for the greedy approach using the $\delta_a$ distance measure. Except for 3 cases, using global constraints to enforce minimum pairwise distance between solutions helps gp-csp return a significantly more diverse set of solutions. On average, the greedy approach returns 4.25x, 7.31x, and 2.79x more diverse solutions than the random approach for $\delta_a$, $\delta_s$ and $\delta_c$, respectively.
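The averaged ratios quoted above can be recomputed from the Table 3 entries. The sketch below assumes that the three rows of Table 3 correspond to the action-, state- and causal-based measures in that order, and that each cell is a (random, greedy) pair; the dictionary keys are illustrative labels.

```python
# Table 3 cells as (random average minimum distance, greedy maximum d).
table3 = {
    "delta_a": [(0.041, 0.35), (0.067, 0.65), (0.067, 0.25),
                (0.131, 0.1), (0.126, 0.15), (0.128, 0.2)],
    "delta_s": [(0.035, 0.4), (0.05, 0.8), (0.096, 0.5),
                (0.147, 0.4), (0.140, 0.5), (0.101, 0.5)],
    "delta_c": [(0.158, 0.8), (0.136, 0.95), (0.256, 0.55),
                (0.459, 0.15), (0.346, 0.3), (0.349, 0.45)],
}

def mean_ratio(cells):
    """Average of greedy/random over the six problems."""
    return sum(greedy / rand for rand, greedy in cells) / len(cells)

# Per-measure average improvement factor, rounded to two decimals.
ratios = {name: round(mean_ratio(cells), 2) for name, cells in table3.items()}

# Number of cells per measure in which the random approach is more diverse.
random_wins = {
    name: sum(1 for rand, greedy in cells if rand > greedy)
    for name, cells in table3.items()
}
```

With the entries as printed, the first two averages match the reported 4.25 and 7.31, the third comes out at about 2.78 (within rounding of the reported 2.79), and exactly three cells favor the random approach.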

Analysis of the different distance-bases: Overall, we were able to solve 1264 $(d, k)$ combinations for the three distance measures using the greedy approach. We were particularly interested in investigating the following issues:

  • H1: Computational efficiency - Is it easy or difficult to find a set of diverse solutions using different distance measures? That is, (1) for the same $d$ and $k$ values, which distance measure is more difficult (time consuming) to solve; and (2) given an encoding horizon limit, how high can the values of $d$ and $k$ be while we can still find a set of solutions for a given problem using different distance measures.

  • H2: Solution diversity - What, if any, is the correlation/sensitivity between different distance measures? That is, how does the diversity of solutions compare when it is measured with different distance measures?

$d$    Prob1     Prob2     Prob3    Prob4   Prob5   Prob6
0.01   11,5,28   8,18,12   9,8,18   3,4,5   4,6,8   8,7,7
0.03   6,3,24    8,13,9    7,7,12   2,4,3   4,6,6   4,7,6
0.05   5,3,18    6,11,9    5,7,10   2,4,3   4,6,5   3,7,5
0.07   2,3,14    6,10,8    4,7,6    2,4,2   4,6,5   3,7,5
0.09   2,3,14    6,9,6     3,6,6    2,4,2   3,6,4   3,7,4
0.1    2,3,10    6,9,6     3,6,6    2,4,2   2,6,4   3,7,4
0.2    2,3,5     5,9,6     2,6,6    1,3,1   1,5,2   2,5,3
0.3    2,2,3     4,7,5     1,4,4    1,2,1   1,3,2   1,3,3
0.4    1,2,3     3,6,5     1,3,3    1,2,1   1,2,1   1,2,3
0.5    1,1,3     2,4,5     1,2,2    -       1,2,1   1,2,1
0.6    1,1,2     2,3,4     -        -       -       -
0.7    1,1,2     1,2,2     -        -       -       -
0.8    1,1,2     1,2,2     -        -       -       -
0.9    -         1,1,2     -        -       -       -

Table 4: For each given $d$ value, each cell shows the largest solvable $k$ for each of the three distance measures $\delta_a$, $\delta_s$, and $\delta_c$ (in this order). The maximum values in cells are in bold.

Regarding H1, Table 4 shows the highest solvable $k$ value for each distance $d$ and base $\delta_a$, $\delta_s$, and $\delta_c$. For a given $(d, k)$ pair, enforcing $\delta_a$ appears to be the most difficult, then $\delta_s$, and $\delta_c$ is the easiest. gp-csp is able to solve 237, 462, and 565 combinations of $(d, k)$ respectively for $\delta_a$, $\delta_s$ and $\delta_c$. gp-csp solves dDISTANTkSET problems more easily with $\delta_s$ and $\delta_c$ than with $\delta_a$ due to the fact that solutions with different action sets (diverse with regard to $\delta_a$) will likely cause different trajectories and causal structures (diverse with regard to $\delta_s$ and $\delta_c$). Between $\delta_s$ and $\delta_c$, $\delta_c$ solves more problems for easier instances (Problems 1-3) but fewer for the harder instances, as shown in Table 4. We conjecture that for solutions with more actions (i.e. in bigger problems) there are more causal dependencies between actions and thus it is harder to reorder actions to create a different causal-structure.

For running time comparisons, among 216 combinations of $(d, k)$ that were solved by all three distance measures, gp-csp takes the least amount of time for $\delta_a$ in 84 combinations, for $\delta_s$ in 70 combinations and for $\delta_c$ in 62. The first three lines of Table 2 show the average time to find one solution in a $d$-diverse $k$-set for each problem using $\delta_a$, $\delta_s$ and $\delta_c$ (which we call $t_a$, $t_s$ and $t_c$ respectively). In general, $t_a$ is the smallest and $t_s > t_c$ in most problems. Thus, while it is harder to enforce $\delta_a$ than $\delta_s$ and $\delta_c$ (as indicated in Table 4), when the encodings for all three distances can be solved for a given $(d, k)$, then $\delta_a$ takes less time to search for one plan in the diverse plan set; this can be due to tighter constraints (more pruning power for the global constraints) and a simpler global constraint setting.

            $\delta_a$   $\delta_s$   $\delta_c$
$\delta_a$  -            1.262        1.985
$\delta_s$  0.485        -            0.883
$\delta_c$  0.461        0.938        -

Table 5: Cross-validation of distance measures $\delta_a$, $\delta_s$, and $\delta_c$.

To test H2, in Table 5, we show the cross-comparison between different distance measures $\delta_a$, $\delta_s$, and $\delta_c$. In this table, the cell $\langle$row, column$\rangle$ = $\langle \delta', \delta'' \rangle$ gives, over all combinations of $(d, k)$ solved for distance $\delta'$, the average value of $d''/d'$, where $d''$ and $d'$ are the distances measured according to $\delta''$ and $\delta'$, respectively ($d' \geq d$). For example, $\langle \delta_s, \delta_a \rangle = 0.485$ means that over 462 combinations of $(d, k)$ solvable for $\delta_s$, for each $d$, the average distance between $k$ solutions measured by $\delta_a$ is $0.485 \times d_s$. The results indicate that when we enforce $d$ for $\delta_a$, we will likely find even more diverse solution sets according to $\delta_s$ ($1.262 \times d_a$) and $\delta_c$ ($1.985 \times d_a$). However, when we enforce $d$ for either $\delta_s$ or $\delta_c$, we are not likely to find a more diverse set of solutions measured by the other two distance measures. Nevertheless, enforcing $d$ using $\delta_c$ will likely give a comparable degree of diversity for $\delta_s$ ($0.938 \times d_c$) and vice versa. We also observe that $d_s$ is highly dependent on the difference between the parallel lengths of plans in the set. The distance $d_s$ seems to be the smallest when all plans have the same/similar number of time steps. This is consistent with the fact that $\delta_a$ and $\delta_c$ do not depend on the steps in the plan execution trajectory while $\delta_s$ does.

4.2 Finding Diverse Plan Set with LPG

In this section, we consider the problem of generating a diverse set of plans using another planning approach, in particular the lpg planner, which is able to scale up to bigger problems compared to gp-csp. We focus on the action-based distance measure between plans, which has been shown in the previous section to be the most difficult for enforcing diversity. lpg is a local-search-based planner that incrementally modifies a partial plan in a search for a plan that contains no flaws (Gerevini et al., 2003). The behavior of lpg is controlled by an evaluation function that is used to select between different plan candidates in a neighborhood generated for local search. At each search step, the elements in the search neighborhood of the current partial plan $p$ are the alternative possible plans repairing a selected flaw in $p$. The elements of the neighborhood are evaluated according to an action evaluation function $E$ (Gerevini et al., 2003). This function is used to estimate the cost of either adding or removing an action node $a$ in the partial plan being generated.

4.2.1 Revised Evaluation Function for Diverse Plans

In order to manage dDISTANTkSET problems, the function $E$ has been extended to include an additional evaluation term that has the purpose of penalizing the insertion and removal of actions that decrease the distance of the current partial plan $p$ under adaptation from a reference plan $p_0$. In general, $E$ consists of four weighted terms, evaluating four aspects of the quality of the current plan that are affected by the addition ($E(a)^i$) or removal ($E(a)^r$) of $a$.

The first three terms of the two forms of are unchanged from the standard behavior of lpg. The fourth term, used only for computing diverse plans, is the new term estimating how the proposed plan modification will affect the distance from the reference plan . Each cost term in is computed using a relaxed temporal plan (Gerevini et al., 2003).

The relaxed plans are computed by an algorithm, called RelaxedPlan, formally described and illustrated in Gerevini et al. (2003). We have slightly modified this algorithm to penalize the selection of actions decreasing the plan distance from the reference plan. The specific change to RelaxedPlan for computing diverse plans is very similar to the change described in Fox et al. (2006), and it concerns the heuristic function for selecting the actions for achieving the subgoals in the relaxed plans. In the modified function for RelaxedPlan, we have an extra 0/1 term that penalizes an action if its addition decreases the distance of the resulting plan from the reference plan (in the plan repair context investigated in Fox et al. (2006), an action is penalized if its addition increases such a distance).

The last term of the modified evaluation function $E$ is a measure of the decrease in plan distance caused by adding or removing $a$: or , where contains the new action $a$. The coefficients of the $E$-terms are used to weigh their relative importance (these coefficients are also normalized using the method described in Gerevini et al. (2003)). The values of the first 3 terms are automatically derived from the expression defining the plan metric for the problem (Gerevini et al., 2003). The coefficient for the fourth new term of $E$ is automatically set during search to a value proportional to , where is the current partial plan under construction. The general idea is to dynamically increase the value of this coefficient according to the number of plans $n$ that have been generated so far: if $n$ is much higher than $k$, the search process consists of finding many solutions with not enough diversification, and hence the importance of the last $E$-term should increase.

4.2.2 Making LPG Return a Set of Plans

In order to compute a set of $k$ $d$-distant plans solving a dDISTANTkSET problem, we run the lpg search multiple times, until the problem is solved, with the following two additional changes to the standard version of lpg: (i) the preprocessing phase computing mutex relations and other reachability information exploited during the relaxed plan construction is done only once for all runs; (ii) we maintain an incremental set of valid plans, and we dynamically select one of them as the reference plan for the next search. Concerning (ii), let $P$ be the set of valid plans that have been computed so far, and CPlans($p$) the subset of $P$ containing all plans that have a distance greater than or equal to $d$ from a reference plan $p \in P$.

The reference plan $p_{ref}$ used in the modified heuristic function is a plan which has a maximal set of diverse plans in $P$, i.e.,

$$p_{ref} = \arg\max_{p_i \in P} |\mathrm{CPlans}(p_i)| \qquad (13)$$

The plan is incrementally computed each time the local search finds a new solution. In addition to being used to identify the reference plan in , is also used for defining the initial state (partial plan) of the search process. Specifically, we initialize the search using a (partial) plan obtained by randomly removing some actions from a (randomly selected) plan in the set CPlans.

The process of generating diverse plans starting from a dynamically chosen reference plan continues until at least $k$ plans that are all $d$-distant from each other have been produced. The modified version of lpg to compute diverse plans is called lpg-d.
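The reference-plan choice described in this subsection can be sketched as follows. The sketch assumes plans are sets of action names compared with a set-difference distance, and it breaks ties by taking the first maximal plan; both choices, and all names, are illustrative rather than a description of lpg-d internals.

```python
def action_distance(p, q):
    """Assumed set-difference distance between two plans given as action sets."""
    union = p | q
    return 1.0 - len(p & q) / len(union) if union else 0.0

def cplans(p, plans, d):
    """Plans found so far that are at least d away from p."""
    return [q for q in plans if q is not p and action_distance(p, q) >= d]

def select_reference(plans, d):
    """Reference plan: the one with the largest set of d-distant plans.

    Ties are resolved in favor of the earliest plan (an arbitrary choice).
    """
    return max(plans, key=lambda p: len(cplans(p, plans, d)))

# Hypothetical valid plans computed so far.
found = [frozenset("abc"), frozenset("abd"), frozenset("xyz"), frozenset("abx")]
reference = select_reference(found, d=0.6)
far_from_reference = cplans(reference, found, d=0.6)
```

In this toy pool the plan built from x, y and z is far from all three others, so it becomes the reference and the other three form its set of distant plans.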

4.2.3 Experimental Analysis with LPG-d

Figure 6: Performance of lpg-d (CPU-time and plan distance) for the problem pfile20 in the DriverLog-Time domain.
Figure 7: Performance of lpg-d (CPU-time and plan distance) for the problem pfile20 in the Satellite-Strips domain.
Figure 8: Performance of lpg-d (CPU-time and plan distance) for the problem pfile15 in the Storage-Propositional domain.

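Recall that the distance function $\delta_a$, using set-difference, can be written as the sum of two terms:

$$\delta_a(p_i, p_j) = \frac{|A(p_i) - A(p_j)|}{|A(p_i) \cup A(p_j)|} + \frac{|A(p_j) - A(p_i)|}{|A(p_i) \cup A(p_j)|} \qquad (14)$$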

The first term represents the contribution of the actions in $p_i$ to the plan difference, while the second term indicates the contribution of $p_j$ to $\delta_a$. We experimentally observed that in some cases the differences between two diverse plans computed using $\delta_a$ are mostly concentrated in only one of the components. This asymmetry means that one of the two plans can have many more actions than the other one, which could imply that the quality of one of the two plans is much worse than the quality of the other plan. In order to avoid this problem, we can parametrize $\delta_a$ by imposing the two extra constraints

$$\delta_a^A \geq d / \gamma \quad \text{and} \quad \delta_a^B \geq d / \gamma$$

where $\delta_a^A$ and $\delta_a^B$ are the first and second terms of $\delta_a$, respectively, and $\gamma$ is an integer parameter “balancing” the diversity of $p_i$ and $p_j$.
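The decomposition and the balancing idea can be sketched in a few lines. The form of the balance test used here (each term must reach the threshold divided by the balancing parameter) is an assumption made for illustration, as are the function names and the example plans.

```python
def distance_terms(p, q):
    """Split the set-difference distance into the share of p and the share of q."""
    union = p | q
    if not union:
        return 0.0, 0.0
    return len(p - q) / len(union), len(q - p) / len(union)

def action_distance(p, q):
    """Sum of the two terms; equals 1 - |p & q| / |p | q|."""
    first, second = distance_terms(p, q)
    return first + second

def is_balanced(p, q, d, gamma):
    """Assumed balance test: each term must be at least d / gamma."""
    first, second = distance_terms(p, q)
    return first >= d / gamma and second >= d / gamma

# Two hypothetical pairs of plans with similar total distance.
even_pair = (frozenset("abc"), frozenset("ade"))
lopsided_pair = (frozenset("ab"), frozenset("abcdef"))

even_terms = distance_terms(*even_pair)
lopsided_terms = distance_terms(*lopsided_pair)
```

The lopsided pair is diverse only because one plan is much longer than the other, so its first term is zero and it fails the balance test even though its total distance is above the threshold.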

In this section, we analyze the performance of the modified version of lpg, called lpg-d, in three different benchmark domains from the 3rd and 5th IPCs. The main goals of the experimental evaluation were (i) showing that lpg-d can efficiently solve a large set of $(d, k)$-combinations, (ii) investigating the impact of the $\gamma$-constraints on performance, and (iii) comparing lpg-d and the standard lpg.

We tested lpg-d using both the default and parametrized versions of $\delta_a$, with and . We give detailed results for and a more general evaluation for and the original $\delta_a$. We consider $d$ varying from 0.05 to 0.95, using a 0.05 increment step, and $k$ = 2…5, 6, 8, 10, 12, 14, 16, 20, 24, 28, 32 (overall, a total of 266 $(d, k)$-combinations). Since lpg-d is a stochastic planner, we use the median of the CPU times (in seconds) and the median of the average plan distances (over five runs). The average plan distance for a set of $k$ plans solving a specific $(d, k)$-combination is the average of the plan distances between all pairs of plans in the set. The tests were performed on an AMD Athlon(tm) XP 2600+ with 512 MB RAM. The CPU-time limit was 300 seconds.
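The size of the test grid and the summary statistics can be illustrated with a short sketch. It assumes the threshold ranges over 0.05, 0.10, ..., 0.95, which is the evenly spaced grid consistent with the reported total of 266 combinations; the CPU times and plans below are made-up illustration data.

```python
from itertools import combinations
from statistics import median

# Assumed grid of thresholds and the listed set sizes.
d_values = [round(0.05 * i, 2) for i in range(1, 20)]
k_values = [2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 20, 24, 28, 32]
grid = [(d, k) for d in d_values for k in k_values]

def action_distance(p, q):
    """Assumed set-difference distance between two plans given as action sets."""
    union = p | q
    return 1.0 - len(p & q) / len(union) if union else 0.0

def average_plan_distance(plans):
    """Average distance over all pairs of plans in the set."""
    pairs = list(combinations(plans, 2))
    return sum(action_distance(p, q) for p, q in pairs) / len(pairs)

# Hypothetical CPU times (seconds) from five runs of a stochastic planner.
cpu_times = [1.2, 0.9, 3.4, 1.1, 1.0]
median_cpu = median(cpu_times)

# Hypothetical plan set returned by one run.
plan_set = [frozenset("abc"), frozenset("abd"), frozenset("xyz")]
avg_distance = average_plan_distance(plan_set)
```

Using the median rather than the mean keeps a single slow run, such as the 3.4 second outlier here, from dominating the reported time.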

Figure 6 gives the results for the largest problem in IPC-3 DriverLog-Time (fully-automated track). lpg-d solves -combinations, including combinations and , and and . The average CPU time (top plots) is seconds. The average (bottom plots) is , with always greater than . With the original function lpg-d solves -combinations, the average CPU time is seconds, and the average is ; while with lpg-d solves combinations, the average CPU time is seconds, and the average is .

Figure 7 shows the results for the largest problem in IPC-3 Satellite-Strips. lpg-d solves -combinations; of them require less than seconds. The average CPU time is seconds, and the average is . We observed similar results when using the original function or the parametrized with  (in the second case, lpg-d solves 198 problems, while the average CPU time and the average are nearly the same as with ).

Figure 8 shows the results for a middle-size problem in IPC-5 Storage-Propositional. With lpg-d solves -combinations, of which require less than 10 seconds, while of them require less than 50 seconds. The average CPU time is seconds and the average is . With the original lpg-d solves -combinations, the average CPU time is seconds, and the average is ; with lpg-d solves combinations, the average CPU time is seconds and the average is .

The local search in lpg is randomized by a “noise” parameter that is automatically set and updated during search (Gerevini et al., 2003). This randomization is one of the techniques used for escaping local minima, but it can also be useful for computing diverse plans: if we run the search multiple times, each search is likely to consider different portions of the search space, which can lead to different solutions. It is then interesting to compare lpg-d and a method in which we simply run the standard lpg until $k$ $d$-diverse plans are generated. An experimental comparison of the two approaches shows that in many cases lpg-d performs better. In particular, the new evaluation function is especially useful for planning problems that are easy to solve for the standard lpg, and that admit many solutions. In these cases, the original function produces many valid plans with not enough diversification. This problem is significantly alleviated by the new term in $E$. An example of a domain where we observed this behavior is logistics: for logistics_a (prob3 of Table 2), lpg-d solved 128 instances, 41 of them in less than 1 CPU second and 97 of them in less than 10 CPU seconds; the average CPU time was seconds and the average was . Using the standard lpg, only 78 instances were solved, 20 of them in less than 1 CPU second and 53 of them in less than 10 CPU seconds; the average CPU time was seconds and the average was .

5 Generating Plan Sets with Partial Preference Knowledge

In this section, we consider the problem of generating plan sets when the user’s preferences are only partially expressed. In particular, we focus on metric temporal planning where the preference model is assumed to be represented by an incomplete value function specified by a convex combination of two features: plan makespan and execution cost, with the exact trade-off value $w$ drawn from a given distribution. The quality value of plan sets is measured by the ICP value, as formalized in Equation 12. Our objective is to find a set of plans $\mathcal{P} \subseteq \mathcal{S}$ where $|\mathcal{P}| \leq k$ and $ICP(\mathcal{P})$ is the lowest.

Notice that we restrict the size of the solution set returned, not only for the comprehension issue discussed earlier, but also because of an important property of the ICP measure: it is a monotonically non-increasing function of the solution set (specifically, given two solution sets $\mathcal{P}_1$ and $\mathcal{P}_2$ such that the latter is a superset of the former, it is easy to see that $ICP(\mathcal{P}_2) \leq ICP(\mathcal{P}_1)$).
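The monotonicity property can be checked numerically on a toy case. The sketch below assumes each plan is a (makespan, cost) pair, that the value function is the convex combination w * makespan + (1 - w) * cost, and that the trade-off value is uniformly distributed on [0, 1]; under these assumptions the ICP of a set is the integral of the best value over w, estimated here with a midpoint rule.

```python
def value(plan, w):
    """Assumed value function: convex combination of makespan and cost."""
    makespan, cost = plan
    return w * makespan + (1.0 - w) * cost

def icp(plans, samples=10_000):
    """Midpoint-rule estimate of the integral over w in [0, 1] of the best value.

    Assumes a uniform distribution of w; lower is better.
    """
    if not plans:
        return float("inf")
    total = 0.0
    for i in range(samples):
        w = (i + 0.5) / samples
        total += min(value(p, w) for p in plans)
    return total / samples

# Hypothetical plans as (makespan, cost).
smaller_set = [(10, 2)]
larger_set = smaller_set + [(4, 8)]

icp_small = icp(smaller_set)
icp_large = icp(larger_set)
```

Adding a plan can only lower the pointwise minimum for each w, so the integral cannot increase; in this example it drops from 6 to 4.5 because the second plan is better for all w above 0.5.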

5.1 Sampling Weight Values

Given that the distribution of the trade-off value $w$ is known, the straightforward way to find a set of representative solutions is to first sample a set of $k$ values for $w$: $\{w_1, w_2, \ldots, w_k\}$ based on the distribution of $w$. For each value $w_i$, we can find an (optimal) plan $p_i$ minimizing the value of the overall value function instantiated with $w_i$. The final set of solutions is then filtered to remove duplicates and dominated solutions, thus selecting the plans making up the lower-convex hull. The final set can then be returned to the user. While intuitive and easy to implement, this sampling-based approach has several potential flaws that can limit the quality of its resulting plan set.
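A minimal sketch of the sampling-based approach is given below. A fixed pool of candidate plans stands in for the planner, each plan is a hypothetical (makespan, cost) pair, and the value function is assumed to be w * makespan + (1 - w) * cost; none of these choices come from the planners used in this work.

```python
import random

def value(plan, w):
    """Assumed value function: convex combination of makespan and cost."""
    makespan, cost = plan
    return w * makespan + (1.0 - w) * cost

def is_dominated(plan, others):
    """True if another plan is no worse on both makespan and cost."""
    return any(o != plan and o[0] <= plan[0] and o[1] <= plan[1] for o in others)

def sampling_approach(pool, weights):
    """Best plan per sampled weight, then drop duplicates and dominated plans."""
    found = {min(pool, key=lambda p: value(p, w)) for w in weights}
    return sorted(p for p in found if not is_dominated(p, found))

# Hypothetical candidate pool: (makespan, cost).
pool = [(10, 2), (4, 8), (7, 6), (12, 3)]

# Weights drawn from an assumed uniform distribution.
rng = random.Random(0)
uniform_weights = [rng.random() for _ in range(5)]
from_uniform = sampling_approach(pool, uniform_weights)

# Weights concentrated in a narrow range versus weights spread apart.
narrow = sampling_approach(pool, [0.1, 0.2, 0.3])
spread = sampling_approach(pool, [0.1, 0.9])
```

The narrow sample returns a single plan no matter how many weights are drawn, which is the second flaw discussed below: the sampled weights carry no information about where the solution space actually changes.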

First, given that solution plans are searched sequentially and independently of each other, even if the plan found for each is optimal, the final solution set may not even be the optimal set of solutions with regard to the ICP measure. More specifically, for a given set of solutions , some tradeoff value , and two non-dominated plans , such that , it is possible that . In our running example (Figure 5), let and then . Thus, the planner will select to add to because it looks locally better given the weight . However, so indeed by taking previous set into consideration then is a much better choice than .

Second, the values of the trade-off parameter are sampled based on a given distribution, and independently of the particular planning problem being solved. As there is no relation between the sampled values and the solution space of a given planning problem, sampling approach may return very few distinct solutions even if we sample a large number of weight values . In our example, if all samples have values then the optimal solution returned for any of them will always be . However, we know that is the optimal set according to the measure. Indeed, if then the sampling approach can only find the set or and still not be able to find the optimal set .

5.2 ICP Sequential Approach

Given the potential drawbacks of the sampling approach outlined above, we also pursued an alternative approach that takes into account the ICP measure more actively. Specifically, we incrementally build the solution set $\mathcal{P}$ by finding a solution $p$ such that $\mathcal{P} \cup \{p\}$ has the lowest ICP value. We can start with an empty solution set $\mathcal{P} = \emptyset$, then at each step try to find a new plan $p$ such that $\mathcal{P} \cup \{p\}$ has the lowest ICP value.

While this approach directly takes the ICP measure into consideration at each step of finding a new plan and avoids the drawbacks of the sampling-based approach, it also has its own share of potential flaws. Given that the set is built incrementally, the earlier steps where the first “seed” solutions are found are very important. The closer the seed solutions are to the global lower convex hull, the better the improvement in the ICP value. In our example (Figure 5), if the first plan found is then the subsequent plans found to best extend can be and thus the final set does not come close to the optimal set .
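The sensitivity to seed solutions can be reproduced on a toy pool. The sketch assumes plans are (makespan, cost) pairs, a value function w * makespan + (1 - w) * cost, a uniform distribution of w, and a fixed candidate pool in place of the planner; the numbers are illustrative and unrelated to the running example.

```python
def value(plan, w):
    """Assumed value function: convex combination of makespan and cost."""
    makespan, cost = plan
    return w * makespan + (1.0 - w) * cost

def icp(plans, samples=2_000):
    """Midpoint-rule estimate of the ICP value under a uniform distribution of w."""
    total = 0.0
    for i in range(samples):
        w = (i + 0.5) / samples
        total += min(value(p, w) for p in plans)
    return total / samples

def icp_sequential(pool, k):
    """Greedily add the plan that gives the lowest ICP of the extended set."""
    chosen = []
    while len(chosen) < k:
        remaining = [p for p in pool if p not in chosen]
        if not remaining:
            break
        best = min(remaining, key=lambda p: icp(chosen + [p]))
        if chosen and icp(chosen + [best]) >= icp(chosen):
            break  # no remaining plan improves the set
        chosen.append(best)
    return chosen

# Hypothetical pool of non-dominated plans: (makespan, cost).
pool = [(10, 2), (4, 8), (6, 5)]
greedy_pair = icp_sequential(pool, k=2)
best_pair = [(10, 2), (4, 8)]
```

The balanced plan has the best ICP on its own and is picked first, but the best pair consists of the two extreme plans, so the greedy pair ends with a higher ICP value (about 4.86 versus 4.5).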

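Algorithm 1: Incrementally find solution set $\mathcal{P}$
1   Input: A planning problem with a solution space $\mathcal{S}$; maximum number of plans required $k$; number of sampled trade-off values $k_0$ ($0 < k_0 < k$); time bound $t$;
2   Output: A plan set $\mathcal{P}$ ($|\mathcal{P}| \leq k$);
3   begin
4     $W \leftarrow$ sample $k_0$ values for $w$;
5     $\mathcal{P} \leftarrow$ find good quality plans in $\mathcal{S}$ for each $w \in W$;
6     while $|\mathcal{P}| < k$ and search time $< t$ do
7       Search for $p$ s.t. $ICP(\mathcal{P} \cup \{p\}) < ICP(\mathcal{P})$, and add $p$ to $\mathcal{P}$
8     end
9     Return $\mathcal{P}$
10  end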

5.3 Hybrid Approach

In this approach, we aim to combine the strengths of both the sampling and ICP-sequential approaches. Specifically, we use sampling to find several plans optimizing for different weights. The plans are then used to seed the subsequent ICP-sequential runs. By seeding the hybrid approach with a good quality plan set scattered across the Pareto optimal set, we hope to gradually expand the initial set to a final set with a much better overall ICP value. Algorithm 1 shows the pseudo-code for the hybrid approach. We first independently sample the set of $k_0$ values (with $k_0$ pre-determined) of $w$ given the distribution on $w$ (step 4). We then run a heuristic planner multiple times to find an optimal (or good quality) solution for each trade-off value (step 5). We then collect the plans found and seed the subsequent runs when we incrementally update the initial plan set with plans that lower the overall ICP value (steps 6-8). The algorithm terminates and returns the latest plan set (step 9) if $k$ plans are found or the time bound is exceeded.
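The steps of the hybrid approach can be sketched as follows. As before, a fixed candidate pool replaces the heuristic planner, plans are hypothetical (makespan, cost) pairs, the value function is assumed to be w * makespan + (1 - w) * cost with w uniform on [0, 1], and the sampled weights are passed in explicitly so that the example is deterministic.

```python
import time

def value(plan, w):
    """Assumed value function: convex combination of makespan and cost."""
    makespan, cost = plan
    return w * makespan + (1.0 - w) * cost

def icp(plans, samples=2_000):
    """Midpoint-rule estimate of the ICP value under a uniform distribution of w."""
    total = 0.0
    for i in range(samples):
        w = (i + 0.5) / samples
        total += min(value(p, w) for p in plans)
    return total / samples

def hybrid(pool, weights, k, time_bound=5.0):
    """Seed with the best plan per sampled weight, then extend by ICP improvement."""
    start = time.monotonic()
    plans = []
    for w in weights:  # seeding phase (steps 4-5)
        best = min(pool, key=lambda p: value(p, w))
        if best not in plans:
            plans.append(best)
    # extension phase (steps 6-8)
    while len(plans) < k and time.monotonic() - start < time_bound:
        current = icp(plans)
        improving = [p for p in pool if p not in plans and icp(plans + [p]) < current]
        if not improving:
            break
        plans.append(min(improving, key=lambda p: icp(plans + [p])))
    return plans  # step 9

# Hypothetical pool of non-dominated plans: (makespan, cost).
pool = [(10, 2), (4, 8), (6, 5)]
two_plans = hybrid(pool, weights=[0.1, 0.9], k=2)
three_plans = hybrid(pool, weights=[0.1, 0.9], k=3)
```

Seeding with two well-separated weights yields the two extreme plans first, and the balanced plan is then added only because it lowers the ICP value of the set further.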

5.4 Making LPG Search Sensitive to ICP

Since the LPG planner used in the previous section cannot handle numeric fluents, in particular the fluent representing plan cost that we are interested in, we use a modified version of the Metric-LPG planner (Gerevini et al., 2008) in implementing our algorithms. Not only is Metric-LPG equipped with a very flexible local-search framework that has been extended to handle various objective functions, but it can also be made to search for single or multiple solutions. Specifically, for the sampling-based approach, we first sample the $w$ values based on a given distribution. For each $w$ value, we set the metric function in the domain file to the value function instantiated with that $w$, and run the original LPG in the quality mode to heuristically find the best solution within the time limit for that metric function. The final solution set is filtered to remove any duplicate solutions, and returned to the user.

For the ICP-sequential and hybrid approaches, we cannot use the original LPG implementation as is and need to modify the neighborhood evaluation function in LPG to take into account the ICP measure and the current plan set $\mathcal{P}$. In the rest of this section, we explain this procedure in detail.

Background: Metric-LPG uses local search to find plans within the space of numerical action graphs (NA-graph). This leveled graph consists of a sequence of interleaved proposition and action layers. The proposition layers consist of a set of propositional and numerical nodes, while each action layer consists of at most one action node, and a number of no-op links. An NA-graph $G$ represents a valid plan if all actions’ preconditions are supported by some actions appearing in an earlier level in $G$. The search neighborhood for each local-search step is defined by a set of graph modifications to fix some remaining inconsistency (unsupported precondition) $p$ at a particular level $l$. This can be done by either inserting a new action $a$ supporting $p$ or removing from the graph the action $a$ that $p$ is a precondition of (which can introduce new inconsistencies).

Each local move creates a new NA-graph , which is evaluated as a weighted combination of two factors: and . Here, is the amount of search effort to resolve inconsistencies newly introduced by inserting or removing action ; it is measured by the number of actions in a relaxed plan resolving all such inconsistencies. The total cost , which is a default function to measure plan quality, is measured by the total action execution costs of all actions in . The two weight adjustment values and are used to steer the search toward either finding a solution quickly (higher value) or better solution quality (higher value). Metric-LPG then selects the local move leading to the smallest value.

Adjusting the evaluation function for finding set of plans with low ICP measure: To guide Metric-LPG towards optimizing our ICP-sensitive objective function instead of the original minimizing cost objective function, we need to replace the default plan quality measure with a new measure . Specifically, we adjust the function for evaluating each new NA-graph generated by local moves at each step to be a combination of and . Given the set of found plans , guides Metric-LPG’s search toward a plan generated from such that the resulting set has a minimum ICP value: . Thus, estimates the expected total ICP value if the best plan found by expanding is added to the current found plan set . Like the original Metric-LPG, is estimated by where is the relaxed plan resolving inconsistencies in caused by inserting or removing . The for a given NA-graph is calculated as: