# First Order Decision Diagrams for Relational MDPs

Markov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDP) where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODD), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy.

## 1 Introduction

Many real-world problems can be cast as sequential decision making under uncertainty. Consider a simple example in a logistics domain where an agent delivers boxes. The agent can take three types of actions: to load a box on a truck, to unload a box from a truck, and to drive a truck to a city. However, the effects of actions may not be perfectly predictable. For example, the agent's gripper may be slippery so load actions may not succeed, or its navigation module may not be reliable and it may end up in the wrong location. This uncertainty compounds the already complex problem of planning a course of action to achieve some goals or maximize rewards.

Markov Decision Processes (MDP) have become the standard model for sequential decision making under uncertainty [Boutilier, Dean, & Hanks, 1999]. These models also provide a general framework for artificial intelligence (AI) planning, where an agent has to achieve or maintain a well-defined goal. MDPs model an agent interacting with the world. The agent can fully observe the state of the world and takes actions so as to change the state. In doing that, the agent tries to optimize a measure of the long term reward it can obtain using such actions.

The classical representation and algorithms for MDPs [Puterman, 1994] require enumeration of the state space. For more complex situations we can specify the state space in terms of a set of propositional variables called state attributes. These state attributes together determine the world state. Consider a very simple logistics problem that has only one box and one truck. Then we can have state attributes such as truck in Paris (TP), box in Paris (BP), box in Boston (BB), etc. If we let the state space be represented by $n$ binary state attributes then the total number of states would be $2^n$. For some problems, however, the domain dynamics and resulting solutions have a simple structure that can be described compactly using the state attributes, and previous work known as the propositionally factored approach has developed a suite of algorithms that take advantage of such structure and avoid state enumeration. For example, one can use dynamic Bayesian networks, decision trees, and algebraic decision diagrams to concisely represent the MDP model. This line of work showed substantial speedup for propositionally factored domains [Boutilier, Dearden, & Goldszmidt, 1995; Boutilier, Dean, & Goldszmidt, 2000; Hoey, St-Aubin, Hu, & Boutilier, 1999].

The logistics example presented above is very small. Any realistic problem will have a large number of objects and corresponding relations among them. Consider a problem with four trucks and three boxes, where the goal is to have a box in Paris, but it does not matter which box is in Paris. With the propositionally factored approach, we need to have one propositional variable for every possible instantiation of the relations in the domain, e.g., box 1 in Paris, box 2 in Paris, box 1 on truck 1, box 2 on truck 1, and so on, and the action space expands in the same way. The goal becomes a ground disjunction over different instances stating “box 1 in Paris, or box 2 in Paris, or box 3 in Paris”. Thus we get a very large MDP and at the same time we lose the structure implicit in the relations and the potential benefits of this structure in terms of computation.
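To make the growth concrete, the following minimal sketch counts the ground propositions that the propositionally factored approach would need. It assumes three binary relations (box in city, truck in city, box on truck) as suggested by the logistics description; the function names and the number of cities are illustrative assumptions, not taken from the paper.

```python
def ground_atom_count(boxes: int, trucks: int, cities: int) -> int:
    """Number of ground atoms for box-in-city, truck-in-city and box-on-truck."""
    return boxes * cities + trucks * cities + boxes * trucks


def propositional_state_count(boxes: int, trucks: int, cities: int) -> int:
    """Upper bound on states when every ground atom is a binary state attribute."""
    return 2 ** ground_atom_count(boxes, trucks, cities)


# Three boxes and four trucks as in the text; two cities is an assumption.
atoms = ground_atom_count(boxes=3, trucks=4, cities=2)
states = propositional_state_count(boxes=3, trucks=4, cities=2)
```

Adding a single box under these assumptions adds one atom per city and one per truck, so the bound on the number of states is multiplied by $2^{6}$, while a relational description of the domain stays the same size.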

This is the main motivation behind relational or first order MDPs (RMDP). (Sanner and Boutilier (2005) make a distinction between first order MDPs that can utilize the full power of first order logic to describe a problem and relational MDPs that are less expressive. We follow this in calling our language RMDP.) A first order representation of MDPs can describe domain objects and relations among them, and can use quantification in specifying objectives. In the logistics example, we can introduce three predicates to capture the relations among domain objects, stating that a box is in a city, that a truck is in a city, and that a box is on a truck. We have three parameterized actions, load, unload, and drive, whose parameters are the objects they act on. Now domain dynamics, reward, and solutions can be described compactly and abstractly using the relational notation. For example, we can define the goal using existential quantification, i.e., there exists a box that is in Paris. Using this goal one can identify an abstract policy, which is optimal for every possible instance of the domain. Intuitively when there are 0 steps to go, the agent will be rewarded if there is any box in Paris. When there is one step to go and there is no box in Paris yet, the agent can take one action to help achieve the goal. If there is a box on a truck and the truck is in Paris, then the agent can execute the unload action for that box and truck, which may put the box in Paris, thus the goal will be achieved. When there are two steps to go, if there is a box on a truck that is in Paris, the agent can take the unload action twice (to increase the probability of successful unloading of the box), or if there is a box on a truck that is not in Paris, the agent can first take the drive action followed by unload. The preferred plan will depend on the success probability of the different actions. The goal of this paper is to develop efficient solutions for such problems using a relational approach, which performs general reasoning in solving problems and does not propositionalize the domain. As a result the complexity of our algorithms does not change when the number of domain objects changes. Also the solutions obtained are good for any domain of any size (even infinite ones) simultaneously. Such an abstraction is not possible within the propositional approach.

Several approaches for solving RMDPs were developed over the last few years. Much of this work was devoted to developing techniques to approximate RMDP solutions using different representation languages and algorithms [Guestrin, Koller, Gearhart, & Kanodia, 2003a; Fern, Yoon, & Givan, 2003; Gretton & Thiebaux, 2004; Sanner & Boutilier, 2005, 2006]. For example, Dzeroski, De Raedt, and Driessens (2001) and Driessens et al. (2006) use reinforcement learning techniques with relational representations. Fern, Yoon, and Givan (2006) and Gretton and Thiebaux (2004) use inductive learning methods to learn a value map or policy from solutions or simulations of small instances. Sanner and Boutilier (2005, 2006) develop an approach to approximate value iteration that does not need to propositionalize the domain. They represent value functions as a linear combination of first order basis functions and obtain the weights by lifting the propositional approximate linear programming techniques [Schuurmans & Patrascu, 2001; Guestrin, Koller, Par, & Venktaraman, 2003b] to handle the first order case.

There has also been work on exact solutions such as symbolic dynamic programming (SDP) [Boutilier, Reiter, & Price, 2001], the relational Bellman algorithm (ReBel) [Kersting, Otterlo, & De Raedt, 2004], and first order value iteration (FOVIA) [Großmann, Hölldobler, & Skvortsova, 2002; Hölldobler, Karabaev, & Skvortsova, 2006]. There is no working implementation of SDP because it is hard to keep the state formulas consistent and of manageable size in the context of the situation calculus. Compared with SDP, ReBel and FOVIA provide more practical solutions. They both use restricted languages to represent RMDPs, so that reasoning over formulas is easier to perform. In this paper we develop a representation that combines the strong points of these approaches.

Our work is inspired by the successful application of Algebraic Decision Diagrams (ADD) [Bryant, 1986; McMillan, 1993; Bahar, Frohm, Gaona, Hachtel, Macii, Pardo, & Somenzi, 1993] in solving propositionally factored MDPs and POMDPs [Hoey, St-Aubin, Hu, & Boutilier, 1999; St-Aubin, Hoey, & Boutilier, 2000; Hansen & Feng, 2000; Feng & Hansen, 2002]. The intuition behind this idea is that the ADD representation allows information sharing, e.g., sharing the value of all states that belong to an “abstract state”, so that algorithms can consider many states together and do not need to resort to state enumeration. If there is sufficient regularity in the model, ADDs can be very compact, allowing problems to be represented and solved efficiently. We provide a generalization of this approach by lifting ADDs to handle relational structure and adapting the MDP algorithms. The main difficulty in lifting the propositional solution is that in relational domains the transition function specifies a set of schemas for conditional probabilities. The propositional solution uses the concrete conditional probability to calculate the regression function, but this is not possible with schemas. One way around this problem is to first ground the domain and problem at hand and only then perform the reasoning (see, for example, Sanghai, Domingos, & Weld, 2005). However, this does not allow for solutions abstracting over domains and problems. Like SDP, ReBel, and FOVIA, our constructions do perform general reasoning.

First order decision trees and even decision diagrams have already been considered in the literature [Blockeel & De Raedt, 1998; Groote & Tveretina, 2003] and several semantics for such diagrams are possible. Blockeel and De Raedt (1998) lift propositional decision trees to handle relational structure in the context of learning from relational datasets. Groote and Tveretina (2003) provide a notation for first order Binary Decision Diagrams (BDD) that can capture formulas in Skolemized conjunctive normal form and then provide a theorem proving algorithm based on this representation. The paper investigates both approaches and identifies the approach of Groote and Tveretina (2003) as better suited for the operations of the value iteration algorithm. Therefore we adapt and extend their approach to handle RMDPs. In particular, our First Order Decision Diagrams (FODD) are defined by modifying first order BDDs to capture existential quantification as well as real-valued functions through the use of an aggregation over different valuations for a diagram. This allows us to capture MDP value functions using algebraic diagrams in a natural way. We also provide additional reduction transformations for algebraic diagrams that help keep their size small, and allow the use of background knowledge in reductions. We then develop appropriate representations and algorithms showing how value iteration can be performed using FODDs. At the core of this algorithm we introduce a novel diagram-based algorithm for goal regression where, given a diagram representing the current value function, each node in this diagram is replaced with a small diagram capturing its truth value before the action. This offers a modular and efficient form of regression that accounts for all potential effects of an action simultaneously. We show that our version of abstract value iteration is correct and hence it converges to the optimal value function and policy.

To summarize, the contributions of the paper are as follows. The paper identifies the multiple path semantics (extending Groote & Tveretina, 2003) as a useful representation for RMDPs and contrasts it with the single path semantics of Blockeel and De Raedt (1998). The paper develops FODDs and algorithms to manipulate them in general and in the context of RMDPs. The paper also develops novel weak reduction operations for first order decision diagrams and shows their relevance to solving relational MDPs. Finally the paper presents a version of the relational value iteration algorithm using FODDs and shows that it is correct and thus converges to the optimal value function and policy. While relational value iteration was developed and specified in previous work [Boutilier, Reiter, & Price, 2001], to our knowledge this is the first detailed proof of correctness and convergence for the algorithm.

This section has briefly summarized the research background, motivation, and our approach. The rest of the paper is organized as follows. Section 2 provides background on MDPs and RMDPs. Section 3 introduces the syntax and the semantics of First Order Decision Diagrams (FODD), and Section 4 develops reduction operators for FODDs. Sections 5 and 6 present a representation of RMDPs using FODDs, the relational value iteration algorithm, and its proof of correctness and convergence. The last two sections conclude the paper with a discussion of the results and future work.

## 2 Relational Markov Decision Processes

We assume familiarity with standard notions of MDPs and value iteration (see, for example, Bellman, 1957; Puterman, 1994). In the following we introduce some of the notions. We also introduce relational MDPs and discuss some of the previous work on solving them.

Markov Decision Processes (MDPs) provide a mathematical model of sequential optimization problems with stochastic actions. A MDP can be characterized by a state space $S$, an action space $A$, a state transition function $Pr(s' \mid s, a)$ denoting the probability of transition to state $s'$ given state $s$ and action $a$, and an immediate reward function $r(s)$, specifying the immediate utility of being in state $s$. A solution to a MDP is an optimal policy that maximizes expected discounted total reward as defined by the Bellman equation:

$$V^*(s) = \max_{a \in A}\Big[r(s) + \gamma \sum_{s' \in S} Pr(s' \mid s, a)\, V^*(s')\Big]$$

where $V^*$ represents the optimal state-value function. The value iteration algorithm (VI) uses the Bellman equation to iteratively refine an estimate of the value function:

$$V_{n+1}(s) = \max_{a \in A}\Big[r(s) + \gamma \sum_{s' \in S} Pr(s' \mid s, a)\, V_n(s')\Big] \tag{1}$$

where $V_n$ represents our current estimate of the value function and $V_{n+1}$ is the next estimate. If we initialize this process with $V_0$ as the reward function, $V_n$ captures the optimal value function when we have $n$ steps to go. As discussed further below the algorithm is known to converge to the optimal value function.
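The update in Equation 1 can be sketched for an enumerated state space as follows. This is a minimal illustration of standard (propositional) value iteration, not the relational algorithm developed in the paper; the two-state problem, its probabilities, and all identifiers are hypothetical.

```python
def value_iteration(states, transitions, reward, gamma, n_steps):
    """Apply the update of Equation 1 n_steps times, starting from V_0 = r.

    transitions[s][a] is a list of (next_state, probability) pairs.
    """
    value = dict(reward)  # V_0 is the reward function
    for _ in range(n_steps):
        value = {
            s: max(
                reward[s] + gamma * sum(p * value[s2] for s2, p in outcomes)
                for outcomes in transitions[s].values()
            )
            for s in states
        }
    return value


# Hypothetical two-state problem: "goal" is absorbing, "start" may reach it.
states = ["start", "goal"]
reward = {"start": 0.0, "goal": 1.0}
transitions = {
    "start": {"try": [("goal", 0.9), ("start", 0.1)], "wait": [("start", 1.0)]},
    "goal": {"stay": [("goal", 1.0)]},
}
v1 = value_iteration(states, transitions, reward, gamma=0.5, n_steps=1)
```

Each iteration visits every state explicitly, which is exactly the enumeration that the factored and relational approaches discussed in this paper aim to avoid.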

Boutilier, Reiter, and Price (2001) used the situation calculus to formalize first order MDPs and a structured form of the value iteration algorithm. One of the useful restrictions introduced in their work is that stochastic actions are specified as a randomized choice among deterministic alternatives. For example, the unload action in the logistics example can succeed or fail. Therefore there are two alternatives for this action: unload success and unload failure. The formulation and algorithms support any number of action alternatives. The randomness in the domain is captured by a random choice specifying which action alternative (unload success or unload failure) gets executed when the agent attempts an action (unload). The choice is determined by a state-dependent probability distribution characterizing the dynamics of the world. In this way one can separate the regression over effects of action alternatives, which is now deterministic, from the probabilistic choice of action. This considerably simplifies the reasoning required since there is no need to perform probabilistic goal regression directly. Most of the work on RMDPs has used this assumption, and we use this assumption as well. Sanner and Boutilier (2007) investigate a model going beyond this assumption.

Thus relational MDPs are specified by the set of predicates in the domain, the set of probabilistic actions in the domain, and the reward function. For each probabilistic action, we specify the deterministic action alternatives and their effects, and the probabilistic choice among these alternatives. A relational MDP captures a family of MDPs that is generated by choosing an instantiation of the state space. Thus the logistics example corresponds to all possible instantiations with 2 boxes or with 3 boxes and so on. We only get a concrete MDP by choosing such an instantiation. (One could define a single MDP including all possible instances at the same time, e.g., it will include some states with 2 boxes, some states with 3 boxes and some with an infinite number of boxes. But obviously subsets of these states form separate MDPs that are disjoint. We thus prefer the view of a RMDP as a family of MDPs.) Yet our algorithms will attempt to solve the entire MDP family simultaneously.

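Boutilier, Reiter, and Price (2001) introduce the case notation to represent probabilities and rewards compactly. The expression $t = \mathrm{case}[\phi_i, t_i : i \le n]$, where each $\phi_i$ is a logical formula, is equivalent to $\bigvee_{i \le n} (\phi_i \wedge t = t_i)$. In other words, $t$ equals $t_i$ when $\phi_i$ is true. In general, the $\phi_i$'s are not constrained but some steps in the VI algorithm require that the $\phi_i$'s are disjoint and partition the state space. In this case, exactly one $\phi_i$ is true in any state. Each $\phi_i$ denotes an abstract state whose member states have the same value for that probability or reward. For example, the reward function for the logistics domain, discussed above and illustrated on the right side of Figure 1, can be captured as a case expression with two partitions, one for states in which some box is in Paris and one for states in which no box is in Paris. We also have the following notation for operations over functions defined by case expressions. The operators $\oplus$ and $\otimes$ are defined by taking a cross product of the partitions and adding or multiplying the case values.

$$\mathrm{case}[\phi_i, t_i : i \le n] \oplus \mathrm{case}[\psi_j, v_j : j \le m] = \mathrm{case}[\phi_i \wedge \psi_j,\; t_i + v_j : i \le n, j \le m]$$

$$\mathrm{case}[\phi_i, t_i : i \le n] \otimes \mathrm{case}[\psi_j, v_j : j \le m] = \mathrm{case}[\phi_i \wedge \psi_j,\; t_i \cdot v_j : i \le n, j \le m].$$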
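The two operators can be sketched as a cross product over lists of (condition, value) pairs. This is only an illustration under simplifying assumptions: conditions are kept as opaque strings, no logical simplification or consistency checking is done, and the partitions and values below are made up.

```python
from itertools import product
from operator import add, mul

# A case expression is a list of (condition, value) pairs. Conditions are plain
# strings here because this sketch performs no logical reasoning on them.


def combine(case_a, case_b, op):
    """Cross product of two partitions, combining the case values with op."""
    return [
        (f"({phi}) & ({psi})", op(t, v))
        for (phi, t), (psi, v) in product(case_a, case_b)
    ]


def case_add(case_a, case_b):
    """The (+) operator: conjoin conditions and add values."""
    return combine(case_a, case_b, add)


def case_mul(case_a, case_b):
    """The (x) operator: conjoin conditions and multiply values."""
    return combine(case_a, case_b, mul)


# Hypothetical partitions with made-up values.
a = [("p", 10.0), ("~p", 0.0)]
b = [("q", 0.9), ("~q", 0.1)]
summed = case_add(a, b)
scaled = case_mul(a, b)
```

With $n$ and $m$ partitions the result has $n \cdot m$ partitions, some of which may be inconsistent; a real implementation would need to detect and remove those, which is one reason formula size is hard to control in this representation.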

In each iteration of the VI algorithm, the value of a stochastic action $A(\vec{x})$ parameterized with free variables $\vec{x}$ is determined in the following manner:

$$Q_{A(\vec{x})}(s) = rCase(s) \oplus \Big[\gamma \otimes \oplus_j \big(pCase(n_j(\vec{x}), s) \otimes Regr(n_j(\vec{x}), vCase(do(n_j(\vec{x}), s)))\big)\Big] \tag{2}$$

where $rCase(s)$ and $vCase(s)$ denote reward and value functions in case notation, $n_j(\vec{x})$ denotes the possible outcomes of the action $A(\vec{x})$, and $pCase(n_j(\vec{x}), s)$ the choice probabilities for $n_j(\vec{x})$. Note that we can replace a sum over possible next states in the standard value iteration (Equation 1) with a finite sum over the action alternatives (reflected in $\oplus_j$ in Equation 2), since different next states arise only through different action alternatives.
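The following sketch evaluates the right-hand side of Equation 2 on a single ground state, to show how the sum over next states reduces to a sum over deterministic action alternatives. It is a ground-level illustration only, not the symbolic computation described in the text. The predicate names `Bin`, `On`, `Tin`, the reward values, the discount factor, and the success probability are all assumptions made for the example.

```python
GAMMA = 0.9      # assumed discount factor
P_SUCCESS = 0.9  # assumed probability that an unload attempt succeeds


def reward(state):
    """Assumed reward: 10 if some box is in Paris, else 0."""
    return 10.0 if any(a[0] == "Bin" and a[2] == "paris" for a in state) else 0.0


def unload_success(state, box, truck):
    """Deterministic alternative: the box moves to the truck's city."""
    if ("On", box, truck) not in state:
        return state
    cities = [a[2] for a in state if a[0] == "Tin" and a[1] == truck]
    if not cities:
        return state
    return (state - {("On", box, truck)}) | {("Bin", box, cities[0])}


def unload_failure(state, box, truck):
    """Deterministic alternative: nothing changes."""
    return state


def q_unload(state, box, truck, value):
    """Reward plus discounted expectation over the two action alternatives."""
    alternatives = [(P_SUCCESS, unload_success), (1 - P_SUCCESS, unload_failure)]
    expected = sum(p * value(alt(state, box, truck)) for p, alt in alternatives)
    return reward(state) + GAMMA * expected


s = frozenset({("On", "b1", "t1"), ("Tin", "t1", "paris"), ("Bin", "b2", "boston")})
q = q_unload(s, "b1", "t1", value=reward)
```

Here the value function passed in is the reward function itself, which corresponds to one step to go.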

$Regr$, capturing goal regression, determines what states one must be in before an action in order to reach a particular state after the action. Figure 1 illustrates the regression of the condition that some box is in Paris in the reward function through the successful unload alternative. The condition will be true after the action if it was true before, or if the unloaded box was on the truck and the truck was in Paris. Notice how the reward function partitions the state space into two regions or abstract states, each of which may include an infinite number of complete world states (e.g., when we have an infinite number of domain objects). Also notice how we get another set of abstract states after the regression step. In this way first order regression ensures that we can work on abstract states and never need to propositionalize the domain.
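As a sanity check of the regression condition described above, the sketch below enumerates every ground state of a deliberately tiny instance (one box `b`, one truck `t`, one city) and confirms that the goal atom holds after a successful unload exactly when the regressed condition holds before it. The atom names and the effect of the action are assumptions chosen to match the prose, not the paper's formal action model.

```python
from itertools import product

# The three ground atoms of the tiny instance (names are illustrative).
ATOMS = [("Bin", "b", "paris"), ("On", "b", "t"), ("Tin", "t", "paris")]


def unload_success(state):
    """Assumed effect: if b is on t and t is in Paris, b ends up in Paris."""
    if ("On", "b", "t") in state and ("Tin", "t", "paris") in state:
        return (state - {("On", "b", "t")}) | {("Bin", "b", "paris")}
    return state


def regressed_condition(state):
    """Condition that must hold before the action, as stated in the text."""
    return ("Bin", "b", "paris") in state or (
        ("On", "b", "t") in state and ("Tin", "t", "paris") in state
    )


def regression_is_exact():
    """Compare the regressed condition with actual execution on all 8 states."""
    for bits in product([False, True], repeat=len(ATOMS)):
        state = frozenset(a for a, bit in zip(ATOMS, bits) if bit)
        holds_after = ("Bin", "b", "paris") in unload_success(state)
        if holds_after != regressed_condition(state):
            return False
    return True
```

The check enumerates states only to validate the condition; the point of first order regression is that the condition itself is derived without any such enumeration.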

After the regression, we get a parameterized $Q$-function which accounts for all possible instances of the action. We need to maximize over the action parameters of the $Q$-function to get the maximum value that could be achieved by using an instance of this action. To illustrate this step, consider the logistics example where we have two boxes $b_1$ and $b_2$, and $b_1$ is on truck $t_1$, which is in Paris, while $b_2$ is in Boston. For the unload action schema, we can instantiate its box and truck parameters with $b_1$ and $t_1$ respectively, which will help us achieve the goal; or we can instantiate them with $b_2$ and $t_1$ respectively, which will have no effect. Therefore we need to perform maximization over action parameters to get the best instance of an action. Yet, we must perform this maximization generically, without knowledge of the actual state. In SDP, this is done in several steps. First, we add existential quantifiers over action parameters (which leads to non-disjoint partitions). Then we sort the abstract states in $Q_{A(\vec{x})}$ by value in decreasing order and include the negated conditions for the first $n$ abstract states in the formula for the $(n+1)$th, ensuring mutual exclusion. Notice how this step leads to a complex description of the resulting state partitions in SDP. This process is performed for every action separately. We call this step object maximization.
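The effect of object maximization on one concrete state can be sketched by enumerating parameter bindings and keeping the best one. Note the assumption: SDP performs this step symbolically, without access to the state, whereas this sketch grounds the parameters purely for illustration. The Q-values and the atom names `On`, `Tin`, `Bin` are hypothetical.

```python
from itertools import product


def object_maximization(q_function, parameter_domains, state):
    """Return the best value and binding of a parameterized Q-function."""
    best_value, best_binding = float("-inf"), None
    for binding in product(*parameter_domains):
        value = q_function(state, *binding)
        if value > best_value:
            best_value, best_binding = value, binding
    return best_value, best_binding


def q_unload(state, box, truck):
    """Hypothetical Q-values: unloading helps only for a box on a truck in Paris."""
    helps = ("On", box, truck) in state and ("Tin", truck, "paris") in state
    return 8.1 if helps else 0.0


# The state from the text: b1 on t1, t1 in Paris, b2 in Boston.
state = {("On", "b1", "t1"), ("Tin", "t1", "paris"), ("Bin", "b2", "boston")}
best_value, best_binding = object_maximization(
    q_unload, [["b1", "b2"], ["t1"]], state
)
```

The binding with `b1` and `t1` is selected, matching the instantiation that the text identifies as helpful.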

Finally, to get the next value function we maximize over the $Q$-functions of different actions. These three steps provide one iteration of the VI algorithm which repeats the update until convergence.

The solutions of ReBel [Kersting, Otterlo, & De Raedt, 2004] and FOVIA [Großmann, Hölldobler, & Skvortsova, 2002; Hölldobler, Karabaev, & Skvortsova, 2006] follow the same outline but use a simpler logical language for representing RMDPs. An abstract state in ReBel is captured using an existentially quantified conjunction. FOVIA [Großmann, Hölldobler, & Skvortsova, 2002; Hölldobler, Karabaev, & Skvortsova, 2006] has a more complex representation allowing a conjunction that must hold in a state and a set of conjunctions that must be violated. An important feature in ReBel is the use of decision list [Rivest, 1987] style representations for value functions and policies. The decision list gives us an implicit maximization operator since rules higher on the list are evaluated first. As a result the object maximization step is very simple in ReBel. Each state partition is represented implicitly by the negation of all rules above it, and explicitly by the conjunction in the rule. On the other hand, regression in ReBel requires that one enumerate all possible matches between a subset of a conjunctive goal (or state partition) and action effects, and reason about each of these separately. So this step can potentially be improved.

In the following section we introduce a new representation – First Order Decision Diagrams (FODD). FODDs allow for sharing of parts of partitions, leading to space and time saving. More importantly the value iteration algorithm based on FODDs has both simple regression and simple object maximization.

## 3 First Order Decision Diagrams

A decision diagram is a graphical representation for functions over propositional (Boolean) variables. The function is represented as a labeled rooted directed acyclic graph where each non-leaf node is labeled with a propositional variable and has exactly two children. The outgoing edges are marked with values true and false. Leaves are labeled with numerical values. Given an assignment of truth values to the propositional variables, we can traverse the graph where in each node we follow the outgoing edge corresponding to its truth value. This gives a mapping from any assignment to a leaf of the diagram and in turn to its value. If the leaves are marked with values in $\{0, 1\}$ then we can interpret the graph as representing a Boolean function over the propositional variables. Equivalently, the graph can be seen as representing a logical expression which is satisfied if and only if the 1 leaf is reached. The case with $\{0, 1\}$ leaves is known as Binary Decision Diagrams (BDDs) and the case with numerical leaves (or more general algebraic expressions) is known as Algebraic Decision Diagrams (ADDs). Decision Diagrams are particularly interesting if we impose an order over propositional variables and require that node labels respect this order on every path in the diagram; this case is known as Ordered Decision Diagrams (ODD). In this case every function has a unique canonical representation that serves as a normal form for the function. This property means that propositional theorem proving is easy for ODD representations. For example, if a formula is contradictory then this fact is evident when we represent it as a BDD, since the normal form for a contradiction is a single leaf valued 0. This property together with efficient manipulation algorithms for ODD representations has led to successful applications, e.g., in VLSI design and verification [Bryant, 1992; McMillan, 1993; Bahar, Frohm, Gaona, Hachtel, Macii, Pardo, & Somenzi, 1993] as well as MDPs [Hoey, St-Aubin, Hu, & Boutilier, 1999; St-Aubin, Hoey, & Boutilier, 2000]. In the following we generalize this representation for relational problems.
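The traversal described above can be sketched in a few lines. The class and function names are illustrative, and the example diagram (the conjunction of two variables with leaves in $\{0, 1\}$) is a made-up instance rather than one from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Node:
    variable: str
    true_child: "Diagram"
    false_child: "Diagram"


Diagram = Union[Node, float]  # a leaf is represented by its numerical value


def evaluate(diagram: Diagram, assignment: Dict[str, bool]) -> float:
    """Follow the edge matching each variable's truth value until a leaf."""
    while isinstance(diagram, Node):
        if assignment[diagram.variable]:
            diagram = diagram.true_child
        else:
            diagram = diagram.false_child
    return diagram


# Hypothetical diagram for "x and y" with leaves in {0, 1}.
conj = Node("x", Node("y", 1.0, 0.0), 0.0)
```

Replacing the 0/1 leaves with arbitrary numbers turns the same structure into an ADD, and `evaluate` is unchanged.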

### 3.1 Syntax of First Order Decision Diagrams

There are various ways to generalize ADDs to capture relational structure. One could use closed or open formulas in the nodes, and in the latter case we must interpret the quantification over the variables. In the process of developing the ideas in this paper we have considered several possibilities including explicit quantifiers but these did not lead to useful solutions. We therefore focus on the following syntactic definition which does not have any explicit quantifiers.

For this representation, we assume a fixed set of predicates and constant symbols, and an enumerable set of variables. We also allow using an equality between any pair of terms (constants or variables).

###### Definition 1

First Order Decision Diagram

1. A First Order Decision Diagram (FODD) is a labeled rooted directed acyclic graph, where each non-leaf node has exactly two children. The outgoing edges are marked with values true and false.

2. Each non-leaf node is labeled with: an atom $P(t_1, \ldots, t_n)$ or an equality $t_1 = t_2$ where each $t_i$ is a variable or a constant.

3. Leaves are labeled with numerical values.
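One possible encoding of Definition 1 as a data structure is sketched below. The class names are assumptions made for illustration. Because the nodes are immutable and built bottom-up, the structure is acyclic by construction, and structurally equal sub-diagrams compare equal, so they can be treated as a single shared node.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[str, ...]  # each argument is a variable or a constant


@dataclass(frozen=True)
class Equality:
    left: str
    right: str


@dataclass(frozen=True)
class Inner:
    label: Union[Atom, Equality]
    true_child: "Fodd"
    false_child: "Fodd"


Fodd = Union[Inner, float]  # leaves carry numerical values


def count_inner_nodes(diagram: Fodd) -> int:
    """Count distinct non-leaf nodes; equal sub-diagrams are counted once."""
    seen = set()

    def visit(node):
        if isinstance(node, Inner) and node not in seen:
            seen.add(node)
            visit(node.true_child)
            visit(node.false_child)

    visit(diagram)
    return len(seen)


# Hypothetical diagram: p(x) at the root, q(x) below its true branch.
example = Inner(Atom("p", ("x",)), Inner(Atom("q", ("x",)), 1.0, 0.0), 0.0)
```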

Figure 2 shows a FODD with binary leaves. Left going edges represent true branches. To simplify diagrams in the paper we draw multiple copies of the leaves 0 and 1 (and occasionally other values or small sub-diagrams) but they represent the same node in the FODD.

We use the following notation: for a node , denotes the true branch of , and the false branch of ; is an outgoing edge from , where can be true or false. For an edge , is the node that edge issues from, and is the node that edge points to. Let and be two edges, we have iff .

In the following we will slightly abuse the notation and let mean either an edge or the sub-FODD this edge points to. We will also use and interchangeably where and can be true or false depending on whether lies in the true or false branch of .

### 3.2 Semantics of First Order Decision Diagrams

We use a FODD to represent a function that assigns values to states in a relational MDP. For example, in the logistics domain, we might want to assign values to different states in such a way that if there is a box in Paris, then the state is assigned a value of 19; if there is no box in Paris but there is a box on a truck that is in Paris and it is raining, this state is assigned a value of 6.3, and so on. (This is a result of regression in the logistics domain, cf. Figure 19(l).) The question is how to define the semantics of FODDs in order to have the intended meaning.

The semantics of first order formulas are given relative to interpretations. An interpretation has a domain of elements, a mapping of constants to domain elements and, for each predicate, a relation over the domain elements which specifies when the predicate is true. In the MDP context, a state can be captured by an interpretation. For example in the logistics domain, a state includes objects such as boxes, trucks, and cities, and relations among them, such as box 1 on truck 1, box 2 in Paris, and so on. There is more than one way to define the meaning of a FODD $B$ on an interpretation $I$. In the following we discuss two possibilities.

#### 3.2.1 Semantics Based on a Single Path

A semantics for relational decision trees is given by Blockeel and De Raedt (1998) and it can be adapted to FODDs. The semantics define a unique path that is followed when traversing $B$ relative to $I$. All variables are existential and a node is evaluated relative to the path leading to it.

In particular, when we reach a node some of its variables have been seen before on the path and some are new. Consider a node $n$ with label $l(n)$ and the path leading to it from the root, and let $C$ be the conjunction of all labels of nodes that are exited on the true branch on the path. Then in the node $n$ we evaluate $\exists \vec{x},\, C \wedge l(n)$, where $\vec{x}$ includes all the variables in $C$ and $l(n)$. If this formula is satisfied in $I$ then we follow the true branch. Otherwise we follow the false branch. This process defines a unique path from the root to a leaf and its value.

For example, consider evaluating the diagram in Figure 2 on an interpretation $I$ in which the atom at the root is satisfied, but the conjunction of the root atom with the atom at the next node on the true branch is not. We then follow the true branch at the root, but we follow the false branch at the next node. Since the leaf reached is labeled with 0 we say that $I$ does not satisfy $B$. This is an attractive approach, because it partitions the set of interpretations into mutually exclusive sets and this can be used to create abstract state partitions in the MDP context. However, for reasons we discuss later, this semantics leads to various complications for the value iteration algorithm, and it is therefore not used in the paper.
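The single path semantics can be sketched with a brute-force satisfiability check over all valuations. The diagram encoding (nested tuples with atoms of the form `(predicate, var, ...)`), the example diagram, and the interpretation are all hypothetical; constants and equalities are omitted to keep the sketch short.

```python
from itertools import product


def satisfiable(atoms, domain, true_atoms):
    """Is the existentially quantified conjunction of atoms true in I?"""
    names = sorted({v for atom in atoms for v in atom[1:]})
    for objs in product(domain, repeat=len(names)):
        val = dict(zip(names, objs))
        if all((a[0],) + tuple(val[v] for v in a[1:]) in true_atoms for a in atoms):
            return True
    return False


def single_path_value(diagram, domain, true_atoms):
    """Follow the unique path defined by the single path semantics."""
    path_atoms = []  # labels of nodes exited on the true branch so far
    while isinstance(diagram, tuple):
        atom, true_child, false_child = diagram
        if satisfiable(path_atoms + [atom], domain, true_atoms):
            path_atoms.append(atom)
            diagram = true_child
        else:
            diagram = false_child
    return diagram


# Hypothetical diagram: p(x) at the root, q(x) on its true branch.
B = (("p", "x"), (("q", "x"), 1, 0), 0)
value = single_path_value(B, [1, 2, 3], {("p", 1), ("q", 2)})
```

In this instance the root test succeeds because some object satisfies `p`, but no single object satisfies both `p` and `q`, so the false branch is taken at the second node and the value is 0.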

#### 3.2.2 Semantics Based on Multiple Paths

The second alternative builds on work by GrooteTv2003 who defined semantics based on multiple paths. Following this work, we define the semantics first relative to a variable valuation . Given a FODD over variables and an interpretation , a valuation maps each variable in to a domain element in . Once this is done, each node predicate evaluates either to true or false and we can traverse a single path to a leaf. The value of this leaf is denoted by .

Different valuations may give different values; but recall that we use FODDs to represent a function over states, and each state must be assigned a single value. Therefore, we next define

 MAPB(I)=aggregateζ{MAPB(I,ζ)}

for some aggregation function. That is, we consider all possible valuations , and for each valuation we calculate . We then aggregate over all these values. In the special case of GrooteTv2003 leaf labels are in and variables are universally quantified; this is easily captured in our formulation by using minimum as the aggregation function. In this paper we use maximum as the aggregation function. This corresponds to existential quantification in the binary case (if there is a valuation leading to value , then the value assigned will be ) and gives useful maximization for value functions in the general case. We therefore define:

$$\mathrm{MAP}_B(I) = \max_{\zeta}\{\mathrm{MAP}_B(I,\zeta)\}.$$

Using this definition, $B$ assigns every interpretation $I$ a unique value $\mathrm{MAP}_B(I)$, so $B$ defines a function from interpretations to real values. We later refer to this function as the map of $B$.

Consider evaluating the diagram in Figure 2 on the interpretation given above where the only true atoms are . The valuation where is mapped to and is mapped to 3 denoted leads to a leaf with value 1 so the maximum is 1. When leaf labels are in {0,1}, we can interpret the diagram as a logical formula. When $\mathrm{MAP}_B(I) = 1$, as in our example, we say that $I$ satisfies $B$, and when $\mathrm{MAP}_B(I) = 0$ we say that $I$ falsifies $B$.
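The max-aggregation semantics can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the `Leaf` and `Node` classes, the tuple encoding of ground atoms, and the example diagram are hypothetical choices made here (the diagram is not the one in Figure 2), and variables are the only argument type modeled. The brute-force enumeration of valuations is exponential in the number of variables and assumes a non-empty domain.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union


@dataclass(frozen=True)
class Leaf:
    value: float


@dataclass(frozen=True)
class Node:
    pred: str          # predicate name, e.g. "p"
    args: tuple        # variable names, e.g. ("x",); constants are not modeled
    true: "FODD"       # sub-diagram followed when the atom holds
    false: "FODD"      # sub-diagram followed otherwise


FODD = Union[Leaf, Node]


def variables(d):
    """All variable names appearing in the diagram."""
    if isinstance(d, Leaf):
        return set()
    return set(d.args) | variables(d.true) | variables(d.false)


def map_valuation(d, atoms, zeta):
    """MAP_B(I, zeta): follow the single path fixed by the valuation zeta.

    `atoms` is the set of true ground atoms, encoded as (pred, obj, ...).
    """
    while isinstance(d, Node):
        ground = (d.pred,) + tuple(zeta[a] for a in d.args)
        d = d.true if ground in atoms else d.false
    return d.value


def map_max(d, domain, atoms):
    """MAP_B(I): maximum of MAP_B(I, zeta) over all valuations zeta."""
    vs = sorted(variables(d))
    return max(
        map_valuation(d, atoms, dict(zip(vs, objs)))
        for objs in product(domain, repeat=len(vs))
    )


# Hypothetical diagram: p(x) ? (q(y) ? 1 : 0) : 0
diagram = Node("p", ("x",), Node("q", ("y",), Leaf(1.0), Leaf(0.0)), Leaf(0.0))
print(map_max(diagram, [1, 2, 3], {("p", 1), ("q", 3)}))
```

With leaf labels in {0,1}, `map_max` returning 1 corresponds to the interpretation satisfying the existentially quantified formula encoded by the diagram.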

We define node formulas (NF) and edge formulas (EF) recursively as follows. For a node $n$ labeled $l(n)$ with incoming edges $e_1,\ldots,e_k$, the node formula is $\mathrm{NF}(n) = \bigvee_i \mathrm{EF}(e_i)$. The edge formula for the true outgoing edge of $n$ is $\mathrm{NF}(n) \wedge l(n)$. The edge formula for the false outgoing edge of $n$ is $\mathrm{NF}(n) \wedge \neg l(n)$. These formulas, where all variables are existentially quantified, capture the conditions under which a node or edge is reached.

### 3.3 Basic Reduction of FODDs

Groote and Tveretina (2003) define several operators that reduce a diagram into normal form. A total order over node labels is assumed. We describe these operators briefly and give their main properties.

(R1) Neglect operator: if both children of a node $p$ in the FODD lead to the same node $q$ then we remove $p$ and link all parents of $p$ to $q$ directly.

(R2) Join operator: if two nodes $p$ and $q$ have the same label and point to the same two children then we can join $p$ and $q$ (remove $q$ and link $q$'s parents to $p$).

(R3) Merge operator: if a node and its child have the same label then the parent can point directly to the grandchild.

(R4) Sort operator: if a node $p$ is a parent of $q$ but the label ordering is violated ($l(p) \succ l(q)$) then we can reorder the nodes locally using two copies of $p$ and $q$ such that labels of the nodes do not violate the ordering.
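As a rough illustration of the first two operators, the sketch below applies neglect (R1) and join (R2) in a single bottom-up pass. The tuple encoding `("leaf", value)` / `("node", label, true_child, false_child)` and the function name are assumptions made for this example; merge (R3) and sort (R4) are not handled, and labels are treated as opaque strings.

```python
def reduce_r1_r2(d, unique=None):
    """Bottom-up pass applying neglect (R1) and join (R2).

    A diagram is either ("leaf", value) or ("node", label, true, false).
    `unique` maps each structurally distinct sub-diagram to one shared object.
    """
    if unique is None:
        unique = {}
    if d[0] == "leaf":
        return unique.setdefault(d, d)
    _, label, t, f = d
    t = reduce_r1_r2(t, unique)
    f = reduce_r1_r2(f, unique)
    if t == f:
        # R1 (neglect): both children lead to the same sub-diagram,
        # so the test at this node is irrelevant and the node is skipped.
        return t
    node = ("node", label, t, f)
    # R2 (join): nodes with the same label and the same two children
    # are represented by a single shared object.
    return unique.setdefault(node, node)


one, zero = ("leaf", 1), ("leaf", 0)
redundant = ("node", "p(x)", ("node", "q(y)", one, one), one)
print(reduce_r1_r2(redundant))
```

Because the unique table is keyed on structure, two separately built copies of the same sub-diagram come back as the same Python object, which is the sharing that the join operator describes.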

Define a FODD to be reduced if none of the four operators can be applied. We have the following:

###### Theorem 1

[Groote & Tveretina, 2003]
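(1) Let $O \in \{\text{Neglect}, \text{Join}, \text{Merge}, \text{Sort}\}$ be an operator and $O(B)$ the result of applying $O$ to FODD $B$; then for any $B$, $I$, and $\zeta$, $\mathrm{MAP}_B(I,\zeta) = \mathrm{MAP}_{O(B)}(I,\zeta)$.
(2) If $B_1, B_2$ are reduced and satisfy $\forall \zeta,\ \mathrm{MAP}_{B_1}(I,\zeta) = \mathrm{MAP}_{B_2}(I,\zeta)$, then they are identical.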

Property (1) gives soundness, and property (2) shows that reducing a FODD gives a normal form. However, this only holds if the maps are identical for every $\zeta$, and this condition is stronger than normal equivalence. This normal form suffices for Groote and Tveretina (2003), who use it to provide a theorem prover for first order logic, but it is not strong enough for our purposes. Figure 3 shows two pairs of reduced FODDs (with respect to R1-R4) such that $\mathrm{MAP}_{B_1}(I) = \mathrm{MAP}_{B_2}(I)$ but $\mathrm{MAP}_{B_1}(I,\zeta) \neq \mathrm{MAP}_{B_2}(I,\zeta)$ for some $\zeta$. In this case although the maps are the same the FODDs are not reduced to the same form. Consider first the pair in part (a) of the figure. An interpretation where is false but is true and a substitution leads to value of 0 in while always evaluates to 1. But the diagrams are equivalent. For any interpretation, if is true for any object then through the substitution ; if is false for any object then through the substitution . Thus the map is always 1 for as well. In Section 4.2 we show that with the additional reduction operators we have developed, $B_1$ in the first pair is reduced to . Thus the diagrams in (a) have the same form after reduction. However, our reductions do not resolve the second pair given in part (b) of the figure. Notice that both functions capture a path of two edges labeled in a graph (we just change the order of two nodes and rename variables) so the diagrams evaluate to 1 if and only if the interpretation has such a path. Even though $B_1$ and $B_2$ are logically equivalent, they cannot be reduced to the same form using R1-R4 or our new operators. To identify a unique minimal syntactic form one may have to consider all possible renamings of variables and the sorted diagrams they produce, but this is an expensive operation. A discussion of normal form for conjunctions that uses such an operation is given by Garriga, Khardon, and De Raedt (2007).

### 3.4 Combining FODDs

Given two algebraic diagrams we may need to add the corresponding functions, take the maximum, or use any other binary operation, op, over the values represented by the functions. Here we adopt the solution from the propositional case [Bryant, 1986] in the form of the procedure Apply($B_1$, $B_2$, op) where $B_1$ and $B_2$ are algebraic diagrams. Let $p$ and $q$ be the roots of $B_1$ and $B_2$ respectively. This procedure chooses a new root label (the lower among the labels of $p$ and $q$) and recursively combines the corresponding sub-diagrams, according to the relation between the two labels ($\prec$, $=$, or $\succ$). In order to make sure the result is reduced in the propositional sense, one can use dynamic programming to avoid generating nodes for which either the neglect or join operators ((R1) and (R2) above) would be applicable.

Figure 4 illustrates this process. In this example, we assume predicate ordering as , and parameter ordering . Non-leaf nodes are annotated with numbers and numerical leaves are underlined for identification during the execution trace. For example, the top level call adds the functions corresponding to nodes 1 and 3. Since is the smaller label it is picked as the label for the root of the result. Then we must add both left and right child of node 1 to node 3. These calls are performed recursively. It is easy to see that the size of the result may be the product of sizes of input diagrams. However, much pruning will occur with shared variables and further pruning is made possible by weak reductions presented later.
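The recursion described above follows the familiar propositional Apply scheme, and a minimal Python sketch may help make it concrete. The encoding is an assumption made for this example: a leaf is a plain number, an internal node is a tuple `(label, true_child, false_child)`, and Python's string comparison stands in for the label order. Input diagrams are assumed to be sorted with respect to that order, and the memo table plays the role of the dynamic programming mentioned in the text.

```python
import operator


def apply(a, b, op, memo=None):
    """Combine two ordered diagrams with the binary operation `op`."""
    if memo is None:
        memo = {}
    key = (a, b)
    if key in memo:
        return memo[key]
    a_leaf = not isinstance(a, tuple)
    b_leaf = not isinstance(b, tuple)
    if a_leaf and b_leaf:
        result = op(a, b)                      # two leaves: plain arithmetic
    else:
        if b_leaf or (not a_leaf and a[0] < b[0]):
            # root of `a` has the lower label: descend in `a` only
            label, at, af, bt, bf = a[0], a[1], a[2], b, b
        elif a_leaf or b[0] < a[0]:
            # root of `b` has the lower label: descend in `b` only
            label, at, af, bt, bf = b[0], a, a, b[1], b[2]
        else:
            # equal labels: descend in both diagrams
            label, at, af, bt, bf = a[0], a[1], a[2], b[1], b[2]
        t = apply(at, bt, op, memo)
        f = apply(af, bf, op, memo)
        # avoid creating a node whose children coincide (neglect, R1)
        result = t if t == f else (label, t, f)
    memo[key] = result
    return result


def evaluate(d, truth):
    """Follow one path, given a truth value for every label on it."""
    while isinstance(d, tuple):
        d = d[1] if truth[d[0]] else d[2]
    return d


a = ("p(x)", 2, 0)
b = ("q(y)", 3, 1)
print(apply(a, b, operator.add))
```

For a fixed valuation every label has a definite truth value, so `evaluate` on the combined diagram should agree with applying `op` to the two separate evaluations; this is the property the text calls the correctness of Apply.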

Since for any interpretation and any fixed valuation the FODD is propositional, we have the following lemma. We later refer to this property as the correctness of Apply.

###### Lemma 1

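Let $C = \mathrm{Apply}(A, B, op)$; then for any $I$ and $\zeta$, $\mathrm{MAP}_A(I,\zeta)\ op\ \mathrm{MAP}_B(I,\zeta) = \mathrm{MAP}_C(I,\zeta)$.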

Proof: First we introduce some terminology. Let refer to the set of all nodes in a FODD . Let the root nodes of and be and respectively. Let the FODDs rooted at , , , , , and be , , , , and respectively.

The proof is by induction on . The lemma is true for , because in this case both and have to be single leaves and an operation on them is the same as an operation on two real numbers. For the inductive step we need to consider two cases.

Case 1: . Since the root nodes are equal, if a valuation reaches , then it will also reach and if reaches , then it will also reach . Also, by the definition of Apply, in this case and . Therefore the statement of the lemma is true if and for any and . Now, since and , this is guaranteed by the induction hypothesis.

Case 2: . Without loss of generality let us assume that . By the definition of Apply, and . Therefore the statement of the lemma is true if and for any and . Again this is guaranteed by the induction hypothesis.

### 3.5 Order of Labels

The syntax of FODDs allows for two “types” of objects: constants and variables. Any argument of a predicate can be a constant or a variable. We assume a complete ordering on predicates, constants, and variables. The ordering between two labels is given by the following rules.

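1. $P(x_1,\ldots,x_n) \prec P'(x'_1,\ldots,x'_m)$ if $P \prec P'$.

2. $P(x_1,\ldots,x_n) \prec P(x'_1,\ldots,x'_n)$ if there exists $i$ such that $x_j = x'_j$ for all $j < i$, and $\mathrm{type}(x_i) \prec \mathrm{type}(x'_i)$ (where “type” can be constant or variable) or $\mathrm{type}(x_i) = \mathrm{type}(x'_i)$ and $x_i \prec x'_i$.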

While the predicate order can be set arbitrarily, it appears useful to assign the equality predicate as the first in the predicate ordering so that equalities are at the top of the diagrams. During reductions we often encounter situations where one side of the equality can be completely removed, leading to substantial space savings. It may also be useful to order the argument types so that constants precede variables. This ordering may be helpful for reductions. Intuitively, a variable appearing lower in the diagram can be bound to the value of a constant that appears above it. These are only heuristic guidelines and the best ordering may well be problem dependent. We later introduce other forms of arguments: predicate parameters and action parameters. The ordering for these is discussed in Section 6.
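One plausible reading of the ordering rules above is a lexicographic comparison: predicates first, then arguments position by position, comparing argument type before argument name. The sketch below encodes that reading as a sort key. The predicate list, the type ranks, and the use of alphabetical order within a type are all assumptions made for illustration; predicate and action parameters are not covered.

```python
# Hypothetical orderings: equality first, constants before variables.
PRED_ORDER = ["=", "on", "clear"]
TYPE_RANK = {"const": 0, "var": 1}


def label_key(pred, args):
    """Sort key for a node label.

    `args` is a list of (type, name) pairs, e.g. [("const", "a"), ("var", "x")].
    Names within the same type are compared alphabetically as a stand-in
    for the assumed complete ordering on constants and variables.
    """
    return (PRED_ORDER.index(pred), [(TYPE_RANK[t], name) for t, name in args])


def precedes(l1, l2):
    """True if label l1 comes strictly before label l2."""
    return label_key(*l1) < label_key(*l2)


labels = [
    ("on", [("var", "x"), ("var", "y")]),
    ("clear", [("var", "x")]),
    ("on", [("const", "a"), ("var", "x")]),
    ("=", [("var", "x"), ("var", "y")]),
]
for pred, args in sorted(labels, key=lambda l: label_key(*l)):
    print(pred, [name for _, name in args])
```

Changing `PRED_ORDER` or `TYPE_RANK` is enough to experiment with other heuristics, which fits the remark that the best ordering may be problem dependent.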

In our context, especially for algebraic FODDs, we may want to reduce the diagrams further. We distinguish strong reductions that preserve $\mathrm{MAP}_B(I,\zeta)$ for all $\zeta$ and weak reductions that only preserve $\mathrm{MAP}_B(I)$. Theorem 1 shows that R1-R4 given above are strong reductions. The details of our relational VI algorithm do not directly depend on the reductions used. Readers more interested in RMDP details can skip to Section 5 which can be read independently (except where reductions are illustrated in examples).

All the reduction operators below can incorporate existing knowledge on relationships between predicates in the domain. We denote this background knowledge by . For example in the Blocks World we may know that if there is a block on block then it is not clear: .

In the following when we define conditions for reduction operators, there are two types of conditions: the reachability condition and the value condition. We name reachability conditions by starting with P (for Path Condition) and the reduction operator number. We name conditions on values by starting with V and the reduction operator number.

### 4.1 (R5) Strong Reduction for Implied Branches

Consider any node such that whenever is reached then the true branch is followed. In this case we can remove and connect its parents directly to the true branch. We first present the condition, followed by the lemma regarding this operator.

(P5) : where are the variables in .

Let denote the operator that removes node and connects its parents directly to the true branch. Notice that this is a generalization of R3. It is easy to see that the following lemma is true:

###### Lemma 2

Let be a FODD, a node for which condition P5 holds, and the result of . Then for any interpretation and any valuation we have .

A similar reduction can be formulated for the false branch, i.e., if then whenever node is reached then the false branch is followed. In this case we can remove and connect its parents directly to the false branch.

Implied branches may simply be a result of equalities along a path. For example so we may prune if and are known to be true. Implied branches may also be a result of background knowledge. For example in the Blocks World if is guaranteed to be true when we reach a node labeled then we can remove and connect its parent to .

### 4.2 (R7) Weak Reduction Removing Dominated Edges

Consider any two edges and in a FODD whose formulas satisfy that if we can follow using some valuation then we can also follow using a possibly different valuation. If gives better value than then intuitively never determines the value of the diagram and is therefore redundant. We formalize this as reduction operator R7. (Footnote 4: We use R7 and skip the notation R6 for consistency with earlier versions of this paper. See further discussion in Section 4.2.1.)

Let , , and , where and can be true or false. We first present all the conditions for the operator and then follow with the definition of the operator.

(P7.1) : where are the variables in and the variables in .

(P7.2) : where are the variables that appear in both and , the variables that appear in but are not in , and the variables that appear in but are not in . This condition requires that for every valuation that reaches there is a valuation that reaches such that and agree on all variables that appear in both and .

(P7.3) : where are the variables that appear in both and , the variables that appear in but are not in , and the variables that appear in but are not in . This condition requires that for every valuation that reaches there is a valuation that reaches such that and agree on all variables that appear in both and .

(V7.1) : where is the minimum leaf value in , and the maximum leaf value in . In this case regardless of the valuation we know that it is better to follow and not .

(V7.2) : .

(V7.3) : all leaves in have non-negative values, denoted as . In this case for any fixed valuation it is better to follow instead of .

(V7.4) : all leaves in have non-negative values.

We define the operators as replacing with a constant that is between 0 and (we may write it as if ), and as dropping the node and connecting its parents to .

We need one more “safety” condition to guarantee that the reduction is correct:

(S1) : and the sub-FODD of remain the same before and after R7-replace and R7-drop. This condition says that we must not harm the value promised by . In other words, we must guarantee that is reachable just as before and the sub-FODD of is not modified after replacing a branch with . The condition is violated if is in the sub-FODD of , or if is in the sub-FODD of . But it holds in all other cases, that is when and are unrelated (one is not the descendant of the other), or is in the sub-FODD of , or is in the sub-FODD of , where are the negations of .

###### Lemma 3

Let be a FODD, and edges for which conditions P7.1, V7.1, and S1 hold, and the result of , where , then for any interpretation we have .

Proof: Consider any valuation that reaches . Then according to P7.1, there is another valuation reaching and by V7.1 it gives a higher value. Therefore, will never be determined by so we can replace with a constant between 0 and without changing the map.

###### Lemma 4

Let be a FODD, and edges for which conditions P7.2, V7.3, and S1 hold, and the result of , where , then for any interpretation we have .

Proof: Consider any valuation that reaches . By P7.2 there is another valuation reaching and and agree on all variables that appear in both and . Therefore, by V7.3 it achieves a higher value (otherwise, there must be a branch in with a negative value). Therefore according to maximum aggregation the value of will never be determined by , and we can replace it with a constant as described above.

Note that the conditions in the previous two lemmas are not comparable since P7.2 P7.1 and V7.1 V7.3. Intuitively when we relax the conditions on values, we need to strengthen the conditions on reachability. The subtraction operation is propositional, so the test in V7.3 implicitly assumes that the common variables in the operands are the same and P7.1 does not check this. Figure 5 illustrates that the reachability condition P7.1 together with V7.3, i.e., combining the weaker portions of conditions from Lemma 3 and Lemma 4, cannot guarantee that we can replace a branch with a constant. Consider an interpretation with domain and relations . In addition assume domain knowledge . So P7.1 and V7.3 hold for and . We have and . It is therefore not possible to replace with .

Sometimes we can drop the node completely with R7-drop. Intuitively, when we remove a node, we must guarantee that we do not gain extra value. The conditions for R7-replace can only guarantee that we will not lose any value. But if we remove the node , a valuation that was supposed to reach may reach a better value in ’s sibling. This would change the map, as illustrated in Figure 6. Notice that the conditions P7.1 and V7.1 hold for and so we can replace with a constant. Consider an interpretation with domain and relations . We have via valuation and via valuation . Thus removing is not correct.

Therefore we need the additional condition to guarantee that we will not gain extra value with node dropping. This condition can be stated as: for any valuation that reaches and thus will be redirected to reach a value in when is removed, there is a valuation that reaches a leaf with value . However, this condition is too complex to test in practice. In the following we identify two stronger conditions.

###### Lemma 5

Let be a FODD, and edges for which condition V7.2 hold in addition to the conditions for replacing with a constant, and the result of , then for any interpretation we have .

Proof: Consider any valuation reaching . As above its true value is dominated by another valuation reaching . When we remove the valuation will reach and by V7.2 the value produced is smaller than the value from . So again the map is preserved.

###### Lemma 6

Let be a FODD, and edges for which P7.3 and V7.4 hold in addition to conditions for replacing with a constant, and the result of , then for any interpretation we have .

Proof: Consider any valuation reaching . As above its value is dominated by another valuation reaching . When we remove the valuation will reach and by the conditions P7.3 and V7.4, the valuation will reach leaf of greater value in (otherwise there will be a branch in G leading to a negative value). So under maximum aggregation the map is not changed.

To summarize if P7.1 and V7.1 and S1 hold or P7.2 and V7.3 and S1 hold then we can replace with a constant. If we can replace and V7.2 or both P7.3 and V7.4 hold then we can drop completely.
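The summary above amounts to a small decision rule, sketched here in Python. The function names and the diagram encoding (a leaf is a number, a node is a tuple `(label, true_child, false_child)`) are assumptions made for this example. The reachability conditions P7.1 to P7.3 and the safety condition S1 require logical reasoning that is not implemented here, so they are passed in as precomputed booleans; only the leaf-value test, read here as "minimum leaf of the first target is at least the maximum leaf of the second", is computed.

```python
def leaf_values(d):
    """Collect all leaf values of a diagram."""
    if not isinstance(d, tuple):
        return [d]
    return leaf_values(d[1]) + leaf_values(d[2])


def check_v71(target_e1, target_e2):
    """Value test in the style of V7.1: min leaf of one >= max leaf of other."""
    return min(leaf_values(target_e1)) >= max(leaf_values(target_e2))


def r7_action(*, p71=False, p72=False, p73=False,
              v71=False, v72=False, v73=False, v74=False, s1=False):
    """Return "none", "replace", or "drop" according to the summary rule."""
    # Replacing with a constant needs S1 plus one matched pair of conditions.
    can_replace = s1 and ((p71 and v71) or (p72 and v73))
    if not can_replace:
        return "none"
    # Dropping the node additionally needs V7.2, or both P7.3 and V7.4.
    if v72 or (p73 and v74):
        return "drop"
    return "replace"


print(r7_action(p71=True, v71=True, s1=True))
print(r7_action(p71=True, v73=True, s1=True))
```

The second call mixes the weaker reachability condition with the weaker value condition and returns `"none"`, matching the observation in the text that this combination does not justify a replacement.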

In the following we provide a more detailed analysis of applicability and variants of R7.

#### 4.2.1 R6: A Special Case of R7

We have a special case of R7 when , i.e., and are siblings. In this context R7 can be considered to focus on a single node instead of two edges. Assuming that and , we can rewrite the conditions in R7 as follows.

(P7.1) : . This condition requires that if is reachable then is reachable.

(P7.2) : where are the variables that appear in both and , the variables that appear in but not in , and the variables in and not in or .

(P7.3) : where are the variables that appear in (since ), the variables that appear in but not in , and the variables in and not in or .

(V7.1) : .

(V7.2) : is a constant.

(V7.3) : all leaves in the diagram have non-negative values.

Conditions S1 and V7.4 are always true. We have previously analyzed this special case as a separate reduction operator named R6 [Wang, Joshi, & Khardon, 2007]. While this is a special case, it may still be useful to check for it separately before applying the generalized case of R7, as it provides large reductions and seems to occur frequently in example domains.

An important special case of R6 occurs when is an equality where is a variable that does not occur in the FODD above node . In this case, the condition P7.1 holds since we can choose the value of . We can also enforce the equality in the sub-diagram of . Therefore if V7.1 holds we can remove the node connecting its parents to and substituting for in the diagram . (Note that we may need to make copies of nodes when doing this.) In Section 4.4 we introduce a more elaborate reduction to handle equalities by taking a maximum over the left and the right children.

#### 4.2.2 Application Order

In some cases several instances of R7 are applicable. It turns out that the order in which we apply them is important. In the following, the first example shows that the order affects the number of steps needed to reduce the diagram. The second example shows that the order affects the final result.

Consider the FODD in Figure 7(a). R7 is applicable to edges and , and and . If we reduce in a top down manner, i.e., first apply R7 on the pair and , we will get the FODD in Figure 7(b), and then we apply R7 again on and , and we will get the FODD in Figure 7(c). However, if we apply R7 first on and thus getting Figure 7(d), R7 cannot be applied to and because will have negative leaves. In this case, the diagram can still be reduced. We can reduce by comparing and that is in the right part of FODD. We can first remove and get a FODD shown in Figure 7(e), and then use the neglect operator to remove . As we see in this example, applying one instance of R7 may render other instances not applicable or may introduce more possibilities for reductions, so in general we must apply the reductions sequentially. Wang (2007) develops conditions under which several instances of R7 can be applied simultaneously.

One might hope that repeated application of R7 will lead to a unique reduced result but this is not true. In fact, the final result depends on the choice of operators and the order of application. Consider Figure 8(a). R7 is applicable to edges and , and and . If we reduce in a top down manner, i.e., first apply R7 on the pair and , we will get the FODD in Figure 8(b), which cannot be reduced using existing reduction operators (including the operator R8 introduced below). However, if we apply R7 first on and we will get Figure 8(c). Then we can apply R7 again on and and get the final result Figure 8(d), which is clearly more compact than Figure 8(b). It is interesting that the first example seems to suggest applying R7 in a top down manner (since it takes fewer steps), while the second seems to suggest the opposite (since the final result is more compact). More research is needed to develop useful heuristics to guide the choice of reductions and the application order and in general develop a more complete set of reductions.

Note that we could also consider generalizing R7. In Figure 8(b), if we can reach then clearly we can reach or . Since both and give better values, we can safely replace with , thus obtaining the final result Figure 8(d). In theory we can generalize P7.1 as