Relational Linear Programs

10/12/2014 ∙ by Kristian Kersting, et al.

We propose relational linear programming, a simple framework for combining linear programs (LPs) and logic programs. A relational linear program (RLP) is a declarative LP template defining the objective and the constraints through the logical concepts of objects, relations, and quantified variables. This allows one to express the LP objective and constraints relationally for a varying number of individuals and relations among them without enumerating them. Together with a logical knowledge base, effectively a logical program consisting of logical facts and rules, it induces a ground LP. This ground LP is solved using lifted linear programming. That is, symmetries within the ground LP are employed to reduce its dimensionality, if possible, and the reduced program is solved using any off-the-shelf LP solver. In contrast to mainstream LP template languages such as AMPL, which feature a mixture of declarative and imperative programming styles, RLP's relational nature allows a more intuitive representation of optimization problems over relational domains. We illustrate this empirically by experiments on approximate inference in Markov logic networks using LP relaxations, on solving Markov decision processes, and on collective inference using LP support vector machines.


1 Introduction

Modern social and technological trends result in an enormous increase in the amount of accessible data, with a significant portion of the resources being interrelated in complex ways and having inherent uncertainty. Such data, which we may refer to as relational data, arise for instance in social network and media mining, natural language processing, open information extraction, the web, bioinformatics, and robotics, among others, and typically carry substantial social and/or business value if made amenable to computing machinery. It is therefore not surprising that probabilistic logical languages, see e.g.

[Getoor:2007aa, deraedt08springer, DeRaedt:2008aa, deraedt10srl]

and references in there for overviews, are currently provoking a lot of new AI research with tremendous theoretical and practical implications. By combining aspects of logic and probabilities — a dream of AI dating back to at least the late 1980s when Nils Nilsson introduced the term probabilistic logics 

[nilsson86] — they help to effectively manage both complex interactions and uncertainty in the data. Moreover, they have motivated novel forms of inference, since performing inference within these languages using traditional approaches is in principle rather costly: tractability comes at the price of either coarse approximations (often without any guarantees) or restrictions in the language. In essence, probabilistic logical models can be viewed as collections of building blocks, or templates, such as weighted clauses that are instantiated several times to construct a ground probabilistic model. If few templates are instantiated often, the resulting ground model is likely to exhibit symmetries. In his seminal paper [poole03], David Poole suggested exploiting these symmetries to speed up inference within probabilistic logic models. This has motivated an active field of research known as lifted probabilistic inference, see e.g. [kersting12faia] and references therein.

However, instead of looking at AI through the glasses of probabilities over possible worlds, we may also approach it using optimization. That is, we have a preference relation over possible worlds, and we want a best possible world according to the preference. The preference is often to minimize some objective function. Consider for example a typical machine learning user in action solving a problem for some data. She selects a model for the underlying phenomenon to be learned (choosing a learning bias), formats the raw data according to the chosen model, and then tunes the model parameters by minimizing some objective function induced by the data and the model assumptions. In the process of model selection and validation, the core optimization problem in the last step may be solved many times. Ideally, the optimization problem solved in the last step falls within a class of mathematical programs for which efficient and robust solvers are available. For example, linear, semidefinite and quadratic programs, found at the heart of many popular AI and learning algorithms, can be solved efficiently by commercial-grade software packages.

This is an instance of the declarative “model + solve” paradigm currently observed a lot in AI [geffner14], machine learning, and also data mining [gunsNR11]: instead of outlining how a solution should be computed, we specify what the problem is using some high-level modeling language and solve it using general solvers.

Unfortunately, however, today’s solvers for mathematical programs typically require that the mathematical program be presented in some canonical algebraic form, or they offer only a very restricted modeling environment. For example, a solver may require that a set of linear constraints be presented as a system of linear inequalities Ax ≤ b or that a semidefinite constraint be expressed as a linear matrix inequality. This may create severe difficulties for the user:

  1. Practical optimization involves more than just the optimization of an objective function subject to constraints. Before optimization takes place, effort must be expended to formulate the model. This process of turning the intuition that defines the model “on paper” into a canonical form can be quite cumbersome. Consider the following example from graph isomorphism, see e.g. [atserias13siam]. Given two graphs G and H, the LP formulation introduces a variable for every possible partial function mapping vertices of G to vertices of H. In this case, it is not a trivial task to even come up with a convenient linear indexing of the variables, let alone expressing the resulting equations as Ax ≤ b. Such conversions require the user to produce and maintain complicated matrix generation code, which can be tedious and error-prone. Moreover, the reusability of such code is limited, as relatively minor modifications of the equations can require large modifications of the code (for example, when the user decides to switch from variables over sets of vertices to variables over tuples of vertices). Ideally, one would like to separate the problem specification from the problem instance itself.

  2. Canonical forms are inherently propositional. By design they cannot model domains with a variable number of individuals and relations among them without enumerating all of them. As already mentioned, however, many AI tasks and domains are best modeled in terms of individuals and relations. Agents must deal with heterogeneous information of all types. Even more importantly, they must often build models before they know what individuals are in the domain and, therefore, before they know what variables exist. Hence modeling should facilitate the formulation of abstract, general knowledge.

To overcome these downsides, and triggered by the success of probabilistic logical languages, we show that optimization, too, is liftable to the relational level. Specifically, we focus on linear programs, which form the most tractable, best understood, and most widely used subset of mathematical programs in practice. Here, the objective is linear and the constraints involve linear (in)equalities only. Indeed, at the inference level within propositional models, considerable attention has already been paid to the link between probabilistic models and linear programs. This relation is natural since the MAP inference problem can be relaxed into a linear program. At the relational and lifted level, however, the link has not been established or explored yet.

Our main contribution is to establish this link, both at the language and at the inference level. We introduce relational linear programming, best summarized as follows:

The user describes a relational problem in a high-level, relational LP modeling language and — given a logical knowledge base (LogKB) encoding some individuals or rather data — the system automatically induces a symmetry-reduced LP that in turn can be solved using any off-the-shelf LP solver. Its main building blocks are relational linear programs (RLPs). They are declarative LP templates defined through the logical concepts of individuals, relations, and quantified variables, and they allow the user to express the LP objective and constraints about a varying number of individuals without enumerating them. Together with the LogKB referring to the individuals and relations, effectively a logical program consisting of logical facts and rules, an RLP induces a ground LP. This ground LP can be solved using any LP solver. Our illustrations of relational linear programming on several AI problems will showcase that relational programming can considerably ease the process of turning the “modeller’s form” — the form in which the modeler understands a problem, or actually a class of problems — into a machine-readable form, since we can now deal with a varying number of individuals and relations among them in a declarative way. We will show this for computing the optimal value function of Markov decision processes [littmanDK95], for approximate inference within Markov logic networks [richardson2006markov] using LP relaxations, as well as for collective classification [senNBGGE08].

In particular, as another contribution, we will showcase a novel approach to collective classification by relational linear programming. Say we want to classify people as having or not having cancer. In addition to the usual “flat” data about attributes of people, such as age, education, and smoking habits, we have access to the social network among the people, cf. Fig. 1. This allows us to model the influence of smoking habits among friends. Now imagine that we want to do the classification using support vector machines (SVMs) [vapnik1998statistical], which boils down to a quadratic optimization problem. Zhou et al. [zhou2002linear] have shown that the same problem can be modeled as an LP with only a small loss in generalization performance. Existing LP template languages, however, would require feature engineering to capture smoking habits among friends. In contrast, in an RLP one simply adds rules such as

saying that if two persons X and Y are friends and X smokes, then Y also smokes, at least passively. Moreover, as we will do in our experiments, we can formulate relational LP constraints to encode that objects that link to each other tend to be in the same class.
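In Prolog-style syntax such a rule might read smokes(Y) :- friend(X, Y), smokes(X), where the rendering and the variable names are ours. A minimal Python sketch of what grounding this rule over the network computes; all names here are illustrative only:

```python
# Forward-chain a hypothetical "passive smoking" rule over a small
# social network: if X and Y are friends and X smokes, then Y smokes too.
def passive_smoking_closure(friends, smokers):
    """friends: set of (x, y) pairs; smokers: initial set of smokers."""
    smokes = set(smokers)
    changed = True
    while changed:
        changed = False
        for x, y in friends:
            if x in smokes and y not in smokes:
                smokes.add(y)
                changed = True
    return smokes

friends = {("anna", "bob"), ("bob", "edward")}
print(sorted(passive_smoking_closure(friends, {"anna"})))
# ['anna', 'bob', 'edward']
```

The loop simply iterates the rule to a fixpoint, which is exactly what grounding the rule against a logical knowledge base of friend/2 facts would yield.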

(a)
name    age  education  smokes
anna    27   uni        +
bob     22   college    -
edward  25   college    -
frank   30   uni        -
gary    45   college    +
helen   35   school     +
iris    50   school     -
(b)
Figure 1: Example for collective inference. There are seven people in a social network. Each person is described in terms of three attributes. The class label “cancer” is not shown.

However, the benefits of relational linear programming go beyond modeling. Since RLPs consist of templates that are instantiated several times to construct a ground linear model, they are also likely to induce ground models that exhibit symmetries, and we will demonstrate how to detect and exploit them. Specifically, we will introduce lifted linear programming (LLP). It detects and eliminates symmetries in a linear program in quasilinear time. Unlike lifted probabilistic inference methods such as lifted belief propagation [singla08aaai, kersting09uai, ahmadi2013mlj], which work only with belief propagation approximations for probabilistic inference, LLP does not depend on any specific LP solver — it can be seen as simply reparametrizing the linear program. As our experimental results on several AI tasks will show, this can result in significant efficiency gains.
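The coarse equivalence relation at the heart of such symmetry detection can be illustrated by a naive color-refinement pass over the LP's coefficients. This is our illustration of the general idea, not the paper's algorithm, and the data layout (A as a coefficient dictionary) is an assumption:

```python
# Sketch: group LP variables that are interchangeable under color refinement.
# A is a dict {(i, j): coeff}, b maps constraints to rhs, c maps vars to costs.
def refine(A, b, c):
    vcol = {j: (c[j],) for j in c}            # initial variable colors
    ccol = {i: (b[i],) for i in b}            # initial constraint colors
    while True:
        new_v = {j: (vcol[j], tuple(sorted(
            (a, ccol[i]) for (i, jj), a in A.items() if jj == j)))
            for j in vcol}
        new_c = {i: (ccol[i], tuple(sorted(
            (a, new_v[j]) for (ii, j), a in A.items() if ii == i)))
            for i in ccol}
        if len(set(new_v.values())) == len(set(vcol.values())) and \
           len(set(new_c.values())) == len(set(ccol.values())):
            break                              # partition is stable
        vcol, ccol = new_v, new_c
    # variables with equal colors may be merged into one lifted variable
    groups = {}
    for j, col in vcol.items():
        groups.setdefault(col, []).append(j)
    return sorted(sorted(g) for g in groups.values())

# x1 and x2 appear symmetrically; x3 does not.
A = {(0, "x1"): 1, (0, "x2"): 1, (1, "x3"): 2}
b = {0: 4, 1: 8}
c = {"x1": 1.0, "x2": 1.0, "x3": 5.0}
print(refine(A, b, c))  # [['x1', 'x2'], ['x3']]
```

Variables ending up in the same group can be replaced by a single lifted variable, shrinking the LP before it is handed to a solver.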

We proceed as follows. After touching upon related work, we start off by reviewing linear programming and existing LP template languages in Section 3. Then, in Section 4, we introduce relational linear programming, both the syntax and the semantics. Afterwards, Section 5 shows how to detect and exploit symmetries in linear programs. Before touching upon directions for future work and concluding, we illustrate relational linear programming on several examples from machine learning and AI.

2 Related Work on Relational and Lifted Linear Programming

The present paper is a significant extension of the AISTATS 2012 conference paper [mladenov2012lifted]. It provides a much more concise development of lifted linear programming compared to [mladenov2012lifted] and the first coherent view on relational linear programming as a novel and promising way for scaling AI. To do so, it develops the first relational modeling language for LPs and illustrates it empirically. One of the advantages of the language is the closeness of its syntax to the mathematical notation of LP problems. This allows for a very concise and readable relational definition of linear optimization problems, supported by language elements from logic such as individuals, relations, and quantified variables. The (relational) algebraic formulation of a model does not contain any hints on how to process it. Indeed, several modeling languages for mathematical programming have been proposed. Examples of popular modeling languages are AMPL [fourer1987ampl], GAMS [brooke1992gams], AIMMS [blomvall1993aimms], and Xpress-Mosel [cirianiCH03]; see also [kuip93, fragniere2002, wallace2005] for surveys. Although they are declarative, they rely on imperative programming styles to define the index sets and data tables typically used to construct LP models. They do not employ logical concepts such as clauses and unification. Moreover, since index sets and data tables are closely related to the attributes and relations of relational database systems, there have also been proposals for “feeding” linear programs directly from relational database systems, see e.g. [mitra95, atamturk2000relational, farrellM05]. However, logic programming — which allows one to use e.g. compound terms — was not considered, and the resulting approaches do not provide a syntax close to the mathematical notation of linear programs.
Recently, Mattingley and Boyd [MattingleySB12] introduced CVXGEN, a software tool that takes a high-level description of a convex optimization problem family and automatically generates custom C code that compiles into a reliable, high-speed solver for the problem family. Again, concepts from logic programming were not used. Gordon et al. [gordonHD09, zawadzkiGP11] developed first-order programming (FOP), which combines the strengths of mixed-integer linear programming and first-order logic. In contrast to the present paper, however, they focused on first-order logical reasoning and not on specifying relationally, and solving efficiently, arbitrary linear programs. And none of these approaches has considered symmetries in LPs and how to detect and exploit them.

Indeed, detecting and exploiting symmetries within LPs is related to symmetry-aware approaches in (mixed-)integer programming. However, ILPs are vastly different from LPs in nature. Symmetries in ILP are used for pruning the symmetric branches of search trees; thus the dominant paradigm is to add symmetry-breaking inequalities, similar to what has been done for SAT and CSP [Sellmann05], see e.g. [Margot_2010] for a survey. In contrast, lifted linear programming achieves speed-ups by reducing the problem size. In linear programming one takes advantage of convexity and projects the LP into the fixed space of its symmetry group [Boedi13]. The projections we investigate in the present paper are similar in spirit. Until recently, discussions mostly concentrated on the case where the symmetry group of the ILP/LP consists of permutations, e.g. [Boedi10]. In such cases the problem of computing the symmetry group of the LP can be reduced to computing the colored automorphisms of a “coefficient” graph associated with the linear program, see e.g. [berthold2008, Margot_2010]. Moreover, the reduction of the LP in this case essentially consists of mapping variables to their orbits. Our approach subsumes this method, as we replace the orbits with a coarser equivalence relation which, in contrast to the orbits, is computable in quasilinear time. Going beyond permutations, Bödi and Herr [Boedi13] extend the scope of symmetry, showing that any invertible linear map which preserves the feasible region and objective of the LP may be used to speed up solving. While this setting offers more compression, the symmetry detection problem becomes even more difficult.

After the AISTATS conference paper [mladenov2012lifted], lifted (I)LP-MAP inference approaches for (relational) graphical models based on graph automorphisms and variants have been explored in several ways, which go beyond the scope of the present paper. We refer to [bui12arxive, noessner13aaai, mladenov14aistats, apsel14aaai].

3 Linear Programming

Linear programs, see e.g. [dantzig], have found wide application in the fields of operations research, where they are applied to problems like multicommodity flow and optimal resource allocation, and combinatorial optimization, where they provide the basis for many approximation algorithms for hard problems such as TSP. They have also found their way into machine learning and AI. Consider e.g. support vector machines (SVMs), which are among the most popular models for classification. Although they are traditionally formulated as quadratic optimization problems, there are also linear program (LP) formulations of SVMs, such as Zhou et al.’s LP-SVMs [zhou2002linear]. Many other max-margin approaches use LPs for inference as well, such as LP boosting [demiriz2002linear] or LP-based large-margin structure prediction [wang2009large]. LPs have also been used for the decoding step within column generation approaches to solving quadratic problem formulations of the collective classification task, see e.g. [kleinBS08, torkamaniL13] and references therein; however, these are not based on LP-SVMs or on relational programming. In probabilistic graphical models, LP relaxations are used for efficient approximate MAP inference, see e.g. [wainwright2008graphical]. Finding the optimal policy for a Markov decision problem can be formulated and solved with LPs [syed2008apprenticeship]. Likewise, LPs have been used for inverse reinforcement learning [ng2000algorithms], where the goal is to estimate a reward function from observed behaviour. In addition, recent work uses approximate LPs for relational MDPs [sannerB09], which scale to problems of previously prohibitive size by avoiding grounding; however, they do not use relational LPs. Clustering can also be formulated via LPs, see e.g. [komodakis2008clustering]. Ataman applied LPs to learning optimal rankings for binary classification problems [ataman2007learning]. In many cases, even if a learning problem itself is not posed as an LP, linear programming is used to solve some intermediate steps. For instance, Sandler et al. [sandler2005use] phrase computing the pseudoinverse of a matrix and greedy column selection from this pseudoinverse as LPs. The resulting matrix is then used for dimensionality reduction and unsupervised text classification. So what are linear programs?

3.1 Linear Programs

A linear program (LP) is an optimization problem that can be expressed in the following general form:

minimize ⟨c, x⟩
subject to Ax ≤ b, Cx = d,

where A and C are matrices, b and d are real vectors of dimension m and k respectively, c is a real cost vector, and ⟨c, x⟩ denotes the inner product of two vectors. Note that the equality constraints can be reformulated as inequality constraints to yield an LP in the so-called inequality form,

minimize ⟨c, x⟩                    (1)
subject to Ax ≤ b,

which we represent by the triplet (A, b, c).

While LP models often look intuitive on paper, applying them in practice presents a challenge. The main issue is that the form of a problem representation that is natural for most solvers (i.e., the (A, b, c) triplet representation) is not the form that is natural for domain experts. Furthermore, the matrix A is typically sparse, i.e., it has mostly zero entries. Modeling any real-world problem in this form can be quite error-prone and time consuming. In addition, it is often necessary to separate the general structure of a problem from the definition of a particular instance. For example, a flow problem formulated as an LP consists of a set of constraints for each edge in a graph, which do not depend on the particular graph. Hence, the definition of the flow LP can be separated from the specification of the graph at hand and, in turn, be applied to different graphs.
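To make the triplet representation concrete, here is a toy, stdlib-only sketch that minimizes c·x subject to Ax ≤ b for two variables by enumerating candidate vertices of the feasible polytope. This is fine for illustration but hopeless at scale; real applications would of course call an off-the-shelf LP solver:

```python
from itertools import combinations

def solve_lp(A, b, c):
    """Minimize c.x s.t. A x <= b for a 2-variable LP by vertex enumeration."""
    n = len(c)  # here n == 2
    best = None
    for rows in combinations(range(len(A)), n):
        # solve the 2x2 system A[rows] x = b[rows] by Cramer's rule
        (a11, a12), (a21, a22) = A[rows[0]], A[rows[1]]
        det = a11 * a22 - a12 * a21
        if det == 0:
            continue
        x = [(b[rows[0]] * a22 - a12 * b[rows[1]]) / det,
             (a11 * b[rows[1]] - b[rows[0]] * a21) / det]
        # keep the vertex only if it satisfies all constraints
        if all(sum(ai * xi for ai, xi in zip(row, x)) <= bi + 1e-9
               for row, bi in zip(A, b)):
            val = sum(ci * xi for ci, xi in zip(c, x))
            if best is None or val < best[0]:
                best = (val, x)
    return best

# minimize -x - y  s.t.  x <= 1, y <= 1, -x <= 0, -y <= 0
A = [[1, 0], [0, 1], [-1, 0], [0, -1]]
b = [1, 1, 0, 0]
print(solve_lp(A, b, [-1, -1]))  # optimum at (1, 1) with value -2
```

The point of the sketch is only that once a problem is in (A, b, c) form, solving it is mechanical; the hard part, as argued above, is getting it into that form.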

3.2 Declarative Modelling Languages for Linear Programs

The problems above are traditionally solved by modelling languages. They simplify LP definition by allowing the use of algebraic notation instead of matrices and by defining an objective and constraints through parameters whose domains are defined in a separate file, thus enabling model/instance separation. Starting from Eq. 1, they typically make the involved arithmetic expressions explicit:

minimize Σ_{j ∈ P} c_j x_j                    (2)
subject to Σ_{j ∈ P} a_{ij} x_j ≤ b_i  for all i ∈ K,

where the sets P and K as well as the corresponding non-zero entries of the vectors b and c and the matrix A = (a_{ij}) are defined in a separate file. This simplified representation is then automatically translated to the matrix format and fed to a solver of the user's preference.
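The grounding step such languages perform, from "set form" data to the matrix triplet, can be sketched in a few lines; the function name and data layout are ours:

```python
# Build the dense (A, b, c) triplet from "set form" data: index sets P and K
# plus sparse coefficient dictionaries, as a modeling language would do.
def ground_set_form(P, K, a, c, b):
    jdx = {j: n for n, j in enumerate(sorted(P))}  # column index per j in P
    idx = {i: n for n, i in enumerate(sorted(K))}  # row index per i in K
    A = [[0.0] * len(P) for _ in K]
    for (i, j), v in a.items():
        A[idx[i]][jdx[j]] = v                      # fill non-zero entries
    cvec = [c.get(j, 0.0) for j in sorted(P)]
    bvec = [b.get(i, 0.0) for i in sorted(K)]
    return A, bvec, cvec

A, bvec, cvec = ground_set_form(
    P={"x", "y"}, K={1}, a={(1, "x"): 2.0}, c={"x": 1.0}, b={1: 4.0})
print(A, bvec, cvec)  # [[2.0, 0.0]] [4.0] [1.0, 0.0]
```

Note how the sparse dictionary only lists the non-zero coefficients, exactly as the separate data file of a modeling language would.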

To code the LP in this “set form”, several mathematical programming modelling languages have been proposed that implement this general idea. According to NEOS solver statistics (http://www.neos-server.org/neos/report.html; accessed on April 19, 2014), AMPL is the most popular one. We only briefly review the basic AMPL concepts; for more details, we refer to [fourer1987ampl].

Based on the “set form”, an LP can be written in AMPL as shown in Fig. 2.

 1  set P;   # column dimension of A
 2  set K;   # row dimension of A
 3  param a {i in K, j in P};   # provided as input
 4  param c {j in P};           # provided as input
 5  param b {i in K};           # provided as input
 6  var x {j in P};             # determined by the solver
 7
 8  # the objective
 9  minimize: sum {j in P} c[j] * x[j];
10  # the constraints
11  subject to {i in K}: sum {j in P} a[i,j]*x[j] <= b[i];
Figure 2: AMPL declaration scheme for a linear program in “set form” as shown in Eq. 2.

In principle an AMPL program consists of one objective and a number of ground or indexed constraints. If a constraint is indexed (e.g., the constraint in the example above is indexed by the set K), a ground constraint is generated for every combination of values of the index variables (in the example above there is just one index variable in the constraint, hence a ground constraint is generated for every value in K). The keyword set declares a set name, whose members are provided in a separate file. The keyword param declares a parameter, which may be a single scalar value or a collection of values indexed by a set. Subscripts in algebraic notation are written in square brackets, as in b[i] instead of b_i. The values to be determined by the solver are defined by the var keyword. The typical Σ symbol is replaced by the sum keyword. The key element of the AMPL system is the so-called indexing expression, such as {j in P}. In addition to being part of variable/parameter declarations, indexing expressions serve both as limits for sums and as indices for constraints. Finally, comments in AMPL start with the symbol #.
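How an indexing expression multiplies out into ground constraints can be mimicked directly; a hypothetical sketch that renders one ground constraint per index value, as {i in K} would:

```python
# Each indexed constraint template yields one ground constraint per member
# of its index set, mirroring AMPL's {i in K} indexing expression.
def ground_constraints(K, P):
    """Render the ground constraints of: {i in K}: sum {j in P} a[i,j]*x[j] <= b[i]."""
    return [" + ".join(f"a[{i},{j}]*x[{j}]" for j in P) + f" <= b[{i}]"
            for i in K]

for con in ground_constraints([1, 2], ["p", "q"]):
    print(con)
# a[1,p]*x[p] + a[1,q]*x[q] <= b[1]
# a[2,p]*x[p] + a[2,q]*x[q] <= b[2]
```

The grounding is purely mechanical: the template is copied once per index value, which is why symmetric templates tend to produce symmetric ground programs.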

In relational linear programs, which we will introduce next, we effectively mix first-order logic into AMPL. This allows us to keep the benefits that make AMPL the number one choice for optimization experts and at the same time enables the representation of relational problems.

4 Relational Linear Programming

The main idea of relational linear programming is to parameterize AMPL’s arithmetic expressions with logical variables and to replace AMPL’s indexing expressions by queries to a logical knowledge base. Before showing how to do this, let us briefly review logic programming. For more details we refer to [lloyd1987foundations, flach94, deraedt08springer].

4.1 Logic Programming

A logic program is a set of clauses constructed using four types of symbols: constants, variables, functors, and predicates. Reconsider the collective classification example from the introduction, see also Fig. 1, and in particular the “passive smoking” rule. Formally speaking, friend/2 and smokes/1 are predicates (with their arity, i.e., number of arguments, listed explicitly). The symbols anna, bob, edward, frank, gary, helen, iris are constants, and X and Y are variables. All constants and variables are also terms. In addition, one can also have structured terms, which consist of a functor applied to a number of terms. Atoms are predicate symbols followed by the necessary number of terms, e.g., friend(anna, bob), smokes(X), smokes(anna), etc. Literals are atoms (positive literals) and their negations (negative literals). We are now able to define the key concept of a clause. Clauses are formulas of the form

h :- b_1, …, b_n, where h — the head — and the b_i — the body — are logical atoms and all variables are understood to be universally quantified. For instance, the clause smokes(Y) :- friend(X, Y), smokes(X)

can be read as: Y smokes if X and Y are friends and X smokes. Clauses with an empty body are facts. A logic program consists of a finite set of clauses. The set of variables in a term, atom, conjunction, or clause E is denoted as vars(E), e.g., vars(friend(X, anna)) = {X}. A term, atom, or clause E is ground when no variable occurs in it, i.e., vars(E) = ∅. A clause is range-restricted when all variables in its head also appear in its body.

A substitution θ, e.g., θ = {X/anna}, is an assignment of terms to variables. Applying a substitution θ to a term, atom, or clause e yields the instantiated term, atom, or clause eθ, where all occurrences of the variables are simultaneously replaced by the corresponding terms; e.g., smokes(X){X/anna} is smokes(anna).

The Herbrand base of a logic program P, denoted as hb(P), is the set of all ground atoms constructed with the predicate, constant, and function symbols in the alphabet of P. A Herbrand interpretation for a logic program P is a subset of hb(P). A Herbrand interpretation I is a model of a clause h :- b_1, …, b_n if and only if for all substitutions θ such that {b_1θ, …, b_nθ} ⊆ I holds, it also holds that hθ ∈ I. A clause (logic program) entails another clause (logic program), denoted as ⊨, if and only if each model of the former is also a model of the latter.

The least Herbrand model, which constitutes the semantics of the logic program P, consists of all facts f such that P logically entails f, i.e., P ⊨ f. Answering a query q with respect to a logic program P is to determine whether the query is entailed by the program or not, that is, whether q is true in all worlds where P is true. This is often done by refutation: P entails q iff P ∧ ¬q is unsatisfiable.
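For function-free programs, the least Herbrand model can be computed bottom-up by naive forward chaining. The following sketch uses tuples as ground atoms and treats uppercase arguments as variables; this encoding is ours, not the paper's:

```python
# Naive bottom-up evaluation: apply every ground instance of every clause
# until no new fact is derived; the fixpoint is the least Herbrand model.
from itertools import product

def least_herbrand_model(facts, clauses, constants):
    """clauses: list of (head, body) with atoms like ('friend', 'X', 'Y');
    uppercase arguments are variables, lowercase ones are constants."""
    model = set(facts)
    def subst(atom, theta):
        return (atom[0],) + tuple(theta.get(t, t) for t in atom[1:])
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            vs = sorted({t for atom in body + [head]
                         for t in atom[1:] if t.isupper()})
            for binding in product(constants, repeat=len(vs)):
                theta = dict(zip(vs, binding))
                if all(subst(a, theta) in model for a in body):
                    h = subst(head, theta)
                    if h not in model:
                        model.add(h)
                        changed = True
    return model

facts = {("friend", "anna", "bob"), ("smokes", "anna")}
rule = (("smokes", "Y"), [("friend", "X", "Y"), ("smokes", "X")])
m = least_herbrand_model(facts, [rule], ["anna", "bob"])
print(("smokes", "bob") in m)  # True
```

Enumerating all bindings is exponential in the number of clause variables; real logic programming systems use unification and indexing instead, but the fixpoint computed is the same.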

Logic programming is especially convenient for representing relational data like the social graph in Fig. 1. All one needs is the binary predicate friend/2 to encode the edges in the social graph, as well as predicates to encode the attributes of the people in the social network.

4.2 Relational Linear Programs

Since our language can be seen as a logic programming variant of AMPL, we introduce its syntax in contrast to the AMPL syntax. To do so, let us consider a well-known network flow problem [ahuja1993network]. Given a finite directed graph in which every edge (u, v) has a non-negative capacity cap(u, v), and two vertices s and t called source and sink, the problem is to maximize the total flow out of the source node, subject to the constraints that the flow along each edge is non-negative and does not exceed the edge's capacity, and that incoming flow equals outgoing flow for internal vertices (conservation of flow). Such flow problems can naturally be formulated as an LP specified in AMPL as shown in Fig. 3.

 1  set VERTEX;                                                      # vertices
 2  set EDGES within (VERTEX diff {sink}) cross (VERTEX diff {source}); # edges
 3
 4  param source symbolic in VERTEX;                    # entrance to the graph
 5  param sink symbolic in VERTEX, <> source;           # exit from the graph
 6  param cap {EDGES} >= 0;                             # flow capacities
 7
 8  var Flow {(i,j) in EDGES} >= 0, <= cap[i,j];        # flows
 9
10  maximize: sum {(source,j) in EDGES} Flow[source,j]; # objective
11
12  subject to {k in VERTEX diff {source,sink}}:        # conservation of flow
13    sum {(i,k) in EDGES} Flow[i,k] = sum {(k,j) in EDGES} Flow[k,j];
Figure 3: AMPL specification for a general flow linear program. The vertices and edges as well as the corresponding flow capacities are provided in separate files. The flows of the edges are declared to be determined by the LP solver.

The program starts with a definition of all sets, parameters, and variables that appear in it. They are then used to define the objective and the constraints of the network flow problem, in particular the third one. The first two constraints are incorporated into the variable definition. As one can see, AMPL allows one to write down the problem description in a declarative way. It frees the user from engineering instance-specific LPs while capturing the general properties of the problem class at hand. However, AMPL does not provide logically parameterized definitions for the arithmetic expressions and for the index sets. RLPs, which we will introduce now, feature exactly this.
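The model/instance separation AMPL provides can be emulated by a generator that emits one conservation constraint per internal vertex of a concrete graph; names are illustrative:

```python
# Model/instance separation: the conservation constraints are generated
# from the graph instance, one per internal vertex (cf. the AMPL model).
def conservation_constraints(vertices, edges, source, sink):
    cons = []
    for k in vertices:
        if k in (source, sink):
            continue                      # no conservation at source/sink
        inflow = [f"Flow[{i},{k}]" for (i, j) in edges if j == k]
        outflow = [f"Flow[{k},{j}]" for (i, j) in edges if i == k]
        cons.append(" + ".join(inflow) + " = " + " + ".join(outflow))
    return cons

edges = [("s", "a"), ("a", "t")]
print(conservation_constraints(["s", "a", "t"], edges, "s", "t"))
# ['Flow[s,a] = Flow[a,t]']
```

The same model function applies unchanged to any graph instance, which is exactly the separation the text describes.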

A first important thing to notice is that AMPL mimics arithmetic notation in its syntax as much as possible. It operates on sets, intersections of sets, and arithmetic expressions indexed by these sets. Our language for relational linear programming effectively replaces these constructs with logical predicates, clauses, and queries to define the three main parts of an RLP: the objective template, the constraint templates, and a logical knowledge base. An RLP for the flow example is shown in Fig. 4. It directly codes the flow constraints, concisely captures the essence of flow problems, and illustrates nicely that linear programming in general can be viewed as being highly relational in nature. Let us now discuss this program line by line.

 1  var flow/2;             # the flow along edges is determined by the solver
 2
 3  outflow(X) = sum {edge(X, Y)} flow(X, Y);           # outflow of nodes
 4  inflow(Y)  = sum {edge(X, Y)} flow(X, Y);           # inflow of nodes
 5
 6  maximise: sum {source(X)} outflow(X);               # objective
 7
 8  subject to {vertex(X), not source(X), not sink(X)}: # conservation of flow
 9    outflow(X) - inflow(X) = 0;
10  subject to {edge(X, Y)}: cap(X, Y) - flow(X, Y) >= 0;  # capacity bound
11  subject to {edge(X, Y)}: flow(X, Y) >= 0;           # no negative flows
Figure 4: Relational encoding the general flow LP. For details we refer to the main text.

Predicates define variables and parameters in the LP. In the flow example, flow/2 captures the flows between nodes. The sets that explicitly define domains in AMPL are discarded, and parameter/variable domains are defined implicitly. In contrast to logic, (ground) atoms can take any numeric value, not just true or false. For instance, the capacity between nodes is captured by cap/2, and the specific capacity between nodes f and t could take the value 3.7, i.e., cap(f, t) = 3.7. Generally, atoms are parameters of the LP. To declare that they are values to be determined by the solver, we follow AMPL's notation,

 1  var flow/2;

The in- and outflows per node, inflow/1 and outflow/1, are defined within the RLP. They are the sums of all flows into, respectively out of, a node. To do so, we use logically parameterized equations, or par-equations in short. A par-equation is a finite-length expression of the form

E1 = E2,

where E1 and E2 are par-expressions of the form

e1 ∘1 e2 ∘2 ... ∘k-1 ek

of finite length. Here the ei are numeric constants, atoms, or par-expressions, and the ∘i are arithmetic operators. The term sum {Q} E, which is optional, essentially implements the AMPL aggregation sum, but now indexed over a logical query Q. That is, the AMPL indexing expression {j in P} for the aggregation is turned into an indexing over all tuples in the answer set of the logical query. Essentially, one can think of this as calling the Prolog meta-predicate

setof(Vars, Q, P),

treating the query Q as a conjunction of atoms. This will produce the set P of all substitutions of the variables Vars (with any duplicates removed) such that the query is satisfied. In case we are interested in multi-sets, i.e., to express counts, one may use bagof/3 instead of setof/3. The sum aggregation and the involved par-expression are then evaluated over the resulting multidimensional index P. If no sum is provided, this is just logical indexing for the evaluation of the par-expression. Finally, we note that all par-equalities are implicitly assumed to be universally quantified. That is, they may lead to several ground instances, in particular if there are free variables in the logical query Q.
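To make this concrete, here is a small Python sketch (not the authors' implementation; the fact base and flow values are hypothetical) of how such a sum aggregation is evaluated over a duplicate-free, setof-style answer set of the indexing query:

```python
# Evaluating sum {edge(X, Y)} flow(X, Y): answer the indexing query against
# a fact base, remove duplicates (setof-style), and sum the par-expression
# over the resulting substitutions.

edge = {("s", "a"), ("s", "b"), ("a", "t")}                  # answer set of edge/2
flow = {("s", "a"): 1.5, ("s", "b"): 0.5, ("a", "t"): 1.5}   # hypothetical values

def sum_over(query_answers, expr):
    """sum {Q} E: iterate over the (duplicate-free) answer set of Q."""
    return sum(expr(*theta) for theta in set(query_answers))

# outflow(s) = sum {edge(s, Y)} flow(s, Y)
outflow_s = sum_over({(x, y) for (x, y) in edge if x == "s"},
                     lambda x, y: flow[(x, y)])
print(outflow_s)  # 2.0
```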

With this at hand, we can define inflow/1 and outflow/1 as follows

3  outflow(X) = sum {edge(X, Y)} flow(X, Y);
4  inflow(Y)  = sum {edge(X, Y)} flow(X, Y);

Since Y is bound by the summation for outflow/1, and X by the summation for inflow/1, this says that there are two equality expressions per node X, one for the outflow and one for the inflow, each summing over all flows along the outgoing respectively incoming edges of the node. Note that edge/2 is not defined in the flow RLP itself. In such cases, we assume the predicate to be defined within a logical knowledge base LogKB (see below), represented as a logic program.

Similarly, we can now define the objective using a par-expression. (For the sake of simplicity, we here assume that there is exactly one source and one sink vertex. If one wants to enforce this, one can simply add it as a logical constraint within the selection query, resulting in an empty objective if there are several source or sink nodes. In AMPL, one would use additional check statements to express such restrictions, which cannot be expressed using simple inequalities.)

6  maximise: sum {source(X)} outflow(X);

This says that we want to maximize the total outflow of all source nodes. Note that we assume that all variables appear within the sum statement to avoid producing multiple and conflicting objectives.

Next, we define the constraints. Again we use par-equations, or rather par-inequalities. Par-inequalities are like par-equations but with inequalities in place of equalities. For the flow example they are:

8  subject to {vertex(X), not source(X), not sink(X)}:
9    outflow(X) - inflow(X) = 0;
10 subject to {edge(X, Y)}: cap(X, Y) - flow(X, Y) >= 0;
11 subject to {edge(X, Y)}: flow(X, Y) >= 0;

This again illustrates the power of RLPs. Since indexing expressions are logical queries, we can naturally express things that either look cumbersome in AMPL or even go beyond its capabilities. For instance, the concept of an internal node, which is explicitly represented by a lengthy expression with set intersections and differences in AMPL, is implicitly represented by the combination of the indexing query vertex(X), not source(X), not sink(X) and the par-equations defining in- and outflow.






Figure 5: A graph for a particular instance of the flow problem. The node s denotes the source, and t the sink. The numbers associated with the edges are the flow capacities.

Finally, as already mentioned, everything not defined in the RLP is assumed to be defined in an external logical knowledge base LogKB. It includes groundings of parameter atoms and definitions of intensional predicates (clauses). We here use Prolog but assume that each query from the RLP produces a finite set of answers, i.e., ground substitutions of its logical variables. For instance the LogKB for the instance of the flow problem shown in Fig. 5 can be expressed as follows:

cap(s,a) = 4.  cap(s,b) = 2.  cap(a,c) = 3.  cap(b,c) = 2.
cap(b,d) = 3.  cap(c,b) = 1.  cap(b,t) = 2.  cap(d,t) = 4.
edge(X,Y) :- cap(X,Y).
vertex(X) :- edge(X,_).
vertex(X) :- edge(_,X).
source(s).
sink(t).

where cap(s, a) = 4 is short-hand notation for cap(s,a,4) and cap(X, Y) for cap(X,Y,_), where '_' denotes an anonymous variable. The predicates edge and vertex are defined intensionally using logical clauses. By default, intensional predicates take value 1 or 0 (corresponding to true and false) within the RLP. Values of intensional predicates are computed before grounding the RLP.

Indeed, querying for a vertex would result in a multi-set, since the definition of vertex/1 tests whether a node occurs as the source or the target of an edge. However, recall that a logical indexing statement {...} removes duplicates first. Consequently, we could have directly used the capacity statements cap/2 to define edges and vertices:

8  subject to {cap(X,_), not source(X), not sink(X)}:
9    outflow(X) - inflow(X) = 0;
10 subject to {cap(X, Y)}: cap(X, Y) - flow(X, Y) >= 0;
11 subject to {cap(X, Y)}: flow(X, Y) >= 0;

This works since the logical indexing statement {...} produces a set of variable bindings, so the query cap(X, _) to a knowledge base will return each binding of X only once (note that we do not allow multi-set indexing here, as it would only generate redundant constraints). And, due to the use of the anonymous variable _, the values of the corresponding argument are not included in the query results.
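In Python terms (a sketch; the set type plays the role of the duplicate removal performed by a logical indexing statement {...}), the intensional definitions of edge/2 and vertex/1 over the cap facts amount to:

```python
# Bottom-up evaluation of the intensional predicates of the flow LogKB.
cap = {("s", "a"): 4, ("s", "b"): 2, ("a", "c"): 3, ("b", "c"): 2,
       ("b", "d"): 3, ("c", "b"): 1, ("b", "t"): 2, ("d", "t"): 4}

edge = set(cap)                                    # edge(X,Y) :- cap(X,Y).
vertex = ({x for (x, _) in edge}                   # vertex(X) :- edge(X,_).
          | {y for (_, y) in edge})                # vertex(X) :- edge(_,X).

print(sorted(vertex))  # ['a', 'b', 'c', 'd', 's', 't']
```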

In any case, using logical clauses within relational linear programs is quite powerful. For example, a passive smoking predicate for the collective classification example from the introduction can be defined in the following way:

attribute(X, passive) :- friends(X, Y), attribute(Y, smokes).

It can then be used within the RLP. Or, as another example, consider the compact representation of MLNs, for instance for MAP LP inference using RLPs. The following LogKB compactly encodes the smokers MLN [richardson2006markov]. This MLN has a single weighted clause, smokes(X) => cancer(X) with weight 0.75, which means that smoking leads to cancer, and our belief in this clause is reflected by its weight 0.75. In the MAP LP we want to have a predicate with value 0.75 for every instance of the rule which is true, and with value 0 for every instance which is false. Instead of writing this down manually, we can use:

person(anna).   person(bob). ...
value(0).  value(1).
w(smokes(X), cancer(X), 1, 0) = 0 :- person(X).
w(smokes(X), cancer(X), V1, V2) = 0.75 :-
                           person(X), value(V1), value(V2).

Please keep our short-hand notation in mind: w(smokes(X), cancer(X), 1, 0) = 0 stands for w(smokes(X), cancer(X), 1, 0, 0). This LogKB will generate the following ground atoms for the weights:

w(smokes(anna), cancer(anna), 1, 0) = 0
w(smokes(anna), cancer(anna), 0, 0) = 0.75
...
w(smokes(bob), cancer(bob), 1, 1) = 0.75
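The grounding these two rules produce can be sketched in Python (our own illustration; the more specific rule for the falsifying truth values (1, 0) overrides the general one):

```python
# Generate the ground weight atoms w/5 for the smokers MLN: weight 0 for the
# single falsifying assignment (smokes true, cancer false), 0.75 otherwise.
persons = ["anna", "bob"]
values = [0, 1]

w = {}
for x in persons:
    for v1 in values:
        for v2 in values:
            w[("smokes(%s)" % x, "cancer(%s)" % x, v1, v2)] = 0.75
    # the clause smokes(X) => cancer(X) is violated exactly for (1, 0)
    w[("smokes(%s)" % x, "cancer(%s)" % x, 1, 0)] = 0.0

print(len(w))  # 8 ground weight atoms: 2 persons x 4 value combinations
```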

To summarize, a relational linear program (RLP) consists of

  • variable declarations of predicates,

  • one par-equality to define the objective, and

  • several par-(in)equalities to define the constraints.

Everything not explicitly defined is assumed to be a parameter defined in an external LogKB. We now show that any RLP induces a valid ground LP.

Theorem 1.

An RLP together with a LogKB (such that the logical queries in the RLP have finite answer-sets) induces a ground LP.

Proof.

The intuition is to treat par-(in)equalities as logical rules, treating the arithmetic operators as conjunctions, the (in)equalities as the rule heads, and sum statements as meta-predicates. Then the finiteness follows from the assumption of finite answer-sets for logical queries to the LogKB. Finally, each ground clause can be turned back into an arithmetic expression resp. (in)equality. More specifically, a predicate is grounded either to numbers or to LP variables. Since par-expressions are of finite length, we can turn them into a kind of prenex normal form, that is, we can write them as strings of sum statements followed by a sum-free part where bracket expressions are correspondingly simplified. Now, we can ground a par-expression inside-out with respect to the sum statements. Each of these groundings is finite due to the assumption that the queries to the LogKB have finite sets of answers. In turn, since the objective assumes that all variables are bound by the sum statements in the prenex normal form, only a single ground sum over numbers and LP variables is produced as objective. For par-(in)equalities encoding constraints, both sides of the (in)equality are par-expressions. Hence, using the same argument as for the objective, they produce at most a finite number of ground (in)equalities. Taking everything together, an RLP together with a finite LogKB always induces a valid ground LP.

For illustration, reconsider the flow instance in Fig. 5. It induces the following ground LP for the relational flow LP in Fig. 4 (for readability, only two groundings are shown for each constraint):

1  maximize: flow(s,a) + flow(s,b);
2
3  subject to: flow(b,c)+flow(b,d)-flow(s,b)-flow(c,b) = 0;
4  subject to: flow(a,c) - flow(s,a) = 0;
5  ...
6  subject to: 3 - flow(a,c) >= 0;
7  subject to: 2 - flow(c,t) >= 0;
8  ...
9  subject to: flow(c,t) >= 0;
10 subject to: flow(b,c) >= 0;
11 ...
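This ground LP can be handed to any off-the-shelf solver. As a quick check (a sketch using scipy, not part of the paper's implementation), the instance of Fig. 5 has a maximum flow of 3:

```python
# Solve the ground flow LP of Fig. 5 with an off-the-shelf LP solver.
from scipy.optimize import linprog

edges = [("s","a"), ("s","b"), ("a","c"), ("b","c"),
         ("b","d"), ("c","b"), ("b","t"), ("d","t")]
cap   = [4, 2, 3, 2, 3, 1, 2, 4]
idx   = {e: i for i, e in enumerate(edges)}

# conservation of flow at the internal nodes a, b, c, d
A_eq, b_eq = [], []
for v in "abcd":
    row = [0.0] * len(edges)
    for (x, y), i in idx.items():
        if x == v:
            row[i] += 1.0   # outgoing flow
        if y == v:
            row[i] -= 1.0   # incoming flow
    A_eq.append(row)
    b_eq.append(0.0)

# maximize flow(s,a) + flow(s,b); linprog minimizes, hence the minus signs
c = [0.0] * len(edges)
c[idx[("s","a")]] = c[idx[("s","b")]] = -1.0

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, u) for u in cap])
print(-res.fun)  # 3.0, the maximum flow of this instance
```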
Input: RLP together with a LogKB
Output: Ground LP L consisting of ground (AMPL) statements
1  Set L to the empty LP;
2  "Flatten" the par-(in)equalities and the objective into prenex normal form by inlining aggregates and simplifying brackets;
3  for each par-(in)equality and the objective do
4        for each sum-aggregation and each separate atom do
5              Query the LogKB to obtain a grounding, or a set of groundings if we are dealing with a constraint that involves an indexing expression;
6        end for
7        Concatenate the results of the query evaluations for each grounding to form a ground (in)equality resp. objective;
8        Add the ground (in)equality resp. objective to L;
9  end for
10 return L
Algorithm 1 Grounding RLPs. The resulting ground LP can be solved using any LP solver after transforming it into the solver's input form. This can of course be automated.

But how do we compute this induced ground LP, which then becomes the input to an LP solver? That is, given an RLP and a LogKB, how do we effectively expand all the sums and substitute all the parameters with numbers? The proof of Theorem 1 essentially tells us how to do this: we treat all par-(in)equalities and par-expressions as logical rules (after turning them into prenex normal form and simplifying them). Then any Prolog engine can be used to compute all groundings. A simple version of this is summarized in Alg. 1.
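A pure-Python sketch of this grounding step (a toy stand-in for Alg. 1 and the Prolog engine, using the flow LogKB of Fig. 5; representing ground constraints as strings is our own choice):

```python
# Ground the three constraint templates of the flow RLP by answering their
# indexing queries against the LogKB and emitting one constraint per answer.
cap = {("s","a"): 4, ("s","b"): 2, ("a","c"): 3, ("b","c"): 2,
       ("b","d"): 3, ("c","b"): 1, ("b","t"): 2, ("d","t"): 4}
edge = set(cap)
vertex = {x for e in edge for x in e}
internal = vertex - {"s", "t"}                    # not source, not sink

ground = []
for v in sorted(internal):                        # conservation of flow
    outf = " + ".join("flow(%s,%s)" % e for e in sorted(edge) if e[0] == v)
    inf  = " + ".join("flow(%s,%s)" % e for e in sorted(edge) if e[1] == v)
    ground.append("%s - (%s) = 0" % (outf, inf))
for (x, y), u in sorted(cap.items()):             # capacity and nonnegativity
    ground.append("%d - flow(%s,%s) >= 0" % (u, x, y))
    ground.append("flow(%s,%s) >= 0" % (x, y))

print(len(ground))  # 4 + 8 + 8 = 20 ground constraints
```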

In our experiments, however, we have used a more efficient approach similar to the one introduced in Tuffy [niu2011tuffy]. The idea is to use a relational database management system (RDBMS) for bottom-up grounding, first populating the ground predicates into a database and then translating logical formulas into SQL queries. Intuitively, this allows one to exploit RDBMS optimizers and thus to significantly speed up the grounding phase. We essentially follow the same strategy for grounding RLPs. PostgreSQL is used in the current implementation; since it allows arithmetic computations and string concatenations inside an SQL query, we obtain sums of LP variables with the corresponding coefficients directly from a query. As a result, the only post-processing needed is the concatenation of these strings. Our grounding implementation takes time comparable to Tuffy on an MLN with a comparable number of ground predicates to be generated, which is state-of-the-art performance on this task at the moment.

To summarize, relational linear programming works as follows:

  1. Specify an RLP .

  2. Given a LogKB, ground into an LP (Alg. 1).

  3. Solve using any LP solver.

In fact, this novel linear programming approach, as we will demonstrate later, already eases the specification of whole families of linear programs considerably. However, we can do even better. As we will show next, we can efficiently detect and exploit symmetries within the induced LPs and in turn often speed up solving the RLP considerably.

5 Exploiting Symmetries for Reducing the Dimension of LPs

As we have already mentioned in the introduction, one of the features of many relational models is that they can produce model instances with a lot of symmetries. These symmetries in turn can be exploited to perform inference at a “lifted” level, i.e., at the level of groups of variables. For probabilistic relational models, this lifted inference can yield dramatic speed-ups, since one reasons about the groups of indistinguishable variables as a whole, instead of treating them individually.

Triggered by this success, we will now show that linear programming is liftable, too.

5.1 Detecting Symmetries using Color-Passing

One way to devise a lifted inference approach is the following. One starts with a standard inference algorithm and introduces some notion of indistinguishability among the variables in the model (instance) at hand. For example, we can say that two variables in a linear program are indistinguishable if there exists a permutation of all variables that exchanges them yet still yields back the same model in terms of the solutions. Then, given a particular model instance, one detects which variables are exchangeable in that model instance. The standard inference algorithm is modified in such a way that it can deal with groups of indistinguishable variables as a whole, instead of treating them individually. This approach was for instance followed to devise a lifted version of belief propagation [singla08aaai, kersting09uai, ahmadi2013mlj], a message-passing algorithm for approximate inference in Markov random fields (MRFs), which we will now briefly sketch in order to prepare the stage for lifted linear programming. In doing so, we will omit many details, since they are not important for developing lifted linear programming.

Input: A graph G = (V, E), an initial coloring c_0 of V
Output: A partition P of V
1  i <- 0;
2  repeat
3        i <- i + 1;
4        foreach v in V do
5              s(v) <- (c_{i-1}(v), sorted multiset {c_{i-1}(u) : u a neighbor of v});
6        end foreach
7        c_i <- assign one color per distinct signature s(v);
8        P_i <- the partition of V induced by c_i;
9  until P_i = P_{i-1};
10 return P_i;
Algorithm 2 Color-Passing

Belief propagation approximately computes the single-variable marginal probabilities P(X_i) in an MRF encoding the joint distribution P(X_1, ..., X_n) over the random variables X_1, ..., X_n. It does so by passing messages within a graphical representation of the MRF.

The main idea to lift belief propagation is to simulate it while keeping track of which variables and clauses send identical messages. These elements of the model can then be merged into groups whose members are indistinguishable in terms of belief propagation. After grouping elements together into a potentially smaller (lifted) MRF, a modified message-passing computes the same beliefs as standard belief propagation on the original MRF.

To identify indistinguishable elements, one first assigns colors to the elements of the MRF. Then one performs message-passing, but replaces the belief propagation messages (the computation of which, a sum-product operation, is the most time-consuming step in belief propagation) by these colors. That is, instead of the sum-product update rule, one uses a less computationally intensive sort-and-hash update for colors. After every iteration, one keeps track of the partition of the network induced by nodes carrying the same colors. Unlike standard belief propagation, this color-passing procedure is guaranteed to converge (that is, the partition will stop getting finer) in at most |V| iterations; at worst, one may end up with the trivial partition. This color-passing algorithm is outlined in Alg. 2.
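A compact sketch of this procedure for vertex-colored graphs (our own Python rendering of Alg. 2, using tuples as signatures in place of sorting and hashing):

```python
# Color-passing / Weisfeiler-Lehman refinement: a node's new color is
# determined by its old color and the sorted multiset of its neighbors'
# colors; iteration stops once the partition no longer refines.
def color_passing(nodes, neighbors, color):
    while True:
        signature = {v: (color[v],
                         tuple(sorted(color[u] for u in neighbors[v])))
                     for v in nodes}
        ids, new_color = {}, {}
        for v in nodes:  # re-index signatures to compact integer colors
            new_color[v] = ids.setdefault(signature[v], len(ids))
        if len(set(new_color.values())) == len(set(color.values())):
            return new_color  # partition stopped refining
        color = new_color

# a star graph: the center is distinguished from the three leaves
star = {"c": ["1", "2", "3"], "1": ["c"], "2": ["c"], "3": ["c"]}
part = color_passing(list(star), star, {v: 0 for v in star})
print(len(set(part.values())))  # 2 classes: {center} and {leaves}
```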

Lifted linear programming as introduced next is quite similar to lifted belief propagation. In fact, it also uses color-passing for detecting the symmetries. However, there are remarkable differences:

  1. First, lifted belief propagation applies only to approximate probabilistic inference. As it exploits redundancies within belief propagation, it will produce the same solution as belief propagation. However, examples can be found where the true and exact solution does not exhibit symmetry, while the approximation does. In contrast, lifted linear programming is sound: an exact solution to an LP can be recovered from any lifted solution.

  2. Second, as an intermediate step, lifted belief propagation generates a symmetry-compressed model. This lifted MRF is no longer an MRF, since it contains, for instance, multi-edges. To accommodate for that, the message equations of lifted belief propagation are modified, and as a result lifted message-passing cannot be done using standard belief propagation implementations. As we will show, this is not the case for lifted linear programming. The symmetry-compressed LP, or lifted LP for short, is still an LP and can be solved using any LP solver. This is a significant advantage over lifted belief propagation, as we can take full advantage of state-of-the-art LP solver technology.

Both points together suggest to view lifted linear programming as reducing the dimension of the LP. So, how do we do this?

5.2 Equitable Partitions and Fractional Automorphisms

Figure 6: Using symmetry to speed up linear programming: (a) the feasible region of the LP and the objective vector (in pink); (b) the fixed space of the symmetry is identified (grey); (c) the feasible region is restricted to its intersection with the fixed space.

To develop lifted linear programming, we proceed as follows. First, we introduce the notion of equitable partitions and show how they connect to color-passing. To make use of equitable partitions in linear programs, we need to bridge the combinatorial world of partitions with the algebraic world of linear inequalities. We do so by introducing the notion of fractional automorphisms, which serve as a matrix representation of partitions. Finally, we show that if an LP admits an equitable partition, then an optimal solution of the LP can be found in the span of the corresponding fractional automorphism. This essentially defines our lifting: by restricting the feasible region of the LP to the span of the fractional automorphism, we reduce the dimension of the LP to the rank of the fractional automorphism (geometric intuition is given in Figure 6). As we will see, this results in an LP with fewer variables (to be precise, with as many variables as there are classes in the equitable partition).

Now let us make some necessary remarks on the nature of the partition returned by Alg. 2. Suppose for now that all edges carry the same color, i.e., the graph is only vertex-colored. Observe that, according to the color-update step, two nodes u and v receive different colors in the i-th iteration if the multisets of the colors of their neighbors differ. That is, in order to be distinguished by Alg. 2, u and v must have a different number of neighbors of the same color at some iteration i.

Consequently, as the algorithm terminates when the partition no longer refines, we conclude that the following holds: Alg. 2 partitions a graph in such a way that every two nodes in the same class have the same number of neighbors from every other class. More formally, for each pair of classes P, Q and every two nodes u, v in P, we have

|{w in Q : {u, w} in E}| = |{w in Q : {v, w} in E}|.

We call any partition with the above property equitable. For the edge-colored case, we have a slightly more complicated definition: a partition is equitable whenever, for each pair of classes P, Q, every edge color c, and every two nodes u, v in P,

|{w in Q : {u, w} in E, c({u, w}) = c}| = |{w in Q : {v, w} in E, c({v, w}) = c}|.

In other words, every two nodes in a class have the same number of edges of the same color going into every other class.

In fact, it can be shown that Alg. 2 computes the coarsest such partition of a graph; the relationship between equitable partitions and color-passing is well known in graph theory [Ramana1994].
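The defining property is easy to verify directly. A small Python checker for the vertex-colored case (our own sketch; `neighbors` maps each node to its adjacency list):

```python
# A partition is equitable iff any two nodes in the same class have the
# same number of neighbors in every class.
def is_equitable(nodes, neighbors, part):
    classes = sorted(set(part.values()))
    def profile(v):  # number of neighbors of v in each class
        return tuple(sum(1 for u in neighbors[v] if part[u] == q)
                     for q in classes)
    return all(profile(u) == profile(v)
               for u in nodes for v in nodes if part[u] == part[v])

star = {"c": ["1", "2", "3"], "1": ["c"], "2": ["c"], "3": ["c"]}
print(is_equitable(list(star), star, {"c": 0, "1": 1, "2": 1, "3": 1}))  # True
print(is_equitable(list(star), star, {"c": 0, "1": 0, "2": 1, "3": 1}))  # False
```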

Observe that, like MRFs, which have variables and factors, linear programs are bipartite objects as well: they consist of variables and constraints. As such, they are more naturally represented by bipartite graphs, hence we now narrow our discussion down to bipartite graphs. We say that a colored graph is bipartite if the vertex set consists of two subsets V1 and V2 such that every edge has one end-point in V1 and the other in V2. The notion of equitable partitions applies naturally to the bipartite setting; we will use the notation P = (P1, P2) to indicate that the classes P1 over the subset V1 are disjoint from the classes P2 over the subset V2. I.e., no pair of nodes u in V1 and v in V2 can be in the same equivalence class.

With this in mind, we introduce fractional automorphisms. Note that our definition is slightly modified from the original one [Ramana1994] in order to accommodate the bipartite setting.

Definition 2.

Let A be an m x n real matrix, such as the (colored, weighted) adjacency matrix of a bipartite graph. A fractional automorphism of A is a pair (X, Y) of doubly stochastic matrices (meaning that the entries are non-negative, and every row and column sums to one), with X of size m x m and Y of size n x n, such that

XA = AY.

The following theorem establishes the correspondence between equitable partitions and fractional automorphisms.

Theorem 3 ([Ramana1994, grohe13arxiv]).

Let G be a bipartite graph with (colored, weighted) adjacency matrix A. Then:

  • if P = (P1, P2) is an equitable partition of G, then the pair of matrices (X, Y) having entries

    X_uv = 1/|P(u)| if P(u) = P(v), and X_uv = 0 otherwise, (3)

    where P(u) denotes the class of u (and analogously for Y), is a fractional automorphism of the adjacency matrix of G;

  • conversely, let (X, Y) be a fractional automorphism of the (colored, weighted) adjacency matrix of the bipartite graph G. Then the partition in which two vertices u, v belong to the same class if and only if at least one of X_uv and X_vu (respectively Y_uv and Y_vu) is greater than 0 is an equitable partition of G.

In the following, the first part of the above theorem will be of particular interest to us. We will shortly show how we can use color-passing to construct equitable partitions of linear programs. Encoding these equitable partitions as fractional automorphisms will provide us with insight into the geometric aspects of lifting and will be an essential tool for proving its soundness.

Note that any graph partition (equitable or not) can be turned into a doubly stochastic matrix using Eq. 3. However, keep in mind that the resulting matrix will not be a fractional automorphism unless the partition is equitable. In any case, partition matrices have a useful property that will later allow us to reduce the number of constraints and variables of a linear program. Namely,

Proposition 4 ([god97]).

Let X be the doubly stochastic matrix of some partition according to Eq. 3. Then X factorizes as

X = B B'^T, with B' = B (B^T B)^(-1), (4)

where B is the 0/1 incidence matrix of the partition, i.e., B_vP = 1 if and only if node v belongs to class P.
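Numerically, both the doubly stochastic property and this factorization are easy to confirm (a numpy sketch; the concrete partition is made up for illustration, and we assume Eq. 3 assigns entry 1/|class| within each class):

```python
# Build the partition matrix X = B (B^T B)^(-1) B^T from a 0/1 incidence
# matrix B and check that it is doubly stochastic.
import numpy as np

part = [0, 0, 0, 1, 1]                  # five nodes, two classes
n, p = len(part), len(set(part))
B = np.zeros((n, p))                    # incidence matrix of the partition
for v, q in enumerate(part):
    B[v, q] = 1.0

X = B @ np.linalg.inv(B.T @ B) @ B.T    # entries 1/|class| within a class

print(X[0, 1])                          # entry inside the size-3 class, ~1/3
```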

5.3 Fractional Automorphisms of Linear Programs

As we already mentioned, we are going to apply equitable partitions, via fractional automorphisms, to reduce the size of linear programs. Hence the obvious question presents itself: what is an equitable partition (resp. fractional automorphism) of a linear program?

In order to answer the question, we need a graphical representation of the linear program L = (A, b, c), called the coefficient graph G(L) of L. To construct G(L), we add a vertex for every one of the m constraints and n variables of L. Then we connect a constraint vertex i and a variable vertex j if and only if A_ij is nonzero. Furthermore, we assign colors to the edges in such a way that the edges {i, j} and {k, l} receive the same color if and only if A_ij = A_kl. Finally, to ensure that b and c are preserved by any automorphism we find, we color the vertices in a similar manner: constraint vertices i and k share a color if and only if b_i = b_k, and variable vertices j and l share a color if and only if c_j = c_l. We must also choose the colors such that no pair of row and column vertices shares the same color; this is always possible.

Figure 7: Construction of the coefficient graph of the toy LP. On the left-hand side, the coloring of the LP is shown. This turns into the colored coefficient graph shown on the right-hand side.

To illustrate this, consider the following toy LP:

1  var p/1;
2
3  maximize: sum {gadget(X)} p(X);
4
5  subject to: sum {widget(X)} p(X) + sum {gadget(X)} p(X) <= 1;
6  subject to {widget(X)}: p(X) <= 0;
7  subject to: sum {widget(X)} p(X) - sum {gadget(X)} p(X) <= -1;

with knowledge base LogKB (recall that logical atoms are assumed to evaluate to 1 and 0 within an RLP):

widget(x).
widget(y).
gadget(z).

If we ground this linear program and convert it to dual form (as in Eq. 1), we obtain the following linear program:

maximize x3
subject to  x1 + x2 + x3 <= 1;  x1 <= 0;  x2 <= 0;  x1 + x2 - x3 <= -1;

where for brevity we have substituted p(x), p(y), p(z) by x1, x2, x3, respectively. The coefficient graph of this LP is shown in Fig. 7.

We call an equitable partition of the coefficient graph G(L) (using the notion of equitable partitions of bipartite colored graphs from the previous section) an equitable partition of the linear program L. Suppose now we compute an equitable partition of G(L) using Algorithm 2 and compute the corresponding fractional automorphism (X, Y) as in Eq. 3. Observe that (X, Y) has the following properties:

  • due to Theorem 3, we have XA = AY;

  • by our choice of initial colors for G(L), the partition will never group together variable vertices j, l with c_j distinct from c_l, nor will it group constraint vertices i, k with b_i distinct from b_k. By Eq. 3, this implies

    Xb = b    and    c^T Y = c^T.

This yields the definition of a fractional automorphism of a linear program: we call a pair (X, Y) of doubly stochastic matrices a fractional automorphism of the linear program L if it satisfies the properties XA = AY, Xb = b, and c^T Y = c^T as above.
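For the toy LP, the two widget variables (and the corresponding two constraints) can be grouped, and the resulting pair (X, Y) can be checked against these properties numerically (a numpy sketch; the ground matrices A, b, c encode the grounding of the toy program with p(x), p(y), p(z) as variables x1, x2, x3):

```python
import numpy as np

A = np.array([[1., 1.,  1.],      # p(x) + p(y) + p(z) <= 1
              [1., 0.,  0.],      # p(x) <= 0
              [0., 1.,  0.],      # p(y) <= 0
              [1., 1., -1.]])     # p(x) + p(y) - p(z) <= -1
b = np.array([1., 0., 0., -1.])
c = np.array([0., 0., 1.])        # maximize p(z)

Y = np.array([[.5, .5, 0.],       # averages the two widget variables
              [.5, .5, 0.],
              [0., 0., 1.]])
X = np.array([[1., 0., 0., 0.],   # averages the two widget constraints
              [0., .5, .5, 0.],
              [0., .5, .5, 0.],
              [0., 0., 0., 1.]])

assert np.allclose(X @ A, A @ Y)  # XA = AY
assert np.allclose(X @ b, b)      # Xb = b
assert np.allclose(c @ Y, c)      # c^T Y = c^T
print("fractional automorphism verified")
```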

5.4 Lifted Linear Programming

With this at hand, we are ready for the main part of our argument. We split the argument into two parts. First, we will show that if an LP has an optimal solution x, then it also has an optimal solution satisfying x = Yx. That is, if we add the constraint x = Yx to the linear program, we will not cut away the optimum. The second claim is that the constraint x = Yx can be realized by a projection of the LP into a lower-dimensional space. So we can actually project to a low-dimensional space, solve the LP there, and then recover the high-dimensional solution via a simple matrix multiplication. We recall that this idea is illustrated in Figure 6.

Let us now state the main result.

Theorem 5.

Let L = (A, b, c) be a linear program and (X, Y) a fractional automorphism of L. Then, if x is feasible in L, Yx is feasible as well and both have the same objective value. As a consequence, if x is an optimal solution, Yx is optimal as well.

Proof.

Let x be feasible in L, i.e., Ax <= b. Observe that left multiplication of the system by a doubly stochastic matrix preserves the direction of the inequalities. More precisely,

Ax <= b implies XAx <= Xb

for any doubly stochastic X (actually, this holds for any nonnegative matrix X, since Xb - XAx = X(b - Ax) >= 0, as X is nonnegative and b - Ax >= 0 by assumption). So we left-multiply the system by X:

A(Yx) = (AY)x = (XA)x <= Xb = b,

since XA = AY and Xb = b. This proves the first part of our theorem. Finally, observe that c^T (Yx) = (c^T Y) x = c^T x as c^T Y = c^T.

We have thus shown that if we add the constraint x = Yx to L, we can still find a solution of the same quality as in the original program. How does this help to reduce dimensionality? To answer this, we observe that the constraint can be implemented implicitly, through reparametrization, instead of adding it to L explicitly. Recall that Y was generated by an equitable partition and can be factorized as Y = B B'^T, where B is the incidence matrix of the partition and B' its normalized version as in Eq. 4.

Note that the span of Y is equivalent (in the vector-space isomorphism sense) to the column space of B. That is, every x with x = Yx can be expressed as x = Bz for some z, and conversely, Bz = Y(Bz) for all z. Hence, we can replace L = (A, b, c) with the equivalent L' = (AB, b, B^T c). Since this is now a problem in as many variables as there are classes of the partition, i.e., of reduced dimension, a speed-up over solving the original LP is possible. Finally, by the above, if z is an optimal solution of L', Bz is an optimal solution of L.
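The reparametrization can be illustrated on the toy LP (a numpy sketch; A, b, c are our grounding of the toy program, and B groups the two widget variables into one class):

```python
import numpy as np

A = np.array([[1., 1., 1.], [1., 0., 0.], [0., 1., 0.], [1., 1., -1.]])
b = np.array([1., 0., 0., -1.])
c = np.array([0., 0., 1.])

B = np.array([[1., 0.],              # classes {p(x), p(y)} and {p(z)}
              [1., 0.],
              [0., 1.]])

A_red, c_red = A @ B, B.T @ c        # the reduced LP (AB, b, B^T c)
print(A_red.shape)                   # (4, 2): two variables instead of three

z = np.array([0., 1.])               # a feasible point of the reduced LP
x = B @ z                            # lift back to the original variables
assert np.all(A @ x <= b + 1e-12)    # feasible in the original LP
assert np.isclose(c @ x, c_red @ z)  # identical objective value
```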

Input: An inequality-constrained LP L = (A, b, c)
Output: An optimal solution of L
1  Construct the coefficient graph G(L);
2  Lift G(L) using color-passing, see Alg. 2;
3  Read off the characteristic matrix B of the resulting partition;
4  Obtain a solution z of the lifted LP (AB, b, B^T c) using any standard LP solver;
5  return Bz;
Algorithm 3 Lifted Linear Programming

Overall, this yields the lifted linear programming approach summarized in Alg. 3. Given an LP, first construct its coefficient graph (line 1). Then lift the coefficient graph (line 2) and read off the characteristic matrix (line 3). Solve the lifted LP (line 4) and "unlift" the lifted solution to a solution of the original LP (line 5).

Applying lifted linear programming to the LPs induced by RLPs, we can rephrase relational linear programming as follows:

  1. Specify an RLP .

  2. Given a LogKB, ground into an LP (Alg. 1).

  3. Solve using lifted linear programming (Alg. 3).

Before illustrating relational linear programming, we would like to note that this method of constructing fractional automorphisms works for any equitable partition, not only the one resulting from running color-passing. For example, an equitable partition of a graph can be constructed from its automorphism group by making two vertices equivalent whenever there exists a graph automorphism that maps one to the other. The resulting partition is called the orbit partition of the graph, see e.g. [godsil01]. Applying this partitioning method to coefficient graphs of linear programs and using the corresponding fractional automorphism is equivalent to well-known theoretical results on solving linear programs under symmetry, see e.g. [Boedi13] and references therein. However, there are two major benefits of using the color-passing partition instead of the orbit partition:

  • The color-passing partition is at least as coarse as the orbit partition, see e.g. [godsil01]. To illustrate this, consider the so-called Frucht graph shown in Fig. 8. Suppose we turn this graph into a linear program by introducing constraint nodes along the edges and coloring everything with the same color. The Frucht graph has two extreme properties with respect to equitable partitions: 1) it is asymmetric, meaning that the orbit partition is trivial (one vertex per equivalence class); 2) it is regular (every vertex has degree 3), and as one can easily verify, in this case the coarsest equitable partition consists of a single class!

    Figure 8: The Frucht graph with 12 nodes. The colors indicate the resulting node partitions using color-passing (the coarsest equitable partition, left) and using automorphisms (the orbit partition, right).

    Due to these two properties, in the case of the Frucht graph the orbit partition yields no compression, whereas the coarsest equitable resp. color-passing partition produces an LP with a single variable.

  • The color-passing partition can be computed in quasi-linear time [berbongro13], yet current tools for orbit partition enumeration have significantly worse running times. Thus, by using color-passing we achieve strict gains in both compression and efficiency compared to using orbits.

Let us now illustrate relational linear programming using several AI tasks.

6 Illustrations of Relational Linear Programming

Our intention here is to investigate the viability of the ideas and concepts of relational linear programming through the following questions:

(Q1)

Can important AI tasks be encoded in a concise and readable relational way using RLPs?

(Q2)

Are there (R)LPs that can be solved more efficiently using lifting?

(Q3)

Does relational linear programming enable a programming approach to AI tasks facilitating the construction of more sophisticated models from simpler ones by adding rules?

If so, relational linear programming has the potential to make linear models faster to write and easier to understand, to reduce the development time and cost so as to encourage experimentation, and in turn to reduce the level of expertise necessary to build AI applications. Consequently, our primary focus is not to achieve the best performance by using advanced models. Instead we focus on basic models.

We have implemented a prototype system of relational linear programming, and illustrate the relational modeling of several AI tasks: computing the value function of Markov decision processes, performing MAP LP inference in Markov logic networks and performing collective transductive classification using LP support vector machines.

6.1 Lifted Linear Programming for Solving Markov Decision Processes

Our first application for illustrating relational linear programming is the computation of the value function of a Markov decision process (MDP). The LP formulation of this task is as follows [Littman95]:

$$\max_{V} \sum_{s} V(s) \quad \text{subject to} \quad V(s) \le R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \quad \forall s, a \qquad (5)$$

where $V(s)$ is the value of state $s$, $R(s,a)$ is the reward that the agent receives when carrying out action $a$ in state $s$, and $P(s' \mid s, a)$ is the probability of transferring from state $s$ to state $s'$ by taking action $a$. $\gamma \in [0,1)$ is a discounting factor. The corresponding RLP is given in Fig. 9. Since it abstracts away the states and rewards (they are defined in the LogKB), it extracts the essence of computing value functions of MDPs. Given a LogKB, a ground LP is automatically created, instead of coding the LP by hand for each problem instance again and again as in vanilla linear programming. This answers question (Q1) affirmatively.

var value/1;             #value function to be determined by the LP solver

maximize: sum{reward(S,_)} value(S);          #best values for all states

#encoding of discounted Bellman optimality as in inequality (5)
subject to {transProb(S,T,_)}: value(S) <= reward(S,A) +
  gamma*sum{transProb(S,T,A)} transProb(S,T,A)*value(T);
Figure 9: An RLP for computing the value function value/1 of a Markov decision process. There is a finite set of states and actions, and the agent receives a reward reward(S,A) for performing an action A in state S, as specified in a LogKB.
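The Bellman fixed point characterized by the LP formulation above can be cross-checked with plain value iteration. A minimal sketch on a hypothetical two-state MDP (all names are illustrative and not part of the RLP system):

```python
def value_iteration(states, actions, reward, trans, gamma=0.9, tol=1e-9):
    """Compute the optimal value function of a small MDP.
    reward[(s, a)] -> immediate reward; trans[(s, a)] -> {s': prob}."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {
            s: max(
                reward[(s, a)] + gamma * sum(p * V[t]
                                             for t, p in trans[(s, a)].items())
                for a in actions
            )
            for s in states
        }
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new

# Hypothetical two-state chain: from s0, action 'go' reaches the goal s1.
states, actions = ["s0", "s1"], ["go", "stay"]
reward = {("s0", "go"): -1, ("s0", "stay"): -1,
          ("s1", "go"): 100, ("s1", "stay"): 100}
trans = {("s0", "go"): {"s1": 1.0}, ("s0", "stay"): {"s0": 1.0},
         ("s1", "go"): {"s1": 1.0}, ("s1", "stay"): {"s1": 1.0}}
V = value_iteration(states, actions, reward, trans)
```

Here s1 is absorbing with reward 100, so V(s1) = 100/(1 - 0.9) = 1000 and V(s0) = -1 + 0.9 * 1000 = 899; these are the values the grounded LP would return for this instance.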

The MDP instance that we used is the well-known Gridworld (see e.g. [suttonbarto98]). The gridworld problem consists of an agent navigating within an n x n grid of states. Every state has an associated reward. Typically there are one or several states with high rewards, considered the goals, whereas the other states have zero or negative associated rewards. We considered an instance of gridworld with a single goal state in the upper-right corner with a reward of 100. The reward of all other states was set to -1. The scheme for the LogKB looked like this:

reward(state(n-1,n),right)=100.
reward(state(n,n-1),up)=100.
reward(state(X,Y),_)=-1 :- X>0, X<n-1, Y>0, Y<n-1.
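To make the grounding step concrete, the following Python sketch emulates what grounding such a reward scheme produces, simplifying the interior-only rule to all state-action pairs (coordinates 1..n, goal in the upper-right corner; the encoding is hypothetical, not the system's actual output):

```python
def ground_rewards(n):
    """Ground the relational reward scheme into explicit facts:
    -1 for every state-action pair, except the two moves that
    enter the goal state (n, n), which are rewarded with 100."""
    rewards = {}
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            for action in ("up", "down", "left", "right"):
                rewards[((x, y), action)] = -1
    # overwrite the two goal-entering moves
    rewards[((n - 1, n), "right")] = 100
    rewards[((n, n - 1), "up")] = 100
    return rewards

facts = ground_rewards(5)  # 25 states x 4 actions = 100 ground facts
```

The point of the RLP is precisely that this enumeration never has to be written by hand: the three LogKB rules above induce it for any n.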

As can be seen in Fig. 10(a), this example can be compiled to about half the original size. Fig. 10(b) shows that this compression already leads to improved running times. We now introduce additional symmetries by putting a goal in every corner of the grid. As one might expect, this additional symmetry gives more room for compression, which further improves efficiency, as reflected in Figs. 10(c) and 10(d).

(a) Ground vs. lifted variables on a basic gridworld MDP.
(b) Measured times on a basic gridworld MDP.
(c) Variables on a gridworld with additional symmetry.
(d) Measured times on a gridworld with additional symmetry.
(e) Variables on a gridworld with additional symmetry in sparse form.
(f) Measured times on a gridworld with additional symmetry in sparse form.
Figure 10: Experimental results of relational linear programming for solving Markov decision processes.

However, the examples that we have considered so far are quite sparse in their structure. Thus, one might wonder whether the demonstrated benefit is achieved only because we are solving sparse problems in dense form. To address this we converted the MDP problem to a sparse representation for our further experiments. We scaled the number of states, and as one can see in Figs. 10(e) and (f), lifting still results in an improvement in size as well as running time. Therefore, we can conclude that lifting an LP is beneficial regardless of whether the problem is sparse or dense; one might thus view symmetry as a dimension orthogonal to sparsity. Furthermore, in Fig. 10(f) we break down the measured total time for solving the LP into the time spent on lifting and solving respectively. This presentation exposes the fact that the time for lifting dominates the overall computation time. Clearly, if lifting were carried out in every iteration (CVXOPT took on average around 10 iterations on these problems), the approach would not have been competitive with simply solving on the ground level. This justifies the loss of potential lifting we accepted in order to avoid carrying out the lifting in every iteration: it indeed pays off (Q2). Remarkably, these results follow closely what has been achieved with MDP-specific symmetry-finding and model minimization approaches [ravi08icml, ravi01tech, Dean97].

6.2 Programming MAP LP Inference in Markov Logic Networks

var m/2;  #single node, pairwise, and
var m/4;  #triplewise probabilities
var m/6;  #of configurations to be determined by the solver
#value of the MAP assignment
innerProd = sum{w(P, V)} w(P, V) * m(P, V) +
  sum{w(P1, P2, V1, V2)} w(P1, P2, V1, V2) * m(P1, P2, V1, V2) +
  sum{w(P1, P2, P3, V1, V2, V3)} w(P1, P2, P3, V1, V2, V3) *
  m(P1, P2, P3, V1, V2, V3);
atomMarg(P) = sum {w(P, V)} m(P, V);                #single node marginals
#single node marginal computed from pairwise marginals
clauseMarg1(P1, P2, V1) = sum{w(P2, V2)} m(P1, P2, V1, V2);
...
#single node marginal computed from triplewise marginals
clauseMarg1(P1, P2, P3, V1) = sum{w(P3, V3), w(P2, V2)}
  m(P1, P2, P3, V1, V2, V3);
...
maximize: innerProd;               #find MAP assignment with largest value

subject to {w(P, _)}: atomMarg(P) = 1;           #normalization constraint
#pairwise consistency constraints
subject to {w(P1, P2, V1, _)}: m(P1, V1) - clauseMarg1(P1, P2, V1) = 0;
...
#triplewise consistency constraints
subject to {w(P1, P2, P3, V1, _, _)}:
         m(P1, V1) - clauseMarg1(P1, P2, P3, V1) = 0;
...
Figure 11: RLP encoding the MAP-LP for triplewise MLNs as shown in Eq. 6. The last two constraints as well as the last two aggregates have symmetric copies that have been omitted (this redundancy is necessary, since logic predicates are not symmetric).

MLNs, see [richardson2006markov] for more details, are a prominent model in statistical relational learning (SRL). We here focus on MAP (maximum a posteriori) inference, where we want to find a most likely joint assignment to all the random variables. More precisely, an MLN induces a Markov random field (MRF) with a node for each ground atom and a clique for every ground formula. A common approach to approximate MAP inference in MRFs is based on LP, see e.g. [wainwright2008graphical] for a general overview. Actually, there is a hierarchy of LP formulations for MAP inference, each assuming a hypertree MRF of increasing treewidth. Since MLNs of interest typically consist of at least one factor with three random variables, we will focus on triplewise MRFs as presented e.g. in [apsel14aaai] in order to investigate (Q1). Given an MRF induced by an MLN over the set of (ground) random variables $\mathbf{X} = \{X_1, \ldots, X_n\}$ and with factors $\{\theta_\alpha\}$, the MAP-LP is defined as follows, see also [apsel14aaai]. For each subset $\alpha$ of indices taken from $\{1, \ldots, n\}$ of size $|\alpha| \le 3$, let $\mu_\alpha$ denote a vector of variables of size $|\mathrm{dom}(\mathbf{X}_\alpha)|$. The notation $\mu_\alpha(\mathbf{x}_\alpha)$ is used to describe a specific variable in vector $\mu_\alpha$ corresponding to the subset $\alpha$ and entry $\mathbf{x}_\alpha$. Additionally, let $F$ denote the set of all ordered indices $\alpha$ for which there exists a factor with a matching variable scope $\mathbf{X}_\alpha$, and let $\theta_\alpha$ denote the log probability table of a factor whose variable scope is $\mathbf{X}_\alpha$. The MAP-LP is now defined as follows:

$$\max_{\mu \ge 0} \; \sum_{\alpha \in F} \sum_{\mathbf{x}_\alpha} \theta_\alpha(\mathbf{x}_\alpha)\, \mu_\alpha(\mathbf{x}_\alpha) \quad \text{s.t.} \quad \sum_{x_i} \mu_i(x_i) = 1 \;\; \forall i, \qquad \sum_{\mathbf{x}_\alpha \setminus x_i} \mu_\alpha(\mathbf{x}_\alpha) = \mu_i(x_i) \;\; \forall \alpha \in F,\, i \in \alpha \qquad (6)$$

This MAP-LP has to be instantiated for each MLN after we have computed a ground MRF induced by the MLN and a set of constants.

The fact that both MLNs and RLPs are based on logic programming enables a different and more convenient translation from an MLN to a MAP-LP for inference. Actually, each MAP-LP is defined through weights, marginals, and triples of variables induced by the MLN formulas. This generation can naturally be specified at the lifted level, as done in the RLP in Fig. 11. Since it is defined at the lifted level, abstracting from the specific MLN, it clearly answers (Q1) affirmatively. To see this, consider performing inference in the well-known smokers MLN. Here, there are two rules. The first rule says that smoking can cause cancer,

smokes(X) => cancer(X),

and the second implies that if two people are friends then they are likely to have the same smoking habits,

friends(X, Y) => (smokes(X) <=> smokes(Y)).

Since the second formula contains three predicates, using the triplewise MAP-LP is valid. Now the MLN as well as the constants used are encoded in the following LogKB:

person(anna).  person(bob).  ...   #the people in the social network
value(0).  value(1).               #we consider binary MLNs
#encoding of the MLN clauses and their weights
w(smokes(X), cancer(X), 1, 0) = 0 :- person(X).
w(smokes(X), cancer(X), V1, V2) = 0.75 :-
                   person(X), value(V1), value(V2).
w(friends(X, Y), smokes(X), smokes(Y), 1, 1, 1) = 0.75 :-
                   person(X), person(Y).
w(friends(X, Y), smokes(X), smokes(Y), 1, 0, 0) = 0.75 :-
                   person(X), person(Y).
w(friends(X, Y), smokes(X), smokes(Y), V1, V2, V3) = 0 :-
                   person(X), person(Y), value(V1), value(V2), value(V3).
m(smokes(gary), 1).                     #the smokers we know
m(smokes(helen), 1).
m(friends(anna, bob), 1).               #the observed friendship relations
...
m(friends(helen,iris), 1).
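On a grounding this small, the MAP assignment that the MAP-LP approximates can also be computed exactly by brute force. An illustrative Python sketch restricted to two people, mirroring the weight tables above (a satisfied grounding contributes its weight of 0.75, a violated one contributes 0; names and evidence are simplified):

```python
from itertools import product

people = ["anna", "bob"]

def score(assign):
    """Sum the weights of satisfied ground formulas of the smokers MLN."""
    s = 0.0
    for x in people:
        # smokes(x) => cancer(x), weight 0.75 unless violated
        if not (assign[("smokes", x)] and not assign[("cancer", x)]):
            s += 0.75
    for x in people:
        for y in people:
            # friends(x,y) => (smokes(x) <=> smokes(y)), weight 0.75
            if not (assign[("friends", x, y)]
                    and assign[("smokes", x)] != assign[("smokes", y)]):
                s += 0.75
    return s

atoms = ([("smokes", p) for p in people] + [("cancer", p) for p in people]
         + [("friends", x, y) for x in people for y in people])
evidence = {("smokes", "anna"): 1, ("friends", "anna", "bob"): 1}
free = [a for a in atoms if a not in evidence]

best, best_val = None, float("-inf")
for bits in product([0, 1], repeat=len(free)):
    assign = {**evidence, **dict(zip(free, bits))}
    val = score(assign)
    if val > best_val:
        best_val, best = val, assign

# Given the evidence, the best completion makes bob a smoker and
# gives both people cancer, satisfying all six ground formulas.
```

The MAP-LP of Fig. 11 solves a relaxation of exactly this discrete maximization, with the m/2, m/4, and m/6 variables playing the role of (pseudo)marginals over such assignments.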

This example gives us an opportunity to introduce another benefit of mixing logic with arithmetic, namely the ability to easily represent concepts such as indicator functions. In the MAP-LP we can add an "indicator" predicate clause(X, Y) which equals 1 if X and Y are two atoms that form a clause in an MLN and 0 otherwise. This results in the following extension to the LogKB:

clause(smokes(X), cancer(X)) = 1 :- person(X).

We can now write the LP constraints in a more mathematical way using constraints such as

subject to {pred(P1), pred(P2), val(V2)}:
     (m(P2,V2) - clauseMarg2(P1,P2,V2))*clause(P1, P2) = 0;

The resulting program is equivalent to the previous version, but can be more straightforward for people with an optimization background. Both programs show that MAP-LP inference within MLNs can be compactly represented as an RLP. Only the LogKB changes when the MLN and/or the evidence changes. Moreover, the RLP does not rely on the fact that we consider MLNs. Propositional models can be encoded in the LogKB, too. Hence the RLP extracts the essence of MAP-LP inference in probabilistic models, whether relational or propositional. This further supports an affirmative answer to (Q1).

(a) Number of variables in the lifted and ground LPs.
(b) Time for solving the ground LP vs. time for lifting and solving.
Figure 12: Experimental results of relational linear programming for MAP-LP inference within Markov logic networks.

Let us now turn towards investigating (Q2). As shown in previous work, inference in graphical models can be dramatically sped up using lifted inference. Thus, it is natural to expect that the symmetries in graphical models which can be exploited by standard lifted inference techniques will also be reflected in the corresponding MAP-(R)LP. To verify whether this is indeed the case, we induced MRFs of varying size from a pairwise smokers MLN [fierensKDCM12]. In turn, we used a pairwise MAP-RLP following essentially the same structure as the triplewise MAP-RLP above but restricted to pairs. We scaled the number of random variables from