 # Anti-unification in Constraint Logic Programming

Anti-unification refers to the process of generalizing two (or more) goals into a single, more general goal that captures some of the structure common to all initial goals. One is typically interested in computing a so-called most specific generalization, that is, a generalization that captures a maximal amount of shared structure. In this work we address the problem of anti-unification in CLP, where goals can be seen as unordered sets of atoms and/or constraints. We show that while the concept of a most specific generalization can easily be defined in this context, computing it becomes an NP-complete problem. We subsequently introduce a generalization algorithm that computes a well-defined abstraction whose computation can be bounded by a polynomial execution time. Initial experiments show that even a naive implementation of our algorithm produces acceptable generalizations in an efficient way. Under consideration for acceptance in TPLP.


## 1 Introduction and motivation

Anti-unification refers to the process of computing, for a given set $S$ of symbolic expressions, a so-called generalization of $S$, that is, a single expression that captures some of the common structure shared by all elements of $S$. For instance, in a logic programming context, the atom $p(a, X, f(Y))$ can be seen as a generalization of the set of atoms

$$\{p(a,a,f(a)),\ p(a,b,f(g(c))),\ p(a,A,f(a))\}$$

as each of these atoms is an instance of $p(a, X, f(Y))$. Often, one is interested in what is called a most specific or, equivalently, a least general generalization, that is, a generalization that preserves a maximal amount of common structure. In the example above, $p(a, X, f(Y))$ is a most specific generalization of the three given atoms, although other, less specific, generalizations exist, such as $p(a, X, Y)$ and $p(X, Y, Z)$. Being able to compute such generalizations is a mandatory ingredient in a number of program analyses and transformations such as partial deduction (e.g. [Gallagher (1993), De Schreye et al. (1999)]), supercompilation (e.g. [Sørensen and Glück (1999)]) and fold/unfold transformations (e.g. [Pettorossi and Proietti (1998)]), where it is typically used as a means to guarantee termination.

In this work we develop a theory of generalization (or anti-unification) in the context of constraint logic programming (CLP), where, in its most declarative form, clause bodies and goals are conceptually represented by sets of constraints and atoms. While some works exist on generalizing CLP, these typically focus on the underlying constraint domain and introduce widening operators (e.g. the convex hull) in order to generalize the constraint set at the semantic level (e.g. [Fioravanti et al. (2013)]). Other existing works target a particular application, such as learning constraints by generalizing samples of facts [Gutiérrez-Naranjo et al. (2003)]. In contrast, we take a fundamentally different approach and focus on generalizing the syntactic representation of the program structures to be generalized (basically conjunctions represented by sets of constraints and atoms), independently of the particular constraint or application domain. Our main motivation for doing so is to obtain a generalization operator that computes the maximal common syntactic structure shared by two goals or, by extension, clauses and predicates. This is a basic operation needed in work on clone detection and detection of algorithmic equivalence (see e.g. [Mesnard et al. (2016)]), where one needs to frequently and rapidly compute such generalizations in order to assess how closely related two goals or clauses are. Moreover, since the generalization operator we propose is domain- and application-independent, it could readily be integrated into other program manipulation approaches that need to generalize CLP clauses (examples include conjunctive partial deduction or ILP-based learning). While other, more involved generalization approaches exist, for example grammar-based E-generalization [Burghardt (2014)] and regular tree abstraction [Bouajjani et al. (2006)], in this work we focus on the most specific generalization (msg), as it best suits our particular context.

Computing a most specific generalization (msg) of two or more terms (and, by extension, atoms) or other tree-like structures is straightforward and can be done in linear time. Existing algorithms are typically based on the seminal algorithm of Plotkin [Plotkin (1970)], in which two tree structures are generalized by computing their maximal common subtree and replacing non-matching subbranches by new variables. However, when more involved computational structures need to be generalized (such as conjunctions of atoms, goals and clauses), the literature is less clear on which algorithms are available to automatically compute their most specific generalization. The basic problem, of course, is that in this case one is not necessarily interested in viewing the structures to be generalized as simple tree structures, as that would be too restrictive. Take for instance two conjunctions that share some atoms, but not at the same positions: when these conjunctions are considered as trees, computing the msg would miss the fact that the atoms appearing at different positions are also common to both conjunctions. Depending on the application at hand, an ad-hoc technique is usually introduced that most often boils down to applying the classical msg operation to (a subset of) the atoms of both structures, usually preserving the order in which the atoms appear in the structure for efficiency reasons. This is, for example, the case in conjunctive partial deduction [Leuschel et al. (1998)], where conjunctions are treated as sequential structures and the abstraction operation generalizes ordered subconjunctions. This is defensible when partially deducing Prolog programs, where the order of the atoms in a conjunction is important and usually needs to be preserved, but it nevertheless limits the possible outcomes of the generalization operation and makes it hard to transfer the approach to other contexts where the order of the individual atoms or other computational constituents might be less important.
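
For tree-like structures, Plotkin's approach fits in a few lines. The Python below is a minimal illustration under our own assumptions, not the paper's implementation: constants are lowercase strings, variables uppercase strings, compound terms tuples `(functor, arg1, ..., argn)`, and fresh variables are named `_G0`, `_G1`, and so on. Reusing the same fresh variable whenever the same pair of mismatching subterms recurs is what makes the result least general. On the three atoms of the introduction it returns $p(a, X, f(Y))$ up to variable names.

```python
from itertools import count


def lgg(s, t, table, fresh):
    """Least general generalization of two terms (Plotkin-style sketch)."""
    if s == t:
        return s                                    # identical subterms are kept
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # same functor and arity: generalize the arguments position by position
        return (s[0],) + tuple(lgg(a, b, table, fresh) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:                         # mismatch: introduce a variable,
        table[(s, t)] = f"_G{next(fresh)}"          # reused if the same pair recurs
    return table[(s, t)]


def msg_of_terms(terms):
    """Fold lgg over a non-empty list of terms, sharing one fresh-name counter."""
    fresh = count()
    result = terms[0]
    for t in terms[1:]:
        result = lgg(result, t, {}, fresh)
    return result


atoms = [("p", "a", "a", ("f", "a")),
         ("p", "a", "b", ("f", ("g", "c"))),
         ("p", "a", "A", ("f", "a"))]
print(msg_of_terms(atoms))  # ('p', 'a', '_G2', ('f', '_G3'))
```

The generalization works strictly position by position, which is precisely why it cannot recognize atoms shared by two conjunctions at different positions.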

While CLP is an important target in itself, especially given its aptitude as a universal intermediate language for analysis and transformation [Gange et al. (2015)], our generalization operator, which basically manipulates sets of atoms, can also benefit program transformation for classical (non-constraint) logic programming, as it lifts the restriction, imposed by most existing generalization operators, of preserving the order of the atoms in the conjunctions that are generalized.

The paper is structured as follows. In Section 2 we introduce some preliminary concepts and notation, and in Section 3 we introduce our main abstraction and algorithm. We evaluate our approach by means of a prototype implementation in Section 4, before concluding in Section 5.

## 2 Preliminaries

### 2.1 Constraint logic programming essentials

Let us first introduce some of the basic concepts and notation that will be used throughout the paper. A CLP program is traditionally defined [Jaffar and Maher (1994)] over a CLP context, which is a 5-tuple $\langle \mathcal{K}, \mathcal{V}, \mathcal{F}, \mathcal{C}, \mathcal{P} \rangle$, where $\mathcal{K}$ is a non-empty set of constant values, $\mathcal{V}$ is a set of variable names, $\mathcal{F}$ a set of function names, $\mathcal{C}$ a set of constraint predicates over $\mathcal{K}$ and $\mathcal{P}$ a set of predicate symbols. These five sets are all supposed to be pairwise disjoint. Symbols from $\mathcal{F}$, $\mathcal{C}$ and $\mathcal{P}$ have an associated arity and, as usual, we write $s/n$ to represent a symbol $s$ having arity $n$. Given a CLP context, we can define the set of terms $\mathcal{T}$ as the smallest set containing $\mathcal{K} \cup \mathcal{V}$ such that $f(t_1,\ldots,t_n) \in \mathcal{T}$ whenever $f/n \in \mathcal{F}$ and $t_1,\ldots,t_n \in \mathcal{T}$. Likewise, the set of constraints is defined as $\{c(t_1,\ldots,t_n) \mid c/n \in \mathcal{C},\ t_1,\ldots,t_n \in \mathcal{T}\}$ and the set of atoms as $\{p(X_1,\ldots,X_n) \mid p/n \in \mathcal{P},\ X_1,\ldots,X_n \in \mathcal{V}\}$. A goal is a set of atoms and/or constraints. We will sometimes use the notion of a literal to refer to either a constraint or an atom. A program is then defined over a context as a set of constraint Horn clause definitions, where each clause is of the form $H \leftarrow G$, with $H$ an atom with all distinct variables, called the head of the clause, and $G$ a goal, called the body of the clause. We will sometimes write a clause as $H \leftarrow C, B$ if we want to distinguish the set of constraints $C$ and the set of atoms $B$ in its body. A fact is a clause with only constraints in its body. For a predicate symbol $p$, the definition of $p$ in the program at hand is the set of clauses whose head atom uses $p$ as predicate symbol. Without loss of generality, we suppose that all clauses defining a predicate have the same head (i.e. use the same variables to represent the arguments).

In what follows we will often consider the context to be implicit and simply talk about a program and the predicates and clauses defined therein. Without loss of generality, we assume that the set of constraint predicates contains at least an equality relation, represented by $=$. Note that in our definition of a clause, atoms contain only variables as arguments. This is by no means a limitation, as arguments can be instantiated by means of equality constraints in the clause body.

Different semantics have been defined for CLP. In our approach, we consider the declarative semantics as in [Jaffar and Maher (1994)]. A constraint domain $D$ comprises a set of values and an interpretation for the relational symbols used in the underlying context. Given a constraint domain $D$, a valuation $v$ is a mapping from variables to values, and we say that a set of constraints $C$ is satisfiable, noted $D \models C$, if there exists a valuation $v$ such that $v(C)$ evaluates to true. In this work we focus on the declarative semantics of a program, which is defined as a subset of $\mathcal{B}_D$, the latter being defined as $\{p(d_1,\ldots,d_n) \mid p/n \in \mathcal{P},\ d_1,\ldots,d_n \in D\}$. For a program $P$ and an underlying constraint domain $D$, the immediate consequence operator $T^D_P$ can be defined as a continuous function on $2^{\mathcal{B}_D}$ as follows [Jaffar et al. (1998)]:

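$$T^D_P(I) = \left\{ p(v_1,\ldots,v_n) \;\middle|\; \begin{array}{l} p(V_1,\ldots,V_n) \leftarrow C, B \text{ a renamed apart clause in } P,\\ v \text{ a valuation on } D \text{ such that } D \models v(C) \text{ and } v(B) \subseteq I,\\ \forall i \in \{1,\ldots,n\}: v_i = v(V_i) \end{array} \right\}$$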

The semantics of a program $P$, which we will represent by $[\![P]\!]_D$, can then be defined as the least fixed point of $T^D_P$. In what follows, we will often simply refer to the semantics of a program without specifying the underlying constraint domain or CLP context. The semantics of a goal $G$ with respect to a program $P$ and a set of variables $S = \{X_1,\ldots,X_n\}$ occurring in $G$ is then defined as the set of atoms $\mathit{ans}(d_1,\ldots,d_n) \in [\![P']\!]_D$, where $P'$ is the program $P$ to which a clause $\mathit{ans}(X_1,\ldots,X_n) \leftarrow G$ has been added, $\mathit{ans}$ being a special predicate symbol not occurring in $P$. Slightly abusing notation, we will use $[\![G]\!]^S_P$ to denote the semantics of the goal $G$ w.r.t. the program $P$ and the set of variables $S$, or simply $[\![G]\!]^S$ if the program is clear from the context. While in practice CLP is typically used over a concrete domain, we abstract away from the concrete domain over which the constraints are expressed, as our generalization theory only considers the syntactic structure of the constraints (and not their semantics).

### 2.2 Generalization principles

For any program expression $E$ (be it a term, a constraint, an atom or a goal), we use $\mathit{vars}(E)$ to denote the set of variables that appear in $E$. As usual, a substitution is a mapping from variables to terms and will be denoted by a Greek letter. For any mapping $\mu$, $\mathit{dom}(\mu)$ represents its domain and $\mathit{img}(\mu)$ its image; for a program expression $E$ and a substitution $\theta$, $E\theta$ represents the result of simultaneously replacing in $E$ those variables that are in $\mathit{dom}(\theta)$ by their image under $\theta$. A renaming is a special kind of substitution, mapping variables to distinct variables (i.e. being injective). For a renaming $\rho$, we use $\rho^{-1}$ to denote its inverse. Two expressions $E$ and $E'$ are variants if and only if $E\rho = E'$ and $E = E'\rho^{-1}$ for some renaming $\rho$. For an expression $E$, a fresh renaming of $E$ is a variant of $E$ where all variables have been renamed to new, previously unused variables. Given the notion of a renaming, we can easily define a quasi-order relation $\preceq$ between goals as follows.

###### Definition 1 (Generalization)

Let $G$ and $G'$ be goals. We say that $G$ is more general than (or, synonymously, is a generalization of) $G'$, denoted $G \preceq G'$, if and only if there exists a renaming $\rho$ such that $G\rho \subseteq G'$.
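
Definition 1 can be checked directly by a backtracking search for the renaming. The Python sketch below assumes literals encoded as tuples `(symbol, arg1, ..., argn)`, with variables as uppercase strings and constants as lowercase strings; the helper names `match` and `find_renaming` are hypothetical. The renaming is kept injective, and constants and functors must coincide, mirroring the fact that they cannot be generalized through variabilization.

```python
def is_var(t):
    # Assumed encoding: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def match(a, b, rho):
    """Extend the injective renaming rho so that a.rho == b; None if impossible."""
    if is_var(a):
        if not is_var(b):
            return None                      # renamings map variables to variables only
        if a in rho:
            return rho if rho[a] == b else None
        if b in rho.values():
            return None                      # b is already the image of another variable
        return {**rho, a: b}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):       # same symbol: match arguments pairwise
            rho = match(x, y, rho)
            if rho is None:
                return None
        return rho
    return rho if a == b else None           # constants must coincide


def find_renaming(g, h):
    """Return an injective renaming rho with g.rho a subset of h (g <= h), else None."""
    lits = list(g)

    def search(i, rho):
        if i == len(lits):
            return rho
        for target in h:                     # try every literal of h for lits[i]
            ext = match(lits[i], target, rho)
            if ext is not None:
                found = search(i + 1, ext)
                if found is not None:
                    return found
        return None                          # no target works: backtrack

    return search(0, {})


G1 = {("f", "X"), ("g", "X"), ("g", "Y")}
G2 = {("f", "R"), ("g", "T")}
print(find_renaming(G2, G1) == {"R": "X", "T": "Y"})  # True: G2 is more general than G1
print(find_renaming(G1, G2))                          # None: g(X), g(Y) cannot share g(T)
```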

Hence, a goal is more general than another goal if the former is a subset of the latter modulo a variable renaming. While our notion of generalization is simple and purely of syntactic nature, it is in line with what one could consider to be a generalization at the semantic level, since generalizing a goal corresponds to removing computational units (constraints or atoms).

###### Example 1

Consider the goal . Then the goals , , and are all generalizations of .

In a more traditional logic programming context, an atom $A$ is typically defined as more general than another atom $A'$ if the latter can be obtained from the former by applying a substitution [Benkerimi and W. Lloyd (1990), Sørensen and Glück (1995)], and generalizing an atom is done by replacing terms with new variables. Since in our context atoms are represented in simple form (i.e. all arguments being variables), the same effect can be obtained by removing constraints from the goal. Note that our definition is, at the same time, more general, as it also allows a goal to be generalized by removing atoms. In a traditional logic programming context, where goals are conjunctions of atoms, one needs to resort to higher-order generalization techniques in order to obtain the same effect. Also observe that in our generalization scheme, constants and functors cannot be generalized through variabilization, because renamings map variables to variables only. This is a fundamental difference between the relation $\preceq$ and the $\theta$-subsumption relation of [Plotkin (1970)], the latter being defined by substitutions rather than renamings. Our relation is a first-order generalization (higher-order terms as well as predicate names cannot be generalized) with fixed constants and functors.

Defining generalizations with injective mappings (i.e. renamings) rather than arbitrary mappings from variables to terms, as in $\theta$-subsumption, ensures that a variable cannot be generalized by two (or more) distinct variables in the computed generalization. If renamings were not injective, a generalization could have many more variables than the goals it generalizes; in that case, the generalization could contain variables that are no longer linked at the semantic level, such as new variables occurring only once. For many domains, injectivity is the more sensible choice, as it prevents variables from losing their semantics once generalized.

###### Example 2

Let us consider where we suppose the constraints are over some numerical domain. In our framework, the three following generalizations are correct: , , . Without the restriction to injective renamings, would also be a valid generalization.

In practice, some domain-specific constraint predicates and functional operators could be characterized as commutative (such as $=$ and $+$ over numeric domains), which would affect their generalizations. The approach presented in this paper could easily be extended to take this property into account, but for the sake of clarity we keep the approach purely syntactic in that respect, only considering non-commutative symbols in textual representations of constraints. Despite their differences, our generalization relation shares the following property with the usual $\theta$-subsumption order from [Khardon and Arias (2006)].

###### Proposition 1

The generalization relation $\preceq$ is a quasi-order.

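We need to prove that $\preceq$ is reflexive and transitive. Reflexivity is immediate since $G\iota = G$ for any goal $G$, with $\iota$ the identity renaming, and thus $G \preceq G$. For transitivity, consider three arbitrary goals $G_1$, $G_2$ and $G_3$ such that $G_1 \preceq G_2$ and $G_2 \preceq G_3$. Then, by Definition 1, there exist renamings $\rho_1$ and $\rho_2$ and sets of literals $\Delta_1$ and $\Delta_2$ such that $G_2 = G_1\rho_1 \cup \Delta_1$ and $G_3 = G_2\rho_2 \cup \Delta_2$. Or, equivalently,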

$$G_3 = (G_1\rho_1 \cup \Delta_1)\rho_2 \cup \Delta_2 = G_1\rho_1\rho_2 \cup (\Delta_1\rho_2 \cup \Delta_2)$$

Since the composition of two renamings is a renaming, and the union of two sets is a set, it follows that $G_1 \preceq G_3$.

Generalized goals are linked by their semantics as stated in Proposition 2 below.

###### Proposition 2

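Let $P$ be a program and $G$ and $G'$ goals. If $G \preceq G'$ with $G\rho \subseteq G'$ for some renaming $\rho$, then for any set of variables $S \subseteq \mathit{vars}(G)$, we have that $[\![G']\!]^{S\rho}_P \subseteq [\![G]\!]^{S}_P$.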

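The proof is straightforward given that $[\![G]\!]^S_P = [\![G\rho]\!]^{S\rho}_P$, since renaming a goal together with its answer variables leaves its semantics unchanged. Indeed, write $G' = G\rho \cup \Delta$ for some set of constraints and/or atoms $\Delta$, and let $C$ and $B$ (resp. $C'$ and $B'$) denote the constraints and atoms of $G\rho$ (resp. $G'$), so that $C \subseteq C'$ and $B \subseteq B'$. Any valuation $v$ on the underlying domain such that $D \models v(C')$ and $v(B') \subseteq [\![P]\!]_D$ then also satisfies $D \models v(C)$ and $v(B) \subseteq [\![P]\!]_D$. Hence every answer of $G'$ on the variables $S\rho$ is also an answer of $G\rho$ on $S\rho$ and, consequently, $[\![G']\!]^{S\rho}_P \subseteq [\![G]\!]^S_P$.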

We can now define the computational structure that is shared by a set of goals through the concept of common generalization.

###### Definition 2 (Common generalization)

Let $\mathcal{S} = \{G_1,\ldots,G_n\}$ be a set of goals. Then a goal $G$ is a common generalization of $\mathcal{S}$ if and only if $G \preceq G_i$ for all $i \in \{1,\ldots,n\}$.

In what follows we will mostly consider common generalizations of two goals. Note that at least one common generalization exists for any two goals: the empty set, which can be seen as the most general generalization, i.e. the minimal element in the quasi-order $\preceq$. But obviously the empty set is not an interesting generalization to express similarities in groups of literals. In what follows, we are interested in computing what we call a most specific generalization, that is, a maximal element with respect to $\preceq$. A most specific generalization is also sometimes called a least general generalization.

###### Definition 3 (msg)

Let $G$ be a common generalization of $\mathcal{S}$. Then $G$ is a most specific generalization (msg) of $\mathcal{S}$ if there does not exist another common generalization $G'$ of $\mathcal{S}$ such that $G \preceq G'$ and $G' \not\preceq G$.

Note that, by definition, a common generalization of two goals $G_1$ and $G_2$ is a variant of both a subset of $G_1$ and a subset of $G_2$. Without loss of generality, we will often consider a common generalization to be a subset of one of the goals, as in the following example.

###### Example 3

Let us consider the goals

$$G_1 = \{f(X), g(X), g(Y)\} \qquad G_2 = \{f(R), g(T)\}$$

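$G = \{f(X), g(Y)\}$ is a common generalization of $\{G_1, G_2\}$, as there exists $\rho = \{X \mapsto R, Y \mapsto T\}$ such that $G\rho = G_2$, so $G \preceq G_2$; it also holds that $G \subseteq G_1$, so $G \preceq G_1$. Moreover, $G$ is an msg of $\{G_1, G_2\}$, as no strictly less general common generalization exists, $G$ having generalized all literals in $G_2$. Note that $G_2$ itself is also an msg of $\{G_1, G_2\}$, which can as easily be proved. In fact, by Definition 1, any variant of $G$ is also an msg of $G_1$ and $G_2$.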

Contrary to traditional logic programming, where the most specific generalization of two goals is unique (modulo variable renaming) [Benkerimi and W. Lloyd (1990)], in our context two goals may have several most specific generalizations.

###### Example 4

Let us consider the goals

$$G_1 = \{f(X), g(Y), h(X,Y)\} \qquad G_2 = \{f(R), g(U), h(T,S)\}$$

$\{f(X), g(Y)\}$ and $\{h(X, Y)\}$ are both msgs of $\{G_1, G_2\}$. Indeed, neither of these generalizations allows the addition of any further literal while remaining a valid common generalization of $G_1$ and $G_2$, since the generalization renamings must map each variable consistently to a single, distinct variable. The two msgs are thus incomparable with respect to $\preceq$.

Amongst the msgs of a set of goals, some generalizations could only have a few literals, thereby capturing less common structure than others. Ideally, we are interested in those most specific generalizations that are of maximal cardinality.

###### Definition 4 (mcg)

Let $G$ be a common generalization of $\mathcal{S}$. Then $G$ is a maximal common generalization (mcg) of $\mathcal{S}$ if there does not exist another common generalization $G'$ of $\mathcal{S}$ such that $|G'| > |G|$.
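
Definition 4 suggests a naive, exponential-time procedure: since a common generalization can be taken to be a subset of one of the goals, one can try the subsets of $G_1$ from largest to smallest and keep the first one that also generalizes $G_2$. The Python sketch below does exactly that, assuming literals encoded as tuples `(symbol, arg1, ..., argn)` with uppercase-string variables; the names `match`, `find_renaming` and `mcg` are hypothetical. It only makes the definition concrete and is not meant to be efficient.

```python
from itertools import combinations


def is_var(t):
    # Assumed encoding: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def match(a, b, rho):
    """Extend the injective renaming rho so that a.rho == b; None if impossible."""
    if is_var(a):
        if not is_var(b) or (a not in rho and b in rho.values()):
            return None
        if a in rho:
            return rho if rho[a] == b else None
        return {**rho, a: b}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):
            rho = match(x, y, rho)
            if rho is None:
                return None
        return rho
    return rho if a == b else None


def find_renaming(g, h):
    """Backtracking search for an injective rho with g.rho a subset of h."""
    lits = list(g)

    def search(i, rho):
        if i == len(lits):
            return rho
        for target in h:
            ext = match(lits[i], target, rho)
            if ext is not None:
                found = search(i + 1, ext)
                if found is not None:
                    return found
        return None

    return search(0, {})


def mcg(g1, g2):
    """Largest subset of g1 that generalizes g2 (exponential in |g1|)."""
    lits = sorted(g1, key=repr)                  # fixed order for reproducibility
    for size in range(len(lits), -1, -1):        # largest candidates first
        for subset in combinations(lits, size):
            if find_renaming(subset, g2) is not None:
                return set(subset)


# Goals of Example 4: the mcg keeps f and g; the other msg {h(X, Y)} is smaller.
G1 = {("f", "X"), ("g", "Y"), ("h", "X", "Y")}
G2 = {("f", "R"), ("g", "U"), ("h", "T", "S")}
print(mcg(G1, G2) == {("f", "X"), ("g", "Y")})  # True
```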

It is trivial to show that a maximal common generalization $G$ of a set of goals $\mathcal{S}$ is also a most specific generalization of $\mathcal{S}$. Indeed, if it were not, then by Definition 3 there would exist a common generalization $G'$ with $G \preceq G'$ and $G' \not\preceq G$, i.e. $G\rho \subsetneq G'$ for some renaming $\rho$. But $G'$ would then have strictly greater cardinality than $G$, so $G$ could not be maximal. However, computing a maximal common generalization is an intractable problem. The reason is, of course, that we need to match unordered sets of literals rather than sequences, whereas the classical subsumption-based formulation from [Plotkin (1970)] is computable in polynomial time.

In order to show this formally, we define a decision problem, which we name MCGP (Maximal Common Generalization Problem), and show it to be NP-complete. MCGP boils down to verifying whether there exists a renaming such that the smaller of two goals is in itself a maximal common generalization of both. Formally: given two goals $G_1$ and $G_2$ with $|G_1| \leq |G_2|$ and $\mathit{vars}(G_1) \cap \mathit{vars}(G_2) = \emptyset$, verify whether there exists a renaming $\rho$ such that $G_1\rho$ is a subset of $G_2$.

###### Theorem 1

The MCGP problem is NP-complete.

It is easy to see that MCGP is in NP: given renamed apart goals $G_1$ and $G_2$ as well as a renaming $\rho$, applying $\rho$ to all the literals in $G_1$ either yields a subset of $G_2$ or not, which can be verified in polynomial time.

We will now perform a reduction from the Induced Subgraph Isomorphism Problem (ISIP), which is stated as follows [Sysło (1982)]. Consider two undirected and unweighted graphs $\mathcal{G}_1 = (V_1, E_1)$ and $\mathcal{G}_2 = (V_2, E_2)$, where for each graph $\mathcal{G}_i$, $V_i$ denotes the set of vertices and $E_i$ the set of edges between vertices from $V_i$. Assuming, moreover, that $|V_1| \leq |V_2|$, ISIP is the problem of deciding whether $\mathcal{G}_1$ is isomorphic to an induced subgraph of $\mathcal{G}_2$, meaning there exists a (total) injective function $f : V_1 \rightarrow V_2$ such that, for all $x, y \in V_1$, there is an edge $(x, y) \in E_1$ if and only if there is an edge $(f(x), f(y)) \in E_2$. The problem is known to be NP-complete [Sysło (1982)].

We can transform any instance of ISIP into an instance of MCGP as follows. Given the graphs $\mathcal{G}_1$ and $\mathcal{G}_2$ (with $|V_1| \leq |V_2|$), we define the goals

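$$G_1 = \{\mathit{node}(V_x) \mid x \in V_1\} \cup \{\mathit{edge}(V_x, V_y) \mid (x, y) \in E_1\}$$
$$G_2 = \{\mathit{node}(V_x) \mid x \in V_2\} \cup \{\mathit{edge}(V_x, V_y) \mid (x, y) \in E_2\}$$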

In these goals, we suppose that $\mathit{node}$ is a unary predicate representing nodes and $\mathit{edge}$ a binary predicate representing edges between nodes. Given a node $x$, we use a variable named $V_x$ to represent this node in the goal. If $G_1$ and $G_2$ have at least one variable name in common, considering a renamed apart version of $G_2$ rather than $G_2$ itself ensures that the obtained instance of MCGP is valid. Using this scheme, the transformation from graphs into goals can obviously be done in polynomial time. We will now prove that this transformation preserves the positive and negative instances of ISIP, that is, $\mathcal{G}_1$ is isomorphic to an induced subgraph of $\mathcal{G}_2$ if and only if $G_1$ is an mcg of $\{G_1, G_2\}$.

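• Let us suppose that $\mathcal{G}_1$ is isomorphic to an induced subgraph of $\mathcal{G}_2$. In other words, there exists an injective function $f : V_1 \rightarrow V_2$ such that, for all $x, y \in V_1$, there is an edge $(x, y) \in E_1$ if and only if there is an edge $(f(x), f(y)) \in E_2$. We have to show that $G_1$ is an mcg of $G_1$ and $G_2$. Obviously, the existence of $f$ implies the existence of a renaming $\rho$ defined as $\rho(V_x) = V_{f(x)}$ for all $x \in V_1$. Since $f$ is a total injective function, we have that for each $\mathit{node}(V_x) \in G_1$ there is $\mathit{node}(V_{f(x)}) \in G_2$ and, by definition of $f$, for each $\mathit{edge}(V_x, V_y) \in G_1$ there is $\mathit{edge}(V_{f(x)}, V_{f(y)}) \in G_2$. In other words, $G_1\rho$ is a subset of $G_2$ and, hence, $G_1$ is a generalization of $G_2$. Since $G_1$ trivially generalizes itself and no common generalization can contain more literals than $G_1$, it is consequently a maximal common generalization of $\{G_1, G_2\}$.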

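• The other way round, suppose that $G_1$ is an mcg of $\{G_1, G_2\}$, implying there exists a renaming $\rho$ such that $G_1\rho \subseteq G_2$. Given that every variable of $G_1$ is of the form $V_x$ with $x \in V_1$ and that $\rho$ is injective by definition, we can define a function $f : V_1 \rightarrow V_2$ as $f(x) = y$ if and only if $\rho(V_x) = V_y$, which is injective as well. Now, $\mathit{dom}(f) = V_1$ (i.e. $f$ is total), since there is a literal $\mathit{node}(V_x) \in G_1$ for each vertex $x \in V_1$, whose image under $\rho$ belongs to $G_2$. Moreover, since $G_1\rho \subseteq G_2$, we have that for each $\mathit{edge}(V_x, V_y) \in G_1$ there exists $\mathit{edge}(V_{f(x)}, V_{f(y)}) \in G_2$ and, consequently, for all $x, y \in V_1$, there is an edge $(x, y) \in E_1$ if and only if there is an edge $(f(x), f(y)) \in E_2$, concluding the proof that $\mathcal{G}_1$ is isomorphic to an induced subgraph of $\mathcal{G}_2$.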
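
The encoding used in this reduction can be made concrete as follows. The Python sketch below (helper names hypothetical) builds the goals from two graphs, storing each edge in the orientation in which it is listed, as in the construction above, and checks the MCGP certificate induced by a vertex map $f$, namely whether $G_1\rho \subseteq G_2$ for $\rho(V_x) = V_{f(x)}$. This check is the polynomial-time verification used to show that MCGP is in NP.

```python
def graph_to_goal(vertices, edges):
    """node(V_x) for every vertex x, edge(V_x, V_y) for every listed edge (x, y)."""
    goal = {("node", f"V_{x}") for x in vertices}
    goal |= {("edge", f"V_{x}", f"V_{y}") for x, y in edges}
    return goal


def rename(goal, rho):
    """Apply a variable renaming to the arguments of every literal."""
    return {(lit[0],) + tuple(rho.get(arg, arg) for arg in lit[1:]) for lit in goal}


def certifies(f, vertices1, goal1, goal2):
    """Does the vertex map f induce a renaming rho with goal1.rho a subset of goal2?"""
    if set(f) != set(vertices1) or len(set(f.values())) != len(f):
        return False                             # f must be total and injective
    rho = {f"V_{x}": f"V_{y}" for x, y in f.items()}
    return rename(goal1, rho) <= goal2


V1, E1 = [1, 2, 3], [(1, 2), (2, 3), (1, 3)]                       # a triangle
V2, E2 = ["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
G1, G2 = graph_to_goal(V1, E1), graph_to_goal(V2, E2)
print(certifies({1: "a", 2: "b", 3: "c"}, V1, G1, G2))  # True
print(certifies({1: "a", 2: "b", 3: "d"}, V1, G1, G2))  # False: (b, d) is not an edge
```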

## 3 Anti-unification algorithm

In the following we restrict ourselves to generalizations of two renamed apart goals $G_1$ and $G_2$, each of them being a set of literals. To construct a generalization of $G_1$ and $G_2$, our algorithm basically needs to search for a subset of $G_1$ that is also a subset of $G_2$ (modulo a variable renaming) and vice versa. To represent these matching subsets, the algorithm uses an injective mapping $\phi$ that associates literals from $G_1$ to matching literals of $G_2$. For $\phi$ to represent a generalization, there must exist a renaming $\rho$ such that $\mathit{dom}(\phi)\rho = \mathit{img}(\phi)$ and, likewise, $\mathit{img}(\phi)\rho^{-1} = \mathit{dom}(\phi)$. In what follows we will use the word generalization to refer to such a mapping as well as to the goal(s) it represents.

###### Example 5

Let us consider the goals

$$G_1 = \{f(X), f(Z), g(X,Y), h(Y,Z)\} \qquad G_2 = \{f(R), g(R,T), h(T,U), f(U)\}$$

Then the mapping (mapping from to from and from to from ) is a generalization of and . Indeed, and is a variant of .
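
Whether a literal mapping represents a generalization can be tested by merging the renamings induced by its couples into a single injective renaming. The Python sketch below assumes literals encoded as tuples with uppercase-string variables and couples encoded as pairs `(l1, l2)`; the names `match` and `merged_renaming` are hypothetical. Injectivity of the mapping on literals then follows from the injectivity of the merged renaming. For the goals of Example 5, all four literals can in fact be matched.

```python
def is_var(t):
    # Assumed encoding: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def match(a, b, rho):
    """Extend the injective renaming rho so that a.rho == b; None if impossible."""
    if is_var(a):
        if not is_var(b) or (a not in rho and b in rho.values()):
            return None
        if a in rho:
            return rho if rho[a] == b else None
        return {**rho, a: b}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):
            rho = match(x, y, rho)
            if rho is None:
                return None
        return rho
    return rho if a == b else None


def merged_renaming(phi):
    """One injective renaming rho with l1.rho == l2 for every couple, or None."""
    rho = {}
    for l1, l2 in phi:
        rho = match(l1, l2, rho)
        if rho is None:
            return None                          # the couples demand clashing bindings
    return rho


phi = {(("f", "X"), ("f", "R")), (("f", "Z"), ("f", "U")),
       (("g", "X", "Y"), ("g", "R", "T")), (("h", "Y", "Z"), ("h", "T", "U"))}
print(merged_renaming(phi) == {"X": "R", "Y": "T", "Z": "U"})                 # True
print(merged_renaming({(("f", "X"), ("f", "R")), (("f", "Z"), ("f", "R"))}))  # None
```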

Since computing maximal common generalizations is an NP-complete problem, we will rather focus on computing common generalizations $\phi$ that are not necessarily maximal, but whose size is stable in the sense that replacing a limited number of elements in $\phi$ does not give rise to a larger generalization. Let us first define the notion of a $k$-swap, being a replacement of at most $k$ elements in a generalization.

###### Definition 5 (k-swap)

Let and be two renamed apart goals, and generalizations. We say that is a -swap of if and only if and for some .

Intuitively, a $k$-swap of a generalization $\phi$ is obtained from $\phi$ by changing at most $k$ elements, such that the result is still a generalization.

###### Example 6

Let us reconsider the generalization from Example 5. Then the generalization

$$\phi' = \{(g(X,Y), g(R,T)),\ (h(Y,Z), h(T,U))\}$$

is a 1-swap of , since effectively one element has been replaced in to get . In a similar way, is a 2-swap of (but is not a 1-swap, as two elements have been replaced to get ).

Central to our approach to obtain a workable anti-unification algorithm is the notion of $k$-swap stability. We call a generalization $\phi$ of goals $G_1$ and $G_2$ $k$-swap stable if any larger generalization of these goals differs from $\phi$ in more than $k$ elements.

###### Definition 6 (k-swap stability)

Let $G_1$ and $G_2$ be two renamed apart goals and $\phi$ a generalization of $G_1$ and $G_2$. Then $\phi$ is $k$-swap stable if and only if there does not exist a larger generalization $\phi'$ such that $\phi'$ is a $k$-swap of $\phi$. Such a $\phi'$ is called a $k$-swap extension of $\phi$.

A $k$-swap stable generalization, even though not necessarily maximal, is at least stable in the sense that there is no obvious way (i.e. by replacing $k$ or fewer elements) in which a larger generalization could be obtained. Put differently, when a generalization is constructed by a search algorithm, $k$-swap stability implies that, in order to find a larger generalization, the algorithm would need to reconsider more than $k$ of the choices that were made during construction.

###### Example 7

Consider and . Then, when is constructed by mapping to , the largest generalization mapping that can grow to is or, equivalently, the generalization . However is not 1-swap stable. Indeed, mapping to instead would give rise to or, equivalently, the larger generalization .

Obviously, if a generalization $\phi$ between goals $G_1$ and $G_2$ is $k$-swap stable for all $k$, then $\phi$ is a maximal, and thus most specific, generalization. This is in line with the intuition that, as $k$ grows, a $k$-swap stable generalization is increasingly stable and thus increasingly accurate (in number of generalized literals).

One more concept needs to be introduced before we can define our algorithm for computing $k$-swap stable generalizations, namely an operator that combines two generalizations into a single generalization.

###### Definition 7 (Enforcement operator)

Let and be two renamed apart goals. The enforcement operator is defined as the function such that for two generalizations and for and , where is the largest subset of such that is a generalization of and .

In other words, is the mapping obtained from by eliminating those pairs of literals from that are incompatible with some either because it concerns the same literal(s) or because the involved renamings cannot be combined into a single renaming.
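
Because two couples clash exactly when their renamings cannot be merged, and such a clash always involves two couples at a time, the largest compatible subset can be obtained by a single filter, provided both arguments are themselves generalizations. The Python sketch below uses a hypothetical `enforce(forced, phi)` signature that keeps every couple of `forced` and drops the couples of `phi` clashing with any of them; literals are assumed to be tuples with uppercase-string variables.

```python
def is_var(t):
    # Assumed encoding: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def match(a, b, rho):
    """Extend the injective renaming rho so that a.rho == b; None if impossible."""
    if is_var(a):
        if not is_var(b) or (a not in rho and b in rho.values()):
            return None
        if a in rho:
            return rho if rho[a] == b else None
        return {**rho, a: b}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):
            rho = match(x, y, rho)
            if rho is None:
                return None
        return rho
    return rho if a == b else None


def compatible(p, q):
    """Couples p and q are compatible if their renamings merge into one."""
    rho = match(p[0], p[1], {})
    return rho is not None and match(q[0], q[1], rho) is not None


def enforce(forced, phi):
    """Keep all couples of `forced`; drop every couple of phi clashing with one."""
    kept = {q for q in phi if all(compatible(p, q) for p in forced)}
    return set(forced) | kept


phi = {(("f", "X"), ("f", "R")), (("g", "X", "Y"), ("g", "R", "T")),
       (("h", "Y", "Z"), ("h", "T", "U"))}
# Forcing f(X) onto f(U) evicts every couple relying on X -> R or Z -> U.
print(enforce({(("f", "X"), ("f", "U"))}, phi))  # {(('f', 'X'), ('f', 'U'))}
```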

###### Example 8

Consider , a generalization of two goals and . Suppose is also a generalization of and . Enforcing gives . Indeed, this can be seen as forcing to be mapped on ; therefore the resulting generalization can no longer contain as the latter maps on .

Algorithm 1 represents the high-level construction of a $k$-swap stable generalization of goals $G_1$ and $G_2$. In the algorithm, the candidate couples are the pairs of literals from $G_1$ and $G_2$ that are variants of each other. In each round, the algorithm tries to transform the current generalization $\phi$ (which initially is empty) into a larger generalization by forcing a new candidate couple into $\phi$, which is only accepted if doing so requires swapping no more than $k$ elements in $\phi$. More precisely, the algorithm selects a subset of $\phi$ that can be swapped with a subset of the remaining, not yet used candidate couples, such that replacing the former by the latter in $\phi$ and adding the new couple constitutes a generalization. Note how condition 1 in the algorithm expresses that the subset of $\phi$ being swapped out must include at least those elements of $\phi$ that are not compatible with the new couple. The search continues until no such couple can be added.

Even though the algorithm as formulated is non-deterministic and does not specify how the new couple or the subsets to be swapped are computed (we will come back to this), it can easily be seen that it computes a $k$-swap stable generalization.
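
The following Python sketch instantiates Algorithm 1 with an exhaustive search for the swap. It rests on our own reading of the algorithm, which is an assumption: the couples swapped out must include every couple of $\phi$ that clashes with the enforced couple, at most $k$ couples are swapped out, and they are replaced by the same number of unused candidate couples, so that $\phi$ grows by exactly one couple per round. Candidate couples are tried in a fixed (`repr`) order, and all names are hypothetical. With $k = 0$ the procedure degenerates to a greedy pass.

```python
from itertools import combinations


def is_var(t):
    # Assumed encoding: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def match(a, b, rho):
    """Extend the injective renaming rho so that a.rho == b; None if impossible."""
    if is_var(a):
        if not is_var(b) or (a not in rho and b in rho.values()):
            return None
        if a in rho:
            return rho if rho[a] == b else None
        return {**rho, a: b}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):
            rho = match(x, y, rho)
            if rho is None:
                return None
        return rho
    return rho if a == b else None


def renaming_of(phi):
    """Merge the renamings of all couples in phi; None if they clash."""
    rho = {}
    for l1, l2 in phi:
        rho = match(l1, l2, rho)
        if rho is None:
            return None
    return rho


def candidate_couples(g1, g2):
    """Couples of literals that are variants of each other, in a fixed order."""
    return sorted(((a, b) for a in g1 for b in g2 if match(a, b, {}) is not None),
                  key=repr)


def find_swap(phi, p, m, k):
    """Force couple p into phi by swapping out at most k couples; None if impossible."""
    clashing = {q for q in phi if renaming_of([p, q]) is None}
    others = [q for q in phi if q not in clashing]
    free = [q for q in m if q not in phi and q != p]
    for extra in range(k - len(clashing) + 1):       # empty if too many clashes
        for more in combinations(others, extra):
            removed = clashing | set(more)
            for added in combinations(free, len(removed)):
                new = (phi - removed) | set(added) | {p}
                if renaming_of(new) is not None:       # still a generalization
                    return new
    return None


def k_swap_generalize(g1, g2, k):
    m = candidate_couples(g1, g2)
    phi = set()
    while True:
        for p in m:
            if p not in phi:
                new = find_swap(phi, p, m, k)
                if new is not None:
                    phi = new
                    break                            # restart from the first couple
        else:
            return phi                               # no couple can be forced in


g1 = {("f", "X"), ("g", "X")}
g2 = {("f", "A"), ("f", "B"), ("g", "B")}
print(len(k_swap_generalize(g1, g2, 0)), len(k_swap_generalize(g1, g2, 1)))  # 1 2
```

Here the greedy run ($k = 0$) commits to mapping $f(X)$ onto $f(A)$ and gets stuck at one couple, whereas $k = 1$ swaps that choice out and reaches the two-couple generalization.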

###### Theorem 2

Given renamed apart goals $G_1$, $G_2$ and a constant $k$, the generalization $\phi$ computed by Algorithm 1 is $k$-swap stable.

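Given goals $G_1$, $G_2$ and constant $k$, Algorithm 1 can be seen as computing a sequence of generalizations $\phi_0, \phi_1, \ldots, \phi_n$, where $\phi_0 = \emptyset$ is the initial value and each $\phi_i$ (for $i \geq 1$) represents the value of $\phi$ at the end of the $i$-th loop iteration. The generalization $\phi$ is then the final value in this sequence, i.e. $\phi = \phi_n$.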

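The proof is by contradiction. Suppose that $\phi$ is not $k$-swap stable. By definition, this means that there exists a $k$-swap extension $\phi'$ of $\phi$, i.e. a generalization with $|\phi'| > |\phi|$ that is a $k$-swap of $\phi$. Consequently, $\phi'$ can be obtained from $\phi$ by removing at most $k$ of its elements and adding candidate couples, at least one of which does not belong to $\phi$. Then, by taking that couple as the couple to be enforced, and the removed and the remaining added couples as the swap, the conditions in the algorithm are satisfied, contradicting the fact that the algorithm's execution ends with $\phi$.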

For a given value of $k$, Algorithm 1 thus computes a $k$-swap stable generalization, at least if an exhaustive search is performed in each round of the repeat loop in order to find a couple that allows $\phi$ to be transformed into a strictly larger generalization. Even if this exhaustive search is implemented, it is not hard to see that, for a given and constant value of $k$, the algorithm executes in time $O(n^{ck})$, where $c$ is a constant and $n$ is proportional to the size of the input goals. Note how the exponent depends on $k$, which is a constant parameter unrelated to the size of the goals to generalize (the input). Therefore the execution time of the algorithm is polynomially bounded.

By aiming to improve some initial solution at each iteration, Algorithm 1 is an anytime algorithm: in concrete implementations, one could retrieve the $i$-th generalization $\phi_i$ computed by Algorithm 1 when it is interrupted at iteration $i$. The $i$-th generalization may not be $k$-swap stable, but it is guaranteed to be a generalization of size at least $i$. Also note that, the algorithm being inherently non-deterministic, it is by no means guaranteed to find the largest, or most convenient, $k$-swap stable generalization. In order to somewhat steer the search towards a promising generalization, we introduce the concept of a quality estimator, i.e. a function that associates a numerical value to any couple of matching literals $(l_1, l_2)$. The general idea behind this function is that the higher the value associated to a couple $(l_1, l_2)$, the higher the probability that $(l_1, l_2)$ is an element of a maximal common generalization.

###### Definition 8 (Quality estimator)

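Given goals $G_1$ and $G_2$, a quality estimator is a function $q_{G_1,G_2}$ that maps every couple $(l_1, l_2) \in G_1 \times G_2$ of literals that are variants of each other to a numerical value. When goals $G_1$ and $G_2$ are unambiguously identified, we will simply write $q$.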

A typical implementation of Algorithm 1 will thus loop through the potential couples in descending order of their quality values. If the quality estimator is a perfect oracle, in the sense that it associates maximal values to those couples that constitute an mcg, then, obviously, Algorithm 1 computes this mcg. In practice, however, the quality estimator will be a heuristic. In our implementation, which we elaborate on in Section 4, we use the following heuristic quality estimator.

###### Example 9

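An intuitive yet efficient quality estimator is the function that maps a couple $(l_1, l_2)$ to the multiplicative inverse of the number of conflicts (plus one) the couple has with other couples, two couples being in conflict when their associated renamings are incompatible. Let $\mathit{conf}(l_1, l_2)$ denote the set of candidate couples that are in conflict with $(l_1, l_2)$. We then define $q(l_1, l_2) = \frac{1}{|\mathit{conf}(l_1, l_2)| + 1}$. The "+1" term is only meant to avoid division by zero.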
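
The quality estimator of Example 9 is easy to compute once conflicts are defined as incompatibility of the couples' renamings. The Python sketch below (hypothetical names, literals assumed to be tuples with uppercase-string variables) returns exact fractions. On the goals of Example 5, the four couples that together form the full generalization each score 1/3, while the two misleading couples, such as $(f(X), f(U))$, score 1/5.

```python
from fractions import Fraction


def is_var(t):
    # Assumed encoding: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def match(a, b, rho):
    """Extend the injective renaming rho so that a.rho == b; None if impossible."""
    if is_var(a):
        if not is_var(b) or (a not in rho and b in rho.values()):
            return None
        if a in rho:
            return rho if rho[a] == b else None
        return {**rho, a: b}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):
            rho = match(x, y, rho)
            if rho is None:
                return None
        return rho
    return rho if a == b else None


def compatible(p, q):
    """Couples p and q conflict when their renamings cannot be merged."""
    rho = match(p[0], p[1], {})
    return rho is not None and match(q[0], q[1], rho) is not None


def quality(couples):
    """q(p) = 1 / (number of couples in conflict with p + 1)."""
    return {p: Fraction(1, 1 + sum(1 for o in couples
                                   if o != p and not compatible(p, o)))
            for p in couples}


G1 = {("f", "X"), ("f", "Z"), ("g", "X", "Y"), ("h", "Y", "Z")}
G2 = {("f", "R"), ("g", "R", "T"), ("h", "T", "U"), ("f", "U")}
couples = [(a, b) for a in G1 for b in G2 if match(a, b, {}) is not None]
q = quality(couples)
print(q[(("f", "X"), ("f", "R"))], q[(("f", "X"), ("f", "U"))])  # 1/3 1/5
```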

A quality estimator acts as an indicator of the interest of adding a couple to the generalization under construction. It naturally partitions the candidate couples into subsets with different quality values, guiding our algorithm as to which couples should or should not be part of the generalization. Now, inside the main loop of Algorithm 1, the same estimator can be used to guide the search for the $k$-swap, in particular the couples to be swapped out and in, rather than computing these by exhaustive search. Algorithm 2 provides such a concrete search procedure based on the quality estimator. Given a couple of atoms and a generalization $\phi$ under construction, the algorithm searches for a suitable swap that could be used to continue the construction of the generalization by Algorithm 1.

The search process of Algorithm 2 is conceptually analogous to an A* search. The mapping to be removed is initialized with the part of the generalization that is incompatible with the pair of atoms we wish to enforce into the generalization. Its replacement mapping is initially empty, and the algorithm subsequently tries to construct a sufficiently large replacement (the inner while loop). During this search, the algorithm maintains a set of candidates, i.e. matching couples that are not (yet) associated with the generalization, together with the subset of those candidates each of which could be added to the replacement mapping such that the result is still a generalization (i.e. there is no conflict in the associated renamings). In order to explore different possibilities by backtracking, the while loop manipulates a stack that records alternatives for the replacement mapping together with the corresponding set of candidates for further exploration.

Now, in order to steer the search process, only candidate couples whose quality value is among a limited number of best values are considered for further exploration. We therefore define the best (resp. worst) selection of a set of candidate couples as the subset composed of those couples whose quality value is among the highest (resp. lowest) quality values occurring in that set. The number of quality values retained is a parameter of the algorithm that can be used to control the degree of backtracking. If it is large enough to cover all quality values, backtracking is performed over all possible alternatives (exhaustive search), whereas if it equals 1, only the couples with the best (or worst) quality value are considered. Note that even when exhaustive search is used, the algorithm considers the most promising couples (those with the highest quality values) first.
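The best and worst selections can be sketched as follows. The names `best` and `worst`, the parameter name `v` for the backtracking parameter, and the use of `v=None` to stand for exhaustive search are assumptions made for this illustration. The sketch ranks distinct quality values rather than individual couples, so couples of equal quality are always kept or discarded together, which matches the remark that the smallest setting still backtracks over alternatives of the same quality.

```python
def best(candidates, quality, v=None):
    """Couples whose quality is among the v highest distinct values (all if v is None)."""
    levels = sorted({quality[c] for c in candidates}, reverse=True)
    keep = set(levels if v is None else levels[:v])
    return [c for c in candidates if quality[c] in keep]


def worst(candidates, quality, v=None):
    """Couples whose quality is among the v lowest distinct values (all if v is None)."""
    levels = sorted({quality[c] for c in candidates})
    keep = set(levels if v is None else levels[:v])
    return [c for c in candidates if quality[c] in keep]


# Hypothetical candidate couples, identified by name, with their quality values.
q = {"c1": 1/3, "c2": 1/4, "c3": 1/4, "c4": 1/3, "c5": 1/2}
cands = list(q)
top = best(cands, q, v=1)       # only the couple(s) with the highest quality
bottom = worst(cands, q, v=1)   # only the couple(s) with the lowest quality
```

Both functions preserve the input order of the candidates, so any further ordering (for instance by decreasing quality) is left to the caller.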

If the search for a replacement mapping is unsuccessful (i.e. no replacement of the required size is found), the algorithm continues by removing another couple from the generalization, thereby effectively enlarging the mapping to be removed. The rationale behind this action is that the generalization might contain a couple that is "blocking" some candidates from being added to the replacement mapping. In order to steer the removal of such potentially blocking couples, a couple from the generalization is selected, and the alternatives (those whose quality value is among the worst) are recorded in a queue. Note the use of a queue (with its associated operations enter and exit), as opposed to the stack used for the replacement mapping.

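The process is repeated until either a replacement of the required size is found, in which case we have found a suitable k-swap, or the mapping to be removed exceeds the size allowed by k, in which case we have not, and the algorithm reports failure.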
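The following is a heavily simplified, hypothetical stand-in for the search of Algorithm 2, meant only to make the roles of the two mappings concrete. It assumes that a k-swap replaces a set of at most k couples of the generalization by a set containing exactly one more couple. Unlike Algorithm 2, it fixes the removed set to the couples that conflict with the enforced couple (no queue-driven enlargement) and explores candidates greedily by decreasing quality (no stack, no backtracking), so it may fail where Algorithm 2 succeeds. The literal representation and all function names are our own.

```python
def renaming(couple):
    """Variable renaming induced by matching literal a with literal b, or None."""
    (pa, xs), (pb, ys) = couple
    if pa != pb or len(xs) != len(ys):
        return None
    mapping = {}
    for x, y in zip(xs, ys):
        if mapping.setdefault(x, y) != y:
            return None
    return mapping if len(set(mapping.values())) == len(mapping) else None


def compatible(*renamings):
    """True if the union of the given renamings is still an injective function."""
    merged = {}
    for r in renamings:
        for x, y in r.items():
            if merged.setdefault(x, y) != y:
                return False
    return len(set(merged.values())) == len(merged)


def conflict(c, d):
    """Distinct couples conflict if they reuse a literal or have incompatible renamings."""
    if c == d:
        return False
    if c[0] == d[0] or c[1] == d[1]:
        return True
    return not compatible(renaming(c), renaming(d))


def simplified_swap(generalization, target, candidates, quality, k):
    """Greedy search for (removed, added) with len(added) == len(removed) + 1, or None."""
    removed = [c for c in generalization if conflict(c, target)]
    if len(removed) > k:
        return None  # enforcing `target` would remove more than k couples
    kept = [c for c in generalization if c not in removed]
    added = [target]
    for c in sorted(candidates, key=lambda c: quality[c], reverse=True):
        if len(added) == len(removed) + 1:
            break
        if c in generalization or c in added:
            continue
        # Pairwise compatibility suffices for the union of injective renamings.
        if not any(conflict(c, d) for d in kept + added):
            added.append(c)
    return (removed, added) if len(added) == len(removed) + 1 else None


g1 = [("p", ("X", "Y")), ("p", ("Y", "Z"))]
g2 = [("p", ("A", "B")), ("p", ("B", "C"))]
couples = [(a, b) for a in g1 for b in g2]
q = {couples[0]: 1/3, couples[1]: 1/4, couples[2]: 1/4, couples[3]: 1/3}
# Current generalization {p(X,Y)/p(B,C)}; try to enforce p(X,Y)/p(A,B).
swap = simplified_swap([couples[1]], couples[0], couples, q, k=1)
```

Applying the resulting swap replaces the single couple p(X,Y)/p(B,C) by the two couples p(X,Y)/p(A,B) and p(Y,Z)/p(B,C), enlarging the generalization by one; with k = 0 the same call fails, since enforcing the target requires removing one couple.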

## 4 Prototype evaluation

In order to experimentally evaluate both the results and the performance of our approach, we have made a prototype implementation of Algorithms 1 and 2 in Prolog (source code is available at https://github.com/Gounzy/CLPGeneralization). The implementation uses the quality function defined in Example 9. Our evaluation consists of computing k-swap stable generalizations for a large set of test cases (pairs of goals) that have been randomly generated according to certain criteria. In particular, we have defined 6 problem classes, whose characteristics are given in Table 1.

Table 1 provides, for each problem class, a row containing the admissible values (or ranges of values) that were used when generating a test case belonging to that class. The columns 'Variables' and 'Literals' denote, respectively, the number of variables and literals that are allowed in the generated goals. The column 'Variable combinations' denotes the total number of mappings that must exist between the variables of the two goals. In a similar vein, the column 'Literal matchings' denotes the number of subsets of the set of literal couples between the two goals (excluding those that map a single literal more than once), thereby representing an upper bound on the number of potential generalizations of the two goals. These parameters (in particular the latter two) guarantee that each test case exhibits a certain complexity for the anti-unification algorithm, and the parameter values of each class are chosen so as to yield ascending complexities, both with respect to the number of variable combinations and to the number of literal matchings that could potentially need to be explored by the algorithm. The generated literals are all atoms built using three test predicates. Real-life applications would typically involve a larger number of predicate symbols, but fewer symbols tend to increase the anti-unification complexity of the generated goals, making them more of a challenge for our algorithm. Also note that, although built on a CLP formalism, the test instances are by no means intended to depict real-life Constraint Satisfaction Problems (CSP). Rather, they represent batches of anti-unification instances such as could arise in semantic clone detection [Mesnard et al. (2016)], where one typically needs a fast anti-unification algorithm capable of handling a multitude of goals in a reasonable time.
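The two complexity measures of Table 1 can be approximated by small counting functions. This sketch rests on assumptions not stated in the text: literal matchings are counted as sets of same-predicate, same-arity couples in which no literal appears twice (the empty set included), and variable combinations are counted as injective mappings from the smaller variable set into the larger one. The prototype's exact counting may differ, and the function names are hypothetical.

```python
from math import perm


def literal_matchings(g1, g2):
    """Count sets of couples (same predicate and arity) using each literal at most once.

    The empty set is included; subtract 1 to exclude it.
    """
    def count(i, used):
        if i == len(g1):
            return 1
        total = count(i + 1, used)  # leave literal g1[i] unmatched
        pa, xs = g1[i]
        for j, (pb, ys) in enumerate(g2):
            if j not in used and pa == pb and len(xs) == len(ys):
                total += count(i + 1, used | {j})  # match g1[i] with g2[j]
        return total

    return count(0, frozenset())


def injective_variable_mappings(n1, n2):
    """Injective mappings from the smaller variable set into the larger one."""
    small, large = sorted((n1, n2))
    return perm(large, small)  # large! / (large - small)!


g1 = [("p", ("X", "Y")), ("p", ("Y", "Z"))]
g2 = [("p", ("A", "B")), ("p", ("B", "C"))]
n_matchings = literal_matchings(g1, g2)
```

For two goals each containing two binary atoms over the same predicate, there are 7 such sets: the empty set, 4 single couples and 2 pairs. Both counts grow factorially with goal size, which is why the problem classes quickly outgrow exhaustive enumeration.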

###### Example 10

The following is an example of a generated test case satisfying the constraints of class 2 in Table 1. It admits 72,000 anti-unification possibilities and 181,440 possible variable combinations.

Table 2 summarizes the results of our experimental evaluation. Four incarnations of our algorithm were tested, each computing k-swap stable generalizations for a different value of k and each represented by its own entry in the table. For each incarnation, we have fixed the backtracking parameter so as to severely limit backtracking to alternatives having the same quality value. While minimal backtracking is of course advantageous for the execution time, it is at the same time the most demanding setting when testing the accuracy and relevance of the k-swap stability concept. To compare execution times, we have also implemented two naive brute-force algorithms that compute an mcg by exhaustively enumerating either all possible renamings or all possible literal matchings, retaining the largest generalization thus found.
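A brute-force search over literal matchings can be sketched as a depth-first enumeration that abandons a path as soon as the matching becomes incompatible. This is our own illustrative reconstruction, not the authors' Prolog code: it assumes literals are (predicate, argument variables) pairs, and the size bound that stops branches unable to beat the best generalization found so far is an extra optimization added for brevity. `brute_force_mcg` and its helpers are hypothetical names.

```python
def renaming(couple):
    """Variable renaming induced by matching literal a with literal b, or None."""
    (pa, xs), (pb, ys) = couple
    if pa != pb or len(xs) != len(ys):
        return None
    mapping = {}
    for x, y in zip(xs, ys):
        if mapping.setdefault(x, y) != y:
            return None
    return mapping if len(set(mapping.values())) == len(mapping) else None


def compatible(*renamings):
    """True if the union of the given renamings is still an injective function."""
    merged = {}
    for r in renamings:
        for x, y in r.items():
            if merged.setdefault(x, y) != y:
                return False
    return len(set(merged.values())) == len(merged)


def brute_force_mcg(g1, g2):
    """Largest set of compatible couples, found by enumerating literal matchings."""
    best = []

    def extend(i, chosen, used_right):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        # Stop when g1 is exhausted or this branch cannot beat `best`.
        if i == len(g1) or len(chosen) + (len(g1) - i) <= len(best):
            return
        extend(i + 1, chosen, used_right)  # leave g1[i] unmatched
        for j, b in enumerate(g2):
            r = renaming((g1[i], b))
            if j in used_right or r is None:
                continue
            # Cut this path as soon as the matching becomes incompatible.
            if compatible(*(renaming(d) for d in chosen), r):
                chosen.append((g1[i], b))
                extend(i + 1, chosen, used_right | {j})
                chosen.pop()

    extend(0, [], frozenset())
    return best


g1 = [("p", ("X", "Y")), ("p", ("Y", "Z"))]
g2 = [("p", ("A", "B")), ("p", ("B", "C"))]
mcg = brute_force_mcg(g1, g2)
```

Even with pruning, the worst case remains exponential in the number of literals, consistent with the execution-time gap reported in Table 2.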

For each of the 6 problem classes, one thousand examples satisfying the constraints of the class were generated. Each algorithm was executed on all 1000 examples, and Table 2 displays their average execution time (in milliseconds). As expected, the execution time is higher for larger values of k and grows with the complexity of the problems. However, for all classes but the simplest, the execution time of our algorithm, even for the largest value of k, stays well below that of the brute-force algorithms. For the more complex problem classes, the difference amounts to several orders of magnitude, and our execution time remains more than manageable (in the millisecond range), even for the largest value of k. Only for the simplest test cases (problem class 1) does our algorithm show an overhead, caused by trying out some k-swaps more than once. As a side note, of the two brute-force algorithms, the one enumerating renamings is in general the slowest, because it has an enormous number of variable mappings to explore, whereas the one enumerating literal matchings is more often able to cut exploration paths when encountering incompatible literal matchings during its mcg construction process.

In order to test the accuracy of our abstraction, for each example we compared the size of the computed k-swap stable generalization with the size of the mcg computed by the naive algorithms. For each problem class and algorithm incarnation, Table 2 displays the average size of the computed k-swap stable generalization, expressed as a percentage of the size of the corresponding mcg. As can be expected, the accuracy grows for larger values of k but is, on average, never below 80% of the mcg, even for the simplest and greediest incarnation of our algorithm. Note that for one incarnation the average accuracy is below 100%, although in theory it should compute an mcg. This is due to the restricted backtracking setting, meaning that not enough backtracking is performed to compute an mcg in all cases. These are nevertheless quite promising results.

While the use of average times and accuracies might be criticized, it is noteworthy that, for all problem classes and algorithms, the standard deviation of the execution times was less than 20% of the average value, and less than 10% in the case of the accuracy.

In conclusion, these simple experiments show that our abstraction performs quite well: although it will in general not compute the maximal common generalization, it will find relatively large generalizations in a tractable time (generally even impressively fast when compared to a brute-force approach), even when the overall anti-unification complexity is high.

## 5 Conclusions and future work

In this work, we have established a theory of anti-unification (or generalization) in the context of Constraint Logic Programming. When goals are considered as sets of atoms and constraints, computing their maximal common generalization becomes an intractable problem, a result that we have formally proved. We have introduced an abstraction of the maximal common generalization, namely a k-swap stable generalization, that can be computed in polynomial time. We have defined a skeleton algorithm that is parametric in k and that allows the generalization to be steered by a heuristic quality function. We have shown our algorithm to provide promising results on a set of randomly created test cases. Its parameters should be tuned to achieve the best trade-off between the size of the computed generalization (by increasing k and/or the backtracking parameter) and time performance (by decreasing them), depending on the application at hand. Future work should investigate the exact interaction between k and the backtracking parameter: when the algorithm fails to find an mcg, our current prototype does not clearly identify which parameter is responsible. While the heuristic function used in our prototype implementation seems to perform quite well and results in overall large generalizations, other heuristic functions can be envisioned, possibly depending on the application at hand.

In further work, we also aim at integrating the notions developed in this paper into a framework for clone detection or algorithmic equivalence recognition such as [Mesnard et al. (2016)] that uses CLP clauses as an intermediate program representation. An efficient generalization algorithm is a necessary ingredient for computing the similarity between program fragments. We expect that our generalization concept and algorithm can be integrated in such a framework, where they could help steer the underlying transformation process. In that context, we intend to conduct a more in-depth empirical study of the two algorithms presented in Section 3. In particular, we will investigate the complexity of Algorithm 2, which in practice depends on the branching factor induced by the quality estimator at hand.

Direct applications of our generalization algorithm include other transformational approaches on CLP programs, in particular those where computing generalizations is a means to ensure finiteness of the transformation, an example being partial deduction of CLP programs. Our anti-unification theory is a general and domain-independent framework. As such, it could likely be instantiated and refined by integrating domain-specific widening operators, which is another topic for future work. Moreover, depending on the context, generalizations can be considered maximal or most specific based on criteria other than cardinality, a simple example being the number of literal arguments captured in the common generalization. This is especially relevant when arities vary widely from one literal to another, and constitutes a topic for future research on other generalization strategies.

## Acknowledgments

We thank anonymous reviewers for their constructive input and remarks.

## References

• Benkerimi and Lloyd (1990) Benkerimi, K. and Lloyd, J. W. 1990. A partial evaluation procedure for logic programs. 343–358.
• Bouajjani et al. (2006) Bouajjani, A., Habermehl, P., Rogalewicz, A., and Vojnar, T. 2006. Abstract regular tree model checking. Electronic Notes in Theoretical Computer Science 149, 1, 37–48. Proceedings of the 7th International Workshop on Verification of Infinite-State Systems (INFINITY 2005).
• Burghardt (2014) Burghardt, J. 2014. E-generalization using grammars. CoRR abs/1403.8118.
• De Schreye et al. (1999) De Schreye, D., Glück, R., Jørgensen, J., Leuschel, M., Martens, B., and Sørensen, M. H. 1999. Conjunctive partial deduction: foundations, control, algorithms, and experiments. The Journal of Logic Programming 41, 2, 231–277.
• Fioravanti et al. (2013) Fioravanti, F., Pettorossi, A., Proietti, M., and Senni, V. 2013. Controlling polyvariance for specialization-based verification. Fundam. Inform. 124, 4, 483–502.
• Gallagher (1993) Gallagher, J. P. 1993. Tutorial on specialisation of logic programs. In Proceedings of the 1993 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation. PEPM ’93. ACM, New York, NY, USA, 88–98.
• Gange et al. (2015) Gange, G., Navas, J. A., Schachte, P., Søndergaard, H., and Stuckey, P. J. 2015. Horn clauses as an intermediate representation for program analysis and transformation. TPLP 15, 4-5, 526–542.
• Gutiérrez-Naranjo et al. (2003) Gutiérrez-Naranjo, M. A., Alonso-Jiménez, J. A., and Borrego-Díaz, J. 2003. Generalizing programs via subsumption. In Computer Aided Systems Theory - EUROCAST 2003, R. Moreno-Díaz and F. Pichler, Eds. Springer Berlin Heidelberg, Berlin, Heidelberg, 115–126.
• Jaffar et al. (1998) Jaffar, J., Maher, M., Marriott, K., and Stuckey, P. 1998. The semantics of constraint logic programs. The Journal of Logic Programming 37, 1, 1–46.
• Jaffar and Maher (1994) Jaffar, J. and Maher, M. J. 1994. Constraint logic programming: a survey. The Journal of Logic Programming 19-20, 503–581. Special Issue: Ten Years of Logic Programming.
• Khardon and Arias (2006) Khardon, R. and Arias, M. 2006. The subsumption lattice and query learning. Journal of Computer and System Sciences 72, 1, 72–94.
• Leuschel et al. (1998) Leuschel, M., Martens, B., and De Schreye, D. 1998. Controlling generalization and polyvariance in partial deduction of normal logic programs. ACM Trans. Program. Lang. Syst. 20, 1, 208–258.
• Mesnard et al. (2016) Mesnard, F., Payet, É., and Vanhoof, W. 2016. Towards a framework for algorithm recognition in binary code. In Proceedings of the 18th International Symposium on Principles and Practice of Declarative Programming, Edinburgh, United Kingdom, September 5-7, 2016, J. Cheney and G. Vidal, Eds. ACM, 202–213.
• Pettorossi and Proietti (1998) Pettorossi, A. and Proietti, M. 1998. Program specialization via algorithmic unfold/fold transformations. ACM Comput. Surv. 30, 3es, 6.
• Plotkin (1970) Plotkin, G. D. 1970. A note on inductive generalization. Machine Intelligence 5, 153–163.
• Sørensen and Glück (1999) Sørensen, M. H. and Glück, R. 1999. Introduction to supercompilation. In Partial Evaluation - Practice and Theory, DIKU 1998 International Summer School. Springer-Verlag, Berlin, Heidelberg, 246–270.
• Sørensen and Glück (1995) Sørensen, M. H. and Glück, R. 1995. An algorithm of generalization in positive supercompilation. In Proceedings of ILPS’95, the International Logic Programming Symposium. MIT Press, 465–479.
• Sysło (1982) Sysło, M. M. 1982. The subgraph isomorphism problem for outerplanar graphs. Theoretical Computer Science 17, 1, 91–97.