    # Implementing Anti-Unification Modulo Equational Theory

We present an implementation of E-anti-unification as defined in Heinz (1995), where tree-grammar descriptions of equivalence classes of terms are used to compute generalizations modulo equational theories. We discuss several improvements, including an efficient implementation of variable-restricted E-anti-unification from Heinz (1995), and give runtime figures for them. We present applications in various areas, including lemma generation in equational inductive proofs, intelligence tests, diverging Knuth-Bendix completion, strengthening of induction hypotheses, and theory formation about finite algebras.


## 1 Introduction

An important task in the field of artificial intelligence is generalization. To the extent that a generalization approach allows us to incorporate certain background knowledge, it opens up applications in various fields of computer science. Inductive logic programming, for example, is concerned with generalization wrt. Horn–logic theories, with potential applications in automated scientific discovery, knowledge discovery in databases, automatic programming, and other areas.

[Hei95] gives several algorithms for generalizing terms wrt. equational background theories; they are applied to lemma generation in equational induction proofs. In this paper, we describe the implementation of these algorithms and give some further applications.

As a motivating example, consider the sequence , , , , represented by terms , , , , respectively. People trained in addition and multiplication of natural numbers will easily recognize the square numbers, computable by the scheme , as the rule for continuing the sequence. However, syntactical anti–unification merely leads to the term , which is too general to compute a continuation of the above sequence.

A useful mechanism for computing schemes is anti–unification extended to take account of equational theories. Addition and multiplication of natural numbers can be specified by the equations of Theory (1) in Fig. 3. In Fig. 1, we demonstrate that is one of the generalizations resulting from anti–unification of the above terms , , , modulo equational Theory (1), since each of them equals an instance of .
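To make the role of such a computation scheme concrete, the following sketch (Python rather than the paper's PROLOG; the square-number sequence 0, 1, 4, 9 and the standard equations 0+y=y, s(x)+y=s(x+y), 0*y=0, s(x)*y=y+x*y are assumed here purely for illustration) checks that each sequence element is the value of an instance of the scheme v*v:

```python
# Sketch: ground terms over 0, s, +, * as nested tuples, evaluated to numbers.
# Assumed equations (a standard specification of + and * on naturals):
#   0+y=y, s(x)+y=s(x+y), 0*y=0, s(x)*y=y+(x*y)

def num(n):
    """Build the successor term s(s(...s(0)...)) for a natural number n."""
    t = ('0',)
    for _ in range(n):
        t = ('s', t)
    return t

def evaluate(t):
    """Evaluate a ground term over 0, s, +, * to a natural number."""
    op = t[0]
    if op == '0':
        return 0
    if op == 's':
        return evaluate(t[1]) + 1
    if op == '+':
        return evaluate(t[1]) + evaluate(t[2])
    if op == '*':
        return evaluate(t[1]) * evaluate(t[2])
    raise ValueError(op)

def instance(scheme, binding):
    """Apply a substitution (dict from variable names to terms) to a term."""
    if scheme[0] == 'var':
        return binding[scheme[1]]
    return (scheme[0],) + tuple(instance(a, binding) for a in scheme[1:])

# The scheme v*v, instantiated with v = s^i(0), yields the square numbers:
scheme = ('*', ('var', 'v'), ('var', 'v'))
squares = [evaluate(instance(scheme, {'v': num(i)})) for i in range(4)]
# squares == [0, 1, 4, 9]
```

Syntactic anti-unification of the sequence terms alone cannot find such a scheme; it needs the equational theory, as discussed below.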

Some problems arise if generalizations of terms wrt. equational theories have to be computed:

• There may exist many generalizations allowing the elements of a sequence of terms to be computed.

• The set of generalizations wrt. an equational theory is usually infinite (see, however, Fig. 42 in Sect. 7.2); hence, only approaches that enumerate its elements can be provided.

• Depending on the application area, the set of possible generalizations may contain useless or undesired computation schemes.

In [Hei95], an approach to generalization modulo canonical equational theories, called anti–narrowing, has been developed. This approach simply allows all generalizations of terms modulo canonical theories to be enumerated; useless generalizations can only be eliminated after their computation using corresponding criteria.

Instead of enumerating all generalizations of terms modulo an equational theory, a compact, finite representation of the set of all generalizations is desirable for several reasons, e.g. to enable all useless generalizations to be eliminated in one go.

[Hei95] therefore investigates a second approach to E–anti–unification based on regular tree grammars (called “sorts” in this paper, following [Com90]). At the heart of the approach is an algorithm that takes two sorts or regular tree languages , and computes a sort or regular tree language of all syntactical generalizations of terms from and :

L(θ) = { t ⊓ t′ ∣ t ∈ L(s), t′ ∈ L(s′) }.

In order to E–anti–unify two terms modulo a given background equational theory , their equivalence classes modulo are represented by sorts ; the corresponding then contains all E–generalizations of both terms.

In this paper, we describe the implementation of the sort approach to E–anti–unification including technical optimizations and run-time measurements. We present several applications:

• generation of lemma candidates in blocked situations of an inductive proof (Sect. 6)

• construction of series–formation laws (Sect. 7)

• some other potential applications, which are only sketched (Sect. 8).

Section 2 introduces the necessary formal definitions and notations. Section 3 presents the implementation of E–anti–unification. Section 4 discusses how to enumerate the terms of a computed sort. Section 5 gives an elaborate example of E–anti–unification. Sections 6 to 8 present the above–mentioned applications. In Sect. 11, we show that for each finite algebra we can always generate a closed representation of all its quantifier–free and variable–bounded theorems. The PROLOG source code of our implementation is listed in Appendix A.

## 2 Definitions and Notations

Definition 1. Let V be an enumerable set of variables and F a finite set of function symbols, each with fixed arity. Terms are built from V and F; T denotes the set of terms.

We assume familiarity with the classic definitions and notations of terms. We say that a term starts with a function symbol if for some terms . describes the set of variables occurring in a term . A term is called linear if no variable occurs in it twice. A term is called ground if it does not contain any variables.

Definition 2. Lists are built from and , written as as in PROLOG; denotes the number of elements of the list ; denotes its first element. The list–comprehension notation denotes a list of all , such that holds, in some arbitrary order.

denotes an -tuple; denotes its -th projection.

means equal by definition. means that is a subset of , or is equal to .

denotes the number of elements of a finite set . denotes the image of the set under the mapping ; denotes the inverse image of under . For a set , we denote its -fold cartesian product by .

Definition 3. Substitutions are defined as usual; denotes a substitution that maps each variable to the term . Sometimes we also use a set–comprehension–like notation: denotes a substitution that maps each to such that holds. denotes the domain of .

is called a linear substitution if is a linear term, where . is called a flat substitution if for all . is called a renaming substitution if it is a bijective mapping from variables to variables, and hence is both linear and flat.

is called an instance of , and an anti–instance of , if a substitution exists such that . If is a renaming, is also called a variant of .

Definition 4. An equational theory is a finite set of equations . The relation is defined as usual as the smallest reflexive, symmetric, and transitive rewrite relation that contains .

The equivalence class of a term mod.  is denoted by . denotes the algebra of equivalence classes. We assume that in each equivalence class , some term is distinguished, which we call the normal form of . Let be the set of all normal forms. , , and depend on .

Note that we do not require to be computable by a confluent and noetherian term–rewriting system. However, we require that each equivalence class be a regular tree language.

Definition 5. Let be an enumerable set of sort names, and let the sets , , and be pairwise disjoint. Let denote the set of all sort expressions; a sort expression is a sort name or has one of the forms

• , where ;

• , where , ; or

• for .

A sort definition is of the form , where is a sort name and is a sort expression. We say that is defined by .

Given a system of sort definitions where each occurring sort name has exactly one definition, we define their semantics as their least fixed point. We denote the semantics of a sort expression by , which has the following properties:

L(s1 ∣ s2) = L(s1) ∪ L(s2)
L(x) = {x} for x ∈ V
L(f(s1,…,sn)) = { f(t1,…,tn) ∣ ti ∈ L(si) } for f ∈ F
L(sn) = L(se) for sn ≐ se

Observe that , and for each . The empty sort is denoted by . A sort definition of the form is said to be in head normal form, where , , and . For proof–technical reasons, we define a system of sort definitions as being in normal form if each sort definition has either the form

• with , or

• with ,

and if no cycles

s1 ≐ … ∣ s2 ∣ …
s2 ≐ … ∣ s3 ∣ …
⋮
sn−1 ≐ … ∣ sn ∣ …
sn ≐ … ∣ s1 ∣ …

occur. Each system of sort definitions can be transformed into head normal form, and into normal form, maintaining its semantics. Any system of sort definitions in head normal form, or in normal form, has exactly one fixed point.
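As an illustration of sort definitions and their semantics, here is a minimal sketch under an assumed Python encoding (nested tuples for ground terms, a dict for the system of definitions in head normal form; the paper's implementation uses PROLOG, and all names below are chosen for illustration only):

```python
# Sketch (assumed encoding, not the paper's data structure): a system of
# sort definitions in head normal form is a dict mapping each sort name to
# a list of alternatives; here every alternative is ('fun', f, [s1, ..., sn]).

# Example system: Nat ≐ 0 | s(Nat),  Even ≐ 0 | s(Odd),  Odd ≐ s(Even)
defs = {
    'Nat':  [('fun', '0', []), ('fun', 's', ['Nat'])],
    'Even': [('fun', '0', []), ('fun', 's', ['Odd'])],
    'Odd':  [('fun', 's', ['Even'])],
}

def member(t, s):
    """Test whether ground term t (nested tuples (f, t1, ..., tn)) is in L(s).

    Recursion is on strict subterms only, since in head normal form every
    alternative starts with a function symbol, so the test terminates."""
    for alt in defs[s]:
        if alt[0] == 'fun':
            f, args = alt[1], alt[2]
            if t[0] == f and len(t) - 1 == len(args) and \
               all(member(sub, si) for sub, si in zip(t[1:], args)):
                return True
    return False

two = ('s', ('s', ('0',)))
# member(two, 'Even') → True;  member(two, 'Odd') → False
```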

Theorem 6. Assume a system of sort definitions in normal form. Let be a family of unary predicates, indexed over the set of all defined sort names. Show for each defined sort name :

∀t∈T: ps(t) ↔ ps1(t) ∨ … ∨ psn(t)   if s ≐ s1 ∣ … ∣ sn
∀t∈T: ps(t) ↔ ∃t1,…,tn∈T: t = f(t1,…,tn) ∧ ps1(t1) ∧ … ∧ psn(tn)   if s ≐ f(s1,…,sn)
∀t∈T: ps(t) ↔ t = x   if s ≐ x

Then, holds for each defined sort name .

Definition 7. A term is called a generalization of and iff there exist substitutions and such that and . is called the most specific generalization iff each generalization of and is an anti–instance of . The most specific generalization of two terms always exists and is unique up to renaming. We sometimes use the notation .

The above definitions can be extended to generalization of terms; we write to denote the most specific generalization of .

Definition 8. A term is called an -generalization of terms iff there exist substitutions and such that and .

As in E–unification, a most specific -generalization of arbitrary terms does not usually exist. A set is called a correct set of -generalizations of iff each member is an -generalization of . is called complete if, for each -generalization of , contains an instance of . is called complete wrt. linear generalizations if, for each linear term that is an -generalization of , contains an instance of .

The following algorithm can be traced back to the early seventies [Plo70, Plo71, Rey70]. It takes two terms and computes the syntactical generalization .

Algorithm 9. Let be an infinite set of new variables and an injective mapping.

1. Define .

2. Define , if .

Since syntactical anti–unification is unique only up to renaming, the mapping is used to fix one concrete variable naming that is the same in all subterms. In the example in Fig. 2, the use of ensures that both occurrences of yield the same result, viz. the variable .
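This syntactic anti-unification can be sketched in Python as follows (an assumed encoding with terms as nested tuples, not the paper's PROLOG code; the injective mapping is realized as a memo table that assigns one fixed variable to each pair of differing subterms):

```python
def anti_unify(t1, t2, phi=None):
    """Plotkin-style syntactic anti-unification of two terms.

    Terms are nested tuples (f, arg1, ..., argn); variables are ('var', name).
    phi plays the role of the injective mapping: equal pairs of subterms are
    always generalized by the same variable, so e.g. f(a,a) ⊓ f(b,b) yields
    f(x,x) rather than f(x,y)."""
    if phi is None:
        phi = {}
    if t1 == t2:                      # identical subterms generalize to themselves
        return t1
    if t1[0] == t2[0] and len(t1) == len(t2) and t1[0] != 'var':
        # same head symbol and arity: recurse componentwise, sharing phi
        return (t1[0],) + tuple(anti_unify(a, b, phi)
                                for a, b in zip(t1[1:], t2[1:]))
    # clash: map this pair of subterms to one fixed fresh variable
    if (t1, t2) not in phi:
        phi[(t1, t2)] = ('var', 'v%d' % len(phi))
    return phi[(t1, t2)]

a, b = ('a',), ('b',)
g = anti_unify(('f', a, a), ('f', b, b))
# g == ('f', ('var', 'v0'), ('var', 'v0'))
```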

Lemma 10. Algorithm 9 can be extended to anti–unify terms simultaneously, requiring . For any such and any finite , we may define the substitutions for by . We have for all .

The result of Alg. 9 then satisfies the following correctness property: for provided we choose .

## 3 E–Anti–Unification

In this section, we describe the implementation of E–anti–unification based on sorts.

Section 3.2 restates the algorithm from [Hei95] for computing a sort containing all linear generalizations; Sect. 3.3 discusses how to subsequently modify this sort in order to obtain all nonlinear generalizations as well. To obtain the sort of all generalizations, the algorithm from Sect. 3.2 is run as a first phase, and the algorithm from Sect. 3.3 as a second phase; cf. also Sect. 5.

In Sect. 3.4, the algorithm for computing the sort of all generalization terms that contain only variables from a given set is restated from [Hei95]. Note that all linear and nonlinear generalizations are computed in one phase only, thus saving a large amount of computation time; cf. Sect. 3.6, Fig. 15.

Section 3.5 discusses a technical optimization that applies to both and and helps to avoid many useless recursive calls. Sect. 3.6 presents some figures for runtime measurements and the improvement factors of technical optimizations.

### 3.1 Modeling Equivalence Classes as Sorts

The algorithms from Sect. 3.2/3.3 and from Sect. 3.4 both require the representation of the equivalence classes of the input terms as sorts. In this paper, we do not treat this issue in detail; rather, we drew up the respective sort definitions manually.

Figure 3 shows the equational theories used in this paper; we will refer to them as background theories. For example, Theory (1) consists of all equations that have a “1” in the leftmost column. Figure 23 refers to Theory (1) and shows the sort definitions of and , representing and , respectively. Figure 4 shows the sort representation of to wrt. Theory (5); , , , and denote the sets of positive (i.e. nonzero), even, odd, and arbitrary natural numbers, respectively.

In Theory (6), we have included the operator, such that equivalence classes of terms are no longer regular tree languages, and hence cannot be described by our sorts; for example, the sort definition would become infinite. We can, however, approximate the equivalence classes from below by cutting off the sort definitions at a certain , i.e., omitting all greater numbers. Figure 5 shows the approximations of to , where we set the cut–off point to , i.e.  is missing; hence, the equivalence classes do not contain terms that yield an intermediate result greater than when evaluated to normal form.

In Sect. 9, we discuss the use of sort–definition schemes that allow several sort definitions to be abbreviated by one scheme. Figure 51 shows a sort–definition scheme for background Theory (0). Figure 50 shows sort–definition schemes of equivalence classes mod.  and .

[Emm94] describes an algorithm for building the sort definitions automatically from a given confluent and noetherian term–rewriting system. However, there is no implementation that fits our data structures. See also Thm. 11.1 and Cor. 11.1 in Sect. 11, which provide sufficient criteria on an equational theory for the equivalence classes to be regular tree languages.

### 3.2 Linear Generalizations

The following algorithm from [Hei95] takes two sorts and computes a sort containing all linear syntactical generalizations of terms , , i.e.

L(θ) ⊇ { t ⊓ t′ ∣ t ∈ L(s), t′ ∈ L(s′), t ⊓ t′ linear }.

Algorithm 11. Let and be sort names, and be a new sort name. Let be an infinite set of new variables, and an injective mapping. Define , where a new sort definition is introduced for :

1. If has been called before, then is already defined.

2. If , then define .

3. If , then define .

4. If and , then define .

5. If and with , then define .

As shown in [Hei95], is a correct set of generalizations which is complete for the linear ones, i.e., , and .

Note that Case 1. requires us to maintain a set of argument pairs for which has already been called, and to check the argument pair of each new call against this set. This set is called Occ in the implementation; it was initially implemented by a PROLOG list, then by a binary search tree, and now by a balanced binary search tree. A membership test in a balanced tree with 50 entries takes about 4 msec user time. See also Sect. 3.6 for the impact of balancing on runtime.
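A sketch of the grammar-lifted anti-unification with its Occ set (assumed Python encoding: a dict maps each sort name to alternatives ('fun', f, [s1, …, sn]); not the paper's PROLOG data structures, and the variable handling is simplified to one fresh variable per argument pair):

```python
# Sketch of the lifted anti-unification on sort definitions in head normal
# form. The set Occ of already-visited argument pairs is realized as the
# dict 'occ' (hash-based; the paper uses a balanced binary search tree).

defs = {
    'Nat': [('fun', '0', []), ('fun', 's', ['Nat'])],
}
occ = {}  # maps an (ordered) pair of sort names to its result-sort name

def hsg(s1, s2):
    """Define (and return the name of) a sort of generalizations of L(s1), L(s2).

    The argument pair is kept ordered (a tuple, not a set), as required for
    the simultaneous n-ary extension discussed in the text. Assumes the
    input sorts are nonempty."""
    pair = (s1, s2)
    if pair in occ:                      # Case 1: pair already visited
        return occ[pair]
    name = 'g(%s,%s)' % pair
    occ[pair] = name                     # record BEFORE recursing (cycles!)
    alts = [('var', 'v_%s_%s' % pair)]   # one shared variable per pair
    for f, args1 in ((a[1], a[2]) for a in defs[s1] if a[0] == 'fun'):
        for h, args2 in ((a[1], a[2]) for a in defs[s2] if a[0] == 'fun'):
            if f == h:                   # same head symbol: recurse componentwise
                alts.append(('fun', f,
                             [hsg(x, y) for x, y in zip(args1, args2)]))
    defs[name] = alts
    return name

top = hsg('Nat', 'Nat')
# defs[top] lists a variable alternative plus 0 and s(g(Nat,Nat))
```

Recording the pair in occ before the recursive calls is what lets the construction terminate on the cyclic sort definitions.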

Assuming all sort definitions in head normal form, we can slightly improve the above algorithm so that it produces fewer variables. For example, given the sort definitions of from Fig. 23, the sort computed in Fig. 24 would comprise twelve different variables if computed by , while it actually – computed by below – has only one variable.

Since , the improved algorithm is still correct; since for each generalization we have for some flat substitution , it is also complete for the linear generalizations.

Algorithm 12. Let , , , and be as in Alg. 11. Define as follows:

1. If has been called earlier, its result is already defined.

2. If , and ,
then let be a new sort name,
define ;
let .

3. Define .

4. Define , if .

Usually, most of the disjuncts in Case 2. start with different function symbols, , and hence evaluate to by Case 4. In Sect. 3.5, we will discuss an optimization that avoids these recursive calls of .

Algorithms 11 and 12 can both be extended to compute the linear generalizations of sorts simultaneously, requiring . Our implementation is capable of that, and we will tacitly use with arguments where appropriate in this paper. Note that it makes sense to have multiple occurrences of the same sort among the input arguments. For example, using the sort definitions from Fig. 23, we have . Moreover, it is important to maintain the order of argument sorts during computation, since otherwise, for example

hsg(s1,s1) = … ∣ hsg(s0,s1) + hsg(s1,s0) ∣ …
           = … ∣ hsg(s0,s1) + hsg(s0,s1) ∣ …
           = … ∣ v01 + v01 ∣ …

although is not equal to any instance of . For these reasons, we treat the argument of the extended as a list rather than as a set of sorts.

### 3.3 Nonlinear Generalizations

The result sort of the above algorithm contains all linear generalizations, but only some nonlinear ones. This is not sufficient in general, as the example in Sect. 5 shows (see in particular the difference between and in Fig. 30).

In order to also obtain the nonlinear generalizations, which are more specific, we need a second phase that introduces common variables into related sorts. We use the abbreviation ; note that exists since is injective. Below, we will assume for the sake of simplicity.
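Introducing a common variable for several sorts is only sensible if the sorts share at least one term; this amounts to an intersection and emptiness test on regular tree grammars. A minimal sketch under an assumed Python encoding (a dict maps sort names to alternatives ('fun', f, [s1, …, sn]); not the paper's PROLOG predicates, and the example sorts are illustrative only):

```python
# Sketch: product construction intersecting two regular tree grammars
# (sort definitions in head normal form), plus an emptiness test.

defs = {
    'Even': [('fun', '0', []), ('fun', 's', ['Odd'])],
    'Odd':  [('fun', 's', ['Even'])],
    'Pos':  [('fun', 's', ['Nat'])],
    'Nat':  [('fun', '0', []), ('fun', 's', ['Nat'])],
}

def inter(s1, s2, seen=None):
    """Define (and return the name of) a sort with L = L(s1) ∩ L(s2)."""
    if seen is None:
        seen = {}
    if (s1, s2) in seen:
        return seen[(s1, s2)]
    name = 'i(%s,%s)' % (s1, s2)
    seen[(s1, s2)] = name            # record before recursing (cycles)
    defs[name] = [('fun', f1, [inter(a, b, seen) for a, b in zip(x1, x2)])
                  for _, f1, x1 in defs[s1]
                  for _, f2, x2 in defs[s2]
                  if f1 == f2 and len(x1) == len(x2)]
    return name

def nonempty(s, visiting=frozenset()):
    """Check L(s) ≠ ∅: some alternative has only nonempty argument sorts.

    A nonempty sort has a witness term whose derivation repeats no sort
    along any path, so sorts already on the current path can be cut off."""
    if s in visiting:
        return False
    return any(all(nonempty(a, visiting | {s}) for a in alt[2])
               for alt in defs[s])

# Even ∩ Pos is nonempty (it contains s(s(0))); Even ∩ Odd is empty:
# nonempty(inter('Even', 'Pos')) → True
# nonempty(inter('Even', 'Odd')) → False
```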

[Hei95] proceeds as follows: for each set of variables from the result sorts, whenever does not contain the empty sort, a new variable is introduced which can be thought of intuitively as generalizing the sorts in , and each occurrence of each is replaced by . For example, using the definitions from Sect. 5, we have