Many learning techniques in the field of symbolic Artificial Intelligence are based on adopting the features common to the given examples; this is called selective induction in the classification of [DM84], for example. Syntactical anti-unification reflects these abstraction techniques in the theoretically elegant domain of term algebras.
In this article, we propose an extension, called E-anti-unification or E-generalization, which also provides a way of coping with the well-known problem of representation change [O’H92, DIS97]. It allows us to perform abstraction while modeling equivalent representations using appropriate equations between terms. This means that all equivalent representations are considered simultaneously in the abstraction process; abstraction becomes insensitive to representation changes.
In 1970, Plotkin and Reynolds [Plo70, Plo71, Rey70] introduced the notion of (syntactical) anti-unification of terms as the dual operation to unification: while the latter computes the most general common specialization of the given terms, if it exists, the former computes the most special generalization of them, which always exists and is unique up to renaming. For example, using the usual - representation of natural numbers and abbreviating to , the terms and anti-unify to , retaining the common function symbol as well as the equality of its arguments.
While extensions of unification to equational theories and classes of them have been investigated [Fay79, Sie85, GS89], anti-unification has long been neglected in this respect, except for the theory of associativity and commutativity [Pot89] and so-called commutative theories [Baa91]. For an arbitrary equational theory E, the set of all E-generalizations of given terms is usually infinite. Heinz [Hei95, BH96] presented a specially tailored algorithm that uses regular tree grammars to compute a finite representation of this set, provided E leads to regular congruence classes. However, this work has never been internationally published. In this paper, we try to make up for this neglect, giving an improved presentation using standard grammar algorithms only, and adding some new theoretical results and applications (Sect. 3.3, 4, 5.3 below).
In general, E-anti-unification provides a means to find correspondences that are detectable only by using an equational theory as background knowledge. By way of a simple example, consider the terms and . Anti-unifying them purely syntactically, without considering an equational theory, we obtain the term , which indicates that there is no common structure. If, however, we consider the usual defining equations for and , see Fig. 1 (left), the terms may be rewritten nondeterministically as shown in Fig. 1 (right), and then syntactically anti-unified to as one possible result. In other words, it is recognized that both terms are square numbers.
Expressed in predicate logic, this means we can learn a definition from the examples and . Other possible results are , and, less meaningfully, . The closed representation of generalization sets by grammars allows us to filter out generalizations with certain properties that are undesirable in a given application context.
After some formal definitions in Sect. 2, we introduce our method of -generalization based on regular tree grammars in Sect. 3 and briefly discuss extensions to more sophisticated grammar formalisms. As a first step toward integrating -generalization into Inductive Logic Programming (ILP), we provide, in Sect. 4, theorems for learning determinate or nondeterminate predicate definitions using atoms or clauses. In Sect. 5, we present applications of determinate atom learning in different areas, including inductive equational theorem-proving, learning of series-construction laws and user support for learning advanced screen-editor commands. Section 6 draws some conclusions.
A signature is a set of function symbols , each of which has a fixed arity; if some is nullary, we call it a constant. Let be an infinite set of variables. denotes the set of all terms over and a given . For a term , denotes the set of variables occurring in ; if it is empty, we call a ground term. We call a term linear if each variable occurs at most once in it.
By , or , we denote a substitution that maps each variable to the term . We call it ground if all are ground. We use the postfix notation for application of to , and for the composition of (to be applied first) and (second). The domain of is denoted by .
A term is called an instance of a term if for some substitution . In this case, we call more special than , and more general than . We call a renaming of if is a bijection that maps variables to variables.
A term is called a syntactical generalization of terms and , if there exist substitutions and such that and . In this case, is called the most specific syntactical generalization of and , if for each syntactical generalization of and there exists a substitution such that . The most specific syntactical generalization of two terms is unique up to renaming; we also call it their syntactical anti-unifier [Plo70].
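The most specific syntactical generalization can be computed by Plotkin's simple recursive scheme. The following Python sketch (terms as nested tuples, names of our own choosing) is one possible implementation; the pair table ensures that equal argument pairs are generalized by the same variable, which is what retains equalities such as the two equal factors in the introductory example.

```python
# A minimal sketch of syntactic anti-unification (Plotkin 1970).
# Terms are tuples ('f', arg1, ..., argn); variables are plain strings.
from itertools import count

def anti_unify(s, t, table=None, fresh=None):
    """Most specific syntactic generalization of ground terms s and t.

    `table` maps a pair (s, t) to its variable, so that identical
    argument pairs are generalized by the same variable."""
    if table is None:
        table, fresh = {}, count()
    # identical head symbols: descend into the arguments
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(anti_unify(a, b, table, fresh)
                               for a, b in zip(s[1:], t[1:]))
    # clash: generalize to a variable, one per distinct pair (s, t)
    if (s, t) not in table:
        table[(s, t)] = f"x{next(fresh)}"
    return table[(s, t)]

def num(n):                       # Peano numeral s(s(...(0)))
    return ('0',) if n == 0 else ('s', num(n - 1))

# anti-unifying 2*2 and 3*3 yields s(s(x0)) * s(s(x0)): the equality of
# the two factors survives because both pairs map to the same variable
g = anti_unify(('*', num(2), num(2)), ('*', num(3), num(3)))
```

Note that without the shared pair table, the two clashing argument positions would receive distinct variables and the equality of the factors would be lost.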
An equational theory E is a finite set of equations between terms. =E denotes the smallest congruence relation that contains all equations of E. Define
to be the congruence class of in the algebra of ground terms. The congruence class of a term is usually infinite; for example, using the equational theory from Fig. 1 (left), we have . Let
denote the set of all terms congruent to under .
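For intuition, a finite slice of such a congruence class can be obtained by brute force, long before the grammar-based representation developed below: enumerate all ground terms up to a size bound and keep those with the same value under the defining equations of addition. All names in this Python sketch are our own; it is an illustration, not the paper's construction.

```python
# Brute-force slice of a congruence class over the signature {0, s, +},
# assuming the usual defining equations of "+" on Peano numerals.
from functools import lru_cache

def eval_term(t):
    """Value of a ground term under the defining equations."""
    if t[0] == '0':
        return 0
    if t[0] == 's':
        return eval_term(t[1]) + 1
    return eval_term(t[1]) + eval_term(t[2])     # t[0] == '+'

@lru_cache(maxsize=None)
def terms_of_size(n):
    """All ground terms over {0, s, +} with exactly n function symbols."""
    if n == 1:
        return (('0',),)
    out = [('s', t) for t in terms_of_size(n - 1)]
    for i in range(1, n - 1):
        for a in terms_of_size(i):
            for b in terms_of_size(n - 1 - i):
                out.append(('+', a, b))
    return tuple(out)

def congruence_slice(value, bound):
    """Terms of size <= bound whose value equals `value`."""
    return [t for n in range(1, bound + 1)
              for t in terms_of_size(n) if eval_term(t) == value]

two = congruence_slice(2, 5)      # finite slice of the class of s(s(0))
```

Even for value 2 and bound 5 the slice already contains several syntactically distinct members, illustrating why the full class is infinite and needs a finite grammar representation.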
A congruence relation is said to be a refinement of another congruence relation , if . In Sect. 5.1, we need the definition if for all ground substitutions with ; this is equivalent to the equality of and being inductively provable [DJ90, Sect. 3.2].
We call an -ary function symbol a constructor if functions exist such that
The are called selectors associated to . As usual, we assume additionally that for any two constructors , any variable and arbitrary terms . On this assumption, some constants can also be called constructors. No selector can be a constructor. If is a constructor, then .
A term is called a constructor term if it is built from constructors and variables only.

Lemma (Constructor terms). Let and be constructor terms.
If , then for some such that is a constructor term and for each .
If , then .
If , then .
Induction on :
If , then choose .
If , then . Hence, for some , and . By I.H., for some . For any , , and , we have , and therefore . Hence, all are compatible, and we can unite them into a single .
Follows from 1 with .
Induction on :
For , we have nothing to show.
If , we have , and we are done by I.H. ∎
Each is called an alternative of the rule. We assume that for each nonterminal , there is exactly one defining rule in with as its left-hand side. As usual, the rules may be mutually recursive.
Given a grammar and a nonterminal , the language produced by is defined in the usual way as the set of all ground terms derivable from as the start symbol. We omit the index if it is clear from the context. We denote the total number of alternatives in by .
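Such a grammar is easy to represent directly as a data structure. The toy grammar below is our own example (not the paper's Fig. 3): its nonterminals N0 and N1 produce exactly the ground terms over {0, s, +} with value 0 and 1, respectively, and a recursive membership test plays the role of derivability.

```python
# A regular tree grammar as a Python dict: each nonterminal maps to its
# list of alternatives, written as (symbol, child-nonterminal, ...).
G = {
    'N0': [('0',), ('+', 'N0', 'N0')],                     # value-0 terms
    'N1': [('s', 'N0'), ('+', 'N0', 'N1'), ('+', 'N1', 'N0')],  # value-1
}

def produces(G, nt, t):
    """Does nonterminal `nt` derive the ground term `t`?"""
    return any(alt[0] == t[0] and len(alt) == len(t)
               and all(produces(G, n, s) for n, s in zip(alt[1:], t[1:]))
               for alt in G[nt])
```

Since every alternative consumes one function symbol, the test recurses along the term structure and always terminates; the language produced by a nonterminal is obtained by collecting all terms the test accepts.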
In Sect. 4, we will use the following predicate logic definitions. To simplify notation, we sometimes assume all predicate symbols to be unary. An -ary predicate can be simulated by a unary using an -ary tupling constructor symbol and defining .
An -ary predicate is called determinate wrt. some background theory if there is some such that w.l.o.g. each of the arguments has only one possible binding, given the bindings of the arguments [LD94, Sect. 5.6.1]. The background theory may be used to define , hence ’s determinacy depends on . Similar to the above, we sometimes write as a binary predicate to reflect the two classes of arguments. For a binary determinate predicate , the relation corresponds to a function . We sometimes assume that is defined by equations from a given , i.e. that .
A literal has the form or , where is a predicate symbol and is a term. We consider a negation to be part of the predicate symbol. We say that the literals and fit if both have the same predicate symbol, including negation. We extend to literals by defining if . For example, if .
A clause is a finite set of literals, with the meaning . We consider only nonredundant clauses, i.e. clauses that do not contain congruent literals. For example, is redundant if . We write if ; if is nonredundant, is uniquely determined by .
We say that -subsumes if for some . In this case, the conjunction of and implies ; however, there are other cases in which but does not -subsume . For example, implies, but does not subsume, , even for an empty .
A Horn clause is a clause with exactly one positive literal. It is also written as . We call the head literal, and a body literal for . Like [LD94, Sect. 2.1], we call the Horn clause constrained if .
We call a Horn clause
semi-determinate wrt. some background theory if all are determinate wrt. , all variables are distinct and do not occur in , , and . Semi-determinacy for clauses is a slight extension of determinacy defined by [LD94, Sect. 5.6.1], as it additionally permits arbitrary predicates . On the other hand, [LD94] permits for ; however, can be equivalently transformed into .
We treat the problem of E-generalization of ground terms by standard algorithms on regular tree grammars. In doing so, we give a rational reconstruction of the original approach of [Hei95], which provided monolithic, specially tailored algorithms for E-anti-unification. We confine ourselves to E-generalization of two terms; all methods work similarly for the simultaneous E-generalization of terms.
3.1 The Core Method
Definition 2 (E-Generalization). For an equational theory E, a term is called an E-generalization, or E-anti-unifier, of terms and if there exist substitutions and such that and . In Fig. 1 (right), we had , , , , and .
As in unification, a most specific E-generalization of arbitrary terms does not normally exist. A set is called a set of E-generalizations of and if each member is an E-generalization of and . Such a is called complete if, for each E-generalization of and , contains an instance of . ∎
As a first step towards computing E-generalization sets, let us weaken Def. 2 by fixing the substitutions and . We will see below, in Sects. 4 and 5, that the weakened definition has important applications in its own right.
Definition 3 (Constrained E-Generalization). Given two terms , a variable set , two substitutions with , and an equational theory E, define the set of E-generalizations of and wrt. and as . This set equals . ∎
Constrained E-generalization: , externally prescribed.
Unconstrained E-generalization: , computed from , .
If we can represent the congruence classes and as regular tree languages and , respectively, we can immediately compute the set of constrained E-generalizations : the class of regular tree languages is closed under union, intersection and complement, as well as under inverse tree homomorphisms, which cover substitution application as a special case. Figure 2 gives an overview of our method for computing the constrained set of E-generalizations of and wrt. and according to Def. 3:
From and , obtain a grammar for the congruence class , if one exists; the discussion of this issue is postponed to Sect. 3.2 below.
Compute a grammar for the intersection , using the product-automaton construction, e.g., from [CDG99, Sect. 1.3], which takes time .
Each member of the resulting tree language is an actual E-generalization of and . The question of enumerating that language is discussed later on.
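The product-automaton construction used in step 2 can be sketched in a few lines. The two grammars below are our own toy examples (even numerals, positive numerals), not the paper's; the pair nonterminal (A, B) of the product derives exactly the terms derived by both A and B, which is where the quadratic size bound comes from.

```python
# Product construction for regular tree grammars: L((A,B)) = L(A) ∩ L(B).
# Grammars map each nonterminal to alternatives (symbol, child-nts...).
G_even = {'E': [('0',), ('s', 'O')], 'O': [('s', 'E')]}  # even/odd numerals
G_pos  = {'P': [('s', 'A')], 'A': [('0',), ('s', 'A')]}  # positive / all

def intersect(G1, G2):
    """Pair each nonterminal of G1 with each of G2; keep matching alts."""
    P = {}
    for a, alts1 in G1.items():
        for b, alts2 in G2.items():
            P[(a, b)] = [(f[0],) + tuple(zip(f[1:], g[1:]))
                         for f in alts1 for g in alts2
                         if f[0] == g[0] and len(f) == len(g)]
    return P

def produces(G, nt, t):
    """Does nonterminal `nt` derive the ground term `t`?"""
    return any(alt[0] == t[0] and len(alt) == len(t)
               and all(produces(G, n, s) for n, s in zip(alt[1:], t[1:]))
               for alt in G[nt])

Gx = intersect(G_even, G_pos)   # ('E','P') derives positive even numerals
```

The nested loops visit every pair of alternatives once, matching the O(|G1|·|G2|) time bound cited above; a practical implementation would additionally prune pair nonterminals with empty languages.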
Lemma 4. Let a regular tree grammar
and a ground substitution be given.
where each is a distinct new nonterminal, and
the rules of are built as follows:
For each rule in include the rule into .
Then for all and all , we have iff and .
The condition in the defining rule of is decidable. and have the same number of nonterminals, and of rules. Each rule of may have at most more alternatives. Note that variables from occur in the grammar like constants. ∎
Based on the above result about constrained E-generalization, we now show how to compute the set of unconstrained E-generalizations of and according to Def. 2, where no is given. It is sufficient to compute two fixed universal substitutions and from the grammars for and , and to let them play the role of and
in the above method (cf. the dotted vectors in Fig. 2). Intuitively, we introduce one variable for each pair of congruence classes and map them to a kind of normal-form member of the first and the second class by and , respectively.
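This intuition is small enough to spell out directly. In the Python sketch below (class names and representatives are our own toy choices), one fresh variable is allocated per pair of congruence classes; the first substitution sends it into the first class, the second into the second.

```python
# One fresh variable per pair of congruence classes; sigma1 maps it to a
# representative of the first class, sigma2 to one of the second.
reps = {'N0': ('0',), 'N1': ('s', ('0',))}   # class -> normal-form member

sigma1, sigma2 = {}, {}
for c1, r1 in reps.items():
    for c2, r2 in reps.items():
        v = f"x_{c1}_{c2}"                   # the variable for (c1, c2)
        sigma1[v] = r1
        sigma2[v] = r2
```

Any pair of given ground instantiations can then be factored through (sigma1, sigma2) by renaming each variable to the one indexed by its pair of value classes; this is the content of Lemma 7.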
We give below a general construction that also accounts for auxiliary nonterminals not representing a congruence class, and state the universality of in a formal way. For the sake of technical simplicity, we assume that and share the same grammar ; this can easily be achieved by using the disjoint union of the grammars for and .
Definition 5 (Normal Form). Let an arbitrary tree grammar be given. A non-empty set of nonterminals is called maximal if , but for each . Define . Choose some arbitrary but fixed
maximal for each
maximal for each
ground term for each
The mappings and can be effectively computed from . We abbreviate ; this is a kind of normal form of . Each term not in any , in particular each nonground term, is mapped to some arbitrary ground term, the choice of which does not matter. For a substitution , define . We always have . ∎
Lemma 6 (Substitution Normalization). For all , , and ,
From the definition of and , we get and , respectively.
Induction on the structure of :
If and , then by 1.
Assuming , we have
Lemma 7 (Universal Substitutions). For each grammar , we can effectively compute two substitutions that are universal for in the following sense: for any two substitutions , a substitution exists such that for , we have .
Let be a new distinct variable for each . Define for . Given and , let . Then and coincide on , and hence by Lem. 6.2. ∎
Example 8. We apply Lem. 7 to the grammar consisting of the topmost three rules in Fig. 3 below. The result will be used in Ex. 10 to compute a set of E-generalizations. We have , since, e.g., and , while . We choose
We abbreviate, e.g., to . This way, we obtain
Given , and for example, we obtain a proper instance of using and :
The computation of universal substitutions is very expensive, because it involves computing many tree-language intersections to determine the mappings and . Assume , where comprises the nonterminals representing congruence classes and comprises the other ones. A maximal set may contain at most one nonterminal from and an arbitrary subset of ; however, no maximal may be a proper subset of another one. By a simple combinatorial argument, we obtain as an upper bound on . Hence, the cardinality of is bounded by the square of that number. In our experience, is usually small; in most applications, it does not exceed , resulting in . Computing the requires grammar intersections in the worst case, viz. when for some is maximal. In this case, is rather small. Since the time for testing emptiness is dominated by the intersection computation time, and , we get a time upper bound of for computing the .
If the grammar is deterministic, then each nonterminal produces a distinct congruence class [McA92, Sect. 2], and we need not compute any intersections at all to obtain and . We get . In this case, , , and can be computed in linear time from . In general, however, a nondeterministic grammar is smaller than its deterministic counterpart.
Theorem 9 (Unconstrained E-Generalization). Let an equational theory E and two ground terms be given. Let be a tree grammar and such that for . Let be as in Lemma 7. Then, is a complete set of E-generalizations of and . A regular tree grammar for it can be computed from in time .
If , then and , i.e. is an E-generalization of and . To show completeness, let be an arbitrary E-generalization of and , i.e. for some . Obtain from Lemma 7 such that . Then, by definition, contains the instance of . ∎
Since the set of E-generalizations resulting from our method is given by a regular tree grammar, it is necessary to enumerate some terms of the corresponding tree language in order to actually obtain results. Usually, there is a notion of simplicity (or weight), depending on the application