# An Improved Algorithm for E-Generalization

E-generalization computes common generalizations of given ground terms w.r.t. a given equational background theory E. In 2005 [arXiv:1403.8118], we presented a computation approach based on standard regular tree grammar algorithms, together with a Prolog prototype implementation. In this report, we present algorithmic improvements, prove them correct and complete, and give some details of an efficiency-oriented implementation in C that allows us to handle problems larger by several orders of magnitude.


## 1 Introduction

E-generalization computes common abstractions of given objects, taking given background knowledge into account to obtain the most suitable results.

More formally, assuming familiarity with term rewriting theory (e.g. [DJ90]), the problem of E-generalization consists in computing the set of common generalizations of given ground terms w.r.t. a given equational background theory E.   In [Bur05, Def. 1, 2], we gave a formal definition of the notion of a complete set of E-generalizations of given ground terms, and presented [Thm. 7] an approach to compute such a set by means of standard algorithms on regular tree grammars (e.g. [TW68, CDG01]).   Its most fruitful application turned out to be the synthesis of nonrecursive function definitions from input-output examples:   given some variables x1, …, xn, some ground substitutions σ1, …, σm, and some ground terms t1, …, tm, we construct a term t such that

$$\begin{array}{ccc}
t[x_1\sigma_1,\ldots,x_n\sigma_1] & =_E & t_1\\
\vdots & & \vdots\\
t[x_1\sigma_m,\ldots,x_n\sigma_m] & =_E & t_m
\end{array}$$

As a main example application of this approach, we could obtain construction law terms for sequences.   For example, given a sequence of natural numbers, we could find a term such that the equations in Fig. 1 (right) hold, where the background theory consisted of the usual equations defining addition and multiplication on natural numbers (see Fig. 1 left).   Each such term computes each supplied sequence member correctly from its 1st and 2nd predecessor, respectively, and from its position in the sequence.   The supplied values are said to be explained by the term; in the sequence, we write a semicolon to separate an initial part that doesn’t need to be explained from an adjacent part that does.

For the given example, we obtained e.g. one law term; when the background theory was extended, we additionally obtained another.   When instantiated by the appropriate substitution, both terms yield a sequence continuation that is usually accepted as a “correct” sequence extrapolation.   This way, E-generalization, or nonrecursive-function synthesis, can be used to solve a certain class of intelligence tests.

In order to avoid undesirable solutions, we defined a notion of term weight and focused on terms of minimal weight.   As we computed a regular tree grammar describing all possible solution terms in a compact form, we could use a version of Knuth’s algorithm from [Knu77] to actually enumerate particular law terms in order of increasing weight.


The current report is in some sense the continuation of [Bur05].   After giving some definitions in Sect. 2, it reports in Sect. 3 algorithmic improvements on the problem of computing E-generalizations.   Based on them, an efficiency-oriented reimplementation in C allows us to handle problems larger by several orders of magnitude, compared to the Prolog implementation described in [Bur05, Sect. 5.4].   Section 4 discusses some details of that implementation.   Section 5 shows some run time statistics.

We used our implementation to investigate the intelligence test IST’70-ZR [Amt73]; a report on our findings is forthcoming.

## 2 Definitions

### 2.1 Terms

###### Definition 1.

Signature (Signature)
Let be a finite set of function symbols, together with their arities.   We abbreviate the set of all -ary functions by .   Let be a finite set of variables. ∎

###### Definition 2.

Term (Term)
For , let denote the set of terms over with variables from , defined in the usual way.   In particular, denotes the set of ground terms.   We use to denote the syntactic equality of terms.

###### Definition 3.

Equality, normal form (Equality, normal form)
Let =_E be a congruence relation on the set of ground terms, i.e. an equivalence relation that is compatible with each function symbol.   We require that =_E be computable in the following sense:   a computable normalization function shall exist such that for all ground terms we have

1. Idempotency:   normalizing a term twice yields the same result as normalizing it once,

2. Decisiveness:   two ground terms are E-equal if and only if their normal forms are syntactically equal, and

3. Representativeness:   each ground term is E-equal to its normal form.

In property 2, syntactic term equality is used on the right-hand side of the equivalence.   Property 3 is redundant: it follows by applying property 2 to property 1.   We call the result of the normalization function applied to a term the normal form of that term, and say that the term evaluates to it.   The set of all normal forms is also called the set of values. ∎

Most often, =_E is given by a confluent and terminating term rewriting system (see e.g. [DJ90, p.245,267]); the latter is obtained in turn most often from a conservative extension of an initial algebra of ground constructor terms (see e.g. [Duf91, Sect.7.3.2, p.160], [Pad89]).   As an example, take the natural numbers, with =_E given by the usual rewriting rules defining addition and multiplication on that set, like those shown in Fig. 1 (left).

### 2.2 Weights

In this subsection, we give the necessary definitions and properties about term weights.   We associate with each function symbol a weight function of the same arity (Def. 6), which operates on some well-ordered domain (Def. 4).   Based on these functions, we inductively define the weight of a term (Def. 7).

###### Definition 4.

Weight domain, minimal weight (Weight domain, minimal weight)
Let be a set and an irreflexive total well-founded ordering on , such that has a maximal element .   We use to denote the reflexive closure of .   Since is well-founded, each non-empty subset of contains a minimal element ; we additionally define . ∎

For our algorithm in Sect. 3, we need a maximal element as initial value in minimum-computation algorithms.   For the algorithm’s completeness, we additionally need the absence of limit ordinals (e.g. [Hal68, Sect.19]) below the maximal element (see Lem. 15).   Altogether, for Alg. 13 we have to choose a weight domain order-isomorphic to the natural numbers extended by a top element.   However, the slightly more general setting from Def. 4 is more convenient in theoretical discussions and counter-examples.   We need the ordering to be total and well-founded in any case, in order for a set of terms to contain a minimal-weight term.

###### Definition 5.

Function properties wrt. order (Function properties wrt. order)
A function is called

• increasing if ,

• strictly increasing if ,

• monotonic if ,

• strictly monotonic if it is monotonic and
,

• strict if , in particular, a -ary weight function is strict if . ∎

Knuth’s algorithm — which will play an important role below — requires weight functions to be monotonic and increasing [Knu77, p.1l].

###### Definition 6.

Weight function associated to a function symbol (Weight function associated to a function symbol)
Let a weight domain and ordering be given as in Def. 4, and let a signature be given.   For each n and each n-ary function symbol, let a monotonic and strictly increasing n-ary weight function be given; we call it the weight function associated with that symbol.   We assume that the associated weight functions can always be computed efficiently. ∎

###### Definition 7.

Term weight, term set weight (Term weight, term set weight)
Given the weight functions, define the weight of a ground term inductively by

$$w(f(t_1,\ldots,t_n)) := \overline{f}(w(t_1),\ldots,w(t_n)).$$

For a set of ground terms , define

$$w(T) := \min\{\, w(t) \mid t \in T \,\}$$

to be the minimal weight of all terms in T.   Note that w(T) is always well-defined, and for nonempty T we have w(T) = w(t) for some t ∈ T, since the weight domain is well-ordered. ∎

###### Example 8.

Weight function examples (Weight function examples)
The most familiar examples of weight measures are the size and the height of a term, i.e. the total number of nodes, and the length of the longest path from the root to any leaf, respectively.   The size is obtained by assigning each constant the weight 1 and letting each weight function add 1 to the sum of its arguments’ weights; the height is obtained similarly, adding 1 to the maximum instead of the sum. ∎

###### Example 9.

Weight function counter-examples (Weight function counter-examples)
We give two counter-examples to show what our weight functions cannot achieve.

First, it would be desirable in some contexts to prefer terms with minimal sets of distinct variables.   Choosing , for , and (relaxing the increasingness requirement from strict to nonstrict for the sake of the example) for an -ary function symbol (including constants), we obtain as the set of distinct variables occurring in the term .   However, we cannot use as an irreflexive well-ordering in the sense of Def. 4, since it is not total.   Even if, in a generalized setting, a partial ordering was allowed, and we were interested only in the number of distinct variables, Knuth’s algorithm from [Knu77] to compute minimal terms cannot be generalized accordingly to this setting, unless , as shown in [Bur03, Sect.5, Lem.29].

Similarly, it can be desirable to have as the number of distinct subterms of .   Following similar proof ideas as in [Bur03, Sect.5], it can be shown that this is impossible unless the monotonicity requirement is dropped for weight functions, and that Knuth’s algorithm cannot be adapted to that setting, again unless . ∎

The following property will be used in the completeness and correctness proof of Alg. 13 below.

###### Lemma 10.

Strict weight-functions (Strict weight-functions)
If all weight functions are strict, then .

###### Proof.

Induction on the height of the term. ∎

## 3 An improved algorithm to compute E-generalizations

In this section, we discuss some algorithmic improvements on the problem of computing E-generalizations.   We first give the improved algorithm in Sect. 3.1, and prove its completeness and correctness in Sect. 3.2 (see Thm. 19).   In Sect. 3.3, we relate it to the grammar-based algorithm from [Bur05, Sect. 3.1], indicating that the former is an improvement of the latter.   Implementation details are discussed in Sect. 4.

### 3.1 The algorithm

If is an -ary function symbol, and are weights such that , we call the -tuple a weight decomposition list evaluating to .

Our algorithm (Fig. 2 shows an example state) can be decomposed into two layers.   The lower one (Alg. 11, see the left, red part of Fig. 2) generates, in order of increasing evaluation weight, a sequence of all possible weight decomposition lists.   This sequence is fed into the higher layer (Alg. 12, see the right, blue part of Fig. 2), where each decomposition list is used to generate a corresponding set of terms.

This pipeline architecture is similar to that of a common compiler front-end, where a scanner generates a sequence of tokens which is processed by a parser.

###### Algorithm 11.

Weight decomposition list generation (Weight decomposition list generation)

Input:

• a signature ,

• a computable weight function , for each function symbol

• a finite set of variables to be used in terms

Output:

• a potentially infinite stream of weight decomposition lists, with their evaluated weights being a non-decreasing sequence

Algorithm:

1. Maintain a set of weight decomposition lists to be considered next.

2. Maintain a set of evaluating weights of all decomposition lists that have ever been drawn from .

3. Initially, let be obtained from all variables and signature constants.

4. Initially, let .

5. While is non-empty, loop:

1. Remove from a weight decomposition list evaluating to the least weight among all lists in ; let denote that weight.

2. Output to the stream.

3. Insert into .

4. If in step 5a the least evaluating weight in had increased since the previous visit, enter each possible weight decomposition list into that can be built from and some weights from .
More formally: for every (non-constant) function symbol from the signature, enter into each weight decomposition list such that . ∎

###### Algorithm 12.

Term generation from decomposition lists (Term generation from decomposition lists)

Input:

• a pair of goal values ,

• ground substitutions ,

• a finite, or potentially infinite, stream of weight decomposition lists, ordered by ascending evaluation weight.

Output (if the algorithm terminates):

• a term of minimal weight such that for .

Algorithm:

1. Maintain a partial mapping that yields, for each pair of values, the term of least weight considered so far (if any) that evaluates to those values under the two given substitutions.

2. Maintain a set of minimal terms generated so far.   Terms will be added to it in order of increasing weight; so we can easily maintain a weight layer structure within it.   More formally: for each weight , let be the set of all minimal terms generated so far that have weight .

3. Initially, let be the empty mapping.

4. Initially, let , for each .

5. While the input stream of weight decomposition lists is non-empty, loop:

1. Let be the next list from the stream, let denote the weight it evaluates to.

2. For all , loop:

1. Build the term . This term has weight by construction.

2. Let , and .

3. If the mapping is still undefined at this value pair, then define it there to be the newly built term, and add that term to the current weight layer; otherwise, discard the term, since a term of no greater weight with the same value pair has been recorded before.

3. If the recorded value pair equals the pair of goal values, then stop with success:
the newly built term is a term of minimal weight evaluating to the goal values under the given substitutions.

6. Stop with failure: no term with the required property exists. ∎

Note that the loop in step 5 may continue forever, if the input stream is infinite but no solution exists.   Next, we compose Alg. 11 and Alg. 12:

###### Algorithm 13.

Value-pair cached term generation (Value-pair cached term generation)
Input:

• a signature ,

• a computable weight function , for each function symbol

• a pair of goal values ,

• substitutions .

Output:

• a term of minimal weight such that for .

Algorithm:

• Let .

• Feed , all , and into Alg. 11.

• Feed , , and Alg. 11’s output stream into Alg. 12.

• Run Alg. 11 in parallel to Alg. 12 until either algorithm terminates.

Implementation issues:

• The set in Alg. 11 is best realized by a heap (e.g. [AHU74, Sect. 3.4]).

• Since we have in Alg. 11 iff in Alg. 12, the former test can be implemented by the latter one, thus avoiding the need for an implementation of .

• The mapping in Alg. 12 is best implemented by a hash table (e.g. [AHU74, Sect. 4.2]) of balanced trees (e.g. [AHU74, Sect. 4.9]).

• The sets in Alg. 12 are just segments of one global list of terms, ordered by non-decreasing weight .

• The test in step 5iiiC of Alg. 12 can be sped up by initializing the mapping to a special non-term entry. ∎

###### Example 14.

Fibonacci sequence (Fibonacci sequence)
Figure 2 shows an example state of Alg. 11 and 12, employed to obtain a law for the sequence .   It assumes the weight from Exm. 8, slightly modified by assigning weight , rather than , to variable symbols.

In the left part, in red, the Alg. 11 state is shown, with weight decomposition lists given in infix notation.   The list was just drawn from .   It evaluates to weight , which occurs for the first time, so all lists buildable from and some member of are entered into , of which , , and are shown as examples.

In the right part, in blue, the Alg. 12 state is shown.   To the right of each term in some , its evaluation value under and , i.e.  is given; note that both values agree for ground terms.   Alg. 12 is just building all terms corresponding to the input weight decomposition list , i.e. all sums of two generated minimal terms from .   After the term has been built and entered into , the term is currently under consideration.   It evaluates to the normal form and under and , respectively.   Looking up the pair with reveals that there was already a term of less weight with these values, viz. ; so the term is discarded.   Next, the term will be built, evaluating to ; lookup via will show that this is the goal pair, and the algorithm will terminate successfully, with as a law term for the given sequence.   We tacitly ignored commutative variants of buildable terms, such as , , and ; see Sect. 4.3 below for a formal treatment.

Figure 3 shows the detailed run of Alg. 11 in this example, up to the state shown in Fig. 2 (left).   Figures 4 and 5 show the full run of Alg. 12, until the solution is found.   For the sake of brevity, we again ignored commutative variants and some other trivial computations (marked “skipped” in the step field). ∎