Craig Interpolation and Access Interpolation with Clausal First-Order Tableaux

02/14/2018 ∙ by Christoph Wernhard, et al. ∙ 0

We show methods to extract Craig-Lyndon interpolants and access interpolants from clausal first-order tableaux as produced by automated first-order theorem provers based on model elimination, the connection method, the hyper tableau calculus and instance-based methods in general. Smullyan introduced an elegant method for interpolant extraction from "non-clausal" first-order tableaux. We transfer this to clausal tableaux where quantifier handling is based on prenexing and Skolemization. A lifting technique leads from ground interpolants of Herbrand expansions of Skolemized input formulas to quantified interpolants of the original input formulas. This is similar to a known interpolant lifting by Huang but based more straightforwardly on Herbrand's theorem instead of the auxiliary notion of relational interpolant. Access interpolation is a recent form of interpolation for formulas with relativized quantifiers targeted at applications in query reformulation and specified in the constructive framework of Smullyan's general tableaux. We transfer this here to clausal tableaux. Relativized quantification upon subformulas seems incompatible with lifting techniques that only introduce a global quantifier prefix. We thus follow a different approach for access interpolation: A structure preserving clausification leads to clausal ground tableaux that can be computed by automated first-order provers and, in a postprocessing step, can be restructured such that in essence the interpolant extraction from Smullyan's tableaux becomes applicable.




1 Introduction

By Craig’s interpolation theorem craig:linear, for two first-order formulas $F$ and $G$ such that $F$ entails $G$ there exists a third first-order formula $H$ that is entailed by $F$, entails $G$ and is such that all predicate and function symbols occurring in it occur in both $F$ and $G$. Such a Craig interpolant $H$ can be constructed from given formulas $F$ and $G$, for example by a calculus that allows extracting $H$ from a proof that $F$ entails $G$, or, equivalently, that the implication $F \rightarrow G$ is valid. Automated construction of interpolants has many applications, in the area of computational logic most notably in symbolic model checking, initiated with mcmillan:2003, and in query reformulation marx:2007; nash:2010; borgida:2010; toman:wedell:book; benedikt:guarded; benedikt:etal:2014:generating; toman:2015:tableaux; benedikt:book; benedikt:2017; toman:2017. The foundation for the latter application field is the observation that a reformulated query can be viewed as a definiens of a given query where only symbols from a given set, the target language of the reformulation, occur in the definiens. The existence of such definientia, that is, definability tarski:35, or determinacy as it is called in the database context, can be expressed as validity and their synthesis as interpolant construction. For example, a definiens $H$ of a unary predicate $p$ within a first-order formula $F$ can be characterized by the following conditions:

  1. $F$ entails $\forall x\, (p(x) \leftrightarrow H)$.

  2. $p$ does not occur in $H$.

The variable $x$ is allowed there to occur free in $H$. We further assume that $x$ does not occur free in $F$ and let $F'$ denote $F$ with $p$ replaced by a fresh symbol $p'$. Now the characterization of definiens by the two conditions given above can be equivalently expressed as

$H$ is a Craig interpolant of the two formulas $F \land p(x)$ and $F' \rightarrow p'(x)$.

A definiens $H$ exists if and only if it is valid that the first formula implies the second one.
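The characterization of a definiens as an interpolant can be tried out on a small propositional analogue. The following is a minimal sketch, not part of the original method: it assumes that the left formula is the defining formula together with the defined atom, and that the right formula states that the renamed copy of the defining formula implies the renamed atom. All names (`entails`, `is_interpolant`, the atoms `p`, `p2`, `q`, `r`) are hypothetical and chosen for illustration only.

```python
from itertools import product

def entails(f, g, atoms):
    """Return True if every truth assignment over `atoms` that satisfies f also satisfies g."""
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if f(env) and not g(env):
            return False
    return True

def is_interpolant(left, right, h, atoms):
    """Semantic part of the interpolant conditions: left |= h and h |= right."""
    return entails(left, h, atoms) and entails(h, right, atoms)

# Assumed example: F defines p as (q and r); F_prime is F with p renamed to p2.
F = lambda e: e["p"] == (e["q"] and e["r"])
F_prime = lambda e: e["p2"] == (e["q"] and e["r"])

left = lambda e: F(e) and e["p"]               # F together with p
right = lambda e: (not F_prime(e)) or e["p2"]  # F' implies p2

# Candidate definiens: mentions only q and r, the symbols shared by both sides.
H = lambda e: e["q"] and e["r"]

ATOMS = ["p", "p2", "q", "r"]
```

With these definitions, `is_interpolant(left, right, H, ATOMS)` holds, whereas the weaker candidate `lambda e: e["q"]` is entailed by the left formula but does not entail the right one. The vocabulary condition is met by construction here and is not checked by the code.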

The construction of Craig interpolants of given first-order formulas has been elegantly specified in the framework of tableaux by Smullyan smullyan:book:68; fitting:book. Although this has been taken as foundation for applications of interpolation in query reformulation toman:wedell:book; benedikt:book, it has hardly been used as a basis for the practical computation of first-order interpolants with automated reasoning systems, where the focus so far has been on interpolant extraction from specially constrained resolution proofs (see bonacina:15:on:ipol; kovacs:17 for recent overviews and discussions).

Here we approach the computation of interpolants from another paradigm of automated reasoning, the construction of a clausal tableau. Expectations are that, on the one hand, the elegance of Smullyan’s interpolation method for non-clausal tableaux can be utilized and, on the other hand, the foundation for efficient practical implementations is laid. Various efficient theorem proving methods can be viewed as operating by constructing a clausal tableau handbook:tableaux:letz (or clause tableau handbook:ar:haehnle). They can be roughly divided into two major families: First, methods that are goal-sensitive, typically proceeding with the tableau construction “top-down”, by “backward chaining”, starting with clauses from the theorem in contrast to the axioms. Aside from clausal tableaux in the literal sense, techniques to specify and investigate such methods include model elimination loveland:1969, the connection method bibel:1981, and the Prolog technology theorem prover pttp. One of the leading first-order proving systems of the 1990s, SETHEO setheo, followed that approach. The leanCoP system leancop along with its recent derivatives kaliszyk15:tableaux; femalecop as well as the CM component of PIE cw-mathlib; cw-pie are implementations in active use today. The second major family of methods constructs clausal tableaux “bottom-up”, in a “forward-chaining” manner, by starting with positive axioms and deriving positive consequences. With a focus on their suitability for constructing model representations, these methods have been called bottom-up model generation (BUMG) methods bumg. They include, for example, SATCHMO satchmo and the hyper tableau calculus hypertab, with implementations such as Hyper, formerly called E-KRHyper cw-ekrhyper; cw-krhyper; hyper:2013. Hyper tableau methods are also used in high-performance description logic reasoners dl:hypertab. It appears that the chase method from the database field, which recently got attention anew in knowledge representation (see, e.g., grau:2013:acyclicity), can also be understood as such a bottom-up tableau construction. Methods of the instance-based approach to theorem proving (see baumgartner:2010:ibased for an overview) should in general be applicable to construct a clausal tableau after proving, from the instances involved in the proof, although the proof construction itself might not proceed by tableau construction. For a systematic overview of different variants of tableau structures and methods, including clausal tableaux with respect to both considered major paradigms, see handbook:ar:haehnle.

An essential distinction of clausal tableau methods from resolution-based methods is that during tableau construction only instances of input clauses are created and incorporated. Clauses are not broken apart and joined as in a resolution step. Nevertheless, clausal tableau methods might be complemented by preprocessors that perform such operations. An essential distinction from non-clausal tableau methods is that with the clausal form only a particularly simple formula structuring has to be considered, in essence sets of clauses. Through preprocessing with conversion to prenex form and Skolemization, the handling of quantifications amounts, for clausal tableau methods, just to the handling of free variables.

The tableau-based method for Craig interpolation presented here proceeds in two stages, with some similarity to resolution-based methods discussed in huang:95; baaz:11; bonacina:15:on:ipol; kovacs:17 that compute in a first stage a so-called relational, weak or provisional interpolant which satisfies the vocabulary restriction on interpolants with respect to predicate symbols but not necessarily with respect to function and constant symbols. The result of the first stage is in the second stage lifted to an actual interpolant of the original input formulas by replacing terms with variables and prepending a specific quantifier prefix. In our tableau-based method the two stages are separated at a different place, more directly related to Herbrand’s theorem, without need of an additional notion such as relational interpolant. In the first stage an actual Craig interpolant of a finite unsatisfiable subset of the Herbrand expansion of the Skolemized and clausified input formulas is constructed. The involved ground clauses can be obtained as instances of clauses of the closed tableau computed by a first-order prover for a set of first-order clauses. With respect to interpolation, the closed clausal tableau can be considered just as given, abstracting from the method by which it has been constructed. This leads to a lean formalism for interpolation and justifies the practical implementation of Craig interpolation with arbitrary high-performance first-order theorem provers that construct clausal tableaux, without need to modify inference rules or other prover internals.

There are many known ways to strengthen Craig’s interpolation theorem by ensuring that for given formulas $F$ and $G$ that satisfy certain syntactic restrictions there exists an interpolant $H$ that also satisfies certain syntactic restrictions. For example, that predicates occur in $H$ only with polarities with which they occur in both $F$ and $G$. (A predicate occurs with positive (negative) polarity in a formula if it occurs there in the scope of an even (odd) number of negation operators.) The respective strengthened interpolation theorem has been explicated by Lyndon lyndon, hence we call Craig interpolants that meet this restriction Craig-Lyndon interpolants. Access interpolation benedikt:book is a variant of Craig-Lyndon interpolation that applies to formulas in which quantifiers only occur relativized by atoms, as for example in


With each occurrence of a relativizing atom a binding pattern or access pattern is associated, which comprises the predicate, the polarity of the occurrence and the argument positions of those variables that are not quantified by the associated quantifier. For example, in (i) we have for the occurrence of  the predicate in negative polarity with the empty set of argument positions and for the occurrence of the predicate in positive polarity and the set  of argument positions, because at the first argument position in the occurrence of is not quantified by . Positions specified in the set are also called input positions, while the quantified positions are output positions, corresponding to their role in a naive formula evaluation. Access interpolation strengthens Craig-Lyndon interpolation by requiring that also the binding patterns occurring in the interpolant formula are subsumed by binding patterns occurring in a specific way in the input formulas.

In benedikt:book it has been shown that many tasks in database query reformulation can be expressed in terms of access interpolation, applied to construct definientia of queries that are in a certain vocabulary and involve only certain binding patterns which makes them evaluable in a certain sense. A variant of Craig-Lyndon interpolation by Otto otto:interpolation:2000 has been suggested in nash:2010 as a technique to take relativization into account. In benedikt:book access interpolation is presented as a generalization of Otto’s interpolation and constructively proven on the basis of Smullyan’s tableau method following the presentation in fitting:book.

Access interpolants involve only relativized quantification, which seems incompatible with a global quantifier prefix as computed by the lifting technique sketched above for Craig interpolation, at least if predicates used as relativizers are permitted to have empty extensions. (Footnote 1: If relativizer predicates are assumed to have nonempty extensions, quantifiers together with their relativizing literals can be moved to the prefix, justified in essence by the following entailments shown here for a unary relativizer predicate , but holding analogously also for relativizers with larger arity. If and are formulas such that does not occur free in , then:)

Hence, the method for access interpolation presented here extracts the interpolant from a tableau in a single stage, where a form of lifting that only applies to subformulas corresponding to scopes of relativized quantifiers is incorporated. In essence, Smullyan’s techniques for non-clausal tableau are simulated with the more machine-oriented clausal tableaux and variable handling through Skolemization. Correspondence to Smullyan’s tableaux is achieved by a structure preserving normal form and certain structural requirements on the clausal tableaux. These are already met by hyper tableaux. In the general case they can be ensured with restructuring transformations, applied in a postprocessing step to closed clausal tableaux obtained from provers.

The contributions of this work can be summarized as follows:

  1. Foundations to perform Craig interpolation and related forms of interpolation for first-order logic with clausal tableau methods are developed. They provide:

    1. A basis for implementing interpolation with efficient machine-oriented theorem provers for first-order logic that can be understood as constructing clausal tableaux. With methods and systems of two main families, goal-oriented “top-down” and forward-chaining “bottom-up”, there is a wide range of potential applications.

    2. A relatively simple framework to prove constructively the existence of interpolants with further syntactic properties, beyond the restriction on symbols required by Craig interpolants. The involved constructions are, moreover, suited for realization by practical systems. In the paper such constructions are shown for Craig-Lyndon interpolation, interpolation from a Horn formula, and, with access interpolation, for a form of quantifier relativization.

  2. Interpolant lifting, which is in principle known from resolution-based approaches since the mid-nineties, is placed at a new and apparently more natural position within the overall task of first-order interpolation, where it is independent of a particular calculus. A detailed correctness proof that resides on a small technical basis is presented.

  3. For access interpolation, a key technique for query reformulation, the first practically implementable methods are described.

  4. Conversions between closed clausal tableaux are developed that transform arbitrarily structured inputs to clausal tableaux with a restricted structure that in essence simulates non-clausal tableaux or tableaux that are constrained in specific ways, as, for example, computed by hyper tableau methods. They justify the application of practical methods that construct unrestricted clausal tableaux, such as, for example, goal-oriented “top-down” first-order theorem proving methods, to tasks like access interpolation which require a certain tableau structuring.

Proofs are given for all theorem, lemma and proposition statements that do not pertain to the considered logics in general. Proofs which involve intricacies or subtleties are given in detail.

The rest of this paper is structured in two main parts: Sections 2 to LABEL:sec-cli-related are concerned with Craig-Lyndon interpolation and Sections LABEL:sec-ai to LABEL:sec-ai-conclusion with access interpolation. After notation and basic terminology have been specified in Sect. 2, precise accounts of clausal tableau and related notions are given in Sect. 3. In Sect. 4 the extraction of ground interpolants from closed clausal ground tableaux is specified and proven correct. The generalization of this method to first-order formulas, which involves preprocessing by Skolemization and postprocessing of ground interpolants by lifting is specified and proven correct in Sect. 5, and in Sect. LABEL:sec-lift-related compared with related approaches from the literature. In Sect. LABEL:sec-tab-constraints constraints on clausal tableaux are specified that characterize positive hyper tableaux, which are typically computed by “bottom-up” methods. On this basis a construction of Craig-Lyndon interpolants that inherit the Horn property from the first interpolation input is shown. Section LABEL:sec-cli-related concludes the part on Craig-Lyndon interpolation with a discussion of possible refinements of our method and issues for further research. We then turn to access interpolation. In Sect. LABEL:sec-ai a brief overview on our approach is given, underlying notions from the literature are recapitulated, and a structure-preserving clausal normalization of the relativized input formulas is described. The extraction of an access interpolant from a closed clausal ground tableau that is for such clauses and meets certain structural constraints is then specified and proven correct in Sect. LABEL:sec-access-extract. These structural constraints are met by positive hyper tableaux. For the general case they can be ensured with tableau transformations, specified in Sect. LABEL:sec-access-convert and illustrated with examples in Sect. LABEL:sec-access-convert-examples.
Section LABEL:sec-ai-conclusion concludes the part on access interpolation with a discussion of possible refinements of our method, issues for further research and related work. Section LABEL:sec-conclusion concludes the paper with an abstract view on its main contributions.

A work-in-progress poster of this research at an earlier stage was presented at the TABLEAUX 2017 conference.

2 Notation and Basic Terminology

We basically consider first-order logic without equality. (Footnote 2: This does not preclude representing equality as a predicate with axioms that express reflexivity, symmetry, transitivity and substitutivity.) Atoms are of the form $p(t_1, \ldots, t_n)$, where $p$ is a predicate symbol (briefly predicate) with associated arity $n \geq 0$ and $t_1, \ldots, t_n$ are terms formed from function symbols (briefly functions) with associated arity $\geq 0$ and individual variables (briefly variables). Function symbols with arity $0$ are also called individual constants (briefly constants).

Unless specially noted, a formula is understood as a formula of first-order logic without equality, constructed from atoms, constant operators , , the unary operator , binary operators and quantifiers with their usual meaning. Further binary operators , as well as -ary versions of and can be understood as meta-level shorthands. Also quantification upon a set of variables is used as shorthand for successive quantification upon each of its elements. The operators and bind stronger than and . The scope of , the quantifiers, and the -ary connectives is the immediate subformula to the right. Formulas in which no functions with exception of constants occur are called relational. Formulas in which no predicates with arity larger than zero and no quantifiers occur are called propositional.

A subformula occurrence has in a given formula positive (negative) polarity, or is said to occur positively (negatively) in the formula, if it is in the scope of an even (odd) number of negations. If is a term or a formula, then the set of variables that occur free in  is denoted by , the set of functions occurring in  by , and the set of constants occurring in  by . If is a formula, then the set of pairs of predicates occurring in  coupled with an identifier of the respective polarity of the atom in which they occur is denoted by , the set of pairs of atoms occurring in coupled with an identifier of the respective polarity in which they occur as , and the set of terms that occur as argument of a predicate (in contrast to just as argument of a function) as . The notation , and is also used with sets  of terms or formulas, where it stands for the union of values of the respective function applied to each member of . A formula without free variables is called a sentence. A term or quantifier-free formula in which no free variable occurs is called ground. A ground formula is thus a special case of a sentence. Symbols not present in the formulas and other items under discussion are called fresh.
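The notion of polarity can be made concrete with a short traversal that counts negations on the way down to each atom. This is a minimal sketch under an assumed encoding of formulas as nested tuples; the function name `predicate_polarities` and the tags `"atom"`, `"not"`, `"and"`, `"or"`, `"forall"`, `"exists"` are hypothetical and only cover the core operators named above.

```python
def predicate_polarities(formula, positive=True, acc=None):
    """Collect (predicate, polarity) pairs of a formula given as nested tuples.

    Assumed encoding: ("atom", name, args), ("not", f), ("and", f, g),
    ("or", f, g), ("forall", var, f), ("exists", var, f).
    """
    if acc is None:
        acc = set()
    tag = formula[0]
    if tag == "atom":
        acc.add((formula[1], "+" if positive else "-"))
    elif tag == "not":
        # Each negation flips the polarity of everything in its scope.
        predicate_polarities(formula[1], not positive, acc)
    elif tag in ("and", "or"):
        predicate_polarities(formula[1], positive, acc)
        predicate_polarities(formula[2], positive, acc)
    elif tag in ("forall", "exists"):
        predicate_polarities(formula[2], positive, acc)
    else:
        raise ValueError(f"unknown connective: {tag}")
    return acc

# not (p(a) and not q(a)) or p(b)
example = ("or",
           ("not", ("and", ("atom", "p", ("a",)),
                           ("not", ("atom", "q", ("a",))))),
           ("atom", "p", ("b",)))
```

In `example` the predicate `p` occurs both negatively (under one negation) and positively, while `q` occurs only positively (under two negations).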

A literal is an atom or a negated atom. If is an atom, then the complement of is and the complement of is . The complement of a literal  is denoted by . A clause is a (possibly empty) disjunction of literals. A clausal formula is a (possibly empty) conjunction of clauses, called the clauses in the formula.

The notion of substitution used here follows baader:snyder:unificationtheory: A substitution is a mapping from variables to terms which is almost everywhere equal to identity. If is a substitution, then the domain of is the set of variables , the range of is , and the restriction of to a set  of variables, denoted by , is the substitution which is equal to the identity everywhere except over , where it is equal to . The identity substitution is denoted by . A substitution can be represented as a function by a set of bindings of the variables in its domain, e.g., . The application of a substitution  to a term or a formula  is written as , is called an instance of and is said to subsume . Composition of substitutions is written as juxtaposition. Hence, if and are both substitutions, then stands for .

For injective substitutions we use the following additional notation: If is an injective substitution and is a term or a formula, then denotes with all occurrences of subterms  that are in the range of and are not a strict subterm of another subterm in the range of replaced by the variable that is mapped by  to . As an example let . Then

The principal functor of a term that is not a variable is its outermost function symbol. If is a set of function symbols, then a term with a principal functor in is also called an -term.
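The application of a substitution and the inverse application of an injective substitution, as described above, can be sketched as follows. The encoding is an assumption made for illustration: variables are strings, and every other term is a tuple whose first element is the function symbol, so that a constant `a` is `("a",)`. The names `apply_subst` and `apply_inverse` are hypothetical.

```python
def apply_subst(term, subst):
    """Apply a substitution (dict: variable name -> term) to a term."""
    if isinstance(term, str):                 # a variable
        return subst.get(term, term)
    functor, *args = term
    return (functor, *(apply_subst(a, subst) for a in args))

def apply_inverse(term, subst):
    """Replace subterms in the range of an injective substitution by the
    variable mapped to them; an occurrence nested inside another range
    term is not replaced separately because the outermost match wins."""
    inverse = {t: v for v, t in subst.items()}
    if len(inverse) != len(subst):
        raise ValueError("substitution is not injective")

    def walk(t):
        if t in inverse:
            return inverse[t]
        if isinstance(t, str):
            return t
        functor, *args = t
        return (functor, *(walk(a) for a in args))

    return walk(term)

# Assumed example: x is mapped to f(a) and y is mapped to a.
sigma = {"x": ("f", ("a",)), "y": ("a",)}
term = ("g", ("f", ("a",)), ("a",))           # g(f(a), a)
```

Here `apply_inverse(term, sigma)` gives `("g", "x", "y")`: the occurrence of `a` inside `f(a)` is not replaced on its own, and applying `sigma` again restores `term`.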

We write $F \models G$ for $F$ entails $G$; $\models F$ for $F$ is valid; and $F \equiv G$ for $F$ is equivalent to $G$, that is, $F \models G$ and $G \models F$. On occasion we write a sequence of statements with these operators where the right and left, respectively, arguments of subsequent statements are identical in a chained way, such as, for example, $F \models G \models H$ for $F \models G$ and $G \models H$.

3 Clausal First-Order Tableaux

The following definition makes the variant of clausal tableaux that we use as basis for interpolation precise. It is targeted at modeling tableau structures produced by efficient fully automated first-order proving systems based on different calculi.

[Clausal Tableau and Related Notions]

Let be a clausal formula. A clausal tableau (briefly tableau) for is a finite ordered tree whose nodes  with exception of the root are labeled with a literal, denoted by , such that the following condition is met: For each node  of the tableau the disjunction of the labels of all its children in their left-to-right order, denoted by , is an instance of a clause in . A value of for a node  in a tableau is called a clause of the tableau.

A node  of a tableau is called closed if and only if it has an ancestor  with . With a closed node , a particular such ancestor is associated as target of , written . A tableau is called closed if and only if all of its leaves are closed.

A tableau is called ground if and only if for all its nodes  it holds that is ground.

The most immediate relationship of clausal tableaux to the semantics of clausal formulas is that the universal closure of a clausal formula is unsatisfiable if and only if there exists a closed clausal tableau for the clausal formula. Knowing that there are sound and complete calculi that operate by constructing a closed clausal tableau for an unsatisfiable clausal formula, and taking into account Herbrand’s theorem we can state the following proposition:

[Unsatisfiability and Computation of Closed Clausal Tableaux] There is an effective method that computes from a clausal formula  a closed clausal tableau for  if and only if , where , is unsatisfiable. Moreover, this also holds if terms in the literal labels of tableau nodes are constrained to ground terms formed from functions occurring in  and, in case there is no constant occurring in , an additional fresh constant.

Our objective is here interpolant construction on the basis of clausal tableaux produced by fully automated systems. This has effect on some aspects of our formal notion of clausal tableau: All occurrences of variables in the literal labels of a tableau according to Definition LABEL:def-tab are free and the scope of these variables spans all literal labels of the whole tableau. In more technical terms, this means that the tableaux are free variable tableaux (see (handbook:tableaux:letz, p. 158ff)) with rigid variables (see (handbook:ar:haehnle, p. 114)). Tableaux with only clause-local variables can, however, of course be expressed by just using different variables in each tableau clause. Thus, although our notion of tableaux involves rigid variables, this does not in any way imply that interpolant computation based on it applies only to tableaux whose construction by a prover had involved rigid variables.

Another aspect concerns the definition of closed for nodes and for tableaux: A tableau is closed if all of its leaves are closed, which does, however, not exclude that also an inner node of a closed tableau might be closed. For the construction of a closed tableau in theorem proving it is pointless to attach children to an already closed node. In our context, however, operations such as instantiating literal labels and certain tableau transformations might introduce inner closed nodes. To let the results of such operations be tableaux again, we thus have to permit closed inner nodes. A tableau simplification to eliminate these is shown in Sect. LABEL:sec-access-convert.
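The tableau notions of this section can be mirrored in a small data structure. The sketch below is an illustration under stated assumptions, not the representation used by any particular prover: literals are pairs of a sign and a ground atom given as text, and the names `Node`, `complement`, `is_closed_tableau` and `tableau_clauses` are hypothetical. Closed inner nodes are permitted, in line with the remark above, since only leaves are checked.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Literal = Tuple[bool, str]   # (is_positive, ground atom as text)

def complement(lit):
    """Return the complementary literal."""
    return (not lit[0], lit[1])

@dataclass
class Node:
    lit: Optional[Literal] = None                 # None only for the root
    children: List["Node"] = field(default_factory=list)

def is_closed_tableau(node, ancestors=()):
    """True if every leaf has an ancestor labeled with the complementary literal."""
    if not node.children:
        return node.lit is not None and complement(node.lit) in ancestors
    below = ancestors + ((node.lit,) if node.lit is not None else ())
    return all(is_closed_tableau(c, below) for c in node.children)

def tableau_clauses(node):
    """Yield the clause (tuple of child literals) of every inner node."""
    if node.children:
        yield tuple(c.lit for c in node.children)
        for c in node.children:
            yield from tableau_clauses(c)

# Closed ground tableau for the clauses p, (not p or q), not q.
root = Node(children=[
    Node((True, "p"), [
        Node((False, "p")),
        Node((True, "q"), [Node((False, "q"))]),
    ])
])
```

For this `root`, `is_closed_tableau(root)` holds, and `tableau_clauses(root)` enumerates exactly the three input clauses, each of which must be an instance of a clause of the given clausal formula.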

4 Ground Interpolant Extraction from Clausal Tableaux

As shown by Craig craig:linear, for first-order sentences $F$ and $G$ such that $F \models G$, an “intermediate” sentence $H$ such that $F \models H$ and $H \models G$ can be constructed, whose predicates and functions occur in both $F$ and $G$. That this also holds if in addition the polarities of predicate occurrences in $H$ are constrained to polarities in which they occur in both $F$ and $G$ is attributed to Lyndon lyndon, such that these formulas $H$ are sometimes called Lyndon interpolants in analogy to Craig interpolants. We call them here Craig-Lyndon interpolants:

[Craig-Lyndon Interpolant] Let $F, G$ be sentences such that $F \models G$. A Craig-Lyndon interpolant of $F$ and $G$ is a sentence $H$ such that

  1. $F \models H$ and $H \models G$.

  2. Every predicate that occurs in $H$ with a given polarity occurs with that polarity in both $F$ and $G$.

  3. Every function symbol that occurs in $H$ occurs in both $F$ and $G$.

The notion of Craig-Lyndon interpolant is specified here for sentences in contrast to formulas $F$, $G$ and $H$. This is without loss of generality because free variables in $F$, $G$ and $H$ would, with respect to interpolation, be handled exactly like constants.

Smullyan smullyan:book:68 specifies in his framework of non-clausal tableaux an elegant technique to extract a Craig-Lyndon interpolant from a tableau that represents a proof of , which is also presented in Fitting’s book fitting:book. The handling of propositional connectives in this method can be straightforwardly transferred to clausal tableaux. Quantifiers, however, have to be processed differently to match their treatment in clausal tableaux by conversion to prenex form and Skolemization. The overall interpolant extraction from a closed clausal tableau then proceeds in two stages, analogously as described for resolution-based methods in huang:95; baaz:11; bonacina:15:on:ipol; kovacs:17. In the first stage a “rough interpolant” is constructed which needs postprocessing by replacing terms with variables and prepending a quantifier prefix on these variables to yield an actual interpolant. This second stage will be specified in Sect. 5 and discussed further in Sect. LABEL:sec-lift-related. As we will see now, on the basis of clausal tableaux the first stage can be specified and verified with proofs by a straightforward adaption of Smullyan’s method in an almost trivially simple way.

Our interpolant construction is based on a variant of clausal tableaux where nodes have an additional side label that is shared by siblings and indicates whether the tableau clause is an instance of an input clause derived from the formula of the left side or the formula on the right side of the entailment underlying the interpolation: [Two-Sided Clausal Tableau and Related Notions]

Let be clausal formulas. A two-sided clausal tableau for and (or briefly tableau for the two formulas) is a clausal tableau for whose nodes  with exception of the root are labeled additionally with a side , such that the following conditions are met:

  1. If and are siblings, then .

  2. If is a child of , then is an instance of a clause in .

The side of a clause  in a tableau is the value of the side label of the children of .

For and nodes of a two-sided clausal tableau define

where is the union of with the set of the ancestors of .

The following definition specifies an adaption of the handling of propositional connectives in (smullyan:book:68, Chap. XV) and (fitting:book, Chap. 8.12) to construct interpolants from non-clausal tableaux. Differently from these works, the specification is here not in terms of tableau manipulation rules that deconstruct the tableau bottom-up, but inductively, as a function that maps a node to a formula.

[Interpolant Extraction from a Clausal Ground Tableau] Let be a node of a closed two-sided clausal ground tableau. The value of is a ground formula, defined inductively as follows:

  1. If is a leaf, then the value of is determined by the values of and as specified in the following table:

  2. If is an inner node with children where , then the value of is composed from the values of for the children, disjunctively or conjunctively, depending on the side label of the children (which is the same for all of them), as specified in the following table:

The following lemma associates semantic and syntactic properties with the formula obtained as value of applying to the root of a closed ground tableau. These properties imply the conditions required from a Craig-Lyndon interpolant (Definition 4). [Correctness of Interpolant Extraction from Clausal Ground Tableaux] Let be clausal ground formulas and let be a closed two-sided clausal ground tableau for and . If is the root of , then

  1. .

  2. .


We show the following property of that invariantly holds for all nodes of the tableau, including the root, which immediately implies the proposition: For all nodes  of it holds that

  1. .

  2. .

This is proven by induction on the tableau structure, proceeding from leaves upwards. We prove the base case, where is a leaf, by showing 1 and 2 for all possible values of :

  • Case :

    • Case : Immediate since then .

    • Case : Then . Properties 1 and 2 follow because is a conjunct of and is a conjunct of .

  • Case :

    • Case : Then . Properties 1 and 2 follow because is a conjunct of and is a conjunct of .

    • Case : Immediate since then .

To show the induction step, assume that is an inner node with children . Consider the case where the side of the children is . The induction step for the case where the side of the children is can be shown analogously. By the induction hypothesis we can assume that for all it holds that

which, since , is equivalent to

Since it follows that

Because is an instance of a clause in and thus entailed by the semantic requirement 1 of the induction conclusion follows:

The syntactic requirement 2 follows from the induction assumption and because in general for all nodes  of a two-sided clausal ground tableau for clausal ground formulas and it holds that all literals in occur in some clause of and all literals in occur in some clause of . ∎

Lemma 4 immediately yields a construction method for Craig-Lyndon interpolants of propositional and, more generally, ground formulas, or, in other words, quantifier-free first-order formulas. We call the method CTI, suggesting Clausal Tableau Interpolation. In Sect. 5 below it will be generalized to first-order sentences in full. [The CTI Method for Craig-Lyndon Interpolation on Ground Formulas]

Input: Ground formulas and such that .

Method: Convert and to equivalent clausal ground formulas and compute a closed two-sided clausal ground tableau for them. Let be the root of the tableau and compute the value of .

Output: Return the value of . The output is a ground formula that is a Craig-Lyndon interpolant of the input formulas.

The procedure is correct: The existence of a closed two-sided clausal tableau as required follows from Proposition 3; that the result is ground and is a Craig-Lyndon interpolant of and  follows from Lemma 4 and Definition 4.
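The ground procedure can be illustrated with a minimal Python sketch. The extraction rules are not restated in this section, so the rule set below is an assumption chosen to agree with the leaf and inner-node cases of the proof above: a leaf whose side equals that of its complementary ancestor yields a truth constant, a leaf with mixed sides yields a literal, and an inner node joins the values of its children by disjunction or conjunction depending on the side of the children. The names `Node`, `ipol`, `evaluate`, the side tags `"L"` and `"R"`, and the choice of the first complementary ancestor as target are all hypothetical. The example inputs are F = (p ∨ q) ∧ ¬p and G = q ∨ r, so the right side of the tableau holds the clauses of ¬G.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Optional, Tuple

Literal = Tuple[str, bool]  # (atom, polarity); ("p", False) stands for "not p"

@dataclass
class Node:
    lit: Optional[Literal]   # None for the root
    side: Optional[str]      # "L" or "R": side of the clause the node belongs to
    children: List["Node"] = field(default_factory=list)

def ipol(node, path=()):
    """Extract a formula bottom-up (assumed rule set, see the lead-in)."""
    if node.lit is not None:
        path = path + ((node.lit, node.side),)
    if not node.children:
        atom, pos = node.lit
        # Side of the first ancestor with the complementary literal;
        # raises StopIteration if the branch is not closed.
        tgt_side = next(s for (a, p), s in path[:-1] if a == atom and p != pos)
        if node.side == tgt_side:
            return ("const", node.side == "R")  # L/L gives false, R/R gives true
        # Mixed sides: the literal as it appears on the left side.
        return ("lit", atom, pos if node.side == "L" else not pos)
    parts = [ipol(child, path) for child in node.children]
    # Children of a left clause are joined by "or", of a right clause by "and".
    return ("or" if node.children[0].side == "L" else "and", parts)

def evaluate(formula, valuation):
    """Evaluate an extracted formula under a dict from atoms to booleans."""
    tag = formula[0]
    if tag == "const":
        return formula[1]
    if tag == "lit":
        return valuation[formula[1]] == formula[2]
    values = [evaluate(g, valuation) for g in formula[1]]
    return any(values) if tag == "or" else all(values)

# Closed tableau: left clauses {p, q} and {not p}, right clause {not q}.
root = Node(None, None, [
    Node(("p", True), "L", [Node(("p", False), "L")]),
    Node(("q", True), "L", [Node(("q", False), "R")]),
])
interpolant = ipol(root)

# Brute-force check of F |= I |= G over all valuations of p, q, r.
for p, q, r in product([False, True], repeat=3):
    value = evaluate(interpolant, {"p": p, "q": q, "r": r})
    assert not ((p or q) and not p) or value
    assert not value or (q or r)
```

The extracted formula is equivalent to q, which mentions only an atom shared by both inputs, with the polarity in which it occurs in both.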

5 First-Order Interpolant Extraction from Clausal Tableaux

Procedure 4 provides a method to compute Craig-Lyndon interpolants of ground formulas. We now generalize it to first-order sentences with arbitrary quantifications. The starting point is a ground interpolant obtained from a closed clausal ground tableau according to Lemma 4. The tableau is now for two clausal formulas that have been obtained from first-order sentences by Skolemization, conversion to clausal form and instantiation. By a postprocessing lifting operation, the ground interpolant is converted to an interpolant of the two original first-order input sentences. Terms with function symbols that do not occur in both of them are there replaced by variables and a suitable quantifier prefix upon these variables is prepended. The postprocessing is easy to implement: it effects at most a linear increase of the formula size, and its computational effort amounts to sorting the replaced terms according to their size. Similar lifting techniques have been shown for resolution-based methods in huang:95 and (baaz:11, Lemma 8.2.2). We discuss the relationship to these in Sect. LABEL:sec-lift-related.

Before we specify the first-order interpolation procedure and prove its correctness we note that to capture the semantics of Skolemization and to eliminate function symbols that occur only in one of the two interpolation inputs we use second-order quantification upon functions and predicates in intermediate formulas, that is, formulas used in the procedure specification and within the correctness proof. In particular, we apply the following properties: [Second-Order Skolemization] Let be a formula. Assume that are variables that do not occur bound in and that is an -ary function symbol that does not occur at all in . Then

[Inessential Quantifications in Entailments] Let be formulas and let be sets of predicate and function symbols such that . Then

Proposition 5 includes the special case of quantification upon nullary functions, that is, constants, which is actually first-order quantification upon them in the role of variables. On the right side of the equivalence stated by the proposition, where they occur free, they can be viewed as constants or as free variables. Notice that and in the preconditions take polarity into account. That is, if a predicate occurs in only with, say, positive polarity and in only with negative polarity, then, by Proposition 5 it holds that holds if and only if , although occurs in as well as in .
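As a small worked illustration of the two properties, consider the following instances. The symbols $p$, $q$, $f$, $c$ and $d$ are chosen here for illustration only and do not come from the text. The first line is a standard instance of second-order Skolemization with one universal variable. The second line is an instance of inessential quantification for constants, under the assumption that $c$ does not occur in the right formula and $d$ does not occur in the left formula.

```latex
\begin{align*}
\forall x\, \exists y\; p(x,y)
  &\;\equiv\; \exists f\, \forall x\; p(x, f(x)), \\
\exists c\; p(c) \;\models\; \forall d\, \bigl(q(d) \lor \exists x\, p(x)\bigr)
  &\quad\text{iff}\quad
  p(c) \;\models\; q(d) \lor \exists x\, p(x).
\end{align*}
```

In the second line both entailments hold, and the quantified constants $c$ and $d$ behave like first-order variables.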

We are now ready to specify the CTI method in full, which generalizes Procedure 4 by allowing first-order sentences with arbitrary quantifications as inputs: [The CTI Method for Craig-Lyndon Interpolation]

Input: First-order sentences and such that .

Method: Clausify and to obtain equivalent sentences and , respectively, where and are the introduced Skolem functions and and are clausal formulas whose variables are and , respectively. Assume w.l.o.g. that and are disjoint. Let be a fresh constant. Construct a closed two-sided clausal ground tableau for and in which all literal labels are instantiated with terms formed from and functions that occur in or in . Let  be the conjunction of the clauses of the tableau with side  and let  be the conjunction of the clauses of the tableau with side . Let  be , where is the root of the tableau. Define:

(Alternatively, it is also possible to place into instead of . Further possibilities are discussed in Sect. LABEL:sec-cli-related-choices below.) Let and be fresh sequences of variables and let be an injective substitution with domain such that

Construct as

Construct the quantifier prefix as follows: Let be the members of that occur in ordered such that for it holds that if is a strict subterm of , then and, for , let if and let if .

Output: Return

The output is a Craig-Lyndon interpolant of the input sentences.
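The syntactic lifting step of the procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure itself: ground terms are nested tuples such as `("f", ("a",))`, the sets `left_only` and `right_only` stand for the function symbols (including Skolem functions) that belong to only one of the two inputs, and terms headed by a left-only function are assumed to become existentially quantified while terms headed by a right-only function become universally quantified. The names `size`, `lift`, and the variable names `v1`, `v2` are hypothetical. Sorting by term size places every strict subterm before its superterms, as the ordering constraint on the prefix requires.

```python
def size(term):
    """Number of function-symbol occurrences in a ground term."""
    return 1 + sum(size(arg) for arg in term[1:])

def lift(formula, left_only, right_only):
    """Replace terms headed by one-sided functions and build a prefix."""
    var_of = {}  # replaced ground term -> variable name

    def term(t):
        # Outermost replacement: a one-sided head turns the whole term
        # into a variable, the same term always into the same variable.
        if t[0] in left_only or t[0] in right_only:
            return var_of.setdefault(t, f"v{len(var_of) + 1}")
        return (t[0],) + tuple(term(arg) for arg in t[1:])

    def walk(f):
        if f[0] == "const":                     # ("const", bool)
            return f
        if f[0] == "lit":                       # ("lit", pred, polarity, args)
            return f[:3] + (tuple(term(t) for t in f[3]),)
        return (f[0], [walk(g) for g in f[1]])  # ("and" | "or", [subformulas])

    matrix = walk(formula)
    # Smaller terms first, so a subterm's variable precedes its superterms'.
    prefix = [("exists" if t[0] in left_only else "forall", v)
              for t, v in sorted(var_of.items(), key=lambda tv: size(tv[0]))]
    return prefix, matrix

# Ground formula p(f(g(a))) and q(g(a)), with f left-only and g right-only.
a = ("a",)
ga = ("g", a)
fga = ("f", ga)
ground = ("and", [("lit", "p", True, (fga,)), ("lit", "q", True, (ga,))])
prefix, matrix = lift(ground, left_only={"f"}, right_only={"g"})
```

For this input the sketch yields the prefix `forall v2, exists v1` over the matrix `p(v1) and q(v2)`: the variable for `g(a)` precedes the variable for `f(g(a))` because the former term is a strict subterm of the latter.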

Procedure 5 indeed generalizes Procedure 4: For ground inputs both procedures proceed identically. Correctness of the procedure is stated with the following theorem, which will be proven in detail. The proof is followed by Example LABEL:examp-thm-lifting, which illustrates items mentioned in the proof for a pair of concrete input sentences. [Correctness of the CTI Method] If and are first-order sentences such that , then Procedure 5 applied to and outputs a Craig-Lyndon interpolant of and .


Let symbols have the denotation according to the procedure specification. In addition we will specify further clausal formulas, sets of variables, and substitutions, that relate to the items in the procedure specification and are overviewed in the following two graphs:


Figure 1: Clausal formulas and substitutions used to prove interpolant lifting.

Variables allowed in the respective formulas are shown there in parentheses. Formulas  and are ground. Sets of variables denoted by different symbols (including differences in the subscript) in the figure are disjoint. The superset symbol indicates that all clauses of the formula on the right are clauses of the formula on the left. Arrows () represent the instance-of relationship, where the formula at the arrow tip under the substitution shown as arrow label is the formula at the arrow origin. Substitutions that are injections are marked with an asterisk (). The shown substitutions have the following domains:

The following additional syntactic constraints are imposed on the involved formulas:

Members of do not occur in .
Members of do not occur in .

We proceed to show the construction of the involved items, starting from those mentioned in the procedure description. Sentences  and are given as input. The conversion to and can be obtained by the usual first-order normal form transformation. Skolemization can there be understood as equivalence-preserving rewriting with Proposition 5. It has to be applied here independently to and to , which is possible since these sentences do not share quantified variables. The required disjointness conditions on sets of variables and Skolem functions can be achieved easily by renaming bound variables. The sets of functions  can then be constructed from and . The following semantic relationships hold: