Language Models for Some Extensions of the Lambek Calculus

07/31/2020 ∙ by Max Kanovich, et al.

We investigate language interpretations of two extensions of the Lambek calculus: with additive conjunction and disjunction, and with additive conjunction and the unit constant. For extensions with additive connectives, we show that conjunction and disjunction behave differently. Adding both of them leads to incompleteness due to the distributivity law. We show that with conjunction only, no issues with distributivity arise. In contrast, there exists a corollary of the distributivity law in the language with disjunction only which is not derivable in the non-distributive system. Moreover, this difference remains valid for systems with permutation and/or weakening structural rules, that is, intuitionistic linear and affine logics and affine multiplicative-additive Lambek calculus. For the extension of the Lambek calculus with the unit constant, we present a calculus which reflects natural algebraic properties of the empty word. We do not claim completeness for this calculus, but we prove undecidability for the whole range of systems extending this minimal calculus and sound w.r.t. language models. As a corollary, we show that in the language with the unit there exists a sequent that is true if all variables are interpreted by regular languages, but not true in language models in general.

1 Introduction

The Lambek calculus was introduced by Joachim Lambek Lambek58 for mathematical modelling of natural language syntax. This suggests the natural interpretation of the Lambek calculus as the algebraic logic of operations on formal languages. Such interpretations of the Lambek calculus are called language models, or L-models for short.

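The Lambek calculus, as originally formulated by Lambek, includes three operations: $\cdot$ (product), $\backslash$ (left division), and $/$ (right division). A distinctive feature of the Lambek calculus is the so-called Lambek's non-emptiness restriction. In terms of L-models, this means that the empty word is disallowed, and we consider, for a given alphabet $\Sigma$, subsets of $\Sigma^+$. Lambek operations on languages are defined as follows:

$$A \cdot B = \{uv \mid u \in A,\ v \in B\},$$
$$A \backslash B = \{u \in \Sigma^+ \mid (\forall v \in A)\ vu \in B\},$$
$$B / A = \{u \in \Sigma^+ \mid (\forall v \in A)\ uv \in B\}.$$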

The division operations, $\backslash$ and $/$, are indeed residuals of the product w.r.t. the subset relation:

$$B \subseteq A \backslash C \iff A \cdot B \subseteq C \iff A \subseteq C / B.$$
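As a sanity check of these definitions, here is a minimal Python sketch (an illustration, not part of the paper) that computes the product and both divisions for finite languages over a two-letter alphabet and brute-forces the residuation equivalences. Since $\Sigma^+$ is infinite, the divisions are computed relative to a finite universe of candidate words; this bound, the alphabet, and all helper names (`words`, `prod`, `ldiv`, `rdiv`) are assumptions of the sketch. For languages $A, B$ inside that universe the equivalences are still checked exactly, because membership $u \in A \backslash C$ is tested literally as "$vu \in C$ for all $v \in A$".

```python
from itertools import combinations, product as cartesian

SIGMA = "ab"  # toy two-letter alphabet (assumption of this sketch)

def words(max_len, min_len=1):
    """All words over SIGMA whose length lies in [min_len, max_len]."""
    return ["".join(t) for n in range(min_len, max_len + 1)
            for t in cartesian(SIGMA, repeat=n)]

def prod(A, B):
    """Product A . B = {uv | u in A, v in B}."""
    return {u + v for u in A for v in B}

def ldiv(A, C, universe):
    """Left division A \\ C, restricted to a finite universe of candidate words."""
    return {u for u in universe if all(v + u in C for v in A)}

def rdiv(C, B, universe):
    """Right division C / B, restricted to a finite universe of candidate words."""
    return {u for u in universe if all(u + v in C for v in B)}

def small_subsets(xs, max_size):
    """All subsets of xs with at most max_size elements."""
    for k in range(max_size + 1):
        for combo in combinations(xs, k):
            yield set(combo)

def residuation_holds(C, max_len=2, max_size=2):
    """Brute-force check of B <= A\\C  iff  A.B <= C  iff  A <= C/B for small A, B."""
    U = words(max_len)
    for A in small_subsets(U, max_size):
        for B in small_subsets(U, max_size):
            lhs = B <= ldiv(A, C, U)
            middle = prod(A, B) <= C
            rhs = A <= rdiv(C, B, U)
            if not (lhs == middle == rhs):
                return False
    return True

print(residuation_holds({"ab", "aab", "abb", "ba"}))
```

Any target language `C` should make `residuation_holds` return `True`; a `False` would point to a bug in the division helpers rather than in the theory.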

These equivalences form the core of the Lambek calculus. Along with transitivity (if $A \subseteq B$ and $B \subseteq C$, then $A \subseteq C$), reflexivity ($A \subseteq A$), and associativity ($(A \cdot B) \cdot C = A \cdot (B \cdot C)$), they form a complete axiomatization of all generally true atomic statements about Lambek operations on formal languages. This axiomatization is the Lambek calculus in its non-sequential form.

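The sequential formulation of the Lambek calculus Lambek58 is as follows. Formulae are constructed from variables ($p, q, r, \ldots$) using three connectives: $\cdot$, $\backslash$, $/$. (We use capital Latin letters both for languages and for Lambek formulae.) Sequents are expressions of the form $\Pi \to B$, where the antecedent $\Pi$ is a sequence of formulae and the succedent $B$ is one formula (intuitionistic style). The calculus includes axioms of the form $A \to A$ and the following rules of inference (capital Greek letters denote sequences of formulae):

$$\frac{A, \Pi \to B}{\Pi \to A \backslash B}\ (\to\backslash),\ \Pi \text{ non-empty} \qquad \frac{\Pi \to A \quad \Gamma, B, \Delta \to C}{\Gamma, \Pi, A \backslash B, \Delta \to C}\ (\backslash\to)$$
$$\frac{\Pi, A \to B}{\Pi \to B / A}\ (\to/),\ \Pi \text{ non-empty} \qquad \frac{\Pi \to A \quad \Gamma, B, \Delta \to C}{\Gamma, B / A, \Pi, \Delta \to C}\ (/\to)$$
$$\frac{\Gamma, A, B, \Delta \to C}{\Gamma, A \cdot B, \Delta \to C}\ (\cdot\to) \qquad \frac{\Gamma \to A \quad \Delta \to B}{\Gamma, \Delta \to A \cdot B}\ (\to\cdot)$$
$$\frac{\Pi \to A \quad \Gamma, A, \Delta \to C}{\Gamma, \Pi, \Delta \to C}\ (\mathrm{cut})$$

The cut rule is eliminable Lambek58.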

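An L-model, formally, is a mapping $w$ of Lambek formulae to subsets of $\Sigma^+$ (languages without the empty word) which commutes with Lambek operations: $w(A \cdot B) = w(A) \cdot w(B)$, $w(A \backslash B) = w(A) \backslash w(B)$, and $w(B / A) = w(B) / w(A)$. A sequent $A_1, \ldots, A_n \to B$ is true in this model if $w(A_1) \cdot \ldots \cdot w(A_n) \subseteq w(B)$.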

According to Lambek’s non-emptiness restriction, all sequents in derivations are required to have non-empty antecedents. This constraint is motivated by linguistic applications: without it, Lambek categorial grammars generate ungrammatical sentences (MootRetore, § 2.5).

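Abolishing Lambek’s restriction, that is, removing the constraints “$\Pi$ is non-empty” on $(\to\backslash)$ and $(\to/)$, yields the Lambek calculus allowing empty antecedents, denoted by $\mathbf{L}^*$ Lambek61. Language models are easily adapted for the case of $\mathbf{L}^*$: now we consider languages which are subsets of $\Sigma^*$ (that is, they are allowed to include the empty word $\varepsilon$). The definition of division operations is also modified: for models of $\mathbf{L}^*$,

$$A \backslash B = \{u \in \Sigma^* \mid (\forall v \in A)\ vu \in B\}, \qquad B / A = \{u \in \Sigma^* \mid (\forall v \in A)\ uv \in B\}.$$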

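This modification can alter the values of $A \backslash B$ and $B / A$ even if $A$ and $B$ do not contain the empty word. For example, now $A \backslash A$ always includes $\varepsilon$, and therefore $(A \backslash A) \backslash B$ is always a subset of $B$. Hence, $\mathbf{L}^*$ is not a conservative extension of $\mathbf{L}$: the sequent $(p \backslash p) \backslash q \to q$ has a non-empty antecedent, but is derivable only in $\mathbf{L}^*$, not in $\mathbf{L}$. For these modified L-models, let us use the term L$^*$-models.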

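In an L$^*$-model $w$, a sequent of the form $A_1, \ldots, A_n \to B$ with $n \ge 1$ is true if $w(A_1) \cdot \ldots \cdot w(A_n) \subseteq w(B)$, and a sequent of the form $\to B$, with an empty antecedent, is true if $\varepsilon \in w(B)$.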
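The effect of admitting the empty word can be checked mechanically. The Python sketch below is only an illustration: the alphabet, the length bound of the finite universe, and the helper names (`ldiv_star`, `is_true`) are assumptions. It computes L$^*$-style left division over all words of length at most 3 (including the empty word), confirms that $\varepsilon$ lies in $A \backslash A$ and that $(A \backslash A) \backslash B \subseteq B$ for sample languages, and evaluates the truth of a sequent, including the empty-antecedent case.

```python
from itertools import product as cartesian

SIGMA = "ab"  # toy alphabet (assumption of this sketch)

def words(max_len):
    """All words over SIGMA of length <= max_len, including the empty word ""."""
    return ["".join(t) for n in range(max_len + 1)
            for t in cartesian(SIGMA, repeat=n)]

def prod(A, B):
    return {u + v for u in A for v in B}

def ldiv_star(A, B, universe):
    """L*-style left division A \\ B over a finite universe; "" is a candidate too."""
    return {u for u in universe if all(v + u in B for v in A)}

def is_true(antecedent, succedent):
    """Truth of A_1, ..., A_n -> B, given the languages w(A_i) and w(B)."""
    if not antecedent:          # empty antecedent: true iff the empty word is in w(B)
        return "" in succedent
    acc = {""}                  # {""} is the neutral element of the product
    for lang in antecedent:
        acc = prod(acc, lang)   # builds w(A_1) . ... . w(A_n)
    return acc <= succedent

U = words(3)
A = {"a", "ab"}
B = {"b", "ab"}
AA = ldiv_star(A, A, U)
print("" in AA)                   # the empty word lies in A \ A
print(is_true([], AA))            # so  -> p \ p  is true when w(p) = A
print(ldiv_star(AA, B, U) <= B)   # and (A \ A) \ B is contained in B
```

The last check is exact rather than approximate: since $\varepsilon \in A \backslash A$, every candidate $u$ in $(A \backslash A) \backslash B$ must satisfy $\varepsilon u = u \in B$.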

Completeness theorems for $\mathbf{L}$ and $\mathbf{L}^*$ w.r.t. the corresponding versions of L-models were proved by Pentus PentusAPAL; PentusFmonov. Pentus’ proofs are highly non-trivial. If one considers the fragment without $\cdot$ (the product-free fragment), however, proving L-completeness becomes much easier. This was done by Buszkowski Buszko1982; Buszkowski’s proof applies both to $\mathbf{L}$ and $\mathbf{L}^*$, w.r.t. L-models and L$^*$-models, respectively.

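Besides product and two divisions, natural operations on formal languages include set-theoretic intersection and union. These operations correspond to the so-called additive conjunction, $\wedge$, and disjunction, $\vee$. Additive operations are usually axiomatized by the following inference rules (cf. KanazawaJoLLI), where $i \in \{1, 2\}$:

$$\frac{\Gamma, A_i, \Delta \to C}{\Gamma, A_1 \wedge A_2, \Delta \to C}\ (\wedge\to) \qquad \frac{\Pi \to A \quad \Pi \to B}{\Pi \to A \wedge B}\ (\to\wedge)$$
$$\frac{\Gamma, A, \Delta \to C \quad \Gamma, B, \Delta \to C}{\Gamma, A \vee B, \Delta \to C}\ (\vee\to) \qquad \frac{\Pi \to A_i}{\Pi \to A_1 \vee A_2}\ (\to\vee)$$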

The Lambek calculus extended with these rules is denoted by $\mathbf{MALC}$ (multiplicative-additive Lambek calculus); $\mathbf{MALC}^*$ is the variant of $\mathbf{MALC}$ without Lambek’s restriction (that is, allowing empty antecedents). L-completeness, however, fails for $\mathbf{MALC}$ in general. We discuss this issue in detail in Section 2.

Following Abrusci Abrusci, we put the Lambek calculus into the broader context of linear logic. Namely, $\mathbf{MALC}^*$ can be viewed as a fragment of intuitionistic non-commutative linear logic. (This fragment includes multiplicative and additive operations, but lacks the exponential and constants.) We also consider commutative systems: intuitionistic linear logic and intuitionistic affine logic.

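These commutative calculi can be obtained from $\mathbf{MALC}^*$ by adding structural rules: permutation for intuitionistic linear logic, and both permutation and weakening for intuitionistic affine logic. In the language of $\mathbf{MALC}^*$, the rules of permutation and weakening are formulated as follows:

$$\frac{\Gamma, A, B, \Delta \to C}{\Gamma, B, A, \Delta \to C}\ (\mathrm{perm}) \qquad \frac{\Gamma, \Delta \to C}{\Gamma, A, \Delta \to C}\ (\mathrm{weak})$$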

Adding only weakening yields non-commutative intuitionistic affine logic, or affine (monotone) multiplicative-additive Lambek calculus. We denote this system by (in the presence of extra structural rules, we do not impose Lambek’s restriction).

We shall also use alternative calculi for the two commutative systems, in which structural rules are hidden in axioms and in the format of sequents. First, we change the language of formulae, introducing one connective, $\multimap$, instead of $\backslash$ and $/$ (these are equivalent in the commutative systems). We also write $\otimes$ instead of $\cdot$, following Girard’s linear logic notation Girard.

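Sequents are now going to be expressions of the form $\Gamma \to B$, where $\Gamma$ is a multiset of formulae. Further, $\Gamma, A$ means $\Gamma \uplus \{A\}$, and $\Gamma, \Delta$ means $\Gamma \uplus \Delta$, where $\uplus$ is multiset union.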

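Axioms are of the form $p \to p$, for each variable $p$, in the case of intuitionistic linear logic, and of the form $\Gamma, p \to p$ for intuitionistic affine logic. Inference rules for both systems are as follows (with $i \in \{1, 2\}$):

$$\frac{\Gamma, A \to B}{\Gamma \to A \multimap B}\ (\to\multimap) \qquad \frac{\Gamma \to A \quad \Delta, B \to C}{\Gamma, \Delta, A \multimap B \to C}\ (\multimap\to)$$
$$\frac{\Gamma, A, B \to C}{\Gamma, A \otimes B \to C}\ (\otimes\to) \qquad \frac{\Gamma \to A \quad \Delta \to B}{\Gamma, \Delta \to A \otimes B}\ (\to\otimes)$$
$$\frac{\Gamma, A_i \to C}{\Gamma, A_1 \wedge A_2 \to C}\ (\wedge\to) \qquad \frac{\Gamma \to A \quad \Gamma \to B}{\Gamma \to A \wedge B}\ (\to\wedge)$$
$$\frac{\Gamma, A \to C \quad \Gamma, B \to C}{\Gamma, A \vee B \to C}\ (\vee\to) \qquad \frac{\Gamma \to A_i}{\Gamma \to A_1 \vee A_2}\ (\to\vee)$$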

For intuitionistic affine logic, the weakening rule is not officially included in the system, but is admissible:

$$\frac{\Gamma \to C}{\Gamma, A \to C}$$

(it is hidden in axioms).

The cut rule of the following form is admissible in both commutative systems:

$$\frac{\Gamma \to A \quad \Delta, A \to C}{\Gamma, \Delta \to C}\ (\mathrm{cut})$$

This is shown by a standard inductive argument.

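Finally, let us introduce the multiplicative unit constant, $\mathbf{1}$. The unit constant is added to systems without Lambek’s restriction that extend $\mathbf{L}^*$ (i.e., $\mathbf{L}^*$ itself, $\mathbf{MALC}^*$, its affine variant, and the two commutative systems). The Lambek calculus with the unit Lambek69 is obtained from $\mathbf{L}^*$ by adding one axiom, $\to \mathbf{1}$ (its antecedent is empty), and one inference rule,

$$\frac{\Gamma, \Delta \to C}{\Gamma, \mathbf{1}, \Delta \to C}\ (\mathbf{1}\to)$$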

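L-completeness, however, does not hold for the Lambek calculus with the unit. Indeed, since $\mathbf{1}$ should be the unit w.r.t. $\cdot$, that is, $\mathbf{1} \cdot A = A \cdot \mathbf{1} = A$ for any $A$, in L-models it should be interpreted as $\{\varepsilon\}$. The following sequent is a counter-example for L-completeness: . This sequent is true in all models for any interpretation of , but is not derivable in the Lambek calculus with the unit.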

Throughout this paper, we shall frequently consider fragments of the calculi defined above in languages with restricted sets of connectives. Such a fragment will be denoted by the name of the calculus, followed by the list of connectives in parentheses: e.g., $\mathbf{MALC}(\backslash, /, \wedge)$.

2 Distributivity Law in Fragments with One Additive

It is well known that $\mathbf{MALC}$ is incomplete w.r.t. L-models. The reason is the distributivity principle,

$$(A \vee C) \wedge (B \vee C) \to (A \wedge B) \vee C.$$

On the one hand, this principle is obviously true in all L-models. On the other hand, as noticed by Ono and Komori OnoKomori, one needs the structural rules of contraction and weakening to derive it. In particular, the distributivity principle is not derivable in $\mathbf{MALC}$, $\mathbf{MALC}^*$, the affine variant of $\mathbf{MALC}^*$, intuitionistic linear logic, and intuitionistic affine logic.
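The semantic half of this remark is a finite, mechanical check once $\wedge$ and $\vee$ are read as intersection and union. The short Python sketch below (illustrative only; the three-word universe and the helper names are assumptions) exhaustively verifies one standard form of distributivity, $(A \vee C) \wedge (B \vee C) \to (A \wedge B) \vee C$, for all triples of subsets of a small universe of words. It says nothing about derivability, which is exactly where the calculi without contraction and weakening fall short.

```python
from itertools import chain, combinations, product

def all_subsets(universe):
    """Every subset of a small finite universe of words."""
    return [set(c) for c in chain.from_iterable(
        combinations(universe, k) for k in range(len(universe) + 1))]

def distributive(A, B, C):
    """(A or C) and (B or C) -> (A and B) or C, with and/or read as intersection/union."""
    return ((A | C) & (B | C)) <= ((A & B) | C)

subs = all_subsets(["a", "b", "ab"])
print(all(distributive(A, B, C) for A, B, C in product(subs, repeat=3)))
```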

The distributivity principle, as formulated above, includes both additive connectives, $\wedge$ and $\vee$. We investigate fragments of $\mathbf{MALC}$ with only one additive, $\wedge$ or $\vee$. The result of our study is that, with respect to distributivity, $\wedge$ and $\vee$ behave in opposite ways.

Let denote with the distributivity principle added as an extra axiom scheme. In the presence of this axiom scheme, we have to keep cut as an official rule of the system (it is now not eliminable). A hypersequential system for was developed by Kozak Kozak .

Let us restrict ourselves to the product-free language (with product, proving L-completeness is hard even without extra connectives PentusAPAL; PentusFmonov). We also consider calculi without the unit constant: issues connected with $\mathbf{1}$ are discussed in Section 3. Thus, we consider two fragments of the multiplicative-additive Lambek calculus, $\mathbf{MALC}(\backslash, /, \wedge)$ and $\mathbf{MALC}(\backslash, /, \vee)$, and the corresponding fragments of bigger systems, up to intuitionistic affine logic. (For commutative calculi, we have only one implication, that is, we consider fragments in the languages of $\multimap, \wedge$ and $\multimap, \vee$.)

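As shown by Buszkowski Buszko1982, $\mathbf{MALC}(\backslash, /, \wedge)$ is complete w.r.t. L-models. This yields the following corollary: $\mathbf{MALC}(\backslash, /, \wedge)$ is a conservative fragment of both $\mathbf{MALC}$ and its extension with the distributivity principle. Indeed, any sequent provable in the distributive extension is true in all L-models; if it is in the language of $\backslash, /, \wedge$, it is derivable in $\mathbf{MALC}(\backslash, /, \wedge)$ by L-completeness. In other words, the distributivity principle has no non-trivial corollaries in the language of $\backslash, /, \wedge$.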

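The situation with $\vee$ is the opposite. Namely, we present a corollary of the distributivity principle in the language of $\backslash, /, \vee$ which is not provable in $\mathbf{MALC}$. Thus, $\mathbf{MALC}(\backslash, /, \vee)$ is not a conservative fragment of $\mathbf{MALC}$ with distributivity, and is therefore incomplete w.r.t. L-models. Moreover, we show that this effect is of a more general nature. Namely, the same holds for the corresponding fragments of $\mathbf{MALC}^*$, its affine variant, intuitionistic linear logic, and intuitionistic affine logic: distributivity has no new corollaries in the language with $\wedge$, but has such corollaries in the language with $\vee$.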

2.1 Completeness with Additive Conjunction Only

For the first series of results, concerning $\wedge$, we give a semantic proof. For each system, we consider a specific version of L-semantics. For $\mathbf{MALC}(\backslash, /, \wedge)$ and $\mathbf{MALC}^*(\backslash, /, \wedge)$, these are L-models and L$^*$-models, respectively. For the other systems, let us first give some definitions and prove correctness statements for them.

Definition 1.

A language $M$ is called monotone if, for any word $u_1 u_2 \in M$ and an arbitrary word $v$, the word $u_1 v u_2$ also belongs to $M$.

Proposition 1.

If $A$ and $B$ are both monotone, then so are $A \backslash B$, $B / A$, and $A \cap B$.

Proof.

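Let $u = u_1 u_2 \in A \backslash B$. Then for any $v \in A$ we have $v u_1 u_2 \in B$. Now take $u_1 x u_2$ for an arbitrary word $x$. By monotonicity of $B$, the word $v u_1 x u_2$ is also in $B$. Since this holds for any $v \in A$, we get $u_1 x u_2 \in A \backslash B$. The reasoning for $B / A$ is symmetric. The case of $A \cap B$ is trivial. ∎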
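Proposition 1 can be tested on concrete monotone languages. The Python sketch below assumes a finite representation that is not taken from the paper: a monotone language is given by finitely many generator words, and a word belongs to it iff some generator is a scattered subword of it (by Higman's lemma every monotone language has a finite set of minimal words, so nothing is lost). Membership in $A \backslash B$ then only needs the generators of $A$, because any $v \in A$ contains a generator $g$, and $vu$ then contains $gu$. The class and helper names are hypothetical.

```python
from itertools import product as cartesian

SIGMA = "ab"  # toy alphabet (assumption of this sketch)

def is_subsequence(g, w):
    """True if g can be obtained from w by deleting letters."""
    it = iter(w)
    return all(ch in it for ch in g)

class Monotone:
    """Monotone language given by generator words: w is in it iff some
    generator is a scattered subword of w."""
    def __init__(self, gens):
        self.gens = set(gens)

    def __contains__(self, w):
        return any(is_subsequence(g, w) for g in self.gens)

def in_ldiv(u, A, B):
    """u in A \\ B; for monotone B it suffices to test the generators of A."""
    return all(g + u in B for g in A.gens)

def insertions(u):
    """All words obtained from u by inserting one letter."""
    return {u[:i] + c + u[i:] for i in range(len(u) + 1) for c in SIGMA}

def ldiv_is_monotone_up_to(A, B, max_len):
    """Check Proposition 1 for A \\ B on all words of length <= max_len."""
    for n in range(max_len + 1):
        for t in cartesian(SIGMA, repeat=n):
            u = "".join(t)
            if in_ldiv(u, A, B) and not all(in_ldiv(x, A, B) for x in insertions(u)):
                return False
    return True

A = Monotone({"a"})
B = Monotone({"ab", "ba"})
print(in_ldiv("b", A, B), in_ldiv("aa", A, B))  # here A \ B = words containing b
print(ldiv_is_monotone_up_to(A, B, 5))
```

Single-letter insertions suffice in the check, since inserting an arbitrary word is a sequence of single-letter insertions.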

Definition 2.

A language $M$ is called commutative if, for any word $a_1 \ldots a_n$ belonging to $M$ and an arbitrary transposition $\sigma$ on $\{1, \ldots, n\}$, the word $a_{\sigma(1)} \ldots a_{\sigma(n)}$ also belongs to $M$.

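Commutative languages are in one-to-one correspondence with sets of multisets of letters from $\Sigma$. Thus, we can define the operation of multiset union, $\uplus$, for two commutative languages $A$ and $B$, which can be expressed as follows:

$$A \uplus B = \{a_{\sigma(1)} \ldots a_{\sigma(n)} \mid a_1 \ldots a_n \in A \cdot B,\ \sigma \text{ a permutation of } \{1, \ldots, n\}\}.$$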

If $C$ is a commutative language, then $A \cdot B \subseteq C$ if and only if $B \cdot A \subseteq C$. Therefore, for commutative $A$ and $B$, we have $A \backslash B = B / A$; we denote this language by $A \multimap B$.

Proposition 2.

If $A$ and $B$ are commutative, then so are $A \cap B$ and $A \multimap B$.

Proof.

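Commutativity of $A \cap B$ is obvious. For $A \multimap B$, take any $u = a_1 \ldots a_n \in A \multimap B$ and let $u' = a_{\sigma(1)} \ldots a_{\sigma(n)}$, where $\sigma$ is a transposition. Now take an arbitrary $v = b_1 \ldots b_m \in A$. By definition of $\multimap$, we have $vu \in B$. Now, by commutativity of $B$, the word $vu'$ also belongs to $B$. Indeed, it is obtained from $vu$ by the following transposition:

$$\tau(i) = i \ \text{ for } 1 \le i \le m, \qquad \tau(m + j) = m + \sigma(j) \ \text{ for } 1 \le j \le n.$$

Since $v \in A$ was taken arbitrarily, we conclude that $u' \in A \multimap B$. ∎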
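A small computation illustrates multiset union and the identity $A \backslash B = B / A$ for commutative $B$. In the Python sketch below (an illustration under stated assumptions: finite languages, a length-3 universe for the divisions, hypothetical helper names), a commutative language is produced by closing a finite set of words under permutations of letters. For such $B$ the two divisions agree exactly, because $uv$ is a permutation of $vu$; the sketch also confirms that the resulting $A \multimap B$ is again commutative, as Proposition 2 states.

```python
from itertools import permutations, product as cartesian

SIGMA = "ab"  # toy alphabet (assumption of this sketch)

def words(max_len):
    return ["".join(t) for n in range(max_len + 1) for t in cartesian(SIGMA, repeat=n)]

def perm_closure(lang):
    """Smallest commutative language containing lang."""
    return {"".join(p) for w in lang for p in permutations(w)}

def prod(A, B):
    return {u + v for u in A for v in B}

def multiset_union(A, B):
    """A (+) B for commutative A, B: all rearrangements of uv with u in A, v in B."""
    return perm_closure(prod(A, B))

def ldiv(A, B, universe):
    return {u for u in universe if all(v + u in B for v in A)}

def rdiv(B, A, universe):
    return {u for u in universe if all(u + v in B for v in A)}

def is_commutative(lang):
    return perm_closure(lang) == lang

U = words(3)
A = perm_closure({"ab"})
B = perm_closure({"aab", "abb"})
L = ldiv(A, B, U)
print(sorted(L), L == rdiv(B, A, U), is_commutative(L))
print(sorted(multiset_union({"a"}, {"b"})))
```

By contrast, the plain product of commutative languages need not be commutative: `prod({"a"}, {"b"})` is `{"ab"}`.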

Having the class of monotone languages and the class of commutative languages closed under our operations ($\backslash$, $/$, $\cap$), we can define the classes of restricted L-models for all our systems.

Definition 3.

An L$^*$-model is monotone if all languages in it are monotone. Truth of sequents is defined as in ordinary L$^*$-models.

Definition 4.

A commutative L$^*$-model is an L$^*$-model where all languages are commutative.

In commutative models, $\uplus$ actually plays the role of product (while we do not have product as a connective, we still have the sequential comma, which is a hidden product), due to the following fact.

Proposition 3.

In a commutative L$^*$-model $w$, a sequent $A_1, \ldots, A_n \to B$ is true if and only if $w(A_1) \uplus \ldots \uplus w(A_n) \subseteq w(B)$.

Proof.

The “if” part is due to the fact that $w(A_1) \cdot \ldots \cdot w(A_n) \subseteq w(A_1) \uplus \ldots \uplus w(A_n)$. The “only if” part holds since $w(B)$ is closed under transpositions. ∎
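Proposition 3 can be confirmed by brute force on a small example. The Python sketch below is illustrative only: the sample antecedent languages, the list of permutation classes, and the helper names are assumptions. It compares the ordinary product with the multiset union of two commutative languages: the product is strictly smaller here, yet for every commutative target language (built as a union of permutation classes) both are contained in it, or neither is.

```python
from itertools import combinations, permutations

def perm_closure(lang):
    """Smallest commutative language containing lang."""
    return {"".join(p) for w in lang for p in permutations(w)}

def prod(A, B):
    return {u + v for u in A for v in B}

def multiset_union(A, B):
    return perm_closure(prod(A, B))

def fold(op, langs):
    """Combine w(A_1), ..., w(A_n) with op, starting from {""}."""
    acc = {""}
    for lang in langs:
        acc = op(acc, lang)
    return acc

def commutative_languages(classes):
    """All unions of the given permutation classes; each union is commutative."""
    for k in range(len(classes) + 1):
        for combo in combinations(classes, k):
            yield set().union(*combo)

antecedent = [perm_closure({"ab"}), {"a"}]
classes = [perm_closure({w}) for w in ["aab", "abb", "ab", "a"]]
dot = fold(prod, antecedent)               # w(A_1) . w(A_2)
uplus = fold(multiset_union, antecedent)   # w(A_1) (+) w(A_2)
print(dot < uplus)
print(all((dot <= C) == (uplus <= C) for C in commutative_languages(classes)))
```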

Now we prove an extension of Buszkowski’s completeness result.

Theorem 4.

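Each of the five systems listed below is sound and complete w.r.t. the corresponding class of models, according to the following table:

$$\begin{array}{ll}
\text{Calculus} & \text{Models} \\ \hline
\mathbf{MALC}(\backslash, /, \wedge) & \text{L-models} \\
\mathbf{MALC}^*(\backslash, /, \wedge) & \text{L}^*\text{-models} \\
\mathbf{MALC}^*(\backslash, /, \wedge) + \text{weakening} & \text{monotone L}^*\text{-models} \\
\text{intuitionistic linear logic in the language } \multimap, \wedge & \text{commutative L}^*\text{-models} \\
\text{intuitionistic affine logic in the language } \multimap, \wedge & \text{L}^*\text{-models which are both monotone and commutative}
\end{array}$$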
Proof.

The cases of $\mathbf{MALC}(\backslash, /, \wedge)$ and $\mathbf{MALC}^*(\backslash, /, \wedge)$ are due to Buszkowski Buszko1982. Let us consider the remaining three systems.

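The soundness part is easy: our conditions on models were specifically designed to reflect structural rules. In a monotone model, if $w(\Gamma) \cdot w(\Delta) \subseteq w(C)$, then also $w(\Gamma) \cdot w(A) \cdot w(\Delta) \subseteq w(C)$ (here $w(\Gamma)$ denotes the product of the values of the formulae in $\Gamma$, and $\{\varepsilon\}$ if $\Gamma$ is empty); thus, the weakening rule is valid. If we have a commutative L$^*$-model, then the permutation rule is valid. This is obvious from Proposition 3: unlike $\cdot$, the operation $\uplus$ is commutative. All other rules and axioms are valid in arbitrary L-models.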

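Completeness is proved by Buszkowski’s canonical model argument. We do it uniformly for all systems. In the canonical model, the alphabet is the set of all formulae of the given calculus, and for any formula $A$ let

$$w(A) = \{B_1 \ldots B_n \mid B_1, \ldots, B_n \to A \text{ is derivable}\}$$

(with $n \ge 1$ under Lambek’s restriction, and $n \ge 0$ otherwise).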

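First we show that $w$ is indeed an L-model:

$$w(A \backslash B) = w(A) \backslash w(B), \qquad w(B / A) = w(B) / w(A), \qquad w(A \wedge B) = w(A) \cap w(B)$$

(in the commutative case, $w(A \multimap B) = w(A) \multimap w(B)$).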

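This is performed exactly as in Buszkowski’s proof. Indeed, if $\Pi \in w(A \backslash B)$, then for an arbitrary $\Delta \in w(A)$ the sequents $\Delta \to A$ and $\Pi \to A \backslash B$ are derivable. Applying cut with $A, A \backslash B \to B$, we get $\Delta, \Pi \to B$ derivable in our system. Thus, $\Delta \Pi \in w(B)$, therefore $\Pi \in w(A) \backslash w(B)$. Notice that cut is available in all systems we consider. Dually, if $\Pi \in w(A) \backslash w(B)$, then, since $A \in w(A)$ by the axiom, $A \Pi \in w(B)$. This means that $A, \Pi \to B$ is derivable, and by $(\to\backslash)$ so is $\Pi \to A \backslash B$, thus $\Pi \in w(A \backslash B)$. Hence, $w(A \backslash B) = w(A) \backslash w(B)$.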

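The case of $B / A$ is symmetric. For $A \wedge B$, we use the equivalence: $\Pi \to A \wedge B$ is derivable if and only if $\Pi \to A$ and $\Pi \to B$ are derivable. Here the “if” part is an application of $(\to\wedge)$, and the “only if” part is by cut with $A \wedge B \to A$ and $A \wedge B \to B$.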

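Next, it is easy to see that the canonical model belongs to the corresponding class of models: monotone for the affine variant of $\mathbf{MALC}^*$, commutative for intuitionistic linear logic, and both commutative and monotone for intuitionistic affine logic.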

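Finally, suppose a sequent $\Pi \to B$ is not derivable. Consider two cases. If $\Pi = A_1, \ldots, A_n$ is non-empty, then, since each $A_i$ belongs to $w(A_i)$, we have $A_1 \ldots A_n \in w(A_1) \cdot \ldots \cdot w(A_n)$. On the other hand, $A_1 \ldots A_n \notin w(B)$. This falsifies $\Pi \to B$ under interpretation $w$. If $\Pi$ is empty, then we have $\varepsilon \notin w(B)$, which again falsifies $\to B$. This finishes the completeness proof. ∎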

It is easy to see that soundness actually extends to the language with $\vee$ (interpreted as set-theoretic union). Unions of monotone languages are also monotone, and the same holds for commutative languages. The situation with product is a bit more complicated for commutative systems, since $A \cdot B$ is usually not commutative, even for commutative $A$ and $B$. Thus, we have to alter the definition of language models in the commutative case, requiring $w(A \otimes B) = w(A) \uplus w(B)$ instead of $w(A \otimes B) = w(A) \cdot w(B)$. Under this modification, soundness holds for product also. Finally, notice that in all models we consider, $\wedge$ and $\vee$ are interpreted set-theoretically and thus obey the distributivity law. These considerations yield the following soundness result:

Proposition 5.

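Each of the five systems of Theorem 4, extended to the full language (with $\vee$ and product) and with the distributivity principle added, is sound w.r.t. the corresponding class of models, according to the table in Theorem 4; for the commutative systems, in the models we use $\uplus$ to interpret $\otimes$.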

Now we are ready to state and prove our conservativity result.

Theorem 6.

The five systems of Theorem 4, in the restricted language without $\vee$ and product, are conservative fragments of the corresponding systems with the distributivity principle added.

Proof.

Let $\Pi \to B$ be a sequent in the language of $\backslash, /, \wedge$ (in the commutative case, $\multimap, \wedge$). Suppose it is derivable in one of the distributive systems. Then by Proposition 5 it is true in all models of the corresponding class. By Theorem 4, it is derivable in the corresponding non-distributive system. ∎

2.2 Incompleteness with Additive Disjunction Only

If we take $\vee$ instead of $\wedge$, however, no analogue of the conservativity result of Theorem 6 is possible, due to the following counter-example.

Theorem 7.

The sequent

is derivable in but this sequent is not derivable in .

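This sequent is in the language of $\backslash, /, \vee$. The theorem states that it is derivable in $\mathbf{MALC}$ with the distributivity principle, and therefore in all its extensions up to intuitionistic affine logic with distributivity, but not in the corresponding ($\backslash, /, \vee$) fragments without the distributivity law added. Thus, this is a non-trivial corollary of distributivity in the language without $\wedge$. In particular, Theorem 7 implies that $\mathbf{MALC}(\backslash, /, \vee)$ is incomplete w.r.t. L-models, and that the corresponding fragments of $\mathbf{MALC}^*$, its affine variant, intuitionistic linear logic, and intuitionistic affine logic are incomplete w.r.t. the corresponding modifications of L-models (compare with Theorem 4).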

Before proving Theorem 7, let us make some remarks. First, let us notice that the sequent in this theorem is slightly different from the one in our WoLLIC 2019 paper KanKuzSceWoLLICLMod , where one variable is used for and . The reason is that the old example happens to be derivable in (but still not in and weaker systems).

Second, the hard part of Theorem 7 is, of course, the second one (non-derivability). Fortunately, the derivability problem in is algorithmically decidable (belongs to PSPACE), thus, it is possible to establish non-derivability by exhaustive proof search. This proof search was first performed, as a pre-verification of the result, automatically using an affine modification of llprover by Tamura Tamura . (For the WoLLIC 2019 paper, we used a prover by Jipsen Jipsenprover , based on the algorithm by Okada and Terui OkadaTerui .) In order to make this article self-contained and independent from proof-search software, here we present a complete manual proof search.

One of the WoLLIC 2019 reviewers suggested a shorter method of proving non-derivability of the given sequent in , via an algebraic counter-model. This counter-model is a commutative residuated lattice on the set . The order is defined as follows: ; are incomparable. Product and residual are defined as follows:

(In the commutative situation, we have only one residual, which we denote by .) Variables are interpreted as follows: as , as , and both as . This algebraic model falsifies the sequent in Theorem 7. However, is insufficient for our new purposes. The reason is that in this model , while in the presence of weakening should be true. Thus, in order to establish non-derivability of our sequent not only in , but also in , we use the good old syntactic method.

Proof of Theorem 7.

The first statement is proved using the joining (diamond) construction, the idea of which goes back to Lambek Lambek58 and Pentus PentusJoLLI . Indeed, let and . Then is equivalent to . One can easily check derivability of and in . Notice that the antecedent of this sequent is exactly the one in the sequent of our theorem. Next, we derive , and further by distributivity .

The second statement is proved by an exhaustive proof search for the sequent

(the translation of our sequent into the commutative language) in .

In order to facilitate proof search, we take into account the following considerations.

First, the rules $(\to\multimap)$ and $(\vee\to)$ are invertible. Thus, we can suppose they are applied immediately. Moreover, $(\vee\to)$ has two premises, and when disproving derivability we have the right to choose one of them and establish non-derivability there.

Second, we can suppose that in our (hypothetical) derivation instances of of the form are directly preceded by axioms. Indeed, such instances are interchangeable upwards with and , and cannot appear before this , since is a variable. Other rules are impossible by the polarized subformula property.

Third, we establish non-derivability of several sequents, which will appear frequently in our proof search:

(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)

Now we are ready to start proof search. First we invert introducing and choose :

Now we have a choice of 4 principal connectives (denoted by numbers in the sequent) to be decomposed first.

Case 1. In this case, we use , thanks to our consideration that with should be applied immediately after an axiom.

Invert and , choosing out of :

Now we can decompose (by ) one of the implications 2–4, and for each we have a choice of ways of splitting the rest of the antecedent into and . Making use of the weakening rule, however, we can reduce the number of cases.

Subcase 1–2. If includes , then the right premise is , where is a subset of . Notice that if and the sequent is not derivable with , it is also not derivable with (otherwise we could derive it with using the weakening rule). However, the sequent is not derivable even with the maximal :

Indeed, invert and choose :

Here one should use , but then in its right premise we can again invert choosing , which yields one of:

None of these is derivable.

If does not include , then is a subset of , and we again take the maximal in the left premise:

(9)

Decomposing yields either or , both not derivable by (1) and (2). Thus, we have to decompose on the right.

Taking (and inverting ) yields

Now we again have to use . The new cases are and , both not derivable by (3) and (4).

Taking and inverting gives

Decomposing fails due to (1), (2), and (5).

Subcase 1–3. Apply for and consider its left premise with the maximal possible :

(10)

Subsubcase 1–3–2. Decompose . If the big formula with goes to the new , then the new is either or empty. However, neither nor is derivable, by (6) and (7). If the formula with goes to the new , then the new is either or empty. This gives, at maximum, , which is falsified by choosing in the inverted : .

Subsubcase 1–3–4. Decompose . Again, if the big formula (now with ) goes to the new , we falsify the left premise by (1) or (2). Otherwise, the right premise is, at maximum, , which is again falsified by choosing .

Subcase 1–4. If includes , then the right premise is, at maximum,

Invert and choose :

Now we have to use . Its right premise is, at maximum, . Choosing falsifies it.

If is in , then the maximal version of the left premise is

(11)

Applying right now is impossible: its left premise gets falsified by (7) or (6). Apply (recall that is used only directly below axiom) and invert :

Here the left premise of is also falsified by (7), (6), or (8).

Case 2. Consider again two cases, depending on whether goes to or to . If it goes to , then the right premise is, at maximum,

Invert and choose :

(12)

To reuse this reasoning in further cases, we shall falsify a stronger sequent

(13)

Indeed, , and therefore is derivable in . Thus, if (12) happens to be derivable then, by cut, so will be (13).

Now we decompose one of , , in (13).

Subcase 2––1. Recall that we never choose in , and invert :

Invert and choose :

Subsubcase 2––1–5. The left premise is, at maximum,

Applying is impossible, since its right premise is falsified by choosing : and .

Subsubcase 2––1–4. Again, if goes to the new , then the right premise is, at maximum, which is falsified by choosing . If it goes to the new , then the new left premise is, at maximum, , which is not derivable by (6).

Subcase 2––4. If the new is empty, then the left premise is falsified by (7). Otherwise, the right premise is

Invert and choose :

Applying is impossible (); also . Thus, we have to use , and we can immediately apply afterwards: Inverting and choosing falsifies this sequent: .

Subcase 2––5. The left premise is, at maximum,

This is not derivable.

Now let, in Case 2, go to . Then the left premise is, at maximum,

This sequent is stronger than (9), that is, (9) can be obtained from it by weakening. Therefore, it cannot be derivable, since we have already falsified (9) in Case 1.

Case 3. Take the maximal possible and consider the left premise:

This sequent is stronger than (10), and therefore not derivable: (10) was falsified in Case 1.

Case 4. If goes to , then the maximal version of the right premise of is

Invert and choose :

Suppose this sequent is derivable. Then it will also be derivable after swapping variables and :

This sequent, however, is exactly (13), up to commutativity; (13) was falsified in Case 2.

Finally, if , in Case 4, goes to , then the maximal version of the left premise of is

This sequent is stronger than (11) and therefore cannot be derivable. ∎

3 Undecidability with , , and

3.1 The System and Its Undecidability

In this section we consider the extension of the Lambek calculus with the multiplicative unit constant. The language of our fragment will be as follows: , , . As shown by Buszkowski Buszko1982, in the fragment of and the Lambek calculus with empty antecedents is complete w.r.t. L-models. As noticed in the Introduction, however, this is not the case if we add $\mathbf{1}$. In L-models, because of the principle , the unit constant is necessarily interpreted as the singleton set $\{\varepsilon\}$, where $\varepsilon$ is the empty word. (In the presence of the unit constant, we allow the empty word to belong to our languages and abolish Lambek’s non-emptiness restriction.) This particular interpretation of $\mathbf{1}$ satisfies certain principles, including and . Moreover, these principles remain valid for languages of the form (for any ). Indeed, this language is either $\{\varepsilon\}$ or $\varnothing$, and for the empty set we also have and .

Below we present a calculus denoted by , which reflects these principles as sequential rules:

The rules and are called “commuting” rules; they reflect the fact that, for any set , and . The “doubling” rule is caused by and . Thus, these rules express natural algebraic properties of the interpretation of as . However, we emphasize that they are not admissible in the standard calculus , introduced by Lambek Lambek69 , that is, non-commutative intuitionistic multiplicative-additive linear logic.
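The algebraic facts behind these rules are easy to confirm on small examples. The following Python sketch is illustrative only (the finite universe and the helper names are assumptions): it checks that intersecting any language with $\{\varepsilon\}$ leaves either $\{\varepsilon\}$ or $\varnothing$, and that every such set $E$ satisfies $E \cdot X = X \cdot E$ for all sampled $X$ and $E \cdot E = E$. These are the kind of identities the commuting and doubling rules are meant to reflect. For general languages, the product is of course not commutative.

```python
from itertools import chain, combinations

def prod(A, B):
    return {u + v for u in A for v in B}

def all_subsets(universe):
    return [set(c) for c in chain.from_iterable(
        combinations(universe, k) for k in range(len(universe) + 1))]

UNIT = {""}                          # {epsilon}, the L-model value of the unit
universe = ["", "a", "b", "ab", "ba"]
subs = all_subsets(universe)

# Intersecting any language with {epsilon} leaves only {epsilon} or the empty set.
values = {frozenset(UNIT & Y) for Y in subs}

def commutes_and_doubles(E):
    """E . X == X . E for every sampled X, and E . E == E."""
    return all(prod(E, X) == prod(X, E) for X in subs) and prod(E, E) == E

print(values == {frozenset(), frozenset(UNIT)})
print(all(commutes_and_doubles(set(E)) for E in values))
```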

The rules , , and are not new. Their underlying principles, namely, and appear in works of the Hungarian school (Andréka, Mikulás, Németi). Namely, in Andreka2011TCS one can find the first of these equivalences (denoted there as formula 3.2), as one of the principles which is true in language algebras, but not in algebras of binary relations. The second equivalence is true for binary relations also; formula (CbI) in Andreka2011AlgUniv is actually its stronger version, . We get our by taking .

Andréka, Mikulás, and Sain Andreka1994LIF also sketch an undecidability proof for a system related to the one considered here. Their proof is based on the technique of Kurucz et al. Kurucz1993 . The system considered in Andreka1994LIF is the logic of residuated distributive lattices over monoids. Unlike the case we consider in this section, their system requires product, the unit and also the zero constant (the minimal element of the lattice) to be present in the language. Here we require only division, additive conjunction, and the unit. The trade-off is that we consider a narrower class of models. Namely, we consider only L-models, and these models, as shown above, allow extra principles for