DeepAI

Elementary Logic in Linear Space

First-order logic is typically presented as the study of deduction in a setting with elementary quantification. In this paper, we take another vantage point and conceptualize first-order logic as a linear space that encodes "plausibility". Whereas a deductive perspective emphasizes how (i.e., process), a space perspective emphasizes where (i.e., location). We explore several consequences that a shift in perspective to "signals in space" has for first-order logic, including (1) a notion of proof based on orthogonal decomposition, (2) a method for assigning probabilities to sentences that reflects logical uncertainty, and (3) a "models as boundary" principle that relates the models of a theory to its "size".

03/09/2018

Inquisitive bisimulation

Inquisitive modal logic InqML is a generalisation of standard Kripke-sty...
04/12/2019

Axiomatizing first-order consequences in inclusion logic

Inclusion logic is a variant of dependence logic that was shown to have ...
08/14/2020

Partial Orders, Residuation, and First-Order Linear Logic

We will investigate proof-theoretic and linguistic aspects of first-orde...
02/19/2019

Elementary-base cirquent calculus II: Choice quantifiers

Cirquent calculus is a novel proof theory permitting component-sharing b...
03/30/2019

FO = FO3 for linear orders with monotone binary relations

We show that over the class of linear orders with additional binary rela...
02/07/2018

Neighborhood Contingency Logic: A New Perspective

In this paper, we propose a new neighborhood semantics for contingency l...
04/08/2019

Bounded rational decision-making from elementary computations that reduce uncertainty

In its most basic form, decision-making can be viewed as a computational...

1. Introduction

The inspiration for this paper has its origins in Pólya’s writings on plausible reasoning (Pólya, 1990a, b, 2004).

Finished mathematics presented in a finished form appears as purely demonstrative, consisting of proofs only. Yet mathematics in the making resembles any other human knowledge in the making. You have to guess a mathematical theorem before you can prove it; you have to guess the idea of the proof before you carry through the details. You have to combine observations and follow analogies; you have to try and try again. (Pólya, 1990a, pg. vi)

This leads us to ask: Where is the information content of logical experimentation—the guesses, observations, and analogies—located? In this paper, we give our attempt at answering this question in the restricted setting where we use a first-order language to express statements.

The standard presentation of first-order logic is as the study of deduction in a setting with elementary (, over elements of sets) quantification. Thus our first task is to identify a space that holds the information content of first-order sentences. Towards this end, we build upon ideas found in (Huang, 2019)

which introduces a probability distribution on first-order sentences and identifies a corresponding space. We organize our exploration as a series of questions.

• Is there a space that encodes the information content of first-order sentences with fixed quantifier rank (Section 2)? We use the observation that every first-order sentence can be written in distributive normal form (Hintikka, 1965) and identify a vector space

where sentences of fixed quantifier rank are represented as vectors.

• Is there a space for the case of sentences with arbitrary quantifier rank (Section 3)? We will see that an infinite dimensional analogue of a vector space—a Hilbert space—is an appropriate generalization to the setting with arbitrary quantifier rank. A first-order sentence can be identified with a signal in this space.

• How does deduction affect the “plausibility” of a logical system (Section 4)? We can conceptualize a logical system as carrying the “plausibility” of first-order sentences. Similar to how a physical system carries an invariant amount of energy in different forms, a logical system carries an invariant amount of “plausibility” in different places. Under this formulation, we will see that deduction is an entropic operation.

At this point, we will have a view of first-order logic as “signals in space” and so can turn our attention towards the geometry of sentences in space.

• Where is a first-order theory located (Section 5)? A theory manifests itself as a subspace of a certain dimension. It turns out that a complete theory has dimension one whereas an undecidable theory’s dimension is not computable.

• What does orthogonal decomposition reveal about proofs (Section 6)? We show how a proof can be constructed by decomposing the “plausibility” of a first-order sentence into its orthogonal components. Counter-examples (, theorem proving) can be used to approximate a theory from below and examples (, model checking) can be used to approximate a theory from above. An application to assigning probabilities to first-order sentences that reflect logical uncertainty and a discussion of conjecturing will also be given.

• How much space does a first-order theory occupy (Section 7)?

We explore two aspects of a “models as boundary” principle that highlight the difficulty of approximating a theory. In particular, we will encounter a familiar (edge) isoperimetric principle on the Boolean hypercube: a theory with small “volume” (, variance) can have unusually large “perimeter” (, many models).

We emphasize that our aim in this paper is to offer a complementary perspective on first-order logic as “signals in space”. At the end of the paper, we will discuss some directions for future work.

Preliminaries

We assume familiarity with the syntax and semantics of first-order logic, vector spaces, and basic probability theory.

As notation, we write a formula with free variables as or simply . When the quantifier rank of a sentence is important, we indicate it as . Throughout the paper, we work with a first-order language with equality and a finite number of predicates and without constants or function symbols.

Let denote the two element set where and . Let denote the powerset of .

2. Space: Fixed Quantifier Rank

Is there a space that encodes the information content of first-order sentences with fixed quantifier rank? We begin our exploration of first-order logic as a space starting with the simpler case of fixed quantifier rank. We use the observation that every first-order sentence can be written in distributive normal form (Hintikka, 1965) and identify a vector space where sentences of fixed quantifier rank are represented as vectors (Section 2.1). We then examine basic properties of the vector space (Section 2.2). Throughout this section, we restrict attention to sentences of quantifier rank .

2.1. Vector Space

We define a vector space for rank sentences by identifying a basis of rank constituents for the vector space. Before we review the definition of constituents, we highlight two properties that make them a natural candidate for a basis.

Proposition 2.1 ((Hintikka, 1965)).
Distributive normal form:

Every formula can be written in distributive normal form, , as a disjunction

 \mvform(r)[¯y]≡\lOrδ(r)[¯y]∈dnf(\mvform(r)[¯y])δ(r)[¯y]

where is a function that maps a formula (an element of ) to a subset of constituents (an element of ).

Mutual exclusion:

Any two distinct constituents and are mutually exclusive, , .

The first item captures the idea that every vector (formula) can be expressed as a linear combination of basis vectors (constituents) where logical or is interpreted as vector addition. The second item hints suggests an inner product for the vector space: two formulas that denote logically distinct possibilities will be orthogonal. Thus constituents actually give an orthogonal basis. We introduce constituents concretely now following (Hintikka, 1965).

The intuitive idea behind the definition of a constituent is that we would like to describe possible kinds of “logical worlds” by enumerating descriptions of how individuals in the domain are related to one another uniformly in .111Notably, this method of describing possibilities yields a finite number of finite descriptions for each quantifier rank . In contrast, attempting to describe possibilities by enumerating the individuals in the domain may result in an infinite description when the domain is infinite. In order to define constituents, we require (1) several auxiliary definitions and (2) the notion of an attributive constituent. We introduce these in turn.

Auxiliary definitions

Let denote the set of all atomic formula (, a predicate applied to a tuple of variables) involving the free variables . Let denote the subset of that mentions the last element of at least once. Let

 S(\set\mvform1,…,\mvformk)\eqdef\set\lAndi∈\set1,…,k(±)bi\mvformi\STb1∈2,…,bk∈2

be the set where an element is a conjunction of every or its negation where , and .

Attributive constituents

The set of attributive constituents with free variables is defined by induction on quantifier rank .

In the base case,

 Γ(0)[¯y]\eqdefS(\cB[¯y]).

Intuitively, a rank attributive constituent with free variables describes how the individual relates to the other individuals via all the atomic formula in the language.

In the inductive case,

 Γ(r)[¯y]\eqdef{γ(0)[¯y]∧\lAndγ[¯y;z]∈Γ(r−1)[¯y;z](±)s(γ[¯y;z])(∃Ez)γ[¯y;z]\STγ(0)∈Γ(0)[¯y],s:Γ(r−1)[¯y;z]→2}.

The notation indicates an exclusive interpretation of quantification so that is distinct from all other variables in scope. For example, where is shorthand for . Intuitively, a rank constituent with free variables describes (1) how is related to each where and (2) for every possible smaller description (, a rank attributive constituents), whether or not an additional individual exists that is related to the other individuals recursively via that description.

Constituents

The set of rank constituents with free variables is defined using attributive constituents as

 Δ(r)[¯y;z]\eqdef{A[¯y]∧γ(r)[¯y;z]\STA[¯y]∈\cA[¯y],γ(r)[¯y;z]∈Γ(r)[¯y;z]}.

Thus a rank constituent with free variables completes the description of a rank attributive constituent by additionally describing how each term in is related to one another. Thus we can think of a constituent as describing a possible kind of “logical world”.

Some constituents will be unsatisfiable, and hence, describe possibilities that are impossible. The satisfiable constituents, on the other hand, describe logically possible worlds.

Proposition 2.2 ((Hintikka, 1965)).

A formula is logically valid iff its distributive normal form contains all satisfiable rank constituents.

Of course, it is not decidable whether or not a constituent is satisfiable because validity of first-order statements is not decidable. When there are no free variables, attributive constituents and constituents are identical. We abbreviate as . Throughout the rest of the paper, we will largely focus on the case with no free variables, , sentences, unless indicated otherwise.222A constituent is equivalent to which has a distributive normal form of rank .

Constituents as a basis

As a reminder, a (real-valued) vector space is a set along with vector addition and scalar multiplication satisfying the vector space axioms. We define a vector space for sentences of quantifier rank by defining the set and the two operations.

Definition 2.3 ().

Let . Define as component-wise addition and as component-wise multiplication.

Proposition 2.4 ().

is a vector space with zero given by the all vector and additive inverse given by the negation of the vector components.

Proof.

Routine verification of the vector space axioms. ∎

Proposition 2.5 ().

Constituents form a basis for .

Proof.

By construction. ∎

Because a vector space can be described by many different bases, we should think of constituents as the standard basis, similar to the unit vectors of .

The vector space spans all rank first-order sentences.

Proposition 2.6 ().

Every (rank ) first-order sentence can be written as

 \mvform(r)=a1^δ(r)1+…an^δ(r)n

where and any non-zero . Unless stated otherwise, we assume .

Proof.

Every first-order sentence of rank can be written as a disjunction of rank constituents (Proposition 2.1). ∎

As a reminder, every sentence of rank is equivalent to some sentence of rank . Thus this vector space expresses all first-order sentences up to rank .

2.2. Structure

We review some of the structure of the vector space in this section.

Dimensions

The number of attributive constituents of rank can be computed recursively as

 |Γ(r)[¯y]|\eqdef|Γ(0)[¯y]|2|Γ(r−1)[¯y;z]|

where , is combinatorial in the number of base predicates in . The number of constituents of rank is then

 |Δ(r)[¯y]|\eqdef|Δ(0)[¯y]|2|Γ(r−1)[¯y;z]|

where is combinatorial in the number of base predicates in . Consequently, the first-order vector space has a super-exponential number of dimensions as a function of quantifier rank . Thus the vector space has astronomic dimension. Later in Section 7, we will see how models of a theory can be used to approximately span the theory.

Orthogonality

We can define an inner product between two vectors in the standard way:

 ⟨v,w⟩\eqdef∑δ(r)∈Δ(r)vδ(r)wδ(r).

It is a routine exercise to check that this is an inner product. As usual, two vector are orthogonal if their inner product is .

Proposition 2.7 ().

Two vectors are orthogonal in the space if they are logically mutually exclusive.

Proof.

By Proposition 2.1. ∎

The converse does not hold because there can be multiple inconsistent constituents at some quantifier rank . In other words, there are multiple ways to express falsehood in this standard basis so we may have two unsatisfiable sentences that have an inner product of . Note that if we remove all inconsistent constituents of rank from the basis, then we end up with a basis of -isomorphism types (, see (Libkin, 2004)). In this case, falsehood is represented by .

Gram-Schmidt

Let be an enumeration that spans it, , every first-order sentence of rank can be written as a disjunction of elements from . Given an enumeration of , we can construct an enumeration that spans it as .

Recall that the Gram-Schmidt process is a method for constructing an orthonormal basis from a set of vectors that spans the space. Applying the Gram-Schmidt process results in the following basis:

.

3. Space: Arbitrary Quantifier Rank

Is there a space for the case of sentences with arbitrary quantifier rank? The case of sentences with fixed quantifier rank suffices to illustrate the concepts that we hope to explore in the rest of the paper. Nevertheless, for the sake of completeness, we generalize the space perspective for sentences with fixed quantifier rank to the case where sentences have arbitrary quantifier rank. Towards this end, we identify a Hilbert space where first-order sentences are represented as signals in this space.

The construction of the Hilbert space is inspired by the one given in (Huang, 2019) (Section 3.1 and Section 3.2). The main difference is that we take the space view as primary whereas (Huang, 2019) defines a probability distribution on first-order sentences and derives the corresponding space.

3.1. Limits of Logical Descriptions

The intuitive difference between the fixed quantifier rank case and the arbitrary quantifier rank case is “resolution”: the former considers descriptions of logical possibilities at a fixed resolution whereas the latter considers descriptions of logical possibilities at arbitrary resolution. It is thus natural to think of an infinitely precise description of a logical possibility as the limit of a sequence of increasingly finer descriptions.

Our task in this section is to formalize this intuition of the limit of a convergent sequences of logical descriptions. The contents of this section are rather tedious and not essential to the rest of the paper. The high-level takeaway is that we can associate a tree structure (Figure 1) with the set of constituents where each (infinite) path in the tree corresponds to a sequence of constituents ordered by quantifier rank that provide increasingly finer descriptions. The limiting description is the “endpoint” of the corresponding path and provides an infitely precise description.

Expansion

We begin by recalling a basic fact about constituents and their expansion to higher quantifier rank.

Proposition 3.1 ((Hintikka, 1965)).

Every rank constituent can be written as a disjunction of rank constituents, , there is an expansion relation such that any can be expressed as

 δ(r)≡\lOrδ(r+s)≻srδ(r)δ(r+s).

Observe that a constituent can be the expansion of two distinct constituents, in which case it is necessarily unsatisfiable. In symbols, if and then .

Additionally, observe that the notion of expansion is consistent with logical implication. More concretely, if then . When is satisfiable, we can interpret it as extending the description of a logical possibility denoted by to account for additional individuals. In other words, is a higher resolution description of . When is not satisfiable, we can interpret it as an inconsistent description obtained by extending some description (either consistent or inconsistent).

These two observations highlight a certain asymmetry between satisfiable and unsatisfiable descriptions. Whereas a satisfiable constituent is necessarily in the expansion of another unique satisfiable constituent at a given rank due to mutual exclusivity, an unsatisfiable constituent can be in the expansion of any constituent. We thus have to make a choice about how different logical impossibilities are related to one another in order to prevent two sequences that eventually describe logical impossibilities from merging into each other.

Logical impossibilities

We use a syntactic criterion to aid with uniquely associating constituents with their expansions.333The syntactic criterion we use is one of the conditions used in the definition of trivial inconsistency (Hintikka, 1965). Trivial inconsistency is a syntactic criterion for determining whether or not a constituent is satisfiable that satisfies a completeness property (Hintikka, 1965): a constituent that is not satisfiable has a quantifier rank where all of its expansions are trivially inconsistent. The idea is to define a suitable notion for a constituent to be a prefix of another constituent .

The notation

 †δ(r)\eqdef\set⟨(∃Ez)γ(r−1)[z]⟩γ(r−1)[z]γ(r)\STγ(r−1)[z]∈Γ(r−1)[z]

gives the set representation of a constituent where the notation indicates that whether or not the formula inside is negated depends on and . When the quantifier rank is , we have . The notation indicates that we remove the deepest layer of quantification from , , all formula involving the rank quantifiers.444We can see the effect that removing a layer of quantification has on a constituent by rewriting it to only use rank attributive constituents as

We say that is a prefix of , written , if .

Proposition 3.2 ().
1. Every is a prefix of some .

2. If and for some , then .

3. If and , then .

Proof.

See appendix. ∎

The first item ensures that every constituent is the prefix of some expansion constituent so that the resolution of its description can increased. The second and third items ensure that prefixes are compatible with logical implication. More concretely, the second ensures that satisfiable expansions have satisfiable prefixes while the third ensures that an unsatisfiable constituent is the prefix of some unsatisfiable expansion.

Refinement tree

Define a sequence of graphs as follows. Let so that the vertices are the constituents of rank and no edges. Let with edges where is defined in Algorithm 1.

Definition 3.3 ().

The graph is a refinement tree.

We say that refines if there is a path from to in . We check that is a tree with the desired properties.

Proposition 3.4 ().

For any , is a tree such that for any .

See appendix. ∎

Proposition 3.5 ().

is a tree such that for any .

Proof.

See appendix. ∎

The tree structure lends itself to constructing a topology that expresses what an initial segment of refining constituents could converge to.

Topology

Recall that we can define a topology on a tree by defining a basis of open sets identified by finite initial segments of the tree.555For background on topology, we refer the reader to (Munkres, 2000). Let be the set of paths of a refinement tree. We write to indicate the path in a refinement tree. Then a basic open set has the form

 Bδ(r)\eqdef\setρ∈Ψ\ST¯δ(r)⊑ρ

where indicates a prefix relation on refinement tree paths. We can think of a basic open as the description of along with all possible ways in which that description can be extended.

Definition 3.6 ().

The constituent refinement space is the topological space where the topology is generated by the basis of clopen sets

 \setBδ(r)\STδ(r)∈Δ(r).

3.2. Hilbert Space

At the end of the previous section, we obtained a topological space on appropriate sequences of constituents. In this section, we will use the notion of convergence given by the topological space to define a Hilbert space that encodes sentences of arbitrary quantifier rank. Towards this end, we reintroduce metric notions to the topological space so that we can recover a geometry.

The inner product is the fundamental metric notion in a Hilbert space. It intuitively gives a measure of similarity between two elements of a Hilbert space. In our case, we have a space of paths through a refinement tree. Recall that a path in a refinement tree describes a logical description in finer and finer detail. Thus, if we were to measure the similarity between two such paths, we would intuitively weigh the difference in coarser descriptions (the big picture) more than differences in finer descriptions (the minutiae). This leads us to construct a Hilbert space using , , the space of square summable sequences of weighted by a measure .

Definition 3.7 ().

Let be the Hilbert space for sentences of arbitrary quantifier rank where is any measure on ’s Borel -algebra. Unless stated otherwise, we assume that is the uniform measure, , .666Recall that we can uniquely extend a valuation on a clopen basis to a measure on -algebra. For background on measure-theoretic probability, we refer the reader to (Kallenberg, 2006).

Recall that is a Hilbert space so is a Hilbert space.

The Hilbert space spans all first-order sentences. Let

be the characteristic function over the set

.

Proposition 3.8 ().

Every first-order sentence can be written as

 \mvform(r)=n∑i=1aiχBδ(r)i(⋅)

where and are non-zero. Unless stated otherwise, we assume .

Proof.

Every first-order sentence of rank can be written as a disjunction of rank constituents (Proposition 2.1). ∎

Thus we can think of a first-order sentence as the signal in .

Orthogonality

The inner product is given by

 ⟨f,g⟩=∫f⋅gdβ

for (, and are square integrable).

Although there are an uncountable number of infinitely precise descriptions, there is a sense in which we really only have a denumerable handle on them.

Proposition 3.9 ().

has a countable basis.

Proof.

Let and . Then is countable and spans the space. ∎

We can apply Gram-Schmidt to to obtain a orthonormal basis.

This concludes the construction of a space for first-order logic. For simplicity, we will largely restrict attention to the case of fixed quantifier rank throughout the rest of the paper as it suffices to illustrate the main ideas of what a “signals in space” perspective might offer.

4. Plausibility

How does deduction affect the “plausibility” of a logical system? In this section, we conceptualize a first-order space as carrying the “plausibility” of first-order sentences (Section 4.1). Like its physical counterpart which carries a conserved amount of energy in various forms, a logical system carries a conserved amount of “plausibility” in various places. We will then see that deduction is an entropic operation (Section 4.2). This provides an alternative explanation for the failure of logical omniscience, , why a rational agent fails to know all the consequences of a set of axioms. We restrict attention to sentences of quantifier rank less than in this section.

4.1. Plausibility as Energy

Energy is a concept that is fundamental to the physical world. In particular, the law of conservation of energy states that energy cannot be created or destroyed—it can only be converted from one form into another by some physical process. We can thus reason about the effects of physical processes by accounting for the system’s energy.

We propose that “plausibility” is the analogous concept of energy for a logical system. A logical system is comprised of a single sentence.777A logical system consisting of sentences is encoded as the one sentence system . We define the plausibility of a logical system to be

 \cP(\mvform)\eqdef∑δ(R)∈Δ(R)p(δ(R))⟨^δ(R),\mvform⟩

where is some probability mass function on .

We thus have the following analogy with energy:

 plausibility ∼energy logical possibilities ∼forms of energy.

Each logical possibility corresponds to a different form of energy (, kinetic).

The law of conservation of plausibility can then be stated as follows: “Plausibility cannot be created or destroyed—it can only be shifted from one logical possibility to another by some reasoning process”. As with conservation of energy, conservation of plausibility only holds when we consider the entire system, , . We use the phrase “reasoning process” informally to suggest that the process is not necessarily deductive.888For example, Huang (2019) shows how a local Bayesian update process can be used to update the assignment of plausibility to logical possibilities. Nevertheless, deduction is an essential reasoning process and so we turn our attention towards this kind of reasoning next.

4.2. Deduction

From the perspective of language, deduction or modus ponens (mp) is formulated as the inference rule where transforms the left hand side to the right hand side. From the perspective of space, mp will be formulated as an operator.

Modus ponens as an operator

We use the observation that is syntactic sugar for to express mp as an operator.

Definition 4.1 ().

Define the operator as where:

∧ ∨ \mvform1⊕\mvform2\eqdef ∑δ(R)∈Δ(R)max(⟨^δ(R),\mvform1⟩,⟨^δ(R),\mvform2⟩)^δ(R) \mvform1⊖\mvform2\eqdef ∑δ(R)∈Δ(R)min(⟨^δ(R),\mvform1⟩,⟨^δ(R),\mvform2⟩)^δ(R) \mvform⊥\eqdef ∑δ(R)∉dnf(\mvform)⟨^δ(R),\mvform⟩^δ(R)

The additional vector is when . Thus this definition of mp from the space perspective additionally keeps track of how well the antecedent to an implication matches with the current state. It is an over approximation of logical mp.

Analogous to how we can model the evolution of a physical system by considering the sequencing of operators that correspond to physical processes, we can model the evolution of a logical system by considering the effects of sequencing mp operators that correspond to deductive processes.

Observe that is a continuous operator. We can interpret this as another instantiation of the locality of computation: Any sentence that can be deduced is a sequence of local and small steps. As it turns out, is “entropic” in a sense that we describe next.

Modus ponens is entropic

We begin with the observation that mp increases plausibility of a logical system.

.

Proof.

We get that by induction on . It then follows that because . ∎

Consequently, knowledge of a proof contains less plausibility than knowledge of a theorem statement.

Intuitively, the plausibility of a logical system increases because the number of possibilities we consider possible after an application of mp increases. Note that this fact does not contradict the law of conservation of plausibility. In particular, an application of mp changes the logical system under consideration which may no longer be the total system .

We can rephrase the increase in plausibility as an increase in unnormalized entropy. Recall the unnormalized entropy of a vector is defined as . Let be the vector where whenever and otherwise where .

Proposition 4.3 ().
 U((\mvform1∧\mvform2)‡)≤U((MP\mvform1→\mvform2(\mvform1))‡)
Proof.

Observe that . The result thus follows by routine calculation. ∎

Because deduction increases (unnormalized) entropy, a deductive process may spread out information over too many possibilities as to render it impossible to find the proof of a conjecture.

5. Theories

Where is a first-order theory located? As we might expect, a first-order theory forms a subspace. We can visualize the subset of sentences that have quantifier rank

as a hyperplane sitting in

(Figure 2).

Proposition 5.1 ().

Let be some theory. Then forms a subspace of for every .

Proof.

Observe that the subset of satisfiable constituents that satisfy the theory forms a basis for the subspace. ∎

Definition 5.2 ().

The dimension of a theory is a function such that gives the number of satisfiable constituents at rank .

The dimension of a theory has can be interpreted as a measure of its complexity. We introduce a complete theory and an undecidable theory next to give examples of the lower and upper limits of a theory’s complexity.

Example 5.3 (Unbounded dense linear order).

The theory of unbounded dense linear orders (UDLO) expresses the ordering of the real line using the binary relation .999For reference, the axioms for UDLO are: (antisymmetry), (transitivity), (trichotomy), (dense), and (unbounded).

UDLO is a complete theory so it is intuitively “simple”.

Proposition 5.4 ().

A complete theory has dimension the constant one function.

Proof.

Let be some satisfiable constituent (which exists because the theory is consistent). Then is not satisfiable by completeness of the theory. Thus every other constituent at rank is not satisfiable. ∎

To give a flavor of what a constituent looks like, the only satisfiable rank and constituents for UDLO are

1. ; and

2. where and

where is syntactic sugar for . In words, the only satisfiable rank constituent describes that either or , , the only two possible linear orderings involving two elements.

Example 5.5 (Groups).

The theory of groups expresses groups from abstract algebra and involves one binary operator expressing group multiplication.101010For reference, the axioms of groups are: (identity) and (transitivity). Recall that we do not consider function symbols in this paper. Thus the axioms should replace the binary operator with a ternary predicate and additional formula asserting that is a function.

Groups is not decidable so it is intuitively “complex”.

Proposition 5.6 ().

An undecidable theory has uncomputable dimension.

Proof.

A decision procedure for validity of constituents would decide validity of first-order sentences. ∎

Although the dimension is not computable, we do have a (trivial) upper-bound for the dimension of any theory given by the number of constituents.

Proposition 5.7 ().

The dimension is an upper bound on the dimension of of any theory.

Proof.

Straightforward. ∎

Thus there is a super-exponential upper bound on the dimension of a theory as a function of quantifier rank .

Compared to UDLO which only has one satisfiable constituent at any depth, there are many satisfiable constituents at any depth for groups. Intuitively, these constituents correspond to classifying the kinds of groups that are expressible in a first-order language. The most granular kind of group simply satisfies the axioms, a finer kind of group differentiates abelian groups from non-abelian groups, and so on.

6. Proofs

What does orthogonal decomposition reveal about proofs? We revisit the notion of proof from the space perspective in this section and explore what one based on orthogonal decomposition would look like (Section 6.1). We will see that each component can be lower bounded by a counter-example (, theorem proving) or upper bounded by an example (, model checking) (Section 6.2). This provides an explanation of the utility of examples to the process of proving. Finally, we highlight an application to assigning probabilities to first-order sentences that reflects logical uncertainty and provide a discussion of conjecturing (Section 6.3).

6.1. Component Proofs

The idea of decomposing a signal into its orthogonal components is one that has numerous applications in engineering and the sciences. It is a powerful method because it enables one to analyze each component separately and combine the results to obtain the behavior of the original signal. For example, we can understand the motion of an object in a 2D plane by separately analyzing its motion along the x-axis and y-axis and superimpose the results.

Decomposition is similar to the usage of compositionality in language where we give the meaning of a sentence be composing the meanings of its phrases. The difference between the two is that the former implicitly assumes the pieces to be composed are semantically independent whereas the latter does not. Thus we will be looking at proofs under an additional constraint in this section.

In the case of proving, the assumption of semantic independence means that each piece of the proof corresponds to a distinct logical possibility, and consequently, can be tested for satisfiability independently of one another. We formalize this idea now.

Definition 6.1 ().

A (rank ) component proof system is a tuple where

• is a choice of orthogonal basis for ;

• is a valuation such that means that is unsatisfiable and means that is satisfiable; and

• is a component classifier that indicates whether a sentence contains the corresponding basis element as a component or not. In symbols, if and otherwise. The component function is given by

 χ(\mvform(r),δ)=δ∈dnf(\mvform(r))

when is the standard basis.

Definition 6.2 ().

A component proof of is a sentence

 \lAndψ∈\cBτ(ψ)∧χ(\mvform(r),ψ)

where is a component proof system. Each is a component.

A priori, neither the valuation nor the component classifier are required to reflect logical validity. We say that a valuation is sound if whenever is not satisfiable and a valuation is complete if whenever is satisfiable. Similarly, we say that a component classifier is sound if whenever and a component classifier is complete if whenever .

Because neither the valuation nor component classifier are required to reflect logical validity, a proof constructed in a component proof system can exhibit error. In particular, a component proof can be incorrect on some components and correct on others, leading to a “partially correct” proof.

Definition 6.3 ().
• The error of the valuation with respect to the ground truth is

 EV\eqdef∑ψ∈\cB|^τ(ψ)−τ(ψ)|.
• The error of the component classifier with respect to the ground truth on the sentence is

 E\mvform(r)\eqdef∑ψ∈\cB|^χ(\mvform(r),ψ)−χ(\mvform(r),ψ)|.

The total error of a component proof for is .

We say that a component proof is partially correct if it has non-zero total error.

The error is related to the soundness and completeness of the valuation and component classifiers in the obvious way.

Proposition 6.4 ().

The total error for every sentence iff both the valuation and component classifier are sound and complete.

Proof.

Straightforward. ∎

The notion of partial correctness is an essential difference between a standard proof and a component proof. As a reminder, a standard proof is a sequence of sentences where every sentence follows from the previous one by deduction. One invalid proof step thus invalidates the entire proof—a standard proof is brittle. In contrast, a component proof is somewhat robust because each component is semantically independent of the other. The robustness to error leads us to consider how approximation can be used to construct a component proof.

6.2. Approximation

The inspiration for an approximation procedure for constructing a component proof comes from sandwiching a function in analysis: use a simpler family of functions to simultaneously lower bound and upper bound the unwieldy function of interest. In our setting, we can use the family of characteristic functions .

Figure 3 illustrates the idea behind the approximation procedure for the components of a component proof. Let .

Lower bound:

We lower bound by attempting to show that is logically valid. We can use any first-order proof system (, an automated theorem prover) to show that is derivable. This corresponds to showing that this specific component has a counter-example.

Upper bound:

We upper bound by (1) “choosing” a model and then (2) attempting to show that (, using a model checker). This corresponds to showing that this specific component has an example.

The approximation procedure reveals an asymmetry in the efficacy of lower bounding (, proving) versus upper bounding (, model checking). In particular, any disjunction of unsatisfiable basis elements will be unsatisfiable as well. Consequently, by refuting for some subset of unsatisfiable basis elements, we can eliminate basis elements in one go.

In contrast, the model checking approach can only show that one component is satisfiable at a time due to the mutual exclusivity of the basis. Moreover, it requires the user to devise a concrete example. Consequently, from the perspective of a component proof, it should be more effective to put more effort into proving as opposed to model checking.

6.3. Logical Uncertainty

Intuitively, we would expect proving and assigning logical uncertainty—assessing how probable a conjecture is—to be related tasks. For instance, we might believe that a conjecture is more likely to be true than false even when we have no clear path for proving it.

In this section, we show how an approximation procedure for constructing component proofs gives rise to a probability distribution on first-order sentences such that logically equivalent sentences are not necessarily assigned the same probability. This provides another take on the failure of logical omniscience that complements the one given in Section 4 where we saw that deduction was an entropic operation. We begin by introducing a representation of a theorem proving agent’s knowledge.

Representation

Let be a function that soundly approximates the deduction relation, , implies that . We can think of as a resource-bound prover. Similarly, let be a function that soundly approximates a model checker, , implies that . As before, we can think of as a resource-bound model checker.

Suppose we have sentences that are known to be false, knowledge of models , and sentences whose satisfiability is not known. We can interpret as theorems, as common examples, as conjectures that are specific to the agent. (We will see why we include conjectures momentarily.) Let . Let when and

 Nij\eqdef|\set\cMk\ST0≤k≤M,check\cMk(\mvformi→\mvformj)=\true|M+α

otherwise where , , and .

Definition 6.5 ().

The matrix

represents the knowledge of an agent .

Each row and column of the matrix is associated with the sentences . Each entry of the matrix can be thought of as the probability that the row sentence implies the column sentence , , . Thus the first row of the matrix is all (because false statements imply any statement) and the first column of the matrix gives the probability that the given sentence is false (, ). Note that in general (unless we have an iff), and consequently, we should not think of the matrix as a covariance matrix. Nevertheless, there is a sense in which we can think of each entry as a measure of logical overlap, , logical correlation.

Conjectures

Consider the subspace of spanned by where is the maximum rank of all the sentences involved. We call a conjecturing basis.

In general, will not be (1) linearly independent nor (2) contain . In the setting of first-order logic, the first is not an issue for the purpose of determining validity. The second, however, is problematic because it means that there are certain sentences whose validity an agent will not be able to determine unless it considers alternative conjectures. We can thus view conjecturing from the perspective of linear algebra as finding a useful conjecturing basis.

An application of principal component analysis (pca) to

gives an ordering on the agent’s conjectures with respect to

’s eigenvalues. Because the matrix

is constructed with respect to the satisfiability of an agent’s models, we can interpret the sentences with the lowest eigenvalues as those that have no known examples to the sentences with the highest eigenvalues as those with no known counterexamples. Thus there is an exploration versus exploitation tradeoff in determining which conjectures to invest time in. An agent that wishes to explore should invest in the sentences with the smallest eigenvalues while an agent that wishes to exploit should invest in the sentences with the largest eigenvalues.

Beliefs and probabilities

We return to the problem of defining probabilities on sentences, starting with a proxy for them.

Definition 6.6 ().

An agent with knowledge has belief

 B(\mvformi)\eqdef1−Ni0.

Note that an agent’s beliefs do not represent probabilities because they are not normalized.

Although an agent’s beliefs are not probabilities, they can be converted into them.

Definition 6.7 ().

Define a valuation on basic opens by on induction on quantifier rank as and

 ν(Bδ(r+1))\eqdef⎧⎪⎨⎪⎩0ν(Bδ(r))=0B(δ(r+1))∑δ(r+1)