Towards Formalising Schutz' Axioms for Minkowski Spacetime in Isabelle/HOL

08/08/2021 ∙ by Richard Schmoetten, et al.

Special Relativity is a cornerstone of modern physical theory. While a standard coordinate model is well-known and widely taught today, several alternative systems of axioms exist. This paper reports on the formalisation of one such system which is closer in spirit to Hilbert's axiomatic approach to Euclidean geometry than to the vector space approach employed by Minkowski. We present a mechanisation in Isabelle/HOL of the system of axioms as well as theorems relating to temporal order. Proofs and excerpts of Isabelle/Isar scripts are discussed, particularly where the formal work required additional steps, alternative approaches, or corrections to Schutz' prose.

1 Introduction

Formal foundations are a recently re-emerging trend in modern physics. While philosophical, mathematical, and empirical studies were inseparably entwined in antiquity, formal mathematics and physical science drifted apart in the eighteenth and nineteenth centuries suppes1968 .

The mathematical deduction employed for example in Ptolemy’s Harmonics is taken to be almost divine. Thus he considers “arithmetic and geometry, as instruments of indisputable authority” (bernard2010, , p. 507). In contrast, the main physical theories of the twentieth century were developed as physics first, and retro-fitted with rigorous mathematical foundations later. An example particularly relevant to this work is that of Special Relativity (SR) gourgoulhon2013g . The comprehensive mathematical treatment given by Minkowski minkowski1908 was at first dismissed as unnecessarily complicated einstein1908 . Early work on axiomatising SR (e.g. by Robb robb1936 ) went largely unnoticed by the physical research community.

But the search for a formal foundation to modern physics gained support in the second half of the twentieth century. Philosophical essays suppes1968 , the successes of the new mathematical quantum and relativity theories schrodinger1926 ; born1926 , and increasing interest by the mathematical community, all contributed to works ranging from differential geometry and General Relativity (GR) to the Wightman axioms in particle physics streater2000 .

We present here a mechanisation of an axiom system for Minkowski spacetime, the main ingredient of the theory of SR, given by Schutz in 1997 schutz1997 . To this end, we use the proof assistant Isabelle/HOL, briefly introduced in Section 2. We then present the axioms in Section 3, and describe some of our mechanised lemmas and theorems in Section 4. (The formalisation can be accessed at https://github.com/rhjs94/schutz-minkowski-space.)

2 Background

2.1 Formalisation in Special Relativity

Several axiom systems have been proposed for Minkowski spacetime. Schutz himself proposes several iterations, starting with a formulation based on primitive particles and the binary signal relation in 1973 schutz1973 . The next iteration in 1981 replaces signals with a binary temporal order relation, and light signals become an entirely derived notion, whose existence is proven, not assumed schutz1981 . It is the final axiom system, published in a monograph in 1997, that is of primary interest to us: it contains many of the axioms of earlier systems as theorems, while also boasting the property of independence (see Sec. 3 for details). Systems formulated by Szekeres szekeres1968 and Walker walker1959 also rely on undefined bases and axioms inspired by physical intuition, and Schutz cites them as direct predecessors to his work. Another early approach is that of Robb robb1936 , based on events and an ordering relation, and continued by Mundy mundy1986 ; mundy1986a . A first-order alternative to Schutz is given by Goldblatt goldblatt1989 ; goldblatt2012 , who relies on a relation of orthogonality in addition to the betweenness Schutz employs in his system of 1997.

More recently, an extension of Tarski’s Euclidean ideas using Goldblatt’s approach to Minkowski spacetime was given by Cocco and Babic cocco2021 . Their system is mostly formulated in first-order logic, but with a second-order continuity axiom in order to show the usual four-dimensional Minkowski spacetime is a model. A flexible first-order system of axioms describing several different theories of relativity was given by Andréka et al. andreka2011 ; andreka2013 . Notably, there exists a mechanisation of this approach in Isabelle/HOL by Stannett and Németi stannett2014 . In contrast to what we propose here, Stannett and Németi assume an underlying coordinate formulation and use first-order axioms, while Schutz’ system is second-order, and his Isomorphism Theorem linking it to the usual coordinate model is one of his final results.

2.2 Axiomatic Geometries

Geometry is arguably the oldest discipline to have seen successful axiomatisation in the form of Euclid’s Elements heath1956 . Over two millennia later, Hilbert’s Grundlagen der Geometrie hilbert1950 built on Euclid to propose a new, self-contained system of axioms using modern logical concepts such as undefined notions (in contrast to Euclid’s primitive definitions). Many alternative Euclidean systems have been postulated and examined since then. Schutz acknowledges clear parallels between several of his theorems and those of Veblen veblen1904 , whose axioms for Euclidean geometry replace Hilbert’s primitives (points, lines, planes, and several relations between them) to use only points and a single relation. Tarski’s system of elementary Euclidean geometry tarski1959 is influential too: points as well as two undefined relations are his only primitive notions. His axioms can be formulated in primitive notions only, using first-order logic (with identity and using an axiom schema). Schutz schutz1997 similarly strives for simplicity, though his continuity axiom is second-order, and while a line-like primitive exists, only a single undefined relation is required.

2.2.1 Mechanisation in Geometry

Several axiomatic approaches to geometry have been (at least partially) formalised in Isabelle/HOL. Hilbert’s Grundlagen has seen work in Isabelle by Meikle, Scott and Fleuriot meikle2003 ; scott2008 , and further investigation of both the axioms and tools for their study in HOL Light scott2011 ; scott2015 . Tarski’s axiom system was investigated by Narboux in Coq narboux2007 , and its independence verified in Isabelle by Makarios makarios2012 . Geometric formalisations also exist e.g. for projective geometry in Coq magaud2011 and again for Tarski’s geometry in Mizar grabowski2016 . We refer to a recent review for a more comprehensive picture narboux2018 .

Our formalisation bears some similarity to the above work on Hilbert’s Grundlagen in a number of respects, since several of Schutz’ axioms originate in the Grundlagen (see Section 3). For example, our definition of chains (Section 3.2), one of the most fundamental constructs in this paper, relies on an adapted definition due to Scott’s work on the Grundlagen in HOL Light scott2015 . As another example, we employ the same weakening of Schutz’ Axiom O3 that can be found in Scott’s formalisation of Hilbert’s Axiom II.1. Scott also finds a result very similar to our chain_unique_upto_rev (see Section 4.8): while he obtains it from a remark of Hilbert’s (scott2015, , Section 6.7.2), we derived it by necessity in an early version of our proof of Theorem 4.13, and found the correspondence only later. Note that the formalisations of Hilbert’s Grundlagen cited here focus on the first three groups of axioms, which exclude the parallel and continuity axioms.

2.3 Isabelle/HOL

Computer-based theorem proving, verification, and proof exploration is the dominant area of automated reasoning today. A breakthrough development for the field was Scott’s work on LCF scott1993a , a typed version of the λ-calculus, and the subsequent construction of an interactive theorem prover of the same acronym by Gordon, Milner and Wadsworth gordon1979 . Isabelle is a generic proof assistant which continues the LCF style of automated reasoning wenzel2008 ; paulson2019 . Its generic meta-logic (the simple type system responsible for validity checking) supports multiple instances of object logic: we will be using higher-order logic (HOL), but instances for e.g. first-order logic (FOL) and ZFC set theory exist.

We review several salient aspects of Isabelle below, and give a brief introduction to proof reading and writing.

2.3.1 Automation and Readability

A proof is a repeatable experiment in persuasion.

Jim Horning

Considering the above quote, the advantage of computer assistance in logical and mathematical proof is clear. Using Isabelle (for example), we can write a proof of any (provable) theorem, and provided our readers are convinced of the soundness of Isabelle’s trusted kernel (the consistency of theorem provers is its own research field kuncar2017 ), they can take the theorem as fact without manually verifying the proof. A famous and well-publicised success of computer-verified mathematics is the Flyspeck project hales2015 . A computer-assisted proof of the Kepler conjecture was submitted for review in 1998, but only published (without the reviewers’ complete certification) in 2006 hales2006 ; lagarias2011a . The Flyspeck project was a twelve-year effort to formalise this proof, and was accepted to a mathematical journal in 2017.

Even if a proof is certified and trusted, it is often still instructive to read through it. One may identify methods to be used in similar problems, or generalised to unrelated areas of inquiry; intuition is built for the behaviour of the mathematical entities manipulated throughout the proof. Readability is therefore important, particularly for proofs as verbose as those often found in mechanisations. Isabelle provides us with the language Isar (Intelligible semi-automated reasoning) wenzel1999 that can be used for proofs that are both human-readable and supported by automatic solvers. Isar proofs merge the forward reasoning common in mathematical texts and natural for human readers to follow, and the backwards reasoning often useful in exploring possible avenues for a proof to be completed (see the next section for a glimpse of Isar).

Several tools for proof discovery come with the Isabelle distribution. In particular, the umbrella tool sledgehammer paulson2010 automatically selects several hundred potentially relevant facts to pass to different first-order solvers (both resolution and SMT provers) and, if successful, reconstructs the automatic proof in Isabelle/HOL. In practice, automatic proof discovery is useful, but it sometimes struggles to justify steps that seem obvious to the reader, or returns proofs relying on highly unexpected facts. This may be due to the complexity of some of our definitions, or to difficulties in the reduction to first-order logic. Automatic proof discovery has also occasionally prompted corrections to our axioms.

2.3.2 Proofs and Isar

Working in Isabelle/HOL (and Isar) is a mix of meta- and object-level reasoning. This is best looked at through an example: we use a lemma named no_empty_paths from our current formalisation. We are only interested in the formalism and method for now. Sec. 3.3 will provide some context.

Meta-logic in Isabelle can be part of the inner syntax (between double quotes, e.g. \<lbrakk>...\<rbrakk> for assumptions and \<Longrightarrow> for meta-implication) or of the outer syntax (e.g. assumes, shows). We announce the statement of a top-level fact requiring proof with keywords such as theorem or lemma. This is followed by an optional unique name and the fact statement, either in inner syntax or in the more legible Isar style used in the listing below.

lemma no_empty_paths:
  assumes "Q\<in>\<P>"
  shows "Q\<noteq>{}"
proof -
  obtain a where "a\<in>\<E>"
    using nonempty_events by blast
  have "a\<in>Q \<or> a\<notin>Q" by auto
  thus ?thesis
  proof (rule disjE)
    assume "a\<in>Q"
    thus ?thesis by blast
  next
    assume "a\<notin>Q"
    then obtain b where "b\<in>\<emptyset> Q a"
      using two_in_unreach \<open>a\<in>\<E>\<close> assms
      by blast
    thus ?thesis
      using unreachable_subset_def by auto
  qed
qed

We start an Isar proof with the keyword proof. We can supplement proof with an initial method to use (e.g. the case-split rule disjE as above, the general method safe, which splits and rewrites goals, or induct as explained below). Isabelle will try to choose a rule for us if we do not provide one, unless we prevent this using a dash (i.e. proof -). A successful proof ends with qed. Two other keywords can terminate a proof: sorry and oops. Both signify a proof that is not complete or cannot be done, but while oops means that Isabelle will refuse use of the unproven fact, sorry allows an unproven statement to be used in legitimate proofs of other propositions. Thus sorry can be quite dangerous (see Sec. 2.3.3 for an alternative). It is useful, however, for checking which subgoals would suffice to prove a lemma.

Intermediate facts are declared using for example have or hence, and facts that satisfy the current goal using show or thus. This is followed by an optional name and the fact statement, and proved using a separate proof (*…*) qed block, with its own scope for variables. Such blocks can be nested. We may provide useful facts after using, and a proof method or automatic theorem provers (ATP) after by. Isabelle will now verify whether this method and collection of results are sufficient to prove the desired statement. The sixth line of the listing above is a simple example of this procedure. Multiple facts can be listed after a single name, and proved all at once; such facts can be referenced by their given name, accompanied by a number in brackets that indicates which fact it was (e.g. factname(2)).

Several results of our formalisation are proved by induction. The method induct takes an induction parameter, which is always of type nat in our proofs, and splits the goal into subgoals, e.g. a base case and an inductive step. Each subgoal is proved in its own scope, separated from the others by next. Isabelle provides shorthand notation for the usual first lines of both cases. The base case (case 0) sets a goal that is just the lemma’s conclusion, but with the induction variable set to 0. The induction case (case (Suc n)) fixes n, assumes the lemma’s conclusion for n, and sets the goal to the conclusion for n + 1 (i.e. Suc n). This assumption for n is the induction hypothesis (IH).

Finally, several of our lemmas in Section 4.6 use the keywords fixes, which introduces a variable, and defines, which gives its definition as an equality (strictly speaking, a meta-equality). Isabelle will treat the fixed variable as an abbreviation for its defining statement. We refer to the lemma show_segmentation (part of Theorem 4.11) in Section 4.6 as an example.

2.3.3 Locales

One useful feature, particularly for sizeable axiom systems such as ours, is Isabelle’s locale mechanism. One can think of a locale as a parameterised context: it names one or more “arbitrary but fixed” parameters, and assumes some initial properties. In our case, these are undefined notions and axioms respectively. Since axioms are often reformulated when attempted proofs reveal them to be inadequate (e.g. axiom I6, see Theorem 4.13 in Sec. 4.8), we try to limit the amount of logic that is affected and possibly invalidated by such a change. Containing small groups of related axioms in their own separate locales circumscribes the scope of their influence. For instance, this purpose is served by our locale MinkowskiDense (see again Sec. 4.8, and below), which contains an assumption (in this case an additional, hidden assumption needed for one of Schutz’ proofs) that we do not want to spill outside the locale. This is a safer alternative to using sorry.

Locales have additional practical benefits: they are augmented by each theorem proven inside them, they can extend other locales, and they can be interpreted. Interpretation provides an explicit instance of an abstract concept (e.g. SO(3), the group of 3D rotations, is a concrete instance of a group). This means that if we eventually want to find a model of our system, we can do so in steps: showing that some interpretation satisfies our locale MinkowskiChain (see Sec. 3.2) gives us immediate access to that locale’s theorems (e.g. collinearity2), and to those of any locales it extends. These theorems may then be used to prove that the interpretation satisfies the additional requirements of a locale extending MinkowskiChain.

An example locale from our formalisation is given below. The locale MinkowskiDense here extends MinkowskiSpacetime with the additional assumption named path_dense. The context block of the locale is delimited by begin (*…*) end. Alternatively, the locale of an individual result can be specified directly using the keyword in (fictitious example below).

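locale MinkowskiDense = MinkowskiSpacetime +
  assumes path_dense: "path ab a b \<Longrightarrow> \<exists>x. [[a x b]]"
begin
  (* results proved here are available in the context of MinkowskiDense *)
end

(* a target (in ...) must be given at theory level, outside begin/end *)
lemma (in MinkowskiSpacetime) example: "True"
  by simp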

Since model proofs are outside the scope of this work, locales serve mostly an organisational purpose for our formalisation.

3 Axioms

Schutz proves several properties of his axiomatic system in his monograph schutz1997 : consistency (relative to the real numbers), categoricity, and independence. He insists upon independence, i.e. that none of his axioms can be derived from any combination of the others: he considers that the search for it has made his axioms more intuitive.

Some of the axioms as we encode them in Isabelle are subtly different from Schutz’ statements. These changes are due in some cases to the requirements of Isabelle/HOL (e.g. Isabelle’s functions being total on types, not sets), in other cases some details are not considered in the original axiom, and several are just a matter of choice and simplicity (e.g. reformulations for easier use in interactive proofs). These choices will be discussed as we proceed with our exposition. In most cases, Schutz’ formulation can be easily restored as a theorem, by using the entire system of axioms.

Schutz lays out his axioms in two main groups: order and incidence. The first relates betweenness to events and paths, and establishes a kind of plane geometry with axiom O6. The second deals with the relationships between events and paths, and also contains statements regarding unreachable subsets, which make a Euclidean/Galilean model impossible. In contrast to Schutz, we present axioms according to their specificity to Minkowski spacetime. In particular, our main comparison is with Hilbert’s Grundlagen der Geometrie hilbert1950 , which introduced the separation of incidence and order axioms.

Since several definitions of derived objects are required for stating some axioms, we construct our system as a hierarchy of locales (Sec. 2.3.3), defining objects in the locale they make most sense in, and often just before they are needed.

3.1 Primitives and Simple Axioms

The first axioms, introduced in the locale MinkowskiPrimitive together with the primitive notions of events and paths (which are introduced with the keyword fixes), are similar to examples found in many other geometric axiom systems, notably Hilbert hilbert1950 . Schutz names them I1, I2, I3 (schutz1997, , p. 13), and they assert basic properties of two primitives: a set of events, $\mathcal{E}$, and a set of paths, $\mathcal{P}$, where each path is a set of events.

Axiom (I1 (Existence))

$\mathcal{E}$ is not empty.

Axiom (I2 (Connectedness))

For any two distinct events $a, b \in \mathcal{E}$ there are paths $R, S \in \mathcal{P}$ such that $a \in R$, $b \in S$, and $R \cap S \neq \emptyset$.

Axiom (I3 (Uniqueness))

For any two distinct events, there is at most one path which contains both of them.

As an example of the verbosity of a full formalisation, contrast Axiom I3 with the many premises of its formalised version eq_paths, and its customary translation of “there is at most one” as “if given two such objects, they must be equal”. Importantly, note that we also require one axiom Schutz does not have: in_path_event, which excludes the possibility of non-event objects of the appropriate type being in a path, and guarantees that $\mathcal{P}$ is a subset of the powerset of $\mathcal{E}$, not of the universal set.

locale MinkowskiPrimitive =
  fixes \<E> :: "'a set"
    and \<P> :: "('a set) set"
  assumes in_path_event [simp]: "\<lbrakk>Q \<in> \<P>; a \<in> Q\<rbrakk> \<Longrightarrow> a \<in> \<E>"
      (* I1 *)
      and nonempty_events [simp]: "\<E> \<noteq> {}"
      (* I2 *)
      and events_paths:
        "\<lbrakk>a \<in> \<E>; b \<in> \<E>; a \<noteq> b\<rbrakk>
        \<Longrightarrow> \<exists>R\<in>\<P>. \<exists>S\<in>\<P>. a \<in> R \<and> b \<in> S \<and> R \<inter> S \<noteq> {}"
      (* I3 *)
      and eq_paths [intro]:
        "\<lbrakk>P \<in> \<P>; Q \<in> \<P>; a \<in> P; b \<in> P; a \<in> Q; b \<in> Q; a\<noteq>b\<rbrakk> \<Longrightarrow> P = Q"

Nothing initially defines $\mathcal{E}$ apart from the type of its elements, yet we do not take $\mathcal{E}$ to be the universal set of type 'a. This choice is made since it may lead to easier model instantiations in the future: for example, it allows building a model from a subset of natural numbers without defining an extra datatype. Given Isabelle’s lack of subtypes, if events were the universal set of some type, a model over a subset of natural numbers could not make immediate recourse to the type nat, but would need to define an entirely new type. A universal set of events would also differ from Schutz’ language. For example, types are never empty in Isabelle, so a universal set of events already implies Axiom I1. The set of paths is always envisaged as a strict subset of the powerset of $\mathcal{E}$; otherwise the axioms introduced later in Sec. 3.3 lose all relevance.
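The four assumptions of MinkowskiPrimitive can be checked mechanically on any finite candidate model, which is the first step of the stepwise model search described in Sec. 2.3.3. The Python sketch below is our own illustration, not part of the Isabelle development: the function check_primitive and the toy models are hypothetical, events are arbitrary hashable values, and each path is a frozenset of events.

```python
from itertools import combinations


def check_primitive(E, P):
    """Check the MinkowskiPrimitive assumptions on a finite candidate model.

    E is a set of events; P is a set of paths, each a frozenset of events.
    Returns a dict keyed by the names of the Isabelle assumptions.
    """
    pairs = list(combinations(E, 2))  # unordered pairs of distinct events
    return {
        # every object on a path is an event
        "in_path_event": all(a in E for Q in P for a in Q),
        # I1: the set of events is not empty
        "nonempty_events": len(E) > 0,
        # I2: any two distinct events lie on paths R, S that meet
        "events_paths": all(
            any(a in R and b in S and not R.isdisjoint(S) for R in P for S in P)
            for a, b in pairs),
        # I3: at most one path contains two given distinct events
        "eq_paths": all(
            Q1 == Q2
            for a, b in pairs
            for Q1 in P if a in Q1 and b in Q1
            for Q2 in P if a in Q2 and b in Q2),
    }


# A "triangle" of three two-event paths.
E = {1, 2, 3}
P = {frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})}
print(check_primitive(E, P))
```

The triangle passes all four checks, whereas adding a path {1, 2, 3} alongside {1, 2} violates eq_paths (I3). Such toy models say nothing about the later axioms.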

Our final undefined notion, the ternary relation of betweenness, is defined over events. It is introduced in the locale MinkowskiBetweenness, which extends MinkowskiPrimitive and contains the first five axioms of order (O1 - O5) (schutz1997, , p. 10).

The axioms of order in Schutz’ system are in close analogy with axioms of the same name in Hilbert’s Grundlagen (i.e. his group II). Hilbert’s Axiom II.1 combines Schutz’ Axioms O1, O2, O3; Hilbert’s II.2 becomes Schutz’ Theorem 4.6, and II.3 becomes Theorem 4.1. Pasch’s axiom exists in both systems, respectively as II.4 and O6.

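Axiom (O1)

For events $a, b, c \in \mathcal{E}$, $[abc] \implies \exists Q \in \mathcal{P} : a, b, c \in Q$.

Axiom (O2)

For events $a, b, c \in \mathcal{E}$, $[abc] \implies [cba]$.

Axiom (O3)

For events $a, b, c \in \mathcal{E}$, $[abc] \implies a, b, c$ are distinct.

Axiom (O4)

For distinct events $a, b, c, d \in \mathcal{E}$, $[abc] \wedge [bcd] \implies [abd]$.

Axiom (O5)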

For any path and any three distinct events ,

Schutz denotes betweenness as $[abc]$, but since that notation is used for lists in Isabelle, we write it as [[_ _ _]] below.

locale MinkowskiBetweenness = MinkowskiPrimitive +
  fixes betw :: "'a \<Rightarrow> 'a \<Rightarrow> 'a \<Rightarrow> bool" ("[[_ _ _]]")
      (* O1 *)
  assumes abc_ex_path: "[[a b c]] \<Longrightarrow> \<exists>Q\<in>\<P>. a \<in> Q \<and> b \<in> Q \<and> c \<in> Q"
      (* O2 *)
      and abc_sym: "[[a b c]] \<Longrightarrow> [[c b a]]"
      (* O3 *)
      and abc_ac_neq: "[[a b c]] \<Longrightarrow> a \<noteq> c"
      (* O4 *)
      and abc_bcd_abd: "\<lbrakk>[[a b c]]; [[b c d]]\<rbrakk> \<Longrightarrow> [[a b d]]"
      (* O5 *)
      and some_betw:
        "\<lbrakk>Q \<in> \<P>; a \<in> Q; b \<in> Q; c \<in> Q; a \<noteq> b; a \<noteq> c; b \<noteq> c\<rbrakk>
               \<Longrightarrow> [[a b c]] \<or> [[b c a]] \<or> [[c a b]]"

Three of these have mild changes compared to Schutz: O3 and O5 are slightly weaker (having weaker conclusions) since the original statements are actually derivable (in the same locale). In O4, Schutz’ condition that $a, b, c, d$ be distinct has to be removed. This is because distinctness of $a, b, c$ and of $b, c, d$ is already implied by O3, and requiring $a \neq d$ makes Schutz’ proof of Theorem 4.1 impossible (see Sec. 4.1).

We prove Schutz’ Axiom O3 from our formulation of Axioms O2, O3, O4; and Schutz’ Axiom O5 from our O2 and O5.
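The formalised order axioms O1 to O5 can likewise be tested on a small candidate model. In the sketch below (hypothetical, not taken from the formalisation), the events are the integers 0 to 4, there is a single path containing all of them, and [[a b c]] is read as "b lies strictly between a and c". The last entry checks Schutz' original O3, which the paragraph above derives from the formalised O2 to O4.

```python
from itertools import permutations, product


def check_betweenness(E, P, betw):
    """Check the formalised O1-O5 and Schutz' original O3 on a finite model."""
    triples = [t for t in product(E, repeat=3) if betw(*t)]
    return {
        # O1: related events lie on a common path
        "abc_ex_path": all(any({a, b, c} <= Q for Q in P) for a, b, c in triples),
        # O2: symmetry
        "abc_sym": all(betw(c, b, a) for a, b, c in triples),
        # O3 (weakened): the outer events differ
        "abc_ac_neq": all(a != c for a, b, c in triples),
        # O4 without Schutz' distinctness hypothesis
        "abc_bcd_abd": all(betw(a, b, d)
                           for a, b, c in triples for d in E if betw(b, c, d)),
        # O5 (weakened): one of three cyclic orders holds on each path
        "some_betw": all(betw(a, b, c) or betw(b, c, a) or betw(c, a, b)
                         for Q in P for a, b, c in permutations(Q, 3)),
        # Schutz' original O3, derivable from the formalised O2-O4
        "schutz_O3": all(len({a, b, c}) == 3 for a, b, c in triples),
    }


def strict(a, b, c):
    return a < b < c or c < b < a


def non_strict(a, b, c):
    return a <= b <= c or c <= b <= a


E = set(range(5))
P = {frozenset(E)}
print(check_betweenness(E, P, strict))
print(check_betweenness(E, P, non_strict)["abc_ac_neq"])
```

Replacing strict by non-strict betweenness admits triples such as [[0 0 0]] and so already violates the weakened O3 (abc_ac_neq).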

3.2 Chains

The final axiom of order given by Schutz is analogous to the axiom of Pasch, which is common in axiomatic geometric systems. It is stated in terms of particular subsets of paths called chains, which Schutz defines as follows (schutz1997, , p. 11).

Definition 1

A sequence of events $Q_0, Q_1, Q_2, \dots$ (of a path $Q$) is called a chain if:

  1. it has two distinct events, or

  2. it has more than two distinct events and for all $i \geq 2$, $[Q_{i-2}\, Q_{i-1}\, Q_i]$.

This is hard to represent in Isabelle because of the notion of a sequence as an indexed set. The informal naming convention of using a label $Q_i$ for an event encodes two pieces of information: that the event lies on path $Q$, and that several betweenness relations hold with other events indexed by adjacent natural numbers. Following Palmer and Fleuriot palmer2018 and Scott (scott2015, , p. 110), we explicitly give an indexing function $f$ (with $f(i) = Q_i$) that is order-preserving, and use this to define chains. The predicate ordering formalises what we mean by “order-preserving”, taking as arguments an indexing function f, a ternary relation ord on the codomain of f, and a set of events X.

definition ordering ::
  "(nat \<Rightarrow> 'a) \<Rightarrow> ('a \<Rightarrow> 'a \<Rightarrow> 'a \<Rightarrow> bool) \<Rightarrow> 'a set \<Rightarrow> bool"
  where "ordering f ord X
    \<equiv> (\<forall>n. (finite X \<longrightarrow> n < card X) \<longrightarrow> f n \<in> X) \<and>
      (\<forall>x\<in>X. (\<exists>n. (finite X \<longrightarrow> n < card X) \<and> f n = x)) \<and>
      (\<forall>n n' n''. (finite X \<longrightarrow> n'' < card X) \<and> n<n' \<and> n'<n''
                \<longrightarrow> ord (f n) (f n') (f n''))"

Our chains differ from Schutz’ in that they use sets instead of his sequences, and that while he assumes chains to lie on paths, we prove this as a theorem (chain_on_path). We also have a stronger condition on preserving long-range order: in our case, $[f(n)\, f(n')\, f(n'')]$ must hold for any $n < n' < n''$, while Schutz only considers consecutive indices $n' = n + 1$, $n'' = n + 2$. (A kind of chain that is closer to Schutz’ definition is briefly introduced in Sec. 4.1.) Note that we split the definition between chains of two events, short_ch, and chains with at least three events, long_ch_by_ord, as Schutz does. The abbreviation path_ex used in the definition of the two-event chain asserts that two elements are distinct, and that there is a path containing both. The cardinality of a set $X$, denoted $|X|$ in prose, is card X in Isabelle. It is a natural number, and infinite sets have cardinality $0$, just like the empty set does. The conditions involving cardinality in ordering are used to ensure that a natural number is a valid index into the chain.

definition short_ch :: "'a set \<Rightarrow> bool"
  where "short_ch X \<equiv>
    \<exists>x\<in>X. \<exists>y\<in>X. path_ex x y \<and> \<not>(\<exists>z\<in>X. z\<noteq>x \<and> z\<noteq>y)"
definition long_ch_by_ord :: "(nat \<Rightarrow> 'a) \<Rightarrow> 'a set \<Rightarrow> bool"
  where "long_ch_by_ord f X \<equiv>
    \<exists>x\<in>X. \<exists>y\<in>X. \<exists>z\<in>X. x\<noteq>y \<and> y\<noteq>z \<and> x\<noteq>z \<and> ordering f betw X"
definition fin_long_chain :: "(nat\<Rightarrow>'a)\<Rightarrow>'a\<Rightarrow>'a\<Rightarrow>'a\<Rightarrow>'a set\<Rightarrow>bool"
    ("[_[_ .. _ ..  _]_]")
  where "fin_long_chain f x y z Q \<equiv>
    x\<noteq>y \<and> x\<noteq>z \<and> y\<noteq>z \<and> finite Q \<and> long_ch_by_ord f Q \<and>
    f 0 = x \<and> y\<in>Q \<and> f (card Q - 1) = z"

Two auxiliary definitions are made to capture Schutz’ prose definitions more directly.

definition ch_by_ord :: "(nat \<Rightarrow> 'a) \<Rightarrow> 'a set \<Rightarrow> bool"
  where "ch_by_ord f X \<equiv>
    short_ch X \<or> long_ch_by_ord f X"
definition ch :: "'a set \<Rightarrow> bool"
  where "ch X \<equiv> \<exists>f. ch_by_ord f X"
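For finite sets, the definitions of ordering and of the various chains can be mirrored in Python, which is convenient for experimenting with small examples. This sketch rests on our own assumptions: sets are finite (so card X is len(X) and the finiteness guards in ordering always apply), indexing functions are Python callables, and betw is a toy strict order on numbers standing in for the undefined betweenness relation. The names follow the Isabelle definitions, but the code is an illustration, not a verified translation of the formalisation.

```python
from itertools import combinations, permutations


def ordering(f, ord_, X):
    """Mirror of the Isabelle predicate `ordering` for a finite set X."""
    idx = range(len(X))                                  # valid indices n < card X
    return (all(f(n) in X for n in idx)                  # f maps indices into X
            and all(any(f(n) == x for n in idx) for x in X)  # and onto X
            and all(ord_(f(i), f(j), f(k))               # every triple i < j < k
                    for i, j, k in combinations(idx, 3)))


def betw(a, b, c):
    """Toy betweenness: strict order on numbers lying on one path."""
    return a < b < c or c < b < a


def path_ex(P, x, y):
    return x != y and any(x in Q and y in Q for Q in P)


def short_ch(P, X):
    return len(X) == 2 and path_ex(P, *X)


def long_ch_by_ord(f, X):
    return len(X) >= 3 and ordering(f, betw, X)


def ch_by_ord(P, f, X):
    return short_ch(P, X) or long_ch_by_ord(f, X)


def ch(P, X):
    # "exists f": for finite X it suffices to try every enumeration of X
    return any(ch_by_ord(P, seq.__getitem__, X) for seq in permutations(X))


def fin_long_chain(f, x, y, z, Q):
    return (len({x, y, z}) == 3 and long_ch_by_ord(f, Q)
            and f(0) == x and y in Q and f(len(Q) - 1) == z)


P = {frozenset(range(6))}                                # one path: 0, ..., 5
print(fin_long_chain((1, 3, 4).__getitem__, 1, 3, 4, {1, 3, 4}))
print(ordering((3, 1, 4).__getitem__, betw, {1, 3, 4}))
print(ch(P, {1, 3}), ch(P, {1}))
```

The enumeration (3, 1, 4) of {1, 3, 4} is rejected by ordering, while (1, 3, 4) and its reverse are accepted; ch succeeds as soon as some enumeration works.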

We point out the notation: a fin_long_chain is denoted [f[x..y..z]X], and we carry the indexing function and the set of all chain elements explicitly; this is absorbed into Schutz’ subscripting notation. We are now ready to describe the final axiom of order.

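Axiom (O6)

If $Q$, $R$, $S$ are distinct paths which meet at events $a \in Q \cap R$, $b \in Q \cap S$, $c \in R \cap S$, and if:

  1. there is an event $d \in S$ such that $[bcd]$, and

  2. there is an event $e \in R$ and a path $T$ which passes through both $d$ and $e$ such that $[cea]$,

then $T$ meets $Q$ in an event $f$ which belongs to a finite chain $[a \ldots f \ldots b]$.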

locale MinkowskiChain = MinkowskiBetweenness +
  assumes O6:
    "\<lbrakk>Q \<in> \<P>; R \<in> \<P>; S \<in> \<P>; T \<in> \<P>; Q \<noteq> R; Q \<noteq> S; R \<noteq> S;
      a \<in> Q\<inter>R \<and> b \<in> Q\<inter>S \<and> c \<in> R\<inter>S;
      \<exists>d\<in>S. [[b c d]] \<and> (\<exists>e\<in>R. d \<in> T \<and> e \<in> T \<and> [[c e a]])\<rbrakk>
    \<Longrightarrow> \<exists>f\<in>T\<inter>Q. \<exists>X. [[a..f..b]X]"

Although the statement is technical, the intention of O6 (or Pasch’s axiom) is simple. Using some intuition from Euclidean geometry, a rough translation is: if three paths meet in a triangle, then a fourth path which intersects one side of the triangle externally, and another internally, must meet the third side internally as well (see Fig. 1). Such an intuitive understanding can be justified by noting that similar axioms occur e.g. in Hilbert’s Grundlagen hilbert1950 and its mechanisation meikle2003 ; it is not O6 that makes our system non-Euclidean.

Figure 1: Intuitive visualisation of axiom O6. A path $T$ that meets $S$ externally to the triangle (in $d$) and meets $R$ internally (in $e$) must meet the third side $Q$ of the triangle internally (in $f$).
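Since O6 plays the role of Pasch's axiom, the Euclidean intuition of Fig. 1 can be checked numerically. The sketch below works in the Euclidean plane with exact rational arithmetic; it is our own illustration (the helper names lerp, meet and pasch are hypothetical) and is not a model of Schutz' system. It reads betweenness on a line as a parameter strictly between 0 and 1, places d on S beyond c and e on R between c and a, and returns the parameter at which T = de meets the line Q through a and b.

```python
from fractions import Fraction


def lerp(p, q, s):
    """The point p + s (q - p); 0 < s < 1 means strictly between p and q."""
    return (p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1]))


def meet(p1, p2, q1, q2):
    """(s, u) with p1 + s (p2 - p1) = q1 + u (q2 - q1), or None if parallel."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (q2[0] - q1[0], q2[1] - q1[1])
    den = d1[0] * d2[1] - d1[1] * d2[0]
    if den == 0:
        return None
    w = (q1[0] - p1[0], q1[1] - p1[1])
    s = Fraction(w[0] * d2[1] - w[1] * d2[0]) / den   # Cramer's rule
    u = Fraction(w[0] * d1[1] - w[1] * d1[0]) / den
    return s, u


def pasch(a, b, c, t_d, t_e):
    """Triangle a, b, c with Q = ab, R = ca, S = bc.

    d = b + t_d (c - b) gives [b c d] when t_d > 1;
    e = c + t_e (a - c) gives [c e a] when 0 < t_e < 1.
    Returns s such that T = de meets Q at f = a + s (b - a).
    """
    d = lerp(b, c, t_d)
    e = lerp(c, a, t_e)
    s, _ = meet(a, b, d, e)   # T is not parallel to Q under these hypotheses
    return s


print(pasch((0, 0), (4, 0), (0, 4), 2, Fraction(1, 2)))
```

For the triangle a = (0, 0), b = (4, 0), c = (0, 4) with d = (-4, 8) and e = (0, 2), T meets Q at f = (4/3, 0), so s = 1/3 and f lies strictly between a and b.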

3.3 Unreachability

While the axioms of the previous sections establish a geometry, nothing in them excludes a Euclidean space with Galilean relativity, i.e. velocities that are additive across reference frames (schutz1997, , p. 12). Crucially, no speed limit is implied so far, and thus there is no trajectory through space and time that is forbidden. The next group of axioms (I5-I7) specifies existence and basic properties of unreachable sets, a concept tightly linked to the lightcones often used in relativistic physics (gourgoulhon2013g, , sec. 1.4). In fact, if we pre-empt significantly, and hypothesise our undefined paths to relate to observer worldlines, one can glean the notion of an ultimate speed limit hidden in the condition that certain regions of spacetime should not be connected by paths. Ultimately, saying that nothing can move faster than some speed is merely the statement that certain histories or trajectories through space and time should not occur. We begin by formalising Schutz’ various notions of unreachable sets.

Definition 2 (Unreachable Subset from an Event)

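Given a path $Q \in \mathcal{P}$ and an event $b \in \mathcal{E} \setminus Q$, we define the unreachable subset of $Q$ from $b$ to be $Q(b, \emptyset) = \{x \in Q : \text{there is no path containing both } b \text{ and } x\}$.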

definition unreachable_subset ::
  "'a set \<Rightarrow> 'a \<Rightarrow> 'a set" ("\<emptyset> _ _" [100, 100] 100)
  where "unreachable_subset Q b
           \<equiv> {x\<in>Q. Q \<in> \<P> \<and> b \<in> \<E> \<and> b \<notin> Q \<and> \<not>(path_ex b x)}"

The pen-and-paper definition is simple enough: it collects all the events of a path $Q$ that cannot be connected (by a path) to another event $b$. In prose, we use Schutz’ notation $Q(b, \emptyset)$, where $\emptyset$ is used like a flag filtering elements of $Q$, whereas the Isabelle version uses \<emptyset> Q b, where \<emptyset> behaves as a function symbol. Note that the empty set in Isabelle is denoted {}, so ambiguity is not an issue.

The second definition is more complex. If $Q$ meets $R$ at $x$, we use the notation \<emptyset> Q from Qa via R at x to collect all events $Q_y$ that lie between $x$ and $Q_a$, i.e. on the side of the intersection given by $Q_a$. The collected events must also satisfy a further condition: some event $R_w$ on $R$ is connected neither to $Q_a$ nor to $Q_y$ (see Fig. 2).

Definition 3 (Unreachable Subset via a Path (schutz1997, p. 16))

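For any two distinct paths $Q, R$ which meet at an event $x$, and any event $Q_a \in Q$, we define the unreachable subset of $Q$ from $Q_a$ via $R$ to be

$$Q(Q_a, R, x, \emptyset) = \{Q_y \mid [x\,Q_y\,Q_a] \text{ and } \exists R_w \in R.\; Q_a \in Q(R_w, \emptyset) \wedge Q_y \in Q(R_w, \emptyset)\}.$$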

definition unreachable_subset_via ::
  "'a set \<Rightarrow> 'a \<Rightarrow> 'a set \<Rightarrow> 'a \<Rightarrow> 'a set"
    ("\<emptyset> _ from _ via _ at _" [100, 100, 100, 100] 100)
  where "unreachable_subset_via Q Qa R x
    \<equiv> {Qy. [[x Qy Qa]] \<and> (\<exists>Rw\<in>R. Qa \<in> \<emptyset> Q Rw \<and> Qy \<in> \<emptyset> Q Rw)}"
Figure 2: The event $Q_y$ belongs to the unreachable subset of $Q$ from $Q_a$ via $R$. Thus there is an event $R_w \in R$ such that there are no paths connecting $R_w$ to $Q_a$ or to $Q_y$ (dashed lines). In this case, $Q_y$ also belongs to the unreachable subset of $Q$ from $R_w$.

Next, we give the formalised axioms I5-I7, introduced in the locale MinkowskiUnreachable, together with their prose formulation and some comments. Axiom I5 is simple once unreachable sets from events are understood. It has important implications for many proofs, since it is necessary to guarantee that the empty set is not a path (see Sec. 2.3.2, where this result serves as an example listing). It is the only axiom that mentions the existence of events on a path.

Axiom (I5)

For any path $Q$ and any event $b \notin Q$, the unreachable set $Q(b, \emptyset)$ contains (at least) two events.
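In the hypothetical 1+1-dimensional coordinate model used above ($c = 1$, paths as timelike straight lines), Axiom I5 can be seen directly. Take $Q$ to be the worldline $x = x_0$ and $b = (t_b, x_b)$ with $x_b \neq x_0$. Then $Q(b, \emptyset)$ consists of the events $(t, x_0)$ with $|t - t_b| \le |x_b - x_0|$, which form a closed interval with two distinct endpoints. The sketch below returns those endpoints using exact rational arithmetic; the function name is an assumption of the sketch.

```python
from fractions import Fraction


def unreachable_interval(x0, b):
    """Time coordinates (lo, hi) of Q(b, empty) on the worldline Q: x = x0 (c = 1).

    An event (t, x0) is unreachable from b = (tb, xb) iff |t - tb| <= |xb - x0|.
    """
    tb, xb = b
    d = abs(xb - x0)
    if d == 0:
        raise ValueError("b lies on Q")
    return tb - d, tb + d


lo, hi = unreachable_interval(Fraction(0), (Fraction(5), Fraction(1, 3)))
print(lo, hi)  # 14/3 16/3
```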

locale MinkowskiUnreachable = MinkowskiChain +
  assumes (*I5*) two_in_unreach:
    "\<lbrakk>Q \<in> \<P>; b \<in> \<E>; b \<notin> Q\<rbrakk> \<Longrightarrow> \<exists>x\<in>\<emptyset> Q b. \<exists>y\<in>\<emptyset> Q b. x \<noteq> y"

Schutz calls axiom I6 “Connectedness of the Unreachable Set”. Indeed, given two events on a path $Q$ that are unreachable from $b$, it essentially states that any events between them must be unreachable too. This is phrased in terms of a finite chain with endpoints $Q_x$ and $Q_z$.

Axiom (I6)

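Given any path $Q$, any event $b \notin Q$ and distinct events $Q_x, Q_z \in Q(b, \emptyset)$, there is a finite chain $[Q_0 \ldots Q_n]$ with $Q_0 = Q_x$ and $Q_n = Q_z$ such that for all $i \in \{1, \ldots, n\}$,

  1. $Q_i \in Q(b, \emptyset)$, and

  2. $[Q_{i-1}\,Q_y\,Q_i] \implies Q_y \in Q(b, \emptyset)$.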

Notice the extra clause for short chains in the formalisation. If we have only two events, ternary ordering is meaningless, and so are the index-based conditions above. Schutz often does not mention two-event chains, perhaps supposing that this part of a proof is obvious. Isar statements and proofs, however, have to be split, which makes them more complicated. The two-event clause was needed for the proof of Theorem 4.13 (see Sec. 4.8).

    assumes I6:
    "\<lbrakk>Q \<in> \<P>; b \<notin> Q; b \<in> \<E>; Qx \<in> (\<emptyset> Q b); Qz \<in> (\<emptyset> Q b)\<rbrakk>
     \<Longrightarrow> \<exists>X f. ch_by_ord f X \<and> f 0 = Qx \<and> f (card X - 1) = Qz \<and>
         (\<forall>i\<in>{1 .. card X - 1}. (f i) \<in> \<emptyset> Q b \<and>
             (\<forall>Qy\<in>\<E>. [[(f(i-1)) Qy (f i)]] \<longrightarrow> Qy \<in> \<emptyset> Q b)) \<and>
         (short_ch X \<longrightarrow> Qx\<in>X \<and> Qz\<in>X \<and>
            (\<forall>Qy\<in>\<E>. [[Qx Qy Qz]] \<longrightarrow> Qy \<in> \<emptyset> Q b))"
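In the same illustrative 1+1-dimensional model, the property that I6 encodes can be checked by brute force on a sampled worldline: any event between two unreachable events is itself unreachable. This checks only that convexity in one model, not the chain formulation of the axiom. Betweenness on $Q$ is modelled as strict order of time coordinates, which is an assumption of the sketch.

```python
from fractions import Fraction
from itertools import combinations


def timelike(p, q):
    """Events p, q can lie on a common path iff |dt| > |dx| (c = 1)."""
    return abs(p[0] - q[0]) > abs(p[1] - q[1])


def between(a, b, c):
    """[a b c] for events on the worldline x = 0: strict order of the times."""
    return a[0] < b[0] < c[0] or c[0] < b[0] < a[0]


b = (Fraction(0), Fraction(1))                                 # event off Q
Q = [(Fraction(k, 4), Fraction(0)) for k in range(-12, 13)]    # t in [-3, 3], step 1/4
unreach = [q for q in Q if not timelike(q, b)]

# Look for a reachable event lying between two unreachable ones (there should be none).
violations = [(qx, qy, qz) for qx, qz in combinations(unreach, 2)
              for qy in Q if between(qx, qy, qz) and timelike(qy, b)]
print(len(unreach), violations)  # 9 []
```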

Axiom I7 about the “Boundedness of the Unreachable Set” is reminiscent of the Archimedean property, namely that one can “leave” the unreachable set in finitely many “steps”. A simplified illustration is given in Fig. 4.

Axiom (I7)

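Given any path $Q$, any event $b \notin Q$, and events $Q_x \in Q \setminus Q(b, \emptyset)$ and $Q_y \in Q(b, \emptyset)$, there is a finite chain

$$[Q_0 \ldots Q_m \ldots Q_n]$$

with $Q_0 = Q_x$, $Q_m = Q_y$ and $Q_n \in Q \setminus Q(b, \emptyset)$.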

We drop the double naming of the events $Q_x = Q_0$ and $Q_y = Q_m$, noting that the index of $Q_y$ is implied once the chain is defined. The complement of the unreachable set, $Q \setminus Q(b, \emptyset)$, is best thought of as all the events of path $Q$ that can be reached by a path passing through $b$. Axiom I7 is then straightforwardly formalised as:

    assumes I7:
    "\<lbrakk>Q \<in> \<P>; b \<notin> Q; b \<in> \<E>; Qx \<in> Q - \<emptyset> Q b; Qy \<in> \<emptyset> Q b\<rbrakk>
     \<Longrightarrow> \<exists>g X Qn. [g[Qx..Qy..Qn]X] \<and> Qn \<in> Q - \<emptyset> Q b"

3.4 Symmetry and Continuity

The final two axioms, symmetry and continuity, both receive their own locale. Although neither is used in proofs in this paper, we still present them in full as they are non-trivial to formalise in Isabelle.

The axiom of symmetry is a hefty statement that, according to Schutz schutz1997, serves as a replacement for an entire axiom group in geometries such as Hilbert’s Grundlagen. Continuity is simple to state, but relies on mechanised definitions of bounds and closest bounds. We break up the presentation of the formalised axiom of symmetry, explaining the conclusion as we go along. See also Figure 3.

Axiom (S (Symmetry))

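If $Q, R, S$ are distinct paths which meet at some event $x$ and if $Q_a \in Q$ is an event distinct from $x$ such that

$$Q(Q_a, R, x, \emptyset) = Q(Q_a, S, x, \emptyset),$$

then

  1. there is a mapping $\theta : \mathcal{E} \to \mathcal{E}$

  2. which induces a bijection $\Theta : \mathcal{P} \to \mathcal{P}$, such that

  3. the events of $Q$ are invariant, and

  4. $\Theta : R \mapsto S$.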

Figure 3: Visualisation of Axiom S. The unreachable subsets of $Q$ from $Q_a$ via $R$ and via $S$ (indicated by dashed lines) are equal, so the induced symmetry mapping takes $R$ to $S$.
locale MinkowskiSymmetry = MinkowskiUnreachable +
  assumes Symmetry:
    "\<lbrakk>Q \<in> \<P>; R \<in> \<P>; S \<in> \<P>; Q \<noteq> R; Q \<noteq> S; R \<noteq> S;
    x \<in> Q\<inter>R\<inter>S; Q\<^sub>a \<in> Q; Q\<^sub>a \<noteq> x;
    \<emptyset> Q from Q\<^sub>a via R at x = \<emptyset> Q from Q\<^sub>a via S at x\<rbrakk>

The first two lines essentially say that $Q, R, S$ are distinct paths in SPRAY x (see Sec. 3.5), and obtain an event $Q_a \neq x$ on $Q$. The third line states that the unreachable subsets of $Q$ from the source $Q_a$ via $R$ and via $S$ are the same.


We split up the conclusion of the axiom, reproducing Schutz’ prose (schutz1997, p. 16) for each of the parts (i)-(iv); notice that the first line below quantifies over the entire conclusion.

  1. there is a mapping $\theta : \mathcal{E} \to \mathcal{E}$

        \<Longrightarrow> \<exists>\<theta>::'a\<Rightarrow>'a.
  2. which induces a bijection $\Theta : \mathcal{P} \to \mathcal{P}$ (Schutz does not give an explicit form for $\Theta$; since the set of paths is contained in the powerset of events, taking the direct image under $\theta$ to be the induced bijection seems the only choice)

                    bij_betw (\<lambda>P. {\<theta> y | y. y\<in>P}) \<P> \<P> \<and>
  3. the events of $Q$ are invariant, and

                    (y\<in>Q \<longrightarrow> \<theta> y = y) \<and>
  4. $\Theta : R \mapsto S$

                    (\<lambda>P. {\<theta> y | y. y\<in>P}) R = S

Schutz’ statement is not completely clear on whether he means $Q$ to be invariant under $\theta$ or under $\Theta$. We settled on the stronger version, involving $\theta$-invariance: it is stronger than the alternative only by also preserving the ordering of the events on $Q$. Since this ordering affects unreachable sets, not preserving it seemed to go against the spirit of the axiom.
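As an illustration of the direct-image reading of $\Theta$, consider again the hypothetical 1+1-dimensional coordinate model. Let $Q$ be the worldline $x = 0$, and let $R$ and $S$ be the worldlines $x = vt$ and $x = -vt$ through the origin. The spatial reflection $\theta(t, x) = (t, -x)$ fixes every event of $Q$, and its direct image sends $R$ to $S$, which matches conclusions (iii) and (iv). The sketch samples each worldline at finitely many events and does not check the hypothesis of the axiom.

```python
from fractions import Fraction


def theta(e):
    """Spatial reflection (t, x) -> (t, -x) in the 1+1 coordinate model."""
    t, x = e
    return (t, -x)


def image(path):
    """Induced map on paths: the direct image of a set of events under theta."""
    return {theta(e) for e in path}


v = Fraction(1, 2)                               # |v| < 1, so x = v t is timelike
ts = [Fraction(k, 3) for k in range(-6, 7)]      # sample times
Q = {(t, Fraction(0)) for t in ts}               # worldline x = 0
R = {(t, v * t) for t in ts}                     # worldline x = v t
S = {(t, -v * t) for t in ts}                    # worldline x = -v t

print(all(theta(e) == e for e in Q), image(R) == S)  # True True
```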

The axiom of continuity compares to the property of least upper bounds on the real numbers (also called Dedekind completeness). Indeed, Theorem 12 (entitled “Continuity”), the first to use this axiom, deals with sets that look very similar to Dedekind cuts dedekind1963 . Bounds are defined by Schutz only for infinite chains.

Definition 4 ((Closest) Bound (schutz1997, p. 17))

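Given a path $Q$ and an infinite chain $[Q_0\,Q_1\,Q_2 \ldots]$ of events in $Q$, the set

$$B = \{Q_b \mid \forall i, j.\; i < j \implies [Q_i\,Q_j\,Q_b]\}$$

is called the set of bounds of the chain: if $B$ is non-empty we say that the chain is bounded. If there is a bound $Q_{\hat b} \in B$ such that for all $Q_{b'} \in B \setminus \{Q_{\hat b}\}$,

$$[Q_0\,Q_{\hat b}\,Q_{b'}],$$

we say that $Q_{\hat b}$ is a closest bound.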

Axiom (C (Continuity))

Any bounded infinite chain has a closest bound.

The formalisation in this case is straightforward. We formally define bounds first.

definition is_bound_f :: "'a \<Rightarrow> 'a set \<Rightarrow> (nat\<Rightarrow>'a) \<Rightarrow> bool" where
  "is_bound_f Q\<^sub>b Q f \<equiv>
    \<forall>i j ::nat. [f[(f 0)..]Q] \<and> (i<j \<longrightarrow> [[(f i) (f j) Q\<^sub>b]])"
definition bounded :: "'a set \<Rightarrow> bool" where
  "bounded Q \<equiv> \<exists> Q\<^sub>b f. is_bound_f Q\<^sub>b Q f"
definition closest_bound :: "'a \<Rightarrow> 'a set \<Rightarrow> bool" where
  "closest_bound Q\<^sub>b Q \<equiv> \<exists>f. is_bound_f Q\<^sub>b Q f \<and>
    (\<forall> Q\<^sub>b'. (is_bound Q\<^sub>b' Q \<and> Q\<^sub>b \<noteq> Q\<^sub>b') \<longrightarrow> [[(f 0) Q\<^sub>b Q\<^sub>b']])"

The axiom of continuity is now so simple that the Isabelle locale below is easily readable.

locale MinkowskiContinuity = MinkowskiSymmetry +
  assumes Continuity: "bounded Q \<longrightarrow> (\<exists>Q\<^sub>b. closest_bound Q\<^sub>b Q)"
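It may help to look at an ordering without Dedekind completeness to see what Axiom C excludes. The sketch below models the events of a single path by the rational numbers, with betweenness as strict order; this is an assumption for illustration. It takes a strictly increasing chain of rationals converging to $\sqrt{2}$ and shows that every bound can be replaced by a strictly smaller one. Since the chain starts at $Q_0 = 1$, a closest bound would have to be a least bound, so no closest bound exists in this model. The function names are hypothetical.

```python
from fractions import Fraction


def chain(n):
    """First n terms of a strictly increasing rational chain converging to sqrt(2).

    Uses the Pell solutions of x^2 - 2y^2 = -1, so every term x/y lies below sqrt(2).
    """
    x, y, terms = 1, 1, []
    for _ in range(n):
        terms.append(Fraction(x, y))
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
    return terms


def is_rational_bound(q):
    """A rational q lies beyond every term of this chain iff q > 0 and q^2 > 2."""
    return q > 0 and q * q > 2


def smaller_bound(q):
    """One Newton step for sqrt(2): again a bound, and strictly closer to the chain."""
    return (q + 2 / q) / 2


q = Fraction(3, 2)
for _ in range(3):
    nxt = smaller_bound(q)
    assert is_rational_bound(nxt) and nxt < q   # no bound is ever the closest one
    q = nxt
print(q)  # 665857/470832
```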

3.5 Path Dependence and Dimension

The final axiom we introduce is that of dimension. It comes last in our hierarchy of locales so that spacetimes of different dimensions can be constructed by replacing this axiom alone. Thus we found it sensible to have an easily replaceable top layer that specifies only the axiom least critical to the general Minkowski spacetime structure, in case one wants to explore other dimensions.

However, this axiom has a hidden purpose much more fundamental than we first realised. It is the only axiom that excludes a singleton set of events with an empty set of paths from being a model. As a result, the axiom of dimension turns out to be crucial to several fairly basic proofs that geometrically construct several paths, which could not be guaranteed to exist without it. We therefore end up working inside the full MinkowskiSpacetime locale for many more proofs than originally expected (notably, any proof requiring the overlapping ordering lemmas presented in Sec. 4.6). A minor restructuring could isolate an axiom for the existence of at least one path. If applications in higher or lower dimensions are deemed important in future work, this is easily done, and it may not even break independence, as Schutz’ independence model for I4 is simply 1+1-dimensional spacetime. We keep Schutz’ formulation for now.

Defining dimensionality in linear algebra requires the notions of linear dependence and independence. Since vector spaces are not included in our axioms, we need a more basic notion, namely that of paths depending on other paths. This relation is defined only for sets of paths that all pass through one event; the set of all paths through a given event is called a SPRAY (schutz1997, p. 13).

Definition 5

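Given any event $x$, the SPRAY of $x$ is the set of all paths containing $x$: $\mathrm{SPRAY}[x] = \{R \in \mathcal{P} \mid x \in R\}$.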

definition SPRAY :: "'a \<Rightarrow> ('a set) set"
  where "SPRAY x \<equiv> {R\<in>\<P>. x \<in> R}"

Path dependence in a SPRAY is defined first for a set of three paths (schutz1997, p. 13):

Definition 6

A subset of three paths of a SPRAY is dependent if there is a path which does not belong to the SPRAY and which contains one event from each of the three paths: we also say any one of the three paths is dependent on the other two. Otherwise the subset is independent.

definition dep3_event :: "'a set \<Rightarrow> 'a set \<Rightarrow> 'a set \<Rightarrow> 'a \<Rightarrow> bool"
  where "dep3_event Q R S x
    \<equiv> Q \<noteq> R \<and> Q \<noteq> S \<and> R \<noteq> S
      \<and> Q \<in> SPRAY x \<and> R \<in> SPRAY x \<and> S \<in> SPRAY x
      \<and> (\<exists>T\<in>\<P>. T \<notin> SPRAY x
        \<and> (\<exists>y\<in>Q. y \<in> T) \<and> (\<exists>y\<in>R. y \<in> T) \<and> (\<exists>y\<in>S. y \<in> T))"

To obtain path dependence for an arbitrary number of paths, we extend the base case above by induction, quoting Schutz (schutz1997, p. 14):

Definition 7

A path $T$ is dependent on a set $S$ of $n$ paths of a SPRAY (where $n \geq 3$) if it is dependent on two paths $S_1$ and $S_2$, where each of these two paths is dependent on some subset of $n - 1$ paths from the set $S$. We also say that the set of paths $S \cup \{T\}$ is a dependent set. If a set of paths has no dependent subset, we say that the set of paths is an independent set.

inductive dep_path :: "'a set \<Rightarrow> ('a set) set \<Rightarrow> 'a \<Rightarrow> bool"
  where
    dep_two: "dep3_event T A B x \<Longrightarrow> dep_path T {A, B} x"
  | dep_n: "\<lbrakk>S \<subseteq> SPRAY x; card S \<ge> 3; dep_path T {S1, S2} x;
      S' \<subseteq> S; S'' \<subseteq> S; card S' = card S - 1; card S'' = card S - 1;
      dep_path S1 S' x; dep_path S2 S'' x\<rbrakk>
        \<Longrightarrow> dep_path T S x"
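Because dep_path is inductive, it denotes the least relation closed under dep_two and dep_n. On a finite collection of paths, that least relation can be computed by applying the rules until nothing new is derived. The sketch below does this in Python for a hypothetical, hand-chosen dependence relation standing in for dep3_event. It assumes all paths lie in one SPRAY, so the event $x$ is left implicit, and it is not derived from any geometric model.

```python
from itertools import combinations


def dep_path_closure(paths, dep3):
    """Least set of pairs (T, S) closed under the rules dep_two and dep_n.

    `dep3` is a set of frozensets {T, A, B} standing in for dep3_event; all paths
    are assumed to lie in one SPRAY, so the event x is left implicit.
    """
    # dep_two: a dependent triple {T, A, B} gives dep_path T {A, B} for each choice of T.
    facts = {(t, triple - {t}) for triple in dep3 for t in triple}
    candidates = [frozenset(c) for k in range(3, len(paths) + 1)
                  for c in combinations(sorted(paths), k)]
    changed = True
    while changed:  # apply dep_n until no new fact is derived
        changed = False
        for s in candidates:
            # Paths that depend on some subset of s containing one path fewer.
            lower = {p for (p, s1) in facts if len(s1) == len(s) - 1 and s1 <= s}
            for t, pair in list(facts):
                if len(pair) == 2 and pair <= lower and (t, s) not in facts:
                    facts.add((t, s))
                    changed = True
    return facts


# Hypothetical dependencies: D on {A, B}, F on {B, C}, E on {D, F}.
dep3 = {frozenset("DAB"), frozenset("FBC"), frozenset("EDF")}
facts = dep_path_closure(set("ABCDEF"), dep3)
print(("E", frozenset("ABC")) in facts)  # True
```

Here E depends on {A, B, C} through dep_n, because D depends on {A, B}, F depends on {B, C}, and E depends on {D, F}.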

This definition uses the keyword inductive, which allows us to give a non-recursive base case and introduction rules. It defines dep_path as the least predicate closed under these rules, i.e. the minimal set of triples (T, S, x) such that dep_path T S x holds. Notice that we explicitly keep track of the event $x$ at the source of the SPRAY containing the paths, while Schutz keeps this implicit, referring to it as and when needed. This leaves us only with the job of transforming this inductive definition into an analytical one. A set of paths can then be examined and found dependent or not, rather than only being constructible by the rules.

definition dep_set :: "('a set) set \<Rightarrow> bool"
  where "dep_set S \<equiv> \<exists>x. \<exists>S'\<subseteq>S. \<exists>P\<in>(S-S'). dep_path P S' x"
definition indep_set :: "('a set) set \<Rightarrow> bool"
  where "indep_set S \<equiv> \<not>(\<exists>T \<subseteq> S. dep_set T)"

Now the axiom of dimension can be given as follows, with a final definition:

Definition 8

A SPRAY is a 3-SPRAY if:

  1. it contains four independent paths, and

  2. all paths of the SPRAY are dependent on these four paths.

Axiom (I4 (Dimension))

If $\mathcal{E}$ is non-empty, then there is at least one 3-SPRAY.

Notice that Schutz introduces Axiom I1 into the antecedent of Axiom I4. This serves to preserve independence: the empty set is an obvious model for proving the independence of I1, and in the current formulation, the empty event-set vacuously satisfies Axiom I4.

Formalising the 3-SPRAY in Isabelle/HOL is lengthy because we need to introduce four distinct paths, all of them in the same SPRAY. The final two lines of the definition are the interesting ones. Much like the Axiom of Continuity, the Axiom of Dimension becomes very simple, even in Isabelle, once all the preparation is complete.

definition three_SPRAY :: "'a \<Rightarrow> bool" where
  "three_SPRAY x \<equiv> \<exists>S1\<in>\<P>. \<exists>S2\<in>\<P>. \<exists>S3\<in>\<P>. \<exists>S4\<in>\<P>.
    S1 \<noteq> S2 \<and> S1 \<noteq> S3 \<and> S1 \<noteq> S4 \<and> S2 \<noteq> S3 \<and> S2 \<noteq> S4 \<and> S3 \<noteq> S4
    \<and> S1 \<in> SPRAY x \<and> S2 \<in> SPRAY x \<and> S3 \<in> SPRAY x \<and> S4 \<in> SPRAY x
    \<and> (indep_set {S1, S2, S3, S4})
    \<and> (\<forall>S\<in>SPRAY x. dep_path S {S1,S2,S3,S4} x)"
locale MinkowskiSpacetime = MinkowskiContinuity +
  (* I4 *)
  assumes ex_3SPRAY: "\<E> \<noteq> {} \<Longrightarrow> \<exists>x\<in>\<E>. three_SPRAY x"

4 Formalisation: Temporal Order on a Path

We have formalised all of Schutz’ results from Chapter 3 (Temporal Order on a Path) of his monograph, except for Theorem 12 (Continuity; see Section 5 for a short discussion). In many cases, his statements had to be extended or amended to pass Isabelle’s unforgiving scrutiny. In what follows, rather than giving formal proofs for all of these results, we sketch the proofs given by Schutz and highlight interesting features of their formalisation. We refer to the Isabelle proof document (available at https://github.com/rhjs94/schutz-minkowski-space) for the complete proof script. Where we do not reproduce Schutz’ prose, the original monograph schutz1997 sometimes gives a more extensive account.

We endeavour to present proof procedures at a comfortable level of detail. Fairly often, extra steps required in Isabelle are obvious to the inspecting reader; usually their omission does not obscure the flow of the overall argument. We therefore employ “snipping” rather freely. We denote by <proof> a proof that was cut, but exists in the associated proof script. The notation (*…*) is used for cutting away multiple not necessarily related lines, or even just a part of a line. This relaxation is possible because we trust the Isabelle verification of our proof: if one wanted to verify all the statements in this paper, one could simply make sure they exist in the Isabelle theory, identify the introduced axioms, and let Isabelle check the entire file. Regardless of snipping, all results presented are completed and accepted by Isabelle.

The following section is ordered as in Schutz’ monograph, and this structure is reflected in the formal proof document as well.

4.1 Order on a finite chain

Theorem 4.1

If $[a\,b\,c]$ then $[c\,b\,a]$ and no other order.

The point of this theorem is really to exclude other orders, as $[c\,b\,a]$ is explicitly established by Axiom O2. Schutz proceeds by contradiction, and following him forced us to change Axiom O4. For example, Schutz claims that $[b\,c\,a]$ implies (with $[a\,b\,c]$) the order $[a\,b\,a]$ via Axiom O4. This works only if Axiom O4 is changed to allow the case $a = d$, in the notation of its definition in Sec. 3. We obtain a contradiction from $[a\,b\,a]$ and Axiom O3, which applies here to give $a \neq a$.

theorem theorem1:
  assumes abc: "[[a b c]]"
  shows "[[c b a]] \<and> \<not> [[b c a]] \<and> \<not> [[c a b]]"

Our formalisation is concerned only with two of the four impossible orderings, $[b\,c\,a]$ and $[c\,a\,b]$; the other two are trivial via Axiom O2. In addition to theorem1, we prove a similar result called abc_only_cba. It concludes only that the four other orderings are impossible given $[a\,b\,c]$, and is used frequently in the rest of the formalisation. Like Theorem 4.1, it follows from Axioms O2, O3, and O4.

lemma abc_only_cba:
  "[[a b c]]
   \<Longrightarrow> \<not> [[b a c]] \<and> \<not> [[a c b]] \<and> \<not> [[b c a]] \<and> \<not> [[c a b]]"
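Theorem 4.1 and abc_only_cba can be sanity-checked in the simplest linear model of a path. In this model, events are integers and $[a\,b\,c]$ holds when $b$ lies strictly between $a$ and $c$; the model is an assumption for illustration. The brute-force check below confirms the statements only in this model and is no substitute for the derivation from Axioms O2-O4.

```python
from itertools import permutations


def betw(a, b, c):
    """[a b c] on a path modelled by the integers: b strictly between a and c."""
    return a < b < c or c < b < a


for a, b, c in permutations(range(6), 3):
    if betw(a, b, c):
        assert betw(c, b, a)                        # Theorem 1: the reverse order holds
        others = [(b, a, c), (a, c, b), (b, c, a), (c, a, b)]
        assert not any(betw(*o) for o in others)    # abc_only_cba: no other order
print("ok")
```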

The second theorem, “Order on a Finite Chain” (schutz1997, p. 18), begins building a link between Schutz’ definition of chains and ours (Sec. 3.2). It allows one to transform a chain with only local orderings (orderings of elements with adjacent indices) into one where any three events on the chain are ordered. The latter property holds for our chains by definition. In this way, Theorem 4.2 justifies our definition, since Schutz’ chains can be immediately transformed into this stronger variety.

Theorem 4.2 (Order on a Finite Chain)

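On any finite chain $[Q_0\,Q_1 \ldots Q_n]$, there is a betweenness relation for each ordered triple; that is

$$\forall i, j, l \;\big(0 \le i < j < l \le n \implies [Q_i\,Q_j\,Q_l]\big).$$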

Furthermore all events of a chain are distinct.

This theorem is true by definition for the chains we define in Sec. 3.2. Indeed, it can be verified by the prover metis in a single line (hurd2003; smolka2013).

theorem (*2*) order_finite_chain:
  assumes chX: "long_ch_by_ord f X"
      and finiteX: "finite X"
      and ordered_nats: "0 \<le> (i::nat) \<and> i < j \<and> j < l \<and> l < card X"
    shows "[[(f i) (f j) (f l)]]"
  by (metis ordering_def chX long_ch_by_ord_def ordered_nats)

In order to check that Schutz’ proof holds, we introduce a new definition for chains, long_ch_by_ord2. This is closer to Schutz’ original definition, and similar to long_ch_by_ord, except for imposing ordering relations only on adjacent events.

definition ordering2 ::
  "(nat \<Rightarrow> 'a) \<Rightarrow> ('a \<Rightarrow> 'a \<Rightarrow> 'a \<Rightarrow> bool) \<Rightarrow> 'a set \<Rightarrow> bool"
  where "ordering2 f ord X
    \<equiv> (*…*) (\<forall>n n' n''.
      (finite X\<longrightarrow>n'' < card X) \<and> Suc n = n' \<and> Suc n' = n''
        \<longrightarrow> ord (f n) (f n') (f n''))"
definition long_ch_by_ord2 ::
  "(nat \<Rightarrow> 'a) \<Rightarrow> 'a set \<Rightarrow> bool"
  where "long_ch_by_ord2 f X
    \<equiv> \<exists>x\<in>X. \<exists>y\<in>X. \<exists>z\<in>X. x\<noteq>y \<and> y\<noteq>z \<and> x\<noteq>z \<and> ordering2 f betw X"

We can then state Theorem 4.2 using this new chain. Notice that Theorem 4.2 strengthens the ordering relations between chain elements to an extent that is sufficient to prove equivalence between long_ch_by_ord and long_ch_by_ord2, provided the chains are finite. This is why we use the former in most of our formalisation: it gives immediate access to a more powerful relationship between chain events.

theorem order_finite_chain2:
  assumes chX: "long_ch_by_ord2 f X"
      and finiteX: "finite X"
      and ordered_nats: "0 \<le> (i::nat) \<and> i < j \<and> j < l \<and> l < card X"
  shows "[[(f i) (f j) (f l)]]"
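The content of Theorem 4.2 can likewise be checked exhaustively in the integer model of a path (an assumption for illustration). That content is twofold: adjacent orderings propagate to all index triples, and chain events are distinct. In the sketch below, the hypothetical functions ordering2 and ordering mirror the two chain definitions for a finite index range. Every map from four indices into five events is tested.

```python
from itertools import product


def betw(a, b, c):
    """[a b c] on a path modelled by the integers."""
    return a < b < c or c < b < a


def ordering2(f):
    """Schutz-style chain: only triples of adjacent indices are required to be ordered."""
    return all(betw(f[k], f[k + 1], f[k + 2]) for k in range(len(f) - 2))


def ordering(f):
    """Our chains: every index triple i < j < l is ordered."""
    n = len(f)
    return all(betw(f[i], f[j], f[l])
               for i in range(n) for j in range(i + 1, n) for l in range(j + 1, n))


checked = 0
for f in product(range(5), repeat=4):   # every map from indices 0..3 to events 0..4
    if ordering2(f):
        assert ordering(f)              # local order propagates to all triples
        assert len(set(f)) == len(f)    # distinct indices label distinct events
        checked += 1
print(checked)  # 10
```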

The proof of Theorem 4.2 follows the outline of Schutz (schutz1997, p. 19). It is split into two proofs by induction, one over decreasing and one over increasing indices. The induction step propagates ordering relations along increasing/decreasing indices using Axioms O2 and O4.

Distinctness of chain events is an obvious consequence of the first part of the theorem and Axiom O3. Our explicit handling of indices allows for a clearer statement of this property, namely that distinct indices label distinct events (i.e. the indexing function is injective). Several such statements are included in the formalisation, and we give an example below. Among the axioms, the proof relies only on O3. It does, however, involve a few case splits according to how we can find a third element for the betweenness relation, e.g. whether or not a natural number exists between $i$ and $j$.

theorem (*2ii*) index_injective:
  fixes i::nat and j::nat
  assumes chX: "long_ch_by_ord2 f X"
      and finiteX: "finite X"
      and indices: "i<j" "j<card X"
    shows "f i \<noteq> f j"

Schutz follows the statement of Theorem 4.2 with the remark that Theorem 4.10 extends it to any finite subset of a path. Indeed, there is a tight relationship between these two results, and we will mention Theorem 4.2 again in Sec. 4.6.

We can now prove explicitly that, in the finite case, our chains are the same as Schutz’. Each direction of the equivalence goes through easily using Theorem 4.2.

lemma ch_equiv:
  assumes "finite X"
  shows "long_ch_by_ord f X \<longleftrightarrow> long_ch_by_ord2 f X"

4.2 First collinearity theorem

We begin by defining a fundamental structure for the geometric proofs to come. This can be intuitively thought of as a triangle – while maintaining the reassurance that Isabelle will not allow us to use any unproven Euclidean intuition about triangles.

Definition 9 (Kinematic Triangle)

A set of three distinct events $a, b, c$ is called a kinematic triangle if each pair of events belongs to one of three distinct paths: we will refer to the kinematic triangle $\triangle abc$, or simply $abc$.

Furthermore, since each path is defined by any two distinct events that lie on it (thanks to Axiom I3), we shall denote a path that contains two distinct events $a$ and $b$ as $ab$. In Isabelle, this shorthand is not possible, but we approximate it using the following abbreviations.

abbreviation path :: "'a set \<Rightarrow> 'a \<Rightarrow> 'a \<Rightarrow> bool" where
  "path ab a b \<equiv> ab \<in> \<P> \<and> a \<in> ab \<and> b \<in> ab \<and> a \<noteq> b"
abbreviation path_of :: "'a \<Rightarrow> 'a \<Rightarrow> 'a set" where
  "path_of a b \<equiv> THE ab. path ab a b"

Theorem 3 is a straightforward application of the Axiom of Collinearity (O6, see also Fig. 1), and is named after it. Schutz provides three results of this name, of increasing complexity; Theorem 7 is the other one included in our formalisation. The Third Collinearity Theorem, numbered 15, is fundamental to Schutz’ treatment of optical lines and causality (schutz1997, chap. 4). Its proof relies heavily on the preceding Collinearity Theorems.

Theorem 4.3 (Collinearity)

Given a kinematic triangle $\triangle abc$ and events $d, e$ such that

  1. there is a path $de$, and

  2. $[b\,c\,d]$ and $[c\,e\,a]$,

then $de$ meets $ab$ in an event $f$ such that $[a\,f\,b]$.

Proof

By the previous theorem (Theorem 2), the statement of the Axiom of Collinearity (Axiom O6) implies $[a\,f\,b]$.

The proof in Isabelle again follows Schutz closely. His proof is a single sentence quoting Axiom O6 and Theorem 2. The formal version expands on it merely by identifying the precise paths to which the Axiom of Collinearity (O6) is applied.

theorem (*3*) (in MinkowskiChain) collinearity:
  assumes tri_abc: "\<triangle> a b c"
      and path_de: "path de d e"
      and bcd: "[[b c d]]"
      and cea: "[[c e a]]"
    shows "(\<exists>f\<in>de\<inter>(path_of a b). [[a f b]])"
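The ordering content of Theorem 4.3 can be illustrated numerically in the Euclidean plane. The sketch treats paths as straight lines and reads $[a\,b\,c]$ as “$b$ lies strictly inside the segment $ac$”. This planar, affine reading is an assumption for illustration and not a model of Schutz’ axioms. The coordinates are arbitrary choices satisfying $[b\,c\,d]$ and $[c\,e\,a]$, and the helper names are hypothetical.

```python
from fractions import Fraction


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def between(a, b, c):
    """[a b c] in the plane: a, b, c collinear and b strictly inside segment ac."""
    if a == c or cross(sub(b, a), sub(c, a)) != 0:
        return False
    ab, ac = sub(b, a), sub(c, a)
    k = 0 if ac[0] != 0 else 1          # use a non-zero coordinate of ac
    t = Fraction(ab[k]) / ac[k]
    return 0 < t < 1


def meet(p, q, r, s):
    """Intersection of line pq with line rs (None if parallel)."""
    d1, d2 = sub(q, p), sub(s, r)
    den = cross(d1, d2)
    if den == 0:
        return None
    t = Fraction(cross(sub(r, p), d2)) / den
    return (p[0] + t * d1[0], p[1] + t * d1[1])


a, b, c = (0, 0), (4, 0), (0, 4)
d, e = (-2, 6), (0, 2)                  # chosen so that [b c d] and [c e a]
assert between(b, c, d) and between(c, e, a)
f = meet(d, e, a, b)                    # where the line de crosses the line ab
print(f, between(a, f, b))
```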

4.3 Boundedness of the unreachable set

In the spirit of Theorem 3, Schutz continues to strengthen the statements made by his axioms. Theorem 4 (Boundedness of the Unreachable Set, see also Fig. 4) is concerned with restating the Axiom I7, which shares its name, in the context of the chain order established in Theorem 2. Schutz’ proof is a one-liner referencing these two results.

Theorem 4.4 (Boundedness of the Unreachable Set)

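Let $Q$ be any path and let $b$ be any event such that $b \notin Q$. Given events $Q_x \in Q \setminus Q(b, \emptyset)$ and $Q_y \in Q(b, \emptyset)$, there is an event $Q_z$ such that

  1. $Q_z \in Q \setminus Q(b, \emptyset)$, and

  2. $[Q_x\,Q_y\,Q_z]$.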

Figure 4: Boundedness of the Unreachable Set. Given $Q_x$ (reachable from $b$) and $Q_y$ (unreachable from $b$), Theorem 4.4 obtains $Q_z$ (reachable from $b$). Axiom I7 furthermore states that all three events must be part of a finite chain.

Formalisation is again very simple, and in fact, Theorem 4.4 can be proven in one step by Isabelle’s metis. The only results needed for this (apart from the theorem assumptions, I7, and Theorem 2) are the definition of chains and a corollary of Theorem 2 without explicit indices (fin_ch_betw).

lemma fin_ch_betw:
  assumes "[f[a..b..c]X]"
  shows "[[a b c]]"
theorem (*4*) (in MinkowskiUnreachable) unreachable_set_bounded:
  assumes path_Q: "Q \<in> \<P>"
      and b_nin_Q: "b \<notin> Q"
      and b_event: "b \<in> \<E>"
      and Qx_reachable: "Qx \<in> Q - \<emptyset> Q b"
      and Qy_unreachable: "Qy \<in> \<emptyset> Q b"
  shows "\<exists>Qz\<in>Q - \<emptyset> Q b. [[Qx Qy Qz]] \<and> Qx \<noteq> Qz"
  using assms I7 order_finite_chain fin_long_chain_def
  by (metis fin_ch_betw)
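In the hypothetical 1+1-dimensional model used earlier ($Q$ the worldline $x = 0$, $c = 1$), the event promised by Theorem 4.4 can be written down explicitly. Here $Q(b, \emptyset)$ is an interval of times, and stepping past its far end from $Q_x$ gives a reachable $Q_z$ with $[Q_x\,Q_y\,Q_z]$. The function name reachable_beyond is an assumption of this sketch.

```python
from fractions import Fraction


def timelike(p, q):
    """Events p, q can lie on a common path iff |dt| > |dx| (c = 1)."""
    return abs(p[0] - q[0]) > abs(p[1] - q[1])


def between(a, b, c):
    """[a b c] for events on the worldline x = 0: strict order of the times."""
    return a[0] < b[0] < c[0] or c[0] < b[0] < a[0]


def reachable_beyond(qx, qy, b):
    """Given reachable Qx and unreachable Qy on Q (x = 0), return a reachable Qz with [Qx Qy Qz]."""
    if not timelike(qx, b) or timelike(qy, b):
        raise ValueError("need Qx reachable and Qy unreachable from b")
    d = abs(b[1])                          # b = (tb, xb) lies off Q, so d > 0
    lo, hi = b[0] - d, b[0] + d            # Q(b, empty) = {(t, 0) : lo <= t <= hi}
    t = hi + 1 if qx[0] < lo else lo - 1   # step past the end of the interval away from Qx
    return (t, Fraction(0))


b = (Fraction(0), Fraction(1))
qx, qy = (Fraction(-2), Fraction(0)), (Fraction(1, 2), Fraction(0))
qz = reachable_beyond(qx, qy, b)
print(qz[0], timelike(qz, b), between(qx, qy, qz))  # 2 True True
```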

Theorem 5 allows one to generate additional events, given an event and a path: a second event on the same path, and a reachable event outside the path. After Theorem 3, this is the next non-trivial proof of the monograph. The events provided by Theorem 4.5 form a triangle of paths, thus enabling very geometric proofs of several lemmas leading up to Theorem 4.9. These lemmas are amongst the most important results for this work, both practically and conceptually, allowing one to conclude new betweenness relations from existing ones (similarly to Axiom O4).

Theorem 4.5 (First Existence Theorem)

Given a path $Q$ and an event $a \in Q$, there is

  1. an event $b \in Q$ with $b$ distinct from $a$, and

  2. an event $c \notin Q$ and a path $ac$ (distinct from $Q$).

Schutz first shows that there is an event outside the path $Q$. This is done by contradiction, i.e. by showing that there cannot be a path $Q$ containing all events (by Axiom I3, this would be the only existing path). We encapsulate this statement, the crux of the proof of Theorem 5(i), in a helper lemma.

lemma (in MinkowskiUnreachable) only_one_path:
  assumes path_Q: "Q \<in> \<P>"
      and all_inQ: "\<forall>a\<in>\<E>. a \<in> Q"
      and path_R: "R \<in> \<P>"
  shows "R = Q"

In addition to Axiom I3, we require I5 in order to prove this, which Schutz omits. The proof is again by contradiction: if a path $R$ exists that is not $Q$, then, since $Q$ contains all events, $R \subseteq Q$. A contradiction to I3, the Axiom of Uniqueness (of paths), can only be obtained if there are two events on $R$, which is guaranteed by Axiom I5. The remainder of Theorem 5(i) follows Schutz, using I4 to rule out the case that $Q$ contains all events, and I5 again to obtain the required event $b$.

The second statement of Theorem 5 is proved as in the original prose. In particular, now that we have two events, a second path is implied by Axiom I2, as in the statement below.

lemma ex_crossing_path:
  assumes path_Q: "Q \<in> \<P>"
  shows "\<exists>R\<in>\<P>. R \<noteq> Q \<and> (\<exists>e. e \<in> R \<and> e \<in> Q)"

Our proof then follows the case split made by Schutz. The second case becomes a little longer than in prose, but there are no surprises. Both cases use Axiom I5 to obtain the desired reachable event $c$. The final pair of statements for Theorem 5 is listed below.

theorem (*5i*) ge2_events:
  assumes path_Q: "Q \<in> \<P>"
      and a_inQ: "a \<in> Q"
  shows "\<exists>b\<in>Q. b \<noteq> a"
theorem (*5ii*) ex_crossing_at:
  assumes path_Q: "Q \<in> \<P>"
      and a_inQ: "a \<in> Q"
  shows "\<exists>ac\<in>\<P>. ac \<noteq> Q \<and> (\<exists>c. c \<notin> Q \<and> a \<in> ac \<and> c \<in> ac)"

4.4 Prolongation

Theorem 4.6 goes a little further in justifying our intuition of paths as line-like objects by showing they are infinite. This also gives us the means to always find more events on a path.

Theorem 4.6 (Prolongation)
  1. If $a, b$ are distinct events of a path $Q$, then there is an event $c \in Q$ such that $[a\,b\,c]$.

  2. Each path contains an infinite set of distinct events.

Schutz’ proof (schutz1997, p. 21) of the first part is straightforward, and remains so in Isabelle: the formal proof reads almost exactly like Schutz’ prose. Theorem 5(ii) provides an event outside $Q$ together with a path through it that meets $Q$. Axiom I5 then guarantees the existence of an event of $Q$ that is unreachable from this outside event, and Theorem 4 delivers the desired event $c$.

lemma (in MinkowskiSpacetime) prolong_betw2:
  assumes path_Q: "Q \<in> \<P>"
      and a_inQ: "a \<in> Q"
      and b_inQ: "b \<in> Q"
      and ab_neq: "a \<noteq> b"
  shows "\<exists>c\<in>Q. [[a b c]]"
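In the integer model of a single path (an assumption for illustration), part (i) has the explicit witness $c = 2b - a$. Iterating it suggests why part (ii) holds, since each new event lies beyond the previous ones and so no event repeats. The generator below is a hypothetical sketch of that iteration, not the Isabelle proof.

```python
from itertools import islice


def betw(a, b, c):
    """[a b c] on a path modelled by the integers."""
    return a < b < c or c < b < a


def prolong(a, b):
    """Part (i) in the integer model: c = 2b - a satisfies [a b c] whenever a != b."""
    return 2 * b - a


def events_on_path(a, b):
    """Part (ii): iterating prolongation yields an endless supply of new events."""
    yield a
    while True:
        yield b
        a, b = b, prolong(a, b)


evs = list(islice(events_on_path(0, 1), 6))
assert all(betw(*evs[k:k + 3]) for k in range(len(evs) - 2))   # consecutive triples ordered
print(evs)  # [0, 1, 2, 3, 4, 5]
```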

While the second part of Theorem 4.6 is almost evident on inspection, it is much trickier to formalise. Schutz says that “By the preceding theorem […] part (i), Theorem 1, and induction, the path contains an infinite set of distinct events”. Our problem is to turn this list of results into an inductive proof that can be checked by Isabelle. This involves working out how to pass from induction on a natural number to infinity, what exactly the induction variable should be, and how to properly apply Isabelle’s induction rule.

The main proof is by induction on the cardinality of a finite subset $X \subseteq Q$, and is encapsulated by the helper lemma finite_path_has_ends. This lemma allows us to choose two elements $a, b$ of a set $X$ of events on a path $Q$ such that all other elements of that set are between $a$ and $b$.

lemma finite_path_has_ends:
  assumes "Q \<in> \<P>"
      and "X \<subseteq> Q"
      and "finite X"
      and "card X \<ge> 3"
    shows "\<exists>a\<in>X. \<exists>b\<in>X. a \<noteq> b \<and> (\<forall>c\<in>X. a \<noteq> c \<and> b \<noteq> c \<longrightarrow> [[a c b]])"
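The induction behind finite_path_has_ends can be mirrored in the integer model of a path. The sketch starts from the two outer events of three, then inserts the remaining events one at a time. Each time, it updates the ends according to the three possible orderings of the new event with the current ends. This is an illustration under that modelling assumption. In particular, its base case uses numeric sorting, where the formal proof must instead reason from betweenness.

```python
def betw(a, b, c):
    """[a b c] on a path modelled by the integers."""
    return a < b < c or c < b < a


def ends_by_insertion(xs):
    """Ends a, b of a finite set of distinct events on a path, found by inserting
    one event at a time as in the induction step."""
    xs = list(xs)
    if len(xs) < 3 or len(set(xs)) != len(xs):
        raise ValueError("need at least three distinct events")
    base = sorted(xs[:3])
    a, b = base[0], base[2]            # base case: the two outer events of three
    for x in xs[3:]:
        if betw(x, b, a):              # [x b a]: x lies beyond b
            b = x
        elif betw(b, a, x):            # [b a x]: x lies beyond a
            a = x
        # otherwise [a x b], and the ends are unchanged
    return a, b


a, b = ends_by_insertion([3, -1, 7, 2, 10, -5])
print(a, b)  # -5 10
```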

These events will later be used to apply the first part of Theorem 4.6. A sample listing of the proof is given below. We begin by applying the induction hypothesis to identify the ends $a$ and $b$ of the set $Y$.

proof (induct "card X - 3" arbitrary: X)
(*…*)
case IH: (Suc n)
  obtain Y x where X_eq: "X = insert x Y" and "x \<notin> Y"
    by (meson IH.prems(4) Set.set_insert three_in_set3)
  (*…*)
  obtain a b where ab_Y: "a \<in> Y" "b \<in> Y" "a \<noteq> b"
             and Y_ends: "\<forall>c\<in>Y. (a \<noteq> c \<and> b \<noteq> c) \<longrightarrow> [[a c b]]"
    using IH(1) [of Y] IH.prems(1-3) X_eq by auto

The rest of the proof treats each possible ordering of the additional event $x$ with $a$ and $b$, to identify the extremal events of the larger set $X$.

  consider "[[a x b]]" | "[[x b a]]" | "[[b a x]]" <proof>
  thus ?case
  proof (cases)
    (*…*)
    assume "[[x b a]]"
    { fix c