1 Introduction
A mechanized mathematics system, to be useful, must possess a large library of mathematical knowledge, built on top of sound foundations. While sound foundations pose many interesting intellectual challenges, building a large library seems a daunting task simply because of its sheer volume. However, as has been well documented [CFJ11, CK11, GS10], there is a tremendous amount of redundancy in existing libraries. Thus there is some hope that by designing the “right” metalanguage, guided by parsimony principles [Vel07], we can reduce the effort needed to build a library of mathematics.
Our aim is to build tools that allow library developers to take advantage of commonalities in mathematics so as to build a large, rich library for end users, whilst expending much less actual development effort than in the past. In other words, we continue our approach of developing High Level Theories [CF08] through building a network of theories, by putting our previous experiments [CFJ11] on a sound theoretical basis.
1.1 The Problem
The problem we wish to solve is easy to state: we want to shorten the development time of large mathematical libraries. But why would mathematical libraries be any different from other software, where the quest for time-saving techniques has been long but vain [Bro95]? Because we have known since Whitehead’s 1898 text “A treatise on universal algebra” [Whi98] that significant parts of mathematics have a lot of structure, structure which we can take advantage of. The flat list of structures gathered by Peter Jipsen [Jip] is impressively large, and could easily be greatly extended. Another beautiful source of structure in a theory graph is that of modal logics; John Halleck’s web pages on Logic System Interrelationships [Hal] are quite eye-opening.
Figure 1 shows what we are talking about: the presentation of the theory Semigroup strictly contains that of the theory Magma, and so on. (Footnote 1: We are not concerned with models, whose inclusions go in the opposite direction.) It is therefore pointless for a human to enter this information multiple times, if it is actually possible to take advantage of this structure. Strict inclusions at the level of presentations are only part of the structure: for example, we know that a Ring actually contains two isomorphic copies of Monoid, where the isomorphism is given by a simple renaming. There are further commonalities to take advantage of, which we will explain later in this paper.
Another question arises naturally: is there sufficient structure outside of the traditional realm of universal algebra, in other words, beyond single-sorted equational theories, to make it worthwhile to develop significant infrastructure to leverage that structure? Luckily for us, it turns out that there is.
We will also require tools to selectively hide (and reveal) this structure from end users. This latter requirement stems from the observation [CF08] that in practice, when mathematicians are using theories rather than developing new ones, they tend to work in a rather “flat” namespace. An analogy: someone working in Group Theory will unconsciously assume the availability of all concepts from a standard textbook, with their usual names and meanings. As their goal is to get some work done, whatever structure system builders have decided to use to construct their system should not leak into the application domain. They may not be aware of the existence of pointed semigroups, nor should that awareness be forced upon them. On the other hand, some application domains do rely on the “structure of theories”, so we cannot unilaterally hide this structure from all users either.
1.2 Contributions
We previously explained our core ideas in [CO12], where a variant of the category of contexts was presented as our setting for theory presentations. There we presented a simple term language for building theories, along with two (compatible) categorical semantics: one in terms of objects, another in terms of arrows. By using “tiny theories”, this allowed reuse and modularity. We emphasized names, as the objects we are dealing with are syntactic and ultimately meant for human consumption. We also emphasized arrows: while this is categorically obvious, the current literature on this topic is nevertheless very object-centric. Put another way: most of the emphasis in other work is on operational issues, or evolved from operational thinking, while our approach is unabashedly denotational, whilst still taking names seriously.
We leverage that basis here, and extend our work in multiple ways. (Footnote 2: We provide a summary of the contributions here to guide the reader who wishes to focus on the new ideas, even though much of the terminology used in this paragraph is only defined later.) First, we enhance contexts with definitions. We treat these as first-class citizens, so that names introduced by definitions are dealt with in the same way as all other names. The categorical semantics is extended to a fibration of generalized extensions over contexts. This is not straightforward: taking names seriously prevents us from having a cloven fibration without a renaming policy. But once this machinery is in place, it allows us to build presentations by lifting views over extensions, a very powerful mechanism for defining new presentations. There are obstacles to taking the “obvious” categorical solutions: for example, having all pullbacks would require that the underlying type theory have subset types, which is something we do not want to force. Furthermore, equivalence of terms needs to be checked when constructing mediating arrows, which in some settings may have implications for the decidability of type checking.
1.3 Plan of paper
We motivate our work with concrete examples in section 2. Section 3 lays out the basic (operational) theory, with concrete algorithms. The theoretical foundations of our work, the fibered category of contexts, are presented in full detail in section 4. This allows us in section 5 to formalize a language of theory presentation combinators. We close with some discussion, related work and conclusions in sections 7–10.
2 Motivation
We review, informally, the motivation for introducing a variety of combinators for creating new theory presentations from old. We use an informal syntax which should be readily understandable to anyone with a reasonable background in mathematics and type theory; section 5 will give a formal syntax and its semantics. Note that the “intuitive” combinators that we present here are purely motivational, as the semantics of some of these turn out to be awkward and/or contrived. We will thus have to build our formal language (almost) from scratch, based on the semantics we develop in sections 3 and 4. As we go, we will also comment on the problems which need to be overcome to obtain a reasonable solution.
It is important to remember, throughout this section, that our principal perspective is that of system builders. Our task is to form a bridge (via software) between tasks that end users of a mechanized mathematics system may wish to perform, and the underlying (semantic) theory concerned. This bridge is necessarily syntactic, as syntax is the only entity which can be symbolically manipulated by computers. More importantly, we must respect the syntactic choices of users, even when these choices are not necessarily semantically relevant.
2.1 Extension
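The simplest situation is where the presentation of one theory is included, verbatim, in another. Concretely, consider the presentations Monoid and CommutativeMonoid.
As expected, the only difference is that CommutativeMonoid adds a commutativity axiom. Thus, given Monoid, it would be much more economical to define CommutativeMonoid as an extension of Monoid by that one axiom.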
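To make the economy concrete, here is a minimal Python sketch of extension. It rests on loud assumptions: a presentation is modelled as an ordered list of (name, type) pairs, types are opaque strings that are never type checked, and the names `extend`, `op`, `e` and the axiom names are illustrative choices rather than the paper's concrete syntax.

```python
# A presentation is modelled as an ordered list of (name, type) declarations.
# Types are opaque strings; this sketch performs no type checking.
Monoid = [
    ("U", "Type"),
    ("op", "U -> U -> U"),
    ("e", "U"),
    ("assoc", "forall x y z. op (op x y) z = op x (op y z)"),
    ("left_id", "forall x. op e x = x"),
    ("right_id", "forall x. op x e = x"),
]


def extend(base, new_decls):
    """Return (extended presentation, inclusion view) for fresh declarations."""
    taken = {name for name, _ in base}
    for name, _ in new_decls:
        if name in taken:
            raise ValueError(f"name {name!r} is not fresh")
        taken.add(name)
    extended = list(base) + list(new_decls)
    # The induced inclusion sends every name of the base to itself.
    inclusion = {name: name for name, _ in base}
    return extended, inclusion


# Only the new axiom has to be written down; everything else is reused.
CommutativeMonoid, inclusion = extend(
    Monoid, [("comm", "forall x y. op x y = op y x")]
)
```

Note that `extend` returns a map as well as a presentation, which anticipates the point made in §2.4 that these operations also create arrows.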
2.2 Renaming
From an end-user perspective, our CommutativeMonoid has one flaw: such monoids are frequently written additively rather than multiplicatively. Let us call a commutative monoid written additively an abelian monoid, as we do with groups. Thus it would be convenient to be able to define AbelianMonoid as a renaming of CommutativeMonoid.
Immediately, one is led to ask: how are AbelianMonoid and CommutativeMonoid related? Traditionally, these are regarded as equal, for semantic reasons. However, since we are dealing with presentations, as syntax, we wish to regard them as isomorphic rather than equal. (Footnote 3: Univalent Foundations [Uni13] does not change this, as we can distinguish the two, as presentations.) In other words, we take a nominal rather than structural approach, since we are dealing with syntax. While working up to isomorphism is a minor inconvenience for the semantics, this enables us to respect user choices in names.
2.3 Combination
But even with these features, given Group, we would find ourselves writing
which is problematic: we lose the relationship that every commutative group is a commutative monoid. In other words, we reduce our ability to transport results “for free” to other theories, and must prove that these results transport, even though the morphism involved is (essentially) the identity. Thus it is natural to further extend our language with a facility that expresses this sharing. Taking a cue from previous work, we might want to say
Informally, this can be read as saying that Group and CommutativeMonoid are both “extensions” of Monoid, and CommutativeGroup is formed by the union (amalgamated sum) of those extensions. In other words, by over, we mean to have a single copy of Monoid, to which we add the extensions necessary for obtaining Group and CommutativeMonoid. This implicitly assumes that our two extensions are meant to be orthogonal, in some suitable sense.
Unfortunately, while this “works” to build a sizeable library (say of the order of concepts) in a fairly economical way, it is nevertheless brittle. Let us examine why this is the case. It should be clear that by combine, we really mean pushout. But a pushout is a 5-ary operation on objects and arrows (three objects and two arrows); our syntax gives the objects and leaves the arrows implicit. In other words, they have to be inferred. This is a very serious mistake: these arrows are (in general) impossible to infer, especially in the presence of renaming. As mentioned previously, there are two distinct arrows from Monoid to Ring, with neither arrow being “better” or somehow more canonical than the other. Furthermore, we know that pushouts can also be regarded as a 2-ary operation on compatible arrows. In other words, even though our goal is to produce theory presentations, using pushouts as a fundamental building block gives us no choice but to take arrows seriously.
2.4 Arrows
If we revisit the extension and renaming operations, it is easy to see that these operations not only create a new presentation, they also create a map from the source presentation into the target presentation. For extensions, this is an injective map. In other words,
creates more than just CommutativeMonoid, it also creates a morphism from Monoid to CommutativeMonoid. These can be written explicitly, and in this case this would be
where we use to indicate that this is a construction, and to mean that this is an assignment of terms of to names of . Clearly this would be very tedious to write out for larger theories. In concrete syntax, we would prefer to write just the non-identity parts, so that for this case we would prefer
which is easily seen to be isomorphic to the definition we started with. Thus it would be better to simply infer these morphisms when we can. We will however not make inferability a requirement.
For renaming, it is natural to require that the map on names causes no collisions, as a collision would rename multiple concepts to the same name. While this is a potentially interesting operation on presentations, it is not the operation that users have in mind for renaming. Collision-free renamings also induce an injective map.
Pushouts do create arrows as well, but unfortunately renamings are a problem: there are simple situations where there is no canonical name for some of the objects in the result. For example, take the presentation of , aka and the arrows induced by the renamings and ; while the result will necessarily be isomorphic to , there is no canonical choice of name for the end result. This is one problem we must solve. The Figure above left illustrates the issue. It also illustrates that we really do compute amalgamated sums and not simply syntactic union.
In general, a map from one presentation to another will be called a view. For example, one can witness that the additive naturals form a monoid with a statement such as
(1) 
where we elide the names of the proofs. The right-hand side of an assignment in a view does not need to be a symbol; it can be any well-typed term. For example, we can have a view from Magma to itself which maps the binary operation to its opposite:
(2) 
2.5 Little Theories
One important observation is that contexts of a type theory (or a logic) contain the same information as a theory presentation. Given a context, theorems about specific structures can be constructed by transport along views [FGT92]. For example, in the context of the definition of Monoid (§2.1), we can prove that the identity element is unique.
In order to apply this theorem in other contexts, we can provide a view from one theory presentation to another. For example, consider the theory presentation of semirings.
There are two naturally induced views from to , one assigning to and to , and another assigning to and to (with the views also assigning the monoid axioms to their respective axioms). Each of these two views can be used to transport our example theorem to prove that and are unique with respect to their associated binary operations.
But these are not the only views from to . We do not have to restrict to assigning constants to constants – we could map constants to arbitrary terms (in the underlying language). For example we could send to .
Which leads to the inevitable conclusion that, in general, we need an explicit language for defining views. But we have to proceed with care, otherwise we risk making simple situations complicated. For example, if we required explicit identity views for extensions, this would be semantically correct but painfully verbose in practice, as was pointed out earlier.
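The need for explicit views can be illustrated with a small enumeration. The following Python sketch lists every name-to-name assignment from a monoid signature into a semiring signature that respects the declared types. Everything here is an assumption made for illustration: the signatures omit all axioms, types are plain strings compared textually, and the names `plus`, `zero`, `times`, `one` and `candidate_views` are hypothetical.

```python
from itertools import product

# Signatures only (axioms omitted); all names here are illustrative.
Monoid = [("U", "Type"), ("op", "U -> U -> U"), ("e", "U")]
Semiring = [
    ("U", "Type"),
    ("plus", "U -> U -> U"), ("zero", "U"),
    ("times", "U -> U -> U"), ("one", "U"),
]


def rename_type(typ, view):
    """Rename the whitespace-separated tokens of a type string."""
    return " ".join(view.get(tok, tok) for tok in typ.split())


def candidate_views(source, target):
    """Enumerate name-to-name assignments that respect the declared types."""
    target_types = dict(target)
    source_names = [name for name, _ in source]
    found = []
    for choice in product(target_types, repeat=len(source)):
        view = dict(zip(source_names, choice))
        if all(rename_type(typ, view) == target_types[view[name]]
               for name, typ in source):
            found.append(view)
    return found


views = candidate_views(Monoid, Semiring)
```

At the level of signatures there are four candidates. The monoid axioms rule out the two mismatched pairings, but two candidates remain, and the types alone give no reason to prefer one. This enumeration also ignores views that send names to compound terms, so the real search space is larger still.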
2.6 Models
It is important to remember that models are contravariant: while there is a presentation view from to , the model morphisms are from to . Theorems are also contravariant with respect to model morphisms, so that they travel in the same direction as presentation views.
In this way a view to the empty theory presentation provides models of presentations by assigning every constant to a closed term. It is worthwhile noting that these models are internal to the underlying logic, rather than necessarily being models. For example, if our underlying logic can express the existence of a type of natural numbers, , then the view given by (1) can be used to transport our example theorem to prove that is the unique identity element for .
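As a rough illustration of a view that assigns closed terms, the Python sketch below assigns concrete values to the names of a monoid signature. It is only an analogy, under stated assumptions: Python values stand in for closed terms of the underlying logic, the proof components of the view are replaced by testing the laws on a few sample values (which proves nothing), and the names `additive_naturals`, `mismatched` and `satisfies_monoid_laws` are hypothetical.

```python
import operator

# Hypothetical assignments of closed Python values to the names of a
# monoid signature (U, op, e). Python ints stand in for the naturals.
additive_naturals = {"U": int, "op": operator.add, "e": 0}
multiplicative_naturals = {"U": int, "op": operator.mul, "e": 1}
mismatched = {"U": int, "op": operator.add, "e": 1}


def satisfies_monoid_laws(view, samples):
    """Test (not prove) associativity and the unit laws on sample values."""
    op, e = view["op"], view["e"]
    associative = all(
        op(op(x, y), z) == op(x, op(y, z))
        for x in samples for y in samples for z in samples
    )
    unital = all(op(e, x) == x and op(x, e) == x for x in samples)
    return associative and unital


samples = range(5)
print(satisfies_monoid_laws(additive_naturals, samples))  # True
print(satisfies_monoid_laws(mismatched, samples))         # False
```

In the setting of the paper the axioms would be assigned actual proof terms, so a mismatched assignment would simply fail to be a view.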
2.7 Tiny Theories
We noticed in our experiments [CFJ11] that for ease of extension, it was best to use tiny theories, in other words presentations which add in a single concept at a time. This is useful both for defining pure signatures (presentations with no axioms) as well as when defining properties such as commutativity. Typically one proceeds by first defining the smallest typing context in which the property can be stated. For commutativity, is the smallest such context – which also turns out to be a signature. We can then obtain the structures we are actually interested in via a “mixin” of the necessary properties over a base signature.
An example might make this clearer. Suppose we want to construct the presentation of commutative semirings by adding the commutativity property to semirings (see §2.5). As commutativity is defined as an extension to , we need a view from to . This view will tell us (exactly!) which binary operation we want to make commutative. Here we would pick the view that maps to and to . We can then combine that view with the injection from to to produce a presentation.
We see that this operation also requires that we provide a renaming that maps the axiom name “commutative” to “multiplicative commutative” in order to avoid the possibility of a name collision (as addition is already commutative in a semiring).
2.8 Constructions
It is worthwhile noticing that there is nothing specific to CommutativeGroup in the renaming ; it can be applied to any theory where the pairs and have compatible signatures (including the case where they are not present). Similarly, extend really defines a “construction” which can be applied to any presentation whenever all the symbols used in the extension are defined. In other words, a reasonable semantics should associate a whole class of arrows to these operations. (Footnote 4: We are again being deliberately vague here; section 4 will make this precise.) While it is tempting to think that these operations will induce some endofunctors on presentations, this is not quite the case: name clashes will prevent that.
2.9 Problems
Clearly we need to have a setting in which extensions, renamings and combinations (or mixins) make sense. We will need to pay close attention to names, both to allow pleasant names and to prevent accidental collisions. In other words, to be able to maintain human-readable names for all concepts, we will put the burden on the library developers to come up with a reasonable naming scheme, rather than push that issue onto end users. Another way to see this is that symbol choice carries a lot of intentional, as well as contextual, information which is commonly used in mathematical practice.
Views will need to be formally defined, and we will need a convenient language for dealing with them. While in some situations it is imperative to be explicit about views, at other times they are obvious or easily inferred; in those latter situations, usability dictates that we should let the system do the heavy lifting for us.
Furthermore, we do want to use both the little theories and tiny theories methods, so our language (and semantics) needs to allow, even promote, that style. We will see that, semantically, not all views have the same compositional properties. We will thus want to single out, syntactically, as large a subset of well-behaved views as possible, even though we know we can’t be complete.
Our earlier attempt used an explicit base for combine, which only works for medium-scale libraries: we need to work more directly with views themselves. A common solution uses long names, automatically generating (new, long) names to uniquely identify common names. But this has the effect of leaking the details of how a presentation was constructed into the names of the constants of the new presentation. This essentially prevents later refinements, as all these names would change. As far as we can tell, any automatic naming policy will suffer from this problem, which is why we insist on having the library developers explicitly deal with name clashes. We can then check that this has been done consistently. In practice few renamings are needed, so allowing the empty renaming annotation to denote the identity renaming scheme makes our design choice lightweight.
3 Basic Semantics
In this section we present the necessary definitions from (dependent) type theory and category theory which will form the basis of our theory presentation combinators. First we formally describe theory presentations and views, then we describe the semantics of our combinators.
Our presentations depend on a background type theory, but are otherwise agnostic as to many of the internal details of that theory. From this type theory we require the following:
An infinite set of variable names .
A typing judgement for terms of type in a context which we write .
A kinding judgement of types of kind in a context which we write . We further assume that the set of valid kinds is given and fixed.
A definitional equality (a.k.a. convertibility) judgement of terms of type and of type in a context , which we write . We will write to denote .
A notion of substitution on terms. Given a list of variable assignments and an expression we write for the term after simultaneous substitution of variables by the corresponding term in the assignment.
We will often denote an assignment by , and its application to a term by .
3.1 Theory Presentations
A theory presentation is a well-typed list of declarations and definitions. More formally, Figure 2 gives the formation rules. In this definition, we use to denote the set of variables of a well-formed context . Explicitly, it is given by
Here denotes the declaration of a new synonym for term of type . It is possible to develop this theory without declarations, however including them appears to make both the theory and practical implementations easier.
3.2 Views
A view from a theory presentation to a theory presentation is an assignment of well-typed expressions in to declarations of . The assignments transport well-typed terms in the context to well-typed terms in , by substitution. More formally,
There is a subtle but important distinction between assignments, and views, . A view is made up of 3 components: an assignment, a source presentation and a target presentation. In particular, the same assignment can occur in different views.
3.2.1 Extensions and Inclusions
An extension is a special type of view, which we denote
where each expression is a unique variable name from . An inclusion is a special type of extension of the form
Inclusions have the nice property that there is at most one inclusion between any two theory presentations, and that inclusions form a poset of presentations. However this nice property is also a limitation. As we have hinted at before, is an extension of in two different ways, and hence both extensions cannot be inclusions. We do not give inclusions any special status (unlike extensions); we draw attention to them here as many other systems make inclusions play a very special rôle.
3.2.2 Composition of Views
Given two views: and , we can compose them to create a view . If then the composite view is
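Composition by substitution can be sketched in a few lines of Python. The sketch makes several assumptions that are not from the paper: terms are either names (strings) or tuples of terms read as applications, binders are ignored entirely, a view is a dictionary from the names of its source to terms over its target, and `("flip", f)` is a hypothetical stand-in for the operation with its arguments swapped.

```python
def substitute(term, assignment):
    """Simultaneously replace names in a term; terms are names or tuples."""
    if isinstance(term, str):
        return assignment.get(term, term)
    return tuple(substitute(part, assignment) for part in term)


def compose(v, w):
    """Composite of v followed by w: apply w to every term assigned by v."""
    return {name: substitute(term, w) for name, term in v.items()}


# Hypothetical views; all names are illustrative.
opposite = {"U": "U", "op": ("flip", "op")}            # magma to itself
magma_to_additive = {"U": "U", "op": "plus"}           # magma into (U, plus, zero)
additive_to_nat = {"U": "Nat", "plus": "add", "zero": "0"}

print(compose(magma_to_additive, additive_to_nat))
# {'U': 'Nat', 'op': 'add'}
print(compose(opposite, opposite)["op"])
# ('flip', ('flip', 'op'))
```

The second result is syntactically different from the identity assignment on `op`, even though flipping twice should behave like `op`. Deciding that the two agree requires the convertibility judgement of the background type theory, which is the reason views are compared up to equivalence rather than syntactic equality.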
3.2.3 Equivalence of Views
Two views with the same domain and codomain, are equivalent if where
3.2.4 The category of theory presentations
We now have all the necessary ingredients to define the category of theory presentations with theory presentations as objects, and views as morphisms. The identity inclusions are the identity morphisms, and views act on views by substitution, which is associative and respects the identity.
Note that in [CO12], we worked with , which is traditionally called the category of contexts, which is more often used in categorical logic [Car86, Jac99, Tay99, Pit00]. But in our setting, and as is common in the context of specifications (see for example [BG77, Smi93, CoF04] amongst many others), we prefer to take our intuition from textual inclusion rather than models. Nevertheless, when it will be time to define the semantics, we will revert to using , as this not only simplifies certain arguments, it also makes our work easier to compare to that in categorical logic.
3.3 Combinators
Having defined theory presentations and views (including extensions), we can now define presentation and view combinators. In fact, all combinators in this section will end up working in tandem on presentations and views. They allow us, as with most combinators, to create new presentations/views from old, in a much more convenient manner than building everything by hand.
The combinators are: extend, rename, combine and mixin. This list should be unsurprising given §2. Although we expect the majority of theory presentations and views will be constructed with these combinators, a few complex views will need to be defined directly. The reader may have noticed the absence of combinators such as delete or hide: this is quite purposeful on our part. While the operational semantics on theory presentations for these is “obvious”, the denotational semantics in terms of theory morphisms is backwards, and has distasteful properties.
We give the full details of the constructions, which are completely deterministic. These can serve as a direct design for an implementation. In other words, this section gives an operational semantics for the combinators. In the next section, we will give them a categorical semantics; we make a few inline remarks here to help the reader understand why we choose a particular construction.
3.3.1 Renaming
Given a presentation and an injective renaming function we can construct a new theory presentation by renaming ’s symbols: we will denote this action of on by . We also construct an extension from which provides a translation from to the constructed presentation ; we denote this extension by . For this construction as a whole, we use the notation
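A renaming of this kind can be sketched as follows, again under assumptions that are ours rather than the paper's: presentations are lists of (name, type) pairs with string types, the renaming is a partial dictionary that acts as the identity elsewhere, identifiers inside types are found with a regular expression, and bound variables are not treated specially.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def rename(pres, ren):
    """Rename declared names and their occurrences; return (presentation, view)."""
    names = [name for name, _ in pres]
    new_names = [ren.get(name, name) for name in names]
    # Injectivity on the declared names: no two names may collide.
    if len(set(new_names)) != len(new_names):
        raise ValueError("renaming causes a name collision")

    def rename_text(text):
        return IDENT.sub(lambda m: ren.get(m.group(0), m.group(0)), text)

    renamed = [(ren.get(name, name), rename_text(typ)) for name, typ in pres]
    view = {name: ren.get(name, name) for name in names}
    return renamed, view


# Illustrative presentation (most axioms omitted for brevity).
CommutativeMonoid = [
    ("U", "Type"),
    ("op", "U -> U -> U"),
    ("e", "U"),
    ("comm", "forall x y. op x y = op y x"),
]

AbelianMonoid, to_abelian = rename(CommutativeMonoid, {"op": "plus", "e": "zero"})
```

The function returns both the renamed presentation and the map on names, mirroring the fact that the construction yields a presentation together with an extension into it.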
3.3.2 Extend
Given a theory presentation , a fresh name and a well-formed type of some kind , (i.e. ) we can construct a new theory presentation and the extension (an inclusion in this case) . More generally, given a sequence of fresh names, types and kinds, , , and we can define a sequence of theory presentations and so long as . Given such a sequence we construct a new theory presentation with the extension (which is still an inclusion) . Of course is the concatenation of with . We will thus use to denote the target of this view whenever the components of are clear from context. However is in general not a valid presentation, as it may depend on . This is why we use an asymmetric symbol .
It is worthwhile noting that general extensions as defined in §3.2.1 can be decomposed into a renaming composed with an , in other words , where is defined by the action of the extension on , namely . We will use the notation as it makes the dependence on the extension clearer.
Extensions which are inclusions are traditionally called display maps in , and our in is denoted by in [Tay99], and in [Jac99].
For notational convenience, we can encode the construction above as an explicit function from the inputs as given above, to a record containing two fields, pres (for presentation) and extend (for extension).
where .
3.3.3 Combine
Given two extensions and and two injective renaming functions and , we can combine them and generate a new theory presentation . We require that
Say that the two extensions decompose as and . Then we define where (or equivalently ), , and . Note that, by construction, is equivalent to ; we denote this equivalence class of views by . (Footnote 5: In practice, theory presentations are rendered (printed, serialized) using a topological sort where ties are broken alphabetically, so as to be construction-order independent.)
The combination operation also provides the two extensions and where
A quick calculation shows that is equal to (and not just equivalent); we denote this joint arrow . Furthermore, combine provides a set of mediating views from the constructed theory presentation . Suppose we are given views and such that the composed views and are equivalent. We can combine and into a mediating view where
This union is well defined since if then there exists such that and , in which case and are equivalent since by assumption and are equivalent. It is also worthwhile noticing that this construction is symmetric in and .
For this construction, we use the following notation, where we use the symbols as defined above (omitting type information for notational clarity)
The attentive reader will have noticed that we have painstakingly constructed an explicit pushout in . There are two reasons to do this: first, we need to be this explicit if we wish to be able to implement such an operation. And second, we do not want an arbitrary pushout, because we do not wish to work up to isomorphism as that would “mess up” the names. This is why we need user-provided injective renamings and to deal with potential name clashes. If we worked up to isomorphism, these renamings would not be needed, as they can always be manufactured by the system, but then these are no longer necessarily related to the users’ names. Alternatively, if we use long names based on the (names of the) views, the method used to construct the presentations and views “leaks” into the names of the results, which we also consider undesirable.
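The flavour of this construction can be conveyed by a simplified Python sketch. It is not the construction above: it assumes both extensions are plain inclusions of a shared base (so the base occupies a common prefix), it lets the renamings touch only the newly added declarations, it does not build mediating views, and it uses string types with regular-expression renaming. The names `combine`, `Group`, `inv` and so on are illustrative.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def rename_text(text, ren):
    """Rename identifiers inside a type string."""
    return IDENT.sub(lambda m: ren.get(m.group(0), m.group(0)), text)


def combine(base, ext1, ext2, ren1=None, ren2=None):
    """Amalgamate two inclusions of `base`, keeping a single copy of `base`."""
    ren1, ren2 = ren1 or {}, ren2 or {}
    n = len(base)
    if ext1[:n] != base or ext2[:n] != base:
        raise ValueError("both presentations must start with the shared base")
    base_names = {name for name, _ in base}
    if base_names & (set(ren1) | set(ren2)):
        raise ValueError("renamings may only touch the new declarations")
    new1 = [(ren1.get(a, a), rename_text(t, ren1)) for a, t in ext1[n:]]
    new2 = [(ren2.get(a, a), rename_text(t, ren2)) for a, t in ext2[n:]]
    result = list(base) + new1 + new2
    names = [a for a, _ in result]
    if len(set(names)) != len(names):
        raise ValueError("name clash: supply a renaming")
    # The two maps into the result agree on the names of the base.
    view1 = {a: ren1.get(a, a) for a, _ in ext1}
    view2 = {a: ren2.get(a, a) for a, _ in ext2}
    return result, view1, view2


# Illustrative presentations (monoid axioms omitted for brevity).
Monoid = [("U", "Type"), ("op", "U -> U -> U"), ("e", "U")]
Group = Monoid + [("inv", "U -> U"), ("left_inv", "forall x. op (inv x) x = e")]
CommutativeMonoid = Monoid + [("comm", "forall x y. op x y = op y x")]

CommutativeGroup, from_group, from_cmonoid = combine(Monoid, Group, CommutativeMonoid)
```

When both sides add a declaration with the same name, the sketch refuses to guess and asks for a renaming, which is the design choice argued for above: the library developer resolves clashes, and the system only checks consistency.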
3.3.4 Mixin
Given a view , an extension and two disjoint injective renaming functions and , where the extension decomposes as , we can mixin the view into the extension, constructing a new theory presentation . We define where
The mixin also provides an extension and a view , defined as
By definition of extension, there is no that is mapped into by . The definition of is arranged such that is equal to (and not just equivalent); so we can denote this joint arrow by . In other words, in a mixin, by only allowing renaming of the new components in , we ensure commutativity on the nose rather than just up to isomorphism.
Mixins also provide a set of mediating views from the constructed theory presentation . Suppose we are given a view and view such that the composed views and are equivalent. We can combine and into the mediating view defined as
For mixin, again using the symbols as above, we denote the construction results as
Symbolically the above is very similar to what was done in combine, and indeed we are constructing all of the data for a specific pushout. However in this case the results are not symmetric, as seen from the details of the construction of and , which stems from the fact that in this case is an arbitrary view rather than an extension.
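The asymmetry can be seen in a simplified Python sketch of mixin, which replays the commutativity example of §2.7. The sketch is a loose approximation under stated assumptions: the extension is an inclusion of its source, the view sends names to names only (the paper allows arbitrary terms), a single renaming is applied to the new declarations, no mediating views are built, and all presentation and axiom names are hypothetical.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def subst(text, assignment):
    """Replace identifiers in a type string according to the assignment."""
    return IDENT.sub(lambda m: assignment.get(m.group(0), m.group(0)), text)


def mixin(view, source, extension, target, ren=None):
    """Push the new declarations of `extension` along `view` into `target`."""
    ren = ren or {}
    n = len(source)
    if extension[:n] != source:
        raise ValueError("extension must start with the source presentation")
    if set(view) != {name for name, _ in source}:
        raise ValueError("view must assign every name of the source")
    lifted = dict(view)          # will become the view out of the extension
    result = list(target)
    taken = {name for name, _ in target}
    for name, typ in extension[n:]:
        new_name = ren.get(name, name)
        if new_name in taken:
            raise ValueError(f"name clash on {new_name!r}: supply a renaming")
        taken.add(new_name)
        result.append((new_name, subst(typ, lifted)))
        lifted[name] = new_name
    return result, lifted


# Illustrative presentations; only one semiring axiom is shown.
Magma = [("U", "Type"), ("op", "U -> U -> U")]
CommutativeMagma = Magma + [("comm", "forall x y. op x y = op y x")]
Semiring = [
    ("U", "Type"),
    ("plus", "U -> U -> U"), ("zero", "U"),
    ("times", "U -> U -> U"), ("one", "U"),
    ("comm", "forall x y. plus x y = plus y x"),
]

pick_times = {"U": "U", "op": "times"}
CommutativeSemiring, lifted_view = mixin(
    pick_times, Magma, CommutativeMagma, Semiring, ren={"comm": "times_comm"}
)
```

Only the new declarations are renamed, and the target presentation is left untouched. In this sketch that is what keeps the target included verbatim in the result.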
3.3.5 Reification of views
Although we (will) have a syntax and semantics for views, there are times when we wish to take views and treat them as first-class objects. For example, if we want to show that the set of all (small) groups and group homomorphisms forms a category, we need to be able to have a “theory” of group homomorphisms. But we can think of an even simpler example: we would like to talk about the theory of opposite magmas (see the view 2 in §2.4). To do this, we need to somehow internalize (reify) this view: this is a further reason to add declarations to our presentations (§3.1).
Suppose we are given a view . We want to define a new presentation which internalizes . A priori, this would require copying all of and into a new presentation, and then defining relations between the terms of and via . However, might share some names with , both on purpose and accidentally. While we could rename everything in and use to recover sharing, this is wasteful. In this case, we will only ask for a renaming for those names of which introduce definitions.
Given an injective renaming function , we can define a new presentation where . Note how we have used the convertibility axiom from the formation rules of views (Fig. 3) in the definition of . Naturally, we also get extensions and .
For internalization, using the symbols as above, we denote results of the construction as
4 The Categorical Theory of Semantics
At first glance, the definitions of combine and mixin may appear ad hoc and overly complicated. This is because, in practice, the renaming functions and are frequently the identity. The main reason for this is that mathematical vernacular uses a lot of rigid conventions, such as usually naming an associative, commutative, invertible operator which possesses a unit , and the unit , backward composition is , forward composition is , and so on. But the usual notation of lattices is different than that of semirings, even though they share a similar ancestry – renamings are clearly necessary at some point.
The details of the combinators combine and mixin can be motivated by giving them a categorical specification. When we do, we find out that the mixin operation is a Cartesian lifting in a suitable fibration, and the combine operation is a special case of mixin.
While our primary interest is in theory presentations, the bulk of the categorical work in this area has been done on the category of contexts, which is the opposite category. To be consistent with the existing literature, we will give our semantics in terms of . Thus if is a view from the theory presentation to the theory presentation , then is an arrow from , considered as a context, to , considered as a context. We will write such arrows as as an arrow from to when we are considering the category of contexts. Composition of two arrows is simply the composition of views.
4.1 Semantics
The category of contexts forms the base category for a fibration. The fibered category is the category of theory extensions. The objects of are extensions of theory presentations. We write such objects as where is the base of the extension and is the extended theory presentation. The notation is to remind the reader that the underlying arrows are in fact monos. An arrow between two extensions is a pair of views forming a commutative square with the extensions. Thus given extensions and , an arrow between these consists of two arrows and from such that . When we need to be very precise, we write such an arrow as . We will write whenever the rest of the information can be inferred from context. When given a specific arrow in , we will use the notation for its name.
A fibration of over is defined by giving a suitable functor from to . Our “base” functor sends an extension to and sends an arrow to its base arrow .
Theorem 4.1
This base fibration is a Cartesian fibration.
This theorem, in slightly different form, can be found in [Jac99] and [Tay99]. We give a full proof here because we want to make the link with our mixin construction explicit. We use the results of §3.3 directly.
Suppose is an arrow in (a view), and is an object of in the fiber of (i.e. an extension). We need to construct a Cartesian lifting of , which is a Cartesian arrow of over . The components of the mixin construction are exactly the ingredients we need to create this Cartesian lifting. Let and be two disjoint injective renaming functions. Note that such and always exist because is infinite while and are finite. Then let
Then is an arrow of which is a Cartesian lift of .
Firstly, to see that is in fact an arrow of , we note that is an extension, so is an object of . Next we need to show that . Let . Then by definition of . On the other hand, by definition of . However, is a variable since is an extension, and by definition so that we have as required.
Secondly we need to see that is a Cartesian lift of . We note that it is plain to see that is the base of . What remains is to show that for any other arrow from and any arrow from such that , there is a unique mediating arrow