Java Generics: An Order-Theoretic Approach (Abridged Outline)

06/20/2019 ∙ by Moez A. AbdelGawad, et al. ∙ Rice University

The mathematical modeling of generics in Java and other similar nominally-typed object-oriented programming languages is a challenge. In this short paper we present the outline of a novel order-theoretic approach to modeling generics, in which we also make elementary use of some concepts and tools from category theory. We believe a combined order-theoretic and category-theoretic approach to modeling generics holds the keys to overcoming much of the difficulty found when analyzing features of generic OO type systems.


1 Introduction

Generics were added to Java to increase the expressiveness of its type system [25, 26, 18, 22, 39]. However, generics in Java and in similar mainstream nominally-typed OOP languages such as C# [1], Scala [32], C++ [2], and Kotlin [3] include some features, such as variance annotations (e.g., Java wildcards), F-bounded generics, and Java erasure, that have so far been hard to analyze and reason about [41, 40, 20, 19, 36, 38]. As a result, the type systems of mainstream nominally-typed OOP languages, which are built on current mathematical models of generics, are overly complex¹, which hinders the progress of these type systems.

¹ See, for example, [31], or the sections of the Java language specification that specify crucial parts of its generic type system, e.g., [26, 4.5 & 5.1.10].

Further, support for some features of Java generics has a number of irregularities or “rough edges.” These include type variables that can have upper F-bounds but cannot have lower bounds (let alone lower F-bounds), wildcard type arguments that can have an upper bound or a lower bound but not both, and Java erasure, a feature prominent in Java and Java-based OOP languages such as Scala and Kotlin that is usually understood, basically, as being “outside the type system.”
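To make these features concrete, the following Java sketch shows variance annotations (an upper-bounded and a lower-bounded wildcard) and erasure, and records in comments an F-bounded declaration from the JDK and the two rough edges about bounds, which javac rejects. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class GenericFeatures {
    // Variance annotation, covariant use: accepts List<Integer>, List<Double>, ...
    static double sum(List<? extends Number> xs) {
        double total = 0;
        for (Number x : xs) total += x.doubleValue();
        return total;
    }

    // Variance annotation, contravariant use: accepts List<Integer>, List<Number>, List<Object>.
    static void addOnes(List<? super Integer> xs, int n) {
        for (int i = 0; i < n; i++) xs.add(1);
    }

    // F-bounded generics, as in the JDK declaration: abstract class Enum<E extends Enum<E>>.

    // Rough edges (not legal Java, so shown only as comments):
    //   <T super Integer> void f(List<T> xs)   // type variables cannot have lower bounds
    //   List<? extends Number super Integer>   // a wildcard cannot have both bounds

    // Erasure: at run time both lists have the same class, java.util.ArrayList.
    static boolean sameRuntimeClass() {
        return new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass();
    }
}
```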

In this short paper we outline a new order-theoretic approach to modeling Java generics, and we report on our progress in developing this approach. The main contribution of the approach is demonstrating how concepts and tools from order theory can significantly simplify analyzing and reasoning about subtle features of OO generics, including the ones mentioned above.

Fundamentally, the approach uses the nominal subclassing relation (as a partial ordering between classes²), together with some novel order-theoretic tools, to construct the generic nominal subtyping relation (also a partial ordering, between parameterized types) and the containment relation (a third partial ordering, between generic type arguments). These three ordering relations lie at the heart of mainstream generic OO type systems. Using order-theoretic tools, as well as some concepts and tools from category theory, we further analyze these three relations and the relationships between them. We then demonstrate the value of the approach by exploring extensions of generic OO type systems that such analysis naturally suggests.

² In this work, Java interfaces and Scala traits are treated as abstract classes; in this paper the term ‘class’ thus refers to classes and other similar type-constructing constructs. Also, in other OOP literature parameterized types are sometimes called object types, class types, reference types, generic types, or just types.

2 Description

Constructing The Generic Subtyping Relation

The first step in the order-theoretic approach to modeling generics is defining two operators that construct ordering relations (i.e., posets). The first operator, called ppp, takes two input posets and a subset of the first poset and constructs a partial poset product [11, 23] of the two input posets. The second operator, called wc (for wildcards), takes as input a bounded poset (i.e., one with top and bottom elements) and constructs a “triangle-shaped” poset, corresponding to wildcard type arguments, that is roughly composed of three copies of the input poset. See [10] for more details on ppp and wc.

The formal definition of ppp is the order-theoretic counterpart of the definition of the partial Cartesian product of graphs presented in [11, 10], while the formal definition of the wildcards operator wc is presented in [10]. It is worth noting that if the input poset of wc is a chain (i.e., a “straight edge”), then wc constructs an exact triangle-shaped output poset. The poset constructed by wc is “triangle-shaped” due to the three variant subtyping rules for generic wildcard types in generic OOP: covariant subtyping is, roughly, modeled by the left side of the triangle, contravariant subtyping by the right side, and invariant subtyping by the base. See [10, 8] for details and examples.
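As an informal illustration of wc, the sketch below follows Java's own containment rules rather than the formal definition in [10], which may differ in details. It assumes a finite poset given as a list of type names, a bottom (Null), a top (Object) and a subtyping predicate, and it models each wildcard argument as an interval; the names WcSketch, Arg, wc, contained and show are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

// A wildcard argument is modelled as an interval [lo, hi]:
//   T            -> [T, T]        (invariant: base of the triangle)
//   ? extends T  -> [bottom, T]   (covariant: left side)
//   ? super T    -> [T, top]      (contravariant: right side)
class WcSketch {
    record Arg(String lo, String hi) {}

    static Set<Arg> wc(List<String> elems, String bottom, String top) {
        Set<Arg> args = new LinkedHashSet<>();   // record equality merges coinciding intervals
        for (String t : elems) {
            args.add(new Arg(t, t));
            args.add(new Arg(bottom, t));
            args.add(new Arg(t, top));
        }
        return args;
    }

    // Containment (a is contained in b) as interval inclusion: b.lo <: a.lo and a.hi <: b.hi.
    static boolean contained(BiPredicate<String, String> sub, Arg a, Arg b) {
        return sub.test(b.lo(), a.lo()) && sub.test(a.hi(), b.hi());
    }

    // Java syntax for an argument; the apex [bottom, top] is the unbounded wildcard "?".
    static String show(Arg a, String bottom, String top) {
        if (a.lo().equals(a.hi())) return a.lo();
        if (a.lo().equals(bottom) && a.hi().equals(top)) return "?";
        if (a.lo().equals(bottom)) return "? extends " + a.hi();
        return "? super " + a.lo();
    }
}
```

For the chain Null <: Integer <: Number <: Object, wc yields 9 arguments: three copies of the four-element chain, minus the three coinciding corners (Null is also ? extends Null, Object is also ? super Object, and ? extends Object and ? super Null are both ?).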

Next, given a finite subclassing relation $\mathcal{C}$³, operators ppp and wc are used to construct, iteratively, the infinite subtyping relation $\mathcal{S}$ between ground parameterized types⁴. In particular, given $\mathcal{S}_i$, a finite approximation of $\mathcal{S}$, wc constructs the corresponding wildcard type arguments, ordered by containment. Given $\mathcal{C}$ and the constructed arguments, ppp then pairs generic classes with these arguments and adds the types corresponding to non-generic classes, constructing the poset $\mathcal{S}_{i+1}$, ordered by subtyping, which is the next approximation of $\mathcal{S}$.⁵

³ Here it is assumed that a generic class takes one type argument, and that if a generic class extends (i.e., inherits from, or is a subclass of) another generic class then the superclass is passed the type parameter of the subclass as its type argument (e.g., as in the declaration class D<T> extends C<T>, where T, the type parameter of D, is used “as is” as the type argument of superclass C). While we do not expect any significant complications when these simplifying assumptions are relaxed, we leave a discussion of how to relax them to future work. It is worth mentioning that the second assumption, in some sense, models the most general case (of a type argument passed to the superclass), and that a more complex inheritance relation (such as class D<T> extends C<E<T>>) only restricts the set of valid subtyping relations between instantiations of the subclass (e.g., D) and those of its superclasses (e.g., C). (See the later discussion of valid versus admittable subtyping relations.)

⁴ Ground parameterized types are ones with no type variables. Such types are infinite in number due to the possibility of arbitrary-depth nesting of type arguments [8]. Subtyping between these types is the basis for defining the full subtyping relation that includes type variables [27, 33].

⁵ This iterative construction process constructs the (least fixed point) solution of the recursive poset equation

$$\mathcal{S} = \mathrm{ppp}_{\mathcal{C}_g}\big(\mathcal{C}, \mathrm{wc}(\mathcal{S})\big)$$

where $\mathcal{S}$ is the subtyping relation, $\mathcal{C}$ is the subclassing/inheritance relation, and $\mathcal{C}_g$ is the subset of generic classes in $\mathcal{C}$. See [10] for more details.
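A toy version of the iterative construction can be sketched in Java. The hierarchy (non-generic Object and Null, generic classes C and D with class D<T> extends C<T>), the starting approximation containing only Object and Null, and all names in the code are assumptions for illustration; the ppp and wc methods are simplified stand-ins, not the formal definitions in [10].

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SubtypingConstruction {
    static final Ty OBJECT = new Ty("Object", null);
    static final Ty NULL = new Ty("Null", null);
    static final List<String> GENERIC = List.of("C", "D");

    // A ground type: a class plus its single type argument (null for non-generic classes).
    record Ty(String cls, Arg arg) {
        @Override public String toString() { return arg == null ? cls : cls + "<" + arg + ">"; }
    }

    // A wildcard type argument modelled as an interval [lo, hi] of types.
    record Arg(Ty lo, Ty hi) {
        @Override public String toString() {
            if (lo.equals(hi)) return lo.toString();               // invariant argument
            if (lo.equals(NULL) && hi.equals(OBJECT)) return "?";  // unbounded wildcard
            if (lo.equals(NULL)) return "? extends " + hi;         // covariant
            return "? super " + lo;                                // contravariant
        }
    }

    // Reflexive subclassing between the generic classes: D is a subclass of C.
    static boolean subclass(String a, String b) {
        return a.equals(b) || (a.equals("D") && b.equals("C"));
    }

    // Null is below and Object above every type; generic types are compared by
    // subclassing of their classes and containment of their arguments.
    static boolean sub(Ty s, Ty t) {
        if (s.equals(NULL) || t.equals(OBJECT)) return true;
        if (s.equals(OBJECT) || t.equals(NULL)) return false;
        return subclass(s.cls(), t.cls()) && contained(s.arg(), t.arg());
    }

    // Containment as interval inclusion over the previous approximation.
    static boolean contained(Arg a, Arg b) {
        return sub(b.lo(), a.lo()) && sub(a.hi(), b.hi());
    }

    // wc-like step: invariant, "? extends T" and "? super T" arguments for each T.
    static Set<Arg> wc(Set<Ty> prev) {
        Set<Arg> args = new LinkedHashSet<>();
        for (Ty t : prev) {
            args.add(new Arg(t, t));
            args.add(new Arg(NULL, t));
            args.add(new Arg(t, OBJECT));
        }
        return args;
    }

    // ppp-like step: pair every generic class with every argument; keep the non-generic types.
    static Set<Ty> ppp(Set<Arg> args) {
        Set<Ty> next = new LinkedHashSet<>(List.of(NULL, OBJECT));
        for (String c : GENERIC) for (Arg a : args) next.add(new Ty(c, a));
        return next;
    }

    // Approximation number i, starting from the non-generic types only.
    static Set<Ty> stage(int i) {
        Set<Ty> s = new LinkedHashSet<>(List.of(NULL, OBJECT));
        for (int k = 0; k < i; k++) s = ppp(wc(s));
        return s;
    }
}
```

In this toy model each stage contains the previous one: stage(1) has 8 types (Null, Object, and C and D each applied to Null, Object and ?), and stage(2) has 44, including nested types such as D<? extends D<Object>>.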

The Erasure Galois Connection and Nominal Subtyping

Erasure, where, intuitively, the type arguments of a parameterized type are “erased”, is a feature prominent in Java and Java-based OO languages such as Scala and Kotlin, but one that can also be defined and made explicit in other generic nominally-typed OOP languages. In the order-theoretic approach to modeling generics, erasure is modeled as a mapping from types to classes.⁶

⁶ To model erased types, the erasure mapping is composed with a notion of a default type that maps each generic class to some corresponding parameterized type (i.e., a particular instantiation of the class). We leave further discussion of default type arguments and default types to future work.

Also, in the order-theoretic approach to modeling generics the ‘most general wildcard instantiation’ of a generic class is called the free type corresponding to the class. For example, a generic class C with one type parameter has the type C<?> as its corresponding free type.⁷

⁷ A non-generic class is mapped to the only type it constructs (a type typically homonymous to the class) as its corresponding free type.

By maintaining a clear separation between classes ordered by subclassing, on one hand, and types ordered by subtyping, on the other, the construction of the subtyping relation using the subclassing relation (as presented earlier, using order-theoretic tools) allows us to observe that a Galois connection [23] exists between the two fundamental relations in generic nominally-typed OOP. This connection between subclassing and subtyping is expressed formally using the erasure and free type mappings.

In particular, if $\mathcal{E}$ is the erasure mapping that maps a parameterized type to the class used to construct the type, and if $\mathcal{F}$ is the free type mapping that maps a class to its most general wildcard instantiation, then the connection between subclassing and subtyping states that for all parameterized types $t$ and classes $c$ we have

$$\mathcal{E}(t) \sqsubseteq c \iff t <: \mathcal{F}(c) \qquad (1)$$

where $\sqsubseteq$ is the subclassing relation between classes and $<:$ is the subtyping relation between parameterized types.⁸ ⁹

⁸ For example, in Java the statement

$$\mathcal{E}(\texttt{LinkedList<String>}) \sqsubseteq \texttt{List} \iff \texttt{LinkedList<String>} <: \mathcal{F}(\texttt{List})$$

where $t$ in Equation (1) is instantiated to type LinkedList<String> and $c$ is instantiated to class List, asserts that class LinkedList being a subclass of List is equivalent to (i.e., holds if and only if, or implies and is implied by) LinkedList<String> being a subtype of the free type List<?>, which is a true statement in Java.

⁹ Based on the strong relation between category theory and order theory (see below), the Galois connection between subclassing and subtyping is called JEA, the Java erasure adjunction.
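The Java example of the erasure Galois connection can be checked directly. In this sketch, run-time reflection (Class.isAssignableFrom) stands in for subclassing between erased classes, and a compile-time assignment stands in for subtyping against the free type List<?>; the class and method names are hypothetical.

```java
import java.util.LinkedList;
import java.util.List;

class ErasureExample {
    // Subclassing between (erased) classes, observed at run time via reflection.
    static boolean subclass(Class<?> sub, Class<?> sup) {
        return sup.isAssignableFrom(sub);
    }

    // Subtyping against the free type: LinkedList<String> <: List<?>, so this assignment compiles.
    static List<?> asFreeType() {
        LinkedList<String> xs = new LinkedList<>(List.of("a", "b"));
        List<?> free = xs;
        return free;
    }

    // By contrast, "java.util.Set<?> s = new LinkedList<String>();" is rejected by javac,
    // in line with LinkedList not being a subclass of Set.
}
```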

It should be noted that the erasure Galois connection expresses a fundamental property of generic nominally-typed OOP, namely, that subclassing (a.k.a. type inheritance) is the source of subtyping. In one direction, the property states that type inheritance is a source of subtyping (i.e., subclassing causes subtyping between parameterized types); in the other direction, it states that type inheritance is the only source of subtyping in generic nominally-typed OOP (i.e., subtyping between parameterized types comes from nowhere other than subclassing). This property of generic nominally-typed OOP, stated as ‘inheritance is the source of subtyping’, corresponds to the ‘inheritance is subtyping’ property of non-generic nominally-typed OOP [4, 21].
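A small Java sketch of ‘inheritance is the source of subtyping’: the hypothetical generic classes Box and Cell below have identical structure, yet only a declared subclass (Labeled) yields subtyping.

```java
// Box and Cell are structurally identical; only Labeled is declared to inherit from Box.
class Box<T> {
    final T value;
    Box(T value) { this.value = value; }
    T get() { return value; }
}

class Cell<T> {
    final T value;
    Cell(T value) { this.value = value; }
    T get() { return value; }
}

class Labeled<T> extends Box<T> {
    final String label;
    Labeled(String label, T value) { super(value); this.label = label; }
}

class NominalDemo {
    // Compiles: Labeled extends Box, so Labeled<String> <: Box<?>.
    static Box<?> widen(Labeled<String> l) { return l; }

    // Would not compile: "static Cell<?> bad(Box<String> b) { return b; }",
    // since no subclassing declaration relates Box and Cell, despite identical structure.
}
```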

Extending Generics: Interval Types and Doubly F-bounded Generics

The construction of the generic subtyping relation using tools from order theory suggests how generics in nominally-typed OOP languages can be extended in two specific directions.

First, the approach suggests how wildcard type arguments can simultaneously have lower bounds and upper bounds, thereby defining interval type arguments (as a generalization of wildcard type arguments) and interval types (as a generalization of wildcard types, which are parameterized types with wildcard type arguments). In particular, interval types and the subtyping relation between them can be constructed simply by replacing the wc operator presented earlier with a novel operator, int, that constructs interval type arguments (and the containment relation between them) from an input subtyping relation.¹⁰ The subtyping relation is then constructed iteratively from the subclassing relation, again as the solution of a recursive poset equation.¹¹

¹⁰ The formal definition of operator int is presented in [12]. Unlike the wc operator, operator int does not require the input poset to be bounded, i.e., it does not assume the existence of a greatest type Object and a least type Null in the input (subtyping) relation.

¹¹ Namely, the equation $\mathcal{S} = \mathrm{ppp}_{\mathcal{C}_g}\big(\mathcal{C}, \mathrm{int}(\mathcal{S})\big)$, where $\mathcal{S}$ is the subtyping relation, $\mathcal{C}$ is the subclassing relation, and $\mathcal{C}_g$ is its subset of generic classes. See [12] for more details.
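The interval idea can be sketched in a few lines, under the assumption that an interval argument is any pair [lo, hi] with lo <: hi over a finite poset, ordered by inclusion. The formal int operator in [12] may treat details (such as missing bounds) differently, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// No greatest (Object) or least (Null) element is needed in the input poset.
class IntervalArgs {
    record Interval(String lo, String hi) {
        @Override public String toString() { return "[" + lo + ", " + hi + "]"; }
    }

    // Every pair lo <: hi over the finite poset becomes an interval type argument.
    static List<Interval> intervals(List<String> elems, BiPredicate<String, String> sub) {
        List<Interval> out = new ArrayList<>();
        for (String lo : elems)
            for (String hi : elems)
                if (sub.test(lo, hi)) out.add(new Interval(lo, hi));
        return out;
    }

    // Containment as interval inclusion: b.lo <: a.lo and a.hi <: b.hi.
    static boolean contained(BiPredicate<String, String> sub, Interval a, Interval b) {
        return sub.test(b.lo(), a.lo()) && sub.test(a.hi(), b.hi());
    }
}
```

With Integer <: Number and an unrelated String, this yields four intervals; [Integer, Number] corresponds to a wildcard with both a lower and an upper bound, which Java cannot express.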

Second, the approach suggests how to define doubly F-bounded generics (dfbg, for short), in which a type variable can have both an upper F-bound and a lower F-bound.¹² See [9] for more details on dfbg.

¹² It is worth noting that the definition of dfbg was inspired by functions in real analysis.

Considering dfbg led us to distinguish between valid type arguments, which satisfy the declared bounds of type parameters, and admittable type arguments, which do not necessarily satisfy these bounds. Accordingly, we define valid parameterized types, whose type arguments are valid, and admittable parameterized types, whose type arguments are admittable but may not be valid (such as the admittable but invalid Java type Enum<Object>). Correspondingly, we define valid subtyping relations, between valid parameterized types, and admittable subtyping relations, between admittable parameterized types that are not necessarily valid.
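The validity condition for Enum<E extends Enum<E>> can be illustrated with reflection: a class X is a valid argument when X <: Enum<X>. The sketch assumes that inspecting the direct generic superclass is enough for enum classes, which is a simplification; EnumValidity, Color and isValidEnumArgument are hypothetical names.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

class EnumValidity {
    // A user-defined enum: javac declares it with generic superclass Enum<Color>.
    enum Color { RED, GREEN }

    // Simplified check for the argument X of Enum<E extends Enum<E>>: X is valid when
    // its direct generic superclass is Enum<X>, so that X <: Enum<X> holds.
    static boolean isValidEnumArgument(Class<?> x) {
        Type sup = x.getGenericSuperclass();
        return sup instanceof ParameterizedType p
                && p.getRawType() == Enum.class
                && p.getActualTypeArguments()[0] == x;
    }

    // Enum<Object> is admittable in the model but invalid: javac rejects it because
    // Object is not within the bound Enum<Object>.
}
```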

Induction, Coinduction, (Co)Inductive Types and Mutual (Co)Induction in Generic OOP

In logic, coinductive reasoning can, intuitively, be summarized as asserting that a statement is proven true if there is no (finite or “good”) reason for the statement not to hold [30, 14]. While analyzing dfbg in [9], we used a coinductive logical argument to prove that checking the validity of type arguments inside some particular bounds-declarations of generic classes is unnecessary. Also, in [38] Tate et al. conclude that Java wildcards are a form of coinductive bounded existentials.¹³

¹³ Given their historical origins [29, 37], induction and coinduction, and accordingly (co)inductive mathematical objects, are naturally best studied in lattice theory, which is a subfield of order theory.
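A generic illustration, not taken from the paper, of the difference between induction and coinduction: on a finite lattice, iterating a monotone function upward from the bottom reaches its least fixed point, and iterating downward from the top reaches its greatest one. Taking F(X) to be the nodes of a graph that have a successor in X, the greatest fixed point is the set of nodes from which an infinite path starts (a property with no finite witness against it), while the least fixed point is empty. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

class FixedPoints {
    // Iterates f from a starting set until a fixed point is reached. For a monotone f on a finite
    // lattice, starting from the empty set gives the lfp and starting from the full set the gfp.
    static Set<Integer> iterate(Set<Integer> start, UnaryOperator<Set<Integer>> f) {
        Set<Integer> cur = start;
        while (true) {
            Set<Integer> next = f.apply(cur);
            if (next.equals(cur)) return cur;
            cur = next;
        }
    }

    static Set<Integer> lfp(UnaryOperator<Set<Integer>> f) {
        return iterate(new HashSet<>(), f);
    }

    static Set<Integer> gfp(Set<Integer> all, UnaryOperator<Set<Integer>> f) {
        return iterate(new HashSet<>(all), f);
    }

    // F(X) = nodes with at least one successor in X (a monotone function).
    static UnaryOperator<Set<Integer>> hasSuccessorIn(Map<Integer, List<Integer>> edges) {
        return x -> {
            Set<Integer> out = new HashSet<>();
            for (Map.Entry<Integer, List<Integer>> e : edges.entrySet())
                for (int succ : e.getValue())
                    if (x.contains(succ)) out.add(e.getKey());
            return out;
        };
    }
}
```

For the graph 0 -> 1, 1 -> 2, 2 -> 1, 3 -> 4 (node 4 has no successors), the greatest fixed point is {0, 1, 2}: node 3 drops out one step after node 4, since its only path ends there.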

Combined, these factors motivated us to consider, in some depth, the status of (co)inductive types in our order-theoretic approach [13], which led us to define the notions of F-subtypes and F-supertypes of a generic class F.¹⁴ The value of defining these notions is illustrated in the definition of dfbg, where a type variable, say T, with both a lower F-bound and an upper F-bound ranges over a set of F-supertypes and F-subtypes specified by the bounds of T. See [9, 13] for more details.

¹⁴ A parameterized type Ty is an F-subtype of class F iff Ty <: F<Ty>. Dually, type Ty is an F-supertype of class F iff F<Ty> <: Ty. The names of these two notions come from category theory (see later discussion), where F-subtypes correspond to F-coalgebras while F-supertypes correspond to F-algebras of a generic class F (called a functor in category theory, and a generator, or a constructor, in lattice theory and order theory).
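In Java, an upper F-bound is written T extends F<T>. The sketch below (hypothetical names) shows such a type variable ranging over Comparable-subtypes, i.e., types Ty with Ty <: Comparable<Ty>, and notes that lower F-bounds cannot be written.

```java
import java.util.List;

class FBounds {
    // T has an upper F-bound, so it ranges over Comparable-subtypes such as String and Integer.
    static <T extends Comparable<T>> T max(List<T> xs) {
        T best = xs.get(0);                       // assumes a non-empty list
        for (T x : xs) if (x.compareTo(best) > 0) best = x;
        return best;
    }

    // A hypothetical user-defined Comparable-subtype: Version <: Comparable<Version>.
    record Version(int major, int minor) implements Comparable<Version> {
        public int compareTo(Version o) {
            return major != o.major ? Integer.compare(major, o.major) : Integer.compare(minor, o.minor);
        }
    }

    // Java offers no lower F-bounds: a declaration such as <T super Comparable<T>> is not legal,
    // so a type variable cannot be restricted to Comparable-supertypes (Comparable<Ty> <: Ty).
}
```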

Further, the mutual dependency between the containment relation (as an ordering of generic type arguments) and the subtyping relation (as an ordering of parameterized types), in addition to the fact that classes in OO programs, including generic classes, are frequently defined mutually recursively¹⁵, led us to also define an order-theoretic notion of mutual (co)induction, which allows studying least and greatest fixed point solutions of mutually-recursive definitions¹⁶ in an order-theoretic context [16].

¹⁵ E.g., assuming the absence of primitive types in Java, the definitions of classes Object and Boolean are mutually dependent (since class Boolean extends Object, and, without the primitive type boolean, the fundamental equals() method in Object returns a Boolean).

¹⁶ Such definitions are common in OOP, and in programming in general.
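A minimal, generic sketch of mutual induction (not the construction in [16]): two mutually recursive set definitions are solved together by iterating on pairs of sets, i.e., on the product lattice, from the bottom pair until nothing changes. The even/odd definitions, the bound, and the names are assumptions for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Least fixed point of two mutually recursive, monotone definitions over 0..bound:
//   Even = {0} union { n+1 | n in Odd },   Odd = { n+1 | n in Even }
class MutualInduction {
    record Pair(Set<Integer> even, Set<Integer> odd) {}

    // One simultaneous application of both definitions.
    static Pair step(Pair p, int bound) {
        Set<Integer> even = new TreeSet<>(Set.of(0));
        Set<Integer> odd = new TreeSet<>();
        for (int n : p.odd()) if (n + 1 <= bound) even.add(n + 1);
        for (int n : p.even()) if (n + 1 <= bound) odd.add(n + 1);
        return new Pair(even, odd);
    }

    // Kleene iteration from the bottom pair (empty, empty).
    static Pair lfp(int bound) {
        Pair cur = new Pair(new TreeSet<>(), new TreeSet<>());
        while (true) {
            Pair next = step(cur, bound);
            if (next.equals(cur)) return cur;
            cur = next;
        }
    }
}
```

For example, lfp(5) yields Even = {0, 2, 4} and Odd = {1, 3, 5}.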

Using Category Theory in Modeling Generics

Category theory can be viewed simply as a (major) generalization of order theory [24, 34, 35]. In particular, each poset can be viewed, canonically, as a (thin, small) category [24].

As such, some concepts and tools from category theory, such as adjunctions, monads, F-(co)algebras, initial algebras (e.g., co-free types), final coalgebras (e.g., free types), and operads, can be used to generalize the order-theoretic model of generics and to situate it in the context of category theory.

A more detailed account of the use of category theory in our approach is presented in [17].

3 Discussion

In this short paper we presented the outline of an order-theoretic model of generic nominally-typed OOP. This model demonstrates that in generic nominally-typed OOP:

• The subtyping relation between parameterized types can be constructed solely from the subclassing (i.e., inheritance) relation between classes using order-theoretic tools,

• Erasure can be modeled as a map (i.e., a homomorphism) from parameterized types ordered by subtyping to classes ordered by subclassing,

• Wildcard type arguments can be modeled as intervals over the subtyping relation¹⁷,

• Generic classes can be modeled as type generators over the subtyping relation¹⁸,

• The complex open and close operations (i.e., capture conversion; see [26, 5.1.10, p.113]) are not needed in the definition of the subtyping relation between ground parameterized types¹⁹, since the relation can be defined exclusively using the containment relation between generic type arguments, and

• Upper F-bounded type variables (e.g., type variable T in the class declaration class Enum<T extends Enum<T>>) range over F-subtypes (modeled as coinductive types of the type generators modeling generic classes), while lower F-bounded type variables range over F-supertypes (modeled as inductive types).

¹⁷ In particular, intervals with upper bound Object or lower bound Null.

¹⁸ I.e., as mathematical functions that take in type arguments, ordered by containment, and produce parameterized types, ordered by subtyping.

¹⁹ Ground parameterized types constitute the full set over which type variables of generic classes range.

Moreover, the model hints that:

• Circular (a.k.a. infinite, or infinitely-justified) subtyping relations can be modeled by an order-theoretic coinductive interpretation [28] of the subtyping relation, and

• Mutually-recursive definitions in OOP (e.g., of classes, and of the subtyping and containment relations) can be modeled by mutually-(co)inductive mathematical objects.

Additionally, we observe that, by incorporating nominal subtyping, the presented order-theoretic model of generics crucially depends on the finite inheritance relation between classes²⁰. On the other hand, extant models of generic OOP, of which capture conversion and bounded existentials are prominent characteristics, are inspired by structural (i.e., non-nominal) models of polymorphic functional programming. Those models thus largely ignore the nominal subclassing relation, which OO software developers declare explicitly, when interpreting the generic subtyping relation and other features of generic OOP. Influenced by their origins in functional programming, those models depend instead on concepts and tools²¹ developed for structural typing and structural subtyping.

²⁰ Since it is always explicitly declared using class names, inheritance/subclassing is an inherently nominal relation.

²¹ Such as existentials, abstract datatypes, and the opening/closing of type “packages.”

On account of these observations we believe the order-theoretic model of generics is a significantly simpler and more intuitive model of generic OOP than extant models, and that it is more in the spirit of nominally-typed OO type systems than those models are.

References