Practical Optional Types for Clojure

12/09/2018 ∙ by Ambrose Bonnaire-Sergeant, et al. ∙ Commonwealth Bank ∙ Indiana University Bloomington

Typed Clojure is an optional type system for Clojure, a dynamic language in the Lisp family that targets the JVM. Typed Clojure enables Clojure programmers to gain greater confidence in the correctness of their code via static type checking while remaining in the Clojure world, and has acquired significant adoption in the Clojure community. Typed Clojure repurposes Typed Racket's occurrence typing, an approach to statically reasoning about predicate tests, and also includes several new type system features to handle existing Clojure idioms. In this paper, we describe Typed Clojure and present these type system extensions, focusing on three features widely used in Clojure. First, multimethods provide extensible operations, and their Clojure semantics turns out to have a surprising synergy with the underlying occurrence typing framework. Second, Java interoperability is central to Clojure's mission but introduces challenges such as ubiquitous null; Typed Clojure handles Java interoperability while ensuring the absence of null-pointer exceptions in typed programs. Third, Clojure programmers idiomatically use immutable dictionaries for data structures; Typed Clojure handles this with multiple forms of heterogeneous dictionary types. We provide a formal model of the Typed Clojure type system incorporating these and other features, with a proof of soundness. Additionally, Typed Clojure is now in use by numerous corporations and developers working with Clojure, and we present a quantitative analysis on the use of type system features in two substantial code bases.

1 Clojure with static typing

The popularity of dynamically-typed languages in software development, combined with a recognition that types often improve programmer productivity, software reliability, and performance, has led to the recent development of a wide variety of optional and gradual type systems aimed at checking existing programs written in existing languages. These include TypeScript [typescript] and Flow [flow] for JavaScript, Hack [hack] for PHP, and mypy [mypy] for Python among the optional systems, and Typed Racket [TF08], Reticulated Python [Vitousek14], and Gradualtalk [gradualtalk] among gradually-typed systems.¹

¹ We use “gradual typing” for systems like Typed Racket with sound interoperation between typed and untyped code; systems like Typed Clojure or TypeScript, which do not enforce type invariants, we describe as “optionally typed”.

One key lesson of these systems, indeed a lesson known to early developers of optional type systems such as Strongtalk, is that type systems for existing languages must be designed to work with the features and idioms of the target language. Often this takes the form of a core language, be it of functions or classes and objects, together with extensions to handle distinctive language features.

We synthesize these lessons to present Typed Clojure, an optional type system for Clojure. Clojure is a dynamically typed language in the Lisp family—built on the Java Virtual Machine (JVM)—which has recently gained popularity as an alternative JVM language. It offers the flexibility of a Lisp dialect, including macros, emphasizes a functional style via immutable data structures, and provides interoperability with existing Java code, allowing programmers to use existing Java libraries without leaving Clojure. Since its initial release in 2007, Clojure has been widely adopted for “backend” development in places where its support for parallelism, functional programming, and Lisp-influenced abstraction is desired on the JVM. As a result, there is an extensive base of existing untyped programs whose developers can benefit from Typed Clojure, an experience we discuss in this paper.

Since Clojure is a language in the Lisp family, we apply the lessons of Typed Racket, an existing gradual type system for Racket, to the core of Typed Clojure, consisting of an extended λ-calculus over a variety of base types shared between all Lisp systems. Furthermore, Typed Racket’s occurrence typing has proved necessary for type checking realistic Clojure programs.

(ann pname [(U File String) -> (U nil String)])
(defmulti pname class)  ; multimethod dispatching on class of argument
(defmethod pname String [s] (pname (new File s))) ; String case
(defmethod pname File [f] (.getName f)) ; File case, static null check
(pname "STAINS/JELLY") ;=> "JELLY" :- (U nil Str)
Figure 1: A simple Typed Clojure program

However, Clojure goes beyond Racket in many ways, requiring several new type system features which we detail in this paper. Most significantly, Clojure supports, and Clojure developers use, multimethods to structure their code in extensible fashion. Furthermore, since Clojure is an untyped language, dispatch within multimethods is determined by application of dynamic predicates to argument values. Fortunately, the dynamic dispatch used by multimethods has surprising symmetry with the conditional dispatch handled by occurrence typing. Typed Clojure is therefore able to effectively handle complex and highly dynamic dispatch as present in existing Clojure programs.

But multimethods are not the only Clojure feature crucial to type checking existing programs. As a language built on the Java Virtual Machine, Clojure provides flexible and transparent access to existing Java libraries, and Clojure/Java interoperation is found in almost every significant Clojure code base. Typed Clojure therefore builds in an understanding of the Java type system and handles interoperation appropriately. Notably, null is a distinct type in Typed Clojure, designed to automatically rule out null-pointer exceptions.

An example of these features is given in Figure 1. Here, the pname multimethod dispatches on the class of the argument: for Strings, the first method implementation is called; for Files, the second. The String method calls a File constructor, which returns a non-nil File instance; the getName method on File requires a non-nil target and returns a nilable type.

Finally, flexible, high-performance immutable dictionaries are the most common Clojure data structure. Simply treating them as uniformly-typed key-value mappings would be insufficient for existing programs and programming styles. Instead, Typed Clojure provides a flexible heterogeneous map type, in which specific entries can be specified.

While these features may seem disparate, they are unified in important ways. First, they leverage the type system mechanisms inherited from Typed Racket: multimethods when using dispatch via predicates, Java interoperation for handling null tests, and heterogeneous maps using union types and reasoning about subcomponents of data. Second, they are crucial features for handling Clojure code in practice. Typed Clojure’s use in real Clojure deployments would not be possible without effective handling of these three Clojure features.

Our main contributions are as follows:

  1. We motivate and describe Typed Clojure, an optional type system for Clojure that understands existing Clojure idioms.

  2. We present a sound formal model for three crucial type system features: multimethods, Java interoperability, and heterogeneous maps.

  3. We evaluate the use of Typed Clojure features on existing Typed Clojure code, including both open source and in-house systems.

The remainder of this paper begins with an example-driven presentation of the main type system features in Section 2. We then incrementally present a core calculus for Typed Clojure covering all of these features together in Section 3 and prove type soundness (sec:metatheory). We then present an empirical analysis of significant code bases written in core.typed (the full implementation of Typed Clojure) in sec:experience. Finally, we discuss related work and conclude.

2 Overview of Typed Clojure

We now begin a tour of the central features of Typed Clojure, beginning with Clojure itself. Our presentation uses the full Typed Clojure system to illustrate key type system ideas² before studying the core features in detail in Section 3.

² Full examples: https://github.com/typedclojure/esop16

2.1 Clojure

Clojure [Hic08] is a Lisp that runs on the Java Virtual Machine with support for concurrent programming and immutable data structures in a mostly-functional style. Clojure provides easy interoperation with existing Java libraries, with Java values being like any other Clojure value. However, this smooth interoperability comes at the cost of pervasive null, which leads to the possibility of null pointer exceptions—a drawback we address in Typed Clojure.

2.2 Typed Clojure

A simple one-argument function greet is annotated with ann to take and return strings.

(ann greet [Str -> Str])
(defn greet [n] (str "Hello, " n "!"))
(greet "Grace") ;=> "Hello, Grace!" :- Str

Providing nil (exactly Java’s null) is a static type error: nil is not a string.

(greet nil) ; Type Error: Expected Str, given nil

Unions

To allow nil, we use ad-hoc unions (nil and false are logically false).

(ann greet-nil [(U nil Str) -> Str])
(defn greet-nil [n] (str "Hello" (when n (str ", " n)) "!"))
(greet-nil "Donald") ;=> "Hello, Donald!" :- Str
(greet-nil nil)      ;=> "Hello!"         :- Str

Typed Clojure prevents well-typed code from dereferencing nil.

Flow analysis

Occurrence typing [TF10] models type-based control flow. In greetings, a branch ensures repeat is never passed nil.

(ann greetings [Str (U nil Int) -> Str])
(defn greetings [n i]
  (str "Hello, " (when i (apply str (repeat i "hello, "))) n "!"))
(greetings "Donald" 2)  ;=> "Hello, hello, hello, Donald!" :- Str
(greetings "Grace" nil) ;=> "Hello, Grace!"                :- Str

Removing the branch is a static type error: repeat cannot be passed nil.

(ann greetings-bad [Str (U nil Int) -> Str])
(defn greetings-bad [n i]           ; Expected Int, given (U nil Int)
  (str "Hello, " (apply str (repeat i "hello, ")) n "!"))
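As a rough runtime counterpart to the static check, the following plain Clojure sketch (no type checker involved; the annotation appears only as a comment) shows why the guard matters: when returns nil for a nil count, and str renders nil as the empty string, so repeat is only reached with a number.

```clojure
;; Plain Clojure; this sketch does not load Typed Clojure.
;; (ann greetings [Str (U nil Int) -> Str])
(defn greetings [n i]
  (str "Hello, "
       ;; `when` yields nil if i is nil, so `repeat` never receives nil.
       (when i (apply str (repeat i "hello, ")))
       n "!"))

(println (greetings "Donald" 2))
(println (greetings "Grace" nil))
```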

2.3 Java interoperability

Clojure can interact with Java constructors, methods, and fields. This program calls the getParent method on a constructed File instance, returning a nullable string.

Example 1

(.getParent (new File "a/b"))  ;=> "a" :- (U nil Str)

Typed Clojure can integrate with the Clojure compiler to avoid expensive reflective calls like getParent; however, if a specific overload cannot be found based on the surrounding static context, a type error is thrown.

(fn [f] (.getParent f)) ; Type Error: Unresolved interop: getParent

Function arguments default to Any, which is similar to a union of all types. Ascribing a parameter type allows Typed Clojure to find a specific method.

Example 2

(ann parent [(U nil File) -> (U nil Str)])
(defn parent [f] (if f (.getParent f) nil))

The conditional guards against dereferencing nil and, as before, removing it is a static type error, since typed code could then dereference nil.

(defn parent-bad-in [f :- (U nil File)]
  (.getParent f)) ; Type Error: Cannot call instance method on nil.

Typed Clojure rejects programs that assume methods cannot return nil.

(defn parent-bad-out [f :- File] :- Str
  (.getParent f)) ; Type Error: Expected Str, given (U nil Str).

Method targets can never be nil. Typed Clojure also prevents passing nil as Java method or constructor arguments by default—this restriction can be adjusted per method.

In contrast, JVM invariants guarantee constructors return non-null.³

³ http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.9.4

Example 3

(parent (new File s))
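The nil behavior that these interop rules guard against can be observed at runtime. The sketch below is plain Clojure with the annotation kept as a comment. The ^File type hint is an addition for this sketch that lets the compiler resolve getParent without reflection.

```clojure
(import 'java.io.File)

;; (ann parent [(U nil File) -> (U nil Str)])  ; annotation shown as a comment only
(defn parent [^File f]
  ;; The conditional keeps .getParent from being called on nil.
  (if f (.getParent f) nil))

(println (parent (File. "a/b")))  ; path with a parent component
(println (parent (File. "a")))    ; no parent component, so getParent returns null
(println (parent nil))            ; nil target is handled by the guard
```

The second call illustrates why the return type must be (U nil Str): a File with no parent component yields nil even though the target itself is non-nil.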

2.4 Multimethods

Multimethods are a kind of extensible function—combining a dispatch function with one or more methods—widely used to define Clojure operations.

Value-based dispatch

This simple multimethod takes a keyword (Kw) and says hello in different languages.

Example 4

(ann hi [Kw -> Str]) ; multimethod type
(defmulti hi identity) ; dispatch function identity
(defmethod hi :en [_] "hello") ; method for :en
(defmethod hi :fr [_] "bonjour") ; method for :fr
(defmethod hi :default [_] "um...") ; default method

When invoked, the arguments are first supplied to the dispatch function (here identity), yielding a dispatch value. A method is then chosen based on the dispatch value, and the arguments are passed to that method to return a value.

(map hi [:en :fr :bocce]) ;=> ("hello" "bonjour" "um...")

For example, (hi :en) evaluates to "hello": it executes the :en method because (= (identity :en) :en) is true and (= (identity :en) :fr) is false.

Dispatching based on literal values enables certain forms of method definition, but this is only part of the story for multimethod dispatch.
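To make the two dispatch steps concrete, here is a small runnable sketch in plain Clojure. The helper explain-hi is hypothetical and exists only for this illustration; it records the dispatch value and the equality tests described above alongside the result of the real multimethod call.

```clojure
(defmulti hi identity)                ; dispatch function is identity
(defmethod hi :en [_] "hello")
(defmethod hi :fr [_] "bonjour")
(defmethod hi :default [_] "um...")

;; Hypothetical helper: spells out the steps taken for one argument.
(defn explain-hi [k]
  (let [dispatch-value (identity k)]
    {:dispatch-value dispatch-value
     :matches-en?    (= dispatch-value :en)
     :matches-fr?    (= dispatch-value :fr)
     :result         (hi k)}))

(println (explain-hi :en))
(println (explain-hi :bocce))   ; no method matches, so :default is used
```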

Class-based dispatch

For class values, multimethods can choose methods based on subclassing relationships. Recall the multimethod from Figure 1. The dispatch function class dictates whether the String or File method is chosen. The multimethod dispatch rules use isa?, a hybrid predicate which is both a subclassing check for classes and an equality check for other values.

(isa? :en :en)       ;=> true
(isa? String Object) ;=> true

The current dispatch value and, in turn, each method’s associated dispatch value are supplied to isa?. If isa? returns true for exactly one method, that method is chosen. For example, the call (pname "STAINS/JELLY") picks the String method because (isa? String String) is true, and (isa? String File) is not.
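The selection rule just described can be modelled in a few lines of plain Clojure. This is a simplified sketch of the rule as stated in the text, not Clojure's actual MultiFn implementation. The names select-method, pname-methods, and pname-model are hypothetical, and the model reports an error whenever the match is not unique.

```clojure
(import 'java.io.File)

;; Hypothetical helper: choose the single method whose dispatch value
;; satisfies (isa? dispatch-val method-dispatch-val).
(defn select-method [method-table dispatch-val]
  (let [matches (filter (fn [[dv _]] (isa? dispatch-val dv)) method-table)]
    (if (= 1 (count matches))
      (val (first matches))
      (throw (ex-info "No unique method"
                      {:dispatch-val dispatch-val
                       :candidates   (map key matches)})))))

;; Method table mirroring pname from Figure 1, keyed by class.
(def pname-methods
  {String (fn [s] (.getName (File. ^String s)))
   File   (fn [f] (.getName ^File f))})

(defn pname-model [x]
  ;; The dispatch function is `class`, as in (defmulti pname class).
  ((select-method pname-methods (class x)) x))

(println (pname-model "STAINS/JELLY"))          ; String method selected
(println (pname-model (File. "STAINS/JELLY")))  ; File method selected
```

Because the model insists on a unique match, a table containing both String and Object entries would be rejected for a String argument; that case is outside the simplified rule given here.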

2.5 Heterogeneous hash-maps

The most common way to represent compound data in Clojure is with immutable hash-maps, typically with keyword keys. Keywords double as functions that look themselves up in a map, or return nil if absent.

Example 5

(def breakfast {:en "waffles" :fr "croissants"})
(:en breakfast)    ;=> "waffles" :- Str
(:bocce breakfast) ;=> nil       :- nil

HMap types describe the most common usages of keyword-keyed maps.

breakfast ; :- (HMap :mandatory {:en Str, :fr Str}, :complete? true)

This says :en and :fr are known entries mapped to strings, and the map is fully specified—that is, no other entries exist—by :complete? being true.

HMap types default to partial specification, with '{:en Str :fr Str} abbreviating (HMap :mandatory {:en Str, :fr Str}).

Example 6

(ann lunch '{:en Str :fr Str})
(def lunch {:en "muffin" :fr "baguette"})
(:bocce lunch) ;=> nil :- Any ; less accurate type
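The difference between partial and complete specification can be pictured with a runtime predicate. The sketch below is a hypothetical analogue written in plain Clojure, not part of Typed Clojure, which performs this reasoning statically. The names hmap-conforms?, partial-spec, and complete-spec are assumptions made for this illustration.

```clojure
;; Hypothetical runtime analogue of an HMap type. `mandatory` maps keys to
;; predicates; when `complete?` is true, no other keys may be present.
(defn hmap-conforms?
  [{:keys [mandatory complete?]} m]
  (and (map? m)
       (every? (fn [[k pred]] (and (contains? m k) (pred (get m k))))
               mandatory)
       (or (not complete?)
           (= (set (keys m)) (set (keys mandatory))))))

(def partial-spec  {:mandatory {:en string? :fr string?}})
(def complete-spec (assoc partial-spec :complete? true))

(def breakfast {:en "waffles" :fr "croissants"})
(def brunch    (assoc breakfast :de "brezel"))   ; one extra entry

(println (hmap-conforms? complete-spec breakfast)) ; exactly the mandatory keys
(println (hmap-conforms? partial-spec brunch))     ; extra keys tolerated
(println (hmap-conforms? complete-spec brunch))    ; extra key rejected
```

Under a partial specification nothing is known about keys such as :bocce, which is why the lookup on lunch above is typed Any rather than nil.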

HMaps in practice

The next example is extracted from a production system at CircleCI, a company with a large production Typed Clojure system (sec:casestudy presents a case study and empirical results from this code base).

Example 7

(defalias RawKeyPair ; extra keys disallowed
  (HMap :mandatory {:pub RawKey, :priv RawKey},
        :complete? true))
(defalias EncKeyPair ; extra keys disallowed
  (HMap :mandatory {:pub RawKey, :enc-priv EncKey}, :complete? true))
(ann enc-keypair [RawKeyPair -> EncKeyPair])
(defn enc-keypair [kp]
  (assoc (dissoc kp :priv) :enc-priv (encrypt (:priv kp))))

As EncKeyPair is fully specified, we remove extra keys like :priv via dissoc, which returns a new map that is the first argument without the entry named by the second argument. Notice that removing dissoc causes a type error.

(defn enc-keypair-bad [kp] ; Type error: :priv disallowed
  (assoc kp :enc-priv (encrypt (:priv kp))))
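The shape difference that the type checker detects is visible at runtime as well. In this plain Clojure sketch, encrypt is a hypothetical stand-in (it merely reverses a string) because the real function is not shown in the excerpt, and keys are represented as strings for the same reason.

```clojure
;; Hypothetical stand-in for the real `encrypt`; it only reverses the string.
(defn encrypt [raw-key]
  (apply str (reverse raw-key)))

(defn enc-keypair [kp]
  ;; dissoc drops :priv before the encrypted key is added.
  (assoc (dissoc kp :priv) :enc-priv (encrypt (:priv kp))))

(defn enc-keypair-bad [kp]
  ;; Without dissoc the raw private key stays in the result.
  (assoc kp :enc-priv (encrypt (:priv kp))))

(def kp {:pub "PUB" :priv "secret"})

(println (set (keys (enc-keypair kp))))      ; :pub and :enc-priv only
(println (set (keys (enc-keypair-bad kp))))  ; :priv is still present
```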

2.6 HMaps and multimethods, joined at the hip

HMaps and multimethods are the primary ways for representing and dispatching on data respectively, and so are intrinsically linked. As type system designers, we must search for a compositional approach that can anticipate any combination of these features.

Thankfully, occurrence typing, originally designed for reasoning about if tests, provides the compositional approach we need. By extending the system with a handful of rules based on HMaps and other functions, we can automatically cover both easy cases and those that compose rules in arbitrary ways.

Furthermore, this approach extends to multimethod dispatch by reusing occurrence typing’s approach to conditionals and encoding a small number of rules to handle the isa?-based dispatch. In practice, conditional-based control flow typing extends to multimethod dispatch, and vice-versa.

We first demonstrate a very common, simple dispatch style, then move on to deeper structural dispatching where occurrence typing’s compositionality shines.

HMaps and unions

Partially specified HMaps with a common dispatch key combine naturally with ad-hoc unions. An Order is one of three kinds of HMaps.

(defalias Order "A meal order, tracking dessert quantities."
  (U '{:Meal ':lunch, :desserts Int} '{:Meal ':dinner :desserts Int}
     '{:Meal ':combo :meal1 Order :meal2 Order}))

The :Meal entry is common to each HMap, always mapped to a known keyword singleton type. It’s natural to dispatch on the class of an instance—it’s similarly natural to dispatch on a known entry like :Meal.

Example 8

(ann desserts [Order -> Int])
(defmulti desserts :Meal)  ; dispatch on :Meal entry
(defmethod desserts :lunch [o] (:desserts o))
(defmethod desserts :dinner [o] (:desserts o))
(defmethod desserts :combo [o]
  (+ (desserts (:meal1 o)) (desserts (:meal2 o))))
(desserts {:Meal :combo, :meal1 {:Meal :lunch :desserts 1},
           :meal2 {:Meal :dinner :desserts 2}}) ;=> 3

The :combo method is verified to only structurally recur on Orders. This is achieved because we learn the argument o must be of type '{:Meal ':combo} since (isa? (:Meal o) :combo) is true. Combining this with the fact that o is an Order eliminates the possibility of :lunch and :dinner orders, simplifying o to '{:Meal ':combo :meal1 Order :meal2 Order}, which contains appropriate arguments for both recursive calls.

Nested dispatch

A more exotic dispatch mechanism for desserts might be on the class of the :desserts key. If the result is a number, then we know the :desserts key is a number, otherwise the input is a :combo meal. We have already seen dispatch on class and on keywords in isolation—occurrence typing automatically understands control flow that combines its simple building blocks.

The first method has dispatch value Long, a subtype of Int, and the second method has nil, the sentinel value for a failed map lookup. In practice, :lunch and :dinner meals will dispatch to the Long method, but Typed Clojure infers a slightly more general type due to the definition of :combo meals.

Example 9

(ann desserts [Order -> Int])
(defmulti desserts
  (fn [o :- Order] (class (:desserts o))))
(defmethod desserts Long [o]
  ; o :- (U '{:Meal (U ':dinner ':lunch), :desserts Int}
  ;         '{:Meal ':combo, :desserts Int, :meal1 Order, :meal2 Order})
  (:desserts o))
(defmethod desserts nil [o]
  ; o :- '{:Meal ':combo, :meal1 Order, :meal2 Order}
  (+ (desserts (:meal1 o)) (desserts (:meal2 o))))

In the Long method, Typed Clojure learns that its argument is at least of type '{:desserts Long}, since (isa? (class (:desserts o)) Long) must be true. Here the :desserts entry must be present and mapped to a Long, even in a :combo meal, which does not specify :desserts as present or absent.

In the nil method, (isa? (class (:desserts o)) nil) must be true, which implies (class (:desserts o)) is nil. Since lookups on missing keys return nil, either

  • o maps the :desserts entry to nil, like the value {:desserts nil}, or

  • o is missing a :desserts entry.

We can express this type with the :absent-keys HMap option:

(U '{:desserts nil} (HMap :absent-keys #{:desserts}))

This eliminates non-:combo meals since their '{:desserts Int} type does not agree with this new information (because :desserts is neither nil nor absent).
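The nested dispatch can be run directly in plain Clojure, with annotations omitted and no type checker involved. The sketch below also prints the two situations in which the dispatch value is nil: a missing :desserts entry and an entry mapped to nil. The var name combo is an assumption for this illustration.

```clojure
;; (ann desserts [Order -> Int])  ; annotation shown as a comment only
(defmulti desserts
  (fn [o] (class (:desserts o))))   ; dispatch on the class of the :desserts entry

(defmethod desserts Long [o]
  (:desserts o))

(defmethod desserts nil [o]         ; (class nil) is nil, the failed-lookup case
  (+ (desserts (:meal1 o)) (desserts (:meal2 o))))

(def combo {:Meal  :combo
            :meal1 {:Meal :lunch  :desserts 1}
            :meal2 {:Meal :dinner :desserts 2}})

(println (class (:desserts combo)))            ; missing key
(println (class (:desserts {:desserts nil})))  ; key mapped to nil
(println (desserts combo))
```

Clojure reads integer literals such as 1 and 2 as java.lang.Long, which is why the Long method handles the :lunch and :dinner orders here.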

From multiple to arbitrary dispatch

Clojure multimethod dispatch, and Typed Clojure’s handling of it, goes even further, supporting dispatch on multiple arguments via vectors. Dispatch on multiple arguments is beyond the scope of this paper, but the same intuition applies—adding support for multiple dispatch admits arbitrary combinations and nestings of it and previous dispatch rules.

3 A Formal Model of $\lambda_{TC}$

After demonstrating the core features of Typed Clojure, we link them together in a formal model called $\lambda_{TC}$. Building on occurrence typing, we incrementally add each novel feature of Typed Clojure to the formalism, interleaving presentation of syntax, typing rules, operational semantics, and subtyping.

3.1 Core type system

We start with a review of occurrence typing [TF10], the foundation of $\lambda_{TC}$.

Expressions

Syntax is given in main:figure:termsyntax. Expressions $e$ include variables $x$, values $v$, applications, abstractions, conditionals, and let expressions. All binding forms introduce fresh variables, a subtle but important point since our type environments are not simply dictionaries. Values include booleans $b$, $\mathsf{nil}$, class literals $C$, keywords $k$, integers $n$, constants $c$, and strings $s$. Lexical closures $[\rho, \lambda x^{\tau}.e]_{c}$ close value environments $\rho$, which map bindings to values, over functions.

Types

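Types $\sigma$ or $\tau$ include the top type $\top$, untagged unions $(\bigcup\ \overrightarrow{\tau})$, singletons $(\mathbf{Val}\ l)$, and class instances $C$. We abbreviate the classes Boolean to $\mathsf{B}$, Keyword to $\mathsf{K}$, Nat to $\mathsf{N}$, String to $\mathsf{S}$, and File to $\mathsf{F}$. We also abbreviate the types $(\bigcup)$ to $\bot$, $(\mathbf{Val}\ \mathsf{nil})$ to $\mathsf{nil}$, $(\mathbf{Val}\ \mathsf{true})$ to $\mathsf{true}$, and $(\mathbf{Val}\ \mathsf{false})$ to $\mathsf{false}$. The difference between the types $(\mathbf{Val}\ C)$ and $C$ is subtle. The former is inhabited by class literals like $\mathsf{K}$ and the result of (class :a); the latter by instances of classes, like a keyword literal :a, an instance of the type $\mathsf{K}$. Function types $x{:}\sigma \xrightarrow[o]{\psi|\psi} \tau$ contain latent (terminology from [Lucassen88polymorphiceffect]) propositions $\psi$, object $o$, and return type $\tau$, which may refer to the function argument $x$. They are instantiated with the actual object of the argument in applications.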

Objects

Each expression is associated with a symbolic representation called an object. For example, variable m has object $m$; (class (:lunch m)) has object $\mathbf{class}(\mathbf{key}_{:lunch}(m))$; and 42 has the empty object $\emptyset$ since it is unimportant in our system. main:figure:termsyntax gives the syntax for objects $o$: non-empty objects $\pi(x)$ combine a root variable $x$ and a path $\pi$, which consists of a possibly-empty sequence of path elements ($pe$) applied right-to-left from the root variable. We use two path elements, $\mathbf{class}$ and $\mathbf{key}_{k}$, representing the results of calling class and looking up a keyword $k$, respectively.
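One way to picture objects is as data: a root variable plus a sequence of path elements that is applied right-to-left. The encoding below is a hypothetical illustration in plain Clojure, not the formal model itself. The map shape {:root ... :path ...} and the names apply-path-element and object-value are assumptions made for this sketch.

```clojure
;; Hypothetical encoding: an object is {:root <symbol> :path [<path element> ...]},
;; where a path element is either :class or [:key k].
(defn apply-path-element [pe v]
  (if (= pe :class)
    (class v)              ; result of calling class
    (let [[_ k] pe]
      (get v k))))         ; result of looking up keyword k

(defn object-value
  "Evaluates an object against a value environment `env` (symbol -> value),
  applying path elements right-to-left starting from the root variable."
  [{:keys [root path]} env]
  (reduce (fn [v pe] (apply-path-element pe v))
          (get env root)
          (reverse path)))

(def env {'m {:lunch 1}})

(println (object-value {:root 'm :path []} env))                      ; value of m itself
(println (object-value {:root 'm :path [:class [:key :lunch]]} env))  ; class of (:lunch m)
```

Reading the path [:class [:key :lunch]] right-to-left performs the keyword lookup first and then takes the class of the result, matching the order in which the corresponding expression (class (:lunch m)) is evaluated.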

Propositions with a logical system

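In standard type systems, association lists often track the types of variables, as in LC-Let and LC-Local.

\[
\textsc{LC-Let}\quad
\frac{\Gamma \vdash e_1 : \sigma \qquad \Gamma, x \mapsto \sigma \vdash e_2 : \tau}
     {\Gamma \vdash (\mathsf{let}\ [x\ e_1]\ e_2) : \tau}
\qquad
\textsc{LC-Local}\quad
\frac{\Gamma(x) = \tau}{\Gamma \vdash x : \tau}
\]

Occurrence typing instead pairs logical formulas, which can reason about arbitrary non-empty objects, with a proof system. The logical statement $\sigma_x$ says variable $x$ is of type $\sigma$.

\[
\textsc{T0-Let}\quad
\frac{\Gamma \vdash e_1 : \sigma \qquad \Gamma, \sigma_x \vdash e_2 : \tau}
     {\Gamma \vdash (\mathsf{let}\ [x\ e_1]\ e_2) : \tau}
\qquad
\textsc{T0-Local}\quad
\frac{\Gamma \vdash \tau_x}{\Gamma \vdash x : \tau}
\]

In T0-Local, $\Gamma \vdash \tau_x$ appeals to the proof system to solve for $\tau$.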