Type-Level Computations for Ruby Libraries

04/06/2019 ∙ by Milod Kazerounian, et al. ∙ University of Maryland, Tufts University, IMDEA Networks Institute

Many researchers have explored ways to bring static typing to dynamic languages. However, to date, such systems are not precise enough when types depend on values, which often arises when using certain Ruby libraries. For example, the type safety of a database query in Ruby on Rails depends on the table and column names used in the query. To address this issue, we introduce CompRDL, a type system for Ruby that allows library method type signatures to include type-level computations (or comp types for short). Combined with singleton types for table and column names, comp types let us give database query methods type signatures that compute a table's schema to yield very precise type information. Comp types for hash, array, and string libraries can also increase precision and thereby reduce the need for type casts. We formalize CompRDL and prove its type system sound. Rather than type check the bodies of library methods with comp types (those methods may include native code or be complex), CompRDL inserts run-time checks to ensure library methods abide by their computed types. We evaluated CompRDL by writing annotations with type-level computations for several Ruby core libraries and database query APIs. We then used those annotations to type check two popular Ruby libraries and four Ruby on Rails web apps. We found the annotations were relatively compact and could successfully type check 132 methods across our subject programs. Moreover, the use of type-level computations allowed us to check more expressive properties, with fewer manually inserted casts, than was possible without type-level computations. In the process, we found two type errors and a documentation error that were confirmed by the developers. Thus, we believe CompRDL is an important step forward in bringing precise static type checking to dynamic languages.

1. Introduction

There is a large body of research on adding static typing to dynamic languages (Furr et al., 2009; Ren and Foster, 2016; Ren et al., 2013; Tobin-Hochstadt and Felleisen, 2006, 2008; Anderson et al., 2005; Lerner et al., 2013; Thiemann, 2005; Ancona et al., 2007; Aycock, 2000). However, existing systems have limited support for the case when types depend on values. Yet this case occurs surprisingly often, especially in Ruby libraries. For example, consider the following database query, written for a hypothetical Ruby on Rails (a web framework, called Rails henceforth) app:

Person.joins(:apartments).where({name: 'Alice', age: 30, apartments: {bedrooms: 2}})

This query uses the ActiveRecord DSL to join two database tables, people (Rails knows the plural of person is people) and apartments, and then filter on the values of various columns (name, age, bedrooms) in the result.

We would like to type check such code, e.g., to ensure the columns exist and the values being matched are of the right types. But we face an important problem: what type signature do we give joins? Its return type, which should describe the joined table, depends on the value of its argument. Moreover, the number of ways to join two tables, three tables, and so on grows rapidly with the number of tables. Enumerating all these combinations is impractical.

To address this problem, in this paper we introduce CompRDL, which extends RDL (Foster et al., 2018), a Ruby type system, to include method types with type-level computations, henceforth referred to as comp types. More specifically, in CompRDL we can annotate library methods with type signatures in which Ruby expressions can appear as types. During type checking, those expressions are evaluated to produce the actual type signature, and then typing proceeds as usual. For example, for the call to Person.joins, by using a singleton type for :apartments, a type-level computation can look up the database schemas for the receiver and argument and then construct an appropriate return type. (The use of type-level computations and singleton types could be considered dependent typing, but as our type system is much more restricted we introduce new terminology to avoid confusion; see § 2.4 for discussion.)

Moreover, the same type signature can work for any model class and any combination of joins. And, because CompRDL allows arbitrary computation in types, CompRDL type signatures have access to the full, highly dynamic Ruby environment. This allows us to provide very precise types for the large set of Rails database query methods. It also lets us give precise types to methods of finite hash types (heterogeneous hashes), tuple types (heterogeneous arrays), and const string types (immutable strings), which can help eliminate type casts that would otherwise be required.

Note that in all these cases, we apply comp types to library methods whose bodies we do not type check, in part to avoid complex, potentially undecidable reasoning about whether a method body matches a comp type, but more practically because those library methods are either implemented in native code (hashes, arrays, strings) or are complex (database queries). This design choice makes CompRDL a particularly practical system which we can apply to real-world programs. To maintain soundness, we insert dynamic checks to ensure that these methods abide by their computed types at runtime. (§ 2 gives an overview of typing in CompRDL.)

We introduce a core, object-oriented calculus that formalizes CompRDL type checking. In this calculus, library methods can be declared with comp type signatures that pair each type-level expression with a conventional (likely overapproximate) type, one pair for the argument and one for the return value. The precise argument and return types are determined by evaluating the type-level expressions, and that evaluation may refer to the type of the receiver and the type of the argument. The calculus also type checks the type-level expressions themselves, to ensure they do not go wrong. To avoid potential infinite recursion, it does not use type-level computations during this type checking process, instead using the conventional types for library methods. Finally, the calculus includes a rewriting step to insert dynamic checks to ensure library methods abide by their computed types. We prove that the type system of the calculus is sound. (See § 3 for our formalism.)

We implemented CompRDL on top of RDL, an existing Ruby type checker. Since CompRDL can include type-level computation that relies on mutable values, CompRDL inserts additional runtime checks to ensure such computations evaluate to the same result at method call time as they did at type checking time. Additionally, CompRDL uses a lightweight analysis to check that type-level computations (and thus type checking) terminate. The termination analysis uses purity effects to check that calls that invoke iterator methods—the main source of looping in Ruby, in our experience—do not mutate the receiver, which could introduce non-termination. Finally, we found that several kinds of comp types we developed needed to include weak type updates to handle mutation in Ruby programs. (§ 4 describes our implementation in more detail.)

We evaluated CompRDL by first using it to write type annotations for 482 Ruby core library methods and 104 Rails database query methods. We found that by using helper methods, we could write very precise type annotations for all 586 methods with just a few lines of code on average. Then, we used those annotations to type check 132 methods across two Ruby APIs and four Ruby on Rails web apps. We were able to successfully type check all these methods in approximately 15 seconds total. In doing so, we also found two type errors and a documentation error, which we confirmed with the developers. We also found that, with comp types, type checking these benchmarks required 4.75× fewer type cast annotations compared to standard types, demonstrating comp types’ increased precision. (§ 5 contains the results of our evaluation.)

Our results suggest that using type-level computations provides a powerful, practical, and precise way to statically type check code written in dynamic languages.

2. Overview

The starting point for our work is RDL (Foster et al., 2018), a system for adding type checking and contracts to Ruby programs. RDL’s type system is notable because type checking statically analyzes source code, but it does so at runtime. For example, line 6 in Figure 1(a) gives a type signature for the method defined on the subsequent line. This “annotation” is actually a call to the method type (in Ruby, parentheses in a method call are optional), which stores the type signature in a global table. The type annotation includes a label :model. (In Ruby, strings prefixed by colon are symbols, which are interned strings.) When the program subsequently calls RDL.do_typecheck :model (not shown), RDL will type check the source code of all methods whose type annotations are labeled :model.
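The "store now, check later" design can be illustrated with a minimal sketch. The names SIGNATURES, annotate, and methods_to_check are hypothetical and are not part of RDL's API; the sketch only shows how labeled signatures could be recorded in a global table and retrieved once metaprogramming has finished.

```ruby
# Hypothetical global table: label => list of [method name, signature] pairs.
SIGNATURES = Hash.new { |table, label| table[label] = [] }

# Record a signature under a label; nothing is type checked at this point.
def annotate(meth, signature, typecheck:)
  SIGNATURES[typecheck] << [meth, signature]
end

# Later, list every method that was registered under the given label.
# A real checker would analyze the source of each of these methods here.
def methods_to_check(label)
  SIGNATURES.fetch(label, []).map(&:first)
end

annotate :available?, "(String, String) -> %bool", typecheck: :model
annotate :image_url, "() -> String", typecheck: :client

p methods_to_check(:model)
```

Because the table is only consulted when checking is requested, annotations generated by metaprogramming can be registered at any earlier point in the program's execution.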

This design enables RDL to support the metaprogramming that is common in Ruby and ubiquitous in Rails. For example, the programmer can perform type checking after metaprogramming code has run, when corresponding type definitions are available. See Ren and Foster (2016) for more details. We note that while CompRDL benefits from this runtime type checking approach—we use RDL’s representation of types in our CompRDL signatures, and our subject programs include Rails apps—there is nothing specific in the design of comp types that relies on it, and one could implement comp types in a fully static system.

2.1. Typing Ruby Database Queries

While RDL’s type system is powerful enough to type check Rails apps in general, it is actually very imprecise when reasoning about database (DB) queries. For example, consider Figure 1(a), which shows some code from the Discourse app. Among others, this app uses two tables, users and emails, whose schemas are shown on lines 2 and 3. Each user has an id, a username, and a flag indicating whether the account was staged. Such staged accounts were created automatically by Discourse and can be claimed by the email address owner. An email has an id, the email address, and the user_id of the user who owns the email address.

 1  # Table Schema
 2  # users: { id: Integer, username: String, staged: bool }
 3  # emails: { id: Integer, email: String, user_id: Integer }
 4
 5  class User < ActiveRecord::Base
 6    type "(String, String) -> %bool", typecheck: :model
 7    def self.available?(name, email)
 8      return false if reserved?(name)
 9      return true if !User.exists?({username: name})
10      # staged user accounts can be claimed
11      return User.joins(:emails).exists?({staged: true, username: name, emails: { email: email }})
12    end
13  end
(a) Discourse code (uses ActiveRecord).

 1  type Table, :exists?, "(<<schema_type(tself)>>) -> Boolean"
 2  type Table, :joins, "(t<:Symbol) ->
 3      <<if t.is_a?(Singleton)
 4        then Generic.new(Table, schema_type(tself).merge({t.val=>schema_type(t)}))
 5        else Nominal.new(Table)
 6        end >>"
 7
 8  def schema_type(t)
 9    if t.is_a?(Generic) && (t.base == Table) # Table<T>
10      return t.param # return T
11    elsif t.is_a?(Singleton) # Class or :symbol
12      table_name = t.val # get the class/symbol value
13      table_type = RDL.db_schema[table_name]
14      return table_type.param
15    else # will only be reached for the nominal type Table
16      return … # returns Hash<Symbol, Object>
17    end
18  end
(b) Comp type annotations for query methods.
Figure 1. Type Checking Database Queries in Discourse.

Next in the figure, we show code for the class User, which is a model, i.e., instances of the class correspond to rows in the users table. This class has one method, available?, which returns a boolean indicating whether the username and email address passed as arguments are available. The method first checks whether the username was already reserved (line 8, note the postfix if). If not, it uses the database query method exists? to see if the username was already taken (line 9). (Note that in Ruby, {a: b} is a hash that maps the symbol :a, which is suffixed with a colon when used as a key, to the value b.) Otherwise, line 11 uses a more complex query to check whether an account was staged. More specifically, this code joins the users and emails tables and then looks for a match across the joined tables.

We would like to type check the exists? calls in this code to ensure they are type correct, meaning that the columns they refer to exist and the values being matched are of the right type. The call on line 9 is easy to check, as RDL can type the receiver User as having an exists? method that takes a particular finite hash type {c1: t1, …, cn: tn} as an argument, where the ci are singleton types for symbols naming the columns, and the ti are the corresponding column types.

Unfortunately, the exists? call on line 11 is another story. Notice that this query calls exists? on the result of User.joins(:emails). Thus, to give exists? a type with the right column information, we need to have that information reflected in the return type of joins. Unfortunately, there is no reasonable way to do this in RDL, because the set of columns in the table returned by joins depends on both the receiver and the value of the argument. We could in theory overload joins with different return types depending on the argument type, e.g., we could say that User.joins returns a certain type when the argument has singleton type :emails. However, we would need to generate such signatures for every possible way of joining two tables together, three tables together, etc., which quickly blows up. Thus, currently, RDL types this particular exists? call as taking a Hash<Symbol,Object>, which would allow type-incorrect arguments.

Comp types for DB Queries.

To address this problem, CompRDL allows method type signatures to include computations that can, on-the-fly, determine the method’s type. Figure 1(b) gives comp type signatures for exists? and joins. It also shows the definition of a helper method, schema_type, that is called from the comp types. The comp types also make use of a new generic type Table<T> to type a DB table whose columns are described by T, which should be a finite hash type.

Line 1 of Figure 1(b) gives the type of exists?. Its argument is a comp type, which is a Ruby expression, delimited by << and >>, that evaluates to a standard type. When type checking a call to exists? (including those in the body of available?), CompRDL runs the comp type code to yield a standard type, and then proceeds with type checking as usual with that type.

In this case, to compute the argument type for exists?, we call the helper method schema_type with tself, which is a reserved variable naming the type of the receiver. The schema_type method has a few different behaviors depending on its argument. When given a type Table<T>, it returns T, i.e., the finite hash type describing the columns. When given a singleton type representing a class or a symbol, it uses another helper method RDL.db_schema (not shown) to look up the corresponding table’s schema and return an appropriate finite hash type. Given any other type, schema_type falls back to returning the type Hash<Symbol, Object>.

This type signature already allows us to type check the exists? call on line 9 of Figure 1(a). On this line, the receiver has the singleton type for the User class, so schema_type will use the second arm of the conditional and look up the schema for User in the DB.

Line 2 of Figure 1(b) shows the comp type signature for joins. The signature’s input type binds t to the actual argument type, and requires it to be a subtype of Symbol. For example, for the call on line 11 of Figure 1(a), t will be bound to the singleton type for :emails. The return comp type can then refer to t. Here, if t is a singleton type, joins returns a new Table type that merges the schemas of the receiver and the argument tables using schema_type. Otherwise, it falls back to producing a Table with no schema information. Thus, the joins call on line 11 returns type

Table<{staged: %bool, username: String, id: Integer, emails: {email: String, user_id: Integer}}>

That is, the type reflects the schemas of both the users and emails tables. Given this type, we can now type check the exists? call on line 11 precisely. On this line, the receiver has the table type given above, so when called by exists? the helper schema_type will use the first arm of the conditional and return the Table column types, ensuring the query is type checked precisely.
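The evaluation of the joins comp type can be approximated with a small self-contained sketch. SingletonType, FiniteHashType, TableType, and DB_SCHEMA are hypothetical stand-ins for RDL's type representations and schema registry, and tables are named by symbols rather than model classes; this is not CompRDL's implementation, only an illustration of the merge step.

```ruby
# Hypothetical stand-ins for RDL's type representations.
SingletonType  = Struct.new(:val)     # the type of one specific value, e.g. :emails
FiniteHashType = Struct.new(:elts)    # { column name => column type }
TableType      = Struct.new(:schema)  # Table<T>; a nil schema means "no schema information"

# Hypothetical schema registry mirroring the users and emails tables.
DB_SCHEMA = {
  users:  FiniteHashType.new({ id: Integer, username: String, staged: :bool }),
  emails: FiniteHashType.new({ id: Integer, email: String, user_id: Integer })
}.freeze

# Returns the finite hash type describing the columns behind type t
# (nil for any other kind of type in this simplified sketch).
def schema_type(t)
  case t
  when TableType     then t.schema
  when SingletonType then DB_SCHEMA.fetch(t.val)
  end
end

# Type-level computation for joins: merge the receiver's schema with the
# joined table's schema, keyed by the joined table's name.
def joins_return_type(tself, t)
  return TableType.new(nil) unless t.is_a?(SingletonType) # fallback case
  merged = schema_type(tself).elts.merge({ t.val => schema_type(t) })
  TableType.new(FiniteHashType.new(merged))
end

joined = joins_return_type(SingletonType.new(:users), SingletonType.new(:emails))
p joined.schema.elts.keys
```

A subsequent exists? call on the joined type would take the first arm of schema_type, since the receiver is now a table type that already carries its schema.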

Though we have only shown types for two query methods in the figure, we note that comp types are easily extensible to other kinds of queries. Indeed, we have applied them to 104 methods across two DB query frameworks (§ 5). Furthermore, we can also use comp types to encode sophisticated invariants. For example, in Rails, database tables can only be joined if the corresponding classes have a declared association. We can write a comp type for joins that enforces this. (We omitted this in Figure 1 for brevity.)

Finally, we note that while we include a “fallback” case that allows comp types to default to less precise types when necessary, in practice this is rarely necessary for DB queries. That is, parameters that are important for type checking, such as the names of tables being queried or joined, or the names of columns being queried, are almost always provided statically in the code.

2.2. Avoiding Casts using Comp Types

1  type Hash, :[], "(k) -> v"
2  type Array, :first, "() -> a"
3  type :page, "() -> {info: Array<String>, title: String}"
4
5  type "() -> String"
6  def image_url()
7    page[:info].first # can't type check
8    # Fix: RDL.type_cast(page[:info], "Array<String>").first
9  end
Figure 2. Type Casts in a Method.

In addition to letting us find type errors in code we could not previously type check precisely enough, the increased precision of comp types can also help eliminate type casts.

For example, consider the code in Figure 2. The first line gives the type signature for a method of Hash, which is parameterized by a key type k and a value type v (declarations of the parameters not shown). The specific method is Hash#[] (here we use the Ruby idiom that A#m refers to the instance method m of class A), which, given a key, returns the corresponding value. Notably, the form x[k] is desugared to x.[](k), and thus hash lookup, array index, and so forth are methods rather than built-in language constructs.

The second line similarly gives a type for Array#first, which returns the first element of the array. Here type variable a is the array’s contents type (declaration also not shown). The third line gives a type for a method page of the current class, which takes no arguments and returns a hash in which :info is mapped to an Array<String> and :title is mapped to a String.

Now consider type checking the image_url method defined at the bottom of the figure. This code is extracted and simplified from a Wikipedia client library used in our experiments (§ 5). Here, since page is a no-argument method, it can be invoked without any parentheses. We then invoke Hash#[] on the result.

Unfortunately, at this point type checking loses precision. The problem is that whenever a method is invoked on a finite hash type {c1: t1, …, cn: tn}, RDL (retroactively) gives up tracking the type precisely and promotes it to Hash<Symbol, t1 or … or tn> (Foster et al., 2018). In this case, page’s return type is promoted to Hash<Symbol, Array<String> or String>.

Now the type checker gets stuck. It reasons that first could be invoked on an array or a string, but first is defined only for the former and not the latter. The only currently available fix is to insert a type cast, as shown in the comment on line 8 of Figure 2.

One possible solution would be to add special-case support for [] on finite hash types. However, this is only one of 54 methods of Hash, which is a lot of behavior to special-case. Moreover, Ruby programs can monkey patch any class, including Hash, to change library methods’ behaviors. This makes building special support for those methods inelegant and potentially brittle since the programmer would have no way to adjust the typing of those methods.

In CompRDL, we can solve this problem with a comp type annotation. More specifically, we can give Hash#[] the following type:

type Hash, :[], "(t<:Object) ->
  <<if tself.is_a?(FiniteHash) && t.is_a?(Singleton)
    then tself.elts[t.val]
    else tself.value_type end>>"

This comp type specifies that if the receiver has a finite hash type and the key has a singleton type, then Hash#[] returns the type corresponding to the key, otherwise it returns a value type covering all possible values (computed by value_type, definition not shown).

Notice that this signature allows image_url to type check without any additional casts. The same idea can be applied to many other Hash methods to give them more precise types.
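To see how this comp type behaves on the page example, here is a minimal sketch that evaluates the same conditional in plain Ruby. SingletonType and FiniteHashType are hypothetical stand-ins for RDL's classes, types are represented as strings for brevity, and value_type is an assumed helper that returns the list of possible value types.

```ruby
# Hypothetical stand-ins; types are plain strings in this sketch.
SingletonType  = Struct.new(:val)
FiniteHashType = Struct.new(:elts) do
  # Assumed fallback helper: every value type that occurs in the hash.
  def value_type
    elts.values.uniq
  end
end

# Mirrors the Hash#[] comp type: a precise answer when both the receiver
# and the key are known statically, otherwise the union of value types.
def hash_lookup_type(tself, t)
  if tself.is_a?(FiniteHashType) && t.is_a?(SingletonType)
    tself.elts[t.val]
  else
    tself.value_type
  end
end

page_type = FiniteHashType.new({ info: "Array<String>", title: "String" })

precise   = hash_lookup_type(page_type, SingletonType.new(:info))
imprecise = hash_lookup_type(page_type, "Symbol")

p precise
p imprecise
```

With the singleton key :info the result is the array type alone, so a following call to first can be checked without a cast; with an arbitrary Symbol key the result covers both value types.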

Tuple Types.

In addition to finite hash types, RDL has a special tuple type to model heterogeneous Arrays. As with finite hash types, RDL does not special-case the Array methods for tuples, since there are 124 of them. This leads to a loss of precision when invoking methods on values with tuple types. However, analogously to finite hash types, comp types can be used to recover precision. As examples, the Array#first method can be given a comp type which returns the type of the first element of a tuple, and the comp type for Array#[] has essentially the same logic as Hash#[].

Const String Types.

As another example, Ruby strings are mutable, hence RDL does not give them singleton types. (In contrast, Ruby symbols are immutable.) This is problematic, because types might depend on string values. In particular, in the next section we explore reasoning about string values during type checking raw SQL queries.

Using comp types, we can assign singleton types to strings wherever possible. We introduce a new const string type representing strings that are never written to. CompRDL treats const strings as singletons, and methods on String are given comp types that perform precise operations on const strings and fall back to the String type as needed. We discuss handling mutation for const strings, finite hashes, and tuples in Section 4.

2.3. SQL Type Checking

 1  # Table Schema
 2  # posts table { id: Integer, topic_id: Integer, … }
 3  # topics table { id: Integer, title: String, … }
 4  # topic_allowed_groups table { group_id: Integer, topic_id: Integer }
 5
 6  # Query with SQL strings
 7  Post.includes(:topic)
 8    .where('topics.title IN (SELECT topic_id FROM topic_allowed_groups WHERE group_id = ?)', self.id)
 9
10  type Table, :where, "(t <: <<if t.is_a?(ConstString)
11                       then sql_typecheck(tself, t)
12                       else schema_type(tself)
13                       end >>) -> <<tself>>"
Figure 3. Type Checking SQL Strings in Discourse.

As we saw in Figure 1, ActiveRecord uses a DSL that makes it easier to construct queries inside of Ruby. However, sometimes programmers need to include raw SQL in their queries, either to access a feature not supported by the DSL or to improve performance compared to the DSL-generated query.

Figure 3 gives one such example, extracted and simplified from Discourse, one of our subject programs. Here there are three relevant tables: posts, which stores posted messages; topics, which stores the topics of posts; and topic_allowed_groups, which is used to limit the topics allowed by certain user groups.

Line 7 of Figure 3 shows a query that includes raw SQL. First, the posts and topics tables are joined via the includes method. (This method does eager loading whereas joins does lazy loading.) Then where filters the resulting table based on some conditions. In this case, the conditions involve a nested SQL query, which cannot be expressed except using raw SQL that will be inserted into the final generated query.

This example also shows another feature: any ?’s that appear in raw SQL are replaced by additional arguments to where. In this case, the ? will be replaced by self.id.

We would like to extend type checking to also reason about the raw SQL strings in queries, since they may have errors. In this particular example, we have injected a bug. The inner SELECT returns a set of integers, but topics.title is a string, and it is a type error to search for a string in an integer set.

To find this bug, we developed a simple type checker for a subset of SQL, and we wrote a comp type for where that invokes it as shown on line 10 of Figure 3. In particular, if the type of the argument to where, here referred to by t, is a const string, then we type check that string as raw SQL, and otherwise we compute the valid parameters of where using the schema_type method from Figure 1. The result of where has the same type as the receiver.

The sql_typecheck method (not shown) takes the receiver type, which will be a Table with a type parameter describing the schema, and the SQL string. One challenge that arises in type checking the SQL string is that it is actually only a fragment of a query, which therefore cannot be directly parsed using a standard SQL parser. We solve this problem by creating a complete, but artificial, SQL query into which we inject the fragment. This query is never run, but it is syntactically correct so it can be parsed. Then, we replace any ?’s with placeholder AST nodes that store the types of the corresponding arguments.

For example, the raw SQL in Figure 3 gets translated to the following SQL query:

SELECT * FROM posts INNER JOIN topics
  ON a.id = b.a_id
  WHERE topics.title IN (SELECT topic_id FROM topic_allowed_groups WHERE group_id = [Integer])

Notice the table names (posts, topics) occur on the first line and the ? has been replaced by a placeholder indicating the type Integer of the argument. Also note that the column names to join on (which are arbitrary here) are ignored by our type checker, which currently only looks for errors in the where clause.
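The wrapping step can be sketched in a few lines of Ruby. This is a simplification under stated assumptions: wrap_fragment is a hypothetical helper, and it substitutes the placeholders textually before parsing, whereas the approach described above parses first and then replaces each ? with a placeholder AST node.

```ruby
# Build a complete but artificial query around a raw SQL fragment so that a
# standard SQL parser could accept it. The join condition is arbitrary.
def wrap_fragment(tables, fragment, arg_types)
  remaining = arg_types.dup
  # Replace each ? with a marker carrying the type of the matching argument.
  filled = fragment.gsub("?") { "[#{remaining.shift}]" }
  first, *rest = tables
  joins = rest.map { |table| " INNER JOIN #{table} ON a.id = b.a_id" }.join
  "SELECT * FROM #{first}#{joins} WHERE #{filled}"
end

fragment = "topics.title IN (SELECT topic_id FROM topic_allowed_groups WHERE group_id = ?)"
query = wrap_fragment(%w[posts topics], fragment, ["Integer"])
puts query
```

The resulting string is never executed; it exists only so that the where clause can be parsed and checked against the schema.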

Once we have a query that can be parsed, we can type check it using the DB schema. In this case, the type mismatch between topics.title and the inner query will be reported.

In § 2.1, comp types were evaluated to produce a normal type signature. However, we use comp types in a slightly different way for checking SQL strings. The sql_typecheck method will itself perform type checking and provide a detailed message when an error is found. If no error is found, sql_typecheck will simply return the type String, allowing type checking to proceed.

2.4. Discussion

Now that we have seen CompRDL in some detail, we can discuss several parts of its design.

Dynamic Checks.

In type systems with type-level computations, or more generally dependent type systems, comparing two types for equality is often undecidable, since it requires checking if computations are equivalent.

To avoid this problem, CompRDL only uses comp types for methods which themselves are not type checked. For example, Hash#[] is implemented in native code, and we have not attempted to type check ActiveRecord’s joins method, which is part of a very complex system.

As a result, type checking in CompRDL is decidable. Comp types are only used to type check method calls, meaning we will always have access to the types of the receiver and arguments in a method call. Additionally, in all cases we have encountered in practice, the types of the receiver and arguments are ground types (meaning they do not contain type variables). Thus, comp types can be fully evaluated to non-comp types before proceeding to type checking.

For soundness, since we do not type check the bodies of comp type-annotated methods, CompRDL inserts dynamic checks at calls to such methods to ensure they match their computed types. For example, in Figure 2, CompRDL inserts a check that page[:info] returns an Array. This follows the approach of gradual (Siek and Taha, 2006) and hybrid (Flanagan, 2006) typing, in which dynamic checks guard statically unchecked code.
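The inserted check can be pictured as a wrapper around the library call. The sketch below is illustrative only: checked_call and Blame are hypothetical names, and the computed type is approximated by a Ruby class tested with is_a?.

```ruby
# Raised when a library method returns a value outside its computed type.
class Blame < StandardError; end

# Invoke receiver.meth(*args), then verify the result against the class
# that stands in for the statically computed return type.
def checked_call(receiver, meth, expected_class, *args)
  result = receiver.public_send(meth, *args)
  unless result.is_a?(expected_class)
    raise Blame, "#{meth} returned #{result.class}, expected #{expected_class}"
  end
  result
end

page = { info: ["https://example.org/a.png"], title: "Example" }

# page[:info].first with a check after each library call.
info  = checked_call(page, :[], Array, :info)
first = checked_call(info, :first, String)
p first
```

If the library method ever returned a value that contradicts its computed type, the failure would surface at the call site rather than later in statically checked code.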

We should also note that although our focus is on applying comp types to libraries, they can be applied to any method at the cost of dynamic checks for that method rather than static checks. For example, they could be applied to a user-defined library wrapper.

Termination.

A second issue for the decidability of comp types is that type-level computations could potentially not terminate. To avoid this possibility, we implement a termination checker for comp types. At a high level, CompRDL ensures termination by checking that iterators used by type-level code do not mutate their receivers and by forbidding type-level code from using looping constructs. We also assume there are no recursive method calls in type-level code. We discuss termination checking in more detail in § 4.

Value Dependency.

We note that, unlike dependent types (e.g., Coq (Pierce et al., 2017), Agda (Norell, 2009), F* (Swamy et al., 2016)) where types depend directly on terms, in CompRDL types depend on the types of terms. For instance, in a comp type (t<:Object) -> tres the result type tres can depend on the type t of the argument. Yet, since singleton types lift expressions into types, we could still use CompRDL to express some value dependencies in types in the style of dependent typing.

Constant Folding.

Finally, in RDL, integers and floats have singleton types. Thus, we can use comp types to lift some arithmetic computations to the type level. For example, CompRDL can assign the expression 1+1 the type Singleton(2) instead of Integer. This effectively incorporates constant folding into the type checker.
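A minimal sketch of such a comp type for integer addition follows. SingletonType and plus_return_type are hypothetical names, and the fallback type is represented by the Ruby class Integer.

```ruby
# Hypothetical stand-in for a singleton type wrapping one constant value.
SingletonType = Struct.new(:val)

# Type-level computation for Integer#+: fold constants when both operand
# types are singletons, otherwise fall back to Integer.
def plus_return_type(tself, t)
  if tself.is_a?(SingletonType) && t.is_a?(SingletonType)
    SingletonType.new(tself.val + t.val)
  else
    Integer
  end
end

folded   = plus_return_type(SingletonType.new(1), SingletonType.new(1))
fallback = plus_return_type(SingletonType.new(1), Integer)
p folded.val
p fallback
```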

While we did write such comp types for Integer and Float (see Table 1), we found that this precision was not useful, at least in our subject programs. The reason is that RDL only assigns singleton types to constants, and typically arithmetic methods are not applied to constant values. Thus, though we have written comp types for the Integer and Float libraries, we have yet to find a useful application for them in practice. We leave further exploration of this topic to future work.

3. Soundness of Comp Types

In this section we formalize CompRDL as a core object-oriented calculus that includes comp types for library methods. We first define the syntax and semantics of the calculus (§ 3.1), and then we formalize type checking (§ 3.2). The type checking process includes a rewriting step to insert dynamic checks to ensure library methods satisfy their type signatures. Finally, we prove type soundness (§ 3.3). For brevity, we leave the full formalism and proofs to Appendix A. Here we provide only the key details.

3.1. Syntax and Semantics

Figure 4. Syntax and relations of the core calculus.

Figure 5. A subset of the type checking and rewriting rules for the core calculus.

Figure 4 gives the syntax of the calculus. Values include nil, true, and false. To support comp types, class IDs, which are the base types of the calculus, are also values. We assume the set of class IDs includes several built-in classes: Nil, the class of nil; Obj, which is the root superclass; True and False, which are the classes of true and false, respectively, as well as their superclass Bool; and Type, the class of base types.

Expressions include values and two kinds of variables. By convention, we use one kind in regular program expressions and the other in comp types. The special variable self names the receiver of a method call, and the special variable tself names the type of the receiver in a comp type. Expressions also include object creation, sequences, conditionals, and method calls, where, to simplify the formalism, methods take one argument. Finally, our type system translates calls to library methods into checked method calls, which check at run-time that the value returned from the call has the expected type. We assume this form does not appear in the surface syntax.

We assume the classes form a lattice with Nil as the bottom and Obj as the top. We write the least upper bound of and as . For simplicity, we assume the lattice correctly models the program’s classes, i.e., if , then is a subclass of by the usual definition. Lastly, three of the built-in classes, Nil, True, and False, are singleton types, i.e., they contain only the values nil, true, and false, respectively. Extending with support for more kinds of singleton types is straightforward.

Method Types are of the form where and are the domain and range types, respectively. Library Method Types are either method types or have the form , where and are expressions that evaluate to types and that can refer to the variables and tself. The base types and provide an upper bound on the respective expression types, i.e., for any , expressions and should evaluate to subtypes of and , respectively. These upper bounds are used for type checking comp types (§ 3.2).

Finally, programs are sequences of method definitions and library method declarations.

Dynamic Semantics.

The dynamic semantics of are the small-step semantics of Ren and Foster (2016), modified to throw blame (§ 3.3) when a checked method call fails. They use dynamic environments , defined in Figure 4, which map variables to values. We define the relation , meaning the expression evaluates to under dynamic environment . The full evaluation rules also use a stack (Appendix A), but we omit the stack here for simplicity.

Example.

As an example comp type in the formalism, consider type checking the expression , where the method returns the logical conjunction of the receiver and argument. Standard type checking would assign this expression the type Bool. However, with comp types we can do better.

Recall that true and false are members of the singleton types True and False. Thus, we can write a comp type for the method that yields a singleton return type when the arguments are singletons, and Bool in the fallback case:

The first two lines of the condition handle the singleton cases, and the last line is the fallback case.
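As a rough illustration of the shape such a type-level computation could take, here is a minimal sketch in plain Ruby. It is not the notation of the formalism: the constants and the function `and_return_type` are hypothetical names, and ordinary `==`, `&&`, and `||` are used in the condition. It follows the structure described above, with two singleton cases followed by a fallback.

```ruby
# Hypothetical stand-ins for the class IDs of the calculus (not RDL's API).
TRUE_T  = :True
FALSE_T = :False
BOOL_T  = :Bool

# Type-level computation for conjunction: from the receiver type (tself)
# and the argument type, compute the most precise return type we can.
def and_return_type(tself, arg)
  if tself == TRUE_T && arg == TRUE_T
    TRUE_T    # both operands are the singleton True
  elsif tself == FALSE_T || arg == FALSE_T
    FALSE_T   # a False operand forces the result to be false
  else
    BOOL_T    # fallback: nothing more precise is known
  end
end

puts and_return_type(TRUE_T, TRUE_T)   # prints True
puts and_return_type(BOOL_T, FALSE_T)  # prints False
puts and_return_type(BOOL_T, TRUE_T)   # prints Bool
```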

3.2. Type Checking and Rewriting

Figure 5 gives a subset of the rules for type checking and rewriting to insert dynamic checks at library calls. The remaining rules, which are straightforward, can be found in Appendix A. These rules use two additional definitions from Figure 4. Type environments map variables to base types, and the class table CT maps methods to their type signatures. We omit the construction of class tables, which is standard. We also use disjoint sets and to refer to the user-defined and library methods, respectively.

The rules in Figure 5 prove judgments of the form , meaning under type environment and class table CT, source expression is rewritten to target expression , which has type .

Rule (C-Type) is straightforward: any class ID that is used as a value is rewritten to itself, and it has type Type. We include this rule to emphasize that types are values in .

Rule (C-AppUD) finds the receiver type , then looks up in the class table. This rule only applies when is user-defined and thus has a (standard) method type . Then, as is standard, the rule checks that the argument’s type is a subtype of , and the type of the whole call is . This rule rewrites the subexpressions and , but it does not itself insert any new checks, since user-defined methods are statically checked against their type signatures (rule not shown).

Rule (C-AppLib) is similar to Rule (C-AppUD), except it applies when the callee is a library method. In this case, the rule inserts a check to ensure that, at run-time, the library method abides by its specified type.

Rule (C-App-Comp) is the crux of ’s type checking system. It applies at a call to a library method that uses a type-level computation, i.e., with a type signature . The rule first type checks and rewrites and to ensure they will evaluate to a type (i.e., have type Type). These expressions may refer to and tself, which themselves have type Type. The rule then evaluates the rewritten and using the dynamic semantics mentioned above to yield types and , respectively. Finally, the rule ensures that the argument has a subtype of ; sets the return type of the whole call to ; and inserts a dynamic check that the call returns an at runtime. For instance, the earlier example of the use of logical conjunction would be rewritten to .
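The dynamic check inserted by the rewriting step can be pictured with a small Ruby sketch. Everything here is an assumption made for illustration: `Blame`, `class_id_of`, `subtype?`, and `checked_call` are hypothetical names, and the subtyping relation covers only the built-in classes mentioned in § 3.1.

```ruby
# Raised when a checked library call returns a value outside its computed type.
class Blame < StandardError; end

# Hypothetical mapping from run-time values to class IDs of the calculus.
def class_id_of(value)
  case value
  when true  then :True
  when false then :False
  when nil   then :Nil
  else :Obj
  end
end

# Minimal subtyping over the built-in lattice: Nil is bottom, Obj is top,
# and True and False sit below Bool.
def subtype?(sub, sup)
  sub == sup || sub == :Nil || sup == :Obj ||
    (sup == :Bool && %i[True False].include?(sub))
end

# Run the library call in the block, then check its result against the
# type that was computed statically.
def checked_call(computed_type)
  result = yield
  unless subtype?(class_id_of(result), computed_type)
    raise Blame, "expected #{computed_type}, got #{class_id_of(result)}"
  end
  result
end

checked_call(:True) { true & true }   # passes the check
```

In this sketch a call such as `checked_call(:True) { true & false }` would raise `Blame`, which corresponds to a library method failing to abide by its computed type.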

There is one additional subtlety in Rule (C-App-Comp). Recall the example above that gives a type to . Notice that the type-level computation itself uses . This could potentially lead to infinite recursion, where calling requires checking that produces a type, which requires recursively checking that produces a type, and so on.

To avoid this problem, we introduce a function that rewrites class table CT to drop all annotations with type-level expressions. More precisely, any comp type is rewritten to . Then type checking type-level computations, in the fifth and eighth premises of (C-App-Comp), is done under the rewritten class table.
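One way to picture this rewriting is the following Ruby sketch. It assumes that a comp type is replaced by an ordinary method type built from its two upper bounds, which is how we read the description of the bounds in § 3.1. The structs and the function name `strip_comp_types` are hypothetical.

```ruby
# Hypothetical representations of the two kinds of library signatures.
# A comp type carries two type-level expressions plus their upper bounds.
CompType   = Struct.new(:arg_expr, :arg_bound, :ret_expr, :ret_bound)
MethodType = Struct.new(:domain, :range)

# Return a copy of the class table in which every comp type has been
# replaced by a plain method type over its upper bounds.
def strip_comp_types(class_table)
  class_table.transform_values do |sig|
    sig.is_a?(CompType) ? MethodType.new(sig.arg_bound, sig.ret_bound) : sig
  end
end

class_table = {
  conj: CompType.new(->(_tself, _a) { :Bool }, :Bool,
                     ->(_tself, _a) { :Bool }, :Bool),
  neg:  MethodType.new(:Bool, :Bool)
}
stripped = strip_comp_types(class_table)
```

Because the stripped table contains no type-level expressions, checking a type-level computation against it cannot trigger the evaluation of another comp type.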

Note that, while this prevents the type checking rules from infinitely recursing, it does not prevent type-level expressions from themselves diverging. In , we assume this does not happen, but in our implementation, we include a simple termination checker that is effective in practice (§ 4).

3.3. Properties of .

Finally, we prove type soundness for . For brevity, we provide only the high-level description of the proof. The details can be found in Appendix A.

Blame.

The type system of does not prevent null-pointer errors, i.e., nil has no methods yet we allow it to appear wherever any other type of object is expected. We encode such errors as blame. We also reduce to blame when a dynamic check of the form fails.

Program Checking and CT.

In Appendix A we provide type checking rules not just for expressions but also for programs . These rules are where we actually check user-defined methods against their types. We also define a notion of validity for a class table CT with respect to , which enforces that CT’s types for methods and fields match the declared types in , and that appropriate subtyping relationships hold among subclasses. Given a well typed program , it is straightforward to construct a valid CT.

Type Checking Rules.

In addition to the type checking and rewriting rules of Figure 5, we define a separate judgment that is identical to except it omits the rewriting step, i.e., only performs type checking.

We can then prove soundness of the judgment using preservation and progress, and finally prove soundness of the type checking and rewriting rules as a corollary:

Theorem 3.1 (Soundness).

For any expressions and ’, type , class table CT, and program such that CT is valid with respect to , if then either reduces to a value, reduces to blame, or does not terminate.

4. Implementation

We implemented CompRDL as an extension to RDL, a type checking system for Ruby (Foster et al., 2018; Ren and Foster, 2016; Strickland et al., 2014; Ren et al., 2013). In total, CompRDL comprises approximately 1,170 lines of code added to RDL.

RDL’s design made it straightforward to add comp types. We extended RDL so that, when type checking method calls, type-level computations are first type checked to ensure they produce a value of type Type and then are executed to produce concrete types, which are then used in subsequent type checking. CompRDL uses RDL’s contract mechanism to insert the dynamic checks for comp types.

Heap Mutation.

For simplicity, does not include a heap. By contrast, CompRDL allows arbitrary Ruby code to appear in comp types. This allows great flexibility, but it means such code might depend on mutable state that could change between type checking and the execution of a method call. For example, in Figure 1, type-level code uses the global table RDL.db_schema. If, after type checking the method available?, the program (pathologically) changed the schema of User to drop the username column, then available? would fail at runtime even though it had type checked. The dynamic checks discussed in § 2 and § 3 are insufficient to catch this issue, because they only check a method call against the initial result of evaluating a comp type; they do not consider that the same comp type might yield a new result at runtime.

To address this issue, CompRDL extends dynamic checks to ensure types remain the same between type checking and execution. If a method call is type checked using a comp type, then prior to that call at runtime, CompRDL will reevaluate that same comp type on the same inputs. If it evaluates to a different type, CompRDL will raise an exception to signal a potential type error. An alternative approach would be to re-check the method under the new type.
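The following Ruby sketch illustrates the idea of re-evaluating a comp type before a call. It is not CompRDL's implementation: `call_with_recheck`, `CompTypeChanged`, and the toy schema are hypothetical, and a comp type is modeled simply as a lambda.

```ruby
class CompTypeChanged < StandardError; end

# comp:        a callable type-level computation (hypothetical representation)
# inputs:      the inputs it was evaluated on during type checking
# static_type: the type it produced during type checking
def call_with_recheck(comp, inputs, static_type)
  current = comp.call(*inputs)
  unless current == static_type
    raise CompTypeChanged,
          "comp type was #{static_type.inspect}, now #{current.inspect}"
  end
  yield  # the types still agree, so perform the actual call
end

schema = { users: { username: :String } }        # toy stand-in for a DB schema
column_type = ->(table, col) { schema.dig(table, col) }
static = column_type.call(:users, :username)     # result at "type checking" time

call_with_recheck(column_type, [:users, :username], static) { :ok }

schema[:users].delete(:username)                 # pathological schema change
begin
  call_with_recheck(column_type, [:users, :username], static) { :ok }
rescue CompTypeChanged => e
  puts e.message                                 # the mismatch is reported
end
```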

Of course, the evaluation of a comp type may itself alter mutable state. Currently, CompRDL assumes that comp type specifications are correct, including any mutable computations they may perform. If a comp type does have any erroneous effects, program execution could fail in an unpredictable manner. Other researchers have proposed safeguards for this issue of effectful contracts by using guarded locations (Dimoulas et al., 2012) or region-based effect systems (Sekiyama and Igarashi, 2017). We leave incorporating such safeguards for comp types as future work. We note, however, that this issue did not arise in any comp types we used in our experiments.

Termination of Comp Types.

A standard property of type checkers is that they terminate. However, because comp types allow arbitrary Ruby code, CompRDL could potentially lose this property. To address this issue, CompRDL includes a lightweight termination checker for comp types.

 1  type :m1, …, terminates: :+
 2  type :m2, …, terminates: :+
 3  type :m3, …, terminates: :-
 4
 5  type Array, :map, …, terminates: :blockdep
 6  type Array, :push, …, pure: :-
 7
 8  def m1()
 9    m2() # allowed: m2 terminates
10    m3() # not allowed: m3 may not terminate
11    while … end # not allowed: looping
12
13    array = [1,2,3] # create new array
14    array.map { |val| val+1 } # allowed
15    array.map { |val| array.push(4) }
16    # not allowed: iterator calls impure method push
17  end
Figure 6. Termination Checking with CompRDL.

Figure 6 illustrates the ideas behind termination checking. In CompRDL, methods can be annotated with termination effects :+, for methods that always terminate (e.g., m1 and m2) and :- for methods that might diverge (e.g., m3). CompRDL allows terminating methods to call other terminating methods (Line 9) but not potentially non-terminating methods (Line 10). Additionally, terminating methods may not use loops (Line 11). CompRDL assumes that type-level code does not use recursion, and we leave checking of recursion to future work.

We believe it is reasonable to forbid the use of built-in loop constructs, and to assume no recursion, because in practice most iteration in Ruby occurs via methods that iterate over a structure. For instance, array.map {block} returns a new array in which block, a code block or lambda, has been applied to each element of array. Since arrays are by definition finite, this call terminates as long as block terminates and does not mutate the array. A similar argument holds for other iterators of Array, Hash, etc.

Thus, CompRDL checks termination of iterators as follows. Iterator methods can be annotated with the special termination effect :blockdep (Line 5), indicating the method terminates if its block terminates and is pure. CompRDL also includes purity effect annotations indicating whether methods are pure (:+) or impure (:-). A pure method may not write to any instance variable, class variable, or global variable, or call an impure method. CompRDL determines that a :blockdep method terminates as long as its block argument is pure, and otherwise it may diverge. Using this approach, CompRDL will allow Line 14 but reject Line 15.
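To make the rules concrete, here is a minimal sketch of such a checker in Ruby. It does not inspect real Ruby syntax trees: method bodies are given as a simplified list of steps, and the `EFFECTS` table, the step encoding, and the name `terminating_body?` are all hypothetical. Effects that Figure 6 does not specify (for example, the purity of m2 or the termination of push) are assumptions made for the example, and `plus` stands in for the `val+1` call.

```ruby
# Hypothetical effect table loosely mirroring the annotations in Figure 6.
EFFECTS = {
  m2:   { terminates: :+,        pure: :+ },
  m3:   { terminates: :-,        pure: :+ },
  map:  { terminates: :blockdep, pure: :+ },
  push: { terminates: :+,        pure: :- },
  plus: { terminates: :+,        pure: :+ }
}.freeze

# A body is a list of steps:
#   [:call, name]                  a plain method call
#   [:loop]                        a built-in loop such as while
#   [:iter, name, block_callees]   an iterator call whose block calls block_callees
def terminating_body?(steps)
  steps.all? do |step|
    kind, name, block_callees = step
    case kind
    when :call
      EFFECTS.dig(name, :terminates) == :+
    when :loop
      false   # built-in loops are rejected outright
    when :iter
      # A :blockdep iterator is accepted only if every method called in
      # its block is both terminating and pure.
      EFFECTS.dig(name, :terminates) == :blockdep &&
        block_callees.all? { |callee|
          EFFECTS.dig(callee, :terminates) == :+ &&
            EFFECTS.dig(callee, :pure) == :+
        }
    else
      false   # unknown step kinds are conservatively rejected
    end
  end
end

terminating_body?([[:call, :m2], [:iter, :map, [:plus]]])   # accepted
terminating_body?([[:iter, :map, [:push]]])                 # rejected: impure block
```

Methods missing from the table are treated as potentially non-terminating, which is the conservative choice for a checker of this kind.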

Type Mutations and Weak Updates.

Finally, to handle aliasing, our type annotations for Array, Hash, and String need to perform weak updates to type information when tuple, finite hash, and const string types, respectively, are mutated. For example, consider the following code:

a = [1, 'foo']; if … then b = a else … end; a[0] = 'one'

Here (ignoring singleton types for simplicity), a initially has the type t = [Integer, String], where t is a Ruby object, specifically an instance of RDL’s TupleType class. At the join point after the conditional, the type of b will be a union of t and its previous type.

We could potentially forbid the assignment to a[0] because the right-hand side does not have the type Integer. However, this is likely too restrictive in practice. Instead, we would like to mutate t after the write. However, b shares this type. Thus we perform a weak update: after the assignment we mutate t to be [Integer $\cup$ String, String], to handle the cases when a may or may not have been assigned to b.
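The effect of a weak update on a shared type object can be sketched in Ruby as follows. This is not RDL's TupleType class: `TupleSketch` is a hypothetical stand-in in which each tuple position holds a set of class names, read as their union.

```ruby
require 'set'

# Hypothetical stand-in for a tuple type; each position holds a set of
# class names, interpreted as a union type.
class TupleSketch
  attr_reader :elems

  def initialize(*elems)
    @elems = elems.map { |t| Set[t] }
  end

  # Weak update: widen position `index` instead of overwriting it.
  def weak_update(index, new_type)
    @elems[index] << new_type
    self
  end

  def to_s
    "[" + @elems.map { |s| s.to_a.join(" or ") }.join(", ") + "]"
  end
end

t = TupleSketch.new(:Integer, :String)
a_type = t
b_type = t                      # b may alias a, so it shares the same type object
a_type.weak_update(0, :String)  # models the assignment a[0] = 'one'
puts b_type                     # the widened type is visible through b as well
```

Because the update widens the shared object in place rather than replacing it, every variable whose type refers to that object observes the same, more permissive type.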

For soundness, we need to retroactively assume t was always this type. Fortunately, for all tuple, finite hash, and const string types , RDL already records all asserted constraints and to support promotion of tuples, finite hashes, and const strings to types Array, Hash, and String, respectively (Foster et al., 2018). We use this same mechanism to replay previous constraints on these types whenever they are mutated. For example, if previously we had a constraint