Type Safe Redis Queries: A Case Study of Type-Level Programming in Haskell

08/30/2017 ∙ by Ting-Yan Lai, et al. ∙ Academia Sinica

Redis is an in-memory data structure store, often used as a database, for which Hedis provides a Haskell interface. Redis is dynamically typed: a key can be discarded and re-associated with a value of a different type, and a command, when fetching a value of a type it does not expect, signals a runtime error. We develop a domain-specific language that, by exploiting Haskell type-level programming techniques including indexed monads, type-level literals, and closed type families, keeps track of the types of values in the database and statically guarantees that type errors cannot happen for a class of Redis programs.




1. Introduction

Redis (https://redis.io) is an open-source, in-memory data structure store that can be used as a database, cache, and message broker. A Redis data store can be thought of as a set of key-value pairs. The value can be a string, a list of strings, a set of strings, a hash table of strings, etc. However, strings are the only primitive datatype. Numbers, for example, are serialized to strings before being saved in the data store, and parsed back into numbers when they are manipulated. While the concept is simple, Redis is used as an essential component in a number of popular, mature services, including Twitter, GitHub, Weibo, StackOverflow, and Flickr.

For example, consider the following sequence of commands, entered through the interactive interface of Redis. The keys some-set and another-set are both assigned sets of strings. The two calls to the command SADD respectively add three and two strings to the two sets, before SINTER computes their intersection:

    redis> SADD some-set a b c
    (integer) 3
    redis> SADD another-set a b
    (integer) 2
    redis> SINTER some-set another-set
    1) "a"
    2) "b"

Notice that the keys some-set and another-set, if they do not exist before the call to SADD, are created on the spot. Each call to SADD returns the number of elements actually added, which here equals the size of the newly created set.

Many third-party libraries provide interfaces for general-purpose programming languages to access Redis through its TCP protocol. For Haskell, the most popular library is Hedis (https://hackage.haskell.org/package/hedis). A (normal) Redis computation returning a value of type a is represented in Haskell by Redis (Either Reply a), where the type Redis is a monad, while Either Reply a indicates that the computation either returns a value of type a, or fails with an error message represented by type Reply. The previous example can be implemented as a Hedis program that calls sadd twice and then sinter.

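The function sadd, which in the normal context has type ByteString -> [ByteString] -> Redis (Either Reply Integer), takes a key and a list of values as arguments, and returns a Redis computation yielding an Integer. Keys and values, being nothing but binary strings in Redis, are represented using the Haskell type ByteString.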

The Problems

Most commands work only with data of certain types. In the following example, the key some-string is assigned the string foo; the command SET always assigns a string to a key. The subsequent call to SADD, which adds a value to a set, thus raises a runtime error:

    redis> SET some-string foo
    OK
    redis> SADD some-string bar
    (error) WRONGTYPE Operation against a key holding the wrong kind of value

Another source of type errors is the command INCR key, which increments the value associated with key by one. With strings being the only primitive type, Redis parses the stored string as an integer and, after incrementing it, stores a string back. If the string cannot be parsed as an integer, a runtime error is raised.

We point out a peculiar pattern of value creation and updating in Redis: the same command is used both to create a key-value pair and to update it. Similar to SADD, the command LPUSH prepends a value (a string) to a list, or creates the list if the key does not exist:

    redis> LPUSH some-list bar
    (integer) 1

Another command, LLEN, returns the length of a list, and signals an error if the key is not associated with a list:

    redis> LLEN some-list
    (integer) 1
    redis> SET some-string foo
    OK
    redis> LLEN some-string
    (error) WRONGTYPE Operation against a key holding the wrong kind of value

Curiously, however, when LLEN is applied to a key that does not yet exist, the Redis designers chose to let it return 0:

    redis> LLEN nonexistent
    (integer) 0
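To make these runtime behaviors concrete, here is a toy model of a dynamically typed store in which each value carries a runtime tag. It is hypothetical and is not part of Redis, Hedis, or Edis. The names Value, setCmd, lpushCmd, and llenCmd are our own. The sketch mirrors only the three behaviors shown in the sessions above: SET overwrites any key, LPUSH creates or extends a list, and LLEN returns 0 for a missing key but fails on a non-list.

```haskell
-- A toy, hypothetical model of Redis's dynamically typed store (not Hedis):
-- every value carries a runtime tag, and commands inspect the tag when run.
import qualified Data.Map as Map

data Value = VString String | VList [String]
  deriving (Eq, Show)

type Store = Map.Map String Value

wrongType :: String
wrongType = "WRONGTYPE Operation against a key holding the wrong kind of value"

-- SET overwrites the key unconditionally, whatever its previous type.
setCmd :: String -> String -> Store -> Store
setCmd k v = Map.insert k (VString v)

-- LPUSH creates a list for a missing key and prepends to an existing list;
-- it returns the new length of the list.
lpushCmd :: String -> String -> Store -> Either String (Int, Store)
lpushCmd k v st = case Map.lookup k st of
  Nothing         -> Right (1, Map.insert k (VList [v]) st)
  Just (VList xs) -> Right (length xs + 1, Map.insert k (VList (v : xs)) st)
  Just _          -> Left wrongType

-- LLEN returns 0 for a missing key and fails on a non-list value.
llenCmd :: String -> Store -> Either String Int
llenCmd k st = case Map.lookup k st of
  Nothing         -> Right 0
  Just (VList xs) -> Right (length xs)
  Just _          -> Left wrongType
```

Every check in this model happens only when a command runs. A type-level encoding aims to move these checks to compile time.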

Being a simple wrapper on top of the TCP protocol of Redis, Hedis inherits all of these behaviors. Executing the corresponding Hedis program yields the same error, wrapped in a Haskell constructor: "WRONGTYPE Operation against a key holding the wrong kind of value".

Such a programming model is certainly very error-prone. Working within Haskell, a host language with a strong type system, one naturally wishes to build a domain-specific embedded language (DSEL) that exploits the rich type system of Haskell. Such a language should not only ensure the absence of Redis type errors (at least in the simplified situation where only one client accesses the store), but also provide better documentation. We wish to be sure that a program calling INCR, for example, type checks only if we can statically guarantee that the value to be accessed is indeed an integer. We wish to see from the types of operators such as LLEN when they can be called, and to allow them only in contexts that are safe. We may even want to explicitly declare a fresh key and its type, to avoid reusing an existing key by accident, and to prevent it from unexpectedly being used as some other type.

This paper discusses the techniques we used and the lessons we learned from building such a language, Edis. We constructed an indexed monad on top of the Redis monad of Hedis. It is indexed by a dictionary that maintains the set of currently defined keys and their types. To represent the dictionary, we need to encode variable bindings with type-level lists and strings. To manipulate the dictionary, we apply various type-level programming techniques. To summarize our contributions:

  • We present Edis, a statically typed domain-specific language embedded in Haskell (with extensions provided by the Glasgow Haskell Compiler) and built on Hedis. Serializable Haskell datatypes are automatically serialized before being written to the Redis data store. Available keys and their types are tracked in type-level dictionaries. The types of embedded commands clearly state their preconditions and postconditions on the available keys and types. A program can be constructed only if it is guaranteed not to fail with a type error, assuming that it is the only client accessing the store.

  • We demonstrate the use of various type-level programming techniques, including data kinds, singleton types and proxies, closed type families, etc., to define type-level lists and operations that observe and manipulate the lists.

  • This is (yet another) example of encoding effects and constraints of programs in types, using indexed monads (Atkey, 2009), closed type families (Eisenberg et al., 2014), and constraint kinds (Orchard and Schrijvers, 2010).

In Section 2 we introduce indexed monads, which let us reason about pre- and postconditions of stateful programs. In Section 3 we review the basics of type-level programming in Haskell that allow us to build type-level dictionaries to keep track of keys and types. Embeddings of Redis commands are presented in Section 4. In Section 5 we discuss various issues regarding design choices, limitations of this approach, and possible future work, before we review related work in Section 6.

2. Indexed Monads

Stateful computations are often reasoned about using Hoare logic. A Hoare triple {P} S {Q} denotes the following proposition: if the statement S is executed in a state satisfying predicate P, then, when it terminates, the state must satisfy predicate Q. Predicates P and Q are respectively called the precondition and the postcondition of the Hoare triple.

In Haskell, stateful computations are represented by monads. To reason about them within the type system, we wish to label a state monad with its pre- and postconditions. An indexed monad (Atkey, 2009), also called a parameterised monad or monadish, is a monad that takes two more type parameters in addition to the type of its result. These represent an initial state and a final state, and are interpreted like a Hoare triple.

The intention is that a computation of type m p q a is a stateful computation such that, if it starts execution in a state satisfying p and terminates, it yields a value of type a, and the new state satisfies q. The first operation, of type a -> m p p a, lifts a pure computation to a stateful computation that does not alter the state. The second, of type m p q a -> (a -> m q r b) -> m p r b, runs a computation of type m p q a followed by a continuation of type a -> m q r b. The postcondition of the former matches the precondition of the computation returned by the latter. The result is a computation of type m p r b.
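A minimal sketch of such a class follows. It assumes the method names unit and bind, chosen here to avoid clashing with Prelude's return and >>=. It also gives the simplest instance we can think of: an indexed state monad whose indices are the types of the state before and after. The names IState, iget, iput, and example are hypothetical.

```haskell
-- Indexed monad in the style of Atkey (2009). The names IMonad, unit,
-- and bind are assumptions made for this sketch.
class IMonad m where
  -- lift a pure value; the state, and hence its index, is unchanged
  unit :: a -> m p p a
  -- sequence two computations: the first one's postcondition q
  -- must be the second one's precondition
  bind :: m p q a -> (a -> m q r b) -> m p r b

-- Simplest instance: the indices are the types of the state itself,
-- so a computation may change the type of its state.
newtype IState p q a = IState { runIState :: p -> (a, q) }

instance IMonad IState where
  unit a = IState (\s -> (a, s))
  bind m k = IState (\s -> let (a, s') = runIState m s in runIState (k a) s')

-- read the state without changing it
iget :: IState p p p
iget = IState (\s -> (s, s))

-- replace the state with a value of a possibly different type
iput :: q -> IState p q ()
iput s = IState (\_ -> ((), s))

-- starts with an Int state and ends with a String state
example :: IState Int String Int
example = iget `bind` \n -> iput (show (n + 1)) `bind` \_ -> unit n
```

Here runIState example 41 evaluates to (41, "42"). The computation starts in an Int state and ends in a String state, so bind rejects any continuation that still expects an Int state afterwards.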

We define a new indexed monad, Edis. At term level, its two methods are not interesting: they merely call return and >>= of Redis, extracting and re-applying the Edis constructor when necessary. Since Edis is a newtype, these wrappers can be optimized away at runtime. The purpose is to add the pre- and postconditions at type level.

The properties of the state we care about are the set of currently allocated keys and the types of their values. In Section 3 we present techniques for specifying properties such as “the keys in the data store are "A", "B", and "C", respectively assigned values of three specified types.” For now, however, let us look at the simplest Redis command.

Redis commands can be executed in two contexts: the normal context, or a transaction. In Hedis, a command yielding a value of type a in the normal case is represented by Redis (Either Reply a), as mentioned in Section 1. In a transaction, the command is represented using two other datatypes, as RedisTx (Queued a). In this paper we focus on the former case, and for brevity we abbreviate its type.

The command PING in Redis does nothing but reply with the message PONG if the connection is alive. In Hedis, ping has type Redis (Either Reply Status). The Edis version of ping simply applies an additional constructor, Edis, to it. Functions from Hedis are qualified with Hedis to prevent name clashes.

Since ping does not alter the data store, its postcondition and precondition are the same. More interesting commands will be introduced after we present our type-level encoding of constraints on states.
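The following sketch shows how Edis can be a newtype over a base monad with two phantom indices. Hedis is not needed to illustrate the idea, so IO stands in for the Redis monad. The function ping here is a hypothetical stand-in that returns "PONG" without contacting a server; the Edis ping described above wraps Hedis.ping instead. PolyKinds lets the indices be of any kind, including the dictionaries of Section 3.

```haskell
{-# LANGUAGE PolyKinds, KindSignatures #-}
import Data.Kind (Type)

-- Indexed monad class; the method names unit and bind are assumptions.
class IMonad (m :: k -> k -> Type -> Type) where
  unit :: a -> m p p a
  bind :: m p q a -> (a -> m q r b) -> m p r b

-- IO stands in for Hedis's Redis monad (an assumption of this sketch).
-- The indices p and q are phantom: they exist only at type level.
newtype Edis (p :: k) (q :: k) a = Edis { unEdis :: IO a }

-- At term level, unit and bind merely delegate to return and >>=,
-- unwrapping and rewrapping the newtype constructor.
instance IMonad Edis where
  unit = Edis . return
  bind m k = Edis (unEdis m >>= unEdis . k)

-- Hypothetical stand-in for the Edis ping: it leaves the store unchanged,
-- so its precondition and postcondition are the same index xs.
ping :: Edis xs xs String
ping = Edis (return "PONG")
```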

3. Type-Level Dictionaries

One of the challenges of statically ensuring type correctness of stateful languages is that the type of the value of a key can be altered by updating. In Redis, one may delete an existing key and create it again by assigning it a value of a different type. To ensure type correctness, we keep track of the types of all existing keys in a dictionary. A dictionary is a finite map, which can be represented by an association list: a list of pairs of keys and some encoding of types. For example, a dictionary pairing the keys "A", "B", and "C" with encodings of three types may represent a predicate, or constraint, stating that “the keys in the data store are "A", "B", and "C", respectively assigned values of those types.” (This representation will be refined in the next section.)

The dictionary above mixes values (strings such as "A" and "B") and types. Furthermore, as mentioned in Section 2, dictionaries will be parameters of the indexed monad Edis. In a dependently typed programming language without the so-called “phase distinction” (the separation between types and terms), this would pose no problem. In Haskell, however, a dictionary used to index a monad has to be a type as well.

In this section we describe how to construct a type-level dictionary, to be used with the indexed monad in Section 2.

3.1. Datatype Promotion

Haskell maintains the distinction between values, types, and kinds: values are categorized by types, and types are categorized by kinds. The kinds are relatively simple. * is the kind of all lifted types, while type constructors have kinds such as * -> * or * -> * -> *. (In Haskell, the opposite of lifted types are unboxed types, which are not represented by a pointer to a heap object and cannot be stored in a polymorphic datatype.) Consider a datatype of natural numbers, data Nat = Zero | Suc Nat, alongside the list datatype, which can be understood as data [a] = [] | a : [a].

The first definition is usually seen as having defined a type Nat and two value constructors, Zero and Suc. The second is how Haskell lists are understood. The kind of [] is * -> *, since it takes a lifted type a to a lifted type [a]. The two value constructors respectively have types [a] and a -> [a] -> [a], for all types a.

The GHC extension DataKinds (Yorgey et al., 2012), however, automatically promotes certain “suitable” types to kinds. (The GHC manual describes only informally which types are “suitable”.) With the extension, the definitions above have an alternative reading. Nat is a new kind, Zero is a type of kind Nat, and Suc is a type constructor taking a type of kind Nat to another type of kind Nat. When one sees a constructor in an expression, whether it is promoted can often be inferred from the context. When one needs to be more specific, prefixing a constructor with a single quote, as in 'Zero and 'Suc, denotes that it is promoted.
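The following is a small sketch of promotion with DataKinds, assuming a Peano-style Nat with constructors Zero and Suc. The names Two, NatVal, and natVal' are hypothetical. The type class walks the promoted type 'Suc ('Suc 'Zero) and reflects it back to a term-level Integer. This shows that 'Zero and 'Suc are types of kind Nat.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables, FlexibleInstances #-}
import Data.Proxy (Proxy (..))

-- With DataKinds, Nat is also a kind, 'Zero is a type of kind Nat,
-- and 'Suc is a type constructor of kind Nat -> Nat.
data Nat = Zero | Suc Nat

type Two = 'Suc ('Suc 'Zero)

-- Reflect a promoted Nat back to a term-level Integer by recursion
-- on its type-level structure.
class NatVal (n :: Nat) where
  natVal' :: Proxy n -> Integer

instance NatVal 'Zero where
  natVal' _ = 0

instance NatVal n => NatVal ('Suc n) where
  natVal' _ = 1 + natVal' (Proxy :: Proxy n)
```

With these definitions, natVal' (Proxy :: Proxy Two) evaluates to 2.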

The situation of lists is similar: for all kinds k, [k] is also a kind. For all kinds k, '[] is a type of kind [k]. Given a type x of kind k and a type xs of kind [k], x ': xs is again a type of kind [k]. Formally, (':) :: k -> [k] -> [k]. For example, Int ': (Char ': '[]) is a type of kind [*]; it is a list of (lifted) types. The optional quote denotes that the constructors are promoted. The same list can be written with the syntactic sugar '[Int, Char].

Tuples are also promoted. Thus we may put two types in a pair to form another type, such as '(Int, Char), a type of kind (*, *).

Strings in Haskell are nothing but lists of Chars. Regarding promotion, however, a string literal can be promoted to a type of kind Symbol. For example, in the kind signature "this is a string" :: Symbol, the string on the left-hand side of :: is a type whose kind is Symbol.

With all of these ingredients, we are ready to build our dictionaries, or type-level association lists.

All the entities defined above are types, where has kind . In , while has kind and "A" has kind , the former kind subsumes the latter. Thus also has kind .
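As an illustration, a dictionary can be written as a promoted list of promoted pairs. This assumes GHC's Symbol for keys and Data.Kind.Type for the kind of ordinary types. The dictionary Example, the kind synonym Dict, and the class KnownKeys are hypothetical. The class shows that such a type can be traversed by instance resolution and its keys reflected back to strings with symbolVal.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeOperators,
             FlexibleInstances, ScopedTypeVariables #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- A type-level dictionary: a promoted list of promoted pairs that pair
-- Symbol keys with types. The kind synonym Dict is an assumption.
type Dict = [(Symbol, Type)]

-- Hypothetical example: "A", "B", "C" hold an Integer, a Bool, a String.
type Example = '[ '("A", Integer), '("B", Bool), '("C", String)]

-- The kind annotation makes GHC check that Example has kind Dict.
example :: Proxy (Example :: Dict)
example = Proxy

-- Reflect the keys of a dictionary back to term-level strings.
class KnownKeys (xs :: Dict) where
  keys :: Proxy xs -> [String]

instance KnownKeys '[] where
  keys _ = []

instance (KnownSymbol k, KnownKeys xs) => KnownKeys ('(k, v) ': xs) where
  keys _ = symbolVal (Proxy :: Proxy k) : keys (Proxy :: Proxy xs)
```

Here keys example evaluates to ["A", "B", "C"].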

3.2. Type-Level Functions

Now that we can represent dictionaries as types, the next step is to define operations on them. A function that inserts an entry into a dictionary, for example, is a function from a type to a type. While it has been shown that type-level functions can be simulated using Haskell type classes (McBride, 2002), recent versions of GHC offer indexed type families, or type families for short, which are considered a cleaner solution.

For example, compare term-level disjunction (||) and its type-level counterpart Or. The former is typically defined by pattern matching on its arguments. The latter is a closed type family. In its declaration, Bool is not a type but a type lifted to a kind, while 'True and 'False are types of kind Bool. The declaration says that Or is a family of types, indexed by two parameters a and b of kind Bool. The type with indices 'False and 'False is 'False, and all other indices lead to 'True. For our purpose, we can read Or as a function from types to types; observe how it resembles the term-level (||). We will also use two more type-level functions on Bool later: negation and a conditional.

As a remark, type families in GHC come in many flavors. One can define families of types, as well as families of synonyms. They can appear inside type classes (Chakravarty et al., 2005b; Chakravarty et al., 2005a) or at top level. Top-level type families can be open (Schrijvers et al., 2008) or closed (Eisenberg et al., 2014). We choose top-level, closed type synonym families, since they allow overlapping equations and we need none of the extensibility provided by open type families. Notice that, for disjunction, the specific equation for the indices 'False and 'False is also matched by the more general catch-all equation. In a closed type family such overlapping is resolved in order, just as overlapping cases are resolved in term-level functions.
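The following sketches the three Boolean type families as closed type families, under the assumption that they are named Or, Not, and If. Base offers similar families in Data.Type.Bool. The values of type :~: at the end are compile-time tests: each definition typechecks only if the family application on the left reduces to the type on the right.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, PolyKinds, TypeOperators #-}
import Data.Type.Equality ((:~:) (..))

-- Type-level disjunction. Equations are tried in order, so the
-- catch-all second equation safely overlaps with the first.
type family Or (a :: Bool) (b :: Bool) :: Bool where
  Or 'False 'False = 'False
  Or a      b      = 'True

-- Type-level negation.
type family Not (a :: Bool) :: Bool where
  Not 'False = 'True
  Not 'True  = 'False

-- Type-level conditional, polymorphic in the kind k of its branches.
type family If (c :: Bool) (t :: k) (e :: k) :: k where
  If 'True  t e = t
  If 'False t e = e

-- Compile-time checks: these typecheck only if the families reduce.
orCheck :: Or 'True 'False :~: 'True
orCheck = Refl

notCheck :: Not 'True :~: 'False
notCheck = Refl

ifCheck :: If 'False Int Bool :~: Bool
ifCheck = Refl
```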

Now we can define operations on type-level dictionaries. Let us begin with lookup: applied to a dictionary and a key, it returns the entry associated with the key in the dictionary. Type-level equality of keys can be expressed by reusing the same type variable name for both keys in a pattern. Note also that lookup is a partial function on types. When the key is present, the expression evaluates to the associated type. When the key is absent, there are no applicable equations to reduce it, so the expression stays unreduced.
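Here is a minimal sketch of the dictionary lookup as a closed type family; the name Get is an assumption. The first equation uses the same variable k both for the key in the entry and for the key being searched, which expresses key equality. The :~: values are compile-time checks.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, KindSignatures #-}
import Data.Kind (Type)
import Data.Type.Equality ((:~:) (..))
import GHC.TypeLits (Symbol)

-- Lookup on a type-level dictionary (hypothetical name Get).
-- The repeated variable k in the first equation expresses key equality.
-- There is no equation for '[], so looking up a missing key gets stuck.
type family Get (xs :: [(Symbol, Type)]) (k :: Symbol) :: Type where
  Get ('(k, x) ': xs) k = x
  Get ('(j, x) ': xs) k = Get xs k

getA :: Get '[ '("A", Char), '("B", Bool)] "A" :~: Char
getA = Refl

getB :: Get '[ '("A", Char), '("B", Bool)] "B" :~: Bool
getB = Refl
```

Looking up an absent key such as "C" leaves Get stuck, so a Refl claiming that it reduces to Char (or to any other type) is rejected.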

Some other dictionary-related functions are defined in Figure 1. One either updates an existing entry or inserts a new entry, one removes an entry matching a given key, and one checks whether a given key exists in the dictionary.
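The operations described for Figure 1 can be sketched in the same style. The names Set, Del, and Member are our assumptions. Like the lookup, each family relies on closed-family equations being tried in order and on repeated type variables for key equality. Del removes only the first matching entry, which suffices as long as keys are unique.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, KindSignatures #-}
import Data.Kind (Type)
import Data.Type.Equality ((:~:) (..))
import GHC.TypeLits (Symbol)

-- Update the entry for k if present, otherwise append a new entry.
type family Set (xs :: [(Symbol, Type)]) (k :: Symbol) (x :: Type) :: [(Symbol, Type)] where
  Set '[] k x = '[ '(k, x)]
  Set ('(k, y) ': xs) k x = '(k, x) ': xs
  Set ('(j, y) ': xs) k x = '(j, y) ': Set xs k x

-- Remove the entry for k, if any.
type family Del (xs :: [(Symbol, Type)]) (k :: Symbol) :: [(Symbol, Type)] where
  Del '[] k = '[]
  Del ('(k, y) ': xs) k = xs
  Del ('(j, y) ': xs) k = '(j, y) ': Del xs k

-- Does k occur in the dictionary?
type family Member (xs :: [(Symbol, Type)]) (k :: Symbol) :: Bool where
  Member '[] k = 'False
  Member ('(k, x) ': xs) k = 'True
  Member ('(j, x) ': xs) k = Member xs k

-- Compile-time checks of each operation.
setNew :: Set '[ '("A", Int)] "B" Bool :~: '[ '("A", Int), '("B", Bool)]
setNew = Refl

setOld :: Set '[ '("A", Int)] "A" Bool :~: '[ '("A", Bool)]
setOld = Refl

delA :: Del '[ '("A", Int), '("B", Bool)] "A" :~: '[ '("B", Bool)]
delA = Refl

memB :: Member '[ '("A", Int)] "B" :~: 'False
memB = Refl
```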

Figure 1. Some operations on type-level dictionaries.

4. Embedding Hedis Commands

Having introduced indexed monads and type-level dictionaries, in this section we present our embedding of Hedis commands into Edis, introducing the necessary concepts as they are needed.

4.1. Proxies and Singleton Types

The Hedis function del takes a list of keys (encoded to ByteString) and removes the entries having those keys from the database. For reasons to be explained later, we consider an Edis counterpart that takes only one key. A first attempt may give it a plain key argument, which is converted to a ByteString and passed, as a singleton list, to Hedis.del.

At term level, this del merely calls Hedis.del. At type level, if the status of the database before del is called meets the constraint represented by the dictionary xs, the status afterwards should meet the constraint obtained by deleting the key from xs. The question, however, is how to mention the key in that type. It cannot be the key argument itself, since that is a runtime value and not a type. How do we smuggle a runtime value to type level?

In a language with phase distinction like Haskell, it is certainly impossible to pass the value of the key to the type checker if it truly is a runtime value, for example, a string read from the user. If the value of the key can be determined statically, however, singleton types (Eisenberg and Weirich, 2012) can be used to represent a type as a value, thus building a connection between the two realms.

A singleton type is a type that has only one term. When the term is built, it carries a type that can be inspected by the type checker. The term can be thought of as a representative of its type in the realm of runtime values. For our purpose, we will use the type Proxy, defined by data Proxy t = Proxy.

For every type t, Proxy t is a type that has only one term: Proxy. (Giving the same name to both the type and the term can be very confusing, but it is unfortunately a common practice in the Haskell community.) To call del, instead of passing a key as a value, we give it a proxy with a specified type, as in del (Proxy :: Proxy "A"). Here "A" is not a value, but a string lifted to a type (of kind Symbol). Now that the type checker has access to the key, the postcondition in the type of del can refer to it, stating that the entry for that key is removed from the dictionary.

The next problem is that, , at term level, gets only a value constructor without further information, while it needs to pass a key to . Every concrete string literal lifted to a type, for example, "A", belongs to a type class . For all types in , the function :

retrieves the string associated with a type-level literal that is known at compile time. In summary, can be implemented as:

where .
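Putting the pieces of this subsection together, here is a sketch of del in which the key is known statically. It departs from the real setting in two ways. IO stands in for the Redis monad, and fakeHedisDel (hypothetical) replaces Hedis.del, which in the real library takes a list of ByteString keys; the key is kept as a String rather than encoded with cereal. The point is that one Proxy argument feeds both the runtime key, via symbolVal, and the postcondition Del xs s.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Remove the entry for key k from a dictionary (hypothetical name Del).
type family Del (xs :: [(Symbol, Type)]) (k :: Symbol) :: [(Symbol, Type)] where
  Del '[] k = '[]
  Del ('(k, y) ': xs) k = xs
  Del ('(j, y) ': xs) k = '(j, y) ': Del xs k

-- IO stands in for Hedis's Redis monad; p and q are phantom dictionaries.
newtype Edis (p :: [(Symbol, Type)]) (q :: [(Symbol, Type)]) a =
  Edis { unEdis :: IO a }

-- Hypothetical stand-in for Hedis.del: reports how many keys it received.
fakeHedisDel :: [String] -> IO Integer
fakeHedisDel ks = return (fromIntegral (length ks))

-- symbolVal recovers the runtime key from the proxy's type, while the
-- same type-level key s appears in the postcondition Del xs s.
del :: KnownSymbol s => Proxy s -> Edis xs (Del xs s) Integer
del key = Edis (fakeHedisDel [symbolVal key])
```

Annotating a call such as del (Proxy :: Proxy "A") with the type Edis '[ '("A", Int), '("B", Bool)] '[ '("B", Bool)] Integer is accepted, because Del reduces the precondition to exactly that postcondition.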

A final note: the function encode, from the Haskell library cereal, helps convert certain serializable datatypes into ByteString. The function encode and its dual, decode, will be used more later.

4.2. Automatic Serialization

As mentioned before, while Redis provides a number of container types, including lists, sets, and hashes, the primitive type is string. Hedis programmers have to convert data of other types to strings manually before saving them into the data store. In Edis, we wish to save the programmers some of this effort, as well as to keep a careful record of the intended types of the strings in the data store.

To keep track of the intended types of strings in the data store, we define the types StringOf a, ListOf a, and SetOf a, which have no terms.

If a key is associated with, for example, StringOf Integer in our dictionary, we mean that its value in the data store was serialized from an Integer and should be used as an Integer. Types ListOf a and SetOf a respectively denote that the value is a list or a set of values of type a.

While the command set in Hedis always writes a string to the data store, the corresponding set in Edis applies to any serializable type (those in the class Serialize of cereal), and performs the encoding for the user.

For example, executing set on the key "A" with a Bool value updates the dictionary with an entry recording that "A" holds a serialized Bool. If "A" is not in the dictionary, this entry is added; otherwise the old type of "A" is updated accordingly.

The Redis command INCR reads the (string) value of the given key, parses it as an integer, and increments it by one before storing it back. The command INCRBYFLOAT increments the floating-point value of a key by a given amount. Their Edis counterparts are incr and incrbyfloat.

Notice the use of (~), equality constraints (Sulzmann et al., 2007), to enforce that the intended type of the value of the key must be, respectively, a serialized integer and a serialized floating-point number. The function incr may only be called in a context where the type checker is able to reduce the lookup of the key in the dictionary to the expected type. Recall that when the key is not in the dictionary, the lookup cannot be fully reduced. The type of incrbyfloat works in a similar way.
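The following sketch runs set and incr against an in-memory stand-in for Redis, to show how the postcondition of set enables a later incr. It rests on several assumptions. A pure Map String String replaces the server, Show and read replace cereal's encode and decode, and the names StringOf, Get, Set, Edis, bind, set, incr, and prog are illustrative rather than the actual Edis definitions.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)
import qualified Data.Map as Map

-- Marker type with no terms, recording the intended type of a string.
data StringOf (a :: Type)

type Dict = [(Symbol, Type)]

type family Get (xs :: Dict) (k :: Symbol) :: Type where
  Get ('(k, x) ': xs) k = x
  Get ('(j, x) ': xs) k = Get xs k

type family Set (xs :: Dict) (k :: Symbol) (x :: Type) :: Dict where
  Set '[] k x = '[ '(k, x)]
  Set ('(k, y) ': xs) k x = '(k, x) ': xs
  Set ('(j, y) ': xs) k x = '(j, y) ': Set xs k x

-- A pure key/string map stands in for the Redis server (assumption).
type Store = Map.Map String String

newtype Edis (p :: Dict) (q :: Dict) a = Edis { runEdis :: Store -> (a, Store) }

-- Indexed sequencing: the first postcondition is the second precondition.
bind :: Edis p q a -> (a -> Edis q r b) -> Edis p r b
bind m k = Edis $ \st -> let (a, st') = runEdis m st in runEdis (k a) st'

-- set serializes its argument (Show stands in for cereal's Serialize)
-- and records StringOf x as the key's new intended type.
set :: (KnownSymbol s, Show x) => Proxy s -> x -> Edis xs (Set xs s (StringOf x)) ()
set key v = Edis $ \st -> ((), Map.insert (symbolVal key) (show v) st)

-- incr typechecks only if the lookup of s in xs reduces to StringOf Integer.
incr :: (KnownSymbol s, Get xs s ~ StringOf Integer) => Proxy s -> Edis xs xs Integer
incr key = Edis $ \st ->
  let k = symbolVal key
      n = maybe 0 read (Map.lookup k st) + 1
  in (n, Map.insert k (show n) st)

-- Writing an Integer to "A" and then incrementing it is accepted.
prog :: Edis '[] '[ '("A", StringOf Integer)] Integer
prog = set (Proxy :: Proxy "A") (41 :: Integer) `bind` \_ -> incr (Proxy :: Proxy "A")
```

Running runEdis prog Map.empty yields 42, with "A" bound to "42". Calling incr on "A" from the empty dictionary, without the preceding set, is rejected at compile time. This is because Get '[] "A" is stuck and cannot equal StringOf Integer.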

4.3. Disjunctive Constraints

Recall, from Section 1, that the commands LPUSH key val and LLEN key succeed either when key appears in the data store and is assigned a list, or when key does not appear at all. What we wish to have in their constraint is thus a predicate equivalent to “the value of key is a list, or key is not a member of the dictionary”.

To impose a conjunctive constraint P ∧ Q, one may simply put both in the type as the constraint tuple (P, Q). Expressing disjunctive constraints is only slightly harder, thanks to our type-level functions. We may write the predicate as a single equality constraint requiring that the type-level disjunction of Section 3.2, applied to the two Boolean conditions, reduces to 'True.

To avoid referring to the type of key, which might not exist, we define an auxiliary predicate that reduces to 'True only if the key is either bound to a value of the expected kind or absent. Many Redis commands can be invoked only under such a “well-typed, or non-existent” precondition, so we give names to such constraints, as seen in Figure 2.
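The following is a sketch of the auxiliary predicate for lists, with hypothetical names IsList, ListOrNX, and requiresListOrNX. Looking up the key and testing the result would get stuck on an absent key. Instead, ListOrNX walks the dictionary itself and returns 'True when it reaches the end of the list. requiresListOrNX stands in for the precondition of a command such as llen.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (Symbol)

-- Marker types recording intended value types (hypothetical names).
data StringOf (a :: Type)
data ListOf (a :: Type)

-- Does an intended type denote a list?
type family IsList (x :: Type) :: Bool where
  IsList (ListOf a) = 'True
  IsList x          = 'False

-- 'True iff key k is absent from xs, or present and bound to a list.
-- Matching on the key directly avoids a lookup that could get stuck.
type family ListOrNX (xs :: [(Symbol, Type)]) (k :: Symbol) :: Bool where
  ListOrNX '[]             k = 'True
  ListOrNX ('(k, x) ': xs) k = IsList x
  ListOrNX ('(j, x) ': xs) k = ListOrNX xs k

-- Hypothetical stand-in for llen's precondition: this function can only
-- be applied where the constraint reduces to 'True.
requiresListOrNX :: (ListOrNX xs k ~ 'True) => Proxy xs -> Proxy k -> Bool
requiresListOrNX _ _ = True

type Example = '[ '("l", ListOf String), '("s", StringOf String)]
```

The call requiresListOrNX (Proxy :: Proxy Example) (Proxy :: Proxy "s") is rejected at compile time, since "s" holds a string rather than a list.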

Figure 2. The “well-typed, or non-existent” constraints.

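The Edis counterparts of LPUSH and LLEN, lpush and llen, therefore carry the list version of this constraint in their types.

Similarly, the type of sadd, the function from our running example, carries the corresponding constraint for sets.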

To see a command with a more complex type, consider setnx, the counterpart of SETNX, whose type uses the conditional type-level function defined in Section 3.2.

From its type one can see that setnx creates a new entry in the data store only if the key is fresh. The type of setnx computes a postcondition for static checking, and also serves as good documentation of its semantics.
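The postcondition of a setnx-like command can be sketched with the conditional family. If the key is already a member, the dictionary is unchanged; otherwise a new entry is added. The names If, Member, Set, StringOf, and SetNXPost are assumptions. The two :~: values check both branches at compile time.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, PolyKinds, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)
import Data.Type.Equality ((:~:) (..))
import GHC.TypeLits (Symbol)

data StringOf (a :: Type)

type family If (c :: Bool) (t :: k) (e :: k) :: k where
  If 'True  t e = t
  If 'False t e = e

type family Member (xs :: [(Symbol, Type)]) (k :: Symbol) :: Bool where
  Member '[] k = 'False
  Member ('(k, x) ': xs) k = 'True
  Member ('(j, x) ': xs) k = Member xs k

type family Set (xs :: [(Symbol, Type)]) (k :: Symbol) (x :: Type) :: [(Symbol, Type)] where
  Set '[] k x = '[ '(k, x)]
  Set ('(k, y) ': xs) k x = '(k, x) ': xs
  Set ('(j, y) ': xs) k x = '(j, y) ': Set xs k x

-- Hypothetical postcondition of a setnx-like command: unchanged if k
-- already exists, extended with a new StringOf x entry otherwise.
type SetNXPost xs k x = If (Member xs k) xs (Set xs k (StringOf x))

-- Fresh key: a new entry is appended.
fresh :: SetNXPost '[ '("A", StringOf Bool)] "B" Int
     :~: '[ '("A", StringOf Bool), '("B", StringOf Int)]
fresh = Refl

-- Existing key: the dictionary is left as it was.
taken :: SetNXPost '[ '("A", StringOf Bool)] "A" Int :~: '[ '("A", StringOf Bool)]
taken = Refl
```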

4.4. Hashes

The hash is a useful datatype supported by Redis. While the Redis data store can be seen as a set of key/value pairs, a hash is itself a set of field/value pairs. The following commands assign a hash to the key user. The fields are name, birthyear, and verified, with values banacorn, 1992, and 1 respectively.

    redis> hmset user name banacorn birthyear 1992 verified 1
    OK
    redis> hget user name
    "banacorn"
    redis> hget user birthyear
    "1992"

For a hash to be useful, we should allow its fields to have different types. To keep track of the types of fields in a hash, the type HashOf takes a list of pairs of field names and types.

By having an entry pairing a key with HashOf fs in a dictionary, we denote that the value of the key is a hash whose fields and their types are specified by fs, which is also a dictionary.
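The following sketches how hashes can be tracked, assuming the marker type HashOf is indexed by a dictionary of field types. GetHashField and FieldOf are hypothetical names for a field lookup. The result stays stuck when the key is missing, is not bound to a hash, or lacks the field. The field types given to the user hash from the session above are illustrative only.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)
import Data.Type.Equality ((:~:) (..))
import GHC.TypeLits (Symbol)

type Dict = [(Symbol, Type)]

-- Marker for a hash value; its index fs is itself a dictionary of fields.
data HashOf (fs :: Dict)

type family Get (xs :: Dict) (k :: Symbol) :: Type where
  Get ('(k, x) ': xs) k = x
  Get ('(j, x) ': xs) k = Get xs k

-- Type of field f in the hash stored at key k. It gets stuck if k is
-- missing, if k is not bound to a hash, or if the hash has no field f.
type family GetHashField (xs :: Dict) (k :: Symbol) (f :: Symbol) :: Type where
  GetHashField ('(k, v) ': xs) k f = FieldOf v f
  GetHashField ('(j, v) ': xs) k f = GetHashField xs k f

type family FieldOf (v :: Type) (f :: Symbol) :: Type where
  FieldOf (HashOf fs) f = Get fs f

-- The user hash with hypothetical field types.
type UserDict =
  '[ '("user", HashOf '[ '("name", String), '("birthyear", Integer), '("verified", Bool)])]

nameType :: GetHashField UserDict "user" "name" :~: String
nameType = Refl

yearType :: GetHashField UserDict "user" "birthyear" :~: Integer
yearType = Refl
```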

Figure 3. Type-level operations for dictionaries with hashes.

Figure 3 presents some operations we need on dictionaries when dealing with hashes. Let xs be a dictionary; one of these operations returns the type of field f in the hash assigned to key k, if both k and