Competitive Parallelism: Getting Your Priorities Right

07/10/2018 ∙ by Stefan K. Muller, et al. ∙ Carnegie Mellon University

Multi-threaded programs have traditionally fallen into one of two domains: cooperative and competitive. These two domains have remained mostly disjoint, with cooperative threading used for increasing throughput in compute-intensive applications such as scientific workloads and competitive threading used for increasing responsiveness in interactive applications such as GUIs and games. As multicore hardware becomes increasingly mainstream, there is a need to bridge these two disjoint worlds, because many applications mix interaction and computation and would benefit from both cooperative and competitive threading. In this paper, we present techniques for programming and reasoning about parallel interactive applications that can use both cooperative and competitive threading. Our techniques enable the programmer to write rich parallel interactive programs by creating and synchronizing with threads as needed, and by assigning threads user-defined and partially ordered priorities. To ensure important responsiveness properties, we present a modal type system analogous to S4 modal logic that precludes low-priority threads from delaying high-priority threads, thereby statically preventing a crucial set of priority-inversion bugs. We then present a cost model that allows reasoning about responsiveness and completion time of well-typed programs. The cost model extends the traditional work-span model for cooperative threading to account for competitive scheduling decisions needed to ensure responsiveness. Finally, we show that our proposed techniques are realistic by implementing them as an extension to the Standard ML language.


1 Introduction

The increasing proliferation of multicore hardware has sparked a renewed interest in programming-language support for cooperative threading. In cooperative threading, threads correspond to pieces of a job and are scheduled with the goal of completing the job as quickly as possible—or to maximize throughput. Cooperative thread scheduling algorithms are therefore usually non-preemptive: once a thread starts executing, it is allowed to continue executing until it completes.

Cooperatively threaded languages such as NESL (Blelloch et al., 1994), Cilk (Frigo et al., 1998), parallel Haskell (Chakravarty et al., 2007; Keller et al., 2010) and parallel ML (Fluet et al., 2011; Jagannathan et al., 2010; Raghunathan et al., 2016) have at least two important features:

  • Programmers can express opportunities for parallelism at a high level with relatively simple programming abstractions such as fork/join and async/finish. The run-time system of the language then handles the creation and scheduling of the threads.

  • The efficiency and performance of parallel programs written at this high level can be analyzed by using cost models based on work and span (e.g. (Blelloch and Greiner, 1995, 1996; Eager et al., 1989; Spoonhower et al., 2008)), which can guide efficient implementations.

Cooperative threading is elegant and expressive but it mostly excludes the important class of interactive applications, which require communication with the external world, including users and other programs. Such interactive applications typically require responsiveness, such as the requirement to process user input as soon as possible. Ensuring responsiveness usually requires competitive threading, where threads are scheduled preemptively, usually based on priorities. To guarantee responsiveness, most competitive threading libraries in use today expose a fixed range of numerical priorities which may be assigned to threads. Regardless of the threading primitives used, this greatly complicates the task of writing programs:

  • Writing effective competitively threaded programs requires assigning priorities to threads. While this can be straightforward for small programs, using priorities at scale is a significant challenge because most current approaches to priorities are inherently anti-modular. Because priorities are totally ordered, writing responsive programs might require reasoning about whether a thread should be given a higher or lower priority than a thread introduced in another part of the program, or possibly even in a library function.

  • To compensate for this lack of modularity, many systems expose large numbers of priorities: the POSIX threads (pthreads) API exposes scheduling policies with as many as 100 levels. Without clean guidelines governing their use, however, programmers must still reason globally about how to assign these numbers to threads. Studies have shown that programmers struggle to use systems with even 7 priorities (Hauser et al., 1993).

  • Reasoning about performance is much more difficult: the clean work-span model of cooperative threading does not apply to competitive threading, because of the impact of priorities on run-time. Furthermore, in competitive threading, priority inversions, where a low-priority thread delays a high-priority one, can have harmful and even disastrous consequences. For example, “Mars Pathfinder”, which landed on Mars on 4 July 1997, suffered from a software bug, traced to a priority inversion, that caused the craft to reset itself periodically. The bug had to be patched remotely so the mission could continue.

In this paper, we develop language techniques and a cost model for writing parallel interactive programs that use a rich set of cooperative and competitive threading primitives. This problem is motivated by the fact that as shared-memory hardware becomes widely used, competitively threaded, interactive applications will need to take advantage of the benefits of this parallel hardware, and not just cooperatively threaded, compute-intensive applications.

We present a programming language with features for spawning and syncing with asynchronous threads, which may be assigned priorities by the programmer. Aside from priorities, these threads are equivalent to futures, a powerful general-purpose cooperative threading mechanism. Like futures, threads are first-class values in the language. To enable modular programming with priorities, we allow the programmer to declare any number of priorities and define a partial order between them. The resulting language is sufficiently powerful to enable both cooperative and competitive threading. For example, the programmer can write a purely compute-intensive program (e.g., parallel Quicksort), a purely interactive program (e.g. a simple graphical user interface), and anything that combines the two (e.g. an email client that sorts tens of thousands of emails in parallel in the background while remaining responsive to user interaction events).

To reason about the efficiency and responsiveness of the programs written in this language, we present a cost model that bounds both the total computation time of the program and the response time of individual threads. Our cost semantics extends prior cost models of cooperative parallel programs to enable reasoning about the response time of threads with partially-ordered priorities. The main theoretical result of the paper shows that the response time of a thread does not depend on the amount of computation performed at lower priorities for any program in which threads do not sync on threads of lower priority. Such a sync clearly allows the response time of a high-priority thread to depend on low-priority work and is an example of the classic problem of priority inversions described above.

Our prior work on extending cooperative threading with priorities (Muller et al., 2017) also observed that priority inversions prevent responsiveness guarantees and presented static mechanisms for avoiding them. That work, however, considers only two priorities (high and low). Research in languages such as Ada (Cornhill and Sha, 1987; Levine, 1988) also discusses the importance of preventing priority inversion in a general setting with rich priorities, but we are aware of no prior static language mechanisms for doing so.

To guarantee appropriate bounds on responsiveness, we specify a type system that statically identifies and prevents priority inversions that would render such an analysis impossible. The type system enforces a monadic separation between commands, which are restricted to run at a certain priority, and expressions, which are priority-invariant. The type system then tracks the priorities of threads and rejects programs in which a high-priority thread may synchronize with a lower-priority one. In developing this system, we draw inspiration from modal logics, where the “possible worlds” of the modal logic correspond to priorities in our programs. More specifically, our type system is analogous to S4 modal logic, where the accessibility relation between worlds is assumed to be reflexive and transitive. This accessibility relation reflects the fact that the ways in which priorities are intended to interact are inherently asymmetric. Modal logic has proved to be effective in many problems of computer science. For example, Murphy et al. (2004) and Jia and Walker (2004) use the modal logic S5, where the accessibility relation between worlds is assumed to be symmetric (as well as reflexive and transitive), to model distributed computing.

The dynamic semantics of our language is a transition system that simulates, at an abstract level, the execution of a program on a parallel machine. We show that, for well-typed programs, our cost model accurately predicts the response time of threads in such an execution. Finally, we show that the proposed techniques can be incorporated into a practical language by implementing a compiler which typechecks prioritized programs and compiles them to a parallel version of Standard ML. We also provide a runtime system which schedules threads according to their priorities.

The specific contributions of this paper include the following.

  • An extension of the Parallel ML language, called PriML, with language constructs for user-defined, partially ordered priorities.

  • A core calculus, λ4, that captures the essential ideas of PriML, and a type system that guarantees inversion-free use of threads.

  • A cost semantics for λ4 which can be used to make predictions about both overall computation time and responsiveness, and a proof that these predictions are accurately reflected by the dynamic semantics.

  • An implementation of the compiler and the runtime system for PriML as an extension of the Parallel MLton compiler.

  • Example benchmarks written in our implementation that give preliminary qualitative evidence for the practicality of the proposed techniques.

2 Overview

We present an overview of our approach to multithreaded programming with priorities by using a language called PriML that extends Standard ML with facilities for prioritized multithreaded programming. As a running example, we consider an email client which interacts with a user while performing other necessary tasks in the background. The purpose of this section is to highlight the main ideas. The presentation is therefore high-level and sometimes informal. The rest of the paper formalizes these ideas (Section 3), expands on them to place performance bounds on programs (Section 4) and describes how they may be realized in practice (Section 6).

Priorities.

PriML enables the programmer to define priorities as needed and specify the relationships between them. For example, in our mail client, we sometimes wish to alert the user to certain situations (such as an incoming email) and we also wish to compress old emails in the background when the system is idle. To express this in PriML, we define two priorities alert and background and order them accordingly as follows.

priority alert
priority background
order background < alert

The ordering constraint specifies that background is lower priority than alert. Programmers are free to specify as many, or as few, ordering constraints between priorities as desired. PriML therefore provides support for a set of partially ordered priorities. Partially ordered priorities suffice to capture the intuitive notion of priorities, and give the programmer the flexibility to express any desired priority behavior, without the burden of having to reason about a total order over all priorities. Consider two priorities p and q. If they are ordered, e.g., p < q, then the system is instructed to run threads with priority q over threads with priority p. If no ordering is specified (i.e. p and q are incomparable in the partial order), then the system is free to choose arbitrarily between a thread with priority p and another with priority q.
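One way to picture these declarations is as edges of a directed graph whose reflexive-transitive closure is the partial order. The following Python sketch (an illustration only, not part of the language's implementation) computes that closure; any pair not related by it is incomparable, and a scheduler may order such threads arbitrarily.

```python
# Sketch: a user-declared priority order as the reflexive-transitive
# closure of declared 'p < q' edges. The names 'background' and 'alert'
# follow the example above.

def priority_order(declared):
    """Return the set of (lo, hi) pairs with lo <= hi in the closure."""
    names = set()
    for lo, hi in declared:
        names.update((lo, hi))
    le = {(p, p) for p in names}           # reflexivity
    le.update(declared)                    # declared edges
    changed = True
    while changed:                         # transitivity, to a fixpoint
        changed = False
        for a, b in list(le):
            for c, d in list(le):
                if b == c and (a, d) not in le:
                    le.add((a, d))
                    changed = True
    return le

le = priority_order([("background", "alert")])
assert ("background", "alert") in le       # background <= alert
assert ("alert", "background") not in le   # not the other way around
```

Because the closure is only reflexive and transitive (never forced to be total), two declared priorities with no connecting path simply remain unordered, matching the "free to choose arbitrarily" reading above.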

Modal type system.

To ensure responsive use of priorities, PriML provides a modal type system that tracks priorities. The types of PriML include the standard types of functional programming languages as well as a type of thread handles, by which computations can refer to, and synchronize with, running threads.

To support computations that can operate at multiple priorities, the type system supports priority polymorphism through a polymorphic type of the form ∀π : C. τ, where π is a newly bound priority variable, and C is a set of constraints of the form ρ1 ⪯ ρ2 (where ρ1 and ρ2 are priority constants or variables, one of which will in general be π), which bounds the allowable instantiations of π.

To support the tracking of priorities, the syntax and type system of PriML distinguish between commands and expressions. Commands provide the constructs for spawning and synchronizing with threads. Expressions consist of an ML-style functional language, with some extensions. Expressions cannot directly execute commands or interact with threads, and can thus be evaluated without regard to priority. Expressions can, however, pass around encapsulated commands (which have a distinguished type) and abstract over priorities to introduce priority-polymorphic expressions.

Threads.

Once declared, priorities can be used to specify the priority of threads. For example, in response to a request from the user, the mail client can spawn a thread to sort emails for background compression, and spawn another thread to alert the user about an incoming email. Spawned threads are annotated with a priority and run asynchronously with the rest of the program.

  spawn[background] { ret (sort …) };
  spawn[alert] { ret (display "Incoming mail!") }

The spawn command takes a command to run in the new thread and returns a handle to the spawned thread. In the above code, this handle is ignored, but it can also be bound to a variable using the notation x <- m; and used later to synchronize with the thread (wait for it to complete).

  spawn[background] { ret (sort …) };
  alert_thread <- spawn[alert] { ret (display "New mail received") };
  sync alert_thread
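For readers more familiar with futures, spawn and sync behave much like submitting to and waiting on a future. The Python sketch below is only an analogy: it ignores preemptive priority-based scheduling entirely and treats priorities as labels, using a thread pool in place of the language's runtime.

```python
# Rough future-based analogy of the spawn/sync commands above.
# Priorities appear only in comments; a real runtime would use them
# to preempt lower-priority work.
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as pool:
    # spawn[background] { ret (sort ...) } -- handle ignored
    sort_thread = pool.submit(sorted, [3, 1, 2])
    # alert_thread <- spawn[alert] { ... }
    alert_thread = pool.submit(lambda: "Incoming mail!")
    # sync alert_thread -- block until the thread's return value arrives
    message = alert_thread.result()

assert message == "Incoming mail!"
assert sort_thread.result() == [1, 2, 3]
```

As in the PriML code, the handle returned by the first spawn is simply dropped, while the second is bound and synchronized on.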
Example: priority-polymorphic multithreaded quicksort.

Priority polymorphism allows prioritized code to be compositional. For example, several parts of our email client might wish to use a library function qsort for sorting (e.g., the background thread sorts emails by date to decide which ones to compress, and a higher-priority thread sorts emails by subject when the user clicks a column header). Quicksort is easily parallelized, and so the library code spawns threads to perform recursive calls in parallel. The use of threads, however, means that the code must involve priorities and cannot be purely an expression. Because sorting is a basic function and may be used at many priorities, we would want the code for qsort to be polymorphic over priorities. This is possible in PriML by defining qsort to operate at a priority defined by an unrestricted priority variable.

fun[p] qsort (compare: 'a * 'a -> bool) (s: 'a seq) : 'a seq cmd[p] =
  if Seq.isEmpty s then
    cmd[p] {ret Seq.empty}
  else
    let val pivot = Seq.sub(s, (Seq.length s) / 2)
        val (s_l, s_e, s_g) = Seq.partition (compare pivot) s
    in
      cmd[p]
      {
        quicksort_l <- spawn[p] {do ([p]qsort compare s_l)};
        quicksort_g <- spawn[p] {do ([p]qsort compare s_g)};
        ss_l <- sync quicksort_l;
        ss_g <- sync quicksort_g;
        ret (Seq.append [ss_l, s_e, ss_g])
      }
    end
Figure 1: Code for multithreaded quicksort, which is priority polymorphic.

Figure 1 illustrates the code for a multithreaded implementation of Quicksort in PriML. The code uses a module called Seq which implements some basic operations on sequences. In addition to a comparison function on the elements of the sequence that will be sorted and the sequence to sort, the function takes as an argument a priority p, to which the body of the function may refer (e.g. to spawn threads at that priority). (Note that, unlike type-level parametric polymorphism in languages such as ML, which can be left implicit and inferred during type checking, priority parameters in PriML must be specified in the function declaration.) The implementation of qsort follows a standard implementation of the algorithm but is structured according to the type system of PriML. This can be seen in the return type of the function, which is an encapsulated command at priority p.

The function starts by checking if the sequence is empty. If so, it returns a command that returns an empty sequence. If the sequence is not empty, it partitions the sequence into sub-sequences consisting of elements less than, equal to, and greater than a pivot, chosen to be the middle element of the sequence. It then returns a command that sorts the sub-sequences in parallel, and concatenates the sorted sequences to produce the result. To perform the two recursive calls in parallel, the function spawns two threads, specifying that the threads operate at priority p.

This code also highlights the interplay between expressions and commands in PriML. The expression cmd[p] {m} introduces an encapsulated command, and the command do e evaluates e to an encapsulated command, and then runs the command.

Priority Inversions.

The purpose of the modal type system is to prevent priority inversions, that is, situations in which a thread synchronizes with a thread of a lower priority. An illustration of such a situation appears in Figure 2(a). This code shows a portion of the main event loop of the email client, which processes and responds to input from the user. The event loop runs at a high priority. If the user sorts the emails by date, the loop spawns a new thread, which calls the priority-polymorphic sorting function. The code instantiates this function at a lower priority sort_p, reflecting the programmer’s intention that the sorting, which might take a significant fraction of a second for a large number of emails, should not delay the handling of new events. Because syncing with that thread immediately afterward causes the remainder of the event loop (high-priority) to wait on the sorting thread (lower priority), this code will be correctly rejected by the type system. The programmer could instead write the code as shown in Figure 2(b), which displays the sorted list in the new thread, allowing the event loop to continue processing events. This code does not have a priority inversion and is accepted by the type system.

priority loop_p
priority sort_p
order sort_p < loop_p
fun loop emails : unit cmd[loop_p] =
  case next_event () of
  SORT_BY_DATE =>
    cmd[loop_p] {
      t <- spawn[sort_p] {
        do ([sort_p]qsort
                  date emails)};
      l <- sync t;
      ret (display_ordered l)
    }
    | 
(a) Ill-typed event loop code
priority loop_p
priority sort_p
order sort_p < loop_p
fun loop emails : unit cmd[loop_p] =
  case next_event () of
  SORT_BY_DATE =>
    cmd[loop_p] {
      spawn[sort_p] {
        l <- do ([sort_p]qsort
                   date emails);
        ret (display_ordered l)
      }
    }
    | 
(b) Well-typed event loop code
Figure 2: Two implementations of the event loop, one of which displays a priority inversion.

Although the priority inversion of Figure 2(a) could easily be noticed by a programmer, the type system also rules out more subtle priority inversions. Consider the ill-typed code in Figure 3, which shows another way in which a programmer might choose to implement the event loop. In this implementation, the event loop spawns two threads. The first (at priority sort_p) sorts the emails, and the second (at priority display_p) calls a priority-polymorphic function [p]disp, which takes a sorting thread at priority p, waits for it to complete, and displays the result. This type of “chaining” is a common idiom in programming with futures, but this attempt has gone awry because the thread at priority display_p is waiting on the lower-priority sorting thread. Because of priority polymorphism, it may not be immediately clear where exactly the priority inversion occurs, and yet this code will still be correctly rejected by the type system. The type error arises at the sync in disp:

constraint violated at 9.10-9.15: display_p <= p_1

This sync operation is passed a thread of priority p (note from the function signature that the types of thread handles explicitly track their priorities), and there is no guarantee that p is higher-priority than display_p (and, in fact, the instantiation [sort_p]disp in the event loop would violate this constraint). We may correct the type error in the disp function by adding this constraint to the signature:

fun[p : display_p <= p] disp (t: email seq thread[p]) : unit cmd[display_p] =

With this change, the instantiation [sort_p]disp in the event loop would become ill-typed, as it should, because this way of structuring the code inherently has a priority inversion. The event loop code should be written as in Figure 2(b) to avoid a priority inversion. However, the revised disp function could still be called on a higher-priority thread (e.g. one that checks for new mail).

priority loop_p
priority display_p
priority sort_p
order sort_p < loop_p
order sort_p < display_p
fun[p] disp (t : email seq thread[p]) : unit cmd[display_p] =
  cmd[display_p] {
    l <- sync t;
    ret (display_ordered l)
  }
fun loop emails : unit cmd[loop_p] =
  case next_event () of
  SORT_BY_DATE =>
    cmd[loop_p] {
      t <- spawn[sort_p] { do ([sort_p]qsort date emails) };
      spawn[display_p] { do ([sort_p]disp t) } 
    }
    | 
Figure 3: An ill-typed attempt at chaining threads together.

Note that the programmer could also fix the type error in both versions of the code by spawning the sorting thread at a higher priority. This change, however, betrays the programmer’s intention (clearly stated in the priority annotations) that the sorting should be lower priority. The purpose of the type system, as with all such programming language mechanisms, is not to relieve programmers entirely of the burden of thinking about the desired behavior of their code, but rather to ensure that the code adheres to this behavior if it is properly specified.
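The check at the heart of these examples can be pictured as a small decision procedure. The Python sketch below is only an illustration of the rule, not the actual type checker: given the declared order closed under reflexivity and transitivity, a sync performed at priority p on a thread at priority q is permitted only if p <= q.

```python
# Sketch of the no-priority-inversion rule: syncing is allowed only
# "upward" (or sideways at the same priority) in the partial order.

def sync_allowed(le, current, target):
    """A command at priority `current` may sync on a thread at priority
    `target` only if current <= target in the closure `le`."""
    return (current, target) in le

# Order from Figure 2, written out as its reflexive-transitive closure:
# sort_p < loop_p.
le = {("sort_p", "sort_p"), ("loop_p", "loop_p"), ("sort_p", "loop_p")}

assert not sync_allowed(le, "loop_p", "sort_p")  # Figure 2(a): rejected
assert sync_allowed(le, "loop_p", "loop_p")      # same priority: fine
assert sync_allowed(le, "sort_p", "loop_p")      # waiting upward: fine
```

The constraint `display_p <= p` added to disp's signature plays exactly this role: it forces every instantiation site to prove that the sync inside disp only waits upward.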

3 The λ4 calculus

Figure 4: Syntax of λ4

In this section, we define a core calculus, λ4, which captures the key ideas of a language with an ML-style expression layer and a modal layer of prioritized asynchronous threads. Figure 4 presents the abstract syntax of λ4. In addition to the unit type, a type of natural numbers, functions, product types and sum types, λ4 has three special types. The type τ thread[ρ] is used for a handle to an asynchronous thread running at priority ρ and returning a value of type τ. The type τ cmd[ρ] is used for an encapsulated command runnable at priority ρ. The calculus also has a type ∀π : C. τ of priority-polymorphic expressions. These types are annotated with a constraint C which restricts the instantiation of the bound priority variable π. For example, an abstraction whose constraint is π ⪯ ρ can only be instantiated with priorities ρ′ for which ρ′ ⪯ ρ.

A priority ρ can be either a priority constant c or a priority variable π. Priority constants will be drawn from a pre-defined set, in much the same way that numerals are drawn from the set of natural numbers. The set of priority constants (and the partial order over them) will be determined statically and is a parameter to the static and dynamic semantics. This is a key difference between the calculus and PriML, in which the program can define new priority constants (we discuss in Section 6 how a compiler can hoist priority definitions out of the program).

As in PriML, the syntax is separated into expressions, which do not involve priorities, and commands, which do. For simplicity, the expression language is in “2/3-cps” form: we distinguish between expressions and values, and expressions take only values as arguments when this would not interfere with evaluation order. An expression with unevaluated subexpressions, e.g. a pair (e1, e2), can be expressed using let bindings as let x1 = e1 in let x2 = e2 in (x1, x2). Values consist of the unit value, numerals, anonymous functions, pairs of values, left- and right-injection of values, thread identifiers, encapsulated commands cmd[ρ] {m} and priority-level abstractions.

Expressions include values, let binding, the if-zero conditional and function application. There are also additional expression forms for pair introduction and left- and right-injection. One may think of these forms as the source-level instructions to allocate the pair or tag, and the corresponding value forms as the actual runtime representation of the pair or tagged value (separating the two will allow us to account for the cost of performing the allocation). Finally, expressions include the case construct, output, input, priority instantiation and fixed points.

Commands are combined using the binding construct x <- e; m, which evaluates e to an encapsulated command, which it executes, binding its return value to x, before continuing with command m. Spawning a thread and synchronizing with a thread are also commands. The spawn command is parametrized by both a priority ρ and the type τ of the return value of the spawned command, for convenience in defining the dynamic semantics.

3.1 Static Semantics

The type system of λ4 carefully tracks the priorities of threads as they wait for each other and enforces that a program is free of priority inversions. This static guarantee will ensure that we can derive cost guarantees from well-typed programs.

Figure 5: Expression typing rules.
Figure 6: Command typing rules.
Figure 7: Constraint entailment

As with the syntax, the static semantics are separated into the expression layer and the command layer. Because expressions do not depend on priorities, the static semantics for expressions is fairly standard. The main unusual feature is that the typing judgment is parametrized by a signature Σ containing the types and priorities of running threads. A signature has entries recording, for each running thread a, the priority ρ at which it runs and the type τ of the value it will return. The signature is needed to check the types of thread handles.

The expression typing judgment is Γ ⊢ e : τ, indicating that under signature Σ, a partial order R of priority constants and context Γ, expression e has type τ. As usual, the variable context Γ maps variables to their types. The rules for this judgment are shown in Figure 5. The variable rule var, the rule for fixed points and the introduction and elimination rules for unit, natural numbers, functions, products and sums, are straightforward. The rule for thread handles looks up the thread in the signature. The rule for encapsulated commands cmd[ρ] {m} requires that the command m be well-typed and runnable at priority ρ, using the typing judgment for commands, which will be defined below. The introduction rule for priority polymorphism extends the context with both the priority variable π and the constraint C. The elimination rule handles priority instantiation: when instantiating the variable π with priority ρ, the rule requires that the constraints hold with ρ substituted for π (the constraint entailment judgment will be discussed below). The rule also performs the corresponding substitution in the return type.

The command typing judgment includes both the return type τ of the command m and the priority ρ at which m is runnable. The rules are shown in Figure 6. The rule for bind requires that the bound expression return an encapsulated command of the current priority and some return type τ, and then extends the context with a variable x of type τ in order to type the remaining command. The rule for spawn requires that the spawned command be runnable at the annotated priority ρ′ and return a value of the annotated type τ′. The spawn command returns a thread handle of type τ′ thread[ρ′], and may do so at any priority. The sync command requires that its argument have the type τ′ thread[ρ′] of a thread handle, and returns a value of type τ′. The rule also checks the priority annotation on the thread’s type and requires that this priority be at least the current priority. This is the condition that rules out sync commands that would cause priority inversions. Finally, if e has type τ, then the command ret e returns a value of type τ, at any priority.

The constraint checking judgment is defined in Figure 7. We can conclude that a constraint holds if it appears directly in the context (rule hyp) or the partial order (rule assume) or if it can be concluded from reflexivity or transitivity (rules refl and trans, respectively). Finally, a conjunction of constraints requires that both conjuncts hold.
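The entailment rules can be sketched as a reachability check. The Python rendering below is hypothetical (it is not the formal judgment): hypotheses and the declared partial order contribute edges, reflexivity and transitivity fall out of a graph search, and a conjunction requires each conjunct to be derivable.

```python
# Sketch of constraint entailment. A goal is either a single pair
# (lo, hi), read "lo <= hi", or a list of such pairs (a conjunction).

def entails(order, hyps, goal):
    """Decide whether `goal` follows from the partial order `order`
    (rule assume) and context hypotheses `hyps` (rule hyp)."""
    if isinstance(goal, list):                 # rule conj: all parts hold
        return all(entails(order, hyps, g) for g in goal)
    facts = set(order) | set(hyps)
    frontier, seen = {goal[0]}, set()
    while frontier:                            # refl + trans as a search
        p = frontier.pop()
        if p == goal[1]:                       # reached: goal derivable
            return True
        seen.add(p)
        frontier |= {hi for lo, hi in facts if lo == p} - seen
    return False

order = [("sort_p", "loop_p")]
assert entails(order, [], ("sort_p", "sort_p"))              # refl
assert entails(order, [("loop_p", "pi")], ("sort_p", "pi"))  # trans
assert not entails(order, [], ("loop_p", "sort_p"))
```

The second assertion mirrors typing a priority-polymorphic body: with the hypothesis loop_p <= pi in the context, the lower bound sort_p <= pi follows by transitivity through the declared order.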

We use several forms of substitution in both the static and dynamic semantics. All use the standard definition of capture-avoiding substitution. We can substitute expressions for variables in expressions ([e/x]e′) or in commands ([e/x]m), and we can substitute priorities for priority variables in expressions ([ρ/π]e), commands ([ρ/π]m), constraints ([ρ/π]C), contexts ([ρ/π]Γ), types and priorities. For each of these substitutions, we prove the principle that substitution preserves typing. These substitution principles are collected in Lemma 1.

Lemma 1 (Substitution).

  1. If Γ, x : τ′ ⊢ e′ : τ and Γ ⊢ e : τ′, then Γ ⊢ [e/x]e′ : τ.

  2. If m has return type τ and is runnable at priority ρ under Γ, x : τ′, and Γ ⊢ e : τ′, then [e/x]m has return type τ and is runnable at priority ρ under Γ.

  3. If Γ ⊢ e : τ under a bound priority variable π, then [ρ/π]Γ ⊢ [ρ/π]e : [ρ/π]τ.

  4. If m has return type τ and is runnable at priority ρ′ under Γ and a bound priority variable π, then [ρ/π]m has return type [ρ/π]τ and is runnable at priority [ρ/π]ρ′ under [ρ/π]Γ.

  5. If a constraint entailment holds, then it continues to hold after substituting a priority ρ for a priority variable π on both sides.

Proof.
  1. By induction on the derivation of . Consider one representative case.

    • E Then . By inversion, and . By induction, . Apply E.

  2. By induction on the derivation of .

    • Bind. Then . By inversion, and
      . By weakening, . By induction, and . Apply Bind.

    • Spawn. Then . By inversion, . By induction, . Apply Spawn.

    • Sync. Then . By inversion, . By induction, . Apply Sync.

    • Ret. Then . By inversion, . By induction, . Apply Ret.

  3. By induction on the derivation of .

    • E Then and . By inversion,
      and and .
      By induction, and .
      By E, .
      Because , this completes the case.

  4. By induction on the derivation of

    • Bind. Then . By inversion, and . By induction, and . Apply Bind.

    • Spawn. Then
      and . By inversion,
      . By induction, . By Spawn,

    • Sync. Then . By inversion, . By induction, . Apply Sync.

    • Ret. Then . By inversion, .
      By induction, . Apply Ret.

  5. By induction on the derivation of . We consider the non-trivial cases.

    • trans. By inversion, and . By induction, and Apply rule trans.

    • conj By inversion, and . By induction, and . Apply rule conj.

3.2 Dynamic Semantics

We define a transition semantics for λ4. Because the operational behavior (as distinct from run-time or responsiveness, which will be the focus of Section 4) of expressions does not depend on the priority at which they run or what other threads are running, their semantics can be defined without regard to other running threads. The semantics for commands will be more complex, because it must include other threads. We will also define a syntax and dynamic semantics for thread pools, which are collections of all of the currently running threads.

The dynamic semantics for expressions consists of two judgments. The first states that a value v is well-formed and refers only to thread names in the signature Σ. The rules for this judgment are omitted. The transition relation for expressions is fairly straightforward for a left-to-right, call-by-value lambda calculus and is shown in Figure 8. The signature Σ does not change during expression evaluation and is used solely to determine whether thread IDs are well-formed values. The ifz construct conditions on the value of the numeral n. If n = 0, it steps to the zero branch. If not, it steps to the successor branch, substituting n − 1 for the bound variable. The case construct conditions on whether the scrutinee is a left or right injection, and steps to the corresponding branch, substituting the injected value for the branch’s bound variable. Function applications and priority instantiations simply perform the appropriate substitution.
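The ifz and case transitions can be sketched as follows (an illustrative Python rendering only, with substitution modeled by function application rather than formal capture-avoiding substitution):

```python
# Sketch of the ifz and case transition rules.

def step_ifz(n, zero_branch, succ_branch):
    """ifz on numeral n: take the zero branch if n = 0; otherwise take
    the successor branch with its variable bound to n - 1."""
    if n == 0:
        return zero_branch
    var, body = succ_branch          # (bound variable name, body)
    return body(n - 1)               # substitution as a function call

def step_case(inj, left_branch, right_branch):
    """case on an injection: pick the branch matching the tag and
    substitute the injected payload for the branch's variable."""
    tag, payload = inj
    var, body = left_branch if tag == "l" else right_branch
    return body(payload)

assert step_ifz(0, "zero", ("x", lambda p: ("succ", p))) == "zero"
assert step_ifz(3, "zero", ("x", lambda p: ("succ", p))) == ("succ", 2)
assert step_case(("r", 7), ("x", lambda v: v), ("y", lambda v: v + 1)) == 8
```

Both rules are pure branch selection plus substitution, which is why expression evaluation needs no access to the thread pool or priorities.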

Figure 8: Dynamic semantics for expressions.
Figure 9: Typing rules for thread pools
Figure 10: Congruence rules for thread pools.

Define a thread pool  to be a mapping of thread symbols to threads:  indicates a thread  at priority  running . The concatenation of two thread pools is written . Thread pools can also introduce new thread names: the thread pool  allows the thread pool  to use thread names bound in the signature . Thread pools are not ordered; we identify thread pools up to commutativity and associativity of concatenation.² We also introduce the additional congruence rules of Figure 10, which allow thread name bindings to freely change scope within a thread pool.

²Because threads cannot refer to threads that (transitively) spawned them, we could order the thread pool, which would allow us to prove that deadlock is not possible. This is outside the scope of this paper.

Figure 9 gives the typing rules for thread pools. The typing judgment  indicates that all threads of  are well-typed assuming an ambient environment that includes the threads mentioned in , and that  includes the threads introduced in , minus any bound in a  form. The rules are straightforward: the empty thread pool  is always well-typed and introduces no threads, individual threads are well-typed if their commands are, and concatenations are well-typed if their components are. In a concatenation , if  introduces the threads  and  introduces the threads , then  may refer to threads in  and vice versa. If a thread pool  is well-typed and introduces the threads in , then  introduces the threads in  (subtracting off the threads explicitly introduced by the binding).
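To make the thread-pool structure concrete, the following minimal Python model (a hypothetical encoding of our own, not the paper's implementation) represents a pool as a finite map from thread names to (priority, command) pairs. Concatenation is disjoint union, which is commutative and associative exactly as the identification of pools up to ordering requires.

```python
# A thread pool as a finite map: thread name -> (priority, command).
# Commands are opaque here; priorities are plain strings.
def concat(mu1, mu2):
    """Disjoint union of two thread pools (the binary concatenation)."""
    assert set(mu1).isdisjoint(mu2), "thread names must be distinct"
    return {**mu1, **mu2}

a = {"t0": ("high", "cmd0")}
b = {"t1": ("low", "cmd1")}
c = {"t2": ("low", "cmd2")}

# Pools are identified up to ordering: dict equality ignores insertion
# order, so commutativity and associativity hold definitionally here.
assert concat(a, b) == concat(b, a)
assert concat(concat(a, b), c) == concat(a, concat(b, c))
```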

Figure 11: Dynamic rules for commands.
Figure 12: Dynamic rules for thread pools.

The transition judgment for commands is , indicating that under signature , command  steps to . The transition relation carries a label , indicating the “action” taken by this step. At this point, actions can be the silent action  or the sync action , indicating that the transition receives a value  by synchronizing on thread . This step may also spawn new threads, and so the judgment includes extensions to the thread pool () and the signature (). Both extensions may be empty.

The rules for the transition judgment are shown in Figure 11. The rules for the bind construct  evaluate  to an encapsulated command , then evaluate this command to a return value  before substituting  for  in . The spawn command  does not evaluate , but simply spawns a fresh thread  to execute it, and returns a thread handle . The sync command  evaluates  to a thread handle , and then takes a step to  labeled with the action . Note that, because the thread  is not available to the rule, the return value  is “guessed”. It will be the job of the thread pool semantics to connect this action to the thread  and provide the appropriate return value. Finally, the ret command evaluates its argument to a value.
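The shape of the command transition judgment can be modeled as follows. This is an illustrative sketch with a toy tuple encoding of commands (only spawn, sync, and ret; the encoding and names such as `step_cmd` are our own): a step yields the new command, an action label, and the spawned-thread extension, with the sync value supplied externally to mirror the "guessing" described above.

```python
import itertools

_fresh = (f"t{i}" for i in itertools.count())   # fresh thread names

def step_cmd(cmd, returned=None):
    """One command step: returns (new_cmd, action, spawned_threads).
    `action` models the step label; `spawned_threads` models the
    thread-pool extension. `returned` stands in for the 'guessed' sync
    value, which the thread-pool semantics must supply."""
    kind = cmd[0]
    if kind == "spawn":          # spawn a fresh thread, return its handle
        _, prio, body = cmd
        name = next(_fresh)
        return ("ret", ("tid", name)), "silent", {name: (prio, body)}
    if kind == "sync":           # sync on a handle: value is guessed
        _, (_, name) = cmd
        return ("ret", returned), ("sync", name, returned), {}
    raise ValueError("no step (value, or handled elsewhere)")

# Spawning steps silently to a return of the fresh handle...
new_cmd, act, spawned = step_cmd(("spawn", "high", ("ret", 42)))
assert act == "silent" and new_cmd[0] == "ret"
(tname,) = spawned
assert spawned[tname] == ("high", ("ret", 42))

# ...and syncing on that handle emits a sync action carrying the value.
new_cmd, act, _ = step_cmd(("sync", ("tid", tname)), returned=42)
assert act == ("sync", tname, 42) and new_cmd == ("ret", 42)
```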

We define an additional transition judgment for thread pools, which nondeterministically allows a thread to step. The judgment  is again annotated with an action. In this judgment, because any thread in the pool may be the one taking the step, the action is also labeled with the thread . Actions now also include the “return” action , indicating that the thread returns the value . Rule DT-Sync matches this with a corresponding sync action and performs the synchronization. If a thread in  wishes to sync with  and a thread  in  wishes to return its value, then the thread pool  can step silently, performing the synchronization. Without loss of generality,  can come first because thread pools are identified up to ordering. The last two rules allow threads to step when concatenated with other threads and under bindings.
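The matching performed by DT-Sync can likewise be sketched (same toy tuple encoding, with names of our own invention): a silent pool step fires when some thread syncs on a thread whose command offers a return action.

```python
def step_pool_sync(pool):
    """Perform one DT-Sync step if possible: find a thread syncing on a
    thread whose command has returned, and deliver the value silently.
    Returns the new pool, or None if no synchronization can fire."""
    for name, (prio, cmd) in pool.items():
        if cmd[0] != "sync":
            continue
        target = cmd[1][1]                   # name inside the tid handle
        _, tcmd = pool[target]
        if tcmd[0] == "ret":                 # target offers a return action
            new = dict(pool)
            new[name] = (prio, ("ret", tcmd[1]))   # receive the value
            return new
    return None

pool = {"a": ("high", ("ret", 5)),               # thread a has returned 5
        "b": ("low", ("sync", ("tid", "a")))}    # thread b syncs on a
stepped = step_pool_sync(pool)
assert stepped["b"] == ("low", ("ret", 5))
assert stepped["a"] == ("high", ("ret", 5))      # a itself is unchanged
```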

We will show as part of the type safety theorem that any thread pool may be, through the congruence rules, placed in a normal form  and that stepping one of these threads does not affect the rest of the thread pool other than by spawning new threads. This property, that transitions of separate threads do not impact each other, is key to parallel functional programs and allows us to cleanly talk about taking multiple steps of separate threads in parallel. This is expressed by the judgment , which allows all of the threads in the set  to step silently in parallel. The only rule for this judgment is DT-Par, which steps any number of threads in a nondeterministic fashion. We do not impose any sort of scheduling algorithm in the semantics, nor even a maximum number of threads. When discussing cost bounds, we will quantify over executions which choose threads in certain ways.
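The independence underlying DT-Par can be illustrated with a toy pool stepper (again a hypothetical encoding of our own): silently stepping a set of threads is order-independent because threads share no state, so stepping them "in parallel" agrees with stepping them one at a time in any order.

```python
def step_pool_par(pool, step_one, names):
    """DT-Par sketch: silently step every thread named in `names`.
    `step_one` maps a command to its successor command."""
    new = dict(pool)
    for n in names:
        prio, cmd = new[n]
        new[n] = (prio, step_one(cmd))
    return new

# Toy commands: ("count", k) steps to ("count", k + 1).
bump = lambda cmd: ("count", cmd[1] + 1)
pool = {"a": ("high", ("count", 0)), "b": ("low", ("count", 3))}

# Stepping {a, b} together agrees with stepping them in either order:
both = step_pool_par(pool, bump, {"a", "b"})
assert both == step_pool_par(step_pool_par(pool, bump, {"a"}), bump, {"b"})
assert both == step_pool_par(step_pool_par(pool, bump, {"b"}), bump, {"a"})
```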

We prove a version of the standard progress theorem for each syntactic class. Progress for expressions is standard: a well-typed expression is either a value or can take a step. The progress statement for commands is similar, because commands can step (with a sync action) even if they are waiting for other threads. The statement for thread pools is somewhat counter-intuitive. One might expect it to state that if a thread pool is well-typed, then either all threads are complete or the thread pool can take a step. This statement is true but too weak to be useful; because of the non-determinism in our semantics, such a theorem would allow for one thread to enter a “stuck” state as long as any other thread is still able to make progress (for example, if it is in an infinite loop). Instead, we state that, in a well-typed thread pool, every thread is either complete or is active, that is, able to take a step.

Figure 13: Static semantics for actions.

The progress theorems for commands and thread pools also state that, if the command or thread pool can take a step, the action performed by that step is well-typed. The typing rules for actions are shown in Figure 13 and require that the value returned or received match the type of the thread.

Theorem 1 (Progress).
  1. If , then either  or .

  2. If , then either  where or where .

  3. If and , then and for all , we have and .

Proof.
  1. By induction on the derivation of . Consider three representative cases.

    • natE. Then . By inversion, . By canonical forms, and either or .

    • E. Then . By inversion, and . By canonical forms, and .

    • E. Then . By inversion, , where . By canonical forms, and  steps by the transition rules.

  2. By induction on the derivation of .

    • Bind. Then . By inversion, and . By induction, either or . In the second case,  steps by rule Bind1. In the first case, by canonical forms, and, by inversion on the expression typing rules, . By induction, either where or takes a step. In both cases, takes a step (Bind3 or Bind2).

    • Spawn. Apply rule Spawn.

    • Sync. Then . By inversion, . By induction, either or . In the second case,  steps by rule Sync1. In the first case, by canonical forms, . Apply rule Sync2.

    • Ret. Then . By inversion, . By induction, either or . In the second case,  steps by rule Ret. In the first case, the conclusions are trivially satisfied.

  3. By induction on the derivation of . We consider the interesting cases.

    • Concat. By inversion, and . By induction, and , where

      and

      We also have that for all , and for all , . We have , so the conclusion holds by weakening and DT-Concat-One.

    • Extend. Then and . By inversion, . By induction, , where

      We also have that for all , . By the congruence rules, and the conclusion holds by weakening and DT-Extend.

The preservation theorem is also split into components for expressions, commands and thread pools. The theorem for commands requires that any new threads spawned () meet the extension of the signature ().

Theorem 2 (Preservation).
  1. If and , then .

  2. If and and then and .

  3. If and then

  4. If and then .

Proof.
  1. By induction on the derivation of .

  2. By induction on the derivation of .

    • Bind1. By inversion on the typing rules,