A memory model specifies which values a thread may read from a given memory location during execution. Traditional concurrency research has assumed sequential consistency, wherein memory actions operate atomically on a global state, and a read is guaranteed to observe the value most recently written to that location globally. Consequently, “the result of any execution is the same as if the operations of all the processors were executed in some sequential order” [Lamport1979]. However, sequential consistency limits performance, so modern architectures often provide much weaker guarantees. Because classical concurrency algorithms often assume sequential consistency, they can behave in unexpected ways under these weaker guarantees. Consider, for example, the Dekker algorithm on a system using the SPARC instruction set. The Dekker algorithm seeks to ensure that at most one process enters a critical section at a time. Executing the following instance of the Dekker algorithm on a sequentially consistent system, from an initial state where memory locations , , , are all zero, ensures that we end in a state where and are not both set to one:
However, the SPARC ISA provides the weaker SPARC TSO (total store ordering) memory model. Under SPARC TSO, it is possible to start from the aforementioned initial state and end in a state where both and are set to one, thus violating mutual exclusion.
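The displayed Dekker instance did not survive in this version of the text. As a hedged illustration only, the sketch below enumerates every interleaving of a hypothetical store-buffering core of the algorithm (each thread sets its own flag, then reads the other thread's flag), once with writes applied immediately (sequential consistency) and once with per-thread FIFO write buffers that may drain at any point (TSO). The encoding in `THREADS`, the function name `outcomes`, and the zero default for unwritten locations are assumptions of the sketch, not the paper's program.

```python
# Hypothetical store-buffering core of the Dekker algorithm (not the paper's
# exact instance): thread 0 runs x := 1; r0 := y, thread 1 runs y := 1; r1 := x.
THREADS = (
    (("W", "x", 1), ("R", "y", "r0")),
    (("W", "y", 1), ("R", "x", "r1")),
)


def outcomes(tso):
    """Return every reachable (r0, r1) pair, exploring all interleavings."""
    results = set()

    def explore(pcs, bufs, mem, regs):
        finished = all(pc == len(t) for pc, t in zip(pcs, THREADS))
        if finished and not any(bufs):
            results.add((regs["r0"], regs["r1"]))
            return
        for i, thread in enumerate(THREADS):
            if pcs[i] < len(thread):
                kind, loc, arg = thread[pcs[i]]
                pcs2 = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if kind == "W" and tso:
                    # TSO: enqueue the write in thread i's own buffer.
                    bufs2 = bufs[:i] + (bufs[i] + ((loc, arg),),) + bufs[i + 1:]
                    explore(pcs2, bufs2, mem, regs)
                elif kind == "W":
                    # SC: the write reaches shared memory immediately.
                    explore(pcs2, bufs, {**mem, loc: arg}, regs)
                else:
                    # Read the newest buffered write to loc, else shared memory.
                    pending = [v for buf_loc, v in bufs[i] if buf_loc == loc]
                    value = pending[-1] if pending else mem.get(loc, 0)
                    explore(pcs2, bufs, mem, {**regs, arg: value})
            if bufs[i]:
                # Drain the oldest buffered write of thread i to shared memory.
                (loc, v), rest = bufs[i][0], bufs[i][1:]
                bufs2 = bufs[:i] + (rest,) + bufs[i + 1:]
                explore(pcs, bufs2, {**mem, loc: v}, regs)

    explore((0, 0), ((), ()), {}, {})
    return results


print(sorted(outcomes(tso=False)))  # [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(tso=True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The outcome `(0, 0)`, in which both threads read 0 and could therefore both enter the critical section, is reachable only with buffering, matching the violation of mutual exclusion described above.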
Weak memory models are often described in standards documents using natural language. This informality makes it difficult to reason about how programs will behave on systems that use these memory models. The SPARC Architecture Manual [SPARC8] gives an axiomatic description of TSO using partial orders of actions. We present this description in Section 2; other axiomatic approaches are discussed in Section LABEL:sec:related-work. Despite their formality, axiomatic accounts are hard to use for reasoning about the behaviour of programs, because they are non-compositional and so preclude modular reasoning. We address this problem by presenting a denotational semantics for SPARC TSO in Section 3. Our denotational semantics assigns to each program a collection of pomsets. Pomsets generalize traces; they were first used by Pratt [Pratt1986] to give denotational semantics for concurrency, and later by Brookes [Brookes2016MFPS], with some modifications, to study weak memory models. We illustrate our semantics by validating various litmus tests and expected program equivalences. To ensure that our denotational semantics accurately captures the behaviour of SPARC TSO, we show in Section LABEL:sec:sands that from every denotation of a program we can derive a collection of partial orders satisfying the axiomatic description of Section 2 and, moreover, that every such partial order can be derived from the denotation.
2. An Axiomatic Account
The SPARC manual [SPARC8] gives an axiomatic description of TSO in terms of partial orders of actions. Unfortunately, this description is incomplete because it fails to specify the fork and join behaviour of TSO. In this section, we complete the SPARC manual’s account of TSO with an account of forking and joining. Before doing so, we give an informal description of TSO to help build intuition.
A system providing the TSO weak memory model can be thought of as a collection of processors, each with its own write buffer. Whenever a processor performs a write, it places the write in its write buffer. Each buffer behaves as a queue: writes migrate out of it one at a time, oldest first, and shared memory applies them according to a global total order. Whenever a processor tries to read a location, it first checks its own buffer for a write to that location. If it finds one, it uses the value of the most recent such write; otherwise, it reads the value from shared memory. Because of buffering, writes and reads can be observed out of order relative to the program order: in particular, a write followed by a read of a different location may reach shared memory only after the read has completed.
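A minimal sketch of this informal model, assuming one FIFO buffer per processor and treating unwritten locations as holding 0; the class and method names are hypothetical. Reads forward from the newest buffered write to the same location, and `flush_one` moves the oldest buffered write of a processor to shared memory.

```python
from collections import deque


class TSOMachine:
    """Shared memory plus one FIFO write buffer per processor (hypothetical names)."""

    def __init__(self, n_procs):
        self.memory = {}                                  # location -> value
        self.buffers = [deque() for _ in range(n_procs)]  # oldest write on the left

    def write(self, proc, loc, value):
        # A write only enters the processor's own buffer.
        self.buffers[proc].append((loc, value))

    def read(self, proc, loc):
        # Most recent buffered write to loc, if any; otherwise shared memory.
        for buf_loc, value in reversed(self.buffers[proc]):
            if buf_loc == loc:
                return value
        return self.memory.get(loc, 0)  # assumption: unwritten locations hold 0

    def flush_one(self, proc):
        # The oldest buffered write migrates to shared memory.
        loc, value = self.buffers[proc].popleft()
        self.memory[loc] = value


m = TSOMachine(2)
m.write(0, "x", 1)
m.write(0, "x", 2)
print(m.read(0, "x"), m.read(1, "x"))  # 2 0
m.flush_one(0)
print(m.read(1, "x"))                  # 1
```

Processor 0 sees its own latest write immediately, while processor 1 sees processor 0's writes only as they leave the buffer, in the order they were made.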
2.1. Program Order Pomsets
To make the above intuition precise, we must formalize the notion of a program order, i.e., the ordering of read and write actions specified by a program. We do so by means of partially-ordered multisets.
A (strict) partially-ordered multiset or pomset over a label set consists of a strict poset of “action occurrences” and a function mapping each action occurrence to its label or “action”.
We frequently write just for , in which case we let and denote its obvious components. Denote by the set of pomsets over . Pomsets are a natural generalization of traces: a trace is simply a pomset whose underlying order is total.
We do not usually make the poset explicit, because the structure of the pomset is invariant under relabellings of the elements of . Consequently, we identify pomsets and if there exists an order isomorphism such that . We usually denote the elements of the pomset using just their labels, but we sometimes need to specify their exact occurrence, in which case we write , where .
It is useful to draw a pomset as a labelled directed acyclic graph, where multiple vertices can have the same label and we have an edge if . For clarity, we always omit edges obtained by transitivity of . For example, the following graph depicts the pomset where , the order is given by , , , and , and is given by , , , and :
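To make the drawing convention concrete, here is a small sketch of a finite pomset as a list of labels plus a strict order on occurrence indices. The class and function names are hypothetical, and labels are encoded as tuples such as `("W", "x", 1)`, also an assumption. `hasse_edges` returns exactly the edges one would draw once edges implied by transitivity are omitted. The sketch handles only finite pomsets and does not quotient by isomorphism.

```python
from itertools import product


def transitive_closure(n, edges):
    """Warshall's algorithm: smallest transitive relation on range(n) containing edges."""
    closure = set(edges)
    for k, i, j in product(range(n), repeat=3):  # k varies slowest, as Warshall requires
        if (i, k) in closure and (k, j) in closure:
            closure.add((i, j))
    return closure


class Pomset:
    """A finite pomset: labels[i] is the action of occurrence i, and `order`
    is the strict partial order on occurrence indices generated by `edges`."""

    def __init__(self, labels, edges):
        self.labels = list(labels)
        self.order = transitive_closure(len(self.labels), edges)
        if any(i == j for i, j in self.order):
            raise ValueError("edges contain a cycle, so the order is not strict")

    def hasse_edges(self):
        """Edges to draw: pairs i < j with no occurrence strictly between them."""
        occ = range(len(self.labels))
        return {(i, j) for i, j in self.order
                if not any((i, k) in self.order and (k, j) in self.order for k in occ)}

    def is_trace(self):
        """A trace is a pomset whose order is total."""
        occ = range(len(self.labels))
        return all(i == j or (i, j) in self.order or (j, i) in self.order
                   for i in occ for j in occ)


chain = Pomset([("W", "x", 1), ("W", "y", 1), ("R", "x", 1)], [(0, 1), (1, 2)])
print(sorted(chain.hasse_edges()), chain.is_trace())  # [(0, 1), (1, 2)] True
```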
We assume a countably infinite set of locations , ranged over by metavariables , and a set of values , ranged over by . In our examples, we will take to be the set of integers. We call a global write action, a read action, and a skip action. Let and be the sets of global write actions and read actions, respectively.
A program order is a pomset over the set of action labels that satisfies the (locally) finite height property, that is, such that for all , the set is finite.
Informally, the finite height property guarantees that all actions in can be executed after finitely many other actions, i.e., that the program order contains no unreachable actions.
Intuitively, the program order
describes the parallel execution of writing 2 to before reading 1 from , and writing 1 to before reading 1 from , with no other ordering constraints.
2.2. TSO Axioms
We now turn our attention to our completed version of the axiomatic account given in the SPARC manual. To do so, we first introduce the notion of state and the requisite notation.
A global state is a finite partial function from locations to values . We let be the set of global states, and use to range over .
Given any set and partial order on it, every element determines a set called its lower closure. Write to denote that and are not comparable under the reflexive closure of , and write to denote that they are comparable.
Let be a program order and be a strict partial order on the elements of . We say is TSO-consistent with from (the initial state) if it satisfies the following six axioms:
Ordering: totally orders the write actions of .
Value: for all reads in , either
there exists a write maximal under amongst all writes to in , all writes to in are in , and ; or
there exists a write maximal under amongst all writes to in , and both and ; or
there are no writes to in or , and .
LoadOp: for all reads and all actions , implies .
StoreStore: for all writes , implies .
Fork: if , , and , then and .
Join: if , , and , then and .
The fork axiom can be read as follows: if in , then or in ; the join axiom is symmetric. We say simply that is TSO-consistent with if there exists some initial state from which they are TSO-consistent. It will be useful to identify and the pomset . ∎
Axioms (O), (Va), (Vb), (L), and (S) are directly adapted from the formal specification given in Appendix K.2 of [SPARC8]. We introduce axiom (Vc) to simplify our presentation of examples. It can be omitted by requiring programs to write to every location before reading from it, and, apart from examples, we assume throughout that our TSO-consistent orders do not rely on (Vc). Though the formal specification does not provide axioms (F) and (J), they are consistent with the behaviour intended by Appendix J.6 of [SPARC8]. Intuitively, axiom (Vb) requires that whenever a processor reads from a location, it must use the most recent write to that location in its write buffer, if one exists. If there is no such write in its write buffer, but a global write to that location has been observed, then (Va) requires that the read use the most recent such write. Our presentation differs slightly from the formal specification. In particular, we do not consider instruction fetches or atomic load-store operations, and we do not consider flush actions, because a flush can be derived in our semantics by forking and immediately joining. To be consistent with our pomset development, we also assume the order to be strict.
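The adapted axioms can be checked mechanically on small examples. The sketch below is a hypothetical, simplified checker for total memory orders over straight-line threads, so (F) and (J) play no role. It implements (O), (L), and (S). For the value condition it uses the SPARC manual's single Value rule rather than the split into (Va), (Vb), and (Vc): a load must return the value of the memory-order-latest store to its location among those that precede it in memory order or in program order. The `Event` encoding and the zero default for unwritten locations (standing in for (Vc)) are assumptions.

```python
from typing import NamedTuple


class Event(NamedTuple):
    thread: int   # straight-line thread identifier
    index: int    # position in that thread's program order
    kind: str     # "W" (store) or "R" (load)
    loc: str
    value: int


def po(a, b):
    """Program order, assuming straight-line threads (no fork or join)."""
    return a.thread == b.thread and a.index < b.index


def tso_consistent(memory_order, init=None):
    """Check a total memory order (a list of events) against SPARC-style axioms."""
    init = init or {}
    pos = {e: i for i, e in enumerate(memory_order)}
    # (O) holds by construction: a list totally orders all events, writes included.
    for a in memory_order:
        for b in memory_order:
            if not po(a, b) or pos[a] < pos[b]:
                continue                         # only program-order inversions matter
            if a.kind == "R":                    # (L): loads stay before later ops
                return False
            if a.kind == "W" and b.kind == "W":  # (S): stores stay in program order
                return False
    for r in memory_order:
        if r.kind != "R":
            continue
        # Value: latest store (in memory order) to r.loc preceding r in memory
        # order or in program order; otherwise fall back to the initial state.
        visible = [w for w in memory_order
                   if w.kind == "W" and w.loc == r.loc
                   and (pos[w] < pos[r] or po(w, r))]
        expected = max(visible, key=pos.get).value if visible else init.get(r.loc, 0)
        if r.value != expected:
            return False
    return True


wx, ry = Event(0, 0, "W", "x", 1), Event(0, 1, "R", "y", 0)
wy, rx = Event(1, 0, "W", "y", 1), Event(1, 1, "R", "x", 0)
print(tso_consistent([ry, rx, wx, wy]))  # True: both loads overtake buffered stores
print(tso_consistent([wx, ry, wy, rx]))  # False: rx must see wx
```

The first order places both loads before both stores, so it is not a linearisation of the program order, yet it passes the check; this anticipates the distinction between linearisations and TSO-consistent total orders drawn below.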
As the following proposition’s corollary shows, if is TSO-consistent for , then there exists a (not necessarily unique) total order on that is TSO-consistent with and contains . As a result, we can view all orders that are TSO-consistent with as weakenings of total orders that are TSO-consistent with . The proposition follows by a straightforward check of the axioms, where (V) is the only axiom that is not immediate.
Let be TSO-consistent for and be two actions such that b. Let be the transitive closure of . If there exist maximum writes under and relative to , call them and , respectively. If either and exist and , or does not exist, then is TSO-consistent for . ∎
If is a finite program order pomset and a partial order TSO-consistent with , then there exists a total order TSO-consistent with such that . ∎
Despite Corollary 2.2, one should be careful not to conflate the notion of linearisation with that of TSO-consistent total orders. Consider, for example, the program order
The linearisation is not TSO-consistent with the program order because it violates (Va); the order is not a linearisation of the program order but is TSO-consistent with it.
When a write precedes a read in the program order but follows it in the linear order, as in this example, we can imagine that the write is still sitting in the write buffer when the read is performed.
3. A Denotational Account
So far we have dealt with program orders in the abstract. To make the rest of our development more concrete, we restrict our attention to program orders for well-defined programs in the simple imperative language given below. These program orders are defined in Section 3.2.
Restricting our attention to program orders of these well-defined programs raises the question of compositionality. The key is to find a way to derive TSO-consistent orders for a sequential composition or parallel composition given TSO-consistent orders for and . This is infeasible with the axiomatic approach, which requires reasoning about whole programs and is inherently non-compositional. In contrast, a denotational approach using pomsets is compositional: it allows us to derive the meaning of a program vis-à-vis a weak memory model from the meanings of its parts vis-à-vis the memory model.
Our denotational semantics has two components. The first associates to each program a set of TSO pomsets, which serves as the abstract meaning or denotation of the program. This component is described in Section 3.3. The second associates to each pomset a set of executions, which describe its input-output behaviours. This is described in Section LABEL:sec:footprints.
3.1. A Simple Imperative Language
We express our programs using a simple imperative language. This formalism avoids the complexity of high-level languages, while still capturing the programs we are interested in. In the syntax below, ranges over integer expressions, over boolean expressions, over commands, and over programs. We distinguish between commands and programs, because although commands can be composed to form new commands, programs are assumed to be syntactically complete and executable. This distinction will be useful later when we consider executions, where we will assume programs are executed from initial states with empty buffers, but impose no such constraint on executions for commands.
Let denote the set of integer expressions, the set of boolean expressions, and the set of commands.
3.2. PO Pomsets
Given a command in our language, we must now compile it down to its set of program order pomsets. We need operations for the sequential and parallel composition of pomsets over the same set of labels. When defining compositions of pomsets, we assume without loss of generality that the underlying posets are disjoint.
The sequential composition is whenever is infinite, and otherwise it is . The parallel composition of pomsets is . The empty pomset is the unit for sequential and parallel composition. Given a pomset on a set of labels and a subset , the restriction of to is the pomset on whose ordering is induced by . The deletion of from is . We lift these operations to sets of pomsets in the obvious manner, e.g., . ∎
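For finite pomsets, these operations can be sketched directly. The encoding is hypothetical: occurrences are the indices `0..n-1`, `order` is a transitively closed set of pairs, and the second operand is shifted past the first so that the carriers are disjoint, as assumed in the text. Infinite pomsets, and hence the infinite case of sequential composition, are out of scope, and equality is on concrete encodings rather than up to isomorphism.

```python
from typing import NamedTuple


class Pomset(NamedTuple):
    labels: tuple      # labels[i] is the action of occurrence i
    order: frozenset   # transitively closed strict order: (i, j) means i < j


EMPTY = Pomset((), frozenset())  # unit for both compositions


def _shifted(p, k):
    """Order of p with every occurrence renamed i -> i + k (disjoint carriers)."""
    return {(i + k, j + k) for i, j in p.order}


def seq(p, q):
    """Sequential composition of finite pomsets: all of p precedes all of q."""
    n, m = len(p.labels), len(q.labels)
    order = set(p.order) | _shifted(q, n)
    order |= {(i, n + j) for i in range(n) for j in range(m)}
    return Pomset(p.labels + q.labels, frozenset(order))


def par(p, q):
    """Parallel composition: disjoint union with no new ordering."""
    n = len(p.labels)
    return Pomset(p.labels + q.labels, frozenset(set(p.order) | _shifted(q, n)))


def restrict(p, keep):
    """Keep the occurrences whose label satisfies `keep`, with the induced order."""
    idx = [i for i, a in enumerate(p.labels) if keep(a)]
    new = {old: k for k, old in enumerate(idx)}
    order = {(new[i], new[j]) for i, j in p.order if i in new and j in new}
    return Pomset(tuple(p.labels[i] for i in idx), frozenset(order))


def delete(p, drop):
    """Deletion: restrict to the occurrences whose label is not dropped."""
    return restrict(p, lambda a: not drop(a))


def lift(op, ps, qs):
    """Lift a binary operation pointwise to sets of pomsets."""
    return {op(p, q) for p in ps for q in qs}
```

For example, deleting the skip occurrence from `seq(seq(a, skip), b)` gives back `seq(a, b)`, in line with the identification of program orders up to deleted skip actions discussed next.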
Because the skip action has no effects, we identify program orders and whenever there exists a non-empty pomset that can be obtained in two ways: by deleting a finite number of actions from and also by deleting a finite number of actions from . This means, e.g., that we identify , , and whenever , but .
We begin with the program order denotation of expressions. To each expression , we assign a set of tuples of program orders and corresponding values:
where ranges over binary operations. Read expressions are associated with arbitrary values in for reasons of compositionality: we do not know with which writes the read may eventually be composed, and so we need to permit reading arbitrary values. We chose to evaluate binary operations in parallel; one could just as legitimately have chosen to sequentialise the evaluation and written . We assume to be the result of applying the binary operation to and . We handle program orders for unary expressions analogously, and assume is the result of negating the boolean value . To simplify the clauses involving conditionals, we give helper functions and to extract the pomsets corresponding to the given boolean values from .
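The expression clauses can be prototyped over a finite value domain. In this sketch, which is an illustration rather than the paper's definition, expressions are hypothetical tuples such as `("bin", "+", ("read", "x"), ("lit", 1))`, `VALUES` is a finite stand-in for the set of values, and literals are assumed to contribute a single skip action (for finite pomsets this choice is immaterial, given the skip identification above). Because operands are evaluated in parallel and expressions perform no writes, every pomset here is an unordered multiset of actions, encoded as a sorted tuple of labels.

```python
from itertools import product

VALUES = range(3)  # hypothetical finite stand-in for the integers
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def denote(expr):
    """Program-order denotation of an integer expression: a set of (pomset, value)."""
    tag = expr[0]
    if tag == "lit":
        # Assumption: a literal contributes a single skip action.
        return {((("skip",),), expr[1])}
    if tag == "read":
        # A read may return any value: the writes it will eventually be
        # composed with are not yet known.
        loc = expr[1]
        return {((("R", loc, v),), v) for v in VALUES}
    if tag == "bin":
        # Parallel evaluation: the operands' actions are left unordered.
        _, op, e1, e2 = expr
        return {(tuple(sorted(p1 + p2)), OPS[op](v1, v2))
                for (p1, v1), (p2, v2) in product(denote(e1), denote(e2))}
    raise ValueError(f"unknown expression: {expr!r}")


print(len(denote(("bin", "+", ("read", "x"), ("read", "y")))))  # 9
```

Choosing the sequentialised evaluation instead would order the actions of the first operand before those of the second, so the multiset encoding would no longer suffice.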
Note that in the case of boolean binary operations, the might be integer or boolean expressions, and the corresponding semantic clause for should be used.
We give the program order denotation of commands in a similar manner, this time associating sets of program orders to each command phrase:
where and .
The only interesting clause is for . Here, we take the union of all of the finite unrollings of the loop. We must also consider the case of an infinite loop. This is captured by , which describes the infinite pomset obtained by unrolling the loop countably infinitely many times. The clause also illustrates why we associate the pomset instead of to values: otherwise, we would have , and this would break our intuition that this program should be denotationally equivalent to . It would also have no executions under the formal account of Section LABEL:sec:executions.
To illustrate the above semantic clauses, we return to the Dekker program from the introduction. This program has pomsets of each of the following forms, for each choice of and :
The first program order describes an execution where we read both and and where Dekker fails. The next three forms of program order describe executions in which one or both reads obtain a non-zero value.
3.3. TSO Pomsets
In this subsection, we assign a set of TSO pomsets to each program , serving as the abstract meaning or denotation of the program under the TSO memory model. To do so, we will need to carefully model write buffers. For compactness, we will write instead of in this section’s semantic clauses.
We introduce a set of buffer locations, assumed to be in bijection with , and let the set of buffer write actions be . An action by a thread denotes adding a write to the thread’s write buffer. The set of TSO actions then consists of extended with . A TSO pomset will then be a pomset in satisfying the finite height property.
To capture the effects of buffers, we parametrize our semantic clauses with lists of global write actions, which represent the writes currently in our buffer. We let be the set of all lists. The intuition is that write buffers behave as queues under TSO, and we can use a list to model a queue by dequeuing from the head of the list and enqueuing at the end of the list. For expository convenience, we identify lists and linear pomsets, where we say a pomset is linear if its underlying poset is linear. Explicitly, we identify with the empty pomset , and with the pomset . To minimize notation, we leverage this identification and write to denote the concatenation of and .
The semantic clauses are given in two strata. In the first stratum, the semantic clauses for “basic TSO pomsets” capture the meaning of the syntactic phrases in a manner very similar to the program order definitions in Section 3.2. These clauses assign to each command phrase a function from buffer lists to a set of pairs of TSO pomsets and buffer lists. We present these clauses using the abbreviation . The pomset component of captures the meaning of the phrase in the presence of the buffer , while the buffer component describes the state of the buffer after performing the actions associated with the phrase. In the second stratum, we use clauses to capture the meaning of the phrase in the presence of buffer flushing. Flushing a write from a buffer consists of dequeuing a global write from and inserting it in the pomset. The resulting set is again a subset of .
To generate TSO pomsets, we modify the semantic clauses generating program orders in four key places to get our basic pomsets. The first is for write commands . Starting from a buffer , we get the pomset and associated value for from the denotation instead of . The buffer may have changed to a buffer while we were evaluating , and also gives us this . Instead of immediately making a global write to as we would have in the program order clause, we enqueue the global write on the buffer :
We must also change the semantic clauses for read expressions. By axiom (Vb), whenever we read from a location , we must use the value of the most recent write to it in the write buffer, if there is one. We use the following helper function to convert a buffer to a partial function giving us the value of the most recent write in to a given location:
Then, the semantic clause giving us the basic pomsets for reads is
The first part tells us to use the value associated with in the buffer , if available. The second part uses arbitrary values if the value is unavailable, as with program orders.
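The helper and the read clause can be sketched as follows, assuming buffers are tuples of `(location, value)` pairs with the oldest write first and that `VALUES` is a finite stand-in for the value set; both encodings and the function names are hypothetical. A Python dict plays the role of the partial function, and each result is a triple of pomset, value, and (unchanged) buffer.

```python
VALUES = range(3)  # hypothetical finite stand-in for the value set


def latest(buffer):
    """Partial function (as a dict): location -> value of its newest buffered write.

    The buffer is a tuple of (location, value) pairs, oldest first.
    """
    table = {}
    for loc, value in buffer:  # later writes overwrite earlier ones
        table[loc] = value
    return table


def basic_read(loc, buffer):
    """Basic TSO pomsets for reading loc: triples (pomset, value, buffer)."""
    table = latest(buffer)
    # Axiom (Vb): a buffered write to loc must be used if there is one;
    # otherwise any value may be read, as for program orders.
    values = [table[loc]] if loc in table else list(VALUES)
    # A read leaves the buffer unchanged.
    return [((("R", loc, v),), v, buffer) for v in values]


buf = (("x", 1), ("y", 5), ("x", 2))
print(basic_read("x", buf))  # [((('R', 'x', 2),), 2, (('x', 1), ('y', 5), ('x', 2)))]
```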
The third major change involves parallel composition. We explain parallel composition of expressions; parallel composition of commands is analogous. By axioms (F) and (J), we must flush our buffers before every fork and join. We therefore begin by flushing our entire buffer, i.e., by taking and placing it at the beginning of our pomset. Having flushed the buffer, we then evaluate the with empty buffers and get back pomsets and . Because we can only join threads if their buffers are empty, we require that these and be associated with empty buffers in . We then proceed as for the program order, and add the parallel composition of the to our pomset, and compute the value . Because we just joined two empty buffers, our resulting buffer is empty:
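The fork-and-join discipline for a binary expression can be sketched in the same style. Assumptions, all hypothetical: `VALUES` is a finite value domain; buffers are tuples of `(location, value)` pairs, oldest first; and pomsets are small terms in which a tuple is a chain of actions, while `("seq", p, q)` and `("par", p, q)` build composite pomsets. The clause flushes the whole buffer up front, runs both operands from empty buffers, keeps only results whose buffers are empty, and returns an empty buffer.

```python
VALUES = (0, 1)  # hypothetical finite stand-in for the value set


def read(loc):
    """Basic TSO read: forward from the buffer if possible, else any value."""
    def denotation(buffer):
        own = [v for where, v in buffer if where == loc]
        values = own[-1:] or list(VALUES)
        return [((("R", loc, v),), v, buffer) for v in values]
    return denotation


def par_expr(d1, d2, combine):
    """Basic TSO clause for a binary operation evaluated in parallel."""
    def denotation(buffer):
        # Fork: flush the entire buffer, as global writes, at the front.
        flushed = tuple(("W", loc, v) for loc, v in buffer)
        results = []
        for p1, v1, b1 in d1(()):          # both operands start with empty buffers
            for p2, v2, b2 in d2(()):
                if b1 == () and b2 == ():  # join: only empty buffers may be joined
                    pomset = ("seq", flushed, ("par", p1, p2))
                    results.append((pomset, combine(v1, v2), ()))
        return results
    return denotation


add_xy = par_expr(read("x"), read("y"), lambda a, b: a + b)
print(len(add_xy((("x", 1),))))  # 4
```

After the flush, the read of `x` is no longer forwarded from the buffer and may return any value; it is left to the execution semantics to constrain which of these values can actually be observed.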
Finally, when we sequentially compose two commands and (assuming no forking or joining), continues executing from the buffer finished with. and are both functions of type and are not composable qua functions. Consequently, we need to define a composition operation capturing the above operational intuition. This composition is the polymorphic function , where and :
Taking , sequential composition can be expressed using as
Explicitly, this means . This idiom of chaining pairs of pomsets and buffers together using will be useful throughout. We make polymorphic so that we can handle, e.g., the case of and below.
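Under the simplifying assumption that every pomset involved is a chain, so that sequential composition of pomsets is tuple concatenation, the chaining operator can be sketched as a higher-order function. The paper's operator is polymorphic; this version handles only `(pomset, buffer)` pairs. The name `then` and the stand-in write clause `basic_write`, which takes a literal value instead of evaluating an expression, are hypothetical.

```python
def then(f, g):
    """Chain two buffer-threading denotations: g starts from the buffer f ends with.

    Sketch assumption: pomsets are chains encoded as tuples, so sequential
    composition of pomsets is tuple concatenation.
    """
    def h(buffer):
        return [(p + q, b2) for p, b1 in f(buffer) for q, b2 in g(b1)]
    return h


def basic_write(loc, value):
    """Hypothetical basic clause for loc := value: emit a buffer write, enqueue."""
    def denotation(buffer):
        return [((("B", loc, value),), buffer + ((loc, value),))]
    return denotation


prog = then(basic_write("x", 1), basic_write("y", 2))
print(prog(()))  # [((('B', 'x', 1), ('B', 'y', 2)), (('x', 1), ('y', 2)))]
```

The second write sees the buffer left by the first, which is exactly the threading the operator is meant to capture.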
The remainder of the basic clauses are analogous to those for program order pomsets, subject to the modifications described above:
where and . The set contains all infinite pomsets obtained through countably infinitely many unfoldings; because we can never observe the buffer at the end, we treat it as empty to simplify presentation.
We now turn our attention to flushing. The intent is that a thread can flush arbitrarily many of its writes at any point in its execution. Thus, the pomsets associated with flushes for a buffer are the prefixes of , and the resulting buffers are the remainders of . We use to denote these prefix-suffix pairs:
We introduce a variant of to cope with triples of pomsets, values, and buffers, and will rely on types to disambiguate the version needed in any given situation:
We define the TSO pomsets in terms of and . composes and so that we can first flush some writes from the buffer, then evaluate or perform , and finally flush some writes at the end:
We can validate various expected equivalences by unfolding these definitions. For example, sequential composition of commands is associative, because
Using the identity and the fact that parallel composition of pomsets is associative, one can show that parallel composition of commands is associative, i.e., that . The parallel composition of pomsets commutes, so the parallel composition of commands commutes, i.e., .
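Flushing and the TSO clause can be sketched under simplifying, hypothetical encodings: pomsets are chains encoded as tuples (so sequential composition is concatenation), buffers are tuples of `(location, value)` pairs with the oldest first, flushed writes become global write labels `("W", loc, v)`, and `basic_write` is a hypothetical stand-in for a basic clause. `flushes` returns every prefix-suffix split of the buffer, and `tso` wraps a basic clause with a flush before and after.

```python
def flushes(buffer):
    """Every way to flush a prefix of the buffer: (global writes, rest of buffer)."""
    return [(tuple(("W", loc, v) for loc, v in buffer[:k]), buffer[k:])
            for k in range(len(buffer) + 1)]


def then(f, g):
    """Run f, then g from the buffer f leaves; pomsets are chains (tuples)."""
    def h(buffer):
        return [(p + q, b2) for p, b1 in f(buffer) for q, b2 in g(b1)]
    return h


def tso(basic):
    """TSO denotation: flush some writes, run the basic clause, flush again."""
    return then(then(flushes, basic), flushes)


def basic_write(loc, value):
    """Hypothetical basic clause for loc := value: emit a buffer write, enqueue."""
    def denotation(buffer):
        return [((("B", loc, value),), buffer + ((loc, value),))]
    return denotation


print(len(tso(basic_write("x", 1))((("y", 0),))))  # 5
```

Starting from a buffer that holds one write to `y`, a single write to `x` yields five TSO pomsets, depending on how much of the buffer is flushed before and after the write.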
To illustrate flushing and the effect of buffers on reads, we consider the expression in the presence of the buffer . The triples are of the form