A memory model specifies which values can be read by memory accesses. C11-style memory models allow the programmer to specify whether a given memory location should be acted on atomically or non-atomically. Atomic memory locations are intended to be used for inter-thread communication and synchronization. Every atomic memory action carries a programmer-chosen memory ordering tag. The action's memory ordering specifies the visibility of actions sequenced before or after it to other actions that synchronize with it. Intuitively, the memory ordering stipulates how the memory action can be reordered with other actions in the same thread. Following Lahav et al., we differ from C11 and do not treat unsequenced races between atomic accesses to the same location as undefined behaviour. In contrast, multiple concurrent accesses to a non-atomic location, at least one of which is a write, constitute a race and are regarded as undefined behaviour, because reading from a non-atomic location could retrieve a value from an intermediate state. Memory actions on non-atomic locations can be compiled to ordinary memory accesses, which are cheaper to perform than their atomic counterparts.
We provide a denotational framework for exploring C11-style memory models. We did not set out to capture exactly any particular account of the C11 memory model, for two reasons. First, because the literature presents many different accounts of the C11 memory model (e.g., [2, 10, 13]), each addressing various shortcomings, we believe it is better to develop a generic framework in which we can study various formulations. Second, C11 has some features, such as consume accesses, that we deem premature or that introduce excessive complexity with little gain.
Our framework has two major components. In Section 3, we give a denotational (and hence compositional) account of a C11-style memory model. Each program is given a set of pomsets (a generalization of traces) as its denotation, using composition operators designed to capture exactly the per-thread memory reorderings permitted by the memory model. In Section 4, we give these pomsets an executional interpretation, inductively defined on the structure of the pomset, using a local view of state. This interpretation is carefully constructed to respect the synchronization constraints imposed by the memory model, and it is race-detecting.
2 An Informal Account
Each type of memory location has an associated set of actions. Non-atomic locations can be read from and written to. Atomic locations can additionally be acted on with atomic read-modify-write actions. There are memory operations that involve no locations. For example, a fence is a special atomic memory action that acts as a barrier against reordering. All atomic operations have an associated memory ordering tag chosen by the programmer.
The strongest ordering on atomic actions is sequential consistency, denoted by the tag sc. An sc action cannot be reordered with any other action, and every execution induces a total order on the sc actions. Though sc actions are expensive to implement, they allow the programmer to reason via an interleaving semantics. All atomic memory actions can use this ordering.
The release-acquire memory ordering paradigm gives lightweight synchronization between threads. No memory action sequenced before a release (rel) write can be reordered to after the write. Symmetrically, no action sequenced after an acquire (acq) read can be reordered to before the read. The intended semantics is that any action after an acquire read that “synchronizes with” a release write to the same atomic location sees the effects of all actions that occurred before the write. Fences and atomic read-modify-write actions, such as locking primitives, can use the acquire-release (acqrel) ordering. These actions behave both as an acquire read and a release write. A key difference from sc is that executions need not induce a total order on these actions.
The weakest memory ordering we consider is relaxed (rlx). It imposes no additional constraints on reordering and guarantees only atomicity.
It is helpful to visualize the relative strength of these memory orderings using the following diagram, due to Lahav et al.:
A key desideratum is that executing a single sequential thread under our memory model should produce the same result as execution without any reorderings. This implies that at no point may we reorder memory actions to the same location within a given thread. We also want coherence, i.e., the property that writes to the same location appear in the same order to all threads. Our model should further respect data dependencies: whenever we write the value of an expression to a location, any reads required to evaluate that expression must be ordered before the write.
We illustrate these principles and the interplay between the release and acquire orderings using a simple message-passing example. Consider executing the program
from an initial state where all locations are initialized to 0. The fact that the write is a release and the reads in the loop are acquires guarantees that after the while loop terminates, the final read will see the value 42. We make these dependencies explicit by means of diagrams, where an arrow from one memory action to another indicates that the former is sequenced before the latter:
Had all of the actions been tagged with the rlx ordering, the compiler could have reordered the two writes because they do not depend on each other, and then the final read could have returned 0 instead of 42, giving us
We pause to remark that the reads in the loop are still ordered before the final read: this is because there is a control-flow dependency between the read in the loop test and the write immediately following. Preserving control-flow dependencies will be important for eliminating various thin-air behaviours.
3 Denotational Semantics
We make our informal account precise using a denotational semantics. Its chief advantage is compositionality: the meaning (or denotation) of a program in the memory model is determined by the denotations of its subphrases. This allows for modular reasoning and the validation of various program-level optimizations. The denotations of programs will be order structures on syntactic objects called “memory actions”. These order structures, called partially-ordered multisets (pomsets), generalize traces and are an abstract description of the program’s memory accesses.
We specify a simple imperative language with while loops, fences, local variables, and atomic read-modify-write actions. These features were chosen to illustrate the principles underlying our approach, but the details of the language are not important. We assume meta-variables for disjoint sets of identifiers (atomic assignable identifiers) and (non-atomic assignable identifiers) and we let range over the set of all identifiers. Finally, we let (integer values) and range over partial functions of type .
The abstract syntax of our language is given by the following grammar:
The meta-variables are (integer expressions), (boolean expressions), (commands), and (programs). We abuse notation and use subscripts such as to mean all memory orderings such that . Though the phrase is used both as an expression and a command, context will make its syntactic class unambiguous. In examples, we let the associated memory order tag determine whether an identifier corresponds to an atomic or non-atomic assignable.
Our semantic clauses will transform our language’s syntactic phrases into sets of memory action pomsets. A (memory) action is a syntactic object representing an action on the store. Memory actions are given by the following:
Let be the set of all actions and let and be the (partial) projections from to memory orderings and , respectively. Given a predicate on , let be the set of actions satisfying . For example, is the set of all actions such that . Special cases are the set of all memory actions with memory ordering , the set of all reading actions ( and ), the set of all writing actions ( and ), the set of fence actions, and the set of all actions involving the identifier .
A pomset over a set of labels Σ is a triple (P, <, Φ) where < is a strict partial order on P satisfying the finite-height property and Φ : P → Σ is a labelling function. A partial order satisfies the finite-height property if for all p ∈ P, the set {q ∈ P | q < p} is finite. Because our pomsets describe orderings between a program's memory actions, the finite-height property implies that we have no unreachable actions in our pomset, i.e., that every action could in principle be executed. As is typical with mathematical structures, we refer to a pomset by its underlying set, and given a pomset P, we write its order and labelling as the obvious components. We typically refer to pomset elements by their labels, relying on context to disambiguate which underlying element we mean. In fact, the underlying elements carry no meaning, and we identify pomsets whenever there exists an order-isomorphism f respecting labels, i.e., satisfying Φ₂ ∘ f = Φ₁. We can identify pomsets and labelled directed acyclic graphs, as we did in Section 1, where we have an edge from p to q if and only if p < q. We say a pomset is linear if < is a total order.
We further identify non-empty pomsets whenever there exists a non-empty pomset such that each can be obtained from it by deleting finitely many actions. (Formally, the deletion of an element from a pomset discards it from the underlying set and restricts the order and labelling accordingly; deleting finitely many actions means deleting a finite subset in this way.) This is akin to closure under stuttering and mumbling in trace semantics (cf. Brookes), and our semantics is well-defined relative to it.
Our semantic clauses assign sets of pomsets to syntactic phrases, and compositionality requires us to compose the pomsets of subphrases to form the denotation of a phrase. The sequential composition P₁; P₂ of pomsets P₁ and P₂ is, when P₁ is finite, the pomset on the disjoint union of their elements where p < q if and only if p, q ∈ P₁ and p <₁ q, or p, q ∈ P₂ and p <₂ q, or p ∈ P₁ and q ∈ P₂, and where the labelling is inherited from the components. Intuitively, this orders everything in P₁ before everything in P₂ while preserving their internal orderings. When P₁ is infinite, P₁; P₂ = P₁. The finiteness check on P₁ ensures P₁; P₂ satisfies the finite-height property. The parallel composition P₁ ∥ P₂ of pomsets P₁ and P₂ is given by the disjoint union where p < q if and only if p, q ∈ P₁ and p <₁ q, or p, q ∈ P₂ and p <₂ q. It is straightforward to check that these compositions are associative with the empty pomset as their unit. They lift to sets of pomsets in the obvious manner.
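To make the compositions concrete, here is a minimal Python sketch. The representation (element ids, an order given as a set of pairs, and a label map) is our own assumption; the paper leaves the underlying carrier abstract, and we assume the two pomsets have disjoint element ids (harmless, since pomsets are identified up to isomorphism). The infinite-first-component case is omitted.

```python
# A pomset is modelled as a dict: 'elems' is a set of opaque ids,
# 'order' is a strict partial order given as a set of (before, after)
# pairs, and 'label' maps ids to memory actions.

def seq(p1, p2):
    """Sequential composition: everything in p1 before everything in p2,
    preserving each component's internal order. Assumes disjoint ids."""
    order = set(p1["order"]) | set(p2["order"])
    order |= {(a, b) for a in p1["elems"] for b in p2["elems"]}
    return {"elems": p1["elems"] | p2["elems"],
            "order": order,
            "label": {**p1["label"], **p2["label"]}}

def par(p1, p2):
    """Parallel composition: no cross-thread ordering is added."""
    return {"elems": p1["elems"] | p2["elems"],
            "order": set(p1["order"]) | set(p2["order"]),
            "label": {**p1["label"], **p2["label"]}}
```

Actions below are modelled as tuples `(kind, location, ordering, value)`, e.g. `("W", "x", "rlx", 1)`; this encoding is ours, not the paper's.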
The denotation of an integer expression is a subset of inductively defined on the syntax of the expression:
The clause for reading a location has a pomset for each possible value that could be read. We must allow for all possible values to obtain compositionality: we do not know a priori with which writes an expression may be composed, and hence do not know what values might be read. The read-modify-write clause is analogous and captures the atomic nature of the read-modify-write by treating it as a single memory action, rather than a sequenced read-write pair. We indicate that the subexpressions of a compound expression are computed in parallel by combining their memory actions with a parallel composition.
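The "one pomset per readable value" idea can be sketched as follows; the function name and the explicit value set are our assumptions (the paper quantifies over all integer values), using the dict-based pomset encoding introduced above as a hypothetical representation.

```python
# Hypothetical sketch: the denotation of reading a location contains
# one single-action pomset per value that might be read, since
# compositionally we cannot know which writes we will be composed with.
def read_denotation(loc, ordering, values):
    return [{"elems": {0},
             "order": set(),
             "label": {0: ("R", loc, ordering, v)}}
            for v in values]
```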
The denotation of a boolean expression is a subset of and is defined analogously. To simplify the clauses with conditionals, we introduce the helper definitions and the analogous .
The denotation of a program is a subset of inductively defined on its syntax: . The denotation of a command is a subset of , also inductively defined on its syntax. The basic commands are given by:
The only interesting clause here is the one for assignment, where data dependency requires that the corresponding write be sequenced after all actions performed in computing the assigned expression.
Before we can give semantic clauses for compound commands, we must introduce the relaxed sequential composition. The relaxed composition of two pomsets orders actions from the first before those of the second only when required by the memory model. To make this precise, we introduce the following predicates. holds if and only if and . holds if and only if and . Actions and are memory-ordered, , if and only if , , or . The relaxed sequential composition of pomsets is when is finite, where if and only if and , or , , and , and is the transitive closure of . When is infinite, . Relaxed sequential composition is also associative with as its unit.
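The relaxed composition can be sketched in Python. The text's exact memory-ordered predicate is not reproduced in our source, so the predicate below is a plausible RC11-style approximation and an explicit assumption: two actions stay ordered when they touch the same location, when the first is an acquire-or-stronger read or fence, when the second is a release-or-stronger write or fence, or when either is sc.

```python
from itertools import product

def memory_ordered(a, b):
    """Assumed approximation of the paper's memory-ordered predicate."""
    kind_a, loc_a, ord_a, _ = a
    kind_b, loc_b, ord_b, _ = b
    if loc_a is not None and loc_a == loc_b:   # same location: keep order
        return True
    if "sc" in (ord_a, ord_b):                 # sc never reorders
        return True
    if kind_a in ("R", "F") and ord_a in ("acq", "acqrel"):
        return True                            # nothing moves before an acquire
    if kind_b in ("W", "F") and ord_b in ("rel", "acqrel"):
        return True                            # nothing moves after a release
    return False

def transitive_closure(order):
    order = set(order)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(order), list(order)):
            if b == c and (a, d) not in order:
                order.add((a, d))
                changed = True
    return order

def relaxed_seq(p1, p2):
    """Order p1's actions before p2's only where the model requires it."""
    cross = {(a, b) for a in p1["elems"] for b in p2["elems"]
             if memory_ordered(p1["label"][a], p2["label"][b])}
    return {"elems": p1["elems"] | p2["elems"],
            "order": transitive_closure(
                set(p1["order"]) | set(p2["order"]) | cross),
            "label": {**p1["label"], **p2["label"]}}
```

Under this predicate, two relaxed writes to different locations are left unordered, while a release write stays ordered after everything sequenced before it, matching the informal account of Section 2.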
The sequencing, looping, and conditional clauses are given by:
where the while loop is taken to be the evident infinite unfolding. There are a few subtleties in these clauses. In the clause for conditionals, we use sequential compositions because there is a control-flow dependency between the memory actions for the boolean test and those for the branches. Respecting this dependency is important for eliminating “thin-air” behaviours. Indeed, suppose we had used the relaxed composition instead, and consider the program . Then it would have a pomset of the form in its denotation, and none of the memory actions would be ordered, because the boolean tests and the commands in the conditional branches involve different locations. One could then perform the write actions before the read actions and execute the whole program, even from a state where the locations are initialized to 0. By instead using sequential composition, the reads are sequenced before the branches' writes, and the program is not executable from this state. In contrast, in the clause for sequencing, we should be permitted to reorder memory accesses when the memory model allows it, and so we use the relaxed sequential composition.
The last clause is for local assignables, which can be thought of as registers. Given a command, the intention is that the local assignable should be initialized to 0 and be visible only to that command. Consequently, any other commands should not be able to observe the command's effects on the local assignable, even if its name appears free in them. We must, however, be able to observe that the command did an action whenever it acts on the local assignable: a command that loops forever writing to it should be non-terminating. To satisfy these desiderata we take all of the pomsets of the command whose uses of the local assignable are internally consistent and then replace all actions on it with no-op actions. We formally accomplish this by introducing additional operations on pomsets. To ensure internal consistency, we need to restrict our attention to the actions on the local assignable. The restriction of a pomset to a subset of labels is the pomset obtained by discarding all elements whose label is not in that subset. To make sure the remaining actions are internally consistent, we check that they are sequentially executable: a predicate holds of them if and only if each read is justified by the most recent write (or by the initial value). This syntactic check is equivalent to the sequential executions of Section 4. Finally, to replace the actions on the local assignable by no-op actions, we need a substitution operation, which relabels each matching element while leaving the order unchanged. Combining these ingredients, we get the clause
This definition satisfies various desirable equivalences, such as
where ≈ denotes program equivalence, defined to hold if and only if the two programs have the same denotation.
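The ingredients of the hiding construction can be sketched as follows. The encoding of actions as tuples, the initial value 0, and the choice of a fence-like no-op label are all our assumptions, not the paper's definitions.

```python
def restrict(p, keep):
    """Keep only elements whose label satisfies the predicate `keep`."""
    elems = {e for e in p["elems"] if keep(p["label"][e])}
    return {"elems": elems,
            "order": {(a, b) for (a, b) in p["order"]
                      if a in elems and b in elems},
            "label": {e: p["label"][e] for e in elems}}

def sequentially_executable(actions, init=0):
    """Check a linearly ordered list of actions on one local name:
    every read must see the most recently written value (initially 0)."""
    val = init
    for kind, _loc, _ord, v in actions:
        if kind == "W":
            val = v
        elif kind == "R" and v != val:
            return False
    return True

def substitute(p, old_loc, noop=("F", None, "rlx", None)):
    """Relabel every action on `old_loc` with a no-op, keeping the order."""
    label = {e: (noop if a[1] == old_loc else a)
             for e, a in p["label"].items()}
    return {**p, "label": label}
```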
To illustrate our semantic clauses, we observe that the command
includes pomsets of the following form, for each , in its denotation:
In contrast, the command
has executable pomsets of the form
4 Executional Interpretation
We give a race-detecting input-output interpretation to the abstract denotations of Section 3. This interpretation serves three main purposes. First, it gives us a notion of “running” the executions a pomset describes, and it tells us the initial states from which we can do so, along with the corresponding effects on state. Second, it gives us a means of detecting which syntactic races are meaningful, and which cannot occur. For example, the program (1) (page 1) has a syntactic race on the non-atomic location , but this race can never occur during an execution starting from a zero-initialized state because of the synchronization via the atomic location . Finally, it allows us to rule out various pomsets assigned to commands that are not executable alone, but that are included for the sake of compositionality and that are executable in a larger environment. Consider, for example, the pomset belonging to the program . It is not executable alone, but it would be if we were to compose it with .
We use two kinds of state: proper and overdefined. Proper states are finite partial functions from identifiers to values, in particular, elements of . We include a least element in the codomain to denote an unconstrained value. Its purpose will be made clear when we define footprints of actions below. We use the notation to mean the proper state whose graph is . Given proper states and , let if and only if for all , . The symbol is the overdefined state, which is the result of a race. Let be the set of all states, ranged over by .
We proceed in two stages. We first assign an executional meaning to individual memory actions. Then, we assign an executional meaning to action pomsets. Because we are in a weak memory setting in which a single location can be acted on concurrently, the concept of a “global state” is not well-defined. Indeed, hardware features such as write buffers could cause different threads to read different values from the same location at the same time. Instead, we use a local notion of state called a footstep. A footstep of an action is a pair of states, the first a minimal piece of state enabling the action to be performed, and the second describing the effect of performing the action from the first. The footprint of an action is the set of all of its footsteps. We define the footprints of memory actions as follows:
Informally, it should be clear that none of the above actions causes any allocation: whenever a pair of states is a footstep of an action, the domain of the second is contained in the domain of the first. We use the value ⊥ in the codomain of proper states to indicate that, though a write action requires that the location appear in the initial state, it is ambivalent to its value. The footprint of an action is also agnostic of the action's memory ordering tag.
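A plausible reading of the footprint table can be sketched as follows; the exact table is elided in our source, so the pre/post states chosen here (a read requires the value it reads and has no effect; a write requires only that the location exist, with unconstrained value ⊥, and sets it; a fence touches no location) are a hedged reconstruction from the surrounding prose.

```python
BOT = "⊥"  # the unconstrained value in the codomain of proper states

def footprint(action):
    """Footsteps (pre-state, effect) of a single action, as a list."""
    kind, loc, _ord, val = action
    if kind == "R":                       # needs loc = val, no effect
        return [({loc: val}, {})]
    if kind == "W":                       # needs loc allocated, writes val
        return [({loc: BOT}, {loc: val})]
    if kind == "F":                       # a fence involves no location
        return [({}, {})]
    raise ValueError(f"unknown action kind {kind!r}")
```

Note that in every case the effect's domain is contained in the pre-state's domain, matching the no-allocation observation above.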
We can give pomsets an analogous notion of footprint. We will do so by recursing on the structure of the pomset, considering three principal cases: when the pomset is a single action, when the pomset can be decomposed into a pair of parallel pomsets, and when the pomset has an executable prefix.
We first specify structural conditions for when two pomsets can be run concurrently and whether doing so constitutes a race. Because we want a total order on all sc actions, we cannot run two pomsets that both contain sc actions concurrently. So we say that concurrently executing two pomsets respects sc actions if and only if only one of them performs sc actions. We say that pomsets have a data race (one could vary the class of actions considered to instead capture races between other kinds of accesses) on a location if both act on it with at least one of them writing to it, and that they have a data race if they have one on some location.
Pomsets and are consistent, , if 1. , 2. , and 3. . Consistency means that there is no syntactic constraint preventing us from considering the concurrent execution of the two pomsets. The third condition means we do not have any write-write races between them, and is required to totally order writes on a per-location basis. In contrast, pomsets and could race, , if 1. , 2. , and 3. . The third condition means that we do not have any atomic write-write races. The intention is that whenever two pomsets could race, we should be able to regain consistency by deleting all of the data races.
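These structural conditions can be sketched over sets of actions. One of the consistency conditions is elided in our source, so the `consistent` predicate below (sc-respect plus absence of write-write races) is an acknowledged approximation; action tuples follow our assumed `(kind, location, ordering, value)` encoding.

```python
def respect_sc(acts1, acts2):
    """Only one side may perform sc actions (to keep a total sc order)."""
    return not (any(a[2] == "sc" for a in acts1)
                and any(a[2] == "sc" for a in acts2))

def data_race_on(acts1, acts2, loc):
    """Both sides touch loc and at least one of them writes to it."""
    t1 = [a for a in acts1 if a[1] == loc]
    t2 = [a for a in acts2 if a[1] == loc]
    return bool(t1) and bool(t2) and any(a[0] == "W" for a in t1 + t2)

def write_write_race(acts1, acts2):
    """Both sides write to some common location."""
    locs1 = {a[1] for a in acts1 if a[0] == "W"}
    locs2 = {a[1] for a in acts2 if a[0] == "W"}
    return bool(locs1 & locs2)

def consistent(acts1, acts2):
    """Approximation: sc-respect and no write-write races (one of the
    paper's conditions is elided in our source)."""
    return respect_sc(acts1, acts2) and not write_write_race(acts1, acts2)
```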
Next, we need a notion of splitting a pomset into a prefix and a suffix that can be executed sequentially. We say that a subset of a pomset is downward-closed if whenever and , then . We write to mean that is a finite downward-closed subset of and that is the remainder of . In this case, we call a prefix of and a suffix of . Observe that if and is finite, then ; finiteness is needed to guarantee fairness.
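Downward closure and splitting are straightforward over the dict-based pomset encoding we have been assuming:

```python
def is_downward_closed(p, subset):
    """S is downward-closed when anything below a member of S is in S."""
    return all(a in subset for (a, b) in p["order"] if b in subset)

def split(p, prefix):
    """Return (prefix, suffix) pomsets when `prefix` is a downward-closed
    subset of p's elements; otherwise None."""
    if not is_downward_closed(p, prefix):
        return None
    def sub(elems):
        return {"elems": elems,
                "order": {(a, b) for (a, b) in p["order"]
                          if a in elems and b in elems},
                "label": {e: p["label"][e] for e in elems}}
    return sub(set(prefix)), sub(p["elems"] - set(prefix))
```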
When executing two threads in parallel, we need only consider footsteps starting from consistent states. We say two proper states are consistent, , if exists. This means that for all , if and , then . The overdefined state is consistent with no state. Given a set and a state , we let be when , and otherwise. Given proper states and , updating by gives us a new state . Explicitly, is whenever , and whenever . If or is , then is defined to be . To combine the initial states of two pomsets data racing on , we define the racy product , explained below.
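The state operations can be sketched as follows; states are modelled as Python dicts, `BOT` stands for the unconstrained value ⊥, and `TOP` is our placeholder for the overdefined state. The treatment of ⊥ in consistency is a hedged assumption (⊥ is taken to agree with anything), and the racy product is omitted since its definition is elided in our source.

```python
BOT = "⊥"   # unconstrained value
TOP = "↯"   # placeholder for the overdefined (racy) state

def states_consistent(s1, s2):
    """Proper states are consistent when they agree wherever their
    domains overlap; we assume BOT agrees with any value."""
    if s1 is TOP or s2 is TOP:
        return False        # the overdefined state is consistent with nothing
    return all(s1[x] == s2[x] or BOT in (s1[x], s2[x])
               for x in s1.keys() & s2.keys())

def update(s1, s2):
    """s1 updated by s2: s2's bindings override s1's; TOP is absorbing."""
    if s1 is TOP or s2 is TOP:
        return TOP
    return {**s1, **s2}
```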
We let the footprint of a pomset be inductively defined as the least set given by the following rules, which are explained below:
If , then for all .
If , , , and ,
If , , , , and ,
If , , , , and ,
If and , then .
If , , , and ,
The set of executions of a pomset is , and the set of executions of a program is . Executions capture running programs on “real” states, so we require initial states to have a specific value for each location. We say is executable from if for some , and is racy if for some .
We explain each of the rules in turn. The (Seq) rule captures sequential execution of a prefix before a suffix. The consistency condition tells us that, if we start from a state satisfying the prefix's requirements and update it with the effects of performing the prefix, the resulting state does not disable the execution of the suffix. The remaining state component contributes the initial state required by the suffix that is not provided by the prefix.
The (Par) rule tells us that whenever and cannot race and agree on their initial states, then we can run them in parallel with resulting effect . The effect is a well-defined proper state because the assumption guarantees that and do not write to the same location, i.e., that .
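The way (Seq) and (Par) combine footsteps can be sketched as follows. The side conditions in the paper are richer than shown here (⊥-values and the race rules are omitted), so this is a simplified reconstruction under those stated assumptions, with footsteps as `(pre_state, effect)` pairs of dicts.

```python
def join(s1, s2):
    """Least upper bound of proper states, or None when inconsistent."""
    out = dict(s1)
    for x, v in s2.items():
        if x in out and out[x] != v:
            return None
        out[x] = v
    return out

def seq_footstep(f1, f2):
    """(Seq): run the prefix's footstep f1, then the suffix's f2."""
    (s1, e1), (s2, e2) = f1, f2
    mid = {**s1, **e1}                      # state after running the prefix
    if any(x in mid and mid[x] != v for x, v in s2.items()):
        return None                         # prefix's effects disable the suffix
    extra = {x: v for x, v in s2.items() if x not in mid}
    return ({**s1, **extra}, {**e1, **e2})  # suffix needs not met by prefix

def par_footstep(f1, f2):
    """(Par): consistent initial states, disjoint write effects."""
    (s1, e1), (s2, e2) = f1, f2
    init = join(s1, s2)
    if init is None or set(e1) & set(e2):
        return None                         # inconsistent starts or ww overlap
    return (init, {**e1, **e2})
```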
The (RaceP) rule handles races in a pomset's prefix. The rule tells us that if a state is sufficient to reach a race in a prefix, then it is sufficient to reach a race in the whole pomset. This captures the viewpoint that if we ever encounter a race, we do not need to execute the rest of the program. The (RaceS) rule deals with a race in a suffix of a pomset, and is analogous to the (Seq) rule.
The (Race) rule resembles the (Par) rule. The key difference is how we form the initial state. Before considering motivating examples, we first unpack the definition of the racy product , assuming . If , then , and vice-versa. Now consider . If , i.e., if has a race on the non-atomic location , then , i.e., if any of the is . As we will see in the example below, this captures a race where the action writing to does not depend on a prior read from in order to be executable. If , then . For example, is
To begin with, consider the pomset , and assume first that . We argue that this pomset should be executable and racy for all values of and : it has a non-atomic write of to that is not sequenced with the non-atomic read of from , and so the values should not matter. Even when , it could be that is the value read from an intermediate hardware state caused by the writing of . Our semantics validates this desideratum: we can apply (Race) to the unsequenced actions to get the footstep , and then using (RaceS) we get that . Now assume instead that ; then the pomset is not racy: it has no actions on which it can race. Moreover, it has a non-empty footprint only if , and .
Next consider the pomset , and assume first that . We argue that this pomset should be executable only when : when , there is no write supplying the value required by the read, and so the data race on should not be able to manifest itself. However, when , this pomset should be racy, because we have an unsequenced non-atomic write and read on . Our semantics captures these behaviours. For example, and , and the states and are consistent only if . When this is the case, we get as the sole footprint for the parallel composition. But we can apply the (RaceS) rule only when , which holds only when . Decomposing the pomset differently provides the same constraint. In contrast, when , the pomset is not racy and is executable only when and .
Finally, consider the pomset . We get the footsteps and for all and . The first footstep corresponds to a sequential execution with all the reads before the writes. The second footstep corresponds to reading from the initial state, and then the program racing on . When , we also get the footstep , which describes an execution where all reads are performed after all writes.
We show that our definition of execution validates various other litmus tests in the way we expect. The sb (store buffering) test is the simplest example of behaviour that is not sequentially consistent. When , we can execute the pomset starting from the state by using the (Par) rule.
We can also validate the iriw (independent reads of independent writes) test. Indeed, we can execute the pomset
starting from the initial state by splitting the pomset down the middle and applying the (Par) rule. This shows that we are weaker than the TSO memory model, because we do not impose a total order on all writes. Though we can read writes to different locations in different orders, the restriction on the (Par) rule ensures that we have a per-location total order on writes. Then all threads see the writes to the same location in the same order, i.e., we guarantee coherence.
5 Related Work
Most work on formalizing weak memory models so far uses “execution graphs” or “candidate executions”, in which nodes are labeled with actions and there are multiple kinds of edge, usually characterized as po (program order), rf (reads-from), mo (modification order), and so on. There is a substantial body of well-established research in this vein, including [1, 3, 5, 11, 12]. In work aiming to formalize C/C++11 such as , axioms are imposed to rule out candidate executions involving undesirable cycles of composite edges, primarily to avoid issues with so-called thin-air reads, and to ensure that each read action is justified by a suitable write. It has proven to be difficult to strike the right balance between ruling out bad cases while still allowing intended behaviours.
Our memory model is heavily based on the RC11 (“Repaired C11”) model presented by Lahav et al. The RC11 model repairs compilation to Power by providing a better semantics for SC accesses. Like RC11, our model differs from C11 by not treating races between atomic accesses as undefined behaviour. Our approach to eliminating “thin-air” behaviour is similar to theirs: we prohibit violations of data and control-flow dependency, which amounts to prohibiting cycles in the happens-before (“hb”) order between reads and their corresponding writes.
In prior work, we give a semantics to the SPARC TSO weak memory model using pomset denotations and executions, and buffered state. The semantics developed in this paper is a significant step toward greater generality and wider applicability. Rather than attempt to model out-of-order execution by explicitly modelling buffers, both in pomset generation and in execution, we use the relaxed composition operator and leverage pomset structure. To enforce a total order on writes during execution, TSO executions were parametrized by linearizations of writes. In contrast, C11 executions totally order actions via the relation, thereby reducing technical overhead.
Jeffrey and Riely and Castellan give denotational semantics using event structures and exploit game-theoretic ideas in formulating a notion of execution. We prefer to work with a set of pomsets, obtained by taking account of possible relaxations and executions, rather than using a single structure combining these possibilities into one object; it seems less cumbersome to work with sets of pomsets.
Our extension to incorporate the wider range of C11-style memory orderings required significant technical development in order to produce a denotational account that is faithful to operational intuitions, and improves on the foundations laid by our prior denotational account of TSO. A key new idea presented here is the relaxed sequential composition for pomsets, a form of sequential composition that takes account of the memory model’s support for reordering of memory actions. We are careful to avoid reordering of actions for which there is a control-flow dependency, arguing that this eliminates a major class of thin-air behaviours. We discussed a selection of “litmus test” examples to show that our semantics and our notion of pomset execution yield results consistent with the literature. We plan a more comprehensive cataloguing of litmus tests to solidify this claim. Indeed, we expect to prove a pomset analogue of the DRF-SC property, which provides programmers a sufficient condition for ensuring their programs are not executionally racy.
References

[1] Batty, M., M. Dodds and A. Gotsman, Library abstraction for C/C++ concurrency, in: Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '13 (2013), pp.

[2] Batty, M., A. F. Donaldson and J. Wickerson, Overhauling SC atomics in C11 and OpenCL, in: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '16 (2016), pp. 634–648.

[3] Batty, M., K. Memarian, S. Owens, S. Sarkar and P. Sewell, Clarifying and compiling C/C++ concurrency: From C++11 to POWER, in: Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '12 (2012), pp. 509–520.

[4] Batty, M., S. Owens, S. Sarkar, P. Sewell and T. Weber, Mathematizing C++ concurrency, in: Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '11 (2011), pp. 55–66.

[5] Boehm, H.-J. and S. V. Adve, Foundations of the C++ concurrency memory model, in: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '08 (2008), pp. 68–78.

[6] Brookes, S., Full abstraction for a shared-variable parallel language, Information and Computation 127 (1996), pp. 145–163.

[7] Castellan, S., Weak memory models using event structures, in: Vingt-septième Journées Francophones des Langages Applicatifs (JFLA 2016), 2016.

[8] Jeffrey, A. and J. Riely, On thin air reads towards an event structures model of relaxed memory, in: Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, LICS '16 (2016), pp. 759–767.

[9] Kavanagh, R. and S. Brookes, A denotational semantics for SPARC TSO, in: Proceedings of the 33rd Conference on the Mathematical Foundations of Programming Semantics (MFPS XXXIII), Electronic Notes in Theoretical Computer Science, 2017, to appear.

[10] Lahav, O., V. Vafeiadis, J. Kang, C.-K. Hur and D. Dreyer, Repairing sequential consistency in C/C++11, in: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017 (2017), pp. 618–632.

[11] Sarkar, S., P. Sewell, F. Z. Nardelli, S. Owens, T. Ridge, T. Braibant, M. O. Myreen and J. Alglave, The semantics of x86-cc multiprocessor machine code, in: Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '09 (2009), pp. 379–391.

[12] Sewell, P., S. Sarkar, S. Owens, F. Z. Nardelli and M. O. Myreen, x86-TSO: A rigorous and usable programmer's model for x86 multiprocessors, Commun. ACM 53 (2010), pp. 89–97.

[13] Vafeiadis, V., T. Balabonski, S. Chakraborty, R. Morisset and F. Zappa Nardelli, Common compiler optimisations are invalid in the C11 memory model and what we can do about it, in: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '15 (2015), pp. 209–220.