Safe Non-blocking Synchronization in Ada 202x

03/27/2018
by Johann Blieberger, et al.

The mutual-exclusion property of locks stands in the way of scalability of parallel programs on many-core architectures. Locks do not allow progress guarantees, because a task may fail inside a critical section and keep holding a lock that blocks other tasks from accessing shared data. With non-blocking synchronization, the drawbacks of locks are avoided by synchronizing access to shared data via atomic read-modify-write operations. To incorporate non-blocking synchronization in Ada 202x, programmers must be able to reason about the behavior and performance of tasks in the absence of protected objects and rendezvous. We therefore extend Ada's memory model by synchronized types, which support the expression of memory-ordering operations at a sufficient level of detail. To mitigate the complexity associated with non-blocking synchronization, we propose concurrent objects as a novel high-level language construct. Entities of a concurrent object execute in parallel, due to a fine-grained, optimistic synchronization mechanism. Synchronization is framed by the semantics of concurrent entry execution. The programmer is only required to label shared data accesses in the code of concurrent entries. Labels constitute memory-ordering operations expressed through attributes. To the best of our knowledge, this is the first approach to provide a non-blocking synchronization construct as a first-class citizen of a high-level programming language. We illustrate the use of concurrent objects by several examples.


1 Introduction

Mutual exclusion locks are the most common technique to synchronize multiple tasks that access shared data. Ada’s protected objects (POs) implement the monitor-lock concept [13]. Method-level locking requires a task to acquire an exclusive lock to execute a PO’s entry or procedure. (Protected functions allow concurrent read-access in the style of a readers–writers lock [12].) Entries and procedures of a PO thus effectively execute one after another, which makes it straightforward for programmers to reason about updates to the shared data encapsulated by a PO. Informally, sequential consistency ensures that method calls act as if they occurred in a sequential, total order that is consistent with the program order of each participating task. I.e., for any concurrent execution, the method calls to POs can be ordered sequentially such that they (1) are consistent with program order, and (2) meet each PO’s specification (pre-condition, side-effect, post-condition) [12].

Although the sequential consistency semantics of mutual exclusion locks facilitate reasoning about programs, they nevertheless introduce potential concurrency bugs such as deadlock, livelock, and priority inversion. The mutual-exclusion property of (highly contended) locks stands in the way of scalability of parallel programs on many-core architectures [20]. Locks do not allow progress guarantees, because a task may fail inside a critical section (e.g., by entering an endless loop), preventing other tasks from accessing shared data.

Given the disadvantages of mutual exclusion locks, it is thus desirable to give up on method-level locking and allow method calls to overlap in time. Synchronization is then performed at a finer granularity within a method’s code, via atomic read-modify-write (RMW) operations. In the absence of mutual exclusion locks, the possibility of task failure inside a critical section is eliminated, because critical sections are reduced to single atomic operations. These atomic operations are provided either by the CPU’s instruction set architecture (ISA), or by the language run-time (with the help of the CPU’s ISA). It thus becomes possible to provide progress guarantees, which are unattainable with locks. In particular, a method is non-blocking if a task’s pending invocation is never required to wait for another task’s pending invocation to complete [12].
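
As a purely illustrative sketch (not part of the Ada proposal), the following C++11 fragment contrasts a lock-based counter with a non-blocking one whose critical section has been reduced to a single atomic read-modify-write; we use C++11 here and in later sketches because its memory model [8] is the formal basis referenced throughout this paper.

    #include <atomic>
    #include <mutex>

    // Lock-based: a task that stalls while holding the mutex blocks all others.
    struct Locked_Counter {
      std::mutex m;
      long value = 0;
      void increment() { std::lock_guard<std::mutex> g(m); ++value; }
    };

    // Non-blocking: the critical section is a single atomic RMW instruction,
    // so no task can be blocked by another task's failure or delay.
    struct Lock_Free_Counter {
      std::atomic<long> value{0};
      void increment() { value.fetch_add(1); }
    };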

-- Initial values:
Flag := False;
Data := 0;

-- Task 1:
Data := 1;
Flag := True;

-- Task 2:
loop
  R1 := Flag;
  exit when R1;
end loop;
R2 := Data;

-- (b) Labeling in Ada 2012:
Data : Integer with Volatile;
Flag : Boolean with Atomic;

Figure 1: (a) Producer-consumer synchronization in pseudo-code: Task 1 writes the Data variable and then signals Task 2 by setting the Flag variable. Task 2 spins on the Flag variable and then reads the Data variable. (b) Labeling to enforce sequential consistency in Ada 2012.

Non-blocking synchronization techniques are notoriously difficult to implement and the design of non-blocking data structures is an area of active research. To enable non-blocking synchronization, a programming language must provide a strict memory model. The purpose of a memory model is to define the set of values a read operation in a program is allowed to return [2].

To motivate the need for a strict memory model, consider the producer-consumer synchronization example in Fig. 1(a) (adapted from [22] and [5]). The programmer’s intention is to communicate the value of variable Data from Task 1 to Task 2. Without explicitly requesting a sequentially consistent execution, a compiler or CPU may break the programmer’s intended synchronization via the Flag variable by re-ordering memory operations, with the result that Task 2 reads R2 = 0. (E.g., a store–store re-ordering of the two assignments in Task 1 allows this result.) In Ada 2012, such re-orderings can be ruled out by labeling variables Data and Flag with aspect volatile. The corresponding variable declarations are depicted in Fig. 1(b). (Note that by [9, C.6§8/3] aspect atomic implies aspect volatile, but not vice versa.)

The intention for volatile variables in Ada 2012 was to guarantee that all tasks agree on the same order of updates [9, C.6§16/3]. Updates of volatile variables are thus required to be sequentially consistent, in the sense of Lamport’s definition [14]: “With sequential consistency (SC), any execution has a total order over all memory writes and reads, with each read reading from the most recent write to the same location”.
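
For comparison only, the same pattern written against the C++11 memory model relies on the sequentially consistent ordering that std::atomic accesses use by default; the variable names follow Fig. 1, everything else is an illustrative assumption.

    #include <atomic>

    std::atomic<bool> Flag{false};
    std::atomic<int>  Data{0};      // SC by default; Sec. 2 discusses relaxing this

    void task_1() {                 // producer
      Data.store(1);                // both stores default to memory_order_seq_cst
      Flag.store(true);
    }

    void task_2() {                 // consumer
      while (!Flag.load()) { }      // spin until signalled
      int r2 = Data.load();         // under SC this is guaranteed to read 1
      (void)r2;
    }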

However, the Ada 2012 aspect volatile has the following shortcomings:

  1. Ensuring SC for multiple tasks without atomic access is impossible. Non-atomic volatile variables therefore should not be provided by the language. Otherwise, the responsibility shifts from the programming language implementation to the programmer to ensure SC by pairing an atomic (implied volatile) variable with each non-atomic volatile variable (see, e.g., Fig. 1(b) and [21] for examples). (Note that a programming language implementation may ensure atomicity by a mutual exclusion lock if no hardware primitives for atomic access to a particular type of shared data are available.)

  2. Requiring SC on all shared variables is costly in terms of performance on contemporary multi-core CPUs. In Fig. 1, performance can be improved by allowing a less strict memory order for variable Data (to be addressed in Section 2).

  3. Although Ada provides the highly abstract PO monitor-construct for blocking synchronization, there is currently no programming primitive available to match this abstraction level for non-blocking synchronization.

Contemporary CPU architectures relax SC for the sake of performance [3, 10, 22]. It is a challenge for programming language designers to provide safe, efficient and user-friendly non-blocking synchronization features. The original memory model for Java contained problems and had to be revised [15]; it was later found to be unsound with respect to standard compiler optimizations [23]. The C++11 standard (cf. [1, 24]) already specifies a strict memory model for concurrent and parallel computing. We think that C++11 was not entirely successful in terms of either safety or user-friendliness. In contrast, we are convinced that these challenges can be met in the upcoming Ada 202x standard.

It has been felt since Ada 95 that it might be advantageous to have language support for synchronization based on atomic variables. For example, we cite [11, C.1]:

“A need to access specific machine instructions arises sometimes from other considerations as well. Examples include instructions that perform compound operations atomically on shared memory, such as test-and-set and compare-and-swap, and instructions that provide high-level operations, such as translate-and-test and vector arithmetic.”

Ada is already well-positioned to provide a strict memory model in conjunction with support for non-blocking synchronization, because it provides tasks as first-class citizens. This rules out inconsistencies that may result from thread-functionality provided through libraries [7].

To provide safe and efficient non-blocking synchronization for Ada 202x, this paper makes the following contributions:

  1. We extend Ada’s memory model by introducing synchronized types, which allow the expression of memory ordering operations consistently and at a sufficient level of detail. Memory ordering operations are expressed through aspects and attributes. Language support for spin loop synchronization via synchronized variables is proposed.

  2. We propose concurrent objects (COs) as a high-level language construct to express non-blocking synchronization. COs are meant to encapsulate the intricacies of non-blocking synchronization as POs do for blocking synchronization. Contrary to POs, the entries and procedures of COs execute in parallel, due to a fine-grained, optimistic synchronization mechanism.

  3. We provide an alternative, low-level API on synchronized types, which provides programmers with full control over the implementation of non-blocking synchronization semantics. Our main purpose with the low-level API is to provoke a discussion on the trade-off between abstraction versus flexibility.

  4. We illustrate the use of concurrent objects and the alternative, low-level API by several examples.

The remainder of this paper is organized as follows. We summarize the state-of-the-art on memory models and introduce synchronized variables in Sec. 2. We introduce attributes for specifying memory ordering operations in Sec. 3. We specify concurrent objects in Sec. 4 and discuss task scheduling in the presence of COs in Sec. 5. Sec. 6 contains two CO example implementations with varying memory consistency semantics. We discuss our low-level API in Sec. 7. Sec. 8 contains our conclusions.

This paper is an extension of work that appeared at the Ada-Europe 2018 conference [6]. Additional material is confined to two appendices: Appendix 0.A states the design-decisions of our proposed non-blocking synchronization mechanisms. Appendix 0.B contains further examples.

2 The Memory Model

For reasons outlined in Sec. 1, we do not consider the Ada 2012 atomic and volatile types here. Rather, we introduce synchronized types and variables. Synchronized types provide atomic access. We propose aspects and attributes for specifying a particular memory model to be employed for reading/writing synchronized variables.

Modern multi-core computer architectures are equipped with a memory hierarchy that consists of main memory, caches, and registers. It is important to distinguish between memory consistency and coherence. We cite from [22]: ‘For a shared memory machine, the memory consistency model defines the architecturally visible behavior of its memory system. Consistency definitions provide rules about loads and stores (or memory reads and writes) and how they act upon memory. As part of supporting a memory consistency model, many machines also provide cache coherence protocols that ensure that multiple cached copies of data are kept up-to-date.’

The purpose of a memory consistency model (or memory model, for short) is to define the set of values a read operation is allowed to return [2]. To facilitate programmers’ intuition, it would be ideal if all read/write operations of a program’s tasks are sequentially consistent. However, the hardware memory models provided by contemporary CPU architectures relax SC for the sake of performance [3, 10, 22]. Enforcing SC on such architectures may incur a noticeable performance penalty. The workable middle-ground between intuition (SC) and performance (relaxed hardware memory models) has been established with SC for data race-free programs (SC-for-DRF) [4]. Informally, a program has a data race if two tasks access the same memory location, at least one of them is a write, and there are no intervening synchronization operations that would order the accesses. “SC-for-DRF” requires programmers to ensure that programs are free of data races under SC. In turn, the relaxed memory model of a SC-for-DRF CPU guarantees SC for all executions of such a program.
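
To make the data-race condition concrete, here is a minimal C++11 sketch of our own (not from the paper): the first pair of accesses races (two conflicting, unsynchronized accesses, hence undefined behavior), while the second pair is data-race-free because atomic accesses count as synchronization operations.

    #include <atomic>
    #include <thread>

    int plain = 0;               // unsynchronized: concurrent increments are a data race
    std::atomic<int> synced{0};  // atomic accesses order the conflicting updates

    int main() {
      std::thread t1([] { ++plain; synced.fetch_add(1); });
      std::thread t2([] { ++plain; synced.fetch_add(1); });
      t1.join();
      t2.join();
    }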

It has been acknowledged in the literature [2] that Ada 83 was perhaps the first widely-used high-level programming language to provide first-class support for shared-memory programming. The approach taken with Ada 83 and later language revisions was to require legal programs to be free of synchronization errors, which is also the approach taken with SC-for-DRF. In contrast, for the Java memory model it was perceived that even programs with synchronization errors should have defined semantics, for reasons of safety and security of Java’s sand-boxed execution environment. (We do not consider this approach in the remainder of this paper, because it does not align with Ada’s current approach of regarding the semantics of programs with synchronization errors as undefined, i.e., as an erroneous execution, by [9, 9.10§11].) The SC-for-DRF programming model and two relaxations were formalized for C++11 [8]. They were later adopted for C11, OpenCL 2.0, and for X10 [26] (without the relaxations).

To guarantee DRF at the programming-language level, means for synchronization (ordering operations) have to be provided. Ada’s POs are well-suited for this purpose. For non-blocking synchronization, atomic operations can be used to enforce an ordering between the memory accesses of two tasks. It is one goal of this paper to add language features to Ada such that atomic operations can be employed in DRF programs. To avoid ambiguity, we propose synchronized variables and types, which support the expression of memory ordering operations at a sufficient level of detail (see Sec. 3.1).

The purpose of synchronized variables is that they can be used to safely transfer information (i.e., the value of the variables) from one task to another. ISAs provide atomic load/store instructions only for a limited set of primitive types; beyond those, atomicity can only be ensured by locks. Nevertheless, computer architectures provide memory fences (see, e.g., [12]) as a means for ordering memory operations. A memory fence requires that all memory operations before the fence (in program order) must be committed to the memory hierarchy before any operation after the fence. Then, for data to be transferred from one thread to another, the data itself need not be atomic anymore. I.e., it is sufficient that (1) the signaling variable is atomic, and that (2) all write operations are committed to the memory hierarchy before setting the signaling variable. On the receiver’s side, it must be ensured that (3) the signaling variable is read atomically, and that (4) memory loads for the data occur after reading the signaling variable. (Listing 2 provides an example.)
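
A hedged C++11 sketch of this signaling pattern, with conditions (1)–(4) marked in comments; the payload is deliberately an ordinary, non-atomic object:

    #include <atomic>
    #include <string>

    std::string payload;             // ordinary (non-atomic) data to be transferred
    std::atomic<bool> ready{false};  // the atomic signaling variable, cf. (1) and (3)

    void sender() {
      payload = "some data";                          // (2) committed before the signal
      ready.store(true, std::memory_order_release);   // (1) atomic store acts as release fence
    }

    void receiver() {
      while (!ready.load(std::memory_order_acquire)) { }  // (3) atomic read-acquire
      std::string copy = payload;                     // (4) load ordered after the acquire
      (void)copy;
    }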

Synchronized types and the aspect Synchronized_Components are convenient means for enhancing the usefulness of synchronized variables.

The general idea of our proposed approach is to define non-blocking concurrent objects similar to protected objects (cf., e.g., [12]). However, entries of concurrent objects will not block on guards; they will spin-loop until the guard evaluates to true. In addition, functions, procedures, and entries of concurrent objects are allowed to execute and to modify the encapsulated data in parallel. Private entries for concurrent objects are also supported. It is the responsibility of these operations to ensure that the data presents a consistent view to the users of the concurrent object. Concurrent objects will use synchronized types for synchronizing data access. Several memory models are provided for doing this efficiently. It is the responsibility of the programmer to ensure that the entries of a concurrent object are free from data races (DRF). For such programs, the non-blocking semantics of a concurrent object will provide SC in the same way as protected objects do for blocking synchronization.

2.1 Synchronizing memory operations and enforcing ordering

For defining ordering relations on memory operations, it is useful to first introduce two auxiliary relations.

The synchronizes-with relation can be achieved only by use of atomic types. Even if monitors or protected objects are used for synchronization, the runtime implements them employing atomic types. The general idea is to equip read and write operations on an atomic variable with information that will enforce an ordering on the read and write operations. Our proposal is to use attributes for specifying this ordering information. Details can be found below.

The happens-before relation is the basic relation for ordering operations in programs. In a program consisting of only one single thread, happens-before is straightforward. For inter-thread happens-before relations the synchronizes-with relation becomes important. If operation X in one thread synchronizes-with operation Y in another thread, then X happens-before Y. Note that the happens-before relation is transitive, i.e., if X happens-before Y and Y happens-before Z, then X happens-before Z. This is true even if X, Y, and Z are part of different threads.

We define different memory models. These memory models originated from the DRF [4] and properly-labeled [10] hardware memory models. They were formalized for the memory model of C++ [8]. The “sequentially consistent” and “acquire-release” memory models provide SC for DRF programs. The models can have varying costs on different computer architectures. The “acquire-release” memory model is a relaxation of the “sequentially consistent” memory model. As described in Table 1, it requires concessions from the programmer, who weakens SC in return for more flexibility for the CPU to re-order memory operations.

Sequentially Consistent Ordering

is the most stringent model and the easiest one for programmers to work with. In this case all threads see the same, total order of operations. This means that a sequentially consistent write to a synchronized variable synchronizes-with a sequentially consistent read of the same variable.

Relaxed Ordering

does not obey synchronizes-with relationships, but operations on the same synchronized variable within a single thread still obey happens-before relationships. This means that although one thread may write a synchronized variable, at a later point in time another thread may read an earlier value of this variable.
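
A C++11 sketch (again merely illustrative) of a situation where relaxed ordering is adequate: an event counter whose readers tolerate a slightly stale value, while each individual access remains atomic and, within a single thread, ordered with respect to the same variable.

    #include <atomic>

    std::atomic<unsigned long> events{0};

    void record_event() {
      events.fetch_add(1, std::memory_order_relaxed);  // no inter-thread ordering implied
    }

    unsigned long sample() {
      // May return a value older than the most recent increment by another thread.
      return events.load(std::memory_order_relaxed);
    }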

Acquire-Release Ordering

when compared to relaxed ordering introduces some synchronization. In fact, a read operation on a synchronized variable can then be labeled by acquire, and a write operation can be labeled by release. Synchronization between release and acquire is pairwise, between the thread that issues the release and the thread whose acquire operation is the first read-acquire (in global time) after the release. A thread issuing a read-acquire later may read a different value than that written by the first thread.

Memory order             Involved threads   Constraints for reordering memory accesses (for compilers and CPUs)

relaxed                  1                  No inter-thread constraints.

release/acquire          2                  (1) Ordinary stores (i.e., memory accesses other than accesses to
                                            synchronized variables) that are originally (in program order, before
                                            compiler optimizations and CPU reordering) before the release will
                                            happen before the release fence (after compiler optimizations and CPU
                                            reordering).
                                            (2) Ordinary loads originally after the acquire (in program order) will
                                            take place after the acquire fence (after compiler optimizations and
                                            CPU reordering).

sequentially consistent  all                (1) All memory accesses originally before the sequentially_consistent
                                            operation (in program order) will happen before the fence (after
                                            compiler optimizations and CPU reordering).
                                            (2) All memory accesses originally after the sequentially_consistent
                                            operation (in program order) will happen after the fence (after
                                            compiler optimizations and CPU reordering).

Table 1: Memory Order and Constraints for Compilers and CPUs

It is important to note that the semantics of the models above have to be enforced by the compiler (for programs which are DRF). I.e., the compiler “knows” the relaxed memory model of the hardware and inserts memory fences into the machine code such that the memory model of the high-level programming language is enforced. Compiler optimizations must ensure that reordering of operations is performed in such a way that the semantics of the memory model are not violated. The same applies to CPUs, i.e., reordering of instructions is done with respect to the CPU’s relaxed hardware memory model, constrained by the ordering semantics of the fences inserted by the compiler. The constraints enforced by the memory model are summarized in Table 1.

3 Synchronization primitives

3.1 Synchronized Variables

Synchronized variables can be used like atomic variables in Ada 2012, the only difference being that they are declared inside the lexical scope (data part) of a concurrent object. In this case, aspects and attributes used in the implementation of the concurrent object’s operations (functions, procedures, and entries) are employed for specifying behavior according to the memory model. Variables are labeled by the Boolean aspect Synchronized.

Read accesses to synchronized variables in the implementation of the concurrent object’s operations may be labeled with the attribute Concurrent_Read, write accesses with the attribute Concurrent_Write. Both attributes have a parameter Memory_Order to specify the memory order of the access. (If the operations are not labeled, the default values given below apply.) In case of read accesses, values allowed for parameter Memory_Order are Sequentially_Consistent, Acquire, and Relaxed. The default value is Sequentially_Consistent. For write accesses the values allowed are Sequentially_Consistent, Release, and Relaxed. The default value is again Sequentially_Consistent.

For example, assigning the value of synchronized variable Y to synchronized variable X can be written as
     X’Concurrent_Write(Memory_Order => Release) :=
                           Y’Concurrent_Read(Memory_Order => Acquire);

In addition, we propose aspects for specifying variable-specific default values for the attributes described above. In more detail, when declaring synchronized variables, the default values for read and write accesses can be specified via aspects Memory_Order_Read and Memory_Order_Write. The allowed values are the same as those given above for read and write accesses. If these memory model aspects are given when declaring a synchronized variable, the attributes Concurrent_Read and Concurrent_Write need not be given for actual read and write accesses of this variable. However, these attributes may be used to temporarily override the default values specified for the variable by the aspects. For example
     X: integer with Synchronized, Memory_Order_Write => Release;
     Y: integer with Synchronized, Memory_Order_Read => Acquire;

     X := Y;
does the same as the example above but without cluttering the assignment statement.
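
For readers familiar with C++11, the labeled assignment above corresponds roughly to explicit memory-order arguments on atomic loads and stores; the following sketch is an analogy, not the proposed Ada semantics.

    #include <atomic>

    std::atomic<int> X{0}, Y{0};

    void transfer() {
      // Analogue of  X'Concurrent_Write(Release) := Y'Concurrent_Read(Acquire);
      X.store(Y.load(std::memory_order_acquire), std::memory_order_release);
    }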

Aspect Synchronized_Components relates to aspect Synchronized in the same way as Atomic_Components relates to Atomic in Ada 2012.

3.2 Read-Modify-Write Variables

If a variable inside the data part of a concurrent object is labeled by the aspect Read_Modify_Write, this implies that the variable is synchronized. Write access to a read-modify-write variable in the implementation of the concurrent object’s operations is a read-modify-write access. The read-modify-write access is done via the attribute Concurrent_Exchange. The two parameters of this attribute are Memory_Order_Success and Memory_Order_Failure. The first specifies the memory order for a successful write; the second one specifies the memory order if the write access fails (in which case the current value of the variable is read into its local copy, cf. attribute OLD in Sec. 4.2).

Memory_Order_Success is one of Sequentially_Consistent, Acquire, Release, and Relaxed.

Memory_Order_Failure may be one of Sequentially_Consistent, Acquire, and Relaxed. The default value for both is Sequentially_Consistent. For the same read-modify-write access, the memory order specified for failure must not be stricter than that specified for success. So, if Memory_Order_Failure => Acquire or Memory_Order_Failure => Sequentially_Consistent is specified, the same (or a stricter) memory order has to be given for success.

For read access to a read-modify-write variable, attribute Concurrent_Read has to be used. The parameter Memory_Order has to be given. Its value is one of Sequentially_Consistent, Acquire, Relaxed. The default value is Sequentially_Consistent.

Again, aspects for variable specific default values for the attributes described above may be specified when declaring a read-modify-write variable. The aspects are Memory_Order_Read, Memory_Order_Write_Success, and Memory_Order_Write_Failure with allowed values as above.
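
The proposed Concurrent_Exchange attribute mirrors the two-order form of compare-and-exchange in C++11, which likewise takes separate memory orders for the success and failure paths (and, in C++11, likewise requires the failure order to be no stronger than the success order). A hedged sketch:

    #include <atomic>

    std::atomic<int> rmw_var{0};

    bool try_update(int expected, int desired) {
      // On failure the observed value is written back into 'expected',
      // comparable to the local copy denoted by X'OLD in Sec. 4.2.
      return rmw_var.compare_exchange_strong(expected, desired,
                                             std::memory_order_release,
                                             std::memory_order_relaxed);
    }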

3.3 Synchronization Loops

As presented below, synchronization via synchronized variables is performed by spin loops. We call these loops sync loops.

4 Concurrent Objects

4.1 Non-Blocking Synchronization

Besides the aspects and attributes proposed in Section 3, which have to be used for implementing concurrent objects, concurrent objects differ from protected objects in the following way. All operations of a concurrent object can be executed in parallel. Synchronized variables have to be used for synchronizing the executing operations. Entries have Boolean-valued guards. The Boolean expression of such a guard may contain only constants and synchronized variables declared in the data part of the concurrent object. Calling an entry results either in immediate execution of the entry’s body, if the guard evaluates to true, or in spin-looping until the guard eventually evaluates to true. We call such a spin loop a sync loop.

4.2 Read-Modify-Write Synchronization

For concurrent objects with read-modify-write variables the attributes proposed in Section 3 apply. All operations of concurrent objects can be executed in parallel. Read-modify-write variables have to be used for synchronizing the executing operations. The guards of entries have to be of the form X = X’OLD where X denotes a read-modify-write variable of the concurrent object. The attribute OLD is well-known from postconditions. An example in our context can be found in Listing 1.

If a read-modify-write operation is reached during the execution of an entry, that operation might succeed immediately, in which case execution proceeds after the operation in the normal way. If the operation fails, the execution of the entry is restarted (an implicit sync loop). More precisely, only the statements that are data-dependent on the read-modify-write variable are re-executed; statements that are not data-dependent on the read-modify-write variable are executed only on the first try. (For the case that the compiler cannot figure out which statements are data-dependent, we propose an additional Boolean aspect only_execute_on_first_try to tag non-data-dependent statements.) Precluding non-data-dependent statements from re-execution is not only a matter of efficiency; it sometimes makes sense semantically, e.g., for adding heap management to an implementation.
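
A concurrent entry guarded by Head = Head'OLD therefore behaves roughly like the following hand-written C++11 retry loop; this is a sketch of the intended lowering, not a normative definition. The allocation is not data-dependent on the read-modify-write variable and stays outside the loop, while the data-dependent statement is re-executed after every failed compare-and-exchange.

    #include <atomic>

    struct Node { int d; Node* next; };
    std::atomic<Node*> head{nullptr};

    void push(int d) {
      Node* n = new Node;                      // not data-dependent: executed once
      n->d = d;                                // not data-dependent: executed once
      Node* old = head.load(std::memory_order_relaxed);
      do {                                     // the implicit sync loop
        n->next = old;                         // data-dependent: re-executed on failure
      } while (!head.compare_exchange_weak(old, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed));
    }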

5 Scheduling and Dispatching

We propose a new state for Ada tasks to facilitate correct scheduling and dispatching of threads that synchronize via synchronized or read-modify-write types. If a thread is in a sync loop, the thread state changes to “in_sync_loop”. Note that sync loops can only occur inside concurrent objects; thus they can be spotted easily by the compiler and cannot be confused with “normal” loops. Note also that it makes sense for the state change not to take place during the first iteration of the sync loop, because the synchronization may succeed immediately. For read-modify-write loops, changing the state from the third iteration onwards may be a good choice; for spin loops, from the second iteration onwards.

In this way the runtime can guarantee that not all available CPUs (cores) are occupied by threads in state “in_sync_loop”. Thus we can be sure that at least one thread makes progress and that eventually all synchronized or read-modify-write variables are released (provided the program’s synchronization structure is correct and the program does not deadlock).

After leaving a sync loop, the thread state changes back to “runnable”.

6 Examples

6.0.1 Non-blocking Stack.

Listing 1 shows an implementation of a non-blocking stack using our proposed new syntax for concurrent objects.

1  subtype Data is Integer;
2
3  type List;
4  type List_P is access List;
5  type List is
6    record
7      D: Data;
8      Next: List_P;
9    end record;
10
11  Empty: exception;
12
13  concurrent Lock_Free_Stack
14  is
15    entry Push(D: Data);
16    entry Pop(D: out Data);
17  private
18    Head: List_P with Read_Modify_Write,
19      Memory_Order_Read => Relaxed,
20      Memory_Order_Write_Success => Release,
21      Memory_Order_Write_Failure => Relaxed;
22  end Lock_Free_Stack;
23
24  concurrent body Lock_Free_Stack is
25    entry Push (D: Data)
26        until Head = Head’OLD is
27      New_Node: List_P := new List;
28    begin
29      New_Node.all := (D => D, Next => Head);
30      Head := New_Node;
31    end Push;
32
33    entry Pop(D: out Data)
34        until Head = Head’OLD is
35      Old_Head: List_P;
36    begin
37      Old_Head := Head;
38      if Old_Head /= null then
39        Head := Old_Head.Next;
40        D := Old_Head.D;
41      else
42        raise Empty;
43      end if;
44    end Pop;
45  end Lock_Free_Stack;
Listing 1: Non-blocking Stack Implementation Using Proposed New Syntax

The implementation of entry Push (lines 25–31 of Listing 1) behaves as follows. In line 29 the new element is initialized: its data field is set and its pointer Next is set to the current head of the list. The next statement (line 30) makes the new element the head of the list. Since variable Head has aspect Read_Modify_Write (line 18), this is done with RMW semantics, i.e., if the value of Head has not been changed (since the execution of Push started) by a different thread executing Push or Pop (i.e., Head = Head’OLD), then the RMW operation succeeds and execution proceeds after line 30, i.e., Push returns. If the value of Head has been changed (Head /= Head’OLD), then the RMW operation fails and entry Push is re-executed starting from line 29. Line 27 is not re-executed, as it is not data-dependent on Head.

Several memory order attributes apply to the RMW operation (line 30); they are given in lines 19–21: In case of a successful execution of the RMW, the value of Head is released such that other threads can read its value via memory order acquire. In the failure case the new value of Head is assigned to the “local copy” of Head (i.e., Head’OLD) via relaxed memory order. Relaxed is enough because the RMW semantics will detect anyway whether the value of Head has been changed by a different thread. The same applies to Relaxed in line 19.

The implementation of entry Pop (lines 33–44) follows along the same lines.
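
For comparison, a C++11 analogue of Pop written by hand, with one plausible choice of memory orders (a sketch only; it deliberately never frees popped nodes, since safe reclamation is discussed in the next paragraph):

    #include <atomic>

    struct Node { int d; Node* next; };
    std::atomic<Node*> head{nullptr};   // plays the role of Head in Listing 1

    struct Empty {};                    // stands in for the Empty exception

    int pop() {
      Node* old = head.load(std::memory_order_relaxed);
      do {
        if (old == nullptr) throw Empty{};
        // On failure, compare_exchange_weak reloads 'old' with the current head,
        // so only the statements depending on it are effectively re-executed.
      } while (!head.compare_exchange_weak(old, old->next,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed));
      return old->d;                    // node intentionally not freed here
    }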

Memory management needs special consideration: in our case it is enough to use a synchronized counter that counts the number of threads inside Pop. If the counter equals 0, memory can be freed. Ada’s storage pools are a perfect means for doing this without cluttering the code.

This example also shows how easy it is to migrate from a (working) blocking to a (working) non-blocking implementation of a program. Assume that a working implementation with a protected object exists, then one has to follow these steps:

  1. Replace keyword protected by keyword concurrent.

  2. Replace protected operations by DRF concurrent operations, thereby adding appropriate guards to the concurrent entries.

  3. Test the non-blocking program which now has default memory order sequentially_consistent.

  4. Carefully relax the memory ordering requirements: Add memory order aspects and/or attributes Acquire, Release, and/or Relaxed to improve performance but without violating memory consistency.

6.0.2 Generic Release-Acquire Object.

Listing 2 shows how release-acquire semantics can be implemented for general data structures with the help of a single synchronized Boolean.

1generic
2  type Data is private;
3package Generic_Release_Acquire is
4
5  concurrent RA
6  is
7    procedure Write (d: Data);
8    entry Get (D: out Data);
9  private
10    Ready: Boolean := false with Synchronized,
11      Memory_Order_Read => Acquire,
12      Memory_Order_Write => Release;
13    Da: Data;
14  end RA;
15
16end Generic_Release_Acquire;
17
18package body Generic_Release_Acquire is
19
20  concurrent body RA is
21
22    procedure Write (D: Data) is
23    begin
24      Da := D;
25      Ready := true;
26    end Write;
27
28    entry Get (D: out Data)
29      when Ready is
30      -- spin-lock until released, i.e., Ready = true;
31      -- only sync. variables and constants allowed in guard expression
32    begin
33      D := Da;
34    end Get;
35  end RA;
36
37end Generic_Release_Acquire;
Listing 2: Generic Release-Acquire Object

7 API

As already pointed out, we feel that providing concurrent objects as first-class citizens is the right way to enhance Ada with non-blocking synchronization on an adequate memory model. On the other hand, if the programmer needs synchronization on a lower level than concurrent objects provide, an API-based approach (generic function Read_Modify_Write in package Memory_Model) would be a viable alternative. Listing 3 shows such a predefined package Memory_Model. It contains the specification of the generic function Read_Modify_Write, which allows the programmer to use the read-modify-write operation of the underlying computer hardware. (An example employing function Read_Modify_Write is given in the Appendix in Listing 8, which shows an implementation of a lock-free stack using the generic function Read_Modify_Write of package Memory_Model.)

Exposing sync loops to the programmer makes it necessary to introduce a new aspect sync_loop to let the runtime perform the state change to “in_sync_loop” (cf. Section 5). Because nobody can force the programmer to use this aspect correctly, the information conveyed to the runtime may be false or incomplete, giving rise to concurrency defects such as deadlocks, livelocks, and other problems.

1package Memory_Model is
2
3  type Memory_Order_Type is (
4    Sequentially_Consistent,
5    Relaxed,
6    Acquire,
7    Release);
8
9  subtype Memory_Order_Success_Type is Memory_Order_Type;
10
11  subtype Memory_Order_Failure_Type is Memory_Order_Type
12    range Sequentially_Consistent .. Acquire;
13
14  generic
15     type Some_Synchronized_Type is private;
16     with function Update return Some_Synchronized_Type;
17     Read_Modify_Write_Variable: in out Some_Synchronized_Type
18       with Read_Modify_Write;
19     Memory_Order_Success: Memory_Order_Success_Type :=
20       Sequentially_Consistent;
21     Memory_Order_Failure: Memory_Order_Failure_Type :=
22       Sequentially_Consistent;
23  function Read_Modify_Write return Boolean;
24
25end Memory_Model;
Listing 3: Package Memory_Model
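
For comparison only, a C++11 helper with a shape similar to the generic function in Listing 3: a caller-supplied Update function is applied through a single compare-and-exchange attempt, and the Boolean result tells the caller whether to retry. All names in this sketch are illustrative and not part of the proposal.

    #include <atomic>

    template <typename T, typename Update>
    bool read_modify_write(std::atomic<T>& variable, Update update,
                           std::memory_order success = std::memory_order_seq_cst,
                           std::memory_order failure = std::memory_order_seq_cst) {
      T expected = variable.load(std::memory_order_relaxed);  // snapshot, cf. 'OLD
      T desired  = update();                                   // user-supplied Update
      // One attempt only; the surrounding (sync) loop decides whether to retry.
      return variable.compare_exchange_strong(expected, desired, success, failure);
    }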

8 Conclusion and Future Work

We have presented an approach for providing safe non-blocking synchronization in Ada 202x. Our novel approach is based on introducing concurrent objects for encapsulating non-blocking data structures on a high abstraction level. In addition, we have presented synchronized and read-modify-write types which support the expression of memory ordering operations at a sufficient level of detail. Concurrent objects provide SC for programs without data races. This SC-for-DRF memory model is well-aligned with Ada’s semantics for blocking synchronization via protected objects, which requires legal programs to be without synchronization errors ([9, 9.10§11]).

Although Ada 2012 provides the highly abstract protected object monitor-construct for blocking synchronization, there was previously no programming primitive available to match this abstraction level for non-blocking synchronization. The proposed memory model, in conjunction with our concurrent object construct for non-blocking synchronization, may spare users from having to invent ad-hoc synchronization solutions, which have already been found error-prone with blocking synchronization [25].

To date, all previous approaches have been API-based. In contrast, our approach for Ada 202x encapsulates non-blocking synchronization inside concurrent objects, and we have listed a number of advantages that support making non-blocking data structures first-class language citizens. This safe approach makes the code easy to understand. Note that concurrent objects do not integrate with object orientation (tagged types in Ada); such integration can, however, be achieved by employing the proposed API approach (cf. Section 7). In addition, it is not difficult to migrate code from blocking to non-blocking synchronization. Adding memory management via storage pools integrates well with our modular approach and does not clutter the code.

A lot of work remains to be done. To name only a few issues: non-blocking barriers (in the sense of [9, D.10.1]) would be useful, but the details have to be elaborated. Fully integrating concurrent objects into the scheduling and dispatching models, and integrating them with the features for parallel programming planned for Ada 202x, has to be done carefully.

9 Acknowledgments

This research was supported by the Austrian Science Fund (FWF) project I 1035N23, and by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning under grant NRF2015M3C4A7065522.

References

  • [1] Working Draft, Standard for Programming Language C++. ISO/IEC N4296, 2014.
  • [2] S. V. Adve and H.-J. Boehm. Memory models: A case for rethinking parallel languages and hardware. Commun. ACM, 53(8):90–101, Aug. 2010.
  • [3] S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. Computer, 29(12):66–76, Dec. 1996.
  • [4] S. V. Adve and M. D. Hill. Weak ordering—a new definition. In Proceedings of the 17th Annual International Symposium on Computer Architecture, ISCA ’90, pages 2–14, New York, NY, USA, 1990. ACM.
  • [5] J. Barnes. Ada 2012 Rationale: The Language – the Standard Libraries. Springer LNCS, 2013.
  • [6] J. Blieberger and B. Burgstaller. Safe non-blocking synchronization in Ada 202x. In Proceeding of Ada-Europe, Springer LNCS, 2018.
  • [7] H.-J. Boehm. Threads cannot be implemented as a library. SIGPLAN Not., 40(6):261–268, June 2005.
  • [8] H.-J. Boehm and S. V. Adve. Foundations of the C++ concurrency memory model. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’08, pages 68–78, New York, NY, USA, 2008. ACM.
  • [9] R. L. Brukardt, editor. Annotated Ada Reference Manual, ISO/IEC 8652:2012/Cor 1:2016. 2016.
  • [10] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. SIGARCH Comput. Archit. News, 18(2SI):15–26, May 1990.
  • [11] L. Guerby. Ada 95 Rationale – The Language – The Standard Libraries. Springer, 1997.
  • [12] M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2012.
  • [13] C. A. R. Hoare. Monitors: An operating system structuring concept. Commun. ACM, 17(10):549–557, Oct. 1974.
  • [14] L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9):690–691, Sept. 1979.
  • [15] J. Manson, W. Pugh, and S. V. Adve. The Java memory model. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’05, pages 378–391, New York, NY, USA, 2005. ACM.
  • [16] P. E. McKenney, T. Riegel, and J. Preshing. N4036: Towards Implementation and Use of memory_order_consume. Technical Report WG21/N4036, JTC1/SC22/WG21 – The C++ Standards Committee – ISOCPP, May 2014.
  • [17] P. E. McKenney, T. Riegel, J. Preshing, H. Boehm, C. Nelson, O. Giroux, L. Crowl, J. Bastien, and M. Wong. P0462R1: Marking memory_order_consume Dependency Chains. Technical Report WG21/P0462R1, JTC1/SC22/WG21 – The C++ Standards Committee – ISOCPP, Feb. 2017.
  • [18] P. E. McKenney, M. Wong, H. Boehm, J. Maurer, J. Yasskin, and J. Bastien. P0190R4: Proposal for New memory_order_consume Definition. Technical Report WG21/P0190R4, JTC1/SC22/WG21 – The C++ Standards Committee – ISOCPP, July 2017.
  • [19] J. Preshing. The purpose of memory_order_consume in C++11. http://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/. Accessed: 2017-09-13.
  • [20] M. L. Scott. Shared-Memory Synchronization. Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, 2013.
  • [21] H. Simpson. Four-slot fully asynchronous communication mechanism. Computers and Digital Techniques, IEE Proceedings E, 137:17–30, 02 1990.
  • [22] D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence. Number 16 in Synthesis Lectures on Computer Architecture. Morgan & Claypool, 2011.
  • [23] J. Ševčík and D. Aspinall. On validity of program transformations in the Java memory model. In Proceedings of the 22nd European Conference on Object-Oriented Programming, ECOOP ’08, pages 27–51, Berlin, Heidelberg, 2008. Springer-Verlag.
  • [24] A. Williams. C++ Concurrency in Action. Manning Publ. Co., Shelter Island, NY, 2012.
  • [25] W. Xiong, S. Park, J. Zhang, Y. Zhou, and Z. Ma. Ad hoc synchronization considered harmful. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10, pages 163–176, Berkeley, CA, USA, 2010. USENIX Association.
  • [26] A. Zwinkau. A memory model for X10. In Proceedings of the 6th ACM SIGPLAN Workshop on X10, X10 2016, pages 7–12, New York, NY, USA, 2016. ACM.

Appendix 0.A Rationale and comparison with C++11

We state the rationale for our proposed language features and compare them to the C++ memory model. This section thus requires modest familiarity with the C++11 standard [1].

0.a.1 C++11’s compare_exchange_weak and compare_exchange_strong

We felt that compare_exchange_weak and compare_exchange_strong are not needed at the language level. These are hardware-related details which the compiler knows about and should handle without intervention by the programmer.

In particular, compare_exchange_weak means that a RMW operation may sometimes fail although the value of the RMW variable has not been changed by a different thread. In this case, re-executing the whole implicit sync loop is not necessary; only the RMW operation has to be redone. We assume that the compiler produces machine code for this “inner” loop. Because such spurious failures occur only on particular CPUs, it is obvious that the compiler, and not the programmer, should take care of this.
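
In C++11 terms, the difference the compiler would hide from the Ada programmer looks as follows (illustrative only): the weak form may fail spuriously and is therefore normally wrapped in a loop, whereas the strong form fails only if the variable really was changed.

    #include <atomic>

    std::atomic<int> v{0};

    void increment_weak() {
      int expected = v.load(std::memory_order_relaxed);
      // May fail spuriously even if v still equals expected; maps cheaply to LL/SC ISAs.
      while (!v.compare_exchange_weak(expected, expected + 1)) { }
    }

    void increment_strong() {
      int expected = v.load(std::memory_order_relaxed);
      // Fails only if another thread actually changed v in the meantime.
      while (!v.compare_exchange_strong(expected, expected + 1)) { }
    }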

In addition, migrating Ada programs will be facilitated by assigning this job to the compiler.

0.a.2 C++11’s consume memory ordering

C++ introduced memory_order_consume specifically for supporting read-copy update (RCU) used in the Linux Kernel (cf. [19]). However, it turned out that memory_order_consume as defined in the C++ standard [1] is not implemented by compilers. Instead, all compilers map it to memory_order_acquire. The major reason for this is that the data dependency as defined in [1] is difficult to implement (cf., e.g., [16]). There is, however, ongoing work within ISOCPP to make memory_order_consume viable for implementation (cf., e.g., [17, 18]). In particular,  [18] proposes to restrict memory_order_consume to data dependence chains starting with pointers because this represents the major usage scenario in the Linux kernel.

For Ada 202x it seems reasonable not to include memory_order_consume in the standard. Instead, compilers are encouraged to exploit features provided by the hardware for gaining performance on weakly-ordered CPUs. The programmer uses memory_order_release and memory_order_acquire for synchronization, and the compiler improves the performance of the program if the hardware is weakly-ordered and it (the compiler) is willing to perform data dependency analysis. In addition, a compiler switch might be a way of letting the programmer decide whether she is willing to bear the optimization load (increased compile time).

In addition, migrating Ada programs will be facilitated by not having to replace memory_order_acquire with memory_order_consume and vice versa depending on the employed hardware.

0.a.3 C++11’s acquire_release memory ordering

C++11 defines acquire_release memory ordering because some of C++11’s RMW operations contain both a read and a write operation, e.g., i++ for an atomic integer variable i. Because Ada’s syntax does not contain such operators, acquire_release memory ordering is not needed at the language level. When compiling i := i + 1 (with i being a synchronized integer variable), an Ada compiler is able to employ suitable memory fences to enforce the memory-model aspects given by the programmer for the original statement.
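
For reference, the C++11 formulation that motivates acquire_release ordering is an atomic increment, i.e., a single operation that both reads and writes; under our proposal an Ada compiler would derive equivalent fences from the aspects given for i. The snippet below is illustrative only.

    #include <atomic>

    std::atomic<int> i{0};

    void bump() {
      // One RMW instruction: the read part acquires, the write part releases.
      i.fetch_add(1, std::memory_order_acq_rel);
    }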

Appendix 0.B Further Examples

0.b.0.1 Peterson’s Algorithm.

Listing 4 shows an implementation of Peterson’s algorithm, a lock-free method for synchronizing two tasks, under the sequentially consistent memory model.

1concurrent Peterson_Exclusion is
2
3  procedure Task1_Critical_Section;
4
5  procedure Task2_Critical_Section;
6
7private
8
9  -- Accesses to synchronized variables are atomic and have by default
10  -- Sequential Consistency (i.e. the compiler must generate code that
11  -- respects program order, and the adequate memory fence instructions
12  -- are introduced before and after each load or store to serialize
13  -- memory operations in all CPU cores, respecting ordering, visibility,
14  -- and atomicity).  However, it is possible to relax each read / write
15  -- operation on a synch variable, for obtaining higher performance in
16  -- those algorithms that allow these kinds of reorderings by other
17  -- threads.
18
19  Flag1 : Boolean := False with Synchronized;
20  Flag2 : Boolean := False with Synchronized;
21  Turn  : Natural with Synchronized;
22
23  -- Additional data can be placed here, i.e. for shared data variables
24  -- that need no atomic accesses (i.e. when data races are not possible
25  -- because protected by synchronized variables)
26
27  -- Concurrent entries also encapsulate the access to shared data
28  -- (Synch variables), but automatically using spin loops / compare and
29  -- swap operations for synchronization among threads.
30
31  entry Task1_Busy_Wait;
32
33  entry Task2_Busy_Wait;
34
35end Peterson_Exclusion;
36
37
38concurrent body Peterson_Exclusion is
39
40  entry Task1_Busy_Wait
41    when Flag2 and then Turn = 2  -- Spin loop until the condition is True
42  is
43  begin
44    null;
45  end Task1_Busy_Wait;
46
47  entry Task2_Busy_Wait
48    when Flag1 and then Turn = 1  -- Spin loop until the condition is True
49  is
50  begin
51    null;
52  end Task2_Busy_Wait;
53
54  procedure Task1_Critical_Section is
55  begin
56    Flag1 := True;
57    Turn  := 2;
58    Task1_Busy_Wait;
59
60    Code_For_Task1_Critical_Section;
61
62    Flag1 := False;
63  end Task1_Critical_Section;
64
65  procedure Task2_Critical_Section is
66  begin
67    Flag2 := True;
68    Turn  := 1;
69    Task2_Busy_Wait;
70
71    Code_For_Task2_Critical_Section;
72
73    Flag2 := False;
74  end Task2_Critical_Section;
75
76end Peterson_Exclusion;
Listing 4: Peterson’s Algorithm under the Sequentially Consistent Memory Model

Listing 5 shows an implementation of Peterson’s algorithm under the release-acquire memory model with default memory model specified in the declarative part.

1concurrent Peterson_Exclusion is
2
3  procedure Task1_Critical_Section;
4
5  procedure Task2_Critical_Section;
6
7private
8
9  Flag1 : Boolean := False
10    with Synchronized, Memory_Order_Read => Acquire,
11      Memory_Order_Write => Release;
12  Flag2 : Boolean := False
13    with Synchronized, Memory_Order_Read => Acquire,
14      Memory_Order_Write => Release;
15  Turn  : Natural
16    with Synchronized, Memory_Order_Read => Acquire,
17      Memory_Order_Write => Release;
18
19  entry Task1_Busy_Wait;
20
21  entry Task2_Busy_Wait;
22
23end Peterson_Exclusion;
24
25
26concurrent body Peterson_Exclusion is
27
28  entry Task1_Busy_Wait
29    when Flag2 and Turn = 2
30    -- Spin loop until the condition is True
31  is
32  begin
33    null;
34  end Task1_Busy_Wait;
35
36  entry Task2_Busy_Wait
37    when Flag1 and Turn = 1
38    -- Spin loop until the condition is True
39  is
40  begin
41    null;
42  end Task2_Busy_Wait;
43
44  procedure Task1_Critical_Section is
45  begin
46    Flag1 := True;
47    Turn  := 2;
48    Task1_Busy_Wait;
49
50    Code_For_Task1_Critical_Section;
51
52    Flag1 := False;
53  end Task1_Critical_Section;
54
55  procedure Task2_Critical_Section is
56  begin
57    Flag2 := True;
58    Turn  := 1;
59    Task2_Busy_Wait;
60
61    Code_For_Task2_Critical_Section;
62
63    Flag2 := False;
64  end Task2_Critical_Section;
65
66end Peterson_Exclusion;
Listing 5: Peterson’s Algorithm under the Release-Acquire Memory Model with default memory model specified in the declarative part

Listing 6 shows an implementation of Peterson’s algorithm under the release-acquire memory model with memory model explicitly specified at statements.

1concurrent Peterson_Exclusion is
2
3  procedure Task1_Critical_Section;
4
5  procedure Task2_Critical_Section;
6
7private
8
9  Flag1 : Boolean’Concurrent_Write(Memory_Order => Release) := False
10    with Synchronized;
11  Flag2 : Boolean’Concurrent_Write(Memory_Order => Release) := False
12    with Synchronized;
13  Turn  : Natural with Synchronized;
14
15  entry Task1_Busy_Wait;
16
17  entry Task2_Busy_Wait;
18
19end Peterson_Exclusion;
20
21
22concurrent body Peterson_Exclusion is
23
24  entry Task1_Busy_Wait
25    when Flag2’Concurrent_Read(Memory_Order => Acquire) and
26      Turn’Concurrent_Read(Memory_Order => Acquire) = 2
27    -- Spin loop until the condition is True
28  is
29  begin
30    null;
31  end Task1_Busy_Wait;
32
33  entry Task2_Busy_Wait
34    when Flag1’Concurrent_Read(Memory_Order => Acquire) and
35      Turn’Concurrent_Read(Memory_Order => Acquire) = 1
36    -- Spin loop until the condition is True
37  is
38  begin
39    null;
40  end Task2_Busy_Wait;
41
42  procedure Task1_Critical_Section is
43  begin
44    Flag1’Concurrent_Write(Memory_Order => Release) := True;
45    Turn’Concurrent_Write(Memory_Order => Release)  := 2;
46    Task1_Busy_Wait;
47
48    Code_For_Task1_Critical_Section;
49
50    Flag1’Concurrent_Write(Memory_Order => Release) := False;
51  end Task1_Critical_Section;
52
53  procedure Task2_Critical_Section is
54  begin
55    Flag2’Concurrent_Write(Memory_Order => Release) := True;
56    Turn’Concurrent_Write(Memory_Order => Release)  := 1;
57    Task2_Busy_Wait;
58
59    Code_For_Task2_Critical_Section;
60
61    Flag2’Concurrent_Write(Memory_Order => Release) := False;
62  end Task2_Critical_Section;
63
64end Peterson_Exclusion;
Listing 6: Peterson’s Algorithm under the Release-Acquire Memory Model with memory model explicitly specified at statements

0.b.0.2 Filter Algorithm.

The filter algorithm is a non-blocking method for synchronizing processes that is starvation- and deadlock-free [12]. Listing 7 is an implementation using our proposed approach. In particular, notice the use of a private entry family.

1generic
2  No_Of_Processes: Positive; -- positive number >= 2
3package Filter_Algorithm is
4
5  subtype Process_ID is Natural range 0 .. No_Of_Processes-1;
6  subtype Process_ID_With_Minus_One is Integer range
7    -1 .. No_Of_Processes-1;
8  subtype Process_ID_Small is Process_ID range
9    Process_ID’FIRST .. Process_ID’LAST-1;
10  type Level_Type is array(Integer range <>) of
11    Process_ID_With_Minus_One;
12
13  concurrent Access_To_Critical_Section
14  is
15    procedure Acquire_Lock (ID: Process_ID);
16    procedure Release_Lock (ID: Process_ID);
17  private
18    entry Private_Lock(Process_ID); -- entry family
19
20    Level: Level_Type (Process_ID) := (others => -1)
21      with Synchronized_Components,
22      Memory_Order_Read => Acquire,
23      Memory_Order_Write => Release;
24    Last_To_Enter: Level_Type(Process_ID_Small) := (others => -1)
25      with Synchronized_Components,
26      Memory_Order_Read => Acquire,
27      Memory_Order_Write => Release;
28    Var_L: Level_Type (Process_ID);
29
30  end Access_To_Critical_Section;
31
32end Filter_Algorithm;
1package body Filter_Algorithm is
2
3  concurrent body Access_To_Critical_Section is
4
5    procedure Acquire_Lock (ID: Process_ID) is
6    begin
7      for L in Process_ID_Small’RANGE loop
8        Level(ID) := L;
9        Last_To_Enter(L) := ID;
10        Var_L(ID) := L;
11        Private_Lock(ID);
12      end loop;
13    end Acquire_Lock;
14
15    entry Private_Lock(for ID in Process_ID)
16      when ((Last_To_Enter(Var_L(ID)) /= ID) or else
17        (for all K in Level’RANGE => (K = ID or else
18          Level(K) < Var_L(ID))))
19    is
20    begin
21      null;
22    end Private_Lock;
23
24    procedure Release_Lock (ID: Process_ID) is
25    begin
26      Level(ID) := -1;
27    end Release_Lock;
28
29  end Access_To_Critical_Section;
30
31end Filter_Algorithm;
Listing 7: Filter Algorithm

0.b.0.3 API-based non-blocking stack.

Here we present how a non-blocking stack can be implemented via the API proposed in Sec. 7.

1  subtype Data is Integer;
2
3  type List;
4  type List_P is access List;
5  type List is
6    record
7      D: Data;
8      Next: List_P;
9    end record;
10
11  Empty: exception;
12
13  concurrent Lock_Free_Stack
14  is
15    procedure Push(D: Data);
16    procedure Pop(D: out Data);
17  private
18    Head: List_P with Read_Modify_Write;
19  end Lock_Free_Stack;
20
21  concurrent body Lock_Free_Stack is
22    procedure Push (D: Data) is
23      New_Node: List_P := new List;
24      function Update_Head_Push return List_P is
25      begin
26        return New_Node;
27      end Update_Head_Push;
28      function RMW_Head_Push is
29        new Memory_Model.Read_Modify_Write(
30          Some_Synchronized_Type => List_P,
31          Update => Update_Head_Push,
32          Read_Modify_Write_Variable => Head,
33          Memory_Order_Success => Release,
34          Memory_Order_Failure => Relaxed);
35    begin
36      loop with Sync_Loop
37        New_Node.all := (D => D, Next => Head’Concurrent_Read(
38          Memory_Order => Relaxed));
39        exit when RMW_Head_Push;
40        -- This is an RMW operation; so: if value of head has changed in
41        -- between, the loop is reexecuted;
42        -- if not, the assignment succeeds.
43        -- NOTE: memory_order release initiates a happens_before
44        -- relationship with the memory_order acquire in Pop
45      end loop;
46    end Push;
47
48    procedure Pop(D: out Data) is
49      Old_Head: List_P;
50      function Update_Head_Pop return List_P is
51      begin
52        return Old_Head.Next;
53      end Update_Head_Pop;
54      function RMW_Head_Pop is
55        new Memory_Model.Read_Modify_Write(
56          Some_Synchronized_Type => List_P,
57          Update => Update_Head_Pop,
58          Read_Modify_Write_Variable => Head,
59          Memory_Order_Success => Acquire,
60          Memory_Order_Failure => Relaxed);
61    begin
62      loop with Sync_Loop
63        Old_Head := Head’Concurrent_Read(Memory_Order => Relaxed);
64        if Old_Head /= null then
65          if RMW_Head_Pop then
66          -- This is an RMW operation; so: if value of head has changed in
67          -- between, the if statement terminates and the loop body is
68          -- executed once more,
69          -- if not, the assignment succeeds and the then branch is
70          -- executed.
71          -- NOTE: memory_order acquire establishes a happens_before
72          -- relationship with the memory_order release in push
73            D := Old_Head.D;
74            exit;
75          end if;
76        else
77          raise Empty;
78        end if;
79      end loop;
80    end Pop;
81  end Lock_Free_Stack;
Listing 8: Non-blocking Stack Implementation Using Generic Function
Memory_Model.Read_Modify_Write