Decoupling Lock-Free Data Structures from Memory Reclamation for Static Analysis

Verification of concurrent data structures is one of the most challenging tasks in software verification. The topic has received considerable attention over the course of the last decade. Nevertheless, human-driven techniques remain cumbersome and notoriously difficult while automated approaches suffer from limited applicability. The main obstacle for automation is the complexity of concurrent data structures. This is particularly true in the absence of garbage collection. The intricacy of lock-free memory management paired with the complexity of concurrent data structures makes automated verification prohibitive. In this work we present a method for verifying concurrent data structures and their memory management separately. We suggest two simpler verification tasks that imply the correctness of the data structure. The first task establishes an over-approximation of the reclamation behavior of the memory management. The second task exploits this over-approximation to verify the data structure without the need to consider the implementation of the memory management itself. To make the resulting verification tasks tractable for automated techniques, we establish a second result. We show that a verification tool needs to consider only executions where a single memory location is reused. We implemented our approach and were able to verify linearizability of Michael&Scott's queue and the DGLM queue for both hazard pointers and epoch-based reclamation. To the best of our knowledge, we are the first to verify such implementations fully automatically.


1. Introduction

Data structures are a basic building block of virtually any program. Efficient implementations are typically a part of a programming language’s standard library. With highly concurrent computing available even on commodity hardware, concurrent data structure implementations are needed. The class of lock-free data structures has been shown to be particularly efficient. Using fine-grained synchronization and avoiding such synchronization whenever possible results in unrivaled performance and scalability.

Unfortunately, this use of fine-grained synchronization is what makes lock-free data structures also unrivaled in terms of complexity. Indeed, bugs have been discovered in published lock-free data structures (Michael and Scott, 1995; Doherty et al., 2004a). This confirms the need for formal proofs of correctness. The de facto standard correctness notion for concurrent data structures is linearizability (Herlihy and Wing, 1990). Intuitively, linearizability provides the illusion that the operations of a data structure appear atomically. Clients of linearizable data structures can thus rely on a much simpler sequential specification.

Establishing linearizability for lock-free data structures is challenging. The topic has received considerable attention over the past decade (cf. Section 7). For instance, Doherty et al. (2004b) give a mechanized proof of a lock-free queue. Such proofs require a tremendous effort and a deep understanding of the data structure and the verification technique. Automated approaches remove this burden. Vafeiadis (2010b, a), for instance, verifies singly-linked structures fully automatically.

However, many linearizability proofs rely on a garbage collector. What is still missing are automated techniques that can handle lock-free data structures with manual memory management. The major obstacle in automating proofs for such implementations is that lock-free memory management is rather complicated—in some cases even as complicated as the data structure using it. The reason for this is that memory deletions need to be deferred until all unsynchronized, concurrent readers are done accessing the memory. Coping with lock-free memory management is an active field of research. It is oftentimes referred to as Safe Memory Reclamation (SMR). The wording underlines its focus on safely reclaiming memory for lock-free programs. This results in the system design depicted in Figure 1. The clients of a lock-free data structure are unaware of how it manages its memory. The data structure uses an allocator to acquire memory, for example, using malloc. However, it does not free the memory itself. Instead, it delegates this task to an SMR algorithm which defers the free until it is safe. The deferral can be controlled by the data structure through an API the functions of which depend on the actual SMR algorithm.

Figure 1. Typical interaction between the components of a system. Lock-free data structures (LFDS) perform all their reclamation through an SMR component: the LFDS allocates memory with malloc, delegates deletions to the SMR component via its API, and the SMR component eventually issues the free.
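To make this interaction concrete, the following sketch shows the interface boundary we have in mind, written in the pseudocode style of the later figures. The function names protect, unprotect, and retire anticipate the hazard pointer interface used in Figures 2 and 3; the exact signatures are our own illustration rather than a fixed API.

// Interface between a lock-free data structure (LFDS) and an SMR component.
// The LFDS allocates nodes itself but never frees them directly.
void protect(Node* ptr, int i);   // ask other threads to defer freeing *ptr
void unprotect(int i);            // drop the i-th protection of the calling thread
void retire(Node* ptr);           // hand ptr over to the SMR component, which
                                  // eventually calls free(ptr) once this is safe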

In this paper we tackle the challenge of verifying lock-free data structures which use SMR. To make the verification tractable, we suggest a compositional approach which is inspired by the system design from Figure 1. We observe that the only influence the SMR implementation has on the data structure is through the free operations it performs. So we introduce SMR specifications that capture when a free can be executed depending on the history of invoked SMR API functions. With such a specification at hand, we can verify that a given SMR implementation adheres to the specification. More importantly, it allows for a compositional verification of the data structure. Intuitively, we replace the SMR implementation with the SMR specification. If the SMR implementation adheres to the specification, then the specification over-approximates the frees of the implementation. Using this over-approximation for verifying the data structure is sound because frees are the only influence the SMR implementation has on the data structure.

Although our compositional approach localizes the verification effort, it leaves the verification tool with a hard task: verifying shared-memory programs with memory reuse. Our second finding eases this task by taming the complexity of reasoning about memory reuse. We prove that it is sound to consider only executions that reuse a single memory location. This result relies on data structures being invariant to whether or not memory is actually reclaimed and reused. Intuitively, this requirement boils down to ABA freedom and is satisfied by data structures from the literature.

To substantiate the usefulness of our approach, we implemented a linearizability checker which realizes the two approaches presented in this paper: compositional verification and the reduction of reuse to a single address. Our tool is able to establish linearizability of well-known lock-free data structures, such as Treiber’s stack (Treiber, 1986), Michael&Scott’s queue (Michael and Scott, 1996), and the DGLM queue (Doherty et al., 2004b), when combined with SMR algorithms like Hazard Pointers (Michael, 2002) and Epoch-Based Reclamation (Fraser, 2004). We remark that we needed both results for the benchmarks to go through. To the best of our knowledge, we are the first to verify lock-free data structures with SMR fully automatically. We are also the first to automatically verify the DGLM queue with any manual memory management.

Our contributions and the outline of our paper are summarized as follows:

  • Section 4 introduces a means for specifying SMR algorithms and establishes how to perform compositional verification of lock-free data structures and SMR implementations,

  • Section 5 presents a sound verification approach which considers only those executions of a program where at most a single memory location is reused,

  • Section 6 evaluates our approach on well-known lock-free data structures and SMR algorithms, and demonstrates its practicality.

We illustrate our contributions informally in §2, introduce the programming model in §3, discuss related work in §7, and conclude the paper in §8.

This technical report extends the conference version (Meyer and Wolff, 2018) with missing details. For companion material refer to https://wolff09.github.io/TMRexp/.

2. The Verification Approach on an Example

The verification of lock-free data structures is challenging due to their complexity. One source of this complexity is the avoidance of traditional synchronization. This leads to subtle thread interactions and imposes a severe state space explosion. The problem becomes worse in the absence of garbage collection. This is due to the fact that lock-free memory reclamation is far from trivial. Due to the lack of synchronization it is typically impossible for a thread to judge whether or not certain memory will be accessed by other threads. Hence, naively deleting memory is not feasible. To overcome this problem, programmers employ SMR algorithms. While this solves the memory reclamation problem, it imposes a major obstacle for verification. For one, SMR implementations are oftentimes as complicated as the data structure using it. This makes the already hard verification of lock-free data structures prohibitive.

struct Node { data_t data; Node* next; };
shared Node* Head, Tail;
atomic init() { Head = new Node(); Head->next = NULL; Tail = Head; }

void enqueue(data_t input) {
  Node* node = new Node();
  node->data = input;
  node->next = NULL;
  while (true) {
    Node* tail = Tail;
H   protect(tail, 0);
H   if (tail != Tail) continue;
    Node* next = tail->next;
    if (tail != Tail) continue;
    if (next != NULL) {
      CAS(&Tail, tail, next);
      continue;
    }
    if (CAS(&tail->next, next, node))
      break;
  }
  CAS(&Tail, tail, node);
H unprotect(0);
}

data_t dequeue() {
  while (true) {
    Node* head = Head;
H   protect(head, 0);
H   if (head != Head) continue;
    Node* tail = Tail;
    Node* next = head->next;
H   protect(next, 1);
    if (head != Head) continue;
    if (next == NULL) return EMPTY;
    if (head == tail) {
      CAS(&Tail, tail, next);
      continue;
    } else {
      data_t output = next->data;
      if (CAS(&Head, head, next)) {
        // delete head;
H       retire(head);
H       unprotect(0); unprotect(1);
        return output;
      }
    }
  }
}

Figure 2. Michael&Scott’s non-blocking queue (Michael and Scott, 1996) extended with hazard pointers (Michael, 2002) for safe memory reclamation. The modifications needed for using hazard pointers are marked with H. The implementation requires two hazard pointers per thread.

We illustrate the above problems on the lock-free queue from Michael and Scott (1996). It is a practical example in that it is used for Java’s ConcurrentLinkedQueue and C++ Boost’s lockfree::queue, for instance. The implementation is given in Figure 2 (ignore the lines marked by H for a moment). The queue maintains a NULL-terminated singly-linked list of nodes. New nodes are enqueued at the end of that list. If the Tail pointer points to the end of the list, a new node is appended by linking Tail->next to the new node. Then, Tail is updated to point to the new node. If Tail is not pointing to the end of the list, the enqueue operation first moves Tail to the last node and then appends a new node as before. The dequeue operation, on the other hand, removes nodes from the front of the queue. To do so, it first reads the data of the second node in the queue (the first one is a dummy node) and then swings Head to the subsequent node. Additionally, dequeues ensure that Head does not overtake Tail. Hence, a dequeue operation may have to move Tail towards the end of the list before it moves Head. It is worth pointing out that threads read from the queue without synchronization. Updates synchronize on single memory words using atomic Compare-And-Swap (CAS).

In terms of memory management, the queue is flawed. It leaks memory because dequeued nodes are not reclaimed. A naive fix for this leak would be to uncomment the delete head statement in Figure 2. However, other threads may still hold and dereference pointers to the then deleted node. Such use-after-free dereferences are unsafe. In C/C++, for example, the behavior is undefined and can result in a system crash due to a segfault.

struct HPRec { HPRec* next; Node* hp0; Node* hp1; };
shared HPRec* Records;
threadlocal HPRec* myRec;
threadlocal List<Node*> retiredList;
atomic init() { Records = NULL; }

void join() {
  myRec = new HPRec();
  while (true) {
    HPRec* rec = Records;
    myRec->next = rec;
    if (CAS(Records, rec, myRec))
      break;
  }
}

void part() {
  unprotect(0); unprotect(1);
}

void protect(Node* ptr, int i) {
  if (i == 0) myRec->hp0 = ptr;
  else if (i == 1) myRec->hp1 = ptr;
  else assert(false);
}

void unprotect(int i) {
  protect(NULL, i);
}

void retire(Node* ptr) {
  if (ptr != NULL)
    retiredList.add(ptr);
  if (*) reclaim();
}

void reclaim() {
  List<Node*> protectedList;
  HPRec* cur = Records;
  while (cur != NULL) {
    Node* hp0 = cur->hp0;
    Node* hp1 = cur->hp1;
    protectedList.add(hp0);
    protectedList.add(hp1);
    cur = cur->next;
  }
  for (Node* ptr : retiredList) {
    if (protectedList.contains(ptr))
      continue;
    retiredList.remove(ptr);
    delete ptr;
  }
}

Figure 3. Simplified hazard pointer implementation (Michael, 2002). Each thread is equipped with two hazard pointers. Threads can dynamically join and part. Note that the record used to store a thread’s hazard pointers is not reclaimed upon parting.

To avoid both memory leaks and unsafe accesses, programmers employ SMR algorithms like Hazard Pointers (HP) (Michael, 2002). An example HP implementation is given in Figure 3. Each thread holds an HPRec record containing two single-writer multiple-reader pointers hp0 and hp1. A thread can use these pointers to protect nodes it will subsequently access without synchronization. Put differently, a thread requests other threads to defer the deletion of a node by protecting it. The deferred deletion mechanism is implemented as follows. Instead of explicitly deleting nodes, threads retire them. Retired nodes are stored in a thread-local retiredList and await reclamation. Eventually, a thread tries to reclaim the nodes collected in its list. To that end, it reads the hazard pointers of all threads and copies them into a local protectedList. The nodes in the intersection retiredList ∩ protectedList cannot be reclaimed because they are still protected by some thread. The remaining nodes, retiredList ∖ protectedList, are reclaimed.

Note that the HP implementation from Figure 3 allows for threads to join and part dynamically. In order to join, a thread allocates an HPRec and appends it to a shared list of such records. Afterwards, the thread uses the hp0 and hp1 fields of that record to issue protections. Subsequent reclaim invocations of any thread are aware of the newly added hazard pointers since reclaim traverses the shared list of HPRec records. To part, threads simply unprotect their hazard pointers. They do not reclaim their HPRec record (Michael, 2002). The reason for this is that reclaiming would yield the same difficulties that we face when reclaiming in lock-free data structures, as discussed before.

To use hazard pointers with Michael&Scott’s queue we have to modify the implementation to retire dequeued nodes and to protect nodes that will be accessed without synchronization. The required modifications are marked by H in Figure 2. Retiring dequeued nodes is straightforward: dequeue calls retire(head) instead of delete head. Successfully protecting a node is more involved. A typical pattern to do this is implemented by the first lines of dequeue’s retry loop in Figure 2. First, a local copy head of the shared pointer Head is created. The node referenced by head is subsequently protected with protect(head, 0). Simply issuing this protection, however, does not have the intended effect. Another thread could concurrently execute reclaim from Figure 3. If the reclaiming thread has already computed its protectedList, then it does not see the later protection and thus may reclaim the node referenced by head. The subsequent check head != Head in Figure 2 safeguards the queue from such situations. It ensures that head has not been retired since the protection was issued (the reasoning is a bit more complicated; we discuss it in more detail in Section 5.3). Hence, no concurrent reclaim considers it for deletion. This guarantees that subsequent dereferences of head are safe. This pattern exploits a simple temporal property of hazard pointers, namely that a retired node is not reclaimed if it has been protected continuously since before the retire (Gotsman et al., 2013).
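Spelled out in isolation, the pattern reads as follows; this merely restates the prologue of dequeue from Figure 2 with comments and is not additional code.

Node* head = Head;            // read the shared pointer without synchronization
protect(head, 0);             // announce that head will be accessed
if (head != Head) continue;   // re-check: if Head changed, head may have been retired
                              // before the protection became visible, so retry
// from here on, dereferencing head is safe until the protection is dropped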

As we have seen, the verification of lock-free data structures becomes much more complex when considering SMR code. On the one hand, the data structure needs additional logic to properly use the SMR implementation. On the other hand, the SMR implementation is complex in itself. It is lock-free (as otherwise the data structure would not be lock-free) and uses multiple lists.

Our contributions make the verification tractable. First, we suggest a compositional verification technique which allows us to verify the data structure and the SMR implementation separately. Second, we reduce the impact of memory management for the two new verification tasks. We found that both contributions are required to automate the verification of data structures like Michael&Scott’s queue with hazard pointers.

2.1. Compositional Verification

We propose a compositional verification technique. We split up the single, monolithic task of verifying a lock-free data structure together with its SMR implementation into two separate tasks: verifying the SMR implementation and verifying the data structure implementation without the SMR implementation. At the heart of our approach is a specification of the SMR behavior. Crucially, this specification has to capture the influence of the SMR implementation on the data structure. Our main observation is that it has none, as we have seen conceptually in Figure 1 and practically in Figures 3 and 2. More precisely, there is no direct influence. The SMR algorithm influences the data structure only indirectly through the underlying allocator: the data structure passes to-be-reclaimed nodes to the SMR algorithm, the SMR algorithm eventually reclaims those nodes using free of the allocator, and then the data structure can reuse the reclaimed memory with malloc of the allocator.

In order to come up with an SMR specification, we exploit the above observation as follows. We let the specification define when reclaiming retired nodes is allowed. Then, the SMR implementation is correct if the reclamations it performs are a subset of the reclamations allowed by the specification. For verifying the data structure, we use the SMR specification to over-approximate the reclamation of the SMR implementation. This way we over-approximate the influence the SMR implementation has on the data structure, provided that the SMR implementation is correct. Hence, our approach is sound for solving the original verification task.

Towards lightweight SMR specifications, we rely on the insight that SMR implementations, despite their complexity, implement rather simple temporal properties (Gotsman et al., 2013). We have already seen that hazard pointers implement that a retired node is not reclaimed if it has been protected continuously since before the retire. These temporal properties are incognizant of the actual SMR implementation. Instead, they reason about those points in time when a call of an SMR function is invoked or returns. We exploit this by having SMR specifications judge when reclamation is allowed based on the history of SMR invocations and returns.

Figure 4. Automaton specifying negative HP behavior for a given thread, address, and hazard pointer index. It states that if the address was protected by the thread using that hazard pointer before the address is retired (by any thread), then freeing the address must be deferred. Here, "must be deferred" is expressed by reaching a final state upon a free of the address.

For the actual specification we use observer automata. A simplified specification for hazard pointers is given in Figure 4. The automaton is parametrized by a thread, an address, and a hazard pointer index. Intuitively, it specifies when the hazard pointer with that index, owned by that thread, forces a free of the address to be deferred. Technically, the automaton reaches an accepting state if a free is performed that should have been deferred. That is, we let observers specify bad behavior. We found this easier than formalizing the good behavior. For an example, consider the two histories sketched below.
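The following two event sequences are our own rendering; the event notation (in for invocations, re for returns) is an assumption for illustration and not taken verbatim from the paper. In the first history the protection of address a by thread t returns before a is retired; in the second, a is retired before the protection returns.

h1 = in:protect(t, a, 0) · re:protect(t) · in:retire(t', a) · free(a)
h2 = in:protect(t, a, 0) · in:retire(t', a) · re:protect(t) · free(a)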

The first history leads to an accepting state. Indeed, that the address is protected before being retired forbids a free of it. The second history does not lead to an accepting state because the retire is issued before the protection has returned. The free of the address can be observed if the threads are scheduled in such a way that the protection is not visible while the reclaiming thread collects the protections into its protectedList, as in the scenario described above that motivates the re-check of Head in Figure 2.

Now, we are ready for compositional verification. Given an observer, we first check that the SMR implementation is correct wrt. that observer. Second, we verify the data structure. To that end, we strip away the SMR implementation and let the observer execute the frees. More precisely, we non-deterministically free those addresses which are allowed to be freed according to the observer.
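The following sketch illustrates, in the pseudocode style of Figure 3, how such observer-guarded frees can be thought of. The names observer, history, allocatedAddresses, and wouldAccept are hypothetical and only serve the illustration; they are not part of the paper's formal development.

// Environment step: non-deterministically free addresses the observer allows.
for (Address a : allocatedAddresses) {
  if (observer.wouldAccept(history + free(a)))
    continue;         // freeing a now would violate the SMR specification
  if (*) free(a);     // otherwise the environment may free a (but need not)
}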

Theorem 2.1 (Proven by Theorem 4.2).

Let a data structure use an SMR implementation, and let an observer specify that SMR algorithm. If the SMR implementation is correct wrt. the observer, and if the data structure is correct when verified against the observer (instead of the SMR implementation), then the data structure using the SMR implementation is correct.

A thorough discussion of the illustrated concepts is given in Section 4.

2.2. Taming Memory Management for Verification

Factoring out the implementation of the SMR algorithm and replacing it with its specification reduces the complexity of the data structure code under scrutiny. What remains is the challenge of verifying a data structure with manual memory management. As suggested by Abdulla et al. (2013); Haziza et al. (2016) this makes the analysis scale poorly or even intractable. To overcome this problem, we suggest to perform verification in a simpler semantics. Inspired by the findings of the aforementioned works we suggest to avoid reallocations as much as possible. As a second contribution we prove the following theorem.

Theorem 2.2 (Proven by Theorem 5.20).

For a sound verification of safety properties it suffices to consider executions where at most a single address is reused.

The rationale behind this theorem is the following. From the literature we know that avoiding memory reuse altogether is not sound for verification (Michael and Scott, 1996). Put differently, correctness under garbage collection (GC) does not imply correctness under manual memory management (MM). The difference of the two program semantics becomes evident in the ABA problem. An ABA is a scenario where a pointer referencing an address A is changed to point to another address B and then changed back to A again. Under MM a thread might erroneously conclude that the pointer has never changed if the intermediate value B was not seen due to a certain interleaving. Typically, the root of the problem is that address A is removed from the data structure, deleted, reallocated, and reenters the data structure. Under GC, the exact same code does not suffer from this problem. A pointer referencing A would prevent it from being reused.
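As an illustration of the problem (our own example, using a Treiber-style stack rather than the queue from Figure 2), consider a pop operation that is interrupted between reading the top of the stack and its CAS:

Node* top = Top;           // thread reads Top == A
Node* next = top->next;    // next == B
// Meanwhile, other threads pop A and B, free A, and push a node that reuses
// address A: now Top == A again, but A->next is no longer B and B is freed.
CAS(&Top, top, next);      // succeeds although the stack changed: the freed
                           // node B is re-installed as the top of the stack

Under garbage collection the same code is safe: the local pointer top prevents A from being reused, so the CAS fails.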

From this we learn that avoiding memory reuse does not allow for a sound analysis due to the ABA problem. So we want to check, with little overhead compared to a GC analysis, whether or not the program under scrutiny suffers from the ABA problem. If not, correctness under GC implies correctness under MM. Otherwise, we reject the program as buggy.

To check whether a program suffers from ABAs it suffices to check for first ABAs. Fixing the address of such a first ABA allows us to avoid reuse of any address except that one while retaining the ability to detect the ABA. Intuitively, this is the case because the first ABA is the first time the program reacts differently on a reused address than on a fresh address. Hence, replacing reallocations with allocations of fresh addresses before the first ABA retains the behavior of the program.

A formal discussion of the presented result is given in Section 5.

3. Programs with Safe Memory Reclamation

We define shared-memory programs that use an SMR library to reclaim memory. Here, we focus on the program. We leave unspecified the internal structure of SMR libraries (typically, they use the same constructs as programs); our development does not depend on it. We show in Section 4 how to strip away the SMR implementation for verification.

Memory

A memory is a partial function which maps pointer expressions to addresses and data expressions to data values, respectively. A pointer expression is either a pointer variable or a pointer selector; similarly, a data expression is either a data variable or a data selector. The selectors of an address are its next and data fields; a generalization to arbitrary selectors is straightforward. A dedicated value denotes undefined/uninitialized pointers. An address is in-use if it is referenced by some pointer variable or if one of its selectors is defined. For a formal definition of the in-use addresses of a memory refer to Section C.1.
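A possible formal rendering of this definition, with set names of our own choosing (PVar and DVar for pointer and data variables, Adr for addresses, Dom for data values, ⊥ for undefined pointers), reads as follows; the paper's own symbols may differ.

PExp = PVar ∪ (Adr × {next})      DExp = DVar ∪ (Adr × {data})
m : PExp ∪ DExp ⇀ Adr ∪ {⊥} ∪ Dom   with m(PExp) ⊆ Adr ∪ {⊥} and m(DExp) ⊆ Dom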

Programs

Figure 5. The syntax of atomic commands over data variables and pointer variables. Besides these commands, SMR implementations may use further, unspecified commands.

We consider computations of a data structure using an SMR library. A computation is a sequence of actions. An action records the executing thread, the executed command, and the resulting memory update. An action stems either from executing the data structure or from executing functions offered by the SMR library.

The commands are given in Figure 5. The commands of the data structure include assignments, memory accesses, assertions, and allocations with the usual meaning. We make explicit when a thread enters and exits a library function, using enter and exit actions, respectively. That is, we assume that computations are well-formed in the sense that, between a thread's enter and the matching exit, the thread executes only commands of the SMR library and no commands of the data structure. Besides deallocations, we leave the commands of the SMR library unspecified.

The memory resulting from a computation is defined inductively over its updates. Initially, pointer variables are uninitialized and data variables are default initialized. The memory after a computation is then obtained by applying, in order, the updates of its actions; an update changes the values of the assigned expressions and leaves all other expressions unchanged.

Semantics

The semantics of a program is defined relative to a set of addresses that may be reused; it is the set of allowed executions. To make the semantics precise, we distinguish the addresses that have never been allocated (fresh) from those that have been freed since their last allocation. (See Section C.1 for formal definitions.) The semantics is then defined inductively. In the base case we have the empty execution. In the induction step, an execution can be extended by an action if one of the following rules applies.

(Malloc):

If the executed command is an allocation. The allocated address must be fresh or available for reuse, that is, it has either never been allocated or it has been freed since its last allocation and belongs to the set of addresses that may be reused. The receiving pointer is set to that address; the data value of the allocated cell is arbitrary.

(FreePtr):

If the executed command frees the address referenced by the given pointer.

(Enter):

If the executed command is the enter of an SMR function and its actual arguments are defined in the current memory.

(Exit):

If the executed command is the exit of an SMR function.

(Assign1) to (Assign6):

If the executed command is one of the assignment forms of Figure 5 between pointer variables, pointer selectors, data variables, and data selectors; the right-hand side must be defined and the update assigns its value to the left-hand side.

(Assert):

If the executed command is an assertion and the asserted condition evaluates to true in the current memory.

We assume that computations respect the control flow (program order) of threads. Each computation determines the control location that every thread reaches. We deliberately leave the details unspecified, as we will only express properties stating that two computations lead to the same control locations, meaning the threads can execute the same commands afterwards.

4. Compositional Verification

Our first contribution is a compositional verification approach for data structures which use an SMR library. The complexity of SMR implementations makes the verification of data structure and SMR implementation in a single analysis prohibitive. To overcome this problem, we suggest to verify both implementations independently of each other. More specifically, we (i) introduce a means for specifying SMR implementations, then (ii) verify the SMR implementation against its specification, and (iii) verify the data structure relative to the SMR specification rather than the SMR implementation. If both verification tasks succeed, then the data structure using the SMR implementation is correct.

Our approach compares favorably to existing techniques. Manual techniques from the literature consider a monolithic verification task where both the data structure and the SMR implementation are verified together (Parkinson et al., 2007; Fu et al., 2010; Tofan et al., 2011; Gotsman et al., 2013; Krishna et al., 2018). Consequently, only simple implementations using SMR have been verified. Existing automated techniques rely on non-standard program semantics and support only simplistic SMR techniques (Abdulla et al., 2013; Haziza et al., 2016). Refer to Section 7 for a more detailed discussion.

Towards our result, we first introduce observer automata for specifying SMR algorithms. Then we discuss the two new verification tasks and show that they imply the desired correctness result.

Observer Automata

An observer automaton consists of observer locations, observer variables, and transitions. There is a dedicated initial location and some accepting locations. A transition connects two observer locations and is labeled with an event and a guard. Events consist of a type and a list of parameters. The guard is a Boolean formula over equalities of observer variables and the event's parameters. An observer state is a pair of a location and a valuation mapping the observer variables to values. Such a state is initial if its location is initial, and accepting if its location is accepting. An observer step from one state to another on an event is possible if there is a matching transition whose guard evaluates to true once the formal parameters are replaced with the actual values of the event and the observer variables are replaced by their values under the valuation. Initially, the valuation is chosen non-deterministically; it is not changed by observer steps.

A history is a sequence of events. A history is accepted by an observer if there is a sequence of observer steps on its events leading from an initial state to an accepting state. We use observers to characterize bad behavior. So we say a history is in the specification of an observer if it is not accepted by it; formally, the specification of an observer is the set of histories that it does not accept. The cross-product of two observers accepts a history if either of the two does; its specification is thus the intersection of the two specifications.
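As an illustration of these definitions, the following sketch shows one way an observer and its acceptance check could be represented in the pseudocode style of the figures; all type and member names (Event, Transition, Guard, Valuation, accepts) are ours and purely illustrative.

struct Event      { EventType type; List<Value> params; };
struct Transition { Location from; EventType type; Guard guard; Location to; };

struct Observer {
  Location initial;
  Set<Location> accepting;
  List<Transition> transitions;

  // A history is accepted if running it from the initial location reaches an
  // accepting location. The valuation of the observer variables is fixed up
  // front (chosen non-deterministically); here one concrete choice is passed in.
  bool accepts(List<Event> history, Valuation valuation) {
    Location loc = initial;
    for (Event e : history) {
      for (Transition t : transitions) {
        if (t.from == loc && t.type == e.type && t.guard.eval(valuation, e.params)) {
          loc = t.to;   // take the matching transition
          break;
        }
      }                 // if no transition matches, stay (implicit self-loop)
    }
    return accepting.contains(loc);
  }
}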

SMR Specifications

To use observers for specifying SMR algorithms, we have to instantiate appropriately the histories they observe. Our instantiation crucially relies on the fact that programmers of lock-free data structures rely solely on simple temporal properties that SMR algorithms implement (Gotsman et al., 2013). These properties are typically incognizant of the actual SMR implementation. Instead, they allow reasoning about the implementation’s behavior based on the temporal order of function invocations and responses. With respect to our programming model, enter and exit actions provide the necessary means to deduce from the data structure computation how the SMR implementation behaves.

We instantiate observers for specifying SMR as follows. As event types we use (i) invocations of the functions offered by the SMR algorithm, (ii) returns from such functions, and (iii) frees. The parameters of the events are (i) the executing thread and the parameters of the call for invocation events, (ii) the executing thread for return events, and (iii) the freed address for free events. A return event is matched uniquely to its invocation because both events contain the executing thread.

The first observer specifies that an address may be freed only if it has been retired and not been freed since. The second observer specifies when HP defers frees: a retired cell may not be freed if it has been protected continuously, by a given hazard pointer of a given thread, since before being retired.

Figure 6. Observers characterizing the histories that violate the Hazard Pointer specification. Three observer variables are used to observe a thread, an address, and an integer (the hazard pointer index). For better legibility we omit self-loops for every location and every event that is missing an outgoing transition from that location.

For an example, consider the hazard pointer specification from Figure 6. It consists of two observers. First, the base observer specifies that no address may be freed that has not been retired. Second, the HP observer implements the temporal property that no address may be freed if it has been protected continuously since before the retire. For the latter observer we assume that no address is retired multiple times before being reclaimed (freed) by the SMR implementation. This avoids handling such double-retire scenarios in the observer, keeping it smaller and simpler. The assumption is reasonable because in a setting where SMR is used a double-retire is the analogue of a double-free and thus avoided. Our experiments confirm this intuition.

With an SMR specification in the form of an observer at hand, our task is to check whether or not a given SMR implementation satisfies this specification. We do this by converting a computation of the implementation into its induced history and checking whether that history is in the specification. The induced history is a projection of the computation to enter, exit, and free actions; the projection replaces formal parameters with their actual values. For a formal definition, consider Section C.1. A computation satisfies the specification if its induced history does. The SMR implementation satisfies the specification if every possible usage of it produces a computation that satisfies the specification. To generate all such computations, we use a most general client (MGC) which concurrently executes arbitrary sequences of SMR functions.

Definition 4.1 (SMR Correctness).

An SMR implementation is correct wrt. a specification, given as an observer, if every computation of the implementation under the most general client induces a history that lies in the specification.

From the above definition follows the first new verification task: prove that the SMR implementation cannot possibly violate the specification. Intuitively, this boils down to a reachability analysis of accepting observer states in the cross-product of the MGC-driven implementation and the observer. Since the MGC together with the SMR implementation can be understood as a lock-free data structure itself, this task is similar to our next one, namely verifying the data structure relative to the SMR specification. In the remainder of the paper we focus on this second task because it is harder than the first one. The reason for this lies in that SMR implementations typically do not reclaim the memory they use. This holds true even if the SMR implementation supports dynamic thread joining and parting (Michael, 2002) (cf. part() from Figure 3). The absence of reclamation allows for a simpler and more efficient analysis: in terms of Section 5, it results in SMR implementations being free from pointer races and harmful ABAs since pointers do not become invalid, which intuitively allows us to combine our results with those of Haziza et al. (2016) and verify the SMR implementation in a garbage-collected semantics. We confirm this in our experiments, where we automatically verify the hazard pointer implementation from Figure 3 against its specification.
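The following sketch illustrates one possible shape of such a most general client, in the pseudocode style of Figure 3. The helper anyNode() for choosing an arbitrary argument is hypothetical, and the actual MGC used in the paper may differ; for simplicity, the sketch also does not enforce the assumption from above that an address is not retired twice before being freed.

// Each MGC thread performs an arbitrary sequence of SMR calls.
void mgcThread() {
  join();
  while (*) {                    // * denotes a non-deterministic choice
    Node* ptr = anyNode();       // arbitrary argument (hypothetical helper)
    if      (*) protect(ptr, 0);
    else if (*) protect(ptr, 1);
    else if (*) unprotect(0);
    else if (*) unprotect(1);
    else        retire(ptr);
  }
  part();
}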

Compositionality

The next task is to verify the data structure while avoiding the complexity of the SMR implementation. We have already established correctness of the implementation wrt. a specification. Intuitively, we now replace the implementation by the specification. Because the specification is an observer, and not program code, we cannot just execute it in place of the implementation. Instead, we remove the SMR implementation altogether: the resulting computations are those of the original program with the SMR-specific actions between enter and exit removed. To account for the frees that the SMR implementation executes, we introduce environment steps. We non-deterministically check for every address whether or not the observer allows freeing it; if so, we may free the address. Formally, the resulting semantics corresponds to the one defined in Section 3 plus a new rule for frees from the environment.

(Free):

If the observer allows freeing an address, i.e., the induced history extended with a free of that address is still in the specification, then the environment may free the address; the memory update is the same as for an ordinary free of that address.

Note that a free of an address from the environment has the same update as a free of a pointer referencing that address. With this definition, the data structure relative to the observer performs more frees than the data structure using the SMR implementation, provided the implementation is correct wrt. the observer.

With the semantics of data structures with respect to an SMR specification rather than an SMR implementation set up, we can turn to the main result of this section. It states that the correctness of the SMR implementation wrt. its specification and the correctness of the data structure relative to that specification entail the correctness of the original program. Here, we focus on the verification of safety properties. It is known that this reduces to control location reachability (Vardi, 1987). So we can assume that there is a dedicated bad control location whose unreachability is equivalent to the correctness of the program. To establish the result, we require that the interaction between the data structure and the SMR implementation follows the one depicted in Figure 1 and discussed on an example in Section 2. That is, the only influence the SMR implementation has on the data structure are frees. In particular, this means that the SMR implementation does not modify the memory accessed by the data structure. We found this restriction to be satisfied by many SMR algorithms from the literature. We believe that our development can be generalized to reflect memory modifications performed by the SMR algorithm. A proper investigation of the matter, however, is beyond the scope of this paper.

Theorem 4.2 (Compositionality).

Assume the SMR implementation is correct wrt. the observer. If the data structure relative to the observer is correct, then so is the data structure using the SMR implementation.

For the technical details of the above result, see Section C.2.

Compositionality is a powerful tool for verification. It allows us to verify the data structure and the SMR implementation independently of each other. Although this simplifies the verification, reasoning about lock-free programs operating on a shared memory remains hard. In Section 5 we build upon the above result and propose a sound verification which considers only executions reusing a single address.

5. Taming Memory Reuse

As a second contribution we demonstrate that one can soundly verify a data structure by considering only those computations where at most a single cell is reused. This avoids the need for a state space exploration of the full semantics with arbitrary reuse. Such explorations suffer from a severe state space explosion. In fact, we were not able to make our analysis from Section 6 go through without this second contribution. Previous works (Abdulla et al., 2013; Haziza et al., 2016; Holík et al., 2017) have not required such a result since they did not consider fully fledged SMR implementations like we do. For a thorough discussion of related work refer to Section 7.

Our results are independent of the actual safety property and the actual observer specifying the SMR algorithm. To achieve this, we establish that for every computation with arbitrary memory reuse there is a similar computation which reuses only a single address. We construct such a similar computation by eliding reuse in the original computation. With elision we mean that we replace in a computation a freed address with a fresh one. This allows a subsequent allocation to malloc the elided address fresh instead of reusing it. Our notion of similarity makes sure that in both computations the threads reach the same control locations. This allows for verifying safety properties.

The remainder of the section is structured as follows. Section 5.1 introduces our notion of similarity. Section 5.2 formalizes requirements on such that the notion of similarity suffices to prove the desired result. Section 5.3 discusses how the ABA problem can affect soundness of our approach and shows how to detect those cases. Section 5.4 presents the reduction result.

5.1. Similarity of Computations

Our goal is to mimic a computation where memory is reused arbitrarily with a computation where memory reuse is restricted. As noted before, we want the threads in both computations to reach the same control locations, so that safety properties of the former can be verified in the latter. We introduce a similarity relation among computations such that two similar computations can execute the same actions. This results in both reaching the same control locations, as desired. However, control location equality alone is insufficient to mimic subsequent actions, that is, to preserve similarity for subsequent actions. This is because most actions involve memory interaction. Since the mimicking computation reuses memory differently than the original one, the memories of the two computations are not equal. Similarity requires a non-trivial correspondence wrt. the memory. Towards a formal definition let us consider an example.

Example 5.1.

Consider a computation of a data structure using hazard pointers in which a thread uses a pointer to allocate an address. The address is then retired and freed. In a subsequent allocation, the thread acquires a second pointer to the same address; the address is reused.

If that address shall not be reused, then the exact same sequence of actions cannot be executed. However, the computation can be mimicked by a computation that coincides with it up to replacing the first allocation of the address with an allocation of another, fresh address. We say that the mimicking computation elides the reuse of the address. Note that the memories of the two computations differ on the first pointer and agree on the second.

In the above example, the first pointer is dangling. Programmers typically avoid using such pointers because it is unsafe. For a definition of similarity, this practice suggests that similar computations must coincide only on the non-dangling pointers and may differ on the dangling ones. To make precise which pointers in a computation are dangling, we use the notion of validity. That is, we define a set of valid pointers; the dangling pointers are then the complement of the valid pointers. We take this detour because we found it easier to formalize the valid pointers.

Initially, no pointer is valid. A pointer becomes valid if it receives its value from an allocation or another valid pointer. A pointer becomes invalid if its referenced memory location is deleted or it receives its value from an invalid pointer. A deletion of a memory cell invalidates its pointer selectors and all pointers to that cell. A subsequent reallocation of that cell makes valid only the receiving pointer; all other pointers to that cell remain invalid. An assertion of an equality between two pointers validates one of them if the other is valid, and vice versa. A formal definition of the valid pointers of a computation can be found in Section C.3.
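Read operationally, these rules amount to a small bookkeeping procedure along a computation. The following sketch is our own; PExp stands for pointer expressions, and pointersTo and selectorsOf are hypothetical helpers returning the pointer expressions that reference an address and the pointer selectors of that address, respectively.

Set<PExp> valid;   // the currently valid pointer expressions

void onAllocation(PExp receiver)      { valid.add(receiver); }   // only the receiver becomes valid
void onAssignment(PExp lhs, PExp rhs) {
  if (valid.contains(rhs)) valid.add(lhs);                       // validity flows from the right-hand side
  else valid.remove(lhs);
}
void onFree(Address a) {
  for (PExp p : pointersTo(a))  valid.remove(p);                 // all pointers to a become invalid,
  for (PExp s : selectorsOf(a)) valid.remove(s);                 // as do a's pointer selectors
}
void onAssertEqual(PExp p, PExp q) {
  if (valid.contains(p)) valid.add(q);                           // assertions transfer validity
  if (valid.contains(q)) valid.add(p);
}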

Example 5.2 (Continued).

In both computations from the previous example, the last allocation renders the second pointer valid. In the original computation, the free renders the first pointer invalid, and the subsequent reallocation of the same address does not change this: the first pointer remains invalid. In the mimicking computation, the elided address is allocated and freed, rendering the first pointer invalid; it remains invalid after the subsequent allocation. That is, both computations agree on the validity of the second pointer and the invalidity of the first. Moreover, they agree on the valuation of the valid second pointer and disagree (here by chance) on the valuation of the invalid first pointer.

The above example illustrates that eliding reuse of memory leads to a different memory valuation. However, the elision can be performed in such a way that the valid memory is not affected. So we say that two computations are similar if they agree on the resulting control locations of threads and the valid memory. The valid memory includes the valid pointer variables, the valid pointer selectors, the data variables, and the data selectors of addresses that are referenced by a valid pointer variable/selector. Formally, this is a restriction of the entire memory to the valid pointers.

Definition 5.3 (Restrictions).

A restriction of a memory to a set of expressions is the partial function that agrees with the memory on that set and is undefined on all other expressions.

We are now ready to formalize the notion of similarity among computations. Two computations are similar if they agree on the control location of threads and the valid memory.

Definition 5.4 (Computation Similarity).

Two computations are similar if the threads reach the same control locations and the restrictions of the resulting memories to the valid pointers coincide.

If two computations are similar, then each action enabled after the first can be mimicked in the second. The mimicking action agrees with the mimicked one on the executing thread and the executed command but may differ in the memory update. The reason for this is that similarity does not relate the invalid parts of the memory, which may give another update if the command involves invalid pointers.

Example 5.5 (Continued).

Consider continuing both computations by appending an assignment of the invalid first pointer to itself. The two prefixes are similar. Nevertheless, the appended memory updates differ because they involve the valuation of the invalid pointer, which differs in the two computations. Since the assignment leaves the pointer invalid, similarity is preserved by the appended actions. We say that the action appended to the mimicking computation mimics the one appended to the original computation.

Altogether, similarity does not guarantee that the exact same actions are executable. It guarantees that every action can be mimicked such that similarity is preserved.

In the above we omitted an integral part of the program semantics. Memory reclamation is not based on the control location of threads but on an observer examining the history induced by a computation. The enabledness of a free is not preserved by similarity. On the one hand, this is due to the fact that invalid pointers can be (and in practice are) used in SMR calls, which leads to different histories. On the other hand, similar computations end up in the same control location but may perform different sequences of actions to arrive there, for instance, execute different branches of conditionals. That is, to mimic free actions we need to correlate the behavior of the observer rather than the behavior of the program. We motivate the definition of an appropriate relation.

Example 5.6 (Continued).

Consider continuing both computations such that the thread issues a protection and a retirement, one call using the valid second pointer and the other using the invalid first pointer.

Recall that the two computations are similar. Similarity guarantees that the events of the call using the valid pointer coincide. The events of the call using the invalid pointer differ because the valuations of that pointer differ in the two computations. That is, SMR calls do not necessarily emit the same event in similar computations. Consequently, the observer states after the two computations differ. More precisely, the hazard pointer observer from Figure 6 prevents the address from being freed after the original computation (a free would lead to its final state and is thus not enabled) but allows for freeing it after the mimicking computation.

The above example shows that eliding memory addresses to avoid reuse changes observer runs. The affected runs involve freed addresses. Like for computation similarity, we define a relation among computations which captures the observer behavior on the valid addresses, i.e., those addresses that are referenced by valid pointers, and ignores all other addresses. Here, we do not use an equivalence relation. That is, we do not require observers to reach the exact same state for valid addresses. Instead, we express that the mimicking computation allows for more observer behavior on the valid addresses than the mimicked one does. We define an observer behavior inclusion among computations. This is motivated by the above example. There, the address in question is valid because it is referenced by the valid pointer. Yet the observer runs for that address differ in the two computations: after the mimicking computation more behavior is possible; it can free the address while the original computation cannot.

To make this intuition precise, we need a notion of behavior on an address. Recall that the goal of the desired behavior inclusion is to enable us to mimic frees. Intuitively, the behavior allowed on an address after a computation is the set of those histories that lead to a free of that address.

Definition 5.7 (Observer Behavior).

The behavior allowed on an address after a history is the set of continuation histories, consisting only of events for that address, that extend the history within the specification and lead to a free of that address.

Note that the behavior set contains events for the address in question only. This is necessary because an address may become invalid before being freed if, for instance, the address becomes unreachable from valid pointers. The mimicking computation may have already freed such an address while the original one has not, despite similarity. Hence, the free is no longer allowed after the former but still possible after the latter. To prevent such invalid addresses from breaking the desired inclusion on valid addresses, we strip away all frees that do not target the address in question. Note that we do not even retain frees of valid addresses here. This way, only actions which emit an event for that address influence its behavior set.

The observer behavior inclusion among computations is defined such that the mimicking computation includes at least the behavior of the mimicked one on the valid addresses, i.e., the addresses referenced by valid pointer expressions.

Definition 5.8 (Observer Behavior Inclusion).

A computation includes the (observer) behavior of another computation if, for every address that is valid in the latter, the behavior allowed on that address after the latter is contained in the behavior allowed after the former.

5.2. Preserving Similarity

The development in Section 5.1 is idealized. There are cases where the introduced relations do not guarantee that an action can be mimicked. All such cases have in common that they involve the usage of invalid pointers. More precisely, (i) the computation similarity may not be strong enough to mimic actions that dereference invalid pointers, and (ii) the observer behavior inclusion may not be strong enough to mimic calls involving invalid pointers. For each of those cases we give an example and restrict our development. We argue throughout this section that our restrictions are reasonable. Our experiments confirm this. We begin with the computation similarity.

Example 5.9 (Continued).

Consider a further continuation of the two computations. The first appended action updates a pointer in both computations; since the assigned value stems from a valid pointer after both computations, the assignment renders the receiving pointer valid. The second action involves the invalid pointer, whose valuation differs in the two computations: in the original computation it renders the receiving expression invalid, because the right-hand side of the assignment is invalid, while in the mimicking computation the update stems from a valid pointer, so the receiving expression remains valid. That is, the valid memories of the two computations no longer coincide.