Generating Concurrent Programs From Sequential Data Structure Knowledge Using Answer Set Programming

09/17/2021
by   Sarat Chandra Varanasi, et al.

We tackle the problem of automatically designing concurrent data structure operations given a sequential data structure specification and knowledge about concurrent behavior. Designing concurrent code is a non-trivial task even in the simplest of cases. Humans often design concurrent data structure operations by transforming sequential versions into their respective concurrent versions. This requires an understanding of the data structure, its sequential behavior, thread interactions during concurrent execution and shared memory synchronization primitives. We mechanize this design process using automated commonsense reasoning. We assume that the data structure description is provided as axioms alongside the sequential code of its algebraic operations. This information is used to automatically derive concurrent code for that data structure, such as dictionary operations for linked lists and binary search trees. Knowledge in our case is expressed using Answer Set Programming (ASP), and we employ deduction and abduction, just as humans do, in the reasoning involved. ASP allows for succinct modeling of first order theories of pointer data structures, run-time thread interactions and shared memory synchronization. Our reasoner can systematically make the same judgments as a human reasoner, while constructing provably safe concurrent code. We present several reasoning challenges involved in transforming the sequential data structure into its equivalent concurrent version. All the reasoning tasks are encoded in ASP and our reasoner can make sound judgments to transform sequential code into concurrent code. To the best of our knowledge, our work is the first to use commonsense reasoning to automatically transform sequential programs into concurrent code. We have also developed a tool, described in this paper, that relies on state-of-the-art ASP solvers and performs the reasoning tasks involved to generate concurrent code.


1 Introduction

We present a novel technique based on answer set programming that automatically generates concurrent programs for pointer data structures given a first order (logic) data structure theory, background knowledge about its sequential operations and axioms for concurrency. Design of concurrent operations for data structures is non-trivial, and several challenges need to be addressed to generate concurrent algorithms automatically. Traditionally, concurrent programs are designed manually and their proofs of correctness are done by hand. A few concurrent data structures have also been verified using symbolic bounded model checking [vechev2010abstraction]. Avoiding state space explosion in the verification of concurrent programs is the main challenge for symbolic model checkers. Several works address this issue in interesting ways [emerson2000reducing, vechev2008deriving]. Other formal approaches involve performing Hoare-style rely-guarantee reasoning [vafeiadis2006proving] to verify concurrent programs that have been manually designed. These approaches thus seek the help of automated verification in an otherwise manual design process to ensure correctness. In contrast, our approach leverages reasoning techniques employed in AI, knowledge about concurrency, and explicitly modeled sequential data structure code to automatically derive a safe concurrent program. Work in model checking and formal logics for concurrency does not exploit the sequential data structure knowledge. Its main focus is to prove the absence of incorrect thread interactions (or traces). The proof of correctness of the verified concurrent code is provided outside these frameworks, assuming certain symmetry properties on concurrent interactions. Our work, in contrast, performs the reasoning tasks that an expert in concurrent program design performs in order to construct a safe concurrent program.

This requires an understanding of the data structure representation, the library of algebraic operations that modify the data structure, and an understanding of shared memory and of how primitive read and write operations affect it. Additionally, the expert can explicitly describe the desired safety conditions and the invariants that need to be preserved during concurrent execution. With this knowledge, the expert creates a concurrent program that acquires the “right” number of locks (synchronization steps) and is safe for any concurrent interaction with an unbounded number of threads. The work reported here aims to automate this process, thereby deriving concurrent code automatically.

Our work can be seen as applying automated (commonsense) reasoning [mccarthy1960programs] to the program synthesis problem. To the best of our knowledge, ours is the first effort that attempts to emulate the mind of a human domain expert who designs concurrent data structures using sequential ones as a starting point. Our goal is that once axioms for concurrency, etc., have been developed by a domain expert, a programmer can simply specify a new sequential data structure and automatically obtain its correct concurrent version.

We make extensive use of answer set programming (ASP) to model the domain expert’s thought process. We assume basic familiarity with stable model semantics and ASP solvers [gebser2016theory]. More details about ASP can be found elsewhere [gelfond1988stable, gelfond2014knowledge]. Section 2 gives background on various notions associated with data structures and concurrency. It also provides an example of a data structure definition and its sequential program knowledge, and illustrates how this knowledge is used to translate the sequential code into concurrent code. The reasoning steps involved in the transformation are shown explicitly in a side-by-side comparison. Section 3 introduces general challenges involved in transforming a sequential data structure into its concurrent version. Section 4 introduces our technique and explains thread interference, predicate falsification and the role of locks in preserving data structure invariants. Section 5 turns these ideas into several theories that make the reasoning tasks executable. These theories are ASP programs that check predicate falsification and infer thread synchronization (using locks). They also infer the order of program steps needed in a concurrent program to preserve data structure invariants. Section 6 provides the soundness proof of our approach. Section 7 gives details of our implemented tool, Locksynth. Section 8 concludes with our experimental setup and future work.

2 Background and General Notions

2.1 Data Structures

Data structures include some representation of information and the dictionary operations associated with it, such as membership, insert and delete. Representation itself involves several notions at various levels of abstraction. For example, to describe a linked list, one needs primitive notions of nodes contained in memory, connected by a chain of edges. Further, there are notions of reachability (or unreachability) of nodes and of keys being present (or absent) in a list. A membership operation usually involves traversing the elements (or nodes) of the data structure until one or more elements satisfying certain criteria are found. An insert operation (and similarly a delete operation) also involves traversing the data structure until the right “window” of insertion is found. The notion of window represents some local fragment of the data structure that is modified as part of a data structure update operation. This notion is useful when discussing the locking of nodes in concurrent programs.

Tree-Based Pointer Data Structures. A heap is a collection of nodes connected by edges. A data structure D is a recursive definition defining a tree of nodes in memory. The recursive definition for D can be viewed as a constructor of various instances of the data structure. Further, the only primitive destructive operation that may be performed is linkage of pointers: link(x, y). The abstract relation link(x, y) links node x to node y in the heap. We support transformation into concurrent code for an algebraic operation associated with D, provided the operation may be performed in a constant number of link operations. For instance, the insert operation for linked lists can be performed in two steps.
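As a minimal sketch of the link primitive described above, the following Python models a heap as a mapping from each node to its successor. The class name `Heap`, its methods and the node names are illustrative assumptions, not taken from the paper. Only the idea that a list insert takes two link steps comes from the text.

```python
class Heap:
    """Toy heap (hypothetical model): every node has at most one outgoing edge."""

    def __init__(self):
        self.next = {}  # node name -> successor node name

    def link(self, x, y):
        """The only primitive destructive update: make x point to y."""
        self.next[x] = y

    def chain(self, start):
        """Nodes reachable from start by following edges, in order."""
        out, node = [], start
        while node is not None:
            out.append(node)
            node = self.next.get(node)
        return out


heap = Heap()
heap.link("h", "x")
heap.link("x", "y")
heap.link("y", "t")

# Sequential insert of 'target' between x and y: exactly two link steps.
heap.link("x", "target")
heap.link("target", "y")
print(heap.chain("h"))
```

Between the two link steps the tail node `t` is temporarily unreachable from `h`, which is harmless sequentially but not under concurrency.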

2.2 Concurrent Data Structures

Concurrent data structures usually support dictionary operations that are invoked by an unbounded number of interacting threads. They are nothing but multiprocessor programs. We assume only a sequentially consistent shared memory model in this paper. Sequentially consistent memory makes any update performed on the shared memory visible to every thread in the system before a subsequent read is performed. Concurrency can be viewed as a sequence of interleaved steps taken by various threads in the system. To make sense of the correctness of concurrent data structures, the notion of linearizability [herlihy2011art] is widely used. A concurrent data structure is termed linearizable if the effects of concurrent modification by several threads can be viewed as if the concurrent operations were performed in some sequential order. In this paper, we study the modifications performed on a data structure as if they respect a given sequential algorithm. This allows us to model concurrency in an intuitive manner and sidesteps the need to understand traces. This assumption is sufficient to generate safe concurrent programs.

2.3 Data Structure Theory and Knowledge

We assume that a first order theory is provided for a pointer data structure along with the sequential data structure knowledge (see Fig. 1). We use the theory for linked lists and its knowledge as a running example in this paper. The technique, however, applies to all tree-based pointer data structures. The data structure theory defines linked lists as a chain of edges with special sentinel nodes at the head and at the end of the list. The meaning of the reachability and key-presence predicates is straightforward.

2.4 Sequential Data Structure Knowledge

The knowledge specifies which predicates are time-dependent (fluents). It also names the distinguished nodes of a data structure traversal, such as the start and end nodes, with every traversal beginning from the start node. It also contains the pre/post-conditions of the insert and delete operations for linked lists. The primitive write step is denoted by the link (link-pointer) operation, and its effects are described using the causes relation. The knowledge is useful for two purposes: 1. It bounds the interference effects of arbitrary thread interactions in a concurrent execution. 2. It narrows the code blocks that need to be synchronized to obtain a concurrent algorithm. However, as we present next, there are several challenges in transforming the steps of the insert operation (Fig. 2) into a concurrent version.

The program statements are encoded within the vocabulary of the data structure using answer set programming (ASP). A program block can be viewed as an equivalence class of input-output transformations. Further, the program blocks perform destructive update operations on the data structure (insert/delete). We assume program blocks are straight-line programs. If the sequential program has multiple blocks, the conditions under which the blocks may be executed should be mutually exclusive. In other words, every precondition uniquely determines its associated program block. Given the data structure definition, it is straightforward to generate data structure instances that satisfy a given equivalence class, because the assumed data structure definition is recursive. The recursive definition for a data structure D can enumerate the set of all structurally isomorphic instances of D. This set can be ordered by the number of recursion unfoldings used to generate the instances, starting from the least number of unfoldings. For two instances d1 and d2, d1 < d2 implies that d1 is a “smaller” structure than d2 and appears before d2 in the recursion depth ordering.

Figure 1: Data Structure Theory and Knowledge. The figure has two panels. Original Theory: list structural definition, reachability definition, keys-present definition. Data Structure Knowledge: fluents, start and end nodes; pre/post-conditions and program steps for insert; the primitive destructive update step; pre/post-conditions and program steps for delete.

2.5 Illustration of lazy synchronization for Concurrent Linked List Insert

Consider the insert operation of a linked list. We insert a target node (key) in a position that preserves the linked list order. The left-hand side of Fig. 2 shows the sequential code, annotated with the necessary pre-condition and post-conditions. The list invariant is violated after step 1 and restored in step 2. The equivalent concurrent code is shown on the right. In a concurrent setting, the invariant cannot be violated at any time, because a membership operation must never encounter a broken list. Therefore, the order of the program steps matters in a concurrent execution. Also, the right locks have to be acquired to perform safe destructive updates. Figuring out the right nodes to lock requires considerable domain expertise and understanding of the data structure. Further, acquiring the precise locks is still insufficient. As seen in the figure, there is an extra validation of the reachability of the nodes after lock acquisition. This is because, by the time the lock is acquired on a node, that node may have been removed from the list. Hence, we do the reachability check. This technique is widely known as lazy synchronization [herlihy2011art].


Sequential code (left):

  1. x.next := target
  2. target.next := y

Concurrent code (right):

  lock(x)
  lock(y)
  if validate(…)
      target.next := y
      x.next := target

Legend: nodes identified to lock; numbered destructive update steps; unfalsifiable predicate; concurrency invariant.

Figure 2: Steps Involved in Transforming Sequential Linked List to Lazy Concurrent List
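The lazy synchronization pattern illustrated above can be sketched in Python as follows. This is a hedged illustration rather than the code generated by the authors' tool: the names `Node`, `make_list`, `validate`, `insert` and `keys` are assumptions, and the sketch covers insert only, with validation done by a reachability check as the text describes.

```python
import threading


class Node:
    def __init__(self, key):
        self.key = key
        self.next = None
        self.lock = threading.Lock()


def make_list(keys_in):
    """Build a sorted list with sentinel head (-inf) and tail (+inf)."""
    head, tail = Node(float("-inf")), Node(float("inf"))
    node = head
    for k in sorted(keys_in):
        node.next = Node(k)
        node = node.next
    node.next = tail
    return head


def reachable(head, node):
    cur = head
    while cur is not None:
        if cur is node:
            return True
        cur = cur.next
    return False


def validate(head, x, y):
    """Post-lock check: x is still in the list and the window (x, y) is intact."""
    return reachable(head, x) and x.next is y


def insert(head, key):
    """Lazy-synchronization insert; returns True if the key was added."""
    while True:
        # Lock-free traversal to the window with x.key < key <= y.key.
        x, y = head, head.next
        while y.key < key:
            x, y = y, y.next
        with x.lock, y.lock:
            if not validate(head, x, y):
                continue  # window changed while acquiring locks: retry
            if y.key == key:
                return False
            target = Node(key)
            target.next = y  # first write: the list is never broken
            x.next = target  # second write: publishes the new node
            return True


def keys(head):
    out, cur = [], head.next
    while cur.next is not None:
        out.append(cur.key)
        cur = cur.next
    return out


head = make_list([10, 30])
threads = [threading.Thread(target=insert, args=(head, k)) for k in (20, 5, 25, 20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(keys(head))
```

Locks are always taken in list order (x before y), which is what keeps this insert-only sketch free of deadlock.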

3 Sequential Data Structures Code to Concurrent Code: Challenges

We assume that the traversal code remains the same as the sequential version for a lock-based concurrent data structure. Therefore, the challenges we discuss are purely for destructive update program steps. We present the challenges involved and how they are addressed in turn.

3.1 Order of the Program Steps Matters

We have already shown that the order of the program steps matters. However, in general, it is possible that no ordering of steps preserves an invariant in a concurrent execution. For instance, the internal BST invariant cannot be maintained by any order of the program steps involved in inserting a node into, or deleting a node from, the BST. In such a case, the designer uses the Read-Copy-Update (RCU) [mckenney2020rcu] technique to copy the window, perform changes locally (outside shared memory) and atomically splice the window back into the shared memory. The RCU technique depends on the ability to splice the window back atomically. For tree data structures, if the window is a sub-tree, then it is easy to atomically splice the sub-tree into shared memory by updating its parent pointer in the shared memory. The applicability of the RCU framework can either be made explicit in the data structure knowledge, or should otherwise be inferrable from the knowledge of the data structure representation/operations.
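The copy, modify and splice steps of RCU can be sketched as follows. This is a simplified illustration under stated assumptions: `TreeNode`, `rcu_update` and `insert_key` are hypothetical names, reclamation of the old window is ignored, and the final pointer write is assumed to be atomic.

```python
import copy


class TreeNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


def insert_key(node, key):
    """Plain sequential BST insert, applied to a private copy only."""
    if node is None:
        return TreeNode(key)
    if key < node.key:
        node.left = insert_key(node.left, key)
    elif key > node.key:
        node.right = insert_key(node.right, key)
    return node


def rcu_update(parent, side, modify):
    """Copy the window under parent.<side>, modify the copy, then splice it in."""
    window = copy.deepcopy(getattr(parent, side))  # private copy of the sub-tree
    new_window = modify(window)                    # invisible to readers so far
    setattr(parent, side, new_window)              # single pointer write publishes it


root = TreeNode(50, TreeNode(20, TreeNode(10), TreeNode(30)), TreeNode(70))
old_window = root.left  # a reader may still be traversing this sub-tree
rcu_update(root, "left", lambda w: insert_key(w, 25))
print(inorder(root), inorder(old_window))
```

A reader that entered the old window before the splice keeps seeing a consistent, unmodified sub-tree, while new readers see the updated one.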

3.2 Concurrent Traversal may require RCU

As we mentioned before, our assumption is that the transformation for concurrent membership operations is vacuous (i.e., the membership operation is unchanged in a concurrent setting). This ensures that membership queries execute as fast as possible while acquiring no locks. However, for the membership operation to work consistently with insert/delete, the code for the insert and delete operations must cooperate with it. We illustrate this with an internal BST example. Consider the internal BST shown in Fig. 3 and assume a thread is about to delete the node l. The inorder successor of l is lrl. It is clear that the delete operation should lock all the nodes on the path from l to lrl (inclusive). However, this locking scheme is inadequate, although it modifies the data structure in a consistent manner. The problem lies outside the code of the delete operation itself: it surfaces with a concurrent membership operation looking for node lrl. Due to node (and hence key) movement, it is possible for the traversal code to miss lrl. Again, this scenario needs to be inferred from the data structure knowledge. Due to arbitrary key movement, a concurrent membership operation might claim that it does not see a node as part of the data structure when it is in fact part of the structure. We use the RCU framework in such a case.

3.3 Proving correctness of a Concurrent Algorithm

Proving that a set of algebraic operations is thread-safe may involve several proof obligations in general [vafeiadis2006proving]. For linearizable data structures, it is sufficient to show that every execution of the insert, delete and membership operations is equivalent to some sequential execution [herlihy2011art]. This implies that the pre/post-condition invariants associated with the sequential algorithm are never violated in any concurrent execution. That is, when a thread is modifying the data structure with respect to an algebraic operation, it is the only agent in the system modifying the necessary fragment of the data structure. A domain expert who does these proofs by hand identifies all the destructive update steps performed by each operation. Then, she ensures that if the correct shared memory variables are locked, any potential interference from other operations does not violate the invariants associated with the destructive update steps of the subject operation. Our reasoner performs these proof obligations the way a domain expert would, given the same data structure knowledge and representation.

4 Transform Sequential Data Structures to Concurrent Data Structures

4.1 Modeling Thread Interference

A concurrency domain expert views interference as arbitrary mutations that can occur on the data structure. We argue that this model is sufficient to discover any undesired thread interactions. The sufficiency of the interference model stems from reasoning about interference effects in terms of the equivalence classes of the sequential algorithm. This feature is usually not present in a concurrent program verification task performed via model checking. However, model checkers may also be instrumented with additional abstractions to guide their search for counterexample traces [vechev2010abstraction]. Our reasoner captures the viewpoint taken by a domain expert and performs the same reasoning.

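Figure 3: An internal BST before deleting node l (nodes h, l, ll, lr, lrl, lrr, r) and after the deletion, with lrl moved into the position of l (nodes h, lrl, ll, lr, lrr, r). A concurrent traversal searching for lrl can descend past the old position of l and miss lrl, which has moved by the time the traversal reads the next pointer.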

4.2 Predicate Falsification in Concurrent Execution

When designing a concurrent algorithm, it is necessary to know the predicates (invariants) that are falsifiable in a concurrent execution. Since the sequential data structure knowledge provides the necessary preconditions, we systematically check for potential falsification of every conjunct in the precondition of insert (or of delete) (Fig. 1) with respect to environment interference. The precondition of insert (and similarly of delete) is the conjunction taken from the third argument of the corresponding entry in the knowledge (Fig. 1). If a predicate is not falsified with respect to the interference model, then it is not falsifiable in any serialized concurrent execution. This implies that a thread need not synchronize on the unfalsifiable predicate(s). For example, in the precondition of insert, the predicate stating that a node is chained to the tail node is not falsifiable. This is because any correct algebraic mutation on a linked list (via insert/delete) would only skip the node, but not unlink it from the chain to the tail node.

4.3 Lock Acquisition from Critical Conditions

Locks are necessary to protect the invariant predicates from falsification by interference. A conservative approach is to associate a lock with every predicate and acquire all of them. This approach can be taken for general concurrent programs where less semantic knowledge is available about the sequential program that is being transformed [deshmukh2010logical]. In a fine-grained locking scheme, it is desirable to acquire locks on a precise set of reachable nodes of the pointer data structure. Intuitively, locking the nodes involved in the window of modification seems sufficient. Although locking this set of nodes is insufficient in general, it sets a lower bound on the number of locks to acquire in a fine-grained locking scheme. Once the right set of nodes to be locked is guessed, the domain expert confirms the non-falsifiability of the invariants. If non-falsifiability is proven, then the concurrent program need only acquire the guessed locks.

5 Decomposition of Concurrency Proof Obligations into Reasoning Tasks

5.1 Notations and Assumptions

In the following, we define several theories, each of which captures the reasoning tasks performed by a domain expert. Every theory is an ASP program.

  1. We assume a base theory that encodes the structural definition of some tree-based recursive data structure. It also contains various primitives and definitions for every predicate from the sequential data structure knowledge. It is also assumed that we can distinguish time-invariant predicates from time-dependent ones using the fluent declarations in the knowledge.

  2. Interference effects of algebraic operations are modelled in the interference theory, which builds on the reified theory: all the rules of the base theory, reified into the time domain.

  3. We check the adequacy of guessed locks in the lock theory, which is the same as the interference theory but also considers locking of nodes.

  4. To find the right order of program steps we use the program-order theory.

5.2 Generating Interference Model

The reified theory (the base theory lifted into the time domain) is the planning domain [lifschitz2019answer] with a reified time argument. It contains all the time-dependent predicates, each with an extra argument for time. More precisely, every time-dependent predicate p(X) of the base theory appears in the reified theory as p(X, T), where T is a time step. The ordering of time is captured by a successor relation on time steps, which states that one time step immediately follows another.

From the procedural information in the knowledge, it is easy to model the instantaneous effects of the actions. These effects are encoded in the interference theory. For each operation, an interfere predicate is added to the interference theory as an abducible. Abducibles are literals that are guessed in ASP. Note that the argument of interfere contains exactly the same terms as the data structure procedural knowledge, with the node names replaced by uppercase variables.


The causal effects of insert and delete are also encoded as literals following from interfere.



In general, for any algebraic operation, an interference predicate is added as an abducible along with its causal effects.
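The interference model can be imitated outside ASP by enumerating every state that one interfering insert or delete could produce. The sketch below is an assumption-laden approximation, not the authors' encoding: a list state is a dictionary of edges, key ordering is ignored (which over-approximates interference), a deleted node keeps its own outgoing edge, and the names `reachable` and `successors` are hypothetical.

```python
def reachable(nxt, start="h"):
    """Nodes reachable from start by following edges."""
    seen, node = [], start
    while node is not None and node not in seen:
        seen.append(node)
        node = nxt.get(node)
    return seen


def successors(nxt, spare):
    """All (label, state) pairs produced by one interfering insert or delete.

    nxt:   dict node -> successor node (the edge fluent at the current time)
    spare: nodes outside the list that an interfering insert may add
    """
    out = []
    for x in reachable(nxt):
        y = nxt.get(x)
        if y is None:
            continue
        # Guessed interfering insert: causes edge(x, target) and edge(target, y).
        for target in spare:
            new = dict(nxt)
            new[target] = y
            new[x] = target
            out.append((("insert", x, target, y), new))
        # Guessed interfering delete of y: causes edge(x, z); y keeps its edge.
        z = nxt.get(y)
        if z is not None:
            new = dict(nxt)
            new[x] = z
            out.append((("delete", x, y, z), new))
    return out


state = {"h": "x", "x": "y", "y": "t"}
steps = successors(state, spare=["n"])
for label, _ in steps:
    print(label)
```

Each label plays the role of one guessed interfere literal, and the paired dictionary is the edge fluent at the next time step.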

5.3 Checking Falsification Predicates (Task 1)

Given the data structure knowledge, one can discharge conditions that check for falsification of every conjunct in a precondition. For the precondition of the insert operation on linked lists, one can generate predicate falsification checks for reach, suffix, and edge. These are precisely the time-dependent relations in that precondition. The predicate falsify_reach checks for falsification of reach in one time step: it holds when reach is true at some time step and false at the following one.

Similarly, the falsification of suffix and edge is defined. These falsification predicates are also added to the interference theory. Once they are added, one can check whether the falsification predicates are true in some model of the theory. If so, interference can indeed falsify the predicates. Otherwise, interference cannot falsify them. This task is an optimization step that reduces the number of predicates to be validated after lock acquisition.
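A one-step falsification check of this kind can be sketched in Python. The list of successor states is written out by hand for the three-node list h, x, y, t with one spare node n, under the assumption that a deleted node keeps its outgoing edge. The helper names and the reading of suffix as "the tail is reachable from the node" are illustrative assumptions.

```python
def reachable(nxt, start):
    """Nodes reachable from start by following edges."""
    seen, node = [], start
    while node is not None and node not in seen:
        seen.append(node)
        node = nxt.get(node)
    return seen


# Fluents, each evaluated on a state (a dict of edges).
def reach(n):
    return lambda s: n in reachable(s, "h")


def suffix(n):
    return lambda s: "t" in reachable(s, n)


def edge(a, b):
    return lambda s: s.get(a) == b


before = {"h": "x", "x": "y", "y": "t"}

# States after one interfering step (hand-enumerated for this small instance).
after_states = {
    "insert(h,n,x)": {"h": "n", "n": "x", "x": "y", "y": "t"},
    "delete(h,x,y)": {"h": "y", "x": "y", "y": "t"},
    "insert(x,n,y)": {"h": "x", "x": "n", "n": "y", "y": "t"},
    "delete(x,y,t)": {"h": "x", "x": "t", "y": "t"},
    "insert(y,n,t)": {"h": "x", "x": "y", "y": "n", "n": "t"},
}


def falsifiers(fluent):
    """Interfering steps after which a fluent that held before no longer holds."""
    assert fluent(before)
    return [op for op, s in after_states.items() if not fluent(s)]


report = {
    "reach(x)": falsifiers(reach("x")),
    "edge(x,y)": falsifiers(edge("x", "y")),
    "suffix(y)": falsifiers(suffix("y")),
}
print(report)
```

In this instance reach and edge have falsifying steps while suffix has none, so only the first two would need validation after lock acquisition.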

5.4 Checking Adequacy of Guessed Locks (Task 2)

From the procedural knowledge of the data structure operations, it is easy to guess the locks to be acquired. As an initial guess, every thread should at least synchronize on the nodes involved in the “window” of modification. For example, the window for the linked list insert is the pair of nodes x and y that are locked in Fig. 2. After guessing the set of locks to be acquired, one can check their adequacy in the presence of interference. In the interference model, the effects of interfere predicates are enabled only if there are no locks already acquired on the nodes they modify. For instance, the two effects of an interfering insert described previously are enabled only when none of the nodes it modifies is locked. These re-written rules are part of the lock theory, which represents the reified interference model in the presence of locks.



The locked nodes themselves are captured by the locked relation and are added as rules to the lock theory (the interference theory extended with locks). The falsification predicates remain the same in the lock theory. From this theory, one can check entailment of the falsification predicates. If falsification is affirmative, then the locking scheme is clearly inadequate. Otherwise, the locking scheme is adequate and the concurrent code can be generated (with lazy synchronization). If the locking scheme is inadequate, we can recommend the RCU framework.
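The adequacy check can be sketched by disabling every interfering step that touches a locked node and re-running the falsification test. This is a single-step approximation under explicit assumptions: an interfering insert between x and y, and an interfering delete of y after x, are both assumed to require locks on x and y. All function names are hypothetical.

```python
def reachable(nxt, start="h"):
    """Nodes reachable from start by following edges."""
    seen, node = [], start
    while node is not None and node not in seen:
        seen.append(node)
        node = nxt.get(node)
    return seen


def interference(nxt, spare, locked):
    """Yield states after one interfering step that the held locks do not block."""
    for x in reachable(nxt):
        y = nxt.get(x)
        if y is None or {x, y} & locked:
            continue  # no successor, or the step needs a node we hold
        for target in spare:
            yield {**nxt, x: target, target: y}  # interfering insert
        z = nxt.get(y)
        if z is not None:
            yield {**nxt, x: z}  # interfering delete of y


def falsified(fluent, nxt, spare, locked):
    """True if some unblocked interfering step makes the fluent false."""
    return any(not fluent(after) for after in interference(nxt, spare, locked))


def reach_x(state):
    return "x" in reachable(state)


def edge_xy(state):
    return state.get("x") == "y"


before = {"h": "a", "a": "x", "x": "y", "y": "t"}
no_locks = frozenset()
window = frozenset({"x", "y"})  # guessed locks: the window of modification

for name, fluent in (("reach(x)", reach_x), ("edge(x,y)", edge_xy)):
    print(name, falsified(fluent, before, ["n"], no_locks),
          falsified(fluent, before, ["n"], window))
```

With no locks both fluents are falsifiable, and with the window locked neither is, which is the situation in which the guessed locks would be judged adequate.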

5.5 Validating Sequential Program Order (Task 3)

We write pre(σ) and post(σ) for the pre-condition and post-condition associated with an operation σ. Given the sequential program steps of σ in the knowledge, one should also be able to infer the right order of program steps, one that does not violate an invariant in a concurrent setting. A common invariant that needs to be satisfied is the well-formedness of the data structure. Having the data structure well-formed at all times is desirable, as it makes the results returned by membership queries easier to explain with respect to linearizability. Given an invariant, the reified theory and the procedural knowledge of σ, a new program-order theory can be generated that validates the program order of all basic blocks with respect to the invariant. For every program step, a reified abducible is generated and added to the program-order theory. Then the invariant and the pre- and post-conditions of σ are reified and added as well. The necessary time steps, along with their ordering, are also added. Now, if the program-order theory is satisfiable, then there exists a program order that does not violate the invariant. The order might be a permutation of the input program order. Conversely, if the theory is unsatisfiable, then no permutation (including the original program order) can preserve the invariant. In that case, the reasoner recommends using RCU synchronization.
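The search for a safe program order can be sketched as a brute-force enumeration of permutations, standing in for the satisfiability check that the ASP solver performs. The invariant used here (the tail stays reachable from the head after every step) and the names `well_formed` and `valid_orders` are assumptions made for illustration.

```python
from itertools import permutations


def reachable(nxt, start="h"):
    """Nodes reachable from start by following edges."""
    seen, node = [], start
    while node is not None and node not in seen:
        seen.append(node)
        node = nxt.get(node)
    return seen


def well_formed(nxt):
    """Assumed invariant: the tail is reachable from the head."""
    return "t" in reachable(nxt)


def valid_orders(before, steps, invariant, post):
    """Permutations of link steps that keep the invariant after every step."""
    orders = []
    for order in permutations(steps):
        state, ok = dict(before), True
        for src, dst in order:
            state[src] = dst  # link(src, dst)
            if not invariant(state):
                ok = False
                break
        if ok and post(state):
            orders.append(order)
    return orders


before = {"h": "x", "x": "y", "y": "t"}
steps = [("x", "target"), ("target", "y")]  # sequential order of insert
orders = valid_orders(before, steps, well_formed,
                      post=lambda s: "target" in reachable(s))
print(orders)
```

An empty result would correspond to the unsatisfiable case, in which RCU would be recommended instead.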

5.6 Detecting Key-Movement (Task 4)

As illustrated before for internal BSTs, if the membership operation can potentially miss nodes that are part of the data structure, we use RCU synchronization. To detect key movement, we compare the sets of keys observed by an asynchronous observer and by a synchronous observer. The asynchronous observer visits one node at a time, whereas the synchronous observer visits all the nodes traversable from the start node at every time step. To model this scenario, we require a next-node relation to be present in the knowledge. The relation specifies which node to visit next, given the node last visited and the node being searched for, at a given time. Without loss of generality on the termination condition of the traversal code, we assume that the traversal ends upon reaching the end node, which is identified by a fact in the knowledge. Key movement is affirmative if there exists a node (key) that is visited by the synchronous observer but is not visited by the asynchronous observer by the end of its traversal. The rules for both observers use the next-node relation, and their definitions are straightforward. Finally, a dedicated predicate detects key movement as just explained.
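The two observers can be sketched in Python on the internal BST example from Section 3.2. The keys assigned to the nodes, the snapshot-per-time-step representation and the function names are all assumptions made for illustration. The node names h, l, ll, lr, lrl, lrr and r follow the figure.

```python
def reachable(tree, root="h"):
    """Synchronous observer: every node traversable from the root right now."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node is None or node in seen:
            continue
        seen.add(node)
        _, left, right = tree[node]
        stack += [left, right]
    return seen


def async_visit(snapshots, key, root="h"):
    """Asynchronous observer: visits one node per time step, reading the
    pointers of the snapshot that is current at that step."""
    visited, node = [], root
    for tree in snapshots:
        if node is None:
            break
        visited.append(node)
        k, left, right = tree[node]
        if k == key:
            break
        node = left if key < k else right
    return visited


def key_movement(snapshots, node, key):
    """The node is always in the structure, yet the traversal never visits it."""
    always_present = all(node in reachable(t) for t in snapshots)
    return always_present and node not in async_visit(snapshots, key)


# node -> (key, left child, right child); the keys are hypothetical.
before = {
    "h": (50, "l", "r"), "l": (20, "ll", "lr"), "ll": (10, None, None),
    "lr": (30, "lrl", "lrr"), "lrl": (25, None, None),
    "lrr": (40, None, None), "r": (70, None, None),
}
# After deleting l: its inorder successor lrl takes its place.
after = {
    "h": (50, "lrl", "r"), "lrl": (25, "ll", "lr"), "ll": (10, None, None),
    "lr": (30, None, "lrr"), "lrr": (40, None, None), "r": (70, None, None),
    "l": (20, "ll", "lr"),  # unlinked, but a reader standing on it can still read it
}

moved = key_movement([before, before, after, after], "lrl", 25)
quiet = key_movement([before] * 4, "lrl", 25)
print(moved, quiet)
```

In the first run the deletion happens while the traversal stands on lr, so the traversal reads an empty left pointer and misses lrl even though lrl is in the tree at every time step.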



6 Overall Procedure and Soundness

Our reasoner performs the above four tasks based on a given data structure theory and sequential data structure knowledge, and takes appropriate decisions on the structure of the transformed concurrent code. It is also assumed that the knowledge contains the library of sequential data structure operations, where each operation is a partial mapping from one instance of the data structure to another (partial, because an operation may not be applicable to all instances). Without loss of generality we can assume […]. We say that an operation is applicable to an instance if its precondition is true in some model of the theory together with that instance. There exists a least instance to which every operation is applicable. This instance is assumed to be part of the knowledge. It is both necessary and sufficient for the reasoning tasks performed in this paper, and it is used in the soundness proof of the procedure later. The intuition behind choosing such an instance is that we need to model executions in which simultaneous operations contend to modify the data structure. If some operation is not applicable to the chosen instance, then the interference model cannot model serialized concurrent execution faithfully: only a subset of the operations would modify the data structure simultaneously, and we would miss potential interference. A safe concurrent algorithm must take into account interference effects from all destructive update operations in the library.

The procedure uses the following notions: the precondition of an operation; the generated falsification predicate for a fluent; the set of locks guessed for an operation on the least instance, according to some heuristic provided by a domain expert; a function that checks the adequacy of the guessed locks; the set of all valid program order permutations that preserve a given invariant; and a function that captures the presence of key movement using predicates similar to those presented earlier. When the procedure recommends RCU for an operation, either key movement is detected or the invariant is violated by every program order.

Without loss of generality, assume that we are transforming two operations of some tree-based inductive data structure into their corresponding concurrent versions. Again without loss of generality, assume that both operations have a single basic block in their destructive update code. For the insert operation of external BSTs, for instance, there are four different pre-conditions and hence four basic blocks. But, as we show, the argument is similar to the single-block case. Because the data structure is inductive, we assume that the two operations are applicable to countably infinitely many instances of it. Clearly, there exists a least instance to which both operations are applicable. The agents (including interference) that perform either operation are always cautious with respect to their (permuted) sequential steps from the knowledge. That is, after acquiring the desired locks, the agents post-check (validate) their respective preconditions to ensure that the “window” of modification is still intact and was not modified in the time taken to acquire the locks.

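procedure GenerateConcurrentCode(operation)
    Code ← empty
    if key movement is detected then
        RecommendRCU()
        return
    end if
    if no program order preserves the invariant then
        RecommendRCU()
        return
    end if
    if the guessed locks are not adequate then
        RecommendRCU()
        return
    end if
    Code ← Code + LockStmts(guessed locks)
    Code ← Code + Validate(falsifiable precondition predicates)
    Code ← Code + program steps in a valid order
    Code ← Code + UnlockStmts(guessed locks)
end procedure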
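The control flow of the procedure can be sketched in Python, taking the outcomes of the four reasoning tasks as plain inputs. The `TaskResults` container, its field names and the textual form of the emitted statements are hypothetical, and the order of the three RCU checks is an assumption that does not affect the result.

```python
from dataclasses import dataclass


@dataclass
class TaskResults:
    """Outcomes of the four reasoning tasks for one operation (assumed shape)."""
    key_movement: bool
    valid_orders: list               # each order is a list of program steps
    guessed_locks: list              # nodes in the window of modification
    locks_adequate: bool
    falsifiable_preconditions: list  # predicates to re-check after locking


def generate_concurrent_code(results):
    """Return the abstract concurrent code, or None to recommend RCU."""
    if results.key_movement:
        return None
    if not results.valid_orders:
        return None
    if not results.locks_adequate:
        return None
    code = [f"lock({node})" for node in results.guessed_locks]
    code.append("if validate(" + ", ".join(results.falsifiable_preconditions) + "):")
    code += ["    " + step for step in results.valid_orders[0]]
    code += [f"unlock({node})" for node in results.guessed_locks]
    return code


list_insert = TaskResults(
    key_movement=False,
    valid_orders=[["target.next := y", "x.next := target"]],
    guessed_locks=["x", "y"],
    locks_adequate=True,
    falsifiable_preconditions=["reach(x)", "edge(x,y)"],
)
code = generate_concurrent_code(list_insert)
print("\n".join(code))
```

For an operation where key movement is detected, such as delete on an internal BST, the same function returns None, which stands for the RCU recommendation.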

6.1 Soundness of Unfalsifiable Predicates and Lock Adequacy Argument

Lemma 1: If a time-dependent predicate (fluent) is unfalsifiable in some instance to which every operation is applicable, then it is unfalsifiable in any serialized concurrent execution of the two operations.
Proof: may belong to or (or both). We denote to mean one of either or .
Case 1: and every is applicable to . Then we have no problem. As conjuncts of are not falsified including .
Case 2: = and some is not applicable to . If , then we have no problem. If otherwise, , there must exist another predicate such that is falsified. Otherwise, would be applicable to (as is not falsified). From the above two cases, it is clear that any serialized run of operations and does not falsify .

Lemma 2: If the guessed locks for some , on an instance where every is applicable, make unfalsifiable in , then is unfalsifiable in any serialized execution of and . Proof: Similar to Lemma 1.

7 Locksynth: A Tool for Automatic Concurrency Synthesis

Our tool, Locksynth [github], implements the above reasoning tasks in SWI-Prolog and ASP. SWI-Prolog acts as the front end of the tool and takes the sequential data structure knowledge as input. The back end is driven by ASP, which performs the reasoning tasks discussed above. We use the Clingo ASP solver. Locksynth takes the data structure definitions and sequential code as input and generates the abstract concurrent code. The template for sequential code is as follows: code(Operation, BasicBlockNum, Precondition, BasicBlockSteps, PostCondition). Here, Operation is the name of the algebraic operation (i.e., insert or delete), BasicBlockNum identifies the basic block, and the other arguments carry the same denotation as in 1. For linked lists, the sequential code is represented as follows:

% insert: link the unreachable node target between adjacent nodes x and y
code(insert, block1,
  [reach(x), edge(x,y), key(x,kx), key(y,ky), key(target,ktarget),
   kx < ktarget, ktarget < ky, not(reach(target))],
  [link(x, target), link(target, y)], [reach(target)]).
% delete: bypass target by linking its predecessor x to its successor y
code(delete, block1,
  [reach(x), edge(x,target), edge(target,y), key(x,kx), key(y,ky),
   key(target, ktarget), kx < ktarget, ktarget < ky],
  [link(x,y)], [not(reach(target))]).

We now mention some important features that are part of the implementation.

7.1 Generation of Equivalence Classes via Meta-Interpretation

Locksynth performs meta-interpretation of the recursive data structure definition. It systematically unfolds data structure instances of increasing size, starting from the smallest instance. In the case of a linked list, Locksynth generates the empty list, then a list containing one element, then two elements, and so on. This unfolding is performed to find data structure instances to which both insert and delete are applicable. In other words, Locksynth finds the equivalence class instances that satisfy the preconditions of both insert and delete. The need for such an instance is explained in the previous section. The data structure definition is represented as follows:

rule(list,
     [node(h),key(h,kh),edge(h,X),key(X,KX),lt(kh,KX),suffix(X)]).
rule(suffix(t), []).
rule(suffix(X),
     [node(X),node(Y),edge(X,Y),key(X,KX),
      key(Y,KY),lt(KX,KY),suffix(Y)]).

Within our tool implementation, the query ?- unfold(Depth, Instance) generates data structure instances (through meta-interpretation of the data structure definition) for a given depth. The full query ?- unfold(Depth, Instance), code(insert, block1, Pre, _, _), check(Instance, Pre) then checks whether the precondition Pre is satisfied by the data structure instance Instance. The predicate check internally calls the Clingo ASP solver to perform this satisfiability check.
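The search for the least instance on which both operations are applicable can be pictured with a small Python sketch. It is a simplification under stated assumptions: instances are plain sets of facts with sentinel nodes `h` and `t`, key ordering is ignored, and the two applicability tests are hand-written stand-ins for the satisfiability check that Locksynth delegates to Clingo. The names `unfold`, `insert_applicable`, `delete_applicable` and `least_instance` are chosen for this illustration only.

```python
def unfold(depth):
    """Facts for a list with sentinels h and t and `depth` interior nodes."""
    nodes = ["h"] + [f"n{i}" for i in range(1, depth + 1)] + ["t"]
    facts = {("node", n) for n in nodes}
    facts |= {("edge", a, b) for a, b in zip(nodes, nodes[1:])}
    return facts


def insert_applicable(facts):
    # Insert needs some edge(x, y); the target node is fresh and unreachable.
    return any(f[0] == "edge" for f in facts)


def delete_applicable(facts):
    # Delete needs a chain edge(x, target), edge(target, y).
    edges = {(f[1], f[2]) for f in facts if f[0] == "edge"}
    return any(b == b2 for (_, b) in edges for (b2, _) in edges)


def least_instance(checks, max_depth=10):
    """Unfold instances of increasing size until every check succeeds."""
    for depth in range(max_depth + 1):
        instance = unfold(depth)
        if all(check(instance) for check in checks):
            return depth, instance
    return None
```

Under these assumptions the empty list (only `h` and `t`) admits insert but not delete, so the search stops at the list with one interior node.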

7.2 Symbolic Reasoning over Nodes, Keys

Nodes and keys are treated as abstract symbols with their usual reflexive and transitive equality and order relations. Symmetry is added in the case of equality. This allows the data structure instance to be constructed at a symbolic level. The equality relation over nodes enables us to separate nodes that are part of the data structure (reachable nodes) from nodes that are not (unreachable nodes). To illustrate, the linked list insert operation requires at least one node that is not part of the data structure. Equality reasoning over nodes therefore allows us to specify a model with an unreachable node that is not equal to any of the reachable nodes. We use eq_node(X, Y) for node equality, eq_num(X, Y) for key equality and lt(X, Y) for key ordering. The sequential program capturing key order/equality and node equality should use these abstract predicates.
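As a rough illustration of the equality reasoning, the sketch below closes a set of eq_node-style facts under reflexivity, symmetry and transitivity using a union-find structure, and then tests whether a node is distinct from every reachable node. This is not how the ASP encoding works internally; the function names and the union-find technique are assumptions made for this example.

```python
def eq_classes(symbols, eq_facts):
    """Map each symbol to a representative of its equality class."""
    parent = {s: s for s in symbols}

    def find(s):
        # Follow parent links to the representative, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in eq_facts:
        parent[find(a)] = find(b)   # merging gives symmetry and transitivity
    return {s: find(s) for s in symbols}


def distinct_from_all(node, others, classes):
    """True if `node` is not equal to any symbol in `others`."""
    return all(classes[node] != classes[o] for o in others)
```

For linked list insert, `target` must satisfy `distinct_from_all(target, reachable_nodes, classes)`, which corresponds to the unreachable node required by the precondition.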

7.3 Reification of predicates into Time Domain and Bounded Time Chains

To simulate interference effects, to check for lock adequacy, or to detect key-movement, we require reasoning over time. To perform temporal reasoning, we reify all fluents into the time domain. We also generate the commonsense law of inertia rules to handle the frame problem. These rules are similar to the frame rules used in ASP planning problems. A consequence of introducing time is that we must set the maximum number of time steps and link the time steps, starting from the initial time step, in a linear chain. The length of this time chain is bounded by the maximum of the size of the largest basic block and the depth of the equivalence class data structure instance. The reasons are as follows. Modelling interference requires only two time steps, as interference mutates the data structure atomically. Checking lock adequacy also requires two time steps, i.e., we simply check whether interference has falsified any predicates in the presence of locking. Checking a valid program order requires as many time steps as the size of the largest basic block. Finally, modelling traversal to detect key-movement requires as many time steps as the data structure depth. Hence we choose the maximum of the largest basic block size and the data structure depth. For a time chain of length 3, the following facts are added to the appropriate ASP program performing the reasoning task: time(t1). time(t2). time(t3). next_time(t1, t2). next_time(t2, t3). The relation next_time establishes the linear order of t1, t2 and t3.
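The bounded time chain and the effect of inertia can be sketched in Python as follows. The helper `time_chain` emits the time and next_time facts in the textual form quoted above, while `run_with_inertia` is only a simulation of the frame rules: a fluent carries over to the next time step unless an effect removes it. Both function names and the representation of effects are assumptions of this sketch, not part of Locksynth.

```python
def time_chain(largest_block, depth):
    """Emit time/1 and next_time/2 facts for max(largest_block, depth) steps."""
    n = max(largest_block, depth)
    times = [f"t{i}" for i in range(1, n + 1)]
    facts = [f"time({t})." for t in times]
    facts += [f"next_time({a}, {b})." for a, b in zip(times, times[1:])]
    return facts


def run_with_inertia(initial, effects, steps):
    """Return the fluents holding at each of `steps` time steps.

    effects maps a 0-based step index i to (added, removed): the changes made
    on the transition into step i. Everything else persists (inertia).
    """
    states = [frozenset(initial)]
    for i in range(1, steps):
        added, removed = effects.get(i, (frozenset(), frozenset()))
        states.append((states[-1] - removed) | added)
    return states
```

With `time_chain(3, 2)` the sketch reproduces the five facts listed for a chain of length 3.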

7.4 Guessing Locks

Given preconditions for insert/delete, a simple heuristic that Locksynth follows is to lock every node present in the precondition. These nodes are treated as locked, and are subsequently checked for adequacy in the presence of interference. Locking rules (generated automatically) for guessed nodes for linked list insert are shown below:

locked(X) :-
  reach(X),edge(X, Y),not reach(target),key(X, Kx),key(Y, Ky),
  key(target, ktarget),lt(Kx, ktarget),lt(ktarget, Ky).
locked(Y) :-
  reach(X),edge(X,Y),not reach(target),key(X, Kx),key(Y, Ky),
  key(target, ktarget),lt(Kx, ktarget),lt(ktarget, Ky).

Our locking heuristic locks all the nodes that are present in the operation precondition. Improving this crude heuristic toward a more general lock detection algorithm is part of our future work.
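The heuristic of locking every node in the precondition can be sketched as a small rule generator. The representation of literals as `(negated, predicate, arguments)` triples and the table `NODE_ARGS`, which records the argument positions that hold nodes, are assumptions of this illustration. The sketch emits one rule per node, so for insert it also produces a rule for `target`, in line with the heuristic as stated.

```python
# Argument positions that hold nodes, per predicate (assumed for this sketch).
NODE_ARGS = {"reach": [0], "edge": [0, 1], "key": [0]}


def nodes_in(precondition):
    """Nodes mentioned in the precondition, in order of first occurrence."""
    seen = []
    for _negated, pred, args in precondition:
        for i in NODE_ARGS.get(pred, []):
            if args[i] not in seen:
                seen.append(args[i])
    return seen


def render(literal):
    """Render one literal as ASP rule-body text."""
    negated, pred, args = literal
    atom = f"{pred}({', '.join(args)})"
    return f"not {atom}" if negated else atom


def locking_rules(precondition):
    """One locked(N) rule per node N, each guarded by the full precondition."""
    body = ", ".join(render(lit) for lit in precondition)
    return [f"locked({n}) :- {body}." for n in nodes_in(precondition)]
```

Whether the guessed locks are adequate is a separate question, answered by the interference check described above.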

8 Experiments, Conclusion and Future Work

Our approach has been applied to Linked Lists, External BSTs and Internal BSTs (Table 1). We are able to synthesize the concurrent versions of insert and delete for Linked Lists and External BSTs. Locksynth can also recommend the RCU framework for Internal BSTs, due to key-movement missed by an asynchronous observer. The generated code for the concurrent linked list is shown below:

Sequential Insert
{reach(x), edge(x,y), kx < ktarget, ktarget < ky}
link(x, target)
link(target, y)
{reach(target)}

Concurrent Insert
{reach(x), edge(x,y), kx < ktarget, ktarget < ky}
lock(x)
lock(y)
lock(target)
if validate(reach(x) & edge(x,y) &
            kx < ktarget & ktarget < ky){
  link(target, y)
  link(x, target)
}
unlock(target)
unlock(y)
unlock(x)
{reach(target)}

Sequential Delete
{reach(x), edge(x, target), edge(target, y),
  kx < ktarget, ktarget < ky}
link(x, y)
{not reach(target)}

Concurrent Delete
{reach(x), edge(x, target), edge(target, y),
  kx < ktarget, ktarget < ky}
lock(x)
lock(target)
lock(y)
if validate(reach(x) & edge(x,target) &
     edge(target,y) & kx < ktarget &
     ktarget < ky){
  link(x,y)
}
unlock(target)
unlock(y)
unlock(x)
{not reach(target)}
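To make the lock, validate, update, unlock pattern of the generated insert concrete, here is a minimal Python rendering using `threading.Lock`. It is a sketch under assumptions, not output of Locksynth: the `Node` class, the sentinel keys, the unlocked traversal and the retry loop in `insert` are additions for this example, and only insertions are exercised.

```python
import threading


class Node:
    def __init__(self, key):
        self.key = key
        self.next = None
        self.lock = threading.Lock()


def reachable(head, node):
    cur = head
    while cur is not None:
        if cur is node:
            return True
        cur = cur.next
    return False


def try_insert(head, x, y, target):
    """One attempt of the concurrent insert; False if validation fails."""
    for n in (x, y, target):            # lock(x) lock(y) lock(target)
        n.lock.acquire()
    try:
        # validate(reach(x) & edge(x,y) & kx < ktarget & ktarget < ky)
        ok = reachable(head, x) and x.next is y and x.key < target.key < y.key
        if ok:
            target.next = y             # link(target, y)
            x.next = target             # link(x, target)
        return ok
    finally:
        for n in (target, y, x):        # unlock(target) unlock(y) unlock(x)
            n.lock.release()


def insert(head, key):
    """Find the window (x, y) without locks, then retry until validated."""
    target = Node(key)
    while True:
        x, y = head, head.next
        while y.key < key:
            x, y = y, y.next
        if y.key == key:
            return False                # key already present
        if try_insert(head, x, y, target):
            return True
```

The list is assumed to start as a head sentinel with key `float("-inf")` linked to a tail sentinel with key `float("inf")`. Linking `target` to `y` before linking `x` to `target` follows the step order of the generated concurrent insert, so an unlocked traversal never reaches a node whose next pointer is unset.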

Our work presents the first step towards using commonsense reasoning to generate concurrent programs from sequential data structure knowledge. We have presented the challenges involved in concurrent code generation and mechanized the reasoning tasks as performed by a human concurrency expert. The procedure described in this paper conforms to McCarthy’s vision of building programs that have commonsense and manipulate formulas in first order logic [mccarthy1960programs]. Our future work aims to apply our technique to more data structures, such as Red-Black Trees and AVL Trees. In general, given knowledge about a sequential data structure as well as knowledge about the concept of concurrency, one should be able to generate suitable, correct versions of concurrent programs. We eventually aim to generalize our technique to arbitrary data structures. Further, the only synchronization primitives we have addressed in this paper are locks. However, modern multiprocessors support more sophisticated atomic write instructions, such as Compare-and-Swap [valois1995lock] and Fetch-and-Add [heidelberger1990parallel]. These atomic instructions give rise to lock-free data structures. Supporting these primitives is part of future work.

Data Structure | Membership | Insert  | Delete
Linked List    | No change  | Success | Success
External BST   | No change  | Success | Success
Internal BST   | No change  | Success | RCU
Table 1: Locksynth results for Linked List, Internal and External BSTs