A shared-memory program has a data race if an execution of the program can perform two memory accesses that are conflicting and concurrent, meaning that the accesses are executed by different threads, at least one access is a write, and there are no interleaving program operations. Data races often lead to atomicity, order, and sequential consistency violations that cause programs to crash, hang, or corrupt data (boehm-miscompile-hotpar-11, ; jmm-broken, ; portend-toplas15, ; conc-bug-study-2008, ; portend-asplos12, ; benign-races-2007, ; prescient-memory, ; adversarial-memory, ; racefuzzer, ; relaxer, ; therac-25, ; blackout-2003-tr, ; nasdaq-facebook, ). Modern shared-memory programming languages including C++ and Java provide undefined or ill-defined semantics for executions containing data races (java-memory-model, ; c++-memory-model-2008, ; memory-models-cacm-2010, ; you-dont-know-jack, ; data-races-are-pure-evil, ; out-of-thin-air-MSPC14, ; jmm-broken, ).
Happens-before (HB) analysis detects data races by tracking the happens-before relation (happens-before, ; adve-weak-racedet-1991, ) and reports conflicting, concurrent accesses unordered by HB as data races (fasttrack, ; multirace, ; goldilocks-pldi-2007, ). However, the coverage of HB analysis is inherently limited to data races that manifest in the current execution. Consider both example executions in Figure 3. The HB relation, which is the transitive closure of the union of program and synchronization order (happens-before, ), orders the accesses to x and would not detect a data race for either observed execution. However, for Figure 3(a), we can know from the observed execution that a data race exists in a different interleaving of events of the program (if Thread 2 acquires m first, the accesses to x can occur concurrently). On the other hand, it is unclear if Figure 3(b) has a data race, since the execution of Thread 2's access to x may depend on the order of accesses to y (e.g., suppose that access's execution depends on the value read by Thread 2's read of y).
Predictive analyses detect data races that are possible in executions other than the observed execution. Notably, Smaragdakis et al. introduce a predictive analysis that tracks the causally-precedes (CP) relation (causally-precedes, ), a subset of HB that conservatively orders conflicting accesses that cannot race in some other, unobserved execution. A CP-race exists between conflicting accesses not ordered by CP. (Footnote 1: More precisely, two conflicting accesses unordered by CP imply either a data race or a deadlock in another execution (causally-precedes, ).) In Figure 3(a), the accesses to x are ordered by HB but not by CP, i.e., the execution has a CP-race (two conflicting accesses unordered by CP). In contrast, Figure 3(b) has no CP-race (the accesses to x are ordered by CP) because CP correctly orders the critical sections that may result in different behavior if executed in the reverse order, i.e., Thread 2's read of y may read a different value and the data race on x might not occur.
Smaragdakis et al. show how to compute CP in polynomial time in the execution length. Nonetheless, their analysis cannot scale to full executions, and instead analyzes bounded execution windows of 500 consecutive events (causally-precedes, ), missing CP-races involving accesses that are “far apart” in the observed execution. Their CP analysis is inherently offline; in contrast, an online dynamic analysis would summarize the execution so far in the form of analysis state, without needing to “look back” at the entire trace. Like Smaragdakis’s CP analysis, most existing predictive analyses are offline, needing access to the entire execution trace, and cannot scale to full execution traces (causally-precedes, ; rvpredict-pldi-2014, ; jpredictor, ; maximal-causal-models, ; rdit-oopsla-2016, ; said-nfm-2011, ; ipa, ; wcp, ) (Section 8).
Two recent approaches introduce online predictive analyses (wcp, ; vindicator, ). This article’s contributions are concurrent with or precede each of these prior approaches. In particular, Kini et al.’s weak-causally-precedes (WCP) submission to PLDI 2017 (wcp, ) is concurrent with this article’s work, as established by our November 2016 technical report (raptor, ). Furthermore, none of the related work shows how to compute CP online, a challenging proposition (causally-precedes, ) (Section 2.3).
This article introduces Raptor (Race predictor), a novel dynamic analysis that computes the CP relation soundly and completely. Raptor is inherently an online analysis because it summarizes an execution’s behavior so far in the form of analysis state that captures the recursive nature of the CP relation, rather than needing to look at the entire execution so far. We introduce analysis invariants and prove that Raptor soundly and completely tracks CP by maintaining the invariants after each step of a program’s execution.
We have implemented Raptor as a dynamic analysis for Java programs. Our unoptimized prototype implementation can analyze executions of real programs with hundreds of thousands or millions of events within an hour or two. In contrast, Smaragdakis et al.’s analysis generally cannot scale beyond bounded windows of thousands of events (causally-precedes, ). As a result, Raptor detects CP-races that are too “far apart” for the offline CP analysis to detect.
While concurrent work’s WCP analysis (wcp, ) and later work’s DC analysis (vindicator, ) are faster and (as a result of using weaker relations than CP) detect more races than Raptor, computing CP online is a challenging problem that prior work has been unable to solve (wcp, ; causally-precedes, ) (Section 2.3). Furthermore, Raptor provides the first set-based algorithm for partial-order-based predictive analysis. Though recent advances in predictive analysis have subsumed Raptor’s race coverage and performance, the alternative technique of a set-based approach provides unique avenues for future development.
Raptor advances the state of the art by (1) being the first online analysis for computing CP soundly and completely and (2) demonstrably scaling to longer executions than the prior CP analysis and finding real CP-races that the prior CP analysis cannot detect.
2. Background and Motivation
This section defines the causally-precedes (CP) relation from prior work (causally-precedes, ) and motivates the challenges of computing CP online. First, we introduce the execution model and notation used throughout the article.
2.1. Execution Model
An execution trace $tr$ consists of events observed in a total order, denoted $\le_{tr}$ (reflexive) or $<_{tr}$ (irreflexive), that represents a linearization of a sequentially consistent (SC) execution (sequential-consistency, ). (Footnote 2: It is safe to assume SC because language memory models provide SC up to the first data race (memory-models-cacm-2010, ).) Every event in $tr$ has an associated executing thread.
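An event is one of $wr(x)^i$, $rd(x)^i_T$, $acq(m)^i$, or $rel(m)^i$, where x is a variable, m is a lock, and $i$ specifies the $i$th instance of the event, i.e., $wr(x)^i$ is the $i$th write to variable x. $rd(x)^i_T$ is a read by thread T to variable x such that $wr(x)^i <_{tr} rd(x)^i_T <_{tr} wr(x)^{i+1}$.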
We assume that any observed execution trace is well formed, meaning a thread only acquires a lock that is not held and only releases a lock it holds, and lock release order is last in, first out (i.e., critical sections are well-nested).
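The well-formedness assumption can be checked mechanically. The following Python sketch is an illustration only, not part of Raptor; the tuple encoding of events and the name `is_well_formed` are assumptions made for this example. It rejects a trace in which a thread acquires a held lock, releases a lock it does not hold, or releases locks in an order other than last in, first out.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (thread, op, target), with op in {"rd", "wr", "acq", "rel"}.
Event = Tuple[str, str, str]

def is_well_formed(trace: List[Event]) -> bool:
    holder: Dict[str, str] = {}        # lock -> thread currently holding it
    held: Dict[str, List[str]] = {}    # thread -> stack of locks it holds
    for thread, op, target in trace:
        stack = held.setdefault(thread, [])
        if op == "acq":
            if target in holder:       # the lock is already held by some thread
                return False
            holder[target] = thread
            stack.append(target)
        elif op == "rel":
            # Only the innermost lock held by this thread may be released.
            if not stack or stack[-1] != target:
                return False
            stack.pop()
            del holder[target]
    return True

GOOD = [("T1", "acq", "m"), ("T1", "wr", "x"), ("T1", "rel", "m"),
        ("T2", "acq", "m"), ("T2", "rel", "m")]
HELD_TWICE = [("T1", "acq", "m"), ("T2", "acq", "m")]
NOT_NESTED = [("T1", "acq", "m"), ("T1", "acq", "n"), ("T1", "rel", "m")]
```

Note that this sketch treats a reentrant acquire as ill-formed, following the statement that a thread only acquires a lock that is not held.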
Let $thr(e)$ be a helper function that returns the thread identifier that executed event $e$. Two access (read/write) events $e$ and $e'$ to the same variable are conflicting (denoted $e \asymp e'$) if at least one event is a write and $thr(e) \neq thr(e')$.
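To make the event notation and the conflict predicate concrete, here is a small Python sketch. It is illustrative only: the tuple encoding, the plain-text labels such as `wr(y)^1` and `rd(y)^1_T2`, and the choice to give a read that precedes every write the index 0 are assumptions of this example, not definitions from the article.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (thread, op, target), with op in {"rd", "wr", "acq", "rel"}.
Event = Tuple[str, str, str]

def label(trace: List[Event]) -> List[str]:
    """Attach instance numbers: the i-th write to x, reads after the i-th write, etc."""
    writes: Dict[str, int] = {}               # variable -> writes seen so far
    sync: Dict[Tuple[str, str], int] = {}     # (op, lock) -> instances seen so far
    out = []
    for thread, op, target in trace:
        if op == "wr":
            writes[target] = writes.get(target, 0) + 1
            out.append(f"wr({target})^{writes[target]}")
        elif op == "rd":
            # A read is indexed by the most recent write to the same variable.
            out.append(f"rd({target})^{writes.get(target, 0)}_{thread}")
        else:
            sync[(op, target)] = sync.get((op, target), 0) + 1
            out.append(f"{op}({target})^{sync[(op, target)]}")
    return out

def conflicting(a: Event, b: Event) -> bool:
    """Same variable, different threads, at least one write."""
    both_accesses = a[1] in ("rd", "wr") and b[1] in ("rd", "wr")
    return both_accesses and a[2] == b[2] and a[0] != b[0] and "wr" in (a[1], b[1])

TRACE = [("T1", "wr", "y"), ("T2", "rd", "y"), ("T1", "wr", "y")]
```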
2.2. Relations over an Execution Trace
The following presentation is based on prior work’s presentation (causally-precedes, ).
Program-order (PO) is a partial order over events executed by the same thread. Given a trace $tr$, $\preceq_{PO}$ is the smallest relation such that for any two events $e$ and $e'$, $e \preceq_{PO} e'$ if $e \le_{tr} e' \land thr(e) = thr(e')$.
Definition 2.1 (Happens-before).
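The happens-before (HB) relation is a partial order over events in an execution trace (happens-before, ). Given a trace $tr$, $\preceq_{HB}$ is the smallest relation such that:
Two events are HB ordered if they are PO ordered. That is, $e \preceq_{HB} e'$ if $e \preceq_{PO} e'$.
Release and acquire operations on the same lock (i.e., synchronization order) are HB ordered. That is, $rel(m)^i \preceq_{HB} acq(m)^j$ if $rel(m)^i <_{tr} acq(m)^j$ (which implies $i < j$).
HB is closed under composition with itself. That is, $e \preceq_{HB} e'$ if $\exists e'' \mid e \preceq_{HB} e'' \land e'' \preceq_{HB} e'$.
Throughout the article, we generally use the irreflexive variants of $\preceq_{PO}$ and $\preceq_{HB}$, namely $\prec_{PO}$ and $\prec_{HB}$, respectively, when it is correct to do so.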
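The HB definition can be evaluated directly on a recorded trace. The following Python sketch is an illustration under stated assumptions, not the vector-clock algorithms of the cited HB detectors: the tuple encoding, the function names, and both example traces are hypothetical. It computes, for each event, the set of earlier events that happen before it, then reports conflicting accesses left unordered.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: (thread, op, target), with op in {"rd", "wr", "acq", "rel"}.
Event = Tuple[str, str, str]

def happens_before(trace: List[Event]) -> List[Set[int]]:
    """before[j] holds every index i such that event i is HB ordered before event j."""
    before: List[Set[int]] = []
    last_by_thread: Dict[str, int] = {}
    last_release: Dict[str, int] = {}
    for j, (thread, op, target) in enumerate(trace):
        preds = set()
        if thread in last_by_thread:                  # program order
            preds.add(last_by_thread[thread])
        if op == "acq" and target in last_release:    # synchronization order
            preds.add(last_release[target])
        closure = set(preds)
        for p in preds:                               # composition of HB with itself
            closure |= before[p]
        before.append(closure)
        last_by_thread[thread] = j
        if op == "rel":
            last_release[target] = j
    return before

def conflicting(a: Event, b: Event) -> bool:
    both_accesses = a[1] in ("rd", "wr") and b[1] in ("rd", "wr")
    return both_accesses and a[2] == b[2] and a[0] != b[0] and "wr" in (a[1], b[1])

def hb_races(trace: List[Event]) -> List[Tuple[int, int]]:
    before = happens_before(trace)
    return [(i, j) for j in range(len(trace)) for i in range(j)
            if conflicting(trace[i], trace[j]) and i not in before[j]]

# Hypothetical traces: the lock orders the accesses to x in the first one only.
LOCKED = [("T1", "wr", "x"), ("T1", "acq", "m"), ("T1", "rel", "m"),
          ("T2", "acq", "m"), ("T2", "rel", "m"), ("T2", "rd", "x")]
UNLOCKED = [("T1", "wr", "x"), ("T2", "rd", "x")]
```

Tracking only the most recent release of each lock is sufficient in a well-formed trace, because every earlier release of that lock is already ordered before it through the intervening acquire and program order.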
Definition 2.2 (Causally-precedes).
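The causally-precedes (CP) relation is a strict (i.e., irreflexive) partial order that is strictly weaker than HB (causally-precedes, ). Given a trace $tr$, $\prec_{CP}$ is the smallest relation such that:
(a) Release and acquire operations on the same lock containing conflicting events are CP ordered. That is, $rel(m)^i \prec_{CP} acq(m)^j$ if $i < j$ and $\exists e, e' \mid e \asymp e' \land acq(m)^i \prec_{PO} e \prec_{PO} rel(m)^i \land acq(m)^j \prec_{PO} e' \prec_{PO} rel(m)^j$.
(b) Two critical sections on the same lock are CP ordered if they contain CP-ordered events. Because of the next rule, this rule can be expressed simply as follows: $rel(m)^i \prec_{CP} acq(m)^j$ if $acq(m)^i \prec_{CP} rel(m)^j$.
(c) CP is closed under left and right composition with HB. That is, $e \prec_{CP} e'$ if $\exists e'' \mid e \prec_{HB} e'' \prec_{CP} e'$ or if $\exists e'' \mid e \prec_{CP} e'' \prec_{HB} e'$.
The rest of this article refers to the above rules of the CP definition as Rules (a), (b), and (c). An execution trace has a CP-race if it has two conflicting events $e \asymp e'$ with $e <_{tr} e'$ such that $e \nprec_{CP} e'$.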
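For intuition, Rules (a), (b), and (c) can be applied as a naive offline fixed point over a whole trace. The Python sketch below is only an illustration of the definition; it is neither Raptor nor the Datalog-based algorithm of prior work, and it "looks back" at the entire trace. The tuple encoding, all function names, and the two example traces (which differ only in whether the critical sections on m contain conflicting accesses) are assumptions of this example. Only completed critical sections are considered.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str, str]   # hypothetical: (thread, op, target)

def conflicting(a: Event, b: Event) -> bool:
    both = a[1] in ("rd", "wr") and b[1] in ("rd", "wr")
    return both and a[2] == b[2] and a[0] != b[0] and "wr" in (a[1], b[1])

def hb_pairs(trace: List[Event]) -> Set[Tuple[int, int]]:
    """All (i, j) such that event i is HB ordered before event j."""
    before: List[Set[int]] = []
    last_thr: Dict[str, int] = {}
    last_rel: Dict[str, int] = {}
    for j, (thread, op, target) in enumerate(trace):
        preds = {last_thr[thread]} if thread in last_thr else set()
        if op == "acq" and target in last_rel:
            preds.add(last_rel[target])
        closure = set(preds)
        for p in preds:
            closure |= before[p]
        before.append(closure)
        last_thr[thread] = j
        if op == "rel":
            last_rel[target] = j
    return {(i, j) for j, s in enumerate(before) for i in s}

def critical_sections(trace: List[Event]) -> Dict[str, list]:
    """lock -> [(acq index, rel index, indices of events inside)], in trace order."""
    open_acq: Dict[str, int] = {}
    out: Dict[str, list] = {}
    for j, (thread, op, target) in enumerate(trace):
        if op == "acq":
            open_acq[target] = j
        elif op == "rel":
            a = open_acq.pop(target)
            body = [k for k in range(a + 1, j) if trace[k][0] == thread]
            out.setdefault(target, []).append((a, j, body))
    return out

def cp_pairs(trace: List[Event]) -> Set[Tuple[int, int]]:
    hb, secs = hb_pairs(trace), critical_sections(trace)
    pairs = [(s[a], s[b]) for s in secs.values()
             for a in range(len(s)) for b in range(a + 1, len(s))]
    cp = set()
    for (_, rel1, body1), (acq2, _, body2) in pairs:          # Rule (a)
        if any(conflicting(trace[e], trace[f]) for e in body1 for f in body2):
            cp.add((rel1, acq2))
    while True:
        new = set()
        for (a, b) in cp:                                     # Rule (c)
            for (x, y) in hb:
                if y == a:
                    new.add((x, b))
                if x == b:
                    new.add((a, y))
        for (acq1, rel1, _), (acq2, rel2, _) in pairs:        # Rule (b)
            if (acq1, rel2) in cp:
                new.add((rel1, acq2))
        if new <= cp:                                         # fixed point reached
            return cp
        cp |= new

def cp_races(trace: List[Event]) -> List[Tuple[int, int]]:
    cp = cp_pairs(trace)
    return [(i, j) for i in range(len(trace)) for j in range(i + 1, len(trace))
            if conflicting(trace[i], trace[j]) and (i, j) not in cp]

def two_thread_trace(var_read_by_t2: str) -> List[Event]:
    return [("T1", "wr", "x"), ("T1", "acq", "m"), ("T1", "wr", "y"), ("T1", "rel", "m"),
            ("T2", "acq", "m"), ("T2", "rd", var_read_by_t2), ("T2", "rel", "m"),
            ("T2", "rd", "x")]

NO_CONFLICT = two_thread_trace("z")   # critical sections touch different variables
CONFLICT = two_thread_trace("y")      # critical sections conflict on y
```

In both traces HB orders the accesses to x, but only the second trace orders them by CP, so only the first has a CP-race. The repeated scans over all HB pairs make the cost of this sketch grow quickly with trace length, which is acceptable only for small illustrative traces.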
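In the execution traces in Figures 3(a) and 3(b), the accesses to x are HB ordered because Thread 1's release of m is HB ordered before Thread 2's acquire of m. In Figure 3(b), the release is CP ordered before the acquire by Rule (a), and it follows that the accesses to x are CP ordered by Rule (c), because Thread 1's access to x is HB ordered before the release and the acquire is HB ordered before Thread 2's access to x. In contrast, in Figure 3(a) the critical sections do not have conflicting accesses, so neither the critical sections nor the accesses to x are CP ordered.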
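Next consider the execution in Figure 4, ignoring the rightmost column (explained in Section 2.3). The accesses to x are CP ordered through the following logic: an ordering established by Rule (a) implies, by Rule (c), that an acquire of a lock is CP ordered before a later release of the same lock, which implies by Rule (b) that the corresponding critical sections are CP ordered. Since the accesses to x are HB ordered to and from those critical sections, the accesses to x are CP ordered by Rule (c).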
Prior work proves that the CP relation is sound (causally-precedes, ). (Footnote 3: Following prior work on predictive analysis (causally-precedes, ; rvpredict-pldi-2014, ), a sound analysis reports only true data races.) In particular, if a CP-race exists, there exists an execution that has an HB-race (two conflicting accesses unordered by HB) or a deadlock (causally-precedes, ).
2.3. Limitations of Recursive Ordering
This article targets the challenge of developing an online analysis for tracking the CP relation and detecting CP-races. An online analysis must (1) compute CP soundly and completely; (2) maintain analysis state that summarizes the execution so far, without needing to maintain and refer to the entire execution trace; and (3) analyze real program execution traces using time and space that is acceptable for heavyweight in-house testing.
The main difficulty in tracking the CP relation online is in summarizing the execution so far as analysis state. An analysis can compute the PO and HB relations for events executed so far based only on the events executed so far. In contrast, an online CP analysis must handle the fact that CP may order two events because of later events. For example, $e \prec_{CP} e'$ may hold only because of a future event $e''$ ($e' <_{tr} e''$); we provide a concrete example shortly. The analysis must summarize the possible order between $e$ and $e'$ at least until $e''$ executes, without needing access to the entire execution trace. Smaragdakis et al. explain the inherent challenge of developing an online analysis for CP as follows (causally-precedes, ):
CP reasoning, based on [the definition of CP], is highly recursive. Notably, Rule (c) can feed into Rule (b), which can feed back into Rule (c). As a result, we have not implemented CP using techniques such as vector clocks, nor have we yet discovered a full CP implementation that only does online reasoning (i.e., never needs to “look back” in the execution trace).
Smaragdakis et al.’s CP algorithm encodes the recursive definition of CP in Datalog, guaranteeing polynomial-time execution in the size of the execution trace. However, the algorithm is inherently offline because it fundamentally needs to “look back” at the entire execution trace. Experimentally, Smaragdakis et al. find that their algorithm does not scale to full program traces. Instead, they limit their algorithm’s computation to bounded windows of 500 consecutive events (causally-precedes, ).
Figure 4 illustrates the challenge of developing an online analysis that computes CP soundly and completely while handling the recursive nature of the CP definition. The last column shows the orderings relevant to the CP ordering of the accesses to x that are “knowable” after each event $e$. More formally, these are orderings that exist for a subtrace comprising the events up to and including $e$.
As Section 2 explained, because . However, at , an online analysis cannot determine that (and thus ) based on the execution subtrace so far. Not until is it knowable that and thus , , and . A sound and complete online analysis for CP must track analysis state that captures ordering once it is knowable without maintaining the entire execution trace.
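[Figure 4. Example execution with columns for threads T1, T2, and T3 and a final column listing the relevant orderings “knowable” after each event; one entry in the final column is annotated as knowable early because a later event is inevitable.]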
Alternatively, consider instead that T1 executed the critical section on u before the critical section on m. In that subtly different execution, the accesses to x are not CP ordered. A sound and complete online analysis for CP must track analysis state that captures the difference between these two execution variants.
We note that more challenging examples exist. For instance, it is possible to modify the example so that it is unknowable even at that . Section 5 presents three such examples.
3. Raptor Overview
Raptor (Race predictor) is a new online dynamic analysis that computes the CP relation soundly and completely by maintaining analysis state that captures CP orderings knowable for a subtrace of events up to and including the latest event in the execution. This section overviews the components of Raptor’s analysis state.
Throughout the rest of the article, we say that an event $e$ is CP ordered to a lock m or thread T if there exists an event $e'$ that releases m ($e' = rel(m)^i$) or is executed by T ($thr(e') = T$), respectively, and $e \prec_{CP} e'$. This property, in turn, implies that for any future event $e''$ (i.e., $e' <_{tr} e''$), $e \prec_{CP} e''$ if $e''$ acquires m ($e'' = acq(m)^j$) or is executed by T ($thr(e'') = T$), respectively, since CP composes with HB. Similarly, $e$ is HB ordered to a lock or thread if the same conditions hold for $\preceq_{HB}$ instead of $\prec_{CP}$.
Existing HB analyses typically represent analysis state using vector clocks (vector-clocks, ; multirace, ; fasttrack, ). Since the CP relation conditionally orders critical sections, conditional information is required on synchronization objects to accurately track CP. Compared with vector clocks, sets that track the HB and CP relations in terms of synchronization objects manage conditional information naturally. Raptor’s analysis state is represented by sets containing synchronization objects (locks and threads) that represent CP, HB, and PO orderings. For example, if a lock m is an element of the HB set of $wr(x)^i$, it means that the $i$th write to x, $wr(x)^i$, is HB ordered to m. Similarly, the thread element T2 in the CP set of $rd(y)^3_{T1}$ means that the event $rd(y)^3_{T1}$ (a read by T1 to y between the 3rd and 4th writes to y) is CP ordered to thread T2. Raptor’s sets are most related to the sets used by Goldilocks, a sound and complete HB data race detector (goldilocks-pldi-2007, ) (Section 8).
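To illustrate what tracking orderings with sets of synchronization objects looks like, the following Python sketch maintains only a per-access HB set, in the general style of a lockset-based HB detector. It is a simplified illustration under stated assumptions: it has no lock indices, CP sets, or conditional sets, it never removes sets, and the tuple encoding and all names are hypothetical rather than taken from Raptor or Goldilocks.

```python
from typing import List, Tuple

Event = Tuple[str, str, str]   # hypothetical: (thread, op, target)

def hb_set_races(trace: List[Event]) -> List[Tuple[int, int]]:
    """Report conflicting accesses (i, j) where access i is not HB ordered to j's thread."""
    owners = []   # one entry per access: (index, thread, op, variable, hb_set)
    races = []
    for j, (thread, op, target) in enumerate(trace):
        if op == "rel":
            # Every access already HB ordered to this thread is now HB ordered to the lock.
            for _, _, _, _, hb in owners:
                if thread in hb:
                    hb.add(("lock", target))
        elif op == "acq":
            # Every access HB ordered to the lock is now HB ordered to this thread.
            for _, _, _, _, hb in owners:
                if ("lock", target) in hb:
                    hb.add(thread)
        else:
            for i, t, o, var, hb in owners:
                is_conflict = var == target and t != thread and "wr" in (o, op)
                if is_conflict and thread not in hb:
                    races.append((i, j))
            # A new access is initially ordered only to its own thread.
            owners.append((j, thread, op, target, {thread}))
    return races

LOCKED = [("T1", "wr", "x"), ("T1", "acq", "m"), ("T1", "rel", "m"),
          ("T2", "acq", "m"), ("T2", "rel", "m"), ("T2", "rd", "x")]
UNLOCKED = [("T1", "wr", "x"), ("T2", "rd", "x")]
```

The state after each event depends only on the sets, not on earlier events, which is the property an online analysis needs. Locks are stored as `("lock", name)` tuples so that a lock and a thread with the same name cannot be confused.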
Sets for each access to a variable.
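As implied above, rather than each variable x having CP, HB, and PO sets, every access $wr(x)^i$ and $rd(x)^i_T$ has its own CP, HB, and PO sets. Per-access CP sets are necessary because of the nature of the CP relation: at $wr(x)^{i+1}$, it is not in general knowable whether $wr(x)^i \prec_{CP} wr(x)^{i+1}$ or $rd(x)^i_T \prec_{CP} wr(x)^{i+1}$. Similarly, it is not generally knowable at $rd(x)^i_T$ whether $wr(x)^i \prec_{CP} rd(x)^i_T$. In Figure 4, even after the second access to x executes, Raptor must continue to maintain sets for the first access to x because the CP ordering between the two accesses has not yet been established.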
Maintaining per-access sets would seem to require massive time and space (proportional to the length of the execution), making it effectively an offline analysis like prior work’s CP analysis (causally-precedes, ). However, as we show in Section 6, Raptor can safely remove sets under detectable conditions, e.g., it can remove ’s sets once it determines that and .
Sets for lock acquires.
Raptor tracks CP, HB, and PO sets not just for variable accesses, but also for lock acquire operations to compute CP order by Rule (b) (i.e., $acq(m)^i \prec_{CP} rel(m)^j$ implies $rel(m)^i \prec_{CP} acq(m)^j$). For example, T3 in the CP set of $acq(m)^i$ means the event $acq(m)^i$ is CP ordered to thread T3.
Similar to sets for variable accesses, maintaining a CP, HB, and PO set for each lock acquire might consume high time and space proportional to the execution’s length. In Section 6, we show how Raptor can safely remove an acquire $acq(m)^i$’s sets once they are no longer needed, i.e., once no other CP ordering is dependent on the possibility of $acq(m)^i$ being CP ordered with a future rel(m).
Conditional CP sets.
As mentioned earlier, it is unknowable in general at an event $e'$ whether $e \prec_{CP} e'$. This recursive nature of the CP definition prevents immediate determination of CP ordering at $e'$. This delayed knowledge is unavoidable due to Rule (b), which states that $rel(m)^i \prec_{CP} acq(m)^j$ if $acq(m)^i \prec_{CP} rel(m)^j$. A CP ordering might not be known until $rel(m)^j$ executes, or even later, because Rule (c) can “feed into” Rule (b), which can feed back into Rule (c) (causally-precedes, ).
Raptor maintains conditional CP (CCP) sets to track the fact that, at a given event in an execution, CP ordering may or may not exist, depending on whether some other CP ordering exists. For example, an element for lock n (or thread T) that is conditional on a lock instance $m^j$ in an event's CCP set means that the event is CP ordered to lock n (or thread T) if $acq(m)^j \prec_{CP} rel(m)^k$ for some future event $rel(m)^k$.
In contrast with the above, Goldilocks does not need or use sets for each variable access, sets for lock acquires, or conditional sets, since it maintains sets that track only the HB relation (goldilocks-pldi-2007, ).
Outline of Raptor presentation.
Section 4 describes Raptor’s sets and their elements in detail, and it presents invariants maintained by Raptor’s sets at every event in an execution trace. Section 5 introduces the Raptor analysis that adds and, in some cases, removes set elements at each execution event. Section 6 describes how Raptor removes “obsolete” sets and detects CP-races.
4. Raptor’s Analysis State and Invariants
This section describes the analysis state that Raptor maintains. Every set owner, which can be a variable write instance $wr(x)^i$, a variable read instance $rd(x)^i_T$, or a lock acquire instance $acq(m)^i$, has the following sets: a PO set, an HB set, a CP set, and a CCP set. In general, elements of each set are threads T and locks m, with a few caveats: the HB set maintains an index for each lock element, and each CCP element includes an associated lock instance upon which CP ordering is conditional. In addition, each set for a variable write instance $wr(x)^i$ and read instance $rd(x)^i_T$ can contain a special element, which indicates ordering between $wr(x)^i$ and $wr(x)^{i+1}$ and between $rd(x)^i_T$ and $wr(x)^{i+1}$. Similarly, each set for $wr(x)^i$ can also contain a special element for each thread T, which indicates ordering between $wr(x)^i$ and $rd(x)^i_T$. Since knowledge of CP ordering may be delayed, a write or read instance could establish CP order to a thread T at an event later than the conflicting write or read instance. The special elements are necessary to distinguish CP ordering to the conflicting write or read instance from CP ordering to a later event.
Figure 5 shows invariants that the Raptor analysis maintains for every set owner . The rest of this section explains these invariants in detail, using events , , , and as defined in the figure.
4.1. Program Order Set
According to the [PO] invariant in Figure 5, a set owner's PO set contains all threads that the owner's event $e$ is PO ordered to. That said, we know from the definition of PO that $e$ will be PO ordered to only one thread (the thread that executed $e$). In addition, for any $wr(x)^i$ or $rd(x)^i_T$, the PO set may contain the special write element, indicating that $wr(x)^i$ or $rd(x)^i_T$, respectively, is PO ordered to the next write access to x, i.e., $wr(x)^i \prec_{PO} wr(x)^{i+1}$ or $rd(x)^i_T \prec_{PO} wr(x)^{i+1}$. Similarly, for any $wr(x)^i$, the PO set may contain the special read element for thread T, indicating that $wr(x)^i$ is PO ordered to the next read access to x by thread T, i.e., $wr(x)^i \prec_{PO} rd(x)^i_T$. Note that Raptor does not really need the special elements to indicate these PO orderings, since PO order is knowable at the next (read/write) access, but Raptor uses these elements for consistency with the CP and CCP sets, which do need the special elements as explained later in this section.
4.2. Happens-Before Set
The set contains threads and locks that the event is HB ordered to. Figure 5 states three invariants for : the [HB], [HB-index], and [HB-critical-section] invariants.
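The [HB] invariant defines which threads and locks are in a set owner's HB set. If the owner's event $e$ is HB ordered to a thread or lock, then the HB set contains that thread or lock. This property implies that $e$ will be HB ordered to any future event that executes on the same thread or acquires the same lock, respectively. Similar to PO sets, for $wr(x)^i$ or $rd(x)^i_T$, the special write element in the HB set means $wr(x)^i \prec_{HB} wr(x)^{i+1}$ or $rd(x)^i_T \prec_{HB} wr(x)^{i+1}$, respectively. Additionally, for $wr(x)^i$, the special read element for thread T in the HB set means $wr(x)^i \prec_{HB} rd(x)^i_T$. Though the special elements are superfluous here (HB ordering is knowable at the next (read/write) access), Raptor maintains them for consistency with the CP and CCP sets, which need them.
According to the [HB-index] invariant, every lock m in the HB set has a superscript (e.g., $m^j$) that specifies the earliest release of m that $e$ is HB ordered to. For example, $m^j$ in the HB set means that $e \prec_{HB} rel(m)^j$ but $e \nprec_{HB} rel(m)^{j-1}$. This property tracks which instance of the critical section on lock m (the $j$th) would need to be CP ordered to a later critical section on m to imply that $e$ is CP ordered to that later critical section (by Rules (b) and (c)).
According to the [HB-critical-section] invariant, for read/write accesses ($wr(x)^i$ or $rd(x)^i_T$) only, a lock element $m^j$ in the HB set may have a subscript, indicating that, in addition to $e \prec_{HB} rel(m)^j$, $e$ executed inside the $j$th critical section on lock m, i.e., $acq(m)^j \prec_{PO} e \prec_{PO} rel(m)^j$. Notationally, whenever the subscripted element is in the HB set, the unsubscripted element $m^j$ is also implied to be in the set. Raptor tracks this property to establish Rule (a) precisely.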
4.3. Causally-Precedes Set
Analogous to for HB ordering, each set contains locks and threads that the event is CP ordered to. However, at an event , since Rule (b) may delay establishing , does not necessarily contain such that . This property of CP presents two main challenges. First, Rule (b) may delay establishing CP order that is dependent on other CP orders. Raptor introduces the set (described below) to track potential CP ordering that may be established later. Raptor tracks every lock and thread that is CP ordered to, either eagerly using or lazily using , according to the [CP] invariant in Figure 5.
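Second, as a result of computing CP lazily, Raptor may not be able to determine whether there is a CP-race between conflicting events $wr(x)^i$ and $wr(x)^{i+1}$, $rd(x)^i_T$ and $wr(x)^{i+1}$, or $wr(x)^i$ and $rd(x)^i_T$ until after the second conflicting access event $wr(x)^{i+1}$ or $rd(x)^i_T$. For example, if the analysis adds T to the CP set of $wr(x)^i$ sometime after T executed $wr(x)^{i+1}$, that does not necessarily mean that $wr(x)^i \prec_{CP} wr(x)^{i+1}$ (it means only that $wr(x)^i$ is CP ordered to some event by T after $wr(x)^{i+1}$). Raptor uses a special thread-like element that represents the thread T up to event $wr(x)^{i+1}$ only, so this element is in the CP set of $wr(x)^i$ or $rd(x)^i_T$ only if $wr(x)^i \prec_{CP} wr(x)^{i+1}$ or $rd(x)^i_T \prec_{CP} wr(x)^{i+1}$, respectively. Raptor also uses a special thread-like element, one per thread T, that represents the thread T up to event $rd(x)^i_T$ only, so this element is in the CP set of $wr(x)^i$ only if $wr(x)^i \prec_{CP} rd(x)^i_T$.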
The [CP-rule-A] invariant (Figure 5) covers the case for which Raptor always computes CP eagerly: when two critical sections on the same lock have conflicting events, according to Rule (a). In this case, the invariant states that if two critical sections, and , are CP ordered by Rule (a) alone, then as soon as the second conflicting access executes. The [CP-rule-A] invariant is useful in proving that Raptor maintains the [CP] invariant (Appendix A).
4.4. Conditionally Causally-Precedes Set
Section 3 overviewed CCP sets. In general, means that the event is CP ordered to if , where is an ongoing critical section (i.e., ).
As mentioned above, the [CP] invariant says that for every CP ordering, Raptor captures it eagerly in a CP set or lazily in a CCP set (or both). A further constraint, codified in the [CCP-constraint] invariant, is that only if a critical section on lock n is ongoing. As Section 5 shows, when n’s current critical section ends (at ), Raptor either (1) determines whether or (2) identifies another lock q that has an ongoing critical section such that it is correct to add some to .
Like , when or , can contain special thread-like elements of the form . The element or means that if , or if , respectively, where is the current ongoing critical section of n. Similarly, for , can contain special thread-like elements of the form . The element means that if where is the current ongoing critical section of n.
[Figure 6. Columns: Execution; Analysis state changes.]
Raptor state example.
Figure 6 shows updates to Raptor’s analysis state at each event for the execution from Figure ((b))(b). Directly after , , satisfying the [PO] invariant; and , satisfying the [HB], [HB-index], and [HB-critical-section] invariants since . Directly after , , , and contain satisfying the [CP] and [CCP-constraint] invariants, capturing the fact that T1’s events are CP ordered to T2 if . Directly after , satisfies the [CP-rule-A] invariant, and satisfies the [CP] invariant. Finally, after and , and , respectively, satisfying the [CP] invariant, indicating .
5. The Raptor Analysis
This section details Raptor, our novel dynamic analysis that maintains the invariants shown in Figure 5 and explained in Section 4. For each event $e$ in the observed trace $tr$, Raptor updates its analysis state by adding and (in some cases) removing elements from each set owner’s sets. Assuming that the analysis state satisfies the invariants immediately before $e$, then at event $e$, Raptor modifies the analysis state so that it satisfies the invariants immediately after $e$:
Theorem 5.1.
After every event, Raptor maintains the invariants in Figure 5.
Appendix A proves the theorem.
To represent the new analysis state immediately after , we use the notation ,