1. Introduction
The presence of data races in concurrent software is the most common indication of a programming error. Data races in programs can result in nondeterministic behavior that can have unintended consequences. Further, manual debugging of such errors is prohibitively difficult owing to nondeterminism. Therefore, automated detection and elimination of data races is an important problem that has received widespread attention from the research community. Dynamic race detection techniques examine a single execution of a concurrent program to discover a data race in the program. In this paper we focus on dynamic race detection.
Dynamic race detection may be either sound or unsound. Unsound techniques, like lockset-based methods (Savage et al., 1997), have low overhead but may report potential races that are spurious. Sound techniques (Lamport, 1978; Mattern, 1988; Said et al., 2011; Huang et al., 2014; Smaragdakis et al., 2012; Kini et al., 2017), on the other hand, never report a data race if none exists. The most popular sound technique is based on computing the happens-before (HB) partial order (Lamport, 1978) on the events of the trace, and declares a data race when there is a pair of conflicting events (reads/writes to a common memory location performed by different threads, at least one of which is a write operation) that are unordered by the partial order. There are two reasons for the popularity of the HB technique. First, because it is sound, it does not report false positives. Low false positive rates are critical for the widespread use of debugging techniques (Serebryany and Iskhodzhanov, 2009; Sadowski and Yi, 2014). Second, even though HB-based algorithms may miss races detected by other sound techniques (Said et al., 2011; Huang et al., 2014; Smaragdakis et al., 2012; Kini et al., 2017), they have the lowest overhead among sound techniques. Many improvements (Pozniansky and Schuster, 2003; Flanagan and Freund, 2009; Elmas et al., 2007) to the original vector clock algorithm (Mattern, 1988) have helped reduce the overhead even further.
However, HB-based dynamic analysis tools suffer from some drawbacks. Recall that a program has a data race if there is some execution of the program where a pair of conflicting data accesses are performed consecutively. Even though HB is a sound technique, its soundness guarantee is limited to the first pair of unordered conflicting events; a formal definition of the “first” unordered pair is given later in the paper. Thus, a trace may have many HB-unordered pairs of conflicting events (popularly called HB-races) that do not correspond to data races. To see this, consider the example program and trace shown in Fig. 1. The trace corresponds to first executing the statement of thread , before executing the statements of thread . The statement y := x + 5 requires first reading the value of (which is ) and then writing to . Recall that HB orders (i) two events performed by the same thread, and (ii) synchronization events performed by different threads, in the order in which they appear in the trace. Using to denote the th event of the trace, in this trace since there are no synchronization events, both and are in HB-race. Observe that while and can appear consecutively in a trace (as in Fig. 1(b)), there is no trace of the program where and appear consecutively. Thus, even though the events and are unordered by HB, they do not constitute a data race.
As a consequence, developers typically fix the first race discovered, rerun the program and the dynamic race detection algorithm, and repeat the process until no races are discovered. This approach to bug fixing suffers from many disadvantages. First, running race detection algorithms can be expensive (Sadowski and Yi, 2014), and so running them many times is a significant overhead. Second, even though only the first HB-race is guaranteed to be a real race, it doesn’t mean that it is the only HB-race that is real. Consider the example shown in Fig. 2. In the trace (shown in Fig. 2(b)), both pairs and are in HB-race. demonstrates that is a valid data race (because they are scheduled consecutively). But is also a valid data race. This can be seen by first executing y := 1; in thread , followed by if (x == 0) skip; in thread , and then finally x := 2; in . The approach of fixing the first race, and then re-executing and performing race detection, not only unnecessarily ignores the race , but it might miss it completely because might not show up as an HB-race in the next execution due to the inherent nondeterminism when executing multithreaded programs. As a result, most practical race detection tools, including ThreadSanitizer (Serebryany and Iskhodzhanov, 2009), Helgrind (Müehlenfeld and Wotawa, 2007) and FastTrack (Flanagan and Freund, 2009), report more than one race, even if those races are likely to be false, to give software developers the opportunity to fix more than just the first race. In Appendix C, we illustrate this observation on four practical dynamic race detection tools based on the happens-before partial order. Each of these tools resorts to naïvely reporting races beyond the first race and produces false positives as a result.
The central question we would like to explore in this paper is: can we detect multiple races in a given trace, soundly? One approach would be to mimic the software developer’s strategy in using HB-race detectors: every time a race is discovered, force an order between the two events constituting the race and then analyze the subsequent events. This ensures that the HB soundness theorem then applies to the next race discovered, and so on. Such an algorithm can be proved to only discover valid data races. For example, in trace (Fig. 1), after discovering the race assume that the events and are ordered when analyzing events after in the trace. By this algorithm, when we process event , we will conclude that are not in race because comes before , has been force ordered before , and is before , and so is ordered before . However, force ordering will miss valid data races present in the trace. Consider the trace from Fig. 2. Here the force ordering algorithm will only discover the race and will miss which is a valid data race. Another approach (Huang et al., 2014) is to search for a reordering of the events in the trace that respects the data dependencies amongst the read and write events, and the effect of synchronization events like lock acquires and releases. Here one encodes the event dependencies as logical constraints, where the correct reordering of events corresponds to a satisfying truth assignment. The downside of this approach is that the SAT formula encoding event dependencies can be huge even for a trace with a few thousand events. Typically, to avoid the prohibitive cost of determining the satisfiability of such a large formula, the trace is broken up into small “windows”, and the formula only encodes the dependencies of events within a window. In addition, solver timeouts are added to give up the search for another reordering. As a consequence, this approach can miss many data races in practice (see our experimental evaluation in Section 5).
In this paper, we present a new partial order on events in an execution that we call schedulable happens-before (SHB) to address these challenges. Unlike recent attempts (Smaragdakis et al., 2012; Kini et al., 2017) to weaken HB to discover more races, SHB is a strengthening of HB: some HB-unordered events will be ordered by SHB. However, the first HB-race (which is guaranteed to be a real data race by the soundness theorem for HB) will also be SHB-unordered. Further, every race detected using SHB is a valid, schedulable race. In addition, we prove that, not only does SHB discover every race found by the naïve force ordering algorithm and more (for example, SHB will discover both races in Fig. 2), it will detect all HB-schedulable races. The fact that SHB detects precisely the set of HB-schedulable races, we hope, will make it popular among software developers because of its enhanced predictive power per trace and the absence of false positives.
We then present a simple vector clock based algorithm for detecting all SHB races. Because the algorithm is very close to the usual HB vector clock algorithm, it has a low overhead. We also show how to adapt existing improvements to the HB algorithm, like the use of epochs (Flanagan and Freund, 2009), into the SHB algorithm to lower overhead. We believe that existing HB-based detectors can be easily modified to leverage the greater power of SHB-based analysis. We have implemented our SHB algorithm and analyzed its performance on standard benchmarks. Our experiments demonstrate that (a) many HB-unordered conflicting events may not be valid data races, (b) there are many valid races missed by the naïve force ordering algorithm, (c) SHB-based analysis adds only a little overhead compared to the HB-based vector clock algorithm, and (d) improvements like the use of epochs are effective in enhancing the performance of SHB analysis.
The rest of the paper is organized as follows: Section 2 introduces notations and definitions relevant for the paper. In Section 3, we introduce the partial order SHB and present an exact characterization of schedulable races using this partial order. In Section 4, we describe a vector clock algorithm for detecting schedulable races based on SHB. We then show how to incorporate epoch-based optimizations into this vector clock algorithm. Section 5 describes our experimental evaluation. We discuss relevant related work in Section 6 and present concluding remarks in Section 7.
2. Preliminaries
In this section, we will fix notation and present some definitions that will be used in this paper.
Traces. We consider concurrent programs under the sequential consistency model. Here, an execution, or trace, of a program is viewed as an interleaving of operations performed by different threads. We will use , and to denote traces. For a trace , we will use to denote the set of threads in . A trace is a sequence of events of the form , where , and can be one of , (read or write to memory location ), , (acquire or release of lock ) and , (fork or join to some thread ). (Footnote 1: Formally, each event in a trace is assumed to have a unique event id. Thus, two occurrences of a thread performing the same operation will be considered different events. Even though we will implicitly assume the uniqueness of each event in a trace, to reduce notational overhead, we do not formally introduce event ids.) To keep the presentation simple, we assume that locks are not reentrant. However, all the results can be extended to the case when locks are assumed to be reentrant. The set of events in trace will be denoted by . We will also use (resp. ) to denote the set of events that read (resp. write) to memory location . Further (resp. ) denotes the union of the above sets over all memory locations . For an event , the last write before is the (unique) event such that appears before in the trace , and there is no event between and in . The last write before event may be undefined, if there is no event before . We denote the last write before by . An event is said to be an event of thread if either or . The projection of a trace to a thread is the maximal subsequence of that contains only events of thread , and will be denoted by ; thus an event (or ) belongs to both and . For an event of thread , we denote by the last event before in such that and are events of the same thread. Again, may be undefined for an event . The projection of to a lock , denoted by , is the maximal subsequence of that contains only acquire and release events of lock .
Traces are assumed to be well formed — for every lock , is a prefix of some string belonging to the regular language .
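The notions above (last write, projection to a thread, well-formedness) can be illustrated with a minimal Python sketch. The event encoding `(thread, op, target)` and the function names are hypothetical choices made for illustration, and the well-formedness check assumes the intended condition is that each lock is alternately acquired and released by the same thread, with an unfinished critical section allowed at the end of the trace.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event encoding: (thread, op, target), where op is one of
# "r", "w", "acq", "rel", "fork", "join".
Event = Tuple[str, str, str]


def last_write(trace: List[Event], i: int) -> Optional[int]:
    """Index of the last write to the location read at position i, or None."""
    _, op, loc = trace[i]
    assert op == "r", "the last write is only defined for read events"
    for j in range(i - 1, -1, -1):
        if trace[j][1] == "w" and trace[j][2] == loc:
            return j
    return None


def events_of_thread(trace: List[Event], t: str) -> List[int]:
    """Indices of the projection to thread t. A fork or join that targets t
    counts as an event of t as well as of the thread performing it."""
    return [i for i, (u, op, x) in enumerate(trace)
            if u == t or (op in ("fork", "join") and x == t)]


def well_formed(trace: List[Event]) -> bool:
    """Assumed condition: every lock is alternately acquired and released,
    and each release is performed by the thread holding the lock."""
    holder: Dict[str, str] = {}  # lock -> thread currently holding it
    for t, op, x in trace:
        if op == "acq":
            if x in holder:
                return False
            holder[x] = t
        elif op == "rel":
            if holder.get(x) != t:
                return False
            del holder[x]
    return True
```

Because fork and join events are shared between two threads, the projections of a trace to different threads may overlap, which is the subtlety the definition of projection is meant to capture.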
Example 2.1 ().
Let us illustrate the definitions and notations about traces introduced in the previous paragraph. Consider the trace shown in Fig. 3. As in the introduction, we will refer to the th event in the trace by . For trace we have — ; ; . The last write of the read events is as follows: and . The projection with respect to lock is . The definition of projection to a thread is subtle in the presence of forks and joins. This can be seen by observing that ; this is because the fork event and the join event are considered to be events of both threads and by our definition. Finally, we illustrate through a few examples — , is undefined, , and . The cases of and are the most interesting, and they follow from the fact that both and are also considered to be events of .
Orders. A given trace induces several total and partial orders. The total order will be used to denote the trace-order: iff either or appears before in the sequence . Similarly, the thread-order is the smallest partial order such that for all pairs of events performed by the same thread, we have .
Definition 2.2 (Happens-Before).
Given trace , the happens-before order is the smallest partial order on such that


,

for every pair of events , and, with , we have
Example 2.3 ().
We illustrate the definitions of , and using trace from Fig. 3. Trace order is the simplest; iff . Thread order is also straightforward in most cases; the interesting cases of and follow from the fact that and are events of both threads and . Finally, let us consider . It is worth observing that simply because these events are thread ordered due to the fact that and are events of both thread and . In addition, because by rule (b), and , and is transitive.
Trace Reorderings. Any trace of a concurrent program represents one possible interleaving of concurrent events. The notion of correct reordering (Smaragdakis et al., 2012; Kini et al., 2017) of trace identifies all these other possible interleavings of . In other words, if is a correct reordering of then any program that produces may also produce . The definition of correct reordering is given purely in terms of the trace and is agnostic of the program that produced it. We give the formal definition below.
Definition 2.4 (Correct reordering).
A trace is said to be a correct reordering of a trace if


is a prefix of , and

for a read event such that is not the last event in , exists iff exists. Further, if it exists, then .
The intuition behind the above definition is the following. A correct reordering must preserve lock semantics (ensured by the fact that is a trace) and the order of events inside a given thread (condition (a)). Condition (b) captures local determinism (Huang et al., 2014). That is, only the previous events in a given thread determine the next event of the thread. Since the underlying program that generated can have branch events that depend upon the data in shared memory locations, all reads in , except for the last events in each thread, must see the same value as in ; since our traces don’t record the value written, this can be ensured by conservatively requiring every read to see the same write event. If the last event of any thread in is a read, we allow that this event may not see the same value (and thus the same last write event) as in . For example, consider the program and trace given in Fig. 1. The read event in the conditional in thread cannot be swapped with the preceding event in thread , because that would result in a different branch being taken in , and the assignment x := 10 in will never be executed. However, this is required only if the read event is not the last event of the thread in the reordering. If it is the last event, it does not matter what value is read, because it does not affect future behavior.
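The two conditions, together with lock semantics, can be checked mechanically. The following Python sketch is one possible reading of Definition 2.4 as explained in the paragraph above; the event encoding `(event id, thread, op, target)` and all function names are hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: (event id, thread, op, target); op is one of
# "r", "w", "acq", "rel", "fork", "join".
Event = Tuple[int, str, str, str]


def threads_of(e: Event) -> Set[str]:
    """A fork/join event belongs to both the acting and the target thread."""
    _, t, op, x = e
    return {t, x} if op in ("fork", "join") else {t}


def projection(trace: List[Event], t: str) -> List[Event]:
    return [e for e in trace if t in threads_of(e)]


def last_writes(trace: List[Event]) -> Dict[int, Optional[int]]:
    """Map each read's id to the id of the write it observes (None if none)."""
    latest: Dict[str, int] = {}
    seen: Dict[int, Optional[int]] = {}
    for eid, _, op, x in trace:
        if op == "r":
            seen[eid] = latest.get(x)
        elif op == "w":
            latest[x] = eid
    return seen


def locks_respected(trace: List[Event]) -> bool:
    holder: Dict[str, str] = {}
    for _, t, op, x in trace:
        if op == "acq":
            if x in holder:
                return False
            holder[x] = t
        elif op == "rel":
            if holder.pop(x, None) != t:
                return False
    return True


def is_correct_reordering(rho: List[Event], sigma: List[Event]) -> bool:
    if not locks_respected(rho):  # rho must itself be a trace
        return False
    threads = set().union(*(threads_of(e) for e in rho + sigma))
    lw_rho, lw_sigma = last_writes(rho), last_writes(sigma)
    for t in threads:
        p_rho, p_sigma = projection(rho, t), projection(sigma, t)
        # Condition (a): per-thread projections of rho are prefixes.
        if p_rho != p_sigma[:len(p_rho)]:
            return False
        # Condition (b): every read that is not last in its thread
        # must observe the same last write as in sigma.
        for e in p_rho[:-1]:
            if e[1] == t and e[2] == "r" and lw_rho[e[0]] != lw_sigma[e[0]]:
                return False
    return True
```

Note that the last event of each thread is exempt from the last-write check, mirroring the relaxation discussed above: a trailing read may see a different write because nothing in its thread depends on it afterwards.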
We note that the definition of correct reordering we have is more general than in (Kini et al., 2017; Smaragdakis et al., 2012) because of the relaxed assumption about the last-write events corresponding to read events which are not followed by any other events in their corresponding threads. In other words, every correct reordering of a trace according to the definition in (Smaragdakis et al., 2012; Kini et al., 2017) is also a correct reordering of as per Definition 2.4, but the converse is not true. On the other hand, the related notion of feasible set of traces (Huang et al., 2014) allows for an even larger set of alternate reorderings that can be inferred from an observed trace by only enforcing that the last-write event corresponding to a read event must write the same value that reads in . In particular, may not be the same as .
In addition to correct reorderings, another useful collection of alternate interleavings of a trace is as follows. Under the assumption that identifies certain causal dependencies between events of , we consider interleavings of that are consistent with .
Definition 2.5 (respecting trace).
For trace , we say trace respects if for any such that and , we have and .
Thus, a respecting trace is one whose events are downward closed with respect to and in which ordered events are not flipped. We will be using the above notion only when the trace is a reordering of , and hence .
Example 2.6 ().
We give examples of correct reorderings of shown in Fig. 3. The traces , , and are all examples of correct reorderings of . Among these, the trace is not respecting because it is not downward closed: events are all HB-before and none of them are in .
Race. It is useful to recall the formal definition of a data race, and to state the soundness guarantees of happens-before. Two data access events and are said to be conflicting if , and at least one among and is a write event. A trace is said to have a race if it is of the form such that and are conflicting; here is either called a race pair or a race. A concurrent program is said to have a race if it has an execution that has a race.
The partial order is often employed for the purpose of detecting races by analyzing program executions. In this context, it is useful to define what we call an HB-race. A pair of conflicting events is said to be an HB-race if and and are incomparable with respect to (i.e., neither nor ). We say an HB-race is the first HB-race if for any other HB-race in , either , or and . For example, the pair in trace from Fig. 1 is the first HB-race of . The soundness guarantee of HB says that if a trace has an HB-race, then the first HB-race is a valid data race.
Theorem 2.7 (Soundness of HB).
Let be a trace with an HB-race, and let be the first HB-race. Then, there is a correct reordering of , such that .
Instead of sketching the proof of Theorem 2.7, we will see that it follows from the main result of this paper, namely, Theorem 3.3.
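To make the notion of HB-race and first HB-race concrete, here is a small quadratic-time Python sketch that computes happens-before predecessors explicitly rather than with vector clocks. It assumes that happens-before is generated by thread order (with fork/join events shared between threads) and by release-to-acquire edges on the same lock; the event encoding and function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

Event = Tuple[str, str, str]  # hypothetical encoding: (thread, op, target)


def threads_of(e: Event) -> Set[str]:
    t, op, x = e
    return {t, x} if op in ("fork", "join") else {t}


def hb_predecessors(trace: List[Event]) -> List[Set[int]]:
    """preds[j] holds every index i < j with e_i happens-before e_j."""
    preds: List[Set[int]] = []
    for j, ej in enumerate(trace):
        pj: Set[int] = set()
        for i in range(j):
            ei = trace[i]
            same_thread = bool(threads_of(ei) & threads_of(ej))
            rel_acq = ei[1] == "rel" and ej[1] == "acq" and ei[2] == ej[2]
            if same_thread or rel_acq:
                pj |= preds[i] | {i}  # preds[i] is already transitively closed
        preds.append(pj)
    return preds


def conflicting(a: Event, b: Event) -> bool:
    """Accesses to one location by different threads, at least one a write."""
    return (a[1] in ("r", "w") and b[1] in ("r", "w")
            and a[2] == b[2] and a[0] != b[0] and "w" in (a[1], b[1]))


def hb_races(trace: List[Event]) -> List[Tuple[int, int]]:
    preds = hb_predecessors(trace)
    return [(i, j) for i, j in combinations(range(len(trace)), 2)
            if conflicting(trace[i], trace[j]) and i not in preds[j]]


def first_hb_race(trace: List[Event]) -> Optional[Tuple[int, int]]:
    """Earliest second event; ties broken by the latest first event."""
    races = hb_races(trace)
    return min(races, key=lambda r: (r[1], -r[0])) if races else None
```

On a trace where thread `t1` reads `x` and writes `y`, and thread `t2` then reads `y` and writes `x`, this reports two HB-races, of which only the first is covered by the soundness guarantee.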
Example 2.8 ().
We conclude this section by giving examples of HB-races. Consider again from Fig. 3. Among the different pairs of conflicting events in , the HB-races are , , , , , , , and .
Remark.
Our model of executions and reorderings assumes sequential consistency, which is a standard model used by most race detection tools. Executions in a more general memory model, such as Total Store Order (TSO), would also have events that indicate when a local write was committed to the global memory (Huang and Huang, 2016). In that scenario, the definition of correct reorderings would be similar, except that “last write” would be replaced by “last observed write”, which would either be the last committed write or the last write by the same thread, whichever is later in the trace. The number of correct reorderings to be considered would increase: instead of just considering executions where every write is immediately committed, as we do here, we would also need to consider reorderings where the write commits are delayed. However, since our results here are about proving the existence of a reordered trace where a race is observed, they carry over to the more general setting. We might miss race pairs that could be shown to be in race in a weaker memory model, where more reorderings are permitted, but the races we identify would still be valid.
3. Characterizing Schedulable Races
The example in Fig. 1 shows that not every HB-race corresponds to an actual data race in the program. The goal of this section is to characterize those HB-races which correspond to actual data races. We do this by introducing a new partial order, called schedulable happens-before, and using it to identify the actual data races amongst the HB-races of a trace. We begin by characterizing the HB-races that correspond to actual data races.
Definition 3.1 (schedulable race).
Let be a trace and let be conflicting events in . We say that is a schedulable race if there is a correct reordering of that respects and or for some trace .
Note that any schedulable race is a valid data race in . Our aim is to characterize schedulable races by means of a new partial order. The new partial order, given below, is a strengthening of .
Definition 3.2 (Schedulable Happens-Before).
Let be a trace. Schedulable happens-before, denoted by , is the smallest partial order on such that



The partial order can be used to characterize schedulable races. We state this result, before giving examples illustrating the definition of .
Theorem 3.3 ().
Let be a trace and be conflicting events in . is a schedulable race iff either is undefined, or .
Proof.
(Sketch) The full proof is presented in Appendix A; here we sketch the main ideas. We observe that if is a correct reordering of that also respects , then also respects except possibly for the last events of every thread in . That is, for any such that , , and is not the last event of some thread in , we have and . Therefore, if , then any correct reordering respecting that contains both and will also have . Further since is not the last event of its thread (since is present in ) and , must occur between and in . Therefore is not a schedulable race. The other direction can be established as follows. Let be the trace consisting of events that are before or (if defined), ordered as in . Define . We prove that when and satisfy the condition in the theorem, as defined here, is a correct reordering and also respects . ∎
We now illustrate the use of through some examples.
Example 3.4 ().
In this example, we will look at different traces, and see how reasons. Like in the introduction, we will use to refer to the th event of a given trace (which will be clear from context). Let us begin by considering the example program and trace from Fig. 1. Notice that , and so and are HB-races. Because , we have . Using Theorem 3.3, we can conclude correctly that (a) is schedulable as is undefined, but (b) is not, as .
Let us now consider trace from Fig. 2. Observe that , and so both and are schedulable races by Theorem 3.3. Note that, unlike force ordering, correctly identifies all real data races.
Finally, let us consider two trace examples that highlight the kind of subtle reasoning is capable of. Let us begin with from Fig. 3. As observed in Example 2.8, the only HB-races in this trace are , , , , , , , and . Both and are schedulable as demonstrated by the reorderings and from Example 2.6. However, the remaining are not real data races. Let us consider the pairs and for example. Theorem 3.3’s justification for it is as follows: . But, let us unravel the reasoning behind why neither nor are data races. Consider an arbitrary correct reordering of that respects and contains . Since is also an event of , . In addition, as . Now, since , is before in and since , must also be before . Therefore, and will be between and and between and . Similar reasoning can be used to conclude that the other pairs are not schedulable as well.
Lastly, consider trace shown in Fig. 4. In this case, . All conflicting memory accesses are in HB-race. While HB correctly identifies the first race as valid, there are 3 HB-races that are not real data races: , , and . is not valid because any correct reordering of must have before and before . This is also captured by SHB reasoning because . A similar reasoning shows that is not valid. The interesting case is that of . Here, in any correct reordering of , the following must be true: (a) if then ; (b) if then ; (c) if then ; and (d) if then . Therefore, any correct reordering of containing both and contains and (because of (a) and (b)) and must contain at least one of or to ensure that critical sections of don’t overlap. Then in , and cannot be consecutive because either or will appear between them (properties (c) and (d)). This is captured using SHB and Theorem 3.3 by the fact that .
We conclude this section by observing that the soundness guarantee of HB (Theorem 2.7) follows from Theorem 3.3. Consider a trace whose first HB-race is . We claim that is a schedulable race. Suppose (for contradiction) it is not. Then by Theorem 3.3, is defined and . Now observe that we must have (or otherwise , contradicting our assumption that is an HB-race). Then, by the definition of (Definition 3.2), there are two events and (possibly the same as and ) such that , , , and . Then is an HB-race, and it contradicts the assumption that is the first HB-race.
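The characterization can be explored on small traces with a brute-force Python sketch. It rests on two assumptions about the statements above: (i) schedulable happens-before extends happens-before with an edge from the last write of each read to that read, and (ii) the criterion of Theorem 3.3 declares a conflicting pair schedulable when the thread predecessor of the second event is undefined or the first event is not ordered before (or equal to) that predecessor. The event encoding and all names are hypothetical, and this quadratic procedure is not the vector clock algorithm of Section 4.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Event = Tuple[str, str, str]  # hypothetical encoding: (thread, op, target)


def threads_of(e: Event) -> Set[str]:
    t, op, x = e
    return {t, x} if op in ("fork", "join") else {t}


def shb_predecessors(trace: List[Event]) -> List[Set[int]]:
    """preds[j]: indices ordered strictly before e_j under the assumed SHB."""
    preds: List[Set[int]] = []
    last_w: Dict[str, int] = {}  # location -> index of its latest write
    for j, ej in enumerate(trace):
        pj: Set[int] = set()
        for i in range(j):
            ei = trace[i]
            same_thread = bool(threads_of(ei) & threads_of(ej))
            rel_acq = ei[1] == "rel" and ej[1] == "acq" and ei[2] == ej[2]
            if same_thread or rel_acq:  # happens-before edges
                pj |= preds[i] | {i}
        if ej[1] == "r" and ej[2] in last_w:  # assumed last-write edge
            w = last_w[ej[2]]
            pj |= preds[w] | {w}
        if ej[1] == "w":
            last_w[ej[2]] = j
        preds.append(pj)
    return preds


def thread_pred(trace: List[Event], j: int) -> Optional[int]:
    """Latest earlier event sharing a thread with e_j, if any."""
    for i in range(j - 1, -1, -1):
        if threads_of(trace[i]) & threads_of(trace[j]):
            return i
    return None


def conflicting(a: Event, b: Event) -> bool:
    return (a[1] in ("r", "w") and b[1] in ("r", "w")
            and a[2] == b[2] and a[0] != b[0] and "w" in (a[1], b[1]))


def schedulable_races(trace: List[Event]) -> List[Tuple[int, int]]:
    preds = shb_predecessors(trace)
    races = []
    for i, j in combinations(range(len(trace)), 2):
        if not conflicting(trace[i], trace[j]):
            continue
        p = thread_pred(trace, j)
        if p is None or not (i == p or i in preds[p]):
            races.append((i, j))
    return races
```

For a trace in which `t1` reads `x` and writes `y` and `t2` then reads `y` and writes `x`, only the write/read pair on `y` is reported, in line with the first case discussed in Example 3.4.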
4. Algorithm for Detecting schedulable Races
We will discuss two algorithms for detecting races identified by the partial order. The first algorithm is based on efficient, vector clock based computation of the partial order. It is similar to the standard Djit+ algorithm (Pozniansky and Schuster, 2003) to detect HB-races. We will first briefly discuss vector clocks and associated notations. Then, we will discuss a one-pass streaming vector clock algorithm to compute for detecting races. Finally, we will discuss how epoch optimizations, similar to FastTrack (Flanagan and Freund, 2009), can be readily applied in our setting to enhance performance of the proposed vector clock algorithm.
4.1. Vector Clocks and Times
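A vector time $V : \mathsf{Threads} \to \mathsf{Nat}$ maps each thread in a trace to a natural number. Vector times support a comparison operation $\sqsubseteq$ for pointwise comparison, a join operation ($\sqcup$) for pointwise maximum, and an update operation $V[c/t]$, which assigns the time $c \in \mathsf{Nat}$ to the component $t$ in the vector time $V$. The vector time $\bot$ maps all threads to 0. Formally,
\[
\begin{aligned}
V_1 \sqsubseteq V_2 &\iff \forall t : V_1(t) \le V_2(t) \\
V_1 \sqcup V_2 &= \lambda t.\ \max(V_1(t), V_2(t)) \\
V[c/t] &= \lambda u.\ \text{if } u = t \text{ then } c \text{ else } V(u) \\
\bot &= \lambda t.\ 0
\end{aligned}
\]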
Vector clocks are placeholders for vector timestamps, or variables whose domain is the space of vector times. All the above operations, therefore, also apply to vector clocks. The algorithms described next maintain a state comprising several vector clocks, whose values, at specific instants, will be used to assign timestamps to events. We will use double struck font (, , , etc.) for vector clocks and normal font (, , etc.) for vector times.
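The four operations on vector times can be sketched in a few lines of Python. Representing a vector time as a dictionary in which absent threads are implicitly at time 0 is an implementation choice made here for illustration; the function names are hypothetical.

```python
from typing import Dict

# A vector time maps thread names to natural numbers; threads that are
# absent from the dictionary are treated as being at time 0.
VectorTime = Dict[str, int]


def bottom() -> VectorTime:
    """The vector time that maps every thread to 0."""
    return {}


def leq(v1: VectorTime, v2: VectorTime) -> bool:
    """Pointwise comparison: v1(t) <= v2(t) for every thread t."""
    return all(c <= v2.get(t, 0) for t, c in v1.items())


def join(v1: VectorTime, v2: VectorTime) -> VectorTime:
    """Pointwise maximum of two vector times."""
    return {t: max(v1.get(t, 0), v2.get(t, 0)) for t in v1.keys() | v2.keys()}


def update(v: VectorTime, t: str, c: int) -> VectorTime:
    """Copy of v in which the component of thread t is set to c."""
    out = dict(v)
    out[t] = c
    return out
```

Returning fresh dictionaries from `join` and `update` keeps vector times immutable in spirit, so a timestamp assigned to an event is not changed by later clock updates.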
4.2. Vector Clock Algorithm for Detecting Schedulable Races
Algorithm LABEL:algo:vc depicts the vector clock algorithm for detecting schedulable races using the partial order. Similar to the vector clock algorithm for detecting HB-races, Algorithm LABEL:algo:vc maintains a state comprising several vector clocks. The idea behind Algorithm LABEL:algo:vc is to use these vector clocks to assign a vector timestamp to each event (denoted by ) such that the ordering relation on the assigned timestamps () enables determining the partial order on events. This is formalized in Theorem 4.1. The algorithm runs in a streaming fashion and processes each event in the order in which it occurs in the trace. Depending upon the type of the observed event, an appropriate handler is invoked. The formal parameter in each of the handlers refers to the thread performing the event, and the parameters , and represent the lock being acquired or released, the memory location being accessed and the thread being forked or joined, respectively. The procedure Initialization assigns the initial values to the vector clocks in the state. We next present details of different parts of the algorithm.
4.2.1. Vector clocks in the state
The description of each of the vector clocks that are maintained in the state of Algorithm LABEL:algo:vc is as follows:

Clocks : For every thread in the trace being analyzed, the algorithm maintains a vector clock . At any point during the algorithm, let us denote by the last event performed by thread in the trace so far. Then, the timestamp of the event can be obtained from the value of the clock as follows. If is a read, acquire or a join event, then , otherwise , where .

Clocks : The algorithm maintains a vector clock for every lock in the trace. At any point during the algorithm, the clock stores the timestamp , where is the last event of the form , in the trace seen so far.

Clocks : For every memory location accessed in the trace, the algorithm maintains a clock (Last Write to ) to store the timestamp of the last event of the form .

Clocks and : The clocks and store the read and write access histories of each memory location . At any point in the algorithm, the vector time stored in the Read access history clock is such that where is the last event of thread that reads in the trace seen so far. Similarly, the vector time stored in the Write access history clock is such that where is the last event of thread that writes to in the trace seen so far.
The clocks , , are used to correctly compute the timestamps of the events, while the access history clocks and are used to detect races.
4.2.2. Initialization and Clock Updates
For every thread , the clock is initialized to the vector time . Each of the clocks , , and is initialized to . This is in accordance with the semantics of these clocks presented in Section 4.2.1.
When processing an acquire event , the algorithm reads the clock and updates the clock with (see Line 9). This ensures that the timestamp (which is the value of the clock after executing Line 9) is such that for every release event observed in the trace so far.
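The acquire handler, together with the release handler it pairs with, can be sketched as follows. This is a minimal illustration and not the algorithm listing itself: it assumes that each thread clock starts with the thread's own component at 1 and all others at 0, that lock clocks start at the all-zero vector time, and that a release increments the releasing thread's own component after publishing its timestamp. The class and method names are hypothetical.

```python
from typing import Dict

VectorTime = Dict[str, int]  # absent threads are implicitly at time 0


def join(v1: VectorTime, v2: VectorTime) -> VectorTime:
    """Pointwise maximum of two vector times."""
    return {t: max(v1.get(t, 0), v2.get(t, 0)) for t in v1.keys() | v2.keys()}


class LockClocks:
    """Thread and lock clocks only; access histories are omitted."""

    def __init__(self) -> None:
        self.thread_clock: Dict[str, VectorTime] = {}
        self.lock_clock: Dict[str, VectorTime] = {}

    def clock_of(self, t: str) -> VectorTime:
        # Assumed initial value: own component 1, every other component 0.
        return self.thread_clock.setdefault(t, {t: 1})

    def acquire(self, t: str, lock: str) -> None:
        # Join in the timestamp of the last release of this lock, so the
        # acquire is ordered after every earlier release of the same lock.
        current = self.clock_of(t)
        self.thread_clock[t] = join(current, self.lock_clock.get(lock, {}))

    def release(self, t: str, lock: str) -> None:
        # Publish the timestamp of the release in the lock clock, then
        # advance the local component (assumed) so later events of t
        # receive strictly larger timestamps.
        current = self.clock_of(t)
        self.lock_clock[lock] = dict(current)
        self.thread_clock[t] = {**current, t: current[t] + 1}
```

After `t1` acquires and releases a lock and `t2` then acquires it, the clock of `t2` dominates the timestamp of the release by `t1`, which is the property the acquire handler is meant to establish.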
At a release event , the algorithm writes the timestamp of the current event to the clock (see Line 11). Notice that