Two-Phase Dynamic Analysis of Message-Passing Go Programs based on Vector Clocks

07/10/2018, by Martin Sulzmann et al.

Understanding the run-time behavior of concurrent programs is a challenging task. A popular approach is to establish a happens-before relation via vector clocks. Thus, we can identify bugs and performance bottlenecks, for example, by checking if two conflicting events may happen concurrently. We employ a two-phase method to derive vector clock information for a wide range of concurrency features that includes all of the message-passing features in Go. The first phase (instrumentation and tracing) yields a run-time trace that records all events related to message-passing concurrency that took place. The second phase (trace replay) is carried out offline and replays the recorded traces to infer vector clock information. Trace replay operates on thread-local traces. Thus, we can observe behavior that might result from some alternative schedule. Our approach is not tied to any specific language. We have built a prototype for the Go programming language and provide empirical evidence of the usefulness of our method.




1. Introduction

The analysis of concurrent programs is an important but, due to the high degree of non-determinism, notoriously difficult problem. We consider here programs that make use of message-passing in the style of Communicating Sequential Processes (CSP) (Hoare, 1978). In our implementation, we support the Go programming language (Go, 2018), but our approach also applies to languages with similar message-passing features such as Concurrent ML (CML) (Reppy, 1999). Our focus is on the dynamic analysis (a.k.a. testing) of Go, where we assume that the program is executed for a fixed number of steps. Specifically, we consider the challenge of giving a precise explanation of the interplay among message-passing events that take place in a single execution run.

Consider the following program where we adopt Go-style notation for message-passing.

  spawn { x <- 1 };    // M1
  spawn { <-x };       // M2
  <-x;                 // M3

We assume that x is some unbuffered channel. We write x <- 1 to send value 1 via channel x and write <-x to receive some value via x. The actual values sent/received do not matter here.

We consider a program run where location M3 receives a value from M1. The receive at location M2 is blocked, but from the user's perspective the program terminates without showing any abnormal behavior. In Go, once the main thread terminates, all remaining threads are terminated as well. Our analysis is able to report to the user that M2 could also have received a value from M1 (which would then result in a deadlock). Suppose we encounter the deadlock, that is, M2 receives from M1. For such a deadlock, the Go run-time reports for each blocked thread the event that is responsible for the blockage. Our analysis provides more details and reports to the user that M3 (the main thread) is blocked but could possibly communicate with M1.

In the following example, we make use of Go’s ability to close a channel. Any subsequent receive on a closed channel never blocks and obtains a default value. However, any subsequent send fails and yields a run-time exception.

  spawn { x <- 1 };       // M1
  spawn { close(x) };     // M2
  <-x;                    // M3

Assume thread M2 executes, followed by M3. From the user's perspective, the program terminates without any abnormal behavior. Recall that once the main thread terminates, all remaining threads such as M1 terminate as well. Our analysis reports to the user that there might be a different schedule under which a send on a closed channel occurs.
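The closed-channel semantics can be observed directly in Go. The following runnable sketch (our own illustration, not the paper's instrumentation) shows that a receive on a closed channel never blocks and yields the zero value, with the comma-ok form reporting that the channel was closed:

```go
package main

import "fmt"

func main() {
	x := make(chan int)
	go func() { close(x) }() // corresponds to M2
	v, ok := <-x             // corresponds to M3
	// After close, receives never block: v is the zero value of the
	// element type and ok is false. A send on x would panic instead.
	fmt.Println(v, ok) // prints "0 false"
}
```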

In our final example, we consider Go’s select statement which corresponds to non-deterministic choice.

  spawn { x <- 1;           // M1
          y <- 1 };         // M2
  select {
    case <-x:               // M3
    case <-y:               // M4

The select statement blocks if neither case is available, i.e. can communicate with some concurrent event. If both cases are available, one of them is chosen based on a pseudo-random order to ensure fairness. For our example, case M4 never applies. The user may wonder why this is so. Our analysis reports to the user that (for this specific execution run) any potential communication partner of M4 always happens after a communication of M3 took place.
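The pseudo-random choice among available cases is easy to reproduce. In this small sketch (our own; it uses buffered channels so that both cases are ready, unlike the unbuffered example above), Go may pick either case:

```go
package main

import "fmt"

func main() {
	x := make(chan int, 1)
	y := make(chan int, 1)
	x <- 1
	y <- 1
	// Both cases can communicate, so the runtime chooses one of them
	// pseudo-randomly; either branch may be printed.
	select {
	case v := <-x:
		fmt.Println("received from x:", v)
	case v := <-y:
		fmt.Println("received from y:", v)
	}
}
```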

Besides assisting the user in narrowing down the source of a bug, our analysis can also identify performance bottlenecks. For example, consider the case of a (too slow) receiving thread that needs to negotiate with a high number of sending parties. A specific instance of this analysis can be used to detect lock contention. We do not support the concept of a mutex in our formal treatment. However, it is well known that a mutex effectively corresponds to a buffered channel of size one, where send corresponds to lock and receive to unlock.
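As a minimal sketch of this correspondence (type and method names are our own), a buffered channel of size one behaves as a mutex: send blocks while the single buffer slot is full (lock), and receive frees the slot (unlock):

```go
package main

import "fmt"

// chanMutex is a mutex built from a buffered channel of size one:
// send == lock, receive == unlock.
type chanMutex chan struct{}

func (m chanMutex) Lock()   { m <- struct{}{} }
func (m chanMutex) Unlock() { <-m }

func main() {
	m := make(chanMutex, 1)
	counter := 0
	done := make(chan struct{})
	for i := 0; i < 4; i++ {
		go func() {
			m.Lock()
			counter++ // protected by the channel-based mutex
			m.Unlock()
			done <- struct{}{}
		}()
	}
	for i := 0; i < 4; i++ {
		<-done
	}
	fmt.Println(counter) // prints "4"
}
```

The Go memory model guarantees the required happens-before ordering between the receive that frees the buffer slot and the next send, which is what makes this idiom safe.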

To provide the above feedback to the user, we infer the dependencies among concurrent events. This is achieved by establishing a happens-before relation (Lamport, 1978), where the happens-before relation is derived via vector clocks (Fidge, 1992; Mattern, 1989). Earlier work by Fidge (Fidge, 1988) and Mattern (Mattern, 1989) shows how to compute vector clocks in the message-passing setting. We improve on these results as follows. First, we also cover buffered channels of a fixed size. Second, we introduce a novel form of pre vector clock annotation to analyze events that lack a communication partner, that is, events that could not commit, such as M2 in our first example above.

A novel aspect of our work is a two-phase method to derive vector clocks. The first phase consists of a light-weight instrumentation of the program that can be carried out via a simple pre-processing step. The inference of vector clock information happens in a subsequent (off-line) phase. Events are not recorded in a global trace as they actually took place. Rather, we record events on a per-thread basis. Hence, tracing requires little synchronization and therefore incurs only a low run-time overhead.

Almost no extra synchronization for tracing purposes among threads is required. To guarantee that we derive vector clock information that corresponds to an actual program run, a receive event obtains the thread id and program counter from the sending party. This is the only extra inter-thread information required to properly match a receive event to its corresponding send event.
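The idea of piggybacking the sender's identity on the message can be sketched as follows; the struct and function names are hypothetical, chosen purely for illustration of the scheme:

```go
package main

import "fmt"

// msg is what an instrumented send transmits instead of the plain value:
// the payload plus the sender's thread id and program counter, so the
// receiver can record which send it matched.
type msg struct {
	val int
	tid int // sender's thread id
	pc  int // sender's program counter at the send
}

// instrumentedSend wraps a plain send, attaching the sender's identity.
func instrumentedSend(ch chan msg, v, tid, pc int) {
	ch <- msg{val: v, tid: tid, pc: pc}
}

func main() {
	ch := make(chan msg)
	go instrumentedSend(ch, 1, 2, 7) // pretend: thread id 2, program counter 7
	m := <-ch
	// The receiver logs (m.tid, m.pc) in its post event; this pair
	// uniquely identifies the matching send during trace replay.
	fmt.Println(m.val, m.tid, m.pc) // prints "1 2 7"
}
```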

In summary, our contributions are:

  • Based on a simple instrumentation method to obtain thread-local run-time traces of recorded events (Section 2), we give a precise account of how to derive vector clock information for a wide range of message-passing features (Section 3).

  • We discuss several analysis scenarios where the vector clock information inferred proves to be useful (Section 4). The scenarios comprise detection of performance bottlenecks and potential bugs as well as recovery from a bug.

  • We have built a prototype for the Go programming language (Go, 2018) and provide experiments that include real-world examples where we discuss the effectiveness of our method (Section 5).

Related work is discussed in Section 6. We conclude in Section 7. Further details can be found in the Appendix.

2. Instrumentation and Tracing

Figure 1. Programs

We assume a simplified language to cover the main concurrency features of Go. See Figure 1. Expressions include pairs (anonymous structs). Notation for send and receive follows Go syntax. Receive is always tied to an assignment statement where we write to denote assignment. Type declarations of variables are omitted for brevity. A buffered channel of size is introduced via . For , we refer to as an unbuffered channel. We have already seen the commands to spawn a new thread and to close a channel.

Go supports non-deterministic choice via select, where the cases to be selected are represented in a list. For example,

denotes a command that either sends a value via channel or receives a value from channel . We also support select with a default case. We assume that a single send/receive statement is represented by a select statement with a single case. We ignore locks as their treatment exactly corresponds to buffered channels of size one.

A program is represented as a list of commands. We follow Haskell style syntax and write to denote a non-empty list with head and tail . We write to denote list concatenation.

Programs are instrumented to record the events that took place when executing the program. Events are recorded on a per thread basis. Hence, we obtain a list of (thread-local) traces where each trace is connected to a thread. We write to denote the list of recorded traces attached with their thread id. The syntax of traces and events we use is as follows.

Definition 2.1 (Run-Time Traces and Events).

The purpose of each event becomes clear when considering the instrumentation of programs.

Figure 2. Instrumentation

Figure 2 formalizes the instrumentation of programs which can be carried out via a simple pre-processor. For each thread, we assume a thread-local variable that stores the events that take place in this thread. In turn, we discuss the various instrumentation cases.

As we support dynamic thread creation, there might be dependencies among threads when it comes to tracing. For example, consider the following program snippet.

Our instrumentation yields

Variable logs the events of the main thread, and variable the events in the newly spawned thread. Events in are only processed once the wait event is matched against its corresponding signal event. Thus, we ensure that events logged in a newly created thread happen after the events that took place in the thread that issued the spawn command. In the instrumentation, we assume a shared variable where the primitive atomically increments this variable and returns the updated value.

In Go it is possible to close a channel which means that any subsequent send on that channel yields an error but any receive succeeds by retrieving a dummy value. We support this feature by recording .

We support unbuffered as well as buffered channels of a fixed size. For operations on buffered channels, we also need to record the buffer size. This can be easily done in the instrumentation but is left out for brevity.

In case of channel operations send and receive, we use (post) events and to represent committed operations. That is, sends and receives that actually took place. To uniquely connect a sender to its corresponding receiver, the sender transmits its thread id and program counter to the receiver.

Consider the instrumentation case that deals with the send operation. We assume a primitive tid to compute the thread id and a primitive pc to compute the thread-local program counter. Both values are additionally transmitted to the receiver. We assume common tuple notation. Instead of we transmit and store the event . For simplicity, we assume that both calls to pc yield the same value. In an actual implementation, we would need to store the current program counter and then transmit the stored value as well as record the value in the post event. Further note that events are stored in thread-local traces. Hence, in an implementation we could save space and drop the tid component for committed send operations. Here, we keep the tid component to have a uniform presentation for send and receive.

At the receiving site, see case , we assume primitives and to access the respective components of the received value. The receiver stores to record the sender’s thread id and program counter.

In addition to committed events, we also keep track of events that could possibly commit. This applies to cases not chosen in a select statement. We make use of pre events to represent such cases. For the earlier select example (SEL), we record both possibilities via the event . A select statement may include a default case. We represent this variant by including in the list of pre events. If the default case is chosen, we store the event .

The program’s dynamic behavior is captured by the trace obtained from running the instrumented program. By replaying the trace we can infer for each event a vector clock. This is what we will discuss next.

3. Vector Clocks

Figure 3. Trace Replay

The goal is to annotate events with vector clock information. For this purpose, we replay the set of recorded run-time traces to derive a global trace of vector clock annotated events. The syntax is as follows.

Definition 3.1 (Vector Clock Annotated Events).

For convenience, we represent a vector clock as a list of clocks where the first position belongs to thread 1 etc. We include to deal with events that did not commit. More on this shortly. We write to denote the initial vector clock where all entries (time stamps) are set to . We write to retrieve the -th component in . We define if for each position we have that . We write to denote the vector clock obtained from where all elements are the same but at index the element is incremented by one. We write to denote the vector clock where we take the greater element per index. We write to denote the vector clock where all entries are zero except position , which is equal to one. We write to denote thread with vector clock .
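For concreteness, the vector clock operations defined above can be sketched in Go; vector clocks are represented as integer slices as in the text, and the function names are our own:

```go
package main

import "fmt"

// vc is a vector clock: position i holds the time stamp of thread i+1.
type vc []int

// inc returns a copy of v with the time stamp at index i incremented.
func (v vc) inc(i int) vc {
	w := append(vc(nil), v...)
	w[i]++
	return w
}

// vcMax is the pointwise maximum of two vector clocks of equal length.
func vcMax(a, b vc) vc {
	w := make(vc, len(a))
	for i := range a {
		w[i] = a[i]
		if b[i] > w[i] {
			w[i] = b[i]
		}
	}
	return w
}

// leq reports whether a <= b pointwise, i.e. a happens before or equals b.
func leq(a, b vc) bool {
	for i := range a {
		if a[i] > b[i] {
			return false
		}
	}
	return true
}

// concurrent holds iff the clocks are incomparable: neither happens
// before the other.
func concurrent(a, b vc) bool { return !leq(a, b) && !leq(b, a) }

func main() {
	a := vc{1, 2, 0}
	b := vc{1, 0, 2}
	fmt.Println(concurrent(a, b)) // prints "true"
	fmt.Println(vcMax(a, b))      // prints "[1 2 2]"
	fmt.Println(leq(a, a.inc(0))) // prints "true"
}
```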

We write to denote a send operation via channel  in thread . We infer two vector clock annotations  and for the following reason. In the (run-time) trace , we record for each channel operation a pre event (communication about to happen) and a post event (communication has happened). Vector clock corresponds to the pre event and to the post event.

We write to denote a vector clock annotated receive event in thread . As in case of send, represents the vector clock of the pre event and the vector clock of the post event.

We write to denote a vector clock annotated default event connected to a select statement. Like in case of send and receive, we find pre and post vector clock annotations. We will argue later that having the vector clock information for the pre event can have significant advantages for the analysis. In fact, as we support selective communication, the post vector clock for not selected cases may be absent. We introduce the following notation.

Definition 3.2 (Not Selected Events).

We write to denote an event from thread with pre vector clock where the post vector clock is absent. For , is a short-hand for . For , is a short-hand for . For , is a short-hand for .

We write to denote a vector clock annotated close event on channel  in thread  where is the post vector clock. There is no pre vector clock as close operations never block.

Figure 3 defines the trace replay rules to infer vector clock information. Replay rules effectively resemble operational rewrite rules describing the semantics of a concurrent program. We introduce a rewrite relation among configurations to derive . Component corresponds to the list of thread-local run-time traces. Recall Definition 2.1. Component keeps track of the list of buffered channels and their current state. Its definition is as follows.

Definition 3.3 (Buffered Channels).

We assume that in , refers to the channel name and to the buffer size. Buffer size information can be obtained during run-time tracing but we omitted this detail in the formalization of the instrumentation. denotes the buffer. Initially, we assume that all buffer slots are empty and filled with where represents the initial vector clock. We write to denote a buffer where all slots in are occupied.

We have now everything in place to discuss the trace replay rules.

3.1. Shuffling and Collection

In our tracing scheme, we do not impose a global order among events. Events are stored in thread-local traces. This allows us to explore alternative schedules by suitably rearranging (shuffle) the list of buffered channels and thread-local traces. In terms of the replay rules, we therefore find rule (Shuffle). Via rule (Closure) we simply combine several elementary rewriting steps.

The next set of rules assume that channels and traces are suitably shuffled as these rules only inspect the leading buffer and the two leading traces.

3.2. Intra Thread Dependencies

Rule (Signal/Wait) ensures that a thread’s trace is only processed once the events stored in that trace can actually take place. See the earlier example in Section 2.

3.3. Unbuffered Channels

Rule (Sync) processes send/receive communications via some unbuffered channel. For convenience, we assume that primitive events in the list can be suitably rearranged. We check for two thread-local traces where a send and receive took place and the send and receive are a matching pair. That is, in the actual program run, the receiver obtained the value from this sender. A matching pair is identified by comparing the recorded thread id and program counter of the sender. See post events and .

Our (re)construction of vector clocks follows the method developed by Fidge and Mattern. We increment the time stamps of the threads involved and exchange vector clocks. To indicate that a synchronization between two concurrent events took place, we build the maximum. Our novel idea is to infer pre vector clocks. Thus, we can detect (a) alternative communications, and (b) events not chosen within a select statement. Recall the notation introduced in Definition 3.2. For brevity, we ignore the formal treatment of (c) orphan events. That is, events with a singleton list of pre events that lack a post event. We can treat such events like case (b) by including a dummy post event.

Here is an example to illustrate (a).

Example 3.4 ().

Consider the program annotated with thread id numbers.

We assume a specific program run where thread 2 synchronizes with thread 3. Thread 4 synchronizes with thread 5 and finally thread 3 synchronizes with thread 4. Here is the resulting trace. For presentation purposes, we write the initial vector clock behind each thread.

For example, in thread 3, in the second program step, the send operation on channel  could commit. Hence, we find the event .

Trace replay proceeds as follows. We process intra-thread dependencies via rule (Signal/Wait). This leads to the following intermediate step.

Next, we exhaustively synchronize events and attach pre/post vector clocks. We show the final result. For presentation purposes, instead of , we write the short form . Thread ids are written on the left. Events annotated with pre/post vector clocks are written next to the thread in which they arise. We omit the main thread (1) as there are no events recorded for this thread.

Consider the underlined events. Both are matching events, sender and receiver over the same channel. An alternative communication among two matching events requires both events to be concurrent to each other. In terms of vector clocks, concurrent means that their vector clocks are incomparable.

However, based on their post vector clocks, it appears that the receive on channel  in thread 4 happens after the send in thread 2 because . This shows the limitation of post vector clocks, as it is easy to see that both events represent an alternative communication. Thanks to pre vector clocks, this alternative communication can be detected. We find that the events are concurrent because their pre vector clocks are incomparable, i.e. and .

3.4. Buffered Channels

Neither Fidge nor Mattern cover buffered channels. We could emulate buffered channels by treating each send operation as if this operation is carried out in its own thread. However, this leads to inaccuracies.

Example 3.5 ().


Assuming an emulation of buffered channels as described above, our analysis would report that (2) and (3) form an alternative match. However, in the Go semantics, buffered messages are queued. Hence, for every program run the only possibility is that (1) synchronizes with (2) and a synchronization with (3) never takes place!
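The queuing behavior that rules out this false positive can be checked directly in Go; in the following sketch (our own), a receive on a buffered channel necessarily obtains the value of the first send, never the second:

```go
package main

import "fmt"

func main() {
	x := make(chan int, 2)
	x <- 1 // first buffered send: enqueued at the front
	x <- 2 // second buffered send: enqueued behind the first
	v := <-x
	// Buffered messages are queued (FIFO), so the receive can only
	// match the first send; matching the second is impossible in any
	// schedule.
	fmt.Println(v) // prints "1"
}
```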

We can eliminate such false positives by keeping track of (un)occupied buffer space during trace replay. Rule (Receive) processes a receive over some buffered channel. We check the first occupied buffer slot, where buffers are treated like queues. The enqueued send event must match the receive event. We check for a match by comparing the received thread id and program counter. The receive event's pre/post vector clocks are computed as in the case of rule (Sync). The buffered send event is dequeued and we enqueue an empty buffer slot attached with the receiver's vector clock. This is important to establish the proper order among receivers and senders, as we will see shortly.

Consider rule (Send), where a sender synchronizes with an empty buffer slot. Recall that the notation implies that all buffer slots in are occupied. We increment the time stamp of the thread and synchronize with the empty buffer slot by building the maximum. The now occupied buffer slot carries the resulting vector clock. If we simply overwrote the buffer slot with the sender's vector clock, the proper order among receive and send events might get lost.

Example 3.6 ().


It is clear that for any program run the receiver at location (1) happens before the send at location (2). Suppose we encounter a program run where the first two sends take place before the receive. For brevity, we omit the set of local traces containing all recorded pre/post events. Here is the program annotated with (post) vector clock information.

Recall that the main thread (with id number ) creates a new thread (signal/wait events). This then leads to the first send having the post vector clock and the second send having the post vector clock . At this point, the buffer contains

We write to indicate that the program counter of the sending thread  does not matter here. Then, the receive synchronizes with the first send. Hence, we find the post vector clock and the buffer has the form . As there is now an empty buffer slot, the third send can proceed.

Ignoring the vector clock attached to the empty buffer slot would result in the (post) vector clock for the third send. This is clearly wrong as then the receive and (third) send appear to be concurrent to each other. Instead, the sender synchronizes with the vector clock of the empty buffer slot. See rule (Send). In essence, this vector clock corresponds to the vector clock of the earlier receive. Hence, we find that the third send has the vector clock and thus the receive happens before the third send.
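The scenario above can be replayed with a minimal model of rules (Send) and (Receive); the data types are our own. The key point is that the freed slot carries the receiver's vector clock, so the third send is ordered after the receive:

```go
package main

import "fmt"

// A minimal replay of rules (Send) and (Receive) for a single buffered
// channel. For brevity we omit the tid/pc matching of send/receive pairs.
type vc []int

func vcMax(a, b vc) vc {
	w := make(vc, len(a))
	for i := range a {
		w[i] = a[i]
		if b[i] > w[i] {
			w[i] = b[i]
		}
	}
	return w
}

func leq(a, b vc) bool {
	for i := range a {
		if a[i] > b[i] {
			return false
		}
	}
	return true
}

type slot struct {
	occupied bool
	clk      vc // vector clock carried by the slot, even when empty
}

type state struct {
	buf    []slot // FIFO buffer: front is buf[0]
	thread []vc   // per-thread vector clocks
}

// send applies rule (Send): the sender increments its time stamp and
// synchronizes with the vector clock of the next empty slot; the now
// occupied slot carries the resulting clock.
func (s *state) send(t int) vc {
	for i := range s.buf {
		if !s.buf[i].occupied {
			s.thread[t][t]++
			s.thread[t] = vcMax(s.thread[t], s.buf[i].clk)
			s.buf[i] = slot{occupied: true, clk: append(vc(nil), s.thread[t]...)}
			return s.thread[t]
		}
	}
	panic("buffer full")
}

// receive applies rule (Receive): dequeue the front occupied slot and
// enqueue an empty slot carrying the receiver's vector clock.
func (s *state) receive(t int) vc {
	front := s.buf[0]
	s.thread[t][t]++
	s.thread[t] = vcMax(s.thread[t], front.clk)
	s.buf = append(s.buf[1:], slot{occupied: false, clk: append(vc(nil), s.thread[t]...)})
	return s.thread[t]
}

func main() {
	// Buffer of size 2; thread 0 receives, thread 1 sends three times.
	s := &state{
		buf:    []slot{{clk: vc{0, 0}}, {clk: vc{0, 0}}},
		thread: []vc{{0, 0}, {0, 0}},
	}
	s.send(1)            // first send
	s.send(1)            // second send
	recv := s.receive(0) // receive matches the first send
	third := s.send(1)   // third send reuses the freed slot
	// Because the freed slot carried the receiver's clock, the receive
	// happens before the third send instead of appearing concurrent.
	fmt.Println(leq(recv, third)) // prints "true"
}
```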

3.5. Closed Channel

We deal with close events by simply incrementing the thread’s timestamp. See rule (Close). A receive event on a closed channel is distinguished from other receives by the fact that dummy values are received. We write to refer to a dummy thread id and program counter. See rule (Receive-Closed).

3.6. Select with Default

Rule (Default) covers the case where the default branch of a select statement has been taken.

3.7. Properties

Senders and receivers are uniquely connected based on the sender’s thread id and program counter. Hence, any vector clock annotation obtained via trace replay corresponds to a valid program run. However, due to our thread-local tracing scheme, trace replay rules do not need to follow the schedule of the actual program run. It is possible to explore alternative schedules. In case of buffered channels this may lead to different vector clock annotations. For unbuffered channels it turns out that the behavior, i.e. vector clock annotation, is completely deterministic regardless of the schedule. Formal details follow below.

We write to denote the initial buffer connected to some buffered channel of size . We assume that is filled with elements . That is, .

Definition 3.7 (Deterministic Replay).

Let be a program and its instrumentation where for a specific program run we observe the list of thread-local traces. Let be the buffered channels appearing in annotated with their buffer size.

We say that trace replay

is exhaustive iff no further trace replay rules are applicable and for all , each only contains pre events.

We say that trace replay is stuck iff no further trace replay rules are applicable and for some , contains some post events.

We say that the list of thread-local traces enjoys deterministic replay if for any two exhaustive trace replays


we have that vector clocks for events at the same program location in and are identical. In case of loops, we compare program locations used at the same instance.

Proposition 3.8 (Deterministic Replay for Unbuffered Channels).

Let be a program consisting of unbuffered channels only. Then, any list of thread-local run-time traces obtained enjoys deterministic trace replay.


Trace replay rules define a rewrite relation among configurations . The formulation in Figure 3 assumes that the list of run-time traces can be shuffled so that replay rules only operate on the first, respectively the first and second, element of that list. In the following, we assume a more liberal formulation of the trace replay rules where any trace can be picked to apply a rule. Both formulations are equivalent and hence enjoy the same properties. But the more liberal formulation allows us to drop rules (Shuffle) and (Closure) from consideration and to apply standard (rewriting) reasoning methods. We proceed by showing that the more liberal formulation is terminating and locally confluent.

Termination is easy to establish as each rule consumes at least one event.

Next, we establish local confluence by observing all critical pairs. In our setting, critical pairs include configurations as well as the vector clock annotated events obtained during rewriting. That is, we observe all situations where for and a single rewrite step we find and for some . We show that all critical pairs are joinable by examining all rule combinations that lead to a critical pair. In our setting, joinable means that we find and where the vector clocks for events at the same program location in and are identical.

As we only consider unbuffered channels, rules (Send) and (Receive) are not applicable. Hence, we only need to consider combinations of rules (Signal/Wait), (Sync), (Receive-Closed), (Close) and (Default).

Rules (Receive-Closed), (Close) and (Default) only affect a specific run-time trace and the events in that trace. So, any combination of these rules that leads to a critical pair is clearly joinable.

Rules (Signal/Wait) and (Sync) affect two run-time traces. The choice of which two traces are affected is fixed. For each (synchronous) send that took place there is exactly one matching receive and vice versa. This is guaranteed by our tracing scheme, where we identify send-receive pairs via the sender's thread id and program counter. The same applies to signal and wait. Hence, any critical pair that involves either of these two rules is joinable.

We summarize: the more liberal rules are terminating and locally confluent. By Newman's Lemma we obtain confluence. Any derivation with the rules in Figure 3 can be expressed in terms of the more liberal rules. Hence, the rules in Figure 3 are confluent. Confluence implies deterministic trace replay. ∎

Recall Example 3.4, where the run-time trace records that (a) thread 2 synchronizes with thread 3, and (b) thread 4 synchronizes with thread 5. The actual schedule (whether (a) or (b) happens first) is not manifested in the trace. Indeed, whether (a) or (b) comes first does not affect the vector clock information obtained.

Proposition 3.9 ().

Let be a program consisting of unbuffered channels only, where for some instrumented program run we find that n is the number of thread-local run-time traces and m is the sum of the lengths of all thread-local run-time traces. Then, the vector clock annotated events can be computed in O(n² · m) time.


Proposition 3.8 guarantees that any order in which trace replay rules are applied yields the same result and, most importantly, we never get stuck. Rules (Signal/Wait) and (Sync) need to find two matching partners in two distinct traces. All other rules only affect a single trace. Hence, each rewriting step requires O(n²) time, the number of possible combinations of two elements from a set of size n. Each rewriting step reduces the size of at least one of the thread-local traces. Hence, we must obtain the result in at most m steps. So, the overall computation takes O(n² · m) time. ∎

The situation is different for buffered channels.

Example 3.10 ().


We assume that the send and receive in the helper thread execute first. Here is the resulting trace.

For example, we obtain as thread 1’s program counter is at position 3 after execution of the make and spawn statement.

The initial buffer is of the form . We infer different vector clocks depending on which local trace we process first. If we start with thread , we derive vector clock for location (1), for location (2), for location (3) and for location (4). If we start with thread , we obtain vector clock for location (3), for location (4), for location (1) and for location (2).

We conclude that replay is non-deterministic for buffered channels. A different schedule possibly implies a different vector clock annotation.

Proposition 3.11 ().

Let be a program consisting of buffered channels, where for some instrumented program run we find that n is the number of thread-local run-time traces and m is the sum of the lengths of all thread-local run-time traces. Then, we can enumerate all possible vector clock annotations in O((n²)^m) time.


In each step there are O(n²) choices to consider that might lead to a different result. We need a maximum of m steps. Hence, exhaustive enumeration takes O((n²)^m) time. ∎

In practice, we rarely find cases with an exponential number of schedules. As we are in the offline setting, we argue that some extra cost is justifiable to obtain more details about the program's behavior.

We summarize: the vector clock annotated trace contains a wealth of information. In the upcoming section, we discuss specific analysis scenarios where this information can be exploited. For certain scenarios, the complete trace is often not necessary, and a more memory-saving representation of vector clocks can be employed as well.

4. Analysis Scenarios

We consider four scenarios.


  • Message contention (MC).

  • Send on a closed channel (SC).

  • Deadlock recovery (DR).

  • Alternative communication partners for send/receive pairs (AC).

MC identifies performance bottlenecks. This method can also be used to detect lock contention. SC spots a bug, whereas DR provides hints to the user on how to recover from a deadlock. AC provides general information about the concurrent behavior of message-passing programs. Below, we describe how we can implement each scenario based on the vector clock information provided. Realistic examples for each scenario will be discussed in Section 5.

4.1. Message Contention

Figure 4.

Send/Receive Epoch

We wish to check if there are competing send operations on a specific channel . If we also take dangling events into account, we simply consult and count all events whose pre vector clocks and are incomparable (the events are concurrent). The same check applies to receive events.

To carry out the analysis efficiently, it is unnecessary to construct the entire set . For each channel , we only need to keep track of concurrent sends/receives. For each concurrent operation, instead of the full (pre) vector clock, we only record a pair of a thread id and the time stamp for that thread. We refer to this pair as an epoch, following (Flanagan and Freund, 2010).

Definition 4.1 (Send/Receive Epoch).

Notation denotes a list of epochs. In the extreme case, denotes the entire vector clock. For example, the channel’s initial state corresponds to the vector clock at the declaration site. If covers all threads, we treat as the sorted (according to thread ids) list and consider as equivalent to .

Let V be a vector clock. We define j#k <= V if k <= V(j), and j#k || V (concurrent) for k > V(j). For a list L of epochs, we define L <= V if for all j#k in L we have that k <= V(j). We define L || V if there exists j#k in L such that k > V(j).

For convenience, we carry out the epoch optimization on the annotated trace instead of adjusting the rules in Figure 3. In practice, the epoch optimization can be integrated into the trace replay rules. The epoch optimization rules in Figure 4 introduce a rewrite relation among annotated traces, where for brevity we omit auxiliary rules to drop irrelevant events (close and default) and to rearrange events suitably. Rule (Channel-Init) deals with the initialization of a channel and assumes that the declaration site is instrumented such that we can obtain the channel's initial vector clock. For this purpose, we assume a dedicated channel initialization event.

Rule (Receive-Multiple) covers multiple concurrent receives. We keep the maximal list of receives that are concurrent to the current event. Rule (Receive-Single) covers the case of a receive that happens after all prior receives. The formulation for send events is analogous to that for receives. See rules (Send-Multiple) and (Send-Single).

High message contention for a channel means that there is a high number of elements in either its list of concurrent send epochs or its list of concurrent receive epochs.

4.2. Send on Closed

We wish to check if there is a schedule where a send operation attempts to transmit on a closed channel. We assume that this bug did not arise for the given program run. Hence, we use the vector clock information to test for a send operation that either succeeds or is concurrent to a close operation. In terms of the vector clock annotated trace, we check for a send event and a close event on the same channel where either the send succeeds the close or their pre vector clocks are incomparable.

The epoch optimization applies here as well. We maintain the list of concurrent send operations as in the case of message contention. Instead of the list of receives, we only record the vector clock of the close operation. Each time we update the list of sends, we check that none of the send epochs succeeds or is concurrent to the close.

As discussed in Section 3.7, the vector clock annotation obtained for events may be non-deterministic. Hence, we may miss a send on closed channel bug depending on the schedule. The advantage of our method is that it can explore alternative schedules to reveal such hidden bugs.

Example 4.2 ().


We assume a program run where first the main thread executes and then the other thread. This yields the following run-time trace.

In our approach, we can explore a different schedule by processing thread 2 first (after processing of signal/wait). Then, the close operation appears to be concurrent to the send in the main thread.

4.3. Alternative Communications

We refer to a match pair as a pair (s, r) of vector clock annotated events where s is a sender and r is the matching receiver over a common channel. For an unbuffered (synchronous) channel, we have that post(s) = post(r). That is, their post vector clocks are synchronized. For a buffered channel, we have that post(s) < post(r). That is, the receive happens strictly after the send.

Match pairs can be directly computed during trace replay as their underlying post events are uniquely connected via the sender's thread id and program counter. We assume n is the number of thread-local run-time traces and m is the sum of the lengths of all thread-local run-time traces. For each match pair we compute alternative communications. For each sender, we count the number of concurrent receives, and for each receiver the number of concurrent sends, where for each candidate we test if the pre vector clocks are incomparable. We assume the comparison test among vector clocks takes constant time, that is, O(1). For each match pair, there can be at most O(m) candidates. Hence, the computation of alternative communications for a specific trace replay run takes time O(m^2). Our experiments show that alternatives can be computed efficiently, as we can use the thread's vector clock to prune the search space for candidates.

Via a similar method, we can compute the number of alternatives for each non-selected case of a select statement. We refer to this analysis scenario as ASC.

4.4. Deadlock Recovery

We consider the scenario where program execution results in a deadlock. Via a similar method as described for alternative communications, we can search for potential partners for dangling events.

Example 4.3 ().

Recall the example from the introduction.

We assume a deadlock because (1) synchronizes with (2). Based on the pre vector clock information of the event resulting from (3), we can report back to the user that the deadlock could possibly be resolved assuming (3) synchronizes with (1).

There are cases where no alternatives can be provided.

Example 4.4 ().

Consider the classic example of a deadlock due to reversed ‘lock’ order where we model a mutex via a buffered channel.

Our analysis reports that no alternatives exist.

The interpretation of the analysis results is left to the user. We believe the information provided is highly useful in gaining further insight into the (deadlock) bug.

5. Experiments

We have built a prototype in Go. A snapshot of our implementation, including all examples used for experimentation, can be accessed via

Our implementation includes the analysis methods and optimizations discussed in the earlier sections. Experiments are conducted on an Intel i7-6600U with 12GB RAM, an SSD, and Windows 10. The results are shown in Figure 5.

5.1. Implementation

The toolchain, entirely implemented in Go, consists of three parts: (1) instrumentation, (2) execution, and (3) analysis. In the first part, we instrument the source code to emit pre and post events. As we need to provide an additional argument for channels and channel operations, we need access to the channel's type. For this purpose, we make use of go-parser to update the AST, replacing the channel's type by an anonymous struct that contains a field for the necessary thread information and a value field of the channel's original type. The instrumentation follows the scheme outlined in Figure 2. Additionally, the main function is instrumented to start and stop the tracer. This is necessary to ensure that all events that occurred are written to the trace, since Go programs terminate as soon as the main thread exits. The tracer uses a separate thread that receives the events through a buffered channel and writes them to the trace, which is necessary in case of a deadlock in the program. The separate thread writes all events stored in the channel buffer to the trace before it waits for new messages, so a deadlock that occurred is triggered with only a small delay. After execution, we apply the trace replay method described previously.

5.2. Examples

For experimentation, we use the following examples. They consist of some real-world examples as well as our own examples to highlight certain aspects of our approach.


pgzip is a parallel gzip compression/decompression written in Go. It splits the file into several blocks that are sent through a buffered channel from which the worker threads collect and compress them. After compressing a block, it is sent through another channel to a thread that collects the blocks and writes them to a file. pgzip makes typical use of synchronous and asynchronous channels to either transfer data or to send signals to other threads, like 'abort' by closing specific channels. We intend to test if the collection always has to happen in a fixed order, and to check for sends on closed channels due to the way 'aborts' are implemented. For the test, we compress an 8 MB file.


The next example performs parallel, pipelined execution of a single HTTP 'GET' to improve the download speed. It distributes the work to multiple threads that each perform a part of the download, and collects the finished blocks in a separate thread. It uses a different scheme to recollect the blocks, which we test with our analyses. For the test, an 8 MB download was used.


is a digital signal processing package for Go where we test the par