Event-driven programming is a popular paradigm in which control flow follows the order of events. The essence of the paradigm is the flexible association between user-defined event handlers and events, such as user interface or operating system actions. When an event is emitted, all event handlers that have been registered for it are eligible to be invoked by the event loop.
Flexibility comes from the fact that event handlers are invoked asynchronously. This asynchrony causes complexity in reasoning about event-driven programs in the presence of mutable state: consider the example of a global variable initialized by one event handler and used by another. The order in which the event handlers are invoked is critical for correctness, but the ordering constraints are not explicit; responsibility for the ordering is imposed on the programmer.
To reason about event-driven programs, a static analysis must model the execution of the event loop. A conservative—but imprecise—approach is to assume that any handler can be invoked in any order, ignoring any run-time constraints. Work by Madsen et al.  avoids such imprecision by using a notion of context sensitivity in which a context abstracts the set of event handlers registered and the set of events emitted. The resulting context-sensitive call graphs can distinguish, e.g., program states where no events have been emitted and program states where an event has been emitted, resulting in a more precise analysis of event-driven programs. Unfortunately, the number of contexts is exponential in the size of the program, so the analysis does not scale.
We propose a technique in which static analysis algorithms are written without considering the ordering of events and registrations, and are then translated automatically into algorithms that filter out infeasible paths. We leverage two established static analysis frameworks, the Interprocedural Finite Distributive Subset (IFDS) framework introduced by Reps et al. [12] and the Interprocedural Distributive Environment (IDE) framework of Sagiv et al. [13]. These frameworks have been used on a variety of practical problems, including taint analysis, and a number of solvers are available [1, 2, 5, 9].
The IFDS framework solves interprocedural dataflow problems whose domain consists of subsets of a finite set $D$ and whose dataflow functions are distributive; it computes a meet-over-valid-paths solution in polynomial time. Any static analysis that can be expressed in this framework is a candidate for our approach. Unfortunately, IFDS cannot enforce constraints on the execution order of event handlers. To overcome this limitation, our approach automatically translates an arbitrary IFDS analysis into an IDE analysis.
The IDE framework generalizes IFDS by using environments as dataflow facts, i.e., maps from some finite set $D$ to some lattice of values $L$, and distributive environment transformers as dataflow functions. Like IFDS, IDE problems can be solved efficiently. If the IFDS algorithm computes facts in $D$ that hold along interprocedurally valid paths, then the IDE algorithm computes values from $L$ along those paths. Our approach attaches dataflow functions to the edges associated with events and event handlers, so that the composed transfer functions filter out dataflow facts reachable only along infeasible paths.
2 Motivating Examples
We briefly discuss the workings of the application in fig. 1. First, the fs module is loaded, making various file-related operations available as methods on an object assigned to variable fs. Next, variable sum is declared but not initialized. The program then calls readdir to read the contents of the current directory, passing two arguments: the path of the directory to read and a callback function, f. f is invoked asynchronously with two arguments, err and files: err is null or undefined if the operation completes successfully and an error object otherwise, and files is an array containing the names of the files in the directory.
When f is invoked, it checks whether an error occurred. If not, it initializes sum to 0 and uses the built-in forEach function to iterate through all names in the array files. forEach takes a callback, g, that is invoked synchronously for each array element, binding it to variable file. For each file name, the function stat is invoked to access some properties of that file. The second argument passed to stat is a callback, h, that is invoked asynchronously with two arguments, err and stats, where stats is an object containing information about the current file. When h is invoked, it retrieves the size of this file, stores it in variable sz, and adds it to sum. Then, it prints information about the current file. Lastly, the top-level code prints ‘done’.
Execution behavior. Executing the program in a directory containing, in addition to the script itself, a file f1 of size 100 and a file f2 of size 50, prints:
done
dirstat.js 428
sum 428
f1 100
sum 528
f2 50
sum 578
Note that ‘done’ is printed first, because the callback f registered by readdir does not execute until after the top-level code has finished executing.
Static analysis. Suppose that we want to perform a dataflow analysis to determine potentially uninitialized variables. This problem can be expressed in terms of a domain consisting of subsets of a finite set (in this example, the set of possibly uninitialized variables), and using dataflow functions that are distributive, so a meet-over-valid-paths solution can be computed in polynomial time using the IFDS framework [12]. The defining characteristic of IFDS is that it avoids imprecision that would arise from considering data flow along control-flow paths in which function calls and function returns are not matched up properly.
However, suppose the analysis considers the control-flow path shown in bold in fig. 1, where execution of top-level code is followed by execution of h, without ever calling f. On this path, sum is read by the addition in h without having been initialized, so a traditional IFDS-based analysis will report that sum is possibly uninitialized at that statement. In reality, this path is infeasible because h cannot be invoked asynchronously before being registered during execution of f. Furthermore, since f initializes sum and registers callback h (recall that g is invoked synchronously by forEach), and h cannot be invoked until after f has finished executing, sum is guaranteed to be initialized when h executes.
This paper presents a technique for improving the precision of IFDS-based analyses by taking into account the order in which callbacks can execute. Our approach involves transforming the original IFDS problem into an IDE problem [13] by associating dataflow functions with edges corresponding to event handler registration and event handler invocation. The transfer function obtained by composing the functions along a control-flow path reflects that path’s feasibility, thus effectively “filtering out” dataflow facts if the path is infeasible.
Explicit emission of events. Figure 2 illustrates a more complex scenario where the EventEmitter class of the Node.js events package is used to model a door that responds to open and close events. Function hdlOpen is registered to handle the open event on door, and hdlClose is registered to handle the close event. To trigger event handlers, an event must be emitted using the emit method.
In the ICFG, several call sites invoke library functions such as on and emit, while the library invokes hdlOpen and hdlClose. No ordering exists between the library → hdlOpen and library → hdlClose edges, so a traditional analysis assumes that these event handlers may execute in an arbitrary order. In particular, the path shown in bold is admitted, but it is infeasible because it entails hdlClose executing before close is emitted.
To understand the impact of imprecision, we again consider an analysis that looks for uninitialized variables. If the analysis considers the infeasible path, it concludes that txt.concat(...) may take place at a time when txt is uninitialized. This is a false positive because it is impossible for hdlClose to execute before being registered or before the close event is emitted.
For this example, we would like to rule out the path marked in bold by tracking three operations associated with each event handler: (i) when an event handler is registered for an event, (ii) when the event is emitted, and (iii) when the event handler is invoked. A path is infeasible, and should be filtered out, if operation (i) or operation (ii) does not happen before operation (iii). To do so, we will determine the possible sequences of these operations associated with each dataflow fact, and filter out those dataflow facts associated with infeasible sequences. Note that in the file system example discussed previously, emit operations are not explicitly present in the application source code, so it can be viewed as a special case of the more general scenario discussed here.
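The ordering conditions can be checked directly on one handler's sequence of operations. The Python sketch below is illustrative only: the function name `is_feasible` and the string encoding of operations are assumptions introduced here, not part of the paper's implementation.

```python
def is_feasible(ops):
    """Return True if one handler's operation sequence could occur at run time.

    `ops` is a list of "register", "emit", and "invoke" strings (a
    hypothetical encoding chosen for this sketch).
    """
    registered = False
    emitted = False
    for op in ops:
        if op == "register":
            registered = True
        elif op == "emit":
            # An emission only counts once the handler has been registered.
            emitted = emitted or registered
        elif op == "invoke":
            # Invocation requires a prior registration and a later emission.
            if not (registered and emitted):
                return False
        else:
            raise ValueError(f"unknown operation: {op}")
    return True
```

A sequence such as `["register", "emit", "invoke"]` is accepted, while `["register", "invoke"]` and `["emit", "register", "invoke"]` are rejected. The technique described in this paper encodes the same check as functions on a lattice so that an existing solver can apply it along every path.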
Our technique takes as input an instance of the IFDS framework and outputs an instance of the IDE framework. In this section, we provide some background about these frameworks.
IFDS background. The IFDS framework [12] is applicable to interprocedural dataflow problems whose domain consists of subsets of a finite set $D$, and whose dataflow functions are distributive (i.e., $f$ is distributive if and only if $f(X \sqcap Y) = f(X) \sqcap f(Y)$ for all $X, Y \subseteq D$). It has proven to be sufficiently expressive and efficient to accommodate classical dataflow problems such as the possibly uninitialized variables problem illustrated in fig. 2, but also more complex problems such as taint analysis and typestate analysis [4, 10].
An IFDS problem instance is defined as $(G^*, D, F, M, \sqcap)$, where:
$G^* = (N^*, E^*)$ is the ICFG of the input program, called the supergraph;
$D$ is a finite set of dataflow facts;
$F \subseteq 2^D \to 2^D$ is a set of distributive dataflow functions;
$M : E^* \to F$ maps supergraph edges to dataflow functions; and
$\sqcap$ is the meet operator on the powerset $2^D$ (either union or intersection).
The IFDS framework computes in polynomial time the meet-over-valid-paths solution, $\mathit{MVP}$, of the dataflow constraints, where each node $n$ is mapped to a set of dataflow facts. (Footnote 2: Following the IFDS and IDE literature, throughout this paper, we use the lattice meet operation, $\sqcap$, to merge dataflow facts when control-flow paths merge. Thus the top element, $\top$, of a lattice represents an unreachable state and the bottom element, $\bot$, means that all concrete states are possible.) A valid path respects the fact that, when a function finishes executing, it returns to the call site from where it was invoked. $\mathit{VP}(n)$ denotes the set of all valid paths from the start of the program to node $n$. Formally, the meet-over-valid-paths solution is defined as
$$\mathit{MVP}(n) = \bigsqcap_{q \in \mathit{VP}(n)} M(q)(\top),$$
where $M$ is extended to paths so that $M([e_1, e_2, \ldots, e_k]) = M(e_k) \circ \cdots \circ M(e_2) \circ M(e_1)$.
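The key insight behind the IFDS algorithm is that any distributive function $f$ can be represented as a bipartite graph with $2(|D| + 1)$ nodes, with edges from one instance of $D \cup \{0\}$ to another instance of $D \cup \{0\}$; fig. 3 illustrates an example. Formally, the representation relation, $R_f \subseteq (D \cup \{0\}) \times (D \cup \{0\})$, of a distributive function $f$, is defined as follows:
$$R_f = \{(0, 0)\} \cup \{(0, y) \mid y \in f(\emptyset)\} \cup \{(x, y) \mid y \in f(\{x\}) \text{ and } y \notin f(\emptyset)\}$$
The edges of the representation relation are sufficient to uniquely determine $f(X)$ for any subset $X \subseteq D$, since by distributivity (taking union as the meet) $f(X) = f(\emptyset) \cup \bigcup_{x \in X} f(\{x\})$. Also, the meet and composition of two distributive functions can be computed and represented as bipartite graphs, as shown in fig. 4: the meet $f \sqcap f'$ is represented by the union $R_f \cup R_{f'}$, and the composition $f' \circ f$ is represented by the relational composition $R_f ; R_{f'} = \{(x, y) \mid \exists z.\ (x, z) \in R_f \text{ and } (z, y) \in R_{f'}\}$.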
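IFDS represents a given problem instance as an exploded supergraph, $G^\# = (N^\#, E^\#)$, where:
$$N^\# = N^* \times (D \cup \{0\}), \qquad E^\# = \{\langle m, d \rangle \to \langle n, d' \rangle \mid (m, n) \in E^* \text{ and } (d, d') \in R_{M(m, n)}\}$$
In essence, each node $n$ of the supergraph has been “exploded” into a set of nodes $\langle n, d \rangle$, where each $d$ is a dataflow fact (or 0), and each edge $e$ becomes the set of edges from the representation relation $R_f$, where $f = M(e)$ is the dataflow function assigned to $e$. In this graph, a node $\langle n, d \rangle$ is reachable from the start node if and only if fact $d$ holds at statement $n$.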
The algorithm works by iteratively composing a dataflow function for an existing control-flow path with the dataflow function for an additional instruction, thus yielding a dataflow function for a longer path. Once a path covers an entire procedure, its dataflow function becomes a summary function for the procedure and is used to model the effect of the procedure at its call sites.
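The representation relation and its composition can be illustrated on a tiny example. The following Python sketch follows the standard IFDS construction for the case where the meet is set union; the helper names (`representation`, `interpret`, `compose`) and the two example dataflow functions are assumptions made for illustration.

```python
ZERO = "0"  # the special 0 node of the representation relation


def representation(f, domain):
    """Build R_f for a distributive function f over subsets of `domain`."""
    base = f(frozenset())
    edges = {(ZERO, ZERO)}
    edges |= {(ZERO, y) for y in base}
    for x in domain:
        edges |= {(x, y) for y in f(frozenset({x})) if y not in base}
    return edges


def interpret(edges, facts):
    """Recover f(facts) by following edges from `facts` and from 0."""
    sources = set(facts) | {ZERO}
    return frozenset(y for (x, y) in edges if x in sources and y != ZERO)


def compose(first, second):
    """Relational composition: the edges for applying `first`, then `second`."""
    return {(x, y) for (x, z) in first for (z2, y) in second if z == z2}


# Hypothetical facts: variables that may be uninitialized.
DOMAIN = {"sum", "sz"}
assign_sum = lambda facts: facts - {"sum"}  # `sum = 0` initializes sum
declare_sz = lambda facts: facts | {"sz"}   # `var sz` leaves sz uninitialized

r_assign = representation(assign_sum, DOMAIN)
r_declare = representation(declare_sz, DOMAIN)
both = compose(r_assign, r_declare)
```

Here `interpret(both, {"sum"})` yields only `sz`: the assignment removes `sum` from the set of possibly uninitialized variables and the declaration adds `sz`, which matches applying the two functions in sequence.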
As discussed informally in section 2, we can encode event handling in the supergraph by modeling an event loop that nondeterministically calls all event handlers. Such an encoding is sound but imprecise, because it ignores the order in which event handlers are called and admits infeasible paths that include handling of events before the handler has been registered or the event has been emitted.
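IDE background. The IDE framework [13] generalizes IFDS to interprocedural distributive environment problems, in which dataflow facts are environments, i.e., maps in $\mathit{Env}(D, L)$ from a finite set $D$ to a finite-height lattice $L$, and dataflow functions are environment transformers in $\mathit{Env}(D, L) \to \mathit{Env}(D, L)$ that distribute over the meet operator of the map lattice $\mathit{Env}(D, L)$. In other words, environments are values from the map lattice $D \to L$, which is lifted from the lattice $L$: the top element is $\lambda d.\ \top$ where $\top$ is the top element of $L$, and for two environments $\mathit{env}_1, \mathit{env}_2$ in $\mathit{Env}(D, L)$, $\mathit{env}_1 \sqcap \mathit{env}_2 = \lambda d.\ (\mathit{env}_1(d) \sqcap \mathit{env}_2(d))$.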
Formally, an IDE problem instance is defined as $(G^*, D, L, M)$, where:
$G^*$ is the supergraph of the input program;
$D$ is a finite set of program symbols, e.g., variables;
$L$ is a finite-height lattice with top element $\top$; and
$M : E^* \to (\mathit{Env}(D, L) \to \mathit{Env}(D, L))$ is a function that assigns environment transformers to supergraph edges.
IDE computes the meet-over-valid-paths, $\mathit{MVP}$, of the environment transformers, similar to IFDS. At each node in the supergraph, IFDS computes only the presence or absence of each element of the dataflow domain; however, IDE computes for each $d \in D$ an element of the lattice $L$. Thus, IFDS is a special case of IDE in which $L$ is fixed to be the two-point lattice, with $\top$ indicating absence and $\bot$ indicating presence of $d$. Intuitively, one can think of the IDE algorithm as computing facts in $D$ that hold along interprocedurally valid paths while simultaneously propagating and computing values from $L$ along those paths. Formally, the meet-over-valid-paths solution is defined as
$$\mathit{MVP}(n) = \bigsqcap_{q \in \mathit{VP}(n)} M(q)(\lambda d.\ \top),$$
where $M$ is extended so that $M([e_1, e_2, \ldots, e_k]) = M(e_k) \circ \cdots \circ M(e_2) \circ M(e_1)$.
An IDE dataflow function in $\mathit{Env}(D, L) \to \mathit{Env}(D, L)$, i.e., a distributive environment transformer, can be encoded as a pointwise representation, using a bipartite graph with $2(|D| + 1)$ nodes. The nodes are the same as in an IFDS representation relation, but each edge $(d, d')$ is labeled by $f_{d,d'}$, a function in $L \to L$ called a micro-function. By distributivity, such a set of micro-functions is sufficient to represent an environment transformer $t$, since $t(\mathit{env})(d') = f_{0,d'}(\top) \sqcap \bigsqcap_{d \in D} f_{d,d'}(\mathit{env}(d))$.
Pointwise representations are also closed under meet and composition, as shown in fig. 4. The meet of two representations $R_1$ and $R_2$ is the union of the edges of $R_1$ and $R_2$, where the micro-function for a shared edge in $R_1 \sqcap R_2$ is the meet of the two micro-functions of that edge in $R_1$ and $R_2$. The composition of two representations is computed by connecting the two graphs and composing micro-functions along paths in the resulting graph. Therefore, an instantiation of the IDE framework requires an efficient representation of micro-functions as well as an efficient implementation of their composition, meet, and equality test.
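To make the pointwise representation concrete, the sketch below applies an environment transformer that is stored as micro-functions on edges. It is a minimal illustration assuming a hypothetical four-element chain lattice encoded as the integers 0 to 3 (so the meet is `min` and the top element is 3); the function and variable names are not taken from any existing IDE solver.

```python
TOP = 3  # top of a hypothetical chain lattice 0 < 1 < 2 < 3; meet is min


def apply_transformer(edges, env, facts):
    """Apply a pointwise-represented environment transformer to `env`.

    `edges` maps (source, target) pairs to micro-functions; the source
    "0" is the special node, whose input value is always TOP. Missing
    edges behave like the constant-TOP micro-function.
    """
    result = {}
    for target in facts:
        value = TOP
        for (source, edge_target), micro in edges.items():
            if edge_target != target:
                continue
            source_value = TOP if source == "0" else env[source]
            value = min(value, micro(source_value))  # meet over incoming edges
        result[target] = value
    return result


identity = lambda v: v
const_one = lambda v: 1

edges = {("x", "x"): identity, ("x", "y"): identity, ("0", "y"): const_one}
new_env = apply_transformer(edges, {"x": 2, "y": 0}, ["x", "y"])
```

In this example `y` receives the meet of the value flowing from `x` (2) and the constant produced on the edge from the 0 node (1), so `new_env` maps `x` to 2 and `y` to 1.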
The IDE algorithm represents a given problem instance as a labeled exploded supergraph $(G^\#, \mathit{EdgeFn})$, with each edge $e$ labeled by a micro-function $\mathit{EdgeFn}(e)$. The labels are given by a function $\mathit{EdgeFn} : E^\# \to (L \to L)$. To compute the meet-over-valid-paths solution over the labeled exploded supergraph, the IDE algorithm requires two phases. The first phase is similar to IFDS, iteratively composing bipartite graphs for control-flow paths of increasing length; this determines which nodes are reachable. The second phase applies the composed micro-functions to determine, for each node $\langle n, d \rangle$, the value that $d$ is mapped to.
In our approach, we take the IFDS exploded supergraph $G^\#$ as input and produce an IDE labeled exploded supergraph by assigning micro-functions to exploded supergraph edges. For a program with a single event handler, we use the lattice $L$ to keep track of the event handler registrations and event emissions that have taken place on each control-flow path. To support multiple event handlers, we use the map lattice $L_H = H \to L$, where $H$ is the set of event handlers in the program and $L$ is the lattice for a single event handler. This allows us to track the registration and event emission for each event handler in the program.
Our technique is a transformation of an arbitrary instance of the IFDS analysis framework into an instance of the IDE analysis framework. The IDE solution encodes the same dataflow facts as the IFDS solution, except that it excludes dataflow facts reachable only along infeasible paths.
The input to our technique, an instance of the IFDS framework, is expressed as an exploded supergraph $G^\#$, which encodes the ICFG of the program under analysis, the dataflow analysis, and the transfer functions for that analysis. The output of our technique, an instance of the IDE framework, is a labeled exploded supergraph $(G^\#, \mathit{EdgeFn})$, where EdgeFn assigns a micro-function over the event handler state lattice to each edge of the exploded supergraph.
The key idea of our transformation is to augment the exploded supergraph with an encoding of event handler operations. We do this by encoding event handler operations as micro-functions on the edges of the exploded supergraph. Our technique does not change the nodes or edges of the exploded supergraph; it only assigns micro-functions to the edges of that graph. Therefore, it does not change the ICFG, the base dataflow analysis, or its transfer functions.
Intuitively, an IFDS analysis asks which elements $d$ are present at node $n$ of the supergraph, while an IDE analysis asks what lattice value is associated with element $d$ at node $n$. In our technique, the lattice encodes event handler state: if an element $d$ at node $n$ maps to an infeasible event handler state, then we conclude that at node $n$, $d$ should be excluded from the results.
By solving this IDE instance, we achieve the effect of eliminating dataflow facts that are reachable only along infeasible paths. In the rest of this section, we describe how we encode event handler operations as micro-functions, and how we transform an IDE solution back to an IFDS solution. We also discuss theoretical properties of our technique.
4.1 Representing event handler state
For simplicity of presentation, we restrict our attention in this subsection to programs with a single event handler. We generalize to multiple event handlers in the next subsection. We define three possible states for an event handler:
- S (start): the event handler has not yet been registered.
- R (registered): the event handler has been registered for the event, but the event has not yet been emitted after registration. (Events emitted before registration are ignored.)
- E (emitted): the event handler has been registered and the event has been emitted after registration.
These states model the event handler during an actual program execution. They are distinct from the event handler operations (event handler registration, event emission, and event handler invocation) we discussed in section 2, which cause transitions between the states. For example, an event handler is initially in the start (S) state. When the handler is registered, then its state becomes the registered (R) state. When an event associated with that handler is emitted, the state becomes the emitted (E) state. Only in this state can the handler be invoked from the event loop; the handler can never be invoked from any other state. These transitions are summarized in fig. 5.
To model this state machine in a static analysis, we need a fourth state, infeasible (X). Invoking the event handler from the start (S) state (before handler registration) or registered (R) state (before event emission) can never happen at run time, but such an ordering may arise during the analysis, so we must identify it as an infeasible path. We use the IDE algorithm to keep track of event handler state and rule out data flow along infeasible paths.
Specifically, we define $L$ to be the chain lattice over the set $\{E, R, S, X\}$ with the ordering $E \sqsubset R \sqsubset S \sqsubset X$, as depicted in fig. 5. The lattice elements S, R, and E indicate the corresponding states of the event handler, and the top element X indicates the infeasible state, i.e., the dataflow fact has traversed a control-flow path that was infeasible.
Recall that the IDE algorithm maps a dataflow fact to the top element of $L$ to indicate that the fact does not hold at the given program point. The ordering between the four elements is designed to model the behavior at control-flow merge points: when two control-flow paths merge, the associated event handler state after the merge is the lesser of the two states before the merge. For example, if one control-flow path has passed through an infeasible sequence of operations (X) and the second control-flow path has passed through a feasible sequence of operations that results in the event handler being registered but not emitted (R), then after the control-flow merge, the event handler is in state $X \sqcap R = R$; it may have been registered but not emitted (R).
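The merge behavior can be sketched in a few lines of Python. The chain ordering used here, E below R below S below X, is an assumption of this sketch: it is inferred from the text's requirements that X is the top element and that merging X with R yields R, together with the need for the state transitions to be monotone on the chain.

```python
# Chain lattice from bottom to top (assumed ordering, see the note above).
ORDER = ["E", "R", "S", "X"]


def meet(a, b):
    """Return the lesser of two event handler states on the chain."""
    return ORDER[min(ORDER.index(a), ORDER.index(b))]


def meet_all(states):
    """Merge the states arriving along several control-flow paths."""
    result = "X"  # top element: no feasible path seen yet
    for state in states:
        result = meet(result, state)
    return result
```

For instance, `meet("X", "R")` gives `"R"`, so a fact that arrives along one infeasible path and one feasible path is still treated as feasible after the merge.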
At the main entry point of the program, the event handler is defined to be in the start (S) state for each fact that holds at the entry point. (Footnote 3: Normally, the IDE algorithm initializes every fact to the top element, i.e., X. In this case, we could label edges leaving the entry point with micro-functions that update every fact to S. However, for convenience, we simply initialize every fact to S.) As dataflow facts are propagated during the analysis, we track event handler state with IDE micro-functions, encoding the state machine transitions along each edge of the exploded supergraph. The default micro-function along most edges is the identity, indicating that the event handler state does not change. The other micro-functions are defined in table 1 and correspond to the operations discussed in section 2.
Event handler registration. The first micro-function labeled edge is a control-flow edge that represents an event handler registration operation. For example, the control-flow edge from door.on('open', ...) to the library in fig. 2 causes the event handler to transition from the start state to the registered state. If the event handler is in any other state, then the registration is ignored. We define the micro-function for this edge in table 1, first column.
Event emission. The second micro-function labeled edge is a control-flow edge that represents event emission. An example is the edge from door.emit('open') to the library: the handler associated with the open event transitions from the registered state to the emitted state. In all other cases, the event emission is ignored. We define the micro-function for this edge in table 1, second column.
Event handler invocation. The third micro-function labeled edge is a control-flow edge that represents event handler invocation. Examples of these edges are from the library to the start nodes of both event handlers. If the handler is not in the emitted state, then it transitions to the infeasible state because the handler is being invoked before it has been registered or its event has been emitted. We define the micro-function for this edge in table 1, third column.
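Table 1 is not reproduced in this text. Based only on the three descriptions above, the micro-functions can be tabulated as follows; the names register, emit, and invoke follow the way the text refers to them later, and the layout is a reconstruction rather than the original table.

```latex
\begin{array}{c|ccc}
l & \mathit{register}(l) & \mathit{emit}(l) & \mathit{invoke}(l) \\ \hline
S & R & S & X \\
R & R & E & X \\
E & E & E & E \\
X & X & X & X
\end{array}
```

Each row gives the state after the operation for a handler that was in state $l$ before it. Once a path reaches the infeasible state X, no operation leaves it.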
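Discussion. The transformation converts an instance of the IFDS framework to an instance of the IDE framework. It does not change the structure of the exploded supergraph, $G^\#$, but it provides $\mathit{EdgeFn}$, an assignment of micro-functions to exploded supergraph edges. For programs with a single event handler, EdgeFn is defined as follows:
$$\mathit{EdgeFn}(e) = \begin{cases} \mathit{register} & \text{if } e \text{ is an event handler registration edge} \\ \mathit{emit} & \text{if } e \text{ is an event emission edge} \\ \mathit{invoke} & \text{if } e \text{ is an event handler invocation edge} \\ \mathit{id} & \text{otherwise} \end{cases}$$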
Returning to our example in fig. 2, consider the execution path that is actually taken at run time: the door opening event handler is registered by door.on('open', ...), the door opening event is emitted by door.emit('open'), and the door opening event handler is invoked by the edge from the library to $\mathit{start}_{\mathit{open}}$.
For this control-flow path, the analysis computes the composition of the micro-functions, namely $\mathit{invoke} \circ \mathit{emit} \circ \mathit{register}$. Applying this composed function to the initial state, we have $(\mathit{invoke} \circ \mathit{emit} \circ \mathit{register})(S) = E$, so any data flow associated with this path is considered feasible.
On the other hand, consider a control-flow path in which the event handler is registered and invoked, but the event is never emitted. The composed micro-function for such a path is $\mathit{invoke} \circ \mathit{register}$, so we have $(\mathit{invoke} \circ \mathit{register})(S) = X$. Thus, any data flow computed along that path is considered infeasible.
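The two paths just discussed can be replayed with lookup tables. The tables below are transcribed from the prose descriptions of the three operations; the names `REGISTER`, `EMIT`, `INVOKE`, and `run` are assumptions of this sketch.

```python
# Micro-functions for a single handler, written as state -> state tables.
REGISTER = {"S": "R", "R": "R", "E": "E", "X": "X"}
EMIT = {"S": "S", "R": "E", "E": "E", "X": "X"}
INVOKE = {"S": "X", "R": "X", "E": "E", "X": "X"}


def run(path, state="S"):
    """Apply the micro-functions of `path` in control-flow order."""
    for micro in path:
        state = micro[state]
    return state


feasible = run([REGISTER, EMIT, INVOKE])  # register, emit, then invoke
infeasible = run([REGISTER, INVOKE])      # handler invoked without an emit
```

The first path ends in state E, so its dataflow facts are kept; the second ends in X and is filtered out.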
Recall that an instantiation of the IDE framework requires an efficient representation of micro-functions and an efficient implementation of their composition, meet, and equality test. A micro-function $f$ can be efficiently represented as a table of the four values $f(S), f(R), f(E), f(X)$. Since there are only $4^4 = 256$ possible such functions, compositions and meets of micro-functions can be precomputed, and only 8 bits are required to represent a micro-function.
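One possible 8-bit encoding packs the four table entries into a byte, two bits each. The sketch below is an illustration under assumed conventions (the bit layout, the state codes, and all function names are hypothetical); composition and meet are computed directly here, although with only 256 functions they could equally be precomputed into lookup tables.

```python
STATES = "ERSX"  # 2-bit code of a state = its index (assumed chain order)

REGISTER = {"S": "R", "R": "R", "E": "E", "X": "X"}
EMIT = {"S": "S", "R": "E", "E": "E", "X": "X"}
INVOKE = {"S": "X", "R": "X", "E": "E", "X": "X"}


def encode(table):
    """Pack f(E), f(R), f(S), f(X) into one byte, two bits per entry."""
    byte = 0
    for i, state in enumerate(STATES):
        byte |= STATES.index(table[state]) << (2 * i)
    return byte


def apply(byte, state):
    """Look up the result state for `state` in an encoded micro-function."""
    return STATES[(byte >> (2 * STATES.index(state))) & 0b11]


def compose(first, second):
    """Encoded micro-function for applying `first`, then `second`."""
    out = 0
    for i in range(4):
        mid = (first >> (2 * i)) & 0b11
        out |= ((second >> (2 * mid)) & 0b11) << (2 * i)
    return out


def meet(f, g):
    """Pointwise meet: the lesser code at each of the four inputs."""
    out = 0
    for i in range(4):
        out |= min((f >> (2 * i)) & 0b11, (g >> (2 * i)) & 0b11) << (2 * i)
    return out


path = compose(compose(encode(REGISTER), encode(EMIT)), encode(INVOKE))
```

With this layout the equality test is an integer comparison, and `apply(path, "S")` returns `"E"` for the feasible register, emit, invoke sequence.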
4.2 Multiple event handlers
For programs with multiple events and multiple event handlers, it is necessary for the analysis to distinguish them. In fig. 2, the control-flow path that registers the hdlOpen event handler, emits the open event, and invokes the opening event handler is feasible. However, the path that instead invokes the hdlClose handler should be infeasible, because the door closing event handler has never been registered and the close event has never been emitted. Our solution is to maintain a separate state for each event handler.
Thus, we define the IDE lattice $L_H$ to be the map lattice $H \to L$, where $H$ is the set of event handlers in the program and $L$ is the lattice for a single event handler that we discussed in the previous subsection. For each node in the exploded supergraph, the IDE algorithm using lattice $L_H$ computes a map that assigns a separate state for each event handler in the program.
Recall that the IDE framework requires an efficient representation of micro-functions, which in this case are functions in $(H \to L) \to (H \to L)$. Efficiently representing such functions is non-trivial. Since $H \to L$ has $4^{|H|}$ elements, there are $(4^{|H|})^{4^{|H|}}$ possible functions of this type, so any representation that could encode all of them would require $2|H| \cdot 4^{|H|}$ bits to encode each one. The key to an efficient encoding is the observation that all of the micro-functions that actually occur during an analysis, including their compositions and meets, are separable, in that the effect of an operation on the state of one event handler is independent of the states of other event handlers before the operation. In other words, the state that an event handler transitions to depends only on that handler’s previous state, and not the state of any other event handler.
Each separable micro-function can thus be represented by a function in $H \to (L \to L)$ that models the effect of an operation on each event handler in $H$ separately. We discussed in the previous subsection how to efficiently represent a function in $L \to L$. Now, to represent a micro-function in $(H \to L) \to (H \to L)$, we need only to tabulate $|H|$ functions of type $L \to L$, one for each event handler in $H$. The operations required by the IDE framework (composition, meet, and equality comparison) are computed pointwise, separately for each event handler. Effectively, a micro-function in $(H \to L) \to (H \to L)$ is represented by a map from event handlers to micro-functions in $L \to L$. Note that this representation of micro-functions and the required operations adds a factor of $|H|$ to the asymptotic complexity of the IDE algorithm.
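The version of EdgeFn that supports multiple event handlers is therefore defined as:
$$\mathit{EdgeFn}(e) = \begin{cases} \mathit{register}_h & \text{if } e \text{ registers event handler } h \\ \mathit{emit}_h & \text{if } e \text{ emits the event handled by } h \\ \mathit{invoke}_h & \text{if } e \text{ invokes event handler } h \\ \mathit{id} & \text{otherwise} \end{cases}$$
We use the subscript $h$ to indicate that a micro-function updates only the state assigned to $h$, and not the state of any other event handler. (Note that the default micro-function, id, does not update any state.) In an implementation, EdgeFn must also be able to determine which handler is affected by each edge in the exploded supergraph.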
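A separable micro-function can be sketched as a dictionary with one single-handler table per event handler. The code below is an illustrative sketch: `lift`, `compose`, `meet`, and `apply` are hypothetical helper names, and the chain order used by `meet` (E lowest, X highest) is an assumption.

```python
STATES = "ERSX"  # assumed chain order, bottom to top
IDENTITY = {s: s for s in STATES}
REGISTER = {"S": "R", "R": "R", "E": "E", "X": "X"}
EMIT = {"S": "S", "R": "E", "E": "E", "X": "X"}
INVOKE = {"S": "X", "R": "X", "E": "E", "X": "X"}


def lift(micro, handler, handlers):
    """micro_h: apply `micro` to `handler` and the identity to all others."""
    return {h: micro if h == handler else IDENTITY for h in handlers}


def compose(first, second):
    """Pointwise composition: `first`, then `second`, per handler."""
    return {h: {s: second[h][first[h][s]] for s in STATES} for h in first}


def meet(f, g):
    """Pointwise meet of two separable micro-functions."""
    return {h: {s: min(f[h][s], g[h][s], key=STATES.index) for s in STATES}
            for h in f}


def apply(micro, env):
    """Update each handler's state independently of the others."""
    return {h: micro[h][env[h]] for h in env}


HANDLERS = ["hdlOpen", "hdlClose"]
start = {h: "S" for h in HANDLERS}
path = compose(lift(REGISTER, "hdlOpen", HANDLERS),
               lift(EMIT, "hdlOpen", HANDLERS))
bad_path = compose(path, lift(INVOKE, "hdlClose", HANDLERS))
```

Every operation touches one table per handler, which is where the additional factor proportional to the number of handlers comes from. Equality of two such micro-functions is ordinary dictionary equality.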
4.3 Transforming IDE results to IFDS results
When IDE finishes analyzing a program, its output is, for each program point, a map from elements of $D$ to elements of $H \to L$. To convert this output to a result for the original IFDS problem, we must identify, at each program point, the subset of elements of $D$ that are reachable along feasible paths. In our context, a path is feasible if, for every event handler, the operations affecting that event handler along the path are in a feasible sequence (e.g., the handler is not invoked before it is registered or its event emitted). In other words, a path is feasible if the element of $H \to L$ computed by the IDE analysis maps every handler to a state other than X. Formally, we define an “untransform” function $U$ that converts an IDE result $\mathit{env}$ to an IFDS result:
$$U(\mathit{env}) = \{ d \in D \mid \forall h \in H.\ \mathit{env}(d)(h) \neq X \}$$
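In fig. 2, on the control-flow path that first passes through door.on('open', ...) and then through door.emit('open'), the micro-function for that path is $\mathit{emit}_{\mathit{hdlOpen}} \circ \mathit{register}_{\mathit{hdlOpen}}$, which computes the event handler state mapping $[\mathit{hdlOpen} \mapsto E, \mathit{hdlClose} \mapsto S]$. If that path then continues into hdlOpen, the event state will remain at $[\mathit{hdlOpen} \mapsto E, \mathit{hdlClose} \mapsto S]$, and thus the analysis will conclude that the path is feasible.
However, if the path continues into hdlClose instead, the composed micro-function becomes $\mathit{invoke}_{\mathit{hdlClose}} \circ \mathit{emit}_{\mathit{hdlOpen}} \circ \mathit{register}_{\mathit{hdlOpen}}$, which computes the event handler state mapping $[\mathit{hdlOpen} \mapsto E, \mathit{hdlClose} \mapsto X]$. Since at least one handler is in state X, the analysis will conclude that this path is infeasible and discard all dataflow facts computed along this path.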
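The conversion back to an IFDS result amounts to a filter over the facts at each program point. In this sketch the function name `untransform` and the dictionary shape of the IDE result are assumptions; the fact names are hypothetical.

```python
def untransform(ide_result):
    """Keep the facts whose handler states contain no infeasible state X.

    `ide_result` maps each dataflow fact at one program point to a
    {handler: state} dictionary (a hypothetical result format).
    """
    return {
        fact
        for fact, states in ide_result.items()
        if all(state != "X" for state in states.values())
    }


# Hypothetical IDE result at the statement inside hdlClose.
at_concat = {
    "txt": {"hdlOpen": "E", "hdlClose": "X"},    # only via an infeasible path
    "other": {"hdlOpen": "E", "hdlClose": "E"},  # reachable on a feasible path
}
remaining = untransform(at_concat)
```

Here `remaining` contains only `other`: the fact `txt` is dropped because the only paths that carry it pass through an infeasible invocation of hdlClose.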
4.4 Theoretical results
Soundness and precision. Our transformation is sound: the IDE analysis considers all feasible dataflow paths, i.e., the ones that occur during a program execution. Any dataflow fact that IFDS computes along a concrete path will be returned by our technique.
Theorem 4.1 (Soundness)
Let be an IFDS problem, be a concrete execution path, and be a dataflow fact. Then:
Our transformation is precise: the IDE analysis returns a subset of the dataflow facts that would be computed by IFDS. Dataflow facts computed along infeasible paths are not included in the result of our transformation.
Theorem 4.2 (Precision)
Let be an IFDS problem and be any node in the supergraph. Then:
Efficiency. As discussed by Reps et al. [12, sec. 5], the asymptotic complexity of solving an IFDS problem instance is $O(|E| \cdot |D|^3)$. An equivalent IDE problem instance also requires $O(|E| \cdot |D|^3)$ time to solve, provided that the micro-functions have an efficient representation [13, def. 5.2]. Our representation of micro-functions adds a time and space overhead of $O(|H|)$. Therefore, the asymptotic complexity of the event-driven IDE analysis is $O(|E| \cdot |D|^3 \cdot |H|)$.
5.1 Uninitialized variables analysis as an IFDS problem
5.2 Transforming to an IDE problem
In order to produce more precise results, Borges transforms IFDS problems into IDE problems that track the operations associated with each event handler, as well as each handler’s state. Information about which function calls correspond to which event handler operations must be provided to Borges as an event model specification, which also indicates the argument that represents the event name and the argument that represents the event handler. Using this information, Borges can identify which call sites involve event handler operations.
For example, the program in fig. 2 uses the Node.js events library. Applying static analysis to complex libraries poses challenges that are beyond the scope of this paper, and our approach to handling library-based applications is to provide a stub that models the library’s essential functions and control flow. In the stub for the events library, we provide the functions on, emit, and _eventDispatcher. The event model specifies that a call to on (e.g., on('open', hdlOpen)) registers the second argument (hdlOpen) as an event handler on the event given as the first argument (open), a call to emit (e.g., emit('open')) emits the event given as its argument (open), and a call from inside the library (specifically, from _eventDispatcher) invokes an event handler.
Using this information, along with the output from TAJS, Borges constructs a mapping of the event handler registrations that happen in a program. For each edge in the control flow graph, Borges can identify whether it affects event handler state (i.e., through a registration, event emission, or invocation), and if so, which event name and event handler are involved. Furthermore, Borges also computes a mapping from event names to event handlers, to easily identify which handler responds to a given event emission.
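The kind of information an event model carries can be sketched as a small table keyed by function name. Everything below is hypothetical: the paper does not show the format of Borges's event model specification, so the `OperationSpec` class, the `EVENT_MODEL` table, and the helper functions are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperationSpec:
    kind: str                   # "register" or "emit"
    event_arg: int              # index of the event-name argument
    handler_arg: Optional[int]  # index of the handler argument, if any


# Hypothetical model for the stubbed events library.
EVENT_MODEL = {
    "on": OperationSpec("register", event_arg=0, handler_arg=1),
    "emit": OperationSpec("emit", event_arg=0, handler_arg=None),
}


def classify_call(callee, args):
    """Return (kind, event, handler) for an event operation, else None."""
    spec = EVENT_MODEL.get(callee)
    if spec is None:
        return None
    handler = None if spec.handler_arg is None else args[spec.handler_arg]
    return (spec.kind, args[spec.event_arg], handler)


def handlers_by_event(calls):
    """Map each event name to the handlers registered for it."""
    table = {}
    for callee, args in calls:
        op = classify_call(callee, args)
        if op is not None and op[0] == "register":
            table.setdefault(op[1], []).append(op[2])
    return table


calls = [("on", ["open", "hdlOpen"]),
         ("on", ["close", "hdlClose"]),
         ("emit", ["open"])]
registry = handlers_by_event(calls)
```

With `registry` in hand, an emission of `open` can be attributed to `hdlOpen`, which is the lookup needed to label an emission edge with the right handler.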
The transformation from an IFDS problem to an IDE problem is straightforward. Recall that the IFDS algorithm uses an exploded supergraph to represent dataflow functions, while in the IDE algorithm, EdgeFn assigns a micro-function to each exploded supergraph edge. Borges provides an implementation of EdgeFn that determines the micro-function for a given edge and event handler. For instance, the edge representing a call to on(’open’, hdlOpen) is labeled with the register micro-function for the hdlOpen handler.
With all the exploded supergraph edges labeled, solving the IDE problem computes the composition of all the micro-functions along a control-flow path, taking the meet whenever multiple paths merge. In other words, when computing dataflow facts for the possibly uninitialized variables analysis, Borges also maintains the event handler states. Thus, before reporting a final result for each program point, Borges can examine the state of each event handler and filter out any result in which an event handler is in the infeasible state X.
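As a rough illustration of this labeling and filtering, the sketch below composes micro-functions along a single path and checks whether the resulting handler states are feasible. It assumes the state names S, R, E, and X and the micro-function behavior given in the appendix (register moves S to R, emit moves R to E, invoke keeps E and maps every other state to X). The function names `edge_fn`, `run_path`, and `feasible` are hypothetical, and the meet over merging paths performed by an IDE solver is deliberately left out.

```python
S, R, E, X = "S", "R", "E", "X"  # handler states; X marks an infeasible path

def register(state):
    return R if state == S else state

def emit(state):
    return E if state == R else state

def invoke(state):
    return E if state == E else X

def edge_fn(micro_function, handler):
    """Lift a per-handler micro-function to the map of handler states."""
    def apply(states):
        updated = dict(states)
        updated[handler] = micro_function(states[handler])
        return updated
    return apply

def run_path(edges, handlers):
    """Compose the edge labels along one path, starting with every handler in S."""
    states = {handler: S for handler in handlers}
    for label in edges:
        states = label(states)
    return states

def feasible(states):
    return X not in states.values()

handlers = ["hdlOpen"]
# on('open', hdlOpen); emit('open'); the dispatcher invokes hdlOpen
good = run_path(
    [edge_fn(register, "hdlOpen"), edge_fn(emit, "hdlOpen"), edge_fn(invoke, "hdlOpen")],
    handlers,
)
# emit('open') happens before registration, so the later invocation is impossible
bad = run_path(
    [edge_fn(emit, "hdlOpen"), edge_fn(register, "hdlOpen"), edge_fn(invoke, "hdlOpen")],
    handlers,
)
```

Here `feasible(good)` holds while `feasible(bad)` does not, so any dataflow fact that reaches a program point only along the second path would be discarded.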
6 Case Studies
In this section, we discuss three examples to demonstrate our approach. We return to the file system example in section 2 and briefly discuss two other programs. We run Borges on three small, event-driven Node.js applications, and apply our transformation to a possibly uninitialized variables analysis.
File system module, revisited. Recall fig. 1, where sum is read without being initialized, but only along an infeasible path. Borges can improve precision by considering the order in which callbacks are executed. Specifically, the calls to readdir (line LABEL:line:readdir) and stat (line LABEL:line:stat) are registration operations for the f and h callbacks, respectively. However, the emission operation is implicit and happens from within the event loop. Since event emission happens after event handler registration but before event handler invocation, we model it as occurring immediately after registration. In other words, the micro-function labeling the calls to readdir and stat is $\mathit{emit} \circ \mathit{register}$ for the corresponding handler. Finally, invocations of f and h are invocation operations, which correspond to the micro-function invoke.
When Borges analyzes the application, it identifies two paths with respect to the callbacks. In one path, readdir is called, f is invoked, stat is called, and h is invoked. The composition of micro-functions along this path is $\mathit{invoke}_h \circ \mathit{emit}_h \circ \mathit{register}_h \circ \mathit{invoke}_f \circ \mathit{emit}_f \circ \mathit{register}_f$, which computes the event handler state mapping $[f \mapsto \mathrm{E},\ h \mapsto \mathrm{E}]$, meaning the path is feasible.
However, in the infeasible path where readdir is called and then h is invoked, the composed micro-function is $\mathit{invoke}_h \circ \mathit{emit}_f \circ \mathit{register}_f$, which computes the event handler state mapping $[f \mapsto \mathrm{E},\ h \mapsto \mathrm{X}]$, meaning the path is infeasible. Therefore, any results computed along this path are filtered out.
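The two paths can be traced state by state. The following worked derivation is a sketch under two assumptions: both handlers start in state S, and the micro-functions behave as in the appendix (register moves S to R, emit moves R to E, invoke keeps E and maps any other state to X). The handler subscripts and the bracketed map notation are shorthand introduced here; the block needs the amsmath package.

```latex
\begin{align*}
\text{feasible:}\quad
  & [f \mapsto \mathrm{S},\ h \mapsto \mathrm{S}]
    \xrightarrow{\mathit{register}_f} [f \mapsto \mathrm{R},\ h \mapsto \mathrm{S}]
    \xrightarrow{\mathit{emit}_f} [f \mapsto \mathrm{E},\ h \mapsto \mathrm{S}]
    \xrightarrow{\mathit{invoke}_f} [f \mapsto \mathrm{E},\ h \mapsto \mathrm{S}] \\
  & \xrightarrow{\mathit{register}_h} [f \mapsto \mathrm{E},\ h \mapsto \mathrm{R}]
    \xrightarrow{\mathit{emit}_h} [f \mapsto \mathrm{E},\ h \mapsto \mathrm{E}]
    \xrightarrow{\mathit{invoke}_h} [f \mapsto \mathrm{E},\ h \mapsto \mathrm{E}] \\[1ex]
\text{infeasible:}\quad
  & [f \mapsto \mathrm{S},\ h \mapsto \mathrm{S}]
    \xrightarrow{\mathit{register}_f} [f \mapsto \mathrm{R},\ h \mapsto \mathrm{S}]
    \xrightarrow{\mathit{emit}_f} [f \mapsto \mathrm{E},\ h \mapsto \mathrm{S}]
    \xrightarrow{\mathit{invoke}_h} [f \mapsto \mathrm{E},\ h \mapsto \mathrm{X}]
\end{align*}
```

In the second trace, h is still in state S when it is invoked, because stat has not yet been called, so invoke sends it to X.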
Timers module. Figure 6 implements a simple timer. It is similar to the file system example, as it has two callbacks that can be executed only in a certain order. The application prompts the user for a number and then counts down from that number in one-second intervals. It uses the timers module, whose functions are defined in the global scope.
Because the callbacks start (line LABEL:line:stdinCallback) and tick (line LABEL:line:setTimeoutCallback) are invoked asynchronously, a traditional static analysis might consider an execution path where tick is executed before start, and conclude that rem is possibly uninitialized when it is read on line LABEL:line:readRemaining. However, this is an infeasible path: tick is only registered as a callback by start and itself, so it can be invoked only after start has finished executing. As a result, Borges labels the execution path with the micro-function and computes the event handler state mapping as .
Without an ordering constraint between the lstn (line LABEL:line:serverListen) and conn (line LABEL:line:serverConnected) callbacks, a traditional analysis might consider infeasible paths, e.g., where conn is invoked before lstn. Along this path, the analysis concludes that nConn on line LABEL:line:incrementNrConnects is possibly uninitialized. However, conn can be executed only after lstn finishes, which guarantees that nConn is initialized. In Borges, such a path would be labeled by the micro-function , which computes the event handler state .
7 Related work
Bodden et al.  use the IDE algorithm to enhance the precision of an IFDS analysis when analyzing software product lines. They modify any IFDS analysis into an IDE analysis that runs on the original program and tracks the product line variants in which each dataflow fact holds.
Rapoport et al.  observe that context-sensitive analysis can be made more precise by correlating the dynamic dispatch behavior of different call sites on the same receiver object. They also transform an arbitrary IFDS analysis into an IDE analysis that keeps track of which methods have been dynamically dispatched on each receiver.
Jhala and Majumdar  adapt IFDS for asynchronous programs. In these programs, asynchronous calls are similar to event registrations in that the procedure will be invoked at a later time; however, there are no event emissions, so the time of invocation is unpredictable. In their approach, instead of encoding additional state as an IDE problem, they transform the analysis into a larger IFDS analysis that tracks, at each asynchronous call site, the number of pending asynchronous calls made for which the procedure has not yet been invoked.
Madsen et al.  introduce the event-based call graph, an extension of the call graph that models happens-before constraints between event handler registrations and event emissions. However, their approach does not scale well because the number of contexts is exponential in the size of the program.
8 Conclusion
Traditional static analyses produce imprecise results when applied to event-driven programs because they assume that event handler callbacks can execute in any order. We have presented an approach for precise dataflow analysis that is applicable to any dataflow problem that can be expressed as an instance of the IFDS framework. The approach is a transformation from the IFDS problem to an IDE problem in which the dataflow functions associated with edges in the graph filter out infeasible paths that arise from impossible sequences of event handler invocations. We prove the correctness of our transformation and report on a proof-of-concept tool.
References
-  Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., Le Traon, Y., Octeau, D., McDaniel, P.: FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps. In: Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI (2014). https://doi.org/10.1145/2594291.2594299
-  Bodden, E.: Heros IFDS/IDE Solver. https://github.com/Sable/heros, accessed: 2018-10-05
-  Bodden, E., Tolêdo, T., Ribeiro, M., Brabrand, C., Borba, P., Mezini, M.: SPL: Statically Analyzing Software Product Lines in Minutes Instead of Years. In: Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI (2013). https://doi.org/10.1145/2491956.2491976
-  Fink, S.J., Yahav, E., Dor, N., Ramalingam, G., Geay, E.: Effective Typestate Verification in the Presence of Aliasing. ACM Transactions on Software Engineering and Methodology, TOSEM 17(2), 9:1–9:34 (2008). https://doi.org/10.1145/1348250.1348255
-  IBM Research: Watson Libraries for Analysis (WALA). https://github.com/wala/WALA, accessed: 2018-10-05
-  Jhala, R., Majumdar, R.: Interprocedural Analysis of Asynchronous Programs. In: Proc. ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL (2007). https://doi.org/10.1145/1190216.1190266
-  Madsen, M., Yee, M.H., Lhoták, O.: From Datalog to Flix: A Declarative Language for Fixed Points on Lattices. In: Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI (2016). https://doi.org/10.1145/2908080.2908096
-  Naeem, N.A., Lhoták, O.: Typestate-like Analysis of Multiple Interacting Objects. In: Proc. ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA (2008). https://doi.org/10.1145/1449764.1449792
-  Rapoport, M., Lhoták, O., Tip, F.: Precise Data Flow Analysis in the Presence of Correlated Method Calls. In: Proc. Symposium on Static Analysis, SAS (2015). https://doi.org/10.1007/978-3-662-48288-9_4
-  Reps, T., Horwitz, S., Sagiv, S.: Precise Interprocedural Dataflow Analysis via Graph Reachability. In: Proc. ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL (1995). https://doi.org/10.1145/199448.199462
-  Sagiv, S., Reps, T., Horwitz, S.: Precise Interprocedural Dataflow Analysis with Applications to Constant Propagation. In: Proc. Conference on Theory and Practice of Software Development, CAAP/FASE (1995). https://doi.org/10.1007/3-540-59293-8_226
Appendix A Proofs
Our work is based on the work by Rapoport et al., which also transforms a given IFDS problem instance to an IDE analysis that eliminates dataflow facts computed along infeasible paths.
In this section, we assume that and its exploded supergraph representation, , is the base IFDS problem instance given to our transformation, and that and its labeled exploded supergraph representation, , is the IDE problem instance defined in section 4; in particular, lattice is the map lattice where is the event handler state lattice. Finally, to simplify some notation, we write the edge for each . Note that .
A.1 Soundness and Precision
Recall that in the IDE definition, we used to denote the top element of the environment lattice, i.e., the environment that maps every element to . We also defined the meet-over-valid-paths solution for an IDE problem as . However, for the event-driven analysis, the initial state is rather than . Thus, the meet-over-valid-paths solution for the event-driven analysis is:
To prove the soundness and precision theorems, we require two lemmas.
Lemma 1. Let be a concrete execution trace of some program, and let be an event handler in the program. If at node of the trace , handler is in state , and is a dataflow fact such that , then .
Intuitively, the lemma states that the event-driven analysis over-approximates event handler state in a program execution. Note that is a concrete state, so it cannot be X.
By induction on the length of the program trace.
Base case: . There is no instruction (edge) in the trace, so there is no dataflow fact . Therefore, the lemma trivially holds.
Induction hypothesis: Let and let
, i.e., is the abstract state
computed by the event-driven analysis for the execution trace , is
some dataflow fact in , and is some event handler.
Suppose the lemma holds for trace , i.e., where
is the concrete state for handler at node after the trace .
Induction step: Now consider . Let be the concrete state for handler at node after the
trace . We must now show .
Because is extended from edges to paths by composition, we can rewrite:
Note that computes the environment at node after the
trace , which is then transformed by to get the
environment at node , a single node after the trace , which is a
map from . Thus,
returns a map from handlers to event handler states.
Now, recall that for a given environment , the IDE framework represents an environment transformer as a set of micro-functions in :
For an edge , gives the environment transformer for that edge, and for , gives the corresponding micro-functions:
By substitution, we can rewrite:
This gives us the inequality:
The inequality compares two different ways of computing the state of handler
for dataflow fact at node (after the trace ). On the
right-hand side, the entire environment at node (after the trace )
is transformed by , and then the state of handler is
obtained from the new environment. On the left-hand side, at node (after
the trace ), a map of event handlers to states (i.e., an element of the
lattice ), is obtained for some dataflow fact and then
updated by the micro-function , before getting the state mapped to handler . The
inequality states that the left-hand side is more precise than the right-hand
side; intuitively, this is because the left-hand side takes the effect of a
single micro-function, while the right-hand side takes the effect of merging all micro-functions.
It remains to show to complete the proof. To simplify notation, let be the map of event handlers to states, as computed by the IDE algorithm along path for dataflow fact . Note that . We proceed by considering the four cases of EdgeFn and how the micro-functions update the map .
is an edge that registers handler , so the micro-function is register.
The micro-function for this edge updates the state for handler : if is in state S, then will be in state R. Otherwise, the state is unchanged. The concrete state of handler at node is state , which cannot be X, so there are three possibilities:
If , then edge registers handler , so we get the new concrete state . By the induction hypothesis, , so at node , is mapped to S, R, or E. In each of those cases, , so the lemma holds.
If , then the event handler has already been registered, so the state is unchanged and . By the induction hypothesis, , so at node , is mapped to R or E. In both of those cases, , so the lemma holds.
If , then the event handler has already been registered (and its event has been emitted), so the state is unchanged and . By the induction hypothesis, , so at node , is mapped to E. In this case, , so , and the lemma holds.
is an edge that emits an event for handler , so the micro-function is emit.
The micro-function for this edge updates the state for handler : if is in state R, then will be in state E. Otherwise, the state is unchanged. The concrete state of handler at node is state , which cannot be X, so there are three possibilities:
If , then the event emission is ignored, so . By the induction hypothesis, , so at node , is mapped to S, R, or E. In each of those cases, , so the lemma holds.
If , then the handler can respond to the event, so we get the new concrete state . By the induction hypothesis, , so at node , is mapped to R or E. In both of those cases, , so the lemma holds.
If , then the state is unchanged, so . By the induction hypothesis, , so at node , is mapped to E. In this case, , so , and the lemma holds.
is an edge from the event loop to handler , so the micro-function is invoke.
The micro-function for this edge updates the state for handler : if is in state E, then the state is unchanged. Otherwise, the state will be X. The concrete state of handler at node is state , which cannot be X, S, or R. X never occurs during a concrete execution. S is not possible because it means the event handler has not been registered, so invocation cannot occur. R is not possible because it means the event has not been emitted, so invocation cannot occur. Therefore, . By the induction hypothesis, , so at node , is mapped to E. In this case, , so , and the lemma holds.
is any other edge, so the micro-function is id.
The micro-function does not update the state of handler . Similarly, in the concrete execution, there is no event handler operation on this edge, so . By the induction hypothesis, , and , so and the lemma holds. ∎
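The case analysis in the induction step is small enough to check exhaustively. The sketch below is only a sanity check, not a replacement for the proof. The table `ALLOWED` transcribes, for each concrete state, the abstract states that the cases above permit (S allows S, R, or E; R allows R or E; E allows only E), and `CONCRETE` transcribes the concrete effect of each operation, with invocation possible only in state E. The names `ALLOWED`, `CONCRETE`, `ABSTRACT`, and `check_induction_step` are hypothetical.

```python
S, R, E, X = "S", "R", "E", "X"

# Abstract micro-functions, as described in the case analysis.
def register(state):
    return R if state == S else state

def emit(state):
    return E if state == R else state

def invoke(state):
    return state if state == E else X

def identity(state):
    return state

ABSTRACT = {"register": register, "emit": emit, "invoke": invoke, "id": identity}

# Abstract states the proof permits for each concrete state.
ALLOWED = {S: {S, R, E}, R: {R, E}, E: {E}}

# Concrete effect of each operation; a handler can only be invoked in state E.
CONCRETE = {
    "register": {S: R, R: R, E: E},
    "emit": {S: S, R: E, E: E},
    "invoke": {E: E},
    "id": {S: S, R: R, E: E},
}

def check_induction_step():
    """Return every (operation, concrete state, abstract state) that breaks the invariant."""
    failures = []
    for op, step in CONCRETE.items():
        for before, after in step.items():
            for abstract_before in ALLOWED[before]:
                if ABSTRACT[op](abstract_before) not in ALLOWED[after]:
                    failures.append((op, before, abstract_before))
    return failures

failures = check_induction_step()
```

An empty `failures` list means that, whenever the abstract state is permitted before an edge, it is still permitted after the edge, which is the content of the four cases.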
Lemma 2. Let be a concrete execution trace of some program, be an event handler, and be a dataflow fact. Then:
Intuitively, the lemma states that for a concrete execution path, the event-driven analysis never computes an infeasible event handler state.
direction. By induction on the length of the program trace.
Base case: . There is no instruction (edge) in the trace, so there is no dataflow fact . Therefore, the lemma trivially holds.
Induction hypothesis: Let and let
, i.e., is the abstract state
computed by the event-driven analysis for the execution trace , is
some dataflow fact in , and is some event handler.
Suppose the lemma holds for trace , i.e., .
Induction step: Now consider . Let be the concrete state for handler at node after the
trace . We must now show .
From the previous proof, we know:
By the induction hypothesis, for all , so
we know that
is a map where each handler is mapped to S, R, or
E. So we need to examine , the map after being updated by the
micro-function on edge .
Of the four cases, three of them (register, emit, and id) are straightforward. None of these micro-functions map any handler to X. So, for all , we have:
The fourth case is when EdgeFn returns invoke, which will map
to X, unless handler is currently mapped to E. However, along
the concrete execution trace , the last edge corresponds
to an invocation of event handler . This can only happen if has already
been registered and its event emitted. In other words, the concrete state of
must be E. By lemma 1, so and
. Therefore, .
The premise states that after a concrete execution trace , at node and dataflow fact , handler is in a state other than X. In other words, there exists a path in the exploded supergraph to node where holds, so by definition, . ∎
We can now prove the soundness and precision theorems.
Theorem 0.A.1 (Soundness)
Let be an IFDS problem, be a concrete execution path, and be a dataflow fact. Then: