1 Introduction
Event-driven programming is a popular paradigm in which control flow follows the order of events. The essence of the paradigm is the flexible association between user-defined event handlers and events, such as user interface or operating system actions. When an event is emitted, all event handlers that have been registered for it are eligible to be invoked by the event loop.
Flexibility comes from the fact that event handlers are invoked asynchronously. This asynchrony causes complexity in reasoning about event-driven programs in the presence of mutable state: consider the example of a global variable initialized by one event handler and used by another. The order in which the event handlers are invoked is critical for correctness, but the ordering constraints are not explicit; responsibility for the ordering is imposed on the programmer.
To reason about event-driven programs, a static analysis must model the execution of the event loop. A conservative but imprecise approach is to assume that handlers can be invoked in any order, ignoring any runtime constraints. Work by Madsen et al. [8] avoids such imprecision by using a notion of context sensitivity in which a context abstracts the set of event handlers registered and the set of events emitted. The resulting context-sensitive call graphs can distinguish, e.g., program states where no events have been emitted and program states where an event has been emitted, resulting in a more precise analysis of event-driven programs. Unfortunately, the number of contexts is exponential in the size of the program, so the analysis does not scale.
We propose a technique to write static analysis algorithms without considering the ordering of events and registrations, and then translate them automatically into algorithms that filter out infeasible paths. We leverage two established static analysis frameworks, the Interprocedural Finite Distributive Subset (IFDS) framework introduced by Reps et al. [12] and the Interprocedural Distributive Environment (IDE) framework of Sagiv et al. [13]. These frameworks have been used on a variety of practical problems, including taint analysis [1], and a number of solvers are available [1, 2, 5, 9].
The IFDS framework solves interprocedural dataflow problems whose domain consists of subsets of a finite set $D$ and whose dataflow functions are distributive, and it computes a meet-over-valid-paths solution in polynomial time. Any static analysis that can be expressed in this framework is a candidate for our approach. Unfortunately, IFDS cannot enforce constraints on the execution order of event handlers. To overcome this limitation, our approach automatically translates an arbitrary IFDS analysis into an IDE analysis.
The IDE framework generalizes IFDS by using environments as dataflow facts, i.e., maps from some finite set $D$ to some lattice of values $L$, and distributive environment transformers as dataflow functions. Like IFDS, IDE problems can be solved efficiently. If the IFDS algorithm computes facts in $D$ that hold along interprocedurally valid paths, then the IDE algorithm computes values from $L$ along those paths. Our approach attaches dataflow functions to the edges that correspond to events and event handlers, so that the composed transfer functions filter out dataflow facts reachable only along infeasible paths.
Our main contribution is an automated transformation from IFDS into IDE problems, such that the IDE result solves the original IFDS problem but avoids imprecision due to infeasible paths. We prove our transformation sound and precise. We demonstrate a proof-of-concept tool called Borges, which is capable of analyzing small programs in a subset of JavaScript that use event-driven programming. We report on three case studies on small Node.js programs that use events for asynchronous file I/O, timers, and network I/O. We demonstrate precision improvements in an IFDS-based possibly uninitialized variables dataflow analysis. Our technique is applicable to other frameworks and languages.
2 Motivating Examples
Figure 1 shows an event-driven JavaScript application that uses the Node.js fs (File System) module. Running the application prints the names and sizes of the files in the current directory, as well as a running sum of their sizes.
We briefly discuss the workings of the application. First, the fs module is loaded (line LABEL:line:RequireFs), making various file-related operations available as methods on an object assigned to variable fs. Next, variable sum is declared, but not initialized (line LABEL:line:declare_sum). Line LABEL:line:readdir calls readdir to read the contents of the current directory, with two arguments: a path to the directory that is to be read and a callback function, f. f is asynchronously invoked with two arguments, err and files, where err is either null or undefined if the operation completes successfully or an error object otherwise, and files is an array containing the names of the files in the directory.
When f is invoked, it checks if an error occurred (line LABEL:line:checkForErrors1). If not, it initializes sum to 0 (line LABEL:line:initializeSum), and uses the built-in forEach function to iterate through all names in array files (line LABEL:line:ForEach). forEach takes a callback, g, that is invoked synchronously for each array element, binding it to variable file. For each file name, the function stat is invoked to access some properties of that file (line LABEL:line:stat). The second argument passed to stat is a callback, h, that is asynchronously invoked with two arguments, err and stats, where stats is an object containing information about the current file. When h is invoked, it retrieves the size of this file, stores it in variable sz (line LABEL:line:readFileSize), and adds it to sum (line LABEL:line:addToSum). Then, it prints information about the current file (lines LABEL:line:FsPrint1–LABEL:line:FsPrint2). Lastly, the application prints ‘done’ (line LABEL:line:FsDone).
Execution behavior. Executing the program in a directory containing, in addition to the script itself, a file f1 of size 100 and a file f2 of size 50, prints:
    done
    dirstat.js 428
    sum 428
    f1 100
    sum 528
    f2 50
    sum 578
Note that ‘done’ is printed first, because the callback f registered by readdir does not execute until after the top-level code has finished executing.
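The listing of Figure 1 is not reproduced in this text. The following is a minimal sketch reconstructed from the description above, not the authors' exact program. The wrapper function `dirstat` and its parameters `fsApi` and `log` are assumptions, added so that the same code can run against Node's real `fs` module or a stand-in; the names `f`, `g`, `h`, `sum`, and `sz` follow the prose.

```javascript
// Sketch of the Figure 1 application, reconstructed from the prose.
// Assumption: the code is wrapped in a function so that the file-system
// API and the logger are supplied by the caller (hypothetical parameters).
function dirstat(fsApi, log) {
  let sum; // declared, but not initialized

  fsApi.readdir('.', function f(err, files) {
    if (!err) {
      sum = 0; // initialized only once f runs
      files.forEach(function g(file) {             // g runs synchronously
        fsApi.stat(file, function h(err, stats) {  // h runs asynchronously
          const sz = stats.size;
          sum += sz;
          log(file + ' ' + sz);
          log('sum ' + sum);
        });
      });
    }
  });

  log('done'); // top-level code finishes before f is ever invoked
}
```

Calling `dirstat(require('fs'), console.log)` runs the sketch against the real file system. Because `h` is registered only inside `f`, and `f` initializes `sum` first, `h` can never observe an uninitialized `sum`, which is the ordering fact the analysis needs to recover.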
Representing asynchronous control flow. The callbacks passed to readdir and stat are invoked asynchronously. Since JavaScript’s execution model is single-threaded and non-preemptive, these functions will not execute until the current callback has finished executing. Figure 1 shows the interprocedural control flow graph (ICFG) for the application. An ICFG (also known as a supergraph in the IFDS literature) contains a subgraph for each function in the application, with nodes for all expressions in the function and edges reflecting possible control flow between them. Each such subgraph contains distinct “start” and “end” nodes representing the function’s entry and exit points. Edges between subgraphs represent interprocedural control flow between functions due to calls and returns. Asynchronous control flow is modeled by way of a special “event loop” node. Edges connect each function’s end node to the event loop node, reflecting that control returns to the event loop when a function at the top of the call stack finishes executing. Edges connect the “event loop” node to the “start” node for each asynchronously invoked function. Thus, in fig. 1, there are edges from “event loop” to the start nodes for f and h.
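The event-loop modeling described above can be sketched as a small edge-building helper. This is an illustration only: `eventLoopEdges` and the node labels such as `end(f)` are hypothetical, and the sketch assumes that the synchronously called `g` is left out of the list of functions that return to the event loop.

```javascript
// Hypothetical helper: builds only the event-loop edges of an ICFG.
// Node labels such as "end(f)" are illustrative, not tool output.
function eventLoopEdges(functions, asyncHandlers) {
  const edges = [];
  for (const fn of functions) {
    // control returns to the event loop when fn finishes
    edges.push([`end(${fn})`, 'event loop']);
  }
  for (const handler of asyncHandlers) {
    // the event loop may invoke any asynchronous handler
    edges.push(['event loop', `start(${handler})`]);
  }
  return edges;
}

// Figure 1: top-level code ("main") plus the asynchronous callbacks f and h.
const icfgEdges = eventLoopEdges(['main', 'f', 'h'], ['f', 'h']);
```

Note that these edges alone admit the path from `end(main)` through the event loop to `start(h)`, without ever visiting `f`.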
Static analysis. Suppose that we want to perform a dataflow analysis to determine potentially uninitialized variables. This problem can be expressed in terms of a domain consisting of subsets of a finite set (in this example, the set of possibly uninitialized variables), and using dataflow functions that are distributive, so a meet-over-valid-paths solution can be computed in polynomial time using the IFDS framework [12]. The defining characteristic of IFDS is that it avoids imprecision that would arise from considering data flow along control-flow paths in which function calls and function returns are not matched up properly.
However, suppose the analysis considers the control-flow path shown in bold in fig. 1, where execution of top-level code is followed by execution of h, without ever calling f. On this path, sum is referenced on line LABEL:line:addToSum without having been initialized, so a traditional IFDS-based analysis will report that sum is possibly uninitialized on line LABEL:line:addToSum. In reality, this path is infeasible because h cannot be invoked asynchronously before being registered during execution of f. Furthermore, since f initializes sum and registers callback h (recall that g is invoked synchronously by forEach), and h cannot be invoked until after f has finished executing, sum is guaranteed to be initialized when h executes.
This paper presents a technique for improving the precision of IFDS-based analyses by taking into account the order in which callbacks can execute. Our approach involves transforming the original IFDS problem into an IDE problem [13] by associating dataflow functions with edges corresponding to event handler registration and event handler invocation. The transfer function obtained by composing the functions along a control-flow path reflects that path’s feasibility, thus effectively “filtering out” dataflow facts if the path is infeasible.
Explicit emission of events. Figure 2 illustrates a more complex scenario where the EventEmitter class of the Node.js events package is used to model a door that responds to open and close events. On line LABEL:line:RegisterHandleOpen, function hdlOpen is registered to handle the open event on door, and on line LABEL:line:RegisterHandleClose, hdlClose is registered to handle the close event. To trigger event handlers, an event must be emitted using the emit method.
We consider the program’s execution behavior. After loading the events package (line LABEL:line:Start), the program creates a door (line LABEL:line:CreateDoor) and declares variable txt (line LABEL:line:DeclareText). The call door.on(...) (line LABEL:line:RegisterHandleOpen) associates hdlOpen with the open event. Calling emit triggers hdlOpen, which, when it executes, initializes txt to ’Hello’ (line LABEL:line:AssignToText) and associates hdlClose with the close event (line LABEL:line:RegisterHandleClose). (Footnote 1: JavaScript is single-threaded and non-preemptive. emit yields control to the event loop, which invokes the associated handler, and control returns to the caller of emit.) Line LABEL:line:EmitClose emits the close event, triggering its handler, hdlClose, which, when it executes, updates txt (line LABEL:line:CallConcat) and prints its value ’Hello, world!’. Note that hdlClose must execute after hdlOpen, because it responds only to the close event, which is emitted in the body of hdlOpen.
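The listing of Figure 2 is likewise not reproduced here. A minimal sketch consistent with the description above, assuming the names `door`, `txt`, `hdlOpen`, and `hdlClose` from the prose and not claiming to match the authors' exact listing, could look as follows. It uses only the `on` and `emit` methods of Node's `EventEmitter`.

```javascript
// Sketch of the Figure 2 program, reconstructed from the prose.
const EventEmitter = require('events');

const door = new EventEmitter();
let txt; // declared, but not initialized

function hdlOpen() {
  txt = 'Hello';
  door.on('close', hdlClose); // hdlClose is registered only here
  door.emit('close');         // and close is emitted only here
}

function hdlClose() {
  txt = txt.concat(', world!');
  console.log(txt); // prints "Hello, world!"
}

door.on('open', hdlOpen);
door.emit('open');
```

In Node.js, `emit` calls the registered listeners synchronously, whereas the paper models `emit` as passing through the event loop; for this program the order in which the handlers run is the same under both views.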
In the ICFG, several call sites invoke library functions such as on and emit, while the library invokes hdlOpen and hdlClose. No ordering exists between the library→hdlOpen and library→hdlClose edges, so a traditional analysis assumes that these event handlers may execute in an arbitrary order. In particular, the path shown in bold is admitted, but it is infeasible because it entails hdlClose executing before close is emitted.
To understand the impact of imprecision, we again consider an analysis that looks for uninitialized variables. If the analysis considers the infeasible path, it concludes that txt.concat(...) may take place at a time when txt is uninitialized. This is a false positive because it is impossible for hdlClose to execute before being registered or before the close event is emitted.
For this example, we would like to rule out the path marked in bold by tracking three operations associated with each event handler: (i) when an event handler is registered for an event, (ii) when the event is emitted, and (iii) when the event handler is invoked. A path is infeasible, and will be filtered out, if operation (iii) occurs without operation (i) having happened before it, or without operation (ii) having happened before it. To do so, we will determine the possible sequences of these operations associated with each dataflow fact, and filter out those dataflow facts associated with infeasible sequences. Note that in the file system example discussed previously, emit operations are not explicitly present in the application source code, so it can be viewed as a special case of the more general scenario discussed here.
3 Background
Our technique takes as input an instance of the IFDS framework and outputs an instance of the IDE framework. In this section, we provide some background about these frameworks.
IFDS background. The IFDS framework [12] is applicable to interprocedural dataflow problems whose domain consists of subsets of a finite set $D$, and whose dataflow functions are distributive (i.e., $f$ is distributive if and only if $f(D_1 \sqcap D_2) = f(D_1) \sqcap f(D_2)$ for all $D_1, D_2 \subseteq D$). It has proven to be sufficiently expressive and efficient to accommodate classical dataflow problems such as the possibly uninitialized variables problem illustrated in fig. 2, but also more complex problems such as taint analysis [1] and typestate analysis [4, 10].
An IFDS problem instance is defined as $(G^*, D, F, M, \sqcap)$, where:

$G^* = (N^*, E^*)$ is the ICFG of the input program, called the supergraph;

$D$ is a finite set of dataflow facts;

$F \subseteq 2^D \to 2^D$ is a set of distributive dataflow functions;

$M : E^* \to F$ maps supergraph edges to dataflow functions; and

$\sqcap$ is the meet operator on the powerset $2^D$ (either union or intersection).
The IFDS framework computes in polynomial time the meet-over-valid-paths solution, $\mathrm{MVP}_F$, of the dataflow constraints, where each node $n \in N^*$ is mapped to a set of dataflow facts. (Footnote 2: Following the IFDS and IDE literature, throughout this paper, we use the lattice meet operation, $\sqcap$, to merge dataflow facts when control-flow paths merge. Thus the top element, $\top$, of a lattice represents an unreachable state and the bottom element, $\bot$, means that all concrete states are possible.) A valid path respects the fact that, when a function finishes executing, it returns to the call site from where it was invoked. $\mathrm{VP}(n)$ denotes the set of all valid paths from the start of the program to node $n$. Formally, the meet-over-valid-paths solution is defined as
$$\mathrm{MVP}_F(n) = \bigsqcap_{q \in \mathrm{VP}(n)} M(q)(\top),$$
where $M$ is extended to paths so that $M([e_1, e_2, \ldots, e_k]) = M(e_k) \circ \cdots \circ M(e_2) \circ M(e_1)$.
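The key insight behind the IFDS algorithm is that any distributive function $f : 2^D \to 2^D$ can be represented as a bipartite graph with $2(|D|+1)$ nodes, with edges from one instance of $D \cup \{0\}$ to another instance of $D \cup \{0\}$; fig. 3 illustrates an example. Formally, the representation relation, $R_f \subseteq (D \cup \{0\}) \times (D \cup \{0\})$, of a distributive function $f$, is defined as follows:
$$R_f = \{(0, 0)\} \cup \{(0, y) \mid y \in f(\emptyset)\} \cup \{(x, y) \mid y \in f(\{x\}) \text{ and } y \notin f(\emptyset)\}.$$
The edges of the representation relation are sufficient to uniquely determine $f(X)$ for any subset $X \subseteq D$, since by distributivity $f(X) = f(\emptyset) \sqcap \bigsqcap_{x \in X} f(\{x\})$. Also, the meet and composition of two distributive functions can be computed and represented as bipartite graphs, as shown in fig. 4: the meet is obtained by taking the union of the edges of the two graphs, and the composition by connecting the two graphs and following paths through both.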
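As an illustration of the representation relation, the sketch below builds the edge set of a distributive function, recovers the function from the edges, and composes two edge sets. It assumes that the meet is set union (as in the possibly uninitialized variables analysis) and that fact names do not contain the string `->`; all function names are hypothetical and not taken from any IFDS solver.

```javascript
// Sketch: representation relation of a distributive f : 2^D -> 2^D,
// assuming the meet is set union. ZERO stands for the special 0 node.
const ZERO = '0';

function representation(f, D) {
  const base = f(new Set()); // facts generated regardless of the input
  const edges = new Set([`${ZERO}->${ZERO}`]);
  for (const y of base) edges.add(`${ZERO}->${y}`);
  for (const x of D) {
    for (const y of f(new Set([x]))) {
      if (!base.has(y)) edges.add(`${x}->${y}`);
    }
  }
  return edges;
}

// Recover f(X) from the edges alone.
function applyRelation(edges, X) {
  const sources = new Set([ZERO, ...X]);
  const result = new Set();
  for (const e of edges) {
    const [x, y] = e.split('->');
    if (sources.has(x) && y !== ZERO) result.add(y);
  }
  return result;
}

// Relational composition: follow an edge of `first`, then one of `second`.
function composeRelations(first, second) {
  const out = new Set();
  for (const e1 of first) {
    const [x, y] = e1.split('->');
    for (const e2 of second) {
      const [y2, z] = e2.split('->');
      if (y === y2) out.add(`${x}->${z}`);
    }
  }
  return out;
}

// Uninitialized-variables example over D = {a, b}.
const D = ['a', 'b'];
// "a = 1": a is no longer uninitialized.
const assignConst = (X) => new Set([...X].filter((v) => v !== 'a'));
// "b = a": b is uninitialized afterwards exactly when a was.
const copy = (X) => {
  const r = new Set([...X].filter((v) => v !== 'b'));
  if (X.has('a')) r.add('b');
  return r;
};
```

For example, `composeRelations(representation(copy, D), representation(assignConst, D))` represents executing `b = a` followed by `a = 1`.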
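IFDS represents a given problem instance as an exploded supergraph, $G^\sharp = (N^\sharp, E^\sharp)$, where:

$N^\sharp = N^* \times (D \cup \{0\})$, and

$E^\sharp = \{ (m, d_1) \to (n, d_2) \mid (m, n) \in E^* \text{ and } (d_1, d_2) \in R_{M(m, n)} \}$.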
In essence, each node $n$ of the supergraph has been “exploded” into a set of nodes $(n, d)$, where each $d$ is a dataflow fact (or 0), and each edge $e$ becomes the set of edges from the representation relation $R_f$, where $f = M(e)$ is the dataflow function assigned to $e$. In this graph, a node $(n, d)$ is reachable from the start node if and only if fact $d$ holds at statement $n$.
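To make the reachability reading concrete, here is a minimal sketch over a hand-built fragment of an exploded supergraph for the file system example. The node labels and the edge list are hypothetical simplifications (one node per function, no call or return edges, so call/return matching is not needed); the fact `sum` stands for "sum may be uninitialized".

```javascript
// Sketch: plain graph reachability over exploded-supergraph nodes "(n,d)".
// Assumption: the fragment has no call/return edges, so every path is valid.
function reachable(edges, start) {
  const seen = new Set([start]);
  const work = [start];
  while (work.length > 0) {
    const node = work.pop();
    for (const [from, to] of edges) {
      if (from === node && !seen.has(to)) {
        seen.add(to);
        work.push(to);
      }
    }
  }
  return seen;
}

const explodedEdges = [
  ['(main,0)', '(loop,0)'],
  ['(main,0)', '(loop,sum)'],   // declaring sum generates the fact
  ['(loop,0)', '(f,0)'],
  ['(loop,sum)', '(f,sum)'],
  ['(f,0)', '(loop,0)'],        // f assigns sum, so (f,sum) has no outgoing edge
  ['(loop,0)', '(h,0)'],
  ['(loop,sum)', '(h,sum)'],    // the event loop may invoke h at any time
];

const facts = reachable(explodedEdges, '(main,0)');
console.log(facts.has('(h,sum)')); // true
```

The node `(h,sum)` is reachable, so a plain IFDS analysis reports `sum` as possibly uninitialized in `h`, even though the path that carries the fact skips `f`.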
The algorithm works by iteratively composing a dataflow function for an existing control-flow path with the dataflow function for an additional instruction, thus yielding a dataflow function for a longer path. Once a path covers an entire procedure, its dataflow function becomes a summary function for the procedure and is used to model the effect of the procedure at its call sites.
As discussed informally in section 2, we can encode event handling in the supergraph by modeling an event loop that nondeterministically calls all event handlers. Such an encoding is sound but imprecise, because it ignores the order in which event handlers are called and admits infeasible paths that include handling of events before the handler has been registered or the event has been emitted.
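IDE background. The IDE framework [13] generalizes IFDS to interprocedural distributive environment problems, in which dataflow facts are environments, i.e., maps in $\mathit{Env}(D, L)$ from a finite set $D$ to a finite-height lattice $L$, and dataflow functions are environment transformers in $\mathit{Env}(D, L) \to \mathit{Env}(D, L)$ that distribute over the meet operator of the map lattice $\mathit{Env}(D, L)$. In other words, environments are values from the map lattice $\mathit{Env}(D, L)$, which is lifted from the lattice $L$: the top element is $\lambda d.\,\top$ where $\top$ is the top element of $L$, and for two environments $\mathit{env}_1, \mathit{env}_2$ in $\mathit{Env}(D, L)$, $\mathit{env}_1 \sqcap \mathit{env}_2 = \lambda d.\,(\mathit{env}_1(d) \sqcap \mathit{env}_2(d))$.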
Formally, an IDE problem instance is defined as $(G^*, D, L, M)$, where:

$G^*$ is the supergraph of the input program;

$D$ is a finite set of program symbols, e.g., variables;

$L$ is a finite-height lattice with top element $\top$; and

$M : E^* \to (\mathit{Env}(D, L) \to \mathit{Env}(D, L))$ is a function that assigns environment transformers to supergraph edges.
IDE computes the meet-over-valid-paths solution, $\mathrm{MVP}_{\mathit{Env}}$, of the environment transformers, similar to IFDS. At each node in the supergraph, IFDS computes only the presence or absence of each element $d$ of the dataflow domain $D$; however, IDE computes for each $d$ an element of the lattice $L$. Thus, IFDS is a special case of IDE in which $L$ is fixed to be the two-point lattice, with $\top$ indicating absence and $\bot$ indicating presence of $d$. Intuitively, one can think of the IDE algorithm as computing facts in $D$ that hold along interprocedurally valid paths while simultaneously propagating and computing values from $L$ along those paths. Formally, the meet-over-valid-paths solution is defined as
$$\mathrm{MVP}_{\mathit{Env}}(n) = \bigsqcap_{q \in \mathrm{VP}(n)} M(q)(\lambda d.\,\top),$$
where $M$ is extended so that $M([e_1, e_2, \ldots, e_k]) = M(e_k) \circ \cdots \circ M(e_2) \circ M(e_1)$.
An IDE dataflow function in $\mathit{Env}(D, L) \to \mathit{Env}(D, L)$, i.e., a distributive environment transformer, can be encoded as a pointwise representation, using a bipartite graph with $2(|D|+1)$ nodes. The nodes are the same as in an IFDS representation relation, but each edge $d' \to d$ is labeled by $f_{d', d}$, a function in $L \to L$ called a microfunction. By distributivity, such a set of microfunctions is sufficient to represent an environment transformer $t$, since $t(\mathit{env})(d) = f_{0, d}(\top) \sqcap \bigsqcap_{d' \in D} f_{d', d}(\mathit{env}(d'))$.
Pointwise representations are also closed under meet and composition, as shown in fig. 4. The meet of two representations $R_1$ and $R_2$ is the union of edges of $R_1$ and $R_2$, where the microfunction for a shared edge in $R_1 \sqcap R_2$ is the meet of the two microfunctions of that edge in $R_1$ and $R_2$. The composition of two representations is computed by connecting the two graphs and composing microfunctions along paths in the resulting graph. Therefore, an instantiation of the IDE framework requires an efficient representation of microfunctions as well as an efficient implementation of their composition, meet, and equality test.
The IDE algorithm represents a given problem instance as a labeled exploded supergraph $(G^\sharp, \mathit{EdgeFn})$, with each edge labeled by a microfunction $f \in L \to L$. The labels are given by a function $\mathit{EdgeFn} : E^\sharp \to (L \to L)$. To compute the meet-over-valid-paths solution over the labeled exploded supergraph, the IDE algorithm requires two phases. The first phase is similar to IFDS, iteratively composing bipartite graphs for control-flow paths of increasing length; this determines which nodes are reachable. The second phase applies the composed microfunctions to determine, for each node $(n, d)$, the value that $d$ is mapped to.
In our approach, we take the IFDS exploded supergraph $G^\sharp$ as input and produce an IDE labeled exploded supergraph $(G^\sharp, \mathit{EdgeFn})$ by assigning microfunctions to exploded supergraph edges. For a program with a single event handler, we use the lattice $L_h$ to keep track of the event handler registrations and event emissions that have taken place on each control-flow path. To support multiple event handlers, we use the map lattice $H \to L_h$, where $H$ is the set of event handlers in the program and $L_h$ is the lattice for a single event handler. This allows us to track the registration and event emission for each event handler in the program.
4 Technique
Our technique is a transformation of an arbitrary instance of the IFDS analysis framework into an instance of the IDE analysis framework. The IDE solution encodes the same dataflow facts as the IFDS solution, except that it excludes dataflow facts reachable only along infeasible paths.
The input to our technique, an instance of the IFDS framework, is expressed as an exploded supergraph $G^\sharp$, which encodes the ICFG of the program under analysis, the dataflow analysis, and the transfer functions for that analysis. The output of our technique, an instance of the IDE framework, is a labeled exploded supergraph $(G^\sharp, \mathit{EdgeFn})$ where EdgeFn assigns microfunctions in $L \to L$ to each edge of the exploded supergraph.
The key idea of our transformation is to augment the exploded supergraph with an encoding of event handler operations. We do this by encoding event handler operations as microfunctions on the edges of the exploded supergraph. Our technique does not change the nodes or edges of the exploded supergraph; it only assigns microfunctions to the edges of that graph. Therefore, it does not change the ICFG, the base dataflow analysis, or its transfer functions.
Intuitively, an IFDS analysis asks which elements $d \in D$ are present at node $n$ of the supergraph, while an IDE analysis asks what lattice value is associated with element $d$ at node $n$. In our technique, the lattice encodes event handler state: if an element $d$ at node $n$ maps to an infeasible event handler state, then we conclude that at node $n$, $d$ should be excluded from the results.
By solving this IDE instance, we achieve the effect of eliminating dataflow facts that are reachable only along infeasible paths. In the rest of this section, we describe how we encode event handler operations as microfunctions, and how we transform an IDE solution back to an IFDS solution. We also discuss theoretical properties of our technique.
4.1 Representing event handler state
For simplicity of presentation, we restrict our attention in this subsection to programs with a single event handler. We generalize to multiple event handlers in the next subsection. We define three possible states for an event handler:
S (start): the event handler has not yet been registered.

R (registered): the event handler has been registered for the event, but the event has not yet been emitted after registration. (Events emitted before registration are ignored.)

E (emitted): the event handler has been registered and the event has been emitted after registration.
These states model the event handler during an actual program execution. They are distinct from the event handler operations (event handler registration, event emission, and event handler invocation) we discussed in section 2, which cause transitions between the states. For example, an event handler is initially in the start (S) state. When the handler is registered, then its state becomes the registered (R) state. When an event associated with that handler is emitted, the state becomes the emitted (E) state. Only in this state can the handler be invoked from the event loop; the handler can never be invoked from any other state. These transitions are summarized in fig. 5.
To model this state machine in a static analysis, we need a fourth state, infeasible (X). Invoking the event handler from the start (S) state (before handler registration) or registered (R) state (before event emission) can never happen at run time, but such an ordering may arise during the analysis, so we must identify it as an infeasible path. We use the IDE algorithm to keep track of event handler state and rule out data flow along infeasible paths.
Specifically, we define $L_h$ to be the chain lattice over the set $\{X, S, R, E\}$ with the ordering $E \sqsubset R \sqsubset S \sqsubset X$, as depicted in fig. 5. The lattice elements $S$, $R$, and $E$ indicate the corresponding states of the event handler, and the top element $X$ indicates the infeasible state, i.e., the dataflow fact has traversed a control-flow path that was infeasible.
Recall that the IDE algorithm maps a dataflow fact to the top element of the lattice to indicate that the fact does not hold at the given program point. The ordering between the four elements is designed to model the behavior at control-flow merge points: when two control-flow paths merge, the associated event handler state after the merge is the lesser of the two states before the merge. For example, if one control-flow path has passed through an infeasible sequence of operations (X) and the second control-flow path has passed through a feasible sequence of operations that results in the event handler being registered but not emitted (R), then after the control-flow merge, the event handler is in state $X \sqcap R = R$; it may have been registered but not emitted (R).
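The merge rule can be sketched as a meet on a four-element chain. The sketch assumes the order E below R below S below X: the text fixes X as the top element, and this relative order of the three feasible states is the one under which the meet keeps the state in which the handler is closest to being invocable. `RANK` and `meet` are hypothetical names.

```javascript
// Sketch: meet on the chain E < R < S < X, with X (infeasible) as top.
// RANK is an assumed encoding of the order, not part of any tool.
const RANK = { E: 0, R: 1, S: 2, X: 3 };

function meet(a, b) {
  return RANK[a] <= RANK[b] ? a : b; // the lesser of the two states
}

console.log(meet('X', 'R')); // R
console.log(meet('S', 'E')); // E
```

Taking the lesser state at a merge means that a handler is treated as invocable after the merge whenever it is invocable on at least one incoming path, so feasible data flow is never discarded.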
At the main entry point of the program, the event handler is defined to be in the start (S) state for each fact that holds at the entry point. (Footnote 3: Normally, the IDE algorithm initializes every fact to the top element, i.e., X. In this case, we could label edges leaving the entry point with microfunctions that update every fact to S. However, for convenience, we simply initialize every fact to S.) As dataflow facts are propagated during the analysis, we track event handler state with IDE microfunctions, encoding the state machine transitions along each edge of the exploded supergraph. The default microfunction along most edges is the identity, indicating that the event handler state does not change. The other microfunctions are defined in table 1 and correspond to the operations discussed in section 2.
state | register | emit | invoke
X     | X        | X    | X
S     | R        | S    | X
R     | R        | E    | X
E     | E        | E    | E
Event handler registration. The first microfunction-labeled edge is a control-flow edge that represents an event handler registration operation. For example, the control-flow edge from door.on(’open’, ...) to the library in fig. 2 causes the event handler to transition from the start state to the registered state. If the event handler is in any other state, then the registration is ignored. We define the microfunction for this edge in table 1, first column.
Event emission. The second microfunction-labeled edge is a control-flow edge that represents event emission. An example is the edge from door.emit(’open’) to the library: the handler associated with the open event transitions from the registered state to the emitted state. In all other cases, the event emission is ignored. We define the microfunction for this edge in table 1, second column.
Event handler invocation. The third microfunction-labeled edge is a control-flow edge that represents event handler invocation. Examples of these edges are from the library to the start nodes of both event handlers. If the handler is not in the emitted state, then it transitions to the infeasible state because the handler is being invoked before it has been registered or its event has been emitted. We define the microfunction for this edge in table 1, third column.
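Discussion. The transformation converts an instance of the IFDS framework to an instance of the IDE framework. It does not change the structure of the exploded supergraph, $G^\sharp$, but it provides $\mathit{EdgeFn}$, an assignment of exploded supergraph edges to microfunctions. For programs with a single event handler, EdgeFn is defined as follows:
$$\mathit{EdgeFn}(e) = \begin{cases} \mathit{register} & \text{if } e \text{ represents an event handler registration} \\ \mathit{emit} & \text{if } e \text{ represents an event emission} \\ \mathit{invoke} & \text{if } e \text{ represents an event handler invocation} \\ \mathit{id} & \text{otherwise} \end{cases}$$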
Returning to our example in fig. 2, consider the execution path that is actually taken at run time: the door opening event handler is registered by door.on(’open’, ...) on line LABEL:line:RegisterHandleOpen, the door opening event is emitted by door.emit(’open’) on line LABEL:line:EmitOpen, and the door opening event handler is invoked by the edge from the library to $\mathit{start}_{open}$.
For this control-flow path, the analysis computes the composition of the microfunctions, namely $\mathit{invoke} \circ \mathit{emit} \circ \mathit{register}$. Applying this composed function to the initial state, we have $(\mathit{invoke} \circ \mathit{emit} \circ \mathit{register})(S) = E$, so any data flow associated with this path is considered feasible.
On the other hand, consider a control-flow path in which the event handler is registered and invoked, but the event is never emitted. The composed microfunction for such a path is $\mathit{invoke} \circ \mathit{register}$, so we have $(\mathit{invoke} \circ \mathit{register})(S) = X$. Thus, any data flow computed along that path is considered infeasible.
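The two compositions above can be checked mechanically. The sketch below writes each column of table 1 as a lookup table and composes them; the object layout and the helper name `compose` are illustrative choices rather than a description of any existing implementation.

```javascript
// Sketch: the microfunctions of table 1 as lookup tables (state -> state).
const STATES = ['X', 'S', 'R', 'E'];
const register = { X: 'X', S: 'R', R: 'R', E: 'E' };
const emit     = { X: 'X', S: 'S', R: 'E', E: 'E' };
const invoke   = { X: 'X', S: 'X', R: 'X', E: 'E' };

// compose(g, f) is "g after f".
function compose(g, f) {
  const out = {};
  for (const s of STATES) out[s] = g[f[s]];
  return out;
}

const feasiblePath = compose(invoke, compose(emit, register));
const infeasiblePath = compose(invoke, register);

console.log(feasiblePath.S);   // E: registered, emitted, then invoked
console.log(infeasiblePath.S); // X: invoked without an emission
```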
Recall that an instantiation of the IDE framework requires an efficient representation of microfunctions and an efficient implementation of their composition, meet, and equality test. A microfunction $f$ can be efficiently represented as a table of the four values $f(X), f(S), f(R), f(E)$. Since there are only $4^4 = 256$ possible such functions, compositions and meets of microfunctions can be precomputed, and only 8 bits are required to represent a microfunction.
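One possible 8-bit encoding, given here only as a sketch, stores each of the four table entries in two bits and precomputes all compositions once. The bit layout (the position of a state in `STATES` is its 2-bit code) and the names `pack`, `unpack`, and `COMPOSE` are assumptions of this example.

```javascript
// Sketch: pack a four-entry microfunction table into one byte.
const STATES = ['X', 'S', 'R', 'E']; // index = assumed 2-bit code

function pack(table) {
  let byte = 0;
  STATES.forEach((s, i) => {
    byte |= STATES.indexOf(table[s]) << (2 * i);
  });
  return byte;
}

function unpack(byte) {
  const table = {};
  STATES.forEach((s, i) => {
    table[s] = STATES[(byte >> (2 * i)) & 3];
  });
  return table;
}

// Precompute all 256 x 256 compositions; COMPOSE[g][f] encodes "g after f".
const COMPOSE = [];
for (let g = 0; g < 256; g++) {
  const row = new Uint8Array(256);
  const gt = unpack(g);
  for (let f = 0; f < 256; f++) {
    const ft = unpack(f);
    const out = {};
    for (const s of STATES) out[s] = gt[ft[s]];
    row[f] = pack(out);
  }
  COMPOSE.push(row);
}

const registerByte = pack({ X: 'X', S: 'R', R: 'R', E: 'E' });
const invokeByte = pack({ X: 'X', S: 'X', R: 'X', E: 'E' });
console.log(unpack(COMPOSE[invokeByte][registerByte]).S); // X
```

With this encoding, equality of microfunctions is integer equality and composition is a single table lookup; a meet table could be precomputed in the same way.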
4.2 Multiple event handlers
For programs with multiple events and multiple event handlers, it is necessary for the analysis to distinguish them. In fig. 2, the control-flow path that registers the hdlOpen event handler, emits the open event, and invokes the opening event handler is feasible. However, the path that instead invokes the hdlClose handler should be infeasible, because the door closing event handler has never been registered and the close event has never been emitted. Our solution is to maintain a separate state for each event handler.
Thus, we define the IDE lattice $L$ to be the map lattice $H \to L_h$, where $H$ is the set of event handlers in the program and $L_h$ is the lattice for a single event handler that we discussed in the previous subsection. For each node in the exploded supergraph, the IDE algorithm using lattice $L$ computes a map that assigns a separate state for each event handler in the program.
Recall that the IDE framework requires an efficient representation of microfunctions in $L \to L$, which in this case is $(H \to L_h) \to (H \to L_h)$, where $L_h$ is the four-element lattice for a single event handler. Efficiently representing such functions is nontrivial. There are $(4^{|H|})^{4^{|H|}}$ possible functions of this type, so any representation that could encode all of them would require $2|H| \cdot 4^{|H|}$ bits to encode each one. The key to an efficient encoding is the observation that all of the microfunctions that actually occur during an analysis, including their compositions and meets, are separable, in that the effect of an operation on the state of one event handler is independent of the states of other event handlers before the operation. In other words, the state that an event handler transitions to depends only on that handler’s previous state, and not the state of any other event handler.
Each separable microfunction can thus be represented by a function in $H \to (L_h \to L_h)$ that models the effect of an operation on each event handler in $H$ separately. We discussed in the previous subsection how to efficiently represent a function in $L_h \to L_h$. Now, to represent a microfunction in $L \to L$, we need only to tabulate $|H|$ functions of type $L_h \to L_h$, one for each event handler in $H$. The operations required by the IDE framework, composition, meet, and equality comparison, are computed pointwise, separately for each event handler. Effectively, a microfunction in $L \to L$ is represented by a map of event handlers to microfunctions in $L_h \to L_h$. Note that this representation of microfunctions and the required operations adds a factor of $|H|$ to the asymptotic complexity of the IDE algorithm.
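The version of $\mathit{EdgeFn}$ that supports multiple event handlers is therefore defined as:
$$\mathit{EdgeFn}(e) = \begin{cases} \mathit{register}_h & \text{if } e \text{ represents the registration of event handler } h \\ \mathit{emit}_h & \text{if } e \text{ represents the emission of the event handled by } h \\ \mathit{invoke}_h & \text{if } e \text{ represents the invocation of event handler } h \\ \mathit{id} & \text{otherwise} \end{cases}$$
We use the subscript $h$ to indicate that a microfunction updates only the state assigned to $h$, and not the state of any other event handler. (Note that the default microfunction, id, does not update any state.) In an implementation, EdgeFn must also be able to determine which handler is affected by each edge in the exploded supergraph.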
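A separable microfunction can be sketched as a map from handler names to small per-handler tables, with composition computed pointwise. The representation below (plain objects, a missing entry meaning identity) and the names `only`, `composeSeparable`, and `applyTo` are assumptions made for this illustration.

```javascript
// Sketch: separable microfunctions for several event handlers.
// A microfunction is stored as { handlerName: table }; a handler with no
// entry is left unchanged (identity). All names here are illustrative.
const STATES = ['X', 'S', 'R', 'E'];
const ID = { X: 'X', S: 'S', R: 'R', E: 'E' };
const TABLES = {
  register: { X: 'X', S: 'R', R: 'R', E: 'E' },
  emit:     { X: 'X', S: 'S', R: 'E', E: 'E' },
  invoke:   { X: 'X', S: 'X', R: 'X', E: 'E' },
};

// The microfunction op_h: applies `op` to handler h only.
function only(op, h) {
  return { [h]: TABLES[op] };
}

// Pointwise composition "g after f": one small table per handler.
function composeSeparable(g, f, handlers) {
  const out = {};
  for (const h of handlers) {
    const fh = f[h] || ID;
    const gh = g[h] || ID;
    out[h] = {};
    for (const s of STATES) out[h][s] = gh[fh[s]];
  }
  return out;
}

// Apply a microfunction to a map from handlers to states.
function applyTo(fn, state) {
  const out = {};
  for (const h of Object.keys(state)) out[h] = (fn[h] || ID)[state[h]];
  return out;
}

const H = ['hdlOpen', 'hdlClose'];
const initial = { hdlOpen: 'S', hdlClose: 'S' };
const openPath = composeSeparable(
  only('emit', 'hdlOpen'), only('register', 'hdlOpen'), H);
console.log(applyTo(openPath, initial)); // { hdlOpen: 'E', hdlClose: 'S' }
```

Each composition touches one four-entry table per handler, which is where the extra factor proportional to the number of handlers comes from.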
4.3 Transforming IDE results to IFDS results
When IDE finishes analyzing a program, its output is, for each program point, a map from elements of $D$ to elements of $L$. To convert this output to a result for the original IFDS problem, we must identify, at each program point, the subset of elements of $D$ that are reachable along feasible paths. In our context, a path is feasible if, for every event handler, the operations affecting that event handler along the path are in a feasible sequence (e.g., the handler is not invoked before it is registered or its event emitted). In other words, a path is feasible if the element of $L$ computed by the IDE analysis maps every handler to a state other than X. Formally, we define an “untransform” function that converts an IDE result to an IFDS result:
$$\mathit{untransform}(\rho)(n) = \{\, d \in D \mid \forall h \in H.\ \rho(n)(d)(h) \neq X \,\},$$
where $\rho(n)$ denotes the environment that the IDE analysis computes at node $n$.
In fig. 2, on the control-flow path that first passes through door.on('open', ...) and then through door.emit('open'), the microfunction for that path composes the register and emit microfunctions for hdlOpen, and therefore maps hdlOpen to state E. If that path then continues into hdlOpen, the state of hdlOpen will remain at E, and thus the analysis will conclude that the path is feasible.
However, if the path continues into hdlClose instead, the composed microfunction additionally includes the invoke microfunction for hdlClose, which maps hdlClose to state X because hdlClose is not in state E at that point. Since at least one handler is in state X, the analysis will conclude that this path is infeasible and discard all dataflow facts computed along this path.
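The filtering step can be illustrated with a short sketch. The nested-dictionary encoding of the IDE result, the program point names, the fact name x, and the state S for hdlClose in the first entry are all hypothetical and chosen only to mirror the two paths discussed above.

```python
def untransform(ide_result):
    """Keep, per program point, the facts whose handlers all avoid state X."""
    return {
        node: {fact for fact, states in facts.items()
               if all(state != "X" for state in states.values())}
        for node, facts in ide_result.items()
    }


# Hypothetical IDE output: program point -> dataflow fact -> handler -> state.
ide_result = {
    "in hdlOpen": {"x": {"hdlOpen": "E", "hdlClose": "S"}},
    "in hdlClose": {"x": {"hdlOpen": "E", "hdlClose": "X"}},
}
print(untransform(ide_result))
```

The fact reached inside hdlOpen survives, while the fact reached inside hdlClose is dropped because one handler is in state X.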
4.4 Theoretical results
Soundness and precision. Our transformation is sound: the IDE analysis considers all feasible dataflow paths, i.e., the ones that occur during a program execution. Any dataflow fact that IFDS computes along a concrete path will be returned by our technique.
Theorem 4.1 (Soundness)
Let an IFDS problem, a concrete execution path, and a dataflow fact be given. If IFDS computes the fact along the path, then the fact is included in the result of our transformation.
Our transformation is precise: the IDE analysis returns a subset of the dataflow facts that would be computed by IFDS. Dataflow facts computed along infeasible paths are not included in the result of our transformation.
Theorem 4.2 (Precision)
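Let an IFDS problem and any node in the supergraph be given. Then the set of dataflow facts that our transformation reports at the node is a subset of the set of dataflow facts that IFDS computes at that node.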
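Efficiency. As discussed by Reps et al. [12, sec. 5], the asymptotic complexity of solving an IFDS problem instance is $O(|E| \cdot |D|^3)$, where $E$ is the set of supergraph edges and $D$ is the set of dataflow facts. An equivalent IDE problem instance also requires $O(|E| \cdot |D|^3)$ time to solve, provided that the microfunctions have an efficient representation [13, def. 5.2]. Our representation of microfunctions adds a time and space overhead that is linear in the number of event handlers, which we write as $|H|$. Therefore, the asymptotic complexity of the event-driven IDE analysis is $O(|E| \cdot |D|^3 \cdot |H|)$.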
5 Implementation
To demonstrate the effectiveness of our technique on small-scale event-driven programs, we implemented a proof-of-concept called Borges, which analyzes a subset of JavaScript.
5.1 Uninitialized variables analysis as an IFDS problem
As input, Borges takes a list of JavaScript files to be analyzed (including a model of any library functions used) and an event model specification describing which function calls represent event handler registrations, event emissions, and event handler invocations. Borges transforms the IFDS problem into an IDE problem, solves the IDE problem, and filters out results that were computed by traversing infeasible paths.
Borges is implemented as a Scala application and builds on two program analysis infrastructures: TAJS [6] and Flix [9]. We use TAJS to construct control flow graphs and call graphs for JavaScript programs. Borges uses the control flow graph as the basis for constructing the supergraph that is used by IFDS and IDE, and the call graph to determine which functions are invoked from each call site. We use Flix to solve the IFDS and IDE problems; in particular, we implement the analyses in the Flix language and instantiate the uninitialized variables analysis by implementing the dataflow functions in Scala. In principle, however, Borges is applicable to any programming language and dataflow problem that can be expressed in the IFDS framework.
One challenge that we encountered involves the handling of arrays and objects. In JavaScript, arrays are list-like objects that may be non-contiguous, and object properties are accessed via string values that may be computed at run time, posing significant challenges to static analysis [15]. Since the challenge of precisely modeling objects and arrays is largely orthogonal to the issue of avoiding infeasible paths in the presence of event-handling constructs, we chose to adopt a simplistic approach where the abstract locations used to represent objects and arrays are unified with those representing their elements. In other words, if an object (array) is initialized, then so are all its properties (elements).
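One way to picture the unified abstract locations is the following sketch. The string encoding of access paths and the helper names are hypothetical, and the sketch is not taken from Borges; it only shows that a property or element shares the abstract location of its base object or array.

```python
def abstract_location(access_path):
    """Map an access path such as 'cfg.port' or 'files[0]' to its base name."""
    for index, char in enumerate(access_path):
        if char in ".[":
            return access_path[:index]
    return access_path


def initialize(initialized, access_path):
    """Record an initialization at the unified abstract location."""
    return initialized | {abstract_location(access_path)}


def is_initialized(initialized, access_path):
    """A property or element counts as initialized if its base location is."""
    return abstract_location(access_path) in initialized


known = initialize(frozenset(), "cfg")
print(is_initialized(known, "cfg.port"), is_initialized(known, "files[0]"))
```

The price of this simplification is precision: the sketch cannot distinguish an initialized property from an uninitialized one on the same object.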
5.2 Transforming to an IDE problem
In order to produce more precise results, Borges transforms IFDS problems into IDE problems that track the operations associated with each event handler, as well as each handler’s state. Information about which function calls correspond to which event handler operations must be provided to Borges as an event model specification, which also indicates the argument that represents the event name and the argument that represents the event handler. Using this information, Borges can identify which call sites involve event handler operations.
For example, the program in fig. 2 uses the Node.js events library. Applying static analysis to complex libraries poses challenges that are beyond the scope of this paper, and our approach to handling library-based applications is to provide a stub that models the library’s essential functions and control flow. In the stub for the events library, we provide the functions on, emit, and _eventDispatcher. The event model specifies that a call to on (e.g., on('open', hdlOpen)) registers the second argument (hdlOpen) as an event handler on the event given as the first argument (open), a call to emit (e.g., emit('open')) emits the event given as its argument (open), and a call from inside the library (specifically, from _eventDispatcher) invokes an event handler.
Using this information, along with the output from TAJS, Borges constructs a mapping of the event handler registrations that happen in a program. For each edge in the control flow graph, Borges can identify whether it affects event handler state (i.e., through a registration, event emission, or invocation), and if so, which event name and event handler are involved. Furthermore, Borges also computes a mapping from event names to event handlers, to easily identify which handlers respond to a given event emission.
The transformation from an IFDS problem to an IDE problem is straightforward. Recall that the IFDS algorithm uses an exploded supergraph to represent dataflow functions, while in the IDE algorithm, EdgeFn assigns a microfunction to exploded supergraph edges. Borges provides such an implementation of EdgeFn to determine the microfunction for a given edge and event handler. For instance, the edge representing a call to on('open', hdlOpen) is labeled with the register microfunction for the hdlOpen handler.
With all the exploded supergraph edges labeled, solving the IDE problem computes the composition of all the microfunctions along a control-flow path, taking the meet whenever multiple paths merge. In other words, when computing dataflow facts for the possibly uninitialized variables analysis, Borges also maintains the event handler states. Thus, before reporting a final result for each program point, Borges can examine the state of each event handler and filter out any result with an event handler in the infeasible state.
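The labeling described in this subsection can be sketched as follows. The dictionary format of the event model specification, the encoding of call sites as (callee, arguments) pairs, and the registration of hdlClose on a close event are assumptions made for illustration; only the function names on, emit, and _eventDispatcher come from the stub described above.

```python
# Hypothetical event model specification: callee name -> operation and the
# argument positions that hold the event name and the handler.
EVENT_MODEL = {
    "on": {"op": "register", "event_arg": 0, "handler_arg": 1},
    "emit": {"op": "emit", "event_arg": 0},
}
DISPATCHER = "_eventDispatcher"  # calls made from here invoke handlers


def handlers_by_event(call_sites):
    """Map each event name to the handlers registered for it."""
    mapping = {}
    for callee, args in call_sites:
        spec = EVENT_MODEL.get(callee)
        if spec is not None and spec["op"] == "register":
            event = args[spec["event_arg"]]
            mapping.setdefault(event, set()).add(args[spec["handler_arg"]])
    return mapping


def edge_operations(caller, callee, args, by_event):
    """Return the (operation, handler) pairs for one call edge; [] means id."""
    all_handlers = set().union(*by_event.values())
    if caller == DISPATCHER and callee in all_handlers:
        return [("invoke", callee)]
    spec = EVENT_MODEL.get(callee)
    if spec is None:
        return []
    event = args[spec["event_arg"]]
    if spec["op"] == "register":
        return [("register", args[spec["handler_arg"]])]
    return [("emit", handler) for handler in sorted(by_event.get(event, ()))]


call_sites = [("on", ["open", "hdlOpen"]),
              ("on", ["close", "hdlClose"]),
              ("emit", ["open"])]
by_event = handlers_by_event(call_sites)
print(edge_operations("main", "emit", ["open"], by_event))
```

An emission is expanded into one emit operation per handler registered for the event, which is the role of the mapping from event names to event handlers.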
6 Case Studies
In this section, we discuss three examples to demonstrate our approach: we return to the file system example from section 2 and briefly discuss two other programs. In each case, we run Borges on a small, event-driven Node.js application and apply our transformation to a possibly uninitialized variables analysis.
File system module, revisited. Recall fig. 1, where sum is read without being initialized, but only along an infeasible path. Borges can improve precision by considering the order in which callbacks are executed. Specifically, the calls to readdir and stat are registration operations for the f and h callbacks, respectively. However, the emission operation is implicit and happens from within the event loop. Since event emission happens after event handler registration but before event handler invocation, we model it as occurring immediately after registration. In other words, the microfunction labeling each of the calls to readdir and stat is the composition $\mathit{emit} \circ \mathit{register}$ for the corresponding callback. Finally, invocations of f and h are invocation operations, which correspond to the microfunction invoke.
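When Borges analyzes the application, it identifies two paths with respect to the callbacks. In one path, readdir is called, f is invoked, stat is called, and h is invoked. The composition of microfunctions along this path is $\mathit{invoke}_h \circ \mathit{emit}_h \circ \mathit{register}_h \circ \mathit{invoke}_f \circ \mathit{emit}_f \circ \mathit{register}_f$, which computes the event handler state mapping $[f \mapsto \mathrm{E}, h \mapsto \mathrm{E}]$, meaning the path is feasible.
However, in the infeasible path where readdir is called and then h is invoked, the composed microfunction is $\mathit{invoke}_h \circ \mathit{emit}_f \circ \mathit{register}_f$, which computes the event handler state mapping $[f \mapsto \mathrm{E}, h \mapsto \mathrm{X}]$, meaning the path is infeasible. Therefore, any results computed along this path are filtered out.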
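The two paths can be replayed with a small sketch. The transition rules follow the descriptions of register, emit, and invoke in the appendix; the encoding of a path as a list of (operation, handler) pairs and the assumption that every handler starts in state S are illustrative choices.

```python
def step(operation, state):
    """Effect of one event handler operation on one handler's state."""
    if operation == "register":
        return "R" if state == "S" else state
    if operation == "emit":
        return "E" if state == "R" else state
    if operation == "invoke":
        return state if state == "E" else "X"
    raise ValueError(operation)


def run(path, handlers):
    """Compose the operations along a path, starting with every handler in S."""
    env = {handler: "S" for handler in handlers}
    for operation, handler in path:
        env[handler] = step(operation, env[handler])
    return env


# readdir and stat are modeled as a registration followed by an emission.
feasible = [("register", "f"), ("emit", "f"), ("invoke", "f"),
            ("register", "h"), ("emit", "h"), ("invoke", "h")]
infeasible = [("register", "f"), ("emit", "f"), ("invoke", "h")]

print(run(feasible, ("f", "h")))
print(run(infeasible, ("f", "h")))
```

On the second path, h is invoked while still in state S, so it ends in state X and the facts computed along that path would be discarded.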
Timers module. Figure 6 implements a simple timer. It is similar to the file system example, as it has two callbacks that can be executed only in a certain order. The application prompts the user for a number and then counts down from that number in one-second intervals. It uses the timers module, whose functions are defined in the global scope.
Because the callbacks start and tick are invoked asynchronously, a traditional static analysis might consider an execution path where tick is executed before start, and conclude that rem is possibly uninitialized when it is read. However, this is an infeasible path: tick is only registered as a callback by start and by itself, so it can be invoked only after start has finished executing. As a result, Borges labels this execution path with a composed microfunction that invokes tick before tick has been registered, and computes an event handler state mapping in which tick is in state X.
Net module. The program in fig. 7 implements a small TCP server using the Node.js net module. It creates a server that listens for client connections and mirrors input back to the client. A corresponding client application could be implemented in JavaScript using the net module, or in any other language of choice.
Without an ordering constraint between the lstn and conn callbacks, a traditional analysis might consider infeasible paths, e.g., where conn is invoked before lstn. Along this path, the analysis concludes that nConn is possibly uninitialized. However, conn can be executed only after lstn finishes, which guarantees that nConn is initialized. In Borges, such a path would be labeled by a composed microfunction that invokes conn too early, which computes an event handler state mapping in which conn is in state X.
7 Related work
Bodden et al. [3] use the IDE algorithm to enhance the precision of an IFDS analysis when analyzing software product lines. They modify any IFDS analysis into an IDE analysis that runs on the original program and tracks the product line variants in which each dataflow fact holds.
Rapoport et al. [11] observe that context-sensitive analysis can be made more precise by correlating the dynamic dispatch behavior of different call sites on the same receiver object. They also transform an arbitrary IFDS analysis into an IDE analysis that keeps track of which methods have been dynamically dispatched on each receiver.
Jhala and Majumdar [7] adapt IFDS for asynchronous programs. In these programs, asynchronous calls are similar to event registrations in that the procedure will be invoked at a later time; however, there are no event emissions, so the time of invocation is unpredictable. In their approach, instead of encoding additional state as an IDE problem, they transform the analysis into a larger IFDS analysis that tracks, at each asynchronous call site, the number of pending asynchronous calls made for which the procedure has not yet been invoked.
Madsen et al. [8] introduce the event-based call graph, an extension of the call graph that models happens-before constraints between event handler registrations and event emissions. However, their approach does not scale well because the number of contexts is exponential in the size of the program.
Sotiropoulos et al. [14] introduce a model of asynchrony in JavaScript, as well as the callback graph, which describes the possible orderings of callback execution. They design a callback-sensitive analysis for JavaScript that uses the callback graph to respect the execution order of callbacks. Their technique is specific to JavaScript, while our approach is language-agnostic.
8 Conclusion
Traditional static analyses produce imprecise results when applied to event-driven programs because they assume that event handler callbacks can execute in any order. We have presented an approach for precise dataflow analysis that applies to any dataflow problem expressible as an instance of the IFDS framework. The approach is a transformation from the IFDS problem to an IDE problem in which the dataflow functions associated with edges in the graph filter out infeasible paths that arise from impossible sequences of event handler invocations. We prove the correctness of our transformation and report on a proof-of-concept tool.
References
 [1] Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., Le Traon, Y., Octeau, D., McDaniel, P.: FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps. In: Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI (2014). https://doi.org/10.1145/2594291.2594299
 [2] Bodden, E.: Heros IFDS/IDE Solver. https://github.com/Sable/heros, accessed: 2018-10-05
 [3] Bodden, E., Tolêdo, T., Ribeiro, M., Brabrand, C., Borba, P., Mezini, M.: SPLLIFT: Statically Analyzing Software Product Lines in Minutes Instead of Years. In: Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI (2013). https://doi.org/10.1145/2491956.2491976
 [4] Fink, S.J., Yahav, E., Dor, N., Ramalingam, G., Geay, E.: Effective Typestate Verification in the Presence of Aliasing. ACM Transactions on Software Engineering and Methodology, TOSEM 17(2), 9:1–9:34 (2008). https://doi.org/10.1145/1348250.1348255
 [5] IBM Research: Watson Libraries for Analysis (WALA). https://github.com/wala/WALA, accessed: 2018-10-05
 [6] Jensen, S.H., Møller, A., Thiemann, P.: Type Analysis for JavaScript. In: Proc. Static Analysis Symposium, SAS (2009). https://doi.org/10.1007/978-3-642-03237-0_17
 [7] Jhala, R., Majumdar, R.: Interprocedural Analysis of Asynchronous Programs. In: Proc. ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL (2007). https://doi.org/10.1145/1190216.1190266
 [8] Madsen, M., Tip, F., Lhoták, O.: Static Analysis of Event-Driven Node.js JavaScript Applications. In: Proc. ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA (2015). https://doi.org/10.1145/2814270.2814272
 [9] Madsen, M., Yee, M.H., Lhoták, O.: From Datalog to Flix: A Declarative Language for Fixed Points on Lattices. In: Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI (2016). https://doi.org/10.1145/2908080.2908096
 [10] Naeem, N.A., Lhoták, O.: Typestate-like Analysis of Multiple Interacting Objects. In: Proc. ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA (2008). https://doi.org/10.1145/1449764.1449792
 [11] Rapoport, M., Lhoták, O., Tip, F.: Precise Data Flow Analysis in the Presence of Correlated Method Calls. In: Proc. Symposium on Static Analysis, SAS (2015). https://doi.org/10.1007/978-3-662-48288-9_4
 [12] Reps, T., Horwitz, S., Sagiv, S.: Precise Interprocedural Dataflow Analysis via Graph Reachability. In: Proc. ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL (1995). https://doi.org/10.1145/199448.199462
 [13] Sagiv, S., Reps, T., Horwitz, S.: Precise Interprocedural Dataflow Analysis with Applications to Constant Propagation. In: Proc. Conference on Theory and Practice of Software Development, CAAP/FASE (1995). https://doi.org/10.1007/3-540-59293-8_226
 [14] Sotiropoulos, T., Livshits, B.: Static Analysis for Asynchronous JavaScript Programs. In: Proc. European Conference on ObjectOriented Programming, ECOOP (2019). https://doi.org/10.4230/LIPIcs.ECOOP.2019.8
 [15] Sridharan, M., Dolby, J., Chandra, S., Schäfer, M., Tip, F.: Correlation Tracking for Points-to Analysis of JavaScript. In: Proc. European Conference on Object-Oriented Programming, ECOOP (2012). https://doi.org/10.1007/978-3-642-31057-7_20
Appendix A Proofs
Our work is based on the work by Rapoport et al. [11], which also transforms a given IFDS problem instance to an IDE analysis that eliminates dataflow facts computed along infeasible paths.
In this section, we assume that and its exploded supergraph representation, , is the base IFDS problem instance given to our transformation, and that and its labeled exploded supergraph representation, , is the IDE problem instance defined in section 4; in particular, lattice is the map lattice where is the event handler state lattice. Finally, to simplify some notation, we write the edge for each . Note that .
A.1 Soundness and Precision
Recall that in the IDE definition, we used to denote the top element of the environment lattice, i.e., the environment that maps every element to . We also defined the meet-over-valid-paths solution for an IDE problem as . However, for the event-driven analysis, the initial state is rather than . Thus, the meet-over-valid-paths solution for the event-driven analysis is:
To prove the soundness and precision theorems, we require two lemmas.
Lemma 1
Let be a concrete execution trace of some program, and let be an event handler in the program. If at node of the trace , handler is in state , and is a dataflow fact such that , then .
Intuitively, the lemma states that the event-driven analysis over-approximates event handler state in a program execution. Note that the handler state taken from the trace is a concrete state, so it cannot be X.
Proof
By induction on the length of the program trace.
Base case: the empty trace. There is no instruction (edge) in the trace, so there is no dataflow fact. Therefore, the lemma trivially holds.
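Induction hypothesis: Consider a trace of some fixed length, together with the abstract state computed by the event-driven analysis for that trace, some dataflow fact in that abstract state, and some event handler. Suppose the lemma holds for this trace, i.e., the abstract state of the handler over-approximates the concrete state of the handler at the last node of the trace.
Induction step: Now consider the trace extended by one edge, and take the concrete state of the handler at the node reached by that edge. We must show that the abstract state computed for the extended trace over-approximates this concrete state.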
Because is extended from edges to paths by composition, we can rewrite:
Note that computes the environment at node after the
trace , which is then transformed by to get the
environment at node , a single node after the trace , which is a
map from . Thus,
returns a map from handlers to event handler states.
Now, recall that for a given environment , the IDE framework represents an environment transformer as a set of microfunctions in :
For an edge , gives the environment transformer for that edge, and for , gives the corresponding microfunctions:
By substitution, we can rewrite:
This gives us the inequality:
The inequality compares two different ways of computing the state of the handler for a dataflow fact at the node reached by the extended trace. On the right-hand side, the entire environment at the last node of the original trace is transformed by the environment transformer of the new edge, and then the state of the handler is obtained from the new environment. On the left-hand side, at the last node of the original trace, a map of event handlers to states (i.e., an element of the value lattice) is obtained for some dataflow fact and then updated by the corresponding microfunction, before getting the state mapped to the handler. The inequality states that the left-hand side is more precise than the right-hand side; intuitively, this is because the left-hand side takes the effect of a single microfunction, while the right-hand side takes the effect of merging all the microfunctions.
It remains to show to complete the proof. To simplify notation, let be the map of event handlers to states, as computed by the IDE algorithm along path for dataflow fact . Note that . We proceed by considering the four cases of EdgeFn and how the microfunctions update the map .
Case 1
The last edge of the trace registers the handler, so the microfunction is the register microfunction for that handler.
The microfunction for this edge updates the state for the handler: if the handler is in state S, then it will be in state R. Otherwise, the state is unchanged. The concrete state of the handler before the edge cannot be X, so there are three possibilities:

If the concrete state is S, then the edge registers the handler, so the new concrete state is R. By the induction hypothesis, the handler is mapped to S, R, or E before the edge. In each of those cases, the microfunction yields R, R, or E, respectively, which over-approximates R, so the lemma holds.

If the concrete state is R, then the event handler has already been registered, so the concrete state is unchanged and remains R. By the induction hypothesis, the handler is mapped to R or E before the edge. In both of those cases, the microfunction leaves the abstract state unchanged, so the lemma holds.

If the concrete state is E, then the event handler has already been registered (and its event has been emitted), so the concrete state is unchanged and remains E. By the induction hypothesis, the handler is mapped to E before the edge. In this case, the microfunction leaves the abstract state at E, and the lemma holds.
Case 2
The last edge of the trace emits an event for the handler, so the microfunction is the emit microfunction for that handler.
The microfunction for this edge updates the state for the handler: if the handler is in state R, then it will be in state E. Otherwise, the state is unchanged. The concrete state of the handler before the edge cannot be X, so there are three possibilities:

If the concrete state is S, then the event emission is ignored, so the concrete state remains S. By the induction hypothesis, the handler is mapped to S, R, or E before the edge. In each of those cases, the microfunction yields S, E, or E, respectively, which over-approximates S, so the lemma holds.

If the concrete state is R, then the handler can respond to the event, so the new concrete state is E. By the induction hypothesis, the handler is mapped to R or E before the edge. In both of those cases, the microfunction yields E, so the lemma holds.

If the concrete state is E, then the concrete state is unchanged and remains E. By the induction hypothesis, the handler is mapped to E before the edge. In this case, the microfunction leaves the abstract state at E, and the lemma holds.
Case 3
The last edge of the trace is an edge from the event loop to the handler, so the microfunction is the invoke microfunction for that handler.
The microfunction for this edge updates the state for the handler: if the handler is in state E, then the state is unchanged. Otherwise, the state will be X. The concrete state of the handler before the edge cannot be X, S, or R. X never occurs during a concrete execution. S is not possible because it means the event handler has not been registered, so invocation cannot occur. R is not possible because it means the event has not been emitted, so invocation cannot occur. Therefore, the concrete state is E. By the induction hypothesis, the handler is mapped to E before the edge. In this case, the microfunction leaves the abstract state at E and the concrete state also remains E, so the lemma holds.
Case 4
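The last edge of the trace is any other edge, so the microfunction is id.
The microfunction does not update the state of the handler. Similarly, in the concrete execution, there is no event handler operation on this edge, so the concrete state is unchanged. By the induction hypothesis, the abstract state over-approximates the concrete state before the edge, and neither state changes, so the lemma holds. ∎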
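The case analysis above is small enough to check exhaustively. The following sketch does so under an assumed reading of "over-approximates" taken from the cases themselves: a concrete S may be abstracted by S, R, or E, a concrete R by R or E, and a concrete E only by E. The function names and the PROGRESS ranking are hypothetical.

```python
# How far a handler has progressed; X is excluded because it never occurs
# in a concrete execution.
PROGRESS = {"S": 0, "R": 1, "E": 2}


def covers(abstract, concrete):
    """Assumed over-approximation relation read off the case analysis."""
    return abstract != "X" and PROGRESS[abstract] >= PROGRESS[concrete]


def step(operation, state):
    """Effect of one event handler operation on one handler's state."""
    if operation == "register":
        return "R" if state == "S" else state
    if operation == "emit":
        return "E" if state == "R" else state
    if operation == "invoke":
        return state if state == "E" else "X"
    raise ValueError(operation)


def check_lemma_cases():
    """Check that every operation preserves the relation; count the cases."""
    checked = 0
    for operation in ("register", "emit", "invoke"):
        for concrete in PROGRESS:
            # A concrete invocation is only possible in state E (Case 3).
            if operation == "invoke" and concrete != "E":
                continue
            for abstract in PROGRESS:
                if not covers(abstract, concrete):
                    continue
                assert covers(step(operation, abstract),
                              step(operation, concrete))
                checked += 1
    return checked


print(check_lemma_cases())
```

The id case is omitted because it changes neither state. A brute-force check of this kind is only a sanity check on the individual cases and does not replace the induction over traces.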
Lemma 2
Let be a concrete execution trace of some program, be an event handler, and be a dataflow fact. Then:
Intuitively, the lemma states that for a concrete execution path, the event-driven analysis never computes an infeasible event handler state.
Proof
Forward direction. By induction on the length of the program trace.
Base case: the empty trace. There is no instruction (edge) in the trace, so there is no dataflow fact. Therefore, the lemma trivially holds.
Induction hypothesis: Consider a trace of some fixed length, together with the abstract state computed by the event-driven analysis for that trace, some dataflow fact in that abstract state, and some event handler. Suppose the lemma holds for this trace, i.e., the handler is mapped to a state other than X.
Induction step: Now consider the trace extended by one edge. We must show that, after the extended trace, the handler is still mapped to a state other than X.
From the previous proof, we know:
By the induction hypothesis, for all , so
we know that
is a map where each handler is mapped to S, R, or
E. So we need to examine , the map after being updated by the
microfunction on edge .
Of the four cases, three of them (register, emit, and id) are straightforward. None of these microfunctions maps a handler from S, R, or E to X. Therefore, every handler is still mapped to a state other than X.
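The fourth case is when EdgeFn returns the invoke microfunction, which maps the invoked handler to X unless that handler is currently mapped to E. However, along the concrete execution trace, the last edge corresponds to an invocation of the event handler. This can only happen if the handler has already been registered and its event emitted. In other words, the concrete state of the handler must be E. By lemma 1, the abstract state over-approximates E, so it is also E, and the invoke microfunction leaves it at E. Therefore, the handler is not mapped to X.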
Reverse direction.
The premise states that after a concrete execution trace, at its last node and for the given dataflow fact, the handler is in a state other than X. In other words, there exists a path in the exploded supergraph to that node where the fact holds, so by definition, the fact is in the IFDS result for that node. ∎
We can now prove the soundness and precision theorems.
Theorem A.1 (Soundness)
Let an IFDS problem, a concrete execution path, and a dataflow fact be given. If IFDS computes the fact along the path, then the fact is included in the result of our transformation.