
Sub-Turing Islands in the Wild

by   Earl T. Barr, et al.

Recently, there has been growing debate as to whether or not static analysis can be truly sound. In spite of this concern, research on techniques seeking to at least partially answer undecidable questions has a long history. However, little attention has been given to the more empirical question of how often an exact solution might be given to a question despite the question being, at least in theory, undecidable. This paper investigates this issue by exploring sub-Turing islands: regions of code for which a question of interest is decidable. We define such islands and then consider how to identify them. We implemented Cook, a prototype for finding sub-Turing islands, and applied it to a corpus of 1100 Android applications, containing over 2 million methods. Results reveal that 55% of all methods are sub-Turing. Our results also provide empirical, scientific evidence for the scalability of sub-Turing island identification. Sub-Turing identification has many downstream applications, because islands are so amenable to static analysis. We illustrate two downstream uses of the analysis. In the first, we found that over 37% of the verification conditions associated with runtime exceptions fell within sub-Turing islands and thus are statically decidable. A second use of our analysis is during code review, where it provides guidance to developers. The sub-Turing islands from our study turn out to contain significantly fewer bugs than 'the swamp' (non sub-Turing methods). The greater bug density in the swamp is unsurprising; the fact that bugs remain prevalent in islands is, however, surprising: these are bugs whose repair can be fully automated.


1. Introduction

This paper seeks to answer the following fundamental question at the intersection of programming languages theory and empirical software engineering:

What portion of the code of a large corpus of real software systems lies in Sub-Turing islands; ‘islands’ of code that denote computation for which interesting program analysis questions are decidable?

We use the term ‘Turing Swamp’ to refer to any code that does not lie in such a Sub-Turing island. Of course, merely determining whether or not code lies within an island or in the swamp is, itself, undecidable. Therefore, our tool uses a simple conservative under-approximation of Sub-Turing islands (and corresponding over-approximation of the swamp).

Our Sub-Turing island identification algorithm, Cook, guarantees that the halting problem is decidable for any computation it identifies as lying within an island; as a result, Cook necessarily under-approximates the amount of code that lies within such islands. Even with this relatively simple under-approximation, we were able to determine that a large proportion of non-trivial production code (for Android) does indeed lie in island (not swamp) code. That is, for a corpus of 1100 Android applications, containing over 2 million methods, we found that 55% of the methods are sub-Turing.

Even if we remove the ‘long tail’ of simple methods (like getters and setters and methods with fewer than 30 bytecode instructions), we still find that 22% of all code lies in a Sub-Turing island. We then ask:

Since we find that at least one fifth of non-trivial real-world systems lies in a Sub-Turing island, what are some of the ramifications for programming languages and software engineering applications that rely on static analysis?

To investigate these implications, we conducted two empirical studies of the impact of sub-Turing islands. Even with our conservative under-approximation, we found that at least 37% of the verification conditions for runtime exceptions (e.g., array bounds and null pointer violations) lie within sub-Turing islands. Furthermore, for a dataset of ten open source applications, we found a statistically significant difference in bug density between islands and the swamp, with a large effect size.

These findings reveal a glimpse of the potential implications and applications of Sub-Turing analysis. In a single paper we cannot claim to have addressed more than the first few natural questions that occur when considering the approximate computation of the boundary between Sub-Turing islands and the swamp. Nevertheless, we believe that our results demonstrate that a surprisingly large portion of code does clearly lie within a Sub-Turing island and that there is practical merit in studying islands to inform and improve static analysis. Sub-Turing islands support fully automatic and precise symbolic reasoning; this reasoning might be exploited for bug repair and free humans to concentrate on problems occurring in the swamp.

There are many avenues for future work. We outline some of these and their relationship to existing trends of intellectual investigation in the programming languages and software engineering research communities. We hope that this paper will stimulate the further investigation of Sub-Turing analyses of software and real-world applications of these findings. Our paper seeks to motivate this research agenda with scientific evidence for the prevalence of Sub-Turing islands (within Android applications in this case) and the real-world impact and implications for bug density and verification.

Specifically, the contributions of this paper are the following.

  • We introduce and formalise the concept of sub-Turing island.

  • We provide an analysis for identifying sub-Turing islands and its implementation in the prototype tool Cook.

  • We reveal that Sub-Turing code is more prevalent in real-world systems than might be expected: as a conservative lower bound, at least one fifth of non-trivial Android app code is Sub-Turing.

  • We demonstrate that Sub-Turing island analysis has great potential for real-world application. Specifically, we report that 37% of the array bounds and null pointer verification conditions lie within islands, while islands enjoy lower bug density than the Turing swamp.

2. Sub-Turing Islands

This section first defines Sub-Turing Island where the definition is parameterized by a decision procedure. As an illustrative decision procedure, we consider terminating islands, Sub-Turing islands with a conservative decision procedure for halts versus may not halt. The bulk of this section then presents the syntax and semantics of Carib, a core language that facilitates the identification of Sub-Turing islands.

Definition 2.0 (Sub-Turing Island).

A region of code $r$ is sub-Turing with respect to a property $\phi$ if there exists a decision procedure $D$ that determines whether $\phi$ holds over all executions of $r$.

Under Rice’s theorem (rice53), Sub-Turing islands are only computable when the decision procedure $D$ approximates the property $\phi$ while being certain whenever it determines that $\phi$ holds; it cannot decide both $\phi$ and $\neg\phi$. For an arbitrary property, a suitable decision procedure may not exist, hence the existential quantification in the definition of sub-Turing. Finding a decision procedure relies on human ingenuity. The parameterisation of the definition on the decision procedure $D$ implies that sub-Turing islands are only defined with respect to a given decidable property. Different static analyses can safely approximate the islands with different levels of precision, thereby giving rise to different islands. However, if an approximation safely under-approximates the code that lies within an island, then it is safe to make ‘island-aware assertions and inferences’ within any island it reports.

We focus our investigation on a decision procedure for the halting problem. Our realisation, given in Section 3.2, soundly approximates this undecidable problem. Given this decision procedure, we frame the Terminating Islands Identification Problem in terms of states $\sigma \in \Sigma = L \to V_\bot$, where $L$ denotes the set of program l-values and $V_\bot = V \cup \{\bot\}$ denotes the lifted value domain, whose underlying set $V$ we leave otherwise unspecified. We say that a state $\sigma$ is divergence free if none of its l-values are mapped to $\bot$. Given a divergence-free state $\sigma$ and a region of code $r$, our goal is to determine if $\llbracket r \rrbracket_C\,\sigma$ is also divergence free; we formalize $\llbracket \cdot \rrbracket_C$, the semantics of Carib, in Section 2.1. From a practical perspective, we are interested in regions that represent meaningful code fragments such as a method or a loop body.

Definition 2.0 (The Terminating Island Identification Problem).

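Given a region of code $r$, the Terminating Island Identification Problem is to determine if

$$\forall \sigma \in \Sigma.\ \big(\forall l \in L.\ \sigma(l) \neq \bot\big) \Rightarrow \big(\forall l \in L.\ (\llbracket r \rrbracket_C\,\sigma)(l) \neq \bot\big),$$

that is, whether executing $r$ from any divergence-free state yields a divergence-free state.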

When this condition is satisfied, divergence can only result from the code that makes up $r$. In this sense, $r$ is independent of its context (the rest of the program). Since we consider only a decision procedure for termination in this paper, we use Sub-Turing Island to refer to a Terminating Island in what follows.

To illustrate the goal of our analysis, consider the three examples shown in Figure 1, where the region of code considered is a method. To begin with, method foo of Figure 1a calls method bar, which is clearly sub-Turing; thus it is not a source of divergence to its caller foo, which is also sub-Turing. In contrast, in Figure 1b, callee bar is not sub-Turing as it contains a loop whose termination cannot be guaranteed. As a result, its caller foo is also not sub-Turing. Finally, Figure 1c is similar to Figure 1b except that the source of divergence is a call to an API method, which may diverge. (A call to a recursive method would have the same effect.) The potentially divergent API call does not, however, relegate Line 4 to the swamp, despite its control dependence on the call to bar. This is because a Carib function always terminates, so the fact that Line 4 is termination-sensitive control dependent on Line 3 does not matter (sdetal:unifying; kaetal:acm-surveys).

(a)
 1  int foo(){
 2    int x = 5;
 3    x = bar();
 4    return x;
 5  }
 6
 7  int bar(){
 8    int y = 1;
 9    return y;
10  }

(b)
 1  int foo(){
 2    int x = 5;
 3    x = bar();
 4    return x;
 5  }
 6
 7  int bar(){
 8    int y = 0;
 9    while (c) {
10      y = y+1;
11    }
12    return y;
13  }

(c)
 1  int foo(){
 2    int x = 5;
 3    unused = bar();
 4    return x;
 5  }
 6
 7  int bar(){
 8    int y = 0;
 9    r = api();
10    if (r) {
11      y = y+1;
12    }
13    return y;
14  }

Figure 1. Three examples illustrating the outcome of our Sub-Turing analysis. Symbol ✔ indicates that the method is sub-Turing and ✘ indicates the opposite. In the example, c is a condition that cannot be proven to eventually become false, while api is a call to an unknown API method.

2.1. Carib: Its Syntax and Semantics

Figure 2. The grammar for Carib, our core Jimple-like language.

Figure 2 defines the grammar of Carib, our core language. Carib is a Jimple-like (Vallee-RaiGHLPS00) intermediate representation with a minimal set of instructions. Vallée-Rai et al. (Vallee-RaiGHLPS00) and Bartel et al. (BartelKTM12) have shown that Jimple can encode the entire instruction set of widely deployed virtual machines, such as the JVM and Dalvik. A Carib program is a set of methods derived from the nonterminal prog. Carib incorporates three simplifications that ease the presentation. First, instead of the usual syntax for method invocation, $o.m(a_1, \ldots, a_n)$, Carib uses the form $m(o, a_1, \ldots, a_n)$, with the receiver being the first argument. Second, Carib defines only the two structured control constructs while and if-else. Finally, to simplify reasoning about side-effects, Carib restricts pointer dereferences to its assignment statement, where the dereference operator can appear on either the LHS or the RHS, but not both. Further, the call syntax only permits an identifier as an actual parameter, again ruling out pointer dereferences. These properties simplify reasoning about aliasing in Carib.

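Carib’s semantics, $\llbracket \cdot \rrbracket_C$, extends a conventional semantics, $\llbracket \cdot \rrbracket$, such as Winskel’s IMP (Winskel), where $\llbracket s \rrbracket$ is a partial function from $\Sigma$ to $\Sigma$ that updates the state to reflect the execution of $s$. For a statement $s$ derived from the grammar (Figure 2), $\llbracket s \rrbracket_C$ is identical to the conventional semantics, $\llbracket s \rrbracket$, when $s$ terminates. When $s$ does not terminate, $\llbracket s \rrbracket_C$ reifies the nontermination by binding $\bot$ to each variable modified by $s$.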

To reify nontermination, the semantics must first identify it. In Carib there are three potential sources: loops, recursive method calls, and (unknown) API calls. The semantics uses three oracles in the identification: the writable location oracle $W$, the termination oracle $T$, and the set of divergent API methods $A$ (Section 3 discusses our computable approximations to these three). For example, the termination oracle $T$ is used to identify nonterminating loops as well as recursive methods that may diverge. Figure 3 defines these oracles and other symbols and functions used to define Carib’s semantics.

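Finally, we formalise the notions of state and state update as used in the Carib semantics. As a convenience, we assign a unique name to each local variable and formal parameter, and then simply refer to only those names that are in scope. We use $L$ to denote the set of all program l-values (in Figure 2, these include identifiers and array/structure references). An l-value $l \in L$ denotes a memory location that holds a value from the lifted value domain, $V_\bot = V \cup \{\bot\}$, where $\bot$ denotes divergence; we leave $V$ otherwise unspecified. A program state $\sigma$ maps each l-value $l$ to its value $\sigma(l) \in V_\bot$. $\Sigma$ denotes the (possibly infinite) set of all program states. We write $\sigma(l)$ to denote the value that $\sigma$ maps $l$ to and $\sigma[l \mapsto v]$ to denote the updated state $\sigma'$ where $\sigma'(l) = v$ and $\sigma'(l') = \sigma(l')$ for all $l' \neq l$. As a notational convenience, we write $\sigma[X \mapsto v]$ for variable set $X = \{x_1, \ldots, x_n\}$ to denote $\sigma[x_1 \mapsto v] \cdots [x_n \mapsto v]$ and $\sigma[\vec{x} \mapsto \vec{v}]$ as shorthand for $\sigma[x_1 \mapsto v_1] \cdots [x_n \mapsto v_n]$, with $\vec{x} = x_1, \ldots, x_n$ and $\vec{v} = v_1, \ldots, v_n$. Finally, we write to denote .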

Carib’s semantics must account for two potentially divergent constructs, while and call. If a loop does not terminate, our semantics effectively replaces the loop with a parallel assignment of $\bot$ to all the l-values that the loop may modify. We handle recursive calls in the same way. Other constructs in Carib may propagate $\bot$, but will not introduce it. In essence, our semantics is a collecting semantics based on taint analysis where while and call are the only taint sources. We say that a program point is in the swamp if $\bot$ reaches it; otherwise the point is in a sub-Turing island. Finally, we emphasise that, under its semantics, Carib methods always terminate.

$W(s)$    set of all l-values potentially written by the execution of $s$
$T(s)$    the termination oracle used to check for nontermination of $s$
$A$       the set of (assumed divergent) API methods
$R$       the set of recursive methods $T$ deems divergent
$fv(e)$   the set of free variables in expression $e$
$ret$     a fresh pseudo-variable used to hold a method’s return value
Figure 3. Symbols and functions used to define Carib’s semantics in Figure 4; the first three are parameters.

[Semantic equations for the rules sequence, assign, ife, while, and call]

Figure 4. Carib’s denotational semantics: while and call can introduce divergence ($\bot$); the other equations only propagate it.

Figure 4 presents Carib’s semantics. The rule for while leverages the termination oracle $T$ to determine if a loop terminates. For a loop that may not terminate, the first line of the while rule binds $\bot$ to each l-value potentially written during the execution of the loop. The externally supplied function $W$ identifies these l-values. Section 3 describes our conservative realisation of $T$ as well as our conservative determination of the set of written l-values.

The second source of non-termination is calls to recursive methods and API methods, denoted $R$ and $A$ in Figure 4. For the call rule, the first case binds $\bot$ to all l-values that the called method’s execution potentially updates, together with the variable receiving the method’s return value. In call’s second case, conventional semantics apply: we evaluate the statements of the called method on the state formed by binding the method’s formals to the actuals found in the call. Other than the return value, there is no need to map information back to the caller because Carib uses call-by-value semantics. In the case of objects (and arrays), Carib passes a copy of the reference to the object to the callee, thereby allowing the called method to update (only) the members of the class (or array) associated with the actual. To handle return values, we store the value in a fresh pseudo-variable and then update the final state to bind the receiving variable to this value.

Finally, the ife and assign rules can only propagate $\bot$; they do not introduce it. For assign, if $\bot$ reaches any variable in the right-hand-side expression, assign binds $\bot$ to the assigned l-value; otherwise, the conventional semantics is used. When $\bot$ reaches an if statement’s conditional expression, the ife rule assigns $\bot$ to all l-values potentially written by either branch of the if statement; otherwise, it applies the semantics to the appropriate branch.
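The propagation behaviour described above can be sketched in a few lines of Java. This is only an illustrative model, not Cook's implementation: the class name `CaribSemanticsSketch`, the `BOTTOM` sentinel, and the representation of states as maps and statements as functions are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical model: a state maps l-value names to values; BOTTOM marks divergence.
final class CaribSemanticsSketch {
    static final Object BOTTOM = new Object();

    // True if bottom reaches any of the given variables in state s.
    static boolean reaches(Map<String, Object> s, Set<String> vars) {
        for (String v : vars) {
            if (s.get(v) == BOTTOM) return true;
        }
        return false;
    }

    // Copy of s with every l-value in targets bound to bottom (a parallel assignment).
    static Map<String, Object> bindBottom(Map<String, Object> s, Set<String> targets) {
        Map<String, Object> out = new HashMap<>(s);
        for (String l : targets) out.put(l, BOTTOM);
        return out;
    }

    // assign: l := e, where fv holds the free variables of e. Only propagates bottom.
    static Map<String, Object> assign(Map<String, Object> s, String l, Set<String> fv,
                                      Function<Map<String, Object>, Object> e) {
        Map<String, Object> out = new HashMap<>(s);
        out.put(l, reaches(s, fv) ? BOTTOM : e.apply(s));
        return out;
    }

    // ife: a tainted condition taints everything either branch may write.
    static Map<String, Object> ifElse(Map<String, Object> s, Set<String> fvCond,
                                      Predicate<Map<String, Object>> cond, Set<String> written,
                                      Function<Map<String, Object>, Map<String, Object>> thenS,
                                      Function<Map<String, Object>, Map<String, Object>> elseS) {
        if (reaches(s, fvCond)) return bindBottom(s, written);
        return cond.test(s) ? thenS.apply(s) : elseS.apply(s);
    }

    // while: introduces bottom when the termination oracle cannot show termination.
    static Map<String, Object> whileLoop(Map<String, Object> s, boolean oracleSaysTerminates,
                                         Set<String> written,
                                         Function<Map<String, Object>, Map<String, Object>> conventional) {
        return oracleSaysTerminates ? conventional.apply(s) : bindBottom(s, written);
    }
}
```

In this model a loop the oracle cannot prove terminating taints `y`, and a later assignment `x := y + 1` then taints `x` as well, which mirrors the distinction between rules that introduce divergence and rules that only propagate it.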

3. Realising Oracles and Translating to Carib

To analyse industrial programs, we must translate them into Carib and instantiate Carib’s three oracles: its writable location oracle $W$, its termination oracle $T$, and its externally defined set of divergent API methods $A$. To handle constructs Carib does not define, we translate to Carib in the obvious way, conceptually de-sugaring them. For $A$, we assume a user-supplied list. Below, we describe how we realize $W$ and $T$ for Android bytecode. Realising these oracles is not enough. To apply the Cook analysis to actual programs, we also need to change the semantics of their potentially divergent constructs, like loops and method calls. We achieve this via a program transformation that replaces each potentially divergent construct with parallel assignments of $\bot$ to the l-values that the construct may write.

3.1. Soundly Identifying Potential Writes

Realising $W$ requires finding writable locations, both syntactic l-values and what can be reached through them. We currently harvest syntactic l-values from assignment statements without considering their feasibility. Computing reachable l-values would require handling aliasing, which occurs when two l-values refer to the same object. To account for the possibility of aliasing in the analysis, we use l-value representatives, where two l-values are aliases if they have the same representative.

We want our Cook analysis (Section 4) to scale to large applications, so we need a sound and efficient handling of aliases. Instead of performing pointer analysis (Section 6.4), we use Sundaresan et al.’s approach (SundaresanHRVLGG00) as it offers a simple and scalable solution. Their approach is based on the observation that only objects of compatible types can be aliases in a type-safe programming language, such as Java. Another advantage of this approach is its simplicity and ease of implementation. Sundaresan et al. use a flow-insensitive analysis because their main purpose is to keep track of object types. In our case, we need to track value transfer between variables. Therefore, we take inter-variable flow relations into account. For example, in the code x := y; y := z, a flow-insensitive analysis captures data-flow from y to x and z to y and the spurious flow from z to x. Our approach does not include the spurious flow. Despite this additional precision, we still benefit, in terms of scalability, from Sundaresan et al.’s alias handling.

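Let $\hat{L}$ denote $L$ augmented with abstracted locations created from the types of the subject program. Formally, we define $\mathit{rep} : L \to \hat{L}$ to map each l-value to its representative:

$$\mathit{rep}(l) = \begin{cases} x & \text{if } l = x \\ \mathit{top}(C, f).f & \text{if } l = o.f \\ \mathit{part}(a) & \text{if } l = a[i] \end{cases}$$

where $o$ is an object of type $C$, $f$ is a field, $a$ is an array, $i$ is an index, and $\mathit{top}(C, f)$ denotes the highest class in the type hierarchy of $C$ that contains the field $f$.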

In Carib (Section 2), an l-value is either a variable, an array access, or a field access. In the absence of an array or field dereference, the mapping is simply $\mathit{rep}(x) = x$. The other two cases, a field or array access, are more involved. Two field accesses $o_1.f$ and $o_2.f$ are aliases if $o_1$ and $o_2$ point to the same object. To handle this case, all potentially aliasing field accesses must have the same representative. In type-safe languages, like Java, $o_1.f$ and $o_2.f$ can be aliases only if $o_1$ and $o_2$ belong to the same type hierarchy. While in principle, if $o_1.f$ and $o_2.f$ are aliases, then either suffices as the representative, for ease of identification we include in $\mathit{rep}$’s range representatives based on type names, specifically $\mathit{top}(C, f).f$.

An array access l-value $a[i]$ can alias for two reasons: reference and index. In the reference case, $a[i]$ and $b[i]$ alias if $a$ and $b$ alias. Alternatively, $a[i]$ and $a[j]$ alias if $i = j$. To take both into account, we perform a lightweight alias analysis that partitions array terms into parts of potential aliases. Given an array $a$, $\mathit{part}(a)$ returns the representative of $a$’s alias part. Defining the representative of $a[i]$ as $\mathit{part}(a)$ solves the problem of reference-induced aliasing, but not index-induced aliasing. Tracking indexes may generate an unbounded number of terms when indices are modified inside a loop. Therefore, $\mathit{rep}$’s third case conservatively assumes all indices alias.
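A minimal sketch of the three representative cases follows. The class `RepresentativeSketch`, its method names, and the map-based encoding of the class hierarchy are hypothetical and chosen only for illustration; the `[*]` suffix is merely one way to express that all indices of an array are assumed to alias.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical encoding: superOf maps a class to its direct superclass,
// declares maps a class to the fields it declares.
final class RepresentativeSketch {
    // Highest class in the hierarchy of cls that declares the given field.
    static String top(String cls, String field, Map<String, String> superOf,
                      Map<String, Set<String>> declares) {
        String best = cls;
        for (String c = cls; c != null; c = superOf.get(c)) {
            if (declares.getOrDefault(c, Set.of()).contains(field)) best = c;
        }
        return best;
    }

    // Case 1: a plain variable is its own representative.
    static String repVariable(String x) {
        return x;
    }

    // Case 2: a field access is represented by a type-based name, so that
    // accesses through any two possibly aliasing receivers coincide.
    static String repField(String receiverType, String field, Map<String, String> superOf,
                           Map<String, Set<String>> declares) {
        return top(receiverType, field, superOf, declares) + "." + field;
    }

    // Case 3: an array access is represented by its alias part; indices are ignored.
    static String repArray(String aliasPartRepresentative) {
        return aliasPartRepresentative + "[*]";
    }
}
```

With a class `B` extending `A`, where `A` declares `f`, an access to `f` through a `B`-typed receiver and one through an `A`-typed receiver both map to `A.f`, so a write through either is treated as a write to the same location.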

Using $\mathit{rep}$ and $\mathit{lvals}(s)$, the set of all syntactic l-values in $s$, we approximate $W(s)$ as

$$W(s) \approx \{ \mathit{rep}(l) \mid l \in \mathit{lvals}(s) \}.$$

This realisation of $W$ assumes that the analysis can access $s$’s internals. This assumption does not hold, in general, for API calls whose implementation can be externally defined in a black box library. Such API calls are prevalent in real world code. To handle them, we use a second instantiation of $W$ that we describe in Section 3.3, where we first use it.

3.2. Identifying Divergent Loops and Calls

Realising Carib’s semantics for real bytecode demands that we first identify loops. Since bytecode permits unstructured loops, we implemented a loop detection analysis that searches for loops in the control flow graph of each method; this analysis was experimentally shown to outperform existing alternatives (WeiMZC07). It is an optimised depth-first traversal of the control flow graph that detects both simple and complex loops, including nested loops and those constructed using gotos. A purely syntactic method, this analysis is complete (finds all loops), but unsound, in that it may report infeasible loops.

Having identified loops, we turn to realizing the termination oracle $T$. Despite the undecidability of loop termination in general, we can statically determine that some loop forms terminate. While simple, our oracle realisation is not purely syntactic: it simplifies a loop before checking its form. Consider a loop that does not contain any nested loops, but otherwise has an arbitrary body. The execution of the loop can be expressed as a sequence of single-iteration cycles. Our oracle concludes that the loop terminates iff both of the following hold: 1) each cycle increments a counter and 2) this counter is bounded in each cycle. These two conditions guarantee the existence of an increasing ranking function that is bounded from above, which is sufficient to ensure loop termination (PodelskiR04). This instantiation of $T$ safely and conservatively under-approximates the set of terminating loops.

Our corpus of Android apps includes 627,423 loops. Of these, our oracle shows that 330,894 (53%) terminate. Despite its simplicity and conservatism, our oracle identifies a large number of terminating loops.
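The two conditions checked by the oracle can be illustrated with the sketch below. It assumes that an earlier step has already summarised each single-iteration cycle of a loop; the `Cycle` summary type and its two fields are hypothetical and do not come from Cook.

```java
import java.util.List;

// Hypothetical model of the termination check over pre-computed cycle summaries.
final class TerminationOracleSketch {
    // Summary of one single-iteration cycle through a loop without nested loops.
    static final class Cycle {
        final int counterDelta;      // net change applied to the candidate counter
        final boolean upperBounded;  // true if the cycle is guarded by an upper bound on the counter

        Cycle(int counterDelta, boolean upperBounded) {
            this.counterDelta = counterDelta;
            this.upperBounded = upperBounded;
        }
    }

    // Reports termination only when every cycle strictly increments the counter
    // and the counter is bounded from above in every cycle. Any other loop is
    // conservatively treated as possibly divergent.
    static boolean terminates(List<Cycle> cycles) {
        if (cycles.isEmpty()) return false; // nothing known about the loop
        for (Cycle c : cycles) {
            if (c.counterDelta <= 0 || !c.upperBounded) return false;
        }
        return true;
    }
}
```

A `false` answer here means "not shown to terminate" rather than "diverges", which is what makes the resulting set of islands an under-approximation.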

In Section 2.1, we use the termination oracle $T$ to populate $R$, the set of divergent recursive methods. Our realisation of $T$ works only on loops, which necessitates a separate mechanism for $R$. To this end, we conservatively build a call graph for the program using the class hierarchy approach (SundaresanHRVLGG00), which provides a conservative approximation of the runtime types of receiver objects. We then identify recursive methods as those nodes belonging to non-trivial strongly connected components of the call graph, that is, components that contain a cycle. We use Tarjan’s algorithm for detecting strongly connected components (Tarjan72).
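As an illustration of this step, the sketch below runs Tarjan's algorithm over a call graph given as an adjacency map and reports the methods that lie on a call cycle. The class name and the map encoding are assumptions made for this example, and the recursive formulation is used for brevity; a production analysis over large call graphs would likely need an iterative variant to avoid deep recursion.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: callGraph maps each method name to the methods it calls.
final class RecursionSketch {
    private final Map<String, List<String>> callGraph;
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> low = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final Set<String> onStack = new HashSet<>();
    private final Set<String> recursive = new TreeSet<>();
    private int counter = 0;

    private RecursionSketch(Map<String, List<String>> callGraph) {
        this.callGraph = callGraph;
    }

    // Returns the methods in a strongly connected component that contains a cycle.
    static Set<String> recursiveMethods(Map<String, List<String>> callGraph) {
        RecursionSketch t = new RecursionSketch(callGraph);
        for (String m : callGraph.keySet()) {
            if (!t.index.containsKey(m)) t.visit(m);
        }
        return t.recursive;
    }

    private void visit(String m) {
        index.put(m, counter);
        low.put(m, counter);
        counter++;
        stack.push(m);
        onStack.add(m);
        for (String callee : callGraph.getOrDefault(m, List.of())) {
            if (!index.containsKey(callee)) {
                visit(callee);
                low.put(m, Math.min(low.get(m), low.get(callee)));
            } else if (onStack.contains(callee)) {
                low.put(m, Math.min(low.get(m), index.get(callee)));
            }
        }
        if (low.get(m).equals(index.get(m))) {
            // m is the root of a component: pop the component off the stack.
            List<String> scc = new ArrayList<>();
            String n;
            do {
                n = stack.pop();
                onStack.remove(n);
                scc.add(n);
            } while (!n.equals(m));
            boolean selfCall = callGraph.getOrDefault(m, List.of()).contains(m);
            // A single node without a self-call is a trivial component, not recursion.
            if (scc.size() > 1 || selfCall) recursive.addAll(scc);
        }
    }
}
```

The final check matters because every node belongs to some strongly connected component; only components with more than one node, or a single node that calls itself, indicate recursion.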

3.3. Rewriting Divergent Constructs

Figure 5. The program transformations used to convert divergent constructs into parallel assignments of $\bot$; the parameter list used in REC is the called method’s formals.

To track divergence in a program, we identify divergent constructs and replace them with assignments of $\bot$ to every l-value representative potentially modified. We do this as a source-to-source transformation that maps each statement to a statement that explicitly includes all necessary assignments of $\bot$. As Figure 5 depicts, we define the transformation using three rewriting rules: loop, rec, and api. The loop rule replaces the loop with a parallel assignment of $\bot$ to all l-value representatives that the loop modifies. It uses $W$ to identify these l-values. The rec rule does the same for calls to recursive methods.

Handling loops and recursion makes us complete with respect to our language Carib (Section 2), assuming that a program is self-contained (i.e. does not make API calls). Most programs, however, make API calls to external libraries. Because an external library is a black box, we cannot syntactically determine which of its parameters it may write through. Thus, we need a second instantiation of $W$ to handle API calls. This second instantiation determines all the l-value representatives potentially modified using a given formal parameter. For arrays or objects, however, we must consider their fields. Let ‘fields’ be the set of fields of a variable, where the empty set denotes a scalar. Given the formal parameter x, its reachable l-values are the set of l-value representatives given by the function defined as follows

This reachability function is irreflexive because, under Carib’s call-by-value semantics, actual parameters are immutable. In Figure 5, the api rule uses it to handle API calls: the set of l-values assigned $\bot$ includes all l-value representatives reachable from the call’s actual parameters.

As an illustration of this function, consider the code shown in Figure 6. To simplify the presentation, the code examples in the paper use a more common Java-like syntax. The actual analysis is applied to bytecode, which is closer to Carib’s Jimple-like syntax of Figure 2; however, the more Java-like syntax better communicates the intuition behind our technique. In Figure 6, method m has a single formal parameter a and accesses its b field and subsequently b’s field x through the variable tmp. Thus, the l-values reachable from a are a.b and tmp.x, which, assuming A has no super classes, yields {A.b, A.B.x}.

class A { B b; }
class B { int x; }
void m(A a) {
   a.b = ...;
   B tmp = a.b;
   tmp.x = ...;
}
Figure 6. A Java example illustrating the l-values reachable from the formal parameter a.

Our transformation applies the rules loop, rec, and api. We naturally extend it to an entire program, where each divergent construct in the original program is replaced with its rewritten form in the transformed program.

4. Cook: Discovering Sub-Turing Islands

This section introduces our analysis algorithm Cook, named after the British explorer Captain Cook. Starting from the basic knowledge that divergent constructs are clearly part of the swamp, we want to analyze their impact on other parts of the program. Let us write $\ell{:}x$ to refer to variable x at program location $\ell$ (e.g., at a given line number). We also write $\mathit{depend}(\ell_2{:}y, \ell_1{:}x, s)$ to indicate that in the scope of statement $s$, variable y at location $\ell_2$ depends on variable x at location $\ell_1$. In other words, modifying x at $\ell_1$ may modify y at $\ell_2$. We over-approximate Carib’s semantics (Figure 4) with respect to divergence propagation via the following rule:

$$\frac{\ell_1{:}x = \bot \qquad \mathit{depend}(\ell_2{:}y,\ \ell_1{:}x,\ s)}{\ell_2{:}y = \bot}\ \text{(divergence propagation)}$$

This suggests an approach for identifying sub-Turing islands by applying a dependency analysis whose goal is to assign $\bot$ to variables that depend on other divergence-affected variables. Broadly speaking, our analysis approximates the dependency relation induced by the program over variables, yielding an over-approximation of the swamp. Methods not in the swamp make up the sub-Turing islands, which we thus conservatively under-approximate.
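The divergence propagation rule can be read as a closure computation over a dependence relation. The sketch below assumes the dependence relation has already been computed and is given as a map from a located variable (written here as a string such as `l2:y`) to the located variables it depends on; this encoding and the class name are illustrative assumptions, not Cook's data structures.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical fix-point: repeatedly apply the propagation rule until nothing changes.
final class DivergencePropagationSketch {
    // dependsOn.get("l2:y") contains "l1:x" when y at l2 depends on x at l1.
    static Set<String> propagate(Set<String> initiallyBottom, Map<String, Set<String>> dependsOn) {
        Set<String> bottom = new HashSet<>(initiallyBottom);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
                if (bottom.contains(e.getKey())) continue; // already diverged
                for (String source : e.getValue()) {
                    if (bottom.contains(source)) {
                        bottom.add(e.getKey()); // rule fires: the dependent variable becomes bottom
                        changed = true;
                        break;
                    }
                }
            }
        }
        return bottom;
    }
}
```

Because the set of diverged variables only grows and is bounded by the number of located variables, the loop reaches a fix-point; everything outside the returned set is untouched by divergence under the given dependence relation.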

Figure 7 overviews Cook’s workflow and components. Cook takes as input a transformed program whose divergent constructs have been rewritten. Cook outputs a report indicating which methods are sub-Turing and which fall into the Turing swamp.

Figure 7. Cook’s workflow and main components: check marks sub-Turing island methods; cross marks methods in the Turing swamp.

While a sub-Turing island can be any code region, the islands we consider in the remainder of the paper are methods. Cook implements a bottom-up inter-procedural dependency analysis. It consists of two fix-point computations. The outer computation, Explore (Algorithm 1), operates over the whole program and calls the inner computation Landfall (Algorithm 2), to compute facts for methods. In what follows, we describe each algorithm in detail.

4.1. Explore: Interprocedurally Searching for Sub-Turing Islands

Starting from the transformed program, Cook is an inter-procedural taint analysis that propagates divergence. Cook assigns a method to the swamp if it uses a tainted variable when called in a divergence-free state. Thus, Cook considers only taints produced by the method or a method it transitively calls.

In sub-Turing analysis of termination, nested loops (and recursion) can propagate bottom outwards but enclosing loops cannot propagate it inwards. If non-termination were defined to propagate inwards, the analysis of islands would often become trivial and useless. For example, a loop-free reactive program, encased in a single non-terminating loop, would often simply become ‘all swamp’. That would not be helpful for analysis: the body is loop free and so always terminates. It can be analysed as a terminating island of code, in isolation from its surrounding loop.

In such a reactive system, figuratively speaking, the program is a single large ‘castle’ on an island surrounded by a ‘moat’ of swamp. Such a ‘swamp castle’ does not, itself, fall into the swamp. Pragmatically, this means that we could (and we argue, should) analyse and reason about the body of such a reactive system (which is loop free) in a very different way to the way in which we would reason about it as a whole component in a larger system. However, for our Cook analysis, the fact that taints do not propagate from the calling context means we cannot use an off-the-shelf solution (Flowdroid).

Cook’s output is the set of sub-Turing methods. Cook is inter-procedural and needs the program’s call graph. Object-oriented languages, in general, have many features, such as method overriding, that make constructing an exact call graph at compile time impossible. Thus, Cook over-approximates the call graph using a class hierarchy approach (SundaresanHRVLGG00) that conservatively approximates the runtime types of receiver objects. For an object having a declared type $C$, its estimated types will be $C$ plus all the subclasses of $C$. If $C$ is an interface, then its estimated types are all the classes implementing it and the classes derived from them. We use the notation $\preceq$ to represent the inheritance relation between classes (types); $C' \preceq C$ means that $C'$ is a subclass of $C$. This relation is reflexive, thus $C \preceq C$. Given an object $o$, the function $\mathit{types}(o)$ returns all the types that $o$ can potentially have at runtime. If the declared type of $o$ is the class $C$ then we have

$$\mathit{types}(o) = \{ C' \mid C' \preceq C \}.$$

Let function $\mathit{impl}(I)$ return all classes implementing interface $I$, including the implementations of subinterfaces of $I$. If the declared type of an object $o$ is an interface $I$, then we have

$$\mathit{types}(o) = \{ C' \mid \exists C \in \mathit{impl}(I).\ C' \preceq C \}.$$

This means that we take into account all the classes implementing $I$, the ones implementing subinterfaces of $I$, and their subclasses. For a method invocation $o.m(\ldots)$, the possible resolutions of the virtual method $m$ at runtime are given by

$$\mathit{resolve}(o.m) = \{ C.m \mid C \in \mathit{types}(o) \wedge m \in C \}.$$

We use a class name as a prefix to distinguish different virtual methods. We write $m \in C$ to indicate that method $m$ is defined in class $C$ and use $s \in m$ to stipulate that statement $s$ appears in method $m$. Finally, the call graph of a program $P$ is given by

$$\mathit{CG}(P) = \{ (C.m,\ C'.m') \mid C \in P \wedge m \in C \wedge s \in m \wedge s \text{ invokes } o.m' \wedge C'.m' \in \mathit{resolve}(o.m') \}.$$

By $C \in P$, we mean that the class $C$ is defined in the program $P$. Hence the call graph represents the set of all possible pairs of (caller, callee) belonging to the given program.
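The class hierarchy approximation described above can be sketched as follows. The hierarchy is assumed to be given as maps from a type to its direct subclasses, direct subinterfaces, and direct implementers; these maps, the class `ChaSketch`, and its method names are hypothetical, and `resolve` only considers classes that themselves define the method, as in the text.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class hierarchy analysis over a map-encoded type hierarchy.
final class ChaSketch {
    // Reflexive, transitive closure of the "direct" relation starting at root.
    static Set<String> subtypes(String root, Map<String, List<String>> direct) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String t = todo.pop();
            if (seen.add(t)) {
                todo.addAll(direct.getOrDefault(t, List.of()));
            }
        }
        return seen;
    }

    // Estimated runtime types for a receiver with the given declared type.
    static Set<String> types(String declared, Set<String> interfaces,
                             Map<String, List<String>> subclasses,
                             Map<String, List<String>> subinterfaces,
                             Map<String, List<String>> implementers) {
        if (!interfaces.contains(declared)) {
            return subtypes(declared, subclasses); // the class plus all its subclasses
        }
        Set<String> out = new TreeSet<>();
        for (String iface : subtypes(declared, subinterfaces)) {
            for (String cls : implementers.getOrDefault(iface, List.of())) {
                out.addAll(subtypes(cls, subclasses)); // implementers and their subclasses
            }
        }
        return out;
    }

    // Possible targets of a virtual call, prefixed by the defining class name.
    static Set<String> resolve(Set<String> receiverTypes, String method,
                               Map<String, Set<String>> definedIn) {
        Set<String> targets = new TreeSet<>();
        for (String cls : receiverTypes) {
            if (definedIn.getOrDefault(cls, Set.of()).contains(method)) {
                targets.add(cls + "." + method);
            }
        }
        return targets;
    }
}
```

Adding one call-graph edge from the calling method to every element of `resolve` for each invocation then yields an over-approximate call graph of the kind the analysis relies on.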

Input: Program
Output: ’s sub-Turing methods
:= ;
  // is the set of ’s methods.
1 Var map ;
2 Var set := ;
3 foreach  do
4       := ;
6while  do
7       := ;
       := ;
        // Algorithm 2 defines Landfall.
8       if  then
9             :=
       := ;
        // Remove ’s locals from its summary.
10       if  then
11             := ;
             /* is ’s call graph. */
12             foreach  do
                    // ’s summary changed, so we update its callers.
14return ;
Algorithm 1 Explore traverses its input program’s call graph, calling Landfall on each method, until it reaches a fix point where no method summaries change and it exhausts its worklist.

Leveraging the approximate call graph, Algorithm 1 implements Explore, Cook’s interprocedural algorithm. Explore takes a program transformed as described in Section 3. It initializes a worklist to hold all the methods found in the program (Line 1) and associates empty summaries with each method (Lines 4-5). The swamp is also initially empty (Line 3). A summary for each method is then computed by calling Landfall (Line 8), described below. Using the facts returned by Landfall, Line 9 tests if the method belongs in the swamp. It does when its summary contains at least one fact recording that an l-value may be affected by divergence, meaning that Cook cannot safely, statically determine that it terminates. If this is the case, Explore adds the method to the swamp. Function locals returns the set of l-value representatives corresponding to a method’s local variables. Such elements are of no use to callers, so Line 11 discards them from the summary. If the method’s summary has changed (Line 14), its entry is updated and its callers are placed on the worklist (Line 15). Finally, on Line 16, Explore returns the set of sub-Turing methods as the complement of the swamp against the set of all program methods.
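The worklist scheme of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the marker `DIVERGE`, the shape of facts as `(target, source)` pairs, the `toy_landfall` stand-in and the `BODIES` table are all hypothetical, and the substitution of formals by actuals at call sites is omitted.

```python
DIVERGE = "DIVERGE"  # hypothetical stand-in for the divergence element


def explore(methods, callers_of, locals_of, landfall):
    """Return the methods that never end up in the swamp."""
    summary = {m: frozenset() for m in methods}
    swamp = set()
    worklist = list(methods)
    while worklist:
        m = worklist.pop()
        facts = landfall(m, summary)
        # A method joins the swamp as soon as one fact mentions divergence.
        if any(source == DIVERGE for _, source in facts):
            swamp.add(m)
        # Facts about locals are invisible to callers, so drop them.
        hidden = locals_of.get(m, set())
        facts = frozenset((x, y) for x, y in facts
                          if x not in hidden and y not in hidden)
        if facts != summary[m]:
            summary[m] = facts
            for caller in callers_of.get(m, ()):
                if caller not in worklist:
                    worklist.append(caller)
    return set(methods) - swamp


# Toy program: each method has its own facts and a list of callees.
BODIES = {
    "pure": ({("ret", "a"), ("tmp", "a")}, []),
    "spin": ({("ret", DIVERGE)}, []),
    "wrap": (set(), ["spin"]),
}


def toy_landfall(method, summary):
    """Stand-in for Landfall: own facts plus the summaries of callees."""
    facts, callees = BODIES[method]
    result = set(facts)
    for callee in callees:
        result |= summary[callee]
    return result


ISLANDS = explore(list(BODIES), {"spin": ["wrap"]}, {"pure": {"tmp"}}, toy_landfall)
```

Here `wrap` is first analysed with an empty summary for `spin`, then revisited once the summary of `spin` changes, which is the point of re-queueing callers.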

4.2. Landfall: Cook’s Intraprocedural Analysis

Landfall (Algorithm 2) is an intra-procedural analysis. It approximates the dependence relation induced by a given method over program variables. It uses the lifted set of l-values and an abstract interpretation over the domain representing the powerset of pairs of l-value representatives:

where is defined as . Each pair in means that x depends on y with the use of taking aliasing into account. We call the pair a fact. Furthermore, the element expresses that we cannot rule out the possibility that x might be affected by divergence.

Landfall computes the transitive closure over elements from the domain with respect to statements of method using two auxiliary functions: control-dependence function and data-dependence function . The function captures control dependencies created by conditional statements. For example, consider the code shown in Figure 8. If in this example we only account for data dependencies, we conclude that variable y only depends on z errantly omitting x. However, if x is affected by divergence, we need to propagate this fact to y.

Before describing how we compute , we introduce relevant terminology. Each method in the program is represented by a Control Flow Graph (CFG), a directed graph where is the set of nodes and a set of edges. Each node represents either an assignment or a branch condition. The edges, , represent control flow between program statements. In Carib, we map each assignment to a node with one successor and each conditional statement to a node with two successors, representing the and branches. For CFG node , is the set of successors of , its predecessors, and the statement represents. Finally, each CFG includes two special nodes: is the CFG’s unique entry node, which has no predecessors, and is its unique exit node, which has no successors.

To compute control dependencies, we use the well-established approach of Ferrante et al. (FerranteOW87), which we denote as where CFG is a control flow graph, a location, and a map associating locations with sets of facts. This function returns the set of facts induced by control dependencies for location . Function includes transitive control dependencies.

    y = 0;
    if (x > 0)
        y = z;
Figure 8. Code illustrating a case of control (implicit) dependency. Variable y is control-dependent on variable x.

Turning to the data dependences, the function models the effect of program statements on elements of the abstract domain . For a given fact and statement , is defined as follows:

where is the set of dependencies locally induced by statement . For example, yields . Function data_dep transitively extends the relation represented by the input facts and the relation induced by the . It also excludes (kills) facts that are no longer valid after the assignment. For example,

Since the assignment modifies x, the fact no longer holds. Landfall transitively obtains the fact from the input fact combined with from the assignment statement.
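A minimal Python sketch of such a transfer function may help fix ideas. It assumes that facts are `(target, source)` pairs, that only facts whose target is the assigned variable are killed, and that a control dependence can be approximated by treating the variables of the governing condition as extra right-hand-side identifiers; the name `DIVERGE` is a hypothetical marker for the divergence element.

```python
DIVERGE = "DIVERGE"  # hypothetical marker for the divergence element


def data_dep(facts, lhs, rhs_vars):
    """Effect of the assignment `lhs := f(rhs_vars)` on a set of facts."""
    # Kill: old dependencies of the assigned variable no longer hold.
    kept = {(a, b) for a, b in facts if a != lhs}
    # Gen: dependencies induced locally by the statement.
    gen = {(lhs, v) for v in rhs_vars}
    # Transitivity: lhs inherits what its right-hand-side variables depend on.
    via = {(lhs, b) for a, b in facts if a in rhs_vars}
    return kept | gen | via


# x := y with input facts (x, z) and (y, w): (x, z) is killed and
# (x, w) is obtained transitively from (y, w) and (x, y).
after_assign = data_dep({("x", "z"), ("y", "w")}, "x", ["y"])

# Figure 8 with a diverging x: data dependencies alone miss the taint on y.
before = {("x", DIVERGE)}
data_only = data_dep(before, "y", ["z"])
with_control = data_dep(before, "y", ["z", "x"])
```

Only `with_control` contains the fact that `y` may be affected by divergence, which is why the control-dependence function is needed.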

We provide the definitions of functions and for Carib’s basic statements in Table 1. Assignments to simple variables (the first five cases) result in dependencies expressing how the assignment’s left-hand-side depends on the identifiers appearing in its right-hand-side except when the right-hand-side is a constant, which does not introduce any dependencies. When the right-hand-side is an object field or an array reference, we use its representative to take aliases into account. The return statement is modeled as the assignment ret := id, where ret is a special variable (see Figure 4) used to store and retrieve the method’s return value. In all these cases, we kill input facts expressing dependencies involving the assignment’s left-hand-side.

In case of an assignment to a field or array element, we use its representative to take aliases into account. To preserve soundness, we do not kill any facts. Indeed, a representative over-approximates possible aliases. Therefore, the updated l-values may or may not be an actual alias of a given fact. For a call to a method , we replace the formal parameters with the corresponding actuals in ’s summary, which is a set of facts expressing dependencies induced by . We also replace the special variable ret with r. Landfall computes method summaries iteratively, on-the-fly when demanded by Explore. Finally, for the assignment , we keep the fact expressing that the assigned variable is affected by divergence because the purpose of our analysis is to track the propagation of .


Table 1. Definition of and for relevant Carib statements; for call statements, contains ’s formals.

Landfall uses its control-dependence and data-dependence functions in a standard worklist algorithm. The input and output of all nodes are initialized to the empty set on Lines 4-5. Then, the entry node’s input is created on Line 6. New facts are produced by simulating the effect of program statements using the transfer function data_dep (Line 12), accounting for control dependencies (Line 13). When the set of facts associated with a given location changes, all successors of that location are explored again (Lines 14-16). The algorithm is guaranteed to terminate because the set of l-value representatives is finite and so is the set of facts. Once a fix-point is reached, the algorithm returns the set of facts accumulated at the exit node.

Input: Program , method
Output: set of facts
1 Var map , ;
2 Let be the control flow graph of ;
3 Let be the lifted set of l-values appearing in ;
4 foreach  do
5       := := ;
8 := ;
9 while  do
10       := ;
11       := ;
12       := ;
13       := ;
14       := ;
15       if  then
16             foreach  do
17                   ;
19return ;
Algorithm 2 Landfall approximates the dependency relation over program variables that its input method defines; Table 1 defines its and functions..
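The intraprocedural fix-point computation can be sketched in Python as below. This is an illustrative simplification, not Cook's code: the CFG encoding, the `transfer` function and the toy graph are assumptions, facts are merged by set union at join points, and control dependencies are left out (branch nodes carry no variables).

```python
def transfer(facts, stmt):
    """Apply one statement; `None` models a branch with no assignment."""
    if stmt is None:
        return set(facts)
    lhs, rhs_vars = stmt
    kept = {(a, b) for a, b in facts if a != lhs}
    gen = {(lhs, v) for v in rhs_vars}
    via = {(lhs, b) for a, b in facts if a in rhs_vars}
    return kept | gen | via


def landfall(cfg, exit_node):
    """cfg maps a node to (statement, successors); returns facts at the exit."""
    preds = {n: [] for n in cfg}
    for n, (_, succs) in cfg.items():
        for s in succs:
            preds[s].append(n)
    out = {n: set() for n in cfg}
    worklist = list(cfg)
    while worklist:
        n = worklist.pop()
        incoming = set().union(*(out[p] for p in preds[n]))
        new_out = transfer(incoming, cfg[n][0])
        if new_out != out[n]:
            out[n] = new_out
            # The facts at n changed, so its successors must be revisited.
            for s in cfg[n][1]:
                if s not in worklist:
                    worklist.append(s)
    return out[exit_node]


# Toy method: x := a; if (...) y := x; ret := y
CFG = {
    0: (("x", ["a"]), [1]),
    1: (None, [2, 3]),
    2: (("y", ["x"]), [3]),
    3: (("ret", ["y"]), []),
}
EXIT_FACTS = landfall(CFG, 3)
```

Because the transfer function is monotone and the set of possible facts is finite, the loop reaches a fix-point regardless of the order in which nodes are popped.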

4.3. Implementation

We implemented our approach for sub-Turing island identification in a tool called Endeavour, which is written in Python. Endeavour takes as input an Android application and returns a report that includes the analysis result together with other statistics. Endeavour accepts Android apps directly in binary (APK) format. It uses Androguard to parse and decompile the APK files as well as to generate the control flow graphs. Hence, Endeavour does not require source code. We use our own intermediate representation for instructions, which has a Lisp-like format. One key phase in Endeavour is loop extraction (Section 3.2), which extracts a list of loops, each of which is identified by its header together with the nodes it contains. It also obtains the hierarchical (domination) relation between loops. Finally, Endeavour implements the over-approximation of the call graph based on the class hierarchy approach (SundaresanHRVLGG00) (Section 4.1). Endeavour is available at

5. Experimental Results

This section empirically investigates six research questions involving sub-Turing islands, henceforth abbreviated ST-islands. We start with an overview of the application corpora that make up our experimental subjects. The investigation then begins by considering the prevalence of ST-islands. Simply put, if ST-islands are rare then their study is of little practical value. We next take a deeper look into the main causes of divergence. Finding API methods to be the dominant source, we consider the impact of safe listing subsets of the API methods. Then, turning to two of the many applications of ST-islands, we consider first the relationship between bug density in the swamp and on the ST-islands, and second the percentage of verification conditions, such as array bound violations and null object dereferences, that occur on ST-islands. Finally, we consider the runtime efficiency of our tool Endeavour.

In the experiments, unless otherwise stated, we make the following assumptions. First, we discarded getters and setters as we assume that they are implemented in a standard way making them trivially sub-Turing. In addition, we initially assume that all API calls diverge and bind to all variables they may write or that depend on them.

We study two sets of apps: a large dataset, app_bin, of over one thousand apps, for which source code is unavailable, and a smaller set, app_src, of ten apps, for which full source code is available. Both corpora are composed of a range of real-world production apps to ensure that our empirical scientific findings have high external validity. The app_bin dataset is composed of 1100 Android applications uniformly selected from more than 600000 apps collected from Androzoo. Androzoo apps have diverse origins, including the Google Play store, which is the predominant source of the apps we study. Our set of 1100 apps contains more than 2 million methods. The app_src dataset is composed of ten applications selected from GitHub under certain criteria that we describe later. We only consider this dataset in the experiment described in Section 5.4, which requires the app source code. In all other experiments, we consider the larger app_bin set.

5.1. Landscape of ST-islands

First of all it is important to know the proportion of code that resides within ST-islands. The answer to this question suggests the code size over which we can reason precisely. A significant proportion means that it is worth investing in the improvement of static analysis as the benefit may be substantial. So the first research question we address is the following:

RQ1: What is the proportion of code occupied by ST-islands?

The results using app_bin are summarized in Figure 9 where the left boxplot shows the distribution of ST-method percentages.

Finding 1a: Overall, the average percentage of ST-methods in an app is approximately 55%, hence, the majority of methods are sub-Turing.

To study the impact of code size on our results, we want to exclude trivial methods. Defining trivial is hard. We conservatively consider methods of fewer than ten lines as trivial. To convert lines into bytecode instructions, we averaged method length in bytecode instructions over its non-comment source code length and found that, on average, each line of source code generates three bytecode instructions. Thus, we consider a method trivial if it includes fewer than thirty bytecode instructions. The results when considering only non-trivial methods are shown on the right of Figure 9.
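The conversion from the source-level threshold to the bytecode-level threshold is simple arithmetic, spelled out in the short Python sketch below; the constant and function names are illustrative assumptions.

```python
SOURCE_LINE_THRESHOLD = 10   # methods below ten source lines count as trivial
INSTRUCTIONS_PER_LINE = 3    # measured average of bytecode instructions per line
INSTRUCTION_THRESHOLD = SOURCE_LINE_THRESHOLD * INSTRUCTIONS_PER_LINE


def is_trivial(instruction_count):
    """A method is trivial if it has fewer than thirty bytecode instructions."""
    return instruction_count < INSTRUCTION_THRESHOLD
```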

Finding 1b: Discounting trivial methods, the percentage of ST-methods is 22%, which, while lower than the overall average, still represents a significant portion of the code.

While the percentage of ST-methods drops, it remains significant as it represents almost a quarter of each app. Moreover, our analysis is both sound and efficient, hence the percentage underestimates the true proportion of code that lies in non-trivial sub-Turing islands. A more precise but less efficient analysis can only ever uncover additional sub-Turing methods. Hence, this result underscores the value of investing in static analysis tools specialized to exploit ST-islands.

Figure 9. Percentage distribution of ST-methods in our 1100 apps, discarding getters and setters (left boxplot). In addition to discarding getters and setters, we also discard methods with fewer than 30 bytecode instructions (right boxplot). The average percentage of ST-methods is 55% in the first case (left) and 22% in the second case (right).

5.2. Causes of Divergence

Understanding the causes of divergence informs us about the prevalent reasons for precision loss. For example, if it turns out that a certain language construct is the dominant cause of divergence, then we might want to give it greater attention in future work. Therefore, we seek an answer to the following research question:

RQ2: What are the main causes of divergence?

To answer this question, we refined our analysis by extending the abstract domain with an element indicating the cause of divergence: API call, loop, or recursive method.

Finding 2: Over corpus app_bin, we classify the sources of divergence as follows:

    api    loop    recursion
    76%    13%     11%

We can see that over three quarters of the divergence is due to library API calls. This suggests that a more precise modelling of API calls is likely to improve the precision of a given static analysis. We set out to experimentally investigate this hypothesis in the next section.

5.3. API Safe Listing

(a) All methods. (b) Only methods with #inst ≥ 30.
Figure 10. Percentage distribution of ST-methods considering a safe list of most frequently used APIs. The x-axis shows the size of the safe list as a percentage of most frequently used APIs. Chart (a) shows box plots for all 1100 apps, discarding getters and setters, while Chart (b) also discards methods with fewer than 30 bytecode instructions.

Cook is very conservative as it assumes that all API calls cause divergence. In practice, many called API methods have well-understood and documented behaviour, making it plausible to assume that calls to such API methods are not a source of divergence. In this section, we test the impact of this possibility in the following research question:

RQ3: How does a more precise modelling of APIs impact the analysis?

We define a safe list of the most frequently used APIs, which are assumed not to induce divergence. Among the selected APIs are methods from the Java standard library and some frequently used Android API methods. Under this setting, we repeat the experiments of Section 5.1 while varying the size of the API safe list. Results are shown in Figure 10. Figure 10a shows the percentage of ST-methods per app for different sizes of the API safe list while Figure 10b considers only methods with more than 30 bytecode instructions. Results when using an empty API safe list repeat the data shown in Figure 9. We included them as a baseline. At the other end, placing all API methods on the safe list allows us to investigate the impact of a developer who seeks to focus the analysis solely on their own code.

Finding 3: For a safe list containing just 5% of the most frequently used APIs, the average percentage of ST-methods grows to almost 80% when all methods are considered and just over 50% when only methods containing more than 30 bytecode instructions are considered.

Here a safe list of only 5% of the frequently used APIs yields an important increase in ST-methods. Interestingly, increasing this to 10% has minimal impact, which may be an instance of the way the most frequently used calls tend to distribute as a power law. Finally, including all APIs on the safe list causes 88% of all methods and 66% of all non-trivial methods to be ST-methods. The trend here hints at the value in techniques such as providing formal summaries for the common API methods.
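One plausible way to build such a safe list is sketched below in Python. It assumes that "5% of the most frequently used APIs" means the top 5% of distinct API methods ranked by call frequency over the corpus; the function names and the API identifiers are hypothetical.

```python
from collections import Counter


def build_safe_list(observed_calls, fraction):
    """Keep the given fraction of distinct APIs, most frequently called first."""
    counts = Counter(observed_calls)
    size = round(len(counts) * fraction)
    return {api for api, _ in counts.most_common(size)}


def diverging_calls(method_calls, safe_list):
    """API calls of one method that are still treated as sources of divergence."""
    return [api for api in method_calls if api not in safe_list]


# Toy corpus with 20 distinct APIs: api0 is called 20 times, api19 once.
CALLS = [f"api{i}" for i in range(20) for _ in range(20 - i)]
TOP_5_PERCENT = build_safe_list(CALLS, 0.05)
TOP_10_PERCENT = build_safe_list(CALLS, 0.10)
```

Under this reading, a method whose only API calls fall inside the safe list no longer inherits divergence from them.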

5.4. Distribution of Bugs over ST-Islands

It is interesting to check whether there is a correlation between bugs and ST-islands. We address this possibility in the following research question:

RQ4: Is there a significant difference in the bug distribution in the swamp compared to the ST-islands?

Investigation of this research question requires application source code; thus we make use of the app_src collection, which was collected under the following constraints:

  • Open source: we need the code of the application as well as the corresponding repository to perform the experiment.

  • Repository history: to rule out simple weekend projects.

  • Non-trivial size: to rule out small toy applications.

  • Number of application installations: we want the apps to have real users, thereby attesting to their practical use.

The resulting app_src collection includes the ten real-world applications shown in Table 2.

We compute bug density for ST-methods and swamp methods using the following steps:

  • To identify bugs and their corresponding locations, we use a heuristic based on a bag of words. We check the git repository of each application for commits associated with keywords such as “bug”, “fix”, etc. We call such commits bug-fix commits.

  • A buggy line is any line removed, added, or modified by a bug-fix commit. A method is buggy if it contains a buggy line. We assume that a single bug is associated with a single commit and write bugs(m) to express the number of bugs associated with method m.

  • As our analysis is at the bytecode level, we compile the original source code of each app considered to obtain a binary APK file to analyse.

  • Finally, we compute bug density for ST-methods and swamp methods. The bug density for an application , , is defined as

    where is the number of lines of code in method and the number of methods in . We respectively denote the bug density for ST-methods and swamp methods as and .
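The first two steps above can be sketched in Python as follows. The sketch assumes that keywords are matched against commit messages by word prefix and that the set of methods touched by each commit is already known; the function names, the matching rule and the sample data are illustrative assumptions rather than the exact heuristic used in the study.

```python
import re
from collections import Counter

BUG_KEYWORDS = ("bug", "fix")  # the two keywords named in the text


def is_bug_fix(message):
    """True if some word of the commit message starts with a bug keyword."""
    words = re.findall(r"[a-z]+", message.lower())
    return any(word.startswith(k) for word in words for k in BUG_KEYWORDS)


def bugs_per_method(commits):
    """commits: iterable of (message, set of methods with a changed line)."""
    counts = Counter()
    for message, touched in commits:
        if is_bug_fix(message):
            # One commit counts as one bug for every method it touches.
            counts.update(touched)
    return counts


COMMITS = [
    ("Fix crash on rotate", {"A.onCreate", "A.render"}),
    ("Add prefix setting", {"A.render"}),
    ("bugfix: null check", {"A.render"}),
]
BUGS = bugs_per_method(COMMITS)
```

Prefix matching catches variants such as "fixes" or "bugfix" while ignoring "prefix", though it would still misclassify a word like "fixture".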

app LOC
bitcoinwallet 23392
connectbot 26625
irccloud 57471
k9 123606
mgit 10919
orbot 18772
owncloud 63495
signal 92868
vlc 69976
worldpress 128433

Table 2. Bug Density in ST-methods and the swamp for 10 Android open source projects given as number of bugs per kilo line.

Overall in app_src there are 6906 ST-methods comprising 475 KLoC with 1863 bugs, and 7417 swamp methods comprising 894 KLoC with 5317 bugs. We compare bugginess statistically using the non-parametric Wilcoxon test, first at the method level and then at the line level. The average bugs per method of 0.27 for ST-methods and 0.72 for the swamp are statistically different (). Because swamp methods tend to include more lines of code, we also compare the two using bugs-per-line. In this case, the 0.0265 for ST-methods is again statistically less than the 0.289 for the swamp (). Table 2 breaks these bug densities down by program.
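The per-method averages follow directly from the totals reported above, as this short Python check shows; the variable names are illustrative.

```python
st_methods, st_bugs = 6906, 1863
swamp_methods, swamp_bugs = 7417, 5317

# Average number of bugs per method in each region.
st_rate = st_bugs / st_methods
swamp_rate = swamp_bugs / swamp_methods

print(f"{st_rate:.2f} {swamp_rate:.2f}")  # prints "0.27 0.72"
```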

Finally, we use generalized linear models to investigate the question “How likely is a method to be buggy?” where a method is considered buggy if it contains one or more bugs. A method’s bugginess forms each model’s response variable. Generalized linear models enable us to consider multiple explanatory variables as well as binary response variables. In the first model, we use ST-island as the sole explanatory variable. With an odds ratio of 2.07, the model predicts that a swamp method is over twice as likely to contain a bug when compared to an ST-island method (). Including program as an additional explanatory variable, which enables the model to account for differences between programs, increases the odds ratio to 2.09. The impact of additionally including lines as an explanatory variable is negligible with or without the program variable. Finally, it is interesting that there is no significant interaction between program and a method being an ST-method; thus, the likelihood of being an ST-island method is independent of the program. This unexpected uniformity strongly supports the external validity of our findings.
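For a logistic model with a single binary explanatory variable, the estimated odds ratio coincides with the odds ratio of the corresponding 2x2 contingency table. The Python sketch below shows that computation; the counts are hypothetical and are not the data of this study.

```python
def odds_ratio(buggy_swamp, clean_swamp, buggy_island, clean_island):
    """Odds of being buggy in the swamp divided by the odds on the islands."""
    swamp_odds = buggy_swamp / clean_swamp
    island_odds = buggy_island / clean_island
    return swamp_odds / island_odds


# Hypothetical counts: 300 of 1000 swamp methods and 150 of 1000 island
# methods are buggy.
EXAMPLE = odds_ratio(300, 700, 150, 850)
```

Note that an odds ratio compares odds rather than probabilities: in this toy table the probability of being buggy doubles (0.30 against 0.15) while the odds ratio is about 2.43.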

Finding 4: The bug densities for ST-islands are statistically smaller than those of the swamp ().

From the above statistics, bug density tends to be higher in the swamp. This result further supports our suggestion to use the swamp as a hint for guiding bug search. In other words, one should allocate more of a limited budget (time, resources, etc.) to the swamp than to the ST-islands.

5.5. Finding Potential Errors

ST-islands are portions of code about which we can precisely answer whether a given property holds. We would like to investigate the presence of concrete properties falling into ST-islands on which program safety relies. One such property is the absence of runtime exceptions such as array out-of-bounds accesses and null-object dereferences. We address the following research question:

RQ5: What is the percentage of verification conditions related to detecting bound violations and null object dereference runtime errors that occur in ST-islands?

We studied the spread of these two potential runtime exceptions over ST-islands in our app_bin corpus of 1100 applications. We count all array accesses and object dereferences in the code and compute the proportion of the ones occurring in ST-methods for each application.

The results, presented in Figure 11, show that just over one in three potential exceptions can be precisely checked at compile time because they lie on a sub-Turing island. This is a lower bound for our corpus of 1100 apps, because our determination of sub-Turing islands is a safe under-approximation. Moreover, as visible in the violin plot (Figure 11), the percentage of array accesses and object dereferences that lie in ST-methods is around 80% for a notable number of apps.

Figure 11. Percentage distribution of array accesses and object dereferences in ST-methods per application. The x-axis depicts a kernel density plot of the data, mirrored around the plot’s central line. Intuitively, kernel density captures the likelihood that the y-axis has this value.

Finding 6: A lower bound on the average percentage of sub-Turing array accesses and object dereferences in our corpus is 37%.

5.6. Analysis Performance

We have established that non-trivial portions of real-world Android app code lie in sub-Turing islands and have demonstrated, in an empirical analysis, that this has implications for bug density and verification. Finally, we report on the computational cost of identifying sub-Turing islands using our approximation. While many other approximation techniques could be used, and should be explored in future work, it is useful to know whether at least one such analysis is scalable. If we are able to provide evidence that our approximation is computationally feasible and, therefore, that there does exist a scalable, useful approximation to sub-Turing islands, this will further underscore the practical value of sub-Turing analysis.

RQ6: Can ST-islands be efficiently identified?

We measured Endeavour’s analysis time from parsing an application to delivering its output on a 3.2GHz Intel Core i5 quad-core processor with 8GB of memory, running Linux. The results show that our approach is scalable to real-world applications.

Finding 7: Endeavour takes less than four minutes for even the largest applications studied, containing more than methods.

6. Related Work

Our analysis marries taint analysis with termination reification (as divergence). Taint analysis is a technique used in software security (Flowdroid; Enck:2014; WeiROR14; TrippPCCG13; GordonKPGNR15). The goal of taint analysis is to show the absence of information leaks from a set of given sources to a set of given sinks. It can be performed statically (Flowdroid; WeiROR14; TrippPCCG13; GordonKPGNR15) or dynamically  (Enck:2014). Our bottom-up inter-procedural data-flow analysis is a flow-sensitive taint analysis that takes into account implicit information flows due to control dependencies. In our case, sources are divergent constructs. Our work also relates to various other topics, including invariant generation, loop summarization, bounded model checking, termination analysis, strictness analysis and program slicing.

6.1. Loops

As loops are a key component in our study, we consider work from the literature aimed at their analysis.


Our modelling of potentially non-terminating loops consists of assigning divergence values to variables they possibly modify. Loop summarization techniques make it possible to infer loop-free code that soundly approximates a given loop.

Sharygina and Browne proposed a syntactic transformation for abstracting branches in loops in a UML dialect (design level) (SharyginaB03). Kroening and Weissenbacher proposed an approach based on associating recurrence equations with loop variables and then computing a closed form for each equation. Kroening et al. (KroeningSTTW08) proposed a related technique for replacing code fragments, including loops, with corresponding abstract transformers that play the role of the summaries. Seghir proposed a lightweight technique for inferring loop summaries over array segments as well as simple variables using a set of inference rules (Seghir11). Xie et al. presented a technique for summarizing loops that contain multiple paths and manipulate strings, with conditions over string content (XieLLLC15). They further extended their work to support disjunctive reasoning (XieCLLL16). Loop summarization can be folded into our approach to increase the number of loops that can be statically determined to terminate by construction.


One approach for reasoning about loops in the context of program verification is through loop invariants (Hoare69). Many verification tools rely on manually provided invariants (FlanaganLLNSS02; BarnettCDJL05; DahlweidMSTS09). However, the literature is rich in approaches that automatically infer invariants in various domains: linear arithmetic (Karr76; Muller-OlmS04; ColonSS03), non-linear arithmetic (SankaranarayananSM04), arrays (JhalaM07; GulwaniMT08; SrivastavaG09) and heaps (SagivRW02; PodelskiW05). Software model checkers attempt to build invariants automatically, during the verification process (BallRaj; HenzingerJMS02; ChakiETAL03; PodelskiRybalARMC; IvanicicSGG05; cksy2004), relying on a popular technique called predicate abstraction (GrafS97). We can use invariants to express state changes (transitions) by introducing fresh variables to symbolically model initial values of variables. Hence, similar to summaries, we can use them to express the effect of a given loop, which should improve our algorithm’s precision.


Termination is another issue related to loops. Knowing the after-state of a given loop is only possible when the loop terminates. Therefore, we model the effect of potentially non-terminating loops by assigning a divergent value to potentially modified variables. The literature is rich with work regarding termination analysis (PodelskiRybalARMC; PodelskiR04; UrbanGK16; PodelskiR05; PodelskiR04LICS; CookPR05). So-called ranking functions (PodelskiR04; UrbanGK16) and transition invariants (PodelskiRybalARMC; PodelskiR05; PodelskiR04LICS; CookPR05) are among the key approaches proposed to show termination. They both express relationships over program states modeling the progress of variables. From a more pragmatic perspective, showing termination of loops via simple arguments (analysis) has also been studied (FratantonioMBKV15). Integrating loop analysis with our approach would help us mitigate precision loss.

Bounded Model Checking

Bounded model checking (BMC) is a technique that deals with loops in a systematic manner by simply unrolling (simulating) them (ClarkeKL04; FalkeMS13; Cordeiro10). The unrolling process may eventually result in a loop-free code fragment that exactly models the original loop’s effect on program variables. Unfortunately, such an approach does not work for loops that are not explicitly bounded, as the unfolding process will not terminate. Nonetheless, we can combine BMC with our approach to improve our reasoning precision by restricting its application to loops with explicit bounds and applying other techniques to those that are not.
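The idea of unrolling up to a bound can be illustrated with a small Python sketch. Real BMC tools encode the unrolled program symbolically for a SAT or SMT solver; the sketch below only simulates one concrete execution, and the helper name `unroll` and the example loop are assumptions made for illustration.

```python
def unroll(state, cond, body, bound):
    """Execute at most `bound` iterations of `while cond(state): state = body(state)`.

    Returns the final state if the loop exits within the bound, and None if the
    bound is too small to cover the loop (the unwinding check fails).
    """
    for _ in range(bound):
        if not cond(state):
            return state
        state = body(state)
    return state if not cond(state) else None


# Loop: i = 0; s = 0; while (i < 3) { s += i; i += 1; }
def loop_cond(state):
    return state[0] < 3


def loop_body(state):
    i, s = state
    return (i + 1, s + i)


ENOUGH = unroll((0, 0), loop_cond, loop_body, bound=3)
TOO_SMALL = unroll((0, 0), loop_cond, loop_body, bound=2)
```

With an explicit bound of three iterations the unrolled fragment models the loop exactly; with a smaller bound the result is inconclusive, which mirrors why loops without explicit bounds need other techniques.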

6.2. Slicing

Program slicing is a technique proposed by Weiser (Weiser81) to extract a set of statements, called a slice, that influence a specified computation of interest, referred to as the slicing criterion. The semantics of the original program are preserved by the slice with respect to the slicing criterion. There has been a tremendous amount of work on slicing and its applications (BinkleyH04). While the original proposal statically defined a slice, a dynamic variant has been proposed as well (AgrawalH90). In the latter, a slice is a set of statements that affect the slicing criterion with respect to a particular input.

Slicing has been applied to various problems: program debugging (AgrawalDS93), testing (HarmanHLMW07), comprehension (KorelR98), re-use (CanforaLM98), and re-engineering (RepsR95). While the original proposal is syntax-preserving (i.e., the statements of the slice are all taken from the original code), some variants are amorphous (HarmanD97), allowing changes to the program syntax as long as the program semantics are preserved with respect to the criterion. In the context of software model checking, path slicing was proposed to find statements in a given path that are relevant to show its (in)feasibility (JhalaM05). Slicing has also been used to reduce the number of interleavings in event-oriented applications (BlackshearCS15), and recently it has been combined with runtime analysis to extract values of variables that make an application difficult to statically analyse (SiegSME16).

Our approach shares with slicing the characteristic of relying on dependency analysis. Moreover, our analysis naturally yields sub-Turing slices (i.e., portions of the program that are sub-Turing). We obtain them by simply backtracking paths in the control flow graph of a given method and selecting statements that are not affected by divergent values.

6.3. Strictness Analysis

Similar to our approach, strictness analysis has been proposed to track divergence resulting from non-termination and from errors causing program crashes, such as division by zero. A function is said to be strict if it diverges whenever one of its parameters diverges. A variant of strictness analysis, joint-strictness, takes into account parameter combinations. A function is jointly-strict in a subset of its arguments if it diverges when all the arguments of the subset diverge. Mycroft proposed an approach to approximate the divergence relationship induced by a given function over its parameters and the result it returns (Mycroft80). The approach relies on an underlying forward abstract interpretation (CousotC77). A backward analysis has been implemented in the Glasgow Haskell Compiler to perform strictness analysis in a demand-driven fashion (tpda-haskell). Other forms of strictness analysis have been proposed in the literature. For example, Wadler and Hughes describe several projection-based notions of strictness (WadlerH87), such as head-strictness and tail-strictness, refining the original basic definition.

However, a function being sub-Turing neither entails strictness nor the other way around. Indeed, if a function always diverges regardless of its parameters, it is strict but not sub-Turing. On the other hand, the function f(x,y){if x return 1 else return y} is sub-Turing but not strict. It is sub-Turing as it does not contain any divergent construct. However, it is not strict because in case x is true the function does not diverge even if y diverges.
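The function f above can be mimicked in Python to make the distinction concrete. Since Python evaluates arguments eagerly, the sketch passes arguments as zero-argument callables (thunks) and models divergence with an exception so that the example itself terminates; the names `Diverges`, `diverge` and `diverges` are illustrative assumptions.

```python
class Diverges(Exception):
    """Stand-in for non-termination, so that the example itself terminates."""


def diverge():
    raise Diverges()


def f(x, y):
    """f(x, y) { if x return 1 else return y }, with lazily evaluated arguments."""
    return 1 if x() else y()


def diverges(computation):
    """True if running the computation hits the divergence stand-in."""
    try:
        computation()
    except Diverges:
        return True
    return False


# f contains no loop, recursion or API call, yet it is not strict in y:
NOT_STRICT_IN_Y = not diverges(lambda: f(lambda: True, diverge))
# Divergence of y only matters on the else branch.
DIVERGES_ON_ELSE = diverges(lambda: f(lambda: False, diverge))
```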

6.4. Pointer Analysis

Pointer analysis aims at determining the set of memory locations a pointer may refer to during program execution. Two popular pointer analyses that constitute the basis of many other approaches are Steensgaard’s (Steensgaard96) and Andersen’s (Andersen94programanalysis). While Steensgaard’s analysis does not take into account the direction of flow of values induced by assignments, Andersen’s approach models assignment direction. Therefore, Steensgaard’s technique offers more scalability while Andersen’s provides more precision. Das proposed an algorithm lying between Andersen’s and Steensgaard’s approaches (Das00). It is scalable and, at the same time, its precision is very close to Andersen’s.
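The difference in precision can be seen on a four-statement program. The sketch below is a deliberate simplification of both analyses: it assumes single-level pointers and only two statement forms (`p = &a` and `p = q`), and the tuple encoding and function names are hypothetical.

```python
from collections import defaultdict

# ("addr", p, a) encodes p = &a; ("copy", p, q) encodes p = q.
PROGRAM = [("addr", "p", "a"), ("addr", "q", "b"),
           ("copy", "r", "p"), ("copy", "r", "q")]


def andersen(program):
    """Inclusion-based: p = q only requires pts(p) to include pts(q)."""
    pts = defaultdict(set)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for kind, lhs, rhs in program:
            new = {rhs} if kind == "addr" else pts[rhs]
            if not new <= pts[lhs]:
                pts[lhs] |= new
                changed = True
    return {v: set(s) for v, s in pts.items()}


def steensgaard(program):
    """Unification-based: p = q merges the pointee classes of p and q."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    pointers, locations = set(), set()
    for kind, lhs, rhs in program:
        pointers.add(lhs)
        if kind == "addr":
            locations.add(rhs)
            union(("pointee", lhs), ("loc", rhs))
        else:
            pointers.add(rhs)
            union(("pointee", lhs), ("pointee", rhs))
    return {p: {a for a in locations
                if find(("loc", a)) == find(("pointee", p))}
            for p in pointers}


print(andersen(PROGRAM))
print(steensgaard(PROGRAM))
```

Because the unification-based variant ignores assignment direction, the copies into `r` merge the targets of `p` and `q`, so `p` and `q` are both reported as possibly pointing to `a` and `b`. The inclusion-based variant keeps `p` and `q` separate and only widens `r`.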

Lhoták and Hendren introduced the SPARK framework (LhotakH03), which offers building blocks for implementing various pointer analyses for Java.

Sridharan et al. proposed a pointer analysis variant which is suitable for environments with small time and memory budgets (SridharanGSB05). Their approach is demand-driven, i.e., performs only the work necessary to answer a query issued by a client.

Instead of applying a pointer analysis, we soundly handle aliases using the variable representative idea inspired by Sundaresan et al. (SundaresanHRVLGG00) (Section 3.1). We plan to empirically study the impact of pointer analysis on Cook.

7. Conclusion

In this paper, we addressed the empirical question of how often a program analysis question has, in practice, an exact solution. To this end, we introduced sub-Turing islands, which are portions of code in which any question of interest is decidable. We provided a formal definition of sub-Turing islands and presented an algorithm for identifying such islands in applications. We have implemented our approach in a tool called Cook and applied it to a representative corpus of 1100 Android applications.

Our empirical study revealed that sub-Turing islands make up 55% of the methods in the 1100 Android apps studied. These results are not merely of theoretical interest, but have practical ramifications in software engineering. Our findings suggest that we can provide more precise assessments of test coverage; that we can expect more precise assessments of change impact analysis; that we can hope for more precise slices, and thereby, more precise re-use, better comprehension, and better re-engineering interventions. For example, in the code on which we report, 37% of runtime-exception guards reside within sub-Turing islands. This means that an exact answer regarding the validity of these guards can be statically determined.