Legion: Best-First Concolic Testing

02/15/2020 ∙ by Dongge Liu, et al. ∙ Universität München ∙ The University of Melbourne

Legion is a grey-box concolic tool that aims to balance the complementary nature of fuzzing and symbolic execution to achieve the best of both worlds. It proposes a variation of Monte Carlo tree search (MCTS) that formulates program exploration as sequential decision-making under uncertainty, guided by the best-first search strategy. It relies on approximate path-preserving fuzzing, a novel instance of constrained random testing, which quickly generates many diverse inputs that likely target program parts of interest. In Test-Comp 2020, the prototype performed within 90




1 Test-Generation Approach

Coverage testing aims to traverse all execution paths of the program under test to verify its correctness. Two traditional techniques for this task, symbolic execution [king1976symbolic] and fuzzing [takanen2018fuzzing] are complementary in nature [godefroid2008automated].

1  int ackermann(int m, int n) {
2    if (m==0) return n+1;
3    if (n==0) return ackermann(m-1,1);
4    return ackermann(m-1,ackermann(m,n-1));
5  }
6
7  void main() {
8    int m = input(), n = input();
9    // choke point
10   if ((m < 0 || m > 3) || (n < 0 || n > 23)) {
11     log(n,m);               // common branch
12     return;
13   } else {
14     int r = ackermann(m,n); // rare branch
15     assert(m < 2 || r >= 4);
16   }
17 }
Figure 1: Ackermann02.c

Figure 2: MCTS-guided fuzzing in Legion. [Diagram: a tree rooted at the program entry; labels mark observed paths, unknown paths, the program state selected for fuzzing, and each node's score, which estimates the likelihood of finding new paths.]

Consider exploring the program Ackermann02 in Fig. 1 from the Test-Comp benchmarks as an example. Symbolic execution can compute inputs to penetrate the choke point (line 10) to reach the “rare branch” (lines 14/15), but then becomes unnecessarily expensive in solving the exponentially growing constraints from repeatedly unfolding the recursive function ackermann. By comparison, even though very few random fuzzer-generated inputs pass the choke point, the high speed of fuzzing means the “rare branch” will be quickly reached.

The following research question arises when exploring the program space in a conditional branch: Will it be more efficient to focus on the space under the constraint, or to flood both branches with unconstrained inputs, to target the internals of log(m,n) in line 11 at the same time?

Legion introduces MCTS-guided program exploration as a principled answer to this question, tailored to each program under test. (The name Legion comes from the Marvel fictional character who changes personalities for different needs, reflecting how the tool adapts its strategy to the program.) For a program like the one in Fig. 1, Legion estimates the expectation of finding new paths by the UCT score (upper confidence bound for trees), a successful approach for games [MCTS-Survey], aiming to balance exploration of program space (where success is still uncertain) against exploitation of partial results (that appear promising already). Code behind rare branches is targeted by approximate path-preserving fuzzing to efficiently generate diverse inputs for a specific sub-part of the program.

Our variation of MCTS integrates traces of concrete executions into a tree-structured search space and iterates as follows. We descend the tree recursively to determine which sub-tree to target by approximate path-preserving fuzzing. The choice between depth-first and breadth-first exploration is made by choosing the highest-scoring node among a parent and all of its children at each step. The score is based on the ratio of distinct vs. all paths that were observed passing through a given node, but nodes selected less often in the past are more likely to be chosen. Next, we execute the program with many inputs that satisfy the constraints from the path condition of the selected node. The resulting execution traces are recorded and integrated into the tree.
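The selection step can be illustrated with a minimal Python sketch. It is not Legion's implementation: the Node class, its field names, and the constant rho are hypothetical, the score is the standard UCT form from the MCTS literature with the reward taken to be the ratio of distinct to all observed paths, and the option of fuzzing a parent directly is modelled as an extra leaf child so that a single comparison covers the parent and all of its children.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A program state in the search tree (hypothetical structure)."""
    name: str
    distinct: int = 0   # distinct paths observed through this node
    total: int = 0      # all paths observed through this node
    selected: int = 0   # how often selection has chosen this node
    children: List["Node"] = field(default_factory=list)


def uct_score(node: Node, parent_selected: int, rho: float) -> float:
    """Standard UCT: observed reward plus a bonus for rarely chosen nodes."""
    if node.selected == 0:
        return math.inf  # an untried option is always worth one attempt
    exploit = node.distinct / max(node.total, 1)
    explore = rho * math.sqrt(math.log(max(parent_selected, 1)) / node.selected)
    return exploit + explore


def select(root: Node, rho: float = math.sqrt(2)) -> List[Node]:
    """Descend from the root, taking the highest-scoring option at each step."""
    path = [root]
    while path[-1].children:
        parent = path[-1]
        best = max(parent.children,
                   key=lambda c: uct_score(c, parent.selected, rho))
        path.append(best)
    return path


# Invented statistics: fuzzing the entry directly has been tried three times
# with little success; the rare branch has been tried once with more success.
root = Node("entry", distinct=3, total=40, selected=4)
fuzz_entry = Node("fuzz entry", distinct=1, total=30, selected=3)
rare = Node("rare branch", distinct=2, total=10, selected=1)
rare.children = [Node("fuzz rare branch")]
root.children = [fuzz_entry, rare]

chosen = select(root)
print([n.name for n in chosen])  # ['entry', 'rare branch', 'fuzz rare branch']
```

In this toy tree the rare branch wins both on observed reward and on the exploration bonus, so the descent ends at the leaf that stands for fuzzing the rare branch.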

The efficiency of Legion is crucially determined by its ability to quickly generate inputs that pass through the program state of interest. We adapt the technique of QuickSampler [QuickSampler] from propositional logic to path constraints of program states over bitvectors. Although inputs generated via this heuristic may take a different path down the tree, approximate path-preserving fuzzing is in general accurate, efficient, and produces fairly uniformly distributed inputs.
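The idea of deriving many inputs from one known solution can be sketched as follows. This is a simplified illustration loosely modelled on the mutation-combining idea of QuickSampler, not the sampler used in Legion: the function approx_samples and its parameters are hypothetical, and instead of asking a solver for neighbouring solutions, the sketch simply tests single-bit flips against a Python predicate. Combined mutations are not re-checked, which is why the result is only approximately path-preserving.

```python
import random
from typing import Callable, List


def approx_samples(constraint: Callable[[int], bool], seed: int, width: int,
                   count: int, rng: random.Random) -> List[int]:
    """Derive `count` candidate inputs from one known solution `seed`."""
    assert constraint(seed), "seed must satisfy the path constraint"
    # Step 1: collect single-bit mutations that keep the constraint satisfied.
    # (Assumption: a real sampler would obtain these from a solver.)
    deltas = [1 << i for i in range(width) if constraint(seed ^ (1 << i))]
    # Step 2: XOR random subsets of those mutations onto the seed,
    # without checking the constraint again.
    samples = []
    for _ in range(count):
        mask = 0
        for delta in deltas:
            if rng.random() < 0.5:
                mask ^= delta
        samples.append(seed ^ mask)
    return samples


def in_range(n: int) -> bool:
    """The bound on n behind the choke point of Fig. 1, viewed over 8 bits."""
    return 0 <= n <= 23


samples = approx_samples(in_range, seed=5, width=8, count=200,
                         rng=random.Random(0))
accuracy = sum(in_range(s) for s in samples) / len(samples)
print(f"{accuracy:.0%} of the samples satisfy the constraint")
```

With the seed 5, the five low bits can each be flipped individually without leaving the range, so every sample lies between 0 and 31. Values from 24 to 31 violate the constraint, which shows how a cheaply generated input may take a different path than intended.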

2 Tool Description & Configuration

We implemented Legion as a prototype in Python 3 on top of the symbolic execution engine angr [wang2017angr]. We have extended its solver backend, claripy, by the path-preserving sampler, relying on the optimizer component of Z3 [bjorner2015nuz]. Binaries are instrumented to output execution traces as lists of addresses.
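How traces given as lists of addresses could be merged into a tree is sketched below. The TraceNode class, the integrate function, and the example addresses are hypothetical and only illustrate the data structure; Legion's own representation may differ.

```python
from typing import Dict, List


class TraceNode:
    """One address on an observed execution path (hypothetical structure)."""

    def __init__(self, addr: int) -> None:
        self.addr = addr
        self.children: Dict[int, "TraceNode"] = {}
        self.hits = 0  # number of executions that passed through this node


def integrate(root: TraceNode, trace: List[int]) -> bool:
    """Merge one trace into the tree; return True if it added a new node."""
    node, is_new = root, False
    node.hits += 1
    for addr in trace:
        child = node.children.get(addr)
        if child is None:
            child = node.children[addr] = TraceNode(addr)
            is_new = True
        child.hits += 1
        node = child
    return is_new


root = TraceNode(0x0)
traces = [
    [0x400100, 0x400120, 0x400180],
    [0x400100, 0x400120, 0x400180],  # same path again
    [0x400100, 0x400140],            # diverges after the first address
]
new_flags = [integrate(root, t) for t in traces]
print(new_flags)  # [True, False, True]
```

The per-node hit counts and the number of traces that added a node are the kind of statistics from which a ratio of distinct to all paths can be computed.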

Installation. Download and unpack the competition archive (commit b2fc8430):

Legion requires Python 3 with python-setuptools installed, and gcc-multilib for the compilation of C sources. Necessary libraries compiled for Ubuntu 18.04 are included in the subfolder lib (modified versions of angr, claripy and their dependencies). The archive contains the main executable, Legion.py, and a wrapper script, legion-sv, that adds lib to PYTHONPATH. The version tag is 0.1-testcomp2020; options can be shown with python3 ./Legion.py --help.

Configuration. In the competition, we ran ./legion-sv with these parameters:

--save-tests        save test cases as XML files in Test-Comp format
--persistent        keep running when no more symbolic solutions are found (mitigates an issue with dynamic memory allocations)
--time-penalty 0    do not penalise a node for expensive constraint solving (experimental feature, not yet evaluated)
--random-seed 0     fix the random seed for deterministic results
--symex-timeout 10  limit symbolic execution and constraint solving to 10s
--conex-timeout 10  limit concrete binary execution to 10s

In the category cover-branches, we additionally use this flag:

--coverage-only don’t stop when finding an error

Finally, -32 and -64 indicate whether to use 32 or 64 bits (this affects binary compilation and the sizes for nondeterministic values of types int, …).

Participation. Legion participates in all categories of Test-Comp 2020.

Software Project and Contributors. Legion is principally developed by Dongge Liu, with technical and conceptual contributions by all authors of this paper. Legion will be made available at https://github.com/Alan32Liu/Legion.

3 Discussion

Legion is competitive in many categories of Test-Comp 2020, achieving within 90% of the best score in 2 of 9 error categories and 7 of 13 coverage categories.

void main() {
  int N=100000, a1[N], a2[N], a3[N], i;
  for (i=0; i<N; i++) {
    a1[i] = input(); a2[i] = input();
  }
  for (i=0; i<N; i++) a3[i] = a1[i];
  for (i=0; i<N; i++) a3[i] = a2[i];
  for (i=0; i<N; i++) assert(a1[i] == a3[i]);
}
Figure 3: standard_copy2_ground-1.c

Legion’s instrumentation and exploration algorithm can accurately model the program. Consider the benchmark standard_copy2_ground-1.c in Fig. 3. With a single symbolic execution through the entire program over a trace found via initial random inputs, Legion understands that all guards of the for loops can evaluate in only one way, and so omits them from the selection phase. It does discover, however, that the assertion inside the last loop contributes interesting decisions, and will come up with two different ways to evaluate the comparison a1[i] == a3[i], one of which triggers the error. With such an accurate model, Legion is particularly good at covering corner cases in deep loops: all other tools failed to score full marks on the standard_copy*_ground-*.c benchmarks, but Legion succeeded in 9 out of 18. We can furthermore solve benchmarks where pure constraint solving fails; for example, when the solver times out on hard constraints of complex paths, we label the respective branches for pure random exploration.
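The two outcomes of the comparison in Fig. 3 can be reproduced with a small Python port of the benchmark. The port is an illustration under stated assumptions: the function name standard_copy2 is ours, input() is replaced by a list of values, and the array length is a parameter so that the example stays small.

```python
from typing import List


def standard_copy2(inputs: List[int], n: int) -> bool:
    """Return True if every assertion holds, False if one is violated."""
    values = iter(inputs)
    a1, a2, a3 = [0] * n, [0] * n, [0] * n
    for i in range(n):
        a1[i] = next(values)
        a2[i] = next(values)
    for i in range(n):
        a3[i] = a1[i]
    for i in range(n):
        a3[i] = a2[i]  # overwrites the copy of a1 made by the previous loop
    return all(a1[i] == a3[i] for i in range(n))


print(standard_copy2([7, 7, 9, 9], n=2))  # True: a1 and a2 agree everywhere
print(standard_copy2([7, 7, 9, 8], n=2))  # False: a1[1] != a2[1]
```

The loop guards depend only on the fixed length, so the only input-dependent decision is the comparison itself. Any input in which some a1[i] differs from a2[i] triggers the error.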

While instrumentation provides accurate information on the program, its currently naive implementation significantly slows down the concrete execution of programs with long execution traces. We mitigate this weakness by setting a time limit on the concrete executions. As a consequence, inputs that correspond to long concrete executions are not saved. In the future, we plan to explore Intel’s PIN tool, which offloads binary tracing into the CPU with negligible overhead.
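A time limit on concrete executions can be enforced with the Python standard library, as in the following sketch. The helper run_with_limit is hypothetical and the child processes are stand-ins for an instrumented binary; the sketch only shows the pattern of dropping an input whose execution exceeds the limit.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_limit(cmd: List[str], data: bytes,
                   limit_s: float) -> Optional[bytes]:
    """Run `cmd` on `data`; return its stdout, or None if it ran too long."""
    try:
        done = subprocess.run(cmd, input=data, capture_output=True,
                              timeout=limit_s)
    except subprocess.TimeoutExpired:
        return None  # the input is dropped, as described above
    return done.stdout


# Stand-ins for an instrumented binary: one finishes at once, one runs long.
fast = run_with_limit([sys.executable, "-c", "print('trace')"], b"", 10)
slow = run_with_limit([sys.executable, "-c", "import time; time.sleep(5)"],
                      b"", 0.5)
print(fast is not None, slow is None)
```

subprocess.run terminates the child when the timeout expires, so a single slow input cannot stall the exploration loop.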

Legion inherits some limitations from angr as a symbolic execution backend. Some benchmarks, such as array-tiling/mbpr5.c, dynamically allocate memory with a symbolic size that depends on the input. angr eagerly concretises this value, so that later on the constraint solver cannot find any solutions as the input is no longer symbolic, even though such solutions do exist. To mitigate this issue, Legion detects this case and omits the erroneous program states from selection. This helps e.g. on bubblesort-alloca-1.c where Legion achieved full coverage (in contrast to most other participants) despite the dynamic allocations.

Legion performed poorly on the benchmark sets bitvector and ssh-simplified. These programs have long sequences of equality constraints that are hard to satisfy with fuzzing. This is an extreme example of the parent-child trade-off that Legion intends to balance, where fuzzing the parent gives nearly no reward. This could potentially be mitigated by decreasing Legion’s exploration ratio in the UCT score, but we have not attempted such fine-tuning.
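The effect of the exploration ratio can be seen in a back-of-the-envelope comparison. The statistics below are invented for illustration, and the formula is the standard UCT score, which may differ in detail from the score used in Legion.

```python
import math


def uct(reward: float, tried: int, parent_tried: int, rho: float) -> float:
    """Standard UCT score: average reward plus rho-weighted exploration bonus."""
    return reward + rho * math.sqrt(math.log(parent_tried) / tried)


def pick(rho: float) -> str:
    """Decide between fuzzing the parent and targeting the child."""
    # Invented statistics: fuzzing the parent found nothing in 2 attempts,
    # targeting the child found new paths in 30% of 20 attempts.
    parent = uct(reward=0.0, tried=2, parent_tried=22, rho=rho)
    child = uct(reward=0.3, tried=20, parent_tried=22, rho=rho)
    return "parent" if parent > child else "child"


for rho in (math.sqrt(2), 0.1):
    print(f"rho = {rho:.2f}: select the {pick(rho)}")
```

With the larger ratio the rarely tried parent is selected despite its zero reward, whereas the smaller ratio keeps the search on the child. This is the kind of shift that tuning the ratio would aim for on benchmarks dominated by equality constraints.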

Another problem concerns allocations when loop counters or array sizes are randomly chosen to be very large in 64-bit mode, leading to excessively long concrete execution traces that cause timeouts or memory exhaustion. We plan to periodically prune the in-memory representation of the tree in the future.