Configuration-dependent Fault Localization

11/18/2019 ∙ by Son Nguyen, et al. ∙ The University of Texas at Dallas

In a buggy configurable system, configuration-dependent bugs cause failures in only certain configurations due to unexpected interactions among features. Manually localizing configuration-dependent faults in configurable systems can be highly time-consuming due to their complexity. However, existing automated fault localization techniques are designed to localize bugs in non-configurable code and do not consider the cause of configuration-dependent bugs. Thus, their capacity for efficient configuration-dependent fault localization is limited. In this work, we propose CoFL, a novel approach to localize configuration-dependent bugs by identifying and analyzing suspicious feature interactions that potentially cause the failures in buggy configurable systems. We evaluated the efficiency of CoFL in localizing artificial configuration-dependent faults in a highly configurable system. We found that CoFL significantly improves on the baseline spectrum-based approaches. With CoFL, on average, the correctness in ranking the buggy statements increases more than 7 times, and the search space is narrowed down about 15 times.






I Problem Statement and Background

A configurable system supports the diversification of software products by providing configuration options that control different features. However, this flexibility creates challenges for program analysis and quality assurance [productlinesbook1, productlinesbook2, productline_survey].

In quality assurance for configurable systems, configuration-dependent faults, which cause failures in only certain configurations because of unexpected interactions among several features, are not rare [Garvin:2011, plugin_testing, Kuhn:2004, Yin:2011]. Manually localizing configuration-dependent faults in configurable systems can be highly costly due to their complexity [Meinicke:2016, productline_survey].

Meanwhile, existing automated fault localization techniques [wong2009survey] are designed to localize faults in non-configurable code. Specifically, for configurable code, they do not consider the cause of configuration-dependent bugs, namely the unexpected feature interactions. Thus, many parts of the buggy system that are unrelated to those unexpected interactions are inappropriately considered suspicious. For example, although one can adapt spectrum-based techniques [tarantula, ochiai, wong2009survey] to configurable code by treating static conditional statements on configuration options (e.g., #if) as if-statements, the adapted techniques still assess and rank all executed statements, including ones that might affect neither the fault-inducing interactions nor even the program’s state. For slice-based methods [static_slice, dynamic_slice], the suspicious domain is reduced to all slices that are related to the failed test execution information, which might include slices irrelevant to the unexpected feature interactions. Therefore, the capacity of the traditional techniques [wong2009survey] for efficient configuration-dependent fault localization is limited.

II Motivation and Observation

Let us start with a real configuration-dependent bug in the Linux kernel to motivate our approach (Fig. 1). In this example, the maximum value of KMALLOC_SHIFT_HIGH is 25 (lines 9–10). This indicates that kmalloc_caches contains a maximum of 26 elements (line 13). When PPC_256K_PAGES is enabled and PPC_16K_PAGES is disabled, the maximum index used to access kmalloc_caches is defined as (PAGE_SHIFT + MAX_ORDER-1) (line 18), which is 28. As a result, the array kmalloc_caches is accessed out of its bounds. However, this bug is revealed only by the configurations in which PPC_256K_PAGES, SLAB, LOCKDEP, and SLOB are enabled, and PPC_16K_PAGES is disabled.
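The arithmetic behind this out-of-bounds access can be checked with a few lines of Python. This is only an illustrative sketch: the values of PAGE_SHIFT and MAX_ORDER are not stated in the text and are assumptions chosen to be consistent with the stated maximum index of 28 (256 KB pages correspond to a page shift of 18).

```python
# Value taken from the text: the cache array has KMALLOC_SHIFT_HIGH + 1 slots.
KMALLOC_SHIFT_HIGH = 25
num_caches = KMALLOC_SHIFT_HIGH + 1  # 26 elements, valid indices 0..25

# Assumed values (not stated in the text): 256 KB pages give PAGE_SHIFT = 18,
# and MAX_ORDER = 11 is the value that yields the stated maximum index of 28.
PAGE_SHIFT = 18
MAX_ORDER = 11

max_index = PAGE_SHIFT + MAX_ORDER - 1
out_of_bounds = max_index >= num_caches

print(max_index, num_caches, out_of_bounds)  # prints: 28 26 True
```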

Observations. From the example shown in Fig. 1, we have the following observations:

O1. In a configurable system containing a configuration-dependent bug, certain features are relevant to the visibility of the bug while others are irrelevant. For example, in Fig. 1, feature NUMA (line 27) is not involved in the bug: when PPC_256K_PAGES, SLAB, LOCKDEP, and SLOB are enabled and PPC_16K_PAGES is disabled, the system still fails regardless of whether NUMA is enabled or disabled. Meanwhile, for some configurations, enabling or disabling certain features might change the test results (passing all tests or not) of the resulting configurations. In Fig. 1, the all-enabled configuration behaves as expected, while the configuration in which PPC_16K_PAGES is disabled and all other options are enabled fails.

O2. Among the features that must be enabled to make the bug visible (the enabled features), the statements that implement the interactions between them are more likely to be buggy than the others. In LOCKDEP, the buggy statement is at line 18, which is one of the statements realizing the interaction between the enabled features. In contrast, if the bug were caused by statements unrelated to the interaction between the enabled features, the visibility of the bug would not depend on all of those features. In Fig. 1, the enabled features are PPC_256K_PAGES, SLAB, LOCKDEP, and SLOB. The bug is not related to the statement at line 21 in LOCKDEP, which is not used to realize the interaction of the enabled features.

O3. Among the features that must be disabled to make the bug visible (the disabled features), the statements that implement the interactions with the enabled features also provide a useful indication for finding suspicious statements in the enabled features. In Fig. 1, PPC_16K_PAGES is a disabled feature. Although line 6 in PPC_16K_PAGES (which is disabled) is not considered faulty, analyzing the impact of the statement at this line (defining PAGE_SHIFT) on the statements in LOCKDEP and SLAB can help identify the statement that needs to be fixed (i < PAGE_SHIFT + MAX_ORDER). The intuition is that, although the statements in the disabled features are not faulty, these features “hide” or “mask” the bug when they are enabled. Thus, we need to consider the interactions of other features with the disabled features when localizing configuration-dependent bugs.

O4. Because certain statements in the features that must be enabled to make the bug visible are considered suspicious, the statements in the same or different features that have an impact on the suspicious statements via program dependencies [cia, pdg] should also be considered suspicious. For example, although line 1 does not belong to any of those features, that statement is also suspicious since it has an impact on the statements at lines 9, 10, and 18.
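Observation O4 amounts to a backward closure over program dependencies. The sketch below is a simplified illustration, not CoFL's implementation: the function name `expand_suspicious` and the `impacts` map are hypothetical, and the only dependency taken from the text is that line 1 has an impact on lines 9, 10, and 18 (the edge from line 21 to line 30 is invented to show an unrelated statement being left out).

```python
from collections import deque

def expand_suspicious(seeds, impacts):
    """Return the seeds plus every line that transitively impacts a seed.

    impacts maps a line to the set of lines it has an impact on.
    """
    # Invert the map so we can walk from a statement to what influences it.
    impacted_by = {}
    for src, targets in impacts.items():
        for target in targets:
            impacted_by.setdefault(target, set()).add(src)

    result = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for src in impacted_by.get(current, ()):
            if src not in result:
                result.add(src)
                queue.append(src)
    return result

# Line 1 -> {9, 10, 18} comes from the text; 21 -> {30} is a made-up edge.
impacts = {1: {9, 10, 18}, 21: {30}}
expanded = expand_suspicious({9, 10, 18}, impacts)
print(sorted(expanded))  # prints: [1, 9, 10, 18]
```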

Figure 1: A Configuration-dependent Bug in Linux Kernel

III Approach

We propose CoFL, a novel approach for configuration-dependent fault localization. For buggy configurable code, to reduce the suspicious domain, we analyze the test results of the executed configurations, the code, and the test execution information to identify the executed statements related to the interactions among the features whose enabling/disabling affects the visibility of the bugs, since these interactions potentially cause the failures. These statements are then ranked by the suspiciousness levels that existing techniques [wong2009survey] assign to them based on their test execution information.

In particular, CoFL first determines minimal sets of feature candidates whose enabling/disabling (feature selection) makes the bugs visible (based on O1). Let us call such a set of feature selections the suspicious partial configuration (SPC). For example, {SLAB=T, PPC_16K_PAGES=F, PPC_256K_PAGES=T, LOCKDEP=T, SLOB=T} is considered the SPC of the bug in Fig. 1. The selection of NUMA does not belong to the SPC of the bug because it does not have any impact on the bug's visibility.
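One way to picture the SPC step is a brute-force search over tested configurations. The sketch below is an assumption-laden illustration, not the authors' algorithm: the function `find_spcs` is hypothetical, it assumes every configuration of six features has been executed, and its cost grows exponentially with the number of features. A set of selections is reported when it appears in at least one failing configuration, appears in no passing configuration, and has no smaller reported subset.

```python
from itertools import combinations, product

def find_spcs(results):
    """results: list of (config, failed), where config maps feature -> bool."""
    features = sorted(results[0][0])
    failing = [cfg for cfg, failed in results if failed]
    passing = [cfg for cfg, failed in results if not failed]
    spcs = []
    for size in range(1, len(features) + 1):
        for feats in combinations(features, size):
            # Candidate selections are drawn from failing configurations only.
            candidates = {tuple((f, cfg[f]) for f in feats) for cfg in failing}
            for selection in candidates:
                selection_set = frozenset(selection)
                if any(spc <= selection_set for spc in spcs):
                    continue  # a smaller SPC already explains it
                if any(all(cfg[f] == v for f, v in selection) for cfg in passing):
                    continue  # some passing configuration contains it
                spcs.append(selection_set)
    return spcs

FEATURES = ["SLAB", "PPC_16K_PAGES", "PPC_256K_PAGES", "LOCKDEP", "SLOB", "NUMA"]

def fails(cfg):
    # Failure condition as described in the text for the bug in Fig. 1.
    return (cfg["PPC_256K_PAGES"] and cfg["SLAB"] and cfg["LOCKDEP"]
            and cfg["SLOB"] and not cfg["PPC_16K_PAGES"])

results = []
for values in product([True, False], repeat=len(FEATURES)):
    cfg = dict(zip(FEATURES, values))
    results.append((cfg, fails(cfg)))

spcs = find_spcs(results)
print(len(spcs), sorted(spcs[0]))
```

On this synthetic input the search returns a single SPC containing the five selections listed above, and NUMA is absent because flipping it never changes the outcome.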

Next, CoFL aims to detect the suspicious statements that are responsible for the feature interactions and potentially cause the faults. To do that, it analyzes the features in the SPC to detect the interactions between them that potentially cause or disguise the configuration-dependent bugs. Then, CoFL detects the statements that realize those interactions (based on O2 and O3). The interactions are detected via the shared program entities, including variables and functions, that are controlled by different features, and via the operations (define and use) performed on them. For example, PPC_256K_PAGES defines PAGE_SHIFT, which is used by SLAB and LOCKDEP. In the example, the statements realizing the interactions among the enabled features in the SPC are at lines 3, 9, 10, 13, 18, and 20. Meanwhile, the statements in the enabled features that realize interactions between the enabled features and the disabled features in the SPC are at lines 9 and 18.
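The define/use view of feature interactions can be sketched as follows. This is a minimal illustration under stated assumptions: the record format, the function `find_interactions`, the placeholder line numbers 101 to 105, and the entity `local_counter` are all hypothetical and do not correspond to Fig. 1. Only the fact that PPC_256K_PAGES defines PAGE_SHIFT and that SLAB and LOCKDEP use it comes from the text.

```python
from collections import defaultdict

def find_interactions(statements):
    """Map (defining feature, using feature, entity) to the lines involved.

    statements is a list of (line, feature, operation, entity) records,
    where operation is either "define" or "use".
    """
    defs = defaultdict(set)  # entity -> {(feature, line)}
    uses = defaultdict(set)  # entity -> {(feature, line)}
    for line, feature, operation, entity in statements:
        target = defs if operation == "define" else uses
        target[entity].add((feature, line))

    interactions = defaultdict(set)
    for entity, definers in defs.items():
        for def_feature, def_line in definers:
            for use_feature, use_line in uses.get(entity, ()):
                # Only definitions and uses in different features interact.
                if def_feature != use_feature:
                    key = (def_feature, use_feature, entity)
                    interactions[key].update({def_line, use_line})
    return dict(interactions)

# Placeholder line numbers; they are not the line numbers of Fig. 1.
statements = [
    (101, "PPC_256K_PAGES", "define", "PAGE_SHIFT"),
    (102, "SLAB", "use", "PAGE_SHIFT"),
    (103, "LOCKDEP", "use", "PAGE_SHIFT"),
    (104, "LOCKDEP", "define", "local_counter"),
    (105, "LOCKDEP", "use", "local_counter"),
]

interactions = find_interactions(statements)
for key in sorted(interactions):
    print(key, sorted(interactions[key]))
```

The entity defined and used within LOCKDEP alone produces no interaction, which mirrors the idea that statements local to one feature are less suspicious.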

After that, the suspicious statements are used to detect other suspicious statements that are executed in the failed configurations and have dependencies on the statements in both of these sets (based on O3 and O4). The output for the running example is the set of statements at lines 3, 9, 10, 18, and 1. Finally, these statements are ranked by their suspiciousness scores, which are computed by existing techniques [wong2009survey] such as spectrum-based methods from their test execution information.
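The final ranking step can be sketched with the standard Tarantula and Ochiai formulas applied only to the reduced suspicious set. The code below is an illustrative sketch rather than CoFL's implementation: the `rank` helper, the three tests, and their coverage are invented, while the suspicious set {1, 3, 9, 10, 18} is the output stated above for the running example.

```python
import math

def tarantula(ef, ep, total_f, total_p):
    # ef/ep: failing/passing tests that execute the statement.
    fail_ratio = ef / total_f if total_f else 0.0
    pass_ratio = ep / total_p if total_p else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0

def ochiai(ef, ep, total_f, total_p):
    denom = math.sqrt(total_f * (ef + ep))
    return ef / denom if denom else 0.0

def rank(coverage, outcomes, suspicious, formula):
    """Rank only the suspicious lines, most suspicious first.

    coverage maps test -> set of executed lines;
    outcomes maps test -> True if the test failed.
    """
    total_f = sum(1 for failed in outcomes.values() if failed)
    total_p = len(outcomes) - total_f
    scores = {}
    for line in suspicious:
        ef = sum(1 for t, lines in coverage.items() if line in lines and outcomes[t])
        ep = sum(1 for t, lines in coverage.items() if line in lines and not outcomes[t])
        scores[line] = formula(ef, ep, total_f, total_p)
    # Ties are broken by line number to keep the output deterministic.
    return sorted(scores, key=lambda line: (-scores[line], line))

# Hypothetical coverage data for three tests (t1 fails, t2 and t3 pass).
coverage = {
    "t1": {1, 3, 9, 10, 18, 21},
    "t2": {1, 3, 9, 10, 21},
    "t3": {1, 21},
}
outcomes = {"t1": True, "t2": False, "t3": False}
suspicious = {1, 3, 9, 10, 18}

print(rank(coverage, outcomes, suspicious, tarantula))  # prints: [18, 3, 9, 10, 1]
print(rank(coverage, outcomes, suspicious, ochiai))     # prints: [18, 3, 9, 10, 1]
```

Line 21 is executed by the failing test but never appears in the ranking, because it was excluded from the suspicious domain before scoring.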

IV Empirical Evaluation

We evaluate CoFL’s efficiency in localizing configuration-dependent bugs against two spectrum-based techniques, Tarantula [tarantula] and Ochiai [ochiai]. We randomly seeded a set of 32 artificial configuration-dependent bugs into the subject system BusyBox [busybox]. For each bug, the output ranking is evaluated via the ranking metric of [exam] and the size of the suspicious domain. The lower the metric and the smaller the domain, the more efficient the technique.

Technique              Ranking metric [exam]   Suspicious domain size
Tarantula              37.50                   147.17
CoFL with Tarantula    5.12                    10.58
Ochiai                 36.54                   147.17
CoFL with Ochiai       4.97                    10.58
Table I: Comparison Results

Table I shows the average ranking metric and the average suspicious domain size for Tarantula, Ochiai, and CoFL combined with each of their formulas. As seen, on average, the correctness in ranking the buggy statements increases more than 7 times, and the search space is narrowed down about 15 times.

Conclusion. The novel idea of CoFL, our configuration-dependent fault localization method for configurable code, is to leverage test results and code analysis to detect the interactions between features that potentially cause the bugs, and to use these interactions to reduce the suspicious domain.