BOLD: An Ontology-based Log Debugger for C Programs

The different activities related to debugging, such as program instrumentation, representation of the execution trace, and analysis of the trace, are not typically performed in a unified framework. We propose BOLD, an Ontology-based Log Debugger, to unify and standardize the activities in debugging. The syntactical information of programs can be represented in the form of Resource Description Framework (RDF) triples. Using the BOLD framework, programs can be automatically instrumented by using declarative specifications over these triples. A salient feature of the framework is that the execution trace of the program is also stored as RDF triples, called trace triples. These triples can be queried to implement the common debug operations. The novelty of the framework is to abstract these triples as spans for high-level reasoning. A span gives a way of examining the values of a particular variable over a certain portion of the program execution. The properties of the spans are defined formally as a Web Ontology Language (OWL) ontology called the Program Debug (PD) Ontology. Using the span abstraction and the PD ontology, end-users can debug a given buggy program in a standard manner. A notable feature of using an ontology is that users can accurately debug in some cases of missing information, which can be practically useful. To demonstrate the feasibility of the proposed framework, we have debugged the programs in a standard bug benchmark suite, the Software-artifact Infrastructure Repository (SIR). Experiments show that the querying time is almost the same as in gdb. The reasoning time depends on the sub-language of OWL. We find that the expressibility offered by the OWL-DL language is sufficient for the bugs in SIR programs; but to achieve scalability in reasoning, the restricted OWL-RL language is required.




1 Introduction

Ontology is a formal knowledge modeling framework that facilitates reasoning and easy integration of information from diversified sources. It has been successfully applied in various domains such as the semantic web [linkeddata], robotics [robotics], and genetics [geneontology]. Today, there is growing interest in ontology-based program analysis [ecoop, codeontology]. In this paper, we discuss the application of a Web Ontology Language (OWL)-based ontology to the debugging problem.

The traditional debuggers are reasonably effective in aiding users in identifying bugs. However, those are primarily of forward type, and execute or simulate a program only in the forward direction. Thus, users can inspect the state of only the current statement being executed. If users want to inspect the state of a previous statement, they have to rerun the program. To circumvent this issue, two types of debuggers have been proposed [engblom]: log-based and replay debuggers. These debuggers differ in how the execution trace is stored and reproduced for a query that requires backward-navigation in time. The log-based debuggers [odb, tralfamadore, pothier, engblomSystem, undodb] execute the program and store the state information in a log prior to the start of the debugging session. For queries that require backward-navigation, these debuggers simply query the log. The replay-based debuggers [undodb, urdb], on the other hand, save the results at the instances of the program points where there is non-determinism and create checkpoints. The checkpoints are places where the program state can be safely reconstructed. The backward-navigation queries are answered using checkpoints and saved results.

In this paper, we focus on log-based debugging. It can be easily adapted to complex applications involving multiple threads and machines (e.g., in a distributed setup). It is flexible because it requires no human intervention. As the logs are stored, they can be analyzed offline. Kernighan and Pike [kernighan] advocate that log-based debuggers are more productive than stepping through the code or working with breakpoints. We provide simple, yet practically effective solutions to debugging.

Log-based debugging demands inspection of the execution trace of interest. The trace requires program instrumentation, which can be done at various times [instrumentationTechniques]: source-code level, intermediate representation level, compilation-time, execution-time instrumentation, and run-time injection. Since source code continues to be the dominant and most user-friendly representation of a program, our framework uses source-code instrumentation. A variety of techniques exist for source-code instrumentation. The techniques are classified into preprocessors, tool-specific constructs to define the instrumentation specification [custom1_instrumentation, custom2_instrumentation, custom3_instrumentation], and aspect-oriented programming [aop_instrumentation, aop_cpp_instrumentation]. A primary drawback of these approaches, except for aspect-oriented instrumentation, is that there is no standard terminology for defining the instrumentation specification. Another limitation is that all these approaches, including aspect-oriented instrumentation, do not fit into a generalized framework. Thus, the existing approaches are not part of a unified framework which solves the instrumentation problem (and other problems discussed below) in debugging in a standardized way.

Several log-based debuggers [odb, tralfamadore, pothier, engblomSystem] exist to aid debugging. Unfortunately, these debuggers have two shortcomings. First, the techniques used for storing the trace are ad hoc, and limited to the tool-specific infrastructure. There is no standard, language-agnostic semantic model for storing the execution trace. This poses a hindrance for establishing communication with other tools that can be used, for instance, to analyze or summarize the trace. A generalized log-based debugger which is independent of the source-code-programming-language and source-code-debugger, but depends only on the trace is very helpful for software development. It provides a standard and cost-effective way of debugging applications developed in different programming languages.

The second shortcoming is that there are no standard solutions to build abstractions and reason over the trace. This functionality assists the developer to easily diagnose the cause of the bugs. Most of the existing techniques for analyzing the trace are limited to querying or replaying the stored log of the program by simulating the program execution. Some sophisticated tools [expositor, mztake, dalek, ebba, coca] allow defining and executing scripts over the trace, which are written in high-level languages. For example, the Expositor [expositor] and MzTake [mztake] tools abstract the subsets of the execution trace as lists and event streams respectively. But these approaches lack a standard abstraction and hence are limited only to the languages supported by these tools. The existing approaches for reasoning are based on special libraries available in a programming language, or through an external rule based reasoner. For example, the execution trace sub-lists created by the Expositor tool [expositor] are processed by the list-processing API libraries available in Python. In effect, these approaches lack a standard model and abstraction useful for reasoning formally. Hence they can produce inconsistent results and fail in case of incomplete information. We propose ontology-based modeling and reasoning to address these limitations.

Recently, ontologies have been used in declarative program analysis to standardize the vocabulary and integrate information from different tools. The PATO framework [ecoop] used ontologies to standardize the syntactic knowledge of C programs. It also showed that ontologies can be used to achieve cooperation between different program-analysis tools using a standard vocabulary. The CodeOntology framework [codeontology] developed an ontology for the Java programming language. While these frameworks pioneered the marriage of ontologies and program analysis, they use ontologies in a minimal sense: they standardize the vocabulary of a programming language, and then rely on external rule-based reasoning systems to carry out the program analysis tasks specified using the standardized vocabulary. In effect, these frameworks utilized only the descriptive ability but not the reasoning power of ontologies. In this work we illustrate both the standardized modeling and the reasoning benefits of utilizing ontologies for the debugging problem.

We propose an OWL-based framework called BOLD (Ontology-Based Log Debugger). We illustrate its utility for debugging C programs. In particular, we make the following contributions.

  1. We introduce BOLD, a framework to automate program instrumentation based on the C ontology [ecoop] and the external knowledge expressed as standard Resource Description Framework (RDF) triples. This allows programmers to control instrumentation through SPARQL, the W3C recommended query language for RDF.

  2. We introduce a semantic model to store the execution traces of C programs as triples. We also facilitate the users of the BOLD framework to abstract the trace as spans and use the properties of spans for debugging. This reduces both the time and the effort of the programmers in debugging tasks.

  3. We propose Program Debug (PD) Ontology that is used to standardize the representation of the resources in the trace and the span abstraction. We compare three different versions of the ontology that differ in the model used to represent the span and the sub-language of the ontology.

  4. We illustrate the effectiveness of BOLD using the bug benchmark suite Software-artifact Infrastructure Repository (SIR) [sir]. Our experiments reveal that BOLD is able to diagnose the causes of the bugs using the PD Ontology and the OWL reasoner Pellet [pellet]. Compared to gdb, BOLD also improves the time taken to query the trace information.

The rest of the paper is organized as follows. Section 2 provides a brief background on RDF, OWL and the associated definitions, along with an overview of the conversion of C program into RDF triples using the PATO framework [ecoop]. Section 3 describes the program instrumentation and trace model aspects of the BOLD framework. Section 4 describes the Program Debug ontology. Section 5 describes the ontology-based-reasoning and compares it with rule-based-reasoning. Section 6 compares the performance of BOLD to gdb. It also compares the scalability of OWL reasoners with different ontology models. Section 7 compares and contrasts with the existing relevant work. Section 8 concludes the paper with interesting directions to future work.

2 Background

In computer science, an ontology provides an explicit specification of the shared conceptualization of a domain. It combines a description language and a logic language to standardize the terminology and knowledge in a domain. The semantic web project [timBernersLee] introduced the languages for realizing ontologies. The description abilities were achieved using languages such as RDF and RDF Schema (RDFS), and the logic abilities were achieved using OWL. The BOLD framework is developed using these languages. We give an overview of these languages in Section 2.1. The BOLD framework is built upon the PATO framework [ecoop], which is reviewed in Section 2.2.

 1 int maxSubArraySum(int a[], int size) {
 2   int globalMax = -32767;
 3   int localMax = 0;
 4   for (int i = 0; i < size; i++) {
 5     localMax = localMax + a[i];
 6     if (globalMax > localMax)
 7         globalMax = localMax;
 8     if (localMax < 0)
 9         localMax = 0;
10   }
11   return globalMax;
12 }
(a) maxSubArraySum.c
(b) Partial set of triples representing maxSubArraySum.c.
Figure 1: Representation of a C program using triples. The URI reference of the file in Figure 1(a) is abbreviated using the prefix file in Figure 1(b).

2.1 Languages for Realizing Ontologies

RDF: The RDF language [rdf, rdfSyntax] treats every domain as a set of resources. In RDF, we represent the information about the resources by making statements about them. Every statement describes a property of a resource and its value. Statements are formally represented as triples of the form (subject predicate object). Here subject is a resource in the domain, and predicate is a property that has the value denoted by object. All three elements of a triple are uniquely identified using Uniform Resource Identifiers (URIs). Since URIs are generally long, it is a convention to use abbreviations for them. For example, consider line 2 of the C program given in Figure 1(a). RDF treats this variable declaration line as a resource that can be represented using a URI; we use the abbreviation file to denote the URI of the source file. The variable declaration has many properties such as the statement type, the name of the variable, etc. They are represented as the following triples: (file:ln2 rdf:type c:VariableDecl), (file:ln2 c:hasName globalMax). Here rdf is the abbreviation of the URI of the standard RDF terminology recommended by W3C. Similarly, c is the abbreviation of the URI of the C ontology proposed in the PATO framework [ecoop].

SPARQL [sparql] is the W3C recommended query language for RDF triples. It is a declarative language that uses triple patterns and operators to specify data retrieval requests on triples. We omit the technical details as they aren’t required for this paper.

RDFS: The vocabulary used to describe the resources and properties in every domain can be standardized to provide a uniform representation and avoid ambiguities. Users can then use the standard vocabulary to make RDF triples. RDFS [rdfs] provides a way to declare the standard. It treats every domain in terms of classes/concepts, properties and individuals/instances. A class/concept is used to identify a collection of individuals that share some feature. Note that an individual can belong to multiple classes. The assertion that instance $i$ belongs to class $C$ is usually written as ($i$ rdf:type $C$). A property is used to represent a directed binary relationship between classes or individuals. This is usually written as ($i_1$ $P$ $i_2$), where the individual $i_1$ is related to the individual $i_2$ through the property $P$. For example, the C ontology developed as part of the PATO framework [ecoop] is a domain ontology for describing the syntax of C programs. The class FunctionDefinition in the C ontology contains the function definitions in C programs as its instances. Some other classes in the ontology include VariableDeclaration, ForStatement, AssignmentStatement, etc. The property hasDataType represents the binary relationship between variables and their data types. Some other properties in the ontology include hasReturnType, hasParent, hasScope, etc.

The RDFS language is mainly used to standardize the vocabulary in the domain. It provides constructs to assert the sub-class relationship among classes, domain and range of properties etc. The logic capabilities in RDFS are intentionally restricted [owl]. OWL extends RDFS to represent the logical knowledge and allows reasoning in the domain.

OWL: OWL [owlLanguage] is a logic language that provides operators to define concepts and to form concept and property hierarchies. OWL is a generic term; it is actually a collection of many sub-languages. These sub-languages trade off expressiveness against reasoning complexity. In this work, we use two sub-languages called OWL-DL and OWL-RL. The OWL-DL language is expressive enough to build knowledge bases of practical significance and provides feasible reasoning. The language is based on Description Logics (DLs). DLs are fragments of first-order logic for which reasoning remains feasible. Since the syntax of OWL is rather verbose, we use the equivalent concise syntax of DLs to describe our domain. DLs provide operators to define concept expressions from primitive concepts. For example, the operator $\sqcup$ provides the union of two concepts, the operator $\neg$ provides the negation (or complement) of a concept, and the existential operator $\exists$ acts on a property $R$ to provide the set of individuals in the domain that relate to others via $R$. These concept expressions can be used to define new concepts, called defined concepts, using the operator $\equiv$. An example of the definition for the Adult class is as follows: Adult $\equiv$ (Male $\sqcup$ Female) $\sqcap$ $\exists$hasAge.[xsd:long $> 17$]. That is, Adult is defined as a person who is male or female and has age greater than 17.

The complexity of reasoning in OWL-DL is exponential [dlhandbook], but for most of the practical requirements the time taken by the reasoners is acceptable. In some rare cases, the reasoners need considerably long time. To circumvent this problem, W3C endorsed three sub-languages as OWL profiles [owlProfiles]. These sub-languages are less expressive than OWL-DL but reasoning is tractable. We use an OWL-profile called OWL-RL. The features in this sub-language are a subset of the features in OWL-DL. Hence we use the same DL syntax to describe OWL-RL.

Semantic Web Rule Language (SWRL): There are many assertions of practical interest that can't be represented in OWL-DL [owlrules]. But some of them can be represented using rules. DL-safe SWRL rules [dlSafeSWRL] can be used with OWL-DL to preserve the feasibility of reasoning. A DL-safe SWRL rule is of the form $a_1 \wedge \dots \wedge a_n \rightarrow b_1 \wedge \dots \wedge b_m$. Each atom is of the form $C(x)$ or $P(x, y)$, where $C$ is a class, $P$ is a property, and $x$ and $y$ are variables or individuals. Informally, the meaning of the rule is "if all the atoms in the antecedent are satisfied for a particular instantiation of the variables, then the atoms in the consequent also hold true for the same instantiation".

2.2 Semantic Model of C Programs

We use the PATO framework [ecoop] to construct a semantic model of C programs. In PATO, C programs are parsed using the ROSE compiler, and the RDF triples that describe every line in the program are generated. An example C program and a partial set of the corresponding triples are shown in Figure 1. The program is a buggy version of the standard maximum-sub-array-sum algorithm that takes an array of integers and determines the contiguous sub-array in which the sum of the elements is maximum. In the paper, when we refer to a particular execution instance of the program, we assume the function is called with an input array and the size 6. We use this program and the execution instance with the above input as the running example. In the triples of Figure 1(b), the subject denotes a construct of the C program in Figure 1(a). The prefix file is the URI of the C file containing the construct. The meaning of the first triple (file:ln1_ln12 rdf:type c:FunctionDecl) is that the C construct beginning in line 1 and ending in line 12 is of type function declaration (c:FunctionDecl). The triples following the first triple describe some of the conceptual details of the function declaration statement. For example, the third triple asserts that the return type of the function file:ln1_ln12 is int. Similarly, some conceptual details of the variable declaration (int globalMax) denoted by the URI file:ln2 are also included in the triple set.

3 BOLD Framework

The BOLD framework is designed to address the problems of log-based debugging using ontology. It provides solutions to the common subtasks of log-based debugging, namely program instrumentation, a storage model for the execution trace, and reasoning about and querying over the trace, in a unified approach. It is a unified framework because all the elements involved in the problem, such as the program and the execution trace, are represented using the same semantic model, RDF. Also, the querying and reasoning over RDF are carried out by the standard languages recommended by W3C. In this section, we describe various features of the framework. We omit the implementation and user interface details for want of space.

In general, source-code instrumentation involves two steps: (i) identifying source-code statements to be instrumented. The specification for describing these statements is called instrumentation specification. (ii) inserting log-statements at the statements identified. We describe the instrumentation in Section 3.1. Later, in Section 3.2, we present the semantic model of the program execution-trace, followed by the description of various debugging operations provided by BOLD in Section 3.3.

3.1 Instrumentation Specification

In general, it is not feasible to instrument every statement of the whole program, as the instrumented program may run many times slower than the original ones. So every application that requires instrumentation typically involves identifying the statements to be instrumented. For instance, one may want to instrument on every definition of a variable of interest, and not instrument other statements. The requirements of the specification depend on the application (or client). Based on the source of information feeding into these requirements, we classify them into two categories. The first natural category requires information from within the source-code. Examples of such specifications are identifying the statements where a particular variable is modified, the statements that contain a function call, etc. The second category requires information outside the source-code but is relevant for the instrumentation. Examples of such external-specifications are identifying the set of functions that belong to a library or the set of functions that "conceptually" belong together (e.g., all the variables that are vulnerable to buffer overflow, or all the functions that establish a safe TCP connection).

Existing systems such as gcc advocate changing the source code of the program (or including a header) to make specific macros available during compilation [gccMacros]. These macros can then be used in the instrumentation. While this approach works, it is far from the ideal of good software engineering practices. We address this as follows. In the proposed framework, all the information required for the instrumentation specification can be described in the form of triples (e.g., see Figure 1(b)). We utilize the source-code triples generated by PATO [ecoop] to implement the source-code specifications. The external information can also be described as RDF triples. For example, all the functions in the file library can be asserted as instances of the FileLibraryFunctions class. Such class identifiers, when used in the specifications, make them concise and maintainable. The instrumentation specifications can be implemented by running a query to select the statements and the variables / expressions of interest after those statements. We use SPARQL for this information retrieval. The advantage of using SPARQL is that there is a clear delineation between the program to be instrumented and the instrumentation specification, making the process modular. This way, the source code of the program can be free from the macros that control the instrumentation.

The BOLD system processes the instrumentation specification given as a SPARQL query. It adds statements to log the values of the variables / expressions at appropriate locations. The statements used for logging generate trace information in the form of triples. These triples serve as the semantic model of the trace that we discuss in the following section.
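The effect of this step on the running example can be pictured with a minimal C sketch. It assumes the specification selected the statement at line 5 (for localMax) and the if-conditional in lines 8-9 (for globalMax). The helper `bold_log`, the counter `bold_clock`, and the plain-text output format are hypothetical names chosen for illustration; the section does not show the log statements that BOLD actually generates.

```c
#include <stdio.h>

/* Hypothetical clock: advanced once per logged statement visit. */
static unsigned long bold_clock = 0;

/* Hypothetical helper that prints one trace record:
   variable, statement, value, timestamp. */
static void bold_log(const char *var, const char *stmt, long value) {
    bold_clock++;
    printf("%s %s %ld %lu\n", var, stmt, value, bold_clock);
}

/* Running example with its bug left in place, instrumented after
   line 5 and after the if-conditional in lines 8-9. */
int maxSubArraySum(int a[], int size) {
    int globalMax = -32767;
    int localMax = 0;
    for (int i = 0; i < size; i++) {
        localMax = localMax + a[i];
        bold_log("file:ln3Var", "file:ln5", localMax);
        if (globalMax > localMax)
            globalMax = localMax;
        if (localMax < 0)
            localMax = 0;
        bold_log("file:ln2Var", "file:ln8-ln9", globalMax);
    }
    return globalMax;
}
```

Each loop iteration produces two records here, so a call with a six-element array yields twelve records with timestamps 1 to 12.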

3.2 Semantic Model of the Execution Trace

The trace information is captured after the execution of the statements as dictated by the instrumentation specification. The information includes the variable values and pointer addresses after the statements of interest. Note that this information can be captured after the same statement $s$ multiple times during the execution. It is important to distinguish between the multiple visits of the statement $s$. To this end, we associate a unique timestamp with the information captured after each statement. The timestamps are implemented as natural numbers and follow a linear order with the relation $<$. If the trace information captured after two statements $s_1$ and $s_2$ has timestamps $t_1$ and $t_2$ respectively and $t_1 < t_2$, then $s_1$ must have been executed before $s_2$ (and vice versa).

We define the trace information captured after each statement as a model called the execution-trace model. The model is the same for scalar variables, expressions, array member accesses and pointers to scalar data types. For simplicity of writing, we refer to it simply as the model of scalar variables. Intuitively, the model for scalar variables is described by the tuple $\langle v, s, val, t \rangle$. The element $v$ is a variable that has the value $val$ after the statement $s$ during the execution. The timestamp element $t$ identifies the position of the visit of $s$ in the overall sequence of visits of all statements during the execution. The model changes slightly for the member variables of a struct data type in C. For these variables, the model is described by the tuple $\langle m, v, s, val, t \rangle$. The element $m$ is a member of the struct data type. The element $v$ identifies the variable of the struct data type used to access the member $m$. The remaining elements are the same as those of the model described for scalars. The trace information captured after all the statements can be conceived of as one big list. We call this list the execution-trace list.
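As a rough illustration of the execution-trace model, the following C sketch keeps the execution-trace list in memory and stamps each entry with the next natural number. The type and field names (`TraceTuple`, `TraceList`, `owner`) and the fixed capacity are assumptions made for this sketch; BOLD itself stores the trace as RDF triples.

```c
#include <stddef.h>

#define TRACE_CAPACITY 1024

/* One entry of the execution-trace list. For scalar variables `owner`
   is NULL; for a struct member it names the struct variable through
   which the member was accessed. Field names are illustrative. */
typedef struct {
    const char *var;
    const char *owner;
    const char *stmt;
    long value;
    unsigned long timestamp;
} TraceTuple;

typedef struct {
    TraceTuple items[TRACE_CAPACITY];
    size_t len;
} TraceList;

/* Appends an entry and stamps it with the next natural number, so that
   timestamps are unique and follow execution order. Returns the
   timestamp, or 0 if the list is full. */
static unsigned long trace_append(TraceList *tl, const char *var,
                                  const char *owner, const char *stmt,
                                  long value) {
    if (tl->len >= TRACE_CAPACITY)
        return 0;
    TraceTuple *t = &tl->items[tl->len];
    t->var = var;
    t->owner = owner;
    t->stmt = stmt;
    t->value = value;
    t->timestamp = (unsigned long)(tl->len + 1);
    tl->len++;
    return t->timestamp;
}

/* True when the visit recorded in `a` happened before the one in `b`. */
static int executed_before(const TraceTuple *a, const TraceTuple *b) {
    return a->timestamp < b->timestamp;
}
```

Deriving the timestamp from the list position is one simple way to obtain the linear order on visits; any strictly increasing counter would serve the same purpose.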

Recall that we use the input array to capture the execution trace for the C program in Figure 1. The partial set of execution-trace model is shown in Figure 2. The tuples describe the trace information of the variable localMax (identified by the URI (file:ln3Var)) after line 0(a) in the program. The tuples are captured in different loop iterations. This is indicated by the tuple’s last element timestamp, which is in the increasing order. The third element denotes the value of the localMax variable in different iterations.

Figure 2: Partial set of the execution-trace model captured after line 0(a) of C program in Figure  1. The identifier file:ln3Var denotes the variable localMax in the program

3.3 Debugging Operations in BOLD

The BOLD framework supports three types of debug operations on the trace: querying the trace, creating high-level abstractions over the trace, and reasoning over the created abstractions. To facilitate debugging, the framework allows users to have an interactive debugging session with the system similar to gdb session. During the interaction, the users can issue various framework commands. These commands are used to invoke the functions which perform the debug operations. The syntax is similar to Datalog atoms.

3.3.1 Querying the Execution Trace

Querying the information available in the execution trace is a standard feature offered by debuggers. The queries, which are posed to the system as framework commands, are essentially SPARQL queries. The queries run on the execution-trace triples and are used to perform the commonly used debug operations such as break (to go to a desired statement), inspect (to know the information of the variables at a particular statement), and step (to step through instructions). Note that the existing debuggers cannot express source-code or external-knowledge (relevant knowledge outside the program) related requirements in a standard way. For instance, they can't express the computation of the execution path in a function, the size of an array after the class of functions that can cause buffer overflow, or the value of a file pointer after file library functions. The reason is that the source-code knowledge and the external knowledge are typically not available together in a standardized way to the debuggers. We address this issue in the BOLD framework through a class of queries called integrated queries. They are called integrated because they run on the cumulative triples formed from the integration of three different triple stores, namely, source-code triples, execution-trace triples, and external triples. The integration of knowledge from multiple sources is an easy task in ontology because of the use of URIs to represent the resources.
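To make the idea of answering an inspect request from the log concrete, here is a small C sketch that looks up the most recent logged value of a variable at or before a given timestamp. It is only an in-memory stand-in: in BOLD the same request would be a SPARQL query over the trace triples, and the names `TraceTuple` and `inspect_at` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *var;
    const char *stmt;
    long value;
    unsigned long timestamp;
} TraceTuple;

/* inspect-style lookup: finds the most recent logged value of `var`
   at or before timestamp `t`. Returns 1 and stores the value in *out
   on success, 0 if the variable was not logged by then. */
static int inspect_at(const TraceTuple *trace, size_t n,
                      const char *var, unsigned long t, long *out) {
    int found = 0;
    unsigned long best = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(trace[i].var, var) != 0)
            continue;
        if (trace[i].timestamp > t)
            continue;
        if (!found || trace[i].timestamp > best) {
            best = trace[i].timestamp;
            *out = trace[i].value;
            found = 1;
        }
    }
    return found;
}
```

Because the whole trace is available, moving backward in time is just a lookup with a smaller `t`; no re-execution of the program is needed.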

3.3.2 Abstractions of the Execution Trace

The abstraction phase is used to create meaningful abstractions of the trace that enable effective reasoning. In the BOLD framework, it is defined as the process of constructing new sequences (called spans) from the execution-trace list. Each span contains the information related to a variable or a simple expression and is expected to satisfy a property. We explain spans in this section and their properties in Section 5.1.

Each span contains the values of a variable or an expression retrieved from the execution-trace list at discrete timestamps. The span is formally described as $\langle n, v, L \rangle$. Here $n$ is the name of the span and $v$ is the variable or the arithmetic expression for which the span is constructed. $L$ is an ordered list of cells, where each cell is described as $\langle val, t \rangle$. The element $val$ is the value of the variable at the timestamp identified by the element $t$. To facilitate reasoning in the standard format, BOLD stores the spans in the form of RDF triples.

In the running example, to record the variable values at the end of each iteration of the for-loop, we construct a span each for variables globalMax, localMax, and i. One can see from the (intended) semantics of the program, that the values stored in the respective span would be non-decreasing for globalMax, non-negative for localMax, and strictly increasing for the variable i.
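The three expectations above can be written as simple checks over a span. The sketch below is plain C with hypothetical type and function names (`Cell`, `Span`, `span_non_decreasing`, and so on); it only illustrates what the properties mean and does not reproduce how the PD ontology and an OWL reasoner establish them.

```c
#include <stddef.h>

/* A cell pairs a value with the timestamp at which it was observed. */
typedef struct {
    long value;
    unsigned long timestamp;
} Cell;

/* A span: a name, the variable it is built for, and its ordered cells. */
typedef struct {
    const char *name;
    const char *var;
    const Cell *cells;
    size_t len;
} Span;

/* Expected of globalMax: each value is at least the previous one. */
static int span_non_decreasing(const Span *s) {
    for (size_t i = 1; i < s->len; i++)
        if (s->cells[i].value < s->cells[i - 1].value)
            return 0;
    return 1;
}

/* Expected of the loop counter i: each value exceeds the previous one. */
static int span_strictly_increasing(const Span *s) {
    for (size_t i = 1; i < s->len; i++)
        if (s->cells[i].value <= s->cells[i - 1].value)
            return 0;
    return 1;
}

/* Expected of localMax at the end of an iteration: no negative value. */
static int span_non_negative(const Span *s) {
    for (size_t i = 0; i < s->len; i++)
        if (s->cells[i].value < 0)
            return 0;
    return 1;
}
```

A span that violates its expected property points the user at the variable, and at the timestamps, where the execution departs from the intended semantics.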

To construct a span, the users pose framework commands related to abstraction to the BOLD system. The system executes these commands to retrieve the timestamps, called span-timestamps. A timestamp is a system-internal element and the users don't have access to it. Therefore, one of the interesting challenges in BOLD is to allow users to specify the timestamps. Recall from Section 3.2 that a timestamp is associated with each instance of a statement during execution. The solution that the BOLD system implements is to let the users specify the statement identifiers, so that the system can retrieve the timestamps from them.

The formal specification of the abstraction commands is described as $\langle v, s, f \rangle$. Here $v$ is the variable for which the span is constructed. The element $s$ is the identifier of the statement after which the variable is accessed. Since the statement may be executed many times, it is necessary to filter and identify the required instances. We resolve this issue by introducing the concept of filters. The last element in the specification, $f$, called the filter, is used to identify the instances.

In the running example, we construct a span on the globalMax variable that contains the values of the variable at the end of the for-loop in different iterations. We refer to this span as globalMaxSpan. Similarly, we construct a span on the globalMax variable at the end of the for-loop in the iterations where the value of localMax after line 5 is positive. We refer to this span as globalMaxFilteredSpan, and to the condition that localMax is positive after line 5 as the localMaxPositive condition. Next we explain the details of filters.

A filter is formally defined as an interval or a set of timestamps relative to which the span-timestamps are computed. The filters are of two types: interval-based filters and set-based filters. An interval-based filter is specified using two timestamps: the lower and upper bounds of the interval. Both bounds are optional. If a bound is not specified, the BOLD system falls back to a default value for it. The system considers only the timestamps that are within the bounds of the interval to construct the spans. Intuitively, a filter-interval specifies a sub-list of the execution-trace list over which the spans are constructed.
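As a minimal sketch of the interval-based filter described above, the following Python snippet keeps only the candidate timestamps that fall inside an optional interval. The function name `interval_filter`, the sample timestamps, and the choice to treat an unspecified bound as "unbounded on that side" are assumptions made for illustration; they are not part of the BOLD command language.

```python
from typing import Iterable, List, Optional


def interval_filter(timestamps: Iterable[int],
                    lower: Optional[int] = None,
                    upper: Optional[int] = None) -> List[int]:
    """Keep the timestamps that lie within [lower, upper].

    A bound of None means "not specified"; this sketch then treats the
    interval as unbounded on that side (an assumption, not BOLD's API).
    """
    kept = []
    for t in timestamps:
        if lower is not None and t < lower:
            continue
        if upper is not None and t > upper:
            continue
        kept.append(t)
    return kept


# Hypothetical timestamps of one statement executed in five loop iterations.
candidates = [4, 9, 14, 19, 24]
inner = interval_filter(candidates, lower=9, upper=19)
everything = interval_filter(candidates)
```

With both bounds given, only the timestamps of the middle iterations survive; with no bounds, the whole list is kept, which matches the intuition of a sub-list of the execution trace.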

The motivating example for set-based filters is as follows. In the globalMaxFilteredSpan specification, the localMaxPositive condition is true after many instances of the statement in line 5 because the statement is in a loop. In the iterations where the value of localMax is positive, we are interested in the value of globalMax at the end of the loop. Note that the timestamp at the end of the loop is different from the timestamp after line 5. To construct the globalMaxFilteredSpan-like spans, we introduce the concept of set-based filters. A set-based filter is described as a set of timestamps of the form $\{t_1, t_2, \ldots, t_n\}$. The BOLD system interprets each two successive elements of the set as a pair, so it forms a set of pairs of timestamps for each filter-set. Each pair in the set provides one span-timestamp (the first such span-timestamp if there are many). The span-timestamp lies within the bounds of the pair. In the globalMaxFilteredSpan specification, the timestamps related to line 5 where the localMaxPositive condition is true form the set. The timestamps at the end of the loop in those iterations where the condition is true form the span-timestamps.

The lower and upper bounds of interval-based filters and the elements of a filter set are timestamps. As already mentioned, the users don’t have access to the timestamps, but they have to specify them as part of the abstraction commands. We tackle this issue by specifying them in terms of the states of variables (called filter-variables). The state of a variable is formally described as a 4-component tuple $\langle s', v', r, c \rangle$. The meaning of the tuple is that the relation element $r$ describes the equality / inequality relationship between the filter-variable $v'$ and the value $c$ after the statement $s'$. Note that the filter-variables can be different from the variable for which the span is constructed. For example, in the globalMaxFilteredSpan specification, localMax is the filter-variable and globalMax is the variable for which the span is constructed. The statement instances in the filter can be the same as the statement instances at which the span is constructed, or the instances of any nearby statements. In the globalMaxFilteredSpan example, the statement identified by file:ln5 is used to define the filter and file:ln8-ln9 (the RDF identifier for the if-conditional in lines 8–9 of Figure 1(a)) is used to construct the span. Using the above explanation, the globalMaxFilteredSpan is specified by the tuple $\langle$file:ln2Var, file:ln8-ln9, $\langle$file:ln5, file:ln3Var, $>$, $0\rangle\rangle$.
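The interplay between a state-based filter and the resulting span can be sketched as follows. This is an illustrative Python approximation, not BOLD's implementation: the toy trace values, the helper names `filter_set` and `build_span`, and the decision to leave the last pair open-ended are all assumptions. Only the identifiers file:ln2Var, file:ln3Var, file:ln5 and file:ln8-ln9 come from the running example.

```python
import operator
from math import inf

# Each trace entry: (variable, statement, value, timestamp).
# Hypothetical values; file:ln3Var is localMax, file:ln2Var is globalMax.
TRACE = [
    ("file:ln3Var", "file:ln5", 3, 3),
    ("file:ln2Var", "file:ln8-ln9", -32767, 5),
    ("file:ln3Var", "file:ln5", -1, 8),
    ("file:ln2Var", "file:ln8-ln9", -32767, 10),
    ("file:ln3Var", "file:ln5", 4, 13),
    ("file:ln2Var", "file:ln8-ln9", -32767, 15),
]

RELATIONS = {">": operator.gt, "<": operator.lt, "==": operator.eq,
             ">=": operator.ge, "<=": operator.le, "!=": operator.ne}


def filter_set(trace, statement, variable, relation, value):
    """Timestamps at which the state (statement, variable, relation, value) holds."""
    holds = RELATIONS[relation]
    return sorted(t for (v, s, val, t) in trace
                  if v == variable and s == statement and holds(val, value))


def build_span(trace, variable, statement, filter_timestamps):
    """For each pair of successive filter timestamps, take the first instance
    of `statement` inside the pair and record the variable's value there."""
    bounds = list(filter_timestamps) + [inf]  # assumption: last pair is open-ended
    span = []
    for low, high in zip(bounds, bounds[1:]):
        hits = sorted((t, val) for (v, s, val, t) in trace
                      if v == variable and s == statement and low <= t < high)
        if hits:
            span.append(hits[0])  # (span-timestamp, value)
    return span


positive = filter_set(TRACE, "file:ln5", "file:ln3Var", ">", 0)
global_max_filtered_span = build_span(
    TRACE, "file:ln2Var", "file:ln8-ln9", positive)
```

In this toy trace the second iteration (localMax is -1) contributes no filter timestamp, so the value of globalMax at timestamp 10 is skipped and the span holds only the cells at timestamps 5 and 15.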

4 Program Debug Ontology

(a) The tuple (file:ln3Var, file:ln5, 3, 3) of the execution trace from Figure 2
(b) The first cell of the globalMaxSpan abstraction with list model
(c) The first cell of the globalMaxSpan abstraction with set model
Figure 3: Partial set of triples representing the execution trace and the abstractions for our example

Our Program Debug Ontology provides the vocabulary and axioms to standardize the information generated in different features of the BOLD framework. In this section, we explain the important terms of the ontology. The execution trace and the span abstraction defined above are represented and stored in the form of triples. The ontology also provides axioms and rules that formalize the reasoning process in BOLD, which we explain in the next section.

Recall that the execution-trace model for variables is described by the tuple $\langle v, s, val, t \rangle$ that maps the variables to different states. This tuple is stored as a set of triples. The property hasState maps a variable to a state. The state is elaborated using three properties. The value of the first property, afterStatement, provides the identifier of the statement after which the state is captured. The second property, hasValue, provides the value of the variable after the statement $s$. The value of the final property, timestamp, provides the time-stamp value. For our running example, the triples for the first tuple (file:ln3Var, file:ln5, 3, 3) in Figure 2 are presented in Figure 3(a). The individual pd:st1 denotes the state of the variable after executing the statement file:ln5 at time-stamp 3.
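The mapping from one trace tuple to its triples can be sketched in a few lines of Python. The property names hasState, afterStatement, hasValue and timestamp are taken from the text above; the `pd:` prefix on them and the function name `state_triples` are assumptions of this sketch, and plain Python tuples stand in for real RDF terms.

```python
from typing import List, Tuple

Triple = Tuple[str, str, object]


def state_triples(variable: str, statement: str, value: int,
                  timestamp: int, state_id: str) -> List[Triple]:
    """Expand one execution-trace tuple into four (subject, predicate, object) triples."""
    return [
        (variable, "pd:hasState", state_id),          # variable -> state
        (state_id, "pd:afterStatement", statement),   # where the state was captured
        (state_id, "pd:hasValue", value),             # value of the variable
        (state_id, "pd:timestamp", timestamp),        # when it was captured
    ]


# The first tuple of the running example: (file:ln3Var, file:ln5, 3, 3).
triples = state_triples("file:ln3Var", "file:ln5", 3, 3, "pd:st1")
```

Every trace tuple yields the same four-triple shape, which is what lets queries over the trace be written once and reused.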

The span abstraction is also represented in the form of triples. We implemented two models to realize the triples corresponding to spans. They are the list model and the set model. The elements in the list model are ordered based on time-stamp, whereas the elements in the set model are unordered. In principle, they both are equivalent because the information present in both the models is the same. The difference arises in the time taken to realize different properties of the span. We will empirically show it in Section 6.

We used the standard rdf:List to implement the list-model of spans. The rdf:List is a recursive list data structure that is in principle similar to lists in Prolog. It contains a sequence of cells connected by the property rdf:rest. The RDF standard represents the contents of each cell using the property rdf:first. In BOLD, the content of a cell is the value of a variable at time-stamp $t$. Additionally, we used the property timeStamp to assert the time-stamp value of a cell. The set-model is similar to the list-model except that the cells are not explicitly connected to each other. Instead, every cell has an additional property called index, which is used to maintain the relative position in the span.

Recall that, in our running example, globalMaxSpan contains the values of the globalMax variable in different iterations of the for-loop. The globalMaxSpan has a constant value -32767 in all the cells (because of a bug that will be identified in the next section). The RDF representation of the first cell in the span in both the list and the set models is presented in Figure 3(b) and Figure 3(c) respectively. In both the figures, the span is identified by the individual pd:l1. The first cell is identified by pd:l1 in the list model and pd:c1 in the set model. Note that the identifiers of the span and the first cell are the same in the list model because the list model is represented by a sequence of cells that are linked to each other (analogous to a singly linked list). The linkage is done through the property rdf:rest. In Figure 3(b), the last triple shows the link between the first and the second cells. When there are no more links, the rdf:nil individual is used to terminate the list. There are no such links in the set model. Instead, the index of the cells is represented using the property pd:index. In both the figures, the triples given in the curly braces describe the contents of the first cell. A similar set of triples is used to represent every other cell. This makes the representation standard, and querying and reasoning concise. The concepts of the ontology that aid in debugging are presented in Section 5.1.
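To make the difference between the two models concrete, here is a small Python sketch that emits triples for the same span in both shapes. It is a simplification under stated assumptions: the exact triples of the figure are not reproduced, the cell content is stored directly as a value, and the names pd:hasSpanCell, pd:hasValue, pd:timeStamp, pd:l2 and pd:c2 are used for illustration only. The terms rdf:first, rdf:rest, rdf:nil, pd:index, pd:l1 and pd:c1 follow the text.

```python
def span_as_list(cells, prefix="pd:l"):
    """List model: cells are chained with rdf:rest and terminated by rdf:nil."""
    triples = []
    n = len(cells)
    for i, (value, ts) in enumerate(cells, start=1):
        cell = f"{prefix}{i}"
        nxt = f"{prefix}{i + 1}" if i < n else "rdf:nil"
        triples += [
            (cell, "rdf:first", value),
            (cell, "pd:timeStamp", ts),
            (cell, "rdf:rest", nxt),
        ]
    return triples


def span_as_set(cells, span_id="pd:l1", prefix="pd:c"):
    """Set model: cells are not linked; each carries an explicit pd:index."""
    triples = []
    for i, (value, ts) in enumerate(cells, start=1):
        cell = f"{prefix}{i}"
        triples += [
            (span_id, "pd:hasSpanCell", cell),
            (cell, "pd:hasValue", value),
            (cell, "pd:timeStamp", ts),
            (cell, "pd:index", i),
        ]
    return triples


# Two cells of globalMaxSpan with hypothetical timestamps.
cells = [(-32767, 5), (-32767, 10)]
list_model = span_as_list(cells)
set_model = span_as_set(cells)
```

In the list model the head cell doubles as the span identifier (pd:l1), whereas in the set model the span individual points to each cell separately and ordering is recovered from pd:index.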

5 Ontology-based reasoning

Reasoning in the BOLD framework is the process of inferring the properties of spans created in the abstraction phase. There are two approaches to implement reasoning in the debugging systems. They are ontology-based-reasoning and rule-based-reasoning. Most of the existing declarative-debugging systems [coca, declarative_debugging1, declarative_debugging2] are based on rule-based-reasoning. The novel contribution of this work is the usage of ontology-based reasoning for debugging.

We discuss three advantages of using ontology-based reasoning compared to rule-based reasoning. Foremost, ontologies provide a standard way to define non-primitive concepts in terms of primitive concepts, properties, and individuals using the operators. So by using an ontology-based system, the vocabulary used in debugging can be standardized. Second, the ontology-based systems are designed to always produce consistent inferences. Such systems generate inferences only if the ontology axioms and rules are consistent. In contrast, the rule-based systems cannot detect inconsistencies in the rule specifications. Rules can be accidentally written to infer both an assertion and its negation. Third, the ontology-based-reasoning systems can produce accurate results in some cases of incomplete information, which can be practically useful. This is because they adopt the open-world assumption. This means the facts that are unavailable are considered to be unknown. We explain this difference with an example.

int main() {
  int A[] = {5, 9, 2, 4};
  f(A); // A is sorted in ascending order
  g(A); // The "sorted" order of A is unknown
  h(A); // A is sorted in non-ascending order
Figure 4: A partial C program to illustrate the advantages of ontology-based-reasoners

Consider the partial C program presented in Figure 4. All the functions f, g and h sort the array that is provided as the parameter. The program is executed and the trace contains the values of array A after the f and h function calls but does not contain information about A immediately after g. After the function calls f and h, the values of array A are in ascending and non-ascending orders respectively. From the intended semantics of the program, it is known that A is sorted after the function call g. But whether A is in ascending or non-ascending order is unknown. Suppose a query is posed to the reasoner asking whether the main function contains consecutive function call statements after which A is either in ascending or non-ascending order. As the information about A is unavailable after g, the rule-based-reasoners cannot find consecutive calls that satisfy the query. Hence, they return no as the answer. The ontology-based-reasoners can utilize the fact that every sorted array must be in either ascending or non-ascending order (this can be provided through the ontology). So A must be in one of the two orders after g. The reasoner then computes the answer based on cases. In the first case, where A is assumed to be ascending, the reasoner finds the consecutive function calls f and g. In the second case, where A is assumed to be non-ascending, the reasoner finds the consecutive function calls g and h. Thus, the reasoner returns the accurate answer (yes).
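The case analysis in this example can be mimicked with a brute-force Python sketch. It only illustrates the difference in outcome between treating a missing fact as false and treating it as unknown; it does not reflect how an OWL reasoner works internally, and the function names are hypothetical.

```python
from itertools import product

CALLS = ["f", "g", "h"]
KNOWN = {"f": "ascending", "h": "non-ascending"}  # the trace says nothing about g
ORDERS = ("ascending", "non-ascending")           # every sorted array is one of these


def has_consecutive_same_order(order_after):
    """True if two consecutive calls leave A in the same (known) order."""
    return any(order_after.get(a) is not None
               and order_after.get(a) == order_after.get(b)
               for a, b in zip(CALLS, CALLS[1:]))


def closed_world_answer():
    # Missing facts are treated as false: g simply has no order.
    return has_consecutive_same_order(KNOWN)


def open_world_answer():
    # Missing facts are unknown: the query is answered "yes" only if it
    # holds in every completion that is consistent with the background fact.
    unknown = [c for c in CALLS if c not in KNOWN]
    return all(
        has_consecutive_same_order({**KNOWN, **dict(zip(unknown, choice))})
        for choice in product(ORDERS, repeat=len(unknown)))
```

Here `closed_world_answer()` yields `False` while `open_world_answer()` yields `True`, because each of the two possible orders after g pairs up with one of its neighbours.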

Ontology-based reasoning systems also suffer from certain drawbacks. The first drawback is that, to maintain computational feasibility, these systems don’t allow new individuals to be created during the reasoning process. In practice, however, this is not a deterrent, especially for procedural style programs. This is because in program analyses, we often need to store the intermediate results. For example, if we are reasoning over a tree data structure, we might store the result of a traversal. Therefore, to address this drawback, we propose a preprocessing step in BOLD which creates new individuals to be utilized by ontology reasoners. The abstraction phase discussed in Section 3.3.2 is the preprocessing step.

The second drawback of ontology-based systems is that, as already mentioned in Section 2, the OWL-DL reasoners are exponential time algorithms [dlhandbook]. It means the reasoners are not scalable with increase in the size of data. We illustrate the limits empirically in Section 6. To overcome this drawback, we expressed the reasoning properties used for debugging in OWL-RL profile (refer to Section 5.1). As we will show in Section 6, the ontology expressed in OWL-RL scales better than the one expressed in OWL-DL.

5.1 Implementation in BOLD

Identifying span properties is helpful to determine the cause of program bugs. We discuss two types of properties: intra-span and inter-span. The intra-span properties are useful to verify the properties of the values of a variable abstracted into a span. This abstraction is mostly done after a particular statement in different iterations of a loop, or in a sequence of statements in a program, or relative to the statements that hold a particular value of a variable. The intra-span properties perform universal, existential, and comparison checks on the span. An example of a universal check is whether all the elements are positive. Similarly, one can check for negative, zero, non-positive, non-negative, and non-zero elements. An example of an existential check is whether the span contains a positive element. Similarly, one can check for a negative element, a zero element, duplicate elements and unique elements. The comparison checks verify "sorted"-ness properties of a span such as ascending, descending, non-ascending and non-descending. In our running example, we can verify the properties of globalMaxSpan and globalMaxFilteredSpan constructed on the globalMax variable. Recall that globalMaxSpan contains the values of the variable in all iterations of the for-loop and globalMaxFilteredSpan contains the values of the same variable in the iterations where the localMax variable is positive. If the program is correct, the globalMaxSpan must be a non-descending span and globalMaxFilteredSpan must be a span with all positive elements. In the running example, the first property is satisfied and the second property is violated. It means that even though the value of localMax is positive, globalMax is not updated. Hence there must be a bug in the lines related to the assignment of the globalMax variable.
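The three kinds of intra-span checks can be illustrated on plain Python lists. This is only a sketch of what the properties mean: in BOLD they are inferred by an ontology reasoner rather than computed by functions like these, and the function names and span lengths below are assumptions. The constant value -32767 follows the running example.

```python
def all_positive(span):
    """Universal check: every element is greater than zero."""
    return all(v > 0 for v in span)


def has_positive(span):
    """Existential check: at least one element is greater than zero."""
    return any(v > 0 for v in span)


def has_duplicates(span):
    """Existential check: some value occurs more than once."""
    return len(set(span)) < len(span)


def is_non_descending(span):
    """Comparison check: each element is <= its successor."""
    return all(a <= b for a, b in zip(span, span[1:]))


def is_ascending(span):
    """Comparison check: each element is strictly < its successor."""
    return all(a < b for a, b in zip(span, span[1:]))


# globalMax keeps its initial value in every iteration of the buggy program.
global_max_span = [-32767, -32767, -32767]
global_max_filtered_span = [-32767, -32767]

first_property = is_non_descending(global_max_span)        # expected to hold
second_property = all_positive(global_max_filtered_span)   # expected to fail
```

The first property holds trivially for a constant span, while the second fails, which is the symptom that points at the update of globalMax.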

The inter-span properties are useful to verify the properties that relate two different spans. Note that the two spans can belong to different variables / expressions. They are useful to verify loop invariants, assertions described as if-else rules, relations between variables etc. They are most likely to be verified within a block of statements such as a particular loop or a particular instance of a function call. We address this issue using the idea of comparable spans. Let the sequences of cells in two spans $S_1$ and $S_2$ be described as $\langle a_{t_1}, \ldots, a_{t_n} \rangle$ and $\langle b_{u_1}, \ldots, b_{u_n} \rangle$, where $a_{t_i}$ is the value in span $S_1$ at timestamp $t_i$. Two spans $S_1$ and $S_2$ are comparable if $t_i \le u_i < t_{i+1}$ for $1 \le i < n$ and $t_n \le u_n$. Intuitively, in two comparable spans $S_1$ and $S_2$, the cell at index $i$ of $S_2$ is ordered based on the timestamp between the cells at index $i$ and $i+1$ of $S_1$. In the BOLD framework, to perform coherent operations, we require that all inter-span properties be verified only on comparable spans. The inter-span properties are used to verify the relation between the values in two spans at the same index position. The relationship can be any of the standard comparison operators such as $<$, $\le$, $=$, $\ge$, $>$. In the running example, we can define a span on the localMax variable that contains the values of the variable recorded at a fixed statement in different iterations of the loop. If the program is correct, each value in this span is less than or equal to the same-index value in globalMaxSpan.
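A possible reading of comparability and of an index-wise inter-span check is sketched below in Python. The boundary handling (the $i$-th timestamp of the second span may equal the $i$-th timestamp of the first but must precede the $(i+1)$-th) is an assumption of this sketch, as are the function names and the sample timestamps and values.

```python
import operator


def comparable(s1, s2):
    """Spans are lists of (timestamp, value). Cell i of s2 must fall between
    cells i and i+1 of s1 (inclusive on the left: an assumption)."""
    if len(s1) != len(s2):
        return False
    for i, ((t, _), (u, _)) in enumerate(zip(s1, s2)):
        if u < t:
            return False
        if i + 1 < len(s1) and u >= s1[i + 1][0]:
            return False
    return True


def holds_pointwise(s1, s2, relation):
    """Check `relation` between same-index values of two comparable spans."""
    if not comparable(s1, s2):
        raise ValueError("inter-span properties require comparable spans")
    return all(relation(a, b) for (_, a), (_, b) in zip(s1, s2))


# Hypothetical spans for three loop iterations of the buggy running example.
local_max_span = [(4, 3), (9, 0), (14, 4)]
global_max_span = [(5, -32767), (10, -32767), (15, -32767)]

invariant_holds = holds_pointwise(local_max_span, global_max_span, operator.le)
```

With these sample values the spans are comparable but the invariant "localMax is at most globalMax" fails at the first index, which is the kind of violation an inter-span property is meant to expose.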

Description of the properties of span in PD Ontology: Recall that span is implemented as either the list or the set model, and the PD ontology is expressed using either OWL-DL or OWL-RL. These combinations lead to three versions of the ontology: OWL-DL-list, OWL-DL-set, and OWL-RL-list. The vocabulary used to abstract the span in the list and the set models is already discussed in Section 4. We now provide an example of the description of a typical property called non-zero span property in these three versions.

Recall that an OWL-DL ontology can be used in conjunction with SWRL rules. The description of the property in the OWL-DL-list version is created using the rules R1, R2 and the axioms A1, A2 in Table 1. The rule R1 declares that an RDF list ?l1 whose first cell is zero is an instance of the concept ListWithZeroElement. The rule R2 declares that if the sub-list ?l2 of the RDF list ?l1 belongs to the ListWithZeroElement concept, then ?l1 also belongs to the same concept. In these rules, the terminology used in connection with rdf:List follows from the RDF standard [rdf]. The axioms A1 and A2 give the definitions of the concepts ListWithAllNon-ZeroElements and SpanWithAllNon-ZeroElements expressed in OWL-DL. The Span is also an rdf:List that is not included in any other list (analogous to the head of a singly linked list). The definition of the non-zero span property in the OWL-DL-set version is given in A3. The meaning of the axiom is as follows. The SpanWithAllNon-ZeroElements concept is defined as the span that has no cell with value zero. In general, the definitions of the properties of span expressed in the OWL-DL-set version are more concise than those in the OWL-DL-list version.

Recall that OWL-RL is intentionally designed to be less expressive than OWL-DL to preserve the tractability of reasoning. It does not allow SWRL rules and many DL operators such as concept-equivalence. It adopts a peculiar asymmetric syntax in the usage of quantifiers in the class subsumption axioms. The existential quantifier is allowed only on the left-hand side and the universal quantifier only on the right-hand side. In the OWL-RL-list version of the PD ontology, we declared the ListWithZeroElement concept as a primitive concept. Even though there is a loss of precision, the advantages can be seen in the time taken for reasoning (discussed in Section 6.2). The non-zero span property is defined by making a slight modification to the axioms A1 and A2. As the equivalence operator is disallowed, it is replaced with the subsumption operator. The descriptions of the other properties of span are similar to the non-zero span property defined above, and we omit those for brevity.

(R1) rdf:List(?l1) $\wedge$ rdf:first(?l1,?a) $\wedge$ swrlb:equals(?a,0) $\rightarrow$ ListWithZeroElement(?l1)
(R2) rdf:List(?l1) $\wedge$ rdf:rest(?l1,?l2) $\wedge$ ListWithZeroElement(?l2) $\rightarrow$ ListWithZeroElement(?l1)
(A1) ListWithAllNon-ZeroElements $\equiv$ rdf:List $\sqcap$ $\neg$ListWithZeroElement
(A2) SpanWithAllNon-ZeroElements $\equiv$ Span $\sqcap$ ListWithAllNon-ZeroElements
(A3) SpanWithAllNon-ZeroElements $\equiv$ Span $\sqcap$ $\neg\exists$hasSpanCell.($\exists$hasValue.xsd:long[$= 0$])
Table 1: Description Logic axioms and SWRL rules for the non-zero span property in the PD ontologies
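The intent of rules R1 and R2 and of the set-model definition can be paraphrased procedurally. The Python sketch below is only an analogy under stated assumptions: it evaluates the property under a closed-world reading (a list has no zero element if none is found), which is not how an OWL-DL reasoner treats negation, and the dictionary encoding of list cells and the function names are hypothetical.

```python
def list_with_zero_element(cells, head):
    """Walk an rdf:List-like chain.

    `cells` maps a cell id to (first, rest). Finding a zero in a cell
    corresponds to R1; moving on to the rest of the list corresponds to
    R2 propagating membership from a sub-list to the enclosing list.
    """
    node = head
    while node != "rdf:nil":
        first, rest = cells[node]
        if first == 0:
            return True
        node = rest
    return False


def span_with_all_non_zero_list(cells, head):
    """List model: the span's list is not a ListWithZeroElement (closed-world)."""
    return not list_with_zero_element(cells, head)


def span_with_all_non_zero_set(cell_values):
    """Set model: no cell of the span has the value zero."""
    return not any(v == 0 for v in cell_values)


good = {"pd:l1": (-32767, "pd:l2"), "pd:l2": (-32767, "rdf:nil")}
bad = {"pd:l1": (-32767, "pd:l2"), "pd:l2": (0, "rdf:nil")}
```

The list version has to follow the chain cell by cell, whereas the set version is a single condition over the cells; this mirrors why the set-model definition is the more concise of the two.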

6 Experiments

The BOLD system is implemented in Java and it enabled us to use the existing SPARQL engine [jena] and Pellet [pellet] reasoner. We have used SIR [sir], a popular bug benchmark repository for experiments. The LOC of the benchmarks are presented in Table 2. Each benchmark contains the correct version of the program and several faulty versions seeded with bugs. We used the BOLD system to debug faulty versions and to empirically assess the performance of different modules. The objective of the assessments is to measure the performance of the main features of BOLD – querying and reasoning. We present the querying-related experiments in Section 6.1 and the reasoning-related experiments in Section 6.2. For faithful comparison, we have debugged the same faulty versions using gdb.

Debugging is an art, which depends upon the expertise of the user in the profession and the user’s knowledge of the program. The productivity assessment of debugging involves subjective factors such as guessing the code location that might have caused the bug. For the sake of automation as well as computing precision, we restricted our experiments to objective and reproducible parameters. To debug a given faulty program, we have created a sequence of the typical actions a user is likely to perform. We have implemented the same action-sequence in both BOLD and gdb. The action-sequence along with the properties of spans verified for different bugs in this section are explained in the supplementary material.

All the experiments have been performed on a system with Intel Sandy Bridge processor, 10GB RAM, and Ubuntu 16.04 operating system. The reported times are obtained by calculating the arithmetic average of five runs with the same input.

6.1 Performance of Querying

The querying-related experiments compare the times taken to query the same trace information in BOLD and in gdb. For faithful comparison, we have avoided any manual intervention and have used fully automated versions of the two systems. A series of commands is provided to both the systems at startup.

Since our instrumentation and querying using SPARQL may add overheads, it is imperative to check how BOLD’s trace generation, information gathering, and querying perform against the well-optimized gdb. We observe that BOLD adds only a minimal overhead, as discussed below. The time taken for querying in both the systems is presented in Table 2. The gdb-Debug column indicates the time consumed by gdb. The BOLD-Query column indicates the time taken to generate the trace and query the trace for the relevant information. Since gdb actually executes the program, to be fair, we have included the time taken to generate the trace in the BOLD-Query time (trace generation also requires program execution). From the gdb-Debug and BOLD-Query columns it can be seen that both the systems exhibit comparable performance. In all the benchmarks except printtokens2, one of the systems performs better than the other only by a small margin (order of milliseconds). The considerably larger time in the printtokens2 benchmark is due to the use of an inter-span property (isEqualSpanOf). Computation of this property builds upon the results of two SPARQL queries. Hence, BOLD-Query takes longer. However, we note that the extra time helps avoid the manual labor needed in gdb. In gdb, it is a manual task to compare the information at two distinct instances of the statements, which is not accounted for in the gdb-Debug time.

The last column presents the time taken for abstraction in BOLD. This time includes the BOLD-Query time because the abstraction is performed on the results of the queries. It can be seen that the major part of this time involves querying. For all the benchmarks except flex, the time taken for abstraction excluding querying is almost constant. The flex benchmark does not require abstractions because the debug-operation is to find the program path taken in a function during the execution. The path can be retrieved by integrating the source-code triples (for identifying the branch statements) and execution-trace triples. Finally, the operation is formulated as an integrated query (explained in Section 3.3.1).

6.2 Performance of reasoning

The reasoning-related experiments compare the performance of BOLD in the three versions of the PD ontology described in Section 5.1. Recall that such reasoning is not possible with the existing debuggers. The time taken for reasoning about the properties of the spans constructed to debug the SIR benchmarks is given in Table 3. These spans are constructed over the abstractions of the benchmarks reported in Table 2. The Span size column indicates the number of elements in the span. The OWL-DL-list version and the OWL-DL-set version columns indicate the time taken for reasoning when the span is modelled as a list and as a set respectively, using the OWL-DL language. The last column, OWL-RL-list version, indicates the reasoning time when the span is modelled as a list, using the OWL-RL language. When the span size is small (1 or 2 elements in Table 3), the OWL-DL-set version consumes slightly more time than the OWL-DL-list version. When the span size is large (6 or more elements in Table 3; we call these big spans), the OWL-DL-set version performs considerably faster than the OWL-DL-list version. Our analysis of this performance difference is presented below.

The properties verified using the big spans in our experiments are all either universal checks (all elements in the span are the same / all are non-zero) or comparison checks (increasing span). In the set version, a single piece of negative evidence is sufficient to settle these checks, whereas in the list version the reasoner has to traverse all the elements because the RDF lists are essentially recursive lists. The axioms and rules explained in Table 1 and Section 5.1 for the non-zero span property provide sufficient justification. The last version, OWL-RL-list, clearly outperforms both the OWL-DL versions because of the use of the OWL-RL ontology. The complexity of the Pellet [pellet] reasoner used for reasoning depends on the ontology. When used with the OWL-RL ontology, it shows tractable performance.

It can be seen in Table 3 that as the span size increases, the performance of BOLD changes differently for the different versions. So to test the scalability of each version, we increased the size of the span on a particular benchmark and verified an (arbitrary) property: all elements are non-zero. We enforced a time-out of 2 minutes. The results of the experiment are shown in Table 4. The two versions with the OWL-DL language timed out as the span grew: the OWL-DL-list version from span size 100 onwards and the OWL-DL-set version from span size 200 onwards. The version with the OWL-RL language clearly shows effective response time even for bigger spans.

Debugging is an interactive activity, and the response time of the system is crucial for practical usage. Our experiments reveal that OWL-RL language provides good performance at scale. OWL-DL language can be used if the span sizes are small.

                         Execution time (in msec)
Benchmark      LOC       gdb-Debug   BOLD-Query   BOLD-Abstraction
flex           14034     210         290          NA
grep           10929     205         193          220
printtokens    563       206         236          280
printtokens2   510       235         637          733
schedule       412       210         187          218
sed            8059      210         241          274
space          9126      232         203          229
totinfo        406       231         202          237
Table 2: Time taken for querying in gdb and BOLD on SIR benchmarks
                           Execution time (in msec)
Benchmark      Span size   OWL-DL-list version   OWL-DL-set version   OWL-RL-list version
grep           1           354                   452                  382
printtokens    1           390                   475                  388
printtokens2   12          10091                 3300                 631
schedule       7           1411                  720                  548
sed            2           460                   500                  402
space          2           425                   520                  397
totinfo        6           1250                  809                  507
Table 3: Time taken for reasoning in BOLD by using different sub-languages of OWL
            Execution time (in sec)
Span size   OWL-DL-list   OWL-DL-set   OWL-RL-list
10          2.5           1.1          0.22
30          28.1          2.1          0.23
50          115.8         19.1         0.24
100         Timeout       32.1         0.27
200         Timeout       Timeout      0.32
1000        Timeout       Timeout      0.78
5000        Timeout       Timeout      16.23
Table 4: Scalability of reasoning in BOLD by using different sub-languages of OWL (Timeout = 120 seconds)

7 Related work

We discuss related work in three categories. First, we discuss the existing approaches for instrumentation. The source code preprocessors (such as gcc, Rose [rose_instrumentation], OPARI [opari_instrumentation]) typically provide directives to control instrumentation. They focus on a specific issue or use hard-coded commands in the program, which affects the readability and maintenance of the code. Many tools [custom1_instrumentation, custom2_instrumentation, custom3_instrumentation] provide custom methods to control instrumentation. These tools lack a standard terminology to define the instrumentation specification. Aspect-oriented programming [aop_instrumentation] generalizes functional decomposition and separates aspect code from the original code. This paradigm is successfully used for instrumentation by [aop_cpp_instrumentation]. However, the drawback is that it is tied to the programming language and does not provide standardization. There are languages (such as PTQL [ptql_instrumentation], PQL [pql_instrumentation], UFO [ufo_instrumentation]) designed to instrument queries into the source code. In principle, BOLD offers a more comprehensive solution than these languages. Further, unlike these languages, in BOLD, programs need not be recompiled for each new query.

Second, we discuss the existing log-based debuggers and the tools that support high-level abstractions over the trace. The ODB system [odb], designed for Java programs, uses an ad hoc format to store the trace. The Tralfamadore [tralfamadore] system is based on the ideas of streaming databases. The TOD [pothier] system uses distributed databases to support storage and query processing. These systems provide an interface to navigate through sub-parts of the execution trace. The replay-based debuggers log information at specific places called checkpoints. Some examples include UndoDB [undodb] and URDB [urdb] for user-level programs. A survey on replay-based debugging can be found in Engblom [engblom]. On the other hand, some tools provide high-level abstractions to simplify debugging. The Expositor tool [expositor] abstracts the execution trace as a list and the users can query the trace using a list-processing API. The MzTake [mztake], Dalek [dalek], and EBBA [ebba] systems adopt a similar approach. The Coca [coca] system allows users to write Prolog predicates over program states. Overall, these debuggers lack a standard semantic model to store and analyse the execution trace.

Third, we discuss the application of ontologies in different problems related to programming. The PATO [ecoop] and CodeOntology [codeontology] frameworks proposed ontologies for the C and Java languages respectively. The PATO framework [ecoop] converts C programs into triples and performs static analysis using those. The SmartAPI [smartapi] approach uses ontologies for automatic code generation. It expands libraries with a domain ontology, which makes it easier to locate the code for a given goal. Ontologies are used to publish meta-information of software [swo]. This caters to the needs of a broad range of users. Ontologies are also used for code maintenance [lassie] and for teaching programming languages [ontologyteaching1, ontologyteaching2]. The applications of ontologies in software engineering are discussed by [sereference].

In summary, the BOLD framework differs from the existing works by providing a standard and extensible, ontology-based semantic model for debugging.

8 Conclusion and future work

We presented a unified framework called BOLD for debugging. The framework integrates different tasks in debugging such as instrumentation, representation, and analysis of execution trace in a standard manner. The standardization is accomplished using an OWL ontology called PD ontology. The ontology provides different properties of spans (sub-sequences of trace list) to simplify debugging. We have used the popular SIR benchmarks to demonstrate the effectiveness of the framework. The experiments indicate that we are able to diagnose the causes of the bugs in SIR benchmarks. The time taken is also practical for an interactive session.

A very interesting direction for future work is to extend the BOLD framework to create and reason about program assertions. The current approaches for assertions are restricted to a single statement. Ontologies provide a general way to create assertions about whole programs. These assertions can be used to automatically identify and generate explanations for the bugs.