## 1 Introduction

Test oracles (simply called oracles) are usually used to evaluate the correctness of a system's responses to test data. In black-box testing approaches, test data are usually generated from machine-readable specifications, which can also be used to automate the evaluation of responses and the production of verdicts on the presence of faults. In white-box testing approaches [8], test data serve to cover some artifacts during executions of a system, and an expert who plays the role of the oracle evaluates the responses. Devising proper automated oracles is necessary; however, it is a tedious task that almost always requires human expertise. Efforts are needed to facilitate this task [2, 20] and to reduce the intervention of experts in recurrent test activities.

Our work considers a typical conformance testing scenario [11], where an oracle is a deterministic finite state machine (DFSM). However, uncertainty can occur in devising oracles, e.g., as a consequence of misunderstanding or misinterpreting system requirements, which are often described in natural language [7, 3, 6]. As a result of the uncertainty, a set of candidate oracles can be proposed. For example, machine learning-based translation approaches [7, 18] for reactive systems return the most likely DFSM, but this DFSM may be undesired because of decisions made by the automated translation procedures. Instead, such approaches could automatically return a set of candidate oracles whose likelihood is above a certain threshold. On the other hand, when a candidate oracle is available (e.g., in the form of a program under test), a set of its versions can be produced by mutating it with operations that mimic the introduction or the correction of faults. Such a set can be compactly represented by a non-deterministic finite state machine (NFSM), which thus models an imprecise oracle. The candidate oracles are called precise, as opposed to the imprecise oracle that defines them. Devising an oracle then consists in mining the proper candidate from the imprecise oracle.

In this paper we propose an approach to mining the proper oracle from an imprecise oracle represented by an NFSM. An expert answers queries about the correctness of the NFSM's responses; each answer is either yes or no. Based on the answers, the proper DFSM is mined automatically. We assume that the proper oracle is not available to the expert and that the expert may have limited time for answering the queries. In this context, the expert cannot check the equivalence between a candidate oracle and the unavailable proper oracle, so polynomial-time active learning approaches inspired by [1] are less suitable for devising the proper DFSM. In our approach, distinct responses to the same test data make it possible to distinguish between candidate oracles. Responses, as well as the corresponding test data, are computed automatically. Our approach is iterative and applies the "divide and conquer" principle to a current set of "good" candidates. At each iteration step, the current candidate set is divided into a subset of "good" candidates exhibiting "expected" responses to test data and the complementary subset of "bad" ones. The approach uses a Boolean encoding of the imprecise oracle and takes advantage of the efficiency of constraint solvers to search for good candidates.

The paper is organized as follows. The next section provides preliminary definitions. In Section 3, we describe the oracle mining problem and introduce the steps of our solution to it. In Section 4, we propose a Boolean encoding for an imprecise oracle and for test-equivalent candidates; then we present the reduction of an imprecise oracle based on the selection of expected responses by experts. In Section 5, we propose a procedure for verifying the adequacy of a test set for mining an oracle, and a mining procedure based on the automatic generation of test data. Experiments demonstrating the applicability of the approach are presented in Section 6. In Section 7, we present the related work. We conclude in Section 8.

## 2 Preliminaries

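A Finite State Machine (FSM) is a 5-tuple $M = (S, s_0, I, O, T)$, where $S$ is a finite set of states with initial state $s_0 \in S$; $I$ and $O$ are finite non-empty disjoint sets of inputs and outputs, respectively; $T \subseteq S \times I \times O \times S$ is a transition relation, and a tuple $(s, i, o, s') \in T$ is called a transition from $s$ to $s'$ with input $i$ and output $o$. The set of transitions from state $s$ is denoted by $T(s)$, and $T(s, i)$ denotes the set of transitions in $T(s)$ with input $i$. For a transition $t = (s, i, o, s')$, we define $src(t) = s$, $inp(t) = i$, $out(t) = o$ and $tgt(t) = s'$. The set of uncertain transitions in an object $X$ (a machine or an execution) is denoted by $Unc(X)$. Transition $t$ is uncertain if $|T(src(t), inp(t))| > 1$, i.e., several transitions from state $src(t)$ have the same input as $t$; otherwise $t$ is certain. The number $|T(s, i)|$ is called the uncertainty degree of state $s$ on input $i$, and $ud(M) = \max_{s \in S, i \in I} |T(s, i)|$ defines the uncertainty degree of $M$. We say that $M$ is deterministic (DFSM) if it has no uncertain transition; otherwise it is non-deterministic (NFSM). In other words, $ud(M) = 1$ if $M$ is deterministic. $M$ is completely specified (a complete FSM) if for each tuple $(s, i) \in S \times I$ there exists a transition $(s, i, o, s') \in T$.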
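To make these definitions concrete, the following minimal Python sketch stores transitions as plain `(src, inp, out, tgt)` tuples and computes $T(s, i)$, the uncertain transitions, and the determinism and completeness checks. The class and method names (`FSM`, `T_si`, `degree`, `uncertain`) are illustrative assumptions, not part of the paper, and the per-state degree is assumed to be $|T(s, i)|$.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    src: int
    inp: str
    out: str
    tgt: int


class FSM:
    """Illustrative FSM (S, s0, I, O, T); T may be non-deterministic."""

    def __init__(self, states, s0, inputs, outputs, transitions):
        self.states, self.s0 = set(states), s0
        self.inputs, self.outputs = set(inputs), set(outputs)
        self.T = {Transition(*t) for t in transitions}

    def T_si(self, s, i):
        """T(s, i): the transitions leaving state s on input i."""
        return {t for t in self.T if t.src == s and t.inp == i}

    def degree(self, s, i):
        """Uncertainty degree of state s on input i, taken here as |T(s, i)|."""
        return len(self.T_si(s, i))

    def uncertain(self):
        """Transitions that share their source state and input with another one."""
        return {t for t in self.T if self.degree(t.src, t.inp) > 1}

    def is_deterministic(self):
        return not self.uncertain()

    def is_complete(self):
        return all(self.T_si(s, i) for s in self.states for i in self.inputs)


# A small NFSM: (0, 'a') and (1, 'b') each have two conflicting transitions.
M = FSM([0, 1], 0, "ab", "xy", [
    (0, "a", "x", 1), (0, "a", "y", 0), (0, "b", "x", 0),
    (1, "a", "x", 0), (1, "b", "y", 1), (1, "b", "x", 0),
])
```

Here `M.uncertain()` returns the four transitions leaving state 0 on `a` and state 1 on `b`, so `M` is an NFSM, while it remains complete.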

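An execution of $M$ in state $s$ is a finite sequence $e = t_1 t_2 \ldots t_n$ of transitions forming a path from $s$ in the state transition diagram of $M$, i.e., $src(t_1) = s$ and $src(t_{j+1}) = tgt(t_j)$ for every $j \in [1, n-1]$. Execution $e$ is deterministic if every $t_j$ is the only transition in $e$ that belongs to $T(src(t_j), inp(t_j))$, i.e., $e$ does not include several uncertain transitions from the same state with the same input. $e$ is simply called an execution of $M$ if $s = s_0$. $M$ is initially connected if for any state $s \in S$ there exists an execution of $M$ to $s$. A DFSM has only deterministic executions, while an NFSM can have both deterministic and non-deterministic ones. A trace is a pair $\alpha/\beta$ of an input sequence $\alpha \in I^*$ and an output sequence $\beta \in O^*$, both of the same length. The trace of $e$ is $inp(t_1) \ldots inp(t_n) / out(t_1) \ldots out(t_n)$. A trace of $M$ in $s$ is a trace of an execution of $M$ in $s$. Let $Tr_M(s)$ denote the set of all traces of $M$ in $s$, and $Tr_M$ denote the set of traces of $M$ in the initial state $s_0$. Given a sequence $\gamma \in (I \cup O)^*$, the input (resp. output) projection of $\gamma$, denoted $\gamma_{\downarrow I}$ (resp. $\gamma_{\downarrow O}$), is the sequence obtained from $\gamma$ by erasing the symbols in $O$ (resp. $I$). If $\alpha/\beta$ is the trace of execution $e$, then $\alpha$ (resp. $\beta$) is called the input (resp. output) sequence of $e$, and we say that $\beta$ is the response of $M$ in $s$ to (the application of) input sequence $\alpha$. $|X|$ denotes the size of set $X$.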

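Two complete FSMs are distinguished by an input sequence for which they produce different responses. Given an input sequence $\alpha$, let $out_M(s, \alpha)$ denote the set of responses that $M$ can produce when $\alpha$ is applied in state $s$, that is, $out_M(s, \alpha) = \{\beta \mid \alpha/\beta \in Tr_M(s)\}$. Given states $s$ and $s'$ of an FSM $M$ and an input sequence $\alpha$, $s$ and $s'$ are $\alpha$-distinguishable, denoted by $s \not\simeq_\alpha s'$, if $out_M(s, \alpha) \neq out_M(s', \alpha)$; then $\alpha$ is called a distinguishing input sequence for $s$ and $s'$. $s$ and $s'$ are $\alpha$-equivalent, denoted by $s \simeq_\alpha s'$, if $out_M(s, \alpha) = out_M(s', \alpha)$. $s$ and $s'$ are distinguishable, denoted by $s \not\simeq s'$, if they are $\alpha$-distinguishable for some input sequence $\alpha$; otherwise they are equivalent. Let $\alpha = \alpha' i$ with $i \in I$. A distinguishing input sequence $\alpha$ for $s$ and $s'$ is minimal if $\alpha'$ is not distinguishing for $s$ and $s'$. Two complete DFSMs $D_1$ and $D_2$ over the same input and output alphabets, with initial states $s_0^1$ and $s_0^2$, are distinguished by input sequence $\alpha$ if $out_{D_1}(s_0^1, \alpha) \neq out_{D_2}(s_0^2, \alpha)$.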
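The response set $out_M(s, \alpha)$ of a possibly non-deterministic machine can be computed by tracking every reachable (state, outputs so far) pair, which gives a direct check for $\alpha$-distinguishability. The sketch below is illustrative only: the function names are hypothetical, transitions are assumed to be `(src, inp, out, tgt)` tuples, and the minimality check tests only $\alpha'$ (the prefix without the last input), as in the definition.

```python
def responses(T, s, alpha):
    """out_M(s, alpha): every output sequence M can produce when alpha is applied in s."""
    frontier = {(s, ())}  # pairs (current state, outputs produced so far)
    for i in alpha:
        frontier = {(tgt, outs + (o,))
                    for (q, outs) in frontier
                    for (src, inp, o, tgt) in T
                    if src == q and inp == i}
    return {outs for (_, outs) in frontier}


def distinguishes(T, s1, s2, alpha):
    """True if alpha is a distinguishing input sequence for s1 and s2."""
    return responses(T, s1, alpha) != responses(T, s2, alpha)


def is_minimal(T, s1, s2, alpha):
    """alpha = alpha' i is minimal if it distinguishes s1, s2 but alpha' does not."""
    return distinguishes(T, s1, s2, alpha) and not distinguishes(T, s1, s2, alpha[:-1])


T = {(0, "a", "x", 1), (0, "a", "y", 0), (0, "b", "x", 0),
     (1, "a", "x", 0), (1, "b", "y", 1), (1, "b", "x", 0)}
```

For this NFSM, input `a` already distinguishes states 0 and 1 (state 0 may answer `x` or `y`, state 1 only `x`), so `aa` is distinguishing but not minimal.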

Henceforth, FSMs and DFSMs are complete and initially connected.

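Given an NFSM $M = (S, s_0, I, O, T)$, an FSM $M' = (S', s'_0, I, O, T')$ is a submachine of $M$, denoted by $M' \sqsubseteq M$, if $S' \subseteq S$, $s'_0 = s_0$ and $T' \subseteq T$.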

We will use an NFSM to represent a set of candidate DFSMs. We let $Dom(M)$ denote the set of candidate DFSMs included in NFSM $M$, i.e., its deterministic submachines. Later, we will be interested in executions of $M$ that are executions of a DFSM in $Dom(M)$. Let $e$ be an execution of an NFSM $M$ in state $s$. We say that $e$ involves a submachine $M' = (S', s_0, I, O, T')$ of $M$ if $Unc(e) \subseteq T'$, i.e., all the uncertain transitions in $e$ are defined in $M'$. The certain transitions are defined in each DFSM in $Dom(M)$, but distinct DFSMs in $Dom(M)$ define distinct sets of uncertain transitions.
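Because every candidate picks exactly one transition in each $T(s, i)$, $Dom(M)$ can in principle be enumerated as a Cartesian product, which also shows why its size grows as $\prod_{s,i} |T(s,i)|$. The sketch below is a brute-force illustration with hypothetical names (`candidates`, `involves`); it keeps transitions from every state, including unreachable ones, and is not the encoding-based method of the paper.

```python
from itertools import product


def candidates(T):
    """Enumerate Dom(M): choose exactly one transition in every non-empty T(s, i)."""
    groups = {}
    for t in sorted(T):
        groups.setdefault((t[0], t[1]), []).append(t)
    for choice in product(*groups.values()):
        yield frozenset(choice)


def involves(execution, candidate):
    """e involves a candidate if the candidate defines the transitions of e.
    Checking all transitions is equivalent to checking Unc(e) here, because
    certain transitions belong to every enumerated candidate."""
    return all(t in candidate for t in execution)


T = {(0, "a", "x", 1), (0, "a", "y", 0), (0, "b", "x", 0),
     (1, "a", "x", 0), (1, "b", "y", 1), (1, "b", "x", 0)}
```

With two uncertain choices of two transitions each, this NFSM yields $2 \times 2 = 4$ candidates, and the execution using `(0, 'a', 'x', 1)` then `(1, 'b', 'y', 1)` involves exactly one of them.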

## 3 The Oracle Mining Problem and Overview of the Proposed Solution

Oracles play an important role in testing and verification activities; in particular, they define and evaluate the responses of implementations to given tests. The evaluation serves to provide verdicts on the presence of faults in the implementations. Letting experts play the role of an oracle is expensive: the experts must intervene in recurrent test campaigns to judge a large number of responses. For these reasons, automated test oracles are preferred.

Devising precise oracles (oracles, for short) is a challenging task that may require resolving uncertainty, as discussed in Section 1. Fully automating this task may result in undesired oracles. Inspired by previous work [5, 12], we represent oracles with DFSMs and tests with input sequences.

We propose a semi-automated mining approach for devising oracles. First, we suggest modelling uncertainties with non-deterministic transitions in an NFSM. This NFSM represents an imprecise oracle; it defines conflicting outputs for the same input applied in the same state. It also defines a possibly large number of candidate oracles (candidates, for short), which are the DFSMs included in it. Second, experts make decisions that resolve uncertainties and enable the automatic extraction of the proper candidate. The decisions concern the evaluation and the selection of conflicting responses. The fewer the decisions, the less experts need to intervene in the mining process and in the recurrent testing activities that use the selected oracle.

Let an NFSM $P$ represent an imprecise oracle. We say that $D \in Dom(P)$ is the proper oracle w.r.t. experts if $D$ always produces the expected responses to every test, according to the experts' point of view; otherwise $D$ is inappropriate. Equivalent DFSMs represent an identical oracle. In practice, the uncertainty degree of $P$ should be much smaller than its maximal value ; we believe that it could be smaller than the maximum of and . The oracle mining problem is to select the proper oracle in $Dom(P)$ with the help of an expert. We assume that $Dom(P)$ always contains the proper oracle.

The NFSM in Figure (a) represents an imprecise oracle. It defines eight candidate oracles with six uncertain transitions, namely . Figure (c) and Figure (d) present two candidates; one of them is proper.

Mining the proper oracle is challenging even with the help of an expert, especially when the NFSM for an imprecise oracle defines a large number of candidates. Enumerating the candidates one by one may be infeasible because of their sheer number. A naive approach could consist in deactivating, in each state of the NFSM, the transitions producing outputs evaluated as unexpected by the expert. This naive approach does not work. For example, the imprecise oracle in Figure 5 has four executions with input sequence , namely , , and . The two plausible responses for these executions are and . The latter is expected, as it is produced by the proper oracle in Figure (c).

All executions but one produce the desired output in state 3 on the last input . One could deactivate or remove transition on the grounds that it produces the last, undesired output in the unexpected response. Consequently, the reduced imprecise oracle would not define . Any candidate not defining is not equivalent to the proper oracle. Hence, selecting individual transitions from transition sequences fails to mine the proper oracle, because it is the entire sequences of transitions used to reach states (and hence their input-output sequences) that define the proper candidate.

Our oracle mining approach relies on experts evaluating the responses (instead of isolated outputs) of the candidates to tests. The principle of the approach is iterative and quite simple. At each iteration step, we first use pairs of candidates to generate tests. Next, we generate the plausible responses to the generated tests. Then we let experts select the expected responses. Finally, we remove from the candidate set the candidates producing unexpected responses; this can be done by deactivating transitions in the imprecise oracle and by removing candidates from the set of solutions of the Boolean formulas. The iteration continues as long as two remaining candidates are distinguishable. Storing each and every candidate can require a lot of memory, especially when there are many of them. To reduce memory usage, we encode candidates with Boolean formulas and use a solver to retrieve candidates from the Boolean encodings. The Boolean encoding is also useful for representing the candidates already used to generate distinguishing tests.

In the next section we propose Boolean encodings for the DFSMs included in an NFSM and for the test-equivalent DFSMs. We also present how to deactivate/remove transitions in an NFSM to model reduced candidate sets.

## 4 Boolean Encodings

Let $P = (S, s_0, I, O, T)$ be an imprecise oracle. $P$ represents a set of candidate oracles, i.e., a set of DFSMs. We encode candidates with Boolean formulas over variables representing the transitions in $P$. A solution of a formula determines the transitions corresponding to the variables it assigns to True. An FSM is determined (encoded) by a formula if exactly its transitions are determined by a solution of the formula.

### 4.1 Candidates in an imprecise oracle

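Let $X = \{x_1, \ldots, x_n\}$ be a set of variables, each corresponding to a transition in $P$. Let us define the Boolean expression $\mathit{one}(X)$ as follows:

$$\mathit{one}(X) = \Big(\bigvee_{k=1}^{n} x_k\Big) \wedge \bigwedge_{k=1}^{n} \Big(x_k \Rightarrow \bigwedge_{j \neq k} \neg x_j\Big)$$

Every solution of $\mathit{one}(X)$ assigns True to exactly one variable in $X$. Indeed, $\mathit{one}(X)$ is True if both of its conjuncts are True. The first conjunct $\bigvee_k x_k$ is True whenever at least one $x_k$ is True. If some $x_k$ is True, then every $x_j$, $j \neq k$, must be False for the second conjunct to be True. So every solution of $\mathit{one}(X)$ determines exactly one transition, namely the one corresponding to the only variable in $X$ that the solution assigns to True.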
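The exactly-one constraint is usually handed to a solver in clause form: one "at least one" clause plus a pairwise clause $\neg x_k \vee \neg x_j$ for every pair, which is logically equivalent to the implications $x_k \Rightarrow \neg x_j$ above. The sketch below builds these clauses and checks them by brute force; the helper names and the `(variable, polarity)` literal representation are illustrative assumptions, and a real SAT/SMT solver would replace the enumeration.

```python
from itertools import combinations, product


def one(xs):
    """CNF clauses for one(X): at least one variable True, and no two True together.
    A literal is a (variable, polarity) pair."""
    at_least_one = [[(x, True) for x in xs]]
    at_most_one = [[(x, False), (y, False)] for x, y in combinations(xs, 2)]
    return at_least_one + at_most_one


def satisfies(assignment, clauses):
    """True if every clause contains a literal whose polarity matches the assignment."""
    return all(any(assignment[v] == pol for v, pol in clause) for clause in clauses)


xs = ["x1", "x2", "x3"]
assignments = (dict(zip(xs, bits)) for bits in product([False, True], repeat=len(xs)))
solutions = [a for a in assignments if satisfies(a, one(xs))]
```

Out of the eight assignments of three variables, exactly three satisfy the clauses, and each of them sets a single variable to True.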

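We encode the candidates in $Dom(P)$ with the formula

$$\varphi_P = \bigwedge_{s \in S} \bigwedge_{i \in I} \mathit{one}(X_{s,i})$$

where $X_{s,i}$ is the set of variables corresponding to the transitions in $T(s, i)$. For every state $s$ and every input $i$, every solution of $\varphi_P$ determines exactly one transition in $T(s, i)$, which entails that a solution of $\varphi_P$ cannot determine two different transitions with the same input from the same state. So $\varphi_P$ determines exactly the candidates in $Dom(P)$.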
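The conjunction $\varphi_P$ can be sketched by numbering the transitions (DIMACS-style integers, where `-v` means "not v") and emitting the $\mathit{one}$ clauses of each $X_{s,i}$. This is a hypothetical encoding for illustration; the brute-force `models` function merely stands in for the constraint solver the approach relies on, and certain transitions end up as unit clauses that every solution satisfies.

```python
from itertools import combinations, product


def one(vs):
    """Clauses for one(X) over integer variables (DIMACS style: -v means not v)."""
    return [list(vs)] + [[-a, -b] for a, b in combinations(vs, 2)]


def encode_candidates(T):
    """phi_P: conjunction of one(X_{s,i}) over all pairs (s, i)."""
    var = {t: k + 1 for k, t in enumerate(sorted(T))}
    groups = {}
    for t, v in var.items():
        groups.setdefault((t[0], t[1]), []).append(v)
    clauses = [c for vs in groups.values() for c in one(vs)]
    return var, clauses


def models(n, clauses):
    """Brute-force enumeration of solutions (a stand-in for a SAT solver)."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            yield bits


T = {(0, "a", "x", 1), (0, "a", "y", 0), (0, "b", "x", 0),
     (1, "a", "x", 0), (1, "b", "y", 1), (1, "b", "x", 0)}
var, clauses = encode_candidates(T)
sols = list(models(len(var), clauses))
```

Each solution selects one transition per (state, input) pair, i.e., one DFSM; a solution can be decoded with `{t for t, v in var.items() if sol[v - 1]}`.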

### 4.2 Candidates involved in executions of an imprecise oracle

An execution $e$ of $P$ involves an FSM $M'$ if every $t \in Unc(e)$ is defined in $M'$. Recall that all the certain transitions are defined in every candidate. Let us define the formula $\varphi_e = \bigwedge_{t \in Unc(e)} x_t$. Clearly $\varphi_e$ determines every uncertain transition in $e$, so it determines the deterministic and non-deterministic FSMs involved in $e$. However, we are interested only in the DFSMs in $Dom(P)$. Remark that if a DFSM is involved in $e$, then $e$ is deterministic. Conversely, if $e$ is deterministic, then $Dom(P)$ includes a DFSM involved in $e$. Hence an execution of $P$ must be deterministic for a DFSM to be involved in it, and $\varphi_P \wedge \varphi_e$ determines the DFSMs involved in $e$ if $e$ is deterministic. Let $E$ be a set of deterministic executions of $P$ and let us define the formula $\varphi_E = \bigvee_{e \in E} \varphi_e$. The formula $\varphi_P \wedge \varphi_E$ determines the DFSMs involved in an execution in $E$.
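To illustrate $\varphi_e$ and $\varphi_E$, the sketch below represents formulas as Python predicates over the set of chosen transitions and evaluates them on explicitly enumerated candidates. This is an assumption made for readability: the paper keeps the formulas symbolic and queries a solver, and the helper names here are hypothetical. Requiring all transitions of $e$ rather than only $Unc(e)$ gives the same result, since certain transitions belong to every candidate.

```python
from itertools import product


def candidates(T):
    groups = {}
    for t in sorted(T):
        groups.setdefault((t[0], t[1]), []).append(t)
    return [frozenset(c) for c in product(*groups.values())]


def is_deterministic(e):
    """No two distinct transitions of e share a source state and an input."""
    return len({(t[0], t[1]) for t in set(e)}) == len(set(e))


def phi_e(e):
    """phi_e: conjunction of the variables of the transitions used by e."""
    return lambda chosen: all(t in chosen for t in e)


def phi_E(E):
    """phi_E: disjunction of phi_e over a set E of deterministic executions."""
    fs = [phi_e(e) for e in E]
    return lambda chosen: any(f(chosen) for f in fs)


T = {(0, "a", "x", 1), (0, "a", "y", 0), (0, "b", "x", 0),
     (1, "a", "x", 0), (1, "b", "y", 1), (1, "b", "x", 0)}
e1 = [(0, "a", "x", 1), (1, "b", "y", 1)]
e2 = [(0, "a", "y", 0), (0, "b", "x", 0)]
involved = [c for c in candidates(T) if phi_E([e1, e2])(c)]
```

Here `e1` is involved in one candidate and `e2` in two, so three DFSMs satisfy $\varphi_P \wedge \varphi_E$. A non-deterministic execution, which uses two transitions from state 0 on input `a`, is involved in no candidate.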

Consider the NFSM in Figure (a) and a set consisting of four executions and . Remark that the executions are deterministic and they have the same input sequence but distinct responses, namely for , for and , and for . The formula encodes the DFSMs involved in the three executions.

### 4.3 Test-equivalent candidate

Let $\alpha$ be a test. To determine the $\alpha$-equivalent DFSMs, we can partition $Dom(P)$ into subdomains. The DFSMs in each subdomain produce the same response to test $\alpha$. We encode each subdomain with a Boolean formula as follows.

Let $\Omega = \{\beta_1, \ldots, \beta_m\}$ be the set of responses of the DFSMs in $Dom(P)$ to test $\alpha$. Each response $\beta_k$, with $1 \le k \le m$, corresponds to a maximal set of deterministic executions of $P$ with input sequence $\alpha$. We denote by $E_{\beta_k}$ the set of deterministic executions producing $\beta_k$ on input sequence $\alpha$. Clearly $E_{\beta_k}$ characterizes a subdomain of $\alpha$-equivalent DFSMs. The maximal size of $\Omega$ equals $|O|^{|\alpha|}$, and it is reached when the imprecise oracle is the universe of all DFSMs, which is not the practical context of our work, where imprecise oracles have reasonable uncertainty degrees.

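Let $Dom_{\beta_k}(P)$ denote the set of DFSMs in $Dom(P)$ involved in an execution in $E_{\beta_k}$. It holds that $\{Dom_{\beta_1}(P), \ldots, Dom_{\beta_m}(P)\}$ constitutes a partition of $Dom(P)$, i.e., every deterministic submachine of $P$ belongs to exactly one $Dom_{\beta_k}(P)$, and every DFSM in $Dom_{\beta_k}(P)$ is a submachine of $P$ for every $k$.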

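For each $\beta_k \in \Omega$, we define the formula $\varphi_{\alpha/\beta_k} = \varphi_P \wedge \varphi_{E_{\beta_k}}$. It holds that $\varphi_{\alpha/\beta_k}$ encodes the maximal set of DFSMs that $\alpha$ cannot distinguish from one another. Indeed, $\varphi_{E_{\beta_k}}$ determines exactly the $\alpha$-equivalent FSMs involved in deterministic executions in $E_{\beta_k}$, and $\varphi_P$ determines the DFSMs in $Dom(P)$. We can show that every DFSM included in $P$ is determined by the formula $\varphi_{\alpha/\beta_k}$ for exactly one $k$. Furthermore, if $\alpha$ is not distinguishing for the DFSMs in $Dom(P)$, then $\varphi_{\alpha/\beta_1}$ and $\varphi_P$ are equivalent, i.e., both determine the DFSMs in $Dom(P)$.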
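The partition of $Dom(P)$ into $\alpha$-equivalent subdomains can be checked by brute force: run every candidate on $\alpha$ and group candidates by response. This is only a sanity-check sketch with hypothetical names; enumerating all candidates is exactly what the formulas $\varphi_{\alpha/\beta_k}$ are meant to avoid.

```python
from itertools import product


def candidates(T):
    groups = {}
    for t in sorted(T):
        groups.setdefault((t[0], t[1]), []).append(t)
    return [frozenset(c) for c in product(*groups.values())]


def run(dfsm, s0, alpha):
    """Response of a complete DFSM (given as a set of transitions) to alpha."""
    delta = {(t[0], t[1]): (t[2], t[3]) for t in dfsm}
    s, outs = s0, []
    for i in alpha:
        o, s = delta[(s, i)]
        outs.append(o)
    return tuple(outs)


def subdomains(T, s0, alpha):
    """Partition Dom(P) into alpha-equivalent subdomains, keyed by response."""
    parts = {}
    for c in candidates(T):
        parts.setdefault(run(c, s0, alpha), []).append(c)
    return parts


T = {(0, "a", "x", 1), (0, "a", "y", 0), (0, "b", "x", 0),
     (1, "a", "x", 0), (1, "b", "y", 1), (1, "b", "x", 0)}
parts = subdomains(T, 0, "ab")
```

For the test `ab`, the four candidates split into three subdomains with responses `xy`, `xx` and `yx`, of sizes 1, 1 and 2, and every candidate lands in exactly one subdomain.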

Considering our running example and the test , we have that . Since the four executions have distinct responses (i.e., output sequences), we get , and . Table 1 presents the corresponding subdomains and the number of oracles in each subdomain. The two oracles in the subdomain for response are equivalent; the same holds for response . The subdomain for response defines four -equivalent candidate oracles. Later, experts are invited to select the expected response, which will serve to reduce the imprecise oracle.

| Response | Subdomain | Size | Precise oracles in the subdomain |
|---|---|---|---|
| | | 4 | , , , |
| | | 2 | , |
| | | 2 | , |

### 4.4 Reducing an imprecise oracle

Selecting test-equivalent candidates makes useless the transitions of the imprecise oracle that are not used in the selected candidates. These transitions can be deactivated to obtain a reduced imprecise oracle.

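Let $P$ be an input-complete NFSM and $\alpha/\beta$ be a trace. $Dom(P)$ is partitioned into the set $Dom_{\alpha/\beta}(P)$ of DFSMs producing $\beta$ on $\alpha$ and the set of DFSMs not producing $\beta$ on $\alpha$. Let $E_\beta$ denote the set of deterministic executions of $P$ producing $\beta$ on $\alpha$. We say that a transition $t$ is eligible for a candidate involved in a deterministic execution $e$ if $e$ uses $t$, or $(src(t), inp(t)) \neq (src(t'), inp(t'))$ for every $t'$ used in $e$.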

###### Lemma 1

There is a submachine $P_{\alpha/\beta}$ of $P$ such that $Dom(P_{\alpha/\beta}) = Dom_{\alpha/\beta}(P)$.

###### Proof

Let $e$ be a deterministic execution in $E_\beta$. Remark that all the transitions in $e$ are eligible for the candidates involved in $e$. Moreover, $e$ is the only execution with input sequence $\alpha$ and response $\beta$ in each of these candidates.

We build $P_{\alpha/\beta} = (S', s_0, I, O, T')$ by deactivating (deleting) the transitions that are not eligible for candidates in $Dom_{\alpha/\beta}(P)$. Formally, $t$ belongs to $T'$ if it is eligible for a candidate involved in some deterministic execution $e \in E_\beta$, and $s$ belongs to $S'$ if $s$ is used in a transition in $T'$. Clearly, $P_{\alpha/\beta}$ is a complete and initially connected submachine of $P$; it is not necessarily deterministic, because several executions in $E_\beta$ can use distinct uncertain transitions defined in the same state with the same input; these transitions belong to $T'$.

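First, we show that $Dom(P_{\alpha/\beta}) \subseteq Dom_{\alpha/\beta}(P)$ by contradiction. Assume that there is $D$ in $Dom(P_{\alpha/\beta})$ but not in $Dom_{\alpha/\beta}(P)$. $D$ is deterministic and, by construction, it defines all the transitions of a deterministic execution in $E_\beta$. This implies that the response of $D$ to $\alpha$ is $\beta$, which contradicts the hypothesis $D \notin Dom_{\alpha/\beta}(P)$. Second, we show that $Dom_{\alpha/\beta}(P) \subseteq Dom(P_{\alpha/\beta})$. Let $D \in Dom_{\alpha/\beta}(P)$. $D$ produces $\beta$ on $\alpha$ with exactly one of its executions, $e \in E_\beta$. Since $D$ is deterministic, all its transitions are eligible for $e$, and the transitions eligible for $e$ are defined in $P_{\alpha/\beta}$. So $D \in Dom(P_{\alpha/\beta})$. ∎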
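The construction of $T'$ in the proof can be sketched directly: collect the deterministic executions producing $\beta$ on $\alpha$, then keep every transition that is eligible for at least one of them. The sketch below is illustrative, with hypothetical names; it computes only the transition set $T'$ and does not build $S'$ or check the structural properties claimed in the proof.

```python
def executions(T, s0, alpha):
    """All executions from s0 on input sequence alpha, as tuples of transitions."""
    paths = [((), s0)]
    for i in alpha:
        paths = [(p + (t,), t[3]) for p, s in paths for t in T if t[0] == s and t[1] == i]
    return [p for p, _ in paths]


def is_deterministic(e):
    """No two distinct transitions of e share a source state and an input."""
    return len({(t[0], t[1]) for t in set(e)}) == len(set(e))


def reduce_oracle(T, s0, alpha, beta):
    """T' of the reduced oracle: transitions eligible for some e in E_beta."""
    E_beta = [e for e in executions(T, s0, alpha)
              if is_deterministic(e) and tuple(t[2] for t in e) == tuple(beta)]
    kept = set()
    for e in E_beta:
        used = {(t[0], t[1]) for t in e}
        # t is eligible for e if e uses t, or e uses no transition with t's source and input
        kept |= {t for t in T if t in e or (t[0], t[1]) not in used}
    return kept


T = {(0, "a", "x", 1), (0, "a", "y", 0), (0, "b", "x", 0),
     (1, "a", "x", 0), (1, "b", "y", 1), (1, "b", "x", 0)}
```

If the expert selects response `yx` for test `ab`, only the transition `(0, 'a', 'x', 1)` is deactivated; selecting `xy` instead removes the two transitions that conflict with the single execution producing it.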

Consider Table 1 and assume that experts choose the expected response . The reduced imprecise oracle for is the imprecise oracle in Figure (b), which was obtained by removing transition from in Figure (a). This is because, among the two transitions and from state with input , the executions in only use .

Reducing an imprecise oracle speeds up the computation of executions for given tests. Indeed, once it becomes clear that traversing some transitions of the imprecise oracle leads to undesired responses, these transitions need not be considered when determining new execution sets.

Let $D$ be a candidate in $Dom(P)$ and $\alpha/\beta$ be a test-response pair.

###### Lemma 2

$D \in Dom(P_{\alpha/\beta})$ if and only if $D$ is determined by $\varphi_{\alpha/\beta}$.

Remark that in some circumstances $P_{\alpha/\beta}$ is the same as $P$. This happens when the union of eligible transitions over a set of executions equals the set of transitions of $P$. Such a case is presented in Section 5.2. Uncertain transitions in $P$ but not in $P_{\alpha/\beta}$ are not determined by $\varphi_{\alpha/\beta}$, because other uncertain transitions are determined by $\varphi_{E_\beta}$ and a solution of $\varphi_P$ cannot determine two uncertain transitions from the same state with the same input.

## 5 Mining an Oracle

To mine an oracle represented by a DFSM, we apply a test set $TS$ to an imprecise oracle $P$. We say that $TS$ is adequate for mining the proper oracle from $P$ if $TS$ distinguishes the proper oracle $D \in Dom(P)$ from every other candidate in $Dom(P)$ that is not equivalent to $D$. Verifying the mining adequacy of $TS$ is the first step in mining the proper oracle. If $TS$ is not adequate, new tests can be generated.

### 5.1 Verifying adequacy of a test set for mining the proper oracle

Our method for verifying the adequacy of a test set is iterative. At each iteration step, a test is randomly chosen and the corresponding plausible responses are computed with the imprecise oracle. Then experts select an expected response and send it to an automated procedure, which reduces the imprecise oracle, i.e., deactivates some of its transitions. The procedure stops when the responses to every test have been examined or no imprecision remains. Procedure verify_test_adequacy_for_mining, given in Algorithm 1, returns the verdict of the verification.

Procedure verify_test_adequacy_for_mining takes as inputs an imprecise oracle represented by an NFSM, a test set, and the expert knowledge about the expected outputs for the tests. We represent the expert knowledge with a DFSM. The procedure uses the Boolean encoding presented in the previous section. It ends the iteration when all the tests have been visited or the Boolean encoding defines a single DFSM. If the Boolean encoding of the test-equivalent DFSMs defines two non-equivalent DFSMs, then the tests do not enable mining an oracle; otherwise one of the remaining equivalent DFSMs is mined. The procedure also returns the Boolean encoding of the DFSMs selected for the tests, i.e., the DFSMs that produce the expected output on every test.
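The control flow of verify_test_adequacy_for_mining can be sketched with an explicit list of candidates in place of the Boolean encoding and solver. In this hypothetical sketch the expert's choice is simulated by running the expert DFSM, as the text suggests, and the final verdict checks that all remaining candidates are equivalent by exploring the synchronous product of pairs of DFSMs. The returned candidate list stands in for the Boolean encoding returned by Algorithm 1.

```python
from itertools import product


def candidates(T):
    groups = {}
    for t in sorted(T):
        groups.setdefault((t[0], t[1]), []).append(t)
    return [frozenset(c) for c in product(*groups.values())]


def run(d, s0, alpha):
    """Response of a complete DFSM d to input sequence alpha."""
    delta = {(t[0], t[1]): (t[2], t[3]) for t in d}
    s, outs = s0, []
    for i in alpha:
        o, s = delta[(s, i)]
        outs.append(o)
    return tuple(outs)


def equivalent(d1, d2, s0):
    """Explore the synchronous product of two complete DFSMs from (s0, s0)."""
    f1 = {(t[0], t[1]): (t[2], t[3]) for t in d1}
    f2 = {(t[0], t[1]): (t[2], t[3]) for t in d2}
    inputs = {i for (_, i) in f1}
    seen, todo = {(s0, s0)}, [(s0, s0)]
    while todo:
        p, q = todo.pop()
        for i in inputs:
            (o1, p2), (o2, q2) = f1[(p, i)], f2[(q, i)]
            if o1 != o2:
                return False
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                todo.append((p2, q2))
    return True


def verify_test_adequacy_for_mining(T, s0, tests, expert):
    """Explicit-set sketch: the expert's selection is simulated by the expert DFSM."""
    cands = candidates(T)
    for alpha in tests:
        if len(cands) == 1:
            break
        expected = run(expert, s0, alpha)
        cands = [c for c in cands if run(c, s0, alpha) == expected]
    adequate = all(equivalent(cands[0], c, s0) for c in cands[1:])
    return adequate, cands


T = {(0, "a", "x", 1), (0, "a", "y", 0), (0, "b", "x", 0),
     (1, "a", "x", 0), (1, "b", "y", 1), (1, "b", "x", 0)}
```

For example, when the expected response to `a` is `x`, two candidates remain that differ on the reachable transition from state 1 on `b`, so the test set `["a"]` is not adequate; adding `ab` resolves it.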

Consider the original imprecise oracle in Figure (a). To verify whether test is adequate for mining an oracle, verify_test_adequacy_for_mining determines the plausible responses (see Table 1) of the deterministic executions on . Assume that experts choose the expected response . The procedure determines as discussed in Section 4.3; then it builds in Table 1 and the reduced imprecise oracle in Figure (b) as discussed in Section 4.4. The formula determines the four -equivalent candidates presented in Table 1. Two of these candidates are distinguished by test , namely the oracle in Figure (c) and the one defining the transition set . The latter oracle provides response whereas the former provides for test . In conclusion, the procedure returns , indicating that test is not adequate for mining the proper oracle in Figure (c); it also returns the reduced imprecise oracle and the encoding of the -equivalent candidates.

### 5.2 Test generation in mining an oracle

Procedure precise_oracle_mining in Algorithm 2 mines an oracle from an imprecise one by generating tests. It calls the semi-automated procedure verify_test_adequacy_for_mining of Algorithm 1. If the given tests are not adequate for the mining task, verify_test_adequacy_for_mining returns a Boolean encoding of a reduced set of test-equivalent candidates. Then precise_oracle_mining generates a distinguishing test for two candidates in the reduced set. Such a test can correspond to a path to a sink state in the distinguishing product [15] of the two candidates. Test generation stops when the generated test is adequate for mining the proper oracle in the reduced set of candidates; otherwise another test is generated. Procedure precise_oracle_mining always terminates because, at each iteration step, the generated test distinguishes two remaining candidates, so the call to verify_test_adequacy_for_mining eliminates at least one of them, and the number of DFSMs included in the original imprecise oracle is finite. On termination of precise_oracle_mining, the initial tests augmented with the generated ones constitute adequate tests for mining the proper oracle determined by .
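The mining loop can be sketched as follows, again with explicit candidate sets instead of Boolean encodings and a simulated expert. A distinguishing test is found by breadth-first search over the product of two DFSMs, stopping at the first pair of states whose outputs differ on some input (the "sink state" of the distinguishing product). All names are hypothetical, and the sketch assumes, as the paper does, that the proper oracle belongs to the candidate set, so the candidate list never becomes empty.

```python
from collections import deque
from itertools import product


def candidates(T):
    groups = {}
    for t in sorted(T):
        groups.setdefault((t[0], t[1]), []).append(t)
    return [frozenset(c) for c in product(*groups.values())]


def run(d, s0, alpha):
    delta = {(t[0], t[1]): (t[2], t[3]) for t in d}
    s, outs = s0, []
    for i in alpha:
        o, s = delta[(s, i)]
        outs.append(o)
    return tuple(outs)


def distinguishing_test(d1, d2, s0):
    """Shortest input sequence on which two complete DFSMs answer differently,
    found by BFS over their product; None if they are equivalent."""
    f1 = {(t[0], t[1]): (t[2], t[3]) for t in d1}
    f2 = {(t[0], t[1]): (t[2], t[3]) for t in d2}
    inputs = sorted({i for (_, i) in f1})
    seen, queue = {(s0, s0)}, deque([((s0, s0), ())])
    while queue:
        (p, q), alpha = queue.popleft()
        for i in inputs:
            (o1, p2), (o2, q2) = f1[(p, i)], f2[(q, i)]
            if o1 != o2:
                return alpha + (i,)
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                queue.append(((p2, q2), alpha + (i,)))
    return None


def precise_oracle_mining(T, s0, tests, expert):
    """Generate distinguishing tests until all remaining candidates are equivalent."""
    tests = list(tests)
    cands = candidates(T)
    for alpha in tests:
        expected = run(expert, s0, alpha)
        cands = [c for c in cands if run(c, s0, alpha) == expected]
    while True:
        pending = (distinguishing_test(cands[0], c, s0) for c in cands[1:])
        alpha = next((a for a in pending if a is not None), None)
        if alpha is None:  # every remaining candidate is equivalent to cands[0]
            return cands[0], tests
        tests.append(alpha)
        expected = run(expert, s0, alpha)  # simulated expert selection
        cands = [c for c in cands if run(c, s0, alpha) == expected]


T = {(0, "a", "x", 1), (0, "a", "y", 0), (0, "b", "x", 0),
     (1, "a", "x", 0), (1, "b", "y", 1), (1, "b", "x", 0)}
```

Starting from an empty test set, the loop generates the single test `ab`, whose expected response `xy` eliminates the other three candidates at once.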

Considering the running example, the first call to verify_test_adequacy_for_mining during the execution of precise_oracle_mining establishes that test is not adequate for mining an oracle. This was discussed at the end of the previous section, where test was generated as a distinguishing test for two candidates determined by and included in the reduced imprecise oracle in Figure (b). In the first iteration of the while loop, precise_oracle_mining makes a second call to verify_test_adequacy_for_mining to check whether the generated test is adequate for mining an oracle in the new context and . Here is what happens within this second call. The plausible responses for belong to ; they are obtained with deterministic executions of in