Automating Test Case Classification and Prioritization for Use Case-Driven Testing in Product Lines

05/28/2019, by Ines Hajri, et al.

Product Line Engineering (PLE) is a crucial practice in many software development environments where software systems are complex and developed for multiple customers with varying needs. At the same time, many development processes are use case-driven and this strongly influences their requirements engineering and system testing practices. In this paper, we propose, apply, and assess an automated system test case classification and prioritization approach specifically targeting regression testing in the context of the use case-driven development of product families. Our approach provides: (i) automated support to classify, for a new product in a product family, relevant and valid system test cases associated with previous products, and (ii) automated prioritization of system test cases using multiple risk factors such as fault-proneness of requirements and requirements volatility in a product family. Our evaluation was performed in the context of an industrial product family in the automotive domain. Results provide empirical evidence that our approach offers a practical and beneficial way to classify and prioritize system test cases for industrial product lines.


1. Introduction

Product Line Engineering (PLE) is a common practice in many domains such as automotive and avionics to enhance product quality, to reduce development costs, and to improve time-to-market (Pohl et al., 2005). In such domains, many development processes are use case-driven and this strongly influences their requirements engineering and system testing practices (Nebut et al., 2006a) (Nebut et al., 2006b) (Wang et al., 2015a) (Wang et al., 2015b). For example, IEE S.A. (in the following “IEE”) (IEE, 2018), a leading supplier of embedded software and hardware systems in the automotive domain and the case study supplier in this paper, develops automotive sensing systems enhancing safety and comfort in vehicles for multiple major car manufacturers worldwide. The current development and testing practice at IEE is use case-driven, and IEE, like many other development environments, follows the common product line testing strategy referred to as opportunistic reuse of test assets (da Mota Silveira Neto et al., 2011). A new product line is typically started with a first product from an initial customer. Analysts elicit requirements as use case specifications and then derive system test cases from these specifications. For each subsequent customer for that product, the analysts start from the current use case specifications, and negotiate variabilities with the customer to produce new specifications. They then manually choose and prioritize, from the existing test suite of the previous product(s), test cases that can and need to be rerun to ensure existing, unmodified functionalities are still working correctly in the new product. With this form of test reuse, there is no structured, automated method that supports the activity of selecting and prioritizing test cases. It is fully manual, error-prone and time-consuming, which leads to ad-hoc change management for use case models and system test cases in product lines. Therefore, product line test case classification and prioritization techniques, based on a dedicated use case modeling methodology, are needed to automate the reuse of system test cases in the context of use case-driven development and product families.

The need for supporting PLE for the purpose of test automation has already been acknowledged and many product line testing approaches have been proposed in the literature (da Mota Silveira Neto et al., 2011) (do Carmo Machado et al., 2014) (Engström and Runeson, 2011) (Johansen et al., 2011) (Lee et al., 2012) (Oster et al., 2011) (Runeson and Engström, 2012) (Tevanlinna et al., 2004). Most of the existing approaches follow the product line testing strategy design test assets for reuse (da Mota Silveira Neto et al., 2011) in which test assets, e.g., abstract test cases or behavioral models, are created in advance for the entire product family, including common and reusable parts. When a new product is developed, test assets are selected to be reused, extended, and refined into product-specific test cases. Due to deadline pressures and limited resources, many companies, including IEE, find the upfront creation of test assets to be impractical because of the large amount of manual effort required before there are (enough) customers to justify it.

Lity et al. (Lity et al., 2016) (Lochau et al., 2014) (Lity et al., 2012) propose a test case selection approach which follows an alternative product line testing strategy, i.e., incremental testing of product lines (da Mota Silveira Neto et al., 2011). In this strategy, the initial product is tested individually and the following products are tested using regression testing techniques, i.e., test case selection and prioritization. Lity et al. apply incremental model slicing to determine the impact of changes on a system model, e.g., finite state machines, and to select the scenarios to be retested with the new product. The approach does not require the entire test suite of the product family to be generated in advance since the test cases of the new product are selected and derived incrementally from the test suites of the previous product(s). Its main limitation is the need for detailed behavioral models, e.g., finite state machines, which rarely exist in industrial practice since software development and testing are typically driven by requirements in Natural Language (NL) and behavioural models are typically specified only for a limited set of critical system features (Larman, 2002).

Many approaches for test case classification and prioritization require the source code of the system under test together with code coverage information (Yoo and Harman, 2012). However, this information is often only partially available in industrial contexts. Indeed, when system testing is outsourced to other companies or to independent test teams, the source code of the system under test is often only partially available or not available at all. For example, test teams may have access only to the source code of a single product, not the entire product line. In addition, structural coverage information is often unavailable in the case of embedded systems. Indeed, traditional compiler-based methods used to collect coverage data (Yang et al., 2009) cannot be applied when test cases need to be run on dedicated hardware. These are the main motivations, in this paper, for relying on a requirements-driven approach to test case classification and prioritization.

In our previous work (Hajri et al., 2015), we proposed the Product Line Use case modeling Method (PUM), which supports variability modeling in Product Line (PL) use case diagrams and specifications in NL, intentionally avoiding any reliance on feature models or activity and sequence diagrams. Based on PUM, we developed a use case-driven configuration approach (Hajri et al., 2018b) (Hajri et al., 2016) which guides the analysts in making configuration decisions and automatically generates Product-Specific (PS) use case models. It is supported by a tool, PUMConf (Product line Use case Model Configurator), integrated with IBM DOORS.

In this paper, we propose, apply and assess an approach for the definition and prioritization of test cases in product lines, based on our use case-driven modeling and configuration techniques (Hajri et al., 2015) (Hajri et al., 2018b). Our approach supports the incremental testing of new products of a product family where requirements are captured as use case specifications. Consistent with the strategy referred to as “incremental testing of product lines”, we automate the definition of system test cases by reusing test cases that belong to existing products. After the initial product is tested individually, new test cases might be needed and some of the existing test cases may need to be modified for new products, while some existing test cases are simply reused verbatim. The definition of test cases for new products is based on the classification and selection of existing test cases in the product line and on the identification of new, untested scenarios for new products under test. Test case prioritization is based on prediction models trained using product line historical data.

To reuse the existing system test cases, our approach automatically classifies them as obsolete, retestable, and reusable. An obsolete test case cannot be executed on the new product as the corresponding use case scenarios are not selected for the new product. A retestable test case is still valid but needs to be rerun to determine the possible impact of changes whereas a reusable test case is also valid but does not need to be rerun for the new product. We implemented a model differencing pipeline which identifies changes in the decisions made to configure a product (e.g., selecting a variant use case). There are two sets of decisions: (i) decisions made to generate the PS use case specifications for the previous product(s) and (ii) decisions made to generate the PS use case specifications for the new product. Our approach compares the two sets to classify the decisions as new, deleted and updated, and to identify the impacted parts of the use case models of the previous product(s). By using the trace links from the impacted parts of the use case models to the system test cases, we automatically classify the existing system test cases to be reused for testing the new product. In addition, we automatically identify the use case scenarios of the new product that have not been tested before, and provide information on how to modify existing system test cases to cover these new, untested use case scenarios, i.e., the impact of use case changes on existing system test cases.
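To illustrate the classification step at a high level, the following Python sketch classifies test cases from trace links and impacted use case elements; the data layout, names, and helper function are assumptions for illustration and do not reflect PUMConf's actual implementation.

# Illustrative sketch only: classify existing test cases via trace links from
# impacted use case elements (data layout and names are assumed).
def classify_test_cases(test_cases, traces, impacted_elements, invalidated_elements):
    """traces maps each test case id to the set of use case elements it covers."""
    classification = {}
    for tc in test_cases:
        covered = traces[tc]
        if covered & invalidated_elements:     # covered scenario no longer exists
            classification[tc] = "obsolete"
        elif covered & impacted_elements:      # scenario still valid but touched by a change
            classification[tc] = "retestable"
        else:
            classification[tc] = "reusable"
    return classification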

System test cases are automatically prioritized based on multiple risk factors such as fault-proneness of requirements and requirements volatility in the product line. To this end, we rely on prediction models; more precisely, we leverage logistic regression models that capture how changes in these risk factors affect the failure likelihood of each test case. To support these activities, we extended PUMConf. We have evaluated the effectiveness of the proposed approach by applying it to select and prioritize the test cases of five software products belonging to a product line in the automotive domain. To summarize, the contributions of this paper are:

  • a test case classification and prioritization approach that is specifically tailored to the use case-driven development of product families, does not rely on behavioral system models, and guides engineers in testing new products in a product family;

  • a publicly available tool integrated with IBM DOORS as a plug-in, which automatically selects and prioritizes system test cases when a new product is configured in a product family;

  • an industrial case study demonstrating the applicability and benefits of our test selection and prioritization approach.

This paper is structured as follows. Section 2 provides the background on PUM and PUMConf on which this paper builds the proposed test case classification and prioritization approach. Section 3 introduces the industrial context of our case study to illustrate the practical motivations for our approach. Section 4 discusses the related work in light of the industrial needs identified in Section 3. In Section 5, we provide an overview of the approach. Sections 6 and 7 provide the details of its core technical parts. In Section 8, we present our tool while Section 9 reports on our evaluation in an industrial setting, involving an embedded system called Smart Trunk Opener (STO). In Section 10, we conclude the paper.

2. Background

In this section we give the background regarding the elicitation of PL use case models (see Section 2.1), and our configuration approach (see Section 2.2).

In the rest of the paper, we use Smart Trunk Opener (STO) as a case study, to motivate, illustrate and assess our approach. STO is a real-time automotive embedded system developed by IEE. It provides automatic, hands-free access to a vehicle’s trunk, in combination with a keyless entry system. In possession of the vehicle’s electronic remote control, the user moves her leg in a forward and backward direction at the vehicle’s rear bumper. STO recognizes the movement and transmits a signal to the keyless entry system, which confirms that the user has the remote. This allows the trunk controller to open the trunk automatically.

2.1. Elicitation of Variability in PL Use Cases with PUM

Figure 1. Part of the PL Use Case Diagram for STO

Elicitation of PL use cases is based on the Product line Use case modeling Method (PUM) (Hajri et al., 2015). In this section, we give a brief description of the PUM artifacts.

2.1.1. Use Case Diagram with PL Extensions

For use case diagrams, we employ the PL extensions proposed by Halmans and Pohl (Halmans and Pohl, 2003) (Bühne et al., 2003) since they support explicit representation of variants, variation points, and their dependencies (see Fig. 1).

A use case is either Essential or Variant. Variant use cases are distinguished from essential use cases, i.e., use cases that are mandatory for all the products in a product family, by using the ‘Variant’ stereotype. A variation point, depicted as a triangle, is associated with one or more use cases using the ‘include’ relation. A mandatory variation point indicates where the customer has to make a selection (the black triangles in Fig. 1). A ‘tree-like’ relation, containing a cardinality constraint, is used to express relations between variants and variation points, which are called variability relations. The relation uses a [min..max] notation in which min and max define the minimum and maximum numbers of variants that can be selected for the variation point.

1 USE CASE Recognize Gesture
2 1.1 Basic Flow (BF)
3 1. The system REQUESTS move capacitance FROM the sensors.
4 2. INCLUDE USE CASE Identify System Operating Status.
5 3. The system VALIDATES THAT the operating status is valid.
6 4. The system VALIDATES THAT the movement is a valid kick.
7 5. The system SENDS the valid kick status TO the STO Controller.
8 1.2 <OPTIONAL>Bounded Alternative Flow (BAF1)
9 RFS 1-4
10 1. IF voltage fluctuation is detected THEN
11 2. ABORT.
12 3. ENDIF
13 1.3 Specific Alternative Flow (SAF1)
14 RFS 3
15 1. ABORT.
16 1.4 Specific Alternative Flow (SAF2)
17 RFS 4
18 1. The system increments OveruseCounter by the increment step.
19 2. ABORT.
20
21 USE CASE Identify System Operating Status
22 1.1 Basic Flow (BF)
23 1. The system VALIDATES THAT the watchdog reset is valid.
24 2. The system VALIDATES THAT the RAM is valid.
25 3. The system VALIDATES THAT the sensors are valid.
26 4. The system VALIDATES THAT there is no error detected.
27 1.5 Specific Alternative Flow (SAF4)
28 RFS 4
29 1. INCLUDE <VARIATION POINT: Storing Error Status>.
30 2. ABORT.
31
32 USE CASE Provide System User Data
33 1.1 Basic Flow (BF)
34 1. The tester SENDS the system user data request TO the system.
35 2. INCLUDE <VARIATION POINT : Method of Providing Data>.
36
37 <VARIANT>USE CASE Provide System User Data via Standard Mode
38 1.1 Basic Flow (BF)
39 V1. <OPTIONAL>The system SENDS calibration TO the tester.
40 V2. <OPTIONAL>The system SENDS sensor data TO the tester.
41 V3. <OPTIONAL>The system SENDS trace data TO the tester.
42 V4. <OPTIONAL>The system SENDS error data TO the tester.
43 V5. <OPTIONAL>The system SENDS error trace data TO the tester.
Table 1. Some STO Use Cases in the extended RUCM

A variability relation is optional where (min = 0) or ((min > 0) and (max < n)); n is the number of variants in a variation point. A variability relation is mandatory where (min = max = n). Optional and mandatory relations are depicted with light-grey and black filled circles, respectively (see Fig. 1). For instance, the ‘Provide System User Data’ essential use case has to support multiple methods of providing data where the methods of providing data via IEE QC mode and Standard mode are mandatory. In addition, the customer can select the method of providing data via diagnostic mode. In STO, the customer may decide that the system should not store the errors determined during the identification of the operating state (see the ‘Storing Error Status’ optional variation point in Fig. 1). The extensions support the dependencies require and conflict among variation points and variant use cases (Bühne et al., 2003). With require, the selection of the variant use case in ‘Storing Error Status’ implies the selection of the variant use case in ‘Clearing Error Status’ in Fig. 1.
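As an illustration of how such cardinalities and relation types can be checked during configuration, consider the following Python sketch; the data layout and the example values are assumptions for illustration, not part of PUM.

# Illustrative sketch: check a selection against a variation point's [min..max]
# cardinality and its mandatory/optional variability relations (assumed layout).
def check_selection(relations, min_card, max_card, selected):
    mandatory = {v for v, kind in relations.items() if kind == "mandatory"}
    if not mandatory.issubset(selected):
        return False                    # every mandatory variant must be selected
    if not selected.issubset(relations.keys()):
        return False                    # only variants of this variation point may be chosen
    return min_card <= len(selected) <= max_card

# Hypothetical values loosely inspired by 'Method of Providing Data' in Fig. 1.
relations = {"IEE QC Mode": "mandatory", "Standard Mode": "mandatory",
             "Diagnostic Mode": "optional"}
print(check_selection(relations, 2, 3, {"IEE QC Mode", "Standard Mode"}))   # True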

2.1.2. Restricted Use Case Modeling (RUCM) with PL Extensions

This section introduces the RUCM (Restricted Use Case Modeling) template and its PL extensions which we proposed in our previous work (Hajri et al., 2015). RUCM is a use case modeling method which provides restriction rules and keywords constraining the use of NL (Yue et al., 2013). Since it was not designed for PL modeling, we introduced some PL extensions (see Table 1). In RUCM, use cases have basic and alternative flows (Lines 2, 8, 13, 16, 22, 27, 33 and 38). In Table 1, we omit some alternative flows and basic information such as actors and pre/post conditions.

A basic flow describes a main successful path that satisfies stakeholder interests (Lines 3-7, 23-26 and 39-43). It contains use case steps and a postcondition. A step can be a system-actor interaction: an actor sends a request or data to the system (Line 34); the system replies to an actor with a result (Line 7). In addition, the system validates a request or data (Line 5), or it alters its internal state (Line 18). The use case inclusion is given in a step with the keyword ‘INCLUDE USE CASE’ (Line 4). The keywords are in capital letters. ‘VALIDATES THAT’ (Line 5) indicates a condition that must be true to take the next step; otherwise an alternative flow is taken.

An alternative flow describes other scenarios or branches, both success and failure. It depends on a condition occurring in a specific step of another flow, referred to as the reference flow, which is either the basic flow or another alternative flow.

RUCM has specific, bounded and global alternative flows. A specific alternative flow refers to a step in a reference flow (Lines 13, 16, and 27). A bounded alternative flow refers to more than one step in a reference flow (Line 8), while a global one refers to any step in a reference flow. ‘RFS’ is used to refer to reference flow steps (Lines 9, 14, 17, and 28). Bounded and global alternative flows begin with ‘IF .. THEN’ for the conditions under which they are taken (Line 10). Specific alternative flows do not necessarily begin with ‘IF .. THEN’ since a guard condition is already indicated in their reference flow steps (Line 5).

PUM extensions to RUCM include (i) new keywords for modeling interactions in embedded systems and (ii) new keywords for modeling variability. The keywords ‘SENDS .. TO’ and ‘REQUESTS .. FROM’ capture system-actor interactions (Lines 3, 7, 34, and 39-43). For instance, Step 1 (Line 3) indicates an input message from the sensors to the system. For consistency with PL use case diagrams, PUM introduces into RUCM the notion of variation point and variant use case. Variation points can be included in basic or alternative flows with the keyword ‘INCLUDE <VARIATION POINT : … >’ (Lines 29 and 35). Variant use cases are given with the keyword ‘<VARIANT>’ (Line 37). To capture variability that cannot be modeled in use case diagrams because of their coarse granularity, PUM introduces optional steps, optional alternative flows and a variant order of steps. Optional steps and alternative flows begin with the keyword ‘<OPTIONAL>’ (Lines 8 and 39-43). The keyword ‘V’ is used before any step number to express variant step order (Lines 39-43).
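Because RUCM restricts phrasing to a fixed set of keywords, step types can be recognized mechanically. The following regex-based sketch is an illustration only; the patterns and labels are simplified assumptions, not the PUM tool support.

import re

# Simplified, assumed patterns for a few RUCM/PUM step types.
STEP_PATTERNS = [
    ("include_variation_point", re.compile(r"INCLUDE <VARIATION POINT\s*:")),
    ("include_use_case",        re.compile(r"INCLUDE USE CASE")),
    ("condition",               re.compile(r"VALIDATES THAT")),
    ("send",                    re.compile(r"SENDS .+ TO")),
    ("request",                 re.compile(r"REQUESTS .+ FROM")),
    ("optional",                re.compile(r"<OPTIONAL>")),
]

def classify_step(step_text):
    labels = [name for name, pattern in STEP_PATTERNS if pattern.search(step_text)]
    return labels or ["internal"]   # e.g., 'The system increments OveruseCounter ...'

print(classify_step("The system REQUESTS move capacitance FROM the sensors."))   # ['request']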

2.2. Configuration of PS Use Case Models

Figure 2. Generated PS Use Case Diagram for STO

PUMConf supports users in making configuration decisions and automatically generating PS use cases from PL use cases.

The user selects (1) variant use cases in the PL use case diagram and (2) optional use case elements in the PL use case specifications, to generate the PS use case models. For instance, in the case of the STO example, the analyst makes decisions for the variation points in Fig. 1. A decision is about selecting, for the product, variant use cases in the variation point. The user selects Store Error Status and Clear Error Status in the variation points Storing Error Status and Clearing Error Status, respectively. She also unselects Clear Error Status via Diagnostic Mode in the variation point Method of Clearing Error Status, while Clear Error Status via IEE QC Mode is automatically selected by PUMConf because of the mandatory variability relation. Finally, the user unselects Provide System User Data via Diagnostic Mode in the variation point Method of Providing Data.

Given the configuration decisions, PUMConf automatically generates the PS use case diagram from the PL diagram (see Fig. 2 generated from Fig. 1). For instance, for the decision for the variation point Method of Providing Data, PUMConf creates the use cases Provide System User Data via IEE QC Mode and Provide System User Data via Standard Mode, and two include relations in Fig. 2.

1 USE CASE Recognize Gesture
2 1.1 Basic Flow (BF)
3 1. The system REQUESTS the move capacitance FROM the sensors.
4 2. INCLUDE USE CASE Identify System Operating Status.
5 3. The system VALIDATES THAT the operating status is valid.
6 4. The system VALIDATES THAT the movement is a valid kick.
7 5. The system SENDS the valid kick status TO the STO Controller.
8 1.2 Specific Alternative Flow (SAF1)
9 RFS 3
10 1. ABORT.
11 1.3 Specific Alternative Flow (SAF2)
12 RFS 4
13 1. The system increments the OveruseCounter by the increment step.
14 2. ABORT.
15
16 USE CASE Identify System Operating Status
17 1.1 Basic Flow (BF)
18 1. The system VALIDATES THAT the watchdog reset is valid.
19 2. The system VALIDATES THAT the RAM is valid.
20 3. The system VALIDATES THAT the sensors are valid.
21 4. The system VALIDATES THAT there is no error detected.
22 1.5 Specific Alternative Flow (SAF4)
23 RFS 4
24 1. INCLUDE USE CASE Store Error Status.
25 2. ABORT.
26
27 USE CASE Provide System User Data
28 1.1 Basic Flow (BF)
29 1. The tester SENDS the system user data request TO the system.
30 2. The system VALIDATES THAT ‘Precondition of Provide System User Data via Standard Mode’.
31 3. INCLUDE USE CASE Provide System User Data via Standard Mode.
32 1.2 Specific Alternative Flow (SAF1)
33 RFS 2
34 1. INCLUDE USE CASE Provide System User Data via IEE QC Mode.
35 2. ABORT.
36
37 USE CASE Provide System User Data via Standard Mode
38 1.1 Basic Flow (BF)
39 1. The system SENDS the trace data TO the tester.
40 2. The system SENDS the calibration data TO the tester.
41 3. The system SENDS the error trace data TO the tester.
Table 2. Some of the Generated Product Specific Specifications

After identifying variant use cases to be included in the PS diagram, the user makes decisions based on the PL specifications. In Table 1, there are two variation points (Lines 29 and 35), one variant use case (Lines 37-43), five optional steps (Lines 39-43), one optional alternative flow (Lines 8-12), and one variant order group (Lines 39-43). The decisions for the variation points are already made in the PL diagram. Three optional steps are selected with the order V3, V1, and V5. The optional alternative flow is unselected.

PUMConf automatically generates the PS use case specifications from the PL specifications, the diagram decisions and the specification decisions. Table 2 shows a PS use case specification generated from Table 1, where the selected optional steps are generated with the order decided in the PS specifications (Lines 39-41). When multiple variants are selected for the same variation point, PUMConf introduces validation steps to determine which variant use case is executed, based on their preconditions. For instance, based on the diagram decision for Method of Providing Data in Fig. 1, PUMConf creates two include statements for Provide System User Data via Standard Mode and via IEE QC Mode (Lines 31 and 34 in Table 2), and a validation step (Line 30) that checks if the precondition of Provide System User Data via Standard Mode holds. If it holds, Provide System User Data via Standard Mode is executed in the basic flow (Line 31). If not, Provide System User Data via IEE QC Mode is executed (Lines 32-35).
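A minimal sketch of the generation rule just described, assuming a simple list-based representation of the selected variants; the function and the step wording mirror Table 2 but are illustrative only.

# Illustrative sketch: when two variants are selected for the same variation point,
# generate a validation step and an include step in the basic flow, plus an
# alternative flow that includes the other variant (assumed representation).
def generate_steps_for_decision(selected_variants):
    primary, fallback = selected_variants[0], selected_variants[1]
    basic_flow = [
        f"The system VALIDATES THAT 'Precondition of {primary}'.",
        f"INCLUDE USE CASE {primary}.",
    ]
    alternative_flow = [
        f"INCLUDE USE CASE {fallback}.",
        "ABORT.",
    ]
    return basic_flow, alternative_flow

bf, saf = generate_steps_for_decision(
    ["Provide System User Data via Standard Mode",
     "Provide System User Data via IEE QC Mode"])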

3. Motivation and Context

Our test case classification and prioritization approach is developed as an extension of our configurator, PUMConf, in the context of software systems configured for multiple customers, and developed according to a use case-driven development process. In such a context, requirements variability is communicated to customers using an interactive configuration process and an incremental testing strategy is followed for which guidance and automated support are needed. For instance, for each new product in a product family, IEE negotiates with customers how to resolve variation points in requirements or, in other words, how to configure the product line, and then selects and prioritizes, from the existing test suite(s) of the initial/previous product(s), the system test cases to be executed for the new product. In addition, IEE engineers identify new requirements that have not been tested before and existing test cases that need to be modified for the new product.

Figure 3. Opportunistic Reuse of System Test Cases

The current system testing practice at IEE, like in many other environments, is based on opportunistic reuse of test assets (da Mota Silveira Neto et al., 2011) (see Fig. 3). Product requirements are elicited from an initial customer and documented as a use case diagram and use case specifications. IEE engineers generate system test cases from the use case specifications. For each new customer in the product family, they copy the current models, and negotiate variabilities with the customer to produce a new use case diagram and new use case specifications (see copy and modify in Fig. 3). As a result of the negotiations, they make changes in the copied use case models (see modify). They manually choose and prioritize, from the existing test suite(s) of the previous product(s), test cases that can and need to be rerun to ensure existing, unmodified functionalities are still working correctly in the new product (Hajri, 2016) (Hajri et al., 2017a) (see select, prioritize and modify).

In practice, from a more general standpoint, engineers need more efficient and automated techniques to manage the reuse of common and variable requirements, given as use cases, together with system test cases derived from use cases across products in a product line. This is particularly important in contexts where functional safety standards (RTCA and EUROCAE, 2018; ISO, 2018) require traceability between requirements and system test cases. In such contexts, traceability (Ramesh and Jarke, 2001; SWE, 2014) helps guarantee that test cases properly cover all requirements, an important objective of the standards that such systems need to comply with. In our previous work (Hajri et al., 2018b) (Hajri et al., 2016), we presented PUM, which addresses the reuse of common and variable requirements across products, and PUMConf, which supports the automated generation of product-specific use case diagrams and specifications (see Section 2). In this paper, we address the problem of providing automated support for the selection and prioritization of system test cases derived from use cases in a product family.

We identify two challenges that need to be carefully considered when deciding about which system test cases to run on a new product in a product family:

Challenge 1: Identifying the Impact of Use Case Changes on System Test Cases. When there is a new customer requiring a new product in a product family, changes are made in the use case specifications for the new product, which act as a contract. Since the generated use case models of the products differ, the test suites used to verify the compliance of the products with their specifications also differ. Therefore, for each new product in the product family, the engineer needs to identify (i) the changes in the use case models for the new product, (ii) the existing system test cases impacted by those changes, and (iii) the parts of the updated use case models which have not been tested yet in the product family.

Figure 4. Test Cases derived from (a) the Basic Flow of the Use Case Recognize Gesture in Table 2 and (b) the First Specific Alternative Flow of the Same Use Case

Let us consider the two system test cases in Fig. 4(a) and (b), which are derived from the use case Recognize Gesture in Table 2. Fig. 4(a) covers the happy path scenario (the basic flow); Fig. 4(b) covers the scenario including the alternative flow SAF1 (Lines 8-10 in Table 2) in which the system aborts because of the invalid operating status. For a new STO product, the engineer decides to cover the scenario in which the voltage fluctuation is checked and detected (see the optional bounded alternative flow in Lines 8-12 in Table 1). When she changes the configuration for that scenario, the bounded alternative flow is added in the PS use case specification of Recognize Gesture. She needs to check (1) if the two test cases verifying Recognize Gesture are invalid because they exercise execution scenarios that are impossible due to the new bounded alternative flow, (2) if they need to be re-executed because they exercise scenarios in the new product which have been impacted by the new alternative flow, or (3) if it is not necessary to re-execute them because the new bounded alternative flow does not have any impact on the scenarios they cover. Further, she needs to derive new test cases to cover the voltage fluctuation scenario that is not covered by the existing test cases.

Challenge 2: Requirements-based Prioritization of System Test Cases based on Multiple Risk Factors. Multiple risk factors (such as requirements volatility in the product line and fault-proneness of requirements) may have to be considered while system test cases are prioritized for each new product in a product line. For instance, changing requirements, i.e., evolving configuration decisions in the context of automated configuration, cause changes in the design and implementation of the product, and thus increase the likelihood of introducing faults. It may also be desirable to rank system test cases for more complex requirements higher, in case the system testing process must be stopped early due to deadlines or budget restrictions. These factors may have varying importance for test case prioritization in different product lines due to technical and organizational factors. Therefore, the varying importance of risk factors for test case prioritization should be accounted for in each product line.

In the remainder of this paper, we focus on how to best address these challenges in a practical manner, in the context of use case-driven development, while relying on PUM for modeling PL use cases, and on PUMConf for the configuration of PS use case models.

4. Related Work

In this section, we cover the related work across three categories.

4.1. Testing of Product Lines

Various product line testing approaches have been proposed in the literature (do Carmo Machado et al., 2014) (da Mota Silveira Neto et al., 2011) (Lee et al., 2012) (Engström and Runeson, 2011) (Runeson and Engström, 2012) (Oster et al., 2011) (Tevanlinna et al., 2004) (Johansen et al., 2011). Neto et al. (da Mota Silveira Neto et al., 2011) present a comprehensive survey of product line testing strategies, i.e., testing product by product, opportunistic reuse of test assets, design test assets for reuse, division of responsibilities, and incremental testing of product lines. The strategy testing product by product ignores the benefits of reusing test cases developed for previous products, while the strategy opportunistic reuse of test assets focuses on the reuse of test assets across products without considering any systematic reuse method. The strategy design test assets for reuse enforces the creation of test assets early in product line development, under the assumption that product lines and configuration choices are exhaustively modeled before the release of any product. This assumption does not hold when product lines and configuration choices are refined during product configuration, which is a common industry practice. The strategy division of responsibilities is about defining testing phases that facilitate test reuse. Our approach follows the strategy incremental testing of product lines, employing regression testing techniques, i.e., test case selection and prioritization. It is the first to support incremental testing of product lines through test case selection and prioritization for use case-driven development.

Product line testing covers two separate but closely related test engineering activities: domain testing and application testing. Domain testing validates and verifies reusable components in a product line, while application testing validates and verifies a product in the product line against its specification. Our approach currently supports application testing, but it could also be employed in the context of domain testing. For each new product, our approach can be used to classify and prioritize domain test cases derived from PL use case models.

There are various product line testing approaches that support test case generation and execution in product lines (e.g., (Nebut et al., 2006b) (Reuys et al., 2006) (Kamsties et al., 2004) (Geppert et al., 2004) (McGregor, 2001) (Uzuncaova et al., 2010) (Uzuncaova et al., 2008) (Arrieta et al., 2017)). Some of them generate system test cases from use case models in a product family. However, they require detailed behavioural models (e.g., sequence or activity diagrams) which engineers tend to avoid because of the costs related to their development and maintenance. Among these approaches, the main work is that of Reuys et al. (Reuys et al., 2005) (Reuys et al., 2006) (Pohl and Metzger, 2006) (Kamsties et al., 2004), i.e., ScenTED, which is based on the systematic refinement of PL use case scenarios into PL system and integration test scenarios. ScenTED requires activity diagrams capturing the activities described in use case specifications together with the variants of the product family. Extensions of ScenTED include the ScenTED-DF approach (Stricker et al., 2010), which relies on data-flow analysis to avoid redundant execution of test cases derived with ScenTED. A methodology that does not rely on detailed behavioural models is PLUTO (Product Lines Use Case Test Optimization) (Bertolino and Gnesi, 2003). PLUTO automatically derives test scenarios from PL use cases with some special tags for variability, but executable system test cases need to be manually derived from these test scenarios.

Our approach complements the test generation approaches mentioned above. Not all generated system test cases need to be executed for new products since some of them have already been successfully tested for previous products. UMTG (Wang et al., 2015a) (Wang et al., 2015b) is a promising test generation approach that can be integrated into our approach. It generates system test cases from PS use case specifications in RUCM and from a domain model (class diagram). PUMConf can be used to automatically generate PS use case specifications, which are taken as input by UMTG to generate test cases to be classified and prioritized by our approach.

4.2. Test Case Classification and Selection

When defining a product in a product family for a new customer, there is a need not only for testing the changed parts of the product but also for testing other parts for regression. As the product grows, not all test cases can be rerun for regression due to limited resources. Test case selection is a strategy commonly adopted by regression testing techniques to reduce testing costs (Engström et al., 2010) (Yoo and Harman, 2012) (Do, 2016). Regression test selection techniques aim to reduce testing costs by selecting a subset of test cases from an existing test suite (Rothermel and Harrold, 1996). Most of them are code-based and use code changes and code coverage information to guide the test selection (e.g., (Kung et al., 1995) (Binkley, 1997) (Rothermel and Harrold, 1997) (Rothermel et al., 2000) (Harrold et al., 2001) (Qu et al., 2011) (Nardo et al., 2015)). Other techniques use different artifacts such as requirements specifications (e.g., (Vaysburg et al., 2002) (Mirarab et al., 2008) (Dukaczewski et al., 2013)), architecture models (e.g., (von Mayrhauser and Zhang, 1999) (Muccini et al., 2006) (Muccini and van der Hoek, 2003) (Muccini, 2007)), or UML diagrams (e.g., (Briand et al., 2009) (Chen et al., 2002) (Hemmati et al., 2010) (Zech et al., 2017)). For instance, Briand et al. (Briand et al., 2009) present an approach for automating regression test selection based on UML diagrams and traceability information linking UML models to test cases. They propose a formal mapping between changes on UML diagrams (i.e., class and sequence diagrams) and a classification of regression test cases into three categories (i.e., reusable, retestable, and obsolete).

The approaches mentioned above require detailed system design artifacts (e.g., finite state machines and UML sequence diagrams), rather than requirements in NL, such as use cases. Further, they compare a system artifact with its modified version to select test cases from a single test suite in the context of a single system, not in the context of a product line.

There are several product line test case selection approaches (Runeson and Engström, 2012) (Engström, 2013). Wang et al. (Wang et al., 2016) (Wang et al., 2017) propose a product line test case selection method using feature models. The method works in three steps: (i) software engineers indicate features that need to be tested; (ii) a toolset is used to check the consistency between features included in a program; and (iii) test cases are automatically selected so that all the test cases associated with a feature to be tested will be executed. The main limitation is that all the test cases of the product family need to be derived upfront, even though some of them may never be executed, and that the scope of the product family must be defined in advance. There are other similar approaches suffering from the same limitation (Cabral et al., 2010) (Knapp et al., 2014) (Shurr et al., 2010) (Kahsai et al., 2008). In contrast, our approach requires that only the test cases of the initial product be available in advance.

A test case selection approach that does not require early generation of test cases for the product family is that of Lity et al. (Lity et al., 2016) (Lochau et al., 2014) (Lity et al., 2012), which is based on model slicing for incremental product line testing. Lity et al. apply incremental model slicing to determine the impact of changes on a test model, e.g., finite state machines, and to reason about their potential retest. The approach needs detailed test models, e.g., finite state machines, which rarely exist in contexts where requirements are mostly captured in NL. In addition, Lity et al. do not support the definition of test cases for new requirements, while our approach identifies use case scenarios that have not been tested before and provides information on how to modify existing test cases to cover those new, untested scenarios (Challenge 1). Dukaczewski et al. (Dukaczewski et al., 2013) briefly discuss how to apply incremental product line testing strategies to NL requirements. They do not provide any method to model variability in requirements; it is only suggested that a requirement be split into several requirements, one for each possible product variant. Also, there is no reported systematic approach supported by a tool.

4.3. Test Case Prioritization

Test case prioritization techniques schedule test cases in an order that increases their effectiveness in meeting some performance goals (e.g., rate of fault detection and number of test cases required to discover all the faults) (Rothermel et al., 2001) (Khatibsyarbini et al., 2018) (Yoo and Harman, 2012). They mostly use information about previous executions of test cases (e.g., (Wong et al., 1997) (Rothermel et al., 2001) (Li et al., 2007) (Engström et al., 2011) (Gonzales-Sanchez et al., 2011) (Lachmann et al., 2016b) (Hemmati et al., 2017)), human knowledge (e.g., (Srikanth et al., 2014) (Srikanth and Williams, 2005) (Srikanth et al., 2016) (Srikanth and Banerjee, 2012) (Arafeen and Do, 2013) (Krishnamoorthi and Mary, 2009) (Tonella et al., 2006)), or a model of the system under test (e.g., (e Zehra Haidry and Miller, 2013) (Kundu et al., 2009) (Tahat et al., 2012) (Korel et al., 2005) (Korel et al., 2008)). For instance, Srikanth et al. (Srikanth et al., 2005) propose a test case prioritization approach that takes into consideration customer-assigned priorities of requirements, developer-perceived implementation complexity, requirements volatility, and fault proneness of requirements. Tonella et al. (Tonella et al., 2006) propose a test case prioritization technique using user knowledge through a machine learning algorithm (i.e., Case-Based Ranking). Lachmann et al. (Lachmann et al., 2016b) propose another test case prioritization technique for system-level regression testing based on supervised machine learning. In contrast to the aforementioned approaches, we aim at prioritizing test cases for a new product in a product family, not for the next version of a single system. Our approach considers multiple risk factors in a product line, identifies their impact on test case prioritization for the previous products in the product line, and prioritizes test cases for a new product accordingly (Challenge 2).

There are some other approaches that address product lines (e.g., (Runeson and Engström, 2012) (Engström, 2013) (Al-Hajjaji et al., 2014) (Baller et al., 2014) (Henard et al., 2014) (Ensan et al., 2011) (Devroey et al., 2017) (Devroey et al., 2014) (Al-Hajjaji et al., 2017a) (Al-Hajjaji et al., 2017b) (Lity et al., 2017)). For instance, to increase feature interaction coverage during product-by-product testing, Al-Hajjaji et al. (Al-Hajjaji et al., 2014) (Al-Hajjaji et al., 2016) propose a similarity-based prioritization approach that incrementally selects the most diverse product in terms of features to be tested. Baller et al. (Baller et al., 2014) propose an approach to prioritize products in a product family based on the selection of test suites with regard to cost/profit objectives. The aforementioned techniques prioritize the products to be tested, which is not useful in our context since products are seldom developed in parallel. In contrast, our approach prioritizes the test cases of a new product to support early detection of software faults based on multiple risk factors (Challenge 2).

There are search-based approaches for multi-objective test case prioritization in product lines (e.g., (Wang et al., 2014) (Parejo et al., 2016) (Arrieta et al., 2016) (Pradhan et al., 2018) (Arrieta et al., 2019)). For instance, Parejo et al. (Parejo et al., 2016) model test case prioritization as a multi-objective optimization problem and implement a search-based algorithm to solve it based on the NSGA-II evolutionary algorithm. Arrieta et al. (Arrieta et al., 2019) propose another approach that cost-effectively optimizes the test process of product lines. None of these approaches operates on information at the level of NL requirements.

Lachmann et al. (Lachmann et al., 2015) introduce a test case prioritization technique for incremental testing of product lines using delta-oriented architecture models. The differences between products are captured in the form of deltas (Clarke et al., 2010), which are modifications between architecture models of products used for integration testing. The proposed approach ranks test cases based on the number of changed elements in the architecture. It was later extended using risk factors (Lachmann et al., 2017) and behavioral knowledge of architecture components (Lachmann et al., 2016a). The approach proposed by Lachmann et al. requires access to product architecture descriptions and information about component behavior. In contrast, our approach does not require any design information but relies on NL requirements specifications, i.e., use case specifications (Challenge 2).

5. Overview of the Approach

The process in Fig. 5 presents an overview of our approach. In Step 1, Classify system test cases for the new product, our approach takes as input (i) system test cases, PS use case models, their trace links, and configuration decisions for previous products in the product family, and (ii) PS use case models and configuration decisions for the new product, to classify the system test cases for the new product as obsolete, retestable, and reusable, and to provide information on how to modify obsolete system test cases to cover new, untested use case scenarios (Challenge 1).

Figure 5. Approach Overview

Step 1 is fully automated. The classification and modification information produced by this step supports the test engineer in deciding which test cases to execute for the new product and which modifications to make to the obsolete test cases to cover untested, new use case scenarios. We give the details of this step in Section 6.

In Step 2, Select and modify system test cases for the new product, by using the classification information and modification guidelines automatically provided by our approach, the engineer decides which test cases to run for the new product and modifies obsolete test cases to cover untested, new use case scenarios. This activity is not automated because, for the selection of system test cases, the engineer may also need to consider implementation and hardware changes (e.g., code refactoring and replacing some hardware with less expensive technology) in addition to the classification information provided in Step 1, which is purely based on changes in functional requirements. For instance, a reusable test case might need to be rerun because part of the source code verified by the test case has been refactored.

In Step 3, Prioritize system test cases for the new product, the selected test cases are automatically prioritized based on risk factors including fault-proneness of requirements and requirements volatility (Challenge 2). We discuss this step in Section 7.
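As a rough illustration of Step 3, the sketch below fits a logistic regression on historical executions from previous products and ranks the selected test cases by predicted failure probability; the features, data, and use of scikit-learn are assumptions for illustration, not a description of the actual implementation (detailed in Section 7).

# Hypothetical sketch: prioritize test cases by predicted failure likelihood.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical data from previous products: one row per executed test case, with
# assumed risk-factor features, e.g., [fault-proneness score, number of decision
# changes affecting the covered requirements, scenario size].
X_hist = np.array([[0.8, 3, 12], [0.1, 0, 5], [0.5, 1, 9], [0.9, 4, 15]])
y_hist = np.array([1, 0, 0, 1])            # 1 = the test case failed on that product

model = LogisticRegression().fit(X_hist, y_hist)

# Risk-factor values of the test cases selected for the new product.
X_new = np.array([[0.7, 2, 10], [0.2, 0, 6]])
failure_likelihood = model.predict_proba(X_new)[:, 1]
priority_order = np.argsort(-failure_likelihood)   # higher likelihood first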

6. Classification of System Test Cases in a Product Family

The test case classification is implemented as a pipeline (see Fig. 6), which takes as input the configuration decisions made for the previous products, the configuration decisions made for the new product, and the previous product’s system test cases, trace links, and PS use case models. The pipeline produces an impact report with the list of existing test cases classified.

Configuration decisions are captured in a decision model that is automatically generated by PUMConf during the configuration process. The decision model conforms to a decision metamodel described in our previous work (Hajri et al., 2018b). The metamodel includes the main use case elements for which the user makes decisions (i.e., variation points, optional steps, optional alternative flows, and variant orders). PUMConf keeps a decision model for each configuration in the product line. Fig. 7 provides the decision metamodel and two decision models for the PL use case models in Fig. 1 and Table 1.
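For concreteness, here is an assumed and much-simplified Python rendering of the kinds of decisions captured in a decision model; the real metamodel in Fig. 7(a) is richer.

from dataclasses import dataclass
from typing import Optional

# Assumed, simplified rendering of configuration decisions (cf. Fig. 7(a)).
@dataclass
class VariationPointDecision:
    use_case: str                  # use case including the variation point
    variation_point: str
    selected_variants: frozenset = frozenset()

@dataclass
class OptionalStepDecision:
    use_case: str
    flow: str
    step: str
    selected: bool = False
    order: Optional[int] = None    # variant step order, if any

@dataclass
class OptionalFlowDecision:
    use_case: str
    flow: str
    selected: bool = False

# One decision model (a list of such decisions) is kept per configured product.
decision_model_example = [
    VariationPointDecision("Provide System User Data", "Method of Providing Data",
                           frozenset({"IEE QC Mode", "Standard Mode"})),
    OptionalFlowDecision("Recognize Gesture", "BAF1", selected=False),
]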

Figure 6. Overview of the Model Differencing and Test Case Classification Pipeline

The pipeline has four steps (see Fig. 6). The first three steps are run for each of the previous products in the product line, each of which has its own decision model. Note that we also employed the first two steps of the pipeline in our previous work (Hajri et al., 2018a) (Hajri et al., 2017b). In Step 1, Matching decision model elements, our approach automatically runs the structural differencing of the decision model of the previous product and the decision model of the new product, looking for corresponding model elements that represent decisions for the same variations (see Section 6.1).

Change Types
  • Add a decision
  • Delete a decision
  • Update a decision
    - Select some unselected variant element(s)
    - Unselect some selected variant element(s)
    - Unselect some selected variant element(s) and select some unselected variant element(s)
    - Change order number of variant step order(s)
Table 3. Change Types for Configuration Decisions

In Step 2, Change calculation, the approach determines how the configuration decisions of the two products differ. Table 3 lists the types of decision changes. A decision is represented by means of an n-tuple of model elements in a decision model. A change is of type “Add a decision” when a tuple representing a decision in the decision model of the new product has no matching tuple in the decision model of the previous product. A change is of type “Delete a decision” when a tuple representing a decision in the decision model of the previous product has no matching tuple in the decision model of the new product. A change is of type “Update a decision” when a tuple representing a decision in the decision model of the previous product has a matching tuple in the decision model of the new product with non-identical attribute values (see the red-colored attributes in Fig. 7(c)).

Figure 7. (a) Decision Metamodel, (b) Part of the Example Decision Model of the Previous Product, and (c) Part of the Example Decision Model of the New Product

In Step 3, Test case classification, the system test cases of the previous products are classified for the new product by using the decision changes obtained from Step 2 and the trace links between the system test cases and the PS use case specifications (see Section 6.2). A use case can describe multiple use case scenarios (i.e., sequences of use case steps from the start to the termination of the use case) because of the presence of conditional steps. Each system test case is expected to exercise one use case scenario. For instance, there are three use case scenarios in the use case Provide System User Data in Table 2. For each use case of the new product, we identify the impact of the decision change(s) on the use case scenarios, i.e., any change in the execution sequence of the use case steps in the scenario.

A system test case is classified in one of three categories: obsolete, retestable and reusable. A test case is obsolete if it exercises an invalid execution sequence of use case steps in the new product. A test case is retestable if it exercises an execution sequence of use case steps that has remained valid in the new product, except for internal steps representing internal system operations (e.g., reset of counters). A test case is reusable if it exercises an execution sequence of use case steps that has remained valid in the new product. The test case categories are mutually exclusive. Use case scenarios of the new product that have not been tested for the previous product are reported as new use case scenarios.
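The classification criterion can be sketched as follows, under the simplifying assumptions that a tested scenario is represented as the sequence of use case step identifiers it exercises, that the set of step sequences valid in the new product is known, and that changes to internal steps are tracked separately; all names are illustrative.

# Illustrative sketch of the classification criterion (assumed representation).
def classify_scenario_test_case(old_sequence, valid_sequences, changed_internal_steps):
    if tuple(old_sequence) not in valid_sequences:
        return "obsolete"          # the exercised sequence is no longer possible
    if set(old_sequence) & changed_internal_steps:
        return "retestable"        # sequence still valid, but internal steps changed
    return "reusable"              # sequence valid and unaffected by the changes

# Hypothetical usage: valid_sequences is a set of step-id tuples of the new product.
valid_sequences = {("BF1", "BF2", "BF3", "BF4", "BF5"), ("BF1", "BF2", "SAF1.1")}
print(classify_scenario_test_case(["BF1", "BF2", "BF3", "BF4", "BF5"],
                                  valid_sequences, {"SAF2.1"}))   # reusable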

In Step 4, Impact report generation, we automatically generate an impact report from the classified test cases of each previous product to enable engineers to select test cases from more than one test suite (see Section 6.3). Steps 1, 2 and 3 perform a pairwise comparison of each previous product with the new product. If there are multiple previous products (see Fig. 6), the test cases of each previous product are classified in a separate report in Step 3. The generated impact report compares these separate reports and lists the sets of new scenarios and of reusable and retestable test cases for the previous products.

6.1. Steps 1 and 2: Model Matching and Change Calculation

For the first two pipeline steps in Fig. 6, we rely on a model matching and change calculation algorithm we devised in our prior work (Hajri et al., 2017b) (Hajri et al., 2018a). In this section, we provide a brief overview of the two steps and their output for the example decision models in Fig. 7(b) and (c).

Decisions in the decision model of the previous product (Fig. 7(b))    Decisions in the decision model of the new product (Fig. 7(c))
<B6, B7>            <C6, C7>
<B18, B19>          <C18, C19>
<B11, B12, B13>     <C11, C12, C13>
<B11, B12, B14>     <C11, C12, C14>
<B11, B12, B15>     <C11, C12, C15>
<B11, B12, B16>     <C11, C12, C16>
<B11, B12, B17>     <C11, C12, C17>
Table 4. Matching Decisions in the Decision Models of Fig. 7

In Step 1, we identify the pairs of decisions in the two decision models that are made for the same variants. The decision metamodel in Fig. 7(a) includes the main use case elements for which the user makes decisions (i.e., variation point, optional step, optional alternative flow, and variant order). In a variation point included by a use case (in PL use case diagrams, use cases are connected to variation points with an include dependency), the user selects the variant use cases to be included for the product. For PL use case specifications, the user selects optional steps and alternative flows to be included and determines the order of steps (variant order). Therefore, the matching decisions in Step 1 are the pairs of variation points and use cases including the variation points, the pairs of use cases and optional alternative flows in the use cases, and the triples of use cases, flows in the use cases, and optional steps in the flows. Table 4 shows some matching decisions for the models in Fig. 7(b) and (c). For example, a matching pair of tuples in Table 4 represents two decisions for the variation point Method of Providing Data included in the use case Provide System User Data, and a matching pair of triples represents two decisions for an optional use case step in the basic flow of the use case Provide System User Data via Standard Mode (i.e., for V2 in Line 40 in Table 1).

In Step 2, Change Calculation, we first identify deleted and added configuration decisions by checking which tuples of model elements in one input decision model have no matching tuples in the other input decision model. To identify updated decisions, we check the tuples of model elements in the previous product’s decision model that have matching tuples in the new product’s decision model with non-identical attribute values. The matching pairs of variation points and their including use cases represent decisions for the same variation point (e.g., the matching pairs in Table 4). If the selected variant use cases for the same variation point are not the same in the two decision models, the decision is considered as updated in the new product. We have similar checks for optional steps, optional alternative flows and variant step orders. For instance, an optional step may be selected in a decision of the previous product’s decision model, while the same optional step is unselected in the matching decision of the new product’s decision model. For the decision models in Fig. 7, several decisions are identified as updated, while there are no deleted or added decisions.
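The matching and change calculation can be pictured as a dictionary comparison where each decision is keyed by the tuple of model elements it refers to; the sketch below and its example values are assumptions for illustration.

# Illustrative change calculation over two decision models keyed by element tuples.
def calculate_changes(dm_previous, dm_new):
    added   = [key for key in dm_new if key not in dm_previous]
    deleted = [key for key in dm_previous if key not in dm_new]
    updated = [key for key in dm_previous
               if key in dm_new and dm_previous[key] != dm_new[key]]
    return added, deleted, updated

# Hypothetical values: the selected variants for the same variation point differ,
# so the decision is reported as updated.
dm_prev = {("Provide System User Data", "Method of Providing Data"):
           frozenset({"IEE QC Mode", "Standard Mode", "Diagnostic Mode"})}
dm_new  = {("Provide System User Data", "Method of Providing Data"):
           frozenset({"IEE QC Mode", "Standard Mode"})}
print(calculate_changes(dm_prev, dm_new))   # ([], [], [('Provide System User Data', 'Method of Providing Data')])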

6.2. Step 3: Test Case Classification

Input: Set of use case specifications of the previous product (UC),
                    Test suite of the previous product (ts),
                    Triple of sets of decision-level changes
                    dc = (ADD, DELETE, UPDATE)
Output: Quadruple of sets of classified test cases (res)
1. Let OBSOLETE be the empty set for obsolete test cases
2. Let REUSE be the empty set for reusable test cases
3. Let RETEST be the empty set for retestable test cases
4. Let NEW be the empty set for new use case scenarios
5. Let res be the quadruple (OBSOLETE, REUSE, RETEST, NEW)
6. foreach use case uc in UC do
7.   if (there is a change in dc for uc) then
8.     m := generateUseCaseModel(uc)
9.     Let m' be a new version of m after the changes in dc
10.    S := identifyTestedScenarios(m, ts)
11.    foreach scenario s in S do
12.      T := retrieveTestCases(ts, uc, s)
13.      analyzeImpact(s, T, m', res)
14.    end foreach
15.  else
16.    REUSE := REUSE ∪ retrieveTestCases(ts, uc)
17.  end if
18. end foreach
19. filterNewScenarios(res)
20. return res
Figure 8. Test Case Classification Algorithm

System test cases of the previous product are automatically classified based on the identified changes (Step 3 in Fig. 6). To this end, we devise an algorithm (see Fig. 8) which takes as input a set of use cases (UC), the test suite of the previous product (ts), and a triple of the sets of configuration changes (dc) detected in Step 2. It classifies the test cases and reports use case scenarios of the new product that are not present in the previous product.

For each use case in the previous product, we check whether it is impacted by some configuration changes (Lines 6-7 in Fig. 8). If there is no impact, all the system test cases of the use case are classified as reusable (Lines 15-17); otherwise, we rely on the function generateUseCaseModel (Line 8) to generate a use case model, i.e., a model that captures the control flow in the use case. This model is used to identify scenarios that have been tested by one or more test cases (identifyTestedScenarios in Line 10). For each scenario verified by a test case (retrieveTestCases in Line 12), we rely on the function analyzeImpact (Line 13) to determine how decision changes affect the behaviour of the scenario.

In Sections 6.2.1, 6.2.2, 6.2.3 and 6.2.4, we give the details of the functions generateUseCaseModel, identifyTestedScenarios, retrieveTestCases and analyzeImpact, respectively.

Figure 9. Metamodel for Use Case Scenario Models

6.2.1. Use Case Model Generation

To generate a use case scenario model from a PS use case specification, we rely on a Natural Language Processing (NLP) solution proposed by Wang et al. (Wang et al., 2015a). It relies on the RUCM keywords and part-of-speech tagging to extract the information required to build a use case model. In this section, we briefly describe the metamodel for use case scenario models, shown in Fig. 9, and provide an overview of the model generation process. UseCaseStart represents the beginning of a use case with a precondition and is linked to the first Step (i.e., next in Fig. 9). There are two Step subtypes, i.e., Sequence and Condition. Sequence has a single successor, while Condition has two successors (i.e., true and false in Fig. 9).

Figure 10. Use Case Scenario Models Generated from the Use Case Specifications in Table 2

Interaction indicates the invocation of an input/output operation between the system and an actor. Internal indicates that the system alters its internal state. Exit represents the end of a use case flow, while Abort represents the termination of an anomalous execution flow. Fig. 10 shows the models generated from the use cases in Table 2. For each step identified as Interaction, Include, Internal, Condition or Exit, a Step instance is generated and linked to the previous Step instance.

For each alternative flow, a Condition instance is created and linked to the Step instance of the first step of the alternative flow (e.g., a4 and a5 in Fig. 10(a)). For multiple alternative flows on the same condition, Condition instances are linked to each other in the order they follow in the specification. For alternative flows that return to the reference flow, an Exit instance is linked to the Step instance that represents the reference flow step (e.g., next between b8 and b3 in Fig. 10(b)).

For alternative flows that abort, an Abort instance is created and linked to the Step instance of the previous step (e.g., a8, a10, b15 and c7 in Fig. 10). For the end of the basic flow, there is always an Exit instance (e.g., a7, b6, c5 and d5 in Fig. 10).
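As an illustration, the metamodel of Fig. 9 can be rendered in memory roughly as follows; the Python classes below mirror the metamodel names, but the attributes are simplified and the flat subtype hierarchy is an assumption made for brevity.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    label: str                        # e.g., "a4", "b8" in Fig. 10
    next: Optional["Step"] = None     # single successor of a Sequence-like step

@dataclass
class Condition(Step):
    true: Optional["Step"] = None     # branch taken when the condition holds
    false: Optional["Step"] = None    # branch taken otherwise

@dataclass
class UseCaseStart(Step):
    precondition: str = ""

class Interaction(Step): pass         # input/output operation between system and actor
class Internal(Step): pass            # the system alters its internal state
class Exit(Step): pass                # end of a use case flow
class Abort(Step): pass               # termination of an anomalous execution flow

# A three-step flow: start -> interaction -> exit
flow = UseCaseStart("a1", Interaction("a2", Exit("a3")))
print(flow.next.label)   # a2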

6.2.2. Identification of Tested Use Case Scenarios

We automatically identify tested use case scenarios in a use case specification. A scenario is a sequence of steps that begins with a UseCaseStart instance and ends with an Exit instance in the use case model. Each use case scenario captures a set of interactions that should be exercised during the execution of a test case.

Figure 11. Some Tested Use Case Scenarios

The function identifyTestedScenarios (see Line 10 in Fig. 8) implements a depth-first traversal of use case scenario models to identify tested scenarios. It only visits alternative flows that are tested, by the same test case, together with the previously visited alternative flows.

Fig. 11 shows three tested scenarios extracted from the scenario models in Fig. 10(a) and (c). The scenario in Fig. 11(a) executes the true branch of the Condition instance a5 in Fig. 10(a), while the scenario in Fig. 11(b) executes the false branch of the same Condition instance. The scenario in Fig. 11(c) executes the basic flows in Fig. 10(c) and (d).
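A minimal sketch of such a depth-first enumeration is given below; it encodes the scenario model as a plain dictionary and enumerates all paths from the UseCaseStart node to an Exit or Abort node, covering each condition branch at most once per path. It approximates identifyTestedScenarios in that it does not yet filter scenarios by the availability of test cases.

def scenarios(model, node_id, path=()):
    """Yield every scenario (tuple of node ids) from node_id to a terminating Exit/Abort node."""
    node = model[node_id]
    path = path + (node_id,)
    if node["type"] in ("Exit", "Abort") and node.get("next") is None:
        yield path
        return
    if node["type"] == "Condition":
        for branch in ("true", "false"):
            target = node.get(branch)
            if target is not None and target not in path:   # cover each loop body once
                yield from scenarios(model, target, path)
    else:
        yield from scenarios(model, node["next"], path)      # Exit with a successor resumes the reference flow

# Toy model: start -> condition -> (true: output -> exit, false: exit)
model = {
    "start": {"type": "UseCaseStart", "next": "c1"},
    "c1":    {"type": "Condition", "true": "out", "false": "e2"},
    "out":   {"type": "Interaction", "next": "e1"},
    "e1":    {"type": "Exit", "next": None},
    "e2":    {"type": "Exit", "next": None},
}
print(list(scenarios(model, "start")))
# [('start', 'c1', 'out', 'e1'), ('start', 'c1', 'e2')]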

6.2.3. Identification of Test Cases for Use Case Scenarios

We use trace links between test cases and use case specifications to retrieve test cases for a given scenario. The accuracy of test case retrieval depends on the granularity of trace links. Companies may follow various traceability strategies (Ramesh and Jarke, 2001), and generate links in a broad range of granularity (e.g., to use cases, to use case flows or to use case steps). We implement a traceability metamodel which enables the user to generate trace links at different levels of granularity (see Fig. 12(a)).

Figure 12. (a)Traceability Metamodel and (b) Example Model

Fig. 12(b) gives part of the traceability model for trace links, assigned by engineers, between two test cases and the use cases Recognize Gesture and Identify System Operating Status in Table 2. Test case t1 is traced to the basic flows of Recognize Gesture and Identify System Operating Status (i.e., (t1, f1) and (t1, f3)), while test case t2 is traced to the specific alternative flow SAF2 of Recognize Gesture (i.e., (t2, f2)).

We retrieve, using the trace links in Fig. 12(b), t1 for the scenario in Fig. 11(a), since that scenario is the only one executing the basic flows of Recognize Gesture and Identify System Operating Status, to which t1 is traced. The scenario in Fig. 11(b) executes the specific alternative flow SAF2 of Recognize Gesture (see a9 and a10 in Fig. 11(b)). Therefore, we retrieve t2 for the scenario in Fig. 11(b).

In most cases, our approach requires only trace links from test cases to basic and alternative flows, without any indication of the execution order of the flows. There are a few cases where finer-grained trace links are needed to retrieve test cases. First, finer-grained trace links are needed when multiple scenarios take the same alternative flows in different orders. In such a case, our approach needs the execution order of the alternative flows to match test cases and scenarios (see the attribute order in Fig. 12(a)).

Second, finer-grained trace links are needed when more than one scenario takes the same bounded or global alternative flow. Those alternative flows refer to more than one step in a reference flow. Hence, a scenario can take a bounded/global alternative flow from different reference flow steps; we need trace links indicating the reference flow step in which the flow is taken (see “to” from TraceLink to Step in Fig. 12(a)). If trace links are not available at the required level of granularity, we ask the user to match scenarios and test cases.
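The following sketch shows one possible way to retrieve test cases from flow-level trace links; trace links are modeled as simple (test case, flow) pairs, and the exact-match criterion used here is an assumption for illustration, not the tool's actual matching logic.

def retrieve_test_cases(trace_links, scenario_flows):
    """Return the test cases whose traced flows coincide with the flows exercised by the scenario.
    trace_links: iterable of (test_case_id, flow_id) pairs, e.g., ("t1", "f1")."""
    candidates = {}
    for tc, flow in trace_links:
        candidates.setdefault(tc, set()).add(flow)
    # One possible matching criterion: the flows the test case is traced to are exactly
    # the flows exercised by the scenario (finer-grained links would also check step/order).
    return {tc for tc, flows in candidates.items() if flows == set(scenario_flows)}

links = [("t1", "f1"), ("t1", "f3"), ("t2", "f2")]
print(retrieve_test_cases(links, {"f1", "f3"}))   # {'t1'}
print(retrieve_test_cases(links, {"f2"}))         # {'t2'}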

Inputs: Old use case scenario s, Set of test cases T,
        New use case specification uc', Decision changes dc
Output: Quadruple of sets of classified test cases
        (OBSOLETE, REUSE, RETEST, NEW)
1. sm ← generateScenarioModel(uc')
2. Let start be the UseCaseStart instance in sm
3. Let s' be an empty scenario
4. if (there is at least one change in dc for s) then
5.   analyzeChangesOnScenario(s, T, dc)
6.   identifyNewScenarios(sm, start, s, s')
7. else
8.   REUSE ← REUSE ∪ T
9. end if
10. return (OBSOLETE, REUSE, RETEST, NEW)
Figure 13. Algorithm for analyzeImpact

6.2.4. Impact Identification

We analyze the impact of configuration changes on use case scenarios to identify new scenarios and to classify the retrieved test cases as obsolete, retestable or reusable. To this end, we devise an algorithm (see Fig. 13) which takes as input a use case scenario to be analyzed (s), a set of test cases verifying the scenario (T), a use case specification for the new product (uc'), and the triple of sets of configuration changes (dc) produced in Step 2. If there is no change impacting the scenario, the test cases verifying the scenario are classified as reusable (Line 8 in Fig. 13). For any change in the scenario (e.g., removing a use case step), the test cases are classified as either retestable or obsolete (Line 5), as shown in Table 5, which describes how test cases are classified based on the types of changes affecting the variant elements covered by a scenario. A test case is classified as retestable when it does not need to be modified to cover the corresponding scenario. Changes impacting a use case scenario may lead to modifications in the source code. Since modifications in the source code may introduce faults, retestable test cases are expected to be re-executed to ensure that the system behaves correctly. A test case is classified as obsolete when the sequence of inputs it provides to the system may no longer enable the execution of the corresponding scenario or when the oracles used to verify the results may no longer be correct, given the changes to the scenario. Obsolete test cases cannot be reused as-is to retest the system; they need to be modified first.

Rule ID Change in the Scenario Test Case Classification Rationale
R1 Add or remove an internal step Retestable Internal use case steps represent internal system operations (e.g., reset of counters) and do not directly affect system-actor interactions. Therefore, a test case does not need to be modified to exercise a scenario including added or deleted internal steps (e.g., a new internal step does not imply an additional test input or an update in the test oracle). The test case can be executed against the new product without any change; however, the system may not behave as expected (e.g., because of a faulty implementation of a new internal use case step) and thus the test case is classified as retestable.
R2 Update the order of an internal step Retestable Since internal use case steps do not directly affect system-actor interactions, a test case does not need to be modified in the presence of a change in the order of internal steps (i.e., a different sequence of internal steps does not imply an update in test inputs or oracles). However, the system may not behave as expected (e.g., because of a faulty implementation of the new order of an internal step) and thus the test case is classified as retestable.
R3 Add or remove a condition step where the condition refers exclusively to state variables Retestable Condition steps are used to verify properties of input entities and/or state variables. A condition step, in practice, restricts the execution of a use case scenario to a subset of the values assigned to the input entities and/or state variables verified by the condition. State variables are used to model the system state, while input entities describe system inputs provided by actors. The addition and removal of condition steps that verify the properties of state variables reflect changes in the internal behaviour of the system but not in the system-actor interactions. Therefore, a test case is not modified in the presence of added/removed condition steps that only verify the properties of state variables (e.g., such a new condition step does not imply an update in test inputs and oracle). However, the system may not behave as expected (e.g., because of a faulty implementation of the changed state variables) and thus the test case is classified as retestable.
R4 Add or remove a condition step where the condition refers to an input entity Obsolete Adding or removing a condition step referring to input entities may imply an update in the test inputs if the test input values do not satisfy the changed condition. Since we do not inspect executable test cases in our analysis, it is not possible to determine if the test cases of the previous product already provide the values that fulfill the changed condition. To be conservative, we consider test cases of scenarios impacted by such changes as obsolete thus forcing engineers to verify if the test input values exercise the scenario.
R5 Update the order of a condition step Obsolete When old and new scenarios differ regarding the order in which condition steps appear, then the behaviour triggered by the test case of the previous product might not be the same in the new product (e.g., if the steps that define the variables verified by the condition are between the condition steps that have been changed). Therefore, we consider a test case that exercises an old scenario affected by such changes as obsolete.
R6 Add or remove an input/output step Obsolete Input and output use case steps represent system-actor interactions. Therefore, the implementation of the test case needs to be modified to exercise the targeted scenario when input and output steps are added or removed (e.g., a new input step implies an additional test input in the test case).
R7 Update the order of an input/output step Obsolete Since input and output use case steps represent system-actor interactions, the implementation of the test case needs to be modified to exercise the targeted scenario when the order of input and output steps is updated (e.g., a new order of input steps implies an update in the sequence of test inputs).
R8 Remove an alternative flow Obsolete Alternative flows capture sequences of interactions taking place under certain execution conditions. If a use case scenario of the previous product covers an alternative flow that does not exist in the new product, the corresponding test case should be considered as obsolete because the interactions verified by the test case cannot take place with the new product.
R9 Multiple changes in the use case scenario Obsolete or Retestable A test case is classified as obsolete if there is at least one change in the scenario that makes the test case obsolete. A test case is classified as retestable if there are no changes in the scenario that make the test case obsolete and if there is at least one change in the scenario that makes the test case retestable.
Table 5. Changes in Use Case Scenarios and Classification of System Test Cases
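The rules of Table 5 can be encoded as a small decision function, as sketched below; the change descriptors (action, element kind) are a hypothetical encoding, and rule R9 is realized by letting any obsolete-producing change dominate.

# Sketch of the classification rules R1-R9 in Table 5. A change is a hypothetical
# (action, element) pair, e.g., ("add", "internal") or ("reorder", "io").
OBSOLETE_CHANGES = {
    ("add", "condition_on_input"), ("remove", "condition_on_input"),   # R4
    ("reorder", "condition"),                                          # R5
    ("add", "io"), ("remove", "io"),                                   # R6
    ("reorder", "io"),                                                 # R7
    ("remove", "alternative_flow"),                                    # R8
}
RETESTABLE_CHANGES = {
    ("add", "internal"), ("remove", "internal"),                       # R1
    ("reorder", "internal"),                                           # R2
    ("add", "condition_on_state"), ("remove", "condition_on_state"),   # R3
}

def classify(changes):
    """Classify a test case given the changes affecting the scenario it covers (R9)."""
    if any(c in OBSOLETE_CHANGES for c in changes):
        return "obsolete"
    if any(c in RETESTABLE_CHANGES for c in changes):
        return "retestable"
    return "reusable"

print(classify([("add", "internal"), ("reorder", "io")]))   # obsolete (R7 dominates, R9)
print(classify([("add", "condition_on_state")]))            # retestable (R3)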
Input: New scenario model sm, model instance inst, old scenario s, new scenario s'
Output: Set of triples of new scenario, old scenario and guidance
1. Let R be the empty set for triples of new scenario, old scenario and guidance
2. if (inst is a UseCaseStart, Interaction, Include or Internal instance) then
3.   addToScenario(s', inst)
4.   R ← identifyNewScenarios(sm, inst.next, s, s')
5. end if
6. if (inst is a Condition instance) then
7.   addToScenario(s', inst)
8.   if (inst exists in s) then
9.     Let next be the instance after inst in the branch taken in s
10.    Let next' be the instance corresponding to next in sm
11.    R ← identifyNewScenarios(sm, next', s, s')
12.  else
13.    if (inst represents a condition in a specific alternative flow) then
14.      if (inst.true and inst.false exist together in s') then
15.        R ← identifyNewScenarios(sm, inst.true, s, s')
16.      else
17.
18.        R ← identifyNewScenarios(sm, inst.true, s, s')
19.        R ← R ∪ identifyNewScenarios(sm, inst.false, s, s')
20.      end if
21.    else
22.      if (inst.true and inst.false exist together in s') then
23.        R ← identifyNewScenarios(sm, inst.false, s, s')
24.      else
25.
26.        R ← identifyNewScenarios(sm, inst.true, s, s')
27.        R ← R ∪ identifyNewScenarios(sm, inst.false, s, s')
28.      end if
29.    end if
30.  end if
31. end if
32. if (inst is an Exit or Abort instance) then
33.  if (inst is an Exit instance for the included use case) then
34.    R ← identifyNewScenarios(sm, the step following the Include step, s, s')
35.  else
36.    addToScenario(s', inst)
37.    g ← generateGuidance(s, s')
38.    R ← R ∪ {(s', s, g)}
39.  end if
40. end if
41. return R
Figure 14. Algorithm for identifyNewScenarios

For the example configuration changes identified in Section 6.1, the scenarios in Fig. 11(a) and (b) are classified as retestable while the scenario in Fig. 11(c) is classified as obsolete. One updated decision concerns the use case Recognize Gesture: the optional bounded alternative flow that was unselected in the previous product is selected in the new product (see Section 6.1). The selected optional flow contains a condition, i.e., “voltage fluctuation is detected” in Line 10 in Table 1, which does not refer to any entity in the input steps. Since this condition step is added to the scenarios in Fig. 11(a) and (b), these two scenarios are classified as retestable. The updated decisions in Fig. 7 for the use case Provide System User Data (see Section 6.1) select some previously unselected output steps, unselect one previously selected output step, and update the order of the output steps in the basic flow of Provide System User Data (see Fig. 7). Therefore, the test case verifying the scenario in Fig. 11(c) for the basic flow of Provide System User Data is classified as obsolete (see rules R6, R7 and R9 in Table 5).

Figure 15. Two New Scenarios Derived from the Scenarios in Fig. 11

We process the scenarios impacted by the configuration changes to identify new scenarios for the new product (Line 6 in Fig. 13). Furthermore, for each new scenario, we provide guidance to support the engineers in the implementation of test case(s). To this end, we devise an algorithm (Fig. 14) which takes as input a use case model of the new product (sm), a use case step in the model (inst), a use case scenario of the previous product (s) that has been exercised by either an obsolete or retestable test case, and a new scenario (s') which is initially empty. The algorithm generates a set of triples (s', s, g), where s' is the new scenario identified, s is the old scenario of the previous product, and g is the guidance, i.e., a list of suggestions indicating how to modify the test cases covering s to generate test cases covering s'.

In Fig. 14, the algorithm performs a depth-first traversal of sm, following the use case steps in sm that have corresponding steps in the old scenario. To this end, when traversing condition steps, the algorithm follows the alternative flows taken in the old scenario (Lines 8-11). Whenever a Condition instance is encountered, the algorithm checks whether the Condition instance also exists in the old scenario (Line 8). If so, the algorithm proceeds with the condition branch taken in the old scenario, i.e., the step following the Condition instance in the old scenario (Line 11); otherwise, it takes the condition branch(es) that have not yet been taken (Lines 12-30).

Figure 16. PUMConf’s User Interface for Guidance

Alternative flows may lead to execution loops; this happens when alternative flows resume the execution of steps belonging to their originating flows. In our current implementation, we generate scenarios that cover each loop body once. To this end, when processing condition steps, the algorithm checks whether the branches that may lead to cycles have already been traversed (Lines 14 and 22). If so, the traversal of the scenario is directed towards the branch that brings the scenario out of the cycle (i.e., the true branch for specific alternative flows and the false branch for bounded or global alternative flows, as shown in Lines 15 and 23, respectively).

The generation of a new scenario terminates when an Exit or Abort step is reached (Line 32). The only exception is that of Exit steps of included use cases, which lead to the step that follows the Include step (Line 33). Before adding the new scenario to the result tuple, we automatically compare the old and new scenarios and determine their differences to generate the guidance for new test cases (Line 37). We provide a set of suggestions for adding, removing and updating test case steps corresponding to the added, removed and updated use case steps in the two scenarios.
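A possible way to derive such guidance is to align the old and new scenario step sequences and translate the differences into suggestions, as in the following sketch; the use of Python's difflib is an implementation choice for illustration, not the mechanism used by PUMConf.

import difflib

def guidance(old_scenario, new_scenario):
    """Suggest test case modifications by aligning the old and new scenario step sequences."""
    suggestions = []
    matcher = difflib.SequenceMatcher(a=old_scenario, b=new_scenario)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            suggestions += [f"remove test step(s) for use case step '{s}'" for s in old_scenario[i1:i2]]
        if op in ("insert", "replace"):
            suggestions += [f"add test step(s) for use case step '{s}'" for s in new_scenario[j1:j2]]
    return suggestions

old = ["a1", "a2", "a3", "a7"]                 # steps of an old scenario (Fig. 11(a) style)
new = ["a1", "a2", "a3", "anew1", "anew2"]     # new scenario taking the bounded alternative flow
print(guidance(old, new))
# ["remove test step(s) for use case step 'a7'",
#  "add test step(s) for use case step 'anew1'",
#  "add test step(s) for use case step 'anew2'"]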

Fig. 15 gives two new scenarios derived from the scenarios in Fig. 11: Fig. 15(a) is derived from Fig. 11(a) and (b), while Fig. 15(b) is derived from Fig. 11(c). The new scenario in Fig. 15(a) executes the newly selected optional bounded alternative flow in which the use case Recognize Gesture aborts due to the voltage fluctuation (see Lines 8-12 in Table 1). While traversing sm for the scenario in Fig. 11(a), the new Condition instance anew1 and the new Abort instance anew2 (green-colored in Fig. 15(a)) are added to the new scenario to execute the bounded alternative flow. The new scenario in Fig. 15(b) executes the basic flow of the use case Provide System User Data of the new product, where the order of one step is updated (blue-colored in Fig. 15(b)) and some new steps are introduced (green-colored) while some others are removed.

Fig. 16 shows the guidance generated to modify the test case verifying the retestable scenario in Fig. 11(a) so that it covers the new scenario in Fig. 15(a). We use red and green colors, with a legend, on the scenario to highlight the impacted parts of the corresponding test case: the red steps are deleted from the scenario while the green ones are added. Using this information, the engineer adds and deletes test case steps to cover the new scenario.

Fig. 17 shows the header of the test case verifying the new scenario in Fig. 15(a), with the description of the functions under test. For simplicity, we omit the implementation of the executable test case. We use the guidance to derive the new test case from the test case in Fig. 4(a), which verifies the scenario in Fig. 11(a). The bold lines in Fig. 17 are the new objectives and methods of the test case that correspond to the new use case steps in Fig. 15(a) (i.e., anew1 and anew2).

A new scenario might be derived separately from multiple old scenarios. After all the new scenarios are identified for the new product, we automatically detect such new scenarios and provide guidance only for the test cases of the old scenarios from which the engineer can generate the new test cases with the fewest possible changes (Line 19 in Fig. 8). We rank those old scenarios according to the number of changes. If the number of changes is the same, we give priority to the scenarios whose changes mostly remove test case steps, under the assumption that removing test case steps is easier than adding new ones. For instance, our approach derives the new scenario in Fig. 15(a) from the two scenarios in Fig. 11(a) and (b). To generate a test case verifying the new scenario in Fig. 15(a), the engineer can modify one of the test cases verifying the scenarios in Fig. 11(a) and (b). In this example, our approach provides guidance for both scenarios because the number of changes and the number of removed and added test case steps are the same for the two scenarios.

Figure 17. System Test Case derived from the System Test Case in Fig. 4(a) of the Scenario in Fig. 11(a)

6.3. Step 4: Impact Report Generation

We automatically generate an impact report from the classified test cases of each previous product in a product line (Step 4 in Fig. 6). To enable engineers to select test cases from more than one test suite and thus maximize the number of test cases that can be inherited from previous products, we compare all the test suites in the product line and identify sets of new scenarios and of reusable and retestable test cases for the product line. Assume that there are n previous products in a product line. N1, N2, …, and Nn are the sets of new scenarios; RU1, RU2, …, and RUn are the sets of reusable test cases; RT1, RT2, …, and RTn are the sets of retestable test cases we identify when we compare the new product with each previous product. To minimize the number of new test cases the engineer needs to generate, we compute the intersection of the sets of new scenarios (N = N1 ∩ N2 ∩ … ∩ Nn). The scenarios which are not in this intersection are covered by at least one reusable or retestable test case in one of the previous products. Therefore, we take the union of the sets of reusable test cases (RU = RU1 ∪ RU2 ∪ … ∪ RUn) and the union of the sets of retestable test cases (RT = RT1 ∪ RT2 ∪ … ∪ RTn). If a test case is considered both retestable and reusable (i.e., it belongs to RU ∩ RT), we list the previous products in which the test case is identified as retestable and as reusable. The engineer decides the classification of the test case based on the test suite of the previous product he chooses for the new product.
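The set operations described above can be sketched as follows; the per-product results are assumed to be available as sets of identifiers.

from functools import reduce

def combine(per_product):
    """per_product: list of dicts with keys 'new', 'reusable', 'retestable' (sets of ids)."""
    new = reduce(set.intersection, (p["new"] for p in per_product))        # N = N1 ∩ ... ∩ Nn
    reusable = set().union(*(p["reusable"] for p in per_product))          # RU = RU1 ∪ ... ∪ RUn
    retestable = set().union(*(p["retestable"] for p in per_product))      # RT = RT1 ∪ ... ∪ RTn
    both = reusable & retestable   # reported to the engineer with the originating products
    return new, reusable, retestable, both

results = [
    {"new": {"s1", "s2"}, "reusable": {"t1"}, "retestable": {"t2"}},
    {"new": {"s2"},       "reusable": {"t2"}, "retestable": set()},
]
print(combine(results))   # e.g., ({'s2'}, {'t1', 't2'}, {'t2'}, {'t2'}); set ordering may vary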

Based on the system under test, engineers decide whether to select test cases from a single test suite or from multiple test suites in the product line. For example, if multiple products include different setup procedures (e.g., due to different HW architecture or library versions being used) that need to be executed at the beginning of each test case, it is more practical to select test cases from a single test suite.

7. Prioritization of System Test Cases in a Product Family

Test case prioritization is implemented as a pipeline (see Fig. 18). The pipeline takes as input the test suite of the new product, the test execution history of the previous products (i.e., the outcome of each test case of the product test suite, for each previous product and version), the size of the use case scenarios exercised by the test cases, the classification of the test cases (i.e., reusable or retestable), and the variability information of the product line. Based on a prediction model using these factors, the test cases of the given test suite are sorted to maximize the likelihood of executing failing test cases first.

The prioritization pipeline gives the highest priority to test cases covering new scenarios (i.e., scenarios not available for previous products) since they exercise features that have never been tested before. The prioritization of retestable and reusable test cases is instead driven by a set of factors typically correlated with the triggering of failures, according to the relevant literature (e.g., (Srikanth et al., 2005) (Engström et al., 2011) (Wong et al., 1997) (Rothermel et al., 2001) (Li et al., 2007)): the number of previous products in which the test case failed, the number of previous products’ versions in which the test case failed, the size of the scenario exercised by the test case, the degree of variability in the use case scenario exercised by the test case, and the classification of the test case (i.e., reusable or retestable). Note that different versions of a product share the same test suite because functional requirements do not vary across the versions of the same product. The number of previous products in which the test case failed and the number of versions in which the test case failed capture the fault proneness of the test cases, a factor typically considered by other test case prioritization approaches (Srikanth et al., 2005) (Engström et al., 2011). The size of the use case scenario exercised by a test case is measured in terms of the number of use case steps it contains. The scenario size captures the complexity of the operations performed by the system during the execution of the test case, under the assumption that longer scenarios require more complex software implementations. Implementation complexity is one of the factors considered in other requirements-based prioritization approaches (Srikanth et al., 2005). The degree of variability in the use case scenario exercised by a test case is measured by counting the number of decision elements included in the use case scenario. In the presence of high variability, it is more likely that some of the system properties verified by the test case are not implemented properly. Finally, the classification of a test case as retestable is considered for prioritization since, by definition, the scenario exercised by a retestable test case might be affected by changes in behaviour and thus may trigger a failure.

Figure 18. Overview of the Test Case Prioritization Pipeline

All these factors mentioned above may have varying importance for test case prioritization in different product lines due to technical and organizational factors. Some factors may even not significantly affect test case prioritization for some product lines. To account for the changing importance of risk factors on test case prioritization, the pipeline first identifies the factors significantly correlated with the presence of failures and prioritizes test cases based on a prediction model relying on such factors.

The prioritization pipeline includes two steps. In Step 1, Identifying significant factors, our approach automatically identifies significant factors for prioritizing the test cases of a new product. To this end, we employ logistic regression (Jr. et al., 2013), i.e., a predictive analysis to determine the relationship between one dependent binary variable (i.e., the failure of a test case) and one or more independent variables, which might be either numeric (e.g., the number of products in which the test case failed in the past) or binary (e.g., the fact that a test case has been classified as retestable).

In our context, the logistic regression model estimates the logarithm of the odds that a test case fails. The model is trained using variability information, the size of the use case scenarios exercised by the test cases, the classification of the test cases, and the execution history of the test cases for previous products. The logistic regression model has the following form:

ln(Pᵢ / (1 − Pᵢ)) = β₀ + β₁·Vᵢ + β₂·Sᵢ + β₃·FPᵢ + β₄·FVᵢ + β₅·Rᵢ

where Pᵢ is the probability that test case i fails, Vᵢ is the degree of variability of the scenario exercised by the test case (i.e., the number of decision elements in the scenario), Sᵢ is the size of the use case scenario exercised by the test case (i.e., the number of steps), FPᵢ is the number of failing products, FVᵢ is the number of failing versions, and Rᵢ indicates whether the test case has been classified as retestable. β₀ is the intercept, while β₁, …, β₅ are coefficients which are derived, using the iteratively reweighted least squares approach (Coleman et al., 1980), to estimate the effect size of each factor on the failure probability.

We rely on the R environment (Rpr, 2018) to derive the logistic regression model. Our toolset automatically generates from the available data the training data set to be processed by the R environment. Table 6 shows an excerpt of an example training data set generated by our toolset.

Table 6 includes the failure history of products P1, P2 and P3, to be used to prioritize the test cases for P4. Each row in Table 6 reports the information belonging to a single test case executed against a version of a product. The first and second columns represent the product and its version, respectively. The third column reports the test case identifier, while the fourth column indicates whether the test case fails (i.e., the dependent variable). The rest of the columns in Table 6 represent independent variables used to predict failure. The fifth column indicates whether the test case has been classified as retestable. The sixth column reports the size of the use case scenario exercised by the test case. The seventh column reports the degree of variability of the scenario exercised by the test case. Table 6, for instance, shows that test case TC1 executed against P2 covers nine use case steps while the same test case covers eight use case steps when executed against P1; this is because the use case scenario covered in P2 includes one more variant element than the use case scenario covered in P1 (see column Degree of Variability of the Scenario). Test case TC3 has been introduced in P2 to cover one additional use case scenario not present in P1. The eighth and ninth columns report the number of previous products and the number of previous versions in which the test case fails, respectively.



Product ID  Version ID  Test Case ID  Fails  Retestable  Size of the Use Case Scenario  Degree of Variability of the Scenario  # of Previous Products in which it Fails  # of Previous Versions in which it Fails
P1 V1 TC1 1 0 8 2 0 0
P1 V1 TC2 0 0 4 1 0 0
P1 V2 TC1 1 0 8 2 0 1
P1 V2 TC2 0 0 4 1 0 0
P1 V3 TC1 0 0 8 2 0 2
P1 V3 TC2 0 0 4 1 0 0
P1 V4 TC1 0 0 8 2 0 2
P1 V4 TC2 0 0 4 1 0 0
P2 V1 TC1 1 1 9 3 1 2
P2 V1 TC2 0 0 4 1 0 0
P2 V1 TC3 0 0 4 1 0 0
P2 V2 TC1 0 1 9 3 1 3
P2 V2 TC2 1 0 4 1 0 0
P2 V2 TC3 0 0 4 1 0 0
P2 V3 TC1 0 1 9 3 1 3
P2 V3 TC2 0 0 4 1 0 1
P2 V3 TC3 0 0 4 1 0 0
P3 V1 TC1 1 1 9 3 2 3
P3 V1 TC2 1 1 5 2 1 1
P3 V1 TC3 0 0 4 1 0 0
P3 V2 TC1 1 1 9 3 2 4
P3 V2 TC2 0 1 5 2 1 2
P3 V2 TC3 0 0 4 1 0 0

Table 6. Excerpt of the Training Data Set used for Logistic Regression

To identify the significant factors for test case prioritization, we apply the p-value method of hypothesis testing based on the Wald test (Rice, 2007). The method relies on the failure probability predicted by the regression model to determine whether there is evidence to reject the null hypothesis that there is no relationship between the two variables. The p-value indicates the likelihood of observing the data points when the null hypothesis is true. Therefore, if the p-value is smaller than a given threshold (we use 0.05), it is unlikely that the dataset has been generated by chance and, consequently, the null hypothesis can be rejected (i.e., there is a relationship between the factor and the dependent variable). In the model, we keep the factors whose p-value is smaller than the threshold. To automatically determine the significant factors, we rely on the p-values computed by the Wald test on the logistic regression model trained with all the factors. Finally, we derive a new, multivariate logistic regression model that includes only the significant factors. For example, the logistic regression model derived for one of the products used in our empirical evaluation (see P4 in Section 9) retains only the degree of variability, the scenario size, the number of failing versions, and the retestable classification:

ln(Pᵢ / (1 − Pᵢ)) = β₀ + β₁·Vᵢ + β₂·Sᵢ + β₄·FVᵢ + β₅·Rᵢ

This model, for example, does not include the number of failing products (FP) since it is not significant according to the computed p-value.

The generated logistic regression model is a predictive model that returns, based on the significant factors, the probability that a test case fails. In Step 2, Prioritize test cases, we prioritize test cases by relying on the probability calculated by the regression model. The test cases are sorted in descending order of probability and presented to engineers.
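The two steps of the pipeline can be sketched as follows using pandas and statsmodels in place of the R scripts used by our toolset; the column names follow the layout of Table 6 and are otherwise hypothetical.

import pandas as pd
import statsmodels.api as sm

FACTORS = ["variability", "size", "failing_products", "failing_versions", "retestable"]

def prioritize(history: pd.DataFrame, candidates: pd.DataFrame, alpha: float = 0.05):
    """Step 1: identify significant factors on the execution history (Wald-test p-values);
    Step 2: sort the candidate test cases of the new product by predicted failure probability."""
    X = sm.add_constant(history[FACTORS])
    full = sm.Logit(history["fails"], X).fit(disp=False)            # model trained on all factors
    significant = [f for f in FACTORS if full.pvalues[f] < alpha]   # keep factors with p < 0.05
    model = sm.Logit(history["fails"], sm.add_constant(history[significant])).fit(disp=False)
    probs = model.predict(sm.add_constant(candidates[significant]))
    ranked = candidates.assign(p_fail=probs).sort_values("p_fail", ascending=False)
    return significant, ranked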

8. Tool Support

We have implemented our approach as an extension of PUMConf (Product line Use case Model Configurator) (Hajri et al., 2016). PUMConf has been developed as an IBM DOORS plug-in. It supports classifying system test cases for new products, providing guidance to modify system test cases for new use case scenarios, and prioritizing system test cases for new products. For more details and access to the tool, see: https://sites.google.com/site/pumconf/.

PUMConf relies on Papyrus (https://www.eclipse.org/papyrus/) for managing use case diagrams, and on IBM DOORS (www.ibm.com/software/products/ca/en/ratidoor/) for managing use case specifications and test cases. The impact reports are visualized as part of IBM DOORS' output using JGraph (https://www.jgraph.com/) and Microsoft Excel (https://products.office.com/en/excel/). PUMConf employs the GATE workbench (http://gate.ac.uk/), an open-source Natural Language Processing (NLP) framework, to generate scenario models from use case specifications. It uses scripts written in the DOORS eXtension Language (DXL) to load use case specifications in the native IBM DOORS format. PUMConf employs R scripts (https://www.rdocumentation.org) for logistic regression.

9. Evaluation

Our objective is to assess, in an industrial context, whether our approach could improve test case reuse and reduce testing effort. This empirical evaluation aims to answer the following research questions:

  • RQ1. Does the proposed approach provide correct test case classification results? This research question aims to evaluate the precision and recall of the procedure adopted to classify the test cases developed for previous products.

  • RQ2. Does the proposed approach accurately identify new scenarios that are relevant for testing a new product? This research question aims to evaluate the precision and recall of the approach in identifying the new scenarios to be tested for a new product (i.e., new requirements not covered by existing test cases).

  • RQ3. Does the proposed approach successfully prioritize test cases? This research question aims to determine whether the approach is able to effectively prioritize system test cases that trigger failures and thus can help minimize testing effort while retaining maximum fault detection power.

  • RQ4. Can the proposed approach significantly reduce testing costs compared to current industrial practice? This research question aims to determine to what extent the proposed approach can help significantly reduce the cost of defining and executing system test cases.

9.1. Subject of the Study

The subject of our study is the Smart Trunk Opener (STO) system developed by our industry partner IEE. STO has been selected for the assessment of our approach since it is a relatively new project at IEE involving multiple customers requiring varying features. The development history of the STO product line includes five products delivered to different car manufacturers. STO customers include major car manufacturers working in the European, Asian and US markets, with 2017 sales ranging from 200,000 to 3 million vehicles. For each product, IEE engineers developed multiple versions, each sharing the same functional requirements but differing with respect to non-functional requirements (e.g., hardware selection or performance optimizations). In total, STO includes 54 versions.



# of Use Cases # of Variation Points # of Basic Flows # of Alternative Flows # of Steps # of Optional Alternative Flows # of Optional Steps

Essential UCs 15 5 15 70 269 5 14
Variant UCs 14 3 14 132 479 8 13
Total 29 8 29 202 748 13 27

Table 7. Overview of the STO Product Line Use Cases

To develop the STO system, IEE engineers elicited requirements as use cases from an initial customer. For each new customer, they cloned the current use cases and identified differences to produce new use cases. Table 7 provides an overview of the STO product line. The data in Table 7 shows that the system implements 29 use cases, each one being fairly complex since the use cases in total include 202 alternative flows (i.e., alternative cases to be considered when implementing the use case). The STO product line is highly configurable, with 14 variant use cases, 8 variation points, 13 optional alternative flows and 27 optional steps. STO has the size and characteristics of typical embedded product line systems managing automotive components. To apply the proposed approach, we have considered STO requirements written according to PUM (Hajri et al., 2015) (Hajri et al., 2018b). Table 8 reports information about the STO products including the number of versions for each product. In Table 8, the products are sorted according to their delivery date, with P1 being the first product of the product line, and P5 being the last.


Product ID  # of Versions  # of Use Cases  # of Use Case Flows  # of Use Case Steps  # of Test Cases
P1 22 28 236 689 110
P2 8 25 169 568 86
P3 10 28 234 685 96
P4 5 26 212 618 83
P5 9 28 238 695 113

Table 8. Details of the Configured Products in the STO Product Line

The different STO products are characterized by different test suites of different sizes while the same test suite is shared by all the versions of the same product since their functional requirements do not vary. The test cases have been traced to the use case specifications by IEE engineers. Column # Test Cases in Table 8 shows, for every STO product, the number of test cases belonging to the functional test suite of the product.

9.2. Experiment Setup

Our approach for test case classification can be applied using single-product settings (i.e., to classify and prioritize test cases that belong to a previous product) and whole-line settings (i.e., to classify test cases of multiple previous products). To evaluate our approach for test case classification and to spot differences in terms of classification results with the two configurations (e.g., number of test cases that can be reused), we applied the approach using both settings. To evaluate test case prioritization, we prioritized test suites developed to test different STO products. We applied test case prioritization to the entire test suite since its execution is required by safety standards for every product being released.

9.3. Results

9.3.1. RQ1

To answer RQ1, we, together with IEE engineers, inspected the classification results produced by the approach. We evaluated the approach in terms of the average precision and recall we computed over the three different classes according to standard formulas (Sokolova and Lapalme, 2009). In our context, a true positive is a test case correctly classified according to the expected class (e.g., a reusable test case classified as reusable). A false positive is a test case incorrectly classified as being part of a given class (e.g., a retestable test case classified as reusable). A false negative is a test case that belongs to a given class but has not been classified as such (e.g., a reusable test case not classified as reusable).
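For illustration, the macro-averaged precision and recall over the three classes can be computed as in the following sketch; the labels are toy data, not the actual classification results.

from sklearn.metrics import precision_score, recall_score

# Macro-averaged precision and recall over the classes obsolete, reusable and retestable.
expected  = ["reusable", "reusable", "retestable", "obsolete", "reusable"]
predicted = ["reusable", "retestable", "retestable", "obsolete", "reusable"]

print(precision_score(expected, predicted, average="macro", zero_division=0))  # ≈0.83
print(recall_score(expected, predicted, average="macro", zero_division=0))     # ≈0.89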

Tables 9 and 10 provide the results for the single-product and whole-line settings, respectively. The first two columns report the ID of the product(s) whose test suite(s) have been considered for classification and the ID of the product being tested, respectively. The next three columns provide the number of test cases belonging to the three classes. The last two columns indicate precision and recall. We observe that the approach has perfect precision and recall. This is the result of the meticulous requirements modeling practices in place at IEE, where functional requirements are documented by means of use cases with proper traceability to test cases. These practices enable a precise identification of impacted scenarios and, consequently, the correct classification of test cases. Such practices are typical of companies that, like IEE, develop embedded safety-critical systems, since requirements need to be traced and tested to comply with international safety standards (e.g., ISO 26262 (ISO, 2018)). To apply the approach, we relied on the trace links assigned by IEE engineers to use cases and system test cases. We did not need to ask IEE engineers to provide additional trace links to match scenarios with test cases since all the trace links were at the level of granularity required by our approach.



Classified Test Suite  Product to be Tested  # of Reusable  # of Retestable  # of Obsolete  Precision  Recall
P1 P2 94 2 14 1.0 1.0
P1 P3 105 2 3 1.0 1.0
P1 P4 102 2 6 1.0 1.0
P1 P5 84 22 4 1.0 1.0
P2 P3 85 0 1 1.0 1.0
P2 P4 83 0 3 1.0 1.0
P2 P5 67 16 3 1.0 1.0
P3 P4 91 0 5 1.0 1.0
P3 P5 77 17 2 1.0 1.0
P4 P5 77 5 1 1.0 1.0
Table 9. Test Case Classification Results for Single-Product Settings


Classified Test Suites  Product to be Tested  Reusable  Retestable  Obsolete  Precision  Recall
P1 P2 94 2 14 1.0 1.0
P1, P2 P3 107 0 2 1.0 1.0
P1, P2, P3 P4 102 0 12 1.0 1.0
P1, P2, P3, P4 P5 93 15 1 1.0 1.0

Table 10. Test Case Classification Results for Whole-Line Settings

9.3.2. RQ2

To answer RQ2, we checked if the new scenarios were exercised by the test cases in the manually implemented test suites of the new products. If so, we considered those new scenarios relevant. In addition, we, together with IEE engineers, checked whether the new scenarios that were not exercised were relevant for testing these new products. We classified the new scenarios as true positive (i.e., a scenario identified by our approach and relevant for testing), false positive (i.e., a scenario identified by our approach but not relevant for testing), and false negative (i.e., a scenario tested by IEE but not identified by our approach). We computed precision and recall accordingly.

Tables 11 and 12 report the results obtained using the single-product and whole-line settings, respectively. The third, fourth, and fifth columns provide the number of relevant scenarios identified by our approach, and, among these, the number of scenarios tested and not tested by IEE engineers. The sixth column (Not Relevant) indicates the number of irrelevant scenarios. The columns named New Scenarios Not Identified provide the number of scenarios tested by IEE engineers but not identified by our approach. The last two columns report precision and recall. All the new scenarios identified by our approach are relevant; they are covered by the test cases produced by IEE engineers. Consequently, the approach has perfect precision and recall.

In addition, we observe from Table 12 that the availability of additional products in the whole-line settings enables the identification of additional new scenarios and, consequently, more accurate testing. This is what happens for product P5, for which the whole-line settings lead to the identification of 14 new scenarios. Five of these new scenarios have not been tested by engineers in any of the existing products. More precisely, the test suites of P1 and P3 enable the identification of four and three scenarios, respectively, that are not tested in the test suite of P5; two of these scenarios are identified from both test suites, for a total of five new scenarios. This difference between existing test suites is explained by the fact that certain test teams (i.e., the test teams of P1 and P3) have defined more complete test suites. Since new scenarios are identified based on existing test cases (see Section 6.2.4), for products with more complete test suites, the availability of more test cases may lead to the identification of additional new scenarios.



Classified Test Suite  Product to be Tested  Relevant New Scenarios Identified (TP)  Tested by Engineers  Not Tested  Not Relevant (FP)  New Scenarios Not Identified (FN)  Precision  Recall
P1 P2 3 3 0 0 0 1.0 1.0
P1 P3 3 3 0 0 0 1.0 1.0
P1 P4 2 1 1 0 0 1.0 1.0
P1 P5 27 23 4 0 0 1.0 1.0
P2 P3 1 1 0 0 0 1.0 1.0
P2 P4 1 1 0 0 0 1.0 1.0
P2 P5 22 22 0 0 0 1.0 1.0
P3 P4 1 1 0 0 0 1.0 1.0
P3 P5 26 23 3 0 0 1.0 1.0
P4 P5 10 10 0 0 0 1.0 1.0


Table 11. Relevance of Scenarios Identified using Single-Product Settings


Classified Test Suites  Product to be Tested  Relevant New Scenarios Identified (TP)  Tested by Engineers  Not Tested  Not Relevant (FP)  New Scenarios Not Identified (FN)  Precision  Recall
P1 P2 3 3 0 0 0 1.0 1.0
P1, P2 P3 1 1 0 0 0 1.0 1.0
P1, P2, P3 P4 0 0 0 0 0 1.0 1.0
P1, P2, P3, P4 P5 14 9 5 0 0 1.0 1.0

Table 12. Relevance of Scenarios Identified using Whole-Line Settings

9.3.3. RQ3

To answer RQ3, we applied our test case prioritization approach to sort the test cases in the test suites of four STO products (i.e., P2, P3, P4 and P5). In total, we built four regression models, one for each of these STO products. To evaluate the quality of the prediction, we relied on historical data: using the test execution history, we verified that higher priority was given to test cases that had failed. For this reason, we prioritized the test cases belonging to the test suites originally developed by IEE engineers and ignored the new test scenarios we had identified (Section 6). This did not introduce bias in the evaluation since test cases exercising new scenarios are always at the top of the prioritized test suite and their execution is always necessary, independently of their predicted capability to discover faults. In the following, we discuss our results, including the identification of the significant factors and the effectiveness of the prioritized test suites.

Table 13 provides detailed information about the significant factors identified. Column Significant factors lists the significant factors identified for each product. When more historical information is available, more factors significantly correlate with the presence of faults. For example, we observe that the classification of a test case as retestable becomes significant after three products are included in the development history of the product line. This can be explained by the fact that updated configuration decisions impact a limited number of scenarios (i.e., the number of retestable test cases is usually low) and thus, this factor only becomes significant when enough examples of retestable test cases have occurred in previous products. As expected, the number of failing products also becomes significant after a sufficient number of products in the product line.

Column Odds Ratio presents the odds ratio of each significant factor. The odds ratio captures the effect size of the factor on the outcome of the regression model (i.e., the probability of observing a failure). A value above one indicates that the factor positively contributes to the outcome; a value below one indicates that the factor negatively contributes to the outcome (note that this may be caused by a statistical interaction with another factor). The results show that the number of failing versions is the factor with the strongest positive impact on the probability of failure. It is highly likely that a test case that failed in the past will fail again, which is in line with previous research results. The odds ratio for the number of failing versions varies between 1.71 and 2.09. We also observe that, as expected, the number of failing products statistically interacts with the number of failing versions. This has been determined by running logistic regression on each factor separately: certain factors show positive regression parameters when considered alone but become negative when interacting with other factors in the regression model. In this case, this is probably due to the two factors being correlated.

To evaluate the effectiveness of test case prioritization, we measured the percentage of test cases to be executed to trigger all the failures, and compared our approach with the ideal case that executes all the failing test cases first. Table 14 summarizes our findings.

For all the products in our evaluation, our approach identifies more than 80% of the failures by executing less than 50% of the test cases (see Columns %Failures detected with 50% of the test cases and %Test cases executed to identify 80% of the failures). We notice that the number of test cases required to trigger all the failures drops below 60% when the test execution history of at least two products becomes available (see Column %TCs executed to identify all the failures). In the case of P5, for example, the execution of 27% of the test cases is sufficient to trigger all failures. This is explained by the fact that newer products are more mature (i.e., they tend to fail less frequently) but is also due to logistic regression models improving over time. Indeed, for newer products, though there are fewer failing test cases, our approach remains accurate at giving higher priority to failing test cases. This capability is particularly relevant for industry since the early identification of failures enables early maintenance activities and, consequently, speeds up the product release.

To compare our approach with the ideal case, we computed the Area Under Curve (AUC) for the cumulative percentage of failures triggered by executed test cases for both the ideal case and prioritization, and computed the AUC ratio of the two. Fig. 19 shows the two curves. The best result is achieved when the AUC ratio is equal to one (i.e., the AUC for the observed data matches the ideal AUC). The results show that the proposed approach achieves impressive results since the AUC ratio is always greater than or equal to 0.95.
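The AUC ratio can be computed as in the following sketch; the test case identifiers and the set of failing test cases are toy data.

import numpy as np

def auc(order, failing):
    """Area under the cumulative failure-detection curve (trapezoidal rule, unit spacing)."""
    found = np.cumsum([tc in failing for tc in order]) / len(failing)
    return float(np.sum((found[1:] + found[:-1]) / 2.0))

failing = {"t1", "t3"}
prioritized = ["t3", "t4", "t1", "t2", "t5"]                   # order produced by the approach
ideal = sorted(prioritized, key=lambda tc: tc not in failing)  # all failing test cases first
print(auc(prioritized, failing) / auc(ideal, failing))         # AUC ratio, here ≈0.87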

Classified Test Suites Product to be Tested Significant Factors Odds Ratio
P1 P2 V; S; FV 0.35; 1.08; 2.09
P1, P2 P3 V; S; FV 0.35; 1.06; 1.85
P1, P2, P3 P4 V; S; FV; R 0.78; 1.04; 1.71; 0.36
P1, P2, P3, P4 P5 V; S; FP; FV; R 0.36; 1.04; 0.92; 1.87; 0.51

Legend: V=Degree of Variability, S=Size, FP=Failing Products, FV=Failing Versions, R=Retestable.

Table 13. Analysis of Significant Factors identified by Logistic Regression
Classified Test Suites  Product to be Tested  AUC Ratio (Observed/Ideal)  %Test Cases Executed to Identify All the Failures  %Test Cases Executed to Identify 80% of the Failures  %Failures Detected with 50% of the Test Cases
P1 P2 0.98 (65.46/66.48) 72.09% 38.37% 97.43%
P1, P2 P3 0.99 (82/82.48) 41.66% 22.91% 100%
P1, P2, P3 P4 0.97 (71.02/72.97) 51.80% 22.89% 95%
P1, P2, P3, P4 P5 0.95 (101.32/105.97) 26.54% 18.58% 100%

Table 14. Test Case Prioritization Results
Figure 19. Prioritization results for (a) P2, (b) P3, (c) P4, and (d) P5: percentage of failures detected after running prioritized test cases.

9.3.4. RQ4

Product to be Tested  Test Cases to be Implemented (Single-Product Settings)  Test Cases to be Implemented (Whole-Line Settings)
P2 3/99 (3%) 3/99 (3%)
P3 1/86 (1%) 1/108 (1%)
P4 1/92 (1%) 0/102 (0%)
P5 10/92 (11%) 14/122 (11%)

Table 15. Test Development Costs Savings
Product to be Tested  Test Suite Size  Number of Failures  Test Cases Executed to Identify All the Failures (Current Practice)  Test Cases Executed to Identify All the Failures (Proposed Approach)
P2 86 39 84 (97.67%) 62 (72.09%)
P3 96 27 80 (83.33%) 40 (41.66%)
P4 83 20 77 (92.77%) 43 (51.80%)
P5 113 14 69 (61.06%) 30 (26.54%)


Table 16. Development Process Savings

Our test case classification and prioritization approach may reduce both (i) test case development costs (i.e., the number of test cases that need to be designed and implemented by engineers to test the software) and (ii) software development time (e.g., by detecting more failures at early stages of testing).

As a surrogate metric for the savings, for each product of the STO product line, we report the number and percentage of test cases that need to be implemented from scratch when adopting the proposed approach (see Table 15); the remaining test cases are reused. Columns Single-Product Settings and Whole-Line Settings report the results achieved by the approach when reusing only the test cases inherited from one previous product and when reusing the test suites of all the previous products in the product line, respectively. As seen in these two columns, the effort required to implement test cases is very limited since, with the proposed approach, engineers need to implement only the test cases required to cover new scenarios. For instance, in the whole-line settings for product P4, engineers do not need to implement any test case at all. In contrast, testing teams at IEE currently do not rely on approaches that support the systematic reuse of test cases, a practice which often leads to re-implementing most of the test cases from scratch. Finally, one benefit provided by the whole-line settings is the identification of additional new scenarios, as discussed in Section 9.3.2; this is the case for product P5, where the whole-line settings lead to the identification of four additional scenarios not identified with the single-product settings.

To evaluate the impact of our approach on software development time, we measured the percentage of test cases that need to be executed to identify all the failures in a product (see Table 16). Column Current Practice in Table 16 reports the percentage of test cases that need to be executed when following the order adopted by IEE engineers, which is based on domain knowledge. Column Proposed Approach reports the results for the proposed approach. For all the products, our approach identifies all the failures by executing fewer test cases than the current practice. This is particularly true for product P5, where our approach requires the execution of less than half of the test cases needed with the order chosen by engineers. By using our approach, IEE can detect and fix failures earlier and thus speed up software development.

9.4. Threats to Validity

Internal validity. To limit threats to internal validity, we considered the test cases developed by IEE engineers and the historical information collected over the years of system development. To avoid bias in the results, we considered the use case specifications written by IEE engineers and simply reformulated them according to PUM (Hajri et al., 2015) (Hajri et al., 2018b).

External validity. To mitigate the threat to generalizability, we considered a software product line that includes nontrivial use cases, with multiple customers and many sources of variability, in an application domain where product lines are the norm. The fact that STO has been installed on cars developed by major car manufacturers all over the world guarantees that the configuration decisions for STO cover a wide spectrum of possible configurations and that the testing process put in place by IEE adheres to state-of-the-art quality standards. Based on our experience built over the years with various automotive companies, we expect that the type of configuration decisions characterizing the STO product line and the type and number of test cases developed for STO are representative of other types of embedded automotive systems.

10. Conclusion

This paper presents an automated test case classification and prioritization approach that supports use case-driven testing in product lines. It automatically classifies and prioritizes, for new products in a product line, system test cases of previous product(s), and provides guidance in modifying existing system test cases to cover new use case scenarios that have not been tested in the product line before.

We improve the test case selection and execution process in product lines by informing engineers about the impact of requirements changes on system test cases in a product family and by automatically and incrementally classifying and prioritizing system test cases. Such classification attempts to determine what test cases need to be rerun or modified, whereas prioritization helps ensure failures are triggered as soon as possible.

Our test case selection and prioritization approach is built on top of our previous work (i.e., Product line Use case Modeling method and the Product line Use case Model Configurator), and supported by a tool integrated into IBM DOORS. The key characteristics of our tool support are (1) the automated identification of the impact of requirements changes on existing system test cases, possibly leading to their selection or modification for a new product, (2) the automated identification of new use case scenarios in the new product that have not been tested in the product line, (3) the automated generation of guidance for modifying existing system test cases to cover those new scenarios, and (4) the automated prioritization of the selected system test cases for the new product. We performed an industrial case study in the automotive domain, whose results suggest that our approach is practical and beneficial to classify and prioritize system test cases in industrial product lines and provide useful guidance for modifying existing system test cases for new products in industrial settings.

This work is one of the last steps towards our long-term objective (Hajri, 2016) (Hajri et al., 2017a), i.e., supporting change impact analysis and regression test selection to help engineers manage changes in requirements and system test cases in a product family. Our approach does not yet support the evolution of PL use cases. We still need to manage changes to the variability aspects of PL use cases, such as adding a new variation point to the PL use case diagram, and their impact on test cases in the context of test case selection and prioritization.

References

  • SWE (2014) 2014. Guide to the Software Engineering Body of Knowledge (SWEBOK V3.0). IEEE Computer Society Press.
  • IEE (2018) 2018. IEE (International Electronics & Engineering) S.A., http://www.iee.lu/.
  • Rpr (2018) 2018. The R project, https://www.r-project.org.
  • Al-Hajjaji et al. (2017a) Mustafa Al-Hajjaji, Jacob Krüger, Sandro Schulze, Thomas Leich, and Gunter Saake. 2017a. Efficient Product-Line Testing using Cluster-Based Product Prioritization. In AST’17. 16–22.
  • Al-Hajjaji et al. (2017b) Mustafa Al-Hajjaji, Sascha Lity, Remo Lachmann, Thomas Thüm, Ina Schaefer, and Gunter Saake. 2017b. Delta-Oriented Product Prioritization for Similarity-Based Product-Line Testing. In VACE’17. 34–40.
  • Al-Hajjaji et al. (2016) Mustafa Al-Hajjaji, Thomas Thüm, Malte Lochau, Jens Meinicke, and Gunter Saake. 2016. Effective Product-Line Testing Using Similarity-Based Product Prioritization. Software and Systems Modeling (2016).
  • Al-Hajjaji et al. (2014) Mustafa Al-Hajjaji, Thomas Thüm, Jens Meinicke, Malte Lochau, and Gunter Saake. 2014. Similarity-Based Prioritization in Software Product-Line Testing. In SPLC’14. 197–206.
  • Arafeen and Do (2013) Md Junaid Arafeen and Hyunsook Do. 2013. Test Case Prioritization Using Requirements-Based Clustering. In ICST’13. 312–321.
  • Arrieta et al. (2017) Aitor Arrieta, Goiuria Sagardui, Leire Etxeberria, and Justyna Zander. 2017. Automatic Generation of Test System Instances for Configurable Cyber-Physical Systems. Software Quality Journal 25, 3 (2017), 1041–1083.
  • Arrieta et al. (2016) Aitor Arrieta, Shuai Wang, Goiuria Sagardui, and Leire Etxeberria. 2016. Test Case Prioritization of Configurable Cyber-Physical Systems with Weight-Based Search Algorithms. In GECCO’16. 1053–1060.
  • Arrieta et al. (2019) Aitor Arrieta, Shuai Wang, Goiuria Sagardui, and Leire Etxeberria. 2019. Search-Based Test Case Prioritization for Simulation-Based Testing of Cyber-Physical System Product Lines. Journal of Systems and Software 149 (2019), 1–34.
  • Baller et al. (2014) Hauke Baller, Sascha Lity, Malte Lochau, and Ina Schaefer. 2014. Multi-objective Test Suite Optimization for Incremental Product Family Testing. In ICST’14. 303–312.
  • Bertolino and Gnesi (2003) Antonia Bertolino and Stefania Gnesi. 2003. PLUTO: A Test Methodology for Product Families. In PFE’03. 181–197.
  • Binkley (1997) David Binkley. 1997. Semantics Guided Regression Test Cost Reduction. IEEE Transactions on Software Engineering 23, 8 (1997), 498–516.
  • Briand et al. (2009) Lionel C. Briand, Yvan Labiche, and S. He. 2009. Automating Regression Test Selection based on UML Designs. Information and Software Technology 51 (2009), 16–30.
  • Bühne et al. (2003) Stan Bühne, Günter Halmans, and Klaus Pohl. 2003. Modeling Dependencies between Variation Points in Use Case Diagrams. In REFSQ’03. 59–69.
  • Cabral et al. (2010) Isis Cabral, Myra B. Cohen, and Gregg Rothermel. 2010. Improving the Testing and Testability of Software Product Lines. In SPLC’10. 241–255.
  • Chen et al. (2002) Yanping Chen, Robert L. Probert, and D. Paul Sims. 2002. Specification-based Regression Test Selection with Risk Analysis. In CASCON’02.
  • Clarke et al. (2010) Dave Clarke, Michiel Helvensteijn, and Ina Schaefer. 2010. Abstract Delta Modeling. In GPCE’10. 13–22.
  • Coleman et al. (1980) David Coleman, Paul Holland, Neil Kaden, Virginia Klema, and Stephen C. Peters. 1980. A System of Subroutines for Iteratively Reweighted Least Squares Computations. ACM Trans. Math. Software 6, 3 (1980), 327–336.
  • da Mota Silveira Neto et al. (2011) Paulo Anselmo da Mota Silveira Neto, Ivan do Carmo Machado, John D. McGregor, Eduardo Santana de Almeida, and Silvio Romero de Lemos Meira. 2011. A Systematic Mapping Study of Software Product Line Testing. Information and Software Technology 53 (2011), 407–423.
  • Devroey et al. (2017) Xavier Devroey, Gilles Perrouin, Maxime Cordy, Hamza Samih, Axel Legay, Pierre-Yves Schobbens, and Patrick Heymans. 2017. Statistical Prioritization for Software Product Line Testing: An Experience Report. Software and Systems Modeling 16, 1 (2017), 153–171.
  • Devroey et al. (2014) Xavier Devroey, Gilles Perrouin, Maxime Cordy, Pierre-Yves Schobbens, Axel Legay, and Patrick Heymans. 2014. Towards Statistical Prioritization for Software Product Lines Testing. In VaMoS’14. 1–7.
  • Do (2016) Hyunsook Do. 2016. Recent Advances in Regression Testing Techniques. In Advances in Computers, Vol. 103.
  • do Carmo Machado et al. (2014) Ivan do Carmo Machado, John D Mcgregor, Yguaratã Cerqueira Cavalcanti, and Eduardo Santana De Almeida. 2014. On Strategies for Testing Software Product Lines: A Systematic Literature Review. Information and Software Technology 56, 10 (2014), 1183–1199.
  • Dukaczewski et al. (2013) Michael Dukaczewski, Ina Schaefer, Remo Lachmann, and Malte Lochau. 2013. Requirements-Based Delta-Oriented SPL Testing. In PLEASE’13. 49–52.
  • e Zehra Haidry and Miller (2013) Shifa e Zehra Haidry and Tim Miller. 2013. Using Dependency Structures for Prioritization of Functional Test Suites. IEEE Transactions on Software Engineering 39, 2 (2013), 258–275.
  • Engström (2013) Emelie Engström. 2013. Supporting Decisions on Regression test Scoping in a Software Product Line Context - from Evidence to Practice. Ph.D. Dissertation. Lund University.
  • Engström and Runeson (2011) Emelie Engström and Per Runeson. 2011. Software Product Line Testing - A Systematic Mapping Study. Information and Software Technology 53 (2011), 2–13.
  • Engström et al. (2011) Emelie Engström, Per Runeson, and Andreas Ljung. 2011. Improving Regression Testing Transparency and Efficiency with History-Based Prioritization - An Industrial Case Study. In ICST’11. 367–376.
  • Engström et al. (2010) Emelie Engström, Per Runeson, and Mats Skoglund. 2010. A Systematic Review on Regression Test Selection Techniques. Information and Software Technology 52, 1 (2010), 14–30.
  • Ensan et al. (2011) Alireza Ensan, Ebrahim Bagheri, Mohsen Asadi, Dragan Gasevic, and Yevgen Biletskiy. 2011. Goal-Oriented Test Case Selection and Prioritization for Product Line Feature Models. In ITNG’11. 291–298.
  • Geppert et al. (2004) Birgit Geppert, Jenny Li, and David M. Weiss. 2004. Towards Generating Acceptance Tests for Product Lines. Springer, 35–48.
  • Gonzales-Sanchez et al. (2011) Alberto Gonzales-Sanchez, Eric Piel, Rui Abreu, Hans-Gerhard Gross, and Arjan J.C. van Gemund. 2011. Prioritizing Tests for Software Fault Diagnosis. Software Prac. Experience 41, 10 (2011), 1105–1129.
  • Hajri (2016) Ines Hajri. 2016. Supporting Change in Product Lines within the Context of Use Case-Driven Development and Testing. In Doctoral Symposium - SIGSOFT FSE’16. 1082–1084.
  • Hajri et al. (2015) Ines Hajri, Arda Goknil, Lionel C. Briand, and Thierry Stephany. 2015. Applying Product Line Use Case Modeling in an Industrial Automotive Embedded System: Lessons Learned and a Refined Approach. In MoDELS’15. 338–347.
  • Hajri et al. (2016) Ines Hajri, Arda Goknil, Lionel C. Briand, and Thierry Stephany. 2016. PUMConf: a Tool to Configure Product Specific Use Case and Domain Models in a Product Line. In SIGSOFT FSE’16. 1008–1012.
  • Hajri et al. (2017b) Ines Hajri, Arda Goknil, Lionel C. Briand, and Thierry Stephany. 2017b. Incremental Reconfiguration of Product Specific Use Case Models for Evolving Configuration Decisions. In REFSQ’17. 3–21.
  • Hajri et al. (2018a) Ines Hajri, Arda Goknil, Lionel C. Briand, and Thierry Stephany. 2018a. Change Impact Analysis for Evolving Configuration Decisions in Product Line Use Case Models. Journal of Systems and Software 139 (2018), 211–237.
  • Hajri et al. (2018b) Ines Hajri, Arda Goknil, Lionel C. Briand, and Thierry Stephany. 2018b. Configuring Use Case Models in Product Families. Software and Systems Modeling 17, 3 (2018), 939–971.
  • Hajri et al. (2017a) Ines Hajri, Arda Goknil, and Thierry Stephany. 2017a. A Change Management Approach in Product Lines for Use Case-Driven Development and Testing. In Poster Session - REFSQ’17.
  • Halmans and Pohl (2003) Günter Halmans and Klaus Pohl. 2003. Communicating the Variability of a Software-Product Family to Customers. Software and Systems Modeling 2, 1 (2003), 15–36.
  • Harrold et al. (2001) Mary Jean Harrold, James A. Jones, Tongyu Li, and Donglin Liang. 2001. Regression Test Selection for Java Software. In OOPSLA’01.
  • Hemmati et al. (2010) Hadi Hemmati, Lionel Briand, Andrea Arcuri, and Shaukat Ali. 2010. An Enhanced Test Case Selection Approach for Model-Based Testing: An Industrial Case Study. In FSE’10. 267–276.
  • Hemmati et al. (2017) Hadi Hemmati, Zhihan Fang, Mika V. Mantyla, and Bram Adams. 2017. Prioritizing Manual Test Cases in Rapid Release Environments. Software Testing, Verification and Reliability 27, 6 (2017).
  • Henard et al. (2014) Christopher Henard, Mike Papadakis, Gilles Perrouin, Jacques Klein, Patrick Heymans, and Yves Le Traon. 2014. Bypassing the Combinatorial Explosion: Using Similarity to Generate and Prioritize T-Wise Test Configurations for Software Product Lines. IEEE Transactions on Software Engineering 40, 7 (2014), 650–670.
  • ISO (2018) ISO. 2018. ISO-26262: Road Vehicles – Functional Safety.
  • Johansen et al. (2011) Martin Fagereng Johansen, Øystein Haugen, and Franck Fleurey. 2011. A Survey of Empirics of Strategies for Software Product Line Testing. In ICSTW’11. IEEE, 266–269.
  • Jr. et al. (2013) David W. Hosmer Jr., Stanley Lemeshow, and Rodney X. Sturdivant. 2013. Applied Logistic Regression. Wiley.
  • Kahsai et al. (2008) Temesghen Kahsai, Markus Roggenbach, and Bernd-Holger Schlingloff. 2008. Specification-based Testing for Software Product Lines. In SEFM’08. 149–159.
  • Kamsties et al. (2004) Erik Kamsties, Klaus Pohl, Sacha Reis, and Andreas Reuys. 2004. Testing Variabilities in Use Case Models. In PFE’03. 6–18.
  • Khatibsyarbini et al. (2018) Muhammad Khatibsyarbini, Mohd Adham Isa, Dayang N.A. Jawawi, and Rooster Tumeng. 2018. Test Case Prioritization Approaches in Regression Testing: A Systematic Literature Review. Information and Software Technology 93 (2018), 74–93.
  • Knapp et al. (2014) Alexander Knapp, Markus Roggenbach, and Bernd-Holger Schlingloff. 2014. On the Use of Test Cases in Model-based Software Product Line Development. In SPLC’14. 247–251.
  • Korel et al. (2008) Bogdan Korel, George Koutsogiannakis, and Luay H. Tahat. 2008. Application of System Models in Regression Test Suite Prioritization. In ICSM’08. 247–256.
  • Korel et al. (2005) Bogdan Korel, Luay H. Tahat, and Mark Harman. 2005. Test Prioritization using System Models. In ICSM’05. 559–568.
  • Krishnamoorthi and Mary (2009) R. Krishnamoorthi and S.A. Sahaaya Arul Mary. 2009. Factor Oriented Requirement Coverage based System Test Case Prioritization of New and Regression Test Cases. Information and Software Technology 51 (2009), 799–808.
  • Kundu et al. (2009) Debasish Kundu, Monalisa Sarma, Debasis Sarma, and Rajib Mall. 2009. System Testing for Object-Oriented Systems with Test Case Prioritization. Software Testing, Verification and Reliability 19, 4 (2009), 297–333.
  • Kung et al. (1995) David C. Kung, Jerry Gao, and Pei Hsia. 1995. Class Firewall, Test Order, and Regression Testing of Object-Oriented Programs. Journal of Object-Oriented Programming 8, 2 (1995), 51–65.
  • Lachmann et al. (2017) Remo Lachmann, Simon Beddig, Sascha Lity, Sandro Schulze, and Ina Schaefer. 2017. Risk-Based Integration Testing of Software Product Lines. In VaMoS’17. 52–59.
  • Lachmann et al. (2016a) Remo Lachmann, Sascha Lity, Mustafa Al-Hajjaji, Franz Furchtegott, and Ina Schaefer. 2016a. Fine-Grained Test Case Prioritization for Integration Testing of Delta-Oriented Software Product Lines. In FOSD’16.
  • Lachmann et al. (2015) Remo Lachmann, Sascha Lity, Sabrina Lischke, Simon Beddig, Sandro Schulze, and Ina Schaefer. 2015. Delta-oriented Test Case Prioritization for Integration Testing of Software Product Lines. In SPLC’15. 81–90.
  • Lachmann et al. (2016b) Remo Lachmann, Manuel Nieke, Christoph Seidl, Ina Schaefer, and Sandro Schulze. 2016b. System-Level Test Case Prioritization Using Machine Learning. In ICMLA’16. 361–368.
  • Larman (2002) Craig Larman. 2002. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process. Prentice Hall Professional.
  • Lee et al. (2012) Jihyun Lee, Sungwon Kang, and Danhyung Lee. 2012. A Survey on Software Product Line Testing. In SPLC’12. 31–40.
  • Li et al. (2007) Zheng Li, Mark Harman, and Robert M. Hierons. 2007. Search Algorithms for Regression Test Prioritization. IEEE Transactions on Software Engineering 33, 4 (2007), 225–237.
  • Lity et al. (2017) Sascha Lity, Mustafa Al-Hajjaji, Thomas Thüm, and Ina Schaefer. 2017. Optimizing Product Orders Using Graph Algorithms for Improving Incremental Product-Line Analysis. In VaMoS’17. 60–67.
  • Lity et al. (2012) Sascha Lity, Malte Lochau, Ina Schaefer, and Ursula Goltz. 2012. Delta-Oriented Model-Based SPL Regression Testing. In PLEASE’12. 53–56.
  • Lity et al. (2016) Sascha Lity, Thomas Morbach, Thomas Thüm, and Ina Schaefer. 2016. Applying Incremental Model Slicing to Product-Line Regression Testing. In ICSR’16. 3–19.
  • Lochau et al. (2014) Malte Lochau, Sascha Lity, Remo Lachmann, Ina Schaefer, and Ursula Goltz. 2014. Delta-Oriented Model-Based Integration Testing of Large-Scale Systems. Journal of Systems and Software 91 (2014), 63–84.
  • McGregor (2001) John D. McGregor. 2001. Testing a Software Product Line. Technical Report. Software Engineering Institute, Carnegie Mellon University.
  • Mirarab et al. (2008) Siavash Mirarab, Afshar Ganjali, Ladan Tahvildari, Shimin Li, Weining Liu, and Mike Morrissey. 2008. A Requirement-Based Software Testing Framework: An Industrial Practice. In ICSM’08. 452–455.
  • Muccini (2007) Henry Muccini. 2007. Using Model Differencing for Architecture-level Regression testing. In SEAA’07.
  • Muccini et al. (2006) Henry Muccini, Marcio Dias, and Debra J. Richardson. 2006. Software Architecture-Based Regression Testing. Journal of Systems and Software 79 (2006), 1379–1396.
  • Muccini and van der Hoek (2003) Henry Muccini and Andre van der Hoek. 2003. Towards Testing Product Line Architectures. Electronic Notes in Theoretical Computer Science 82, 6 (2003), 99–109.
  • Nardo et al. (2015) Daniel Di Nardo, Nadia Alshahwan, Lionel Briand, and Yvan Labiche. 2015. Coverage-based Regression Test Case Selection, Minimization and Prioritization: a Case Study on an Industrial System. Software Testing, Verification and Reliability 25, 4 (2015), 371–396.
  • Nebut et al. (2006a) Clementine Nebut, Franck Fleurey, Yves Le Traon, and Jean-Marc Jezequel. 2006a. Automatic Test Generation: A Use Case Driven Approach. IEEE Transactions on Software Engineering 32, 3 (2006), 140–155.
  • Nebut et al. (2006b) Clementine Nebut, Yves Le Traon, and Jean-Marc Jezequel. 2006b. System Testing of Product Families: from Requirements to Test Cases. In Software Product Lines. Springer.
  • Oster et al. (2011) Sebastian Oster, Andreas Wübbeke, Gregor Engels, and Andy Schürr. 2011. A Survey of Model-based Software Product Lines Testing. Model-Based Testing for Embedded Systems (2011), 338–381.
  • Parejo et al. (2016) José A. Parejo, Ana B. Sánchez, Sergio Segura, Antonio Ruiz-Cortés, Roberto E. Lopez-Herrejon, and Alexander Egyed. 2016. Multi-objective Test Case Prioritization in Highly Configurable Systems: A Case Study. Journal of Systems and Software 122 (2016), 287–310.
  • Pohl et al. (2005) Klaus Pohl, Gunter Bockle, and Frank van der Linden. 2005. Software Product Line Engineering: Foundations, Principles, and Techniques. Springer.
  • Pohl and Metzger (2006) Klaus Pohl and Andreas Metzger. 2006. Software Product Lines Testing. Commun. ACM 49, 12 (2006), 79–81.
  • Pradhan et al. (2018) Dipesh Pradhan, Shuai Wang, Shaukat Ali, Tao Yue, and Marius Liaaen. 2018. REMAP: Using Rule Mining and Multi-Objective Search for Dynamic Test Case Prioritization. In ICST’18. 46–57.
  • Qu et al. (2011) Xiao Qu, Mithun Acharya, and Brian Robinson. 2011. Impact Analysis of Configuration Changes for Test Case Selection. In ISSRE’11. 140–149.
  • Ramesh and Jarke (2001) Balasubramaniam Ramesh and Matthias Jarke. 2001. Toward Reference Models for Requirements Traceability. IEEE Transactions on Software Engineering 27, 1 (2001), 58–93.
  • Reuys et al. (2005) Andreas Reuys, Erik Kamsties, Klaus Pohl, and Sacha Reis. 2005. Model-Based System Testing of Software Product Families. In CAiSE’05. 519–534.
  • Reuys et al. (2006) Andreas Reuys, Sacha Reis, Erik Kamsties, and Klaus Pohl. 2006. The ScenTED Method for Testing Software Product Lines. In Software Product Lines. 479–520.
  • Rice (2007) John A. Rice. 2007. Mathematical Statistics and Data Analysis. Thomson Higher Education.
  • Rothermel and Harrold (1996) Gregg Rothermel and Mary Jean Harrold. 1996. Analyzing Regression Test Selection Techniques. IEEE Transactions on Software Engineering 22, 8 (1996), 529–551.
  • Rothermel and Harrold (1997) Gregg Rothermel and Mary Jean Harrold. 1997. A Safe, Efficient Regression Test Selection Technique. ACM Transactions on Software Engineering and Methodology 6, 2 (1997), 173–210.
  • Rothermel et al. (2000) Gregg Rothermel, Mary Jean Harrold, and Jeinay Dedhia. 2000. Regression Test Selection for C++ Software. Software Testing, Verification and Reliability 10, 2 (2000), 77–109.
  • Rothermel et al. (2001) Gregg Rothermel, Roland H. Untch, Chengyun Chu, and Mary Jean Harrold. 2001. Prioritizing Test Cases for Regression Testing. IEEE Transactions on Software Engineering 27, 10 (2001), 929–948.
  • RTCA and EUROCAE (2018) RTCA and EUROCAE. 2018. DO-178C: Software Considerations in Airborne Systems and Equipment Certification.
  • Runeson and Engström (2012) Per Runeson and Emelie Engström. 2012. Regression Testing in Software Product Line Engineering. In Advances in Computers, Vol. 86. 223–263.
  • Schürr et al. (2010) Andy Schürr, Sebastian Oster, and Florian Markert. 2010. Model-Driven Software Product Lines Testing: An Integrated Approach. In SOFSEM’10. 112–131.
  • Sokolova and Lapalme (2009) Marina Sokolova and Guy Lapalme. 2009. A Systematic Analysis of Performance Measures for Classification Tasks. Information Processing & Management 45, 4 (2009), 427–437.
  • Srikanth and Banerjee (2012) Hema Srikanth and Sean Banerjee. 2012. Improving Test Efficiency through System Test Prioritization. Journal of Systems and Software 85 (2012), 1176–1187.
  • Srikanth et al. (2016) Hema Srikanth, Charitha Hettiarachchi, and Hyunsook Do. 2016. Requirements based Test Prioritization using Risk Factors: An Industrial Study. Information and Software Technology 69 (2016), 71–83.
  • Srikanth and Williams (2005) Hema Srikanth and Laurie Williams. 2005. On the Economics of Requirements-based Test Case Prioritization. In EDSER’05. 1–3.
  • Srikanth et al. (2005) Hema Srikanth, Laurie Williams, and Jason Osborne. 2005. System Test Case Prioritization of New and Regression Test Cases. In ESEM’05. 64–73.
  • Srikanth et al. (2014) Hema Srikanth, Laurie Williams, and Jason Osborne. 2014. Towards the Prioritization of System Test Cases. Software Testing, Verification and Reliability (2014), 320–337.
  • Stricker et al. (2010) Vanessa Stricker, Andreas Metzger, and Klaus Pohl. 2010. Avoiding Redundant Testing in Application Engineering. In SPLC’10. 226–240.
  • Tahat et al. (2012) Luay Tahat, Bogdan Korel, Mark Harman, and Hasan Ural. 2012. Regression Test Suite Prioritization Using System Models. Software Testing, Verification and Reliability 22, 7 (2012), 481–506.
  • Tevanlinna et al. (2004) Antti Tevanlinna, Juha Taina, and Raine Kauppinen. 2004. Product Family Testing: a Survey. ACM SIGSOFT Software Engineering Notes 29, 2 (2004), 12–12.
  • Tonella et al. (2006) Paolo Tonella, Paolo Avesani, and Angelo Susi. 2006. Using the Case-based Ranking Methodology for Test Case Prioritization. In ICSM’06. 123–133.
  • Uzuncaova et al. (2008) Engin Uzuncaova, Daniel Garcia, Sarfraz Khurshid, and Don S. Batory. 2008. Testing Software Product Lines Using Incremental Test Generation. In ISSRE’08. 249–258.
  • Uzuncaova et al. (2010) Engin Uzuncaova, Sarfraz Khurshid, and Don S. Batory. 2010. Incremental Test Generation for Software Product Lines. IEEE Transactions on Software Engineering 36, 3 (2010), 309–322.
  • Vaysburg et al. (2002) Boris Vaysburg, Luay H. Tahat, and Bogdan Korel. 2002. Dependence Analysis in Reduction of Requirement based Test Suites. In ISSTA’02. 107–111.
  • von Mayrhauser and Zhang (1999) Anneliese von Mayrhauser and Ning Zhang. 1999. Automated Regression Testing using DBT and Sleuth. Journal of Software Maintenance 11, 2 (1999), 93–116.
  • Wang et al. (2015a) Chunhui Wang, Fabrizio Pastore, Arda Goknil, Lionel C. Briand, and Muhammad Zohaib Z. Iqbal. 2015a. Automatic Generation of System Test Cases from Use Case Specifications. In ISSTA’15. 385–396.
  • Wang et al. (2015b) Chunhui Wang, Fabrizio Pastore, Arda Goknil, Lionel C. Briand, and Muhammad Zohaib Z. Iqbal. 2015b. UMTG: a Toolset to Automatically Generate System Test Cases from Use Case Specifications. In ESEC/SIGSOFT FSE’15. 942–945.
  • Wang et al. (2016) Shuai Wang, Shaukat Ali, Arnaud Gotlieb, and Marius Liaaen. 2016. A Systematic Test Case Selection Methodology for Product Lines: Results and Insights from an Industrial Case Study. Empirical Software Engineering 21, 4 (2016), 1586–1622.
  • Wang et al. (2017) Shuai Wang, Shaukat Ali, Arnaud Gotlieb, and Marius Liaaen. 2017. Automated Product Line Test Case Selection: Industrial Case Study and Controlled Experiment. Software and Systems Modeling 16, 2 (2017), 417–441.
  • Wang et al. (2014) Shuai Wang, David Buchmann, Shaukat Ali, Arnaud Gotlieb, Dipesh Pradhan, and Marius Liaaen. 2014. Multi-Objective Test Prioritization in Software Product Line Testing: An Industrial Case Study. In SPLC’14. 32–41.
  • Wong et al. (1997) W. Eric Wong, Joseph R. Horgan, Saul London, and Hira Agrawal Bellcore. 1997. A Study of Effective Regression Testing in Practice. In ISSRE’97. 230–238.
  • Yang et al. (2009) Qian Yang, J. Jenny Li, and David M. Weiss. 2009. A Survey of Coverage-Based Testing Tools. Comput. J. 52, 5 (2009), 589–597.
  • Yoo and Harman (2012) Shin Yoo and Mark Harman. 2012. Regression Testing Minimization, Selection and Prioritization: a Survey. Software Testing, Verification and Reliability 22, 2 (2012), 67–120.
  • Yue et al. (2013) Tao Yue, Lionel C. Briand, and Yvan Labiche. 2013. Facilitating the Transition from Use Case Models to Analysis Models: Approach and Experiments. ACM Transactions on Software Engineering and Methodology 22, 1 (2013).
  • Zech et al. (2017) Philipp Zech, Philipp Kalb, Michael Felderer, Colin Atkinson, and Ruth Breu. 2017. Model-based Regression Testing by OCL. Software Tools for Technology Transfer 19, 1 (2017), 115–131.