Systematic Mapping Protocol: Feature Modeling Tools

by Samuel Sepúlveda et al.

Customers' and users' demand for new products and services that meet high-quality standards has increased in recent years. In that sense, production processes must be aligned with the organization and its development process in order to achieve this goal. The aim of this paper is to synthesize the current state of the research reported in the literature regarding the application domain, underlying model, origin, degree of empirical validation and quality of existing feature modeling tools used in software product lines (SPLs). To this end, this technical report presents the protocol definition for a systematic mapping study (SMS) that we will conduct to identify and assess the set of relevant papers on feature modeling tools.


1 Introduction

Customers' and users' demand for new products and services that meet high-quality standards has increased in recent years. Accordingly, production processes must be aligned with the organization and its development process in order to achieve this goal.

Software product lines (SPLs) are an approach that can address these needs, increasing productivity without sacrificing quality. An SPL is a set of software-intensive systems that share a common set of features and are developed for a specific segment or domain using a defined process [1]. The known benefits of using SPLs include shorter product development cycles, an increase in productivity by an order of magnitude, lower costs and a substantial improvement in product quality [2, 3].

In our view, domain analysis is the most important stage of the SPL development process. In this stage, feature models (FMs) are the domain analysis artifact used to describe all the identified features. Furthermore, FMs are the de facto standard for managing variability [4].

The aim of this paper is to synthesize the current state of research reported in the literature regarding the application domain, underlying model, origin, degree of empirical validation and quality of existing feature modeling tools used in SPLs. The motivation is to identify improvement trends in the field over the period between 2000 and 2019. In particular, the lack of empirical validation of the tools has repeatedly been pointed out as an important deficiency of the field. We also include an initial quality assessment of the feature modeling tools we identify.

We think this study may be of interest to both academic researchers and industry professionals who wish to get an updated view of feature modeling tools, including their shortcomings and strengths in terms of some quality criteria. With this knowledge, they will be better able to assess the potential benefits and risks of adopting each feature modeling tool. The study could also interest researchers looking for research gaps that call for further studies on feature modeling tools for SPLs. In addition, we see this study as a continuation of our previous work on different aspects of FMs and modeling languages used in SPLs [5, 6, 7].

Therefore, this technical report presents the protocol definition for a systematic mapping study (SMS) that we will conduct to identify and assess the set of relevant papers on feature modeling tools.

The rest of the report is structured as follows. Section 2 describes the research method to follow. Section 3 presents and discusses the main threats to validity, and the strategy to deal with them. Finally, Section 4 presents our conclusions and ongoing work.

2 Research method

This study follows the SMS methodology described in [8], which aims to “identify all research related to a specific topic rather than addressing the specific questions that conventional SLRs address”. Similarly, Petersen et al. [9] state that “a systematic mapping is a method to build a classification scheme and structure a Software Engineering (SE) field of interest. The analysis of results focuses on frequencies of publications for categories within the scheme. Thereby, the coverage of the research field can be determined”. In this study, we will search for existing research related to feature modeling tools in the context of SPLs, and classify and analyze it according to certain predefined criteria.

Next, in Section 2.1 we define the SMS protocol. Then, in Section 2.2 we describe the study selection and in Section 2.3 we define the preliminary data extraction protocol. Finally, in Section 2.4, we briefly describe the tool support used for our SMS. The whole process followed for the SMS is shown in Figure 1, adapted from [9].

Figure 1: SMS steps

2.1 Protocol Definition (Planning)

In this section we present the main steps performed in the protocol definition for this SMS.

2.1.1 Aim and Need

The aim of this SMS is twofold. On the one hand, we aim to establish the application domain, underlying feature model, origin and degree of empirical validation of feature modelling tools for SPLs. On the other hand, we aim to assess the “quality level” of the selected feature modelling tools.

The importance of this study therefore lies in the issues mentioned above, as well as in other aspects it covers, such as the origin of the papers, context of application, year of publication, publisher and target audience.

We think that a clear picture of all these characteristics may help professionals reduce the risk associated with choosing a tool. Additionally, we aim to foster a discussion within the community about the qualities that feature modelling tools for SPLs should have in order to promote the creation and sharing of high-quality specifications.

2.1.2 Research Questions

The RQs define what should be extracted from the final selected publications [10]. Table 1 presents the four RQs that drive our SMS, together with their contribution to the general aim.

RQ1: What is the application domain of the feature modelling tool?
Aim and classification schema: To determine whether the tool is multipurpose or has been developed/used in specific domains.

RQ2: What model underlies the selected feature modelling tool?
Aim and classification schema: To determine which model each tool is linked to, e.g. FODA and its variants, cardinality-based models or others.

RQ3: Where have feature modelling tools for SPLs been developed?
Aim and classification schema: To identify the origin of the tools: academia, industry or joint.

RQ4: What is the degree of empirical validation of feature modelling tools in SPLs?
Aim and classification schema: To examine how each selected tool was validated: with proofs of concept, through its use in industry, through case studies, through experiments, etc.
Table 1: Research Questions for the Systematic Mapping Study

2.1.3 Search String

The search string was constructed as follows [10, 11]:

  • From the RQs we obtained keywords.

  • From the keywords, we considered synonyms.

  • We built the search string by applying the criterion Population-Intervention-Comparison-Outcomes-Context (PICOC [12]).

According to [10], the population in SE should correspond to one of the following: (1) a specific SE role, (2) a category of software engineer, (3) an application area or (4) an industry group. In our case, Software Product Lines was considered an application area.

An intervention in SE is defined as a methodology, tool, technology or procedure that addresses a specific issue [10], for example, performing specific tasks such as requirements specification, system testing or software cost estimation. In our case, the intervention is a tool, in particular one supporting the Feature Modelling step of the Domain Engineering stage.

The comparison element is not applicable to our RQs, because they do not involve comparing the collected papers against any commonly used feature modelling tool or technique (the control condition).

The main outcomes of our RQs are the origin, underlying model, application domain, together with their level of validation in the software industry.

Last, the context represents the place where the comparison is done, for example academia, industry or both.

All the defined terms were combined with the Boolean operator “AND”, and the synonyms of each term were joined with the “OR” operator to improve the completeness of the results. The terms, synonyms, final search string and search strategy are shown in Table 2.

Terms: feature, modelling, model, software, family, product, lines, variability, tool

Combining terms: “Feature modelling”, “Feature model”, “Variability”, “Software product lines”, “Tool”, “Software family”, “Product family”

Search string: (“Feature modelling” OR “Feature model” OR “Variability”) AND (“Software product lines” OR “Product family” OR “Software family” OR SPL) AND (“Tool”)

Search strategy: The string was entered sequentially into each data source, adapting it to the syntax of each source. Variations in spelling (e.g. modelling vs. modeling) were also accounted for.
Table 2: Search String
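The construction rule in Table 2 (synonyms joined with OR, term groups combined with AND, spelling variants included) can be sketched in Python. This is a minimal illustration, not the authors' tooling: the helper names `expand_variants` and `build_search_string` and the one-entry British-to-American spelling map are assumptions, and the generated string would still need the per-source adaptation mentioned in the search strategy. Quoting SPL, which the original string leaves bare, is harmless.

```python
# Hypothetical helpers: synonyms inside a group are OR-ed, groups are AND-ed.
SYNONYM_GROUPS = [
    ["Feature modelling", "Feature model", "Variability"],
    ["Software product lines", "Product family", "Software family", "SPL"],
    ["Tool"],
]

# Assumed spelling map (British -> American) for variant expansion.
SPELLING_VARIANTS = {"modelling": "modeling"}


def quote(term: str) -> str:
    """Wrap a term in double quotes so databases treat it as a phrase."""
    return f'"{term}"'


def expand_variants(term: str) -> list[str]:
    """Return the term followed by any spelling variants of it."""
    variants = [term]
    for british, american in SPELLING_VARIANTS.items():
        if british in term:
            variants.append(term.replace(british, american))
    return variants


def build_search_string(groups: list[list[str]]) -> str:
    """Join each synonym group with OR, then combine the groups with AND."""
    clauses = []
    for group in groups:
        terms = [quote(v) for term in group for v in expand_variants(term)]
        clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


print(build_search_string(SYNONYM_GROUPS))
```

Keeping the groups as data makes it easy to add a synonym or spelling variant and regenerate the same string for every database.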

2.1.4 Inclusion and Exclusion Criteria

In this study we defined both inclusion and exclusion criteria, which will be used to decide, based on its content, whether an article is finally included in the SMS.

In particular, and following the guidelines of [10], grey literature (i.e. technical reports, white papers and work in progress) is excluded.

These criteria are defined in Table 3.

Inclusion criteria: Papers that address the topic of feature modelling tools for SPLs and meet the following criteria:
  • Studies that propose feature modelling tools in software product lines.

  • Peer-reviewed studies obtained from journals, conferences and workshops.

  • Studies published from 2000 to 2019.

  • Studies published in English or Spanish.

Exclusion criteria: Papers that, even if they discuss proposals and tools for SPL and variability modelling, do not focus specifically on feature modelling tools for SPLs, as well as:
  • Studies by the same author or group of authors that do not contribute significant improvements over their prior proposals, when a more recent proposal exists.

  • Studies not available online.

  • Studies that deal with secondary research, such as mapping studies or systematic literature reviews.

Table 3: Content-related Inclusion and Exclusion criteria

2.1.5 Protocol Validation

The protocol validation was performed along with the definition of each of its steps. This validation was based on the criteria defined in [13], which allowed us to identify concretely and objectively how we developed our mapping study. Appendix A details the evaluation process for the SMS protocol.

The information presented in this paper corresponds to the final result (definition plus validation) of each step. According to the evaluation of our SMS, at least one action is applied for each group of rubric criteria established in the protocol phase [13].

The ratio between the number of actions taken in our study and the total number of possible actions is 38% (10 out of 26).

2.2 Primary Study Selection

We aim to compile as complete a list as possible of papers related to feature modelling tools and SPLs. The SMS covers papers published since 2000, and the search was conducted between March and May 2019.

2.2.1 Search Process

We designed a search strategy consisting of an automatic search of electronic databases, which may eventually be complemented with a snowballing approach.

We consider the following databases: IEEE Xplore, ACM Digital Library, ScienceDirect and Scopus. These sources are recognized as being among the most relevant in the Software Engineering community [10, 14].

2.2.2 Pilot Selection

Once both the inclusion and exclusion criteria and the data sources had been defined, we performed a pilot selection and extraction to ensure the reliability of the protocol.

For all the researchers involved in the selection of the primary studies, we will verify that they understand and apply the inclusion/exclusion criteria in a similar way (inter-rater agreement), thus reducing potential bias.

This will be tested as follows: each researcher will individually decide on the inclusion or exclusion of a set of papers randomly chosen from those retrieved in this pilot selection. We will then perform a concordance test based on Fleiss' kappa statistic as a means of validation [15]. We consider that a value of κ ≥ 0.75 would suggest that the criteria are clear enough for each of the researchers to apply them consistently [16].
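The concordance test can be sketched with the standard Fleiss' kappa formula: the observed agreement (share of agreeing rater pairs per paper, averaged over papers) is compared with the agreement expected by chance from the overall category proportions. The function name `fleiss_kappa` and the five-paper, three-researcher pilot below are hypothetical and only illustrate the computation; the protocol does not prescribe an implementation, and the statsmodels package also provides one in `statsmodels.stats.inter_rater`.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for N papers, each rated by the same number n of researchers.

    ratings[i] holds the n decisions (e.g. "include"/"exclude") made for paper i.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each paper needs the same number (>= 2) of ratings")

    categories = sorted({decision for r in ratings for decision in r})
    # counts[i][j]: how many researchers put paper i into category j
    counts = []
    for r in ratings:
        tally = Counter(r)
        counts.append([tally[c] for c in categories])

    # Observed agreement: share of agreeing rater pairs per paper, averaged.
    p_bar = sum(
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall proportion of each category.
    totals = [sum(row[j] for row in counts) for j in range(len(categories))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical pilot: 3 researchers screen 5 papers.
pilot = [
    ["include", "include", "include"],
    ["exclude", "exclude", "exclude"],
    ["include", "include", "exclude"],
    ["exclude", "exclude", "exclude"],
    ["include", "include", "include"],
]
kappa = fleiss_kappa(pilot)
print(f"kappa = {kappa:.3f}, criteria clear enough: {kappa >= 0.75}")
```

In this made-up pilot a single disagreement out of five papers yields κ = 41/56 ≈ 0.73, just below 0.75, which under the threshold above would suggest refining the criteria before the full selection.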

2.3 Preliminary Data Extraction Protocol

Once both the search string and the inclusion/exclusion criteria have been tested, we will launch the primary study retrieval and the data extraction phase. A summary of this phase can be seen in Table 4.

Paper access: Access to each of the papers to be reviewed must be guaranteed.
Initial review of the paper: Read the title, abstract and keywords of each paper to decide its relevance to the SMS.
Review report: Scan the whole paper and answer the following questions:
  • Why was the paper accepted/rejected?

  • If the paper was accepted

    • Why is the paper relevant to the SMS?

    • Which of the RQs does the paper answer?

Table 4: Access and Data extraction protocol

First, we will run the search string in the selected data sources mentioned in Section 2.2.1. Based on pilot searches, this is expected to return approximately 1000 to 1500 results. After that, we will eliminate duplicates. Then, we will look through the title, abstract and keywords (if available) of each paper to get an initial impression of its thematic relevance (see Table 4). Papers not rejected in this step move on to the next one.

Next, we will apply the format-related inclusion/exclusion criteria. We will discard papers not written in English or Spanish, as well as grey literature. In addition, we will discard papers that present a different version of the same proposal; in that case, only the most recent version of the proposal is retained in the selection.

Last, we will divide the resulting list among the researchers, and each researcher will apply the content-related inclusion/exclusion criteria defined in Table 3 to obtain the final list of selected papers.

This information will then be jointly reviewed to collaboratively agree on the final list of selected papers. The whole list, including a brief description of each selected paper, will be summarized in an appendix of the final paper reporting the results of this systematic mapping.

Figure 2: SMS Primary Study Selection Steps.
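The automatable parts of these selection steps (removing duplicates across databases, then checking the year range and languages of Table 3) can be sketched as follows. This is a minimal sketch under stated assumptions: the `Record` fields, the title normalization rule and the example records (including the DOI `10.1000/demo.1`) are hypothetical. Title/abstract screening, grey-literature checks and choosing the most recent version of a proposal remain manual judgments and are not modelled here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One search hit as exported from a database (hypothetical fields)."""
    title: str
    year: int
    language: str
    source: str
    doi: Optional[str] = None


def normalize_title(title: str) -> str:
    """Lower-case the title and collapse punctuation/whitespace runs."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def remove_duplicates(records: list[Record]) -> list[Record]:
    """Keep the first occurrence of each paper, matching on DOI or title."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for rec in records:
        keys = {("title", normalize_title(rec.title))}
        if rec.doi:
            keys.add(("doi", rec.doi.lower()))
        if keys & seen:
            continue  # same paper already retrieved from another source
        seen |= keys
        unique.append(rec)
    return unique


def meets_format_criteria(rec: Record) -> bool:
    """Year range and languages taken from the inclusion criteria in Table 3."""
    return 2000 <= rec.year <= 2019 and rec.language in {"English", "Spanish"}


# Hypothetical hits; the DOI is a placeholder, not a real paper.
hits = [
    Record("A Tool for Feature Modelling", 2015, "English", "IEEE Xplore", "10.1000/demo.1"),
    Record("A tool for feature modelling.", 2015, "English", "Scopus"),
    Record("Variability Tooling Revisited", 1998, "English", "ACM Digital Library"),
    Record("Herramienta para modelos de variabilidad", 2018, "Spanish", "Scopus"),
]
unique = remove_duplicates(hits)
# The manual title/abstract screening would take place at this point.
candidates = [r for r in unique if meets_format_criteria(r)]
print([r.title for r in candidates])
```

Matching on either DOI or normalized title catches the common case where one database exports a DOI and another does not.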

2.3.1 Preliminary Data Extraction and Assessment

For each selected paper (i.e., each paper that meets the inclusion criteria), we will read it and extract the relevant data needed to answer the established RQs. Figures 3 and 4 show an example of the data extraction forms that will be used to compile the details about the paper and the reported tool.

Figure 3: Example for a data extraction form for the selected papers.
Figure 4: Example for a data extraction form for the tools in selected papers.

The extracted data for each paper and their assessment strategy will be as follows: (i) title, authors and year; (ii) reason why the paper was initially included; (iii) type of publication, either journal (JCR quartile, from the Journal Citation Reports, or other) or conference proceedings (CORE ranking, from Computing Research and Education, or other), and the corresponding publisher; (iv) type of experience reported; (v) results; (vi) community to which the paper was directed; and (vii) tools and programming languages used. The details are shown in Table 5.

Initial reading: The abstract, introduction, related work, conclusions and references should be read to collect background information about:
  • Community (Introduction, Related Work and References)

  • Contributions of the paper according to its authors (Abstract, Introduction and Conclusions)

  • Possible consequences of contributions: applications, new techniques or research (Introduction, Conclusions and Future work).

Detailed reading: The body of the article should be read in order to:
  • Get the detailed information required for the SMS (journals or conferences, publishers, year of publication, thematic content, etc.).

  • Understand and establish the basis of an experiment, theoretical framework or model, etc.

Table 5: Data Extraction Protocol
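The extraction items (i) to (vii) listed above could be captured as a typed record so that every paper is described by the same fields. The sketch below is hypothetical (the class and field names are not from the protocol); it assumes JCR quartiles Q1 to Q4 for journals and CORE ranks A*, A, B and C for conferences, with `None` standing for the "other" case, and rejects rankings that do not match the venue type.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

# Ranking scales assumed for item (iii); "other" venues carry no ranking.
JCR_QUARTILES = {"Q1", "Q2", "Q3", "Q4"}
CORE_RANKS = {"A*", "A", "B", "C"}


@dataclass
class PaperExtraction:
    # (i) title, authors, year
    title: str
    authors: list[str]
    year: int
    # (ii) reason why the paper was initially included
    inclusion_reason: str
    # (iii) type of publication, ranking and publisher
    venue_type: str          # "journal" or "conference"
    ranking: Optional[str]   # JCR quartile or CORE rank; None for "other"
    publisher: str
    # (iv) type of experience reported
    experience_type: str
    # (v) results
    results: str
    # (vi) community to which the paper was directed
    target_community: str
    # (vii) tools and programming languages used
    tools_and_languages: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        scales = {"journal": JCR_QUARTILES, "conference": CORE_RANKS}
        if self.venue_type not in scales:
            raise ValueError(f"unknown venue type: {self.venue_type!r}")
        if self.ranking is not None and self.ranking not in scales[self.venue_type]:
            raise ValueError(f"{self.ranking!r} is not a valid {self.venue_type} ranking")


# Hypothetical entry; asdict() turns it into a dict, e.g. for a spreadsheet row.
entry = PaperExtraction(
    title="A hypothetical feature modelling tool",
    authors=["A. Author", "B. Author"],
    year=2017,
    inclusion_reason="Proposes a feature modelling tool for SPLs",
    venue_type="conference",
    ranking="B",
    publisher="IEEE",
    experience_type="case study",
    results="Tool applied to one product line",
    target_community="SPL researchers",
    tools_and_languages=["Java"],
)
print(asdict(entry)["ranking"])
```

Validating at construction time means a form with, say, a CORE rank on a journal paper is caught during extraction rather than during analysis.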

Table 6 shows, for each RQ, the details and the type of categorization that will be used to classify the selected papers. A categorization is open (partial) if it does not cover all the possibilities, so more categories may be added; conversely, a closed (complete) classification schema covers the whole set of possibilities for that criterion.

RQ1 (Open): To establish the domains in which each feature modelling tool was used, we established domain categories, and each paper will be assigned to a category according to the domain where the tool was used.
RQ2 (Open): To study the evidence about the underlying model each tool is linked to, we will examine the information provided by each paper and assign it to one of the defined categories.
RQ3 (Closed): To establish whether the feature modelling tool was developed from a need of the researchers or to address a deficiency detected by industry, we created three categories (academia, industry, joint), and each paper will be assigned to a category according to its origin.
RQ4 (Open): To determine how the reported results were validated, we established seven validation categories, and each paper will be assigned to a category according to its type of validation.
Table 6: RQ - details and classification type (CType: Open, Closed)
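The difference between closed and open schemas, and the frequency counts a mapping study reports per category, can be sketched as follows. The class names, the starting RQ4 categories (taken from the examples in Table 1) and the five extracted papers are hypothetical: a closed schema is modelled as an `Enum` that rejects unknown values, while an open schema simply grows when an unforeseen label appears.

```python
from collections import Counter
from enum import Enum


class Origin(Enum):
    """RQ3 (closed schema): every tool belongs to exactly one category."""
    ACADEMIA = "academia"
    INDUSTRY = "industry"
    JOINT = "joint"


class OpenSchema:
    """RQ1, RQ2, RQ4 (open schema): categories may grow during extraction."""

    def __init__(self, initial: list[str]) -> None:
        self.categories = set(initial)

    def classify(self, label: str) -> str:
        self.categories.add(label)  # an unforeseen label extends the schema
        return label


# Starting RQ4 categories from Table 1; further ones would emerge from the papers.
validation = OpenSchema(["proof of concept", "industrial use", "case study", "experiment"])

# Hypothetical extracted data for five papers.
papers = [
    {"origin": "academia", "validation": "proof of concept"},
    {"origin": "joint", "validation": "case study"},
    {"origin": "academia", "validation": "experiment"},
    {"origin": "industry", "validation": "industrial use"},
    {"origin": "academia", "validation": "benchmark"},
]

# Origin(...) raises ValueError for anything outside the closed schema.
origin_freq = Counter(Origin(p["origin"]) for p in papers)
validation_freq = Counter(validation.classify(p["validation"]) for p in papers)

for origin, n in origin_freq.most_common():
    print(origin.value, n)
print(sorted(validation.categories))
```

The counters correspond to the "frequencies of publications for categories within the scheme" that the mapping analysis focuses on.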

2.4 SMS Tool Support

In order to facilitate finding, selecting, documenting and analyzing the information gathered, the following support tools were used. Dropbox served as a shared repository of resources [17]; its use is shown in Figure 5.

Figure 5: Dropbox as a shared repository for resources.

Mendeley (Desktop and Web) was used for storing, reading and annotating reviews of the selected papers, as well as for automatically creating the .bib files used to manage the bibliographic references [18]. Its use is shown in Figure 6.

Figure 6: Mendeley (desktop version) for storing and reviewing papers.

Publish or Perish was used for the initial validation of the search string and for automatic spreadsheet creation [19]. Its use is shown in Figure 7.

Figure 7: Screenshot from website of Publish or Perish tool.

Overleaf was used for editing, managing and controlling the different file versions used to create this paper. Its use is shown in Figure 8.

Figure 8: Overleaf for writing this report and the paper.

3 Threats to validity

Despite the care taken in the definition of our SMS, secondary studies suffer from some well-known limitations and threats to validity, which we discuss in the following paragraphs together with how they will be addressed to minimize their impact on the execution of this protocol.

  • Bias - searching papers. Possible bias in the search for papers. We cannot guarantee that all relevant primary studies will be selected in our SMS, mainly because of the limitations of the defined search process. We will mitigate this threat by following the main references in the selected primary studies to make sure they are also present in our list of candidate papers.

  • Bias - relevant papers. Possible bias in excluding relevant papers. We will mitigate this threat through a pilot study that checks for a high level of inter-rater agreement, in order to validate the inclusion/exclusion criteria among the researchers. In addition, decisions on including or excluding papers will be taken jointly by more than one researcher, thus reducing individual bias.

  • Limitations - data extraction. Limitations on data extraction from the selected papers. It may be difficult to extract the relevant information for certain items; for example, some papers may not provide explicit information that directly answers our RQs, such as the tool's application domain or its underlying model.

  • Limitations - searching in data sources. Limitations of the tools used to conduct searches in the electronic data sources, as already mentioned in Section 2.1 and Table 2.

    We mitigated this threat by consulting experts in SMSs and SLRs, who gave us feedback and helped us validate our protocol.

Finally, we intend to check the performance of the SMS protocol execution against the rubrics defined in [13]. The details, together with the expected results, are shown in Appendix A.

4 Conclusions

We have followed the guidelines of Petersen et al. [13] to plan an SMS. Since all the authors adhered to these guidelines to build the protocol presented in this document, we expect the conducting phase of the SMS to be repeatable. Finally, the threats to validity have been identified and mitigated as far as possible.


Samuel Sepúlveda would like to thank Dr. Pedro Rossel and Ms.(c) Alonso Bobadilla for their useful advice and technical support.


  • [1] P. Clements and L. Northrop, Software Product Lines: Practices and Patterns, 3rd ed.   Addison-Wesley Professional, Aug. 2001.
  • [2] F. Ahmed and L. F. Capretz, “Best practices of RUP® in software product line development,” in Proceedings of the International Conference on Computer and Communication Engineering (ICCCE 2008).   IEEE, 2008, pp. 1363–1366.
  • [3] J. D. McGregor, D. Muthig, K. Yoshimura, and P. Jensen, “Guest editors’ introduction: Successful software product line practices,” IEEE Software, vol. 27, no. 3, pp. 16–21, May/June 2010.
  • [4] P. Collet, “Domain Specific Languages for Managing Feature Models: Advances and Challenges,” in Proceedings of the 6th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2014), ser. Lecture Notes in Computer Science, T. Margaria and B. Steffen, Eds., vol. 8802.   Springer, Oct. 2014, pp. 273–288.
  • [5] S. Sepúlveda, C. Cares, and C. Cachero, “Towards a Unified Feature Metamodel: a Systematic Comparison of Feature Languages,” in Proceedings of the 7th Iberian Conference on Information Systems and Technologies (CISTI 2012).   IEEE Computer Society, Jun. 2012, pp. 1–7.
  • [6] ——, “Feature Modeling Languages: Denotations and Semantic Differences,” in Proceedings of the 7th Iberian Conference on Information Systems and Technologies (CISTI 2012).   IEEE Computer Society, Jun. 2012, pp. 1–6.
  • [7] S. Sepúlveda, A. Cravero, and C. Cachero, “Requirements modeling languages for software product lines: A systematic literature review,” Information and Software Technology, vol. 69, pp. 16–36, Jan. 2016.
  • [8] B. A. Kitchenham, D. Budgen, and O. P. Brereton, “The value of mapping studies: a participant observer case study,” in Proceedings of the 14th international conference on Evaluation and Assessment in Software Engineering.   BCS Learning & Development Ltd., Apr. 2010, pp. 25–33.
  • [9] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson, “Systematic mapping studies in software engineering,” in Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering (EASE’08).   BCS Learning & Development Ltd., Jun. 2008, pp. 1–10.
  • [10] B. Kitchenham and S. Charters, “Guidelines for performing Systematic Literature Reviews in Software Engineering,” Keele University and Durham University, Tech. Rep. EBSE 2007-01, Jul. 2007.
  • [11] M. Unterkalmsteiner, T. Gorschek, A. M. Islam, C. K. Cheng, R. B. Permadi, and R. Feldt, “Evaluation and measurement of software process improvement - a systematic literature review,” IEEE Transactions on Software Engineering, vol. 38, no. 2, pp. 398–424, March-April 2012.
  • [12] M. Petticrew and H. Roberts, Systematic Reviews in the Social Sciences: A Practical Guide.   John Wiley & Sons, Dec. 2008.
  • [13] K. Petersen, S. Vakkalanka, and L. Kuzniarz, “Guidelines for conducting systematic mapping studies in software engineering: An update,” Information and Software Technology, vol. 64, pp. 1–18, Aug. 2015.
  • [14] P. Brereton, B. A. Kitchenham, D. Budgen, M. Turner, and M. Khalil, “Lessons from applying the systematic literature review process within the software engineering domain,” Journal of Systems and Software, vol. 80, no. 4, pp. 571–583, Apr. 2007.
  • [15] K. Gwet, “Inter-rater Reliability: Dependency on Trait Prevalence and Marginal Homogeneity,” Statistical Methods for Inter-Rater Reliability Assessment Series, vol. 2, pp. 1–9, May 2002.
  • [16] J. L. Fleiss, Statistical methods for rates and proportions.   John Wiley & Sons, 1981.
  • [17] I. Drago, M. Mellia, M. M. Munafò, A. Sperotto, R. Sadre, and A. Pras, “Inside Dropbox: Understanding personal cloud storage services,” in Proceedings of the 2012 ACM SIGCOMM Internet Measurement Conference.   ACM, Nov. 2012, pp. 481–494.
  • [18] V. Vaidhyanathan, M. Moore, K. A. Loper, J. Van Schaik, and D. Goolabsingh, “Making bibliographic researchers more efficient: Tools for organizing and downloading PDFs, part 1: iCyte, Mendeley Desktop, Papers, PDF Stacks, Pubget Paperplane, WizFolio, and Zotero,” Journal of Electronic Resources in Medical Libraries, vol. 9, no. 1, pp. 47–55, 2012.
  • [19] A.-W. Harzing, “Publish or perish, version 3,” Available at, 2007.

Appendix A Evaluation of the SMS process

Here we include a self-evaluation of the work to be done, according to [13]. Table 7 shows the activities considered for conducting an SMS; the activities that will be performed are marked with a check mark (✓).

Protocol Phase Actions Applied
Need for map
  • Motivate the need and relevance
  • Define objectives and questions
  • Consult with target audience to define questions
Study identification
  • Choose search strategy
    • Conduct database search
  • Develop the search
    • Consult librarians or experts
    • Iteratively try to find more relevant papers
    • Keywords from known papers
    • Use standards, encyclopedias, and thesaurus
  • Evaluate the search
    • Test-set of known papers
    • Expert evaluates result
    • Search web pages of key authors
  • Inclusion and Exclusion criteria
    • Identify objective criteria for decision
    • Add additional reviewer, resolve disagreements between them when needed
    • Decision rule
Data extraction and classification
  • Extraction process
    • Identify objective criteria for decision
    • Obscuring information that could bias
    • Add additional reviewer, resolve disagreements between them when needed
  • Classification scheme
    • Research type
    • Research method
    • Venue type
Validity discussion
  • Validity discussion/limitations provided
Table 7: Activities to be conducted in the SMS planned (adapted from [13]).

We used the evaluation rubric suggested by Petersen et al. [13]. Tables 8 to 12 show the rubric criteria; the scores we intend to obtain by executing this protocol are highlighted. These scores will be contrasted with the results obtained when executing the SMS and reporting its results.

Evaluation (score): Description
No description (0): The study is not motivated and the goal is not stated.
Partial evaluation (1): Motivations and questions are provided.
Full evaluation (2): Motivations and questions are provided, and have been defined in correspondence with the target audience.
Table 8: Rubric: need for review
Evaluation (score): Description
No description (0): Only one type of search has been conducted.
Minimal evaluation (1): Two search strategies have been used.
Full evaluation (2): All three search strategies have been used.
Table 9: Rubric: choosing the search strategy
Evaluation (score): Description
No description (0): No actions have been reported to improve the reliability of the search and inclusion/exclusion criteria.
Minimal evaluation (1): At least one action has been taken to improve the reliability of the search or the reliability of the inclusion/exclusion criteria.
Partial evaluation (2): At least one action has been taken to improve the reliability of the search and the inclusion/exclusion criteria.
Full evaluation (3): All actions identified have been taken.
Table 10: Rubric: evaluation of the search
Evaluation (score): Description
No description (0): No actions have been reported to improve on the extraction process or enable comparability between studies through the use of existing classifications.
Minimal evaluation (1): At least one action has been taken to increase the reliability of the extraction process.
Partial evaluation (2): At least one action has been taken to increase the reliability of the extraction process, and research type and method have been classified.
Full evaluation (3): All actions identified have been taken.
Table 11: Rubric: extraction and classification
Evaluation (score): Description
No description (0): No threats or limitations are described.
Full evaluation (1): Threats and limitations are described.
Table 12: Rubric: study validity
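The action ratio reported in Section 2.1.5 and the rubric maxima of Tables 8 to 12 can be checked with a small sketch. The function names are hypothetical, and because the highlighted target scores are not reproduced in this text, the code only validates and sums whatever per-criterion scores are supplied; the only concrete figures used are the 10 out of 26 actions stated in the protocol.

```python
# Maximum score per rubric criterion, as listed in Tables 8 to 12.
RUBRIC_MAX = {
    "need for review": 2,
    "choosing the search strategy": 2,
    "evaluation of the search": 3,
    "extraction and classification": 3,
    "study validity": 1,
}


def rubric_total(scores: dict[str, int]) -> tuple[int, int]:
    """Return (obtained, maximum) after checking each score against its table."""
    for criterion, score in scores.items():
        if criterion not in RUBRIC_MAX:
            raise KeyError(f"unknown rubric criterion: {criterion!r}")
        if not 0 <= score <= RUBRIC_MAX[criterion]:
            raise ValueError(f"score {score} out of range for {criterion!r}")
    return sum(scores.values()), sum(RUBRIC_MAX.values())


def actions_ratio(applied: int, possible: int) -> float:
    """Share of the possible actions in the rubric of [13] that the study applies."""
    return applied / possible


# Ratio reported in Section 2.1.5: 10 of 26 actions.
print(f"{actions_ratio(10, 26):.0%}")
```

Once the SMS is executed, the obtained scores could be passed to `rubric_total` and compared with the planned ones.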