Views on Quality Requirements in Academia and Practice: Commonalities, Differences, and Context-Dependent Grey Areas

02/07/2020 · Andreas Vogelsang et al. · Technische Universität Berlin · University of Bonn · BTH · Google Inc.

Context: Quality requirements (QRs) are a topic of constant discussion in both industry and academia. Debates revolve around the definition of quality requirements, how to handle them, and their importance for project success. While many academic endeavors contribute to the body of knowledge about QRs, practitioners may have different views. In fact, we still lack a consistent body of knowledge on QRs, since much of the discussion around this topic is dominated by observations that are strongly context-dependent. This holds for both academic and practitioners' views. Our assumption is that, in consequence, those views may differ. Objective: We report on a study to better understand the extent to which available research statements on quality requirements, as found in exemplary peer-reviewed and frequently cited publications, are reflected in the perception of practitioners. Our goal is to analyze differences, commonalities, and context-dependent grey areas in the views of academics and practitioners to allow a discussion on potential misconceptions (on either side) and opportunities for future research. Method: We conducted a survey with 109 practitioners to assess whether they agree with research statements about QRs reflected in the literature. Based on a statistical model, we evaluate the impact of a set of context factors on the perception of research statements. Results: Our results show that a majority of the statements are well accepted by practitioners, but not all of them. When examining the different groups and backgrounds of respondents, we noticed interesting deviations in perception within different groups that may lead to new research questions. Conclusions: Our results help identify prevalent context-dependent differences in how academics and practitioners view QRs and pinpoint statements where further research might be useful.


1 Introduction

Requirements Engineering (RE) constitutes an important success factor in today's software development projects and is often understood as a determinant of productivity and (product) quality Damian and Chisan (2006). Despite its importance, the discipline remains difficult to investigate due to its inherent complexity and its dependency on the various influences of particular industrial sectors, domains, and individual project environments. This holds especially for quality requirements (QR), what they are, and how they should be handled in practice. Quality requirements are often considered separately from functional requirements in research Robertson and Robertson (2012); Sommerville and Sawyer (1997) and practice Ameller et al. (2012); Borg et al. (2003); Chung and Nixon (1995); Svensson et al. (2009) alike, as they tend to be treated differently along their elicitation, documentation, and validation. Yet, there is still no common agreement about what quality requirements exactly are Glinz (2007), despite the fact that there is even an ISO standard ISO/IEC (2011) with a definition. The discourse about quality requirements is still dominated by the question of how to differentiate them from functional requirements Broy (2018); Glinz (2007).

In empirically informed work Méndez Fernández et al. (2017); Méndez Fernández and Passoth (2019), we pointed out that conceptual contributions to RE are still heavily steered by conventional wisdom rather than by empirical observations. In fact, we still do not exploit the full potential of empirical software engineering principles in RE to reveal robust theories in tune with practical problems. It is per se difficult to provide proper empirical figures that could demonstrate, for instance, how theoretical concepts are reflected in industrial practices or, the other way around, how industry practitioners' experiences, observations, and opinions can find their way back into academic contributions. This existing gap between academia and industry holds especially for fuzzy notions such as the ones reflected by quality requirements. While it is certainly not our intention to marginalise existing empirical work in RE, it is reasonable to say that much of the existing body of knowledge in RE still remains a collection of either isolated, loosely connected hypotheses (e.g., empirical insights from a case study in a very specific context) or hypotheses that remain too universal (i.e., completely neglecting the context, regardless of whether intentionally or not). Conclusions are thus either hardly generalizable or easily falsifiable (in a specific context). Examples of existing isolated, yet empirically grounded, observations are:

  • QRs are mainly elicited by architects Ameller et al. (2012)

  • Functional requirements are often labeled as QRs Eckhardt et al. (2016a)

  • Testing QRs is impossible Borg et al. (2003)

Motivated by this overall situation, we want to better understand the extent to which available research statements on quality requirements, as found in academic peer-reviewed and frequently cited publications, are consistent with the perceptions held by practitioners. (Please note that in our study, we intentionally exclude all non-functional properties that do not address system-specific properties, such as process-related requirements. Hence, we intentionally use the term "quality requirements" instead of "non-functional requirements" to make explicit that we exclusively focus on non-functional characteristics of a system under consideration. See also Section 2 for further information.)

In particular, we aim at understanding the extent to which the views and perceptions held by practitioners are corroborated by those of academics. More precisely, we want to understand how well research statements frequently referred to in academic works are perceived by practitioners in their respective contexts. The questions we aim to answer are:

  • What is the agreement of practitioners with existing research statements about QRs?

  • Which context factors (e.g., industrial sector, company size, experience) influence the agreement of practitioners with research statements about QRs?

  • Can we assign a specific perception of QRs to stereotypical groups of practitioners?

Our hope is that an increased understanding of practitioners' beliefs and views helps us identify differences, commonalities, and context-dependent grey areas, and pinpoint existing (and regularly cited) statements where further context-dependent research would be useful.

The paper makes the following contributions:

  1. We define a set of 21 research statements about quality requirements from a total of 17 exemplary and commonly cited research papers from the RE research community.

  2. We survey practitioners from several application domains and business contexts regarding their agreement with the previously identified statements about quality requirements. The survey results suggest that practitioners hold strong and diverse opinions, and that some results inspire more passion and dissension than others.

  3. We provide a statistical model that allows evaluating the impact of specific context factors on the perception of research statements. The results of the evaluation show that the perception of some research statements is homogeneous across different development contexts while the perception of others strongly depends on the context.

  4. We provide a detailed discussion of the results and contrast them with the original studies from which the statements emerged.

Our intention is not to criticize selected academic manuscripts but to increase our understanding of (1) how much practitioners' views differ with respect to their daily working context, and (2) what we, the research community, can learn from it. Our vision is to contribute to reducing the gap between industrial practice and problems on the one hand, and academic contributions and solution proposals on the other.

Relation to Previous Publications: In the past, we have conducted a number of studies in which we investigated the perception Eckhardt et al. (2016b) and use Eckhardt et al. (2016a, 2015) of quality requirements by practitioners. The research questions, results, and the underlying data presented in this article are original in the sense that they have not been addressed in a previous analysis and consequently not in a previous publication. The only commonality between the study at hand and one of our previous publications Eckhardt et al. (2016b) is that the data underlying these studies have been collected using the same questionnaire (but different parts of it). More specifically, we designed a questionnaire on personal experiences to understand whether quality requirements and functional requirements are handled differently in practice. That is, our previous study Eckhardt et al. (2016b) and the study presented in this article share the same subject population, but the analyzed questions are completely disjoint. In our previous paper Eckhardt et al. (2016b), we analyzed the answers to questions in questionnaire Sections 3–6, while in the article at hand, we analyze the questions from Section 7 of the questionnaire. The full questionnaire is part of the additional material disclosed in our replication package Vogelsang et al. (2018).

2 Background

In this section, we provide background and related work on QR classifications and on the implications of QRs on software development.

2.1 QR/NFR Research

The term quality requirement and the closely related term non-functional requirement (NFR) are subject to constant discussion and even misunderstandings in academia and practice. In fact, there seems to be an agreement in academia that the term "non-functional requirements" should generally be avoided when characterizing requirements. In his seminal paper, Glinz (2007) performs a comprehensive review of the existing definitions related to the term "non-functional requirement". He highlights three problems: (1) a definition problem, i.e., NFR definitions have discrepancies in the used terminology and concepts, (2) a classification problem, i.e., the definitions provide very different sub-classifications of NFRs, and (3) a representation problem, i.e., the notion of NFRs is representation-dependent. Similarly, Pohl (2010) discusses the misleading use of the term "non-functional" and argues to use, instead, "quality requirements" for product-related NFRs that are not constraints. In this manuscript, we also rely on this distinction and use "quality requirement" when particularly referring to product-related non-functional properties. Glinz proposes a requirements classification without even using the term NFR at all. However, he also recognizes the prevalent use of the term NFR and defines it as a "requirement that is an attribute of or a constraint on a system" (see Figure 1).

Figure 1: Requirements taxonomy according to Glinz (2007).

As stated earlier, in our work, we are interested in product-related (quality) requirements. In the taxonomy of Glinz, our characterization covers performance requirements and specific quality requirements. However, instead of using Glinz's term attribute, we follow the suggestion of Pohl (2010) and use the term quality requirements throughout this manuscript. Broy (2018) takes a completely different view towards categorizing requirements. He points out that there is no precise definition of either term, functional or non-functional, and he avoids both in his taxonomy. He argues that a requirements categorization should rather differentiate whether a requirement relates to the system's interface, its internal architecture, its internal state, or whether it prescribes representational aspects.

Despite the observation that recent academic taxonomies also seem to avoid the term "non-functional", the term is still widely used in practice and in scientific papers, mostly in the sense of everything besides the functional requirements. Eckhardt et al. (2016a) analyzed 11 requirements specifications from industrial environments with a particular focus on requirements labeled as "quality" or "non-functional". They found that most requirements specifications separate quality requirements from functional requirements in the documentation. However, when analyzing the quality requirements in detail, they found that many requirements labeled as QR describe system behavior and, thus, could as well be labeled as functional requirements. Jung et al. (2004) performed a study on the adequacy of the quality characteristics defined in the ISO/IEC 9126 standard ISO/IEC (2001). They asked 75 study subjects to rate a given software product in terms of a number of quality sub-characteristics defined in the standard. Afterwards, they clustered the sub-characteristics based on the correlations between the given answers. Their results reveal ambiguities in the way that ISO/IEC 9126 is structured in terms of characteristics and sub-characteristics. For example, their results imply that four specific sub-characteristics actually measure the same intrinsic concept, a mixture of maintainability and portability.

One may argue that discussing requirements categorization is an academic gimmick. However, there is empirical evidence that the categorization influences how requirements are elicited, documented, and validated in practice Ameller et al. (2012); Borg et al. (2003); Chung and Nixon (1995); Svensson et al. (2009). In an earlier work Eckhardt et al. (2016b), we found that the development process for requirements of the two classes strongly differs (e.g., in testing). We obtained these findings based on a survey with practitioners to which we will also refer in the work presented here. We further found that many reasons are based on assumptions rather than on evidence. As a matter of fact, up to now, there exists no commonly accepted approach for QR-specific elicitation, documentation, and analysis Borg et al. (2003); Svensson et al. (2009); QRs are usually described vaguely Borg et al. (2003); Ameller et al. (2012), often remain unquantified Svensson et al. (2009), and as a result remain difficult to analyze and test Ameller et al. (2012); Borg et al. (2003); Svensson et al. (2009). Furthermore, QRs are often retrofitted in the development process or pursued in parallel with, but separately from, functional requirements Chung and Nixon (1995) and, thus, are implicitly managed with little or no consequence analysis Svensson et al. (2009). This limited focus on QRs can, in the long run, result in high maintenance costs Svensson et al. (2009).

All these studies indicate that QRs are not well integrated into practical software development processes and, furthermore, that several problems with QRs are evident. In this paper, our goal is to analyze the discrepancy between the perceptions of QRs in academia and practice.

2.2 Research about Perception of Research Statements in Software Engineering

In the work presented here, we want to analyze the perception of practitioners of statements in research about quality requirements. We call those statements research statements. Analyzing the appraisal of practitioners with respect to some normative statements has been addressed by a number of other authors as well.

Devanbu et al. (2016), for instance, report on a case study on the prior beliefs of developers at Microsoft and the relationship of these beliefs to actual empirical data on the projects in which these developers work. Their results suggest that programmers do indeed have very strong beliefs on certain topics, that their beliefs are primarily formed based on personal experience rather than on findings in empirical research, and that beliefs can vary with each project, but do not necessarily correspond with actual evidence in that project. They conclude that more effort should be taken to disseminate empirical findings and that more research is needed on the relation between belief and evidence in software practice.

Rainer et al. (2003) used content analysis to analyze a group discussion about software process improvement (SPI) between developers within one company and compared the respondents' opinions with four research papers on SPI. The main finding from this analysis is that there is an apparent contradiction between developers saying that they want evidence and what developers will actually accept as evidence. This finding is related to issues such as hierarchies of knowledge, the value of empirical evidence to practitioners, local expertise, an incremental approach to improvement that may develop familiarity with those improvements, and differences between developers and managers with regard to their interest in the process. A serious implication follows from this finding: even if researchers could demonstrate a strong, reliable relationship between software process improvement and organisational performance, there would still be the problem of convincing practitioners that the evidence applies to their particular situation.

The work at hand is inspired by the work of Devanbu et al. (2016), as it was the first to raise the question of the discrepancy between evidence and belief in software engineering and to investigate it empirically. Our work follows their line of reasoning but concentrates on research statements related to QRs. In addition, we perform an in-depth analysis of the influence of several context factors on the practitioners' perceptions in order to uncover research statements that may be valid in certain contexts only. To the best of our knowledge, the paper at hand is the first attempt to empirically investigate research statements about QRs and their perception in practice.

3 Study Design

Our overall goal is to analyze the extent to which practitioners agree with statements on quality requirements emerging from academic research. By investigating potential mismatches between academic statements, perceptions of practitioners, and the specific context of practitioners, we hope to be able to identify research gaps for specific contexts, potential misconceptions that raise the need for further investigations, or statements that are only true in certain contexts. Our study consisted of the following steps:

  1. Identify research statements.
    We identified and extracted a set of research statements about quality requirements by browsing the literature from relevant journals and conferences.

  2. Collect feedback via a survey.
    We integrated all identified research statements into an online survey in which respondents from industry were asked to state their general agreement with the statements. We sent out the survey to a broad spectrum of practitioners who work with requirements in general.

  3. Analyze the data.
    Given the responses, we applied a statistical model to each research statement that relates the level of agreement to specific context factors (described in Section 3.3). The model allows calculating the probability of higher agreement or disagreement given a respondent characterized by certain context factors.

Given these steps, our study approach is a form of qualitative survey Jansen (2010). Qualitative surveys particularly study diversity (not distribution) in a population in a cross-sectional manner. They do not aim at establishing frequencies, means, or other parameters but at determining the diversity of some topic of interest within a given population. This type of survey does not count the number of people with the same characteristic (value of a variable) but establishes the meaningful variation (relevant dimensions and values) within that population. Although this type of survey is termed a qualitative survey, it has to be noted that we use a statistical (i.e., quantitative) model to analyze the answers of the survey (see Section 3.4).

3.1 Identification of Research Statements

In the context of our study, a research statement is an assertion about quality requirements that has been stated in a high-quality, scientific publication. Research statements usually correspond to observations that individual researchers (the authors of the papers) made in a specific context, even if that context is not necessarily made explicit in the papers. Hence, research statements do not necessarily need to be true or commonly accepted by the whole community, but we expect the research statements considered in our study to show a certain degree of scientific rigor, evidenced by the peer-review process and by a considerable number of citations.

Our goal is to use a number of research statements as a vehicle to assess the differences and commonalities between the perceptions of academics and practitioners on the topic of quality requirements. To this end, we identified research statements on QRs by analyzing existing empirical studies concerning non-functional requirements or quality requirements. This process was not intended to be systematic in the sense of conducting a secondary study such as a systematic literature review, because we did not aim for a complete coverage of all research statements. Instead, we analyzed the relevant literature known to us to extract an exemplary set of research statements covering different RE topic areas. The selection of the research statements was, in that sense, opportunistic. Most statements were identified in the introduction and conclusion sections of the considered papers. We further validated and discussed the research statements in the team of authors to strengthen our confidence that the statements are reflected correctly. In total, we extracted 21 research statements about quality requirements. Please further note that although the extracted statements are based on publications, they are not necessarily taken verbatim and they do not necessarily reflect the authors' opinions. Finally, at a later phase of the study, we compared the identified research statements against recent publications that we found in major venues on requirements engineering and software engineering (in particular, the conferences RE, REFSQ, ESEM, and ICSE) between 2010 and 2016. More specifically, we analyzed all publications categorized as concerning quality requirements in another literature study Franch et al. (2017) and looked for statements that either support or oppose research statements from our initial list. Table 1 summarizes the research statements we consider in this study and adds further publications supporting or opposing the statements. For a better overview, we clustered the research statements according to the different RE activities.

Id | Research Statement | Pro | Con

General

G1 | The application domain strongly influences the relevance of individual types of QRs. | Eckhardt et al. (2016a); Rahimi et al. (2014) | -
G2 | Many QRs describe functional aspects of a system. | Eckhardt et al. (2016a) | Li et al. (2014)
G3 | QRs are sometimes ignored. | Borg et al. (2003) | -
G4 | Architects do not share a common terminology for types of QRs. | Ameller et al. (2012); Daneva et al. (2014); Mahmoud (2015) | Daneva et al. (2013)
G5 | Only few QRs deal with architectural aspects. | Eckhardt et al. (2016a) | Clements and Bass (2010); Poort et al. (2012)

Elicitation

E1 | In requirements elicitation, the focus is on FRs, not on QRs. | Borg et al. (2003); Ameller et al. (2012); Svensson et al. (2011); Rahimi et al. (2014); Felderer et al. (2014); Mahmoud (2015) | -
E2 | Many QRs remain undiscovered. | Borg et al. (2003); Rahimi et al. (2014); Mahmoud (2015) | -
E3 | QRs are mainly elicited by architects. | Ameller et al. (2012); Daneva et al. (2013) | Clements and Bass (2010); Daneva et al. (2014)

Specification

S1 | QRs are often not documented. | Ameller et al. (2012); Rahimi et al. (2014) | Daneva et al. (2013, 2014)
S2 | The documentation of QRs is not always precise. | Ameller et al. (2012); Svensson et al. (2009); Fotrousi et al. (2014); Li et al. (2014) | -
S3 | QRs are often described in too vague terms. | Borg et al. (2003); Li et al. (2014) | -
S4 | The documentation of QRs usually becomes desynchronized. | Ameller et al. (2012); Mirakhorli et al. (2012); Mahmoud (2015) | -
S5 | Functional requirements are often labeled as QRs. | Eckhardt et al. (2016a); Ernst and Mylopoulos (2010); Rahimi et al. (2014) | -
S6 | QRs are often specified by referencing a standard or a legislative text. | Eckhardt et al. (2016a); Daneva et al. (2013) | -

Testing

T1 | Only few types of QRs are validated at the end of the project. | Ameller et al. (2012) | Daneva et al. (2013)
T2 | QRs are satisfied at the end of the project. | Ameller et al. (2012); Ernst and Mylopoulos (2010); Anh et al. (2012) | -
T3 | Most QR types are difficult to test properly. | Borg et al. (2003); Svensson et al. (2009); Fotrousi et al. (2014); Mahmoud (2015) | -
T4 | Testing QRs is time-consuming. | Borg et al. (2003) | -
T5 | Testing QRs is impossible. | Borg et al. (2003) | -

Mgmt.

M1 | QRs are often not sufficiently prioritized. | Borg et al. (2003); Svensson et al. (2011) | Daneva et al. (2013)
M2 | Software architects do not use a specific tool for QR management. | Ameller et al. (2012) | -

Table 1: Identified research statements considered in our study with publications supporting (Pro) and opposing (Con) the statements.

3.2 Subject Selection

With our study, we targeted practitioners who work with requirements. This includes practitioners who write requirements (e.g., requirements engineers), practitioners whose work is based on requirements (e.g., developers or testers), and practitioners who manage projects or requirements. For inviting practitioners to participate, we did not select a specific closed group of practitioners but, instead, contacted as many practitioners as possible via the authors' personal contacts from previous collaborations, via public mailing lists such as RE-online, and via social networks. Our survey was conducted anonymously. Since we were not able to exactly control who answered the survey, it was especially important to follow Kitchenham and Pfleeger's (2008) advice on the need to understand whether the respondents had enough knowledge to answer the questions in an appropriate manner. For this, we excluded data from respondents who answered that they do not use requirements specifications at all or who stated that they did not know how requirements are handled in their company. We offered respondents the chance to leave an email address if they were interested in the results of the survey.

3.3 Data Collection

We integrated the survey questions of this paper into a larger survey about general QR practices using the Enterprise Feedback Suite EFS Survey from Questback. We published the results of this larger study in a previous paper Eckhardt et al. (2016b). However, in the previously published paper, we did not report on the results or discuss any of the questions from this study. We started our data collection on February 4th, 2016 and closed the survey on February 22nd, 2016. In the following, we introduce the main elements of the instrument used. The full instrument is part of the additional material disclosed in our replication package Vogelsang et al. (2018).

3.3.1 Subject Matter Clarification

In the survey questionnaire, we wanted to ensure that all respondents have a similar understanding of the subject matter. Therefore, we first introduced a common terminology ("With NFRs, we refer to those requirements that address quality characteristics of a product or system (like availability or performance)") and further narrowed down the scope via specific questions on a set of specific quality characteristics. In the survey, we intentionally used the term "non-functional requirements" when referring to "quality requirements" (excluding process and project requirements as well as constraints), because the term NFR is widely used in practice (see also Section 2.1). In the remainder of this paper, we will exclusively use the term "quality requirements".

3.3.2 Demographics and Context Factors

We collected demographic data from the respondents to be able to interpret and triangulate the data with respect to different contexts of the respondents. The elicitation of demographic data included the following context factors:

  • Role in project: Free text answers that we afterwards categorized to the project roles manager, requirements engineer, architect, test engineer, developer, and other.

  • Experience: Choice between less than 3 years of experience in dealing with requirements (novice) and more than 3 years of experience (senior).

  • Company size: Choice between less than 250 employees (small company), between 250 and 2,000 employees (medium company), and more than 2,000 employees (large company).

  • Typical project size: Choice between less than 10 employees (small projects), between 10 and 50 employees (medium projects), and more than 50 employees (large projects).

  • Geographical team distribution: Choice between all team members in one location, team members distributed over several locations in one country, and team members distributed over several locations in several countries.

  • Development process paradigm: Choice between rather agile, mixed, rather plan-driven.

  • Industrial sector: Free text answers that we afterwards categorized to the sectors telecommunication, automotive, automation, avionics, finance, healthcare, public, and other.

  • System type: Choice between embedded systems, business information systems, consumer software, and hybrid systems.

  • Role of requirements specifications in the company: Choice between "create and use for in-house development", "create, while an external company is responsible for the development", "use as a subcontractor, e.g., for development or testing", and "don't use".

  • Documentation of QRs: Choice between yes (QRs are documented) and no (QRs are not documented).

To better understand the participants' focus and project context, we additionally asked the respondents to evaluate the importance of different types of QRs in their projects. The respondents were asked to assess the importance of quality factors taken from ISO/IEC 25010 ISO/IEC (2011) for their typical projects on a 5-point Likert scale with the values "very important", "important", "moderately important", "slightly important", and "not important". We added another value, "Don't know", that allows respondents to skip the answer if they cannot or do not want to answer the question. The quality factors that we asked for were functional suitability, performance/efficiency, compatibility, usability, reliability, security, maintainability, and portability.

3.3.3 Research Statement Agreement

Finally, we presented all research statements to the respondents and asked them to state their agreement: "Please consider your experiences: To which degree do you agree with the following statements?" The research statements were presented in random order, which differed between respondents. The respondents could express their agreement on a 5-point Likert scale with the values "strongly agree", "agree", "neither agree nor disagree", "disagree", and "strongly disagree". We added another value, "Don't know", to allow respondents to express that they have no opinion about or cannot answer the question. The last category was included to address the diverse backgrounds of respondents, since not all respondents will understand all research statements.

3.4 Data Analysis

To assess the relation between the agreement with the research statements and the context factors, we set up a regression model that we applied to each research statement. For this purpose, the collected data has to be coded appropriately. The following representation focuses on one research statement only. Let the data be given by $(Y_i, x_i)$, $i = 1, \ldots, n$, where $Y_i$ is the response of respondent $i$ to the research statement, ranging from 1 (strongly agree) to 5 (strongly disagree), $x_i$ is the vector of context factors, and $n$ is the number of respondents. Each context factor contained in $x_i$ is coded according to its scale. For the binary context factors, we use the usual dummy variables (i.e., auxiliary variables taking the value 0 or 1). They are Experience [0: novice, 1: senior] and Documentation of QRs [0: no, 1: yes]. For the categorical context factors, we use an effect coding scheme, where the effect of the last category is fixed, respectively. The coding of two examples with three categories is Company size [(1,0): small, (0,1): medium, (-1,-1): large] and Development process paradigm [(1,0): rather agile, (0,1): mixed, (-1,-1): plan-driven]. The other categorical context factors are coded in the same way. For simplicity, the answers on the importance of quality factors were treated as numeric variables in the model.

In the regression model, the response to the research statement is treated as an ordinal variable, that is, the ordering of the response categories is explicitly used. An important class of ordinal regression models is the class of cumulative models Agresti (2010); Liu and Agresti (2005). The most prominent one is the proportional odds model, which is applied here. The basic form of the model is given by

$$P(Y_i \le r \mid x_i) = \frac{\exp(\theta_r + x_i^\top \beta)}{1 + \exp(\theta_r + x_i^\top \beta)}, \quad r = 1, \ldots, 4, \qquad (1)$$

where $\theta_r$ are category-specific threshold parameters and $\beta$ is the vector of regression coefficients (estimated impact of the context factors) that is independent of the category $r$. Consistent estimates of the regression coefficients are obtained by maximizing the log-likelihood function by means of the Fisher scoring algorithm Fahrmeir and Tutz (2001). Estimation was carried out with the statistical software R R Core Team (2017) using the function vglm() of the add-on package VGAM Yee (2010, 2014).

3.4.1 Interpretation of the Parameters

In Equation 1 the threshold parameters define the general preference for the categories of a research statement (the general level of agreement) and the parameters determine the shifting of the agreement distribution by the context factors. In detail, let us consider two respondents with context factors and and the corresponding cumulative odds and . Simple derivation shows that the proportion of the cumulative odds for the two respondents is given by

and therefore does not depend on the category . Consequently, the interpretation of parameters does not depend on the category. More concise, represents the factor by which all the cumulative odds change if context factor increases by one unit, while all the other context factors remain constant. For binary context factors, corresponds to the difference between the two groups, for example, senior compared to novice

. For categorical variables with effect coding,

corresponds to the difference between group and a “middle group”, respectively. Accordingly, each estimate has to be interpreted as the effect of one group compared to a “middle effect”.

If is positive, the distribution is shifted to the left, which means more agreement. On the other hand, if is negative, the distribution is shifted to the right, which corresponds to less agreement.

3.4.2 Hypothesis Tests

Standard errors to examine the significance of each context factor can be obtained by asymptotic theory. For a detailed description on inference techniques, see Fahrmeir and Tutz (2001) and Tutz (2012). It is well known that the covariance of the estimates is asymptotically given by the expected Fisher matrix Fahrmeir and Tutz (2001). This allows us to use Wald tests Wald (1943) for the null hypotheses against . A Wald test is a classical approach to hypothesis testing of coefficients in a regression model. The test provides a p

-value and a test statistic for differences in the coefficients for different context factors. Small sample sizes for certain context factors may lead to larger confidence intervals when the Wald test is applied and thus increase the likelihood of

p-values above the significance level. In Section 4

, we give the results of the Wald tests based on significance (type I error) level

.
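As a sketch, the Wald statistics and p-values can be read off the model summary; the by-hand computation below shows the underlying statistic, assuming the usual coefficient matrix layout of VGAM's summary output.

```r
## summary() reports, per coefficient: estimate, standard error,
## Wald z value, and asymptotic two-sided p-value.
summary(fit)

## The same Wald test for H0: beta_k = 0 vs. H1: beta_k != 0, by hand:
est <- coef(summary(fit))                  # coefficient matrix
z   <- est[, "Estimate"] / est[, "Std. Error"]
p   <- 2 * pnorm(-abs(z))                  # compare against level alpha
```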

3.5 Validity Procedures

To strengthen the confidence in our results, we performed a few validity procedures in advance. To ensure that our respondents are really practitioners, we explicitly stated in the introductory text of the survey that it is aimed at practitioners. In addition, we removed those respondents from the population who stated that they do not deal with requirements.

To lower the threat of biased answers of our respondents, we conducted the survey anonymously and asked additional questions to characterize the context of the respondents.

To lower the threat that respondents misunderstand or misinterpret particular questions in the questionnaire, we conducted a pilot phase with three practitioners in which we tested and improved the instrument used, and also evaluated the envisioned data analysis techniques based on the pilot data (which we deleted again prior to starting the survey).

4 Study Results

In total, 283 people followed the link to our online survey, 172 started the survey (61%), and 109 completed it (39%). From these 109 respondents, we excluded 6 as they matched our exclusion criteria (no knowledge about how requirements are handled in their company). The survey answers are also available as a .csv file in our replication package Vogelsang et al. (2018).

After reporting the demographics, we provide the following results:

  1. An overview of the general agreement for each research statement (Section 4.2 and especially Figure 3).

  2. An overview of the significant context factors for each research statement (Section 4.3 and especially Table 2).

  3. A visual overview of the significant impacts on research statements for each context factor (Section 4.4 and especially Figures 5 and 4).

  4. An aggregated view of the research statements with respect to consensus and mean agreement (Section 4.5 and especially Figure 6).

4.1 Demographics

Figure 2 shows an overview over our study population. The figures show that our respondents cover a wide spectrum of context factors.

Figure 2: Overview of the study population, with panels (a) Roles, (b) Sector, (c) Company Size, (d) Project Size, (e) Process Paradigm, (f) Experience, (g) Type of System, and (h) Geographical Distribution.

4.2 General Agreement with Research Statements

Figure 3 provides an overview of the answer distribution for each research statement. For each statement, the left side shows the distribution from strongly agree (dark gray) to strongly disagree (light gray), centered around the neutral category. The right side shows the total number of respondents who answered the question (the dark gray bar indicates the percentage of don't know answers). On average, 99 (median of 101) respondents answered each question, with a minimum of 87 (S4) and a maximum of 103 (G3, S1, S3, T2).

Figure 3: Distribution of answers for all research statements. Left: from strongly agree (dark gray) to strongly disagree (light gray); additionally, the total percentage of agreement (disagreement) is shown on the left (right) side of the plot. Right: the number of total answers in gray and the number of don't know answers in dark gray.

4.3 Impact of Context Factors on the Level of Agreement

Table 2 shows an overview of the significant context factors and the corresponding estimated regression coefficients for each research statement. The table only contains context factors for which the regression model indicates a statistically significant impact on the level of agreement (p-value below the significance level $\alpha$). A row in this table can be read as follows: For the research statement stated in the first column (e.g., G1), respondents belonging to one of the groups listed in the second column tend to agree more with this statement (e.g., respondents using an agile process paradigm, respondents who stated that maintainability is important, and respondents working in medium-size projects). Respondents belonging to one of the groups listed in the right-hand column tend to disagree more with this statement (e.g., respondents working in small projects or respondents who stated that portability is important). The change factor provided in brackets after each context factor represents the factor by which the odds of agreeing or disagreeing more change if this context factor changes (see also Section 3.4.1).
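To make the change factors concrete, here is a small worked example with an illustrative baseline (the value 0.5 is assumed, not taken from our data): suppose a baseline respondent has $P(Y \le 2) = 0.5$ for statement G1, i.e., cumulative odds of 1. For an otherwise identical respondent in an agile process paradigm (change factor 5.1), the cumulative odds become 5.1, and hence

$$P(Y \le 2 \mid \text{agile}) = \frac{5.1}{1 + 5.1} \approx 0.84,$$

a substantially higher probability of (strongly) agreeing.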

RS | Tendency to agree more | Tendency to disagree more
G1 | Agile process paradigm (5.1), Maintainability important (4.0), Medium projects (3.1) | Portability important (2.1), Small projects (6.8)
G2 | - | -
G3 | Consumer SW (53.1), Automation sector (15.9), Medium projects (2.8) | Compatibility important (2.4), Usability important (3.1), Small projects (5.6), Embedded systems (13.7)
G4 | Healthcare sector (11.5), Non-distributed projects (5.2), Small projects (4.6) | -
G5 | Automotive sector (15.2), Testers (8.3), Managers (3.9), Nat. distributed projects (3.3), Compatibility important (2.3) | Requirements engineers (4.5), Small companies (4.5)
E1 | Performance/Efficiency important (3.7) | Managers (5.3), Nat. distributed projects (6.8), Railway sector (75.8)
E2 | Healthcare sector (9.0), Small companies (7.7), Performance/Efficiency important (2.8), Medium projects (2.8), Maintainability important (2.8) | Funct. suitability important (2.7), Reliability important (3.4), Small projects (6.6), Testers (6.9), Avionics sector (26.7), Railway sector (41.2)
E3 | Architects (24.9), Healthcare sector (7.3), Automotive sector (6.6), Inhouse dev. (3.2), Performance/Efficiency important (3.2) | Requirements engineers (4.0)
S1 | Architects (7.6) | Medium companies (3.5), Embedded systems (6.8), Railway sector (70.9)
S2 | - | -
S3 | Automation sector (15.8) | Automotive sector (6.2), Documented QRs (29.0)
S4 | Developers (104.2), Telecommunication sector (24.8), Performance/Efficiency important (3.3) | Portability important (2.2), Reliability important (5.0), Business information systems (5.7), Requirements engineers (6.8), Funct. suitability important (8.7), Automation sector (22.0), Railway sector (69.6)
S5 | Agile process paradigm (6.7), Embedded systems (5.7), Architects (4.4), Mixed process paradigm (1.9) | Portability important (2.0), Plan-driven process paradigm (4.4), Business information systems (6.6)
S6 | Performance/Efficiency important (3.2) | -
T1 | Healthcare sector (12.4), Automotive sector (12.2), Business information systems (4.6) | Portability important (2.3), Consumer SW (66.7), Railway sector (261.6)
T2 | - | Telecommunication sector (9.3)
T3 | Automotive sector (6.6), Managers (6.4), Business information systems (5.3), Agile process paradigm (3.5) | Non-distributed projects (6.9), Finance sector (14.5), Seniors (17.7)
T4 | Avionics sector (59.4), Agile process paradigm (8.3), Architects (8.3) | Portability important (2.6), Testers (6.8)
T5 | Non-distributed projects (6.6), Inhouse dev. (3.1), Security important (3.1) | Small companies (7.6), Seniors (27.6)
M1 | Architects (5.7) | Usability important (2.5), Testers (13.0), Seniors (17.8)
M2 | Consumer SW (120.7), Reliability important (14.5), Medium companies (8.0) | Small companies (96.9)
Table 2: Relation between context factors and level of agreement for each research statement. The change factor (given in brackets) indicates the factor by which the agreement/disagreement changes if this context factor is changed by one unit, while all the others remain constant.

4.4 Context Factor Analysis

To better understand the influence of context factors on agreement or disagreement, we also report the results of our regression model along the significant context factors. In particular, we create a graph that contains, as nodes, all research statements (G1-G5, E1-E3, S1-S6, T1-T5, M1-M2) and all context factors that show at least one significant influence in our regression model. For each context factor that shows a correlation with agreement or disagreement with a research statement, we add an edge to the graph, colored green (positive) or red (negative). Moreover, we weight the width of each edge with the value of the regression coefficient. Since the whole graph with all statements and context factors is rather large, Figures 4 and 5 show the subgraphs for each group of context factors. (The gephi file of the whole graph is part of our additional material package Vogelsang et al. (2018).)

For example, Figure 4(a) shows the relationship between the role of a participant and the research statements: Testers tend to disagree more with research statements M1, E2, and T4 and tend to agree more with research statement G5.

Figure 4: Context factor analysis (1/2), with panels (a) Role, (b) Agility, (c) Company size, (d) Experience, (e) QR documentation, (f) Company distribution, and (g) Domain. The research statements are depicted as gray nodes and the context factors as colored nodes. Edges indicate a tendency to agree more (green) or to disagree more (red). The width of an edge indicates the value of the regression coefficient.

Figure 5: Context factor analysis (2/2), with panels (a) Project Size, (b) SRS role, (c) QR Importance, and (d) Sector. The research statements are depicted as gray nodes and the context factors as colored nodes. Edges indicate a tendency to agree more (green) or to disagree more (red). The width of an edge indicates the value of the regression coefficient.

4.5 Summary and Categorization of Results

In Figure 6, we have plotted the research statements with respect to their mean level of agreement and their consensus value, which is a measure of dispersion for answers on Likert scales Tastle and Wierman (2007). (We are aware that using the mean as a measure of central tendency for Likert, i.e., ordinal, scales is something to be careful with. Therefore, we refrain from interpreting the mean value itself and use it only to order the statements with respect to their level of agreement. In addition, we have labeled the statements with respect to their median value.) We divide this plot into four areas: We consider statements with a high level of agreement and a high level of consensus as Commonalities (between academic statements and practitioners' perception). In contrast, we consider statements with a low level of agreement and a high level of consensus as Differences (between academic statements and practitioners' perception). Research statements with a low level of consensus are more interesting for research. If the general level of agreement is high but the consensus in the answers is low, there is a need for follow-up studies to investigate why the statements may not be true in certain areas (i.e., why some specific respondents disagreed with the statements). If the general level of agreement is low and there is also a low level of consensus in the answers, there is a need to investigate the context of the original studies in more detail. It might be the case that the statement was made in a particular context that does not generalize to other contexts.

Figure 6: Classification of research statements into four areas based on consensus and mean agreement.
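For reference, the consensus measure of Tastle and Wierman can be computed as sketched below in R; this is our reading of the cited formula (answers coded 1 = strongly agree to 5 = strongly disagree), not code from the replication package.

```r
## Consensus Cns(X) = 1 + sum_i p_i * log2(1 - |x_i - mean(X)| / width),
## following Tastle and Wierman (2007): 1 = full consensus, 0 = maximal
## dissent (answers evenly split between the two extremes).
consensus <- function(answers, scale = 1:5) {
  p     <- as.numeric(table(factor(answers, levels = scale))) / length(answers)
  mu    <- mean(answers)
  width <- max(scale) - min(scale)   # width of the Likert scale
  keep  <- p > 0                     # avoid 0 * log2(0) = NaN
  1 + sum(p[keep] * log2(1 - abs(scale[keep] - mu) / width))
}

consensus(c(1, 1, 2, 2, 3))  # clustered answers -> high consensus
consensus(c(1, 1, 5, 5, 3))  # polarized answers -> low consensus
```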

In Table 3, we list the four categories with their associated statements. In the table, we list only the statements with a non-neutral median agreement.

Commonalities
G1: The application domain strongly influences the relevance of individual types of QRs
T4: Testing QRs is time-consuming
M1: QRs are often not sufficiently prioritized
S2: The documentation of QRs is not always precise
M2: Software architects do not use a specific tool for QR management
G3: QRs are sometimes ignored
Differences
T5: Testing QRs is impossible
S5: Functional requirements are often labeled as QRs
Need for follow-up studies
S3: QRs are often described in too vague terms
T1: Only few types of QRs are validated at the end of the project
T3: Most QR types are difficult to test properly
E1: In requirements elicitation, the focus is on FRs, not on QRs
E2: Many QRs remain undiscovered
Need for contextualization
G5: Only few QRs deal with architectural aspects
Table 3: Summary of research statements and categories

4.5.1 Commonalities

We identified six statements with a generally high level of agreement among the participating practitioners. Those statements include G1 ("The application domain strongly influences the relevance of individual types of QRs"), T4 ("Testing QRs is time-consuming"), M1 ("QRs are often not sufficiently prioritized"), S2 ("The documentation of QRs is not always precise"), M2 ("Software architects do not use a specific tool for QR management"), and G3 ("QRs are sometimes ignored"). These statements are not only commonly agreed upon between academics and practitioners with a high level of consensus; they also seem to be commonly agreed upon within the academic research community itself, as we found only one publication that argues against any of these statements (M1). For all other statements, we only found publications supporting them (see Table 1).

4.5.2 Differences

We identified two statements with a low level of agreement and a high level of consensus, i.e., these statements are rejected by practitioners in general. This area includes T5 ("Testing QRs is impossible") and S5 ("Functional requirements are often labeled as QRs"). The general disagreement with statement S5 is interesting since we found three independent publications Eckhardt et al. (2016a); Ernst and Mylopoulos (2010); Rahimi et al. (2014) in which this statement was made (see Table 1).

One explanation we have for the strong disagreement with statement T5 is, not surprisingly, its universal nature. It is reasonable to assume that our respondents have encountered at least one situation where this statement does not hold, especially when considering quality requirements for which testing procedures have, in fact, been adopted in one form or the other; for instance, because of their functional nature that allows for the direct adoption of existing testing procedures (e.g., performance-related requirements), but also because other QR classes are often in scope of dedicated engineering activities (e.g., usability-related requirements being in scope of, for example, prototypes and mock-ups).

4.5.3 Need for Follow-up Studies

Statements for which we identified the need to conduct follow-up studies are those with a high level of agreement that, at the same time, show a low level of consensus among the responding practitioners. For example, the statement E1 ("In requirements elicitation, the focus is on FRs, not on QRs") has a high level of agreement in general but a low level of consensus. In terms of related publications, this statement has a strong standing, with six papers supporting the statement and none opposing it (see Table 1). However, respondents from the railway sector, for instance, strongly objected to this statement (see Table 2). Therefore, it may be interesting to investigate this domain in more detail to find out why the statement seems not to be true there. We see similar trends for statements S3, T3, and E2. For all of these statements, we found only publications supporting the statements; however, Table 2 shows certain groups of practitioners who disagree with them (e.g., the automotive sector for S3, seniors for T3, or the railway sector for E1 and E2).

4.5.4 Need for Contextualization

We identified only statement G5 ("Only few QRs deal with architectural aspects") in this particular area. It is the only statement with a low level of agreement among practitioners in general but also with a low level of consensus in the answers. That could mean that the statement was made in a specific context where it may hold, while in many other contexts it is perceived as not true. This is in tune with the balanced number of supporting and opposing publications related to this statement (see Table 1).

5 Discussion

In the following, we discuss the results that we found particularly interesting and contrast them with the original studies where they were stated.

Please note that we do not discuss the results of every research statement in detail, as this would be purely speculative. Hence, the presented discussion can only serve as a starting point for further studies examining the discussed aspects, because our study does not provide any data on the specific reasons why practitioners agree or disagree with a research statement.

5.1 In Elicitation, Priority is on FRs, not on QRs

Most of our respondents confirmed that in requirements elicitation, the focus is on FRs, not on QRs (E1). Respondents from the railway domain and from large, distributed projects are an exception; they tend to disagree more with this statement. Nevertheless, most respondents confirmed that QRs are at least documented in the end: they disagreed with the statement that QRs are not documented at all (S1). This is also in line with observations from another survey, where 88% of respondents answered that they document QRs Eckhardt et al. (2016b). As a result of this lower priority of QRs in elicitation, our respondents stated that many QRs remain undiscovered (E2) and that only few types of QRs are validated at the end of a project (T1). Interestingly, respondents from the railway sector, who stated that the elicitation does not necessarily focus on FRs only, also disagreed with the negative consequences of undiscovered and unvalidated QRs. Further studies are necessary to discover a possible causal relation between these effects. In general, practitioners seem to focus on FRs in requirements elicitation and handle QRs with lower priority.

5.2 Different Roles have Diverging Opinions about QRs

As shown in Figure 4(a), respondents with different roles have different opinions (regarding agreement/disagreement) on the research statements. Specifically, requirements engineers and architects seem to disagree on the responsibility for QR elicitation. While architects largely agreed that QR elicitation is mainly done by them (E3), requirements engineers tend to disagree with this statement. A similar divergence exists between requirements engineers and developers when it comes to the documentation of QRs (S4). It seems that developers perceive QR documentation more often as not in sync with the current state of the system, while requirements engineers do not perceive this threat as strongly.

Testing QRs is perceived as time-consuming especially by architects. Testers, on the other hand, largely disagreed with statement T4 (Testing QRs is time-consuming). This seems an interesting point for further research into which specific tests architects have in mind.

The statement that only few QRs deal with architectural aspects (G5) is interesting since the architects themselves do not have a strong opinion for or against it, while all other roles do. Testers and managers agree more with the statement, while requirements engineers disagree more.

5.3 Are QRs really Non-Functional?

Research statements G2 and S5 belong to the statements with which the respondents disagreed most (39% and 59% disagreement). Both statements address the confusion about the classification of requirements: G2 states that many QRs describe functional aspects of a system, and S5 states that functional requirements are often labeled as QRs. Both sound counterintuitive, which may explain why respondents had a tendency to disagree with them. The statements originate from a paper in which the authors investigated requirements specifications from industry Eckhardt et al. (2016a). In the documents they analyzed, they found a considerable number of requirements that were labeled as QR but actually described functionality. The disagreement with the statements shows that the results of that study are controversial. Therefore, we need more studies that investigate the effects of blurry classification rules.

5.4 Testing QRs is hard but not impossible

In general, our respondents were not as pessimistic about the possibility to test QRs as the original study suggests. In fact, 86% disagreed with the respective statement T5 (Testing QRs is impossible). This shows that our respondents at least seem to have some ideas of how to test QRs. Especially experienced respondents strongly disagreed with this statement, as did respondents from large companies. It would be interesting to see what impact the company size has on testing procedures. Respondents who tended to agree more with the impossibility of testing QRs worked in non-distributed projects or considered security as an important quality attribute for their systems. This may indicate that we still lack good testing mechanisms for security.

On the other hand, our respondents support the statements that testing QRs is difficult (T3) and time-consuming (T4). 55% agreed or even strongly agreed that testing QRs is difficult. Only 8% disagreed with the statement that testing QRs is time-consuming, while 18% even strongly agreed with it. Especially in the avionics and automotive industry, testing seems to be a big issue, since respondents from these industries predominantly agreed with T3 and T4. Also, respondents who stated that they work with an agile process paradigm agreed significantly more strongly with the difficulty of, and the effort necessary for, testing QRs. The only group of respondents that tends to disagree more with the statement that testing is time-consuming are the testers themselves. It seems that testers have a more optimistic perception of testing QRs.

In the original publication of statements T4 and T5, the authors stated a refined version of the statements: “[…] when expressed in non-measurable terms [QR] testing is time-consuming or even impossible” Borg et al. (2003). In our survey, this premise of a measurable specification of QRs does not appear to drive the agreement: the context factors that correlate with strong agreement with statements T4 and T5 do not correlate with the research statements that relate to a precise documentation of QRs (e.g., S2, S3).

5.5 Different Domains, Different Problems

For a number of research statements, the industry sector of the respondents is strongly correlated with the participants' agreement with the statement. This is further corroborated by research statement G1 (The application domain strongly influences the relevance of individual types of QRs.) showing the strongest agreement among the respondents. This means that the perceived importance, the handling, and the problems associated with QRs strongly depend on the industry sector. For instance, while respondents from the healthcare and automotive sector strongly agreed with a number of research statements (see Figure (g)), the respondents from the railway sector strongly disagreed with a number of them: research statement T1 (Only few types of QRs are validated at the end of the project.) seems to be strongly accepted in the healthcare and automotive domain but not in the railway domain. Similarly, research statement E2 (Many QRs remain undiscovered) was well received by respondents from healthcare but opposed by respondents from avionics or railway. This could indicate a different relevance of QRs in the respective domains. Railway, healthcare, and automotive may be interesting domains for conducting further case studies on the handling of QRs because respondents from these areas had strong positive or negative feelings about the research statements.

5.6 QRs and Architecture: A Love-Hate Relationship

Many publications stress the importance of QRs for architectural decisions. The participants' answers support this relation. The statement that only few QRs deal with architectural aspects (G5) was rejected by the majority of respondents (56%), especially by respondents from small companies and by requirements engineers. On the other hand, some groups of respondents agreed significantly more with this statement, namely respondents from the automotive industry, testers, and managers. It would be interesting to investigate what these groups see in QRs besides architectural aspects. In the original publication of that statement Eckhardt et al. (2016a), the authors reported that a large number of requirements they found in the QR sections of industrial requirements documents actually described functionality.

Despite this important relation between QRs and architecture, the respondents mostly agreed that a common terminology for QRs among architects is lacking (G4) and that there is no specific tool support for managing QRs (M2). Yet, these issues do not seem to be as prevalent in small projects or companies, as respondents from these groups disagreed more with statements G4 and M2.

5.7 The Perceived Importance of Quality Attributes Shapes the Opinion

In the results, the agreement tendency also correlates with the perceived importance of QR types for the systems that the respondents develop.

Respondents who considered performance/efficiency an important quality of their systems also had distinct opinions about the elicitation and specification of QRs. For example, these respondents predominantly agreed with the statement that elicitation focuses on FRs and not on QRs (E1). At the same time, they also tend to agree more with the statement that many QRs remain undiscovered (E2). Additionally, they disagreed that QRs are mainly elicited by architects (E3). The authors of another study on a similar topic mention that companies have specialized teams focusing on performance or reliability Eckhardt et al. (2016b). This could indicate that QR elicitation is mainly done by such specialized teams. A downside could be that the documentation of QRs becomes out of sync with the system, a statement with which our respondents with a focus on performance also predominantly agreed (S4).

The opinion of respondents who considered portability an important quality of their systems deviated several times from the average opinion of all respondents. They predominantly disagreed with the influence of the application domain (G1). Interestingly, respondents who considered maintainability an important quality attribute predominantly agreed with the influence of the application domain. The study of Jung et al. (2004) provides indications that portability and maintainability actually measure the same intrinsic concept. However, that study was only performed in the context of one specific system. Our study indicates that there may still be a difference between the two characteristics in some application domains. Regarding the specification of QRs, respondents who considered portability an important quality predominantly disagreed with the statement that functional requirements are often labeled as QRs (S5) and with the possible desynchronization of QR documentation (S4). Regarding testing, they disagreed that testing is time-consuming and that only few types of QRs are validated at the end of a project. This could indicate that the testing of systems with high demands on portability is actually in good shape.

5.8 Experience Fosters Optimism

When focusing on the group of respondents who are already experienced in their field (more than 3 years of experience), we see that they disagree more with some of the negative statements about QRs. For example, seniors are not as pessimistic about the impossibility of testing QRs (T5), the difficulty of testing QRs (T3), and the insufficient prioritization of QRs (M1). These results may indicate that testing and prioritizing QRs are hard and need practice to be done properly. On the other hand, this may also indicate that we currently lack sufficient methods and tools for these activities, a gap that seniors may compensate for with experience and best practices.

A second interpretation of these results is that the differences stem from a changing perception of the related risks over time. Novices may still strive for perfection (e.g., 100% test coverage) and therefore perceive testing and prioritizing QRs as overly hard. Seniors, on the other hand, may have experienced that small deviations from a perfect solution can be compensated for and thus perceive some of the challenges related to QRs as less dramatic.

5.9 Scale Changes the View on QRs

Working in small projects and small companies is different from working in large and distributed projects. Our results show that this difference also correlates with the perception of some QR statements. Regarding project size, we found three statements (G1, G3, and E2) with which respondents from small projects predominantly disagreed while respondents from medium and large projects predominantly agreed. In small projects, the influence of the application domain on the relevance of QR types is perceived as weaker than in larger projects (G1). Additionally, it seems that some problems related to QRs become more serious in larger projects: respondents from medium and large projects predominantly agreed that many QRs remain undiscovered (E2) and that QRs are sometimes ignored (G3), while respondents from small projects predominantly disagreed with these statements.

Interestingly, the effects of differences in project size are not in tune with those of differences in company size. For company size, only research statement M2 shows a tendency: respondents from small companies disagreed more strongly with the statement that architects do not use a specific tool for QR management, while respondents from medium and large companies predominantly agreed with it. For statement E2 (Many QRs remain undiscovered), we even got a counterintuitive result: respondents from small projects predominantly disagreed with the statement, while respondents from small companies predominantly agreed with it. This means that undiscovered QRs do not seem to be much of a problem in small projects, whereas they are in small companies.

6 Threats to Validity

Our results and conclusions are subject to a number of threats that we discuss in the following:

6.1 Identification of Research Statements

The identification of the research statements as described in Section 3.1 was not done as part of a systematic literature study. One of the authors browsed the papers of major recent requirements engineering and software engineering venues to first identify papers on quality requirements and then extract statements from them. Since we validated and discussed the extracted statements within the team of authors, we are confident that the analyzed statements fit the study purpose; however, we do not claim that the list of research statements is complete or representative of all views in the research community. In fact, we have to assume that there are other research statements on QRs that we have not addressed in our study. Therefore, we refrain from making any statements about the completeness of research statements on QRs in general. Nevertheless, we argue that this does not invalidate the results we gathered for the research statements in scope: there is little reason to believe that respondents would have assessed these statements differently had the list contained additional ones.

6.2 Participant Selection

One limitation of the study is the lack of control over the respondents, given that we distributed the survey invitation over various networks. Apart from the ultimately unknown response rate, this means that we cannot control how representative the responses are. Despite the validity procedures described in Section 3.5, we cannot guarantee that all the views expressed really originate from practitioners.

When looking at the distribution of context factors, we see that some context factors are represented by only a low number of respondents (e.g., railway domain, junior engineers, or consumer software). This does not invalidate the significance verdicts of the Wald test: smaller sample sizes lead to larger confidence intervals and thus make non-significant results more likely, not less. However, small sample sizes do decrease the statistical power of our results, which in turn lowers the probability that a significant Wald test result reflects a true effect. In other words, the small sample sizes for some context factors do not threaten the validity of the observed significances as such, but they increase the chance that a significant result does not indicate a true effect, because the few respondents with that specific context factor may not be representative of all subjects in that group.
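To make this mechanic concrete, the following R sketch shows how a small subgroup inflates the width of the Wald confidence interval of a regression coefficient. It uses synthetic data only; the subgroup sizes, the baseline rate, and the assumed effect of 1.0 on the log-odds scale are illustrative choices, and a binary logistic model stands in for the ordinal model used in the study:

    # Illustrative sketch: Wald confidence interval width vs. subgroup size.
    # All numbers are assumptions for illustration, not study data.
    set.seed(1)

    wald_ci_width <- function(n_group) {
      x <- rep(c(0, 1), times = c(200, n_group))  # 200 reference respondents, n_group in the rare context
      p <- plogis(-0.5 + 1.0 * x)                 # assumed true effect: +1.0 on the log-odds scale
      y <- rbinom(length(x), 1, p)
      fit <- glm(y ~ x, family = binomial)
      se <- summary(fit)$coefficients["x", "Std. Error"]
      2 * 1.96 * se                               # width of the 95% Wald interval
    }

    mean(replicate(200, wald_ci_width(10)))   # small subgroup: wide intervals
    mean(replicate(200, wald_ci_width(100)))  # larger subgroup: clearly narrower intervals

The wide intervals for the small subgroup make significance harder, not easier, to reach, which is why the significant effects we observed are not invalidated by the small groups as such.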

6.3 Survey Research

Further threats to validity result from the nature of survey research. We cannot control on which basis the respondents provided their answers, and the respondents might be biased. Moreover, respondents may have misinterpreted some of the questions or even the concept of QRs. Some of the research statements that we found in the literature are formulated in a very fuzzy way (e.g., only few QRs deal with architectural aspects; how many are few? what are aspects?). Therefore, respondents may have interpreted the statements differently. To mitigate this, respondents could select “Don’t know” if they did not understand a statement. However, this option was selected in only a few cases (see also Figure 3). Our interpretation is that the respondents had an opinion on most of the statements in their exact wording.

6.4 Data Collection

In the questionnaire, we characterized some context factors by specific boundaries (e.g., large companies are those with more than 2,000 employees; seniors are those respondents with more than 3 years of experience). We set these boundaries based on what we believe is reasonable and what has also been used in other studies. However, the results might have changed had we set the boundaries differently.
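For the two boundaries stated above, the coding amounts to the following R sketch (the variable names years_experience and employees are hypothetical placeholders for the questionnaire fields; only the two cut-offs given in the text are encoded):

    # Dichotomize the two context factors at the boundaries used in the survey.
    # Variable names are placeholders for the questionnaire fields.
    seniority    <- ifelse(years_experience > 3,   "senior", "junior")
    company_size <- ifelse(employees       > 2000, "large",  "not large")

Moving either cut-off (e.g., 5 years instead of 3) would regroup respondents and could change which context factors appear significant.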

6.5 Statistical Model

The regression model applied here is well tailored to the analysis of the research statements measured on a Likert scale, as it exploits the ordinal structure of the answers. An attractive property is the simple interpretation of its parameters, which results from the proportional odds assumption. However, one should be aware that this implies a strong assumption about the underlying data-generating process; that is, it assumes that the interpretation of the parameters does not depend on the category, i.e., the level of agreement (see Section 3.4.1). On the other hand, an extended version of the model in which all (or some of) the parameters are category-specific makes the parameters hard to interpret and, with only 103 respondents, yields unreliable estimates.
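For reference, in one common parameterization the proportional odds (cumulative logit) model for an ordinal answer Y with categories 1, …, J and context-factor vector x reads

    \operatorname{logit} P(Y \le j \mid x) = \theta_j + \beta^\top x, \qquad j = 1, \dots, J-1,

with the slope vector β shared across all cut-points j, so that exp(β_k) is the constant factor by which the odds P(Y ≤ j)/P(Y > j) change per unit of context factor k, independent of j. The following R sketch fits such a model with the VGAM package cited in Section 3.4; the data are synthetic, and the single binary context factor and its effect size are illustrative assumptions, not our actual model specification:

    library(VGAM)

    # Synthetic illustration: 103 Likert answers (1 = strongly disagree,
    # ..., 5 = strongly agree) and one binary context factor.
    set.seed(1)
    n <- 103
    context <- rbinom(n, 1, 0.4)
    answer  <- ordered(cut(rlogis(n, location = 0.8 * context),
                           breaks = c(-Inf, -1, 0, 1, 2, Inf),
                           labels = 1:5))

    # Proportional odds: one 'context' effect shared by all cut-points.
    fit_po <- vglm(answer ~ context, family = cumulative(parallel = TRUE))
    exp(coef(fit_po)["context"])  # odds-ratio interpretation of the shared effect
    summary(fit_po)               # Wald tests for the coefficients

    # Category-specific variant: one parameter per cut-point; with only
    # ~100 respondents these estimates quickly become unreliable.
    fit_cat <- vglm(answer ~ context, family = cumulative(parallel = FALSE))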

6.6 Limitations in the Generalization

Related to the threats described above, and as is often the case with survey-based research, generalization is difficult due to the size of the sample and the diversity of the answers given. The lack of related studies with explicit context factors similar to those captured in our survey further renders generalization by analogy very difficult, if not impossible. However, while the generalization of certain assessments of statements (i.e., the opinions and experiences of practitioners) was not part of our overall objectives, we still argue that we can generalize from our results at least for a few selected context factors on which our respondents commonly agreed in their assessments. More precisely, we are confident that the assessments of the research statements for which we have a high level of agreement and a high level of consensus among the respondents (see the upper right corner in Figure 6) could be generalized, with caution, to those context factors that seemed to play a role. One such example is statement G1. On the other hand, those statements with a rather low consensus indicate that we need, at least, more detailed investigations of the context factors to arrive at a more differentiated view that would allow for generalizations (e.g., G2).
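The consensus referred to here is the ordinal dispersion measure of Tastle and Wierman (2007), which, for answer categories X_1, …, X_n with relative frequencies p_i, mean μ_X, and scale width d_X = X_max − X_min, is defined as

    \mathrm{Cns}(X) = 1 + \sum_{i=1}^{n} p_i \log_2\!\left(1 - \frac{|X_i - \mu_X|}{d_X}\right),

yielding 1 when all respondents choose the same category and 0 when the answers split evenly between the two extremes.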

7 Conclusions

In our study, we aimed to assess how practitioners perceive research statements about quality requirements. Our goal was to identify potential research gaps for specific contexts, differences that call for further investigation, and statements that hold only in certain contexts. To this end, we identified 21 exemplary research statements reported in the academic literature and surveyed practitioners about their general agreement with these statements. We were not only interested in whether practitioners agree or disagree with the statements but also in whether context factors influence this perception. This shall allow us to pinpoint research statements where additional research might be useful.

Our results are based on an online survey yielding 103 responses from practitioners with a broad spectrum of backgrounds. We analyzed the responses by means of a statistical regression model that calculates the factor by which the odds to agree or disagree more change when a context factor changes. Our results show that a majority of the statements is well respected by practitioners; however, not all of them are. When examining the different groups and backgrounds of respondents, we noticed interesting deviations in perception within different groups that may lead to new research questions. Especially the differing perceptions of respondents with different roles may explain why communication and clarification problems about QRs occur in practice. Most respondents perceive testing QRs as time-consuming and difficult, especially in agile contexts and in the automotive and avionics sectors. Testers themselves have a more positive view on testing QRs: they predominantly disagreed with the more pessimistic statements about problems related to QRs. We additionally conclude from our results that it is not reasonable to speak about QRs in general because different types of QRs have very different characteristics and related challenges. The importance of specific types of QRs strongly influenced the respondents' perception of how to elicit, document, test, and manage QRs.

Overall, however, we need to differentiate in our findings between statements with which practitioners commonly agreed with a high level of consensus and statements where the consensus is lower (see Figure 6). While the first category of statements allows for a certain generalization of our findings, the latter is especially interesting for additional research as it highlights the need for better contextualisation. As future work, we therefore plan further investigations in this direction. In particular, we argue for the need for further replications considering more context factors as well as for triangulation with inquiry methods going beyond survey research. Especially case study research is of particular interest to further explore the perceptions of practitioners of the research statements with respect to the particularities of their context.

Finally, one hope we associate with our work is that other researchers are encouraged to describe more explicitly the context in which they conduct their empirical studies and to discuss the conditions and possible limitations of their statements. Otherwise, their conclusions might just contribute further to the existing leprechauns that still dominate requirements engineering research.

Acknowledgements

We thank the respondents of our survey for sharing their opinion with us. We furthermore thank Wolfgang Böhm, Kevin Schlieper and Tobias Mühlbauer for piloting the study and Jan Sürmeli for feedback on earlier versions of this manuscript.

References

  • A. Agresti (2010) Analysis of ordinal categorical data, 2nd edition. Wiley. Cited by: §3.4.
  • D. Ameller, C. Ayala, J. Cabot, and X. Franch (2012) How do software architects consider non-functional requirements: An exploratory study. In 20th IEEE International Requirements Engineering Conference (RE), Cited by: 1st item, §1, §2.1, Table 1.
  • N. D. Anh, D. S. Cruzes, R. Conradi, M. Höst, X. Franch, and C. Ayala (2012) Collaborative resolution of requirements mismatches when adopting open source components. In 18th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ), Cited by: Table 1.
  • A. Borg, A. Yong, P. Carlshamre, and K. Sandahl (2003) The bad conscience of requirements engineering: an investigation in real-world treatment of non-functional requirements. In 3rd Conference on Software Engineering Research and Practice in Sweden (SERPS), Cited by: 3rd item, §1, §2.1, Table 1, §5.4.
  • M. Broy (2018) Rethinking functional requirements: A novel approach categorizing system and software requirements. Cited by: §1, §2.1.
  • L. Chung and B. A. Nixon (1995) Dealing with non-functional requirements: three experimental studies of a process-oriented approach. In 17th International Conference on Software Engineering (ICSE), Cited by: §1, §2.1.
  • P. Clements and L. Bass (2010) Using business goals to inform a software architecture. In 18th IEEE International Requirements Engineering Conference (RE), Cited by: Table 1.
  • D. Damian and J. Chisan (2006) An empirical study of the complex relationships between requirements engineering processes and other processes that lead to payoffs in productivity, quality, and risk management. IEEE Transactions on Software Engineering 32 (7). Cited by: §1.
  • M. Daneva, L. Buglione, and A. Herrmann (2013) Software architects’ experiences of quality requirements: What we know and what we do not know?. In 19th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ), Cited by: Table 1.
  • M. Daneva, S. Marczak, and A. Herrmann (2014) Engineering of quality requirements as perceived by near-shore development centers’ architects in eastern europe: The hole in the whole. In 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Cited by: Table 1.
  • P. Devanbu, T. Zimmermann, and C. Bird (2016) Belief & evidence in empirical software engineering. In 38th International Conference on Software Engineering (ICSE), Cited by: §2.2, §2.2.
  • J. Eckhardt, D. Méndez Fernández, and A. Vogelsang (2015) How to specify non-functional requirements to support seamless modeling? A study design and preliminary results. In 9th International Symposium on Empirical Software Engineering and Measurement (ESEM), Cited by: §1.
  • J. Eckhardt, A. Vogelsang, and D. Méndez Fernández (2016a) Are non-functional requirements really non-functional? An investigation of non-functional requirements in practice. In 38th International Conference on Software Engineering (ICSE), Cited by: 2nd item, §1, §2.1, Table 1, §4.5.2, §5.3, §5.6.
  • J. Eckhardt, A. Vogelsang, and D. Méndez Fernández (2016b) On the distinction of functional and quality requirements in practice. In 17th International Conference on Product-Focused Software Process Improvement (PROFES), Cited by: §1, §2.1, §3.3, §5.1, §5.7.
  • N. A. Ernst and J. Mylopoulos (2010) On the perception of software quality requirements during the project lifecycle. In 16th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ), Cited by: Table 1, §4.5.2.
  • L. Fahrmeir and G. Tutz (2001) Multivariate statistical modelling based on generalized linear models. Springer. Cited by: §3.4.2, §3.4.
  • M. Felderer, A. Beer, J. Ho, and G. Ruhe (2014) Industrial evaluation of the impact of quality-driven release planning. In 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Cited by: Table 1.
  • F. Fotrousi, S. A. Fricker, and M. Fiedler (2014) Quality requirements elicitation based on inquiry of quality-impact relationships. In 22nd IEEE International Requirements Engineering Conference (RE), Cited by: Table 1.
  • X. Franch, D. Méndez, M. Oriol, A. Vogelsang, R. Heldal, E. Knauss, G. H. Travassos, J. C. Carver, O. Dieste, and T. Zimmermann (2017) How do practitioners perceive the relevance of requirements engineering research? An ongoing study. In 25th IEEE International Requirements Engineering Conference (RE), Cited by: §3.1.
  • M. Glinz (2007) On non-functional requirements. In 15th IEEE International Requirements Engineering Conference (RE), Cited by: §1, Figure 1, §2.1.
  • ISO/IEC (2001) ISO/IEC 9126. Software engineering – Product quality. ISO/IEC. Cited by: §2.1.
  • ISO/IEC (2011) Systems and software engineering – Systems and software quality requirements and evaluation (SQuaRE) – System and software quality models. Technical report International Organization for Standardization. Cited by: §1, §3.3.2.
  • H. Jansen (2010) The logic of qualitative survey research and its position in the field of social research methods. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research 11 (2). Cited by: §3.
  • H. Jung, S. Kim, and C. Chung (2004) Measuring software product quality: a survey of ISO/IEC 9126. IEEE Software 21 (5). Cited by: §2.1, §5.7.
  • B. A. Kitchenham and S. L. Pfleeger (2008) Personal opinion surveys. Cited by: §3.2.
  • F. Li, J. Horkoff, J. Mylopoulos, R. S. S. Guizzardi, G. Guizzardi, A. Borgida, and L. Liu (2014) Non-functional requirements as qualities, with a spice of ontology. In 22nd IEEE International Requirements Engineering Conference (RE), Cited by: Table 1.
  • I. Liu and A. Agresti (2005) The analysis of ordered categorical data: An overview and a survey of recent developments. Test 14 (1). Cited by: §3.4.
  • A. Mahmoud (2015) An information theoretic approach for extracting and tracing non-functional requirements. In 23rd IEEE International Requirements Engineering Conference (RE), Cited by: Table 1.
  • D. Méndez Fernández and J.-H. Passoth (2019) Empirical software engineering: From discipline to interdiscipline. Journal of Systems and Software 148. Cited by: §1.
  • D. Méndez Fernández, S. Wagner, M. Kalinowski, M. Felderer, P. Mafra, A. Vetro, T. Conte, M.-T. Christiansson, D. Greer, C. Lassenius, T. Männistö, M. Nayebi, M. Oivo, B. Penzenstadler, D. Pfahl, R. Prikladnicki, G. Ruhe, A. Schekelmann, S. Sen, R. Spinola, J.L. de la Vara, A. Tuzcu, and R. Wieringa (2017) Naming the pain in requirements engineering. Empirical Software Engineering 22 (5). Cited by: §1.
  • M. Mirakhorli, Y. Shin, J. Cleland-Huang, and M. Cinar (2012) A tactic-centric approach for automating traceability of quality concerns. In 34th International Conference on Software Engineering (ICSE), Cited by: Table 1.
  • K. Pohl (2010) Requirements engineering: Fundamentals, principles, and techniques. Springer. Cited by: §2.1, §2.1.
  • E. R. Poort, N. Martens, I. van de Weerd, and H. van Vliet (2012) How architects see non-functional requirements: Beware of modifiability. In 18th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ), Cited by: Table 1.
  • R Core Team (2017) R: A language and environment for statistical computing. Cited by: §3.4.
  • M. Rahimi, M. Mirakhorli, and J. Cleland-Huang (2014) Automated extraction and visualization of quality concerns from requirements specifications. In 22nd IEEE International Requirements Engineering Conference (RE), Cited by: Table 1, §4.5.2.
  • A. Rainer, T. Hall, and N. Baddoo (2003) Persuading developers to ’buy into’ software process improvement: a local opinion and empirical evidence. In 2nd International Symposium on Empirical Software Engineering (ISESE), Cited by: §2.2.
  • S. Robertson and J. Robertson (2012) Mastering the requirements process: Getting requirements right. Addison-Wesley. Cited by: §1.
  • I. Sommerville and P. Sawyer (1997) Requirements engineering: a good practice guide. John Wiley & Sons, Inc. Cited by: §1.
  • R. B. Svensson, T. Gorschek, B. Regnell, R. Torkar, A. Shahrokni, R. Feldt, and A. Aurum (2011) Prioritization of quality requirements: State of practice in eleven companies. In 19th IEEE International Requirements Engineering Conference (RE), Cited by: Table 1.
  • R. B. Svensson, T. Gorschek, and B. Regnell (2009) Quality requirements in practice: An interview study in requirements engineering for embedded systems. In 15th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ), Cited by: §1, §2.1, Table 1.
  • W. J. Tastle and M. J. Wierman (2007) Consensus and dissention: A measure of ordinal dispersion. International Journal of Approximate Reasoning 45 (3). Cited by: §4.5.
  • G. Tutz (2012) Regression for Categorical Data. Cambridge University Press. Cited by: §3.4.2.
  • A. Vogelsang, J. Eckhardt, D. Méndez Fernández, and M. Berger (2018) The leprechauns of quality requirements: Beliefs and dispersion in academia and practice – Additional material. Cited by: §1, §3.3, §4, footnote 2.
  • A. Wald (1943) Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society 54 (3). Cited by: §3.4.2.
  • T.W. Yee (2010) The VGAM package for categorical data analysis. Journal of Statistical Software 32 (10). Cited by: §3.4.
  • T. W. Yee (2014) VGAM: vector generalized linear and additive models. Cited by: §3.4.