How do Quantifiers Affect the Quality of Requirements?

02/07/2020 ∙ by Katharina Winter, et al. ∙ Technische Universität München ∙ Berlin Institute of Technology (Technische Universität Berlin)

Context: Requirements quality can have a substantial impact on the effectiveness and efficiency of using requirements artifacts in a development process. Quantifiers such as "at least", "all", or "exactly" are common language constructs used to express requirements. Quantifiers can be formulated by affirmative phrases ("At least") or negative phrases ("Not less than"). Problem: It has long been assumed that negation in quantification negatively affects the readability of requirements; however, empirical research on this topic remains sparse. Principal Idea: In a web-based experiment with 51 participants, we compare the impact of negations and quantifiers on readability in terms of reading effort, reading error rate, and perceived reading difficulty of requirements. Results: For 5 out of 9 quantifiers, our participants performed better on the affirmative phrase compared to the negative phrase. Only for one quantifier was the negative phrase more effective. Contribution: This research focuses on creating an empirical understanding of the effect of language in Requirements Engineering. It furthermore provides concrete advice on how to phrase requirements.




1 Introduction

Requirements are a crucial part of the software development process. However, in contrast to the code making up the software, requirements themselves do not have much direct value for a customer. Femmer and Vogelsang define requirements as “means for a software engineering project” [7]. Thus, bad quality in requirements may result in issues that possibly arise in later stages of the development process leading to a rework of process steps, potentially impacting software code or tests, for example. Indicators of these potential quality issues are named “Requirements Smells” [8], including, for instance, ambiguous words or passive voice. In this paper, we examine the use of specific quantifiers as one particular type of requirements smells. Although the use of quantifiers, such as “at least”, “all”, or “exactly”, is substantial in requirements specifications [2], they have not received much attention in literature so far. Questions on how different use and phrasing of quantifiers affect the quality of requirement artifacts remain unacknowledged. To shed light on this topic, we categorize the quantifiers into different scopes and use this categorization as a theoretical foundation to compare them. Each quantifier scope has one semantic interpretation but can be expressed in different syntactic ways. For example, “At least” and “Not less than”, belong to the same semantic scope but one is expressed in an affirmative syntax, while the other is expressed in negative syntax.

In this paper, we examine 9 different quantifier scopes and compare the impact on requirements readability. We conducted an experiment with 51 participants and compare reading times, error rates, and perceived difficulty of quantifiers in affirmative and negative syntax. The goal of our research is to provide empirical evidence for justifying requirements writing guidelines and offer best practices on quantifier usage in requirements specifications.

Our results show that the use and phrasing of specific quantifiers has a significant effect on reading times, errors, and perceived difficulty. Based on our results, we formulate concrete advice for writing better requirements.

2 Background

2.1 Quantifiers in the English Language

Determiners are frequent parts of speech in the English language. While determiners in general describe what a noun refers to, for instance, “the”, “some”, or “their”, quantifiers represent a subcategory of determiners referring to a certain quantity of the noun. Keenan and Stavi offer an extensive list of natural language determiners, which includes a substantial number of quantifiers [11]. Many quantifiers have similar meaning. As an example, “at least n” or “n or more” include the same set of items with regard to “n”. We categorized the quantifiers according to their semantic scope, i.e. quantifiers of the same category hold true for equal sets. Based on this categorization, we defined 11 scopes, of which two, Some and Many, are ambiguous and thus irrelevant to this paper, which deals with explicit quantifiers. The 9 exact scopes are: None, All, Exactly n, At least, At most, Less than, More than, One and All but, as depicted in Figure 1. Exact quantifiers are numbers, like “one”, “two”, or “exactly a hundred”, which are contained in the scope Exactly n. When speaking of All, every element of a set is included, while None as its counterpart excludes all elements. Some quantifiers are graded: The scope At least is upward entailing, i.e. it includes all values from n upward, while the scope At most is its downward entailing counterpart. The scopes More than and Less than are similar; however, they have open intervals and thus exclude the value “n”. The scope One refers to a certain instance, rather than any set with a certain property. Quantifiers included in this scope are, for example, “the”, or “a”, as in “the object”, or “a group of objects”. The scope All but is the counterpart to this scope, excluding this instance of a set.

Figure 1: Quantifier Scopes

The quantifiers listed by Keenan and Stavi [11] can be classified into these semantic scope categories. From the original set of determiners, indefinite quantifiers, such as “nearly all”, are excluded, and duplicates or similar quantifiers, such as “five or more” and “a hundred or more”, are aggregated into one scope.

Hence, natural language possesses a variety of determiners [9] that can be utilized to express the different scopes of quantification. This presents us the question, whether some determiners are more readable and comprehensible than others with an equivalent semantic scope.
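The count-based scopes can be made concrete as predicates over the number of elements that satisfy a property. The following is a minimal sketch under our own assumptions: the names `SCOPES`, `admitted_counts`, and `PHRASE_TO_SCOPE` are hypothetical, and the instance-based scopes One and All but are left out because they refer to a particular element rather than to a count.

```python
from typing import Callable, Dict, List

# Each count-based scope is a predicate over (k, n, total): k is the number of
# elements satisfying the property, n the stated bound, total the set size.
ScopePredicate = Callable[[int, int, int], bool]

SCOPES: Dict[str, ScopePredicate] = {
    "None":      lambda k, n, total: k == 0,
    "All":       lambda k, n, total: k == total,
    "Exactly n": lambda k, n, total: k == n,
    "At least":  lambda k, n, total: k >= n,  # closed interval, includes n
    "At most":   lambda k, n, total: k <= n,  # closed interval, includes n
    "More than": lambda k, n, total: k > n,   # open interval, excludes n
    "Less than": lambda k, n, total: k < n,   # open interval, excludes n
}

def admitted_counts(scope: str, n: int, total: int) -> List[int]:
    """Return every count k in 0..total for which the scope holds true."""
    return [k for k in range(total + 1) if SCOPES[scope](k, n, total)]

# Two syntactic phrasings of one scope share a predicate and thus a set.
PHRASE_TO_SCOPE = {"at least": "At least", "no less than": "At least"}

print(admitted_counts("At least", 5, 7))
print(admitted_counts("More than", 5, 7))
```

In this reading, an affirmative and a negative phrase belong to the same scope exactly when they map to the same predicate, which is what makes the pairwise comparison in the experiment meaningful.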

2.2 Affirmative and Negative Sentences

Christensen [4] examined the neurobiological implications of affirmative and negative sentences in the brain. The findings suggest that different brain areas are activated when processing affirmative and negative sentences, i.e. sentences containing a negative operator. Moreover, the brain requires more processing time for negation than for affirmation, thus response time is also longer. Performance, however, is suggested to be equal for both types of sentences. According to Christensen, affirmative sentences involve a simpler semantic and syntactic structure than negative sentences [4]. According to their work, negative sentences entail a more complex syntactical structure, which requires “additional syntactic computation” in the brain [4]. Christensen denotes affirmative polarity as “default”, to which negative operators add additional structure. More precisely, when reading negative sentences, the human brain interprets all sentences as affirmative at first and in the second step, adds negative polarity to negative sentences [4].

In the dataset of requirements specifications used as a source for this paper, quantifiers are formulated in both affirmative and negative form. To express the same semantic scope, one can employ positive and negative structures. For example, one could say “at least ten”, or equivalently “no fewer than ten”. Which of these two possibilities is more advisable to use in requirements specifications? Although Christensen has given an indication of the answer to this question, it is also conceivable that negative quantifiers yield longer response times but better reading comprehension.

2.3 Requirements Readability

Requirements artifact quality can be understood as the extent to which properties of the artifact impact activities that are executed based on the artifact. In particular, quality factors affect the effectiveness and efficiency of use [7]. One relevant activity on requirements specifications is reading [1]. Consequently, good quality in practice includes efficient and effective readability of the requirements specifications. We therefore examine the implications of the quality factor quantifiers on effectiveness and efficiency of reading. We understand readability as an indicator for the “ease of understanding or comprehension due to the style of writing” [12]. Readability thus describes reading efficiency and good quality in readability minimizes the reading effort to gain comprehension of the requirements. Reading comprehension indicates effectiveness of reading. When considering readability, the reading performance must not be neglected. Although ease of understanding and sentence comprehension are closely related, good readability that yields a wrong understanding of the phrase is an indicator of bad quality. It is thus required to achieve both, efficiency and effectiveness in requirements specifications.

Objective indicators are one aspect of the assessment of quality in readability. Klare [12] makes a point with the statement: “The reader must be the judge”. Hence, subjective perception should also be considered in regard to readability of requirements specifications. Therefore, we examine the readability, comprehension, and subjective perception of syntactically affirmative and negative quantifiers for a limited set of quantifier scopes.

3 Study Design

In this study, we analyze the impact of affirmative and negative quantifier phrases on readability, comprehension, and perceived difficulty.

Research Question 1: How does affirmative and negative syntax of quantifiers impact reading efficiency?

Research Question 2: How does affirmative and negative syntax of quantifiers impact reading effectiveness?

Research Question 3: How does affirmative and negative syntax of quantifiers impact the subjective perception of reading difficulty?

3.1 Data Collection

We conducted an experiment to gather data on the research questions following the guideline by Wohlin et al. [14]. To assess the differences between affirmative and negative quantifiers, we examine the relationship between quantifier syntax and readability, comprehension, and perceived difficulty. We implemented a web-based experiment, which yields a controllable testing environment and allows for a general evaluation of our hypotheses since the experiment questions are not bound to a certain context and thus do not require prior knowledge on a particular topic. Instead, the web application contains an artificial problem to easily gain first results on the research questions.

3.2 Study Objects and Treatments

Based on the research questions, the independent variable that is controlled in this experiment is the syntactical structure of the quantifying sentences. The two treatments are affirmative and negative syntactical structure. The dependent variables that will be measured in the experiment are the readability, understandability, and subjective perception of difficulty for each treatment.

Wohlin et al. [14] offer a standard design type for such experiments with one factor and two treatments. Leaning on this design type, we aim “to compare the two treatments against each other”. Furthermore, we choose a paired comparison study design, where “each subject uses both treatments on the same object” [14]. We compare the two treatments, affirmative and negative syntactical sentence structure, on sentences addressing the same quantifier scope.

Table 1 lists all samples of quantifiers that are given in the experiment. These samples were made up by us and did not have any specific background or focus. For each quantifier scope, an affirmative quantifier and a negative equivalent is displayed. Note that the scope None is a special case, as it is naturally negative and thus its counterpart is positive. Words in bold are characteristic for the respective syntactic structure.

| Scope | Affirmative Syntax Sample | Negative Syntax Sample |
| --- | --- | --- |
| All | All registered machines must be provided in the database. | No deficit of a machine is not provided in the database |
| None | All access is blocked without a valid login. | None of the service workers may have access to ’Budget’. |
| More than | At more than 5 deficits the signal token turns red. | Not only defective machines are displayed in the system. |
| At least | The number of new parts per order must be at least 3. | A highly defective machine has no less than 5 defects. |
| At most | Per machine, at most 4 photos can be uploaded to the database. | An approved machine has no more than 2 minor defects. |
| Less than | Less than 3 supervisors may be assigned to each service worker. | Not as many supervisors as 3 may be assigned to each machine. |
| Exactly n | Exactly 2 emergency contacts must be displayed at all times. | No more or less than 2 supervisors must be online at all times. |
| All but | All machines but the current one must be on the list ’new jobs’. | No location but the location of the current machine is on the map. |
| One | Only the location of the current machine is on the map. | The current job is the only job that is not listed in ’last jobs’ |

Table 1: Affirmative and negative syntax samples for each quantifier scope.

The task of the experiment was to compare the given sentence with three given situations and decide which of the three situations (one, two, or all three) match the given sentence. The situations are presented as images. Figure 5 depicts one of the 18 answers in the experiment and belongs to the sentence “A highly defective machine has no less than 5 defects”. The images are nearly identical, except for quantification, represented in red crosses in this image. The quantifier scope At least, which is stated here in negative syntax, entails the amounts of {five, six, seven, …} crosses. Thus, the correct answers to select are Image 1 and Image 2, as they are entailed, whereas Image 3 does not accurately describe the sentence.

(a) Image 1 (correct answer)
(b) Image 2 (correct answer)
(c) Image 3 (wrong answer)
Figure 5: Example question from the experiment: Which of the images match the sentence “A highly defective machine has no less than 5 defects”?
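The task can be sketched as a set comparison: a quantifier predicate determines which images match, and an answer counts as correct only if the participant selects exactly that set. This is an illustrative sketch, not the experiment's implementation; the function names and the cross counts per image are hypothetical.

```python
def matching_images(cross_counts, predicate):
    """Return the 1-based indices of images whose cross count satisfies the quantifier."""
    return {i for i, k in enumerate(cross_counts, start=1) if predicate(k)}

def is_understood(selected, cross_counts, predicate):
    """An answer is correct only if the selected set equals the matching set,
    i.e. every included and every excluded image is identified correctly."""
    return set(selected) == matching_images(cross_counts, predicate)

# "no less than 5 defects" belongs to the scope At least with n = 5.
def at_least_5(k):
    return k >= 5

# Hypothetical cross counts for Image 1, Image 2, Image 3 (not taken from the paper).
counts = [5, 7, 3]

print(matching_images(counts, at_least_5))
print(is_understood([1, 2], counts, at_least_5))
print(is_understood([1], counts, at_least_5))
```

Scoring on the full set means that selecting only one of two matching images is treated as an error, the same as selecting a non-matching image.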

As recommended by Wohlin et al. [14], the order of the sentences is randomized to prevent order effects and to obtain a balanced design, such that the subjects’ paths through the experiment are diverse. To further avert information gain from past questions, we not only randomized the order within each sentence pair, but also mixed all sentences. Moreover, we created sentence pairs with identical quantifying scopes and similar but not equal semantic meaning (see Table 1).
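One way to realize such a fully mixed order is to shuffle all 18 sentences independently for each participant. The sketch below only illustrates the idea; the function name, the seeding scheme, and the sentence identifiers are assumptions rather than details of the actual web application.

```python
import random

def presentation_order(sentences, seed=None):
    """Return a new, independently shuffled order of all sentences for one participant."""
    rng = random.Random(seed)  # a per-participant seed keeps the order reproducible
    order = list(sentences)    # copy, so the master list stays untouched
    rng.shuffle(order)
    return order

# Hypothetical identifiers: 9 scopes x 2 syntax variants = 18 sentences.
scopes = ["All", "None", "More than", "At least", "At most",
          "Less than", "Exactly n", "All but", "One"]
sentences = [(scope, syntax) for scope in scopes
             for syntax in ("affirmative", "negative")]

order_for_participant = presentation_order(sentences, seed=42)
```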

3.3 Subject Selection

We selected the subjects for the experiment by convenience sampling [14] via mailing lists, or personal and second-degree contacts of the authors. The experiment was conducted online with anonymous participants. Thus, we had no control over the situation and context in which the experiment was executed by each participant. Our web-based experiment was started by 76 participants, of which 51 completed the experiment. All figures in this paper refer to the 51 participants that completed the experiment. Prior to the experiment, we asked the participants whether they have a background in computer science (yes: 94.1%, no: 5.9%), whether they are native English speakers (yes: 5.9%, no: 94.1%), and what their profession is (academic: 23.5%, professional: 49.0%, student: 27.5%).

3.4 Data Analysis

To answer the proposed research questions, we selected the following metrics.

Readability: To evaluate readability in terms of efficient reading, the effort of reading needs to be measured. Many studies and experiments measure reading time as an indicator for the level of difficulty it requires to process a sentence [13, 5, 10]. Therefore, we also use reading time as an indicator of reading difficulty to examine the effort for a person to understand a sentence. In the experiment, we measured the time that a participant required to read and comprehend the sentence. The counter was started when the sentence appeared on the screen and stopped again when the participant clicked on the button to submit the answer. To examine the differences in reading time between the affirmative and the negative syntax sample for each scope, we applied a Wilcoxon signed-rank test, which is suitable for comparing two paired samples with data that is not normally distributed. As we will see later, the assumption of the Wilcoxon signed-rank test holds, since the reading times in our experiment are not normally distributed. The test returns a p-value to assess the significance of the effect and an effect size to assess the magnitude of the effect.
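For readers unfamiliar with the test, the following is a minimal, dependency-free sketch of a two-sided Wilcoxon signed-rank test using the normal approximation. It is not the R script used in the study. The effect size r = |z| / sqrt(n) is one common choice and an assumption here, since the text does not state which effect size formula was applied; the reading times at the end are made up.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (w_plus, z, p_value, r), where n is the number of non-zero
    paired differences and r = |z| / sqrt(n) is an assumed effect size.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences; tied values receive their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction in this sketch
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail probability
    return w_plus, z, p_value, abs(z) / math.sqrt(n)

# Hypothetical paired reading times in seconds for one scope.
affirmative = [10, 12, 9, 14, 11, 13, 8, 15, 16, 7]
negative = [11, 14, 12, 18, 16, 19, 15, 23, 25, 17]
w_plus, z, p_value, r = wilcoxon_signed_rank(negative, affirmative)
```

For small samples or many ties, an exact or tie-corrected variant from a statistics package is preferable to this approximation.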

Understandability: To measure correctness, we test whether the understanding reflects the true meaning of the sentence or represents a false belief. As discussed in Section 2.1, some quantifiers entail a range of correct solutions. For instance, the quantifier five items or more entails all numbers of items of five and above (i.e. five, six, seven,…). For other quantifiers, like exactly five items, one number, namely five, is the correct quantification, while all other numbers, like four or six items, do not reflect the true meaning of the quantifier. Hence, the three situations presented as answers in the online experiment are independent and include correct as well as incorrect quantifications of the given statement. We consider a sentence as “understood” if all included and excluded options are correctly identified. To examine the differences in correctly understood sentences, we build a 2x2 contingency table containing the number of participants with correct and incorrect answers in affirmative and negative sentences (see Table 2). Since our samples are matched, we focus on the discordant cells in the contingency table (b and c) and apply an exact binomial test to compare the discordant cell b to a binomial distribution with size parameter n = b + c and probability p = 0.5. This test is suggested for 2x2 contingency tables with matched samples and few samples in the discordant cells. As a measure for the effect size, we report the odds ratio of the discordant cells.

| | affirmative syntax: incorrect | affirmative syntax: correct |
| --- | --- | --- |
| negative syntax: incorrect | a | b |
| negative syntax: correct | c | d |

Table 2: 2x2 contingency table of correct and incorrect answers for one scope.
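The test on the discordant cells b and c of Table 2 can be sketched in a few lines. Under the null hypothesis, each discordant pair falls into either cell with probability 0.5. The function names and the example counts are hypothetical, and the odds ratio is taken here as b / c, which is an assumption about the direction in which it is reported.

```python
import math

def exact_binomial_discordant(b, c):
    """Two-sided exact binomial test on the discordant cells b and c.

    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Probability of a split at least as extreme as the observed one, in one tail.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double the tail for a two-sided test

def odds_ratio(b, c):
    """Ratio of the discordant cells; infinite when c is zero."""
    return math.inf if c == 0 else b / c

# Hypothetical counts: 13 participants erred only on the negative sentence,
# 2 participants erred only on the affirmative sentence.
p_value = exact_binomial_discordant(13, 2)
effect = odds_ratio(13, 2)
```

The concordant cells a and d do not enter the test, because participants who answered both sentences correctly or both incorrectly carry no information about a difference between the two treatments.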

Perceived Difficulty: For the determination of perceived difficulty, we asked the participants to rate the reading difficulty on a scale with the values “easy”, “medium”, and “difficult”. We use this ordinal scale as it allows for the assessment of less to greater, where intervals are not equal. The perceived difficulty is subjective and intervals between the options “easy” and “medium”, as well as between “medium” and “difficult” are not necessarily equal. Furthermore, levels of difficulty may differ in between the category itself. To examine the differences in the perceived difficulty, we applied a Wilcoxon signed-rank test, which is suitable for comparing two paired samples with ordinal data.

3.5 Experiment Validity

Prior to starting the experiment and collecting the data, we launched a test run with three participants to receive feedback on the correctness of language, the comprehensibility of the overall experiment, and remaining technical bugs. Although the affirmative and negative syntax samples for each quantifier describe different situations (see Section 3.1), we kept the generated sentences similar by choosing a narrow vocabulary throughout the experiment. The difference in length between the sentences of a pair averages about 1.77 words, where in five cases the affirmative sentence contains more words and in four cases the negative sentence is longer. The sentences have a simple structure, such that other syntactical phenomena, like sentence complexity, do not invalidate the results. On average, the sentences have 11 words. For each sentence, the study subjects have three answer options. To avoid complexity of the answers through, e.g., answer sentences that are difficult to understand, the answer options are displayed as images (see Figure 5). Like the sentences, the images have a similar image vocabulary containing equal symbols and language of form. For each sentence in the experiment, the images have minimal, but distinguishable differences. One or more of these images represent the correct meaning of the sentence given. By providing more than one correct answer, the effect of exclusion by comparison between different images should be avoided and the subject is forced to deal with each answer option separately.

To assure transparency and improve reproducibility, we have published the raw results of the experiment and the R-script that we used for processing the data.

4 Study Results

4.1 Effects on Readability (RQ1)

When examining the collected reading times, we saw that all values were below 77 seconds, except for two data points where the reading times were 665 seconds and 12,281 seconds (both measured for sentences with negative syntax). Since we had no control over the situation in which the experiment was conducted, we consider both data points as outliers, possibly due to a disturbance of the participant, and removed the data points as well as their corresponding affirmative sentences from the dataset. Figure 6 displays boxplots of the remaining reading times for each scope. As shown in the figure, for six of the nine pairs, it took more time on average to read the negative quantifier compared with the positive quantifier of the same scope.
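Because the samples are paired, an outlier has to be removed together with its counterpart. A minimal sketch of this pairwise removal follows; the function name, the use of 77 seconds as an explicit cutoff, and all reading times other than the two outlier values are assumptions for illustration.

```python
def drop_outlier_pairs(pairs, cutoff_seconds):
    """Keep only pairs in which both reading times are at or below the cutoff.

    `pairs` holds (affirmative_time, negative_time) tuples for one scope,
    one tuple per participant, so dropping a tuple removes both measurements.
    """
    return [(aff, neg) for aff, neg in pairs
            if aff <= cutoff_seconds and neg <= cutoff_seconds]

# Hypothetical reading times in seconds; only 665 and 12281 appear in the text.
pairs = [(12.0, 15.5), (9.3, 665.0), (20.1, 18.4), (14.7, 12281.0)]
cleaned = drop_outlier_pairs(pairs, cutoff_seconds=77)
```

Dropping whole pairs keeps the two samples aligned, which the paired significance test requires.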

Figure 6: Distribution of reading times per scope

Table 3 lists the results of the Wilcoxon signed-rank test for each scope in terms of the p-value and effect size for significance level α = 0.05.

| Scope | All | None | More than | At least | At most | Less than | Exactly n | All but | One |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| p-value | .000 | .019 | .077 | .000 | .409 | .001 | .000 | .000 | .000 |
| effect size | .46 | .23 | .18 | .37 | .08 | .32 | .58 | .52 | .52 |

Table 3: Wilcoxon signed-rank test for differences in reading times between affirmative and negative syntax in each scope.

According to the significance test, the following quantifier scopes exhibit a significant difference in reading time: All, None, At least, Less than, Exactly n, All but, and One. Only for the quantifiers At most and More than were we not able to reject the null hypothesis of equal reading times. Among the quantifier scopes, All but and None yield significantly longer reading times for the affirmative quantifier than for the negative, as depicted in Figure 6. In all other cases, affirmative quantifiers perform better than their negative equivalents regarding the average reading time. The effect size values indicate small to moderate effects [6]. An effect size of 0 means that exactly 50% of participants spent less reading time for the affirmative sentence than the mean reading time for the negative case (i.e., there is no difference). A moderate effect size of 0.5 indicates that 69% of participants spent less reading time for the affirmative sentence than the mean reading time for the negative case, while for a large effect size (0.8) this is already true for 79% of participants.
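The percentages of 50%, 69%, and 79% match the standard normal cumulative distribution function evaluated at 0, 0.5, and 0.8. The short sketch below reproduces them under the assumption that the effect size is read as a standardized mean difference between two normal distributions with equal variance; the function name is hypothetical.

```python
import math

def share_beyond_other_mean(d):
    """Standard normal CDF at d: the share of one group that lies on the
    favourable side of the other group's mean when both groups are normal
    with equal variance and their means are d standard deviations apart."""
    return 0.5 * (1 + math.erf(d / math.sqrt(2)))

for d in (0.0, 0.5, 0.8):
    print(f"effect size {d}: {share_beyond_other_mean(d):.0%}")
# prints 50%, 69%, and 79%
```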

4.2 Effects on Comprehension (RQ2)

Figure 7 shows the ratio of incorrect answers per scope. For 6 of the 9 quantifiers, our participants made more errors in the sentence with negative syntax. Only for the quantifier scopes More than, At most, and All but, the participants made more errors in the sentence with affirmative syntax.

Figure 7: Distribution of incorrect answers per scope

Table 4 lists the results of the exact binomial test for each scope in terms of the p-value and the odds ratio as a measure for effect size for significance level α = 0.05.

| Scope | All | None | More than | At least | At most | Less than | Exactly n | All but | One |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| p-value | .007 | 1.0 | .000 | .013 | 1.0 | .017 | .008 | .016 | 1.0 |
| odds ratio | 6.50 | 1.50 | 12.50 | 6.00 | 1.33 | 3.40 | Inf | Inf | 2.00 |

Table 4: Binomial test for differences in error ratio between affirmative and negative syntax in each scope

For 5 of the 9 quantifier scopes, our participants made significantly more errors in the negative sentence. For the scopes All but and More than, our participants made significantly more errors in the affirmative sentences. The odds ratios as a measure for effect size varied between small, moderate, and large effects [3]. An odds ratio of 6.5 for the scope All, for example, means that the chances of incorrectly answering the negative sentence were 6.5 times higher than the chances of answering the affirmative sentence incorrectly.

4.3 Effects on Perceived Difficulty (RQ3)

After each question in the experiment, the participants were confronted with a self-assessment scale on how difficult they perceived the sentences. Answer options were easy, medium, and difficult. Figure 8 depicts the assessments over the participants.

Figure 8: Subjective perception of sentence difficulty per scope

Table 5 lists the results of the Wilcoxon signed-rank test for each scope in terms of the p-value and effect size for significance level α = 0.05.

| Scope | All | None | More than | At least | At most | Less than | Exactly n | All but | One |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| p-value | .000 | .510 | .000 | .000 | .141 | .000 | .000 | .164 | .001 |
| effect size | .55 | .07 | .36 | .51 | .15 | .49 | .52 | .14 | .37 |

Table 5: Wilcoxon signed-rank test for differences in perceived difficulty between affirmative and negative syntax in each scope

Six of the nine quantifier scopes show significant differences in the perceived difficulty. For all of these scopes, the participants perceived the affirmative phrase as easier. The effect size measures for the scopes with significant differences all indicate moderate effects [6]. For the remaining three scopes, the difference in perceived difficulty is not significant.

4.4 Summary of the Results

Figure 9 summarizes the results of the three research questions. The figure shows the scopes and the measured differences with a qualitative evaluation of the effect sizes according to Cohen [6].

Figure 9: Summary of results

Overall, negative quantifiers perform worse in more cases than positive quantifiers, which is clear for the quantifiers All, At least, and Exactly n and also apparent for the quantifier Less than. For other quantifiers, results are neutral, like the quantifier None, which is in a special position, as it is naturally formulated in negative syntax, or At most and One, which exhibit neutral objective measurements, but show tendencies in self-assessment towards differences in subjective difficulty. The quantifier More than was the outlier in the measurements regarding the number of mistakes in the positive sentence. Thus, this result should be treated with care, especially since self-assessment showed clear tendencies that the negative sentence is more difficult. Last but not least, the positive quantifier for the extra scope All but performed worse in two measurements, namely reading time and self-assessment, and only neutral when it came to the number of wrong answers. Hence, it is the only scope that yields a worse overall performance of the positive quantifier.

5 Discussion

5.1 Threats to Validity

For the interpretation of the results, several threats to validity need to be considered.

Construct Validity: Only one specific representative quantifier was stated for each scope in the experiment. Thus, results are inferred from these exact representatives, not the quantifier scopes in general. Using other quantifiers to express a scope may possibly yield different results. In addition, results may depend on the setup of the experiment. As subject area of all sample requirements, we used a software product for machine maintenance. Each quantifier was then embedded in a sentence that was equal for all test subjects and had specific answer options encoded as images. A different use of sentences, images, or other factors, such as the professional background and English proficiency of the subjects, could lead to different results.

Conclusion and Internal Validity: Prior to starting the experiment and collecting the data, we launched a test run with three participants to receive feedback on the correctness of language, the comprehensibility of the overall experiment, and remaining technical bugs. Since we used an experimental design where all participants were faced with all treatments in a random order, we do not expect effects due to a confounding variable. The sample size of 51 subjects is reasonable to draw statistical conclusions. We only asked the participants whether they have a background in computer science, whether they are native English speakers, and what their profession is. We did not analyze the effect of these demographic factors due to the small size of single groups. In addition, we are not able to analyze effects related to further contextual factors of subjects such as experience, closeness to the application domain, or others. We selected the applied statistical tests based on the characteristics of the experiment (e.g., paired samples) and checked the tests’ assumptions (e.g., normal distribution). All elicited measures (reading times, number of errors, and perceived difficulty) are independent from any kind of judgement by the authors. To make the results transparent, we report p-value and effect size. Still, we used an arbitrary, yet common, significance level threshold of α = 0.05 for the statistical tests.

External Validity: Since we used convenience sampling, we cannot claim that our participant group is representative for the group of all people working with requirements. Particularly, participants with a different language background may have more or less difficulties with negative or affirmative syntax. In addition, we used artificial requirements for the treatments. We cannot claim that these are representative for real requirements in the context of each participant.

5.2 Interpretation and Writing Guidelines

Taking the threats to validity into account, we can cautiously interpret these results. We conclude that some quantifiers exhibit better readability and better comprehension when phrased in affirmative syntax. Furthermore, self-perception mostly coincides with readability and comprehension, which might be owed to the fact that longer reading time and the guessing of answers impact the perceived difficulty of the sentences. Nevertheless, even for the quantifier scopes where participants made significantly more errors and spent more reading time with the affirmative syntax, the participants did not perceive the negative phrasing as easier to read (see Figure 9).

An observation that was surprising to us was the high error rate for the scopes More than (affirmative case) and Less than (negative case). As shown in Figure 7, almost 60% of our participants answered the question incorrectly. A deeper analysis of the results showed that, for the sentence “more than x…”, a large number of participants incorrectly selected the answer that showed exactly x instances. This result is mirrored in the negative case of the Less than scope. Apparently, our participants had difficulties with sentences that represent open intervals. Given that the error ratio for the scopes At least and At most is lower, we may conclude that it is better to use formulations that represent closed intervals.

In summary, we draw the following conclusions that can be used as advice for writing requirements that are faster to read, have lower chances of misinterpretation, and are perceived as easier to read:

- Use affirmative syntax for scopes All, At least, Less than, Exactly n, and One:
  - Write All… instead of No…not
  - Write At least… instead of No less than…
  - Write Less than… instead of Not as many as…
  - Write Exactly n… instead of No more or less than n…
  - Write Only … instead of Only…not
- Use negative syntax for the scope None:
  - Write None of… instead of All…without
- Use closed-interval formulation instead of open-interval formulation:
  - Write At least… instead of More than…
  - Write At most… instead of Less than…
- In doubt, use affirmative syntax since it is perceived as easier.
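Several of these guidelines could be checked automatically. The following is a rough sketch of such a check based on regular expressions. The rule list, the names `RULES` and `quantifier_smells`, and the rule for “no more than” (derived from the “in doubt, use affirmative syntax” advice) are our own assumptions, and the patterns cover only a few of the phrasings discussed above.

```python
import re

# (pattern, advice) pairs. More specific phrases come first so that their
# words are not reported a second time by the shorter open-interval rules.
RULES = [
    (r"\bno more or less than\b", "use 'exactly'"),
    (r"\bno (?:less|fewer) than\b", "use 'at least'"),
    (r"\bno more than\b", "use 'at most'"),
    (r"\bnot as many\b", "use 'less than'"),
    (r"\bmore than\b", "prefer the closed interval 'at least' (adjust the bound)"),
    (r"\bless than\b", "prefer the closed interval 'at most' (adjust the bound)"),
]
COMPILED = [(re.compile(pattern, re.IGNORECASE), advice) for pattern, advice in RULES]

def quantifier_smells(requirement):
    """Return (matched phrase, advice) pairs for one requirement sentence."""
    findings = []
    remaining = requirement
    for pattern, advice in COMPILED:
        for match in pattern.finditer(remaining):
            findings.append((match.group(0).lower(), advice))
        # Blank out matched spans so that later, shorter rules skip them.
        remaining = pattern.sub(lambda m: " " * len(m.group(0)), remaining)
    return findings

print(quantifier_smells("A highly defective machine has no less than 5 defects."))
print(quantifier_smells("Exactly 2 emergency contacts must be displayed at all times."))
```

Replacing an open interval with a closed one changes the bound (for whole numbers, “more than 5” becomes “at least 6”), so a tool of this kind can only flag candidates and leave the rewrite to the author.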

5.3 Relation to Existing Evidence

Berry and Kamsties [2] observed that some quantifiers may be dangerous to use in requirements because they create ambiguity. They specifically recommend avoiding indefinite quantifiers, such as “nearly all”, and the quantifier all with a plural noun, because it is not clear whether the corresponding statement applies to each instance separately or to all instances as a whole. In our experiment, the affirmative sentence for the scope All contained the quantifier all with a plural noun. Although 10% of our participants gave an incorrect answer for this sentence, this error rate was not notably higher than in other scopes.

Christensen [4] performed an empirical study on the effect of negative and affirmative statements on response time (i.e., how fast subjects answered questions about the presented statements) and reading performance (i.e., how often the answers were correct). The study found significantly shorter response times for affirmative sentences, as well as lower error rates, although the differences in error rates were not significant. Our results generally corroborate those of Christensen, although there were some scopes with effects in favor of the negative syntax (e.g., All but).

6 Conclusion

In this study, we raised questions on the readability, comprehension, and subjective difficulty of affirmative and negative quantifier formulations in natural language requirements. We designed and conducted a web-based experiment and evaluated its results using reading time as a measure of readability, correctness as a measure of comprehension, and self-assessment as a measure of subjective difficulty. The results show a tendency towards better overall performance of affirmative quantifiers compared to their negative equivalents. This extends and confirms related studies from psycholinguistics. Moreover, our results suggest using quantifiers that represent closed intervals instead of open intervals.

Our results provide first empirical insights into quantifiers in requirements specifications. However, much about this topic remains to be examined. First of all, it remains to be reviewed whether the categorization of quantifiers in this study is sensible or whether other categorizations are also possible. Since we only examined one concrete quantifier formulation for each scope, the results may not generalize to other syntactic representations of the same scope. Future research could thus repeat the experiment with a different set of quantifiers in a different context to validate the results and provide additional evidence for generalizing them. Last but not least, certain quantifiers could be proposed as new requirements smells, and tools may be used to detect these smells to improve the quality of natural language requirements.


References

  • [1] I. Atoum (2020) A novel framework for measuring software quality-in-use based on semantic similarity and sentiment analysis of software reviews. Journal of King Saud University - Computer and Information Sciences 32 (1), pp. 113–125. Cited by: §2.3.
  • [2] D. M. Berry and E. Kamsties (2005) The syntactically dangerous all and plural in specifications. IEEE Software 22 (1), pp. 55–57. Cited by: §1, §5.3.
  • [3] H. Chen, P. Cohen, and S. Chen (2010) How big is a big odds ratio? Interpreting the magnitudes of odds ratios in epidemiological studies. Communications in Statistics - Simulation and Computation 39 (4), pp. 860–864. Cited by: §4.2.
  • [4] K. R. Christensen (2009) Negative and affirmative sentences increase activation in different areas in the brain. Journal of Neurolinguistics 22 (1), pp. 1–17. Cited by: §2.2, §5.3.
  • [5] R. K. Cirilo and D. J. Foss (1980) Text structure and reading time for sentences. Journal of Verbal Learning and Verbal Behavior 19 (1), pp. 96–109. Cited by: §3.4.
  • [6] J. Cohen (2013) Statistical power analysis for the behavioral sciences. Routledge. Cited by: §4.1, §4.3, §4.4.
  • [7] H. Femmer and A. Vogelsang (2019) Requirements quality is quality in use. IEEE Software 36 (3), pp. 83–91. Cited by: §1, §2.3.
  • [8] H. Femmer, D. Méndez Fernández, S. Wagner, and S. Eder (2017) Rapid quality assurance with requirements smells. Journal of Systems and Software 123, pp. 190–213. Cited by: §1.
  • [9] M. Glanzberg (2006) Quantifiers. In The Oxford Handbook of Philosophy of Language, pp. 794–821. Cited by: §2.1.
  • [10] A. C. Graesser, N. L. Hoffman, and L. F. Clark (1980) Structural components of reading time. Journal of Verbal Learning and Verbal Behavior 19 (2), pp. 135–151. Cited by: §3.4.
  • [11] E. L. Keenan and J. Stavi (1986) A semantic characterization of natural language determiners. Linguistics and Philosophy 9 (3), pp. 253–326. Cited by: §2.1, §2.1.
  • [12] G. R. Klare (2000) The measurement of readability: Useful information for communicators. ACM J. Comput. Doc. 24 (3), pp. 107–121. Cited by: §2.3, §2.3.
  • [13] D. G. MacKay (1966) To end ambiguous sentences. Perception & Psychophysics 1 (5), pp. 426–436. Cited by: §3.4.
  • [14] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén (2000) Experimentation in software engineering: An introduction. Kluwer Academic Publishers. Cited by: §3.1, §3.2, §3.2, §3.3.