“Every line of code represents an ethical and moral decision; every bit of data collected, analyzed, and visualized has moral implications.” Grady Booch (Booch, 2014)
Over the last few years, the media has reported many examples of software-intensive systems that intentionally or unintentionally ignored or violated ethical and human values (Galhotra et al., 2017; 1). These systems have sometimes caused irreversible harm to end-users (e.g., loss of life), society (e.g., ignoring or being biased against a particular gender), and the software industry (e.g., damaging the software creator’s reputation). For example, Amazon’s “Prime same-day delivery service” was developed to provide all US citizens an equal and fair shopping experience (Ingold and Soper, 2016). However, the service was found to deny Black neighborhoods that experience. In another recent example, the Facebook AI-based feature recommendation system mistakenly labeled a video of Black men as ‘Primates’ (12; 1).
Human values such as privacy, inclusion, power, fairness, and pleasure are defined as something that is deemed important by an individual, a group of people, or a society (Schwartz, 1992). Many definitions and models of human values have been proposed in social science (Cheng and Fleischmann, 2010). The most well-known and widely used one is Schwartz’s theory of basic human values (Schwartz, 1992, 2012). Schwartz’s theory includes ten universal values (self-direction, stimulation, hedonism, achievement, power, security, conformity, tradition, benevolence, universalism), which were identified through a survey of participants across 80 countries. Whereas ethics refers to moral expectations that all individuals in a society agree upon, values may differ from one person to another (Whittle et al., 2021; Fieser, 2016). In this sense, human values are “enduring beliefs that a specific mode of conduct or end state of existence is personally or socially preferable to an opposite or converse mode of conduct or end state of existence” (Rokeach, 1973).
Human values have been studied extensively in the human-computer interaction field since the late 1980s (Whittle et al., 2021). The software engineering community, however, has only recently begun to develop new practices and techniques or adapt existing ones (e.g., the value-based requirements engineering method (Thew and Sutcliffe, 2018) and the fairness-aware programming technique (Albarghouthi and Vinitsky, 2019)) to operationalize human values in software (e.g., (Perera et al., 2020b; Ferrario et al., 2016; Harbers et al., 2015)). Operationalizing human values in software is defined as “the process of identifying human values and translating them to accessible and concrete concepts so that they can be implemented, validated, verified, and measured in software” (Shahin et al., 2021). The ultimate goal is to develop software systems that better reflect and respect human values. Other lines of research have examined which types of human values developers discuss in GitHub’s issue tracking systems (Nurwidyantoro et al., 2021) and have analyzed app reviews to determine which human values are ignored or violated by apps (Obie et al., 2021a; Shams et al., 2020).
Inspired by (Booch, 2014; Whittle et al., 2021; Gotterbarn et al., 2018), who argue that the code developers write and the decisions they make may have moral, ethical, or values implications, the objective of this study is to understand the human values-sensitive implications of developers’ solutions. We focus on Stack Overflow because it is the most active online platform for developers to share their programming issues and solutions (e.g., a code snippet or a design pattern as a solution). Each Stack Overflow post includes a question and a list of answers posted to the question (Zhang et al., 2019a). Stack Overflow provides mechanisms for users to comment on both questions and answers. This enables them to have further discussions (e.g., discuss the weaknesses of a solution) on the posted questions and answers.
In this study, we analyzed 2000 comments and their associated questions or answers (i.e., 1980 unique questions or answers). We first studied which types of human values in Schwartz’s theory of basic values are violated by Stack Overflow posts by manually analyzing comments. Then, we investigated the reactions of Stack Overflow users to human value violations. Key contributions of the work include:
The first detailed study, to our knowledge, of human values violations in Stack Overflow posts
A manual analysis of 2,000 Stack Overflow comments against the Schwartz human values framework
Identification, using the framework, of 315 comments that raise concerns that their associated posts (313 unique posts) violate human values
A set of recommendations for practitioners and researchers to address human values violations in Stack Overflow posts
We first introduce our research motivation and questions in Section 2. Section 3 provides the background and summarizes the related studies. In Section 4, we describe our data collection process, followed by our findings in Section 5. We reflect on our findings in Section 6. Threats to the validity of our study are reported in Section 7. We conclude the paper in Section 8.
2. Motivation and Research Questions
Posts (questions and answers) in Stack Overflow may come with a number of issues. For example, a proposed solution in an answer might be obsolete (Zhang et al., 2019b) or even incorrect (Zhang et al., 2019a). Stack Overflow provides different mechanisms such as downvoting (43) and commenting (11) to enable Stack Overflow users to indicate the possible issues associated with posts. Stack Overflow recommends that question and answer comments be used to provide constructive criticism, to request clarification for a question or an answer from the poster, or to add relevant information (11). A question owner can also comment under their question, and comments under an answer can be added by the answer owner and the owner of its associated question. Any user with at least 50 reputation points can post comments under any question and answer (11; Zhang et al., 2019a).
Previous research has shown that more than 50% of both hidden and displayed comments are informative and can enhance their associated answers (Zhang et al., 2019a). Zhang et al. (Zhang et al., 2019a) classified comments in Stack Overflow into seven categories, of which comments in the advantage, improvement, weakness, inquiry, and addition categories are considered informative. Hence, we argue that comments (in particular improvement, weakness, and inquiry comments) can be the ideal place to point out problems and weaknesses of a post from a human values perspective, because these types of comments challenge a question or answer.
Figure 1 shows a question (Question ID: 12686545) in Stack Overflow with one of its highly voted answers and some of the comments posted under the answer. As shown in Figure 1, a user posted a comment criticizing the proposed answer because spammers can use it, for example, to send thousands of unsolicited emails to GitHub’s users, upsetting or annoying them. Hence, we argue that this answer has a values implication (e.g., violating the value of hedonism - value item of pleasure). We address the following research questions in this study:
RQ1. Which types of human values are perceived to be violated in Stack Overflow posts?
Motivation. As discussed, comments under a post can be used by Stack Overflow users to point out problems and weaknesses of a post from different perspectives, such as a human values perspective. This research question aims to identify types of values that are violated in Stack Overflow posts, and the frequency with which each value type is violated, by analyzing the claims made in Stack Overflow comments. This question can provide deep insights for Stack Overflow askers and answerers on potential human values-implications of their posts and the Stack Overflow community to develop mechanisms to recognize such posts.
RQ2. How quick are commenters to raise concerns over human values violations?
Motivation. A post (question or answer) might get several comments. Comments under a post are sorted and displayed by their creation time. As discussed earlier, the nature of comments varies from praising a post to criticizing and pointing out possible issues. This research question aims to understand how quickly comments citing human values violations are added to a post and determine their position among all comments under a post.
RQ3. Are posts accused of violating human values downvoted by Stack Overflow users?
Motivation. Apart from the commenting mechanism, the downvoting mechanism in Stack Overflow also enables the community to indicate the posts (questions and answers) with deficiencies, e.g., incorrect answers (43). While voting down questions is free, voting down answers comes with some costs (43). A user (voter) can cast only a limited number of downvotes per day. Voting down an answer costs the post owner two reputation points and the voter one reputation point. Hence, voting down an answer should be done carefully. In this research question, we want to know whether Stack Overflow users vote down posts that violate human values despite the associated costs.
RQ4. How does the original poster react to concerns of potential human values violations?
Motivation. Once human values violations occur in a software system, its creators are expected to react to such violations adequately (e.g., fixing values violations immediately). In this research question, we aim to understand the reactions of Stack Overflow posters (askers and answerers) once issues of potential human values violations are raised concerning their posts. Such reactions can range from completely denying the violation to modifying the post to mitigate the violation.
3. Background and Related Work
3.1. Human Values in Software Engineering
Human values are the guiding principles for what people consider important in life (Cheng and Fleischmann, 2010). Although these principles are often not articulated in formal terminology, they undergird the decisions of technologists and non-technical people alike. Hence, the influence of human values can be detected in people’s preferences, from the choice of end-user applications (Obie et al., 2021b), to the technical design decisions of developers in software engineering projects (Winter et al., 2019).
The study of human values in software engineering (SE) is often based on Schwartz’s theory of basic human values (Perera et al., 2020a). This theory organizes values into 10 broad categories and is established on surveys conducted in multiple countries covering a wide range of ages, genders, occupations, cultural backgrounds, and geography (Schwartz, 1992). Table 1 shows the 10 value categories. The 10 value categories comprise 58 value items, e.g., the value category of benevolence comprises the value items of responsible, helpful, forgiving, honest, loyal, mature love, a spiritual life, meaning in life, and true friendship (cf. (Schwartz, 1992)). Additionally, the theory has been widely accepted and adopted in several areas, including the social sciences, computer science, and software engineering (Perera et al., 2020a).
Recent research on human values in SE has underscored the need for software companies to directly cater to issues of human values in their software development processes (Whittle et al., 2021), as the resulting software artefacts have a direct and indirect impact on end-users and society at large. Hussain et al. (Hussain et al., 2020) argue that the maturity levels of companies in addressing human values may very well depend on their awareness and overall organisational culture. They propose that incorporating human values should be done through the evolution of already established software practices, i.e., adapting existing processes to include human values considerations, e.g., the inclusion of values in the development of personas, rather than through a revolution of the field of SE. Winter et al. (Winter et al., 2018) proposed the values Q-sort method for measuring human values in SE. Applying the values Q-sort method to 12 software engineers shows 3 “software engineer” values prototypes. In a similar study, Shams et al. (Shams et al., 2021) applied the portrait values questionnaire (PVQ) to 193 Bangladeshi female farmers to elicit their values. Their study reported conformity and security as the most important value categories while power, hedonism, and stimulation were the least important for Bangladeshi female farmers.
Other studies have applied indirect approaches by using app reviews as a proxy for eliciting values requirements. Shams et al. (Shams et al., 2020) analysed 1,522 reviews from 29 Bangladeshi agricultural apps to understand both the desired and missing values that should be addressed in the development of such apps. Furthermore, Obie et al. (Obie et al., 2021a) introduced a dictionary-based natural language processing technique for detecting the violation of human values in app reviews. The result of their study showed that 26.5% of the analyzed 22,119 app reviews contained perceived violations of human values by the end-users. In addition, benevolence and self-direction were the most violated categories while conformity and tradition were the least violated categories.
As important steps towards addressing the violation of the values of mobile apps users in society, Obie et al. (Obie et al., 2021a) proposed the mining of values requirements from rich data sources, the alignment of values between stakeholders in SE projects, and the adoption of critical technical practice in mobile SE. A recent study further contends that careful consideration of domain context in the design and application of values instruments should be made during values requirements gathering, as the hierarchy of end-users values may vary depending on the end-users domain context (Obie et al., 2021b).
Furthermore, Mougouei (Mougouei, 2020) proposed a framework for accounting for human values at the level of source code. This framework established a relationship between human values and Android APIs and includes the following aspects: annotating APIs with the relevant human values, inspecting source code to detect potential sources of values violations, and recommending fixes to mitigate the violations. Building on the work of Mougouei (Mougouei, 2020), Li et al. (Li et al., 2021) proffered 6 algorithms for detecting potential violation of values in 6 Android APIs. Their analysis applying these algorithms to 10,000 Android apps shows a correlation between the violation of human values and the presence of viruses in these apps.
As these studies have shown, the reflection, support, and violation of human values may impact individual end-users and society as a whole. The research area of human values in SE is still in its early stages, and more work needs to be done. We present this work to further the discussion of human values in SE from the perspective of software developers as captured in their Stack Overflow posts.
Table 1. The ten value categories in Schwartz’s theory of basic human values.
| Value category | Definition |
|---|---|
| Self-direction | Independent thought and action - choosing, creating, exploring |
| Stimulation | Excitement, novelty, and challenge in life |
| Hedonism | Pleasure or sensuous gratification for oneself |
| Achievement | Personal success through demonstrating competence according to social standards |
| Power | Social status and prestige, control or dominance over people and resources |
| Security | Safety, harmony, and stability of society, of relationships, and of self |
| Conformity | Restraint of actions, inclinations, and impulses likely to upset or harm others and violate social expectations or norms |
| Tradition | Respect, commitment, and acceptance of the customs and ideas that one’s culture or religion provides |
| Benevolence | Preserving and enhancing the welfare of those with whom one is in frequent personal contact |
| Universalism | Understanding, appreciation, tolerance, and protection for the welfare of all people and for nature |
3.2. Mining of Stack Overflow Posts
Question and Answer (Q & A) websites such as Stack Overflow are a rich source of information and provide insights into understanding developers’ behaviours, interactions, and viewpoints on specific topics amongst others (Treude et al., 2011). Several studies have mined Stack Overflow posts and comments to shed light on key areas. For example, Novielli et al. (Novielli et al., 2014) focused on the social aspect of Stack Overflow and showed that the emotional lexicons in technical questions have an impact on the probability of obtaining satisfying responses to questions. Similarly, another study introduced multi-label classifiers for classifying the emotions encapsulated in Stack Overflow posts (Cabrera-Diego et al., 2020). Wang et al. (Wang et al., 2013) analysed 100,000 questions from Stack Overflow to understand developer interactions on the platform. Key findings from their study show that developers are keen on contributing to the community and not just getting their questions answered; developers extend a helping hand to others whether or not their gesture is reciprocated. Similarly, another study found that being prompt and being the first to respond to questions helps quickly build a reputation on Stack Overflow (Bosu et al., 2013).
Other studies on Stack Overflow have focused on specific technical topics, e.g., Bangash et al. (Bangash et al., 2019)
Some studies have relied on the rich dataset from Stack Overflow to develop tools for supporting software development. For instance, Ponzanelli et al. (Ponzanelli et al., 2014) introduced PROMPTER, an Eclipse plugin. Given a context in the IDE, PROMPTER automatically retrieves and analyses relevant discussions from Stack Overflow and then notifies the developer about available help. To support automatic source code documentation, Vassallo et al. (Vassallo et al., 2014) proposed CODES, a tool that extracts candidate method documentation from Stack Overflow discussions and creates Javadoc descriptions from it.
The studies discussed above have been vital in understanding the various themes discussed on Stack Overflow and have also shown Stack Overflow as a rich data source for understanding these varied themes in discussions related to developers and the software development practice. We build on this body of knowledge by investigating developers’ discussion from the important lens of human values and how it affects society.
4. Data Collection
To understand human values violations in Stack Overflow and the possible reactions of its users to such violations, we first needed to identify a sufficient number of posts that have a values implication (e.g., violating a human value) (See Figure 1). In the first data collection step, we executed an SQL SELECT query that drew a random sample from the publicly available Stack Overflow dataset (https://tinyurl.com/4c74uz5n) with Google BigQuery. This dataset, hosted through the Google Cloud Public Dataset Program (https://tinyurl.com/y2x9rzch), is updated weekly by Stack Overflow and contains information about posts, comments, and voting, among other kinds of site activity. The SQL SELECT statement returned a random sample of 10,000 comments and their corresponding posts, of which the first four authors (the analysts) each manually analyzed the first 300. We imported the ID and content of these 300 comments and their corresponding posts into a spreadsheet, which was then shared between the analysts. The spreadsheet included 10 columns to enable the analysts to indicate which of the ten human value categories they judged were violated by the corresponding posts. The analysts then read each comment and its associated post (question or answer) to identify which of the ten value categories in Schwartz’s theory (if any) were violated in the parent post according to the comment.
Once the analysts finished this labeling process, they held several meetings and used a negotiated agreement method (Campbell et al., 2013; Morrissey, 1974) to resolve any disagreements and conflicts. Using the negotiated agreement method, all analysts collaboratively agreed on the label (coding) of an item under review. This approach is particularly useful for addressing reliability issues of codes when there are multiple categories as opposed to a binary category where a Cohen’s Kappa measure would suffice. At the end of this step, we found a very low prevalence of comments (8 of 300 comments) that raised concerns about at least one of Schwartz’s value categories.
In the second step, we designed a query to identify more posts that may potentially contain human values violations. To do so, we developed a list of regex keywords and phrases that are likely to be associated with human values violations concerns, such as “moral”, “ethical”, “human-cent”, and “society”. This list was designed based on our observations from the 300 comments and their corresponding posts labeled in the first step and consulting a dictionary of human values-related keywords and phrases developed in (Obie et al., 2021a) for identifying human values violations in app user reviews. This list was adjusted over several query runs, reducing to 21 regex keywords, and was used to design a SQL SELECT query to identify comments that contained one of the keywords in the list. The SQL SELECT query returned 10144 comments. Note that keywords have also been used in previous studies (e.g., (Zhang et al., 2019b; Obie et al., 2021a)) to fine-tune data collection and minimize false positives. The list of 21 keywords and phrases is available in our replication package (Krishtul et al., ).
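The keyword-based selection described above can be illustrated with a minimal Python sketch. It is not the authors' query: it uses only the four keywords quoted in the text (the full list of 21 is in the replication package), it assumes case-insensitive substring matching, and the sample comments and the names `KEYWORDS` and `matches_keyword` are hypothetical.

```python
import re

# Only the four keywords quoted in the text; the full list of 21 is in the
# replication package and is not reproduced here.
KEYWORDS = ["moral", "ethical", "human-cent", "society"]

# Assumption: case-insensitive substring matching, so "human-cent" also
# matches "human-centred" and "ethical" also matches "unethical".
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def matches_keyword(comment_text: str) -> bool:
    """Return True if the comment contains at least one keyword."""
    return PATTERN.search(comment_text) is not None


# Hypothetical comments keyed by a made-up comment ID.
comments = {
    1: "Not encrypting passwords is not very ethical.",
    2: "Use a callback instead of async: false.",
    3: "A human-centred design would ask for permission first.",
}

selected = [cid for cid, text in comments.items() if matches_keyword(text)]
print(selected)
```

Substring matching keeps recall high at the cost of false positives, which is why the matched comments still need manual labeling.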
In the final step, we randomly selected 2,000 comments from the 10,144 comments, which is well above the sample size required for a confidence level of 99% and a confidence interval of 3%. We use these 2,000 comments and their associated posts (1,980 unique posts) to answer our research questions.
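The adequacy of the 2,000-comment sample can be checked with a short calculation. The sketch below assumes Cochran's sample-size formula with a finite population correction and the most conservative proportion p = 0.5; the text does not state which formula was used, so this is one plausible reading rather than the authors' procedure. The function name `required_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(population: int, confidence: float,
                         margin: float, p: float = 0.5) -> int:
    """Cochran's formula with finite population correction (assumed method)."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Correct for the finite population of keyword-matched comments.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


needed = required_sample_size(population=10144, confidence=0.99, margin=0.03)
print(needed, needed <= 2000)
```

Under these assumptions the required sample is roughly 1,560 comments, so a sample of 2,000 leaves a comfortable margin.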
5.1. RQ1. Which types of human values are perceived to be violated in Stack Overflow posts?
Approach. To identify human values violations in Stack Overflow, we qualitatively analyzed 2,000 comments and their associated posts (1,980 unique questions or answers) collected in the Data Collection section (See Section 4). We used Schwartz’s theory of basic values (Schwartz, 2012), specifically Schwartz’s ten value categories (self-direction, stimulation, hedonism, achievement, power, security, conformity, tradition, benevolence, universalism), as a reference point to identify human values violations in Stack Overflow posts. This decision was made because Schwartz’s theory of basic values is the most widely used and cited values model in social science and software engineering (Obie et al., 2021a; Perera et al., 2020a; Schwartz, 1992). The 10 value categories, in turn, comprise 58 value items. In this study, we focus on the 10 value categories. Note that the treatment of the 58 individual value items is beyond the scope of this work. However, where appropriate, we refer to the relevant value items associated with the value categories to increase the clarity of the results.
We first created a spreadsheet and shared it with all authors. The spreadsheet included 15 columns. The first four columns recorded the Comment ID and the content of the 2000 comments and their associated post ID (question ID or answer ID) and the link to the posts. The next ten columns were the ten value categories. We also added a column called “remark” to allow the analysts to point out what they thought was important about a given comment/post.
The data analysis process was conducted in two steps. In the first step, the first author (the first analyst) followed an iterative process to label the first 1,000 comments and their associated posts. In each iteration, the first analyst selected and labeled approximately 100 comments and their associated posts. The associated posts were examined to thoroughly understand the context, meaning, and rationale behind the comments. The first analyst was asked to indicate whether a comment claimed that its associated post violated at least one human value. If so, they had to specify which of the ten value categories the comment reported as violated and put “1” in the corresponding columns in the spreadsheet. Comments could be labeled with more than one value category. After each iteration, three other authors (the validators) cross-checked the comments labeled in that iteration. In total, 400 comments (out of the 1,000 comments labeled by the first analyst) and their associated posts were cross-checked by the first validator, 400 by the second validator, and the rest by the third validator. This distribution was based on the availability of the validators. A negotiated agreement method (Campbell et al., 2013; Morrissey, 1974) was used to resolve conflicts and disagreements between the first analyst and the validators. The validators had extensive experience in human values and software engineering.
In the next step, the second analyst (the fifth author) labeled the rest of the comments (1,000 comments) and their corresponding posts. The second analyst conducted an iterative labeling process similar to the first analyst for this purpose. Then, the three validators in the previous step and the first analyst (acting as the fourth validator in this step) cross-checked the comments labeled by the second analyst in each iteration. In total, the first and second validators cross-checked 300 comments each, the third validator checked 100 comments, and the fourth validator cross-checked the rest. Similar to the previous step, the second analyst held several meetings with the validators to resolve disagreements and conflicts using the negotiated agreement method (Campbell et al., 2013; Morrissey, 1974).
Results. Our analysis of 2,000 Stack Overflow comments and their associated posts (1,980 unique questions and answers) indicates that 315 comments (15.75%) complained that their corresponding posts (313 unique posts) violated human values (See our replication package (Krishtul et al., )). Of the 10 value categories in Schwartz’s theory, we only found violations related to self-direction, hedonism, security, conformity, tradition, benevolence, and universalism in the 315 comments. Our analysis did not find any violations regarding power, achievement, and stimulation. The vast majority of the comments (270 out of 315) raise concerns that their corresponding post violated the value of hedonism. An example of a comment highlighting the violation of the value of hedonism (value item of pleasure) is:
“Interesting approach and a possible solution, but not very user-friendly. The user won’t see the proper day values when he is moving the slider (unless I write another JS function to take care of that).” Comment ID: 35155704
The value of benevolence is the second most frequently violated value (reported in 41 comments). For example, a Stack Overflow user raised the issue of the violation of benevolence (value item of responsibility) and criticized the irresponsibility of another user because their proposed solution does not care about the sensitivity of information but only about the cost of the solution.
“What I meant was that when it comes to sensitive information, your attitude of “SSL is too expensive” is unethical and irresponsible. When you have sensitive information in your hands, you must do everything you can to secure it. You say that you don’t see its worth in “common business cases”, but “common business cases” often involve sensitive information (addresses, phone numbers, email messages, trade secrets, etc). A business has much to lose in a breach of data integrity.” Comment ID: 4919033
We found only 10 comments in which Stack Overflow users complained that the proposed solutions violate security.
“Still, passwords are private to the person that fills it in during registration. Not encrypting them is not very ethical, but I guess that’s another subject to discuss.” Comment ID: 3334429
Eight comments were found complaining about violating conformity. In the following example, a user criticized the poster as their approach to terminate an app (proposed in a question, Question ID: 3318806) violates Apple’s User Interface standards and guidelines.
“I think your app may get rejected if you terminate it within the app (unless due to unrecoverable error/fault handling). Apple doesn’t like you to mess with their user experience, and pressing the Home button to exit/suspend an app is a big part of that user experience.” Comment ID: 3442304
For self-direction and universalism, we found five and four examples of human values violation, respectively, and two for tradition. For example, the comment below is mapped to violation of the value of self-direction as its associated post violates the freedom and independence of end-users (i.e., freedom and independence are two value items in the value category of self-direction).
“Not to mention it strikes me as ethically dicey at best to grab a user’s location without their permission.” Comment ID: 16973544
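The counts reported for RQ1 can be cross-checked with a few lines of Python. The figures below are taken from the text; the variable names are hypothetical. Because a comment may be labeled with more than one value category, the per-category counts are expected to add up to more than the 315 comments.

```python
TOTAL_COMMENTS = 2000
VIOLATION_COMMENTS = 315

# Per-category counts as reported in the RQ1 results.
category_counts = {
    "hedonism": 270,
    "benevolence": 41,
    "security": 10,
    "conformity": 8,
    "self-direction": 5,
    "universalism": 4,
    "tradition": 2,
}


def share(count: int, total: int) -> float:
    """Percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


overall_share = share(VIOLATION_COMMENTS, TOTAL_COMMENTS)
label_total = sum(category_counts.values())
# Labels beyond one per comment; a positive value shows multi-labeling.
extra_labels = label_total - VIOLATION_COMMENTS

category_shares = {
    name: share(count, VIOLATION_COMMENTS)
    for name, count in category_counts.items()
}

print(overall_share, label_total, extra_labels)
print(category_shares)
```

The category shares are computed relative to the 315 violation comments, so they also sum to more than 100%.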
5.2. RQ2. How quick are commenters to raise concerns over human values violations?
Approach. To answer this RQ, we extracted the date of the 315 comments about values violations and the date of their corresponding posts and calculated the time difference. To further understand how quickly comments about values violations are raised, we compared the position of each comment that raises a values violation with that of all other comments under the same post.
Results. Figure 2 depicts the time differences in hours and days. Almost 55% of comments (173 out of 315) about values violations were received less than one hour after the corresponding posts were made. Only 59 comments were raised more than 24 hours after the date of their corresponding posts. As shown in Figure 3, comments that raised concerns about human values were mostly the first or second comments (202 out of 315 comments, 64.13%) on their corresponding questions or answers. Only 25 comments about values violation appeared after the sixth comment.
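The two measures used for RQ2, the delay between a post and a values-violation comment and the comment's position among all comments on that post, can be sketched as follows. The timestamps, comment IDs, and function names are hypothetical, and ordering comments by creation time is an assumption based on how Stack Overflow displays them.

```python
from datetime import datetime


def delay_hours(post_created: datetime, comment_created: datetime) -> float:
    """Hours elapsed between the post and the comment."""
    return (comment_created - post_created).total_seconds() / 3600


def position_in_thread(comment_id: int,
                       thread: list[tuple[int, datetime]]) -> int:
    """1-based position of a comment when the thread is sorted by time."""
    ordered = sorted(thread, key=lambda item: item[1])
    ids = [cid for cid, _ in ordered]
    return ids.index(comment_id) + 1


# Hypothetical post with three comments; comment 102 raises the concern.
post_created = datetime(2020, 1, 1, 12, 0)
thread = [
    (101, datetime(2020, 1, 1, 12, 10)),
    (102, datetime(2020, 1, 1, 12, 30)),
    (103, datetime(2020, 1, 2, 9, 0)),
]
flagged_id = 102
flagged_created = dict(thread)[flagged_id]

delay = delay_hours(post_created, flagged_created)
position = position_in_thread(flagged_id, thread)
print(delay, position)
```

In this made-up example the concern is raised half an hour after the post and is the second comment in the thread.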
5.3. RQ3. Are posts accused of violating human values downvoted by Stack Overflow users?
Approach. We developed an SQL query to count the downvotes that were cast on the 313 unique posts associated with the 315 human values violation comments following the creation of those comments.
Results. We found that Stack Overflow users did not vote down most posts accused of violating human values. In fact, out of 313 posts with human values violation comments, only 74 (23.64%) posts were downvoted after comments complaining about human values violations were raised. Out of these 74 downvoted posts, most were downvoted once (46 posts), followed by twice (10 posts) and three times (10 posts). The rest were downvoted four times (7 posts) and five times (1 post).
For example, a user asked the question (Question ID: 6854611): “How to return value when AJAX request is succeeded.” The accepted answer suggested setting parameter async to false to address the problem. However, a Stack Overflow user criticized this solution because it negatively impacts user experience. The accepted answer received three downvotes after the comment was made.
“Async: false will LOCK UP THE BROWSER for the duration of the ajax call. This is often a horrible user experience. The good solution requires refactoring the code to use a callback or a function call from the success/error functions to continue execution of the process and pass on the result of the ajax call.” Comment ID: 8151749
In another example, a user sought a solution to automatically turn on the GPS of the user's phone when a given Android app launches (Question ID: 17723347). Another user indicated that such solutions violate user privacy. Such solutions can also violate the self-direction value (specifically, the value item of independence), as they deny users the independence to choose and set their own security and privacy settings. This question was downvoted once.
“You can’t and you shouldn’t - it’s ethically questionable to override something the user has set that has security/privacy considerations.” Comment ID: 25833312
5.4. RQ4. How does the original poster react to concerns of potential human values violations?
5.4.1. RQ4.1 Does the original poster respond to concerns about potential human values violations in further comments? If so, how?
Approach. To understand if and how the original poster responds to a comment raising concerns about human values violation in a post, we conducted a qualitative study on all comments made by the original posters after the date of the 315 human values violation comments. We found that the original posters added 288 comments after the 315 human values violation comments. The first author applied the open coding technique (Glaser et al., 1968) to all 288 of the posters' comments to decide which one(s) were added to respond to the reported human values violations, and categorized the original posters' responses (e.g., denying that the post violates a human value). To reflect the distribution of original posters' reactions on Stack Overflow, we limited tagging to at most one response comment per human values violation comment; this way, each original poster's response to the associated human values concern would be counted once even if, e.g., the poster created multiple comments after the concern was raised. In the next step, the outcomes of the open coding process (i.e., identified codes and categories) were shared with the second author for review. Then, the first and second authors held several Zoom meetings to discuss disagreements and inconsistencies and reached a consensus on the final list of codes and categories.
Results. Our analysis shows that out of the 288 comments added by original posters after a comment citing a human values violation, 140 did not address the violation concern. The remaining 148 comments addressed the human values violation concern directly, but the exact nature of the responses varied widely. The qualitative analysis of these 148 comments using open coding led to their classification as one of six response types.
The first four response types - namely “Acknowledging their violation”, “Adding redemptive detail”, “Proposing an alternative”, and “Asking for clarification” - were all responses in which the original poster was receptive to the commenter’s concern about human values violations in their post. Receptive responses totaled 105 of the 148 relevant responses.
Acknowledging their violation (n=44): In these responses, the original poster accepted the commenter’s concern wholly as described. This was often accompanied by an attempt to mitigate the violation, either by adapting their original proposed solution or by abandoning their approach altogether.
“@David, I think you are right. The user experience of editing a Rich TextBox within a DataGridView is rather bad, so I followed your hint on providing a separate edit mask. However, I’m also stuck here. See: http://stackoverflow.com/questions/10224556/how-to-edit-a-dataset-in-a-new-form.” Comment ID: 13134970 replied to Comment ID: 13112316
Adding redemptive detail (n=40): In this type of response, the original poster added further information about the context and/or requirements of their solution to explain why their post does not violate the human value claimed by the commenter.
“I need to do it. It is a special app that is meant to change desktop background, get system settings, disable icons, auto-hide task bar etc. It is used in one of the computer stores to display computer sale sticker directly on the screen.” Comment ID: 9302848 replied to Comment ID: 9302833
Proposing an alternative (n=13): In these responses, the original poster responded with an alternative solution in an attempt to mitigate the violation while still achieving their objective.
“In that case then you can queue users selection and updated it one by one, or if it is possible to stop the current operation, stop it and start the new fresh thread.” Comment ID: 28900232 replied to Comment ID: 28887851
Asking for clarification (n=8): In these responses, the original poster asked for further information about the commenter’s concern. These responses reflected a willingness to engage with the concern raised and to further inquire as to whether their solution is violating human values.
“@JPReddy: Sorry, I didn’t understand your problem, the changing must affect only the ComboBox control, so it must not affect any other cell in this column or other column, so can you explain more.” Comment ID: 4324670 replied to Comment ID: 4324160
The remaining two response types - “Denying” and “Conceding and pressing on with the issue” - were dismissive, rather than receptive, to the human values concerns raised earlier in the comment thread.
Denying (n=28): In these responses, the original poster denied that their post violated human values at all.
“@Stephen: No. ‘Expensive’ isn’t ‘unethical’. It’s just free market economy. The school didn’t have to give the job to that guy. It CHOSE to do so. They could always look for alternatives and choose the cheaper offer.” Comment ID: 1749489 replied to Comment ID: 1748956
Conceding and pressing on with the issue (n=15): In these responses, the original poster wholly accepted that there was a human values violation in their post but asserted that they were nonetheless going to persist with their approach.
“Preventing me from viewing a site simply because I’m on desktop because they want to feed me more ads and JS spam isn’t particularly ethical either. Oh well..” Comment ID: 59368548 replied to Comment ID: 59304750
5.4.2. RQ4.2 Does the original poster modify their original post in light of concerns about human values violations? If so, how?
Approach. Some original posters may go beyond a comment response to a values violation claim, and may modify their original post itself in light of the claim. We first collected posts associated with the 315 comments which cited human values violations. Then we checked how many of them were modified after those comments were added. We found that 103 posts had at least one modification made on or after the time that the human values violation comment was created. Next, the first author manually checked all activities carried out on the 103 posts after the time of the 315 human value violations comments to understand which ones were related to the reported values violations.
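The automated filtering step that precedes the manual check can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary-based inputs, and the choice to compare against the earliest flagged comment on each post are hypothetical, and the post IDs and timestamps are made up.

```python
from datetime import datetime


def posts_modified_after_flag(flag_times, edit_times):
    """Return the IDs of posts with at least one edit on or after the flag.

    flag_times: {post_id: [datetime of each flagged comment]}
    edit_times: {post_id: [datetime of each edit to the post]}
    """
    modified = set()
    for post_id, flags in flag_times.items():
        first_flag = min(flags)  # earliest values-violation comment
        edits = edit_times.get(post_id, [])
        if any(edited >= first_flag for edited in edits):
            modified.add(post_id)
    return modified


# Made-up example data for three flagged posts.
flag_times = {
    101: [datetime(2021, 3, 1, 12, 0)],
    102: [datetime(2021, 3, 2, 9, 0), datetime(2021, 3, 2, 9, 30)],
    103: [datetime(2021, 3, 3, 8, 0)],
}
edit_times = {
    101: [datetime(2021, 2, 28, 10, 0)],  # edited only before the flag
    102: [datetime(2021, 3, 2, 9, 0)],    # edited exactly at the flag time
}

candidates = posts_modified_after_flag(flag_times, edit_times)
```

A filter like this only narrows the set of posts to inspect; deciding whether an edit actually responds to the values concern still requires reading the edit, as in the manual step described above.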
Results. Out of the 103 posts modified after the reported human values violations, only 14 were edited by the original poster in response to the claim about human values violations. The rest (89 posts) were modified for other reasons. These 14 posts can be grouped into two categories. Nine posts were edited to mitigate human values violations. For example, an original poster changed the source code in their post (Post ID: 15135545) after receiving criticism from a user who indicated the proposed solution “will cause the GUI to completely freeze, which is not a good user experience” (Comment ID: 21307680).
In the other category of edit, the original poster edited their post to reinforce that no human values violation existed - this was true of five posts. For example, although an original poster edited their post (Post ID: 16660109), the solution in the post was not modified, and the original poster only added more information to clarify why there was no user experience issue with their solution.
6. Discussion and Implications
6.1. Prevalence of Hedonism Violations
The distribution of the types of values violations in our dataset was largely dominated by hedonism violations. The Schwartz model defines the hedonism value category as “pleasure or sensuous gratification for oneself”, and associated with it are the value items pleasure, enjoying life, and self-indulgent (Schwartz, 2012). As such, any comment in the dataset which voiced concern about a post’s negative impact on users’ pleasure and experience would be tagged as a hedonism violation. Moreover, the value of a programming solution is largely determined by how the user feels when interacting with it, and user experience requirements are typically included within the technical requirements for software projects. Thus, when commenters voice concerns about a violation of hedonism, they may be addressing the technical requirements of the software rather than incidental, unintended consequences on human values. As a result, any objections on technical user experience grounds to a solution would be considered a complaint about a violation of the hedonism value category.
Implications: This finding raises the need to re-think the paradigms used for analyzing human values discourse in software contexts. When the concern for human values falls directly in line with software technical requirements –as in the case of hedonism– it is important to account for the distinction of this dynamic from other cases where human values come at the expense of satisfying and delivering technical requirements. We claim that values conforming to technical goals is precisely the objective of values discourse: the more the software community voices their opinions about the importance of human values, the more those values become a fixed feature of software quality. Regardless, the dynamics between human values and technical requirements should be directly addressed in further research in these fields.
6.2. Reactions to Posts Violating Values
We investigated the reactions to posts that violated human values from the perspectives of Stack Overflow commenters, users, and the original posters. We found that commenters are quick to raise concerns about human values violations. However, our study shows that Stack Overflow users did not downvote most posts (76.36%) accused of violating values. This may not be surprising, as Stack Overflow users are encouraged to use upvotes and downvotes to report on the quality of the information in posts. A downvote on a question means “this question does not show any research effort; it is unclear or not useful”, and, on an answer, “this answer is not useful” (43). As such, downvotes are not intended to reflect perceived violations of human values per se. A question post that is well-crafted and thought-out, while containing blatant human values violations, could avoid any downvotes, and so too with an answer post. Conversely, any downvotes on a post containing a human values violation may have nothing to do with the values violations in question but rather with the quality and usefulness of the post.
Nevertheless, it is useful to observe Stack Overflow users’ actual behavior in casting downvotes. Our results indicate that downvotes are, to some extent, cast in the wake of a human values violation being voiced in the comments. Indeed, it is possible that users are employing downvotes as a way to cast disapproval in light of perceived values violations, despite official site guidelines; how well users comply with site guidelines in their site activity is a major topic of discussion on the Meta Stack Exchange site (45).
Implications: Further research is needed to measure the significance of our findings on downvotes in human values contexts against general patterns of downvoting on Stack Overflow posts. This would shed light both on users’ reactions to perceived values violations on Stack Overflow and their regard for site voting guidelines in general.
6.3. Reactions to the Accusation of Human Values Violation in Posts
We observed that while the original posters usually acknowledged commenters who criticized their posts, they tended to downplay the severity of the issue, either by minimising the impact of the violation or by justifying their decision in spite of the severity of the issue. While being receptive is a good characteristic, we emphasize that the original posters need to actively mitigate values violations in their posts. This can potentially avoid the risks that such posts may pose to end-users and society. There is also a need for (Stack Overflow) developers to consider human values, and the potential violation of these values, in the technical solutions that they proffer on these platforms. We also recommend the consideration of other currently less investigated human values such as achievement, tradition, and conformity (as categorised by Schwartz (Schwartz, 1992)) beyond the well-researched values of privacy and security.
Implications: Investigating the nature of values discourse on Stack Overflow could find relationships between the language and format used to voice concerns about value violations and the types of reactions they evoke from authors of the software in question. This would allow researchers to understand what makes users more or less receptive to criticism when it comes to the implications of their work on human values.
6.4. Towards an Automated Tool
Because our study is exploratory, we chose a manual approach to identify comments containing concerns about human values violations and to categorize the original posters' responses to those comments. This limited our sample size, as manual methods become more time-consuming as the amount of data scales up.
Implications: Automated approaches using machine learning and natural language processing methods should be developed to detect comments raising concerns about human values violations. In this study, we mainly used the contents of comments under posts to recognize posts violating values. AI-based techniques could leverage other features (e.g., votes, response comments by the original poster, the reputation score of the commenter and poster) to detect such posts and the types of values violated by these posts. Such automated methods would inform Stack Overflow users of the possible values implications of a post and let them decide whether they want to use the solutions proposed in the post in their software systems (Nurwidyantoro et al., 2021; AlOmar et al., 2021).
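To make the idea of combining comment text with other signals more concrete, the sketch below assembles a small feature dictionary that could be passed to any downstream classifier. It is a hypothetical illustration, not a tool from this study: the cue phrases, field names, and metadata values are assumptions chosen for the example, and no classifier is trained here.

```python
from dataclasses import dataclass

# Hypothetical cue phrases; a real system would learn or curate these.
CUE_PHRASES = ("user experience", "privacy", "ethical")


@dataclass
class CommentRecord:
    text: str                  # body of the comment
    post_score: int            # net votes on the associated post
    commenter_reputation: int  # reputation score of the commenter
    poster_replied: bool       # did the original poster reply afterwards?


def extract_features(record):
    """Turn one comment and its metadata into a flat feature dictionary."""
    text = record.text.lower()
    return {
        # Total occurrences of any cue phrase (substring matches).
        "cue_count": sum(text.count(phrase) for phrase in CUE_PHRASES),
        "post_score": record.post_score,
        "commenter_reputation": record.commenter_reputation,
        "poster_replied": int(record.poster_replied),
    }


# The text is the comment quoted in Section 5.3; the metadata is made up.
example = CommentRecord(
    text=("You can't and you shouldn't - it's ethically questionable to "
          "override something the user has set that has security/privacy "
          "considerations."),
    post_score=-1,
    commenter_reputation=1500,
    poster_replied=False,
)
features = extract_features(example)
```

Keeping feature extraction separate from the model makes it easy to test which signals (text cues, votes, reputation, poster replies) actually help once labelled data is available.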
7. Threats to Validity
External Validity refers to the extent to which our findings can be generalized to other contexts (Wohlin et al., 2012). This study collected and analyzed a random sample of only 2000 comments and their associated posts (questions or answers) from the dataset of comments and their associated posts described in Section 4. So, we acknowledge that our findings may not generalize to all posts and comments in Stack Overflow and other question and answer websites such as Reddit (https://www.reddit.com/) and Gitter (https://gitter.im/). Further research needs to be conducted to explore how posts in other question and answer websites violate human values and how their users react to such violations.
Internal Validity is defined as threats that may have impacted our findings (Wohlin et al., 2012). The random selection of the 2000 comments from a dataset of 10144 comments and the qualitative analysis processes conducted for RQ1 and RQ4 may have threatened our findings. First, our decision to build a dataset of 10144 comments from millions of comments in Stack Overflow was motivated by the need to reduce the number of false-positive comments as much as possible. Furthermore, it was not possible for us to manually analyze all 10144 comments. Hence, we analyzed a random sample of 2000 comments and their corresponding posts. Therefore, we may have missed some important types of value violations because of how the dataset in Section 4 was built and because of the random selection of the 2000 comments from the dataset. Furthermore, some of the phrases used to build the dataset are particularly hedonism-related terms, which may have allowed the hedonism category to be over-represented in our dataset, resulting in a skew towards hedonism comments in RQ1.
The qualitative analysis processes used to answer RQ1 and RQ4 might be subjective and error-prone. In RQ1, we employed two approaches to reduce these issues. First, the assigned analysts were asked to analyze the data iteratively (in each iteration, only 100 comments and their corresponding posts were analyzed by the analyst). Second, once each set of 100 comments and their associated posts was analyzed, the validators cross-checked the comments labeled by each analyst. Furthermore, several meetings were organized between the analysts and validators to resolve disagreements and conflicts using the negotiated agreement method (Campbell et al., 2013; Morrissey, 1974). In RQ4, the second author checked all categories and their corresponding codes, and any disagreements and conflicts were resolved through meetings. In both RQ1 and RQ4, there were comments and associated posts for which it was difficult to precisely identify the type of human values violation (RQ1) or the type of the poster’s response (RQ4). In such cases, we labeled the comment as a non-human values comment (RQ1) or as an irrelevant response (RQ4) to avoid possible risks and mistakes. Hence, we can be reasonably confident that our findings are credible, with minimal mislabelled comments.
Construct Validity. In RQ1, our decision to use the ten value categories in the Schwartz theory may have introduced two threats. First, other values models such as Rokeach’s Value Survey (Rokeach, 1973) and the List of Values (Kahle and Kennedy, 1988), with different types and numbers of values, could be used instead of the Schwartz theory. While none of them have been developed for software engineering, the Schwartz theory is widely used in software engineering (e.g., (Perera et al., 2020a; Nurwidyantoro et al., 2021)). Second, the definition of human values might have been vague for the analysts and validators, so they might have struggled to map Stack Overflow comments to human values. To mitigate this threat, apart from reading seminal papers (Schwartz, 1992, 2012) on the Schwartz theory, we consulted previous research that leveraged (the definition of) human values in the software engineering context, e.g., app reviews (Obie et al., 2021a; Shams et al., 2020), GitHub issue discussions (Nurwidyantoro et al., 2021), and source code (Mougouei, 2020). Finally, the quantitative measures used in RQ2, RQ3, and RQ4 to determine the characteristics of comments citing values violations or reactions to posts being accused of values violations would not capture all aspects of these comments and posts. Future research is encouraged to further characterize these comments and posts.
8. Conclusion and Future Work
In this work, we conducted an exploratory study investigating the potential violation of human values in Stack Overflow. Adopting the widely accepted Schwartz model of basic human values, we analyzed 2,000 Stack Overflow comments and their associated posts (1980 unique questions or answers) to identify posts and comments containing perceived human values violations, the categories of the values violated, and the reactions of Stack Overflow users to concerns related to these violations. Our results show that 315 (out of 2,000) comments raised issues concerning the violation of 7 out of the 10 value categories in the Schwartz model. We find that Stack Overflow commenters react quickly to issues of values violations; 203 (out of 315) comments raising concerns about values violations were made less than 2 hours after the corresponding posts. Also, most posts (76.36%) accused of human values violation did not get downvoted at all. Furthermore, original posters directly addressed the concerns raised by other commenters in only 148 follow-up comments of their own.
In the future, we plan to build upon our exploratory study by diving deeper into specific value categories and their associated value items to understand the different factors that cause their violations. In addition, due to the limitations of a manual approach to categorizing values and their violations, we plan to build machine learning models to automate this process.
Support for this work from ARC Laureate Program FL190100035 and Discovery Project DP200100020 is gratefully acknowledged.
References
- (Website) Cited by: §1.
- Fairness-aware programming. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 211–219. Cited by: §1.
- Finding the needle in a haystack: on the automatic identification of accessibility user reviews. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1–15. Cited by: §6.4.
- Mining questions asked by web developers. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 112–121. Cited by: §3.2.
- What do developers know about machine learning: a study of ml discussions on stackoverflow. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 260–264. Cited by: §3.2.
- The human and ethical aspects of big data. IEEE software 31 (1), pp. 20–22. Cited by: §1, §1.
- Building reputation in stackoverflow: an empirical investigation. In 2013 10th Working Conference on Mining Software Repositories (MSR), pp. 89–92. Cited by: §3.2.
- Classifying emotions in stack overflow and jira using a multi-label approach. Knowledge-Based Systems 195, pp. 105633. Cited by: §3.2.
- Coding in-depth semistructured interviews: problems of unitization and intercoder reliability and agreement. Sociological Methods & Research 42 (3), pp. 294–320. Cited by: §4, §5.1, §5.1, §7.
- Developing a meta-inventory of human values. Proceedings of the American Society for Information Science and Technology 47 (1), pp. 1–10. Cited by: §1, §3.1.
- Comment everywhere. Stack Overflow. Cited by: §2.
- (2021) Facebook apologizes after a.i. puts ‘primates’ label on video of black men. The New York Times. Cited by: §1.
- Values-first se: research principles in practice. In 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C), pp. 553–562. Cited by: §1.
- Ethics. the internet encyclopedia of philosophy. issn 2161–0002. Cited by: §1.
- Fairness testing: testing software for discrimination. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pp. 498–510. Cited by: §1.
- The discovery of grounded theory; strategies for qualitative research. Nursing research 17 (4), pp. 364. Cited by: §5.4.1.
- Acm code of ethics and professional conduct. Cited by: §1.
- Embedding stakeholder values in the requirements engineering process. In International Working Conference on Requirements Engineering: Foundation for Software Quality, pp. 318–332. Cited by: §1.
- Human values in software engineering: contrasting case studies of practice. IEEE Transactions on Software Engineering, pp. 1–15. Cited by: §3.1.
- Amazon doesn’t consider the race of its customers. should it?. Bloomberg L.P.. Cited by: §1.
- Using the list of values (lov) to understand consumers. Journal of Services Marketing. Cited by: §7.
- Using and asking: apis used in the android market and asked about in stackoverflow. In Social Informatics, A. Jatowt, E. Lim, Y. Ding, A. Miura, T. Tezuka, G. Dias, K. Tanaka, A. Flanagin, and B. T. Dai (Eds.), Cham, pp. 405–418. Cited by: §3.2.
- The replication package of the paper. Zenodo. Note: October, 2021. Cited by: §4, §5.1.
- A first step towards detecting values-violating defects in android apis. Cited by: §3.1.
- Sources of error in the coding of questionnaire data. Sociological Methods & Research 3 (2), pp. 209–232. Cited by: §4, §5.1, §5.1, §7.
- Engineering human values in software through value programming. Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops, pp. 133–136. Cited by: §3.1, §7.
- Towards discovering the role of emotions in stack overflow. In Proceedings of the 6th International Workshop on Social Software Engineering, SSE 2014, pp. 33–36. Cited by: §3.2.
- Human values in software development artefacts: a case study on issue discussions in three android applications. Information and Software Technology, pp. 106731. Cited by: §1, §6.4, §7.
- A first look at human values-violation in app reviews. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS), pp. 29–38. Cited by: §1, §3.1, §3.1, §4, §5.1, §7.
- Does domain change the opinion of individuals on human values? a preliminary investigation on ehealth apps end-users. Cited by: §3.1, §3.1.
- A study on the prevalence of human values in software engineering publications, 2015-2018. In 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE), pp. 409–420. Cited by: §3.1, §5.1, §7.
- Continual human value analysis in software development: a goal model based approach. In 2020 IEEE 28th International Requirements Engineering Conference (RE), pp. 192–203. Cited by: §1.
- Mining stackoverflow to turn the ide into a self-confident programming prompter. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 102–111. Cited by: §3.2.
- The nature of human values.. Free press. Cited by: §1, §7.
- Universals in the content and structure of values: theoretical advances and empirical tests in 20 countries. In Advances in experimental social psychology, Vol. 25, pp. 1–65. Cited by: §1, §3.1, Table 1, §5.1, §6.3, §7.
- An overview of the schwartz theory of basic values. Online readings in Psychology and Culture 2 (1), pp. 2307–0919. Cited by: §1, §5.1, §6.1, §7.
- Operationalizing human values in software engineering: a survey. arXiv preprint arXiv:2108.05624. Cited by: §1.
- Society-oriented applications development: investigating users’ values from bangladeshi agriculture mobile applications. In 2020 IEEE/ACM 42nd International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS), pp. 53–62. Cited by: §1, §3.1, §7.
- Measuring bangladeshi female farmers’ values for agriculture mobile applications development. In 54th Hawaii International Conference on System Sciences, HICSS’21, pp. 1–10. Cited by: §3.1.
- Value-based requirements engineering: method and experience. Requirements engineering 23 (4), pp. 443–464. Cited by: §1.
- How do programmers ask and answer questions on the web? (nier track). In Proceedings of the 33rd International Conference on Software Engineering, ICSE ’11, pp. 804–807. Cited by: §3.2.
- CODES: mining source code descriptions from developers discussions. In Proceedings of the 22nd International Conference on Program Comprehension, ICPC 2014, pp. 106–109. Cited by: §3.2.
- Vote down. Stack Overflow. Cited by: §2, §2, §6.2.
- An empirical study on developer interactions in stackoverflow. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, pp. 1019–1024. Cited by: §3.2.
- When is it justifiable to downvote a question?. Stack Overflow. Cited by: §6.2.
- A case for human values in software engineering. IEEE Software 38 (1), pp. 106–113. Cited by: §1, §1, §1, §3.1.
- Advancing the study of human values in software engineering. In 12th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE ’19, pp. 19–26. Cited by: §3.1.
- Measuring human values in software engineering. In 2018 ACM/IEEE 12th International Symposium on Empirical Software Engineering and Measurement, pp. 1–4. Cited by: §3.1.
- Experimentation in software engineering. Springer Science & Business Media. Cited by: §7, §7.
- Reading answers on stack overflow: not enough!. IEEE Transactions on Software Engineering. Cited by: §1, §2, §2.
- An empirical study of obsolete answers on stack overflow. IEEE Transactions on Software Engineering. Cited by: §2, §4.