An empirical study of question discussions on Stack Overflow

09/27/2021 ∙ by Wenhan Zhu, et al. ∙ HUAWEI Technologies Co., Ltd. ∙ University of Waterloo ∙ Queen's University

Stack Overflow provides a means for developers to exchange knowledge. While much previous research on Stack Overflow has focused on questions and answers (Q&A), recent work has shown that discussions in comments also contain rich information. On Stack Overflow, discussions through comments and chat rooms can be tied to questions or answers. In this paper, we conduct an empirical study that focuses on the nature of question discussions. We observe that: (1) Question discussions occur at all phases of the Q&A process, with most beginning before the first answer is received. (2) Both askers and answerers actively participate in question discussions; the likelihood of their participation increases as the number of comments increases. (3) There is a strong correlation between the number of question comments and the question answering time (i.e., more discussed questions receive answers more slowly); also, questions with a small number of comments are likely to be answered more quickly than questions with no discussion. Our findings suggest that question discussions contain a rich trove of data that is integral to the Q&A processes on Stack Overflow. We further suggest how future research can leverage the information in question discussions, along with the commonly studied Q&A information.

1 Introduction

Stack Overflow is a technical question answering (Q&A) website widely used by developers to exchange programming-related knowledge through asking, discussing, and answering questions. The Q&A process on Stack Overflow creates a crowdsourced knowledge base that provides a means for developers across the globe to collectively build and improve their knowledge on programming and its related technologies. Stack Overflow has become one of the largest public knowledge bases for developers with more than 16.8 million questions as of December 2018 stackoverflow2018datadump. A survey shows that retrieving information from Stack Overflow is an essential daily activity for many software developers 7498605.

On Stack Overflow, users can ask, answer, and discuss questions, and each question can receive multiple proposed answers. The user who asked the question (i.e., the “asker”) can decide to mark one answer as accepted, indicating that it resolves their question authoritatively. While ultimately Q&A is the most important activity on Stack Overflow, users can also post comments and/or start chat rooms that are tied to a specific post (i.e., question or answer). In this paper, we refer to comments and chat room messages on Stack Overflow as discussions; each discussion is associated with a single question (a question discussion) or proposed answer (an answer discussion). In prior studies, answer discussions were found to be useful in various ways, including providing complementary background information 8906075, as well as highlighting obsolescence 8669958 and security issues chen2019reliable in proposed answers. However, so far there has been no research on question discussions and how they affect the Q&A process on Stack Overflow.

To help understand why it is important to study how question discussions integrate with the Q&A process, we now consider a motivating example. Fig. 1 shows a question titled “Unable to set the NumberFormat property of the Range class” (https://stackoverflow.com/questions/10801537/). Four minutes after the question was asked, another user posted a comment, attached to the question, asking for clarification on the problematic code snippet. A chat room was then created for the asker and the user to continue the discussion in real-time. A consensus was reached in the chat, and the results were summarized and posted as a proposed answer by the user, which the asker designated as accepted. This example highlights how the process of asking and answering questions is enabled by the discussion mechanisms of commenting and chatting, allowing a resolution to be reached quickly. That is, the question discussion can serve as a simple and effective socio-technical means to achieving closure on the question.

Figure 1: An example of the Q&A process involving discussions: (A) a user (the “asker”) asked a question; (B) another user (the “answerer”) started discussing with the asker in the comment thread; (C) the question was further clarified then resolved in the chat room; (D) the content of the comments and chat messages that led to the resolution of the question were summarized as an answer, which was marked as the accepted answer by the asker.

In this work, we use the Stack Overflow data dump from December 2018 stackoverflow2018datadump as our dataset; this dataset contains 33.5 million comments and 1.0 million chat messages. We use this data to explore the nature of question discussions and how they integrate with the crowdsourced Q&A process on Stack Overflow. To make our study easy to follow, we use the following notations to refer to different groups of questions observed within the dataset:

Symbol   Meaning                                                                      # in dataset
Qdisc    Questions with comments                                                      9.9 M
Qchat    Questions with chat rooms (and comments)                                     19,527
Qnd      Questions with no discussions                                                6.9 M
Qa       Questions with answers                                                       14.6 M
Qd/a     Questions with both discussions and answers                                  8.2 M
Qd/aa    Questions with both discussions and accepted answers                         4.9 M
Qhd/a    Questions with both discussions containing “hidden comments” and answers     1.6 M

Note: On Stack Overflow, comments are “hidden” (i.e., elided from view) by default when there are six or more attached to the same question.

Specifically, we investigate and answer three research questions (RQs):

4.1 RQ1: How prevalent are question discussions on Stack Overflow?
We found that question discussions occur in 58.8% of the questions on Stack Overflow. More specifically, 9.9 million questions have comments (i.e., Qdisc) with a median of 3 comments, and 19,527 questions have chat rooms (i.e., Qchat). The popularity of question discussions is also increasing, with the proportion of questions with discussions nearly doubling from 32.3% in 2008 to 59.3% in 2018. Question discussions exist in all phases of the Q&A process on Stack Overflow. In questions that are both discussed and have an accepted answer (i.e., Qd/aa), discussions in 80.0% of the questions begin before the accepted answer was posted. We found that the duration of question discussions can extend beyond the Q&A process: In 29.4% of Qd/aa, question discussions begin before the first answer and continue after the accepted answer is posted; and in 19.9% of Qd/aa, question discussions begin after the question receives its accepted answer.

4.2 RQ2: How do users participate in question discussions?
We found that 20.0% (i.e., 1.9 million) of registered users on Stack Overflow have participated in question discussions, which is comparable to the number of users who have answered questions (i.e., 20.9%). Question discussions allow askers and answerers to communicate with each other directly, enabling fast exchanges on the issues of concern. For questions that have both discussions and answers (i.e., Qd/a), we found that as the number of comments increases, both askers and answerers were more likely to participate in the question discussions. Also, we found that when there are six or more comments present (i.e., Qhd/a), then there is a high likelihood of both askers (90.9%) and answerers (51.3%) participating in the discussions.

4.3 RQ3: How do question discussions affect the question answering process on Stack Overflow?
Question discussions tend to lead to more substantial updates to the body of the original question. For example, a median of 97 characters are added to the question body when the question discussion has a chat room instance (i.e., Qchat). While most other questions have no change in their question body length, a larger proportion of questions with comments are revised, with an increase in the question body length compared to questions with no discussion. Questions with more comments receive answers more slowly, with a strong Spearman correlation between the number of comments and the answer-receiving-time for the first answer. However, the answering process takes less time for questions with a small to moderate amount of discussion (i.e., at least one comment but fewer than eight) compared to questions with no discussion.

The main contribution of our study is to identify discussions attached to questions as a common and integral part of the Q&A process on Stack Overflow. We highlight that question discussions occur in a significant proportion (i.e., 58.8%) of questions on Stack Overflow. The volume of commenting activity (i.e., 33.5 million comments) is comparable in size to that of answering activity (i.e., 25.9 million answers) on Stack Overflow. The user base that has participated in discussions (i.e., 20.0% of active users) is also comparable to the user base that has answered questions (i.e., 20.9% of active users). We observed a strong correlation between the number of comments and the question answering speed, suggesting that question discussions have an impact on creating answers. Moreover, although the answer-receiving-time of questions with extended discussions is longer, the answering process takes less time for questions with a small amount of discussion compared to questions with no discussion. Our findings suggest that question discussions can facilitate the Q&A process since they provide a means for askers and potential answerers to communicate throughout the Q&A process. We encourage future research on Stack Overflow to consider question discussions in addition to leveraging the information in the questions and answers of Stack Overflow.

Paper Organization. The rest of this paper is organized as follows. Section 2 introduces Q&A on Stack Overflow and commenting/chatting on Stack Overflow. Section 3 describes how we collect data for our analysis. Section 4 details the results of our empirical study. Section 5 discusses our findings and their implications. Section 6 describes threats to the validity of our study. Section 7 surveys related research. Finally, Section 8 summarizes the findings of our study.

2 Background

2.1 The Q&A Process on Stack Overflow

Stack Overflow is a technical Q&A website where users ask, answer, and discuss questions related to programming and software development. Stack Overflow has been widely embraced by the software engineering community, and has become the largest public knowledge base for programming-related questions. There are 16.8 million questions together with 25.9 million answers on Stack Overflow as of December 2018.

The Stack Overflow Q&A process begins with a user posting a question that relates to programming or a similar technical topic. At that point, other users can start to engage either by proposing an answer, or by taking part in a discussion in the form of a comment or a chat room. Discussions can be attached to either the original question (i.e., a question discussion) or one of the proposed answers (i.e., an answer discussion). If a proposed answer successfully resolves the question, the user who asked the original question (i.e., the asker) may at their discretion choose to designate that answer as the accepted answer. Once an accepted answer has been selected, users may continue to contribute to the question thread by adding new answers or editing existing content; in practice, however, user activity related to that question and its answers tends to diminish sharply at that point DBLP:conf/msr/BaltesDT008. We note that Stack Overflow uses the term post internally to refer to either a question or an answer, but not a discussion.

2.2 Discussions on Stack Overflow

In this work, we focus on question discussions to better understand how discussions affect the crowdsourced knowledge sharing activities once a question is posted, especially those that occur early in the Q&A process.

Stack Overflow offers two different communication channels for users to discuss questions and answers: commenting as an asynchronous channel and chatting as a synchronous channel. When users are commenting, they may not expect an immediate reply. Meanwhile, when users are chatting, a live session is formed where information flows freely within the group in real-time 7498605. On Stack Overflow, users begin discussions in comments. When extended discussions occur in comments, users are prompted to continue the discussions in dedicated chat rooms. While commenting is the dominant communication channel on Stack Overflow for question discussions, whenever possible, we take special notice of the existence of chat rooms since they represent a different form of communication channel compared to comments.

As previously mentioned, users can attach comments to a post (i.e., a question or answer). Stack Overflow considers comments as “temporary ‘Post-It’ notes left on a question or answer” (https://stackoverflow.com/help/privileges/comment). Stack Overflow users are encouraged to post comments “to request clarification from the author; leave constructive criticism to guide the author in improving the post, and add relevant but minor or transient information to a post.” When multiple comments are present in the same post, they form a comment thread.

Stack Overflow offers real-time, persistent collaborative chat for the community through chat rooms (https://chat.stackoverflow.com/faq). Stack Overflow prompts users to continue the discussions in a chat room when there are more than three back-and-forth comments between two users (i.e., at least 6 in total). Users are prompted with a message before a chat room can be created: “Please avoid extended discussions in comments. Would you like to automatically move this discussion to chat?” When the user agrees to create the chat room, an automated comment is posted that contains a link to the newly created chat room. In the newly created chat room, automated messages are posted indicating the associated question and the comments leading to the chat room. Users can also directly create chat rooms that are not associated with questions or answers.

3 Data Collection

In our study, we use the Stack Overflow data dump from December 2018. The data dump is a snapshot of the underlying database used by Stack Overflow; it contains all meta-data for each comment, such as which user posted the comment and which question the comment is associated with. We mainly used the Posts and Comments tables from the dataset to extract the required information. The data dump also contains the history of each question, via the PostHistory table. We analyze the history of each question to reconstruct the timeline of when the question was created, edited, commented on, and answered.
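As an illustration of how per-question comment counts might be derived from the Comments table, the following minimal Python sketch streams comment rows and tallies them by post. It assumes the publicly distributed XML layout of the dump, in which each comment is a `row` element carrying a `PostId` attribute; the function name and the tiny inline sample are hypothetical. Separating question comments from answer comments would additionally require joining against the Posts table, which is not shown here.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO


def count_comments_per_post(xml_stream: BinaryIO) -> Counter:
    """Tally comments per post ID while streaming through the XML rows.

    Assumption: each comment is a <row> element with a PostId attribute.
    """
    counts: Counter = Counter()
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "row":
            counts[int(elem.attrib["PostId"])] += 1
            elem.clear()  # release the element so large files stay cheap
    return counts


# Hypothetical three-row sample standing in for the real Comments file.
sample = b"""<?xml version="1.0" encoding="utf-8"?>
<comments>
  <row Id="1" PostId="10" UserId="7" CreationDate="2018-01-01T10:00:00.000" />
  <row Id="2" PostId="10" UserId="8" CreationDate="2018-01-01T10:05:00.000" />
  <row Id="3" PostId="11" UserId="7" CreationDate="2018-01-02T09:00:00.000" />
</comments>"""

comment_counts = count_comments_per_post(io.BytesIO(sample))
```

Streaming with `iterparse` rather than loading the whole tree is a design choice suited to a table with tens of millions of rows.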

Data about chat rooms is not contained in the Stack Overflow data dump; instead, we collected it manually by crawling the Stack Overflow website itself. We have made our dataset open access on Zenodo (https://zenodo.org/record/5516190). We also labelled the chat room instances based on whether they are general (i.e., standard chat rooms on Stack Overflow that are not associated with a question or an answer), attached to a question, or attached to an answer. During the initial phase of data collection we extracted 26,401 chat rooms that are associated with questions. After cross-referencing their associated question IDs with the Stack Overflow data dump, we removed chat room discussions that are unrelated to programming, such as those on Meta Stack Overflow, which focuses on the operation of Stack Overflow itself. This left us with a total of 19,571 chat rooms comprising 1.0 million messages that are associated with 19,527 questions as of June 2019. Figure 2 shows the detailed extraction process of chat rooms from Stack Overflow.

Figure 2: An overview for the creation of Qchat (questions with chat rooms)

4 Case Study Results

In this section, we explore the underlying motivation, the approach taken, and the results of our three research questions (RQs) concerning question discussions on Stack Overflow.

4.1 RQ1: How prevalent are question discussions on Stack Overflow?


Motivation: As a technical Q&A platform related to programming, Stack Overflow hosts a large number of questions Treude:2011:PAA:1985793.1985907. From the user’s point of view, creating an answer can be challenging since the initial version of a question is often incomplete or ambiguous. For this reason, potential answerers may first wish to engage the asker in a discussion to clarify their intent and possibly seek additional context, which is typically done using comments attached to the question. If the discussion proves to be fruitful, the user may then post an answer based on the discussion; also, the asker may decide to edit the original question to clarify the intent for other readers. For example, Example 1 shows a comment pointing out a confounding issue in the original question. After the discussion, the asker acknowledged the issue and edited the original question for clarity.

A prior study showed that active tutoring through discussions in chat rooms can substantially improve the quality of newly posted questions by novice users ford2018we. However, it is labor intensive to provide such tutoring with an average of more than 7,000 new questions posted per day on Stack Overflow in 2019. At the same time, there has been no detailed study of question discussions as yet; in this RQ, we explicitly study question discussions to gain a better understanding of their prevalence in the Q&A process.

Example 1: In a comment linked to a question titled “Write to Excel - Reading CSV with Pandas & Openpyxl - Python” (https://stackoverflow.com/questions/48956597/), a user observed that the example CSV file given in the question did not follow the CSV standard, and suggested that the asker double-check the input format.

Comment:

The structure of the first three lines doesn’t match the structure of lines 5 onwards so you cannot read this file with a CSV library. Please check the provenance of the file and what it should look like. I suspect you probably want to skip the first four lines.


Approach: We begin our study of the prevalence of question discussions by investigating the trend in the number and proportion of question discussions over the years. We distinguish between answered questions with and without an accepted answer to investigate whether there exists a difference between the two groups of questions.

Figure 3: Timeline of question thread events. Question discussions can occur at any time since the creation of a question.

We then study when question discussions occur relative to key events in the Q&A process. After a question is posted on Stack Overflow, several different types of follow-up events may occur, as illustrated by Fig. 3. For example, after a question is posted any of the following can occur:

  • other users can propose answers to the question;

  • users can post comments to discuss either the question or the associated answers;

  • the asker can mark one of the answers as accepted; and

  • the question (and proposed answers) can be edited for clarity.

For each question, we construct the timeline consisting of each event, and we analyze the prevalence of question discussions with respect to other Q&A activities. Here, we focus mainly on two key events: when the question receives its first answer, and when it receives the accepted answer.
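To make the timeline analysis concrete, the following minimal Python sketch labels when a question's discussion starts relative to the two key answering events. It assumes the comment and answer timestamps for a question have already been extracted; the function name and the category labels are illustrative assumptions, not the study's actual implementation.

```python
from datetime import datetime
from typing import Optional, Sequence


def classify_discussion_start(
    comment_times: Sequence[datetime],
    first_answer_time: Optional[datetime],
    accepted_answer_time: Optional[datetime],
) -> str:
    """Label when a question discussion begins relative to answering events.

    All labels are hypothetical names chosen for this sketch.
    """
    if not comment_times:
        return "no_discussion"
    start = min(comment_times)  # the discussion begins with the earliest comment
    if first_answer_time is None:
        return "discussed_unanswered"
    if start < first_answer_time:
        return "before_first_answer"
    if accepted_answer_time is not None and start >= accepted_answer_time:
        return "after_accepted_answer"
    return "after_first_answer"
```

For instance, a question whose only comment arrives at 13:15 and whose first answer arrives at 13:35 would be labelled `before_first_answer` under these assumptions.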


Results: Stack Overflow questions are discussed by 33.5 million comments and 1.0 million chat messages, forming a large dataset of community question discussions, in addition to the 16.8 million questions and 25.9 million answers. The proportion of questions with discussions also nearly doubled from 32.3% in 2008 to 59.3% in 2013, and has remained roughly stable since then. Fig. 4(a) shows the number and proportion of questions with discussions per year, and Fig. 4(b) suggests a similar trend for questions with an accepted answer.

(a) All questions
(b) Questions with the accepted answer
Figure 4: The number and proportion of questions with comments

Question discussions occur throughout the Q&A process, ranging from before the first answering event to after the accepted answer is posted. Fig. 5 shows the proportion of question discussions relative to answering events in the Q&A process. The height of the band across each vertical line indicates the proportion of questions with a specific activity occurring in that phase of a question thread’s life cycle. For example, from the left-most bar, all questions can be split into two groups: questions with discussions (Qdisc) and questions without discussions (Qnd). The top band (with strata in blue) represents the 58.8% of questions with discussions and the bottom band (with strata in red) represents the 41.2% of questions without any discussions. Flowing from left to right, the strata in blue and red continue to represent the questions with and without discussions until the right-most band, which represents the final answering status of the question.

Figure 5: Question discussion with respect to answering events during the Q&A process. The blue bands represent questions with discussions and the red bands represent questions without discussions.

In Qd/a, 75.4% (i.e., 6.1 million) of the question discussions begin before the first answer is posted, suggesting an influence of question discussions on answering activities. Furthermore, 80.0% (i.e., 3.9 million) of the question discussions begin before the accepted answer is posted, indicating a slightly more active involvement of question discussions in Qd/aa. In answered and solved questions of Qchat, 76.8% (i.e., 11,506) of the chat activities begin before the first answer is received, and 76.6% (i.e., 7,657) of the chat activities begin before the accepted answer is posted.

The early occurrence of question discussions in the Q&A process suggests that they enable interested users to engage with the asker informally, to allow for clarification. For example, in Example 2, 13 minutes after the question was initially posted, a user asked for a concrete example that can demonstrate the problem the asker had. The asker then updated the question with the requested information. The question was answered 15 minutes later, incorporating the newly added information based on the discussions.

Example 2: A user comments to ask for information in a question titled “Can I modify the text within a beautiful soup tag without converting it into a string?” (https://stackoverflow.com/questions/25869533/).

Comment:

UserB: Please give an example html that demonstrates the problem. Thanks. [2014-09-16 13:15]
UserA (the asker): Just added some example html, sorry about that. [2014-09-16 13:20]

In 29.4% (i.e., 1,424,887) of Qd/aa, the discussions begin before the accepted answer has been received, and continue after the accepted answer is posted. Furthermore, 19.9% (i.e., 967,812) of the question discussions begin after the accepted answer is posted. These findings indicate that the community may continue to discuss questions even after the asker has designated a “best” answer that solves their problem anderson2012discovering. This may be due to the fact that software development technologies tend to evolve rapidly; old “truths” may need to be updated over time, and additional discussions may provide new insights despite the asker considering the question to be solved. Example 3 shows a comment that pointed out a potential security vulnerability in the code snippet 5 years after the initial question was posted.

Example 3: A user posted a comment to warn about a potential security vulnerability 5 years after a question was posted (https://stackoverflow.com/questions/17690956/).

Comment:

Beware. If you’ve configured your Struts application in this particular way (setting ‘alwaysSelectFullNamespace’ to ‘true’), your application is very likely vulnerable to CVE-2018-11776: semmle.com/news/apache-struts-CVE-2018-11776

RQ1 Summary: There are 33.5 million comments and 1.0 million chat room messages in our dataset, which forms a large corpus of question discussion activities on Stack Overflow. Since the introduction of comments, the popularity of question discussions has nearly doubled from 32.3% in 2008 to 59.3% in 2013 and has remained stable since. The occurrence of question discussions is prevalent throughout the Q&A process. While question discussions in most questions (75.4% in Qd/a and 80.0% in Qd/aa) begin before the answering activities, question discussions can continue or even begin after the accepted answer is posted.

4.2 RQ2: How do users participate in question discussions?


Motivation: The crowdsourced Q&A process on Stack Overflow is driven by user participation. In addition to the questions and answers, question discussions are also part of the user-contributed content on Stack Overflow. In this RQ, we explore how different users participate in question discussions, to better understand how question discussions facilitate the Q&A process.

We focus on two aspects of user participation. First, we investigate the overall user participation in question discussions on Stack Overflow. We note that in RQ1, we observed a high proportion of questions with discussions; here, we focus on the users who participate in question discussions. Second, we narrow the scope to discussion participation at the level of individual questions. We are interested in which other activities the participating users take part in: for example, whether the user asked the question in the first place, or whether the user posted an answer to the question.


Approach: To study user participation in question discussions and gain an overall idea of the popularity of discussion activities compared to other activities on Stack Overflow, we extract from the data dump the list of all users who contributed content to Stack Overflow. In particular, we sought users who asked, answered, or discussed questions; we note that while other activities, such as voting, may help the community, we do not consider these activities in our study as they do not directly contribute content. We also ignored activity related to answer discussions, as it was outside of the scope of our investigations.

We extracted the unique UserIDs from all questions, answers, and question comments to build the groups of users who participated in each of those activities. We then compared the intersection between the different sets of users to determine which of them participated in multiple types of activities on Stack Overflow.
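The set-based comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `(user_id, activity)` record format, the activity labels, and both function names are hypothetical, and the toy records stand in for the real data dump.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def users_by_activity(records: Iterable[Tuple[int, str]]) -> Dict[str, Set[int]]:
    """Group unique user IDs by activity label (e.g., 'ask', 'answer', 'discuss')."""
    groups: Dict[str, Set[int]] = defaultdict(set)
    for user_id, activity in records:
        groups[activity].add(user_id)
    return dict(groups)


def overlap_share(base: Set[int], other: Set[int]) -> float:
    """Fraction of users in `base` who also appear in `other`."""
    if not base:
        return 0.0
    return len(base & other) / len(base)


# Toy records: (user_id, activity). Not real Stack Overflow data.
records = [
    (1, "ask"), (1, "discuss"),
    (2, "answer"), (2, "discuss"),
    (3, "ask"), (3, "answer"), (3, "discuss"),
    (4, "ask"),
]
groups = users_by_activity(records)
share_discussers_who_ask = overlap_share(groups["discuss"], groups["ask"])
```

Using sets keeps each user counted once per activity, regardless of how many questions, answers, or comments they contributed.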


Results: 1.9 million (i.e., 20.0%) users on Stack Overflow have participated in question discussions. Fig. 6 shows the overlap of the number of users participating in different activities on Stack Overflow. We observe that 95.7% of users who participated in question discussions also asked questions on Stack Overflow, and 93.6% of them answered questions.

Figure 6: The number of users who participate in different types of activities on Stack Overflow, and the number and proportion of users who participate in question discussions.

In 57.7% of Qd/a (i.e., 6.0 million), askers participated in the question discussions, and in 33.9% of Qd/a (i.e., 2.8 million), an answerer participated in the question discussion. The involvement of askers and answerers indicates that the two parties often leverage question discussions as a collaboration medium.

We further investigate the trend of the proportion of questions with askers and answerers in question discussions as the number of comments increases. When the number of comments increases, a higher proportion of questions have askers and answerers participating. Fig. 7 shows the trend of the proportion of askers and answerers participating in question discussions as the number of comments increases. When there are at least 6 comments associated with a question (i.e., when Stack Overflow starts to hide additional comments), askers are present in at least 90.9% of the question discussions and answerers are present in at least 51.3% of the question discussions. Moreover, when answerers are present in a question discussion, 78.0% (i.e., 2.2 million) of the answerers and 79.8% (i.e., 1.2 million) of the accepted answerers joined the question’s discussions before posting the answers. The increasing proportion and early engagements of answerers in question discussions suggest that users are actively leveraging the question discussions as a communication channel to facilitate the answering of questions.

Figure 7: The proportion of question discussions with the participation of askers and answerers

RQ2 Summary: 1.9 million (i.e., 20.0%) users on Stack Overflow have participated in question discussions. These users overlap heavily with users who asked and answered questions on Stack Overflow. In Qd/a, 57.7% of the questions have the asker participating in the question discussion and 33.9% of the questions have an answerer participating in the question discussion. The proportion of questions with askers and answerers participating in question discussions increases as the number of comments increases. When at least 6 comments are present, more than 90.9% of the discussions have askers participating and more than 51.3% have answerers participating. In 78.0% of Qd/a (79.8% of Qd/aa), the answerer (accepted answerer) participated in the question discussion before they posted the answer (accepted answer).

4.3 RQ3: How do question discussions affect the question answering process on Stack Overflow?


Motivation: On Stack Overflow, questions serve as a starting point for curating crowdsourced knowledge. To encourage users to ask high-quality questions, in late 2019 Stack Overflow modified its reputation system to reward more reputation points on upvotes for questions, increasing the points rewarded from 5 to 10 (https://stackoverflow.blog/2019/11/13/were-rewarding-the-question-askers/). As noted previously, a question can have several follow-up answers; also, discussions can be associated with either the question or its answers. Questions (and answers) may be edited and revised by their original author, and this happens commonly. (Comments may be deleted by their author, but they may not be edited in place.) Such edits may be made to reflect new knowledge learned through the Q&A process, and to improve the quality of the posts themselves. In practice, some revisions are editorial or presentational in nature, such as fixing typos and formatting content for readability; however, questions are also edited to improve the quality of the crowdsourced knowledge jin2019what. Baltes et al. DBLP:conf/msr/BaltesDT008 observed that comments have a closer temporal relationship with edits than with the creation of posts (i.e., questions or answers); that is, the time difference between comments and post edits is smaller than that between comments and post creations. Typically, this happens for clarification purposes as answers and discussions shed new light on the original problem. For example, sometimes the asker’s question may not include enough technical detail to be easily answered; similarly, the asker may conflate several issues into one posting. In these cases, the asker may seek to clarify the content of their question by adding new context or editing out extraneous details. Also, sometimes new answers emerge to older questions as the accompanying technologies evolve. Thus, it is important to recognize that the question discussions can affect the evolution of the question itself; the question version that appears to a casual reader may have evolved since its original posting.

In this RQ, we study how question discussions are associated with the evolution of questions. More specifically, we study the association between the number of comments and question revisions; we do so to better understand how question discussions affect the evolution of the question content. We also study the association between the number of comments and the answer-receiving-time to explore how question discussions affect the Q&A process.


Approach: To understand how question discussions affect the evolution of questions, we first study the correlation between question discussions and question revisions. Here, we are mainly interested in the scale of question edits, measured as the size of the content change in the question body. We categorize all questions into three groups: questions with no discussions (Qnd), questions with comments (Qdisc), and questions with chat rooms (Qchat). For each question in each group, we calculate the difference in character length between the current version of the question body and its initial version, to investigate how question discussions are associated with changes in the question content over a question’s lifetime.
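The measurement described above can be illustrated with a minimal sketch. It is not the scripts used in the study: the record fields (`n_comments`, `has_chat_room`, `initial_body`, `current_body`) are hypothetical names, and the rule that a chat room takes precedence over comments when assigning a group is an assumption made here for illustration.

```python
from collections import defaultdict
from statistics import median


def length_change(initial_body: str, current_body: str) -> int:
    """Change in question body length (in characters) since the initial version."""
    return len(current_body) - len(initial_body)


def discussion_group(n_comments: int, has_chat_room: bool) -> str:
    """Assign a question to Qnd, Qdisc, or Qchat.

    Assumption: a question with a chat room is counted as Qchat even if it
    also has comments.
    """
    if has_chat_room:
        return "Qchat"
    return "Qdisc" if n_comments > 0 else "Qnd"


def median_change_by_group(questions):
    """Median body-length change per discussion group.

    `questions` is an iterable of dicts with the hypothetical keys
    n_comments, has_chat_room, initial_body, and current_body.
    """
    changes = defaultdict(list)
    for q in questions:
        group = discussion_group(q["n_comments"], q["has_chat_room"])
        changes[group].append(length_change(q["initial_body"], q["current_body"]))
    return {group: median(values) for group, values in changes.items()}
```

A positive value means the question body grew after posting; a value of zero does not rule out edits, since an edit can replace text without changing the overall length.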

To understand how question discussions associate with the speed of question answering, we study the correlation between the number of received comments before answering activities and the answer-receiving-time. Similar to RQ1, here we investigate the answer-receiving-time of two different answering events: the answer-receiving-time for the first answer (i.e., tFA) and the answer-receiving-time for the accepted answer (i.e., tAA). For each question, we compute both tFA and tAA. We then group the questions by the number of received comments before the first answer and accepted answer respectively. Finally, we measure the Spearman correlation spearman1961proof between the number of comments and the median tFA (tAA) for questions with the same number of received comments before the first answer (accepted answer) is posted.
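As a rough illustration of this procedure, the sketch below groups questions by comment count, takes the median answer-receiving-time per group, and computes a Spearman correlation as the Pearson correlation of average ranks. The input format (pairs of comment count and time in seconds) and all function names are assumptions for illustration, not the study's actual tooling.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median


def median_time_by_comment_count(questions):
    """Median answer-receiving-time for each comment count.

    `questions` is an iterable of (n_comments_before_answer, seconds) pairs.
    """
    times = defaultdict(list)
    for n_comments, seconds in questions:
        times[n_comments].append(seconds)
    return {n: median(ts) for n, ts in sorted(times.items())}


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation, computed as Pearson on average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def comment_time_correlation(questions):
    """Correlation between comment count and median answer-receiving-time."""
    medians = median_time_by_comment_count(questions)
    return spearman(list(medians.keys()), list(medians.values()))
```

Because the correlation is computed over one median per comment count rather than over individual questions, comment counts with few questions weigh as much as those with many. This is a property of the design described above, not something specific to the sketch.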


Results: Questions with chat rooms are more likely to be revised than questions without chat rooms, with a median size increase of 97 characters. Questions without chat rooms, on the other hand, do not exhibit a net change in size, although such questions may still receive edits. Thus, the existence of a chat room attached to a question makes it more likely that the question will undergo significant revision. Fig. 8 shows the distribution of questions by the change in question body length after the question is posted, according to different levels of question discussion activity. From the figure, we can observe that while Qnd and Qdisc share the same median and mode of zero characters of change in question body length, a higher proportion of questions with comments receive revisions that lead to an increase in the question body length.

Figure 8: The distribution of the number of questions to the change in question body character length after the question is posted at different levels of question discussion activity

The answering process takes less time in questions with a small to moderate amount of discussion. When there are 8 or fewer comments for Qd/a (5 or fewer comments for Qd/aa), questions receive their first (accepted) answer faster compared to questions with no discussions. The shorter answering time suggests these discussions are beneficial to the questions, and help the questions to get answered in a shorter amount of time. On the other hand, when the number of comments grows larger, questions receive answers more slowly. Overall, the number of comments is strongly correlated with both tFA and tAA. Fig. 9 shows the median tFA and tAA of questions with respect to the number of received comments before their respective answering events. Questions with many discussions also take a longer time to answer. One possibility is that the difficulty of these questions is also higher, therefore requiring more effort by the users to have an extended discussion before the question can be answered. At the same time, for the answer-receiving-time of Qchat, we find that it takes a median of 5,493.5 secs (i.e., 1.53 hrs) and 7,892 secs (i.e., 2.2 hrs) to receive the first answer and the accepted answer, respectively. The answering time follows the same trend: the more discussion, the longer the answering time. The strong correlation between the number of comments that a question receives and the answer-receiving-time suggests a close relationship between question discussions and creating answers. Our findings suggest that after a question is asked, interested users may offer help first in comments when an answer cannot be created immediately. Therefore, they begin the Q&A process by discussing with the asker through commenting. This is also supported by our observations in RQ1 and RQ2, where discussions mainly begin before answering and a high proportion of answerers participate in question discussions.

Figure 9: Median answer-receiving-time with respect to the number of comments that are posted before the answer. The median is only calculated for questions with answers and questions with accepted answers respectively.

RQ3 Summary: Question revisions for Qchat are more likely to lead to larger edits in the question body, with a median increase of 97 characters to the question body. While there is a strong correlation between the number of comments and the answer-receiving-time, the answering process takes less time for questions with a small to moderate amount of discussion compared to questions with no discussion.

5 Implications and Discussions

5.1 Suggestions for future research on question discussions

Question discussions occur at a large scale on Stack Overflow. The collection of comments and chat room messages forms a large corpus that facilitates the Q&A process. Askers and answerers also participate heavily in question discussions, and most of this discussion occurs before the first proposed answer is posted. The prevalence of question discussions and their clear positive effect on questions being resolved earlier suggest that they play a key role in the overall Q&A process; consequently, any empirical study of the Stack Overflow Q&A process has much to gain by explicitly considering question discussions in its modelling.

Question discussions are found throughout all phases of the Q&A process, from before a question is answered to after a question receives its accepted answer, and even after an answer has been designated as accepted by the asker. Discussions in most questions (i.e., 75.4% of Qd/a and 80.0% of Qd/aa) begin before the first answer is received; also, 19.9% of Qd/aa begin after the question receives the accepted answer. Question answering is a continuous process, and the state-of-the-art technical knowledge under discussion is always evolving, which often leads to the updating or obsolescence of information in the posted questions and answers. Therefore, the question discussions throughout different Q&A phases (as shown in Fig. 3) can be used to understand how questions evolve over time. For example, prior studies investigated why questions are not answered Asaduzzaman2013, and the likelihood of code segments posted in questions being compilable Yang:2016:QUC:2901739.2901767; horton2018gistable. To understand the maintainability and quality of questions in general, future research can perform finer-grained studies of question discussions in different Q&A phases.

Researchers have proposed tools to support developers by leveraging Stack Overflow as a knowledge base cai2019answerbot; uddin2020mining; zagalsky2012example. While these tools mine the content of questions and answers to retrieve relevant information for developers, they do not leverage the information that is contained in question discussions. In our study, we observe that question discussions can contribute to the creation of answers, thus leaving a trace of how the answer is created. We hope that future research will investigate the process of creating a Stack Overflow question, and propose new approaches to aid in question quality enhancement by leveraging the interactive information in both question discussions and edits.

Not all questions are the same. The properties of a question that affect how it is answered (such as its difficulty and clarity) can be indicated by its discussions. In our study, we observed that questions with more discussion are answered more slowly. However, despite the positive correlation, questions with a small number of comments (i.e., no more than 8 comments) are answered faster compared to questions with no discussion. While highly discussed questions are answered more slowly, we observe that some of these questions appear to be more difficult to answer or require further clarification. These questions are answered after extended discussions that might involve chat rooms, suggesting a great effort in the answering of these questions. Future work should explore metrics to measure the level of difficulty or need for clarification of a question. Question discussions can be further studied to understand whether a question involves more complex code segments, or was initially ambiguous and later edited for clarity.

5.2 Suggestions for leveraging the question discussion corpus

Stack Overflow uses a gamification system based on reputation and badges to reward users who participate in the Q&A process; for example, upvotes on questions and answers reward the original poster with reputation points. However, at present upvotes for comments do not boost the reputation of the commenter, so the system does not currently reward participation in discussions (see https://meta.stackexchange.com/questions/17364/). Since so much effort is put into discussions, as evidenced by the presence of 33.5 million comments and 1.0 million chat messages in the 2018 data dump, this seems like a missed opportunity. Stack Overflow could reward those users who, through their participation in discussions, help to clarify, explore, and otherwise improve the questions and answers themselves; our studies here have shown just how influential question discussions can be on improving the quality of the questions and answers. Rewarding participation in discussions would create a positive feedback loop in the Stack Overflow gamification system, which would in turn encourage more users to engage in discussions.

Stack Overflow’s overwhelming success with the international software development community is due largely to the high quality of its content, in the form of questions and answers with accompanying discussions. However, maintaining the quality and relevance of such a large knowledge base is a challenging task; a recent study found that low quality posts hurt the reputation of Stack Overflow 6976134. Because programming technologies evolve quickly, the detailed information in the questions and answers can become obsolete 8669958 and requires continual updating. For this reason, Stack Overflow allows users to edit questions and answers even after a clear consensus has arisen.

A good piece of shareable knowledge starts with a good question, and Stack Overflow has practices to help ensure high quality questions. For example, when novice users (i.e., users with newly registered accounts) first ask questions, they are led through an interactive guide on how to ask a good question. The guide includes both conventions (e.g., tag the question) and best practices for asking questions (e.g., include what has been attempted to solve the question).

In exploring RQ3, we observed that questions with extended discussions — especially those that continue into a chat room — tend to receive more edits to the question body. We conjecture that question discussions can serve as a feedback loop for the asker, resulting in improvements to the questions through subsequent edits. Our observation also echoes a previous study which shows that tutoring novice users before they post their questions can improve the quality of their question ford2018we. Although Stack Overflow already has a detailed walkthrough on how to ask a good question, we observed that in practice, discussing and revising questions remains commonplace. The discussions and revisions suggest a large effort by the community in addition to providing answers.

We also found that there was a strong correlation between the amount of question discussion and the answer-receiving-time for both the first answer and the accepted answer. In other words, questions with more discussions tend to receive answers more slowly. Questions with more discussions are more likely to have the asker and answerers participating in the discussion. These observations suggest that askers and answerers are spending time together in the question discussions, which aids in the creation of eventual answers. At the same time, crowdsourced Q&A is a labor intensive process; for example, a question may take time to attract the “right” answerers or a question may be hard to understand without clarification. We wonder if a question quality assurance “bot” might be able to leverage the question discussion data and mine the discussion patterns to further support askers in efficiently getting answers through crowdsourced Q&A.

Question discussions offer a means for askers and answerers to communicate with each other during the Q&A process. Currently, chat rooms are triggered automatically once three back-and-forth comments occur between two users. However, there are cases where two users may wish to start a live conversation immediately. For example, traditionally in the open source community, it is suggested to ask urgent questions in an IRC channel to receive an immediate response raymond2021how. However, when users do so, the information exchanged during the Q&A session will be buried in the IRC chat log. On the other hand, if a user were to ask the question on Stack Overflow, in exchange for not having an instant response, the Q&A information will remain easily accessible by the public. While Stack Overflow already offers chat rooms as a means for instant and real-time communication, the current mechanism of triggering a chat room through posted comments is an inefficient communication channel for such needs. There exists a potential for users to choose between a synchronous or asynchronous discussion through chat rooms or comments, respectively. For example, Stack Overflow could build in a feature that allows users to indicate if they are available online and are waiting for an answer. When other users see the indicator, they could directly start discussions in chat rooms, and later update the content of the question based on the discussion. An intelligent communication channel selection bot could be designed to help users seek an effective type of communication by mining the historical data of communication preferences. Furthermore, a content summarization tool could be designed to extract pertinent information from both comments and chat rooms, for future users to better understand the context of the evolution of a question.

6 Threats to Validity

External validity: Threats to external validity relate to the generalizability of our findings. In our study, we focus on question discussions on technical Q&A on Stack Overflow, which is the largest and most popular Q&A platform for programming related questions. As a result, our findings may not generalize to other Q&A platforms (e.g., CodeProject, https://www.codeproject.com/, and Coderanch, https://coderanch.com/). To mitigate this threat, future work can consider studying more Q&A platforms.

Another threat is that the studied Stack Overflow data dump contains only the current copy of Stack Overflow’s website data. For example, users are allowed to delete their comments, answers, and questions. This means that when users delete their comments, the comments are expunged from the dataset, and we are unaware of how those comments might have affected the rest of the discussion.

Internal validity: Threats to internal validity relate to experimental errors and bias. Our analysis is based on the data dump of Stack Overflow from December 2018 (the comment dataset) and web crawling in June 2019 (the chat room dataset). Stack Overflow as a dynamic platform is subject to change and the data itself can evolve. Future work can assess our observations on new data and evaluate whether our findings continue to hold over time.

Construct validity: Since the Stack Overflow data dump does not include chat room-related data, we mined that data directly from the Stack Overflow website. This means that our crawler and the collected data may be subject to errors (e.g., crawler timeout). We mitigate this issue by manually checking a subset of the collected data and verifying the correctness of the scripts.

7 Related Work

7.1 Leveraging Discussions in Software Engineering

During software development, communication between members of the team is important for the long-term success of the project. Online discussions are a core part of the process, especially in open source projects where developers may be scattered around the world and rely on a variety of channels to communicate with each other 7498605. Since the advent of ubiquitous e-mail in the 1980s, developers have used mailing lists for discussions about the projects they are working on and interested in. Studies show that the use of mailing lists facilitates the gathering of people with similar interests, and many open source projects still run mailing lists today 5069488 (e.g., the Gnome mailing list, https://mail.gnome.org/mailman/listinfo). The mailing list archive is an informative resource for researchers to understand the development of the project. Rigby et al. rigby2007what studied the Apache developer mailing list to learn about the personality traits of developers and how the traits shift during the development of the project. Sowe et al. sowe2006identifying studied three Debian mailing lists and constructed social networks of the mailing lists to investigate how knowledge is shared between expert and novice participants.

In addition to asynchronous e-mail exchanges, developers also use real-time communication channels such as IRC for discussions. IRC channels are often used by open source projects as a complement to their mailing list operations (e.g., the #emacs channel on Freenode exists in addition to the project’s mailing list). Shihab et al. investigated GNOME GTK+ 5069488; shihab2009studying and Evolution shihab2009studying IRC channels to better understand how developers discuss in IRC. Although e-mail and IRC are still in use today, newer and more efficient platforms have also emerged to better support the need for communication. For example, developers report bugs and feature requests on issue trackers (e.g., Jira, https://www.atlassian.com/software/jira), and ask questions on Stack Overflow vasilescu2014how. Vasilescu et al. vasilescu2014how observed that in the R community, developers are moving away from the r-help mailing list to sites like Stack Overflow in the Stack Exchange network since questions are answered faster there. Prior studies examined different communication channels aiming to better understand and improve the communication among developers. Alkadhi et al. Alkadhi:2017:RDC:3104188.3104240 applied content analysis and machine learning techniques to extract the rationale from chat messages to better understand the developers’ intent and the decision making process during software development. Lin et al. Lin:2016:WDS:2818052.2869117 studied the usage of Slack by developers and noticed that bots are used in discussions to help software developers.

Storey et al. 7498605 surveyed how developers leveraged communication channels and observed that real-time messaging tools and Q&A platforms such as Stack Overflow are essential for developing software. Dittrich et al. 6063155 studied developers’ communication across different platforms and observed that real-time messaging plays a role in the communication of developers. Their study shows that real-time messaging tools can support the usage of other communication channels (e.g., Skype calls) and provide a means for developers to form social and trust relationships with their colleagues. Chatterjee et al. Chatterjee:2019:ESS:3341883.3341961 analyzed characteristics of Q&A sessions in Slack and observed that they cover the same topics as Stack Overflow. Wei et al. WeiAutomating applied neural network techniques on real-time messages to automatically capture Q&A sessions. Ford et al. ford2018we experimented with using real-time chat rooms to mentor novice users in asking questions on Stack Overflow. Chowdhury et al. Chowdhury:2015:MSF:2820518.2820577 leveraged information from Stack Overflow to create a content filter that effectively filters out irrelevant discussions in IRC channels.

In our study, we focus on question discussions on Stack Overflow to better understand how they facilitate the Q&A process.

7.2 Understanding and Improving Stack Overflow

Prior research investigated how developers leverage Stack Overflow and studied different mechanisms aiming to improve the design of Stack Overflow Xia:2013:TRS:2487085.2487140; chen2018data; ZhouBounties; 8485395; ford2018we. Treude et al. Treude:2011:PAA:1985793.1985907 categorized the types of questions on Stack Overflow, and observed that Stack Overflow can be useful for code review and learning the concepts of programming. Wang et al. 8485395 studied the edits of answers and observed that users leverage the gamification system on Stack Overflow to gain more reputation points. Prior studies also aimed to understand the quality of the crowdsourced knowledge on Stack Overflow. For example, Srba et al. 7412622 observed that an increasing amount of content with relatively lower quality is affecting the Stack Overflow community. Lower quality content on Stack Overflow may also affect how questions are answered. By studying unanswered questions on Stack Overflow, Asaduzzaman et al. Asaduzzaman2013 showed that the quality of a question plays an important role in whether it receives an answer. An automated system to identify the quality of posts and filter low-quality content was proposed by Ponzanelli et al. 6976134. To improve the quality of the crowdsourced knowledge on Stack Overflow, prior studies aimed to identify artifacts with different properties wang2018understanding; ragkhitwetsagul2019toxic; 7365804; tian2013towards; vasilescu2014how; ye2017structure; ZhouBounties. For example, Nasehi et al. 6405249 examined code examples on Stack Overflow and identified characteristics of effective code examples. Their study shows that explanations for code examples have the same importance as code examples. Yang et al. Yang:2016:QUC:2901739.2901767 analyzed code snippets of popular languages (C#, Java, JavaScript, and Python) on Stack Overflow and examined their usability by compiling or running them. Zhang et al. 8669958 conducted an empirical study to understand answer obsolescence on Stack Overflow.

Prior studies also examined various supporting processes on Stack Overflow to better understand its operation and improve the efficiency of its crowdsourced knowledge sharing process. Chen et al. chen2018data used a convolutional neural network (CNN) based approach to predict the need for post revisions to improve the overall quality of Stack Overflow posts. Several studies proposed approaches to automatically predict tags on Stack Overflow Xia:2013:TRS:2487085.2487140; 6624009; beyer2015synonym. Wang et al. Wang:2014:EET:2705615.2706107; Wang:2018:EET:3211160.3211174 proposed an automatic recommender for tags based on historical tag assignments to improve the accuracy of the labeling of tags for questions.

Instead of the extensively studied artifacts on Stack Overflow (e.g., questions, answers, tags), we investigate the question discussions by an empirical study of 33.5 million comments and 1.0 million chat room messages to understand how discussions can facilitate the Q&A process.

8 Conclusions

Question discussions are an integral part of the Q&A process on Stack Overflow, serving as an auxiliary communication channel for many developers whose technical information needs are not fully met within their nominal work environment. Question discussions occur throughout all phases of the Q&A process, especially before questions are answered. In 75.4% of Qd/a and 80.0% of Qd/aa, the question discussions begin before the first answer and the accepted answer are posted, respectively; furthermore, 19.9% of the question discussions begin even after the accepted answer is posted. Question discussions allow askers and potential answerers to interact and solve the question before posting an answer. In Qd/a, askers participate in 57.7% (i.e., 6.0 million) of the question discussions and answerers participate in 33.9% (i.e., 2.8 million) of the question discussions. As the number of comments increases, a higher proportion of questions have askers and answerers participating in their discussions. Moreover, while the answer-receiving-time of a question is strongly correlated (by Spearman correlation) with the number of comments a question receives before its first answer, questions in Qhd/a are answered faster compared to questions with no discussion. We believe that our study of question discussions can be leveraged in several ways to improve the Q&A process. For example, an automated triaging system could suggest an appropriate communication channel; also, bots could be designed to warn about questions that seem unclear and might require further clarification.

References