An Empirical Study on the Characteristics of Question-Answering Process on Developer Forums

09/05/2019 ∙ by Yi Li, et al. ∙ New Jersey Institute of Technology ∙ The University of Texas at Dallas

Developer forums are among the most popular and useful Q&A websites on API usage. The analysis of API forums can be a critical resource for automated question-answering approaches. In this paper, we empirically study three API forums, Twitter, eBay, and AdWords, to investigate the characteristics of the question-answering process. We observe that over 60% of the questions were answered by providing API method names or documentation. Over 85% of the questions were answered by API development teams, and the answers from API development teams drew fewer follow-up questions. Our results provide empirical evidence for future work on building automated solutions to answer developer questions on API forums.


I Introduction

Application Programming Interfaces (APIs) have become the backbone of modern software development. Using APIs accurately and effectively is critical in software development [uddin2017automatic]. Recently, developer question-and-answer (Q&A) websites have become popular and essential online resources that developers use to seek solutions on API usage, to share and learn knowledge about using APIs, and even to discuss the design of APIs [mamykina2011design].

Recently, two main types of developer Q&A websites (DQA) have become popular. The first type is the general-purpose Q&A website, for example, Stack Overflow [stackoverflow], which takes any question relevant to any API. The second type is the API Q&A forum maintained by a library's provider, e.g., Twitter [twitter], which accepts only questions relevant to the APIs of specific libraries. The main differences between the two DQAs can be summarized as follows: (1) Typically, an API forum is run by a library's provider and has members of the development team answer the questions relevant to the APIs of the library. Developers tend to ask API-specific questions on API forums, whereas Stack Overflow tends to deem valid questions that are specific to a particular library off-topic [squire2015should]. The API development teams on API forums can offer fast and right-to-the-point responses to API-specific questions [venkatesh2016client]; and (2) Typically, the general-purpose DQA provides incentives to improve the credibility of the responders and their public answers [wang2018users]. The API Q&A forums often do not allow developers to modify others' questions or answers. Due to those major differences, it is necessary to help API development teams answer more questions and provide high-quality, right-to-the-point answers on API forums.

Extensive research, such as [zhang2019empirical, calefato2018ask, calefato2019empirical, li2018learning], has been devoted to studying one of the most popular DQA websites, Stack Overflow (SO). However, despite the importance of API forums, little research has focused on API-specialized forums. In this paper, we set out to investigate the process of question-answering on API-specialized forums. We empirically studied three popular API Q&A forums, Twitter [twitter], eBay [ebay], and Google AdWords [AdWords], to answer the following research questions:

RQ1. How are the questions answered?

In this RQ, we want to study how a question is answered by developers on an API forum. Our results indicate that the majority of the questions were answered by providing API method names (or sometimes links to API documentation).

RQ2. Who answers the questions?

As on general-purpose DQA websites, any developer can answer a question on API forums. However, we observe that the majority of the studied questions were answered by API development teams.

RQ3. What is the quality of answers?

Our further analysis of the answered questions on API forums shows that the answers from API development teams drew fewer follow-up questions than the answers from other developers.

II Empirical Study Design

Our overall goal is to understand the process of question-answering on API forums.

Data Collection and Processing. We conduct an empirical study on three popular web API Q&A forums, i.e., Twitter, eBay, and AdWords, to investigate the basics of developer API Q&A forums and to motivate our study with the findings. We crawled all of the questions and their answers from each of these developer API Q&A forums (last accessed in April 2018). Table I shows that over 50% of the questions on each forum are not answered.

Category                           Twitter    eBay     AdWords
Total # of questions               16,874     6,204    23,731
# of questions with answers        8,910      3,524    12,364
# of questions without answers     7,964      2,680    11,367
% of questions without answers     52.8%      56.8%    52.1%
Table I: Statistics of Each Developer API Q&A Forum.

Analysis Approaches for RQs.

We analyze the answered questions of each forum to study who answers the questions and how they are answered. Using a 95% confidence level with a 5% confidence interval, we randomly select 368, 358, and 374 questions from the 8,910, 5,231, and 14,245 questions having answers on Twitter, eBay, and AdWords, respectively. After selecting the questions, we manually study each question and its answers to classify the question based on how it was answered.
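The text does not say which formula produced these sample sizes. As an illustration only, the sketch below uses the standard Cochran formula with a finite population correction, assuming maximum variability (p = 0.5) and z = 1.96 for a 95% confidence level; under those assumptions it reproduces the three reported values. The function name and its default parameters are hypothetical and chosen for this example.

```python
def sample_size(population: int, z: float = 1.96, margin: float = 0.05,
                p: float = 0.5) -> int:
    """Cochran sample size with finite population correction.

    Assumptions (not stated in the paper): z = 1.96 for 95% confidence,
    p = 0.5 for maximum variability, result rounded to the nearest integer.
    """
    # Sample size for an infinitely large population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Shrink the sample size for a finite population.
    n = n0 / (1 + (n0 - 1) / population)
    return round(n)


# Populations of answered questions as reported in the text.
for forum, answered in [("Twitter", 8910), ("eBay", 5231), ("AdWords", 14245)]:
    print(forum, sample_size(answered))
# Twitter 368
# eBay 358
# AdWords 374
```

Rounding to the nearest integer (rather than rounding up) is the choice that matches the reported numbers; rounding up would give 369 for Twitter.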

There can be many metrics for measuring answer quality. For simplicity, in this Late Breaking Results paper, we use the number of follow-up questions on an answer as one indicator of the quality of an answer in the analysis of RQ3.
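One way to turn this indicator into a comparable number is to compute, for each responder group, the share of answered questions that drew at least one follow-up question. The sketch below is a minimal illustration under that assumption; the `AnsweredQuestion` record, its field names, and the toy data are hypothetical and do not come from the studied forums.

```python
from dataclasses import dataclass


@dataclass
class AnsweredQuestion:
    answered_by_team: bool    # hypothetical manual label: API team or other developer
    follow_up_questions: int  # number of follow-up questions on the answer


def follow_up_rate(questions):
    """Share of questions per responder group with at least one follow-up."""
    totals = {"team": 0, "other": 0}
    with_follow_up = {"team": 0, "other": 0}
    for q in questions:
        group = "team" if q.answered_by_team else "other"
        totals[group] += 1
        if q.follow_up_questions > 0:
            with_follow_up[group] += 1
    # Guard against empty groups to avoid division by zero.
    return {g: (with_follow_up[g] / totals[g] if totals[g] else 0.0)
            for g in totals}


# Toy data for illustration only.
toy = [
    AnsweredQuestion(True, 0), AnsweredQuestion(True, 0),
    AnsweredQuestion(True, 2), AnsweredQuestion(True, 0),
    AnsweredQuestion(False, 1), AnsweredQuestion(False, 0),
]
rates = follow_up_rate(toy)
print(rates)  # {'team': 0.25, 'other': 0.5}
```

A lower rate for a group is read here as a sign that its answers more often resolved the question without further clarification.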

Figure 1: Eight Categories of Questions.

Results of RQs. (RQ1.) The majority of the questions were answered by providing API method names (or sometimes links to API web pages). Figure 1 shows the 8 categories of questions that we identified based on how they were answered. For example, API Docs indicates the percentage of the studied questions that were answered using an API document; on average, 63% of the questions were answered using API documentation. Code Fixing indicates the percentage of the questions that were caused by code errors and were not relevant to the usage of APIs. Refer to External Website covers questions that cannot be answered by API documents and need information from external websites.

(RQ2.) The majority of the studied questions were answered by API development teams. Figure 2(a) shows that about 84-87.7% of the questions were answered by API development teams on the three studied forums, while other developers answered only a small portion of the questions.

(RQ3.) The answers from API development teams drew fewer follow-up questions than the answers from other developers. For example, Figure 2(b) shows that only around 13% of the questions answered by the Twitter API development team drew follow-up questions, while about 65% of the questions answered by other developers drew follow-up questions. The other API forums show the same pattern.

Figure 2: (a) Comparison between the Percentage of Questions Answered by API Development Team and Other Developers. (b) Comparison between the Percentage of Questions Answered by API Development Team and Other Developers, Receiving Follow-up Questions.

III Discussion

Our three preliminary RQs show that it is important to assist API development teams in answering developer questions: on average, over 85% of the questions were answered by API development teams, and fewer follow-up questions were raised after an answer was provided by an API development team. Furthermore, on average, over 60% of the questions were answered using API document links or API method names directly, suggesting that recommending relevant API documents to answer a question can be very useful.

IV Threats to Validity

Manual Analysis of API forum posts. During the labeling process, most answers clearly show the relevant APIs. However, some answers contain outdated links to API documents, which makes it very difficult to determine the relevant APIs. We discard such answers to minimize the bias. Although this kind of manual labeling is common in question-answering research, the process may introduce bias into our results since the authors of this paper are not from the API teams.

Selection of API forums. There are many API forums for different websites. In our research, we focused on only three very popular API forums: Twitter, eBay, and AdWords. Thus, we cannot claim that our approach is generic for all API forums. However, the key drivers of our approach outperforming the baselines are general across forums.

V Related Work

There has been extensive research devoted to analyzing Stack Overflow, such as analyzing obsolete answers [zhang2019empirical], proposing guidelines for writing questions [calefato2018ask], discussing best-answer prediction models [calefato2019empirical], and learning to answer SO questions [li2018learning].

VI Conclusion

Our empirical results show that it is necessary to build automatic solutions to help API development teams answer developer questions. We plan to conduct further analysis of API forum posts and eventually propose solutions to automatically answer developer questions on API forums.

References