The knowledge-intensive nature of current-day software engineering means that software developers are continually in search of knowledge. A popular model for knowledge sharing on the Internet is the community question answering site, with Stack Overflow serving as the de facto forum for most programmers. On Stack Overflow, registered users can post questions, answer posted questions, and comment on questions and answers by other users, all of which can then be viewed by anyone. As of March 2019, Stack Overflow archives 18M questions, 26M answers, and 87M comments. At this scale, Stack Overflow constitutes a major information broker between posters, contributors, and so-called “lurkers” (non-contributing readers).
Stack Overflow, however, does not exist in isolation: the site is only one of many sources of programmer knowledge in a software documentation ecosystem. Past research has extensively characterized the strengths and weaknesses of Stack Overflow (e.g., good at “how-to” documentation, bad at completeness) compared to other sources, such as API documentation (e.g., good at structure, bad at scenarios). Given these complementary strengths and weaknesses of different sources in the documentation ecosystem, it is only natural that links exist from one source to another. In fact, a preliminary study found that link sharing is a significant phenomenon on Stack Overflow, that Stack Overflow is an important resource for disseminating software development innovation, and that it is part of a larger interconnected network of online resources used and referenced by developers.
Stack Overflow explicitly encourages the inclusion of links to external resources in answers, but requests that users add context so that “fellow users will have some idea what it is and why it’s there. Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline”. This advice is overly general. Not all link targets need to be quoted, and in some cases, the context for a link is obvious. However, deciding when and how to include links to other documentation sources in Stack Overflow posts requires differentiating common linking practices and understanding their unique characteristics.
With the general goal of helping to improve the efficiency of information flow on Stack Overflow, we conducted a multi-case study to answer the question of how and why documentation is referenced in Stack Overflow threads. We sampled 759 links from two different domains, Java regular expressions and Android development, classified and qualitatively analyzed them, and then used the resulting data to derive association rules and build logistic regression models to identify properties of Stack Overflow questions that attract links to documentation resources.
Our main findings include that links on Stack Overflow serve widely diverse purposes, ranging from simple pointers to API documentation, through links to concept descriptions on Wikipedia, to suggestions of software components and background readings. This purpose spectrum allows us to modulate Stack Overflow’s requirement to add context for links. We also find that links to documentation resources reflect the information needs typical of a technology domain, with significant differences between the two domains in our multi-case study.
Our main contributions are: (1) a framework and method to analyze the context and purpose of documentation links on Stack Overflow, (2) a public dataset with 759 annotated links that other researchers can use, and (3) a description of five major observations about linking practices on Stack Overflow, with detailed links to evidence, implications, and a conceptual framework to capture the relations between the five observations.
The remainder of this paper is structured as follows: We provide motivating examples in Section 2 and outline our study design in Section 3. Section 4 describes our method for link classification and sampling, and Sections 5 and 6 describe our qualitative and quantitative analyses, respectively. Section 7 presents the major findings derived from these analyses, and Section 8 describes threats to validity. Section 9 reviews related work before we conclude the paper in Section 10.
2 Examples of Linking on Stack Overflow
When considering the potential value of links on Stack Overflow, the best case scenario is the recommendation of specific information relevant to the thread (links are in bold):
…have a look at Greedy, Reluctant, and Possessive Quantifiers section of the Java RegEx tutorial… 
In this case, a contributor provided a comment to point the original poster to a section of a tutorial introducing the concept of regular expression quantifiers and explaining how to use them. These “ideal” links provide clear added value to the thread, and form a type of information that can even be automatically mined to improve information discovery.
However, the reality of linking practices goes broadly beyond this expected scenario. For example, links to obvious documentation resources can be introduced defensively by the original poster themselves, to avoid having a question downvoted:
Other links simply bind a reference to library classes to its documentation, even for well-known, pervasive classes:
When you want to return more than one result, you need to return an array (String) or a Collection like an ArrayList, for example.
From the point of view of links as mechanisms to increase the flow of valuable software development knowledge, degenerate practices include providing links to comic strips (such as xkcd) and similar sites:
…reminds me of this xkcd
And, possibly one of the most feared and resented pieces of information on the site, the inclusion of the link to a famous placating blog post:
I like to refer [you] to whathaveyoutried.com…
As these examples show, linking practices on Stack Overflow are diverse and the intrinsic value of a link as a carrier of relevant technical information is not uniform. The first example link, to a specific section of a tutorial, has an obvious purpose and value. The link to a comic strip is clearly noise. Between these extremes lies a gray zone where links play different roles in different contexts.
3 Study Design
To investigate how and why documentation resources are referenced in Stack Overflow threads, we conducted a mixed-methods study involving a qualitative analysis of 759 links from 742 different threads and a quantitative analysis using association rule mining and logistic regression models.
The overall goal of the study is to discover the roles that links to documentation play in Stack Overflow threads and thus pave the way for a more systematic treatment of documentation references on Q&A sites for software developers. We split our research questions into two sub-questions:
What is the context around documentation links in Stack Overflow threads? With this question we study how links are provided.
What is the purpose that documentation links in Stack Overflow threads serve? With this question we study why links are provided.
With these questions, our aim was to collect specific insights about linking practices on Stack Overflow that can support actionable implications for authors and readers of Q&A forums and for the development of technology based on the analysis of such forums.
Our first research question was motivated by the fact that Stack Overflow encourages users to provide context for links, in particular by quoting external sources. We qualitatively analyzed whether users follow this advice (see Section 5), but we also built logistic regression models capturing different features of Stack Overflow posts to quantitatively analyze which of those features are related to the presence of documentation links (see Section 6).
As the examples in Section 2 illustrate, links on Stack Overflow serve diverse purposes. To conduct a structured analysis of those purposes, we first built a classifier that was able to identify links to the most frequently referenced documentation resources (see Section 4). Based on a stratified sample of documentation links identified using the classifier, all three authors independently coded the purpose of 759 links using a jointly developed coding guide (see Section 5). We mined the resulting data for association rules between documentation resources and assigned purposes and then used our qualitative and quantitative results to corroborate five major findings about linking practices on Stack Overflow (see Section 7).
Cases Studied: Regex and Android
Because even a cursory inspection of Stack Overflow threads shows clear differences in the use of references to external documentation, we structured our research as a multi-case study of linking practices for two different domains: use of regular expressions in Java (Regex), and Android development (Android). We bounded our investigation to clearly-defined domains to support a richer analysis of linking practices in the context of the wider documentation ecosystem they integrate. We selected Regex and Android because they constituted two very different domains (library vs. framework, small vs. large, included in the programming language vs. third-party, theoretically vs. practically grounded), and because we were familiar with both technologies. The importance of this latter aspect is not to be underestimated as a contributor to the meaningfulness of qualitative data analysis.
Overview of the Research Process
Despite the ready availability of structured data from Stack Overflow, generating reliable insights about linking practices requires an extensive combination of analytical processing and manual inspection. Figure 1 outlines the general process. The research proceeded sequentially: we first completed an entire iteration for Regex (referenced as number 1 on the figure), and then repeated the process for the second case (Android), referenced as number 2.
In the following description, numbers refer to the step in the process overview (indicated after the period in Figure 1).
The first step was to retrieve all Stack Overflow threads related to each case (.1). For this purpose we utilized the SOTorrent dataset. For the Regex case, we retrieved all threads with tags java and regex, and for the Android case, the threads with tags java and android. For each case, we used the most recent release at the time (2018-05-04 for the Regex case and 2018-07-31 for the Android case).
The second step was to process the links to determine what they were linking to, and to abstract the target of the links to one of a small set of documentation resource categories (e.g., links to other Stack Overflow threads vs. links to API documentation). We built a URL mapper to classify links to such documentation resources using the 25 most frequently referenced root domains for each case (Section 4 and .2 in Figure 1).
The classification of links was necessary to create a stratified sample for detailed analysis, i.e., a sample guaranteed to contain links to all different types of resources. The third step was then to draw samples containing links to all identified documentation resources and qualitatively analyze their context and purpose (see Section 5 and .3 in Figure 1). This step involved extensive manual inspection and labeling of links in their context.
To support the complete replicability of this process and the verification of the results presented in this paper, we provide our coding guide, samples, and the analysis and data retrieval scripts as supplementary material.
4 Link Classification and Sampling
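Table I: Most frequently referenced root domains in the Regex case (excerpt)
|Domain||#Posts (%)||Resource Categories|
|oracle.com||4,316 (19.8%)||JavaAPI, JavaReference,|
Table II: Most frequently referenced root domains in the Android case (excerpt)
|Domain||#Posts (%)||Resource Categories|
|android.com||42,199 (23.7%)||AndroidAPI, AndroidReference|
|google.com||11,924 (6.7%)||AndroidIssue, AndroidReference,|
Table III: Documentation resource categories and number of links per case (header only)
|Resource Category||#Links in Regex||#Links in Android|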
Links on Stack Overflow may point to resources other than documentation, e.g., tools or images. To be able to study links to documentation resources on Stack Overflow, we built a URL-based classifier that takes as input a link and outputs either one of 12 documentation resource categories that best describes the target of the link, or marks the link as NotDocumentation (see Table III). We used this classifier to categorize all links in the two cases and then sampled links from each category of documentation links for our qualitative analysis.
Building the Classifier
We built the link categorization and corresponding classifier following a grounded, iterative approach.
First, we ranked all referenced root domains according to the number of posts in which they are referenced (the root domain of en.wikipedia.org, for example, is wikipedia.org). Starting with the most frequently referenced root domain, we inspected the extracted links and either decided that they form a new resource category or assigned them to an existing one. We then built regular expressions matching the paths of the domains that point to documentation resources. After integrating those regular expressions in our link classifier, we executed the classification and analyzed the links to the current domain that were not matched yet. We then refined the regular expressions and repeated the process until all links to documentation resources were matched. This process was performed by two authors who continuously discussed the emerging resource categories and associated regular expressions. All decisions in the process were made unanimously. The source code of the classifier, including the regular expressions for all documentation resources, is available on GitHub (https://github.com/sbaltes/condor) and archived on Zenodo.
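The ranking step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' classifier: the function names, the `(post_id, url)` input shape, and the naive "last two host labels" definition of a root domain are assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def root_domain(url: str) -> str:
    """Naive root domain: the last two labels of the host name.

    Assumption: sufficient for hosts such as en.wikipedia.org; hosts under
    multi-part public suffixes (e.g. example.co.uk) would need a suffix list.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def rank_root_domains(links):
    """Rank root domains by the number of distinct posts referencing them.

    `links` is an iterable of (post_id, url) pairs (hypothetical input shape).
    """
    posts_per_domain = defaultdict(set)
    for post_id, url in links:
        domain = root_domain(url)
        if domain:
            posts_per_domain[domain].add(post_id)
    counts = {domain: len(posts) for domain, posts in posts_per_domain.items()}
    # Most referenced first; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input, not data from the study.
sample_links = [
    (1, "https://en.wikipedia.org/wiki/Regular_expression"),
    (1, "https://docs.oracle.com/javase/tutorial/"),
    (2, "https://de.wikipedia.org/wiki/Regular_language"),
    (3, "https://stackoverflow.com/questions/123"),
]
ranking = rank_root_domains(sample_links)
```

Counting distinct posts rather than raw links mirrors the ranking criterion stated in the text; the iterative refinement of regular expressions per domain remains a manual step.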
Table I lists the five most frequently referenced root domains for Regex, together with the number of links to those domains and the assigned resource categories. Table II lists this information for Android.
To provide an example for our classification approach, we briefly describe the path matching for stackoverflow.com, the most frequently referenced root domain in both cases. Because the links to this domain are internal to the Stack Overflow platform, we created a dedicated documentation resource category StackOverflow. However, we did not consider all links to stackoverflow.com to be documentation links. Our classifier uses a whitelisting approach and only matches links to Stack Overflow questions, answers, post revisions, and comments—but not links to user profiles or pages with tips on how to write questions and answers. The regular expressions for the StackOverflow documentation resource all start with:
This prefix is followed by expressions matching the different paths we rated as pointing to documentation resources:
/(a|q|questions)/[\\d]+.* /revisions.* /posts/\\d+/revisions.* /posts/comments.*
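To illustrate the whitelisting idea, the following Python sketch checks a URL against a small set of path patterns. The prefix used here is hypothetical (the original prefix is not reproduced in the text above), and the path patterns are only adapted from the expressions listed above, with Java-style `\\d` written as `\d` in Python raw strings.

```python
import re

# Hypothetical prefix, chosen for illustration only.
PREFIX = r"^https?://(www\.)?stackoverflow\.com"

# Path patterns adapted from the expressions listed above.
DOCUMENTATION_PATHS = [
    r"/(a|q|questions)/\d+.*",   # questions and answers
    r"/posts/\d+/revisions.*",   # post revisions
    r"/posts/comments.*",        # comments
]

WHITELIST = [re.compile(PREFIX + path, re.IGNORECASE) for path in DOCUMENTATION_PATHS]


def classify_stackoverflow_link(url: str) -> str:
    """Return 'StackOverflow' for whitelisted paths, else 'NotDocumentation'."""
    if any(pattern.match(url) for pattern in WHITELIST):
        return "StackOverflow"
    return "NotDocumentation"
```

With this sketch, a link to a question thread is classified as StackOverflow, whereas a link to a user profile or a help page falls through to NotDocumentation, which is the behavior a whitelist is meant to guarantee.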
We repeated the classification process for the 25 most frequently referenced root domains in both samples, which enabled us to classify 78.5% of all active links in the Regex sample and 68.9% of all active links in the Android sample. The ratio of classified active links can be derived from the data in Table III. Because we conducted our analysis of the Android case after the Regex case had been completed, the classifier for Android links was built by extending the Regex link classifier.
Table III shows the documentation resources we extracted for both cases. The resource JavaReference represents official Java documentation except for the Java API documentation, which is represented by JavaAPI. OtherReference, AndroidReference, OtherAPI, and AndroidAPI are analogously defined. AndroidIssue represents links to Android issue descriptions.
Because of the high effort involved in reviewing each link manually, we produced a sample of links to documentation resources for the qualitative analysis. We randomly sampled (up to) 40 links per documentation resource: we selected 20 links from questions (10 from question posts and 10 from question comments) and 20 links from answers (10 from answer posts and 10 from answer comments). Because some documentation resources had insufficient links to fulfill all of those selection constraints, the Regex sample contained only 279 links (and not $8 \cdot 40 = 320$). The Android sample contained $12 \cdot 40 = 480$ links, because we added four additional documentation resources that were only exhibited in that domain (see Table III).
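The sampling scheme (up to 10 links for each of the four post types, per documentation resource) could be implemented roughly as follows. The dictionary keys, the post-type labels, and the fixed seed are assumptions made for this sketch and are not taken from the authors' scripts.

```python
import random
from collections import defaultdict

POST_TYPES = ("question", "question_comment", "answer", "answer_comment")
PER_STRATUM = 10  # 4 post types x 10 links = up to 40 links per resource


def stratified_sample(links, seed=0):
    """Draw up to PER_STRATUM links per (resource, post type) stratum.

    `links` is a list of dicts with the hypothetical keys
    'url', 'resource', and 'post_type'.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for link in links:
        strata[(link["resource"], link["post_type"])].append(link)
    sample = []
    for key in sorted(strata):
        candidates = strata[key]
        # Small strata contribute all of their links.
        k = min(PER_STRATUM, len(candidates))
        sample.extend(rng.sample(candidates, k))
    return sample


# Toy population: 25 Wikipedia links and 3 OtherAPI links per post type.
links = [
    {"url": f"https://example.org/{r}/{t}/{i}", "resource": r, "post_type": t}
    for r, n in (("Wikipedia", 25), ("OtherAPI", 3))
    for t in POST_TYPES
    for i in range(n)
]
sample = stratified_sample(links)
```

In this toy population, the well-represented resource contributes the full 40 links while the sparse one contributes only 12, which is the same effect that kept the Regex sample below its theoretical maximum.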
5 Qualitative Analysis
We qualitatively analyzed all links in our samples to build a first layer of interpretation of linking practices. Following our research questions, we organized the coding along two dimensions, context and purpose. For context, much information is already available directly in the posts (e.g., the text surrounding the links), so we designed the coding task to complement this information with insights that are impossible to extract automatically, namely, whether the surrounding text includes a quote or a summary of the link target. For purpose, we were interested in producing an abstraction of the purpose of the link as it would appear to a third party reading the corresponding thread.
Development of the Coding Guide
We developed a coding guide by considering the context and purpose dimensions separately.
For the context, creating the coding guide simply amounted to agreeing on what constituted a quote and a summary. The task was thus to indicate, for each link in the sample, true or false as values for the attributes Quote and Summary. The attribute Quote indicates the presence of non-trivial content that has been copied without modification from the linked documentation resource into the Stack Overflow post or comment; the attribute Summary indicates that the Stack Overflow author provided at least one key insight from the linked documentation resource in their own words.
The development of a reliable coding guide for a link’s purpose was much more challenging, and required multiple iterations. In an initial coding phase, we built a coding guide using a subset of the links for Regex. During the initial coding, all three authors coded 80 links in four tasks of 20 each, discussing emerging categories after completing each task, until a stable coding guide emerged. Prior to starting with the Android sample, all three authors coded 50 links and then discussed if changes to the coding guide were required, which only led to one minor addition. Note that, while the codes are not mutually exclusive, the coders always assigned one code that they considered to most accurately describe the link purpose. Table IV lists the codes with a brief description. The full description can be found in the supplementary material. The modification that was required for the Android case was simply to add “watching a video” to the code BackgroundReading, because of the new documentation resource Youtube.
|ATT||Attribution||Link to a resource simply to credit the source for material taken verbatim.|
|AWA||Awareness||Link intended to make readers aware that a certain resource exists, or to provide information about the nature of its content, without necessarily endorsing it.|
|BGR||BackgroundReading||Link to a resource that a user thinks other users should read or watch to get better general knowledge of the topic related to the thread.|
|CPT||Concept||Link to a resource that contains a general description of a concept that the reader should know about.|
|CST||Consulted||Link to documentation to indicate that it was consulted prior to posting.|
|LMN||LinkedMention||Link to the element-level (class, method, field) Javadocs of an API element that is mentioned as part of the text, without more specific indication for the purpose of the link.|
|ONL||LinkOnly||Link that only contains the URL (including anchor text) without any additional information surrounding it.|
|RCM||Recommendation||Link to resources that are landing pages for tools, libraries, API elements, or algorithms, for the purpose of recommending these.|
|REF||Reference||Links to a resource to indicate the source of knowledge for an explicit claim, statement, or information conveyed in the post.|
|OTH||Other||Link whose purpose is other than can be captured by other codes, unclear, or unknown.|
We used the coding guide in a focused coding phase to go over all links in the sample and code them according to the guide, which we provide as supplementary material. All three authors used the coding guide to independently code the links by opening the Stack Overflow thread in a web browser, locating the link, and analyzing the surrounding context.
We coded the links in sets of up to 100 links, computing inter-rater agreement and discussing results after each set to ensure there were no major divergences or misunderstandings of the coding guide. To measure our inter-rater agreement, we calculated a three-way Cohen’s kappa ($\kappa$) for each set. Table V presents the agreement data.
The task of identifying the purpose of a link turned out to be very challenging. In some cases, the purpose can be ambiguous or opaque. The difficulty of the task is reflected in the kappa values. Although the values increase towards the end as we became more proficient, values in the 0.65-0.80 range, while usable, indicate a non-negligible amount of residual flexibility of interpretation. The difficulty of the coding task is the reason we opted for the unusual and very labor-intensive practice of coding every single item in our data set in triplicate. This decision significantly mitigates the threat of bias in the coding task, since we were able to systematically detect links with ambiguous purpose and resolve disagreements through a formal process. After each coding iteration, we merged the purpose codes by selecting the code that at least two investigators used, and assigned the code Other if there was no agreement, which happened for 14 Regex links (5%) and for 13 Android links (2.7%). The binary codes capturing the link context were assigned a value of true if at least two investigators considered the link to be accompanied by a Quote or Summary, respectively.
Tables VI and VII show the frequency of each code per documentation resource for both cases. While our URL mapper was able to detect most invalid or dead links, we still noticed some broken links in the samples (coded as N/A). We also coded links as N/A if they were not rendered on Stack Overflow’s website but were present in the Markdown source of the posts or comments, from which we extracted the links.
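The merging rule for the three independent codings is a simple majority vote. A minimal Python sketch follows; the function names are chosen for this example only.

```python
from collections import Counter


def merge_purpose(codes):
    """Return the code used by at least two of the three coders, else 'Other'."""
    code, votes = Counter(codes).most_common(1)[0]
    return code if votes >= 2 else "Other"


def merge_flag(flags):
    """Binary context code (Quote or Summary): True if at least two coders set it."""
    return sum(bool(flag) for flag in flags) >= 2
```

For instance, the codings (Concept, Concept, Reference) merge to Concept, whereas three different codes merge to Other.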
6 Quantitative Analysis
The qualitative analysis provides the foundation that enabled three quantitative analyses to better understand linking practices:
A systematic comparison of code distributions between our two cases, to relate differences to their context.
The mining of association rules to detect correspondences between a resource type and a link purpose.
The building of logistic regression models, using question features as independent variables, to determine the characteristics of a Stack Overflow question that are related to the features of documentation links in an answer or a comment.
Code Frequency Comparison
Figure 2 shows the relative frequency of the purpose codes we assigned. The bar chart reveals two major differences: the code Awareness was about twice as common in the Android case as in the Regex case (31.0% vs. 16.8%). The reverse was true for the code Concept, which was about twice as common in the Regex case (13.3% vs. 6.3%). Both differences were statistically significant according to a two-tailed Fisher’s exact test.
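For readers who want to retrace this kind of comparison, the sketch below implements a two-sided Fisher's exact test for a 2x2 table using only the Python standard library. The cell counts are not taken from the paper's tables: they are approximations reconstructed from the reported percentages and the sample sizes of 480 (Android) and 279 (Regex) links, so the resulting p-values are illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are at most as likely as the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k under fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)


# Approximate counts reconstructed from the reported percentages (assumption):
# Awareness: about 149 of 480 Android links vs. about 47 of 279 Regex links.
p_awareness = fisher_exact_two_sided(149, 480 - 149, 47, 279 - 47)
# Concept: about 30 of 480 Android links vs. about 37 of 279 Regex links.
p_concept = fisher_exact_two_sided(30, 480 - 30, 37, 279 - 37)
```

Statistical packages offer the same test directly (for example, `fisher.test` in R); the explicit version is shown here only to make the computation transparent.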
Both of these differences can be directly linked to salient aspects of the technological environment of the cases analyzed. The Regex case exhibits twice as many Concept-related links, which can be explained by the theoretical nature of the domain. The links we coded are to concepts such as context-free grammar and regular language. As for Android, the extensive use of links for Awareness purposes can be explained by the huge size of this technology ecosystem, where many users end up posting answers and comments simply to point out relevant resources to each other.
Association Rule Mining
To mine association rules, we transformed the documentation resource categories and purpose codes into binary properties of the links, added the Quote and Summary codes, and then applied the apriori algorithm as implemented in the R package arules (https://cran.r-project.org/web/packages/arules/index.html) to retrieve binary rules.
Tables VIII and IX show the binary association rules between the documentation resource types and the purposes we coded. We note that the maximum support of a rule is limited by the fact that we only sampled up to 40 links per documentation resource. The Regex sample, for example, contained 279 links in total. If a rule is true for all 40 links to one particular resource, the support would still only be $40/279 \approx 0.14$. In our analysis, we considered rules with at least 10% of the maximum possible support, which was $0.1 \cdot 40/279 \approx 0.014$ for the Regex sample and $0.1 \cdot 40/480 \approx 0.008$ for the Android sample. Moreover, we excluded rules with less than 25% confidence, meaning that a rule must be true in at least 1 out of 4 cases, and rules involving the code Other.
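The support and confidence thresholds can be made concrete with a small Python sketch. It does not reproduce the apriori algorithm or the arules package; it only computes the two metrics for a single binary rule on toy data that is not from the study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the binary rule antecedent => consequent.

    Each transaction is the set of binary properties that hold for one link.
    """
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    lhs = sum(1 for t in transactions if antecedent in t)
    support = both / n
    confidence = both / lhs if lhs else 0.0
    return support, confidence


# Toy data (not the study's data): 279 links, 40 of them to Wikipedia,
# 30 of which carry the purpose Concept.
transactions = (
    [{"Wikipedia", "Concept"}] * 30
    + [{"Wikipedia", "Reference"}] * 10
    + [{"OtherAPI", "Recommendation"}] * 239
)
support, confidence = rule_metrics(transactions, "Wikipedia", "Concept")

# Thresholds as described in the text: 10% of the maximum possible support
# (40 of 279 links) and at least 25% confidence.
max_support = 40 / 279
keep_rule = support >= 0.1 * max_support and confidence >= 0.25
```

In this toy example the rule has a confidence of 0.75 but a support of only about 0.11, which shows why the support threshold has to be scaled to the per-resource sampling cap.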
We use the resulting rules to distill the main motivation behind linking to resources of a certain type.
The purpose Concept was clearly associated with the resource Wikipedia, having the highest and second highest confidence in the two samples, respectively. A typical usage scenario was to mention a concept related to the question and then use the first mention of the concept as link anchor pointing to the corresponding page on Wikipedia:
This observation provides a clear characterization of the extent to which Wikipedia is leveraged to avoid defining concepts. The observation directly corroborates that of Vincent et al., who found that “on SO, Wikipedia supports answers in the form of links and quoted text. Answers often use technical terms or acronyms and include a Wikipedia link in lieu of defining these terms.”
A second dominant group of association rules is related to Recommendations, which often pointed directly to the API documentation of a recommended software component. This is represented by the rule OtherAPI $\Rightarrow$ Recommendation in the Regex sample and JavaAPI/OtherAPI $\Rightarrow$ Recommendation in the Android sample.
You could use Apache Commons Lang for that… 
A main use case of Reference documentation was providing readers with pointers to resources for BackgroundReading. This relationship is also reproduced in the association rules we identified, since JavaResources were associated with BackgroundReading in both samples. Moreover, AndroidReference was associated with this purpose in the second sample. An example for BackgroundReading is provided below:
Instead of asking people to code your regular expressions for you, try reading the Java Regular Expressions Tutorial. …docs.oracle.com/javase/tutorial/... 
Other rules for link purposes were not as insightful because they confirmed the definition of our codes rather than indicating a particular linking practice. For example, although StackOverflow $\Rightarrow$ Awareness was a strong rule for both cases, it is hardly surprising that people will link to a Stack Overflow post to make others aware of it.
Regarding the context of links, we only identified one rule for Quote and one for Summary that was present in both samples. The only rule we identified for quotes, Attribution $\Rightarrow$ Quote, had a support of 0.04 in the Regex sample and 0.03 in the Android sample, with a confidence of 1.0 (Regex) and 0.88 (Android). The only rule for summaries, Reference $\Rightarrow$ Summary, had a support of 0.08 (Regex) and 0.07 (Android) with a confidence of 0.7 in both cases.
Generally, quoting content was not very common for documentation links. In the Regex sample, 7.5% of the links referred to content being quoted, in the Android sample only 3.1% (see Table VI). The quoted content ranged from complete code snippets to small parts of the reference documentation. Summarizing linked resources was more common than quoting (17.9% in Regex and 7.1% in Android). However, there was neither a summary nor a quote for 203 Regex (72.8%) and 400 Android links (83.3%), which can be a problem once the links are dead.
To investigate which properties of a Stack Overflow question might explain whether it will attract documentation links, we built separate logistic regression models for the Regex and Android cases.
For each of the two cases (Regex and Android), the input data for the model building were three samples, each containing 100 Stack Overflow threads:
One sample with threads that attracted links to documentation resources. To identify such threads, we relied on our previous classification and randomly selected 100 threads with at least one answer or comment containing a link classified as pointing to one of the documentation resources (see Table III).
One sample with threads that attracted links, but not to documentation resources. We randomly selected 100 threads with at least one answer or comment containing a non-classified or non-documentation link (see Table III).
One sample with threads that did not attract links at all. To draw this sample, we utilized the SOTorrent dataset and selected only threads without any links in answers and comments (no records in tables PostVersionUrl and CommentUrl).
Our data retrieval and sampling scripts are available as part of the supplementary material. Two of the authors independently analyzed all 600 threads to verify that they are indeed representative of the corresponding class. Where we found contradicting evidence (e.g., a link to a documentation resource in the second sample), we excluded those threads and then sampled and analyzed replacements.
In the course of that manual analysis, we also coded the purposes of all non-documentation links. In the Regex sample, the most common purposes of non-documentation links were referring to a (regex) tool (46), source code (19), or websites with posting recommendations (16, e.g., http://whathaveyoutried.com/ or http://sscce.org/). In the Android sample, the most common purposes were linking source code (28), a tool (22, e.g., JSON or XML validators), or an image file (19, e.g., icons or screenshots).
|Feature||Description|
|TitleLength||# of characters in question title|
|TextBlockCount||# of text blocks in question|
|CodeBlockCount||# of code blocks in question|
|LineCountText||# of lines of text in question|
|LineCountCode||# of lines of code in question|
|LengthText||# of characters formatted as text|
|LengthCode||# of characters formatted as code|
|UserAgeWhenPosting||# of days since account creation|
|UserReputation||reputation of user|
|LinkCount||# of links in question|
|LinkSpecificity||0: no link; 1: link to root domain; 2: path present; 3: path contains fragment identifier|
|Tags||tags associated with the question (Regex: 4 features, Android: 3 features)|
|Title||the question title (Regex: 14 features, Android: 2 features)|
|Text||all text in the question body (Regex: 86 features, Android: 69 features)|
|Code||all code in the question body (Regex: 23 features, Android: 118 features)|
Table X shows the features used as independent variables in the logistic regression models. The set of features includes numeric features that can be extracted from the question, such as LengthText or CodeBlockCount. Note that we excluded features that would be unknown at the time when the question was posted, such as how many views the question attracted or its score. We retrieved the data for the features from the SOTorrent dataset, which contains the content of Stack Overflow posts separated into text and code blocks, collects links from posts and comments, and provides the metadata from the official Stack Overflow data dump.
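As an illustration of one of these numeric features, the ordinal LinkSpecificity scale from Table X can be computed from a URL as sketched below. The function name is hypothetical, and the treatment of edge cases (for example, a fragment on a root URL, or a question that contains several links) is an assumption, since the text does not specify it.

```python
from urllib.parse import urlparse


def link_specificity(url):
    """Map a URL (or None) to the 0-3 LinkSpecificity scale of Table X."""
    if not url:
        return 0  # no link
    parsed = urlparse(url)
    if parsed.fragment:
        return 3  # fragment identifier present
    if parsed.path.strip("/"):
        return 2  # path present
    return 1  # link to root domain only
```

A link to a specific method anchor in the Java API documentation would thus score 3, whereas a bare link to docs.oracle.com would score 1.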
For the textual features, shown in the bottom part of Table X, we treated each token as a separate feature and used token frequency as feature values. We separated text into tokens using whitespace, and we removed stopwords (using the “Long Stopword List” from https://www.ranks.nl/stopwords), punctuation, and special characters. All tokens were stemmed using the Porter stemming algorithm. We discarded features consisting of a single character such as a single digit, and we limited the set of features to tokens whose frequency in our dataset exceeded a minimum threshold. We used the goodness of fit (measured using McFadden’s pseudo-R²) to determine the best threshold for each dataset, resulting in a threshold of 15 for the Regex dataset and 22 for the Android dataset. This led to a total of 138 features for the Regex dataset and 203 features for the Android dataset. Table X shows the number of features resulting from each textual property.
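The preprocessing steps described above can be sketched as follows. The sketch is illustrative only: the stopword set is a tiny placeholder rather than the “Long Stopword List”, the `stem` parameter defaults to the identity function where the paper applies the Porter stemmer, and interpreting "frequency" as the total number of occurrences across the dataset is an assumption. All function names are hypothetical.

```python
import re
from collections import Counter

# Tiny placeholder set; the paper used the "Long Stopword List".
STOPWORDS = {"a", "an", "the", "is", "to", "in", "i", "how", "of", "and"}


def tokenize(text, stem=lambda token: token):
    """Split on whitespace, strip punctuation and special characters,
    drop stopwords, apply `stem`, and discard single-character tokens."""
    tokens = []
    for raw in text.lower().split():
        token = re.sub(r"[^a-z0-9]", "", raw)
        if not token or token in STOPWORDS:
            continue
        token = stem(token)
        if len(token) > 1:
            tokens.append(token)
    return tokens


def build_features(documents, min_frequency, stem=lambda token: token):
    """Return (vocabulary, rows): the tokens whose total frequency exceeds
    `min_frequency`, and one token-frequency vector per document."""
    per_document = [Counter(tokenize(doc, stem)) for doc in documents]
    corpus = Counter()
    for counts in per_document:
        corpus.update(counts)
    vocabulary = sorted(t for t, n in corpus.items() if n > min_frequency)
    rows = [[counts[t] for t in vocabulary] for counts in per_document]
    return vocabulary, rows
```

Calling `build_features(texts, 15)` would correspond to the threshold reported for the Regex dataset, with each column of `rows` serving as one textual feature.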
The interpretation of logistic regression models may be misleading if the metrics that are used to construct them are correlated. As Table X shows, some of our features are likely to be correlated, e.g., LineCountText and LengthText. To mitigate correlated metrics, we used AutoSpearman, an automated metric selection approach based on correlation analyses, with a threshold of 0.7.
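To illustrate the idea of correlation-based feature selection, the following sketch computes Spearman's rank correlation and greedily drops any feature whose absolute correlation with an already kept feature reaches the threshold. It is a simplified stand-in and not the AutoSpearman procedure itself; the greedy strategy, the function names, and the feature values in the usage note are assumptions made for illustration.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(features, threshold=0.7):
    """Keep a feature only if |rho| with every kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

With hypothetical values in which LengthText grows monotonically with LineCountText, the second of the two features is dropped while an uncorrelated feature such as CodeBlockCount is retained.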
Following the advice of Tantithamthavorn and Hassan, we used ANOVA Type-II importance scores to interpret our logistic regression models after constructing them using the glm function in R.
Models For Documentation Resources
We built logistic regression models for specific types of documentation resources. While we did not have enough data to allow the construction of models for all types of resources, Tables XI and XII show the most important features (as determined by the ANOVA Type-II test) for a subset of resource types for the Regex and Android datasets. Table XI indicates that Regex questions about parsing and patterns are associated with a higher chance of attracting a link to Wikipedia. In contrast, questions about specific problems are associated with a lower likelihood. For Android, questions about devices are associated with a higher chance of attracting Wikipedia links while questions about converting are associated with attracting links to the JavaAPI. As shown in Table XII, links to the AndroidReference documentation are associated with questions asked by users with a higher reputation. Interestingly, a manual inspection of the corresponding questions suggests that many of these high-reputation users are outsiders whose expertise is in areas other than Android.
Our systematic analysis of the context (RQ1) and purpose (RQ2) of documentation links led to five major findings about linking practices on Stack Overflow. Furthermore, the findings build on each other to form a small conceptual framework defined in terms of logical implications. Figure 3 summarizes the findings and their relationships. Our primary finding concerns the variety of linking purposes we elicited and the observation that linking purpose types span a spectrum that characterizes to what extent a link is intended to be followed (Purpose Spectrum). We also collected evidence of a notable correspondence between a resource type (e.g., Wikipedia) and a link’s purpose (Purpose–Resource Correspondence), and that link usage may be specific to a technology domain (Domain-Specific Link Usage). Both of these observations are consequences of Purpose Spectrum in the sense that it is the observed richness of linking purposes that enables the elicitation of specific linking practices. A fourth observation is the extent to which links in Stack Overflow threads lack context, despite the presence of guidelines explicitly requesting such context (Missing Link Context). To a certain extent, this observed problem can be mitigated by Purpose–Resource Correspondence because this correspondence supports partial inference of a link’s purpose. Finally, our analysis reveals a pattern that is counter-intuitive at first glance: users with high reputation attract answers with links to the reference documentation, which can also be construed as a symptom of a lack of expertise (Reputation-Expertise Mismatch). This finding is enabled by the Purpose–Resource Correspondence, which relates links to documentation resources with a type of information need. In the remainder of this section, we detail the evidence for each finding and discuss its main implication.
Purpose Spectrum
Our qualitative analysis has shown that documentation links on Stack Overflow serve a variety of purposes. Figure 2 shows a rich diversity of purposes, with eight of eleven categories showing a relative frequency above 5%. Manually reviewing all the links (through the coding process) also showed that the different categories of link purposes can be positioned on a spectrum bounded by the concepts of Citation and Recommendation, where citations are not meant to be consulted whereas recommendations are explicit entreaties to follow the link. Figure 4 positions every link purpose category except for LinkOnly and Other along this axis.
Citation links include the ones labeled as Attribution and LinkedMention. The purpose of Attribution links is to credit the source of content copied into Stack Overflow, and the purpose of the LinkedMention links is to uniquely identify a software artifact without the need to provide further context. Often, users add such LinkedMention references as inline links, which underlines their peripheral role:
Is there a regex that would work with String.split() to break a String into contiguous characters…?
We place Consulted and Concept in the middle of the spectrum because they are open to interpretation. Consulted links are typically added for context, but in some cases this context is simply to show due diligence (closer to citation) and in other cases it is to point to an unclear document to be explained, e.g.:
I am trying to understand the regular expression in Solr and came across this Java doc where explains… having a hard time understanding what it really means. 
As for Concept links, they are useful for readers who want to learn more about a mentioned concept, but they are usually also peripheral to the actual content of the post or comment (reproduced from Section 6).
Closer towards Recommendation we place Awareness links that steer users’ attention towards related resources, without particularly endorsing them, as well as Reference links that users include to make statements verifiable and more trustworthy by pointing to documentation resources supporting their claims.
One purpose of links towards the Recommendation end of the spectrum is to explicitly guide readers to BackgroundReading. Such links are especially helpful for users who are new to a topic or domain since they support them in identifying relevant background knowledge:
There is a good detailed description of lookarounds (look-behind and look-ahead) as well as a lot of other regex “magic” here
Finally, we find explicit Recommendation links. They allow readers to retrieve a specific software component recommended by a Stack Overflow author using the provided link (reproduced from Section 6).
You could use Apache Commons Lang for that… 
Implication: Forum users add links to documentation for a variety of purposes. This purpose may not be clear to the reader. Links whose purpose is not clear may confuse or waste the time of inexperienced users, who are surmised to visit more links as they navigate web sites. Automated analysis of link data may miss opportunities for additional interpretation if link purpose is not taken into account.
Purpose–Resource Correspondence
In the two cases we studied, mined association rules show consistent relations between a resource type (e.g., Wikipedia, Stack Overflow) and a link’s purpose. Links to Wikipedia, for example, often serve to define Concepts, an observation consistent with previous work. Links to the documentation of software components and tools are often included to recommend the tool rather than to refer to the linked document specifically (Recommendation).
Implication: For technology domains where certain resource types can be strongly associated with a link purpose, it may be possible to automatically recommend links to enhance a post, or infer the purpose of a linked resource.
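The strength of a rule relating a resource type to a link purpose is conventionally expressed through support and confidence. The sketch below computes both for a rule of the form "resource implies purpose" over a list of coded links. The function name and the toy data in the usage note are hypothetical and do not reproduce the values mined in the study.

```python
def rule_metrics(links, resource, purpose):
    """Support and confidence of the rule {resource} -> {purpose}.

    `links` is a list of (resource_type, purpose) pairs, one per coded link.
    Support is the share of all links matching both sides of the rule;
    confidence is the share of links to `resource` that have `purpose`.
    """
    total = len(links)
    with_resource = sum(1 for r, _ in links if r == resource)
    with_both = sum(1 for r, p in links if r == resource and p == purpose)
    support = with_both / total if total else 0.0
    confidence = with_both / with_resource if with_resource else 0.0
    return support, confidence
```

For a toy sample of eight links in which three of four Wikipedia links are coded as Concept, the rule Wikipedia implies Concept has a support of 0.375 and a confidence of 0.75. A high-confidence rule of this kind is what would allow a link's purpose to be partially inferred from its target.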
Domain-specific Link Usage
The distribution of link purposes shown in Figure 2 and detailed in Tables VI–VII shows remarkable consistency between cases except for two major differences. All link purpose ratios are within 3% of each other except for Concept (about twice as common for Regex) and Awareness (about twice as common for Android). Both of these differences were statistically significant (see Section 6). We conjecture that the higher proportion of Concept links is explained by the theoretical nature of the domain, which involves concepts such as “parsing”, “context-free grammar”, “pattern”, etc. This observation is corroborated by the regression model, which shows that the dominant features for explaining whether a Stack Overflow question related to regular expressions will attract a particular type of documentation link include such theoretical concepts, namely “parsing” and “pattern”. As for Android, the extensive use of links for Awareness purposes can be explained by the size of this technology ecosystem.
Another manifestation of domain-specific link usage is the fact that in the Regex case, only 26 posts pointed to YouTube (0.09% of all posts containing links), while in the Android case, linking YouTube videos was much more common (1,822 posts or 0.8% of all posts containing links). The difference was statistically significant according to a two-tailed Fisher’s exact test. Typical use cases of linking YouTube videos include pointing to tutorials (example: https://youtu.be/fn5OlqQuOCk) or conference talks (example: https://youtu.be/N6YdwzAvwOA).
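For readers unfamiliar with the test, a two-tailed Fisher's exact test on a 2x2 contingency table can be sketched with the standard library alone. The function name is hypothetical. The implementation sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, which is one common definition of the two-tailed p-value.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(x):
        # Unnormalised probability of the table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(n, col1)
```

In the setting above, the rows would be the two cases (Regex and Android) and the columns would separate posts that link to YouTube from other posts containing links. Only the YouTube counts are stated in this section, so the full table is not reproduced here. Because the weights are exact integers, the comparison against the observed table does not suffer from floating-point ties.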
Implication: Links to documentation resources are a reflection of the information needs typical of a technology domain. Details on the distribution of link purposes for a domain can thus assist in the design of documentation.
Missing Link Context
Even though Stack Overflow encourages users to provide context for links, they are rarely accompanied by a Quote or a Summary. Our analysis shows that, for 72.8% of the analyzed links, authors did not provide a quote and for 83.3% of the links they did not provide a summary. Although in some situations this lack of context may render links worthless once their target is unavailable, our analysis also revealed valid use cases for links without context, as links at the Citation end of the purpose spectrum do not necessarily need context. However, links towards the Recommendation end of the spectrum should always be accompanied by additional information to preserve that information in case the linked resource becomes unavailable.
Implication: Our link Purpose Spectrum observation allows us to modulate the requirement to add context for links, given that our data shows the context to be self-explanatory for links whose purpose is akin to a citation. We hypothesize that the importance of context for orienting users is proportional to a link’s position on the purpose spectrum. Missing context is thus not necessarily a problem for links whose purpose is citation.
Reputation-Expertise Mismatch
The logistic regression analysis shows that users with a high reputation score are not necessarily more familiar with reference documentation than lower reputation users. Links to the AndroidReference documentation are associated with questions asked by users with a higher reputation. The median user reputation of users asking questions which attract links to the AndroidReference documentation in the dataset used for the logistic regression analysis is , while the corresponding median for the remaining questions is . A manual inspection of the corresponding questions suggests that many of these high-reputation users are outsiders whose expertise is in areas other than Android (often iOS). Similarly, links to Wikipedia are also associated with questions asked by users with a higher reputation.
Implication: In the past, researchers have treated reputation on Stack Overflow as a general proxy for knowledge. Our results indicate that this operationalization may not be valid in all scenarios, because Stack Overflow authors’ knowledge is domain-specific.
8 Threats to Validity
The external validity of our results may be limited due to our choice of the two specific domains Regex and Android. Nevertheless, the documentation resources we identified, such as API documentation and Wikipedia, are likely to also play an important role in other domains.
Another threat is that our URL mapper was only able to classify 78.5% of all active links in the Regex sample and 68.9% of all active links in the Android sample (see Section 4). However, a classification of the remaining links would only add more documentation resources, but not invalidate the ones we identified.
The stratified sampling strategy we used to select documentation links for our analyses represents a threat to the external validity of our results. In the association rule analysis we conducted, support and confidence only hold for our samples—they would differ in non-stratified samples. We described how to interpret those values considering the stratification (see Section 6). Moreover, the fact that all rules derived from the Regex sample were also present in the Android sample supports their credibility.
The purpose distribution would likely differ in a random sample. However, in a random sample, frequently referenced documentation resources such as Stack Overflow, JavaAPI, and AndroidAPI would dominate the analysis. The stratification allowed us to consider a more diverse range of resources and purposes.
Qualitative data analysis always depends on the imagination and perception of the researcher. To mitigate this threat, all three authors conducted the qualitative analysis independently. We coded links in sets of up to 100 links, thoroughly discussed our results after finishing each set, assessed the inter-rater agreement, and only assigned a code if at least two researchers agreed on it.
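The agreement rule described above, assigning a code only if at least two of the three researchers chose it, can be expressed compactly. This is an illustrative sketch: the function name is hypothetical, and it assumes that each rater assigns exactly one code per link.

```python
from collections import Counter


def agreed_code(codes, min_agreement=2):
    """Return the code chosen by at least `min_agreement` raters, else None."""
    if not codes:
        return None
    code, votes = Counter(codes).most_common(1)[0]
    return code if votes >= min_agreement else None
```

With three raters and a minimum of two votes, at most one code can satisfy the rule, so the result is unambiguous. Links for which all three raters disagree yield `None` and would need to be resolved through discussion.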
9 Related Work
There have been different studies investigating individual aspects of link usage on Stack Overflow. Gómez et al. conducted a basic but comprehensive study of the links found on Stack Overflow. However, they studied different types of links in posts (not comments) and did not focus on specific domains. In this paper, we focus on two specific domains, which allows us to understand the data in a specific context. Moreover, we analyze the purpose of the information sharing and not just its nature.
Vincent et al. analyzed the usage of Wikipedia by Stack Overflow authors. Their work is closely related to the purpose Concept in our coding schema. However, while we found this purpose to be associated with Wikipedia links, our notion of Concept is not limited to this particular website. Vincent et al. found that 1.28% of all Stack Overflow posts contain links to Wikipedia. Using version 2018-07-31 of the SOTorrent dataset, we identified 1.94% of all threads, but only 0.85% of all posts, to contain links to Wikipedia. Also considering links in comments, which Vincent et al. did not, the ratio of threads with links to Wikipedia increases to 2.55%.
Li et al. built a collaborative filtering recommender system to recommend other learning resources, which is based on co-occurrences of links in Stack Overflow posts. As outlined above, the purpose-resource correspondence that we identified may help to improve such recommender systems.
Xia et al. describe that it is common for developers to search for reusable code snippets on the web. In particular, when developers want to copy such snippets into their projects, Attribution links can be helpful to decide under which license the content can be used, a problem of which many developers are not aware.
Treude and Robillard conducted a survey to investigate whether Stack Overflow code fragments are self-explanatory and found that less than half of the fragments in their sample were considered to be self-explanatory. Providing links to related documentation resources may help developers in understanding such code fragments.
To describe topics of Stack Overflow questions and answers, different methods such as manual analysis and Latent Dirichlet Allocation [45, 46] have been used. Automatically identifying high-quality posts has been another research direction, where metrics based on the number of edits on a question, author popularity, and code readability yielded good results. To identify the topics and assess the quality of Stack Overflow posts, linked documentation resources, and in particular their correspondence to certain link purposes, could be used in future work.
In their conceptual framework of success factors for Stack Overflow questions, Calefato et al. considered the presence of links as one aspect of a question’s presentation quality. However, they did not find a significant effect of the fact that a question contained a link on the success of that question, that is, whether it attracted an accepted answer. A direction of future work is to consider not only the presence of links, but also their purpose and targets.
Outside of Stack Overflow, Hata et al. studied the role of links contained in source code comments in terms of prevalence, link targets, purposes, decay, and evolutionary aspects. Similar to our findings, Hata et al. also report that links can be fragile since they are vulnerable to link rot and link targets change frequently.
Over the past decade, the community question answering platform Stack Overflow has become extremely popular among programmers for finding and sharing knowledge. However, the site does not exist in isolation, and users frequently link to other documentation sources, such as API documentation and encyclopedia articles, from within questions, answers, or comments on Stack Overflow. To understand how and why documentation is referenced from Stack Overflow threads, we conducted a multi-case study of links in two different technology domains, regular expressions and Android development. We used qualitative and quantitative research methods to systematically investigate the context and purpose of a sample of 759 documentation links.
We identified a spectrum of purposes for which links are included in Stack Overflow threads, ranging from Attribution and LinkedMention on the citation end of the spectrum to BackgroundReading and Recommendation of software artifacts on the recommendation side. Citations are not necessarily meant to be consulted whereas recommendations are explicit requests to follow a link. This observation relates to Stack Overflow’s recommendation to add context to every link: While adding context in the form of summaries or quotes is important for links on the recommendation end of the purpose spectrum, it is less important for links primarily included for citation purposes.
We also found that links to documentation resources are a reflection of the information needs typical of a technology domain. For example, Concept links were twice as common in threads about regular expressions as in threads about Android, while we found the opposite for Awareness links. These insights can inform the design and customization of documentation for different technology domains.
Our work forms a first step towards understanding how and why documentation resources are referenced on Stack Overflow, with the ultimate goal of improving the efficiency of information flow on Stack Overflow and the broader software documentation ecosystem. In the short term, Stack Overflow authors can use our results to reflect on the intended purpose before posting a link, and to learn how they can make their post more valuable by providing context.
References
-  J. Spolsky, “Stack Overflow launches,” Blog Post https://www.joelonsoftware.com/2008/09/15/stack-overflow-launches/, accessed: 21 August 2018.
-  L. Mamykina, B. Manoim, M. Mittal, G. Hripcsak, and B. Hartmann, “Design lessons from the fastest Q&A site in the west,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2011, pp. 2857–2866.
-  C. Treude, O. Barzilay, and M.-A. D. Storey, “How do programmers ask and answer questions on the web?” in 33rd International Conference on Software Engineering (ICSE 2011), R. N. Taylor, H. C. Gall, and N. Medvidovic, Eds. Waikiki, Honolulu: ACM, 2011, pp. 804–807.
-  C. Parnin, C. Treude, L. Grammel, and M.-A. Storey, “Crowd documentation: Exploring the coverage and the dynamics of API discussions on Stack Overflow,” Georgia Institute of Technology, Tech. Rep., 2012.
-  W. Maalej and M. P. Robillard, “Patterns of knowledge in API reference documentation,” IEEE Transactions on Software Engineering, vol. 39, no. 9, pp. 1264–1282, 2013.
-  M. P. Robillard and R. Deline, “A field study of API learning obstacles,” Empirical Software Engineering, vol. 16, no. 6, pp. 703–732, 2011.
-  C. Gómez, B. Cleary, and L. Singer, “A study of innovation diffusion through link sharing on Stack Overflow,” in Proceedings of the 10th IEEE Working Conference on Mining Software Repositories, 2013, pp. 81–84.
-  “How do I write a good answer?” Stack Overflow Help Center https://stackoverflow.com/help/how-to-answer, accessed: 04 February 2019.
-  Stack Overflow, “Searching for both word and its negation in a string using java regex,” https://stackoverflow.com/q/21761788/.
-  J. Li, Z. Xing, and A. Sun, “Linklive: discovering web learning resources for developers from q&a discussions,” World Wide Web, pp. 1–27, 2018.
-  J. Slegers, “The decline of Stack Overflow: How trolls have taken over your favorite programming q&a site,” Hackernoon Blog Post https://hackernoon.com/the-decline-of-stack-overflow-7cb69faa575d, accessed: 21 August 2018.
-  Stack Overflow, “Self-signed certificate on android,” https://stackoverflow.com/q/24121224.
-  ——, “Mongo find() with regex in java only return one entry,” https://stackoverflow.com/a/24890987.
-  ——, “Java regular expression to discover regular expression,” https://stackoverflow.com/q/30910685.
-  M. Gemmell, “What have you tried?” Blog, https://mattgemmell.com/what-have-you-tried/, 2008.
-  Stack Overflow, “Does anyone know how extract my date string & change the format?” https://stackoverflow.com/q/15565774.
-  “How to reference material written by others,” Stack Overflow Help Center https://stackoverflow.com/help/referencing, accessed: 04 February 2019.
-  S. Baltes, L. Dumani, C. Treude, and S. Diehl, “SOTorrent: Reconstructing and analyzing the evolution of Stack Overflow posts,” in Proceedings of the 15th International Conference on Mining Software Repositories (MSR 2018), 2018, pp. 319–330.
-  S. Baltes and L. Dumani, “SOTorrent dataset 2018-06-17,” Jun. 2018. [Online]. Available: https://doi.org/10.5281/zenodo.1295405
-  ——, “SOTorrent dataset 2018-07-31,” Jul. 2018. [Online]. Available: https://doi.org/10.5281/zenodo.1401828
-  R. Agrawal, T. Imieliński, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 1993, pp. 207–216.
-  S. Baltes, C. Treude, and M. P. Robillard, “Contextual Documentation Referencing on Stack Overflow — Supplementary Material,” Feb. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.2556642
-  S. Baltes, M. P. Robillard, and C. Treude, “sbaltes/condor on GitHub,” 2019. [Online]. Available: https://doi.org/10.5281/zenodo.2557446
-  K. Charmaz, Constructing grounded theory, 2nd ed. Sage, 2014.
-  J. Cohen, “A coefficient of agreement for nominal scales,” Educational and Psychological Measurement, vol. 20, no. 1, pp. 37–46, 1960.
-  R. A. Fisher, “On the interpretation of χ² from contingency tables, and the calculation of P,” Journal of the Royal Statistical Society, vol. 85, no. 1, pp. 87–94, 1922.
-  R. Agrawal and R. Srikant, “Fast algorithms for mining association rules in large databases,” in Proceedings of 20th International Conference on Very Large Data Bases (VLDB 1994), 1994, pp. 487–499.
-  Stack Overflow, “Finding the index of the first match of a regular expression in java,” https://stackoverflow.com/q/8752252#comment10904903_8752252.
-  N. Vincent, I. Johnson, and B. Hecht, “Examining Wikipedia with a broader lens: Quantifying the value of Wikipedia’s relationships with other large-scale online communities,” in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 566:1–566:13.
-  Stack Overflow, “how to check if string contains only numerics or letters properly? android,” https://stackoverflow.com/a/34792055.
-  ——, “Regular expression match a-alphanumeric&b-digits&c-digits,” https://stackoverflow.com/q/17267166/.
-  M. F. Porter, “An algorithm for suffix stripping,” Program, vol. 14, no. 3, pp. 130–137, 1980.
-  D. McFadden, “Conditional logit analysis of qualitative choice behavior,” in Frontiers in Econometrics, P. Zarembka, Ed. New York, NY, USA: Wiley, 1973, ch. 4, pp. 105–142.
-  C. Tantithamthavorn and A. E. Hassan, “An experience report on defect modelling in practice: Pitfalls and challenges,” in Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice. ACM, 2018, pp. 286–295.
-  J. Jiarpakdee, C. Tantithamthavorn, and C. Treude, “AutoSpearman: Automatically mitigating correlated metrics for interpreting defect models,” in Proceedings of the 34th International Conference on Software Maintenance and Evolution, 2018, to appear.
-  Stack Overflow, “Split regex to extract strings of contiguous characters,” https://stackoverflow.com/q/13596454.
-  ——, “Usage of — and :== in java doc,” https://stackoverflow.com/q/35762611.
-  ——, “Regex handling zero-length match,” https://stackoverflow.com/a/28153330.
-  A. Chevalier and M. Kicka, “Web designers and web users: Influence of the ergonomic quality of the web site on the information search,” International Journal of Human-Computer Studies, vol. 64, no. 10, pp. 1031–1048, 2006. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1071581906000838
-  P. Morrison and E. Murphy-Hill, “Is programming knowledge related to age? An exploration of Stack Overflow,” in 10th International Working Conference on Mining Software Repositories (MSR 2013), T. Zimmermann, M. Di Penta, and S. Kim, Eds. San Francisco, CA, USA: IEEE, 2013, pp. 69–72.
-  C. Parnin, C. Treude, and L. Grammel, “Crowd documentation: Exploring the coverage and the dynamics of API discussions on Stack Overflow,” Georgia Institute of Technology, Tech. Rep., 2012.
-  X. Xia, L. Bao, D. Lo, P. S. Kochhar, A. E. Hassan, and Z. Xing, “What do developers search for on the web?” Empirical Software Engineering, vol. 22, no. 6, pp. 3149–3185, 2017.
-  S. Baltes and S. Diehl, “Usage and attribution of Stack Overflow code snippets in GitHub projects,” Empirical Software Engineering, pp. 1–44, 2018.
-  C. Treude and M. P. Robillard, “Understanding Stack Overflow Code Fragments,” in 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME 2017), H. Mei, L. Zhang, and T. Zimmermann, Eds. Shanghai, China: IEEE Computer Society, 2017, pp. 509–513.
-  S. Wang, D. Lo, and L. Jiang, “An empirical study on developer interactions in StackOverflow,” in 28th Annual ACM Symposium on Applied Computing (SAC 2013), S. Y. Shin and J. C. Maldonado, Eds. Coimbra, Portugal: ACM, 2013, pp. 1019–1024.
-  M. Allamanis and C. Sutton, “Why, when, and what: Analyzing Stack Overflow questions by topic, type, and code,” in 12th Working Conference on Mining Software Repositories (MSR 2015), M. Di Penta, M. Pinzger, and R. Robbes, Eds. Florence, Italy: IEEE Computer Society, 2015, pp. 53–56.
-  J. Yang, C. Hauff, A. Bozzon, and G.-J. Houben, “Asking the right question in collaborative Q&A systems,” in 25th ACM Conference on Hypertext and Social Media (HT 2014), L. Ferres, G. Rossi, V. A. F. Almeida, and E. Herder, Eds. Santiago, Chile: ACM, 2014, pp. 179–189.
-  L. Ponzanelli, A. Mocci, A. Bacchelli, and M. Lanza, “Understanding and classifying the quality of technical forum questions,” in 14th International Conference on Quality Software (QSIC 2014), W. E. Wong and B. McMillin, Eds. Allen, TX, USA: IEEE, 2014, pp. 343–352.
-  M. Duijn, A. Kucera, and A. Bacchelli, “Quality Questions Need Quality Code: Classifying Code Fragments on Stack Overflow,” in 12th Working Conference on Mining Software Repositories (MSR 2015), M. Di Penta, M. Pinzger, and R. Robbes, Eds. Florence, Italy: IEEE Computer Society, 2015, pp. 410–413.
-  F. Calefato, F. Lanubile, and N. Novielli, “How to ask for technical help? Evidence-based guidelines for writing questions on Stack Overflow,” Information and Software Technology, vol. 94, pp. 186–207, 2018.
-  H. Hata, C. Treude, R. G. Kula, and T. Ishio, “9.6 million links in source code comments: Purpose, evolution, and decay,” in Proceedings of the 41st International Conference on Software Engineering, 2019.