The Evolution of Stack Overflow Posts: Reconstruction and Analysis

11/02/2018
by Sebastian Baltes, et al.

Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large number of code snippets and free-form text on a wide variety of topics. Like other software artifacts, questions and answers on SO evolve over time, for example when bugs in code snippets are fixed, code is updated to work with a more recent library version, or text surrounding a code snippet is edited for clarity. To analyze how content on SO evolves, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text or code blocks. It connects SO posts to other platforms by aggregating URLs from text blocks and comments, and by collecting references from GitHub files to SO posts. In this paper, we describe how we built SOTorrent, and in particular how we evaluated 134 different string similarity metrics regarding their applicability for reconstructing the version history of text and code blocks. Based on different analyses using the dataset, we present: (1) insights into the evolution of SO posts, e.g., that post edits are usually small, happen soon after the initial creation of the post, and that code is rarely changed without also updating the surrounding text; (2) a qualitative study investigating the close relationship between post edits and comments; and (3) a first analysis of code clones on SO together with an investigation of possible licensing risks. Finally, since the initial presentation of the dataset, we have improved the post block extraction and our predecessor matching strategy.
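
To make the idea of reconstructing block-level version history more concrete, the following is a minimal sketch of predecessor matching between two versions of a post. It is not the SOTorrent implementation: the function names `similarity` and `match_predecessors`, the greedy strategy, and the threshold of 0.6 are assumptions chosen for illustration, and Python's `difflib.SequenceMatcher` stands in for whichever of the evaluated string similarity metrics would be used in practice.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1]; a stand-in for any string metric."""
    return SequenceMatcher(None, a, b).ratio()


def match_predecessors(previous, current, threshold=0.6):
    """Greedily map each block of the current version to a predecessor block.

    Both arguments are lists of (block_type, content) tuples, where
    block_type is "text" or "code". Only blocks of the same type can be
    matched, and each predecessor is used at most once. The threshold is
    a hypothetical value, not one taken from the paper.
    Returns a dict: current block index -> predecessor index or None.
    """
    matches = {}
    used = set()
    for cur_idx, (cur_type, cur_text) in enumerate(current):
        best_idx, best_score = None, -1.0
        for prev_idx, (prev_type, prev_text) in enumerate(previous):
            if prev_idx in used or prev_type != cur_type:
                continue
            score = similarity(prev_text, cur_text)
            if score >= threshold and score > best_score:
                best_idx, best_score = prev_idx, score
        if best_idx is not None:
            used.add(best_idx)
        matches[cur_idx] = best_idx
    return matches


# Two versions of a hypothetical post: the text is slightly edited,
# the first code block gains a line, and a new code block is added.
previous = [
    ("text", "Use a loop to sum the list:"),
    ("code", "total = 0\nfor x in xs:\n    total += x"),
]
current = [
    ("text", "Use a loop to sum the list of numbers:"),
    ("code", "total = 0\nfor x in xs:\n    total += x\nprint(total)"),
    ("code", "total = sum(xs)"),
]

matches = match_predecessors(previous, current)
for cur_idx, prev_idx in matches.items():
    print(f"current block {cur_idx} <- predecessor {prev_idx}")
```

In this sketch the newly added code block has no predecessor because the only earlier code block is already taken by a closer match. A greedy pass like this depends on block order; the choice of metric and threshold is exactly the kind of decision the evaluation of similarity metrics described above is meant to inform.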

research
03/20/2018

SOTorrent: Reconstructing and Analyzing the Evolution of Stack Overflow Posts

Stack Overflow (SO) is the most popular question-and-answer website for ...
research
09/08/2018

SOTorrent: Studying the Origin, Evolution, and Usage of Stack Overflow Code Snippets

Stack Overflow (SO) is the most popular question-and-answer website for ...
research
03/20/2022

Human Values Violations in Stack Overflow: An Exploratory Study

A growing number of software-intensive systems are being accused of viol...
research
04/16/2023

A Study of Update Request Comments in Stack Overflow Answer Posts

Comments play an important role in updating Stack Overflow (SO) posts. T...
research
02/20/2022

SOTitle: A Transformer-based Post Title Generation Approach for Stack Overflow

On Stack Overflow, developers can not only browse question posts to solv...
research
03/13/2023

Representation Learning for Stack Overflow Posts: How Far are We?

The tremendous success of Stack Overflow has accumulated an extensive co...
research
04/17/2020

An Annotated Dataset of Stack Overflow Post Edits

To improve software engineering, software repositories have been mined f...
