An Annotated Dataset of Stack Overflow Post Edits

04/17/2020
by Sebastian Baltes, et al.

To improve software engineering, software repositories have been mined for code snippets and bug fixes. Typically, this mining takes place at the level of files or commits. To be able to dig deeper and to extract insights at a higher resolution, we hereby present an annotated dataset that contains over 7 million edits of code and text on Stack Overflow. Our preliminary study indicates that these edits might be a treasure trove for mining information about fine-grained patches, e.g., for the optimisation of non-functional properties.

research
03/25/2019

git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

Data from software repositories have become an important foundation for ...
research
02/28/2018

Orion+: Automated Problem Diagnosis in Computing Systems by Mining Metric Data

This work presents the suspicious code at a finer granularity of call st...
research
11/21/2019

Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net

Data from software repositories have become an important foundation for ...
research
03/26/2022

MQDD: Pre-training of Multimodal Question Duplicity Detection for Software Engineering Domain

This work proposes a new pipeline for leveraging data collected on the S...
research
03/06/2018

A Gold Standard for Emotion Annotation in Stack Overflow

Software developers experience and share a wide range of emotions throug...
research
11/02/2018

The Evolution of Stack Overflow Posts: Reconstruction and Analysis

Stack Overflow (SO) is the most popular question-and-answer website for ...
research
04/17/2020

Can We Use Stack Overflow as a Source of Explainable Bug-fix Data?

Bug-fix data sets are important for building various software engineerin...
