LinkFormer: Automatic Contextualised Link Recovery of Software Artifacts in both Project-based and Transfer Learning Settings

11/01/2022
by Maliheh Izadi, et al.

Software artifacts often interact with each other throughout the software development cycle. Associating related artifacts is a common practice for effective documentation and maintenance of software projects. Conventionally, to register the link between an issue report and its associated commit, developers manually include the issue identifier in the message of the relevant commit. Research has shown that developers tend to forget to connect these artifacts manually, resulting in a loss of links. Hence, several link recovery techniques have been proposed to discover and revive such links automatically. However, the literature mainly focuses on improving prediction accuracy on a randomly split test set, while neglecting other important aspects of the problem, including the effect of time and the generalizability of the predictive models. In this paper, we propose LinkFormer to address this problem from three aspects: 1) Accuracy: To better utilize contextual information for prediction, we employ the Transformer architecture and fine-tune multiple pre-trained models on the textual data and metadata of issues and commits. 2) Data leakage: To empirically assess the impact of time through the splitting policy, we train and test our proposed model, along with several existing approaches, on both randomly and temporally split data. 3) Generalizability: To provide a generic model that can perform well across different projects, we further fine-tune LinkFormer in two transfer learning settings. We empirically show that researchers should preserve the temporal flow of data when training learning-based models, so that the evaluation resembles the real-world setting. In addition, LinkFormer outperforms the state of the art by large margins. LinkFormer is also capable of extending the knowledge it has learned to unseen projects with little to no historical data.
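The difference between the two splitting policies mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record fields (`issue_id`, `commit_sha`, `timestamp`), the 80/20 ratio, and the helper names are all hypothetical and chosen only for illustration. A random split may place links created later in the training set than some links in the test set, whereas a temporal split keeps every training example earlier than every test example.

```python
import random
from datetime import datetime, timedelta

def random_split(pairs, test_ratio=0.2, seed=0):
    """Shuffle the issue-commit pairs and cut; ignores time entirely."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

def temporal_split(pairs, test_ratio=0.2):
    """Sort by timestamp and cut; the newest pairs form the test set."""
    ordered = sorted(pairs, key=lambda p: p["timestamp"])
    n_test = round(len(ordered) * test_ratio)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

def leaks_future(train, test):
    """True if any training pair is newer than the oldest test pair."""
    newest_train = max(p["timestamp"] for p in train)
    oldest_test = min(p["timestamp"] for p in test)
    return newest_train > oldest_test

if __name__ == "__main__":
    # Hypothetical toy data: one linked issue-commit pair per day.
    start = datetime(2021, 1, 1)
    pairs = [
        {"issue_id": i, "commit_sha": f"c{i}", "timestamp": start + timedelta(days=i)}
        for i in range(10)
    ]
    for name, split in (("random", random_split), ("temporal", temporal_split)):
        train, test = split(pairs)
        print(name, "split leaks future data:", leaks_future(train, test))
```

By construction, the temporal split never reports a leak, while the random split usually does on data like this; whether it does for a given seed depends on the shuffle.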


research
08/21/2023

EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery

Issue-commit links, as a type of software traceability links, play a vit...
research
08/10/2021

Issue Link Label Recovery and Prediction for Open Source Software

Modern open source software development heavily relies on the issue trac...
research
07/17/2018

Automatic Traceability Maintenance via Machine Learning Classification

Previous studies have shown that software traceability, the ability to l...
research
05/18/2020

Improving the Effectiveness of Traceability Link Recovery using Hierarchical Bayesian Networks

Traceability is a fundamental component of the modern software developme...
research
02/08/2021

Traceability Transformed: Generating more Accurate Links with Pre-Trained BERT Models

Software traceability establishes and leverages associations between div...
research
06/14/2022

Automated Detection of Typed Links in Issue Trackers

Stakeholders in software projects use issue trackers like JIRA to captur...
research
04/27/2022

Beyond Duplicates: Towards Understanding and Predicting Link Types in Issue Tracking Systems

Software projects use Issue Tracking Systems (ITS) like JIRA to track is...
