Traceability Transformed: Generating more Accurate Links with Pre-Trained BERT Models

02/08/2021
by   Jinfeng Lin, et al.
0

Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed the use of deep learning trace models to link natural language artifacts, such as requirements and issue descriptions, to source code; however, their effectiveness has been restricted by availability of labeled data and efficiency at runtime. In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts. To address data sparsity, we leverage a three-step training strategy to enable trace models to transfer knowledge from a closely related Software Engineering challenge, which has a rich dataset, to produce trace links with much higher accuracy than has previously been achieved. We then apply the T-BERT framework to recover links between issues and commits in Open Source Projects. We comparatively evaluated accuracy and efficiency of three BERT architectures. Results show that a Single-BERT architecture generated the most accurate links, while a Siamese-BERT architecture produced comparable results with significantly less execution time. Furthermore, by learning and transferring knowledge, all three models in the framework outperform classical IR trace models. On the three evaluated real-word OSS projects, the best T-BERT stably outperformed the VSM model with average improvements of 60.31 (MAP). RNN severely underperformed on these projects due to insufficient training data, while T-BERT overcame this problem by using pretrained language models and transfer learning.

READ FULL TEXT
research
07/03/2022

Enhancing Automated Software Traceability by Transfer Learning from Open-World Data

Software requirements traceability is a critical component of the softwa...
research
07/15/2023

Improving Trace Link Recommendation by Using Non-Isotropic Distances and Combinations

The existence of trace links between artifacts of the software developme...
research
04/06/2018

Traceability in the Wild: Automatically Augmenting Incomplete Trace Links

Software and systems traceability is widely accepted as an essential ele...
research
04/06/2018

Semantically Enhanced Software Traceability Using Deep Learning Techniques

In most safety-critical domains the need for traceability is prescribed ...
research
03/16/2023

Measuring Improvement of F_1-Scores in Detection of Self-Admitted Technical Debt

Artificial Intelligence and Machine Learning have witnessed rapid, signi...
research
04/25/2022

Generating and Visualizing Trace Link Explanations

Recent breakthroughs in deep-learning (DL) approaches have resulted in t...
research
11/01/2022

LinkFormer: Automatic Contextualised Link Recovery of Software Artifacts in both Project-based and Transfer Learning Settings

Software artifacts often interact with each other throughout the softwar...

Please sign up or login with your details

Forgot password? Click here to reset