Traceability Transformed: Generating more Accurate Links with Pre-Trained BERT Models

by Jinfeng Lin, et al.

Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed the use of deep learning trace models to link natural language artifacts, such as requirements and issue descriptions, to source code; however, their effectiveness has been restricted by the availability of labeled data and by their efficiency at runtime. In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts. To address data sparsity, we leverage a three-step training strategy that enables trace models to transfer knowledge from a closely related Software Engineering challenge, which has a rich dataset, to produce trace links with much higher accuracy than has previously been achieved. We then apply the T-BERT framework to recover links between issues and commits in Open Source Projects. We comparatively evaluated the accuracy and efficiency of three BERT architectures. Results show that a Single-BERT architecture generated the most accurate links, while a Siamese-BERT architecture produced comparable results with significantly less execution time. Furthermore, by learning and transferring knowledge, all three models in the framework outperform classical IR trace models. On the three evaluated real-world OSS projects, the best T-BERT stably outperformed the VSM model with average improvements of 60.31% measured using Mean Average Precision (MAP). RNN severely underperformed on these projects due to insufficient training data, while T-BERT overcame this problem by using pretrained language models and transfer learning.
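For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how Mean Average Precision is commonly computed for a ranked trace-link task. It is not taken from the paper: the function names, the issue and commit identifiers, and the choice to normalize by the number of true links per issue are all assumptions made for illustration, and the authors' exact evaluation protocol may differ.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query (e.g. one issue).

    `ranked` is the candidate list (e.g. commits) ordered by model score,
    best first. `relevant` is the set of true links for that query.
    Assumption: true links that never appear in `ranked` contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Mean of average precision over all queries that have at least one true link."""
    queries = [q for q in rankings if gold.get(q)]
    if not queries:
        return 0.0
    return sum(average_precision(rankings[q], gold[q]) for q in queries) / len(queries)


# Hypothetical example data: two issues, each with a ranked list of commits.
example_rankings = {
    "issue-1": ["c3", "c1", "c2"],
    "issue-2": ["c5", "c4"],
}
example_gold = {
    "issue-1": {"c1", "c2"},
    "issue-2": {"c5"},
}
example_map = mean_average_precision(example_rankings, example_gold)
```

In this toy case the first issue has its two true links at ranks 2 and 3, and the second issue has its single true link at rank 1, so a model that ranks true links higher yields a larger MAP.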


Enhancing Automated Software Traceability by Transfer Learning from Open-World Data

Software requirements traceability is a critical component of the softwa...

Traceability in the Wild: Automatically Augmenting Incomplete Trace Links

Software and systems traceability is widely accepted as an essential ele...

Semantically Enhanced Software Traceability Using Deep Learning Techniques

In most safety-critical domains the need for traceability is prescribed ...

Improving the Effectiveness of Traceability Link Recovery using Hierarchical Bayesian Networks

Traceability is a fundamental component of the modern software developme...

Generating and Visualizing Trace Link Explanations

Recent breakthroughs in deep-learning (DL) approaches have resulted in t...

Comparative Study of Machine Learning Models and BERT on SQuAD

This study aims to provide a comparative analysis of performance of cert...

SLGPT: Using Transfer Learning to Directly Generate Simulink Model Files and Find Bugs in the Simulink Toolchain

Finding bugs in a commercial cyber-physical system (CPS) development too...