Enhancing Automated Software Traceability by Transfer Learning from Open-World Data

07/03/2022
by Jinfeng Lin, et al.

Software requirements traceability is a critical component of the software engineering process, enabling activities such as requirements validation, compliance verification, and safety assurance. However, manually creating a complete set of trace links across natural-language artifacts such as requirements, design, and test cases can be prohibitively expensive in both cost and effort. Researchers have therefore proposed automated link-generation solutions based primarily on information-retrieval (IR) techniques; however, these solutions have failed to deliver the accuracy needed for full adoption in industrial projects. Deep-learning traceability models can improve on this, but their efficacy is impeded by the limited size and availability of project-level artifacts and links to serve as training data. In this paper, we address this problem by proposing and evaluating several deep-learning approaches for text-to-text traceability. Our method, named NLTrace, explores three transfer-learning strategies that use datasets mined from open-world platforms. Through pretraining Language Models (LMs) and leveraging adjacent tracing tasks, we demonstrate that NLTrace can significantly improve the performance of LM-based trace models when training links are available. In such scenarios, NLTrace outperforms the best-performing classical IR method with a 188% … Mean Average Precision (MAP). It also outperforms the general LM-based trace model by 7% … low-resource tracing scenarios where other LM models cannot. The knowledge learned from adjacent tasks enables NLTrace to outperform VSM models by 28% on generation challenges when presented with a small number of training examples.
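The abstract measures NLTrace against classical IR tracing (VSM, the vector space model) and reports results in Mean Average Precision (MAP). For reference, here is a minimal, standard-library-only Python sketch of such a baseline. It builds TF-IDF vectors for source and target artifacts, ranks candidate trace links by cosine similarity, and scores the ranking with MAP. This is an illustrative assumption, not the paper's implementation and not NLTrace itself. The artifacts, IDs, smoothed IDF formula, and function names (`vsm_rank`, `average_precision`, `mean_average_precision`) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def tfidf_vectors(docs):
    """Turn a list of token lists into sparse TF-IDF vectors (term -> weight dicts)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Assumed smoothed IDF: terms occurring in every document keep a small positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: freq * idf[term] for term, freq in Counter(tokens).items()} for tokens in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vsm_rank(sources, targets):
    """For each source artifact, rank all target artifacts by descending cosine similarity."""
    src_ids, tgt_ids = list(sources), list(targets)
    # IDF is computed over the union of source and target artifacts.
    corpus = [tokenize(sources[s]) for s in src_ids] + [tokenize(targets[t]) for t in tgt_ids]
    vectors = tfidf_vectors(corpus)
    src_vecs = dict(zip(src_ids, vectors[: len(src_ids)]))
    tgt_vecs = dict(zip(tgt_ids, vectors[len(src_ids):]))
    return {
        s: sorted(tgt_ids, key=lambda t: cosine(src_vecs[s], tgt_vecs[t]), reverse=True)
        for s in src_ids
    }


def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k at which a true trace link appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(ranking, true_links):
    """MAP: mean of per-source average precision over sources that have true links."""
    aps = [average_precision(ranking[s], links) for s, links in true_links.items()]
    return sum(aps) / len(aps)


# Hypothetical toy artifacts; not taken from the paper's datasets.
requirements = {
    "REQ-1": "The system shall encrypt stored user passwords.",
    "REQ-2": "The system shall export monthly reports as PDF files.",
}
test_cases = {
    "TC-1": "Verify that user passwords are encrypted before they are stored.",
    "TC-2": "Verify that the monthly report export produces a PDF file.",
    "TC-3": "Verify the login page loads.",
}
true_links = {"REQ-1": {"TC-1"}, "REQ-2": {"TC-2"}}

ranking = vsm_rank(requirements, test_cases)
print(ranking)
print("MAP:", mean_average_precision(ranking, true_links))
```

In an LM-based setup, one would typically replace the cosine scorer with a learned relevance score for each source/target pair. The ranking step and the MAP evaluation would stay the same, which is what makes the two kinds of tracer directly comparable.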



research
02/08/2021

Traceability Transformed: Generating more Accurate Links with Pre-Trained BERT Models

Software traceability establishes and leverages associations between div...
research
09/05/2022

Using Consensual Biterms from Text Structures of Requirements and Code to Improve IR-Based Traceability Recovery

Traceability approves trace links among software artifacts based on whet...
research
05/23/2022

Tracing Knowledge in Language Models Back to the Training Data

Neural language models (LMs) have been shown to memorize a great deal of...
research
04/06/2018

Semantically Enhanced Software Traceability Using Deep Learning Techniques

In most safety-critical domains the need for traceability is prescribed ...
research
07/17/2018

Automatic Traceability Maintenance via Machine Learning Classification

Previous studies have shown that software traceability, the ability to l...
research
08/15/2018

Domain Knowledge Discovery Guided by Software Trace Links

Software-intensive projects are specified and modeled using domain termi...
research
04/09/2018

Second-Guessing in Tracing Tasks Considered Harmful?

[Context and motivation] Trace matrices are lynch pins for the developme...
