Extracting Software Requirements from Unstructured Documents

02/04/2022
by Vladimir Ivanov, et al.

Identifying (or extracting) requirements in textual documents is a tedious, error-prone task that many researchers suggest automating. We manually annotated the PURE dataset, creating a new dataset that contains both requirements and non-requirements. Using this dataset, we fine-tuned the BERT model and compared the results with several baselines, such as fastText and ELMo. To evaluate the model on semantically more complex documents, we compared the PURE results with experiments on Request For Information (RFI) documents, which often include software requirements but express them in a less standardized way. The fine-tuned BERT showed promising results on the PURE dataset in the binary sentence classification task. Compared with previous and recent studies that deal with constrained inputs, our approach achieves high precision and recall while remaining agnostic to the form of the unstructured textual input.
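
The evaluation described above treats requirement detection as binary sentence classification scored by precision and recall. As a minimal sketch of how those two metrics are computed for the positive class, assuming label 1 means "requirement" and label 0 means "non-requirement" (the function name and the toy labels are hypothetical and are not taken from the paper or its results):

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (assumed: 1 = requirement)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Return 0.0 when a denominator is empty instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels invented for illustration only; these are not results from the paper.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
precision, recall = precision_recall(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the sentences flagged as requirements, how many really are?", while recall answers "of the true requirements, how many were found?". Reporting both matters here because a classifier could otherwise score well by flagging almost everything or almost nothing.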


