BERT_SE: A Pre-trained Language Representation Model for Software Engineering

Natural Language Processing (NLP) has become highly relevant in several areas. In software engineering (SE), NLP applications rest on classifying similar texts (e.g., software requirements) and are applied in tasks such as software effort estimation and human resource selection. Classifying software requirements is a complex task, given the informality and complexity inherent in the texts produced during the software development process. Pre-trained embedding models appear to be a viable alternative, considering the low volume of labeled textual data in software engineering and the limited quality of these data. Although there is much research on applying word embeddings in several areas, to date we are not aware of studies that have explored their use to create a model specific to the SE domain. This article therefore proposes a contextualized embedding model, called BERT_SE, which enables the recognition of specific and relevant terms in the SE context. BERT_SE was assessed on the software requirements classification task, showing an average improvement rate of 13% over the BERT_base model made available by the authors of BERT. The code and pre-trained models are available at https://github.com/elianedb.
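The evaluation setup described above, fine-tuning a BERT-style encoder with a classification head on requirement texts, can be sketched with the HuggingFace transformers API. This is a minimal illustration only: a tiny, randomly initialized configuration stands in for the actual BERT_SE checkpoint, and the three label classes are an assumption, not the paper's category scheme.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny randomly initialized BERT-style model for illustration only;
# the real BERT_SE weights would instead be loaded via from_pretrained().
config = BertConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=3,  # hypothetical number of requirement categories
)
model = BertForSequenceClassification(config)

# Dummy tokenized requirement texts: batch of 2, sequence length 16.
input_ids = torch.randint(0, config.vocab_size, (2, 16))
labels = torch.tensor([0, 2])

# Forward pass returns both a classification loss and per-class logits,
# which is the signal used when fine-tuning on labeled requirements.
outputs = model(input_ids=input_ids, labels=labels)
print(outputs.logits.shape)  # torch.Size([2, 3])
```

In a real fine-tuning run, the logits would feed an optimizer step over many labeled requirements; only the shapes and the API shape of the loop are shown here.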
