A Machine Learning Based Ensemble Method for Automatic Multiclass Classification of Decisions

05/03/2021
by Liming Fu, et al.

Stakeholders make various types of decisions about requirements, design, management, and so on during the software development life cycle. However, these decisions are typically not well documented or classified because human resources, time, and budget are limited. Automatic approaches offer a promising remedy. In this paper, we aim to automatically classify decisions into five types to help stakeholders better document and understand them. First, we collected a dataset from the Hibernate developer mailing list. We then evaluated 270 configurations of feature selection, feature extraction techniques, and machine learning classifiers to find the best configuration for classifying decisions. In particular, we applied an ensemble learning method and constructed ensemble classifiers to compare their performance with that of base classifiers. Our experimental results show that (1) feature selection can moderately improve the classification results; (2) ensemble classifiers can outperform base classifiers, provided that they are well constructed; and (3) the BoW + 50 ensemble classifier that combines Naïve Bayes (NB), Logistic Regression (LR), and Support Vector Machine (SVM) achieves the best classification result among all configurations (a weighted precision of 0.750, a weighted recall of 0.739, and a weighted F1-score of 0.727). Our work can benefit various types of stakeholders in software development by providing an automatic approach for effectively classifying decisions into specific types that are relevant to their interests.
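To make the ensemble and the reported metrics concrete, here is a minimal Python sketch. It rests on two assumptions that the abstract does not confirm: that base classifiers are combined by hard majority voting, and that "weighted" precision, recall, and F1 mean per-class scores averaged by class support. The label names and the prediction lists are hypothetical toy data, not results from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists by hard majority voting.

    Assumption: ties are broken in favour of the earliest classifier.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        counts = Counter(votes)
        top = max(counts.values())
        # The first vote (in classifier order) that reaches the top count wins.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def weighted_scores(y_true, y_pred):
    """Return support-weighted precision, recall, and F1."""
    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predicted instances per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total         # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


# Hypothetical decision types and base-classifier outputs (toy data).
y_true = ["design", "design", "requirement", "management", "requirement", "design"]
nb_pred = ["design", "requirement", "requirement", "management", "design", "management"]
lr_pred = ["design", "design", "requirement", "design", "requirement", "design"]
svm_pred = ["requirement", "design", "requirement", "management", "requirement", "management"]

ensemble = majority_vote([nb_pred, lr_pred, svm_pred])
precision, recall, f1 = weighted_scores(y_true, ensemble)
print(ensemble)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this weighting, recall equals plain accuracy, while precision and F1 can differ from it when the classes are imbalanced. This may be why the abstract reports all three values separately.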


research
11/01/2019

Efficient Feature Selection techniques for Sentiment Analysis

Sentiment analysis is a domain of study that focuses on identifying and ...
research
12/15/2022

A new weighted ensemble model for phishing detection based on feature selection

A phishing attack is a sort of cyber assault in which the attacker sends...
research
04/26/2016

A New Approach in Persian Handwritten Letters Recognition Using Error Correcting Output Coding

Classification Ensemble, which uses the weighed polling of outputs, is t...
research
03/19/2021

Empirical Analysis of Machine Learning Configurations for Prediction of Multiple Organ Failure in Trauma Patients

Multiple organ failure (MOF) is a life-threatening condition. Due to its...
research
11/11/2019

Item Response Theory based Ensemble in Machine Learning

In this article, we propose a novel probabilistic framework to improve t...
research
11/10/2020

Glioma Classification Using Multimodal Radiology and Histology Data

Gliomas are brain tumours with a high mortality rate. There are various ...
