A systematic literature review on the code smells datasets and validation mechanisms

06/02/2023
by Morteza Zakeri Nasrabadi, et al.

The accuracy reported for code smell detection tools varies depending on the dataset used to evaluate them. Our survey of 45 existing datasets reveals that a dataset's adequacy for smell detection depends heavily on properties such as its size, severity levels, project types, the number of smells, the number of samples per smell type, and the ratio of smelly to non-smelly samples. Most existing datasets support God Class, Long Method, and Feature Envy, while six smells in Fowler and Beck's catalog are not supported by any dataset. We conclude that existing datasets suffer from imbalanced samples, lack of severity-level support, and restriction to the Java language.
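Because the ratio of smelly to non-smelly samples is one of the properties the review highlights, a quick inspection of that ratio is often the first step when assessing a candidate dataset. The sketch below is a minimal, hypothetical example (the file name and column names are assumptions, not artifacts of the paper) showing how such an imbalance report could be computed for a dataset stored as one labeled sample per CSV row.

```python
from collections import Counter
import csv

# Hypothetical dataset file: one row per code sample, with a "smell" column
# holding the smell type (e.g. "GodClass", "LongMethod") or "None" for
# non-smelly samples. File name and column names are illustrative only.
def imbalance_report(path="smell_dataset.csv"):
    with open(path, newline="") as f:
        labels = [row["smell"] for row in csv.DictReader(f)]

    counts = Counter(labels)
    smelly = sum(n for label, n in counts.items() if label != "None")
    clean = counts.get("None", 0)

    # Per-smell counts reveal which smell types are under-represented.
    print("Samples per smell type:", dict(counts))
    if clean:
        print(f"Smelly-to-non-smelly ratio: {smelly / clean:.2f}")

if __name__ == "__main__":
    imbalance_report()
```

A ratio far below 1.0 signals the kind of class imbalance the review identifies as a common weakness of existing code smell datasets.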
