FRUGAL: Unlocking SSL for Software Analytics

08/22/2021
by   Huy Tu, et al.

Standard software analytics often requires a large amount of labelled data in order to commission models with acceptable performance. However, prior work has shown that such requirements can be expensive (labelling thousands of commits can take several weeks), and labelled data is not always available when exploring new research problems and domains. Unsupervised learning is a promising way to learn hidden patterns within unlabelled data, but it has only been studied extensively in defect prediction. Moreover, unsupervised learning can be ineffective by itself and has not been explored in other domains (e.g., static analysis and issue close time). Motivated by this literature gap and these technical limitations, we present FRUGAL, a tuned semi-supervised method built on a simple optimization scheme that requires neither sophisticated methods (e.g., deep learners) nor expensive ones (e.g., 100% manually labelled data). FRUGAL optimizes the unsupervised learner's configurations (via a simple grid search) while validating our design decision of labelling just 2.5% of the data before prediction. As shown by the experiments of this paper, FRUGAL outperforms the state-of-the-art adoptable static code warning recognizer and issue close time predictor, while reducing the cost of labelling by a factor of 40 (from 100% to 2.5%). Hence we assert that FRUGAL can save considerable effort in data labelling, especially in validating prior work or researching new problems. Based on this work, we suggest that proponents of complex and expensive methods should always baseline such methods against simpler and cheaper alternatives. For instance, a semi-supervised learner like FRUGAL can serve as a baseline for state-of-the-art software analytics.
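
The general idea in the abstract (label a 2.5% sample, then grid-search an unsupervised learner's configurations against that sample) can be illustrated with a minimal sketch. This is not FRUGAL's actual implementation: the percentile-threshold rule, the synthetic data, the stride-based sampling, and every name below (`percentile_cut`, `predict`, `tune_with_few_labels`, `oracle`) are assumptions made purely for illustration.

```python
def percentile_cut(values, pct):
    """Return the value found at the given percentile (0-100) of a list."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[idx]


def predict(rows, feature, pct, above):
    """Label-free rule: flag rows whose feature is above (or not above) a percentile cut."""
    cut = percentile_cut([r[feature] for r in rows], pct)
    return [(r[feature] > cut) == above for r in rows]


def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def tune_with_few_labels(rows, oracle, label_fraction=0.025):
    """Grid-search rule configurations, scoring each only on a small labelled sample."""
    stride = round(1 / label_fraction)            # 2.5% -> every 40th row
    labelled_idx = list(range(0, len(rows), stride))
    # The oracle (a human labeller in practice) is queried only for the sample.
    labels = [oracle(i) for i in labelled_idx]

    best = None
    for feature in range(len(rows[0])):
        for pct in range(10, 100, 10):
            for above in (True, False):
                preds = predict(rows, feature, pct, above)
                score = accuracy([preds[i] for i in labelled_idx], labels)
                if best is None or score > best[0]:
                    best = (score, feature, pct, above)
    return best, len(labelled_idx)


# Synthetic data (assumption): feature 0 is noise, feature 1 drives the hidden label.
rows = [((i * 7) % 13, i % 100) for i in range(400)]
hidden_labels = [r[1] >= 70 for r in rows]

best, n_labelled = tune_with_few_labels(rows, oracle=lambda i: hidden_labels[i])
score, best_feature, best_pct, best_above = best

# Apply the tuned rule to every row and compare against the (normally unseen) labels.
full_accuracy = accuracy(predict(rows, best_feature, best_pct, best_above), hidden_labels)
label_cost_factor = len(rows) / n_labelled        # 100% / 2.5% = 40

print(best_feature, best_pct, best_above, n_labelled, label_cost_factor, full_accuracy)
```

In this toy setting only 10 of the 400 rows are ever labelled, which matches the factor-of-40 reduction quoted in the abstract. A real study would replace the percentile rule with the unsupervised learners and evaluation metrics used in the paper.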

