Towards Lifelong Learning for Software Analytics Models: Empirical Study on Brown Build and Risk Prediction

05/16/2023
by   Doriane Olewicki, et al.

Software analytics tools that use machine learning (ML) models, for example to predict the risk of a code change, are now well established. However, as a project's goals shift over time and its developers and their habits change, the performance of these models tends to degrade (drift) until the model is retrained on new data. Current retraining practices are typically an afterthought (and hence costly): a new model is retrained from scratch on a large, updated data set at random points in time, with no continuity between the old and new models. In this paper, we propose to use lifelong learning (LL) to continuously build and maintain ML-based software analytics tools, using an incremental learner that progressively updates the old model with new data. To avoid so-called "catastrophic forgetting" of important older data points, we adopt a replay buffer of older data, which still allows us to drastically reduce the size of the overall training data set, and hence the model training time. We empirically evaluate our LL approach on two industrial use cases, a brown build detector and a Just-in-Time risk prediction tool, showing that LL in practice at least matches traditional retraining-from-scratch performance in terms of F1-score while using 3.3-13.7x less data at each update, thus considerably speeding up the model updating process. Considering both the computational effort of updates and the time between model updates, the LL setup needs 2-40x less computational effort than retraining-from-scratch setups.
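The idea of an incremental learner combined with a replay buffer can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the buffer is filled or which model is used, so the reservoir-sampling policy, the tiny logistic-regression learner, and all names below (`ReplayBuffer`, `IncrementalLogReg`, `lifelong_update`) are assumptions chosen for illustration only.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, binary label)


class ReplayBuffer:
    """Fixed-capacity sample of past examples (reservoir sampling, an assumed policy)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[Example] = []
        self.seen = 0  # total number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example: Example) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep each example seen so far with equal probability.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k: int) -> List[Example]:
        return self.rng.sample(self.items, min(k, len(self.items)))


class IncrementalLogReg:
    """Hypothetical stand-in learner: logistic regression updated by SGD."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, batch: Sequence[Example], epochs: int = 5) -> None:
        # Continues from the current weights instead of starting from scratch.
        for _ in range(epochs):
            for x, y in batch:
                err = self.predict_proba(x) - y
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err


def lifelong_update(model: IncrementalLogReg, buffer: ReplayBuffer,
                    new_batch: Sequence[Example], replay_size: int) -> int:
    """Update the old model on new data plus a small replay of older data."""
    train_set = list(new_batch) + buffer.sample(replay_size)
    model.partial_fit(train_set)
    for example in new_batch:
        buffer.add(example)
    return len(train_set)  # size of the data actually used for this update


if __name__ == "__main__":
    model = IncrementalLogReg(n_features=1)
    buffer = ReplayBuffer(capacity=3)
    batch = [([1.0], 1), ([-1.0], 0)] * 5
    print(lifelong_update(model, buffer, batch, replay_size=2))
    print(lifelong_update(model, buffer, batch, replay_size=2))
```

Each update trains on the new batch plus at most `replay_size` older examples, which is where the reduction in training data relative to retraining from scratch would come from in a setup like this. How large the buffer should be, and how its contents are selected, are design choices this sketch does not settle.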

