Model Stability with Continuous Data Updates

01/14/2022
by Huiting Liu, et al.

In this paper, we study the "stability" of machine learning (ML) models within the context of larger, complex NLP systems with continuous training data updates. For this study, we propose a methodology for the assessment of model stability (which we refer to as jitter) under various experimental conditions. Through experiments on four text classification tasks and two sequence labeling tasks, we find that model design choices, including network architecture and input representation, have a critical impact on stability. In classification tasks, non-RNN-based models are observed to be more stable than RNN-based ones, while the encoder-decoder model is less stable in sequence labeling tasks. Moreover, input representations based on pre-trained fastText embeddings contribute to more stability than other choices. We also show that two learning strategies – ensemble models and incremental training – have a significant influence on stability. We recommend that ML model designers account for trade-offs in accuracy and jitter when making modeling choices.
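As a rough illustration of the kind of metric the abstract describes, jitter can be thought of as prediction churn between two model versions trained on successive data updates. The sketch below assumes jitter is the fraction of held-out examples whose predicted label flips between versions; the paper's exact definition may differ.

```python
def jitter(preds_v1, preds_v2):
    """Fraction of examples whose prediction flips between two model versions.

    Hypothetical definition: the paper may formalize jitter differently;
    this is only a minimal sketch of prediction churn.
    """
    if len(preds_v1) != len(preds_v2):
        raise ValueError("prediction lists must align on the same examples")
    flips = sum(a != b for a, b in zip(preds_v1, preds_v2))
    return flips / len(preds_v1)

# Example: two versions of a text classifier scored on the same held-out set.
v1 = ["pos", "neg", "pos", "neg", "pos"]
v2 = ["pos", "pos", "pos", "neg", "neg"]
print(jitter(v1, v2))  # 0.4 (2 of 5 predictions changed)
```

Under this view, a more stable model design is one whose predictions change less when the training data is refreshed, even at equal accuracy.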
