Online Feature Selection for Efficient Learning in Networked Systems

12/15/2021
by Xiaoxuan Wang, et al.

Current AI/ML methods for data-driven engineering use models that are mostly trained offline. Such models can be expensive to build in terms of communication and computing cost, and they rely on data collected over extended periods of time. Further, they become out-of-date when the system changes. To address these challenges, we investigate online learning techniques that automatically reduce the number of data sources used for model training. We present an online algorithm called Online Stable Feature Set Algorithm (OSFS), which selects a small feature set from a large number of available data sources after receiving a small number of measurements. The algorithm is initialized with a feature ranking algorithm, a feature set stability metric, and a search policy. We perform an extensive experimental evaluation of this algorithm using traces from an in-house testbed and from a data center in operation. We find that OSFS reduces the size of the feature set by 1-3 orders of magnitude on all investigated datasets. Most importantly, we find that the accuracy of a predictor trained on an OSFS-produced feature set is somewhat better than when the predictor is trained on a feature set obtained through offline feature selection. OSFS is thus shown to be effective as an online feature selection algorithm and robust regarding the sample interval used for feature selection. We also find that, when concept drift in the data underlying the model occurs, its effect can be mitigated by recomputing the feature set and retraining the prediction model.
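The abstract names three ingredients (a feature ranking algorithm, a feature set stability metric, and a search policy) without specifying them. The following is a minimal sketch of how such a loop could fit together, not the authors' OSFS implementation. Everything concrete in it is an assumption made for illustration: ranking by absolute Pearson correlation with the target, Jaccard similarity between consecutive top-k sets as the stability metric, a fixed k with a fixed batch size in place of a real search policy, and all function and parameter names (`rank_features`, `jaccard`, `select_stable_features`, `synthetic_stream`, `threshold`, `batch`).

```python
import random
from statistics import mean


def rank_features(X, y):
    """Rank feature indices by |Pearson correlation| with the target.

    Assumed ranking method for illustration; the abstract does not name one.
    """
    n_features = len(X[0])
    my = mean(y)
    vy = sum((b - my) ** 2 for b in y)
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        mx = mean(col)
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        vx = sum((a - mx) ** 2 for a in col)
        scores.append(abs(cov) / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0)
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def jaccard(a, b):
    """Assumed stability metric: Jaccard similarity of two feature sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def select_stable_features(stream, k, batch=50, threshold=0.9, max_samples=1000):
    """Consume (features, target) pairs until the top-k set stops changing.

    Simplified policy (an assumption): k is fixed, and the top-k set is
    recomputed every `batch` samples. Returns (feature_indices, samples_used);
    feature_indices is None if fewer than `batch` samples were seen.
    """
    X, y, previous = [], [], None
    for features, target in stream:
        X.append(features)
        y.append(target)
        if len(X) % batch == 0:
            current = rank_features(X, y)[:k]
            if previous is not None and jaccard(previous, current) >= threshold:
                return current, len(X)
            previous = current
        if len(X) >= max_samples:
            break
    return previous, len(X)


def synthetic_stream(n, n_features=20, seed=0):
    """Hypothetical data source: only features 0 and 1 drive the target."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(n_features)]
        yield x, 3 * x[0] + 3 * x[1] + 0.1 * rng.gauss(0, 1)


if __name__ == "__main__":
    selected, used = select_stable_features(synthetic_stream(1000), k=2, batch=100)
    print(selected, used)
```

Under concept drift, the same routine could simply be rerun on fresh measurements and the predictor retrained on the new feature set, which mirrors the mitigation the abstract describes.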

research
10/28/2020

Online feature selection for rapid, low-overhead learning in networked systems

Data-driven functions for operation and management often require measure...
research
04/07/2021

Online Feature Screening for Data Streams with Concept Drift

Screening feature selection methods are often used as a preprocessing st...
research
11/28/2022

Weight Predictor Network with Feature Selection for Small Sample Tabular Biomedical Data

Tabular biomedical data is often high-dimensional but with a very small ...
research
02/05/2019

Robust Regression via Online Feature Selection under Adversarial Data Corruption

The presence of data corruption in user-generated streaming data, such a...
research
06/18/2020

Leveraging Model Inherent Variable Importance for Stable Online Feature Selection

Feature selection can be a crucial factor in obtaining robust and accura...
research
03/30/2018

Online Regression with Model Selection

Online learning algorithms have a wide variety of applications in large ...
research
11/30/2022

Feature Selection with Distance Correlation

Choosing which properties of the data to use as input to multivariate de...
