Reduced Robust Random Cut Forest for Out-Of-Distribution detection in machine learning models

06/18/2022
by Harsh Vardhan et al.

Most machine learning-based regressors extract information from data collected via past observations of limited length in order to make predictions about the future. Consequently, when the input to these trained models has statistical properties that differ significantly from those of the training data, there is no guarantee of accurate prediction. Applying these models to out-of-distribution input data may therefore produce an outcome completely different from the desired one, which is not only erroneous but can also be hazardous in some cases. Successful deployment of these machine learning models in any system requires a detection system that can distinguish between out-of-distribution and in-distribution data (i.e., data similar to the training data). In this paper, we introduce a novel approach to this detection problem using a Reduced Robust Random Cut Forest (RRRCF) data structure, which can be used on both small and large data sets. Like the Robust Random Cut Forest (RRCF), RRRCF is a structured but reduced representation of the training data subspace in the form of cut trees. Empirical results on both low- and high-dimensional data showed that inferences about whether data lie inside or outside the training distribution can be made efficiently, and that the model is easy to train, with no difficult hyperparameter tuning. The paper discusses two different use cases for testing and validating the results.
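The abstract does not describe how RRRCF builds or queries its trees, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' method. It builds random cut trees on samples of the training data using the RRCF cut rule (choose a dimension with probability proportional to its range, then a cut uniformly within that range) and "reduces" each tree by stopping at a fixed depth. A query point is then scored by how many nested training bounding boxes contain it. The depth limit, the containment-depth score, and every name here (`build_tree`, `depth_inside`, `CutForestOOD`, `n_trees`, `sample_size`, `max_depth`) are hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Point = Sequence[float]


@dataclass
class Node:
    lo: List[float]                 # per-dimension minimum of the points in this node
    hi: List[float]                 # per-dimension maximum of the points in this node
    dim: int = -1                   # cut dimension (-1 marks a leaf)
    cut: float = 0.0                # cut value along `dim`
    left: Optional["Node"] = None   # points with x[dim] <= cut
    right: Optional["Node"] = None  # points with x[dim] > cut


def build_tree(points: List[Point], rng: random.Random,
               depth: int = 0, max_depth: int = 8) -> Node:
    d = len(points[0])
    lo = [min(p[i] for p in points) for i in range(d)]
    hi = [max(p[i] for p in points) for i in range(d)]
    node = Node(lo, hi)
    spans = [b - a for a, b in zip(lo, hi)]
    # Hypothetical "reduction": stop at a fixed depth instead of isolating
    # every point, so each tree is only a coarse summary of the training subspace.
    if depth >= max_depth or len(points) < 2 or sum(spans) == 0:
        return node
    # RRCF-style cut: pick a dimension with probability proportional to its
    # range, then a cut value drawn uniformly inside that range.
    dim = rng.choices(range(d), weights=spans)[0]
    cut = rng.uniform(lo[dim], hi[dim])
    left = [p for p in points if p[dim] <= cut]
    right = [p for p in points if p[dim] > cut]
    if not left or not right:  # degenerate cut (floating-point edge case)
        return node
    node.dim, node.cut = dim, cut
    node.left = build_tree(left, rng, depth + 1, max_depth)
    node.right = build_tree(right, rng, depth + 1, max_depth)
    return node


def depth_inside(node: Node, x: Point) -> int:
    """Count nested bounding boxes containing x before it escapes or reaches a leaf."""
    depth = 0
    while True:
        if any(v < a or v > b for v, a, b in zip(x, node.lo, node.hi)):
            return depth  # x lies outside the region spanned by this node's training points
        depth += 1
        if node.left is None:  # leaf
            return depth
        node = node.left if x[node.dim] <= node.cut else node.right


class CutForestOOD:
    def __init__(self, train: List[Point], n_trees: int = 50,
                 sample_size: int = 128, max_depth: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Each tree is built on a random subsample of the training data.
        self.trees = [
            build_tree(rng.sample(train, min(sample_size, len(train))), rng,
                       max_depth=max_depth)
            for _ in range(n_trees)
        ]

    def score(self, x: Point) -> float:
        # Mean containment depth: lower values mean x looks less like the training data.
        return sum(depth_inside(t, x) for t in self.trees) / len(self.trees)


if __name__ == "__main__":
    data_rng = random.Random(1)
    train = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(1000)]
    forest = CutForestOOD(train)
    print(forest.score([0.0, 0.0]))   # near the centre of the training data
    print(forest.score([8.0, -8.0]))  # far outside the training range
```

Thresholding this score (for example, flagging inputs whose mean depth falls below a level chosen on held-out in-distribution data) would turn it into a binary in/out-of-distribution decision. How RRRCF itself scores points and sets such a threshold is not stated in the abstract.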

Related research

Robust Flow-based Conformal Inference (FCI) with Statistical Guarantee (05/22/2022)
Conformal prediction aims to determine precise levels of confidence in p...

Unsupervised High Impedance Fault Detection Using Autoencoder and Principal Component Analysis (01/05/2023)
Detection of high impedance faults (HIF) has been one of the biggest cha...

Symbolic regression outperforms other models for small data sets (03/28/2021)
Machine learning is often applied to obtain predictions and new understa...

Property Unlearning: A Defense Strategy Against Property Inference Attacks (05/18/2022)
During the training of machine learning models, they may store or "learn...

Weighted Random Cut Forest Algorithm for Anomaly Detections (02/01/2022)
Random cut forest (RCF) algorithms have been developed for anomaly detec...

Interpretable transformed ANOVA approximation on the example of the prevention of forest fires (10/14/2021)
The distribution of data points is a key component in machine learning. ...

Bridge Data Center AI Systems with Edge Computing for Actionable Information Retrieval (05/28/2021)
Extremely high data rates at modern synchrotron and X-ray free-electron ...