OptIForest: Optimal Isolation Forest for Anomaly Detection

by Haolong Xiang, et al.

Anomaly detection plays an increasingly important role in critical tasks across many fields, such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and the category based on the isolation forest mechanism stands out for its simplicity, effectiveness, and efficiency; for example, iForest is often employed as a state-of-the-art detector in real deployments. While the majority of isolation forests use a binary tree structure, the LSHiForest framework has demonstrated that a multi-fork isolation tree structure can lead to better detection performance. However, no theoretical work has answered the fundamentally and practically important question of which tree structure is optimal for an isolation forest with respect to the branching factor. In this paper, we establish a theory of isolation efficiency to answer this question and determine the optimal branching factor for an isolation tree. Building on this theoretical underpinning, we design a practical optimal isolation forest, OptIForest, which incorporates clustering-based learning to hash so that more information can be learned from the data for better isolation quality. The rationale of our approach is a better bias-variance trade-off, achieved through bias reduction in OptIForest. Extensive experiments on a series of benchmark datasets, covering both comparative and ablation studies, demonstrate that our approach generally achieves better detection performance, efficiently and robustly, than state-of-the-art methods, including deep-learning-based ones.
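To make the notion of a branching factor concrete, the following is a minimal toy sketch of isolation with multi-fork trees. It is not OptIForest or LSHiForest: the uniform random cuts, the function names (`path_length`, `average_path_length`), and all parameter defaults are assumptions chosen for illustration only. It shows the basic mechanism that isolation forests rely on, namely that an anomalous point tends to be separated from the rest of the data after fewer random partitions than a typical point.

```python
import random


def path_length(point, data, branching=2, depth=0, max_depth=12, rng=random):
    """Depth at which `point` is isolated along one random multi-fork path.

    Hypothetical helper: each node picks a random dimension and
    `branching - 1` uniform random cut values, giving `branching` buckets.
    """
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:
        return depth  # no split is possible on this dimension
    cuts = sorted(rng.uniform(lo, hi) for _ in range(branching - 1))

    def bucket(row):
        # Index of the child the row falls into (0 .. branching - 1).
        return sum(row[dim] >= c for c in cuts)

    target = bucket(point)
    subset = [row for row in data if bucket(row) == target]
    return path_length(point, subset, branching, depth + 1, max_depth, rng)


def average_path_length(point, data, branching=2, n_trees=100,
                        sample_size=64, seed=0):
    """Mean isolation depth of `point` over `n_trees` random subsamples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(point, sample, branching, rng=rng)
    return total / n_trees


# Synthetic data (an assumption for this sketch): one Gaussian cluster
# plus a single far-away point.
_rng = random.Random(42)
data = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(256)]
outlier = (8.0, 8.0)
inlier = (0.0, 0.0)
data.append(outlier)

for b in (2, 3, 4):
    print(b,
          average_path_length(outlier, data, branching=b),
          average_path_length(inlier, data, branching=b))
```

A shorter average path length indicates a more anomalous point. Varying `branching` in the loop changes how quickly points are isolated, but this toy gives no indication of which branching factor is optimal; that question is the subject of the paper's theory.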


Isolation Mondrian Forest for Batch and Online Anomaly Detection

We propose a new method, named isolation Mondrian forest (iMondrian fore...

A Mathematical Assessment of the Isolation Tree Method for Data Anomaly Detection in Big Data

We present the mathematical analysis of the Isolation Random Forest Meth...

Interpretable Anomaly Detection with DIFFI: Depth-based Feature Importance for the Isolation Forest

Anomaly Detection is one of the most important tasks in unsupervised lea...

Extended Isolation Forest

We present an extension to the model-free anomaly detection algorithm, I...

Unsupervised Ensemble Methods for Anomaly Detection in PLC-based Process Control

Programmable logic controller (PLC) based industrial control systems (IC...

Distribution and volume based scoring for Isolation Forests

We make two contributions to the Isolation Forest method for anomaly and...

Weighted Random Cut Forest Algorithm for Anomaly Detections

Random cut forest (RCF) algorithms have been developed for anomaly detec...
