
Correlated Anomaly Detection from Large Streaming Data

by Zheng Chen, et al.

Correlated anomaly detection (CAD) from streaming data is a type of group anomaly detection and an essential task in real-time data mining applications such as botnet detection, financial event detection, and industrial process monitoring. The primary approach to this type of detection in previous research is based on the principal score (PS) of divided batches or sliding windows, obtained by computing the top eigenvalues of the correlation matrix, e.g., with the Lanczos algorithm. However, this paper identifies the phenomenon of principal score degeneration for large data sets. It then proves mathematically, and demonstrates empirically, that current PS-based methods are likely to fail for CAD on large-scale streaming data, even if the number of correlated anomalies grows with the data size at a reasonable rate. In reality, anomalies tend to be a minority of the data, so this issue can be even more serious. We propose a framework with two novel randomized algorithms, rPS and gPS, for better detection of correlated anomalies of varying correlation strength in large streaming data. We evaluated the framework on a large server log data set and a U.S. stock daily price data set. Our experiments show high and balanced recall and estimated accuracy, compared with direct principal score evaluation and several other recent group anomaly detection algorithms. Moreover, our techniques significantly improve the computational efficiency and scalability of principal score calculation.
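The abstract says the principal score comes from the top eigenvalue(s) of a window's correlation matrix. The sketch below is a toy illustration under our own assumptions, not the paper's method.

It makes three assumptions. First, it takes the score of a window of n series to be λ_max / n. This is the share of total variance captured by the first principal component, since the trace of an n × n correlation matrix is n. Second, it uses plain power iteration in place of Lanczos. Third, it plants k series with pairwise correlation ρ among otherwise independent noise.

The population top eigenvalue of such a matrix is 1 + (k-1)ρ. With k fixed, the score therefore shrinks roughly like 1/n. This is one simple way to see how a PS-based threshold can degenerate as the data grows.

The helper names (`correlation_matrix`, `top_eigenvalue`, `principal_score`, `simulate_window`) are hypothetical. Nothing here reproduces rPS or gPS.

```python
import math
import random


def correlation_matrix(series):
    """Pearson correlation matrix of equal-length series (one list per series)."""
    n, length = len(series), len(series[0])
    standardized = []
    for s in series:
        mean = sum(s) / length
        std = math.sqrt(sum((x - mean) ** 2 for x in s) / length)
        # A constant series has no defined correlation; treat it as all zeros.
        standardized.append([(x - mean) / std if std > 0 else 0.0 for x in s])
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            c = sum(a * b for a, b in zip(standardized[i], standardized[j])) / length
            corr[i][j] = corr[j][i] = c
    return corr


def top_eigenvalue(matrix, max_iters=500, tol=1e-10, seed=0):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration.

    Power iteration stands in for the Lanczos method to keep this dependency-free.
    """
    n = len(matrix)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random unit start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(max_iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # w = A v
        new_lam = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient v^T A v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        if abs(new_lam - lam) <= tol * max(1.0, abs(new_lam)):
            return new_lam
        lam = new_lam
    return lam


def principal_score(window):
    """Assumed score: lambda_max / n, the variance share of the first component."""
    return top_eigenvalue(correlation_matrix(window)) / len(window)


def simulate_window(n, k, rho, length, rng):
    """n series of `length` points; the first k share one factor (pairwise corr rho)."""
    factor = [rng.gauss(0.0, 1.0) for _ in range(length)]
    window = []
    for i in range(n):
        if i < k:
            # sqrt(rho) * f + sqrt(1 - rho) * e has unit variance and corr rho with peers.
            window.append([math.sqrt(rho) * f + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
                           for f in factor])
        else:
            window.append([rng.gauss(0.0, 1.0) for _ in range(length)])
    return window


if __name__ == "__main__":
    rng = random.Random(42)
    for n in (20, 80):
        score = principal_score(simulate_window(n, k=5, rho=0.9, length=200, rng=rng))
        print(f"n={n:3d}  principal score={score:.3f}")
```

Running the script prints a clearly smaller score for n = 80 than for n = 20, even though the same five strongly correlated series are present in both windows. For real workloads, a Lanczos-type solver is the usual way to get the top eigenvalue of a large matrix. One example is SciPy's `scipy.sparse.linalg.eigsh` with `k=1, which='LA'`, which wraps ARPACK.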




Real-Time Anomaly Detection for Streaming Analytics

Much of the world's data is streaming, time-series data, where anomalies ...

Anomaly Detection on Financial Time Series by Principal Component Analysis and Neural Networks

A major concern when dealing with financial time series involving a wide...

Online anomaly detection using statistical leverage for streaming business process events

While several techniques for detecting trace-level anomalies in event lo...

An Efficient Anomaly Detection Approach using Cube Sampling with Streaming Data

Anomaly detection is critical in various fields, including intrusion det...

Anomaly Detection in Big Data

Anomaly is defined as a state of the system that does not conform to the n...

CADDeLaG: Framework for distributed anomaly detection in large dense graph sequences

Random walk based distance measures for graphs such as commute-time dist...

Real-time Anomaly Detection and Classification in Streaming PMU Data

Ensuring secure and reliable operations of the power grid is a primary c...