1. Introduction
Anomaly detection (AD) seeks to identify atypical events. Anomalies tend to be domain- or problem-specific, and many occur over a period of time. We refer to such events as range-based anomalies, as they occur over a range (or period) of time.¹ Therefore, it is critical that accuracy measures for anomalies, and the systems detecting them, capture events that occur over a range of time. Unfortunately, classical metrics for anomaly detection were designed to handle only fixed-point anomalies (Aggarwal, 2013).
¹ Range-based anomalies are a specific type of collective anomalies (Chandola et al., 2009). Moreover, range-based anomalies are similar, but not identical, to sequence anomalies (Warrender et al., 1999).
An AD algorithm behaves much like a pattern recognition and binary classification algorithm: it recognizes certain patterns in its input and classifies them as either normal or anomalous. For this class of algorithms,
Recall and Precision are widely used for evaluating the accuracy of the result. They are formally defined as in Equations 1 and 2, where TP denotes true positives, FP denotes false positives, and FN denotes false negatives.

Recall = TP / (TP + FN)    (1)

Precision = TP / (TP + FP)    (2)
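As a concrete illustration of the point-based definitions above, the two metrics can be computed directly from labeled time indices. The following is a minimal sketch (the function and variable names are ours, not from the paper):

```python
def point_recall_precision(real, pred):
    """Classical point-based recall and precision (Equations 1 and 2).

    real, pred: sets of time indices labeled anomalous by the ground
    truth and by the detector, respectively.
    """
    tp = len(real & pred)   # true positives: points flagged and truly anomalous
    fn = len(real - pred)   # false negatives: anomalous points missed
    fp = len(pred - real)   # false positives: normal points flagged
    recall = tp / (tp + fn) if real else 0.0
    precision = tp / (tp + fp) if pred else 0.0
    return recall, precision

# Example: a real anomaly at t = 3..5, detector flags t = 4..6.
r, p = point_recall_precision({3, 4, 5}, {4, 5, 6})
# r = 2/3, p = 2/3
```

Note that these point-based scores treat the anomaly at t = 3..5 as three independent points, which is exactly the limitation the range-based definitions address.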
While useful for point-based anomalies, classical recall and precision suffer from their inability to capture, and bias, classification correctness for domain-specific time-series anomalies. Because of this, the accuracy of many time-series AD systems is being misrepresented, as point-based recall and precision are used to measure their effectiveness (Singh and Olinsky, 2017). Furthermore, the need to accurately identify time-series anomalies is growing due to the explosion of streaming and real-time systems (Twitter, 2015; Malhotra et al., 2015; Guha et al., 2016; Ahmad et al., 2017; Lee et al., 2018). To address this, we redefine recall and precision to encompass range-based anomalies. Unlike prior work (Lavin and Ahmad, 2015; Ahmad et al., 2017), our mathematical definitions are a superset of the classical definitions, enabling our system to subsume point-based anomalies. Moreover, our system is broadly generalizable, providing specialization functions to control a domain's bias along a multi-dimensional axis, as necessary to accommodate the needs of specific domains.
In this short paper, we present novel formal definitions of recall and precision for range-based anomaly detection that both subsume those formerly defined for point-based anomaly detection as well as being customizable to a rich set of application domains. Empirical data has been omitted to meet the venue’s compressed format.
2. Range-based Recall
Classical Recall rewards an AD system when anomalies are successfully identified (i.e., TP) and penalizes it when they are not (i.e., FN). It is computed by counting the number of anomalous points successfully predicted and then dividing that number by the total number of anomalous points. However, it is not sensitive to domains where a single anomaly can be represented as a range of contiguous points. In this section, we propose a new way to compute recall for such range-based anomalies. Table 1 summarizes our notation.
Notation | Description
---|---
R | set of real anomaly ranges
R_i | the i-th real anomaly range
P | set of predicted anomaly ranges
P_j | the j-th predicted anomaly range
N_r | number of real anomaly ranges
N_p | number of predicted anomaly ranges
α | relative weight of existence reward
β | relative weight of overlap reward
γ() | overlap cardinality function
ω() | overlap size function
δ() | positional bias function
Given a set of real anomaly ranges R and a set of predicted anomaly ranges P, our formulation iterates over the set of all real anomaly ranges (R_i ∈ R), computing a recall score for each real anomaly range (R_i) and adding them up into a total recall score. This total score is then divided by the total number of real anomalies (N_r) to obtain an average recall score for the whole time-series.
Recall(R, P) = ( ∑_{i=1}^{N_r} Recall_T(R_i, P) ) / N_r    (3)
When computing the recall score for a single real anomaly range , we take the following aspects into account:
- Existence: Identifying an anomaly (even by a single point in R_i) may be valuable in some application domains.
- Size: The larger the size of the correctly predicted portion of R_i, the higher the recall score will likely be.
- Position: In some cases, not only size, but also the relative position of the correctly predicted portion of R_i may be important to the application (e.g., early and late biases).
- Cardinality: Detecting R_i with a single predicted anomaly range P_j ∈ P may be more valuable to an application than doing so with multiple different ranges in P.
We capture these aspects as a sum of two reward terms weighted by α and β, respectively, where 0 ≤ α ≤ 1, 0 ≤ β ≤ 1, and α + β = 1. α represents the relative importance of rewarding existence, whereas β represents the relative importance of rewarding size, position, and cardinality, which all stem from the overlap between R_i and the set of all predicted anomaly ranges (P).
Recall_T(R_i, P) = α × ExistenceReward(R_i, P) + β × OverlapReward(R_i, P)    (4)
If anomaly range R_i is identified (i.e., ∑_{j=1}^{N_p} |R_i ∩ P_j| ≥ 1 across all P_j ∈ P), then an existence reward of 1 is earned.
ExistenceReward(R_i, P) = 1, if ∑_{j=1}^{N_p} |R_i ∩ P_j| ≥ 1; 0, otherwise    (5)
Additionally, an overlap reward, dependent upon three application-defined functions γ(), ω(), and δ(), can be earned. These functions capture the cardinality (γ), size (ω), and position (δ) of the overlap. The cardinality term serves as a scaling factor for the rewards earned from the size and position of the overlap.
OverlapReward(R_i, P) = CardinalityFactor(R_i, P) × ∑_{j=1}^{N_p} ω(R_i, R_i ∩ P_j, δ)    (6)
The cardinality factor is largest (i.e., 1) when R_i overlaps with at most one predicted anomaly range (i.e., it is identified by a single prediction range). Otherwise, it receives a value γ(R_i, P) defined by the application.
CardinalityFactor(R_i, P) = 1, if R_i overlaps with at most one P_j ∈ P; γ(R_i, P), otherwise    (7)
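Putting Equations 3 through 7 together, the recall computation can be sketched in a few lines of Python. This is a minimal sketch under our own naming conventions: anomaly ranges are represented as inclusive (start, end) index pairs, and the flat positional bias and the reciprocal cardinality function are illustrative choices rather than prescribed by the definitions.

```python
def overlap(r, p):
    """Set of time indices common to two inclusive (start, end) ranges."""
    return set(range(max(r[0], p[0]), min(r[1], p[1]) + 1))

def flat_bias(pos, length):
    """Flat positional bias: every index position is equally important."""
    return 1.0

def gamma_reciprocal(x):
    """Illustrative gamma(): reciprocal of the number of overlapping ranges."""
    return 1.0 / x

def omega(rng, overlap_set, delta):
    """Size/position reward: positional value of the overlap relative to
    the total positional value of the range rng (used in Eq. 6)."""
    length = rng[1] - rng[0] + 1
    total = sum(delta(pos, length) for pos in range(1, length + 1))
    got = sum(delta(i - rng[0] + 1, length) for i in overlap_set)
    return got / total if total else 0.0

def recall_T(Ri, P, alpha, delta=flat_bias, gamma=gamma_reciprocal):
    """Recall score of one real anomaly range Ri (Equations 4-7)."""
    overlaps = [overlap(Ri, Pj) for Pj in P]
    hits = [o for o in overlaps if o]
    existence = 1.0 if hits else 0.0                    # Eq. 5
    card = 1.0 if len(hits) <= 1 else gamma(len(hits))  # Eq. 7
    overlap_reward = card * sum(omega(Ri, o, delta) for o in overlaps)  # Eq. 6
    return alpha * existence + (1 - alpha) * overlap_reward  # Eq. 4, beta = 1 - alpha

def range_recall(R, P, alpha=0.5):
    """Average recall over all real anomaly ranges (Equation 3)."""
    return sum(recall_T(Ri, P, alpha) for Ri in R) / len(R)

# Example: one real anomaly at t = 3..7, one prediction covering t = 4..5.
# Existence reward = 1; overlap covers 2 of 5 points under flat bias (0.4),
# so Recall = 0.5 * 1 + 0.5 * 0.4 = 0.7.
score = range_recall([(3, 7)], [(4, 5)], alpha=0.5)
```

Setting α = 1 reduces the score to pure existence detection, while α = 0 rewards only the overlap, which is how the same formula specializes to different domains.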
The constants (α and β) and functions (γ(), ω(), and δ()) are tunable according to the needs of the application. Next, we illustrate how they can be customized with examples.
The cardinality factor γ() should generally be inversely proportional to the number of distinct prediction ranges that a real anomaly range R_i overlaps. For example, γ() can simply be set to the reciprocal of that number (i.e., γ(x) = 1/x).
Figure 1(a) provides an example ω() function for size, which can be used with many different δ() functions for positional bias, as shown in Figure 1(b). If all index positions are equally important, then the flat bias function should be used. If earlier ones are more important than later ones (e.g., early cancer detection (Kourou et al., 2015), real-time apps (Ahmad et al., 2017)), then the front-end bias function should be used. Finally, if later index positions are more important (e.g., delayed response in robotic defense), then the tail-end bias function should be used.
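The three positional bias choices can be expressed as weight functions over 1-based index positions within an anomaly range. The linear functional forms below are our own illustrative choices (the paper's figures leave the exact shapes to the application):

```python
def flat_bias(pos, length):
    """Every index position in the range matters equally."""
    return 1.0

def front_end_bias(pos, length):
    """Earlier positions earn higher reward (e.g., early detection)."""
    return float(length - pos + 1)

def tail_end_bias(pos, length):
    """Later positions earn higher reward (e.g., delayed response)."""
    return float(pos)

def positional_reward(rng_len, detected_positions, delta):
    """Normalized reward for detecting the given 1-based positions
    within a range of rng_len points, under bias function delta."""
    total = sum(delta(pos, rng_len) for pos in range(1, rng_len + 1))
    return sum(delta(pos, rng_len) for pos in detected_positions) / total

# Detecting only the first 2 of 4 positions:
#   flat:      (1+1)/(1+1+1+1) = 0.5
#   front-end: (4+3)/(4+3+2+1) = 0.7
#   tail-end:  (1+2)/(1+2+3+4) = 0.3
```

The same partial detection thus scores very differently depending on which bias the domain calls for, which is exactly the knob the δ() function exposes.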
3. Range-based Precision
Classical Precision is computed by counting the number of successful prediction points (i.e., TP) in proportion to the total number of prediction points (i.e., TP + FP). The key difference between Precision and Recall is that Precision penalizes FPs. In this section, we extend classical precision to handle range-based anomalies. Our formulation follows a similar structure to that of range-based recall.
Given a set of real anomaly ranges R and a set of predicted anomaly ranges P, Precision(R, P) iterates over the set of predicted anomaly ranges (P_j ∈ P), computing a precision score for each predicted range (P_j) and summing them. This sum is then divided by the total number of predicted anomalies (N_p), averaging the score for the whole time-series.
Precision(R, P) = ( ∑_{j=1}^{N_p} Precision_T(R, P_j) ) / N_p    (8)
When computing Precision_T for a single predicted anomaly range P_j, there is no need for an existence reward, because precision by definition emphasizes prediction quality, and existence by itself is too low a bar for judging the quality of a prediction. This removes the need for the α and β constants. Therefore:
Precision_T(R, P_j) = CardinalityFactor(P_j, R) × ∑_{i=1}^{N_r} ω(P_j, P_j ∩ R_i, δ)    (9)
γ(), ω(), and δ() are customizable as before. Furthermore, CardinalityFactor(P_j, R) behaves under the same settings as in Section 2 (except α and β are not needed). Note that, while δ() provides a potential knob for positional bias, we believe that in many domains a flat bias function will suffice for precision, as an FP is typically considered uniformly bad wherever it appears in a prediction range.
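Mirroring the recall sketch, Equations 8 and 9 can be sketched as follows (again under our own naming, with inclusive (start, end) ranges; the flat bias and γ(x) = 1/x are illustrative choices):

```python
def overlap(a, b):
    """Set of time indices common to two inclusive (start, end) ranges."""
    return set(range(max(a[0], b[0]), min(a[1], b[1]) + 1))

def omega_flat(rng, overlap_set):
    """Flat-bias size reward: fraction of rng covered by the overlap."""
    return len(overlap_set) / (rng[1] - rng[0] + 1)

def precision_T(R, Pj, gamma=lambda x: 1.0 / x):
    """Precision score of one predicted range Pj (Equation 9).

    Note the roles are swapped relative to recall: the overlap is
    measured against the prediction Pj, and the cardinality factor
    counts how many real ranges Pj overlaps."""
    overlaps = [overlap(Pj, Ri) for Ri in R]
    hits = [o for o in overlaps if o]
    card = 1.0 if len(hits) <= 1 else gamma(len(hits))
    return card * sum(omega_flat(Pj, o) for o in overlaps)

def range_precision(R, P):
    """Average precision over all predicted ranges (Equation 8)."""
    return sum(precision_T(R, Pj) for Pj in P) / len(P)

# Example: real anomaly at t = 3..7.
# Prediction (4, 5) lies entirely inside it -> precision 1.0.
# Prediction (4, 9) has 4 of its 6 points inside -> precision 4/6.
exact = range_precision([(3, 7)], [(4, 5)])
loose = range_precision([(3, 7)], [(4, 9)])
```

The absence of an existence term is visible here: a prediction that merely touches a real anomaly earns only in proportion to its overlap, so sprawling predictions are penalized exactly as FPs are in the classical metric.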
4. Conclusion
In this paper, we note that traditional recall and precision were invented for point-based analysis. In range-based anomaly detection, anomalies are not necessarily single points, but are, in many cases, ranges. In response, we offered new recall and precision definitions that take ranges into account.
Acknowledgments. This research has been funded in part by Intel.
References
- Aggarwal (2013) Charu C. Aggarwal. 2013. Outlier Analysis. Springer.
- Ahmad et al. (2017) Subutai Ahmad, Alexander Lavin, Scott Purdy, and Zuha Agha. 2017. Unsupervised Real-time Anomaly Detection for Streaming Data. Neurocomputing 262 (2017), 134–147.
- Chandola et al. (2009) Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly Detection: A Survey. ACM Computing Surveys 41, 3 (2009), 15:1–15:58.
- Guha et al. (2016) Sudipto Guha, Nina Mishra, Gourav Roy, and Okke Schrijvers. 2016. Robust Random Cut Forest Based Anomaly Detection on Streams. In International Conference on Machine Learning (ICML). 2712–2721.
- Kourou et al. (2015) Konstantina Kourou, Themis P. Exarchos, Konstantinos P. Exarchos, Michalis V. Karamouzis, and Dimitrios I. Fotiadis. 2015. Machine Learning Applications in Cancer Prognosis and Prediction. Computational and Structural Biotechnology Journal 13 (2015), 8–17.
- Lavin and Ahmad (2015) Alexander Lavin and Subutai Ahmad. 2015. Evaluating Real-Time Anomaly Detection Algorithms - The Numenta Anomaly Benchmark. In IEEE International Conference on Machine Learning and Applications (ICMLA). 38–44.
- Lee et al. (2018) Tae Jun Lee, Justin Gottschlich, Nesime Tatbul, Eric Metcalf, and Stan Zdonik. 2018. Greenhouse: A Zero-Positive Machine Learning System for Time-Series Anomaly Detection. https://arxiv.org/abs/1801.03168/. In SysML Conference.
- Malhotra et al. (2015) Pankaj Malhotra, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal. 2015. Long Short Term Memory Networks for Anomaly Detection in Time Series. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). 89–94.
- Singh and Olinsky (2017) Nidhi Singh and Craig Olinsky. 2017. Demystifying Numenta Anomaly Benchmark. In International Joint Conference on Neural Networks (IJCNN). 1570–1577.
- Twitter (2015) Twitter. 2015. AnomalyDetection R Package. https://github.com/twitter/AnomalyDetection/. (2015).
- Warrender et al. (1999) Christina Warrender, Stephanie Forrest, and Barak Pearlmutter. 1999. Detecting Intrusions using System Calls: Alternative Data Models. In IEEE Symposium on Security and Privacy. 133–145.