Anomaly detection is the task of detecting outliers (anomalies) in data, i.e. points that deviate significantly from the distribution of the data. Outlier detection, however, is an under-specified and consequently ill-posed task due to its inherently unsupervised nature. Anomaly detection strategies such as distance-based [fan2006nonparametric, ghoting2008fast], density-based [lof, papadimitriou2003loci], and subspace-based methods [keller2012hics, lazarevic2005feature] pioneered the field. Additionally, autoencoders [chen2017outlier, sarvari2021unsupervised] and adversarial networks [geiger2020tadgan] have made substantial contributions. However, the literature neglects assessing the trustworthiness of the predicted outcomes, namely their uncertainty.
Uncertainty is a measure of model confidence, which may be learnt from the data [lakshminarayanan2016simple, kendall2017uncertainties] or by the use of extra instances [gal2016dropout]. Uncertainty estimation has been a long-standing challenge in machine learning. Most recently, it has been successfully adopted to improve performance on object detection [jiang2018acquisition, kuppers2020multivariate, 9156274, neumann2018relaxed, vakhitov2021uncertainty], unsupervised and self-supervised learning [fiery2021, 8289350, suris2021hyperfuture]. Yet, uncertainty remains largely unexplored in the context of anomaly detection.
In this work, we propose a novel model based on Hyperbolic uncertainty for Anomaly Detection, which we dub HypAD. We leverage the current state-of-the-art technique for anomaly detection in univariate time series, TadGAN [geiger2020tadgan]. TadGAN detects anomalies by attempting to reconstruct the input signal, making use of an LSTM sequence encoding and two GAN critics, cf. Sec. III-A. We introduce uncertainty into the anomaly detector: we map the input and the reconstructed signal into a hyperbolic space, where the signals additionally have an uncertainty score; and we train the novel embeddings end-to-end with a Poincaré distance loss, cf. Sec. III-B.
The proposed HypAD uses uncertainty to discern whether the reconstruction error is large because the signal is anomalous, or simply because the model cannot reconstruct it well. In the former case, HypAD is certain about the reconstruction (e.g. most of the signal is well-behaved and the model expects known patterns) but its reconstruction is wrong, as a part of the signal is anomalous. In the latter, HypAD downgrades its anomaly score because it is not certain about the signal reconstruction. This may be because of a complex pattern, which the model did not have enough capacity to learn. A larger uncertainty indicates that a large reconstruction error may stem either from an anomaly or from a failure of the model to reconstruct the signal. (See discussion in Sec. III-B2.)
Thanks to uncertainty, HypAD outperforms the state-of-the-art univariate anomaly detector TadGAN [geiger2020tadgan] on the established univariate benchmarks of NASA, Yahoo, Numenta Anomaly Benchmark [lavin2015evaluating], as well as on two multivariate datasets of daily activities in elderly home residences CASAS [cook2012casas] and industrial water treatment plant SWaT [Mathur2016SWaTAW]. As we show in experimental results in Sec. IV, reducing anomaly scores in uncertain cases also yields fewer false alarms (the model achieves best F1 performance with larger precision).
The main contributions of this work are:
We propose the first model for anomaly detection based on hyperbolic uncertainty;
We propose the novel key idea of detectable anomaly: an instance is anomalous when the model is certain about it but wrong;
We integrate the estimated uncertainty into a state-of-the-art univariate anomaly detector and consistently outperform it on established univariate and multivariate datasets.
II Related Works
To the best of our knowledge, this is the first work to have combined anomaly detection with uncertainty estimation, and the first work to have further proposed hyperbolic uncertainty for it. Previous work relates to ours from three main perspectives, which we review here: uncertainty estimation techniques, anomaly detection in time series and hyperbolic neural networks.
II-A Uncertainty Estimation Techniques
We identified two different strategies of approximating uncertainty. Ensemble-based posterior approximation uses several weak models to make naive predictions and combines them according to a consensus function into a more complex predictive model [dietterich2000ensemble]. One of the most popular approaches to uncertainty estimation based on ensembles is Monte Carlo (MC) Dropout. It drops neurons on every layer during both the training and test phases [gal2016dropout].
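As an illustration of the MC Dropout idea, the following is a minimal NumPy sketch, not the implementation of [gal2016dropout]. The tiny two-layer network, the function name `mc_dropout_predict` and all sizes are hypothetical; the sketch only shows how keeping dropout active at test time yields several stochastic predictions whose spread can be read as uncertainty.

```python
import numpy as np

def mc_dropout_predict(x, w1, w2, p_drop=0.5, n_samples=100, seed=0):
    """Run n_samples stochastic forward passes with dropout kept active."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_samples):
        h = np.maximum(x @ w1, 0.0)               # hidden ReLU layer
        mask = rng.random(h.shape) >= p_drop      # keep a unit with prob. 1 - p_drop
        h = h * mask / (1.0 - p_drop)             # inverted-dropout rescaling
        preds.append(h @ w2)                      # one stochastic prediction
    preds = np.stack(preds)                       # (n_samples, batch, out)
    return preds.mean(axis=0), preds.std(axis=0)  # predictive mean and spread

# Toy usage with random weights (hypothetical sizes).
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 1))
mean, std = mc_dropout_predict(x, w1, w2)
```

The standard deviation across passes plays the role of the uncertainty estimate; a larger number of passes gives a smoother estimate at a proportionally higher inference cost.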
Generative models for aleatory modelling use an additional latent variable to make stochastic predictions and evaluate the uncertainty of the model. Generative Adversarial Networks (GANs) [goodfellow2014generative] play a minimax game where the discriminator needs to distinguish between real examples and the generated outcomes. GANs achieve state-of-the-art performance, and we build on them by attaching hyperspace mapping layers to estimate the uncertainty of the model. Another interesting approach to estimating uncertainty is to use energy-based models [du2019implicit, salakhutdinov2009deep, xie2016theory]. They learn an energy function that models the compatibility of the input and the output. Our method improves on energy-based models because the integrated hyperbolic uncertainty mechanism suffers from neither the cold- nor the warm-start problems [zhang2021dense], which complicate training [xie2018cooperative].
II-B Anomaly Detection in Time Series
We identified five categories of methods proposed in the literature for anomaly detection in time series. Distance-based outlier detectors consider the distance of a point from its k-nearest neighbours [conf/vldb/KnorrN98, angiulli2002fast, ghoting2008fast]. Density-based methods [lof, papadimitriou2003loci, He2003DiscoveringCL, 10.1145/3447548.3467137, 10.1145/3292500.3330672, WangCNLCT21, su2019robust] take into account the density of the point and its neighbours. Prediction-based methods [benkabou2021local, unsupahmad, conf/kdd/HundmanCLCS18] calculate the difference between the predicted value and the true value to detect anomalies. Reconstruction-based methods [An2015VariationalAB, Malhotra2016LSTMbasedEF, 10.1145/3447548.3467174] compare the input signal with the one reconstructed in the output layer, typically by using autoencoders. These methods assume that anomalies are difficult to reconstruct and are lost when the signal gets mapped to lower dimensions, thus a higher reconstruction error means a higher anomaly score. [Malhotra2016LSTMbasedEF] uses an LSTM autoencoder for multi-sensor anomaly detection. [chen2017outlier, sarvari2021unsupervised] use an ensemble of autoencoders to boost performance by focusing on learning the inlier characteristics at each iteration. [li2021multivariate] uses a hierarchical variational auto-encoder with two stochastic latent variables to learn the temporal and inter-metric embeddings for multivariate data. SISVAE [li2020anomaly] uses a variational auto-encoder with a smoothness-inducing prior over possible estimations to capture latent temporal structures of time series without relying on the assumption of constant noise. Recently, GANs have been employed to detect anomalies in time series data. Our method also lies in this category. MAD-GAN [Li2019MADGANMA] combines the discriminator output and reconstruction error to detect anomalies in multivariate time series. BeatGAN [ijcai2019-616] uses an encoder-decoder generator with a modified time-warping-based data augmentation to detect anomalies in medical ECG inputs. TadGAN [geiger2020tadgan] uses a cycle-consistent GAN architecture with an encoder-decoder generator and additionally proposes several ways to compute the reconstruction error and to combine it with the critic outputs. We build on top of TadGAN's architecture by applying the hyperbolic mapping layer to the reconstructed time windows to assess the uncertainty of the detector.
II-C Hyperbolic Neural Networks
Deep representation learning in hyperspaces has gained momentum after the pioneering work of hyperNNs [NEURIPS2018_dbab2adc], which generalises Euclidean operations (e.g. matrix multiplications) to their counterparts in hyperspace. The authors propose hyperspace analogues of neural network components such as fully-connected (FC) layers, multinomial logistic regression (MLR) and recurrent neural networks. Furthermore, methods like the Einstein midpoint [gulcehre2018hyperbolic] and the Fréchet mean [Lou2020DifferentiatingTT] propose different ways of aggregating features in hyperspace. The work in [shimizu2021hyperbolic] extends hyperNNs and proposes Poincaré split/concatenation operations, generalising the convolutional layer to hyperspace. [NEURIPS2019_103303dd, chami2019hyperbolic] propose hyperbolic graph neural networks, leveraging hyperNNs.
Thus formulated, hyperNNs have mainly been adopted to improve performance by leveraging hierarchies and uncertainty in zero-shot learning [Liu_2020_CVPR], re-identification [Khrulkov_2020_CVPR], and action recognition . Of particular interest, [suris2021hyperfuture] has leveraged hyperNNs to model a hierarchy of actions from unlabeled videos. To the best of our knowledge, this is the first work to have applied hyperNNs for sequence modelling with the goal of anomaly detection.
In this section, we first discuss best practices in anomaly detection (Section III-A); then we detail the proposed hyperbolic uncertainty and its use for detecting anomalies (Section III-B); finally we discuss the motivation for it (Section III-C).
The current state-of-the-art in univariate anomaly detection is a reconstruction-based technique [geiger2020tadgan] which additionally leverages a GAN critic score. TadGAN encodes the input data to a latent space and then decodes the encoded data. This encoding-decoding operation requires two mapping functions $\mathcal{E}: X \rightarrow Z$ and $\mathcal{G}: Z \rightarrow X$. The reconstruction operation can be given as $x \rightarrow \mathcal{E}(x) \rightarrow \mathcal{G}(\mathcal{E}(x)) \approx \hat{x}$. TadGAN leverages adversarial learning to train the two mappings by using two adversarial critics $\mathcal{C}_x$ and $\mathcal{C}_z$. The goal of $\mathcal{C}_x$ is to distinguish between the real and the generated time series, while $\mathcal{C}_z$ measures the performance of the mapping into latent space. The model is trained using the combination of Wasserstein loss [pmlr-v70-arjovsky17a] and cycle consistency loss. TadGAN computes the reconstruction error between $x$ and $\hat{x}$ using three types of reconstruction functions: i. Point-wise difference: considers the difference of values at every time stamp; ii. Area difference: is applied on signals of fixed lengths and measures the similarity between local regions; iii. Dynamic time warping: additionally handles time gaps between the two signals to calculate the reconstruction error.
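To make two of the reconstruction functions concrete, here is a small NumPy sketch of a point-wise difference and of a textbook dynamic time warping (DTW) distance. It is an illustrative assumption about how such errors can be computed, not TadGAN's exact code; the function names are hypothetical and the area difference is omitted.

```python
import numpy as np

def pointwise_error(x, x_hat):
    """Absolute difference between signal and reconstruction at every time stamp."""
    return np.abs(np.asarray(x, float) - np.asarray(x_hat, float))

def dtw_distance(x, x_hat):
    """Classic O(n*m) dynamic time warping cost between two 1-D signals."""
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    n, m = len(x), len(x_hat)
    D = np.full((n + 1, m + 1), np.inf)   # cumulative cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - x_hat[j - 1])
            # extend the cheapest of: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

signal = [0.0, 1.0, 2.0, 3.0]
delayed = [0.0, 0.0, 1.0, 2.0, 3.0]       # same shape, one step late
same_length_error = pointwise_error(signal, signal)
warped_cost = dtw_distance(signal, delayed)
```

In this toy case the DTW cost is zero because warping absorbs the one-step delay, which is exactly the kind of time gap a point-wise difference would penalise.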
To calculate the anomaly score, TadGAN first normalises the reconstruction error and critic scores by subtracting the mean and dividing by the standard deviation. The normalised scores, $Z_{RE}(x)$ and $Z_{\mathcal{C}_x}(x)$, are then combined using their product:
$$a(x) = Z_{RE}(x) \odot Z_{\mathcal{C}_x}(x) \tag{1}$$
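The normalisation and product described above can be sketched as follows. This is a minimal illustration under the assumption that both scores are available as one value per time window; the function names are hypothetical, and how negative z-scores should be handled is not specified in the text, so the sketch multiplies them as they are.

```python
import numpy as np

def zscore(values):
    """Subtract the mean and divide by the standard deviation."""
    values = np.asarray(values, float)
    return (values - values.mean()) / values.std()   # assumes a non-constant input

def product_anomaly_score(rec_error, critic_score):
    """Element-wise product of the two normalised scores."""
    return zscore(rec_error) * zscore(critic_score)

rec_error = [1.0, 1.0, 1.0, 5.0]      # last window is poorly reconstructed
critic_score = [0.0, 0.0, 0.0, 4.0]   # and also flagged by the critic
scores = product_anomaly_score(rec_error, critic_score)
```

With these toy values the last window receives by far the largest score, since both normalised terms are large there.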
III-B Hyperbolic uncertainty for Anomaly detection (HypAD)
We propose a novel model for anomaly detection in time series based on hyperbolic uncertainty. HypAD is a reconstruction-based model and minimises the reconstruction loss, given by a measure of the hyperbolic distance between the input signal and its reconstruction. In hyperbolic space, errors are exponentially larger when predictions are certain. Therefore, HypAD tends to predict either certain correct reconstructions or uncertain possibly mistaken reconstructions.
This leads, as we discuss in Sec. III-C, to a novel definition of detectable anomaly: i.e. the case of a large reconstruction error with high certainty.
III-B1 Hyperbolic Reconstruction Error
The proposed HypAD is illustrated in Figure 2. It integrates the machinery of hyperbolic neural networks into the reconstruction-based architecture of TadGAN. In HypAD the input signal is first passed through an encoder, then followed by a decoder sub-network. The output of the decoder as well as the original signal are mapped to the hyperspace, shown as the red edge box with red background.
An $n$-dimensional hyperspace is a Riemannian geometry with a constant negative sectional curvature. As in [Khrulkov_2020_CVPR, suris2021hyperfuture], we adopt the Poincaré ball model of hyperspaces, given by the manifold $\mathbb{D}^n = \{x \in \mathbb{R}^n : \|x\| < 1\}$ endowed with the Riemannian metric $g^{\mathbb{D}}(x) = \lambda_x^2 g^E$, where $\lambda_x = \frac{2}{1 - \|x\|^2}$ is the conformal factor and $g^E$ is the Euclidean metric tensor. For details, see [riemannian, smoothmanifold].
In order to map $x$ and $\hat{x}$ to the Poincaré ball, we leverage an exponential map centered at $\mathbf{0}$ [suris2021hyperfuture]. This is followed by a hyperbolic feed-forward layer [NEURIPS2018_dbab2adc] to estimate the corresponding hyperbolic embeddings $x_H$ and $\hat{x}_H$, shown as solid green boxes in Figure 2. Finally, the two hyperbolic embeddings are compared using the Poincaré distance, formulated as follows:
$$d_H(x_H, \hat{x}_H) = \cosh^{-1}\left(1 + 2\,\frac{\|x_H - \hat{x}_H\|^2}{(1 - \|x_H\|^2)(1 - \|\hat{x}_H\|^2)}\right) \tag{2}$$
where $\|x_H\|$ and $\|\hat{x}_H\|$ are the distances of the embeddings from the center of the Poincaré ball. Note that the same reconstruction error function, $d_H$, is used at training as well as inference time.
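The mapping to the Poincaré ball and the distance between two embeddings can be illustrated with the standard unit-curvature formulas. The sketch below is an assumption-laden illustration: it uses the usual exponential map at the origin, $\tanh(\|v\|)\,v/\|v\|$, and the usual Poincaré distance, omits the hyperbolic feed-forward layer, and uses hypothetical function names.

```python
import numpy as np

def expmap0(v, eps=1e-9):
    """Exponential map at the origin of the unit Poincaré ball."""
    v = np.asarray(v, float)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.tanh(norm) * v / np.maximum(norm, eps)   # result has norm < 1

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points of the unit Poincaré ball."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    sq_gap = np.sum((u - v) ** 2, axis=-1)
    denom = (1.0 - np.sum(u ** 2, axis=-1)) * (1.0 - np.sum(v ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_gap / np.maximum(denom, eps))

embedded = expmap0([3.0, 4.0])                        # Euclidean vector -> ball
from_origin = poincare_distance([0.0, 0.0], [0.5, 0.0])
near_centre = poincare_distance([0.00, 0.0], [0.10, 0.0])
near_border = poincare_distance([0.85, 0.0], [0.95, 0.0])
```

The last two calls use the same Euclidean gap of 0.1, yet the pair close to the border of the ball is several times farther apart in hyperbolic terms, which is the property the method relies on.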
III-B2 Hyperbolic Uncertainty
A key property of the Poincaré ball is that the distance between two points grows exponentially as we move away from the origin. This means that an erroneous reconstruction towards the circumference of the disk is penalised exponentially more than an erroneous reconstruction close to the centre. This leads to the useful tendency of HypAD to either predict a matched reconstruction (the embeddings $x_H$ and $\hat{x}_H$ of the input and of its reconstruction are close by) or an unmatched reconstruction towards the origin ($x_H$ and $\hat{x}_H$ are far away, $\|x_H\|$ and $\|\hat{x}_H\|$ are small), in order to minimize Eq. (2).
Hence, the distance of the reconstruction embedding $\hat{x}_H$ to the origin provides a natural estimate of the model's uncertainty, referred to as hyperbolic uncertainty, $U_{\hat{x}_H}$, thus formulated:
$$U_{\hat{x}_H} = 1 - \|\hat{x}_H\| \tag{3}$$
The smaller the distance from the origin, the more uncertain the model is.
III-B3 Combining Hyperbolic Uncertainty with Reconstruction Error and Critic Score
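Hyperbolic uncertainty is integrated into the anomaly score as follows:
$$a(x) = Z_{RE}(x) \odot Z_{\mathcal{C}_x}(x) \odot (1 - U_{\hat{x}_H}) \tag{4}$$
Equation 4 brings together the normalised reconstruction error $Z_{RE}(x)$ (the larger the error, the more likely the anomaly) with the normalised critic score $Z_{\mathcal{C}_x}(x)$ (larger critic scores point to anomalies) and the model certainty: $(1 - U_{\hat{x}_H})$, where $U_{\hat{x}_H}$ is the hyperbolic uncertainty of the reconstruction.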
The simple multiplication formulation of the certainty of the model reduces the scores of anomalies when HypAD is not confident of the reconstructions. While being simple, this outperforms the current state-of-the-art, as we show in Section IV.
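The effect of multiplying by the model certainty can be sketched as below. The exact functional form of the uncertainty is an assumption here: the sketch takes it to be one minus the Euclidean norm of the reconstruction embedding (which is below 1 in the unit Poincaré ball), so that embeddings near the origin are the most uncertain, as stated in the text. Function names are hypothetical.

```python
import numpy as np

def hyperbolic_uncertainty(x_hat_h):
    """Assumed form: 1 - norm of the embedding; near the origin means uncertain."""
    return 1.0 - np.linalg.norm(np.asarray(x_hat_h, float), axis=-1)

def certainty_weighted_score(z_rec, z_critic, x_hat_h):
    """Product of normalised error, normalised critic score and certainty."""
    certainty = 1.0 - hyperbolic_uncertainty(x_hat_h)
    return np.asarray(z_rec, float) * np.asarray(z_critic, float) * certainty

# Two windows with identical error and critic terms but different certainty.
z_rec = [2.0, 2.0]
z_critic = [1.5, 1.5]
embeddings = [[0.9, 0.0],   # far from the origin: certain
              [0.1, 0.0]]   # close to the origin: uncertain
scores = certainty_weighted_score(z_rec, z_critic, embeddings)
```

Both windows have the same reconstruction and critic terms, yet only the certain one keeps a high score; the uncertain one is scaled down by a factor of nine in this toy case.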
III-C Motivation for HypAD
HypAD takes motivation from a key idea: a detectable anomaly is one where the model is certain, but it predicts wrongly. In other words, if the model encounters a known pattern, which it knows how to reconstruct, then it will call it anomalous if the reconstruction does not match the input signal.
The principled formulation of hyperbolic uncertainty is paramount towards this goal: HypAD marks a reconstruction as uncertain when it suspects that the reconstruction may be wrong.
Fig. 3 illustrates this key concept for all the datasets. The first, second and third rows correspond to the univariate, U-CASAS and SWaT datasets respectively. The bar plot depicts the average cosine distances between the input signals and their reconstructions, against specific intervals of uncertainty, along the x-axis. The higher the cosine distances, the more distinct the reconstruction is from the provided signals (the Poincaré ball model is conformal to the Euclidean space and it preserves the same angles [shimizu2021hyperbolic]). Note that for the first two rows, the initial three plots (columns) correspond to the signals that report the best improvement in F1-score and the last plot corresponds to the signal with the worst improvement. A single bar plot is reported for SWaT because this consists of a single long-term signal.
Observe in the second row how HypAD learns to correctly assign higher uncertainty to less accurate estimates for the cases of Fall, Weakness, and Nocturia. For the last signal, SlowerWalking, HypAD fails to learn a meaningful uncertainty and labels all reconstructions as certain. It is however notable that the representation is still interpretable and the case of failure discernible. Trends are similar for the cases of the Univariate and SWaT datasets.
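The quantity plotted in such bar plots can be computed as in the sketch below: a cosine distance per window, averaged within intervals of uncertainty. The bin edges, the function names and the toy values are illustrative assumptions, not the settings used for Fig. 3.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors (or batches of vectors)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    cos = np.sum(a * b, axis=-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
    return 1.0 - cos

def mean_distance_per_bin(uncertainty, distance, edges):
    """Average distance of the windows falling in each uncertainty interval."""
    uncertainty = np.asarray(uncertainty, float)
    distance = np.asarray(distance, float)
    idx = np.digitize(uncertainty, edges[1:-1])        # interval index per window
    return np.array([distance[idx == k].mean() if np.any(idx == k) else np.nan
                     for k in range(len(edges) - 1)])

edges = np.array([0.0, 0.1, 0.2, 1.0])                 # hypothetical intervals
uncertainty = [0.05, 0.15, 0.18, 0.90]
distance = [0.10, 0.20, 0.40, 0.80]
bars = mean_distance_per_bin(uncertainty, distance, edges)
```

A well-behaved model would show bars that grow with the uncertainty interval, as in this toy example.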
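|Source||NASA||NASA||Yahoo||Yahoo||Yahoo||Yahoo||NAB||NAB||NAB||NAB||NAB||CASAS||CASAS||CASAS||CASAS||CASAS||SWaT|
|Dataset||SMAP||MSL||A1||A2||A3||A4||Art||AdEx||AWS||Traf||Tweets||F||MTC||W||SW||N||SWaT|
|Num. of signals||53||27||67||100||100||100||6||5||17||7||10||1||1||1||1||1||1|
|Num. of anomalies||67||36||178||200||939||835||6||11||30||14||33||2||2||4||2||2||33|
|Num. of anomaly points||54,696||7,766||1,699||466||943||837||2,418||795||6,312||1,560||15,651||99||239||1,248||276||1,060||10,786|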
|Percentage of total||9.7%||5.8%||1.8%||0.3%||0.6%||0.5%||10%||9.9%||9.3%||9.8%||9.8%||0.7%||1.7%||8%||2.4%||6.3%||10.05%|
|Num. out of distribution||18,126||642||861||153||21||49||123||15||210||86||520||-||-||-||-||-||-|
|Num. of instances||k||k||k||k||168k||168k||k||k||k||k||k||k||k||k||k||k||k|
We compare HypAD against the current best univariate anomaly detector TadGAN [geiger2020tadgan] on a benchmark of 11 time series, and extend the comparison to 2 multivariate sensor datasets: one comprising a water treatment plant and one comprising daily activities in elderly residences. First, we introduce the benchmarks (Sec. IV-A), then we compare against baselines and the state-of-the-art (Sec. IV-B), finally we conduct ablative studies on the importance of uncertainty for performance and the reduction of false alarms (Sec. IV-C).
IV-A Datasets, metrics and experimental setup
Table I summarizes the main characteristics of the datasets, which we coarsely divide into univariate, main test bed of our baseline TadGAN [geiger2020tadgan], and multivariate, to which we extend comparison. In the table, we report the sources of signals (NASA, Yahoo, NAB, CASAS and SWaT) and the datasets within each source (SMAP, MSL, A1 etc.), cf. detailed description later in the section.
In the table, the number of signals is the number of time series within the datasets. Note that univariate datasets are composed of multiple signals, while the multivariate only comprise single large time series. The number of anomalies counts the instances, within the time series, labelled as anomalous. These are further detailed as point anomalies, single anomalous values at a specific point in time, or collective anomalies, sets of contiguous times which are altogether anomalous. Yahoo is the sole dataset with point anomalies, and the only with synthetic sequences A2, A3, A4. We also report the percentage of total anomalous points and, following [geiger2020tadgan], for the univariate datasets, the number of out-of-distribution points, exceeding the means by more than .
NASA includes two spacecraft telemetry datasets (https://s3-us-west-2.amazonaws.com/telemanom/data.zip) based on the Mars Science Laboratory (MSL) and the Soil Moisture Active Passive (SMAP) signals. The former consists of scientific and housekeeping engineering data taken from the Rover Environmental Monitoring Station aboard the Mars Science Laboratory. The latter includes measurements of soil moisture and freeze/thaw state from space for all non-liquid water surfaces globally within the top layer of the Earth.
We analyse Yahoo datasets based on real production traffic to Yahoo computing systems. Additionally, we consider three synthetic datasets coming from the same source (https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70). The dataset tests the detection accuracy of various anomaly types including outliers and change-points. The synthetic dataset consists of time series with varying trend, noise and seasonality. The real dataset consists of time series representing the metrics of various Yahoo services.
Numenta Anomaly Benchmark (NAB) is a well-established collection of univariate time series from real-world application domains (we invite the reader to consult https://github.com/numenta/NAB/tree/master/data for the entire directory of datasets). To be consistent with [geiger2020tadgan], we analyse Art, AdEx, AWS, Traf, and Tweets from the original collection.
We consider for analysis the SWaT [10.1007/978-3-319-71368-7_8, Mathur2016SWaTAW] and CASAS [cook2012casas, dahmen2021indirectly] datasets. SWaT is collected from a cyber-physical system testbed that is a scaled-down replica of an industrial water treatment plant. The data was collected every second for a total of 11 days, where for the first few days the system was operated normally, while for the remaining days certain cyber-physical attacks were launched. Following [10.1145/3447548.3467137], we sample sensor data every 5 seconds.
CASAS is a collection of two weeks of sensor data from retirement homes. Each sensor reading has a label attached to it, according to the activity of elderly people recognised by human annotators. The 5 time series are a collection of activities, grouped by the medical conditions established prior to the experimentation: Falling (F), More Time in Chair (MTC), Weakness (W), Slower Walking (SW), and Nocturia (N) for nightly time visits. Although sensor readings give fine-grained information, we are interested in creating daily patient profiles. Hence, we collapse them into a single engendered activity with a start and end time (the start time corresponds to the first sensor reading for that activity, whereas the end time is the last reading) for each consecutive sensor signal. We then create a time matrix structure $M$ of size $d \times 1440$, where $d$ is the number of days the patient is monitored and 1440 represents the total number of minutes in a day. $M_{i,j}$ represents the label performed in minute $j$ of day $i$. Lastly, we densify each value with contextual and duration information corresponding to the label therein.
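A minimal sketch of the day-by-minute matrix described above is given below. The input format (one tuple per collapsed activity with day index, start minute, inclusive end minute and integer label), the filler value for unlabelled minutes and the function name are all assumptions made for illustration; the densification with contextual and duration information is not covered.

```python
import numpy as np

MINUTES_PER_DAY = 1440

def build_time_matrix(activities, n_days, fill=-1):
    """Rows are monitored days, columns are minutes, entries are activity labels."""
    matrix = np.full((n_days, MINUTES_PER_DAY), fill, dtype=int)
    for day, start_minute, end_minute, label in activities:
        matrix[day, start_minute:end_minute + 1] = label   # end minute is inclusive
    return matrix

# Hypothetical collapsed activities: (day, start, end, label id).
activities = [
    (0, 0, 59, 3),      # day 0, first hour, label 3
    (1, 600, 629, 7),   # day 1, 10:00 to 10:29, label 7
]
time_matrix = build_time_matrix(activities, n_days=2)
```

Each row of `time_matrix` can then be read as the daily profile of the patient, one label per minute.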
Finally, we create train and test splits of this data, taking care to safeguard the sequentiality of observations, and gathering the few anomalies into the test set for evaluation only. For this reason we name this Unsupervised-CASAS, dubbed U-CASAS, differing from the original CASAS [cook2012casas, dahmen2021indirectly] since the latter interleaves anomalous sensor readings with normal instances, thus breaking the sequentiality of anomalies. For each anomalous day encountered in the test set, we pad two days prior to it and two after. If the padding overlaps with another anomalous day, then we concatenate them (the concatenation procedure merges common sequences: if the sequence contains more than one anomalous day within the two-day padding window, the padded sequence gets collapsed into a single one) and perform the padding procedure again. We delete the days assigned to the test set from the overall dataset and assign the rest as training. Moreover, because we employ a time-related strategy, we create time windows of 30 actions to detect anomalies.
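The padding-and-merging procedure for the test split can be sketched as follows. This is one possible reading of the description, assuming days are indexed from 0 and that two anomalous days are merged when the later one falls inside the two-day padding of the earlier one; the function name is hypothetical.

```python
def padded_test_windows(anomalous_days, n_days, pad=2):
    """Return (first_day, last_day) test ranges around anomalous days."""
    days = sorted(set(anomalous_days))
    windows = []
    i = 0
    while i < len(days):
        first = last = days[i]
        # Absorb later anomalous days that fall inside the current padding,
        # then pad again from the new last anomalous day.
        while i + 1 < len(days) and days[i + 1] <= last + pad:
            i += 1
            last = days[i]
        windows.append((max(0, first - pad), min(n_days - 1, last + pad)))
        i += 1
    return windows

test_windows = padded_test_windows([5, 7, 20], n_days=30)
test_days = {d for start, end in test_windows for d in range(start, end + 1)}
train_days = [d for d in range(30) if d not in test_days]   # the rest is training
```

Here days 5 and 7 end up in a single test range because day 7 lies within the padding of day 5, while day 20 forms its own range.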
Since all the listed datasets are highly imbalanced, accuracy is misleading. Therefore, as done in [suris2021hyperfuture], we use the F1 score to account for this challenge. Notice that we do not use the cumulative F1 score as proposed in [garg2021evaluation] to evaluate performance because not all datasets contain anomalous events; rather they contain anomalous data points. Based on [hundman2018detecting], for the univariate and the CASAS datasets, we penalize high false positive rates and encourage the detection of true positives in a timely fashion. Since anomalies are rare events and come in collective sequences in real-world applications, we proceed as follows:
We record a true positive (TP) if any predicted window overlaps a true anomalous window.
We record a false negative (FN) if a true anomalous window does not overlap any predicted window.
We record a false positive (FP) if a predicted window does not overlap any true anomalous region.
For U-CASAS, we also measure the g-measure, i.e. the geometric mean of recall (R) and precision (P), $\sqrt{P \cdot R}$, which is a robust metric when classes are imbalanced [dahmen2021indirectly].
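The window-overlap counting rules and the two metrics can be sketched as below. The sketch assumes windows are given as inclusive (start, end) index pairs and counts true positives and false negatives per true anomalous window, which is one reasonable reading of the rules above; function names are hypothetical.

```python
import math

def overlaps(a, b):
    """True if two inclusive (start, end) windows share at least one index."""
    return a[0] <= b[1] and b[0] <= a[1]

def window_confusion(true_windows, pred_windows):
    tp = sum(any(overlaps(t, p) for p in pred_windows) for t in true_windows)
    fn = len(true_windows) - tp                     # true windows never hit
    fp = sum(not any(overlaps(p, t) for t in true_windows) for p in pred_windows)
    return tp, fn, fp

def f1_and_g_measure(tp, fn, fp):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_measure = math.sqrt(precision * recall)       # geometric mean of P and R
    return f1, g_measure

true_windows = [(10, 20), (50, 60)]
pred_windows = [(18, 25), (30, 35)]                 # one hit, one false alarm
tp, fn, fp = window_confusion(true_windows, pred_windows)
f1, g_measure = f1_and_g_measure(tp, fn, fp)
```

In this toy case one anomaly is found, one is missed and one prediction is a false alarm, so precision, recall, F1 and g-measure all equal 0.5.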
|TadGAN [geiger2020tadgan]||0.623||0.680||0.668||0.820||0.631||0.497||0.667||0.667||0.610||0.455||0.605||0.629 ± 0.123|
|HypAD (proposed)||0.565||0.643||0.610||0.670||0.670||0.470||0.777||0.663||0.630||0.570||0.670||0.631 ± 0.075|
TadGAN [geiger2020tadgan] is reported from the paper. TadGAN* refers to its reproduced result with the PyTorch version (see Secs. IV-A, IV-B). Mean and standard deviation are computed across all datasets.
|LstmAE||0.085||0.014||0.182||0.108||0.000||0.000||0.158||0.049||0.133||0.035||0.112 ± 0.064||0.041 ± 0.037|
|AE||0.139||0.127||0.033||0.027||0.116||0.103||0.000||0.000||0.158||0.049||0.089 ± 0.062||0.061 ± 0.047|
|ConvAE||0.086||0.014||0.284||0.150||0.251||0.119||0.158||0.048||0.134||0.035||0.183 ± 0.074||0.073 ± 0.052|
|TadGAN*||0.222||0.267||0.570||0.555||0.000||0.000||0.630||0.570||0.267||0.222||0.338 ± 0.233||0.323 ± 0.216|
|HypAD (proposed)||0.447||0.333||0.660||0.610||0.447||0.333||0.470||0.364||0.577||0.5||0.520 ± 0.095||0.428 ± 0.123|
We include the following strategies as our baselines in this paper:
AE [baldi2012autoencoders] - We use a six-layer fully-connected autoencoder.
ConvAE - We have three layers of convolutional encoding interleaved with max pooling. The decoder mirrors the encoder, with the de-convolution aided by two-dimensional up-sampling layers.
LstmAE - We use a deep stacked LSTM autoencoder with four layers. The hidden and output vectors of the first LSTM get passed to the second LSTM layer. The latent representation of the encoder then gets reconstructed in reverse order by the decoder.
TadGAN* [geiger2020tadgan] - We use a one-layer bidirectional LSTM for the generator $\mathcal{E}$, and a two-layer bidirectional LSTM for $\mathcal{G}$. For the critic $\mathcal{C}_z$ we use a fully connected layer, and two dense layers for $\mathcal{C}_x$.
For the first three baselines, we set the number of epochs to 30, the batch size to 32, and the learning rate to. For TadGAN*, we set the epochs to 30, the batch size to 64, the learning rate to , and the iteration for the critic to 5. We use Adam as the optimisation function to train all the baselines.
For our proposed method HypAD, we took inspiration from an online available PyTorch implementation (https://github.com/arunppsg/TadGAN). We leave the architecture of TadGAN unchanged, but we incorporate the hyperbolic transformation (https://github.com/cvlab-columbia/hyperfuture) as in [suris2021hyperfuture]. The hyperparameters are the same as in the original paper, but we use Riemannian Adam (https://github.com/geoopt/geoopt) as the optimisation function.
IV-B Comparison to the state of the art
HypAD defines a new state-of-the-art performance for univariate anomaly detection, with the highest average F1-score of 0.631. In Table II, HypAD outperforms the current best technique (TadGAN*, 0.600) by 5.17%, as well as all baselines by a large margin. Note that we also report the original paper [geiger2020tadgan] performance of TadGAN in the first row (0.629), which we could not reproduce with the available PyTorch code (cf. Sec. IV-A). In the table, the column F1 reports mean and standard deviation over all datasets. Looking at the standard deviation, HypAD also appears as the most consistent performer. Considering the F1-score, the largest performance gains of HypAD over TadGAN are on the NAB and NASA datasets, while it is outperformed by a larger margin on the A2, A3 and A4 Yahoo datasets, the sole synthetic univariate ones.
In Table III, we extend the evaluation of HypAD to the multivariate U-CASAS dataset. We cannot include Isudra [dahmen2021indirectly] because the underlying architecture uses a small amount of labels to conduct the selection of parameters for the execution. Additionally, Isudra is trained in a supervised fashion differing from all the other methods reported here. For completeness, we also report the g-measure [dahmen2021indirectly] and its average across all datasets. As shown in the table, HypAD surpasses TadGAN* by 32.51% in terms of average F1-score (0.428 Vs. 0.323).
Finally, in Table IV, we also extend the comparison to the multivariate SWaT dataset. Here, following previous work [10.1145/3447548.3467137], we also report the precision and recall in addition to the F1-score. HypAD outperforms the baseline TadGAN* (0.753 Vs. 0.722) but both techniques are behind the current state-of-the-art on multivariate anomaly detection, NSIBF [10.1145/3447548.3467137]. Observe however that HypAD achieves its highest F1 performance at the highest precision (0.996) among all methods. This confirms that HypAD really detects anomalies which it is certain about, i.e. when it understands the time series and it is certain about what to expect, but cannot reconstruct the input signal due to an anomaly.
|Method||Precision||Recall||F1|
|EncDec-AD [Malhotra2016LSTMbasedEF] [ICML-WorkShop16]||0.945||0.620||0.748|
|OmniAnomaly [10.1145/3292500.3330672] [KDD19]||0.979||0.757||0.854|
|USAD [10.1145/3394486.3403392] [KDD20]||0.987||0.740||0.846|
|NSIBF [10.1145/3447548.3467137] [KDD21]||0.982||0.863||0.919|

|Model||Univariate||U-CASAS||SWaT|
|Hyperbolic + Uncertainty (HypAD)||0.631||0.428||0.753|
IV-C Ablation Studies
In Table V, we analyze the importance of the hyperbolic embedding and the use of uncertainty for anomaly detection on the univariate datasets, as well as the multivariate U-CASAS and SWaT. The first row shows the performance of the model in Euclidean space (average F1-score across datasets). This corresponds to TadGAN* in Table II, III and IV. In the second row, we report performance for the hyperbolic TadGAN*, without including uncertainty (cf. Sec. III-B1). This improves marginally over the univariate (0.604 Vs. 0.600) and more importantly over the U-CASAS datasets (0.397 Vs. 0.323), but it decreases performance in SWaT (0.566 Vs. 0.722), which we further analyze in the following subsection.
The complete proposed HypAD model, in the third row, improves consistently on both ablative variants. HypAD yields 0.631 on the univariate datasets, 0.428 on U-CASAS, and 0.753 on SWaT. Improvements with respect to not using uncertainty are large: 4.5%, 7.8% and 33%, respectively. We therefore conclude that uncertainty is fundamental for improving anomaly detection.
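The relative improvements quoted in this section follow directly from the reported average F1-scores. The short check below recomputes them as (new − old) / old; only the function name is introduced here, all numbers come from the text.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

# HypAD vs. the hyperbolic variant without uncertainty (Sec. IV-C).
gain_univariate = relative_gain(0.631, 0.604)
gain_ucasas = relative_gain(0.428, 0.397)
gain_swat = relative_gain(0.753, 0.566)

# HypAD vs. TadGAN* (Sec. IV-B).
gain_vs_tadgan_univariate = relative_gain(0.631, 0.600)
gain_vs_tadgan_ucasas = relative_gain(0.428, 0.323)

for name, gain in [("univariate", gain_univariate),
                   ("U-CASAS", gain_ucasas),
                   ("SWaT", gain_swat)]:
    print(f"{name}: {gain:.1f}%")
```

The recomputed values round to the 4.5%, 7.8% and 33% reported for the ablation, and to the 5.17% and 32.51% reported against TadGAN*.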
Qualitative Ablation on SWaT: In Figure 4, we demonstrate qualitative ablation on the SWaT dataset, which consists of a single long signal. The three plots correspond to the three ablative variants of Table V. In all plots, the blue and green points represent the predicted anomaly scores and the ground-truth anomalies, respectively. The red line denotes the anomaly detection threshold i.e. blue points above the red line are the predicted anomalies. False positives are blue points above the red line threshold outside the green (ground-truth) anomalous regions.
As the figure shows, the Euclidean model (first plot) yields many false positives; the hyperbolic model without uncertainty (second plot) reduces the number of false positives substantially, but it also misses anomalies, especially in the middle part of the signal. This explains the drop in the F1 score for the SWaT dataset (cf. Table V). Integrating hyperbolic uncertainty (proposed HypAD, third plot) recovers the detection of these anomalies: it increases true positives while keeping the number of false positives low, thus yielding the best F1 score of 0.753.
We have proposed a novel model for anomaly detection based on hyperbolic uncertainty, HypAD. The proposed hyperbolic uncertainty allows HypAD to self-adjust its output, encouraging the model to either predict a correct reconstruction or a less certain wrong reconstruction. This benefits anomaly detection in two ways: it provides better reconstructions of the signal (the deviation from those being anomalous) and it yields a measure of certainty. This is a novel viewpoint on anomaly detection: detectable anomaly instances are those which are certain but wrongly predicted.