I. Introduction
Anomaly detection refers to detecting outliers (anomalies) in data, i.e. points that deviate significantly from the distribution of the data. Outlier detection, however, is an underspecified and consequently ill-posed task due to its inherent unsupervised nature. Anomaly detection strategies such as distance-based
[fan2006nonparametric, ghoting2008fast], density-based [lof, papadimitriou2003loci], and subspace-based methods [keller2012hics, lazarevic2005feature] have been pioneers in the literature. Additionally, autoencoders
[chen2017outlier, sarvari2021unsupervised] and adversarial networks [geiger2020tadgan] have made substantial contributions. However, the literature neglects assessing the trustworthiness of the predicted outcomes, namely their uncertainty. Uncertainty is a measure of model confidence, which may be learnt from the data [lakshminarayanan2016simple, kendall2017uncertainties] or by the use of extra instances [gal2016dropout]. Uncertainty estimation has been a long-standing challenge in machine learning. Most recently, it has been successfully adopted to improve performance in object detection
[jiang2018acquisition, kuppers2020multivariate, 9156274, neumann2018relaxed, vakhitov2021uncertainty], and in unsupervised and self-supervised learning [fiery2021, 8289350, suris2021hyperfuture]. Yet, uncertainty remains largely unexplored in the context of anomaly detection.

In this work, we propose a novel model based on Hyperbolic uncertainty for Anomaly Detection, which we dub HypAD. We leverage the current state-of-the-art technique for anomaly detection in univariate time series, TadGAN [geiger2020tadgan]. TadGAN detects anomalies by attempting to reconstruct the input signal, making use of an LSTM sequence encoding and two GAN critics, cf. Sec. III-A. We introduce uncertainty into the anomaly detector: we map the input and the reconstructed signal into a hyperbolic space, where the signals additionally carry an uncertainty score, and we train the novel embeddings end-to-end with a Poincaré distance loss, cf. Sec. III-B.
The proposed HypAD uses uncertainty to discern whether the reconstruction error is large because the signal is anomalous, or simply because the model cannot reconstruct it well. In the former case, HypAD is certain about the reconstruction (e.g. most of the signal is well-behaved and the model expects known patterns) but its reconstruction is wrong, since a part of the signal is anomalous. In the latter case, HypAD downgrades its anomaly score because it is not certain about the signal reconstruction, possibly due to a complex pattern which the model did not have enough capacity to learn: the larger uncertainty indicates that the large reconstruction error may stem from a model failure rather than from an anomaly. (See the discussion in Sec. III-B2.)
Thanks to uncertainty, HypAD outperforms the state-of-the-art univariate anomaly detector TadGAN [geiger2020tadgan] on the established univariate benchmarks of NASA, Yahoo and the Numenta Anomaly Benchmark [lavin2015evaluating], as well as on two multivariate datasets: daily activities in elderly home residences (CASAS [cook2012casas]) and an industrial water treatment plant (SWaT [Mathur2016SWaTAW]). As we show in the experimental results in Sec. IV, reducing anomaly scores in uncertain cases also yields fewer false alarms (the model achieves its best F1 performance at higher precision).
The main contributions of this work are:

We propose the first model for anomaly detection based on hyperbolic uncertainty;

We propose the novel key idea of detectable anomaly: an instance is anomalous when the model is certain about it but wrong;

We integrate the estimated uncertainty into a state-of-the-art univariate anomaly detector and consistently outperform it on established univariate and multivariate datasets.
II. Related Works
To the best of our knowledge, this is the first work to combine anomaly detection with uncertainty estimation, and the first to propose hyperbolic uncertainty for it. Previous work relates to ours from three main perspectives, which we review here: uncertainty estimation techniques, anomaly detection in time series, and hyperbolic neural networks.
II-A Uncertainty Estimation Techniques
We identify two main strategies for approximating uncertainty. Ensemble-based posterior approximation uses several weak models to make naive predictions and combines them, according to a consensus function, into a more complex predictive model [dietterich2000ensemble]. One of the most popular ensemble-based approaches to uncertainty estimation is Monte Carlo (MC) Dropout, which drops neurons in every layer during both the training and test phases [gal2016dropout].

Generative models for aleatoric modelling
use an additional latent variable to make stochastic predictions and evaluate the uncertainty of the model. Generative Adversarial Networks (GANs) [goodfellow2014generative] play a min-max game where the discriminator needs to distinguish between real examples and generated outcomes. GANs achieve state-of-the-art performance, and we build on top of them by attaching hyperspace mapping layers to estimate the uncertainty of the model. Another interesting approach to estimating uncertainty uses energy-based models [du2019implicit, salakhutdinov2009deep, xie2016theory], which learn an energy function modelling the compatibility of the input and the output. Our method goes beyond energy-based models because the integrated hyperbolic uncertainty mechanism suffers from neither cold- nor warm-start problems [zhang2021dense], which add to the training complexity [xie2018cooperative].

II-B Anomaly Detection in Time Series
We identify five categories of methods proposed in the literature for anomaly detection in time series. Distance-based outlier detectors consider the distance of a point from its k-nearest neighbours [conf/vldb/KnorrN98, angiulli2002fast, ghoting2008fast]. Density-based methods [lof, papadimitriou2003loci, He2003DiscoveringCL, 10.1145/3447548.3467137, 10.1145/3292500.3330672, WangCNLCT21, su2019robust] take into account the density of the point and its neighbours. Prediction-based methods [benkabou2021local, unsupahmad, conf/kdd/HundmanCLCS18] calculate the difference between the predicted value and the true value to detect anomalies. Reconstruction-based methods [An2015VariationalAB, Malhotra2016LSTMbasedEF, 10.1145/3447548.3467174] compare the input signal and the reconstructed one in the output layer, typically using autoencoders. These methods assume that anomalies are difficult to reconstruct and are lost when the signal is mapped to lower dimensions; thus a higher reconstruction error means a higher anomaly score. [Malhotra2016LSTMbasedEF] uses an LSTM autoencoder for multi-sensor anomaly detection. [chen2017outlier, sarvari2021unsupervised] use an ensemble of autoencoders to boost performance by focusing on learning the inlier characteristics at each iteration. [li2021multivariate] uses a hierarchical variational autoencoder with two stochastic latent variables to learn the temporal and inter-metric embeddings for multivariate data. SISVAE [li2020anomaly] uses a variational autoencoder with a smoothness-inducing prior over possible estimations to capture latent temporal structures of time series without relying on the assumption of constant noise. Recently, GANs have been employed to detect anomalies in time series data; our method also lies in this category. MAD-GAN [Li2019MADGANMA] combines the discriminator output and the reconstruction error to detect anomalies in multivariate time series.
BeatGAN [ijcai2019616] uses an encoder-decoder generator with a modified time-warping-based data augmentation to detect anomalies in medical ECG inputs. TadGAN [geiger2020tadgan] uses a cycle-consistent GAN architecture with an encoder-decoder generator and additionally proposes several ways to compute the reconstruction error and combine it with the critic outputs. We build on top of TadGAN's architecture by adding a hyperbolic mapping layer to the reconstructed time windows to assess the uncertainty of the detector.
II-C Hyperbolic Neural Networks
Deep representation learning in hyperspaces has gained momentum after the pioneering work on hyperbolic neural networks (hyperNNs) [NEURIPS2018_dbab2adc], which generalises Euclidean operations (e.g. matrix multiplications) to their counterparts in hyperspace. The authors propose hyperspace analogues of neural network components such as fully-connected (FC) layers, multinomial logistic regression (MLR) and recurrent neural networks. Furthermore, methods like the Einstein midpoint [gulcehre2018hyperbolic] and the Fréchet mean [Lou2020DifferentiatingTT] propose different ways of aggregating features in hyperspace. The work in [shimizu2021hyperbolic] extends hyperNNs and proposes Poincaré split/concatenation operations, generalising the convolutional layer to hyperspace. [NEURIPS2019_103303dd, chami2019hyperbolic] propose hyperbolic graph neural networks, leveraging hyperNNs.

Thus formulated, hyperNNs have mainly been adopted to improve performance by leveraging hierarchies and uncertainty in zero-shot learning [Liu_2020_CVPR], re-identification [Khrulkov_2020_CVPR], and action recognition [9157196]. Of particular interest, [suris2021hyperfuture] has leveraged hyperNNs to model a hierarchy of actions from unlabelled videos. To the best of our knowledge, this is the first work to apply hyperNNs to sequence modelling with the goal of anomaly detection.
III. Method
In this section, we first discuss best practices in anomaly detection (Section III-A); then we detail the proposed hyperbolic uncertainty and its use for detecting anomalies (Section III-B); finally, we discuss the motivation for it (Section III-C).
III-A Background
The current state-of-the-art in univariate anomaly detection is a reconstruction-based technique [geiger2020tadgan] which additionally leverages a GAN critic score. TadGAN encodes the input data to a latent space and then decodes the encoded data. This encoding-decoding operation requires two mapping functions, an encoder E from the signal space to the latent space and a generator G back from the latent space; the reconstruction can then be written as x -> E(x) -> G(E(x)) = x'. TadGAN leverages adversarial learning to train the two mappings using two adversarial critics, C_x and C_z. The goal of C_x is to distinguish between real and generated time series, while C_z measures the performance of the mapping into the latent space. The model is trained using a combination of the Wasserstein loss [pmlrv70arjovsky17a] and the cycle-consistency loss [8237506]. TadGAN computes the reconstruction error between x and x' using three types of reconstruction functions: i. Point-wise difference: considers the difference of values at every time stamp; ii. Area difference: is applied to signals of fixed length and measures the similarity between local regions; iii. Dynamic time warping: additionally handles time gaps between the two signals when calculating the reconstruction error.
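The three reconstruction functions can be sketched as follows. This is an illustrative numpy version: the function names and the fixed-length window are our assumptions, not TadGAN's exact implementation.

```python
import numpy as np

def pointwise_error(x, x_hat):
    # i. Point-wise difference: absolute error at every time stamp.
    return np.abs(x - x_hat)

def area_error(x, x_hat, window=10):
    # ii. Area difference: compares the signal mass over local regions
    # of fixed length (a simple stand-in for the area under the curve).
    errs = []
    for start in range(0, len(x) - window + 1, window):
        a = x[start:start + window].sum()
        b = x_hat[start:start + window].sum()
        errs.append(abs(a - b) / window)
    return np.array(errs)

def dtw_error(x, x_hat):
    # iii. Dynamic time warping: dynamic programming over alignments
    # that may shift points in time; returns the minimal cumulative
    # alignment cost between the two signals.
    n, m = len(x), len(x_hat)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - x_hat[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

Point-wise difference is the strictest of the three; dynamic time warping is the most permissive, since it tolerates small temporal misalignments between signal and reconstruction.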
To calculate the anomaly score, TadGAN first normalises the reconstruction error and the critic score by subtracting the mean and dividing by the standard deviation. The normalised scores, z_RE(x) and z_Cx(x), are then combined using their product:

a(x) = z_RE(x) · z_Cx(x)   (1)
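As a concrete sketch, the normalisation and combination step can be written as follows (numpy; function names are ours):

```python
import numpy as np

def z_normalise(scores):
    # Normalise a batch of scores: subtract the mean, divide by the
    # standard deviation.
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

def tadgan_anomaly_score(rec_errors, critic_scores):
    # Eq. (1): element-wise product of the normalised reconstruction
    # errors and the normalised critic scores.
    return z_normalise(rec_errors) * z_normalise(critic_scores)
```

A window whose reconstruction error and critic score are both far above their respective means receives the largest anomaly score.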
III-B Hyperbolic Uncertainty for Anomaly Detection (HypAD)
We propose a novel model for anomaly detection in time series based on hyperbolic uncertainty. HypAD is a reconstruction-based model and minimises a reconstruction loss given by the hyperbolic distance between the input signal and its reconstruction. In hyperbolic space, errors are exponentially larger when predictions are certain. Therefore, HypAD tends to predict either certain, correct reconstructions or uncertain, possibly mistaken ones.
This leads, as we discuss in Sec. III-C, to a novel definition of detectable anomaly: the case of a large reconstruction error with high certainty.
III-B1 Hyperbolic Reconstruction Error
The proposed HypAD is illustrated in Figure 2. It integrates the machinery of hyperbolic neural networks into the reconstruction-based architecture of TadGAN. In HypAD, the input signal is first passed through an encoder, followed by a decoder sub-network. The output of the decoder, as well as the original signal, is mapped to the hyperspace, shown in Figure 2 as the box with red edges and red background.
An n-dimensional hyperspace is a Riemannian manifold with constant negative sectional curvature. As in [Khrulkov_2020_CVPR, suris2021hyperfuture], we adopt the Poincaré ball model of hyperspaces, given by the manifold D^n = {x in R^n : ||x|| < 1} endowed with the Riemannian metric g_x = lambda_x^2 g^E, where lambda_x = 2 / (1 - ||x||^2) is the conformal factor and g^E is the Euclidean metric tensor. For details, see [riemannian, smoothmanifold].

In order to map the input signal x and its reconstruction x' to the Poincaré ball, we leverage an exponential map centred at the origin [suris2021hyperfuture]. This is followed by a hyperbolic feed-forward layer [NEURIPS2018_dbab2adc] estimating the corresponding hyperbolic embeddings h and h', shown as solid green boxes in Figure 2. Finally, the two hyperbolic embeddings are compared using the Poincaré distance, formulated as follows:
d_P(h, h') = arccosh( 1 + 2 ||h - h'||^2 / ((1 - ||h||^2)(1 - ||h'||^2)) )   (2)

where ||h|| and ||h'|| are the distances of the embeddings from the centre of the Poincaré ball. Note that the same reconstruction error function d_P is used at training as well as at inference time.
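A minimal numpy sketch of the exponential map centred at the origin and of the Poincaré distance in Eq. (2), assuming unit curvature (function names are ours):

```python
import numpy as np

def expmap0(v, eps=1e-9):
    # Exponential map centred at the origin of the Poincare ball
    # (unit curvature): maps a Euclidean vector v onto the open
    # unit ball, since tanh(||v||) < 1.
    norm = max(np.linalg.norm(v), eps)
    return np.tanh(norm) * v / norm

def poincare_distance(u, v, eps=1e-9):
    # Eq. (2): d(u, v) = arccosh(1 + 2 ||u - v||^2 /
    #                            ((1 - ||u||^2)(1 - ||v||^2))).
    sq = np.sum((u - v) ** 2)
    den = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(den, eps))
```

The denominator shrinks as either embedding approaches the boundary of the ball, which is what makes mismatches near the circumference exponentially more expensive than mismatches near the origin.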
III-B2 Hyperbolic Uncertainty
A key property of the Poincaré ball is that the distance between two points grows exponentially as we move away from the origin. This means that an erroneous reconstruction towards the circumference of the disk is penalised exponentially more than an erroneous reconstruction close to the centre. This leads to a useful tendency of HypAD: in order to minimise Eq. (2), it either predicts a matched reconstruction (h and h' are close by) or pushes an unmatched reconstruction towards the origin (h and h' may differ, but ||h|| and ||h'|| are small).
Hence, the distance of the reconstruction to the origin provides a natural estimate of the model's uncertainty, referred to as hyperbolic uncertainty u(x), thus formulated:

u(x) = 1 - ||h'||   (3)

The smaller the distance from the origin, the more uncertain the model is.
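As a sketch, with h_rec standing for the hyperbolic embedding of the reconstruction:

```python
import numpy as np

def hyperbolic_uncertainty(h_rec):
    # Uncertainty as reconstructed here from Eq. (3): one minus the
    # (Euclidean) distance of the reconstruction embedding from the
    # origin of the Poincare ball. Embeddings near the origin (small
    # norm) are the most uncertain; embeddings near the boundary
    # (norm close to 1) are the most certain.
    return 1.0 - np.linalg.norm(h_rec)
```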
III-B3 Combining Hyperbolic Uncertainty with Reconstruction Error and Critic Score
Hyperbolic uncertainty is integrated into the anomaly score as follows:

a(x) = (1 - u(x)) · z_RE(x) · z_Cx(x)   (4)

Equation (4) brings together the reconstruction error (the larger the error, the more likely the anomaly), the critic score (larger critic scores point to anomalies) and the model certainty (1 - u(x)).
This simple multiplicative formulation of the model's certainty reduces the anomaly scores when HypAD is not confident of its reconstructions. While simple, it outperforms the current state-of-the-art, as we show in Section IV.
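Under our reconstruction of Eq. (4), the full HypAD scoring step can be sketched as:

```python
import numpy as np

def z_norm(s):
    # z-score normalisation, as in the TadGAN scoring step.
    s = np.asarray(s, dtype=float)
    return (s - s.mean()) / s.std()

def hypad_anomaly_score(rec_errors, critic_scores, uncertainties):
    # Certainty (1 - u) scales the TadGAN-style product of normalised
    # reconstruction error and critic score, shrinking the anomaly
    # scores of reconstructions the model is unsure about.
    certainty = 1.0 - np.asarray(uncertainties, dtype=float)
    return certainty * z_norm(rec_errors) * z_norm(critic_scores)
```

With zero uncertainty everywhere this reduces exactly to the product score of Eq. (1); a high-uncertainty window keeps only a fraction of its raw anomaly score.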
III-C Motivation for HypAD
HypAD takes motivation from a key idea: a detectable anomaly is one where the model is certain, but predicts wrongly. In other words, if the model encounters a known pattern which it knows how to reconstruct, it will call it anomalous if the reconstruction does not match the input signal.
The principled formulation of hyperbolic uncertainty is paramount towards this goal: HypAD predicts a reconstruction as uncertain when it is doubtful and possibly wrong.
Fig. 3 illustrates this key concept for all the datasets. The first, second and third rows correspond to the univariate, UCASAS and SWaT datasets, respectively. The bar plots depict the average cosine distances between the input signals and their reconstructions, against specific intervals of uncertainty along the x-axis. The higher the cosine distance, the more distinct the reconstruction is from the provided signal (the Poincaré ball model is conformal to the Euclidean space and preserves the same angles [shimizu2021hyperbolic]). Note that for the first two rows, the first three plots (columns) correspond to the signals that report the best improvement in F1-score and the last plot corresponds to the signal with the worst improvement. A single bar plot is reported for SWaT because it consists of a single long-term signal.
Observe in the second row how HypAD learns to correctly assign higher uncertainty to worse estimates for the cases of Fall, Weakness, and Nocturia. For the last signal, SlowerWalking, HypAD fails to learn a meaningful uncertainty and labels all reconstructions as certain. It is however notable that the representation is still interpretable and the failure case discernible. Trends are similar for the univariate and SWaT datasets.
TABLE I: Characteristics of the datasets. Univariate sources: NASA (SMAP, MSL), Yahoo (A1-A4), NAB (Art, AdEx, AWS, Traf, Tweets). Multivariate sources: CASAS (F, MTC, W, SW, N) and SWaT.

|                          | SMAP | MSL | A1 | A2 | A3 | A4 | Art | AdEx | AWS | Traf | Tweets | F | MTC | W | SW | N | SWaT |
| Num. of signals          | 53 | 27 | 67 | 100 | 100 | 100 | 6 | 5 | 17 | 7 | 10 | 1 | 1 | 1 | 1 | 1 | 1 |
| Num. of anomalies        | 67 | 36 | 178 | 200 | 939 | 835 | 6 | 11 | 30 | 14 | 33 | 2 | 2 | 4 | 2 | 2 | 33 |
| Point anomalies          | 0 | 0 | 68 | 33 | 935 | 833 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Collective anomalies     | 67 | 36 | 110 | 167 | 4 | 2 | 6 | 11 | 30 | 14 | 33 | 2 | 2 | 4 | 2 | 2 | 33 |
| Num. of anomaly points   | 54,696 | 7,766 | 1,699 | 466 | 943 | 837 | 2,418 | 795 | 6,312 | 1,560 | 15,651 | 99 | 239 | 1,248 | 276 | 1,060 | 10,786 |
| Percentage of total      | 9.7% | 5.8% | 1.8% | 0.3% | 0.6% | 0.5% | 10% | 9.9% | 9.3% | 9.8% | 9.8% | 0.7% | 1.7% | 8% | 2.4% | 6.3% | 10.05% |
| Num. out of distribution | 18,126 | 642 | 861 | 153 | 21 | 49 | 123 | 15 | 210 | 86 | 520 | - | - | - | - | - | - |
| Num. of instances        | k | k | k | k | 168k | 168k | k | k | k | k | k | k | k | k | k | k | k |
| Synthetic?               | No | No | No | Yes | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No | No |
IV. Results
We compare HypAD against the current best univariate anomaly detector, TadGAN [geiger2020tadgan], on a benchmark of 11 time series datasets, and extend the comparison to 2 multivariate sensor datasets: one from an industrial water treatment plant and one of daily activities in elderly residences. First, we introduce the benchmarks (Sec. IV-A); then we compare against baselines and the state-of-the-art (Sec. IV-B); finally, we conduct ablative studies on the importance of uncertainty for performance and for the reduction of false alarms (Sec. IV-C).
IV-A Datasets, Metrics and Experimental Setup
Table I summarizes the main characteristics of the datasets, which we coarsely divide into univariate, the main test bed of our baseline TadGAN [geiger2020tadgan], and multivariate, to which we extend the comparison. In the table, we report the sources of signals (NASA, Yahoo, NAB, CASAS and SWaT) and the datasets within each source (SMAP, MSL, A1, etc.); cf. the detailed description later in this section.
In the table, the number of signals is the number of time series within each dataset. Note that the univariate datasets are composed of multiple signals, while the multivariate ones comprise a single large time series each. The number of anomalies counts the instances, within the time series, labelled as anomalous. These are further detailed as point anomalies, single anomalous values at a specific point in time, or collective anomalies, sets of contiguous times which are altogether anomalous. Yahoo is the sole source with point anomalies, and the only one with synthetic sequences (A2, A3, A4). We also report the percentage of total anomalous points and, following [geiger2020tadgan], for the univariate datasets, the number of out-of-distribution points, i.e. points exceeding the mean by more than a fixed number of standard deviations.
Univariate datasets
NASA includes two spacecraft telemetry datasets (https://s3-us-west-2.amazonaws.com/telemanom/data.zip) based on the Mars Science Laboratory (MSL) and the Soil Moisture Active Passive (SMAP) signals. The former consists of scientific and housekeeping engineering data taken from the Rover Environmental Monitoring Station aboard the Mars Science Laboratory. The latter includes measurements of soil moisture and freeze/thaw state from space for all non-liquid water surfaces globally within the top layer of the Earth.
We analyse the Yahoo datasets, based on real production traffic to Yahoo computing systems. Additionally, we consider three synthetic datasets from the same source (https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70). The benchmark tests the detection accuracy of various anomaly types, including outliers and change-points. The synthetic datasets consist of time series with varying trend, noise and seasonality, while the real dataset consists of time series representing the metrics of various Yahoo services.
Numenta Anomaly Benchmark (NAB) is a well-established collection of univariate time series from real-world application domains (see https://github.com/numenta/NAB/tree/master/data for the entire directory of datasets). To be consistent with [geiger2020tadgan], we analyse Art, AdEx, AWS, Traf, and Tweets from the original collection.
Multivariate datasets
We consider for analysis the SWaT [10.1007/9783319713687_8, Mathur2016SWaTAW] and CASAS [cook2012casas, dahmen2021indirectly] datasets. SWaT is collected from a cyber-physical system testbed that is a scaled-down replica of an industrial water treatment plant. The data was collected every second for a total of 11 days; for the first few days the system was operated normally, while for the remaining days certain cyber-physical attacks were launched. Following [10.1145/3447548.3467137], we sample sensor data every 5 seconds.
CASAS is a collection of two weeks of sensor data from retirement homes. Each sensor reading has a label attached to it, according to the activity of the elderly person recognised by human annotators. The 5 time series are collections of activities, grouped by the medical conditions established prior to the experimentation: Falling (F), More Time in Chair (MTC), Weakness (W), Slower Walking (SW), and Nocturia (N) for nightly toilet visits. Although sensor readings give fine-grained information, we are interested in creating daily patient profiles. Hence, we collapse each run of consecutive sensor signals into a single engendered activity with a start and end time (the start time corresponds to the first sensor reading for that activity, whereas the end time is the last reading). We then create a time matrix of size D x 1440, where D is the number of days the patient is monitored and 1440 is the total number of minutes in a day; entry [d, m] represents the label of the activity performed in minute m of day d. Lastly, we densify each value with contextual and duration information corresponding to the label therein.
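A minimal sketch of the daily-profile construction described above (numpy; the tuple format and function name are hypothetical):

```python
import numpy as np

def build_daily_profiles(activities, num_days, idle_label=0):
    # Build the (num_days x 1440) matrix described above: entry
    # [d, m] holds the label of the activity performed in minute m
    # of day d; minutes with no activity keep the idle label.
    # `activities` is a list of (day, start_min, end_min, label)
    # tuples, one per collapsed activity.
    profiles = np.full((num_days, 1440), idle_label, dtype=int)
    for day, start_min, end_min, label in activities:
        profiles[day, start_min:end_min] = label
    return profiles
```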
Finally, we create train and test splits of this data, taking care to safeguard the sequentiality of observations and gathering the few anomalies into the test set, used for evaluation only. For this reason we name this split Unsupervised-CASAS, dubbed UCASAS, differing from the original CASAS [cook2012casas, dahmen2021indirectly], since the latter interleaves anomalous sensor readings with normal instances, thus breaking the sequentiality of anomalies. For each anomalous day encountered in the test set, we pad with the two days prior to it and the two after. If the padding overlaps with another anomalous day, then we concatenate them and perform the padding procedure again (the concatenation procedure merges common sequences; if a sequence contains more than one anomalous day within the two-day padding window, the padded sequences get collapsed into a single one). We delete the days assigned to the test set from the overall dataset and use the rest for training. Moreover, because we employ a time-related strategy, we create time windows of 30 actions to detect anomalies.

Metrics
Since all the enlisted datasets are highly unbalanced, accuracy is misleading. Therefore, as done in [suris2021hyperfuture], we use the F1 score to account for this challenge. Notice that we do not use the cumulative F1 score proposed in [garg2021evaluation] to evaluate performance, because not all datasets contain anomalous events; rather, they contain anomalous data points. Based on [hundman2018detecting], for the univariate and CASAS datasets, we penalise high false positive rates and encourage the timely detection of true positives. Since anomalies are rare events and come in collective sequences in real-world applications, we proceed as follows:

We record a true positive (TP) if any predicted window overlaps a true anomalous window.

We record a false negative (FN) if a true anomalous window does not overlap any predicted window.

We record a false positive (FP) if a predicted window does not overlap any true anomalous region.
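These three rules can be implemented directly, representing each true or predicted anomalous window as a (start, end) pair (a sketch; names are ours):

```python
def overlap(a, b):
    # Two windows (start, end) overlap if neither ends before the
    # other begins.
    return a[0] <= b[1] and b[0] <= a[1]

def windowed_f1(pred_windows, true_windows):
    # TP: a true anomalous window overlapped by some prediction.
    # FN: a true anomalous window with no overlapping prediction.
    # FP: a predicted window overlapping no true anomalous region.
    tp = sum(any(overlap(p, t) for p in pred_windows) for t in true_windows)
    fn = len(true_windows) - tp
    fp = sum(not any(overlap(p, t) for t in true_windows) for p in pred_windows)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Note that a single overlapping prediction is enough to count a true anomalous window as detected, which rewards timely detection over exact point-wise matches.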
For UCASAS, we also measure the g-measure, the geometric mean of recall (R) and precision (P), a robust metric when classes are imbalanced [dahmen2021indirectly].

TABLE II: F1-scores on the univariate datasets (NASA: MSL, SMAP; Yahoo: A1-A4; NAB: Art, AdEx, AWS, Traf, Tweets).

| Model | MSL | SMAP | A1 | A2 | A3 | A4 | Art | AdEx | AWS | Traf | Tweets | F1 (mean ± std) |
| TadGAN [geiger2020tadgan]† | 0.623 | 0.680 | 0.668 | 0.820 | 0.631 | 0.497 | 0.667 | 0.667 | 0.610 | 0.455 | 0.605 | 0.629 ± 0.123 |
| AE | 0.199 | 0.270 | 0.283 | 0.008 | 0.100 | 0.073 | 0.283 | 0.100 | 0.239 | 0.088 | 0.296 | 0.176 ± 0.099 |
| LstmAE | 0.317 | 0.318 | 0.310 | 0.023 | 0.097 | 0.089 | 0.261 | 0.130 | 0.223 | 0.136 | 0.299 | 0.200 ± 0.103 |
| ConvAE | 0.300 | 0.292 | 0.301 | 0.000 | 0.103 | 0.073 | 0.289 | 0.129 | 0.254 | 0.082 | 0.301 | 0.212 ± 0.096 |
| TadGAN* | 0.500 | 0.580 | 0.620 | 0.865 | 0.750 | 0.576 | 0.420 | 0.550 | 0.670 | 0.480 | 0.590 | 0.600 ± 0.115 |
| HypAD (proposed) | 0.565 | 0.643 | 0.610 | 0.670 | 0.670 | 0.470 | 0.777 | 0.663 | 0.630 | 0.570 | 0.670 | 0.631 ± 0.075 |

† Reported from the original paper. TadGAN* refers to its result reproduced with the PyTorch version (see Secs. IV-A, IV-B). Mean and standard deviation are computed across all datasets.

TABLE III: g-measure and F1-score on the UCASAS signals (F, W, N, SW, MTC); the last two columns report the mean ± std across signals.

| Model | F (g / F1) | W (g / F1) | N (g / F1) | SW (g / F1) | MTC (g / F1) | g (mean ± std) | F1 (mean ± std) |
| LstmAE | 0.085 / 0.014 | 0.182 / 0.108 | 0.000 / 0.000 | 0.158 / 0.049 | 0.133 / 0.035 | 0.112 ± 0.064 | 0.041 ± 0.037 |
| AE | 0.139 / 0.127 | 0.033 / 0.027 | 0.116 / 0.103 | 0.000 / 0.000 | 0.158 / 0.049 | 0.089 ± 0.062 | 0.061 ± 0.047 |
| ConvAE | 0.086 / 0.014 | 0.284 / 0.150 | 0.251 / 0.119 | 0.158 / 0.048 | 0.134 / 0.035 | 0.183 ± 0.074 | 0.073 ± 0.052 |
| TadGAN* | 0.222 / 0.267 | 0.570 / 0.555 | 0.000 / 0.000 | 0.630 / 0.570 | 0.267 / 0.222 | 0.338 ± 0.233 | 0.323 ± 0.216 |
| HypAD (proposed) | 0.447 / 0.333 | 0.660 / 0.610 | 0.447 / 0.333 | 0.470 / 0.364 | 0.577 / 0.500 | 0.520 ± 0.095 | 0.428 ± 0.123 |
Baselines
We include the following strategies as our baselines in this paper:

AE [baldi2012autoencoders]: We use a six-layer fully-connected autoencoder.

ConvAE [maggipinto2018convolutional]: We use three layers of convolutional encoding interleaved with max pooling. The decoder mirrors the encoder, with deconvolutions aided by two-dimensional upsampling layers.

LstmAE [sagheer2019unsupervised]: We use a deep stacked LSTM autoencoder with four layers. The hidden and output vectors of the first LSTM are passed to the second LSTM layer. The latent representation of the encoder is then reconstructed in reverse order by the decoder.

TadGAN* [geiger2020tadgan]: We use a one-layer bidirectional LSTM for the encoder and a two-layer bidirectional LSTM for the generator. For the critic C_x we use a fully-connected layer, and two dense layers for C_z.
Implementation details.
For the first three baselines, we set the number of epochs to 30, the batch size to 32, and a fixed learning rate. For TadGAN*, we set the epochs to 30, the batch size to 64, a fixed learning rate, and 5 critic iterations. We use Adam as the optimiser to train all the baselines. For our proposed method HypAD, we took inspiration from an openly available PyTorch implementation (https://github.com/arunppsg/TadGAN). We leave the architecture of TadGAN unvaried, but we incorporate the hyperbolic transformation as in [suris2021hyperfuture] (https://github.com/cvlab-columbia/hyperfuture). The hyperparameters are the same as in the original paper, but we use Riemannian Adam (https://github.com/geoopt/geoopt) as the optimiser.

IV-B Comparison to the State of the Art
HypAD sets a new state-of-the-art for univariate anomaly detection, with the highest average F1-score of 0.631. In Table II, HypAD outperforms the current best technique (TadGAN*, 0.600) by 5.17%, as well as all baselines by a large margin. Note that we also report the performance of TadGAN from the original paper [geiger2020tadgan] in the first row (0.629), which we could not reproduce with the available PyTorch code (cf. Sec. IV-A). In the table, the F1 column reports the mean and standard deviation over all datasets. Looking at the standard deviation, HypAD also appears to be the most consistent performer. Considering the F1-score, the largest performance gains of HypAD vs. TadGAN are on the NAB and NASA datasets, while it is outperformed most notably on the A2, A3 and A4 Yahoo datasets, the only synthetic univariate ones.
In Table III, we extend the evaluation of HypAD to the multivariate UCASAS dataset. We cannot include Isudra [dahmen2021indirectly] because its underlying architecture uses a small amount of labels to select parameters for execution; additionally, Isudra is trained in a supervised fashion, differing from all the other methods reported here. For completeness, we also report the g-measure [dahmen2021indirectly] and its average across all signals. As shown in the table, HypAD surpasses TadGAN* by 32.51% in terms of average F1-score (0.428 vs. 0.323).
Finally, in Table IV, we extend the comparison to the multivariate SWaT dataset. Here, following previous work [10.1145/3447548.3467137], we also report precision and recall in addition to the F1-score. HypAD outperforms the baseline TadGAN* (0.753 vs. 0.722), but both techniques are behind the current state-of-the-art in multivariate anomaly detection, NSIBF [10.1145/3447548.3467137]. Observe, however, that HypAD achieves its highest F1 performance at the highest precision (0.996) among all methods. This confirms that HypAD detects anomalies which it is certain about, i.e. when it understands the time series and is certain about what to expect, but cannot reconstruct the input signal due to an anomaly.

TABLE IV: Precision, recall and F1-score on the SWaT dataset.

| Model | Precision | Recall | F1 |

| EncDec-AD [Malhotra2016LSTMbasedEF] [ICML Workshop '16] | 0.945 | 0.620 | 0.748 |
| DAGMM [zong2018deep] [ICLR '18] | 0.946 | 0.747 | 0.835 |
| OmniAnomaly [10.1145/3292500.3330672] [KDD '19] | 0.979 | 0.757 | 0.854 |
| USAD [10.1145/3394486.3403392] [KDD '20] | 0.987 | 0.740 | 0.846 |
| TadGAN* [BigData '20] | 0.937 | 0.587 | 0.722 |
| NSIBF [10.1145/3447548.3467137] [KDD '21] | 0.982 | 0.863 | 0.919 |
| HypAD (Ours) | 0.996 | 0.605 | 0.753 |
TABLE V: Ablation: average F1-score with Euclidean vs. hyperbolic distance, with and without uncertainty.

| Variant | Univariate | UCASAS | SWaT |
| Euclidean (TadGAN) | 0.600 | 0.323 | 0.722 |
| Hyperbolic | 0.604 | 0.397 | 0.566 |
| Hyperbolic + Uncertainty (HypAD) | 0.631 | 0.428 | 0.753 |
IV-C Ablation Studies
In Table V, we analyze the importance of the hyperbolic embedding and the use of uncertainty for anomaly detection on the univariate datasets, as well as on the multivariate UCASAS and SWaT. The first row shows the performance of the model in Euclidean space (average F1-score across datasets); this corresponds to TadGAN* in Tables II, III and IV. In the second row, we report the performance of a hyperbolic TadGAN*, without uncertainty (cf. Sec. III-B1). This improves marginally on the univariate datasets (0.604 vs. 0.600) and more substantially on UCASAS (0.397 vs. 0.323), but it decreases performance on SWaT (0.566 vs. 0.722), which we analyse further in the following subsection.
The complete proposed HypAD model, in the third row, improves consistently on both ablative variants, yielding 0.631 on the univariate datasets, 0.428 on UCASAS, and 0.753 on SWaT. The improvements w.r.t. not using uncertainty are large: 4.5%, 7.8% and 33%, respectively. We therefore conclude that uncertainty is fundamental for improving anomaly detection.
Qualitative ablation on SWaT: In Figure 4, we present a qualitative ablation on the SWaT dataset, which consists of a single long signal. The three plots correspond to the three ablative variants of Table V. In all plots, the blue and green points represent the predicted anomaly scores and the ground-truth anomalies, respectively. The red line denotes the anomaly detection threshold, i.e. blue points above the red line are the predicted anomalies. False positives are blue points above the threshold outside the green (ground-truth) anomalous regions.
As the figure shows, the Euclidean model (first plot) yields many false positives; the hyperbolic model without uncertainty (second plot) reduces the number of false positives substantially, but it also misses anomalies, especially in the middle part of the signal. This explains the drop in F1 score on the SWaT dataset (cf. Table V). Integrating hyperbolic uncertainty (the proposed HypAD, third plot) recovers the detection of these anomalies: it increases true positives while keeping the number of false positives low, thus yielding the best F1 score of 0.753.
V. Conclusions
We have proposed HypAD, a novel model for anomaly detection based on hyperbolic uncertainty. The proposed hyperbolic uncertainty allows HypAD to self-adjust its output, encouraging the model to either predict a correct reconstruction or a less certain, possibly wrong one. This benefits anomaly detection in two ways: it provides better reconstructions of the signal (deviations from which are anomalous) and it yields a measure of certainty. This is a novel viewpoint on anomaly detection: detectable anomalies are those instances which the model is certain about but predicts wrongly.