Anomaly detection (AD) entails determining whether a data point comes from the same distribution as a prior set of normal data. Anomaly detection systems are used to discover credit card fraud, detect cyber intrusion attacks and identify cancer cells. Since normal examples are readily available while anomalies tend to be rare in production environments, we consider the semi-supervised or one-class setting where only normal instances are present in the training set. It is important to remark that the outlier space is often much larger than the inlier space, though anomalous observations are uncommon. For instance, the space of normal dog images is sparse in the space of anomalous non-dog images. This discrepancy between data availability and space sizes makes anomaly detection hard, as a model must account for the entire input space despite only having information about a minuscule subspace. Deep learning models generally struggle with this challenge since it is impractical to manage their behavior over the entire input space. Linear models, however, do not face such a difficulty.
To gain control over our model’s behavior on the entire input space, we employ a linear transformation as its main component and subsequently penalize its Frobenius norm. However, this transformation has to be performed over an exponentially large feature space for our model to be expressive—an impossible task with full matrices. Thus, we leverage tensor networks as sparse representations of such large matrices. All in all, our model is an end-to-end anomaly detector for general data that produces a normality score via its decision function. Our novel method is showcased on several tabular and image datasets. We attain significant improvements over prior methods on tabular datasets and competitive results on image datasets, despite not exploiting the locality of image pixels.
2 Related Work
Stoudenmire and Schwab first demonstrated the potential of tensor networks in classification tasks, using the well-known density matrix renormalization group algorithm from computational physics [White, 1992] to train Matrix Product States (MPS) [Schollwöck, 2011, Östlund and Rommer, 1995] as a weight matrix in classifying MNIST digits [LeCun et al., 2010]. Subsequent work has also applied tensor networks to the classification of medical images [Selvan and Dam, 2020] and to regression [Reyes and Stoudenmire, 2020], but investigations into unsupervised and semi-supervised settings are lacking.
The literature on anomaly detection (AD) is vast and we will focus on reviewing previous work in the one-class context for arbitrary data (e.g. not restricted to images). Kernel-based methods, such as the One-Class SVM (OC-SVM) [Manevitz and Yousef, 2002], learn a tight fit of inliers in an implicit high-dimensional feature space while the non-distance-based Isolation Forest [Liu et al., 2008]
directly distinguishes inliers and outliers based on partitions of feature values. Unfortunately, such classical AD algorithms presume the clustering of normal instances in some feature space and hence suffer from the curse of dimensionality, requiring substantial feature selection to operate on feature-rich, multivariate data [Bengio et al., 2013].
Hybrid approaches first learn representations with deep models, such as Deep Belief Networks (DBN) [Erfani et al., 2016], that were later fed to an OC-SVM. End-to-end deep learning models, without explicit AD objectives, have also been devised. Auto-Encoder (AE) AD models [Hawkins et al., 2002, Sakurada and Yairi, 2014, Chen et al., 2017] learn an encoding of inliers and subsequently use the reconstruction loss as a decision function. Other AE variants, such as Deep Convolutional Auto-Encoders (DCAE) [Masci et al., 2011, Makhzani and Frey, 2015], have also been studied [Seebock et al., 2019, Richter and Roy, 2017]. Next, generative models learn a probability distribution over inliers and subsequently identify anomalous instances as those with low probabilities, or those which are difficult to find in their latent spaces (in the case of latent variable models). Generative Adversarial Networks (GANs) [Goodfellow et al., 2014] have been popular in the latter category, with the advent of AnoGAN [Schlegl et al., 2017], a more efficient variant [Zenati et al., 2018] based on BiGANs [Donahue et al., 2016], GANomaly [Akcay et al., 2019] and ADGAN [Deecke et al., 2018].
Deep learning models with objectives that resemble shallow kernel-based AD algorithms have also been explored. Such models train neural networks as explicit feature maps while concurrently finding the tightest decision boundary around the transformed training instances in the output space. Deep SVDD (DSVDD) [Ruff et al., 2018] seeks a minimal-volume hypersphere encapsulating inliers, motivated by the Support Vector Data Description (SVDD) [Tax and Duin, 2004], while One-Class Neural Networks (OC-NN) [Chalapathy et al., 2018] search for a maximum-margin hyperplane separating normal instances from the origin, in a fashion similar to the OC-SVM. Contemporary attention has been directed towards self-supervised models, mostly for images [Golan and El-Yaniv, 2018, Gidaris et al., 2018, Hendrycks et al., 2019], with the exception of the more recent GOAD [Bergman and Hoshen, 2020] for general data. These models transform an input point into several altered instances according to a fixed class of rules and train a classifier that predicts, for each altered instance, a score of belonging to its corresponding class of transformation. Outliers are then reflected as points with extreme scores, aggregated over all classes. In particular, GOAD unifies DSVDD and the self-supervised GEOM model [Golan and El-Yaniv, 2018] by defining the anomaly score of each transformed instance as its distance from its class’s hypersphere center.
3 Model Description
In this section, we introduce our model, which we call the Tensor Network Anomaly Detector (TNAD). TNAD, as an end-to-end network with an explicit AD objective, falls into the second-to-last category of models above, with one crucial caveat: its notion of tightness does not rely on the volume of the decision boundary, which is an inadequate measure in the one-class setting. To illustrate this point, DSVDD and GOAD may find tiny hyperspheres during training but still have a loose fit around inliers, as they may map all possible inputs into the hyperspheres—a problem acknowledged by the original authors of DSVDD as “hypersphere collapse” [Ruff et al., 2018]. In the scenario where outliers are available, one can indeed judge the tightness of a fit by the separation of inliers and outliers with respect to a decision boundary, but in the one-class setting, where no such points of reference are available, the tightness of a model’s fit on training instances must be gauged relative to its other predictions. As such, we design TNAD to incorporate a canonical measure of its overall tendency to predict normality.
A schematic of TNAD is depicted in Figure 1. A fixed feature map is applied to map inputs onto the surface of a unit hypersphere in a vector space whose dimension is exponential in the number of original features. The training instances are sparse in this high-dimensional space, which enables the learnt component of our model to be expressive despite being a simple linear transformation. Upon action by this transformation, normal instances will be mapped close to the surface of a hypersphere of an arbitrarily chosen radius (fixed in our experiments) while anomalous instances can be identified as those close to the origin. The decision function of the model with respect to an input, where a larger value indicates normality, is then the norm of its transformed image.
To accommodate the possible predominance of outliers, we allow the output space to have a smaller exponential scaling than the input space, so that the transformation has a large null-space. It can thus be understood informally as a “projection” that annihilates the subspace spanned by outliers. To parameterize this transformation, whose dimensions are exponential in the number of features, we leverage the Matrix Product Operator (MPO) tensor network, which is both memory- and computationally-efficient.
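As a hedged illustration of the memory savings (the toy shapes and index ordering below are our own, not the paper's exact layout), the sketch builds a small MPO and compares its parameter count against the dense matrix it encodes:

```python
import numpy as np

# Toy MPO over N sites with physical dimension d and bond dimension D:
# it encodes a (d^N x d^N) matrix using O(N * d^2 * D^2) parameters
# instead of d^(2N).
N, d, D = 8, 2, 4
rng = np.random.default_rng(0)
cores = [rng.normal(size=(1 if i == 0 else D, d, d, 1 if i == N - 1 else D))
         for i in range(N)]  # (left bond, row index, column index, right bond)

mpo_params = sum(c.size for c in cores)
dense_params = (d ** N) ** 2
print(mpo_params, dense_params)  # 416 65536

# For tiny N we can contract the cores back into the dense matrix they encode.
op = cores[0]
for c in cores[1:]:
    op = np.tensordot(op, c, axes=([-1], [0]))  # fuse adjacent bond legs
op = op.squeeze()                               # drop the two dummy bonds
op = op.transpose(*range(0, 2 * N, 2), *range(1, 2 * N, 2))
T = op.reshape(d ** N, d ** N)
print(T.shape)  # (256, 256)
```

The same counting explains why full matrices are out of reach at realistic sizes: the dense cost grows as d^(2N) while the MPO cost stays linear in N.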
To obtain a tight fit around inliers, we penalize the squared Frobenius norm of the transformation during training—the sum of its squared matrix elements with respect to some basis. Since the squared Frobenius norm is also the sum of squared singular values, it captures the total extent to which the model is likely to deem an instance as normal. Ultimately, such a spectral property reflects the overall behavior of the model, rather than its restricted behavior on the training set.
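The spectral identity invoked here (the squared Frobenius norm equals the sum of squared singular values) is standard and easy to check numerically, with a random matrix standing in for the learned transformation:

```python
import numpy as np

# Numerical check: ||T||_F^2 == sum of squared singular values of T.
rng = np.random.default_rng(1)
T = rng.normal(size=(5, 12))

fro_sq = np.sum(T ** 2)
sv_sq = np.sum(np.linalg.svd(T, compute_uv=False) ** 2)
print(np.isclose(fro_sq, sv_sq))  # True
```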
3.2 Matrix Product Operator Model
In this section, the details of TNAD are expounded in tensor network notation—the reader is recommended to consult the brief introduction in our supplementary material and the more comprehensive reviews in [Biamonte and Bergholm, 2017, Orús, 2014]. The input space consists of (flattened) grey-scale images or tabular feature vectors. Given a predetermined per-feature map, whose output dimension is a parameter known as the physical dimension, an input is first passed through a feature map defined by
Recall that the resulting feature space is exponentially large in the number of features. In tensor network notation, this feature map is a tensor product of per-feature vectors.
The per-feature map is chosen to output unit vectors for every feature value, implying that the overall feature map sends all points to the unit hypersphere of the feature space. Two forms of the embedding were explored. The “trigonometric” embedding is defined as
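The embedding's formula did not survive extraction; a plausible concrete form for the two-dimensional case is the standard trigonometric feature map of Stoudenmire and Schwab, sketched below under that assumption:

```python
import numpy as np

# Assumed per-feature map: x in [0, 1] -> (cos(pi*x/2), sin(pi*x/2)).
# Every feature value lands on the unit circle, and the extreme values
# x = 0 and x = 1 land on the two orthogonal standard basis vectors.
def trig_embed(x):
    x = np.asarray(x, dtype=float)
    return np.stack([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)], axis=-1)

phi = trig_embed([0.0, 0.5, 1.0])                      # one 2-vector per feature
print(np.allclose(np.linalg.norm(phi, axis=-1), 1.0))  # True: unit vectors
print(abs(phi[0] @ phi[2]) < 1e-12)                    # True: extremes orthogonal
```

The full feature map would then be the tensor product of these per-feature vectors, which is what keeps every input on the unit hypersphere of the exponentially large space.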
Our grey-scale image experiments were conducted with physical dimension two, which possesses the following natural interpretation. Since the embeddings of the two extreme pixel values are the two standard basis vectors, the set of binary-valued images is mapped to the standard basis of the feature space. Intuitively, the extreme values of a feature (which reflects the pixel brightness in this case) are devised to be orthogonal for maximal separation. Moreover, since per-feature inner products multiply, the feature map is highly sensitive to each individual feature—flipping a single pixel from one extreme value to the other would lead to an orthogonal vector after the embedding. In essence, the set of extreme representatives of the input space, which can be seen to be the images of highest contrast, is mapped to the standard basis of the feature space for maximal separation. The squared F-norm of our subsequent linear transformation then equals the sum of its squared outputs over these basis vectors.
Recalling that the norm of the transformed input is the value of TNAD’s decision function, the squared F-norm is thus conferred the meaning of the total degree of normality predicted by the model on these extreme representatives—apt, since images with the best contrast should be the most distinguishable. Unfortunately, such an interpretation does not extend to larger physical dimensions, so we also considered the “fourier” embedding on tabular data, defined component-wise as
This map is periodic and has the following property: a set of equally spaced feature values is mapped to the standard basis vectors of the per-feature embedding space. These values and their periodic equivalents are thus deemed extreme cases, and a similar analysis follows as before. Ultimately, both versions of the embedding segregate points that are close in the input space by mapping them into the exponentially large feature space, buttressing the subsequent linear transformation.
After the feature map, we learn a linear transformation whose output legs appear only at regular intervals along the chain, with the interval length being a parameter referred to as the spacing. Our parameterization in terms of rank-3 and rank-4 tensors is the below variant of the Matrix Product Operator (MPO) tensor network.
The modified MPO only has an outgoing red leg at every spacing interval, beginning from the first node. The red legs again carry the physical dimension while the gold legs carry another parameter known as the bond dimension. Intuitively, the gold legs are responsible for capturing correlations between features, for which a larger bond dimension is desirable. In explicit tensor indices,
where we have adopted Einstein’s summation convention (see supplement) over the parameterizing low-rank tensors. TNAD’s output can then be computed as below.
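For concreteness, the contraction can be sketched in code at toy sizes (spacing one; the shapes, index order and random inputs are our own assumptions, and the full output vector is materialized only because the toy dimension is small):

```python
import numpy as np

# Apply a toy MPO to an embedded input core by core, never forming the
# d^N x d^N matrix. Core axes: (left bond, output leg, input leg, right bond).
N, d, D = 6, 2, 3
rng = np.random.default_rng(2)
cores = [rng.normal(size=(1 if i == 0 else D, d, d, 1 if i == N - 1 else D))
         for i in range(N)]
feats = rng.normal(size=(N, d))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # unit per-feature vectors

out = np.ones((1, 1))                          # (fused output legs, left bond)
for core, v in zip(cores, feats):
    block = np.einsum('loir,i->lor', core, v)  # contract the input (black) leg
    out = np.einsum('al,lor->aor', out, block) # absorb block into running tensor
    out = out.reshape(-1, out.shape[-1])       # fuse the accumulated output legs
vec = out.reshape(-1)                          # the transformed input, length d^N
score = np.linalg.norm(vec)                    # TNAD-style normality score
print(vec.shape, score > 0)  # (64,) True
```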
Finally, the following tensor network yields the squared F-norm of the transformation, used as a training penalty.
Weaving the above together, our overall loss function over a batch of instances is given below. A hyperparameter controls the trade-off between TNAD’s fit around training points and its overall tendency to predict normality. In words, the model only sees normal instances during training, which it tries to map to vectors on a hypersphere of fixed radius, but it is simultaneously deterred from mapping other, unseen instances to vectors of non-zero norm by the penalty. Logarithms are taken to stabilize the optimization by batch gradient descent, since the value of a large tensor network can fluctuate by a few orders of magnitude with each descent step, even with a tiny learning rate. Finally, the ReLU function is applied to the F-norm penalty to avoid the trivial solution of a vanishing transformation.
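Since the loss equation itself was lost, the following is only a plausible reconstruction consistent with the description above; the symbols \(T\), \(\Phi\), \(r\), \(\alpha\) and batch size \(B\) are our own labels:

$$
\mathcal{L} \;=\; \frac{1}{B}\sum_{i=1}^{B}\Bigl(\log\lVert T\,\Phi(x_i)\rVert^{2} - \log r^{2}\Bigr)^{2} \;+\; \alpha\,\mathrm{ReLU}\!\bigl(\log\lVert T\rVert_{F}^{2}\bigr)
$$

Under this form, the first term pulls training instances towards the radius-\(r\) hypersphere in log-space, while the ReLU deactivates the penalty once \(\lVert T\rVert_{F}\le 1\), so the trivial solution \(T=0\) is never rewarded.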
3.3 Contraction Order and Complexity
Now that our tensor network has been identified, the final ingredient is determining an efficient order for multiplying tensors—a process known as contraction—to compute the decision function and the penalty. Though different contraction schemes lead to the same result, they may have vastly different time complexities; the simplest example is the product of two matrices and a vector, where one bracketing involves an expensive matrix-matrix product while the other bypasses it. The time-complexity of a contraction between two nodes can be read off a tensor network diagram as the product of the dimensions of all legs connected to the two nodes, without double-counting. Though searching for the optimal contraction order of a general network is NP-hard [Chi-Chung et al., 1997], the MPO has been extensively studied and an efficient contraction order that scales linearly with the number of features is known—despite the MPO being a linear transformation between spaces with dimensions exponential in that number. The initial steps in computing the output are vertical contractions of the black legs, followed by right-to-left horizontal contractions along segments between consecutive red legs.
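The matrix-and-vector example above, checked numerically (the flop counts in the comments are the standard estimates, not measurements):

```python
import numpy as np

# For n x n matrices A, B and vector v, both bracketings of A B v agree,
# but their costs differ by a factor of roughly n.
n = 200
rng = np.random.default_rng(3)
A, B, v = rng.normal(size=(n, n)), rng.normal(size=(n, n)), rng.normal(size=n)

slow = (A @ B) @ v  # forms an n x n intermediate: ~n^3 multiplications
fast = A @ (B @ v)  # two matrix-vector products:  ~2 n^2 multiplications
print(np.allclose(slow, fast))  # True
```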
In practice, only the bottom half of the network is contracted before it is duplicated and attached to itself. Notably, this process can also easily be parallelized. At this juncture, observe that both the decision function and the resulting network for the penalty are of the form in Figure 7, which can be computed efficiently by repeated zig-zag contractions. The overall time complexities of computing the decision function and the penalty both scale linearly with the number of features, and only the former is needed during prediction. Meanwhile, the overall space complexity of TNAD likewise scales linearly with the number of features.
4 Experiments
The effectiveness of TNAD as a general one-class anomaly detector is verified on both image and tabular datasets. The Area Under the Receiver Operating Characteristic curve (AUROC) is adopted as a threshold-agnostic metric for all experiments. TNAD was implemented with the TensorNetwork library [Roberts et al., 2019] with the JAX backend and trained with the ADAM optimizer in its default settings.
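As a reminder of what this metric means in the one-class setting, AUROC equals the probability that a randomly chosen inlier receives a higher normality score than a randomly chosen outlier; a small self-contained sketch (the function name is ours):

```python
import numpy as np

def auroc(inlier_scores, outlier_scores):
    """AUROC via pairwise comparisons; ties count as half a win."""
    s_in = np.asarray(inlier_scores, dtype=float)[:, None]
    s_out = np.asarray(outlier_scores, dtype=float)[None, :]
    return (s_in > s_out).mean() + 0.5 * (s_in == s_out).mean()

print(auroc([0.9, 0.8, 0.7], [0.6, 0.5]))  # 1.0, perfect separation
print(auroc([0.5, 0.4], [0.5, 0.4]))       # 0.5, chance level
```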
General Baselines: The general anomaly detection baselines evaluated are: One-Class SVM (OC-SVM) [Manevitz and Yousef, 2002], Isolation Forest (IF) [Liu et al., 2008], and GOAD [Bergman and Hoshen, 2020]. OC-SVM and IF are traditional anomaly detection algorithms known to perform well on general data while GOAD is a recent, state-of-the-art self-supervised algorithm with different transformation schemes for image and tabular data. OC-SVM and IF were taken off the shelf from the Scikit-Learn toolkit [Pedregosa et al., 2011] while GOAD experiments were run with the official code of [Bergman and Hoshen, 2020]. For all OC-SVM experiments, the RBF kernel was used and a grid sweep was conducted for the kernel coefficient and the margin parameter according to the test set performance in hindsight—providing OC-SVM a supervised advantage. For all IF experiments, the number of trees and the sub-sampling size were set to the defaults recommended by the original paper. GOAD parameters are reported in the specific subsections.
4.1 Image Experiments
Datasets: Our image experiments were conducted on the MNIST [LeCun et al., 2010] and Fashion-MNIST [Xiao et al., 2017] datasets, each comprising separate training and test sets of grey-scale images belonging to ten classes. In each set-up, one particular class was deemed as the inliers and all original training instances corresponding to that class were retrieved to form the new training set, containing roughly a tenth of the original training examples. The trained models were then evaluated on the untouched test set.
Additional Image Baselines: To illustrate the strengths of our approach, we include further comparisons to Deep SVDD (DSVDD) [Ruff et al., 2018] and ADGAN [Deecke et al., 2018], which entail convolutional networks. DSVDD experiments were performed with the original code while ADGAN results are reported from [Deecke et al., 2018, Golan and El-Yaniv, 2018].
Preprocessing: For all models besides DSVDD, the pixel values of the grey-scale images were rescaled to floats in the range [0, 1]. Due to the computational complexity of TNAD, a downsampling was also performed to reduce the size of the images, only for our model. In the cases of TNAD, OC-SVM and IF, the images were flattened before they were passed to these models—which thus do not exploit the inherent locality of the images, as contrasted with the convolutional architectures employed by all other models. For GOAD, the images were zero-padded to be compatible with the official implementation designed for CIFAR-10. Finally, for DSVDD, the images were preprocessed with global contrast normalization and subsequent min-max scaling to the unit interval, following the original paper.
Baseline Parameters: The convolutional architectures and hyper-parameters of all deep baselines (GOAD, DSVDD, ADGAN) follow their original work. GOAD was run with its margin parameter and the geometric transformations of GEOM [Golan and El-Yaniv, 2018], involving flips, translations and rotations. DSVDD was run with the two-phased training scheme described in the original paper.
TNAD Parameters: TNAD was run with the trigonometric embedding at physical dimension two, with fixed choices of bond dimension, spacing and margin strength. As an aside, TNAD is sensitive to initialization for a large number of sites since it successively multiplies many tensors, causing the final result to vanish or explode if each tensor is too small or too big—we found a suitable standard deviation for the initializer, and the final performance of TNAD did not vary significantly with the standard deviation once it was initialized in a reasonable regime. As a further precaution, TNAD was first trained for several “cold-start” epochs with a lower learning rate to circumvent unfortunate initializations, followed by further epochs with an initial learning rate decaying exponentially. A small batch size was used due to space constraints. Finally, since only the logs of tensor network quantities are desired, we employ the trick of rescaling tensors by their element of largest magnitude during contractions and subsequently adding back the log of the rescaling to stabilize computations.
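The rescaling trick described above can be sketched as follows (a generic matrix chain stands in for the tensor network contraction):

```python
import numpy as np

def stable_log_norm(mats, v):
    """log ||M_k ... M_1 v||: rescale after each step and accumulate the logs."""
    log_scale = 0.0
    for M in mats:
        v = M @ v
        s = np.abs(v).max()       # element of largest magnitude
        log_scale += np.log(s)    # remember the rescaling in log-space
        v = v / s
    return log_scale + np.log(np.linalg.norm(v))

# A chain long enough that the direct product underflows float64 to 0.0,
# while the rescaled computation returns a finite log-norm.
rng = np.random.default_rng(4)
mats = [0.01 * rng.normal(size=(8, 8)) for _ in range(400)]
v = rng.normal(size=8)
print(np.isfinite(stable_log_norm(mats, v)))  # True
```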
Results and Discussion: Our experimental results on image datasets are presented in Table 1. The mean AUROC, along with the standard error, across ten successful trials is reported for each model and each class chosen as the inliers. Occasionally, GOAD experienced “hypersphere collapse” while TNAD encountered numerical instabilities which led to NaN outputs—these trials were removed. Ultimately, TNAD produces consistently strong results and notably emerges second out of all evaluated models on MNIST, surpassing all convolutional architectures besides GOAD despite not exploiting the innate structure of images. Furthermore, TNAD shows the lowest variation in performance other than the deterministic OC-SVM, possibly attributable to its linearity. OC-SVM exhibits comparably strong performance, though it was admittedly optimized in hindsight. Attaining the highest AUROC on most classes, GOAD undeniably triumphs over all other evaluated models on images. However, GOAD’s performance dip on the MNIST digits that are largely unaffected by the horizontal flips and rotations used in its transformations suggests that its success relies on identifying transformations that leverage the underlying structure of the data. Indeed, its authors [Bergman and Hoshen, 2020] acknowledge that the random affine transformations used in GOAD’s tabular experiments degraded its performance on image datasets. As such, TNAD’s performance is especially encouraging, considering its ignorance of the inputs being images.
4.2 Tabular Experiments
Datasets: We evaluate TNAD and the other general baselines on five real-world ODDS [Rayana, 2016] datasets derived from the UCI repository [Dua and Graff, 2017]: Wine, Glass, Thyroid, Satellite and Forest Cover. These were selected to exhibit a variety of dataset sizes, feature counts and anomalous proportions—detailed information regarding them is presented in Table 2. Following the procedure of [Bergman and Hoshen, 2020], all models were trained on half of the normal instances and evaluated on the other half plus the anomalies.
For all models and datasets, the data was normalized such that the training set had zero mean and unit variance in each feature.
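This normalization step uses training-split statistics only, so no test information leaks in; a minimal sketch:

```python
import numpy as np

def standardize(train, test, eps=1e-8):
    """Zero-mean, unit-variance per feature, with stats from the training split."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    scale = np.where(sigma > eps, sigma, 1.0)  # guard against constant features
    return (train - mu) / scale, (test - mu) / scale

rng = np.random.default_rng(5)
train = rng.normal(5.0, 3.0, size=(100, 4))
test = rng.normal(5.0, 3.0, size=(50, 4))
train_n, test_n = standardize(train, test)
print(np.allclose(train_n.mean(axis=0), 0), np.allclose(train_n.std(axis=0), 1))
# True True
```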
Baseline Parameters: GOAD employs random affine transformations for self-supervision on tabular data and trains a fully-connected classifier with leaky-ReLU activations. We adhere to the hyperparameter choices in the original paper, with separate settings of the transformation count, hidden size and training epochs for the large-scale dataset Forest Cover and for the other smaller-scale datasets. Finally, we also train DAGMM [Zong et al., 2018] using its original Thyroid architecture and report the best results.
TNAD Parameters: The dimensions of the input and output spaces, which depend on the physical dimension, the spacing and the number of features, are crucial to TNAD. As the number of features varies across datasets, we choose the physical dimension and spacing according to the following heuristics. The spacing is set to impose an appropriately large nullspace of the transformation, and the physical dimension is subsequently chosen so that the input dimension is large enough to exploit the prowess of tensor networks while concurrently small enough to prevent the model from overfitting, with a preference for smaller dimensions on smaller datasets. On the smallest datasets, Wine and Glass, a stronger F-norm penalty was set to avoid overfitting. The bond dimension was fixed across datasets, and the same two-phased training scheme is adopted as before for the small tensor networks in Wine and Thyroid. For the other, larger models, lower learning rates with exponential decay were used to stabilize training. The same batch size was used for all datasets besides the much larger Forest Cover. The trigonometric embedding is used by default. A summary of TNAD parameters is provided in Table 2.
TNAD Parameters on Glass: The default parameters did not work well on Glass, so we devised an alternative configuration. To fully leverage the orthogonality properties of the embedding, we choose a large physical dimension and in turn set the spacing to maintain the output dimension in the desired range.
Results and Discussion: Table 3 shows the mean AUROC, with standard errors, from our experiments. Due to the large variance caused by its stochastic nature, GOAD was run for additional trials on the small-scale datasets; all other models were run for a fixed number of trials. TNAD is the best performer across all datasets. Its drastic improvements over the best baseline on the smaller datasets Wine and Glass lend credence to the ability of the F-norm penalty to ensure a tight fit around scarce inliers. GOAD’s poorer performance on Satellite and Forest Cover supports the expectation that affine transformations may not suit general data. All in all, we believe TNAD to be the best AD model when given no prior domain knowledge of the underlying data.
| Dataset | # Train | # Test (normal) | # Anomalies | # Features | TNAD Parameters |
|---|---|---|---|---|---|
| Wine | 59 | 60 | 10 | 13 | 4, 1, 2.9e4, 0.3, 2e-3 |
| Glass* | 102 | 103 | 9 | 9 | 16, 2, 1.0e6, 0.3, 5e-4 |
| Thyroid | 1839 | 1840 | 93 | 6 | 6, 1, 4.7e4, 0.1, 2e-3 |
| Satellite | 2199 | 2200 | 2036 | 36 | 4, 2, 6.9e10, 0.1, 5e-4 |
| Forest | 141650 | 141651 | 2747 | 10 | 8, 1, 1.1e9, 0.1, 5e-4 |
5 Conclusion
In this paper, we have introduced TNAD as an adept anomaly detection model for general data. To the best of our knowledge, it is the first instance of a tensor network model exceeding classical and deep methods. It should be remarked that, specifically for images and videos, there exist more appropriate tensor networks, such as PEPS [Verstraete and Cirac, 2004], that capture higher-dimensional correlations. Ultimately, we hope to set a paradigm of using tensor networks as “wide” models and spur future work.
6 Statement of Broader Impact
Anomaly detection has many benefits in fraud prevention, network security, health screening and crime investigation—the last two have been explicitly demonstrated by our successes on the Thyroid and Glass datasets. That said, anomaly detection also has applications in areas such as surveillance monitoring, which raise issues such as individual privacy. Furthermore, what is considered anomalous may be a reflection of our societal norms, so caution must be taken to ensure that such technology does not propagate inherent biases in our society.
References
- GANomaly: semi-supervised anomaly detection via adversarial training. In Computer Vision – ACCV 2018, pp. 622–637.
- Detecting anomalous data using auto-encoders. International Journal of Machine Learning and Computing 6, pp. 21.
- Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8), pp. 1798–1828.
- Classification-based anomaly detection for general data. In International Conference on Learning Representations.
- Tensor networks in a nutshell.
- Anomaly detection using one-class neural networks.
- Outlier detection with autoencoder ensembles. In SDM.
- On optimizing a class of multi-dimensional loops with reduction for parallel execution. Parallel Processing Letters 07 (02), pp. 157–168.
- Image anomaly detection with generative adversarial networks.
- Adversarial feature learning.
- UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences.
- High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognition 58, pp. 121–134.
- Unsupervised representation learning by predicting image rotations.
- Deep anomaly detection using geometric transformations. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pp. 9781–9791.
- Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, pp. 2672–2680.
- Outlier detection using replicator neural networks. In Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery, pp. 170–180.
- Using self-supervised learning can improve model robustness and uncertainty. In Advances in Neural Information Processing Systems 32, pp. 15663–15674.
- MNIST handwritten digit database. ATT Labs 2.
- Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422.
- Winner-take-all autoencoders. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS’15, pp. 2791–2799.
- One-class SVMs for document classification. Journal of Machine Learning Research 2, pp. 139–154.
- Stacked convolutional auto-encoders for hierarchical feature extraction. In Proceedings of the 21st International Conference on Artificial Neural Networks - Volume Part I, pp. 52–59.
- A practical introduction to tensor networks: matrix product states and projected entangled pair states. Annals of Physics 349, pp. 117–158.
- Thermodynamic limit of density matrix renormalization. Phys. Rev. Lett. 75, pp. 3537–3540.
- Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830.
- ODDS library. Stony Brook University, Department of Computer Sciences.
- A multi-scale tensor network architecture for classification and regression.
- Safe visual navigation via deep learning and novelty detection. In Robotics: Science and Systems.
- TensorNetwork: a library for physics and machine learning.
- Deep one-class classification. In Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 80, pp. 4393–4402.
- Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, pp. 4–11.
- Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In IPMI.
- The density-matrix renormalization group in the age of matrix product states. Annals of Physics 326 (1), pp. 96–192.
- Unsupervised identification of disease marker candidates in retinal OCT imaging data. IEEE Transactions on Medical Imaging 38 (4), pp. 1037–1047.
- Tensor networks for medical image classification.
- Supervised learning with quantum-inspired tensor networks.
- Support vector data description. Machine Learning 54 (1), pp. 45–66.
- Renormalization algorithms for quantum many-body systems in two and higher dimensions.
- Density matrix formulation for quantum renormalization groups. Phys. Rev. Lett. 69, pp. 2863–2866.
- Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms.
- Learning deep representations of appearance and motion for anomalous event detection.
- Efficient GAN-based anomaly detection.
- Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. In International Conference on Learning Representations.