Anomaly Detection with Tensor Networks

06/03/2020
by Jinhui Wang, et al.
Google
Stanford University

Originating from condensed matter physics, tensor networks are compact representations of high-dimensional tensors. In this paper, the prowess of tensor networks is demonstrated on the particular task of one-class anomaly detection. We exploit the memory and computational efficiency of tensor networks to learn a linear transformation over a space with dimension exponential in the number of original features. The linearity of our model enables us to ensure a tight fit around training instances by penalizing the model's global tendency to predict normality via its Frobenius norm—a task that is infeasible for most deep learning models. Our method outperforms deep and classical algorithms on tabular datasets and produces competitive results on image datasets, despite not exploiting the locality of images.

1 Introduction

Anomaly detection (AD) entails determining whether a data point comes from the same distribution as a prior set of normal data. Anomaly detection systems are used to discover credit card fraud, detect cyber intrusion attacks and identify cancer cells. Since normal examples are readily available while anomalies tend to be rare in production environments, we consider the semi-supervised or one-class setting where only normal instances are present in the training set. It is important to remark that the outlier space is often much larger than the inlier space, though anomalous observations are uncommon. For instance, the space of normal dog images is sparse in the space of anomalous non-dog images. This discrepancy between data availability and space sizes makes anomaly detection hard, as a model must account for the entire input space despite only having information about a minuscule subspace. Deep learning models generally struggle with this challenge since it is impractical to manage their behavior over the entire input space. Linear models, however, do not face such a difficulty.

To gain control over our model's behavior on the entire input space, we employ a linear transformation as its main component and subsequently penalize its Frobenius norm. However, this transformation has to be performed over an exponentially large feature space for our model to be expressive—an impossible task with full matrices. Thus, we leverage tensor networks as sparse representations of such large matrices. All in all, our model is an end-to-end anomaly detector for general data that produces a normality score via its decision function. Our novel method is showcased on several tabular and image datasets. We attain significant improvements over prior methods on tabular datasets and competitive results on image datasets, despite not exploiting the locality of image pixels.

2 Related Work

Stoudenmire and Schwab [2016] first demonstrated the potential of tensor networks in classification tasks, using the well-known density matrix renormalization group algorithm from computational physics [White, 1992] to train Matrix Product States (MPS) [Schollwöck, 2011, Östlund and Rommer, 1995] as a weight matrix for classifying MNIST digits [LeCun et al., 2010]. Subsequent work has also applied tensor networks to the classification of medical images [Selvan and Dam, 2020] and to regression [Reyes and Stoudenmire, 2020], but investigations into unsupervised and semi-supervised settings are lacking.

The literature on anomaly detection (AD) is vast, so we focus on reviewing previous work in the one-class setting for arbitrary data (i.e. not restricted to images). Kernel-based methods, such as the One-Class SVM (OC-SVM) [Manevitz and Yousef, 2002], learn a tight fit of inliers in an implicit high-dimensional feature space, while the non-distance-based Isolation Forest [Liu et al., 2008] directly distinguishes inliers from outliers based on partitions of feature values. Unfortunately, such classical AD algorithms presume the clustering of normal instances in some feature space and hence suffer from the curse of dimensionality, requiring substantial feature selection to operate on feature-rich, multivariate data [Bengio et al., 2013].

As such, hybrid methods were developed that first learn latent representations using Auto-Encoders (AE) [Xu et al., 2015, Andrews et al., 2016, Seebock et al., 2019] or Deep Belief Networks (DBN) [Erfani et al., 2016], which are then fed to an OC-SVM. End-to-end deep learning models, without explicit AD objectives, have also been devised. Auto-Encoder AD models [Hawkins et al., 2002, Sakurada and Yairi, 2014, Chen et al., 2017] learn an encoding of inliers and subsequently use the reconstruction loss as a decision function. Other AE variants, such as Deep Convolutional Auto-Encoders (DCAE) [Masci et al., 2011, Makhzani and Frey, 2015], have also been studied [Seebock et al., 2019, Richter and Roy, 2017]. Next, generative models learn a probability distribution over inliers and identify anomalous instances as those with low probabilities or those that are difficult to find in their latent spaces (in the case of latent variable models). Generative Adversarial Networks (GANs) [Goodfellow et al., 2014] have been popular in the latter category, with the advent of AnoGAN [Schlegl et al., 2017], a more efficient variant [Zenati et al., 2018] based on BiGANs [Donahue et al., 2016], GANomaly [Akcay et al., 2019] and ADGAN [Deecke et al., 2018].

Deep learning models with objectives that resemble shallow kernel-based AD algorithms have also been explored. Such models train neural networks as explicit feature maps while concurrently finding the tightest decision boundary around the transformed training instances in the output space. Deep SVDD (DSVDD) [Ruff et al., 2018] seeks a minimal-volume hypersphere encapsulating inliers, motivated by the Support Vector Data Description (SVDD) [Tax and Duin, 2004], while One-Class Neural Networks (OC-NN) [Chalapathy et al., 2018] search for a maximum-margin hyperplane separating normal instances from the origin, in a fashion similar to OC-SVM. Contemporary attention has been directed towards self-supervised models, mostly for images [Golan and El-Yaniv, 2018, Gidaris et al., 2018, Hendrycks et al., 2019], with the exception of the more recent GOAD [Bergman and Hoshen, 2020] for general data. These models transform an input point into several altered instances according to a fixed class of rules and train a classifier that predicts, for each altered instance, a score for belonging to its corresponding class of transformation. Outliers are then flagged as points with extreme scores, aggregated over all classes. In particular, GOAD unifies DSVDD and the self-supervised GEOM model [Golan and El-Yaniv, 2018] by defining the anomaly score of each transformed instance as its distance from its class's hypersphere center.

3 Model Description

3.1 Overview

In this section, we introduce our model, which we call the Tensor Network Anomaly Detector (TNAD). As an end-to-end network with an explicit AD objective, TNAD falls into the second-to-last category above, with one crucial caveat: its notion of tightness does not rely on the volume of the decision boundary, which is an inadequate measure in the one-class setting. To illustrate this point, DSVDD and GOAD may find tiny hyperspheres during training yet still fit inliers loosely, because they may map all possible inputs into those hyperspheres—a problem acknowledged by the original authors of DSVDD as "hypersphere collapse" [Ruff et al., 2018]. When outliers are available, one can indeed judge the tightness of a fit by the separation of inliers and outliers with respect to a decision boundary, but in the one-class setting no such points of reference exist, so the tightness of a model's fit on training instances must be gauged relative to its other predictions. As such, we design TNAD to incorporate a canonical measure of its overall tendency to predict normality.

A schematic of TNAD is depicted in Figure 1. A fixed feature map Φ is applied to map inputs onto the surface of a unit hypersphere in a vector space whose dimension is exponential in the number of original features N. The training instances are sparse in this high-dimensional space, which enables the learnt component of our model to be expressive despite being a simple linear transformation P. Upon action by P, normal instances will be mapped close to the surface of a hypersphere in the output space of an arbitrarily chosen radius (set to a fixed value in our experiments), while anomalous instances can be identified as those mapped close to the origin. The decision function of the model with respect to an input x, where a larger value indicates normality, is then

\| P \, \Phi(x) \|^2     (1)
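To make Eq. (1) concrete, here is a minimal numpy sketch of the decision function for a toy case where the feature space is small enough to hold P as a dense matrix; the local embedding, sizes and random P are purely illustrative (the actual model parameterizes P as an MPO, described in Section 3.2):

```python
import numpy as np

def local_embedding(x):
    """Trigonometric local embedding for d = 2: (cos(pi x / 2), sin(pi x / 2))."""
    return np.array([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)])

def feature_map(x):
    """Phi(x): tensor product of the local embeddings, a unit vector of dimension d**N."""
    phi = np.array([1.0])
    for xi in x:
        phi = np.kron(phi, local_embedding(xi))
    return phi

rng = np.random.default_rng(0)
N, out_dim = 8, 16                      # toy sizes: the feature space is 2**8 = 256 dimensional
P = rng.normal(size=(out_dim, 2 ** N))  # dense stand-in for the MPO-parameterized map

def decision_function(x):
    """Larger value indicates normality: squared norm of the transformed feature vector."""
    v = P @ feature_map(x)
    return float(v @ v)

print(decision_function(rng.uniform(size=N)))
```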
Figure 1: Schematic of Tensor Network Anomaly Detector (TNAD)

To accommodate the possible predominance of outliers, we allow the output space to have a smaller exponential scaling with N than the input space, so that P has a large null-space. P can thus be understood informally as a "projection" that annihilates the subspace spanned by outliers. To parameterize P, whose dimensions are exponential in N, we leverage the Matrix Product Operator (MPO) tensor network, which is both memory- and computationally-efficient.

To obtain a tight fit around inliers, we penalize the Frobenius norm of P during training,

\| P \|_F^2 = \sum_{i,j} |P_{ij}|^2     (2)

where P_{ij} are the matrix elements of P with respect to some basis. Since \| P \|_F^2 is the sum of squared singular values of P, it captures the total extent to which the model is likely to deem an instance as normal. Ultimately, such a spectral property reflects the overall behavior of the model, rather than its restricted behavior on the training set.
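A quick numerical check of this identity (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(16, 64))

frob_sq = np.sum(P ** 2)                                  # sum of squared matrix elements
sv_sq = np.sum(np.linalg.svd(P, compute_uv=False) ** 2)   # sum of squared singular values

# The two agree, so penalizing ||P||_F^2 caps the model's total capacity to assign
# large norms (i.e. normality) over the whole input space, not just the training set.
assert np.isclose(frob_sq, sv_sq)
```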

3.2 Matrix Product Operator Model

In this section, the details of TNAD are expounded in tensor network notation, for which the reader is recommended to consult the brief introduction in our supplementary material and the more comprehensive reviews in [Biamonte and Bergholm, 2017, Orús, 2014]. The input space is assumed to be [0, 1]^N for (flattened) grey-scale images and R^N for tabular data, where N is the number of features. Given a predetermined local map φ that sends a single feature to a d-dimensional vector, where d is a parameter known as the physical dimension, an input x = (x_1, ..., x_N) is first passed through a feature map Φ defined by

\Phi(x) = \phi(x_1) \otimes \phi(x_2) \otimes \cdots \otimes \phi(x_N)     (3)

Recall that the resulting feature space is d^N-dimensional and thus a very large vector space. In tensor network notation, the embedding layer is depicted in Figure 2.

Figure 2: TNAD Embedding layer.

The local map φ is chosen to satisfy \| \phi(x) \| = 1 for all x, such that \| \Phi(x) \| = 1 for all inputs, implying that Φ maps all points to the unit hypersphere in the feature space. Two forms of φ were explored. The d-dimensional "trigonometric" embedding is defined as

\phi^{\mathrm{trig}}_j(x) = \sqrt{\binom{d-1}{j-1}} \, \big(\cos\tfrac{\pi x}{2}\big)^{d-j} \big(\sin\tfrac{\pi x}{2}\big)^{j-1}, \quad j = 1, \ldots, d     (4)

Our grey-scale image experiments were conducted with \phi^{\mathrm{trig}} and d = 2, which possesses the following natural interpretation. Since \phi^{\mathrm{trig}}(0) and \phi^{\mathrm{trig}}(1) are the two standard basis vectors of R^2, the set of binary-valued images \{0,1\}^N is mapped to the standard basis of the feature space. Intuitively, the values 0 and 1 correspond to extreme cases of a feature (which here reflects pixel brightness), so they are devised to be orthogonal for maximal separation. Since \langle \phi^{\mathrm{trig}}(0), \phi^{\mathrm{trig}}(1) \rangle = 0, the feature map Φ is highly sensitive to each individual feature—flipping a single pixel value from 0 to 1 leads to an orthogonal vector after Φ. In essence, \{0,1\}^N contains all extreme representatives of the input space, which can be seen to be the images of highest contrast, and is mapped by Φ to the standard basis of the feature space for maximal separation. The squared F-norm of our subsequent linear transformation P then obeys

\| P \|_F^2 = \sum_{b \in \{0,1\}^N} \| P \, \Phi(b) \|^2     (5)

Recalling \| P \, \Phi(x) \|^2 as the value of TNAD's decision function on an input x, \| P \|_F^2 is thus conferred the meaning of the total degree of normality predicted by the model on these extreme representatives—apt, since images of the highest contrast should be the most distinguishable. Unfortunately, such an interpretation does not extend beyond d = 2, so we also considered the d-dimensional "fourier" embedding on tabular data, defined component-wise (indexing from 0) as

\phi^{\mathrm{fourier}}_j(x) = \frac{1}{d} \sum_{k=0}^{d-1} e^{2\pi i k (x - j/d)}, \quad j = 0, \ldots, d-1     (6)

This map has a period of 1 and the following property: on the grid \{0, 1/d, \ldots, (d-1)/d\}, the k-th value is mapped to the k-th standard basis vector of C^d. Thus, these grid points and their periodic equivalents are deemed extreme cases, and a similar analysis follows as before. Ultimately, both versions of φ segregate points that are close in the input space by mapping inputs into the exponentially large feature space, buttressing the subsequent linear transformation P.
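A sketch of both local embeddings and the overall feature map in numpy. The trigonometric form follows Eq. (4) as reconstructed above, and the fourier form shown is one realization consistent with the stated properties (period one, unit norm, grid points mapped to standard basis vectors), so treat the exact expressions as illustrative:

```python
import numpy as np
from math import comb

def phi_trig(x, d=2):
    """d-dimensional trigonometric embedding; for d = 2 this is (cos(pi x/2), sin(pi x/2))."""
    c, s = np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)
    return np.array([np.sqrt(comb(d - 1, j)) * c ** (d - 1 - j) * s ** j for j in range(d)])

def phi_fourier(x, d):
    """d-dimensional 'fourier' embedding: period 1, unit norm, x = k/d maps to the k-th basis vector."""
    jj = np.arange(d)[:, None]                 # output component index
    kk = np.arange(d)[None, :]                 # fourier mode index
    return np.exp(2.0j * np.pi * kk * (x - jj / d)).sum(axis=1) / d

def feature_map(x, phi):
    """Phi(x) = phi(x_1) ⊗ phi(x_2) ⊗ ... ⊗ phi(x_N), a unit vector of dimension d**N."""
    out = np.array([1.0])
    for xi in x:
        out = np.kron(out, phi(xi))
    return out

x = np.array([0.0, 0.4, 1.0])
print(np.linalg.norm(feature_map(x, phi_trig)))                       # -> 1.0
print(np.linalg.norm(feature_map(x, lambda t: phi_fourier(t, 4))))    # -> 1.0
print(np.round(phi_fourier(0.25, 4), 6))                              # x = 1/4 -> standard basis vector e_1
```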

After the feature map, we learn a linear map P from the d^N-dimensional feature space to an output space of dimension d^{\lceil N/s \rceil}, where s is a parameter referred to as the spacing. Our parameterization of P in terms of rank-3 and rank-4 tensors is the variant of the Matrix Product Operator (MPO) tensor network shown below.

Figure 3: Matrix product operator parameterization for P.

The modified MPO only has an outgoing red leg every s nodes, beginning from the first. The red legs again have dimension d, while the gold legs have dimension D, another parameter known as the bond dimension. Intuitively, the gold legs are responsible for capturing correlations between features, for which a larger value of D is desirable. In explicit tensor indices,

(7)

where we have adopted Einstein's summation convention (see the supplement) and the node tensors of Figure 3 are the parameterizing low-rank tensors. TNAD's output, \| P \, \Phi(x) \|^2, can then be computed as shown below.

Figure 4: Squared norm of transformed vector.
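A minimal numpy sketch of this contraction for the simplest case of spacing s = 1 (one output leg per site), computing \| P \Phi(x) \|^2 site by site with transfer tensors; shapes and initialization are illustrative, and the real implementation uses the TensorNetwork library:

```python
import numpy as np

def squared_norm_P_phi(cores, phis):
    """||P Phi(x)||^2 for an MPO with one output leg per site (spacing s = 1).

    cores[k] has shape (Dl, Dr, d_in, d_out); the boundary bond dimensions are 1.
    phis[k] is the local embedding phi(x_k), shape (d_in,).
    """
    env = np.ones((1, 1))                           # running environment over the doubled bond index
    for A, p in zip(cores, phis):
        B = np.einsum('lrio,i->lro', A, p)          # absorb the input feature
        E = np.einsum('lro,mso->lmrs', B, B)        # contract the output leg with its copy
        env = np.einsum('lm,lmrs->rs', env, E)      # sweep the environment to the right
    return float(env[0, 0])

# Toy example: N = 6 sites, d = 2, bond dimension D = 3.
rng = np.random.default_rng(2)
N, d, D = 6, 2, 3
dims = [1] + [D] * (N - 1) + [1]
cores = [rng.normal(size=(dims[k], dims[k + 1], d, d)) * 0.5 for k in range(N)]
phis = [np.array([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)]) for x in rng.uniform(size=N)]
print(squared_norm_P_phi(cores, phis))
```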

Finally, the following tensor network yields the squared F-norm of P, used as a training penalty.

Figure 5: Squared F-norm of P.
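The same transfer-matrix idea computes the penalty itself; a small numpy sketch, again for an MPO with an output leg on every site:

```python
import numpy as np

def mpo_frobenius_sq(cores):
    """||P||_F^2 of an MPO: contract each core with a copy of itself over both physical legs,
    then sweep the resulting transfer matrices from left to right."""
    env = np.ones((1, 1))
    for A in cores:                              # A has shape (Dl, Dr, d_in, d_out)
        E = np.einsum('lrio,msio->lmrs', A, A)   # trace out the input and output legs
        env = np.einsum('lm,lmrs->rs', env, E)
    return float(env[0, 0])

rng = np.random.default_rng(3)
dims = [1, 3, 3, 3, 1]                           # open-boundary bond dimensions
cores = [rng.normal(size=(dims[k], dims[k + 1], 2, 2)) * 0.5 for k in range(4)]
print(mpo_frobenius_sq(cores))
```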

Weaving the above together, our overall loss function over a batch of B instances is given by

(8)

where α is a hyperparameter that controls the trade-off between TNAD's fit around training points and its overall tendency to predict normality. In words, P only sees normal instances during training, which it tries to map to vectors on a hypersphere of fixed radius, but it is simultaneously deterred from mapping other, unseen instances to vectors of non-zero norm by the F-norm penalty. Logarithms are taken to stabilize the optimization by batch gradient descent, since the value of a large tensor network can fluctuate by a few orders of magnitude with each descent step even with a tiny learning rate. Finally, the ReLU function is applied to the F-norm penalty to avoid the trivial solution P = 0.
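Since Eq. (8) is not reproduced above, the following is only a plausible sketch of a loss with the ingredients the text describes: a log-scale fit term pulling \| P \Phi(x) \|^2 toward the target radius, and a ReLU-capped log-Frobenius penalty weighted by α. The exact functional form in the paper may differ.

```python
import numpy as np

def tnad_loss_sketch(sq_norms, frob_sq, alpha, radius=1.0):
    """Hypothetical loss with the described ingredients (not the paper's exact Eq. (8)).

    sq_norms : array of ||P Phi(x_i)||^2 over a batch of normal instances
    frob_sq  : ||P||_F^2 of the current model
    alpha    : trade-off between the fit term and the global normality penalty
    """
    # Logs stabilise quantities that can span several orders of magnitude.
    fit = np.mean((np.log(sq_norms) - np.log(radius ** 2)) ** 2)
    # ReLU caps the penalty at zero, so shrinking P toward the trivial solution P = 0
    # (log ||P||_F^2 -> -inf) brings no further benefit.
    penalty = np.maximum(np.log(frob_sq), 0.0)
    return fit + alpha * penalty
```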

3.3 Contraction Order and Complexity

Now that our tensor network has been identified, the final ingredient is determining an efficient order for multiplying tensors—a process known as contraction—to compute \| P \Phi(x) \|^2 and \| P \|_F^2. Though different contraction schemes lead to the same result, they may have vastly different time complexities; the simplest example is the quantity M M v for some matrix M and vector v, where the bracketing (M M) v involves an expensive matrix-matrix product while M (M v) bypasses it. The time complexity of a contraction between two nodes can be read off a tensor network diagram as the product of the dimensions of all legs connected to the two nodes, without double-counting. Though searching for the optimal contraction order of a general network is NP-hard [Chi-Chung et al., 1997], the MPO has been extensively studied and an efficient contraction order that scales linearly with N is known—despite the MPO being a linear transformation between spaces with dimensions exponential in N. The initial steps in computing \| P \Phi(x) \|^2 are vertical contractions of the black legs, followed by right-to-left horizontal contractions along segments between consecutive red legs.

Figure 6: Initial step in computing \| P \Phi(x) \|^2.

In practice, only the bottom half of the network is contracted before it is duplicated and attached to itself; notably, this process can also easily be parallelized. At this juncture, observe that both the network for \| P \|_F^2 and the resulting network for \| P \Phi(x) \|^2 are of the form in Figure 7, which can be computed efficiently by repeated zig-zag contractions. The overall time complexities of computing \| P \Phi(x) \|^2 and \| P \|_F^2 are both linear in N and polynomial in d and D, and only the former is needed during prediction. Meanwhile, the overall space complexity of TNAD is likewise linear in N.

Figure 7: Zig-zag contraction that is repeated till completion.
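The bracketing example from above, checked numerically; np.einsum_path also reports the FLOP estimate of the contraction order it picks (sizes are illustrative):

```python
import numpy as np

n = 1000
rng = np.random.default_rng(4)
M = rng.normal(size=(n, n))
v = rng.normal(size=n)

# Same result, very different cost:
#   (M @ M) @ v  first forms an n x n matrix product -> O(n^3) multiplications
#   M @ (M @ v)  only ever forms length-n vectors    -> O(n^2) multiplications
assert np.allclose((M @ M) @ v, M @ (M @ v))

# np.einsum_path picks the cheap order automatically and reports its FLOP estimate.
print(np.einsum_path('ij,jk,k->i', M, M, v, optimize='optimal')[1])
```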

4 Experiments

The effectiveness of TNAD as a general one-class anomaly detector is verified on both image and tabular datasets. The Area Under the Receiver Operating Characteristic curve (AUROC) is adopted as a threshold-agnostic metric for all experiments. TNAD was implemented with the TensorNetwork library [Roberts et al., 2019] with the JAX backend and trained with the ADAM optimizer in its default settings.

General Baselines: The general anomaly detection baselines evaluated are: One-Class SVM (OC-SVM) [Manevitz and Yousef, 2002], Isolation Forest (IF) [Liu et al., 2008], and GOAD [Bergman and Hoshen, 2020]. OC-SVM and IF are traditional anomaly detection algorithms known to perform well on general data, while GOAD is a recent, state-of-the-art self-supervised algorithm with different transformation schemes for image and tabular data. OC-SVM and IF were taken off the shelf from the Scikit-Learn toolkit [Pedregosa et al., 2011], while GOAD experiments were run with the official code of [Bergman and Hoshen, 2020]. For all OC-SVM experiments, the RBF kernel was used and a grid sweep was conducted for the kernel coefficient and the margin parameter according to the test set performance in hindsight—providing OC-SVM a supervised advantage. For all IF experiments, the number of trees and the sub-sampling size were set to the values recommended by the original paper. GOAD parameters are reported in the specific subsections.
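For reference, the two classical baselines as used from Scikit-Learn; the Isolation Forest settings follow its original paper's recommendation (100 trees, sub-sample size 256), while the OC-SVM gamma and nu shown here are placeholders for the values swept in the grid search:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)
X_train = rng.normal(size=(500, 13))                         # normal instances only
X_test = np.vstack([rng.normal(size=(50, 13)),
                    rng.normal(loc=4.0, size=(50, 13))])     # inliers plus synthetic outliers

# Isolation Forest with 100 trees and a sub-sampling size of 256.
iforest = IsolationForest(n_estimators=100, max_samples=256, random_state=0).fit(X_train)

# One-Class SVM with an RBF kernel; gamma and nu are the two parameters of the grid sweep.
ocsvm = OneClassSVM(kernel='rbf', gamma=0.1, nu=0.1).fit(X_train)

# Higher score -> more normal, matching the AUROC convention used in the experiments.
print(iforest.score_samples(X_test)[:3], ocsvm.score_samples(X_test)[:3])
```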

4.1 Image Experiments

Datasets: Our image experiments were conducted on the MNIST [LeCun et al., 2010] and Fashion-MNIST [Xiao et al., 2017] datasets, each comprising 60,000 training and 10,000 test examples of 28x28 grey-scale images belonging to ten classes. In each set-up, one particular class was deemed the inliers and all original training instances of that class were retrieved to form the new training set, containing roughly 6,000 examples. The trained models were then evaluated on the untouched test set.

Additional Image Baselines: To illustrate the strengths of our approach, we include further comparisons to Deep SVDD (DSVDD) [Ruff et al., 2018] and ADGAN [Deecke et al., 2018], both of which employ convolutional networks. DSVDD experiments were performed with the original code, while ADGAN results are reported from [Deecke et al., 2018, Golan and El-Yaniv, 2018].

Preprocessing: For all models besides DSVDD, the pixel values of the grey-scale images were divided by 255 to obtain floats in the range [0, 1]. Due to the computational complexity of TNAD, a max-pool operation was also performed, only for our model, to reduce the size of the images. In the cases of TNAD, OC-SVM and IF, the images were flattened before they were passed to these models—which thus do not exploit the inherent locality of the images, in contrast to the convolutional architectures employed by all other models. For GOAD, the images were zero-padded to size 32x32 to be compatible with the official implementation designed for CIFAR-10. Finally, for DSVDD, the images were preprocessed with global contrast normalization and subsequent min-max scaling to the interval [0, 1], following the original paper.
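A sketch of the TNAD image pipeline just described, in plain numpy; the pooling window and stride are not specified above, so the 2x2 / stride-2 choice here is only an assumption for illustration:

```python
import numpy as np

def preprocess_for_tnad(img_uint8, pool=2):
    """Scale pixels to [0, 1], apply a non-overlapping max pool, and flatten row-major."""
    x = img_uint8.astype(np.float32) / 255.0
    h, w = x.shape
    x = x.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))   # max pool (window = stride = pool)
    return x.reshape(-1)                                               # flattening discards pixel locality

img = np.random.default_rng(8).integers(0, 256, size=(28, 28), dtype=np.uint8)
print(preprocess_for_tnad(img).shape)   # (196,) with the assumed 2x2 pooling of a 28x28 image
```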

Baseline Parameters: The convolutional architectures and hyper-parameters of all deep baselines (GOAD, DSVDD, ADGAN) follow their original work. GOAD was run with its original margin parameter and the geometric transformations of GEOM [Golan and El-Yaniv, 2018], involving flips, translations and rotations. DSVDD was run with its original hyperparameters and the two-phased training scheme described in its paper.

TNAD Parameters: TNAD was run with the trigonometric embedding and fixed choices of physical dimension, bond dimension, spacing, number of sites and margin strength. All tensors were initialized via a normal distribution with a small standard deviation. As an aside, TNAD is sensitive to initialization for large N, since it successively multiplies many tensors, causing the final result to vanish or explode if each tensor is too small or too big—we found a suitable standard deviation empirically, and the final performance of TNAD did not vary significantly with the standard deviation once it was initialized in a reasonable regime. As a further precaution, TNAD was first trained for a few "cold-start" epochs with a reduced learning rate to circumvent unfortunate initializations, followed by further epochs with an initial learning rate that decayed exponentially. A small batch size was used due to memory constraints. Finally, since only the logs of tensor network quantities are desired, we employ the trick of rescaling tensors by their element of largest magnitude during contractions and subsequently adding back the log of the rescaling factor to stabilize computations.
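A sketch of the rescaling trick: each intermediate result is divided by its largest-magnitude element and the log of that factor is accumulated, so only the log of the (possibly enormous or vanishing) tensor network value is ever materialized; the toy transfer matrices here are illustrative:

```python
import numpy as np

def stabilized_log_value(transfer_mats, v):
    """Log of the scalar obtained by pushing v through a chain of (positive) transfer matrices,
    rescaling at every step so intermediate values never overflow or underflow."""
    log_scale = 0.0
    for E in transfer_mats:
        v = E @ v
        scale = np.max(np.abs(v))      # divide out the largest-magnitude element ...
        v = v / scale
        log_scale += np.log(scale)     # ... and add its log back at the end
    return log_scale + np.log(v.sum())

rng = np.random.default_rng(9)
mats = [np.abs(rng.normal(size=(3, 3))) * 1e80 for _ in range(10)]   # the raw product would overflow float64
print(stabilized_log_value(mats, np.ones(3)))
```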

Results and Discussion: Our experimental results on image datasets are presented in Table 1. The mean AUROC, along with the standard error, across ten successful trials is reported for each model and each class chosen as the inliers. Occasionally, GOAD experienced "hypersphere collapse" while TNAD encountered numerical instabilities which led to NaN outputs—these trials were removed. Ultimately, TNAD produces consistently strong results and notably emerges second out of all evaluated models on MNIST, surpassing all convolutional architectures besides GOAD despite not exploiting the innate structure of images. Furthermore, TNAD shows the lowest variation in performance other than the deterministic OC-SVM, possibly attributable to its linearity. OC-SVM exhibits comparably strong performance, though it was admittedly optimized in hindsight. Attaining the highest AUROC on most classes, GOAD undeniably triumphs over all other evaluated models on images. However, GOAD's performance dip on the MNIST digits that are largely unaffected by the horizontal flips and rotations used in its transformations suggests that its success relies on identifying transformations that leverage the underlying structure of the data. Indeed, its authors [Bergman and Hoshen, 2020] acknowledge that the random affine transformations used in GOAD's tabular experiments degraded its performance on image datasets. As such, TNAD's performance is especially encouraging, considering its ignorance of the inputs being images.

Dataset SVM IF GOAD DSVDD ADGAN TNAD
MNIST
0
1
2
3
4
5
6
7
8
9
avg
Fashion-MNIST
0
1
2
3
4
5
6
7
8
9
avg
In each row, the listed class is taken as the normal class while all other classes are anomalies. The top two results in each experiment are highlighted in bold. OC-SVM, abbreviated as SVM above, did not show variations in performance once it had converged, so no standard errors are reported. ADGAN's results were borrowed from [Deecke et al., 2018, Golan and El-Yaniv, 2018], which did not include error bars.
Table 1: Mean AUROC scores (in %) and standard errors on MNIST and Fashion-MNIST.

4.2 Tabular Experiments

Datasets: We evaluate TNAD and the other general baselines on five real-world ODDS [Rayana, 2016] datasets derived from the UCI repository [Dua and Graff, 2017]: Wine, Glass, Thyroid, Satellite, and Forest Cover. These were selected to exhibit a variety of dataset sizes, feature counts and anomalous proportions—detailed information regarding them is presented in Table 2. Following the procedure of [Bergman and Hoshen, 2020], all models were trained on half of the normal instances and evaluated on the other half plus the anomalies.

Preprocessing: For all models and datasets, the data was normalized such that the training set had zero mean and unit variance in each feature.
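In Scikit-Learn terms (synthetic arrays stand in for the real splits), the normalization statistics are computed on the normal training half and applied to both halves:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(10)
X_train_normal = rng.normal(loc=5.0, scale=2.0, size=(100, 6))   # half of the normal instances
X_test_mixed = rng.normal(loc=5.0, scale=2.0, size=(120, 6))     # other half plus the anomalies

scaler = StandardScaler().fit(X_train_normal)   # zero mean and unit variance per feature on the training set
X_train = scaler.transform(X_train_normal)
X_test = scaler.transform(X_test_mixed)
```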

Baseline Parameters: GOAD employs random affine transformations of a fixed output dimension for self-supervision on tabular data and trains a fully-connected classifier with a fixed hidden size and leaky-ReLU activations. We adhere to the hyperparameter choices in the original paper, using separate settings for the number of transformations and training epochs on the large-scale Forest Cover dataset and on the other, smaller-scale datasets. Finally, we also train DAGMM [Zong et al., 2018] using its original Thyroid architecture and report the best results.

TNAD Parameters: The dimensions of the input and output spaces, which depend on the parameters N, d and s, are crucial to TNAD. As the number of features N varies across datasets, we choose d and s according to two heuristics: the first imposes an appropriate nullspace for P, while the second keeps the dimension large enough to exploit the prowess of tensor networks yet small enough to prevent the model from overfitting, with a preference for smaller dimensions on smaller datasets. On the smallest datasets, Wine and Glass, more conservative settings were used to avoid overfitting. The bond dimension was fixed across datasets, and the same two-phased training scheme as before was adopted for the small tensor networks used on Wine and Thyroid. For the other, larger models, lower learning rates with exponential decay were used to stabilize training. A fixed batch size was used for all datasets besides Forest Cover. The trigonometric embedding of Eq. (4) is used by default. A summary of TNAD parameters is provided in Table 2.

TNAD Parameters on Glass: The default parameters did not work well on Glass, so we devised a separate configuration. To fully leverage the orthogonality properties of the fourier embedding, we choose a large physical dimension d and in turn set the spacing s to maintain the output dimension in the desired range.

Results and Discussion: Table 3 shows the mean AUROC, with standard errors, from our experiments. Due to the large variance caused by its stochastic nature, GOAD was run for a different number of trials on the small-scale datasets than on Forest Cover; all other models were run for a fixed number of trials. TNAD is the best performer across all datasets. Its drastic improvements over the best baseline on the smaller datasets Wine and Glass lend credence to the ability of the F-norm penalty to ensure a tight fit around scarce inliers. GOAD's poorer performance on Satellite and Forest Cover supports the expectation that affine transformations may not suit general data. All in all, we believe TNAD to be the best of the evaluated AD models when no prior domain knowledge of the underlying data is given.

Dataset     # Train   # Test (normal)   # Test (anomalous)   # Features   TNAD Parameters
Wine        59        60                10                   13           4    1   2.9e4    0.3   2e-3
Glass*      102       103               9                    9            16   2   1.0e6    0.3   5e-4
Thyroid     1839      1840              93                   6            6    1   4.7e4    0.1   2e-3
Satellite   2199      2200              2036                 36           4    2   6.9e10   0.1   5e-4
Forest      141650    141651            2747                 10           8    1   1.1e9    0.1   5e-4
* Only for the Glass dataset, the fourier embedding was used, while the trigonometric embedding was adopted for the others.
Table 2: Information about ODDS datasets, sorted by size, and TNAD parameters used.
Dataset OC-SVM IF GOAD DAGMM TNAD
Wine
Glass
Thyroid
Satellite
Forest
The top two results in each experiment are highlighted in bold. OC-SVM did not show variations in performance once it had converged, so no standard errors are reported.
Table 3: Mean AUROC scores (in %) and standard errors on ODDS datasets.

5 Conclusion

In this paper, we have introduced TNAD as an adept anomaly detection model for general data. To the best of our knowledge, it is the first instance of a tensor network model outperforming classical and deep methods. It should be remarked that, specifically for images and videos, there exist more appropriate tensor networks, such as PEPS [Verstraete and Cirac, 2004], that capture higher-dimensional correlations. Ultimately, we hope to set a paradigm of using tensor networks as "wide" models and spur future work.

6 Statement of Broader Impact

Anomaly detection has many benefits in fraud prevention, network security, health screening and crime investigation—the last two have been explicitly demonstrated by our successes on the Thyroid and Glass datasets. That said, anomaly detection also has applications in areas such as surveillance monitoring, which raise issues such as individual privacy. Furthermore, what is considered anomalous may be a reflection of our societal norms, so caution must be taken to ensure that such technology does not propagate inherent biases in our society.

References

  • S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon (2019) GANomaly: semi-supervised anomaly detection via adversarial training. In Computer Vision – ACCV 2018, pp. 622–637. Cited by: §2.
  • J. Andrews, E. Morton, and L. Griffin (2016) Detecting anomalous data using auto-encoders. International Journal of Machine Learning and Computing 6, pp. 21. Cited by: §2.
  • Y. Bengio, A. Courville, and P. Vincent (2013) Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8), pp. 1798–1828. Cited by: §2.
  • L. Bergman and Y. Hoshen (2020) Classification-based anomaly detection for general data. In International Conference on Learning Representations, Cited by: §2, §4.1, §4.2, §4.
  • J. Biamonte and V. Bergholm (2017) Tensor networks in a nutshell. External Links: 1708.00006 Cited by: §3.2.
  • R. Chalapathy, A. K. Menon, and S. Chawla (2018) Anomaly detection using one-class neural networks. External Links: 1802.06360 Cited by: §2.
  • J. Chen, S. Sathe, C. C. Aggarwal, and D. S. Turaga (2017) Outlier detection with autoencoder ensembles. In SDM, Cited by: §2.
  • L. Chi-Chung, P. Sadayappan, and R. Wenger (1997) On optimizing a class of multi-dimensional loops with reduction for parallel execution. Parallel Processing Letters 07 (02), pp. 157–168. Cited by: §3.3.
  • L. Deecke, R. Vandermeulen, L. Ruff, S. Mandt, and M. Kloft (2018) Image anomaly detection with generative adversarial networks. Cited by: §2, §4.1, Table 1.
  • J. Donahue, P. Krähenbühl, and T. Darrell (2016) Adversarial feature learning. External Links: 1605.09782 Cited by: §2.
  • D. Dua and C. Graff (2017) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. Cited by: §4.2.
  • S. M. Erfani, S. Rajasegarar, S. Karunasekera, and C. Leckie (2016) High-dimensional and large-scale anomaly detection using a linear one-class svm with deep learning. Pattern Recognition 58, pp. 121 – 134. Cited by: §2.
  • S. Gidaris, P. Singh, and N. Komodakis (2018) Unsupervised representation learning by predicting image rotations. External Links: 1803.07728 Cited by: §2.
  • I. Golan and R. El-Yaniv (2018) Deep anomaly detection using geometric transformations. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pp. 9781–9791. Cited by: §2, §4.1, §4.1, Table 1.
  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, pp. 2672–2680. Cited by: §2.
  • S. Hawkins, H. He, G. J. Williams, and R. A. Baxter (2002) Outlier detection using replicator neural networks. In Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery, pp. 170–180. Cited by: §2.
  • D. Hendrycks, M. Mazeika, S. Kadavath, and D. Song (2019) Using self-supervised learning can improve model robustness and uncertainty. In Advances in Neural Information Processing Systems 32, pp. 15663–15674. Cited by: §2.
  • Y. LeCun, C. Cortes, and C. Burges (2010) MNIST handwritten digit database. ATT Labs 2. Cited by: §2, §4.1.
  • F. T. Liu, K. M. Ting, and Z. Zhou (2008) Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422. Cited by: §2, §4.
  • A. Makhzani and B. Frey (2015) Winner-take-all autoencoders. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS’15, pp. 2791–2799. Cited by: §2.
  • L. M. Manevitz and M. Yousef (2002) One-class svms for document classification. J. Mach. Learn. Res. 2, pp. 139–154. Cited by: §2, §4.
  • J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber (2011) Stacked convolutional auto-encoders for hierarchical feature extraction. In Proceedings of the 21st International Conference on Artificial Neural Networks - Volume Part I, pp. 52–59. Cited by: §2.
  • R. Orús (2014) A practical introduction to tensor networks: matrix product states and projected entangled pair states. Annals of Physics 349, pp. 117–158. Cited by: §3.2.
  • S. Östlund and S. Rommer (1995) Thermodynamic limit of density matrix renormalization. Phys. Rev. Lett. 75, pp. 3537–3540. Cited by: §2.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830. Cited by: §4.
  • S. Rayana (2016) ODDS library. Stony Brook University, Department of Computer Sciences. Cited by: §4.2.
  • J. Reyes and M. Stoudenmire (2020) A multi-scale tensor network architecture for classification and regression. External Links: 2001.08286 Cited by: §2.
  • C. Richter and N. Roy (2017) Safe visual navigation via deep learning and novelty detection. In Robotics: Science and Systems. Cited by: §2.
  • C. Roberts, A. Milsted, M. Ganahl, A. Zalcman, B. Fontaine, Y. Zou, J. Hidary, G. Vidal, and S. Leichenauer (2019) TensorNetwork: a library for physics and machine learning. External Links: 1905.01330 Cited by: §4.
  • L. Ruff, R. Vandermeulen, N. Goernitz, L. Deecke, S. A. Siddiqui, A. Binder, E. Müller, and M. Kloft (2018) Deep one-class classification. In Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 80, pp. 4393–4402. Cited by: §2.
  • M. Sakurada and T. Yairi (2014) Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, pp. 4–11. Cited by: §2.
  • T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In IPMI, Cited by: §2.
  • U. Schollwöck (2011) The density-matrix renormalization group in the age of matrix product states. Annals of Physics 326 (1), pp. 96–192. Cited by: §2.
  • P. Seebock, S. M. Waldstein, S. Klimscha, H. Bogunovic, T. Schlegl, B. S. Gerendas, R. Donner, U. Schmidt-Erfurth, and G. Langs (2019) Unsupervised identification of disease marker candidates in retinal oct imaging data. IEEE Transactions on Medical Imaging 38 (4), pp. 1037–1047. Cited by: §2.
  • R. Selvan and E. B. Dam (2020) Tensor networks for medical image classification. External Links: 2004.10076 Cited by: §2.
  • E. M. Stoudenmire and D. J. Schwab (2016) Supervised learning with quantum-inspired tensor networks. External Links: 1605.05775 Cited by: §2.
  • D. M. J. Tax and R. P. W. Duin (2004) Support vector data description. Mach. Learn. 54 (1), pp. 45–66. Cited by: §2, §3.1, §4.1.
  • F. Verstraete and J. I. Cirac (2004) Renormalization algorithms for quantum-many body systems in two and higher dimensions. External Links: cond-mat/0407066 Cited by: §5.
  • S. R. White (1992) Density matrix formulation for quantum renormalization groups. Phys. Rev. Lett. 69, pp. 2863–2866. Cited by: §2.
  • H. Xiao, K. Rasul, and R. Vollgraf (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. External Links: 1708.07747 Cited by: §4.1.
  • D. Xu, E. Ricci, Y. Yan, J. Song, and N. Sebe (2015) Learning deep representations of appearance and motion for anomalous event detection. External Links: 1510.01553 Cited by: §2.
  • H. Zenati, C. S. Foo, B. Lecouat, G. Manek, and V. R. Chandrasekhar (2018) Efficient gan-based anomaly detection. External Links: 1802.06222 Cited by: §2.
  • B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen (2018) Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In International Conference on Learning Representations. Cited by: §4.2.