Anomaly detection with localization (AD) is a growing area of research in computer vision with many practical applications: industrial inspection, road traffic monitoring, and medical diagnostics. However, the common supervised AD setting is not viable in practical applications for several reasons. First, it requires labeled data, which is costly to obtain. Second, anomalies are usually rare long-tail examples and have a low probability of being acquired by sensors. Lastly, consistent labeling of anomalies is subjective and requires extensive domain expertise, as illustrated in Figure 1 with industrial cable defects.
With these limitations of supervised AD, a more appealing approach is to collect only unlabeled anomaly-free images for the train dataset, as in Figure 1 (top row). Then, any deviation from the anomaly-free images is classified as an anomaly. Such a data setup with a low rate of anomalies is generally considered to be unsupervised. Hence, the AD task can be reformulated as out-of-distribution detection (OOD) with the AD objective.
While OOD detection for low-dimensional industrial sensors (power-line or acoustic) can be accomplished using a common k-nearest-neighbor search or more advanced clustering methods, it is less trivial for high-resolution images. Recently, convolutional neural networks (CNNs) have gained popularity in extracting semantic information from images into downsampled feature maps. Though feature extraction using CNNs has relatively low complexity, the post-processing of feature maps in state-of-the-art unsupervised AD methods is far from real-time.
To address this complexity drawback, we propose the CFLOW-AD model, which is based on conditional normalizing flows. CFLOW-AD is agnostic to feature map spatial dimensions similar to CNNs, which leads to higher accuracy metrics as well as lower computational and memory requirements. We present the main idea behind our approach in a toy OOD detector example in Figure 1. A distribution of anomaly-free image patches is learned by the AD model. Our translation-equivariant model is trained to transform the original distribution with density $p_X(x)$ into a Gaussian distribution with density $p_Z(z)$. Finally, this model separates in-distribution patches from out-of-distribution patches using a threshold $\tau$ computed as the Euclidean distance from the distribution mean.
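The toy detector can be sketched in a few lines. The snippet below is a minimal numpy illustration with synthetic 2D "patches" (all names and numbers are illustrative): an affine whitening transform plays the role of the simplest invertible map to a standard Gaussian, and the Euclidean distance from the Gaussian mean is thresholded at $\tau$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Anomaly-free "patches": correlated 2D features from the in-distribution density p_X.
mean_true = np.array([2.0, -1.0])
cov_true = np.array([[2.0, 0.8], [0.8, 1.0]])
train = rng.multivariate_normal(mean_true, cov_true, size=5000)

# Fit an affine whitening transform z = L^{-1}(x - mu): the simplest invertible
# "flow" that maps p_X to a standard Gaussian p_Z.
mu = train.mean(axis=0)
L = np.linalg.cholesky(np.cov(train, rowvar=False))

def to_latent(x):
    return np.linalg.solve(L, (x - mu).T).T

# Score = Euclidean distance from the Gaussian mean in latent space;
# threshold tau is a quantile of the anomaly-free scores.
train_scores = np.linalg.norm(to_latent(train), axis=1)
tau = np.quantile(train_scores, 0.99)

in_dist = rng.multivariate_normal(mean_true, cov_true, size=1000)
out_dist = rng.multivariate_normal(mean_true + 6.0, cov_true, size=1000)
print((np.linalg.norm(to_latent(in_dist), axis=1) <= tau).mean())   # mostly accepted
print((np.linalg.norm(to_latent(out_dist), axis=1) <= tau).mean())  # mostly rejected
```

A learned flow replaces the affine map in CFLOW-AD, but the acceptance rule is the same scalar threshold on distance in the Gaussian latent space.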
2 Related work
We review models (for a comprehensive review of the existing AD methods we refer readers to the Ruff et al. and Pang et al. surveys) that employ the data setup from Figure 1 and provide experimental results for the popular MVTec dataset with factory defects or the Shanghai Tech Campus (STC) dataset with surveillance camera videos. We highlight the research related to the more challenging task of pixel-level anomaly localization (segmentation) rather than the simpler image-level anomaly detection.
One early approach proposes to use CNN feature extractors followed by principal component analysis and k-means clustering for AD. Its feature extractor is a ResNet-18 pretrained on the large-scale ImageNet dataset. Similarly, SPADE employs a Wide-ResNet-50 with multi-scale pyramid pooling that is followed by k-nearest-neighbor clustering. Unfortunately, clustering is slow at test time with high-dimensional data. Thus, parallel convolutional methods are preferred in real-time systems.
In contrast, generative models learn the distribution of anomaly-free data and, therefore, are able to estimate a proxy metric for anomaly scores even for unseen images with anomalies. Recent models employ generative adversarial networks (GANs) [35, 36] and variational autoencoders (VAEs) [3, 38]. Fully-generative models [35, 36, 3, 38] are directly applied to images in order to estimate a pixel-level probability density and compute per-pixel reconstruction errors as anomaly-score proxies. These fully-generative models are unable to estimate the exact data likelihoods [6, 24] and do not perform better than the traditional methods [25, 7] according to the MVTec survey. Recent works [34, 15] show that these models tend to capture only low-level correlations instead of relevant semantic information. To overcome the latter drawback, the hybrid DFR model uses a pretrained feature extractor with multi-scale pyramid pooling followed by a convolutional autoencoder (CAE). However, the DFR model is unable to estimate the exact likelihoods.
Another line of research employs a student-teacher type of framework [5, 33, 41]. The teacher is a pretrained feature extractor and the student is trained to estimate a scoring function for AD. Unfortunately, such frameworks underperform compared to state-of-the-art models.
Patch SVDD and CutPaste introduce self-supervised pretraining schemes for AD. Moreover, Patch SVDD proposes a novel method to combine multi-scale scoring masks into a final anomaly map. Unlike the nearest-neighbor search in Patch SVDD, CutPaste estimates anomaly scores using an efficient Gaussian density estimator. While self-supervised pretraining can be helpful in uncommon data domains, Schirrmeister et al. argue that large natural-image datasets such as ImageNet can be more representative for pretraining compared to small application-specific datasets such as industrial MVTec.
The state-of-the-art PaDiM proposes a surprisingly simple yet effective approach for anomaly localization. Similarly to [37, 7, 42], this approach relies on an ImageNet-pretrained feature extractor with multi-scale pyramid pooling. However, instead of slow test-time clustering or nearest-neighbor search, PaDiM uses the well-known Mahalanobis distance metric as an anomaly score. The metric parameters are estimated for each feature vector from the pooled feature maps. PaDiM was inspired by Rippel et al., who first advocated the use of this measure for anomaly detection without localization.
DifferNet uses a promising class of generative models called normalizing flows (NFLOWs) for image-level AD. The main advantage of NFLOW models is the ability to estimate the exact likelihoods for OOD, compared to other generative models [35, 36, 3, 38, 37]. In this paper, we extend the DifferNet approach to the pixel-level anomaly localization task using our CFLOW-AD model. In contrast to the RealNVP architecture with global average pooling used in DifferNet, we propose to use conditional normalizing flows to make CFLOW-AD suitable for low-complexity processing of multi-scale feature maps in the localization task. We develop CFLOW-AD with the following contributions:
Our theoretical analysis shows why the multivariate Gaussian assumption is a justified prior in previous models and why the more general NFLOW framework objective converges to similar results with less compute.
We propose to use conditional normalizing flows for unsupervised anomaly detection with localization using a computationally and memory-efficient architecture.
We show that our model outperforms the previous state-of-the-art in both detection and localization due to the unique properties of the proposed CFLOW-AD model.
3 Theoretical background
3.1 Feature extraction with Gaussian prior
Consider a CNN trained for a classification task. Its parameters $\theta$ are usually found by minimizing the Kullback-Leibler (KL) divergence $D_{\mathrm{KL}}\left(p^{*}(x,y)\,\|\,p(x,y;\theta)\right)$ between the joint train data distribution $p^{*}(x,y)$ and the learned model distribution $p(x,y;\theta)$, where $(x, y)$ is an input-label pair for supervised learning.
Typically, the parameters $\theta$ are initialized by values sampled from a Gaussian distribution, and the optimization process is regularized as

$$\hat{\theta} = \arg\min_{\theta} D_{\mathrm{KL}}\left(p^{*}(x,y)\,\|\,p(x,y;\theta)\right) + \lambda \Omega(\theta), \quad (1)$$

where $\Omega(\theta)$ is a regularization term and $\lambda$ is a hyperparameter that defines the regularization strength.
3.2 A case for Mahalanobis distance
With the same MVG prior assumption, Lee et al. recently proposed to model the distribution of feature vectors by an MVG density function and to use the Mahalanobis distance as a confidence score in CNN classifiers. Inspired by this result, Rippel et al. adopt the Mahalanobis distance for the anomaly detection task, since this measure gives the distance of a particular feature vector to its MVG distribution. Consider an MVG distribution with a density function

$$p(x) = \frac{\exp\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)}{\sqrt{(2\pi)^{D}\det\Sigma}} \quad (2)$$

for a random variable $x \in \mathbb{R}^{D}$, where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix of the true anomaly-free density $p^{*}(x)$. Then, the Mahalanobis distance is calculated as

$$M(x) = \sqrt{(x-\mu)^{\top}\Sigma^{-1}(x-\mu)}. \quad (3)$$

Since the true anomaly-free data distribution is unknown, the mean vector $\mu$ and covariance matrix $\Sigma$ in (3) are replaced by the estimates $\hat{\mu}$ and $\hat{\Sigma}$ calculated from the empirical train dataset. At the same time, the density function of anomalous data has different $\mu$ and $\Sigma$ statistics, which allows separating out-of-distribution and in-distribution feature vectors using $M(x)$ from (3).
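The empirical-estimate procedure above can be sketched as follows (a hedged numpy illustration; `mahalanobis` is our own helper, not a library call):

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Mahalanobis distance M(x) = sqrt((x - mu)^T cov^{-1} (x - mu))."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# Estimate (mu_hat, cov_hat) from an empirical anomaly-free train set,
# as done when the true statistics are unknown.
rng = np.random.default_rng(1)
train = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 2.0]], size=10000)
mu_hat = train.mean(axis=0)
cov_hat = np.cov(train, rowvar=False)

# Sanity check: with an identity covariance the Mahalanobis distance
# reduces to the ordinary Euclidean distance.
x = np.array([3.0, 4.0])
print(mahalanobis(x, np.zeros(2), np.eye(2)))  # 5.0
```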
3.3 Relationship with the flow framework
Dinh et al. introduce a class of generative probabilistic models called normalizing flows. These models apply the change-of-variable formula to fit an arbitrary density $p_X(x)$ with a tractable base distribution with density $p_Z(z)$ and a bijective invertible mapping $g$. Then, the log-likelihood of any $x$ can be estimated by

$$\log \hat{p}_X(x;\theta) = \log p_Z(z) + \log\left|\det J\right|, \quad (4)$$

where the sample $z = g^{-1}(x;\theta)$ usually follows a standard MVG distribution and the matrix $J = \nabla_x g^{-1}(x;\theta)$ is the Jacobian of the bijective invertible flow model parameterized by the vector $\theta$.
The flow model is a set of basic layered transformations with tractable Jacobian determinants. For example, in RealNVP the Jacobian log-determinant of a coupling layer is a simple sum of the layer's diagonal scaling elements. These models are optimized using stochastic gradient descent by maximizing the log-likelihood in (4). Equivalently, the optimization can be done by minimizing the reverse KL divergence $D_{\mathrm{KL}}\left(p_Z(z)\,\|\,\hat{p}_Z(z;\theta)\right)$, where $\hat{p}_Z(z;\theta)$ is the model prediction and $p_Z(z)$ is the target density. The loss function for this objective is defined as

$$\mathcal{L}(\theta) = \mathbb{E}_{x \sim p_X}\left[\frac{\|z\|_2^2}{2} - \log\left|\det J\right|\right] + \mathrm{const}, \quad (6)$$

where $\|z\|_2^2$ is the squared Euclidean distance of a sample $z$ (detailed proof in Appendix A).
Then, the loss in (6) converges to zero when the likelihood contribution term $\log\left|\det J\right|$ of the model (up to a normalizing constant) compensates the difference between a squared Mahalanobis distance for $x$ from the target density and a squared Euclidean distance for $z$.
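To make the objective concrete, the sketch below implements a single RealNVP-style affine coupling layer and the loss in (6) in plain numpy; the tiny tanh "subnet" and all names are illustrative stand-ins for the real learned networks:

```python
import numpy as np

rng = np.random.default_rng(2)

def coupling_forward(x, w, b):
    """One affine coupling layer: the first half of x parameterizes
    scale/shift for the second half; log|det J| is the sum of log-scales."""
    x1, x2 = np.split(x, 2, axis=1)
    h = np.tanh(x1 @ w + b)            # tiny "subnet" predicting scale and shift
    s, t = np.split(h, 2, axis=1)
    z2 = x2 * np.exp(s) + t
    log_det = s.sum(axis=1)            # tractable Jacobian log-determinant
    return np.concatenate([x1, z2], axis=1), log_det

def coupling_inverse(z, w, b):
    z1, z2 = np.split(z, 2, axis=1)
    h = np.tanh(z1 @ w + b)
    s, t = np.split(h, 2, axis=1)
    x2 = (z2 - t) * np.exp(-s)
    return np.concatenate([z1, x2], axis=1)

def flow_loss(z, log_det):
    """Reverse-KL objective of (6): E[||z||^2 / 2 - log|det J|] up to a constant."""
    return float((0.5 * (z ** 2).sum(axis=1) - log_det).mean())

w = rng.normal(size=(2, 4)) * 0.1
b = np.zeros(4)
x = rng.normal(size=(8, 4))
z, log_det = coupling_forward(x, w, b)
assert np.allclose(coupling_inverse(z, w, b), x)  # bijective by construction
print(flow_loss(z, log_det))
```

Training would minimize `flow_loss` over the subnet parameters; the inverse pass shows why exact likelihoods are available, unlike in GANs or VAEs.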
This normalizing flow framework can estimate the exact likelihoods of an arbitrary distribution with density $p_X(x)$, while the Mahalanobis distance is limited to the MVG distribution only. For example, CNNs trained with L1 regularization would have a Laplace prior, or no particular prior in the absence of regularization. Moreover, we introduce conditional normalizing flows in the next section and show that they are more compact in size and have a fully-convolutional parallel architecture compared to the [7, 8] models.
4 The proposed CFLOW-AD model
4.1 CFLOW encoder for feature extraction
We implement a feature extraction scheme with multi-scale feature pyramid pooling similar to recent models [7, 8]. We define the discriminatively-trained CNN feature extractor as an encoder in Figure 2. The CNN encoder maps image patches into feature vectors that contain relevant semantic information about their content. CNNs accomplish this task efficiently due to their translation-equivariant architecture with shared kernel parameters. In our experiments, we use an ImageNet-pretrained encoder following Schirrmeister et al., who show that large natural-image datasets can serve as a representative distribution for pretraining. If large application-domain unlabeled data is available, the self-supervised pretraining from [42, 19] can be a viable option.
One important aspect of a CNN encoder is its effective receptive field. Since the effective receptive field is not strictly bounded, the size of the encoded patches cannot be exactly defined. At the same time, anomalies have various sizes and shapes and, ideally, have to be processed with variable receptive fields. To address the ambiguity between CNN receptive fields and anomaly variability, we adopt the common multi-scale feature pyramid pooling approach. Figure 2 shows that the feature vectors are extracted by $K$ pooling layers. Pyramid pooling captures both local and global patch information with small and large receptive fields in the first and last CNN layers, respectively. For convenience, we number the pooling layers in last-to-first layer order.
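The pooling setup can be illustrated with a minimal numpy sketch; a random array stands in for an encoder feature map (in practice the maps come from intermediate layers of a pretrained CNN, e.g. via PyTorch forward hooks), and the scale choices are illustrative:

```python
import numpy as np

def avg_pool2d(fmap, k):
    """Non-overlapping k x k average pooling of a (D, H, W) feature map."""
    d, h, w = fmap.shape
    assert h % k == 0 and w % k == 0
    return fmap.reshape(d, h // k, k, w // k, k).mean(axis=(2, 4))

# Stand-in for an encoder feature map; pooling at scales k = 1, 2, 4 yields
# feature vectors with progressively larger effective receptive fields.
rng = np.random.default_rng(3)
fmap = rng.normal(size=(64, 32, 32))
pyramid = [avg_pool2d(fmap, 2 ** k) for k in range(3)]  # 32x32, 16x16, 8x8 grids
print([p.shape for p in pyramid])
```

Each spatial position of each pooled map then becomes one feature vector fed to the corresponding per-scale decoder.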
4.2 CFLOW decoders for likelihood estimation
We use the general normalizing flow framework from Section 3.3 to estimate the log-likelihoods of feature vectors. Hence, our generative decoder model aims to fit the true density of anomaly-free feature vectors with an estimated parameterized density, following the objective in (1). However, in the general framework the feature vectors are assumed to be independent of their spatial location. To increase the efficacy of distribution modeling, we propose to incorporate a spatial prior into the model using the conditional flow framework. In addition, we model the densities at each scale with $K$ independent decoder models due to the multi-scale feature pyramid pooling setup.
Our conditional normalizing flow (CFLOW) decoder architecture is presented in Figure 2. We generate a conditional vector $c$ using a 2D form of the conventional positional encoding (PE). Each $c$ contains $\sin$ and $\cos$ harmonics that are unique to its spatial location $(h, w)$. We extend the unconditional flow framework to CFLOW by concatenating the intermediate vectors inside the decoder coupling layers with the conditional vectors, as in conditional invertible neural networks.
Then, the $k$-th CFLOW decoder contains a sequence of conventional coupling layers with the additional conditional input. Each coupling layer comprises a fully-connected layer, a softplus activation and output vector permutations. Usually, the conditional extension does not increase the model size, since the dimensionality of $c$ is small relative to the feature depth; we use a fixed-size conditional vector in all our experiments. Our CFLOW decoder has a translation-equivariant architecture, because it slides along the feature vectors extracted from the intermediate feature maps with kernel parameter sharing. As a result, both the encoder and the decoders have convolutional translation-equivariant architectures.
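The 2D positional encoding can be sketched as follows (an assumed construction in the spirit of the conventional 1D PE; the channel layout and names are illustrative, not the paper's exact code):

```python
import numpy as np

def positional_encoding_2d(h, w, c):
    """2D sinusoidal PE: half of the c channels encode the row index,
    half encode the column index, with sin/cos pairs per frequency."""
    assert c % 4 == 0
    pe = np.zeros((h, w, c))
    div = np.exp(np.arange(0, c // 2, 2) * (-np.log(10000.0) / (c // 2)))
    y = np.arange(h)[:, None, None] * div          # (h, 1, c//4) row harmonics
    x = np.arange(w)[None, :, None] * div          # (1, w, c//4) column harmonics
    pe[..., 0:c // 2:2] = np.sin(y) * np.ones((1, w, 1))
    pe[..., 1:c // 2:2] = np.cos(y) * np.ones((1, w, 1))
    pe[..., c // 2::2] = np.sin(x) * np.ones((h, 1, 1))
    pe[..., c // 2 + 1::2] = np.cos(x) * np.ones((h, 1, 1))
    return pe

pe = positional_encoding_2d(8, 8, 16)
print(pe.shape)  # one unique conditional vector c per spatial location (h, w)
```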
We train CFLOW-AD using a maximum likelihood objective, which is equivalent to minimizing the loss defined by

$$\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{\|z_i\|_2^2}{2} - \log\left|\det J_i\right|\right] + \mathrm{const}, \quad (7)$$

where the random variable $z_i$ is obtained from the $i$-th feature vector and its conditional vector, the Jacobian $J_i$ belongs to the CFLOW decoder, and the expectation operation in (6) is replaced by an empirical average over a train dataset of size $N$. For brevity, we drop the $k$-th scale notation. The derivation is given in Appendix B.
After training the decoders for all $K$ scales using (7), we estimate the test dataset log-likelihoods as

$$\log \hat{p}_Z(z_i;\theta) = \log p_Z(z_i) + \log\left|\det J_i\right|. \quad (8)$$

Next, we convert the log-likelihoods to probabilities for each $k$-th scale using (8) and normalize them to the $[0, 1]$ range. Then, we upsample the probability maps to the input image resolution ($H \times W$) using bilinear interpolation. Finally, we calculate the anomaly score maps by aggregating all upsampled probabilities.
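The post-processing pipeline above can be sketched in numpy; nearest-neighbor upsampling stands in for the bilinear interpolation to keep the sketch dependency-free, and the min-max normalization is an assumption:

```python
import numpy as np

def upsample_nearest(p, factor):
    """Nearest-neighbor upsampling stand-in for bilinear interpolation."""
    return np.kron(p, np.ones((factor, factor)))

def anomaly_map(log_likelihoods, out_hw):
    """Convert per-scale log-likelihoods to [0, 1] probabilities, upsample,
    and aggregate; high scores correspond to unlikely (anomalous) regions."""
    agg = np.zeros(out_hw)
    for ll in log_likelihoods:
        p = np.exp(ll - ll.max())                 # unnormalized likelihood
        p = (p - p.min()) / (p.max() - p.min() + 1e-12)
        agg += upsample_nearest(p, out_hw[0] // ll.shape[0])
    return agg.max() - agg                        # low likelihood -> high score

# Stand-ins for per-scale log-likelihood maps at 8x8, 16x16 and 32x32 grids.
rng = np.random.default_rng(4)
lls = [rng.normal(size=(s, s)) for s in (8, 16, 32)]
score = anomaly_map(lls, (32, 32))
print(score.shape, float(score.min()))
```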
4.3 Complexity analysis
Table 1 analytically compares the complexity of CFLOW-AD and recent state-of-the-art models with the same pyramid pooling setup.
SPADE performs k-nearest-neighbor clustering between each test point and a gallery of train data. Therefore, the method requires a large memory allocation for the gallery and a clustering procedure that is typically slow compared to convolutional methods.
PaDiM estimates train-time statistics (the inverses of the covariance matrices) to calculate the Mahalanobis distance at test time. Hence, it has low computational complexity, but it stores in memory $H_k \times W_k$ matrices of size $D_k \times D_k$ for every $k$-th pooling layer.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| # of CL | 4 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| # of PL | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| HW | 256 | 256 | 256 | 512 | 512 | 256 | 512 | 256 | 512 |
| Bottle | 97.6 | 98.64 | 98.3 | 100.00 | 100.0 | 100.0 | (98.4, 95.5) | (98.3, 94.8) | (98.98, 96.80) |
| Cable | 90.0 | 96.75 | 80.6 | 97.62 | 96.2 | 97.59 | (97.2, 90.9) | (96.7, 88.8) | (97.64, 93.53) |
| Capsule | 97.4 | 98.62 | 96.2 | 93.15 | 95.4 | 97.68 | (99.0, 93.7) | (98.5, 93.5) | (98.98, 93.40) |
| Carpet | 98.3 | 99.29 | 93.1 | 98.20 | 100.0 | 98.73 | (97.5, 94.7) | (99.1, 96.2) | (99.25, 97.70) |
| Grid | 97.5 | 98.53 | 99.9 | 98.97 | 99.1 | 99.60 | (93.7, 86.7) | (97.3, 94.6) | (98.99, 96.08) |
| Hazelnut | 97.3 | 98.81 | 97.3 | 99.91 | 99.9 | 99.98 | (99.1, 95.4) | (98.2, 92.6) | (98.89, 96.68) |
| Leather | 99.5 | 99.51 | 100.0 | 100.00 | 100.0 | 100.0 | (97.6, 97.2) | (98.9, 88.8) | (99.66, 99.35) |
| Metal Nut | 93.1 | 97.59 | 99.3 | 98.45 | 98.6 | 99.26 | (98.1, 94.4) | (97.2, 85.6) | (98.56, 91.65) |
| Pill | 95.7 | 98.34 | 92.4 | 93.02 | 93.3 | 96.82 | (96.5, 94.6) | (95.7, 92.7) | (98.95, 95.39) |
| Screw | 96.7 | 98.40 | 86.3 | 85.94 | 86.6 | 91.89 | (98.9, 96.0) | (98.5, 94.4) | (98.86, 95.30) |
| Tile | 90.5 | 95.80 | 93.4 | 98.40 | 99.8 | 99.88 | (87.4, 75.9) | (94.1, 86.0) | (98.01, 94.34) |
| Toothbrush | 98.1 | 99.00 | 98.3 | 99.86 | 90.7 | 99.65 | (97.9, 93.5) | (98.8, 93.1) | (98.93, 95.06) |
| Transistor | 93.0 | 97.69 | 95.5 | 93.04 | 97.5 | 95.21 | (94.1, 87.4) | (97.5, 84.5) | (97.99, 81.40) |
| Wood | 95.5 | 95.00 | 98.6 | 98.59 | 99.8 | 99.12 | (88.5, 97.4) | (94.9, 91.1) | (96.65, 95.79) |
| Zipper | 99.3 | 98.98 | 99.4 | 96.15 | 99.9 | 98.48 | (96.5, 92.6) | (98.5, 95.9) | (99.08, 96.60) |
| Average | 96.0 | 98.06 | 95.2 | 96.75 | 97.1 | 98.26 | (96.0, 91.7) | (97.5, 92.1) | (98.62, 94.60) |
5 Experiments
5.1 Experimental setup
We evaluate CFLOW-AD on the MVTec dataset with factory defects and the Shanghai Tech Campus (STC) dataset with surveillance camera videos. The code is implemented in PyTorch with the FrEIA library used for generative normalizing flow modeling.
The industrial MVTec dataset comprises 15 classes with a total of 3,629 images for training and 1,725 images for testing. The train dataset contains only anomaly-free images without any defects. The test dataset contains both images with various types of defects and defect-free images. Five classes contain different types of textures (carpet, grid, leather, tile, wood), while the remaining 10 classes represent various types of objects. We resize MVTec images without cropping according to the specified input resolution and apply rotation augmentations during the training phase only.
The STC dataset contains 274,515 training and 42,883 testing frames extracted from surveillance camera videos and divided into 13 distinct university campus scenes. Because STC is significantly larger than MVTec, we experiment only with 256×256 resolution and apply the same pre-processing and augmentation pipeline as for MVTec.
We compare CFLOW-AD with the models reviewed in Section 2 using the MVTec and STC datasets. We use widely-used threshold-agnostic evaluation metrics for localization: the area under the receiver operating characteristic curve (AUROC) and the area under the per-region-overlap curve (AUPRO). AUROC is skewed towards large-area anomalies, while the AUPRO metric ensures that both large and small anomalies are equally important in localization. Image-level anomaly detection is reported by AUROC only.
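For reference, image-level AUROC can be computed directly from its rank-statistic definition (a minimal numpy sketch; production code would use e.g. sklearn.metrics.roc_auc_score):

```python
import numpy as np

def auroc(scores_neg, scores_pos):
    """AUROC as the probability that a random anomalous sample scores
    higher than a random normal one (Mann-Whitney U statistic), with
    ties counted as half."""
    neg = np.asarray(scores_neg)[:, None]
    pos = np.asarray(scores_pos)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

# Perfectly separated scores give AUROC = 1.0; full overlap gives 0.5.
print(auroc([0.1, 0.2, 0.3], [0.8, 0.9]))  # 1.0
```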
We run each CFLOW-AD experiment four times on MVTec and report the mean of the evaluation metric and, if specified, its standard deviation. For the larger STC dataset, we conduct only a single experiment. As in other methods, we train a separate CFLOW-AD model for each MVTec class and each STC scene. All our models use the same training hyperparameters: Adam optimizer with a 2e-4 learning rate, 100 train epochs, a 32 mini-batch size for the encoder, and cosine learning rate annealing with 2 warm-up epochs. Since our decoders are agnostic to feature map dimensions and have low memory requirements, we train and test CFLOW-AD decoders with an 8,192 (32×256) mini-batch size for feature vector processing. During the train phase, 8,192 feature vectors are randomly sampled from 32 random feature maps. Similarly, 8,192 feature vectors are sequentially sampled during the test phase. The feature pyramid pooling setup for the ResNet-18 and WideResNet-50 encoders is identical to PaDiM. The effects of other architectural hyperparameters are studied in the ablation study.
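For readability, these hyperparameters can be collected into a single config sketch (a hypothetical layout, not the released code):

```python
# Hypothetical config dict mirroring the training hyperparameters listed above.
TRAIN_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 2e-4,
    "epochs": 100,
    "encoder_batch_size": 32,           # images per mini-batch
    "decoder_batch_size": 32 * 256,     # 8,192 feature vectors per mini-batch
    "lr_schedule": "cosine",
    "warmup_epochs": 2,
}
print(TRAIN_CONFIG["decoder_batch_size"])  # 8192
```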
5.2 Ablation study
Table 2 presents a comprehensive study of various design choices for CFLOW-AD on the MVTec dataset using the AUROC metric. In particular, we experiment with the input image resolution, encoder architecture (ResNet-18, WideResNet-50, MobileNetV3L), type of normalizing flow (unconditional (UFLOW) or conditional (CFLOW)), number of flow coupling layers (# of CL) and number of pooling layers (# of PL).
Our study shows that increasing the number of decoder coupling layers from 4 to 8 gives on average a 0.15% gain due to more accurate distribution modeling. An even higher 1.4% AUROC improvement is achieved when processing 3-scale feature maps (layers 1, 2 and 3) compared to 2-scale only (layers 2 and 3). The additional larger-scale feature map (layer 1) provides more precise spatial semantic information. The conditional normalizing flow (CFLOW) is on average 0.5% better than the unconditional one (UFLOW) due to effective encoding of the spatial prior. Finally, the larger WideResNet-50 outperforms the smaller ResNet-18 by 0.81%. MobileNetV3L, however, can be a good design choice for both fast inference and high AUROC.
Importantly, we find that the optimal input resolution is not consistent among MVTec classes. The classes with macro objects such as cable or pill tend to benefit from smaller-scale processing (256×256), which effectively translates to larger CNN receptive fields. The majority of classes perform better with 512×512 inputs, i.e. with smaller receptive fields. Finally, we discover that the transistor class has even higher AUROC with images resized to 128×128. Hence, we report results with the highest-performing input resolution settings in the Section 5.3 comparisons.
5.3 Quantitative comparison
Table 4 summarizes the average MVTec results for the best published models. CFLOW-AD with the WideResNet-50 encoder outperforms the state-of-the-art by 0.36% AUROC in detection, and by 1.12% AUROC and 2.5% AUPRO in localization, respectively. Table 3 contains a per-class comparison for the subset of models grouped by task and type of encoder architecture. CFLOW-AD is on par with or significantly exceeds the best models in the per-class comparison with the same encoder setups.
Table 5 presents a high-level comparison of the best recently published models on the STC dataset. CFLOW-AD outperforms the state-of-the-art SPADE by 0.73% AUROC in the anomaly detection task and PaDiM by 3.28% AUROC in the anomaly localization task.
Note that our CFLOW-AD models in Tables 3-4 use variable input resolution as discussed in the ablation study: 512×512, 256×256 or 128×128 depending on the MVTec class. We used a fixed 256×256 input resolution in Table 5 for the large STC dataset to decrease the training time. The other reference hyperparameters in Tables 4-5 are set as follows: a WideResNet-50 encoder with 3-scale pooling layers and conditional normalizing flow decoders with 8 coupling layers.
5.4 Qualitative results
Figure 3 shows examples from MVTec and the corresponding CFLOW-AD predictions. The top row shows ground truth masks, including examples with and without anomalies. Then, our model produces anomaly score maps (middle row) using the architecture from Figure 2. Finally, we show the predicted segmentation masks with the threshold selected to maximize the F1-score.
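The F1-maximizing threshold selection can be sketched as a simple sweep over candidate thresholds (a hedged numpy illustration; the helper name is ours):

```python
import numpy as np

def best_f1_threshold(scores, labels):
    """Sweep candidate thresholds over observed scores and return the one
    maximizing F1 of the resulting binary anomaly mask."""
    scores, labels = np.asarray(scores), np.asarray(labels).astype(bool)
    best_t, best_f1 = scores.min(), 0.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        f1 = 2 * tp / (2 * tp + fp + fn + 1e-12)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

t, f1 = best_f1_threshold([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
print(t)  # the 0.8 threshold separates the toy scores perfectly
```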
Figure 4 presents additional evidence that our CFLOW-AD model actually addresses the OOD task sketched in the Figure 1 toy example. We plot the distribution of output anomaly scores for anomaly-free (green) and anomalous (red) feature vectors. CFLOW-AD is able to distinguish in-distribution and out-of-distribution feature vectors and separate them using a scalar threshold $\tau$.
5.5 Complexity evaluations
| Complexity metric and model | Inference speed, fps | Model size, MB |
|---|---|---|
| R18 encoder only | 80 / 62 | 45 |
| CFLOW-AD-R18 | 34 / 12 | 96 |
| WRN50 encoder only | 62 / 30 | 268 |
| CFLOW-AD-WRN50 | 27 / 9 | 947 |
| MNetV3 encoder only | 82 / 61 | 12 |
| CFLOW-AD-MNetV3 | 35 / 12 | 25 |
In addition to the analytical estimates in Table 1, we present actual complexity evaluations for the trained models using inference speed and model size metrics. In particular, Table 6 compares CFLOW-AD with the models from Tables 4-5 that have been studied by Defard et al.
The model size in Table 6 is measured as the size of all floating-point parameters in the corresponding model, i.e. its encoder and decoder (post-processing) models. Because the encoder architectures are identical, only the post-processing models differ. Since CFLOW-AD decoders do not explicitly depend on the feature map dimensions (only on the feature vector depths), our model is significantly smaller than SPADE and PaDiM. If we exclude the encoder parameters for a fair comparison, CFLOW-AD is 1.7× to 50× smaller than SPADE and 2× to 7× smaller than PaDiM.
The inference speed in Table 6 is measured with an Intel i7 CPU for SPADE and PaDiM in the Defard et al. study with 256×256 inputs. We deduce that this suboptimal CPU choice was made due to the large memory requirements of these models in Table 6: their GPU allocation for fast inference is infeasible. In contrast, our CFLOW-AD can be processed in real time with an 8× to 25× faster inference speed on a 1080 8GB GPU with the same input resolution and feature extractor. In addition, the MobileNetV3L encoder provides a good trade-off between accuracy, model size and inference speed for practical inspection systems.
6 Conclusion
We proposed to use the conditional normalizing flow framework to estimate the exact data likelihoods, which is infeasible in other generative models. Moreover, we analytically showed the relationship of this framework to previous distance-based models with a multivariate Gaussian prior.
We introduced the CFLOW-AD model, which addresses the complexity limitations of existing unsupervised AD models by employing a fully-convolutional translation-equivariant architecture. As a result, CFLOW-AD is faster and smaller by a factor of 10× than prior models with the same input resolution and feature extractor setup.
CFLOW-AD achieves a new state-of-the-art on the popular MVTec dataset with 98.26% AUROC in detection, and 98.62% AUROC and 94.60% AUPRO in localization. Our new state-of-the-art for the STC dataset is 72.63% and 94.48% AUROC in detection and localization, respectively. Our ablation study analyzed design choices for practical real-time processing, including the feature extractor choice, the multi-scale pyramid pooling setup and the flow model hyperparameters.
References
- (2019) Analyzing inverse problems with invertible neural networks. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2019) Guided image generation with conditional invertible neural networks. arXiv:1907.02392.
- (2018) Deep autoencoding models for unsupervised anomaly segmentation in brain MR images. arXiv:1804.04488.
- (2019) MVTec AD – a comprehensive real-world dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- (2020) Uninformed students: student-teacher anomaly detection with discriminative latent embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- (2019) Improving unsupervised defect segmentation by applying structural similarity to autoencoders. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications.
- (2021) Sub-image anomaly detection with deep pyramid correspondences. arXiv:2005.02357v3.
- (2021) PaDiM: a patch distribution modeling framework for anomaly detection and localization. In Proceedings of the International Conference on Pattern Recognition (ICPR) Workshops.
- (2017) Density estimation using real NVP. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2016) A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLOS ONE.
- (2016) Deep learning. MIT Press.
- (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
- (2016) Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- (2019) Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
- (2020) Why normalizing flows fail to detect out-of-distribution data. In Advances in Neural Information Processing Systems.
- (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems.
- (1992) A simple weight decay can improve generalization. In Advances in Neural Information Processing Systems.
- (2018) A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems.
- (2021) CutPaste: self-supervised learning for anomaly detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- (2020) Multi-granularity tracking with modularized components for unsupervised vehicles anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
- (2017) A revisit of sparse coding based anomaly detection in stacked RNN framework. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
- (2016) Understanding the effective receptive field in deep convolutional neural networks. In Advances in Neural Information Processing Systems.
- (1936) On the generalized distance in statistics. Proceedings of the National Institute of Sciences (Calcutta).
- (2019) Do deep generative models know what they don't know? In Proceedings of the International Conference on Learning Representations (ICLR).
- (2018) Anomaly detection in nanofibrous materials by CNN-based self-similarity. Sensors.
- (2021) Deep learning for anomaly detection: a review. ACM Computing Surveys.
- (2021) Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research.
- (2017) Automatic differentiation in PyTorch. In Autodiff Workshop at Advances in Neural Information Processing Systems.
- (2020) Modeling the distribution of normal data in pre-trained deep features for anomaly detection. arXiv:2005.14140.
- (2021) Same same but DifferNet: semi-supervised defect detection with normalizing flows. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
- (2021) A unifying review of deep and shallow anomaly detection. Proceedings of the IEEE.
- (2013) Object-centric anomaly detection by attribute-based reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- (2021) Multiresolution knowledge distillation for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- (2020) Understanding anomaly detection with deep invertible networks through hierarchies of distributions and features. In Advances in Neural Information Processing Systems.
- (2019) f-AnoGAN: fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis.
- (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging.
- (2021) Unsupervised anomaly segmentation via deep feature reconstruction. Neurocomputing.
- (2019) q-space novelty detection with variational autoencoders. In MICCAI 2019 International Workshop on Computational Diffusion MRI.
- (2017) Attention is all you need. In Advances in Neural Information Processing Systems.
- (2020) Attention guided anomaly localization in images. In Proceedings of the European Conference on Computer Vision (ECCV).
- (2021) Student-teacher feature pyramid matching for unsupervised anomaly detection. arXiv:2103.04257.
- (2020) Patch SVDD: patch-level SVDD for anomaly detection and segmentation. In Proceedings of the Asian Conference on Computer Vision (ACCV).
- (2016) Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC).
- (2020) Encoding structure-texture relation with P-Net for anomaly detection in retinal images. In Proceedings of the European Conference on Computer Vision (ECCV).