Manifolds for Unsupervised Visual Anomaly Detection

06/19/2020 ∙ by Louise Naud, et al. ∙ 0

Anomalies are by definition rare, thus labeled examples are very limited or nonexistent, and likely do not cover unforeseen scenarios. Unsupervised learning methods that don't necessarily encounter anomalies in training would be immensely useful. Generative vision models can be useful in this regard but do not sufficiently represent normal and abnormal data distributions. To this end, we propose constant curvature manifolds for embedding data distributions in unsupervised visual anomaly detection. Through theoretical and empirical explorations of manifold shapes, we develop a novel hyperspherical Variational Auto-Encoder (VAE) via stereographic projections with a gyroplane layer - a complete equivalent to the Poincaré VAE. This approach with manifold projections is beneficial in terms of model generalization and can yield more interpretable representations. We present state-of-the-art results on visual anomaly benchmarks in precision manufacturing and inspection, demonstrating real-world utility in industrial AI scenarios. We further demonstrate the approach on the challenging problem of histopathology: our unsupervised approach effectively detects cancerous brain tissue from noisy whole-slide images, learning a smooth, latent organization of tissue types that provides an interpretable decision tool for medical professionals.




1 Introduction

Annotating visual data can be burdensome and expensive in most real-world applications; for example, medical professionals manually inspect and label massive whole-slide images (WSI) for thousands of nucleotides, lymphocytes, tumors, etc. The burden is greater still when trying to label a sufficient amount of anomalous data, as anomalies are by definition rare; moreover, we have to assume that unforeseen anomalous scenarios will arise in the future. Unsupervised methods are thus advantageous, and have seen promising advances with deep generative vision models. Recent and noteworthy work has developed methods with Variational Auto-Encoders (VAE) Kingma and Welling (2014); Rezende et al. (2014) and Generative Adversarial Networks (GAN) Goodfellow et al. (2014) towards these tasks An and Cho (2015); Zong et al. (2018); Schlegl et al. (2017); Pidhorskyi et al. (2018); Deecke et al. (2018).

Deep generative models learn a mapping from a low-dimensional latent space to a high-dimensional data space, centered around the manifold hypothesis: high-dimensional observations are concentrated around a manifold of much lower dimensionality. It follows that by learning the proper manifold we can model the observed data with high fidelity. It is our aim to investigate properties of nonlinear manifolds and regularity conditions that benefit visual data representation for anomaly detection. We hypothesize that Riemannian manifold curvatures other than the typical flat, Euclidean space can provide a more natural embedding on which to infer anomalous data in images.

Non-Euclidean latent spaces have recently been proposed in deep generative models, namely hyperbolic and hyperspherical metric spaces. With the former, the latent Poincaré space is shown to learn hierarchical representations from textual and graph-structured data Nickel and Kiela (2017); Tifrea et al. (2019), and from images with the Poincaré VAE of Mathieu et al. (2019). Spherical embedding spaces have been shown useful for class separation and smooth interpolation in the manifold towards computer vision tasks Mettes et al. (2019); Davidson et al. (2018); Haney and Lavin (2020). We hypothesize these manifold geometries naturally represent distinct normal and abnormal visual data distributions, and can be learnt from data without labels via latent manifold mappings in deep generative models. We take care to investigate the properties of these manifolds most relevant to learning and inferring on unlabeled visual data, and carry out thorough experiments to understand the effects of various Riemannian manifold regimes. We indeed confirm our hypotheses and develop novel VAE methods for utilizing the various manifold curvatures.

Our main contributions are the following. [Footnote 1: Code will be open-sourced with camera-ready publication.]


  1. Theoretical utilities of Riemannian manifolds for the generative model latent space, towards naturally and efficiently embedding both normal data and sparse anomalous data.

  2. Proposal of Stereographic Projection Variational Auto-Encoders, towards unsupervised visual anomaly detection. We derive a novel gyroplane layer for a neural network to be capable of stereographic projections across hyperspherical and hyperbolic manifold shapes.

  3. Empirical analyses of our approach vs comparable methods on challenging benchmark datasets for unsupervised visual anomaly detection, achieving state-of-the-art results.

  4. Neuropathology experiments that show our VAE method can reliably organize the various subtypes of brain tissue without labels, and identify anomalous tissue samples as cancerous. We further motivate the hyperbolic latent space by demonstrating Poincaré mapping, to visualize the latent organization and reliably interpolate between regions of normal and abnormal brain tissue.

2 Representation Learning in Generative Vision

Figure 1: The three regimes of constant curvature Riemannian manifolds, for which we can utilize the stereographic projections of the hyperboloid (left) and hypersphere (right) to respectively yield the Poincaré ball and projected sphere manifolds. Example geodesic arcs of this projection are shown. The mapping is smooth, bijective, and conformal (preserving the angles at which curves meet). This projection is necessary to yield manifolds with consistent modeling properties across the spectrum of curvatures (see text for details).

2.1 Properties of Manifold Curvatures

Consider a true data-generating process that draws samples $x \in \mathcal{M}^*$ according to $x \sim p^*(x)$, where $\mathcal{M}^*$ is a $d$-dimensional Riemannian manifold embedded in the $D$-dimensional data space $\mathcal{X}$, and $d < D$. We consider the two problems of estimating the density $p^*(x)$ as well as the manifold $\mathcal{M}^*$ given some training samples $\{x_i\} \sim p^*(x)$.

A deep generative model represents a mapping, $g: \mathcal{Z} \to \mathcal{X}$, from a relatively low-dimensional latent space $\mathcal{Z}$ to a high-dimensional data space $\mathcal{X}$. The learned manifold $\mathcal{M}$ is a lower-dimensional subset of $\mathbb{R}^D$, the $D$-dimensional input space of images, and is embedded in $\mathbb{R}^D$ under fairly weak assumptions on the generative model itself; a generative model with a suitable capacity of representation will recover this smoothed approximation of the true data manifold $\mathcal{M}^*$.

With respect to the (constant) curvature $K$ of $\mathcal{M}$ there are three regimes of Riemannian manifolds to consider: Euclidean, "flat" space $\mathbb{E}^d$, with curvature $K = 0$; hyperspherical, positively curved space $\mathbb{S}^d_K$, with $K > 0$; and hyperbolic, negatively curved space $\mathbb{H}^d_K$, with $K < 0$.

By definition of Riemannian geometry, the inner product satisfies $\langle x, x \rangle = 1/K$ for every point $x$ in both curved regimes, using the Euclidean inner product on the hypersphere $\mathbb{S}^d_K$ and the Lorentz inner product on the hyperboloid $\mathbb{H}^d_K$, where $K$ is the curvature. So as $K \to 0$, both hyperspherical and hyperbolic spaces grow and become locally flatter, and $\|x\|_2 \to \infty$. This "non-convergence" property of constant curvature manifolds sends points away from the coordinate space origin in order to maintain the defined curvature of $\mathcal{M}$. We also observe an instability as $K \to 0$: the hyperspherical and hyperbolic geodesic distance metrics do not converge to the Euclidean distance metric. This is an undesirable property because we must restrict the manifold curvature a priori while learning a deep generative model.

On the other hand, stereographically projected spaces for both the hypersphere and hyperboloid manifold classes inherit the desirable properties from hyperspherical and hyperbolic spaces, while avoiding this property of sending a point to infinity when the curvature of the space has a small absolute value. This projection function is defined as follows, for a manifold $\mathcal{M}_K$ of curvature $K$:

$$\rho_K\big((\xi, x^\top)^\top\big) = \frac{x}{1 + \sqrt{|K|}\,\xi} \qquad (1)$$

where $(\xi, x^\top)^\top$ is a point in the ambient space of $\mathcal{M}_K$, with $\xi \in \mathbb{R}$ and $x \in \mathbb{R}^d$. [Footnote 2: Skopek et al. (2019) similarly define such a projection function. We note our work was done concurrently, and indeed many of the findings are complementary.]

The stereographic projections relative to the three Riemannian manifold regimes are illustrated in Fig. 1. Later we detail a novel gyroplane layer for performing the stereographic projections in the context of a deep generative neural network.
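As a minimal numerical sketch of such a projection, the snippet below assumes the common form $x / (1 + \sqrt{|K|}\,\xi)$ for an ambient point $(\xi, x)$ on a manifold of curvature $K$, together with its standard inverse. The function names `stereographic_project` and `inverse_project` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def stereographic_project(point, K):
    """Project an ambient point (xi, x) of a curvature-K manifold onto the
    projected space, using the assumed form x / (1 + sqrt(|K|) * xi)."""
    xi, x = point[0], point[1:]
    return x / (1.0 + np.sqrt(abs(K)) * xi)

def inverse_project(y, K):
    """Map a projected point y back to ambient coordinates (xi, x)."""
    sq = K * np.dot(y, y)
    xi = (1.0 - sq) / (np.sqrt(abs(K)) * (1.0 + sq))
    x = 2.0 * y / (1.0 + sq)
    return np.concatenate(([xi], x))

# A point on the hypersphere of curvature K = 2: xi^2 + |x|^2 = 1/K.
sphere_pt = np.array([0.5, 0.3, 0.4])
# A point on the hyperboloid of curvature K = -2: -xi^2 + |x|^2 = 1/K.
hyp_pt = np.array([np.sqrt(0.75), 0.3, 0.4])

sphere_proj = stereographic_project(sphere_pt, 2.0)
hyp_proj = stereographic_project(hyp_pt, -2.0)
print("projected sphere point:", sphere_proj)
print("Poincare ball point:   ", hyp_proj)
```

The same function handles both signs of $K$, which is the practical appeal of working in the projected spaces: only the sign of the curvature changes, not the code path.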

From Eq. 1 we realize several advantageous properties on these two projected spaces:

The Möbius sum of two elements has the same structure for both projected spaces, returns an element of the same space, and only involves the Euclidean inner product (identical for every point). This has the nice consequence that the distance function in projected spaces only uses the Euclidean inner product, instead of the inner product induced by the metric tensors of the manifold, which varies on each point of the manifold.

The conformal projection preserves angles, and the distance functions in the hyperbolic and hyperspherical spaces only depend on angles between vectors. This implies the hyperbolic (resp. hyperspherical) space and the Poincaré ball (resp. projected hypersphere) are isometric.

It is for these reasons we develop deep generative models with these two projected spaces.

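If we denote by $\tan_K$ the function that corresponds to $\tan$ if $K > 0$ and $\tanh$ if $K < 0$ (and similarly for $\cos_K$ and $\sin_K$), the distance function on stereographically projected spaces is:

$$d(x, y) = \frac{1}{\sqrt{|K|}} \cos_K^{-1}\left(1 - \frac{2K\,\|x - y\|_2^2}{(1 + K\|x\|_2^2)(1 + K\|y\|_2^2)}\right) \qquad (2)$$

whereas the gyroscopic distance function on stereographically projected spaces takes a simpler form:

$$d_{\mathrm{gyr}}(x, y) = \frac{2}{\sqrt{|K|}} \tan_K^{-1}\left(\sqrt{|K|}\,\|{-x} \oplus_K y\|_2\right) \qquad (3)$$

with $\oplus_K$ representing the Möbius addition.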
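To make the two distance forms concrete, the sketch below implements Möbius addition and both distances for a curvature $K \neq 0$, using the standard expressions from the constant-curvature literature (an assumption here, since they are written in this sketch's own notation). The function names are hypothetical.

```python
import numpy as np

def mobius_add(x, y, K):
    """Mobius addition in the projected space of curvature K.
    Only Euclidean inner products are involved."""
    xy, xx, yy = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 - 2 * K * xy - K * yy) * x + (1 + K * xx) * y
    den = 1 - 2 * K * xy + K ** 2 * xx * yy
    return num / den

def dist_projected(x, y, K):
    """Distance via the inverse cos_K form (arccos if K > 0, arccosh if K < 0)."""
    diff = x - y
    arg = 1 - 2 * K * np.dot(diff, diff) / ((1 + K * np.dot(x, x)) * (1 + K * np.dot(y, y)))
    if K > 0:
        return np.arccos(np.clip(arg, -1.0, 1.0)) / np.sqrt(K)
    return np.arccosh(np.maximum(arg, 1.0)) / np.sqrt(-K)

def dist_gyro(x, y, K):
    """Gyroscopic distance via the inverse tan_K form (arctan if K > 0, arctanh if K < 0)."""
    r = np.sqrt(abs(K)) * np.linalg.norm(mobius_add(-x, y, K))
    inv_tan = np.arctan if K > 0 else np.arctanh
    return 2 * inv_tan(r) / np.sqrt(abs(K))

x = np.array([0.1, 0.2])
y = np.array([-0.3, 0.25])
for K in (-1.0, -0.5, 1.0, 2.0):
    print(K, dist_projected(x, y, K), dist_gyro(x, y, K))
```

For these inputs the two forms agree to numerical precision for either sign of $K$. With a very small $|K|$ the gyroscopic form approaches twice the Euclidean distance, which illustrates that the projected spaces behave smoothly near zero curvature.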

An advantage of a smooth, regularized latent embedding space is the ability to interpolate between data points; see Fig. 2. Interestingly, Shao et al. (2018) show straight lines in the latent space are relatively close to geodesic curves on the manifold, explaining why traversal in the latent space results in visually plausible changes to the generated data. This may work for toy datasets such as MNIST and low-quality natural images (such as the CelebA faces dataset). However, in real-world images we suggest the curvilinear distances in the original data metric are not well enough preserved. Moreover, we hypothesize the observations of Shao et al. (2018) will not extend beyond the standard Euclidean manifold to curved manifolds. We explore this empirically with large, complex images in histopathology datasets later.

Figure 2: The Poincaré ball provides meaningful geodesics for latent hierarchies, and a well-regularized space where interpolating along hyperbolic geodesics allows for reliable intermediate sampling and the prediction of unseen samples. Figure is revised from Klimovskaia et al. (2019).

2.2 Manifold Learning with VAEs

Our aim is learning manifolds for unsupervised anomaly detection. As such we focus on the Variational Auto-Encoder (VAE) Kingma and Welling (2014); Rezende et al. (2014) class of deep generative models. We refer the reader to our Related Work section later for treatment on comparable Generative Adversarial Networks (GANs) Goodfellow et al. (2014); Radford et al. (2015).

The VAE is a latent variable model representing the mapping $\mathcal{Z} \to \mathcal{X}$, with an encoder stochastically embedding observations $x$ in the low-dimensional latent space $\mathcal{Z}$, and a decoder generating observations $x \in \mathcal{X}$ from encodings $z \in \mathcal{Z}$. The model uses two neural networks to respectively parameterize the likelihood $p_\theta(x \mid z)$ and the variational posterior $q_\phi(z \mid x)$.

Typically the prior distribution assigned to the latent variables is a standard Gaussian. Recent work suggests this limits the capacity to learn a representative latent space (such as Mathieu et al. (2019); Davidson et al. (2018), and others discussed later in the Related Work section). We consider that the limitations of the prior are not due to a limitation in terms of capacity of representation, but more so in terms of principle. Similar to Kalatzis et al. (2020), we identify two major drawbacks of using the Euclidean manifold for the latent space:

Lack of learned semantics.

The first drawback resides in the fact that a Normal distribution or a Gaussian mixture (in a Euclidean space) can be re-parameterized in a manner that does not portray any semantic meaning for the latent data. For instance, a mixture of Gaussians can be simply re-parameterized by a random permutation of the indices of each component in the mixture (Bishop (2006)); while the re-parameterization is valid, the semantic meaning we associate to it is drastically different. This can arbitrarily rotate the principal components of the Euclidean latent space, and the Euclidean distance will not have a relevant meaning in terms of visual or semantic closeness in the latent space. Moreover, as has been described in Arvanitidis et al. (2016); Hauberg (2018), if the decoder has a sufficient capacity of representation, it will be able to revert any re-parameterization applied in the latent space. This has the consequence that a specific value in the latent space may not be associated with a unique specific value in the data space $\mathcal{X}$. In the context of anomaly detection, this could result in anomalous samples aligning closer to the larger groups of normal samples rather than to other anomalous samples, resulting in false negatives for entire subgroups of anomalies.

Irrelevant isotropic sampling.

Secondly, Zhu et al. (2016) suggest that human interpretable images live on a specific manifold, the "natural images manifold", noted $\mathcal{M}$ here. This manifold is a lower-dimensional subset of $\mathbb{R}^D$, the input space of images, and is embedded in $\mathbb{R}^D$ under fairly weak assumptions on the network architecture. An encoder with a suitable capacity of representation will recover a smoothed approximation of $\mathcal{M}$. This can create a latent space with a significantly variable density of latent samples; if our prior distribution is an isotropic Gaussian, for instance, samples will be drawn in a rather isotropic manner, even from areas where the distribution of latent samples has no support. As such, the sampling procedure in the latent space can return samples that are not relevant. Moreover, a likely consequence of this aforementioned embedding is the "manifold mismatch", or its statistical equivalent "density mismatch" Davidson et al. (2018); Falorsi et al. (2018). Under the assumption of a prior distribution with an infinite support, the VAE may try to map anywhere in the latent space, which could lead to convergence issues.

Given the requirements of the visual anomaly detection problem, it is highly desirable to have a semantically meaningful topology which automatically embeds data according to hidden data structure, and from which we can reliably sample despite empty regions due to sparsely distributed data points. This leads us to think that a Euclidean latent space may not capture enough topological properties for visual anomaly detection.

3 Stereographic Projections VAE

Our aim is to construct a Poincaré ball latent space (as shown in Fig. 1), and supporting encoder and decoder networks in order to learn a mapping from this latent space to the observation space $\mathcal{X}$.

Parametrising distributions on the Poincaré ball

The choice of the probability distribution family for both the prior and the posterior (as the likelihood still lives in the Euclidean space) can be made similarly as in the Euclidean space. There are two distinct philosophies for adapting the Normal distribution to a Riemannian space. The first approach is to consider the Euclidean space that is tangent at every point in the manifold, and sample from a zero-mean Euclidean Normal distribution in this tangent space. Then, the sampled point on the tangent space is mapped to the manifold through parallel transport and the exponential map. This is known as the "wrapping" approach. In the second approach, we can maximize the entropy of the distribution to derive what is known as the Riemannian Normal. While the latter is the only form of distribution that is proven to maximize the entropy, both distributions perform similarly in practice. Hence, we choose to use the Wrapped Normal, as it is easier to sample from. We refer to both as Hyperbolic Normal distributions with pdf $\mathcal{N}_{\mathbb{B}}(z \mid \mu, \sigma^2)$. We also define the prior on $z$ as the Hyperbolic Normal, with mean zero: $p(z) = \mathcal{N}_{\mathbb{B}}(z \mid 0, \sigma_0^2)$.
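The wrapping procedure described above can be sketched for the Poincaré ball as follows. This is a minimal illustration under stated assumptions: curvature $K < 0$ with $c = -K$, an isotropic Gaussian in the tangent space at the origin, parallel transport from the origin given by the ratio of conformal factors, and the standard closed-form exponential map. The function names are hypothetical and the code is not taken from the authors' implementation.

```python
import numpy as np

def mobius_add(x, y, K):
    """Mobius addition in the projected space of curvature K."""
    xy, xx, yy = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 - 2 * K * xy - K * yy) * x + (1 + K * xx) * y
    den = 1 - 2 * K * xy + K ** 2 * xx * yy
    return num / den

def conformal_factor(x, c):
    """Conformal factor of the Poincare ball metric at x."""
    return 2.0 / (1.0 - c * np.dot(x, x))

def exp_map(mu, u, K):
    """Exponential map at mu, applied to a tangent vector u (K < 0)."""
    c = -K
    norm_u = np.linalg.norm(u)
    if norm_u == 0.0:
        return mu
    lam = conformal_factor(mu, c)
    step = np.tanh(np.sqrt(c) * lam * norm_u / 2.0) * u / (np.sqrt(c) * norm_u)
    return mobius_add(mu, step, K)

def sample_wrapped_normal(mu, sigma, K, rng):
    """Draw one wrapped-normal sample around mu on the Poincare ball."""
    c = -K
    v = rng.normal(0.0, sigma, size=mu.shape)          # sample in the tangent space at the origin
    u = (2.0 / conformal_factor(mu, c)) * v            # parallel transport from the origin to mu
    return exp_map(mu, u, K)                           # map the tangent vector onto the manifold

rng = np.random.default_rng(0)
mu = np.array([0.3, -0.2])
samples = np.stack([sample_wrapped_normal(mu, 0.3, -1.0, rng) for _ in range(200)])
print("max sample norm:", np.linalg.norm(samples, axis=1).max())
```

Every sample stays strictly inside the ball of radius $1/\sqrt{c}$, and the sampling path is differentiable in `mu` and `sigma`, which is what makes the wrapped form convenient for reparameterized training.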

SP-VAE Architecture

Just as in the case of a Euclidean latent space, this network is optimized by maximizing the evidence lower bound (ELBO), via an unbiased Monte Carlo (MC) estimator thanks to reparametrisable sampling schemes introduced in Mathieu et al. (2019); Ganea et al. (2018). It was proven in Mathieu et al. (2019) that the ELBO can be extended to Riemannian latent spaces by applying Jensen's inequality w.r.t. the measure on the manifold. We use $\beta$-VAE Higgins et al. (2017), a variant of VAE that applies a scalar weight $\beta$ to the KL term in the objective function, as it has been shown empirically that the $\beta$-VAE improves the disentanglement of different components of the latent space when $\beta > 1$. As we want to compare the shape of the latent manifold for visual anomaly detection in real-world applications, we chose a 4-layer convolutional network for the encoder and decoder backbones; simple enough to be able to compare all three curvature configurations, but able to learn the representation of complex images. Just as in Mathieu et al. (2019); Skopek et al. (2019), we use an exponential map to transform the mean of the distribution from the encoder, and then use a gyroplane layer to go back from the Riemannian latent space to the Euclidean space, in order to take into account the shape of the manifold when applying a linear layer.
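The objective described above can be written out as a sketch. The notation ($q_\phi$ for the variational posterior, $p_\theta$ for the likelihood, $S$ for the number of Monte Carlo samples) is assumed here for illustration, and the sample-based form of the KL term is one common choice when no closed form is available on the manifold:

```latex
\mathcal{L}_\beta(x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
    - \beta\,\mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
  \approx \frac{1}{S}\sum_{s=1}^{S}
    \Big[\log p_\theta(x \mid z_s)
    - \beta\big(\log q_\phi(z_s \mid x) - \log p(z_s)\big)\Big],
  \qquad z_s \sim q_\phi(z \mid x)
```

Setting $\beta = 1$ recovers the standard ELBO estimator.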

3.1 Gyroplane Layer

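As described in Ganea et al. (2018) and Mathieu et al. (2019), the first layer of a decoder in a VAE whose latent manifold is Euclidean and $d$-dimensional is often a linear layer. A linear layer is an affine transform, and can be written in the form $f_{a,p}(z) = \langle a, z - p \rangle$, with $a$, the orientation parameter, and $p$, the offset parameter, elements of $\mathbb{R}^d$. This expression can be rewritten as $f_{a,p}(z) = \mathrm{sign}(\langle a, z - p \rangle)\,\|a\|\,d_E(z, H_{a,p})$, where $H_{a,p} = \{z \in \mathbb{R}^d : \langle a, z - p \rangle = 0\}$ is the hyperplane oriented by $a$ with offset $p$, and $d_E$ is the Euclidean distance.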

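In the stereographically projected sphere manifold $\mathbb{D}^d_K$ of curvature $K > 0$, the hyperplane with orientation $a$ and offset $p$ is of the form $H^K_{a,p} = \{z \in \mathbb{D}^d_K : \langle -p \oplus_K z, a \rangle = 0\}$, where $\oplus_K$ is the Möbius addition; we provide the full proof in Supplementary materials. The distance of a point $z$ to $H^K_{a,p}$ takes the following form:

$$d_K(z, H^K_{a,p}) = \frac{1}{\sqrt{K}} \sin^{-1}\left(\frac{2\sqrt{K}\,|\langle -p \oplus_K z, a \rangle|}{(1 + K\|{-p} \oplus_K z\|_2^2)\,\|a\|_2}\right)$$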
This expression was intuitively attainable from Mathieu et al. (2019); Ganea et al. (2018), but here we provide thorough derivation and rationale.
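As an illustration of how a point-to-gyroplane distance can be evaluated in either projected space, the sketch below uses the unified form with $\sin^{-1}$ for $K > 0$ and $\sinh^{-1}$ for $K < 0$. The latter matches the Poincaré-ball expression of Ganea et al. (2018) with $c = -K$. The function names, and the exact form for positive curvature, are assumptions made for this sketch.

```python
import numpy as np

def mobius_add(x, y, K):
    """Mobius addition in the projected space of curvature K."""
    xy, xx, yy = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 - 2 * K * xy - K * yy) * x + (1 + K * xx) * y
    den = 1 - 2 * K * xy + K ** 2 * xx * yy
    return num / den

def gyroplane_distance(z, a, p, K):
    """Distance from z to the gyroplane with orientation a and offset p."""
    w = mobius_add(-p, z, K)                      # translate so the offset sits at the origin
    ratio = 2 * np.sqrt(abs(K)) * abs(np.dot(w, a))
    ratio /= (1 + K * np.dot(w, w)) * np.linalg.norm(a)
    if K > 0:
        return np.arcsin(np.clip(ratio, 0.0, 1.0)) / np.sqrt(K)
    return np.arcsinh(ratio) / np.sqrt(-K)

a = np.array([1.0, 0.0])      # orientation of the gyroplane
p = np.zeros(2)               # offset at the origin
z = np.array([0.5, 0.0])      # query point along the orientation
print("projected sphere:", gyroplane_distance(z, a, p, 1.0))
print("Poincare ball:   ", gyroplane_distance(z, a, p, -1.0))
```

With the offset at the origin and the query point on the axis of `a`, the result reduces to the geodesic distance from the origin, which gives a simple sanity check. A decoder's first layer can then use one such signed distance per output unit in place of an affine map.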

4 Related Work

VAE and Riemannian Manifolds

In Variational Auto-Encoders (VAEs) Kingma and Welling (2014); Rezende et al. (2014), the prior distribution assigned to the latent variables is typically a standard Gaussian. Unfortunately, this choice of prior limits the modeling capacity of VAEs, and richer priors have been proposed: Tomczak and Welling (2017) propose VampPrior, a method for the latent distribution to instead be a mixture of Gaussians. van den Oord et al. (2017) propose VQ-VAE, a way to encode more complex latent distributions with a vector quantization technique. Klushyn et al. (2019a) propose a hierarchical prior through an alternative formulation of the objective. Bauer and Mnih (2018) propose to refine the prior through a sampling technique. Several notable VAEs with non-Euclidean latent spaces have been developed recently: Davidson et al. (2018) make use of hyperspherical geometry, Falorsi et al. (2018) endow the latent space with a SO(3) group structure, and Grattarola et al. (2019) introduce an adversarial auto-encoder framework with constant curvature manifolds. However, in these methods the encoder and decoder are not designed to explicitly take into account the latent space geometries. The same goes for Ovinnikov (2019), who proposed to use a Poincaré ball latent space, but was not able to derive a closed-form solution of the ELBO's entropy term. Mathieu et al. (2019) propose the Poincaré VAE, closely aligned with our work. We extend it mainly to consider practical properties of the manifold geometries towards real applications, arriving at the stereographic projection mechanisms. The method most related to the current paper is the mixed-curvature VAE from Skopek et al. (2019). They similarly define a projection across hyperboloid and hypersphere spaces for use in VAEs. Our work was done concurrently, and many of the findings are complementary.

Visual Anomaly Detection

Anomaly detection is a deep field with many application areas in machine learning. We focus on the image domain, referring the reader to Chandola et al. (2009); Pimentel et al. (2014) and references therein for full surveys of the field. A promising area in visual anomaly detection is reconstruction-based methods, with recent works that train deep autoencoders to detect anomalies based on reconstruction error Zhou and Paffenroth (2017); Zhai et al. (2016); Zong et al. (2018). For example, Zhai et al. (2016) use a structured energy-based deep neural network to model the training samples, and Zong et al. (2018) proposed to jointly model the encoded features and the reconstruction error in a deep autoencoder. Although the reconstruction-based methods have shown promising results, their performance is ultimately restricted by the under-designed representation of the latent space. While we focus on images, there exist methods for videos such as applying PCA with optical flow methods Kim and Grauman (2009) and RNNs for next-frame predictions Luo et al. (2017). Schlegl et al. (2017) applied Generative Adversarial Networks (GANs) to the task of visual anomaly detection. Their AnoGAN was succeeded by the more efficient EGBAD that uses a BiGAN approach Zenati et al. (2018). In Akçay et al. (2018), the combination of a GAN and autoencoder was introduced. For more on GANs in anomaly detection please refer to Mattia et al. (2019). We compare against GANs in the Experiments section, and show superior results with our VAE method. Moreover, VAEs are a preferable class of deep generative models because they provide a natural probabilistic formulation, readily work with various priors, and are easier to train.

5 Experiments

5.1 Visual Anomaly Detection Problem Setup

In this paper we consider two related but distinct problems of unsupervised anomaly detection in images: scoring and localization. Let $\mathcal{X}$ be the space of all images in our domain of interest, and let $X \subseteq \mathcal{X}$ be the set of images defined as normal. We investigate two different metrics: the reconstruction error probability, which can be used for both tasks, as well as the ELBO derivative with respect to the input. For scoring, we use the average value ($\mu$) and standard deviation ($\sigma$) of the reconstruction error on all pixels of the test set, and take a threshold based on these two statistics. Producing a mask from the thresholded result gives us the anomaly localization.
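The scoring and localization steps can be sketched as below. The exact threshold expression is not stated in the text, so this sketch assumes the form $\mu + k\sigma$ with a hypothetical multiplier `k`, and assumes an image is flagged when any of its pixels exceeds the threshold. Both choices are illustrative only.

```python
import numpy as np

def fit_threshold(errors, k=2.0):
    """Threshold from the mean and standard deviation of per-pixel
    reconstruction errors; k is a hypothetical multiplier."""
    return errors.mean() + k * errors.std()

def localize(error_map, threshold):
    """Binary mask of pixels whose reconstruction error exceeds the threshold."""
    return error_map > threshold

def is_anomalous(error_map, threshold):
    """Image-level decision: flagged if any pixel is above the threshold (assumption)."""
    return bool(localize(error_map, threshold).any())

# Toy per-pixel reconstruction errors for four 8x8 images; image 0 has a defect patch.
errors = np.full((4, 8, 8), 0.1)
errors[0, 2:4, 2:4] = 5.0

threshold = fit_threshold(errors)
masks = localize(errors, threshold)
flags = [is_anomalous(e, threshold) for e in errors]
print("threshold:", threshold)
print("flags:", flags)
```

The same thresholded error map serves both tasks: its mask is the localization output, and any reduction of it (here, "any pixel") gives an image-level score.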

We evaluated our approach on several benchmark datasets for visual anomaly detection. Importantly, we focus on those with real-world images. Prior works limit evaluations to MNIST and Omniglot datasets, which are not representative of natural images.

5.2 Crack Segmentation & PCB Defects Benchmarks

We experiment on two benchmark anomaly detection datasets of real-world images. The Crack Segmentation dataset contains images of cracked surfaces (brick walls, concrete roads, lumpy surfaces, etc.), concatenating images from several datasets: Crack 500 Zhang et al. (2016), CrackTree200 Zou et al. (2012), AELLT Amhaz et al. (2016), and others. [Footnote 3: Crack Segmentation dataset is available at …] We also experiment with the PCB Dataset Huang and Wei (2019) for defect detection in precision manufacturing. The dataset contains 3597 training images, 1161 validation images, and 1148 testing images, at various resolutions. The dataset is made from defect-free images to which defects are added, with annotations that include the positions of the six most common types of PCB defects (open, short, mousebite, spur, pin hole, and spurious copper).

Figure 3: 2-D Poincaré Ball Embeddings for the PCB dataset, with SVDD scores level lines. Purple points are normal instances, others are anomalous instances.

In constructing the PCB dataset, PCB images were divided into non-overlapping patches. Patches that contain anomalous pixels were stored in the anomalous set of patches and the remaining patches in the "normal" set. Due to the creation process for this dataset, the "anomaly" set has more elements than the "normal" set. To match the distribution typical of real industrial inspection data, only a random subset of the anomalous samples was retained.

In order to obtain the Poincaré embeddings for the PCB dataset, we adapted the unsupervised anomaly detection method Deep SVDD Ruff et al. (2018) for the Poincaré ball latent space. The Auto-Encoder (AE) used a ResNet-18 (He et al. (2015)) backbone for the encoder, followed by an exponential map and two hyperbolic linear layers to obtain feature vectors in the Poincaré ball manifold; the latent space is 2-dimensional. The decoder was composed of a logarithmic map operator followed by a deconvolutional ResNet-18 as the backbone. The AE was pretrained with the Riemannian Adam optimizer from the Geoopt library (Kochurov et al. (2020)), using one learning rate for the first epochs and a second learning rate for the remaining epochs.

The second step of the Deep SVDD method starts with center-initialization. In the Euclidean case, the initialization is accomplished by averaging all the feature vectors from the training set as output by the trained encoder. In the hyperbolic case, with feature vectors on the Poincaré ball, we computed the gyrobarycenter instead of the Euclidean average. Then, the encoder was fine-tuned with the SVDD objective, formulated for the Poincaré ball manifold, where $c$ denotes the initialized center, $W$ the weights of the encoder, and $\lambda$ the regularization parameter. This training also used the Riemannian Adam optimizer and the optimization parameters from the original paper, for 150 epochs.
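For orientation, a plausible form of this fine-tuning objective is the one-class Deep SVDD loss of Ruff et al. (2018) with the Euclidean distance swapped for the Poincaré distance $d_{\mathbb{B}}$. This is an assumption made for illustration; the symbols $\phi$ (encoder mapping), $n$ (number of training samples), and $L$ (number of layers) are introduced here and are not from the text:

```latex
\min_{W}\; \frac{1}{n}\sum_{i=1}^{n} d_{\mathbb{B}}\big(\phi(x_i; W),\, c\big)^{2}
  \;+\; \frac{\lambda}{2}\sum_{\ell=1}^{L}\big\|W^{\ell}\big\|_{F}^{2}
```

The first term pulls embeddings of normal samples towards the center $c$, and the second is a weight-decay regularizer scaled by $\lambda$.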

Anomaly scores were computed from the hyperbolic distance between the embedding of an input sample $x$ and the center $c$, with $c$ as defined in our previous gyrobarycenter calculation and the embedding given by the mapping learned by the encoder. The radius of the hyperbolic "sphere" was selected similarly to the Deep SVDD paper, as a fixed percentile of the computed scores on the testing set. All samples whose score exceeded this radius were classified as anomalies, as shown in Fig. 3. We applied the hyperbolic UMAP algorithm to produce easily interpretable figures for the Poincaré embeddings, with level lines of the anomaly score function inside the ball.
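The scoring and radius selection can be sketched as follows. This is a minimal illustration that assumes the score is the squared Poincaré distance to the center (by analogy with Deep SVDD) and uses a hypothetical `percentile` argument, since the value used in the experiments is not recoverable from the text.

```python
import numpy as np

def poincare_distance(x, center, c=1.0):
    """Poincare-ball distance between rows of x and a single center point (c = -K > 0)."""
    diff2 = np.sum((x - center) ** 2, axis=-1)
    denom = (1 - c * np.sum(x ** 2, axis=-1)) * (1 - c * np.sum(center ** 2))
    return np.arccosh(1 + 2 * c * diff2 / denom) / np.sqrt(c)

def svdd_scores(embeddings, center, c=1.0):
    """Anomaly score per sample: squared hyperbolic distance to the center (assumption)."""
    return poincare_distance(embeddings, center, c) ** 2

def classify(scores, percentile=95.0):
    """Pick the radius as a percentile of the scores and flag samples beyond it."""
    radius = np.percentile(scores, percentile)
    return scores > radius, radius

# Toy 2-D embeddings: four points near the center and one close to the ball boundary.
embeddings = np.array([[0.0, 0.0], [0.05, 0.0], [0.0, 0.05], [-0.05, 0.0], [0.9, 0.0]])
center = np.zeros(2)

scores = svdd_scores(embeddings, center)
flags, radius = classify(scores, percentile=80.0)
print("scores:", scores)
print("radius:", radius, "flags:", flags)
```

Because hyperbolic distance grows rapidly near the boundary of the ball, points embedded far from the center receive sharply larger scores than a Euclidean norm would suggest.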

Below are results for both visual anomaly detection tasks, on these two datasets:

Datasets              Precision   Recall   F1        IoU
Crack segmentation    0.4206      1.0      0.5921    0.5470
PCB                   0.4514      0.9228   0.6063    0.2942
Table 1: Euclidean, dimension 6

Datasets              Precision   Recall   F1        IoU
Crack segmentation    0.4205      1.0      0.4205    0.5083
PCB                   0.4462      0.9520   0.6076    0.2911
Table 2: Projected Sphere, dimension 6

Datasets              Precision   Recall   F1        IoU
Crack segmentation    0.42055     0.9994   0.59199   0.5087
PCB                   0.4264      0.9530   0.5801    0.2950
Table 3: Poincaré Ball, dimension 6

Overall, all three manifolds perform similarly across datasets. On images with very orthogonal features, the Euclidean manifold seems to perform better, both on the scoring and segmentation tasks. For crack images, which contain very non-linear cracks, the Poincaré ball performs best for the localization tasks.

5.3 Application in Histopathology

Figure 4: Poincaré ball of the learned latent embedding, showing a structure that separates cancerous tissue (top) from normal tissue (bottom) and non-tissue (e.g. surgical material). We also see some semantically meaningful hierarchy develop; the manifold center splits normal and abnormal tissues, and progressing down the branch of cancerous tissue we see patterns such as cohesive lesions (meningioma and metastasis) being arranged close together. Importantly this organization is learned unsupervised. Figure is best viewed in color.

We investigated the applicability of our approach to the challenging task of diagnostic neuropathology, the branch of pathology focused on the microscopic examination of neurosurgical specimens. We experimented with a dataset of H&E-stained whole-slide images (WSI) of a glioblastoma containing a heterogeneous mixture of tumor, necrosis, brain tissue, blood and surgical material. See the Supplement for dataset details.

Manual inspection of WSI to sufficiently search for metastatic cells amongst normal cells is infeasible; a single WSI may contain millions of lymphocytes, nucleotides, and other cells for inspection. Automated and unsupervised computer vision methods could prove invaluable. Moreover, the task of diagnosis can be error-prone and subjective. For one, overlapping patterns of the most common brain tumor types – gliomas, meningiomas, schwannomas, metastases, and lymphomas – present a challenge. In addition, although these five tumor types represent the majority of cases encountered in clinical practice (75–80%), there are over 100 different brain tumor subtypes to be considered, many of which are exceedingly rare et al. (2014). Similarly, new diseases (e.g. Zika encephalitis) continually arise. For these reasons, we hypothesize the latent hierarchical representation learned by our Stereographic Projection VAE can delineate these complex subtypes while providing a continuous trajectory relating any two points.

The experiment setup was as follows: We trained unsupervised on a dataset of 1024 x 1024 pixel images tiled from WSIs, representing eight non-lesional categories (hemorrhage, surgical material, dura, necrosis, blank slide space, and normal cortical gray, white, and cerebellar brain tissue), and the aforementioned five common lesional subtypes (gliomas, meningiomas, schwannomas, metastases, and lymphomas). Fig. 4 shows example patches from each of these categories, displayed across the learned hyperbolic manifold. The Supplement additionally contains WSI examples and data details.

The model used here is the same as in the experiments described earlier, but for a slight change in the $\beta$ parameter: we used a simulated annealing approach to progressively update $\beta$ as a function of the reconstruction loss during training. Some methods simply use a linear increase schedule for $\beta$, but such predefined schedules may be suboptimal. The method is not unlike that of Klushyn et al. (2019b), for which we provide details in the Supplementary materials.

We find that our model learns a latent hierarchical embedding that organizes the distributions of normal tissue, non-tissue materials, and cancerous tissue, shown in Fig. 4. The manifold embedding also reveals interpretable semantics of known and potentially unknown tissue relationships. In the context of unsupervised visual anomaly detection, we learn this latent embedding without labels, and identify cancerous tissues as sparse anomalies that are distributed on Poincaré ball regions opposite normal tissue. If we then train a classifier on the anomalous samples that are discretized by the unsupervised embedding, we achieve a classification performance of > 0.97 across the five disease subtypes, as assessed by the area under the multi-class receiver operating characteristic curve (AUC, mROC); this is consistent with the classification scheme and state-of-the-art results in the fully supervised approach of Faust et al. (2018).

As mentioned earlier, interpolation in the latent space of a Euclidean VAE is possible because, for simple data regimes, the linear interpolation metric closely approximates the true geodesic curves. We suggest this approximation does not necessarily hold when using a non-Euclidean latent space, particularly when the deep generative model is learning in a complex image space where curvature plays a more prominent role. We investigate this by carrying out geodesic interpolations and comparing these with the corresponding linear counterparts in latent space. We use Eqn. 2 to estimate the geodesic curve connecting a given pair of images on the generated manifold, discretizing the curve at 10 points. To get an image on the generated manifold, we pick a real image from the dataset and pass it through the trained model to get the corresponding point on the generated manifold. Fig. 5 shows example linear and geodesic interpolations between the same endpoints on the manifold. We find the linear approximations to be unreliable, counter to prior findings that focused on Euclidean manifolds Shao et al. (2018) and relatively simple images.
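The difference between the two interpolation paths can be sketched in the latent space alone. The snippet below assumes a Poincaré ball with $c = -K > 0$ and uses the standard closed-form geodesic built from Möbius addition and Möbius scalar multiplication. The function names and the two latent endpoints are hypothetical, and decoding the points into images is left out.

```python
import numpy as np

def mobius_add(x, y, c):
    """Mobius addition on the Poincare ball (c = -K > 0)."""
    xy, xx, yy = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * yy) * x + (1 - c * xx) * y
    den = 1 + 2 * c * xy + c ** 2 * xx * yy
    return num / den

def mobius_scalar(t, v, c):
    """Mobius scalar multiplication of v by t."""
    n = np.linalg.norm(v)
    if n == 0.0:
        return v
    return np.tanh(t * np.arctanh(np.sqrt(c) * n)) * v / (np.sqrt(c) * n)

def poincare_distance(x, y, c):
    """Geodesic distance on the Poincare ball."""
    return 2 * np.arctanh(np.sqrt(c) * np.linalg.norm(mobius_add(-x, y, c))) / np.sqrt(c)

def geodesic_path(x, y, c=1.0, steps=10):
    """Points along the geodesic from x to y at evenly spaced parameters in [0, 1]."""
    direction = mobius_add(-x, y, c)
    return np.stack([mobius_add(x, mobius_scalar(t, direction, c), c)
                     for t in np.linspace(0.0, 1.0, steps)])

def linear_path(x, y, steps=10):
    """Straight-line interpolation between x and y in the ambient coordinates."""
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    return (1 - ts) * x + ts * y

x = np.array([0.1, 0.6])      # hypothetical latent code of a "normal" sample
y = np.array([-0.7, 0.2])     # hypothetical latent code of an "anomalous" sample
geo = geodesic_path(x, y, steps=11)
lin = linear_path(x, y, steps=11)
print("geodesic midpoint:", geo[5])
print("linear midpoint:  ", lin[5])
```

On the Poincaré ball the two paths coincide only when the endpoints are collinear with the origin; otherwise the geodesic bends towards the center. The geodesic samples are also evenly spaced in hyperbolic distance, which matters when probing specific points along a long-range interpolation.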

Figure 5: Interpolation along the learned manifold from normal grey matter brain tissue (left) to cancerous glioma tissue (right). The top row represents samples along the geodesic path, while the bottom represents samples from the linear approximation. We find that interpolating along the curved manifold yields intermediary samples that are reasonable tissue images, perhaps representing true intermediate cellular states. Linear interpolation comparatively yields blurred intermediary structures. For example, notice the center-top white gap structure that grows organically as the geodesic samples progress from normal to cancerous. Comparatively, the linear samples on the bottom row show unnatural blending. Note that both rows are sub-sampled from the latent space learned by our Stereographic Projection VAE, but along different interpolation paths. The Euclidean latent space of a β-VAE yielded blurry samples without cellular structures.

Even more, we suggest the linear approximation can yield calibration errors when probing for specific points along an interpolation "arc", particularly with long-range interpolations between sparse data points that arise in anomaly detection settings. We elucidate this in Fig. 5. In scientific endeavors such as interpolating between known and rare brain tumor representations, we need points that reliably lie on the manifold surface such that generated images represent plausible samples.

6 Conclusion

In this paper we explored Riemannian manifolds in deep generative neural networks towards unsupervised visual anomaly detection. Key insights were derived from investigations into specific properties of Riemannian curvatures that best enable natural and efficient embedding of both normal data and sparse anomalous data. To work with such manifolds in the context of Variational Auto-Encoders (VAEs), we derived a gyroplane layer that enables stereographic projections between hyperspherical and hyperbolic latent spaces: a Stereographic Projection VAE. Empirically we found our hypotheses to be valid, and matched state-of-the-art results on real world benchmarks. We also made valuable observations regarding manifold interpolations and sampling, finding linear approximations of geodesic curves to be unreliable. In the challenging domain of neuropathology, our model learns a latent hierarchical organization of brain cancer subtypes and other tissues, despite not using labels. Using Poincaré mapping we effectively interpolate across the manifold, yielding reliable intermediate samples from the manifold. Without this capability, a deep generative model does not necessarily satisfy the manifold hypothesis for natural images. Future work would be to continue development of our approach in histopathology, as this can be a valuable decision support tool for pathologists, and can theoretically be applied to diverse tissue and disease classes. We would also like to continue working with Poincaré mapping and other methods that can derive insights directly from the rich latent space.

References


  • S. Akçay, A. A. Abarghouei, and T. P. Breckon (2018) GANomaly: semi-supervised anomaly detection via adversarial training. ArXiv abs/1805.06725. Cited by: §4.
  • A. A. Alemi, B. Poole, I. S. Fischer, J. V. Dillon, R. A. Saurous, and K. Murphy (2018) Fixing a broken elbo. In ICML, Cited by: §D.
  • R. Amhaz, S. Chambon, J. Idier, and V. Baltazart (2016) Automatic crack detection on two-dimensional pavement images: an algorithm based on minimal path selection. IEEE Transactions on Intelligent Transportation Systems 17, pp. 2718–2729. Cited by: §5.2.
  • J. An and S. Cho (2015) Variational autoencoder based anomaly detection using reconstruction probability. Cited by: §1.
  • G. Arvanitidis, L. K. Hansen, and S. Hauberg (2016) A locally adaptive normal distribution. External Links: 1606.02518 Cited by: §2.2.
  • M. Bauer and A. Mnih (2018) Resampled priors for variational autoencoders. External Links: 1810.11428 Cited by: §4.
  • C. M. Bishop (2006) Pattern recognition and machine learning (information science and statistics). Springer-Verlag, Berlin, Heidelberg. External Links: ISBN 0387310738 Cited by: §2.2.
  • V. Chandola, A. Banerjee, and V. Kumar (2009) Anomaly detection: a survey. ACM Comput. Surv. 41, pp. 15:1–15:58. Cited by: §4.
  • T. R. Davidson, L. Falorsi, N. D. Cao, T. Kipf, and J. M. Tomczak (2018) Hyperspherical variational auto-encoders. In UAI, Cited by: §1, §2.2, §2.2, §4.
  • L. Deecke, R. A. Vandermeulen, L. Ruff, S. Mandt, and M. Kloft (2018) Image anomaly detection with generative adversarial networks. In ECML/PKDD, Cited by: §1.
  • Q. O. et al. (2014) CBTRUS statistical report: primary brain and central nervous system tumors diagnosed in the united states in 2007-2011. Neuro Oncol 16 Suppl 4, pp. iv1–63. Cited by: §5.3.
  • L. Falorsi, P. de Haan, T. R. Davidson, N. D. Cao, M. Weiler, P. Forré, and T. S. Cohen (2018) Explorations in homeomorphic variational auto-encoding. External Links: 1807.04689 Cited by: §2.2, §4.
  • K. Faust, Q. Xie, D. Han, K. Goyle, Z. I. Volynskaya, U. Djuric, and P. Diamandis (2018) Visualizing histopathologic deep learning classification and anomaly detection using nonlinear feature space dimensionality reduction. BMC Bioinformatics 19. Cited by: §C, §C, §5.3.
  • O. Ganea, G. Bécigneul, and T. Hofmann (2018) Hyperbolic neural networks. External Links: 1805.09112 Cited by: §B, §3, §3.1, §3.1.
  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio (2014) Generative adversarial nets. In NIPS, Cited by: §1, §2.2.
  • D. Grattarola, L. Livi, and C. Alippi (2019) Adversarial autoencoders with constant-curvature latent manifolds. Appl. Soft Comput. 81. Cited by: §4.
  • B. Haney and A. Lavin (2020) Fine-grain few-shot vision via domain knowledge as hyperspherical priors. In CVPR Workshop on Fine-grained Categorization, Cited by: §1.
  • S. Hauberg (2018) Only bayes should learn a manifold (on the estimation of differential geometric structure from data). External Links: 1806.04994 Cited by: §2.2.
  • K. He, X. Zhang, S. Ren, and J. Sun (2015) Deep residual learning for image recognition. CoRR abs/1512.03385. External Links: Link, 1512.03385 Cited by: §5.2.
  • I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. M. Botvinick, S. Mohamed, and A. Lerchner (2017) Beta-vae: learning basic visual concepts with a constrained variational framework. In ICLR, Cited by: §D, §3.
  • W. Huang and P. Wei (2019) A pcb dataset for defects detection and classification. External Links: 1901.08204 Cited by: §5.2.
  • D. Kalatzis, D. Eklund, G. Arvanitidis, and S. Hauberg (2020) Variational autoencoders with riemannian brownian motion priors. External Links: 2002.05227 Cited by: §2.2.
  • J. Kim and K. Grauman (2009) Observe locally, infer globally: a space-time mrf for detecting abnormal activities with incremental updates. In CVPR, Cited by: §4.
  • D. P. Kingma and M. Welling (2014) Auto-encoding variational bayes. CoRR abs/1312.6114. Cited by: §1, §2.2, §4.
  • A. Klimovskaia, D. Lopez-Paz, L. Bottou, and M. Nickel (2019) Poincaré maps for analyzing complex hierarchies in single-cell data. bioRxiv. Cited by: Figure 2.
  • A. Klushyn, N. Chen, R. Kurle, B. Cseke, and P. van der Smagt (2019a) Learning hierarchical priors in vaes. External Links: 1905.04982 Cited by: §4.
  • A. Klushyn, N. Chen, R. Kurle, B. Cseke, and P. van der Smagt (2019b) Learning hierarchical priors in vaes. ArXiv abs/1905.04982. Cited by: §5.3.
  • M. Kochurov, R. Karimov, and S. Kozlukov (2020) Geoopt: riemannian optimization in pytorch. External Links: 2005.02819 Cited by: §5.2.
  • W. Luo, W. Liu, and S. Gao (2017) A revisit of sparse coding based anomaly detection in stacked rnn framework. 2017 IEEE International Conference on Computer Vision (ICCV), pp. 341–349. Cited by: §4.
  • E. Mathieu, C. L. Lan, C. J. Maddison, R. Tomioka, and Y. W. Teh (2019) Continuous hierarchical representations with poincaré variational auto-encoders. In NeurIPS, Cited by: §1, §2.2, §3, §3.1, §3.1, §4.
  • F. D. Mattia, P. Galeone, M. D. Simoni, and E. Ghelfi (2019) A survey on gans for anomaly detection. ArXiv abs/1906.11632. Cited by: §4.
  • P. Mettes, E. van der Pol, and C. G. M. Snoek (2019) Hyperspherical prototype networks. In NeurIPS, Cited by: §1.
  • M. Nickel and D. Kiela (2017) Poincaré embeddings for learning hierarchical representations. ArXiv abs/1705.08039. Cited by: §1.
  • I. Ovinnikov (2019) Poincaré wasserstein autoencoder. ArXiv abs/1901.01427. Cited by: §4.
  • S. Pidhorskyi, R. Almohsen, D. A. Adjeroh, and G. Doretto (2018) Generative probabilistic novelty detection with adversarial autoencoders. In NeurIPS, Cited by: §1.
  • M. A. F. Pimentel, D. A. Clifton, L. A. Clifton, and L. Tarassenko (2014) A review of novelty detection. Signal Process. 99, pp. 215–249. Cited by: §4.
  • A. Radford, L. Metz, and S. Chintala (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434. Cited by: §2.2.
  • D. J. Rezende, S. Mohamed, and D. Wierstra (2014) Stochastic backpropagation and approximate inference in deep generative models. In ICML, Cited by: §1, §2.2, §4.
  • L. Ruff, N. Görnitz, L. Deecke, S. A. Siddiqui, R. A. Vandermeulen, A. Binder, E. Müller, and M. Kloft (2018) Deep one-class classification. In ICML, Cited by: §5.2.
  • T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In IPMI, Cited by: §1, §4.
  • H. Shao, A. Kumar, and P. T. Fletcher (2018) The riemannian geometry of deep generative models. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 428–4288. Cited by: §2.1, §5.3.
  • O. Skopek, O. Ganea, and G. Bécigneul (2019) Mixed-curvature variational autoencoders. arXiv preprint arXiv:1911.08411. Cited by: §3, §4, footnote 2.
  • A. Tifrea, G. Becigneul, and O. Ganea (2019) Poincare glove: hyperbolic word embeddings. In International Conference on Learning Representations, Cited by: §1.
  • J. M. Tomczak and M. Welling (2017) VAE with a vampprior. CoRR abs/1705.07120. External Links: Link, 1705.07120 Cited by: §4.
  • A. Ungar (2009) A gyrovector space approach to hyperbolic geometry. Vol. 1. External Links: Document Cited by: §B.1, 4.Definition.
  • A. van den Oord, O. Vinyals, and K. Kavukcuoglu (2017) Neural discrete representation learning. External Links: 1711.00937 Cited by: §4.
  • H. Zenati, C. S. Foo, B. Lecouat, G. Manek, and V. R. Chandrasekhar (2018) Efficient gan-based anomaly detection. ArXiv abs/1802.06222. Cited by: §4.
  • S. Zhai, Y. Cheng, W. Lu, and Z. Zhang (2016) Deep structured energy based models for anomaly detection. ArXiv abs/1605.07717. Cited by: §4.
  • L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu (2016) Road crack detection using deep convolutional neural network. In Image Processing (ICIP), 2016 IEEE International Conference on, pp. 3708–3712. Cited by: §5.2.
  • C. Zhou and R. C. Paffenroth (2017) Anomaly detection with robust deep autoencoders. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Cited by: §4.
  • J. Zhu, P. Krähenbühl, E. Shechtman, and A. A. Efros (2016) Generative visual manipulation on the natural image manifold. Lecture Notes in Computer Science, pp. 597–613. Cited by: §2.2.
  • B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen (2018) Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In ICLR, Cited by: §1, §4.
  • Q. Zou, Y. Cao, Q. Li, Q. Mao, and S. Wang (2012) CrackTree: automatic crack detection from pavement images. Pattern Recognition Letters 33 (3), pp. 227–238. Cited by: §5.2.

Supplementary Materials

A Notations

  • : height and width of input images

  • : input image space, included in , considered Euclidean

  • : latent dimension

  • : latent space, included in , with

  • : probability distribution of the data

  • : random variable drawn in input space

  • : prior probability

  • : posterior probability distribution, learned by the encoder

  • : likelihood distribution, learned by the decoder

B Gyroplane layer derivation

Here we provide the proof of the gyroplane layer; it is similar to the one in Ganea et al. [2018], with the following expression:

The distance of a point to takes the form:


B.1 A few definitions

Here are some definitions of notions we are going to use in the proof. They come from Riemannian geometry and Ungar [2009].

is a -dimensional manifold. The tangent space at a point is noted . The Riemannian metric is a set of inner products , varying in a smooth manner with .

Definition .1 (Möbius addition).

The Möbius addition is defined as follows, for two points :


It is worth noting that this addition is neither commutative nor associative, but it does have the following properties, :

  • , (left cancellation law)

The Möbius subtraction is simply defined as .
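As a numerical illustration of these properties, here is a minimal sketch that assumes the curvature-parametrized form of Möbius addition used for constant-curvature stereographic models (a signed curvature `kappa`, positive for the projected sphere and negative for the Poincaré ball). This parametrization and the function names are assumptions made for illustration, since the expression itself is not reproduced above. The sketch checks the left cancellation law and shows that the operation is not commutative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mobius_add(x, y, kappa):
    """Möbius addition for signed curvature kappa (assumed parametrization)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 - 2 * kappa * xy + kappa ** 2 * x2 * y2
    a = (1 - 2 * kappa * xy - kappa * y2) / den
    b = (1 + kappa * x2) / den
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def mobius_sub(x, y, kappa):
    """Möbius subtraction: x (+) (-y)."""
    return mobius_add(x, [-yi for yi in y], kappa)

def left_cancellation_error(x, y, kappa):
    """Largest coordinate gap between (-x) (+) (x (+) y) and y."""
    neg_x = [-xi for xi in x]
    recovered = mobius_add(neg_x, mobius_add(x, y, kappa), kappa)
    return max(abs(r - yi) for r, yi in zip(recovered, y))

x, y = [0.3, -0.2], [0.1, 0.4]
errors = {kappa: left_cancellation_error(x, y, kappa) for kappa in (1.0, -1.0)}
xy_sum = mobius_add(x, y, 1.0)
yx_sum = mobius_add(y, x, 1.0)

print("left cancellation errors:", errors)
print("x (+) y =", xy_sum)
print("y (+) x =", yx_sum)
```

The two sums differ as vectors but have the same norm, which is the usual gyrocommutative behaviour: swapping the operands only rotates the result.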

Definition .2 (Gyroangle).

For , we will denote by the angle between the two geodesics starting from and ending at and respectively. This angle, named the gyroangle can be defined either by the angle between the two initial velocities of each geodesic, and :


Or as:

Definition .3.

The Gyrodistance between two points is defined as: .

Definition .4.

In Ungar [2009], Ungar defined a Gyroline as: , , , where is the Möbius scalar multiplication in the Gyrogroup , defined as:


So the geodesic , with , that satisfies the following constraints: and . If we use this definition, and do a reparametrization using the gyrodistance definition to make this geodesic of constant speed, we obtain that the unit speed geodesic starting at with direction is:


with , that satisfies the following constraints: and .

Definition .5 (Log Map).

The Log Map of the stereographically projected sphere is defined, for and in :

Definition .6 (Exp Map).

The Exp Map of the stereographically projected sphere is defined, for and :
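The two maps can be checked against each other numerically. The sketch below assumes the standard forms for a stereographically projected sphere of curvature K > 0, namely the Poincaré-ball expressions with tanh and artanh replaced by tan and arctan and with conformal factor 2 / (1 + K‖x‖²). These forms and all names are assumptions for illustration, since the expressions are not reproduced above. It verifies that the log map undoes the exp map at a base point.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def mobius_add(x, y, K):
    """Möbius addition on the projected sphere of curvature K > 0 (assumed form)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 - 2 * K * xy + K * K * x2 * y2
    a = (1 - 2 * K * xy - K * y2) / den
    b = (1 + K * x2) / den
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def conformal_factor(x, K):
    return 2 / (1 + K * dot(x, x))

def exp_map(x, v, K):
    """Map a tangent vector v at x onto the manifold."""
    n = norm(v)
    if n == 0:
        return list(x)
    s = math.tan(math.sqrt(K) * conformal_factor(x, K) * n / 2) / (math.sqrt(K) * n)
    return mobius_add(x, [s * vi for vi in v], K)

def log_map(x, y, K):
    """Map a manifold point y back to the tangent space at x."""
    w = mobius_add([-xi for xi in x], y, K)
    n = norm(w)
    if n == 0:
        return [0.0] * len(x)
    s = 2 / (math.sqrt(K) * conformal_factor(x, K)) * math.atan(math.sqrt(K) * n) / n
    return [s * wi for wi in w]

# Hypothetical base point and tangent vector, short enough to stay on one branch of tan.
K = 1.0
x, v = [0.2, -0.1], [0.3, 0.5]
y = exp_map(x, v, K)
v_back = log_map(x, y, K)
round_trip_error = max(abs(a - b) for a, b in zip(v, v_back))

print("exp_x(v) =", y)
print("round-trip error:", round_trip_error)
```

The round trip is exact up to floating point as long as the argument of the tangent stays below π/2. Longer tangent vectors wrap around the sphere and are not recovered by the arctangent.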


B.2 Stereographically projected sphere hyperplane

We now define a hyperplane in the stereographically projected sphere; the final expression needs justification, so the proof follows the definition.

Definition .7 (Stereographically projected sphere hyperplane).

For a point , a point , let . Since and , we have that: . The hyperplane in the stereographically projected sphere is defined as:


If , by definition of the tangent space.

If , since is a strictly increasing function on , it is a bijection from to . For a fixed , is also a bijection so , which proves equality 14.

Still in the case of :


16 is obtained by definition of the logarithm map of . Since , we obtain 17. Finally, since (), then , so we obtain 18. Lastly, if , 15 is still true, which completes the proof. ∎

B.3 Distance to hyperplane

We proceed in four steps. First, we prove the existence and uniqueness of the orthogonal projection of a point in the manifold onto a geodesic that does not pass through this point. Second, we prove that this projection minimizes the distance between the point and the geodesic. Third, we prove that geodesics passing through two points of a hyperplane belong entirely to that hyperplane. Finally, we derive the explicit expression of this distance.

Existence and uniqueness of an orthogonal projection on a geodesic

Thanks to the preliminary theorem, we know that the orthogonal projection of a point on a given geodesic of that does not include exists and is unique.

Minimizing distance between a point and a geodesic

This projection minimizes the distance between the point and the geodesic , since the hypotenuse in a spherical right triangle is strictly longer than the two other sides of the triangle (constant curvature space sine law).

Geodesics in

Let be a hyperplane in the stereographically projected sphere. Let be a point in , such that . Let us consider the geodesic (it exists since ). As we have previously seen, this geodesic is of the form, :


We want to see if, , belongs to , that is to say: :


We obtain 20 by the use of the left cancellation law; then, by definition of the Möbius scalar product, we obtain 21. And we finally obtain 22 since .

Distance to hyperplane expression

Let be a point in , and be a hyperplane in . Let us denote the point that minimizes (it exists since is continuous in both variables and bounded below). For a , then , and form a gyrotriangle in , noted . Let us suppose now that , then (by B.3).

From now on, in order to find , we are hence going to consider points such that (we know in the set ). By applying the constant curvature sine law in the right triangle , we obtain:


But we have that:


By using the trigonometric property : , we obtain that:


and does not depend on .

The term inside the that influences the minimization of is . Since , minimizing is equivalent to maximizing , to which we know the expression:


It is quite straightforward to prove that . Since , we have that: . Since , then . We know that , so the real number . We can deduce that:


And our optimization problem becomes:


Which is a Euclidean optimization problem, whose solution is well known; we obtain:


So, exists by construction, and by re-injecting the expression of in , we obtain the expression in 5.
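The closed form can be sanity-checked numerically. The sketch below assumes the spherical analogue of the hyperplane distance of Ganea et al. [2018] for curvature K > 0, with arcsin in place of arsinh and 1 + K‖·‖² in place of 1 − c‖·‖². That form, the offset `p`, the normal `a`, and the function names are assumptions for illustration, since the final expression is not reproduced in this section. Two cases are tested: a point reached from `p` along the normal direction, whose distance to the hyperplane should equal its gyrodistance to `p`, and a point reached along a direction orthogonal to `a`, which should lie on the hyperplane.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def mobius_add(x, y, K):
    """Möbius addition on the projected sphere of curvature K > 0 (assumed form)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 - 2 * K * xy + K * K * x2 * y2
    a = (1 - 2 * K * xy - K * y2) / den
    b = (1 + K * x2) / den
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def dist(x, y, K):
    """Gyrodistance on the projected sphere."""
    w = mobius_add([-xi for xi in x], y, K)
    return 2 / math.sqrt(K) * math.atan(math.sqrt(K) * norm(w))

def hyperplane_distance(x, p, a, K):
    """Distance from x to the hyperplane through p with normal a (assumed closed form)."""
    w = mobius_add([-pi for pi in p], x, K)
    arg = 2 * math.sqrt(K) * abs(dot(w, a)) / ((1 + K * dot(w, w)) * norm(a))
    return math.asin(min(1.0, arg)) / math.sqrt(K)  # clamp guards against rounding

def scaled(v, length):
    n = norm(v)
    return [length * vi / n for vi in v]

K = 1.0
p, a = [0.1, 0.2], [1.0, -0.5]
a_perp = [0.5, 1.0]  # orthogonal to a in the Euclidean sense

x_normal = mobius_add(p, scaled(a, 0.3), K)        # leaves p along the normal
x_on_plane = mobius_add(p, scaled(a_perp, 0.3), K)  # stays on the hyperplane

d_normal = hyperplane_distance(x_normal, p, a, K)
d_on_plane = hyperplane_distance(x_on_plane, p, a, K)

print("distance along normal:", d_normal, "gyrodistance to p:", dist(p, x_normal, K))
print("distance of on-plane point:", d_on_plane)
```

The agreement in the first case follows from the double-angle identity sin(2θ) = 2 tan θ / (1 + tan² θ), and holds while the point is less than a quarter of a great circle away from `p`.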

C Datasets details


We use the digitized brain tissue dataset as described in Faust et al. [2018]: brain tissue slides were digitized into whole-slide images (WSI), as shown in Fig. 6. Each WSI is tiled into image patches of 1024 x 1024 pixels (0.504 µm per pixel, roughly 516 µm per side) to carry out training and inference, a tile size over 10 times larger than most other approaches. This larger size was chosen because it contains multiple levels of morphologic detail (single cell-level and overall tumor structure) without significantly affecting computation times. The size of WSI varies, most with length and width of approximately 50,000 pixels. In the construction of the dataset by Faust et al. [2018], all samples were anonymized and annotations carried out by board-certified pathologists. Slides were annotated with eight non-lesional categories (hemorrhage, surgical material, dura, necrosis, blank slide space, and normal cortical gray, white, and cerebellar brain tissue), and five common lesional subtypes (gliomas, meningiomas, schwannomas, metastases, and lymphomas). The training image dataset and WSI testing cases are available for download in the Zenodo repository at and, respectively. Below in Table S1 we break down the tissue types in the dataset.
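The tiling arithmetic can be sketched as follows. This is an illustration only: it assumes non-overlapping tiles with partial edge tiles discarded, a square WSI of exactly 50,000 pixels per side, and that the resolution is given in micrometres per pixel. None of these details is confirmed by the text, and the function name is hypothetical.

```python
def tile_origins(width, height, tile=1024):
    """Top-left corners of non-overlapping full tiles; partial edge tiles are dropped."""
    return [
        (x, y)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]

MICRONS_PER_PIXEL = 0.504  # assumed unit for the quoted resolution

origins = tile_origins(50_000, 50_000)
tile_side_um = 1024 * MICRONS_PER_PIXEL

print(len(origins), "tiles per slide")
print(round(tile_side_um, 1), "micrometres per tile side")
```

Under these assumptions a typical slide yields 48 x 48 = 2304 tiles, which is in line with the several thousand tiles per WSI mentioned in the caption of Figure 6.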

Table S1. Distribution of tissue types and images used for training. These numbers represent the aforementioned 1024 x 1024 pixel images. Reproduced from Faust et al. [2018].

Figure 6: Two example H&E-stained whole-slide images (WSI) of glioblastoma from the neuropathology test set, each containing a heterogeneous mixture of tumor, necrosis, brain tissue, blood, and surgical material. Each WSI contains several thousand 1024 x 1024 tiles, as shown in Fig. 4.

D Model hyperparameters and setup

For all experiments we use β-VAE Higgins et al. [2017], a variant of VAE that applies a scalar weight β to the KL term in the objective function. In the histopathology experiments we found stronger reconstruction results with a weighting schedule applied to the KL term of the ELBO. This is because a different ratio targets different regions in the rate-distortion plane, either favouring better compression or reconstruction Alemi et al. [2018].

We start with to enforce a reconstruction optimization. When the average reconstruction error hits a predefined parameter we initiate the following update scheme:


where is the update’s learning rate.
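Since the update rule itself is not reproduced above, the following sketch shows only the general shape of such a schedule and is not the rule used in the experiments. It assumes a small hypothetical starting weight, a hypothetical trigger threshold `target` on the reconstruction error, and a multiplicative exponential update with learning rate `nu`; every name and constant here is illustrative.

```python
import math

def update_beta(beta, recon_error, target, nu=0.1, beta_max=1.0):
    """One multiplicative step: beta grows while the error is below target, shrinks otherwise."""
    return min(beta_max, beta * math.exp(nu * (target - recon_error)))

def run_schedule(errors, target, beta_start=1e-3, nu=0.1):
    """Hold beta at its starting value until the error first reaches target, then update it."""
    beta, active, history = beta_start, False, []
    for err in errors:
        active = active or err <= target  # the trigger stays on once it has fired
        if active:
            beta = update_beta(beta, err, target, nu)
        history.append(beta)
    return history

# Hypothetical per-epoch average reconstruction errors.
history = run_schedule([5.0, 3.0, 1.0, 0.5, 0.5], target=1.0)
print(history)
```

The weight stays at its starting value during the first two epochs, the trigger fires at the third, and the KL weight then increases while the reconstruction error remains below the threshold.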

For the experiments in Section 5.2, the backbone encoder consisted of four convolutional layers (2D Convolution + Batch Normalization + Leaky ReLU), with a hidden Euclidean dimension of . The optimization was done with the Adam optimizer, with a constant learning rate of , and a batch size of . The maximum number of epochs was set to , with an early stopping mechanism, with warm-up epochs and epochs for the lookahead.