I Introduction
Unsupervised learning is generally considered one of the greatest challenges of machine learning research. In recent years, there has been great progress in modeling data distributions using deep generative models [1, 2], and while this progress has influenced the clustering literature, its full potential has yet to be reached.
Consider a latent variable model
(1) $p(\mathbf{x}) = \int p(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z},$
where latent variables $\mathbf{z} \in \mathbb{R}^d$ provide a low-dimensional representation of the data $\mathbf{x} \in \mathbb{R}^D$. In general, the prior $p(\mathbf{z})$ determines whether clustering of the latent variables is successful; e.g. the common Gaussian prior, $p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \mathbb{I}_d)$, tends to move clusters closer together, making post hoc clustering difficult (see Fig. 1). This problem is particularly evident in deep generative models such as variational autoencoders (VAEs) [3, 4] that pick
(2) $p_\theta(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}\big(\mathbf{x} \mid \mu_\theta(\mathbf{z}),\, \mathbb{I}_D \sigma^2_\theta(\mathbf{z})\big),$
where the mean $\mu_\theta$ and variance $\sigma^2_\theta$ are parametrized by deep neural networks with parameters $\theta$. The flexibility of such networks ensures that the latent variables can be made to follow almost any prior $p(\mathbf{z})$, implying that the latent variables can be forced to show almost any structure, including ones not present in the data. This does not influence the distribution $p(\mathbf{x})$, but it can be detrimental for clustering.

These concerns indicate that one should be very careful when computing distances in the latent space of deep generative models. As these models (informally) span a manifold embedded in the data space, one can consider measuring distances along this manifold; an idea that shares intuitions with classic approaches such as spectral clustering [5]. Arvanitidis et al. [6] have recently shown that measuring along the data manifold associated with a deep generative model can be achieved by endowing the latent space with a Riemannian metric and measuring distances accordingly. Unfortunately, the approach of Arvanitidis et al. requires numerical solutions to a system of ordinary differential equations, which cannot be readily evaluated using standard frameworks for deep learning (see Sec. II). In this paper, we propose an efficient algorithm for evaluating these distances and demonstrate its usefulness for clustering tasks.

II Related Work
Clustering, as a fundamental problem in machine learning, depends highly on the quality of the data representation. Recently, deep neural networks have become useful for learning clustering-friendly representations. We see four categories of work based on network structure: autoencoders (AE), deep neural networks (DNN), generative adversarial networks (GAN) and variational autoencoders (VAE).
In AE-based methods, Deep Clustering Networks [7] directly combine the loss functions of autoencoders and k-means, while Deep Embedding Network [8] also revised the loss function by adding locality-preserving and group-sparsity constraints to guide the network toward clustering. Deep Multi-Manifold Clustering [9] introduced a manifold locality-preserving loss and proximity to cluster centroids, while Deep Embedded Regularized Clustering [10] established a nontrivial structure of convolutional and softmax autoencoders and proposed an entropy loss function with clustering regularization. Deep Continuous Clustering [11] inherited the continuity property of Robust Continuous Clustering [12], a formulation with a clear continuous objective and no prior knowledge of the number of clusters, to integrate network parameter learning and clustering altogether. The AE-based methods are easy to implement, but introduce hyperparameters to the loss and are very limited in network depth.

In DNN-based methods, the networks can be very flexible, such as Convolutional Neural Networks [13] or Deep Belief Networks [14], and often involve pretraining and finetuning stages. Deep Nonparametric Clustering [15] and Deep Embedded Clustering [16] are representative works. Since the result is sensitive to network initialization, Clustering Convolutional Neural Networks [17] were proposed with initial cluster centroids. To get rid of pretraining, Joint Unsupervised Learning [18] and Deep Adaptive Image Clustering [19] were proposed for hierarchical clustering and binary image relationships, respectively.
In VAE-based methods, because the VAE is a generative model, Variational Deep Embedding [20] and Gaussian Mixture VAE [21] designed special prior distributions over the latent representation and inferred data classes corresponding to the modes of the different priors.
In GAN-based methods, building on another generative model, Deep Adversarial Clustering [22] was inspired by the ideas behind Variational Deep Embedding [20], but with a GAN structure. Information Maximizing Generative Adversarial Networks [23] can disentangle both discrete and continuous latent representations, and model a clustering function through categorical latent codes. AE-based and DNN-based methods are designed specifically for clustering but do not consider the underlying structure of the data, and consequently cannot generate data. VAE-based and GAN-based methods can generate samples and infer the structure of the data, but since they alter the latent space for the sake of clustering, they may adversely affect the true intrinsic structure of the data.
Our work is based on a recent observation that deep generative models immerse random Riemannian manifolds [6]. This implies a change in the way distances are measured in the latent space, which reveals a clustering structure. Unfortunately, practical algorithms for actually computing such distances have been missing, and providing one is the main focus of the present paper. With such an algorithm in hand, clustering can be performed with high accuracy in the latent space of an off-the-shelf VAE.
The paper is organized as follows: Sec. III introduces the usual VAE network, along with its interpretation as a stochastic Riemannian manifold. In Sec. IV we derive an efficient algorithm for computing geodesics (shortest paths) over this manifold, and in Sec. V we demonstrate its usefulness for clustering tasks. The paper is concluded in Sec. VI.
III Background on Variational Autoencoders
Deep generative modeling is an area of machine learning concerned with models of distributions $p(\mathbf{x})$ in a potentially high-dimensional space $\mathcal{X}$. Deep generative models can capture data dependencies by learning low-dimensional latent variables $\mathbf{z}$ that form a latent space $\mathcal{Z}$. In recent years, the variational autoencoder (VAE) has emerged as one of the most popular deep generative models because it can be built on top of deep neural networks and trained with stochastic gradient descent. The VAE aims to maximize the probability of data samples $\mathbf{x}$ generated as
(3) $p(\mathbf{x}) = \int p_\theta(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z}.$
Here, the latent variables $\mathbf{z}$ are sampled according to a probability density function $p(\mathbf{z})$ defined over $\mathcal{Z}$, and the distribution $p_\theta(\mathbf{x} \mid \mathbf{z})$ denotes the likelihood parametrized by $\theta$. In VAEs, $p_\theta(\mathbf{x} \mid \mathbf{z})$ is often Gaussian,
(4) $p_\theta(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}\big(\mathbf{x} \mid \mu_\theta(\mathbf{z}),\, \mathbb{I}_D \sigma^2_\theta(\mathbf{z})\big),$
where $\mu_\theta$ is the mean function and $\sigma^2_\theta$ is the covariance function.
III-A Inference and Generator
The VAE consists of two parts, an inference network and a generator network, which serve almost the same roles as the encoder and decoder in classic autoencoders.
III-A1 The inference network
is trained to map training data samples $\mathbf{x}$ to the latent space while forcing the latent variables to comply with the prior $p(\mathbf{z})$. However, the true posterior $p(\mathbf{z} \mid \mathbf{x})$ is intractable. Therefore, the VAE approximates it with a variational distribution $q_\phi(\mathbf{z} \mid \mathbf{x})$, computed by a network with parameters $\phi$. In order to make $q_\phi(\mathbf{z} \mid \mathbf{x})$ accord with the prior $p(\mathbf{z})$, the Kullback–Leibler (KL) divergence [24] is minimized, that is:
(5) $\mathrm{KL}\big(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\big) = \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\!\left[\log \frac{q_\phi(\mathbf{z} \mid \mathbf{x})}{p(\mathbf{z})}\right].$
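For the usual diagonal-Gaussian inference network and standard normal prior, the KL term in Eq. (5) has a well-known closed form. The sketch below evaluates it (the function name is ours, not from the paper):

```python
import math

def kl_diag_gaussian_to_standard(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ): the standard closed form
    0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2) used to regularize the
    VAE inference network toward the prior."""
    return sum(0.5 * (s * s + m * m - 1.0 - math.log(s * s))
               for m, s in zip(mu, sigma))

# The divergence vanishes exactly when q already matches the prior.
print(kl_diag_gaussian_to_standard([0.0, 0.0], [1.0, 1.0]))  # 0.0
```

The term is zero only when the variational posterior equals the prior, and grows as the posterior drifts away from it.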
III-A2 The generator network
is trained to map latent variables $\mathbf{z}$ to generated samples that resemble true samples from the data space $\mathcal{X}$. Accordingly, this network seeks to maximize the marginal likelihood of the data over the latent space; in practice the logarithm is used, and the likelihood $p_\theta(\mathbf{x} \mid \mathbf{z})$ is computed by a multilayer network with parameters $\theta$:
(6) $\mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\big[\log p_\theta(\mathbf{x} \mid \mathbf{z})\big].$
From these parts, a VAE is jointly trained by maximizing the evidence lower bound
(7) $\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\big[\log p_\theta(\mathbf{x} \mid \mathbf{z})\big] - \mathrm{KL}\big(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\big).$
III-B The Random Riemannian Interpretation
The inference network should force the latent variables to approximately follow the prespecified unit Gaussian prior $p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \mathbb{I}_d)$, which implies that the latent space gives a highly distorted view of the original data. Fortunately, this distortion is fairly easy to characterize [6]. First, observe that the generative model of the VAE can be written as (using the so-called reparametrization trick; see also Fig. 2)
(8) $f(\mathbf{z}) = \mu_\theta(\mathbf{z}) + \sigma_\theta(\mathbf{z}) \odot \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbb{I}_D).$
Now let $\mathbf{z}$ be a latent variable and let $\delta$ be infinitesimal. Then we can measure the distance between $\mathbf{z}$ and $\mathbf{z} + \delta$ in the input space using Taylor's theorem,
(9) $\|f(\mathbf{z} + \delta) - f(\mathbf{z})\|^2 \approx \|\mathbf{J}_{\mathbf{z}}\,\delta\|^2$
(10) $= \delta^\top \big(\mathbf{J}_{\mathbf{z}}^\top \mathbf{J}_{\mathbf{z}}\big)\,\delta,$
where $\mathbf{J}_{\mathbf{z}} = \partial f / \partial \mathbf{z}$ denotes the Jacobian of $f$ at $\mathbf{z}$. This implies that $\mathbf{J}_{\mathbf{z}}^\top \mathbf{J}_{\mathbf{z}}$ defines a local inner product under which we can define curve lengths through integration,
(11) $\mathrm{Length}(\gamma) = \int_0^1 \big\|\mathbf{J}_{\gamma(t)}\,\dot{\gamma}(t)\big\|\,\mathrm{d}t.$
Here $\gamma : [0, 1] \to \mathcal{Z}$ is a curve in the latent space and $\dot{\gamma}$ is its velocity. Distances can then be defined as the length of the shortest curve (geodesic) connecting two points,
(12) $\mathrm{dist}(\mathbf{z}_0, \mathbf{z}_1) = \min_{\gamma} \mathrm{Length}(\gamma),$
(13) $\text{subject to}\quad \gamma(0) = \mathbf{z}_0, \quad \gamma(1) = \mathbf{z}_1.$
This is the traditional Riemannian analysis associated with embedded surfaces [25]. From this, it is well known that length-minimizing curves are also minimizers of the energy
(14) $E(\gamma) = \int_0^1 \big\|\mathbf{J}_{\gamma(t)}\,\dot{\gamma}(t)\big\|^2\,\mathrm{d}t,$
which is easier to optimize than Eq. 11.
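The step from length to energy deserves a one-line justification (a standard Riemannian argument [25]): by the Cauchy–Schwarz inequality,

```latex
\mathrm{Length}(\gamma)^2
  = \left( \int_0^1 \big\| \mathbf{J}_{\gamma(t)}\,\dot{\gamma}(t) \big\| \,\mathrm{d}t \right)^{\!2}
  \le \int_0^1 \big\| \mathbf{J}_{\gamma(t)}\,\dot{\gamma}(t) \big\|^2 \,\mathrm{d}t
  = E(\gamma),
```

with equality exactly when the curve has constant speed. Energy minimizers are therefore constant-speed length minimizers, which is why optimizing Eq. 14 recovers geodesics.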
For generative models, the analysis is complicated by the fact that $f$ is a stochastic mapping, implying that the Jacobian is stochastic, geodesics are stochastic, distances are stochastic, and so forth. Arvanitidis et al. [6] propose to replace the stochastic metric with its expectation, which is equivalent to minimizing the expected energy [26]. While this is shown to work well, the practical algorithm proposed by Arvanitidis et al. amounts to numerically solving a nonlinear differential equation, which requires us to evaluate both the Jacobian $\mathbf{J}_{\mathbf{z}}$ and its derivatives. Unfortunately, modern deep learning frameworks such as TensorFlow rely on reverse-mode automatic differentiation [27], which does not readily provide full Jacobians. This renders the algorithm of Arvanitidis et al. impractical. A key contribution of this paper is a practical algorithm for computing geodesics that fits within modern deep learning frameworks.

IV Proposed Algorithm to Compute Geodesics
To develop an efficient algorithm for computing geodesics, we first note that the expected curve energy can be written as
(15) $\mathbb{E}\big[E(\gamma)\big] = \int_0^1 \mathbb{E}\Big[\big\|\tfrac{\mathrm{d}}{\mathrm{d}t} f(\gamma(t))\big\|^2\Big]\,\mathrm{d}t.$
If we discretize the curve at $N + 1$ points $\mathbf{z}_i = \gamma(t_i)$, with $t_i = i/N$ and $\Delta t = 1/N$ (Fig. 3), then this integral can be approximated as
(16) $\mathbb{E}\big[E(\gamma)\big] \approx \frac{1}{\Delta t} \sum_{i=0}^{N-1} \mathbb{E}\big[\|f(\mathbf{z}_{i+1}) - f(\mathbf{z}_i)\|^2\big].$
Since $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbb{I}_D)$, the expectation can be evaluated in closed form as
(17) $\mathbb{E}\big[\|f(\mathbf{z}_{i+1}) - f(\mathbf{z}_i)\|^2\big] = \|\mu_\theta(\mathbf{z}_{i+1}) - \mu_\theta(\mathbf{z}_i)\|^2 + \|\sigma_\theta(\mathbf{z}_{i+1}) - \sigma_\theta(\mathbf{z}_i)\|^2,$
and the approximated expected energy can be written as
(18) $\bar{E}(\gamma) = \frac{1}{\Delta t} \sum_{i=0}^{N-1} \Big(\|\mu_\theta(\mathbf{z}_{i+1}) - \mu_\theta(\mathbf{z}_i)\|^2 + \|\sigma_\theta(\mathbf{z}_{i+1}) - \sigma_\theta(\mathbf{z}_i)\|^2\Big).$
This energy is easily interpretable: the first term of the sum corresponds to the curve energy along the expected data manifold, while the second term penalizes curves for traversing highly uncertain regions on the manifold. This implies that geodesics will be attracted to regions of high data density in the latent space.
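As a sanity check, the closed-form expectation in Eq. 17 — under the reading that the noise is shared between the two evaluations of $f$, so the difference is $\Delta\mu + \Delta\sigma \odot \boldsymbol{\epsilon}$ — can be verified by simple Monte Carlo sampling. The numbers below are toy values, not outputs of any trained network:

```python
import random

random.seed(0)
dmu = [0.8, -0.3]    # difference of decoder means, mu(z_{i+1}) - mu(z_i) (toy)
dsig = [0.5, 0.2]    # difference of decoder std-devs (toy)

# Monte Carlo estimate of E || dmu + dsig * eps ||^2 with eps ~ N(0, I).
n = 100_000
acc = 0.0
for _ in range(n):
    acc += sum((m + s * random.gauss(0.0, 1.0)) ** 2
               for m, s in zip(dmu, dsig))
mc = acc / n

# Closed form: ||dmu||^2 + ||dsig||^2, as in Eq. 17.
closed = sum(m * m for m in dmu) + sum(s * s for s in dsig)
print(mc, closed)  # the two numbers agree up to sampling noise
```

The cross term $2\,\Delta\mu^\top(\Delta\sigma \odot \boldsymbol{\epsilon})$ vanishes in expectation because $\boldsymbol{\epsilon}$ has zero mean, which is what makes the closed form exact.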
Unlike the ordinary differential equations of Arvanitidis et al. [6], Eq. 18 can readily be optimized using automatic differentiation as implemented in TensorFlow. We can thus compute geodesics by picking a parametrization of the latent curve and optimizing Eq. 18 with respect to the curve parameters.
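To make the procedure concrete, here is a minimal, self-contained sketch of this optimization using the quadratic curve of Sec. IV-A on a toy 1-D latent space. The functions `mu` and `sigma`, the discretization level, and the numerical gradient (standing in for automatic differentiation) are all our illustrative choices, not the paper's trained networks:

```python
def mu(z):                 # toy decoder mean: embeds the line into a parabola
    return [z, z * z]

def sigma(z):              # toy decoder standard deviation (constant here)
    return [0.1, 0.1]

def energy(b, z0=-1.0, z1=1.0, n=20):
    """Discretized expected energy in the spirit of Eq. 18 for the quadratic
    curve gamma(t) = a t^2 + b t + c with gamma(0) = z0 and gamma(1) = z1,
    i.e. c = z0 and a = z1 - z0 - b; b is the single free parameter."""
    c, a = z0, z1 - z0 - b
    pts = [a * (i / n) ** 2 + b * (i / n) + c for i in range(n + 1)]
    e = 0.0
    for p, q in zip(pts, pts[1:]):
        e += sum((x - y) ** 2 for x, y in zip(mu(q), mu(p)))
        e += sum((x - y) ** 2 for x, y in zip(sigma(q), sigma(p)))
    return n * e           # the 1/dt factor, with dt = 1/n

# Gradient descent on b with a numerical gradient and backtracking steps.
b, h = 0.0, 1e-5
for _ in range(100):
    g = (energy(b + h) - energy(b - h)) / (2 * h)
    step = 0.05
    while step > 1e-12 and energy(b - step * g) > energy(b):
        step *= 0.5
    b -= step * g
```

With the endpoints fixed, only `b` remains free, so the descent is over a single scalar; with a real network, the same loop would run over all free curve parameters via the framework's autodiff.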
IV-A Curve Parametrization
There are many common choices for parametrizing curves, e.g. splines [6], Gaussian processes [28] or point collections [29]. In the interest of speed, we propose to use the restricted class of quadratic functions, i.e.
(19) $\gamma(t) = \mathbf{a}\, t^2 + \mathbf{b}\, t + \mathbf{c}.$
A curve thus has free parameters $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$. In practice, we are concerned with geodesic curves that connect two prespecified points $\mathbf{z}_0$ and $\mathbf{z}_1$, so the quadratic should be constrained to satisfy $\gamma(0) = \mathbf{z}_0$ and $\gamma(1) = \mathbf{z}_1$, which is easily achieved for quadratics: the constraints fix $\mathbf{c} = \mathbf{z}_0$ and $\mathbf{a} = \mathbf{z}_1 - \mathbf{z}_0 - \mathbf{b}$. Under this constraint, only the free parameters of $\mathbf{b}$ remain to estimate when optimizing Eq. 18. Here we perform the optimization using standard gradient descent.

IV-B Specifying Uncertainty
When training the VAE model, the reconstruction term of Eq. 7 ensures that we can expect high-quality reconstructions of the training data. Interpolations between latent training data usually give high-quality reconstructions in densely sampled regions of the latent space, but low-quality reconstructions in regions with low sample density. Ideally, the generator variance $\sigma^2_\theta$ should reflect this.

From the point of view of computing geodesics, the generator variance is important as it appears directly in the expected curve energy (18). If $\sigma^2_\theta$ is small near the latent data and large away from the data, then geodesics will follow the trend of the data [26], which is a useful property in a clustering context.
In practice, the neural network used to model $\sigma^2_\theta$ is only trained where there is data, and its behavior in between is governed by the activation functions of the network; common smooth activations such as softplus imply that variances are smoothly interpolated between the training data. This is a most unfortunate property for a variance function: e.g. if the optimal variance is low at all latent training points, then the predicted variance will be low at all points in the latent space. To ensure that the variance increases away from the latent data, Arvanitidis et al. [6] proposed to model the inverse variance (precision) with an RBF network [30] with isotropic kernels. This is reported to provide meaningful variance estimates.

We found the isotropic assumption to be too limited, and instead applied anisotropic kernels. Specifically, we propose to use a rescaled Gaussian Mixture Model (GMM) to represent the inverse variance function
(20) $\sigma^{-2}_\theta(\mathbf{z}) = \sum_{k=1}^{K} w_k\, \mathcal{N}\big(\mathbf{z} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\big) + \zeta,$
where $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ are the component-wise mean and covariance, and $w_k$ and $\zeta$ are positive weights. For simplicity, each component has its own variance. For all the latent variables, we use the usual EM algorithm [24] to obtain the mean and covariance of each component; the weights are then trained by solving Eq. 7. Figure 4 gives an example showing the output of $\sigma^{-2}_\theta$, and we see that in regions without data, $\sigma^{-2}_\theta$ takes low values, such that the variance is large, as one would expect.
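The precision model can be sketched as follows. The mixture below is 1-D with hypothetical components (the paper fits means and covariances with EM on the latent codes), and the additive floor term is our stand-in for the extra positive weight that keeps the precision strictly positive far from the data:

```python
import math

def gauss_pdf(z, m, s):
    """Density of N(m, s^2) at z."""
    return math.exp(-0.5 * ((z - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def inv_var(z, comps, floor=1e-3):
    """Rescaled mixture used as a precision (inverse variance) model in the
    spirit of Eq. 20; `comps` holds (weight, mean, std) triples and `floor`
    keeps the precision positive so the variance stays finite off-data."""
    return floor + sum(w * gauss_pdf(z, m, s) for w, m, s in comps)

comps = [(1.0, -2.0, 0.5), (1.0, 2.0, 0.5)]   # two latent clusters (toy)
near, far = inv_var(-2.0, comps), inv_var(0.0, comps)
print(near > far)   # precision is high near data, so variance is low there
```

Away from both components the mixture term decays toward zero, so the modeled variance $1/\sigma^{-2}_\theta(\mathbf{z})$ grows, penalizing curves that stray from the data in Eq. 18.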
IV-C Curve Initialization
Once the VAE is fully trained, we can compute geodesics in the latent space. As previously mentioned, we use gradient descent to minimize Eq. 18. To improve convergence speed, we here propose an initialization heuristic that we have found to work well.
Since geodesics generally follow the trend of the data [6], we seek an initial curve with this property. As it can be expensive to evaluate the generator network, we propose to first seek a curve that minimizes the inverse GMM model. We do this with a simple stochastic optimization akin to a particle filter [31]. This is written explicitly in Algorithm 1.
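Since Algorithm 1 itself is not reproduced in this text, the following is only our guess at its spirit: perturb the intermediate points of the straight-line initialization and keep, per point, the candidate scoring highest under the density model, so the initial curve threads through high-density regions. The density function below is a toy stand-in for the fitted mixture:

```python
import random

random.seed(1)

def density(z):
    """Toy latent density with a curved ridge along y = x^2 (ours)."""
    x, y = z
    return 1.0 / (1.0 + (y - x * x) ** 2)

def init_curve(z0, z1, n=8, particles=50, spread=1.0):
    """Particle-style initialization heuristic (our sketch, not Algorithm 1):
    for each intermediate point on the straight line between z0 and z1,
    sample Gaussian perturbations and keep the highest-density candidate."""
    path = [z0]
    for i in range(1, n):
        t = i / n
        base = [z0[0] + t * (z1[0] - z0[0]), z0[1] + t * (z1[1] - z0[1])]
        cands = [[base[0] + random.gauss(0, spread),
                  base[1] + random.gauss(0, spread)]
                 for _ in range(particles)]
        path.append(max(cands + [base], key=density))
    path.append(z1)
    return path

path = init_curve([-2.0, 4.0], [2.0, 4.0])
# Interior points should hug the ridge y = x^2 more than the straight line.
```

Each chosen point is at least as good as the straight-line point (the base is always a candidate), so the result is a reasonable warm start for the gradient descent of Sec. IV.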
V Experiments and Related Details
V-A Experimental Pipeline
Throughout the experiments, we use the same three-stage pipeline, illustrated in Fig. 6. In the first stage, we train a VAE with fixed constant output variance; this VAE has five layers in total (Henc, Menc, Senc, Hdec, Mdec), which are optimized according to Eq. 7. In the second stage, we fit the generator variance represented by a GMM network (Sec. IV-B) according to Eq. 7. Finally, in the third stage, we compute geodesics parametrized by quadratics, and compute clusters accordingly. Here we use the medoids algorithm [32], which relies only on pairwise distances; this decision was made to illustrate the information captured by geodesic distances.
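Because the clustering step only needs the pairwise distance matrix, a k-medoids sketch operating on such a matrix looks as follows (an exhaustive variant adequate for the small sample sizes used here; the paper's medoids implementation may differ, e.g. by using PAM-style swaps at scale):

```python
import itertools

def k_medoids_cost(D, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))

def k_medoids(D, k):
    """Exhaustive k-medoids over a pairwise distance matrix D: try every
    size-k medoid set, keep the cheapest, and assign each point to its
    nearest medoid. Only pairwise distances are needed, which is why the
    same routine works with geodesic or Euclidean distances."""
    n = len(D)
    best = min(itertools.combinations(range(n), k),
               key=lambda m: k_medoids_cost(D, m))
    return [min(best, key=lambda m: D[i][m]) for i in range(n)]

# Toy distance matrix with two well-separated groups {0, 1} and {2, 3}.
D = [[0, 1, 9, 9],
     [1, 0, 9, 9],
     [9, 9, 0, 1],
     [9, 9, 1, 0]]
labels = k_medoids(D, 2)
print(labels[0] == labels[1] and labels[2] == labels[3])  # True
```

Swapping in a geodesic distance matrix changes nothing in this routine, which is precisely the point of choosing a distance-only clustering algorithm.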
V-B Visualizing Curvature
A useful visualization tool for the curvature of generative models is the magnification factor [33], which corresponds to the Riemannian volume measure associated with the metric [34]. For a given Jacobian $\mathbf{J}_{\mathbf{z}}$, this is defined as
(21) $\mathrm{MF}(\mathbf{z}) = \sqrt{\det\big(\mathbf{J}_{\mathbf{z}}^\top \mathbf{J}_{\mathbf{z}}\big)}.$
In practice, the Jacobian is a stochastic object, so previous work [6, 34] has proposed to visualize the volume measure of the expected metric, $\sqrt{\det \mathbb{E}[\mathbf{J}_{\mathbf{z}}^\top \mathbf{J}_{\mathbf{z}}]}$. Here we argue that the expectation should be taken as late in the process as possible, and instead visualize the expected volume measure,
(22) $\mathbb{E}\Big[\sqrt{\det\big(\mathbf{J}_{\mathbf{z}}^\top \mathbf{J}_{\mathbf{z}}\big)}\Big].$
To compute this measure, we split the latent space into small quadrilateral cells, as in Fig. 7. As seen in the figure, each cell is spanned by two vectors $\Delta\mathbf{z}_1$ and $\Delta\mathbf{z}_2$, whose images in the data space are $\Delta f_1 = f(\mathbf{z} + \Delta\mathbf{z}_1) - f(\mathbf{z})$ and $\Delta f_2 = f(\mathbf{z} + \Delta\mathbf{z}_2) - f(\mathbf{z})$. The volume measure of the cell is then
(23) $V(\mathbf{z}) \approx \mathbb{E}\big[\|\Delta f_1 \times \Delta f_2\|\big].$
Here we compute the right-hand side expectation using sampling. As an example visualization, Fig. 8 shows the logarithm of the volume measure associated with the model from Fig. 4. In areas of small volume measure (blue), distances will generally be small, while they will be large in regions of large volume measure (red).
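The cell-based estimate can be sketched with a deterministic toy decoder mapping into a 3-D data space, where the cross product of the two mapped edge vectors is defined directly (our illustrative choice; for a stochastic decoder, the cross-product norm would additionally be averaged over samples of $f$):

```python
import math

def f(z):
    """Toy deterministic decoder into 3-D: (x, y) -> (x, y, x^2 + y^2)."""
    x, y = z
    return [x, y, x * x + y * y]

def cell_volume(z, h=1e-4):
    """Volume measure of a small latent cell mapped through f: the norm of
    the cross product of the two mapped edge vectors (the two cell edges
    h*e1 and h*e2), normalized by the latent cell area h^2."""
    p = f(z)
    v1 = [a - b for a, b in zip(f([z[0] + h, z[1]]), p)]
    v2 = [a - b for a, b in zip(f([z[0], z[1] + h]), p)]
    cx = v1[1] * v2[2] - v1[2] * v2[1]
    cy = v1[2] * v2[0] - v1[0] * v2[2]
    cz = v1[0] * v2[1] - v1[1] * v2[0]
    return math.sqrt(cx * cx + cy * cy + cz * cz) / (h * h)

# For this f, sqrt(det(J^T J)) = sqrt(1 + 4x^2 + 4y^2), so at (1, 0):
print(cell_volume([1.0, 0.0]), math.sqrt(5.0))   # the two values agree
```

As the cell size shrinks, the normalized cross-product norm converges to the magnification factor of Eq. 21, which is what makes the cell-based visualization consistent with the Riemannian volume measure.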
V-C Experimental Results
V-C1 The Two-Moon Dataset
As a first illustration, we consider the classic "Two Moon" dataset shown in Fig. 4. For the Henc and Hdec layers, we use two hidden fully connected layers with softplus activations; for the Senc layer, we use one fully connected layer, again with softplus; and for the Menc and Mdec layers we use fully connected layers.
Figure 9 shows the latent space of the resulting VAE along with several quadratic geodesics. We see that the geodesics nicely follow the structure of the data. This also influences the observed clustering structure. Figure 10 shows all pairwise distances using both geodesic and Euclidean distances. It should be noted that the first 50 points belong to the first "moon" while the remaining belong to the other. From the figure, we see that the geodesic distance reveals the cluster structure much more clearly than the Euclidean counterpart. We validate this by performing medoids clustering using the two distances. As a baseline, we also apply standard spectral clustering (SC) [5] to the original data. We report clustering accuracy (the ratio of correctly clustered samples to the total number of observations) in Fig. 11 and in Table I. It is evident that the geodesic distance reveals the intrinsic structure of the data.
Table I: Clustering accuracy on the two-moon dataset.
method       | medoids            | medoids         | SC
data samples | reconstructed data | latent variable | original data
distance     | Geodesic           | Euclidean       | Euclidean
accuracy     | 1                  | 0.86            | 0.92
V-C2 Synthetic Anisotropically Distributed Data
Table II: Clustering accuracy on the anisotropic dataset.
method       | medoids            | medoids         | SC
data samples | reconstructed data | latent variable | original data
distance     | Geodesic           | Euclidean       | Euclidean
accuracy     | 1                  | 0.80            | 0.96
Using the same setup as for the two-moon dataset, we generate 100 samples from clusters with anisotropic distributions. Figure 12 shows both the volume measure and the pairwise distances. Again, medoids clustering shows that the geodesic distance does a much better job of capturing the data structure than the baselines. Clustering accuracy is reported in Table II, and the resulting clusters are shown in Fig. 13.
V-C3 The MNIST Dataset
From the well-known MNIST dataset, we take the handwritten digits '0', '1' and '2' to test 2-class and 3-class clustering. For the Henc and Hdec layers, we use two hidden fully connected layers with ReLU activations (Henc layer sizes: 784 to 500 and 500 to 2; Hdec layer sizes: 2 to 500 and 500 to 784); for the Senc layer, we use one fully connected layer with a sigmoid activation function; and for the Menc and Mdec layers we use fully connected layers with identity activation functions. Images generated by both networks are shown in Fig. 14.

For the 2-class situation, we use digits '0' and '1'. We select 50 samples from each class and compute their pairwise distances, which are shown in Fig. 15. For the 3-class situation, we select 30 samples from each class and show pairwise distances in Fig. 16. In both cases, the geodesic distance reveals a clear clustering structure. We also see this in medoids clustering, which outperforms the baselines (Table III).
Table III: Clustering accuracy on MNIST.
method               | medoids            | medoids         | SC
data samples         | reconstructed data | latent variable | original data
distance             | Geodesic           | Euclidean       | Euclidean
'0'-'1' accuracy     | 1                  | 0.93            | 0.69
'0'-'1'-'2' accuracy | 1                  | 0.80            | 0.36
V-C4 The Fashion-MNIST Dataset
Fashion-MNIST [35] is a dataset of Zalando's article images; each image is a 28×28 grayscale image. We consider the classes 'T-shirt', 'Sandal' and 'Bag' to test 2-class and 3-class clustering. For the Henc and Hdec layers, we use three hidden fully connected layers with ReLU activations (Henc layer sizes: 784 to 500, 500 to 200, and 200 to 100; Hdec layer sizes: 100 to 200, 200 to 500, and 500 to 784); for the Senc layer, we use one fully connected layer with a sigmoid activation function; and for the Menc and Mdec layers we use fully connected layers with identity and sigmoid activation functions, respectively. Images generated by the networks are shown in Fig. 17.
For the 2-class situation, we use the 'T-shirt' and 'Sandal' samples to train the VAE. We select 50 samples from each of the 'T-shirt' and 'Sandal' classes and compute pairwise distances (see Fig. 18). For the 3-class situation, we select 30 samples from each class and compute distances (Fig. 19). As before, medoids clustering with geodesic distances significantly outperforms the baselines; see Table IV for numbers.
Table IV: Clustering accuracy on Fashion-MNIST.
method                              | medoids        | medoids         | SC
data samples                        | generated data | latent variable | original data
distance                            | Geodesic       | Euclidean       | Euclidean
'T-shirt'-'Sandal' accuracy         | 1              | 0.98            | 0.48
'T-shirt'-'Sandal'-'Bag' accuracy   | 1              | 0.93            | 0.28
V-C5 The EMNIST-Letter Dataset
The EMNIST-letter dataset [36] is a set of handwritten alphabet characters derived from the NIST Special Database and converted to grayscale images. We select the characters 'D' and 'd' as the two classes, and fit a VAE with the same network architectures as used for Fashion-MNIST. Generated images are shown in Fig. 20.
We select 50 samples from 'D' and 'd' respectively and show pairwise distances in Fig. 21. Again, medoids clustering shows that the geodesic distance reflects the intrinsic structure, which improves clustering over the baselines; cf. Table V.
Table V: Clustering accuracy on EMNIST letters.
method           | medoids            | medoids         | SC
data samples     | reconstructed data | latent variable | original data
distance         | Geodesic           | Euclidean       | Euclidean
'D'-'d' accuracy | 1                  | 0.76            | 0.48
VI Conclusion
In this paper, we have proposed an efficient algorithm for computing shortest paths (geodesics) along data manifolds spanned by deep generative models. Unlike previous work, the proposed algorithm is easy to implement and fits well with modern deep learning frameworks. We have also proposed a new network architecture for representing variances in variational autoencoders. With these two tools in hand, we have shown that simple distance-based clustering works remarkably well in the latent space of a deep generative model, even if the model is not trained for clustering tasks. Still, the dimensionality of the latent space, the form of the curve parametrization, and the modeling of the generator variance are worth developing further to obtain a more robust geodesic computation algorithm.
Acknowledgments
TY was supported by the National Key R&D Program of China (No. 2017YFB0702104). SH was supported by a research grant (15334) from VILLUM FONDEN. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 757360).
References

[1] D. P. Kingma, D. J. Rezende, S. Mohamed, and M. Welling, "Semi-supervised learning with deep generative models," in Proceedings of the 28th Neural Information Processing Systems (NIPS), Montréal, Canada, 2014, pp. 3581–3589.
[2] E. Denton, S. Chintala, A. Szlam, and R. Fergus, "Deep generative image models using a Laplacian pyramid of adversarial networks," in Proceedings of the 29th Neural Information Processing Systems (NIPS), Montréal, Canada, 2015, pp. 1486–1494.
[3] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in Proceedings of the 2nd International Conference on Learning Representations (ICLR), Banff, Canada, 2014.
[4] D. J. Rezende, S. Mohamed, and D. Wierstra, "Stochastic backpropagation and approximate inference in deep generative models," in Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, 2014.
[5] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, pp. 395–416, 2007.
[6] G. Arvanitidis, L. Hansen, and S. Hauberg, "Latent space oddity: On the curvature of deep generative models," in Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, 2018.
[7] B. Yang, X. Xiao, N. Sidiropoulos, and M. Hong, "Towards k-means-friendly spaces: Simultaneous deep learning and clustering," in Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 2017, pp. 3861–3870.
[8] P. Huang, Y. Huang, W. Wang, and L. Wang, "Deep embedding network for clustering," in Proceedings of the 22nd International Conference on Pattern Recognition (ICPR), Stockholm, Sweden, 2014, pp. 1532–1537.
[9] D. Chen, J. Lv, and Z. Yi, "Unsupervised multi-manifold clustering by learning deep representation," in Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 2017, pp. 385–391.
[10] K. G. Dizaji, A. Herandi, C. Deng, W. Cai, and H. Huang, "Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 5736–5745.
[11] S. A. Shah and V. Koltun, "Deep continuous clustering," arXiv:1803.01449, 2018.
[12] S. A. Shah and V. Koltun, "Robust continuous clustering," Proceedings of the National Academy of Sciences of the United States of America, vol. 114, pp. 9814–9819, 2017.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the 26th Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, 2012.
[14] R. Sarikaya, G. E. Hinton, and A. Deoras, "Application of deep belief networks for natural language understanding," ACM Transactions on Audio, Speech & Language Processing, vol. 22, pp. 778–784, 2014.
[15] G. Chen, "Deep learning with nonparametric clustering," arXiv:1501.03084, 2015.
[16] J. Xie, R. Girshick, and A. Farhadi, "Unsupervised deep embedding for clustering analysis," in Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, 2016, pp. 478–487.
[17] C. C. Hsu and C. W. Lin, "CNN-based joint clustering and representation learning with feature drift compensation for large-scale image data," IEEE Transactions on Multimedia, vol. 20, pp. 421–429, 2018.
[18] J. Yang, D. Parikh, and D. Batra, "Joint unsupervised learning of deep representations and image clusters," in Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016, pp. 5147–5156.
[19] J. Chang, L. Wang, G. Meng, S. Xiang, and C. Pan, "Deep adaptive image clustering," in Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, 2017, pp. 5879–5887.
[20] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou, "Variational deep embedding: An unsupervised and generative approach to clustering," in Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 2017, pp. 1965–1972.
[21] N. Dilokthanakul, P. Mediano, and M. Garnelo, "Deep unsupervised clustering with Gaussian mixture variational autoencoders," arXiv:1611.02648, 2016.
[22] W. Harchaoui, P. A. Mattei, and C. Bouveyron, "Deep adversarial Gaussian mixture auto-encoder for clustering," in Workshop of the 5th International Conference on Learning Representations (ICLR), Toulon, France, 2017.
[23] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, "InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets," in Proceedings of the 30th Neural Information Processing Systems (NIPS), Barcelona, Spain, 2016, pp. 2172–2180.
[24] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ, USA: Springer-Verlag New York, 2006.
[25] S. Gallot, D. Hulin, and J. Lafontaine, Riemannian Geometry. Springer, 1990, vol. 3.
[26] S. Hauberg, "Only Bayes should learn a manifold," 2018.
[27] L. B. Rall, "Automatic differentiation: Techniques and applications," 1981.
[28] P. Hennig and S. Hauberg, "Probabilistic solutions to differential equations and their application to Riemannian statistics," in Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 33, 2014.
[29] S. Laine, "Feature-based metrics for exploring the latent space of generative models," in ICLR Workshops, 2018.
[30] Q. Que and M. Belkin, "Back to the future: Radial basis function networks revisited," in Artificial Intelligence and Statistics (AISTATS), 2016.
[31] O. Cappé, S. J. Godsill, and E. Moulines, "An overview of existing methods and recent advances in sequential Monte Carlo," Proceedings of the IEEE, vol. 95, no. 5, pp. 899–924, 2007.
[32] L. Kaufman and P. Rousseeuw, Clustering by Means of Medoids. North-Holland, 1987.
[33] C. M. Bishop, M. Svensen, and C. K. Williams, "Magnification factors for the GTM algorithm," 1997.
[34] A. Tosi, S. Hauberg, A. Vellido, and N. D. Lawrence, "Metrics for probabilistic geometries," in The Conference on Uncertainty in Artificial Intelligence (UAI), 2014.
[35] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms," 2017.
[36] G. Cohen, S. Afshar, J. Tapson, and A. van Schaik, "EMNIST: An extension of MNIST to handwritten letters," arXiv:1702.05373, 2017.