Clustering becomes difficult when processing large amounts of high-semantic and high-dimensional data samples. To overcome these challenges, many latent space clustering approaches, such as DEC, DCN and ClusterGAN, have been proposed. In these latent space clustering methods, the original high-dimensional data is first projected to a low-dimensional latent space, and then clustering algorithms, such as K-means, are performed in the latent space.
Most existing latent space clustering methods focus on learning “clustering-friendly” latent representations. To avoid learning random discriminative representations, their training objectives are usually coupled with a data reconstruction loss or data generation constraints, which make it possible to rebuild or generate the input samples from the latent space. These objectives force the latent space to capture all key factors of variation and similarity, which are essential for reconstruction or generation. As a result, the learned low-dimensional representations are not related only to clusters, and are therefore not optimal latent representations for clustering.
Furthermore, current latent space clustering methods depend on additional clustering methods (e.g., K-means) to output the final clustering result based on the learned latent representations. It is difficult to effectively integrate low-dimensional representation learning and the clustering algorithm. The performance of distance-based clustering algorithms, such as K-means, is highly dependent on the selection of proper similarity and distance measures. Although constructing a latent space can alleviate the problem of computing distances between high-dimensional data, defining a proper distance in the latent space that yields the best clustering performance remains a challenge.
In this paper, we propose disentangling latent space clustering (DLS-Clustering), a new type of clustering algorithm that directly obtains the cluster information during the disentanglement of the latent space. The disentangling process partitions the latent space into two parts: one-hot discrete latent variables directly related to categorical cluster information, and continuous latent variables related to the other factors of variation. The disentanglement of the latent space itself performs the clustering operation, so no further clustering method is needed. Unlike existing distance-based clustering methods, our method does not need any explicit clustering objective or distance/similarity calculation in the latent space.
To separate the latent space into two completely independent parts and directly obtain clusters, we first couple the inference network and the generator of a GAN to form a deterministic encoder-decoder pair under maximum mean discrepancy (MMD) regularization. Then, we utilize a weight-sharing strategy, which involves the bidirectional mapping between latent space and data space, to separate the latent space into one-hot discrete variables and continuous variables of the other factors. Our method integrates the GAN and a deterministic Autoencoder to achieve the disentanglement of the latent space. It includes three different types of regularization: an adversarial density-ratio loss in the data space, an MMD loss on the continuous latent code and a cross-entropy loss on the discrete latent code. We choose adversarial density-ratio estimation for modeling the data space because it can handle complex distributions. The MMD-based regularizer is stable to optimize and works well with multivariate normal distributions. Our code and models will be made publicly available after the paper is accepted.
In summary, our contributions are as follows:
(1) We propose a new clustering approach called DLS-Clustering, which directly obtains clusters in a completely unsupervised manner through disentangling the latent space.
(2) We introduce an MMD-based regularization to enforce the inference network and the generator of a standard GAN to form a deterministic encoder-decoder pair.
(3) We define a disentanglement training procedure based on the standard GAN and the inference network that neither increases the number of model parameters nor requires extra inputs. This procedure is also suitable for disentangling other factors of variation.
(4) We evaluate DLS-Clustering on six different types of benchmark datasets. DLS-Clustering achieves superior clustering performance on five of the six datasets and close-to-best results on the remaining one.
2 Related works
Latent space clustering.
Recently, many latent space clustering methods that leverage advances in deep neural network based unsupervised representation learning [42, 2] have been developed. Several pioneering works propose to utilize an encoding architecture [48, 4, 23, 3] to learn low-dimensional representations. In these methods, pseudo-labels created from hypothetical similarities are used during the optimization process. Because pseudo-labels usually underfit the semantics of real-world datasets, these methods often suffer from the Feature Randomness problem. Most recent latent space clustering methods are based on Autoencoders [46, 8, 20, 47, 49], which make it possible to reconstruct a data sample from its low-dimensional representation. For example, Deep Embedded Clustering (DEC) pretrains an Autoencoder with a reconstruction objective to learn low-dimensional embedded representations; it then discards the decoder and continues to train the encoder with a clustering objective through a well-designed regularizer. IDEC combines the reconstruction objective and the clustering objective to jointly learn suitable representations while preserving local structure. DCN proposes a joint dimensionality reduction and K-means clustering approach, in which the low-dimensional representation is obtained via the Autoencoder. Because the learned latent representations are closely tied to the reconstruction objective, these methods still do not achieve the desired clustering results.
Recently, ClusterGAN integrated a GAN with an encoder network for clustering by creating a non-smooth latent space with a mixture of one-hot encoded discrete variables and continuous latent variables. However, the one-hot encoded discrete variables and the continuous latent variables are not completely disentangled in ClusterGAN, so the one-hot encoded discrete variable cannot effectively represent clusters. To obtain cluster assignments, ClusterGAN still needs to perform additional clustering over all dimensions of the latent space under the discrete-continuous prior distribution.
Disentanglement of latent space. Learning disentangled representations enables us to reveal the factors of variation in the data, and provides interpretable semantic latent codes for generative models. Existing disentangling methods can be divided into two types according to the disentanglement level. The first type separates the latent representations into two [32, 21, 52, 36] or three parts, and can be achieved in one step. For example, Mathieu et al. introduce a conditional VAE with adversarial training to disentangle the latent representations into label-relevant and remaining unspecified factors. Y-AE builds on the standard Autoencoder to achieve the disentanglement of implicit and explicit representations. Meanwhile, two-step disentanglement methods based on the Autoencoder or VAE have also been proposed. In these methods, the first step extracts the label-relevant representations by training a classifier; the label-irrelevant representations are then obtained mainly via the reconstruction loss. All of these methods improve the disentanglement results by leveraging (partial) label information to minimize a cross-entropy loss. The second type of disentanglement, such as β-VAE, FactorVAE and β-TCVAE, learns to separate each dimension of the latent space without supervision. These VAE-based frameworks choose the standard Gaussian distribution as the prior distribution, and aim to balance reconstruction quality and latent code regularization through a stochastic encoder-decoder pair.
Real-world data usually contain discrete factors (e.g., categories), which are difficult to model with continuous variables. Several studies therefore disentangle the latent representation into discrete and continuous factors of variation, such as JointVAE and InfoGAN. Although most disentanglement learning methods [36, 12, 13] are based on the Autoencoder, especially VAEs, VAEs usually cannot achieve high-quality generation in real-world scenarios, which is related to their training objective. Recently, InfoGAN, an information-theoretic extension of GAN, achieves disentanglement of the latent code by maximizing the mutual information between the latent code and the generated data. In this paper, the proposed method integrates the Autoencoder and GAN, and separates the latent variables into two parts without any supervision. The discrete latent variables directly represent clusters, and the continuous latent variables summarize the remaining unspecified factors of variation.
3 Proposed method
We propose an unsupervised learning algorithm to disentangle the latent space into one-hot discrete latent variables and continuous latent variables. For each input, the discrete variable naturally represents the categorical cluster information, while the continuous variable is expected to contain information about the other variations. By sampling latent variables from this discrete-continuous mixture, we utilize a generator to map latent variables to the data space and an encoder to project the data back to the latent space. To enforce our model to fully split the latent space into two separate parts, we utilize the bidirectional mapping networks to perform multiple generating and encoding processes, and jointly train the generator and encoder with a disentangling-specific loss. The training instability caused by adversarial training can be mitigated by recent training improvements and the integration of GAN and Autoencoder [1, 26].
3.1 Problem formulation
Given a collection of i.i.d. samples (e.g., images) drawn from the real data distribution, our goal is to learn a general method to project the data to a latent space that is divided into one-hot discrete latent variables directly related to clusters and the remaining unspecified continuous latent variables. One important challenge in disentangling the latent space is how to encourage as much independence as possible between the discrete and continuous variables. First, this involves a bidirectional mapping between the latent space and the data space, and it is difficult to enforce distribution consistency in both spaces. Second, it is challenging to learn two separate latent variables without any supervision; existing methods [21, 52, 36] leverage labels to achieve disentanglement of the various factors.
In the following sections, we first describe the GAN that generates data from a discrete-continuous prior (Section 3.2). Then, we introduce a deterministic encoder-generator pair for bidirectional mapping (Section 3.3). After that, we present our disentangling process (Section 3.4). Finally, we describe the objectives of the proposed method (Section 3.5).
3.2 Discrete-continuous prior in DLS-Clustering
We choose a joint distribution of discrete and continuous latent variables as the prior of our GAN model, as in ClusterGAN. This discrete-continuous prior is helpful for the generation of structured data in generative models. Using images as an example, distinct identities or attributes of objects are reasonably represented by discrete variables, while other continuous factors, such as style and scale, can be represented by continuous variables. In this work, we split the latent representations into a discrete part and a continuous part based on the discrete-continuous prior.
The standard generative adversarial networks [17, 19] consist of two components: the generator $G$ and the discriminator $D$. $G$ defines a mapping from the latent space to the data space, and $D$ can be considered as a mapping from the data space to a real value in $(0, 1)$, which represents the probability of a sample being real. Eq. 1 defines the minimax objective of the standard GANs:
$$\min_{G}\max_{D}\ \mathbb{E}_{x\sim P_x}\big[f(D(x))\big]+\mathbb{E}_{z\sim P_z}\big[f\big(1-D(G(z))\big)\big],\qquad(1)$$
where $P_x$ is the real data distribution, $P_z$ is the prior distribution on the latent space, and $P_G$ is the model distribution of the generated samples $G(z)$. For the original GAN, the function $f$ is chosen as $f(t)=\log t$, and the Wasserstein GAN applies $f(t)=t$. This adversarial density-ratio estimation enforces $P_G$ to match $P_x$.
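As a concrete illustration (the helper names below are ours, not the paper's), the Eq. 1 value for a mini-batch can be estimated from discriminator outputs under either choice of $f$:

```python
import numpy as np

def gan_value(d_real, d_fake, f):
    """Empirical estimate of E[f(D(x))] + E[f(1 - D(G(z)))] (Eq. 1)
    from discriminator outputs on a real batch and a generated batch."""
    return np.mean(f(d_real)) + np.mean(f(1.0 - d_fake))

f_log = np.log            # original GAN: f(t) = log t
f_lin = lambda t: t       # Wasserstein GAN: f(t) = t
```

The discriminator maximizes this value while the generator minimizes it, which is what drives $P_G$ toward $P_x$.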
3.3 Deterministic encoder-generator pair
Many previous works, such as ALI and BiGAN, combine the inference network (i.e., encoder) and GAN together to form a bidirectional mapping. However, due to the lack of a consistent mapping between data samples and latent variables, they usually obtain poor reconstruction results. To turn the generator in DLS-Clustering into a good decoder, we need to apply several constraints between the posterior distribution of the latent code and the prior distribution. Because the latent variable is the concatenation of a one-hot discrete part and a continuous part under the prior, these constraints can be added by simply penalizing the discrete variable part and the continuous variable part separately.
The constraint on discrete variables can be computed through the inverse network, which first generates a data sample from the sampled latent code and then encodes it back to the latent variables, as shown in Figure 1. The penalty on discrete variables is then defined as the cross-entropy loss between the original one-hot code and the re-encoded discrete variable.
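A minimal sketch of this penalty, assuming the re-encoded discrete variable is returned as softmax probabilities (the function name is hypothetical):

```python
import numpy as np

def discrete_penalty(z_d, probs, eps=1e-12):
    """Cross-entropy between the sampled one-hot code z_d and the
    softmax probabilities of the re-encoded discrete variable."""
    return float(-np.mean(np.sum(z_d * np.log(probs + eps), axis=1)))
```

The loss is near zero when the inverse network recovers the sampled cluster exactly, and grows as the predicted probabilities spread away from it.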
The constraint on continuous variables can be considered in the standard Autoencoder model. As shown in Figure 1, the encoder encodes the real data sample into discrete and continuous latent variables. To ensure that the generator can reconstruct the original data from these latent variables, we apply an additional regularizer that encourages the encoded posterior distribution to match the prior distribution, as in AAE and WAE. The former uses the GAN-based density-ratio trick to estimate the KL-divergence, and the latter minimizes the distance between distributions based on Maximum Mean Discrepancy (MMD) [18, 28]. For the sake of optimization stability, we choose MMD to quantify the distance between the prior distribution $P_z$ and the posterior $Q_z$. The regularizer based on MMD can be expressed as
$$\mathcal{L}_{\mathrm{MMD}}=\frac{1}{n(n-1)}\sum_{l\neq j}k(z_l,z_j)+\frac{1}{n(n-1)}\sum_{l\neq j}k(\tilde{z}_l,\tilde{z}_j)-\frac{2}{n^2}\sum_{l,j}k(z_l,\tilde{z}_j),$$
where $k$ can be any positive definite kernel, $z_l$ are sampled from the prior distribution $P_z$, and $\tilde{z}_j=E(x_j)$ is sampled from the posterior $Q_z$ with $x_j$ drawn from the real data samples.
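The MMD regularizer with an RBF kernel can be sketched in NumPy as follows (an unbiased sample estimator of the formula above; function names are ours):

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """RBF kernel matrix between two batches of latent codes."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd(z_prior, z_post, sigma=1.0):
    """Unbiased MMD^2 estimate between prior samples and encoded codes."""
    n = len(z_prior)
    kpp = rbf(z_prior, z_prior, sigma)
    kqq = rbf(z_post, z_post, sigma)
    kpq = rbf(z_prior, z_post, sigma)
    off = n * (n - 1)
    return ((kpp.sum() - np.trace(kpp)) / off      # prior-prior term
            + (kqq.sum() - np.trace(kqq)) / off    # posterior-posterior term
            - 2.0 * kpq.mean())                    # cross term
```

The estimate is (close to) zero when the two sample sets come from the same distribution, and grows as they separate.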
In DLS-Clustering, the encoding distribution and the decoding distribution are taken to be deterministic, i.e., they can be replaced by the deterministic mappings $E$ and $G$, respectively. Therefore, we use a mean squared error (MSE) criterion as the reconstruction loss, and write the standard Autoencoder loss as
$$\mathcal{L}_{\mathrm{AE}}=\mathbb{E}_{x\sim P_x}\big[\lVert x-G(E(x))\rVert_2^2\big].$$
3.4 Disentangled representation
Although the above constraints are applied to enforce consistency between the distributions over the latent and data spaces, in order to avoid “posterior collapse” and obtain more promising representations, we impose an additional penalty on the objective to disentangle the latent variables. We utilize the weight-sharing generator and encoder to enforce the disentanglement between discrete and continuous latent variables. In our architecture (Figure 1), all encoders and generators share the same weights; thus, no additional parameters are required to disentangle the latent variables.
In practice, we sample a data sample from the real data distribution and a latent variable from the discrete-continuous prior. The encoder maps the data sample to a discrete latent code and a continuous latent code. To encourage independence between these two parts, we create a new latent variable by recombining the encoded discrete code with the sampled continuous code. Therefore, the original data sample and the sample generated from the recombined code will have identical discrete latent variables. The generated sample is then re-encoded into latent variables, and the cross-entropy loss between the two discrete codes ensures that the discrete variable is not modified when the continuous variable changes.
In addition, to ensure that the continuous variable does not contain any information about the discrete variable, it is also necessary to use an additional regularizer to penalize the continuous latent variable. The generator generates a data sample from the new recombined latent variable, and the encoder recovers the continuous latent variable from it. Therefore, we penalize the deviation between the sampled and recovered continuous codes using the MSE loss.
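The recombination step can be sketched as follows (a toy illustration with hypothetical names; in the real model the discrete code comes from the shared encoder and the result is fed to the shared generator):

```python
import numpy as np

def recombine(enc_z_d, prior_z_c):
    """Build the recombined latent code: the discrete part encoded
    from a real sample, paired with a freshly sampled continuous part.
    Generating from this code must preserve the original cluster."""
    return np.concatenate([enc_z_d, prior_z_c], axis=1)
```

By construction, the recombined code shares its one-hot discrete block with the original sample, which is exactly what the cross-entropy term then checks after re-encoding.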
3.5 Objective of DLS-Clustering
The objective function of our approach can be integrated into the following form:
$$\min_{G,E}\max_{D}\ \mathcal{L}_{\mathrm{GAN}}+\lambda_1\mathcal{L}_{\mathrm{MMD}}+\lambda_2\mathcal{L}_{\mathrm{AE}}+\lambda_3\mathcal{L}_{\mathrm{CE}}+\lambda_4\mathcal{L}_{\mathrm{MSE}},\qquad(7)$$
where the regularization coefficients $\lambda_i$ control the relative contribution of the different loss terms. Each term of Eq. 7 plays a different role for the three components: generator $G$, discriminator $D$ and encoder $E$. The adversarial term and the reconstruction term constrain the whole latent variable; the adversarial term is also related to $D$, which focuses on distinguishing the true data samples from the fake samples generated by $G$. The MMD and continuous-MSE terms are related to the continuous latent variables, and the cross-entropy terms are related to the discrete latent variables. Together, these loss terms ensure that our algorithm disentangles the whole latent space into cluster information and the remaining unspecified factors. The training procedure of DLS-Clustering involves jointly updating the parameters of $G$, $D$ and $E$, as described in Algorithm 1. In this work, we empirically set the coefficients of the continuous and discrete parts to enable a reasonable adjustment of their relative importance.
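The alternating training procedure can be outlined as an update schedule (a sketch only; the number of discriminator steps per iteration is our assumption, following common GAN practice):

```python
def training_schedule(n_iters, d_steps=5):
    """Alternating-update schedule in the spirit of Algorithm 1:
    update D on the adversarial term several times, then update
    G and E jointly on the full weighted objective of Eq. 7."""
    ops = []
    for _ in range(n_iters):
        ops.extend(["update_D"] * d_steps)   # adversarial (density-ratio) loss
        ops.append("update_G_E")             # MMD + AE + CE + MSE terms
    return ops
```

In practice each `update_*` step would be a gradient step on the corresponding subset of loss terms.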
4 Experiments
In this section, we perform a variety of experiments to evaluate the effectiveness of our proposed method.
4.1 Data sets
The clustering experiments are first carried out on five datasets: MNIST, Fashion-MNIST, YouTube-Face (YTF), Pendigits and 10x_73k. The first two datasets each contain 70k images in 10 categories, and each sample is a grayscale image. YTF contains 10k face images belonging to 41 categories. The Pendigits dataset contains time series of coordinates of hand-written digits; it has 10 categories and 10992 samples, each represented as a 16-dimensional vector. The 10x_73k dataset contains 73233 samples of single-cell RNA-seq counts from 8 cell types, and the dimension of each sample is 720. We choose these datasets to demonstrate that our method is effective for clustering different types of data.
4.2 Implementation details
We implement different neural network structures for the generator, discriminator and encoder to handle different types of data. For the image datasets (MNIST, Fashion-MNIST and YTF), we employ a generator and discriminator similar to those of DCGAN, and the encoder uses the same architecture as the discriminator except for the last layer. For the Pendigits and 10x_73k datasets, the generator, discriminator and encoder are MLPs with 2 hidden layers of 256 units each. Table 1 summarizes the network structures for the different datasets. The model parameters are initialized from a random normal distribution. For the prior distribution of our method, we randomly generate the discrete latent code as one of the elementary one-hot encoded vectors, then sample the continuous latent code from a normal distribution. The sampled latent code is used as the input of the generator to produce samples. The dimensions of the discrete and continuous codes are shown in Table 2. We implement the MMD loss with an RBF kernel to penalize the posterior distribution. The improved GAN variant with a gradient penalty is used in all experiments. To obtain the cluster assignment, we directly take the argmax over the softmax probabilities of the different clusters. The same regularization parameters work well across all experiments. We implement all models in Python using the TensorFlow library, and train them on one NVIDIA DGX-1 station.
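Sampling from the discrete-continuous prior and reading off cluster assignments via argmax can be sketched as follows (the dimensions and the standard deviation below are illustrative, not the paper's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior(n, k, dim_c, sigma=0.1, rng=rng):
    """Discrete-continuous prior: a one-hot code over k clusters
    concatenated with a Gaussian continuous code."""
    z_d = np.eye(k)[rng.integers(0, k, size=n)]
    z_c = rng.normal(0.0, sigma, size=(n, dim_c))
    return np.concatenate([z_d, z_c], axis=1)

def cluster_assign(probs):
    """Cluster assignment: argmax over the softmax probabilities
    produced by the discrete head of the encoder."""
    return np.argmax(probs, axis=1)
```

No extra clustering step (e.g., K-means) is run on top of this: the argmax over the discrete head is the final assignment.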
4.3 Evaluation of DLS-Clustering algorithm
To evaluate clustering results, we report two standard evaluation metrics: clustering purity (ACC) and Normalized Mutual Information (NMI). We compare DLS-Clustering with four clustering baselines: K-means, Non-negative Matrix Factorization (NMF), Spectral Clustering (SC) and Agglomerative Clustering (AGGLO). We also compare our method with state-of-the-art clustering approaches based on GANs and Autoencoders. For GAN-based approaches, ClusterGAN is chosen, as it achieves superior clustering performance compared to other GAN models (e.g., InfoGAN and GAN with bp). For Autoencoder-based methods, DEC, DCN and DEPICT, and in particular the Dual Autoencoder Network (DualAE), are used for comparison. In addition, deep spectral clustering (SpectralNet) and joint unsupervised learning (JULE) are also included in our comparison.
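For reference, the two metrics can be computed from scratch as follows (a sketch; library implementations may differ in edge-case handling):

```python
import numpy as np

def purity(y_true, y_pred):
    """Clustering purity: each predicted cluster is credited with
    its most frequent ground-truth label."""
    clusters = np.unique(y_pred)
    hits = sum(np.bincount(y_true[y_pred == c]).max() for c in clusters)
    return hits / len(y_true)

def nmi(y_true, y_pred):
    """Normalized mutual information: I(T;P) / sqrt(H(T) * H(P))."""
    n = len(y_true)
    t, p = np.unique(y_true), np.unique(y_pred)
    cont = np.array([[np.sum((y_true == a) & (y_pred == b)) for b in p]
                     for a in t]) / n                 # joint distribution
    pt, pp = cont.sum(axis=1), cont.sum(axis=0)       # marginals
    nz = cont > 0
    mi = np.sum(cont[nz] * np.log(cont[nz] / np.outer(pt, pp)[nz]))
    h = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    return mi / np.sqrt(h(pt) * h(pp))
```

Both metrics are invariant to relabeling of the predicted clusters, which is why they suit unsupervised evaluation.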
Table 3 reports the best clustering metrics of the different models over 5 runs. Our method achieves significant performance improvements over the other methods on the Fashion-MNIST, YTF, Pendigits and 10x_73k datasets. In particular, for the 16-dimensional Pendigits dataset, the other methods all perform worse than K-means, while our method significantly outperforms K-means in both ACC (0.847 vs. 0.793) and NMI (0.803 vs. 0.730). DLS-Clustering achieves the best ACC on the YTF dataset while maintaining a comparable NMI value. For the MNIST dataset, DLS-Clustering achieves close-to-best performance on both ACC and NMI.
4.4 Analysis on continuous latent variables
The superior clustering performance of DLS-Clustering demonstrates that the one-hot discrete latent variables directly represent the category information in the data. To understand the information contained in the continuous latent variables, we first use t-SNE to visualize the continuous latent variables of the MNIST and Fashion-MNIST datasets and compare them to the original data. As shown in Figure 2, we can clearly see category information in the original MNIST (a(1)) and Fashion-MNIST (b(1)) data. Meanwhile, there is no obvious category structure in the continuous latent variables of MNIST (a(2)) and Fashion-MNIST (b(2)); samples of all categories are well mixed in both datasets. A small bulk of samples in the right part of a(2) is a group of “1” images; the reason they are not mixed in may be their low complexity.
Then, we fix the discrete latent variable and generate images belonging to the same cluster by sampling the continuous latent variables. As shown in Figure 3, the diversity of the generated images indicates that the continuous latent variable contains a large number of generative factors apart from the cluster information. To further understand these factors, we vary the value of a single dimension of the continuous code over [-0.5, 0.5] while fixing the other dimensions and the discrete latent variable. As shown in Figure 4, changing this value leads to semantic changes in the generated images: for the MNIST data, the varied dimension represents the width of the digits; for the Fashion-MNIST data, it captures the shape of the objects. All these informative continuous factors are independent of the cluster categories.
These results demonstrate that the learned continuous latent representations from DLS-Clustering have captured other meaningful generative factors that are not related to clusters. Therefore, the proposed method successfully performs the mapping from the data to the disentangled latent space. The one-hot discrete latent variable is directly related to clusters, and the continuous latent variable, which corresponds to the other unspecified generative factors, governs the diversity of generated samples.
4.5 Scalability to large numbers of clusters
To further evaluate the scalability of DLS-Clustering to large numbers of clusters, we run it on the multi-view object image dataset COIL-100. The COIL-100 dataset has 100 clusters and contains 7200 images. Here, we compare our clustering method with K-means on three standard evaluation metrics: ACC, NMI and Adjusted Rand Index (ARI). As shown in Table 4, DLS-Clustering achieves better performance on all three metrics by directly learning the clusters together with the continuous latent representations; in particular, DLS-Clustering gains an increase of 0.154 in ACC. We also perform an image generation task on the COIL-100 dataset to further verify the generative performance, which involves mapping latent variables to the data space. Figure 5 shows samples generated with fixed one-hot discrete latent variables, which are diverse and realistic. The continuous latent variables represent meaningful factors such as the pose, location and orientation of objects. Therefore, the disentanglement of the latent space not only provides superior clustering performance, but also retains the remarkable ability of diverse and high-quality image generation.
5 Conclusion
In this work, we present DLS-Clustering, a new type of clustering method that directly obtains the cluster assignments by disentangling the latent space in an unsupervised fashion. Unlike existing latent space clustering algorithms, our method does not explicitly build a “clustering-friendly” latent space and does not need an extra clustering operation. Furthermore, our method does not disentangle class-relevant features from class-irrelevant features; the disentanglement in our method is targeted at extracting “cluster information” from data. Moreover, unlike distance-based clustering algorithms, our method does not depend on any explicit distance calculation in the latent space; the distance between data samples may be implicitly defined by the neural network.
Besides clustering, the generator in our method can also generate diverse and realistic samples. The proposed method can also support other applications, including conditional generation based on clusters, cluster-specific image transfer and cross-cluster retrieval. In the future, we will explore better priors for the latent space and further disentanglement of other generative factors.
References
- (2017) CVAE-GAN: fine-grained image generation through asymmetric training. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2745–2754.
- (2013) Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8), pp. 1798–1828.
- (2018) Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 132–149.
- (2017) Deep adaptive image clustering. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5879–5887.
- (2018) Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems, pp. 2610–2620.
- (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2172–2180.
- (2006) Fuzzy c-means clustering with spatial information for image segmentation. Computerized Medical Imaging and Graphics 30 (1), pp. 9–15.
- (2016) Deep unsupervised clustering with Gaussian mixture variational autoencoders. arXiv preprint arXiv:1611.02648.
- (2016) Adversarial feature learning. arXiv preprint arXiv:1605.09782.
- (2000) High-dimensional data analysis: the curses and blessings of dimensionality. AMS Math Challenges Lecture 1 (2000), pp. 32.
- (2016) Adversarially learned inference. arXiv preprint arXiv:1606.00704.
- (2018) Learning disentangled joint continuous and discrete representations. In Advances in Neural Information Processing Systems, pp. 710–720.
- (2018) Structured disentangled representations. arXiv preprint arXiv:1804.02086.
- (2017) Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5736–5745.
- (2019) From variational to deterministic autoencoders. arXiv preprint arXiv:1903.12436.
- (2018) Image-to-image translation for cross-domain disentanglement. In Advances in Neural Information Processing Systems, pp. 1287–1298.
- (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
- A kernel two-sample test. Journal of Machine Learning Research 13 (Mar), pp. 723–773.
- (2017) Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pp. 5767–5777.
- (2017) Improved deep embedded clustering with local structure preservation. In IJCAI, pp. 1753–1759.
- A two-step disentanglement method. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 772–780.
- (2017) beta-VAE: learning basic visual concepts with a constrained variational framework. ICLR 2 (5), pp. 6.
- (2017) Learning discrete representations via information maximizing self-augmented training. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 1558–1567.
- (2018) Disentangling by factorising. arXiv preprint arXiv:1802.05983.
- (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
- (2015) Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300.
- (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401 (6755), pp. 788.
- Generative moment matching networks. In International Conference on Machine Learning, pp. 1718–1727.
- (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605.
- (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 281–297.
- (2015) Adversarial autoencoders. arXiv preprint arXiv:1511.05644.
- (2016) Disentangling factors of variation in deep representation using adversarial training. In Advances in Neural Information Processing Systems, pp. 5040–5048.
- (2019) Adversarial deep embedded clustering: on a better trade-off between feature randomness and feature drift. arXiv preprint arXiv:1909.11832.
- ClusterGAN: latent space clustering in generative adversarial networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 4610–4617.
- Columbia object image library (COIL-20).
- (2019) Y-Autoencoders: disentangling latent representations via sequential-encoding. arXiv preprint arXiv:1907.10949.
- (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
- (2018) SpectralNet: spectral clustering using deep neural networks. arXiv preprint arXiv:1801.01587.
- (2000) Normalized cuts and image segmentation. Departmental Papers (CIS), pp. 107.
- (2017) Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558.
- (2018) Recent advances in autoencoder-based representation learning. arXiv preprint arXiv:1812.05069.
- Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11 (Dec), pp. 3371–3408.
- (2019) Dominant set clustering and pooling for multi-view 3D object recognition. arXiv preprint arXiv:1906.01592.
- (2011) Face recognition in unconstrained videos with matched background similarity. IEEE.
- (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
- Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning, pp. 478–487.
- Towards k-means-friendly spaces: simultaneous deep learning and clustering. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 3861–3870.
- (2016) Joint unsupervised learning of deep representations and image clusters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5147–5156.
- (2019) Deep spectral clustering using dual autoencoder network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4066–4075.
- (2012) Graph degree linkage: agglomerative clustering on a directed graph. In European Conference on Computer Vision, pp. 428–441.
- (2017) Massively parallel digital transcriptional profiling of single cells. Nature Communications 8, pp. 14049.
- (2019) Disentangling latent space for VAE by label relevant/irrelevant dimensions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12192–12201.