Introduction
Clustering, one of the most fundamental problems in machine learning [40, 32], aims to separate samples into groups such that samples in the same cluster are as similar as possible while samples in different clusters are as dissimilar as possible. Clustering in computer vision is especially difficult because of the lack of low-dimensional discriminative representations. With the development of deep learning [31], more and more researchers have turned to learning features and cluster assignments simultaneously from unlabelled images. Although deep clustering methods perform significantly better than traditional methods, they are still far from satisfactory on many large and complicated image datasets. How to improve the accuracy and stability of clustering remains an important but challenging problem.

Most of the existing methods alternately update the cluster assignment and the inter-sample similarities that are used to guide model training [44, 6, 42]. Nevertheless, they are susceptible to the inevitable errors distributed in the neighborhoods and suffer from error propagation during training. To solve this problem, some methods take advantage of mutual information and data augmentation [1] to simultaneously learn the representation and the cluster assignment [2, 24, 20]. Specifically, they maximize the mutual information between the assignment distributions of original images and their augmentations, which helps to greatly improve performance. In a general deep clustering framework, the representation is first extracted by a convolutional neural network (CNN), then the logits, whose dimension equals the number of clusters (we call them the assignment feature), are obtained from a fully connected layer. After that, the assignment probability is calculated by the softmax function. The main idea of the latest methods is that an original image and its augmentation should share similar assignment probabilities. However, the assignment features can be quite different even when the assignment probabilities are almost the same, since the assignment probability is only sensitive to the maximum value of the assignment feature (see Figure 1(a)). This leads to unstable clustering results with high intra-class diversity. As shown in Figure 1(b), many boundary points can be misclassified if only the assignment probability is used, which greatly harms clustering performance.

In order to obtain more stable clusters and improve clustering accuracy, we propose a novel method named Deep Robust Clustering by Contrastive Learning (DRC). Different from existing methods, DRC tries to learn not only invariant clustering results but also invariant features. From the perspective of assignment probability, DRC maximizes the mutual information between the cluster assignment distributions of the original images and their augmentations from a global view, which helps to increase inter-class variance and leads to highly confident partitions. From the perspective of assignment feature, DRC maximizes the mutual information between the assignment features of an original image and its augmentation from a local view, which helps to decrease intra-class variance and achieve more robust clusters (see Figure 1(c)). In addition, we demonstrate that maximizing these mutual information terms is equivalent to minimizing two contrastive losses, which have proved powerful and training-friendly in unsupervised learning [17, 38, 15, 7]. The main contributions can be summarized as follows:
We point out a drawback of existing deep clustering methods and propose a new method named Deep Robust Clustering by Contrastive Learning (DRC). To the best of our knowledge, DRC is the first work to look into deep clustering from the perspectives of both assignment probability and assignment feature, which helps to increase inter-class diversity and decrease intra-class diversity simultaneously.

We investigate the internal relationship between mutual information and contrastive learning and summarize a general framework that can turn any mutual information maximization objective into the minimization of a contrastive loss. DRC is the first work that successfully applies contrastive learning to deep clustering and achieves significant improvements.

Extensive experiments on six widely-adopted deep clustering benchmarks show that DRC achieves more robust clusters and outperforms a wide range of state-of-the-art methods.
Related Work
Deep Clustering.
There exist two categories of deep clustering approaches: those that explicitly learn the cluster assignment [45, 44, 5, 13, 41, 6, 46] and those that combine deep representation learning with cluster analysis [22, 36, 14, 25]. The former usually mine estimated information or estimated ground-truth to train the network in a supervised manner. DEC [44] introduced K-means [27] to produce cluster assignments on pre-trained image features and then iteratively optimized the network with more confident estimated ground-truth. IIC [23] and DCCM [41] exploited inter-sample relations based on the pairwise relationships between the latest sample features and optimized the model accordingly. However, these pseudo relations or pseudo labels may cause severe error propagation at the early stage of training, which limits their performance. In contrast, the latter category focuses on exploiting a good cluster structure to train the network. PICA [21] maximized the global partition confidence of the clustering solution.

Mutual Information.
Information theory has been widely utilized as a tool for training deep networks. IMSAT [19] used data augmentation to impose invariance on discrete representations by maximizing the mutual information between data and their representations. Deep InfoMax [18] simultaneously estimated and maximized the mutual information between input data and learned high-level representations. However, these methods compute mutual information over continuous random variables, which requires complex estimators. In contrast, IIC [23] does so for discrete variables with simple and exact computations.

Contrastive Learning.
Contrastive learning has been widely used in unsupervised deep learning. [8] proposed this technique, using a max-margin approach to separate positive from negative examples based on triplet losses. [11] proposed a parametric method that treats each instance as a class represented by a feature vector. [43] introduced a memory bank to store the instance-class representation embeddings. Then, [50, 38] adopted and extended this memory-bank-based approach in their recent work. In contrast, [10, 47] replaced the memory bank with in-batch samples for negative sampling.

Method
In this section, we first present the problem definition of deep clustering and propose our novel end-to-end deep clustering framework. Then, we demonstrate the relationship between mutual information and contrastive learning. Next, we introduce two contrastive losses based on this relationship: the assignment feature loss and the assignment probability loss. Finally, we describe the model training procedure of DRC.
Problem Formulation
Given a set of $N$ unlabelled images $X = \{x_1, \ldots, x_N\}$ drawn from $k$ different semantic classes, deep clustering aims to separate the images into $k$ clusters with convolutional neural network (CNN) models such that images with the same semantic label are grouped into the same cluster. Here we aim to learn a deep CNN mapping function $f_\theta$ with parameters $\theta$, so that each image $x_i$ is mapped to a $k$-dimensional assignment feature $z_i = f_\theta(x_i)$. After that, the assignment probability vector $p_i$ can be obtained by the softmax function, defined as
$$ p_{ij} = \frac{e^{z_{ij}}}{\sum_{l=1}^{k} e^{z_{il}}}, \qquad j = 1, \ldots, k. $$
Then the cluster assignment can be predicted by maximum likelihood:
$$ c_i = \arg\max_{j}\; p_{ij}. $$
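As a minimal sketch of this formulation in PyTorch (the `backbone` network and function name are illustrative assumptions, not the authors' released code), the assignment feature, assignment probability, and predicted cluster can be computed as:

```python
import torch
import torch.nn.functional as F

def cluster_assignments(backbone, images):
    """Map a batch of images to assignment features, probabilities and clusters.

    `backbone` is any CNN ending in a fully connected layer with k outputs;
    this is an illustrative sketch rather than the paper's implementation.
    """
    z = backbone(images)            # assignment features, shape (batch, k)
    p = F.softmax(z, dim=1)         # assignment probabilities, rows sum to 1
    clusters = p.argmax(dim=1)      # predicted cluster index per image
    return z, p, clusters
```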
Framework
To address the above problem, we introduce a novel end-to-end deep clustering framework that takes advantage of both the assignment probability and the assignment feature. As shown in Figure 2, we first adopt a deep convolutional neural network (CNN) to generate the assignment feature and the assignment probability, both of dimension $k$. After that, a contrastive loss based on the assignment probability enforces the assignment consistency between original images and their augmentations, which helps to increase inter-class variance and form well-separated clusters. A contrastive loss based on the assignment feature captures the representation consistency between original images and their augmentations, which helps to decrease intra-class variance and achieve more robust clusters.
Mutual Information Contrastive Learning
Contrastive learning has been proven to be powerful in unsupervised and self-supervised learning, helping to achieve state-of-the-art results in many tasks, and the contrastive loss is strongly related to mutual information. Let $x_1, \ldots, x_N$ be samples in a given space, and let $T(x)$ denote a random transformation (augmentation) of $x$. Since we know nothing about the ground truth of $x_i$, all we know is that $T(x_i)$ can be viewed as a positive sample of $x_i$ for any $i$. In other words, $p(T(x_i) \mid x_i)$ should be much larger than $p(T(x_j) \mid x_i)$ for $j \neq i$. A natural idea is to maximally preserve the mutual information between $x$ and $T(x)$, defined as
$$ I(x; T(x)) = \mathbb{E}_{p(x, T(x))}\left[\log \frac{p(x, T(x))}{p(x)\, p(T(x))}\right]. \tag{1} $$
If we assume
$$ p(T(x_i) \mid x_i) = \frac{e^{g(x_i, T(x_i))}}{\sum_{j=1}^{N} e^{g(x_i, T(x_j))}}, \tag{2} $$
where $g(\cdot, \cdot)$ is a score function that may take different forms in different situations, then we have the following theorem.
Theorem 1
Assume there exists a constant $c$ such that $g(x_i, T(x_j)) \le c$ holds for all $i, j$. Then
$$ I(x; T(x)) \;\ge\; \log N + \mathbb{E}\left[\log \frac{e^{g(x_i, T(x_i))}}{\sum_{j=1}^{N} e^{g(x_i, T(x_j))}}\right] $$
holds.
Proof: Denote the expectation on the right-hand side of the bound by $-\mathcal{L}$; the inequality then follows from assumption (2) via the standard InfoNCE argument relating the contrastive loss to $I(x; T(x))$ [17].

Define
$$ \mathcal{L} = -\,\mathbb{E}\left[\log \frac{e^{g(x_i, T(x_i))}}{\sum_{j=1}^{N} e^{g(x_i, T(x_j))}}\right], \tag{3} $$
so minimizing the contrastive loss $\mathcal{L}$ is equal to maximizing a lower bound of the mutual information $I(x; T(x))$.
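A minimal PyTorch sketch of this loss in the empirical (mini-batch) setting is given below; the cosine-similarity choice of $g$, the temperature value, and the function name are illustrative assumptions consistent with Theorem 1, not the paper's released code:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(a, b, temperature=0.5):
    """InfoNCE-style contrastive loss; its negative lower-bounds I(a; b) up to log N.

    a, b: (N, d) tensors of paired representations; row i of `a` and row i of `b`
    form the positive pair, all other rows in the batch act as negatives.
    """
    a = F.normalize(a, dim=1)
    b = F.normalize(b, dim=1)
    logits = a @ b.t() / temperature            # g(x_i, T(x_j)) as scaled cosine similarity
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)     # mean of -log softmax over the diagonal pairs
```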
Loss Function
Our loss function consists of three parts: (1) a contrastive loss based on the assignment feature that preserves mutual information at the feature level; (2) a contrastive loss based on the assignment probability that maximizes the mutual information between the predicted labels of original images and those of their transformed images; and (3) a cluster regularization loss that avoids trivial solutions.
Let $X' = \{T(x_1), \ldots, T(x_N)\}$ be the augmentations of $X = \{x_1, \ldots, x_N\}$, where $T$ is a random transformation applied to each image.
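The exact transformations are not listed here, so the following torchvision pipeline is only an assumed example of such a random transformation $T$, not the paper's reported configuration:

```python
from torchvision import transforms

# Assumed augmentation pipeline; operations and parameters are illustrative.
random_transform = transforms.Compose([
    transforms.RandomResizedCrop(32),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])
```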
Assignment Feature Loss.
From the perspective of the assignment feature, we work with the pairs $(z_i, z'_i)$, where $z_i = f_\theta(x_i)$ and $z'_i = f_\theta(T(x_i))$. A basic assumption is that the assignment features of an image and its augmentation should be similar. To maximize the mutual information $I(z; z')$, it is reasonable to define
$$ g(z_i, z'_j) = \frac{\cos(z_i, z'_j)}{\tau_1} \tag{4} $$
according to Theorem 1, where $\tau_1$ is a temperature parameter. Then we can define the assignment feature loss as
$$ \mathcal{L}_{AF} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{\cos(z_i, z'_i)/\tau_1}}{\sum_{j=1}^{N} e^{\cos(z_i, z'_j)/\tau_1}}. \tag{5} $$
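Under this reading of Eq. (5), the assignment feature loss is just the contrastive sketch above applied to the two feature matrices; the temperature value here is an arbitrary placeholder:

```python
# z:     assignment features of the original images,  shape (N, k)
# z_aug: assignment features of their augmentations,  shape (N, k)
# Reuses the illustrative info_nce_loss sketch defined earlier.
loss_af = info_nce_loss(z, z_aug, temperature=0.5)
```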
Assignment Probability Loss.
As mentioned in the problem formulation, let $P, P' \in \mathbb{R}^{N \times k}$ be the assignment probability matrices of $X$ and $X'$, respectively. We can write the matrices column-wise as $P = [q_1, \ldots, q_k]$ and $P' = [q'_1, \ldots, q'_k]$, where the columns $q_j$ and $q'_j$ tell us which images in $X$ and $X'$ will be assigned to cluster $j$, respectively. Since $X'$ consists of the augmentations of $X$, the cluster assignments should be consistent, which is equivalent to maximizing the mutual information
$$ I(P; P') = \sum_{c=1}^{k} \sum_{c'=1}^{k} M_{cc'} \log \frac{M_{cc'}}{M_{c\cdot}\, M_{\cdot c'}}, \qquad M = \frac{1}{N} P^{\top} P', \tag{6} $$
where $M$ is the joint assignment distribution of $P$ and $P'$, and $M_{c\cdot}$ and $M_{\cdot c'}$ are its marginal distributions. Based on Theorem 1, we can define
$$ g(q_i, q'_j) = \frac{\cos(q_i, q'_j)}{\tau_2}, \tag{7} $$
where $\tau_2$ is also a temperature parameter. Then we can define the assignment probability loss as
$$ \mathcal{L}_{AP} = -\frac{1}{k} \sum_{i=1}^{k} \log \frac{e^{\cos(q_i, q'_i)/\tau_2}}{\sum_{j=1}^{k} e^{\cos(q_i, q'_j)/\tau_2}}. \tag{8} $$
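Analogously, Eq. (8) contrasts the columns of the two probability matrices, so a sketch is simply the same illustrative function applied to the transposed matrices (again with an arbitrary temperature):

```python
# p, p_aug: assignment probability matrices, shape (N, k); their j-th columns
# q_j and q'_j describe how the batch is spread over cluster j.
loss_ap = info_nce_loss(p.t(), p_aug.t(), temperature=1.0)
```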
Cluster Regularization Loss.
In deep clustering, it is easy to fall into a locally optimal solution that assigns most samples to a minority of clusters. Inspired by the group lasso [34], we introduce a cluster regularization loss over the cluster columns $q_1, \ldots, q_k$ to address this problem, which can be formulated as
$$ \mathcal{L}_{CR} = -\frac{1}{k} \sum_{j=1}^{k} \|q_j\|_2 = -\frac{1}{k} \sum_{j=1}^{k} \sqrt{\sum_{i=1}^{N} \big(q_j^{i}\big)^2}, \tag{9} $$
where $q_j^{i}$ indicates the $i$-th element of $q_j$; the column-norm sum is largest when the assignments are confident and spread across all clusters, so minimizing its negative discourages collapsed solutions.
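A sketch of this regularizer over the cluster columns, following the form of Eq. (9) as reconstructed above (the exact formulation in the original paper may differ, so treat this as an assumption):

```python
import torch

def cluster_regularization(probs):
    """Group-lasso-style regularizer over the cluster columns of an (N, k)
    assignment probability matrix.

    The sum of column norms is largest when assignments are confident and
    spread over all clusters, so returning its negative discourages
    collapsing most samples into a few clusters when minimized.
    """
    column_norms = probs.norm(p=2, dim=0)      # ||q_j||_2 for each cluster column j
    return -column_norms.sum() / probs.size(1)
```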
Then the overall objective function of DRC can be formulated as
$$ \mathcal{L} = \mathcal{L}_{AF} + \mathcal{L}_{AP} + \lambda \mathcal{L}_{CR}, \tag{10} $$
where $\lambda$ is a weight parameter.
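Putting the pieces together, one training step might look like the following sketch; it reuses the illustrative info_nce_loss and cluster_regularization helpers above, and the weight, temperatures, and symmetrization choices are assumptions rather than the paper's reported settings:

```python
import torch.nn.functional as F

def drc_training_step(backbone, optimizer, x, x_aug, lam=0.5):
    """One DRC-style update on a mini-batch x and its augmentations x_aug (sketch)."""
    z, z_aug = backbone(x), backbone(x_aug)                     # assignment features, (N, k)
    p, p_aug = F.softmax(z, dim=1), F.softmax(z_aug, dim=1)     # assignment probabilities

    loss_af = info_nce_loss(z, z_aug, temperature=0.5)          # feature-level contrast, Eq. (5)
    loss_ap = info_nce_loss(p.t(), p_aug.t(), temperature=1.0)  # probability-level contrast, Eq. (8)
    loss_cr = cluster_regularization(p) + cluster_regularization(p_aug)

    loss = loss_af + loss_ap + lam * loss_cr                    # overall objective, Eq. (10)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.detach())
```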
Model Training
The objective function (Eq. 10) is differentiable end-to-end, enabling the conventional stochastic gradient descent algorithm for model training. The assignment probability loss and the assignment feature loss are calculated on a random mini-batch of images and their augmentations. The training procedure is summarized in Algorithm 1.

Experiments
Datasets & Metrics
We conduct extensive experiments on six widely-adopted benchmark datasets. For a fair comparison, we adopt the same experimental setting as [6, 21].

CIFAR10/100: [28] Natural image datasets with 50,000 training and 10,000 test samples from 10 (100) classes; the training and test images of each dataset are jointly utilized for clustering.

TinyImageNet: [30] A subset of ImageNet with 200 classes, which is a very challenging dataset for clustering. There are 100,000/10,000 training/test images evenly distributed across the categories.

Evaluation Metrics: We used three standard clustering performance metrics: Accuracy (ACC), Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI); a computation sketch is given below.
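For reference, these three metrics can be computed as in the following sketch, using scikit-learn and Hungarian matching for the clustering accuracy; this is an illustrative helper, not the authors' evaluation script:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_metrics(y_true, y_pred):
    """Return (ACC, NMI, ARI) for predicted cluster labels vs. ground-truth classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    # Hungarian algorithm finds the cluster-to-class mapping that maximizes matches.
    rows, cols = linear_sum_assignment(cost, maximize=True)
    acc = cost[rows, cols].sum() / y_true.size
    nmi = normalized_mutual_info_score(y_true, y_pred)
    ari = adjusted_rand_score(y_true, y_pred)
    return acc, nmi, ari
```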
Datasets  CIFAR10  CIFAR100  STL10  ImageNet10  ImageNet-Dog-15  TinyImageNet
Methods\Metrics  NMI  ACC  ARI  NMI  ACC  ARI  NMI  ACC  ARI  NMI  ACC  ARI  NMI  ACC  ARI  NMI  ACC  ARI
Kmeans  0.087  0.229  0.049  0.084  0.130  0.028  0.125  0.192  0.061  0.119  0.241  0.057  0.055  0.105  0.020  0.065  0.025  0.005 
SC  0.103  0.247  0.085  0.090  0.136  0.022  0.098  0.159  0.048  0.151  0.274  0.076  0.038  0.111  0.013  0.063  0.022  0.004 
AC  0.105  0.228  0.065  0.098  0.138  0.034  0.239  0.332  0.140  0.138  0.242  0.067  0.037  0.139  0.021  0.069  0.027  0.005 
NMF  0.081  0.190  0.034  0.079  0.118  0.026  0.096  0.180  0.046  0.132  0.230  0.065  0.044  0.118  0.016  0.072  0.029  0.005 
AE  0.239  0.314  0.169  0.100  0.165  0.048  0.250  0.303  0.161  0.210  0.317  0.152  0.104  0.185  0.073  0.131  0.041  0.007 
DAE  0.251  0.297  0.163  0.111  0.151  0.046  0.224  0.302  0.152  0.206  0.304  0.138  0.104  0.190  0.078  0.127  0.039  0.007 
GAN  0.265  0.315  0.176  0.120  0.151  0.045  0.210  0.298  0.139  0.225  0.346  0.157  0.121  0.174  0.078  0.135  0.041  0.007 
DeCNN  0.240  0.282  0.174  0.092  0.133  0.038  0.227  0.299  0.162  0.186  0.313  0.142  0.098  0.175  0.073  0.111  0.035  0.006 
VAE  0.245  0.291  0.167  0.108  0.152  0.040  0.200  0.282  0.146  0.193  0.334  0.168  0.107  0.179  0.079  0.113  0.036  0.006 
JULE  0.192  0.272  0.138  0.103  0.137  0.033  0.182  0.277  0.164  0.175  0.300  0.138  0.054  0.138  0.028  0.102  0.033  0.006 
DEC  0.257  0.301  0.161  0.136  0.185  0.050  0.276  0.359  0.186  0.282  0.381  0.203  0.122  0.195  0.079  0.115  0.037  0.007 
DAC  0.396  0.522  0.306  0.185  0.238  0.088  0.366  0.470  0.257  0.394  0.527  0.302  0.219  0.275  0.111  0.190  0.066  0.017 
DCCM  0.496  0.623  0.408  0.285  0.327  0.173  0.376  0.482  0.262  0.608  0.710  0.555  0.321  0.383  0.182  0.224  0.108  0.038 
IIC  -  0.617  -  -  0.257  -  -  0.610  -  -  -  -  -  -  -  -  -  -
PICA:(Mean)  0.561  0.645  0.467  0.296  0.322  0.159  0.592  0.693  0.504  0.782  0.850  0.733  0.336  0.324  0.179  0.277  0.094  0.016 
PICA:(Best)  0.591  0.696  0.512  0.310  0.337  0.171  0.611  0.713  0.531  0.802  0.870  0.761  0.352  0.352  0.201  0.277  0.098  0.040 
DRC:(Mean)  0.612  0.716  0.534  0.343  0.355  0.196  0.639  0.744  0.564  0.828  0.883  0.796  0.377  0.373  0.222  0.315  0.132  0.053 
DRC:(Best)  0.621  0.727  0.547  0.356  0.367  0.208  0.644  0.747  0.569  0.830  0.884  0.798  0.384  0.389  0.233  0.321  0.139  0.056 
Implementation Details
We adopt PyTorch [35] to implement our approach. The network architecture used in our framework is a variant of ResNet [16], the same as in [21]. For fair comparison with other approaches, we followed most of the settings of [21, 23]. We used the Adam [25] optimizer and trained for 500 epochs. We set the batch size to 256 and repeated each in-batch sample twice for contrastive learning. For hyper-parameters, we used the same weight $\lambda$ for all datasets, and set separate temperatures $\tau_1$ for the assignment feature loss and $\tau_2$ for the assignment probability loss. Similar to PICA [21], we also utilized the same auxiliary over-clustering technique in a separate clustering head to exploit additional data from irrelevant classes when available. To report the stable performance of the approach, we trained our model on all datasets for 5 trials and report the average and best results separately.

Comparisons to State-of-the-Art Methods.
For clustering, we compare against both traditional methods and deep learning based methods, including K-means, spectral clustering (SC) [49], agglomerative clustering (AC) [12], nonnegative matrix factorization (NMF) based clustering [4], auto-encoder (AE) [3], denoising auto-encoder (DAE) [39], GAN [37], deconvolutional networks (DeCNN) [48], variational auto-encoding (VAE) [26], deep embedding clustering (DEC) [44], jointly unsupervised learning (JULE) [46], deep adaptive image clustering (DAC) [6], invariant information clustering (IIC) [23], deep comprehensive correlation mining (DCCM) [41] and partition confidence maximisation (PICA) [21]. The results are shown in Table 1; most results of the other methods are copied directly from PICA [21]. We find that DRC significantly surpasses the other methods by a large margin on six widely-used deep clustering benchmarks under three different evaluation metrics. Specifically, the improvement of DRC is substantial even compared with the state-of-the-art method PICA. Taking the mean clustering ACC as an example, our results are 7.1%, 3.3% and 5.1% higher than those of PICA on CIFAR10, CIFAR100 and STL10, respectively. We further compare the variance of ACC between DRC and PICA [21], with results shown in Figure 3. DRC has a much smaller variance than PICA on five of the six datasets, which implies that DRC gives much more robust clustering results under different initializations.

Ablation Study
In this section, we demonstrate by ablation analysis that all three loss terms of DRC are important for achieving state-of-the-art performance.
Effect of two contrastive losses.
We first investigate how the assignment probability loss and the assignment feature loss affect the clustering performance on CIFAR10, CIFAR100 and ImageNet10; the results are shown in Table 2. It is clear that both losses help on all three datasets. It is also reasonable that the assignment probability loss plays the greater role, since it directly affects the clustering result. The assignment feature loss is nevertheless indispensable, especially on CIFAR10 and CIFAR100.
Effect of cluster regularization loss.
Deep clustering can easily fall into a locally optimal solution in which most samples are assigned to the same cluster. We then examine how the cluster regularization loss addresses this problem. As shown in Table 4, it significantly helps to improve the clustering performance. Interestingly, the assignment feature loss and the cluster regularization loss have little impact on ImageNet10, since it is a relatively easy dataset whose images from different classes are already well separated.
Effect of batch size.
According to [7], contrastive learning benefits from larger batch sizes. To evaluate the effect of batch size, we trained DRC on the CIFAR10 dataset with batch sizes in {32, 64, 128, 256, 512, 1024}. The results are shown in Table 4. We find that larger batch sizes achieve better performance.
Variance analysis.
A good cluster embedding should have a small intra-class variance and a large inter-class variance. To demonstrate the superiority of DRC in this respect, we randomly select 6,000 samples from CIFAR10 and calculate the intra-class and inter-class variances using the assignment probability. As shown in Figure 6, DRC achieves a relatively smaller intra-class variance but a larger inter-class variance than PICA [21]. This also demonstrates that DRC obtains more robust clusters than the existing state-of-the-art methods.
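A sketch of how such intra-class and inter-class variances can be computed from the assignment probabilities is given below; the grouping by ground-truth class and the specific variance definitions are assumptions, since the paper's exact procedure is not specified here:

```python
import numpy as np

def intra_inter_variance(probs, labels):
    """Average within-class variance and between-class (centroid) variance of
    assignment probability vectors, grouped by ground-truth class."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([probs[labels == c].mean(axis=0) for c in classes])
    intra = np.mean([((probs[labels == c] - centroids[i]) ** 2).sum(axis=1).mean()
                     for i, c in enumerate(classes)])
    inter = ((centroids - centroids.mean(axis=0)) ** 2).sum(axis=1).mean()
    return intra, inter
```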
Qualitative Study
Visualization of cluster assignment.
To further illustrate that DRC gives more robust clustering results, we compare it with PICA on CIFAR10 by visualising the assignment feature and the assignment probability. We plot the predictions of 6,000 randomly selected samples, colour-coded by their ground-truth classes, using t-SNE [33]. Figures 4(a) and 4(b) show the results for the assignment probability: samples of the same class are closer together and samples of different classes are better separated for DRC. For the assignment feature, we observe a similar phenomenon in Figures 4(c) and 4(d).
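Such a plot can be produced with a few lines of scikit-learn and matplotlib; the following is an illustrative sketch, not the figure-generation code used in the paper:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, out_path="tsne.png"):
    """Project assignment features (or probabilities) to 2-D with t-SNE and
    colour points by their ground-truth class for qualitative inspection."""
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    plt.figure(figsize=(6, 6))
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap="tab10")
    plt.axis("off")
    plt.savefig(out_path, dpi=200)
```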
Success vs. failure cases.
Finally, we investigate both success and failure cases to gain extra insight into our method. Specifically, we study three cases across four classes from STL10: (1) success cases, (2) false negative failure cases, and (3) false positive failure cases. As shown in Figure 5, DRC can successfully group together images of the same class with different backgrounds and viewpoints. The two kinds of failure cases tell us that DRC mainly learns the shape of objects: samples from different classes with a similar pattern may be grouped together, while samples of the same class with different patterns may be separated into different clusters. It is hard to examine such details in the absence of ground-truth labels, which remains an unsolved problem for unsupervised learning and clustering.
Conclusion
We summarized a general framework that turns any mutual information maximization objective into the minimization of a contrastive loss, and applied it to both the semantic cluster assignment and the representation feature, which helps to increase inter-class diversity and decrease intra-class diversity and thus to perform more robust cluster assignment. Extensive experiments on six challenging datasets demonstrated that DRC achieves state-of-the-art results.
References

[1] (2017) Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340.
[2] (2019) Self-labelling via simultaneous clustering and representation learning. arXiv preprint arXiv:1911.05371.
[3] (2007) Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems, pp. 153–160.
[4] (2009) Locality preserving nonnegative matrix factorization. In IJCAI, Vol. 9, pp. 1010–1015.
[5] (2019) Deep discriminative clustering analysis. arXiv preprint arXiv:1905.01681.
[6] (2017) Deep adaptive image clustering. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5879–5887.
[7] (2020) A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709.
[8] (2005) Learning a similarity metric discriminatively, with application to face verification. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Vol. 1, pp. 539–546.
[9] (2011) An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 215–223.
[10] (2017) Multi-task self-supervised visual learning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2051–2060.
[11] (2014) Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 766–774.
[12] (1978) Agglomerative clustering using the concept of mutual nearest neighbourhood. Pattern Recognition 10 (2), pp. 105–112.
[13] (2017) Improved deep embedded clustering with local structure preservation. In IJCAI, pp. 1753–1759.
[14] (2018) Associative deep clustering: training a classification network with no labels. In German Conference on Pattern Recognition, pp. 18–32.
[15] (2020) Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738.
[16] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
[17] (2019) Data-efficient image recognition with contrastive predictive coding. arXiv preprint arXiv:1905.09272.
[18] (2018) Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670.
[19] (2017) Learning discrete representations via information maximizing self-augmented training. arXiv preprint arXiv:1702.08720.
[20] (2020) Deep semantic clustering by partition confidence maximisation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8849–8858.
[21] (2020) Deep semantic clustering by partition confidence maximisation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8849–8858.
[22] (2017) Deep subspace clustering networks. In Advances in Neural Information Processing Systems, pp. 24–33.
[23] (2019) Invariant information clustering for unsupervised image classification and segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 9865–9874.
[24] (2019) Invariant information clustering for unsupervised image classification and segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 9865–9874.
[25] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[26] (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[27] (1999) Genetic k-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 29 (3), pp. 433–439.
[28] (2009) Learning multiple layers of features from tiny images.
[29] (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
[30] (2015) Tiny ImageNet visual recognition challenge. CS 231N 7.
[31] (2015) Deep learning. Nature 521 (7553), pp. 436–444.
[32] (2001) Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, pp. 556–562.
[33] (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605.
[34] (2008) The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70 (1), pp. 53–71.
[35] (2017) Automatic differentiation in PyTorch.
[36] (2017) Cascade subspace clustering. In Thirty-First AAAI Conference on Artificial Intelligence.
[37] (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
[38] (2019) Contrastive multiview coding. arXiv preprint arXiv:1906.05849.
[39] (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11 (12).
[40] (2007) A tutorial on spectral clustering. Statistics and Computing 17 (4), pp. 395–416.
[41] (2019) Deep comprehensive correlation mining for image clustering. In Proceedings of the IEEE International Conference on Computer Vision, pp. 8150–8159.
[42] (2019) Deep comprehensive correlation mining for image clustering. In Proceedings of the IEEE International Conference on Computer Vision, pp. 8150–8159.
[43] (2018) Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3733–3742.
[44] (2016) Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning, pp. 478–487.
[45] (2017) Towards k-means-friendly spaces: simultaneous deep learning and clustering. In International Conference on Machine Learning, pp. 3861–3870.
[46] (2016) Joint unsupervised learning of deep representations and image clusters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5147–5156.
[47] (2019) Unsupervised embedding learning via invariant and spreading instance feature. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6210–6219.
[48] (2010) Deconvolutional networks. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535.
[49] (2005) Self-tuning spectral clustering. In Advances in Neural Information Processing Systems, pp. 1601–1608.
[50] (2019) Local aggregation for unsupervised learning of visual embeddings. In Proceedings of the IEEE International Conference on Computer Vision, pp. 6002–6012.