1 Introduction
Thanks to the advances in deep learning techniques, the performance of face recognition has been remarkably boosted
[25, 22, 27, 3]. However, it should be noted that the high accuracy of modern face recognition systems relies heavily on the availability of large-scale annotated training data. While one can easily collect a vast quantity of facial images from the Internet, annotating them is prohibitively expensive. Therefore, exploiting unlabeled data, e.g. through unsupervised or semi-supervised learning, becomes a compelling option and has attracted considerable interest from both academia and industry
[30, 1].

A natural idea for exploiting unlabeled data is to cluster them into “pseudo classes”, so that they can be used like labeled data and fed to a supervised learning pipeline. Recent works [30]
have shown that this approach can bring performance gains. Yet, current implementations of this approach still leave much to be desired. In particular, they often resort to unsupervised methods, such as K-means
[19, 11, 31] and approximate rank-order [1], to group unlabeled faces. These methods rely on simplistic assumptions, e.g., K-means implicitly assumes that the samples in each cluster lie around a single center; spectral clustering requires that the cluster sizes be relatively balanced; etc. Consequently, they lack the capability of coping with complicated cluster structures, and thus often give rise to noisy clusters, especially when applied to large-scale datasets collected from real-world settings. This problem seriously limits the achievable performance improvement.

Hence, to effectively exploit unlabeled face data, we need a clustering algorithm that can cope with the complicated cluster structures arising frequently in practice. Clearly, relying on simple assumptions does not provide this capability. In this work, we explore a fundamentally different approach: learning how to cluster from data. In particular, we draw on the strong expressive power of graph convolutional networks to capture the common patterns in face clusters, and leverage them to help partition the unlabeled data.
We propose a framework for face clustering based on graph convolutional networks [15]. This framework adopts a pipeline similar to Mask R-CNN [10] for instance segmentation, i.e., generating proposals, identifying the positive ones, and then refining them with masks. These steps are accomplished, respectively, by an iterative proposal generator based on super-vertices, a graph detection network, and a graph segmentation network. It should be noted that while we are inspired by Mask R-CNN, our framework still differs essentially: the former operates on a 2D image grid while the latter operates on an affinity graph with arbitrary structure. As shown in Figure 1, by relying on structural patterns learned with a graph convolutional network instead of simplistic assumptions, our framework is able to handle clusters with complicated structures.
The proposed method significantly improves the clustering accuracy on large-scale face data, achieving an F-score that is not only superior to the best result obtained by unsupervised clustering methods but also higher than a recent state of the art [30]. Using this clustering framework to process the unlabeled data, we improve the performance of a face recognition model on MegaFace to a level quite close to that obtained by supervised learning on all the data.

The main contributions lie in three aspects: (1) We make the first attempt to perform top-down face clustering in a supervised manner. (2) This is the first work that formulates clustering as a detection and segmentation pipeline based on graph convolutional networks. (3) Our method achieves state-of-the-art performance in large-scale face clustering, and boosts a face recognition model close to the supervised result when the discovered clusters are applied.
2 Related Work
Face Clustering
Clustering is a basic task in machine learning. Jain [12] provides a survey of classical clustering methods. Most existing clustering methods are unsupervised. Face clustering provides a way to exploit massive unlabeled data, but the study along this direction remains at an early stage, and the question of how to cluster faces at large scale remains open.

Early works use hand-crafted features and classical clustering algorithms. For example, Ho et al. [11] used gradient and pixel intensity as face features, and Cui et al. [2] used LBP features; both adopt spectral clustering. Recent methods make use of learned features. Kaufman and Rousseeuw [13] performed top-down clustering in an unsupervised way. Finley and Joachims [5] proposed an SVM-based supervised method in a bottom-up manner. Otto et al. [1] used deep features from a CNN-based face model and proposed an approximate rank-order metric to link image pairs into clusters. Lin et al. [18] designed a similarity measure based on a linear SVM trained on the nearest neighbours of data samples. Shi et al. [23] proposed Conditional Pairwise Clustering, which formulates clustering as a conditional random field and clusters faces by pairwise similarities. Lin et al. [17] proposed to exploit local structures of deep features by introducing minimal covering spheres of neighbourhoods to improve the similarity measure. Zhan et al. [30] trained an MLP classifier to aggregate information and thus discover more robust linkages, then obtained clusters by finding connected components.
Though using deep features, these works mainly concentrate on designing new similarity metrics, and still rely on unsupervised methods to perform clustering. Unlike all the works above, our method learns how to cluster in a top-down manner, based on a detection-segmentation paradigm. This allows the model to handle clusters with complicated structures.
Graph Convolutional Networks
Graph Convolutional Networks (GCNs) [15] extend CNNs to graph-structured data. Existing work has shown the advantages of GCNs, such as their strong capability of modeling complex graphical patterns. On various tasks, the use of GCNs has led to considerable performance improvements [15, 9, 26, 29]. For example, Kipf and Welling [15] applied GCNs to semi-supervised classification. Hamilton et al. [9] leveraged GCNs to learn feature representations. Berg et al. [26] showed that GCNs are superior to other methods in link prediction. Yan et al. [29] employed GCNs to model human joints for skeleton-based action recognition.
In this paper, we adopt GCNs as the basic machinery to capture cluster patterns on an affinity graph. To the best of our knowledge, this is the first work that uses GCNs to learn how to cluster in a supervised way.
3 Methodology
In large-scale face clustering, the complex variations of cluster patterns become the main challenge for further performance gains. To tackle this challenge, we explore a supervised approach, that is, to learn the cluster patterns based on graph convolutional networks. Specifically, we formulate this as a joint detection and segmentation problem on an affinity graph.
Given a face dataset, we extract a feature for each face image with a trained CNN, forming a set of features $\mathcal{D} = \{\mathbf{f}_i\}_{i=1}^{N}$, where $\mathbf{f}_i$ is a $d$-dimensional vector. To construct the affinity graph, we regard each sample as a vertex and use cosine similarity to find the $K$ nearest neighbors of each sample. By connecting neighbors, we obtain an affinity graph $\mathcal{G}$ for the whole dataset. Alternatively, the affinity graph can also be represented by a symmetric adjacency matrix $\mathbf{A} \in \mathbb{R}^{N \times N}$, where the element $a_{i,j}$ is the cosine similarity between $\mathbf{f}_i$ and $\mathbf{f}_j$ if the two vertices are connected, and zero otherwise. The affinity graph is a large-scale graph with millions of vertices. From such a graph, we desire to find clusters with the following properties: (1) different clusters contain images with different labels; and (2) images in one cluster have the same label.

3.1 Framework Overview
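The graph construction described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the default `k`, the dense similarity computation, the clipping of negative similarities, and the symmetrization rule are all our assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_affinity_graph(features: np.ndarray, k: int = 10) -> csr_matrix:
    """Build a sparse affinity graph from face features.

    Each sample is connected to its k nearest neighbors under cosine
    similarity; edge weights are the similarities themselves (clipped
    to be non-negative, an assumption for simplicity).
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T                        # dense cosine similarity
    np.fill_diagonal(sim, -np.inf)               # exclude self-loops
    rows, cols, vals = [], [], []
    for i in range(sim.shape[0]):
        nbrs = np.argpartition(-sim[i], k)[:k]   # k most similar neighbors
        rows.extend([i] * k)
        cols.extend(nbrs.tolist())
        vals.extend(np.clip(sim[i, nbrs], 0.0, None).tolist())
    n = feats.shape[0]
    adj = csr_matrix((vals, (rows, cols)), shape=(n, n))
    # Symmetrize: keep an edge if either endpoint selected the other.
    return adj.maximum(adj.T)
```

At the scale described in the paper (millions of vertices), the dense similarity matrix would of course be replaced by an approximate nearest-neighbor search.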
As shown in Figure 2, our clustering framework consists of three modules, namely the proposal generator, GCN-D, and GCN-S. The first module generates cluster proposals, i.e., subgraphs likely to be clusters, from the affinity graph. With all the cluster proposals, we then introduce two GCN modules, GCN-D and GCN-S, which form a two-stage procedure: first select high-quality proposals, then refine the selected proposals by removing the noise therein. Specifically, GCN-D performs cluster detection. Taking a cluster proposal as input, it evaluates how likely the proposal constitutes a desired cluster. Then GCN-S performs segmentation to refine the selected proposals. In particular, given a cluster, it estimates the probability of being noise for each vertex, and prunes the cluster by discarding the outliers. Based on the outputs of these two GCNs, we can efficiently obtain high-quality clusters.
3.2 Cluster Proposals
Instead of processing the large affinity graph directly, we first generate cluster proposals, inspired by the way region proposals are generated in object detection [7, 6]. Such a strategy substantially reduces the computational cost, since only a limited number of cluster candidates need to be evaluated. A cluster proposal $\mathcal{P}_i$ is a subgraph of the affinity graph $\mathcal{G}$, and all the proposals compose a set $\mathcal{P}$. The cluster proposals are generated based on super-vertices, and all the super-vertices form a set $\mathcal{S}$. In this section, we first introduce the generation of super-vertices, and then devise an algorithm to compose cluster proposals thereon.
Super-Vertex. A super-vertex is a subgraph containing a small number of vertices that are closely connected to each other. Hence, it is natural to use connected components to represent super-vertices. However, a connected component directly derived from the graph can be overly large. To maintain high connectivity within each super-vertex, we remove edges whose affinity values are below a threshold $e_\tau$ and constrain the size of super-vertices below a maximum $s_{max}$. Alg. 1 shows the detailed procedure to produce the super-vertex set $\mathcal{S}$. In general, an affinity graph with $N$ vertices is partitioned into a large number of super-vertices, each containing a small number of vertices on average.
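A minimal sketch of this procedure is given below. The re-splitting of oversized components by raising the threshold, the step size, and the default parameter values are our assumptions about details Alg. 1 does not fully specify here.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def super_vertices(adj: csr_matrix, e_tau: float = 0.6, s_max: int = 300,
                   step: float = 0.05):
    """Sketch of Alg. 1: partition an affinity graph into super-vertices.

    Edges weaker than the current threshold are pruned; any connected
    component larger than s_max is re-split with a raised threshold.
    """
    def components(sub, thresh):
        pruned = sub.copy()
        pruned.data[pruned.data < thresh] = 0   # drop weak edges
        pruned.eliminate_zeros()
        n_comp, labels = connected_components(pruned, directed=False)
        return [np.flatnonzero(labels == c) for c in range(n_comp)]

    queue = [(np.arange(adj.shape[0]), e_tau)]
    result = []
    while queue:
        idx, thresh = queue.pop()
        sub = adj[idx][:, idx]                  # induced subgraph
        for comp in components(sub, thresh):
            nodes = idx[comp]
            if len(nodes) <= s_max or thresh + step > 1.0:
                result.append(nodes)            # small enough: emit
            else:
                queue.append((nodes, thresh + step))  # too big: tighten
    return result
```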
Proposal Generation. Compared with the desired clusters, super-vertices are a conservative formation. Although the vertices in a super-vertex are very likely to describe the same person, the samples of one person may be distributed across several super-vertices. Inspired by the multi-scale proposals in object detection [7, 6], we design an algorithm to generate multi-scale cluster proposals. As Alg. 2 shows, we construct a higher-level graph on top of the super-vertices, with the centers of super-vertices as the vertices and the affinities between these centers as the edges. With this higher-level graph, we can apply Alg. 1 again and obtain proposals of larger sizes. By iteratively applying this construction $I$ times, we obtain proposals at multiple scales.
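The iterative construction can be sketched as follows, with a generic `partition_fn` standing in for Alg. 1; the helper name and its interface are ours, chosen for illustration.

```python
import numpy as np

def multiscale_proposals(features, partition_fn, iterations=3):
    """Sketch of Alg. 2: iteratively merge super-vertices into larger
    proposals.

    `partition_fn` stands in for Alg. 1: it maps an array of feature
    vectors to a list of index groups. At each iteration every group is
    replaced by its mean feature (its "center") and the centers are
    re-partitioned, yielding proposals of growing scale.
    """
    proposals = []
    groups = [np.array([i]) for i in range(len(features))]  # singletons
    for _ in range(iterations):
        centers = np.stack([features[g].mean(axis=0) for g in groups])
        merged = partition_fn(centers)          # groups of *group* indices
        # Expand each higher-level group back to original sample indices.
        groups = [np.concatenate([groups[j] for j in m]) for m in merged]
        proposals.extend(groups)
        if len(groups) <= 1:
            break
    return proposals
```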
3.3 Cluster Detection
We devise GCN-D, a module based on a graph convolutional network (GCN), to select high-quality clusters from the generated cluster proposals. Here, the quality is measured by two metrics, namely the IoU and IoP scores. Given a cluster proposal $\mathcal{P}_i$, these scores are defined as

$$\mathrm{IoU}(\mathcal{P}_i) = \frac{|\mathcal{P}_i \cap \widehat{\mathcal{P}}_i|}{|\mathcal{P}_i \cup \widehat{\mathcal{P}}_i|}, \qquad \mathrm{IoP}(\mathcal{P}_i) = \frac{|\mathcal{P}_i \cap \widehat{\mathcal{P}}_i|}{|\mathcal{P}_i|}, \tag{1}$$

where $\widehat{\mathcal{P}}_i$ is the ground-truth set comprising all the vertices with label $l(\mathcal{P}_i)$, and $l(\mathcal{P}_i)$ is the majority label of the cluster proposal $\mathcal{P}_i$, i.e. the label that occurs most in $\mathcal{P}_i$. Intuitively, IoU reflects how close $\mathcal{P}_i$ is to the desired ground-truth $\widehat{\mathcal{P}}_i$, while IoP reflects the purity, i.e. the proportion of vertices in $\mathcal{P}_i$ that carry the majority label.
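The two scores of Eq. (1) are straightforward to compute for a labeled proposal; a short sketch:

```python
from collections import Counter

def proposal_quality(proposal, labels):
    """Compute the IoU and IoP scores of Eq. (1) for a cluster proposal.

    `proposal` is a set of vertex ids; `labels` maps vertex id -> class.
    The ground-truth set contains every vertex that carries the
    proposal's majority label.
    """
    majority, _ = Counter(labels[v] for v in proposal).most_common(1)[0]
    ground_truth = {v for v, l in labels.items() if l == majority}
    inter = len(proposal & ground_truth)
    iou = inter / len(proposal | ground_truth)
    iop = inter / len(proposal)
    return iou, iop
```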
Design of GCN-D.
We assume that high-quality clusters usually exhibit certain structural patterns among their vertices, and introduce a GCN to identify such clusters. Specifically, given a cluster proposal $\mathcal{P}_i$, the GCN takes the visual features associated with its vertices (denoted as $\mathbf{F}_0(\mathcal{P}_i)$) and the affinity submatrix (denoted as $\mathbf{A}(\mathcal{P}_i)$) as input, and predicts both the IoU and IoP scores.
The GCN consists of $L$ layers, and the computation of each layer can be formulated as

$$\mathbf{F}_{l+1}(\mathcal{P}_i) = \sigma\left(\widetilde{\mathbf{D}}^{-1}(\mathcal{P}_i)\,(\mathbf{A}(\mathcal{P}_i) + \mathbf{I})\,\mathbf{F}_l(\mathcal{P}_i)\,\mathbf{W}_l\right), \tag{2}$$

where $\widetilde{\mathbf{D}}$ is a diagonal degree matrix, $\mathbf{F}_l(\mathcal{P}_i)$ contains the embeddings of the $l$-th layer, $\mathbf{W}_l$ is a matrix that transforms the embeddings, and $\sigma$ is the nonlinear activation function (ReLU is chosen in this work). Intuitively, this formula expresses a procedure of taking the weighted average of the embedded features of each vertex and its neighbors, transforming them with $\mathbf{W}_l$, and then feeding them through a nonlinear activation. This is similar to a typical block in a CNN, except that it operates on a graph with arbitrary topology. On the top-level embeddings $\mathbf{F}_L(\mathcal{P}_i)$, we apply max pooling over all the vertices in $\mathcal{P}_i$, obtaining a feature vector that provides an overall summary. Two fully-connected layers are then employed to predict the IoU and IoP scores, respectively.

Training and Inference.
Given a training set with class labels, we can obtain the ground-truth IoU and IoP scores following Eq. (1) for each cluster proposal $\mathcal{P}_i$. We then train the GCN-D module with the objective of minimizing the mean squared error (MSE) between the ground-truth and predicted scores. We show experimentally that, without any fancy techniques, the GCN gives accurate predictions. During inference, we use the trained GCN-D to predict both the IoU and IoP scores of each proposal. The IoU scores will be used in Sec. 3.5 to first retain proposals with high IoU. The IoP scores will be used in the next stage to determine whether a proposal needs to be refined.
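A NumPy sketch of the forward pass described above, i.e. Eq. (2) followed by max pooling and score prediction. The weight layout is our assumption, and the two heads are simplified to plain linear maps rather than full fully-connected layers.

```python
import numpy as np

def gcn_layer(A, F, W, activation=lambda x: np.maximum(x, 0.0)):
    """One graph-convolution step as in Eq. (2):
        F_{l+1} = ReLU( D^{-1} (A + I) F_l W_l )

    A: (n, n) affinity submatrix of a proposal, F: (n, d_in) vertex
    embeddings, W: (d_in, d_out) learnable transform. D is the diagonal
    degree matrix of A + I, so each row is a weighted average of a
    vertex and its neighbors before the linear transform.
    """
    A_hat = A + np.eye(A.shape[0])
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)   # row normalization
    return activation(D_inv * (A_hat @ F) @ W)

def gcn_d_head(A, F, weights):
    """Minimal GCN-D sketch: stacked gcn_layer calls, max pooling over
    vertices, then two linear heads predicting IoU and IoP."""
    h = F
    for W in weights["gcn"]:
        h = gcn_layer(A, h, W)
    pooled = h.max(axis=0)                            # max pool over vertices
    return float(pooled @ weights["iou"]), float(pooled @ weights["iop"])
```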
3.4 Cluster Segmentation
The top proposals identified by GCN-D may not be completely pure: they may still contain a few outliers, which need to be eliminated. To this end, we develop a cluster segmentation module, named GCN-S, to exclude the outliers from each proposal.
Design of GCN-S.
The structure of GCN-S is similar to that of GCN-D. The difference mainly lies in the values to be predicted. Instead of predicting quality scores for an entire cluster $\mathcal{P}_i$, GCN-S outputs a probability value for each vertex, indicating how likely it is a genuine member rather than an outlier.
Identifying Outliers.
To train GCN-S, we need to prepare the ground-truth, i.e. to identify the outliers. This is non-trivial. A natural way is to treat all the vertices whose labels differ from the majority label as outliers. However, as shown in Fig. 3, this way may encounter difficulties for a proposal that contains an almost equal number of vertices from two different classes. To avoid overfitting to manually defined outliers, we encourage the model to learn different segmentation patterns: as long as the segmentation result contains vertices from one class, regardless of whether it is the majority label or not, it is regarded as a reasonable solution. Specifically, we randomly select a vertex in the proposal as a seed. The vertices that have the same label as the seed are regarded as positive vertices, while the others are considered outliers. We apply this scheme multiple times with randomly chosen seeds and thus acquire multiple training samples from each proposal, possibly annotated differently.
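The random-seed labeling scheme can be sketched as follows; the sample format and the number of seeds are illustrative choices of ours.

```python
import random

def seeded_samples(proposal, labels, num_seeds=3, rng=None):
    """Generate GCN-S training targets by random seeding (a sketch).

    For each randomly drawn seed vertex, every vertex sharing the
    seed's label is marked positive (1) and all others are outliers (0),
    so one proposal yields several differently-annotated samples.
    """
    rng = rng or random.Random()
    vertices = sorted(proposal)
    samples = []
    for _ in range(num_seeds):
        seed = rng.choice(vertices)
        target = [1 if labels[v] == labels[seed] else 0 for v in vertices]
        samples.append((vertices, target))
    return samples
```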
Training and Inference.
With the process above, we can prepare a set of training samples from the retained proposals. Each sample contains a set of feature vectors (one per vertex), an affinity matrix, and a binary vector indicating whether each vertex is positive or not. We then train the GCN-S module using vertex-wise binary cross-entropy as the loss function. During inference, we also draw multiple hypotheses for a generated cluster proposal, and only keep the predicted result that has the most positive vertices (above a given threshold). This strategy avoids being misled by the case where a vertex associated with very few positive counterparts is chosen as the seed.

We only feed proposals whose IoP lies between two thresholds to GCN-S. When the proposal is very pure, the outliers are usually hard examples that need not be removed; when the proposal is very impure, it is probable that no class dominates, so the proposal might not be suitable for processing by GCN-S. With the GCN-S predictions, we remove the outliers from the proposals.
3.5 De-Overlapping
The three stages described above result in a collection of clusters. However, different clusters may still overlap, i.e. share certain vertices. This can adversely affect the face recognition training performed thereon. Here, we propose a simple and fast de-overlapping algorithm to tackle this problem. Specifically, we first rank the cluster proposals in descending order of IoU score. We sequentially collect the proposals from the ranked list, modifying each proposal by removing the vertices seen in preceding ones. The detailed algorithm is described in Alg. 3.
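Alg. 3 amounts to a single greedy pass over the ranked proposals; a sketch:

```python
def de_overlap(proposals, iou_scores):
    """Sketch of Alg. 3: rank proposals by predicted IoU, then make
    them disjoint by stripping vertices already claimed by
    higher-ranked proposals.

    Runs in time linear in the total number of vertices across
    proposals.
    """
    order = sorted(range(len(proposals)),
                   key=lambda i: iou_scores[i], reverse=True)
    seen, clusters = set(), []
    for i in order:
        cluster = set(proposals[i]) - seen   # drop vertices used before
        if cluster:
            clusters.append(cluster)
            seen |= cluster
    return clusters
```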
Compared to Non-Maximum Suppression (NMS) in object detection, the de-overlapping method is more efficient: the former has quadratic complexity in the number of proposals, while the latter runs in linear time. This process can be further accelerated by setting an IoU threshold for de-overlapping.
4 Experiments
4.1 Experimental Settings
Training set. MS-Celeb-1M [8] is a large-scale face recognition dataset. As the original identity labels are obtained automatically from web pages and are thus very noisy, we clean the labels based on the annotations from ArcFace [3], yielding a reliable subset. The cleaned dataset is randomly split into 10 parts with an almost equal number of identities. We randomly select 1 part as labeled data and treat the other 9 parts as unlabeled data. The YouTube Faces dataset [28] contains face videos, from which we extract frames for evaluation; the frames of one subset of identities are used for training and those of the remaining identities for testing.
Testing set. MegaFace [14] is the largest public benchmark for face recognition. It includes a probe set from FaceScrub [21] and a gallery set containing about one million images. IJB-A [16] is another face recognition benchmark.
Metrics. We assess the performance on two tasks, namely face clustering and face recognition. Face clustering aims to cluster all the images of the same identity into one cluster; its performance is measured by pairwise recall and pairwise precision. To consider both, we report the widely used F-score, i.e., the harmonic mean of precision and recall. Face recognition is evaluated with the face identification benchmark of MegaFace and the face verification protocol of IJB-A. For MegaFace, we adopt the top-1 identification hit rate, which measures how often the probe's true match is ranked first among the gallery images. For IJB-A, we adopt the face verification protocol, which is to determine whether two given face images are from the same identity; we use the true positive rate at a fixed false positive rate for evaluation.

Implementation Details. We use GCNs with two hidden layers in our experiments. Momentum SGD is used with a fixed initial learning rate. Proposals are generated following Alg. 1.
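The pairwise F-score used for clustering evaluation can be computed directly from the predicted and ground-truth assignments. The sketch below enumerates pairs explicitly, which is fine for small sets; at the scale of the experiments a counting-based formula over cluster/identity contingency counts would be used instead.

```python
from itertools import combinations

def pairwise_f_score(pred, gt):
    """Pairwise precision, recall, and F-score.

    `pred` and `gt` map each image id to a predicted cluster id and a
    ground-truth identity, respectively. A "pair" is any two images
    assigned to the same cluster (or identity).
    """
    def same_pairs(assign):
        return {tuple(sorted(p)) for p in combinations(assign, 2)
                if assign[p[0]] == assign[p[1]]}
    pred_pairs, gt_pairs = same_pairs(pred), same_pairs(gt)
    tp = len(pred_pairs & gt_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(gt_pairs) if gt_pairs else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f
```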
Method    #clusters   Precision   Recall   F-score   Time
K-means   1000        60.65       59.60    60.12     39 min
HAC       3621        99.64       87.31    93.07     1 h
CDP       3081        98.32       89.84    93.89     175 s
Ours      2369        96.75       92.27    94.46     537 s
4.2 Method Comparison
4.2.1 Face Clustering
We compare the proposed method with a series of clustering baselines. These methods are briefly described below.
(1) K-means [19], the most commonly used clustering algorithm. Given a number of clusters $k$, K-means minimizes the total intra-cluster variance.
(2) DBSCAN [4], a density-based clustering algorithm. It extracts clusters according to a density criterion and leaves the sparse background as noise.
(3) HAC [24], hierarchical agglomerative clustering, a bottom-up approach that iteratively merges close clusters based on some criterion.
(4) Approximate Rank-Order [1], a form of HAC that performs only one iteration of clustering with a modified distance measure.
(5) CDP [30], a recent graph-based clustering algorithm that exploits pairwise relationships in a bottom-up manner.
(6) GCN-D, the first module of the proposed method. It applies a GCN to learn cluster patterns in a supervised way.
(7) GCN-D + GCN-S, the two-stage version of the proposed method. GCN-S is introduced to refine the output of GCN-D by detecting and discarding noise inside clusters.
Results
To keep the experimental time manageable, we randomly select one part of the data for evaluation. Tab. 1 compares the performance of different methods on this set. Clustering performance is evaluated by both F-score and time cost. We also report the number of clusters, pairwise precision, and pairwise recall to better understand the advantages and disadvantages of each method.
The results show: (1) The performance of K-means is greatly influenced by the number of clusters $k$; we vary $k$ over a range of values and report the result with the highest F-score. (2) DBSCAN reaches a high precision but suffers from low recall; it may fail to handle the large density differences in large-scale face clustering. (3) HAC gives more robust results than the previous methods. Note that the standard algorithm consumes memory quadratic in the number of samples, which goes beyond memory capacity at this scale; we use an adapted hierarchical clustering [20] for comparison, which requires only linear memory. (4) Approximate Rank-Order is very efficient due to its one-iteration design, but its performance is inferior to the other methods in our setting. (5) As a recent work designed to exploit unlabeled data for face recognition, CDP achieves a good balance of precision and recall. For a fair comparison, we compare with the single-model version of CDP. Note that the ideas of CDP and our approach are complementary and can be combined to further improve performance. (6) Our method applies GCNs to learn cluster patterns, improving precision and recall simultaneously. Tab. 2 demonstrates that our method is robust and can be applied to datasets with different distributions. Since the GCN is trained with multi-scale cluster proposals, it may better capture the properties of the desired clusters. As shown in Fig. 8, our method is capable of pinpointing clusters with complex structure. (7) The GCN-S module further refines the cluster proposals from the first stage. It improves precision at the cost of a little recall, resulting in an overall performance gain.
Runtime Analysis
The whole procedure of our method is dominated by proposal generation, which runs on a CPU, while the inference of GCN-D and GCN-S runs on a GPU with batched inputs. To compare runtimes fairly, we also test all our modules on CPU; even then, our method is faster than most methods we compared. The speed gain from using a GPU is not very significant in this work, as the main computing cost lies in the GCNs. Since GCNs rely on sparse matrix multiplication, they cannot make full use of GPU parallelism. The runtime of our method grows linearly with the amount of unlabeled data, and the process can be further accelerated by increasing the batch size or parallelizing over more GPUs.
4.2.2 Face Recognition
With the trained clustering model, we apply it to the unlabeled data to obtain pseudo-labels, and investigate how the pseudo-labeled data enhance face recognition performance. In particular, we train face recognition models with the following steps: (1) train the initial recognition model on the labeled data in a supervised way; (2) train the clustering model on the labeled set, using the feature representation derived from the initial model; (3) apply the clustering model to group various amounts of unlabeled data (1, 3, 5, 7, 9 parts), thus attaching “pseudo-labels” to them; and (4) train the final recognition model on the whole dataset, using both the original labeled data and the data with assigned pseudo-labels. The model trained only on the 1 part of labeled data is regarded as the lower bound, while the model supervised by all parts with ground-truth labels serves as the upper bound in our problem. For all clustering methods, each unlabeled image belongs to a unique cluster after clustering, and we assign each image its cluster id as a pseudo-label.
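Steps (3)–(4) reduce to attaching cluster ids as class labels; a minimal sketch (the `offset` convention for keeping pseudo-classes disjoint from the real labeled classes is our assumption):

```python
def assign_pseudo_labels(clusters, offset=0):
    """Turn de-overlapped clusters into pseudo-labels for recognition
    training: every image inherits its cluster id, shifted by `offset`
    (e.g. the number of real labeled classes) so ids never collide.
    """
    labels = {}
    for cid, cluster in enumerate(clusters):
        for img in cluster:
            labels[img] = offset + cid
    return labels
```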
Fig. 5 indicates that the quality of face clustering is crucial for improving face recognition. For K-means and HAC, although the recall is good, the low precision indicates noisy predicted clusters. When the ratio of unlabeled to labeled data is small, the noisy clusters severely impair face recognition training. As the ratio increases, the gain brought by additional unlabeled data alleviates the influence of noise, but the overall improvement remains limited. Both CDP and our approach benefit from the increase in unlabeled data. Owing to the performance gain in clustering, our approach consistently outperforms CDP and improves the face recognition model on MegaFace to a level close to the fully supervised upper bound.
4.3 Ablation Study
We randomly select one part of the unlabeled data to study some important design choices in our framework.
4.3.1 Proposal Strategies
Cluster proposal generation is the fundamental module in our framework. With a fixed $K$ and different values of the iteration number $I$, the threshold $e_\tau$, and the maximum size $s_{max}$, we generate large numbers of proposals at multiple scales. Generally, a larger number of proposals yields better clustering performance, so there is a trade-off between performance and computational cost in choosing the number of proposals. As illustrated in Fig. 4, each point represents the F-score under a certain number of proposals, and different colors indicate different iteration steps. (1) When $I = 1$, only the super-vertices generated by Alg. 1 are used. By choosing different $e_\tau$, more proposals are obtained and the F-score increases, but the performance gradually saturates as the number of proposals grows. (2) When $I = 2$, combinations of super-vertices are added to the proposals. Recall that this leverages the similarity between super-vertices, effectively enlarging the receptive field of the proposals; with only a small number of added proposals, it boosts the F-score noticeably. (3) When $I = 3$, similar proposals from previous stages are further merged to create proposals at larger scales, which continues to contribute performance gains. However, as the proposal scale increases, more noise is introduced into the proposals, and the performance gain saturates.
Method   Channels        Pooling   Vertex Feature   F-score
a        128, 32         mean      ✓                76.97
b        128, 32         sum       ✓                53.75
c        128, 32         max       ✓                83.61
d        128, 32         max       ×                73.06
e        256, 64         max       ✓                84.16
f        256, 128, 64    max       ✓                77.95
4.3.2 Design choices of GCN-D
Although training the GCNs does not require any fancy techniques, there are some important design choices. As Tabs. 3a, 3b and 3c indicate, the pooling method has a large influence on the F-score: both mean pooling and sum pooling impair the clustering results compared with max pooling. Sum pooling is sensitive to the number of vertices and tends to favor large proposals, which yields high recall but low precision, ending up with a low F-score. Mean pooling, on the other hand, better describes the graph structure, but may suffer from the outliers in the proposal. Besides the pooling methods, Tabs. 3c and 3d show that omitting the vertex features significantly reduces the GCN's prediction accuracy, demonstrating the necessity of leveraging both vertex features and graph structure during GCN training. In addition, as shown in Tabs. 3c, 3e and 3f, widening the channels of the GCN increases its expressive power, but a deeper network may drive the hidden features of the vertices to become similar, resulting in an effect like mean pooling.
4.3.3 GCN-S
In our framework, GCN-S serves as a denoising module after GCN-D. However, it can also act as an independent module combined with previous methods. Given the clustering results of K-means, HAC, and CDP, we regard them as cluster proposals and feed them into GCN-S. As Fig. 7 shows, GCN-S improves their clustering performance by discarding the outliers inside clusters, yielding a consistent performance gain for the various methods.
4.3.4 Post-processing strategies
NMS is a widely used post-processing technique in object detection and is an alternative to de-overlapping. Given an IoU threshold, it keeps the proposal with the highest predicted IoU while suppressing other overlapping proposals. The computational complexity of NMS is quadratic in the number of proposals. Compared with NMS, de-overlapping does not suppress entire proposals and thus retains more samples, which increases the clustering recall. As shown in Fig. 7, de-overlapping achieves better clustering performance and can be computed in linear time.
5 Conclusions
This paper proposes a novel supervised face clustering framework based on graph convolutional networks. In particular, we formulate clustering as a detection and segmentation paradigm on an affinity graph. The proposed method outperforms previous face clustering methods by a large margin, and consequently boosts face recognition performance close to the supervised result. Extensive analysis further demonstrates the effectiveness of our framework.
Acknowledgement This work is partially supported by the Collaborative Research grant from SenseTime Group (CUHK Agreement No. TS1610626 & No. TS1712093), the Early Career Scheme (ECS) of Hong Kong (No. 24204215), the General Research Fund (GRF) of Hong Kong (No. 14236516, No. 14203518 & No. 14241716), and Singapore MOE AcRF Tier 1 (M4012082.020).
References
 [1] C. Otto, D. Wang, and A. K. Jain. Clustering millions of faces by identity. TPAMI, 40(2):289–303, 2018.
 [2] J. Cui, F. Wen, R. Xiao, Y. Tian, and X. Tang. EasyAlbum: an interactive photo annotation system based on face clustering and re-ranking. In SIGCHI. ACM, 2007.
 [3] J. Deng, J. Guo, and S. Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. arXiv preprint arXiv:1801.07698, 2018.
 [4] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 1996.

 [5] T. Finley and T. Joachims. Supervised clustering with support vector machines. In ICML. ACM, 2005.
 [6] R. Girshick. Fast R-CNN. In ICCV, 2015.
 [7] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
 [8] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao. MS-Celeb-1M: A dataset and benchmark for large-scale face recognition. In ECCV. Springer, 2016.
 [9] W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. In NeurIPS, 2017.
 [10] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. In ICCV. IEEE, 2017.
 [11] J. Ho, M.-H. Yang, J. Lim, K.-C. Lee, and D. Kriegman. Clustering appearances of objects under varying illumination conditions. In CVPR, 2003.
 [12] A. K. Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8):651–666, 2010.

 [13] L. Kaufman and P. J. Rousseeuw. Finding groups in data: an introduction to cluster analysis, volume 344. John Wiley & Sons, 2009.
 [14] I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, and E. Brossard. The MegaFace benchmark: 1 million faces for recognition at scale. In CVPR, 2016.
 [15] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
 [16] B. F. Klare, B. Klein, E. Taborsky, A. Blanton, J. Cheney, K. Allen, P. Grother, A. Mah, and A. K. Jain. Pushing the frontiers of unconstrained face detection and recognition: Iarpa janus benchmark a. In CVPR, 2015.
 [17] W.-A. Lin, J.-C. Chen, C. D. Castillo, and R. Chellappa. Deep density clustering of unconstrained faces. In CVPR, 2018.
 [18] W.-A. Lin, J.-C. Chen, and R. Chellappa. A proximity-aware hierarchical clustering of faces. In FG. IEEE, 2017.
 [19] S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
 [20] D. Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53(9):1–18, 2013.
 [21] H.-W. Ng and S. Winkler. A data-driven approach to cleaning large face datasets. In ICIP. IEEE, 2014.
 [22] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In CVPR, 2015.
 [23] Y. Shi, C. Otto, and A. K. Jain. Face clustering: representation and pairwise constraints. IEEE Transactions on Information Forensics and Security, 13(7):1626–1640, 2018.
 [24] R. Sibson. SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1):30–34, 1973.
 [25] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In NeurIPS, 2014.
 [26] R. van den Berg, T. N. Kipf, and M. Welling. Graph convolutional matrix completion. stat, 1050:7, 2017.
 [27] H. Wang, Y. Wang, Z. Zhou, X. Ji, Z. Li, D. Gong, J. Zhou, and W. Liu. CosFace: Large margin cosine loss for deep face recognition. In CVPR, 2018.
 [28] L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In CVPR. IEEE, 2011.
 [29] S. Yan, Y. Xiong, and D. Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. In AAAI, 2018.
 [30] X. Zhan, Z. Liu, J. Yan, D. Lin, and C. C. Loy. Consensus-driven propagation in massive unlabeled data for face recognition. In ECCV, 2018.
 [31] M. Zhao, Y. W. Teo, S. Liu, T.-S. Chua, and R. Jain. Automatic person annotation of family photo album. In CIVR. Springer, 2006.