Person re-identification (re-ID) aims to match persons in an image gallery collected from non-overlapping camera networks. Despite the impressive progress of supervised methods in person re-ID, models trained in one domain often fail to generalize well to others due to changes in camera configurations, lighting conditions, person views, etc. Domain adaptive re-ID methods that can work across domains remain an open research challenge.
To implement domain adaptive re-ID, unsupervised domain adaptation (UDA) methods have been widely explored. One major line of UDA methods attempts to align the feature distributions of source and target domains. Another line of methods utilizes adversarial generative models as a style transformer to convert pedestrian images (with identity annotations) of a source domain into a target domain. The style-transferred images are then used to train a re-ID model in the target domain. Many UDA methods preserve discriminative information across domains or camera styles, but they largely ignore the unlabeled samples, and thus the underlying sample distributions, in target domains. Recent approaches alleviate this problem by predicting pseudo-labels in target domains. They leverage the cluster (pseudo) labels for model fine-tuning directly but are often susceptible to noise and hard samples. This prevents them from maximizing model discrimination capacity in target domains.
In this paper, we propose an augmented discriminative clustering (AD-Cluster) technique for domain adaptive person re-ID. AD-Cluster aims to maximize model discrimination capacity in the target domain by alternating discriminative clustering and sample generation as illustrated in Fig. 1. Specifically, density-based clustering first predicts sample clusters in the target domain, where sample features are extracted by a re-ID model that is pre-trained in the source domain. AD-Cluster then learns through two iterative processes. First, an image generator keeps translating the clustered images to other cameras to augment the training samples while retaining the original pseudo identity labels (i.e., cluster labels). Second, a feature encoder keeps learning to maximize the inter-cluster distance while minimizing the intra-cluster distance in feature space. The image generator and the feature encoder thus compete in an adversarial min-max manner, iteratively estimating cluster labels and optimizing re-ID models. Finally, AD-Cluster aggregates the discrimination ability of re-ID models through such adversarial learning and optimization.
The main contributions of this paper can be summarized in three aspects. First, it proposes a novel discriminative clustering method that addresses domain adaptive person re-ID by density-based clustering, adaptive sample augmentation, and discriminative feature learning. Second, it designs an adversarial min-max optimization strategy that increases the intra-cluster diversity and enforces discrimination ability of re-ID models in target domains simultaneously. Third, it achieves significant performance gain over the state-of-the-art on two widely used re-ID datasets: Market-1501 and DukeMTMC-reID.
2 Related Works
While person re-ID has been extensively investigated from various perspectives, we mainly review the domain adaptive person re-ID approaches, which are largely driven by unsupervised domain adaptation (UDA) methods.
2.1 Unsupervised Domain Adaptation (UDA)
Domain alignment. UDA defines a learning problem where source domains are fully labeled while sample labels in target domains are totally unknown. To learn discriminative models in target domains, early methods focus on learning feature/sample mappings between source and target domains. As a representative method, correlation alignment (CORAL) minimized domain shift by aligning the mean and covariance of source and target distributions. Recent methods attempted to reduce the domain shift by using generative adversarial networks (GANs) to learn a pixel-level transformation. The most representative one, CyCADA, transferred samples across domains at both the pixel and feature levels.
Domain-invariant features. The second line of UDA methods focuses on finding domain-invariant feature spaces. To fulfill this purpose, Long et al. used the Maximum Mean Discrepancy (MMD), which maps features of both domains into the same Hilbert space. Ganin et al. and Ajakan et al. designed a domain confusion loss to learn domain-invariant features. Saito et al. proposed aligning distributions of source and target domains by maximizing the discrepancy of classifiers' outputs.
Pseudo-label prediction. Another line of UDA methods involves learning representations in target domains by using predicted pseudo-labels. In general, this approach uses an alternating estimation strategy: predicting pseudo-labels of samples with the current model, and then optimizing the model using the predicted pseudo-labels. In the deep learning era, clustering losses have been designed for CNNs to jointly learn features, image clusters, and re-ID models in an alternating manner.
2.2 UDA for Person re-ID
To implement domain adaptive person re-ID, researchers largely referred to the above reviewed UDA methods by incorporating the characteristics of person images.
Domain alignment. Lin et al. proposed minimizing the distribution variation of the source's and the target's mid-level features based on the Maximum Mean Discrepancy (MMD) distance. Wang et al. utilized additional attribute annotations to align feature distributions of source and target domains in a common space. Other works enforced camera invariance by learning consistent pairwise similarity distributions or by reducing the discrepancy between both domains and cameras.
GAN-based methods have been extensively explored for domain adaptive person re-ID. HHL simultaneously enforced camera invariance and domain connectedness to improve the generalization ability of models on the target set. PTGAN, SPGAN, ATNet, CR-GAN and PDA-Net transferred images with identity labels from source into target domains to learn discriminative models.
By aligning features and/or appearance, the above methods can preserve the discriminative information from source domains well; however, they largely ignore the unlabeled samples in target domains, which hinders them from maximizing the model discrimination capacity.
Pseudo-label prediction. Recently, the problem of how to leverage the large number of unlabeled samples in target domains has attracted increasing attention. Clustering and graph matching methods have been explored to predict pseudo-labels in target domains for discriminative model learning. Reciprocal search and exemplar-invariance approaches were proposed to refine pseudo-labels while taking camera invariance into account.
Existing approaches have explored cluster distributions in the target domain. However, they still face the challenge of how to precisely predict the labels of hard samples. Hard samples are crucial to a discriminative re-ID model, but they often confuse clustering algorithms. We address this issue by iteratively generating and including diverse and representative samples in the target domain, which effectively enforces the discrimination capability of re-ID models.
3 The Proposed Approach
Under the context of unsupervised domain adaptation (UDA) for person re-ID, we have a fully labeled source domain $\{X_s, Y_s\}$ that contains $N_s$ person images of $M_s$ identities in total. $X_s$ and $Y_s$ denote the sample images and identities in the source domain, respectively, where each image $x_{s,i}$ is associated with an identity $y_{s,i}$. In addition, we have an unlabeled target domain $\{X_t\}$ that contains $N_t$ person images. The identities of images in the target domain are unavailable. The goal of AD-Cluster is to learn a re-ID model that generalizes well in the target domain by leveraging labeled samples in the source domain and unlabeled samples in the target domain.
AD-Cluster consists of two networks: a CNN as the feature encoder $f$ and a generative adversarial network (GAN) as the image generator $g$, as shown in Fig. 2. The encoder $f$ is first trained using labeled samples in the source domain with cross-entropy loss and triplet loss. In the target domain, unlabeled samples are represented by features extracted by $f$; density-based clustering groups them into clusters and uses the cluster IDs as the pseudo-labels of the clustered samples. Treating each camera as a distinct domain with its own style, $g$ translates each sample of the target domain to other cameras, which generates identity-preserving samples with increased diversity. After that, all samples in the target domain together with the generated ones are used to re-train the feature encoder $f$. The generator and encoder thus learn iteratively in an adversarial min-max manner, where $g$ keeps generating identity-preserving samples to maximize the intra-cluster variations in the sample space whereas $f$ learns discriminative representations to minimize the intra-cluster variations in the feature space, as illustrated in Fig. 1.
3.2 UDA Procedure
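For a batch of samples, the classification loss is defined by

$$L_{cls} = -\frac{1}{n_s} \sum_{i=1}^{n_s} \log p(y_{s,i} \mid x_{s,i}),$$

where $n_s$, $i$ and $s$ denote the number of images in a batch, the image index and the source domain, respectively, and $p(y_{s,i} \mid x_{s,i})$ is the predicted probability of image $x_{s,i}$ belonging to identity $y_{s,i}$.
The ranking triplet loss is defined as

$$L_{tri} = \sum_{i=1}^{n_s} \left[ m + \left\| f(x_{s,i}) - f(x_{s,i^+}) \right\|_2 - \left\| f(x_{s,i}) - f(x_{s,i^-}) \right\|_2 \right]_+,$$

where $f(\cdot)$ is the feature produced by the encoder, $x_{s,i^+}$ denotes the samples belonging to the same person as $x_{s,i}$, $x_{s,i^-}$ denotes the samples belonging to different persons from $x_{s,i}$, and $m$ is a margin parameter.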
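As a concrete illustration of the two source-domain losses described above, the following is a minimal sketch in plain Python. The toy probabilities and 2-D features are invented for illustration, the function names are hypothetical, and the margin of 0.5 is taken from the implementation details; this is not the authors' training code.

```python
import math

def cross_entropy(probs, labels):
    """Mean negative log-probability of the true identity over a batch."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on margin + d(anchor, positive) - d(anchor, negative)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)

# Toy batch: two images, three identities (illustrative values only).
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
loss_cls = cross_entropy(probs, labels)

# Positive close, negative far: the hinge is inactive and the loss is zero.
loss_easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 4.0])
# Negative closer than positive: the hinge is active.
loss_hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.1, 0.0])
```

The triplet term only contributes when a negative sample is not at least one margin farther from the anchor than the positive sample.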
Density-based clustering in target domain: In each learning iteration, density-based clustering  is employed in the target domain for pseudo-label prediction. The clustering procedure includes three steps: (1) Extracting convolutional features for all person images. (2) Computing a distance matrix with k-reciprocal encoding  for all training samples and then performing density-based clustering to assign samples into different groups. (3) Assigning pseudo-labels to the training samples according to the groups they belong to.
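The three clustering steps above can be sketched as follows. This is a simplified illustration under stated assumptions: plain Euclidean distances stand in for the k-reciprocal encoding, the small `dbscan` function is a textbook-style implementation written for this example, and the `eps` and `min_samples` values are arbitrary. Samples labeled -1 are treated as unclustered.

```python
import math

def pairwise_distances(features):
    """Euclidean distance matrix (a stand-in for k-reciprocal encoding)."""
    return [[math.dist(a, b) for b in features] for a in features]

def dbscan(dist, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix; -1 marks noise."""
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Step 1: features (toy 2-D points instead of CNN features).
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
# Steps 2 and 3: distance matrix, clustering, cluster IDs as pseudo-labels.
pseudo_labels = dbscan(pairwise_distances(features), eps=1.5, min_samples=3)
```

In this toy example the two dense groups receive pseudo-labels 0 and 1, while the isolated point is left unclustered.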
Adaptive sample augmentation across cameras: Due to the domain gap, the pseudo-labels predicted by density-based clustering are noisy. In addition, the limited number of training samples in the target domain often leads to low sample diversity in each cluster. These two factors make it difficult to learn discriminative representations in the target domain.
To address these issues, we propose to augment samples in the target domain with a GAN to aggregate sample diversity. The used GAN should possess the following two properties: (1) Generating new person images from existing ones while preserving the original identities; (2) Providing additional invariance such as camera configurations, lighting conditions, and person views.
To fulfill these purposes, we employ StarGAN to augment person images, since it can preserve person identities while generating new images in multiple camera styles. The image generation procedure builds on the results of density-based clustering. Suppose there are $C$ cameras in total in the target domain. A StarGAN model is first trained to enable image-to-image translation between each camera pair. Using the learned StarGAN model, for an image $x_{t,i}$ with pseudo-label $y_{t,i}$, we generate augmented images that share the pseudo-label of $x_{t,i}$ and have styles similar to the images of the respective target cameras. In this way, the number of samples in each cluster grows with the number of camera styles generated per image. The augmented images together with the original images in the target domain are used for discriminative feature learning, according to Eq. 3.
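The bookkeeping behind this augmentation step can be sketched as below. Note that `translate_to_camera` is a hypothetical placeholder for a trained StarGAN generator (it only tags the image name), and skipping the image's own camera is an assumption made for this example.

```python
from collections import Counter

def translate_to_camera(image, target_camera):
    """Hypothetical stand-in for a trained StarGAN generator.

    A real generator would return a new image in the target camera style;
    here the image name is only tagged so the bookkeeping can be followed.
    """
    return f"{image}->cam{target_camera}"

def augment_across_cameras(samples, num_cameras):
    """samples: list of (image, camera_id, pseudo_label) tuples.

    Each image is translated to every other camera and keeps its pseudo-label.
    """
    augmented = []
    for image, camera, label in samples:
        for target in range(num_cameras):
            if target == camera:
                continue  # assumption: no translation to the image's own camera
            augmented.append((translate_to_camera(image, target), target, label))
    return augmented

samples = [("img_a", 0, 7), ("img_b", 2, 7), ("img_c", 1, 3)]
generated = augment_across_cameras(samples, num_cameras=3)
cluster_sizes = Counter(label for _, _, label in samples + generated)
```

With three cameras, every original image contributes two translated copies, so the cluster with pseudo-label 7 grows from 2 to 6 samples without any change to its label.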
3.3 Min-Max Optimization
Although the adaptive sample augmentation enforces the discrimination ability of re-ID models, the sample generation procedure is completely independent from the clustering and feature learning which could lead to insufficient sample diversity across cameras.
To fuse the adaptive data augmentation with discriminative feature learning, we propose an adversarial min-max optimization strategy as illustrated in Fig. 3. Specifically, we alternately train an image generator and a feature encoder that maximize sample diversity and minimize intra-cluster distance for each mini-batch, respectively.
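Max-Step: StarGAN is employed as the image generator ($g$) for a given feature encoder ($f$). In the procedure, the summation of Euclidean distances between samples and their cluster centers is defined as the cluster diversity $D_{div}$. For each sample, the diversity is defined as

$$D_{div}(x_{t,i}) = \left\| f(x_{t,i}) - \frac{\sum_{j} a(i,j)\, f(x_{t,j})}{\sum_{j} a(i,j)} \right\|_2, \qquad (4)$$

where $a(i,j)$ indicates whether samples $x_{t,i}$ and $x_{t,j}$ belong to the same person or not: $a(i,j) = 1$ when their pseudo-labels satisfy $y_{t,i} = y_{t,j}$, otherwise $a(i,j) = 0$.
For a batch of $n_t$ samples, a diversity loss is defined as

$$L_{div} = \frac{1}{n_t} \sum_{i=1}^{n_t} e^{-\lambda D_{div}(x_{t,i})}, \qquad (5)$$

where $\lambda$ is a hyper-parameter. We use a negative exponential function to prevent $D_{div}$ from growing too large, so as to preserve the identity of the augmented person images. According to Eq. 4 and Eq. 5, maximizing the sample diversity in a cluster is equivalent to minimizing the loss:

$$\max D_{div} \iff \min L_{div}.$$

$L_{div}$ is combined with the loss of StarGAN to optimize the generator while augmenting samples.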
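To make the Max-Step quantities concrete, here is a small sketch of the per-sample diversity and the diversity loss. It assumes that the cluster center is the mean feature of a cluster and that the loss averages exp(-lambda * diversity) over the batch; the value `lam=0.5`, the function names, and the toy features are illustrative assumptions only.

```python
import math

def cluster_center(features, labels, label):
    """Mean feature of all samples carrying the given pseudo-label."""
    members = [f for f, y in zip(features, labels) if y == label]
    dim = len(members[0])
    return [sum(m[d] for m in members) / len(members) for d in range(dim)]

def sample_diversity(features, labels, i):
    """Euclidean distance between sample i and the center of its cluster."""
    return math.dist(features[i], cluster_center(features, labels, labels[i]))

def diversity_loss(features, labels, lam):
    """Mean of exp(-lam * diversity): larger diversity gives a smaller loss."""
    n = len(features)
    total = sum(math.exp(-lam * sample_diversity(features, labels, i)) for i in range(n))
    return total / n

labels = [0, 0, 1, 1]
tight = [(0.0, 0.0), (0.0, 2.0), (5.0, 0.0), (5.0, 2.0)]
spread = [(0.0, 0.0), (0.0, 6.0), (5.0, 0.0), (5.0, 6.0)]
loss_tight = diversity_loss(tight, labels, lam=0.5)
loss_spread = diversity_loss(spread, labels, lam=0.5)
```

The more spread-out clusters yield the smaller loss, which is the sense in which minimizing the loss corresponds to maximizing diversity. Because the exponential is bounded, the gain from pushing a sample ever farther from its center diminishes.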
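Min-Step: Given a fixed generator $g$, the feature encoder $f$ learns to minimize the intra-cluster distance while maximizing the inter-cluster distance in feature space under the constraint of a triplet loss, which is defined as

$$L_{tri}^{t} = \sum_{i=1}^{n_t} \left[ m + \left\| f(x_{t,i}) - f(x_{t,i^+}) \right\|_2 - \left\| f(x_{t,i}) - f(x_{t,i^-}) \right\|_2 \right]_+,$$

where $x_{t,i^+}$ denotes the samples belonging to the same cluster as $x_{t,i}$, $x_{t,i^-}$ denotes the samples belonging to different clusters from $x_{t,i}$, and $m$ is a margin parameter. Specifically, we choose all the positive samples and the hardest negative sample to construct the triplets for each anchor sample, within a mini-batch of both original and generated sample images. The overall objective is thus a min-max problem in which the generator $g$ maximizes the cluster diversity while the encoder $f$ minimizes the triplet loss.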
When the generator $g$ keeps producing more diverse samples with features far away from the cluster centers, the encoder $f$ will be equipped with stronger discrimination ability in the target domain, as illustrated in Fig. 4. Algorithm 1 shows the detailed training procedure of the proposed AD-Cluster.
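The triplet construction used in the Min-Step (all positives, hardest negative per anchor) can be sketched as follows. Averaging over triplets, the function name, and the toy 2-D features are assumptions of this example rather than details confirmed by the text.

```python
import math

def batch_triplet_loss(features, labels, margin=0.5):
    """For each anchor, pair every positive with the hardest negative."""
    total, count = 0.0, 0
    for i, (anchor, y) in enumerate(zip(features, labels)):
        negatives = [math.dist(anchor, f) for f, z in zip(features, labels) if z != y]
        if not negatives:
            continue
        hardest_negative = min(negatives)  # closest sample from another cluster
        for j, (other, z) in enumerate(zip(features, labels)):
            if j == i or z != y:
                continue  # skip the anchor itself and all negatives
            total += max(0.0, margin + math.dist(anchor, other) - hardest_negative)
            count += 1
    return total / count if count else 0.0

features = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
# Pseudo-labels that agree with the geometry: clusters are well separated.
loss_separated = batch_triplet_loss(features, [0, 0, 1, 1])
# Pseudo-labels that mix the two groups: every triplet violates the margin.
loss_mixed = batch_triplet_loss(features, [0, 1, 0, 1])
```

When clusters are compact and well separated the loss vanishes, whereas samples far from their own cluster (such as the diverse generated ones) produce a non-zero loss that the encoder must reduce.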
Table 1: Comparison with state-of-the-art methods on DukeMTMC-reID → Market-1501 and Market-1501 → DukeMTMC-reID. The proposed AD-Cluster significantly outperforms all state-of-the-art methods over all evaluation metrics. The top-three results are highlighted with bold, italic, and underline fonts, respectively.
4 Experiments
We detail the implementation and evaluation of AD-Cluster. The evaluation covers ablation studies, parameter analysis, and comparisons with other methods.
4.1 Datasets and Evaluation Metrics
The experiments were conducted over two public datasets Market1501  and DukeMTMC-ReID   by using the evaluation metrics Cumulative Matching Characteristic (CMC) curve and mean average precision (mAP).
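For readers unfamiliar with the two metrics, the sketch below computes CMC rank-k accuracy and mAP from a query-gallery distance matrix. It is a simplified illustration: the standard re-ID protocols additionally remove gallery images of the same identity taken by the same camera as the query, which is omitted here, and the function name and toy distances are invented.

```python
def rank_k_and_map(dist, query_ids, gallery_ids, ks=(1, 5)):
    """dist[q][g] is the distance between query q and gallery item g."""
    hits = {k: 0 for k in ks}
    average_precisions = []
    for q, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # a query without any true match is skipped
        first_hit = matches.index(True)
        for k in ks:
            hits[k] += first_hit < k  # counted if a true match is in the top k
        found, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)
        average_precisions.append(sum(precisions) / len(precisions))
    n = len(average_precisions)
    cmc = {k: hits[k] / n for k in ks}
    return cmc, sum(average_precisions) / n

query_ids = [1, 2]
gallery_ids = [1, 2, 1, 3]
dist = [[0.1, 0.5, 0.3, 0.9],
        [0.2, 0.6, 0.4, 0.1]]
cmc, mean_ap = rank_k_and_map(dist, query_ids, gallery_ids)
```

In this toy case the first query retrieves both of its matches at the top of the ranking, while the second query finds its only match at rank 4, so rank-1 accuracy is 0.5 and mAP is 0.625.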
Market1501 : This dataset contains 32,668 images of 1,501 identities from 6 disjoint surveillance cameras. Of the 32,668 person images, 12,936 images from 751 identities form a training set, 19,732 images from 750 identities (plus a number of distractors) form a gallery set, and 3,368 images from 750 identities form a query set.
DukeMTMC-ReID: This dataset is a subset of DukeMTMC. It consists of 16,522 training images, 2,228 query images, and 17,661 gallery images of 1,812 identities captured using 8 cameras. Of the 1,812 identities, 1,404 appear in at least two cameras and the remaining 408 (considered as distractors) appear in only one camera.
4.2 Implementation Details
We adopt ResNet-50 as the backbone network and initialize it with parameters pre-trained on ImageNet. During training, the input images are resized to a uniform size, and traditional image augmentation is performed via random flipping and random erasing. Each mini-batch of size 256 is sampled with P = 32 randomly selected identities and K = 8 randomly sampled images per identity (original to augmented samples ratio = 3:1) for computing the hard-batch triplet loss.
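The P x K batch construction can be sketched as follows. This is a minimal illustration, assuming that identities with fewer than K images are sampled with replacement; the 3:1 ratio between original and augmented samples is not modeled, and `sample_pk_batch` is a hypothetical helper name.

```python
import random
from collections import defaultdict

def sample_pk_batch(labels, p=32, k=8, rng=None):
    """Return indices for P identities with K images each (batch size P*K)."""
    rng = rng or random.Random()
    by_identity = defaultdict(list)
    for index, label in enumerate(labels):
        by_identity[label].append(index)
    identities = rng.sample(sorted(by_identity), p)
    batch = []
    for identity in identities:
        pool = by_identity[identity]
        if len(pool) >= k:
            batch.extend(rng.sample(pool, k))
        else:
            batch.extend(rng.choices(pool, k=k))  # sample with replacement
    return batch

# Toy target set: 40 pseudo-identities with 10 images each.
labels = [i % 40 for i in range(400)]
batch = sample_pk_batch(labels, p=32, k=8, rng=random.Random(0))
```

Sampling a fixed number of images per identity guarantees that every anchor in the batch has positives available for the triplet loss.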
In addition, we set the margin parameter to 0.5 and use the SGD optimizer with momentum to train the model. The whole training process consists of 30 iterative min-max clustering processes, each of which consists of 70 training epochs.
Our network was implemented on a PyTorch platform and trained using 4 NVIDIA Tesla K80 GPUs (each with 12GB VRAM).
4.3 Comparisons with State-of-the-Arts
We compare AD-Cluster with state-of-the-art unsupervised person re-ID methods, including: 1) LOMO and BOW that used hand-crafted features; 2) UMDL, PUL and CAMEL that employed unsupervised learning; and 3) nine UDA-based methods including PTGAN, SPGAN, ATNet, CamStyle, HHL, and ECN that used GANs; MMFA and TJ-AIDL that used image attributes; and UDAP that employed clustering. Table 1 shows the person re-ID performance while adapting from Market1501 to DukeMTMC-reID and vice versa.
As Table 1 shows, LOMO and BOW, which use hand-crafted features, do not perform well. UMDL, PUL and CAMEL derive image features through unsupervised learning, and they perform clearly better than LOMO and BOW under most evaluation metrics. The UDA-based methods further improve the person re-ID performance in most cases. Specifically, UDAP performs much better than other methods as it exploits the distribution of clusters in the target domains. The performance of the UDA methods using GANs varies. In particular, ECN performs better than most GAN-based methods because it enforces camera invariance and domain connectedness.
|Methods||DukeMTMC-reID → Market-1501||Market-1501 → DukeMTMC-reID|
| ||rank-1||rank-5||rank-10||mAP||rank-1||rank-5||rank-10||mAP|
|Supervised Model (upper bound)||91.9||97.4||98.4||81.4||82.8||92.2||94.9||69.8|
In addition, AD-Cluster performs significantly better than all compared methods. As Table 1 shows, for the unsupervised adaptation DukeMTMC-reID → Market1501, AD-Cluster achieves a rank-1 accuracy of 86.7% and outperforms the previous state of the art (UDAP) in both rank-1 accuracy and mAP. For Market1501 → DukeMTMC-reID, AD-Cluster likewise outperforms UDAP in both rank-1 accuracy and mAP.
Note that AD-Cluster improves differently for the two adaptations in reverse directions between the two datasets. This can also be observed for most existing methods as shown in Table 1. We conjecture that this is because the large variance of samples in DukeMTMC-reID caused more clustering noise, which reduces the effectiveness of pseudo-label prediction and hinders the model adaptation.
4.4 Ablation Studies
Extensive ablation studies are performed to evaluate each component of AD-Cluster as shown in Table 2.
Baseline, the Upper and Lower Bounds: We first derive the upper and lower performance bounds for the ablation studies as shown in Table 2. Specifically, the upper bounds of re-ID performance are derived from the Supervised Models, which are trained using labeled target-domain training images and evaluated on the target-domain test images. The lower performance bounds are derived from the Direct Transfer models, which are trained using the labeled source-domain training images and evaluated on the target-domain test images. We can observe huge performance gaps between the Direct Transfer models and the Supervised Models due to the domain shift. Take Market-1501 as an example: the rank-1 accuracy of the supervised model reaches 91.9%, but it drops significantly for the directly transferred model, which is trained using the DukeMTMC-reID training images.
In addition, Table 2 gives the performance of the Baseline models, which are transfer models trained by iterative density-based clustering. As Table 2 shows, the Baseline model outperforms the Direct Transfer model by a large margin in rank-1 accuracy on both Market1501 and DukeMTMC-reID. This shows that the density-based clustering in the Baseline can group samples of the same identity even when they follow irregular distributions, by utilizing the density correlation. At the same time, we can observe that there are still large performance gaps between the Baseline models and the Supervised Models, for example in mAP when transferring from DukeMTMC-reID to Market1501.
Adaptive Sample Augmentation: We first evaluated the adaptive sample augmentation as described in Section 3.2. For this experiment, we designed a network Baseline+ASA that incorporates only the adaptive sample augmentation into the Baseline, which performs transfer via iterative density-based clustering. As shown in Table 2, adaptive sample augmentation improves the re-ID performance significantly. For DukeMTMC-reID → Market1501, the Baseline+ASA achieves higher rank-1 accuracy and mAP than the Baseline. The contribution of the proposed sample augmentation can also be observed from the perspective of sample distributions in the feature space as illustrated in Fig. 5(c), where including the proposed sample augmentation improves the sample distribution greatly compared with density-based clustering alone, as shown in Fig. 5(b).
The large performance improvements can be explained by the effectiveness of the augmented samples. Specifically, the iterative injection of ID-preserving cross-camera images helps to reduce the feature distances of person images within the same cluster (i.e., the intra-cluster distances) and to increase those between different clusters (i.e., the inter-cluster distances) simultaneously.
Discriminative Learning: We evaluated the discriminative learning component as described in Section 3.3. For this experiment, we designed a new network Baseline+ASA+DL that further incorporates discriminative learning into the Baseline+ASA network described above. As shown in Table 2, the incorporation of discriminative learning consistently improves the person re-ID performance beyond the Baseline+ASA. Take the transfer DukeMTMC-reID → Market1501 as an example: the Baseline+ASA+DL outperforms the corresponding Baseline+ASA in both rank-1 accuracy and mAP. The superior performance of the proposed discriminative learning can also be observed intuitively from the perspective of sample distributions in feature space as shown in Fig. 5(d). The effectiveness of the discriminative learning can be largely attributed to the min-max clustering optimization, which alternately trains the image generator to generate more diverse samples for maximizing the sample diversity and the feature encoder for minimizing the intra-class distance.
From another perspective, it can be seen that Baseline+ASA+DL (i.e., the complete AD-Cluster model) outperforms the Baseline by a large margin in rank-1 accuracy and by up to 17% in mAP. This demonstrates the effectiveness of the proposed ID-preserving cross-camera sample augmentation and discriminative learning in UDA-based person re-ID. In addition, we can observe that the performance of Baseline+ASA+DL comes close to that of the Supervised Models. For example, the Baseline+ASA+DL achieves a rank-1 accuracy of 86.7% for the transfer DukeMTMC-reID → Market-1501, which is only 5.2% lower than the corresponding Supervised Model.
Specificity of AD-Cluster. The performance of AD-Cluster is related to the sample generation method. In this work, we generate cross-camera images by using StarGAN, which in principle can be replaced by any other ID-preserving generator. The key is how well the re-ID model can learn camera style invariance via the newly generated samples. AD-Cluster could thus be influenced by two factors: the quality of the generated samples and the strength of camera style invariance of the sample distribution in the target domain. These factors explain the different improvements achieved by AD-Cluster over different adaptation tasks.
The min-max attenuation coefficient $\lambda$ in Eq. 5 affects the ID-preserving min-max clustering and thus the person re-ID performance. We studied this parameter by setting it to different values and checking the person re-ID performance. Fig. 6 shows experimental results on Market-1501. Using a smaller $\lambda$ usually leads to a higher cluster diversity, which further leads to better re-ID performance. On the other hand, $\lambda$ should not be very small, for the sake of identity preservation. The best-performing value of $\lambda$ can be read from Fig. 6.
We also evaluate the accuracy of the pseudo-labels that are predicted during the iterative min-max clustering, as well as how the person re-ID performance evolves during this process. Fig. 7 (left) shows that the F-score of the predicted pseudo-labels keeps improving during the iterative clustering process. Additionally, the proposed min-max clustering outperforms the density-based clustering significantly in both mAP and rank-1 accuracy, as shown in the right graph in Fig. 7.
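One common way to score pseudo-labels against ground-truth identities is a pairwise F-score, sketched below. The text does not specify which F-score definition is used, so the pairwise variant, the function name, and the toy labels are assumptions made for illustration.

```python
from itertools import combinations

def pairwise_f_score(pseudo_labels, true_labels):
    """F-score over sample pairs: a pair is positive if both share a label."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_pred = pseudo_labels[i] == pseudo_labels[j]
        same_true = true_labels[i] == true_labels[j]
        tp += same_pred and same_true
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

true_ids = [0, 0, 0, 1, 1, 1]
noisy_clusters = [0, 0, 1, 1, 1, 1]  # one sample assigned to the wrong cluster
f_noisy = pairwise_f_score(noisy_clusters, true_ids)
f_perfect = pairwise_f_score([5, 5, 5, 9, 9, 9], true_ids)
```

Because only co-membership of pairs is compared, the score does not depend on the arbitrary numbering of the clusters.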
This paper presents an augmented discriminative clustering (AD-Cluster) method for domain adaptive person re-ID. On top of density-based clustering, we introduce adaptive sample augmentation to generate more diverse samples and a min-max optimization scheme to learn a more discriminative re-ID model. Experiments demonstrate the effectiveness of adaptive sample augmentation and min-max optimization in improving the discrimination ability of deep re-ID models. Our approach not only sets a new state of the art in UDA accuracy on two large-scale benchmarks but also provides a fresh insight for general UDA problems. We expect that the proposed AD-Cluster will inspire new insights and attract more interest in better UDA-based recognition and detection in the near future.
This work is partially supported by grants from the National Key R&D Program of China under grant 2017YFB1002400, the National Natural Science Foundation of China under contract No. 61825101, No. U1611461, No. 61836012 and No. 61972217.
-  Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, and Mario Marchand. Domain-adversarial neural networks. CoRR, abs/1412.4446, 2014.
-  Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, and Dilip Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In IEEE CVPR, pages 95–104, 2017.
-  Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In ECCV, 2018.
-  Minmin Chen, Kilian Q. Weinberger, and John Blitzer. Co-training for domain adaptation. In NeurIPS, pages 2456–2464, 2011.
-  Tianlong Chen, Shaojin Ding, Jingyi Xie, Ye Yuan, Wuyang Chen, Yang Yang, Zhou Ren, and Zhangyang Wang. Abd-net: Attentive but diverse person re-identification. In IEEE ICCV, pages 8351–8361, 2019.
-  Yanbei Chen, Xiatian Zhu, and Shaogang Gong. Instance-guided context rendering for cross-domain person re-identification. In IEEE ICCV, 2019.
-  Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In IEEE CVPR, 2018.
-  Adam Coates and Andrew Y. Ng. Learning feature representations with k-means. In Neural Networks: Tricks of the Trade - Second Edition, volume 7700, pages 561–580. Springer, 2012.
-  Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. Imagenet: A large-scale hierarchical image database. In IEEE CVPR, 2009.
-  Weijian Deng, Liang Zheng, Qixiang Ye, Guoliang Kang, Yi Yang, and Jianbin Jiao. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In IEEE CVPR, 2018.
-  Alexey Dosovitskiy, Jost Tobias Springenberg, Martin A. Riedmiller, and Thomas Brox. Discriminative unsupervised feature learning with convolutional neural networks. In NeurIPS, pages 766–774, 2014.
-  Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226–231, 1996.
-  Hehe Fan, Liang Zheng, Chenggang Yan, and Yi Yang. Unsupervised person re-identification: Clustering and fine-tuning. TOMCCAP, 14(4):83:1–83:18, 2018.
-  Hehe Fan, Liang Zheng, and Yi Yang. Unsupervised person re-identification: Clustering and fine-tuning. CoRR, abs/1705.10444, 2017.
-  Yang Fu, Yunchao Wei, Guanshuo Wang, Yuqian Zhou, Honghui Shi, and Thomas S. Huang. Self-similarity grouping: A simple unsupervised cross domain adaptation approach for person re-identification. In IEEE ICCV, 2019.
-  Yaroslav Ganin and Victor S. Lempitsky. Unsupervised domain adaptation by backpropagation. In Francis R. Bach and David M. Blei, editors, ICML, volume 37, pages 1180–1189, 2015.
-  Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor S. Lempitsky. Domain-adversarial training of neural networks. J. Mach. Learn. Res., 17:59:1–59:35, 2016.
-  Kamran Ghasedi, Xiaoqian Wang, Cheng Deng, and Heng Huang. Balanced self-paced learning for generative adversarial clustering network. In IEEE CVPR, 2019.
-  Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander J. Smola. A kernel two-sample test. J. Mach. Learn. Res., 13:723–773, 2012.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE CVPR, June 2016.
-  Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. CoRR, abs/1703.07737, 2017.
-  Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei A. Efros, and Trevor Darrell. Cycada: Cycle-consistent adversarial domain adaptation. In ICML, pages 1994–2003, 2018.
-  Yu-Jhe Li, Ci-Siang Lin, Yan-Bo Lin, and Yu-Chiang Frank Wang. Cross-dataset person re-identification via unsupervised pose disentanglement and adaptation. In IEEE ICCV, 2019.
-  Renjie Liao, Alexander G. Schwing, Richard S. Zemel, and Raquel Urtasun. Learning deep parsimonious representations. In NeurIPS, pages 5076–5084, 2016.
-  Shengcai Liao, Yang Hu, Xiangyu Zhu, and Stan Z. Li. Person re-identification by local maximal occurrence representation and metric learning. In IEEE CVPR, June 2015.
-  Shan Lin, Haoliang Li, Chang-Tsun Li, and Alex C. Kot. Multi-task mid-level feature alignment network for unsupervised cross-dataset person re-identification. In BMVC, 2018.
-  Jiawei Liu, Zheng-Jun Zha, Di Chen, Richang Hong, and Meng Wang. Adaptive transfer network for cross-domain person re-identification. In IEEE CVPR, 2019.
-  Ming-Yu Liu and Oncel Tuzel. Coupled generative adversarial networks. In NeurIPS, pages 469–477, 2016.
-  Zimo Liu, Dong Wang, and Huchuan Lu. Stepwise metric promotion for unsupervised video person re-identification. In IEEE ICCV, pages 2448–2457, 2017.
-  Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I. Jordan. Learning transferable features with deep adaptation networks. In Francis R. Bach and David M. Blei, editors, ICML, volume 37, pages 97–105, 2015.
-  Mingsheng Long, Guiguang Ding, Jianmin Wang, Jiaguang Sun, Yuchen Guo, and Philip S. Yu. Transfer sparse coding for robust image representation. In IEEE CVPR, pages 407–414, 2013.
-  Jianming Lv and Xintong Wang. Cross-dataset person re-identification using similarity preserved generative adversarial networks. In Weiru Liu, Fausto Giunchiglia, and Bo Yang, editors, KSEM, pages 171–183, 2018.
-  Saeid Motiian, Marco Piccirilli, Donald A. Adjeroh, and Gianfranco Doretto. Unified deep supervised domain adaptation and generalization. In IEEE ICCV, pages 5716–5726, 2017.
-  Peixi Peng, Tao Xiang, Yaowei Wang, Massimiliano Pontil, Shaogang Gong, Tiejun Huang, and Yonghong Tian. Unsupervised cross-dataset transfer learning for person re-identification. In IEEE CVPR, June 2016.
-  Lei Qi, Lei Wang, Jing Huo, Luping Zhou, Yinghuan Shi, and Yang Gao. A novel unsupervised camera-aware domain adaptation framework for person re-identification. In IEEE ICCV, 2019.
-  Ergys Ristani, Francesco Solera, Roger S. Zou, Rita Cucchiara, and Carlo Tomasi. Performance measures and a data set for multi-target, multi-camera tracking. In IEEE ECCV Workshops, 2016.
-  Marcus Rohrbach, Sandra Ebert, and Bernt Schiele. Transfer learning in a transductive setting. In NeurIPS, pages 46–54, 2013.
-  Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In ECCV, 2010.
-  Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, and Tatsuya Harada. Maximum classifier discrepancy for unsupervised domain adaptation. In IEEE CVPR, 2018.
-  Ozan Sener, Hyun Oh Song, Ashutosh Saxena, and Silvio Savarese. Learning transferrable representations for unsupervised domain adaptation. In NeurIPS, pages 2110–2118, 2016.
-  Liangchen Song, Cheng Wang, Lefei Zhang, Bo Du, Qian Zhang, Chang Huang, and Xinggang Wang. Unsupervised domain adaptive re-identification: Theory and practice. CoRR, abs/1807.11334, 2018.
-  Baochen Sun, Jiashi Feng, and Kate Saenko. Return of frustratingly easy domain adaptation. In AAAI, pages 2058–2065, 2016.
-  Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. Deep domain confusion: Maximizing for domain invariance. CoRR, abs/1412.3474, 2014.
-  Jingya Wang, Xiatian Zhu, Shaogang Gong, and Wei Li. Transferable joint attribute-identity deep learning for unsupervised person re-identification. In IEEE CVPR, 2018.
-  Longhui Wei, Shiliang Zhang, Wen Gao, and Qi Tian. Person transfer GAN to bridge domain gap for person re-identification. In IEEE CVPR, 2018.
-  Ancong Wu, Wei-Shi Zheng, and Jian-Huang Lai. Unsupervised person re-identification by camera-aware similarity consistency learning. In IEEE ICCV, 2019.
-  Jinlin Wu, Shengcai Liao, Zhen Lei, Xiaobo Wang, Yang Yang, and Stan Z. Li. Clustering and dynamic sampling based unsupervised domain adaptation for person re-identification. In IEEE ICME, pages 886–891, 2019.
-  Yu Wu, Yutian Lin, Xuanyi Dong, Yan Yan, Wanli Ouyang, and Yi Yang. Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning. In IEEE CVPR, 2018.
-  Junyuan Xie, Ross B. Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. In Maria-Florina Balcan and Kilian Q. Weinberger, editors, ICML, volume 48, pages 478–487, 2016.
-  Fan Yang, Ke Yan, Shijian Lu, Huizhu Jia, Xiaodong Xie, and Wen Gao. Attention driven person re-identification. Pattern Recognition, 86:143–155, 2019.
-  Jianwei Yang, Devi Parikh, and Dhruv Batra. Joint unsupervised learning of deep representations and image clusters. In IEEE CVPR, pages 5147–5156, 2016.
-  Mang Ye, Andy Jinhua Ma, Liang Zheng, Jiawei Li, and Pong C. Yuen. Dynamic label graph matching for unsupervised video re-identification. In IEEE ICCV, pages 5152–5160, 2017.
-  Hong-Xing Yu, Ancong Wu, and Wei-Shi Zheng. Cross-view asymmetric metric learning for unsupervised person re-identification. In IEEE ICCV, 2017.
-  Weichen Zhang, Wanli Ouyang, Wen Li, and Dong Xu. Collaborative and adversarial network for unsupervised domain adaptation. In IEEE CVPR, pages 3801–3809, 2018.
-  Xinyu Zhang, Jiewei Cao, Chunhua Shen, and Mingyu You. Self-training with progressive augmentation for unsupervised cross-domain person re-identification. In IEEE ICCV, 2019.
-  Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, and Qixiang Ye. FreeAnchor: Learning to match anchors for visual object detection. In NeurIPS, pages 147–155, 2019.
-  Liang Zheng, Zhi Bie, Yifan Sun, Jingdong Wang, Chi Su, Shengjin Wang, and Qi Tian. MARS: A video benchmark for large-scale person re-identification. In ECCV, pages 868–884, 2016.
-  Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, and Qi Tian. Scalable person re-identification: A benchmark. In IEEE ICCV, 2015.
-  Zhedong Zheng, Liang Zheng, and Yi Yang. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In IEEE ICCV, 2017.
-  Zhun Zhong, Liang Zheng, Donglin Cao, and Shaozi Li. Re-ranking person re-identification with k-reciprocal encoding. In IEEE CVPR, 2017.
-  Zhun Zhong, Liang Zheng, Shaozi Li, and Yi Yang. Generalizing a person retrieval model hetero- and homogeneously. In ECCV, pages 176–192, 2018.
-  Zhun Zhong, Liang Zheng, Zhiming Luo, Shaozi Li, and Yi Yang. Invariance matters: Exemplar memory for domain adaptive person re-identification. In IEEE CVPR, 2019.
-  Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li, and Yi Yang. CamStyle: A novel data augmentation method for person re-identification. IEEE TIP, 28(3):1176–1190, 2019.