1 Related Works
Person re-ID has been widely studied in the literature. Existing methods typically focus on matching images under viewpoint and pose variations, background clutter, or occlusion [18, 5, 22, 13, 30, 2, 17, 23, 40, 31, 3, 29, 39, 35, 4, 20]. For example, Liu et al. develop a pose-transferable deep learning framework based on GANs to handle pose variations. Chen et al. integrate conditional random fields (CRFs) and deep neural networks with multi-scale similarity metrics. Several attention-based methods [30, 17, 31] further learn discriminative image features to mitigate the effect of background clutter. While these approaches report promising results, they cannot easily be applied to the clothing-dependence problem because they do not suppress the visual differences across clothes or clothing colors.
Some related cross-modality methods [42, 45, 46, 6, 38, 10] have been proposed to address color variations. Varior et al. learn color patterns from pixels sampled from images across camera views, addressing the effect of different illuminations on the perceived color of subjects. Wu et al. built the first cross-modality RGB-IR benchmark dataset, SYSU-MM01. They also analyze three network structures and propose deep zero-padding, which lets domain-specific structure evolve automatically in a one-stream network optimized for RGB-IR re-ID. The works in [45, 46] propose modality-specific and modality-shared metric losses and a bi-directional dual-constrained top-ranking loss for RGB-thermal person re-identification. A cross-modality generative adversarial network (cmGAN) has been introduced to reduce the distribution divergence between RGB and IR features. More recently, pixel alignment and feature alignment have been applied jointly to reduce cross-modality variations. Yet, even though these models successfully learn sensor-invariant features across modalities, they still cannot address clothing dependence within a single modality.
Recently, a number of models have been proposed to better represent specific disentangled features for re-ID [32, 49, 48, 47, 16, 44, 41]. Ma et al. generate person images by disentangling the input into foreground, background, and pose with a complex multi-branch model that is not end-to-end trainable. Ge et al. and Li et al. learn pose-invariant features with guided image information. Zheng et al. propose a joint learning framework named DG-Net that couples re-ID learning and data generation end-to-end. Their model involves a generative module that separately encodes each person into an appearance code and a structure code, which leads to improved re-ID performance. However, the appearance encoder they use to perform re-ID is still dominated by the clothing-color features of the input images. Based on these observations, we choose to learn clothing-color invariant features using a novel and unified model. By disentangling the body-shape representation, re-ID can be performed successfully under clothing change even if no ground-truth images containing clothing change are available for training.
For the sake of completeness, we first define the notation used in this paper. In the training stage, we have access to a set of RGB images and its corresponding label set, in which each RGB image is paired with its identity label. To allow our model to handle images with different color variations, we generate a gray-scale image set by multiplying each RGB image by the RGB channel summation factors and then duplicating the resulting single channel so that the output has the original image size. Naturally, the label set of the gray-scale images is identical to that of the RGB images. To achieve body-shape distilling via image generation, we also sample another set of RGB images whose label set is the same as that of the first set but whose poses and viewpoints differ.
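The gray-scale conversion described above can be sketched in a few lines of plain Python. This is only an illustration: the text does not state which channel summation factors are used, so the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are an assumption, and the helper name `to_gray3` is hypothetical.

```python
# Assumed channel weights (ITU-R BT.601 luma); the paper's actual factors are not given here.
WEIGHTS = (0.299, 0.587, 0.114)

def to_gray3(image):
    """Convert an H x W x 3 RGB image (nested lists) into a 3-channel gray-scale image.

    Each pixel is reduced to a single weighted sum of its R, G, B values,
    and that value is duplicated across three channels so the output keeps
    the shape of the input.
    """
    gray = []
    for row in image:
        out_row = []
        for r, g, b in row:
            y = WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b
            out_row.append([y, y, y])
        gray.append(out_row)
    return gray

# A tiny 2 x 2 example image.
rgb = [[[255, 0, 0], [0, 255, 0]],
       [[0, 0, 255], [255, 255, 255]]]
gray = to_gray3(rgb)
```

Because the output has the same shape as the input, the same encoder can consume both the RGB and the gray-scale version of an image without architectural changes.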
As depicted in Figure 4, CASE-Net consists of five components: (1) the shape encoder, (2) the color encoder, (3) the feature discriminator, (4) the image generator, and (5) the image discriminator. We now describe how these components work together to learn a body-shape feature that can be used for re-ID in domains that do not use color. Training CASE-Net yields both a shape encoding and a color encoding of a person image. However, we are primarily interested in the body-shape feature, since it can be reused for cross-domain (non-color-dependent) re-ID tasks.
2.1 Clothing color adaptation in re-ID
Shape encoder.
To utilize the label information of the training data for person re-ID, we apply a classification loss to the output feature vectors. With the person identity as ground truth, we compute the negative log-likelihood between the predicted label and the ground-truth one-hot vector, and define the identity loss as this quantity accumulated over all identities (classes).
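As a minimal sketch of such an identity loss, the snippet below computes the negative log-likelihood of the true identity from raw classifier scores. With a one-hot target, the sum over classes reduces to the negative log-probability of the true class. The function names and the example scores are hypothetical and serve only as illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def identity_loss(logits, label):
    """Negative log-likelihood of the ground-truth identity `label`."""
    probs = softmax(logits)
    return -math.log(probs[label])

# Three identities; the classifier favours identity 0.
loss_correct = identity_loss([4.0, 1.0, 0.5], label=0)
loss_wrong = identity_loss([4.0, 1.0, 0.5], label=2)
```

The loss is small when the classifier assigns high probability to the true identity and grows as that probability shrinks.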
To further enhance the discriminative property, we impose a triplet loss on the feature vector, which maximizes the inter-class discrepancy while minimizing the intra-class distance. Specifically, for each input image, we sample a positive image with the same identity label and a negative image with a different identity label to form a triplet. We then compute two distances: one between the feature vectors of the input and the positive image, and one between the feature vectors of the input and the negative image. The triplet loss is defined on these two distances with a margin, which sets the required gap between the distance of the positive pair and the distance of the negative pair.
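The triplet construction above can be illustrated with the common hinge form of the loss. This sketch assumes Euclidean distances and the usual `max(0, margin + d_pos - d_neg)` formulation; the margin value and feature vectors below are made up for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin):
    """Hinge-style triplet loss: penalise triplets whose negative is not
    at least `margin` farther from the anchor than the positive is."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)

anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # same identity, distance 1
far_negative = [3.0, 4.0]    # different identity, distance 5
near_negative = [1.0, 0.0]   # different identity, distance 1

easy = triplet_loss(anchor, positive, far_negative, margin=2.0)   # already satisfied
hard = triplet_loss(anchor, positive, near_negative, margin=2.0)  # violates the margin
```

Only triplets that violate the margin contribute to the loss, which is why the choice of negatives matters in practice.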
Feature discriminator.
Next, since our goal is to derive body-shape representations that do not depend on clothing color, we first learn a color-invariant representation by encouraging the shape encoder to produce similar feature distributions for RGB and gray-scale inputs. To achieve this, we adopt an adversarial learning strategy and deploy a feature discriminator in the latent feature space. This discriminator takes the encoded RGB and gray-scale feature vectors as inputs and determines which of the two image sets each feature vector comes from; the feature-level adversarial loss is defined on this decision. (For simplicity, we omit the sample subscript when denoting RGB and gray-scale images and their corresponding labels.) With this loss, the feature discriminator distinguishes features from the two distributions while the shape encoder aligns the feature distributions across color variations, so that color-invariant representations are learned in an adversarial manner.
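To make the adversarial game concrete, the sketch below evaluates a standard GAN-style objective on discriminator outputs. It assumes the discriminator emits, for each feature, the probability that it came from an RGB image; this standard form is an assumption, since the exact equation is not reproduced in this text, and the function name is hypothetical.

```python
import math

def feature_adv_objective(d_rgb, d_gray):
    """Standard GAN-style objective on discriminator outputs.

    d_rgb  : probabilities (in (0, 1)) assigned to features of RGB images
    d_gray : probabilities (in (0, 1)) assigned to features of gray-scale images
    The discriminator tries to increase this value; the encoder tries to decrease it.
    """
    term_rgb = sum(math.log(p) for p in d_rgb) / len(d_rgb)
    term_gray = sum(math.log(1.0 - p) for p in d_gray) / len(d_gray)
    return term_rgb + term_gray

# A discriminator that separates the two sets well versus one that is fully confused.
separating = feature_adv_objective([0.9, 0.8], [0.1, 0.2])
confused = feature_adv_objective([0.5, 0.5], [0.5, 0.5])
```

When the encoder succeeds in aligning the two feature distributions, the discriminator can do no better than output 0.5 everywhere, which corresponds to the `confused` value of `2 * log(0.5)`.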
2.2 Pose guidance for body shape disentanglement
Color encoder.
To ensure that the derived feature is body-shape related in clothing-color changing tasks, we perform additional body-shape disentanglement while learning CASE-Net. That is, the color encoder in Fig. 4 encodes the inputs from the RGB image set into color-related features. As a result, both gray-scale body-shape features and color features are produced in the latent space. Inspired by DG-Net, which uses gray-scale images to achieve body-shape disentanglement across pose variations, we similarly require our generator to produce person images conditioned on the encoded color feature coming from a different pose. To be precise, the generator takes the concatenated shape and color feature pair and outputs the corresponding image.
Image generator.
Since we have ground-truth labels (i.e., image-pair correspondences) for the training data, we can perform an image recovery task. Given two images of the same person with different poses, we expect them to share the same body-shape feature. Given the desired feature pair, we then require the generator to output the image using the body-shape feature originally associated with the other image of that person. We refer to this as pose-guided image recovery.
With the above discussion, the image reconstruction loss is computed between the generated image and its target. Note that we adopt the L1 norm in the reconstruction loss terms, as it preserves image sharpness.
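A reconstruction loss of this kind can be sketched as a mean absolute error between a generated image and its target. The snippet assumes the norm in question is the L1 norm and averages over pixels; whether the original formulation sums or averages is not stated here, and the function name is hypothetical.

```python
def l1_reconstruction_loss(generated, target):
    """Mean absolute difference between two flattened images of equal size."""
    if len(generated) != len(target):
        raise ValueError("images must have the same number of values")
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(target)

# Flattened toy "images" with three pixel values each.
loss = l1_reconstruction_loss([0.0, 1.0, 0.5], [0.0, 0.0, 1.0])
```

Compared with a squared-error penalty, the absolute error does not favour averaging over plausible outputs as strongly, which is the usual argument for it producing sharper reconstructions.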
Image discriminator.
To further enforce perceptual content recovery, we produce perceptually realistic outputs by having the image discriminator distinguish the real images from the synthesized ones. Image recovery is thus supervised by both the reconstruction loss and the perceptual discriminator loss, the latter being an image-level adversarial loss on real and synthesized images.
To perform person re-ID in the testing phase, our network encodes the query image with the shape encoder to derive the body-shape feature, which is matched against the gallery features via nearest-neighbor search (in Euclidean distance). We will detail the properties of each component in the following subsections.
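The retrieval step at test time amounts to sorting gallery features by their Euclidean distance to the query feature. The sketch below shows this with toy two-dimensional features; `rank_gallery` is a hypothetical helper name, and real body-shape features would be much higher-dimensional.

```python
import math

def rank_gallery(query, gallery):
    """Return gallery indices sorted from nearest to farthest from the query feature."""
    def dist(v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(query, v)))
    return sorted(range(len(gallery)), key=lambda i: dist(gallery[i]))

query_feature = [0.0, 0.0]
gallery_features = [[3.0, 4.0],   # distance 5
                    [1.0, 0.0],   # distance 1
                    [0.0, 2.0]]   # distance 2
ranking = rank_gallery(query_feature, gallery_features)
```

The first index in `ranking` is the nearest neighbor, i.e., the top-1 match for the query.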
It is important to note that the goal of CASE-Net is to perform re-ID in clothing-changing scenarios without observing ground-truth clothing-changing training data. With the network modules introduced above, CASE-Net is able to perform re-ID in environments with clothing changes. More precisely, the joint training of the encoders, the generator, and the feature discriminator allows our model to learn a body-structural representation. The pseudo-code for training CASE-Net with the above losses is summarized in Algorithm 1, which involves two hyper-parameters.
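Table 1. Results on SMPL-reID (left) and Div-Market (right).
| Method | SMPL-reID: Rank 1 | Rank 5 | Rank 10 | mAP | Div-Market: Rank 1 | Rank 5 | Rank 10 | mAP |
|---|---|---|---|---|---|---|---|---|
| Verif-Identif (TOMM’17) | 19.0 | 35.6 | 43.9 | 4.2 | 9.2 | 23.9 | 34.6 | 1.0 |
| SVDNet (ICCV’17) | 20.7 | 46.0 | 59.4 | 5.3 | 9.8 | 25.1 | 35.5 | 1.3 |
| FD-GAN (NIPS’18) | 21.2 | 46.5 | 59.9 | 5.1 | 14.3 | 26.4 | 36.5 | 1.6 |
| Part-aligned (ECCV’18) | 23.7 | 47.3 | 60.6 | 5.5 | 14.9 | 27.4 | 36.1 | 1.8 |
| PCB (ECCV’18) | 25.5 | 48.9 | 61.9 | 5.9 | 15.7 | 27.0 | 39.5 | 1.7 |
| DG-Net (CVPR’19) | 27.2 | 51.3 | 63.3 | 6.2 | 19.7 | 30.1 | 47.5 | 2.2 |
| cmGAN* (IJCAI’18) | 29.6 | 55.5 | 65.3 | 8.2 | 23.7 | 37.0 | 50.5 | 2.9 |
| AlignGAN* (ICCV’19) | 41.8 | 63.3 | 72.2 | 12.6 | 26.0 | 45.3 | 61.0 | 3.4 |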
Table 2. Results on Market-1501 in the standard re-ID setting (Q: RGB, G: RGB) and the three extended re-ID settings. Q denotes the query set and G the gallery set.
| Method | Q: RGB, G: RGB: Rank 1 | Rank 5 | Rank 10 | mAP | Q: Gray, G: RGB: Rank 1 | Rank 5 | Rank 10 | mAP | Q: RGB, G: Gray: Rank 1 | Rank 5 | Rank 10 | mAP | Q: Gray, G: Gray: Rank 1 | Rank 5 | Rank 10 | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Verif-Identif (TOMM’17) | 79.5 | 86.0 | 90.3 | 61.5 | 10.2 | 15.4 | 21.1 | 7.8 | 19.5 | 35.6 | 43.9 | 10.9 | 42.5 | 61.3 | 74.2 | 20.6 |
| SVDNet (ICCV’17) | 82.2 | 92.3 | 93.9 | 62.4 | 10.1 | 13.2 | 22.5 | 8.9 | 18.9 | 36.5 | 45.4 | 11.0 | 42.0 | 62.7 | 72.1 | 21.1 |
| FD-GAN (NIPS’18) | 90.5 | 96.0 | 97.7 | 77.9 | 12.4 | 19.6 | 23.8 | 10.1 | 30.5 | 50.1 | 59.6 | 18.4 | 49.7 | 69.8 | 76.2 | 23.2 |
| Part-aligned (ECCV’18) | 93.8 | 97.7 | 98.3 | 79.9 | 14.1 | 22.5 | 27.9 | 11.6 | 36.6 | 58.7 | 67.4 | 20.0 | 51.3 | 73.4 | 80.4 | 26.5 |
| PCB (ECCV’18) | 93.2 | 97.3 | 98.2 | 81.7 | 13.6 | 22.4 | 27.4 | 10.6 | 35.5 | 56.2 | 65.1 | 19.3 | 50.2 | 72.9 | 80.1 | 26.2 |
| DG-Net (CVPR’19) | 94.4 | 98.4 | 98.9 | 85.2 | 15.1 | 23.6 | 29.4 | 12.1 | 37.7 | 59.8 | 68.5 | 22.9 | 52.9 | 73.8 | 81.5 | 27.5 |
| cmGAN* (IJCAI’18) | 82.1 | 92.5 | 94.1 | 61.8 | 67.2 | 83.5 | 88.6 | 46.3 | 70.4 | 86.8 | 91.5 | 46.5 | 70.8 | 86.2 | 90.1 | 46.7 |
| AlignGAN* (ICCV’19) | 89.3 | 95.4 | 97.2 | 74.3 | 77.2 | 89.6 | 94.7 | 57.0 | 79.4 | 90.5 | 92.1 | 55.1 | 79.8 | 91.8 | 94.0 | 57.1 |
To evaluate our proposed method, we conduct experiments on two of our synthesized datasets, SMPL-reID and Div-Market, and on two benchmark re-ID datasets, Market-1501 and DukeMTMC-reID [53, 28], which are commonly used in recent re-ID work. We additionally conduct experiments on one cross-modality dataset, SYSU-MM01, to assess the generalization of our model when it learns body-shape representations.
SMPL-reID is our synthetic dataset that simulates clothing change for person re-ID. We render each identity across viewpoints (different shooting angles from the top view) and walking poses using SMPL. For each identity, we render the body mesh and pair it with a selected background image. For the shape parameters, we sample from the "walking" class of the AMASS dataset. The identities are rendered from 6 different viewpoints (see examples in Fig. 2). One subset of the identities is used for training, where no clothing change occurs, while the remaining identities form the test set and contain clothing changes.
Div-Market is our small dataset synthesized from Market-1501. We use a generative model similar to prior work to change the clothing color in the images of each identity. It contains 24,732 images of 200 identities in total, each identity having hundreds of images, and it is used only for testing.
Market-1501 is composed of 32,668 labeled images of 1,501 identities collected from 6 camera views. The dataset is split into two non-overlapping fixed parts: 12,936 images of 751 identities for training and 19,732 images of 750 identities for testing. In testing, 3,368 query images of the 750 identities are used to retrieve the matching persons in the gallery.
DukeMTMC-reID [53, 28] is also a large-scale re-ID dataset. It is collected from 8 cameras and contains 36,411 labeled images of 1,404 identities. It consists of 16,522 training images of 702 identities, 2,228 query images of the other 702 identities, and 17,661 gallery images.
SYSU-MM01 is the first benchmark for cross-modality (RGB-IR) re-ID. It is captured by 6 cameras, four RGB and two IR, and contains 491 persons with a total of 287,628 RGB images and 15,792 IR images. The training set consists of 32,451 images in total, including 19,659 RGB images and 12,792 IR images. The training set contains 395 identities and the test set contains 96 identities.
3.2 Implementation Details
We implement our model using PyTorch. Following Section 2, we use a ResNet pre-trained on ImageNet as the backbone of the shape encoder and the color encoder. Given an input image (all images are resized to a fixed width, height, and number of channels), the encoder maps the input to a content feature of fixed dimension. The generator is built from convolution-residual blocks similar to those proposed by Miyato et al. The image discriminator employs a ResNet as its backbone, while the shared feature discriminator is composed of convolution blocks. All five components are randomly initialized. The margin of the triplet loss and the two hyper-parameters of Algorithm 1 are kept fixed. The performance of our method could possibly be further improved by applying pre-/post-processing methods, attention mechanisms, or re-ranking techniques. However, no such techniques are used in any of our experiments.
3.3 Evaluation Settings and Protocol
For our rendered SMPL-reID, we train the model on the training set and evaluate it on the test set. For our synthesized test set Div-Market, we evaluate models trained only on Market-1501 on this clothing-color changing dataset. For Market-1501, we augment the test set by converting the RGB images into gray-scale ones. That is, in addition to the standard evaluation setting where both the probe (query) and gallery sets are RGB, we conduct extended experiments with Gray/RGB, RGB/Gray, and Gray/Gray as probe/gallery sets to evaluate the generalization of current re-ID models. For SYSU-MM01, there are two test modes: all-search and indoor-search. In the all-search mode, all test images are used. In the indoor-search mode, only indoor images from the 1st, 2nd, 3rd, and 6th cameras are used. The single-shot and multi-shot settings are adopted in both modes, and both modes use IR images as the probe set and RGB images as the gallery set.
We employ the standard metrics used in most person re-ID literature, namely the cumulative matching characteristic (CMC) curve, from which ranking accuracy is derived, and the mean Average Precision (mAP). We report rank-1 accuracy and mAP for evaluation on both datasets.
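For readers less familiar with these metrics, the following sketch computes a rank-k hit and the average precision for a single query, given the identity labels of the gallery images in ranked order. It is a simplified illustration: benchmark protocols typically also exclude gallery images taken by the same camera as the query, which is omitted here, and the function names are hypothetical. mAP is the mean of the per-query average precision over all queries, and the CMC value at rank k is the fraction of queries with a rank-k hit.

```python
def rank_k_hit(ranked_labels, query_label, k):
    """True if a correct identity appears among the top-k ranked gallery items."""
    return query_label in ranked_labels[:k]

def average_precision(ranked_labels, query_label):
    """Average of the precision values measured at each correct match."""
    hits = 0
    precisions = []
    for position, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Gallery labels sorted by increasing distance to a query of identity "a".
ranked = ["b", "a", "c", "a"]
hit_at_1 = rank_k_hit(ranked, "a", 1)
hit_at_5 = rank_k_hit(ranked, "a", 5)
ap = average_precision(ranked, "a")  # (1/2 + 2/4) / 2
```

Rank-k accuracy only checks whether any correct match appears early, whereas average precision rewards ranking all correct matches near the top.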
Table 3. Results on DukeMTMC-reID in the standard re-ID setting (Q: RGB, G: RGB) and the three extended re-ID settings. Q denotes the query set and G the gallery set.
| Method | Q: RGB, G: RGB: Rank 1 | Rank 5 | Rank 10 | mAP | Q: Gray, G: RGB: Rank 1 | Rank 5 | Rank 10 | mAP | Q: RGB, G: Gray: Rank 1 | Rank 5 | Rank 10 | mAP | Q: Gray, G: Gray: Rank 1 | Rank 5 | Rank 10 | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Verif-Identif (TOMM’17) | 68.7 | 81.5 | 84.2 | 49.8 | 8.9 | 15.4 | 20.3 | 6.2 | 16.1 | 33.9 | 43.5 | 8.1 | 35.4 | 52.3 | 60.0 | 17.4 |
| SVDNet (ICCV’17) | 76.5 | 87.1 | 90.4 | 57.0 | 9.1 | 16.3 | 20.5 | 6.9 | 16.5 | 36.4 | 45.8 | 9.6 | 37.0 | 52.8 | 60.9 | 17.5 |
| FD-GAN (NIPS’18) | 80.8 | 89.8 | 92.7 | 63.3 | 9.4 | 17.1 | 22.1 | 7.8 | 19.5 | 36.9 | 46.2 | 11.0 | 35.1 | 53.2 | 61.8 | 18.1 |
| Part-aligned (ECCV’18) | 83.5 | 92.0 | 93.9 | 69.2 | 10.5 | 16.9 | 20.6 | 7.4 | 20.1 | 38.0 | 46.3 | 10.9 | 34.7 | 52.4 | 61.5 | 18.8 |
| PCB (ECCV’18) | 82.9 | 91.1 | 93.6 | 67.1 | 9.1 | 17.4 | 21.4 | 6.6 | 19.4 | 37.7 | 46.4 | 10.8 | 34.2 | 52.6 | 60.9 | 18.4 |
| DG-Net (CVPR’19) | 86.3 | 93.2 | 95.5 | 75.1 | 12.5 | 18.6 | 23.9 | 8.1 | 21.5 | 38.3 | 47.2 | 12.5 | 38.6 | 55.1 | 64.0 | 21.5 |
| cmGAN* (IJCAI’18) | 74.1 | 86.2 | 88.5 | 54.1 | 58.8 | 76.1 | 79.3 | 38.2 | 60.5 | 77.0 | 83.6 | 39.0 | 60.6 | 76.9 | 81.7 | 39.0 |
| AlignGAN* (ICCV’19) | 80.1 | 87.6 | 90.5 | 58.1 | 63.8 | 79.4 | 82.8 | 47.1 | 62.5 | 80.3 | 84.8 | 44.1 | 63.8 | 80.1 | 84.2 | 42.5 |
3.4 Comparisons with State-of-the-art Approaches
|HOG  (CVPR’05)||2.8||18.3||32.0||4.2||3.8||22.8||37.6||2.2||3.2||24.7||44.5||7.3||4.8||29.1||49.4||3.5|
|LOMO  (CVPR’15)||3.6||23.2||37.3||4.5||4.7||28.3||43.1||2.3||5.8||34.4||54.9||10.2||7.4||40.4||60.4||5.6|
|Two Stream Net  (ICCV’17)||11.7||48.0||65.5||12.9||16.4||58.4||74.5||8.0||15.6||61.2||81.1||21.5||22.5||72.3||88.7||14.0|
|One Stream Net  (ICCV’17)||12.1||49.7||66.8||13.7||16.3||58.2||75.1||8.6||17.0||63.6||82.1||23.0||22.7||71.8||87.9||15.1|
|Zero Padding  (ICCV’17)||14.8||52.2||71.4||16.0||19.2||61.4||78.5||10.9||20.6||68.4||85.8||27.0||24.5||75.9||91.4||18.7|
|cmGAN  (IJCAI’18)||27.0||67.5||80.6||27.8||31.5||72.7||85.0||22.3||31.7||77.2||89.2||42.2||37.0||80.9||92.3||32.8|
|AlignGAN  (ICCV’19)||42.4||85.0||93.7||40.7||51.5||89.4||95.7||33.9||45.9||87.6||94.4||54.3||57.1||92.1||97.4||45.3|
| Method | Rank 1 | Rank 5 | Rank 10 | mAP |
|---|---|---|---|---|
| Ours (full model) | 56.2 | 61.5 | 69.2 | 13.5 |
Table 5. Ablation study of the loss functions on the Div-Market dataset. Each row indicates the model with only one loss excluded.
To simulate a real-world clothing-color changing environment, we conduct re-ID experiments on our SMPL-reID dataset and compare with six state-of-the-art re-ID approaches and two cross-modality re-ID models. As shown on the left side of Table 1, our proposed CASE-Net outperforms all the compared methods by a large margin. In addition, two phenomena can be observed. First, all the standard re-ID approaches exhibit severe performance drops, which indicates that they suffer from clothing-color and clothing mismatch. Second, although the two cross-modality methods show some improvement, they cannot handle clothing-color changes within a single modality either.
For our synthesized Div-Market, we also compare our proposed method with six current standard re-ID approaches and two cross-modality re-ID models, and report the results on the right side of Table 1. The same phenomena are observed as on SMPL-reID.
We compare our proposed method with six current standard re-ID approaches and two cross-modality re-ID models whose code is available online, and report the results in one standard and three extended settings on Market-1501. The standard approaches are Verif-Identif, SVDNet, Part-aligned, FD-GAN, PCB, and DG-Net, while the cross-modality models are cmGAN and AlignGAN. We report all the results in Table 2, from which two observations can be made. First, in the standard setting, state-of-the-art re-ID methods outperform the two cross-modality approaches by a margin, but they suffer severe performance drops in the extended evaluation, which shows their vulnerability to color variations and their weak generalization when they overfit to clothing color. Second, our proposed CASE-Net outperforms all the methods in every setting, which demonstrates its ability to derive body-shape representations.
For the DukeMTMC-reID dataset, we likewise compare our proposed method with six current standard re-ID approaches and two cross-modality re-ID models whose code is available online, and report the results in one standard and three extended settings in Table 3. The same phenomena are observed as on Market-1501.
To assess the generalization of our CASE-Net in cross-modality person re-ID, we also conduct additional experiments on the SYSU-MM01 dataset. We compare our proposed CASE-Net with two hand-crafted features (HOG, LOMO) and three cross-modality approaches (the SYSU model, cmGAN, and AlignGAN). We report the results in Table 4 and observe that our method achieves comparable results in the cross-modality re-ID setting. It is worth repeating that our proposed CASE-Net, which is developed for clothing-color changes in re-ID, generalizes well to cross-modality re-ID.
3.5 Ablation Studies
To further analyze the importance of each introduced loss function, we conduct an ablation study, shown in Table 5. First, the image reconstruction loss is shown to be vital to CASE-Net, since we observe performance drops on Div-Market when this loss is excluded. Without it, there is no explicit supervision guiding CASE-Net to generate human-perceivable images with body-shape disentanglement, and the resulting model suffers from image-level information loss. Second, without the feature adversarial loss, our model cannot perform feature-level color adaptation; it fails to learn a clothing-color-invariant representation, and re-ID performance drops. Third, when either the identity loss or the triplet loss is turned off, our model is no longer supervised by both re-ID losses, and the results indicate that the joint use of the two streams of supervision achieves the best results. Lastly, the image adversarial loss is introduced to CASE-Net to mitigate the perceptual image-level information loss.
We now visualize the feature vectors on our Div-Market dataset in Figure 5 via t-SNE. It is worth repeating that, in our synthesized Div-Market, the same identity can wear different outfits while several identities can wear the same outfit. In the figure, we select a number of different person identities, each of which is indicated by a color. From Fig. 4(a) and Fig. 4(c), we observe that our projected feature vectors are well separated compared with those of DG-Net, which suggests that our model exhibits sufficient re-ID ability. On the other hand, in Fig. 4(b) and Fig. 4(d), we assign one color to each outfit. It can be observed that our projected feature vectors of the same identity in different outfits are well clustered, while those of DG-Net are not.
In this paper, we have addressed a challenging yet significant person re-identification task that has long been ignored. We collect two re-ID datasets (SMPL-reID and Div-Market) that simulate real-world scenarios containing changes in clothes or clothing color. To address clothing changes in re-ID, we presented a novel Color Agnostic Shape Extraction Network (CASE-Net), which learns body-shape representations without training or fine-tuning on data containing clothing change. By advancing adversarial learning and body-shape disentanglement, our model achieves satisfactory performance on the collected datasets (SMPL-reID and Div-Market) and on two re-ID benchmarks. Qualitative results also confirm that our model is capable of learning body-shape representations that are clothing-color invariant. Furthermore, the experimental results on one cross-modality dataset demonstrate the generalization of our model to cross-modality re-ID.
- (2008) People-tracking-by-detection and people-detection-by-tracking. In CVPR.
- (2018) Multi-level factorisation net for person re-identification. In CVPR.
- (2018) Group consistent similarity learning via deep CRF for person re-identification. In CVPR.
- (2019) Learning resolution-invariant deep representations for person re-identification. In AAAI.
- (2016) Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In CVPR.
- (2018) Cross-modality person re-identification with generative adversarial training. In IJCAI.
- (2005) Histograms of oriented gradients for human detection. In CVPR.
- (2018) FD-GAN: Pose-Guided Feature Distilling GAN for Robust Person Re-Identification. In NeurIPS.
- (2014) Generative adversarial nets. In NeurIPS.
- (2019) HSME: hypersphere manifold embedding for visible thermal person re-identification. In AAAI.
- (2017) In defense of the triplet loss for person re-identification. arXiv preprint.
- Multimodal unsupervised image-to-image translation. In ECCV.
- (2018) Human semantic parsing for person re-identification. In CVPR.
- (2018) Neural 3D Mesh Renderer. In CVPR.
- (2016) Person re-identification for real-world surveillance systems. arXiv preprint.
- (2017) Learning deep context-aware features over body and latent parts for person re-identification. In CVPR.
- (2018) Harmonious attention network for person re-identification. In CVPR.
- (2019) Recover and identify: a generative dual model for cross-resolution person re-identification. In ICCV.
- (2019) Cross-dataset person re-identification via unsupervised pose disentanglement and adaptation. In ICCV.
- Adaptation and re-identification network: an unsupervised deep transfer learning approach to person re-identification. In CVPRW.
- (2015) Person re-identification by local maximal occurrence representation and metric learning. In CVPR.
- (2017) Improving person re-identification by attribute and identity learning. arXiv preprint.
- (2018) Pose transferrable person re-identification. In CVPR.
- (2015) SMPL: A skinned multi-person linear model. ACM Transactions on Graphics.
- (2018) Disentangled Person Image Generation. In CVPR.
- (2019) AMASS: Archive of Motion Capture as Surface Shapes. arXiv:1904.03278.
- (2018) cGANs with projection discriminator. In ICLR.
- (2016) Performance measures and a data set for multi-target, multi-camera tracking. In ECCVW.
- (2018) Deep group-shuffling random walk for person re-identification. In CVPR.
- (2018) Dual attention matching network for context-aware feature sequence based person re-identification. In CVPR.
- Mask-Guided Contrastive Attention Model for Person Re-Identification. In CVPR.
- (2017) Pose-driven deep convolutional model for person re-identification. In CVPR.
- (2018) Part-aligned bilinear representations for person re-identification. In ECCV.
- (2017) SVDNet for pedestrian retrieval. arXiv preprint.
- (2018) Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In ECCV.
- (2016) Learning invariant color features for person reidentification. TIP.
- (2013) People reidentification in surveillance and forensics: a survey. ACM Computing Surveys (CSUR).
- (2019) RGB-infrared cross-modality person re-identification via joint pixel and feature alignment. In ICCV.
- (2018) Resource Aware Person Re-identification across Multiple Resolutions. In CVPR.
- (2018) Person transfer GAN to bridge domain gap for person re-identification. In CVPR.
- (2017) GLAD: Global-local-alignment descriptor for pedestrian retrieval. In ACM MM.
- (2017) RGB-infrared cross-modality person re-identification. In ICCV.
- An Enhanced Deep Feature Representation for Person Re-Identification. In WACV.
- (2019) Deep representation learning with part loss for person re-identification. TIP.
- (2018) Hierarchical discriminative learning for visible thermal person re-identification. In AAAI.
- (2018) Visible thermal person re-identification via dual-constrained top-ranking. In IJCAI.
- (2017) Spindle net: person re-identification with human body region guided feature decomposition and fusion. In CVPR.
- (2017) Deeply-learned part-aligned representations for person re-identification. In CVPR.
- (2017) Pose invariant embedding for deep person re-identification. arXiv preprint.
- (2015) Scalable person re-identification: a benchmark. In CVPR.
- (2016) Person re-identification: past, present and future. arXiv preprint.
- (2019) Joint discriminative and generative learning for person re-identification. In CVPR.
- (2017) Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In ICCV.
- (2018) A discriminatively learned CNN embedding for person reidentification. TOMM.
- (2018) Camera style adaptation for person re-identification. In CVPR.