Attribute analysis with synthetic dataset for person re-identification

06/12/2020
by   Suncheng Xiang, et al.
Shanghai Jiao Tong University

Person re-identification (re-ID) plays an important role in applications such as public security and video surveillance. Recently, learning from synthetic data, which benefits from the popularity of synthetic data engines, has achieved remarkable performance. However, existing synthetic datasets are small in scale and lack diversity, which hinders the development of person re-ID in real-world scenarios. To address this problem, we first develop a large-scale synthetic data engine whose salient characteristic is controllability. Based on it, we build a large-scale synthetic dataset that is diversified and customizable along different attributes, such as illumination and viewpoint. Second, we quantitatively analyze the influence of dataset attributes on the re-ID system. To the best of our knowledge, this is the first attempt to explicitly dissect person re-ID from the perspective of attributes on synthetic data. Comprehensive experiments help us gain a deeper understanding of the fundamental problems in person re-ID. Our research also provides useful insights for dataset building and future practical usage.


I Introduction

Person re-ID aims to identify images of the same person across a large number of camera views in different places, and has attracted considerable interest in both academia and industry. Encouraged by the remarkable success of deep learning methods [9, 6] and the availability of re-ID datasets [21, 13], the performance of person re-ID has been significantly boosted. For example, the rank-1 accuracy of single query on Market-1501 [21] has been improved from 43.8% [10] to 91.2% [8], and the rank-1 accuracy on DukeMTMC-reID [13] has been improved from 25.13% [21] to 85.95% [8]. However, these performance gains come only when a large diversity of training data is available, which comes at the price of a large amount of accurate annotation obtained by expensive human labor. Meanwhile, real applications have to cope with challenges like complex lighting and scene variations, which current real datasets might fail to address [17].

Fig. 1: Example images from Market-1501 (upper left) and DukeMTMC-reID (upper right). There are clear differences among datasets in lighting, background and weather. In contrast, our proposed large-scale dataset GPR-800 (bottom) exhibits large variance, high resolution and diverse backgrounds.

To address this issue, many successful person re-ID approaches [2, 1, 15] take advantage of game engines to construct large-scale synthetic re-ID datasets, which can be used to pre-train or fine-tune a CNN. In essence, this provides more complete and better initialization parameters, potentially promoting the development of the re-ID task. However, existing synthetic datasets have limited identities and lack diversity, which leads to a large performance degradation when transferring them to the wild or to other scenes. Another challenge we observe is a serious scene shift between synthetic and real datasets. More specifically, since a synthetic dataset covers a large attribute range, using all of its images may cause side effects in real scenes. For instance, as shown in Fig. 1, Market-1501 only contains scenes recorded during a summer vacation, while DukeMTMC-reID is set in blizzard scenes. Consequently, pre-training with the entire synthetic dataset can deteriorate performance on the target domain during domain adaptation, which is not practical in real-world scenarios.

Fig. 2: The procedure of our proposed end-to-end pipeline, which consists of 1) synthetic dataset generation, 2) dataset attribute analysis, and 3) re-ID evaluation. First, we employ the GPR-X engine to generate a large-scale synthetic dataset named GPR-800. Based on it, we then adopt an attribute-style loss with a VGG-19 network to perform dataset attribute analysis; consequently, a more suitable (reliable) dataset is constructed with prior knowledge of the target domain. During the re-ID evaluation stage, both triplet loss and ID loss are deployed to learn a discriminative re-ID model.

To remedy the above problems, we start from two aspects. On the one hand, we introduce a large-scale synthetic data engine, GPR-X, whose virtual humans are carefully designed with the electronic game "Grand Theft Auto V". Note that the salient characteristic of this engine is that it is "controllable". Based on it, we construct a large-scale and diverse synthetic person re-ID dataset named GPR-800. Compared with existing datasets, GPR-800 has several advantages: 1) free collection and annotation; 2) larger data volume; 3) more diversified scenes; and 4) high resolution. More detailed information about GPR-800 is given in Table I.

 

dataset              #identity  #box       #cam  view
Real:
  Market-1501 [21]   1,501      32,668     6     N
  CUHK03 [9]         1,467      14,096     2     N
  DukeMTMC-reID [13] 1,404      36,411     8     N
Synthetic:
  SOMAset [2]        50         100,000    250   N
  SyRI [1]           100        1,680,000  –     N
  PersonX [15]       1,266      273,456    6     Y
  GPR-800            800        4,838,400  12    Y

TABLE I: Comparison of real-world and synthetic re-ID datasets. "View" denotes whether the dataset has viewpoint labels; "–" denotes a value not reported.

On the other hand, in an attempt to reveal the influence of dataset attributes on re-ID accuracy, we quantitatively analyze the influence of different factors (e.g., background, weather, illumination and viewpoint) on a person re-ID system with prior knowledge of the target domain. To our knowledge, no work in the existing literature comprehensively studies the impact of dataset attributes on a re-ID system. Two natural questions then come to our attention: how do these attributes influence retrieval performance, and which one is most critical to a re-ID system? To answer these questions, we perform rigorous quantification on pedestrian images regarding different dataset attributes. Both control and experimental groups are designed, so as to obtain convincing scientific conclusions.

To demonstrate the potential of attribute analysis in recognizing pedestrians independently, we evaluate performance based on the synthetic dataset in a discriminative re-ID system. More specifically, the procedure of our proposed pipeline is shown in Fig. 2. Notably, the framework is not designed to achieve state-of-the-art performance on the re-ID task, but to quantitatively analyze the influence of dataset attributes on a re-ID system. To this end, we select a simple but effective baseline for re-ID evaluation. The empirical results are consistent with our findings on other CNN networks.

As a consequence, this paper makes three contributions to the community.

  • We introduce a large-scale and diverse synthetic dataset, which consists of 800 manually designed identities and editable visual variables.

  • Based on our synthetic dataset, we dissect a person re-ID system by quantitatively analyzing the influence of different attributes.

  • Comprehensive experiments conducted on benchmark datasets verify the effectiveness of the proposed data-selecting strategy, helping us gain a deeper understanding of the fundamental problems in person re-ID.

The rest of the paper is organized as follows. After reviewing relevant previous work (Section II), we describe the proposed GPR-X engine and the GPR-800 synthetic dataset (Section III). Dataset attribute analysis is introduced in Section IV-A, and Section IV-B describes our baseline network for re-ID evaluation. Section V reports an exhaustive set of experiments, illustrating the power of the attribute analysis strategy. Section VI concludes with a summary and a sketch of future work.

Fig. 3: Illustration of the GPR-800 dataset. (A): Backgrounds. In each background, a person can face toward a manually specified direction. (B) and (C) show exemplars of different weather distributions and illumination distributions, respectively.

II Related Work

It is well known that manual labelling is time-consuming and labour-intensive for each new (target) domain. Transfer learning [16] works to some extent but fails to solve this problem fundamentally. More recently, leveraging synthetic data has proven useful for alleviating the reliance on large-scale real datasets in person re-ID. In theory, an unlimited amount of labels can be made available by resorting to simulated data, which greatly alleviates the over-fitting caused by scarce real labeled data during training. The most recent re-ID approaches [2, 1, 15] incorporate this idea to further boost re-ID performance in an unsupervised manner. For example, Barbosa et al. [2] propose a synthetic instance dataset, SOMAset, created by photorealistic human body generation software. Bak et al. [1] construct the SyRI dataset using 100 virtual humans illuminated with multiple HDR environment maps. In addition, Sun et al. [15] introduce a large-scale synthetic data engine named PersonX. However, these synthetic datasets are neither intensively diversified nor editable and extendable by the public. In comparison, our proposed GPR-800 synthetic dataset has configurable backgrounds, weathers, illuminations and diversified identities. More importantly, it can be extended not only for this study but also to lift the burden of constructing large-scale labeled datasets in this area and to free humans from heavy data annotation.

In recent years, visual attributes have become an important factor in image retrieval due to their high-level semantic knowledge, which can greatly bridge the gap between low-level features and high-level human cognition [5, 4]. However, the influence of the background (context) regions of person images is mostly ignored by existing methods. Fortunately, in our experiments we observe that, for a large database consisting of person images with different backgrounds, training on such data greatly improves the robustness of person re-ID. Besides, illumination [1] is always a critical factor in the person re-ID task, as are pedestrian viewpoints. However, current re-identification datasets lack significant diversity in lighting conditions and viewpoint angles, since they are usually limited to a relatively small number of cameras. Models trained under these particular illuminations are thus biased to the limited conditions seen during training and fail to adapt to unseen illuminations. The same problem can be observed for viewpoint [15].

To relieve this dilemma, in this paper we first construct a large-scale synthetic dataset based on the controllable synthetic data engine GPR-X, and then quantitatively analyze the influence of dataset attributes on re-ID accuracy. To the best of our knowledge, we are the first to conduct comprehensive experiments quantitatively assessing the influence of dataset attributes on person re-ID accuracy, which can provide meaningful guidance for constructing a high-quality dataset for the person re-ID task.

Fig. 4: Image examples under specified viewpoints, sampled every 30° from 0° to 330° (12 different viewpoints in total).

III A Controllable Person Generation Engine

III-A Description

Software. The GPR-X engine is built on the electronic game Grand Theft Auto V, and is therefore named the "GTA5 Person Re-identification" ("GPR-X" for short) engine. As a controllable system, it can satisfy various scene requirements. In GPR-X, the person models and scenes look realistic. More importantly, the values of visual variables, e.g., background, weather, illumination and viewpoint, are designed to be editable, which makes GPR-X highly flexible and extendable.

Identities. GPR-X has 800 hand-crafted identities with different skin colors, body forms (e.g., height and weight), hair styles, etc. To ensure diversity, the clothing of these identities includes jeans, pants, shorts, slacks, skirts and various other garments. In particular, some identities carry a backpack, shoulder bag, glasses or hat. The motion of these characters can be walking, running, standing, or even having a dialogue. Fig. 1 (bottom) presents some examples of the character prototypes with customizable body parts and clothing. Unless otherwise specified, all images are captured at a resolution of 200 × 470.

III-B Visual Factors in GPR-X

GPR-X features editable environmental factors such as background, weather, illumination and viewpoint. More detailed information on these factors is given below.

Background. Currently GPR-X has 9 different backgrounds, as shown in Fig. 3(A). In each background, a person moves freely in arbitrary directions, exhibiting arbitrary viewpoints relative to the camera. In Fig. 3(A), backgrounds #1, #4 and #6 depict different urban street scenes. Notably, adopting more types of scenes close to the target domain appears to have a positive influence on re-ID performance.

Weather. In our large-scale synthetic data engine GPR-X, there are 7 different types of weather: clear, clouds, overcast, foggy, neutral, rainy and blizzard. Parameters like degree and intensity can be modified for each weather type, so various kinds of weather can be created by editing these values. Exemplars of synthetic scenes under different weather conditions from the proposed GPR-800 dataset are depicted in Fig. 3(B).

Illumination. Illumination varies with the time of day, and for each pedestrian image we also provide its capturing time within the 24-hour clock. In this paper, we introduce a new synthetic dataset that contains 8 illumination conditions. Exemplars of different time periods from the proposed GPR-800 dataset are illustrated in Fig. 3(C); e.g., "09–12" denotes the period from 9:00 to 12:00.

Viewpoint. Fig. 4 presents image examples under specified viewpoints following a uniform distribution. These images are sampled during normal talking or walking. Specifically, a person image is sampled every 30° from 0° to 330° (12 different viewpoints in total). Each view has 1 image, so each person has 12 images. The entire GPR-X engine therefore produces 800 (identities) × 9 (backgrounds) × 7 (weathers) × 8 (illuminations) × 12 (viewpoints) = 4,838,400 images.
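For concreteness, the combinatorics above can be expressed as a small Python sketch; the attribute value lists mirror Section III-B, while the naming of the eight 3-hour illumination slots is our assumption rather than the engine's actual interface.

```python
# Sketch of GPR-800's controllable attribute grid (names are illustrative).
identities = range(800)
backgrounds = [f"#{i}" for i in range(1, 10)]                    # 9 backgrounds
weathers = ["clear", "clouds", "overcast", "foggy", "neutral", "rainy", "blizzard"]
illuminations = [f"{3*i:02d}-{3*(i+1):02d}" for i in range(8)]   # 3-hour slots, assumed
viewpoints = range(0, 360, 30)                                   # every 30 degrees

total = (len(identities) * len(backgrounds) * len(weathers)
         * len(illuminations) * len(viewpoints))
assert total == 4_838_400  # 800 x 9 x 7 x 8 x 12
```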

The above discussion indicates that the GPR-X engine has strictly controlled environment variables and is reasonably sensitive to environmental changes. We believe GPR-X will be a useful tool for the community and will encourage the development of robust algorithms and scientific analysis.

IV Attribute Analysis

It is known that attribute analysis on a synthetic dataset can provide meaningful guidance for constructing a high-quality dataset for the person re-ID task. Strictly speaking, certain attributes are more important than others for learning models to identify pedestrians. For example, by discovering viewpoints that are effective for re-ID accuracy, our research can potentially benefit the practical usage of re-ID systems. On the other hand, since GPR-800 covers a large weather range and is highly diverse, using all of its images to pre-train a CNN model will undoubtedly increase computational complexity, and sometimes even causes side effects in domain adaptation. In this section, we quantitatively investigate the influence of different attributes on re-ID model learning to address this problem.

Fig. 5: Attribute analysis with style representation on VGG-19, trained on synthetic GPR-800 and tested on Market-1501. The most critical factor of each attribute corresponds to the item with the minimum loss; orange bars indicate the most important factor when performing the re-ID task GPR-800 → Market-1501.
Fig. 6: Attribute analysis with style representation on VGG-19, trained on synthetic GPR-800 and tested on DukeMTMC-reID. The most critical factor of each attribute corresponds to the item with the minimum loss; orange bars indicate the most important factor when performing the re-ID task GPR-800 → DukeMTMC-reID.

IV-A Attribute Analysis with Style Representation

Style representation [5, 4] has achieved remarkable results in computer vision; it computes the correlations between different filter responses. To obtain a representation of the style of an input image, we use a feature space designed to capture texture information, built on top of the CNN responses in each layer of the network. It consists of the correlations between the different filter responses, where the expectation is taken over the spatial extent of the feature maps. These feature correlations can be written as

G^l_{ij} = \sum_k F^l_{ik} F^l_{jk},    (1)

where G^l is the Gram matrix defining the feature correlations, G^l_{ij} denotes the inner product between the vectorised feature maps i and j in layer l, and F^l_{ik} is the activation of the i-th filter at position k in layer l.

Indeed, we can visualise the information captured by these style feature spaces built on different layers of the network. To generate a texture that matches the style of a given image, we use gradient descent from a white-noise image to find another image that matches the style representation of the original image. This is done by minimising the mean-squared distance between the entries of the Gram matrix of the original image and the Gram matrix of the target real image. The contribution of layer l to the total loss is then written as

E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2,    (2)

where G^l and A^l denote the respective style representations in layer l, N_l indicates the number of distinct filters, and M_l is the product of the height and width of the feature map. The total attribute-style loss is written as

\mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_l w_l E_l,    (3)

where w_l is the weighting factor of each layer's contribution to the total loss, and \vec{a} and \vec{x} denote the original image and the image in the target domain, respectively.

After obtaining the attribute-style loss between the GPR-800 dataset and the target dataset, we first conduct a simple statistical analysis, as depicted in Fig. 5 and Fig. 6. It can easily be observed that subsets with different attributes have distinct distributions for a specific task. For example, when performing adaptation from GPR-800 → Market-1501, backgrounds #1, #4 and #6 are more sensitive since they have relatively smaller attribute-style loss, as shown in Fig. 5(a). In Section V-C, we quantitatively verify the effectiveness of the attribute analysis strategy in dissecting the person re-ID task.
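To make this computation concrete, the following PyTorch sketch implements Eqs. 1–3 for a single synthetic/real image pair. The uniform weight w_l = 0.2 follows Section V-A; the choice of VGG-19 layers (relu1_1 through relu5_1) is a common convention for style losses and is assumed here, since the exact layers are not specified.

```python
import torch
from torchvision import models

def gram_matrix(feat):
    # feat: (1, C, H, W) -> Gram matrix G^l of shape (C, C), Eq. (1)
    _, c, h, w = feat.shape
    f = feat.view(c, h * w)   # vectorised feature maps F^l
    return f @ f.t()

@torch.no_grad()
def attribute_style_loss(img_syn, img_real, style_layers=(1, 6, 11, 20, 29), w_l=0.2):
    # Inputs: (1, 3, H, W) ImageNet-normalised tensors.
    # `style_layers` indexes relu1_1..relu5_1 of VGG-19 (an assumption).
    vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()
    loss, x, y = 0.0, img_syn, img_real
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i in style_layers:
            n = x.shape[1]               # N_l: number of filters
            m = x.shape[2] * x.shape[3]  # M_l: height x width
            gx, gy = gram_matrix(x), gram_matrix(y)
            # layer loss E_l (Eq. 2), weighted by w_l (Eq. 3)
            loss += w_l * ((gx - gy) ** 2).sum() / (4 * n**2 * m**2)
    return loss
```

A per-attribute score can then be obtained by averaging this loss over sampled image pairs (e.g., all images of background #6 against target-domain samples); smaller values mark the key factors.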

IV-B Re-ID Evaluation

Many existing re-ID approaches [24] are based on a model pre-trained on ImageNet [3], and we follow a similar setting to [11] to obtain an initial model. In particular, the last fully connected layer is discarded and two additional FC layers are added. The first FC layer has 2,048 dimensions; the second FC layer is N-dimensional, where N is the number of identities in the source dataset. Cross-entropy loss [20] is applied to the second FC output, casting the training process as a classification problem, while batch-hard triplet loss [7] is employed on the 2,048-dim feature, treating the training process as a verification problem. This gives rise to the re-ID evaluation stage of our proposed pipeline (Fig. 2).
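As a rough illustration, the two-branch head described above might be sketched as follows; the ResNet-50 backbone follows Section V-A, while the layer names and the omission of further tricks from [11] (e.g., BNNeck) are our simplifications, not the authors' exact implementation.

```python
import torch.nn as nn
from torchvision import models

class ReIDBaseline(nn.Module):
    # Two-branch baseline head: 2,048-dim feature FC + N-way identity FC.
    def __init__(self, num_ids):
        super().__init__()
        resnet = models.resnet50(weights="IMAGENET1K_V1")
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop original FC
        self.fc_feat = nn.Linear(2048, 2048)        # first FC: 2,048-dim feature
        self.classifier = nn.Linear(2048, num_ids)  # second FC: N identities

    def forward(self, x):
        feat = self.fc_feat(self.backbone(x).flatten(1))
        logits = self.classifier(feat)
        # feat feeds the batch-hard triplet loss [7];
        # logits feed the cross-entropy (ID) loss [20]
        return feat, logits
```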

 

#identity  #box       #background  #weather       #illumination  #viewpoint  mAP   rank-1  rank-5
100        604,800    –            –              –              –           4.5   12.9    27.6
400        2,419,200  –            –              –              –           4.7   14.1    28.3
700        4,233,600  –            –              –              –           11.4  31.3    49.5
100        67,200     #6           –              –              –           7.3   21.2    36.4
100        86,400     –            clear          –              –           8.8   24.1    38.9
100        75,600     –            –              09–12          –           9.7   26.3    42.7
100        302,400    –            –              –              6 selected  10.2  28.5    45.0
800        115,200    #1,#4,#6     clear,neutral  06–18          6 selected  14.3  34.2    58.5

TABLE II: Ablation study on Market-1501. After adopting our attribute analysis with style representation, we can identify the most critical factors when adapting our model to Market-1501. "–" indicates using all conditions of the given attribute; "6 selected" denotes the 6 viewpoints chosen by the attribute analysis. Measured in %.

 

#identity  #box       #background  #weather        #illumination  #viewpoint  mAP   rank-1  rank-5
100        604,800    –            –               –              –           3.8   11.9    20.4
400        2,419,200  –            –               –              –           9.3   24.1    37.3
700        4,233,600  –            –               –              –           14.8  33.5    47.2
100        67,200     #7           –               –              –           5.9   13.7    26.1
100        86,400     –            overcast        –              –           5.7   13.6    25.8
100        75,600     –            –               09–12          –           11.3  24.6    38.6
100        302,400    –            –               –              6 selected  12.6  28.3    44.8
800        115,200    #3,#6,#7     overcast,rainy  06–18          6 selected  14.2  31.7    46.6

TABLE III: Ablation study on DukeMTMC-reID. After adopting our attribute analysis with style representation, we can identify the most critical factors when adapting our model to DukeMTMC-reID. "–" indicates using all conditions of the given attribute; "6 selected" denotes the 6 viewpoints chosen by the attribute analysis. Measured in %.

V Experiments and Evaluation

In this paper, we evaluate our method on two large-scale benchmark datasets, Market-1501 [21] and DukeMTMC-reID [13, 22].

Market-1501. This dataset has 32,668 person images of 1,501 identities captured by 6 camera views on a summer campus. Following the official setting, 751 identities are used for training and the remaining 750 for testing. The query set contains 3,368 images.

DukeMTMC-reID. This dataset is constructed from the multi-camera tracking dataset DukeMTMC and contains 1,812 identities, with 702 identities used for training and the remaining 1,110 for testing. It contains 36,411 images in total, of which 2,228 are used as queries.

Protocols. In our experiments, we follow the standard evaluation protocol used in [23] and adopt mean Average Precision (mAP) and Cumulative Matching Characteristics (CMC) at rank-1 and rank-5 for performance evaluation on all candidate datasets.
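For reference, a minimal sketch of both metrics for a single query is given below; standard protocols additionally filter out same-camera junk images, which we omit here for brevity.

```python
import numpy as np

def eval_single_query(dist, gallery_ids, query_id, topk=(1, 5)):
    # dist: (G,) distances to the gallery; gallery_ids: (G,) identity labels.
    order = np.argsort(dist)                  # ascending distance
    matches = gallery_ids[order] == query_id  # binary relevance per rank
    cmc = {k: float(matches[:k].any()) for k in topk}
    hits = np.flatnonzero(matches)
    # average precision: mean precision at each true-match position
    ap = np.mean([(i + 1) / (r + 1) for i, r in enumerate(hits)]) if hits.size else 0.0
    return cmc, ap  # mAP = mean of `ap` over all queries
```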

V-A Experiment Settings

In our experiments, we adopt PyTorch [12] to implement and train our re-ID network. The training procedure is standard and requires no bells and whistles. For attribute analysis, the attribute-style losses presented in Fig. 5 and Fig. 6 are generated on the basis of the VGG-19 network [14], and we empirically set w_l = 0.2 in Eq. 3. During the evaluation stage, our network is based on ResNet-50 [6], following the baseline training strategy introduced in [11], on a Tesla P100 GPU. In particular, we keep the aspect ratio of input images and resize them to 128 × 64. Note that we only apply random erasing to the training set for data augmentation, and choose SGD as the optimizer with momentum 0.9. The weight decay factor for L2 regularization is set to 0.0005.
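Putting these settings together, a minimal training-setup sketch might look as follows; the learning rate and the omission of ImageNet normalization are assumptions not stated here, and `ReIDBaseline` reuses the sketch from Section IV-B.

```python
import torch
from torchvision import transforms

# 128 x 64 inputs; random erasing only on the training set
train_tf = transforms.Compose([
    transforms.Resize((128, 64)),
    transforms.ToTensor(),
    transforms.RandomErasing(),  # applied after ToTensor
])
test_tf = transforms.Compose([transforms.Resize((128, 64)), transforms.ToTensor()])

model = ReIDBaseline(num_ids=800)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,  # lr assumed
                            momentum=0.9, weight_decay=5e-4)
```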

V-B Evaluation

We evaluate the impact of different attributes on a basic person re-ID system. Note that the experimental groups are used to assess the impact of different attributes in the training set. For a clearer understanding, we demonstrate comprehensive results with both figures and tables.

Experiment design. We train a re-ID model on the original synthetic dataset, which comprises 4 different attributes: background, weather, illumination and viewpoint. We vary one attribute at a time while keeping the rest fixed. Before performing re-ID evaluation, we first employ the style representation to select the important factors. For instance, when testing on Market-1501, we calculate the attribute-style loss with Eq. 3; attribute values with relatively smaller attribute-style loss are regarded as key factors. As shown in Fig. 5, background #6, clear weather, the 09–12 illumination period and certain viewpoints tend to have relatively smaller attribute-style loss. We argue that this is probably because this specific attribute set is closer to the real Market-1501 scenes; consequently, it leads to much better performance when adapting to Market-1501.
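The selection step itself reduces to a simple argmin per attribute, as the following sketch illustrates; the loss values are illustrative placeholders, not numbers read from Fig. 5.

```python
# Rank each attribute's values by attribute-style loss to the target domain
# and keep the value with the smallest loss (placeholder numbers).
style_loss = {
    "background":   {"#1": 0.82, "#4": 0.85, "#6": 0.74, "#7": 1.30},
    "weather":      {"clear": 0.70, "overcast": 1.10, "rainy": 1.25},
    "illumination": {"06-09": 0.95, "09-12": 0.78, "12-15": 0.90},
}
key_factors = {attr: min(vals, key=vals.get) for attr, vals in style_loss.items()}
print(key_factors)  # {'background': '#6', 'weather': 'clear', 'illumination': '09-12'}
```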

V-C Effectiveness of Attribute Analysis

In this section, we further explore the effectiveness of attribute analysis. Table II and Table III present the results obtained by the above attribute analysis strategy. Several observations can be made as follows.

First, we can easily observe that using more IDs noticeably improves re-ID performance. For example, as presented in Table II and Table III, with 100 identities we achieve only 4.5% and 3.8% mAP when tested on Market-1501 and DukeMTMC-reID, respectively. Increasing the number of IDs to 700 notably improves re-ID accuracy, yielding gains of +6.9% and +11.0% in mAP.

Second, with the number of IDs fixed at 100, using background #6 (on Market-1501) and background #7 (on DukeMTMC-reID) brings +2.8% and +2.1% more improvement than using all backgrounds; constraining the weather to clear and overcast, respectively, brings additional improvements of +4.3% and +1.9% in mAP. The same conclusion can be drawn for illumination. Furthermore, we achieve significant improvements of +15.6% and +16.4% in rank-1 accuracy by selecting critical viewpoints when testing on Market-1501 and DukeMTMC-reID. In other words, without these attribute constraints the retrieval accuracy is negatively affected and suffers a non-trivial performance drop, which verifies the effectiveness of our proposed attribute analysis strategy.

Finally, by taking background, weather, illumination and viewpoint into account together, we further boost performance to 14.3% and 14.2% mAP for GPR-800 → Market-1501 and GPR-800 → DukeMTMC-reID, respectively. Compared with training under a single attribute constraint, our multiple-attribute constraint achieves better performance with less labeled training data, saving both training time and human labelling effort. This conclusion is meaningful since it provides guidance for constructing a high-quality re-ID dataset. Furthermore, performance could be drastically improved by enhancing the diversity of the re-ID training set in future research. Nevertheless, we find it truly fascinating that a re-ID system, trained to perform one of the core computational tasks of image retrieval, automatically learns image representations that allow, at least to some extent, the separation of image attributes from content.

V-D Discussion

The main purpose of this paper is to evaluate the effect of different attributes on a re-ID system with a simple baseline, so high performance in domain adaptation is not our goal. The performance of a re-ID model trained on synthetic data and tested on Market-1501 or DukeMTMC-reID may not be competitive with SOTA methods [25, 19, 18] trained on real data. We argue that this is probably because there is a huge gap between the synthetic and real image distributions, so learning an invariant feature is extremely difficult when directly adapting from the synthetic to the real domain.

VI Discussion and Conclusion

This paper makes a step from engineering new technologies toward scientific discovery. We make two contributions to the community. First, we build a synthetic data engine, GPR-X, that can generate images under controllable cameras and environments. Second, based on GPR-X, we conduct comprehensive experiments to quantitatively assess the influence of dataset attributes on person re-ID accuracy. More specifically, given a specific target domain, we can find the most critical factors via style representation, which greatly decreases the scale of training samples and saves both training time and human labelling expenses. In the future, we will further explore other style transfer methods to bridge the gap between synthetic and real datasets and boost performance on re-ID tasks.

References

  • [1] S. Bak, P. Carr, and J. Lalonde (2018) Domain adaptation through synthesis for unsupervised person re-identification. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 189–205. Cited by: TABLE I, §I, §II, §II.
  • [2] I. B. Barbosa, M. Cristani, B. Caputo, A. Rognhaugen, and T. Theoharis (2018) Looking beyond appearances: synthetic training data for deep cnns in re-identification. Computer Vision and Image Understanding 167, pp. 50–62. Cited by: TABLE I, §I, §II.
  • [3] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In

    2009 IEEE conference on computer vision and pattern recognition

    ,
    pp. 248–255. Cited by: §IV-B.
  • [4] L. A. Gatys, A. S. Ecker, M. Bethge, A. Hertzmann, and E. Shechtman (2017) Controlling perceptual factors in neural style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3985–3993. Cited by: §II, §IV-A.
  • [5] L. A. Gatys, A. S. Ecker, and M. Bethge (2016)

    Image style transfer using convolutional neural networks

    .
    In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2414–2423. Cited by: §II, §IV-A.
  • [6] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §I, §V-A.
  • [7] A. Hermans, L. Beyer, and B. Leibe (2017) In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737. Cited by: §IV-B.
  • [8] M. M. Kalayeh, E. Basaran, M. Gökmen, M. E. Kamasak, and M. Shah (2018) Human semantic parsing for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1062–1071. Cited by: §I.
  • [9] W. Li, R. Zhao, T. Xiao, and X. Wang (2014) Deepreid: deep filter pairing neural network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 152–159. Cited by: TABLE I, §I.
  • [10] S. Liao, Y. Hu, X. Zhu, and S. Z. Li (2015) Person re-identification by local maximal occurrence representation and metric learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2197–2206. Cited by: §I.
  • [11] H. Luo, Y. Gu, X. Liao, S. Lai, and W. Jiang (2019) Bag of tricks and a strong baseline for deep person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 0–0. Cited by: §IV-B, §V-A.
  • [12] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in pytorch. Cited by: §V-A.
  • [13] E. Ristani, F. Solera, R. Zou, R. Cucchiara, and C. Tomasi (2016) Performance measures and a data set for multi-target, multi-camera tracking. In European Conference on Computer Vision, pp. 17–35. Cited by: TABLE I, §I, §V.
  • [14] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §V-A.
  • [15] X. Sun and L. Zheng (2019) Dissecting person re-identification from the viewpoint of viewpoint. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 608–617. Cited by: TABLE I, §I, §II, §II.
  • [16] S. Xiang, Y. Fu, and T. Liu (2019) Deep unsupervised progressive learning for distant domain adaptation. In

    2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI)

    ,
    pp. 903–908. Cited by: §II.
  • [17] S. Xiang, Y. Fu, M. Xie, Z. Yu, and T. Liu (2020)

    Unsupervised person re-identification by hierarchical cluster and domain transfer

    .
    MULTIMEDIA TOOLS AND APPLICATIONS. Cited by: §I.
  • [18] S. Xiang, Y. Fu, G. You, and T. Liu Unsupervised domain adaptation through synthesis for person re-identification. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. Cited by: §V-D.
  • [19] H. Yu, W. Zheng, A. Wu, X. Guo, S. Gong, and J. Lai (2019) Unsupervised person re-identification by soft multilabel learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2148–2157. Cited by: §V-D.
  • [20] Z. Zhang and M. Sabuncu (2018) Generalized cross entropy loss for training deep neural networks with noisy labels. In Advances in neural information processing systems, pp. 8778–8788. Cited by: §IV-B.
  • [21] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian (2015) Scalable person re-identification: a benchmark. In Proceedings of the IEEE international conference on computer vision, pp. 1116–1124. Cited by: TABLE I, §I, §V.
  • [22] Z. Zheng, L. Zheng, and Y. Yang (2017) Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3754–3762. Cited by: §V.
  • [23] Z. Zhong, L. Zheng, D. Cao, and S. Li (2017) Re-ranking person re-identification with k-reciprocal encoding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1318–1327. Cited by: §V.
  • [24] Z. Zhong, L. Zheng, S. Li, and Y. Yang (2018) Generalizing a person retrieval model hetero-and homogeneously. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 172–188. Cited by: §IV-B.
  • [25] Z. Zhong, L. Zheng, Z. Luo, S. Li, and Y. Yang (2019) Invariance matters: exemplar memory for domain adaptive person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 598–607. Cited by: §V-D.