In many domains, such as surveillance or digital signage, being able to automatically recognize a person across different, non-overlapping cameras, without the help of a human operator, is very valuable. This task is known as person re-identification and can be extremely challenging since great variations can occur between the different cameras. Figure 1 shows pairs of images, taken from two different cameras, from three academic datasets: VIPeR, CUHK01 and CUHK03. The variation between two pictures of the same person within a dataset can be large: body pose, luminosity, view angle or background may all change.
In many works, person re-identification is based on a similarity score between a pair of images: if the two images represent the same person, the similarity score is high. Two aspects are usually studied. The first consists in extracting robust, invariant features to represent the appearance of a person [5, 6, 7]. The second is metric learning [8, 9]: learning the best possible metric to discriminate between positive and negative samples.
Recently, convolutional neural networks have demonstrated very high efficiency in several computer vision problems such as image segmentation or object recognition. Many research projects have proved that deep neural networks are also extremely efficient for re-identification [12, 13, 14, 15].
To train such a deep neural network, large datasets are mandatory. Recently, re-identification datasets large enough to train deep models have emerged [13, 4, 16]. In many works [17, 15], a neural network is trained on a large dataset and then fine-tuned on a smaller one. Consequently, for the performance evaluation, a specific fine-tuned model is used to evaluate its corresponding dataset. For industrial purposes, having a single model able to perform well on many datasets is extremely important: it means the model can handle different situations, which enables deploying the same model on cameras installed in different environments.
Re-identification with CNNs is usually performed using features that the neural network learns to extract from identities during the training phase. Attributes, which are higher-level features such as gender, clothes length or handbag, may be extremely valuable for re-identification since they are truly robust to changes of view angle and camera. Schumann et al.  demonstrated that using only attributes leads to low performance compared to the features learned by a CNN from the identities. A good approach is therefore to use a combination of attributes and features extracted from identities.
To have a system able to make use of attributes, access to a large dataset annotated with attributes is required. Nevertheless, it is difficult to acquire large training data for a set of attributes since manual annotation is extremely expensive. Thus, only a subset of re-identification datasets is annotated with attributes, and many of them will remain attributeless. Building a general system that performs well on several datasets while making use of attributes is therefore a problem. To deal with the variation in size and annotation of re-identification datasets, we present in this paper a multi-task learning approach which learns the re-identification task from a combination of several datasets. Furthermore, our system is able to take advantage of attribute information in the datasets annotated with it.
Two main strategies are used in the re-identification community for training deep neural networks; we describe them in more detail in the next section. The first one [18, 19, 20] is based on siamese networks with contrastive or triplet losses. The second one [21, 14], used in our work, is based on classification losses. Since the last layer is a linear classifier, classification methods ensure that features are linearly separable; consequently, the distance between features belonging to two different classes increases. Nevertheless, with this approach, the intra-class variation is not controlled. Intuitively, reducing the intra-class variation can make the features more discriminant and thus increase the re-identification performance. In this work, we add one task to our multi-task learning objective: a task designed to force the features of the same identity to be as close as possible. For the implementation, we employ the center loss proposed by Wen et al. . We then evaluate the interest of this center loss for our re-identification system.
The contributions of our work are threefold:
We take advantage of attributes available in some re-identification datasets, such as hair length, top/bottom color and clothes length. Our multi-task learning objective combines the re-identification task, learned from all the datasets, with attribute classification tasks, learned from the subset of datasets that are annotated.
We evaluate an auxiliary task designed to control the intra-class variation of the re-identification features. This task is based on the center loss described in .
2 Related work
Many studies have been conducted on re-identification, and today CNN and deep learning approaches are well studied and show very good performance on many datasets [23, 21, 19, 24].
One approach is based on siamese networks, triplet loss networks [26, 19] or a quadruplet loss , which learn a representation from pairs or triplets of identical and different identities. The other approach is based on identity losses [21, 16], in which each identity is seen as a class; a classification loss function, such as the softmax cross-entropy, is usually employed.
Training differs depending on which of the two approaches one chooses. Triplet loss networks can be difficult to train since one needs to preprocess the data to find triplets and hard positive and negative samples . The number of pairs/triplets needed for training grows dramatically compared to the number of samples in the dataset, which can lead to slow convergence.
Ahmed et al.  take a pair of images as input, at both training and test time, to decide whether the two images represent the same person. This approach requires an inference each time two images are compared, which demands a lot of computing power during a search.
To deal with several datasets, Xiao et al.  developed a guided dropout strategy to learn a person representation across different datasets. Other approaches specialize a trained network to a particular dataset: for example, because a large CNN needs many samples, many works fine-tune a network trained on a large re-identification dataset on a smaller one to avoid overfitting [17, 15, 23].
Multi-task learning for person re-identification
Multi-task learning  has been applied to re-identification. For example, in , the authors use a siamese network in which the different tasks are attribute classification tasks. In , re-identifications from multiple cameras are regarded as related tasks. Some approaches use a network with two branches [30, 15], with jointly optimized losses.
Attributes for person re-identification
Attributes have been extensively studied in re-identification [31, 32, 33]. Attributes preserve robust information about a person across different points of view and conditions, so it is natural to use them for re-identification. More recently, attributes have been used with deep learning approaches; these architectures can be trained with relatively large re-identification datasets annotated with attributes [34, 13]. Some researchers use architectures with two branches, one for re-identification and one for attribute extraction, and combine them [19, 12]. Su et al.  use three stages of fine-tuning and a triplet-based loss. Matsukawa et al.  use only a combination of losses based on attributes to create a representation able to perform re-identification.
3 Proposed approach
In this paper we present a deep learning approach to the problem of person re-identification. Given an image of a person, the network outputs a global representation of this person, which should be independent of the person's pose; we call it the signature of the person. Furthermore, in our architecture, the same network also outputs a list of attributes. The complete list of supported attributes is detailed in Table 1. To decide whether two pictures represent the same person, we compute the cosine distance between the two signatures: the smaller the distance, the more likely the two signatures represent the same person.
top color: black, blue, green, grey, purple, red, white, yellow
bottom color: black, blue, brown, grey, green, pink, purple, white, yellow
top length: long, short
bottom length: long, short
hand bag: true, false
other bag: true, false
hair length: long, short
3.1 Model design
The FC1 layer is the global person representation (the signature); it is therefore used to perform the re-identification. In our architecture, its size is fixed. Two losses are used to train this layer: an identity loss and the center loss .
The objective of the identity loss is to make the network able to classify each identity into the correct class. It is a multi-class classification, therefore the cross-entropy softmax loss is used.
The identity loss forces the deep features to be separable. To reduce the intra-class variation, we use the center loss, introduced by Wen et al. , which organizes the deep features around a center for each class. During training, the centers are learned and the distance between the deep features and their corresponding centers is minimized. Wen et al. showed that the center loss is differentiable: our deep neural network can therefore be trained with a standard algorithm based on gradient descent. As stated in , the center loss is given by (1):

L_C = (1/2) * sum_{i=1..m} ||x_i - c_{y_i}||^2,    (1)

in which m is the size of the batch, c_{y_i} is the center of class y_i and x_i is the deep feature of the i-th sample. The dimension of both x_i and c_{y_i} is that of FC1.
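As an illustration, the center loss computation and the accompanying center update can be sketched in plain Python as follows. This follows Wen et al.'s formulation, but it is not the paper's actual TensorFlow implementation; the function names and the update rate `alpha` are illustrative.

```python
def center_loss(features, labels, centers):
    """Center loss of Wen et al.: (1/2) * sum_i ||x_i - c_{y_i}||^2.

    features: list of feature vectors (one per batch sample)
    labels:   list of class ids, same length as features
    centers:  dict mapping class id -> current center vector
    """
    loss = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        loss += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return loss

def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center toward the features of its batch samples,
    the update rule used alongside the center loss during training."""
    for x, y in zip(features, labels):
        c = centers[y]
        centers[y] = [ci + alpha * (xi - ci) for xi, ci in zip(x, c)]
```

Minimizing this loss pulls the deep features of a same identity toward a shared, learned center, which is exactly the intra-class compactness discussed above.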
The balance between the identity loss and the center loss is controlled by a weighting factor.
Our system is trained with attributes. We support several attributes and, for each of them, we train a classifier. Two types of classifiers are used for the two types of attributes: for binary attributes (e.g. male/female), binary classifiers trained with a sigmoid cross-entropy are used; for multi-class attributes (e.g. top/bottom colors), a softmax cross-entropy loss is used. To avoid corrupting the global representation of FC1, we choose to connect the attribute classifiers to FC2, a second fully connected layer. The weight of the attribute losses is controlled by a parameter.
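The two classifier losses above can be sketched in plain Python as follows; these are the standard sigmoid and softmax cross-entropies, not the actual training code, and the function names are illustrative.

```python
import math

def sigmoid_ce(logit, target):
    """Binary attribute loss (e.g. hand bag true/false): sigmoid cross-entropy.
    target is 0 or 1."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def softmax_ce(logits, target_idx):
    """Multi-class attribute loss (e.g. top color): softmax cross-entropy.
    logits is one value per class, target_idx the ground-truth class index."""
    m = max(logits)                              # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[target_idx] / sum(exps))
```

Each binary attribute head contributes one `sigmoid_ce` term, and each multi-class attribute head one `softmax_ce` term, all connected to FC2.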
3.1.1 Multi-task learning
The learning objectives are controlled by several losses: two for FC1 and one per attribute for FC2. As shown in Figure 3, the learning is performed by the combination of all these losses.
Let L_I be the identity loss, L_C the center loss and L_A the sum of the attribute losses; our total loss is given by (2):

L = L_I + λ L_C + μ L_A,    (2)

where λ and μ are the weighting factors mentioned above.
3.1.2 Re-identification process
The re-identification process is based on the global representation FC1 extracted from pictures. The similarity between pictures is computed with the cosine distance.
d(x_1, x_2) = 1 - (x_1 · x_2) / (||x_1|| ||x_2||),

in which x_1 and x_2 are the FC1 feature vectors extracted from two pictures by the system, and · represents the Euclidean dot product.
3.2 Learning strategy
The aim of our work is to build a single system able to perform well simultaneously on all the datasets listed in Table 2 and to recognize pedestrian attributes. We describe in this section the way we chose to reach this goal.
3.2.1 Joint datasets learning
One of the main objectives of our system is to show good performance on diverse datasets. Thus, to build our training set, we mix the training sets of these datasets. This approach is valid since there is no identity overlap between the different datasets. Suppose we have D datasets and let n_d denote the number of identities of the d-th dataset; the total number of identities of our global dataset is N = n_1 + ... + n_D.
The identities in the datasets are not represented by the same number of images. Thus one cannot simply merge the datasets, since the information contained in the smallest datasets would then be negligible compared to that contained in the large ones. To tackle this issue we employ a weighted cross-entropy for the identity loss.
Let n_i represent the number of images of class i in the merged dataset, z the logit vector and y the ground-truth class. The identity loss for one sample is thus given by (4):

L_id = -(1/n_y) log( exp(z_y) / sum_j exp(z_j) ).    (4)

This ensures a high weight for the under-represented classes and a lower weight for the most frequent ones. This loss is also appropriate for optimization in batch mode, in which we compute the weighted loss for each element of the batch and average over all the cross-entropies. Let m be the size of the batch and L_id^(k) the loss of the k-th element of the batch, defined as in (4); one can then write the final loss:

L_I = (1/m) sum_{k=1..m} L_id^(k).
3.2.2 Attribute recognition task
To have only one network able to output both the re-identification signature and the attributes, multi-task learning is used. We use three losses: the ones defined in (4) and (1) for the re-identification task, and another for the attribute extraction task. As for the identity loss, the attribute loss can be written as a modified cross-entropy. Suppose there are C annotated classes in our dataset, each with A attributes. The loss corresponding to attribute a of class c, written L_{a,c}, is:

L_{a,c} = -log( exp(z_t) / sum_{j=1..K_a} exp(z_j) ),    (5)

where K_a is the dimension of the logit layer of attribute a, z_j the output of the j-th logit and t the corresponding ground truth. One can then write the loss relative to the attributes:

L_A = sum_{a=1..A} sum_c L_{a,c}.    (6)
During training, we randomly sample batches of images. While all the images are annotated for the re-identification task, only some of them are annotated with attributes. Therefore, the re-identification loss is computed with all the batch samples, while the attribute loss is computed on a subset of the batch. More details can be found in Figure 3, in which each sample of a batch has an identity used for the re-identification task, and some samples also have attribute annotations. The identity loss is thus always computed on all the samples of the batch; in this example, it is computed on the 9 batch samples. The attribute loss is only computed with the samples having attribute annotations; in the example batch, only 3 samples are annotated.
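The per-batch masking described above can be sketched as follows; this assumes the per-sample attribute losses have already been computed, and the names are illustrative.

```python
def masked_attribute_loss(per_sample_losses, has_attributes):
    """Attribute loss averaged over only the annotated samples of a batch.

    per_sample_losses: attribute loss value for each batch element
    has_attributes:    boolean mask, True if the sample has attribute labels
    """
    annotated = [l for l, m in zip(per_sample_losses, has_attributes) if m]
    if not annotated:          # a batch may contain no annotated samples
        return 0.0
    return sum(annotated) / len(annotated)
```

The identity loss, by contrast, is averaged over the whole batch, so unannotated samples still contribute to the re-identification objective.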
However, in the datasets annotated with attributes, the appearance frequencies of the attribute values are not equal; for example, the blue pants class is more represented than the pink pants class. To deal with this imbalance, a penalty is introduced in the loss: as in 3.2.1, the loss for a specific attribute (6) is weighted to penalize the most represented classes. Let n_{a,c} represent the number of occurrences of class c of attribute a; one can then re-write (6):

L_A = sum_{a=1..A} sum_c (1/n_{a,c}) L_{a,c}.
4 Performance evaluation
We first present the datasets and the protocol followed to compute the performance metrics. Then we show the results of our approach and compare them to the state of the art.
- CUHK01: a dataset with 3,884 images of 972 pedestrians; each identity is observed by two camera views, with two images from the first camera and two images from the second. All pedestrian images are manually cropped.
- CUHK03: this dataset contains 14,097 cropped images of 1,467 identities. Each identity is observed by two camera views, with 4.8 images on average per view. There are two types of bounding boxes: manually labeled pedestrian bounding boxes and automatically detected bounding boxes obtained by a pedestrian detector.
- VIPeR: a very challenging dataset containing 632 pedestrian image pairs taken from arbitrary viewpoints under various illumination conditions and poses.
- Market 1501
This dataset contains 32,668 images annotated using a DPM (Deformable Part Model) detector, giving 1,501 identities split into a training set of 751 identities and a test set of 750 identities. Each identity is captured by at least 2 and at most 6 cameras, so that cross-camera search can be performed. The dataset has been annotated with 27 attributes per identity.
The statistics of these four datasets are summarized in Table 2.
4.2 Metric and protocol
Performance is measured with the Cumulative Match Characteristic (CMC) curve. The CMC curve represents the probability of correct re-identification on the y-axis against the number of candidates returned on the x-axis. The rank-1 CMC is particularly important since it measures the ability of the system to truly identify a person.
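For reference, a minimal sketch of how a CMC curve can be computed once each probe's retrieval rank is known; the names are illustrative.

```python
def cmc_curve(true_match_ranks, max_rank):
    """CMC value at each rank k: the fraction of probes whose correct
    gallery match was retrieved within the top k candidates.

    true_match_ranks: for each probe, the 0-based position at which its
                      true gallery identity appears in the ranked list
    """
    n = len(true_match_ranks)
    return [sum(1 for r in true_match_ranks if r < k) / n
            for k in range(1, max_rank + 1)]
```

The rank-1 value is simply the first entry of this curve: the fraction of probes whose best-scoring candidate is the correct identity.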
For the different datasets, we use the protocol described in , which is based on . For CUHK01 and VIPeR we divide the identities of the dataset into two equal parts, i.e. 485 and 316 respectively, for the test set and the training set. For CUHK03, we use the commonly used split: 1,367 identities for the training set and 100 identities for the test set.
Our gallery and probe sets are constructed as follows. For VIPeR, which has two camera views, we randomly select an image from the first camera as the probe image; the gallery image is the same identity taken from the other camera. For CUHK03 and CUHK01 we use a similar protocol, ensuring that the images of the gallery and probe sets do not come from the same camera. As stated in [21, 40], both the manually and automatically cropped images were used in our experiments.
4.2.2 Training protocol
We train several networks to compare the influence of the center loss and the attributes loss.
For the first stage, we train networks without the attribute losses. The ResNet-50 network we use is pretrained on ImageNet; the weights are the ones distributed by the TensorFlow community (checkpoint resnet_v2_50_2017_04_14.tar.gz, available from the TensorFlow GitHub repository, https://github.com/tensorflow). The learning is performed using the Adam  optimizer, with an initial learning rate fixed for this stage. We train 4 networks, each with a different value of the center loss weight.
For the second stage, we load the previously computed weights and train with the Adam optimizer at a lower learning rate. We generate two types of models: models still trained without the attribute losses, and models trained with the attribute losses. At the end of these two stages we have therefore generated 10 models: for each of the 5 center loss values, a model with attributes and a model without attributes.
In this section we discuss the influence of both the center loss and the attributes. We first discuss the center loss alone: to show its influence, the models are trained without attributes. We then show the influence of the attributes, using our models trained with both the attributes and the center loss.
4.3 Influence of the center loss
In this section we vary the parameter which controls the weight of the center loss in the global loss.
We focus in this section on performance without the attribute losses; the results are the ones in the No Attributes columns of Table 4. To interpret the results, we plot in Figure (a) (resp. Figure (b)) the value of the rank-1 CMC during training (resp. the value of the center loss during training). The rank-1 CMC during training is computed with all the datasets combined: we compute the CMC against all the identities of all the datasets. We control the regularization of our network through this training rank-1 CMC. Higher center loss weights lead to rapid overfitting, as we show with the largest value tested.
The different values of the center loss weight lead to different performance: beyond a certain value, we observe a drop in performance on all three datasets. During training, the value of the center loss decreases according to this parameter, as can be seen in Figure (b). With the largest weight, the value of the center loss decreases dramatically. This has a strong influence on the CMC computed on the training datasets, shown in Figure (a): the rank-1 CMC reaches a value near 1.0, which indicates overfitting.
This shows that the center loss has a strong effect during training: pulling the features of a same identity toward a common center increases the capacity of the neural network. With our set of hyperparameters, the ideal value of the center loss weight lies below the largest one tested.
The center loss does not have the same impact on all the datasets: the rank-1 value increases on VIPeR as well as on CUHK01 and CUHK03, with the largest gain on VIPeR. This shows that the center loss really helps the network to output a general representation of pedestrians. Indeed, in the VIPeR dataset, the variation between the two images of a same identity is higher than in the other datasets; the network therefore has to generalize well to perform on VIPeR.
4.4 Influence of the attributes
We now focus on the attributes. Our objectives are to understand how the attributes help the re-identification score and how our model performs on the attribute extraction task. To compare the influence of the attributes, we train our network with the different values of the center loss weight and the attribute system activated; the attribute losses are weighted by a factor found empirically to produce the best results. As shown in Table 4, the attribute losses make the network more efficient on all the datasets.
4.5 Comparison with the state of the art
We compare the performance of our system with the state of the art on CUHK01, CUHK03 and VIPeR. Results are shown in Table 5. In this table we take our best model, i.e. our model with the attribute losses enabled and the best-performing center loss weight.
Best model (rank-1): 66.6 (CUHK01), 75.3 (CUHK03), 47.8 (VIPeR)
Our system shows better performance on CUHK01 and CUHK03 than the state of the art, while being lower on VIPeR. This shows that our approach indeed enables a network to perform well on a large variety of datasets without needing a specific retraining for each of them.
4.6 Attributes performances
In this section, the performance of our network on attribute recognition is presented. The tests are run on the test set of the Market-1501 dataset. Since some classes occur too rarely in the test set to be representative (for instance yellow and purple bottoms), they are removed from the tests. The average precision of the network on attribute recognition is presented in Table 6; the possible values of each attribute are available in Table 1. For both the bottom color and top color attributes, we compute the mean of the average precision over each color; these mean average precisions are shown in Table 6 in the bot.col and top.col columns.
This shows that our system is able to learn to recognize attributes and perform re-identification at the same time. The attribute classifiers show very good performance on attributes such as gender, hair length or backpack. Some attributes, such as hand bag, have a low average precision, probably because these classes are under-represented in the training dataset. Even though we manage the attribute imbalance during training, when the number of samples for a given class is too limited the network cannot properly generalize.
In this paper we have presented a CNN architecture for person re-identification. This architecture is trained with multi-task learning in order to obtain a system able to learn from different datasets with different labels. We have demonstrated our approach on 4 datasets: two relatively small ones (VIPeR and CUHK01), a larger one (CUHK03) and one annotated with attributes (Market-1501). We have evaluated the influence of the different tasks (center loss and attribute losses) on the global performance and shown that combining the different losses leads to better performance; the center loss in particular has a strong influence. Our system performs well on CUHK01, CUHK03 and VIPeR, and outperforms recent re-identification works on CUHK01 and CUHK03.
-  Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VII. (2016) 499–515
-  Gray, D., Tao, H.: Viewpoint invariant pedestrian recognition with an ensemble of localized features. In Forsyth, D.A., Torr, P.H.S., Zisserman, A., eds.: ECCV (1). Volume 5302 of Lecture Notes in Computer Science., Springer (2008) 262–275
-  Li, W., Zhao, R., Wang, X.: Human reidentification with transferred metric learning. In: ACCV. (2012)
-  Li, W., Zhao, R., Xiao, T., Wang, X.: Deepreid: Deep filter pairing neural network for person re-identification. In: CVPR. (2014)
-  Liu, C., Gong, S., Loy, C.C., Lin, X.: Person re-identification: What features are important? In Fusiello, A., Murino, V., Cucchiara, R., eds.: Computer Vision – ECCV 2012. Workshops and Demonstrations, Berlin, Heidelberg, Springer Berlin Heidelberg (2012) 391–401
-  Madden, C., Cheng, E.D., Piccardi, M.: Tracking people across disjoint camera views by an illumination-tolerant appearance representation. Machine Vision and Applications 18(3) (Aug 2007) 233–247
-  Farenzena, M., Bazzani, L., Perina, A., Murino, V., Cristani, M.: Person re-identification by symmetry-driven accumulation of local features. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. (June 2010) 2360–2367
-  Dikmen, M., Akbas, E., Huang, T.S., Ahuja, N.: Pedestrian recognition with a learned metric. In Kimmel, R., Klette, R., Sugimoto, A., eds.: Computer Vision – ACCV 2010, Berlin, Heidelberg, Springer Berlin Heidelberg (2011) 501–512
-  Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, IEEE Computer Society (2015) 2197–2206
-  Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, IEEE Computer Society (2015) 3431–3440
-  Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q., eds.: Advances in Neural Information Processing Systems 25. Curran Associates, Inc. (2012) 1097–1105
-  Matsukawa, T., Suzuki, E.: Person re-identification using cnn features learned from combination of attributes. In: 2016 23rd International Conference on Pattern Recognition (ICPR). (Dec 2016) 2428–2433
-  Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: A benchmark. In: Computer Vision, IEEE International Conference on. (2015)
-  Zheng, L., Yang, Y., Hauptmann, A.G.: Person re-identification: Past, present and future. (2016)
-  Schumann, A., Stiefelhagen, R.: Person re-identification by deep learning attribute-complementary information. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). (July 2017) 1435–1443
-  Zheng, L., Bie, Z., Sun, Y., Wang, J., Su, C., Wang, S., Tian, Q.: MARS: A Video Benchmark for Large-Scale Person Re-identification. In: European Conference on Computer Vision, Springer (2016)
-  Shi, H., Yang, Y., Zhu, X., Liao, S., Lei, Z., Zheng, W., Li, S.Z.: Embedding deep metric for person re-identification: A study against large variations. In: Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I. (2016) 732–748
-  Chung, D., Tahboub, K., Delp, E.J.: A two stream siamese convolutional neural network for person re-identification. In: 2017 IEEE International Conference on Computer Vision (ICCV). (Oct 2017) 1992–2000
-  Chen, Y., Duffner, S., Stoian, A., Dufour, J.Y., Baskurt, A.: Triplet cnn and pedestrian attribute recognition for improved person re-identification. In: 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). (Aug 2017) 1–6
-  Chen, W., Chen, X., Zhang, J., Huang, K.: Beyond triplet loss: a deep quadruplet network for person re-identification. In: International Conference on Computer Vision and Pattern Recognition (CVPR). (2017)
-  Xiao, T., Li, H., Ouyang, W., Wang, X.: Learning deep feature representations with domain guided dropout for person re-identification. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (June 2016) 1249–1258
-  Lin, Y., Zheng, L., Zheng, Z., Wu, Y., Yang, Y.: Improving person re-identification by attribute and identity learning. (2017)
-  Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for person re-identification. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (June 2015) 3908–3916
-  Zheng, Z., Zheng, L., Yang, Y.: A discriminatively learned CNN embedding for person reidentification. TOMCCAP 14(1) (2018) 13:1–13:20
-  Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a ”siamese” time delay neural network. In Cowan, J.D., Tesauro, G., Alspector, J., eds.: Advances in Neural Information Processing Systems 6. Morgan-Kaufmann (1994) 737–744
-  Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (June 2015) 815–823
-  Caruana, R.: Multitask learning: A knowledge-based source of inductive bias. In: Proceedings of the Tenth International Conference on Machine Learning, Morgan Kaufmann (1993) 41–48
-  McLaughlin, N., del Rincón, J.M., Miller, P.C.: Person reidentification using deep convnets with multitask learning. IEEE Trans. Circuits Syst. Video Techn. 27(3) (2017) 525–539
-  Su, C., Yang, F., Zhang, S., Tian, Q., Davis, L.S., Gao, W.: Multi-task learning with low rank attribute embedding for multi-camera person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence PP(99) (2018) 1–1
-  Li, W., Zhu, X., Gong, S.: Person re-identification by deep joint learning of multi-loss classification. In Sierra, C., ed.: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, ijcai.org (2017) 2194–2200
-  Layne, R., Hospedales, T., Gong, S. In: Person Re-identification by Attributes. BMVA Press (2012) 1–11
-  Layne, R., Hospedales, T.M., Gong, S.: Towards person identification and re-identification with attributes. In Fusiello, A., Murino, V., Cucchiara, R., eds.: Computer Vision – ECCV 2012. Workshops and Demonstrations, Berlin, Heidelberg, Springer Berlin Heidelberg (2012) 402–412
-  Layne, R., Hospedales, T.M., Gong, S. In: Attributes-Based Re-identification. Springer London, London (2014) 93–117
-  Deng, Y., Luo, P., Loy, C.C., Tang, X.: Pedestrian attribute recognition at far distance. In: Proceedings of the ACM International Conference on Multimedia, MM ’14, Orlando, FL, USA, November 03 - 07, 2014. (2014) 789–792
-  Su, C., Zhang, S., Xing, J., Gao, W., Tian, Q.: Deep attributes driven multi-camera person re-identification. In: Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II. (2016) 475–491
-  He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. (2016) 770–778
-  Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: ICLR. Volume abs/1412.6980. (2014)
-  Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: Large-scale machine learning on heterogeneous systems (2015) Software available from tensorflow.org.
-  Gray, D., Brennan, S., Tao, H.: Evaluating appearance models for recognition, reacquisition, and tracking. In: In IEEE International Workshop on Performance Evaluation for Tracking and Surveillance, Rio de Janeiro. (2007)
-  Paisitkriangkrai, S., Shen, C., van den Hengel, A.: Learning to rank in person re-identification with metric ensembles. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015) 1846–1855
-  Cheng, D., Gong, Y., Zhou, S., Wang, J., Zheng, N.: Person re-identification by multi-channel parts-based cnn with improved triplet loss function. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (June 2016) 1335–1344