Person re-identification across different datasets with multi-task learning

07/25/2018
by Matthieu Ospici, et al.
Atos

This paper presents an approach to tackle the person re-identification problem, which is challenging due to the large variations of pose, illumination and camera view. More and more datasets are available to train machine learning models for person re-identification. These datasets vary in conditions (number of cameras, camera positions, location, season), in size (number of images, number of distinct identities) and in labeling (some datasets are annotated with attributes while others are not). To deal with this variety, we present an approach that takes information from different datasets to build a single system which performs well on all of them. Our model is based on a Convolutional Neural Network (CNN) and trained using multi-task learning. Several losses are used to extract the different kinds of information available in the different datasets. Our main task is learned with a classification loss; to reduce the intra-class variation we experiment with the center loss. Our paper ends with a performance evaluation in which we discuss the influence of the different losses on the global re-identification performance. We show that with our method we are able to build a system that performs well on different datasets and simultaneously extracts attributes. We also show that our system outperforms recent re-identification works on two datasets.



1 Introduction

In many domains, such as surveillance or digital signage, being able to automatically recognize a person across different, non-overlapping cameras, without the help of a human operator, is very valuable. This task is known as person re-identification and can be extremely challenging since great variations can occur between the different cameras. Figure 1 shows image pairs taken by two different cameras from three academic datasets: VIPeR [2], CUHK01 [3] and CUHK03 [4]. Even between two pictures of the same identity within a dataset, the variation can be large: body pose, luminosity, view angle or background.

(a) VIPeR
(b) CUHK01
(c) CUHK03
Figure 1: Three re-identification datasets used in our work.

In many works, person re-identification is based on a similarity score between a pair of images. If the two images represent the same person, the similarity score is high. Two aspects are usually studied. The first one consists in extracting robust invariant features to represent the appearance of a person [5, 6, 7]. The second is metric learning [8, 9]: it consists in learning the best possible metric to discriminate between positive and negative samples.

Recently, convolutional neural networks have demonstrated very high efficiency in several computer vision problems such as image segmentation [10] or object recognition [11]. Many research projects have shown that deep neural networks are also extremely efficient for re-identification [12, 13, 14, 15].

To train such deep neural networks, large datasets are mandatory. Recently, re-identification datasets large enough to train deep models have emerged [13, 4, 16]. In many works [17, 15], a neural network is trained on a large dataset and then fine-tuned on a smaller one; consequently, for the performance evaluation, a specific fine-tuned model is used to evaluate its corresponding dataset. For industrial purposes, having a single model able to perform well on many datasets is extremely important: it means that the model can handle different situations, which enables deploying the same model on cameras installed in different environments.

Re-identification with CNNs is usually performed using features extracted by the neural network from identities seen during the training phase. Attributes, which are higher-level features such as gender, clothes length or handbag, may be extremely valuable for re-identification since they are truly robust to view-angle and camera changes.

Schumann et al. [15] demonstrated that using only attributes leads to low performance compared to the features learned by a CNN from the identities. A good approach is therefore to use a combination of attributes and features extracted from identities.

To build a system able to make use of attributes, access to a large dataset annotated with attributes is required. Nevertheless, it is difficult to acquire large training data for a set of attributes since manual annotation is extremely expensive. Thus, only a subset of re-identification datasets is annotated with attributes, and many of them will remain attributeless. Building a general system that performs well on several datasets while making use of attributes is therefore a challenge. To deal with this variation in size and annotation across re-identification datasets, we present in this paper a multi-task learning approach which learns the re-identification task from a combination of several datasets. Furthermore, our system is able to take advantage of attribute information in the datasets annotated with it.

Two main strategies are used in the re-identification community for training deep neural networks; we describe them in more detail in the next section. The first one [18, 19, 20] is based on siamese networks with contrastive or triplet losses. The second one [21, 14], used in our work, is based on classification losses. Since the last layer is a linear classifier, classification methods ensure that the features are linearly separable; consequently, the distance between features belonging to two different classes increases. Nevertheless, with this approach the intra-class variation is not controlled. Intuitively, reducing the intra-class variation can make the features more discriminant and thus increase the re-identification performance. In this work, we therefore add one task to our multi-task learning objective: a task designed to force the features of the same identity to be as close as possible. For the implementation, we employ the method described in [1], which proposes a loss called the center loss. We then evaluate the interest of this center loss for our re-identification system.

The contributions of our work are threefold:

  • We build a model that learns a generic representation of a person using several re-identification datasets (CUHK01 [3], CUHK03 [4], MARS [16], VIPeR [2], Market1501 with attributes [22]), resulting in a system that performs well on all of these datasets without any dataset-specific fine-tuning.

  • We take advantage of the attributes available in some re-identification datasets, such as hair length, top/bottom color and clothes length. We have a multi-task learning objective: the re-identification task, learned from all the datasets, and the attribute classification tasks, learned from the subset of datasets with attribute annotations.

  • We evaluate an auxiliary task designed to control the intra-class variation of the re-identification features. This task is based on the center loss described in [1].

2 Related work

Person re-identification

Many studies address re-identification, and today CNN and deep learning approaches are well studied and show very good performance on many datasets [23, 21, 19, 24].

Two types of approaches are usually chosen in recent CNN-based re-identification works. The first is based on siamese networks [25, 18], triplet loss networks [26, 19] or a quadruplet loss [20], which learn a representation from pairs or triplets of identical and different identities. The other approach is based on identity losses [21, 16], in which each identity is treated as a class; a classification loss function, such as the softmax cross entropy, is usually employed.

Training differs depending on which approach is chosen. Triplet loss networks can be difficult to train since one needs to preprocess the data to find triplets with hard positive and negative samples [26]. Moreover, the number of possible pairs or triplets grows dramatically with the number of samples in the dataset, which can lead to slow convergence.

Ahmed et al. [23] take two images as input at both train and test time to decide whether the two images represent the same person or not. This approach requires an inference each time two images are compared, which demands a lot of computing power during a search.

To deal with several datasets, Xiao et al. [21] developed a domain guided dropout strategy to learn a person representation across different datasets. Other approaches specialize a trained network to a particular dataset: because a large CNN needs many samples, many works fine-tune a network trained on a large re-identification dataset on a smaller one to avoid overfitting [17, 15, 23].

Multi-task learning for person re-identification

Multi-task learning [27] has been applied to re-identification. For example, in [28] the authors use a siamese network whose different tasks are attribute classification tasks. In [29], re-identifications from multiple cameras are regarded as related tasks. Some approaches use a network with two branches and jointly optimized losses [30, 15].

Attributes for person re-identification

Attributes have been extensively studied in re-identification [31, 32, 33]. Attributes preserve robust information about a person across different points of view and conditions, so it is natural to use them for re-identification. More recently, attributes have been used with deep learning approaches; these architectures can be trained with relatively large re-identification datasets annotated with attributes [34, 13]. Some researchers use architectures with two branches, one for the re-identification and one for the attribute extraction, and combine them [19, 12]. Su et al. [35] use three stages of fine-tuning and a triplet-based loss. Matsukawa et al. [12] only use a combination of losses based on attributes to create a representation able to perform re-identification.

3 Proposed approach

In this paper we present a deep learning approach to the problem of person re-identification. Given an image of a person, the network outputs a global representation of this person, which should be independent of the person's pose. We call it the signature of the person. Furthermore, in our architecture, the same network also outputs a list of attributes; the complete list of attributes we support is detailed in Table 1. To decide whether two pictures represent the same person, we compute the cosine distance between the two signatures: the smaller the distance, the more likely the two signatures represent the same person.

Attribute Possible Values
gender (male, female)
top color (black, blue, green, grey, purple, red, white, yellow)
bottom color (black, blue, brown, grey, green, pink, purple, white, yellow)
top length (long, short)
bottom length (long, short)
backpack (true, false)
hand bag (true, false)
other bag (true, false)
hair length (long, short)
Table 1: Attributes supported by our system

3.1 Model design

The network architecture used in our work is represented in Figure 2. It is based on the resnet50 [36] model followed by a dropout layer (DP) and two fully connected layers, FC1 and FC2.

FC1

This layer is the global person representation (the signature); it is therefore used to perform the re-identification. Two losses are used to train this layer: an identity loss and the center loss [1].
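As a hypothetical Keras sketch of this layout (the layer sizes, names, dropout rate and the reading of Figure 2 in which FC1 and FC2 both branch from the dropout output are our assumptions, not the paper's exact values):

```python
import tensorflow as tf

def build_backbone(signature_dim=256, fc2_dim=512):
    # resnet50 trunk with global average pooling, followed by DP and the
    # two fully connected layers FC1 (signature) and FC2 (attribute trunk).
    base = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                          weights="imagenet")
    x = tf.keras.layers.Dropout(0.2)(base.output)   # DP (rate illustrative)
    fc1 = tf.keras.layers.Dense(signature_dim, name="FC1")(x)
    fc2 = tf.keras.layers.Dense(fc2_dim, name="FC2")(x)
    return tf.keras.Model(base.input, [fc1, fc2])
```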

The objective of the identity loss is to make the network able to classify each identity into the correct class. It is a multi-class classification problem, so the softmax cross-entropy loss is used.

The identity loss forces the deep features to be separable. To reduce the intra-class variation, we use the center loss introduced by Wen et al. [1], which organizes the deep features around a learned center for each class. During training, the centers are learned and the distance between the deep features and their corresponding centers is minimized. Wen et al. showed that the center loss is differentiable, so our deep neural network can be trained with a standard gradient-descent algorithm. As stated in [1], the center loss is given by (1), in which $m$ is the size of the batch, $c_{y_i}$ is the center of class $y_i$ and $x_i$ is the deep feature of the $i$-th sample; both $c_{y_i}$ and $x_i$ have the dimension of FC1:

$$L_C = \frac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2 \qquad (1)$$

The balance between the identity loss and the center loss is controlled by a factor $\lambda$.
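To make the mechanics concrete, here is a minimal NumPy sketch (our illustration, not the paper's code) of the center loss of equation (1) together with the center update rule of Wen et al. [1]; the function names and the default step size are assumptions.

```python
import numpy as np

def center_loss(features, labels, centers):
    # Equation (1): L_C = 1/2 * sum_i ||x_i - c_{y_i}||_2^2
    # features: (m, d) FC1 outputs, labels: (m,) identity ids,
    # centers: (num_identities, d) learned per-class centers.
    diff = features - centers[labels]
    return 0.5 * np.sum(diff ** 2)

def update_centers(features, labels, centers, alpha=0.5):
    # Move each center toward the mean of its batch features; alpha is the
    # center learning rate of [1] (the default value here is illustrative).
    for c in np.unique(labels):
        batch_mean = features[labels == c].mean(axis=0)
        centers[c] += alpha * (batch_mean - centers[c])
    return centers
```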

FC2

Our system is also trained with attributes: for each supported attribute we train a classifier. Two types of classifiers are used, matching the two types of attributes. For binary attributes (e.g. male/female), binary classifiers are trained with a sigmoid cross entropy; for multi-class attributes (e.g. top/bottom colors), a softmax cross-entropy loss is used. To avoid corrupting the global representation of FC1, we connect the attribute classifiers to FC2, a second fully connected layer. The weight of the attribute losses is controlled by a parameter $\mu$.
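A hypothetical sketch of the two head types in the TensorFlow library the paper builds on (the FC2 size, head names and loss wiring are illustrative assumptions):

```python
import tensorflow as tf

fc2 = tf.keras.layers.Dense(512, activation="relu", name="FC2")   # size assumed

gender_head = tf.keras.layers.Dense(1, name="gender")             # binary attribute
top_color_head = tf.keras.layers.Dense(8, name="top_color")       # 8 colors, Table 1

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)        # sigmoid cross entropy
cce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)  # softmax CE

def attribute_losses(fc2_out, gender_label, top_color_label):
    # One loss per attribute; their weighted sum forms the attribute task.
    g = bce(tf.cast(tf.reshape(gender_label, (-1, 1)), tf.float32),
            gender_head(fc2_out))
    c = cce(top_color_label, top_color_head(fc2_out))
    return g + c
```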

3.1.1 Multi-task learning

The learning objectives are controlled by several losses: two for FC1 and $A$ for FC2, where $A$ is the number of attributes. As shown in Figure 3, learning is performed by combining all these losses.

Let $L_I$ be the identity loss, $L_C$ the center loss and $L_A$ the sum of the attribute losses; our total loss is given by (2):

$$L = L_I + \lambda L_C + \mu L_A \qquad (2)$$

3.1.2 Re-identification process

The re-identification process is based on the global representation FC1 extracted from the pictures. The similarity between two pictures is computed with the cosine distance:

$$d(u, v) = 1 - \frac{\langle u, v \rangle}{\lVert u \rVert_2 \, \lVert v \rVert_2} \qquad (3)$$

in which $u$ and $v$ are the two signature vectors extracted from the two pictures by the system and $\langle u, v \rangle$ denotes the Euclidean dot product.
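A minimal sketch of this decision rule (ours; the distance itself is all the text above specifies):

```python
import numpy as np

def cosine_distance(u, v):
    # Equation (3) as reconstructed above: 1 minus the cosine similarity of
    # two FC1 signatures; a smaller distance means the two pictures are more
    # likely to show the same person.
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
```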

Figure 2: The CNN architecture for the re-identification. A resnet50 [36] network is followed by two fully-connected layers. The FC1 layer is used to compute the similarity distance and it is trained with the center loss and an identity loss. FC2 is trained with the attribute losses. A dropout layer (DP) is added before FC1 and FC2.
Dataset classes Training Test Attributes
CUHK01 [3] 971 1552 388 No
CUHK03 [4] 1467 21012 5252 No
VIPeR [2] 632 506 126 No
MARKET 1501 [13] 1501 12936 19732 Yes
Table 2: Datasets used in our work.

3.2 Learning strategy

The aim of our work is to build a single system that performs well simultaneously on all the datasets listed in Table 2 and recognizes pedestrian attributes. We describe in this section how we reach this goal.

3.2.1 Joint datasets learning

One of the main objectives of our system is to show good performance on diverse datasets. To build our training set, we therefore mix the training sets of these datasets. This approach is valid since there is no identity overlap between the different datasets. Let us consider $D$ datasets, and let $N_j$ denote the number of identities of the $j$-th dataset. The total number of identities of our global dataset is $N = \sum_{j=1}^{D} N_j$.
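A minimal sketch of this merging, assuming each dataset provides integer identity labels local to itself (the data layout is our assumption):

```python
def merge_datasets(datasets):
    # datasets: list of (samples, num_ids) pairs, where samples is a list of
    # (image_path, local_identity) tuples. Labels are offset so the merged
    # label space has N = sum_j N_j classes, which is valid because no
    # identity appears in two datasets.
    merged, offset = [], 0
    for samples, num_ids in datasets:
        merged.extend((img, pid + offset) for img, pid in samples)
        offset += num_ids
    return merged, offset   # offset equals the total number of identities N
```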

The identities in the different datasets are not represented by the same number of images. Thus one cannot simply merge the datasets: the information contained in the smallest datasets would be negligible compared to that contained in the large ones. To tackle this issue we employ a weighted cross entropy for the identity loss.

Let $n_c$ represent the number of images in class $c$ of the merged dataset, $y_c$ the value of the corresponding logit and $t_c$ the associated ground truth. The identity loss is thus given by (4):

$$L_I = - \sum_{c=1}^{N} \frac{t_c}{n_c} \log\left( \frac{e^{y_c}}{\sum_{c'=1}^{N} e^{y_{c'}}} \right) \qquad (4)$$

This ensures a high weight for the under-represented classes and a lower weight for the most frequent ones. This loss is also appropriate for optimization in batch mode, in which we compute the weighted loss for each element of the batch and take the mean over all the cross entropies. Let $m$ be the size of the batch and let $L_I^{(k)}$, defined as in (4), denote the loss of the $k$-th element of the batch. One can then write the final loss:

$$L_I = \frac{1}{m} \sum_{k=1}^{m} L_I^{(k)} \qquad (5)$$
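The following NumPy sketch shows our reading of equations (4) and (5); the exact normalization of the weights is an assumption, since the extracted text does not fully spell it out:

```python
import numpy as np

def weighted_identity_loss(logits, label, class_counts):
    # Equation (4): softmax cross entropy where the true class is weighted
    # by the inverse of its image count n_c, boosting rare identities.
    z = logits - logits.max()                   # numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))   # log softmax
    return -log_probs[label] / class_counts[label]

def batch_identity_loss(batch_logits, batch_labels, class_counts):
    # Equation (5): mean of the per-sample weighted losses over the batch.
    return float(np.mean([weighted_identity_loss(l, y, class_counts)
                          for l, y in zip(batch_logits, batch_labels)]))
```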

The training is done with the Adam optimizer [37]. We start the training of our system from a resnet50 [36] network pretrained on ImageNet.

3.2.2 Attribute recognition task

To have a single network that outputs both the re-identification representation and the attributes, multi-task learning is used. We use three losses: the ones defined in (4) and (1) for the re-identification task, and another for the attribute extraction task. As for the identity loss, the attribute loss can be written as a modified cross entropy. Suppose there are $N$ annotated classes in our dataset, with $A$ attributes. The loss corresponding to the $a$-th attribute of the $i$-th class is written $L_{a,i}$:

$$L_{a,i} = - \sum_{j=1}^{d_a} t_j \log\left( \frac{e^{y_j}}{\sum_{j'=1}^{d_a} e^{y_{j'}}} \right) \qquad (6)$$

where $d_a$ is the dimension of the logit layer of the $a$-th attribute, $y_j$ the output of the $j$-th logit and $t_j$ the corresponding ground truth. One can then write the loss relative to the attributes:

$$L_A^{(i)} = \sum_{a=1}^{A} L_{a,i} \qquad (7)$$

During training, we randomly sample batches of images. While all the images are annotated for the re-identification task, only some of them are annotated with attributes. Therefore the re-identification loss is computed on all the batch samples, while the attribute loss is computed on a subset of the batch. More details can be found in Figure 3: each sample of a batch has an identity used for the re-identification task, and some samples also have attribute annotations. The identity loss is thus always computed on all the samples of the batch; in the example of the figure, it is computed on the 9 batch samples. The attribute loss is only computed on the samples having attribute annotations, of which there are only 3 in the example batch.
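A sketch of this masked combination (the balance factors and their default values are illustrative; the normalization follows our reconstruction of equation (8) below):

```python
import numpy as np

def total_batch_loss(id_losses, center_losses, attr_losses, has_attr,
                     lam=0.06, mu=1.0):
    # Identity and center terms average over every sample of the batch;
    # the attribute term is masked so only annotated samples contribute
    # (attr_losses entries for unannotated samples are simply ignored).
    mask = np.asarray(has_attr, dtype=float)
    attr_term = np.sum(mask * np.asarray(attr_losses)) / len(id_losses)
    return np.mean(id_losses) + lam * np.mean(center_losses) + mu * attr_term
```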

Figure 3: Multi-task learning implementation during training for a batch. $L_I$ is the identity loss, $L_C$ the center loss and $L_A$ the sum of the attribute losses.

Let us define

$$\hat{L}_A = \frac{1}{m} \sum_{k=1}^{m} a_k L_A^{(k)} \qquad (8)$$

with $m$ representing the batch size as in (5) and $a_k$ equal to 1 if the $k$-th sample carries attribute annotations and 0 otherwise. The loss defined in Figure 3 can then be written as follows:

$$L = L_I + \lambda L_C + \mu \hat{L}_A \qquad (9)$$

However, in the dataset annotated with attributes, the appearance frequencies of the attribute values are not equal: for example, the blue pants class is far more represented than the pink pants class. To deal with this imbalance, a penalty is introduced in the loss. As in 3.2.1, the loss for a specific attribute (6) is weighted to penalize the most represented classes. Let $o_j$ represent the number of occurrences of the $j$-th class of the $a$-th attribute. One can then re-write (6) as:

$$L_{a,i} = - \sum_{j=1}^{d_a} \frac{t_j}{o_j} \log\left( \frac{e^{y_j}}{\sum_{j'=1}^{d_a} e^{y_{j'}}} \right) \qquad (10)$$
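A small sketch of the inverse-frequency weighting (our helper; the paper's exact normalization is not spelled out in the text above):

```python
from collections import Counter

def attribute_class_weights(attribute_labels):
    # Weights 1/o_j as in equation (10): if 'blue' pants appear 900 times
    # and 'pink' pants 30 times, the pink class gets a 30x larger weight.
    counts = Counter(attribute_labels)
    return {cls: 1.0 / n for cls, n in counts.items()}
```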

4 Performance evaluation

Our model is implemented using the TensorFlow library [38].

We perform our experiments on four publicly available re-identification datasets: CUHK01 [3], CUHK03 [4], VIPeR [2] and Market1501 with attributes [22].

We first present these datasets and the protocol followed to compute the performance metrics. Then we show the results of our approach and compare them to the state of the art.

4.1 Datasets

CUHK01

A dataset with 3,884 images of 971 pedestrians; each identity is observed by two camera views, with two images per camera. All pedestrian images are manually cropped.

CUHK03

This dataset contains 14,097 cropped images of 1,467 identities. Each identity is observed by two camera views, with 4.8 images on average per view. There are two types of bounding boxes: manually labeled pedestrian bounding boxes and automatically detected bounding boxes obtained by a pedestrian detector.

VIPeR

A very challenging dataset: it contains 632 pedestrian image pairs taken from arbitrary viewpoints under various illumination conditions and poses.

Market 1501

This dataset contains 32,668 images annotated using a DPM (Deformable Part Model) detector, covering 1,501 identities split into a training set of 751 identities and a test set of 750 identities. Each identity is captured by at least 2 and at most 6 cameras, so that cross-view search can be performed; one can also restrict the search from one viewpoint to the others. The dataset has been annotated with 27 attributes per identity.

The statistics of these four datasets are summarized in Table 2.

4.2 Metric and protocol

4.2.1 Metric

To evaluate our model we use the Cumulative Matching Characteristic (CMC) [39], the most widely used metric in re-identification works [16, 2, 3, 19, 28]. The CMC curve plots the probability of correct re-identification on the y-axis against the number of candidates returned on the x-axis. The rank-1 CMC is particularly important since it measures the ability of the system to truly identify a person.
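As an illustration, here is a sketch of how a CMC curve can be computed, assuming each probe identity has a match in the gallery (true under the protocol described below):

```python
import numpy as np

def cmc_curve(distances, probe_ids, gallery_ids, max_rank=20):
    # distances: (num_probes, num_gallery) matrix of cosine distances.
    hits = np.zeros(max_rank)
    gallery_ids = np.asarray(gallery_ids)
    for i, pid in enumerate(probe_ids):
        ranked = gallery_ids[np.argsort(distances[i])]    # closest first
        rank_of_match = np.flatnonzero(ranked == pid)[0]  # first correct match
        if rank_of_match < max_rank:
            hits[rank_of_match:] += 1                     # cumulative count
    return hits / len(probe_ids)   # probability of a match within ranks 1..max_rank
```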

For the different datasets we follow the protocol described in [21], which is based on [40]. For CUHK01 and VIPeR we divide the identities of each dataset into two equal parts for the training and test sets, i.e. 485 and 316 identities respectively. For CUHK03, we use the commonly used split, keeping 100 identities for the test set.

Our gallery sets and probe sets are constructed as follows. For VIPeR, which has two camera views, we randomly select an image from the first camera as the probe image; the gallery image is the same identity taken from the other camera. For CUHK03 and CUHK01 we use a similar protocol, ensuring that the images of the gallery and probe sets do not come from the same camera. As stated in [21, 40], both the manually and automatically cropped images were used in our experiments.

4.2.2 Training protocol

We train several networks to compare the influence of the center loss and of the attribute losses.

For the first stage, we train networks without the attribute losses. The resnet50 network we use is pretrained on ImageNet; the weights are the ones distributed by the TensorFlow community (checkpoint resnet_v2_50_2017_04_14.tar.gz, available from the TensorFlow GitHub repository: https://github.com/tensorflow). The learning is performed using the Adam [37] optimizer for this stage. We train four networks, one per center loss weight λ (the values compared in Table 4).

For the second stage we load the previously computed weights and continue training with the Adam optimizer. We generate two types of models: models still trained without the attribute losses, and models trained with the attribute losses. At the end of these two stages we have therefore generated, for each center loss value, one model with attributes and one without.

The hyperparameters used for training are given in Table 3.

Hyperparameter Value
Dropout 0.8
L2 regularization 0.001
Batch size 64
μ (attribute loss weight) 100
α (center loss learning rate) 0.9
λ (center loss weight) 0.0, 0.05, 0.06 and 0.1
Table 3: Hyperparameters used in our model. α is the learning rate for the center loss as described in [1].

We discuss in this section the influence of both the center loss and the attributes. We first discuss the center loss alone; to isolate its influence, the corresponding models are trained without attributes. We then show the influence of the attributes, using the models trained with both the attributes and the center loss.

4.3 Influence of the center loss

In this section we vary the parameter λ which controls the weight of the center loss in the global loss.

We focus in this section on the performance without the attribute losses; the results are the ones in the "No attributes" columns of Table 4. To interpret the results, Figure 4a plots the value of the rank-1 CMC during training and Figure 4b the value of the center loss during training. The rank-1 CMC during training is computed with all the datasets combined: we compute the CMC against all the identities of all the datasets. We control the regularization of our network so that the training rank-1 CMC does not saturate: higher values lead to rapid overfitting, as we show with the center loss weight λ set to 0.1.

Center loss (λ) No attributes Attributes
0.0 33.1 34.3
0.05 32.5 31.7
0.06 37.6 38.2
0.1 X 28.6
(a) VIPeR
Center loss (λ) No attributes Attributes
0.0 68.6 67.6
0.05 64.1 63.0
0.06 68.7 69.7
0.1 X 42.2
(b) CUHK01
Center loss (λ) No attributes Attributes
0.0 73.6 74.1
0.05 76.3 77.0
0.06 77.1 77.5
0.1 X 57.8
(c) CUHK03
Table 4: Comparison of four different center loss weights (λ) with and without the attribute losses. An X marks the 0.1 rows without attributes, since those models are in an overfitting regime and the values are therefore low.

The different values of λ lead to different performance: above λ = 0.06 we observe a drop of performance on all three datasets. During training, the value of the center loss decreases at a rate governed by λ, as can be seen in Figure 4b. With λ = 0.1 the value of the center loss decreases dramatically. This has a strong influence on the CMC computed on the training datasets, shown in Figure 4a: the rank-1 CMC reaches a value near 1.0, which corresponds to overfitting.

This shows that the center loss has a strong effect during training: pulling the features of the same identity toward a common center increases the capacity of the neural network. With our set of hyperparameters, the ideal value of the center loss weight is therefore below 0.1.

The center loss does not have the same impact on all the datasets. On VIPeR the rank-1 value increases by 4.5 points (from 33.1 to 37.6), while on CUHK01 and CUHK03 it increases by 0.1 and 3.5 points respectively. This suggests that the center loss really helps the network to output a general representation of pedestrians: in the VIPeR dataset the variation between the two images of a same identity is higher than in the other datasets, so the network has to generalize well to perform well on VIPeR.

(a) CMC train
(b) Center loss value during train
Figure 4: Center loss values and rank-1 CMC values during the training of our model.

4.4 Influence of the attributes

We now focus on the attributes. Our objectives are to understand how the attributes help the re-identification score and how our model performs on the attribute extraction task. To compare the influence of the attributes, we train our network with the different values of the center loss weight with the attribute system activated. To activate the attributes, we weight all the attribute losses by the factor μ; its value has been found empirically to produce the best results. As shown in Table 4, the attribute losses make the network more efficient on all the datasets.

4.5 Comparison with the state of the art

We compare the performance of our system with the state of the art on CUHK01, CUHK03 and VIPeR. Results are shown in Table 5. We take our best model from Table 4, i.e. the model with the attribute losses enabled and λ set to 0.06.

System CUHK01 CUHK03 VIPeR
Best 66.6 [21] 75.3 [21] 47.8 [41]
Ours 69.7 77.5 38.2
Table 5: CMC rank-1 values for the different systems.

Our system shows better performance on CUHK01 and CUHK03 than the state of the art, while being lower on VIPeR. This shows that our approach indeed enables a network to perform well on a large variety of datasets without needing a specific retraining for each of them.

4.6 Attributes performances

In this section, the attribute recognition performance of our network is presented. The tests are run on the test set of the Market-1501 dataset. Since some classes occur too rarely in the test set to be representative (for instance the yellow and purple bottoms), they are removed from the tests. The average precision of the network on attribute recognition is presented in Table 6; the possible values of each attribute are listed in Table 1. For the bottom color and top color attributes, we compute the mean of the average precisions over the individual colors; these means are shown in the bot.col and top.col columns of Table 6.

Att Gender len.top len.bot len.hair hand bag oth.bags backpack bot.col top.col mean
AP 0.94 0.5 0.97 0.90 0.21 0.54 0.81 0.64 0.80 0.70
Table 6: Average precision over the different attributes used in the system

This shows that our system is able to learn to recognize attributes and to perform re-identification at the same time. The attribute classifiers show very good performance on attributes such as gender, hair length or backpack. Some attributes, such as hand bag, have a low average precision. This is probably because these classes are under-represented in the training dataset: even though we manage the attribute imbalance during training, when the number of samples for a given class is too limited the network cannot properly generalize.

5 Conclusions

In this paper we have presented a CNN architecture for person re-identification. This architecture is trained with multi-task learning in order to obtain a system that can be trained from different datasets with different labels. We have demonstrated our approach on four datasets: two relatively small ones (VIPeR and CUHK01), a larger one (CUHK03) and one annotated with attributes (Market1501). We have evaluated the influence of the different tasks (the center loss and the attribute losses) on the global performance and shown that combining the different losses leads to better performance; the center loss in particular has a strong influence. Our system performs well on CUHK01, CUHK03 and VIPeR, and outperforms recent re-identification works on CUHK01 and CUHK03.

References

  • [1] Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VII. (2016) 499–515
  • [2] Gray, D., Tao, H.: Viewpoint invariant pedestrian recognition with an ensemble of localized features. In Forsyth, D.A., Torr, P.H.S., Zisserman, A., eds.: ECCV (1). Volume 5302 of Lecture Notes in Computer Science., Springer (2008) 262–275
  • [3] Li, W., Zhao, R., Wang, X.: Human reidentification with transferred metric learning. In: ACCV. (2012)
  • [4] Li, W., Zhao, R., Xiao, T., Wang, X.: Deepreid: Deep filter pairing neural network for person re-identification. In: CVPR. (2014)
  • [5] Liu, C., Gong, S., Loy, C.C., Lin, X.: Person re-identification: What features are important? In Fusiello, A., Murino, V., Cucchiara, R., eds.: Computer Vision – ECCV 2012. Workshops and Demonstrations, Berlin, Heidelberg, Springer Berlin Heidelberg (2012) 391–401
  • [6] Madden, C., Cheng, E.D., Piccardi, M.: Tracking people across disjoint camera views by an illumination-tolerant appearance representation. Machine Vision and Applications 18(3) (Aug 2007) 233–247
  • [7] Farenzena, M., Bazzani, L., Perina, A., Murino, V., Cristani, M.: Person re-identification by symmetry-driven accumulation of local features. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. (June 2010) 2360–2367

  • [8] Dikmen, M., Akbas, E., Huang, T.S., Ahuja, N.: Pedestrian recognition with a learned metric. In Kimmel, R., Klette, R., Sugimoto, A., eds.: Computer Vision – ACCV 2010, Berlin, Heidelberg, Springer Berlin Heidelberg (2011) 501–512
  • [9] Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, IEEE Computer Society (2015) 2197–2206
  • [10] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, IEEE Computer Society (2015) 3431–3440
  • [11] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q., eds.: Advances in Neural Information Processing Systems 25. Curran Associates, Inc. (2012) 1097–1105
  • [12] Matsukawa, T., Suzuki, E.: Person re-identification using cnn features learned from combination of attributes. In: 2016 23rd International Conference on Pattern Recognition (ICPR). (Dec 2016) 2428–2433
  • [13] Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: A benchmark. In: Computer Vision, IEEE International Conference on. (2015)
  • [14] Zheng, L., Yang, Y., Hauptmann, A.G.: Person re-identification: Past, present and future. (2016)
  • [15] Schumann, A., Stiefelhagen, R.: Person re-identification by deep learning attribute-complementary information. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). (July 2017) 1435–1443
  • [16] Zheng, L., Bie, Z., Sun, Y., Wang, J., Su, C., Wang, S., Tian, Q.: MARS: A video benchmark for large-scale person re-identification. In: European Conference on Computer Vision, Springer (2016)
  • [17] Shi, H., Yang, Y., Zhu, X., Liao, S., Lei, Z., Zheng, W., Li, S.Z.: Embedding deep metric for person re-identification: A study against large variations. In: Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I. (2016) 732–748
  • [18] Chung, D., Tahboub, K., Delp, E.J.: A two stream siamese convolutional neural network for person re-identification. In: 2017 IEEE International Conference on Computer Vision (ICCV). (Oct 2017) 1992–2000
  • [19] Chen, Y., Duffner, S., Stoian, A., Dufour, J.Y., Baskurt, A.: Triplet cnn and pedestrian attribute recognition for improved person re-identification. In: 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). (Aug 2017) 1–6
  • [20] Chen, W., Chen, X., Zhang, J., Huang, K.: Beyond triplet loss: a deep quadruplet network for person re-identification. In: International Conference on Computer Vision and Pattern Recognition (CVPR). (2017)
  • [21] Xiao, T., Li, H., Ouyang, W., Wang, X.: Learning deep feature representations with domain guided dropout for person re-identification. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (June 2016) 1249–1258
  • [22] Lin, Y., Zheng, L., Zheng, Z., Wu, Y., Yang, Y.: Improving person re-identification by attribute and identity learning. (2017)
  • [23] Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for person re-identification. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (June 2015) 3908–3916
  • [24] Zheng, Z., Zheng, L., Yang, Y.: A discriminatively learned CNN embedding for person reidentification. TOMCCAP 14(1) (2018) 13:1–13:20
  • [25] Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a "siamese" time delay neural network. In Cowan, J.D., Tesauro, G., Alspector, J., eds.: Advances in Neural Information Processing Systems 6. Morgan-Kaufmann (1994) 737–744
  • [26] Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (June 2015) 815–823
  • [27] Caruana, R.: Multitask learning: A knowledge-based source of inductive bias. In: Proceedings of the Tenth International Conference on Machine Learning, Morgan Kaufmann (1993) 41–48
  • [28] McLaughlin, N., del Rincón, J.M., Miller, P.C.: Person reidentification using deep convnets with multitask learning. IEEE Trans. Circuits Syst. Video Techn. 27(3) (2017) 525–539
  • [29] Su, C., Yang, F., Zhang, S., Tian, Q., Davis, L.S., Gao, W.: Multi-task learning with low rank attribute embedding for multi-camera person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence PP(99) (2018) 1–1
  • [30] Li, W., Zhu, X., Gong, S.: Person re-identification by deep joint learning of multi-loss classification. In Sierra, C., ed.: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, ijcai.org (2017) 2194–2200

  • [31] Layne, R., Hospedales, T., Gong, S.: Person re-identification by attributes. BMVA Press (2012) 1–11
  • [32] Layne, R., Hospedales, T.M., Gong, S.: Towards person identification and re-identification with attributes. In Fusiello, A., Murino, V., Cucchiara, R., eds.: Computer Vision – ECCV 2012. Workshops and Demonstrations, Berlin, Heidelberg, Springer Berlin Heidelberg (2012) 402–412
  • [33] Layne, R., Hospedales, T.M., Gong, S.: Attributes-based re-identification. Springer London, London (2014) 93–117
  • [34] Deng, Y., Luo, P., Loy, C.C., Tang, X.: Pedestrian attribute recognition at far distance. In: Proceedings of the ACM International Conference on Multimedia, MM ’14, Orlando, FL, USA, November 03 - 07, 2014. (2014) 789–792
  • [35] Su, C., Zhang, S., Xing, J., Gao, W., Tian, Q.: Deep attributes driven multi-camera person re-identification. In: Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II. (2016) 475–491
  • [36] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. (2016) 770–778
  • [37] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: ICLR. Volume abs/1412.6980. (2014)
  • [38] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: Large-scale machine learning on heterogeneous systems (2015) Software available from tensorflow.org.
  • [39] Gray, D., Brennan, S., Tao, H.: Evaluating appearance models for recognition, reacquisition, and tracking. In: In IEEE International Workshop on Performance Evaluation for Tracking and Surveillance, Rio de Janeiro. (2007)
  • [40] Paisitkriangkrai, S., Shen, C., van den Hengel, A.: Learning to rank in person re-identification with metric ensembles. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015) 1846–1855
  • [41] Cheng, D., Gong, Y., Zhou, S., Wang, J., Zheng, N.: Person re-identification by multi-channel parts-based cnn with improved triplet loss function. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (June 2016) 1335–1344