AttKGCN: Attribute Knowledge Graph Convolutional Network for Person Re-identification

Bo Jiang, et al. — November 24, 2019

Discriminative feature representation of person images is important for the person re-identification (Re-ID) task. Recently, attributes have been demonstrated to be beneficial in guiding the learning of more discriminative feature representations for Re-ID. As attributes normally co-occur in person images, it is desirable to model the attribute dependencies to improve attribute prediction and thus Re-ID results. In this paper, we model these attribute dependencies via a novel attribute knowledge graph (AttKG) and propose a novel Attribute Knowledge Graph Convolutional Network (AttKGCN) to solve the Re-ID problem. AttKGCN integrates attribute prediction and Re-ID learning in a unified end-to-end framework, which boosts the performance of both tasks. AttKGCN first builds a directed attribute knowledge graph whose nodes denote attributes and whose edges encode the co-occurrence relationships of different attributes. Then, AttKGCN learns a set of inter-dependent attribute classifiers which are combined with person visual descriptors for attribute prediction. Finally, AttKGCN integrates the attribute description and the deep visual representation to construct a more discriminative feature representation for the Re-ID task. Extensive experiments on several benchmark datasets demonstrate the effectiveness of AttKGCN on attribute prediction and Re-ID tasks.


I. Introduction

Object (e.g., person, vehicle) re-identification (Re-ID) is an active research problem in computer vision. Many existing Re-ID methods adopt a classification framework which aims to determine the label of an input person image by using a classifier trained on the training samples [1, 2, 3, 4, 5, 6, 7]. Although recent years have witnessed rapid advancements in Re-ID, it remains a challenging task due to large changes of object visual appearance caused by pose, illumination, deformation, occlusion, etc.

Fig. 1: An example of an attribute knowledge graph. Each node denotes a specific attribute, such as Bag or Female. A directed edge exists between each attribute pair, encoding the co-occurrence relationship between the two attributes. The detailed computation of the edge weights is presented in Section III-B.

One main issue for the Re-ID problem is how to generate a strongly discriminative feature representation for person images [8, 9, 10]. Recently, attributes have been aggregated with deeply-learned Re-ID models and demonstrated to be beneficial in guiding Re-ID models toward stronger discriminative feature representations, which can obviously improve the final Re-ID results [11, 12, 13]. For example, Schumann et al. [11] propose to first train an attribute classifier and then incorporate it into a person Re-ID model. Chang et al. [9] develop the Multi-Level Factorisation Net (MLFN) to learn latent discriminative factors for each person image at multiple semantic levels. Li et al. [14] propose the Attributes-aided Part detection and Refinement network (APDR), which employs attribute learning to handle body part misalignment and thus aid local feature extraction. Tay et al. [15] propose the Attribute Attention Network (AANet), which integrates person attributes and attribute attentions in a unified classification framework for the Re-ID problem.

As attributes normally co-occur in person images, it is desirable to model the attribute dependencies to improve attribute prediction and thus the final Re-ID results. However, existing works generally fail to exploit these dependencies for the Re-ID problem. To capture and explore such important dependencies, in this paper we first build an Attribute Knowledge Graph (AttKG) whose nodes denote attributes and whose edges encode the co-occurrence relationships among different attributes, as shown in Figure 1. Then, inspired by recent works [16, 17], we propose a novel Attribute Knowledge Graph Convolutional Network (AttKGCN) which integrates attribute prediction and Re-ID learning in a unified end-to-end classification framework. AttKGCN learns a set of inter-dependent attribute classifiers which are combined with each person's visual descriptor for attribute prediction.

Overall, the main contributions of this paper are summarized as follows.

  • We propose to model the dependencies of object attributes via a novel attribute knowledge graph.

  • We propose a novel attribute knowledge graph convolutional network based attribute learning model which can effectively capture and exploit the dependencies of attributes for attribute representation and prediction.

  • We propose a strongly discriminative representation learning framework (AttKGCN) for general object Re-ID tasks, which integrates attribute learning and visual representation simultaneously and cooperatively in a unified network.

Extensive experiments on several benchmark datasets demonstrate the effectiveness and benefits of the proposed AttKGCN approach.

Fig. 2: Architecture of the proposed AttKGCN for person Re-ID, which contains three main parts, i.e., image-level representation learning, the attribute knowledge graph convolutional module and attribute re-weighting. The backbone network can be any pre-trained general CNN model.

II. Related Work

II-A. Attribute-based Person Re-ID

Recently, some studies have employed attributes for person Re-ID [12, 10, 18, 14, 15] to improve Re-ID results. Lin et al. [10] manually annotate attributes for the benchmark datasets Market-1501 [19] and DukeMTMC-reID [20], and propose an Attribute-Person Recognition (APR) network to conduct Re-ID embedding and pedestrian attribute prediction jointly. Han et al. [12] propose the Attribute-Aware Attention Model (A³M) to jointly learn local attribute and global identity feature representations in an end-to-end manner. Tay et al. [15] propose the Attribute Attention Network (AANet), which integrates person attributes and attribute attentions for the Re-ID problem. Li et al. [21] design a deep learning based single attribute recognition model (DeepSAR) to identify each attribute; they also present a deep multi-attribute recognition model (DeepMAR) that recognizes multiple attributes by exploiting the relationships among attributes. Matsukawa et al. [8] propose an attribute loss which is further combined with the classification loss for Re-ID network training. Zhao et al. [22] propose an attribute-driven method for the video person Re-ID problem.

Overall, the above attribute-guided Re-ID approaches have demonstrated the benefits of integrating attribute learning and deep visual representation for the Re-ID problem. However, they normally conduct attribute representation/learning individually, which ignores the inherent co-occurrence information among different attributes. To the best of our knowledge, this co-occurrence information has been rarely exploited for Re-ID, although it has been mentioned (but not utilized) in [10]. Our aim in this paper is to capture and exploit this co-occurrence information for attribute representation and the Re-ID problem by employing the recently introduced Graph Convolutional Networks (GCNs) [23].

II-B. Graph Convolutional Networks

As an extension of CNNs from regular grids to irregular graphs, graph convolutional networks (GCNs) [23, 24, 25, 26, 27] have been demonstrated to be very effective in graph representation and learning. Kipf et al. [23] propose a simple Graph Convolutional Network (GCN) for graph-based semi-supervised learning. Hamilton et al. [28] present a general inductive representation and learning framework for the representations of unseen nodes. Veličković et al. [24] propose Graph Attention Networks (GATs) for graph-based semi-supervised learning. The core of GCNs is to conduct graph node representation and labeling by propagating messages over the graph structure. Recently, knowledge graph convolutional networks have been developed for zero-shot learning and multi-label recognition. Wang et al. [16] propose to employ GCN learning on a category knowledge graph to predict the visual classifiers for unseen categories in zero-shot learning. Kampffmeyer et al. [29] improve this work [16] by introducing a Dense Graph Propagation (DGP) module to exploit the hierarchical structure of the knowledge graph. Chen et al. [17] propose the multi-label GCN (ML-GCN) for image multi-label recognition. ML-GCN learns inter-dependent object classifiers by employing GCN learning on an object label graph which encodes the correlation information among different labels.

Inspired by these works [16, 17], we propose to construct an attribute knowledge graph to model the co-occurrence information among different person attributes, and then employ a novel attribute knowledge graph convolutional network (AttKGCN) for person attribute prediction and Re-ID.

III. Proposed Model

In this section, we present our Attribute Knowledge Graph Convolutional Network (AttKGCN) for person attribute prediction and Re-ID tasks. The overall framework of AttKGCN is shown in Figure 2, which contains three main parts: 1) CNN based image feature extraction, 2) AttKGCN based attribute classifier generation and re-weighting, and 3) integration of visual and attribute representations for the final Re-ID task. In the following, we present the details of these modules.

III-A. Deep Visual Representation

Given an input person image $I$, we first extract its visual representation by using a deep feature extractor $F(\cdot)$. The extractor can be any pre-trained general CNN base model, such as ResNet-50 [30] or VGG [31], or a person-specific deep feature extraction model, such as PCB [4] or HPM [32]. For convenience, here we use ResNet-50 [30] as an example. In this case, resizing an input image to 384×128 yields 2048×24×8 feature maps from the last convolutional layer. Then, we obtain the image-level feature $x$ by further employing a global max-pooling operation as follows,

$x = \mathrm{GMP}\big(F(I;\,\Theta)\big)$  (1)

where $\mathrm{GMP}(\cdot)$ indicates the global max-pooling operation and $\Theta$ denotes the parameters of the CNN model.
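For concreteness, below is a minimal PyTorch sketch of Eq. (1). It assumes a torchvision ResNet-50 backbone whose last residual stage has stride 1 (so a 384×128 input yields the 2048×24×8 maps stated above), followed by global max pooling; the class name `VisualExtractor` and the exact stride modification are our assumptions, not the paper's released code.

```python
import torch.nn as nn
from torchvision.models import resnet50

class VisualExtractor(nn.Module):
    """Deep visual representation of Eq. (1): x = GMP(F(I; Theta))."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(pretrained=True)
        # Set the stride of the last residual stage to 1 so that a
        # 384x128 input yields 2048x24x8 feature maps (as in the text).
        backbone.layer4[0].conv2.stride = (1, 1)
        backbone.layer4[0].downsample[0].stride = (1, 1)
        # Keep everything up to layer4; drop the avgpool and classifier.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.gmp = nn.AdaptiveMaxPool2d(1)  # global max pooling

    def forward(self, img):                  # img: (B, 3, 384, 128)
        fmap = self.features(img)            # (B, 2048, 24, 8)
        return self.gmp(fmap).flatten(1)     # x: (B, 2048)
```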

III-B. AttKGCN-based Attribute Learning

The aim of the proposed AttKGCN is to learn inter-dependent attribute classifiers via a GCN based mapping function. To do so, we first construct an attribute knowledge graph (AttKG). Then, we develop an attribute knowledge graph convolutional learning module for attribute representation and prediction. Finally, we integrate attribute learning and visual representation in a unified manner for the final Re-ID task (see Section III-C).

Attribute Knowledge Graph Construction. In order to employ GCN learning for attribute representation and prediction, we first need to construct an attribute graph to represent the correlation among different attributes. Specifically, as shown in Figure 3, we construct an attribute knowledge graph (AttKG) as follows. Each node in the AttKG denotes a specific attribute $a_i$ (such as Female or Hat), which can be described via a semantic embedding vector such as a word embedding; let $n$ denote the number of different attributes. An edge exists between each node pair in the AttKG, encoding the co-occurrence relationship between attributes $a_i$ and $a_j$. Similar to [17], we define this co-occurrence relationship via the conditional probability $P(a_j \mid a_i)$, which denotes the probability of occurrence of attribute $a_j$ when attribute $a_i$ appears. Given a training dataset, this conditional probability is estimated as

$A_{ij} = P(a_j \mid a_i) = \frac{C_{ij}}{C_i}$  (2)

where $C_{ij}$ denotes the co-occurrence count of attributes $a_i$ and $a_j$, i.e., the number of person images in the training dataset that contain both $a_i$ and $a_j$, and $C_i$ denotes the count of attribute $a_i$, i.e., the number of person images in the training dataset that contain $a_i$. Obviously, the co-occurrence probability matrix $A$ is asymmetric, which means our attribute knowledge graph is directed. Figure 3 shows an example of the proposed AttKG constructed from the attributes of the Market-1501 [19] dataset.

Fig. 3: An example of the proposed AttKG obtained from the Market-1501 [19] dataset. The depth of the color indicates the strength of the correlation between two attributes.
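As a concrete illustration of Eq. (2), the following sketch estimates the conditional co-occurrence matrix from binary attribute annotations; the function name and the layout of the `labels` matrix are hypothetical.

```python
import numpy as np

def build_attkg_adjacency(labels: np.ndarray) -> np.ndarray:
    """Estimate A[i, j] = P(a_j | a_i) from binary attribute
    annotations (Eq. (2)).

    labels: (num_images, n) matrix with labels[k, i] = 1 iff image k
            has attribute a_i.
    """
    # C[i, j]: number of images containing both a_i and a_j.
    cooccur = labels.T @ labels              # (n, n)
    # C[i]: number of images containing a_i.
    counts = labels.sum(axis=0)              # (n,)
    # Conditional probability; guard against attributes never seen.
    adj = cooccur / np.maximum(counts, 1)[:, None]
    return adj                               # generally asymmetric
```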

AttKGCN for Attribute Representation. Given the input AttKG with adjacency matrix $A$, we adopt an $L$-layer GCN module for attribute representation and learning. Formally, given $A$ and the $l$-th layer representation $H^{(l)}$, we conduct the layer-wise propagation as

$H^{(l+1)} = \sigma\big(\hat{A} H^{(l)} W^{(l)}\big)$  (3)

where $H^{(0)} = X$, $\hat{A}$ indicates the normalized matrix of $A$, $\sigma(\cdot)$ is an activation function such as the ReLU function, and $W^{(l)}$ denotes the layer-wise trainable weight matrix.

The input feature representation $X$ of the AttKGCN can be set to the semantic embeddings of the attributes. In this paper, we simply set $X = I_n$, which also obtains the desired learning results, where $I_n$ denotes an identity matrix of proper size. The main purpose of the AttKGCN module is to learn a set of inter-dependent classifiers for attributes, which are combined with the pre-trained deep visual representation for person attribute prediction. Thus, the final output of the AttKGCN is a regression matrix $W \in \mathbb{R}^{n \times d}$, where $d$ denotes the dimensionality of the image feature $x$ (Eq. (1)) and $n$ denotes the number of different attributes. Overall, we can summarize our AttKGCN module as

$W = f_{\mathrm{GCN}}(A, X;\,\Theta_{gcn})$  (4)

where $\Theta_{gcn}$ denotes the collection of network weight matrices. In the following, we apply the learned regressor $W$ for attribute prediction and re-weighting tasks.
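A minimal two-layer realization of Eqs. (3)-(4) might look as follows. The paper only states that a normalized matrix $\hat{A}$ is used; we assume row normalization ($D^{-1}A$), which suits the directed AttKG, and the module and variable names are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adj(adj: torch.Tensor) -> torch.Tensor:
    # Row normalization A_hat = D^{-1} A (one plausible choice for a
    # directed graph; the paper only says "the normalized matrix").
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
    return adj / deg

class AttKGCNModule(nn.Module):
    """Two-layer GCN of Eqs. (3)-(4): maps node inputs X (here the
    identity matrix) to inter-dependent attribute classifiers W (n x d)."""
    def __init__(self, n_attrs: int, hidden: int, feat_dim: int):
        super().__init__()
        self.w0 = nn.Linear(n_attrs, hidden, bias=False)   # W^{(0)}
        self.w1 = nn.Linear(hidden, feat_dim, bias=False)  # W^{(1)}

    def forward(self, adj_hat: torch.Tensor, x: torch.Tensor):
        h = F.relu(self.w0(adj_hat @ x))  # H^{(1)} = ReLU(A_hat X W^{(0)})
        return self.w1(adj_hat @ h)       # W = A_hat H^{(1)} W^{(1)}
```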

Attribute Prediction and Re-weighting. The final output of the above AttKGCN is a regression matrix $W \in \mathbb{R}^{n \times d}$, where each row $W_j$ denotes the classifier of the $j$-th attribute $a_j$. Specifically, given an input person visual representation $x$, we can obtain the predicted scores of all attributes for this person by applying the learned regressor as follows,

$s = \mathrm{softmax}(W x)$  (5)

where $s_j$ denotes the predicted score of the $j$-th attribute, and the softmax function guarantees that the output attribute scores satisfy the probability condition, i.e., $s_j \in (0, 1)$ and $\sum_{j=1}^{n} s_j = 1$. Here, we adopt the cross-entropy loss function [33] for attribute prediction,

$\mathcal{L}_{att} = -\sum_{j=1}^{n} y_j \log s_j$  (6)

where $y = (y_1, \dots, y_n)$ denotes the ground-truth attribute vector of the input image $I$, i.e., $y_j = 1$ indicates that image $I$ has attribute $a_j$, and $y_j = 0$ otherwise.
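The prediction and loss of Eqs. (5)-(6) then amount to a few lines; this sketch assumes the softmax/cross-entropy form reconstructed above and is written for a single image.

```python
import torch

def attribute_scores(w: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Eq. (5): s = softmax(W x).
    w: (n, d) classifier matrix; x: (d,) visual feature."""
    return torch.softmax(w @ x, dim=0)       # (n,), sums to 1

def attribute_loss(s: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Eq. (6): cross-entropy against the binary ground-truth vector y
    (y_j = 1 iff the image has attribute a_j)."""
    return -(y * torch.log(s.clamp(min=1e-12))).sum()
```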

Moreover, in order to utilize the attribute information for Re-ID more compactly, it is also desirable to recalibrate the strengths of different attributes in the person attribute representation. This motivates us to develop an attribute re-weighting (AttRW) module. In this paper, we formulate the AttRW task as node labeling/weighting on the attribute graph and implement it using GCN learning. Given the attribute knowledge graph, AttRW aims to obtain weights $w$ for the different attributes by using

$w = \mathrm{sigmoid}\big(g(A, X;\,\Theta_{rw})\big)$  (7)

where $g(\cdot)$ denotes a single-layer neural network with parameter $\Theta_{rw}$. The sigmoid function guarantees the nonnegativity of $w$.

Remark. Here, one can either utilize the same GCN module used in Eq. (4) (i.e., setting $\Theta_{rw} = \Theta_{gcn}$) or design a separate new GCN module. In our experiments, we use the former setting for complexity considerations.

By applying the attribute weights $w$, we obtain a weighted attribute representation for the Re-ID task as

$z = w \odot s$  (8)

where $\odot$ denotes the element-wise multiplication between vectors $w$ and $s$.
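A compact sketch of Eqs. (7)-(8) follows, assuming the single-layer network $g$ is one graph propagation step followed by a linear map; the parameter shapes are our assumptions.

```python
import torch

def reweight(adj_hat, x_nodes, theta_rw, s):
    """Eqs. (7)-(8): AttRW. One graph layer plus a sigmoid yields
    nonnegative per-attribute weights w, which rescale the scores s.
    adj_hat: (n, n); x_nodes: (n, k); theta_rw: (k, 1) parameter of
    the single-layer network g; s: (n,) scores from Eq. (5)."""
    w = torch.sigmoid(adj_hat @ x_nodes @ theta_rw).squeeze(-1)  # (n,)
    return w * s                 # z = w (elementwise) s, Eq. (8)
```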

III-C. Re-identification

We propose to integrate both the visual and the re-weighted attribute representations for person Re-ID. To do so, we first develop a visual-attribute representation for the input person image by concatenating its deeply learned image representation $x$ with its corresponding re-weighted attribute prediction vector $z$. Then, we adopt an FC layer followed by a softmax operation to predict the ID label of the person, i.e.,

$p = \mathrm{softmax}\big(\mathrm{FC}([x, z])\big)$  (9)

where $[\cdot, \cdot]$ denotes the concatenation operation. Here, we adopt the cross-entropy loss function [33] for identity prediction, denoted $\mathcal{L}_{id}$. Therefore, the final overall loss function is

$\mathcal{L} = \mathcal{L}_{id} + \lambda \mathcal{L}_{att}$  (10)

where $\lambda$ is a hyper-parameter that balances the identification loss and the attribute prediction loss.
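Putting Eqs. (9)-(10) together, a batched sketch of the Re-ID head and the overall loss might look as follows; `ReIDHead` and `overall_loss` are our names for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReIDHead(nn.Module):
    """Eq. (9): concatenate the visual feature x and the re-weighted
    attribute vector z, then predict the identity with an FC layer."""
    def __init__(self, feat_dim: int, n_attrs: int, n_ids: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim + n_attrs, n_ids)

    def forward(self, x, z):
        return self.fc(torch.cat([x, z], dim=1))   # identity logits

def overall_loss(id_logits, id_labels, s, y, lam):
    """Eq. (10): L = L_id + lambda * L_att, averaged over the batch."""
    l_id = F.cross_entropy(id_logits, id_labels)   # softmax CE of Eq. (9)
    # Attribute cross-entropy of Eq. (6), batched: s, y are (B, n).
    l_att = -(y * torch.log(s.clamp(min=1e-12))).sum(dim=1).mean()
    return l_id + lam * l_att
```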

Method        Reference    Market-1501                       DukeMTMC-reID
                           Rank-1  Rank-5  Rank-10  mAP      Rank-1  Rank-5  Rank-10  mAP
C (CNN-based):
FD-GAN [34]   NIPS2018     90.5    -       -        77.7     80.0    -       -        64.5
SafeNet [35]  IJCAI2018    90.2    -       -        72.7     82.7    -       -        57.0
MGCAM [36]    CVPR2018     83.7    -       -        74.33    -       -       -        -
PCB [4]       ECCV2018     92.4    97.0    97.9     77.3     81.9    89.4    91.6     65.3
SGGNN [6]     ECCV2018     92.3    96.1    97.4     82.8     81.1    88.4    91.2     68.2
MGAT [37]     CVPRW2019    91.5    97.2    98.0     76.5     -       -       -        -
VPM [5]       CVPR2019     93.0    97.8    98.8     80.8     83.6    91.7    94.2     72.6
CASN [38]     CVPR2019     94.4    -       -        82.8     87.7    -       -        73.7
HPM [32]      AAAI2019     94.2    97.5    98.5     82.7     86.6    93.0    95.1     74.3
TBN+ [39]     ICME2019     93.2    -       -        83.0     85.5    -       -        73.0
[40]          ICCV2019     96.1    -       -        84.7     86.3    -       -        73.1
A (attribute-based):
ACRN [11]     CVPR2017     83.6    92.6    95.3     62.6     72.5    84.7    88.8     51.9
APR [10]      PR2019       87.0    95.1    96.4     66.8     73.9    -       -        55.5
APDR [14]     PR2019       93.1    97.2    98.2     80.1     84.3    92.4    94.7     69.7
MLFN [9]      CVPR2018     87.0    95.1    96.4     66.8     73.9    -       -        55.5
A³M [12]      MM2018       86.5    95.1    97.0     68.9     -       -       -        -
AANet [15]    CVPR2019     93.8    -       98.5     82.4     86.4    -       -        72.5
AttKGCN       -            94.4    98.0    98.7     85.5     87.8    94.4    95.7     77.4
TABLE I: Comparison with the state of the art on Market-1501 [19] and DukeMTMC-reID [20]. C: CNN-based Re-ID methods; A: attribute-based Re-ID methods. Red/blue denote the best and second-best results, respectively.
Method       Age   B.pack  Bag   H.bag  C.down  C.up  S.clth  L.low  L.slv  Hair  Hat   Gender  Avg
APR [10]     88.6  84.9    76.4  90.4   73.8    74.0  92.8    93.7   93.6   84.4  97.1  88.9    86.6
AANet [15]   88.2  87.7    79.7  89.6   70.81   77.1  94.8    94.2   94.4   86.5  98.0  92.3    87.8
AttKGCN      88.9  90.0    89.6  89.3   90.1    88.5  94.0    89.8   89.0   90.1  89.5  89.4    89.8
TABLE II: Results of our approach for attribute recognition on Market-1501 [19]. 'Avg' is the average accuracy over all attributes, indicating overall attribute prediction performance. 'B.pack', 'H.bag', 'C.down', 'C.up', 'S.clth', 'L.low' and 'L.slv' denote backpack, handbag, color of lower-body clothing, color of upper-body clothing, style of clothing, length of lower-body clothing and sleeve length, respectively.

IV. Experiments

IV-A. Datasets and Settings

The Market-1501 [19] dataset contains 32,668 images of 1,501 persons observed under six camera viewpoints. We follow the standard training and evaluation protocol, where 751 identities are used for training and 750 identities for testing. The Deformable Part Model (DPM) [41] is used as the person detector. For each identity in this dataset, 27 attribute labels are annotated by [10].

The DukeMTMC-reID [20] dataset is a subset of the DukeMTMC dataset [42], containing 1,812 identities observed from 8 different camera viewpoints. It is divided into 16,522 training images of 702 identities and 19,889 test images of 702 identities. The evaluation protocol is the same as that on the Market-1501 dataset [19]. For each identity in this dataset, 23 attribute labels are annotated by [10].

Evaluation Metrics. Following many previous works [4, 43], we use both the Cumulative Matching Characteristic (CMC) (rank-1, rank-5 and rank-10) and the mean Average Precision (mAP) for evaluation, where mAP denotes the mean of the average precision across all queries. For attribute recognition, we report the classification accuracy of each attribute as well as the overall average accuracy across all attributes. When testing attribute prediction on Market-1501 [19], we omit the distractor (background) and junk images because they do not have attribute labels [10].
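To make the metrics concrete, the following is a simplified CMC/mAP computation over a query-gallery distance matrix; it omits the camera-aware filtering of same-camera matches used in the standard Market-1501 protocol and assumes every query has at least one gallery match.

```python
import numpy as np

def cmc_and_map(dist, q_ids, g_ids):
    """Simplified CMC / mAP over a (num_query, num_gallery) distance
    matrix (smaller = more similar)."""
    num_q, num_g = dist.shape
    cmc, aps = np.zeros(num_g), []
    for i in range(num_q):
        order = np.argsort(dist[i])                     # best match first
        rel = (g_ids[order] == q_ids[i]).astype(float)  # 1 = correct ID
        cmc[np.argmax(rel):] += 1                       # first correct rank
        prec = np.cumsum(rel) / (np.arange(num_g) + 1)  # precision@k
        aps.append((prec * rel).sum() / max(rel.sum(), 1))
    return cmc / num_q, float(np.mean(aps))             # CMC curve, mAP
```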

IV-B. Implementation Details

As discussed in Section III-A, for each person image we extract its visual representation using a person-specific deep feature extraction model. In our experiments, we implement the Horizontal Pyramid Matching (HPM) network [32] (our Baseline model) and use it as the feature extractor. The batch size is set to 90 and the number of epochs to 120. The learning rate is set to 0.1 and 0.001 for the feature extraction module and our attribute GCN module, respectively. We train the whole network using stochastic gradient descent (SGD) [44] on each mini-batch. The whole training process takes about two hours.
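A possible optimizer setup mirroring the stated per-module learning rates is sketched below; it assumes the `extractor` and GCN modules from the earlier sketches, and the momentum and weight-decay values are our assumptions, not given in the paper.

```python
import torch

def make_optimizer(extractor: torch.nn.Module, gcn: torch.nn.Module):
    """SGD with the stated per-module learning rates: 0.1 for the HPM
    feature extractor and 0.001 for the attribute GCN module."""
    return torch.optim.SGD(
        [{"params": extractor.parameters(), "lr": 0.1},
         {"params": gcn.parameters(), "lr": 0.001}],
        momentum=0.9, weight_decay=5e-4)  # assumed values
```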

IV-C. Experimental Results

We compare our AttKGCN with recent state-of-the-art methods, including the CNN-based Re-ID methods FD-GAN [34], TBN+ [39], VPM [5], SGGNN [6], PCB [4] and HPM [32], and the attribute-based methods APDR [14], APR [10], AANet [15] and MLFN [9]. Table I summarizes the comparison results on the Market-1501 [19] and DukeMTMC-reID [20] datasets. Overall, AttKGCN generally obtains the best results on the two benchmarks. More concretely, we note the following.

Comparison with CNN-based Methods. We select some representative CNN-based Re-ID methods, including traditional CNN methods, augmented-dataset methods, part-based methods and graph-based methods. For example, on Market-1501, compared with the part-based convolutional baseline (PCB) [4], AttKGCN achieves 8.2% and 2.0% improvements in mAP and rank-1, respectively. Compared with Horizontal Pyramid Matching (HPM) [32], AttKGCN improves mAP and rank-1 by 2.8% and 0.2%, respectively. On DukeMTMC-reID [20], compared with PCB [4] and HPM [32], AttKGCN achieves 5.9% and 1.2% improvements in rank-1 and 12.1% and 3.1% improvements in mAP, respectively. These results clearly demonstrate the effectiveness of AttKGCN in enhancing the discriminative ability of part-based Re-ID models by incorporating our attribute knowledge graph learning architecture.

Fig. 4: Rank-5 results for some queries returned by the Baseline and AttKGCN on Market-1501 [19], respectively. Green boundaries indicate true positive samples, and red boundaries indicate false positive samples.

Fig. 5: 2D t-SNE [45] visualization of feature representations learned by Baseline and AttKGCN on Market-1501 [19], respectively.

Comparison with Attribute-based Methods. We compare AttKGCN with recent attribute-based methods including ACRN [11], A³M [12], APR [10] and AANet [15]. Table I summarizes the comparison results. Compared with the recent state-of-the-art attribute-based Re-ID method AANet [15], AttKGCN gains 3.1% and 0.6% improvements in mAP and rank-1, respectively, on the Market-1501 [19] dataset. On DukeMTMC-reID [20], AttKGCN gains 4.9% and 1.4% improvements over AANet [15] in mAP and rank-1, respectively.

Qualitative Visualization. Figure 4 shows the top-5 ranking lists of Re-ID results for some query images on Market-1501 [19]. Intuitively, one can note that AttKGCN finds more true positives than the Baseline model (which has the same network setting as AttKGCN but without the attribute representation/learning module).

Representation Visualization. Figure 5 shows 2D visualizations of the learned final output representations of the Baseline network and AttKGCN on the Market-1501 dataset, respectively. Here, we only show 11 identities, with different colors denoting different identities. Intuitively, one can observe that the person representations of AttKGCN are distributed more distinctly than those of the Baseline, which further demonstrates the stronger discriminative ability obtained by incorporating attribute representation into the person representation.

IV-D. Evaluation on Attribute Recognition

To evaluate the effectiveness of the proposed GCN based attribute prediction, we test attribute recognition/prediction on the Market-1501 [19] dataset. Table II summarizes the comparison results. Overall, AttKGCN performs better than the competing methods APR [10] and AANet [15] on most attributes and obtains the highest average prediction accuracy. This clearly demonstrates the effectiveness of the proposed AttKGCN on person attribute recognition.

IV-E. Parameter Analysis

Here, we evaluate the effectiveness of AttKGCN with different parameter settings. We first conduct experiments to verify the effect of the parameter λ (Eq. (10)). Figure 6 shows the performance of AttKGCN with different values of λ. The results are fairly stable over a range of λ values, which demonstrates the insensitivity of the proposed AttKGCN w.r.t. this parameter. In all experiments, λ is fixed to a single value from this stable range.

We then conduct experiments to verify the effect of the number of graph convolutional layers in the GCN module of AttKGCN. Table III shows the performance of AttKGCN across different numbers of convolutional layers. AttKGCN obtains consistently good performance across different depths, which indicates its insensitivity w.r.t. model depth. In all experiments, we use a two-layer graph convolutional network in our AttKGCN model.

Fig. 6: Performance comparison ((a) mAP, (b) Rank-1) with different values of λ on Market-1501 [19] and DukeMTMC-reID [20], respectively.
Layer    Market-1501           DukeMTMC-reID
         R1     R5     mAP     R1     R5     mAP
2-layer  94.4   98.0   85.5    87.8   94.4   77.4
3-layer  94.0   97.9   85.1    86.8   93.9   77.1
4-layer  94.0   97.7   84.8    87.6   94.3   77.0
5-layer  94.3   97.7   85.1    86.4   93.6   75.2
TABLE III: Performance comparison with different GCN depths in our AttKGCN model on the two benchmarks.
Fig. 7: Comparison ((a) mAP, (b) Rank-1) of Baseline, AttKGCN-noRW and AttKGCN on Market-1501 [19] and DukeMTMC-reID [20], respectively.

IV-F. Ablation Study

To justify the effectiveness of the two main components of the proposed AttKGCN model (the attribute knowledge graph convolutional module and the attribute re-weighting module), we conduct ablation experiments on the two datasets. We implement the following variants of our model: 1) Baseline, which only uses the visual representation of the person image for the Re-ID problem and does not exploit any attribute information; and 2) AttKGCN-noRW, which removes the attribute re-weighting module from AttKGCN. Figure 7 shows the comparison results. We note that (1) the attribute knowledge graph convolutional module significantly improves the final Re-ID results, which indicates the advantage of this architecture for Re-ID tasks; and (2) the proposed attribute re-weighting module helps guide more accurate attribute representation and Re-ID.

V. Conclusions and Future Work

This paper proposes a novel Attribute Knowledge Graph Convolutional Network (AttKGCN) model for person Re-ID and attribute recognition. AttKGCN employs a novel attribute knowledge graph convolutional architecture for attribute representation and learning, and integrates attribute representation and image visual representation to learn a stronger discriminative person representation for Re-ID. Experimental results on benchmarks demonstrate that AttKGCN performs notably better than state-of-the-art attribute-based Re-ID approaches. Note that AttKGCN is not limited to person Re-ID.

In the future, we will adapt AttKGCN to other object Re-ID tasks, such as vehicle Re-ID. We will also incorporate richer semantic embeddings of attributes (e.g., word embeddings) into the AttKGCN framework to further enhance the accuracy of attribute recognition.

References

  • [1] C. Su, J. Li, S. Zhang, J. Xing, W. Gao, and Q. Tian, “Pose-driven deep convolutional model for person re-identification,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3960–3969.
  • [2] J. Si, H. Zhang, C.-G. Li, J. Kuen, X. Kong, A. C. Kot, and G. Wang, “Dual attention matching network for context-aware feature sequence based person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5363–5372.
  • [3] D. Chen, D. Xu, H. Li, N. Sebe, and X. Wang, “Group consistent similarity learning via deep crf for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8649–8658.
  • [4] Y. Sun, L. Zheng, Y. Yang, Q. Tian, and S. Wang, “Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline),” in Proceedings of the European Conference on Computer Vision, 2018, pp. 480–496.
  • [5] Y. Sun, Q. Xu, Y. Li, C. Zhang, Y. Li, S. Wang, and J. Sun, “Perceive where to focus: Learning visibility-aware part-level features for partial person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 393–402.
  • [6] Y. Shen, H. Li, S. Yi, D. Chen, and X. Wang, “Person re-identification with deep similarity-guided graph neural network,” in Proceedings of the European Conference on Computer Vision, 2018, pp. 486–504.
  • [7] Y. Yan, Q. Zhang, B. Ni, W. Zhang, M. Xu, and X. Yang, “Learning context graph for person search,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2158–2167.
  • [8] T. Matsukawa and E. Suzuki, “Person re-identification using cnn features learned from combination of attributes,” in Proceedings of the International Conference on Pattern Recognition, 2016, pp. 2428–2433.
  • [9] X. Chang, T. M. Hospedales, and T. Xiang, “Multi-level factorisation net for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2109–2118.
  • [10] Y. Lin, L. Zheng, Z. Zheng, Y. Wu, Z. Hu, C. Yan, and Y. Yang, “Improving person re-identification by attribute and identity learning,” Pattern Recognition, vol. 95, pp. 151–161, 2019.
  • [11] A. Schumann and R. Stiefelhagen, “Person re-identification by deep learning attribute-complementary information,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 20–28.
  • [12] K. Han, J. Guo, C. Zhang, and M. Zhu, “Attribute-aware attention model for fine-grained representation learning,” in Proceedings of the ACM International Conference on Multimedia, 2018, pp. 2040–2048.
  • [13] Z. Yin, W.-S. Zheng, A. Wu, H.-X. Yu, H. Wan, X. Guo, F. Huang, and J. Lai, “Adversarial attribute-image person re-identification,” in Proceedings of the International Joint Conference on Artificial Intelligence, 2018, pp. 1100–1106.
  • [14] S. Li, H. Yu, W. Huang, and J. Zhang, “Attributes-aided part detection and refinement for person re-identification,” ArXiv preprint arXiv:1902.10528, 2019.
  • [15] C.-P. Tay, S. Roy, and K.-H. Yap, “Aanet: Attribute attention network for person re-identifications,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 7134–7143.
  • [16] X. Wang, Y. Ye, and A. Gupta, “Zero-shot recognition via semantic embeddings and knowledge graphs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6857–6866.
  • [17] Z.-M. Chen, X.-S. Wei, P. Wang, and Y. Guo, “Multi-label image recognition with graph convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5177–5186.
  • [18] J. Liu, Z.-J. Zha, H. Xie, Z. Xiong, and Y. Zhang, “Ca 3 net: Contextual-attentional attribute-appearance network for person re-identification,” in Proceedings of the ACM International Conference on Multimedia, 2018, pp. 737–745.
  • [19] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable person re-identification: A benchmark,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1116–1124.
  • [20] Z. Zheng, L. Zheng, and Y. Yang, “Unlabeled samples generated by gan improve the person re-identification baseline in vitro,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3754–3762.
  • [21] D. Li, X. Chen, and K. Huang, “Multi-attribute learning for pedestrian attribute recognition in surveillance scenarios,” in Proceedings of the Asian Conference on Pattern Recognition, 2015, pp. 111–115.
  • [22] Y. Zhao, X. Shen, Z. Jin, H. Lu, and X.-s. Hua, “Attribute-driven feature disentangling and temporal aggregation for video person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4913–4922.
  • [23] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” ArXiv preprint arXiv:1609.02907, 2016.
  • [24] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” ArXiv preprint arXiv:1710.10903, 2017.
  • [25] F. Hu, Y. Zhu, S. Wu, L. Wang, and T. Tan, “Hierarchical graph convolutional networks for semi-supervised node classification,” ArXiv preprint arXiv:1902.06667, 2019.
  • [26] B. Jiang, Z. Zhang, D. Lin, J. Tang, and B. Luo, “Semi-supervised learning with graph learning-convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 11 313–11 320.
  • [27] R. Li, S. Wang, F. Zhu, and J. Huang, “Adaptive graph convolutional neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018, pp. 3546–3553.
  • [28] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.
  • [29] M. Kampffmeyer, Y. Chen, X. Liang, H. Wang, Y. Zhang, and E. P. Xing, “Rethinking knowledge graph propagation for zero-shot learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 11 487–11 496.
  • [30] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  • [31] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [32] Y. Fu, Y. Wei, Y. Zhou, H. Shi, G. Huang, X. Wang, Z. Yao, and T. Huang, “Horizontal pyramid matching for person re-identification,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, pp. 8295–8302.
  • [33] P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, “A tutorial on the cross-entropy method,” Annals of Operations Research, pp. 19–67, Jul 2005.
  • [34] Y. Ge, Z. Li, H. Zhao, G. Yin, S. Yi, X. Wang et al., “Fd-gan: Pose-guided feature distilling gan for robust person re-identification,” in Proceedings of the Conference on Neural Information Processing Systems, 2018, pp. 1222–1233.
  • [35] K. Yuan, Q. Zhang, C. Huang, S. Xiang, C. Pan, and H. Robotics, “Safenet: Scale-normalization and anchor-based feature extraction network for person re-identification.” in Proceedings of the International Joint Conference on Artificial Intelligence, 2018, pp. 1121–1127.
  • [36] C. Song, Y. Huang, W. Ouyang, and L. Wang, “Mask-guided contrastive attention model for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1179–1188.
  • [37] L. Bao, B. Ma, H. Chang, and X. Chen, “Masked graph attention network for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 0–0.
  • [38] M. Zheng, S. Karanam, Z. Wu, and R. J. Radke, “Re-identification with consistent attentive siamese networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5735–5744.
  • [39] H. Li, M. Yang, Z. Lai, W. Zheng, and Z. Yu, “Pedestrian re-identification based on tree branch network with local and global learning,” arXiv preprint arXiv:1904.00355, 2019.
  • [40] S. Zhou, J. Wang, D. Meng, Y. Liang, Y. Gong, and N. Zheng, “Discriminative feature learning with foreground attention for person re-identification,” IEEE Transactions on Image Processing, 2019.
  • [41] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1627–1645, Sep 2009.
  • [42] E. Ristani, F. Solera, R. Zou, R. Cucchiara, and C. Tomasi, “Performance measures and a data set for multi-target, multi-camera tracking,” in Proceedings of the European Conference on Computer Vision, 2016, pp. 17–35.
  • [43] Z. Zhong, L. Zheng, D. Cao, and S. Li, “Re-ranking person re-identification with k-reciprocal encoding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1318–1327.
  • [44] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010, 2010, pp. 177–186.
  • [45] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.