Person Search Challenges and Solutions: A Survey

05/01/2021 ∙ by Xiangtan Lin, et al.

Person search has drawn increasing attention due to its real-world applications and research significance. Person search aims to find a probe person in a gallery of scene images and has a wide range of applications, such as criminal search, multi-camera tracking, and missing person search. Early person search works focused on image-based person search, which uses a person image as the search query. Text-based person search is another major category, which uses free-form natural language as the search query. Person search is challenging, and corresponding solutions are diverse and complex. Therefore, systematic surveys on this topic are essential. This paper surveys recent works on image-based and text-based person search from the perspective of challenges and solutions. Specifically, we provide a brief analysis of highly influential person search methods with respect to three significant challenges: learning discriminative person features, bridging the query-person gap, and mitigating the detection-identification inconsistency. We summarise and compare evaluation results. Finally, we discuss open issues and some promising future research directions.


1 Introduction

Person search [40] aims to find a query person in a gallery of scene images. Historically, person search arose as an extension of the person re-identification (re-id) problem [27, 23, 22, 6, 29, 5, 28]. Therefore, early research on person search focused on an image-based setting, which uses a person image as the search query [39, 25, 1, 10, 38]. Meanwhile, research in text-based person search [21, 37] has made significant advances in the past few years. Text-based person search is handy when a probe image is unavailable but a free-form natural language description is. The two types of person search are illustrated in Figure 1.

Figure 1: The general frameworks of person search. (a) Image-based person search, in which a person image is available as the search query against a gallery of images. Image-based person search involves two sub-tasks: person detection and person identification. (b) Text-based person search, in which the search query is free-form natural language. A typical text-based framework learns text features through an RNN-variant network and then aligns them with visual features from the detection network to identify the person in the target images.

Person search faces more challenges than the person re-id problem. Unlike the person re-id setting, where cropped person images are provided and the primary challenge is to bridge the query-person gap, person search must additionally solve a detection task so that the detected persons can be used for the downstream identification task. This detection task poses extra challenges due to pose variation, occlusion, low resolution, and background clutter in the scene images, and its results may be inconsistent with the identification task (Figure 2). Similarly, text-based person search is more challenging than the traditional text-image matching problem [21], as it needs to learn discriminative person features before text-person matching.

Person search is fast-evolving, and existing person search methods are diverse and complex. Researchers may leverage the rich knowledge concerning object detection, person re-id, and text-image matching separately, so systematic surveys of person search bring considerable value to the community. In particular, to the best of our knowledge, there is no existing survey covering text-based person search. [15] surveyed works on image-based person search and neglected the text-based setting. Furthermore, [15] did not discuss the joint challenge of person detection and identification, especially the detection-identification inconsistency illustrated in Figure 2. Therefore, we survey works beyond image-based person search and provide a systematic review of the diverse person search solutions. We summarise the main differences between the previous survey [15] and ours in Table 1.

Survey | Coverage | Analysis
[15] | Image-based | Components
Ours | Image-based, Text-based | Challenges → solutions: discriminative person features → deep feature representation learning; query-person gap → deep metric learning; detection-identification inconsistency → identity-driven detection
Table 1: Summary of the main differences between the previous survey and ours. This survey focuses more on challenges and solutions.

In this survey, we aim to provide a cohesive analysis of recent person search works so that the rationale behind the ideas can be grasped to inspire new ones. Specifically, we surveyed recently published and pre-print person search papers from top conference venues and journals. We analyse methods from the perspective of challenges and solutions and summarise evaluation results accordingly. At the end of the paper, we provide insights on promising future research directions. In summary, the main contributions of this survey are:

  • In addition to image-based person search, we cover text-based person search which was neglected in the previous person search survey.

  • We analyse person search methods from the perspective of challenges and solutions to inspire new ideas.

  • We summarise and analyse existing methods’ performance and provide insights on promising future research directions.

2 Person Search

Person search is a fast-evolving research topic. In 2014, [40] first introduced the person search problem and pointed out the conflicting nature of the person detection and person identification sub-tasks: person detection deals with the common appearance of people, while identification focuses on a person's uniqueness. After [39] introduced the first end-to-end person search framework in 2017, image-based person search works have multiplied over the last three years. Meanwhile, in 2017, GNA-RNN [21] set the benchmark for text-based person search. We present a timeline of person search works in Figure 3, showing the two divisions: image-based and text-based person search.

Figure 2: The detection-identification inconsistency problem. The detection model learns person proposals based on common person appearance using an intersection-over-union (IoU) threshold, which may yield bounding boxes that are less accurate than the identification task requires.

Person search addresses person detection and person identification simultaneously. Three significant challenges must be considered when developing a person search solution. Firstly, a person search model needs to learn discriminative person features from scene images that are suitable for matching the query identity. Inevitably, the learnt person features differ from the query identity features to some degree, so the second major challenge is how to bridge the gap between the query and the detected person. The third challenge relates to the conflicting nature of person detection and person identification: detection deals with common person appearance, while identification focuses on a person's uniqueness, so the detected person may not be suitable for identity matching. For instance, a partial human body could be considered a person during detection yet be inconsistent with the query identity at the identification stage, which may be a full-body picture.

In this section, we analyse person search methods regarding the above-mentioned three challenges and the corresponding solutions, from the following three aspects, for both image-based and text-based person search:

  • Deep feature representation learning. Addressing the challenge of learning discriminative person features from gallery images in the presence of background clutter, occlusion, pose variation, etc.

  • Deep metric learning. Addressing the challenge of bridging the query-person gap by using loss functions to guide feature representation learning.

  • Identity-driven detection. Addressing the challenge of mitigating the detection-identification inconsistency by incorporating query identities into the detection process.

Figure 3: Timeline of person search studies. Above the timeline are image-based person search works. Below the line are text-based person search methods.

2.1 Deep Feature Representation Learning

Deep feature representation learning focuses on learning discriminative person features in spite of distractors in the gallery images. The majority of early methods exploited global person features, including context cues, while refining person proposals. For example, RCAA [1] utilises relational spatial and temporal context in a deep reinforcement learning framework to continually adjust the bounding boxes. However, these methods did not consider the background clutter inside the proposal bounding boxes, so different persons with similar backgrounds can end up close in the learnt feature space. SMG [46] eliminates background clutter using segmentation masks so that the learnt person features are invariant to it. NAE [3] separates persons from background by feature norms and discriminates person identities by angles. Person detection, like object detection in general, faces a multi-scale matching challenge; to learn scale-invariant features, CLSA [18] and DHFF [31] utilise multi-level features from the identification network together with different multi-metric losses.

Local discriminative features are useful when two persons exhibit similar appearance and cannot be discriminated merely by full-body appearance. APNet [49] divides the body into six parts and uses an attention mechanism to weigh each part's contribution. Unlike APNet, which uses arbitrary body parts, CGPS [41] proposes a region-based feature learning model that learns contextual information from a person graph. BINet [8] uses guidance from cropped person patches to eliminate context influence from outside the bounding boxes.

Deep feature representation learning in text-based person search learns visual representations of the detected person that correspond most closely to the textual features. As in image-based person search, text-based methods exploit both global and local discriminative features. GNA-RNN [21], the first text-based LSTM-CNN person search framework, exploits global features and uses an attention mechanism to focus on the most relevant parts. However, GNA-RNN only attends to visual elements and does not handle varied text structures. To address this, CMCE [20] employs a latent semantic attention module and is more robust to variations in text syntax. To address the background clutter problem, PMA [17] uses pose information to learn pose-related features from a map of human key points. To further distinguish persons with similar global appearance, PWM+ATH [4] utilises a word-image patch matching model to capture local similarities. ViTAA [37] decomposes both image and text into attribute components and conducts a fine-grained matching strategy to enhance the interplay between image and text.
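To make the word-region attention idea concrete, below is a minimal PyTorch sketch of a generic attention module between word features and visual region features. The function name, tensor shapes, and mean-pooling choice are our illustrative assumptions, not any surveyed method's exact design.

```python
import torch
import torch.nn.functional as F

def attended_similarity(word_feats, region_feats):
    """Generic word-region attention for text-based search (illustrative
    sketch). Each word attends over visual regions; the text-image
    affinity is the mean attended cosine similarity.

    word_feats:   (T, D) per-word text features (e.g. from a bi-LSTM)
    region_feats: (R, D) per-region visual features (e.g. from a CNN)
    """
    w = F.normalize(word_feats, dim=1)
    r = F.normalize(region_feats, dim=1)
    sim = w @ r.t()                          # (T, R) word-region similarities
    attn = F.softmax(sim, dim=1)             # each word's attention over regions
    attended = attn @ r                      # (T, D) attended region mixture per word
    return (w * attended).sum(dim=1).mean()  # scalar text-image affinity
```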

2.2 Deep Metric Learning

Deep metric learning tackles the query-person gap challenge with loss functions that guide feature representation learning. The general purpose is to pull the detected person features close to the target identity while pushing them away from other identities. Similarity metrics such as Euclidean distance and cosine similarity are common measures of the similarity between query-person pairs. The identification task is generally formulated as a classification problem trained with a conventional softmax loss; however, softmax converges slowly when the number of classes is large. OIM (Eq. 2) [39] addresses this issue while exploiting a large number of labelled and unlabelled identities. OIAM [10] and IEL [35] further improve OIM with additional center losses. Unlike the OIM variants, I-Net [12] introduces a Siamese structure with an online pairing loss (OPL) and a hard-example-priority softmax loss (HEP) to bridge the query-person gap. RDLR [11] uses an identification loss instead of a regression loss to supervise the bounding boxes.

In the landmark OIM approach, the OIM loss effectively closes the query-person gap by utilising both labelled and unlabelled identities in the training data. The probability of a detected person feature $x$ being recognised as the labelled identity with class-id $i$ is given by a softmax function:

$$p_i = \frac{\exp(v_i^\top x / \tau)}{\sum_{j=1}^{L} \exp(v_j^\top x / \tau) + \sum_{k=1}^{Q} \exp(u_k^\top x / \tau)} \tag{1}$$

where $v_i$ is the stored feature of labelled identity $i$ in the lookup table (LUT), the $v_j$ are the $L$ labelled person features in the LUT, the $u_k$ are the $Q$ unlabelled person features in the circular queue, and the temperature $\tau$ regulates the probability distribution. The OIM objective is to maximise the expected log-likelihood of the target class $t$:

$$\mathcal{L} = \mathbb{E}_x\left[\log p_t\right] \tag{2}$$
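To make the mechanics of Eqs. (1) and (2) concrete, below is a minimal PyTorch sketch of an OIM-style loss. The default sizes, the momentum update rule, and the buffer names (lut, queue) are our illustrative assumptions; see [39] for the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

class OIMLoss(torch.nn.Module):
    """Minimal sketch of an OIM-style loss [39] (illustrative, not the
    authors' exact code). A lookup table (LUT) stores one feature per
    labelled identity; a circular queue stores unlabelled features. Both
    act as non-parametric classifier weights in Eq. (1)."""

    def __init__(self, feat_dim=256, num_ids=5532, queue_size=5000,
                 temperature=0.1, momentum=0.5):
        super().__init__()
        self.temperature = temperature  # tau in Eq. (1)
        self.momentum = momentum
        # Buffers are updated online, not by gradient descent.
        self.register_buffer("lut", torch.zeros(num_ids, feat_dim))
        self.register_buffer("queue", torch.zeros(queue_size, feat_dim))

    def forward(self, feats, labels):
        # feats: (B, D) detected person features; labels: (B,), -1 = unlabelled.
        feats = F.normalize(feats, dim=1)
        # Eq. (1): similarities against all LUT and queue entries.
        logits = torch.cat([feats @ self.lut.t(),
                            feats @ self.queue.t()], dim=1) / self.temperature
        labelled = labels >= 0
        # Eq. (2): maximise log-likelihood of the target identity.
        loss = F.cross_entropy(logits[labelled], labels[labelled])
        with torch.no_grad():
            # Momentum-update LUT entries with the new labelled features.
            for f, y in zip(feats[labelled], labels[labelled]):
                self.lut[y] = F.normalize(
                    self.momentum * self.lut[y] + (1 - self.momentum) * f, dim=0)
            # Push unlabelled features into the circular queue (FIFO).
            un = feats[~labelled]
            if len(un) > 0:
                self.queue = torch.cat([un, self.queue[:-len(un)]], dim=0)
        return loss
```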

Metric learning in text-based person search aims to close the text-image modality gap. The main difficulty is that the model must deal with the complex syntax of free-form textual descriptions. To tackle this, methods such as ViTAA, CMCE, and PWM+ATH [37, 20, 4] employ attention mechanisms to build relation modules between visual and textual representations. Unlike these three methods, which are all CNN-RNN frameworks, Dual Path [48] employs CNNs for textual feature learning and proposes an instance loss for image-text retrieval. CMPM+CMPC [45] utilises a cross-modal projection matching (CMPM) loss and a cross-modal projection classification (CMPC) loss to learn discriminative image-text representations. Similar to CMPM+CMPC, MAN [16] proposes cross-modal objective functions for joint embedding learning to tackle domain-adaptive text-based person search.
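As a concrete illustration of cross-modal metric learning, the following is a minimal sketch of a common bidirectional hinge-based ranking loss over matched image-text pairs. It is a generic baseline, not the exact objective of any surveyed method (those use their own variants, such as CMPM/CMPC [45] or the instance loss [48]).

```python
import torch
import torch.nn.functional as F

def bidirectional_ranking_loss(img_feats, txt_feats, margin=0.2):
    """Generic bidirectional triplet ranking loss for closing the
    text-image modality gap (illustrative sketch; margin is assumed).

    img_feats, txt_feats: (B, D); row i of each describes the same person.
    """
    img = F.normalize(img_feats, dim=1)
    txt = F.normalize(txt_feats, dim=1)
    sims = img @ txt.t()                    # (B, B) cosine similarities
    pos = sims.diag().unsqueeze(1)          # matched pairs on the diagonal
    mask = torch.eye(len(sims), dtype=torch.bool, device=sims.device)
    # Hinge on the hardest negative in each retrieval direction.
    cost_txt = (margin + sims - pos).clamp(min=0).masked_fill(mask, 0)
    cost_img = (margin + sims - pos.t()).clamp(min=0).masked_fill(mask, 0)
    return cost_txt.max(dim=1).values.mean() + cost_img.max(dim=0).values.mean()
```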

Inspired by the recent success of knowledge distillation [13], instead of directly training the detection and identification sub-nets, the two modules can be learnt from pre-trained detection and identification models [33]. DKD [44] focuses on improving identification by introducing diverse knowledge distillation when learning the identification model: a pre-trained external identification model is used to teach the internal identification model. A simplified knowledge distillation process is illustrated in Figure 4.
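A minimal sketch of the underlying distillation objective [13] is shown below: the student branch is trained to match the teacher's temperature-softened class distribution. The temperature value is our illustrative assumption.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style knowledge distillation [13]: match the student's
    softened class distribution to the teacher's via KL divergence.
    T is the softening temperature (illustrative default)."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # T^2 rescales gradients so the soft-target term stays comparable
    # to a hard-label loss as T grows.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```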

2.3 Identity-Driven Detection

The detection-identification inconsistency challenge in image-based person search is tackled by incorporating identities into the detection process: during training, ground-truth person identities are used to guide person proposals, or, at search time, the query identity information is used to refine the bounding boxes. Person search tackles person detection and person identification in one framework, and existing methods can be divided into two-stage and end-to-end solutions from the architecture perspective. In two-stage person search, the detection and identification models are trained separately so that each performs optimally [44, 30]. However, due to the detection-identification inconsistency, the separately trained models may not yield the best person search result. To address the inconsistency between the two branches, TCTS [36] and IGPN+PCB [9] exploit query information at search time to filter out low-probability proposals. End-to-end methods share visual features between detection and identification and significantly decrease runtime; however, joint learning leads to sub-optimal detection performance [36], which in turn worsens the detection-identification inconsistency. To address this problem, NPSM [25] and QEEPS [32] leverage query information to optimise person proposals during detection. Differing from the query-guided methods, RDLR [11] supervises bounding box generation with an identification loss, making the proposal bounding boxes more reliable. In person search settings, the query identity is present in the gallery images; therefore, all the methods above essentially incorporate identities into the detection process.
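As an illustration of query-guided, identity-driven detection, the sketch below filters detector proposals by their appearance similarity to the query feature. The function, threshold, and shapes are our assumptions in the spirit of TCTS [36] and IGPN [9], not those papers' exact schemes.

```python
import torch
import torch.nn.functional as F

def filter_proposals_by_query(query_feat, proposal_feats, boxes, sim_thresh=0.3):
    """Illustrative query-guided proposal filtering: drop detector
    proposals whose appearance is unlikely to match the query identity.

    query_feat:     (D,)  feature of the query person
    proposal_feats: (N, D) features of the detector's proposals
    boxes:          (N, 4) proposal bounding boxes
    """
    q = F.normalize(query_feat, dim=0)
    p = F.normalize(proposal_feats, dim=1)
    sims = p @ q                      # cosine similarity per proposal
    keep = sims >= sim_thresh         # discard low-probability proposals
    return boxes[keep], sims[keep]
```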

Figure 4: A representative end-to-end person search framework in which the detection and identification branches are supervised by pre-trained detection and identification models through knowledge distillation. The detection loss, the identification loss, and the two knowledge distillation losses can be optimised jointly as a multi-task objective through back-propagation.

Text-based person search suffers less from the detection-identification inconsistency because the proposed person is identified by text-image matching rather than by comparing bounding boxes. Therefore, text-based person search mainly focuses on learning visual and language features and improving matching accuracy. The majority of current text-based methods are end-to-end frameworks consisting of a CNN backbone for extracting visual features and a bi-LSTM for learning language representations; the two modules are jointly trained to build word-image relations from the learnt visual and language representations. CMCE [20] is the only two-stage framework: in stage one, a CNN-LSTM network learns cross-modal features, and in stage two, the network refines the matching results using an attention mechanism.

Method | Feature | Loss | CUHK-SYSU mAP / R@1 (%) | PRW mAP / R@1 (%) | LSPS mAP / R@1 (%)
Non-identity-driven detection
OIM [39] | global | OIM | 75.5 / 78.7 | 21.3 / 49.9 | 14.4 / 47.7
IAN [38] | global | Softmax, Center loss | 76.3 / 80.1 | 23.0 / 61.9 | –
OIAM [10] | global | OIM, Center loss | 76.98 / 77.86 | 51.02 / 69.85 | –
FMT-CNN [43] | global | OIM, Softmax | 77.2 / 79.8 | – | –
ELF16 [42] | global & local | OIM | 77.8 / 80.6 | – | –
IOIM [24] | global | IOIM, Center loss | 79.78 / 79.90 | 21.00 / 63.10 | –
EEPSS [30] | global | Triplet loss | 79.4 / 80.5 | 25.2 / 47.0 | –
JDI+IEL [35] | global | IEL | 79.43 / 79.66 | 24.26 / 69.47 | –
RCAA [1] | global & context | RL reward | – / 81.3 | – | –
I-NET [12] | global | OLP, HEP | 79.5 / 81.5 | – | –
MGTS [2] | global & mask | OIM | 83.0 / 83.7 | 32.6 / 72.1 | –
KD-OIM [33] | global | OIM | 83.8 / 84.2 | – | –
CGPS [41] | global & context | OIM | 84.1 / 86.5 | 33.4 / 73.6 | –
PFFN [14] | global & multi-scale | Triplet loss | 84.5 / 89.8 | 34.3 / 73.9 | –
SMG [46] | global & mask | Binary cross entropy | 86.3 / 86.5 | – | –
FPSP [19] | global | Cross entropy | 86.99 / 89.87 | 44.45 / 70.58 | –
CLSA [18] | global & multi-scale | Cross entropy | 87.2 / 88.5 | 38.7 / 65.0 | –
APNet [49] | local | OIM | 88.9 / 89.3 | 41.9 / 81.4 | 18.8 / 55.7
DHFF [31] | global & multi-scale | Multi-metric loss | 90.2 / 91.7 | 41.1 / 70.1 | –
BINet [8] | global & local | OIM | 90.8 / 91.6 | 47.2 / 83.4 | –
NAE+ [3] | global | OIM | 92.1 / 92.9 | 44.0 / 81.1 | –
DKD [44] | global & local | – | 93.6 / 94.72 | 54.16 / 87.89 | –
Identity-driven detection
NPSM [25] | global | Softmax | 77.9 / 81.2 | 24.2 / 53.1 | –
QEEPS [32] | global | OIM | 84.4 / 84.4 | 37.1 / 76.7 | –
KD-QEEPS [33] | global | OIM | 85.0 / 85.5 | – | –
IGPN+PCB [9] | global | – | 90.3 / 91.4 | 47.2 / 87.0 | –
RDLR [11] | global | Proxy triplet loss | 93.0 / 94.2 | 42.9 / 70.2 | –
TCTS [36] | global | IDGQ loss | 93.9 / 95.1 | 46.8 / 87.5 | –
Table 2: Performance of image-based person search methods on the CUHK-SYSU, PRW, and LSPS datasets (mAP / R@1; "–" means not reported).
Method | Feature | Loss | R@1 | R@5 | R@10
GNA-RNN [21] | global | Cross entropy | 19.05 | – | 53.63
CMCE [20] | global | CMCE loss | 25.94 | – | 60.48
PWM+ATH [4] | global | Cross entropy | 27.14 | 49.45 | 61.02
Dual-Path [48] | global | Ranking loss, Instance loss | 44.40 | 66.26 | 75.07
CMPM+CMPC [45] | global | CMPM, CMPC | 49.37 | – | 79.27
LPS+MCCL [26] | global | MCCL | 50.58 | – | 79.06
A-GANet [26] | global | Binary cross entropy | 53.14 | 74.03 | 81.95
PMA [17] | global & pose | – | 53.81 | 73.54 | 81.23
TIMAM [34] | global | Cross entropy, GAN loss | 55.41 | 77.56 | 84.78
ViTAA [37] | global & attribute | Alignment loss | 55.97 | 75.84 | 83.52
Table 3: Performance of text-based person search methods on the CUHK-PEDES dataset (R@K in %; "–" means not reported).

3 Datasets and Evaluation

Dataset | CUHK-SYSU | PRW | LSPS | CUHK-PEDES
Query type | Image | Image | Image | Text
#frames | 18184 | 11816 | 51836 | 40206
#identities | 8432 | 932 | 4067 | 13003
#annotated boxes | 96143 | 34304 | 60433 | –
Partial boxes | 6% | 0% | 60% | –
#cameras | – | 6 | 17 | –
#descriptions | – | – | – | 80440
Detector | hand | hand | Faster R-CNN | –

Table 4: Person search datasets statistics.

3.1 Datasets

CUHK-SYSU [39] is an image-based person search dataset containing 18184 images, 8432 person identities, and 96143 annotated bounding boxes. The training set contains 11206 images and 5532 query identities; the test set contains 6978 images and 2900 query identities. The training and test sets overlap in neither images nor query persons.

PRW [47] contains a total of 11816 frames manually annotated with 43110 person bounding boxes, of which 34304 are assigned identities ranging from 1 to 932 and the rest are assigned an identity of −2. The PRW training set has 5704 images and 482 identities, and the test set has 6112 images and 450 identities.

LSPS [49] is a newer image-based person search dataset with a total of 51836 images, annotated with 60433 bounding boxes and 4067 identities. LSPS has a substantially larger proportion of incomplete (partial-body) query bounding boxes, 60%, compared with 6% in CUHK-SYSU and 0% in PRW.

CUHK-PEDES [21] is currently the only dataset for text-based person search. Its images are collected from five person re-id datasets and augmented with corresponding language annotations. It contains 40206 images of 13003 identities and 80440 textual descriptions, with two textual descriptions per image. The dataset is divided into three parts: 11003 training identities with 34054 images and 68126 captions, 1000 validation identities with 3078 images and 6158 captions, and 1000 test identities with 3074 images and 6156 captions.

CUHK-SYSU and PRW are the de facto datasets for image-based person search. LSPS is new to the community and contains many partial-body bounding boxes, making it a specialised dataset for evaluating methods that exploit local discriminative features. CUHK-PEDES is the only text-based person search dataset, and new datasets may further advance research in this area. Dataset statistics are summarised in Table 4.

3.2 Evaluation Metrics

Cumulative matching characteristics (CMC top-K) and mean average precision (mAP) are the primary evaluation metrics for person search. In CMC top-K, a search is counted as successful if at least one of the top-K predicted bounding boxes overlaps the ground truth with an intersection-over-union (IoU) of 0.5 or greater. The mAP is a popular evaluation metric from object detection: an average precision (AP) is calculated for each query person, and the final mAP is the mean of all APs.
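The following is a minimal sketch of how these two metrics are computed for a single query, assuming the IoU ≥ 0.5 matching has already produced a ranked list of match flags; the function and variable names are our own.

```python
import numpy as np

def cmc_top_k(ranked_match_flags, k=1):
    """CMC top-k for one query: 1 if any of the first k ranked detections
    matches the ground truth (IoU >= 0.5 applied upstream), else 0."""
    return float(np.any(ranked_match_flags[:k]))

def average_precision(ranked_match_flags, num_gt):
    """AP for one query over a ranked gallery; mAP is the mean over all
    queries. num_gt is the number of ground-truth instances of the query."""
    hits, precisions = 0, []
    for rank, is_match in enumerate(ranked_match_flags, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)  # precision at each hit
    return sum(precisions) / num_gt if num_gt else 0.0
```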

3.3 Performance Analysis

In this section, we summarise and analyse the evaluation results with respect to the three significant person search challenges discussed earlier, aiming to identify the factors that influence person search performance. We do not discuss CNN backbones, as modern backbones such as ResNet-50 and VGG perform similarly and are mostly interchangeable across methods.

We summarise the evaluation results of image-based person search methods in Table 2, annotating each method with its feature types and the loss functions used for metric learning. Image-based person search faces the steep detection-identification inconsistency challenge; we therefore divide the methods into identity-driven and non-identity-driven detection groups to analyse the effectiveness of the identity-driven detection solution.

Methods specifically addressing the detection-identification inconsistency, such as IGPN, RDLR, and TCTS, outperform methods that address detection and identification separately. Methods exploiting fine-grained discriminative features without considering this inconsistency do not have a clear edge over methods using global features. Our interpretation is that, because the query identity is present in the gallery images, the detected person must be consistent with the query identity for good query-person matching; for example, if the detected person features are free of noise, the query features should be equally noise-free. Loss functions play a critical role in guiding feature representation learning, for instance using a center loss on top of the OIM loss to pull the same identities closer while separating different identities. Knowledge distillation is a notably effective strategy for training the detection and identification models: KD-OIM, KD-QEEPS, and DKD beat their corresponding baselines trained without knowledge distillation.

The performance of text-based person search methods on CUHK-PEDES is summarised in Table 3, again with feature types and loss functions. Text-based person search is essentially a text-image matching problem, and fine-grained discriminative features play a critical role in cross-modal matching. Recent methods exploiting fine-grained discriminative features with novel loss functions outperform methods using global features and a vanilla cross-entropy loss. In particular, ViTAA [37], which exploits local discriminative features via attribute-feature alignment, achieves the best search results.

4 Discussion and Future Directions

In this survey, we review recent person search advances covering both image-based and text-based person search. Despite remarkable achievements in the past few years, addressing the three significant person search challenges, namely discriminative features, the query-person gap, and the detection-identification inconsistency, remains an open problem. Next, we discuss a few future research directions.

Multi-modal person search. Existing works search by either image or text; none has attempted a multi-modal approach in which query image and query text complement each other. Multi-modal person search is handy when only a partial person image, such as a passport-sized photo, is available while free text describes the rest of the body's appearance. Specifically, the CUHK-PEDES dataset could be extended with annotated bounding boxes so that it carries both bounding boxes and textual descriptions, making it a suitable candidate dataset for multi-modal person search.

Attribute-based person search. Learning complex sentence syntax is a big challenge for a machine. The attribute-based method AIHM [7] outperforms the text-based method GNA-RNN [21] when evaluated on cropped person images with attribute annotations. It is therefore worthwhile to collect attribute-annotated scene images and further advance attribute-based person search. The state-of-the-art text-based method ViTAA [37] decomposes textual descriptions into attributes to learn fine-grained discriminative features; attribute annotations may ease this process and subsequently improve text-based person search performance.

Zero-shot person search. Text-based person search is essentially a zero-shot learning problem, in which the query person is unseen during training. [7] formulates attribute-based person search as a zero-shot learning (ZSL) problem. In zero-shot learning, no image of the target classes is available at training time, and only semantic representations such as textual descriptions are available to infer the unseen classes. Text-based person search can therefore leverage zero-shot learning techniques, such as augmenting the training data with adversarially generated person features.

5 Conclusion

In this survey, we provide a systematic review of recent works on person search. For the first time, we survey papers on text-based person search, which is less investigated than image-based person search. We briefly discuss highly regarded methods from the perspective of challenges and solutions. We summarise and compare person search methods' performance and argue that a person search method needs to address the joint challenge of discriminative features, the query-person gap, and the detection-identification inconsistency. We conclude with future research directions that may interest both incumbent and new researchers in the field.

References

  • [1] X. Chang, P. Huang, Y. Shen, X. Liang, Y. Yang, and A. G. Hauptmann (2018) RCAA: Relational Context-Aware Agents for Person Search. pp. 84–100.
  • [2] D. Chen, S. Zhang, W. Ouyang, J. Yang, and Y. Tai (2018) Person Search via A Mask-guided Two-stream CNN Model. pp. 734–750.
  • [3] D. Chen, S. Zhang, J. Yang, and B. Schiele (2020) Norm-Aware Embedding for Efficient Person Search. pp. 12615–12624.
  • [4] T. Chen, C. Xu, and J. Luo (2018) Improving Text-Based Person Search by Spatial Matching and Adaptive Threshold. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1879–1887.
  • [5] D. Cheng, X. Chang, L. Liu, A. G. Hauptmann, Y. Gong, and N. Zheng (2017) Discriminative dictionary learning with ranking metric embedded for person re-identification. In IJCAI.
  • [6] D. Cheng, Y. Gong, X. Chang, W. Shi, A. G. Hauptmann, and N. Zheng (2018) Deep feature learning via structured graph laplacian embedding for person re-identification. Pattern Recognition 82, pp. 94–104.
  • [7] Q. Dong, S. Gong, and X. Zhu (2019) Person Search by Text Attribute Query As Zero-Shot Learning. pp. 3652–3661.
  • [8] W. Dong, Z. Zhang, C. Song, and T. Tan (2020) Bi-Directional Interaction Network for Person Search. pp. 2839–2848.
  • [9] W. Dong, Z. Zhang, C. Song, and T. Tan (2020) Instance Guided Proposal Network for Person Search. pp. 2585–2594.
  • [10] C. Gao, R. Yao, J. Zhao, Y. Zhou, F. Hu, and L. Li (2019) Structure-aware person search with self-attention and online instance aggregation matching. Neurocomputing 369, pp. 29–38.
  • [11] C. Han, J. Ye, Y. Zhong, X. Tan, C. Zhang, C. Gao, and N. Sang (2019) Re-ID Driven Localization Refinement for Person Search. pp. 9814–9823.
  • [12] Z. He and L. Zhang (2019) End-to-End Detection and Re-identification Integrated Net for Person Search. In Computer Vision – ACCV 2018, Lecture Notes in Computer Science, pp. 349–364.
  • [13] G. Hinton, O. Vinyals, and J. Dean (2015) Distilling the Knowledge in a Neural Network. arXiv:1503.02531.
  • [14] Z. Hong, B. Liu, Y. Lu, G. Yin, and N. Yu (2019) Scale Voting With Pyramidal Feature Fusion Network for Person Search. IEEE Access 7, pp. 139692–139702.
  • [15] K. Islam (2020) Person search: New paradigm of person re-identification: A survey and outlook of recent works. Image and Vision Computing 101, 103970.
  • [16] Y. Jing, W. Wang, L. Wang, and T. Tan (2020) Cross-Modal Cross-Domain Moment Alignment Network for Person Search. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10675–10683.
  • [17] Y. Jing, C. Si, J. Wang, W. Wang, L. Wang, and T. Tan (2020) Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search. Proceedings of the AAAI Conference on Artificial Intelligence 34 (07), pp. 11189–11196.
  • [18] X. Lan, X. Zhu, and S. Gong (2018) Person Search by Multi-Scale Matching. pp. 536–552.
  • [19] J. Li, F. Liang, Y. Li, and W. Zheng (2019) Fast Person Search Pipeline. In 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 1114–1119.
  • [20] S. Li, T. Xiao, H. Li, W. Yang, and X. Wang (2017) Identity-Aware Textual-Visual Matching With Latent Co-Attention. pp. 1890–1899.
  • [21] S. Li, T. Xiao, H. Li, B. Zhou, D. Yue, and X. Wang (2017) Person Search With Natural Language Description. pp. 1970–1979.
  • [22] Z. Li, W. Liu, X. Chang, L. Yao, M. Prakash, and H. Zhang (2019) Domain-aware unsupervised cross-dataset person re-identification. In ADMA.
  • [23] C. Liu, X. Chang, and Y. Shen (2020) Unity style transfer for person re-identification. In CVPR.
  • [24] H. Liu, W. Shi, W. Huang, and Q. Guan (2018) A Discriminatively Learned Feature Embedding Based on Multi-Loss Fusion For Person Search. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1668–1672.
  • [25] H. Liu, J. Feng, Z. Jie, K. Jayashree, B. Zhao, M. Qi, J. Jiang, and S. Yan (2017) Neural Person Search Machines. pp. 493–501.
  • [26] J. Liu, Z. Zha, R. Hong, M. Wang, and Y. Zhang (2019) Deep Adversarial Graph Attention Convolution Network for Text-Based Person Search. In Proceedings of the 27th ACM International Conference on Multimedia (MM '19), pp. 665–673.
  • [27] W. Liu, X. Chang, L. Chen, D. Phung, X. Zhang, Y. Yang, and A. G. Hauptmann (2020) Pair-based uncertainty and diversity promoting early active learning for person re-identification. ACM Trans. Intell. Syst. Technol. 11 (2), pp. 21:1–21:15.
  • [28] W. Liu, X. Chang, L. Chen, and Y. Yang (2017) Early active learning with pairwise constraint for person re-identification. In ECML PKDD.
  • [29] W. Liu, X. Chang, L. Chen, and Y. Yang (2018) Semi-supervised bayesian attribute learning for person re-identification. In AAAI.
  • [30] A. Loesch, J. Rabarisoa, and R. Audigier (2019) End-To-End Person Search Sequentially Trained On Aggregated Dataset. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 4574–4578.
  • [31] Y. Lu, Z. Hong, B. Liu, W. Li, and N. Yu (2019) DHFF: Robust Multi-Scale Person Search by Dynamic Hierarchical Feature Fusion. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 3935–3939.
  • [32] B. Munjal, S. Amin, F. Tombari, and F. Galasso (2019) Query-Guided End-To-End Person Search. pp. 811–820.
  • [33] B. Munjal, F. Galasso, and S. Amin (2019) Knowledge Distillation for End-to-End Person Search. arXiv:1909.01058.
  • [34] N. Sarafianos, X. Xu, and I. A. Kakadiaris (2019) Adversarial Representation Learning for Text-to-Image Matching. pp. 5814–5824.
  • [35] W. Shi, H. Liu, F. Meng, and W. Huang (2018) Instance Enhancing Loss: Deep Identity-Sensitive Feature Embedding for Person Search. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 4108–4112.
  • [36] C. Wang, B. Ma, H. Chang, S. Shan, and X. Chen (2020) TCTS: A Task-Consistent Two-Stage Framework for Person Search. pp. 11952–11961.
  • [37] Z. Wang, Z. Fang, J. Wang, and Y. Yang (2020) ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language. In Computer Vision – ECCV 2020, Lecture Notes in Computer Science, pp. 402–420.
  • [38] J. Xiao, Y. Xie, T. Tillo, K. Huang, Y. Wei, and J. Feng (2019) IAN: The Individual Aggregation Network for Person Search. Pattern Recognition 87, pp. 332–340.
  • [39] T. Xiao, S. Li, B. Wang, L. Lin, and X. Wang (2017) Joint Detection and Identification Feature Learning for Person Search. pp. 3415–3424.
  • [40] Y. Xu, B. Ma, R. Huang, and L. Lin (2014) Person Search in a Scene by Jointly Modeling People Commonness and Person Uniqueness. In Proceedings of the 22nd ACM International Conference on Multimedia (MM '14), pp. 937–940.
  • [41] Y. Yan, Q. Zhang, B. Ni, W. Zhang, M. Xu, and X. Yang (2019) Learning Context Graph for Person Search. pp. 2158–2167.
  • [42] J. Yang, M. Wang, M. Li, and J. Zhang (2017) Enhanced Deep Feature Representation for Person Search. In Computer Vision, Communications in Computer and Information Science, pp. 315–327.
  • [43] S. Zhai, S. Liu, X. Wang, and J. Tang (2019) FMT: Fusing multi-task convolutional neural network for person search. Multimedia Tools and Applications 78 (22), pp. 31605–31616.
  • [44] X. Zhang, X. Wang, J. Bian, C. Shen, and M. You (2020) Diverse Knowledge Distillation for End-to-End Person Search. arXiv:2012.11187.
  • [45] Y. Zhang and H. Lu (2018) Deep Cross-Modal Projection Learning for Image-Text Matching. pp. 686–701.
  • [46] D. Zheng, J. Xiao, K. Huang, and Y. Zhao (2020) Segmentation mask guided end-to-end person search. Signal Processing: Image Communication 86, 115876.
  • [47] L. Zheng, H. Zhang, S. Sun, M. Chandraker, Y. Yang, and Q. Tian (2017) Person Re-Identification in the Wild. pp. 1367–1376.
  • [48] Z. Zheng, L. Zheng, M. Garrett, Y. Yang, M. Xu, and Y. Shen (2020) Dual-path Convolutional Image-Text Embeddings with Instance Loss. ACM Transactions on Multimedia Computing, Communications, and Applications 16 (2), pp. 51:1–51:23.
  • [49] Y. Zhong, X. Wang, and S. Zhang (2020) Robust Partial Matching for Person Search in the Wild. pp. 6827–6835.