Person search aims to find a query person in a gallery of scene images. Historically, person search was an extended form of the person re-identification (re-id) problem [27, 23, 22, 6, 29, 5, 28]. Therefore, early research on person search focused on an image-based setting, which uses a person image as the search query [39, 25, 1, 10, 38]. Meanwhile, research in text-based person search [21, 37] has made significant advances in the past few years. Text-based person search is handy when a probe image is unavailable but a free-form natural-language description is. The two types of person search are illustrated in Figure 1.
Person search faces more challenges than the person re-id problem. Unlike the person re-id setting, where cropped person images are provided and the primary challenge is to bridge the query-person gap, person search must additionally deal with detection so that the detected person can be used for the downstream identification task. The detection task poses further challenges due to the influence of poses, occlusion, resolution and background clutter in the scene images, and its results may be inconsistent with the identification task (Figure 2). Similarly, text-based person search is also more challenging than the traditional text-image matching problem , as it needs to learn discriminative features before text-person matching.
Person search is fast-evolving, and existing person search methods are diverse and complex. Researchers may leverage rich knowledge concerning object detection, person re-id and text-image matching separately, so a systematic survey of person search brings more value to the community. In particular, as far as we know, there is no existing survey covering text-based person search.  surveyed works on image-based person search and neglected text-based person search. Furthermore,  did not discuss the joint challenge of person detection and identification, especially the detection-identification inconsistency illustrated in Figure 2. Therefore, we survey works beyond image-based person search and provide a systematic review of the diverse person search solutions. We summarise the main differences between the previous survey  and ours in Table 1.
|Ours|Image-based, Text-based|Challenges (Solutions): discriminative person features (deep feature representation learning); query-person gap (deep metric learning); detection-identification inconsistency (identity-driven detection)|
In this survey, we aim to provide a cohesive analysis of recent person search works so that the rationales behind the ideas can be grasped to inspire new ideas. Specifically, we survey recently published and pre-print person search papers from top conference venues and journals. We analyse methods from the perspective of challenges and solutions and summarise evaluation results accordingly. At the end of the paper, we provide insights on promising future research directions. In summary, the main contributions of this survey are:
In addition to image-based person search, we cover text-based person search which was neglected in the previous person search survey.
We analyse person search methods from the perspective of challenges and solutions to inspire new ideas.
We summarise and analyse existing methods’ performance and provide insights on promising future research directions.
2 Person Search
Person search is a fast-evolving research topic. In 2014,  first introduced the person search problem and pointed out the conflicting nature between person detection and person identification sub-tasks. Person detection deals with common human appearance, while the identification task focuses on a person’s uniqueness. After  introduced the first end-to-end person search framework in 2017, we have seen an increasing number of image-based person search works in the last three years. Meanwhile, in 2017, GNA-RNN  set the benchmark for text-based person search. We draw a timeline to present the person search works in Figure 3 and show the two divisions: image-based and text-based person search.
Person search addresses person detection and person identification simultaneously. Three significant challenges must be considered when developing a person search solution. Firstly, a person search model needs to learn discriminative person features from scene images suitable for matching the query identity. Inevitably, the learnt person features differ from the query identity features to some degree; therefore the second major challenge is how to bridge the gap between the query and the detected person. The third challenge is related to the conflicting nature of person detection and person identification: person detection deals with common person appearance, while identification focuses on a person's uniqueness. The detected person may not be suitable for identity matching. For instance, a partial human body could be considered a person during detection but is inconsistent with the query identity at the identification stage, which may be a full-body picture.
In this section, we analyse person search methods regarding the above-mentioned three challenges and their corresponding solutions, from the following three aspects, for both image-based and text-based person search:
Deep feature representation learning. Addressing the challenge of learning discriminative person features from gallery images concerning background clutter, occlusion, poses, etc.
Deep metric learning. Addressing the challenge of bridging the query-person gap by using loss functions to guide feature representation learning.
Identity-driven detection. Addressing the challenge of mitigating the detection-identification inconsistency by incorporating query identities into the detection process.
2.1 Deep Feature Representation Learning
Deep feature representation learning focuses on learning discriminative person features concerning distractors in the gallery images. The majority of early methods exploited global person features, including context cues, while refining person proposals. For example, RCAA  utilises relational spatial and temporal context in a deep reinforcement learning framework to iteratively adjust the bounding boxes. However, these methods didn't consider the background clutter within the proposal bounding boxes, resulting in a situation where different persons with similar backgrounds are close in the learnt feature space. SMG  eliminates background clutter using segmentation masks so that the learnt person features are invariant to it. NAE  separates persons from the background by feature norms and discriminates person identities by angles. Person detection, and object detection in general, faces the multi-scale matching challenge. To learn scale-invariant features, CLSA  and DHFF  utilise multi-level features from the identification network and solve the multi-scale matching problem with different multi-metric losses.
Local discriminative features are useful when two persons exhibit similar appearance and cannot be distinguished merely by full-body appearance. APNet  divides the body into six parts and further uses an attention mechanism to weigh each part's contribution. Unlike APNet, which uses arbitrary body parts, CGPS  proposes a region-based feature learning model that learns contextual information from a person graph. BINet  uses guidance from cropped person patches to eliminate context influence outside the bounding boxes.
Deep feature representation learning in text-based person search learns visual representations for the detected person that best correspond to the textual features. Similar to image-based person search, text-based methods exploit global and local discriminative features. GNA-RNN  exploits global features in the first text-based LSTM-CNN person search framework and uses an attention mechanism to learn the most relevant parts. GNA-RNN only attends to visual elements and does not address varied text structures. To address this problem, CMCE  employs a latent semantic attention module and is more robust to text syntax variations. To address the background clutter problem, PMA  uses pose information to learn pose-related features from a map of human key points. To further distinguish persons with similar global appearance, PWM+ATH  utilises a word-image patch matching model to capture local similarities. ViTAA  decomposes both image and text into attribute components and conducts a fine-grained matching strategy to enhance the interplay between image and text.
2.2 Deep Metric Learning
Deep metric learning tackles the query-person gap challenge with loss functions that guide feature representation learning. The general purpose is to bring the detected person features close to the target identity while separating them from other identities. Similarity metrics such as Euclidean distance and cosine similarity are common measures of the similarity among query-person pairs. The identification task is generally formulated as a classification problem, where a conventional Softmax loss trains the classifier. Softmax suffers from slow convergence when there is a large number of classes. OIM (Eq. 2)  addresses this issue while exploiting a large number of labelled and unlabelled identities. OIAM  and IEL  further improve the OIM method with additional center losses. Different from the OIM variants, I-Net  introduces a Siamese structure with an online pairing loss (OPL) and a hard example priority Softmax loss (HEP) to bridge the query-person gap. RDLR  uses the identification loss instead of a regression loss for supervising the bounding boxes.
In the landmark OIM approach, the OIM loss effectively closes the query-person gap by utilising labelled and unlabelled identities from the training data. The probability of a detected person feature $x$ being recognised as the labelled identity with class-id $i$ is computed by a Softmax function:

$$p_i = \frac{\exp(v_i^{\top} x / \tau)}{\sum_{j=1}^{L} \exp(v_j^{\top} x / \tau) + \sum_{k=1}^{Q} \exp(u_k^{\top} x / \tau)}$$

where $v_i$ is the stored feature of labelled identity $i$ in the lookup table (LUT), $L$ is the number of labelled identities in the LUT, $u_k$ is the $k$-th stored feature of an unlabelled person in a circular queue of size $Q$, and the temperature $\tau$ regulates the softness of the probability distribution. The OIM objective is to maximise the expected log-likelihood of the target identity $t$: $\mathcal{L} = \mathrm{E}_{x}[\log p_t]$.
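The OIM probability and loss above can be sketched in a few lines of NumPy. This is a minimal illustration of the formula, not the original implementation; the array names, shapes and the temperature value are our own choices.

```python
import numpy as np

def oim_probabilities(x, lut, queue, tau=0.1):
    """Probability of feature x matching each labelled identity, as in the
    OIM loss: a Softmax over similarities to the L labelled features in the
    lookup table (lut) and the Q unlabelled features in the circular queue."""
    labelled_logits = lut @ x / tau        # shape (L,)
    unlabelled_logits = queue @ x / tau    # shape (Q,)
    logits = np.concatenate([labelled_logits, unlabelled_logits])
    exp = np.exp(logits - logits.max())    # numerically stabilised Softmax
    probs = exp / exp.sum()
    return probs[: len(lut)]               # p_i for labelled identities only

def oim_loss(x, lut, queue, target, tau=0.1):
    """Negative log-likelihood of the target identity (the OIM objective
    maximises the expected log-likelihood, i.e. minimises this loss)."""
    p = oim_probabilities(x, lut, queue, tau)
    return -np.log(p[target] + 1e-12)
```

Note that the labelled probabilities sum to less than one, since some probability mass goes to the unlabelled queue entries in the denominator.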
Metric learning in text-based person search aims to close the text-image modality gap. The main challenge is that the model must deal with the complex syntax of free-form textual descriptions. To tackle this, methods such as ViTAA, CMCE and PWM+ATH [37, 20, 4] employ attention mechanisms to build relation modules between visual and textual representations. Unlike these three methods, which are all CNN-RNN frameworks, Dual Path  employs a CNN for textual feature learning and proposes an instance loss for image-text retrieval. CMPM+CMPC  utilises a cross-modal projection matching (CMPM) loss and a cross-modal projection classification (CMPC) loss to learn discriminative image-text representations. Similar to CMPM+CMPC, MAN  proposes cross-modal objective functions for joint embedding learning to tackle domain-adaptive text-based person search.
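Many of these cross-modal objectives build on a bidirectional max-margin ranking loss over a batch of matched image-text pairs. The sketch below is a generic version of that idea, not any specific paper's formulation; it assumes row i of each embedding matrix is a matched pair.

```python
import numpy as np

def ranking_loss(image_emb, text_emb, margin=0.2):
    """Generic bidirectional max-margin ranking loss: pulls matched
    image-text pairs together and pushes mismatched pairs apart by at
    least `margin` in cosine similarity, in both retrieval directions."""
    # Cosine similarity matrix: sim[i, j] = sim(image_i, text_j).
    im = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    tx = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = im @ tx.T
    pos = np.diag(sim)                     # similarities of matched pairs
    # Hinge on every mismatched pair, for image-to-text and text-to-image.
    cost_i2t = np.maximum(0, margin + sim - pos[:, None])
    cost_t2i = np.maximum(0, margin + sim - pos[None, :])
    n = sim.shape[0]
    off_diag = ~np.eye(n, dtype=bool)      # exclude the matched pairs
    return float(cost_i2t[off_diag].sum() + cost_t2i[off_diag].sum()) / n
```

When every image embedding exactly equals its matched text embedding and mismatched pairs are orthogonal, the loss is zero; permuting the text rows breaks the matches and makes it positive.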
Inspired by the recent success of knowledge distillation , the detection and identification modules can be learnt from pre-trained detection and identification models  instead of being trained directly. DKD  focuses on improving identification performance by introducing diverse knowledge distillation in learning the identification model. Specifically, a pre-trained external identification model is used to teach the internal identification model. A simplified knowledge distillation process is illustrated in Figure 4.
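The teacher-student setup in Figure 4 is typically trained with a KL-divergence term between the teacher's softened identity distribution and the student's. The function below is a generic Hinton-style distillation sketch, not DKD's exact loss; the temperature value is illustrative.

```python
import numpy as np

def softmax(z, tau=1.0):
    """Temperature-scaled Softmax; higher tau gives softer targets."""
    z = np.asarray(z, dtype=float) / tau
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, tau=4.0):
    """KL divergence KL(teacher || student) over the softened identity
    distributions, the core term of knowledge distillation."""
    p = softmax(teacher_logits, tau)   # soft targets from the teacher
    q = softmax(student_logits, tau)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

In practice this term is added to the ordinary hard-label identification loss, so the student matches both the ground-truth identities and the teacher's softer inter-identity similarities.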
2.3 Identity-Driven Detection
The detection-identification inconsistency challenge in image-based person search is tackled by incorporating identities into the detection process: during training, ground-truth person identities are used to guide person proposals, or at search time, the query identity information is utilised to refine the bounding boxes. Person search tackles person detection and person identification in one framework, and existing methods can be divided into two-stage and end-to-end solutions from an architecture perspective. In two-stage person search, the detection and identification models are trained separately for optimal performance of each [44, 30]. However, due to the detection-identification inconsistency issue, separately trained models may not yield the best person search result. To address this inconsistency between the two branches, TCTS  and IGPN+PCB  exploit query information at search time to filter out low-probability proposals. End-to-end methods share visual features between detection and identification and significantly decrease runtime. However, joint learning leads to sub-optimal detection performance , which subsequently worsens the detection-identification inconsistency. To address this problem, NPSM  and QEEPS  leverage query information to optimise person proposals in the detection process. Different from these query-guided methods, RDLR  supervises bounding box generation using an identification loss, making the proposal bounding boxes more reliable. In person search settings, the query identity is present in the gallery images; therefore, all the methods mentioned above essentially incorporate identities into the detection process.
Text-based person search faces a less severe detection-identification inconsistency challenge, since the proposed person is identified by text-image matching without comparing bounding boxes. Therefore, text-based person search mainly focuses on learning visual and language features and improving matching accuracy. The majority of current text-based methods are end-to-end frameworks consisting of a CNN backbone for extracting visual features and a bi-LSTM for learning language representations. The two modules are jointly trained to build word-image relations from the learnt visual and language representations. CMCE  is the only two-stage framework, in which a stage-one CNN-LSTM network learns cross-modal features and a stage-two CNN-LSTM network refines the matching results using an attention mechanism.
|Method|Features|Loss|CUHK-SYSU mAP|CUHK-SYSU top-1|PRW mAP|PRW top-1|
|IAN |global|Softmax, Center loss|76.3|80.1|23.0|61.9|
|OIAM |global|OIM, Center loss|76.98|77.86|51.02|69.85|
|FMT-CNN |global|OIM, Softmax|77.2|79.8|||
|ELF16 |global & local|OIM|77.8|80.6|||
|IOIM |global|IOIM, Center loss|79.78|79.90|21.00|63.10|
|EEPSS |global|Triplet loss|79.4|80.5|25.2|47.0|
|JDI + IEL |global|IEL|79.43|79.66|24.26|69.47|
|RCAA |global & context|RL reward||81.3|||
|I-NET |global|OLP, HEP|79.5|81.5|||
|MGTS |global & mask|OIM|83.0|83.7|32.6|72.1|
|CGPS |global & context|OIM|84.1|86.5|33.4|73.6|
|PFFN |global & multi-scale|Triplet loss|84.5|89.8|34.3|73.9|
|SMG |global & mask|Binary Cross Entropy|86.3|86.5|||
|FPSP |global|Cross entropy|86.99|89.87|44.45|70.58|
|CLSA |global & multi-scale|Cross entropy|87.2|88.5|38.7|65.0|
|DHFF |global & multi-scale|Multi-Metric loss|90.2|91.7|41.1|70.1|
|BINet |global & local|OIM|90.8|91.6|47.2|83.4|
|DKD |global & local||93.6|94.72|54.16|87.89|
|IGPN + PCB |global||90.3|91.4|47.2|87.0|
|RDLR |global|Proxy Triplet Loss|93.0|94.2|42.9|70.2|
|TCTS |global|IDGQ loss|93.9|95.1|46.8|87.5|
|Method|Features|Loss|top-1|top-5|top-10|
|GNA-RNN |global|Cross entropy|19.05||53.63|
|CMCE |global|CMCE loss|25.94||60.48|
|PWM+ATH |global|Cross entropy|27.14|49.45|61.02|
|Dual-Path |global|Ranking loss, Instance loss|44.4|66.26|75.07|
|CMPM+CMPC |global|CMPM, CMPC|49.37||79.27|
|A-GANet |global|Binary Cross Entropy|53.14|74.03|81.95|
|PMA |global & pose||53.81|73.54|81.23|
|TIMAM |global|Cross Entropy, GAN Loss|55.41|77.56|84.78|
|ViTAA |global & attribute|Alignment loss|55.97|75.84|83.52|
3 Datasets and Evaluation
3.1 Datasets
CUHK-SYSU  dataset is an image-based person search dataset, which contains 18184 images, 8432 person identities, and 99809 annotated bounding boxes. The training set contains 11206 images and 5532 query identities. The test set contains 6978 images and 2900 query identities. The training and test sets have no overlap in images or query persons.
PRW  dataset has a total of 11816 frames which are manually annotated with 43110 person bounding boxes. 34304 people have identifications ranging from 1 to 932, and the rest are assigned identities of -2. The PRW training set has 5704 images and 482 identities, and the test set has 6112 pictures and 450 identities.
LSPS  dataset is a new image-based person search dataset containing 51,836 images, annotated with 60,433 bounding boxes and 4,067 identities. LSPS has a substantially larger proportion of incomplete query bounding boxes (60%, compared to 6% in CUHK-SYSU and 0% in PRW).
CUHK-PEDES dataset  is currently the only dataset for text-based person search. The images are collected from five person re-id datasets and augmented with corresponding language annotations. It contains 40206 images of 13003 identities and 80440 textual descriptions; each image has two textual descriptions. The dataset is divided into three parts: 11003 training identities with 34054 images and 68126 captions, 1000 validation identities with 3078 images and 6158 captions, and 1000 test identities with 3074 images and 6156 captions.
CUHK-SYSU and PRW are de facto datasets for image-based person search. LSPS is new to the community and contains many partial body bounding boxes, making it a specialised dataset to evaluate methods exploiting local discriminative features. CUHK-PEDES is the only text-based person search dataset, and new datasets may further advance research in this area. Dataset statistics are summarised in Table 4.
3.2 Evaluation Metrics
Cumulative matching characteristics (CMC top-K) and mean average precision (mAP) are the primary evaluation metrics for person search. In CMC top-K, a search is counted as successful if any of the top-K predicted bounding boxes overlaps a ground-truth box of the query identity with intersection-over-union (IoU) equal to or greater than 0.5. mAP is a popular evaluation metric in object detection, in which an average precision (AP) is calculated for each query person, and the final mAP is the mean of all APs.
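A minimal sketch of these metrics follows. The helper names are ours, and real evaluation code additionally handles gallery-size protocols and multiple ground-truth boxes per query; here `ranked_hits` is a list of 0/1 flags indicating whether each ranked prediction is a true match (IoU >= 0.5 with the query identity's ground truth).

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def cmc_top_k(ranked_hits, k):
    """CMC top-k for one query: 1 if any of the first k ranked
    predictions is a true match, else 0."""
    return int(any(ranked_hits[:k]))

def average_precision(ranked_hits, num_gt):
    """AP for one query: mean precision at each true-match rank,
    normalised by the number of ground-truth matches. mAP is the
    mean of these APs over all queries."""
    hits, precisions = 0, []
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / num_gt if num_gt else 0.0
```

For example, a query whose matches appear at ranks 1 and 3 out of two ground-truth boxes scores AP = (1/1 + 2/3) / 2 = 5/6 and CMC top-1 = 1.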
3.3 Performance Analysis
In this section, we summarise and analyse the evaluation results considering the three significant person search challenges discussed earlier, aiming to present the factors that contribute to person search performance. We don't discuss CNN backbones, as modern backbones such as ResNet50 and VGG perform similarly and are mostly interchangeable across methods.
We summarise the evaluation results of image-based person search methods in Table 2. We annotate feature types and loss functions used for metric learning along with the methods. Image-based person search faces the steep detection-identification inconsistency challenge. Therefore, we divide image-based person search methods into identity-driven detection and non-identity-driven detection methods to analyse the identity-driven detection solution’s effectiveness.
Methods specifically addressing the detection-identification inconsistency challenge, such as IGPN, RDLR and TCTS, outperform methods addressing detection and identification separately. Methods exploiting fine-grained discriminative features without considering the detection-identification inconsistency challenge don't have a clear edge over methods using global features. Our interpretation is that the query identity is present in the gallery images; therefore, the detected person needs to be consistent with the query identity for better query-person matching. For example, if the detected person features are free of noise, the query features should also be free of noise. Loss functions play critical roles in guiding feature representation learning, such as using a center loss on top of the OIM loss to bring the same identities closer and separate different identities. Knowledge distillation is a notably effective strategy for training the detection and identification models: KD-OIM, KD-QEEPS and DKD beat the corresponding baseline methods trained without knowledge distillation.
The performance of the text-based person search methods on CUHK-PEDES is summarised in Table 3. We include feature types and loss functions along with the methods. Text-based person search is essentially a text-image matching problem, and fine-grained discriminative features play a critical role in cross-modal matching. Recent methods exploiting fine-grained discriminative features with novel loss functions outperform methods using global features and vanilla Cross-Entropy loss. Specifically, ViTAA  exploiting local discriminative features via attribute-feature alignment achieves the best search results.
4 Discussion and Future Directions
In this survey, we review recent person search advances covering both image-based and text-based person search. While there have been remarkable achievements in the past few years, addressing the three significant person search challenges, namely discriminative features, the query-person gap and the detection-identification inconsistency, remains an open question. Next, we discuss a few future research directions.
Multi-modal person search. Existing works search by either image or text; none of them attempts a multi-modal approach in which the query image and query text complement each other. Multi-modal person search is handy when only a partial person image, such as a passport-sized photo, is available, while free text provides the rest of the body appearance. Specifically, the CUHK-PEDES dataset can be extended with annotated bounding boxes, so that it has both annotated bounding boxes and textual descriptions, making it a suitable candidate dataset for multi-modal person search.
Attribute-based person search. It is a big challenge for a machine to learn complex sentence syntax. Attribute-based person search method AIHM  outperforms the text-based method GNA-RNN  evaluated on cropped person images with attribute annotations. Therefore, it’s worthwhile to collect attribute annotated scene images and further advance attribute-based person search. The state-of-the-art text-based person search method ViTAA  decomposes textual description to attributes to learn fine-grained discriminative features. Attribute annotations may ease this process and subsequently improve text-based person search performance.
Zero-shot person search. Text-based person search is essentially a zero-shot learning problem, in which the query person is unseen during training.  formulates attribute-based person search as a zero-shot learning (ZSL) problem. In zero-shot learning, no training images of the unseen classes are available, and only semantic representations such as textual descriptions are available to infer them. Text-based person search can leverage the knowledge of zero-shot learning, such as using adversarially generated person features to augment the training data.
In this survey, we provide a systematic review of recent works on person search. For the first time, we survey papers on text-based person search, which is less investigated than image-based person search. We discuss highly regarded methods from the perspective of challenges and solutions. We summarise and compare person search methods' performance and provide the insight that a person search method needs to address the joint challenge of discriminative features, the query-person gap, and the detection-identification inconsistency. Finally, we discuss future research directions which may be of interest to incumbent and new researchers in the field.
-  (2018) RCAA: Relational Context-Aware Agents for Person Search. pp. 84–100. External Links: Cited by: §1, §2.1, Table 2.
-  (2018) Person Search via A Mask-guided Two-stream CNN Model. pp. 734–750. External Links: Cited by: Table 2.
-  (2020) Norm-Aware Embedding for Efficient Person Search. pp. 12615–12624. External Links: Cited by: §2.1, Table 2.
-  (2018) Improving Text-Based Person Search by Spatial Matching and Adaptive Threshold. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1879–1887. External Links: Cited by: §2.1, §2.2, Table 3.
-  (2017) Discriminative dictionary learning with ranking metric embedded for person re-identification. In IJCAI, Cited by: §1.
-  (2018) Deep feature learning via structured graph laplacian embedding for person re-identification. Pattern Recognit. 82, pp. 94–104. Cited by: §1.
-  (2019) Person Search by Text Attribute Query As Zero-Shot Learning. pp. 3652–3661. External Links: Cited by: §4, §4.
-  (2020) Bi-Directional Interaction Network for Person Search. pp. 2839–2848. External Links: Cited by: §2.1, Table 2.
-  (2020) Instance Guided Proposal Network for Person Search. pp. 2585–2594. External Links: Cited by: §2.3, Table 2.
-  (2019-12) Structure-aware person search with self-attention and online instance aggregation matching. Neurocomputing 369, pp. 29–38 (en). External Links: Cited by: §1, §2.2, Table 2.
-  (2019) Re-ID Driven Localization Refinement for Person Search. pp. 9814–9823. External Links: Cited by: §2.2, §2.3, Table 2.
-  (2019) End-to-End Detection and Re-identification Integrated Net for Person Search. In Computer Vision – ACCV 2018, C. V. Jawahar, H. Li, G. Mori, and K. Schindler (Eds.), Lecture Notes in Computer Science, Cham, pp. 349–364 (en). External Links: Cited by: §2.2, Table 2.
-  (2015) Distilling the Knowledge in a Neural Network. arXiv:1503.02531 [cs, stat]. Note: arXiv: 1503.02531 External Links: Cited by: §2.2.
-  (2019) Scale Voting With Pyramidal Feature Fusion Network for Person Search. IEEE Access 7, pp. 139692–139702. Note: Conference Name: IEEE Access External Links: Cited by: Table 2.
-  (2020-09) Person search: New paradigm of person re-identification: A survey and outlook of recent works. Image and Vision Computing 101, pp. 103970 (en). External Links: Cited by: Table 1, §1.
-  (2020) Cross-Modal Cross-Domain Moment Alignment Network for Person Search. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10675–10683. Note: ISSN: 2575-7075 External Links: Cited by: §2.2.
-  (2020) Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search. Proceedings of the AAAI Conference on Artificial Intelligence 34 (07), pp. 11189–11196 (en). Note: Number: 07 External Links: Cited by: §2.1, Table 3.
-  (2018) Person Search by Multi-Scale Matching. pp. 536–552. External Links: Cited by: §2.1, Table 2.
-  (2019-07) Fast Person Search Pipeline. In 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 1114–1119. Note: ISSN: 1945-788X External Links: Cited by: Table 2.
-  (2017) Identity-Aware Textual-Visual Matching With Latent Co-Attention. pp. 1890–1899. External Links: Cited by: §2.1, §2.2, §2.3, Table 3.
-  (2017) Person Search With Natural Language Description. pp. 1970–1979. External Links: Cited by: §1, §1, §2.1, Table 3, §2, §3.1, §4.
-  (2019) Domain-aware unsupervised cross-dataset person re-identification. In ADMA, Cited by: §1.
-  (2020) Unity style transfer for person re-identification. In CVPR, Cited by: §1.
-  (2018-04) A Discriminatively Learned Feature Embedding Based on Multi-Loss Fusion For Person Search. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1668–1672. Note: ISSN: 2379-190X External Links: Cited by: Table 2.
-  (2017) Neural Person Search Machines. pp. 493–501. External Links: Cited by: §1, §2.3, Table 2.
-  (2019-10) Deep Adversarial Graph Attention Convolution Network for Text-Based Person Search. In Proceedings of the 27th ACM International Conference on Multimedia, MM ’19, New York, NY, USA, pp. 665–673. External Links: Cited by: Table 3.
-  (2020) Pair-based uncertainty and diversity promoting early active learning for person re-identification. ACM Trans. Intell. Syst. Technol. 11 (2), pp. 21:1–21:15. Cited by: §1.
-  (2017) Early active learning with pairwise constraint for person re-identification. In ECML PKDD, Cited by: §1.
-  (2018) Semi-supervised bayesian attribute learning for person re-identification. In AAAI, S. A. McIlraith and K. Q. Weinberger (Eds.), Cited by: §1.
-  (2019-09) End-To-End Person Search Sequentially Trained On Aggregated Dataset. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 4574–4578. Note: ISSN: 2381-8549 External Links: Cited by: §2.3, Table 2.
-  (2019-09) Dhff: Robust Multi-Scale Person Search by Dynamic Hierarchical Feature Fusion. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 3935–3939. Note: ISSN: 2381-8549 External Links: Cited by: §2.1, Table 2.
-  (2019) Query-Guided End-To-End Person Search. pp. 811–820. External Links: Cited by: §2.3, Table 2.
-  (2019-09) Knowledge Distillation for End-to-End Person Search. arXiv:1909.01058 [cs]. Note: arXiv: 1909.01058 External Links: Cited by: §2.2, Table 2.
-  (2019) Adversarial Representation Learning for Text-to-Image Matching. pp. 5814–5824. External Links: Cited by: Table 3.
-  (2018-10) Instance Enhancing Loss: Deep Identity-Sensitive Feature Embedding for Person Search. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 4108–4112. Note: ISSN: 2381-8549 External Links: Cited by: §2.2, Table 2.
-  (2020) TCTS: A Task-Consistent Two-Stage Framework for Person Search. pp. 11952–11961. External Links: Cited by: §2.3, Table 2.
-  (2020) ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language. In Computer Vision – ECCV 2020, A. Vedaldi, H. Bischof, T. Brox, and J. Frahm (Eds.), Lecture Notes in Computer Science, Cham, pp. 402–420 (en). External Links: Cited by: §1, §2.1, §2.2, Table 3, §3.3, §4.
-  (2019-03) IAN: The Individual Aggregation Network for Person Search. Pattern Recognition 87, pp. 332–340 (en). External Links: Cited by: §1, Table 2.
-  (2017) Joint Detection and Identification Feature Learning for Person Search. pp. 3415–3424. External Links: Cited by: §1, §2.2, Table 2, §2, §3.1.
-  (2014-11) Person Search in a Scene by Jointly Modeling People Commonness and Person Uniqueness. In Proceedings of the 22nd ACM international conference on Multimedia, MM ’14, New York, NY, USA, pp. 937–940. External Links: Cited by: §1, §2.
-  (2019) Learning Context Graph for Person Search. pp. 2158–2167. External Links: Cited by: §2.1, Table 2.
-  (2017) Enhanced Deep Feature Representation for Person Search. In Computer Vision, J. Yang, Q. Hu, M. Cheng, L. Wang, Q. Liu, X. Bai, and D. Meng (Eds.), Communications in Computer and Information Science, Singapore, pp. 315–327 (en). External Links: Cited by: Table 2.
-  (2019) FMT: fusing multi-task convolutional neural network for person search. Multimedia Tools and Applications 78 (22), pp. 31605–31616 (en). External Links: Cited by: Table 2.
-  (2020-12) Diverse Knowledge Distillation for End-to-End Person Search. arXiv:2012.11187 [cs]. Note: arXiv: 2012.11187 External Links: Cited by: §2.2, §2.3, Table 2.
-  (2018) Deep Cross-Modal Projection Learning for Image-Text Matching. pp. 686–701. External Links: Cited by: §2.2, Table 3.
-  (2020-08) Segmentation mask guided end-to-end person search. Signal Processing: Image Communication 86, pp. 115876 (en). External Links: Cited by: §2.1, Table 2.
-  (2017) Person Re-Identification in the Wild. pp. 1367–1376. External Links: Cited by: §3.1.
-  (2020-05) Dual-path Convolutional Image-Text Embeddings with Instance Loss. ACM Transactions on Multimedia Computing, Communications, and Applications 16 (2), pp. 51:1–51:23. External Links: Cited by: §2.2, Table 3.
-  (2020) Robust Partial Matching for Person Search in the Wild. pp. 6827–6835. External Links: Cited by: §2.1, Table 2, §3.1.