Perspectives on individual animal identification from biology and computer vision

02/28/2021 ∙ by Maxime Vidal, et al.

Identifying individual animals is crucial for many biological investigations. In response to some of the limitations of current identification methods, new automated computer vision approaches have emerged with strong performance. Here, we review current advances of computer vision identification techniques to provide both computer scientists and biologists with an overview of the available tools and discuss their applications. We conclude by offering recommendations for starting an animal identification project, illustrate current limitations and propose how they might be addressed in the future.





The identification of specific individuals (footnote 1) is central to addressing many questions in biology: does a sea turtle return to its natal beach to lay eggs? How does a social hierarchy form through individual interactions? What is the relationship between individual resource-use and physical development? Indeed, the need for identification in biological investigations has resulted in the development and application of a variety of identification methods, ranging from genetic methods (1, 2), capture-recapture (3, 4), to GPS tracking (5) and radio-frequency identification (6, 7). While each of these methods is capable of providing reliable re-identification, each is also subject to limitations, such as invasive implantation or deployment procedures, high costs, or demanding logistical requirements. Image-based identification techniques using photos, camera-traps, or videos offer (potentially) low-cost and non-invasive alternatives. However, identification success rates of image-based analyses have traditionally been lower than the aforementioned alternatives.

Footnote 1: In publications, the terminology re-identification is often used interchangeably. In this review we posit that re-identification refers to the recognition of (previously) known individuals, hence we use identification as the more general term.

Using computer vision to identify animals dates back to the early 1990s and has developed quickly since (see Schneider et al. (8) for an excellent historical account). The advancement of new machine learning tools, especially deep learning for computer vision (9, 10, 8, 11, 12), offers powerful methods for improving the accuracy of image-based identification analyses. In this review, we introduce relevant background for animal identification with deep learning based on visual data, review recent developments, identify remaining challenges, and discuss the consequences for biology, including ecology, ethology, neuroscience, and conservation modeling. We aimed to create a review that can act as a reference for researchers who are new to animal identification and can also help current practitioners interested in applying novel methods to their identification work.

Biological context for identification

Conspecific identification is crucial for most animals to avoid conflict, establish hierarchy, and mate (e.g., (13, 14, 15)). For some species, it is understood how they identify other individuals; for instance, penguin chicks make use of the distinct vocal signature based on frequency modulation to recognize their parents within enormous colonies (16). However, for many species the mechanisms of conspecific identification are poorly understood. What is certain is that animals use multiple modalities to identify each other, from audition to vision and chemosensation (13, 14, 15). Much like animals use different sensors, techniques using non-visual data have been proposed for identification.

Figure 1: a, Animal biometrics examples featuring unique distinguishable phenotypic traits (adapted with permission). b, Three pictures each of three example tigers from the Amur Tiger reID Dataset (17) and three pictures each of three example bears from the McNeil River State Game Sanctuary (photo credit Alaska Department of Fish and Game). The tiger stripes are robust visual biometrics. The bear images highlight the variations across seasons (fur and weight changes). Postures and contexts vary more or less depending on the species and dataset. c, Machine Learning identification pipeline from raw data acquisition through feature extraction to identity retrieval.

From the technical point of view, the selection of characteristics for animal identification (termed biometrics) is primarily based on universality, uniqueness, permanence, measurability, feasibility and reliability (18). More specifically, reliable biometrics should display little intra-class variation and strong inter-class variation. Fingerprints, iris scans, and DNA analysis are some of the well-established biometric methods used to identify humans (18, 1, 2). However, other physical, chemical, or behavioral features such as gait patterns may be used to identify animals based on the taxonomic focus and study design (18, 19). For the purposes of this review, we will focus on visual biometrics and what is currently possible.

Visual biometrics: framing the problem

What are the key considerations for selecting potential “biometric” markers in images? We believe they are: (a) a strong differentiation among individuals based on their visible traits, and (b) the reliable presence of these permanent features in the species of interest within the study area. Furthermore, one should also consider whether the methods will be applied to a closed or open set (20). Consider a fully labeled dataset of unique individuals. In closed set identification, the problem consists of images of multiple, already known individuals, who shall be “found again” in (novel) images. In the more general case of open set identification, the (test) dataset may contain previously unseen individuals, thus permitting the formation of new identities. Depending on the application, both of these cases are important in biology and may require the selection of different computational methods.

Animal identification: the computer vision perspective

Some animals have specific traits, such as characteristic fur patterns, a property which greatly simplifies visual identification, while other species lack a salient, distinctive appearance (Figure 1a-b). Apart from visual appearance, additional challenges complicate animal identification, such as changes to the body over time, environmental changes and migration, deformable bodies, variability in illumination and view, as well as obstruction (Figure 1b).

Computational pipelines for animal identification consist of a sensor and modules for feature extraction, decision-making, and a system database (Figure 1c; (18)). Sensors, typically cameras, capture images of individuals which are transformed into salient, discriminative features by the feature extraction module. In computer vision, a feature is a distinctive attribute of the content of an image (at a particular location). Features might be e.g., edges, textures, or more abstract attributes. The decision-making module uses the computed features to identify the most similar known identities from the system database module, and in some cases, assign the individual to a new identity.
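The decision-making step described above can be illustrated with a minimal sketch. It assumes that feature extraction has already produced fixed-length feature vectors, that similarity is measured with the Euclidean distance, and that a fixed distance threshold decides when to open a new identity. The function names, the toy database, and the threshold value are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query, database, new_id_threshold):
    """Return (identity, distance) of the closest database entry.

    If even the closest entry is further away than `new_id_threshold`,
    the query is treated as a previously unseen individual ("new").
    """
    best_id, best_dist = None, float("inf")
    for identity, feature_vectors in database.items():
        for features in feature_vectors:
            dist = euclidean(query, features)
            if dist < best_dist:
                best_id, best_dist = identity, dist
    if best_dist > new_id_threshold:
        return "new", best_dist
    return best_id, best_dist


# Toy system database: identity -> feature vectors of known images.
database = {
    "tiger_A": [[0.0, 1.0], [0.1, 0.9]],
    "tiger_B": [[1.0, 0.0]],
}
known = identify([0.05, 0.95], database, new_id_threshold=0.5)
unknown = identify([5.0, 5.0], database, new_id_threshold=0.5)
```

Setting the threshold to infinity turns this into closed set identification, because every query is then assigned to one of the known identities.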

Computer vision pipelines for many other (so-called) tasks, such as animal localization, species classification and pose estimation, follow similar principles (see the box “Other relevant computer vision tasks” for more details on those systems). As we will illustrate below, many of these tasks also play an important role in identification pipelines; for instance, animal localization and alignment is a common component (see Figure 1c).

Box: Other relevant computer vision tasks

Deep learning has greatly advanced many computer vision tasks relevant to biology (9, 10, 8, 11, 12). For example:
Animal detection: A subset of object detection, the branch of computer vision that deals with the tasks of localizing and identifying objects in images or videos. Current state-of-the-art methods for object recognition usually employ anchor boxes, which represent the target location, size, and object class, as in EfficientDet (21), or, more recently, are end-to-end, as in DETR (22). Of particular interest for camera-trap data is the recently released MegaDetector (23), which is trained on more than 1 million labeled animal images and is actively updated. Also relevant for camera-traps, Beery et al. (24) developed attention-based detectors that can reason over multiple frames, integrating contextual information and thereby strongly improving performance. Various detectors have been used in animal identification pipelines (25, 26, 27); these, however, are no longer state-of-the-art on detection benchmarks.
Animal species classification: The problem of classifying species based on pictures (28, 29). As performance is correlated with the amount of training data, synthetic animals have recently been used to improve the classification of rare species, which is a major challenge (30).
Pose estimation: The problem of estimating the pose of an entity from images or videos. Algorithms can be top down, where the individuals are first localized, as in Wang et al. (31), or bottom up (without prior localization), as in Cheng et al. (32). Recently, several user-friendly and powerful software packages for pose estimation of animals with deep learning were developed, reviewed in Mathis et al. (12); real-time methods for closed-loop feedback are also available (33).
Alignment: In order to effectively compare similar regions and orientations, animals (in pictures) are often aligned using pose estimation or object recognition techniques.

In order to quantify identification performance, let us define the relevant evaluation metrics. These include top-N accuracy, i.e., the frequency of the true identity being within the N most confident predictions, and the mean average precision (mAP), defined in the Deep Learning terms glossary box. A perfect system would demonstrate a top-1 score and mAP of 100%. However, animal identification through computer vision is a challenging problem, and as we will discuss, algorithms typically fall short of this ideal performance. Research often focuses on one species (and dataset), a choice typically driven by the available data. Hence few benchmarks have been established. In addition to the varying difficulty of the different datasets, different evaluation methods and train-test splits are used, making the comparison between the different methods arduous.
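As an illustration of these two metrics, the following sketch computes top-N accuracy and mAP for a retrieval-style evaluation. It assumes that each query yields a list of gallery identities ordered from most to least similar, and it uses the common retrieval definition of average precision (the mean of the precision values at the ranks where a correct match occurs) as an approximation of the area under the precision-recall curve. All function names and the toy rankings are hypothetical.

```python
def top_n_accuracy(rankings, true_ids, n):
    """Fraction of queries whose true identity appears in the first n results."""
    hits = sum(1 for ranked, true in zip(rankings, true_ids) if true in ranked[:n])
    return hits / len(true_ids)


def average_precision(ranked_ids, true_id):
    """Mean of the precision values at every rank holding a correct match."""
    hits, precisions = 0, []
    for rank, candidate in enumerate(ranked_ids, start=1):
        if candidate == true_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings, true_ids):
    """Average precision averaged over all queries."""
    aps = [average_precision(r, t) for r, t in zip(rankings, true_ids)]
    return sum(aps) / len(aps)


# Two queries of individual "A"; each ranking lists gallery image identities.
rankings = [["A", "B", "A"], ["B", "A", "A"]]
true_ids = ["A", "A"]

top1 = top_n_accuracy(rankings, true_ids, n=1)
top2 = top_n_accuracy(rankings, true_ids, n=2)
m_ap = mean_average_precision(rankings, true_ids)
```

In this toy case only the first query has a correct match at rank 1, while both queries have one within the first two ranks, which shows how top-N accuracy increases with N.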

As reviewed by Schneider et al. (8), the use of computer vision for animal identification dates back to the early 1990s. This recent review also contains a comprehensive table summarizing the major milestones and publications. In the meantime the field has further accelerated and we provide a table with important animal identification datasets since its publication (Table 1).

In computer vision, features are the components of an image that are considered significant. In the context of animal identification pipelines (and computer vision more broadly), two classes of features can be distinguished. Handcrafted features are a class of image properties that are manually selected (a process known as “feature engineering”) and then used directly for matching, or computationally utilized to train classifiers. This stands in contrast to deep features, which are automatically determined using learning algorithms to train hierarchical processing architectures based on data (9, 11, 12). In the following sections, we will structure the review of relevant papers depending on the use of handcrafted and deep features. We also provide a glossary of relevant machine learning terms in the Deep Learning terms glossary box.


Method                  | Species            | Target    | Identities | Train Images | Test Images | Results
------------------------|--------------------|-----------|------------|--------------|-------------|------------------------
Chen et al. (34)        | Panda              | Face      | 218        | 5,845        | 402         | Top-1 *
Li et al. (17)          | Tiger (ATRW)       | Body      | 92         | 1,887        | 1,762       | Top-1, Top-5, mAP
Liu et al. (35)         | Tiger (ATRW)       | Body      | 92         | 1,887        | 1,762       | Top-1, Top-5, mAP
Moskvyak et al. (36)    | Manta Ray          | Underside | 120        | 1,380        | 350         | Top-1, Top-5
Moskvyak et al. (36)    | Humpback Whale     | Fluke     | 633        | 2,358        | 550         | Top-1, Top-5
Bouma et al. (37)       | Common Dolphin     | Fin       | 180        | 2,800        | 700         | Top-1, Top-5
Nepovinnykh et al. (38) | Saimaa Ringed Seal | Pelage    | 46         | 3,000        | 2,000       | Top-1, Top-5
Schofield et al. (39)   | Chimpanzee         | Face      | 23         | 3,249,739    | 1,018,494   | Frame-acc, Track-acc
Clapham et al. (40)     | Brown Bear         | Face      | 132        | 3,740        | 934         | Acc

  • Closed Set

  • Open Set

  • Single Camera Wild

Table 1: Recent animal identification publications and relevant data. This table extends the excellent list in Schneider et al. (8) by subsequent publications.

Handcrafted features

The use of handcrafted features is a powerful, classical computer vision method, which has been applied to many different species that display unique, salient visual patterns, such as zebras’ stripes (41), cheetahs’ spots (42), and guenons’ face marks (43) (Figure 1a). Hiby et al. (44) exploited the properties of tiger stripes to calculate similarity scores between individuals through a surface model of tigers’ skins. The authors report high model performance estimates (a top-1 score of and a top-5 score of on 298 individuals). It is notable that this technique performed well despite differences in camera angle of up to 66 degrees and image collection dates spanning 7 years, both of which serve to illustrate the strength of this approach. In addition to the feature descriptors used to distinguish individuals by fur patterns, these models may also utilize edge detectors, thereby allowing individual identification of marine species by fin shape. Indeed, Hughes and Burghardt (45) employed edge detection to examine great white shark fins by encoding fin contours with boundary descriptors. The authors achieved a top-1 score of , a top-10 score of , and a mAP of on 2456 images of 85 individuals (45). Similarly, Weideman et al. (46) used an integral curvature representation of cetacean flukes and fins to achieve a top-1 score of using 10,713 images of 401 bottlenose dolphins and a top-1 score of using 7,173 images of 3,572 humpback whales. Furthermore, work on great apes has shown that both global features (i.e., those derived from the whole image) and local features (i.e., those derived from small image patches) can be combined to increase model performance (47, 48). Local features were also used in Crouse et al. (49), who achieved top-1 scores of on a dataset of 462 images of 80 individual red-bellied lemurs. Prior to matching, the images were aligned with the help of manual eye markings.

Common handcrafted features like SIFT (50), which are designed to extract salient, invariant features from images, can also be utilized. Building upon this, instead of focusing on a single species, Crall et al. (51) developed HotSpotter, an algorithm able to use stripes, spots and other patterns for the identification of multiple species.

As these studies highlight, for species with highly discernible physical traits, handcrafted features have been shown to be accurate but often lack robustness. Deep learning has strongly improved the capabilities for animal identification, especially for species without clear visual traits. However, as we will discuss, hybrid systems that combine handcrafted features and deep learning have recently emerged.

Deep features

In the last decade, deep learning, a subset of machine learning in which decision-making is performed using learned features generated algorithmically (e.g., empirical risk minimization with labeled examples; see the Deep Learning terms glossary box), has emerged as a powerful tool to analyze, extract, and recognize information. This emergence is due, in large part, to increases in computing power, the availability of large-scale datasets, open-source and well-maintained deep learning packages, and advances in optimization and architecture design (9, 8, 11). Large datasets are ideal for deep learning, but data augmentation, transfer learning and other approaches reduce the thirst for data (9, 8, 11, 12). Data augmentation is a way to artificially increase dataset size by applying image transformations such as cropping, translating, and rotating, as well as by incorporating synthetic images (9, 12, 30). Since identification algorithms should be robust to those changes, augmentation often improves performance. Transfer learning is commonly used to benefit from pre-trained models (see the Deep Learning terms glossary box).
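To make the idea of data augmentation concrete, here is a minimal sketch that applies random crops, flips and 90-degree rotations to a toy image stored as a nested list. In practice one would use an image library, and the particular set of transformations is an assumption made for illustration; all function names are hypothetical.

```python
import random


def horizontal_flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def random_crop(img, height, width, rng):
    """Cut a (height x width) window at a random position."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img, crop_size, rng):
    """Return one randomly transformed copy; the identity label is unchanged."""
    out = random_crop(img, crop_size, crop_size, rng)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    for _ in range(rng.randint(0, 3)):
        out = rotate90(out)
    return out


# A 4x4 toy "image" whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
rng = random.Random(0)
samples = [augment(image, 3, rng) for _ in range(5)]
```

Whether a given transformation is appropriate depends on the biometric: for example, flipping may be undesirable when the patterns on the left and right flank of an animal differ.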

Through deep learning, models can learn multiple increasingly complex representations within their progressively deeper layers, and can achieve high discriminative power. Further, as deep features do not need to be specifically engineered and are learned correspondingly for each unique dataset, deep learning provides a potential solution for many of the challenges typically faced in individual animal identification. Such challenges include species with few natural markings, inconsistencies in markings (caused by changes in pelage, scars, etc.), low-resolution sensor data, odd poses, and occlusions. Two methods have been widely used for animal identification with deep learning: classification and metric learning.

Classification models

In the classification setting, a class (identity) from a set number of classes is probabilistically assigned to the input image. This assignment decision comes after the extraction of features, usually done by convolutional neural networks (ConvNets), a class of deep learning algorithms typically applied to image analyses. Note that the input to ConvNets can be the raw images, but also the processed handcrafted features. In one of the first appearances of ConvNets for individual animal classification, Freytag et al. (52) improved upon work by Loos and Ernst (48) by increasing the accuracy with which individual chimpanzees could be identified from two datasets of cropped face images (C-Zoo and C-Tai) from and to and . Freytag et al. (52) used linear support vector machines (SVM) to differentiate features extracted by AlexNet (53), a popular ConvNet, without the use of aligned faces. They also tackled additional tasks including sex prediction and age estimation. Subsequent work by Brust et al. (54) also used AlexNet features on cropped faces of gorillas, and SVMs for classification. They reported a top-5 score of with 147 individuals and 2,500 images. A similar approach was developed for elephants by Körschens et al. (55). The authors used the YOLO object detection network (25) to automatically predict bounding boxes around elephants’ heads (see the box “Other relevant computer vision tasks”). Features were then extracted with a ResNet50 (56) ConvNet, and projected to a lower-dimensional space by principal component analysis (PCA), followed by SVM classification. On a highly unbalanced dataset (i.e., highly uneven numbers of images per individual) consisting of 2078 images of 276 individuals, Körschens et al. (55) achieved a top-1 score of , and a top-10 score of . This increased to and for top-1 and top-10, respectively, when two images of the individual in question were used in the query. In practice, it is often possible to capture multiple images of an individual, for instance with camera traps, hence multi-image queries should be used when available.
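One simple way to realize a multi-image query is to combine the per-identity scores of several images that are known to show the same individual. The sketch below averages the scores; this aggregation rule is an assumption made for illustration and is not necessarily the one used by Körschens et al. (55). The identity names and score values are made up.

```python
def aggregate_scores(per_image_scores):
    """Average the identity scores over several images of one individual."""
    identities = per_image_scores[0].keys()
    n_images = len(per_image_scores)
    return {
        identity: sum(scores[identity] for scores in per_image_scores) / n_images
        for identity in identities
    }


def top_n(scores, n):
    """Identities sorted by decreasing score, truncated to the first n."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical classifier outputs for two images of the same elephant.
img1 = {"elephant_1": 0.40, "elephant_2": 0.45, "elephant_3": 0.15}
img2 = {"elephant_1": 0.70, "elephant_2": 0.20, "elephant_3": 0.10}

combined = aggregate_scores([img1, img2])
single_image_guess = top_n(img1, 1)
multi_image_guess = top_n(combined, 1)
```

Here the first image alone is ambiguous and favors the wrong identity, while the averaged scores favor elephant_1.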

Other examples of ConvNets for classification include work by Deb et al. (57), who explored both open- and closed-set identification for 3,000 face images of 129 lemurs, 1,450 images of 49 golden monkeys, and 5,559 images of 90 chimpanzees. The authors used manually annotated landmarks to align the faces, and introduced the PrimNet model architecture, which outperformed previous methods (e.g., Schroff et al. (58) and Crouse et al. (49) that used handcrafted features). Using this method, Deb et al. (57) achieved , and accuracy for lemurs, golden monkeys, and chimpanzees, respectively, for the closed-set. Finally, Chen et al. (34) demonstrated a face classification method for captive pandas. After detecting the faces with Faster-RCNN (26), they used a modified ResNet50 (56) for face segmentation (binary mask output), alignment (outputs are the affine transformation parameters), and classification. They report a top-1 score of on a closed set containing 6,441 images from 218 individuals and a top-1 score of on an open set of 176 individuals. Chen et al. (34) also used the Grad-CAM method (59), which propagates the gradient information from the last convolutional layers back to the image to visualize the neural networks’ activations, to determine that the areas around the pandas’ eyes and noses had the strongest impact on the identification process.

While the examples presented thus far have employed still images, videos have also been used for deep learning-based animal identification. Unlike single images, videos have the advantage that neighboring video frames often show the same individuals with slight variations in pose, view, and obstruction, among others. While collecting data, one can gather more images in the same time-frame (at the cost of higher storage). For videos, Schofield et al. (39) introduced a complete pipeline for the identification of chimpanzees, including face detection (with a single shot detector (27)), face tracking (Kanade-Lucas-Tomasi (KLT) tracker), sex and identity recognition (classification problem through modified VGG-M architectures (60)), and social network analysis. The video format of the data allowed the authors to maximize the number of images per individual, resulting in a dataset of 20,000 face tracks of 23 individuals. This amounts to 10,000,000 face detections, resulting in a frame-level accuracy of and a track-level accuracy of . The authors also use a confusion matrix to inspect which individuals were identified incorrectly and the reasons for these errors. Perhaps unsurprisingly, juveniles and (genetically) related individuals were the most difficult to separate. In follow-up work, Bain et al. (61) were able to predict identities of all individuals in a frame instead of predicting from face tracks. The authors showed that it is possible to use the activations of the last layer of a counting ConvNet (i.e., whose goal is to count the number of individuals in a frame) to find the spatial regions occupied by the chimpanzees. After cropping, the regions were fed into a fine-grained classification ConvNet. This resulted in similar identification precision compared to using only the face or the body, but a higher recall.

In laboratory settings, tracking is a common approach to identify individual animals (62, 7). Recent tracking systems, such as (63) and TRex (64), have demonstrated the ability to track individuals in large groups of lab animals (fish, mice, etc.) by combining tracking with an ID-classifying ConvNet.

(Deep) Metric learning

Most recent studies on identification have focused on deep metric learning, a technique that seeks to automatically learn how to measure similarity and distance between deep features. Deep metric learning approaches commonly employ methods such as siamese networks or triplet loss (see the Deep Learning terms glossary box). In a recent study considering a diverse group of five different species (humans, chimpanzees, humpback whales, fruit flies, and Siberian tigers), Schneider et al. (65) found that triplet loss always outperformed the siamese approach; they also tested many different ConvNets, and metric learning always gave better results. Importantly, metric learning frameworks are naturally able to handle open datasets, thereby allowing for both re-identification of a known individual and the discovery of new individuals.
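The two loss functions named above can be sketched on plain embedding vectors. The forms below are the widely used margin-based versions (a contrastive loss for pairs and a naive triplet loss), with the Euclidean distance and the margin values chosen as assumptions for illustration; in a real system the embeddings would come from a ConvNet and the losses would be minimized by gradient descent.

```python
import math


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb1, emb2, dissimilar, margin=1.0):
    """Pull similar pairs together; push dissimilar pairs beyond the margin."""
    d = distance(emb1, emb2)
    if dissimilar:
        return 0.5 * max(0.0, margin - d) ** 2
    return 0.5 * d ** 2


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is further from the anchor than the positive by the margin."""
    return max(0.0, margin + distance(anchor, positive) - distance(anchor, negative))


anchor, positive = [0.0, 0.0], [0.0, 1.0]
easy = triplet_loss(anchor, positive, [0.0, 3.0])   # negative is far away
hard = triplet_loss(anchor, positive, [0.0, 1.1])   # negative is almost as close
pair_similar = contrastive_loss([0.0, 0.0], [3.0, 4.0], dissimilar=False)
pair_dissimilar = contrastive_loss([0.0, 0.0], [0.0, 0.5], dissimilar=True)
```

Because both losses only shape distances between embeddings, an individual that was never seen during training can still be embedded and compared, which is why these approaches suit open set identification.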

Competitions often spur progress in computer vision (11, 12). In 2019 the first large-scale benchmark for animal identification was released (example images in Figure 1b); it poses two identification challenges on the ATRW tiger dataset: plain, where images of tigers are cropped and normalized with manually curated bounding boxes and poses, and wild, where the tigers first have to be localized and then identified (17).

Box: Deep Learning terms glossary

Machine and deep learning: Machine learning seeks to develop algorithms that automatically detect patterns in data. These algorithms can then be used to uncover patterns, to predict future data, or to perform other kinds of decision making under uncertainty (66). Deep learning is a subset of machine learning that utilizes artificial neural networks with multiple layers as part of the algorithms. For computer vision problems, convolutional neural networks (ConvNets) are the de-facto standard building blocks. They consist of stacked convolutional filters with learnable weights (i.e., connections between computational elements). Convolutions bake translation invariance into the architecture and decrease the number of parameters due to weight sharing, as opposed to ordinary fully-connected neural networks (53, 9, 56).

Support vector machines (SVM): A powerful classification technique, which learns a hyperplane to separate data points in feature spaces. Nonlinear SVMs also exist.
Principal component analysis (PCA): An unsupervised technique that identifies a lower dimensional linear space, such that the variance of the projected data is maximized.

Classification network: A neural network that directly predicts the class of an object from inputs (e.g., images). The outputs have a confidence score as to whether they correspond to the target. Often trained with a cross entropy loss, or other prediction error based losses (53, 60, 56).
Metric learning: A branch of machine learning which consists in learning how to measure similarity and distance between data points (68); common examples include siamese networks and triplet loss.
Siamese networks: Two identical networks that consider a pair of inputs and classify them as similar or different, based on the distance between their embeddings. They are often trained with a contrastive loss, a distance-based loss which pulls positive (similar) pairs together and pushes negative (different) pairs away:

$$\mathcal{L}(W, Y, X_1, X_2) = (1 - Y)\,\frac{1}{2}\,D_W^2 + Y\,\frac{1}{2}\,\max(0,\, m - D_W)^2$$

where $D_W$ is any metric function parametrized by $W$, $Y$ is a binary variable that represents if $(X_1, X_2)$ is a similar ($Y = 0$) or dissimilar ($Y = 1$) pair, and $m$ is a margin (69).
Triplet loss: As opposed to pairs in siamese networks, this loss uses triplets; it tries to bring the embedding of the anchor image closer to another image of the same class than to an image of a different class by a certain margin. In its naive form

$$\mathcal{L} = \max(0,\, m + D_{a,p} - D_{a,n})$$

where $m$ is the margin and $D_{a,p}$ ($D_{a,n}$) is the distance from the anchor image to its positive (negative) counterpart. As shown in Hermans et al. (70), models with this loss are difficult to train, and triplet mining (heuristics for the most useful triplets) is often used. One solution is semi-hard mining, e.g., showing moderately difficult samples in large batches, as in Schroff et al. (58). Another more efficient solution is the batch hard variant introduced in (70), where one samples multiple images for a few classes, and then keeps the hardest positive (i.e., furthest in the feature space) and the hardest negative (i.e., closest in the feature space) for each anchor to compute the loss. Mining the easy positives (very similar pairs) has recently been shown to obtain good results (71).
Mean average precision (mAP): With precision defined as $\frac{TP}{TP + FP}$ (TP: true positives, FP: false positives), and recall defined as $\frac{TP}{TP + FN}$ (FN: false negatives), the average precision is the area under the precision-recall curve (see Murphy (67) for more information), and the mAP is the mean for all queries.
Transfer learning: The process in which models are initialized with features trained on a (related) large-scale annotated dataset and then finetuned on the target task. This is particularly advantageous when the target dataset consists of only a few labeled examples (72, 12). ImageNet is a large-scale object recognition data set that was particularly influential for transfer learning. As we outline in the main text, many methods use ConvNets pre-trained on ImageNet such as AlexNet (53), VGG (60), and ResNet (56).
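The batch hard idea from the glossary can be sketched as follows: for every anchor in a batch, only the furthest image of the same identity and the closest image of a different identity enter the triplet loss. The Euclidean distance, the margin value, and the toy batch are assumptions for illustration; the function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean triplet loss using the hardest positive and negative per anchor."""
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        positives = [
            distance(anchor, emb)
            for j, emb in enumerate(embeddings)
            if j != i and labels[j] == label
        ]
        negatives = [
            distance(anchor, emb)
            for j, emb in enumerate(embeddings)
            if labels[j] != label
        ]
        if not positives or not negatives:
            continue  # this anchor cannot form a valid triplet
        hardest_positive = max(positives)  # furthest image of the same identity
        hardest_negative = min(negatives)  # closest image of another identity
        losses.append(max(0.0, margin + hardest_positive - hardest_negative))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two identities with two images each (one-dimensional layout).
embeddings = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [0.0, 5.0]]
labels = ["A", "A", "B", "B"]
loss = batch_hard_triplet_loss(embeddings, labels)
```

Only the two images that lie close to the other identity contribute a non-zero term here, which is the point of mining: training effort is concentrated on the confusable cases.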

The authors of the benchmark also evaluated various baseline methods and showed that metric learning was better than classification. Their strongest method was a pose part-based model, which, based on a pose estimation subnetwork, processes the tiger image in 7 parts to get different feature representations and then uses triplet loss for the global and local representations. On the single camera wild setting, the authors reported a mAP of , a top-1 score of and a top-5 score of - from 92 identities in 8,076 videos (17). Fourteen teams submitted methods, and the best contribution to the competition developed a novel triple-stream framework (35). The framework has a full image stream together with two local streams (one for the trunk and one for the limbs, which were localized based on the pose skeleton) as an additional task. However, the part streams are only required during training, which, given that pose estimation can be noisy, is particularly fitting for tiger identification in the wild. Liu et al. (35) also increased the spatial resolution of the ResNet backbone (56). Higher spatial resolution is also commonly used for other fine-grained tasks such as human re-identification, segmentation (74) and pose estimation (32, 12). With these modifications, the authors achieved a top-1 score of for single-camera wild-ID, and a score of across cameras.

Metric learning has also been used for mantas with semi-hard triplet mining (36). Human-assembled photos of mantas’ undersides (where they have unique spots) were fed as input to a ConvNet. Once the embeddings were created, Moskvyak et al. (36) used the k-nearest neighbors algorithm (k-NN) for identification. The authors achieved a top-1 score of and top-5 of using a dataset of 1730 images of 120 mantas. Replicating the method for humpback whales’ flukes, the authors report a top-1 score of and a top-5 score of using 2908 images of 633 individual whales. Similarly, Bouma et al. (37) used batch hard triplet loss to achieve top-1 and top-5 scores of and , respectively, on 3,544 images of 185 common dolphins. When using an additional 1200 images as distractors, the authors reported a drop of in the top-1 score and in the top-5 score. The authors also explored the impact of increasing the number of individuals and the number of images per individual, both leading to score increases. Nepovinnykh et al. (38) applied metric learning to re-identify Saimaa ringed seals. After segmentation with DeepLab (74) and subsequent cropping, the authors extracted pelage pattern features with a Sato tubeness filter, used as input to their network. Indeed, Bakliwal and Ravela (75) also showed that, for some species, priming ConvNets with handcrafted features produced better results than using the raw images. Instead of using k-NNs, Nepovinnykh et al. (38) adopted topologically aware heatmaps to identify individual seals: both the query image and the database images are split into patches whose similarity is computed, and among the most similar, topological similarity is checked through angle difference ranking. For 2,000 images of 46 seals, the authors achieved a top-1 score of and a top-5 score of . Overall, these papers highlight that recent work has combined handcrafted and deep learning approaches to boost performance.
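Identification with k-NN on learned embeddings can be sketched as a nearest-neighbor search followed by a majority vote. The gallery below is a toy stand-in for ConvNet embeddings of labeled images; the Euclidean distance, the value of k, and the voting rule are assumptions for illustration rather than the exact procedure of Moskvyak et al. (36).

```python
import math
from collections import Counter


def knn_identify(query, gallery, k=3):
    """Return the majority identity among the k nearest gallery embeddings.

    `gallery` is a list of (embedding, identity) pairs. The full ranking of
    gallery identities is returned as well, so that top-N scores can be derived.
    """
    ranked = sorted(gallery, key=lambda item: math.dist(query, item[0]))
    votes = Counter(identity for _, identity in ranked[:k])
    predicted = votes.most_common(1)[0][0]
    return predicted, [identity for _, identity in ranked]


# Toy gallery of labeled embeddings (values are made up).
gallery = [
    ([0.0, 0.0], "manta_1"),
    ([0.1, 0.0], "manta_1"),
    ([1.0, 1.0], "manta_2"),
    ([0.2, 0.1], "manta_2"),
]
predicted, ranking = knn_identify([0.05, 0.0], gallery, k=3)
```

Distractor images, as used by Bouma et al. (37), would simply be additional gallery entries that the query must be ranked ahead of.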

Applications of animal identification in field and laboratory settings (footnote 3)

Footnote 3: For the purposes of this review, we forgo discussion of individual identification in the context of the agricultural sciences, as circumstances differ greatly in those environments. However, we note that there is an emerging body of computer vision for the identification of livestock (76, 77).

Here, we discuss the use of computer vision techniques for animal identification from a biological perspective and offer insights on how these techniques can be used to address broad and far-reaching biological and ecological questions. In addition, we stress that the use of semi-automated or full deep learning tools for animal identification is in its infancy and current results need to be evaluated in comparison with the logistical, financial, and potential ethical constraints of conventional tagging and genetic sampling methods.

Although the specific goals for animal identification can vary greatly among studies and settings, objectives can generally be classified into two categories, applied and etiological, based on rationale, intention, and study design.

Applied uses include those with the primary aims of describing, characterizing, and monitoring observed phenomena, including species distribution and abundance, animal movements and home ranges, or resource selection (78, 45, 79). These studies frequently adopt a top-down perspective in which the predominant focus is on groups (e.g., populations), with individuals simply viewed as units within the group, and minimal interpretation of individual variability. As such, many of the modeling techniques employed for applied investigations, such as mark-recapture (3, 4), are adept at incorporating quantified uncertainty in identification. However, reliable identification of individuals in applied studies is essential to accurate enumeration and differentiation, when creating generalized models based on individual observations (80).

As such, misidentification can introduce bias, with substantial consequences for biological interpretations and conclusions. For example, Johansson et al. (81) demonstrated the potential ramifications of individual misclassification on capture-recapture derived estimates of population abundance using camera trap photos of captive snow leopards (Panthera uncia). The authors employed a manual identification method wherein human observers were asked to identify individuals in images based on pelage patterns. Results indicated that observer misclassification resulted in population abundance estimates that were inflated by up to one third. Hupman et al. (82) also noted the potential for individual misidentification to result in under- or overestimation of abundance in a study exploring the use of photo-based mark-recapture for assessing population parameters of common dolphins (Delphinus sp.). The authors found that inclusion of less distinctive individuals, for which identification was more difficult, resulted in seasonal abundance estimates that were substantially different (sometimes lower and sometimes higher) than when using photos of distinctive individuals only.
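The mechanism behind such inflation can be illustrated with a small worked example. The sketch below uses the textbook two-session Lincoln-Petersen estimator, which is an assumption chosen for simplicity and not necessarily the model used in the cited studies; all counts are hypothetical. When a re-sighted animal is mistaken for a new individual, the number of recaptures drops and the abundance estimate rises.

```python
def lincoln_petersen(n_first, n_second, n_recaptured):
    """Two-session abundance estimate: N = (n1 * n2) / m."""
    if n_recaptured <= 0:
        raise ValueError("at least one recapture is needed for this estimator")
    return n_first * n_second / n_recaptured


# Hypothetical survey: 20 animals identified in session 1, 20 seen in session 2,
# 10 of which are true re-sightings of session-1 animals.
correct_estimate = lincoln_petersen(20, 20, 10)

# Suppose 2 of the 10 re-sightings are misidentified as new individuals,
# so only 8 recaptures are recorded.
biased_estimate = lincoln_petersen(20, 20, 8)

inflation = biased_estimate / correct_estimate - 1
print(correct_estimate, biased_estimate, f"{inflation:.0%}")
```

Under these assumed counts, missing two recaptures raises the estimate from 40 to 50 animals. Errors in the opposite direction (two different animals merged into one identity) would add spurious recaptures and bias the estimate downward.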

Many other questions, such as identifying the social hierarchy from passive observation, demand highly accurate identity tracking (7, 39). Weissbrod et al. (7) showed that due to the fine differences in social interactions even high identification rates of can have measurable effects on results (as social hierarchy requires integration over long time scales). Though the current systems are not perfect, they can already outperform experts. For instance, Schofield et al. (39) demonstrated (on a test set, for the frame-level identification task) that both novices (around ) and experts (around ) are outperformed by their system that reaches , while only taking 60ms vs. 130min and 55min, for novices and experts, respectively.

These studies demonstrate the need for researchers to 1) be aware of the specific implications of potential errors in individual identification for their study conclusions, and 2) choose an identification method that minimizes misclassification to the extent practicable given their specific objectives and study design. The techniques described in this review have already helped lower identification error rates, and for some applications they already reach sufficient accuracy (e.g., for conservation and management (83, 49, 84, 39, 85), neuroscience and ethology (63, 64) and public engagement in zoos (86)). However, for many contexts, they have yet to reach the levels of precision associated with other applied techniques.

For comparison, genetic analyses are the highest current standard for individual identification in applied investigations. While genotyping error rates caused by allelic dropouts, null alleles, false alleles, etc. can vary between and per locus (87), genetic analyses combine numerous loci to reach individual identification error rates of  (88, 89). We stress that apart from accuracy many other variables should be considered, such as the relatively high logistical and financial costs associated with collecting and analyzing genetic samples, and the requirement to resample for re-identification. This results in sample sizes that are orders of magnitude smaller than many of the studies described above, with attendant decreases in explanatory/predictive power. Further, repeated invasive sampling may directly or indirectly affect animal behavior. Minimally invasive sampling (MIS) techniques using feces, hair, feathers, remote skin biopsies, etc. offer the potential to conduct genetic identification in a less intrusive and less expensive manner (90). MIS analyses are, however, vulnerable to genotyping errors associated with sample quality, with potential consequent ramifications to genotyping success rates (e.g. , , and for Fluidigm SNP type assays of wolf feces, wildcat hair, and bear hair, respectively;  Carroll et al. (90) and references therein). These challenges, coupled with the increasing success rates and low financial and logistical costs of computer vision analyses, may effectively narrow the gap when selecting an identification technique. Further, in some scenarios the acceptable level of analytical error can be relaxed without compromising the investigation of specific project goals, in which case biologists may find that current computer vision techniques are sufficiently robust to address applied biological questions in a manner that is low cost, logistically efficient, and can make use of pre-existing and archival images and video footage.

Unlike their applied counterparts, etiological uses of individual identification do not seek to describe and characterize observed phenomena, but rather, to understand the mechanisms driving and influencing observed phenomena. This may include questions related to behavioral interactions, social hierarchies, mate choice, competition, altruism, etc. (e.g., (91, 92, 7, 62)). Etiological studies are frequently based on a bottom-up perspective, in which the focus is on individuals, or the roles of individuals within groups, and interpretations of individual variability often play predominant roles (93). As such, etiological investigations may seek to identify individuals in order to derive relationships among individuals, interpret outcomes of interactions between known individuals, assess and understand individuals’ roles in interactions or within groups, or characterize individual behavioral traits (94, 95, 96, 39). These studies are commonly done in laboratory settings, which presents some study limitations. The ability to record data and assign it to an individual in the wild may be crucial to understand the origin and development of personality (97, 98). Characterizing behavioral variability of individuals is of great importance for understanding behavior (99). This has been highlighted in a meta-analysis that showed that a third of behavioral variation among individuals could be attributed to individual differences (100). The impact of repeatedly measuring single individuals can also be illustrated in the context of brain mapping. Repeated sampling of human individuals with fMRI is revealing fine-grained features of functional organization, which were previously unseen due to variability across the population (101). Overall, longitudinal monitoring of single individuals with powerful techniques such as omics (102) and brain imaging (103) is heralding an exciting age for biology.

Starting an animal identification project

For biological practitioners seeking to make sense of the possibilities offered by computer vision, the importance of inter-disciplinary collaborations with computer scientists cannot be overstated. Since the advent of high definition camera traps, some scientists find they have hours of opportunistically collected footage, without a direct line of inquiry motivating the data collection. Collaboration with computer scientists can help to ensure the most productive analytical approach to using this footage to derive biological insights. Further, by instituting collaborations early in the study design process, computer scientists can assist biologists in implementing image collection protocols that are specifically designed for use with deep learning analyses.

General considerations for starting an image-based animal identification project, such as which features to focus on, are nicely reviewed by Kühl and Burghardt (19). Although handcrafted features can be suited for certain species (e.g., zebras), deep learning has proven to be a more robust and general framework for image-based animal identification. However, at least a few thousand images with ideally multiple examples of each individual are needed, constituting the biggest limitation to obtaining good results. As such, data collection is a crucial part of the process. Discussion between biologists and computer scientists is fundamental and should be engaged before data collection. As previously mentioned, camera traps (104, 4) can be used to collect data on a large spatial scale with little human involvement and less impact on animal behavior. Images from camera traps can be used both for model training and monitored for inference. The ability of camera traps to record multiple photos/videos of an individual allows multiple streams of data to be combined to enhance the identification process (as for localization (24)). Further, camera traps minimize the potential influence of humans on animal behavior as seen in Schneider et al. (8).

Following image collection, researchers should employ tools to automatically sieve through the data and localize animals in pictures. Recent powerful detection models (Beery et al. (23, 24)), trained on large-scale datasets of annotated images, are becoming available and generalize reasonably well to other datasets (see Box). Those or other object detection models can be used out-of-the-box or finetuned to create bounding boxes around faces or bodies (25, 26, 27), which can then be aligned by using pose estimation models (12). Additionally, animal segmentation for background removal/identification can be beneficial.

Most methods require an annotated dataset, which means that one needs to label the identity of different animals on example frames; unsupervised methods are also possible (e.g., (51, 105)). To start animal identification, a baseline model using triplet loss should be tried, which can be improved with different data augmentation schemes, combined with a classification loss and/or expanded into more multi-task models. If attempting the classification approach, assigning classes to previously unseen individuals is not straightforward. Most works usually add a node for "unknown individual". The evaluation pipeline to monitor the model’s performance has to be carefully designed to account for the way in which it will be used in practice. Of particular importance is how to split the dataset between training and testing subsets to avoid data leakage.
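One common source of data leakage is that consecutive frames from the same encounter (for example, a single camera-trap burst) are nearly identical, so a random per-image split places near-duplicates in both training and testing subsets. The sketch below shows one possible remedy, assuming each image record carries a hypothetical `encounter_id`: whole encounters are assigned to one subset or the other. The record layout, function name, and default fraction are illustrative assumptions.

```python
import random
from collections import defaultdict


def split_by_encounter(records, test_fraction=0.25, seed=0):
    """Split (image_path, individual_id, encounter_id) records so that
    no encounter contributes images to both the training and testing subsets."""
    groups = defaultdict(list)
    for record in records:
        groups[record[2]].append(record)

    # Shuffle encounters (not images) reproducibly, then hold some out.
    encounter_ids = sorted(groups)
    random.Random(seed).shuffle(encounter_ids)
    n_test = max(1, round(len(encounter_ids) * test_fraction))
    test_ids, train_ids = encounter_ids[:n_test], encounter_ids[n_test:]

    train = [r for e in train_ids for r in groups[e]]
    test = [r for e in test_ids for r in groups[e]]
    return train, test


# Hypothetical records: 16 images, 3 individuals, 8 encounters of 2 frames each.
records = [(f"img_{i}.jpg", f"seal_{i % 3}", f"enc_{i // 2}") for i in range(16)]
train, test = split_by_encounter(records, test_fraction=0.25, seed=1)
print(len(train), len(test))
```

This sketch does not guarantee that every individual appears in the training subset, which a closed-set classifier would need; a practical pipeline would add that check or stratify the encounters by individual.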

Also, how well a network trained with photos from professional DSLR cameras can generalize to images with largely different quality, e.g., camera traps, must be determined. In our experience this is typically not ideal, which is why it is important to get results from different cameras during training, if generalization is important. Ideally, one trains the model with the type of data that is used during deployment. However, there are also computational methods to deal with this. For human reidentification, Zhong et al. (106) used CycleGAN to transfer images from one camera style to another, although camera traps are perhaps too different. The generalization to other (similar) species is also a path to explore.

Other aspects to consider are the efficiency of models, even if identification is usually in an offline setting. Also, adding a “human-in-the-loop” approach, if the model does not perform perfectly, can still save time relative to a fully manual approach. For other considerations necessary to build a production ready system, readers are encouraged to look at Duyck et al. (84), who created Sloop, with subsequent deep learning integration by Bakliwal and Ravela (75), used for the identification of multiple species. Furthermore, Berger-Wolf et al. (85) implemented different algorithms such as HotSpotter (51) in the Wild Me platform, which is actively used to identify a variety of species.
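A simple form of the human-in-the-loop idea is to accept only confident predictions automatically and queue the rest for manual review. The following sketch assumes the model outputs a confidence value between 0 and 1 for each prediction; the function name, the tuple layout, and the threshold of 0.9 are hypothetical and would need to be tuned on validation data.

```python
def route_predictions(predictions, threshold=0.9):
    """Separate (image_id, predicted_identity, confidence) tuples into
    automatically accepted predictions and those needing human review."""
    accepted, needs_review = [], []
    for image_id, identity, confidence in predictions:
        if confidence >= threshold:
            accepted.append((image_id, identity, confidence))
        else:
            needs_review.append((image_id, identity, confidence))
    return accepted, needs_review


# Hypothetical model outputs.
predictions = [
    ("frame_001", "bear_07", 0.97),
    ("frame_002", "bear_12", 0.55),
    ("frame_003", "bear_07", 0.90),
]
accepted, needs_review = route_predictions(predictions)
print(len(accepted), "accepted;", len(needs_review), "sent to an expert")
```

Lowering the threshold reduces the manual workload at the cost of more unchecked errors, so the choice depends on how sensitive the study conclusions are to misidentification.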

Beyond image-based identification

As humans are highly visual creatures, it is intuitive that we gravitate to image-based identification techniques. Indeed, this preference may offer few drawbacks for applied uses of individual identification in which the researcher’s perspective is the primary lens through which discrimination and identification will occur. However, the interpretive objectives of etiological uses of identification add an additional layer of complexity that may not always favor a visually based method. When seeking to provide inference on the mechanisms shaping individual interactions, etiological applications must both 1) satisfy the researcher’s need to correctly identify known individuals, and 2) attempt to interpret interactions based on an understanding of the sensory method by which the individuals in question identify and re-identify conspecifics (107, 108, 109).

Different species employ numerous mechanisms to engage in conspecific identification (e.g., olfactory, auditory, chemosensory (13, 14, 15)). For example, previous studies have noted that giant pandas use olfaction for mate selection and assessment of competitors (110, 15). Conversely, Schneider et al. (111) showed that Drosophila, which were previously assumed not to be strongly visually based, were able to engage in successful visual identification of conspecifics. Thus, etiological applications that seek to find mechanisms of animal identification must consider both the perspectives of the researcher and the individuals under study (much like Uexküll’s concept of Umwelt (112)), and researchers must embrace their roles as both observers and translators attempting to reconcile potential differences between human and animal perspectives.

Just as animals identify each other with different senses, future methods could also focus on other forms of data. Indeed, deep learning is not just revolutionizing computer vision, but problems as diverse as finding novel antibiotics (113) and protein folding (114). Thus, we believe that deep learning will also strongly impact identification techniques for non-visual data and make those techniques both logistically feasible and sufficiently non-invasive so as to limit disturbances to natural behaviors. Previous studies have employed techniques that are promising. For example, acoustic signals were used by Marin-Cudraz et al. (80) for counting of rock ptarmigan, and by Stowell et al. (115) in an identification method which seems to generalize to multiple bird species. Furthermore, Kulahci et al. (116) used deep learning to describe individual identification using olfactory-auditory matching in lemurs. However, this research was conducted on captive animals and further work is required to allow for application of these techniques in wild settings.

Conclusions and outlook

Recent advances in computational techniques, such as deep learning, have enhanced the proficiency of animal identification methods. Further, end-to-end pipelines have been created, which allow for the reliable identification of specific individuals, with, in some cases, better than human-level performance. As most methods follow a supervised learning approach, the expansion of datasets is crucial for the development of new models, as is collaboration between computer science and biological teams in order to understand the questions relevant to both fields. We hope this review has shown that lines of inquiry familiar to one group may previously have been unknown to the other, and that interdisciplinary collaboration offers a path for future methodological developments that are analytically nimble and powerful, but also applicable, dependable, and practicable for addressing real-world phenomena.

As we have illustrated, recent advances have contributed to the deployment of some methods, but many challenges remain. For instance, individual identification of unmarked, featureless animals such as brown bears or primates has not yet been achieved for hundreds of individuals in the wild. Likewise, discrimination of close siblings remains a challenging computer vision individual identification problem. How can the performance of animal individual identification methods be further improved?

Since considerably more attention and effort has been devoted to the computer vision question of human identification, versus animal identification, this vast literature can be used as a source of inspiration for improving animal individual identification techniques. Many human identification studies experiment with additional losses in a multi-task setting. For instance, whereas triplet loss maximizes inter-class distance, the center loss minimizes intra-class distance, and can be used in combination with the former to pull samples of the same class closer together (117). Further, human identification studies demonstrate the use of spatio-temporal information to discard impossible matches (118). This idea could be used if an animal has just been identified somewhere and cannot possibly be at another distant location (using camera traps’ timestamps and GPS); this concept is also employed in occupancy modeling. Re-ranking the predictions has also been employed to improve performance in human-based studies using metric learning (119). This approach aggregates the losses with an additional re-ranking based distance. Appropriate augmentation techniques can also boost performance (120). In order to overcome occlusions, one can randomly erase rectangles of random pixels and random size from images in the training data set.
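The spatio-temporal idea described above can be sketched as a feasibility check: given the last confirmed sighting of an individual, a candidate match is discarded if reaching the new location in the elapsed time would require an implausible travel speed. The example assumes each sighting carries a latitude, longitude, and timestamp, and that a species-specific maximum speed is known; the function names and the speed value are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_feasible_match(last_sighting, candidate, max_speed_kmh):
    """Return False when the animal could not have travelled between the two
    (lat, lon, timestamp) sightings at or below the assumed maximum speed."""
    lat1, lon1, t1 = last_sighting
    lat2, lon2, t2 = candidate
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    return haversine_km(lat1, lon1, lat2, lon2) <= max_speed_kmh * hours


# Hypothetical camera-trap sightings one degree of latitude apart (about 111 km).
last = (60.0, -150.0, datetime(2021, 1, 1, 0, 0))
one_hour_later = (61.0, -150.0, datetime(2021, 1, 1, 1, 0))
one_day_later = (61.0, -150.0, datetime(2021, 1, 2, 0, 0))

print(is_feasible_match(last, one_hour_later, max_speed_kmh=10))  # impossible
print(is_feasible_match(last, one_day_later, max_speed_kmh=10))   # plausible
```

Such a filter only removes candidates and never confirms an identity, so it is best applied before ranking the remaining matches by embedding distance.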

Applications involving human face recognition have also contributed significantly to the development of identification technologies. Human face datasets typically contain orders of magnitude more data (thousands of identities and many more images, e.g., the YouTube Faces dataset (121)) than those available for other animals. One of the first applications of deep learning to human face recognition was DeepFace, which used a classification approach (122). This was followed by Deep Face Recognition, which implemented a triplet loss bootstrapped from a classification network (123) and FaceNet by Schroff et al. (58), which used triplet loss with semi-hard mining on large batches. FaceNet achieved a top-1 score of when applied to the YouTube Faces dataset. Some methods also showed promise for unlabeled datasets; Otto et al. (105) proposed an unsupervised method to cluster millions of faces with an approximate rank order metric. We note that this research also raises ethical concerns (124). Finally, benchmarks are important for advancing research and fortunately they are emerging for animal identification (17), but more are needed.
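For readers unfamiliar with the triplet loss and semi-hard mining mentioned above, the following minimal sketch shows the idea on toy vectors. It follows the usual formulation in which a negative is semi-hard when it is farther from the anchor than the positive but still within the margin; the function names, the margin of 0.5, and the 2-D embeddings are hypothetical, and a real system would compute this over batches inside a deep learning framework.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(d_anchor_positive, d_anchor_negative, margin):
    """Hinge loss that is zero once the negative is farther than the positive by the margin."""
    return max(0.0, d_anchor_positive - d_anchor_negative + margin)


def semi_hard_negatives(anchor, positive, negatives, margin):
    """Keep negatives that are farther than the positive but still violate the margin."""
    d_ap = squared_distance(anchor, positive)
    return [n for n in negatives if d_ap < squared_distance(anchor, n) < d_ap + margin]


# Toy embeddings: the anchor and positive belong to the same individual.
anchor, positive = [0.0, 0.0], [1.0, 0.0]
negatives = [
    [0.5, 0.0],  # closer than the positive: a "hard" negative
    [1.2, 0.0],  # slightly farther than the positive: semi-hard
    [3.0, 0.0],  # far away: already satisfies the margin
]

selected = semi_hard_negatives(anchor, positive, negatives, margin=0.5)
print(selected)
```

Only the semi-hard negative produces a non-zero loss without being closer than the positive, which is the property that makes such triplets informative yet stable to train on.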

Overall, broad areas for future efforts may include 1) improving the robustness of models to include other sensory modalities (consistent with conspecific identification inquiry) or movement patterns, 2) combining advanced image-based identification techniques with methods and technologies already commonly used in biological studies and surveys (e.g., remote sensing, population genetics, etc.), and 3) creating larger benchmarks and datasets, for instance, via Citizen Science programs (e.g., eMammal, iNaturalist, Great Grevy’s Rally). While these areas offer strong potential to foster analytical and computational advances, we caution that future advancements should not be dominated by technical innovation, but rather, technical development should proceed in parallel with, or be driven by, novel and meaningful biological questions. Following a question-based approach will assist in ensuring the applicability and utility of new technologies to biological investigations and potentially mitigate against the use of identification techniques in suboptimal settings.


The authors wish to thank the McNeil River State Game Sanctuary, Alaska Department of Fish and Game, for providing inspiration for this review. Support for MV, BR, NW, and BPH was provided by Alaska Education Tax Credit funds contributed by the At-Sea Processors Association and the Groundfish Forum. We are grateful to Lucas Stoffl, Mackenzie Mathis, Niccolò Stefanini, Alessandro Marin Vargas, Axel Bisi, Sébastien Hausmann, Travis DeWolf, Jessy Lauer, Matthieu Le Cauchois, Jean-Michel Mongeau, Michael Reichert, Lorian Schweikert, Alexander Davis, Jess Kanwal, Rod Braga, and Wes Larson for comments on earlier versions of this manuscript.



  • Palsbøll (1999) Per J Palsbøll. Genetic tagging: contemporary molecular ecology. Biological Journal of the Linnean Society, 68(1-2):3–22, 1999.
  • Avise (2012) John C Avise. Molecular markers, natural history and evolution. Springer Science & Business Media, 2012.
  • Royle et al. (2013) J Andrew Royle, Richard B Chandler, Rahel Sollmann, and Beth Gardner. Spatial capture-recapture. Academic Press, 2013.
  • Choo et al. (2020) Yan Ru Choo, Enoka P Kudavidanage, Thakshila Ravindra Amarasinghe, Thilina Nimalrathna, Marcus AH Chua, and Edward L Webb. Best practices for reporting individual identification using camera trap photographs. Global Ecology and Conservation, page e01294, 2020.
  • Baudouin et al. (2015) Marie Baudouin, Benoît de Thoisy, Philippine Chambault, Rachel Berzins, Mathieu Entraygues, Laurent Kelle, Avasania Turny, Yvon Le Maho, and Damien Chevallier. Identification of key marine areas for conservation based on satellite tracking of post-nesting migrating green turtles (chelonia mydas). Biological Conservation, 184:36–41, 2015.
  • Bonter and Bridge (2011) David N Bonter and Eli S Bridge. Applications of radio frequency identification (rfid) in ornithological research: a review. Journal of Field Ornithology, 82(1):1–10, 2011.
  • Weissbrod et al. (2013) Aharon Weissbrod, Alexander Shapiro, Genadiy Vasserman, Liat Edry, Molly Dayan, Assif Yitzhaky, Libi Hertzberg, Ofer Feinerman, and Tali Kimchi. Automated long-term tracking and social behavioural phenotyping of animal colonies within a semi-natural environment. Nature communications, 4(1):1–10, 2013.
  • Schneider et al. (2019) Stefan Schneider, Graham W Taylor, Stefan Linquist, and Stefan C Kremer. Past, present and future approaches using computer vision for animal re-identification from camera trap data. Methods in Ecology and Evolution, 10(4):461–470, 2019.
  • LeCun et al. (2015) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
  • Norouzzadeh et al. (2018a) Mohammad Sadegh Norouzzadeh, Anh Nguyen, Margaret Kosmala, Alexandra Swanson, Meredith S Palmer, Craig Packer, and Jeff Clune. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25):E5716–E5725, 2018a.
  • Wu et al. (2020) Xiongwei Wu, Doyen Sahoo, and Steven CH Hoi. Recent advances in deep learning for object detection. Neurocomputing, 2020.
  • Mathis et al. (2020) Alexander Mathis, S. Schneider, Jessy Lauer, and M. Mathis. A primer on motion capture with deep learning: Principles, pitfalls, and perspectives. Neuron, 108:44–65, 2020.
  • Levréro et al. (2009) Florence Levréro, Laureline Durand, Clémentine Vignal, A Blanc, and Nicolas Mathevon. Begging calls support offspring individual identity and recognition by zebra finch parents. Comptes rendus biologies, 332:579–89, 07 2009. doi: 10.1016/j.crvi.2009.02.006.
  • Martin et al. (2008) Stephen Martin, Heikki Helanterä, and Falko Drijfhout. Colony-specific hydrocarbons identify nest mates in two species of formica ant. Journal of chemical ecology, 34:1072–80, 07 2008. doi: 10.1007/s10886-008-9482-7.
  • Hagey and Macdonald (2003) Lee Hagey and Edith Macdonald. Chemical cues identify gender and individuality in giant pandas (ailuropoda melanoleuca). Journal of chemical ecology, 29:1479–88, 07 2003. doi: 10.1023/A:1024225806263.
  • Jouventin et al. (1999) Pierre Jouventin, Thierry Aubin, and Thierry Lengagne. Finding a parent in a king penguin colony: the acoustic system of individual recognition. Animal Behaviour, 57(6):1175–1183, 1999.
  • Li et al. (2019) Shuyuan Li, Jianguo Li, Weiyao Lin, and Hanlin Tang. Amur tiger re-identification in the wild. arXiv preprint arXiv:1906.05586, 2019.
  • Jain et al. (2007) Anil K Jain, Patrick Flynn, and Arun A Ross. Handbook of biometrics. Springer Science & Business Media, 2007.
  • Kühl and Burghardt (2013) Hjalmar Kühl and Tilo Burghardt. Animal biometrics: Quantifying and detecting phenotypic appearance. Trends in ecology & evolution, 28, 03 2013. doi: 10.1016/j.tree.2013.02.013.
  • Phillips et al. (2011) P Jonathon Phillips, Patrick Grother, and Ross Micheals. Evaluation methods in face recognition. In Handbook of face recognition, pages 551–574. Springer, 2011.
  • Tan et al. (2020) Mingxing Tan, Ruoming Pang, and Quoc V Le. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10781–10790, 2020.
  • Carion et al. (2020) Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In European Conference on Computer Vision, pages 213–229. Springer, 2020.
  • Beery et al. (2019) Sara Beery, Dan Morris, and Siyu Yang. Efficient pipeline for camera trap image review. arXiv preprint arXiv:1907.06772, 2019.
  • Beery et al. (2020a) Sara Beery, Guanhang Wu, Vivek Rathod, Ronny Votel, and Jonathan Huang. Context r-cnn: Long term temporal context for per-camera object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13075–13085, 2020a.
  • Redmon et al. (2016) Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
  • Ren et al. (2016) Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: towards real-time object detection with region proposal networks. IEEE transactions on pattern analysis and machine intelligence, 39(6):1137–1149, 2016.
  • Liu et al. (2016) Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. Ssd: Single shot multibox detector. Lecture Notes in Computer Science, page 21–37, 2016. ISSN 1611-3349. doi: 10.1007/978-3-319-46448-0_2.
  • Villa et al. (2017) Alexander Gomez Villa, Augusto Salazar, and Francisco Vargas. Towards automatic wild animal monitoring: Identification of animal species in camera-trap images using very deep convolutional neural networks. Ecological informatics, 41:24–32, 2017.
  • Norouzzadeh et al. (2018b) Mohammad Sadegh Norouzzadeh, Anh Nguyen, Margaret Kosmala, Alexandra Swanson, Meredith S Palmer, Craig Packer, and Jeff Clune. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25):E5716–E5725, 2018b.
  • Beery et al. (2020b) Sara Beery, Yang Liu, Dan Morris, Jim Piavis, Ashish Kapoor, Neel Joshi, Markus Meister, and Pietro Perona. Synthetic examples improve generalization for rare classes. In The IEEE Winter Conference on Applications of Computer Vision, pages 863–873, 2020b.
  • Wang et al. (2020) Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, et al. Deep high-resolution representation learning for visual recognition. IEEE transactions on pattern analysis and machine intelligence, 2020.
  • Cheng et al. (2020) Bowen Cheng, Bin Xiao, Jingdong Wang, Honghui Shi, Thomas S Huang, and Lei Zhang. Higherhrnet: Scale-aware representation learning for bottom-up human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5386–5395, 2020.
  • Kane et al. (2020) Gary A Kane, Gonçalo Lopes, Jonny L Saunders, Alexander Mathis, and Mackenzie W Mathis. Real-time, low-latency closed-loop feedback using markerless posture tracking. Elife, 9:e61909, 2020.
  • Chen et al. (2020) Peng Chen, Pranjal Swarup, Wojciech Matkowski, Adams Kong, Su Han, Zhihe Zhang, and Hou Rong. A study on giant panda recognition based on images of a large proportion of captive pandas. Ecology and Evolution, 10, 03 2020. doi: 10.1002/ece3.6152.
  • Liu et al. (2019) Cen Liu, Rong Zhang, and Lijun Guo. Part-pose guided amur tiger re-identification. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 0–0, 2019.
  • Moskvyak et al. (2019) Olga Moskvyak, Frederic Maire, Asia O Armstrong, Feras Dayoub, and Mahsa Baktashmotlagh. Robust re-identification of manta rays from natural markings by learning pose invariant embeddings. arXiv preprint arXiv:1902.10847, 2019.
  • Bouma et al. (2018) Soren Bouma, Matthew DM Pawley, Krista Hupman, and Andrew Gilman. Individual common dolphin identification via metric embedding learning. In 2018 International Conference on Image and Vision Computing New Zealand (IVCNZ), pages 1–6. IEEE, 2018.
  • Nepovinnykh et al. (2020) Ekaterina Nepovinnykh, Tuomas Eerola, and Heikki Kalviainen. Siamese network based pelage pattern matching for ringed seal re-identification. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision Workshops, pages 25–34, 2020.
  • Schofield et al. (2019) Daniel Schofield, Arsha Nagrani, Andrew Zisserman, Misato Hayashi, Tetsuro Matsuzawa, Dora Biro, and Susana Carvalho. Chimpanzee face recognition from videos in the wild using deep learning. Science Advances, 5, 09 2019. doi: 10.1126/sciadv.aaw0736.
  • Clapham et al. (2020) Melanie Clapham, Ed Miller, Mary Nguyen, and Chris T Darimont. Automated facial recognition for wildlife that lack unique markings: A deep learning approach for brown bears. Ecology and Evolution, 2020.
  • Lahiri et al. (2011) Mayank Lahiri, Chayant Tantipathananandh, Rosemary Warungu, Daniel I Rubenstein, and Tanya Y Berger-Wolf. Biometric animal databases from field photographs: identification of individual zebra in the wild. In Proceedings of the 1st ACM international conference on multimedia retrieval, pages 1–8, 2011.
  • Kelly (2001) Marcella Kelly. Computer-aided photograph matching in studies using individual identification: An example from serengeti cheetahs. Journal of Mammalogy, 82:440–449, 05 2001. doi: 10.1644/1545-1542(2001)082<0440:CAPMIS>2.0.CO;2.
  • Allen and Higham (2015) William Allen and James Higham. Assessing the potential information content of multicomponent visual signals: A machine learning approach. Proceedings. Biological sciences / The Royal Society, 282, 03 2015. doi: 10.1098/rspb.2014.2284.
  • Hiby et al. (2009) Lex Hiby, Phil Lovell, Narendra Patil, Narayanarao Kumar, Arjun Gopalaswamy, and K Karanth. A tiger cannot change its stripes: Using a three-dimensional model to match images of living tigers and tiger skins. Biology letters, 5:383–6, 04 2009. doi: 10.1098/rsbl.2009.0028.
  • Hughes and Burghardt (2017) Benjamin Hughes and Tilo Burghardt. Automated visual fin identification of individual great white sharks. International Journal of Computer Vision, 122(3):542–557, 2017.
  • Weideman et al. (2017) Hendrik J Weideman, Zachary M Jablons, Jason Holmberg, Kiirsten Flynn, John Calambokidis, Reny B Tyson, Jason B Allen, Randall S Wells, Krista Hupman, Kim Urian, et al. Integral curvature representation and matching algorithms for identification of dolphins and whales. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 2831–2839, 2017.
  • Loos (2012) Alexander Loos. Identification of great apes using gabor features and locality preserving projections. In Proceedings of the 1st ACM international workshop on Multimedia analysis for ecological data, pages 19–24, 2012.
  • Loos and Ernst (2013) Alexander Loos and Andreas Ernst. An automated chimpanzee identification system using face detection and recognition. EURASIP Journal on Image and Video Processing, 2013:49, 07 2013. doi: 10.1186/1687-5281-2013-49.
  • Crouse et al. (2017) David Crouse, Rachel Jacobs, Zach Richardson, Scott Klum, Anil Jain, Andrea Baden, and Stacey Tecot. Lemurfaceid: A face recognition system to facilitate individual identification of lemurs. BMC Zoology, 2, 02 2017. doi: 10.1186/s40850-016-0011-9.
  • Lowe (2004) David G Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110, 2004.
  • Crall et al. (2013) Jonathan P Crall, Charles V Stewart, Tanya Y Berger-Wolf, Daniel I Rubenstein, and Siva R Sundaresan. Hotspotter—patterned species instance recognition. In 2013 IEEE workshop on applications of computer vision (WACV), pages 230–237. IEEE, 2013.
  • Freytag et al. (2016) Alexander Freytag, Erik Rodner, Marcel Simon, Alexander Loos, Hjalmar S Kühl, and Joachim Denzler. Chimpanzee faces in the wild: Log-euclidean cnns for predicting identities and attributes of primates. In German Conference on Pattern Recognition, pages 51–63. Springer, 2016.
  • Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • Brust et al. (2017) Clemens-Alexander Brust, Tilo Burghardt, Milou Groenenberg, Christoph Käding, Hjalmar S Kühl, Marie L Manguette, and Joachim Denzler. Towards automated visual monitoring of individual gorillas in the wild. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 2820–2830, 2017.
  • Körschens et al. (2018) Matthias Körschens, Björn Barz, and Joachim Denzler. Towards automatic identification of elephants in the wild. arXiv preprint arXiv:1812.04418, 2018.
  • He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • Deb et al. (2018) Debayan Deb, Susan Wiper, Sixue Gong, Yichun Shi, Cori Tymoszek, Alison Fletcher, and Anil K Jain. Face recognition: Primates in the wild. In 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), pages 1–10. IEEE, 2018.
  • Schroff et al. (2015) Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2015. doi: 10.1109/cvpr.2015.7298682.
  • Selvaraju et al. (2019) Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2):336–359, Oct 2019. ISSN 1573-1405. doi: 10.1007/s11263-019-01228-7.
  • Chatfield et al. (2014) Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531, 2014.
  • Bain et al. (2019) Max Bain, Arsha Nagrani, Daniel Schofield, and Andrew Zisserman. Count, crop and recognise: Fine-grained recognition in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.
  • Dell et al. (2014) Anthony I Dell, John A Bender, Kristin Branson, Iain D Couzin, Gonzalo G de Polavieja, Lucas PJJ Noldus, Alfonso Pérez-Escudero, Pietro Perona, Andrew D Straw, Martin Wikelski, et al. Automated image-based tracking and its application in ecology. Trends in ecology & evolution, 29(7):417–428, 2014.
  • Romero-Ferrero et al. (2019) Francisco Romero-Ferrero, Mattia G Bergomi, Robert C Hinz, Francisco JH Heras, and Gonzalo G de Polavieja. idtracker.ai: tracking all individuals in small or large collectives of unmarked animals. Nature methods, 16(2):179–182, 2019.
  • Walter and Couzin (2021) Tristan Walter and Iain D Couzin. Trex, a fast multi-animal tracking system with markerless identification, and 2d estimation of posture and visual fields. Elife, 10:e64000, 2021.
  • Schneider et al. (2020) Stefan Schneider, Graham W. Taylor, Stefan S. Linquist, and Stefan C. Kremer. Similarity learning networks for animal individual re-identification - beyond the capabilities of a human observer. 2020 IEEE Winter Applications of Computer Vision Workshops (WACVW), pages 44–52, 2020.
  • Murphy (2012) Kevin P Murphy. Machine learning: a probabilistic perspective. MIT press, 2012.
  • Murphy (2021) Kevin P. Murphy. Probabilistic Machine Learning: An introduction. MIT Press, 2021.
  • Bellet et al. (2013) Aurélien Bellet, Amaury Habrard, and Marc Sebban. A survey on metric learning for feature vectors and structured data. arXiv preprint arXiv:1306.6709, 2013.
  • Hadsell et al. (2006) Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 2, pages 1735–1742. IEEE, 2006.
  • Hermans et al. (2017) Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017.
  • Xuan et al. (2020) Hong Xuan, Abby Stylianou, and Robert Pless. Improved embeddings with easy positive triplet mining. In The IEEE Winter Conference on Applications of Computer Vision, pages 2474–2482, 2020.
  • Zhuang et al. (2020) Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. A comprehensive survey on transfer learning. Proceedings of the IEEE, 2020.
  • Russakovsky et al. (2015) Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015.
  • Chen et al. (2017) Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848, 2017.
  • Bakliwal and Ravela (2020) Kshitij Bakliwal and Sai Ravela. The sloop system for individual animal identification with deep learning. arXiv preprint arXiv:2003.00559, 2020.
  • Qiao et al. (2020) Yongliang Qiao, Daobilige Su, He Kong, Salah Sukkarieh, Sabrina Lomax, and Cameron Clark. Bilstm-based individual cattle identification for automated precision livestock farming. In 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), pages 967–972. IEEE, 2020.
  • Andrew et al. (2020) William Andrew, Jing Gao, Neill Campbell, Andrew W Dowsey, and Tilo Burghardt. Visual identification of individual holstein friesian cattle via deep metric learning. arXiv preprint arXiv:2006.09205, 2020.
  • Harris et al. (2020) Grant Harris, Matthew Butler, David Stewart, Eric Rominger, and Caitlin Ruhl. Accurate population estimation of caprinae using trail cameras and distance sampling. Scientific Reports, 10, 10 2020. doi: 10.1038/s41598-020-73893-5.
  • Baird et al. (2008) Robin W. Baird, Antoinette M. Gorgone, Daniel J. McSweeney, Daniel L. Webster, Dan R. Salden, Mark H. Deakos, Allan D. Ligon, Gregory S. Schorr, Jay Barlow, and Sabre D. Mahaffy. False killer whales (pseudorca crassidens) around the main hawaiian islands: Long-term site fidelity, inter-island movements, and association patterns. Marine Mammal Science, 24(3):591–612, 2008. doi:
  • Marin-Cudraz et al. (2019) Thibaut Marin-Cudraz, Bertrand Muffat-Joly, Claude Novoa, Philippe Aubry, Jean-François Desmet, Mathieu Mahamoud-Issa, Florence Nicolè, Mark H Van Niekerk, Nicolas Mathevon, and Frédéric Sèbe. Acoustic monitoring of rock ptarmigan: A multi-year comparison with point-count protocol. Ecological Indicators, 101:710–719, 2019.
  • Johansson et al. (2020) Orjan Johansson, Gustaf Samelius, Ewa Wikberg, Guillaume Chapron, Charudutt Mishra, and Matthew Low. Identification errors in camera-trap studies result in systematic population overestimation. Scientific Reports, 10:6393, 04 2020. doi: 10.1038/s41598-020-63367-z.
  • Hupman et al. (2018) Krista Hupman, Karen Stockin, Kenneth Pollock, Matthew Pawley, Sarah Dwyer, Catherine Lea, and Gabriela Tezanos-Pinto. Challenges of implementing mark-recapture studies on poorly marked gregarious delphinids. PLOS ONE, 13:e0198167, 07 2018. doi: 10.1371/journal.pone.0198167.
  • Guo et al. (2020) Songtao Guo, Pengfei Xu, Qiguang Miao, Guofan Shao, Colin A. Chapman, Xiaojiang Chen, Gang He, Dingyi Fang, He Zhang, Yewen Sun, Zhihui Shi, and Baoguo Li. Automatic Identification of Individual Primates with Deep Learning Techniques. iScience, 23(8):101412, August 2020. doi: 10.1016/j.isci.2020.101412.
  • Duyck et al. (2015) James Duyck, Chelsea Finn, Andy Hutcheon, Pablo Vera, Joaquin Salas, and Sai Ravela. Sloop: A pattern retrieval engine for individual animal identification. Pattern Recognition, 48(4):1059–1073, 2015.
  • Berger-Wolf et al. (2017) Tanya Y Berger-Wolf, Daniel I Rubenstein, Charles V Stewart, Jason A Holmberg, Jason Parham, Sreejith Menon, Jonathan Crall, Jon Van Oast, Emre Kiciman, and Lucas Joppa. Wildbook: Crowdsourcing, computer vision, and data science for conservation. arXiv preprint arXiv:1710.08880, 2017.
  • Brookes and Burghardt (2020) Otto Brookes and Tilo Burghardt. A dataset and application for facial recognition of individual gorillas in zoo environments. arXiv preprint arXiv:2012.04689, 2020.
  • Wang (2018) Jinliang Wang. Estimating genotyping errors from genotype and reconstructed pedigree data. Methods in Ecology and Evolution, 9(1):109–120, 2018.
  • Weller et al. (2006) Joel Weller, Eyal Seroussi, and M Ron. Estimation of the number of genetic markers required for individual animal identification accounting for genotyping errors. Animal genetics, 37:387–9, 08 2006. doi: 10.1111/j.1365-2052.2006.01455.x.
  • Baetscher et al. (2018) Diana S Baetscher, Anthony J Clemento, Thomas C Ng, Eric C Anderson, and John C Garza. Microhaplotypes provide increased power from short-read dna sequences for relationship inference. Molecular Ecology Resources, 18(2):296–305, 2018.
  • Carroll et al. (2018) Emma L Carroll, Mike W Bruford, J Andrew DeWoody, Gregoire Leroy, Alan Strand, Lisette Waits, and Jinliang Wang. Genetic and genomic monitoring with minimally invasive sampling methods. Evolutionary applications, 11(7):1094–1119, 2018.
  • Clapham et al. (2012) Melanie Clapham, Owen Nevin, Andrew Ramsey, and Frank Rosell. A hypothetico-deductive approach to assessing the social function of chemical signalling in a non-territorial solitary carnivore. PloS one, 7:e35404, 04 2012. doi: 10.1371/journal.pone.0035404.
  • Parsons et al. (2009) K. Parsons, K.C. Balcomb, John Ford, and J.W. Durban. The social dynamics of southern resident killer whales and conservation implications for this endangered population. Animal Behaviour, 77:963–971, 04 2009. doi: 10.1016/j.anbehav.2009.01.018.
  • Díaz López (2020) Bruno Díaz López. When personality matters: personality and social structure in wild bottlenose dolphins, tursiops truncatus. Animal Behaviour, 163:73–84, 05 2020. doi: 10.1016/j.anbehav.2020.03.001.
  • Krasnova et al. (2014) V. Krasnova, Anton Chernetsky, AI Zheludkova, and V. Bel’kovich. Parental behavior of the beluga whale (delphinapterus leucas) in natural environment. Biology Bulletin, 41:349–356, 07 2014. doi: 10.1134/S1062359014040062.
  • Constantine et al. (2007) Rochelle Constantine, Kirsty Russell, Nadine Gibbs, Simon Childerhouse, and C Scott Baker. Photo-identification of humpback whales (megaptera novaeangliae) in new zealand waters and their migratory connections to breeding grounds of oceania. Marine Mammal Science, 23(3):715–720, 2007.
  • Kelly et al. (1998) Marcella Kelly, M. Laurenson, Clare Fitzgibbon, Anthony Collins, Sarah Durant, George Frame, B Bertram, and Tatianna Caro. Demography of the serengeti cheetah (acinonyx jubatus) population. Journal of Zoology, 244:473–488, 04 1998. doi: 10.1111/j.1469-7998.1998.tb00053.x.
  • Dall et al. (2012) Sasha RX Dall, Alison M Bell, Daniel I Bolnick, and Francis LW Ratnieks. An evolutionary ecology of individual differences. Ecology letters, 15(10):1189–1198, 2012.
  • Stamps and Groothuis (2010) Judy Stamps and Ton GG Groothuis. The development of animal personality: relevance, concepts and perspectives. Biological Reviews, 85(2):301–325, 2010.
  • Roche et al. (2016) Dominique G Roche, Vincent Careau, and Sandra A Binning. Demystifying animal ‘personality’ (or not): why individual variation matters to experimental biologists. Journal of Experimental Biology, 219(24):3832–3843, 2016.
  • Bell et al. (2009) Alison M Bell, Shala J Hankison, and Kate L Laskowski. The repeatability of behaviour: a meta-analysis. Animal behaviour, 77(4):771–783, 2009.
  • Braga and Buckner (2017) Rodrigo M Braga and Randy L Buckner. Parallel interdigitated distributed networks within the individual estimated by intrinsic functional connectivity. Neuron, 95(2):457–471, 2017.
  • Chen et al. (2012) Rui Chen, George I Mias, Jennifer Li-Pook-Than, Lihua Jiang, Hugo YK Lam, Rong Chen, Elana Miriami, Konrad J Karczewski, Manoj Hariharan, Frederick E Dewey, et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell, 148(6):1293–1307, 2012.
  • Poldrack (2021) Russell A Poldrack. Diving into the deep end: A personal reflection on the myconnectome study. Current Opinion in Behavioral Sciences, 40:1–4, 2021.
  • Caravaggi et al. (2017) Anthony Caravaggi, Peter Banks, Cole Burton, Caroline Finlay, Peter Haswell, Matt Hayward, Marcus Rowcliffe, and Mike Wood. A review of camera trapping for conservation behaviour research. Remote Sensing in Ecology and Conservation, 3, 06 2017. doi: 10.1002/rse2.48.
  • Otto et al. (2017) Charles Otto, Dayong Wang, and Anil K Jain. Clustering millions of faces by identity. IEEE transactions on pattern analysis and machine intelligence, 40(2):289–303, 2017.
  • Zhong et al. (2018) Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li, and Yi Yang. Camera style adaptation for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5157–5166, 2018.
  • Tibbetts and Dale (2007) Elizabeth Tibbetts and J. Dale. Individual recognition: it is good to be different. Trends in ecology & evolution, 22:529–37, 11 2007. doi: 10.1016/j.tree.2007.09.001.
  • Thom and Hurst (2004) Michael Thom and Jane Hurst. Individual recognition by scent. Ann. Zool. Fenn., 41, 01 2004.
  • Tibbetts (2002) Elizabeth Tibbetts. Visual signals of individual identity in the wasp polistes fuscatus. Proceedings. Biological sciences / The Royal Society, 269:1423–8, 08 2002. doi: 10.1098/rspb.2002.2031.
  • Swaisgood et al. (2004) Ronald R Swaisgood, Donald G Lindburg, Angela M White, Zhang Hemin, and Zhou Xiaoping. Chemical communication in giant pandas. Giant pandas: biology and conservation, 106, 2004.
  • Schneider et al. (2018) Jonathan Schneider, Nihal Murali, Graham W Taylor, and Joel D Levine. Can drosophila melanogaster tell who’s who? PloS one, 13(10):e0205043, 2018.
  • Von Uexküll (1992) Jakob Von Uexküll. A stroll through the worlds of animals and men: A picture book of invisible worlds. Semiotica, 89(4):319–391, 1992.
  • Stokes et al. (2020) Jonathan M Stokes, Kevin Yang, Kyle Swanson, Wengong Jin, Andres Cubillos-Ruiz, Nina M Donghia, Craig R MacNair, Shawn French, Lindsey A Carfrae, Zohar Bloom-Ackerman, et al. A deep learning approach to antibiotic discovery. Cell, 180(4):688–702, 2020.
  • Service (2020) Robert F Service. ‘The game has changed.’ AI triumphs at protein folding, 2020.
  • Stowell et al. (2019) Dan Stowell, Tereza Petrusková, Martin Šálek, and Pavel Linhart. Automatic acoustic identification of individuals in multiple species: improving identification across recording conditions. Journal of the Royal Society Interface, 16(153):20180940, 2019.
  • Kulahci et al. (2014) Ipek Kulahci, Christine Drea, Daniel Rubenstein, and Asif Ghazanfar. Individual recognition through olfactory - auditory matching in lemurs. Proceedings. Biological sciences / The Royal Society, 281:20140071, 04 2014. doi: 10.1098/rspb.2014.0071.
  • Wen et al. (2016) Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In European conference on computer vision, pages 499–515. Springer, 2016.
  • Wang et al. (2019) Guangcong Wang, Jianhuang Lai, Peigen Huang, and Xiaohua Xie. Spatial-temporal person re-identification. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 8933–8940, 2019.
  • Zhong et al. (2017) Zhun Zhong, Liang Zheng, Donglin Cao, and Shaozi Li. Re-ranking person re-identification with k-reciprocal encoding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1318–1327, 2017.
  • Zhong et al. (2020) Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 13001–13008, 2020.
  • Wolf et al. (2011) L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In CVPR 2011, pages 529–534, 2011.
  • Taigman et al. (2014) Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1701–1708, 2014.
  • Parkhi et al. (2015) Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. In Xianghua Xie, Mark W. Jones, and Gary K. L. Tam, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 41.1–41.12. BMVA Press, September 2015. ISBN 1-901725-53-7. doi: 10.5244/C.29.41.
  • Van Noorden (2020) Richard Van Noorden. The ethical questions that haunt facial-recognition research. Nature, 587(7834):354–358, 2020.