Visual Identification of Individual Holstein Friesian Cattle via Deep Metric Learning

by   William Andrew, et al.
University of Bristol

Holstein Friesian cattle exhibit individually-characteristic black and white coat patterns visually akin to those arising from Turing's reaction-diffusion systems. This work takes advantage of these natural markings in order to automate visual detection and biometric identification of individual Holstein Friesians via convolutional neural networks and deep metric learning techniques. Using agriculturally relevant top-down imaging, we present methods for the detection, localisation, and identification of individual Holstein Friesians in an open herd setting, i.e. where changes in the herd do not require system re-training. We propose the use of SoftMax-based reciprocal triplet loss to address the identification problem and evaluate the techniques in detail against fixed herd paradigms. We find that deep metric learning systems show strong performance even under conditions where cattle unseen during system training are to be identified and re-identified, achieving 98.2% accuracy when trained on just half of the population. This work paves the way for facilitating the visual non-intrusive monitoring of cattle applicable to precision farming for automated health and welfare monitoring and to veterinary research in behavioural analysis, disease outbreak tracing, and more.






1 Introduction

Motivation. Driven by their high milk yield Tadesse and Dessie (2003), black and white patterned Holstein Friesian and British Friesian cattle are the dominant dairy cattle breeds farmed in the UK New et al. (2005); Department for Environment and Affairs (2008) (see Fig. 2). Legal frameworks mandate traceability of livestock throughout their lives Parliament and Council (1997); of Agriculture (USDA) – Animal and Service in order to identify individuals for monitoring, control of disease outbreak, and more Hansen et al. (2018); Smith et al. (2005); Bowling et al. (2008); Caporale et al. (2001). For cattle this is realised in the form of a national tracking database linked to a unique ear-tag identification for each animal Houston (2001); Buick (2004); Shanahan et al. (2009), or additionally via injectable transponders Klindtworth et al. (1999), branding Adcock et al. (2018), and more Awad (2016) (see Fig. 3). Such tags, however, cannot provide the continuous localisation of individuals that would open up numerous applications in precision farming and a number of research areas, including welfare assessment, behavioural and social analysis, disease development and infection transmission, amongst others Ungar et al. (2005); Turner et al. (2000). Even for conventional identification, tagging has been called into question from a welfare standpoint Johnston et al. (1996); Edwards and Johnston (1999), regarding longevity/reliability Fosgate et al. (2006), and permanent damage Edwards et al. (2001); Wardrope (1995). Building upon previous research Martinez-Ortiz et al. (2013); Li et al. (2017); Andrew et al. (2016, 2017, 2019); Andrew (2019); Andrew et al. (2020), we propose to take advantage of the intrinsic, characteristic formations of the breed’s coat pattern in order to perform non-intrusive visual identification (ID), laying down the essential precursors to continuous monitoring of herds on an individual animal level via non-intrusive visual observation (see Fig. 1).

Figure 1: Identification Pipeline Overview. Overview of the proposed pipeline for automatically detecting and identifying both known and never before seen cattle. The process begins with a breed-wide detector extracting cattle regions of interest (RoIs) agnostic to individual patterns. These are then embedded via a dimensionality reduction model trained to cluster images according to individual coat patterns. RoIs projected into this latent ID space can then be classified by a lightweight approach such as k-nearest neighbours, ultimately yielding cattle identities for input images. Unknown cattle can be projected into this same space as long as the model has learnt a sufficiently discriminative reduction such that its new embeddings can be differentiated from other clusters based on distance.
(a) Cattle breed distribution
(b) Average UK dairy herd size
Figure 2: Cattle Distributions. Distribution of cattle breeds and herd sizes within the United Kingdom. (a): The 8 most common breeds of cattle in Great Britain in 2008 Department for Environment and Affairs (2008). The ‘Black & White’ category collectively contains the Holstein Friesian, British Friesian and Holstein breeds as they are difficult to distinguish during census. In total there were nearly 9 million registered cattle in Great Britain in 2008, approximately 3 million of which were ‘Black & White’ Department for Environment and Affairs (2008). (b): The average size of dairy herds per farm within the UK, which has doubled over the past two decades Agriculture and Board (2017, 2015). Figure credit Andrew (2019).
(a) Ear tag
(b) Tattooing
(c) Collars
(d) Freeze branding
(e) Ours
Figure 3: Cattle Identification Methods. Examples of traditional methods for identifying cattle. All rely upon some physical addition, be that permanent (branding, tattooing, ear tagging) or non-permanent (collars). We instead propose to use naturally-occurring coat pattern features to achieve vision-based identification from imagery acquired via (e) an Unmanned Aerial Vehicle (UAV) (top), or low-cost static cameras (bottom). Figure credit: (a) Velez, J. F. et al. Velez et al. (2013), (b) Pennington, J. A. Pennington (2007), (c) PyonProducts, (d) Bertram, J. et al. Bertram et al. (1996).

Closed-set Identification. Our previous works showed that visual cattle detection, localisation, and re-identification via deep learning is robustly feasible in closed-set scenarios where a system is trained and tested on a fixed set of known Holstein Friesian cattle under study Andrew et al. (2016, 2017); Andrew (2019). However in this setup, imagery of all animals must be taken and manually annotated/identified before system training can take place. Consequently, any change in the population or transfer of the system to a new herd requires labour-intensive data gathering and labelling, plus computationally demanding retraining of the system.

Open-set Identification. In this paper our focus is on a more flexible scenario: the open-set recognition of individual Holstein Friesian cattle. Instead of only being able to recognise individuals that have been seen before and trained against, the system should be able to identify and re-identify cattle that have never been seen before without further retraining. To provide a complete process, we propose a full pipeline for detection and open-set recognition from image input to IDs (see Fig. 1).

The remainder of this paper and its contributions are organised as follows: Section 2 discusses relevant related works in the context of this paper. Next, Section 4 outlines Holstein Friesian breed RoI detection, the first stage of the proposed identification pipeline, followed by the second stage in Section 5 on open-set individual recognition with extensive experiments on various relevant techniques. Finally, concluding remarks and possible avenues for future work are given in Section 7.

2 Related Work

The most longstanding approaches to cattle biometrics leverage the discovery of the cattle muzzle as a dermatoglyphic trait as far back as 1922 by Petersen, WE. Petersen (1922). Since then, this property has been taken advantage of in the form of semi-automated approaches Kumar and Singh (2017); Kumar et al. (2017); Kimura et al. (2004); Tharwat et al. (2014) and those operating on muzzle images Awad and Hassaballah (2019); El Hadad et al. (2015); Barry et al. (2007). These techniques, however, rely upon the presence of heavily constrained images of the cattle muzzle that are not easily attainable. Other works have looked towards retinal biometrics Allen et al. (2008), facial features Barbedo et al. (2019); Cai and Li (2013), and body scans Arslan et al. (2014), all requiring specialised imaging.

2.1 Automated Cattle Biometrics

Only a few works have utilised advancements in the field of computer vision for the automated extraction of individual identity based on full body dorsal features Martinez-Ortiz et al. (2013); Li et al. (2017). Our previous works have taken advantage of this property, exploiting hand crafted features extracted on the coat Andrew et al. (2016) (similar to a later work by Li, W. et al. Li et al. (2017)), which was outperformed by a deep approach using convolutional neural networks extracting spatio-temporal features Andrew et al. (2017, 2019); Andrew (2019), similar to Qiao et al. (2019). More recently, there have been works that integrate multiple views of cattle faces for identification Barbedo et al. (2019), utilise thermal imagery for background subtraction as a pre-processing technique for a standard CNN-based classification pipeline Bhole et al. (2019), and detect cattle presence from UAV-acquired imagery Barbedo et al. (2019). In this work we continue to exploit dorsal biometric features from coat patterns exhibited by Holstein and Holstein Friesian breeds as they provably provide sufficient distinction across populations. In addition, the images are easily acquired via static ceiling-mounted cameras, or outdoors using UAVs. Note that such bird's-eye view images provide a canonical and consistent viewpoint of the object, the possibility of occlusions is largely eliminated, and imagery can be captured in a non-intrusive manner.

2.2 Deep Object Detection

Object detectors generally fall into two classes: one-stage detectors such as SSD Liu et al. (2016) and YOLO Redmon et al. (2016), which infer class probability and bounding box offsets within a single feedforward network, and two-stage detectors such as Faster R-CNN Ren et al. (2015) and Cascade-RCNN Cai and Vasconcelos (2018), which pre-process images first to generate class-agnostic regions before classifying these and regressing associated bounding boxes. Recent improvements to one-stage detectors exemplified by YOLOv3 Redmon and Farhadi (2018) and RetinaNet Lin et al. (2017b) deliver detection accuracy comparable to two-stage detectors at the general speed of a single detection stage. A RetinaNet architecture is used as the detection network of choice in this work, since it also addresses class imbalances by replacing the traditional cross-entropy loss with focal loss for classification.

2.3 Open-Set Recognition

The problem of open-set recognition – that is, automatically re-identifying never before seen objects – is a well-studied area in computer vision and machine learning. Traditional and seminal techniques typically have their foundations in probabilistic and statistical approaches Jain et al. (2014); Scheirer et al. (2014); Rudd et al. (2017), with alternatives including specialised support vector machines Scheirer et al. (2012); Júnior et al. (2016) and more Bendale and Boult (2015); Júnior et al. (2017).

However, given the performance gains on benchmark datasets achieved using deep learning and neural network techniques Sermanet et al. (2013); Girshick et al. (2014); Krizhevsky et al. (2012), approaches to open-set recognition have followed suit. Proposed deep models can be found to operate in an autoencoder paradigm Oza and Patel (2019); Yoshihashi et al. (2019), where a network learns to transform an image input into an efficient latent representation and then reconstructs it from that representation as closely as possible. Alternatives include open-set loss function formulations instead of softmax Bendale and Boult (2016), the generation of counterfactual images close to the training set to strengthen object discrimination Neal et al. (2018), and approaches that combine these two techniques Ge et al. (2017); Shu et al. (2017). Some further, less relevant techniques are discussed in Geng et al. (2018).

The approach taken in this work is to learn a latent representation of the training set of individual cattle in the form of an embedding that generalises visual uniqueness of the breed, beyond that of the specific training herd. The idea is that this dimensionality reduction should be discriminative to the extent that new unseen individuals projected into this same space will differ significantly from the embeddings of the known training set. This form of approach has history in literature Meyer and Drummond (2019); Lagunes-Fortiz et al. (2019); Hassen and Chan (2018), where embeddings have been originally used for human re-identification Schroff et al. (2015); Hermans et al. (2017), as well as data aggregation and clustering Oh Song et al. (2016); Opitz et al. (2018); Oh Song et al. (2017). In our experiments, we will investigate the effect of various loss functions for constructing latent spaces Schroff et al. (2015); Lagunes-Fortiz et al. (2019); Masullo et al. (2019) and quantify their suitability for the open-set recognition of Holstein Friesian cattle.

3 Dataset: OpenCows2020

To facilitate the experiments carried out in this paper, we introduce the OpenCows2020 dataset, which will be made available publicly. The dataset consists of indoor and outdoor top-down imagery collated from our previous works and datasets Andrew et al. (2016, 2017, 2019). Indoor footage was acquired with statically affixed cameras, whilst outdoor imagery was captured onboard a UAV. The dataset is split into two components detailed below: (a) for cattle detection and localisation, the first stage in our pipeline, and (b) for open-set identification.

3.1 Detection & Localisation

The detection and localisation component of the OpenCows2020 dataset consists of whole images with hand annotated cattle regions across in-barn and outdoor settings. When training a detector on this set, one obtains a model that is widely domain agnostic with respect to the environment, and can be deployed in a variety of farming-relevant conditions. This component of the dataset consists of a total of images, containing cattle instances. Around 52% of this set are original, non-augmented images. The rest were synthesised with a combination of random cropping, scaling, rotation, blurring and more using Jung et al. (2020) to enhance the training set. For each cow, we manually annotated a bounding box that encloses the animal’s torso, excluding the head, neck, legs and tail in adherence with the VOC 2012 guidelines Everingham et al. This is in order to limit content to a canonical, compact and minimally deforming species-relevant region. Illustrative examples from this set are given in Fig. 4.

Figure 4: Detection & Localisation Dataset Examples. Example instances to illustrate the variety in acquisition conditions and environments for the dataset provided for training and testing models performing breed-wide detection and localisation of cattle.

3.2 Identification

The second component of the OpenCows2020 dataset consists of identified cattle from the detection image set. Individuals with fewer than 20 instances were discarded, resulting in a population of 46 individuals, an average of instances per class and regions overall. A random example from each individual is given in Figure 5 to illustrate the variety in coat patterns, as well as the various acquisition methods, backgrounds/environments, illumination conditions, etc.

Figure 5: Identification Dataset Examples. Example instances for each of 46 individuals in the OpenCows2020 dataset. Observable is the variation in acquisition method, surrounding environment and background, illumination conditions, etc.

4 Cattle Detection

The first stage in the pipeline (see Fig. 1, blue) is to be able to automatically and robustly detect and locate Holstein Friesian cattle within relevant imagery. That is, we want to train a generic breed-wide cattle detector such that for some image input, we receive a set of bounding box coordinates with confidence scores (see Figure 6) enclosing every cow within it as output. Note that the object class of (all) cattle is highly diverse with each individual presenting a different coat pattern. The RetinaNet Lin et al. (2017b) architecture serves as the detection backbone for this breed recognition task where we will compare its performance against other relevant seminal baselines (see Section 4.3).

Figure 6: Detection & Localisation Examples. Example of (left) indoor and (right) outdoor instances for object detection. (red) Rectangular ground truth bounding boxes for class ‘cow’ and (blue) predicted bounding boxes of cow with associated confidence scores. $(x_1, y_1)$ and $(x_2, y_2)$ are the top left and the lower right coordinates of each bounding box, respectively.

4.1 Detection Loss

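RetinaNet consists of a backbone feature pyramid network Lin et al. (2017a) followed by two task-specific sub-networks. One sub-network performs object classification on the backbone’s output using focal loss, the other regresses the bounding box position. To implement focal loss, we first define $p_t$ as follows for convenience:

$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases} \tag{1}$$

where $y$ is the ground truth and $p$ is the estimated probability when $y$ = 1. For detection we only need to separate cattle from the background, therefore presenting a binary classification problem. As such, focal loss is defined as:

$$\mathcal{L}_{cls} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{2}$$

where $-\log(p_t)$ is cross entropy for binary classification, $\gamma$ is the modulating factor that balances easy/difficult samples, and $\alpha_t$ can balance the number of positive/negative samples. The focal loss function down-weights easy examples so that training concentrates on positive and difficult samples.

The regression sub-network predicts four parameters $(P_x, P_y, P_w, P_h)$ representing the offset coordinates between anchor box $A$ and ground-truth box $G$. Their ground-truth offsets $(T_x, T_y, T_w, T_h)$ can be expressed as:

$$T_x = \frac{G_x - A_x}{A_w}, \quad T_y = \frac{G_y - A_y}{A_h}, \quad T_w = \log\frac{G_w}{A_w}, \quad T_h = \log\frac{G_h}{A_h} \tag{3}$$

where $G$ is the ground-truth box and $A$ is the anchor box. The width and height of the bounding box are given by $w$ and $h$. The regression loss can be defined as:

$$\mathcal{L}_{reg} = \sum_{j \in \{x, y, w, h\}} \text{Smooth}_{L1}(P_j - T_j) \tag{4}$$

where Smooth L1 loss is defined as:

$$\text{Smooth}_{L1}(x) = \begin{cases} 0.5 x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases} \tag{5}$$

Overall, the detection network minimises a combined loss function bringing together Smooth L1 and focal loss components relating to localisation and classification, respectively:

$$\mathcal{L} = \lambda \mathcal{L}_{reg} + \mathcal{L}_{cls} \tag{6}$$

where $\mathcal{L}_{reg}$ and $\mathcal{L}_{cls}$ are defined by equations 4 and 2, respectively. $\lambda$ is a balancing parameter.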
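As a rough illustration of how the classification and regression terms combine, the following minimal Python sketch evaluates the loss for a single anchor. It is not the implementation used for the experiments: the function names, the (centre x, centre y, width, height) box layout and the class-dependent weight `alpha_t` (taken from the RetinaNet formulation of Lin et al. (2017b)) are assumptions made for this example.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one anchor.

    p: estimated probability of the 'cow' class, 0 < p < 1
    y: ground-truth label, 1 for cow and 0 for background
    """
    p_t = p if y == 1 else 1.0 - p
    # Assumption: alpha for positives, 1 - alpha for negatives (RetinaNet convention).
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def box_offsets(gt, anchor):
    """Ground-truth regression targets; boxes are (centre_x, centre_y, width, height)."""
    gx, gy, gw, gh = gt
    ax, ay, aw, ah = anchor
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))

def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def regression_loss(predicted, target):
    """Sum of Smooth L1 terms over the four offset parameters."""
    return sum(smooth_l1(p - t) for p, t in zip(predicted, target))

def detection_loss(p, y, predicted, target, lam=1.0):
    """Combined loss: lam * regression term + focal classification term."""
    return lam * regression_loss(predicted, target) + focal_loss(p, y)
```

For a positive anchor, a confident correct prediction such as `focal_loss(0.9, 1)` contributes far less than a poor one such as `focal_loss(0.1, 1)`, which is the re-weighting effect the modulating factor is meant to achieve.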

4.2 Experimental Setup

Our particular RetinaNet implementation utilises a ResNet-50 backbone He et al. (2016) as the feature pyramid network, with weights pre-trained on ImageNet Deng et al. (2009). The intersection over union (IoU) threshold, the prior anchor’s confidence of foreground, and other parameters are set to those proposed in Lin et al. (2017b). The network was fine-tuned on the detection component of our dataset using a batch size of , Stochastic Gradient Descent Robbins and Monro (1951) at an initial learning rate of with a momentum of Qian (1999) and weight decay at . Training and testing splits were randomly chosen in a ratio of , respectively, with any synthetic instances removed from the test set. Focal loss function parameters were selected with $\gamma$ = 2, $\alpha$ = 0.25, $\lambda$ = 1. Training time was around 30 hours on an Nvidia V100 GPU (Tesla P100-PCIE-16GB) for epochs of training. Finally, to provide a suitable comparison with baselines, two popular and seminal architectures – YOLO v3 Redmon and Farhadi (2018) and Faster R-CNN Ren et al. (2015) – are evaluated on the same dataset and splits in the following section.

4.3 Baseline Comparisons and Evaluation

Quantitative comparisons of the proposed detection method against classic and recent approaches are shown in Table 1. Mean average precision (mAP) is the chosen metric for comparing performance, computed as the area under the precision-recall curve obtained from each method. As can be seen, the strongest performance was achieved by the RetinaNet-underpinned architecture at near perfect mAP rates suitable for practical application, which justifies the network’s use in our proposed image-to-ID pipeline. Specifically, our implementation obtains this performance with the following parameter choices: confidence score threshold = 0.5, non-maximum suppression (NMS) threshold = 0.28, IoU threshold = 0.5.
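To make the role of the confidence and NMS thresholds concrete, the sketch below filters a set of predicted boxes in plain Python. It is a generic greedy NMS written for illustration, not the detector's actual post-processing code; the function names and the (x1, y1, x2, y2) box layout are assumptions, while the default thresholds are the values quoted above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def filter_detections(boxes, scores, score_thresh=0.5, nms_thresh=0.28):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Discard low-confidence boxes, then visit the rest by descending score.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept
```

A lower `nms_thresh` suppresses more neighbouring boxes, which reduces duplicate detections but can also remove a genuine detection when two animals stand very close together.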

Figure 7 depicts limitations and shows instances of RetinaNet detection failures. Examples (a) and (b) arise from image boundary clipping following the VOC labelling guidelines Everingham et al. on object visibility/occlusion which can be avoided in most practical applications by ignoring boundary areas. In (c), poor localisation is the result of closely situated cattle in conjunction with the choice of a low NMS threshold. We chose to keep the NMS threshold as low as possible, otherwise it occasionally leads to false positive detections in groups of crowded cattle (see Fig. 6(a)). Finally we found that in rare cases, as shown in (e), when two cattle are parallel, in close proximity and have a diagonal heading, a predicted box between the two cows can sometimes be observed. This is as a result of one of the intrinsic drawbacks of orthogonal bounding boxes. In the case of objects with diagonal heading, a ground truth bounding box will include maximal background pixels. Consequently, background pixels could be occupied by neighbouring cattle, which misguides the network.

Method | Pre-trained on COCO Lin et al. (2014) | Pre-trained on ImageNet Deng et al. (2009) | mAP (%)
YOLO V3 Redmon and Farhadi (2018) | N | N | 80.3
Faster R-CNN Ren et al. (2015) (Resnet50 backbone) | Y | N | 94.8
RetinaNet Lin et al. (2017b) (Resnet50 backbone) | N | Y | 97.5

Table 1: Quantitative Performance. Comparative results on the detection component of the OpenCows2020 dataset, where mean Average Precision (mAP) is computed as the area under the curve in precision-recall space.
Figure 7: Detection and Localisation Failures of RetinaNet. Examples of failures for detecting cattle. (Red): ground truth annotations, (blue): predicted bounding boxes. Examples include (a) false negative detection, (b) false positive detection at the boundary of the images, (c) inaccurate localisation and (d) false negative detection due to the proximity and alignment of multiple cattle. (e) depicts an example of higher (0.5) NMS threshold, where it is not low enough to make a bounding box eliminate its neighbouring high-confidence box.

5 Open-Set Individual Identification via Metric Learning

Given robustly detected image regions that contain cattle, we would like to discriminate individuals, seen or unseen, without the costly step of manually labelling new individuals and fully re-training a closed-set classifier. The key idea to approach this task is to learn a mapping into a class-distinctive latent space where maps of images of the same individual naturally cluster together. Such a feature embedding encodes a latent representation of inputs and, for images, also equates to a significant dimensionality reduction from an image matrix to an embedding of size $n$, where $n$ is the dimensionality of the embedded space. In the latent space, distances directly encode input similarity, hence the term metric learning. To actually classify inputs after constructing a successful embedding, a lightweight classifier (e.g. k-Nearest Neighbours) can be applied to the latent space, where clusters now represent individuals.

5.1 Metric Space Building and Loss Functions

Success in building this form of latent representation relies heavily – amongst many other factors – upon the careful choice of a loss function that naturally yields an identity-clustered space. A seminal example in metric learning originates from the use of Siamese architectures Hadsell et al. (2006), where image pairs $(I_1, I_2)$ are passed through a dual stream network with coupled weights to obtain their embedding. Weights $\theta$ are shared between two identical network streams $f_\theta$:

$$x_1 = f_\theta(I_1), \quad x_2 = f_\theta(I_2) \tag{7}$$

The authors then proposed training this architecture with a contrastive loss to cluster instances according to their class:

$$\mathcal{L}_{Contrastive} = (1 - Y)\,\frac{1}{2}\, d(x_1, x_2)^2 + Y\,\frac{1}{2}\,\max\big(0,\ m - d(x_1, x_2)\big)^2 \tag{8}$$

where $Y$ is a binary label denoting similarity ($Y = 0$) or dissimilarity ($Y = 1$) on the inputs $(I_1, I_2)$, $m$ is a margin, and $d(\cdot, \cdot)$ is the Euclidean distance between two embeddings with dimensionality $n$. The problem with this formulation is that it cannot simultaneously encourage learning of visual similarities and dissimilarities, both of which are critical for obtaining clean, well-separated clusters on our coat pattern differentiation task. This shortcoming can be overcome by a triplet loss formulation Schroff et al. (2015), utilising the embeddings $(x_a, x_p, x_n)$ of a triplet containing three image inputs $(I_a, I_p, I_n)$ denoting an anchor, a positive example from the same class, and a negative example from a different class, respectively. The idea is to encourage minimal distance between the anchor $x_a$ and the positive $x_p$, and maximal distance between the anchor $x_a$ and the negative sample $x_n$ in the embedded space. Figure 8(a) illustrates the learning goal, whilst the loss function is given by:

$$\mathcal{L}_{TL} = \max\big(0,\ d(x_a, x_p) - d(x_a, x_n) + \alpha\big) \tag{9}$$

where $\alpha$ denotes a constant margin hyperparameter. The inclusion of the constant $\alpha$ often turns out to cause learning issues since the margin can be satisfied at any distance from the anchor; Figure 8(b) illustrates this problem. Alleviating this limitation is a recent formulation named reciprocal triplet loss Masullo et al. (2019), which removes the margin hyperparameter altogether:

$$\mathcal{L}_{RTL} = d(x_a, x_p) + \frac{1}{d(x_a, x_n)} \tag{10}$$

(a) Triplet loss learning objective
(b) Margin problem
Figure 8: Triplet Loss and the Margin Problem. (a) The triplet loss function aims to minimise the distance between an anchor and a positive instance (both belonging to the same class), whilst maximising the distance between the anchor and a negative (belonging to a different class). However, (b) illustrates the problem with the inclusion of a margin parameter in the triplet loss formulation; it can be satisfied at any distance from the anchor.

Recent work Lagunes-Fortiz et al. (2019) has demonstrated improvements in open-set recognition on various datasets Hodan et al. (2017); Wang et al. (2017) via the inclusion of a SoftMax term in the triplet loss formulation during training given by:

$$\mathcal{L}_{SoftMax+TL} = \mathcal{L}_{SoftMax} + \lambda \cdot \mathcal{L}_{TL} \tag{11}$$

where

$$\mathcal{L}_{SoftMax} = -\log\left(\frac{e^{z_{class}}}{\sum_i e^{z_i}}\right) \tag{12}$$

with $z_i$ denoting the network's classification outputs and $z_{class}$ the output for the ground-truth class, and where $\lambda$ is a constant weighting hyperparameter and $\mathcal{L}_{TL}$ is standard triplet loss as defined in equation 9. For our experiments, we select $\lambda$ as suggested in the original paper Lagunes-Fortiz et al. (2019), where it was the result of a parameter grid search. This formulation is able to outperform the standard triplet loss approach since it combines the best of both worlds: fully supervised learning and a separable embedded space. Most importantly for the task at hand, we propose to combine a fully supervised loss term as given by Softmax loss with the reciprocal triplet loss formulation, which removes the necessity of specifying a margin parameter. This combination is novel and given by:

$$\mathcal{L}_{SoftMax+RTL} = \mathcal{L}_{SoftMax} + \lambda \cdot \mathcal{L}_{RTL} \tag{13}$$

where $\mathcal{L}_{RTL}$ and $\mathcal{L}_{SoftMax}$ are defined by equations 10 and 12 above, respectively. Comparative results for all of these loss functions are given in our experiments as follows.
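The loss functions discussed in this section can be sketched for a single triplet in a few lines of plain Python. This is an illustrative reading of the formulations rather than the training code behind the reported results: the function names are hypothetical, embeddings are plain tuples of floats, `eps` is a small constant added here only to avoid division by zero, and the margin and weighting values must be supplied by the caller.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embeddings of equal dimensionality."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(xa, xp, xn, margin):
    """Standard triplet loss with a constant margin."""
    return max(0.0, euclidean(xa, xp) - euclidean(xa, xn) + margin)

def reciprocal_triplet_loss(xa, xp, xn, eps=1e-12):
    """Margin-free variant: pull the positive in, push the negative away."""
    return euclidean(xa, xp) + 1.0 / (euclidean(xa, xn) + eps)

def softmax_loss(logits, target):
    """Cross-entropy on softmax outputs, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def softmax_reciprocal_triplet_loss(logits, target, xa, xp, xn, lam):
    """Supervised softmax term plus weighted reciprocal triplet term."""
    return softmax_loss(logits, target) + lam * reciprocal_triplet_loss(xa, xp, xn)
```

The margin problem is visible in a small example: `triplet_loss((0, 0), (100, 0), (101, 0), margin=0.5)` is zero even though the positive lies far from the anchor, whereas `reciprocal_triplet_loss` on the same triplet remains above 100 and therefore still provides a training signal.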

6 Experiments

In the following section, we compare and contrast different triplet loss functions to quantitatively show performance differences on our task of open-set identification of Holstein Friesian cattle. The goal of the experiments carried out here is to investigate the extent to which different feature embedding spaces are suitable for our specific open-set classification task. Within the context of the overall identification pipeline given in Figure 1, we will assume that the earlier stage (as described in Section 4) has successfully detected the presence of cattle and extracted good-quality regions of interest. These regions are now ready to be identified, as assessed in these experiments.

6.1 Experimental Setup

The employed embedding network utilises a ResNet50 backbone He et al. (2016), with weights pre-trained on ImageNet Deng et al. (2009). The final fully connected layer was set to have $n$ outputs, defining the dimensionality of the embedding space. This dimensionality choice was founded on existing research suggesting it to be suitable for fine-grained recognition tasks such as face recognition Schroff et al. (2015) or image class retrieval Balntas et al. (2016). In each experiment, the network was fine-tuned on the training portion of the identification regions in the OpenCows2020 dataset over epochs with a batch size of . We chose Stochastic Gradient Descent Robbins and Monro (1951) as the optimiser, set to an initial learning rate of with momentum Qian (1999) and weight decay . For every training run, the reported accuracy value is the highest achieved over the epochs of training. Of note is that we found the momentum component led to significant instability during training with reciprocal triplet loss, thus we disabled it for runs using that function. Finally, for a comparative closed-set classifier chosen as another baseline, the same ResNet50 architecture was used.

Once an image is passed through the network, we obtain its $n$-dimensional embedding. We then used $k$-NN with $k$ chosen as suggested by similar research Lagunes-Fortiz et al. (2019), where more complex alternatives for $k$-NN provided only negligible performance gain. Using $k$-NN to classify unseen classes operates by projecting every non-testing instance from every class into the latent space, both those seen and unseen during the network training. Subsequently, every testing instance (of known and unknown individuals) is also projected into the latent space. Finally, each testing instance is classified from votes from the surrounding $k$ nearest embeddings from non-testing instances. Accuracy is then defined as the number of correct predictions divided by the cardinality of the testing set.
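The classification step described above can be sketched as follows, assuming the embeddings have already been computed by the network. The names `knn_predict` and `accuracy` and the representation of the gallery as (embedding, identity) pairs are choices made for this example only.

```python
import math
from collections import Counter

def knn_predict(query, gallery, k):
    """Classify one embedding by majority vote among its k nearest gallery embeddings.

    gallery: list of (embedding, identity) pairs built from non-testing instances,
             covering individuals both seen and unseen during network training.
    """
    nearest = sorted(gallery, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(identity for _, identity in nearest)
    return votes.most_common(1)[0][0]

def accuracy(test_set, gallery, k):
    """Number of correct predictions divided by the size of the testing set."""
    correct = sum(knn_predict(x, gallery, k) == y for x, y in test_set)
    return correct / len(test_set)
```

Because the gallery is only a list of projected embeddings, enrolling a new animal amounts to appending its embeddings to the gallery; the embedding network itself is left unchanged.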

To validate the model in its capacity to generalise from seen to unseen individuals, we perform several $k$-fold cross validations. In order to do so, the set of individuals is randomly split into $k$ evenly-sized bins. For each fold $i$, the $i$-th bin forms the unseen set of individuals (withheld during training), and the rest form the known set, which are trained against. The number of folds $k$ is incrementally lowered to observe the effect of withholding more individuals from training; Table 2 illustrates quantitative results. That is, how well does the model perform on an increasingly open problem? Within each individual class, its instances were randomly split into training and testing samples in a ratio of , respectively. These splits remain constant throughout experimentation to ensure consistency and enable quantitative comparison.
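One way to realise this identity-level cross validation is sketched below. The generator name, the fixed random seed and the round-robin binning are assumptions for illustration; the point is only that folds are formed over individuals rather than over images.

```python
import random

def identity_folds(identities, k, seed=0):
    """Yield (known, unknown) identity sets for each of k folds."""
    ids = list(identities)
    random.Random(seed).shuffle(ids)
    # Round-robin assignment gives bins whose sizes differ by at most one.
    bins = [ids[i::k] for i in range(k)]
    for i in range(k):
        unknown = set(bins[i])       # withheld from network training
        known = set(ids) - unknown   # trained against
        yield known, unknown
```

With 46 individuals, `k = 2` withholds 23 identities per fold (a 50 / 50 known to unknown ratio), while `k = 10` withholds four or five per fold (roughly 90 / 10).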

6.1.1 Identity Space Mining Strategies

During training, the network learns quickly and, as a result, a large fraction of triplets are rendered relatively uninformative. The commonly employed remedy is to mine triplets a priori for difficult examples. This offline process was superseded by Hermans et al. (2017), who proposed two online methods for mining more appropriate triplets: ‘batch hard’ and ‘batch all’. Triplets are mined within each mini-batch during training and the triplet loss is computed over the selections, so a costly offline search before training is no longer necessary. Consequently, we employ ‘batch hard’ here as our online mining strategy: within each mini-batch of triplets, organised by anchor classes and the images for those anchors, each anchor is paired with its hardest positive (the most distant image of the same class) and its hardest negative (the closest image of any other class). This formulation selects moderate triplets overall, since they are the hardest examples within each mini-batch, which is in turn a small subset of the training data. We use this mining strategy for all of the tested loss functions given in the following results section.
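
A minimal sketch of batch-hard selection in the sense of Hermans et al. (2017) is given below. It assumes Euclidean distance between embeddings and uses hypothetical toy data; the function name and the index-based output format are illustrative choices, not those of the original implementation.

```python
import math

def batch_hard_triplets(embeddings, labels):
    """For each anchor, pick its hardest positive and hardest negative in the batch."""
    triplets = []
    for a, (xa, ya) in enumerate(zip(embeddings, labels)):
        pos = [i for i, y in enumerate(labels) if y == ya and i != a]
        neg = [i for i, y in enumerate(labels) if y != ya]
        if not pos or not neg:
            continue  # an anchor needs at least one positive and one negative
        # Hardest positive: the same-class sample farthest from the anchor
        hardest_pos = max(pos, key=lambda i: math.dist(xa, embeddings[i]))
        # Hardest negative: the other-class sample closest to the anchor
        hardest_neg = min(neg, key=lambda i: math.dist(xa, embeddings[i]))
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets

# Hypothetical mini-batch: three images of identity "a", two of identity "b"
batch = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (3.0, 0.0), (3.5, 0.0)]
ids = ["a", "a", "a", "b", "b"]
print(batch_hard_triplets(batch, ids))
```

The selected (anchor, positive, negative) index triples would then be passed to whichever triplet-style loss is being trained.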

6.2 Results

Average Accuracy (%) : [Minimum, Maximum]
Known / Unknown (%) 90 / 10 83 / 17 75 / 25 67 / 33 50 / 50 40 / 60 30 / 70 20 / 80 10 / 90
Cross-entropy (Closed-set) 36.69 25.6 13.1 7.86
Triplet Loss Schroff et al. (2015) 93.34 84.06 81.65 71.57
Reciprocal Triplet Loss Masullo et al. (2019) 83.35 94.96 89.11 87.7
Softmax + Triplet Loss Lagunes-Fortiz et al. (2019) 97.78 94.56 89.31 86.9
(Ours) Softmax + Reciprocal Triplet Loss 97.38 95.56 89.52 84.48
Table 2: Cross-Validated Average Accuracies. Average, minimum and maximum accuracies from cross-validation for varying ratios of known to unknown classes within the OpenCows2020 dataset consisting of 46 individuals. These results are also illustrated in Figure 9.

Key quantitative results for our experiments are given in Table 2. We found that our proposed combination of a supervised Softmax term with the reciprocal triplet loss function led to a slight performance gain compared to the other functions. Figure 9 illustrates these values in graph form, expressing the ability of the implemented methods to cope with an increasingly open-set problem. Also visible in the graph is a standard CNN-based classification baseline using Softmax and cross-entropy loss. As one would expect, its accuracy falls linearly with how open the identification problem is; by design, the baseline method can in no way generalise to unseen classes. In stark contrast, all embedding-based methods drastically outperform the implemented baseline, suggesting the suitability of this form of approach to the problem. Encouragingly, as shown in Figure 10, we found that identification error had no tendency to originate from the unknown identity set.
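
The linear behaviour of the closed-set baseline follows from simple arithmetic, sketched below. The calculation assumes that test instances are spread evenly across identities, which is a simplification for illustration; the function name and the example counts of known individuals are likewise hypothetical.

```python
def closed_set_ceiling(n_known, n_total):
    """Upper bound on accuracy when unseen identities can never be predicted.

    Assumes test instances are spread evenly across identities (a simplification).
    """
    return n_known / n_total

# A classifier limited to its training classes cannot exceed the known fraction
for known in (41, 23, 5):
    print(f"{known}/46 known -> at most {100 * closed_set_ceiling(known, 46):.1f}%")
```

Embedding-based methods are not subject to this ceiling because classification is deferred to a nearest-neighbour search over a gallery that can include unseen identities.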

Figure 9: Open-Set Generalisation Ability. Average, minimum and maximum accuracy across folds versus how open the problem is; that is, the proportion of all identity classes that are withheld entirely during training. Plotted are the differing responses based on the employed loss function, where TL and RTL denote standard triplet loss and reciprocal triplet loss, respectively, and “SoftMax +” denotes a weighted mixture of cross-entropy and triplet loss functions as suggested by Lagunes-Fortiz et al. (2019). Also included is a baseline to highlight the unsuitability of a traditional closed-set classifier.
Figure 10: Error Proportion vs. Openness. Where the proportion of error lies (in the known or unknown set) versus how open-set the problem is. Values were calculated from embeddings trained via Softmax and reciprocal triplet loss. These results were found to be consistent across all employed loss functions.

One issue we encountered is that when there are only a small number of training classes, the model can quickly learn to satisfy that limited set, achieving near-zero loss and 100% accuracy on the validation data for those seen classes. However, the construction of the latent space is then largely incomplete and there is no room for the model to learn any further, so performance on novel classes cannot be improved. For best performance in practice, we therefore suggest utilising as wide an identity landscape as possible (many individuals) to carve out a diverse latent space capturing a wide range of intra-breed variance. The avoidance of overfitting is critical, as illustrated in Figure 11, where eventual perfect performance (overfitting) on a small set of known training identities does not allow performance to generalise to novel classes. The reciprocal triplet loss formulation performs slightly better across the learning task, which is reflected quantitatively in our findings (see Figure 9). Thus, we suggest utilisation of RTL over the original triplet loss function for the task at hand.

(a) 50% open
(b) 10% open
Figure 11: Training-Validation Accuracy Divergence. Contrasting examples of divergence in training and validation accuracy over the course of training for 500 epochs for (a) a 50% open problem and (b) 10% openness. Cyan: training accuracy, blue: validation accuracy, orange: loss. For (a), given the reduced (half) set of classes, the model quickly learns and overfits the training set leading to increasingly poor performance on the validation set containing all classes, in contrast to (b).

6.2.1 Qualitative Analysis

For a qualitative view, Figure 12 visualises the embedded space and the corresponding clusters. This plot and the others in this section were produced using the t-distributed Stochastic Neighbour Embedding (t-SNE) van der Maaten and Hinton (2008) technique for visualising high-dimensional spaces at a fixed perplexity. Visible, particularly in relation to the embedded training set (see Fig. 12(a)), is the success of the model trained via triplet loss formulations in clumping like identities together whilst distancing others. This is then sufficient to cluster and thereby re-identify never before seen testing identities (see Fig. 12(b)). Most importantly in this case, despite only being shown half of the identity classes during training, the model learned a sufficiently discriminative embedding that generalises well to previously unseen cattle. Thus, surprisingly few coat pattern identities are sufficient to create a latent space that spans dimensions which can successfully accommodate and cluster unseen identities.

(a) Training
(b) Testing
Figure 12: t-SNE van der Maaten and Hinton (2008) Embedding Visualisation. Examples of the feature embedding space for training and testing instances for each class. Positions of class labels indicate cluster centroids, with those in red (23 individuals; half of the dataset) denoting unseen classes that were withheld during training. The embedding was trained with our proposed loss function combining a softmax component with reciprocal triplet loss.

Figure 13 visualises the embeddings of the consistent training set for a 50% open problem across all the implemented loss functions used to train latent spaces. The inclusion of a Softmax component in the loss function provided quantifiable improvements in identification accuracy. This is also reflected qualitatively in the embeddings and corresponding clusters when comparing the top and bottom rows in Figure 13. Thus, both quantitative and qualitative findings reinforce the suitability of the proposed method for the task at hand. The core technical takeaway is that the inclusion of a fully supervised loss term appears to beneficially support a purely metric learning-based approach in training a discriminative and separable latent representation that is able to generalise to unseen instances of Holstein Friesians. Figure 14 illustrates an example from each class overlaid in this same latent space. This visualises the spatial similarities and dissimilarities the network uses to generate separable embeddings for the classes seen during training that generalise to unseen individuals (shown in red).
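
To make the combination of a supervised term with a metric term concrete, the sketch below evaluates both for a single triplet. It assumes the reciprocal triplet loss takes the margin-free form d(a, p) + 1 / d(a, n) with Euclidean distance, and that the two terms are mixed by a scalar weight. The weight value `lam`, the stabilising constant `eps` and all function names are hypothetical choices for the example, not values taken from the experiments.

```python
import math

def reciprocal_triplet_loss(anchor, positive, negative, eps=1e-8):
    """d(a, p) + 1 / d(a, n): pulls positives in and pushes negatives out, no margin."""
    # eps (an assumption of this sketch) guards against division by zero
    return math.dist(anchor, positive) + 1.0 / (math.dist(anchor, negative) + eps)

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, via a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def softmax_rtl(logits, target, anchor, positive, negative, lam=0.01):
    """Supervised term plus a weighted metric term; `lam` is a hypothetical weight."""
    metric = reciprocal_triplet_loss(anchor, positive, negative)
    return cross_entropy(logits, target) + lam * metric

# A well-separated triplet incurs a lower metric loss than a confused one
good = reciprocal_triplet_loss((0.0, 0.0), (0.1, 0.0), (5.0, 0.0))
bad = reciprocal_triplet_loss((0.0, 0.0), (5.0, 0.0), (0.1, 0.0))
print(good < bad)
```

The supervised term only applies to identities present in the training set, while the metric term shapes distances in a way that can carry over to unseen identities.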

(a) Triplet Loss
(b) Reciprocal Triplet Loss
(c) Softmax + Triplet Loss
(d) Softmax + Reciprocal Triplet Loss
Figure 13: Embeddings Visualisation per Loss Function. Visualisation of the clusterings for the various loss functions used for training the embedded spaces. The visualisations are for the first fold of the same 50% open problem across all loss functions. Note that the individual colours representing separate classes are consistent across all visualisations. The visualisation is generated using the t-SNE van der Maaten and Hinton (2008) dimensionality reduction technique.
Figure 14: Class Examples Overlay. A randomly chosen example from each class overlaid on the centroids of the embeddings for their respective training instances, where half of the classes (highlighted in red) were not shown during training. Dimensionality reduction was performed using t-SNE van der Maaten and Hinton (2008) and the embedding was trained using Softmax and reciprocal triplet loss.

7 Conclusion

This work proposes a complete pipeline for identifying individual Holstein Friesian cattle, both seen and never before seen, in agriculturally-relevant imagery. An assessment of existing state-of-the-art object detectors determined that they are well-suited to serve as an initial breed-wide cattle detector, with RetinaNet demonstrating sufficiently strong performance in terms of mAP on the employed dataset. Extensive experiments in open-set recognition found that surprisingly few instances are needed in order to learn and construct a robust embedding space, from image RoI to ID clusters, that generalises well to unseen cattle. Specifically, Reciprocal Triplet Loss in conjunction with a supervised Softmax component was found to generalise best in terms of performance across open-set experiments. For instance, for a latent space built from 23 out of 46 individuals, a cross-validated accuracy of 98.2% was observed. Considering its wider application, these experiments suggest that the proposed pipeline is a viable step towards automating cattle detection and identification non-intrusively in agriculturally-relevant scenarios where herds change dynamically over time. Importantly, the identification component can be trained at the time of deployment on a present herd and, as shown here for the first time, performs well without re-enrolment of individuals or re-training of the system as the population changes, which is a key requirement for transferability in practical agricultural settings.

7.1 Future Work

Further research will investigate the scalability of this form of approach to large populations, that is, increasing the base number of individuals via additional data acquisition with the intention of learning a general representation of dorsal features exhibited by Holstein Friesian cattle. Doing so would pave the way for the model to generalise to new farms and new herds prior to deployment, with significant implications for the precision livestock farming sector.

Another future avenue of research will investigate extension to movement tracking from video sequences through continuous re-identification. As we have shown that our cattle detection and individual identification techniques are highly accurate, the incorporation of simple tracking techniques between video frames has the potential to filter out any remaining errors. How robust this approach will be to heavy bunching of cows (for example, before milking in traditional parlours) remains to be tested.
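
One simple way such temporal filtering might work is a majority vote over the per-frame identity predictions of a single track. The sketch below is purely illustrative and was not evaluated in this work; it assumes detections have already been associated into one track, and the function name and identity labels are hypothetical.

```python
from collections import Counter

def smooth_track_ids(frame_ids):
    """Assign a whole track the identity predicted in most of its frames."""
    winner, _ = Counter(frame_ids).most_common(1)[0]
    return [winner] * len(frame_ids)

# Hypothetical per-frame predictions for one tracked animal, with one error
per_frame = ["cow_07", "cow_07", "cow_12", "cow_07", "cow_07"]
print(smooth_track_ids(per_frame))
```

A vote of this kind only helps when the track association itself is reliable, which is exactly what heavy bunching of animals would put under strain.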

Further goals include the incorporation of collision detection for analysis of social networks and transmission dynamics, and behaviour detection for automated welfare and health assessment, which would allow longitudinal tracking of the disease and welfare status of individual cows. In this regard, the addition of a depth imagery component alongside standard RGB to support and improve these objectives needs to be evaluated.

References


  • S. J. Adcock, C. B. Tucker, G. Weerasinghe, and E. Rajapaksha (2018) Branding practices on four dairies in kantale, sri lanka. Animals 8 (8), pp. 137. Cited by: §1.
  • Agriculture and Horticulture Development Board (2015) Dairy statistics: an insider’s guide 2015. Cited by: Figure 2.
  • Agriculture and Horticulture Development Board (2017) Farm data - average size of dairy herds. Cited by: Figure 2.
  • A. Allen, B. Golden, M. Taylor, D. Patterson, D. Henriksen, and R. Skuce (2008) Evaluation of retinal imaging technology for the biometric identification of bovine animals in northern ireland. Livestock science 116 (1-3), pp. 42–52. Cited by: §2.
  • W. Andrew, C. Greatwood, and T. Burghardt (2017) Visual localisation and individual identification of holstein friesian cattle via deep learning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2850–2859. Cited by: §1, §1, §2.1, §3.
  • W. Andrew, C. Greatwood, and T. Burghardt (2019) Aerial animal biometrics: individual friesian cattle recovery and visual identification via an autonomous uav with onboard deep inference. arXiv preprint arXiv:1907.05310. Cited by: §1, §2.1, §3.
  • W. Andrew, C. Greatwood, and T. Burghardt (2020) Fusing animal biometrics with autonomous robotics: drone-based search and individual id of friesian cattle. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision Workshops, pp. 38–43. Cited by: §1.
  • W. Andrew, S. Hannuna, N. Campbell, and T. Burghardt (2016) Automatic individual holstein friesian cattle identification via selective local coat pattern matching in rgb-d imagery. In 2016 IEEE International Conference on Image Processing (ICIP), pp. 484–488. Cited by: §1, §1, §2.1, §3.
  • W. Andrew (2019) Visual biometric processes for collective identification of individual friesian cattle. Ph.D. Thesis, University of Bristol. Cited by: Figure 2, §1, §1, §2.1.
  • A. C. Arslan, M. Akar, and F. Alagöz (2014) 3D cow identification in cattle farms. In 2014 22nd Signal Processing and Communications Applications Conference (SIU), pp. 1347–1350. Cited by: §2.
  • A. I. Awad and M. Hassaballah (2019) Bag-of-visual-words for cattle identification from muzzle print images. Applied Sciences 9 (22), pp. 4914. Cited by: §2.
  • A. I. Awad (2016) From classical methods to animal biometrics: a review on cattle identification and tracking. Computers and Electronics in Agriculture 123, pp. 423–435. Cited by: §1.
  • V. Balntas, E. Riba, D. Ponsa, and K. Mikolajczyk (2016) Learning local feature descriptors with triplets and shallow convolutional neural networks. In BMVC, Vol. 1, pp. 3. Cited by: §6.1.
  • J. G. A. Barbedo, L. V. Koenigkan, T. T. Santos, and P. M. Santos (2019) A study on the detection of cattle in uav images using deep learning. Sensors 19 (24), pp. 5436. Cited by: §2.1, §2.
  • B. Barry, U. Gonzales-Barron, K. McDonnell, F. Butler, and S. Ward (2007) Using muzzle pattern recognition as a biometric approach for cattle identification. Transactions of the ASABE 50 (3), pp. 1073–1080. Cited by: §2.
  • A. Bendale and T. E. Boult (2016) Towards open set deep networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1563–1572. Cited by: §2.3.
  • A. Bendale and T. Boult (2015) Towards open world recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1893–1902. Cited by: §2.3.
  • L. Bertram, B. Gill, J. Coventry, A. Springs, and F. DPIFM (1996) Freeze branding. Department of industries and fisheries, northern territory, Darwin, Australia. Cited by: Figure 3.
  • A. Bhole, O. Falzon, M. Biehl, and G. Azzopardi (2019) A computer vision pipeline that uses thermal and rgb images for the recognition of holstein cattle. In International Conference on Computer Analysis of Images and Patterns, pp. 108–119. Cited by: §2.1.
  • M. Bowling, D. Pendell, D. Morris, Y. Yoon, K. Katoh, K. Belk, and G. Smith (2008) Identification and traceability of cattle in selected countries outside of north america. The Professional Animal Scientist 24 (4), pp. 287–294. Cited by: §1.
  • W. Buick (2004) Animal passports and identification. Defra Veterinary Journal 15, pp. 20–26. Cited by: §1.
  • C. Cai and J. Li (2013) Cattle face recognition using local binary pattern descriptor. In 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 1–4. Cited by: §2.
  • Z. Cai and N. Vasconcelos (2018) Cascade r-cnn: delving into high quality object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6154–6162. Cited by: §2.2.
  • V. Caporale, A. Giovannini, C. Di Francesco, and P. Calistri (2001) Importance of the traceability of animals and animal products in epidemiology. Revue Scientifique et Technique-Office International des Epizooties 20 (2), pp. 372–378. Cited by: §1.
  • J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: §4.2, Table 1, §6.1.
  • Department for Environment, Food and Rural Affairs (2008) The cattle book 2008: descriptive statistics of cattle numbers in great britain on 1 june 2008. DEFRA. Cited by: Figure 2, §1.
  • D. Edwards, A. Johnston, and D. Pfeiffer (2001) A comparison of commonly used ear tags on the ear damage of sheep. Animal Welfare 10 (2), pp. 141–151. Cited by: §1.
  • D. Edwards and A. Johnston (1999) Welfare implications of sheep ear tags. The Veterinary Record 144 (22), pp. 603–606. Cited by: §1.
  • H. M. El Hadad, H. A. Mahmoud, and F. A. Mousa (2015) Bovines muzzle classification based on machine learning techniques. Procedia Computer Science 65, pp. 864–871. Cited by: §2.
  • M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. Cited by: §3.1, §4.3.
  • G. Fosgate, A. Adesiyun, and D. Hird (2006) Ear-tag retention and identification methods for extensively managed water buffalo (bubalus bubalis) in trinidad. Preventive veterinary medicine 73 (4), pp. 287–296. Cited by: §1.
  • Z. Ge, S. Demyanov, Z. Chen, and R. Garnavi (2017) Generative openmax for multi-class open set classification. arXiv preprint arXiv:1707.07418. Cited by: §2.3.
  • C. Geng, S. Huang, and S. Chen (2018) Recent advances in open set recognition: a survey. arXiv preprint arXiv:1811.08581. Cited by: §2.3.
  • R. Girshick, J. Donahue, T. Darrell, and J. Malik (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587. Cited by: §2.3.
  • R. Hadsell, S. Chopra, and Y. LeCun (2006) Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 2, pp. 1735–1742. Cited by: §5.1.
  • M. F. Hansen, M. L. Smith, L. N. Smith, K. A. Jabbar, and D. Forbes (2018) Automated monitoring of dairy cow body condition, mobility and weight using a single 3d video capture device. Computers in industry 98, pp. 14–22. Cited by: §1.
  • M. Hassen and P. K. Chan (2018) Learning a neural-network-based representation for open set recognition. arXiv preprint arXiv:1802.04365. Cited by: §2.3.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §4.2, §6.1.
  • A. Hermans, L. Beyer, and B. Leibe (2017) In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737. Cited by: §2.3, §6.1.1.
  • T. Hodan, P. Haluza, Š. Obdržálek, J. Matas, M. Lourakis, and X. Zabulis (2017) T-less: an rgb-d dataset for 6d pose estimation of texture-less objects. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 880–888. Cited by: §5.1.
  • R. Houston (2001) A computerised database system for bovine traceability. Revue Scientifique et Technique-Office International des Epizooties 20 (2), pp. 652. Cited by: §1.
  • L. P. Jain, W. J. Scheirer, and T. E. Boult (2014) Multi-class open set recognition using probability of inclusion. In European Conference on Computer Vision, pp. 393–409. Cited by: §2.3.
  • A. Johnston, D. Edwards, E. Hofmann, P. Wrench, F. Sharples, R. Hiller, W. Welte, and K. Diederichs (1996) Welfare implications of identification of cattle by ear tags. The Veterinary Record 138 (25), pp. 612–614. Cited by: §1.
  • A. B. Jung, K. Wada, J. Crall, S. Tanaka, J. Graving, C. Reinders, S. Yadav, J. Banerjee, G. Vecsei, A. Kraft, Z. Rui, J. Borovec, C. Vallentin, S. Zhydenko, K. Pfeiffer, B. Cook, I. Fernández, F. De Rainville, C. Weng, A. Ayala-Acevedo, R. Meudec, M. Laporte, et al. (2020) imgaug. Accessed 01-Feb-2020. Cited by: §3.1.
  • P. R. M. Júnior, R. M. de Souza, R. d. O. Werneck, B. V. Stein, D. V. Pazinato, W. R. de Almeida, O. A. Penatti, R. d. S. Torres, and A. Rocha (2017) Nearest neighbors distance ratio open-set classifier. Machine Learning 106 (3), pp. 359–386. Cited by: §2.3.
  • P. R. M. Júnior, T. E. Boult, J. Wainer, and A. Rocha (2016) Specialized support vector machines for open-set recognition. arXiv preprint arXiv:1606.03802. Cited by: §2.3.
  • A. Kimura, K. Itaya, and T. Watanabe (2004) Structural pattern recognition of biological textures with growing deformations: a case of cattle’s muzzle patterns. Electronics and Communications in Japan (Part II: Electronics) 87 (5), pp. 54–66. Cited by: §2.
  • M. Klindtworth, G. Wendl, K. Klindtworth, and H. Pirkelmann (1999) Electronic identification of cattle with injectable transponders. Computers and electronics in agriculture 24 (1-2), pp. 65–79. Cited by: §1.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §2.3.
  • S. Kumar, S. K. Singh, R. S. Singh, A. K. Singh, and S. Tiwari (2017) Real-time recognition of cattle using animal biometrics. Journal of Real-Time Image Processing 13 (3), pp. 505–526. Cited by: §2.
  • S. Kumar and S. K. Singh (2017) Automatic identification of cattle using muzzle point pattern: a hybrid feature extraction and classification paradigm. Multimedia Tools and Applications 76 (24), pp. 26551–26580. Cited by: §2.
  • M. Lagunes-Fortiz, D. Damen, and W. Mayol-Cuevas (2019) Learning discriminative embeddings for object recognition on-the-fly. In 2019 International Conference on Robotics and Automation (ICRA), pp. 2932–2938. Cited by: §2.3, §5.1, Figure 9, §6.1, Table 2.
  • W. Li, Z. Ji, L. Wang, C. Sun, and X. Yang (2017) Automatic individual identification of holstein dairy cows using tailhead images. Computers and electronics in agriculture 142, pp. 622–631. Cited by: §1, §2.1.
  • T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie (2017a) Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117–2125. Cited by: §4.1.
  • T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár (2017b) Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pp. 2980–2988. Cited by: §2.2, §4.2, Table 1, §4.
  • T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft coco: common objects in context. In European conference on computer vision, pp. 740–755. Cited by: Table 1.
  • W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg (2016) Ssd: single shot multibox detector. In European conference on computer vision, pp. 21–37. Cited by: §2.2.
  • C. A. Martinez-Ortiz, R. M. Everson, and T. Mottram (2013) Video tracking of dairy cows for assessing mobility scores. Cited by: §1, §2.1.
  • A. Masullo, T. Burghardt, D. Damen, T. Perrett, and M. Mirmehdi (2019) Who goes there? exploiting silhouettes and wearable signals for subject identification in multi-person environments. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 0–0. Cited by: §2.3, §5.1, Table 2.
  • B. J. Meyer and T. Drummond (2019) The importance of metric learning for robotic vision: open set recognition and active learning. In 2019 International Conference on Robotics and Automation (ICRA), pp. 2924–2931. Cited by: §2.3.
  • L. Neal, M. Olson, X. Fern, W. Wong, and F. Li (2018) Open set learning with counterfactual images. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 613–628. Cited by: §2.3.
  • E. New, D. f. E. F. Zoonotic Disease (NEZD) Division, and R. Affairs (2005) Most common breeds of cattle in gb (nuts 1 areas). Vol. 1. Cited by: §1.
  • United States Department of Agriculture (USDA) – Animal and Plant Health Inspection Service. Cattle identification. [Online; accessed 14-November-2018]. Cited by: §1.
  • H. Oh Song, S. Jegelka, V. Rathod, and K. Murphy (2017) Deep metric learning via facility location. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5382–5390. Cited by: §2.3.
  • H. Oh Song, Y. Xiang, S. Jegelka, and S. Savarese (2016) Deep metric learning via lifted structured feature embedding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4004–4012. Cited by: §2.3.
  • M. Opitz, G. Waltner, H. Possegger, and H. Bischof (2018) Deep metric learning with bier: boosting independent embeddings robustly. IEEE transactions on pattern analysis and machine intelligence. Cited by: §2.3.
  • P. Oza and V. M. Patel (2019) Deep cnn-based multi-task learning for open-set recognition. arXiv preprint arXiv:1903.03161. Cited by: §2.3.
  • European Parliament and Council (1997) Establishing a system for the identification and registration of bovine animals and regarding the labelling of beef and beef products and repealing council regulation (ec) no 820/97. [Online; accessed 29-January-2016]. Cited by: §1.
  • J. A. Pennington (2007) Tattooing of cattle and goats. Cooperative Extension Service, University of Arkansas Division of …. Cited by: Figure 3.
  • W. Petersen (1922) The identification of the bovine by means of nose-prints. Journal of dairy science 5 (3), pp. 249–258. Cited by: §2.
  • N. Qian (1999) On the momentum term in gradient descent learning algorithms. Neural networks 12 (1), pp. 145–151. Cited by: §4.2, §6.1.
  • Y. Qiao, D. Su, H. Kong, S. Sukkarieh, S. Lomax, and C. Clark (2019) Individual cattle identification using a deep learning based framework. IFAC-PapersOnLine 52 (30), pp. 318–323. Cited by: §2.1.
  • J. Redmon, S. Divvala, R. Girshick, and A. Farhadi (2016) You only look once: unified, real-time object detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.2.
  • J. Redmon and A. Farhadi (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767. Cited by: §2.2, §4.2, Table 1.
  • S. Ren, K. He, R. Girshick, and J. Sun (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pp. 91–99. Cited by: §2.2, §4.2, Table 1.
  • H. Robbins and S. Monro (1951) A stochastic approximation method. The annals of mathematical statistics, pp. 400–407. Cited by: §4.2, §6.1.
  • E. M. Rudd, L. P. Jain, W. J. Scheirer, and T. E. Boult (2017) The extreme value machine. IEEE transactions on pattern analysis and machine intelligence 40 (3), pp. 762–768. Cited by: §2.3.
  • W. J. Scheirer, A. de Rezende Rocha, A. Sapkota, and T. E. Boult (2012) Toward open set recognition. IEEE transactions on pattern analysis and machine intelligence 35 (7), pp. 1757–1772. Cited by: §2.3.
  • W. J. Scheirer, L. P. Jain, and T. E. Boult (2014) Probability models for open set recognition. IEEE transactions on pattern analysis and machine intelligence 36 (11), pp. 2317–2324. Cited by: §2.3.
  • F. Schroff, D. Kalenichenko, and J. Philbin (2015) Facenet: a unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815–823. Cited by: §2.3, §5.1, §6.1, Table 2.
  • P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun (2013) Overfeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229. Cited by: §2.3.
  • C. Shanahan, B. Kernan, G. Ayalew, K. McDonnell, F. Butler, and S. Ward (2009) A framework for beef traceability from farm to slaughter using global standards: an irish perspective. Computers and electronics in agriculture 66 (1), pp. 62–69. Cited by: §1.
  • L. Shu, H. Xu, and B. Liu (2017) Doc: deep open classification of text documents. arXiv preprint arXiv:1709.08716. Cited by: §2.3.
  • G. Smith, J. Tatum, K. Belk, J. Scanga, T. Grandin, and J. Sofos (2005) Traceability from a us perspective. Meat science 71 (1), pp. 174–193. Cited by: §1.
  • M. Tadesse and T. Dessie (2003) Milk production performance of zebu, holstein friesian and their crosses in ethiopia. Livestock Research for Rural Development 15 (3), pp. 1–9. Cited by: §1.
  • A. Tharwat, T. Gaber, A. E. Hassanien, H. A. Hassanien, and M. F. Tolba (2014) Cattle identification using muzzle print images based on texture features approach. In Proceedings of the Fifth International Conference on Innovations in Bio-Inspired Computing and Applications IBICA 2014, pp. 217–227. Cited by: §2.
  • L. Turner, M. Udal, B. Larson, and S. Shearer (2000) Monitoring cattle behavior and pasture use with gps and gis. Canadian Journal of Animal Science 80 (3), pp. 405–413. Cited by: §1.
  • E. D. Ungar, Z. Henkin, M. Gutman, A. Dolev, A. Genizi, and D. Ganskopp (2005) Inference of animal activity from gps collar data on free-ranging cattle. Rangeland Ecology & Management 58 (3), pp. 256–266. Cited by: §1.
  • L. J. van der Maaten and G. E. Hinton (2008) Visualizing high-dimensional data using t-sne. Journal of machine learning research 9 (nov), pp. 2579–2605. Cited by: Figure 12, Figure 13, Figure 14, §6.2.1.
  • J. Velez, A. Sanchez, J. Sanchez, and J. Esteban (2013) Beef identification in industrial slaughterhouses using machine vision techniques. Spanish Journal of Agricultural Research 11 (4), pp. 945–957. Cited by: Figure 3.
  • X. Wang, F. M. Eliott, J. Ainooson, J. H. Palmer, and M. Kunda (2017) An object is worth six thousand pictures: the egocentric, manual, multi-image (emmi) dataset. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2364–2372. Cited by: §5.1.
  • D. Wardrope (1995) Problems with the use of ear tags in cattle. Veterinary Record 137 (26), pp. 675–675. Cited by: §1.
  • R. Yoshihashi, W. Shao, R. Kawakami, S. You, M. Iida, and T. Naemura (2019) Classification-reconstruction learning for open-set recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4016–4025. Cited by: §2.3.