1 Introduction and Background
Holstein-Friesians are, with a global population of 70 million animals, the most numerous and highest milk-yielding cattle breed in the world. Cattle identification (ID) via tags [19, 12, 39] is mandatory [32, 31], yet transponders, branding [1, 9], and biometric ID via face, muzzle [34, 26, 22, 42, 10, 14], retina, rear, or coat patterns [29, 20, 27, 8] are also viable. The last can conveniently operate from a distance above the animal and has recently been implemented via supervised deep learning [6, 3]. However, research into reducing the manual labelling effort needed to create and maintain such ID systems is in its infancy [5, 44]. In particular, unsupervised learning for coat pattern identification of Holstein-Friesians has not yet been attempted, and public datasets remain small to date.
This paper addresses these shortcomings. It introduces Cows2021, the largest ID-annotated dataset of Holstein-Friesians published to date, alongside a basic self-supervision system for the video-based identification of individual animals (see Fig. 1).
2 Dataset Cows2021
We introduce the RGB image dataset Cows2021 (available online at https://data.bris.ac.uk), which features a herd of Holstein-Friesian cattle (see Fig. 2) and was acquired via an Intel D435 camera at the University of Bristol's Wyndhurst Farm in Langford Village, UK. The camera pointed downwards over a walkway (see Fig. 3) between the milking parlour and holding pens. Motion-triggered recordings took place after milking across a month of filming.
All frames share a fixed resolution with 8 bits per RGB channel. The dataset contains still images in addition to videos recorded at 30 fps. The distribution of stills across individuals and time reflects the natural workings of the farm (see Fig. 4). Various expert ground truth (GT) annotations are provided alongside the acquired data.
Oriented Bounding-Box Cattle Annotations. Adhering to the VOC 2012 guidelines for object annotation, we manually labelled (tool: https://github.com/cgvict/roLabelImg) all visible cattle torso instances across the still image set. Annotations exclude the head, neck, legs, and tail. Significantly clipped torso instances were not used further and were given a ‘clipped’ tag. Example images from the resulting set of non-clipped cattle torso annotations are shown in red in Fig. 3 (bottom). Each oriented bounding-box label is parameterised by a tuple encoding the box centrepoint, width, height, and head direction.
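The oriented-box parameterisation above can be made concrete with a short sketch. The function name, the use of radians, and the corner ordering are illustrative assumptions, not part of the dataset tooling:

```python
import numpy as np

def oriented_box_corners(cx, cy, w, h, theta):
    """Corner points of an oriented bounding-box given its centrepoint
    (cx, cy), width w, height h, and head-direction angle theta in
    radians -- a hypothetical helper illustrating the tuple encoding."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])             # 2-D rotation matrix
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ R.T + np.array([cx, cy])      # rotate, then translate
```

For an axis-aligned box (theta = 0) this reduces to the familiar four corners around the centrepoint.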
Animal Identity Annotations. Detected cattle instances (see Sec. 3) were manually assigned to individual IDs (see Fig. 2). All-black cows were excluded from the ID study and left for future research. The number of occurrences varies across individuals (see Fig. 4). Annotations filmed on different days to the video data were used to form the identity test set.
Video Data and Tracklet Annotations. In addition to still images, the dataset contains videos with tracklet information designed for utilisation as a rich source of self-supervision in identity learning. Using a highly reliable ID-agnostic cattle detector (see Sec. 3) and sampling at 5Hz, tracking-by-detection was employed to connect nearest centrepoints of detections in neighbouring frames and thereby extract entire tracklets of the same individual (see Fig. 1). Manual checking ensured no tracking errors occurred. The average number of tracklets per video is .
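The nearest-centrepoint linking step can be sketched as follows; the gating threshold `max_dist` and the greedy matching order are assumptions for illustration, not the exact implementation used:

```python
import numpy as np

def link_tracklets(frames, max_dist=50.0):
    """Greedy tracking-by-detection sketch: detections in neighbouring
    frames are linked via nearest centrepoints. `frames` is a list of
    (N_i, 2) arrays of box centrepoints sampled at 5 Hz; `max_dist` is
    an assumed gating distance in pixels. Returns tracklets as lists
    of (frame_index, centrepoint) entries."""
    tracklets, active = [], []       # active: (tracklet index, last centre)
    for t, dets in enumerate(frames):
        unmatched = list(range(len(dets)))
        next_active = []
        for trk_idx, last in active:
            if not unmatched:
                continue
            d = [np.linalg.norm(dets[j] - last) for j in unmatched]
            j = unmatched[int(np.argmin(d))]
            if min(d) <= max_dist:               # gate on centre distance
                tracklets[trk_idx].append((t, dets[j]))
                unmatched.remove(j)
                next_active.append((trk_idx, dets[j]))
        for j in unmatched:                      # start new tracklets
            tracklets.append([(t, dets[j])])
            next_active.append((len(tracklets) - 1, dets[j]))
        active = next_active
    return tracklets
```

A detection too far from any active track starts a new tracklet, so one animal crossing the walkway yields one continuous tracklet per pass.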
3 ID-agnostic Cattle Detector
Existing multi-object single-frame cattle detectors [11, 7, 5] produce image-aligned bounding-boxes that cannot avoid capturing several individuals in crowded scenes (see Fig. 3), which is problematic for subsequent identity assignment. In response, we constructed a first orientation-aware cattle detector (see Fig. 3, blue) by modifying RetinaNet with an ImageNet-pretrained ResNet50 backbone. We added target parameters for orientation encoding and implemented rotated anchors across the five pyramid levels (P3–P7). To train the network, we partitioned the still image set into training, validation, and test splits along timestamps so that temporal bias is reduced. We then trained the network against Focal Loss via SGD with momentum and weight decay. Fig. 5 illustrates training and depicts full performance benchmarks for the detector. On the test set, the detector's Average Precision at a fixed Intersection over Union (IoU) threshold confirms that it reliably translates in-barn videos to tracklets.
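As a rough illustration of rotated anchors, the sketch below spawns one (cx, cy, w, h, theta) anchor per size/angle combination at every cell of a feature-map level; the sizes, angles, and the 2:1 torso aspect ratio are hypothetical choices, not the paper's settings:

```python
import numpy as np

def rotated_anchors(feat_h, feat_w, stride, sizes, angles):
    """Sketch of rotated-anchor generation for one pyramid level:
    every feature-map cell spawns one (cx, cy, w, h, theta) anchor per
    size/angle combination, with a 2:1 aspect ratio standing in for an
    elongated cattle torso."""
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w),
                         indexing="ij")
    centres = np.stack([(xs + 0.5) * stride, (ys + 0.5) * stride],
                       axis=-1).reshape(-1, 2)    # cell centres in pixels
    anchors = []
    for cx, cy in centres:
        for s in sizes:
            for a in angles:
                anchors.append((cx, cy, s, s / 2, a))
    return np.array(anchors)
```

In a RetinaNet-style head, regression targets would then be offsets from these anchors, with the extra angle term matched against the annotated head direction.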
4 Self-Supervised Animal Identity Learning
Given an ID-agnostic cattle detector (see Sec. 3), reliable tracklets can be generated (see Sec. 2) from readily available in-barn videos of a Holstein-Friesian herd. We investigated to what extent this data can be used to self-supervise the learning of the identities of filmed individual animals and thereby aid the time-consuming task of manual labelling.
4.1 Contrastive Training
Identification Network and Triplet Loss. We use a ResNet50 pretrained on ImageNet, modified to have a fully-connected final layer that learns a latent ID-space. Across the training data of all videos, we normalise each tracklet for rotation (as seen in Fig. 2) and organise it into a ‘positive’ ID sample set representing the same, unknown individual. We pair this set against ‘negative’ samples from random cattle in other videos, which have a high chance of showing a different individual. All sets are enhanced via rotational augmentation. The separate still image data was used as a validation and testing base. Reciprocal triplet loss (RTL) is then employed for learning an ID-encoding latent space via an online batch-hard mining strategy:

$$\mathcal{L}_{\mathrm{RTL}} = d(x_a, x_p) + \frac{1}{d(x_a, x_n)},$$

where $x_a$ and $x_p$ are sampled from the ‘positive’ set, $x_n$ is a ‘negative’ sample, and $d(\cdot,\cdot)$ denotes distance in the embedding space. We trained the network for 7 hours via SGD over multiple epochs, with batch size, learning rate, margin, and weight decay held fixed. The pocket algorithm against the validation set was used to tackle overfitting (see Fig. 6).
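A minimal numpy sketch of batch-hard mining with reciprocal triplet loss follows; the explicit loop is for clarity only, and a production version would be vectorised inside a deep learning framework:

```python
import numpy as np

def batch_hard_rtl(emb, labels):
    """Batch-hard reciprocal triplet loss sketch. For each anchor, the
    hardest (farthest) positive and hardest (closest) negative in the
    batch are mined online, and RTL = d(a, p) + 1 / d(a, n) is averaged
    over all anchors with at least one valid triplet."""
    # pairwise Euclidean distance matrix between all embeddings
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    losses = []
    for i in range(len(emb)):
        pos = same[i].copy()
        pos[i] = False                   # exclude the anchor itself
        neg = ~same[i]
        if not pos.any() or not neg.any():
            continue                     # no valid triplet for this anchor
        d_ap = d[i][pos].max()           # hardest positive
        d_an = d[i][neg].min()           # hardest negative
        losses.append(d_ap + 1.0 / (d_an + 1e-8))
    return float(np.mean(losses))
```

The loss shrinks as same-tracklet samples collapse together and samples from other videos are pushed apart, which is precisely the ID-encoding behaviour being learned.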
4.2 Animal Identity Discovery via Clustering
Clustering. We then fitted a Gaussian Mixture Model (GMM) to the generated latent space, setting the cluster cardinality to the number of known patterned individual animals. Resulting clusters are then interpreted as representing separate animal identities. A t-distributed Stochastic Neighbour Embedding (t-SNE) of the training set projected into the clustered space is visualised in Fig. 7. To evaluate the clustering performance, we used two measures: the Adjusted Rand Index (ARI) and ID prediction accuracy. For the latter, each GMM cluster $c$ is assigned to the one individual ID $k$ with the highest overlap, defined as:

$$w(c, k) = \frac{n_{c,k}}{N_k},$$

where $n_{c,k}$ is the number of images in GMM cluster $c$ that belong to individual $k$, and $N_k$ is the total number of images of the individual. This produces (GMM Cluster)-(ID Label) pairs for accuracy evaluation.
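The overlap-based cluster-to-ID assignment can be sketched as follows; the function name is hypothetical, and cluster labels would come from a fitted GMM (e.g. scikit-learn's GaussianMixture):

```python
import numpy as np

def assign_cluster_ids(cluster_labels, gt_ids):
    """Assign each cluster c to the individual k maximising the overlap
    w(c, k) = n_{c,k} / N_k, where n_{c,k} counts images of individual k
    that fall in cluster c and N_k is k's total image count."""
    clusters, identities = np.unique(cluster_labels), np.unique(gt_ids)
    mapping = {}
    for c in clusters:
        in_c = cluster_labels == c
        w = [(gt_ids[in_c] == k).sum() / (gt_ids == k).sum()
             for k in identities]
        mapping[int(c)] = identities[int(np.argmax(w))]
    return mapping
```

Normalising by $N_k$ rather than by cluster size prevents frequently filmed individuals from dominating every cluster assignment.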
Top-N Accuracy. To quantitatively evaluate the capacity to aid human annotation, we consider a scenario where a user annotates IDs as a one-out-of-N pick (expanding N if the correct ID is not present). Thus, the Top-N system accuracy is a key measure to investigate. For each cluster, one can rank all identities according to their overlap; identities with zero overlap form the randomly assigned tail of the sequence. For every data point this provides a general Top-N assigned ID, and finding the GT identity amongst the Top-N assigned IDs is then counted as correct identification.
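The Top-N evaluation can likewise be sketched. As an assumption of this sketch, tied and zero-overlap identities keep an arbitrary fixed order rather than a random one:

```python
import numpy as np

def top_n_accuracy(cluster_labels, gt_ids, n):
    """Top-N accuracy sketch: per cluster, identities are ranked by the
    overlap w(c, k) = n_{c,k} / N_k; a sample counts as correct if its
    GT identity appears among the top N identities of its cluster."""
    identities = np.unique(gt_ids)
    correct = 0
    for c in np.unique(cluster_labels):
        in_c = cluster_labels == c
        w = np.array([(gt_ids[in_c] == k).sum() / (gt_ids == k).sum()
                      for k in identities])
        top_n = set(identities[np.argsort(-w)[:n]])   # N best-overlapping IDs
        correct += sum(1 for k in gt_ids[in_c] if k in top_n)
    return correct / len(gt_ids)
```

Top-1 accuracy reduces to the plain cluster-to-ID assignment above, while larger N models the user scanning a short candidate list.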
5 Experimental Results and Discussion
Structural Clustering Similarity. To characterise the ID performance as if this were a new, unknown herd, we calculated the ARI on the test set between the partitioning provided by the GMM clustering and the identity GT. This measure captures the purely structural similarity between the two clusterings.
Clustering Accuracy. In order to characterise the ID performance with class labels, we calculated Top-N accuracy for the test set as depicted in Table 1. Figure 8 visualises the identification performance and misclassification severity using a t-SNE plot.
Context and Result Discussion. Considering the number of classes involved and that absolutely no training labels were provided, the achieved Top-1 and Top-4 accuracies are an encouraging and practically relevant first step towards self-supervision in this domain. Individual Holstein-Friesian identification via supervised deep learning is a widely solved task, with systems achieving near-perfect benchmarks when using multi-frame LRCNs and good results even in partial-annotation settings. However, labelling efforts for supervised systems remain laborious for larger herds, requiring days if not weeks of manual annotation using visual dictionaries of animal ground truth. Humans, on the other hand, can efficiently compare small sets of images. Thus, using the described pipeline we could present the user with a small set of images that contains the correct individual with a chance better than 3-in-4. As part of a toolchain, the presented approach can therefore potentially reduce labelling times dramatically and help bootstrap production systems via combinations of self-supervised learning followed by open-set fine-tuning.
In this paper we presented the largest identity-annotated Holstein-Friesian cattle dataset, Cows2021, made available to date. We also showed a first self-supervision framework for identifying individual animals. Driven by the enormous labelling effort involved in constructing visual cattle identification systems, we proposed exploiting coat pattern appearance across videos as a self-supervision signal. A generic cattle detector yielded oriented bounding-boxes, which were normalised and augmented. Triplet loss contrastive learning was then used to construct a latent space in which we fitted a GMM. This yielded a cattle identity classifier, which we evaluated. Our results showed that the achieved accuracy levels are strong enough to help speed up ID labelling efforts for supervised systems in the future. Despite the need for even larger datasets, we hope that the published dataset, code, and benchmark will stimulate research in the area of self-supervised learning for biometric animal (re)identification.
Acknowledgements. This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1 and the John Oldacre Foundation through the John Oldacre Centre for Sustainability and Welfare in Dairy Production, Bristol Veterinary School. Jing Gao was supported by the China Scholarship Council. We thank Kate Robinson and the Wyndhurst Farm staff for their assistance with data collection.
-  Sarah JJ Adcock, Cassandra B Tucker, Gayani Weerasinghe, and Eranda Rajapaksha. Branding practices on four dairies in kantale, sri lanka. Animals, 8(8):137, 2018.
-  A Allen, B Golden, M Taylor, D Patterson, D Henriksen, and R Skuce. Evaluation of retinal imaging technology for the biometric identification of bovine animals in northern ireland. Livestock science, 116(1-3):42–52, 2008.
-  William Andrew. Visual biometric processes for collective identification of individual Friesian cattle. PhD thesis, University of Bristol, 2019.
-  Will Andrew, Tilo Burghardt, Neill Campbell, and Jing Gao. The opencows2020 dataset, 2020. https://data.bris.ac.uk/data/dataset/10m32xl88x2b61zlkkgz3fml17.
-  William Andrew, Jing Gao, Neill Campbell, Andrew W Dowsey, and Tilo Burghardt. Visual identification of individual holstein friesian cattle via deep metric learning. arXiv preprint arXiv:2006.09205, 2020.
-  William Andrew, Colin Greatwood, and Tilo Burghardt. Visual localisation and individual identification of holstein friesian cattle via deep learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 2850–2859, 2017.
-  William Andrew, Colin Greatwood, and Tilo Burghardt. Aerial animal biometrics: Individual friesian cattle recovery and visual identification via an autonomous uav with onboard deep inference. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 237–243. IEEE, 2019.
-  William Andrew, Sion Hannuna, Neill Campbell, and Tilo Burghardt. Automatic individual holstein friesian cattle identification via selective local coat pattern matching in rgb-d imagery. In 2016 IEEE International Conference on Image Processing (ICIP), pages 484–488. IEEE, 2016.
-  Ali Ismail Awad. From classical methods to animal biometrics: A review on cattle identification and tracking. Computers and Electronics in Agriculture, 123:423–435, 2016.
-  Ali Ismail Awad and M Hassaballah. Bag-of-visual-words for cattle identification from muzzle print images. Applied Sciences, 9(22):4914, 2019.
-  Jayme Garcia Arnal Barbedo, Luciano Vieira Koenigkan, Thiago Teixeira Santos, and Patrícia Menezes Santos. A study on the detection of cattle in uav images using deep learning. Sensors, 19(24):5436, 2019.
-  W Buick. Animal passports and identification. Defra Veterinary Journal, 15:20–26, 2004.
-  Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
-  Hagar M El Hadad, Hamdi A Mahmoud, and Farid Ali Mousa. Bovines muzzle classification based on machine learning techniques. Procedia Computer Science, 65:864–871, 2015.
-  M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
-  Food and Agriculture Organization of the United Nations. Gateway to dairy production and products. http://www.fao.org/dairy-production-products/production/dairy-animals/cattle/en/. [Online; accessed 4-August-2020].
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
-  Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017.
-  R Houston. A computerised database system for bovine traceability. Revue Scientifique et Technique-Office International des Epizooties, 20(2):652, 2001.
-  Hengqi Hu, Baisheng Dai, Weizheng Shen, Xiaoli Wei, Jian Sun, Runze Li, and Yonggen Zhang. Cow identification based on fusion of deep parts features. Biosystems Engineering, 192:245–256, 2020.
-  Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of classification, 2(1):193–218, 1985.
-  Akio Kimura, Kazushi Itaya, and Takashi Watanabe. Structural pattern recognition of biological textures with growing deformations: A case of cattle’s muzzle patterns. Electronics and Communications in Japan (Part II: Electronics), 87(5):54–66, 2004.
-  M Klindtworth, G Wendl, K Klindtworth, and H Pirkelmann. Electronic identification of cattle with injectable transponders. Computers and electronics in agriculture, 24(1-2):65–79, 1999.
-  Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
-  Hjalmar S Kühl and Tilo Burghardt. Animal biometrics: quantifying and detecting phenotypic appearance. Trends in ecology & evolution, 28(7):432–441, 2013.
-  Santosh Kumar and Sanjay Kumar Singh. Automatic identification of cattle using muzzle point pattern: a hybrid feature extraction and classification paradigm. Multimedia Tools and Applications, 76(24):26551–26580, 2017.
-  Wenyong Li, Zengtao Ji, Lin Wang, Chuanheng Sun, and Xinting Yang. Automatic individual identification of holstein dairy cows using tailhead images. Computers and electronics in agriculture, 142:622–631, 2017.
-  Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017.
-  Carlos A Martinez-Ortiz, Richard M Everson, and Toby Mottram. Video tracking of dairy cows for assessing mobility scores. 2013.
-  Alessandro Masullo, Tilo Burghardt, Dima Damen, Toby Perrett, and Majid Mirmehdi. Who goes there? exploiting silhouettes and wearable signals for subject identification in multi-person environments. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 0–0, 2019.
-  United States Department of Agriculture (USDA) Animal and Plant Health Inspection Service. Cattle identification. https://www.aphis.usda.gov/aphis/ourfocus/animalhealth/nvap/NVAP-Reference-Guide/Animal-Identification/Cattle-Identification. [Online; accessed 14-November-2018].
-  European Parliament and Council. Establishing a system for the identification and registration of bovine animals and regarding the labelling of beef and beef products and repealing council regulation (ec) no 820/97. http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex:32000R1760, 1997. [Online; accessed 29-January-2016].
-  F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
-  WE Petersen. The identification of the bovine by means of nose-prints. Journal of dairy science, 5(3):249–258, 1922.
-  Ning Qian. On the momentum term in gradient descent learning algorithms. Neural networks, 12(1):145–151, 1999.
-  Yongliang Qiao, Daobilige Su, He Kong, Salah Sukkarieh, Sabrina Lomax, and Cameron Clark. Individual cattle identification using a deep learning based framework. IFAC-PapersOnLine, 52(30):318–323, 2019.
-  Douglas A Reynolds. Gaussian mixture models. Encyclopedia of biometrics, 741:659–663, 2009.
-  Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400–407, 1951.
-  C Shanahan, B Kernan, G Ayalew, K McDonnell, F Butler, and S Ward. A framework for beef traceability from farm to slaughter using global standards: an irish perspective. Computers and electronics in agriculture, 66(1):62–69, 2009.
-  I Stephen. Perceptron-based learning algorithms. IEEE Transactions on neural networks, 50(2):179, 1990.
-  Million Tadesse and Tadelle Dessie. Milk production performance of zebu, holstein friesian and their crosses in ethiopia. Livestock Research for Rural Development, 15(3):1–9, 2003.
-  Alaa Tharwat, Tarek Gaber, Aboul Ella Hassanien, Hasssan A Hassanien, and Mohamed F Tolba. Cattle identification using muzzle print images based on texture features approach. In Proceedings of the Fifth International Conference on Innovations in Bio-Inspired Computing and Applications IBICA 2014, pages 217–227. Springer, 2014.
-  Laurens JP van der Maaten and Geoffrey E Hinton. Visualizing high-dimensional data using t-sne. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
-  Maxime Vidal, Nathan Wolf, Beth Rosenberg, Bradley P Harris, and Alexander Mathis. Perspectives on individual animal identification from biology and computer vision. arXiv preprint arXiv:2103.00560, 2021.