I Introduction and Related Work
This paper presents an unmanned aerial vehicle (UAV) platform with onboard deep learning inference (see Fig. 1) that autonomously locates and visually identifies individual Holstein Friesian cattle by their uniquely-coloured coats in low altitude flight (approx. 10m) within a geo-fenced farm area. The task encompasses the integrated performance of species detection, exploratory agency, and individual animal identification (ID). All tasks are performed entirely onboard a custom DJI M100 quadrotor with limited computational resources, battery lifetime, and payload size.
In doing so, this work aims to assist agricultural monitoring by performing minimally invasive cattle localisation and identification in the field. Possible applications include the behavioural analysis of social hierarchies [1, 2, 3], grazing patterns [4, 5] and herd welfare.
The search for targets with unknown locations traditionally arises in search and rescue (SAR) scenarios [7, 8, 9]. Visually-supported navigation approaches for this task broadly operate in either a map-based or a map-less paradigm [10, 11]. Map-less approaches have no global environment representation and traditionally operate using template appearance matching [12, 13], optical-flow guidance, or landmark feature tracking [15, 16]. More recently, such systems have been replaced with visual input classification via convolutional neural networks (CNNs) [17, 18]. In this work, we build on our previously presented simulation setup and formulate a 2D global grid approximation of the environment (see map in Fig. 1) for storing visited positions, the current location, and successful target recoveries. This concept is inspired by occupancy grid maps [20, 21], as opposed to post-exploration maps or topological maps. For our cattle recovery task – and despite their simplicity – grid maps remain a highly effective tool for exploring the solution space of AI solutions [24, 25].
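The grid representation described above can be sketched very compactly. The following is a minimal illustration only (class and marker names are our own, not the paper's): a 2D array stores free, visited, agent-occupied and target-recovery cells, in the spirit of an occupancy grid.

```python
import numpy as np

class GridMap:
    """Toy 2D grid map: visited cells, agent position, target recoveries."""
    FREE, VISITED, AGENT, TARGET = 0, 1, 2, 3

    def __init__(self, rows, cols, start=(0, 0)):
        self.grid = np.zeros((rows, cols), dtype=np.int8)
        self.pos = start
        self.grid[start] = self.AGENT

    def move(self, new_pos):
        # Mark the old cell as visited and occupy the new one.
        self.grid[self.pos] = self.VISITED
        self.pos = new_pos
        self.grid[new_pos] = self.AGENT

    def record_recovery(self, cell):
        # A successful target recovery at the given cell.
        self.grid[cell] = self.TARGET

    def coverage(self):
        # Fraction of cells the agent has entered so far.
        return float(np.mean(self.grid != self.FREE))
```

Such a map is cheap to maintain onboard and doubles as the exploration-history input to the navigation network described later.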
Coat pattern identification of individual Friesian cattle represents a form of animal biometrics. Early systems for the task at hand utilised the Scale-Invariant Feature Transform (SIFT) on image sequences, or Affine SIFT (ASIFT), to map from dorsal cow patterns to animal IDs. However, for up-to-date performance we base our individual ID component on recent CNN-grounded biometric work, where temporal stacks of the regions of interest (RoIs) around detected cattle are analysed by a Long-term Recurrent Convolutional Network (LRCN), as shown in Fig. 1 in red. This architecture represents a compromise between light-weight onboard operation and more high-end networks with heavier computational footprints.
Whilst aerial wildlife census applications routinely use manually controlled UAVs [33, 34, 35, 36] and have experimented with part-automated photographic gliders, to the best of our knowledge this paper presents the first proof-of-concept system for fully autonomous exploration and online individual biometric identification of animals onboard an aircraft.
To summarise this paper’s principal contributions:
Proof of concept of the viability of autonomous aerial biometric cattle ID in a real-world agricultural setting.
Novel combination of algorithms performing online target detection, identification and exploratory agency.
Validation of the employed UAV hardware setup capable of deep inference onboard the flight platform itself.
Real-world and live application of the exploratory simulation framework developed in our previous work.
II Hardware Setup
We use the DJI Matrice 100 (M100) quadrotor UAV, which houses the ROS-enabled DJI N1 flight controller, as our base platform. It has been employed previously across various autonomous tasks [38, 39, 40, 41]. We extend the base M100 platform with an Nvidia Jetson TX2 mounted on a Connect Tech Inc. Orbitty carrier board to enable onboard deep inference via 256 CUDA cores under Nvidia's Pascal architecture. Also onboard is the DJI Manifold (essentially an Nvidia Jetson TK1), which decodes the raw image feed from the onboard camera (the DJI Zenmuse X3 camera/gimbal) and adds further computational capacity. The X3 camera is mounted on rubber grommets and a 3-axis gimbal, which provide rotor vibration isolation and movement independent of the flight platform for stabilised footage and programmatically controlled roll, pitch and yaw. A Quanum QM12V5A-UBEC voltage regulator was fitted to power non-conformal devices feeding off the primary aircraft battery. In addition, customised mounts were added for WiFi antennas to monitor the craft remotely. Figure 2 depicts the complete aircraft with views of custom components, whilst Figure 3 shows a detailed overview of the communication infrastructure. Note that the base station and remote control devices act in a supervisory role only; all inputs and autonomous control are processed and issued onboard the UAV.
III Experimental Setup
III-A Location, Timing and Target Herd
Flights were performed over a two-week experimental period at the University of Bristol's Wyndhurst Farm in Langford Village, UK (see Fig. 4) on a consistent herd of 17 yearling heifer Holstein Friesian cattle (see Fig. 5), satisfying all relevant animal welfare and flight regulations. Experiments consisted of two phases: (a) training data acquisition across 14 (semi-)manual flights, and (b) subsequent conduction of 18 autonomous flights.
III-B Training Data, Annotation and Augmentation
Training data was acquired over two day-long recording sessions in which manual and semi-autonomous flights at varying altitudes were carried out, recording video at a fixed resolution and frame rate. The result was a raw dataset of 37 minutes from 15 videos over 14 flights, occupying 18GB. Frames were then extracted from these video files at a fixed sampling rate.
First, after discarding frames without cattle, bounding boxes around individual cattle were manually labelled in the remaining frames. Animals were also manually identified to provide ground truth for individual identification. Secondly, to produce ground truth for training the cattle detector, square sub-images (matching the YOLOv2 input tensor size) were manually annotated to encompass individuals such that they are resolved at an approximately consistent pixel scale. Figure 6 illustrates the full pipeline. To synthesise additional data, augmentations for both detection and identification datasets are performed stochastically, with the possibility of any combination of the following operations according to a per-operation likelihood value: horizontal and vertical flipping, crop and pad, affine transformations (scale, translate, rotate, shear), Gaussian, average or median blurring, noise addition, background variations, contrast changes, and small perspective transformations. Figure 7 provides augmentation samples.
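The per-operation stochastic scheme can be sketched as follows. This is an illustrative subset only (flips, crop-and-pad, noise, contrast); the probabilities and parameter ranges are our placeholder assumptions, not the paper's values.

```python
import random
import numpy as np

def augment(img, rng=random):
    """Apply each augmentation independently with a per-operation likelihood.
    `img` is a 2D uint8 image; every branch preserves its shape."""
    out = img.astype(np.float32)
    if rng.random() < 0.5:                       # horizontal flip
        out = out[:, ::-1]
    if rng.random() < 0.2:                       # vertical flip
        out = out[::-1, :]
    if rng.random() < 0.3:                       # crop, then pad back to size
        h, w = out.shape
        c = rng.randint(1, max(1, h // 10))
        out = np.pad(out[c:h - c, c:w - c], ((c, c), (c, c)), mode="edge")
    if rng.random() < 0.3:                       # additive Gaussian noise
        out = out + np.random.normal(0.0, 5.0, out.shape)
    if rng.random() < 0.3:                       # contrast change about the mean
        out = (out - out.mean()) * rng.uniform(0.8, 1.2) + out.mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```

Because every operation fires independently, any combination of augmentations can occur on a single training sample, which is what drives the diversity visible in Fig. 7.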
IV Software, Implementation and Training
IV-A Object Class Detection
Cattle detection and localisation is performed in real-time, frame by frame, using the YOLOv2 CNN. The network was retrained from scratch on the annotated region dataset, consisting of synthetic and non-synthetic training images (see Fig. 7, bottom) and associated ground truth labels. Model inference operates on images obtained by cropping and scaling the source camera stream. As shown in Figure 1, this process yields a set of bounding boxes per frame with associated object confidence scores. Inference on each sampled frame then produces a box-annotated spatio-temporal volume. Bounding boxes are associated across this volume by accumulating detections that are consistently present in cow-sized areas of an equally subdivided image. This method is effective for reliable short-term tracking due to distinct and non-overlapping targets, slow target movement and stable UAV hovering. The outputs are short individual animal tracklets reshaped into an image patch sequence, which forms the input to the individual identification network (see Section IV-D). In addition, the current frame is also abstracted to a grid map encoding animal presence in the camera's field of view (see Fig. 1). This forms the input to the exploratory agency network discussed in the following section.
IV-B Exploratory Agency
Navigation aims to locate as many – themselves moving – individual animals as possible along the shortest routes in a gridded domain, where a target counts as ‘located’ once the agent occupies the same grid location as the target. To solve this dynamic travelling salesman task, with city locations discovered on the fly, we use a dual-stream deep network architecture, as first suggested in our previous work. The method computes grid-based navigational decisions from immediate sensory (tactical/exploitation) and historic navigational (strategic/exploration) information using two separate streams within a single deep inference network. As shown in that paper, this strategy can significantly outperform simple strategies such as a ‘lawnmower’ pattern and other baselines.
To summarise the method's operation, the sensory input is processed via a first stream utilising a basic AlexNet design (see Fig. 1). A second stream operates on the exploratory history thus far, as stored in a long-term memory map (see Fig. 1). This stores the agent's present and past positions alongside animal encounters within the gridded flight area. The agent's starting position is fixed and is reset after a set proportion of the map has been explored. Both streams are concatenated into a shallow integration network that, as shown in Figure 1, maps to a SoftMax-normalised likelihood vector over the possible navigational actions. During inference, the network selects the top-ranking navigational action, which is performed and, in turn, the positional history is updated.
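The decision step can be illustrated with a toy fusion sketch: two stream feature vectors (stand-ins for the AlexNet and map-stream outputs) are concatenated, passed through a shallow integration layer and SoftMax-normalised, and the top-ranking action is taken. All weights here are random placeholders, not trained parameters, and the four-action set is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTIONS = ["north", "east", "south", "west"]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decide(sensory_feat, history_feat, W, b):
    """Concatenate the two stream outputs, integrate, and pick the
    top-ranking SoftMax-normalised navigational action."""
    fused = np.concatenate([sensory_feat, history_feat])
    probs = softmax(W @ fused + b)
    return ACTIONS[int(np.argmax(probs))], probs

# Placeholder integration-layer weights: 8-dim features per stream.
W = rng.normal(size=(len(ACTIONS), 16))
b = np.zeros(len(ACTIONS))
action, probs = decide(rng.normal(size=8), rng.normal(size=8), W, b)
```

After the chosen action is executed, the long-term memory map is updated with the new position, closing the perception-decision loop.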
For training, the entire two-stream navigation network is optimised via stochastic gradient descent (SGD) with momentum and a fixed learning rate, using one-hot encoded action labels and a cross-entropy loss. This unified model allows for the back-propagation of navigation decision errors across both streams and domains. Training data is generated by simulating episodes of pseudo-randomly placed targets in a grid and calculating optimal navigation decisions by solving the associated travelling salesman problem. Cross-validation on this setup measured both the accuracy of making an optimal next grid navigation decision and the target recovery rate per grid move. For full implementation details we refer to the original paper, which operates on simulations. In contrast, examples of real-world environment explorations during our 18 test flights are visualised in Figure 8 and Figure 9.
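Label generation for this training scheme can be sketched on a small grid: brute-force the travelling-salesman visiting order under Manhattan distance and take the first step of the optimal tour as the ground-truth action. Function names and the four-action encoding are our own illustration (brute force only scales to a handful of targets, which is why a solver is used on simulated episodes).

```python
from itertools import permutations

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def optimal_first_step(agent, targets):
    """Return the ground-truth action: the first move of the shortest
    tour visiting all targets from the agent's cell."""
    best_order = min(
        permutations(targets),
        key=lambda order: sum(
            manhattan(p, q) for p, q in zip((agent,) + order, order)
        ),
    )
    dr = best_order[0][0] - agent[0]
    dc = best_order[0][1] - agent[1]
    if abs(dr) >= abs(dc):
        return "south" if dr > 0 else "north"
    return "east" if dc > 0 else "west"
```

Pairing each simulated grid state with the action returned here yields the one-hot targets for the cross-entropy loss.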
IV-C Coordinate Fulfilment
Re-positioning commands to the M100 flight platform must be issued as local position offsets in metres with respect to a programmatically-set East-North-Up (ENU) reference frame. As such, in order to fulfil a target GPS coordinate arising from exploratory agency, it must be converted into that frame. This is achieved by converting the target GPS coordinate into the static Earth-Centred Earth-Fixed (ECEF) reference frame, then converting that coordinate into the local ENU frame. The same process is applied to the agent's current GPS position, and the resulting local positions are compared. Our implementation follows the standard established in the literature [47, 48].
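The two conversions are standard WGS-84 geodesy and can be sketched directly (the reference coordinate in the usage example below is a hypothetical value, not a surveyed point):

```python
import math

A = 6378137.0               # WGS-84 semi-major axis (m)
E2 = 6.69437999014e-3       # WGS-84 first eccentricity squared

def geodetic_to_ecef(lat, lon, alt):
    """Geodetic (degrees, metres) -> Earth-Centred Earth-Fixed (metres)."""
    lat, lon = math.radians(lat), math.radians(lon)
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)   # prime vertical radius
    x = (n + alt) * math.cos(lat) * math.cos(lon)
    y = (n + alt) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - E2) + alt) * math.sin(lat)
    return x, y, z

def ecef_to_enu(p, ref_geodetic):
    """ECEF point -> local East-North-Up offsets (metres) about a
    geodetic reference origin (lat, lon, alt)."""
    lat0 = math.radians(ref_geodetic[0])
    lon0 = math.radians(ref_geodetic[1])
    x0, y0, z0 = geodetic_to_ecef(*ref_geodetic)
    dx, dy, dz = p[0] - x0, p[1] - y0, p[2] - z0
    east = -math.sin(lon0) * dx + math.cos(lon0) * dy
    north = (-math.sin(lat0) * math.cos(lon0) * dx
             - math.sin(lat0) * math.sin(lon0) * dy
             + math.cos(lat0) * dz)
    up = (math.cos(lat0) * math.cos(lon0) * dx
          + math.cos(lat0) * math.sin(lon0) * dy
          + math.sin(lat0) * dz)
    return east, north, up
```

Applying `ecef_to_enu(geodetic_to_ecef(*target), origin)` to both the target and the agent's current fix, and differencing the results, yields the metric offset command for the flight controller.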
IV-D Identity Estimation
Individual identification based on an image patch sequence is performed via an LRCN, first introduced by Donahue et al. In particular, as shown in Figure 1, we combine a GoogLeNet/Inception V3 CNN [49, 42] with a single Long Short-Term Memory (LSTM) layer. This approach has demonstrated success in disambiguating fine-grained categories in our previous work.
Training of the GoogLeNet/Inception V3 network takes groups of randomly selected same-class RoIs (exemplified in Fig. 7, top), each of which was non-proportionally resized to the network input size. SGD with momentum, a batch size of 32 and a fixed learning rate were used for optimisation. Figure 10 (right) provides evidence of per-category learning of appropriate spatial representations using local interpretable model-agnostic explanations, which qualitatively highlight the success of the Inception architecture in learning discriminative and fine-grained visual features for each individual. Once trained, samples are passed through this GoogLeNet up to the pool_5 layer, and the resulting feature vectors are combined across the sample sequence. A shallow LSTM network is finally trained on these vector sequences using a SoftMax cross-entropy cost function optimised against the one-hot encoded identity vectors representing the possible classes. This approach achieved 100% validation accuracy with little training, as can be seen in Figure 10 (bottom).
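The LRCN structure can be illustrated with a minimal numpy sketch: a sequence of CNN feature vectors (stand-ins for pool_5 outputs) is folded through a single LSTM layer, and the final hidden state is SoftMax-classified into identities. Dimensions other than the 17 identities, and all weights, are random placeholders, not trained values.

```python
import numpy as np

rng = np.random.default_rng(1)
FEAT, HIDDEN, IDS = 32, 16, 17      # 17 identities, matching the herd size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_classify(features, Wx, Wh, b, Wout):
    """Fold per-frame feature vectors through one LSTM layer and return
    SoftMax-normalised per-identity confidences."""
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for x in features:                           # one CNN feature vector per frame
        gates = Wx @ x + Wh @ h + b
        i, f, o = (sigmoid(g) for g in np.split(gates[:3 * HIDDEN], 3))
        g = np.tanh(gates[3 * HIDDEN:])          # candidate cell update
        c = f * c + i * g                        # gated cell-state update
        h = o * np.tanh(c)
    logits = Wout @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

Wx = rng.normal(0, 0.1, (4 * HIDDEN, FEAT))
Wh = rng.normal(0, 0.1, (4 * HIDDEN, HIDDEN))
b = np.zeros(4 * HIDDEN)
Wout = rng.normal(0, 0.1, (IDS, HIDDEN))
probs = lstm_classify(rng.normal(size=(5, FEAT)), Wx, Wh, b, Wout)
```

Because the recurrence carries information across the patch sequence, the final confidence vector reflects all views of the animal rather than any single frame.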
V Real-World Autonomous Performance
We conducted fully autonomous test flights at a low altitude (approximately 10m) above a gridded area (see Fig. 4), covering altogether 147 minutes. Examples of environment explorations are visualised in Figure 8 and Figure 9, depicting various example flights with detailed annotations of flight path, animal encounters and identification confidences. For all experiments, we ran the object detection and exploratory agency networks live and in real-time to navigate the aircraft. Note that the herd was naturally dispersed in these experiments and animals were free to roam across the entire field, in and out of the covered area. Thus, only a few individuals were present and recoverable in this area at any one time. We recorded the median grid coverage, the median flight time per experiment, and the median number of grid iterations per flight. For each of the flights, we conducted two types of experiment: (a) saving a single frame per visited grid location (due to onboard storage limitations) and performing a full separate analysis of detection and identification performance offline after the flight, and (b) additionally running multi-frame identification live during flight for all cases where the aircraft navigated centrally above a detected individual.
V-A Offline After-Flight Performance Evaluation
For offline analysis, the UAV saved to file one acquired image at each exploratory agency iteration. Those images that actually contained target cows were hand labelled with ground truth bounding box annotations and identities according to the VOC guidelines. A detection was deemed a successful true positive when its intersection-over-union (IoU) with the ground truth exceeded a fixed threshold. Grounded in this, we measured the YOLOv2 detection accuracy; of the animals present, some were missed and some false positive nested detections occurred (see Fig. 11). We then separately tested the performance of the single-frame Inception V3 individual identification architecture, where all ground truth bounding boxes (not only the detected instances) were presented to the ID component. In contrast, and as shown in Table I, when identification is performed on detected RoIs only, the combined offline system accuracy is 91.9%.
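The IoU-based acceptance test can be sketched as follows; the 0.5 threshold in the example is a typical VOC-style default, assumed rather than quoted from the evaluation.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def is_true_positive(det, gt, threshold=0.5):
    """A detection counts once its overlap with the ground truth is high enough."""
    return iou(det, gt) >= threshold
```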
V-B Online In-Flight Performance Evaluation
For online autonomous operation, all computation was performed live in real-time onboard the UAV's computers (DJI Manifold and Nvidia Jetson TX2). Figure 9 depicts various example flights with detailed annotations of flight paths, animal encounters and identification confidences. Across the grid locations visited during the set of experiments, the aircraft repeatedly navigated centrally above a detected individual, triggering identification. Note that this mode of operation eliminates the problem of clipped visibility at image borders, minimises image distortions, optimises the viewpoint, and exposes the coat in a canonical orthogonal view. For triggered identification, we store intermediate LRCN confidence outputs as successive same-class patches are processed, to compare performance between single-view and multi-view identification. Figure 12 depicts some same-class patch sequences and one instance where multi-frame inference was indeed beneficial to identification. The respective overall results are given in Table II. Notably, across the small online sample set, the LRCN model performs perfectly.
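The benefit of multi-view identification can be illustrated with a simple fusion sketch. The averaging rule here is our assumption for illustration (the system stores intermediate LRCN confidences rather than prescribing a fusion formula): per-patch identity confidences are averaged so a single ambiguous view can be outvoted by the rest of the sequence.

```python
import numpy as np

def fuse_confidences(per_patch_probs):
    """Average per-patch identity confidence vectors and return the
    top-ranked class index together with the fused distribution."""
    mean = np.mean(per_patch_probs, axis=0)
    return int(np.argmax(mean)), mean

single = [[0.4, 0.6]]                           # one ambiguous view favours class 1
multi = [[0.4, 0.6], [0.9, 0.1], [0.8, 0.2]]    # further views favour class 0
```

On the single ambiguous patch the wrong class wins, whereas fusing the three views recovers the correct identity, mirroring the behaviour seen in Figure 12.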
|# Sample Frames||# Animal Instances||Detection on Sample Frames Accuracy (%)||Single Frame ID on Labelled RoIs Accuracy (%)||Combined Detection+ID Accuracy (%)|
|# Samples||LRCN Identification Accuracy (%)||Single Frame Identification Accuracy (%)|
VI Conclusion and Future Work
This paper provides a proof-of-concept that fully autonomous aerial animal biometrics is practically feasible. Operating in a real-world agricultural setting, it demonstrated that individual cattle identities can be reliably recovered biometrically from the air onboard a fully autonomous robotic agent. Experiments conducted on a small herd of 17 live cattle confirmed the identification robustness of the proposed approach. In successfully performing these tasks under limited computational resources, payload and weight restrictions, the presented system opens up future agricultural automation possibilities with potential positive implications for animal welfare and farm productivity.
Beyond farming, the concept of autonomous biometric animal identification from the air as presented opens up a realm of future applications in fields such as ecology, where animal identification of uniquely patterned species in the wild (e.g. zebras, giraffes) is critical to assessing the status of populations.
-  R. Ungerfeld, C. Cajarville, M. Rosas, and J. Repetto, “Time budget differences of high-and low-social rank grazing dairy cows,” New Zealand journal of agricultural research, vol. 57, no. 2, pp. 122–127, 2014.
-  S. Kondo and J. Hurnik, “Stabilization of social hierarchy in dairy cows,” Applied Animal Behaviour Science, vol. 27, no. 4, pp. 287–297, 1990.
-  C. Phillips and M. Rind, “The effects of social dominance on the production and behavior of grazing dairy cows offered forage supplements,” Journal of Dairy Science, vol. 85, no. 1, pp. 51–59, 2002.
-  P. Gregorini, “Diurnal grazing pattern: its physiological basis and strategic management,” Animal Production Science, vol. 52, no. 7, pp. 416–430, 2012.
-  P. Gregorini, S. Tamminga, and S. Gunter, “Behavior and daily grazing patterns of cattle,” The Professional Animal Scientist, vol. 22, no. 3, pp. 201–209, 2006.
-  B. Sowell, J. Mosley, and J. Bowman, “Social behavior of grazing beef cattle: Implications for management,” Journal of Animal Science, vol. 77, no. E-Suppl, pp. 1–6, 2000.
-  L. Lin and M. A. Goodrich, “Uav intelligent path planning for wilderness search and rescue,” in Intelligent robots and systems, 2009. IROS 2009. IEEE/RSJ International Conference on. IEEE, 2009, pp. 709–714.
-  S. Waharte and N. Trigoni, “Supporting search and rescue operations with uavs,” in Emerging Security Technologies (EST), 2010 International Conference on. IEEE, 2010, pp. 142–147.
-  K. Ryu, “Autonomous robotic strategies for urban search and rescue,” Ph.D. dissertation, Virginia Polytechnic Institute and State University, 2012.
-  F. Bonin-Font, A. Ortiz, and G. Oliver, “Visual navigation for mobile robots: A survey,” Journal of intelligent and robotic systems, vol. 53, no. 3, p. 263, 2008.
-  G. N. DeSouza and A. C. Kak, “Vision for mobile robot navigation: A survey,” IEEE transactions on pattern analysis and machine intelligence, vol. 24, no. 2, pp. 237–267, 2002.
-  Y. Matsumoto, M. Inaba, and H. Inoue, “Visual navigation using view-sequenced route representation,” in Robotics and Automation, 1996. Proceedings., 1996 IEEE International Conference on, vol. 1. IEEE, 1996, pp. 83–88.
-  S. D. Jones, C. Andresen, and J. L. Crowley, “Appearance based process for visual navigation,” in Intelligent Robots and Systems, 1997. IROS’97., Proceedings of the 1997 IEEE/RSJ International Conference on, vol. 2. IEEE, 1997, pp. 551–557.
-  J. Santos-Victor, G. Sandini, F. Curotto, and S. Garibaldi, “Divergent stereo for robot navigation: Learning from bees,” in Computer Vision and Pattern Recognition, 1993. Proceedings CVPR’93., 1993 IEEE Computer Society Conference on. IEEE, 1993, pp. 434–439.
-  N. Pears and B. Liang, “Ground plane segmentation for mobile robot visual navigation,” in Intelligent Robots and Systems, 2001. Proceedings. 2001 IEEE/RSJ International Conference on, vol. 3. IEEE, 2001, pp. 1513–1518.
-  P. Saeedi, P. D. Lawrence, and D. G. Lowe, “Vision-based 3-d trajectory tracking for unknown environments,” IEEE transactions on robotics, vol. 22, no. 1, pp. 119–136, 2006.
-  A. Giusti, J. Guzzi, D. C. Cireşan, F.-L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. Di Caro, et al., “A machine learning approach to visual perception of forest trails for mobile robots,” IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, 2016.
-  L. Ran, Y. Zhang, Q. Zhang, and T. Yang, “Convolutional neural network-based robot navigation using uncalibrated spherical images,” Sensors, vol. 17, no. 6, p. 1341, 2017.
-  W. Andrew, C. Greatwood, and T. Burghardt, “Deep learning for exploration and recovery of uncharted and dynamic targets from uav-like vision,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 1124–1131.
-  J. Borenstein and Y. Koren, “Real-time obstacle avoidance for fast mobile robots in cluttered environments,” in Robotics and Automation, 1990. Proceedings., 1990 IEEE International Conference on. IEEE, 1990, pp. 572–577.
-  G. Oriolo, M. Vendittelli, and G. Ulivi, “On-line map building and navigation for autonomous mobile robots,” in Robotics and Automation, 1995. Proceedings., 1995 IEEE International Conference on, vol. 3. IEEE, 1995, pp. 2900–2906.
-  H. P. Moravec, “The stanford cart and the cmu rover,” Proceedings of the IEEE, vol. 71, no. 7, pp. 872–884, 1983.
-  A. Kosaka and A. C. Kak, “Fast vision-guided mobile robot navigation using model-based reasoning and prediction of uncertainties,” CVGIP: Image understanding, vol. 56, no. 3, pp. 271–329, 1992.
-  J. Zhang, L. Tai, J. Boedecker, W. Burgard, and M. Liu, “Neural slam,” arXiv preprint arXiv:1706.09520, 2017.
-  A. A. Melnikov, A. Makmal, and H. J. Briegel, “Projective simulation applied to the grid-world and the mountain-car problem,” arXiv preprint arXiv:1405.5459, 2014.
-  H. S. Kühl and T. Burghardt, “Animal biometrics: quantifying and detecting phenotypic appearance,” Trends in ecology & evolution, vol. 28, no. 7, pp. 432–441, 2013.
-  D. G. Lowe, “Object recognition from local scale-invariant features,” in Computer vision, 1999. The proceedings of the seventh IEEE international conference on, vol. 2. Ieee, 1999, pp. 1150–1157.
-  C. A. Martinez-Ortiz, R. M. Everson, and T. Mottram, “Video tracking of dairy cows for assessing mobility scores,” 2013.
-  J.-M. Morel and G. Yu, “Asift: A new framework for fully affine invariant image comparison,” SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp. 438–469, 2009.
-  W. Andrew, S. Hannuna, N. Campbell, and T. Burghardt, “Automatic individual holstein friesian cattle identification via selective local coat pattern matching in rgb-d imagery,” in Image Processing (ICIP), 2016 IEEE International Conference on. IEEE, 2016, pp. 484–488.
-  W. Andrew, C. Greatwood, and T. Burghardt, “Visual localisation and individual identification of holstein friesian cattle via deep learning,” in 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Oct 2017, pp. 2850–2859.
-  J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, “Long-term recurrent convolutional networks for visual recognition and description,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 2625–2634.
-  A. Hodgson, N. Kelly, and D. Peel, “Unmanned aerial vehicles (uavs) for surveying marine fauna: a dugong case study,” PloS one, vol. 8, no. 11, p. e79556, 2013.
-  W. R. Koski, T. Allen, D. Ireland, G. Buck, P. R. Smith, A. M. Macrander, M. A. Halick, C. Rushing, D. J. Sliwa, and T. L. McDonald, “Evaluation of an unmanned airborne system for monitoring marine mammals,” Aquatic Mammals, vol. 35, no. 3, p. 347, 2009.
-  A. Abd-Elrahman, L. Pearlstine, and F. Percival, “Development of pattern recognition algorithm for automatic bird detection from unmanned aerial vehicle imagery,” Surveying and Land Information Science, vol. 65, no. 1, p. 37, 2005.
-  A. Rodríguez, J. J. Negro, M. Mulero, C. Rodríguez, J. Hernández-Pliego, and J. Bustamante, “The eye in the sky: combined use of unmanned aerial systems and gps data loggers for ecological research and conservation of small birds,” PLoS One, vol. 7, no. 12, p. e50336, 2012.
-  MIT, “Sloopflyer,” https://caos.mit.edu/blog/glider-photography-sloopflyer, [Online; accessed 1-Mar-2019. Unpublished elsewhere].
-  X. Hui, J. Bian, X. Zhao, and M. Tan, “Vision-based autonomous navigation approach for unmanned aerial vehicle transmission-line inspection,” International Journal of Advanced Robotic Systems, vol. 15, no. 1, p. 1729881417752821, 2018.
-  A. F. Cobo and F. C. Benítez, “Approach for autonomous landing on moving platforms based on computer vision,” 2016.
-  H. Yu, S. Lin, J. Wang, K. Fu, and W. Yang, “An Intelligent Unmanned Aircraft System for Wilderness Search and Rescue,” http://www.imavs.org/papers/2017/143_imav2017_proceedings.pdf, [Online; accessed 1-Mar-2019].
-  S. Kyristsis, A. Antonopoulos, T. Chanialakis, E. Stefanakis, C. Linardos, A. Tripolitsiotis, and P. Partsinevelos, “Towards autonomous modular uav missions: The detection, geo-location and landing paradigm,” Sensors, vol. 16, no. 11, p. 1844, 2016.
-  C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
-  J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” CoRR, vol. abs/1612.08242, 2016. [Online]. Available: http://arxiv.org/abs/1612.08242
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
-  N. Qian, “On the momentum term in gradient descent learning algorithms,” Neural networks, vol. 12, no. 1, pp. 145–151, 1999.
-  M. Matsumoto and T. Nishimura, “Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator,” ACM Transactions on Modeling and Computer Simulation (TOMACS), vol. 8, no. 1, pp. 3–30, 1998.
-  M. Barth and J. A. Farrell, “The global positioning system & inertial navigation,” McGraw-Hill, vol. 8, pp. 21–56, 1999.
-  B. Hofmann, H. Lichtenegger, and J. Collins, “Gps theory and practice,” Springer Wien NewYork, 2001.
-  C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.
-  S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
-  M. T. Ribeiro, S. Singh, and C. Guestrin, “Why should i trust you?: Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2016, pp. 1135–1144.
-  A. Savitzky and M. J. Golay, “Smoothing and differentiation of data by simplified least squares procedures.” Analytical chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
-  M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” International journal of computer vision, vol. 88, no. 2, pp. 303–338, 2010.