The tracking of objects is an important concept in many fields. For instance, tracking within the supply chain is a key element of the Industry 4.0 philosophy [7, 19]. In the forestry industry, for example, it would consist of tracking trees from the forest to their entrance into the wood yard [24, 27]. In the context of mobile robotics, being able to uniquely identify trees might improve localization in forests [17, 25, 32], as one would be able to use trees as visual landmarks. In order to perform tracking on trees, one must be able to re-identify them, potentially from bark images. In this paper, we explore precisely this problem, by developing a method to compare images of tree bark and determine whether they come from the same surface or not.
The difficulty of re-identifying bark surfaces arises in part from the self-similar nature of their texture. Moreover, bark texture induces large changes in appearance when lit from different angles, due to the presence, in many tree species, of deep troughs in the bark. Another difficulty is the absence of a dataset tailored to this problem. There are existing bark datasets [13, 35, 9], but these are geared towards tree species classification.
To this effect, we first collected our own dataset of 200 uniquely-identified bark surfaces, for a total of 2,400 bark images. From these images, we produced a feature-matching dataset enabling the training of deep learning feature descriptors. We also established the first state-of-the-art bark retrieval performance, showing promising results in challenging conditions. In particular, our approach surpassed by far common local feature descriptors such as SIFT or SURF, as well as the novel data-driven descriptor DeepDesc; see Figure 1 for a qualitative assessment.
In short, our contributions can be summarized as follows:
We introduce a novel dataset of tree bark pictures for image retrieval. These pictures contain specific markers, enabling the inference of camera-plane transformations.
We train a local feature descriptor via Deep Learning and demonstrate that one can match with great success a set of different images of the same bark surface.
2 Related Work
Our problem is related to three main areas: image retrieval, local feature descriptors and metric learning. Below, we discuss these. We also discuss the application of computer vision methods to the identification of bark images.
2.1 Image retrieval
The problem of image retrieval can be defined as follows: given a query image, the goal is to find other images in a database that look similar to the query. In mobile robotics, for instance, this problem is known as Visual Place Recognition (VPR) [41, 2, 40, 10, 11], where image retrieval is used to perform localization. There, the objective is to determine if a location has already been visited, given its visual appearance. The robot can then localize itself by finding previously-seen images which are geo-referenced. In the area of surveillance, the problem is defined as Person Re-Identification (Person Re-Id) and aims at following an individual through a number of security camera recordings [18, 41, 40, 14]. This technique requires building or learning a function that maps multiple images of an individual to the same compact description, despite variations in viewpoint, illumination, pose or even clothing. Our tree bark re-identification is closest to this Person Re-Id problem, since we desire to track an individual bark surface despite changes in illumination and viewpoint.
2.2 Local feature descriptor
To describe and compare images while being invariant to viewpoint and illumination changes, we chose to use local feature descriptors. The goal of these descriptors is to summarize the visual content of an image patch. The ideal descriptor is a) compact (low dimensionality), b) fast to compute, c) distinctive and d) robust to illumination changes, translations and rotations. A widespread approach is to use hand-crafted descriptors. They often rely on histograms of orientation, scale and magnitude of image gradients, as in SIFT or SURF. Different variants have appeared over the years, trying to alleviate the computation cost [8, 26] or simply trying to improve the performance [1, 3].
More recently, approaches based on machine learning have appeared. Some learn a parametric function that maps image patches to compact descriptions that can be compared by their distance [6, 30]. Instead of describing an image patch alone, other work takes two patches at once and directly outputs a similarity probability. There are also pipelines trained end-to-end (detector + descriptor) [37, 12]. For a thorough comparison between hand-crafted and data-driven descriptors on different tasks, see [28, 40].
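To make the descriptor requirements above concrete, here is a toy orientation-histogram descriptor in the spirit of SIFT. This is a simplified sketch only (no Gaussian weighting, no trilinear interpolation, no dominant-orientation normalization), not the actual SIFT algorithm:

```python
import numpy as np

def orientation_histogram_descriptor(patch, n_cells=4, n_bins=8):
    """Toy SIFT-like descriptor: per-cell histograms of gradient orientation,
    weighted by gradient magnitude, concatenated and L2-normalized."""
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)                     # image gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)     # orientation in [0, 2*pi)
    h, w = patch.shape
    ch, cw = h // n_cells, w // n_cells
    desc = []
    for i in range(n_cells):
        for j in range(n_cells):
            m = mag[i*ch:(i+1)*ch, j*cw:(j+1)*cw].ravel()
            a = ang[i*ch:(i+1)*ch, j*cw:(j+1)*cw].ravel()
            hist, _ = np.histogram(a, bins=n_bins, range=(0, 2*np.pi), weights=m)
            desc.append(hist)
    desc = np.concatenate(desc)                     # 4*4*8 = 128 dimensions
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```

The L2 normalization at the end provides some robustness to global illumination scaling, one of the invariances listed above.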
2.3 Metric learning
To build a learned local feature descriptor, we turned to the field of metric learning. It is a training paradigm that tries to learn a distance function between data points. The goal is in line with points c) and d) of an ideal descriptor, since it seeks to make this distance small for similar examples, and large for dissimilar ones. This approach has been explored in [15, 30, 12], where training relied on the so-called contrastive loss. Another line of work attempts instead to make the inter-class variation larger than the intra-class variation by a chosen margin in the descriptor space. This formulation corresponds to the triplet loss [29, 2]. Other work instead compares a similar pair of examples to multiple negative ones, using a clever batch construction process.
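The margin-based formulation above can be sketched as follows (a minimal numpy illustration of a triplet loss on L2-normalized descriptors; the margin value is illustrative, not a setting from this paper):

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Push the anchor-negative distance to exceed the anchor-positive
    distance by at least `margin`; zero loss once the margin is satisfied."""
    a, p, n = map(l2_normalize, (anchor, positive, negative))
    d_ap = np.linalg.norm(a - p, axis=-1)
    d_an = np.linalg.norm(a - n, axis=-1)
    return np.maximum(0.0, d_ap - d_an + margin).mean()
```

When the positive already matches the anchor and the negative is far away, the hinge is inactive and the loss is zero, so training focuses on unsatisfied triplets.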
2.4 Vision applied to bark/wood texture
Exploiting the information present in bark images has been explored before. For instance, hand-crafted features such as Local Binary Patterns (LBP) [20, 34, 35], SIFT descriptors and Gabor filters have been used to perform tree species recognition. Closer to our work, variants of the LBP method have been compared for bark image retrieval, but only at the species level. If bark is considered as a texture problem, one can find interesting work such as [38, 39], which uses ground textures such as asphalt or wood floors to enable robots to localize themselves. However, their technique is based on images with almost no variations, and each query is compared with one set of SIFT descriptions from their whole texture map. Data-driven approaches such as deep learning have also been applied to images of bark, but strictly for species classification.
3 Problem Definition
The problem we are addressing is an instance of re-identification. Given an existing database of bark images and a query image, our goal is to find all images in the database that correspond to the same physical surface, and hence the same tree. We assume that the query has a meaningful match in our database, i.e. we are not trying to solve an open-set problem; see FAB-MAP for the detection of novel locations.
3.1 Image global signature
We perform the bark image search via global image signatures. These signatures are extracted for each image (database and query), as depicted in Figure 2. For this, we mostly follow the standard retrieval method, summarized below. First, a keypoint detector extracts a collection of keypoints from an image. For each of these keypoints, we extract a description of dimension 128, yielding a list of descriptions. These descriptions can come from standard descriptors, such as SIFT or SURF, or from our novel descriptors, described further down. The remaining component of an image signature is a Bag of Words (BoW) representation, calculated from this list of descriptions. We also apply the standard TF-IDF weighting. In the original formulation, the comparison between two BoWs is done using the cosine distance. Instead, we L2-normalized every BoW and used the L2 distance to compare them. This way, our distance ranking is equivalent to the pure cosine distance, but without using a dot product.
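The signature step above can be sketched as follows (a minimal numpy illustration; the assignment of each descriptor to a visual-word id, and the IDF weights learned from training images, are assumed given):

```python
import numpy as np

def bow_signature(word_ids, vocab_size, idf):
    """TF-IDF Bag-of-Words signature from the visual-word ids of one image's
    descriptors, L2-normalized so that L2-distance ranking matches cosine."""
    tf = np.bincount(word_ids, minlength=vocab_size).astype(np.float64)
    v = tf * idf
    return v / np.linalg.norm(v)

# For unit vectors a and b, ||a - b||^2 = 2 - 2 a.b, so sorting by L2
# distance gives the same ranking as sorting by cosine similarity.
```

The comment states the identity that justifies replacing the cosine distance by the L2 distance after normalization.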
3.2 Signature matching
The search is performed mainly by computing a score between the query image signature and each image signature of the entire database, then retrieving the best match based on this score. For the BoW technique, we simply use the L2 distance between two BoWs as the score. Another way to calculate a score between two signatures begins by taking the distance between every description of both images, to obtain a collection of putative matching pairs of features. Then, to filter out potential false matches, one needs to add extra constraints. In this paper, we explore two such filters. The first one is the Lowe Ratio (LR) test. The second one is a Geometric Verification (GV), which is a simple neighbor check. It begins by taking a putative match and retrieving the keypoint associated with each description of the match. Following this, we find the nearest neighbors of each of these keypoints in their respective images. Finally, the match is accepted if at least a given percentage of the neighbors of one keypoint have a match among the neighbors of the other. The number of matches left after filtering is then used as the matching score between two bark images.
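The GV filter described above can be sketched as follows (a brute-force numpy illustration; `k` and `min_ratio` are illustrative values, not the settings used in the experiments):

```python
import numpy as np

def geometric_verification(kps_a, kps_b, matches, k=8, min_ratio=0.5):
    """Neighbor-consistency check: a putative match (i, j) is kept if at
    least `min_ratio` of the k nearest keypoints of i in image A are
    themselves matched to one of the k nearest keypoints of j in image B."""
    match_map = dict(matches)                      # keypoint index A -> B

    def knn(pts, idx):
        d = np.linalg.norm(pts - pts[idx], axis=1)
        return set(np.argsort(d)[1:k + 1])         # skip the point itself

    kept = []
    for i, j in matches:
        nbrs_a = knn(kps_a, i)
        nbrs_b = knn(kps_b, j)
        consistent = sum(1 for n in nbrs_a if match_map.get(n) in nbrs_b)
        if consistent >= min_ratio * len(nbrs_a):
            kept.append((i, j))
    return kept   # len(kept) serves as the matching score between two images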
4 Our approach: Data-driven descriptors
Considering that the highly-textured surfaces of tree bark cause problems for hand-crafted descriptors, we present here the main contribution of our paper: data-driven descriptors for bark image re-identification. First, we describe our bark image dataset. Then, we discuss how to process this dataset in order to generate the keypoint-aligned image patches required to train our descriptors. These descriptors are then described in detail, followed by the necessary training details.
4.1 Bark Image Datasets
In order to develop our data-driven descriptors, we collected a dataset of tree bark images. To ensure drastic illumination changes, we took the pictures at night, and varied the position of a 550 lumen LED lamp. We also varied the position of the camera, an LG Q6 cellphone with a resolution of 4160 x 3120 pixels. Since our training approach (subsection 4.2) requires keypoint-aligned image patches, we used fiducial markers on a wooden frame attached to trees to automate and increase the precision of the image registration, as shown in Figure 3.
We collected bark images for only two tree species, namely Red Pine (RP, an evergreen; 50 trees, 100 unique bark surfaces) and Elm (EL, a deciduous tree; 50 trees, 100 unique bark surfaces). Each bark surface was surrounded by a custom-made wooden frame, which left visible a surface of 757.5 cm² (a rectangle of 50.5 cm by 15 cm). We limited ourselves to only two species to avoid positively biasing image retrieval results; indeed, neural networks can easily distinguish between tree species. In total, we took 12 images per distinct surface with the aforementioned variations. To make our evaluation on EL bark more challenging, we also collected unseen bark images from elm trees without any markers. To keep these new images close to our original appearance distribution, we took them at night with 3 different illumination angles, but with limited changes in point of view. We collected a total of 30 images per tree with some physical overlap, spread nearly uniformly around the trunk. This gave us a total of 750 manually-cropped non-relevant images for any EL query, taken at a scale similar to all of our other images.
4.2 Descriptor Training Dataset
Our descriptors require a dataset made of 64x64 patches for training with metric learning. Moreover, these patches not only need to be properly indexed per bark surface, they must also show the exact same physical location. After automatically cropping the excess information from images (background, frame, shadow, etc.), we registered every image of a bark surface to a reference frame via a homography. We used the fiducial markers affixed to the wooden frame surrounding the bark surface (see Figure 3) to estimate these transformations. Then, for each bark image, we detected the maximum number of keypoints and projected them to the reference frame via the homography. We filtered these keypoints to require a minimum of 32 pixels between them, to minimize overlap. This resulted in around 800-1000 distinct keypoints per reference frame. For each of these keypoints, we then extracted the 12 corresponding image patches (one per image, see subsection 4.1) using the homography that maps the reference frame to a specific bark image. This resulted in a collection of 64x64 image patches corresponding to the exact same physical location on the bark, but with changes in illumination and point of view (rotation, scaling and perspective). Figure 4 shows three images of a unique bark surface, with the manual correspondence between keypoints. Figure 5 shows 12 examples of a keypoint patch extracted by the algorithm used to create the training dataset.
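The keypoint projection and spacing filter above can be sketched as follows (a minimal numpy illustration; the homography H itself would be estimated from the fiducial markers):

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2-D points through a 3x3 homography using homogeneous coordinates,
    as done when carrying keypoints between a bark image and the reference frame."""
    pts = np.asarray(pts, dtype=np.float64)
    hom = np.hstack([pts, np.ones((len(pts), 1))])   # (N, 3) homogeneous
    proj = hom @ H.T
    return proj[:, :2] / proj[:, 2:3]                # divide by w

def filter_min_spacing(pts, min_dist=32.0):
    """Greedily keep keypoints at least `min_dist` pixels apart, mirroring
    the 32-pixel spacing used to minimize patch overlap."""
    kept = []
    for p in pts:
        if all(np.linalg.norm(p - q) >= min_dist for q in kept):
            kept.append(p)
    return np.array(kept)
```

The same projection, applied with the inverse mapping, locates a reference-frame keypoint inside each of the 12 images of a surface.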
4.3 DeepBark and SqueezeBark Descriptors
To perform description extraction, we implemented two different architectures with PyTorch 0.4.1. The first one, DeepBark, is based on a pre-trained version of ResNet-18. We removed the average-pool and fully-connected layers and replaced them with one fully-connected layer (no activation function). The second one, SqueezeBark, is a smaller network based on a pre-trained version of SqueezeNet 1.1. We again removed the average-pool and fully-connected layers, replacing them with a max-pooling layer (to reduce the feature map) and a fully-connected layer (no activation function). In both cases, the network computes a 128-dimensional vector, fed to an L2-normalization layer. Excluding the last fully-connected layer and counting the parameters of the remaining convolutional layers, DeepBark comprises a total of 10,994,880 parameters and SqueezeBark 719,552 parameters. Our intention here is to be able to compare the impact of network size on descriptor quality.
4.4 Training details
To train our networks (DeepBark or SqueezeBark), we chose the N-pair-mc loss. The only difference in our implementation is that, instead of using regularization to avoid degeneracy, we L2-normalized the descriptor vectors to keep them on a hypersphere.
Our dataset is composed of 64x64 patches for around 70,800 distinct keypoints in the training set and 17,500 in the validation set, for most of our experiments. Using 12 patches per keypoint for training and 2 for validation, this totals 884,600 64x64 bark image patches. At each iteration, we only used a pair of examples for every keypoint in the training set. However, to ensure an equal probability for every patch to be seen together with every other patch, we randomly selected each patch tuple. We added online data augmentation in the form of color, luminosity and blurriness jitter. Each input image was normalized by subtracting 127.5 and then dividing by 128. We optimized using Adam, starting from an initial learning rate that we reduced by a factor of 0.5 each time the validation plateaued for 20 iterations.
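The objective above can be sketched as follows (a minimal numpy illustration of the N-pair-mc idea on L2-normalized embeddings, where the positives of the other pairs in the batch serve as negatives; batch construction and augmentation are omitted):

```python
import numpy as np

def n_pair_mc_loss(anchors, positives):
    """Each anchor must be more similar (higher dot product) to its own
    positive than to every other positive in the batch: a softmax
    cross-entropy over the similarity matrix, with label i for row i.
    L2 normalization replaces the regularization of the original loss."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T                                # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))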
We built the validation set by finding all of the keypoints available in the bark images set aside for validation, and randomly selecting 2 patches from the 12 available for each distinct keypoint. This gave us a fixed validation set, where every patch had a corresponding one. During training, we validated our model by selecting 50 keypoints with their 2 examples at a time and performing a retrieval test to calculate the Precision at rank 1 (P@1). The final validation score was simply the average of the P@1 calculated for every batch of 50 keypoints. After training, we selected the model with the highest validation score. The training was stopped either by early stopping when the validation stagnated for 40 iterations, or when a maximum number of iterations was reached.
Besides DeepBark and SqueezeBark, we also analyzed hand-crafted descriptors, namely SIFT and SURF. We also included DeepDesc, a learned descriptor trained on a multi-view stereo dataset. All descriptors use the SIFT keypoint detector, except for SURF, which uses its own detector. For all experiments, we used a ratio of 0.8 for the LR test and fixed the parameters of the GV filter. Each visual vocabulary was computed from the training images of each respective experiment, clustered using the k-means algorithm. As we will see later, these parameters offered good performance and we did not try to adjust any of them to further improve the results.
Image retrieval can be evaluated in multiple ways. In our case, we favored metrics based on an ordered set, as they align best with our problem description. Hence, we chose to present results with the Precision at rank K (P@K), Recall at rank K (R@K), R-Precision (R-P) and Average Precision (AP). P@K is the fraction of the top-K retrieved images that are relevant, while R@K is the fraction of all relevant images found within the top K.
The AP averages the precision computed at the rank of each relevant image. Keep in mind that these metrics are calculated for every query and then averaged together. Thus, instead of AP we report the mean Average Precision (mAP).
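These per-query metrics can be computed as follows (a minimal sketch; `ranked_relevance` is the boolean relevance of the ranked database for one query, and averaging the AP over all queries gives the mAP):

```python
def retrieval_metrics(ranked_relevance, n_relevant, k):
    """Compute P@K, R@K and AP for a single query from a boolean list of
    the ranked database (True = relevant), given the total number of
    relevant images (11 per query in our tests)."""
    hits_at_k = sum(ranked_relevance[:k])
    p_at_k = hits_at_k / k
    r_at_k = hits_at_k / n_relevant
    precisions = []
    hits = 0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)   # precision at each relevant rank
    ap = sum(precisions) / n_relevant
    return p_at_k, r_at_k, ap
```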
5.1 Hyperparameters search
Our approach comprises a number of hyperparameters. The first is the maximum allowable number of keypoints in an image. From experiments, increasing beyond 500 keypoints did not significantly improve the performance of any descriptor. The second hyperparameter is the downsizing factor applied to the original image. Downsizing an image increases the receptive field of any method, without changing its process. Our experiments showed that downsizing generally helped every descriptor. Our third hyperparameter is the sigma of the blur performed before passing the image through the keypoint detector. Note that the blur was applied for keypoint detection only; afterwards, we used either the unblurred image for computing the descriptions of learned descriptors (DeepBark, SqueezeBark and DeepDesc) or the blurred image for SIFT and SURF. The latter was necessary, as these use the keypoint information found on the blurred image. We found that the best blur value varied greatly between descriptors. A sample of the results is in Table 1, and the values chosen for the subsection 5.3 experiment are shown in Table 2. These values were found by averaging the results of 36 randomly-selected queries run against the validation set for each hyperparameter combination.
[Table: average number of detected keypoints per descriptor]
5.2 Impact of training data size
Data-driven approaches based on deep learning tend to be data hungry. To check the impact of the training data size, we created 5 training scenarios per tree species, which used 10%, 20%, 30%, 40% and 50% of the dataset. All trained descriptors were validated and tested on the same folds (10% and 40%, respectively) of each species dataset. We stopped training when the validation P@1 stagnated for 40 consecutive iterations.
Table 3 shows the performance of the descriptor DeepBark, for each training set size. For each species, the P@1, the R-P and the mAP are reported for the three scoring techniques: GV, LR and BoW. Note that the BoW is also affected by the size of the training set, since its vocabulary is computed from that same training set. From these metrics, we concluded that performance gains were minimal beyond 40%. This confirmed that our training database is sufficiently large to obtain good performance. For reference, when using 50% of RP as training data, we have access to approximately 42,700 distinct keypoints, yielding 512,000 bark image patches of 64x64 pixels.
5.3 Descriptors comparison
We selected 50% of red pine bark surfaces and 50% of elm bark surfaces to create a test set, while using the remaining data for the training and validation sets. This corresponded to 80 unique bark surfaces for training, 20 for validation and 100 for testing, while keeping the ratio between tree species at 50/50 in each set. Our data-driven descriptors DeepBark and SqueezeBark were trained for 200 iterations, and we kept the model with the best validation score. With 12 images for each bark surface, the test set has a total of 1,200 images, 600 per tree species. Each of these images was used as a query during the retrieval test, and the results were averaged over all queries. We report the results in Figure 6 as PR curves. This way, all 11 true positives are taken into account in our experiments, properly estimating how well our approach withstands strong illumination/viewpoint changes.
From Figure 6, we can see that across almost all descriptors, the GV and the LR are better scoring methods than the BoW. This is understandable, as the BoW is more intended as a pre-filtering tool to reduce the number of potential candidates. We can also see that DeepBark clearly dominates SIFT, SURF and DeepDesc, and that with the GV, its precision stays above 98% up to a recall of 6 images. Interestingly, the results for SqueezeBark are mixed. This might indicate that finding a good descriptor for bark images under strong illumination changes is a difficult problem, and thus requires a neural architecture with sufficient capacity.
5.4 Generalization across species
In the experiments of subsection 5.3, we reported results on networks trained on both species, instead of training and testing each architecture on a single species. Our intention was to double the amount of training data and to benefit from the potential synergy between species, as often seen in deep networks (multi-task learning). Here, we look precisely at the generalization of our networks across species. We thus devised two experiments to evaluate the generalization from one kind of bark to the other and vice versa. The first one comprises a training set with 80% of the RP data, using the remaining 20% as the validation set and all of the EL data as the test set (labelled RP->EL). We also performed the converse (EL->RP). We only report in Figure 7 the PR curve for the GV, as the trend is similar for the other scoring methods. Figure 7 first shows that DeepBark is capable of generalizing across species, but that SqueezeBark does so to a lesser extent. Also, there is no clear trend for the generalization direction, since SqueezeBark generalized better from EL to RP while DeepBark generalized better in the opposite direction (from RP to EL).
5.5 Extra negative examples
To test how our system would perform on a larger database, we added 6,700 true negative elm examples with a crop size similar to query images. Half of them were original images; the other half were generated via data augmentation, by applying either a rotation, scale or affine transformation. Note that the original 3,350 images contain some physical overlap, as they come from 25 trees.
We reused the DeepBark network and the BoW vocabulary previously trained in subsection 5.3. For the test, we removed the red pine images and kept the elm images, which we separated into two crops (top and bottom halves), giving us a total of 1,200 images. We thus obtained a database of 7,900 bark images. Again, every query had 11 relevant images. This experiment is the only one where we split bark images into two crops, done solely to increase the database size. This has the side effect of reducing performance, as the visible bark (and thus the number of visible features) is halved. This can be seen by comparing Figure 6 and Figure 8.
Among the three scoring methods evaluated, the most affected by the number of negative examples was the BoW, as seen in Figure 8 and Table 4. The LR filter displays a smaller degree of degradation as a function of the number of extra negative examples; it still retains almost the same P@1. Finally, when looking at the GV, it is clear that the impact of extra negative examples is negligible. This again demonstrates the importance of performing GV filtering. We can also extrapolate that our approach with GV would work on a much larger, realistic dataset.
5.6 Computing time considerations
Even if the LR test and the GV filter perform better, it is unrealistic to use them to search a whole database in a realistic scenario. Instead, the BoW can be used as a pre-filter proposing putative candidates to the other methods. To this end, we provide Table 5, which shows the R@K for various values of K. These results suggest that keeping the 200 best matching scores calculated using the BoW on DeepBark would retain 73.9% of the 11 relevant images among 7,900 possible matches. As shown in prior work, the BoW is fast to compare and can handle large datasets. To get a sense of the time that could be saved by the pre-filtering, we report in Table 6 the average time of these operations using our current algorithm on a single thread. It is important to note that the BoW technique could be even faster using an inverted index and by taking advantage of its sparsity (on average, 71.8% of its entries are null in our experiments). From this, we can see that applying the GV on the top 200 candidates from the original 7,900 images can be accomplished in 35.88 s, while the BoW only took 0.016 s for all 7,900 images.
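The inverted-index speedup mentioned above can be sketched as follows (a toy pure-Python version operating on sparse BoW signatures stored as word-to-weight dictionaries; the identifiers are illustrative):

```python
from collections import defaultdict

def build_inverted_index(signatures):
    """Inverted index over sparse BoW signatures: visual word -> list of
    (image id, weight). Only non-zero entries are stored, exploiting the
    sparsity noted above (~72% null entries)."""
    index = defaultdict(list)
    for img_id, sig in enumerate(signatures):
        for word, w in sig.items():       # sparse dict: word -> tf-idf weight
            index[word].append((img_id, w))
    return index

def query_scores(index, query_sig, n_images):
    """Accumulate dot products only over the words present in the query,
    instead of comparing the query against every dense signature."""
    scores = [0.0] * n_images
    for word, qw in query_sig.items():
        for img_id, w in index.get(word, ()):
            scores[img_id] += qw * w
    return scores   # for unit vectors, a higher dot product means a smaller L2 distance
```

Ranking by these scores and keeping the top candidates is exactly the pre-filtering role assigned to the BoW before applying the LR or GV stages.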
|Scoring method||BoW||LR||GV|
|Mean Time (ms)||0.002||131.499||179.387|
In this paper, we explored bark image re-identification in the challenging context of strong illumination and viewpoint variations. To this effect, we introduced a novel bark image dataset, from which we can extract over 2 million keypoint-registered image patches. Using the latter, we developed two learned local feature descriptors based on deep learning and metric learning, namely DeepBark and SqueezeBark. After seeing that a descriptor can perform well with only 40% of the dataset from one tree species, we showed that both our descriptors performed better than SIFT, SURF and DeepDesc with any of the three scoring methods presented. Our results indicate that, using our descriptor DeepBark, retrieval is viable even for large databases with thousands of negative examples. Moreover, the approach can be sped up by using Bag-of-Words pre-filtering.
Our results are very encouraging, but performance in real-life scenarios might differ, and thus more data should be collected. Also, it would be interesting to quantify the effect of the BoW vocabulary size, the generalization capacity over more tree species, or the effect of using different keypoint detectors. Further improvements to the training procedure could also be made, such as allowing more training iterations, trying other networks, adding pre-training or employing hard-mining approaches. Finally, our results pave the way for new bark re-identification applications.
We would like to thank Marc-André Fallu for the access to his red pine plantation. The authors would also like to thank the "Fonds de recherche du Québec – Nature et technologies (FRQNT)" for their financial support. We are also grateful to Fan Zhou for the help in the data collection. Finally, we thank NVIDIA for their Hardware Grant Program.
-  (2012) KAZE features. In ECCV, pp. 214–227. Cited by: §2.2.
-  (2018-06) NetVLAD: CNN architecture for weakly supervised place recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (6), pp. 1437–1451. Cited by: §2.1, §2.3.
-  (2012) Three things everyone should know to improve object retrieval. In CVPR, pp. 2911–2918. Cited by: §2.2.
-  (2006) SURF: speeded up robust features. In CVIU, Vol. 110, pp. 404–417. Cited by: §1.
-  (2015) A comparison of multi-scale local binary pattern variants for bark image retrieval. In ACIVS, pp. 764–775. Cited by: §2.4.
-  (2011-01) Discriminative learning of local image descriptors. Transactions on Pattern Analysis and Machine Intelligence 33 (1), pp. 43–57. Cited by: §2.2, §5.
-  (2011-11) Using rfid for the management of pharmaceutical inventory-system optimization and shrinkage control. Decision Support Systems 51, pp. 842–852. Cited by: §1.
-  (2010) BRIEF: binary robust independent elementary features. In ECCV, pp. 778–792. Cited by: §2.2.
-  Tree species identification from bark images using convolutional neural networks. IROS. Cited by: §1, §2.4, §4.1.
-  (2008-06) FAB-map: probabilistic localization and mapping in the space of appearance. The International Journal of Robotics Research 27 (6), pp. 647–665. Cited by: §2.1, §3.
-  (2009-06) Highly scalable appearance-only SLAM - FAB-MAP 2.0. In Robotics: Science and Systems, pp. 1828–1833. Cited by: §2.1, §5.6.
-  (2018) SuperPoint: self-supervised interest point detection and description. CVPRW, pp. 337–349. Cited by: §2.2, §2.3.
-  (2010-02) Automated identification of tree species from images of the bark , leaves or needles. pp. 67–74. Cited by: §1, §2.4.
-  (2007) Evaluating appearance models for recognition, reacquisition, and tracking. In International Workshop on Performance Evaluation for Tracking and Surveillance, Cited by: §2.1.
-  (2006) Dimensionality reduction by learning an invariant mapping. In CVPR, pp. 1735–1742. Cited by: §2.3.
-  (2016-06) Deep residual learning for image recognition. In CVPR, Vol. 19, pp. 770–778. Cited by: §4.3.
-  (2009) Autonomous forest vehicles: historic, envisioned and state-of-the-art. International Journal of Forest Engineering 20 (1), pp. 31–38. Cited by: §1.
-  (2017) In defense of the triplet loss for person re-identification. CoRR abs/1703.07737. Cited by: §2.1.
-  (2012-05) Challenges for building rfid tracking systems across the whole supply chain. International Journal of RF Technologies Research and Applications 3, pp. 201–218. Cited by: §1.
-  (2006) Bark classification based on contourlet filter features using rbpnn. In Intelligent Computing, pp. 1121–1126. Cited by: §2.4.
-  (2016-02) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. CoRR abs/1602.07360. Cited by: §4.3.
-  (2015) Adam: a method for stochastic optimization. CoRR abs/1412.6980. Cited by: §4.4.
-  (2004-11) Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60 (2), pp. 91–110. Cited by: §1, §3.2.
-  (2014-01) Forestry wood supply chain information system using rfid technology. IIE Annual Conference and Expo, pp. 1562–1571. Cited by: §1.
-  (2007-04) Recognising and modelling landmarks to close loops in outdoor slam. In ICRA, pp. 2036–2041. Cited by: §1.
-  (2011) ORB: an efficient alternative to SIFT or SURF. In ICCV, pp. 2564–2571. Cited by: §2.2.
-  (2018-05) Automated system for natural resources management. CEUR-WS. Cited by: §1.
-  (2017-07) Comparative evaluation of hand-crafted and learned local features. In CVPR, pp. 6959–6968. Cited by: §2.2.
-  FaceNet: a unified embedding for face recognition and clustering. In CVPR, pp. 815–823. Cited by: §2.3, §4.4.
-  (2015-12) Discriminative learning of deep convolutional feature point descriptors. In ICCV, pp. 118–126. Cited by: §1, §2.2, §2.3.
-  (2003) Video google: a text retrieval approach to object matching in videos. In ICCV, pp. 1470–1477 vol.2. Cited by: §3.1.
-  (2017) Toward low-flying autonomous mav trail navigation using deep neural networks for environmental awareness. IROS, pp. 4241–4247. Cited by: §1.
-  (2016) Improved deep metric learning with multi-class n-pair loss objective. In NIPS, pp. 1857–1865. Cited by: §2.3, §4.4.
-  (2013-11) Kernel-mapped histograms of multi-scale LBPs for tree bark recognition. In IVCNZ, pp. 82–87. Cited by: §2.4.
-  (2014) Computer-vision-based tree trunk recognition. Univerza v Ljubljani. Cited by: §1, §2.4.
-  (2015-06) MatchNet: unifying feature and metric learning for patch-based matching. In CVPR, Vol. 07-12-June, pp. 3279–3286. Cited by: §2.2.
-  (2016) LIFT: learned invariant feature transform. In ECCV, Cited by: §2.2.
-  (2017) High-precision localization using ground texture. ICRA, pp. 6381–6387. Cited by: §2.4.
-  (2018-06) Learning to detect features in texture images. In CVPR, pp. 6325–6333. Cited by: §2.4.
-  (2016-08) SIFT meets CNN: a decade survey of instance retrieval. Transactions on Pattern Analysis and Machine Intelligence. Cited by: §2.1, §2.2.
-  (2017-12) A discriminatively learned cnn embedding for person reidentification. ACM Trans. Multimedia Comput. Commun. Appl. 14 (1), pp. 13:1–13:20. Cited by: §2.1.
-  (2003-12) Plant species recognition based on bark patterns using novel gabor filter banks. In NIPS, Vol. 2, pp. 1035–1038. Cited by: §2.4.