1 Introduction
Although the Euclidean distance is a simple and convenient metric, it is often not an accurate representation of the underlying shape of the data [Frome et al., 2006]. Such a representation is crucial in many real-world applications [Yang et al., 2011, Boiman et al., 2008], such as object classification [Fink, 2005, Frome et al., 2007], text document retrieval [Lebanon, 2006, Wang et al., 2010] and face verification [Chopra et al., 2005, Nguyen and Bai, 2011], and methods that learn a distance metric from training data have hence been widely studied in recent years. We present a new angle on the metric learning problem based on random forests [Amit and Geman, 1997, Breiman, 2001] as the underlying distance representation. The emphasis of our work is the capability to incorporate the absolute position of point pairs in the input space without requiring a separate metric per instance or exemplar. In doing so, our method, called random forest distance (RFD), is able to adapt to the underlying shape of the data by varying the metric based on the position of sample pairs in the feature space, while maintaining the efficiency of a single metric. In some sense, our method achieves a middle ground between the two main classes of existing methods, single global distance functions and multi-metric sets of distance functions, overcoming the limitations of both (see Figure 1 for an illustrative example). We next elaborate upon these comparisons.
The metric learning literature has been dominated by methods that learn a global Mahalanobis metric, with representative methods including [Xing et al., 2003, Bar-Hillel et al., 2003, Hoi et al., 2006, Davis et al., 2007, Weinberger and Saul, 2009, Shen et al., 2010, Nguyen and Guo, 2008, Shi et al., 2011]. In brief, given a set of pairwise constraints (either sampled from label data, or collected as side information in the semi-supervised case) indicating pairs of points that should or should not be grouped (i.e., have small or large distance, respectively), the goal is to find the linear transformation of the data that best satisfies these constraints. One such method [Xing et al., 2003] minimizes the distance between positively-linked points subject to the constraint that negatively-linked points are separated, but requires solving a computationally expensive semidefinite programming problem. Relevant Component Analysis (RCA) [Bar-Hillel et al., 2003] learns a linear Mahalanobis transformation to satisfy a set of positive constraints. Discriminant Component Analysis (DCA) [Hoi et al., 2006] extends RCA by exploiting negative constraints. ITML [Davis et al., 2007] minimizes the LogDet divergence under positive and negative linear constraints, and LMNN [Weinberger and Saul, 2009, Shen et al., 2010] learns a distance metric through the maximum margin framework. [Nguyen and Guo, 2008] formulate metric learning as a quadratic semidefinite programming problem with local neighborhood constraints and linear time complexity in the original feature space. More recently, researchers have begun developing fast algorithms that can work in an online manner, such as POLA [Shalev-Shwartz et al., 2004], MCML [Globerson and Roweis, 2006] and LEGO [Jain et al., 2008]. These global methods learn a single Mahalanobis metric using the relative position of point pairs:

$$d_M(x_i, x_j) = (x_i - x_j)^\top M (x_i - x_j),$$

where $M$ is a positive semidefinite matrix. Although the resulting single metric is efficient, it is limited in its capacity to capture the shape of complex data. In contrast, a second class of methods, called multi-metric methods, distributes distance metrics throughout the input space; in the limit, they estimate a distance metric per instance or exemplar, e.g.,
[Frome et al., 2006, 2007] for the case of Mahalanobis metrics. [Zhan et al., 2009] extend [Frome et al., 2006] by propagating metrics learned on training exemplars to learn a metric matrix for each unlabeled point as well. However, these point-based multi-metric methods all suffer from high time and space complexity due to the need to learn and store a separate metric matrix per instance. A more efficient approach within this second class is to divide the data into subsets and learn a metric for each subset [Babenko et al., 2009, Weinberger and Saul, 2008]. However, these methods make strong assumptions in generating the subsets; for example, [Babenko et al., 2009] learns at most one metric per category, forfeiting the possibility that different samples within a category may require different metrics. We propose a metric learning method that is able to achieve both the efficiency of the global methods and the specificity of the multi-metric methods. Our method, the random forest distance (RFD), transforms the metric learning problem into a binary classification problem and uses random forests as the underlying representation [Amit and Geman, 1997, Breiman, 2001, Leistner et al., 2009, Biau and Devroye, 2010]. In this general form, we are able to incorporate the position of samples implicitly into the metric and yet maintain a single, efficient global metric. To that end, we use a novel point-pair mapping function that encodes both the position of the points relative to each other and their absolute position within the feature space. Our experimental analyses demonstrate the importance of incorporating position information into the metric (Section 3).
We use the random forest as the underlying representation for several reasons. First, the output of the random forest algorithm is a simple “yes” or “no” vote from each tree in the forest. In our case, “no” votes correspond to positively constrained training data, and “yes” votes correspond to negatively constrained training data. The number of yes votes, then, is effectively a distance function, representing the relative resemblance of a point pair to pairs that are known to be dissimilar versus pairs that are known to be similar. Second, random forests are efficient and scale well, and have been shown to be one of the most powerful and scalable supervised methods for handling high-dimensional data [Caruana and Niculescu-Mizil, 2006]. In contrast to instance-specific multi-metric methods [Frome et al., 2006, 2007], the storage requirement of our method is independent of the size of the input data set, and our experimental results indicate RFD is at least 16 times faster than the state-of-the-art multi-metric method. Third, because random forests are nonparametric, they make minimal assumptions about the shape and patterning of the data [Breiman, 2001], affording a flexible model that is inherently nonlinear. In the next section, we describe the new RFD method in more detail, followed by a thorough comparison to the state of the art in Section 3.

2 Random Forest Distance: Implicitly Position-Dependent Metric Learning
Our random forest-based approach is inspired by several other recent advances in metric learning [Shalev-Shwartz et al., 2004, Babenko et al., 2009] that reformulate the metric learning problem into a classification problem. However, where those approaches restrict the form of the learned distance function to a Mahalanobis matrix, thus precluding the use of position information, we adopt a more general formulation of the classification problem that removes this restriction.
Given the instance set $X = \{x_i\}_{i=1}^{N}$, each $x_i \in \mathbb{R}^{D}$ is a vector of $D$ features. Taking a geometric interpretation of each $x_i$, we consider the position of sample $i$ in the space $\mathbb{R}^{D}$. The value of this interpretation will become clear throughout the paper, as the learned metric will implicitly vary over $\mathbb{R}^{D}$, which allows it to adapt based on local structure in a manner similar to the instance-specific multi-metric methods, e.g., [Frome et al., 2006]. Denote two pairwise constraint sets: a must-link constraint set $\mathcal{M} = \{(x_i, x_j)\}$, whose member pairs are similar, and a do-not-link constraint set $\mathcal{C} = \{(x_i, x_j)\}$, whose member pairs are dissimilar. For any constraint $(x_i, x_j)$, denote by $y_{ij}$ the ideal distance between $x_i$ and $x_j$. If $(x_i, x_j) \in \mathcal{M}$, then the distance $y_{ij} = 0$; otherwise $(x_i, x_j) \in \mathcal{C}$ and $y_{ij} = 1$. Therefore, we seek a function $d$ from an appropriate function space $\mathcal{H}$:

$$d^{*} = \arg\min_{d \in \mathcal{H}} \sum_{(x_i, x_j) \in \mathcal{M} \cup \mathcal{C}} \ell\big(d(x_i, x_j), y_{ij}\big), \qquad (1)$$

where $\ell(\cdot, \cdot)$ is some loss function that will be specified by the specific classifier chosen. In our random forests case, we minimize the expected loss, as in many classification problems. So consider $d$ to be a binary classifier for the classes $y = 0$ and $y = 1$. For flexibility, we redefine the problem as $d(x_i, x_j) = F(\phi(x_i, x_j))$, where $F$ is some classification model and $\phi$ is a mapping function that maps the pair $(x_i, x_j)$ to a feature vector that will serve as input for the classifier function $F$. To train $F$, we transform each constraint pair using the mapping function $\phi$ and submit the resulting set of vectors and labels as training data. We next describe the feature mapping function $\phi$.

2.1 Mapping function for implicitly position-dependent metric learning
In actuality, all metric learning methods implicitly employ a mapping function. However, Mahalanobis-based methods are restricted in terms of what features their metric solution can encode. These methods all learn a (positive semidefinite) metric matrix $M$ and a distance function of the form $d_M(x_i, x_j) = (x_i - x_j)^\top M (x_i - x_j)$, which can be reformulated as $d_M(x_i, x_j) = \mathrm{vec}(M)^\top \mathrm{vec}\big((x_i - x_j)(x_i - x_j)^\top\big)$, where $\mathrm{vec}(\cdot)$ denotes vectorization, or flattening, of a matrix. Mahalanobis-based methods can thus be viewed as using the mapping function $\phi_M(x_i, x_j) = \mathrm{vec}\big((x_i - x_j)(x_i - x_j)^\top\big)$. This function encodes only relative position information, and the Mahalanobis formulation allows the use of no other features.
However, our formulation affords a more general mapping function:

$$\phi(x_i, x_j) = \begin{bmatrix} u \\ v \end{bmatrix}, \quad u = |x_i - x_j|, \quad v = \frac{1}{2}(x_i + x_j), \qquad (2)$$

which considers both the relative location of the samples, $u$, as well as their absolute position, $v$. The output feature vector is the concatenation of these two and lies in $\mathbb{R}^{2D}$.
The relative location $u$ represents the same information as the Mahalanobis mapping function. Note that we take the absolute value in $u$ to enforce symmetry in the learned metric. The primary difference between our mapping function and that of previous methods is thus the information contained in $v$, the mean of the two point vectors. It localizes each mapped pair to a region of the space, which allows our method to adapt to heterogeneous distributions of data. It is for this reason that we consider our learned metric to be implicitly position-dependent. Note that earlier methods that learn position-based metrics, i.e., the methods that learn a metric per instance such as [Frome et al., 2006], incorporate the absolute position of each instance only, whereas we incorporate the absolute position of each instance pair, which adds additional modeling versatility.
We note that alternate encodings of the position information are possible but have shortcomings. For example, we could choose to simply concatenate the positions of the two points rather than average them, but this approach raises the issue of ordering the points. Using $v = [x_i^\top, x_j^\top]^\top$ would again yield a nonsymmetric feature, and an arbitrary ordering rule would not guarantee meaningful feature comparisons. The usefulness of position information varies depending on the data set. For data that is largely linear and homogeneous, including $v$ will only add noise to the features and could worsen accuracy. In our experiments, we found that for many real data sets (and particularly for more difficult data sets) the inclusion of $v$ significantly improves the performance of the metric (see Section 3).
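As a concrete sketch, the mapping of Eq. (2) is a few lines of NumPy. The function name `pair_features` is ours, not the paper's; this is only an illustration of the symmetric relative/absolute encoding.

```python
import numpy as np

def pair_features(xi, xj):
    """Map a point pair to the feature vector of Eq. (2):
    relative part u = |xi - xj| (symmetric by construction) and
    absolute part v = (xi + xj) / 2 (the pair's mean position)."""
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    u = np.abs(xi - xj)       # relative location, order-independent
    v = (xi + xj) / 2.0       # absolute position of the pair
    return np.concatenate([u, v])   # vector in R^{2D}
```

Because of the absolute value and the mean, swapping the arguments leaves the output unchanged, which is what enforces symmetry of the learned distance.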
Dataset  Size  Dim.  No. Classes

Balance  625  4  3
BUPA Liver Disorders  345  6  2
Breast Cancer  699  10  2
Image Segmentation  2310  19  7
Semeion Handwritten Digits  1593  256  10
Iris  150  4  3
Pima Indians Diabetes  768  8  2
Wine  178  13  3
Sonar  208  60  2
Multiple Features Handwritten Digits  2000  649  10

Table 1: UCI data sets used for k-NN classification testing.
2.2 Random forests for metric learning
Random forests are well studied in the machine learning literature and we do not describe them in any detail; the interested reader is directed to [Amit and Geman, 1997, Breiman, 2001]. In brief, a random forest is a set of decision trees $\{f_t\}_{t=1}^{T}$ operating on a common feature space, in our case $\mathbb{R}^{2D}$. To evaluate a point-pair $(x_i, x_j)$, each tree independently classifies the sample (based on the leaf node at which the point-pair arrives) as similar or dissimilar (0 or 1, respectively) and the forest averages the votes, essentially regressing a distance measure on the point-pair:

$$d(x_i, x_j) = \frac{1}{T} \sum_{t=1}^{T} f_t\big(\phi(x_i, x_j)\big), \qquad (3)$$

where $f_t(\cdot)$ is the classification output of tree $t$.
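Under these definitions, the full train-and-evaluate loop can be sketched with an off-the-shelf forest. Note this is a stand-in: the paper trains its own random forest implementation, whereas here scikit-learn's `RandomForestClassifier` plays the role of the tree ensemble, its averaged class-1 vote approximating Eq. (3); all function names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def map_pairs(X, pairs):
    """Apply the Eq. (2) mapping to an array of index pairs into X."""
    a, b = X[pairs[:, 0]], X[pairs[:, 1]]
    return np.hstack([np.abs(a - b), (a + b) / 2.0])

def train_rfd(X, must_link, do_not_link, n_trees=400, seed=0):
    """Fit a forest on mapped constraint pairs: label 0 for must-link
    (ideal distance 0), label 1 for do-not-link (ideal distance 1)."""
    pairs = np.vstack([must_link, do_not_link])
    y = np.concatenate([np.zeros(len(must_link)), np.ones(len(do_not_link))])
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(map_pairs(X, pairs), y)
    return forest

def rfd_distance(forest, X, pairs):
    """Eq. (3): the fraction of trees voting 'dissimilar' for each pair."""
    return forest.predict_proba(map_pairs(X, np.asarray(pairs)))[:, 1]
```

On well-separated toy data, held-out within-cluster pairs receive a smaller RFD value than cross-cluster pairs, which is exactly the behavior the vote-averaging argument above predicts.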
It has been found empirically that random forests scale well with increasing dimensionality, compared with other classification methods [Caruana and Niculescu-Mizil, 2006], and, as a decision tree-based method, they are inherently nonlinear. Hence, our use of them in RFD as a regression algorithm allows for a more scalable and more flexible metric than is possible using Mahalanobis methods. Moreover, the incorporation of position information into this classification function (as described in Section 2.1) allows the metric to implicitly adapt to different regions of the feature space. In other words, when a decision tree in the random forest selects a node split based on a value of the absolute position subvector $v$ (see Eq. 2), all evaluation in the subtree is localized to a specific halfspace of $\mathbb{R}^{D}$. Subsequent splits on elements of $v$ further refine the subspace of emphasis. Indeed, each path through a decision tree in the random forest is localized to a particular (possibly overlapping) subspace.
The RFD is not technically a metric but rather a pseudosemimetric. Although RFD can easily be shown to be nonnegative and symmetric, it does not satisfy the triangle inequality (i.e., $d(x_i, x_k) \le d(x_i, x_j) + d(x_j, x_k)$ is not guaranteed) or the implication that $d(x_i, x_j) = 0 \Rightarrow x_i = x_j$, sometimes called the identity of indiscernibles. It is straightforward to construct examples for both of these cases. Although this point may appear problematic, it is not uncommon in the metric learning literature. For example, by necessity, no metric whose distance function varies across the feature space can guarantee that the triangle inequality is satisfied; [Frome et al., 2006, 2007] similarly cannot satisfy the triangle inequality. Our method must violate the triangle inequality in order to fulfill our original objective of producing a metric that incorporates position data. Moreover, our extensive experimental results demonstrate the capability of RFD as a distance (Section 3).
3 Experiments and Analysis
In this section, we present a set of experiments comparing our method to state-of-the-art metric learning techniques on both a range of UCI data sets (Table 1) and an image data set taken from the Corel database. To substantiate our claim of computational efficiency, we also provide an analysis of running time relative to an existing position-dependent metric learning method.
For the UCI data sets, we compare performance on the nearest neighbor classification task against both standard Mahalanobis methods and point-based position-dependent methods. For the former, we test $k$-NN classification accuracy at a range of $k$ values (as in Figure 1), while the latter relies on results published by the other methods' authors, and thus uses a fixed $k$. For the image data set, we measure accuracy at $k$-NN retrieval, rather than $k$-NN classification, and compare our results to several Mahalanobis methods.
The following is an overview of the primary experimental findings to be covered in the following sections.

RFD has accuracy comparable or superior to state-of-the-art position-specific methods (Table 3).

RFD is 16 to 85 times faster than the state-of-the-art position-specific method (Table 4).

RFD outperforms the state of the art in nine out of ten categories in the benchmark Corel image retrieval problem (Figure 4).
3.1 Comparison with global Mahalanobis metric learning methods
We first compare our method to a set of state-of-the-art Mahalanobis metric learning methods: RCA [Bar-Hillel et al., 2003], DCA [Hoi et al., 2006], Information-Theoretic Metric Learning (ITML) [Davis et al., 2007] and distance metric learning for large-margin nearest neighbor classification (LMNN) [Weinberger and Saul, 2009, Shen et al., 2010]. For our method, we test using the full feature mapping, including both relative position data, $u$, and absolute pairwise position data, $v$ (RFD (+P)), as well as with only relative position data, $u$ (RFD (−P)). To provide a baseline, we also show results using both the Euclidean distance and a heuristic Mahalanobis metric, where the matrix used is simply the covariance matrix of the data. All algorithm code was obtained from the authors' websites, for which we are indebted (our code is available at http://www.cse.buffalo.edu/~jcorso). We test each algorithm on a number of standard small to medium scale UCI data sets (see Table 1). All algorithms are trained using 1000 positive and 1000 negative constraints per class, with the exceptions of RCA, which uses only the 1000 positive constraints, and LMNN, which uses the full label set to actively select a (generally much larger) set of constraints; constraints are otherwise selected randomly according to a uniform distribution. In each case, we set the number of trees used by our method to 400 (see Section 3.2 for a discussion of the effect of varying forest sizes). Testing is performed using 5-fold cross validation on the nearest-neighbor classification task. Rather than selecting a single $k$ for this task, we test with varying values of $k$, increasing in increments of 5 up to the maximum possible value for each data set (i.e., the number of elements in the smallest class). By varying $k$ in this way, we are able to gain some insight into each method's ability to capture the global variation in a data set. When $k$ is small, most of the identified neighbors lie within a small local region surrounding the query point, enabling linear metrics to perform fairly well even on globally nonlinear data by taking advantage of local linearity. However, as $k$ increases, local linearity becomes less practical, and the quality of the metric's representation of the global structure of the data is exposed. Though the accuracy results at higher $k$ values do not have strong implications for each method's efficacy on the specific task of $k$-NN classification (where an ideal $k$ can simply be selected by cross-validation), they do indicate overall metric performance, and are highly relevant to other tasks, such as retrieval.
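The evaluation protocol described above, measuring $k$-NN accuracy under a learned metric across a range of $k$, can be sketched with precomputed distance matrices. The helper name is ours; any distance function (RFD, Mahalanobis, or Euclidean) can supply the matrices.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy_curve(D_train, D_test, y_train, y_test, k_values):
    """Evaluate k-NN classification accuracy for a range of k.
    D_train: (n_train, n_train) pairwise distances among training points.
    D_test:  (n_test, n_train) distances from test points to training points."""
    accs = {}
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k, metric="precomputed")
        knn.fit(D_train, y_train)
        accs[k] = knn.score(D_test, y_test)
    return accs
```

Sweeping `k_values` up to the size of the smallest class reproduces the kind of accuracy-versus-$k$ curves discussed above.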
Figure 2 shows the accuracy plots for the ten UCI data sets. RFD is consistently among the top performers on these various data sets. In the lower-dimension case (Iris), most methods perform well, and RFD without position information outperforms RFD with position information (this is the sole data set on which this occurs), which we attribute to the limited data set size (150 samples) and the position information acting as a distractor in this small and highly linear case. In all other cases, RFD with absolute position information significantly outperforms RFD without it. In many of the more difficult cases (Diabetes, Segmentation, Sonar), RFD with position information significantly outperforms the field. This result suggests that RFD can scale well with increasing dimensionality, which is consistent with findings from the literature that random forests are one of the most robust classification methods for high-dimensional data [Caruana and Niculescu-Mizil, 2006].
Table 2 provides a summary statistic of the methods by computing the mean rank (lower is better) over the ten data sets at varying $k$ values. For all but one value of $k$, RFD with absolute position information has the best mean rank of all the methods (and in that one case, it is ranked a close second). RFD without absolute position information performs comparatively poorly, underscoring the utility of the absolute position information. In summary, the results in Table 2 show that RFD is consistently able to outperform the state-of-the-art global metric learning methods on various benchmark problems.
$k$ value  Euclid  Mahal  RCA  DCA  ITML  LMNN  RFD (−P)  RFD (+P)

5  5.8 (8)  5.7 (7)  4.3 (4)  4.8 (5)  3.9 (3)  3.2 (2)  5.4 (6)  2.9 (1)
10  6.1 (8)  5.6 (7)  3.7 (3)  4.6 (4)  4.8 (5)  2.9 (1)  5.1 (6)  3.2 (2)
15  5.7 (8)  5.4 (6)  3.9 (3)  4.7 (5)  5.6 (7)  3.1 (2)  4.6 (4)  3.0 (1)
20  5.6 (8)  5.4 (7)  3.8 (3)  5.2 (5)  5.3 (6)  3.7 (2)  4.5 (4)  2.5 (1)
25  6.1 (8)  5.3 (6)  4.0 (3)  4.5 (4)  5.4 (7)  3.4 (2)  4.8 (5)  2.5 (1)
30  5.8 (7)  5.9 (8)  4.5 (5)  4.3 (3)  5.3 (6)  3.5 (2)  4.3 (3)  2.4 (1)
35  5.8 (8)  5.4 (6)  4.3 (4)  4.9 (5)  5.5 (7)  4.0 (3)  3.8 (2)  2.3 (1)
45  6.6 (8)  5.5 (6)  4.4 (4)  4.4 (4)  5.9 (7)  3.3 (2)  4.1 (3)  1.8 (1)
Max  6.5 (8)  6.1 (7)  5.1 (5)  3.7 (3)  5.5 (6)  3.7 (3)  3.5 (2)  1.9 (1)

Table 2: Mean rank of each method over the ten UCI data sets at varying $k$ (ordinal rank in parentheses; lower is better).
3.2 Varying forest size
One question that must be addressed when using RFD is how many trees should be learned in order to obtain good results. Increasing the size of the forest increases computation and space requirements, and past a certain point yields little or no improvement and may even lead to overfitting. It is beyond the scope of this paper to provide a full answer as to how many trees are needed in RFD, but we have made some observations.
First, the addition of absolute position information noticeably increases the benefit obtained from additional trees (see Figure 3). This result is unsurprising, considering the increased size of the feature vector, as well as the increased degree of fine-tuning possible for a metric that can vary from region to region. Second, in our experiments we observe significant improvements in accuracy up to about 100 trees, even without position information, and would recommend this as a reasonable minimum. It seems reasonable that larger constraint sets will require larger forests and, similarly, that the more complex the shape of the data, the larger the forest may need to be; however, these two points have not yet been thoroughly explored by our group.
Dataset  RFD  ISD L1  ISD L2  FSM  FSSM

Balance  .120 ± .024  .114 ± .013  .116 ± .014  .134 ± .020  .143 ± .013
Diabetes  .241 ± .028  .287 ± .019  .269 ± .023  .342 ± .050  .322 ± .232
Breast (Scaled)  .030 ± .011  .031 ± .010  .030 ± .010  .102 ± .041  .112 ± .029
German  .277 ± .039  .277 ± .015  .274 ± .013  .275 ± .021  .275 ± .060
Haberman  .273 ± .029  .277 ± .029  .273 ± .025  .276 ± .032  .276 ± .029

Table 3: Error rates (mean ± standard deviation) for RFD and the position-specific multi-metric methods on UCI data sets.
Dataset  ISD Time  RFD Time  ISD:RFD Ratio

Iris  34.6  2.1  16.4
Balance  620.3  11.2  55.3
Breast (scaled)  657.4  7.8  84.6
Diabetes  849.5  14.7  57.8

Table 4: Running time comparison between ISD and RFD; the ratio is ISD time divided by RFD time.
3.3 Comparison with positionspecific multimetric methods
We compare our method to three multi-metric methods that incorporate absolute position (via instance-specific metrics): FSM, FSSM and ISD. FSM [Frome et al., 2006] learns an instance-specific distance for each labeled example. FSSM [Frome et al., 2007] is an extension of FSM that enforces global consistency and comparability among the different instance-specific metrics. ISD [Zhan et al., 2009] first learns instance-specific distance metrics for each labeled data point, then uses metric propagation to generate instance-specific metrics for unlabeled points as well.
We again use the ten UCI data sets, but under the same conditions used by these methods' authors. Accuracy is measured on the $k$-NN task ($k = 11$) with three-fold cross validation. The parameters of the compared methods are set as suggested in [Zhan et al., 2009]. Our RFD method chooses 1% of the available positive constraints and 1% of the available negative constraints, and constructs a random forest with 1000 trees. We report the average result of ten different runs on each data set, with random partitions of training/testing data generated each time (see Table 3). These results show that our RFD method yields performance better than or comparable to state-of-the-art explicit multi-metric learning methods. Additionally, because we learn only one distance function and random forests are an inherently efficient technique, our method offers significantly better computational efficiency than these instance-specific approaches (see Table 4): between 16 and 85 times faster than ISD.
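The constraint sampling used above, keeping 1% of the available positive (same-class) and negative (different-class) pairs, might be sketched as follows. The function name and the uniform sampling scheme are our reading of the text, not code from the paper.

```python
from itertools import combinations

import numpy as np

def sample_constraints(y, frac=0.01, seed=0):
    """Uniformly keep a fraction of all same-class (must-link) and
    different-class (do-not-link) index pairs over labels y."""
    rng = np.random.default_rng(seed)
    pos, neg = [], []
    for i, j in combinations(range(len(y)), 2):
        (pos if y[i] == y[j] else neg).append((i, j))

    def keep(pairs):
        n = max(1, int(frac * len(pairs)))  # at least one pair
        idx = rng.choice(len(pairs), n, replace=False)
        return np.array([pairs[t] for t in idx])

    return keep(pos), keep(neg)
```

The returned index-pair arrays are exactly the form the mapped training set needs: positive pairs labeled 0, negative pairs labeled 1.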
The comparable level of accuracy is not surprising. While our method is a single metric in form, in practice its implicit position-dependence allows it to act like a multi-metric system. Notably, because our method learns using the position of each point-pair rather than each point, it can potentially encode up to $O(N^2)$ implicit position-specific metrics, rather than the $O(N)$ learned by existing position-dependent methods, which learn a single metric per instance/position. RFD is thus a stronger way to learn a position-dependent metric: even explicit multi-metric methods must fall back on poor global distances in cases where a single (Mahalanobis) metric cannot capture the relationship between its associated point and every other point in the data.
3.4 Retrieval on the Corel image data set
We also evaluate our method's performance on the challenging image retrieval task, because this task differs from $k$-NN classification by emphasizing the accuracy of individual pairwise distances rather than broad patterns. For this task, we use an image data set taken from the Corel image database. We select ten image categories of varying types (cats, roses, mountains, etc.), each with a clear semantic meaning; the classes and images are similar to those used by Hoi et al. to validate DCA [Hoi et al., 2006]. Each class contains 100 images, for a total of 1000 images in the data set.
For each image, we extract a 36-dimensional low-level feature vector comprising color, shape and texture. For color, we extract the mean, variance and skewness in each HSV color channel, obtaining 9 color features. For shape, we apply a Canny edge detector and construct an 18-dimensional edge direction histogram for the image. For texture, we apply the Discrete Wavelet Transform (DWT) to gray-level versions of the original RGB images. A Daubechies-4 wavelet filter is applied to perform a 3-level decomposition, and the mean, variance and mode of each of the 3 levels are extracted as a 9-dimensional texture feature.
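The color portion of this descriptor, three moments per HSV channel, can be sketched as below. The conversion to HSV and the exact skewness convention are assumptions on our part, and the edge-histogram and wavelet features are omitted; this shows only the 9-dimensional color block.

```python
import numpy as np

def color_moments(hsv_image):
    """First three moments (mean, standard deviation, skewness) of each
    channel of an (H, W, 3) array assumed to already be in HSV space,
    giving the 9-dimensional color descriptor described above."""
    feats = []
    for c in range(3):
        ch = hsv_image[..., c].astype(float).ravel()
        mu = ch.mean()
        sigma = ch.std()
        # Standardized third moment; zero for a constant channel.
        skew = ((ch - mu) ** 3).mean() / (sigma ** 3) if sigma > 0 else 0.0
        feats += [mu, sigma, skew]
    return np.array(feats)  # 9 color features
```

Concatenating this with the 18 edge-direction bins and the 9 wavelet statistics would yield the 36-dimensional vector used in the experiments.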
We compare three state-of-the-art algorithms and a Euclidean distance baseline: ITML, DCA, and our RFD method (with absolute position information). For ITML, we vary its regularization parameter over a range of values and choose the best. For each method, we generate 1% of the available positive constraints and 1% of the available negative constraints (as proposed in [Hoi et al., 2006]). For RFD, we construct a random forest with 1500 trees. Using five-fold cross validation, we retrieve the 20 nearest neighbors of each image under each metric. Accuracy is determined by counting the fraction of the retrieved images that belong to the same class as the query image. We repeat this experiment 10 times with differing random folds and report the average results in Figure 4. RFD clearly outperforms the other methods tested, achieving the best accuracy on all but the cougar category. Also note that ITML performs roughly on par with or worse than the baseline on 7 classes, and DCA on 5, while RFD does so on only 1, indicating again that RFD provides a better global distance measure than current state-of-the-art approaches, and is less likely to sacrifice performance in one region in order to gain it in another.
4 Conclusion
In this paper, we have proposed a new angle on the metric learning problem. Our method, called random forest distance (RFD), incorporates both the conventional relative position of point pairs and their absolute position into the learned metric, and hence implicitly adapts the metric through the feature space. Our evaluation has demonstrated the capability of RFD, which has the best overall performance in terms of accuracy and speed on a variety of benchmarks.
This paper opens several immediate directions of inquiry. First, RFD further demonstrates the capability of classification methods to underpin metric learning; similar feature mapping functions and other underlying forms for the distance function need to be investigated. Second, the utility of absolute pairwise position is clear from our work, which is a good indication of the need for multiple metrics. Open questions remain about other representations of position, as well as the use of position in other metric forms, even the classic Mahalanobis metric. Third, there are connections between random forests and nearest-neighbor methods, which may explain the good performance we have observed; we have not explored them in detail in this paper and plan to in the future. Finally, we are also investigating the use of RFD on larger-scale, more diverse data sets like the new MIT SUN image classification data set.
Acknowledgements
We are grateful for the support in part provided through the following grants: NSF CAREER IIS0845282, ARO YIP W911NF1110090, DARPA Mind’s Eye W911NF1020062, DARPA CSSG D11AP00245, and NPS N002441110022. Findings are those of the authors and do not reflect the views of the funding agencies.
References
 Amit and Geman [1997] Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural computation, 9(7):1545–1588, 1997. ISSN 08997667.
 Babenko et al. [2009] B. Babenko, S. Branson, and S. Belongie. Similarity metrics for categorization: from monolithic to category specific. In ICCV, pages 293–300, 2009.
 Bar-Hillel et al. [2003] A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning distance functions using equivalence relations. In ICML, volume 20, page 11, 2003.
 Biau and Devroye [2010] G. Biau and L. Devroye. On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification. Journal of Multivariate Analysis, 101(10):2499–2518, 2010.
 Boiman et al. [2008] O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In CVPR, pages 1–8. IEEE, 2008.
 Breiman [2001] L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001. ISSN 08856125.
 Caruana and Niculescu-Mizil [2006] R. Caruana and A. Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In ICML, 2006.
 Chopra et al. [2005] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In CVPR, volume 1, pages 539–546. IEEE, 2005. ISBN 0769523722.
 Davis et al. [2007] J.V. Davis, B. Kulis, P. Jain, S. Sra, and I.S. Dhillon. Informationtheoretic metric learning. In ICML, pages 209–216, 2007.
 Fink [2005] M. Fink. Object classification from a single example utilizing class relevance metrics. In NIPS, volume 17, page 449. The MIT Press, 2005.
 Frome et al. [2006] A. Frome, Y. Singer, and J. Malik. Image retrieval and classification using local distance functions. NIPS, 19:417, 2006. ISSN 10495258.
 Frome et al. [2007] A. Frome, Y. Singer, F. Sha, and J. Malik. Learning globallyconsistent local distance functions for shapebased image retrieval and classification. In ICCV, pages 1–8. IEEE, 2007.
 Globerson and Roweis [2006] A. Globerson and S. Roweis. Metric learning by collapsing classes. NIPS, 18:451, 2006. ISSN 10495258.
 Hoi et al. [2006] S.C.H. Hoi, W. Liu, M.R. Lyu, and W.Y. Ma. Learning distance metrics with contextual constraints for image retrieval. In CVPR, volume 2, pages 2072–2078. IEEE, 2006. ISBN 0769525970.
 Jain et al. [2008] P. Jain, B. Kulis, and K. Grauman. Fast image search for learned metrics. CVPR, 2008.
 Lebanon [2006] G. Lebanon. Metric learning for text documents. PAMI, pages 497–508, 2006. ISSN 01628828.
 Leistner et al. [2009] C. Leistner, A. Saffari, J. Santner, and H. Bischof. Semi-supervised random forests. In ICCV, pages 506–513. IEEE, 2009.
 Nguyen and Bai [2011] H. Nguyen and L. Bai. Cosine similarity metric learning for face verification. Computer Vision–ACCV 2010, pages 709–720, 2011.
 Nguyen and Guo [2008] N. Nguyen and Y. Guo. Metric Learning: A Support Vector Approach. ECML PKDD, pages 125–136, 2008.
 Shalev-Shwartz et al. [2004] S. Shalev-Shwartz, Y. Singer, and A.Y. Ng. Online and batch learning of pseudo-metrics. In ICML, page 94. ACM, 2004.
 Shen et al. [2010] C. Shen, J. Kim, and L. Wang. Scalable LargeMargin Mahalanobis Distance Metric Learning. Neural Networks, IEEE Transactions on, 21(9):1524–1530, 2010. ISSN 10459227.
 Shi et al. [2011] Y. Shi, Y.K. Noh, F. Sha, and D.D. Lee. Learning discriminative metrics via generative models and kernel learning. Arxiv preprint arXiv:1109.3940, 2011.
 Wang et al. [2010] J. Wang, S. Wu, H.Q. Vu, and G. Li. Text document clustering with metric learning. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pages 783–784. ACM, 2010.
 Weinberger and Saul [2008] K.Q. Weinberger and L.K. Saul. Fast solvers and efficient implementations for distance metric learning. In ICML, pages 1160–1167. ACM, 2008.
 Weinberger and Saul [2009] K.Q. Weinberger and L.K. Saul. Distance metric learning for large margin nearest neighbor classification. JMLR, 10:207–244, 2009. ISSN 15324435.
 Xing et al. [2003] E.P. Xing, A.Y. Ng, M.I. Jordan, and S. Russell. Distance metric learning with application to clustering with sideinformation. NIPS, pages 521–528, 2003. ISSN 10495258.
 Yang et al. [2011] W. Yang, Y. Wang, and G. Mori. Learning transferable distance functions for human action recognition. Machine Learning for VisionBased Motion Analysis, pages 349–370, 2011.
 Zhan et al. [2009] D.C. Zhan, M. Li, Y.F. Li, and Z.H. Zhou. Learning instance specific distances using metric propagation. In ICML, pages 1225–1232. ACM, 2009.