1 Introduction
Classification is a fundamental task in machine learning. While deep learning classifiers have achieved superior performance in numerous applications, they generally require large amounts of labeled data. For small data sets, traditional classification algorithms remain valuable.
The nearest neighbor (NN) classifier is one of the oldest established classification methods; it compares the distances between a new instance and the training instances. However, its performance can vary substantially with the choice of metric. It is therefore very beneficial to find a well-suited, adaptive distance metric for a specific application. To this end, metric learning is an appealing technique: it enables algorithms to learn a metric automatically from the available data. Metric learning with a convex objective function was first proposed in the seminal work of Xing et al. [1]. Since then, many other metric learning methods have been developed and widely adopted, such as Large Margin Nearest Neighbor (LMNN) [2] and Information-Theoretic Metric Learning [3]. Theoretical work has also been carried out for metric learning, especially on deriving generalization bounds [4, 5, 6, 7], and deep networks have been used to represent nonlinear metrics [8, 9]. In addition, metric learning methods have been developed for specific purposes, including multi-output tasks [10], multi-view learning [11], medical image retrieval [12], kinship verification [13, 14], face recognition [15], tracking [16] and so on. Most of the aforementioned methods use a single metric for the whole metric space and thus may not be well-suited for data sets with multimodality. To address this problem, local metric learning algorithms have been proposed [17, 2, 18, 19, 20, 21, 22, 23, 24].
Most of these localized algorithms fall into two groups: 1) Each data point or cluster of data points has its own local metric. This, however, results in an asymmetric distance, as illustrated in [18], i.e. it can cause d(x_1, x_2) ≠ d(x_2, x_1). 2) Each line segment or cluster of line segments has a local metric. The definitions of such segment metrics, for example the symmetrized construction in [20], or metrics based on the posterior probability that a point belongs to the i-th Gaussian cluster of a Gaussian mixture model (GMM), are nonetheless not very intuitive. In this short paper, we define an intuitive new symmetric distance and a novel local metric learning method. By splitting the metric space into influential regions and a background region, we define the distance between any two points as the sum of the lengths of the line segments in each region, as illustrated in Figure 1. Building multiple influential regions addresses the multimodality issue, and learning a suitable local metric in each influential region improves class separability, as shown in Figure 2.
To establish our new distance and local metric learning method, we first define some key concepts, namely influential regions, local metrics and line segments, which lead to the definition of the new distance. We then calculate the distance by analyzing the geometric relationship between a line segment and the influential regions. After that, we use the proposed local metric to build a novel classifier and study its learnability. The penalty terms from the derived learning bound, together with the empirical hinge loss, form an optimization problem, which is solved via gradient descent due to its nonconvexity. Finally, we evaluate the proposed local metric learning algorithm on 14 publicly available data sets. On eight of these data sets, the proposed algorithm achieves the best performance, clearly outperforming state-of-the-art metric learning competitors.
2 Definitions of Influential Regions, Local Metrics and Distance
In this section, we first define influential regions R_1, …, R_m and the background region B. With a local metric M_i for each region R_i and a background metric M_b for B, the distance between two points x_1 and x_2 is defined as the sum of the lengths of the line segments in each influential region and in the background region, as illustrated in Figure 1. Since the metric is defined with respect to line segments, the distance is symmetric, i.e. d(x_1, x_2) = d(x_2, x_1).
To simplify the calculation required later, we restrict the shape of each influential region to be a ball.
Definition 1.
Influential regions are defined to be any set of m balls inside the metric space, {R_1, …, R_m}, where m denotes the number of influential regions and R_i = Ball(c_i, r_i) denotes a ball with center c_i and radius r_i. The location of each influential region is determined by the Euclidean distance, so the points of R_i form the set

R_i = { x : ‖x − c_i‖_2 ≤ r_i }.   (1)
Definition 2.
The background region B is defined to be the region excluding the influential regions:

B = U \ ∪_{i=1}^m R_i,

where U denotes the universe set.
Throughout this paper, the distance between two points x_1 and x_2 is the length of the line segment x_1x_2, i.e. d(x_1, x_2) = L(x_1x_2). Lengths in the influential regions and in the background region are defined separately with their respective metrics.
Definition 3.
Each influential region R_i has its own local metric M_i. The length of a line segment x_1x_2 inside an influential region is defined as^1

L_i(x_1x_2) = √((x_1 − x_2)^T M_i (x_1 − x_2)).   (2)

^1 Since influential regions are restricted to be ball-shaped and a ball is a convex set, the line segment x_1x_2 lies in the ball for any two points x_1 and x_2 inside the ball.
To make the illustrations more intuitive, the distance adopted in this paper is based on the Mahalanobis distance^2. ^2 This differs from the usually adopted squared Mahalanobis distance and is convenient when solving the optimization problem.
Definition 4.
The background region B has a background metric M_b. For any two points x_1 and x_2, the length of a line segment x_1x_2 in the background region is defined as L_b(x_1x_2) = √((x_1 − x_2)^T M_b (x_1 − x_2)).
We make two remarks here:

While the metrics M_i and M_b are learned inside the influential regions and the background region, respectively, the Euclidean distance is used to determine the location of the influential regions.

For x_1, x_2 ∈ B, the distance between x_1 and x_2 is generally different from L_b(x_1x_2), because parts of the line segment x_1x_2 may lie in influential regions, so their lengths should be calculated via the related local metrics.
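For concreteness, the non-squared Mahalanobis length used in Definitions 3 and 4 can be computed as follows (a minimal NumPy sketch; the function name is ours, not the paper's):

```python
import numpy as np

def mahalanobis_length(x1, x2, M):
    """Length of the segment x1x2 under a (non-squared) Mahalanobis
    metric M, i.e. sqrt((x1 - x2)^T M (x1 - x2)).

    M is assumed symmetric positive semidefinite, so that this is a
    valid (pseudo-)metric.
    """
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.sqrt(d @ M @ d))

# With M = I this reduces to the Euclidean distance:
print(mahalanobis_length([0.0, 0.0], [3.0, 4.0], np.eye(2)))  # 5.0
```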
To calculate the distance between any x_1 and x_2, we need to consider the relationship between the line segment x_1x_2 and the influential regions, which reduces to one of the following three cases: no-intersection, tangent and with-intersection.
Definition 5.
The intersection of a line segment x_1x_2 and an influential region R_i is denoted as I_i. In the case of no-intersection, I_i = ∅; in the case of tangent, I_i = {p}, where p is the tangent point; in the case of with-intersection, I_i = pq, where pq is the maximal subline segment of x_1x_2 inside R_i, p is the endpoint lying closer to x_1 and q is the endpoint lying closer to x_2. On the other hand, the intersection of a line segment and the background region B is defined as

I_b = x_1x_2 \ ∪_{i=1}^m I_i,   (3)

where ∪_{i=1}^m I_i is the union of the intersections between the line segment and all influential regions. I_b can also be understood as a set of non-overlapping line segments^3. ^3 This can easily be proved by recursively combining any overlapping line segments until no overlapping pair is found.
Accordingly, the length of a line segment can be calculated through the lengths of its intersections.
Definition 6.
The length of the intersection of a line segment x_1x_2 and an influential region R_i is denoted |I_i|. In the case of tangent or no-intersection, |I_i| = 0; in the case of with-intersection, it is defined to be the length of pq, i.e. |I_i| = L_i(pq). On the other hand, the length of the intersection of a line segment and the background region is defined as

|I_b| = Σ_{s ∈ I_b} L_b(s),   (4)

where the sum runs over the non-overlapping segments that make up I_b.
Definition 7.
The length of the line segment x_1x_2 is defined as

L(x_1x_2) = Σ_{i=1}^m |I_i| + |I_b|,   (5)

i.e. each piece of the segment is measured with the metric of the region it passes through. L(x_1x_2) will be written simply as L afterwards when no confusion arises.
3 Calculation of Distances
3.1 Calculation of the length of intersection with influential regions
The possible cases, with the corresponding values of the intersection coefficients, are illustrated as follows:
No-intersect: Figure 3.1.
Tangent: Figures 3.2, 3.3 and 3.4.
With-intersect: Figures 3.5, 3.6, 3.7 and 3.8.
We first provide an intuitive explanation of how to calculate the length of the intersection with an influential region, as illustrated in Figure 3. If the line does not intersect with, or is tangent to, the influential ball, the length is zero; this case is detected while identifying the intersection points of the line and the ball via a quadratic equation in one variable. If the line intersects the ball, we calculate the length by considering the relationship between the intersection of the line with the influential ball and the intersection of the line segment with the influential ball. The latter can be obtained from the former together with the constraint that the intersection points must lie on the line segment x_1x_2.
Definition 8.
The intersection points of the line through x_1 and x_2 and the influential region R_i are represented as p = x_1 + t_1 (x_2 − x_1) and q = x_1 + t_2 (x_2 − x_1), where t_1 ≤ t_2 are called the intersection coefficients between the line and R_i. The intersection points of the line segment x_1x_2 and R_i are represented analogously through coefficients t̂_1 and t̂_2, which are called the intersection coefficients between the line segment and R_i. The difference t̂_2 − t̂_1 is called the intersection ratio.
Proposition 1.
The length of the intersection between the line segment x_1x_2 and the influential region R_i, with intersection coefficients t̂_1 and t̂_2, is

|I_i| = (t̂_2 − t̂_1) √((x_1 − x_2)^T M_i (x_1 − x_2)).   (6)
As shown in the above proposition, the length of the intersection can be calculated given the local metric M_i and the coefficients t̂_1 and t̂_2, where the latter can be obtained from t_1 and t_2.
Now we discuss the computation of t̂_1 and t̂_2, which can be divided into two steps.
1) Calculate the intersection points of the line and the ball, i.e. the coefficients t_1 and t_2.
The coefficients t_1 and t_2 can easily be solved from the following quadratic equation with one variable:

‖x_1 + t (x_2 − x_1) − c_i‖_2^2 = r_i^2,   (7)

i.e. a t^2 + b t + c = 0 with a = ‖x_2 − x_1‖^2, b = 2 (x_2 − x_1)^T (x_1 − c_i) and c = ‖x_1 − c_i‖^2 − r_i^2. When the discriminant Δ = b^2 − 4ac > 0, the solutions to the above equation are

t_{1,2} = (−b ∓ √Δ) / (2a).

Hence the two intersection points between the ball and the line are x_1 + t_1 (x_2 − x_1) and x_1 + t_2 (x_2 − x_1). For simplicity, superscripts and subscripts on these quantities will be dropped when no confusion is caused.
2) Calculate the intersection points of the line segment and the ball, i.e. the coefficients t̂_1 and t̂_2.

We check the number of solutions to (7). If (7) has zero or one solution, the line has no intersection with, or is tangent to, the region, and thus |I_i| = 0. If it has two solutions, the intersection between the line and the ball is the chord parameterized by [t_1, t_2]. Based on whether t_1 and t_2 lie in the range [0, 1]^4, we obtain the relationship between the chord and the segment and get the values of t̂_1 and t̂_2 by clamping: t̂_1 = min(max(t_1, 0), 1) and t̂_2 = min(max(t_2, 0), 1). ^4 If and only if the value of t_1 or t_2 lies in the range [0, 1] does the corresponding point lie inside the line segment x_1x_2.
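The two steps above can be sketched as follows (a minimal NumPy sketch; the function name and the `None` convention for the zero-length cases are our own):

```python
import numpy as np

def segment_ball_intersection(x1, x2, center, radius):
    """Intersection coefficients (t1_hat, t2_hat) of the segment
    x(t) = x1 + t*(x2 - x1), t in [0, 1], with the ball Ball(center, radius).
    Returns None when the intersection has zero length (no-intersection
    or tangent).  Assumes x1 != x2.
    """
    x1, x2, center = (np.asarray(a, dtype=float) for a in (x1, x2, center))
    v = x2 - x1
    w = x1 - center
    a = v @ v
    b = 2.0 * (v @ w)
    c = w @ w - radius ** 2
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:          # no-intersection, or tangent (zero length)
        return None
    sq = np.sqrt(disc)
    t1, t2 = (-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)
    # Restrict the line's chord to the segment: clamp to [0, 1].
    t1_hat, t2_hat = max(t1, 0.0), min(t2, 1.0)
    if t1_hat >= t2_hat:     # the chord lies entirely outside the segment
        return None
    return t1_hat, t2_hat

# Segment from (-2, 0) to (2, 0) through the unit ball at the origin:
print(segment_ball_intersection([-2, 0], [2, 0], [0, 0], 1.0))  # (0.25, 0.75)
```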
3.2 Calculation of the length of intersection with local metrics
Proposition 2.
In the case of non-overlapping influential regions, i.e. R_i ∩ R_j = ∅ for i ≠ j,

d(x_1, x_2) = Σ_{i=1}^m (t̂_2^i − t̂_1^i) √((x_1 − x_2)^T M_i (x_1 − x_2)) + ρ_b √((x_1 − x_2)^T M_b (x_1 − x_2)),   (8)

where ρ_b is defined as the intersection ratio of the background region; in the non-overlapping case ρ_b = 1 − Σ_{i=1}^m (t̂_2^i − t̂_1^i).
Proposition 2 suggests that the distance can be obtained once we have the metrics (M_i, M_b) and the intersection ratios. As all calculations are in closed form, the computation is efficient.
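A minimal sketch of Eq. (8) under the non-overlapping assumption (the function name and the `regions` triple format are our own; tangent and no-intersection cases contribute zero length):

```python
import numpy as np

def local_metric_distance(x1, x2, regions, M_b):
    """d(x1, x2) as in Proposition 2: each influential region contributes
    its intersection ratio times the Mahalanobis length of the full
    segment under its local metric; the background metric M_b covers the
    remaining ratio.  `regions` is a list of (center, radius, M_i)
    triples, assumed pairwise non-overlapping.  Assumes x1 != x2.
    """
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    v = x2 - x1

    def length(M):  # non-squared Mahalanobis length of the whole segment
        return float(np.sqrt(v @ M @ v))

    rho_b, total = 1.0, 0.0
    for center, radius, M_i in regions:
        w = x1 - np.asarray(center, float)
        a, b = v @ v, 2.0 * (v @ w)
        c = w @ w - radius ** 2
        disc = b * b - 4.0 * a * c
        if disc <= 0.0:                   # no-intersection or tangent
            continue
        sq = np.sqrt(disc)
        t1 = max((-b - sq) / (2.0 * a), 0.0)
        t2 = min((-b + sq) / (2.0 * a), 1.0)
        if t1 >= t2:                      # chord outside the segment
            continue
        ratio = t2 - t1                   # intersection ratio
        total += ratio * length(M_i)
        rho_b -= ratio
    return total + rho_b * length(M_b)

# One unit ball at the origin with a scaled local metric, Euclidean background;
# half the segment crosses the ball: 0.5*sqrt(4*16) + 0.5*4 = 6.0
regions = [(np.zeros(2), 1.0, 4.0 * np.eye(2))]
print(local_metric_distance([-2, 0], [2, 0], regions, np.eye(2)))  # 6.0
```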
4 Classifier and Learnability
Lipschitz continuous functions are a family of smooth functions that are learnable [25]. In this paper, we select Lipschitz continuous functions as classifiers. Based on the resulting learning bounds, we obtain the terms to regularize in order to improve the generalization ability.
4.1 Classifier
In the Euclidean space, it is easy to see that the following classifier gives the same classification results as 1-NN:

g(x) = min D_−(x) − min D_+(x),   (9)

where g(x) < 0 indicates that x belongs to the negative class and g(x) > 0 indicates that x belongs to the positive class; D_−(x) (resp. D_+(x)) is the set that contains the Euclidean distance values between x and the instances of the negative (resp. positive) class.
kNN considers more nearby instances and hence is more robust than 1-NN. A similar extension of the above classifier that considers more nearby instances is as follows:

g(x) = min_k D_−(x) − min_k D_+(x),   (10)

where min_k denotes the sum of the k minimal elements of the set. This function is used as the classifier in our algorithm.
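A minimal sketch of this classifier (the function name is ours; any distance function, such as the local-metric distance of Section 3, can be plugged in via `dist`):

```python
import numpy as np

def knn_margin_classifier(x, X_neg, X_pos, k=3, dist=None):
    """Classifier in the spirit of Eq. (10): compare the sums of the k
    smallest distances from x to each class; the class with the smaller
    sum wins (-1 for negative, +1 for positive).

    `dist` is any distance function; it defaults to the Euclidean distance.
    """
    if dist is None:
        dist = lambda a, b: float(
            np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))
    d_neg = sorted(dist(x, z) for z in X_neg)[:k]
    d_pos = sorted(dist(x, z) for z in X_pos)[:k]
    return -1 if sum(d_neg) < sum(d_pos) else +1

X_neg = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
X_pos = [[2.0, 2.0], [2.1, 2.0], [2.0, 2.1]]
print(knn_margin_classifier([0.2, 0.2], X_neg, X_pos, k=3))  # -1
```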
4.2 Learnability of the Classifier with Local Metrics
We discuss the learnability of functions based on the Lipschitz constant, which characterizes the smoothness of a function: the smaller the Lipschitz constant, the smoother the function.
Definition 9.
([26]) The Lipschitz constant of a function f on a metric space (X, d) is

Lip(f) = sup_{x_1 ≠ x_2} |f(x_1) − f(x_2)| / d(x_1, x_2).
Proposition 3.
([26]) Let Lip(f) = a and Lip(g) = b. Then
(a) Lip(f + g) ≤ a + b;
(b) Lip(min(f, g)) ≤ max(a, b);
(c) Lip(c f) = |c| a, where c is a constant.
Proposition 4.
Let a denote the Lipschitz constant of the distance function d(·, z) for each training instance z; then the Lipschitz constant of the classifier g in (10) is bounded by 2ka.
Proof.
The term min_k D_−(x) is the sum of the k smallest of the a-Lipschitz functions d(x, z), z in the negative class; each of the k smallest-element functions is a-Lipschitz by Proposition 3(b), so their sum is ka-Lipschitz by Proposition 3(a). The same holds for min_k D_+(x). Therefore, applying Proposition 3(a) and (c) with c = −1 to the difference, the Lipschitz constant of g is bounded by 2ka. Based on the definition of the Lipschitz constant, the proposition is proved. ∎
Lemma 1.
Let d_M denote the Mahalanobis distance with metric M = L^T L. For any fixed z, the Lipschitz constant of d_M(·, z) with respect to the Euclidean distance is bounded by ‖L‖_F.
Proof.
Let d_M denote the Mahalanobis distance with metric M = L^T L:

d_M(x_1, x_2) = √((x_1 − x_2)^T M (x_1 − x_2)) = ‖L (x_1 − x_2)‖_2.

With the identity matrix M = I, d_M is the Euclidean distance. The Lipschitz constant of d_M(·, z) is bounded by ‖L‖_F as follows:

|d_M(x_1, z) − d_M(x_2, z)| ≤ d_M(x_1, x_2) = ‖L (x_1 − x_2)‖_2 ≤ ‖L‖_F ‖x_1 − x_2‖_2,

where the first inequality follows from the triangle inequality of the distance, and the second inequality is based on the fact that the matrix Frobenius norm is consistent with the vector l_2 norm^5. ∎ ^5 The consistency between a matrix norm and a vector norm indicates ‖Av‖ ≤ ‖A‖ ‖v‖, where A is a matrix, v is a vector, ‖A‖ is a matrix norm and ‖v‖ is a vector norm.

Combining the results of Proposition 1 and Corollary 6 of [25], we obtain the following corollary.
Corollary 1.
Let the metric space X have doubling dimension ddim(X), and let F be the collection of real-valued functions over X with Lipschitz constant at most L. Then for any f ∈ F that is correct on all but k of the n sample examples, with probability at least 1 − δ the expected error exceeds the empirical error k/n by at most a complexity term that grows with the Lipschitz constant L, the diameter diam(X) of the space and the doubling dimension^6 ddim(X), and that vanishes as n grows.   (11)

^6 The detailed definition of the doubling dimension can be found in [25].
The above learning bound characterizes the generalization ability, i.e. the difference between the expected error and the empirical error. Based on the bound, reducing the Lipschitz constant L helps reduce the gap between the empirical error and the expected error. In other words, the learning bound indicates that regularizing L would improve the generalization ability of the classifier.
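By Lemma 1, the Lipschitz constant of each distance term is controlled by the Frobenius norms of the metric factors, so the bound motivates an objective of roughly the following form (a hedged sketch; λ and the exact weighting are placeholders, not necessarily the paper's final formulation):

```latex
\min_{M_b,\, M_1, \dots, M_m}\;
  \frac{1}{n} \sum_{j=1}^{n} \max\bigl(0,\, 1 - y_j\, g(x_j)\bigr)
  \;+\; \lambda \Bigl( \lVert L_b \rVert_F + \sum_{i=1}^{m} \lVert L_i \rVert_F \Bigr),
\qquad M_i = L_i^{\top} L_i,\quad M_b = L_b^{\top} L_b .
```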
5 Optimization Problem
5.1 Objective Function
Based on the discussion in previous sections, with hinge loss and the regularization terms of