1 Introduction
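The Minkowski ($L_p$) metric is inarguably one of the most commonly used quantitative distance (dissimilarity) measures in scientific and engineering applications. The Minkowski distance between two vectors $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ in the $n$-dimensional Euclidean space, $\mathbb{R}^n$, is given by

$$L_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} \tag{1}$$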
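Three special cases of the $L_p$ metric are of particular interest, namely, $L_1$ (cityblock metric), $L_2$ (Euclidean metric), and $L_\infty$ (chessboard metric). Given the general form (1), $L_1$ and $L_2$ can be defined in a straightforward fashion, while $L_\infty$ is defined as

$$L_\infty(\mathbf{x}, \mathbf{y}) = \lim_{p \to \infty} L_p(\mathbf{x}, \mathbf{y}) = \max_{1 \le i \le n} |x_i - y_i|$$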
In many applications, the data space is Euclidean and therefore the $L_2$ metric is the natural choice. In addition, this metric has the advantage of being isotropic (rotation invariant). For example, when the input vectors stem from an isotropic vector field, e.g. a velocity field, the most appropriate choice is to use the $L_2$ metric so that all vectors are processed in the same way, regardless of their orientation Barni95.
The main drawback of $L_2$ is its high computational cost due to the multiplications and the square root operation. As a result, $L_1$ and $L_\infty$ are often used as alternatives. Although these metrics are computationally more efficient, they deviate from $L_2$ significantly. The Minkowski metric is translation invariant, i.e. $L_p(\mathbf{x}, \mathbf{y}) = L_p(\mathbf{x} + \mathbf{z}, \mathbf{y} + \mathbf{z})$ for all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n$, hence it suffices to consider $L_p(\mathbf{x} - \mathbf{y}, \mathbf{0})$, i.e. the distance from the point $\mathbf{x} - \mathbf{y}$ to the origin. Therefore, in the rest of the paper, we will consider approximations to the norm $\|\mathbf{x}\|_p = L_p(\mathbf{x}, \mathbf{0})$ rather than the distance $L_p(\mathbf{x}, \mathbf{y})$.
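The definitions above can be illustrated with a minimal Python sketch. The function names `minkowski_distance` and `p_norm` are illustrative choices, not part of the original paper; the sketch only restates the general form (1), its limiting case, and the translation-invariance argument that reduces distances to norms.

```python
import math

def minkowski_distance(x, y, p):
    """Minkowski (L_p) distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)          # chessboard metric (limit p -> infinity)
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(d ** p for d in diffs) ** (1.0 / p)

def p_norm(v, p):
    """L_p norm, i.e. the distance from v to the origin."""
    return minkowski_distance(v, [0.0] * len(v), p)

x, y = [1.0, 2.0], [4.0, 6.0]
difference = [a - b for a, b in zip(x, y)]   # x - y = (-3, -4)

cityblock = minkowski_distance(x, y, 1)          # 7.0
euclidean = minkowski_distance(x, y, 2)          # 5.0
chessboard = minkowski_distance(x, y, math.inf)  # 4.0
```

Because the metric is translation invariant, `p_norm(difference, p)` returns the same value as `minkowski_distance(x, y, p)` for every admissible `p`, which is why only norms are considered in the remainder.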
In this paper, we examine several approximations to the Euclidean norm. The rest of the paper is organized as follows. In Section 2 we describe the Euclidean norm approximations that have appeared in the literature, and compare their average and maximum errors using numerical simulations. We then show that all of these methods fit into a single mathematical formulation. In Section 3 we examine the simulation results from a theoretical perspective. Finally, in Section 4 we provide our conclusions.
2 Euclidean Norm Approximations
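For reasons explained in Sec. 1, we concentrate on approximations to the Euclidean norm $\|\cdot\|_2$ on $\mathbb{R}^n$. Let $\tilde{N}$, defined on $\mathbb{R}^n$, be an approximation to $\|\cdot\|_2$. We assume that $\tilde{N}$ is a continuous homogeneous function. We note that all variants of $\tilde{N}$ we consider in this paper satisfy these assumptions. As a measure of the quality of the approximation of $\tilde{N}$ to $\|\cdot\|_2$ we define the maximum relative error (MRE) as

$$\varepsilon_{\max} = \sup_{\mathbf{x} \in \mathbb{R}^n \setminus \{\mathbf{0}\}} \frac{\left| \tilde{N}(\mathbf{x}) - \|\mathbf{x}\|_2 \right|}{\|\mathbf{x}\|_2} \tag{2}$$

Using the homogeneity of $\tilde{N}$ and $\|\cdot\|_2$, (2) can be written as

$$\varepsilon_{\max} = \sup_{\mathbf{x} \in S_2^{n-1}} \left| \tilde{N}(\mathbf{x}) - 1 \right| \tag{3}$$

where

$$S_2^{n-1} = \{ \mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\|_2 = 1 \}$$

is the unit hypersphere of $\mathbb{R}^n$ with respect to the Euclidean norm. Furthermore, by the continuity of $\tilde{N}$ and the compactness of $S_2^{n-1}$, we can replace the supremum with maximum in (3) and write

$$\varepsilon_{\max} = \max_{\mathbf{x} \in S_2^{n-1}} \left| \tilde{N}(\mathbf{x}) - 1 \right| \tag{4}$$

We will use (4) as the definition of MRE throughout.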
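In the trivial case where $\tilde{N} = \|\cdot\|_2$ we have $\varepsilon_{\max} = 0$. Hence, for nontrivial cases we wish to have a small $\varepsilon_{\max}$ value. In other words, the smaller the value of $\varepsilon_{\max}$, the better (more accurate) the corresponding approximation $\tilde{N}$. It can be shown that $\|\cdot\|_1$ (cityblock norm) overestimates $\|\cdot\|_2$ and the corresponding MRE is given by $\sqrt{n} - 1$ Chaudhuri92. In contrast, $\|\cdot\|_\infty$ (chessboard norm) underestimates $\|\cdot\|_2$ with MRE given by $1 - 1/\sqrt{n}$ Chaudhuri92. More explicitly,

$$\frac{1}{\sqrt{n}}\,\|\mathbf{x}\|_1 \le \|\mathbf{x}\|_2 \le \|\mathbf{x}\|_1, \qquad \|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2 \le \sqrt{n}\,\|\mathbf{x}\|_\infty \tag{5}$$

for all $\mathbf{x} \in \mathbb{R}^n$. Therefore, it is natural to expect a suitable linear combination of $\|\mathbf{x}\|_1$ and $\|\mathbf{x}\|_\infty$ to give an approximation to $\|\mathbf{x}\|_2$ better than both $\|\mathbf{x}\|_1$ and $\|\mathbf{x}\|_\infty$ Rhodes95.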
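As a quick numerical illustration of these error levels, the following sketch evaluates the relative error of the cityblock and chessboard norms along the diagonal direction $(1, 1, \ldots, 1)$, where both are at their worst, and along a coordinate axis, where both are exact. The helper names are hypothetical and the dimension $n = 4$ is chosen only for the example.

```python
import math

def l1_norm(x):
    return sum(abs(v) for v in x)

def linf_norm(x):
    return max(abs(v) for v in x)

def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))

def relative_error(approx_norm, x):
    """Relative deviation of approx_norm(x) from the Euclidean norm of x."""
    return abs(approx_norm(x) - l2_norm(x)) / l2_norm(x)

n = 4
diagonal = [1.0] * n               # direction where both norms are worst
axis = [1.0] + [0.0] * (n - 1)     # direction where both norms are exact

l1_worst = relative_error(l1_norm, diagonal)      # equals sqrt(n) - 1
linf_worst = relative_error(linf_norm, diagonal)  # equals 1 - 1/sqrt(n)
```

Along the diagonal the cityblock norm overshoots by a factor of $\sqrt{n}$ while the chessboard norm undershoots by the same factor, which is what motivates combining the two.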
2.1 Chaudhuri et al.’s approximation
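Chaudhuri et al. Chaudhuri92 proposed the approximation

$$\|\mathbf{x}\|_\lambda = |x_{i_{\max}}| + \lambda \sum_{\substack{i=1 \\ i \ne i_{\max}}}^{n} |x_i|, \qquad \lambda = \frac{1}{n - \lfloor (n-2)/2 \rfloor}$$

(Unfortunately, the motivation behind this particular choice of $\lambda$ is not given in the paper.)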
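Here $i_{\max}$ is the index of the absolute largest component of $\mathbf{x}$, i.e. $i_{\max} = \arg\max_{1 \le i \le n} |x_i|$, and $\lfloor z \rfloor$ is the floor function, which returns the largest integer less than or equal to $z$. Since $|x_{i_{\max}}| = \|\mathbf{x}\|_\infty$, by adding and subtracting the term $\lambda \|\mathbf{x}\|_\infty$, Chaudhuri et al.'s norm $\|\mathbf{x}\|_\lambda$ can be written as a linear combination of $\|\mathbf{x}\|_\infty$ and $\|\mathbf{x}\|_1$ as

$$\|\mathbf{x}\|_\lambda = (1 - \lambda)\,\|\mathbf{x}\|_\infty + \lambda\,\|\mathbf{x}\|_1 \tag{6}$$

It is easy to see that $\|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_\lambda \le \|\mathbf{x}\|_1$ for all $\mathbf{x} \in \mathbb{R}^n$ since $0 < \lambda < 1$. It can also be shown Chaudhuri92 that, for sufficiently large $n$, $\|\mathbf{x}\|_\lambda$ is closer to $\|\mathbf{x}\|_2$ than both $\|\mathbf{x}\|_\infty$ and $\|\mathbf{x}\|_1$, i.e. $\left| \|\mathbf{x}\|_\lambda - \|\mathbf{x}\|_2 \right| \le \left| \|\mathbf{x}\|_\infty - \|\mathbf{x}\|_2 \right|$ and $\left| \|\mathbf{x}\|_\lambda - \|\mathbf{x}\|_2 \right| \le \left| \|\mathbf{x}\|_1 - \|\mathbf{x}\|_2 \right|$ for all $\mathbf{x} \in \mathbb{R}^n$.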
For sufficiently large $n$, Chaudhuri et al.'s norm underestimates $\|\cdot\|_2$; otherwise, it overestimates $\|\cdot\|_2$. Closed-form expressions for the MRE in both cases, together with their proofs, can be found in Chaudhuri92.
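The rewriting that leads to (6) can be checked numerically with a short sketch. Here the weight `lam` is treated as a free parameter (the value 0.5 below is arbitrary and used only for illustration), and both function names are hypothetical.

```python
def norm_largest_plus_weighted_rest(x, lam):
    """Largest absolute component plus lam times the sum of the others."""
    a = [abs(v) for v in x]
    i_max = max(range(len(a)), key=a.__getitem__)
    return a[i_max] + lam * sum(v for i, v in enumerate(a) if i != i_max)

def norm_linear_combination(x, lam):
    """Same quantity written as (1 - lam) * L_inf + lam * L_1, as in (6)."""
    a = [abs(v) for v in x]
    return (1.0 - lam) * max(a) + lam * sum(a)

x = [3.0, -4.0, 1.0]
lam = 0.5                      # illustrative value only
direct = norm_largest_plus_weighted_rest(x, lam)
combined = norm_linear_combination(x, lam)
```

For this vector both forms evaluate to 6.0, which lies between the chessboard norm (4.0) and the cityblock norm (8.0), as expected for a weight between 0 and 1.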
2.2 Rhodes’ approximations
Rhodes Rhodes95 reformulated (6) as a maximum of linear functions in which $\lambda$ is taken as a free parameter. He determined the optimal value for $\lambda$ by minimizing the MRE analytically. In particular, he showed that the optimal $\lambda$ values for Chaudhuri et al.’s norm can be determined by solving a quartic (fourth-order) equation in $\lambda$ on a prescribed interval; this equation can be solved using Ferrari’s method King08. It can be shown that this particular quartic equation has two real and two complex roots and the optimal $\lambda$ value is given by the smaller of the real roots. The corresponding closed-form MRE, referred to below as (7), is given in Rhodes95.
In the remainder of this paper, the single-parameter norm refers to this improved variant of Chaudhuri et al.’s norm.
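Rhodes also investigated the two-parameter family of approximations, referred to below as (8), in which $\|\mathbf{x}\|_2$ is approximated by a linear combination of $\|\mathbf{x}\|_\infty$ and $\|\mathbf{x}\|_1$ with two free parameters. He derived the optimal parameter values and the corresponding MRE, referred to below as (9), analytically; see Rhodes95 for the explicit expressions.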
Finally, Rhodes investigated a further family of approximations and derived its optimal solution and MRE analytically. This approximation will not be considered any further since its accuracy is inferior to even the single-parameter approximation.
It should be noted that Rhodes optimized the single-parameter and two-parameter norms over $\mathbb{Z}^n$. Therefore, these norms are in fact suboptimal on $\mathbb{R}^n$ (see §2.5).
2.3 Barni et al.’s approximation
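Barni et al. Barni95; Barni00 formulated a generic approximation for $\|\mathbf{x}\|_2$ as

$$\|\mathbf{x}\|_{\boldsymbol{\alpha},\delta} = \delta \sum_{i=1}^{n} \alpha_i\, x_{(i)}$$

where $x_{(i)}$ is the $i$th absolute largest component of $\mathbf{x}$, i.e. $(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ is a permutation of $(|x_1|, |x_2|, \ldots, |x_n|)$ such that $x_{(1)} \ge x_{(2)} \ge \cdots \ge x_{(n)}$. Here $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $\delta > 0$ are approximation parameters. Note that a nonincreasing ordering and strict positivity of the component weights, i.e. $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n > 0$, is a necessary and sufficient condition for $\|\cdot\|_{\boldsymbol{\alpha},\delta}$ to define a norm Barni00.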
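The minimization of (4) is equivalent to determining the weight vector $\boldsymbol{\alpha}$ and the scale factor $\delta$ that solve the following minimax problem Barni00; Demyanov90

$$\min_{\boldsymbol{\alpha},\,\delta}\ \max_{\mathbf{x} \in V} \left| \delta \sum_{i=1}^{n} \alpha_i x_i - 1 \right| \tag{10}$$

where $V = \{ \mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\|_2 = 1,\ x_1 \ge x_2 \ge \cdots \ge x_n \ge 0 \}$.
The optimal solution and its MRE are given by

$$\alpha_i^* = \sqrt{i} - \sqrt{i-1}, \qquad \delta^* = \frac{2}{1 + \sqrt{\sum_{i=1}^{n} (\alpha_i^*)^2}}, \qquad \varepsilon_{\max} = 1 - \delta^* \tag{11}$$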
It should be noted that a similar but less rigorous approach had been published earlier by Ohashi Ohashi94 .
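A minimal sketch of this rank-weighted norm is given below. It assumes the closed-form weights $\alpha_i = \sqrt{i} - \sqrt{i-1}$ and scale factor $\delta = 2 / (1 + \sqrt{\sum_i \alpha_i^2})$; this choice is an assumption of the sketch, adopted because it reproduces the theoretical maximum errors listed for this norm in Table 3 (e.g. 0.0396 for $n = 2$ and 0.1128 for $n = 10$). The function names are hypothetical.

```python
import math

def barni_parameters(n):
    """Assumed closed-form weights and scale factor for dimension n."""
    alpha = [math.sqrt(i) - math.sqrt(i - 1) for i in range(1, n + 1)]
    delta = 2.0 / (1.0 + math.sqrt(sum(a * a for a in alpha)))
    return alpha, delta

def barni_norm(x, alpha, delta):
    """Scaled weighted sum of the absolute components sorted in
    nonincreasing order (largest component gets the largest weight)."""
    ordered = sorted((abs(v) for v in x), reverse=True)
    return delta * sum(a * v for a, v in zip(alpha, ordered))

n = 3
alpha, delta = barni_parameters(n)
mre = 1.0 - delta                      # maximum relative error for this n

# Two unit vectors where the error is largest in magnitude:
axis = [1.0] + [0.0] * (n - 1)         # norm is underestimated by mre
scale = math.sqrt(sum(a * a for a in alpha))
aligned = [a / scale for a in alpha]   # norm is overestimated by mre

under = barni_norm(axis, alpha, delta)
over = barni_norm(aligned, alpha, delta)
```

The two extreme directions give errors of equal magnitude and opposite sign, which is the balancing property one expects from a minimax solution. Note that the sort makes the cost grow faster than linearly in $n$.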
2.4 Seol and Cheun’s approximation
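Seol and Cheun Seol08 recently proposed an approximation of the form

$$\|\mathbf{x}\|_{a,b} = a\,\|\mathbf{x}\|_\infty + b\,\|\mathbf{x}\|_1 \tag{12}$$

where $a$ and $b$ are strictly positive parameters to be determined by solving the following linear system, i.e. the normal equations of the least-squares fit of (12) to $\|\mathbf{x}\|_2$,

$$\begin{aligned} a\,E\!\left[\|\mathbf{x}\|_\infty^2\right] + b\,E\!\left[\|\mathbf{x}\|_\infty \|\mathbf{x}\|_1\right] &= E\!\left[\|\mathbf{x}\|_\infty \|\mathbf{x}\|_2\right] \\ a\,E\!\left[\|\mathbf{x}\|_\infty \|\mathbf{x}\|_1\right] + b\,E\!\left[\|\mathbf{x}\|_1^2\right] &= E\!\left[\|\mathbf{x}\|_1 \|\mathbf{x}\|_2\right] \end{aligned}$$

where $E[\cdot]$ is the expectation operator.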
Note that the formulation of (12) is similar to that of (8) in that they both approximate $\|\mathbf{x}\|_2$ by a linear combination of $\|\mathbf{x}\|_\infty$ and $\|\mathbf{x}\|_1$. These approximations differ in their methodologies for finding the optimal parameters. Rhodes follows an analytical approach and derives theoretical values for the parameters and the maximum error. However, he achieves this by sacrificing maximization over $\mathbb{R}^n$, and maximizes only over $\mathbb{Z}^n$. Seol and Cheun follow an empirical approach where they approximate optimal parameters over $\mathbb{R}^n$, which causes them to sacrifice the ability to obtain analytical values for the parameters and the maximum error. They estimate the optimal values of the two parameters using a fixed number of $n$-dimensional vectors whose components are independent and identically distributed standard Gaussian random variables.
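The empirical parameter estimation described above can be sketched as follows. This is not Seol and Cheun's implementation: it assumes the two parameters are the least-squares coefficients of the chessboard and cityblock norms (consistent with the mean-squared-error criterion mentioned in §2.5), replaces expectations with sample averages over Gaussian vectors, and solves the resulting 2×2 system by Cramer's rule. The function name, sample size, and seed are illustrative.

```python
import math
import random

def fit_linear_combination(n, num_samples, seed=0):
    """Least-squares fit of L2 ~ a * L_inf + b * L_1 on Gaussian vectors."""
    rng = random.Random(seed)
    s_ii = s_i1 = s_11 = s_i2 = s_12 = 0.0
    for _ in range(num_samples):
        a_abs = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]
        linf, l1 = max(a_abs), sum(a_abs)
        l2 = math.sqrt(sum(v * v for v in a_abs))
        s_ii += linf * linf      # sample moment for E[L_inf^2]
        s_i1 += linf * l1        # sample moment for E[L_inf * L_1]
        s_11 += l1 * l1          # sample moment for E[L_1^2]
        s_i2 += linf * l2        # sample moment for E[L_inf * L_2]
        s_12 += l1 * l2          # sample moment for E[L_1 * L_2]
    det = s_ii * s_11 - s_i1 * s_i1
    a = (s_i2 * s_11 - s_i1 * s_12) / det
    b = (s_ii * s_12 - s_i1 * s_i2) / det
    return a, b

a, b = fit_linear_combination(n=2, num_samples=20000, seed=1)
estimate = a * 4.0 + b * 7.0     # approximate norm of (3, 4); exact value is 5
```

Because the fit minimizes a squared error averaged over random directions, it says nothing directly about the worst-case direction, which is the distinction the comparison in §2.5 turns on.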
2.5 Comparison of the Euclidean norm approximations
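It is easy to see that all of the presented approximations fit into the general form

$$\tilde{N}(\mathbf{x}) = \sum_{i=1}^{n} w_i\, x_{(i)}$$

where $x_{(i)}$ denotes the $i$th absolute largest component of $\mathbf{x}$ and $w_i$ is the corresponding weight, which is a weighted cityblock ($L_1$) norm.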
The component weights for each approximation are given in Table 1. It can be seen that Barni et al.'s norm has the most elaborate design, in which each component is assigned a weight determined by its ranking. However, this weighting scheme also presents a drawback in that a full ordering of the component absolute values is required (see Table 2).
Table 1: Component weights $w_i$ of each approximate norm.
Due to their formulations, the MREs for Rhodes' single-parameter norm, Rhodes' two-parameter norm, and Barni et al.'s norm can be calculated analytically using (7), (9), and (11), respectively. In Figure 1 we plot the theoretical errors for these norms as a function of $n$. It can be seen that Barni et al.'s norm is not only more accurate than the other two norms, but it also scales significantly better. Although the two-parameter norm is more accurate than the single-parameter norm when $n$ is small, the difference between the two approximations becomes less significant as $n$ is increased.
The operation counts for each norm are given in Table 2 (ABS: absolute value, COMP: comparison, ADD: addition, MULT: multiplication, SQRT: square root). The following conclusions can be drawn:

Barni et al.'s norm has the highest computational cost among the approximate norms due to its costly weighting scheme, which requires sorting of $n$ numbers and $n$ multiplications. For small values of $n$, sorting can be performed most efficiently by a sorting network Cormen09. For large values of $n$, sorting requires $O(n \log n)$ comparisons, which is likely to exceed the cost of the square root operation Barni00. Therefore, in high-dimensional spaces, e.g. those considered in Seol08, Barni et al.'s norm provides no computational advantage over $\|\cdot\|_2$.

Rhodes' single-parameter norm has the lowest computational cost among the approximate norms. Rhodes' two-parameter norm and Seol and Cheun's norm have the same computational cost, which is slightly higher than that of the single-parameter norm.

A significant advantage of the single-parameter norm, the two-parameter norm, and Seol and Cheun's norm is that they require a fixed number of multiplications (1 or 2) regardless of the value of $n$.

The single-parameter norm, the two-parameter norm, and Seol and Cheun's norm can be used to approximate $\|\mathbf{x}\|_2^2$ (squared Euclidean norm) using an extra multiplication. On the other hand, the computational cost of Barni et al.'s norm is higher than that of $\|\mathbf{x}\|_2^2$ due to the extra absolute value and sorting operations involved.
Table 2: Operation counts for each norm (columns: Norm, ABS, COMP, ADD, MULT, SQRT).
In Table 3 we display the average and maximum errors for Rhodes' single-parameter norm, Rhodes' two-parameter norm, and Barni et al.'s norm for $n = 2, \ldots, 10$. Average relative error (ARE) is defined as

$$\varepsilon_{\mathrm{avg}} = \frac{1}{|S|} \sum_{\mathbf{x} \in S} \left| \tilde{N}(\mathbf{x}) - 1 \right| \tag{13}$$

where $\tilde{N}$ is the approximate norm, $S$ is a finite subset of the unit hypersphere $S_2^{n-1}$, and $|S|$ denotes the number of elements in $S$. An efficient way to pick a random point on $S_2^{n-1}$ is to generate $n$ independent Gaussian random variables $g_1, g_2, \ldots, g_n$ with zero mean and unit variance. The distribution of the unit vectors

$$\mathbf{x} = \frac{(g_1, g_2, \ldots, g_n)}{\sqrt{g_1^2 + g_2^2 + \cdots + g_n^2}}$$

will then be uniform over the surface of the hypersphere Muller59. For each approximate norm, the ARE and MRE values were calculated over an increasing number of points (that are uniformly distributed on the hypersphere) until the error values converge, i.e. the error values do not differ by more than a fixed tolerance in two consecutive iterations. Note that for each norm, two types of maximum error were considered: the empirical maximum error, which is calculated numerically over $S$, and the theoretical maximum error, which is calculated analytically using (7), (9), or (11). It can be seen that for Barni et al.'s norm, the empirical and theoretical maximum errors agree in all cases, which demonstrates the validity of the presented iterative error calculation scheme. This is not the case for the single-parameter and two-parameter norms since these norms are optimized over $\mathbb{Z}^n$ instead of $\mathbb{R}^n$. Therefore, a perfect agreement between the empirical and theoretical results should not be expected. Nevertheless, the empirical error never exceeds the theoretical maximum error, which is expected because we are maximizing over a smaller set.

Table 4 shows the average and maximum errors for Seol and Cheun's norm. The error values under the column “Seol & Cheun” are taken from Seol08 (where the simulations were performed on a fixed-size set of $n$-dimensional vectors whose components are independent and identically distributed, zero mean, and unit variance Gaussian random variables), whereas those under the column “This study” were obtained using the aforementioned iterative scheme. It can be seen that, for $n \ge 3$, the maximum errors obtained by Seol & Cheun are lower than those that we obtained and the discrepancy between the outcomes of the two error calculation schemes increases as $n$ is increased. The optimistic maximum error values given by Seol and Cheun are due to the fact that a fixed number of vectors is not enough to cover the surface of the hypersphere in higher dimensions. This is investigated further in the following section. On the other hand, the average error values agree perfectly in both calculation schemes.
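The iterative error calculation scheme can be sketched as follows. The doubling schedule, starting size, tolerance, cap on the number of points, and the reuse of earlier samples are assumptions of this sketch rather than details taken from the paper; the function names are hypothetical. The cityblock norm in two dimensions is used as a test case because its errors are known in closed form (MRE $\sqrt{2} - 1$ and mean error $4/\pi - 1$ on the unit circle).

```python
import math
import random

def random_unit_vector(n, rng):
    """Uniform point on the unit hypersphere via normalized Gaussians."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        length = math.sqrt(sum(v * v for v in g))
        if length > 0.0:                      # guard against the zero vector
            return [v / length for v in g]

def estimate_errors(approx_norm, n, start=1024, tol=1e-3,
                    max_points=2 ** 16, seed=0):
    """Return (ARE, empirical MRE) of approx_norm on the unit hypersphere."""
    rng = random.Random(seed)
    total = worst = 0.0
    count, target, previous = 0, start, None
    while True:
        while count < target:                 # grow the sample to `target`
            err = abs(approx_norm(random_unit_vector(n, rng)) - 1.0)
            total += err
            worst = max(worst, err)
            count += 1
        current = (total / count, worst)
        converged = (previous is not None
                     and abs(current[0] - previous[0]) <= tol
                     and abs(current[1] - previous[1]) <= tol)
        if converged or target >= max_points:
            return current
        previous, target = current, 2 * target

def l1_norm(x):
    return sum(abs(v) for v in x)

are, mre = estimate_errors(l1_norm, n=2)
```

In low dimensions the empirical maximum settles quickly; as the dimension grows, far more points are needed before the worst-case direction is sampled closely, which is the effect examined in Section 3.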
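Table 3: Average relative error (ARE), empirical maximum relative error (Emp. MRE), and theoretical maximum relative error (Theo. MRE) for Rhodes' single-parameter norm, Rhodes' two-parameter norm, and Barni et al.'s norm.

| $n$ | Single-parameter: ARE | Emp. MRE | Theo. MRE | Two-parameter: ARE | Emp. MRE | Theo. MRE | Barni et al.: ARE | Emp. MRE | Theo. MRE |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.0348 | 0.0551 | 0.0551 | 0.0276 | 0.0470 | 0.0470 | 0.0241 | 0.0396 | 0.0396 |
| 3 | 0.0431 | 0.0852 | 0.0852 | 0.0367 | 0.0778 | 0.0778 | 0.0300 | 0.0602 | 0.0602 |
| 4 | 0.0455 | 0.1074 | 0.1074 | 0.0420 | 0.1010 | 0.1010 | 0.0345 | 0.0739 | 0.0739 |
| 5 | 0.0460 | 0.1251 | 0.1251 | 0.0447 | 0.1197 | 0.1197 | 0.0377 | 0.0839 | 0.0839 |
| 6 | 0.0458 | 0.1400 | 0.1400 | 0.0462 | 0.1354 | 0.1354 | 0.0401 | 0.0919 | 0.0919 |
| 7 | 0.0454 | 0.1529 | 0.1529 | 0.0469 | 0.1489 | 0.1490 | 0.0418 | 0.0984 | 0.0984 |
| 8 | 0.0448 | 0.1641 | 0.1643 | 0.0471 | 0.1606 | 0.1609 | 0.0431 | 0.1039 | 0.1039 |
| 9 | 0.0442 | 0.1739 | 0.1745 | 0.0471 | 0.1709 | 0.1716 | 0.0440 | 0.1086 | 0.1086 |
| 10 | 0.0435 | 0.1827 | 0.1837 | 0.0469 | 0.1803 | 0.1812 | 0.0447 | 0.1128 | 0.1128 |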
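Table 4: Average relative error (ARE) and maximum relative error (MRE) for Seol and Cheun's norm.

| $n$ | Seol & Cheun: ARE | Seol & Cheun: MRE | This study: ARE | This study: MRE |
|---|---|---|---|---|
| 2 | 0.0200 | 0.0526 | 0.0200 | 0.0525 |
| 3 | 0.0239 | 0.0991 | 0.0239 | 0.0998 |
| 4 | 0.0257 | 0.1342 | 0.0257 | 0.1363 |
| 5 | 0.0268 | 0.1420 | 0.0268 | 0.1649 |
| 6 | 0.0273 | 0.1674 | 0.0273 | 0.1871 |
| 7 | 0.0276 | 0.1772 | 0.0276 | 0.1968 |
| 8 | 0.0277 | 0.1753 | 0.0277 | 0.2076 |
| 9 | 0.0277 | 0.1711 | 0.0277 | 0.2120 |
| 10 | 0.0276 | 0.1526 | 0.0276 | 0.2156 |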

With respect to maximum error, we can see that:

Barni et al.'s norm is the most accurate approximation in all cases. This is because this norm is designed to minimize the maximum error and it has a more sophisticated weighting scheme than the other two approximations that are based on the same optimality criterion, i.e. Rhodes' single-parameter and two-parameter norms.

As is also evident from Figure 1, the two-parameter norm is slightly more accurate than the single-parameter norm, especially for small values of $n$, in accordance with the greater degrees of freedom it is afforded.

Seol and Cheun's norm is the least accurate approximation except for $n = 2$. This was expected since this norm is designed to minimize the mean squared error rather than the maximum error.

As $n$ is increased, the error increases in all approximations. However, as can be seen from Figure 1, the error grows faster in some approximations than others.
On the other hand, with respect to average error we can see that:

As expected, Seol and Cheun's norm is the most accurate approximation.

As $n$ is increased, the error increases consistently for Barni et al.'s norm. This is not the case for Rhodes' single-parameter and two-parameter norms. This inconsistent average error behavior is not surprising given the fact that these norms are designed to minimize the maximum error.

Interestingly, the single-parameter norm is more accurate than the two-parameter norm for $n \ge 6$. A possible explanation for this phenomenon is that both approximations are optimized for the maximum error. Since the minimization of the maximum and average errors are conflicting objectives, it is likely that the two-parameter norm sacrifices the average error to obtain a better (lower) maximum error. The same relationship holds between the single-parameter norm and Barni et al.'s norm for $n = 10$.
3 Sampling on the Unit Hypersphere
In this section, we demonstrate why a fixed number of samples from the unit hypersphere (i.e. the approach advocated in Seol08) can give biased estimates for the maximum error. The basic reason behind this is the fact that a fixed number of samples fails to suffice as the dimension of the space increases. The following provides a plausibility argument as to why this is the case. To this end, we need to consider the notion of covering a sphere ‘sufficiently’. We begin with some definitions.
A closed ball of radius $r$ with respect to the Euclidean norm, denoted $B_2^n(r)$, is the set of points in $\mathbb{R}^n$ whose Euclidean norm is less than or equal to $r$. That is,

$$B_2^n(r) = \{ \mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\|_2 \le r \}$$

Note that, in particular, the unit hypersphere $S_2^{n-1}$ of $\mathbb{R}^n$ is the boundary of $B_2^n(1)$.
Given an $\epsilon > 0$, we say that a set of points $P$ on $S_2^{n-1}$ is an $\epsilon$-dense covering of $S_2^{n-1}$ if for any $\mathbf{x}$ in $S_2^{n-1}$, there exists at least one $\mathbf{y}$ in $P$ (different than $\mathbf{x}$) such that $\|\mathbf{x} - \mathbf{y}\|_2 \le \epsilon$. Essentially, our main purpose here is to give a rough estimate of the number of points in $P$, where $P$ is an $\epsilon$-dense covering of $S_2^{n-1}$. We would then argue that, if $\epsilon$ is sufficiently small, then $P$ is a fine-enough representation of points on $S_2^{n-1}$. Therefore, we can restrict any computation that needs to be performed on $S_2^{n-1}$ to the finite set $P$.
The basic idea behind the proof is to approximate $S_2^{n-1}$ by copies of $B_2^{n-1}(\epsilon)$, that is, approximate the unit hypersphere of $\mathbb{R}^n$ by balls of radius $\epsilon$. This is the same principle as approximating a circle in $\mathbb{R}^2$ by tiny line segments $B_2^1(\epsilon)$, or the surface of a sphere in $\mathbb{R}^3$ by tiny discs $B_2^2(\epsilon)$. It is easy to see that, if we choose $\epsilon$ small enough, then the approximation is satisfactory for most practical purposes.
To proceed further, we need a lemma from elementary probability theory, which is known as the coupon collector’s problem Mitzenmacher05.
Lemma 1.
Given a collection of $m$ distinct objects, the expected number of independent random trials needed to sample each one of the $m$ objects is $m H_m = m \ln m + O(m)$, where $H_m = \sum_{k=1}^{m} 1/k$ is the $m$th harmonic number.
We can now prove the following result.
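Theorem 2.
The expected number of uniformly distributed samples needed to generate an $\epsilon$-dense covering of $S_2^{n-1}$ is $m H_m = m \ln m + O(m)$ where

$$m = \frac{n \sqrt{\pi}\; \Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma\!\left(\frac{n}{2} + 1\right) \epsilon^{\,n-1}}$$

Proof.
Let $\epsilon > 0$ be given. We will first count the number of identical copies of balls $B_2^{n-1}(\epsilon)$ that are needed to approximate $S_2^{n-1}$ in the sense described above. By elementary calculus one can compute the volume of an $n$-dimensional ball $B_2^n(r)$ to be

$$\mathrm{Vol}\!\left(B_2^n(r)\right) = \frac{\pi^{n/2}}{\Gamma\!\left(\frac{n}{2} + 1\right)}\, r^n$$

where $\Gamma$ is the gamma function. The surface area of this ball is equal to the derivative with respect to $r$ of its volume:

$$\mathrm{Area}\!\left(B_2^n(r)\right) = \frac{d}{dr} \mathrm{Vol}\!\left(B_2^n(r)\right) = \frac{n\, \pi^{n/2}}{\Gamma\!\left(\frac{n}{2} + 1\right)}\, r^{n-1}$$

Note that $\mathrm{Area}(B_2^n(1))$ is equal to the surface area of $S_2^{n-1}$. The approximate number of balls needed to cover the surface of $S_2^{n-1}$ is the ratio of the surface area of $B_2^n(1)$ to the volume of $B_2^{n-1}(\epsilon)$, i.e.,

$$m = \frac{\mathrm{Area}\!\left(B_2^n(1)\right)}{\mathrm{Vol}\!\left(B_2^{n-1}(\epsilon)\right)} = \frac{n \sqrt{\pi}\; \Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma\!\left(\frac{n}{2} + 1\right) \epsilon^{\,n-1}}$$

The result now follows once we apply Lemma 1 with the $m$ balls playing the role of the objects. ∎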
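To give a feel for how quickly the required sample size grows with the dimension, the following sketch evaluates the patch count as the ratio of the surface area of the unit $n$-ball to the volume of an $(n-1)$-dimensional ball of radius $\epsilon$, which is an assumption about the exact form of the ratio used in the proof. The coupon-collector expectation is approximated by $m(\ln m + \gamma) + 1/2$ with $\gamma$ the Euler–Mascheroni constant. Function names and the choice $\epsilon = 0.1$ are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def num_patches(n, eps):
    """Surface area of the unit n-ball divided by the volume of an
    (n-1)-dimensional ball of radius eps, computed in log space."""
    log_m = (math.log(n) + 0.5 * math.log(math.pi)
             + math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0 + 1.0)
             - (n - 1) * math.log(eps))
    return math.exp(log_m)

def expected_samples(m):
    """Approximate coupon-collector expectation m * H_m."""
    return m * (math.log(m) + EULER_GAMMA) + 0.5

eps = 0.1
counts = {n: num_patches(n, eps) for n in (2, 3, 5, 10)}
needed = {n: expected_samples(m) for n, m in counts.items()}
```

For $n = 2$ the count is $\pi / \epsilon$ (a circle of circumference $2\pi$ split into segments of length $2\epsilon$) and for $n = 3$ it is $4 / \epsilon^2$; by $n = 10$ it already exceeds $10^9$ for $\epsilon = 0.1$.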
In the light of the following result, we see that the actual number of samples required does not deviate significantly from the value provided by Theorem 2.
Theorem 3.
Let $X$ be the number of samples observed before obtaining one in each region. Then, for any constant $c > 0$ we have

$$\Pr\left( X > m \ln m + c\,m \right) \le e^{-c}$$

Proof.
The probability of not obtaining the $i$th region after $m \ln m + c\,m$ steps is

$$\left( 1 - \frac{1}{m} \right)^{m (\ln m + c)} \le e^{-(\ln m + c)} = \frac{1}{e^{c}\, m}$$

By a union bound over the $m$ regions, the probability that some region has not been obtained after $m \ln m + c\,m$ steps is at most $e^{-c}$. ∎
Note that one can use a Chernoff bound to obtain an even tighter bound than that of Theorem 3; see Mitzenmacher05 for details.
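The coupon-collector behavior behind these results can be checked with a small simulation. The sketch assumes the tail bound takes the standard form $\Pr(X > m \ln m + cm) \le e^{-c}$ and that regions are hit uniformly at random; the values $m = 20$, $c = 1$, the number of trials, and the function names are illustrative choices.

```python
import math
import random

def draws_until_all_seen(m, rng):
    """Number of uniform draws from m regions until every region is hit."""
    seen, draws = set(), 0
    while len(seen) < m:
        seen.add(rng.randrange(m))
        draws += 1
    return draws

def harmonic(m):
    return sum(1.0 / k for k in range(1, m + 1))

m, trials, c = 20, 2000, 1.0
rng = random.Random(0)
samples = [draws_until_all_seen(m, rng) for _ in range(trials)]

mean_draws = sum(samples) / trials          # empirical mean of X
expected_draws = m * harmonic(m)            # m * H_m from the lemma
threshold = m * math.log(m) + c * m
tail_fraction = sum(s > threshold for s in samples) / trials
tail_bound = math.exp(-c)                   # union-bound estimate
```

The empirical mean should sit close to $m H_m$ and the observed tail fraction should stay below the union bound, which is not tight.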
We should note that in order to apply Lemma 1 the patches used to cover $S_2^{n-1}$ should be disjoint, which is clearly not the case since we have used the balls $B_2^{n-1}(\epsilon)$ for this purpose. This leads to an overestimate of the samples needed to obtain a dense covering, and thus the argument presented in this section is only a rough estimate. However, as empirically demonstrated in the previous section, a fixed number of samples as in Seol08 is definitely not sufficient either. To come up with a tight estimate of the number of sample points needed, one has to express $S_2^{n-1}$ as a disjoint union of small patches. The delicacy lies in the requirement that this has to be achieved through a constructive process in a way that the surface area of each patch can be explicitly computed as a function of the dimension $n$ and a characteristic measure $\epsilon$. To the best of the authors’ knowledge there is no systematic method in the literature to achieve this.
4 Conclusions
In this paper, we investigated the theoretical and practical aspects of several Euclidean norm approximations in the literature and showed that these are in fact special cases of the weighted cityblock norm. We evaluated the average and maximum errors of these norms using numerical simulations. Finally, we demonstrated that the maximum errors given in a recent study Seol08 are significantly optimistic.
The implementations of the approximate norms described in this paper will be made publicly available at http://www.lsus.edu/faculty/~ecelebi/research.htm.
5 Acknowledgments
This work was supported by a grant from the Louisiana Board of Regents (LEQSF200811RDA12). The authors are grateful to Changkyu Seol for clarifying various points about his paper.
References

 (1) D. Chaudhuri, C.A. Murthy, and B.B. Chaudhuri, “A Modified Metric to Compute Distance,” Pattern Recognition, vol. 25, no. 7, pp. 667–677, 1992.
 (2) F. Rhodes, “On the Metrics of Chaudhuri, Murthy and Chaudhuri,” Pattern Recognition, vol. 28, no. 5, pp. 745–752, 1995.
 (3) M. Barni, F. Bartolini, F. Buti, and V. Cappellini, “Optimum Linear Approximation of the Euclidean Norm to Speed up Vector Median Filtering,” Proc. of the 2nd IEEE Int. Conf. on Image Processing (ICIP’95), pp. 362–365, 1995.
 (4) C. Seol and K. Cheun, “A Low Complexity Euclidean Norm Approximation,” IEEE Trans. on Signal Processing, vol. 56, no. 4, pp. 1721–1726, 2008.
 (5) R.B. King, “Beyond the Quartic Equation,” Birkhäuser Boston, 2008.
 (6) M. Barni, F. Buti, F. Bartolini, and V. Cappellini, “A Quasi-Euclidean Norm to Speed up Vector Median Filtering,” IEEE Trans. on Image Processing, vol. 9, no. 10, pp. 1704–1709, 2000.
 (7) V.F. Dem’yanov and V.N. Malozemov, “Introduction to Minimax,” Dover Publications, 1990.
 (8) Y. Ohashi, “Fast Linear Approximations of Euclidean Distance in Higher Dimensions,” in P. Heckbert (Ed.), Graphics Gems IV, Academic Press, 1994.
 (9) T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, “Introduction to Algorithms,” The MIT Press, 2009.
 (10) M.E. Muller, “A Note on a Method for Generating Points Uniformly on N-Dimensional Spheres,” Communications of the ACM, vol. 2, no. 4, pp. 19–20, 1959.
 (11) M. Mitzenmacher and E. Upfal, “Probability and Computing: Randomized Algorithms and Probabilistic Analysis,” Cambridge University Press, 2005.