Minimum Enclosing Ball Revisited: Stability and Sub-linear Time Algorithms

04/08/2019
by   Hu Ding, et al.
USTC

In this paper, we revisit the Minimum Enclosing Ball (MEB) problem and its robust version, MEB with outliers, in Euclidean space R^d. Though the problem has been extensively studied before, most of the existing algorithms need at least linear time (in the number of input points n and the dimensionality d) to achieve a (1+ϵ)-approximation. Motivated by some recent developments on beyond worst-case analysis, we introduce the notion of stability for MEB (with outliers), which is natural and easy to understand. Under the stability assumption, we present two sampling algorithms for computing approximate MEB with sample complexities independent of the number of input points. Further, we achieve the first sub-linear time approximation algorithm for MEB with outliers. We also show that our idea can be extended to the general case of MEB with outliers ( i.e., without the stability assumption), and obtain a sub-linear time bi-criteria approximation algorithm. Our results can be viewed as a new step along the direction of beyond worst-case analysis.


1 Introduction

Given a set P of points in Euclidean space R^d, where the dimensionality d could be quite high, the problem of Minimum Enclosing Ball (MEB) is to find a ball with minimum radius to cover all the points in P [8, 42, 29]. MEB is a fundamental problem in computational geometry and finds applications in many fields such as machine learning and data mining. For example, one of the most popular classification models, Support Vector Machine (SVM), can be formulated as an MEB problem in high dimensional space, and fast MEB algorithms can be adopted to speed up its training procedure [58, 57, 20, 21]. Recently, MEB has also been used for preserving privacy in data analysis [49, 28].

In real-world applications, we often need to assume the presence of outliers in the given datasets. MEB with outliers is a natural generalization of the MEB problem, where the goal is to find the minimum ball covering at least a certain fraction (or number) of the input points; for example, the ball may be required to cover at least (1 − γ)n of the n input points, for some small γ ∈ (0, 1), and leave the remaining γn points as outliers. The existence of outliers makes the problem not only non-convex but also highly combinatorial; the high dimensionality of the problem further increases the challenge.

The MEB (with outliers) problem has been extensively studied before (a detailed discussion of previous works is given in Section 1.1). However, almost all existing algorithms need at least linear time (in terms of n and d) to obtain a (1+ϵ)-approximation. This is not quite ideal, especially for big data applications where the dataset could be so large that we cannot even afford to read it once. This motivates us to ask the following question: is it possible to develop approximation algorithms for MEB (with outliers) that run in sub-linear time in the input size? Designing sub-linear time algorithms has become a promising approach for handling many big data problems and has attracted a great deal of attention in the past decades [54, 22].

Our idea for designing sub-linear time MEB (with outliers) algorithms is inspired by some recent developments on optimization with respect to stable instances, under the umbrella of beyond worst-case analysis [53]. Many NP-hard optimization problems have been shown to be challenging even to approximate, yet admit efficient solutions in practice. Several recent works have tried to explain this phenomenon by introducing notions of stability for problems like clustering and max-cut [14, 15, 50, 6]. In this paper, we give a notion of “stability” for MEB. Roughly speaking, an instance of MEB is stable if the radius of the resulting ball cannot be significantly reduced by removing a small fraction of the input points (e.g., the radius cannot be reduced by a factor of β by removing an α-fraction of the points, for small α and β). The rationale behind this notion is quite natural: if the given instance is not stable, the small fraction of points causing the significant reduction in the radius should be viewed as outliers (or they may need to be covered by additional balls, as in k-center clustering [36, 33]). To the best of our knowledge, this is the first study of MEB (with outliers) from the perspective of stability.

We prove an important implication of the stability assumption that is useful not only for designing sub-linear time MEB (with outliers) algorithms, but also for handling incomplete datasets (Section 3). Using this implication, we propose two sampling algorithms for computing an approximate MEB with sample complexities independent of the input size (Section 4). The approximation ratios of both algorithms are functions of two parameters ϵ and β, where ϵ is a small error incurred in the computation and β is the parameter measuring the stability (the instance is more stable if β is smaller). We further extend the idea to obtain a sub-linear time algorithm for the MEB with outliers problem in Section 5. In Section 6, we consider the general case of MEB with outliers (i.e., without the stability assumption) and propose a sub-linear time bi-criteria approximation algorithm, where “bi-criteria” means that the ball is allowed to exclude slightly more points than the pre-specified number of outliers. Our results are the first sub-linear time approximation algorithms for MEB with outliers with sample sizes independent of both the number of points n and the dimensionality d, which significantly improves the time complexities of existing algorithms.

Note that if we arbitrarily select a point from the input dataset, it will be the center of a 2-approximate MEB by the triangle inequality. However, it is challenging to determine the radius of the ball in sub-linear time. In some applications, estimating only the position of the ball center may not be sufficient, and a ball covering all the given points is needed. In this paper, we aim to determine not only the center of the ball, but also its radius, in sub-linear time.
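To spell out this standard observation: let o be the center of MEB(P) and p an arbitrary point of P. For every q ∈ P, the triangle inequality gives ||q − p|| ≤ ||q − o|| + ||o − p|| ≤ 2·Rad(P), so the ball B(p, 2·Rad(P)) covers P; i.e., p is the center of a 2-approximate MEB. The difficulty is that the value Rad(P) itself is unknown.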

1.1 Related Works

The works most related to ours are [5, 21]. Alon et al. [5] studied the following property testing problem: given a set of points in some metric space, determine whether the instance is (k, b)-clusterable, where an instance is called (k, b)-clusterable if it can be covered by k balls with radius (or diameter) b. They proposed several sampling algorithms to answer the question “approximately”; in particular, they distinguish between the case that the instance is (k, b)-clusterable and the case that it is ϵ-far from being (k, b′)-clusterable, where ϵ ∈ (0, 1) and b′ ≥ b. Here “ϵ-far” means that more than ϵn points should be removed so that the instance becomes (k, b′)-clusterable. Although MEB is a special case of k-center clustering with k = 1, their method cannot yield a single-criterion approximation algorithm for MEB (with outliers), since the relaxation of “ϵ-far” introduces an unavoidable error on the number of covered points. However, it is possible to convert it into a bi-criteria approximation algorithm for MEB with outliers (as defined in Section 2), but its sample size would depend on the dimensionality d (a similar result was also presented in [37]). Our bi-criteria approximation algorithm presented in Section 6 has a sample size independent of both n and d. Note that Alon et al. [5] also showed another property testing algorithm with sample size independent of d, but, to the best of our knowledge, it is challenging to use it to solve the MEB with outliers problem.

Clarkson et al. [21] developed an elegant perceptron framework for solving several optimization problems arising in machine learning, such as MEB. For a set of n points in R^d represented as an n × d matrix with M non-zero entries, their framework can solve the MEB problem in Õ(n/ϵ² + d/ϵ) time, where the notation Õ(·) hides polylogarithmic factors. Note that the parameter “ϵ” there is an additive error (i.e., the resulting radius is r + ϵ if r is the radius of the optimal MEB), which can be converted into a relative error (i.e., (1+ϵ)·r) after an O(M)-time preprocessing. Thus, if M = o(nd), the running time is still sub-linear in the input size nd. Our algorithms have different sub-linear time complexities, which are independent of the number of input points.

MEB and MEB with outliers. A core-set [1] is a small set of points that approximates the structure/shape of a much larger point set, and thus can be used to significantly reduce the time complexities of many optimization problems (the reader is referred to a recent survey [52] for more details on core-sets). The core-set idea has also been used to approximate the MEB problem in high dimensional space [10, 42]. Bădoiu and Clarkson [8] showed that it is possible to find a core-set of size 2/ϵ that yields a (1+ϵ)-approximate MEB; later, they further proved that a core-set of size ⌈1/ϵ⌉ is in fact sufficient [9], but the construction is more complicated. The algorithm for computing the core-set of MEB is a Frank-Wolfe style algorithm [30], which has been systematically studied by Clarkson [20]. There are also several exact and approximation algorithms for MEB that do not rely on core-sets [29, 55, 4, 51]. Most of these algorithms have linear time complexities. Agarwal and Sharathkumar [2] presented a streaming ((1+√3)/2 + ϵ)-approximation algorithm for MEB; later, Chan and Pathak [17] proved that the same algorithm actually has an approximation ratio less than 1.22.

Bădoiu et al. [10] extended their core-set idea to the problem of MEB with outliers and achieved a bi-criteria approximation. Several algorithms for the MEB with outliers problem in low dimensions have also been developed [3, 26, 34, 44]. In addition, there are several existing works on k-center clustering with outliers [19, 45, 18] and on streaming MEB with outliers [60].

Optimizations under stability. Bilu and Linial [15] showed that the Max-Cut problem becomes easier if the given instance is stable with respect to perturbations on the edge weights. Ostrovsky et al. [50] proposed a separation condition for k-means clustering, which refers to the scenario where the clustering cost of k-means is significantly lower than that of (k−1)-means for the given instance, and demonstrated the effectiveness of the Lloyd heuristic [43] under this separation condition. Balcan et al. [14] introduced the concept of approximation-stability for finding the ground-truth of k-median and k-means clustering. Awasthi et al. [6] introduced another notion of clustering stability and gave a PTAS for k-median and k-means clustering. More algorithms for clustering problems under stability assumptions were studied in [7, 13, 12, 11, 41].

Sub-linear time algorithms. Indyk presented sub-linear time algorithms for several metric space problems, such as k-median clustering [38] and other clustering problems [39]. More sub-linear time clustering algorithms have been studied in [46, 47, 23]. Another important motivation for designing sub-linear time algorithms is property testing; for example, Goldreich et al. [32] focused on using a small sample to test several natural graph properties. More detailed discussions on sub-linear time algorithms can be found in the survey papers [54, 22].

2 Definitions and Preliminaries

In this paper, we let n denote the number of points of the given point set P in R^d, and ||x − y|| denote the Euclidean distance between two points x, y ∈ R^d. We use B(c, r) to denote the ball centered at a point c with radius r > 0. Below, we first give the definitions of MEB and the stability property.

Definition 1 (Minimum Enclosing Ball (MEB))

Given a set P of n points in R^d, the MEB problem is to find a ball with minimum radius to cover all the points in P. The resulting ball and its radius are denoted by MEB(P) and Rad(P), respectively.

A ball B(c, r) is called a λ-approximation of MEB(P) for some λ ≥ 1 if the ball covers all points in P and r ≤ λ·Rad(P).

Definition 2 ((α, β)-stable)

Given a set P of n points in R^d and two small parameters α and β in (0, 1), P is an (α, β)-stable instance if Rad(P′) > (1 − β)·Rad(P) for any P′ ⊂ P with |P′| > (1 − α)n.

Intuitively, the stability property indicates that Rad(P) cannot be significantly reduced by removing a small fraction of the points from P. For a fixed α, the smaller β is, the more stable P is. Our stability assumption is quite reasonable in practice: for example, if the radius of the MEB could be reduced considerably after removing only a small fraction of the points, it would be natural to view that small fraction of points as outliers instead. Another intuition for stability is given in Section 7, which suggests that if the distribution of P is dense enough, the instance tends to become more stable as n increases. Moreover, the stability property implies that the MEB of a stable instance is located stably in space, even if a small fraction of the points is missing (we prove this implication in Section 3).
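To make the notion concrete, the following is a minimal numpy sketch that empirically probes the stability of a dataset: it removes the α-fraction of points farthest from an (approximately computed) MEB center and reports how much the radius shrinks. All names here are ours, and both the Frank-Wolfe-style solver and the greedy removal rule are heuristic choices: removing the farthest points only exhibits one candidate subset, so the reported value is an empirical proxy for β rather than a certificate of stability.

```python
import numpy as np

def approx_meb(P, iters=200):
    """Frank-Wolfe style iterations for an approximate MEB of the rows of P:
    repeatedly step toward the current farthest point with step size 1/(i+1)."""
    c = P[0].astype(float)
    for i in range(1, iters + 1):
        far = P[np.argmax(np.linalg.norm(P - c, axis=1))]
        c += (far - c) / (i + 1)
    return c, np.linalg.norm(P - c, axis=1).max()

def stability_profile(P, alphas):
    """For each alpha, drop the alpha-fraction of points farthest from the
    approximate MEB center and report the relative radius reduction
    (an empirical proxy for beta in Definition 2)."""
    c, r = approx_meb(P)
    order = np.argsort(-np.linalg.norm(P - c, axis=1))  # farthest first
    for a in alphas:
        _, r2 = approx_meb(P[order[int(a * len(P)):]])
        print(f"alpha={a:.2f}: radius reduced by a factor of {1 - r2 / r:.3f}")

rng = np.random.default_rng(0)
stability_profile(rng.normal(size=(1000, 20)), alphas=[0.01, 0.05, 0.10])
```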

Definition 3 (MEB with Outliers)

Given a set P of n points in R^d and a small parameter γ ∈ (0, 1), the MEB with outliers problem is to find the smallest ball that covers (1 − γ)n points of P. Namely, the task is to find a subset of P of size (1 − γ)n whose MEB is the smallest among all possible choices of the subset. The obtained ball is denoted by MEB(P, γ).

For convenience, we use P^opt to denote the optimal subset of P with respect to MEB(P, γ); that is, P^opt = argmin{Rad(P′) | P′ ⊂ P, |P′| = (1 − γ)n}. From Definition 3, we can see that the main issue is to determine the subset of P. Actually, solving such combinatorial problems involving outliers is often challenging. For example, Mount et al. [48] showed that any nontrivial approximation for linear regression with outliers requires Ω(n^d) time in the worst case under the assumption of the hardness of affine degeneracy [27]; they therefore turned to finding an efficient bi-criteria approximation algorithm instead. Similarly, we also design a bi-criteria approximation for the general case of the MEB with outliers problem.

Definition 4 (Bi-criteria Approximation)

Given an instance (P, γ) of MEB with outliers and two small parameters δ, ϵ ∈ (0, 1), a (δ, ϵ)-approximation of MEB(P, γ) is a ball that covers at least (1 − δ)(1 − γ)n points of P and has radius at most (1 + ϵ)·Rad(P^opt).

When both δ and ϵ are small, the bi-criteria approximation is very close to the optimal solution, with only slight changes in the number of covered points and the radius.

We also extend the stability property of MEB to MEB with outliers.

Definition 5 ((α, β)-stable for MEB with Outliers)

Given an instance (P, γ) of the MEB with outliers problem in Definition 3, (P, γ) is an (α, β)-stable instance if Rad(P′) > (1 − β)·Rad(P^opt) for any P′ ⊂ P with |P′| > (1 − α)(1 − γ)n.

Definition 5 directly implies the following claim.

Claim 1

If (P, γ) is an (α, β)-stable instance of the MEB with outliers problem, then the corresponding P^opt is an (α, β)-stable instance of MEB.

To see the correctness of Claim 1, we can use contradiction. Suppose that there exists a subset P′ ⊂ P^opt such that |P′| > (1 − α)|P^opt| = (1 − α)(1 − γ)n and Rad(P′) ≤ (1 − β)·Rad(P^opt). Since P′ is also a subset of P, this contradicts the fact that (P, γ) is an (α, β)-stable instance of MEB with outliers.

2.1 A More Careful Analysis for Core-set Construction in [8]

Before presenting our main results, we first revisit the core-set construction algorithm for MEB by Bădoiu and Clarkson [8], since their method will be used in our algorithms for MEB (with outliers).

Let ϵ ∈ (0, 1). The algorithm of Bădoiu and Clarkson [8] yields an MEB core-set of size 2/ϵ (for convenience, we always assume that 2/ϵ is an integer). However, there is a small issue in their paper: the analysis assumes that the exact MEB of the core-set is computed in each iteration, whereas in practice one may only compute an approximate MEB. Thus, an immediate question is whether the quality is still guaranteed under this change. Kumar et al. [42] fixed this issue and showed that computing an approximate MEB of the core-set in each iteration still yields a core-set of size O(1/ϵ), though with a larger hidden constant. Increasing the core-set size from 2/ϵ to O(1/ϵ) is negligible in asymptotic analysis, but in Section 6 we will show that it could cause serious issues if outliers exist. Hence, a core-set of size close to 2/ϵ is still desirable. For this purpose, we provide a new analysis below.

For the sake of completeness, we first briefly introduce the idea of the core-set construction algorithm in [8]. Given a point set P, the algorithm is a simple iterative procedure. Initially, it selects an arbitrary point from P and places it into an initially empty set T. In each of the following iterations, the algorithm updates the center of MEB(T) and adds to T the point of P farthest from the current center of MEB(T). Finally, the center of MEB(T) induces a (1+ϵ)-approximation of MEB(P), and the selected set T is called the core-set of MEB. To ensure the expected improvement in each iteration, [8] showed that the following two inequalities hold if the algorithm always selects the point farthest from the current center of MEB(T):

(1)  r_{i+1} ≥ Rad(P) − L_i  and  r_{i+1} ≥ √(r_i² + L_i²),

where r_i and r_{i+1} are the radii of MEB(T) in the i-th and (i+1)-th iterations, respectively, and L_i is the shifting distance of the center of MEB(T) from the i-th to the (i+1)-th iteration.

Figure 1: An illustration of (2).

As mentioned earlier, we often compute only an approximate MEB(T) in each iteration. In the i-th iteration, we let c_i and c′_i denote the centers of the exact and the approximate MEB(T), respectively. Suppose that ||c_i − c′_i|| < ξ·r_i, where ξ ∈ (0, ϵ) (we will see why this bound is needed later). Note that we only compute c′_i rather than c_i in each iteration. As a consequence, we can only select the point (say q) farthest from c′_i. If ||q − c′_i|| ≤ (1 + ϵ)·r_i, we are done and a (1+ϵ)-approximation of MEB is already obtained. Otherwise, we have

(2)  ||q − c_i|| ≥ ||q − c′_i|| − ||c_i − c′_i|| > (1 + ϵ − ξ)·r_i

by the triangle inequality (see Figure 1). In other words, we should replace the first inequality of (1) by r_{i+1} ≥ (1 + ϵ − ξ)·r_i − L_i. Also, the second inequality of (1) still holds, since it depends only on the property of the exact MEB (see Lemma 2.1 in [8]). Thus, we have

(3)  r_{i+1} ≥ max{ √(r_i² + L_i²), (1 + ϵ − ξ)·r_i − L_i }.

This leads to the following theorem whose proof can be found in Section 8.

Theorem 2.1

In the core-set construction algorithm of [8], if one computes an approximate MEB for T in each iteration such that the resulting center c′_i has distance less than ξ·r_i to the exact center c_i for some ξ ∈ (0, ϵ), the final core-set size is bounded by a value z = O(1/ϵ). Moreover, the bound z can be made arbitrarily close to 2/ϵ when ξ is small enough.

Remark 1

We want to emphasize a simple observation about the above core-set construction procedure, which will be used in our algorithms and analysis later on. The algorithm always selects the point farthest from c′_i in each iteration. However, this is actually not necessary: as long as the selected point has distance at least (1 + ϵ)·r_i from c′_i, inequality (2) still holds and the subsequent analysis remains true. If no such point exists (i.e., every point of P has distance at most (1 + ϵ)·r_i from c′_i), a (1+ϵ)-approximate MEB (namely B(c′_i, (1 + ϵ)·r_i)) has already been obtained.
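The following sketch illustrates the construction together with the relaxed selection rule of Remark 1 (for clarity it still scans all of P to find a far point; the sub-linear time algorithms in Sections 4 and 5 replace this scan by random sampling). The helper approx_center stands in for any subroutine meeting the ξ-tolerance of Theorem 2.1; the iteration counts are illustrative.

```python
import numpy as np

def approx_center(T, iters=1000):
    """Approximate MEB center of the points in T (Frank-Wolfe iterations);
    stands in for a subroutine satisfying ||c'_i - c_i|| < xi * r_i."""
    c = T[0].astype(float)
    for i in range(1, iters + 1):
        far = T[np.argmax(np.linalg.norm(T - c, axis=1))]
        c += (far - c) / (i + 1)
    return c

def meb_coreset(P, eps=0.1):
    """Core-set construction in the style of [8]: grow T until no point of P
    is farther than (1+eps)*r_i from the current approximate center."""
    T = [P[0]]
    c = P[0].astype(float)
    for _ in range(int(np.ceil(2 / eps))):
        r = np.linalg.norm(np.array(T) - c, axis=1).max()  # radius of current T
        dists = np.linalg.norm(P - c, axis=1)
        far = int(np.argmax(dists))
        if dists[far] <= (1 + eps) * r:   # Remark 1: B(c, (1+eps)*r) covers P
            break
        T.append(P[far])                  # any point beyond (1+eps)*r would do
        c = approx_center(np.array(T))
    return np.array(T), c
```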

3 Implication of the Stability Property

Figure 2: We expand the ball B(c′, r′), and the larger ball is an approximate MEB of P.

In this section, we show an important implication of the stability property described in Definition 2.

Theorem 3.1

Let P be an (α, β)-stable instance of the MEB problem, and let o be the center of its MEB. Let δ ≥ 0, and let o′ be a given point in R^d. If a ball B(o′, r) covers at least (1 − α)n points from P and r ≤ (1 + δ)·Rad(P), the following holds:

(4)  ||o′ − o|| ≤ (√((1 + δ)² − (1 − β)²) + √(2β − β²))·Rad(P).

Theorem 3.1 indicates that if a ball covers a large enough subset of P and its radius is bounded, then its center must be close to the center of MEB(P). Furthermore, the more stable the instance is (i.e., the smaller β is), the closer the two centers are. Besides being the basis of our sub-linear time MEB algorithms, Theorem 3.1 is also useful in other practical scenarios. For example, if we miss up to αn points of P, we can compute a (1+ϵ)-approximate MEB of the remaining points; denote the obtained ball by B(c′, r′). Since this ball is a (1+ϵ)-approximate MEB of a subset of P containing at least (1 − α)n points, we have r′ ≤ (1 + ϵ)·Rad(P). Moreover, due to Definition 2, we know that r′ > (1 − β)·Rad(P). Together with Theorem 3.1 (applied with δ = ϵ), we have

(5)  ||c′ − o|| ≤ (√((1 + ϵ)² − (1 − β)²) + √(2β − β²))·Rad(P),

and the ball expanded by the factor

(6)  x = (1 + √((1 + ϵ)² − (1 − β)²) + √(2β − β²)) / (1 − β)

covers the whole P. That is, the ball B(c′, x·r′) is a λ-approximate MEB of P with λ = (1 + ϵ)·x (see Figure 2). Note that we cannot express the expansion directly in terms of Rad(P), since we do not know its value; this is why the factor x is applied to the known radius r′. Hence, even if some points are missing, we are still able to compute an approximate MEB of P through Theorem 3.1. However, this approach still reads all the remaining points and thus takes at least linear time. In Section 4, we present sub-linear time algorithms for this scenario.
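As a quick numerical illustration of the reconstructed bounds above: with β = ϵ = 0.01, we get √((1.01)² − (0.99)²) = 0.2 and √(2β − β²) ≈ 0.141, so x ≈ 1.35 and the expanded ball is roughly a 1.37-approximate MEB. The √β-type terms dominate, so the guarantee degrades gracefully, but not linearly, as the stability weakens.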

Now, we prove Theorem 3.1. Let P′ denote the set of points of P covered by the ball B(o′, r); so |P′| ≥ (1 − α)n. To bound the distance between o′ and o, we bridge them by the ball MEB(P′). Let c′ and r′ denote the center and radius of MEB(P′), respectively. The following two lemmas are key to the proof.

Lemma 1

The distance ||o′ − c′|| ≤ √((1 + δ)² − (1 − β)²)·Rad(P).

Proof

We consider two cases: MEB(P′) is totally covered by B(o′, r), and otherwise. For the first case (see Figure 7(a)), it is easy to see that

(7)  ||o′ − c′|| ≤ r − (1 − β)·Rad(P) ≤ (β + δ)·Rad(P) ≤ √((1 + δ)² − (1 − β)²)·Rad(P),

where the first inequality comes from the fact that MEB(P′) has radius at least (1 − β)·Rad(P) (Definition 2) and is contained in B(o′, r), and the second inequality comes from the fact that r ≤ (1 + δ)·Rad(P). Thus, we can focus on the second case below.

Figure 7: (a) the case MEB(P′) ⊆ B(o′, r); (b) an illustration of Claim 2; (c) the angle ∠ w c′ o′; (d) an illustration of Lemma 2.

Let w be any point located on the intersection of the two spheres bounding MEB(P′) and B(o′, r). Consequently, we have the following claim.

Claim 2

The angle ∠ w c′ o′ ≥ π/2.

Proof

Suppose that ∠ w c′ o′ < π/2. Note that ∠ w o′ c′ is also smaller than π/2, since ||w − c′|| = r′ ≤ r = ||w − o′|| (the ball B(o′, r) encloses P′, so r ≥ r′ by the minimality of MEB(P′)) implies ∠ w o′ c′ ≤ ∠ w c′ o′. Therefore, c′ and o′ are separated by the hyperplane H that is orthogonal to the segment c′o′ and passes through the point v, the projection of w onto the segment c′o′. See Figure 7(b).

Now we show that P′ can be covered by a ball smaller than MEB(P′). Let p1 (resp., p2) be the point collinear with c′ and o′ on the right side of the sphere of B(o′, r) (resp., the left side of the sphere of MEB(P′)); see Figure 7(b). Then, we have

(8)  ||v − p1|| = r − ||o′ − v|| ≤ √(r² − ||o′ − v||²) = ||v − w||.

Similarly, we have ||v − p2|| ≤ ||v − w||. Consequently, each of the two spherical caps of B(o′, r) ∩ MEB(P′) separated by H is covered by the ball B(v, ||v − w||). Further, because P′ is covered by both B(o′, r) and MEB(P′), P′ is covered by the ball B(v, ||v − w||), which is smaller than MEB(P′) since ||v − w|| = r′·sin(∠ w c′ o′) < r′. This contradicts the fact that MEB(P′) is the minimum enclosing ball of P′. Thus, the claim is true. ∎

Given Claim 2, we know that ||o′ − c′|| ≤ √(r² − r′²); see Figure 7(c). Moreover, Definition 2 implies that r′ > (1 − β)·Rad(P). Therefore, we have

(9)  ||o′ − c′|| ≤ √(r² − r′²) < √((1 + δ)² − (1 − β)²)·Rad(P),

which completes the proof of Lemma 1.

Lemma 2

The distance ||c′ − o|| ≤ √(2β − β²)·Rad(P).

Proof

Let H′ be the hyperplane orthogonal to the segment c′o and passing through the center c′. Suppose that o lies on the left side of H′. Then, there exists a point q ∈ P′ that lies on the right closed semi-sphere of MEB(P′) divided by H′ (this result was proved in [31, 10]; see Lemma 2.2 in [10]; for completeness, we also state the lemma in Section 9). See Figure 7(d). That is, the angle ∠ q c′ o ≥ π/2. As a consequence, we have

(10)  ||q − o||² ≥ ||q − c′||² + ||c′ − o||² = r′² + ||c′ − o||².

Moreover, since ||q − o|| ≤ Rad(P) and r′ > (1 − β)·Rad(P), (10) implies that ||c′ − o|| < √(2β − β²)·Rad(P). ∎

By the triangle inequality and Lemmas 1 and 2, we immediately have

(11)  ||o′ − o|| ≤ ||o′ − c′|| + ||c′ − o|| ≤ (√((1 + δ)² − (1 − β)²) + √(2β − β²))·Rad(P).

This completes the proof of Theorem 3.1.

4 Sub-linear Time Algorithms for MEB

Using Theorem 3.1, we present two different sub-linear time sampling algorithms for computing an approximate MEB. The first one is simpler, but its sample size depends on the dimensionality d; the second one has a sample size independent of both n and d.

4.1 The First Sampling Algorithm

Algorithm 1 is based on the theory of VC dimension and ϵ-nets [59, 35]. Roughly speaking, we compute an approximate MEB of a small random sample S and expand the ball slightly; then we prove that this expanded ball is an approximate MEB of the whole dataset. The key step is to show that the ball computed in Step 2 covers at least (1 − α)n points, so its center is close to the optimal center by Theorem 3.1. Due to the space limit, we leave the proof of Theorem 4.1 to Section 10.

0:  An (α, β)-stable instance P of the MEB problem in R^d; a small parameter ϵ ∈ (0, 1).
1:  Randomly select a set S of O((d/α)·log(1/α)) points from P.
2:  Apply any approximate MEB algorithm (such as the core-set based algorithm [8]) to compute a (1+ϵ)-approximate MEB of S, and let the resulting ball be B(c′, r′).
3:  Output the ball B(c′, x·r′), where x is the expansion factor in (6).
Algorithm 1: MEB Algorithm I
Theorem 4.1

With constant probability, Algorithm 1 returns a λ-approximate MEB of P, where

(12)  λ = (1 + ϵ)·(1 + √((1 + ϵ)² − (1 − β)²) + √(2β − β²)) / (1 − β).

The running time is independent of the number n of input points; it depends only on d, α, β, and ϵ.
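A sketch of Algorithm 1 in numpy follows, reusing approx_meb from the sketch after Definition 2. The sample-size constant and the expansion factor mirror the reconstruction of (6) and (12) above; both should be read as illustrative assumptions rather than the exact values from the proof of Theorem 4.1.

```python
import numpy as np

def meb_algorithm_i(P, alpha, beta, eps=0.05, seed=0):
    """Sketch of Algorithm 1: sample an eps-net-sized subset, compute an
    approximate MEB of the sample, and expand the resulting ball."""
    rng = np.random.default_rng(seed)
    n, d = P.shape
    m = min(n, int(np.ceil(10 * (d / alpha) * np.log(2 / alpha))))  # placeholder constant
    S = P[rng.choice(n, size=m, replace=False)]
    c, r = approx_meb(S)  # any (1+eps)-approximate MEB solver works here
    x = (1 + np.sqrt((1 + eps) ** 2 - (1 - beta) ** 2)
           + np.sqrt(2 * beta - beta ** 2)) / (1 - beta)
    return c, x * r       # the expanded ball B(c, x*r)
```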

4.2 The Second Sampling Algorithm

In this section, we present our second MEB algorithm, whose sample size is independent of both n and d. To better understand the algorithm, we briefly overview its high-level idea below.

Figure 8: An illustration of Lemma 3; the red points are the sampled set Q.

High-level idea: Recall our remark following Theorem 2.1 in Section 2.1. If we knew the value of Rad(P), we could perform almost the same core-set construction procedure described in Theorem 2.1 to achieve an approximate center of MEB(P), where the only difference is that, in each iteration, we add any point whose distance to the current center is large enough, rather than the farthest point; selecting the farthest point would inevitably take linear time. To implement this strategy in sub-linear time, we need to determine the value of Rad(P) first. Based on the stability property, we observe that the core-set construction procedure itself can serve as an “oracle” to help us guess the value of Rad(P) (see Algorithm 2). Let h > 0 be a candidate value; in each iteration we add a point with distance larger than (1 + ϵ)h to the current center. We prove that the procedure cannot continue for more than z iterations if h ≥ Rad(P), and will continue for more than z iterations with certain probability if h is too small, where z is the core-set size described in Theorem 2.1. Hence, we first use Lemma 3 to estimate the range of Rad(P), and then perform a binary search over that range to determine the value of Rad(P) approximately. Moreover, during the core-set construction, we add points to the core-set via random sampling rather than deterministically. As a consequence, using the stability property, we can prove that the whole complexity is independent of the input size n.

Lemma 3

Let P be an (α, β)-stable instance of the MEB problem. Given a parameter η ∈ (0, 1), one selects an arbitrary point p ∈ P and takes a random sample Q ⊂ P with |Q| = (1/α)·log(1/η). Let q be the point of Q farthest from p. Then, with probability 1 − η,

(13)  ||p − q||/2 ≤ Rad(P) < ||p − q||/(1 − β).
Proof

First, the lower bound on Rad(P) is obvious, since ||p − q|| is always no larger than 2·Rad(P). Next, we consider the upper bound. Let B(p, l) be the ball centered at p that covers exactly (1 − α)n points of P; thus l > (1 − β)·Rad(P) according to Definition 2. To proceed with our proof, we also need the following folklore lemma presented in [25].

Lemma 4

[25] Let N be a set of elements, and let N′ be a subset of N with size |N′| = τ·|N| for some τ ∈ (0, 1). If one randomly samples (1/τ)·ln(1/η) elements from N, then with probability at least 1 − η, the sample contains at least one element of N′, for any η ∈ (0, 1).

In Lemma 4, let N and N′ be the point set P and the subset P ∖ B(p, l) of the αn points farthest from p, respectively. Then Q contains at least one point of P ∖ B(p, l) with probability 1 − η according to Lemma 4; namely, Q contains at least one point outside B(p, l). See Figure 8. As a consequence, we have ||p − q|| ≥ l > (1 − β)·Rad(P), i.e., Rad(P) < ||p − q||/(1 − β). ∎
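The quantitative form of Lemma 4 follows from a one-line calculation: if the subset N′ contains a τ-fraction of N and we sample with replacement, then Pr[all m samples miss N′] = (1 − τ)^m ≤ e^{−τm}, which is at most η once m ≥ (1/τ)·ln(1/η). With τ = α, this matches the sample size |Q| = (1/α)·log(1/η) used in Lemma 3.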

Note that Lemma 3 directly implies the following result.

Theorem 4.2

In Lemma 3, the ball B(p, (2/(1 − β))·||p − q||) is a (4/(1 − β))-approximate MEB of P, with probability 1 − η.

Proof

From the upper bound in Lemma 3, we know that (2/(1 − β))·||p − q|| > 2·Rad(P); this implies that the ball covers the whole point set P. From the lower bound in Lemma 3, we know that (2/(1 − β))·||p − q|| ≤ (4/(1 − β))·Rad(P). Therefore, it is a (4/(1 − β))-approximate MEB of P. ∎
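In code, the whole estimator is a few lines; the sketch below returns both the interval for Rad(P) from Lemma 3 and the crude enclosing ball of Theorem 4.2. The sample size and the output radius follow the reconstruction given above and are illustrative.

```python
import numpy as np

def radius_range(P, alpha, beta, eta=0.05, seed=0):
    """Sketch of Lemma 3 / Theorem 4.2: estimate a range for Rad(P) from the
    farthest point of a small sample around a random anchor point p."""
    rng = np.random.default_rng(seed)
    p = P[rng.integers(len(P))]
    m = min(len(P), int(np.ceil((1 / alpha) * np.log(1 / eta))))
    Q = P[rng.choice(len(P), size=m, replace=False)]
    ell = np.linalg.norm(Q - p, axis=1).max()   # distance to farthest sample
    lo, hi = ell / 2, ell / (1 - beta)          # range for Rad(P), w.p. 1 - eta
    return p, 2 * ell / (1 - beta), (lo, hi)    # crude ball B(p, 2*ell/(1-beta))
```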

Since |Q| = (1/α)·log(1/η) in Lemma 3, Theorem 4.2 indicates that we can easily obtain a (4/(1 − β))-approximate MEB of P in O((d/α)·log(1/η)) time. We further present our second sampling algorithm (Algorithm 3), which achieves a lower approximation ratio. Algorithm 2 serves as a subroutine in Algorithm 3. In Algorithm 2, we simply set z to be the core-set size described in Theorem 2.1, and in Step 2(1) we compute an approximate center c′_i having distance less than ξ·r_i to the exact center of MEB(T).

0:  An (α, β)-stable instance P of the MEB problem in R^d; two small parameters ϵ and η in (0, 1); a candidate radius h > 0; and a positive integer z.
1:  Initially, arbitrarily select a point p ∈ P and let T = {p}.
2:  i = 1; repeat the following steps:
  1. Compute an approximate MEB of T and let the ball center be c′_i (with ||c′_i − c_i|| < ξ·r_i, as in Theorem 2.1).

  2. Randomly select a subset Q ⊂ P with |Q| = (1/α)·log(1/η).

  3. Select the point q ∈ Q that is farthest from c′_i, and add it to T.

  4. If ||q − c′_i|| ≤ (1 + ϵ)·h, stop the loop and output “yes”.

  5. i ← i + 1; if i > z, stop the loop and output “no”.

Algorithm 2: Oracle on a candidate radius h
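A sketch of the oracle follows, with the steps of Algorithm 2 marked in comments; approx_center is the helper from the core-set sketch in Section 2.1, and the sample size is the illustrative (1/α)·log(1/η) from Lemma 3.

```python
import numpy as np

def oracle(P, h, alpha, eps, eta, z, seed=0):
    """Sketch of Algorithm 2: test a candidate radius h. Returns ("yes"/"no",
    last center); "yes" means the relaxed core-set construction stalled,
    i.e., no sampled point lay farther than (1+eps)*h from the center."""
    rng = np.random.default_rng(seed)
    T = [P[rng.integers(len(P))]]                          # step 1
    m = min(len(P), int(np.ceil((1 / alpha) * np.log(1 / eta))))
    c = T[0].astype(float)
    for _ in range(z):                                     # step 2, at most z rounds
        c = approx_center(np.array(T))                     # step 2(1)
        Q = P[rng.choice(len(P), size=m, replace=False)]   # step 2(2)
        q = Q[np.argmax(np.linalg.norm(Q - c, axis=1))]    # step 2(3)
        if np.linalg.norm(q - c) <= (1 + eps) * h:         # step 2(4)
            return True, c                                 # "yes"
        T.append(q)
    return False, c                                        # step 2(5): "no"
```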
Lemma 5

If h ≥ Rad(P), Algorithm 2 returns “yes”; else if h < ((1 − β)/(1 + ϵ))·Rad(P), Algorithm 2 returns “no” with probability at least 1 − zη.

Proof

First, we assume that h ≥ Rad(P). Recall the remark following Theorem 2.1: as long as the loop proceeds, it adds to T a point with distance larger than (1 + ϵ)h ≥ (1 + ϵ)·r_i to c′_i (note that r_i ≤ Rad(P) ≤ h). Hence the loop 2(1)–2(5) cannot continue for more than z iterations, i.e., Algorithm 2 will return “yes”.

Now, we consider the case h < ((1 − β)/(1 + ϵ))·Rad(P). Similar to the proof of Lemma 3, we consider the ball B(c′_i, l_i) covering exactly (1 − α)n points of P. We know that l_i > (1 − β)·Rad(P) > (1 + ϵ)h according to Definition 2. Also, with probability 1 − η, Q contains at least one point outside B(c′_i, l_i) by Lemma 4. Taking the union bound over the at most z iterations, with probability 1 − zη, ||q − c′_i|| is always larger than (1 + ϵ)h, and Algorithm 2 will return “no”. ∎

0:  An (α, β)-stable instance P of the MEB problem in R^d; two small parameters ϵ and η in (0, 1) and a positive integer z; the interval [a, b] for Rad(P) obtained by Lemma 3.
1:  Among the set H = {(1 + ϵ)^j · a | 0 ≤ j ≤ ⌈log_{1+ϵ}(b/a)⌉}, perform a binary search for the value of Rad(P) by using Algorithm 2 with the given z.
2:  Suppose that Algorithm 2 returns “no” when h = h1 and returns “yes” when h = h2 = (1 + ϵ)·h1.
3:  Run Algorithm 2 again with h = h2; let c′ be the resulting ball center of T when the loop stops.
4:  Return the ball B(c′, r), where r is the radius (1 + ϵ)·h2 enlarged by the expansion factor specified in the proof of Theorem 4.3.
Algorithm 3: MEB Algorithm II
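The driver below sketches Algorithm 3 on top of radius_range (Lemma 3) and oracle (Algorithm 2). The binary search assumes the oracle's answer is monotone in h, which Lemma 5 guarantees only with high probability, and the final radius uses a simplified expansion in place of the exact factor from Theorem 4.3.

```python
import numpy as np

def meb_algorithm_ii(P, alpha, beta, eps, eta, z, seed=0):
    """Sketch of Algorithm 3: binary search over a geometric grid of candidate
    radii for the smallest "yes", then rerun the oracle to recover a center."""
    _, _, (lo, hi) = radius_range(P, alpha, beta, eta, seed)      # Lemma 3
    steps = int(np.ceil(np.log(hi / lo) / np.log(1 + eps)))
    grid = [lo * (1 + eps) ** j for j in range(steps + 1)]        # step 1
    left, right = 0, len(grid) - 1
    while left < right:
        mid = (left + right) // 2
        ok, _ = oracle(P, grid[mid], alpha, eps, eta, z, seed)
        if ok:
            right = mid       # "yes": try a smaller candidate
        else:
            left = mid + 1    # "no": candidate too small
    h2 = grid[left]
    _, c = oracle(P, h2, alpha, eps, eta, z, seed)                # step 3
    return c, (1 + eps) * h2                                      # step 4 (simplified)
```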
Theorem 4.3

With probability at least 1 − O(zη·log|H|), Algorithm 3 returns a λ-approximate MEB of P, where λ is the ratio given in

(14)

and tends to 1 as ϵ, ξ, and β tend to 0. The running time is independent of the input size n; it depends only on d, α, β, ϵ, ξ, and η.

Proof

Since Algorithm 2 returns “no” when h = h1 and returns “yes” when h = h2, we know from Lemma 5 that

(15)
(16)

The above inequalities together imply that

(17)

Thus, when running Algorithm 2 with h = h2 in Step 3, the algorithm returns “yes” (by the right-hand side of (17)). Then, consider the ball B(c′, (1 + ϵ)·h2). We claim that it covers at least (1 − α)n points of P. Otherwise, in Step 2(2) of Algorithm 2 the sample Q would contain at least one point outside the ball with probability 1 − η, i.e., the loop would continue, contradicting the fact that the algorithm returns “yes”. Let P′ be the set of points covered by the ball; then |P′| ≥ (1 − α)n. Moreover, the left-hand side of (17) indicates that

(18)

Now, we can apply Theorem 3.1, where the only difference is that we replace the “(1 − α)n” in the theorem by the size of P′. Let o be the center of MEB(P). Consequently, we have

(19)

For simplicity, denote by x and y the two bounds obtained in (18) and (19), respectively; both can be expressed in terms of ϵ, ξ, and β. From (19), we can bound the distance ||c′ − o||, and from the right-hand side of (17), we can bound h2 in terms of Rad(P). Thus, we have

(20)

where the hidden quantities depend only on ϵ, ξ, and β. Also, the radius satisfies

(21)

This means that the ball returned in Step 4 is a λ-approximate MEB of P, with λ as stated in Theorem 4.3.

As a subroutine, Algorithm 2 runs in time independent of n: each of its at most z iterations computes an approximate MEB of at most z points and scans a sample of size (1/α)·log(1/η). Algorithm 3 calls the subroutine O(log|H|) times for the binary search in Step 1, plus once in Step 3. Note that z = O(1/ϵ). Thus, the total running time is independent of n.

The success probability of each call of Algorithm 2 is at least 1 − zη by Lemma 5. Setting the sampling parameter appropriately in Step 1 and Step 3 and taking the union bound over all the calls, the success probability of Algorithm 3 is at least 1 − O(zη·log|H|). ∎

5 Sub-linear Time Algorithm for MEB with Outliers on Stable Instances

Key idea: Our result is an extension of Theorem 4.2, but it needs a more complicated analysis. A key step is to estimate the range of Rad(P^opt). In Lemma 3, we estimated the range of Rad(P) via a simple sampling procedure; however, that idea cannot be directly applied to the case with outliers, since the farthest sampled point could be an outlier. To address this issue, we imagine two balls centered at the sampled point p (recall that in the proof of Lemma 3 we considered only one ball, as in Figure 8) with two carefully chosen radii (see Figure 9). Intuitively, these two balls guarantee a gap large enough that at least one sampled point, say q′, falls in the ring between the two spheres. Moreover, together with the stability property described in Definition 5, we can show that ||p − q′|| provides a range for Rad(P^opt) in Lemma 6.

We would like to emphasize that this idea of building a ring with two balls is also a key technique for designing our sub-linear time algorithm for general instances in Section 6.
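As a rough illustration (with placeholder sample size and rank, since the exact values appear in Lemma 6), the sampling step can be coded as follows: the returned distance skips the sampled points that are likely to be outliers.

```python
import numpy as np

def ring_distance(P, gamma, alpha, eta=0.05, seed=0):
    """Sketch of the sampling step behind Lemma 6: pick a random anchor p,
    draw a small sample, and read off the distance of a far-but-not-farthest
    sampled point, whose rank skips roughly a gamma-fraction of outliers."""
    rng = np.random.default_rng(seed)
    p = P[rng.integers(len(P))]
    m = min(len(P), int(np.ceil((1 / min(alpha, gamma)) * np.log(1 / eta))))
    Q = P[rng.choice(len(P), size=m, replace=False)]
    dists = np.sort(np.linalg.norm(Q - p, axis=1))
    t = int(np.ceil(gamma * m)) + 1          # placeholder rank
    return p, dists[-t]                      # the t-th farthest sampled point
```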

Lemma 6

Let (P, γ) be an (α, β)-stable instance of MEB with outliers, and let p be a point randomly selected from P. Let Q be a random sample from P whose size depends on α, γ, and a confidence parameter η ∈ (0, 1). Then, if q′ is the t-th farthest point from p in Q for an appropriately chosen rank t (roughly a γ-fraction of |Q|), the following holds with probability at least 1 − O(η):

(22)
Proof
Figure 9: An illustration of Lemma 6.

First, we assume that p ∈ P^opt (note that this happens with probability 1 − γ). We consider two balls B_1 and B_2 centered at p such that

(23)