# Approximation Algorithms For The Dispersion Problems in a Metric Space

In this article, we consider the $c$-dispersion problem in a metric space $(X,d)$. Let $P=\{p_1, p_2, \ldots, p_n\}$ be a set of $n$ points in a metric space $(X,d)$. For each point $p \in P$ and $S \subseteq P$, we define $cost_c(p,S)$ as the sum of distances from $p$ to the nearest $c$ points in $S \setminus \{p\}$, where $c \geq 1$ is a fixed integer. We define $cost_c(S)=\min_{p \in S}\{cost_c(p,S)\}$ for $S \subseteq P$. In the $c$-dispersion problem, a set $P$ of $n$ points in a metric space $(X,d)$ and a positive integer $k \in [c+1,n]$ are given. The objective is to find a subset $S \subseteq P$ of size $k$ such that $cost_c(S)$ is maximized. We propose a simple polynomial time greedy algorithm that produces a $2c$-factor approximation result for the $c$-dispersion problem in a metric space. The best known approximation factor for the $c$-dispersion problem in the Euclidean metric space $(X,d)$ is $2c^2$, where $P \subseteq \mathbb{R}^2$ and the distance function is the Euclidean distance [Amano, K. and Nakano, S. I., Away from Rivals, CCCG, pp. 68-71, 2018]. We also prove that the $c$-dispersion problem in a metric space is $W[1]$-hard.


## 1 Introduction

In the facility location problem (FLP), we are given a set of $n$ locations on which some desired facilities are to be placed and a positive integer $k$. The goal is to place facilities on $k$ of the given $n$ locations such that a specific objective is satisfied. Here, the specific objective depends on the nature of the problem. Suppose that the objective is to place $k$ facilities on $k$ locations such that closeness of the chosen locations is undesirable. By closeness, we mean the distance between a pair of facilities. We refer to this FLP as a dispersion problem. More specifically, in the dispersion problem we wish to minimize the interference between the placed facilities. One of the most studied dispersion problems is the max-min dispersion problem.

In the max-min dispersion problem, we are given a set $P$ of $n$ locations, the non-negative distance $d(p,q)$ between each pair of locations $p,q \in P$, and a positive integer $k$ ($k \leq n$). Here, $k$ refers to the number of facilities to be opened, and distances are assumed to be symmetric. The objective is to find a size-$k$ subset $S \subseteq P$ of locations such that $cost(S)=\min\{d(p,q) \mid p,q \in S, p \neq q\}$ is maximized. This problem is known as the $1$-dispersion problem in the literature.

With reference to the above-mentioned problem, we extend the concept of closeness of a point from its single closest neighbor to a specified number of closest neighbors. To capture this notion of closeness of a point $p$, we add up the distances between $p$ and its $c$ nearest neighbors. We refer to this problem as the $c$-dispersion problem. We now define the $c$-dispersion problem in a metric space $(X,d)$ as follows:

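$c$-dispersion problem: Let $P=\{p_1, p_2, \ldots, p_n\}$ be a set of $n$ points in a metric space $(X,d)$. For each point $p \in P$ and $S \subseteq P$, we define $cost_c(p,S)$ as the sum of distances from $p$ to the first $c$ nearest points in $S \setminus \{p\}$. We also define $cost_c(S)=\min_{p \in S}\{cost_c(p,S)\}$ for each $S \subseteq P$. In the $c$-dispersion problem, a set $P$ of $n$ points in a metric space $(X,d)$ and a positive integer $k \in [c+1,n]$ are given. The objective is to find a subset $S \subseteq P$ of size $k$ such that $cost_c(S)$ is maximized.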
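The two cost functions in this definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `cost_c_point`, `cost_c` and `line_dist` are hypothetical, points are assumed to be distinct, and the example metric (points on the real line) is chosen purely for readability.

```python
def cost_c_point(p, S, d, c):
    """Sum of distances from p to its c nearest points in S minus {p}."""
    others = sorted(d(p, q) for q in S if q != p)
    if len(others) < c:
        raise ValueError("S must contain at least c points besides p")
    return sum(others[:c])


def cost_c(S, d, c):
    """Minimum of cost_c_point(p, S) over all points p of S."""
    return min(cost_c_point(p, S, d, c) for p in S)


# Hypothetical example metric: points on the real line, d(x, y) = |x - y|.
def line_dist(x, y):
    return abs(x - y)


S = [0, 1, 3, 7]
print(cost_c_point(0, S, line_dist, 2))  # 1 + 3 = 4
print(cost_c(S, line_dist, 2))           # 3, attained at the point 1
```

With $c=1$ the same function returns the minimum pairwise distance, i.e., the max-min dispersion objective.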

In the real world, the dispersion problem has a huge number of applications. The possibility of opening chain stores in a community has piqued our interest in the dispersion problem. We need to open stores that are far apart from each other to eliminate/prevent self-competition among the stores. Installing dangerous facilities, such as nuclear power plants and oil tanks, is another condition in which dispersion is a concern. These facilities must be dispersed to the greatest extent possible, so that an accident at one site does not affect others. The dispersion problem is often used in information retrieval when we try to find a small subset of data with some desired variety from a large data set so that the small subset can be used as a valid sample to overview the large data set.

## 2 Related Work

Shier (1977) studied the max-min dispersion problem on trees and linked the problem with the $p$-center problem. Erkut (1990) proved that the max-min dispersion problem is NP-hard even if the distance function satisfies the triangle inequality. In the geometric setting, the max-min dispersion problem was first introduced by Wang and Kuo (1988). They considered the problem in a $d$-dimensional space with the Euclidean distance between two points. They proposed a dynamic programming algorithm that solves the problem in $O(\max\{n \log n, kn\})$ time for $d=1$. They also proved that for $d=2$ the problem is NP-hard. White (1991) studied the max-min dispersion problem and proposed a $3$-factor approximation result. In 1994, Ravi et al. studied the max-min dispersion problem and proposed a $2$-factor approximation algorithm when the distance function satisfies the triangle inequality. Furthermore, they demonstrated that unless $P=NP$, the max-min dispersion problem cannot be approximated within a factor smaller than $2$, even if the distance function satisfies the triangle inequality. Recently, Akagi et al. (2018) gave an exact algorithm for the max-min dispersion problem by establishing a relationship with the maximum independent set problem. They proposed an $O(n^{\omega k/3} \log n)$ time algorithm, where $\omega < 2.373$ is the matrix multiplication exponent. Akagi et al. (2018) also studied two special cases of the max-min dispersion problem, where the set of points lies (1) on a line and (2) on a circle. They proposed a polynomial time exact algorithm for both cases.

The max-sum $k$-dispersion problem is another popular variant of the dispersion problem. Here, the objective is to maximize the sum of distances between $k$ facilities. Ravi et al. (1994) gave a polynomial time exact algorithm when the points are placed on a line. They also proposed a $4$-factor approximation algorithm if the distance function satisfies the triangle inequality, and a $(1.571+\epsilon)$-factor approximation algorithm for $2$-dimensional Euclidean space, where $\epsilon > 0$. Hassin et al. (1997) and Birnbaum and Goldman (2009) improved the approximation factor of $4$ to $2$. See Baur and Fekete (2001) and Chandra and Halldórsson (2001) for other variations of the dispersion problems.

In comparison with the max-min dispersion ($1$-dispersion) problem, only a handful of studies address the $c$-dispersion problem in a metric space $(X,d)$. In 2018, Amano and Nakano proposed a greedy algorithm for the Euclidean $2$-dispersion problem in $\mathbb{R}^2$, where the distance function between two points is the Euclidean distance. They showed that the proposed greedy algorithm produces an $8$-factor approximation result for the Euclidean $2$-dispersion problem in $\mathbb{R}^2$. In 2020, they analyzed the same greedy algorithm and showed that it produces a $4\sqrt{3}$ ($\approx 6.92$)-factor approximation result for the Euclidean $2$-dispersion problem. Amano and Nakano (2018) also proposed a $2c^2$-factor approximation result for the Euclidean $c$-dispersion problem in $\mathbb{R}^2$.

### 2.1 Our Contribution

In this article, we consider the $c$-dispersion problem in a metric space $(X,d)$ and propose a simple polynomial time $2c$-factor approximation algorithm for a fixed $c$. We also prove that the $c$-dispersion problem in a metric space is $W[1]$-hard.

### 2.2 Organization of the Paper

The remainder of the paper is organized as follows. In Section 3, we propose a $2c$-factor approximation algorithm for the $c$-dispersion problem in a metric space. In Section 4, we prove that the $c$-dispersion problem in a metric space is $W[1]$-hard. We conclude the paper in Section 5.

## 3 2c-Factor Approximation Algorithm for the c-Dispersion Problem in Metric Space

In this section, we propose a greedy algorithm for the $c$-dispersion problem in a metric space $(X,d)$. We will show that this algorithm guarantees a $2c$-factor approximation result for the $c$-dispersion problem in a metric space. Let $I=(P,k)$ be an arbitrary instance of the $c$-dispersion problem in a metric space $(X,d)$, where $P=\{p_1, p_2, \ldots, p_n\}$ is the set of $n$ points and $k \in [c+1,n]$ is a positive integer. The algorithm is iterative. Initially, we choose a subset $S_{c+1} \subseteq P$ of size $c+1$ such that $cost_c(S_{c+1})$ is maximized. Next, we add one point $p \in P \setminus S_{c+1}$ to $S_{c+1}$ to construct $S_{c+2}$, i.e., $S_{c+2}=S_{c+1} \cup \{p\}$, such that $cost_c(S_{c+2})$ is maximized, and we continue this process up to the construction of $S_k$. The pseudo code of the algorithm is described in Algorithm 1.
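The greedy procedure described above can be sketched in Python as follows. This is a minimal illustration, not a reproduction of Algorithm 1: the function names are hypothetical, points are assumed to be distinct, ties are broken by iteration order, and the example metric (points on the real line) is an assumption made only to keep the example short.

```python
from itertools import combinations


def cost_c_point(p, S, d, c):
    """Sum of distances from p to its c nearest points in S minus {p}."""
    return sum(sorted(d(p, q) for q in S if q != p)[:c])


def cost_c(S, d, c):
    """Minimum of cost_c_point(p, S) over all points p of S."""
    return min(cost_c_point(p, S, d, c) for p in S)


def greedy_c_dispersion(P, k, d, c):
    """Greedy sketch: best (c+1)-subset first, then add one point at a time."""
    if not (c + 1 <= k <= len(P)):
        raise ValueError("k must lie in [c + 1, n]")
    # Initial step: exhaustive search over all subsets of size c + 1.
    S = list(max(combinations(P, c + 1), key=lambda T: cost_c(T, d, c)))
    # Greedy step: add the point that maximizes the cost of the enlarged set.
    while len(S) < k:
        candidates = [p for p in P if p not in S]
        best = max(candidates, key=lambda p: cost_c(S + [p], d, c))
        S.append(best)
    return S


# Hypothetical example: points on the real line with d(x, y) = |x - y|.
def line_dist(x, y):
    return abs(x - y)


P = [0, 1, 2, 5, 9, 10]
S = greedy_c_dispersion(P, k=4, d=line_dist, c=2)
print(sorted(S), cost_c(S, line_dist, 2))
```

The exhaustive initial step is what makes the running time depend on $n^{c+1}$; the greedy loop only evaluates one candidate set per remaining point.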

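Let $S^* \subseteq P$ be an optimum solution for the $c$-dispersion problem, i.e., $cost_c(S^*)=\max_{S \subseteq P, |S|=k}\{cost_c(S)\}$. We define a ball $B(p)$ at each point $p \in P$ as follows: $B(p)=\{q \in X \mid d(p,q) \leq \frac{cost_c(S^*)}{2c}\}$. Let $B^*=\{B(p) \mid p \in S^*\}$. A point $q$ is properly contained in $B(p)$ if $d(p,q) < \frac{cost_c(S^*)}{2c}$, whereas if $d(p,q) \leq \frac{cost_c(S^*)}{2c}$, then we say that the point $q$ is contained in $B(p)$.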

###### Lemma 3.1.

For any point $p \in P$, $B(p)$ can properly contain at most $c$ points from the optimal set $S^*$.

###### Proof.

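On the contrary, assume that $B(p)$ properly contains $c+1$ points of $S^*$. Without loss of generality, assume that $p_1, p_2, \ldots, p_{c+1} \in S^*$ are properly contained in $B(p)$. This implies that each of $d(p,p_1), d(p,p_2), \ldots, d(p,p_{c+1})$ is less than $\frac{cost_c(S^*)}{2c}$. Since the distance function satisfies the triangle inequality, $d(p_1,p_j) \leq d(p_1,p)+d(p,p_j) < \frac{cost_c(S^*)}{c}$ for $j=2,3,\ldots,c+1$. This implies $cost_c(p_1,S^*) \leq \sum_{j=2}^{c+1} d(p_1,p_j) < cost_c(S^*)$. This contradicts $cost_c(p_1,S^*) \geq cost_c(S^*)$, which holds by the definition of $cost_c(S^*)$. Thus, the lemma. ∎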
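For reference, the chain of inequalities behind this proof can be written out as below. This is a sketch that assumes the ball radius is $\frac{cost_c(S^*)}{2c}$, as the $2c$ factor suggests, and that $p_1, \ldots, p_{c+1} \in S^*$ are the points assumed to lie properly inside $B(p)$.

```latex
\begin{align*}
d(p_1, p_j) &\le d(p_1, p) + d(p, p_j)
             < \frac{cost_c(S^*)}{2c} + \frac{cost_c(S^*)}{2c}
             = \frac{cost_c(S^*)}{c}, \qquad j = 2, \ldots, c+1, \\
cost_c(p_1, S^*) &\le \sum_{j=2}^{c+1} d(p_1, p_j)
             < c \cdot \frac{cost_c(S^*)}{c}
             = cost_c(S^*).
\end{align*}
```

The first inequality in the second line holds because the $c$ nearest points of $p_1$ in $S^* \setminus \{p_1\}$ have total distance no larger than that of any other $c$ points of $S^* \setminus \{p_1\}$.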

###### Lemma 3.2.

For any point $s \in P$, if $B' \subseteq B^*$ is the set of balls that properly contain $s$, then $|B'| \leq c$.

###### Proof.

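On the contrary, assume that $|B'| > c$. Without loss of generality, assume that $B(p_1), B(p_2), \ldots, B(p_{c+1})$ are $c+1$ balls that properly contain $s$. Here, $p_1, p_2, \ldots, p_{c+1} \in S^*$. Since $s$ is properly contained in each of these balls, each of $d(s,p_1), d(s,p_2), \ldots, d(s,p_{c+1})$ is less than $\frac{cost_c(S^*)}{2c}$. So, the ball $B(s)$ properly contains $c+1$ points of the optimal set $S^*$, which is a contradiction to Lemma 3.1. Thus, the lemma. ∎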

###### Lemma 3.3.

Let $M \subseteq P$ be a set of points such that $|M| < k$. If $cost_c(M) \geq \frac{cost_c(S^*)}{2c}$, then there exists at least one ball $B(p) \in B^*$ that properly contains fewer than $c$ points in $M$.

###### Proof.

On the contrary, assume that there does not exist any $p \in S^*$ such that $B(p)$ properly contains fewer than $c$ points in $M$. Construct a bipartite graph $G(M \cup B^*, E)$ as follows: (i) $M$ and $B^*$ are the two partite vertex sets, and (ii) for $u \in M$ and $B(p) \in B^*$, $(u,B(p)) \in E$ if and only if $u$ is properly contained in $B(p)$.

According to the assumption, each ball $B(p) \in B^*$ properly contains at least $c$ points in $M$. Therefore, the total degree of the vertices in $B^*$ in $G$ is at least $ck$. Note that $|B^*|=k$. On the other hand, the total degree of the vertices in $M$ in $G$ is at most $c|M|$ (see Lemma 3.2). Since $|M| < k$, the total degree of the vertices in $M$ in $G$ is less than $ck$. Both totals count the edges of $G$, so this contradicts the fact that the total degree of the vertices in $B^*$ in $G$ is at least $ck$. Thus, there exists at least one $p_j \in S^*$ such that the ball $B(p_j)$ properly contains at most $c-1$ points in $M$. ∎
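The double-counting step can be summarized in a single line. This is a sketch using the bipartite graph of the proof, where $\deg(\cdot)$ denotes the degree of a vertex in $G$ and $E$ its edge set (notation assumed here for illustration).

```latex
\[
ck \;\le\; \sum_{B(p) \in B^*} \deg\bigl(B(p)\bigr) \;=\; |E| \;=\; \sum_{u \in M} \deg(u) \;\le\; c\,|M| \;<\; ck .
\]
```

The left inequality uses the assumption that every ball properly contains at least $c$ points of $M$, the right one uses Lemma 3.2 together with $|M| < k$, and the resulting statement $ck < ck$ is the contradiction.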

###### Lemma 3.4.

The running time of Algorithm 1 is $O(n^{c+1})$.

###### Proof.

In line 1, the algorithm computes $S_{c+1}$ such that $cost_c(S_{c+1})$ is maximized. To do so, it calculates $cost_c(S)$ for each distinct subset $S \subseteq P$ of size $c+1$ independently. There are $\binom{n}{c+1}=O(n^{c+1})$ such subsets and the value of $c$ is fixed, so the algorithm spends $O(n^{c+1})$ time to compute $S_{c+1}$. Choosing a point in each subsequent iteration takes time polynomial in $n$, and the number of iterations is bounded by $k$, so constructing a set $S_k$ of size $k$ from $S_{c+1}$ also takes polynomial time. Since line 1 of Algorithm 1 takes a substantial amount of time compared to the other steps of the algorithm, the overall time complexity is $O(n^{c+1})$. ∎

###### Theorem 3.5.

Algorithm 1 produces a $2c$-factor approximation result in polynomial time for the $c$-dispersion problem.

###### Proof.

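Let $I=(P,k)$ be an arbitrary input instance of the $c$-dispersion problem in a metric space $(X,d)$, where $P=\{p_1, p_2, \ldots, p_n\}$ is the set of $n$ points and $k$ is a positive integer. Let $S_k$ and $S^*$ be an output of Algorithm 1 and an optimum solution, respectively, for the instance $I$. To prove the theorem, we show that $cost_c(S_k) \geq \frac{cost_c(S^*)}{2c}$. We use induction to show that $cost_c(S_i) \geq \frac{cost_c(S^*)}{2c}$ for each $i=c+1, c+2, \ldots, k$.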

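Since $S_{c+1}$ is an optimum solution for $c+1$ points (see line 1 of Algorithm 1), and any subset of $S^*$ of size $c+1$ has cost at least $cost_c(S^*)$, we have $cost_c(S_{c+1}) \geq cost_c(S^*) \geq \frac{cost_c(S^*)}{2c}$. Now, assume that the condition holds for each $i$ such that $c+1 \leq i < k$. We will prove that the condition holds for $i+1$ too.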

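Let $B^*=\{B(p) \mid p \in S^*\}$ be the set of balls corresponding to the points in $S^*$. Since $|S_i| < k$ and $S_i \subseteq P$ with the condition $cost_c(S_i) \geq \frac{cost_c(S^*)}{2c}$, there exists at least one ball $B(p_j) \in B^*$ that properly contains at most $c-1$ points in $S_i$ (see Lemma 3.3). Since $B(p_j)$ properly contains at most $c-1$ points of the set $S_i$, the distance from $p_j$ to its $c$-th closest point in $S_i$ is greater than or equal to $\frac{cost_c(S^*)}{2c}$. Hence, if we add the point $p_j$ to the set $S_i$ to construct the set $S_{i+1}$ (line 3 of the algorithm), then $cost_c(p_j,S_{i+1}) \geq \frac{cost_c(S^*)}{2c}$. Let $p \in S_i$ be an arbitrary point. To prove $cost_c(p,S_{i+1}) \geq \frac{cost_c(S^*)}{2c}$, we consider the following cases: (1) $p_j$ is not among the $c$ nearest points of $p$ in $S_{i+1}$, and (2) $p_j$ is one of the $c$ nearest points of $p$ in $S_{i+1}$. In case (1), by the definition of the set $S_i$, $cost_c(p,S_{i+1}) \geq \frac{cost_c(S^*)}{2c}$, and in case (2) there exists at least one point $q \in S_{i+1}$ such that $d(p,q) \geq \frac{cost_c(S^*)}{2c}$, and $q$ is one of the $c$ nearest points of $p$ (see Lemma 3.2). Therefore, the sum of the distances from $p$ to its $c$ nearest points in $S_{i+1}$ is at least $\frac{cost_c(S^*)}{2c}$. We can conclude that for the set $S_{i+1}=S_i \cup \{p_j\}$, $cost_c(S_{i+1}) \geq \frac{cost_c(S^*)}{2c}$.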

Since our algorithm chooses a point (see line 3 of Algorithm 1) that maximizes $cost_c(S_{i+1})$, it will always choose a point in iteration $i+1$ such that $cost_c(S_{i+1}) \geq \frac{cost_c(S^*)}{2c}$.

With the help of Lemma 3.1, Lemma 3.2 and Lemma 3.3, we have $cost_c(S_{i+1}) \geq \frac{cost_c(S^*)}{2c}$, and thus the condition holds for $i+1$ too. Also, Lemma 3.4 says that Algorithm 1 computes $S_k$ in polynomial time. Therefore, Algorithm 1 produces a $2c$-factor approximation result in polynomial time for the $c$-dispersion problem. ∎

## 4 c-Dispersion Problem is W[1]-hard

In this section, we discuss the hardness of the $c$-dispersion problem in a metric space in the realm of parameterized complexity. We prove that the $c$-dispersion problem in a metric space is $W[1]$-hard. We show a parameterized reduction from the $k$-independent set problem (known to be $W[1]$-hard; see Flum and Grohe, 2006) to the $c$-dispersion problem in a metric space $(X,d)$.

We define parameterized version of both the problems as follows.

$k$-Independent Set Problem
Instance: A graph $G=(V,E)$ and a positive integer $k$.
Parameter: $k$
Problem: Does there exist an independent set of size $k$ in $G$?

$c$-Dispersion Problem
Instance: A set $P$ of $n$ locations and a positive integer $k$.
Parameter: $k$
Problem: Given a bound $\beta$, does there exist a subset $S \subseteq P$ of size $k$ such that $cost_c(S)$ is at least $\beta$?

###### Theorem 4.1.

The $c$-dispersion problem in a metric space is $W[1]$-hard.

###### Proof.

We prove this by giving a parameterized reduction from the $k$-independent set problem in simple undirected graphs to the $c$-dispersion problem in a metric space $(X,d)$. We present a method to construct an instance of the $c$-dispersion problem from any instance of the $k$-independent set problem in polynomial time.

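Let $(G,k)$ be an arbitrary instance of the $k$-independent set problem. Here, $G=(V,E)$ with $V=\{v_1, v_2, \ldots, v_n\}$. We construct an instance $(P,k)$ of the $c$-dispersion problem from the given instance of the $k$-independent set problem. We use the set of vertices $V$ of $G$ as the set of locations $P$, i.e., $P=\{p_i \mid v_i \in V\}$ is a set of $n$ points. We define the distance between distinct points as follows: $d(p_i,p_j)=1$ if $(v_i,v_j) \in E$, and $d(p_i,p_j)=2$ otherwise. Note that this distance function satisfies the triangle inequality, since every distance between distinct points is either 1 or 2. The entire process of constructing an instance of the $c$-dispersion problem takes polynomial time.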
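The construction can be sketched in Python as follows. This is an illustrative sketch only: vertices are assumed to be numbered $0, \ldots, n-1$, the function names are hypothetical, and the distances 1 (adjacent) and 2 (non-adjacent) are the values assumed in the construction above.

```python
def reduction_distances(n, edges):
    """Distance matrix of the constructed instance: 1 across an edge, 2 otherwise."""
    edge_set = {frozenset(e) for e in edges}
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist[i][j] = 1 if frozenset((i, j)) in edge_set else 2
    return dist


def satisfies_triangle_inequality(dist):
    """Check d(i, j) <= d(i, k) + d(k, j) for all triples of indices."""
    n = len(dist)
    return all(
        dist[i][j] <= dist[i][k] + dist[k][j]
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )


# Hypothetical example graph: the path 0 - 1 - 2 - 3.
dist = reduction_distances(4, [(0, 1), (1, 2), (2, 3)])
print(dist[0][1], dist[0][2])               # 1 2
print(satisfies_triangle_inequality(dist))  # True
```

Because every distance between distinct points is 1 or 2, any detour through a third point costs at least 2, which is why the triangle inequality check always succeeds.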

Claim. $G$ has an independent set of size $k$ if and only if there exists a subset $S \subseteq P$ of size $k$ such that $cost_c(S)=2c$.

Necessity: Let $V' \subseteq V$ be an independent set of $G$ such that $|V'|=k$. We construct a set $S \subseteq P$ by selecting the points in $P$ corresponding to the vertices in $V'$, i.e., $S=\{p_i \mid v_i \in V'\}$. Since $V'$ is an independent set, by the construction of an instance of the $c$-dispersion problem from $G$, the distance between any two points in $S$ is $2$. This implies that for each $p \in S$, $cost_c(p,S)=2c$. Therefore, $cost_c(S)=2c$.

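Sufficiency: Suppose there exists a subset $S \subseteq P$ of size $k$ such that $cost_c(S)=2c$. Since $cost_c(S)=2c$, there exists a point $p \in S$ such that $cost_c(p,S)=2c$, and for all $q \in S$, $cost_c(q,S) \geq 2c$. If for a point $q \in S$ we had $cost_c(q,S) > 2c$, then by the pigeonhole principle the distance from $q$ to one of its $c$ nearest points in $S$ would be greater than 2, which is not possible by our construction of an instance of the $c$-dispersion problem. So, for all points $p \in S$, $cost_c(p,S)=2c$. Since no distance exceeds 2, each of the $c$ nearest points of $p$, and hence every other point of $S$, is at distance exactly 2 from $p$. Now, we can create a set $V' \subseteq V$ by selecting the vertices corresponding to the points in $S$, i.e., $V'=\{v_i \mid p_i \in S\}$. Since the distance between each pair of points in $S$ is $2$, no edge of $G$ joins two vertices of $V'$. Therefore, $V'$ is an independent set of size $k$.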

Since the $k$-independent set problem is $W[1]$-hard for the parameter $k$, using the above reduction we conclude that the $c$-dispersion problem in a metric space is also $W[1]$-hard for the same parameter $k$. Thus, the theorem. ∎
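The equivalence used in the reduction can be checked by brute force on a small graph. The sketch below assumes the distances 1 (adjacent) and 2 (non-adjacent) and the target value $2c$; the function names and the 5-cycle example are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def has_independent_set(n, edges, k):
    """Brute force: does the graph on vertices 0..n-1 have an independent k-set?"""
    edge_set = {frozenset(e) for e in edges}
    return any(
        all(frozenset(pair) not in edge_set for pair in combinations(T, 2))
        for T in combinations(range(n), k)
    )


def has_dispersed_subset(n, edges, k, c):
    """Brute force: is there a k-subset S with cost_c(S) = 2c under the 1/2 metric?"""
    if k < c + 1:
        raise ValueError("k must be at least c + 1")
    edge_set = {frozenset(e) for e in edges}

    def d(i, j):
        return 1 if frozenset((i, j)) in edge_set else 2

    def cost(S):
        return min(sum(sorted(d(p, q) for q in S if q != p)[:c]) for p in S)

    return any(cost(T) == 2 * c for T in combinations(range(n), k))


# Hypothetical example: the 5-cycle, whose largest independent set has size 2.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
for k, c in [(2, 1), (3, 1), (3, 2)]:
    print(k, c, has_independent_set(5, cycle, k), has_dispersed_subset(5, cycle, k, c))
```

On this example both answers agree for every tested pair $(k,c)$: they are both true for $k=2$ and both false for $k=3$.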

## 5 Conclusion

In this article, we studied the $c$-dispersion problem in a metric space. We presented a polynomial time $2c$-factor approximation algorithm for the $c$-dispersion problem in a metric space. The best known approximation factor available for the Euclidean $c$-dispersion problem in $\mathbb{R}^2$ is $2c^2$ (Amano and Nakano, 2018). For $c=1$, the proposed algorithm produces a $2$-factor approximation result, which is the same as the result of Ravi et al. (1994). Therefore, our proposed algorithm is a generalized version and provides a better approximation result for the problem. We also proved that the $c$-dispersion problem in a metric space is $W[1]$-hard.

## References

•  Akagi, Toshihiro and Araki, Tetsuya and Horiyama, Takashi and Nakano, Shin-ichi and Okamoto, Yoshio and Otachi, Yota and Saitoh, Toshiki and Uehara, Ryuhei and Uno, Takeaki and Wasa, Kunihiro. Exact algorithms for the max-min dispersion problem. International Workshop on Frontiers in Algorithmics, pp. 263–272, 2018.
•  Amano, Kazuyuki and Nakano, Shin-Ichi. Away from Rivals. CCCG, pp. 68–71, 2018.
•  Amano, Kazuyuki and Nakano, Shin-Ichi. An Approximation Algorithm for the 2-Dispersion Problem. IEICE Transactions on Information and Systems, 103(3): 506–508, 2020.
•  Baur, Christoph and Fekete, Sándor P. Approximation of geometric dispersion problems. Algorithmica, 30(3):451–470, 2001.
•  Birnbaum, Benjamin and Goldman, Kenneth J. An improved analysis for a greedy remote-clique algorithm using factor-revealing LPs. Algorithmica, 55(1):42–59, 2009.
•  Chandra, Barun and Halldórsson, Magnús M. Approximation algorithms for dispersion problems. Journal of algorithms, 38(2):438–465, 2001.
•  Erkut, Erhan. The discrete p-dispersion problem. European Journal of Operational Research, 46(1):48–60, 1990.
•  Flum, Jörg and Grohe, Martin. Parameterized complexity theory. Springer Science & Business Media, 2006.
•  Hassin, Refael and Rubinstein, Shlomi and Tamir, Arie. Approximation algorithms for maximum dispersion. Operations research letters, 21(3):133–137, 1997.
•  Shier, Douglas R. A min-max theorem for p-center problems on a tree. Transportation Science, 11(3):243–252, 1977.
•  Ravi, Sekharipuram S and Rosenkrantz, Daniel J and Tayi, Giri Kumar. Heuristic and special case algorithms for dispersion problems. Operations Research, 42(2):299–310, 1994.
•  Wang, DW and Kuo, Yue-Sun. A study on two geometric location problems. Information processing letters, 28(6):281–286, 1988.
•  White, Douglas J. The maximal-dispersion problem. IMA Journal of Mathematics Applied in Business and Industry, 3(2):131–140, 1991.