1.1 Background and Motivation
In the min-max $r$-gathering problem, we are given a metric space that contains a set of users and a set of facilities. We may open some of the facilities and assign each user to an open facility so that every open facility is assigned at least $r$ users. The objective is to minimize the maximum distance between a user and the facility to which it is assigned.
This problem has an application to shelter evacuation: there are people and evacuation shelters, and we want to divide the people among the shelters so that everyone can evacuate in the minimum possible time. Each shelter in use must receive at least $r$ people so that the evacuees can sustain themselves there. The problem also has an application to privacy protection. A set of clusters satisfies $r$-anonymity if each cluster has at least $r$ users; this condition prevents personal information from being reconstructed from the clustering.
Several tractability and intractability results are known. There is a polynomial-time 3-approximation algorithm for a general metric space, and no better approximation ratio can be achieved unless P=NP [6]. If the metric space is a line, we can solve the problem exactly by dynamic programming (DP) [4, 7, 9], where the fastest algorithm runs in linear time [5]. When the metric space is a spider, that is, a metric space constructed by joining half-lines at their endpoints, Ahmed et al. [3] proposed a fixed-parameter tractable algorithm whose parameters include the degree of the center. In our co-submitted paper [8], the authors showed that the problem is NP-hard on a spider and that it admits a fixed-parameter tractable algorithm.
1.2 Our Contribution
The goal of this study is to explore the boundary of tractability of the min-max $r$-gathering problem. Specifically, we consider the problem on a tree, a natural graph class that contains spiders as a subclass.
It is easy to see that the problem does not admit a fully polynomial-time approximation scheme (FPTAS) unless P=NP (see Proposition 2.1). Therefore, the best positive result that we can expect is a polynomial-time approximation scheme (PTAS). Our main contribution is to establish a PTAS for this problem: there exists an algorithm for the min-max $r$-gathering problem on a tree that, for any $\varepsilon > 0$, outputs a solution with an approximation ratio of $1 + \varepsilon$ in time polynomial in the input size for each fixed $\varepsilon$. The proposed algorithm seeks the optimal value by a binary search, and in each step, it solves the corresponding decision problem by a DP on a tree. The most difficult part is establishing an algorithm for the decision problem.
This technique can also be applied to other problems, for example, the $(r, \epsilon)$-gathering problem and the $r$-gathering problem with a constraint on the number of open facilities. The same reduction shows that these problems are NP-hard and do not admit an FPTAS unless P=NP. Thus, these results are also tight.
On the other hand, some variants of $r$-gathering can be solved exactly in polynomial time on a tree. We provide polynomial-time algorithms via DP for two problems: the min-sum $r$-gathering problem and min-max (and min-sum) $r$-gathering with the proximity requirement.
The rest of the paper is organized as follows. In Section 2, we give a PTAS for the min-max $r$-gathering problem on a tree. We also show the problems that admit essentially the same PTAS. In Subsection 3.1, we provide a polynomial-time algorithm that solves the min-sum version of the $r$-gathering problem exactly on a tree. Finally, in Subsection 3.2, we provide a polynomial-time algorithm that solves min-max (and min-sum) $r$-gathering with the proximity requirement exactly on a tree.
2 PTAS for min-max $r$-Gathering on Tree
A weighted tree $T = (V, E; l)$ is an undirected connected graph without cycles, where $V$ is the set of vertices, $E$ is the set of edges, and $l$ is the non-negative edge length. $T$ forms a metric space with the tree metric $d(u, v)$, which is the sum of the edge lengths on the unique simple $u$-$v$ path for any vertices $u, v \in V$. We consider the min-max $r$-gathering problem on this metric space.
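As an illustration of the tree metric, the following Python sketch computes the distance between two vertices by walking up to their lowest common ancestor. The parent-array representation and the names `parent`, `length`, and `tree_distance` are assumptions made for this example; they are not notation from the paper.

```python
from typing import Dict, List, Optional


def tree_distance(parent: List[Optional[int]], length: List[float],
                  u: int, v: int) -> float:
    """Sum of the edge lengths on the unique simple u-v path.

    Assumed representation: parent[x] is the parent of x (None for the
    root) and length[x] is the length of the edge from x to parent[x]
    (0 for the root).
    """
    # Record the distance from u to each of its ancestors, u included.
    dist_from_u: Dict[int, float] = {}
    x: Optional[int] = u
    d = 0.0
    while x is not None:
        dist_from_u[x] = d
        d += length[x]
        x = parent[x]
    # Climb from v until the first common ancestor is reached.
    y = v
    d = 0.0
    while y not in dist_from_u:
        d += length[y]
        y = parent[y]
    return d + dist_from_u[y]
```

This simple version takes time proportional to the depth of the tree per query, which is enough for checking small instances by hand.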
Without loss of generality, we assume that all users and facilities are located on different vertices; otherwise, we add new vertices connected by edges of length zero and separate the users and facilities onto the new vertices. By performing similar operations, we also assume that the tree is a full binary tree rooted at a special vertex (that is, we can turn it into a rooted tree in which every vertex has zero or two children). These operations increase the number of vertices (and edges) of the tree only by a constant factor, so they do not affect the time complexity of our algorithms. We denote the subtree rooted at a vertex $v$ by $T_v$.
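The conversion to a full binary tree can be sketched as follows. This is one possible construction under stated assumptions, not necessarily the one the authors have in mind: the tree is given as a hypothetical child-list dictionary, dummy vertices are named `("dummy", i)` and are assumed not to clash with existing vertex names, and all added edges have length zero so that distances between original vertices are unchanged.

```python
import itertools
from typing import Dict, Hashable, List, Tuple

Children = Dict[Hashable, List[Tuple[Hashable, float]]]


def make_full_binary(children: Children, root: Hashable) -> Children:
    """Return a tree in which every vertex has zero or two children."""
    result: Children = {}
    counter = itertools.count()
    stack = [root]
    while stack:
        v = stack.pop()
        kids = list(children.get(v, []))
        stack.extend(child for child, _ in kids)
        if not kids:
            result[v] = []
            continue
        if len(kids) == 1:
            # Pad a single child with a dummy leaf at distance zero.
            leaf = ("dummy", next(counter))
            result[leaf] = []
            kids.append((leaf, 0.0))
        cur = v
        while len(kids) > 2:
            # Peel off one child and push the rest below a dummy vertex.
            aux = ("dummy", next(counter))
            result[cur] = [kids.pop(0), (aux, 0.0)]
            cur = aux
        result[cur] = kids
    return result


def root_depths(children: Children, root: Hashable) -> Dict[Hashable, float]:
    """Distance from the root to every vertex."""
    depth = {root: 0.0}
    stack = [root]
    while stack:
        v = stack.pop()
        for child, w in children.get(v, []):
            depth[child] = depth[v] + w
            stack.append(child)
    return depth
```

In this construction a vertex with one child gains one dummy leaf and a vertex with $k > 2$ children gains $k - 2$ dummy vertices, so the total number of vertices at most doubles.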
2.1 Hardness of the Problem
We first see that the problem does not admit an FPTAS. This is a simple consequence of our co-submitted paper [8], which proves the NP-hardness of the problem on a spider.
Proposition 2.1. There is no FPTAS for the min-max $r$-gathering problem on a spider unless P=NP.
Proof. In [8], the authors proved that the min-max $r$-gathering problem is NP-hard even if the input is a spider, the edge lengths are integral, and the diameter of the spider is bounded by a polynomial in the input size. Let us take such an instance. If there were an FPTAS for the min-max $r$-gathering problem on a spider, then by taking $\varepsilon$ smaller than the reciprocal of this bound, we would get an optimal solution in polynomial time, because the optimal value is an integer no larger than the diameter. This contradicts the hardness. ∎
2.2 Algorithm. Part 1: Binary Search
In the following subsections, we develop a PTAS for the problem. We employ a standard practice for min-max problems: we guess the optimal value by binary search and, for each guess, solve the decision problem of whether there is a feasible solution whose objective value is at most the guessed value.
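First, we run Armon's 3-approximation algorithm [6] to obtain a value that is at least the optimal value and at most three times the optimal value. We then use the interval from one third of this value to the value itself as the range for the binary search. This step is needed to make the algorithm run in strongly polynomial time.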
For the binary search, we design the following oracle Solve$(I, b, \delta)$: given an instance $I$, a threshold $b$, and a positive number $\delta$, it reports YES if $\mathrm{OPT}(I) \le b$, and NO if $\mathrm{OPT}(I) > (1 + \delta) b$. If $b < \mathrm{OPT}(I) \le (1 + \delta) b$, then both answers are acceptable. Our oracle also outputs a corresponding solution as a certificate if it returns YES. Note that we cannot set $\delta = 0$, since the oracle would then solve the decision version of the min-max $r$-gathering problem, which is NP-hard on a tree [8].
If we have such an oracle, we can construct a PTAS as shown in Algorithm 1. The correctness of this algorithm is as follows.
Assume that there is a deterministic strongly polynomial-time oracle Solve as described above. Then, Algorithm 1 gives a solution to the min-max $r$-gathering problem whose cost is at most $(1 + \varepsilon)$ times the optimal value in strongly polynomial time.
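Proof. By the definition of Solve and of the algorithm, throughout the execution the oracle answers YES for the upper end of the current search range, and the optimal value is never smaller than the lower end of the range. Therefore, the solution returned for the final upper end has cost within the claimed bound, because the final range is narrow relative to the precision parameter passed to the oracle.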
The algorithm terminates in $O(\log(1/\varepsilon))$ steps because the gap is halved in each step. This completes the proof. ∎
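The following Python sketch shows one possible way to organize a binary search around an approximate oracle. It is not a transcription of Algorithm 1. The assumptions are: `upper_estimate` is a positive value between the optimal value and three times the optimal value; `solve(b, delta)` returns a solution of cost at most $(1 + \delta) b$ whenever the optimal value is at most $b$, and may return `None` only if the optimal value exceeds $b$; the choice $\delta = \varepsilon / 3$ is ours. For brevity a solution is represented only by its cost, and `make_toy_oracle` is a hypothetical stand-in used to exercise the search.

```python
from typing import Callable, Optional


def approximate_by_binary_search(
    upper_estimate: float,
    solve: Callable[[float, float], Optional[float]],
    eps: float,
) -> float:
    """Return the cost of a solution within a (1 + eps) factor of optimal."""
    delta = eps / 3.0
    lb, ub = upper_estimate / 3.0, upper_estimate
    best = solve(ub, delta)  # must succeed because ub is at least OPT
    if best is None:
        raise ValueError("oracle rejected a threshold that is at least OPT")
    # Invariant: OPT >= lb, and `best` has cost at most (1 + delta) * ub.
    while ub > (1.0 + delta) * lb:
        mid = (lb + ub) / 2.0
        candidate = solve(mid, delta)
        if candidate is None:
            lb = mid  # a NO answer certifies OPT > mid
        else:
            ub, best = mid, candidate
    return best


def make_toy_oracle(opt: float, lenient: bool):
    """Toy oracle for a hidden optimum; returns the worst cost it may."""
    def solve(b: float, delta: float) -> Optional[float]:
        limit = (1.0 + delta) * b
        accept = opt <= (limit if lenient else b)
        return limit if accept else None
    return solve
```

On termination the upper end is at most $(1 + \delta)$ times the lower end, so the returned cost is at most $(1 + \delta)^2$ times the optimal value, which is at most $1 + \varepsilon$ times the optimal value for $\varepsilon \le 3$. The gap is halved in each iteration, so the loop runs $O(\log(1/\varepsilon))$ times.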
2.3 Algorithm. Part 2: Rounding Distance
In this and the next subsections, we propose a DP algorithm for Solve. Our algorithm maintains “distance information” in the indices of the DP table. For this purpose, we round the distances so that all the vertices (and thus the users and facilities) are located at points whose distance from the root is a multiple of a positive number $t$, as follows.
For each edge $e = (u, v)$, where $u$ is closer to the root, we define the rounded length by $l'(e) = \lfloor d(\mathrm{root}, v)/t \rfloor - \lfloor d(\mathrm{root}, u)/t \rfloor$, where $d$ denotes the tree metric. Intuitively, this moves all the vertices “toward the root” and regularizes the edge lengths into integers. Then, we define the rounded distance $d'$ as the tree metric induced by the rounded lengths $l'$.
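This rounding process changes the optimal value only slightly.
Lemma. For any pair of vertices $u, v$, the rounded distance scaled back by the rounding unit differs from the original tree distance by less than twice the rounding unit. In particular, the same bound holds between the optimal values of the original and the rounded instances.
Proof. Let $w$ be the lowest common ancestor of $u$ and $v$. Then $w$ is on the $u$-$v$ path; thus both the original and the rounded distance between $u$ and $v$ decompose into the distance from $w$ to $u$ plus the distance from $w$ to $v$. Since rounding moves every vertex toward the root by less than one rounding unit, the scaled rounded distance between $w$ and its descendant $u$ differs from their original distance by less than one rounding unit. The same holds for $w$ and $v$ by symmetry. Adding the two bounds gives the first statement. Since the cost of the min-max $r$-gathering problem is the maximum length of some paths, the second statement follows from the first statement. ∎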
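The effect of the rounding can be checked numerically with the following Python sketch. It assumes that the rounded length of an edge is the difference of the floors of the root depths of its endpoints divided by the rounding unit `t`, which matches the description of moving vertices toward the root but is our reading rather than a quoted formula. The parent-array representation (vertex 0 is the root and `parent[v] < v`) and all function names are hypothetical.

```python
from typing import List, Optional


def root_depths(parent: List[Optional[int]], length: List[int]) -> List[int]:
    """Depth of every vertex; assumes vertex 0 is the root and parent[v] < v."""
    depth = [0] * len(parent)
    for v in range(1, len(parent)):
        depth[v] = depth[parent[v]] + length[v]
    return depth


def rounded_lengths(parent: List[Optional[int]], length: List[int],
                    t: int) -> List[int]:
    """Integer edge lengths obtained by rounding every depth down to a multiple of t."""
    rounded = [d // t for d in root_depths(parent, length)]
    return [0] + [rounded[v] - rounded[parent[v]] for v in range(1, len(parent))]


def lowest_common_ancestor(parent: List[Optional[int]], u: int, v: int) -> int:
    ancestors = set()
    x: Optional[int] = u
    while x is not None:
        ancestors.add(x)
        x = parent[x]
    y = v
    while y not in ancestors:
        y = parent[y]
    return y


def distance(parent: List[Optional[int]], length: List[int],
             u: int, v: int) -> int:
    """Tree distance computed from depths and the lowest common ancestor."""
    depth = root_depths(parent, length)
    w = lowest_common_ancestor(parent, u, v)
    return depth[u] + depth[v] - 2 * depth[w]
```

Under this assumed rounding, multiplying a rounded distance by `t` recovers the original distance up to an error smaller than `2 * t`, because each of the two endpoint depths and the depth of the common ancestor moves by less than `t`.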
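This lemma implies that an algorithm that determines whether the rounded instance has a solution with cost at most a suitably rounded threshold works as the oracle Solve, provided that the rounding unit is sufficiently small relative to the threshold and the precision parameter.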
2.4 Dynamic Programming
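Now we propose an algorithm to determine whether the rounded instance has a solution with cost at most a given threshold. Since all the edge lengths of the rounded tree are integral, without loss of generality, we replace the threshold by its integer part. An important observation is that this integral threshold is bounded by a constant that depends only on the precision parameter, since the rounding unit is chosen proportional to the original threshold.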
Our algorithm is a dynamic programming on a tree. For a vertex $v$ and arrays $P$ and $Q$ indexed by distance, we define a boolean value $\mathrm{DP}[v][P][Q]$. It is true if there is a way to
open some facilities in the subtree $T_v$ rooted at $v$, and
assign some users in $T_v$ to the opened facilities so that
for all $i$, there are $P_i$ unassigned users in $T_v$ who are at distance $i$ from $v$, and no other users are unassigned, and
for all $i$, we will assign $Q_i$ users outside $T_v$ who are at distance $i$ from $v$ to open facilities in $T_v$,
and false otherwise. The value at the root with $P$ and $Q$ both all-zero is the solution to the problem. The elements of $P$ and $Q$ are non-negative integers bounded by the number of users, and the lengths of the arrays are bounded by a constant; thus, the number of DP states remains polynomial in the size of the input.
The remaining task is to write down the transitions. For arrays $A$ and $B$, we denote by $A + B$ the element-wise addition, by $A - B$ the element-wise subtraction, and by $A \le B$ the element-wise inequality. For an integer $k$, shifting an array by $k$ produces the array shifted rightwards by $k$ positions if $k$ is positive and leftwards by $-k$ positions if $k$ is negative; the overflowed entries are discarded. Let $x$ and $y$ be the two children of $v$. We give a formula that calculates the DP value at $v$ from the DP values at its children, using the rounded lengths of the edges from $v$ to $x$ and from $v$ to $y$. The DP value at $v$ is true if and only if
there are arrays of integers whose lengths are such that
is if there is a user on and otherwise, and
the sum of all elements in is zero or at least if there is a facility on and zero otherwise, and
if is nonzero, the sum of indices of last nonzero elements of and are at most , and
and , and
for , for , for , for , and
The meanings of the auxiliary variables are as follows.
The -th entry of (resp. ) denotes the number of users in (resp. ) who are distant from by distance and assigned to the facility in (resp. ).
and decide whether we assign the user on to an open facility in or remain unassigned.
The -th entry of (resp. ) denotes the number of users in (resp. outside of ) who are assigned to the facility on and distant from by distance .
We can enumerate all the possibilities of the arrays in polynomial time. Thus, the total time complexity is polynomial. We can reconstruct the solution by storing which transition candidates are chosen, so we obtain the algorithm we wanted. This proves Theorem 1.2.
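The array operations used in the transitions above can be sketched in Python as follows. The function names are hypothetical, and filling the vacated positions of a shifted array with zeros is an assumption; the text only states that overflowed entries are discarded. Presumably, a shift by the rounded length of an edge re-indexes users counted by their distance from a child so that they are counted by their distance from the parent.

```python
from typing import List


def shift(arr: List[int], k: int) -> List[int]:
    """Shift right by k (left by -k if k < 0), keeping the length fixed.

    Entries pushed past either end are discarded; vacated positions are
    filled with zeros (an assumption of this sketch).
    """
    n = len(arr)
    if k >= 0:
        pad = min(k, n)
        return [0] * pad + arr[:n - pad]
    pad = min(-k, n)
    return arr[pad:] + [0] * pad


def add(a: List[int], b: List[int]) -> List[int]:
    """Element-wise addition of two arrays of equal length."""
    return [x + y for x, y in zip(a, b)]


def sub(a: List[int], b: List[int]) -> List[int]:
    """Element-wise subtraction of two arrays of equal length."""
    return [x - y for x, y in zip(a, b)]


def leq(a: List[int], b: List[int]) -> bool:
    """Element-wise inequality: true if a[i] <= b[i] for every index i."""
    return all(x <= y for x, y in zip(a, b))
```

For example, `shift([1, 2, 3, 4], 1)` gives `[0, 1, 2, 3]`: the entry that would land beyond the last index is dropped.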
Our technique can be used for other variants of the $r$-gathering problem. In the $(r, \epsilon)$-gathering problem, we are allowed to leave at most an $\epsilon$ fraction of the users unassigned. We can construct an algorithm to solve it just by adding the number of ignored users in the subtree rooted at $v$ to the DP state of vertex $v$. Note that this problem is also NP-hard and does not admit an FPTAS, because we can convert an $r$-gathering instance to an equivalent $(r, \epsilon)$-gathering instance just by adding the proper number of users at sufficiently far points.
We can handle the constraint on the number of open facilities just by adding the number of open facilities in the subtree rooted at $v$ to the DP state of vertex $v$. Note that this problem is also NP-hard and does not admit an FPTAS, because in the gadget construction described in our other paper [8], we only have to decide whether there is a solution whose number of clusters is determined by the number of “long legs” of the spider.
Here we give a theorem to conclude this subsection.
Both min-max $(r, \epsilon)$-gathering and min-max $r$-gathering with a constraint on the number of open facilities admit strongly polynomial-time approximation schemes on a tree.
We can also straightforwardly combine these additional states to solve combined problems.
3 Polynomial-Time Algorithms for other variants
In contrast to min-max $r$-gathering, some variants can be solved in polynomial time on a tree. In this section, we introduce them.
3.1 min-sum $r$-Gathering and Lower Bounded Facility Location Problem
Now we consider another objective function: min-sum instead of min-max. We can also introduce an opening cost for each facility: the total cost is the sum of the distances between users and their assigned facilities, plus the sum of the opening costs over all open facilities. In this setting, the problem is the so-called lower bounded facility location problem. For the general metric case, a constant-factor approximation algorithm was given by Svitkina [10]. Later, the approximation ratio was improved by Ahmadian and Swamy [2].
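To make the min-sum objective concrete, the following Python sketch evaluates the total cost of a given assignment and finds the optimum of a tiny instance by exhaustive search. It is a reference for checking small examples only, not the polynomial-time algorithm of this section. The dictionary-based input format and the names `total_cost` and `brute_force_optimum` are assumptions made for illustration; a facility is treated as open exactly when at least one user is assigned to it.

```python
from collections import Counter
from itertools import product
from typing import Dict, Hashable, Optional

Dist = Dict[Hashable, Dict[Hashable, float]]  # dist[user][facility]


def total_cost(assignment: Dict[Hashable, Hashable], dist: Dist,
               opening_cost: Dict[Hashable, float], r: int) -> Optional[float]:
    """Connection cost plus opening cost, or None if a facility has fewer than r users."""
    load = Counter(assignment.values())
    if any(count < r for count in load.values()):
        return None
    connection = sum(dist[u][f] for u, f in assignment.items())
    opening = sum(opening_cost[f] for f in load)
    return connection + opening


def brute_force_optimum(dist: Dist, opening_cost: Dict[Hashable, float],
                        r: int) -> Optional[float]:
    """Minimum total cost over all assignments (exponential time)."""
    users = list(dist)
    facilities = list(opening_cost)
    best: Optional[float] = None
    for choice in product(facilities, repeat=len(users)):
        cost = total_cost(dict(zip(users, choice)), dist, opening_cost, r)
        if cost is not None and (best is None or cost < best):
            best = cost
    return best
```

Setting every opening cost to zero gives the plain min-sum $r$-gathering objective.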
Unlike the min-max case, we can solve this problem exactly on a tree in polynomial time. For each vertex $v$ and an integer $k$, which may be negative, let us define the DP value for $(v, k)$ as the minimum total cost in the following situation.
If $k \ge 0$, all but $k$ users in the subtree $T_v$ rooted at $v$ are assigned to facilities in $T_v$, all open facilities in $T_v$ have at least $r$ users, and we will assign the remaining $k$ users to facilities outside $T_v$. In other words, $k$ users go upwards from $v$, and no users go downwards to $v$.
Otherwise, all users in $T_v$ are assigned to facilities in $T_v$, and we will assign $-k$ additional users outside $T_v$ to the facilities in $T_v$. In other words, $-k$ users go downwards to $v$, and no users go upwards from $v$.
We want the DP value at the root with $k = 0$. The following observation ensures that we can get an optimal solution by calculating the DP values in a bottom-up way.
There is an optimal solution in which, for each edge, all users who pass through the edge on the way to their assigned facilities pass through it in the same direction.
Proof. Assume that users $u_1, u_2$ go to facilities $f_1, f_2$, respectively, and that they pass through an edge in opposite directions. Then, by reassigning $u_1$ to $f_2$ and $u_2$ to $f_1$, we can decrease the total number of edges that the users pass through, without increasing the total cost or breaking feasibility. ∎
Let us write down the transitions. Denote two children of by , and distance between and by . We also denote the number of users on by . Then, is calculated by
if contains no facilities. If contains a facility , we also decide whether to open . Thus, we additionally take a minimum to the value . We can implement this algorithm to work in time. Since , we get the following theorem.
The min-sum $r$-gathering problem and the lower bounded facility location problem on a tree admit an exact polynomial-time algorithm.
3.2 Proximity Requirement
In real applications, it is natural to assume that users go to their nearest open facilities. This requirement is called the proximity requirement. It is discussed by Armon [6] for the min-max $r$-gathering problem, where a 9-approximation algorithm is given. We assume that for every user, there is no tie among the distances from the user to the facilities. This ensures that each user uniquely determines the facility to go to. In particular, there is a positive distance between any two distinct facilities.
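The proximity requirement can be illustrated with the following Python sketch: once the set of open facilities is fixed, the assignment is forced, and the only question is whether every open facility receives enough users. The distance dictionary and the function names are hypothetical, and ties are assumed not to occur, as in the text.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional

Dist = Dict[Hashable, Dict[Hashable, float]]  # dist[user][facility]


def proximity_assignment(open_facilities: Iterable[Hashable], dist: Dist,
                         r: int) -> Optional[Dict[Hashable, Hashable]]:
    """Send every user to its nearest open facility.

    Returns None if some open facility ends up with fewer than r users.
    """
    opened = list(open_facilities)
    assignment = {u: min(opened, key=lambda f: dist[u][f]) for u in dist}
    load = Counter(assignment.values())
    if any(load[f] < r for f in opened):
        return None
    return assignment


def min_max_cost(assignment: Dict[Hashable, Hashable], dist: Dist) -> float:
    """Maximum distance between a user and its assigned facility."""
    return max(dist[u][f] for u, f in assignment.items())
```

Opening an additional facility can therefore make a feasible solution infeasible, because it may attract users away from other facilities without collecting $r$ users itself.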
Unlike the vanilla $r$-gathering, we can solve this problem exactly in polynomial time on a tree. The key observation is the following fact.
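Lemma. Assume that users $u_1, u_2$ go to facilities $f_1, f_2$, respectively, in a feasible solution. If the $u_1$-$f_1$ path and the $u_2$-$f_2$ path have a common point, then $f_1 = f_2$.
Proof. Denote this common point by $p$. Without loss of generality, we can assume that $d(p, f_1) \le d(p, f_2)$. Since $p$ is on the $u_2$-$f_2$ path, $d(u_2, f_1) \le d(u_2, p) + d(p, f_1) \le d(u_2, p) + d(p, f_2) = d(u_2, f_2)$ holds. Because $u_2$ goes to its nearest open facility and there is no tie, this means that both $u_1$ and $u_2$ should go to $f_1$. ∎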
By the above lemma, we can argue that if two users go to the same facility, so do all the users between them. From now on, we construct an algorithm by dynamic programming.
For each vertex $v$, facility $f$, and integer $k$, we calculate a DP value for $(v, f, k)$, which represents the minimum possible cost to assign all users in the subtree $T_v$ rooted at $v$ and to decide whether to open each facility in $T_v$ and $f$, in the situation where
there are at least $k$ users assigned to $f$,
the nearest open facility from $v$ is $f$,
users in $T_v$ are assigned to the facilities in $T_v$ or to $f$,
and all open facilities in $T_v$ other than $f$ have at least $r$ users.
If there is no solution that satisfies the above conditions, this value is $\infty$. We calculate these values in a bottom-up way.
We want the minimum value among all facility . Let us write down the transitions. Let two children of the vertex be and , and the number of users on vertex be . Let be when there are users on and when there is no user on . is calculated by the following minimum.
for all . That corresponds to the case remaining users in are assigned in .
for all and facility , which satisfies and . That corresponds to the case remaining users in are assigned to and we finish to choose users assigned to .
for all and facility , which satisfies and . That corresponds to the opposite case described above.
for all facilities , which satisfies and . That corresponds to the case which we finish to choose the users assigned to .
We can calculate all these transitions in time for each vertex , so we can solve this problem in time. Note that, min-sum version of this problem can be solved in the same way. Here we conclude this subsection by the following theorem.
Min-max and min-sum $r$-gathering with the proximity requirement on a tree admit an exact polynomial-time algorithm.
References
[1] Gagan Aggarwal, Rina Panigrahy, Tomás Feder, Dilys Thomas, Krishnaram Kenthapadi, Samir Khuller, and An Zhu. Achieving anonymity via clustering. ACM Transactions on Algorithms, 6(3):49:1–49:19, 2010.
[2] Sara Ahmadian and Chaitanya Swamy. Improved approximation guarantees for lower-bounded facility location. In International Workshop on Approximation and Online Algorithms, pages 257–271. Springer, 2012.
[3] Shareef Ahmed, Shin-ichi Nakano, and Md Saidur Rahman. r-gatherings on a star. In Proceedings of International Workshop on Algorithms and Computation, pages 31–42. Springer, 2019.
[4] Toshihiro Akagi and Shin-ichi Nakano. On r-gatherings on the line. In Proceedings of International Workshop on Frontiers in Algorithmics, pages 25–32. Springer, 2015.
[5] Sarker Anik, Sung Wing-kin, and Rahman Mohammad Sohel. A linear time algorithm for the r-gathering problem on the line (extended abstract). In Proceedings of International Workshop on Algorithms and Computation, pages 56–66. Springer, 2019.
[6] Amitai Armon. On min–max r-gatherings. Theoretical Computer Science, 412(7):573–582, 2011.
[7] Yijie Han and Shin-ichi Nakano. On r-gatherings on the line. In Proceedings of International Conference on Foundations of Computer Science, pages 99–104, 2016.
[8] Soh Kumabe and Takanori Maehara. r-gather clustering and r-gathering on spider: FPT algorithms and hardness.
[9] Shin-ichi Nakano. A simple algorithm for r-gatherings on the line. In Proceedings of International Workshop on Algorithms and Computation, pages 1–7. Springer, 2018.
[10] Zoya Svitkina. Lower-bounded facility location. ACM Transactions on Algorithms, 6(4):69, 2010.
[11] Latanya Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557–570, 2002.