Abstract
The facility location problem is a well-known challenge in logistics that is proven to be NP-hard. In this paper, we simulate the geographical placement of facilities to provide adequate service to customers. Determining reasonable center locations is an important challenge for management, since it directly affects future service costs. Generally, the objective is to place the central nodes such that all customers have convenient access to them. We analyze the problem, compare different placement strategies, and evaluate the number of required centers. We use several existing approaches and propose a new heuristic for the problem. For our experiments, we consider various scenarios and employ simulation to evaluate the performance of the optimization algorithms. Our new optimization approach shows a significant improvement over the reference algorithms. The presented results are generally applicable to many domains, e.g., the placement of military bases, the planning of content delivery networks, or the placement of warehouses.
I Introduction
The typical geographical facility placement problem originates from the application area of logistics and transportation. Locations of central warehouses have to be determined at a short distance to customers to provide adequate service and convenient access. An example is shown in Figure 1. The left side shows the initial state with the geographical locations of customers. The scenario on the right side includes the optimized locations of ten warehouses with short distances to the customers. The edges represent the assignment of a customer to its nearest warehouse. The underlying theoretical problem is of importance in many different application areas [12, 6], for example, the strategic placement of fire stations, hospitals, and military bases; storage server allocation in content delivery networks; and improvements to the network infrastructure in telecommunication. Numerous further application areas with the same problem can profit from a good solution.
Once a facility is set up at a specific location, it is very difficult to relocate it. Facility location decisions are costly and have a strong long-term effect. Often, only the setup costs and the capacity at a possible facility location are taken into account for the decision. The distance of a facility to its customers often receives less consideration, since existing placement strategies offer only modest performance. The planning of the facility placement is therefore very important and should be carried out carefully. Generally, a facility should be placed close to the customers to reduce transportation time and cost. A well-planned facility improves the overall supply of services and goods. Therefore, the focus of this work is the geographical and infrastructural placement of facilities at a short distance to the customers. The facility location problem is based on the classic k-center and clustering problems. In the uncapacitated version, it is assumed that a facility can serve as many customers as are assigned to it [13]. Nevertheless, the customers should be assigned more or less equally to the facilities to balance the load. In the more complex capacitated version, the facilities have a predefined capacity and can serve only a limited number of customers [10].
We identified three fundamentally different constraints for the placement of facilities in relation to the infrastructure. These are:

1. Free placement: The location can be chosen freely, without any restriction.
2. Infrastructural placement: Existing infrastructure should be used for the placement.
3. Node placement: Only a limited number of given locations are available for placing a facility.
Case one offers the largest degree of freedom, while the last case is the most restrictive approach. We analyze these constraints to illustrate the differences between them.
In this article, we present a deterministic heuristic that finds optimized geographical locations for multiple central facilities. The approach is used for all three constraints. We compare the performance of our solution using benchmark scenarios as well as realistic scenarios. We discuss the influence of management decisions regarding the placement restrictions and the number of available facilities. To this end, we use simulation-based optimization to analyze the number of facilities necessary for an effective supply.
II Scenario and Requirements
The management of a large company intends to expand its business into another country and wants to distribute its products there. The small supply chain in our example consists of three parts. Goods are produced in a factory overseas. From there, the products are transported to intermediate warehouses close to the distributor locations, from which they are distributed to local stores for sale. The geographical locations of the stores to be supplied and the precise infrastructure are known beforehand. To implement this supply chain, multiple small warehouses need to be set up to provide a continuous supply of goods to the stores. Generally, the company aims for short transportation paths from its warehouses to the customers to reduce transportation costs and transportation times. These costs are balanced against the operational costs of multiple warehouses. The placement of the warehouses and the assignment of stores to their closest warehouse have to be simulated and optimized automatically. For the initial planning phase, we consider only the uncapacitated geographical facility location problem. According to the scenario and various application areas, we need to answer the following important questions:

1. Where should new warehouses be placed geographically in order to obtain short and efficient transportation paths?
2. Which customer is assigned to which warehouse?
3. How many warehouses are necessary to minimize the cost of operation and transportation?
4. How large is the performance difference between the constraint-free and the node placement scenario?
III Problem Description
The scenario described in Section II is based on the k-center or k-median problem, depending on the objective function. The uncapacitated facility location problem was first described in [3]. For a given set of locations, a predefined number of central locations has to be found. The k-center problem minimizes the maximum distance between a location and its nearest center [4]. In contrast, the k-median problem minimizes the sum of the distances between the locations and their nearest centers [10]. Both problems are NP-hard [7]. The problem can be specified on a strongly connected graph $G = (V, E)$ with vertices $V$ and edges $E$. The k-center problem is defined on a complete, undirected graph. The objective function defines the fitness value $d(v_i, v_j)$ for an edge $e(v_i, v_j)$ between two vertices $(v_i, v_j)$, satisfying the triangle inequality. The objective function selects the best edge from a vertex $v$ to one of the calculated locations of a center node $c \in C$. A center node can be a vertex, depending on the placement constraint from Section I. The fitness values of all vertices, each taken to its nearest center node, together determine the objective value.
We intend to place $k$ center nodes so as to minimize the maximum distance $d(v, c)$ from a vertex $v$ to its closest center node $c$. This objective criterion belongs to the typical k-center problem according to Equation 1; the objective criterion of the similar k-median problem corresponds to Equation 2:
$$\min_{C,\ |C| = k}\ \max_{v \in V}\ \min_{c \in C}\ d(v, c) \qquad (1)$$
$$\min_{C,\ |C| = k}\ \sum_{v \in V}\ \min_{c \in C}\ d(v, c) \qquad (2)$$
Regarding our case, we set up a predefined number $k$ of warehouses and minimize the maximum distance from each store $v$ to its nearest warehouse $c$. The objective function $d(v, c)$ defines the distance between a warehouse $c$ and a store $v$.
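As an illustration, the two objective criteria can be evaluated for a given placement with a few lines of Python. This is a minimal sketch under the assumption of Euclidean distances in the plane; the function names and the tiny example scenario are hypothetical and not taken from our implementation.

```python
import math

def nearest_distance(point, centers):
    # Distance from one store to its closest warehouse.
    return min(math.dist(point, c) for c in centers)

def k_center_cost(points, centers):
    # k-center criterion: the worst-served store defines the cost.
    return max(nearest_distance(p, centers) for p in points)

def k_median_cost(points, centers):
    # k-median criterion: the summed distance over all stores.
    return sum(nearest_distance(p, centers) for p in points)

# Hypothetical scenario: four stores, two warehouses placed at store locations.
stores = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (10.0, 0.0)]
warehouses = [(0.0, 0.0), (10.0, 0.0)]

print("k-center cost:", k_center_cost(stores, warehouses))
print("k-median cost:", k_median_cost(stores, warehouses))
```

Both functions only evaluate a placement; finding the set of centers that minimizes either value is the hard part of the problem.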
IV Related Work
Over the past years, many solutions for the facility location problem have been proposed. We focus on the most recent approaches and heuristics for comparison. Most approaches analyze the problem from the k-center or clustering point of view, with the objective of minimizing the maximum distance [4, 9]. The papers of Rana and Garg [22] and Arthur and Vassilvitskii [2] propose multiple heuristic approaches for the problem. Other important but more general works are those of Potikas [21], Jamin et al. [11], and Hochbaum and Shmoys [8]. One of the best current solutions is that of Resende and Werneck [23]. Their adapted greedy randomized adaptive search procedure (GRASP) metaheuristic combines a greedy initialization with a local search strategy. Besides these strategies, there is a model of the node placement problem with mathematical relaxations [19], which can be solved by a linear programming (LP) solver. We use these approaches as benchmarks. Additionally, we use evolutionary algorithms such as simulated annealing [1] and a genetic algorithm [15].
The books of Klose [14], Mayer [18], and Fischer [5] present a comprehensive analysis of the logistical problem. However, their focus is more on the economic aspects and the entire modeling process than on the adequate geographical placement of facilities. The aforementioned work covers our requirements only partly, or the proposed approaches show only modest performance. They also provide little information about the number of facilities necessary to reach a specific objective.
V Reference Algorithms
To compare our new approach with the existing strategies, we present the most important reference algorithms. For our experiments, it is sufficient to consider the most restrictive case (node placement) and the most flexible case (free placement). At the moment, we do not consider algorithms for infrastructural placement, since all algorithms can easily be adapted to generate appropriate solutions. Table I lists the placement constraints for which each algorithm is suitable.
Node placement  Node and free placement
2Approx  k-Means
Greedy  Evolutionary Algorithm
GRASP  -
V-A 2Approx
The 2Approx algorithm chooses a random vertex at the beginning, which becomes the location of the first center node. After that, the algorithm calculates the distance from every vertex to its closest placed center node. It chooses the vertex with the largest such distance as the location of the next center node. This iteration runs until the specified number of center nodes is reached. The algorithm is a 2-approximation. Generally, an approximation algorithm with factor $\alpha$ guarantees a solution with cost $c \le \alpha \cdot c^{*}$, where $c^{*}$ is the cost of an optimal solution and $\alpha \ge 1$. Here, the algorithm guarantees that the maximum distance from a vertex to its nearest center node is at most twice the maximum distance under an optimal placement of the center nodes [7, 8]. This bound holds without knowledge of the actual optimum.
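The procedure described above is commonly known as farthest-first traversal. The following Python sketch illustrates it for node placement with Euclidean distances; the function name, the `seed` parameter, and the example points are assumptions made for illustration only.

```python
import math
import random

def two_approx(points, k, seed=0):
    """Farthest-first placement of k centers at vertex locations."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # random first center
    # dist[i] holds the distance from points[i] to its closest center so far.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The worst-served vertex becomes the next center.
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)

# Hypothetical scenario: two well-separated pairs of vertices on a line.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers, radius = two_approx(points, k=2)
print(centers, radius)
```

Keeping the running nearest-center distances in `dist` avoids recomputing all distances in every round, so each of the k rounds needs only one pass over the vertices.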
V-B k-Means
The following group of clustering algorithms can be adapted to the free placement and node placement constraints. The main idea is to define k center nodes, one for each cluster. In this case, a cluster is a group of vertices. In an iterative approach, already placed nodes influence the location of subsequently placed nodes. In contrast, these algorithms place all center nodes at the same time, which makes backtracking unnecessary. A change in the placement of one center node has a direct effect on the other center nodes.
Various clustering approaches exist for a fixed number of clusters; they differ mainly in the initial placement of the center points. The MacQueen algorithm [17] is one of the less complex k-Means algorithms. It uses the locations of randomly selected vertices as the initial locations of the center nodes. In contrast, the algorithm of Lloyd [16], also known as Voronoi relaxation, starts with center nodes placed completely at random in the area. Another typical initialization is used in the k-Means++ algorithm [2]. Here, the location of the first center node is chosen randomly at a vertex location. The other center nodes are also placed randomly at vertex locations; however, the probability is skewed to favor certain locations. The selection probability of a vertex is proportional to its squared distance to the closest already selected location.
After initialization, all vertices are assigned to their nearest center nodes. For each group of vertices assigned to a center node, an updated location is calculated: the new center node location is the geometric center (centroid) of all vertices in the group. During every iteration step, the vertices are reassigned to the nearest center node. This process is repeated until the center node locations no longer change. Because of the random initialization, the algorithms are run multiple times. To adapt these strategies to the three constraints, the locations of the center nodes are mapped to the nearest permissible location, either in every step or at the end of the optimization.
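The skewed selection of the k-Means++ initialization can be sketched as follows. This is an illustrative Python sketch, not the reference implementation of [2]; the function name and the `seed` parameter are hypothetical, and it assumes that k does not exceed the number of distinct vertices.

```python
import math
import random

def kmeanspp_seed(points, k, seed=0):
    """Choose k initial centers at vertex locations with squared-distance weighting."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniformly at random
    while len(centers) < k:
        # Weight of a vertex: squared distance to its closest selected center.
        # Already selected vertices get weight 0 and are never drawn again.
        weights = [min(math.dist(p, c) for c in centers) ** 2 for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

# Hypothetical scenario with three loose groups of vertices.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (5.0, 8.0)]
print(kmeanspp_seed(points, k=3))
```

Because distant vertices receive quadratically larger weights, the initial centers tend to spread over the area, which reduces the chance of starting with several centers in the same group.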
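The assignment and update loop described above can be sketched in Python as follows. The sketch assumes Euclidean distances, and the optional mapping to the nearest vertex at the end stands in for the node placement adaptation; the function name and the `snap_to_nodes` flag are hypothetical.

```python
import math

def lloyd(points, centers, max_iter=100, snap_to_nodes=False):
    """Alternate assignment and centroid update until the centers stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: every vertex joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the centroid of its cluster.
        new_centers = []
        for c, members in zip(centers, clusters):
            if not members:  # keep a center that lost all its vertices
                new_centers.append(c)
                continue
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
        if new_centers == centers:
            break
        centers = new_centers
    if snap_to_nodes:
        # Node placement: map each center to the nearest vertex location.
        centers = [min(points, key=lambda p: math.dist(p, c)) for c in centers]
    return centers

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(lloyd(points, [(0.0, 0.0), (10.0, 0.0)]))
```

Mapping only at the end is the simpler of the two variants mentioned above; mapping in every step would move the `snap_to_nodes` logic into the update step.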
V-C Greedy and GRASP
The Greedy strategy of Jamin et al. [11] places the center nodes iteratively at the locations of predefined vertices. During each iteration, it tries all possible locations for the next center node and selects the one that provides the biggest benefit with respect to the optimization criterion. The Greedy strategy repeats this process until all center nodes are placed. To improve the quality further, the authors include backtracking to test whether already placed center nodes can be placed in a better way or removed completely.
The more advanced GRASP approach tries several starting locations as initialization for a greedy local optimization [20]. A weighted greedy randomized strategy is used for the initialization process. These starting locations are subsequently improved iteratively using local search, which updates the locations of randomly chosen center nodes in a greedy manner.
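The basic greedy placement can be sketched as follows. This simplified Python sketch omits the backtracking step and the randomized GRASP initialization; the function names and the example scenario are hypothetical, and the maximum distance is assumed as the optimization criterion.

```python
import math

def max_distance(points, centers):
    # Maximum distance from any vertex to its nearest center.
    return max(min(math.dist(p, c) for c in centers) for p in points)

def greedy_placement(points, k):
    """Place k centers one by one, each at the vertex that helps most."""
    centers = []
    for _ in range(k):
        candidates = [p for p in points if p not in centers]
        # Try every free vertex as the next center and keep the best one.
        best = min(candidates, key=lambda p: max_distance(points, centers + [p]))
        centers.append(best)
    return centers

# Hypothetical scenario: two groups of three vertices on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
centers = greedy_placement(points, k=2)
print(centers, max_distance(points, centers))
```

In this example, the first center is placed as a compromise between both groups and is never revisited, so the result is worse than placing one center in the middle of each group. This is the kind of early decision that backtracking or a subsequent local search is meant to correct.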
V-D Evolutionary Algorithm
Finally, we use several evolutionary algorithms to optimize the center node locations. The SEREIN framework [24] is used to implement these algorithms. We employ the standard implementation of a genetic algorithm (GA) provided by SEREIN and use a population of 25 individuals evolving over 80 generations. In addition, a Particle Swarm Optimization (PSO) approach and a Simulated Annealing (SA) approach are implemented. The parameters for the algorithms were determined experimentally using meta-optimization.
VI Our Approach Dragoon
Based on the k-Means strategy, we developed a new algorithm, Dragoon (Diversification Rectifies Advanced Greedy Overdetermined Optimization N-Dimensions). Most established placement algorithms are very sensitive to the initial placement of centers. Furthermore, the first center placed usually serves a large number of customers. This is possible in the uncapacitated facility location problem, but it shows the serious influence of the first placement decision, whereas an even distribution would be desirable. After the initialization, the vertices are assigned to the nearest center node location, and these locations are improved in an iterative optimization. Most existing approaches try to find optimized locations only with respect to these single groups. The influence on other groups and on the overall system is lost, which leads to suboptimal solutions. With the knowledge of these problem properties and the weaknesses of current solutions, we designed our own algorithm. The algorithm should avoid random decisions as far as possible, so that multiple runs become unnecessary and the results are stable.
To reduce the sensitivity to initialization, we designed a new initialization process. In the preliminary stage of the initialization phase, an orientation node is placed at the optimal position for the one-center case. To avoid an exhaustive search, this can be simplified by calculating the average of the vertex coordinates. Afterwards, the specified number of center nodes is placed using the 2Approx strategy. Thereby, we obtain one specific solution of the 2Approx placement strategy, which guarantees the 2-approximation quality of the result. After the initialization, the algorithm starts with the iterative refinement. These newly designed optimization steps are adaptable to different placement constraints.
The following description explains the general approach, which is adaptable to all three placement constraints. In every iteration step, the vertices are (re)assigned to their nearest center nodes. Afterwards, an updated location is calculated for every cluster of vertices assigned to a center node. This is done with respect to the entire scenario: the algorithm tests all possible locations around the current center, restricted to the current cluster, and chooses the location that yields the best improvement of the specified optimization criterion. In our case, this criterion is the maximum distance. If this value is unchanged, the algorithm uses an additional criterion, the average (mean) distance, to choose between two solutions and to identify an improvement. In each iteration, every center node is allowed to shift its location only once. This leads to a stepwise improvement and avoids premature stagnation in a local optimum. Because of this global view, the order in which the clusters are selected has mostly no influence on the final result. For our simulations, the clusters are processed in order of their performance, worst first. This iterative optimization is repeated until the center node locations no longer change. Owing to the described initialization, usually only a few iteration steps are necessary until the algorithm terminates. The algorithm accepts only improved locations in every iteration step. Therefore, the 2-approximation guarantee is preserved, and the algorithm always terminates.
For the free placement constraint, our algorithm tests all points on a grid with a defined spacing. If one of the tested locations results in a better performance for the overall scenario, it is accepted and used for the next iteration step. If no location leads to an improvement, we successively increase the granularity of the grid by halving the spacing. This process is repeated until the grid spacing is smaller than the maximal accepted deviation. It is necessary to define such a limit for the maximal deviation so that the optimization process terminates. The processing steps of the iterative optimization are shown in Figure 2. The left side illustrates the movement to an improved spot. The right side shows the increased granularity of the grid by bisection.
For the node placement constraint, the algorithm simply tests all vertex locations of a group as candidates for its center. To identify an improved location, the algorithm evaluates the overall scenario. In every evaluation, including the reassignment of vertices, all current center locations are used except for that of the observed center node. In each iteration step, the candidate locations for a center node are limited to its group. To improve the runtime, the tested candidates can be further restricted to locations within a defined distance of the current center. The algorithm optimizes the center locations iteratively until no changes occur. Following the divide-and-conquer principle, the one-center problem is solved optimally for each group of vertices with respect to the overall scenario. This optimization runs in polynomial time.
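The grid refinement for free placement can be illustrated with the following Python sketch. It is a strong simplification under stated assumptions: only one center is moved, only the eight neighboring grid points around the current location are tested instead of all grid points within the cluster, and the maximum distance is the only criterion. All names and parameters are hypothetical.

```python
import math

def max_distance(points, centers):
    # Maximum distance from any vertex to its nearest center.
    return max(min(math.dist(p, c) for c in centers) for p in points)

def refine_center(points, centers, idx, spacing, min_spacing):
    """Move centers[idx] over a grid and halve the spacing when it gets stuck."""
    centers = list(centers)
    best = max_distance(points, centers)
    while spacing >= min_spacing:
        cx, cy = centers[idx]
        moved = False
        for dx in (-spacing, 0.0, spacing):
            for dy in (-spacing, 0.0, spacing):
                trial = centers[:idx] + [(cx + dx, cy + dy)] + centers[idx + 1:]
                cost = max_distance(points, trial)  # evaluate the whole scenario
                if cost < best:  # accept improvements only
                    best, centers, moved = cost, trial, True
        if not moved:
            spacing /= 2.0  # bisection: finer grid around the current spot
    return centers[idx], best

# Hypothetical scenario: one center serving two vertices.
points = [(0.0, 0.0), (4.0, 0.0)]
location, cost = refine_center(points, [(0.0, 0.0)], idx=0, spacing=1.0, min_spacing=0.125)
print(location, cost)
```

The `min_spacing` parameter plays the role of the maximal accepted deviation: once the grid is finer than this limit, the search stops.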
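The node placement variant of the iterative refinement can be sketched as follows. This Python sketch is an illustration under assumptions, not our implementation: the clusters are processed in index order rather than worst first, the candidate set is not restricted by distance, and the initial centers are simply passed in. All function names are hypothetical.

```python
import math

def nearest(p, centers):
    return min(math.dist(p, c) for c in centers)

def score(points, centers):
    # Primary criterion: maximum distance; tie-breaker: mean distance.
    d = [nearest(p, centers) for p in points]
    return (max(d), sum(d) / len(d))

def assign(points, centers):
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[j].append(p)
    return clusters

def dragoon_node(points, centers):
    """Shift each center within its own cluster while the overall score improves."""
    centers = list(centers)
    changed = True
    while changed:
        changed = False
        for idx, members in enumerate(assign(points, centers)):
            best_score, best_loc = score(points, centers), centers[idx]
            for cand in members:  # candidates: vertices of this cluster only
                trial = centers[:idx] + [cand] + centers[idx + 1:]
                s = score(points, trial)  # evaluated on the entire scenario
                if s < best_score:
                    best_score, best_loc = s, cand
            if best_loc != centers[idx]:
                centers[idx] = best_loc  # at most one shift per center and pass
                changed = True
    return centers

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
result = dragoon_node(points, [(0.0, 0.0), (12.0, 0.0)])
print(result, score(points, result)[0])
```

Since a move is accepted only if the pair of maximum and mean distance strictly improves, the loop cannot cycle and terminates after finitely many shifts.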
The algorithm can also be adapted to upgrade an existing scenario with partly fixed centers from the beginning or other constraints. A typical application area for this algorithm is the clustering of data.
VII Simulation and Assessment
The evaluation of the algorithms is based on experiments using a prototypical implementation in Java. We use classic geo-coordinates in two-dimensional space and the Euclidean distance as the metric. We use more than 10 scenarios with equally weighted vertices and edges. The test set consists of randomly generated and realistic scenarios without hierarchical topologies or other particular conditions. For the evaluation, the calculated distances of the different scenarios are normalized for comparability. As fitness functions, we calculate the following distance parameters: maximum, 95% quantile, median, and average. To obtain statistically meaningful results, we repeated the simulations using multiple scenarios with 600 to 1200 vertices. The achieved results are accumulated for each algorithm.
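Initially, a Monte Carlo approach for node placement serves as a basic benchmark. It shows that the 2Approx algorithm returns, on average, much better and more stable results. We therefore use 2Approx as the reference. Based on the results of 2Approx, we define a theoretical limit for the optimum:
$$d_{\mathrm{limit}} = \frac{d_{\mathrm{2Approx}}}{2} \qquad (3)$$
Here, $d_{\mathrm{2Approx}}$ is the maximum distance achieved by 2Approx; by the 2-approximation guarantee, no placement can achieve a maximum distance below $d_{\mathrm{limit}}$. The solutions of 2Approx vary because of the random initialization. Nevertheless, the 2-approximation guarantee holds in every run.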
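As a worked example, and assuming that the theoretical limit is half of the maximum distance obtained by 2Approx, as implied by the 2-approximation guarantee, the limit can be compared with the values reported in Table III for 10 center nodes. The variable names in this Python sketch are hypothetical.

```python
# Values for 10 center nodes as reported in Table III (normalized distances).
two_approx_max = 29.8    # 2Approx (node)
dragoon_node_max = 22.0  # Dragoon (node)

# Lower bound implied by the 2-approximation guarantee.
limit = two_approx_max / 2.0

# How far the Dragoon result is from the limit, as a factor.
ratio = dragoon_node_max / limit

print(f"theoretical limit: {limit:.1f}")
print(f"Dragoon / limit:   {ratio:.2f}")
```

The actual optimum lies somewhere between the limit and the best solution found, so the ratio is an upper bound on how far the solution can be from the optimum.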
Figure 3 shows the results of the algorithm comparison. For small numbers of center nodes, our improved Dragoon algorithm is close to the theoretical optimum. For larger numbers of center nodes, the performance of the algorithms stagnates at nearly the same level, indicating that we are already very close to the actual optimum. Our approach performs significantly better than 2Approx and is much faster than a brute-force approach.
Figure 4 presents the difference between free placement and node placement for our improved algorithm Dragoon. The distance deviations between the placement constraints are 4% on average and 11% in the worst case. We observe that the node placement approach needs on average two more centers to compensate for the more flexible positioning of the free placement. In the worst case, six more centers are required. While additional center nodes have a positive effect, increasing the overall capacity and load balance, the added benefit decreases significantly for large numbers of center nodes. We observe a saturation effect for high ratios of center nodes to vertices.
Based on the maximum distance of a vertex to its nearest center node, Table II and Figure 5 show that it is sufficient to set up about 6% of the vertices as center nodes. After we placed 58 center nodes in the normalized scenarios, the average improvement of maximum distance for an additional center node is less than 1% with the 2Approx or Dragoon algorithms.
Center Nodes  1  2  5  10  20  30  40  50  60  70  80
Max Distance  112.6  76.3  46.8  29.8  19.8  14.7  12.6  11.0  9.9  8.9  8.1
Improvement in %  -  32.2  38.7  36.4  33.5  25.9  14.3  12.5  10.4  9.6  8.8
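The improvement row of Table II can be approximately recomputed from the maximum distances as the relative reduction between two consecutive columns. The following Python sketch does this with the rounded table values; small deviations from the reported percentages are therefore expected. The helper name is hypothetical.

```python
centers = [1, 2, 5, 10, 20, 30, 40, 50, 60, 70, 80]
max_dist = [112.6, 76.3, 46.8, 29.8, 19.8, 14.7, 12.6, 11.0, 9.9, 8.9, 8.1]
reported = [32.2, 38.7, 36.4, 33.5, 25.9, 14.3, 12.5, 10.4, 9.6, 8.8]

def stepwise_improvement(distances):
    # Relative reduction of the maximum distance from one column to the next.
    return [100.0 * (a - b) / a for a, b in zip(distances, distances[1:])]

recomputed = stepwise_improvement(max_dist)
for k, r, t in zip(centers[1:], recomputed, reported):
    print(f"{k:>2} centers: recomputed {r:5.1f} %, reported {t:5.1f} %")
```

Dividing each value by the number of centers added in that step gives the improvement per additional center node, which is the quantity behind the saturation effect discussed above.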
Figure 5 and Table III present the general performance of the different algorithms. The performance of the SEREIN framework with an evolutionary algorithm is remarkable. SEREIN is not customized for this problem but achieves good solutions in comparison to other algorithms specially developed for this task. The performance of the MacQueen, Lloyd, and k-Means++ algorithms is nearly the same, so we report only MacQueen, the best of the three. The complexity of the algorithms differs considerably, but all optimizations finished within a couple of minutes in every scenario used, except for LP, which took much longer, especially for large scenarios.
Center Nodes  Monte Carlo (node)  2Approx (node)  Greedy (node)  MacQueen (node)  MacQueen (free)  GRASP (node)  GA (node)  Dragoon (node)  Dragoon (free)
1  85.7  112.6  76.0  76.9  77.0  76.0  83.0  76.0  75.3 
2  63.6  76.3  74.8  52.0  51.1  48.2  62.0  47.9  47.1 
5  48.4  46.8  47.1  41.4  40.1  36.0  46.9  35.0  34.0 
10  36.9  29.8  33.3  28.4  26.2  24.5  35.6  22.0  21.2 
20  27.8  19.8  21.9  22.6  20.5  17.7  27.3  15.5  14.7 
30  24.0  14.7  17.2  19.9  18.1  14.0  23.5  12.8  12.1 
40  21.3  12.6  14.4  17.8  15.9  12.1  20.8  10.9  10.1 
50  19.8  11.0  12.6  16.4  14.3  10.5  19.3  9.4  9.1 
60  14.7  9.9  11.5  15.2  13.3  9.6  18.0  8.7  8.3 
70  13.8  8.9  10.4  14.2  12.0  8.6  16.6  8.0  7.7 
80  12.9  8.1  9.6  13.5  11.5  8.0  15.3  7.5  7.1 
In line with our initial intention to set up warehouses for a specific scenario, the costs for transportation as well as the operating and setup costs have to be taken into account. With an increasing number of center nodes, the transportation distance and cost are reduced, whereas the setup and operating costs increase. To find the optimal balance between these aspects, we use simulation-based optimization to calculate the optimal number and location of center nodes. Figure 6 shows the operating costs for a specific scenario. This calculation has to be made for every scenario individually. For this example, we used an abstract cost function to show the objective of our simulation-based optimization.
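The trade-off can be illustrated with a small Python sketch. The maximum distances are taken from Table II, whereas the linear cost function and both cost coefficients are purely hypothetical assumptions chosen for illustration; they are not the abstract cost function used for Figure 6.

```python
centers = [1, 2, 5, 10, 20, 30, 40, 50, 60, 70, 80]
max_dist = [112.6, 76.3, 46.8, 29.8, 19.8, 14.7, 12.6, 11.0, 9.9, 8.9, 8.1]

def total_cost(k, distance, cost_per_center=1.0, cost_per_distance=2.0):
    # Hypothetical linear model: operating cost grows with the number of
    # centers, transportation cost grows with the distance.
    return k * cost_per_center + distance * cost_per_distance

# Pick the number of centers with the lowest combined cost.
best_k, best_dist = min(zip(centers, max_dist), key=lambda kd: total_cost(*kd))

for k, d in zip(centers, max_dist):
    print(f"{k:>2} centers: total cost {total_cost(k, d):6.1f}")
print("best number of centers:", best_k)
```

With other coefficients the minimum shifts, which is why the calculation has to be repeated for every scenario and cost structure.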
VIII Conclusion and Outlook
In this paper, we propose the novel algorithm Dragoon to solve the k-center problem with geographical placement. Our strategy outperforms the other approaches, reaches very good results close to the global optimum in a short time, and is less sensitive to random initialization. We calculated the distance deviations between the different placement constraints (free vs. node). For the maximum distance, we observe a slight difference of 4% on average. In the worst case, the difference between the most flexible and the most restrictive case can be up to 11%.
To optimize the supply chain and the delivery time, we analyzed the recommended number of center nodes for predefined scenarios. Our analyses show that even the best placement strategy gains less than 1% in performance from an additional center node once the ratio of center nodes to vertices reaches about 6%. In the future, we intend to further enhance the performance of the placement algorithms. Furthermore, the inclusion of weights for customers and edges as well as different fitness functions will be considered.
References
[1] (1992) Simulated annealing algorithm for the simple plant location problem: a computational study. Revista Investigacao Operacional. Cited by: §IV.
[2] Cited by: §IV, §V-B.
[3] (1965) (Website) University of Pennsylvania and MATHEMATICA. Cited by: §III.
[4] (1998) The p-neighbor k-center problem. Cited by: §III, §IV.
[5] (1997) Standortplanung unter Berücksichtigung verschiedener Marktbedingungen. Physica-Verlag Heidelberg. Cited by: §IV.
[6] (2010-12) A novel priority based data mining algorithm using improved k-means clustering for detecting protein sequence from dataset. In Proc. IEEE Computational Intelligence and Computing Research (ICCIC), Piscataway, New Jersey. Cited by: §I.
[7] Cited by: §III, §V-A.
[8] Cited by: §IV, §V-A.
[9] (2000-09) Primal-dual approximation algorithms for metric facility location and k-median problems. In Approximation Algorithms for Combinatorial Optimization, Saarbrücken, Germany. Cited by: §IV.
[10] (2001-03) Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. New York, NY, USA, pp. 274–296. Cited by: §I, §III.
[11] Cited by: §IV, §V-C.
[12] (2002-07) An efficient k-means clustering algorithm: analysis and implementation. In Proc. IEEE Pattern Analysis and Machine Intelligence, Piscataway, New Jersey, pp. 881–892. Cited by: §I.
[13] Cited by: §I.
[14] (2013) Standortplanung in distributiven Systemen: Modelle, Methoden, Anwendungen. Physica-Verlag Heidelberg. Cited by: §IV.
[15] (2001) Solving the simple plant location problem by genetic algorithm. RAIRO-Operations Research 35 (01), pp. 127–142. Cited by: §IV.
[16] Cited by: §V-B.
[17] Cited by: §V-B.
[18] (2001) Strategische Logistikplanung von Hub&Spoke-Systemen. Cited by: §IV.
[19] (2011-09) Optimization with Gurobi and Python. 1st edition, INESC Porto and Universidade do Porto, Porto, Portugal. Cited by: §IV.
[20] (2009) Greedy randomized adaptive search procedures. Encyclopedia of Optimization, pp. 1460–1469. Cited by: §V-C.
[21] Cited by: §IV.
[22] Cited by: §IV.
[23] (2006) A hybrid heuristic for the p-median problem. European Journal of Operational Research 174 (1), pp. 54–68. Cited by: §IV.
[24] Cited by: §V-D.