Instance Scale, Numerical Properties and Design of Metaheuristics: A Study for the Facility Location Problem

01/10/2018 ∙ by David Chalupa, et al. ∙ Aalborg University

Metaheuristics are known to be strong in solving large-scale instances of computationally hard problems. However, their efficiency still needs exploration in the context of instance structure, scale and numerical properties for many of these problems. In this paper, we present an in-depth computational study of two local search metaheuristics for the classical uncapacitated facility location problem. We investigate four problem instance models, studied for the same problem size, for which the two metaheuristics exhibit intriguing and contrasting behaviours. The metaheuristics explored include a local search (LS) algorithm that chooses the best moves in the current neighbourhood, while a randomised local search (RLS) algorithm chooses the first move that does not lead to a worsening. The experimental results indicate that the right choice between these two algorithms depends heavily on the distribution of coefficients within the problem instance. This is also put further into context by finding optimal or near-optimal solutions using a mixed-integer linear programming problem solver. Since the facility location problem is a relatively simple example of a choice-and-assignment problem, similar phenomena are likely to be discovered in a number of other, possibly more complex computational problems in science and engineering.

1 Introduction

Many problems in science and engineering are widely regarded as computationally hard. Within operations research, these involve a number of planning, scheduling or production optimisation problems. Such problems include a variety of facility location [22] and supply chain optimisation problems [34], as well as shop scheduling problems [8], including job shop scheduling [4] and flow shop scheduling [33]. Typical application areas for solving this type of problems include product demand analysis [37], warehouse location [35], planning and scheduling technology design [41], portfolio selection for product development [40], container pick-up operations [19], or applications in preventive health care [25].

A variety of optimisation algorithms have been developed and experimentally verified over the last decades. This holds not only for real-world variants of these problems [11], but also for a wider range of NP-hard optimisation problems such as knapsack [23], resource allocation [30] or the -reachability problem [13].

The No free lunch theorems for optimisation have had a vital impact on the design of efficient algorithms to solve combinatorial optimisation problems [45]. The theorems have been interpreted from an algorithmic point of view [17], as well as in a more application-oriented context [26]. One of the implications is that approximation capabilities and runtime of algorithms for combinatorial optimisation problems are now increasingly interpreted in their relation to specific problem instances. This is particularly pronounced in theoretical literature on evolutionary computation [36].

In this paper, we present a computational study of local search strategies for the classical uncapacitated facility location problem [2, 16]. The problem is very well suited for this type of study not only for its NP-hardness [38], but also because of a wide range of its multiplicative, capacitated and dynamic variants [20, 39, 44]. In addition, the problem has very good scaling properties, as well as a relatively weakly constrained search space whose structural properties do not seem to change heavily with scaling. This makes it highly suitable for this computational investigation. Facility location also has important applications in production systems, operations design and management, as well as supply chain design [18].

Contributions.

In this research, we use two local search algorithms to solve the facility location problem and find that their relative performance depends heavily on the coefficients within the instance. To investigate this further, we extend a previous study of local search algorithms for very large instances of the problem, modelling large-scale applications such as customer service centre location [15].

The first algorithm is steepest descent local search (LS), which at each time step chooses the move to open or close a facility that yields the best objective value. The second algorithm is randomised local search (RLS), which attempts to open or close a facility and accepts the move whenever it does not lead to a worsening. These algorithms are well studied in evolutionary computation theory [36] and have been used to solve other combinatorial optimisation problems, e.g. the minimum conductance graph partitioning problem [14]. For each set of instances with up to potential facility sites, we also use a mixed-integer linear programming solver to compare the results of the local search algorithms to the distribution of the actual optima (or near-optimal solutions, in the case of very hard instances).

The experimental results are presented for four sets of problem instances, with each set containing instances with customers and numbers of facility sites ranging from to (for instances with up to facility sites we also used the ILP-based solver). These four sets do not differ in size but only in the distribution of values within the instances themselves. Intriguingly, we observe that the behaviour of LS and RLS depends heavily on the particular facility cost and distance model. Some of the observations contrast with each other, highlighting the tight coupling between the numerical properties of an instance and the choice of a suitable algorithm to solve it.

Unsurprisingly, the choice of the right local search strategy is indeed closely tied to the numerical properties of a particular instance. If these properties are known for a particular real-world application, they can be taken into account in an efficient algorithm design. Otherwise, our results provide evidence that simple and scalable strategies are suitable to explore unknown search landscapes. As the facility location problem belongs to a class of classical assignment and cost optimisation problems, we find it reasonable that the results can be generalised to other popular real-world combinatorial optimisation problems.

The paper is structured as follows. In Section 2, we introduce and review the uncapacitated facility location problem. In Section 3, we propose our four cost and distance models for instances of the problem, as well as the local search algorithms explored. Section 4 presents the experimental results and provides a brief discussion. Our conclusions are presented in Section 5.

2 The Uncapacitated Facility Location Problem

We will first formalise the problem as an integer linear program and provide an overview of related results on heuristics and metaheuristics for solving the problem at large scale. Next, we will point to the results underpinning the aim of this study. An illustration of the interpretation of the problem and the concepts used is depicted in Figure 1.

Figure 1: Illustration of the concepts and interpretation of the variables used in the formulation of the facility location problem in formulas (1) and (2).

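Let $S$ be a subset of selected potential facility locations. Let $f_i$ be the cost of facility $i$ and let $d_j(S) = \min_{i \in S} d_{ij}$ be the distance from a customer $j$ to the nearest facility in $S$. Then, the objective in the uncapacitated facility location problem will be to minimise the following objective function:

$$\min_{S} \; \sum_{i \in S} f_i + \sum_{j} d_j(S) \qquad (1)$$

It is possible to transform this formulation into an integer linear programming (ILP) formulation of the problem [3]. Let $n$ be the number of facilities and let $m$ be the number of customers. Then, alternatively, the objective is to solve the following ILP formulation of the problem:

$$\min \; \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} x_{ij} + \sum_{i=1}^{n} f_i y_i \qquad (2)$$

s.t.

$$\sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \dots, m \qquad (3)$$
$$x_{ij} \le y_i, \quad i = 1, \dots, n, \; j = 1, \dots, m \qquad (4)$$

where $c_{ij}$ is the cost of meeting the demand of customer $j$ from facility $i$, and $x_{ij}$ and $y_i$ are binary decision variables determining if facility $i$ is used to serve customer $j$ and if a facility is established at position $i$.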
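
To make formula (1) concrete, the following minimal Python sketch evaluates the objective for a given set of open facilities. The names `ufl_objective`, `facility_cost` and `dist`, as well as the toy data, are illustrative assumptions and do not come from the study; `dist[i][j]` is assumed to hold the distance from facility `i` to customer `j`.

```python
def ufl_objective(open_facilities, facility_cost, dist):
    """Opening cost of the chosen facilities plus, for every customer,
    the distance to its nearest open facility."""
    if not open_facilities:
        raise ValueError("at least one facility must be open")
    opening = sum(facility_cost[i] for i in open_facilities)
    n_customers = len(dist[0])
    service = sum(min(dist[i][j] for i in open_facilities)
                  for j in range(n_customers))
    return opening + service


# Hypothetical toy instance: 3 candidate facilities, 4 customers.
facility_cost = [1, 1, 1]
dist = [[1, 4, 5, 6],
        [5, 1, 2, 6],
        [6, 5, 1, 1]]

print(ufl_objective({0, 1, 2}, facility_cost, dist))  # all facilities open
print(ufl_objective({1, 2}, facility_cost, dist))     # facility 0 closed
```

In this toy instance, closing facility 0 saves one unit of facility cost but forces the first customer onto a more distant facility, so the objective value increases.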

With this exact formulation, an out-of-the-box mixed-integer linear programming solver can be used to solve the problem up to a certain scale. We have previously conducted a brief study of such an approach with a high number of customers [15]. As an alternative, specific branch-and-bound algorithms have also been proposed for the problem [43]. For larger instances, as with most NP-hard problems, one needs to use approximation algorithms or heuristics.

The approximation algorithms for the problem use quite specific features of the problem to find solutions of good quality. The best general approximation algorithm for the problem achieves approximation ratio [27]. The metric facility location problem is approximable within a factor of 1.488 [31]. However, it is NP-hard to approximate it within a factor better than 1.463 [21]. This documents that both for general and special cases of the problem, there is a gap in approximation results that is well-suited for exploration using heuristics.

Several optimisation algorithms have been applied to solve large scale problem instances. These range from local search algorithms to hybrid population-based approaches [1, 7]. However, little is still known about the choice of operators to explore the landscape and solve the problem efficiently. This seems to be the case even though the problem is relatively simple and easy to test computationally. An empirical comparison has previously been conducted, comparing genetic algorithms, tabu search and simulated annealing in solving the problem [5]. Interestingly but perhaps not surprisingly, this study finds that tabu search works best for most instances, while genetic algorithms or simulated annealing may perform very well in specific cases.

The ideas of local search for the problem have been analysed theoretically [6, 29]. Several tabu search approaches have been proposed to solve the problem [3, 5, 42], generally differing in more general versus more specific design and parameterisation. Genetic algorithms have also been used to solve the problem [28], as well as evolutionary simulated annealing [46], or particle swarm optimisation [24].

Many of these algorithms combine several ideas to solve the problem efficiently and usually incorporate a set of parameters with tuned values. However, the interplay of different components of these algorithms is often still not fully understood. This is the case not only for the facility location problem but also for a number of other combinatorial optimisation problems. We will investigate the performance of two local search algorithms for a set of carefully chosen instances. These will highlight the contrast between search space structures and their influence on the actual behaviour of different optimisation techniques. The aim is to establish how the behaviours of metaheuristics can vastly differ with only a slight variation in values within the problem instance. This will not only highlight the need for hybrid metaheuristics in these problems but will also strengthen the case for hybrid algorithms with a compact parameter suite.

3 Our Facility Cost and Distance Models and Local Search Algorithms

In this section, we introduce our four facility cost and distance models, as well as the algorithms we use to solve them.

3.1 Facility Cost and Distance Models

Each of the four problem models has specific quantitative properties, leading to contrasting search landscapes. In Model 1, all facilities have the same cost, while the distances follow a moderately varied uniform distribution. Model 2 works with binary facility costs and a bimodal distribution of distances, with many distant connections but a few close ones. Model 3 assumes both binary facility costs and binary distances. Last but not least, Model 4 works with facility costs and distances following a Poissonian distribution. For illustration, the probability distributions for occurrences of specific distance values in all four models are plotted in Figure 2. Similar plots are omitted for the distributions of facility costs, as these were very simple in all four models. These models were chosen because they lead to a number of different landscapes. For example, while Model 1 represents a moderately rugged landscape, Model 2 is expected to feature “spikes” and Model 3 is expected to lead to relatively large plateaus in the search space.

3.1.1 Model 1: Flat Facility Cost, Moderately Varied Random Distances

The first model will assign the same unit cost to all facilities. The distances will be represented by integers taken uniformly at random from a limited interval between and . This way plateaus will be generated in the search space. The corresponding values of and will be:

(5)
(6)
Figure 2: Illustrations of the probability distributions for customer-facility distance values in all four of our models (from Model 1 in the top left corner, to Model 4 in the bottom right corner).

3.1.2 Model 2: Binary Facility Cost, Bimodal Distribution of Distances

In the second model, we use a binary choice facility cost. Each facility costs either or units. The distances will follow a bimodal distribution. Similarly to the previous model, we will generate uniformly random integer values between and . However, each value higher than will be overwritten by the maximum possible value. This generates instances with many high distances, but also several low distances between customers and facilities. The values within an instance will be the following:

(7)
(8)

3.1.3 Model 3: Binary Facility Cost, Binary Distances

The facility cost structure in this model is the same as in the previous model. However, the distances will also be generated as or . This will lead to a relatively flat problem landscape. The aim will be to effectively search for a solution with as many facility cost and distance values equal to as possible.

(9)
(10)

3.1.4 Model 4: Flat Facility Cost, Poissonian Distribution of Distances

The last model is a modification of Model 1. The uniform facility cost will be used, with each facility costing a single unit. The distances will be taken from the Poissonian distribution , with trials and a probability of success per trial. The distance will be equal to the number of successful trials, incremented by , to avoid zero distance in case that all trials fail. This leads to a highly skewed distribution of distances. In our implementation, we choose and . The average distance will be , while there will also be many distances equal to and also a few distance values of or more. The parameters of the model are the following:

(11)
(12)
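
The four models described above can be sketched as simple instance generators. The following Python code is an illustration only: every numeric default (cost values, distance bounds, `cutoff`, `trials`, `p`, `offset`) is a hypothetical placeholder, not a setting taken from the study, and the function names are our own. Model 4 is sketched exactly as the text describes it, by counting successful trials; such a count is binomially distributed and approximates a Poisson distribution when the number of trials is large and the success probability is small.

```python
import random


def model1(n, m, rng, lo=1, hi=10):
    """Flat unit facility cost, uniform integer distances in [lo, hi]."""
    cost = [1] * n
    dist = [[rng.randint(lo, hi) for _ in range(m)] for _ in range(n)]
    return cost, dist


def model2(n, m, rng, costs=(1, 2), lo=1, hi=10, cutoff=3):
    """Binary facility cost; uniform distances where every value above
    `cutoff` is overwritten by the maximum `hi` (bimodal distances)."""
    cost = [rng.choice(costs) for _ in range(n)]
    dist = [[d if d <= cutoff else hi
             for d in (rng.randint(lo, hi) for _ in range(m))]
            for _ in range(n)]
    return cost, dist


def model3(n, m, rng, values=(1, 2)):
    """Binary facility cost and binary distances (same two values assumed)."""
    cost = [rng.choice(values) for _ in range(n)]
    dist = [[rng.choice(values) for _ in range(m)] for _ in range(n)]
    return cost, dist


def model4(n, m, rng, trials=20, p=0.1, offset=1):
    """Flat unit facility cost; distance = offset + number of successes
    among `trials` independent trials with success probability `p`."""
    cost = [1] * n
    dist = [[offset + sum(rng.random() < p for _ in range(trials))
             for _ in range(m)] for _ in range(n)]
    return cost, dist


rng = random.Random(42)  # fixed seed for reproducibility
for name, generate in [("Model 1", model1), ("Model 2", model2),
                       ("Model 3", model3), ("Model 4", model4)]:
    cost, dist = generate(3, 5, rng)
    print(name, cost, dist[0])
```

Passing the random generator explicitly keeps the instances reproducible, which matters when the same instance is solved many times by different algorithms.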

3.2 Local Search Algorithms

We use two local search algorithms that have been explored in our previous study on a similar large-scale model of customer service centre applications [15]. The algorithms have been chosen for their simplicity and scalability, as well as the fact that neither of them requires extensive explicit or implicit parameter tuning. This makes them highly suitable for our investigation of the implications of the No Free Lunch theorem in real-world applications. The algorithms also have somewhat complementary qualities, as supported by our further experiments. In these experiments, the comparative performance of LS and RLS surprisingly depends on the coefficients within the problem instance.

The first one is a systematic local search (LS) algorithm that, at each time step, chooses the move that leads to the largest drop in the objective value. The second algorithm is a randomised local search (RLS), which attempts opening or closing a single randomly chosen facility in each time step and accepts the move if it does not lead to a worsening. It is worth noting that algorithms of this type are well studied in evolutionary computation theory [36] and have been used to solve other optimisation problems, such as searching for bottlenecks in complex networks [14].

Local search (LS)

The algorithm will search in space of bit strings, i.e. in . Each bit determines whether facility is open or closed, similarly to the ILP formulation of the problem. The algorithm starts with all facilities open, i.e. with a bit string consisting solely of -bits. Each customer is then assigned to the closest facility. Let be the current bit string and let be the bit string obtained by flipping bit in . Then, LS chooses such that the objective value of is minimised within the neighbourhood. This leads to a steepest descent behaviour of the algorithm. In our investigations, LS will be terminated after iterations, where is the number of candidate facilities.

Randomised local search (RLS)

This algorithm also starts with all facilities open. Let be the current bit string and let be the bit string obtained by flipping bit in . Then, RLS chooses at random in each time step. The new bit string is accepted if the objective value of is not higher than the objective value of . In terms of the number of bit flips attempted, each iteration of LS corresponds to iterations of RLS. RLS will therefore be terminated after iterations in our further experiments.
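
The two procedures described above can be sketched in a few lines of Python. This is a naive illustration that recomputes the objective from scratch for every candidate flip, not the implementation used in the experiments. Several details are assumptions on our part: a bit string with no open facility is treated as infeasible (infinite objective), LS stops early once the best flip no longer strictly improves the objective, and the function names, iteration budgets and toy data are hypothetical.

```python
import math
import random


def objective(bits, cost, dist):
    """Facility cost of open sites plus nearest-open-facility distances."""
    open_idx = [i for i, b in enumerate(bits) if b]
    if not open_idx:
        return math.inf  # assumption: no open facility means infeasible
    return (sum(cost[i] for i in open_idx)
            + sum(min(dist[i][j] for i in open_idx)
                  for j in range(len(dist[0]))))


def flipped(bits, i):
    """Copy of `bits` with bit i flipped (facility i opened or closed)."""
    out = list(bits)
    out[i] = 1 - out[i]
    return out


def local_search(cost, dist, max_iters):
    """LS: in each iteration take the single-bit flip with the best value."""
    bits = [1] * len(cost)  # start with all facilities open
    value = objective(bits, cost, dist)
    for _ in range(max_iters):
        best_i = min(range(len(bits)),
                     key=lambda i: objective(flipped(bits, i), cost, dist))
        best_value = objective(flipped(bits, best_i), cost, dist)
        if best_value >= value:  # assumption: stop at a local optimum
            break
        bits, value = flipped(bits, best_i), best_value
    return bits, value


def randomised_local_search(cost, dist, max_iters, rng):
    """RLS: flip one random bit, accept if the objective does not worsen."""
    bits = [1] * len(cost)
    value = objective(bits, cost, dist)
    for _ in range(max_iters):
        candidate = flipped(bits, rng.randrange(len(bits)))
        candidate_value = objective(candidate, cost, dist)
        if candidate_value <= value:
            bits, value = candidate, candidate_value
    return bits, value


# Hypothetical toy instance: 3 candidate facilities, 4 customers.
cost = [5, 5, 5]
dist = [[1, 4, 5, 6],
        [5, 1, 2, 6],
        [6, 5, 1, 1]]
print(local_search(cost, dist, max_iters=3))
print(randomised_local_search(cost, dist, max_iters=50,
                              rng=random.Random(0)))
```

Because RLS accepts moves of equal objective value, it can drift across plateaus, whereas the LS variant sketched here halts as soon as no strictly improving flip exists.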

The simplicity of both LS and RLS will enable us to investigate the intricate interplay between the algorithm design and numerical properties of problem instance. This impacts both the choice of efficient local search and design of hybrid algorithms for a particular cost structure. One can intuitively expect this to be inherently tied to the properties of a particular real-world application. It is also worth noting that both algorithms can be implemented efficiently using lookups in precomputed data structures that can be used to recalculate the objective value for a potential bit flip. This requires storing the lists of customers assigned to each facility, as well as the mapping of customers to particular facilities. Such implementation options allow both algorithms to scale well to quite large problem instances [15].
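
The lookup idea mentioned above can be illustrated as follows. This Python sketch caches the customer-to-facility mapping and the list of customers per facility, and uses them to compute the change in the objective for a single bit flip without re-evaluating the whole solution. The class `FlipEvaluator` and its methods are hypothetical names, and the sketch only evaluates flips for a fixed current solution; it is not the data structure used in the study.

```python
import math


class FlipEvaluator:
    """Evaluates single-flip objective changes from cached assignments."""

    def __init__(self, cost, dist, open_set):
        self.cost, self.dist = cost, dist
        self.open = set(open_set)
        n_customers = len(dist[0])
        # Mapping: customer -> its nearest open facility.
        self.assigned = [min(self.open, key=lambda i: dist[i][j])
                         for j in range(n_customers)]
        # Lists of customers currently served by each facility.
        self.customers_of = {i: [] for i in range(len(cost))}
        for j, i in enumerate(self.assigned):
            self.customers_of[i].append(j)

    def delta_open(self, i):
        """Objective change if the currently closed facility i is opened."""
        saved = sum(max(0, self.dist[self.assigned[j]][j] - self.dist[i][j])
                    for j in range(len(self.assigned)))
        return self.cost[i] - saved

    def delta_close(self, i):
        """Objective change if the currently open facility i is closed."""
        others = self.open - {i}
        if not others:
            return math.inf  # closing the last facility is infeasible
        # Only customers served by facility i need to be reassigned.
        extra = sum(min(self.dist[k][j] for k in others) - self.dist[i][j]
                    for j in self.customers_of[i])
        return extra - self.cost[i]


# Hypothetical toy instance: 3 candidate facilities, 4 customers.
cost = [5, 5, 5]
dist = [[1, 4, 5, 6],
        [5, 1, 2, 6],
        [6, 5, 1, 1]]
all_open = FlipEvaluator(cost, dist, {0, 1, 2})
print([all_open.delta_close(i) for i in range(3)])
print(FlipEvaluator(cost, dist, {0, 2}).delta_open(1))
```

A negative value indicates an improving move. One possible refinement, not shown here, is to also cache the second-nearest open facility for each customer, so that closing a facility does not require scanning the remaining open facilities.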

4 Experimental Results and Discussion

In this section, we present the experimental results obtained. We first describe the experimental settings and then present the results for the four problem instance models. This is followed by a brief discussion and the potential impact of these findings on future investigations and designs of algorithms for the facility location problem.

4.1 Experimental Settings

For each of the facility cost and distance models, we generated 10 instances and ran LS and RLS 1000 times for each instance. We analyse the results from the perspectives of both the ability to estimate the distribution of optima, as well as the ability to sample high-quality solutions in “lucky runs”.

To determine actual optima or near-optimal solutions, we use the CBC mixed-integer programming branch-and-cut solver from the COIN-OR package [9, 32]. We use a precompiled 64-bit Windows binary of CBC compiled by the Intel 11.1 compiler. The time limit for CBC solution search was set to 24 hours per instance.

The experiments were performed on a machine with an Intel Core i7-6820 CPU @ 2.70 GHz, 32 GB RAM and Windows 10 operating system. LS and RLS were implemented in C++ using Qt, compiled by the MinGW 32-bit compiler.

Each run of LS was stopped after iterations, as in each iteration, a flip of each bit separately is attempted. For RLS, local search iterations were allowed, since a single bit is always flipped in each iteration.

4.2 Results for the Four Facility Cost and Distance Models

The experimental results are presented for each model separately using box-whisker plots to illustrate the distribution of solutions found by LS and RLS. For smaller instances, the distribution of optimal or near-optimal solutions and lower bounds will also be presented. This will allow us to estimate the actual approximation capabilities of LS and RLS in these costing and distance models. In addition, we will investigate the growth of computational needs to solve the problem exactly in all of the models studied. The results will then be discussed and all the techniques investigated will be compared.

4.2.1 Model 1: Flat Facility Cost, Moderately Varied Random Distances

Figure 3 presents the box-whisker plots obtained for the instances generated by Model 1. For these instances, RLS outperforms LS. This can be explained by the possibility that closing or opening the facility that yields the best objective value at the moment does not necessarily lead to the best moves in the future. Such a possibility is linked to the randomised distance structure and the presence of potential local optima. CBC was able to find optima for instances with up to customers, as well as for most instances with customers. One can observe that the gap between the typical performances of LS and RLS tends to widen with growing instance size. However, this seems to hold also for the gap between the performance of RLS and the actual optima. This suggests that more advanced techniques could offer some improvement in this model, including tabu search [42], population-based local search or evolutionary algorithms [28].

Figure 3: Box-whisker plots obtained for problem instances generated according to Model 1. LS and RLS were used 1000 times per instance, while the results for CBC represent optimal or near-optimal reference solutions obtained by the corresponding ILP solving procedure.
Figure 4: A plot depicting the relation of the performance of LS, RLS and the actual optimal or near-optimal solutions found by CBC for Model 1. These results are grouped according to the problem instance. Each of the gray lines represents the objective values found by a run of LS, RLS and the objective value found by CBC.
Figure 5: Box-whisker plots obtained for problem instances generated according to Model 2. LS and RLS were used 1000 times per instance, while the results for CBC represent optimal or near-optimal reference solutions obtained by the corresponding ILP solving procedure.
Figure 6: A plot depicting the relation of the performance of LS, RLS and the actual optimal or near-optimal solutions found by CBC for Model 2. These results are grouped according to the problem instance. Each of the gray lines represents the objective values found by a run of LS, RLS and the objective value found by CBC.

Figure 4 sheds a bit more light on the relations between the performances of different techniques, grouped by instances. Each line in the plot connects results for the corresponding runs of LS, RLS and CBC, linking the solutions found by the local search algorithms to the actual optimal or near-optimal solutions. One can observe an overall “downward” trend from LS to RLS, confirming the better performance of RLS also on the instance level. However, one can also observe a widening gap between the individual results of RLS and CBC. This needs to be interpreted in context: the fact that the performance of an individual run of LS or RLS is inferior does not necessarily mean that a multi-start variant of the algorithm cannot succeed. This can be seen in particular for instances. While there is a gap between individual performances of RLS and CBC, the box-whisker plot indicates that “lucky” runs of RLS are still not far away from the results of CBC. The question that remains open is how many restarts such an algorithm would require to be successful and whether a hybridisation of the algorithm would be more suitable.

4.2.2 Model 2: Binary Facility Cost, Bimodal Distribution of Distances

Figure 5 illustrates the results obtained for Model 2 as box-whisker plots. These reveal a contrasting pattern to the one obtained for Model 1. LS outperforms RLS for these instances. At this point, it is worth noting that we have only changed the cost and distance structure. Such a change already leads to a very different result from the one observed for the previous instances. Observing the distributions of results found by LS and RLS, one can see that the gap even widens with a growing number of facilities. However, one can also observe a difference between the median performance of LS and the median objective value of optimal or near-optimal solutions.

However, Figure 6 does not seem to indicate a pronounced slope in the relations between objective values observed for high-quality runs of LS and the results found by CBC. It also reveals a very clear pattern of clusters, corresponding to individual problem instances. It seems that for this type of instance, a multi-start variant of the LS algorithm is a good choice.

4.2.3 Model 3: Binary Facility Cost, Binary Distances

Figure 7 reveals the results obtained for Model 3. These paint a more complex picture than those obtained for the previous instances. This is further supported by the results of CBC, which was able to find proven optima only for the instances with facilities. For larger instances, near-optimal solutions were found.

For these instances, the performance of LS is very intriguing. One can observe that its median solution quality is better than that of RLS. In terms of shape, the distribution of solution quality obtained by LS seems to estimate the distribution of optima relatively well. However, one can also observe a consistent bias in this estimate. The results obtained by LS are heavily concentrated around the median, similarly to the distribution of the actual optimal or near-optimal solutions.

Figure 7: Box-whisker plots obtained for problem instances generated according to Model 3. LS and RLS were used 1000 times per instance, while the results for CBC represent optimal or near-optimal reference solutions obtained by the corresponding ILP solving procedure.
Figure 8: A plot depicting the relation of the performance of LS, RLS and the actual optimal or near-optimal solutions found by CBC for Model 3. These results are grouped according to the problem instance. Each of the gray lines represents the objective values found by a run of LS, RLS and the objective value found by CBC.
Figure 9: Box-whisker plots obtained for problem instances generated according to Model 4. LS and RLS were used 1000 times per instance, while the results for CBC represent optimal or near-optimal reference solutions obtained by the corresponding ILP solving procedure.
Figure 10: A plot depicting the relation of the performance of LS, RLS and the actual optimal or near-optimal solutions found by CBC for Model 4. These results are grouped according to the problem instance. Each of the gray lines represents the objective values found by a run of LS, RLS and the objective value found by CBC.

A surprising observation is that while RLS often has inferior performance, it can actually outperform LS at times due to its randomised nature. This can be observed in Figure 8. The triangular shape of the structures indicates that RLS is often far from high-quality solutions. However, one can also observe occasional high-quality runs, such as the solution with the optimal objective value for one of the instances, that LS failed to produce.

Model 3 represents an instance type, for which LS is a good choice to rapidly obtain a good but not necessarily optimal solution. LS estimates the distribution of optima better but with a consistent bias, possibly guaranteeing a good approximation but getting frequently stuck in local optima. If one needs to obtain a near-optimal solution, then RLS might be a better choice of an algorithm due to its randomised nature and better robustness. In summary, this seems to be a model, for which a suitable hybrid of both LS and RLS may work quite well.

4.2.4 Model 4: Flat Facility Cost, Poissonian Distribution of Distances

Figure 9 presents the results for the last model in the form of box-whisker plots. This model seems to be the most intriguing one out of the four models investigated. In contrast to Model 3, it is RLS, for which the results are concentrated around the median. However, RLS still seems to perform better than LS in most cases, even though the results indicate that it also has a tendency to get stuck in local optima. It is also worth noting that CBC was only able to find near-optimal solutions for these instances for the sizes we investigated, including the smallest instances with facilities. The near-optimal solutions sampled have a very flat distribution. In this context, RLS seems to be better at estimating these distributions, albeit with a bias.

In Figure 10, we present a more detailed picture of the performances of the algorithms. The patterns reveal that in most cases, LS and RLS produce solutions of comparable quality. RLS is therefore likely to sample solutions with better objective values more frequently, thus, obtaining a better median performance. However, for the instances, one can also observe that while CBC consistently produced solutions with objective value , RLS was able to provide a better solution with objective value , standing out from the corresponding plot. This is also in line with the relatively distant lower bound distribution sampled. This reveals that for Model 4, hybrid algorithms incorporating RLS may be a good choice of problem solving techniques.

4.3 Discussion

The previous results have indeed painted a very complex picture of problem difficulty landscape and algorithm performance. On a more fine-grained level, we have obtained four partially or heavily contrasting instance models and algorithm behaviours. For simplicity, we summarise our findings in the following:

  • Model 1. We found that a hybrid or a multi-start algorithm using RLS would be a good technique for this type of instance. The use of a solution polishing feature of an out-of-the-box ILP solver could also be a good choice, given the results obtained by CBC. This idea was previously successfully used to solve the minimum weight dominating set problem [10].

  • Model 2. For this model, we found that a multi-start variant of LS would likely be the best choice of an algorithm.

  • Model 3. This model showed much more complex patterns of landscape and algorithm behaviour. However, it seems that a multi-start variant of RLS and possibly its hybrid could work well for these instances.

  • Model 4. For the Poissonian distance model, we obtained a very flat picture of algorithm behaviours. However, we observed a phenomenon of “very lucky” runs of RLS, which may provide even better solutions than CBC does in 24 hours of systematic search. For such an instance type, hybrid algorithms based on RLS seem to be well-suited.

Instance   Average       Average       Result *   Average
model      upper bound   lower bound              CPU time
Model 1    1038.4        1038.4        OPT        101 s
           1033.8        1033.8        OPT        1437 s
           1032.4        1032.2        TLE (1)    27557 s
           1030          1027.3        TLE (9)    82238 s
           1030.1        1025          TLE (10)
Model 2    1100.8        1100.8        OPT        45 s
           1058.6        1058.6        OPT        254 s
           1052.3        1052.3        OPT        1175 s
           1042.1        1041.9        TLE (1)    24506 s
           1041.4        1040          TLE (4)    55092 s
Model 3    1006.9        1006.9        OPT        20002 s
           1007          1006.4        TLE (5)    65511 s
           1007          1006          TLE (8)    81083 s
           1007          1005.6        TLE (8)    81369 s
           1006.9        1005.1        TLE (10)
Model 4    1010          1007.6        TLE (10)
           1009.9        1007          TLE (10)
           1009.9        1006.4        TLE (10)
           1009.9        1006          TLE (10)
           1009.9        1005.9        TLE (10)

* Result values represent the following:
OPT - optimal solution found for all instances generated,
TLE - time limit exceeded and near-optimal solution found for at least one instance (the value in parentheses represents the number of instances with TLE result).

Table 1: Detailed results obtained for each of the instance models by CBC with a 24-hour time limit. The results presented are averages over 10 instances for each configuration.
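
One way to read Table 1 is through the relative gap between the average upper and lower bounds, which indicates how far CBC remained from proving optimality. The short Python sketch below computes this gap for the first and last row listed for each model. The helper name `relative_gap` is ours, and the gap is taken between the reported averages rather than averaged per instance, which is a simplifying assumption.

```python
def relative_gap(upper, lower):
    """Relative gap between an upper and a lower bound on the optimum."""
    return (upper - lower) / upper


# (model, average upper bound, average lower bound), copied from Table 1:
# first and last row listed for each model.
rows = [
    ("Model 1", 1038.4, 1038.4), ("Model 1", 1030.1, 1025.0),
    ("Model 2", 1100.8, 1100.8), ("Model 2", 1041.4, 1040.0),
    ("Model 3", 1006.9, 1006.9), ("Model 3", 1006.9, 1005.1),
    ("Model 4", 1010.0, 1007.6), ("Model 4", 1009.9, 1005.9),
]

for model, upper, lower in rows:
    print(f"{model}: UB={upper}, LB={lower}, "
          f"gap={100 * relative_gap(upper, lower):.3f} %")
```

All gaps computed this way stay below one percent, yet only Model 4 shows a nonzero gap in its first listed row, in line with the observation that CBC did not prove optimality for any Model 4 configuration.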

It is worth pointing out that these results are not particularly surprising in the context of the No Free Lunch theorem [17, 26, 45]. However, their implications for the design of efficient algorithms are profound for solving related problems in real-world scenarios. Such a case becomes even more pressing if generalisation and adaptability to instances with unknown numerical properties is desired. This includes popular problems such as job shop scheduling [4], flow shop scheduling problems [33], variants of knapsack problems [23] or container pick-up problems [19].

The way out of this vicious circle of metaheuristic algorithm design is two-fold. Firstly, one can argue that a reliable metaheuristic for a real-world problem can only be designed if specific structural and numerical properties of the instances to be encountered are known in advance. This is the case for several real-world applications, for which problem-specific knowledge may be used to describe the expected numerical structure of the instance. Alternatively, if a solution of only a certain quality is expected, then it is desirable to use as simple an algorithm as possible. This has several advantages, including scalability and a minimised need for parameter tuning. However, the results we obtained for LS and RLS also suggest that combining several such algorithms may be a good way forward.

It has previously been reported that tabu search is the most universally successful technique for solving the facility location problem, providing better results than genetic algorithms and simulated annealing in most cases [5]. However, as tabu search can be based on both systematic and randomised local search strategies, an interesting open problem is an extension of our investigations to tabu search algorithms based on LS and RLS. In addition, it is known that the tabu tenure, i.e. the tabu list size, is crucially important for the performance of metaheuristics in combinatorial optimisation. It may even decide whether the algorithm converges or gets stuck with an overwhelming probability [12]. We therefore believe this study is an important step towards a different approach to performance investigation for metaheuristics. This also has an impact on their design, either in its numerically focused or exploratory form.

5 Conclusions

We proposed four cost and distance models for uncapacitated facility location problem instances. These models enabled us to uncover a complex relation between the numerical properties of a problem instance and the efficiency of the optimisation algorithms used to solve the problem. Since facility location is a simple example of a scalable choice-and-assignment optimisation problem, these phenomena are likely to be observed also in more complex optimisation problems in science and engineering.

An investigation of the efficiency of two metaheuristic algorithms was presented, namely systematic local search (LS) and randomised local search (RLS). An out-of-the-box mixed-integer linear programming solver was also used to obtain reference results and determine how far LS and RLS are from the optimal or near-optimal solutions. In the experimental results, one can observe a rich variety of behaviours that is preserved even though the algorithm performance was investigated for instances of the same size. RLS outperformed LS for Model 1, while this was reversed in Model 2. Models 3 and 4 show even more intricate landscape properties and behaviours of LS and RLS, as well as the exact solver.
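The difference between the two move rules can be sketched as follows: LS evaluates the whole neighbourhood and takes the best move, while RLS samples one move and keeps it if it does not worsen the solution. The sketch assumes a neighbourhood that opens or closes a single facility, and the names `cost`, `ls_step`, `rls_step` and `run_ls`, as well as the toy instance, are hypothetical. It is an illustration of the two strategies, not the implementation evaluated in this study.

```python
import random

def cost(open_set, fixed, dist):
    """Opening costs plus each customer's distance to its nearest open facility."""
    if not open_set:
        return float("inf")  # at least one facility must be open
    opening = sum(fixed[i] for i in open_set)
    return opening + sum(min(row[i] for i in open_set) for row in dist)

def ls_step(open_set, fixed, dist):
    """Systematic move: evaluate every single-facility flip and return the best."""
    neighbours = [open_set ^ {i} for i in range(len(fixed))]
    return min(neighbours, key=lambda s: cost(s, fixed, dist))

def rls_step(open_set, fixed, dist, rng):
    """Randomised move: try one random flip and keep it only if it is not worse."""
    candidate = open_set ^ {rng.randrange(len(fixed))}
    if cost(candidate, fixed, dist) <= cost(open_set, fixed, dist):
        return candidate
    return open_set

def run_ls(start, fixed, dist):
    """Repeat best-improvement steps until no neighbour is strictly better."""
    current = frozenset(start)
    while True:
        best = ls_step(current, fixed, dist)
        if cost(best, fixed, dist) >= cost(current, fixed, dist):
            return current
        current = best

# Hypothetical toy instance: dist[j][i] is the distance from customer j to facility i.
fixed = [3, 3, 10]
dist = [[1, 5, 2], [5, 1, 2], [1, 5, 2], [5, 1, 2]]

ls_result = run_ls({0}, fixed, dist)
rng = random.Random(1)
rls_result = frozenset({0})
for _ in range(100):
    rls_result = rls_step(rls_result, fixed, dist, rng)
print(sorted(ls_result), sorted(rls_result))
```

Each LS step costs one evaluation per facility, whereas an RLS step costs a single evaluation, so the two strategies trade move quality against the number of moves that fit into the same time budget.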

Based on the results obtained by LS and RLS, one can conclude that the choice of the right local search strategy for a particular model is closely related to its internal numerical properties. As a consequence, efficient state-of-the-art algorithms for each model would look somewhat different in design. This strengthens the case for the design of hybrid metaheuristics for this type of problem, but also highlights the need for the numerical properties of a particular instance to be taken into account in such a design. In some real-world applications, these properties may be well-known intuitively. However, for applications where these properties are unknown, this shows a pressing need for simple and scalable algorithms with a compact set of parameters.

Our results can likely be generalised to other related assignment problems and combinatorial problems in general, such as job shop and flow shop scheduling, knapsack problems or pallet stacking. We believe that these results may motivate a somewhat new bottom-up approach to metaheuristic design and investigation, especially in relation to hybrid metaheuristics.

References

  • [1] E. H. Aarts and J. K. Lenstra. Local search in combinatorial optimization. Princeton University Press, 1997.
  • [2] C. H. Aikens. Facility location models for distribution planning. European Journal of Operational Research, 22(3):263–279, 1985.
  • [3] K. S. Al-Sultan and M. A. Al-Fawzan. A tabu search approach to the uncapacitated facility location problem. Annals of Operations Research, 86:91–103, 1999.
  • [4] D. Applegate and W. Cook. A computational study of the job-shop scheduling problem. ORSA Journal on Computing, 3(2):149–156, 1991.
  • [5] M. A. Arostegui, S. N. Kadipasaoglu, and B. M. Khumawala. An empirical comparison of tabu search, simulated annealing, and genetic algorithms for facilities location problems. International Journal of Production Economics, 103(2):742–754, 2006.
  • [6] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544–562, 2004.
  • [7] C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys (CSUR), 35(3):268–308, 2003.
  • [8] C. Blum and M. Sampels. An ant colony optimization algorithm for shop scheduling problems. Journal of Mathematical Modelling and Algorithms, 3(3):285–308, 2004.
  • [9] P. Bonami, L. T. Biegler, A. R. Conn, G. Cornuéjols, I. E. Grossmann, C. D. Laird, J. Lee, A. Lodi, F. Margot, N. Sawaya, et al. An algorithmic framework for convex mixed integer nonlinear programs. Discrete Optimization, 5(2):186–204, 2008.
  • [10] S. Bouamama and C. Blum. A hybrid algorithmic model for the minimum weight dominating set problem. Simulation Modelling Practice and Theory, 64:57–68, 2016.
  • [11] C. Canel, B. M. Khumawala, J. Law, and A. Loh. An algorithm for the capacitated, multi-commodity multi-period facility location problem. Computers & Operations Research, 28(5):411–427, 2001.
  • [12] D. Chalupa. On transitions in the behaviour of tabu search algorithm TabuCol for graph colouring. Journal of Experimental & Theoretical Artificial Intelligence, pages 1–17, 2017.
  • [13] D. Chalupa and C. Blum. Mining k-reachable sets in real-world networks using domination in shortcut graphs. Journal of Computational Science, 22:1–14, 2017.
  • [14] D. Chalupa, K. A. Hawick, and J. A. Walker. Hybrid bridge-based memetic algorithms for finding bottlenecks in complex networks. Technical Report CSI-0076, Computer Science, University of Hull, Cottingham Road, Hull, HU6 7RX, August 2017.
  • [15] D. Chalupa and P. Nielsen. A large-scale customer-facility network model for customer service centre location applications. Technical report, Operations Research, Aalborg University, Fibigerstræde 16, 9220 Aalborg, October 2017.
  • [16] G. Cornuéjols, G. L. Nemhauser, and L. A. Wolsey. The uncapacitated facility location problem. Technical report, Carnegie-Mellon University, Pittsburgh, PA, Management Sciences Research Group, 1983.
  • [17] J. C. Culberson. On the futility of blind search: An algorithmic view of “no free lunch”. Evolutionary Computation, 6(2):109–127, 1998.
  • [18] M. S. Daskin, L. V. Snyder, and R. T. Berger. Facility location in supply chain design. Logistics systems: Design and optimization, pages 39–65, 2005.
  • [19] N. A. D. Do, I. E. Nielsen, G. Chen, and P. Nielsen. A simulation-based genetic algorithm approach for reducing emissions from import container pick-up operation at container terminal. Annals of Operations Research, 242(2):285–301, 2016.
  • [20] R. Z. Farahani, M. Abedian, and S. Sharahi. Dynamic facility location problem. In Facility Location, pages 347–372. Springer, 2009.
  • [21] S. Guha and S. Khuller. Greedy strikes back: Improved facility location algorithms. Journal of Algorithms, 31(1):228–248, 1999.
  • [22] L. L. Gao and E. P. Robinson. A dual-based optimization procedure for the two-echelon uncapacitated facility location problem. Naval Research Logistics (NRL), 39(2):191–212, 1992.
  • [23] C. García-Martínez, F. J. Rodriguez, and M. Lozano. Tabu-enhanced iterated greedy algorithm: a case study in the quadratic multiple knapsack problem. European Journal of Operational Research, 232(3):454–463, 2014.
  • [24] A. R. Guner and M. Sevkli. A discrete particle swarm optimization algorithm for uncapacitated facility location problem. Journal of Artificial Evolution and Applications, 2008, 2008.
  • [25] K. Haase and S. Müller. Insights into clients’ choice in preventive health care facility location planning. OR Spectrum, 37(1):273–291, Jan 2015.
  • [26] Y. C. Ho and D. L. Pepyne. Simple explanation of the no-free-lunch theorem and its implications. Journal of Optimization Theory and Applications, 115(3):549–570, 2002.
  • [27] D. S. Hochbaum. Heuristics for the fixed cost median problem. Mathematical Programming, 22(1):148–162, Dec 1982.
  • [28] J. H. Jaramillo, J. Bhadury, and R. Batta. On the use of genetic algorithms to solve location problems. Computers & Operations Research, 29(6):761–779, 2002.
  • [29] M. R. Korupolu, C. G. Plaxton, and R. Rajaraman. Analysis of a local search heuristic for facility location problems. Journal of Algorithms, 37(1):146–188, 2000.
  • [30] Z. J. Lee and C. Y. Lee. A hybrid search algorithm with heuristics for resource allocation problem. Information Sciences, 173(1):155–167, 2005.
  • [31] S. Li. A 1.488 Approximation Algorithm for the Uncapacitated Facility Location Problem, pages 77–88. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.
  • [32] J. T. Linderoth and A. Lodi. MILP software. Wiley encyclopedia of operations research and management science, 2011.
  • [33] R. Linn and W. Zhang. Hybrid flow shop scheduling: a survey. Computers & Industrial Engineering, 37(1-2):57–61, 1999.
  • [34] M. T. Melo, S. Nickel, and F. Saldanha-Da-Gama. Facility location and supply chain management–a review. European Journal of Operational Research, 196(2):401–412, 2009.
  • [35] L. Michel and P. Van Hentenryck. A simple tabu search for warehouse location. European Journal of Operational Research, 157(3):576–591, 2004.
  • [36] F. Neumann and C. Witt. Bioinspired computation in combinatorial optimization: algorithms and their computational complexity. Springer, 2010.
  • [37] P. Nielsen, I. Nielsen, and K. Steger-Jensen. Analyzing and evaluating product demand interdependencies. Computers in Industry, 61(9):869–876, 2010.
  • [38] N. Megiddo and A. Tamir. On the complexity of locating linear facilities in the plane. Operations Research Letters, 1(5):194–197, 1982.
  • [39] H. Pirkul and V. Jayaraman. A multi-commodity, multi-plant, capacitated facility location problem: formulation and efficient heuristic solution. Computers & Operations Research, 25(10):869–878, 1998.
  • [40] M. Relich and P. Pawlewski. A multi-agent system for selecting portfolio of new product development projects. Communications in Computer and Information Science, 524:102–114, 2015.
  • [41] K. Steger-Jensen, H. H. Hvolby, P. Nielsen, and I. Nielsen. Advanced planning and scheduling technology. Production Planning & Control, 22(8):800–808, 2011.
  • [42] M. Sun. Solving the uncapacitated facility location problem using tabu search. Computers & Operations Research, 33(9):2563–2589, 2006.
  • [43] D. W. Tcha and B. I. Lee. A branch-and-bound algorithm for the multi-level uncapacitated facility location problem. European Journal of Operational Research, 18(1):35–43, 1984.
  • [44] J. Wollenweber. A multi-stage facility location problem with staircase costs and splitting of commodities: model, heuristic approach and application. OR Spectrum, 30(4):655–673, Oct 2008.
  • [45] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.
  • [46] V. Yigit, M. E. Aydin, and O. Turkbey. Solving large-scale uncapacitated facility location problems with evolutionary simulated annealing. International Journal of Production Research, 44(22):4773–4791, 2006.