An interacting replica approach applied to the traveling salesman problem

06/27/2014 ∙ by Bo Sun, et al. ∙ Washington University in St Louis 0

We present a physics inspired heuristic method for solving combinatorial optimization problems. Our approach is specifically motivated by the desire to avoid trapping in metastable local minima- a common occurrence in hard problems with multiple extrema. Our method involves (i) coupling otherwise independent simulations of a system ("replicas") via geometrical distances as well as (ii) probabilistic inference applied to the solutions found by individual replicas. The ensemble of replicas evolves as to maximize the inter-replica correlation while simultaneously minimize the local intra-replica cost function (e.g., the total path length in the Traveling Salesman Problem within each replica). We demonstrate how our method improves the performance of rudimentary local optimization schemes long applied to the NP hard Traveling Salesman Problem. In particular, we apply our method to the well-known "k-opt" algorithm and examine two particular cases- k=2 and k=3. With the aid of geometrical coupling alone, we are able to determine for the optimum tour length on systems up to 280 cities (an order of magnitude larger than the largest systems typically solved by the bare k=3 opt). The probabilistic replica-based inference approach improves k-opt even further and determines the optimal solution of a problem with 318 cities and find tours whose total length is close to that of the optimal solutions for other systems with a larger number of cities.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 5

page 9

page 10

page 11

page 12

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Combinatorial optimization problems [1] often possess a relatively large number of locally optimal pseudo-solutions, similar to the abundance of metastable energy states in complex physical systems. This can make determination of the global optimum difficult, especially for heuristic algorithms which attempt to optimize a cost function locally (i.e., by iteratively perturbing the parameters, testing the resulting change in the cost function, and allowing the state change if the cost function is decreased or some other conditions are satisfied).

The concept of a local optimizer acting on some cost function in parameter space is equivalent to modeling a thermal system exploring its energy landscape. At a certain temperature, a thermal system can realistically exchange a certain amount of heat with the environment. While the system is generally attempting to find the lowest energy state, it can temporarily gain energy, and, in so doing escape from a local energy well. However, if the well is deeper than the realistically allowable energy gain, then the system may remain stuck in a metastable, locally optimal energy state indefinitely. This is what happens in spin glasses [2], for instance.

Analogously, if a local well of the cost function in parameter space is deeper than the realistically allowable positive gain in the cost function, then the simulation of a local optimizer will remain stuck, creating a design tradeoff. The greater potential gain allowed in the cost function, the easier it is for the simulation to escape potential wells, but the longer it will take to actually find a minimum because it will have a larger search space at each step, and it will be moving “uphill” more often. This is the reason that heuristic algorithms can generally be relied upon to produce good pseudo-solutions a few percent above the optimum value, but rarely find the actual global optimum in sufficiently complex problems.

Previous methods such as replica exchange [3]

and genetic algorithms

[4] have attempted to address this problem with varying degrees of efficacy depending on context. We were inspired to use “information-based replica” correlations and inference to systematically detect ideal subgraph (or “community”) partitions of a large graph as two of us have done several years ago [5]. By “replicas” we here allude to independent copies of the same problem. Since then these notions have been applied to a variety of complex system physics (both static and dynamic) and image segmentation problems [6, 7, 8, 9]. More recently, other works applied similar notions to a host of interesting problems [11, 12, 13]. Free-energies and entropies of such ensembles or “multiplexes” have been discussed in [5, 14]. In our approach we do not focus solely on directly extracting the minimum amongst an ensemble of solvers. A key ingredient that we introduced in earlier work is the use of inference to predict which features of the solutions appearing in disparate replicas may coincide with those in the optimal solution; this inference coupling as well as other effective “interactions” between the replicas (e.g., a “geometrical coupling” discussed below) between the replicas may lead to solutions that at intermediate steps elevate the energy (yet lower a “free energy” [5]) similar to the way in which thermal effects may, at intermediate steps, elevate energies in annealing algorithms. Transitions in the complexity of combinatorial optimization problems such community detection problems have crisp signatures in inter-replica correlation functions and information theory measures[10]. Augmenting Refs. [5, 6, 7, 8, 9, 10], we further also note the more recent work of Ref. [15] in which the authors demonstrate that the inference algorithms based on evolving interactions between replicated solutions in a cavity type approach have better performance in the binary Ising percepton problem. We further note that the “wisdom of the crowds” (which we took in Refs. [5, 6, 7, 8, 9, 10] to be independent replicas) has been long appreciated [16].

The approach that we further develop in the current work- that of geometric interactions between individual solvers and probabilistic ensemble inference- emulates the biological and sociological advantages long known colloquially from collective behaviors and “wisdom of the crowds” [16]. Historically, biologically inspired “swarm intelligence” algorithms [17] have spawned algorithms such as the well known Ant Colony System (ACS) [18] with which we will later compare the new probabilistic variant of our method. In a broad sense, the spin-glass physics cavity approximation inspired message type algorithms of the type of Ref. [15], exchange Monte Carlo [3], genetic algorithms [4], the work that two of us developed in Refs. [5, 6, 7, 8, 9, 10], the ACS, and a multitude of other approaches might all be cast as particular realizations of broad ensemble based interactions or moves.

In the current work, we present new optimization methods based upon the concepts of (i) geometrical distance coupling (GDC) and (ii) inference amongst independent replicas. We apply these tools to the Traveling Salesman Problem. We demonstrate that while a single replica can do quite poorly in solving a challenging problem, via the use of (i) and (ii) above, the ensemble of replicas can address much harder problems.

We employ a quantity, which we term the “GDC distance” (see Appendix, Eq. (A2) in particular) to measure the similarity between two tours A and B. A distance C(A,B) =0 indicates that tours A and B are identical. We coupled otherwise independent optimizers via their geometrical distances, so that the optimizers will essentially have two “forces” influencing their behavior, see Fig. 1. Each optimizer will (i) independently desire to decrease its cost function locally, while simultaneously attempting to (ii) minimize the GDC between itself and all other optimizers (portrayed by the harmonic springs in Fig. 1). We demonstrate that through this coupling, local optimizers can escape from wells which otherwise would have confined them permanently, in the cases we studied.

Fig. 1: (Color online.) Coupled replicas in a high dimensional energy landscape. The springs schematically represent the tendency of replicas to collectively interact with one another when veering towards viable minima.

An additional central tool that we will invoke in this work is that of probabilstic inference from the different replicas, e.g., [5, 8] or a “replica inference based” (RIB) method. In the simplest rendition of this approach, we simply count how many times a given feature of the solution appears in the different replicas. If a structure of the solution (e.g. in the travelling salesman problem that we will discuss in this work, a tour sequentially passing through the same three cities) is common to all or many solvers then one may anticipate this structure appears in the optimal global solution. That is, augmenting the GDC discussed above, the replicas interact effectively with one another via their correlations. By sequentially finding common features in independent copies or replicas of the problem and assuming these to be correct and left untouched, the system to be examined is sequentially made smaller and easier to solve anew. Both of the approaches that we will employ in this work may be viewed as emulating the minimization of an effective multi-replica “free-energy”. Schematically, as in Ref. [5], we may consider an effective free-energy given by

(1)

where are the collection of coordinates that describe solver (replica) , the quantities are the energies of contending solutions in the disparate replicas, is an inter-replica correlation functional, and sets a relative weight between the sum of the intra-replica contributions () and the inter-replica correlations (that may, e.g., imitate the GDC, RIB or other couplings). The detailed iterative minimization procedures that we describe in this work for the Traveling Salesman Problem are particular simple examples of the more general idea embodied by the minimization of the replica ensemble functional of Eq. (1). The springs in Fig. 1 symbolize inter-replica effects. For finite , both intra-replica and inter-replica effects must be assuaged when minimizing .

Ii Traveling salesman problem

The Traveling Salesman Problem (TSP) is one of the most famous problems in the entire field of combinatorial optimization. It is often used as a benchmark for testing new optimization approaches. The TSP is an NP-hard problem, and as such, no algorithm has been discovered which can solve it in polynomial time. The problem is defined as follows:

Given a set of N cities, find the shortest tour which visits each city exactly once and returns to the starting city.

Because the best exact methods [19, 20] for solving the TSP take an amount of time which increases faster than a polynomial function of , numerous heuristic algorithms have been proposed which run much faster, but they fail to guarantee optimality primarily because of the drawbacks mentioned above.

There are roughly three classes of algorithms. The first class is the greedy heuristic which gradually forms a tour by adding a new city at each step, such as the Nearest-Neighbor algorithm [21]. In our method, we use the Nearest-Neighbor algorithm to initialize a candidate tour construction. Briefly, the Nearest-Neighbor algorithm is given by three steps: () Select a random city. () Find the nearest unvisited city and go there. () Check if any unvisited cities are left? If yes, go to step . If no, return to the first city.

The second class of heuristic TSP algorithms is a tour improvement approach. A typical example is given by the -opt algorithm which seeks to iteratively improve the current state of a tour by removing edges and replacing them in the most optimal way by random searching on a limited number of cities at a time. A famous local search algorithm by Lin-Kernighan (LK) [22] belongs to this class. The LK algorithm is a variable -opt algorithm which decides which is the most suitable at each iteration step. This makes the algorithm quite complex, and few have been able improve it. A more in-depth study of the LK algorithm with possible improvements was made by Helsgaun (LKH) [23].

The third class of heuristic methods is a composite algorithm that combines the features of the former two. A good example can be found in Dorigo and Gambardella [24] where the authors combined the ant colony system (ACS) [18] with the -opt method to achieve strong results. Here the ACS acts like a more sophisticated tour construction algorithm which allows communication (pheromones left by individual ants) between different ant solvers. -opt is the local optimizer which helps to optimize the results obtained with ACS.

Iii Geometrical distance coupling algorithm

As noted above, we use the geometric distance coupling between different solvers (or replicas) to enhance the solutions found by individual solvers. To demonstrate the strength of our approach and the degree to which coupling between replicas can dramatically improve the results, we apply our method on the bare k-opt algorithm. On their own, sans the use of replicas, the k=2 or 3 opt-algorithms have a very poor performance; this makes the improvement using our replica based approach very clear. Our GDC replica based approach may, in principle, be applied to any algorithm (not solely k-opt). The GDC algorithm is implemented as follows:

  1. Use the Nearest-Neighbor Algorithm [21] to seed all the replicas, beginning at random cities in different replicas to ensure some variation in the initial states.

  2. Perform a variable initial number of -opt steps independently on all replicas.

  3. Apply the GDC step (after a given number of iterations) as follows:

    1. Determine the most common edge among all replicas.

    2. For replicas that share the identified common edge (see Fig. 2 as well as Appendix for further discussion), attempt to move a random city to its average position relative to common edge in all relevant replicas. Allow the move only if the total tour length is decreased or if it is increased by less than a specified tolerance.

  4. Perform a variable number of -opt steps independently on all replicas.

  5. Go to step until a global number of iterations is reached.

Fig. 2: (Color online.) An illustration of the geometric coupling between replicas. The outcome of this basic coupling is that a given city (node) is moved to a position averaged over all replicas. In this example, there are three replicas. The link is common to all replicas. is a “standard city” that is used to calibrate distances (see Appendix). This city is chosen at the beginning (by symmetry the choice of this city is immaterial). (a) Three relevant replicas sharing a common edge S-S’. is the randomly chosen city. The designations A3, B5, and C7 represent the same city as it appears in replica 1, replica 2 and replica 3 respectively. In these three individual replicas, the tour length between and are 10, 20, and 30 respectively. After averaging, the city will be moved to cities with a distance of 20 away from in all three replicas. A5, B5 and C5 are the target city where A3, B5 and C7 will be inserted (see Appendix). (b) Updated replica configurations after inserting city into the target location. (c) A graphic depiction of the change between the initial (a) and final (b) replicas before and after the move of city in replica 1.

To clarify, we start the Nearest Neighbor algorithm in different cities for each replica and perform an initial number of -opt steps in order to guarantee that our replicas are not starting too close to each other in parameter space. The GDC attempts to ensure that all of the replicas eventually converge on the same location in parameter space. If the replicas become clustered too closely together (as in, e.g., the cartoon of Fig. 1), they may not efficiently explore the landscape of possible solutions and fail to find the global minimum.

During the GDC step, we randomly select a sequence of cities. We then calculate the average position of those cities relative to the most common edge for the replicas which contain the identified common edge. Then, for each relevant replica, that city is plucked from its current location and placed in the average position. The locally broken tour sequence is reattached in the manner described in Appendix. If the tour length is either decreased or increased by less than the tolerance, then the move is accepted, otherwise it is rejected. In this manner, the algorithm is able to locally decrease the quantity

(2)

In the above, is the number of the relevant replicas which share the common edge in step 3 of GDC algorithm, is the city index, is the distance between city and a common edge in replica , and is the average of (as averaged over all the relevant replicas). By lowering for different cities , the replicas generally become more similar to one another.

Iv Results from geometrical distance coupling algorithm

We tested our algorithm on several instances from TSPLIB [25] using and for the -opt step. We demonstrated that within a certain range of for which -opt and -opt alone almost always failed to find the global minimum, our enhanced algorithm was able to find it in a reasonable amount of time. For some larger values of , our algorithm also failed to find the minimum, but it did significantly improve the

-opt estimate, and we believe that it could find the minimum if mated with an appropriate optimizer which is more sophisticated than the standard

- and -opt methods.

The results are summarized in Table I where length and time values are averaged over runs. The unit of the CPU time here is second. The GDC enhanced algorithm is labeled “2-opt GDC” and “3-opt GDC” respectively, for the base - or -opt local search routine. The parameters used here were replicas, geometrical distance coupling step consisting of moves alternated with to -opt steps, and an allowable increase in the tour length of - during each attempted GDC move. For all instances studied in Table I, we performed initial -opt steps independently on all replicas for step in Sec. III. For the instances with a small number of cities (berlin52, eil51, pr76, eil76) we used approximately -opt steps in step . For the instances with a relatively large number of cities (ch130, ts225, a280, lin 318, and att532), we invoked 15 000 k-opt moves in step 4 of the algorithm.

We allowed larger tour length increase tolerance in step (b) for larger problems. The GDC method using the -opt optimization can correctly solve all examined TSPLIB [25] problems up to cities. If -opt is applied instead of -opt the maximal solvable size is . We note that neither the bare - nor -opt by themselves are not able to find the optimal solution for even the smallest of these examples given a comparable number -opt optimization steps. The time required to find the global optimum with the GDC step is large compared to the computing time for the -opt. Part of the reason is that the -opt optimization is easily trapped in local minima, but the GDC step is capable of pulling the optimizer from the local minima and having them explore a much broader region of the solution space.

Fig. 3: (Color online.) The improvement of the bare 2-opt method by the use of replica coupling for the lin318 problem. The figure shows that tour lengths found by invoking R=1 (black squares), R=5 (red circles) and R=20 (blue triangle) replicas averaged over solution attempts. The horizontal axis shows the results obtained by including attempts. Applied to the 2-opt GDC, the use of replicas produced better results than the use of replicas.
Fig. 4: (Color online.) The improvement of the bare 3-opt method by the use of replica coupling for the lin318 problem. For 3-opt GDC the average tour length for the tested range of replicas was very close, but when , the algorithm still found a smaller tour length.
Fig. 5: (Color online.) A schematic representation common structure segments in solutions of the Traveling Salesman Problem. Cities in segments , , and as well as , , and are represented by spheres and the solved tour path follows the depicted edges connecting the cities.

We investigated the effect of the number of replicas used on the performance of the algorithm. Figure 3 and 4 contrast results obtained when our GDC algorithm is applied with and replicas to improve the bare 2-opt and 3-opt respectively (we term the resultant algorithms 2-opt GDC and 3-opt GDC) in the lin318 problem. The average tour length in the case using the 2-opt GDC was 42 479 whereas when using replicas, the 2-opt GDC yielded a path of distance 42 404. Not too surprisingly, in both the 2-opt GDC and the 3-opt GDC, the replica case provides shorter tour lengths than the replica algorithm; this improvement with increasing is smaller for the 3-opt GDC (as the bare 3-opt algorithm is better than the 2-opt method).

V probabilistic replica-inference based algorithm

Next, we show how we developed a replica-inference based (RIB) algorithm to, e.g., solve the lin318 test problem and the att532 test problem. The RIB algorithm is implemented as follows:

  1. Use GDC Algorithm to seed all the replicas (the total number of replicas is R).

  2. For all the replicas, calculate the probability distribution of the nodes (see equation (2)).

  3. Given the probability information of the nodes, divide the tour into different parts: “bubble” region in which different tours appear in disparate replicas and a “backbone” region which is common to all replicas (see Figs. (12, 16)).

  4. Keeping the common backbone region unchanged, examine different tours in the bubble regions such that when combined with the backbone path, they will lead to new viable solutions (replicas) and then pick the one (replica X) that has the shortest path. When we examine different configurations in the bubble regions we must pay attention to the “pairing” inside the bubbles (see Figs.(12,16) and discussion below). We also observe that inside a given bubble there are no lines that cross as required by the triangle inequality (see Fig. 6).

  5. Two replica comparison (performed R times): compare the replica X found in step 4 with the R original replicas that existed prior to step 4 through steps -.Each time after step 4, update the replica X found in step 4. A final solution will be produced after R times of two replica comparison.

In the up and coming, we will introduce and invoke a probability associated with each node . This probability will measure the frequency that the links to the incoming () and outgoing () associated with each city are the same amongst all replicas. That is,

(3)

is the number of times that the composite link connecting the three cities , and appears in the ensemble of replicas studied.

In what follows, we introduce the concept of a “bubble” alluded to in the algorithm above. A “bubble” is, by fiat, comprised of all nodes for which the composite links are not the same across all replicas (i.e., nodes for which ). The set of such nodes must generally terminate somewhere and is linked to a backbone of nodes that have the same links in all replicas. The termination points marks the boundaries of the “bubbles”. In the more detailed representations of the tour solutions in some of the figures that follow, we will typically mark green all points for which the links are identical in all replicas (i.e., the associated probability ).

In Fig. 6, we schematically depict typical “one- in-one-out” and “two-in-two-out” bubbles (i.e., bubbles that are attached to the common backbone by either two or four points).

Fig. 6: (Color online.) A cartoon illustrating that the minimal tour will never intersect itself. In the figure above, a tour containing the two segments and will always have a shorter length than a tour incorporating the same four points yet includes the segments and (that intersect at a point ). The proof of this assertion is trivial. By the triangle inequality as applied to the triangles and respectively, we have and . Adding these two inequalities yields . Permuting the contour segments (e.g., AD,BC AC, BD) to avoid crossing will always lower the total path length.
Fig. 7: (Color online.) Typical “one-in-one-out” and “two-in-two-out” bubble.

Vi Results of the probabilistic replica inference approach

We next apply our replica-inference based algorithm to solve the lin318 test problem from TSPLIB (see 3-opt GDC results in Table I). Typically, we used replicas. The known true (i.e., minimal distance) tour solution is a path of length 42 029. Each of the replicas employed provided a contending solution; the paths in each of the replicas that varied in length from 42 050 to 42 199.

A significant fraction of the configurations produced by the GDC algorithm (discussed in sections III, IV) when it is applied for each of the replicas share three-site configurations of the type shown in Fig. 5. Schematically, the composite link may be shared amongst numerous replicas as is illustrated in Fig. 8. To quantitively measure this similarity, we employ (as we have alluded to in sections III) a probability distribution based on the these link patterns associated with each node . That is, in every replica each node has two adjacent nodes, so is defined as the max number of replicas for which shares the same neighbors (where is in the middle) divided by total number (R) of replicas ( in our example calculation here). In Fig. 9, we show a representative plot the probability distribution for different nodes on the lin318 problem.

Fig. 8: (Color online.) Schematic representation of common structures that appear in various replicas. In this example, the candidate common structure contains of three nodes with two links between them. When the common structure appears in all replicas exactly, we define the probability to be ( here). When the structure does not appear the same in all replicas, . For example, if , the number of replicas that contain the common structure is eight ().
Fig. 9: (Color online.) Building on the abstract representation in Fig. 8, we plot the exact probability distribution (frequency of common structures identified in each of the different replicas) for each node in the lin318 problem from TSPLIB. Here, there are nodes and replicas. Every node has two adjacent neighbors, so we define as the number of times a common structure occurs (i.e., the same pair of neighbor cities are connected to node ) among the replicas divided by the total number of replicas.

We wish to take advantage of the information contained in all of replica tours. Toward that end, we observe in Fig. 9 that out of nodes (approximately of the nodes) have a probability of . We conjectured that the common structures for nodes which have a probability equal to are the same as that from the known optimal solution. To test whether it was the case, we made a plot of Fig. 10. In Fig. 10, is a given fixed probability defined above when averaged over all replicas. Then we looked at the set of all links having that probability . is the fraction of these links that appear in the optimal solution; we then plot versus in Fig. 10.

Fig. 10: (Color online.) The vertical axis is the fraction () of two links from neighboring sites that impinge on a given node () found by the replicas that appear in the optimal (i.e., shortest tour) length solution. The horizontal axis is the probability () of finding this common set of two neighbors connected to the given node ; this probability is identical to the vertical axis in Fig. 9. As this figure illustrates for sufficiently large values , essentially all of the links found by many replicas also appear in the true optimal solution.
Fig. 11: (Color online.) We define a “bubble” as a set of nodes where the neighbor cities differ among the replicas. Here, green nodes denote the nodes which have identical neighbors in all replicas ( here); we define these nodes to have a probability . In this example, there are two distinct bubbles with nodes that are, respectively, depicted in this figure by two different colors- i.e., “yellow” and “red” spheres. The configurations inside the bubbles are different for each replica while the tour sections outside the bubbles are the same for all replicas.
Fig. 12: (Color online.) A schematic top view of Fig. 11. (a) one possible pairing inside the bubbles (b) the other possible pairing inside the bubbles. The green solid lines outside the blobs refer to the common structures shared by all of the 24 replicas while the dotted line inside the blobs denote the various possible bubble tours. The nodes (A1, A2, A3, A4, B1, B2, B3, and B4) are located on the boundaries of the shown bubbles. (c) a concrete example of (a).

Perusing Fig. 9 (associated with the lin318 problem), we observe that links with inter-replica frequency (i.e., links that consistently appeared in all replicas) indeed appeared in the optimal tour. Furthermore, numerous links with (i.e., those which were not consistent across all replicas) also appeared in the optimal shortest tour solution. In what follows, we introduce the concept of a “bubble” as it pertains to the current problem. A “bubble” is, by fiat, comprised of all nodes for which the links are not the same across all replicas (i.e., nodes for which ). The set of such nodes must generally terminate somewhere and is linked to a backbone of nodes that have the same links in all replicas. The termination points marks the boundaries of the “bubbles”.

In the replica-based inference approach, the common structures with are left untouched. We aim to solve the smaller and less difficult bubble problems separately instead of the entire tour map. In Figs. 11 and 13, nodes (represented as spheres in a connected tour map) whose probability is are colored green and those with are colored red or yellow.

We now turn to step 3 of our algorithm. By definition, the bubbles encompass the same set of nodes in all replicas. That is, if there is at least one replica in which a red (or yellow) node is attached to another red (or yellow) node , then and will lie in the same bubble for all replicas. In the lin318 problem, we obtained two bubbles by comparing the replicas to each other. This is illustrated in Fig. 11. (Different colors refer to different bubbles.)

Next, we apply step 4. As mentioned previously, the green nodes in Fig. 11 remain unchanged. We then solved for the optimal tour inside the two identified bubbles for this problem. The shortest intra-bubble tour for the larger bubble (red nodes) is (from replica ). The shortest intra-bubble tour for the smaller bubble (yellow nodes) is (this also appeared in replica ). Although we cannot find the optimal solution for lin318 problem at this stage, the current best tour length is .

Benchmark problems Bare algorithms unable to find optimal path Results from our new method that couples these algorithms
-opt -opt 2-opt GDC 3-opt GDC
Problem Cities Optimal length Length CPU time Length CPU time Length CPU time Length CPU time
berlin52 52 7542 7721 0.035 7606 4.3 7542 2.31 7542 1.62
eil51 51 426 433 0.023 429 13.41 426 6.82 426 39.36
pr76 76 108159 110875 0.057 109096 27.32 108280 8.511 108159 399.57
eil76 76 538 553 0.05 544 29.35 538 50.3 538 184.63
ch130 130 6110 6354 5.09 6232 8.377 6110 1003 6110 410
ts225 225 126643 128103 7.08 126885 110.4 126643 3398.6 126643 76.44
a280 280 2579 2701 8.88 2642 143.43 2615 1500 2579 7807.8
lin318 318 42029 44473 7.66 43347 48.57 42423 15120 42124 28080
att532 532 27686 29338 10.88 28447 60.00 28607 20220 28035 40770
TABLE I: TSP performance and results using -opt, -opt, 2-opt GDC, and 3-opt GDC on a selection of TSPLIB instances. k-opt GDC are results from the current work. The values of tour lengths and CPU times are averaged over runs. For the studied instances, -opt and -opt always failed to find the global minimum alone. With geometrical distance coupling (see Sec. III), our 2-opt GDC algorithm found the global minimum up to cities, and 3-opt GDC found the optimal tour up to . Although neither 2-opt GDC nor 3-opt GDC found the optimal solution for lin318, they significantly improved the base - or -opt estimate. The percentages above the optimum for lin318 was for -opt and for -opt which was reduced to and for 2-opt GDC and 3-opt GDC, respectively.

Caution must be exercised in choosing the optimal “intra-bubble” tour among all the relevant replicas. We mandate that the resultant tour is a valid TSP solution (i.e., we need to make certain that (i) the tour visits each node exactly once inside any bubble and that (ii) the formed global tour forms a closed path). To that end, we should consider the Pairing between two nodes alludes here to the circumstance that these two nodes are continuously connected to each other by an intra-bubble segment (see Figs. (12, 16)). Figs. (12, 16) constitute top views of the tours depicted in Figs. (11, 15) where the common backbones and regions with differing node paths (“bubble”) are made clear. The blobs in Fig. 12 schematically denote bubbles. The green solid lines (backbones) are the tour path formed only by common structures (which we do not want to change). The dotted lines inside the blobs are possible tour paths (we want to find the shortest such paths). Within the blobs there might be some common structures between the different replicas. Among all of the 24 replicas there were the two possible ways of pairing for the bubble nodes at the boundary. The bubbles in Fig. 12 are of “two-in-two-out” type. That is, the full tour will enter and exit each bubble twice. As discussed above, when we pick the optimal solution for each bubble and combine them together to form a new global solution we must make it certain that the tour is still valid.

Step 5: We were able to decrease the old tour length for replica by using just two replicas. In doing so, we find that we can decrease the tour length from to by swapping common bubble appearing in both replicas and and having the same “in-out” pairing. The bubbles being swapped are of the “two-in-two-out” type with the same pairing (see Fig. 12). So we can safely swap these two bubbles. The results are shown in Figs. 13 and 14. The total tour length associated with the common bubble in replica is compared to for replica with a difference of . Upon swapping the lower distance tour in the smaller bubble from replica 3 with the existing one in replica 7, replica attained the ideal optimal tour length of length - the correct solution of the lin318 problem. The resulting tour is depicted in Fig. 14.

We also applied our replica inference method to further optimize the solutions obtained by the GDC algorithm for the att532 problem [25]. In Fig. 15, all 24 replicas were employed to produce the common backbone (“green”) nodes and bubbles (differing tour regions attached to the common backbone) as we did for the lin318 instance. There were six bubbles in total. Five of these bubbles were of the “one-in-one-out” type while the rest were of the “three-in-three-out” type (see Fig. 16). In all of the replicas, the ”in-out” pairing was between the very same sets of node pairs. For the five smaller bubbles, by virtue of their minute size, brute force minimization quickly produced to find the optimal solution. For the “three-in-three-out” bubble we inserted the shortest path result (that obtained from replica 17) among the 24 replicas. These steps led to a solution for the att532 problem having a total tour length of 27 937 as compared to the average replica result of 28 035. We tried to decrease the tour length of the big bubble of “three-in-three-out” type further by invoking replica pair comparisons. This iteratively led to a replacement of the original three-in-three-out bubble to many far smaller bubbles. We then investigated whether we can optimize the original big bubble by swapping these smaller bubbles between replica 17 and others. Adopting some smaller bubble solutions from replicas 1,10, and 23 respectively, this set of sequential minimization and inference operations led to a better solution having a tour length . The process is illustrated in Fig. 17. The length of the resultant contending solution is 27 881. (The known optimal shortest path for att532 has a tour length equal to .)

Although we still cannot find the global minimum for att532, the solution of 27 881 produced by GDC algorithm and bubble method togother is only above the optimum. More importantly, by employing inference, we were able to reduce the large problem involving a minimization of the path of all nodes in the graph to a set of smaller problems involving disparate “bubbles”. Other than the “three-in-three-out” bubble the rest tour configuration was found to be the same as the known optimal solution.

To better understand the difference between our solution and the known optimal solution, we compared our replicas to the known optimal solution. Perusing Fig. 16, we find that despite of the same in-out “pairing” occuring associated with identifying the locations where the tour went in and out of each bubble, the optimal solution tries to go from node (on boundary) A6 to A5 more directly while our replicas always intend to go north from city A6 and finally reach A5 after visiting numerous nodes.

Fig. 13: (Color online.) A specific illustration of how our method is applied to two particular replicas in our GDC algorithm in Sec. III. The top place denotes the outcome of replica number 7 in our simulations while the bottom plane shows replica number 3. Green nodes are those nodes that have identical links in the set of all replicas (as in Fig. 8,  9). That is, nodes that are colored green have identical neighbors in all replicas. The remaining nodes with links that differ between the disparate replicas form separate “bubbles” attached to the backbone of common (green) nodes. We mark the nodes in the different “bubbles” by different colors. The yellow and red nodes form two bubbles where the tour configurations in the individual replicas differ. Amongst the two replicas shown, the shorter path in the bubble formed by the yellow nodes appears in replica 3. This intra-bubble configuration may be implemented in replica 7 to replace the original one shown. Once this transfer is done, the optimal tour (shown in Fig. 14) results.
Fig. 14: (Color online.) Optimal solution for lin318 from TSPLIB as discussed in Fig. 13. Following this transfer of the tour segment inside the bubble from replica 3 in Fig. 13, the new replica 7 attains the lowest distance optimal tour for the lin318 problem.
Fig. 15: (Color online.) An all replica comparison that was used to produce the common (green) backbone of links for the 532 node att532 problem. In this example, we found a total of six bubbles attached to the common backbone. Five of these bubbles were of the “one-in-one-out” type (denoted yellow above). The nodes in the more challenging “three-in-three-out” type bubble are marked red.
Fig. 16: (Color online.) A schematic top view of Fig. 15. The green solid lines denote the common tour path between the replicas. The dotted line inside the blobs denote the possible various bubble tours. The nodes (A1, A2, A3, A4, A5, and A6) are situated on the periphery of the “three-in-three-out” bubble marked red in Fig. 15.
Fig. 17: (Color online.) A comparison between replica 17 and other replicas in the att532 problem (see Figs. 15 and 16 for notation and color convention). (a) A side by side comparison between replica 17 (left) and replica 1 (right). Once the shorter intra-bubble path (marked yellow) from replica 1 is implemented in replica 17, the total tour length in replica 17 is reduced by a distance difference of size 8. (b) A similar comparison (for a region with another one-in-one-out “yellow bubble” different than that shown in panel (a)), between replica 17 (left) and replica 1 (right). Coincidentally, here also, swapping the intra-bubble tour in replica 17 with the shorter one found in replica 1 further lowers the total length by 8. (c) A further analogous comparison between replica 17 (left) with replica 10 (right). Replacing the initial other intra-bubble tour in replica 17 by the shorter one found for this bubble in replica 10 leads to a further lowering the tour length by 26. (d) A comparison between replica 17 (on left) with replica 23 (right) for a fourth “one-in-one-out” bubble in Figs. (15, 16). Using the shorter intra-bubble tour found in replica 23 instead of that initially found in replica 17 leads to a further reduction of the tour length in replica 17 by 14.

Vii Conclusion

We introduce a general method for improving known algorithms. (i.e., the -opt and -opt of the TSP [21]). Although, for definitiveness, we focused on the TSP, the premise of underlying our method is very general and it may, in principle, be applied to many other problems. The core concept of our approach is that of coupling between independent solvers (see Fig. 1 and Eq. (1)). Such a coupling between members of an ensemble of solvers (or “replicas”) that collectively seek to find an optimal solution may substantially improve the convergence to the correct answer as compared to the prevalent single replica algorithm. This coupling may be introduced amongst solvers of many types (with these solvers obtained by any previously known algorithm). We underscore and reiterte that by couplings these solvers, we may, very significantly, improve earlier results. In the context of the TSP, we demonstrated that geometrical distance and probabilistic inference coupling between otherwise independent replica solvers allow local solvers to flee from false local minima. By doing this, we obtained optimal TSP tours even when single solvers were unable to find the correct answer. As an example, we showed that while the bare 3-opt method fails to solve for the examined TSP problems with more than 50 cities (see Table I), by invoking replicas, the 3-opt method can accurately solve problems up to size of 318 cities (see Fig. 13). We furthermore obtained nearly optimal solutions even for larger systems. For instance, in the att532 example the replica method led to a solution with increase in tour length relative to the minimum (see latter part discussion of Section VI). Thus, the inter-replica coupling indeed led to a substantial improvement. Unlike genetic algorithms that involve no such coupling and simply randomly swap the city nodes among different contending solutions, our algorithms makes use of replica correlations and inference. Genetic algorithms fail to solve problems as complicated as those we do. To our knowledge, the currently best genetic algorithm [26] already falters in atempting to correctly find the minimal path for a 76 city tour (“pr76”) that we readily solved here (see Table I) and successfully went to far larger city tours. In conclusion, we introduce a new replica based approach that may be applied to disparate problems beyond the confines of the particular TSP problem solved here and the clustering and image segmentation challenges addressed in [5, 8]. Our core idea is that even algorithms that are simple may be much more potent once inter-replica interactions and inference are invoked.

Acknowledgment

This work was supported by NSF grants DMR-1411229 and DMR-1106293.

Appendix A Geometrical distance coupling step

The geometrical distance coupling step is applied “on top of” a base TSP solver. Generally speaking, the idea is to alternate optimization of the local solver (-opt in the current work) with the GDC step to induce the algorithm to escape local minima and enhance the chances of finding the globally optimal tour solution. GDC seeks to utilize the distance information implicitly contained in multiple TSP solvers to enhance the optimization performed by a base algorithm.

For illustration purposes, the following discussion uses only five replicas which are labeled , , , , and . We then represent a candidate tour solution as a string of cities,

Replica : tour:

Replica : tour:

Replica : tour:

Replica : tour:

Replica : tour:

where each replica tour correspondes to a permutation of the integers . The GDC algorithm is given by the following steps:

  1. Determine the most common edge among all replicas:

    1. Randomly select a “standard” reference city from . Cycle each replica order so it includes the standard city as the first element. The five example replicas have the following configurations:

      Replica :

      Replica :

      Replica :

      Replica :

      Replica :

      where are the neighbors of in the corresponding replica.

    2. Determine the most common edge among all replicas. That is, find which of the five links (, , , , ) appears the most frequently and flag the corresponding replicas. Let’s call the most common nearest neighbor city “”. Suppose there are three replicas containing this link :

      Replica :

      Replica :

      Replica :

    3. Calculate the average position for every cities. For example, replica has a long link: . We can know the distance from a specific city (say ) to the first city . For the relevant replicas (replicas , , and in this example) calculate the average value of the distance of all cities to the standard city .

  2. For replicas that share an identified common edge, attempt to move a random city to its average position (measured relative to the common edge). For example, in replicas , , and , we already know the average distance of city to . We compare this value to the distance of all cities to the standard city, and we find that the distance of to is the closest to the average distance of city to . If the total tour length after this change is decreased or if it is increased by less than a specified tolerance, we rearrange the order as follows: . Similarly, we continue to move random cities to their average positions in all relevant replicas for N-2 times.

The geometrical distance coupling between two replicas (candidate tours which share at least one common edge) is calculated by

(4)

where is the number of cities in the instance. represents the difference of the geometrical distance from the city to the standard city in tour and tour .

References

  • [1] C. H. Papadimitriou and K. Steiglitz, Combinatorial optimization: algorithms and complexity, (Courier Dover Publications, 1998).
  • [2] M. Mézard, G. Parisi, and M. A. Virasoro, Spin glass theory and beyond, (World scientific, Singapore, 1987), Vol. 9.
  • [3] A. Percus, et al., Parallel tempering for the traveling salesman problem, No. LA-UR-08-05100; LA-UR-08-5100. Los Alamos National Laboratory (LANL), (2008).
  • [4]

    D. E. Goldberg, “Genetic algorithms in search, optimization, and machine learning.” Addison Wesley (1989).

  • [5] P. Ronhovde and Z. Nussinov, Phys. Rev. E 80, 016109 (2009).
  • [6] P. Ronhovde, S. Chakrabarty, D. Hu, M. Sahu, K. K. Sahu, K. F. Kelton, and N. A. Mauro, and Z. Nussinov, The European Physical Journal E 34 1 (2011).
  • [7] P. Ronhovde, S. Chakrabarty, D. Hu, M. Sahu, K. K. Sahu, K. F. Kelton, N. A. Mauro, and Z. Nussinov, Scientific reports 2, 329 (2012).
  • [8] D. Hu, P. Ronhovde, and Z. Nussinov, Phys. Rev. E 85, 016101 (2012).
  • [9] D. Hu, P. Sarder, P. Ronhovde, S. Orthaus, S. Achilefu, and Z. Nussinov, Journal of microscopy 253, 54 (2014).
  • [10] D. Hu, P. Ronhovde, and Z. Nussinov, Phil. Mag. 92 (4), 406-445 (2012)
  • [11] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J. Onnela, Science 328, 876 (2010).
  • [12] J. Gao, S. V. Buldyrev, H. E. Stanley, and S. Havlin, Nature Physics 8, 40-48 (2012).
  • [13] A. Cardillo, J. Gómez-Gardeñes, M. Zanin, M. Romance, D. Papo, F. del Pozo, and S. Boccaletti, Scientific reports 3, 1344 (2013).
  • [14] G. Bianconi, Phys. Rev. E 87, 062806 (2013).
  • [15] R. C. Alamino, J. P. Neirotti, and D. Saad, Phys. Rev. E 88, 013313 (2013).
  • [16] J. Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations, (Anchor Books, 2005).
  • [17] M. Clerc, Particle Swarm Optimization, (Wiley, 2006).
  • [18] M. Dorigo and L. M. Gambardella, BioSystems 43, 73 (1997).
  • [19] M. Padberg and G. Rinaldi, SIAM Review 33, 60 (1991).
  • [20] M. Grötschel and O. Holland, Mathematical Programming 51, 141 (1991).
  • [21] D. S. Johnson and L. A. McGeoch, “The traveling salesman problem: A case study in local optimization,” in Local Search in Combinatorial Optimization (1997), pp. 215-310.
  • [22] S. Lin and B. W. Kernighan, Operations Research 21, 498 (1973).
  • [23] K. Helsgaun, European J. Operational Research 126, 106 (2000).
  • [24]

    M. Dorigo and L. M. Gambardella, Evolutionary Computation, IEEE Transactions

    1, 53 (1997).
  • [25] http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/
  • [26] Zakir H. Ahmed, “Genetic Algorithm for the Traveling Salesman Problem using Sequential Constructive Crossover Operator”, International Journal of Biometrics and Bioinformatics, 3, 96 (2010).