1 Introduction
One of the most conspicuous limitations of heuristic search methods in combinatorial optimization is their tendency to become trapped at a local optimum [9]. The number and distribution of local optima in a search space therefore have an important impact on the performance of heuristic search algorithms. A recently proposed model, Local Optima Networks [22, 25], provides an intermediate level of description for combinatorial fitness landscapes. The model has a higher descriptive power than a single statistical metric or set of metrics, yet it also compresses the search space into a more manageable mathematical object: a graph having as vertices the local optima of the problem and as edges the possible weighted transitions between these optima. This network representation allows the application of new analytical tools and metrics to study combinatorial landscapes, namely those of complex-network analysis (e.g. degree distribution, clustering coefficient, assortativity, and community structure, to name a few). In previous work, alternative definitions of edges have been studied, and some of these metrics have been computed on the networks extracted for two combinatorial problems: the NK family of landscapes [22, 16, 17, 25, 24] and the Quadratic Assignment Problem [7, 6]. Those studies have been mainly descriptive, although distinctive correlations between some network features and previous knowledge about search difficulty in these landscapes have been found. A related earlier work [18] explored the relationships between landscape features and the performance of a hybrid EA. The authors use standard landscape metrics and conduct a study based mainly on scatter plots. They suggest that “further work is necessary to gain better understanding of the escape rate on the actual problem difficulty”. The present study addresses exactly this point.
The goal is to systematically explore correlations between local optima network features and the performance of a stochastic local search algorithm (Iterated Local Search) running on the underlying combinatorial optimization problem (in this study, the NK family of landscapes). The ‘escape rate’ is a property related to the local optimum, and the LON can be considered as a new tool to better understand problem difficulty. The ultimate goal is to have predictive models of the performance of specific local search heuristics when solving a given combinatorial optimization problem. This paper proposes an initial predictive model of performance based on the most influential LON features.

2 Methods
The local optima network model of combinatorial landscapes and the iterated local search metaheuristic are considered in this study. The relevant definitions and experimental setup are given below.
2.1 Local Optima Network
A fitness landscape [20] is a triplet $(S, V, f)$ where $S$ is a set of potential solutions, i.e. the search space; $V : S \rightarrow 2^S$ is a function that assigns to every $s \in S$ a set of neighbors $V(s)$, i.e. the neighborhood structure; and $f : S \rightarrow \mathbb{R}$ is the fitness function that evaluates the corresponding solutions.
The present study uses the well-known NK landscapes [12] as a benchmark set. It is a problem-independent model for constructing combinatorial landscapes that are tunably rugged. In the model, $N$ refers to the number of (binary) genes in the genotype, i.e. the string length, and $K$ to their epistatic interaction, i.e. the number of other loci (chosen at random here) that influence the fitness contribution of a particular gene. Starting from this additive model of loci contributions, by increasing the nonlinearity $K$ from 0 to $N-1$, the landscapes can be tuned from smooth to rugged. Hence, $S$ is the search space of all binary strings of length $N$, and its size is $2^N$. The neighborhood is defined by the minimum possible move on it, which is the single bit-flip operation, and the neighborhood size is $N$. The fitness function evaluates each genotype as $f(s) = \frac{1}{N} \sum_{i=1}^{N} f_i(s_i, s_{i_1}, \ldots, s_{i_K})$, where the values of the loci contributions $f_i$ are drawn uniformly at random in $[0, 1)$.
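As an illustration, the evaluation scheme above can be sketched in Python. The representation is our own choice, not prescribed by [12]: each locus gets $K$ random epistatic partners, and the contribution tables are filled lazily on demand.

```python
import random

def make_nk(n, k, seed=0):
    """Random NK instance: each locus i depends on k other random loci,
    and its contribution table is drawn uniformly in [0, 1) on demand."""
    rng = random.Random(seed)
    links = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [{} for _ in range(n)]

    def fitness(s):
        total = 0.0
        for i in range(n):
            key = (s[i],) + tuple(s[j] for j in links[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()  # lazily drawn contribution
            total += tables[i][key]
        return total / n  # average of the n loci contributions

    return fitness

f = make_nk(n=10, k=2, seed=42)
s = (0, 1) * 5  # an example genotype of length 10
```

Because contributions are cached once drawn, repeated evaluations of the same genotype are deterministic, as the model requires.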
A local optimum (LO), which is taken to be a maximum here, is a solution $s^*$ such that $\forall s \in V(s^*)$, $f(s) \leq f(s^*)$. All optima are determined through exhaustive search by recursively running the 1-bit-flip best-improvement hill-climber, as in Algorithm 1. Let us denote by $h(\cdot)$ the operator that associates to each solution $s$ the local optimum obtained after applying that algorithm until convergence. Since $S$ is of finite size and there is no neutrality in the fitness values, this produces a partition of the landscape into a finite number of basins of attraction $b_1, \ldots, b_n$, associated with the local optima $LO_1, \ldots, LO_n$.
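Since Algorithm 1 is not reproduced here, the following is a minimal sketch of our reading of it: a best-improvement climber that, at each step, evaluates all $N$ single bit-flip neighbors and moves to the best one until no neighbor improves.

```python
def hill_climb(s, fitness):
    """Best-improvement 1-bit-flip hill-climbing: repeatedly move to the
    best improving neighbor; stop at a local optimum (maximization)."""
    s = list(s)
    while True:
        best, best_fit = None, fitness(s)
        for i in range(len(s)):
            s[i] ^= 1                # flip bit i to visit the neighbor
            fi = fitness(s)
            s[i] ^= 1                # flip it back
            if fi > best_fit:
                best, best_fit = i, fi
        if best is None:
            return tuple(s)          # no improving neighbor: a local optimum
        s[best] ^= 1                 # take the best improving move
```

Mapping every solution through this climber yields the partition into basins of attraction described above.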
The connections among them account for the chances of escaping from an $LO_i$ and jumping into another basin with a controlled move [24]. There exists a directed transition from $LO_i$ to $LO_j$ if there exists a solution $s$ such that $d(s, LO_i) \leq D$ and $h(s) = LO_j$, where the distance $d$ is measured in number of moves (i.e. Hamming distance in the bit-flip operator case). The distance threshold $D$ can be chosen according to the applied perturbation; in this work, it is set to $D = 2$. The weight of such a transition is then $w_{ij} = \#\{s \in S \mid d(s, LO_i) \leq D \text{ and } h(s) = LO_j\}$, i.e. the number of paths at distance at most $D$ starting at $LO_i$ and reaching the basin of $LO_j$. This can be normalized by the number of solutions within reach w.r.t. the given distance threshold, i.e. $\#\{s \in S \mid d(s, LO_i) \leq D\}$.
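The escape-edge weights can be computed by enumerating every solution within Hamming distance $D$ of a local optimum. The sketch below assumes a mapping `h` from any solution to its local optimum (i.e. the climber of Algorithm 1), passed in as a function; the interface is our own.

```python
from itertools import combinations

def escape_edges(lo, h, d_max=2):
    """Normalized escape-edge weights for one local optimum: the fraction
    of solutions within Hamming distance d_max that climb into each basin."""
    n = len(lo)
    flips = [c for r in range(1, d_max + 1)
             for c in combinations(range(n), r)]
    weights = {}
    for subset in flips:
        s = list(lo)
        for i in subset:
            s[i] ^= 1                      # perturb within distance d_max
        target = h(tuple(s))               # basin reached after climbing
        weights[target] = weights.get(target, 0) + 1
    total = len(flips)                     # solutions within reach
    return {t: w / total for t, w in weights.items()}
```

By construction the normalized weights out of a node (including its self-loop) sum to one.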
The weighted and directed graph having the local optima as the set of vertices and the escape transitions defined above as the set of edges is the Local Optima Network (LON) [24].
2.2 Iterated Local Search
Iterated local search is a relatively simple but successful algorithm. It operates by iteratively alternating between perturbing the incumbent solution with a move operator and restarting local search from the perturbed solution. This search principle has been rediscovered multiple times, within different research communities and under different names [3, 14]. The term iterated local search (ILS) was proposed in [13]. Algorithm 2 outlines the procedure.
In the present study, the base LocalSearch heuristic is the same best-improvement hill-climber of Algorithm 1, which stops on an LO. This heuristic uses the single bit-flip move operator; therefore, a 2-bit-flip mutation is chosen as the Perturbation operator. When a different LO is found after the perturbation, the search process accepts the move if its fitness is higher (we are assuming maximization). With these settings, ILS is performing a first-improvement hill-climbing in the configuration space of the LON with the escape edges at distance $D = 2$, as defined in Section 2.1.
The search terminates at the global optimum, which for benchmark problems is known a priori, or when reaching a preset limit $T_{max}$ of fitness evaluations.
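Putting the pieces together, the ILS of Algorithm 2 can be sketched as follows. This is a simplification: the budget is counted in iterations rather than fitness evaluations, and `climb` stands for any local search routine returning a local optimum.

```python
import random

def iterated_local_search(s0, fitness, climb, budget, seed=0):
    """ILS sketch: 2-bit-flip perturbation of the incumbent local optimum,
    local search, and acceptance only of improving moves (maximization)."""
    rng = random.Random(seed)
    best = climb(s0)
    for _ in range(budget):
        s = list(best)
        i, j = rng.sample(range(len(s)), 2)   # 2-bit-flip perturbation
        s[i] ^= 1
        s[j] ^= 1
        candidate = climb(tuple(s))
        if fitness(candidate) > fitness(best):  # improving moves only
            best = candidate
    return best
```

With the improving-moves-only acceptance criterion, the trajectory through the LON is exactly the first-improvement climb over escape edges described above.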
2.3 Performance Evaluation
As the performance criterion, we selected the expected number of function evaluations to reach the global optimum (success) after independent restarts of the ILS algorithm (Algorithm 2) [1]. This measure accounts for both the success rate $p_s$ and the convergence speed. In theory, after $k$ unsuccessful runs stopped at $T_{max}$ steps and a final successful one running for $T_s$ steps, the total running time would be $T = k \, T_{max} + T_s$. Taking the expectation and considering that $k$ follows a geometric distribution^1 with parameter $p_s$, it gives:

$$E(T) = \frac{1 - p_s}{p_s} \, T_{max} + E(T_s)$$

where, in the present case, $p_s$, the ratio of successful to total runs, is an estimator for the success probability, and $E(T_s)$ can be estimated by the average running time of the successful runs. The ILS variant detailed in Sec. 2.2 is essentially incomplete, i.e. there are soluble problem instances for which the success probability remains below 1 even in the limit of an infinite running time [11]. Given the chosen acceptance criterion, the search can eventually get stuck. Indeed, a few test instances with the highest values of $K$ were never solved. This theoretical limitation could be overcome by performing as many random restarts as needed to cover the whole search space, but such a solution is of limited practical interest for large problems. In the present study, the success performance has been estimated on all the instances that had been solved at least once.

^1 The probability distribution of the number of failures before the first success in Bernoulli trials with parameter $p_s$.
Table 1: LON metrics and ILS performance (columns: nv, lo, lv, fnn, wii, cc, zout, y2, knn, ets), averaged for each value of $K$ over the independent instances of each landscape group. Standard deviations are given in subscripts. nv = number of vertices (local optima), lo = average shortest path to reach the global optimum, lv = average path length (whole graph), fnn = Spearman coefficient for the nearest-neighbors fitness-fitness correlation, wii = average non-normalized weight of self-loops, cc = global clustering coefficient, zout = average outgoing degree, y2 = average weight disparity for outgoing edges, knn = degree assortativity, ets = estimated time to succeed.

The benchmark set consists of NK landscapes with $N = 18$ and $K \in \{2, 4, 6, 8, 10, 12, 14, 16, 17\}$. Those are the largest parameter combinations for which we could afford the exhaustive extraction of the local optima networks. In order to minimize the influence of the random generation of landscapes, 30 independent problem instances are considered for each combination of $N$ and $K$, which accounts for a total of 270 instances in the problem set. The function-evaluation limit $T_{max}$ is set to a fixed fraction of the search-space size $2^N$; the success rate and the running time of successful runs are estimated from repeated independent random restarts per instance.
3 Results
3.1 Descriptive Statistics
Table 1 summarises the LON metrics. Results are grouped according to the $K$ value of the corresponding landscapes, and present averages and standard deviations over the independent realizations per group. The number of local optima, nv, is a metric familiar to any description of a rugged landscape. The other metrics are particular to the complex-network perspective provided by the LON model [15].
From left to right in the table, lo represents the average length of the shortest paths that reach the global optimum starting from any other local optimum, where the cost associated to an edge of the LON graph is $d_{ij} = 1/w_{ij}$. This cost can be interpreted as the expected number of random perturbations for escaping $LO_i$ and entering exactly the basin of $LO_j$. lv gives the average path length for the whole graph, which accounts for the weighted network characteristic length. lo is intuitively more directly related to the search difficulty. Indeed, lo increases steadily with the landscape ruggedness $K$, whilst the trend of lv is less clear. As a possible explanation, the network growth in terms of nodes might be counteracted by a growth in node connectivity: the number of weighted outgoing transitions from a given LO, i.e. its out-degree in the LON, increases with $K$ (cf. column zout in Tab. 1).
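The lo metric can be obtained with a standard Dijkstra run over the LON using edge cost $1/w_{ij}$. A sketch, under the assumption that the network is given as a nested dict of normalized transition weights:

```python
import heapq

def avg_path_to_optimum(edges, global_opt):
    """Average weighted shortest-path cost from every local optimum to the
    global one, with edge cost 1/w_ij (expected perturbations per hop)."""
    # edges: {i: {j: w_ij}}; run Dijkstra from the optimum on reversed edges
    nodes = set(edges) | {j for ws in edges.values() for j in ws}
    rev = {v: {} for v in nodes}
    for i, ws in edges.items():
        for j, w in ws.items():
            rev[j][i] = 1.0 / w
    dist = {v: float("inf") for v in nodes}
    dist[global_opt] = 0.0
    heap = [(0.0, global_opt)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                      # stale heap entry
        for u, cost in rev[v].items():
            if d + cost < dist[u]:
                dist[u] = d + cost
                heapq.heappush(heap, (d + cost, u))
    others = [dist[v] for v in nodes if v != global_opt]
    return sum(others) / len(others)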
Column fnn measures the correlation between the fitness of a node and the weighted average of the fitness of its nearest neighbors [2]. This is relevant as the ILS acceptance criterion takes the fitness values of LO into account. With respect to this metric, NK landscapes behave at the LO level as they do at the solution level: they become more and more uncorrelated as $K$ approaches $N-1$ (see column fnn). In general, it is expected that a high and positive correlation would help the search process.
Column wii reports the average number of perturbations that remain in the same basin of attraction, which is a proxy for the basin size. The larger the basin, the more difficult it is to escape it. Table 1 shows that the average wii value decreases with the landscape ruggedness $K$.
Column cc reports the clustering coefficient [15], which measures the ratio of connected triples in the LON graph. In a social network, this coefficient measures the probability for one’s friends to be friends with each other. In the LO networks, it provides an index of topological locality for the transitions between local optima. Table 1 suggests that cc values decrease steadily with increasing $K$, but so does the LON density [24], and that might be the reason.
Column zout counts the number of transitions departing from a given LO. It is relevant to know whether all the transitions have the same rate, or if there is a preferred direction. To this aim, the disparity score y2 gauges the weight heterogeneity of outgoing edges [2]. When all connections leaving a given LO have the same probability, the disparity will be close to the inverse of the out-degree; otherwise it will be higher than this value. Column y2 shows that the disparity monotonically decreases with increasing $K$. From the point of view of a metaheuristic dynamic, a low disparity means that transitions are almost equiprobable: the LON topology does not preferentially guide the search trajectory, which makes the search process harder.
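As we read the definition in [2], the disparity of one node's outgoing weights is simply the sum of squared normalized weights:

```python
def disparity(weights):
    """Weight disparity Y2 = sum_j (w_ij / s_i)^2, where s_i is the total
    outgoing weight. Uniform weights over k edges give 1/k; concentration
    of the weight on a few edges gives a higher value."""
    s = sum(weights)
    return sum((w / s) ** 2 for w in weights)
```

Four equal weights give a disparity of 1/4; a single dominant edge pushes the score toward 1.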
Column knn reports the nearest-neighbors degree correlation or assortativity, a classical description of the mixing pattern of nodes in a network. The assortativity measures the affinity to connect with high- or low-degree neighbors. The LONs of NK landscapes are strongly disassortative, i.e. LO with few connections tend to link to others with many, and conversely. The implications of this observation on the search difficulty are worth further investigation.
The last column in Table 1 gives the values of the success-performance indicator detailed in Sec. 2.3, which we abbreviate to ets in the following. Clearly, the expected running time increases with the landscape ruggedness (problem nonlinearity) $K$. In order to analyze the intercorrelations between all the studied metrics, Figure 1 displays a correlogram of the whole data set. The figure depicts every possible pairing among the observed variables with a scatter plot in the panel below the diagonal, and the corresponding correlation coefficient in the upper panel. Table 1 corresponds to the first column of the correlation matrix, i.e. the one showing how the different network metrics behave against the epistasis $K$.
We are mostly interested in the correlations with the performance metric ets. Thus, the most relevant scatter plots are those in the last row of Figure 1, together with the respective Pearson coefficients in the last column. By inspecting these values, the highest positive correlations of ets are with the average length of the shortest path climbing to the global optimum (lo) and with the total number of LO (nv). This observation seems reasonable: the more rugged the landscape (i.e. the larger its LON), the higher the number of hops through the LON to reach the global optimum, and thus the longer the expected running time of a restarting local search. Conversely, the larger the LO basins (wii) and the higher the nearest-neighbors fitness-fitness correlation (fnn), the shorter the running time. The catch here is that those metrics are in turn correlated among themselves, which prevents one from drawing causal relationships from simple pairwise correlations.
The last-column plots also suggest a strong nonlinearity of the performance estimator w.r.t. the LON metrics. In order to further analyze these relationships, Figure 2 zooms in on the relevant scatter plots and displays the logarithm of ets as a function of the considered landscape measures. The log-transformation brings the relationships closer to linearity, highlighting and confirming the results in the last column of Figure 1. Namely, the relationships between log(ets) and the metrics appear clearly: the picture suggests a positive exponential trend with the metrics that grow with ruggedness, such as nv and lo, and a negative exponential trend with those that decrease. In some cases, the relation could also be close to a power law.
Since the data are far from bivariate normal, a more robust measure of association is the rank-based Spearman’s $\rho$ statistic, reported in Table 2. These results complement the visual inspection of the scatter plots and confirm the previous observations.
Table 2: Spearman rank correlation between ets and each LON metric: nv, lo, lv, fnn, wii, cc, zout, y2, knn.
3.2 Statistical Modeling
Figures 1 and 2, along with Table 2, already address some of the research questions asked in Section 1, but do not provide an explanatory model for the algorithm performance as a function of the landscape features. To this end, we perform a multiple linear regression on the data, which has the general form:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon \quad (1)$$

where the response variable $y$ in our case would be ets, and the different predictors $x_1, \ldots, x_p$ are to be chosen among the LON metrics; $\epsilon$ is the usual random noise term. The least-squares regression produces estimates $\hat{\beta}_i$ for the model coefficients; the differences between the predicted values $\hat{y}$ and the actual, observed values $y$ are the regression residuals: $e = y - \hat{y}$.
The difficulty of this analysis is that there are several possible explanatory variables, which are in turn intercorrelated. In consequence, some of them could have a confounding effect on the regression. In general, when confounders are known, measurable, and measured, it is good practice to include them in the model. We therefore start by fitting the following formula:
$$\log(\mathrm{ets}) \sim K + \log(\mathrm{nv}) + \mathrm{lo} + \mathrm{lv} + \mathrm{fnn} + \mathrm{wii} + \mathrm{cc} + \mathrm{zout} + \mathrm{y2} + \mathrm{knn} \quad (2)$$

where, w.r.t. the general expression (1), the response ets and one of the predictors (nv) have been log-transformed in order to better approach linearity, as seen in Figure 2. Moreover, the variable $K$ is qualitative and enters the model as a fixed effect, which translates into one appropriate dummy variable for each class $K > 2$.

The summary statistics for this model are reported in Table 3. In the table caption, the multiple $R^2$ represents the proportion of variance explained by the linear regression; $R^2$ would be equal to 1 if all observed data points were lying on the regression plane. When comparing models with a different number of predictors, the adjusted $R^2$ should be used instead. The $F$ statistic is the ratio of the variance explained by the parameters in the model to the residual or unexplained variance, and its $p$-value is the probability of achieving an $F$ that large under the null hypothesis of no effect [4]. The estimated coefficients $\hat{\beta}_i$ and their estimated standard errors are given in the Estimate and Std. Error columns, respectively. Their ratio is the $t$ statistic ($t$ value column), which is used to calculate a $p$-value for the significance of the estimation (last column).
Table 3: Summary of the full regression model (2). Columns: Estimate, Std. Error, $t$ value, Pr(>|t|); rows: (Intercept), k4, k6, k8, k10, k12, k14, k16, k17, log(nv), lo, lv, fnn, wii, cc, zout, y2, knn.
In this initial model, the average length of the paths to the global optimum, lo, is the only predictor with a regression coefficient that is statistically significant at the conventional threshold.
Therefore, we proceed to perform a stepwise model selection by backward elimination [23]. Starting from the initial formula, at each step we compute the change in fit that would be produced by dropping each predictor in turn, and then eliminate the one that minimizes the AIC score of the resulting model [21]. By iterating until all remaining predictors are significant, we obtain the final model:
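The backward-elimination loop can be sketched with ordinary least squares and the Gaussian AIC (up to an additive constant). This is a generic numpy illustration, not the R `step()` routine used in the study:

```python
import numpy as np

def aic(X, y):
    """Gaussian AIC (up to an additive constant) of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k

def backward_select(X, y, names):
    """Repeatedly drop the predictor whose removal lowers AIC the most;
    stop when no single removal improves the score."""
    cols = list(range(X.shape[1]))
    current = aic(X[:, cols], y)
    while len(cols) > 1:
        scores = [(aic(X[:, [c for c in cols if c != d]], y), d) for d in cols]
        best, drop = min(scores)
        if best >= current:
            break                 # every removal would worsen the fit
        current = best
        cols.remove(drop)
    return [names[c] for c in cols]
```

On data where the response depends on only one predictor, the procedure keeps that predictor and discards pure-noise columns.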
$$\log(\mathrm{ets}) \sim \mathrm{lo} + \mathrm{zout} + \mathrm{y2} + \mathrm{knn} \quad (3)$$

which is detailed in Table 4.
Table 4: Summary of the final regression model (3). Columns: Estimate, Std. Error, $t$ value, Pr(>|t|); rows: (Intercept), lo, zout, y2, knn.
This final model is able to explain most of the variance observed in log(ets) with a linear regression on four variables that are all LON network metrics. Among these metrics, the length of the paths to the global optimum (lo) and the weight disparity (y2) have the highest relative importance [10].
Without a check on the model assumptions, this would remain an observational study and could not be used to make predictions. To this end, a combination of parametric tests (not reported for space reasons) provided a positive confirmation [19]. However, visual diagnostics are more informative. In particular, Figure 3 helps to assess whether the regression residuals follow a normal distribution with zero mean and homogeneous variance, whereas Figure 4 displays the contribution to the model of each predictor in turn, highlighting possible violations of the linearity hypothesis [8]. All assumptions seem acceptable; therefore, formula (3) could be used to make inferences. In other words, the coefficients of formula (3) can be interpreted as conditional expectations for the average change in the response when one predictor undergoes a unitary change and all the others remain fixed. Since the dependent variable is log-transformed, this effect is multiplicative. The only important limitation of the proposed model is multicollinearity: due to the complex intercorrelations among LON metrics (cf. Figure 1), the predictors are not really independent. This does not invalidate the multiple linear regression analysis, but it inflates the variance of its coefficients and makes it harder to disentangle their respective contributions [8].

4 Conclusions
This article explored correlations between local optima network features and the performance of a stochastic local search algorithm running on the underlying combinatorial optimization problem. The NK family of landscapes and the iterated local search metaheuristic were considered. It has been shown in previous work, e.g. [22, 25], that some features of the LON networks are related to the landscape’s ruggedness, and thus to problem difficulty. However, no statistically testable model was presented. The contribution of this study was to investigate, with a statistically sound approach, which features of the LONs have a strong influence on the search performance, expressed as the expected running time to success. The results obtained through the use of a multiple linear regression model show that some LON metrics are more important than others. These are: the average length of the shortest paths to the optimum, the average out-degree, the average disparity, and the degree assortativity. This study confirms and provides significant evidence that LON modeling is a compressed-but-relevant view of the fitness landscape, and can be used to understand and predict search difficulty.
It is worth noticing that some network metrics can be estimated without knowing the global optimum beforehand, such as the average out-degree, the fitness-fitness correlation, the average disparity, and the assortativity. Using these metrics and an adequate statistical model, as we have done in this work, opens up exciting possibilities. With standard sampling methods, larger search spaces could be studied. Thereafter, using the performance model based on the estimated LON metrics, the search heuristic’s parameters or its operators could be tuned offline, or even controlled online. We plan to continue working in this direction and to extend this analysis to other combinatorial problems such as QAP.
5 Acknowledgments
Fabio Daolio and Marco Tomassini gratefully acknowledge the Swiss National Science Foundation for financial support under grant number 200021124578.
References

[1] A. Auger and N. Hansen. Performance evaluation of an advanced local search evolutionary algorithm. In Evolutionary Computation, 2005. The 2005 IEEE Congress on, volume 2, pages 1777–1784. IEEE, 2005.
[2] M. Barthélemy, A. Barrat, R. Pastor-Satorras, and A. Vespignani. Characterization and modeling of weighted networks. Physica A: Statistical Mechanics and its Applications, 346(1):34–43, 2005.
[3] J. Baxter. Local optima avoidance in depot location. Journal of the Operational Research Society, 32:815–819, 1981.
[4] J. Chambers, T. Hastie, et al. Linear models. In Statistical Models in S, chapter 4. Chapman & Hall, London, 1992.
[5] W. Cleveland. LOWESS: A program for smoothing scatterplots by robust locally weighted regression. The American Statistician, 35(1):54–54, 1981.
[6] F. Daolio, M. Tomassini, S. Vérel, and G. Ochoa. Communities of minima in local optima networks of combinatorial spaces. Physica A: Statistical Mechanics and its Applications, 390:1684–1694, 2011.
[7] F. Daolio, S. Verel, G. Ochoa, and M. Tomassini. Local optima networks of the Quadratic Assignment Problem. In Evolutionary Computation (CEC), 2010 IEEE Congress on, pages 1–8. IEEE, 2010.
[8] J. Fox and S. Weisberg. An R Companion to Applied Regression. Sage, Thousand Oaks, CA, second edition, 2011.
[9] F. Glover. Future paths for integer programming and links to artificial intelligence. Computers & Operations Research, 13(5):533–549, 1986.
[10] U. Groemping. Relative importance for linear regression in R: The package relaimpo. Journal of Statistical Software, 17(1):1–27, 2006.
[11] H. Hoos and T. Stützle. Stochastic Local Search: Foundations and Applications. Morgan Kaufmann, 2005.
[12] S. A. Kauffman. The Origins of Order. Oxford University Press, New York, 1993.
[13] H. R. Lourenço, O. Martin, and T. Stützle. Iterated local search. In Handbook of Metaheuristics, volume 57 of International Series in Operations Research & Management Science, pages 321–353. Kluwer Academic Publishers, 2002.
[14] O. Martin, S. W. Otto, and E. W. Felten. Large-step Markov chains for the TSP incorporating local search heuristics. Operations Research Letters, 11(4):219–224, 1992.
[15] M. Newman. The structure and function of complex networks. SIAM Review, pages 167–256, 2003.
[16] G. Ochoa, M. Tomassini, S. Verel, and C. Darabos. A study of NK landscapes’ basins and local optima networks. In Genetic and Evolutionary Computation Conference, GECCO 2008, pages 555–562. ACM, 2008.
[17] G. Ochoa, S. Verel, and M. Tomassini. First-improvement vs. best-improvement local optima networks of NK landscapes. In Parallel Problem Solving from Nature – PPSN XI, volume 6238 of Lecture Notes in Computer Science, pages 104–113. Springer, 2010.
[18] M. Pelikan. NK landscapes, problem difficulty, and hybrid evolutionary algorithms. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, GECCO ’10, pages 665–672, New York, NY, USA, 2010. ACM.
[19] E. A. Pena and E. H. Slate. gvlma: Global Validation of Linear Models Assumptions, 2010. R package version 1.0.0.1.
[20] C. Reidys and P. Stadler. Combinatorial landscapes. SIAM Review, 44(1):3–54, 2002.
[21] Y. Sakamoto and G. Kitagawa. Akaike Information Criterion Statistics. Kluwer Academic Publishers, 1987.
[22] M. Tomassini, S. Verel, and G. Ochoa. Complex-network analysis of combinatorial spaces: The NK landscape case. Phys. Rev. E, 78(6):066114, 2008.
[23] W. N. Venables and B. D. Ripley. Modern Applied Statistics with S. Springer, New York, fourth edition, 2002.
[24] S. Verel, F. Daolio, G. Ochoa, and M. Tomassini. Local optima networks with escape edges. In Proceedings of the International Conference on Artificial Evolution (EA-2011), pages 10–23, Angers, France, Oct 2011.
[25] S. Verel, G. Ochoa, and M. Tomassini. Local optima networks of NK landscapes with neutrality. IEEE Transactions on Evolutionary Computation, 15(6):783–797, 2011.