Multiobjective combinatorial optimization (MoCO) problems, where several criteria have to be optimized simultaneously, are receiving growing interest in the field of search algorithms. A central concept in multiobjective optimization is the Pareto dominance relation, which defines a partial order between feasible solutions. Roughly speaking, a solution dominates another one if it is at least as good according to all objective functions and strictly better according to at least one of them. A possible approach to solving a multiobjective problem consists in finding the whole set of non-dominated solutions, called the efficient set, or a subset that is close to it. This efficient set plays a central role in the structure of the search space.
The design of metaheuristics for multiobjective combinatorial optimization is a real challenge, as it is problem-dependent. As in single-objective optimization, the structure of the search space can help explain the performance of multiobjective metaheuristics. Two main classes of multiobjective metaheuristics can be distinguished. The first ones, known as scalar approaches, are based on multiple scalarized aggregations of the objective functions. However, they are only able to find a subset of efficient solutions, called supported efficient solutions. The second ones, known as Pareto-based approaches, directly or indirectly focus the search on the Pareto dominance relation. When the efficient set is too large, however, a metaheuristic has to manipulate a limited-size solution set during the search, and the appropriate limit is related to the size of the efficient set. In addition, connectedness refers to the property that efficient solutions are connected with respect to a neighborhood relation. When connectedness holds, it becomes possible to find the whole efficient set by iteratively exploring the neighborhood of the current approximation, initialized with at least one efficient solution. This strategy is often used explicitly, or implicitly by Pareto-based approaches. For the design of metaheuristics for MoCO, three main questions, related to the properties of the efficient set, are of interest in this paper:
What is the cardinality of the efficient set? Can we expect to identify or approximate the whole set of efficient solutions, or should we consider a mechanism to bound the size of the approximation set?
How many efficient solutions are supported? Is a scalar approach able to identify or approximate enough efficient solutions?
Are efficient solutions connected with respect to a neighborhood operator? Is it possible to identify or approximate additional efficient solutions by a simple local search initialized with a subpart of the efficient set?
In particular, we want to study such properties according to the objective correlation, as it seems to largely affect the solutions of MoCO problems and the behavior of metaheuristics. Few benchmarks take the correlation between objectives into account. To the best of our knowledge, the multiobjective quadratic assignment problem is the only one. In this problem, a parameter can tune the correlation between different pairs of objectives. Another well-known benchmark, the multiobjective NK-landscapes, facilitates the study of problem structure in multiobjective optimization. In this class, the epistatic degree, which is the degree of non-linearity of the problem, can be tuned very precisely. In this work, in order to study the problem structure, and in particular the structure of the efficient set, we define a general method to tune the correlation between all pairs of objectives very precisely. As an example, we define the multiobjective ρMNK-landscapes, an extension of multiobjective NK-landscapes with objective correlation. With such a benchmark, we can study the problem structure according to the objective space dimension, the epistasis and especially the objective correlation, and then highlight some guidelines for the design of efficient multiobjective metaheuristics.
In summary, the contributions of this work can be stated as follows. First, we propose a method to precisely tune the correlation between objective functions. It is applied to the design of ρMNK-landscapes, but it can easily be generalized to other problems. Second, we show the influence of the objective correlation on some properties of the efficient set (and its image in the objective space): its size, the proportion of supported solutions, and the connectedness of efficient solutions. Third, we relate those properties to the design of local search metaheuristics in order to help the practitioner make proper choices between several classes of methodologies. The remainder of the paper is organized as follows. Section 2 is dedicated to multiobjective combinatorial optimization, multiobjective metaheuristics, as well as single- and multi-objective NK-landscapes. Section 3 presents the design of ρMNK-landscapes. We conduct a theoretical analysis and an experimental study to assess how precisely the objective correlation can be tuned. Section 4 analyzes in depth the efficient set structure on this new class of problems according to the objective space dimension, the non-linearity and especially the objective correlation. The consequences on the design of multiobjective metaheuristics are discussed in the last section.
2.1 Multiobjective Combinatorial Optimization
A large number of real-world optimization problems are multiobjective by nature, because several criteria have to be considered simultaneously. A MoCO problem can be defined by a set of $M \geq 2$ objective functions $f = (f_1, f_2, \ldots, f_M)$, and a discrete set $X$ of feasible solutions in the decision space. Let $Z = f(X) \subseteq \mathbb{R}^M$ be the set of feasible outcome vectors in the objective space. In a maximization context, a solution $x \in X$ dominates a solution $x' \in X$, denoted by $x \succ x'$, iff $\forall m \in \{1, \ldots, M\}$, $f_m(x) \geq f_m(x')$, and $\exists m \in \{1, \ldots, M\}$ such that $f_m(x) > f_m(x')$. A solution $x \in X$ is said to be efficient (or non-dominated, Pareto optimal) if there does not exist any other solution $x' \in X$ such that $x'$ dominates $x$. The set of all efficient solutions is called the efficient set (or Pareto optimal set), denoted by $X_E$, and its mapping in the objective space is called the Pareto front. A possible approach in MoCO is to identify a minimal complete efficient set, i.e. one efficient solution mapping to each point of the Pareto front.
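The dominance relation and the efficient set defined above can be illustrated with a minimal sketch working directly on objective vectors (maximization). The function names `dominates` and `efficient_points` are hypothetical and chosen here for illustration only; the quadratic-time filter is the simplest possible choice, not a method taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a dominates b in a maximization context."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def efficient_points(points: List[Vector]) -> List[Vector]:
    """Keep the vectors that no other vector dominates (pairwise filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    outcomes = [(1, 5), (2, 4), (2, 2), (3, 1), (1, 1)]
    print(efficient_points(outcomes))
```

In this toy set, `(2, 2)` is dominated by `(2, 4)` and `(1, 1)` by every other vector, so three mutually non-dominated vectors remain.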
However, generating the entire efficient set of a MoCO problem is often infeasible for two main reasons. First, for most MoCO problems, the number of efficient solutions is known to be exponential in the size of the problem instance. In that sense, most MoCO problems are said to be intractable. Second, deciding if a feasible solution belongs to the efficient set is NP-complete for numerous MoCO problems, even if none of their single-objective counterparts is NP-hard. Therefore, the overall goal is often to identify a good efficient set approximation. To this end, metaheuristics in general, and evolutionary algorithms in particular, have received a growing interest since the late eighties, and multiobjective metaheuristics still constitute an active research area.
2.2 Metaheuristics for Multiobjective Combinatorial Optimization
Two main classes of metaheuristics for MoCO can be distinguished, see for instance . The first ones, known as scalar approaches, are based on multiple scalarized aggregations of the objective functions. The second ones, known as Pareto-based approaches, directly or indirectly focus the search on the Pareto dominance relation (or a slight modification of it). These two kinds of approaches can also be hybridized in a two-phase way.
Initial approaches dealing with MoCO are based on successive transformations of the original multiobjective problem into single-objective ones by means of a scalarization strategy. Most of the time, scalar approaches are based on a weighted-sum aggregation of the objective functions, which can be defined as $f_\lambda(x) = \sum_{m=1}^{M} \lambda_m f_m(x)$, where $\lambda_m > 0$ for all $m \in \{1, \ldots, M\}$. The problem is now to identify a (single) solution that maximizes $f_\lambda$. For any given weighting coefficient vector $\lambda$, if $x^\star$ maximizes $f_\lambda$ over the feasible solutions, then $x^\star$ is an efficient solution. Multiple weighting coefficient vectors can be iteratively defined so that several non-dominated solutions are identified (or approximated). For each scalarization, the corresponding solution is incorporated into an approximation set, whose dominated solutions are then discarded. However, in the combinatorial case, a number of efficient solutions are not optimal for any definition of $\lambda$. They are known as non-supported (efficient) solutions. On the contrary, there exist supported (efficient) solutions whose corresponding objective vectors are located on the convex hull of the Pareto front. The set of all supported efficient solutions will be denoted by $X_{SE}$. As a consequence, the proportion of non-supported solutions over the efficient set has a direct implication on the ability of scalar approaches to find a proper non-dominated set approximation.
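The limitation of weighted-sum scalarization can be seen on a toy bi-objective example. The sketch below sweeps a grid of weight vectors $(\lambda, 1-\lambda)$ and collects the maximizers; the function names, the grid resolution and the three example vectors are assumptions made for illustration. A finite grid only approximates the set of supported solutions.

```python
def weighted_sum(point, weights):
    """Weighted-sum aggregation of an objective vector."""
    return sum(w * v for w, v in zip(weights, point))


def found_by_weight_sweep(points, steps=100):
    """Bi-objective case: vectors that maximize the weighted sum for at least
    one weight vector (lam, 1 - lam) taken on a regular grid with 0 < lam < 1."""
    found = set()
    for s in range(1, steps):
        lam = s / steps
        best = max(points, key=lambda p: weighted_sum(p, (lam, 1.0 - lam)))
        found.add(best)
    return found


if __name__ == "__main__":
    # All three vectors are mutually non-dominated, but (0.4, 0.4) lies
    # below the segment joining the two extreme points.
    front = [(0.0, 1.0), (1.0, 0.0), (0.4, 0.4)]
    print(sorted(found_by_weight_sweep(front)))
```

Whatever the weight, one of the two extreme vectors scores at least 0.5, whereas `(0.4, 0.4)` always scores 0.4: it is efficient but non-supported, so no weighted sum can return it.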
Over the years, other types of approaches were proposed. They are based on the explicit or implicit use of the Pareto dominance relation, which defines a partial order between feasible solutions. The basic idea is to maintain a set of solutions (typically a population or an archive of mutually non-dominated solutions). The content of this set is then iteratively updated with new solutions built by means of variation or neighborhood operators. The update of this set is based on a specific decision on which solutions to accept or to choose for further manipulation. This process is iterated until no further improvement is possible or another stopping condition is fulfilled. In the end, this set corresponds to the approximation output by the algorithm. The implicit goal is to identify an approximation whose image in the objective space is (i) close to and (ii) well-spread along the Pareto front. However, as the number of efficient solutions is often intractable, we generally have to design specific strategies to limit the size of the approximation set. As a consequence, the cardinality of the efficient set also plays a major role in the design of multiobjective metaheuristics.
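A minimal sketch of the archive update step described above is given below, for objective vectors in a maximization context. The names `dominates` and `update_archive` are hypothetical, and the archive is left unbounded: any size-limiting strategy would be an additional design decision that is not shown here.

```python
def dominates(a, b):
    """True if objective vector a dominates b (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def update_archive(archive, candidate):
    """Return the archive of mutually non-dominated vectors after trying to
    insert `candidate`. The input list is not modified."""
    if any(dominates(a, candidate) or a == candidate for a in archive):
        return list(archive)  # candidate brings nothing new
    # Drop the members that the candidate dominates, then add it.
    return [a for a in archive if not dominates(candidate, a)] + [candidate]


if __name__ == "__main__":
    archive = []
    for vector in [(1, 1), (2, 0), (2, 2), (1, 1)]:
        archive = update_archive(archive, vector)
        print(vector, "->", archive)
```

In the usage example, `(2, 2)` removes both earlier members, and the final attempt to re-insert `(1, 1)` leaves the archive unchanged.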
More recently, the neighborhood structure of the efficient set has been claimed to play a crucial role in the development of efficient metaheuristics. One of these properties is known as connectedness [1, 9]. Let us define a graph such that each node represents an efficient solution, and an edge connects a pair of nodes if the corresponding solutions are neighbors with respect to a given neighborhood operator. This graph is called the efficient graph. A neighborhood operator is a function $\mathcal{N}$ that assigns a set of solutions $\mathcal{N}(x)$ to any solution $x$. $\mathcal{N}(x)$ is called the neighborhood of $x$, and a solution $x' \in \mathcal{N}(x)$ is called a neighbor of $x$. The efficient set is said to be connected if there exists a path between every pair of nodes in the graph. In particular, each efficient solution is then located in the neighborhood of at least one other solution from the efficient set (a necessary but not sufficient condition). This property has later been extended to the notion of cluster by introducing an arbitrary distance separating two efficient solutions. When connectedness holds, it becomes possible to find all the efficient solutions by iteratively exploring the neighborhood of the current approximation, starting with one (or more) solution(s) from the efficient set. This gives rise to a two-phase approach: (i) identify a number of (typically supported) non-dominated solutions; (ii) improve the set of non-dominated solutions by exploring their neighborhood.
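As an illustration of the efficient graph, the sketch below groups a given set of bit strings into connected components, linking two strings when their Hamming distance does not exceed a threshold. The function names and the `max_dist` parameter are assumptions made for this example, and the union-find bookkeeping is one possible implementation among others.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def connected_components(solutions, max_dist=1):
    """Components of the graph whose edges link solutions at Hamming
    distance <= max_dist (union-find over all pairs)."""
    parent = list(range(len(solutions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(solutions)), 2):
        if hamming(solutions[i], solutions[j]) <= max_dist:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(solutions):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


if __name__ == "__main__":
    print(connected_components(["0000", "0001", "0011", "1100"]))
    # Every solution has a neighbor here, yet the graph is not connected.
    print(connected_components(["0000", "0001", "1110", "1111"]))
```

The second call shows why having a neighbor in the set is not enough for connectedness: the four strings form two separate pairs.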
2.3 NK- and MNK-Landscapes
The family of NK-landscapes [11] is a problem-independent model used for constructing multimodal landscapes. $N$ refers to the number of (binary) genes in the genotype (i.e. the string length) and $K$ to the number of genes that influence a particular gene from the string (the epistatic interactions). By increasing the value of $K$ from 0 to $N-1$, NK-landscapes can be gradually tuned from smooth to rugged. The fitness function (to be maximized) of an NK-landscape, $f_{NK} : \{0,1\}^N \to [0,1)$, is defined on binary strings with $N$ bits. An ‘atom’ with fixed epistasis level is represented by a fitness component $f_i : \{0,1\}^{K+1} \to [0,1)$ associated to each bit $i$. Its value depends on the allele at bit $i$ and also on the alleles at $K$ other epistatic positions ($K$ must fall between 0 and $N-1$). The fitness $f_{NK}(x)$ of a solution $x \in \{0,1\}^N$ corresponds to the mean value of its $N$ fitness components $f_i$:
\[ f_{NK}(x) = \frac{1}{N} \sum_{i=1}^{N} f_i(x_i, x_{i_1}, \ldots, x_{i_K}) \]
where $\{i_1, \ldots, i_K\} \subset \{1, \ldots, i-1, i+1, \ldots, N\}$. Several ways have been proposed to set the $K$ bits from the bit string of size $N$. Two possibilities are mainly used: adjacent and random neighborhoods. With an adjacent neighborhood, the $K$ bits nearest to the bit $i$ are chosen (the genotype is taken to have periodic boundaries). With a random neighborhood, the $K$ bits are chosen randomly on the bit string. Each fitness component $f_i$ is specified by extension, i.e. a number from $[0,1)$ is associated with each element from $\{0,1\}^{K+1}$. Those numbers are uniformly distributed in the range $[0,1)$.
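The construction described above can be sketched as follows for a single-objective landscape with a random neighborhood. The names `make_nk_landscape`, `links` and `tables` are hypothetical, as is the way the lookup index is built from the bits; only the overall scheme (one table of uniform numbers per bit, fitness equal to the mean of the looked-up values) follows the text.

```python
import random
from itertools import product


def make_nk_landscape(n, k, rng):
    """Return a fitness function over sequences of n bits (0/1 integers)."""
    # links[i]: the k other positions that influence bit i (random neighborhood)
    links = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # tables[i]: one uniform number in [0, 1) per configuration of the k+1 bits
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(x):
        total = 0.0
        for i in range(n):
            index = x[i]
            for j in links[i]:
                index = (index << 1) | x[j]  # pack the k+1 bits into an index
            total += tables[i][index]
        return total / n  # mean of the n fitness components

    return fitness


if __name__ == "__main__":
    f = make_nk_landscape(n=8, k=2, rng=random.Random(1))
    best = max(product((0, 1), repeat=8), key=f)
    print(best, f(best))
```

With `k=0` each component depends on its own bit only, so the landscape is additive; larger values of `k` make the contribution of a bit depend on more of the other bits.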
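More recently, a multiobjective variant of NK-landscapes (namely MNK-landscapes) [5] has been defined with a set of $M$ fitness functions:
\[ \forall m \in \{1, \ldots, M\}, \quad f_{NK_m}(x) = \frac{1}{N} \sum_{i=1}^{N} f_{m,i}(x_i, x_{i_{m,1}}, \ldots, x_{i_{m,K_m}}) \]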
The numbers of epistasis links can theoretically be different for each fitness function. But in practice, the same epistasis degree for all is used. Each fitness component is specified by extension with the numbers . In the original -landscapes , these numbers are randomly and independently drawn from . As a consequence, it is very unlikely that two different solutions map to the same point in the objective space.
3 ρMNK-Landscapes: Multiobjective NK-Landscapes with Correlation
In this section, we define the - and the -landscapes, which are based on the -landscapes . In this multiobjective model, the correlation between objective functions can be precisely tuned by a correlation matrix. This makes it possible to study the simultaneous influence of the objective space dimension, the non-linearity and the objective correlation on the main properties of multiobjective fitness landscapes. The construction of the landscapes is defined first; an analytic proof of the correlation between objectives, completed with an experimental study, is then given. Note that the proposed approach to tune the objective correlation can be applied to other MoCO problems where the objective functions are defined as sums, share the same definition, but are computed with different cost or profit matrices. This is the case, for instance, of the multiobjective knapsack, traveling salesman and quadratic assignment problems [4, 6].
In the proposed -landscapes, the epistasis structure is identical for all the objective functions: , and , , . The fitness components are not defined independently. The numbers follow a multivariate uniform law of dimension , defined by a correlation matrix . Thus, the ’s follow a multidimensional law with uniform marginals and the correlations between s are defined by the matrix . So, the four parameters of the family of -landscapes are () the number of objective functions , () the length of the bit string , () the number of epistatic links , and () the correlation matrix .
The matrix is a symmetric positive-definite matrix where numbers can be defined. In order to limit the number of free numbers in matrix , we define the matrix which has the same correlation between all the objectives: for all , and for all . In this case, we denote -landscapes by -landscapes, and the original -landscapes are equivalent to -landscapes with . However, it is not possible to have the matrix for all between . must be positive-definite: , . So, must be greater than . For two-objective problems, all the correlations between are possible. However, for three-objective problems, the correlation must fall in . Of course, if one wants to study very negative correlations between some pairs of objectives, it is possible to design a matrix that keeps the condition that is positive-definite.
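The bound on a common correlation value can be recovered from the eigenvalues of an equicorrelation matrix. The derivation below is a sketch under notational assumptions made here for illustration: $C_\rho$ denotes the $M \times M$ matrix with ones on the diagonal and the same value $\rho$ everywhere else, $I_M$ the identity matrix and $\mathbf{1}$ the all-ones vector.

```latex
C_\rho = (1-\rho)\, I_M + \rho\, \mathbf{1}\mathbf{1}^{\top}
\quad\Longrightarrow\quad
\text{eigenvalues of } C_\rho :\;
1 + (M-1)\rho \;\; (\text{once}), \qquad 1 - \rho \;\; (M-1 \text{ times})

C_\rho \text{ positive-definite}
\iff 1 + (M-1)\rho > 0 \ \text{ and } \ 1 - \rho > 0
\iff -\frac{1}{M-1} < \rho < 1
```

With this notation, the lower bound is $-1$ for $M = 2$, $-1/2$ for $M = 3$ and $-1/4$ for $M = 5$, which is why strongly negative common correlations are only available when the number of objectives is small.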
To generate random variables with uniform marginals and a specified correlation matrix, we follow the work of Hotelling and Pabst [12]. We first generate a multinormal law of means and correlation matrix . Then, the values are uniformly distributed with a correlation matrix , where is the univariate normal cumulative distribution function. Note that this is not the only way to generate a multivariate uniform law.
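The two-step construction (correlated normal draws, then the normal cumulative distribution function) can be sketched with the standard library only. The adjustment $r = 2\sin(\pi c / 6)$ applied to each target correlation $c$ is the usual Gaussian-copula relation between the correlation of the normal variables and that of the resulting uniforms; it is an assumption of this sketch about the step whose formula is missing above. The names `cholesky`, `normal_cdf` and `make_sampler` are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a; raises if a is not positive-definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def normal_cdf(z):
    """Cumulative distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def make_sampler(target_corr, rng):
    """Return a function drawing one vector with uniform marginals whose
    pairwise correlations approximate target_corr."""
    m = len(target_corr)
    # Assumed adjustment: correlation needed on the normal scale.
    normal_corr = [[1.0 if i == j
                    else 2.0 * math.sin(math.pi * target_corr[i][j] / 6.0)
                    for j in range(m)] for i in range(m)]
    low = cholesky(normal_corr)

    def sample():
        g = [rng.gauss(0.0, 1.0) for _ in range(m)]
        y = [sum(low[i][k] * g[k] for k in range(i + 1)) for i in range(m)]
        return [normal_cdf(v) for v in y]

    return sample


if __name__ == "__main__":
    sample = make_sampler([[1.0, 0.7], [0.7, 1.0]], random.Random(42))
    print([round(u, 3) for u in sample()])
```

The sampler raises a `ValueError` when the adjusted matrix is not positive-definite, which can happen for strongly negative correlations with three or more objectives.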
3.2 Correlation between Objective Functions
The construction of -landscapes defines correlation between the ’s but not directly between the objectives. In this section, we prove by algebra that the correlation between objectives is tuned by the matrix . This proof is followed by an experimental analysis.
Let be the fitness vector values of the solutions with respect to objective . The correlation between objectives and is: where and are the standard deviations of fitness values over the landscape of the and fitness functions. (resp. ) corresponds to the average value of the vectors (resp. ) of fitness component values:
By definition, when , and , where is the correlation defined in the matrix , and (resp. ) is the standard deviation of fitness component . The correlation between objectives and becomes:
By construction of the fitness functions, the following relation between standard deviations stands (resp. for ). On average, the are equal to the standard deviation of the uniform law on .
Then, the average of the correlations between objective functions is given by the matrix . In the ρMNK-landscapes, the parameter $\rho$ allows tuning very precisely the correlation between all pairs of objectives.
In order to enumerate the search space exhaustively, we conduct an empirical study for . In order to minimize the influence of the random creation of landscapes, we considered 30 different and independent landscapes for each parameter combination: , , and . The measures reported are the average over these 30 landscapes. The remaining set of parameters is given in Table 1. Figure 1 shows the average of the Spearman correlation coefficient according to the parameters , and . For , there are several correlation coefficients; we report here the average correlation coefficient over all the objectives (these values are all very close). This confirms the result of equation (1): the correlation coefficients are very close to the expected value .
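A scaled-down version of this check can be sketched for two objectives. The code below builds one small bi-objective landscape whose fitness components are correlated pairs, enumerates the whole search space, and measures the correlation between the two objectives. Several choices are assumptions of this sketch and not the settings of the study: the instance size, the random neighborhood, the Gaussian-copula adjustment $2\sin(\pi\rho/6)$, and the use of the Pearson coefficient in place of the Spearman coefficient to keep the code short. All function names are hypothetical.

```python
import math
import random
from itertools import product


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def correlated_pair(rho, rng):
    """Two uniform numbers whose correlation is approximately rho."""
    r = 2.0 * math.sin(math.pi * rho / 6.0)  # assumed copula adjustment
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return normal_cdf(g1), normal_cdf(r * g1 + math.sqrt(1.0 - r * r) * g2)


def make_biobjective_landscape(n, k, rho, rng):
    """Same epistasis structure for both objectives, correlated components."""
    links = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [[correlated_pair(rho, rng) for _ in range(2 ** (k + 1))]
              for _ in range(n)]

    def evaluate(x):
        f1 = f2 = 0.0
        for i in range(n):
            index = x[i]
            for j in links[i]:
                index = (index << 1) | x[j]
            y1, y2 = tables[i][index]
            f1 += y1
            f2 += y2
        return f1 / n, f2 / n

    return evaluate


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)


def objective_correlation(n, k, rho, seed):
    """Correlation between the two objectives over the whole search space."""
    evaluate = make_biobjective_landscape(n, k, rho, random.Random(seed))
    values = [evaluate(x) for x in product((0, 1), repeat=n)]
    return pearson([v[0] for v in values], [v[1] for v in values])


if __name__ == "__main__":
    for rho in (-0.9, 0.0, 0.9):
        print(rho, round(objective_correlation(10, 2, rho, seed=1), 3))
```

A single small instance fluctuates around the target value, so averaging over several independent instances is the natural next step.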
Then, in the ρMNK-landscapes, the parameter $\rho$ tunes the correlation very precisely and, in addition to the correlated multiobjective quadratic assignment problem, it is possible to tune this correlation between all pairs of objectives. In the following, we study the influence of epistasis, number of objectives and objective correlation on the properties of the efficient set for the ρMNK-landscapes model.
4 Analysis of the Efficient Set Properties
In this section, we conduct experiments on the -landscapes in order to study different properties of the efficient set: its cardinality, the number of supported solutions and connectedness-related features. The instances under study are defined by the parameter setting given in Table 1.
4.1 Cardinality of the Efficient Set
Figure 2 shows the proportion of efficient solutions in the search space according to parameters , and of ρMNK-landscapes. First of all, the epistatic parameter does not seem to have a major influence on the results. In contrast, the objective correlation modifies the number of efficient solutions by several orders of magnitude. Indeed, the proportion decreases from to () for two-objective problems, and from to () for . With respect to the number of objective functions (, and ), the size increases by several orders of magnitude according to . For a negative objective correlation (), the proportion goes from to whereas it goes from to for a positive correlation ().
The influence of the objective correlation on the size of the efficient set becomes as important as the dimension of the objective space. Many solutions become efficient when the anti-correlation is high. Now, let us suppose that we want to set or to bound the size of the approximation set by . Such a parameter setting is often used while handling a population or an archive of non-dominated solutions in a multiobjective metaheuristic. For the ρMNK-landscapes, the proportion of non-dominated solutions over the search space should be roughly around (this goes up to for solutions). Whatever the correlation value , a solution approximation set always allows storing the whole efficient set for two-objective problems. However, this is not the case for a higher dimension of the objective space. For instance, for , solutions suffice to store the whole efficient set for a high objective correlation only (). In other words, for , we cannot expect to identify the whole efficient set exhaustively by handling a solution approximation set.
To summarize, when the number of objectives increases, and even more so when the objectives are in conflict, the size of the efficient set becomes very large and tends to be intractable. In this case, it is not reasonable to expect to identify the whole efficient set, and a limited-size approximation should be considered. This first result shows the importance of designing a benchmark where the objective correlation can be tuned precisely, even when . Such a property should be taken into consideration for the development of metaheuristics when the number of objectives becomes too large, and when there is a high anti-correlation between objective functions. Special attention should be paid to the size of the approximation set handled by the search approach.
4.2 Number of Supported Efficient Solutions
Figure 3 shows the proportion of supported solutions in the search space according to parameters , and of ρMNK-landscapes. Mainly, this number follows the size of the efficient set: the epistatic parameter has little influence on it. When the objective space dimension increases or the objective correlation decreases, the number of supported solutions gets higher. The difference with the size of the efficient set becomes clearer in Figure 4, which gives the proportion of supported solutions over the efficient set. This proportion is nearly independent of the epistasis degree of the problem (). However, when the objective correlation increases, this proportion increases. For a high objective correlation (), nearly all solutions become supported (this is even the case for some instances). The same observation can be made with the number of objectives. The number of supported solutions increases with the cardinality of the efficient set, but the former increases faster than the latter.
When relating this property to the design of a metaheuristic, we can conclude that scalar approaches should become more appropriate when the number of objectives is low, and when the objective correlation is high.
4.3 Connectedness of the Efficient Set
In this section, the efficient graph (see Section 2.2), i.e. the graph of efficient solutions where edges are induced by a given neighborhood operator, is analyzed.
Firstly, the efficient graph can be composed of several connected components. In this case, not all the efficient solutions are connected with respect to the neighborhood relation. Figure 5 shows the average ratio of the largest connected component size induced by Hamming distance . Nearly all solutions of the efficient graph are in the same component when the objective space dimension is high () and when the objective correlation is negative (). At first sight, such a result seems to be explained by the very large size of the efficient set obtained for those parameters (see Section 4.1). However, we compared this result to the size of the largest component of a graph of the same size, but where the nodes are now random solutions. We found out that this size is much smaller than the one of the efficient graph, in particular when the epistatic degree is low ( times larger for , , and ). Consequently, the relative size of the largest component is not the consequence of the number of efficient solutions only.
Contrary to the size of the efficient set, the size of the largest connected component seems to depend on the epistatic degree . Indeed, this size decreases when increases. As an example, for and , the ratio size is for and lower than for . When the epistatic degree is low, the objective values of neighboring solutions are correlated, and this correlation decreases with the epistatic degree. This could explain our experimental result: if a solution is efficient, the probability that one of its neighbors is also efficient gets higher when the epistatic degree gets lower.
The objective correlation and the number of objective functions also affect the size of the largest connected component. But the variation differs with the number of objective functions. For , the ratio of the largest component size increases when the objective correlation increases (apart from ). For , the ratio decreases when the objective correlation increases. As a consequence, except when the efficient set is intractable (that is, when there is a high objective space dimension and a high anti-correlation degree), we cannot expect to reach all the efficient solutions by iteratively exploring the neighborhood of an approximation set initialized with one non-dominated solution. However, when there are several connected components in the efficient graph based on Hamming distance (see the definition of cluster in Section 2.2), the distance between those components could be small.
When efficient solutions are connected with respect to a neighborhood structure related to Hamming distance and not , the efficient set is then said to be -connected. When the minimal distance is around , which is the average distance between random solutions, we can say that the distance between efficient solutions is large. Figure 6 shows the average minimal distance to connect all the efficient solutions. This minimal distance increases when the epistatic degree increases. As an example, for , the average distance is equal to and for dimension 2 and 5, respectively, when , whereas it is equal to and , respectively, when . These results agree with the previous ones on the largest component size: at the same time, the size of the largest component decreases, and the distance between efficient solutions increases.
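The minimal distance needed to connect a set of solutions can be computed as the largest edge of a minimum spanning tree over pairwise Hamming distances. The sketch below uses a Kruskal-style scan with union-find; the function names are hypothetical, and nothing here is claimed to be the procedure used to produce Figure 6.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def minimal_connecting_distance(solutions):
    """Smallest k such that linking every pair of solutions at Hamming
    distance <= k yields a connected graph (0 for fewer than two solutions)."""
    n = len(solutions)
    if n < 2:
        return 0
    # All pairwise distances, scanned in increasing order (Kruskal-style).
    edges = sorted((hamming(solutions[i], solutions[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            components -= 1
            if components == 1:
                return dist  # last merge: the bottleneck distance
    return 0  # not reached: the complete graph is always connected


if __name__ == "__main__":
    print(minimal_connecting_distance(["0000", "0001", "0111"]))
```

In the usage example, the first two strings are at distance 1, but the third one needs a step of length 2 to be reached, so the set becomes connected only from distance 2 onward.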
The average -connectedness also increases when the objective correlation increases. For an objective space dimension and a negative objective correlation , it could be possible to reach all non-dominated solutions from another one, as the average minimal distance is lower than . In contrast, when the objective correlation is positive, it should be easier to find a new non-dominated solution by restarting the search from a random solution, rather than exploring the neighborhood of a given non-dominated solution when the distance is around a third of the bit string length. When objectives are correlated, fewer solutions are to be found, but knowing some of them will not help to find more. Then, the design of an efficient metaheuristic has to be different according to the objective correlation. In a two-phase approach, the number of starting solutions and the size of the neighborhood can be tuned according to the correlation between objectives following this study.
In this paper, we analyzed the consequence of the objective space dimension, the non-linearity, and the objective correlation on the structure of multiobjective combinatorial search spaces for the design of metaheuristics. We proposed a new method to design a multiobjective combinatorial benchmark where the correlation between all pairs of objectives can be tuned very precisely. As an example, we defined the ρMNK-landscapes, which extend the multiobjective NK-landscapes.
Figure 7 shows three examples of ρMNK-landscapes in the objective space. The number of objectives is , the parameter is , and the length of the bit string is . This gives a summary of our results in a more intuitive way. When the objective correlation is negative, the objectives are in conflict (feasible solutions are in green). The efficient set size (in red) is large, and the problem could become intractable. In this case, a metaheuristic has to find a limited-size approximation of the efficient set only. When the objective correlation is null, as in the original MNK-landscapes, the image of the search space in the objective space can be represented as a multidimensional ‘bowl’. The objectives are independent. When the objective correlation is positive, there exist few solutions in the efficient set. Nearly all solutions become supported. Indeed, when the number of objectives is low, and when the objective correlation is high, efficient solutions are supported. We can conclude that scalar approaches should become more appropriate in such a case. The connectedness property is not represented in the last figure. The size of the largest connected component and the minimal distance to connect all the efficient solutions depend on the objective space dimension, the epistatic degree, and also on the objective correlation. A two-phase strategy, starting from some efficient (supported) solutions, and exploring their neighborhood at a given distance, can be tuned according to the results of this work.
Relating those properties to the design of local search metaheuristics helps to make proper choices between several classes of methodologies. This analysis shows the importance of the objective correlation in the design of benchmark problems, in particular when the number of objectives is higher than . In future work, we will use sampling techniques to study ρMNK-landscapes of larger size. We will also compare our results on the properties of the search space with the performance of different metaheuristics. However, the efficient set does not cover all the search space properties, so future work will also focus on the properties related to the Pareto local optima, and to the Pareto local optimum sets.
References
[1] Ehrgott, M., Klamroth, K.: Connectedness of efficient solutions in multiple criteria combinatorial optimization. European Journal of Operational Research 97(1) (1997) 159–166
[2] Mote, J., Murthy, I., Olson, D.L.: A parametric approach to solving bicriterion shortest path problems. European Journal of Operational Research 53(1) (1991) 81–92
[3] Paquete, L., Stützle, T.: A study of stochastic local search algorithms for the biobjective QAP with correlated flow matrices. European Journal of Operational Research 169(3) (2006) 943–959
[4] Knowles, J., Corne, D.: Instance generators and test suites for the multiobjective quadratic assignment problem. In: Second International Conference on Evolutionary Multi-Criterion Optimization (EMO 2003). Volume 2632 of Lecture Notes in Computer Science. Faro, Portugal, Springer (2003) 295–310
[5] Aguirre, H.E., Tanaka, K.: Working principles, behavior, and performance of MOEAs on MNK-landscapes. European Journal of Operational Research 181(3) (2007) 1670–1690
[6] Ehrgott, M.: Multicriteria optimization. Second edn. Springer (2005)
[7] Paquete, L., Stützle, T.: Stochastic local search algorithms for multiobjective combinatorial optimization: A review. In: Handbook of Approximation Algorithms and Metaheuristics. Volume 13 of Computer & Information Science Series. Chapman & Hall / CRC (2007)
[8] Knowles, J., Corne, D.: Bounded Pareto archiving: Theory and practice. In: Metaheuristics for Multiobjective Optimisation. Volume 535 of Lecture Notes in Economics and Mathematical Systems. Springer-Verlag (2004) 39–64
[9] Gorski, J., Klamroth, K., Ruzika, S.: Connectedness of efficient solutions in multiple objective combinatorial optimization. Technical Report 102/2006, University of Kaiserslautern, Department of Mathematics (2006)
[10] Paquete, L., Stützle, T.: Clusters of non-dominated solutions in multiobjective combinatorial optimization: An experimental analysis. In: Multiobjective Programming and Goal Programming. Volume 618 of Lecture Notes in Economics and Mathematical Systems. Springer (2009) 69–77
[11] Kauffman, S.A.: The Origins of Order. Oxford University Press, New York (1993)
[12] Hotelling, H., Pabst, M.R.: Rank correlation and tests of significance involving no assumptions of normality. Ann. Math. Stat. 7 (1936) 29–43
[13] Weinberger, E.D.: Correlated and uncorrelated fitness landscapes and how to tell the difference. Biological Cybernetics 63 (1990) 325–336