The single-objective long path problem [1] was introduced to show that a problem instance can be difficult to solve for a hillclimber-like heuristic even if the search space is unimodal, i.e. the single local optimum is the global optimum. For such a problem, a hillclimber is guaranteed to reach the global optimum, but the length of the path leading to it is exponential in the dimension of the search space. As a consequence, a hillclimbing-based heuristic cannot be expected to solve the problem in polynomial time. The ‘path length’ thus ranks among the sources of problem difficulty, on the same level as multimodality, ruggedness, deceptiveness, and so on. Rudolph [2] demonstrated that the long path problem can be solved in polynomial expected time by an evolutionary algorithm (EA) which is able to mutate more than one bit at a time. Such an EA can take shortcuts outside the path, which makes the computation more efficient. However, this does not change the argument that, even for unimodal problems, the path length to the global optimum must be taken into account in the design of efficient local search algorithms.
As in single-objective optimization, the structure of the search space can explain the difficulty encountered by multiobjective local search methods. In multiobjective combinatorial optimization (MoCO), the efficient set is the set of solutions which are not dominated by any other feasible solution. It is often claimed that the structure of this efficient set plays a crucial role in the development of efficient local search methods. Connectedness is the property that efficient solutions are connected (at distance $1$) with respect to a neighborhood relation [3, 4]. This property was later extended to the notion of cluster, where distances can take higher values [5]. When connectedness holds, it becomes possible to find all the efficient solutions by iteratively exploring the neighborhood of the current approximation set, starting from one (or more) solution(s) of the efficient set. This strategy coincides with the Pareto Local Search (PLS) algorithm [6] initialized with one efficient solution, which then acts like an exact approach. However, it is commonly known that, for most MoCO problems, the number of non-dominated solutions is not polynomial in the size of the problem instance, so that a PLS algorithm can take exponential time to identify the efficient set once the latter contains an exponential number of solutions. The goal of the optimization process is then often to identify a representative sample set, containing a limited number of efficient solutions.
In this work, we argue that connectedness is not the only feature which explains the difficulty of MoCO for search algorithms. Analogously to the single-objective long path problems, where a hillclimbing algorithm is outperformed by a simple EA even if the search space is unimodal, we here compare straightforward extensions of those two algorithms, a hillclimbing algorithm and a simple EA, in a multiobjective context. On one side, PLS extends a single-objective hillclimber in terms of Pareto dominance [6]. On the other side, we use an adaptation of the Simple Evolutionary Multiobjective Optimization (SEMO) algorithm [8]. Both approaches are initialized with one solution from the efficient set, corresponding to an extreme point of the Pareto front. In this paper, we propose the definition of the biobjective long path problem (-lp) and of the biobjective multiple path problem (-mp). With -lp, we show experimentally that, even if the efficient set is connected, the runtime required by PLS to find a reasonably good approximation (in terms of hypervolume [9]) is larger than for SEMO, and becomes computationally prohibitive for large-size instances. Furthermore, we construct -mp instances where the efficient set is completely disconnected, but where some additional shortcuts are available to walk from one non-dominated solution to the others. In this case, we show experimentally that PLS can find a good approximation in significantly less time than SEMO. Indeed, both algorithms differ in the way they sample the efficient set. For -lp, PLS can only follow the path defined by the connectedness property, while SEMO is able to take some shortcuts outside of the path. For -mp, PLS takes advantage of the multiple paths, defined outside the efficient set, which are temporarily non-dominated and which lead to further non-dominated solutions.
The remainder of the paper is organized as follows. First, some notions related to MoCO, connectedness and long path problems are briefly presented in the next section. Section 3 introduces the class of biobjective long path problems, for which the efficient set is fully connected and exponential in the size of the problem instance. Next, the class of multiple path problems is presented in Section 4. It handles an exponential number of disconnected efficient solutions. Our experiments illustrate that PLS appears to be outperformed by SEMO for biobjective long path problems while, more surprisingly, the opposite occurs for multiple path problems. This work leads to further investigations on a proper definition of fitness landscapes for MoCO, not only with regard to the efficient set itself, but also to the way that leads to its approximation.
2.1 Multiobjective Combinatorial Optimization
A multiobjective optimization problem can be defined by a set of $n \geq 2$ objective functions $f = (f_1, f_2, \ldots, f_n)$, and a set $X$ of feasible solutions in the decision space. In the combinatorial case, $X$ is a discrete set. Let $Z = f(X) \subseteq \mathbb{R}^n$ denote the set of feasible outcome vectors in the objective space. To each solution $x \in X$ is assigned an objective vector $z \in Z$ on the basis of a vector function $f : X \rightarrow Z$ with $z = f(x)$. Without loss of generality, we here assume that all objective functions are to be maximized. A solution $x \in X$ is said to dominate a solution $x' \in X$, denoted by $x \succ x'$, iff $f_i(x) \geq f_i(x')$ for all $i \in \{1, \ldots, n\}$, and there exists $j \in \{1, \ldots, n\}$ such that $f_j(x) > f_j(x')$. A solution $x \in X$ is said to be efficient (or Pareto optimal, non-dominated) if there does not exist any other solution $x' \in X$ such that $x'$ dominates $x$. The set of all efficient solutions is called the efficient set and its mapping in the objective space is called the Pareto front. A possible approach in MoCO is to find a minimal set of efficient solutions, such that exactly one solution maps to each non-dominated vector. However, generating the entire efficient set of a MoCO problem is usually infeasible for two main reasons. First, the number of efficient solutions is typically exponential in the size of the problem instance. In that sense, most MoCO problems are said to be intractable. Second, deciding if a feasible solution belongs to the efficient set is known to be NP-complete for numerous MoCO problems, even if none of their single-objective counterparts is NP-hard. Therefore, the overall goal is often to identify a good efficient set approximation, ideally a subpart of the efficient set. To this end, heuristic approaches have received growing interest over the last decades.
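The dominance relation and the efficient set defined above can be illustrated with a minimal Python sketch. It assumes maximization of all objectives, as in the text; the names `dominates` and `non_dominated` are illustrative only and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector a Pareto-dominates b (maximization)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(vectors: Sequence[Vector]) -> List[Vector]:
    """Keep the vectors that no other vector of the input dominates."""
    return [v for v in vectors if not any(dominates(u, v) for u in vectors)]


if __name__ == "__main__":
    # (2, 2) is dominated by (3, 3); the three other vectors are incomparable.
    print(non_dominated([(1, 4), (2, 2), (3, 3), (4, 1)]))
```

The quadratic filter is only meant for small examples; it makes no attempt to handle the exponentially large efficient sets discussed in this paper.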
2.2 Local Search and Connectedness
A neighborhood structure is a function $N : X \rightarrow 2^X$ that assigns a set of solutions $N(x) \subset X$ to any solution $x \in X$. $N(x)$ is called the neighborhood of $x$, and a solution $x' \in N(x)$ is called a neighbor of $x$. Local search algorithms for MoCO, like the Pareto Local Search (PLS) [6], generally combine the use of such a neighborhood structure with the management of an archive (or population) of mutually non-dominated solutions found so far. The basic idea is to iteratively improve this archive by exploring the neighborhood of its own content until no further improvement is possible, or until another stopping condition is fulfilled.
Recently, local search approaches have been successfully applied to MoCO problems. Some structural properties of the landscape seem to allow the search space to be explored in an effective way. One such property, related to the efficient set, is connectedness [3, 4]. As argued by the original authors, it could provide a theoretical justification for the design of multiobjective local search. Let us define a graph such that each node represents an efficient solution, and an edge connects a pair of nodes if the corresponding solutions are neighbors with respect to a given neighborhood relation. The efficient set is said to be connected if there exists a path between every pair of nodes in the graph. Paquete and Stützle [5] extended this notion by introducing an arbitrary distance separating two efficient solutions (i.e. the minimal number of neighbors to visit to go from one solution to another). Unfortunately, in the general case, rather negative results have been reported in the literature for some classical MoCO problems [3, 4]. However, in practice, many empirical results show that efficient solutions for some MoCO problems are strongly clustered with respect to more classical neighborhood structures from combinatorial optimization, see for instance [5]. Indeed, when connectedness holds, by starting with one or more non-dominated solutions, it becomes possible to find all the efficient solutions through a basic iterative neighborhood exploration procedure, like PLS. However, we show in this paper that connectedness is not the only property to deal with when searching for an approximation of the efficient set.
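The graph-based definition of connectedness can be sketched as a breadth-first search over the efficient solutions. This is a hedged illustration, assuming solutions are bit strings and the neighborhood is induced by Hamming distance; the `distance` parameter is a hypothetical name standing in for the distance threshold of the cluster notion of Paquete and Stützle.

```python
from collections import deque
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_connected(efficient: Sequence[str], distance: int = 1) -> bool:
    """Check whether the graph of efficient solutions is connected.

    Two solutions are linked when their Hamming distance is at most
    `distance` (distance=1 gives plain connectedness).
    """
    nodes = list(efficient)
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for other in nodes:
            if other not in seen and hamming(current, other) <= distance:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(nodes)


if __name__ == "__main__":
    print(is_connected(["000", "001", "011"]))       # chain of 1-bit moves
    print(is_connected(["000", "011"]))              # gap of two bits
    print(is_connected(["000", "011"], distance=2))  # linked at distance 2
```

The pairwise scan is quadratic in the number of efficient solutions, so this check is only practical on small, fully enumerated instances.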
2.3 The Single-objective Long $k$-path Problem
The long path problem was introduced by Horn et al. [1] to design unimodal landscapes where the path length to reach the global optimum is exponential in the size of the problem instance. The long $k$-path is defined on bit strings of size $\ell$. Let $P^k_\ell$ be a long $k$-path of dimension $\ell$, and $p_i$ the $i$-th solution on this path. The long $k$-path of dimension $1$ is only made of two solutions, $P^k_1 = (0, 1)$, and the path of dimension $\ell + k$ can be defined by recursion:

$$P^k_{\ell+k} = \left(0^k p_1, \ldots, 0^k p_s,\; 0^{k-1} 1\, p_s, \ldots, 0\, 1^{k-1} p_s,\; 1^k p_s, \ldots, 1^k p_1\right)$$

where $s$ is the length of the $k$-path of dimension $\ell$. The fitness function of the long $k$-path problem (to be maximized) is defined as follows. For all $x \in \{0,1\}^\ell$:

where $|x|_1$ is the number of ‘1’ in the bit string $x$. In the long $k$-path, a shortcut can be found by flipping $k$ consecutive bits. For a hillclimbing algorithm which chooses the best solution in the neighborhood defined by Hamming distance $1$, the number of iterations to reach the global optimum matches the length of the path, $(k+1)\, 2^{(\ell-1)/k} - k + 1$. Since each iteration evaluates the $\ell$ neighbors of the current solution, the number of evaluations is then exponential in $\ell / k$ for a hillclimber. On the contrary, a (1+1)-EA which flips each bit with probability $1/\ell$ at each iteration finds the global optimum in polynomial expected running time [2] (the lower bound of the expected runtime could be exponential when $k = \sqrt{\ell - 1}$ [11]).
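The recursive construction of the path can be sketched in a few lines of Python. This is an illustration of the standard long $k$-path recursion as described by Rudolph [2], assuming that $k$ bits are prepended at each step and that $(\ell - 1)/k$ is an integer; the function name `long_k_path` is hypothetical and the fitness function is deliberately left out.

```python
from typing import List


def long_k_path(k: int, length: int) -> List[str]:
    """Build the long k-path on bit strings of the given length.

    Starts from the path (0, 1) of dimension 1 and repeatedly applies the
    recursion: copy prefixed by 0^k, bridge of k-1 points, reversed copy
    prefixed by 1^k.
    """
    if k < 1 or length < 1 or (length - 1) % k != 0:
        raise ValueError("(length - 1) must be a non-negative multiple of k")
    path = ["0", "1"]
    for _ in range((length - 1) // k):
        last = path[-1]
        lower = ["0" * k + p for p in path]
        bridge = ["0" * (k - i) + "1" * i + last for i in range(1, k)]
        upper = ["1" * k + p for p in reversed(path)]
        path = lower + bridge + upper
    return path


if __name__ == "__main__":
    k, length = 2, 7
    path = long_k_path(k, length)
    # Closed form of the path length: (k + 1) * 2^((length - 1) / k) - k + 1
    print(len(path), (k + 1) * 2 ** ((length - 1) // k) - k + 1)
    # Flipping the first k bits of the start point jumps to the end point.
    print(path.index("1" * k + "0" * (length - k)), len(path) - 1)
```

The last two lines of the demo illustrate the shortcut argument: the end of the path is at Hamming distance $k$ from its start, although a one-bit hillclimber has to walk through every intermediate point.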
3 The Biobjective Long $k$-path Problem
In this section, we propose a biobjective problem where the efficient set is connected, but so large that its full enumeration cannot be carried out in polynomial time. We define the biobjective long $k$-path problem to show that the runtime required to sample a connected efficient set can be very long for a simple local search algorithm.
The biobjective long $k$-path problem (-lp) is defined on a bit string of length $\ell$, with an objective function vector of dimension $2$. Each objective function corresponds to a ‘single’ long $k$-path problem, which is to be maximized. The -lp is built such that the efficient set matches the long $k$-path. The objective function vector of -lp is defined as follows. For all :
where is the function which associates each integer to the point of coordinates in the objective space. So, the first objective is the fitness function of the single-objective long $k$-path problem.
The efficient set of -lp corresponds to the long $k$-path (see Fig. 1). By construction, consecutive solutions on the path are neighbors with respect to Hamming distance $1$, so that the efficient set is connected. For bit strings of length $\ell$, the size of the path is $(k+1)\, 2^{(\ell-1)/k} - k + 1$, so that it cannot be enumerated in a polynomial number of evaluations in the general case. The efficient set of -lp is then (i) connected and (ii) intractable. Let us now experimentally examine the ability of search algorithms to identify a good approximation of it.
3.2 Experimental Analysis
For the single-objective long path problems, existing studies are based on the comparison of a hillclimber and of a (1+1)-EA [2]. We therefore consider straightforward multiobjective extensions of these approaches, respectively a PLS- and a SEMO-like algorithm. They are both adapted to the path problems (-lp and -mp) introduced in this paper, and they will be respectively denoted by PLS and SEMO to differentiate them from their original implementation. A pseudo-code is given in Algorithm 1 and Algorithm 2, respectively. At each PLS iteration, one solution is chosen at random from the archive. All solutions located at Hamming distance $1$ are evaluated and are checked for insertion in the archive. For the problem under study, note that at most two neighbors are located on the long path, one of them having already been found at a previous iteration. The current solution is then marked as visited in order to avoid a useless re-evaluation of its neighborhood. At each SEMO step, one solution is randomly chosen from the archive. Each bit of this solution is independently flipped with probability $1/\ell$, and the obtained solution is checked for insertion in the archive. In PLS, the whole neighborhood is explored while in SEMO, all solutions are potentially reachable with respect to different probabilities (in SEMO, the neighborhood operator is generally supposed to be ergodic). In order to take advantage of the connectedness property, the archive of both algorithms is initialized with one solution from the efficient set: the all-‘0’ bit string of size $\ell$.
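The two search loops described above can be sketched as follows. This is a simplified illustration with an unbounded archive, not the Algorithm 1 and Algorithm 2 of the paper: the helper names (`try_insert`, `pls`, `semo`), the rejection of duplicate objective vectors, and the toy objective `ones_zeros` are all assumptions made for the example.

```python
import random
from typing import Callable, Dict, Tuple

Vector = Tuple[float, ...]
Archive = Dict[str, Vector]


def dominates(a: Vector, b: Vector) -> bool:
    """Pareto dominance for maximization."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def flip(bit: str) -> str:
    return "1" if bit == "0" else "0"


def try_insert(archive: Archive, x: str, fx: Vector) -> bool:
    """Insert x unless an archived vector equals or dominates fx."""
    if any(fy == fx or dominates(fy, fx) for fy in archive.values()):
        return False
    for y in [y for y, fy in archive.items() if dominates(fx, fy)]:
        del archive[y]
    archive[x] = fx
    return True


def pls(f: Callable[[str], Vector], start: str, rng: random.Random):
    """PLS-like loop: explore the full 1-bit neighborhood of unvisited solutions."""
    archive: Archive = {start: f(start)}
    visited, evals = set(), 1
    while True:
        candidates = [x for x in archive if x not in visited]
        if not candidates:
            return archive, evals
        x = rng.choice(candidates)
        visited.add(x)  # never re-evaluate this neighborhood
        for i in range(len(x)):
            y = x[:i] + flip(x[i]) + x[i + 1:]
            evals += 1
            try_insert(archive, y, f(y))


def semo(f: Callable[[str], Vector], start: str, rng: random.Random,
         max_evals: int) -> Archive:
    """SEMO-like loop: flip each bit independently with probability 1/length."""
    archive: Archive = {start: f(start)}
    rate = 1.0 / len(start)
    for _ in range(max_evals - 1):
        x = rng.choice(list(archive))
        y = "".join(flip(b) if rng.random() < rate else b for b in x)
        try_insert(archive, y, f(y))
    return archive


def ones_zeros(x: str) -> Vector:
    """Toy biobjective function (not the paper's): count of ones and of zeros."""
    return (x.count("1"), x.count("0"))


if __name__ == "__main__":
    front, evals = pls(ones_zeros, "0000", random.Random(0))
    print(sorted(front.values()), evals)
    print(sorted(semo(ones_zeros, "0000", random.Random(0), 200).values()))
```

On the toy function every bit string is efficient, so both loops simply collect one solution per objective vector; the difference studied in the paper only shows up on the path problems, where the two operators reach new efficient solutions at very different costs.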
However, the efficient set of -lp is intractable. It then becomes impracticable to use an unbounded archive for large-size problem instances. As a consequence, contrary to the original approaches, we here maintain a bounded archive of size in our implementation of the algorithms. Our aim is not to compare different bounded archiving techniques, but rather to limit the number of evaluations required for computing a reasonably good approximation of the efficient set. So, we define a nearly ideal archiving method to find such an approximation for the particular case of -lp. If the Pareto front were linear, an ‘optimal’ approximation of size contains uniformly distributed points over the segment in the objective space. Note that, in our case, those points do not necessarily correspond to feasible solutions in the decision space. The distance between consecutive solutions with respect to the first objective is then constant. The bounded archiving technique under consideration is given in Algorithm 3. First, dominated solutions are always discarded. If the number of non-dominated solutions becomes too large, the solution with the lowest first objective value which is too close to the previous one (i.e. the difference with respect to the first objective is below this distance) is removed from the archive. If this rule does not hold for any solution, the penultimate solution (with respect to the order defined by the first objective) is removed (not the last one). Of course, such an archiving technique is -lp-specific, but it does not introduce any bias within heuristic rules generally defined by existing diversity-based archiving approaches.
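The archiving rule described above can be sketched as follows. This is one possible reading of the prose, not the paper's Algorithm 3: the archive is reduced to a list of objective vectors sorted by first objective, "the previous one" is taken to mean the preceding solution in that order, and the spacing threshold `delta` and the bound `max_size` are hypothetical parameters whose values are left to the caller.

```python
from typing import List, Tuple

Vector = Tuple[float, float]


def dominates(a: Vector, b: Vector) -> bool:
    """Pareto dominance for maximization."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def bounded_insert(archive: List[Vector], fx: Vector,
                   max_size: int, delta: float) -> List[Vector]:
    """Return the archive after trying to insert fx under a size bound.

    1. Reject fx if it is dominated by (or equal to) an archived vector.
    2. Drop archived vectors dominated by fx.
    3. If the archive is too large, remove the vector with the lowest first
       objective that lies closer than delta to its predecessor; if there is
       none, remove the penultimate vector (never the last one).
    """
    if any(a == fx or dominates(a, fx) for a in archive):
        return list(archive)
    kept = sorted([a for a in archive if not dominates(fx, a)] + [fx])
    if len(kept) > max_size:
        for i in range(1, len(kept)):
            if kept[i][0] - kept[i - 1][0] < delta:
                del kept[i]
                break
        else:
            del kept[-2]
    return kept


if __name__ == "__main__":
    archive = [(0, 10), (1, 9), (10, 0)]
    print(bounded_insert(archive, (5, 5), max_size=3, delta=4))
```

Keeping both extreme vectors in every case is what lets the archive keep advancing along the path while the interior points settle near evenly spaced positions.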
3.2.2 Experimental Design.
The algorithms are compared in terms of the number of evaluations required to attain a reasonable approximation of the efficient set. The cost related to archiving is ignored, as we want to focus on the complexity of the algorithms independently of the archiving strategy. The stopping criterion is based on a percentage of hypervolume [9] covered by the solutions from the archive. For -lp, an upper bound of the maximal hypervolume () for an approximation of size can be computed by uniformly distributing points over the Pareto front, that is , being the reference point. Once the hypervolume covered by the current archive is within an $\epsilon$-value of this upper bound, the algorithm stops.
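The hypervolume-based stopping rule can be illustrated with a short sketch for the biobjective maximization case. The reference point default `(0.0, 0.0)`, the relative form of the threshold, and the names `hypervolume_2d` and `should_stop` are assumptions made for this example; the paper's own reference point and threshold are not reproduced here.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by the points and bounded below by ref (maximization)."""
    inside = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    area = 0.0
    best_f2 = ref[1]
    # Sweep by decreasing first objective; each point that improves the
    # second objective adds a new horizontal slab.
    for f1, f2 in sorted(inside, reverse=True):
        if f2 > best_f2:
            area += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return area


def should_stop(points: Iterable[Point], hv_upper_bound: float,
                epsilon: float, ref: Point = (0.0, 0.0)) -> bool:
    """Stop once the covered hypervolume is within a fraction epsilon of the bound."""
    return hypervolume_2d(points, ref) >= (1.0 - epsilon) * hv_upper_bound


if __name__ == "__main__":
    archive = [(1, 3), (2, 2), (3, 1)]
    print(hypervolume_2d(archive))
    print(should_stop(archive, hv_upper_bound=6.5, epsilon=0.1))
```

Dominated points contribute nothing to the sweep, so the function can be applied directly to an archive without filtering it first.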
The experimental study was conducted with and dimensions . We use an archive of size , and the required approximation to be found is less than of the maximal hypervolume. In other words, at least of the best-possible approximation is covered in terms of hypervolume. The archive is initialized with a bit string where all bits are set to ‘0’. The number of evaluations is reported over independent runs.
3.2.3 Results and Discussion.
Fig. 2 shows the average and the standard deviation of the number of evaluations for each algorithm. The number of evaluations required by PLS seems to grow exponentially with the dimension . This can be interpreted as follows. To approximate the efficient set, PLS follows the long path. When the archive reaches its maximum size, the archiving technique leaves one solution at an ‘optimal’ position in the objective space at every iteration. So, at a given iteration , the current hypervolume is approximately , where . Then, the stopping criterion is reached at the end of the long path only, so that the number of evaluations is more than exponential in the dimension of the problem instance ($\ell$ times larger, as the whole neighborhood is evaluated at each iteration). For SEMO, the number of evaluations increases from evaluations for to for . The computational effort required by SEMO and by PLS differs by several orders of magnitude. For SEMO, it is difficult to claim whether the runtime is polynomial or not; nevertheless, the number of evaluations remains huge. The increase is higher than quadratic and seems to fit a cubic curve.
To summarize, SEMO can sample the efficient set more easily than PLS by taking shortcuts out of the long path. From the SEMO point of view, the efficient set is $k$-connected: one efficient solution can be reached by flipping $k$ bits of another efficient solution. The computational difference between the two algorithms can be explained by different structures of the graph of efficient solutions. For PLS, it is linear, and for SEMO, the distance between efficient solutions in the graph is much smaller than the distance in the objective space. This result suggests that the connectedness property is not fully satisfactory to explain the degree of difficulty of the problem. The structure of the graph of efficient solutions induced by the neighborhood relation should also be taken into account. In the next section, we will show that the structure of this graph is still not enough to explain all the difficulties.
4 The Biobjective Multiple $k$-path Problem
In the biobjective long $k$-path, the efficient set is connected, intractable and difficult to sample. In this section, we define the biobjective multiple $k$-path problem (-mp) where the efficient set is still intractable but not connected anymore, while being easier to sample for a PLS-like algorithm.
The idea is to modify -lp in order to make the efficient set disconnected (with respect to Hamming distance $1$), and to add some shortcuts out of the path that guide the search towards efficient solutions. A -mp instance of dimension is defined for bit strings of size such that , with being an even integer value. First, let us define the additional paths, called extra paths. Let and be the extra paths of the $k$-path of dimension . Let be a concatenation of and . (resp. ) is the solution on the extra path from solution to solution of the long $k$-path (resp. from to ). and are defined like the bridges in the single-objective long path problem:
The sequence of neighboring solutions is the extra path to go from solution to solution . Similarly, the sequence allows going from to . For an even number, and have the same parity: is even iff is even.
In -mp, the efficient set corresponds to the set of solutions in the long path where is an even number. The efficient set is then fully disconnected with respect to Hamming distance $1$. Solutions which are out of the efficient set are translated by a vector ‘under’ the solutions , so that they become dominated. As a consequence, a solution leads to, but is dominated by, the efficient solution . However, and are mutually non-dominated. In the same way, the extra paths to go from to are put on the first diagonal of the square enclosed by and . More formally, the fitness function of the -mp can be defined as follows. For all :
Fig. 3 illustrates the extra paths starting from one solution. Fig. 4 shows the objective space of a -mp instance. For , solution is a neighbor of solution and is dominated by it. Likewise, solution is a neighbor of the efficient solution and is dominated by it. However, all and are mutually non-dominated. The extra paths (Down) lead to a further solution in the long path, and the extra paths (Up) are the backward paths of the extra paths . With those extra paths, an algorithm based on single bit flips can reach an efficient solution easily, just by following the sequence defined by the set of mutually non-dominated solutions found so far.
4.2 Experimental Analysis
The experimental study is conducted with the same approaches and parameters as those defined for the biobjective long path problem in the previous section. Fig. 5 shows the average value and the standard deviation of the number of evaluations for each algorithm. Fig. 6 allows comparing the number of evaluations with that of the previous problem. Contrary to the results obtained for the long $k$-path problem, PLS here clearly outperforms SEMO which needs times more evaluations for dimension . For PLS, the number of evaluations increases linearly with the dimension of the problem instance. PLS can easily find the same shortcuts as SEMO, and the latter now wastes computational resources exploring dominated solutions and evaluating the neighborhood of some solutions from the archive more than once. The curves on the right show that it is much easier to sample the efficient set of the multiple $k$-path than that of the long $k$-path problem: for dimension , nearly times more evaluations are required between SEMO for -lp and PLS for -mp.
This is the main result of this study. The extra paths guide the search process to efficient solutions distributed all over the Pareto front. The extra solutions are not in the efficient set and do not appear on the graph of efficient solutions, but they are the key to explaining the performance of local search approaches. Indeed, efficient solutions can now be reached very quickly by following the extra paths, which explains the good performance of the algorithms. Features of the efficient set (connectedness, etc.) are independent of the solutions from the extra paths. Hence, the features of the efficient set are not the only key issue to explain the success of local search for MoCO.
5 Conclusions and Future Works
In this paper, we proposed two new classes of biobjective combinatorial optimization problems, the long and the multiple path problems, in order to demonstrate empirically that connectedness is not the only key issue that characterizes the difficulty of a multiobjective combinatorial optimization problem. In other words, connectedness is not the ‘Holy Grail’ of search space features when the efficient set is intractable, and when the goal is to find a limited-size approximation. Indeed, on the long path problems, where the efficient set is intractable and connected, our experiments suggest that the running time to approximate it grows exponentially for a Pareto-based local search (PLS), and polynomially for a simple Pareto-based evolutionary algorithm (SEMO). On the multiple path problems, where the efficient set is still intractable but disconnected, PLS now outperforms SEMO, which seems rather unexpected at first sight. This suggests two new considerations to measure the difficulty of finding a good efficient set approximation:
First, the structure of the graph of efficient solutions induced by the neighborhood relation defined by the algorithm should also be taken into account. In the long path problems, this graph is a huge line for PLS whereas it is highly connected for SEMO. Extending the notion of cluster on the efficient graph as defined by Paquete and Stützle [5], we should study a graph where an edge between efficient solutions is weighted by the probability to reach one solution from the other.
Second, the solutions outside the efficient set should also be considered. In the multiple path problems, some solutions outside of the efficient set are temporarily non-dominated, so that they are saved into the archive during the search process. They help to approximate the (disconnected) efficient set.
In some sense, the fitness landscape of biobjective multiple path problems is unimodal, with a number of short paths leading to good solutions. On the contrary, the biobjective long path problem can be characterized by a unimodal landscape where the path to good solutions is intractable.
Clearly, following the work of Horoba and Neumann [12], the next step will consist in conducting a rigorous runtime analysis of PLS and SEMO for both the multiple and the long path problems. The current bounded archiving method is probably too specific, and seems very difficult to study rigorously. In order to do so, we will certainly have to replace this strategy, for instance with the concept of $\epsilon$-dominance. It is also possible to extend the biobjective path problems proposed in this paper to a larger objective space dimension (more than $2$ objective functions), or to a larger ‘disconnectedness’ (delete more than one solution out of two). The next challenge will be to propose a relevant definition of fitness landscape in order to better understand the difficulty of multiobjective combinatorial optimization problems. Given that the goal is here to find a set of solutions, we believe that another way to do so would be to analyze a fitness landscape where the search space consists of sets of solutions. A solution would then be a set of bit strings instead of a single bit string for the problems under study in this paper. Therefore, we plan to formally define fitness landscapes for the recent proposal of set-based multiobjective optimization [13].
The authors are grateful to Dr. Dirk Thierens for useful suggestions on the relation between intractable efficient sets and long path problems. They would also like to thank Dr. Luis Paquete for fruitful discussion on the subject of this work.
[1] Horn, J., Goldberg, D., Deb, K.: Long path problems. In: Parallel Problem Solving from Nature (PPSN III). Volume 866 of Lecture Notes in Computer Science. Springer (1994) 149–158
[2] Rudolph, G.: How mutation and selection solve long path problems in polynomial expected time. Evolutionary Computation 4(2) (1996) 195–205
[3] Gorski, J., Klamroth, K., Ruzika, S.: Connectedness of efficient solutions in multiple objective combinatorial optimization. Technical Report 102/2006, University of Kaiserslautern, Department of Mathematics (2006)
[4] Ehrgott, M., Klamroth, K.: Connectedness of efficient solutions in multiple criteria combinatorial optimization. European Journal of Operational Research 97(1) (1997) 159–166
[5] Paquete, L., Stützle, T.: Clusters of non-dominated solutions in multiobjective combinatorial optimization: An experimental analysis. In: Multiobjective Programming and Goal Programming. Volume 618 of Lecture Notes in Economics and Mathematical Systems. Springer (2009) 69–77
[6] Paquete, L., Chiarandini, M., Stützle, T.: Pareto local optimum sets in the biobjective traveling salesman problem: An experimental study. In: Metaheuristics for Multiobjective Optimisation. Volume 535 of Lecture Notes in Economics and Mathematical Systems. Springer (2004) 177–199
[7] Ehrgott, M.: Multicriteria optimization. Second edn. Springer (2005)
[8] Laumanns, M., Thiele, L., Zitzler, E.: Running time analysis of evolutionary algorithms on a simplified multiobjective knapsack problem. Natural Computing: an international journal 3(1) (2004) 37–51
[9] Zitzler, E., Thiele, L.: Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation 3(4) (1999) 257–271
[10] Serafini, P.: Some considerations about computational complexity for multiobjective combinatorial problems. In: Recent advances and historical development of vector optimization. Volume 294 of Lecture Notes in Economics and Mathematical Systems. Springer (1986)
[11] Droste, S., Jansen, T., Wegener, I.: On the optimization of unimodal functions with the (1 + 1) evolutionary algorithm. In: Parallel Problem Solving from Nature (PPSN V). Volume 1498 of Lecture Notes in Computer Science. Springer (1998) 13–22
[12] Horoba, C., Neumann, F.: Additive approximations of Pareto-optimal sets by evolutionary multi-objective algorithms. In: Tenth Workshop on Foundations of Genetic Algorithms (FOGA 2009), New York, NY, USA, ACM (2009) 79–86
[13] Zitzler, E., Thiele, L., Bader, J.: On set-based multiobjective optimization. IEEE Transactions on Evolutionary Computation 14(1) (2010) 58–79