The No Free Lunch (NFL) theorems state, roughly speaking, that all search algorithms have the same average performance over all possible objective functions $f : \mathcal{X} \to \mathcal{Y}$, where the search space $\mathcal{X}$ as well as the cost-value space $\mathcal{Y}$ are finite sets (Wolpert and Macready). However, it has been argued that in practice one does not need an algorithm that performs well on all possible functions, but only on the subset that arises from the real-world problems at hand. Further, it has been shown that for pseudo-Boolean functions restrictions of the complexity lead to subsets of functions on which some algorithms perform better than others (e.g., Whitley defines complexity in terms of the number of local minima, and Droste et al. define it based on the size of the smallest OBDD representations of the functions).
Recently, a sharpened version of the NFL theorem has been proven (Schumacher et al.), which states that NFL results hold for any subset $F$ of the set of all possible functions if and only if $F$ is closed under permutation (c.u.p.). Based on this important result, we can derive classes of functions where NFL does not hold simply by showing that these classes are not c.u.p. This leads to the encouraging results in this paper: we prove that the fraction of subsets that are c.u.p. is so small that it can be neglected. In addition, we argue that objective functions resulting from important classes of real-world problems are likely not to be c.u.p.
In the following section, we give some basic definitions and concisely restate the sharpened NFL theorem of Schumacher et al. Then we derive the number of subsets that are c.u.p. Finally, we discuss some observations regarding structured search spaces and closure under permutation.
We consider a finite search space $\mathcal{X}$ and a finite set of cost values $\mathcal{Y}$. Let $\mathcal{F} = \mathcal{Y}^{\mathcal{X}}$ be the set of all objective functions $f : \mathcal{X} \to \mathcal{Y}$ to be optimized (also called fitness, energy, or cost functions). NFL theorems are concerned with non-repeating black-box search algorithms (referred to simply as algorithms for brevity) that choose a new exploration point in the search space depending on the complete history of prior explorations: Let the sequence $T_m = \langle (x_1, f(x_1)), (x_2, f(x_2)), \dots, (x_m, f(x_m)) \rangle$ represent $m$ non-repeating explorations $x_i \in \mathcal{X}$, $\forall i \neq j : x_i \neq x_j$, and their cost values $f(x_i) \in \mathcal{Y}$. An algorithm $a$ appends a pair $(x_{m+1}, f(x_{m+1}))$ to this sequence by mapping $T_m$ to a new point $x_{m+1}$, $\forall i : x_{m+1} \neq x_i$. Generally, the performance of an algorithm $a$ after $m$ iterations with respect to a function $f$ depends on the sequence $Y(f, m, a) = \langle f(x_1), f(x_2), \dots, f(x_m) \rangle$ of cost values the algorithm has produced. Let the function $c$ denote a performance measure mapping sequences of $\mathcal{Y}$ to the real numbers (e.g., in the case of function minimization a performance measure that returns the minimum value in the sequence could be a reasonable choice).
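Let $\pi : \mathcal{X} \to \mathcal{X}$ be a permutation (i.e., bijective function) of $\mathcal{X}$. The set of all permutations of $\mathcal{X}$ is denoted by $\Pi(\mathcal{X})$. A set $F \subseteq \mathcal{F}$ is said to be closed under permutation (c.u.p.) if for any $\pi \in \Pi(\mathcal{X})$ and any function $f \in F$ the function $f \circ \pi$ is also in $F$.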
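To make the definition concrete, closure under permutation can be checked by brute force on a toy example. The following Python sketch is an illustration only and not part of the original analysis: it assumes $\mathcal{X} = \{0, 1, 2\}$ and $\mathcal{Y} = \{0, 1\}$, encodes a function as a tuple of its values, and the helper names `compose` and `is_cup` are hypothetical.

```python
from itertools import permutations, product


def compose(f, pi):
    """Return f∘π as a tuple; f and pi are tuples indexed by the points 0..n-1."""
    return tuple(f[pi[x]] for x in range(len(f)))


def is_cup(F, n_points):
    """Brute-force check whether the set F of functions is closed under permutation."""
    F = set(F)
    return all(compose(f, pi) in F
               for f in F
               for pi in permutations(range(n_points)))


# Assumed toy setting: X = {0, 1, 2}, Y = {0, 1}.
all_functions = set(product((0, 1), repeat=3))
one_hot = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}  # all functions with exactly one 1
partial = {(1, 0, 0), (0, 1, 0)}             # one member of that orbit is missing

print(is_cup(all_functions, 3))  # True
print(is_cup(one_hot, 3))        # True
print(is_cup(partial, 3))        # False
```

The set `partial` fails the check because permuting `(1, 0, 0)` yields `(0, 0, 1)`, which it does not contain.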
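Theorem 1 (NFL).
For any two algorithms $a$ and $b$, any value $k \in \mathbb{R}$, and any performance measure $c$
$$\sum_{f \in F} \delta\bigl(k, c(Y(f, m, a))\bigr) = \sum_{f \in F} \delta\bigl(k, c(Y(f, m, b))\bigr)$$
iff $F$ is c.u.p. Here $Y(f, m, a)$ is the sequence of cost values that algorithm $a$ produces on $f$ in $m$ steps and $\delta$ denotes the Kronecker delta.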
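The statement of the theorem can be checked empirically on a very small instance. The sketch below is an illustration under stated assumptions, not the construction used in the proof: it assumes $\mathcal{X} = \{0, 1, 2\}$, uses the minimum as performance measure, and compares two hypothetical deterministic non-repeating algorithms (`fixed_order` and `adaptive`) by the histogram of performance values they obtain over a set of functions.

```python
from collections import Counter
from itertools import product

POINTS = range(3)  # assumed toy search space X = {0, 1, 2}


def run(algorithm, f, m):
    """Run a non-repeating search for m steps; return the observed cost values."""
    history = []  # list of (x, f(x)) pairs explored so far
    for _ in range(m):
        x = algorithm(history)
        assert all(x != visited for visited, _ in history), "must not repeat"
        history.append((x, f[x]))
    return [y for _, y in history]


def fixed_order(history):
    """Visit 0, 1, 2 in this order, ignoring the observed costs."""
    return len(history)


def adaptive(history):
    """Start at 2, then pick the next point depending on the cost seen there."""
    if not history:
        return 2
    if len(history) == 1:
        return 0 if history[0][1] == 0 else 1
    visited = {x for x, _ in history}
    return next(x for x in POINTS if x not in visited)


def performance_histogram(algorithm, F, m, c=min):
    """Count how often each performance value c(Y(f, m, a)) occurs over f in F."""
    return Counter(c(run(algorithm, f, m)) for f in F)


all_functions = set(product((0, 1, 2), repeat=3))  # c.u.p.
partial = {(1, 0, 0), (0, 1, 0)}                   # not c.u.p.

for m in (1, 2, 3):
    same = (performance_histogram(fixed_order, all_functions, m)
            == performance_histogram(adaptive, all_functions, m))
    print(m, same)  # True for every m

print(performance_histogram(fixed_order, partial, 1)
      == performance_histogram(adaptive, partial, 1))  # False
```

Equal histograms for every value $k$ correspond to the equality of the two sums in the theorem. On the set `partial`, which is not c.u.p., the two algorithms already differ after one step.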
3 Fraction of Subsets Closed under Permutation
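Let $\mathcal{F} = \mathcal{Y}^{\mathcal{X}}$ be the set of functions mapping $\mathcal{X} \to \mathcal{Y}$. There exist $2^{(|\mathcal{Y}|^{|\mathcal{X}|})} - 1$ non-empty subsets of $\mathcal{F}$. We want to calculate the fraction of subsets that are c.u.p.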
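Theorem 2.
The number of non-empty subsets of $\mathcal{Y}^{\mathcal{X}}$ that are c.u.p. is given by
$$2^{\binom{|\mathcal{X}| + |\mathcal{Y}| - 1}{|\mathcal{X}|}} - 1 .$$
The proof is given in the appendix.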
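For very small spaces the count can be confirmed by exhaustive enumeration. The following Python sketch is a sanity check under stated assumptions rather than a proof: functions are encoded as value tuples over $\mathcal{X} = \{0, \dots, |\mathcal{X}|-1\}$, and the function names are hypothetical. It is only feasible for tiny cardinalities, since the number of subsets grows doubly exponentially.

```python
from itertools import combinations, permutations, product
from math import comb


def count_cup_subsets_bruteforce(n_x, n_y):
    """Count non-empty subsets of Y^X that are closed under permutation."""
    functions = list(product(range(n_y), repeat=n_x))
    perms = list(permutations(range(n_x)))
    count = 0
    for size in range(1, len(functions) + 1):
        for subset in combinations(functions, size):
            F = set(subset)
            # F is c.u.p. iff every permuted function f∘π stays inside F.
            if all(tuple(f[pi[x]] for x in range(n_x)) in F
                   for f in F for pi in perms):
                count += 1
    return count


def count_cup_subsets_formula(n_x, n_y):
    """Closed-form count of non-empty c.u.p. subsets."""
    return 2 ** comb(n_x + n_y - 1, n_x) - 1


for n_x, n_y in [(2, 2), (3, 2), (2, 3)]:
    print(n_x, n_y,
          count_cup_subsets_bruteforce(n_x, n_y),
          count_cup_subsets_formula(n_x, n_y))
# 2 2 7 7
# 3 2 15 15
# 2 3 63 63
```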
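Figure 1 shows a plot of the fraction of non-empty subsets c.u.p., i.e.,
$$\frac{2^{\binom{|\mathcal{X}| + |\mathcal{Y}| - 1}{|\mathcal{X}|}} - 1}{2^{(|\mathcal{Y}|^{|\mathcal{X}|})} - 1} ,$$
versus the cardinality of $\mathcal{X}$ for different values of $|\mathcal{Y}|$. The fraction decreases for increasing $|\mathcal{X}|$ as well as for increasing $|\mathcal{Y}|$. Already for small $|\mathcal{X}|$ and $|\mathcal{Y}|$ the fraction almost vanishes, e.g., for Boolean functions $\{0,1\}^3 \to \{0,1\}$ the fraction is $(2^9 - 1)/(2^{256} - 1) \approx 4.4 \times 10^{-75}$.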
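The fraction can be evaluated exactly with arbitrary-precision integers. The sketch below is illustrative; the function name `cup_fraction` is an assumption, and the chosen cardinalities ($|\mathcal{X}| = 8$, $|\mathcal{Y}| = 2$, i.e., Boolean functions of three bits) are just one example.

```python
from fractions import Fraction
from math import comb


def cup_fraction(n_x, n_y):
    """Exact fraction of non-empty subsets of Y^X that are c.u.p."""
    cup = 2 ** comb(n_x + n_y - 1, n_x) - 1   # non-empty c.u.p. subsets
    total = 2 ** (n_y ** n_x) - 1             # all non-empty subsets
    return Fraction(cup, total)


# Boolean functions of three bits: |X| = 2**3 = 8, |Y| = 2.
fraction = cup_fraction(8, 2)
print(float(fraction))  # roughly 4.4e-75

# The fraction shrinks quickly as |X| grows (here with |Y| = 2).
for n_x in (2, 3, 4, 5):
    print(n_x, float(cup_fraction(n_x, 2)))
```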
4 Search Spaces with Neighborhood Relations
In the previous section, we have shown that the fraction of subsets c.u.p. is close to zero already for small search and cost-value spaces. Still, the absolute number of subsets c.u.p. grows rapidly with increasing $|\mathcal{X}|$ and $|\mathcal{Y}|$. What if these classes of functions are the “important” ones, i.e., those we are dealing with in practice? In this section, we define some quite general constraints on functions important in practice that induce classes of functions that are not c.u.p.
We believe that two assumptions can be made for most of the functions we are dealing with in real-world optimization: First, the search space has some structure. Second, the set of objective functions we are interested in fulfills some constraints defined based on this structure. More formally, there exists a non-trivial neighborhood relation on $\mathcal{X}$ based on which constraints on the set of functions under consideration are formulated. For example, with respect to a neighborhood relation we can define concepts like ruggedness or local optimality and constraints like upper bounds on the ruggedness or on the maximum number of local minima. Intuitively, it is likely that in a function class c.u.p. there exists a function that violates such constraints.
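We define a simple neighborhood relation on $\mathcal{X}$ as a symmetric function $n : \mathcal{X} \times \mathcal{X} \to \{0, 1\}$. Two elements $x_i, x_j \in \mathcal{X}$ are called neighbors iff $n(x_i, x_j) = 1$. We call a neighborhood non-trivial iff $\exists x_i, x_j \in \mathcal{X} : x_i \neq x_j \wedge n(x_i, x_j) = 1$ and $\exists x_k, x_l \in \mathcal{X} : x_k \neq x_l \wedge n(x_k, x_l) = 0$. It holds: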
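A non-trivial neighborhood on $\mathcal{X}$ is not invariant under permutations of $\mathcal{X}$.
Proof. It holds $\exists x_i, x_j, x_k, x_l \in \mathcal{X} : x_i \neq x_j \wedge n(x_i, x_j) = 1 \wedge x_k \neq x_l \wedge n(x_k, x_l) = 0$. For any permutation $\pi$ that maps $x_i$ and $x_j$ onto $x_k$ and $x_l$, respectively, the invariance property, $\forall x_a, x_b \in \mathcal{X} : n(x_a, x_b) = n(\pi(x_a), \pi(x_b))$, is violated.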
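The argument can be illustrated on a four-point search space. The sketch below assumes, purely for illustration, a path-like neighborhood in which two points are neighbors iff they differ by one; the names `n` and `is_invariant` are hypothetical. Some permutations (here the reversal) happen to preserve the neighborhood, but a permutation that maps a neighboring pair onto a non-neighboring pair does not.

```python
def n(a, b):
    """Assumed neighborhood on X = {0, 1, 2, 3}: neighbors iff |a - b| = 1."""
    return 1 if abs(a - b) == 1 else 0


def is_invariant(pi):
    """Check n(a, b) == n(pi(a), pi(b)) for all pairs of points."""
    points = range(len(pi))
    return all(n(a, b) == n(pi[a], pi[b]) for a in points for b in points)


reversal = (3, 2, 1, 0)  # preserves all distances, hence the neighborhood
swap = (0, 2, 1, 3)      # maps the neighbors 0, 1 onto the non-neighbors 0, 2

print(is_invariant(reversal))  # True
print(is_invariant(swap))      # False
```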
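Assume the search space $\mathcal{X}$ can be decomposed as $\mathcal{X} = \mathcal{X}_1 \times \dots \times \mathcal{X}_l$ and that a non-trivial neighborhood $n_i : \mathcal{X}_i \times \mathcal{X}_i \to \{0, 1\}$ exists on one component $\mathcal{X}_i$. This neighborhood induces a non-trivial neighborhood on $\mathcal{X}$, where two points are neighbored iff their $i$-th components are neighbored with respect to $n_i$. Thus, the constraints discussed below need only refer to a single component.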
The neighborhood relation need not be the canonical one (e.g., Hamming-distance for Boolean search spaces). Instead, it can be based on “phenotypic” properties (e.g., if integers are encoded by bit-strings, then the bit-strings can be defined as neighbored iff the corresponding integers are).
Now we describe some constraints that are defined with respect to a neighborhood relation and are, to our minds, relevant in practice. For this purpose, we assume a metric $d_{\mathcal{Y}} : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ on $\mathcal{Y}$, e.g., in the typical case of real-valued fitness functions the Euclidean distance.
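First, we show how a constraint on steepness (closely related to the concept of strong causality) leads to a set of functions that is not c.u.p. Based on a neighborhood relation on the search space, we can define a simple measure of maximum steepness of a function $f \in \mathcal{F}$ by
$$s_n^{\max}(f) = \max_{x_i, x_j \in \mathcal{X} \,\wedge\, n(x_i, x_j) = 1} d_{\mathcal{Y}}\bigl(f(x_i), f(x_j)\bigr) .$$
Further, for a function $f \in \mathcal{F}$, we define the diameter of its range as
$$d^{\max}(f) = \max_{x_i, x_j \in \mathcal{X}} d_{\mathcal{Y}}\bigl(f(x_i), f(x_j)\bigr) .$$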
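If the maximum steepness $s_n^{\max}(f)$ of every function $f$ in a non-empty subset $F \subset \mathcal{F}$ is constrained to be smaller than the maximal possible $\max_{f \in F} d^{\max}(f)$, then $F$ is not c.u.p.
Proof. Let $g = \arg\max_{f \in F} d^{\max}(f)$ and let $x_i$ and $x_j$ be two points with property $d_{\mathcal{Y}}(g(x_i), g(x_j)) = d^{\max}(g)$. Since the neighborhood on $\mathcal{X}$ is non-trivial there exist two neighboring points $x_k$ and $x_l$. There exists a permutation $\pi$ that maps $x_k$ and $x_l$ on $x_i$ and $x_j$. If $F$ is c.u.p., the function $g \circ \pi$ is in $F$. This function has steepness $s_n^{\max}(g \circ \pi) = d^{\max}(g) = \max_{f \in F} d^{\max}(f)$, which contradicts the steepness-constraint.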
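The proof idea can be reproduced numerically. The following sketch makes several assumptions for illustration only: a four-point search space with a path neighborhood, integer cost values with the absolute difference as metric, and hypothetical helper names. It shows that the orbit of a smooth function under permutation contains a function whose maximum steepness equals the diameter of the range.

```python
from itertools import permutations


def is_neighbor(a, b):
    """Assumed toy neighborhood on X = {0, ..., 3}: a path graph."""
    return abs(a - b) == 1


def max_steepness(f):
    """Largest cost difference between neighboring points (metric: |y1 - y2|)."""
    n = len(f)
    return max(abs(f[a] - f[b])
               for a in range(n) for b in range(n) if is_neighbor(a, b))


def diameter(f):
    """Largest cost difference between any two points."""
    return max(f) - min(f)


def compose(f, pi):
    """Return f∘π as a tuple of values."""
    return tuple(f[pi[x]] for x in range(len(f)))


f = (0, 1, 2, 3)  # a smooth function on the path
steepest = max(max_steepness(compose(f, pi))
               for pi in permutations(range(len(f))))

print(max_steepness(f), diameter(f), steepest)  # 1 3 3
```

Any c.u.p. set containing `f` therefore also contains a function such as `(0, 3, 1, 2)` that violates a steepness bound below the diameter.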
As a second constraint, we consider the number of local minima, which is often regarded as a measure of complexity (Whitley). For a function $f \in \mathcal{F}$ a point $x \in \mathcal{X}$ is a local minimum iff $f(x) < f(x_i)$ for all neighbors $x_i$ of $x$. Given a function $f$ and a neighborhood relation on $\mathcal{X}$, we define $l_n^{\max}(f)$ as the maximal number of minima that functions with the same $\mathcal{Y}$-histogram as $f$ can have (i.e., functions where the number of $\mathcal{X}$-values that are mapped to a certain $\mathcal{Y}$-value are the same as for $f$, see appendix). In the appendix we prove that for any two functions $f, g$ with the same $\mathcal{Y}$-histogram there exists a permutation $\pi \in \Pi(\mathcal{X})$ with $f \circ \pi = g$. Thus, it follows:
If the number of local minima of every function $f$ in a non-empty subset $F \subset \mathcal{F}$ is constrained to be smaller than the maximal possible $\max_{f \in F} l_n^{\max}(f)$, then $F$ is not c.u.p.
For example, consider pseudo-Boolean functions $\{0,1\}^N \to \mathbb{R}$ and let two points be neighbored iff they have Hamming-distance one. Then the maximum number of local minima is $2^{N-1}$.
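A small experiment illustrates how strongly the number of local minima depends on the arrangement of the values. The sketch below assumes three-bit strings with the Hamming-distance-one neighborhood; the two example functions `parity` and `first_bit` are chosen here for illustration and share the same histogram (four zeros, four ones).

```python
from itertools import product

N_BITS = 3
POINTS = list(product((0, 1), repeat=N_BITS))


def hamming_neighbors(x):
    """All bit strings at Hamming distance one from x."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]


def count_local_minima(f):
    """Number of points that are strictly better than all their neighbors."""
    return sum(all(f(x) < f(y) for y in hamming_neighbors(x)) for x in POINTS)


def parity(x):
    return sum(x) % 2


def first_bit(x):
    return x[0]


print(count_local_minima(parity))     # 4, i.e. 2**(N_BITS - 1)
print(count_local_minima(first_bit))  # 0
```

Both functions lie in the same permutation orbit, so a class that excludes functions with $2^{N-1}$ local minima but contains `first_bit` cannot be c.u.p.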
Based on the results of Schumacher et al., we have shown that the statement “I’m only interested in a subset $F$ of all possible functions, so the NFL theorems do not apply” is true with a probability close to one (if $F$ is chosen uniformly and $\mathcal{X}$ and $\mathcal{Y}$ have reasonable cardinalities). Further, the statements “In my application domain, functions with maximum number of local minima are not realistic” and “For some components, the objective functions under consideration will not have the maximal possible steepness” lead to scenarios where NFL does not hold.
We thank Hannes Edelbrunner for fruitful discussions and Thomas Jansen, Stefan Wiegand, and Michael Hüsken for their comments on the manuscript. This work was supported by the DFG, grant Solesys, number SE251/41-1.
Appendix A Proof of Theorem 2
For the proof, we use the concept of $\mathcal{Y}$-histograms: We define a $\mathcal{Y}$-histogram (histogram for short) as a mapping $h : \mathcal{Y} \to \mathbb{N}_0$ such that $\sum_{y \in \mathcal{Y}} h(y) = |\mathcal{X}|$. The set of all histograms is denoted $\mathcal{H}$. With any function $f : \mathcal{X} \to \mathcal{Y}$ we associate the histogram $h_f(y) = |f^{-1}(y)|$ that counts the number of elements in $\mathcal{X}$ that are mapped to the same value $y \in \mathcal{Y}$ by $f$. Herein, $f^{-1}(y)$ returns the preimage $\{x \mid f(x) = y\}$ of $y$. Further, we call two functions $f, g$ $h$-equivalent iff they have the same histogram and we call the corresponding $h$-equivalence class $B_h \subseteq \mathcal{F}$ containing all functions with histogram $h$ a basis class. Before we prove Theorem 2, we consider the following lemma that gives some basic properties of basis classes.
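Lemma 1.
(a) There exist $\binom{|\mathcal{X}| + |\mathcal{Y}| - 1}{|\mathcal{X}|}$ pairwise disjoint basis classes and $\bigcup_{h \in \mathcal{H}} B_h = \mathcal{F}$.
(b) Two functions $f, g \in \mathcal{F}$ are $h$-equivalent iff there exists a permutation $\pi$ of $\mathcal{X}$ such that $f \circ \pi = g$.
(c) $B_h$ is equal to the permutation orbit of any function $f$ with histogram $h$, i.e., $\forall f \in B_h : B_h = \bigcup_{\pi \in \Pi(\mathcal{X})} \{ f \circ \pi \}$.
(d) Any subset $F \subseteq \mathcal{F}$ that is c.u.p. is uniquely defined by a union of pairwise disjoint basis classes.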
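Proof. (a) The number of different histograms is given by
$$\binom{|\mathcal{X}| + |\mathcal{Y}| - 1}{|\mathcal{X}|} ,$$
i.e., the number of distinguishable distributions of $|\mathcal{X}|$ items over $|\mathcal{Y}|$ bins (e.g., Feller, p. 38). Two basis classes $B_{h_1}$ and $B_{h_2}$, $h_1 \neq h_2$, are disjoint because functions in different basis classes have different histograms. The union $\bigcup_{h \in \mathcal{H}} B_h = \mathcal{F}$ because every function in $\mathcal{F}$ has a histogram.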
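The count of histograms can be cross-checked by enumeration for small cardinalities. In the sketch below, which is illustrative only, a function is a tuple of values and its histogram is represented by the sorted tuple of those values (a multiset); the function name `count_histograms` is an assumption.

```python
from itertools import product
from math import comb


def count_histograms(n_x, n_y):
    """Count distinct histograms (value multisets) of functions X -> Y."""
    return len({tuple(sorted(f)) for f in product(range(n_y), repeat=n_x)})


for n_x, n_y in [(2, 2), (3, 2), (2, 3), (3, 3), (4, 3)]:
    print(n_x, n_y, count_histograms(n_x, n_y), comb(n_x + n_y - 1, n_x))
# 2 2 3 3
# 3 2 4 4
# 2 3 6 6
# 3 3 10 10
# 4 3 15 15
```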
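(b) Let $f, g \in \mathcal{F}$ be two functions with the same histogram $h_f = h_g$. Then, for any $y \in \mathcal{Y}$, $g^{-1}(y)$ and $f^{-1}(y)$ are equal in size and there exists a bijective function $\varphi_y : g^{-1}(y) \to f^{-1}(y)$ between these two subsets. Because the preimages partition $\mathcal{X}$, the map
$$\pi(x) = \varphi_{g(x)}(x), \quad x \in \mathcal{X},$$
is a bijection on $\mathcal{X}$ and thus defines a unique permutation $\pi \in \Pi(\mathcal{X})$ such that $f \circ \pi = g$. Thus, $h$-equivalence implies existence of a permutation. On the other hand, the histogram of a function is invariant under permutation since for any $y \in \mathcal{Y}$ and $\pi \in \Pi(\mathcal{X})$
$$h_{f \circ \pi}(y) = \sum_{x \in \mathcal{X}} \delta\bigl(y, f(\pi(x))\bigr) = \sum_{x \in \mathcal{X}} \delta\bigl(y, f(x)\bigr) = h_f(y) ,$$
because $\pi$ is bijective and the addends can be resorted. Thus, existence of a permutation implies $h$-equivalence.
(c) This follows directly from (b): the functions reachable from $f$ by permutation are exactly the functions that are $h$-equivalent to $f$.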
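The construction of the permutation in part (b) translates directly into code. The following sketch is an illustration under the assumption that functions are tuples over $\mathcal{X} = \{0, \dots, |\mathcal{X}|-1\}$; the names `histogram` and `matching_permutation` are hypothetical. For each point $x$ it picks an unused element of the preimage of $g(x)$ under $f$, which realizes the bijections between the preimages.

```python
from collections import Counter, defaultdict


def histogram(f):
    """Y-histogram of f: how many points are mapped to each cost value."""
    return Counter(f)


def matching_permutation(f, g):
    """Return pi with f[pi[x]] == g[x] for all x, or None if histograms differ."""
    if histogram(f) != histogram(g):
        return None
    preimage_f = defaultdict(list)
    for x, y in enumerate(f):
        preimage_f[y].append(x)
    # Map each x to a not yet used point of f's preimage of the value g[x].
    return tuple(preimage_f[y].pop() for y in g)


f = (0, 1, 1, 2)
g = (1, 2, 0, 1)
pi = matching_permutation(f, g)
print(pi)                                # a permutation of 0..3
print(tuple(f[pi[x]] for x in range(4)) == g)  # True
print(matching_permutation(f, (0, 0, 1, 2)))   # None, histograms differ
```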
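(d) For a subset $F \subseteq \mathcal{F}$, let $F_h = F \cap B_h$ (i.e., $F_h$ contains all functions in $F$ with the same histogram $h$). By (a), all $F_h$ are pairwise disjoint and $F = \bigcup_{h \in \mathcal{H}} F_h$. Suppose $F_h \neq \emptyset$: Since $F$ is c.u.p. there exists a function $f \in F_h$ that spans the orbit $B_h$. Thus $B_h \subseteq F$ and therefore $F_h = B_h$. Because basis classes are disjoint, the union $F = \bigcup_{h : F_h \neq \emptyset} B_h$ is unique.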
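Proof of Theorem 2.
By parts (a) and (d) of the lemma, every non-empty subset that is c.u.p. corresponds to exactly one non-empty union of basis classes. The number of different, non-empty unions of basis classes (equal to the cardinality of the power set of the set of all basis classes minus one for the empty set) is given by
$$2^{\binom{|\mathcal{X}| + |\mathcal{Y}| - 1}{|\mathcal{X}|}} - 1 .$$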
References
- S. Droste, T. Jansen, and I. Wegener. Perhaps not a free lunch but at least a free appetizer. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’99), pages 833–839. Morgan Kaufmann, 1999.
- W. Feller. An Introduction to Probability Theory and Its Applications, volume I. John Wiley & Sons, New York, Chichester, Brisbane, Singapore, 3rd edition, 1971.
- C. Schumacher, M. D. Vose, and L. D. Whitley. The no free lunch and problem description length. In L. Spector, E. Goodman, A. Wu, W. Langdon, H.-M. Voigt, M. Gen, S. Sen, M. Dorigo, S. Pezeshk, M. Garzon, and E. Burke, editors, Genetic and Evolutionary Computation Conference (GECCO 2001), pages 565–570. Morgan Kaufmann, 2001.
- D. Whitley. A free lunch proof for Gray versus binary encodings. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’99), volume 1, pages 726–733. Morgan Kaufmann, 1999.
- D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.