I Introduction
Simulated landscapes have been used for several decades to evaluate search strategies whose goal is to find the landscape location with maximum fitness. Applications of simulated landscapes include modeling the capacity of enzymes to catalyze reactions or ligands to bind to proteins, and the clinical effectiveness of medical treatments [10], [2]. Understanding properties of landscapes is important for understanding search difficulty.
landscapes are defined by a straightforward, tunable algorithm where specifies the number of binary features or loci and the degree of epistatic interactions among the loci [9]. landscapes are convenient because there are only two tunable parameters ( and ), and yet they provide a very rich set of landscapes. There is a large literature exploring properties of landscapes, though there are few analytic results, with [4] and [11] notable exceptions.
This paper provides a novel perspective on landscapes and generalizations of landscapes, proving that these landscapes can be characterized by parametric linear models comprised of main effects and interaction effects where the model coefficients have meaningful interpretations. The
algorithm induces a statistical distribution on the parametric model coefficients. We derive the distribution of model coefficients, showing how the
algorithm, for , automatically assigns the largest expected magnitude to main effects, with the expected magnitude of interaction effects typically decreasing with increasing order of interaction.The linear model representation of the landscape suggests that model rank should provide a measure of landscape complexity. A simple method of assessing rank is provided, and we determine conditions on and sufficient for the existence of designs that achieve maximum rank. We then show that the expected number of local optima is proportional to an orthant probability, which can be calculated with reasonable speed and accuracy for very large landscapes. Interestingly and surprisingly, rank is both positively and negatively correlated with the number of local optima. For fixed , it is wellknown [3] that the number of local optima increases with (i.e. a positive correlation with landscape rank). We show that when and are both fixed, there is a strong negative correlation between rank and number of local optima for classic landscapes. The statistical distribution of model effect coefficients provides an explanation for this counter intuitive phenomenon.
In related work, a proposal to use linear models with main effects and interactions to construct landscapes was explored in [14], [15] and [16]. These authors examined the effect of epistatic interactions on the properties of landscapes using metrics common in the experimental design literature, and they noted the equivalence between Walsh function decompositions and interaction models. An analysis of landscapes using Walsh functions was given in [6].
Ii Generalized Landscapes and Interaction Models
In this section we define the landscape and interaction models. The models are represented as linear models in matrix form, as this representation facilitates the study of model properties. For simplicity, the model is first discussed for constant across loci. The model is then generalized to allow varying across loci.
Iia landscapes
A general landscape is defined by a triple where is a set of locations, is a distance measure and is a “fitness” function. landscapes are a map from to where the fitness function is built from binary loci and epistatic interactions formed between each locus and other loci where can range from to .
To describe landscapes more fully, for each let represent
written in base 2 and represented as a (binary) vector of length
. For , let denote a vector of independent random weights, where each component has meanand variance
and otherwise arbitrary probability distribution. Let
denote a function from where denotes unit coordinate vectors in , i.e. vectors of the form . The definition of the function depends on the epistatic interactions to the th locus. The function is defined explicitly below.For , define . Note that is the landscape fitness at location and that and as each is comprised of a sum of independent weights. The landscape is defined by the fitnesslocation pairs coupled with Hamming distance.
IiB Matrix representation of generalized landscapes
We begin by providing an explicit description for the construction and representation of and generalized landscapes as linear models in matrix form. For , let where and with for some . denotes the th interaction set, comprised of locus and loci that interact with locus . Note that isn’t restricted to be constant across loci, and there are no restrictions on the number of times locus can appear in the interaction sets. This represents a generalization of landscapes, with the classic landscape occurring as the special case when and each locus appears in exactly interaction sets. An additional generalization would be to not require locus to be a member of . The results in this article would still hold for this generalization.
With a slight abuse of notation, for each write where are the elements of corresponding to . Thinking of as a binary number, let be the decimal representation of this number. Define the vector where the 1 occurs in column . Alternatively, if
is the identity matrix of size
, then is the th row of .To write the generalized model in matrix form, consider the vector where and denotes column concatenation. Define the model matrix
(1) 
With this definition, the generalized landscape is where denotes the vector of independent random fitnesses.
Example: To illustrate the definition of and the matrix , suppose , for and , and . Then
The functions and are defined similarly. The matrix is then
To our knowledge, landscapes have never been formalized using the matrix representation given here. The rank of is a measure of the richness of the landscape, as it gives the dimension of the domain for the vector . We will show that the rank of is determined by , and the structure of the interaction sets , and that using rank as a measure of complexity provides refinement beyond using only and .
IiC Interaction Model
Here we define the general form of parametric interaction models, and in the next section relate them to landscapes. Statisticians have long employed interaction models to study the effects of multiple ‘treatments’ on an outcome, see for example [12]
. Interaction models are straightforward to define and model parameters have meaningful interpretations. In the evolutionary computing literature, these models seem to have received little attention, with the exception of
[14],[15] and [16].To define a general interaction model, for , let and define . Mathematical properties of the interaction model and interpretation of model parameters are most easily obtained using the transformed values where as before .
The general form of an interaction model with terms is
where the ’s are coefficients that can take any value in and where we adopt the convention that when , .
Example: Consider a model with loci and , . The interaction model is . In this example, and are main effects coefficients while and are twoloci interaction coefficients.
The intercept () and coefficients of the main effects and interaction terms have meaningful interpretations. The intercept coefficient represents the average of the fitness values across the entire landscape. For the general interaction model, it is not difficult to show that the main effect represents the difference in fitness values when locus is varied between and , averaged over the values of the other loci. represents the difference of differences between fitness values for and when and , averaged over all other for . In general, higher order interaction coefficients are interpreted as average differences between lower order interactions. For example, represents the average difference in the two factor interaction between loci and when and .
Iii Generalized Landscapes as Interaction Models
Here we show that generalized landscapes can be expressed as linear interaction models, and that the interactions that are included in the model are completely determined by the interaction sets , . We also derive the statistical properties of the interaction model coefficients and show how to construct classic landscapes that maximize the number of interaction terms.
Let denote the power set for , i.e. the set and all it’s subsets, including the empty set. Let , and consider the interaction model
(2) 
Evaluated at the transformed values , the model can be written in matrix notation as where is an appropriately defined matrix, is the number of elements in and is an vector of coefficients. The random vector of main effects and interactions has distributional properties dependent on the probability distribution of the vector of weights and the structure of the interaction sets . The distributional properties of are studied in section IIIA. Note that the model defined in (2) always contains an intercept and all main effects terms.
Example: Consider again the example in Section IIB, where , for and , and . Then and
The first column of corresponds to the intercept , columns two through four are for the main effects, and columns five through seven are for the two loci interactions. Column five, for example, corresponds to the interaction between loci 1 and 2, and is obtained by taking the product of the elements of columns two and three, which correspond to the main effects for loci 1 and 2.
The following proposition establishes that a generalized landscape defined by interaction sets is equivalent to the interaction model given by (2), thereby clearly establishing the nonzero interaction effects induced by the algorithm. The proof is given in the appendix.
Proposition 1.
Equation (2) shows that the algorithm constructs an interaction model in an interesting way. Note that the algorithm dictates that the interaction model contain all subinteractions contained in higher order interactions. For example, if an landscape has a fourth order interaction between loci , then it also has the four thirdorder and six secondorder interactions defined by the subsets of the four interacting loci.
However, knowing which interactions have nonzero coefficients is not sufficient to fully understand the structure of landscapes. A complete understanding requires knowing the statistical properties of the main effect and interaction coefficients.
Iiia Induced properties of interaction model coefficients
While we have established that generalized landscapes can be represented by interaction models and shown how the epistatic interaction sets define the interaction terms included in the model, additional insight into landscape properties and a complete understanding of the representation requires knowledge of the distributional properties of the interaction model coefficients. This is addressed in the next proposition, with proof given in the appendix.
Proposition 2.
Consider a generalized landscape defined by interaction sets and given by with weight vector where and Var. For , let denote the coefficient of the interaction term corresponding to , and the indicator function, i.e.
Then
and
Some remarks are in order. First, if the weights
are normally distributed, then the interaction model coefficients, which are linear functions of
, are independent and normally distributed with the indicated means and variances. For other distributions on , the exact distribution of the coefficients is an often intractable convolution problem. However, as both andincrease, the central limit theorem can be invoked to show that the coefficients will be approximately normally distributed (contrast with approximate normality of the fitnesses, which only requires
large). Regardless of the distribution of , the coefficients are uncorrelated.For normally distributed, , i.e. the expected magnitude of
is proportional to it’s standard deviation. More generally, the variance represents the expected magnitude of the square of the coefficient (
for ). With this perspective, it is interesting to note how the algorithm assigns importance (magnitude) to the interaction terms. The expected squared magnitude of depends on the frequency with which . Then for the classic model where is constant and , main effects have the largest expected magnitude, second order interactions would typically have larger expected magnitude than third order interactions and so on. On the other hand, when , all coefficients have the same expected magnitude because each power set contains main effects and interactions of all orders.When the coefficients are normally distributed, independence of model coefficients means that, for example, knowing the magnitude of a two loci interaction provides no information on the magnitudes of the corresponding main effects of the loci–the algorithm assigns magnitudes of the coefficients completely independently.
The interaction model representation of a landscape with binary loci is equivalent to the representation given by Walsh functions [16]. While landscapes have been studied from the perspective of Walsh functions [6], the above results provide a transparent and explicit analysis of the Walsh coefficients for generalized landscapes, showing exactly which Walsh coefficients are nonzero and the statistical properties of these coefficients. For example, note that Theorems 4 and 5 in [6] follow immediately from the representation given in equation (2).
An interesting property of generalized landscapes gleaned from Propositions 1 and 2 is that two landscapes defined by different interaction sets can lead to landscapes with the same set of main effects and interactions, i.e. identical matrices , but the variances of the coefficients of the landscapes can be different, suggesting that the properties of the resultant landscapes may also be different. Consider, for example, the two landscapes defined by the following interaction sets: and . Clearly so that the landscapes have identical column space, but by Proposition 2, the variances of the terms are not all identical. Consider, for example, the main effect for locus 1: From Proposition 2, Var.
IiiB Rank of landscapes and maximal rank models
Here we show that the rank of the model matrix for an landscape is determined by the interaction sets that define the landscape. We derive the maximum achievable rank and describe how interaction sets can be constructed to maximize rank.
Proposition 3.
where denotes counting measure and where .
Proof.
The result follows immediately from noting that the complete set of linear and interaction terms of all orders, comprise a Hadamard matrix, a linearly independent set of (orthogonal) column vectors. ∎
It follows from the proof of Proposition 3 that number of model interactions , i.e. model rank is a onetoone function of the total number of interactions in an landscape.
The next results give an upper bound on the rank of an landscape, and provide conditions under which maximal rank designs exist for classic landscapes.
Proposition 4.
For the classic landscape ( constant across loci), where the max is over all possible designs for fixed values of and .
Proof.
The maximum rank occurs when there are no overlaps in interactions between the inputs to the loci. In this case each locus will contribute interactions of order two or higher. Summing over the loci and including the intercept and main effects terms gives the result. ∎
Proposition 4 begs the question as to when maximal rank classic landscapes exist. Equation (2) shows that all classic landscapes contain a main effect corresponding to each loci. Then for both and fixed, maximizing rank is equivalent to maximizing the number of interactions in the model. From the proof of Proposition 4, it is evident that a necessary and sufficient condition is that each set contain unique pairs of loci, as this ensures no redundant two factor interactions, and by extension no redundant higher order interactions. We now give explicit cases when the maximum rank given in Proposition 4 above can be achieved. The proof of the following proposition is given in the appendix.
Proposition 5.
There exist classic landscapes of maximal rank for the following values of and .

and all .

and all .

and all except possibly .

and all except possibly .

and all .

and and all .

and and all .

and and all .
It is not difficult to extend Proposition 5 to larger values of , and the approach for doing so is contained in the proof of the result. It is worth noting, and not difficult to prove, that a necessary condition for existence of maximal rank landscapes is that , because otherwise at least one pair of loci must occur together in more than one set .
Proposition 5 established the existence of maximum rank landscapes. We now discuss the construction of these landscapes, which is easily achieved through the use of what we term difference sets. The combinatorial theory underlying these sets is detailed in the proof of Proposition 5 given in the appendix. Table I provides the needed difference sets for up to .
To construct a maximal rank classic landscape, recall that an landscape is completely determined by it’s interaction sets . Interaction sets resulting in maximal rank designs are defined by incrementing the elements in the difference sets by one (modulo ) until sets have been defined. This process is illustrated in the following example.
Example: Consider constructing an landscape achieving maximum rank when and . From Table I, the difference set is . Then define . Note that increments are modulo , i.e. increments exceeding “wrap around”.
Difference Set  

2  
3  
4  
5  
6  
7  
8  
9 
The design constructed in the example is shown in Figure (a)a. The th column of the figure gives the elements in . Notice that no pair of loci appears together more than once across columns. The landscape resulting from this design has rank , resulting from an intercept, 7 main effects, 21 two factor interactions and 7 three factor interactions.
Contrast with Figure (b)b which shows the design resulting from choosing adjacent loci for the epistatic interactions. This design has rank 29, resulting from an intercept, 7 main effects, 14 two factor interactions and 7 three factor interactions.
Iv Number of Local optima
The number of local optima is perhaps the strongest measure of landscape ruggedness and search difficulty [8], [17], [19]. For fixed , it has been established empirically that the number of local optima increase with [3]. The result is not surprising as increasing increases the landscape model rank by increasing both the number and order of interactions defining the landscape. An unexplored question is the association between the number of local optima and landscape rank when and are both fixed.
We begin by providing an analytic expression for the expected number of local optima on and generalized landscapes.
Proposition 6.
Consider a generalized landscape defined by interaction sets and weight vector where the elements of
are independent normal random variables with
and Var. Then the expected number of local optima on the landscape is given by whereand where is an symmetric matrix with elements given by
Proof.
Let denote the fitness of a randomly selected location on the landscape, and let denote the fitness one hamming distance away obtained by flipping the value of locus . Let denote the vector of fitness differences. Then the probability that the random location with fitness is a local optima is given by the orthant probability Pr, and the expected number of local optima on the landscape is then . When is jointly normally distributed, it follows that is multivariate normal because it is a linear function of . The mean of is clearly zero, and the variance matrix is straightforward to derive and is given by defined above. The derivation is omitted. The orthant probability Pr is therefore given by . ∎
Proposition 6 assumes is multivariate normal. The utility of this assumption is that the expected number of local optima then depends on the multivariate normal orthant probability , allowing us to take advantage of the extensive research on the numerical computation of multivariate normal probabilities (e.g. [5]
and references therein). We are then able to estimate the number of local optima for very large landscapes without having to generate an actual landscape and check individually whether each location is a local peak. Note the normality assumption implies that the expected number of local optima is a function of only
and .For arbitrary probability distributions for the weights , computation of the orthant probability Pr is typically an intractable dimensional integral. When the weights are nonnormal and , are both large, the distribution of (see proof of Proposition 6) is approximately normal by the central limit theorem, and normal orthant probabilities should then provide reasonable approximations to the expected number of local optima.
An empirical study was done to explore the relation between landscape rank and the number of local optima for both classic and generalized landscapes. For each combination of and , we first generated interaction sets for 20 classic landscapes. For each value of and , maximal rank landscapes were generated when they were known to exist, and an adjacent
loci interaction design was also generated. Additional classic landscapes were generated by randomly selecting an latin square using the methods in [7] and then randomly selecting rows from the latin square. The resulting columns of elements comprise the interaction sets that define a classic landscape. For each landscape, the rank and the expected number of local optima were computed. Multivariate normal orthant probabilities representing the expected number of local optima were computed using the "mvtnorm"
package in the R computing environment, see [5].
As seen in Figure 2, there is a strong positive correlation between the expected number of local optima and landscape rank for fixed and increasing . Figure 3 provides additional resolution with separate plots for when , showing very strong negative correlation between rank and the number of local optima when and are both fixed. Results for and 100 were similar. The maximal or near maximal rank designs are given by the points in the lower right corners of Figure 3, whereas the adjacent designs are given by the point in the upper left corners. Note that landscapes computed using adjacent loci to define interaction sets resulted in landscapes with significantly lower rank and larger expected number of local optima than randomly chosen landscapes.
The strong negative correlation seen in Figure 3 is perhaps surprising. Recall that landscape rank is equivalent to the number of terms in the interaction model representation, so that larger rank corresponds to more interaction terms. It is well known that landscapes with main effects but no interactions have only a single peak. It would seem that additional interaction terms would translate on average to more rugged landscapes, but this is not the case when and are both fixed.
This phenomenon is explained, at least partially, by noting that when Proposition 2 is applied to any maximal design for an landscape, it follows that main effects have variance and interactions of all orders have variance giving a ratio of , demonstrating that main effects can have considerably more influence than interactions in maximal designs. This observation would clearly extend to designs that are nearly maximal. Conversely, for an adjacent loci design, the variation is spread more equitably among main effects and interactions. For example, applying Proposition 2, the ratio of variances for a main effect and two factor interaction is for the adjacent loci design.
Note also from Proposition 2 that for fixed and , the variance of the main effects and the sum of the variances of interaction coefficients is constant for all classic landscapes, i.e. these quantities do not vary regardless of how the interaction sets are defined. It follows that for fixed and , designs that increase the number of interaction terms must have a decreased average variance for the interaction terms. In other words, additional model complexity achieved through the addition of interaction terms is necessarily offset by a reduction in their expected magnitude.
Additional simulations were done to assess the relation between rank and number of local optima for generalized landscapes. To construct generalized landscapes, we kept the sizes of the interaction sets constant for a given landscape, but did not restrict the number of times that locus could appear in the interaction sets for (for classic landscapes, locus appears in exactly interaction sets for ).
We randomly generated 20 generalized landscapes for each combination of and . A strong positive correlation is again seen between rank and expected number of local optima as varies with fixed, see Figure 4. However, the additional resolution provided by Figure 5 shows a positive correlation between landscape rank and expected number of local optima. An explanation is that for generalized landscapes the expected magnitude of main and interaction effects can vary, and it is possible to have expected magnitudes for a subset of interactions larger than those for a subset of main effects.
V Conclusion
Representation of landscapes as linear models in matrix form provides a transparent connection to parametric linear interaction models, and provides a straightforward means for deriving the statistical properties of interaction model coefficients induced by the landscape algorithm. The interaction model representation coupled with distributional properties of model coefficients provides new insights into properties of landscapes. Expressing the expected number of local optima as a multivariate normal orthant probability provides additional insight into aspects of landscapes that affect the number of local optima, and also allows for quick computation of the expected number of local optima.
The gaps in the data points seen in the horizontal axes of Figure 4 and Figure 5 represent gaps in the ranks of landscape models and clearly illustrate that classic landscapes only represent a small subset of possible interaction designs, see also [6] for a similar observation. While our definition of generalized landscapes would fill in some of the gaps, the definition still requires that lower order interactions that are contained in higher order interactions appear in the model.
The representation of landscapes as interaction models immediately suggests a more general definition that allows for main effects and interactions of any order to appear together without restriction. Such landscapes could be constructed with any rank ranging from 1 to , and main effect and interaction coefficients could be generated with arbitrary variances and correlation structures. In this vein, [14] defined a class of tunable landscapes allowing general interaction structures and studied the performance of GA on these landscapes.
Using arguments similar to those in Section IV, it is not difficult to show that for a general interaction model with normally distributed coefficients, the expected number of local optima is represented by a normal orthant probability that depends entirely on the variance/covariance matrix of the vector of fitness differences. However, it is not obvious how to select interactions and assign variances to coefficients that will result in landscapes with large numbers of local peaks, or more generally that are difficult to search. There are few results regarding the size of normal orthant probabilities as a function of properties of the variance/covariance matrix, though [18] provide potentially useful results in this context.
Characterizing the search difficulty of landscapes is itself difficult [13], and the number of local optima is an imperfect determinant of search difficulty [8], [13]. For example, the needleinahaystack landscape, which can be represented as a parametric interaction model containing all possible interactions where the magnitudes of all main and interactions effects are equal, has a single peak which can only be found by guessing. While the maximal rank designs for fixed and had smaller expected number of local optima, they may posses other attributes that affect search difficulty.
An attractive feature of the algorithm is that a very rich set of landscapes can be generated with only two tuning parameters. Whether a procedure with comparable simplicity can be developed for constructing a more general class of interaction models that are difficult to search is an area for future research.
The following is a proof of Proposition 1.
Proof.
We first construct a matrix representation for the interaction model. For the th locus, let denote the vector of covariates corresponding to the inputs for locus . For example, if , then . Note that there are elements in . More generally, let denote the set of order interactions for the inputs of locus . For , define the vector , and note that can take different values.
Consider the vector where . Define the model matrix
(3) 
Note that but clearly as is comprised of all the columns in with some columns repeated. is an overparameterized version of , and in the following we show that .
The proof uses two basic results. First, suppose two matrices have the same column space. Choose rows from the first matrix and form a new matrix by expanding this matrix by row concatenating each of the rows times. The same operations are applied to the second matrix, that is the corresponding rows from the second matrix are repeated times. The expanded matrices are then easily seen to have the same column space. The second result is that if for then where and and where denotes column concatenation.
Let and let be the binary ordering of . Define
Then as is a full rank matrix, and is the identity matrix. Next, let and denote columns to of and respectively. Note and are obtained from and by repeating rows of these matrices. Then by the first result described at the start of the proof, .
Finally, the result follows from the fact that and .
∎
The following is the proof of Proposition 2.
Proof.
Note that and the expectation of the RHS is easily seen to be from which it follows that . Let denote the th column of and the corresponding interaction model coefficient. Then for , where the last equality follows from whenever .
To obtain the variance/covariance results, note that implies
where we use the fact that the columns of are orthogonal, each with norm . Then
To evaluate , let represent the th column of where represents the submatrix of corresponding to the interaction set (columns to of ). Then it is not difficult to show that
(4) 
where the function when and zero otherwise ( is defined explicitly below), and where denotes the interaction term corresponding to . Let denote the th row of . Then it follows from the identity above that , and then that
Comments
There are no comments yet.