1. Introduction
The ability to discover and exploit stepping stones is a hallmark of evolutionary systems. Evolutionary algorithms driven by a single fitness objective are often victims of
deception: they converge to small areas of the search space, missing available stepping stones. Novelty search (Lehman and Stanley, 2008, 2011a) is an increasingly popular paradigm that overcomes deception by ranking solutions based on how different they are from others. Novelty is computed in the space of behaviors, i.e., vectors containing semantic information about
how a solution achieves its performance when it is evaluated. In a collection of solutions with sufficiently diverse behaviors, some solutions will be useful stepping stones. However, with a large space of possible behaviors, novelty search can become increasingly unfocused, spending most of its resources in regions that will never lead to promising solutions. Recently, several approaches have been proposed to combine novelty with a more traditional fitness objective (Mouret and Clune, 2015; Mouret and Doncieux, 2009; Gomes et al., 2015; Gomez, 2009; Pugh et al., 2015) to reorient search towards fitness as it explores the behavior space. These approaches have helped scale novelty search to more complex environments, including an array of control (Cully et al., 2015; Mouret and Doncieux, 2012; Bowren et al., 2016) and content generation (Lehman and Stanley, 2011b; Liapis et al., 2013; Preuss et al., 2014; Lehman and Stanley, 2012; Nguyen et al., 2015, 2016; Lehman et al., 2016) domains.
This paper shows that, aside from focusing search overall, the addition of fitness can also be used to focus search on discovering useful stepping stones. The assumption is that the most likely stepping stones occur at local optima along some dimensions of the behavior space. Competition in several existing algorithms inhibits the discovery and maintenance of such stepping stones, resulting in “spooky action at a distance”, in which a small search step in one part of the space causes a novel solution to be lost in another part. Based on the notion of behavior domination, a class of algorithms is defined in this paper as a framework for understanding the dynamics of behavior-driven search and developing algorithms that avoid such problems. Intuitively, behavior domination means that a solution exerts a negative effect on the ranking of every weaker solution, and this effect increases as their difference in fitness increases and as the distance between their behaviors decreases.
Behavior domination algorithms include several existing algorithms, and the definition makes it possible to transfer theoretical guarantees from multi-objective optimization; the non-dominated front induced by behavior domination can be viewed (Figure 1) as a rotation of a Pareto front. Within this framework, a new algorithm is developed that uses fast non-dominated sorting (Deb et al., 2002). Experimental results show that this algorithm outperforms existing approaches in domains that contain useful stepping stones, and its advantage is sustained with scale. The conclusion is that behavior domination can help illuminate the complex dynamics of behavior-driven search, and can thus lead to the design of more scalable and robust algorithms.
2. Behavior-driven Ranking
Behavior-driven algorithms are a class of evolutionary algorithms that are guided by information about how a solution achieves its performance during evaluation. The core defining component of such an algorithm is the ranking procedure it uses to order solutions for selection or replacement. This section reviews background for behavior-driven search, first defining some useful terms, and then describing examples of popular behavior-driven algorithms.
2.1. Behavior and Behavior Characterization
Behaviordriven algorithms use a notion of solution behavior to induce a meaningful distance metric between solutions and to facilitate the drive towards novelty and diversity. For example, in a robot control domain, a solution’s behavior may be some function of the robot’s trajectory (Gomez, 2009; Gomes and Christensen, 2009; Mouret and Doncieux, 2012)
, whereas in an image generation domain, it may be the result of applying some deep features to the image
(Liapis et al., 2013; Nguyen et al., 2015, 2016; Lehman et al., 2016). The following definitions of behavior, behavior characterization, behavior space, and behavior distance are fairly universal in the literature, though often not explicitly defined.
Definition 2.1 (Behavior).
A behavior b of a solution x in an environment E is a vector resulting from the evaluation of x in E.
Definition 2.2 (Behavior characterization).
A behavior characterization for an environment E is a (possibly stochastic) function B mapping any solution x to its behavior b = B(x), given the evaluation of x in E.
By definition, the behavior characterization can be any function mapping solutions to vectors. In practice, the behavior characterization is usually designed to align with a fitness measure or notion of interestingness in the evaluation environment (Pugh et al., 2015). For example, in a maze navigation task, the final position of a robot aligns more with solving the task than its final orientation. In other words, the behavior characterization is designed to capture a space whose exploration is expected to have practical benefits.
Definition 2.3 (Behavior space).
The behavior space of a behavior characterization B is the codomain of B.
The exploration of the behavior space by a search algorithm is facilitated by a function giving the distance between two solutions as a function of their behavior.
Definition 2.4 (Behavior distance).
A behavior distance is a metric d on the behavior space.
In pure novelty search, the behavior of a solution is the only information returned from evaluation that is used in the ranking system. This is in contrast to traditional evolutionary algorithms, which use only a single scalar fitness value f(x) computed from a scalar fitness function f. In general, a behavior-driven algorithm can take advantage of both behavior and fitness when ranking solutions.
2.2. Existing Behavior-driven Algorithms
The following are some of the most popular schemes for behavior-driven algorithms. As extensions to the pure novelty search paradigm, several recent algorithms use both behavior and fitness information in ranking, trying to navigate the tradeoff between the pressure towards novelty and diversity and the pressure to maximize fitness. Although more algorithms exist than are covered here, those below should give a sense of the behavior-driven algorithm design space. (See (Mouret and Doncieux, 2012; Pugh et al., 2015; Gomes et al., 2015) for previous reviews of these algorithms.)
2.2.1. Novelty search (NS) (Lehman and Stanley, 2008, 2011a)
Each solution x is ranked based on a single novelty function n(x), giving the average distance of its behavior to the k nearest behaviors of other solutions in the population and an archive of past solutions accumulated throughout search. More specifically,

n(x) = (1/k) Σ_{j=1}^{k} d(B(x), B(μ_j(x))),

where μ_j(x) is the j-th nearest neighbor of x in the behavior space. The prevalent method of building the archive, and the method used in this paper, is to add each solution to the archive with a fixed probability
(Lehman and Stanley, 2010; Gomes et al., 2015), in which case the archive represents a sampling from the distribution of areas visited so far. Novelty search captures the idea that more complex and interesting solutions lie away from the visited areas of the behavior space.
2.2.2. Linear scalarization of novelty and fitness (LSNF) (Cuccu and Gomez, 2011; Gomes et al., 2015)
An intuitive method of combining novelty and fitness is to rank a solution x based on a linear scalarization of its fitness and novelty:

score(x) = ρ · (f(x) − f_min)/(f_max − f_min) + (1 − ρ) · (n(x) − n_min)/(n_max − n_min).

The fitness and novelty scores here are normalized to compensate for differences in scale at every iteration: f_min, f_max, n_min, and n_max are the minimum and maximum fitness and novelty scores in the current population. The parameter ρ ∈ [0, 1] controls the tradeoff of fitness vs. novelty. LSNF with ρ = 0.5 has been shown to be robust across domains (Gomes et al., 2015), and that is the version considered here.
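As a concrete illustration, the novelty and LSNF scores can be sketched as follows, assuming a one-dimensional behavior space, k nearest neighbors, and no archive (all names here are illustrative, not from the original implementation):

```python
import numpy as np

def novelty(behaviors, i, k=1):
    # Average behavior distance from solution i to its k nearest neighbors.
    # An archive would simply be appended to `behaviors` before the call.
    d = np.sort(np.delete(np.abs(behaviors - behaviors[i]), i))
    return d[:k].mean()

def lsnf_scores(behaviors, fitnesses, rho=0.5, k=1):
    # Linear scalarization of normalized fitness and normalized novelty.
    f = np.asarray(fitnesses, dtype=float)
    n = np.array([novelty(behaviors, i, k) for i in range(len(f))])
    f_hat = (f - f.min()) / (f.max() - f.min())
    n_hat = (n - n.min()) / (n.max() - n.min())
    return rho * f_hat + (1 - rho) * n_hat
```

With rho = 1 this reduces to pure fitness-based ranking, and with rho = 0 to pure novelty-based ranking.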
2.2.3. NSGA-II with novelty and fitness objectives (NSGA-NF) (Mouret and Doncieux, 2009, 2012)
Another approach is to use novelty and fitness as two objectives within NSGA-II (Deb et al., 2002), the popular multi-objective framework. Often the novelty score in this approach is behavioral diversity, which is a special case of novelty in which k equals the population size and there is no archive. This approach has been shown to improve performance on many tasks, especially those in evolutionary robotics, where some constant diversity is useful to avoid local optima.
2.2.4. Novelty search with local competition (NSLC) (Lehman and Stanley, 2011b; Pugh et al., 2015)
Novelty search with local competition also uses an NSGA-II ranking system, but instead of using a raw fitness objective alongside the novelty objective, it uses a relative fitness score: a solution’s rank in fitness among its k nearest neighbors. This enables the suitable exploration of diverse niches in the behavior space with different orders of magnitude of fitness: less fit niches are not outpaced and forgotten when too many of the search’s resources are committed to the globally fittest regions. NSLC has yielded particularly promising results in content generation domains, such as generating virtual creatures and images (Lehman and Stanley, 2011b; Nguyen et al., 2015).
2.2.5. MAP-Elites (Cully et al., 2015; Mouret and Clune, 2015)
In MAP-Elites, the behavior space is broken up into a set of bins, such that each behavior is mapped to a bin. For each bin, the solution with the highest fitness whose behavior falls into that bin is kept. The population at any point thus consists of the most fit (elite) solution from each bin for which a behavior has been found. Because MAP-Elites keeps an elite from all visited bins in the behavior space, at any point the population displays a map of the levels of fitness achievable throughout the space. So, along with being a method for generating high-quality diverse solutions, MAP-Elites is a useful visualization tool for understanding how the behavior space and fitness landscape relate.
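The MAP-Elites loop described above can be sketched in a few lines; this is a simplified illustration (a single random initial solution, uniform parent selection over elites), with all names hypothetical:

```python
import random

def map_elites(init, mutate, fitness, behavior, bin_of, iterations):
    # elites: bin -> (solution, fitness); the population is the set of elites.
    elites = {}

    def insert(x):
        b = bin_of(behavior(x))
        if b not in elites or fitness(x) > elites[b][1]:
            elites[b] = (x, fitness(x))  # keep only the fittest per bin

    insert(init())
    for _ in range(iterations):
        parent = random.choice(list(elites.values()))[0]
        insert(mutate(parent))
    return elites
```

For example, with a one-dimensional solution space, behavior(x) = x and an integer binning function give bins of width 1.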
2.2.6. Fitness-based search
It is worth including fitness-based search, the standard approach to evolutionary search, as the trivial example. In fitness-based search, solutions are ranked based on a single fitness value. Any additionally available behavior information is ignored.
The proliferation of recently introduced behavior-driven methods gives a strong indication that novelty alone is not generally sufficient for tackling complex domains. The methods reviewed above each have intriguing definitions that suggest they would be a good option for particular kinds of problems. However, unforeseen dynamics can emerge from the interaction between novelty and fitness, which can be difficult to disentangle. The next section sheds some light on these issues, resulting in a characterization of these existing algorithms and the development of a new approach.
3. Behavior Domination Algorithms
The goal is to maintain the power of novelty search to discover stepping stones, while adding a fitness drive to focus search. Novelty search has demonstrated that a sufficiently diverse collection of solutions most likely contains useful stepping stones for solving the problem at hand. When adding fitness to focus search, the presumption is that the most useful stepping stones will be local optima along some dimensions of the behavior space. As pure fitness-based search maintains the most fit solutions, and pure novelty search maintains the most novel solutions, a method that combines the two should maintain the most promising set of stepping stones discovered so far, and the quality of this set should improve over time. Section 3.1 discusses the presence of “spooky action at a distance” in several existing algorithms, which inhibits their ability to preserve useful stepping stones. Section 3.2 presents a formalization of behavior domination, which defines a subclass of behavior-driven algorithms that can avoid this pitfall and guarantee monotonic improvement of collected stepping stones. Section 3.3 shows that several existing behavior-driven algorithms are in this subclass. Section 3.4 uses behavior domination to develop a new algorithm based on fast non-dominated sorting.
3.1. “Spooky Action at a Distance” for Behavior-driven Search
When novelty and fitness are combined, the interaction between these two drives can have unintended consequences. In particular, the stepping-stone discovery ability of novelty search may not be preserved. For example, if a small change in the behavior of one solution has a fatal effect on a distant, isolated solution on the other edge of the explored behavior space, then a valuable stepping stone may be lost. The algorithm has taken one small step forward, but one large step back. This unsettling effect is an instance of “spooky action at a distance” for behavior-driven search. More specifically, spooky action at a distance occurs when a ranking decision based on a local increase in novelty results in a global decrease of novelty. Here, global novelty is defined by two measures: GNP, the maximum behavior distance between any pair of solutions in the population; and GNT, the total behavior distance over all pairs of solutions.
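Both measures are simple functions of the pairwise behavior distances; a minimal sketch (function names are illustrative):

```python
from itertools import combinations

def global_novelty(behaviors, dist):
    # GNP: maximum pairwise behavior distance in the population.
    # GNT: total behavior distance over all pairs.
    pair_dists = [dist(b1, b2) for b1, b2 in combinations(behaviors, 2)]
    return max(pair_dists), sum(pair_dists)
```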
It turns out several existing behavior-driven algorithms support spooky action at a distance. The following example is for a one-dimensional behavior space. Consider a population P = {v1, v2, v3, v4} and an empty archive, where B(v1) = 0, B(v2) = 4, B(v3) = 5, B(v4) = 8.9, f(v1) = 2, f(v2) = 3, f(v3) = 6, and f(v4) = 2.5. Now, consider an identical setup but with population P′, where v3 has moved so that B(v3) = 6.6 and f(v3) is unchanged (Figure 2).
Suppose an algorithm must delete one solution, and it deletes v2 with population P, but deletes v4 with population P′. This change must be caused by the move of v3 from 5 to 6.6. P with v2 deleted has global novelty GNP = 8.9 and GNT = 17.8. However, P′ with v4 deleted has global novelty GNP = 6.6 and GNT = 13.2, whereas deleting v2 from P′ instead would have preserved GNP = 8.9 and GNT = 17.8. Thus, the algorithm demonstrates spooky action at a distance.
Suppose k = 1. Then given P, n(v1) = 4, n(v2) = 1, n(v3) = 1, and n(v4) = 3.9. Given P′, n(v1) = 4, n(v2) = 2.6, n(v3) = 2.3, and n(v4) = 2.3. The next three observations show spooky action at a distance for LSNF, NSGA-NF, and NSLC.
Observation 3.1 (Spookiness of LSNF).
With P, the scores of v1, …, v4 (ρ = 0.5) are 0.50, 0.13, 0.50, and 0.55, resp., so v2 is deleted. With P′, the scores are 0.50, 0.21, 0.50, and 0.06, resp., so v4 is deleted.
Observation 3.2 (Spookiness of NSGA-NF).
With P, v3 dominates v2, while all other solutions are non-dominated, so v2 is deleted. With P′, v2 is no longer dominated, but v3 now dominates v4, so v4 is deleted.
Observation 3.3 (Spookiness of NSLC).
With P, the local competition scores of v1, …, v4 are 0, 0, 1, 0, resp. Non-dominated sorting of novelty and local competition then leaves v2 alone in the last front, so v2 is deleted. With P′, the local competition scores of v1, …, v4 are again 0, 0, 1, 0, resp. So, as in Observation 3.2, v2 is no longer in the last front, but v4 now is, so v4 is deleted.
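Such observations can be checked mechanically. The sketch below computes the dominated set under NSGA-NF-style ranking (objectives: fitness and novelty with k = 1, empty archive), using an illustrative four-solution configuration in a one-dimensional behavior space; moving the third solution flips the deletion from a clustered solution to the distant one:

```python
def pareto_dominates(p, q):
    # Standard Pareto dominance for maximized objective tuples.
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def dominated_set(behaviors, fitnesses):
    # Objective vectors (fitness, novelty); novelty = distance to nearest neighbor.
    def nearest(i):
        return min(abs(behaviors[j] - behaviors[i])
                   for j in range(len(behaviors)) if j != i)
    pts = [(fitnesses[i], nearest(i)) for i in range(len(behaviors))]
    return {i for i, p in enumerate(pts)
            if any(pareto_dominates(q, p) for q in pts)}
```

Here, dominated_set([0, 4, 5, 8.9], [2, 3, 6, 2.5]) yields {1}, while after moving the third solution from 5 to 6.6 it yields {3}: the distant, low-fitness solution becomes the one to be deleted.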
With problems such as “spooky action at a distance” in mind, the next section introduces a notion of behavior domination from which algorithms can be developed that avoid these issues.
3.2. Ranking by Behavior Domination
A practical unifying framework for behavior-driven methods should capture both the pure novelty maximization and pure fitness maximization extremes, as well as a tradeoff space between them that potentially captures some of the existing approaches and suggests new ones. Many components of existing ranking mechanisms (Section 2.2) can be represented in terms of pairwise relationships between solutions, based on their behaviors and fitnesses. These pairwise interactions capture the positive or negative effects solutions have on each other during ranking when they are competing for a spot in the population. Focusing on pairwise effects also helps avoid unintended global effects, such as that discussed in Section 3.1.
To focus search on maintaining the most efficient set of stepping stones, behavior domination aims to formalize the idea that a solution should dominate solutions with similar behaviors and lower fitnesses. In particular, each solution exerts a domination effect over each weaker solution. Intuitively, the domination effect should increase (decrease) as the difference between their fitnesses increases (decreases), and increase (decrease) as the distance between their behaviors decreases (increases). The following definition of domination effect captures these requirements.
Definition 3.4 (Domination effect).
The domination effect of a solution v on a solution w is the function

e(v, w) = (f(v) − f(w)) − d(B(v), B(w)),

where f is a fitness function, B is a behavior characterization, and d is a behavior distance.
The score produced by the domination effect function can be used in various ways in a ranking system. Two common methods of combining pairwise scores are (1) ranking by aggregation, and (2) ranking by domination. In ranking by aggregation, solutions are ranked by a single score based on a sum of pairwise scores, e.g., the novelty score is a normalized sum of distances between the behaviors of pairs of solutions. In ranking by domination, solutions are ranked in a partial order, by a boolean pairwise relation of whether they dominate one another. To enable ranking by domination, the following definition provides such a pairwise operator, based on the domination effect function defined above.
Definition 3.5 (Behavior domination).
If e(v, w) > 0, then v ⊵ w, that is, v dominates w.
It turns out that for any specification of the domination effect, i.e., any choice of f, B, and d, this definition of domination defines a (strict) partial order over solutions.
Theorem 3.6.
⊵ induces a (strict) partial order over solutions for any choice of f, B, and d.
Proof.
Transitivity: Suppose u ⊵ v and v ⊵ w. Then, f(u) − f(w) = (f(u) − f(v)) + (f(v) − f(w)) > d(B(u), B(v)) + d(B(v), B(w)) ≥ d(B(u), B(w)) by the triangle inequality, so u ⊵ w. Irreflexivity and asymmetry are similarly straightforward to show. ∎
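In code, the domination effect and the induced relation are one-liners (assuming the difference-minus-distance form of the domination effect; argument names are illustrative):

```python
def domination_effect(f_v, f_w, behavior_dist):
    # e(v, w) = (f(v) - f(w)) - d(B(v), B(w))
    return (f_v - f_w) - behavior_dist

def dominates(f_v, f_w, behavior_dist):
    # v dominates w iff the domination effect of v on w is positive.
    return domination_effect(f_v, f_w, behavior_dist) > 0
```

For behaviors at 0, 1, and 2 with fitnesses 5, 3, and 1 and d = | · |, the first solution dominates the second, the second dominates the third, and, as the theorem guarantees, the first dominates the third; the relation is also asymmetric.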
The partial order defined by behavior domination is similar to the one defined by Pareto dominance in multi-objective optimization. Note that, even though they make use of a notion of Pareto dominance, neither NSGA-NF nor NSLC has the property of a stable partial ordering of solutions, because the novelty objective fluctuates as the population changes over time. On the other hand, the front induced by behavior domination can be viewed geometrically as a rotation of a Pareto front (Figure 1). Algorithms based on behavior domination can then more easily inherit properties from multi-objective optimization, e.g., guarantees that the non-dominated front dominates every point ever generated and all area dominated by any point ever generated, and guarantees regarding near-optimal distribution of non-dominated solutions (Laumanns et al., 2002; Deb et al., 2016; Coello et al., 2007). The practical expectation is that the utility of non-dominated solutions as stepping stones in multi-objective optimization will transfer to the case of behavior domination. An algorithm based on this connection to multi-objective optimization is introduced in Section 3.4.
Although aggregation and domination are the most prevalent approaches to ranking, the definition of a behavior domination algorithm does not preclude the existence of other schemes that use a domination effect function.
Definition 3.7 (BDMA).
Every algorithm whose ranking mechanism’s dependence on f and B can be defined in terms of a domination effect function is a behavior domination algorithm (BDMA).
Behavior domination algorithms can avoid “spooky action at a distance” (Section 3.1) by using a domination-based ranking scheme. When ranking decisions are made only with respect to the ⊵ operator, moving a solution away from a non-dominated solution w cannot cause w to become dominated. For example, see the representation of MAP-Elites in the next section (Observation 3.10).
3.3. BDMA Representation of Existing Algorithms
The next three observations demonstrate how the behavior domination framework can be used to represent existing algorithms. Such observations are helpful in clarifying the space of BDMAs.
Observation 3.8 (Fitness-based search is a BDMA).
Since fitness-based search does not make use of behavior, this can be achieved by setting B to be the trivial behavior characterization B(x) = 0 for all x. Then, ⊵ (Definition 3.5) induces the same ordering as sorting fitness scores directly.
Observation 3.9 (Novelty search is a BDMA).
This is another trivial case. Since novelty search does not make use of the fitness function, this is similarly achieved by choosing f(x) = 0 for all x, and using the usual novelty search aggregation scoring for ranking solutions.
Observation 3.10 (MAP-Elites is a BDMA).
Consider an instance of MAP-Elites with fitness function f, behavior characterization B, and binning function h that maps each behavior to its bin. Choose e as in Definition 3.4, and define d by

d(b1, b2) = 0 if h(b1) = h(b2), and d(b1, b2) = ∞ otherwise

(strictly speaking, this d is an extended pseudometric rather than a metric). Then, the non-dominated solutions under ⊵ are exactly the elites maintained by the original MAP-Elites algorithm.
3.4. A Non-dominated Sorting BDMA: BDMA2
Given a fitness function f and a behavior characterization B, here the domination effect function is parameterized completely by the choice of behavior distance d. A new algorithm, BDMA2, is defined with a scaled L2 distance metric:

d_θ(b1, b2) = θ ‖b1 − b2‖2, with θ > 0.

The inclusion of the scaling parameter θ is useful for flexibility in relating fitness and behavior distance numerically. Increasing θ increases the emphasis on novelty; decreasing it increases the emphasis on fitness. Figure 3 depicts an instance of a ranking step in BDMA2, including the induced domination structure, taken from the experiments in Section 4.1.
Now that a suitable behavior distance is defined, a fast non-dominated sort (as in NSGA-II (Deb et al., 2002)) is used to rank the solutions, based on the ⊵ operator induced by d_θ. In contrast to the distance function used by MAP-Elites (Obs. 3.10), the L2 distance allows the flexible discovery of the locations of an efficient set of stepping stones, as opposed to having their bounded locations determined beforehand. The expectation is that the success of the non-dominated front in NSGA-II in providing useful stepping stones for multi-objective optimization will transfer to this case of behavior domination. Similar to a previous behavior-driven tie-breaking approach (Hodjat et al., 2016), ties are broken on the final front from which solutions must be kept by iteratively excluding the less fit of the two nearest solutions on that front, until the desired number of solutions remains.
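The core ranking step can be sketched as successive peeling of non-dominated fronts under ⊵ with the scaled L2 distance (an O(n²)-per-front equivalent of fast non-dominated sorting; names are illustrative, and tie-breaking on the final front is omitted):

```python
import numpy as np

def bd_dominates(f, B, i, j, theta):
    # i dominates j iff f(i) - f(j) > theta * ||B(i) - B(j)||_2.
    return f[i] - f[j] > theta * np.linalg.norm(B[i] - B[j])

def bdma2_fronts(behaviors, fitnesses, theta=1.0):
    B = np.asarray(behaviors, dtype=float)
    f = np.asarray(fitnesses, dtype=float)
    remaining, fronts = set(range(len(f))), []
    while remaining:
        front = {i for i in remaining
                 if not any(bd_dominates(f, B, j, i, theta)
                            for j in remaining if j != i)}
        fronts.append(sorted(front))
        remaining -= front
    return fronts
```

Raising theta weakens domination and enlarges the first front (more emphasis on novelty); lowering it strengthens domination (more emphasis on fitness).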
Specifying the number of top solutions to select via the fast non-dominated sort can be viewed as specifying the number of stepping stones to be maintained during search. To preserve the efficient exploration capabilities of novelty search while maintaining useful stepping stones, it is useful to have a subset of the population selected as stepping stones, and the remainder selected by novelty alone. Specifying the number of stepping stones in the population is an intuitive parameterization that can be informed by domain knowledge as well as time and space requirements.
On the other hand, it may take significant experimenter effort and domain knowledge to set an effective θ. Conveniently, the definition of behavior domination can be used to develop a suitable scheme for automatically setting θ online during search. It is straightforward to encode rules so that θ is set to guarantee the domination or non-domination of some set of solutions considered harmful or desirable, respectively. In the experiments in this paper, an example of such an online adaptation scheme is considered, inspired by the avoidance of “spooky action at a distance” (Section 3.1). In this scheme, at every iteration θ is set to the minimal value such that neither of the two behaviorally most distant solutions in the population is dominated. This online adaptation scheme (BDMA2a) is compared against setting a static θ in Section 4. Though it is an intuitive heuristic, setting θ online in this fashion does not necessarily preserve the guarantees of using a fixed domination effect function. Development of more grounded approaches to adapting θ is left to future work.
4. Experimental Investigation
Experiments were run in domains that extend limited-capacity drift models, previously used to study novelty search (Lehman and Stanley, 2013; Lehman and Miikkulainen, 2015), with fitness and a continuous solution space. Each solution is encoded by a real-valued vector, and the population is randomly initialized within a small subregion of the solution space. This abstraction captures the property of real-world domains that often only a small portion of the behavior space can be reached by randomly generated solutions, e.g., robots that either spin in place or crash into the nearest wall; evolution must accumulate structure in its solutions to progress beyond this initial space. The first set of experiments tests the ability to discover and maintain available stepping stones; the second tests the ability to perform well in settings where effective use of stepping stones can accelerate evolutionary progress.
The underlying evolutionary algorithm for each experimental setup is a steady-state algorithm with Gaussian mutation and uniform crossover. The only difference between setups in a domain is the method of ranking solutions. See the Appendix for experimental parameter settings. In each domain, the performance measures for each algorithm were averaged over ten runs.
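The shared experimental scaffold can be sketched as follows; only the ranking step differs between the compared algorithms (parameter values here are placeholders, not the settings from the Appendix):

```python
import random

def steady_state_ea(init, rank_worst, pop_size=50, iterations=1000,
                    sigma=0.1, swap_prob=0.5):
    # Steady-state loop: each iteration creates one child by uniform
    # crossover plus Gaussian mutation, then deletes the lowest-ranked
    # solution according to the algorithm-specific `rank_worst`.
    pop = [init() for _ in range(pop_size)]
    for _ in range(iterations):
        p1, p2 = random.sample(pop, 2)
        child = [a if random.random() < swap_prob else b
                 for a, b in zip(p1, p2)]                      # uniform crossover
        child = [g + random.gauss(0.0, sigma) for g in child]  # Gaussian mutation
        pop.append(child)
        del pop[rank_worst(pop)]
    return pop
```

Plugging in a fitness-based rank_worst recovers fitness-based search; a novelty- or domination-based rank_worst yields the other setups.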
4.1. Discovering and Maintaining Stepping Stones
The first domain has a one-dimensional solution space. The fitness landscape has four peaks of differing heights, with the rightmost peak being the highest (Figure 3). The behavior characterization is the identity function, i.e., B(x) = x. Each peak represents a potentially useful stepping stone, with the higher peaks having more potential. In an optimal state, a population will include solutions near the top of each peak. This domain tests an algorithm’s ability to grow its solutions to successfully discover each peak while maintaining in the active population potentially useful stepping stones encountered along the way.
Consider four bins in the behavior space, each of width 10 and centered around a peak. Each algorithm is evaluated against two MAP-Elites-based measures (Cully et al., 2015; Pugh et al., 2015). The first is the sum of the top fitnesses ever achieved across the bins; this measures an algorithm’s ability to discover stepping stones. The second is the sum of the top fitnesses of these bins in the current population; this measures an algorithm’s ability to maintain stepping stones. The results are depicted in Figure 4.
As expected, novelty search is able to discover the available stepping stones most quickly, since it is focused only on exploration. However, BDMA2 is not far behind, followed by NSLC and BDMA2a. When it comes to maintaining these stepping stones, BDMA2 outperforms the other algorithms, again followed closely by NSLC and BDMA2a. Note that although MAP-Elites maintains the elites in each visited bin, when the bin size is large it is difficult to jump to new bins, and when it is small the chance of selecting an elite on the edge as a parent is small. So, MAP-Elites explores slowly in this domain (results shown with bin size 1).
Figure 5 shows examples of values of θ adapted over the course of BDMA2a runs. Future schemes for adapting θ may try to minimize fluctuations for better predictability (Section 5).
4.2. Harnessing Stepping Stones
The most successful algorithms at discovering and maintaining stepping stones (NSLC, BDMA2, and BDMA2a), along with Novelty and Fitness as controls, were evaluated in two further domains, which test the abilities of algorithms to exploit available stepping stones by focusing on the most promising areas of the search space.
4.2.1. Exponential Focus (ETF) Domain
The ETF domain captures the notion that real-world domains contain complementary stepping stones, which, if harnessed successfully, can accelerate progress in a way not possible otherwise. This domain has a two-dimensional solution space, and the fitness function contains stepping stones that can enable exponential progress if used effectively.
The fitness landscape consists of a series of claw-like regions that increase in size and value as they get farther away from the origin; all other areas have fitness zero (Figure 6 (top)).
The heel of the first claw is located near the origin and has fitness 1. Each claw has a heel, and three toes of fixed width. Fitness increases linearly along each toe. The tips of the vertical and horizontal toes have higher fitness than the heel, and the tip of the diagonal toe has higher fitness still. The heel of the next claw has higher fitness than any toe of the current claw, and can be reached by a successful crossover of solutions at the tips of the vertical and horizontal toes. Thus, an algorithm can reach the next claw by maintaining solutions on the tips of both the horizontal and vertical toes, while avoiding convergence to the deceptive diagonal toe.
The behavior characterization linearly scales the first dimension of the solution by a factor s, i.e., s controls how much the first dimension of the behavior space is stretched. As s increases, it is more costly for an algorithm to densely explore the entire behavior space. Experiments were run with three increasing values of s.
Since the purpose of this domain is to evaluate how well an algorithm can use stepping stones to discover high-performing solutions, algorithms are compared based on the maximum fitness achieved by each iteration. Results are shown in Figure 6 (bottom) and Table 1 (a).
      Fitness      Novelty      NSLC         BDMA2         BDMA2a
s1    2.55 (0.28)  6.49 (0.77)  6.02 (1.29)  22.41 (5.32)  11.76 (0.85)
s2    2.55 (0.28)  9.59 (1.74)  6.31 (1.26)  14.79 (2.63)  14.16 (1.33)
s3    2.55 (0.28)  9.36 (1.68)  6.13 (0.98)   9.57 (2.04)  15.68 (1.71)

(a) Mean max fitness (std. err.) in the ETF domain, for three settings s1, s2, s3 of the behavior scale s.

 n    Fitness       Novelty       NSLC          BDMA2         BDMA2a
10    2.708 (0.00)  2.846 (0.09)  2.823 (0.05)  3.023 (0.09)  3.010 (0.10)
20    2.708 (0.00)  2.678 (0.01)  2.748 (0.02)  2.898 (0.05)  2.791 (0.05)
30    2.708 (0.00)  2.682 (0.01)  2.705 (0.00)  2.791 (0.02)  2.711 (0.02)

(b) Mean max fitness (std. err.) in the focused Ackley domain, for behavior space dimensionality n.
BDMA2a significantly outperforms each existing algorithm for each value of s (Mann–Whitney U test), with BDMA2 showing dramatic improvements as well.
4.2.2. Focused Ackley Domain
The results in the ETF domain demonstrate that BDMA2 can be successful in domains that contain natural stepping stones. To further validate this idea, experiments were run in a domain based on the popular Ackley benchmark function (Ackley, 1987; Bäck et al., 1997), which also has an inherent stepping-stone structure. The search space is n-dimensional. If a solution falls within a bounded region around the origin, its fitness is determined by the value of the Ackley function at that point; otherwise, its fitness is drawn randomly from a fixed distribution (Figure 7 (top)).
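For reference, the standard d-dimensional Ackley function can be written as below; note the fitness actually used in the domain may be a shifted or negated variant of this value, since the function is conventionally minimized, with its global optimum at the origin surrounded by a regular grid of local optima (the stepping-stone structure referred to above):

```python
import math

def ackley(x):
    # Standard Ackley benchmark (Ackley, 1987): 0 at the origin, positive
    # elsewhere, with many regularly spaced local optima.
    d = len(x)
    sq = sum(v * v for v in x) / d
    cs = sum(math.cos(2.0 * math.pi * v) for v in x) / d
    return -20.0 * math.exp(-0.2 * math.sqrt(sq)) - math.exp(cs) + 20.0 + math.e
```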
In this domain, the behavior space coincides with the solution space, and scale is controlled by the number of dimensions n of the behavior space. The noise outside of the bounded region is a challenge for algorithms that must decide which regions are worth exploring. The results in Figure 7 (bottom) and Table 1 (b) show how BDMA2 and BDMA2a improve upon existing approaches. BDMA2 significantly outperforms each existing algorithm for each value of n (Mann–Whitney U test), except for Fitness at the highest dimensionality, as each approach that makes use of the behavior characterization is negatively affected by increases in the dimensionality of the behavior space.
Still, the success of BDMA2 and BDMA2a in these domains that contain useful stepping stones is encouraging evidence for the potential to scale behavior domination algorithms to more complex domains, where it is assumed that such stepping stones exist.
5. Discussion and Future Work
The existing algorithms classified under behavior domination (Section 3.3) have been validated across an array of complex domains (Bäck et al., 1997; Lehman and Stanley, 2010, 2011a; Cully et al., 2015; Mouret and Clune, 2015). The experiments in Section 4 demonstrate that the behavior domination framework can lead to progress over existing approaches on problems that contain useful stepping stones, and it will be interesting to see what new methods will be required to scale these methods to the real world, where stepping stones abound, e.g., in domains such as robot control (Lehman and Stanley, 2011a; Cully et al., 2015; Mouret and Doncieux, 2012) and automatic content generation (Lehman and Stanley, 2011b; Nguyen et al., 2016; Lehman et al., 2016).
Effective specification of behavior is still an issue. Experiments in the ETF domain (Section 4.2.1) showed how behavior-driven algorithms can be sensitive even to linear scaling of the behavior space. Although BDMA2 and BDMA2a outperformed the other approaches in this scenario, their reliance on a single parameter θ across all behavior dimensions makes them susceptible to such issues. From the perspective of behavior domination, solutions to these issues can be hidden in the behavior characterization, i.e., by letting B be some transformation of the raw behavior characterization. Automatically specifying behavior characterizations in a robust and general way is an open problem, and some recent work has begun to make progress in this direction (Meyerson et al., 2016; Nguyen et al., 2016; Liapis et al., 2013; Gomes et al., 2014).
Given a reasonable behavior characterization, one method of setting this parameter automatically was presented in Section 3.4, but there are many methods that could be tried, some of which may be more generally effective and preserve stability properties of the behavior domination front. Overall, more work can be done to transfer guarantees from the theory of multi-objective optimization (Deb et al., 2016; Coello et al., 2007), which will also lead to practical algorithmic improvements.
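To make the notion of a behavior domination front concrete, the following sketch instantiates the intuition from the introduction: one solution dominates a weaker one when its fitness advantage outweighs a parameter times their behavioral distance. This is one plausible formalization under stated assumptions, not necessarily the exact definition used in the paper; the names `dominates`, `domination_front`, and `theta` are illustrative.

```python
def dominates(fa, ba, fb, bb, theta, dist):
    """Solution a behavior-dominates solution b when a's fitness
    advantage exceeds theta times their behavioral distance.
    theta controls how far the negative effect of a strong
    solution reaches in behavior space."""
    return fa - fb > theta * dist(ba, bb)

def domination_front(solutions, theta, dist):
    """Return the solutions not behavior-dominated by any other.

    solutions: list of (fitness, behavior) pairs.
    dist: distance function over behaviors.
    """
    front = []
    for i, (fi, bi) in enumerate(solutions):
        if not any(dominates(fj, bj, fi, bi, theta, dist)
                   for j, (fj, bj) in enumerate(solutions) if j != i):
            front.append((fi, bi))
    return front
```

Under this formalization, a behaviorally distant solution survives even with much lower fitness, while a behaviorally nearby one must be nearly as fit, which is one way the "spooky action at a distance" of global competition can be avoided.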
Although transferring theoretical properties can be satisfying, further work is needed to understand where theoretical focus in behavior-driven search will yield the biggest practical impact. The issue of "spooky action at a distance" (Section 3.1) identifies some unsettling dynamics in existing algorithms, but it is not clear whether it strikes at the heart of the matter, or is merely a shadow of something more elusive. Further work must be done to fully characterize the emergent dynamics of ranking procedures, in parallel with work to understand how careful specification of a behavior characterization and fitness function can guarantee the existence of useful stepping stones in the joint behavior-fitness space.
6. Conclusion
The goal of this study was to understand and harness the ability of evolution to discover useful stepping stones. Existing behavior-driven algorithms have properties that interfere with this goal; the behavior domination framework was introduced to reason formally about how these properties could be avoided. A new algorithm, BDMA2, was introduced based on this framework, and shown to improve over existing behavior-driven algorithms in domains that contain useful stepping stones. The behavior domination perspective is thus a promising tool for comparing and understanding existing behavior-driven algorithms as well as for designing better ones in the future.
References
 Ackley (1987) D. H. Ackley. 1987. A connectionist machine for genetic hillclimbing. Kluwer, Norwell, MA.
 Bäck et al. (1997) T. Bäck, D. B. Fogel, and Z. Michalewicz. 1997. Handbook of evolutionary computation. Oxford, New York.
 Bowren et al. (2016) J. A. Bowren, J. K. Pugh, and K. O. Stanley. 2016. Fully Autonomous Real-Time Autoencoder-Augmented Hebbian Learning through the Collection of Novel Experiences. In Proc. of ALIFE. 382–389.
 Coello et al. (2007) C. A. C. Coello, G. B. Lamont, and D. A. Van Veldhuizen. 2007. Evolutionary algorithms for solving multi-objective problems. Vol. 5. Springer.
 Cuccu and Gomez (2011) G. Cuccu and F. Gomez. 2011. When Novelty is Not Enough. In EvoStar. 234–243.
 Cully et al. (2015) A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret. 2015. Robots that can adapt like animals. Nature 521, 7553 (2015), 503–507.
 Deb et al. (2002) K. Deb, A. Pratap, S. Agarwal, and T. A. Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. on Evolutionary Computation 6, 2 (2002), 182–197.
 Deb et al. (2016) K. Deb, K. Sindhya, and J. Hakanen. 2016. Multi-objective optimization. In Decision Sciences: Theory and Practice. 145–184.
 Gomes and Christensen (2009) J. Gomes and A. L. Christensen. 2009. Generic behaviour similarity measures for evolutionary swarm robotics. In Proc. of GECCO. 199–120.
 Gomes et al. (2014) J. Gomes, P. Mariano, and A. L. Christensen. 2014. Systematic Derivation of Behaviour Characterisations in Evolutionary Robotics. CoRR abs/1407.0577 (2014).
 Gomes et al. (2015) J. Gomes, P. Mariano, and A. L. Christensen. 2015. Devising effective novelty search algorithms: A comprehensive empirical study. In Proc. of GECCO.
 Gomez (2009) F. J. Gomez. 2009. Sustaining diversity using behavioral information distance. In Proc. of GECCO. 113–120.
 Hodjat et al. (2016) B. Hodjat, H. Shahrzad, and R. Miikkulainen. 2016. Distributed AgeLayered Novelty Search. In Proc. of ALIFE. 131–138.
 Laumanns et al. (2002) M. Laumanns, L. Thiele, K. Deb, and E. Zitzler. 2002. Combining convergence and diversity in evolutionary multiobjective optimization. Evolutionary Computation 10, 3 (2002), 263–282.
 Lehman and Miikkulainen (2015) J. Lehman and R. Miikkulainen. 2015. Extinction Events Can Accelerate Evolution. PloS one 10, 8 (2015).
 Lehman et al. (2016) J. Lehman, S. Risi, and J. Clune. 2016. Creative Generation of 3D Objects with Deep Learning and Innovation Engines. In Proc. of ICCC. 180–187.
 Lehman and Stanley (2008) J. Lehman and K. O. Stanley. 2008. Exploiting Open-Endedness to Solve Problems Through the Search for Novelty. In Proc. of ALIFE. 329–336.
 Lehman and Stanley (2010) J. Lehman and K. O. Stanley. 2010. Efficiently evolving programs through the search for novelty. In Proc. of GECCO. 836–844.
 Lehman and Stanley (2011a) J. Lehman and K. O. Stanley. 2011a. Abandoning objectives: Evolution through the search for novelty alone. Evolutionary Computation 19, 2 (2011), 189–223.
 Lehman and Stanley (2011b) J. Lehman and K. O. Stanley. 2011b. Evolving a diversity of virtual creatures through novelty search and local competition. In Proc. of GECCO. 211–218.
 Lehman and Stanley (2012) J. Lehman and K. O. Stanley. 2012. Beyond open-endedness: Quantifying impressiveness. In Proc. of ALIFE. 75–82.
 Lehman and Stanley (2013) J. Lehman and K. O. Stanley. 2013. Evolvability is inevitable: Increasing evolvability without the pressure to adapt. PloS one 8, 5 (2013).
 Liapis et al. (2013) A. Liapis, H. P. Martínez, J. Togelius, and G. N. Yannakakis. 2013. Transforming exploratory creativity with DeLeNoX. In Proc. of ICCC. 56–63.
 Meyerson et al. (2016) E. Meyerson, J. Lehman, and R. Miikkulainen. 2016. Learning behavior characterizations for novelty search. In Proc. of GECCO. 149–156.
 Mouret and Clune (2015) J.-B. Mouret and J. Clune. 2015. Illuminating search spaces by mapping elites. CoRR abs/1504.04909 (2015).
 Mouret and Doncieux (2009) J.-B. Mouret and S. Doncieux. 2009. Using behavioral exploration objectives to solve deceptive problems in neuroevolution. In Proc. of GECCO. 627–634.
 Mouret and Doncieux (2012) J.-B. Mouret and S. Doncieux. 2012. Encouraging behavioral diversity in evolutionary robotics: An empirical study. Evolutionary Comp. 20, 1 (2012), 91–133.
 Nguyen et al. (2015) A. Nguyen, J. Yosinski, and J. Clune. 2015. Innovation engines: Automated creativity and improved stochastic optimization via deep learning. In Proc. of GECCO. 959–966.
 Nguyen et al. (2016) A. Nguyen, J. Yosinski, and J. Clune. 2016. Understanding innovation engines: Automated creativity and improved stochastic optimization via deep learning. Evolutionary Computation 24, 3 (2016), 545–572.
 Preuss et al. (2014) M. Preuss, A. Liapis, and J. Togelius. 2014. Searching for good and diverse game levels. In Proc. of CIG. 1–8.
 Pugh et al. (2015) J. K. Pugh, L. B. Soros, P. A. Szerlip, and K. O. Stanley. 2015. Confronting the Challenge of Quality Diversity. In Proc. of GECCO. 967–974.