1 Introduction
With the advent of 3D printing and generative design, a new goal in optimization is emerging. Having the option of choosing from different solutions that are good enough to fulfill a task can be more effective than being guided by singlesolution algorithms. The optimization field should aim to understand how to solve a problem in different ways.
Three major paradigms for multisolution optimization exist. The major difference between multiobjective optimization (MOO), multimodal optimization (MMO) and quality diversity (QD) is the context in which solution diversity is maintained. In MOO the goal is to find the Pareto set, which represents the tradeoffs between multiple criteria. MMO finds solutions that cover the search space as well as possible. QD finds combinations of phenotypic features to maximize the variation in solutions’ expressed shape or behavior  a new focus in evolutionary optimization [17].
We analyze the diversity of solution sets in the three paradigms and introduce a new niching method that allows comparing genetic and phenotypic diversity (Section 2). State of the art diversity metrics (Section 3) are used in a new problem domain (Section 4) to evaluate all paradigms (Section 5) after which we make recommendations when to use which approach (Section 6).
2 Diversity in Optimization
The intuitive understanding of diversity assumes that there are more ways to “do” or to “be” something and involves the concepts of dissimilarity and distance. Evidence can be found in the large number of approaches and metrics, and the lack of agreement in when to use which one. This section gives an overview over three paradigms that have arisen in the last decades.
Finding solutions that are diverse with respect to objective space has been a paradigm since the 1970s. Multiobjective optimization tries to discover the Pareto set of tradeoff solutions with respect to two or more objectives. The method has no control over the diversity of genomes or their expression other than the expectation that tradeoffs require different solutions. The most successful method is the Nondominated Sorting Genetic Algorithm (NSGAII)
[5].The first ideas to use genetic diversity in optimization were not used to find different solutions, but to deal with premature convergence to local optima. The concept of niching was integrated into evolutionary optimization by introducing sharing and crowding [8, 6]. In the 1990s, multilocal or multimodal optimization came into focus. This paradigm has the explicit goal to find a diverse set of high quality locations in the search space, based on a single criterion. Various algorithms have been introduced, like basin hopping [26], topographical selection [23], nearestbetter clustering [16] and restarted local search (RLS) [15].
The introduction of novelty search [11] led to studying the search for novel, nonoptimal solutions. QD, reintroducing objectives [2, 12], finds a diverse set of high quality optimizers by performing niching in phenotypic space. In applications for developing artificial creatures and robot controller morphologies [2, 12], QD only allows solutions that belong the same phenotypic niche to compete. To this end it keeps track of an archive of niches. Solutions are added to the archive if their phenotype is novel enough or better than that of a similar solution.
This work does not aim at giving an exhaustive overview over all methods, for which we refer to some of the many survey papers [3, 1, 15, 21, 22, 27]. We consciously choose not to talk about methods that combine ideas from the three paradigms, but rather compare the three paradigms in their “purest” form.
2.1 Niching with Voronoi Tessellation
To remove variations in the search dynamics when comparing different algorithms, we introduce a niching variant using ideas from Novelty Search with Local Competition (NSLC) [12] and CVTElites [25]. VoronoiElites (VE) accepts all new solutions until the maximum number of archive bins is surpassed (Alg. 1). Then the pair of elites that are phenotypically closest to each other are compared, rejecting the worstperforming. An example archive is shown in Fig. 6 at step five). By locating selection pressure on the closest solutions, VE tries to equalize the distances between individuals. The generators of the Voronoi cells do not have to coincide with the centroids, like in CVTElites, and the boundaries of the archive are not fixed. VE can be used to compare archive spaces of different dimensionality. When the genetic parameters are used as archive dimensions, VE behaves like an MMO algorithm by performing niching in genetic space. When we use phenotypic descriptors, VE behaves like a QD algorithm.
2.2 Related Work
A number of survey and analysis articles have appeared in the last decade. In [1] a taxonomy for diversity in optimization was introduced. [28] investigates how genetically diverse solution sets in MOO are found and shows that quality indicators used in MOO can be applied to MMO. [24] compares two algorithms from MMO to two QD algorithms in a robotics task, showing that clearing’s performance can be comparable to that of QD. Finally, [13] discusses 100 solution set quality indicators in MOO and [22] discusses diversity indicators for MOO.
3 Metrics
From the large number of diversity metrics available we only consider metrics that do not depend on precise domain knowledge, because no knowledge about actual local optima is available in real world applications. Three commonly used distancebased metrics are selected to evaluate the experiments in this work. The Sum of Distances to Nearest Neighbor (SDNN) measures the size of a solution set as well as the dispersion between members of that set. SolowPolasky Diversity (SPD) measures the effective number of species by using pairwise distances between the species in the set [20]. If the solutions are similar with respect to each other, SPD tends to 1, otherwise to . The sensitive parameter , which determines how fast a population tends to with increasing distance, needs to be parameterized for every domain. It is set to 1 for genetic distances and to 100 for phenotypic distances in this work. Pure Diversity (PD) is used in highdimensional manyobjective optimization [21, 27]. It does not have parameters, which makes it more robust, and depends on a dissimilarity measure (norm).
Publications in the field of QD have focus on a small number of metrics. The total fitness is used directly or through the QDscore [18], which calculates the total fitness of all filled niches in a phenotypic archive. To achieve this, the solutions from a nonQD algorithm are projected into a fixed phenotypic niching space. This score is domaindependent and does not allow comparing QD algorithms that have different archiving methods. A comparison between archives created from different features introduces a bias towards one of the archives. The collection size indicates the proportion of the niching space that is covered by the collection, but again can only be used on a reference archive [3]. Archivedependent metrics do not generalize well and introduce biases. We therefore only use distancebased diversity metrics. The high dimensionality of phenotypic spaces is taken into account by using appropriate distance norms.
4 Polygon Domain
We construct a domain of free form deformed, eightsided polygons. The genome (Fig. 1a) consists of 16 parameters controlling the polar coordinate deviation of the polygon control points. The first eight genes determine the deviation of the radius of the polygon’s control points, the second eight genes determine their angular deviation. Since the phenotypes can be expressed as binary bitmap images (Figs. 1b and 1c, resolution of 64x64 pixels) we use the Hamming distance in the diversity metrics to circumvent the problem of high dimensionality [7].
Three aspects describing the polygons are defined that can be used either as criteria or as features (Fig. 1d): the area of the polygon , its circumference and point symmetry through the center. The polygon is sampled at equidistant locations on the polygon circumference. The symmetry error is calculated as the sum of distances of all opposing sampling locations. The symmetry metric is calculated as shown in Eq. 1.
(1) 
5 Evaluation
We ask which paradigm (objective space, search space or phenotype space) provides the highest phenotypic diversity of shapes. We compare VE, RLS and NSGAII in multiple experiments. Throughout these experiments we fix the number of function evaluations and solutions and use five replicates per configuration. In NSGAII the features are used as optimization criteria, maximizing and minimizing
. The true Pareto set consists of circles with varying sizes. The number of generations is set to 1024 and mutation strength to 10% of the parameter range. The probability of crossover for NSGAII is 90% and probability of mutation
%, with degrees of freedom. VE’s archive size is varied throughout the experiments. The number of children and population size is set to the same value. RLS uses as many restarts as the size of the VE archive, the step size is set to (after a small parameter sweep) and LBFGSB is used as a local search method (within the bounds of the domain). The initial solution set for VE and NSGAII is created with a Sobol sequence  the initial RLS solution is in the center of the parameter range but RLS’ space filling character assures a good search space coverage.5.1 Genetic or Phenotypic Diversity
Biology has inspired evolutionary optimization to compose a solution of a genome, its encoding, and a phenotype, its expression. The phenotype often is a very highdimensional object, for example a highresolution 2D image, and can involve the interaction with the environment. Since the phenotypic space is usually too large, a lowdimensional representation, the genome, is used as search space. An expression function is constructed that turns a genome into its phenotype. Although the expression function should ideally be a bijective mapping, it often does not prevent multiple genomes to be mapped to the same phenotype. The phenomenon of such a surjective mapping is called genetic neutrality, which is not the same but akin to genetic neutrality in biology. In biology, a neutral mutation is understood to be a mutation that has no effect on the survivability of a life form. In evolutionary computation, genetic neutrality is referred to as genetic variants that have the same phenotype [9].
Figure 2(a) shows an example polygon. If the angle equals 0°or 45°, phenotypically speaking, these shapes are the same. In this case, eight genomes all point to the same phenotype. Similarly, Figure 2(b) shows how, through translations of the keypoints, a similar shape can appear based on different genomes. We postulate the first hypothesis: diversity maintenance in a neutral, surjective genetic space leads to lower phenotypic diversity than when using phenotypic niching.
While diversity is often thought about in terms of the distribution of points in the search space, we make a case to measure diversity in phenotypic space, which is independent of the encoding and does not suffer from the effects of genetic neutrality. Phenotypes may also include other factors that are not embodied within the solution’s shape itself, but emerge through interaction with the environment. This is taken advantage of in several publications on neuroevolution [11, 12]. In this work we only analyse the narrow interpretation of phenotypes, which does not include behavior.
The Voronoi tessellation used in VE makes it easy to compare archives of different dimensionality by fixing the number of niches. We apply VE as an MMO algorithm, performing niching in 16dimensional genetic space, and as a QD algorithm with a twodimensional phenotypic space. The number of bins is increased to evaluate when differences between genetic and phenotypic VE appear (Fig. 3). At 25 solutions, the approaches produce about the same diversity, but genetic VE finds higher quality solutions. As the number of bins is increased, based on where niching is performed (genetic or phenotypic space), the diversity in that space becomes higher. Phenotypic VE beats genetic VE in terms of phenotypic diversity, which gives us evidence that the first hypothesis is valid. At the same time, the average fitness values of genetic VE are higher than that of phenotypic VE, although the difference gets lower towards 400 solutions.
case  axial min.  axial max.  radial min.  radial max.  neutrality 

A  0  1  0.05  0.05   
B  0  1  0.125  0.125  + 
C  0.25  1  0.25  0.25  ++ 
D  0.5  1  0.5  0.5  +++ 
E  1  1  1  1  ++++ 
We compare phenotypic VE to NSGAII and RLS. When we bound between and and between , we can minimize genetic neutrality. Neutrality is increased by expanding those bounds (Table 1). In contrast to VE, the phenotypic diversity of RLS’ solutions is expected to decrease as genetic neutrality increases. Since there is no mechanism to distinguish between similar shapes with different genomes, there is an increasing probability that RLS finds similar solutions. We expect that the solution set produced by RLS due to its space filling character is more diverse than using NSGAII.
Finally, it can make more sense to treat objectives as features and, instead of searching for the Pareto set, allowing all combinations of features and increasing the diversity of the solution set. We expect NSGAII to easily find the Pareto set, which consists of circles of various scales, maximizing the area while minimizing the length of the circumference, while QD should find a variety of shapes that can be any combination of large and small and . We postulate the second hypothesis: allowing all criteria combinations, instead of using a Pareto approach, leads to higher diversity, while still approximating the Pareto set.
The number of solutions is set to 400. A result similar to Fig. 3 appears for the standard algorithms in Fig. 4. Phenotypic diversity is highest for VE, especially after the genetic neutrality threshold is crossed (at B). Diversity of NSGAII is lowest, as is expected for this setup. Although diversity of VE is higher than that of RLS, the latter’s solutions are all maximally symmetric (see fitness plots), making RLS much more appropriate when quality is more important than diversity. These results confirm the first part of the second hypothesis.
The Pareto set can be calculated a priori, as we know that circular shapes maximize area while minimizing circumference. The members of the Pareto set adhere to the following genome: , where and have the same respective value. To create 100 shapes from the Pareto set we take ten equidistant values for and and combine them.
Part of the resulting Pareto set is shown in Fig. 5. The distance to the Pareto set is measured in phenotypic space, by measuring the smallest pixel error, the sum of pixelwise differences, between a solution and the Pareto set. We see that the a number of solutions in VE and RLS are close to the Pareto set (Fig. 5 bottom left). Example results with low and high neutrality are shown on the right. Solutions that are close to the Pareto set are shown in the brightest green color. This is evidence for the second half of the second hypothesis. VE again seems to be more robust w.r.t. genetic neutrality, as it finds more solutions close to the Pareto set in highneutrality domains (bottom row) than RLS.
5.2 Phenotypic Diversity without Domain Knowledge
Up to this point we have used domain knowledge to construct a phenotypic niching space with VE. Intuitively, the area and circumference seem like good indicators for phenotypic differences. But this comparison between QD and MMO is not completely fair, as the latter does not get any domain information. On the other hand, the features used in QD might not be the most diversifying.
We remove the domain knowledge from QD and construct a phenotypic niching space by using a well known dimensionality reduction technique to map the phenotypes to a latent space, as was done in [14, 4]. To our best knowledge, this data driven phenotypic niching approach, which we name AutoVoronoiElites (AutoVE), has never been applied to shape optimization. An initial set of genomes, drawn from a quasirandom, spacefilling Sobol sequence [19] and expressed into their phenotypes, is used to train a convolutional autoencoder (cAE) (see Fig. 6
). The bottleneck in the cAE is a compressed, latent space that assigns every phenotype to a coordinate tupel. The encoder predicts these coordinates of new shapes in the latent space, which are used as phenotypic features. QD searches phenotypes that expand and improve the cAE archive. The cAE is retrained with the new samples. The cAE consists of two convolutional layers in the encoder and four transposed convolutional layers in the decoder. We set the filter size to three pixels, the stride to two pixels, and the number of filters to eight. The cAE is trained using ADAM
[10]with a learning rate of 0.001 and 350 training epochs and a mean square error loss function. Latent coordinates are normalized between 0 and 1. The number of generations (1024) is divided over two iterations for AutoVE and the number of latent dimensions is set to two (to compare with manual VE), five or ten.
Fig. 7 shows that the twodimensional manual and autoencoded phenotypic space (AutoVE 2D) produce similar diversity, whereby the quality of solutions from AutoVE 2D is higher. The higherdimensional latent spaces increase the solution set diversity at the cost of fitness. This is to be expected, as lowerfitness optima are protected in their own niches. Finally, the diversity of higherdimensional AutoVE is around 50% higher than any of the other tested methods.
6 Conclusion
The main contributions of this work are as follows: a domain was introduced that allows comparing three different diversity paradigms; a case was made to measure diversity in phenotypic rather than genetic space; the hypothesis that QD is less sensitive to genetic neutrality than MMO was confirmed; the hypothesis that while the diversity of solutions sets of QD and RLS is higher than that of MOO, they also find some solutions close to the ground truth Pareto set, was confirmed; we showed that phenotypic diversity in QD is higher than MMO and MOO. Furthermore, we introduced VE, a simpler and selfexpanding version of QD. We also used an autoencoder to discover phenotypic features in a shape optimization problem, showing that we do not need to manually predefine features to get a highly diverse solution set, allowing us to fairly compare QD to MOO and MMO. Using an autoencoder produces higher diversity than manually defined features, making AutoVE a strong choice for high diversity multisolution optimization.
Since all paradigms have their strengths and weaknesses, we propose a guide for when to use which approach. MOO should be used when you want to optimize all the criteria and want to know the tradeoff solutions between those criteria. MMO is appropriate when you have a nonneutral bijective encoding, when you have a single criterion you want to optimize for or if you want to perform a gradientbased, QuasiNewton or (direct) evolutionary local search to refine local optima. We cannot easily do this in QD due to the effect of neutrality that allows a search to “jump out of” a phenotypic niche. QD should be used if you have some criteria where you are less determined about whether to optimize for them, for example during the first phase of a design process. Some representatives from the Pareto set will still be discovered. When you are interested in the largest diversity of solutions and are more willing to get some solutions with lower fitness than when using MMO, QD is the better alternative. One of the biggest strengths of QD is the possibility to understand relationships between features or even to discover features automatically.
Some research effort should be focused on hybridization. MOO and QD are connected, as the boundary of valid solutions in the phenotypic archive is close to the Pareto front, yet there is room for improvement. Connecting MMO and QD means to use a local search method in QD, which needs to overcome the genetic neutrality problem. We cannot search close to a solution in genetic space and expect newly created solutions to be close in phenotypic space.
We gave insights about different variations of diversity and when and where to apply them, depending on whether one is most interested in tradeoffs between criteria, increasing diversity while maximizing fitness, or maximizing diversity while finding highperforming solutions in a manually defined or automatically extracted phenotypic space. It is often easy to manually define two or three phenotypic descriptors, but human imagination can run out of options quickly. Automatic discovery of phenotypic features is a more attractive option for increasing solution diversity. Real world multisolution optimization and understanding solution diversity are important steps towards increasing the efficacy and efficiency at which engineers solve problems.
References
 [1] (2013) A survey of diversityoriented optimization. EVOLVE 2013A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computing, pp. 101–109. Cited by: §2.2, §2.
 [2] (2015) Robots that can adapt like animals. Nature 521 (7553), pp. 503–507. Cited by: §2.
 [3] (2017) Quality and diversity optimization: a unifying modular framework. IEEE Transactions on Evolutionary Computation 22 (2), pp. 245–259. Cited by: §2, §3.
 [4] (2019) Autonomous skill discovery with qualitydiversity and unsupervised descriptors. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 81–89. Cited by: §5.2.
 [5] (2002) A fast and elitist multiobjective genetic algorithm: nsgaii. IEEE transactions on evolutionary computation 6 (2), pp. 182–197. Cited by: §2.
 [6] (1975) Analysis of the behavior of a class of genetic adaptive systems. Dept. Computer and Communication Sciences, University of Michigan, Ann Arbor. Cited by: §2.
 [7] (1950) Error detecting and error correcting codes. The Bell system technical journal 29 (2), pp. 147–160. Cited by: §4.
 [8] (1975) Adaptation in natural and artificial systems. MIT press. Cited by: §2.

[9]
(2011)
Robustness, evolvability, and accessibility in linear genetic programming
. In European Conference on Genetic Programming, pp. 13–24. Cited by: §5.1.  [10] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §5.2.
 [11] (2011) Abandoning objectives: evolution through the search for novelty alone. Evolutionary computation 19 (2), pp. 189–223. Cited by: §2, §5.1.
 [12] (2011) Evolving a diversity of virtual creatures through novelty search and local competition. In Proceedings of the 13th annual conference on Genetic and evolutionary computation, pp. 211–218. Cited by: §2.1, §2, §5.1.
 [13] (2019) Quality evaluation of solution sets in multiobjective optimisation: a survey. ACM Computing Surveys (CSUR) 52 (2), pp. 1–38. Cited by: §2.2.
 [14] (2016) Learning behavior characterizations for novelty search. GECCO 2016  Proceedings of the 2016 Genetic and Evolutionary Computation Conference, pp. 149–156. External Links: ISBN 9781450342063 Cited by: §5.2.
 [15] (2012) Restarted local search algorithms for continuous black box optimization. Evolutionary Computation 20 (4), pp. 575–607. Cited by: §2, §2.
 [16] (2012) Improved topological niching for realvalued global optimization. In European Conference on the Applications of Evolutionary Computation, pp. 386–395. Cited by: §2.
 [17] (2016) Quality diversity: a new frontier for evolutionary computation. Frontiers in Robotics and AI 3, pp. 40. Cited by: §1.
 [18] (2015) Confronting the challenge of quality diversity. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp. 967–974. Cited by: §3.
 [19] (1967) On the distribution of points in a cube and the approximate evaluation of integrals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 7 (4), pp. 784–802. Cited by: §5.2.
 [20] (1994) Measuring biological diversity. Environmental and Ecological Statistics 1 (2), pp. 95–103. Cited by: §3.
 [21] (2017) PlatEMO: a matlab platform for evolutionary multiobjective optimization. IEEE Computational Intelligence Magazine 12 (4), pp. 73–87. Cited by: §2, §3.

[22]
(2019)
Diversity assessment of multiobjective evolutionary algorithms: performance metric and benchmark problems [research frontier]
. IEEE Computational Intelligence Magazine 14 (3), pp. 61–74. Cited by: §2.2, §2.  [23] (1992) Topographical global optimization. Recent advances in global optimization, pp. 384–398. Cited by: §2.
 [24] (2017) Comparing multimodal optimization and illumination. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 97–98. Cited by: §2.2.
 [25] (2017) Using centroidal voronoi tessellations to scale up the multidimensional archive of phenotypic elites algorithm. IEEE Transactions on Evolutionary Computation 22 (4), pp. 623–630. Cited by: §2.1.
 [26] (1997) Global optimization by basinhopping and the lowest energy structures of lennardjones clusters containing up to 110 atoms. The Journal of Physical Chemistry A 101 (28), pp. 5111–5116. Cited by: §2.
 [27] (2016) Diversity assessment in manyobjective optimization. IEEE transactions on cybernetics 47 (6), pp. 1510–1522. Cited by: §2, §3.
 [28] (2016) On multiobjective selection for multimodal optimization. Computational Optimization and Applications 63 (3), pp. 875–902. Cited by: §2.2.