1. Introduction
How solutions are represented is one of the most critical design decisions in optimization, as the representation defines the way an algorithm can move in the search space (rothlauf2006representations). Work on representations tends to focus on encoding priors or innate biases: aerodynamic designs evolved with splines to encourage smooth forms (olhofer2001adaptive), Compositional Pattern Producing Networks (CPPNs) which introduce biases for symmetry and repetition to produce images and neural network weight patterns (cppn; hyperneat), or encodings which aim to encourage modularity in neural networks (mouret2008mennag; durr2010genetic; doncieux2004evolving).
The best representations balance a bias for high-performing solutions, so they can easily be discovered, and the ability to express a diversity of potential solutions, so that the search space can be widely explored. At one extreme, a representation which only encodes the global optimum is easy to search, but useless for finding any other solution. At the other, a representation which can encode anything presents a difficult and dauntingly vast search space.
Given a large set of example solutions, representations could be learned from data instead of being hand-tailored by trial and error: a learned representation would replicate the same biases toward performance and the same range of expressivity as the source data set. For instance, given a dataset of face images, a variational autoencoder (VAE) (vae) or a Generative Adversarial Network (GAN) (gan) can learn a low-dimensional latent space, that is, a representation, that makes it possible to explore the space of face images. In essence, the decoder which maps the latent space to the phenotypic space learns the “recipe” of faces. Importantly, the existence of such a low-dimensional latent space is possible because the dataset is a very small part of the set of all possible images.
However, using a dataset of pre-selected high-performing solutions “traps” the search within the distribution of solutions that are already known: a VAE trained on white faces will never generate a black face. This limits the usefulness of such data-driven representations for discovering novel solutions to hard problems.
In this paper, we propose the use of the MAP-Elites algorithm (mapelites) to automatically generate a dataset for representations using only a performance function and a diversity space. Quality diversity algorithms like MAP-Elites are a good fit for representation discovery: creating archives of diverse high-performing solutions is precisely their purpose. Using the MAP-Elites archive as a source of example solutions, we can capture the genetic distribution of the highest performing solutions, or elites, by training a VAE and obtaining a latent representation. As the VAE is only trained on elites, this learned representation, or data-driven encoding (DDE), has a strong bias towards solutions with high fitness; and because the elites have varying phenotypes, the DDE is able to express a range of solutions. Though the elites vary along a phenotypic continuum, they commonly have many genotypic similarities (me_linemut), which makes it likely that a good latent space can be found.
Nonetheless, MAP-Elites will struggle to find high-performing solutions without an adequate representation. Fortunately, the archive is produced by MAP-Elites in an iterative, anytime fashion, so there is no “end state” to wait for before a DDE can be trained: a DDE can be trained during optimization. The DDE can then be used to enhance the optimization process. By improving the quality of the archive, the DDE improves the quality of its own source data, establishing a virtuous cycle of archive and encoding improvement.
A DDE based on an archive will encounter the same difficulty as any learned encoding: the DDE can only represent solutions that are already in the dataset. How then, can we discover new solutions? Fundamentally, to search for an encoding, we need to both exploit the best known representation, that is, create better solutions according to the current best “recipes”, and also explore new representations — solutions which do not follow any “recipe”.
In this paper, we address this challenge by mixing solutions generated with the DDE with solutions obtained using standard MAP-Elites operators. Our algorithm applies classic operators, such as Gaussian mutation, to create candidates which could not be captured by the current DDE. At the same time we leverage the DDE to generalize common patterns across the map and create new solutions that are likely to be high-performing. To avoid introducing new hyperparameters, we tune this exploration/exploitation trade-off optimally using a multi-armed bandit algorithm (garivier2011upper).
This new algorithm, DDE-Elites, reframes optimization as a search for representations (Figure 1). Integrating MAP-Elites with a VAE makes it possible to apply quality diversity to high-dimensional search spaces, and to find effective representations for future uses. We are interested in domains that have a straightforward low-level representation that is too high-dimensional for most algorithms, for instance: joint positions at every time step for a walking robot (720 positions for a 3-second gait of a robot with 12 degrees of freedom), 3D shapes in which each voxel is encoded individually (the low-level representation would be 1000-dimensional for a 10×10×10 grid), images encoded in pixel space, etc.
Ideally, the generated DDE will capture the main regularities of the domain. In robot locomotion, this could correspond to periodic functions, since we already know that a low-dimensional controller based on periodic functions can produce the numerous joint commands required every second to effectively drive a 12-joint walking robot in many different ways (cully2015robots). In many domains the space of possible solutions can be vast, while the inherent dimensionality of interesting solutions is still compact. By purposefully seeking out a space of solutions, rather than individual solutions themselves, we can solve high-dimensional problems in a lower dimensional space.
2. Background
2.1. Optimization of Representations
In his 30-year perspective on adaptation in evolutionary algorithms, De Jong identified representation adaptation as “perhaps the most difficult and least understood area of EA design” (de2007parameter).
Despite the difficulty of adapting representations, the potential rewards have lured researchers for decades. Directly evolving genotypes to increase in complexity has a tradition going back at least to the eighties (goldberg1989messy; altenberg1994evolving). The strategy of optimizing a solution at low complexity and then adding degrees of freedom has proved effective on problems from optimal control (gaier2014evolution), to aerodynamic design (olhofer2001adaptive), to neural networks (neat). Evolving the genome’s structure is particularly important when the structure itself is the solution, such as in genetic programming (koza1990genetic) or neural architecture search (elsken2019neural; miikkulainen2019evolving).
More recent approaches toward representation evolution have focused on genotype-phenotype mappings. Neural networks, which map between inputs and outputs, are a natural choice for such ‘meta-representations’. These mappings can evolve with the genome (scott2015learning; simoes2014self), or by fixing the genome and evolving only the mapping (hyperneat; cppn).
Supervised methods have previously been applied to learn encodings. These approaches require a set of example solutions for training. Where large, well-curated data sets are available this strategy has proven effective at creating representations well suited to optimization (mariogan; latentVariableEvolution2_fingerprint; latentVariableEvolution1_art), but where a corpus of solutions does not exist it must be created. In (scott2018toward; moreno2018learning) these solutions were collected by saving the champion solutions found after repeatedly running an optimizer on the problem, with the hope that the learned representation would then be effective in similar classes of problems.
2.2. MAP-Elites
MAP-Elites (mapelites) is a QD algorithm which uses a niching approach to produce a set of high-performing solutions which vary across a continuum of user-defined phenotypic dimensions. These phenotypic dimensions, or behavior descriptors, describe the way the problem is solved, and are orthogonal to performance. MAP-Elites has been used in such diverse cases as optimizing the distance traveled by a walking robot using different legs (cully2015robots), the drag of aerodynamic designs with varied volumes and curvatures (gaier2017aerodynamic), and the win rate of decks composed of different cards in deck-building games (fontaine2019mapping).
MAP-Elites is a steady-state evolutionary algorithm which maintains a population in a discretized grid or ‘archive’. This grid divides the continuous space of possible behaviors into bins, or ‘niches’, with each bin holding a single individual, or ‘elite’. These elites act as parents, and are mutated to form new individuals. These child individuals are evaluated and assigned a niche based on their behavior. If the niche is empty the child is placed inside; if the niche is already occupied, the individual with higher fitness is stored in the niche and the other discarded. By repeating this process, increasingly high-performing solutions which cover the range of phenotype space are found. The MAP-Elites algorithm is summarized in Algorithm 1.
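The loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the one-dimensional behavior space, the function names (`map_elites`, `toy_problem`), the clipping of genes to [0, 1], and all parameter values are assumptions chosen to keep the example short.

```python
import random

def map_elites(evaluate, n_dims, n_bins, n_generations, batch_size=10, sigma=0.1, seed=0):
    """Minimal MAP-Elites loop on a 1D behavior space.

    `evaluate(genome)` must return (fitness, behavior) with behavior in [0, 1].
    """
    rng = random.Random(seed)
    archive = {}  # niche index -> (fitness, genome)

    def try_insert(genome):
        fitness, behavior = evaluate(genome)
        niche = min(int(behavior * n_bins), n_bins - 1)
        # Keep the child if the niche is empty or the child is fitter.
        if niche not in archive or fitness > archive[niche][0]:
            archive[niche] = (fitness, genome)

    # Seed the archive with random genomes.
    for _ in range(batch_size):
        try_insert([rng.random() for _ in range(n_dims)])

    for _ in range(n_generations):
        for _ in range(batch_size):
            # Pick a random elite as parent and perturb it with Gaussian noise.
            _, parent = rng.choice(list(archive.values()))
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in parent]
            try_insert(child)
    return archive

def toy_problem(genome):
    # Hypothetical toy problem: behavior is the mean gene value,
    # fitness is the negative variance of the genes.
    mean = sum(genome) / len(genome)
    fitness = -sum((g - mean) ** 2 for g in genome) / len(genome)
    return fitness, mean

archive = map_elites(toy_problem, n_dims=5, n_bins=10, n_generations=200)
```

Each entry of `archive` holds the best solution found so far for one niche, so the dictionary as a whole plays the role of the grid of elites.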
Though phenotypically diverse the elites are often genotypically similar, existing in an “elite hypervolume”, a high performing region of genotype space (me_linemut). Just as in nature, where species as diverse as fruit flies and humans share nearly 60 percent of their genome (fruitfly), the “recipe” for high performance is often composed of many of the same ingredients.
This insight was leveraged to create a new variation operator which considers the correlation among elites (me_linemut). Genes which vary little across the elites, and so are likely common factors that produce high performance, are also subject to the smallest amount of perturbation — lowering the chance their children stray from the elite hypervolume. Biasing mutation in this way ensures that exploration is focused within the elite hypervolume, and on factors which induce phenotypic variation rather than poor performance.
2.3. Variational Autoencoders
Autoencoders (AEs) (ae) are neural networks designed to perform dimensionality reduction. AEs are composed of two components: an encoder, which maps the input to a lower dimensional latent space; and a decoder, which maps the latent space back to the original space. This network is trained to reconstruct the input through the lower dimensional latent ‘bottleneck’. The encoder component can be viewed as a nonlinear generalization of PCA (pca), with the latent space approximating the principal components.
Though the AE is able to represent the data at a lower dimensionality, and reproduce it with minimal loss, it can still be a poor representation for optimization. An important quality of representations is ‘locality’, that a small change in the genotype induces a small change in the phenotype (rothlauf2006representations). When AEs are trained only to minimize reconstruction error they may overfit the distribution of the training data and create an irregular latent space. The low locality of such latent spaces limits their usefulness in optimization: nearby points in latent space may decode to very different solutions, meaning even a small mutation could have a huge effect.
Variational autoencoders (VAEs) (vae) are AEs whose training is regularized to ensure a high-locality latent space. The architecture is broadly the same: an encoder and decoder mediated by a bottleneck, but rather than encoding the input as a single point it is encoded as a normal distribution in the latent space. When training the model a point from this input distribution is sampled, decoded, and the reconstruction error computed. By encoding the input as a normal distribution we induce the distributions produced by the encoder to be closer to normal. VAEs are trained by minimizing two terms: (1) the reconstruction error, and (2) the Kullback-Leibler (KL) divergence (kldivergance). The KL divergence measures how much one probability distribution differs from another, here the difference between the distribution returned by the encoder and the normal distribution we provide. The KL divergence between Gaussian distributions can be expressed in terms of their means and covariances, giving the loss function:
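$$\mathcal{L} = \lVert x - \hat{x} \rVert^2 + \mathrm{KL}\left[\mathcal{N}(\mu_x, \sigma_x),\ \mathcal{N}(0, 1)\right] \tag{1}$$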
Encoding solutions in the form of a normal distribution encourages organization of the latent space in a continuous and overlapping way, creating a more local encoding better suited to optimization.
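As a small worked illustration of the two training terms, the sketch below computes a squared reconstruction error plus the closed-form KL divergence between a diagonal Gaussian and a standard normal. The function name `vae_loss`, the use of a log-variance parameterization, and the equal weighting of the two terms are assumptions made for the example; the source does not specify them.

```python
import math

def vae_loss(x, x_hat, mu, log_var):
    """Squared reconstruction error plus KL(N(mu, sigma^2) || N(0, 1)).

    x, x_hat : input vector and its reconstruction
    mu, log_var : per-dimension mean and log-variance produced by the encoder
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # Closed form for a diagonal Gaussian against a standard normal:
    # 0.5 * sum(sigma^2 + mu^2 - 1 - ln(sigma^2))
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))
    return reconstruction + kl

perfect = vae_loss([1.0, 2.0], [1.0, 2.0], mu=[0.0, 0.0], log_var=[0.0, 0.0])
shifted = vae_loss([1.0], [1.0], mu=[1.0], log_var=[0.0])
```

The loss is zero only when the reconstruction is exact and the encoded distribution is already a standard normal; shifting the mean away from zero is penalized even when reconstruction is perfect, which is what pulls the encoded distributions together.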
3. DDE-Elites
Every representation biases optimization in some way. These biases improve optimization by limiting the range of solutions that can be expressed to those which are valid or high-performing. But finding a balance between expressivity and bias is an arduous task requiring considerable domain expertise. Our proposed method, DDE-Elites, automates the process of representation design and operates in tandem with search, allowing optimization and representation learning to improve each other in a self-reinforcing cycle.
DDE-Elites learns an encoding from examples of high performing solutions. To create these examples we use MAP-Elites, a QD algorithm which produces a variety of high performing solutions. The variety produced by MAP-Elites is critical: the expressivity of any learned encoding is limited by the variety of examples. That MAP-Elites not only produces a variety of solutions, but allows us to define the nature of that variety, makes it particularly powerful for crafting representations which vary in useful ways.
DDE-Elites is a variant of the MAP-Elites algorithm. The core component of competition within a niched archive is maintained, but novel methods of producing child solutions are introduced. Child solutions are created using an encoding learned from the archive. This encoding is refined as the archive improves, which in turn improves the optimization process. DDE-Elites optimizes an archive of varied solutions by reframing optimization as a search for the best representation, rather than the best solution.
3.1. Operators
Data Driven Encoding
The MAP-Elites archive is a record of the highest-performing solutions yet found for every behavior. When the archive is updated a VAE is trained to reconstruct the individuals in the archive. Reconstruction is a mapping from one phenotype to another, mediated through latent space; and the mapping from latent space to phenotype space is analogous to a genotype-phenotype mapping, which we refer to as a data-driven encoding (DDE).
Features common in high performing solutions will be the most successfully compressed and reconstructed, and features widely shared by high performing solutions are likely to lead to high performance. Critically, by training the encoding only on high-performing solutions we bias the space of solutions the DDE can express to those with high performance.
Reconstructive Crossover
Biases are encoded in a representation by limiting the range of solutions it can express. When a solution is reconstructed with the VAE it is mapped onto the restricted space of solutions which can be expressed by the DDE, a space characterized by high performance.
Reconstructing individuals with the VAE can create new solutions with higher fitness than the originals, but cannot create novel solutions. Solutions created by the DDE are based on those already in the archive, so cannot reach solutions which lie outside of the encoded distribution. At early stages of optimization when there are few example solutions, using only reconstruction to create new solutions would doom our encoding to a small region of expression.
Rather than completely replacing individuals with their reconstructions we instead shift them closer to forms expressible by the DDE with a new variation operator, reconstructive crossover. Child solutions are created by performing crossover with two parents: a parent chosen from the archive and its reconstruction. Crossover takes the form of an element-wise mean of the parameter vectors.
The reconstructive crossover operator slows the loss of diversity by only moving an individual toward the distribution of solutions encoded by the DDE, not directly into it. By only shifting solutions rather than replacing them, we allow exploration outside of the distribution to continue. Even when there is little gain in fitness, solutions that are the result of reconstructive crossover have a lower inherent dimensionality, on account of having one parent pass through the compressive bottleneck of the VAE. In this way the reconstructive crossover operator not only spreads globally advantageous genes throughout the archive, but also pulls the archive towards more easily compressed solutions.
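A minimal sketch of the operator, assuming the trained VAE is available as a function that encodes and then decodes a parameter vector. Here `toy_reconstruct` is a hypothetical stand-in for that encode-decode pass, not a real VAE, and both function names are illustrative.

```python
def reconstructive_crossover(parent, reconstruct):
    """Child = element-wise mean of a parent and its reconstruction."""
    reconstruction = reconstruct(parent)
    return [(p + r) / 2.0 for p, r in zip(parent, reconstruction)]

def toy_reconstruct(genome):
    # Hypothetical stand-in for a trained VAE's encode-then-decode pass:
    # it collapses every gene onto the genome's mean, mimicking a
    # one-dimensional bottleneck.
    mean = sum(genome) / len(genome)
    return [mean] * len(genome)

child = reconstructive_crossover([0.0, 1.0, 2.0], toy_reconstruct)
```

With this stand-in the child is `[0.5, 1.0, 1.5]`: each gene has moved halfway toward what the bottleneck can express, rather than being replaced by the reconstruction outright.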
Line Mutation
Reconstructive crossover enables effective optimization within the range of solutions that the DDE can express, but explorative operators are required to widen the pool of example solutions and improve the DDE. So when creating new solutions we choose to either produce them through reconstructive crossover, or through random mutation.
In addition to the isometric Gaussian mutation commonly used in MAP-Elites, we apply the line mutation operator proposed in (me_linemut). Line mutation imposes a directional component on the Gaussian perturbations. During mutation the parent genome is compared to a random genome from the archive. The variance of mutation in each dimension is then scaled by the difference in each gene:
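$$\mathbf{x}_i^{(t+1)} = \mathbf{x}_i^{(t)} + \sigma_1 \mathcal{N}(0, \mathbf{I}) + \sigma_2 \left(\mathbf{x}_j^{(t)} - \mathbf{x}_i^{(t)}\right) \mathcal{N}(0, 1) \tag{2}$$
where $\mathbf{x}_i^{(t)}$ is the parent genome, $\mathbf{x}_j^{(t)}$ is the random genome from the archive, and $\sigma_1$ and $\sigma_2$ are hyperparameters which define the relative strength of the isometric and directional mutations. Intuitively, when two genes have similar values the spread of mutation will be small; when they differ the spread will be large.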
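The operator can be sketched as follows. This is an illustrative reading of the description above, with assumed names (`line_mutation`, `sigma_iso`, `sigma_line`): each gene receives independent Gaussian noise, and all genes share one scalar Gaussian step along the line from the parent to the other genome.

```python
import random

def line_mutation(parent, other, sigma_iso, sigma_line, rng=random):
    """Isometric Gaussian noise plus a Gaussian step along the line parent -> other."""
    step = rng.gauss(0.0, 1.0)  # one scalar draw shared by all genes
    return [
        p + sigma_iso * rng.gauss(0.0, 1.0) + sigma_line * (o - p) * step
        for p, o in zip(parent, other)
    ]

rng = random.Random(0)
# With no isometric noise the child lies exactly on the line through both genomes.
on_line = line_mutation([0.0, 0.0], [1.0, 2.0], sigma_iso=0.0, sigma_line=0.1, rng=rng)
# Identical genomes leave no direction to move in, so the child equals the parent.
unchanged = line_mutation([0.3, 0.7], [0.3, 0.7], sigma_iso=0.0, sigma_line=0.1, rng=rng)
```

Genes on which the two genomes agree receive only the isometric noise, which is how the operator keeps children close to values that the elites have in common.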
In many cases certain parameter values will be correlated to high fitness, regardless of the individual’s place in behavior space. The line operator is a simple way of exploiting this similarity, but in contrast to our approach does not limit expressive power, allowing correlations between individuals to be used as a method of exploring new solutions. Though both the reconstructive crossover and line mutation operators take advantage of the similarities between high performing individuals, their differing approaches allow them to be effectively combined as explorative and exploitative operators.
Parameter Control
DDE-Elites explores the space of representations with the exploitative operator of reconstructive crossover, which finds high performing solutions similar to those already encoded by the DDE, and explorative operators of mutation, which expand the space of solutions beyond the range of the DDE.
The optimal ratio between these operators is not only domain dependent, but dependent on the stage of the algorithm. When the archive is nearly empty, it makes little sense to base a representation on a few randomly initialized solutions; once the behavior space has been explored, it is beneficial to continue optimization through the lens of the DDE; and when the archive is full of solutions produced by the DDE it is more useful to expand the range of possible solutions with mutation. These stages are neither predictable nor clear cut, complicating the decision of when to use each operator.
Faced with a trade-off between exploration and exploitation we frame the choice of operators as a multi-armed bandit problem (auer2002finite). Multi-armed bandits imagine sets of actions as levers on a slot machine, each with its own probability of reward. The goal of a bandit algorithm is to balance exploration, trying new actions, and exploitation, repeating actions that yield good rewards. Bandit approaches are straightforward to implement and have previously been used successfully to select genetic operators (dacosta2008adaptive).
We define a set of possible actions as usage ratios between reconstructive crossover, line mutation, and isometric mutation. Each ratio gives the probability with which a child solution is created by each of the operators. Each action is used to create a batch of child solutions, and a reward is assigned in proportion to the number of children who earned a place in the archive. At each generation a new action is chosen, and the reward earned for that action is recorded.
Actions are chosen based on UCB1 (auer2002finite), a simple and effective bandit algorithm which minimizes regret. Actions with the greatest potential reward are chosen, calculated as:
$$Q(a) + \sqrt{\frac{2 \ln t}{n_t(a)}} \tag{3}$$
where $Q(a)$ is the mean reward for an action $a$, $t$ is the total number of actions that have been performed, and $n_t(a)$ the number of times that action has been performed. UCB1 is an optimistic algorithm which rewards uncertainty: given two actions with the same mean reward, the action which has been tried fewer times will be chosen.
Our archive is in constant flux, and so the true reward of each mix of operators changes from generation to generation. To handle the non-stationary nature of the problem we use a sliding window (garivier2011upper), basing our predictions only on the most recent generations.
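The selection rule with a sliding window can be sketched as below. This is a simplified illustration, not the exact formulation of (garivier2011upper): the class name `SlidingWindowUCB1`, the rule of retrying any action that has dropped out of the window, and the fixed rewards in the usage loop are all assumptions made for the example.

```python
import math
from collections import deque

class SlidingWindowUCB1:
    """UCB1 computed only over the most recent `window` (action, reward) pairs."""

    def __init__(self, n_actions, window):
        self.n_actions = n_actions
        self.history = deque(maxlen=window)  # old entries fall out automatically

    def select(self):
        counts = [0] * self.n_actions
        totals = [0.0] * self.n_actions
        for action, reward in self.history:
            counts[action] += 1
            totals[action] += reward
        # Any action not tried within the window is tried first.
        for action, count in enumerate(counts):
            if count == 0:
                return action
        t = len(self.history)
        scores = [
            totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
            for a in range(self.n_actions)
        ]
        return scores.index(max(scores))

    def update(self, action, reward):
        self.history.append((action, reward))

bandit = SlidingWindowUCB1(n_actions=2, window=100)
picks = [0, 0]
for _ in range(200):
    action = bandit.select()
    reward = 1.0 if action == 1 else 0.0  # hypothetical fixed rewards
    bandit.update(action, reward)
    picks[action] += 1
```

In DDE-Elites each action would correspond to one operator ratio and the reward to the share of a batch's children that entered the archive; in this toy run the second action always pays off, so it ends up chosen far more often while the first is still revisited occasionally.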
3.2. Overview
The DDE-Elites algorithm proceeds as follows (see Figure 2 and Algorithm 2): (1) a DDE and reconstructive crossover operator are created by training a VAE on the archive, (2) the probability of using each variation operator is determined by the history of each mix of operators and the UCB1 bandit, (3) MAP-Elites is run with the chosen variation operator probabilities and the success rate is used to update the bandit. The improved archive is then used to create a new DDE and reconstructive crossover operator.
4. Experiments
4.1. Domains
Planar Arm Inverse Kinematics
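The goal of the first problem is to solve the inverse kinematics (IK) problem of a 2D robot arm. Given target coordinates, a configuration of joint angles should be found to place the end effector at the target. To solve this task a discretized behavior space is defined over the x,y plane and MAP-Elites finds a configuration of joint angles which places the end effector in each bin. The location of the end effector is derived for an arm with $n$ joints with angles $\theta_i$, with $i \in \{1, \dots, n\}$, using the forward kinematics equation:
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} l_i \cos\left(\sum_{j=1}^{i} \theta_j\right) \\ \sum_{i=1}^{n} l_i \sin\left(\sum_{j=1}^{i} \theta_j\right) \end{bmatrix}$$
where $l_i$ is the length of link $i$.
There are many solutions to this IK problem, but solutions with lower joint variance, or straighter arms, are preferred to allow for smoother transitions between configurations. We define fitness as the negative joint variance: $-\frac{1}{n}\sum_{i=1}^{n}(\theta_i - \mu)^2$, where $\mu = \frac{1}{n}\sum_{i=1}^{n}\theta_i$.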
To summarize, the phenotype is the angle of each joint, the behavior space the x,y coordinates of the end effector, and the fitness the negative variance of the joint angles. The elite hypervolume of a variant of this problem was shown in (me_linemut) to be very compact but with a non-trivial shape, and so well suited to our approach. The difficulty of the problem can be easily scaled up by increasing the number of joints in the arm: we solve one variant with 20 joints and another with 200.
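The behavior and fitness of an arm configuration can be computed as in the sketch below. It assumes relative joint angles that accumulate along the arm and equal-length links summing to a total length of 1; the link lengths are not given in the text, so that choice and the function names are assumptions.

```python
import math

def forward_kinematics(angles, link_length=None):
    """End-effector (x, y) of a planar arm with relative joint angles."""
    n = len(angles)
    if link_length is None:
        link_length = 1.0 / n  # assumption: equal links, total arm length 1
    x = y = 0.0
    cumulative = 0.0
    for theta in angles:
        cumulative += theta  # each joint rotates everything after it
        x += link_length * math.cos(cumulative)
        y += link_length * math.sin(cumulative)
    return x, y

def fitness(angles):
    """Negative variance of the joint angles (0 is best)."""
    mean = sum(angles) / len(angles)
    return -sum((a - mean) ** 2 for a in angles) / len(angles)

straight = [0.0] * 20
tip = forward_kinematics(straight)
```

A fully straight 20-joint arm reaches the edge of the unit circle along the x axis and has the maximum fitness of zero; bending the arm to reach other bins necessarily introduces some joint variance.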
Hexapod Locomotion
The second task is to create a repertoire of motor commands for a hexapod walking robot. Similar to the arm task, given an (x,y) coordinate a series of joint commands should be found that brings the robot to that position. A joint command, in the form of a target joint angle reached with a proportional controller, is given to each of the 12 joints (2 on each leg) at a frequency of 20 Hz for 3 seconds, for a total of 720 commands. The PyBullet simulator (coumans2016pybullet) is used to simulate the motion of the robot. We define fitness as the efficiency of the gait, calculated as the sum of all joint movements divided by the distance from the starting point:
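$$\mathrm{Fitness} = \frac{\sum_{t=1}^{T} \lVert \mathbf{q}_t - \mathbf{q}_{t-1} \rVert}{\lVert \mathbf{p}_T - \mathbf{p}_0 \rVert} \tag{4}$$
where $\mathbf{q}_t$ is a vector of joint angles at time step $t$, $\mathbf{p}_0$ is the starting position of the robot and $\mathbf{p}_T$ its end position.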
In a variant of this more difficult task, the line mutation alone did not provide a significant benefit over MAP-Elites (me_linemut), presumably because the elite hypervolume was separated into multiple high performing regions. This structure can frustrate measures of correlation found by sampling, but should not pose problems for our VAE, which can encode disparate genotype regions simultaneously.
The direct representation of joint angles at each time step is not expected to be a good representation. Typically in this kind of walking domain biases for regular movement are built into the encoding, whether by explicitly using open loop oscillators (cully2015robots; me_linemut) or by providing oscillating signals to the controller (clune2009evolving).
Archive Structure
For simplicity of comparison and presentation the same archive structure is used for all domains. A unit circle is divided into 1950 bins, with each bin defined by the Voronoi diagram (cvt) with centers placed in a ring formation.
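Assigning a solution to a bin in such an archive amounts to finding the nearest Voronoi center. The sketch below uses a hypothetical ring layout, since the exact placement of centers is not given in the text: 25 concentric rings, with ring k holding 6k evenly spaced centers at radius k/25, which happens to total 1950 centers.

```python
import math

def ring_centers(n_rings=25):
    """Hypothetical layout: ring k has 6*k centers at radius k / n_rings."""
    centers = []
    for k in range(1, n_rings + 1):
        radius = k / n_rings
        for i in range(6 * k):
            angle = 2.0 * math.pi * i / (6 * k)
            centers.append((radius * math.cos(angle), radius * math.sin(angle)))
    return centers

def nearest_bin(behavior, centers):
    """Index of the Voronoi cell containing `behavior` (closest center)."""
    return min(range(len(centers)), key=lambda i: math.dist(behavior, centers[i]))

centers = ring_centers()
bin_index = nearest_bin((0.5, 0.0), centers)
```

A brute-force search over all centers is enough at this scale; a spatial index such as a k-d tree would be the usual choice for larger archives.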
4.2. Archive Illumination
The results demonstrate that the DDE makes it possible to scale up MAP-Elites to high-dimensional genotypic spaces (Fig. 4), both for the arm (20D and 200D) and the hexapod robot.
Introducing and combining several components makes it difficult to discern which are responsible for the algorithm’s success. To better understand the contribution of each component we selectively remove operators and measure performance. Five variants are tested using the different recombination operators (see Table 1). (Footnote 1: A full list of hyperparameters is available in the supplementary materials as well as source code, to be released with an open source license upon publication.) Our proposed approach DDE-Elites uses all operators at a ratio determined by a bandit algorithm. Comparisons are shown in Figure 3.

Table 1. Variation operators used by each variant.
Variant | Isometric Mutation | Line Mutation | Reconstructive Crossover
ME-Iso | X | |
ME-Line | | X |
DDE-XOver | | | X
DDE-Iso | X | | X
DDE-Elites | X | X | X
Variants are compared based on the quality of the archive at each generation. As in previous work (mapelites) archives are judged based on coverage (the number of bins filled) and performance (the mean fitness of solutions). Coverage and performance are two distinct objectives, and the final archives are shown as Pareto fronts for clarity.
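Both measures are simple to compute from an archive. The sketch below assumes the archive is stored as a dictionary from bin index to a (fitness, genome) pair; that data layout and the function name are assumptions for illustration only.

```python
def archive_metrics(archive):
    """Return (coverage, performance) for an archive.

    coverage    : number of bins that hold an elite
    performance : mean fitness of the stored elites
    """
    fitnesses = [fitness for fitness, _ in archive.values()]
    coverage = len(fitnesses)
    performance = sum(fitnesses) / coverage if coverage else float("nan")
    return coverage, performance

example_archive = {0: (-1.0, [0.1, 0.2]), 3: (-3.0, [0.4, 0.9])}
coverage, performance = archive_metrics(example_archive)
```

Because mean fitness is taken only over filled bins, an archive can raise its performance by leaving hard bins empty, which is why the two measures are reported together rather than collapsed into one number.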
In the 20D arm the line mutation’s effectiveness is apparent: the archive is rapidly filled with high performance solutions. At 200D though, it begins to falter: with an order of magnitude higher dimensionality, an order of magnitude more evaluations are required. In the hexapod, where there is perhaps too little global correlation, we see little improvement over the standard isometric mutation.
When only reconstructive crossover is used the search is confined to the distribution of initial genes. In the hexapod case this is apparent by chronically low coverage and performance. In the simpler and more regular arm domain the reconstructive crossover operator does allow for solutions to be rapidly found, but with such a limited ability to explore they converge on suboptimal solutions.
DDE-Elites with both mutation and reconstructive crossover rapidly finds high performing solutions regardless of the dimensionality or regularity of the problem. The value of including the line mutation operator is domain-dependent, just as in the standard MAP-Elites case. In the arm domain having access to line mutation increases coverage, but in the hexapod it makes little difference. Using the bandit algorithm allows DDE-Elites to take advantage of the line mutation when it is helpful, and ignore it when it is not.
4.3. Archive Recreation
DDE-Elites is as much a method of optimizing representations as solutions. The encoding produced by DDE-Elites is biased toward high performance and has a range of expression matching the behavior space. To demonstrate that DDE-Elites does more than guide search, and in fact learns a representation, we search the space again, using the found DDE in place of the direct encoding.
We run the standard MAP-Elites algorithm, with isometric mutation only, with the latent space of the DDE acting as the genome. For reference we compare to the MAP-Elites algorithm using the direct encoding (as in the previous section). Optimization progress is shown in Figure 4, with the DDE-based optimization given an order of magnitude fewer evaluations (note log scale).
In every case the DDE far outperforms the direct encoding, reaching the same levels of fitness and coverage with several orders of magnitude fewer generations. The DDE can express the full range of solutions required to fill the map: in the arm cases the maps are filled in fewer than 100 generations, which is roughly 5 evaluations per bin. (Footnote 2: 100 generations = 10,000 individuals; 10,000 individuals / 1950 bins ≈ 5 evaluations per bin discovered.) The bias toward high performance is also apparent: in the arm cases the mean fitness curve is nearly flat at the optima, indicating that when new solutions are added to the map they are already near optimal. In the hexapod case the mean fitness when using the DDE never drops below the best mean fitness of the direct encoding: with the direct encoding considerable effort is taken to find good solutions, while the DDE finds little else.
4.4. Behavior Matching
Beyond its place in the DDE-Elites optimization loop, the produced DDE is a powerful representation with high expressivity and built-in biases. Though created by MAP-Elites, the DDE is a genotype-phenotype mapping, and can be used to shape the search of any optimization algorithm. We give the black-box optimizer CMA-ES (cmaes) the task of solving the arm inverse kinematics problem, and the hexapod locomotion task, to reach a set of 18 target (x,y) positions (shown in Figure 5, left). In one case CMA-ES performs optimization using a produced DDE; in the other the phenotype is directly evolved.
When optimizing with the DDE, CMA-ES quickly finds solutions to the target hitting problems with a precision never matched with the direct encoding (Figure 5, top). Moreover, a bias for how the problem is solved is built into the representation. As the DDE was trained only on solutions with high fitness, that is, low joint variance or efficient movement, these same properties manifest themselves in the solutions found by CMA-ES, even without searching for them. CMA-ES can find solutions to the IK problem; with the built-in priors of the DDE it finds the ones we want.
5. Discussion
A novel approach to search, DDE-Elites, which searches for the best solution representation concurrently with the solutions themselves, was introduced. Two core techniques form this representation search: a VAE to encode a set of solutions into a representation, and MAP-Elites to create this solution set. During search the VAE acts as an exploitative search operator, using the examples of high performing solutions provided by MAP-Elites as a template to produce similar high performers. MAP-Elites acts as an explorative operator, generating new solutions which expand the range of the encoding. A bandit algorithm is used to balance the trade-off of exploration of new representations and exploitation of the current encoding throughout the search.
Our approach rapidly optimizes the MAP-Elites archive, even in high dimensional search spaces, and automatically creates a domain-specific encoding that can be used for accelerating future optimization. This opens promising research avenues for finding representations for online optimizers used in Model Predictive Control (mayne2000constrained), optimizing 3D shapes directly in the voxel space without using complex hand-designed representations like CPPNs (cppn; hyperneat; hornby2003generative), or efficiently searching for neural network weights in deep reinforcement learning (such2017deep).
Though constructed around a different view of optimization, DDE-Elites can be easily integrated with other variants of MAP-Elites. Just as the line mutation operator was included as an option for the bandit to choose for variation, other variation operators that may improve performance, such as CMA-ME (fontaine2019covariance), can be seamlessly inserted. The biases and expressivity of a DDE are a direct result of the makeup of the archive, and it would be interesting to see how this would be reflected when using MAP-Elites variants which dynamically alter the distribution of individuals in the behavior space (fontaine2019mapping) or the definition of the behavior space itself (cully2019autonomous; gaier2019quality).
Though the latent spaces created by VAEs are easier to navigate than those created by normal autoencoders, even better models offer opportunities for further improvement. Much work has been done to create VAEs which have even better organized latent spaces (higgins2017beta; burgess2018understanding; chen2018isolating; kim2018disentangling), ideally with each dimension responsible for a single phenotypic feature such as the lighting or color of an image.
Beyond raw performance advantages, such “disentangled” representations offer even more interesting opportunities. Reducing the dimensionality of the search space into meaningful components would allow for model-based optimization of single solutions (bo), or the entire archive (gaier2018data). Human engineers could interactively explore and understand such encodings, laying bare the underlying properties responsible for performance and variation, and so receive insight and domain knowledge from their encoding rather than the other way around.
References
Supplemental Material
A. Example Maps
[Figure: example maps for Arm20, Arm200, and Hexapod]
B. Median/Variance of Bandit Values over Replicates
[Figure: bandit values over replicates for Arm20, Arm200, and Hexapod]
C. Hyperparameters
Hyperparameter | Value
Isometric Mutation Strength | 0.003
Line Mutation Strength | 0.1
Batch Size | 100
Bandit Options |
Bandit Window Length | 100
Generations per VAE Training | 1
Epochs per VAE Training | 5
Mutation Strength when Searching DDE | 0.15
Latent Vector Length [Arm20] | 10
Latent Vector Length [Arm200] | 32
Latent Vector Length [Hexapod] | 128