Evolving Neural Network through the Reverse Encoding Tree
NeuroEvolution is one of the most competitive evolutionary learning frameworks for designing novel neural networks for use in specific tasks, such as logic circuit design and digital gaming. However, the application of benchmark methods such as the NeuroEvolution of Augmenting Topologies (NEAT) remains a challenge in terms of computational cost and search time inefficiency. This paper advances a method that incorporates a type of topological edge coding, named Reverse Encoding Tree (RET), for evolving scalable neural networks efficiently. Using RET, two approaches, NEAT with Binary search encoding (Bi-NEAT) and NEAT with Golden-Section search encoding (GS-NEAT), have been designed to solve problems in benchmark continuous learning environments such as logic gates, Cartpole, and Lunar Lander, and tested against classical NEAT and FS-NEAT as baselines. Additionally, we conduct a robustness test to evaluate the resilience of the proposed NEAT algorithms. The results show that the two proposed strategies deliver improved performance, characterized by (1) a higher accumulated reward within a finite number of time steps; (2) fewer episodes needed to solve problems in targeted environments; and (3) adaptive robustness under noisy perturbations, outperforming the baselines in all tested cases. Our analysis also demonstrates that RET opens up potential future research directions in dynamic environments. Code is available from https://github.com/HaolingZHANG/ReverseEncodingTree.
NeuroEvolution (NE) is a method for evolving artificial neural networks through evolutionary strategy [1, 2]. The main advantage of NE is that it allows learning under conditions of sparse feedback. In addition, the population-based process parallelizes well and avoids the computational requirement of back-propagation. The evolutionary process of NE consists of modifying the current topological structure or weight of each individual by calculating the potential relationship between a genotype and its fitness in the current population. The genotype describes the topology of the neural network.
For a complex task with a large search space, the terminated topology of the neural network often has to be tailored to the targeted learning environment, and the evolutionary process from the initial to the final network is difficult to control accurately. Recent studies [4, 5, 6] show that the trade-off between protecting topological innovations and promoting evolutionary speed is also a challenge. The Genetic Algorithm (GA) using speciation strategies allows a meaningful application of the crossover operation and protects topological innovations, avoiding their premature disappearance. Distribution estimation algorithms, such as Population-Based Incremental Learning (PBIL), represent a different way of describing the distribution information of candidate topologies of neural networks in the search space, i.e. by establishing a probability model. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) further explains the correlations between the parameters of a targeted fitness function, correlations which significantly influence the time taken to find a suitable control strategy. Safe Mutation can scale the degree of mutation of each weight, and thereby expand the scope of domains amenable to NE.
In this study, a mapping relationship, based on constraining the topological scale, is set up between a genotype and its fitness, in order to explore how the evolutionary strategy influences the whole population. The limitation of the topological scale serves to prevent unrestricted expansion of the topology of the neural network during the evolutionary process. On the constrained topological scale, every neural network that can be generated by a corresponding genotype obtains a fitness through the specific task. The location of a specific neural network is the location of its feature matrix on the constrained topological scale. In this situation, the distance between the two nearest neural networks can be regarded as infinitesimal, and the function made up of all locations is continuous. We define the location of a genotype as the input of the function, and its corresponding fitness as the output. Together, all locations form a complex phenotypic landscape.
In this phenotypic landscape, all evolutionary processes of the topology of the neural network can be regarded as processes of tree-based searching, like random forest. The initial population can be regarded as the root nodes, and the population of each generation can be regarded as the branch nodes of each layer. Based on the current population or other population information (such as the probability matrix), more representative or better nodes will be identified in the next layer and used as individuals in the next generation. Interestingly, certain classical search methods have attracted our attention. Some global search methods, like Binary Search and Golden-Section Search, are not merely of use in finding extreme values in uni-modal functions, but have also shown promise when used in other fields [15, 16, 17, 18]. The search processes of these global search methods are similar to the reverse process of tree-based searching. The final (root) node is dependent on the elite leaf or branch nodes in each layer, as the topology of the final neural network is influenced by the features of the elite topology of each generation.
Based on the reverse process of tree-based searching (as the evolutionary strategy), we design two specific strategies in the phenotypic landscape, named NEAT with a reverse binary encoding tree (Bi-NEAT) and NEAT with a reverse golden-section encoding tree (GS-NEAT). In addition, the correlation coefficient is used to analyze the degree of exploration of multiple sub-clusters in the phenotypic landscape formed by each generation of genotypes. This effectively prevents the population from falling into a local optimum due to over-rapid evolution. The evolution speed of NEAT and FS-NEAT (as the baselines) and of our proposed strategies is discussed on the logic operations and continuous control gaming benchmarks in OpenAI Gym. These strategies have also passed different levels and types of noise tests to establish their robustness. We reach the following conclusions: (1) Bi-NEAT and GS-NEAT can improve the evolutionary efficiency of the population in NE; (2) Bi-NEAT and GS-NEAT show a high degree of robustness when subject to noise; (3) Bi-NEAT and GS-NEAT usually yield simpler topologies than the baselines.
In this study, we introduce a search method into NeuroEvolution, and extract the feature genotypes for the purpose of encoding the feature matrix. Therefore we devote this section to brief descriptions of the following three topics: (1) Evolutionary Strategies in NeuroEvolution; (2) Search Methods; and (3) Network Coding Methods.
NeuroEvolution (NE) is a combination of Artificial Neural Network and Evolutionary Strategy. Crossing over topologies efficiently, protecting new topological innovation, and keeping the topological structure simple are three core problems faced in dealing with a Topology and Weight Evolving Artificial Neural Network (TWEANN) system. In recent years, many effective ideas have been introduced into NE. An important breakthrough came in the form of NEAT [4, 5], which protects innovation by using a distance metric to separate networks in a population into species, while controlling crossover with innovation numbers. However, the evolutionary efficiency of the population in each generation cannot be guaranteed. In order to guarantee the evolutionary efficiency of NE, three research paths have been devised: (1) the replacement of the original speciation strategy with a new speciation strategy; (2) the introduction of more effective evolutionary strategies [10, 8, 6]; (3) the use of novel topological structures.
Certainly, modifying the topological structure and/or weight involves much more than the feature information of the ANN itself. The above improvements make it challenging to prevent the modification of all features. Furthermore, the complexity of the topology required for obtaining the required ANN is unlimited, which means that the topological structure of the ANN will not necessarily be simple.
In the field of Computer Science, search trees, such as the Binary Search tree, are based on the divide-and-conquer idea. They are often used to find extrema in uni-modal arrays or to find specific values in sorted arrays.
Recently, some improved search trees have also been used to find extrema in multi-modal problems or in other optimization fields [15, 16, 17, 18]. These search trees, such as Binary Search, Golden-Section Search, and Golden-Section SINE Search, complete complex tasks by combining with population-based or other strategies. Tree-based searches make the whole population develop more accurately through geometric searching. In the field of multi-modal searching, they increase global optimization ability by estimating and abandoning small peaks. Therefore, using tree-based search has the potential to improve evolutionary efficiency. In addition, tree-based searches have a strong resistance to environmental noise, where the position of the optimum point is generated by a sampling-based distribution to withstand interference from noisy observation.
Given the crossover operation of genotypes, some search methods have spurred an interest in enhancing the precision of such crossover operations, thus opening up an interesting avenue for the introduction of search trees into NE.
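The two classical searches named above can be sketched as follows. This is a minimal illustration of the textbook methods, not the RET implementation; the function names `binary_search` and `golden_section_max` are hypothetical, and the golden-section variant assumes the objective is uni-modal on the given interval.

```python
import math

# Inverse golden ratio, about 0.618: the fraction of the interval kept per step.
INV_PHI = (math.sqrt(5) - 1) / 2


def binary_search(sorted_values, target):
    """Return the index of target in an ascending list, or -1 if absent."""
    lo, hi = 0, len(sorted_values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


def golden_section_max(f, a, b, tol=1e-6):
    """Locate the maximum of a uni-modal function f on the interval [a, b]."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    while abs(b - a) > tol:
        if f(c) > f(d):
            b = d          # the maximum lies in [a, d]
        else:
            a = c          # the maximum lies in [c, b]
        # Recompute both interior points (simple, not evaluation-optimal).
        c = b - INV_PHI * (b - a)
        d = a + INV_PHI * (b - a)
    return (a + b) / 2
```

Both routines discard part of the search range at every step, which is the divide-and-conquer behavior that motivates using such searches to place new genotypes.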
At the stage of direct coding, the encoding rule of ANN is to convert it into a genotype. In order to generate large-scale, functional, and complex ANNs, some indirect coding techniques [30, 31] have been proposed. However, they are not efficient enough for the evolution of local networks, because decreasing the granularity of coordinates leads to a decrease in resolution. The above encoding is a kind of cellular encoding, which uses chromosomes or genotypes consisting of trees of node operators to evolve a graph.
Edge encoding, which is different from cellular encoding, grows a graph by modifying its edges, and thus has a different encoding bias than the genetic search process. When naturally evolving network topologies, edge encoding is often better than cellular encoding. Edge encoding can use adjacency matrices as representational tools. An adjacency matrix represents a graph as a square matrix in which each element represents an edge. The nodes connected by a weight are indicated by the row and column of the corresponding edge in the matrix.
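As a small illustration of the adjacency-matrix representation, the sketch below stores weighted edges in a square matrix and reads them back. The helper names are hypothetical, and treating a zero entry as "no edge" is a simplifying assumption of this example.

```python
def edges_to_adjacency(num_nodes, edges):
    """Build a square matrix whose entry [source][target] is the edge weight."""
    matrix = [[0.0] * num_nodes for _ in range(num_nodes)]
    for source, target, weight in edges:
        matrix[source][target] = weight
    return matrix


def adjacency_to_edges(matrix):
    """Recover (source, target, weight) triples; a zero entry means no edge."""
    return [
        (source, target, weight)
        for source, row in enumerate(matrix)
        for target, weight in enumerate(row)
        if weight != 0.0
    ]
```

Under this assumption a connection with weight exactly zero cannot be distinguished from a missing connection, so a separate mask would be needed if that distinction matters.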
We propose an advanced search method, named Reverse Encoding Tree (RET), to leverage the existing speciation strategy in NEAT. Edge encoding with the adjacency matrix is the representation used by RET for network coding. RET uses unsupervised clustering to dynamically describe speciation and speciation relationships. To reduce the complexity of the terminated network, RET limits the maximum number of nodes in all generated neural networks.
An illustration of this strategy (using binary search, namely Bi-NEAT) is provided in Fig. 1. Different from the speciation strategies in NEAT, RET crosses genotypes by a search method and evaluates the relationships within and between species by the best fitness and correlation coefficient in each cluster, which estimate the small peaks in the phenotypic landscape. By abandoning these small peaks, RET speeds up the evolutionary process of NE.
The evolution of the neural network can be achieved by changing network structure, connection weight, and node bias. Changing the topology of neural networks is a coarse-grained evolutionary behavior. Therefore, to search the solution space more smoothly, we first limit the maximum number of nodes in the neural network. The explorable range of the population is therefore fixed and limited, which avoids unrestricted expansion of the topology of the neural network during the evolutionary process. The limitation on the number of nodes gives the weight and bias information in a specific network a greater chance of being optimized.
We first introduce a landscape as the combination of generated neural networks with a fitness evaluation for performing a task in a targeted environment (e.g., XOR gate or Cartpole). The landscape includes all networks in the solution space.
We define a seeding from the initial population, within the range of the landscape and with a specified number of genotypes. There is an initial distance between every two genotypes, to ensure that the initial population can attain as much diversity as possible in the landscape. In addition, a related hyper-parameter describes the minimum distance between two genotypes. From previous studies, it is known that this minimum distance reduces the tendency of the population to over-explore the local landscape. The dynamics of a genotype would increase when the distance between a novel genotype and other, existing genotypes is less than the minimum distance. The distance check equation is shown as:
The distance between two genotypes is encoded as the Euclidean distance of the corresponding feature matrices:
where is the feature matrix, in the range of . In the feature matrix, the first column is the bias of each node, and the other columns are the connection weights between nodes in the neural network generated by the genotype. An illustration of the feature matrix is provided in Fig. 2. The feature information includes input, output, and hidden nodes. Therefore, the size of the feature matrix is . Because the feature matrix includes all features of the genotype, any genotype can be created from its feature matrix by .
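The feature matrix and the distance check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the matrix shape (one row per node, a bias column followed by one weight column per node), the node indexing, and the names `build_feature_matrix`, `distance`, and `is_far_enough` are all hypothetical and are not taken from the authors' library.

```python
import math


def build_feature_matrix(max_nodes, biases, connections):
    """Row i holds [bias_i, w_i0, ..., w_i(max_nodes-1)] (assumed layout)."""
    matrix = [[0.0] * (max_nodes + 1) for _ in range(max_nodes)]
    for node, bias in biases.items():
        matrix[node][0] = bias                 # first column: node bias
    for (source, target), weight in connections.items():
        matrix[source][target + 1] = weight    # other columns: connection weights
    return matrix


def distance(matrix_a, matrix_b):
    """Euclidean distance between two feature matrices of equal shape."""
    return math.sqrt(sum(
        (x - y) ** 2
        for row_a, row_b in zip(matrix_a, matrix_b)
        for x, y in zip(row_a, row_b)
    ))


def is_far_enough(candidate, population, min_distance):
    """Accept a novel genotype only if it keeps the minimum distance to all others."""
    return all(distance(candidate, other) >= min_distance for other in population)
```

Because every genotype is mapped to a matrix of the same fixed shape, the distance between any two genotypes is well defined even when their topologies differ.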
The population in the current generation is composed of the genotypes saved (elite) from the population in the previous generation and the novel genotypes generated by RET based on the landscape of the population in the previous generation.
RET is different from original evolutionary strategies, as is shown in Fig. 3.
The search process of RET is divided into two parts: (1) the creation of a nearby genotype from the specified parent genotype by the original frame of NEAT:
(2) the creation of a global genotype from the two specified parent genotypes or feature matrices:
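A minimal sketch of the second step, assuming the global genotype is obtained by interpolating element-wise between the two parent feature matrices: the binary variant takes the midpoint, and the golden-section variant takes the point at the inverse golden ratio. The interpolation rule, the direction in which the golden-section point is measured, and the function names are assumptions made for illustration, not the authors' exact formulas.

```python
# Inverse golden ratio, about 0.618.
GOLDEN_FRACTION = (5 ** 0.5 - 1) / 2


def interpolate(matrix_a, matrix_b, ratio):
    """Element-wise point at the given ratio on the segment from a to b."""
    return [
        [x + ratio * (y - x) for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(matrix_a, matrix_b)
    ]


def binary_child(matrix_a, matrix_b):
    """Binary-search style child: the midpoint of the two parents."""
    return interpolate(matrix_a, matrix_b, 0.5)


def golden_section_child(matrix_a, matrix_b):
    """Golden-section style child, measured from the first parent (assumed)."""
    return interpolate(matrix_a, matrix_b, GOLDEN_FRACTION)
```

In both cases the child lies between its parents in the constrained feature space, which is what distinguishes this global step from the local mutation of a single parent.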
We further propose an efficient, unsupervised learning method for analyzing the network seeds generated. The motivation for clustering the population based on the similarity of genotypes is to explore the evolvability of each type of genotype set after protecting topology innovations. The current population is divided into clusters for understanding the local situation of the landscape generated by the current population. Many clustering methods can be used in this strategy. We compared K-means++ [41] and Birch Clustering, and selected the most advanced, K-means++, thus:
where is the set of clusters, is the cluster, and is the center of the cluster. The optimal genotype in the cluster can be obtained by comparing the fitness of each genotype:
where is the fitness of the genotype. The set of saved genotypes collects the optimal genotype in every cluster:
The correlation coefficient () of distance from the optimal position of the genotype and fitness for all the genotypes in each cluster is calculated, to describe the situation of each cluster:
For the local phenotypic landscape of a single maximum value, the distance and fitness show a negative correlation (positive ), will reach . If the landscape is complex (negative ), the relationship between distance and fitness is not significant. Two types of are shown in Fig. 4.
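The per-cluster analysis can be sketched as follows, assuming the cluster members are already known (the clustering step itself is omitted). The sketch picks the fittest member of a cluster and computes the plain Pearson correlation between each member's distance to that elite and its fitness. The sign convention used in the paper is not reproduced here, and `pearson` and `describe_cluster` are hypothetical names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return covariance / math.sqrt(spread_x * spread_y)


def describe_cluster(members):
    """members: list of (position, fitness) pairs belonging to one cluster."""
    elite_position, elite_fitness = max(members, key=lambda member: member[1])
    distances = [math.dist(position, elite_position) for position, _ in members]
    fitnesses = [fitness for _, fitness in members]
    return elite_position, elite_fitness, pearson(distances, fitnesses)
```

For a cluster sitting on a single smooth peak, fitness falls as distance from the elite grows, so the plain coefficient computed here is close to -1; a value near zero suggests a more complex local landscape.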
RET’s operation occurs between each of the two clusters:
The operation selection is dependent on the optimal genotypes and the correlation coefficients of the two specified clusters. Therefore, the number of novel genotypes is less than or equal to . We assume that if , cluster has been explored fully, or its local phenotypic landscape is simple. When , the operation selection in each comparison is:
where is the novel genotype created by two centers of the specified cluster.
In summary, our proposed evolutionary strategy uses RET based on the local phenotypic landscape to evolve the feature matrix of genotypes in the population. The pseudo-code of this evolutionary process is shown in Alg. 1.
In order to verify whether NE based on tree search can improve evolutionary efficiency and fight against environmental noise effectively, we designed a two-part experiment: (1) We explore the effect of our proposed strategies and the baseline strategies in classical tasks, such as the logic gate; (2) We explore the effect of our proposed strategies and the baseline strategies in one of the classical tasks (Cartpole-v0) under different noise conditions.
The two-input symbolic logic gate, XOR, is one of the benchmark environments in the NEAT setting. The task is to evolve a network that distinguishes a correct Boolean output from . The initial reward is , and the reward will decrease by the Euclidean distance between ideal outputs and actual outputs. We select a higher targeted reward of to tackle this environment. In addition, we add three additional logic gates, IMPLY, NAND, and NOR, to explore algorithm performance with different task complexities. The complete hyper-parameter setting in the logical experiments is shown in Tab. I. To enhance the reproducibility of our work, we select the XOR environment from the most popular neat-python package (https://neat-python.readthedocs.io/en/latest/xor_example.html) and open-source our implementation in the supplementary material.
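A fitness function for the XOR task can be sketched as follows. The starting reward of 4.0 and the squared-error penalty follow the neat-python XOR example and are assumptions here, since the exact values and penalty used in this study are not recoverable from the text; `predict` is a hypothetical callable standing in for the evolved network.

```python
# Truth table of the two-input XOR gate: (inputs, expected output).
XOR_CASES = [
    ((0.0, 0.0), 0.0),
    ((0.0, 1.0), 1.0),
    ((1.0, 0.0), 1.0),
    ((1.0, 1.0), 0.0),
]


def xor_fitness(predict, initial_reward=4.0):
    """Start from initial_reward and subtract the squared error of each case."""
    fitness = initial_reward
    for inputs, expected in XOR_CASES:
        fitness -= (predict(inputs) - expected) ** 2
    return fitness
```

A network that reproduces the truth table exactly keeps the full starting reward, while a network that always outputs 0.5 loses 0.25 per case.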
Our testing platforms were based on OpenAI Gym, which is well adapted for building a baseline for continuous control. Cartpole: As a classical continuous control environment, the Cartpole-v0 environment is controlled by applying a force of or to the cart. A pendulum starts upright, and the goal is to prevent it from toppling over. An accumulated reward of would be given before a terminated environment (e.g., falling degrees from vertical, or a cart shifting more than units from the center). As experimental settings, we select a sample size of , and use relu activation for neural network output to select an adaptive action in Tab. II. To solve the problem, we conduct and fine-tune both NEAT and FS-NEAT as baseline results for reaching targeted accumulated rewards of in episode steps.
Here, we have improved the requirements of the fitness threshold ( rewards in episode steps) and normalized the fitness threshold as . See Tab. II.
Lunar Lander: We utilize a Box2D gaming environment, LunarLander-v2, as shown in Fig. 6, from OpenAI Gym. The objective of the game is to navigate the lunar lander spaceship to a targeted landing site on the ground without collision, using two lateral thrusters and a rocket engine. Each episode lasts at most steps and runs at frames per second. An episode ends when the lander flies out of borders, remains stationary on the ground, or when time expires. The action space is a collection of six discrete actions that correspond to the off steering commands and main engine settings. The state, s, is an eight-dimensional vector that continuously records and encodes the lander’s position, velocity, angle, angular velocity, and indicators for the contact between the legs of the vehicle and the ground. For the experiment, we select the same sample size as in the Cartpole-v0 setting, with details in Tab. III.
One of the remaining challenges for continuous learning is noisy observation in the real world. We further evaluate Cartpole-v0 with a shared noisy benchmark from bsuite. The hyper-parameter setting is shown in Tab. IV.
Gaussian noise or white noise is a common interference in sensory data. The interfered observation becomes the original observation with added Gaussian noise. We set up the Gaussian noise by computing the variance of all recorded states with a mean of zero.
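Additive zero-mean Gaussian noise on an observation can be sketched as follows. The function name `add_gaussian_noise` and the `sigma` parameter are hypothetical; how the noise scale is derived from the recorded states in the experiments is not reproduced here.

```python
import random


def add_gaussian_noise(observation, sigma, rng=random):
    """Return a copy of the observation with zero-mean Gaussian noise added."""
    return [value + rng.gauss(0.0, sigma) for value in observation]


# Usage: a seeded generator keeps a noisy rollout reproducible.
rng = random.Random(0)
noisy = add_gaussian_noise([0.0, 0.1, -0.05, 0.2], sigma=0.2, rng=rng)
```

Passing an explicit generator rather than relying on the global random state makes it easier to compare strategies under the same noise sequence.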
Reverse Noise: Reverse noise maps the original observation data reversely. Reverse noise is a noise evaluation for sensitivity tests that keeps a higher L2-norm similarity but should still affect the learning behavior on the physical observation. Reverse observation has been used in the continuous learning framework for communication systems to test its robustness against jamming attacks. Since 100% of the noise environment is consistent with a noise-free environment, we dilute the noise level to the original 50% (as dilution coefficient in Reverse).
|benchmark task|CartPole v0|
|dilution coefficient in Reverse|50%|
|peak in Gaussian|0.20|
Here we take NEAT and FS-NEAT as baselines. The weight of connection and bias of node are the default settings in the example of neat-python.
After running 1,000 iterations for each method in the logical experiments, continuous control and game experiments, and noise attack experiments, we obtained the results shown in Tab. V, Tab. VI, and Fig. 7. The evolutionary process across all the methods has the same fitness number in each generation. Therefore the comparison of average end generation is the same as the comparison of calculation times for the neural network in the evolutionary process.
After restraining the influence of hyper-parameters, the tasks from Tab. V describe the influence of task complexity on evolutionary strategies. The results show that with the increase in task difficulty, our algorithm can make the population evolve faster. In the IMPLY task, the difference between the average end generation is 1 to 2 generations. When the average of end generations in XOR tasks is counted, the gap between our proposed strategies and the baselines widens to nearly 20 generations. Additionally, the average node number in the final neural network and the task complexity seem to have a potentially positive correlation.
In the continuous control and game environments, Bi-NEAT and GS-NEAT still show strong potential; see Tab. VI. Unlike in the case of the logical experiments, the results show that the two proposed strategies are superior both in terms of evolutionary speed and stability. The enhanced evolutionary speed is reflected in the fact that the baselines require two to three times as many generations on average as our strategies for the tested tasks. In addition, the smaller standard deviation of end generation shows the evolutionary stability of our strategies.
As shown in Fig. 7, the evolutionary strategies based on RET show strong robustness in the face of noise. With the increase in noise level, the fail rate of all the tested strategies increases gradually. In most cases, the baselines show a higher fail rate than our strategies. In the task with the low noise level, our strategies have a fail rate of one, as compared to dozens for the baselines. However, in a few cases with high noise levels, all the strategies are unable to achieve results.
In general, with the same fitness number of population, Bi-NEAT and GS-NEAT show better performance by ending up with fewer generations than NEAT for the symbolic logic, continuous control, and 2D gaming benchmark environments in this study. Our proposed strategies are also superior in the tested tasks with incremental noisy observation. We conclude that they are robust in the face of noise attacks and able to deal easily with sparse and noisy data.
More interestingly, the performance nuances of Bi-NEAT and GS-NEAT in different tasks also attracted our attention. It is clear that Bi-NEAT is better than GS-NEAT in all tasks without noise. Our preliminary conclusion is that evolutionary speed is affected by the phenotypic landscape of different tasks, because the local peak of the landscape is usually small and sharp, as implied by the process data. Another interesting point we observed is that GS-NEAT usually fares better than Bi-NEAT in the noise test. Further efforts could be made to investigate the underlying mechanism and theoretical bounds.
This paper introduced two specific evolutionary strategies based on RET for NE, namely Bi-NEAT and GS-NEAT. The experiments with logic gates, Cartpole, and Lunar Lander show that Bi-NEAT and GS-NEAT have faster evolutionary speeds and greater stability than NEAT and FS-NEAT (baselines). The noise test in Cartpole also shows stronger robustness than the baselines.
The influence of evolutionary speed, stability, and robustness of the whole strategy on the location selection of new topology nodes is worth further study. An assumption to validate is that this location selection can be adaptive vis-a-vis the landscape of generation.
This work was initiated by the Living Systems Laboratory at King Abdullah University of Science and Technology (KAUST), led by Prof. Jesper Tegner, and supported by funds from KAUST. Chao-Han Huck Yang was supported by the Visiting Student Research Program (VSRP) from KAUST.
S. Whiteson, P. Stone, K. O. Stanley, R. Miikkulainen, and N. Kohl, “Automatic feature selection in neuroevolution,” in Proceedings of the 7th annual conference on Genetic and evolutionary computation. ACM, 2005, pp. 1225–1232.
J. Lehman, J. Chen, J. Clune, and K. O. Stanley, “Safe mutations for deep and recurrent neural networks through output gradients,” in Proceedings of the Genetic and Evolutionary Computation Conference. ACM, 2018, pp. 117–124.
C. Igel, “Neuroevolution for reinforcement learning using evolution strategies,” in The 2003 Congress on Evolutionary Computation, 2003. CEC’03., vol. 4. IEEE, 2003, pp. 2588–2595.
R. Wang, J. Clune, and K. O. Stanley, “Vine: an open source interactive data visualization tool for neuroevolution,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion. ACM, 2018, pp. 1562–1564.
Engineering Applications of Artificial Intelligence, vol. 50, pp. 201–214, 2016.
The codes and configurations are available on GitHub. This library has been improved and upgraded from neat-python. By inheriting the Class.DefaultGenome, the global genotype, Class.GlobalGenome, is realized. The specific evolutionary strategies, like Bi-NEAT and GS-NEAT, inherit the Class.DefaultReproduction, and are named bi and gs in the evolution/methods folder.
In addition, we have created guidance models for our strategies, named evolutor, in the benchmark folder. Our strategies can be used as independent algorithms for multi-modal search and as candidate plug-in units for other algorithms.
In the noise experiment, the most important indicator is fail rate. Some minor results, like the average and standard deviation of end generation, are also valuable.
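These indicators can be computed from a list of per-run outcomes, as in the minimal sketch below. It assumes each run is recorded as the generation at which the fitness threshold was reached, or `None` if the run failed; the function name and this record format are hypothetical, and the population standard deviation is used as one possible choice.

```python
import statistics


def summarize_runs(end_generations):
    """Return (fail rate, mean end generation, std of end generation)."""
    solved = [generation for generation in end_generations if generation is not None]
    fail_rate = 1 - len(solved) / len(end_generations)
    if not solved:
        return fail_rate, None, None
    return fail_rate, statistics.mean(solved), statistics.pstdev(solved)
```

Failed runs are excluded from the average and standard deviation, so these two values describe only the runs that reached the threshold.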
As shown in Fig. 8, the average of end generation in each strategy increases with the increase in noise level. Although the fail rates of our strategies are still low in the case of high noise levels, they need more generations to reach the fitness threshold. The results from standard deviation describe the evolutionary difference between the baselines and our strategies. Under noise attacks, the baselines become unable to train, whereas our strategies are only delayed in achieving the requirements.
RET is not only applicable to the field of NeuroEvolution, but can also be combined with other algorithms for tackling complex tasks. Here we compare the evolution of RET and other well-accepted evolutionary strategies, to describe the evolutionary difference in the maximum or minimum position finding under the landscape.
The function landscapes, such as Rastrigin, have potential patterns. These potential patterns will determine the effect of the algorithm to some extent.
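For reference, the standard Rastrigin function can be written as below. The amplitude of 10 is the conventional choice and is an assumption here, since the text does not state the parameters used in the experiment.

```python
import math


def rastrigin(point, amplitude=10.0):
    """Standard Rastrigin function; its global minimum is 0 at the origin."""
    return amplitude * len(point) + sum(
        x * x - amplitude * math.cos(2 * math.pi * x) for x in point
    )
```

The cosine term creates a regular grid of local minima around the global one, which is the kind of repeating pattern that can favor or hinder a particular search strategy.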
However, the landscape of the task built by NE is discrete. After completing the experiment to find the minimum value of the Rastrigin function, we use the visualized 3D model of Mount Everest. The data set is from Geospatial Data Cloud (http://www.gscloud.cn/), a DEM around Mount Everest, with points. Here, we compress points into points as the final discrete data, see Fig. 9.
The evolutionary process of finding Mount Everest by different evolutionary strategies is shown in Fig. 10. The Mount Everest landscape in CSV format, named mount_everest.csv, is available in the benchmark/dataset folder of our library.