A natural approach to studying schema processing

05/12/2017
by   Jack McKay Fletcher, et al.
University of Plymouth

The Building Block Hypothesis (BBH) states that adaptive systems combine good partial solutions (so-called building blocks) to find increasingly better solutions. It is thought that Genetic Algorithms (GAs) implement the BBH. However, for GAs, building blocks are semi-theoretical objects in that they are thought only to be implicitly exploited via the selection and crossover operations of a GA. In the current work, we discover a mathematical method to identify the complete set of schemata present in a given population of a GA; as such, a natural way to study schema processing (and thus the BBH) is revealed. We demonstrate how this approach can be used both theoretically and experimentally. Theoretically, we show that the search space for good schemata is a complete lattice and that each generation samples a complete sub-lattice of this search space. In addition, we show that combining schemata can only explore a subset of the search space. Experimentally, we compare how well different crossover methods combine building blocks. We find that for most crossover methods approximately 25-35% of the building blocks in a generation result from the combination of the previous generation's building blocks. We also find that an increase in the combination of building blocks does not lead to an increase in the efficiency of a GA. To complement this article, we introduce an open source Python package called schematax, which allows one to calculate the schemata present in a population using the methods described in this article.


I Introduction

Genetic Algorithms (GAs) are a hugely popular method for optimization and have found success on many problems [1]. Sadly, unlike other optimization techniques such as gradient descent [17], simulated annealing [13, 3, 14], Ant Colony Optimization [25, 7] or Particle Swarm Optimisation [6, 28], GAs lack a rigorous explanation of exactly why and on what functions they perform well. There is, however, a chief approach to studying the power of GAs, which is to consider the schemata GAs are processing.

Schemata are simple mathematical objects which describe points and hyperplanes in the space of all possible words of the same length over an alphabet [16]. Specifically, a schema is a word made with an additional symbol, $*$, called the wild card symbol, which stands for ‘don’t care’. For example, the schema $1{*}1{*}$ over the binary alphabet represents the set of binary strings which have a $1$ in positions one and three and a $0$ or $1$ in positions two and four. In this way, the wild card symbol is similar to a blank tile in the popular board game Scrabble. Schemata have several properties. For a schema $s$, the order of $s$, denoted $o(s)$, is the number of non-wild card symbols (that is, symbols which are not ‘$*$’) in $s$. For example, the order of the above schema is $2$. The defining length of $s$, denoted $\delta(s)$, is the distance between the first and last non-wild card symbols. The above schema has a defining length of 2. A word $w$ is said to be an instance of $s$ if it matches $s$. For example, the word $1011$ is an instance of the above schema. In the context of a GA, the fitness of $s$ in a population is the average fitness of all of its instances.
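The properties described above are straightforward to compute. The following is a minimal Python sketch and is not taken from the article or from schematax: the function names, the schema `1*1*`, the toy population and the ones-counting fitness function are all assumptions chosen purely for illustration.

```python
WILDCARD = "*"

def order(schema):
    """Number of fixed (non-wild card) symbols in the schema."""
    return sum(symbol != WILDCARD for symbol in schema)

def defining_length(schema):
    """Distance between the first and last fixed symbols (0 if fewer than two)."""
    fixed = [i for i, symbol in enumerate(schema) if symbol != WILDCARD]
    return fixed[-1] - fixed[0] if fixed else 0

def is_instance(word, schema):
    """True if the word agrees with the schema at every fixed position."""
    return len(word) == len(schema) and all(
        s == WILDCARD or s == w for w, s in zip(word, schema)
    )

def schema_fitness(schema, population, fitness):
    """Average fitness of the schema's instances in the population, or None."""
    instances = [word for word in population if is_instance(word, schema)]
    if not instances:
        return None
    return sum(fitness(word) for word in instances) / len(instances)

# Hypothetical population and fitness function (number of ones in the word).
population = ["1011", "1110", "0000"]
ones = lambda word: word.count("1")

print(order("1*1*"), defining_length("1*1*"))    # 2 2
print(schema_fitness("1*1*", population, ones))  # 3.0
```

Only `1011` and `1110` match `1*1*` in this toy population, so the schema's fitness is the mean of their two fitness values.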

Holland, in his partly philosophical work on adaptation, argues that any adaptive process tests subsets of the search space through schemata [16]. As each individual in the search space belongs to several schemata at once, by evaluating one individual many schemata are implicitly sampled. The idea is that if an individual is fit, this suggests the schemata of which it is an instance are also fit. Thus, by testing a few individuals many schemata are sampled. This property is called implicit or intrinsic parallelism. Holland claims that natural evolution exploits this property. From this concept, Holland created a statement about the schema processing performed by GAs:

Definition I.1.

A building block is a low order, low defining length and above average fitness schema.

Hypothesis I.1.

The BBH (The building block hypothesis): competent GAs find increasingly better solutions by combining building blocks.

Holland’s idea of building blocks is threefold. Firstly, as building blocks have above average fitness, they have a high probability of surviving and generating offspring. Secondly, as they have a low order, they have a low probability of being disrupted by mutations. Thirdly, as they have a low defining length they have a low probability of disruption because of crossover. All of these properties point towards the BBH, that is: building blocks surviving and being combined in subsequent generations. Note that the BBH is a statement about adaptive systems in general, but in this case it is applied specifically to GAs.



The BBH was the putative explanation for the power of GAs for a long time. Schema theorems, which provide a lower bound on the expected number of instances of a schema in one generation occurring in the subsequent generation [5, 21, 12], seemed to add credence to the BBH as they show that building blocks have a high chance of surviving. However, the BBH later came under many philosophical and theoretical criticisms. In his paper titled ‘The building block fallacy’ [27], Thornton questions the reasoning leading to the BBH and proposes contradictions between the schema theorem and the BBH. Others, such as Vose, call the earlier theory of GAs “myths and folklore” and argue there is a lack of a standard GA theory [29]. In Holland’s framing, the schemata being manipulated by a GA are semi-theoretical objects in that they are not directly manipulated by the genetic algorithm. Rather, it is proposed that the distribution of offspring should change as if the schemata of the parents had been sampled and combined.

There is one article which studies schema processing in a non-theoretical manner, namely Mitchell et al.’s [19] insightful work on building blocks. In this article, building blocks are indirectly studied through “Royal Road Functions”, which are fitness functions that have building blocks explicitly built into them. These functions reward individuals for finding good partial solutions, in a way setting up ‘stepping stones’ along the way to the optimal solution. Assuming the BBH, one would expect to find an optimal solution very quickly as ‘building blocks’ are written directly into the fitness function. However, the authors find that when the fitness function does not have stepping stones the GA performs better, that is, it finds the optimal solution in fewer steps. It is suggested that the reward for partial solutions in Royal Road Functions causes early convergence; specifically, the GA gets stuck in local optima created by the stepping stones. While the authors offer an indirect insight into the BBH, it is still unclear what exactly happens to schemata in the course of a genetic algorithm. The ‘building blocks’ in Mitchell et al.’s paper are somewhat artificial as they are defined in the fitness function rather than discovered by the population itself. It is not obvious if the same findings would apply to ‘real’ building blocks found by a GA. In practical terms, the most common application of the BBH is as a heuristic in the design of efficient encodings for GAs. In particular, encodings are often chosen so as to allow building blocks to be combined meaningfully by the genetic operators [15].

We believe the fundamental problem with using the BBH as a narrative for GAs (and with seeing GAs as schema processors to begin with, for that matter) is that there is no method to observe schemata being manipulated by a GA. As such, it is hard, if not impossible, to test accurately any meaningful statement about the type of schema processing performed by a GA. Thus, in the current work, we present a natural method for identifying the set of schemata being tested by a population. We call this method the ‘schematic completion’. We also find that the schemata found by the schematic completion always form a mathematical structure called a complete lattice, which we call ‘the schematic lattice’. Using these methods, one can observe the exact schema processing performed by a GA by simply calculating the schemata present in each population. These methods, we hope, are useful tools for studying GAs through the conceptual lens of schema processing and we hope that they will deepen the understanding of GAs. Specifically, we hope to be able to explore what makes a function difficult or easy for a GA to optimize, understand how useful it is to see GAs as ‘schema processors’ and perhaps inform the design of better selection, crossover and mutation methods so as to improve the schema processing of a GA.

In the text that follows, the basics of order and lattice theory are introduced first; this theory will be used to define our method for calculating the schemata present in a population. Secondly, schemata are formally defined and the notions of schematic completion and the schematic lattice are introduced. To demonstrate the usefulness of these methods, in sections 4 and 5 we show how these novel notions can be used to study GAs both theoretically and experimentally. In section 4, we show theoretically that the search space for good schemata is a complete lattice and that each generation samples a complete sublattice of this search space. We also find that combining schemata is not a good method to explore the search space of schemata, as in most cases only a subset of the search space can be reached by combining schemata alone. In section 5, we experimentally examine how well various crossover methods combine building blocks. We find that only 25-35% of building blocks in a generation result from the combination of the previous generation’s building blocks. We also find that an increase in building block combinations does not correspond to an increase in the efficiency of the GA. In the appendix of this article, an open source Python package called schematax is introduced, which efficiently calculates the schematic completion and draws the schematic lattice. We encourage readers interested in the following work to exploit this package for their research into GAs.

It should be noted that the mathematical insights presented in this article are not particularly difficult to understand in and of themselves. However to situate schemata in the broader mathematics of order and lattice theory some basic mathematical definitions and proofs are required. If the reader is not acquainted with this subject area we advise them to look simply at the examples and figures in section 3 to intuit the notions of the schematic completion and schematic lattice.

II Background Mathematics

In this section, we cover the background mathematics required to introduce the basic insights into schemata presented in the next section. Many of the following definitions regarding order theory and lattice theory are adapted from Ganter and Wille’s book on Formal Concept Analysis [10] and also Birkhoff’s seminal book on lattice and order theory [4]. If these areas are understood, please skip ahead to the next section. Firstly we cover Order and Lattice theory, secondly, we introduce Closure Systems and Galois Connections.

II-A Order and Lattice theory

Definition II.1.

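A relation $\le$ is called a partial order on a set $M$ if for all $x, y, z \in M$ it satisfies:

  1. reflexivity: $x \le x$

  2. antisymmetry: $x \le y$ and $y \le x \Rightarrow x = y$

  3. transitivity: $x \le y$ and $y \le z \Rightarrow x \le z$

A partially ordered set (poset for short) is a pair $(M, \le)$ with $\le$ being a partial order on the set $M$. We write $x < y$ if $x \le y$ and $x \ne y$. $\ge$ denotes the inverse of $\le$.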

Definition II.2.

A lower neighbour of an element $b \in M$ is another element $a \in M$ with $a < b$ such that there is no element $c \in M$ with $a < c < b$. In this case, $b$ is an upper neighbour of $a$ and we write $a \prec b$ to indicate this. In the literature, it is also said that $b$ covers $a$.

Every finite partially ordered set $(M, \le)$ can be represented by a Hasse diagram. Each element in $M$ is depicted by a circle. For any $a, b \in M$, if $a \prec b$ a line is drawn between the circles representing $a$ and $b$, with $b$ placed above $a$. Using a Hasse diagram one can read off any order relation: $a < b$ iff there is a descending path from $b$ to $a$. Figure 1 shows two Hasse diagrams.
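The edges of a Hasse diagram are exactly the lower-neighbour pairs, so for a small finite poset they can be found by brute force. The sketch below is an illustration only; the function name `covering_pairs` and the example poset (the divisors of 12 ordered by divisibility) are assumptions and do not come from the article.

```python
def covering_pairs(elements, leq):
    """Return all pairs (a, b) such that a is a lower neighbour of b."""
    def lt(a, b):
        return a != b and leq(a, b)

    return {
        (a, b)
        for a in elements
        for b in elements
        # a < b with no element strictly in between
        if lt(a, b) and not any(lt(a, c) and lt(c, b) for c in elements)
    }

# Hypothetical example: divisors of 12, where a <= b means "a divides b".
divisors = [1, 2, 3, 4, 6, 12]
divides = lambda a, b: b % a == 0

edges = covering_pairs(divisors, divides)
print(sorted(edges))
# [(1, 2), (1, 3), (2, 4), (2, 6), (3, 6), (4, 12), (6, 12)]
```

Pairs such as (1, 4) are ordered but are not edges, since 2 lies strictly between them; they are recovered by following a path through the diagram.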

Figure 1: Two Hasse diagrams with 8 elements
Definition II.3.

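A rank function, $r$, over a poset $(M, \le)$ is a function which maps each element in $M$ to the natural numbers such that for $a, b \in M$ the following two properties are satisfied:

  1. $a < b \Rightarrow r(a) < r(b)$

  2. $a \prec b \Rightarrow r(b) = r(a) + 1$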

Definition II.4.

Let $(M, \le)$ be a poset and let $A \subseteq M$. A lower bound of $A$ is an element $s \in M$ with $s \le a$ for all $a \in A$. An upper bound of $A$ can be defined dually. If there is a largest element in the set of all lower bounds of $A$, this element is called the infimum of $A$, denoted $\bigwedge A$. Dually, if there is a smallest element in the set of all upper bounds of $A$, this element is called the supremum of $A$, denoted $\bigvee A$. If $A = \{a, b\}$, the infimum of $A$ is called the meet and is denoted $a \wedge b$; dually, the supremum of $A$ is called the join and is denoted $a \vee b$.

Intuitively, one can think of the supremum of $A$ as the “smallest element in $M$ which is greater than or equal to all elements in $A$”. The infimum of $A$ can be seen as “the largest element in $M$ which is less than or equal to all elements in $A$”.

Definition II.5.

We call an ordered set $(L, \le)$ a lattice if for every $a, b \in L$, $a \wedge b$ and $a \vee b$ exist. We call $(L, \le)$ a complete lattice if for every $A \subseteq L$, the infimum $\bigwedge A$ and the supremum $\bigvee A$ always exist. Every complete lattice $L$ has a largest element, $\bigvee L$, called the unit element of $L$, denoted $1_L$. Dually it has a smallest element, $\bigwedge L$, called the zero element of $L$, denoted $0_L$.

Every complete lattice is of course a lattice. Moreover, every finite non-empty lattice is a complete lattice.
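For a small finite poset, the definitions of supremum, infimum and complete lattice can be checked directly by exhaustive search. The following sketch is illustrative only: the helper names and both example posets (the power set of a hypothetical three-element set, and a four-element poset in which two elements have two incomparable upper bounds) are assumptions rather than the posets drawn in the figures.

```python
from itertools import chain, combinations

def subsets(items):
    """All subsets of `items`, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def supremum(subset, elements, leq):
    """Least upper bound of `subset`, or None if none exists (elements must not be None)."""
    uppers = [u for u in elements if all(leq(x, u) for x in subset)]
    least = [u for u in uppers if all(leq(u, v) for v in uppers)]
    return least[0] if least else None

def infimum(subset, elements, leq):
    """Greatest lower bound of `subset`, or None if none exists."""
    lowers = [d for d in elements if all(leq(d, x) for x in subset)]
    greatest = [d for d in lowers if all(leq(v, d) for v in lowers)]
    return greatest[0] if greatest else None

def is_complete_lattice(elements, leq):
    """True if every subset of the finite poset has a supremum and an infimum."""
    elements = list(elements)
    return all(
        supremum(t, elements, leq) is not None and infimum(t, elements, leq) is not None
        for t in subsets(elements)
    )

# Power set of a hypothetical three-element set, ordered by inclusion.
power_set = [frozenset(s) for s in subsets({1, 2, 3})]
print(is_complete_lattice(power_set, frozenset.issubset))  # True

# Hypothetical non-lattice: a and b both lie below the incomparable c and d.
relation = {("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")}
below = lambda x, y: x == y or (x, y) in relation
print(is_complete_lattice("abcd", below))  # False
```

In the second poset the pair `a`, `b` has two upper bounds but no least one, so the join does not exist.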

Example II.1.

The left Hasse diagram in figure 1 is neither a lattice nor a complete lattice, as the join of the topmost elements does not exist. However, the right Hasse diagram in figure 1 is a complete lattice (and thus a lattice) as the supremum and infimum exist for any subset of elements. Any closed real interval $[a, b]$ with $\le$ under its normal interpretation as an ordering is a complete lattice. However, any unbounded set of real numbers is not a complete lattice, but is a lattice. The power set $\mathcal{P}(X)$ of any non-empty set $X$ with $\subseteq$ as an ordering is an exemplary complete lattice. The Hasse diagram of one such power set lattice is shown in figure 2.

Figure 2: The complete lattice formed by .
Definition II.6.

A subset $U$ of a complete lattice $(L, \le)$ which is closed under suprema and infima, specifically:

$$T \subseteq U \Rightarrow \bigvee T \in U \text{ and } \bigwedge T \in U$$

is called a complete sublattice. If $U$ is only closed under suprema it is called a join-subsemilattice. Dually, if $U$ is closed under infima only, it is called a meet-subsemilattice.

Example II.2.

The poset is a complete sublattice of the complete lattice defined in figure 2. This complete sublattice can be seen in figure 3 below:

Figure 3: The complete lattice . This is a complete sublattice of the complete lattice shown in figure 2.
Definition II.7.

Let $(L, \le)$ be a complete lattice. An element $a \in L$ is called an atom of $L$ if $0_L < a$ and there does not exist an element $x \in L$ such that $0_L < x < a$. $L$ is called atomic if every element $x \ne 0_L$ is an atom or has an atom below it. That is, for every $x \ne 0_L$ there is an atom $a$ with $a \le x$. $L$ is called atomistic if every element in $L$ can be given by the supremum of a subset of the atoms.

Example II.3.

The set of atoms of the complete lattice shown in figure 2 is: . This complete lattice is atomic and atomistic. The complete lattice shown on the right of figure 1 is atomic, however it is not atomistic as the top element cannot be reached by the supremum on any subset of the atoms.

II-B Closure Systems and (monotone) Galois Connections

Definition II.8.

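A closure system on a set $G$ is a set of subsets of $G$ which contains $G$ and is closed under intersection. Specifically, $\mathcal{C} \subseteq \mathcal{P}(G)$ is called a closure system on $G$ if $G \in \mathcal{C}$ and:

$$\mathcal{D} \subseteq \mathcal{C} \Rightarrow \bigcap \mathcal{D} \in \mathcal{C}$$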

Example II.4.

Consider, and let . is a closure system on , as is included in and is closed under intersection as any intersection of elements in is also a member of .

Definition II.9.

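A closure operator on a set $G$ is a map $\varphi: \mathcal{P}(G) \to \mathcal{P}(G)$ which satisfies the following for all $X, Y \subseteq G$:

  1. extensity: $X \subseteq \varphi(X)$.

  2. monotonicity: $X \subseteq Y \Rightarrow \varphi(X) \subseteq \varphi(Y)$.

  3. idempotency: $\varphi(\varphi(X)) = \varphi(X)$.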

Closure operators and closure systems are closely linked, as can be seen in the following theorem.

Theorem II.5.

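If $\varphi$ is a closure operator on a set $G$ then the set:

$$\mathcal{C}_\varphi = \{ \varphi(X) \mid X \subseteq G \}$$

(the set of all closures of a closure operator) is a closure system. Conversely, if $\mathcal{C}$ is a closure system on $G$ then the following operator:

$$\varphi_{\mathcal{C}}(X) = \bigcap \{ A \in \mathcal{C} \mid X \subseteq A \}$$

defines a closure operator.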
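The correspondence between closure systems and closure operators can be checked on a small example. In the sketch below, the ground set, the closure system and all function names are hypothetical choices made for illustration; the closure of a set is computed as the intersection of every member of the system that contains it.

```python
from itertools import combinations

def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def closure_operator(system, ground):
    """Build the closure operator induced by a closure system on `ground`."""
    def closure(x):
        result = frozenset(ground)  # the ground set always contains x
        for member in system:
            if frozenset(x) <= member:
                result &= member
        return result
    return closure

# Hypothetical closure system: contains the ground set and is intersection-closed.
ground = {1, 2, 3}
system = {frozenset(), frozenset({1}), frozenset({2}), frozenset(ground)}
phi = closure_operator(system, ground)

all_subsets = subsets(ground)
extensive = all(x <= phi(x) for x in all_subsets)
monotone = all(phi(x) <= phi(y) for x in all_subsets for y in all_subsets if x <= y)
idempotent = all(phi(phi(x)) == phi(x) for x in all_subsets)
recovered = {phi(x) for x in all_subsets}  # the set of all closures

print(extensive, monotone, idempotent, recovered == system)  # True True True True
```

The last check illustrates the converse direction: collecting all closures of the induced operator gives back the closure system it was built from.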

There is a bijection between closure operators and closure systems. Every closure operator has a corresponding closure system and every closure system has a corresponding closure operator. A closure system can be seen as the set of all closures of a closure operator. What is more, closure systems (and thus closure operators) are closely linked with complete lattices, as will be seen in the next proposition.

Proposition II.6.

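If $\mathcal{C}$ is a closure system then $(\mathcal{C}, \subseteq)$ is a complete lattice, where for $\mathcal{D} \subseteq \mathcal{C}$ the infimum, $\bigwedge \mathcal{D}$, is given by $\bigcap \mathcal{D}$ and the supremum, $\bigvee \mathcal{D}$, is given by $\varphi_{\mathcal{C}}(\bigcup \mathcal{D})$. Every complete lattice is isomorphic to the lattice of all closures of a closure system.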

Example II.7.

The complete lattice formed by the closure system appearing in the example above is seen in figure 4, below:

Figure 4: The complete lattice formed by the closure system with as an ordering.
Definition II.10.

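Suppose we have two partially ordered sets, $(A, \le)$ and $(B, \le)$. Two monotone functions over these sets, $F: A \to B$ and $G: B \to A$, are called a Galois connection of $A$ and $B$ if we have for all $a \in A$ and $b \in B$:

$$F(a) \le b \iff a \le G(b)$$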

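In this case, $F$ is called the lower adjoint and $G$ the upper adjoint. Equivalently, if the monotone functions $F$ and $G$ satisfy the following conditions they also form a Galois connection. For all $a, a' \in A$ and all $b \in B$ we have:

  1. $a \le a' \Rightarrow F(a) \le F(a')$

  2. $a \le G(F(a))$

  3. $a \le G(b) \Rightarrow F(a) \le b$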

Proposition II.8.

The composition $G \circ F$ is a closure operator.

III Schemata

In this section we define schemata and two basic operations, the expansion and the compression. Using these operators, we then define the notions of the schematic completion and the schematic lattice and prove properties about these notions.
Let $\Sigma$ be a finite alphabet which does not contain the symbol $*$. We use $\Sigma^l$ to denote the set of all words of length $l$ over $\Sigma$.

Definition III.1.

The schematic alphabet of $\Sigma$ is $\Sigma$ with an extra symbol, $*$, the wild card symbol. We use $\Sigma_*$ to denote the schematic alphabet of $\Sigma$, that is $\Sigma_* = \Sigma \cup \{*\}$. Symbols in $\Sigma_*$ which are not the wild card symbol are called fixed symbols.

Definition III.2.

A schema is a word over $\Sigma_*$. We use $\Sigma_*^l$ to denote all schemata of length $l$ over $\Sigma_*$ including the empty schema, $\epsilon$.

Example III.1.

Let be the binary alphabet, that is . The schematic alphabet of , denoted , is the alphabet . An example of a schema in is .

Definition III.3.

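For any schema $s \in \Sigma_*^l$ we define the following operator $\uparrow$, called the expansion of $s$, which maps $s$ to a subset of words in $\Sigma^l$:

$$\uparrow s = \{ w \in \Sigma^l \mid w_i = s_i \text{ or } s_i = * \text{ for all } i \in \{1, \dots, l\} \}$$

where subscript $i$ denotes the character at position $i$ in a word or schema. When $s = \epsilon$ then $\uparrow s = \emptyset$. More simply put, $\uparrow s$ is the set of all words in $\Sigma^l$ that can be made by exchanging the $*$ symbols in $s$ with symbols from $\Sigma$.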

Example III.2.

Continuing the example above, . The in the first position is fixed. Note and .

Definition III.4.

Conversely, for any $A \subseteq \Sigma^l$ we define $\downarrow$, called the compression of $A$, which maps $A$ on to a schema:

$$\downarrow A = s$$

where $s$ is a schema of length $l$ such that the symbol at position $i$ in $s$ is determined in the following way: if $x_i = y_i$ for all $x, y \in A$ then $s_i = x_i$, otherwise $s_i = *$. If $A = \emptyset$ then $\downarrow A = \epsilon$. One can think of this operator as stacking up all the items in $A$: if all elements in a column are equal, the symbol at that position in $s$ takes this value, otherwise there is a wild card symbol.

Example III.3.

Let then . Note if then . If then
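The expansion and compression operators translate almost directly into code. The sketch below is a plain-Python illustration and is not the schematax implementation; the function names, the use of `None` to stand for the empty schema, and the example words are assumptions.

```python
from itertools import product

WILDCARD = "*"

def expand(schema, alphabet="01"):
    """Expansion: the set of words obtained by filling in every wild card."""
    if schema is None:  # None stands for the empty schema
        return set()
    choices = [alphabet if symbol == WILDCARD else symbol for symbol in schema]
    return {"".join(word) for word in product(*choices)}

def compress(words):
    """Compression: stack the words and keep a symbol only where a column agrees."""
    words = list(words)
    if not words:
        return None  # the empty set compresses to the empty schema
    return "".join(
        column[0] if len(set(column)) == 1 else WILDCARD
        for column in zip(*words)
    )

print(sorted(expand("1*0")))     # ['100', '110']
print(compress(["100", "110"]))  # 1*0
print(compress(["00", "01", "10", "11"]))  # **
```

Compressing the expansion of a schema gives the schema back, while expanding the compression of a set of words can only enlarge that set.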

Definition III.5.

Schemata can be ordered. For any $s, t \in \Sigma_*^l$ we say $s \le t$ if and only if $\uparrow s \subseteq \uparrow t$. It follows from the reflexivity, antisymmetry and transitivity of the subset relation that $\le$ is a partial ordering on a set of schemata.

Example III.4.

Again let . Consider the following schema in : , , , . They are ordered in the following way: . This is because .

Definition III.6.

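It is possible to define compression in terms of expansion:

$$\downarrow A = s$$

such that $A \subseteq \uparrow s$ and for any $t \in \Sigma_*^l$:

$$A \subseteq \uparrow t \Rightarrow s \le t$$

That is, $\downarrow A$ is the schema whose expansion includes $A$ and is the smallest such schema to do so.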

Definition III.7.

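Conversely we can define expansion in terms of compression:

$$\uparrow s = A$$

such that $\downarrow A = s$ and for any $B \subseteq \Sigma^l$ we have:

$$\downarrow B = s \Rightarrow B \subseteq A$$

That is, $\uparrow s$ is the largest subset of words whose compression is equal to $s$.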

We will soon see that definitions III.3 and III.4 of the expansion and compression operators are useful computationally while definitions III.6 and III.7 are useful in proving properties about schemata.

Proposition III.5.

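For any schema $s \in \Sigma_*^l$, we have $\downarrow \uparrow s = s$.

Proof.

Let $A = \uparrow s$. Definition III.6 trivially yields $\downarrow A = s$, thus $\downarrow \uparrow s = s$. ∎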

Proposition III.6.

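For $A \subseteq \Sigma^l$, we have $A \subseteq \uparrow \downarrow A$.

Proof.

Let $s = \downarrow A$. Definition III.7 trivially yields $A \subseteq \uparrow s$, thus $A \subseteq \uparrow \downarrow A$. ∎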

Proposition III.7.

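Compression is monotonic, that is for $A, B \subseteq \Sigma^l$:

$$A \subseteq B \Rightarrow \downarrow A \le \downarrow B$$

Proof.

Assume $A \subseteq B$; we will show $\downarrow A \le \downarrow B$. Proposition III.6 gives $B \subseteq \uparrow \downarrow B$. Since $A \subseteq B$, the transitivity of the subset relation yields $A \subseteq \uparrow \downarrow B$. Let $s = \downarrow A$ and $t = \downarrow B$. Since we have $A \subseteq \uparrow t$, definition III.6 applied to $s$ yields $s \le t$. Thus we have $\downarrow A \le \downarrow B$. ∎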

Proposition III.8.

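For $A \subseteq \Sigma^l$ and $s \in \Sigma_*^l$ we have:

$$A \subseteq \uparrow s \Rightarrow \downarrow A \le s$$

Proof.

We will assume $A \subseteq \uparrow s$ and show $\downarrow A \le s$. Proposition III.7 gives us $\downarrow A \le \downarrow \uparrow s$. Proposition III.5 yields $\downarrow \uparrow s = s$, and hence $\downarrow A \le s$. ∎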

Lemma III.9.

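The compression and expansion operators form a Galois connection, where $\downarrow$ is the lower adjoint and $\uparrow$ the upper adjoint.

Proof.

Let $A, B \subseteq \Sigma^l$ and $s \in \Sigma_*^l$. From definition II.10, it is sufficient to show:

  1. $A \subseteq B \Rightarrow \downarrow A \le \downarrow B$

  2. $A \subseteq \uparrow \downarrow A$

  3. $A \subseteq \uparrow s \Rightarrow \downarrow A \le s$

1) was shown in proposition III.7, 2) in proposition III.6 and 3) in proposition III.8. ∎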
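As a sanity check on the lemma, the defining property of a Galois connection (the compression of a set of words lies below a schema exactly when the set is contained in the schema's expansion) can be verified exhaustively for a tiny case. The sketch below assumes a binary alphabet and words of length 2; the function names and the use of `None` for the empty schema are illustrative choices, and a brute-force check of one small case is of course not a proof.

```python
from itertools import combinations, product

WILDCARD = "*"
ALPHABET = "01"  # assumed binary alphabet
LENGTH = 2       # assumed word length, kept tiny so every case can be enumerated

def expand(schema):
    """Expansion of a schema; None stands for the empty schema."""
    if schema is None:
        return set()
    choices = [ALPHABET if c == WILDCARD else c for c in schema]
    return {"".join(w) for w in product(*choices)}

def compress(words):
    """Compression of a set of words into the smallest matching schema."""
    words = list(words)
    if not words:
        return None
    return "".join(col[0] if len(set(col)) == 1 else WILDCARD for col in zip(*words))

def leq(s, t):
    """Order on schemata: s <= t iff the expansion of s is contained in that of t."""
    return expand(s) <= expand(t)

words = ["".join(w) for w in product(ALPHABET, repeat=LENGTH)]
schemata = [None] + ["".join(s) for s in product(ALPHABET + WILDCARD, repeat=LENGTH)]
word_sets = [set(c) for r in range(len(words) + 1) for c in combinations(words, r)]

# compress(A) <= s  if and only if  A is a subset of expand(s)
galois_holds = all(
    leq(compress(a), s) == (a <= expand(s))
    for a in word_sets
    for s in schemata
)
print(galois_holds)  # True
```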

Definition III.8.

For a set $A \subseteq \Sigma^l$, we call the process of calculating the compression on each subset of $A$, that is $\{ \downarrow B \mid B \subseteq A \}$, the schematic completion of $A$, denoted $\mathcal{S}(A)$.

Example III.10.

Let and the schematic completion of , results in the following set:

For example, the schema comes from the compression on the subset .
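The schematic completion can be computed naively by compressing every subset of the population. The sketch below does exactly that for a hypothetical three-word population, which is not necessarily the set used in the example above; the function names are assumptions. This brute-force version takes time exponential in the population size, so it is only suitable for small illustrations and is not a description of how schematax computes the completion.

```python
from itertools import combinations

WILDCARD = "*"

def compress(words):
    """Compression of a set of words; None stands for the empty schema."""
    words = list(words)
    if not words:
        return None
    return "".join(col[0] if len(set(col)) == 1 else WILDCARD for col in zip(*words))

def schematic_completion(words):
    """Compress every subset of `words` (exponential in the number of words)."""
    words = sorted(set(words))
    return {
        compress(subset)
        for r in range(len(words) + 1)
        for subset in combinations(words, r)
    }

# Hypothetical population of three binary words.
population = ["110", "101", "011"]
completion = schematic_completion(population)
print(len(completion))  # 8
```

For this population the completion consists of the empty schema, the three words themselves, the schemata `1**`, `*1*` and `**1` from the three pairs, and `***` from the whole population.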

Theorem III.11.

(The fundamental theorem of schemata) For any , the schematic completion of , ordered by forms a complete lattice, that is the poset is a complete lattice. We call this lattice the schematic lattice of . Let , the supremum, , is given by . The infimum, , is given by .

Proof.

Lemma III.9 tells us and form a Galois connection, where is the lower adjoint and is the upper adjoint. As such, proposition II.8 yields as a closure operator. Hence, from Theorem II.5, is a closure system. Thus, the poset forms a complete Lattice (proposition II.6). Proposition II.6 also yields for , the supremum is given by and the infimum, . From the definition of ordering over schemata, we then have as a complete lattice and the infimum, , is given by . ∎

It is easy to check that the atoms of the schematic lattice are exactly the words in $A$ and that the lattice is atomistic.

Example III.12.

Continuing the above example, the schematic lattice formed from schematic completion on A can be seen in figure 5.

Figure 5: The schematic lattice formed by the schematic completion on the set ordered by , that is the complete lattice
Example III.13.

Of course, the schematic completion is not restricted to words over the binary alphabet. Let and consider the set :

The schematic completion of this set gives us the schematic lattice in figure 6.

Figure 6: The complete lattice formed from the schematic completion on , that is . Notice, as each word in has an in the 3rd position the unit element of this lattice is .

We can more precisely define some of the original properties of schemata using the above definitions.

Definition III.9.

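The order (not to be confused with partial order) of a schema $s \in \Sigma_*^l$ is the number of fixed symbols in $s$ and is denoted $o(s)$. The order of $s$ can be equivalently defined as:

$$o(s) = l - \log_{|\Sigma|} |\uparrow s|$$

Similarly the antiorder of $s$, denoted $\bar{o}(s)$, is the number of wild card symbols in $s$, which can be defined as:

$$\bar{o}(s) = \log_{|\Sigma|} |\uparrow s|$$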

Example III.14.

Let s = . We can count the number of fixed symbols to give us . Equivalently .
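The two ways of obtaining the order can be compared numerically. Since each wild card multiplies the number of matching words by the alphabet size, the size of the expansion is the alphabet size raised to the number of wild cards, so the antiorder is a logarithm of the expansion size and the order is the length minus that logarithm. The sketch below assumes this relationship; the function names and example schemata are hypothetical, and an integer logarithm is used to avoid floating-point rounding.

```python
from itertools import product

WILDCARD = "*"

def expand(schema, alphabet="01"):
    """Expansion of a non-empty schema over the given alphabet."""
    choices = [alphabet if c == WILDCARD else c for c in schema]
    return {"".join(w) for w in product(*choices)}

def order(schema):
    """Order by direct counting of fixed symbols."""
    return sum(c != WILDCARD for c in schema)

def antiorder(schema):
    """Antiorder by direct counting of wild cards."""
    return schema.count(WILDCARD)

def integer_log(n, base):
    """Exact logarithm of n in the given base, assuming n is a power of base."""
    exponent = 0
    while n > 1:
        n //= base
        exponent += 1
    return exponent

def antiorder_from_expansion(schema, alphabet="01"):
    return integer_log(len(expand(schema, alphabet)), len(alphabet))

def order_from_expansion(schema, alphabet="01"):
    return len(schema) - antiorder_from_expansion(schema, alphabet)

print(order("1*0*"), order_from_expansion("1*0*"))             # 2 2
print(antiorder("a**"), antiorder_from_expansion("a**", "abc"))  # 2 2
```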

Proposition III.15.

If we set, then is a rank function over schemata.

Proof.

To show is a rank function over schemata, it is sufficient to show for :

First we will show 1). Assume , using the definition of ordering we have . It follows then that , and thus . Now we show 2). Assume we then have with , thus . ∎

Corollary III.15.1.

Using the same method, it is possible to show that the order of a schema is a dual rank function over schemata if we make . That is for :

Definition III.10.

A word $w \in \Sigma^l$ is said to be an instance of a schema $s$ if and only if $w \in \uparrow s$.

Example III.16.

The word is an instance of the schema as

We now introduce some novel properties of schemata not originally described in previous works.

Definition III.11.

For some $A \subseteq \Sigma^l$, the confidence of a schema $s$ is given as:

$$\frac{|A \cap \uparrow s|}{|\uparrow s|}$$

In more simple terms, the confidence of $s$ is the proportion of $\uparrow s$ that is found in $A$.

The confidence of a schema is useful in GAs for understanding how confident one can be in the fitness assigned to a schema. In particular, the more instances of a schema a population contains, the more we can trust its fitness.
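A short sketch of the confidence computation follows. It assumes the reading in which confidence is the fraction of a schema's possible instances that actually occur in the population; the function names, the use of `fractions.Fraction` and the example population are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

WILDCARD = "*"

def expand(schema, alphabet="01"):
    """Expansion of a schema; None stands for the empty schema."""
    if schema is None:
        return set()
    choices = [alphabet if c == WILDCARD else c for c in schema]
    return {"".join(w) for w in product(*choices)}

def confidence(schema, population, alphabet="01"):
    """Fraction of the schema's possible instances present in the population."""
    instances = expand(schema, alphabet)
    if not instances:
        return None  # undefined for the empty schema
    return Fraction(len(instances & set(population)), len(instances))

# Hypothetical population of three binary words.
population = ["110", "101", "011"]
print(confidence("1**", population))  # 1/2
print(confidence("***", population))  # 3/8
print(confidence("110", population))  # 1
```

Under this reading, a confidence of 1 means every instance of the schema has been sampled, so the schema's observed fitness is exact rather than an estimate.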

The following lemmas and proposition are useful for computations over schemata as they allow us to determine the ordering of schemata by considering only the characters and wild cards rather than having to compute the expansion explicitly.

Lemma III.17.

Let , and . if and only if for all we have:

Proof.

This follows directly from the definition of compression. Let . If is empty, then clearly for all , otherwise, for any and any , can either differ from , thus or can equal , thus . ∎

Proposition III.18.

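For $s, t \in \Sigma_*^l$ with $s, t \ne \epsilon$, $s \le t$ if and only if for all $i \in \{1, \dots, l\}$ we have $s_i = t_i$ or $t_i = *$.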

Proof.

Let be any schema in .
() We will assume and show for all . From the definition of order over schema we have , let , , since lemma III.18 yields

for all . Proposition III.5 gives us and , meaning and , thus:

for all i.
() We will assume or and show . From Lemma III.18, there exist with and such that .
Proposition III.6 yields: . Since we have and , the transitivity of the subset relation yields . As , we have . ∎
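The computational benefit of a position-wise test is that two schemata can be compared in time linear in their length, without enumerating expansions. The sketch below assumes the position-wise criterion is that every symbol of the smaller schema either equals the corresponding symbol of the larger schema or faces a wild card in it, and checks that this agrees with the expansion-based definition for all non-empty binary schemata of length 3. Function names are hypothetical.

```python
from itertools import product

WILDCARD = "*"

def expand(schema, alphabet="01"):
    """Expansion of a non-empty schema."""
    choices = [alphabet if c == WILDCARD else c for c in schema]
    return {"".join(w) for w in product(*choices)}

def leq_by_expansion(s, t):
    """Definition of the order: compare the (exponentially large) expansions."""
    return expand(s) <= expand(t)

def leq_positionwise(s, t):
    """Linear-time test: each symbol of s equals t's symbol or t has a wild card."""
    return all(a == b or b == WILDCARD for a, b in zip(s, t))

schemata = ["".join(s) for s in product("01" + WILDCARD, repeat=3)]
agree = all(
    leq_positionwise(s, t) == leq_by_expansion(s, t)
    for s in schemata
    for t in schemata
)
print(agree)                          # True
print(leq_positionwise("10*", "1**"))  # True
print(leq_positionwise("1**", "10*"))  # False
```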

This concludes our order-theoretic interpretation of schemata. However, for brevity’s sake, many interesting properties of schemata have not been mentioned here. One example is the link to an area of mathematics called Formal Concept Analysis, which is concerned with finding concept hierarchies in object-feature relationships [10]. Another is that, as schemata are a (small) subset of regular expressions, the schematic completion is akin to the induction of regular languages [18, 22], where the search space is known to be a complete lattice [8].

In the subsequent section it is shown how these simple insights into schemata, specifically the schematic completion and the schematic lattice, can be used to study how combining schemata explores the search space.

IV How does combining schemata explore the search space?

It is common to visualize the search space of GAs as a hypercube in which each schema defines a hyperplane [31]. In this section, we offer a complementary view based on the mathematics in the previous section. We show that the search space of schemata is a complete lattice and that each generation of a GA samples a complete sublattice of this complete lattice. Finally, we consider how combining schemata explores the search space.

For a GA working on binary strings (that is, $\Sigma = \{0, 1\}$) of size $l$, the search space of all schemata is the set $\Sigma_*^l$. If we order this set by $\le$, the search space is revealed to be the complete lattice $(\Sigma_*^l, \le)$. We could also construct the search space by using the schematic completion on the set of all possible words of length $l$, meaning:

$$\mathcal{S}(\Sigma^l) = \Sigma_*^l$$

Figure 8 shows the search space of schemata for GAs working on binary strings of length $3$.

Given a generation at time of a GA, the schematic completion on , , yields at least a subset of the schemata being tested by . However, it is still unclear if the schematic completion on yields all the schemata being tested by . To explore this possible limitation, consider the population

The schematic completion on returns the set:

Which forms the schematic lattice shown below in figure 7.

Figure 7: The complete lattice formed from the schematic completion on the population . That is, the lattice

Is the set of all schemata being tested by population ? It is possible to argue that the schema is being tested by as the individuals and both end in yet the schematic completion on does not yield . Thus it follows that the schematic completion does not return the complete set of all the schemata being tested by . However, we can see that whenever an individual ends in , it also begins with . Thus cannot be tested without having a in the beginning (as both and begin with a ), hence the schema which is being sampled by this population is which does appear in the schematic lattice. Indeed, as the fitness of a schema is given by the average of it’s instances the fitness of and have the same fitness, however is the accurate description of the schema being tested. A similar argument can be made for any schema which appears to be omitted, thus, the schematic completion returns all schemata being sampled by . Each generation of a GA then defines a set of schemata, , which is the set of schemata being sampled by . When is ordered by , it is a complete sublattice of the search space .

By calculating the schematic completion on the population for each generation, one can observe how schemata change during the course of a GA. There are many natural experiments and questions which can be examined using this method (for example, one could test how well various schema theorems apply to a GA with a finite population size), however in this section we will focus on the more fundamental question of how combining schemata explores the search space.

It is proposed that a GA combines ‘good’ schemata, however it is unknown how directly combining schemata explores the space of all schemata. To investigate, a method is required to identify when a schema results from the combination of a set of schemata. To do this we introduce the notion of schematic blending.

Definition IV.1.

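For $T \subseteq \Sigma_*^l$ the schematic blending of $T$, denoted $\mathrm{blend}(T)$, is the schema given by

$$\mathrm{blend}(T) = \downarrow \bigcap_{t \in T} \uparrow t$$

If $\mathrm{blend}(T)$ returns the empty schema then $T$ is said to be unblendable, otherwise $T$ is said to be blendable. Given a set of schemata $S$, the set of all schematic blends in $S$, that is $\{ \mathrm{blend}(T) \mid T \subseteq S \}$, is denoted $\mathcal{B}(S)$.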

Example IV.1.

, and are unblendable, .
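One way to realise schematic blending in code is to intersect the expansions of the schemata and compress the result, so that schemata with conflicting fixed symbols blend to the empty schema. The sketch below assumes that reading of the definition; the function names, the use of `None` for the empty schema and the example schemata are hypothetical and are not necessarily those of the example above.

```python
from itertools import product

WILDCARD = "*"

def expand(schema, alphabet="01"):
    """Expansion of a schema; None stands for the empty schema."""
    if schema is None:
        return set()
    choices = [alphabet if c == WILDCARD else c for c in schema]
    return {"".join(w) for w in product(*choices)}

def compress(words):
    """Compression of a set of words; the empty set gives the empty schema."""
    words = list(words)
    if not words:
        return None
    return "".join(col[0] if len(set(col)) == 1 else WILDCARD for col in zip(*words))

def blend(schemata, alphabet="01"):
    """Compress the words common to every schema's expansion."""
    schemata = list(schemata)
    if not schemata:
        raise ValueError("blend needs at least one schema")
    common = set.intersection(*(expand(s, alphabet) for s in schemata))
    return compress(common)

print(blend(["1**", "*0*"]))  # 10*
print(blend(["1**", "0**"]))  # None, so this pair is unblendable
```

In the first case the blend fixes every position fixed by either input, which is why its order is at least that of each schema being blended.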

It is clear that the order of the schematic blend of $T$ (if it is not empty) is greater than or equal to the order of any member of $T$. In addition, when performing the schematic blend on schemata which result from the schematic completion on a set of words, it is common for the blend to result in schemata which already exist in the schematic completion. Indeed, the following lemma explores this idea and is useful for understanding how combining schemata explores the schema space.

Lemma IV.2.

(The schematic blending lemma) For a set of schemata $T \subseteq \Sigma_*^l$, $\mathrm{blend}(T)$ returns the largest schema $s$ such that $s \le t$ for all $t$ in $T$.

Proof.

Let . Then we have:

Let:

Thus from the definition of compression we have as the largest schema with:

and from the definition of partial ordering for schemata we have:

In simple terms this lemma says: blending a set of schemata returns the largest schema which is smaller than all elements in . In this way the schematic blending is similar to the infimum operator over the schematic lattice. However, instead of returning a schema , a schema is required.

So far we understand this much: given a generation at time , of a genetic algorithm, the schemata being tested by this generation are the schematic completion of , , which samples a complete sub-lattice of the lattice of all possible schemata. The set of schemata which can be reached by blending these schemata is then . However, the schematic blending lemma tells us that blending only searches the spaces in between the layers of the lattice, and not ‘sideways’ or upwards. What is more, schematic blending is idempotent, that is: . Thus, by combining schemata alone, only a search over the lower neighbours of pre-existing schemata can be performed, meaning only a subset of the space of all schemata can be reached. Figure 8 demonstrates how blending explores the space of all schemata and is a good visual summary of the results from this section. We therefore conclude that blending schemata alone is not a good tool for exploring the space of all schemata.
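The idempotency claim can be checked on a toy example. The sketch below assumes, as a reading of the lost definition rather than a quotation of it, that a blend is taken position-wise and that the set of all blends ranges over every non-empty subset (singletons included). The helper names are hypothetical, and the enumeration is exponential, so it only suits very small sets.

```python
from itertools import combinations
from typing import Iterable, Optional, Sequence, Set

WILDCARD = "*"  # assumed wildcard symbol


def blend(members: Sequence[str]) -> Optional[str]:
    """Assumed position-wise blend; None stands in for the empty schema."""
    blended = []
    for symbols in zip(*members):
        fixed = set(symbols) - {WILDCARD}
        if len(fixed) > 1:
            return None  # two members disagree on a fixed symbol
        blended.append(fixed.pop() if fixed else WILDCARD)
    return "".join(blended)


def all_blends(schemata: Iterable[str]) -> Set[str]:
    """Blend every non-empty subset and keep the non-empty results."""
    pool = sorted(set(schemata))
    blends = set()
    for size in range(1, len(pool) + 1):
        for subset in combinations(pool, size):
            result = blend(subset)
            if result is not None:
                blends.add(result)
    return blends


sampled = {"1**", "*1*", "0**"}
reachable = all_blends(sampled)
# Blending the blends again reaches nothing new (idempotency).
assert all_blends(reachable) == reachable
# A 'sideways' schema such as **1 is never produced from this set.
assert "**1" not in reachable
```

Here the reachable set is the three sampled schemata plus `11*` and `01*`, each of which only adds fixed symbols to schemata that were already present.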

Figure 8: The search space for all schemata of length three arranged as a complete lattice. The schemata surrounded by bold circles are those sampled by the generation , that is the elements in the complete sub-lattice . The schemata in dotted circles are those which cannot be reached through any combination of schemata in , while the plain circles show the schemata which can be reached through the combination of schemata in . This figure was made using the schematax package.


It is proposed that the power of GAs comes through the combination of a particular type of schemata, namely building blocks. However, as combining schemata in general is not a good exploratory tool, this casts serious doubt on how useful combining building blocks is as a search tool. Specifically, if a building block is not ‘enclosed’ within the schematic lattice defined by the initial generation of a GA, combining schemata alone will not discover it. This suggests that the disruption and construction of schemata through the imperfect combination of schemata (via crossover) and mutation may play a vital role in allowing a greater exploration of the search space. This contrasts with the traditional view of GAs, where the disruption of schemata via crossover is seen as a nuisance because it hinders the combination of good schemata [31, 30]. There has been some work, however, which suggests that the construction of novel schemata through crossover plays a useful exploratory role [23].

In the next section, we demonstrate how the concepts of the schematic completion and the schematic lattice can be used experimentally to observe the schema processing performed by a GA.

V Observing building blocks

In this section, we use the schematic completion to observe the building blocks during the course of a GA. First, however, to study the building block hypothesis using the above methods, we must define building blocks more precisely. In Holland’s framing (definition I.1), the phrases ‘low order’ and ‘low defining length’ could refer to an absolute value, such that ‘low order’ schemata are schemata whose order is less than say . However, for the purpose of this article we take ‘low’ to be relative to the given generation, so that ‘low order’ refers to schemata with below average order and, similarly, ‘low defining length’ refers to schemata with below average defining length. Building blocks are then redefined as follows:

Definition V.1.

A building block is a schema with below average order, below average defining length and above average fitness. (Footnote 1: It should be noted that when computing the average order, fitness and defining length of schemata we do not include the empty schema nor schemata with no wildcards, i.e. the individuals of the population.)

Using this definition, one can find the building blocks present in a given generation by first using the schematic completion on the generation to find all schemata being tested, and then filtering out the building blocks using the definition above. In our framing, the BBH is a statement about the map from a schematic lattice in generation to the schematic lattice in generation . In particular, the BBH states that the building blocks in generation should to some degree result from the combination of building blocks in generation .
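The filtering step can be sketched in Python as follows. The sketch assumes that the fitness of a schema is the mean fitness of the individuals in the population that match it (a common convention, not stated explicitly here), that defining length is the distance between the outermost fixed positions, and that the empty schema is simply absent from the input. All function names are hypothetical and are not part of schematax.

```python
from statistics import mean
from typing import Callable, List, Sequence

WILDCARD = "*"  # assumed wildcard symbol


def order(schema: str) -> int:
    return sum(1 for s in schema if s != WILDCARD)


def defining_length(schema: str) -> int:
    """Distance between the first and last fixed positions (0 if none)."""
    fixed = [i for i, s in enumerate(schema) if s != WILDCARD]
    return fixed[-1] - fixed[0] if fixed else 0


def matches(schema: str, word: str) -> bool:
    return all(s == WILDCARD or s == w for s, w in zip(schema, word))


def schema_fitness(schema: str, population: Sequence[str],
                   fitness: Callable[[str], float]) -> float:
    """Assumed schema fitness: mean fitness of the matching individuals."""
    return mean(fitness(w) for w in population if matches(schema, w))


def building_blocks(schemata: Sequence[str], population: Sequence[str],
                    fitness: Callable[[str], float]) -> List[str]:
    # Following the footnote to Definition V.1, schemata without wildcards
    # (the individuals themselves) are excluded from the averages.
    candidates = [s for s in schemata
                  if WILDCARD in s and any(matches(s, w) for w in population)]
    if not candidates:
        return []
    fit = {s: schema_fitness(s, population, fitness) for s in candidates}
    avg_order = mean(order(s) for s in candidates)
    avg_length = mean(defining_length(s) for s in candidates)
    avg_fitness = mean(fit.values())
    return [s for s in candidates
            if order(s) < avg_order
            and defining_length(s) < avg_length
            and fit[s] > avg_fitness]
```

For example, with the population `["110", "111", "010", "000"]`, the all ones fitness and the schemata `["11*", "1**", "*1*", "0*0", "**0"]`, only `1**` and `*1*` pass all three tests.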

However, before we examine how well building blocks are combined using the above updated version of the building block hypothesis and the notion of schematic blending, it is first interesting to examine how the average order and defining length of building blocks change over the course of a GA. If the order and defining length of building blocks increase, it suggests that building blocks are getting larger and more clumped together (as suggested by the BBH). To investigate, we consider the Canonical GA [11] (Footnote 2: The Canonical GA is a binary GA with roulette wheel selection, single point crossover and mutation.) solving the all ones problem (where the fitness of an individual is the number of ones found in the string). We use binary strings of length and pick the mutation rate to be and the population size as . We run the GA for generations and, for each generation, we calculate the schematic completion of its population to yield all the schemata being tested by that generation, filter out the building blocks using Definition V.1, and then calculate the average order and defining length of the set of building blocks. We plot the results in Figure 9, averaged over simulations, where each simulation is started with a random initial population. In addition, to give the reader an indication of what the building blocks may look like for the all ones problem, we display three building blocks from each of generations 0, 40, 80 and 120 of one simulation in Figure 9. As can be seen in Figure 9, the order and defining length of the building blocks do indeed increase during the course of a GA. It seems, at least for the all ones problem, that the defining length and order of the building blocks quickly increase, and then level out around generation 60.
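A minimal Python sketch of such a GA on the all ones problem is given below. It follows the description of the Canonical GA (roulette wheel selection, single point crossover, bit-flip mutation), but the details are assumptions: one child is produced per crossover, selection falls back to a uniform choice when every fitness is zero, and the parameter values in the usage line are arbitrary placeholders rather than the settings used in the experiment.

```python
import random
from typing import List


def all_ones_fitness(individual: str) -> int:
    """Fitness for the all ones problem: the number of ones in the string."""
    return individual.count("1")


def roulette_select(population: List[str], rng: random.Random) -> str:
    """Pick one individual with probability proportional to its fitness."""
    weights = [all_ones_fitness(w) for w in population]
    if sum(weights) == 0:
        return rng.choice(population)  # avoid an all-zero weight vector
    return rng.choices(population, weights=weights, k=1)[0]


def single_point_crossover(a: str, b: str, rng: random.Random) -> str:
    point = rng.randint(1, len(a) - 1)  # cut strictly inside the string
    return a[:point] + b[point:]


def mutate(individual: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in individual
    )


def run_canonical_ga(length: int, population_size: int, mutation_rate: float,
                     generations: int, seed: int = 0) -> List[List[str]]:
    """Return the list of populations, one per generation (initial included)."""
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(population_size)]
    history = [population]
    for _ in range(generations):
        population = [
            mutate(single_point_crossover(roulette_select(population, rng),
                                          roulette_select(population, rng),
                                          rng),
                   mutation_rate, rng)
            for _ in range(population_size)
        ]
        history.append(population)
    return history


# Placeholder parameters, chosen only to keep the example small.
history = run_canonical_ga(length=8, population_size=6,
                           mutation_rate=0.05, generations=5)
```

Each entry of `history` is one generation, so the schematic completion and the building block filter can be applied to every entry in turn to track average order and defining length over time.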

The results in figure 9 hint at the BBH being implemented by the GA. However, it is still unclear if and how many building blocks are being explicitly combined by the genetic operators of a GA. Perhaps the results in figure 9 are not best explained by building blocks being combined, but are instead what is to be expected when a population becomes less random. In particular, when a set of words is more random, the average order and defining length of the schemata will be lower because the words ‘agree’ less. As such, the less random the population becomes (as is the case in a GA working on a reasonable fitness function), the more the words ‘agree’, and thus the resulting schemata will have higher order and defining length. It is interesting to note that the building blocks which are found are far less neat than those suggested by [19] in the royal road fitness functions.

(a) The average order and defining length of building blocks during the course of a GA (averaged over simulations). In this plot, the dashed line represents the average defining length of the building blocks while the solid line represents the average order of the building blocks.

Generation 0: ***1*****11***************************************************** ***1***************0*******************1************************ ********************************0**1****************************
Generation 40: 1*101*01011*1***1*******111**1*1*01011***111*1**100*111111*0*1*1 1*101*01011*1***1********110*111*0*0111**111*1**100*111111*0*1*1 1*101*01011*1***1********110*111*0*0*11**111*1**100*111111*0*1*1
Generation 80: 11*01*0101**110***0*111*01***1*110**11*10*11*11**00**11****0*1** 11*01*0101**110***0*111**1***1*110**11*10*11*11**00**11****0*1** 11*01**10*101100****111**1**11*1*****1110*11*11***0**11****0*1**
Generation 120: *1*****1*1***1**110011110*11*01*1**1****0******0***1*1111*01*111 *1*****1*11**1**110011110*11*01*1**11***0******00**1*1111*01*111 *1*****1*11**1**110011110*11*01*1**1*1**0******00**1*1111*01*111

(b) A selection of three building blocks from generations: 0, 40, 80 and 120.
Figure 9:


For our second experiment, we use the concept of schematic blending to examine exactly how many building blocks from a generation result from the combination of building blocks in the previous generation. We test how well building blocks are combined using different crossover methods. The GA is set up the same as above; however, to keep the calculation of the schematic completion and blends tractable, we use a population of size 12 working over individuals of length 16. We calculate the building blocks of a generation as before, and then find the set of all schematic blends on those building blocks. It is then checked what percentage of the building blocks in the next generation are members of this set. Figure 10 displays the results, which in this case are averaged over 100 simulations. On average, only approximately 25-35% of building blocks from a generation are created by the combination of building blocks from the previous generation in the case of 1 to 9 point crossover as well as uniform crossover (UX). Interestingly, it is proposed that UX disturbs the combination of building blocks compared to traditional crossover methods [26, 24]; however, we find that it combines building blocks as well as other crossover methods. In general, the reasons why building blocks are not combined optimally are several and mostly well known: first, individuals which are instances of building blocks are not guaranteed to be selected by roulette wheel selection for crossover, so those building blocks cannot be blended; second, mutation can disrupt the blending of building blocks if a fixed symbol is mutated after crossover; third, crossover is not guaranteed to blend schemata if a suboptimal crossover point is chosen. Thus, we can conclude that combining building blocks from one generation only accounts for approximately 25-35% of the building blocks in the next generation; the remaining building blocks are created by other means.

Probabilistic crossover (PX), which is similar to UX but chooses bits using a weighted probability proportional to the fitness of the parents, combines building blocks the most effectively (due to it picking fitter bits with a higher probability), yet finds the optimal solution in a later generation. It is proposed that “competent genetic algorithms combine building blocks” [16]. PX offers a counter-example to this statement, as it combines building blocks well but is not ‘competent’, in that it takes a greater number of generations to find the optimal solution than other crossover methods which combine building blocks poorly. It is possible that greater combination of building blocks limits the exploration of the GA. To explain further: the schematic blending lemma tells us that combining building blocks will only explore the lower neighbors of pre-existing building blocks, thus the GA (with PX) explores this subset well, but does not search other areas of the schematic lattice effectively. Much like Mitchell et al. [19], we believe that the overemphasis of building blocks forces the GA into a locally optimal sublattice of the search space. We conclude that the ability of a genetic algorithm to combine building blocks does not correspond to how quickly it will find the optimal solution, and that the novel creation of schemata (through methods other than combining building blocks) is vital for a competent GA.
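The text describes PX only informally, so the following Python sketch is one possible reading rather than the operator used in the experiments. It assumes that each bit of the child is copied from the first parent with probability equal to that parent's share of the combined fitness, and from the second parent otherwise; the function name and the even split when both fitnesses are zero are illustrative choices.

```python
import random


def probabilistic_crossover(parent_a: str, parent_b: str,
                            fitness_a: float, fitness_b: float,
                            rng: random.Random) -> str:
    """Per-bit choice between parents, weighted by parent fitness (assumed)."""
    total = fitness_a + fitness_b
    # Probability of copying a bit from parent_a; even odds if both are zero.
    p_a = fitness_a / total if total > 0 else 0.5
    return "".join(a if rng.random() < p_a else b
                   for a, b in zip(parent_a, parent_b))
```

Setting `p_a` to 0.5 regardless of fitness would recover uniform crossover, which makes the contrast between UX and PX in Figure 10 easy to reproduce in the same code path.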

Crossover method Solution found % Building Blocks combined
1-point
2-point
3-point
4-point
5-point
6-point
7-point
8-point
9-point
UX
PX
Figure 10: Time taken for various crossover methods to find the optimal solution and the average percentage of building blocks combined by the respective crossover method. UX here stands for uniform crossover, while PX stands for probabilistic crossover. Each GA is solving the all ones problem on strings of size 16, with a population of size 12, and uses roulette wheel selection. The results are averaged over 100 simulations, each starting with a random initial population.

VI Discussion and Conclusion

The schematic completion and the schematic lattice are fruitful in the field of GAs both theoretically, as demonstrated by the mathematics presented in section 4, and experimentally, as demonstrated in section 5. Using both methods, however, inconsistencies are found in the original schema processing theory for GAs. Specifically, section 4 shows that combining schemata (and thus building blocks) explicitly is not a good method to explore the search space of schemata, as only the lower neighbors of pre-existing schemata can be reached, while section 5 shows that ‘competent’ GAs do not seem to be very concerned with combining building blocks to begin with, as only approximately 25-35% of building blocks seem to come from the combination of the previous generation’s building blocks for most crossover methods. In addition, an increase in the combination of building blocks (as seen in PX) does not correspond with an increase in the efficiency of a GA; rather, it hinders the GA. The reason for this follows from the schematic blending lemma: combining building blocks only explores a subset of the search space, thus the more a GA combines building blocks the more it gets stuck in this subset. Finally, the increase of order and defining length of building blocks over time (as seen in figure 9) is better explained by a decrease in randomness in the population than by building blocks being combined. However, we believe the most significant contributions of this article are the methods introduced to explicitly calculate the schemata present in a population and the identification of the underlying lattice structures involved in schema processing. We hope that these methods will deepen the understanding of GAs. If the reader is interested in these methods, we encourage them to use the schematax software, which is introduced in the following appendix section.

VII Schematax - A Python software package for schemata

To complement this article, we introduce an open source Python package which implements schemata and all of their properties defined above. Importantly, the package allows one to compute the schematic completion and to draw the schematic lattice. It can be downloaded from https://github.com/iSTB/python-schemata.

Naively calculating the schematic completion using the definitions given above requires iterations over the powerset and thus is very computationally expensive (). Thus we introduce an algorithm, algorithm 1 (below), to compute the schematic completion. This algorithm is based on the algorithm presented in [20]. In this algorithm, the join operation is used on each pair of schemata. The algorithm exploits the commutativity () and idempotency () of the join operation, meaning we do not have to compute , and if is computed, we do not need to compute . This allows the inner loop to loop over only a subset of schemata. Additionally, we exploit the atomistic nature of the schematic lattice to build the lattice from the bottom up.

Proposition VII.1.

Algorithm 1’s worst case time complexity is where is the total number of schemata found, is the length of the strings and the initial number of strings, that is the size of .

Proof.

The outer loop (line 4) loops over , thus clearly looping times. For each , the inner loop (line 5) loops times, where is the proportion of N currently found. takes steps (line 6), where is the length of the strings. To check if is not in the current set of schemata found takes steps (line 7). So we have:

To draw the schematic lattice we exploit the Graphviz software [9], which allows one to draw aesthetically pleasing lattices efficiently. In addition, we use the package Cython to translate the Python code into C [2]. This dramatically increases the efficiency of the schematax package. For more information the reader is referred to the documentation of the software package at: https://github.com/iSTB/python-schemata.

1:procedure Complete(P)
2:     
3:     
4:     for  do
5:         
6:         for  do
7:              
8:              if  then
9:                                          
10:               
11:     return
Algorithm 1 Schematic completion
1:procedure 
2:     
3:     
4:     for i in [0, …, k] do
5:         if  then
6:              
7:         else
8:                             
9:     return
Algorithm 2 Join
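Since the symbols in Algorithms 1 and 2 are not reproduced in this text, the following Python sketch illustrates the idea rather than transcribing the listings. It assumes that the join of two schemata keeps a symbol where both agree and places a wildcard elsewhere, and that the completion is the closure of the population under pairwise joins. The empty schema is omitted, and the function names are hypothetical and do not reflect the schematax API.

```python
from typing import Iterable, Set

WILDCARD = "*"  # assumed wildcard symbol


def join(a: str, b: str) -> str:
    """Assumed join: keep symbols where a and b agree, wildcard elsewhere."""
    return "".join(x if x == y else WILDCARD for x, y in zip(a, b))


def complete(population: Iterable[str]) -> Set[str]:
    """Close a population under pairwise joins (empty schema not included)."""
    found = list(dict.fromkeys(population))  # unique words, order preserved
    seen = set(found)
    i = 0
    while i < len(found):
        # Commutativity and idempotency mean only pairs (j < i) are needed.
        for j in range(i):
            candidate = join(found[i], found[j])
            if candidate not in seen:
                seen.add(candidate)
                found.append(candidate)  # joined against earlier items later
        i += 1
    return seen
```

For the population `["110", "100", "011"]` this yields the three words together with `1*0`, `*1*` and `***`. Newly found schemata are appended to the work list, so every pair is eventually joined without iterating over the powerset.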

Acknowledgment

This work was supported by the Marie Curie Initial Training Network FP7-PEOPLE-2013-ITN (CogNovo, grant number 604764), the Engineering and Physical Sciences Research Council (BABEL, grant number EP/J004561/1), and the University of Plymouth (through a PhD studentship to Jack McKay Fletcher). We also wish to thank Diego Maranan, Sue Denham and John Matthias for their valuable comments.

References

  • [1] Chang Wook Ahn and R. S. Ramakrishna. A genetic algorithm for shortest path routing problem and the sizing of populations. IEEE Transactions on Evolutionary Computation, 6(6):566–579, Dec 2002.
  • [2] S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D.S. Seljebotn, and K. Smith. Cython: The best of both worlds. Computing in Science & Engineering, 13(2):31–39, 2011.
  • [3] Claude JP Bélisle. Convergence theorems for a class of simulated annealing algorithms on R^d. Journal of Applied Probability, pages 885–895, 1992.
  • [4] Garrett Birkhoff. Lattice theory, volume 25. American Mathematical Society, New York, 1948.
  • [5] Clayton L Bridges and David E Goldberg. An analysis of reproduction and crossover in a binary-coded genetic algorithm. Grefenstette, 878:9–13, 1987.
  • [6] Maurice Clerc and James Kennedy. The particle swarm-explosion, stability, and convergence in a multidimensional complex space. Evolutionary Computation, IEEE Transactions on, 6(1):58–73, 2002.
  • [7] Marco Dorigo, Mauro Birattari, and Thomas Stützle. Ant colony optimization. Computational Intelligence Magazine, IEEE, 1(4):28–39, 2006.
  • [8] Pierre Dupont, Laurent Miclet, and Enrique Vidal. What is the search space of the regular inference? In International Colloquium on Grammatical Inference, pages 25–37. Springer, 1994.
  • [9] Emden R. Gansner and Stephen C. North. An open graph visualization system and its applications to software engineering. SOFTWARE - PRACTICE AND EXPERIENCE, 30(11):1203–1233, 2000.
  • [10] Bernhard Ganter and Rudolf Wille. Formal concept analysis, volume 284. Springer Berlin, 1999.
  • [11] David E Goldberg and John H Holland. Genetic algorithms and machine learning. Machine learning, 3(2):95–99, 1988.
  • [12] David E Goldberg and Kumara Sastry. A practical schema theorem for genetic algorithm design and tuning. In Proceedings of the genetic and evolutionary computation conference, pages 328–335, 2001.
  • [13] Vincent Granville, Mirko Křivánek, and Jean-Paul Rasson. Simulated annealing: A proof of convergence. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(6):652–656, 1994.
  • [14] Chii-Ruey Hwang. Simulated annealing: theory and applications. Acta Applicandae Mathematicae, 12(1):108–111, 1988.
  • [15] Cezary Z Janikow and Zbigniew Michalewicz. An experimental comparison of binary and floating point representations in genetic algorithms. In ICGA, pages 31–36, 1991.
  • [16] John H Holland. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. MIT Press, Cambridge, MA, 1992.
  • [17] Krzysztof C Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex minimization. Mathematical programming, 90(1):1–25, 2001.
  • [18] GA Miller and N Chomsky. Pattern conception. In Paper for Conference on pattern detection, University of Michigan, 1957.
  • [19] Melanie Mitchell, Stephanie Forrest, and John H Holland. The royal road for genetic algorithms: Fitness landscapes and ga performance. In Proceedings of the first european conference on artificial life, pages 245–254. Cambridge: The MIT Press, 1992.
  • [20] Lhouari Nourine and Olivier Raynaud. A fast algorithm for building lattices. Information processing letters, 71(5-6):199–204, 1999.
  • [21] Riccardo Poli. Exact schema theorem and effective fitness for gp with one-point crossover. In GECCO, pages 469–476, 2000.
  • [22] Ray J Solomonoff. A new method for discovering the grammars of phrase structure languages. In COMMUNICATIONS OF THE ACM, volume 2, pages 20–20. ASSOC COMPUTING MACHINERY 1515 BROADWAY, NEW YORK, NY 10036, 1959.
  • [23] William M Spears and Kenneth A De Jong. On the virtues of parameterized uniform crossover. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 230–237, 1991.
  • [24] William M Spears. Recombination parameters. In The Handbook of Evolutionary Computation, pages 1–3. University Press, 1997.
  • [25] Thomas Stutzle and Marco Dorigo. A short convergence proof for a class of ant colony optimization algorithms. IEEE Transactions on evolutionary computation, 6(4):358–365, 2002.
  • [26] Gilbert Syswerda. Uniform crossover in genetic algorithms. In Proceedings of the 3rd International Conference on Genetic Algorithms, pages 2–9, San Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc.
  • [27] Chris Thornton. The building block fallacy. Complexity International, 4, 1997.
  • [28] Ioan Cristian Trelea. The particle swarm optimization algorithm: convergence analysis and parameter selection. Information processing letters, 85(6):317–325, 2003.
  • [29] Michael D Vose. The simple genetic algorithm: foundations and theory, volume 12. MIT press, 1999.
  • [30] Michael D Vose and Gunar E Liepins. Punctuated equilibria in genetic search. Complex systems, 5:31–44, 1991.
  • [31] Darrell Whitley. A genetic algorithm tutorial. Statistics and computing, 4(2):65–85, 1994.