I Introduction
CAs present an attractive and effective modelling technique for a variety of problems. In order to use CAs in a practical modelling task, one needs to understand the underlying rules, relevant to the given phenomenon, and translate them into a CA local rule. Additionally, the state space, tessellation and neighborhood structure need to be pinned down beforehand. This narrows the application area for CAs, since there are problems for which it is hard to manually design a proper local rule. In some cases only the initial and final states of the system are known (e.g. [1, 2, 3]). Such problems motivate the research on automated CA identification. Various methods have been used, including genetic algorithms (GAs) [4, 5, 6, 7]
[8, 9, 10][11], ant intelligence [12][13], as well as direct search/construction approaches [14, 15, 16, 17].Existing methods can be divided into two main groups. Firstly, methods for solving specific, global problems. An example of such a problem is majority classification in which one only knows the initial condition and the desired outcome. Secondly, methods that exploit the entire time series of configurations, where it is assumed that all configurations are known. Only limited research efforts have been devoted to problems involving identification based on partial information [4].
The main goal of the research presented in this paper is to develop methods capable of automated CA identification in case of partial information. The paper is organized as follows. In Section II we start with introducing basic definitions and presenting some wellknown facts on CAs. Section III holds the formal definition of the CA identification problem, while in Section IV we reformulate this problem as an optimization task. In Section V
the evolutionary algorithm for solving the identification problem is presented. The paper is concluded by Section
VI which presents initial results of computational experiments.An introduction to the methods presented in this paper, and a simpler formulation of the discussed algorithm can be found in [18].
Ii Preliminaries
We start by defining a CA. In this paper we will concentrate on 1D, deterministic CAs with a symmetric neighborhood.
Let and be any function, then for we define the –cell global CA rule as:
(1) 
using periodic boundary conditions, i.e. for any it holds that .
The function used in this definition will be referred to as a local rule, and the integer will be referred to as the radius of the neighborhood. Any local rule can be uniquely defined by a lookup table (LUT) that lists all of the possible arguments together with the corresponding function values. It is assumed that the arguments are listed in a lexicographic order. The general form of such a LUT in the case of radius is shown in Table I.
The LUT can be used to enumerate local rules, as the coefficients can be treated as digits in the binary representation of an integer , i.e.
. Clearly this extends to higher values of the radius. Due to the fact that the ordering of arguments in the LUT is fixed, only the second row needs to be stored, such that a LUT may be represented as a binary vector. The length of such a vector is
.With we will denote the set of all binary sequences of finite length, i.e. . The function , satisfying if and only if , where each of the global rules is defined with the same local rule , will be referred to as a generalized global rule of a CA. Such functions will be frequently used throughout this paper, therefore we will simply refer to them as global rules or rules. In this paper a CA will be identified in terms of its global rule, and by referring to a CA we therefore always refer to its global rule in this generalized sense.
Note that rule is uniquely defined by a given local rule , but the opposite is not true. For a given rule we may find different local rules defining it. Fact 1 highlights the relationship between different local rules defining the same CA.
Fact 1.
Two local rules and , , define the same CA if and only if it holds:
(2) 
for any .
Example 1.
Let be defined by and be defined by . We can see that for any it holds that , and thus and define the same CA rule, which happens to be the identity rule.∎
For a given neighborhood radius , denotes the set of all CAs that can be expressed with the use of a local rule with a neighborhood of radius . CAs belonging to are referred to as Elementary CAs (ECAs), and form the most commonly studied class of 2–state CAs [19].
Two important properties of the sets are underlined in Fact 2.
Fact 2.
For any , and .
Let be a CA, for some and . The finite sequence of vectors given by:
where denotes the –th application of the rule , will be referred to as the spacetime diagram containing time steps. Each of the elements of a spacetime diagram will be referred to as a configuration of the CA, while the first element will be referred to as the initial configuration. If and , then refers to the state of the –th cell in the –th element of the spacetime diagram.
Example 2.
We consider an ECA defined by local rule 150. The LUT of ECA 150 is shown in Table II.
Figure 1 depicts a spacetime diagram of ECA 150, starting from a random initial configuration. Following a common convention, the spacetime diagram is visualized as a bitmap in which every row corresponds to a configuration at specific time step. The first row in the image is the initial configuration. State one is drawn as a black pixel, while white pixel corresponds to state zero.∎
Iii Problem statement
In this section we define the identification problem. The formulation presented below is based on the concept of an observation of a spacetime diagram, which is assumed to be incomplete, i.e. it contains only partial information on the states of the CA.
Formally, we assume that the states of a system, which is assumed to be an unknown CA, were observed at certain, unknown time steps. Let be an array containing symbols belonging to the set , where the symbols and denote valid states, while denotes an unknown state belonging to the set . Additionally, let the first row . Such an array will be referred to as an observation. If an observation does not contain the symbol , we refer to it as spatially complete. The first row is assumed to represent the initial configuration of a CA, and row for represents the configuration at time step . It is assumed that .
Let be an observation. The number will be referred to as the number of completely observed states. In our case, for any observation it holds that .
For each observation , we define the set that contains all of the spatially complete observations , satisfying for all such that .
Example 3.
Let observation be given by:
Then the set is given by:
As can be easily counted, .∎
We will say that a CA fits the observation if and only if there exists an and a sequence of natural numbers such that and for any it holds:
(3) 
Proposition 3.
Rule fits the observation if and only if there exist an and a sequence of natural numbers such that and for any it holds:
(4) 
The sequence in the definition of fitting, corresponds to the time steps in the CA evolution (which are assigned to the rows of the observation), while the sequence in Proposition 3 refers to the number of missing time frames between two consecutive rows in the observed diagram. Obviously .
In practice, it is useful to be able to use more than one observation for the identification. Therefore, we will consider observation sets containing a finite number of observations. For simplicity, we assume that the elements of are numbered, i.e. . We will say that rule fits the observation set , if it fits all of the observations in the set.
Note that for the sake of simplicity we will write to express the number of observed states in all of the observations belonging to , i.e. . Additionally, we will write to denote the total number of columns in all of the observations belonging to , i.e. where is the number of columns of observation .
For a nonempty observation set , the set will denote all CA rules that fit the observation set . The identification problem is defined as finding the elements of the set based on . In practice, our goal will be limited to finding at least one of the elements of for some . The problem can also be seen from the machine learning perspective in which the observation set is a training set, from which we try to learn and build a set of rules based on this knowledge.
The following fact will be used in the design of the identification algorithm, to simplify calculations. Informally, it could be expressed by understanding the observation set as a set of conditions that the rule needs to meet. Having fewer conditions, it becomes more likely to find solutions meeting those conditions.
Fact 4.
Let be an observation set, and let . Then .
Since we consider only finite observation sets, we know that for every observation set there exists a such that, if a solution exists, and is the time gap sequence of observation , then , for every . In the construction of the solution algorithm, we will assume that an upperbound for is known.
Iv CA Identification as an optimization problem
The identification problem, defined in Section III, can be formulated as an optimization problem, which in turn enables the use of evolutionary search methods.
We start with an auxiliary definition. Let be some vectors. We define the distance between and as:
(5) 
We assume that if there is no such that and then . Therefore .
Assume that is a set of observations of some unknown CA belonging to , i.e. . Let be a CA, and for every , let be a strictly increasing sequence of natural numbers.
As a start, we define the error measure , which measures how well a given CA fits the observation set , assuming that is the time step of the –th row in observation . The measure is defined as:
(6) 
where is the number of rows of observation . The following fact is an direct consequence of the definition of the identification problem.
Fact 5.
if and only if there exists a sequence such that .
Note that in the case when we will write instead of .
Let be a sequence of natural numbers, and let be a CA rule. Observation defined as:
will be referred to as the –completion of with time gaps . Note that any observation satisfies for any , .
Fact 6.
.
Example 4.
Assume that CA is ECA 150 with LUT given by Table II and local rule . Let . We consider the observation defined in Example 3 and compute .
The calculation is as follows. Firstly we compute . Since we simply apply the rule to the first row of , i.e. . Since , to find , we first need to compute one additional configuration by evaluating the rule on configuration . It is easy to check that , and thus ∎
Based on Proposition 3, we define an alternative error measure that will turn out to be more useful in the construction of the solution algorithm. Assuming that is a sequence of natural numbers representing time gaps, is defined as:
(7) 
Since , we can express without using the function as:
(8) 
Example 5.
We refer again to observation , CA and used in Example 4 and we compute the error measures and . Let us start with . Following the fact that , we get . The error measure can be computed easily by evolving , starting from the initial configuration and comparing the results with the values in , for entries not occupied by .
Starting from the top: . Since we compare the outcome with the second row of . As we see, has an incorrect value, so it does not contribute to the error and which is a correct value. Since we should further evolve three times, starting from , but since , we can simply compare with and see that no errors occur. Summing up, the total error is: .
Similarly, we find the value of , by taking pairs of rows and and comparing the results of and . The error in the first pair of rows is the same as in the case of . For the second pair the initial condition is , and since and since , we do not further evaluate . We compare with , which yields 2 incorrect values. Summing up, the total error is . ∎
The relation between and is expressed by the following proposition.
Proposition 7.
Let be a CA rule and an observation set. There exists a strictly increasing sequence of natural numbers, such that if and only if there exists a sequence of natural numbers such that .
As a consequence of Proposition 7, the identification problem can be expressed mathematically as the minimization of . Note that this is only possible due to the assumption that observation set contains partial spacetime diagrams of some unknown CA. In a more general setting, where the observations could have a more complex origin, such a simplification is not possible.
As mentioned earlier, we consider the case where the upper bound for the time gaps is known. Using this knowledge, we define the error measure independently of the selection of as:
(9) 
Note that the minimum in (9) is always defined, since there is a finite number of possibilities for the choice of . Additionally, note that for a spatially complete observation , the choice of is independent of the choice of for any , and for observations and , the choice of is independent from the choice of . Consequently, to find the value of in the case of a spatially complete observation set, we need to examine at most sequences of time gap lengths.
In the general case, the choices of the values of are dependent on each other, and thus in order to find the exact value of the error measure we need to examine all of the possibilities, which holds a substantial computational burden. Due to this, even in the case of partial observations, we follow the approach described above and treat the time steps independently. The only difference that we introduce is that if for given , few different candidate values for lead to the same, minimal value of the pairwise error, one of those candidates is being selected randomly. Such an approach, is a stochastic overestimation of the error, i.e.
the calculated value will never be lower than the actual error. Additionally, if a given CA is a solution to the problem, recalculating the approximate error measure multiple times increases the probability of finding the exact value, which is found by taking the minimum of all of the obtained results. Such an approach turned out to be sufficient in the discussed context.
V Evolutionary algorithm
Having stated the identification problem as an optimization problem in this section, we describe its solution using an evolutionary algorithm based on the classical GA. In order to follow the GA approach, we need to define the individuals’ representation, the population structure, a fitness function for ranking the individuals, but also the selection procedure for reproduction, and finally the crossover and mutation operators. Formally, also halting conditions need to be formulated.
Va Representation of individuals and population structure
Here, the individuals that make up the population are CAs, encoded through the LUT of their local rules, which is possible since the LUT of any CA can be represented as a bitstring of length . We assume that the population consists of CA belonging to , for some .
We consider populations of individuals. By we denote the population of the –th generation of the GA. The population is the initial population, and is constructed by randomly selecting bitstrings. Populations for are the outcomes of applying the genetic operators, according to the rules described in the remainder of this section.
VB Fitness function
The fitness function is directly related to the error measure defined by (9). Although Proposition 7 states that the error measures given by (6) and (9) can be used interchangeably, preliminary experiments showed that the later results in efficient and convergent algorithm, while suboptimal results were obtained using the measure given by (6). This follows from the fact that the error in row is affected by errors appearing in rows . As we know from the research on dynamical properties of CAs, small initial perturbations can strongly affect the final system state [20]. For that reason, it is easier to optimize with a GA as compared to .
Let be a LUT of some local rule which defines a CA . Then denotes the fitness of , and is defined as:
(10) 
The fitness function takes integer values from 0 up to , i.e. there are finitely many possible values of the fitness function. The goal of the GA is to maximize fitness, and a CA with a maximal fitness value is a solution of the identification problem. From the above, it is clear that if is close to zero, solving the problem is infeasible, since the number of possible values is very small and the population is not able to gradually increase its fitness. Additionally, if , then the problem is trivial because every CA is a solution.
The fitness defined by (10
) has proven to work effectively, but the computing time needed for its evolution becomes unacceptable if the observation set is large. Therefore, during the evolution, to estimate the value of
we use for some nonempty subset . We start by randomly selecting elements for the subset . Subsequently, but before evolving a new population we replace one of the elements in the subset with a randomly selected observation from . Due to Fact 4 we are sure that such an approach does not result in reducing the solution set.VC Selection operator
Having defined the fitness function, we can define the selection operator, which is responsible for selecting the parent individuals that will be used to produce the next generation. We use a random selection method where the selection probability of a given individual is proportional to its fitness. Individuals are selected with replacement, i.e. individuals might be selected multiple times for reproduction.
VD Crossover operator
To produce offspring, we select two parents according to the procedure described in Subsection VC. A uniform crossover operator is used, i.e. if denote parents, the outcome of the crossover operator is a vector with values that are randomly selected from and , i.e. .
VE Mutation operator
Finally, the offspring individual is mutated. A simple bitflip mutation is being used, i.e. for every position of the vector a decision is made whether or not the value should be flipped, with being the probability of flipping the value. The expected number of flipped positions in the population is .
VF Elite survival
After evolving a new population, the elite survival procedure is applied. Our experiments proved that such an approach is required to reach convergence. The procedure is implemented by a deterministic selection of fittest individuals from the previous population used to replace randomly selected individuals in the newly evolved one.
Including this elite survival process can dramatically increase the performance of the algorithm, though there are cases where such an approach causes the population to progress towards a local optimum. To overcome this, we apply a simple, adaptive procedure that deactivates elite survival in cases when the maximum fitness value of the population remained constant for more than generations. The elite survival procedure is again switched on after a predefined number of generations, or if the maximum fitness improved.
VG Halting conditions
The algorithm evolves by generating populations according to the procedure described above until a maximum, predefined number of populations was evolved or, if a CA that fits the observation set was discovered.
As mentioned in Subsection VB during the evolution, the fitness is approximated by for some , which is effective for selection, but can not be used in the halting condition since does not imply . Therefore, for the individual with the highest value , we additionally calculate and base the halting condition on it, i.e. the algorithm stops as soon an element is found.
Vi Results of experiments
By means of our experiments we verified to what extent the partiality of observations affects the efficiency of the GA in terms of the number of GA iterations required to find a solution.
In this experiment, the GA evolution is based on observation sets for and ECA . The integer will be referred to as the problem number. The observation set is a set of observations obtained from different, random initial conditions common for both , by selecting subsequent configurations of ECA generating time gaps of random length from 1 to . The set for is built from observations belonging to by modifying them in such a way that randomly selected, completely observed entries are replaced by “?”. In other words, by increasing the effect of spatial partiality is increased. As a result of such a procedure we obtained a series of observation sets , for which it holds . The identification algorithm was then executed for each of the obtained observation sets.
Given that the family of ECAs contains only 256 members, the identification problem would be relatively easy to tackle, so we set the radius , i.e. the population contains local rules with radius represented as bitstrings of length 32. Without this modification the algorithm is able to find a solution in a few iterations, by examining the entire search space.
In order to account for the stochastic nature of the GA, the experiment is repeated times for each , . The values of of the GA parameters used in our experiment setup are shown in Table IV.
param  value  description 

rule radius  
probability of flipping 1bit in mutation  
number of individuals in population  
elite size  
bound for the time gap length  
number of rows / columns in each observation  
number of observations  
number of cells being removed from each observation set  
number of samples for fitness approximation  
maximal number of the GA populations  
number of repetitions of the GA per rule 
The results vary significantly depending on the rule in question, which is not surprising since the dynamics of ECAs 150 and 180 is different. The normalized Maximum Lyapunov Exponent (nMLE) [21, 22, 23] of the former is the highest among all of the ECAs, and thereby this CA’s behavior may be considered complex. In contrast, the nMLE of ECA 180 is only approximately 0.48, which hints that, in some sense, the behavior of this ECA is simpler than the one displayed by ECA 150. The differences in the overall dynamical complexity of these two CAs can be acknowledged by examining their spacetime diagrams, which are depicted in Fig. 1 and Fig. 2.
To understand the performance of the GA, we first checked for which the algorithm was able to find a solution (Fig. 3). When comparing the plot for ECA 150 with the one for ECA 180, it is clear that the identification problem turned out to be much more challenging for ECA 150. Indeed, for this ECA, the algorithm was effective only if the number removed observation elements was smaller than , whereas it mostly failed when more spatial partiality was added. Besides, even for close to 0, not all of the GA executions were successful. In contrast, identifying ECA 180 was always possible for , but for we see a sudden drop in the performance. Note that in both cases, for a solution was easily found, since for this setting the problem is trivial, i.e. almost all CAs can be considered a solution.
The above results suggest that, depending on the dynamical characteristics of the CA in question, the maximum allowable number of missing elements in the observations differs. Further research is undertaken to better understand the link between the identifiability and dynamics of CAs.
Figure 4 depict the minimum, average and maximum number of GA iterations among the runs resulting in a solution for ECA 150 and ECA 180, respectively. In the case of ECA 180, we see that the efforts needed for finding a solution grows as increases, up to the point where it becomes impossible. Furthermore, we see that in most cases the difference between maximal and minimal values is relatively low. In the case of ECA 150, the results are much less stable. The differences between maximal and minimal values are substantial, and the efforts needed to find the solution do not steadily grows with the growing spatial partiality. The only similarity between the two CAs seems to be in the fact that there exists some critical beyond which the problem becomes impossible to solve.
Summary
In this paper we introduced the identification problem of CAs in the context of partial observations. An evolutionary algorithm for tackling the problem was presented, and its performance was verified for the two ECAs. The initial experiments suggest that the difficulty of the identification problem is somehow linked to the dynamical complexity of the CAs. The problem and solution algorithm presented in this paper, should be considered as one of the first steps in identifying CAs from data originating from realworld phenomenon observations. Unavoidably, such observations will be somehow incomplete in the sense that it is impossible to continuously track the involved processes.
References
 [1] S. AlKheder, J. Wang, and J. Shan, “Cellular automata urban growth model calibration with genetic algorithms,” in Proc. Urban Remote Sensing Joint Event, 2007. IEEE, 2007, pp. 1–5.
 [2] E. Sapin, L. Bull, and A. Adamatzky, “Genetic approaches to search for computing patterns in cellular automata,” IEEE Comput. Intell. Mag., vol. 4, no. 3, pp. 20–28, 2009.
 [3] P. L. Rosin, “Image processing using 3state cellular automata,” Comput. Vis. Image Underst., vol. 114, no. 7, pp. 790–802, 2010.
 [4] F. C. Richards, T. P. Meyer, and N. H. Packard, “Extracting cellular automaton rules directly from experimental data,” Physica D: Nonlinear Phenomena, vol. 45, no. 1, pp. 189–202, 1990.

[5]
M. Mitchell, J. P. Crutchfield, and R. Das, “Evolving cellular automata with
genetic algorithms: A review of recent work,” in
Proceedings of the First International Conference on Evolutionary Computation and its Applications (EvCA’96)
, 1996.  [6] T. Bäck, R. Breukelaar, and L. Willmes, “Inverse Design of Cellular Automata by Genetic Algorithms: An Unconventional Programming Paradigm,” in Unconventional Programming Paradigms, ser. Lecture Notes in Computer Science, J.P. Banâtre, P. Fradet, J.L. Giavitto, and O. Michel, Eds. SpringerVerlag, 2005, vol. 3566, pp. 161–172.
 [7] E. Sapin, O. Bailleux, and J.J. Chabrier, “Research of a cellular automaton simulating logic gates by evolutionary algorithms,” in Proceedings of the 6th European conference on Genetic programming, ser. EuroGP’03. SpringerVerlag, 2003, pp. 414–423.
 [8] S. Bandini, S. Manzoni, and L. Vanneschi, “Evolving robust cellular automata rules with genetic programming.” in Automata, A. Adamatzky, R. AlonsoSanz, A. T. Lawniczak, G. J. Martínez, K. Morita, and T. Worsch, Eds. Luniver Press, Frome, UK, 2008, pp. 542–556.
 [9] K. Maeda and C. Sakama, “Identifying cellular automata rules,” J. Cellular Automata, vol. 2, no. 1, pp. 1–20, 2007.
 [10] D. Andre, F. H. Bennett III, and J. R. Koza, “Discovery by genetic programming of a cellular automata rule that is better than any known rule for the majority classification problem,” in Proceedings of the First Annual Conference on Genetic Programming. MIT Press, 1996, pp. 3–11.
 [11] C. Ferreira, “Gene expression programming: a new adaptive algorithm for solving problems,” Complex Systems, vol. 13, no. 2, pp. 87–129, 2001.
 [12] X. Liu, X. Li, L. Liu, J. He, and B. Ai, “A bottomup approach to discover transition rules of cellular automata using ant intelligence,” Int. J. Geogr. Inf. Sci., vol. 22, no. 1112, pp. 1247–1269, 2008.

[13]
L. Bull and A. Adamatzky, “A learning classifier system approach to the identification of cellular automata,”
J. Cellular Automata, vol. 2, no. 1, pp. 21–38, 2007.  [14] A. Adamatzky, Identification Of Cellular Automata. Taylor & Francis Group, 1994.
 [15] Y. Yang and S. A. Billings, “Extracting Boolean rules from CA patterns.” IEEE Trans. Syst., Man, Cybern. B, vol. 30, no. 4, pp. 573–580, 2000.
 [16] ——, “Neighborhood detection and rule selection from cellular automata patterns,” IEEE Trans. Syst., Man, Cybern. A, vol. 30, no. 6, pp. 840–847, 2000.
 [17] X. Sun, P. L. Rosin, and R. R. Martin, “Fast rule identification and neighborhood selection for cellular automata,” IEEE Trans. Syst., Man, Cybern. B, vol. 41, no. 3, pp. 749–760, 2011.
 [18] W. Bołt, J. M. Baetens, and B. De Baets, “Identifying CAs with evolutionary algorithms,” in Proceedings 19th International Workshop on Cellular Automata and Discrete Complex Systems (AUTOMATA 2013) – Exploratory Papers, September 2013, pp. 11–20.
 [19] S. Wolfram, “Statistical mechanics of cellular automata,” Rev. Mod. Phys., vol. 55, pp. 601–644, Jul 1983.
 [20] J. M. Baetens and B. De Baets, “Phenomenological study of irregular cellular automata based on Lyapunov exponents and Jacobians,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 20, no. 3, 2010.
 [21] S. Wolfram, “Universality and complexity in cellular automata,” Physica D, vol. 10, pp. 1–35, 1984.
 [22] M. Shereshevsky, “Lyapunov exponents for onedimensional cellular automata,” Journal of Nonlinear Science, vol. 2, no. 1, pp. 1–8, 1992.
 [23] F. Bagnoli, R. Rechtman, and S. Ruffo, “Damage spreading and Lyapunov exponents in cellular automata,” Physics Letters A, vol. 172, no. 12, pp. 34–38, Dec. 1992.