1 Introduction
Classical Molecular Dynamics [MD] and other molecular mechanics simulations have become important computational tools in the nanoscale design of novel materials, and have brought insight into structure and function of biomolecules. All such simulations require accurate and computationally efficient forms for the interatomic potential/force function. Forcefields that serve that role are typically physicsintuitionbased functions of interatomic distances and bonding structure. Classical forcefield functions represent extensive work in invention and validation, using physical insight combined with experimental and other sources of information. Computationally, construction of a forcefield can be viewed as a fitting process, where successful functional forms of the forcefield best ”fit” the available data. Given a clear metric of ”fitness”, fitting a classical forcefield is a welldefined optimization procedure. It requires a definition of a mutable representation for the function and parameters, a quantifiable ”fitness” criterion, and a set of ergodic evolution operators.
Until now, all the functional forms for such functions were generated by physical intuition, and only their parameters (i.e. multiplicative constants, additive constants, and exponents) were optimized. Each such process, done manually, represents many manyears of highly qualified labor, and often meets with failure. This important and laborious task clearly calls for automation.
In the case of parameteronly fitting, a useful representation for the problem is an ordered set of real numbers, with each number acting as a parameter for a fixed functional form. The evolution operators, in this case, are generalized linear transformations that treat the ordered set of numbers as a vector in real space and search for the vector’s optimal size and direction. A number of research efforts have attacked the partial problem of automatically fitting the numerical parameters in the fixed functional forms, successfully obtaining better fits to the objective function based on a training set.
As we become increasingly interested in complex multispecies systems, and look more critically to the quantitative prediction of their material properties, stringent requirements lead to the need for more complicated functional forms. Optimization of the functional form itself requires a substantial change in the methodology. Such more complicated functions are embedded in a very large combinatorial space. Researchers find it difficult, if not impossible, to develop an intuition for the relationships between the various functions and the everexpanding training set. An automatic functional form fitting method is clearly called for.
We use the Genetic Programming formalism as a starting point for such a fitting algorithm. Genetic Programming [GP] [1] uses a library of elementary operators to build a population of hierarchical computer programs, represented as operator trees. We find that this description of the functional form provides us with maximum generality of the functional expression. We choose a minimal set of algebraic operators:
. If only the algebraic operators are used, the resulting tree becomes an algebraic expression with variable or constant number input at the leaves. The population of such trees is evolved using a Genetic Algorithm [GA] to improve the fitness of the tree population according to a userdefined fitness criterion. A small set of mutation operators provides a way to evolve the trees in the functional space. We describe the model in more detail in section
2.1.In the fitting problem, the training set, which underlies the fitness function, requires careful definition, robust design, and much labor. The objective function is typically based on a set of sets of atomic positions with an associated set of observables, obtained either experimentally or via ab initio simulations. For validation purposes, we construct a manufactured training set using an existing classical forcefield. Such a manufactured solution provides the advantage of a clearly defined objective in terms of a known answer. We leave the somewhat philosophical discussion of the difference between the manufactured and a real system until section 4. We choose the LennardJones pair potential as the basis of the training set. The training set is a set of boxes randomly populated with atoms. The potential energy of each box is calculated by summing the the LennardJones pair potential function over all pairs of atoms within a specified cutoff distance. The list of pairwise distances for each box together with the associated box energies serve as a basis for the fitness function, that is in turn used to evaluate the fitness of the trees in the algorithm. The details of the training set and fitness function construction can be found in section 2.3, while the overall algorithm is described in section 2.
The LennardJones functional form, which is the target function, minimally appears as a fourlevel tree in our representation this is wrong. The space of all possible trees of this size is very large (see section 4). The energy landscape is rough, discontinuous, and littered with local minima. We have developed a powerful optimization method which is a hybrid between a GA and Parallel Tempering Metropolis to enable efficient search on a massively parallel cluster architecture. The method evolves a set of populations of trees according to a scheme that combines mutation with a Metropolis Monte Carlo algorithm. Each population is assigned a different effective temperature and trees are periodically exchanged between populations. The details of the algorithm can be found in section 2.2.
Searching for several hours on processors, the algorithm reproducibly discovered the LennardJones functional form, after traversing only a very small fraction of the total search space. We believe that this demonstrates the effectiveness of this algorithm as a method for unsupervised development of forcefield functional forms. We believe that this is a first demonstration of unsupervised forcefield functional form fitting. For more details, we refer the reader to section 4.
2 Method
In general terms, our method iteratively refines an overall population of candidate GP trees with a convergence criterion defined by the fitness of the best individual tree in the population (Fig. 1). The overall population is randomly generated at the start and is iteratively refined using our Parallel Tempering Genetic Program method (section 2.2).
The trees are permuted using a set of tree evolution operators, as described in section 2.1, and move to the next generation according to their fitness, described in section 2.3. The ”fittest” individuals are monitored to detect convergence. Once convergence is obtained, the best individual tree is output.
2.1 Functional and Genetic Programming
Functional Programming [2, 3] is a programming style that approaches computation as a hierarchical evaluation of functions. The concept of a firstorder function, meaning a function that operates on functions, was defined in this context, though it has been widely used before without formal definition. For example, a derivative function takes one function to another function. In this formal setting, Genetic Programming is a case of a firstorder function, that typically operates on pure (no sideeffects) functions to construct and refine composite computer programs.
Genetic Programming methodology was originally developed by Koza [1] to enable automatic generation of computer programs. The highlevel strategy builds a population of random programs, computationally represented as nested trees of pure functions, as depicted in Figure 2, and then iteratively refines this population using a GA and a fitness function that operates on a single tree. The refinement relies on existence of mutation firstorder operators that change the structure of a tree. Such operators work on a tree either locally (mutation), or more drastically (crossover). Though the terminology for the mutation operators is borrowed from Genetic Programming, where such operators are reasonably welldefined, their description is much fuzzier in the case of a function tree.
Since we are attempting to build an algebraic expression for a forcefield, all our pure tree elements are unary or binary algebraic functions. A composite tree, traversed in the datapassing sense, from the leaves to the root, simply evaluates an algebraic expression. However, in general, as well as in our code, the pure operators are not so constrained. The library of such operators can consist of logical, calculus, transform, and any composite operators that can be encapsulated into a functional form.
The algebraic trees, that represent the functional form of the forcefield become the object of subsequent optimization. The optimization is based on an importance sampling Metropolis Monte Carlo [4]. This is an iterative process that attempts to improve a population of trees by evolving a new ”better” population in each refinement step. Evolution happens in three stages: generation, mutation, and testing. The first stage produces new trees from old trees. The second stage randomly mutates some of these trees. In the testing stage, we compare the fitness of each of the
new trees with one of the old trees and pick one or the other based on the Metropolis acceptance probability.
In the generation stage, each new tree is created by either by passthrough or crossover, with equal probablility. Passthrough involves selecting the fittest tree that has not already been passed though from the old population and copying it into the new population. Crossover involves creating a new tree by combining two parent trees selcted from the old population. Tournament selection is used to choose parent trees from the old population: four trees are selected with equal probability from the old population, and the tree with highest fitness is selected. To perform the actual crossover operation, a depth level from the first parent is selected, with the restriction that the node is not the root (which would not mix up the trees at all) or at maximum depth (which would not cause enough change). A depth level is then selected from the second parent, with the same restrictions. A randomly chosen subtree rooted at the selected depth level is then chopped from the first parent, and replaced with a randomly chosen subtree rooted at the selected depth level in the second parent, producing one new child tree containing parts of both parents. An additional restriction is that the child tree must satisfy the minimum and maximum allowable tree depths.
In the mutation stage, each new tree is either mutated or left alone with equal probability, without regard for fitness or how the tree was created. On a tree that is mutated, a node is selected, which can include the root or a leaf. The node is selected with equal probability from all nodes, meaning that there is a higher probability to select a node near the leaves than to select a node near the root; this was done to give a preference for small adjustments to the parameters rather than drastic changes to the entire functional form. The subtree rooted at this node is chopped, and a new random subtree is made from scratch, restricted to keeping the entire tree within legal depth limits. Thus, on average 1/4 of the new trees are made by crossover, 1/4 by crossover and mutation, 1/4 by mutation alone (passthrough followed by mutation), and 1/4 are retained from the previous generation.
In the testing stage, the old trees (which are ordered by fitness) and the new trees (which are in the order they were created, largely random with regards to fitness) are compared headtohead. Metropolis compares each old tree against the new tree in the same position in the list. This is a simple variant of Tournament Selection. The new tree is accepted into the population if it is better then an old one, and only occasionally otherwise according to the Boltzmann probability:
(1)  
(2) 
2.2 Parallel Tempering
Parallel Tempering (PT) was originally introduced by Swendsen and Wang [5] to deal with the local traps of spin glass energy surface. Their technique uses replicas of the system each at a different temperature and exchanges partial state information between replicas. The fundamental idea is to use the hightemperature replicas to sample the system phase space at a coarse level with the lowtemperature replicas refining the states in local traps. In this way, a hybrid of local and global sampling is achieved. Later changes to the method replace partial information exchange with a complete state swap. Many parameters of the method have come under scrutiny since. For a review of the recent developments, see [6].
In our method, a single replica is a population of candidate trees. The replicas exchange information with their nearest neighbors in the temperature space (Figure 3) by swapping random (or selected) trees with a probability based on the tree’s relative Boltzmann weights with as the fitness of the tree selected in replica at a temperature :
(3)  
(4) 
We initialize the initial temperature distribution for replicas as either linear or logarithmic between upper and lower bounds. We can also allow for a dynamic temperature adjustment using acceptance/rejection ratios as a guide to perturbing individual temperatures.
2.3 Training Set and Fitness Function
2.3.1 Training Set
The training set is a series of 3dimensional domains (boxes) randomly populated with atom positions. Each box has an associated box energy which depends on the positions of all the atoms in the box. In the current work, the box energy is given by a sum of pair potential functions, evaluated for all pairs of atoms lying within a given cutoff distance. Periodic boundary conditions are used and pairs of atoms closer than a minimum interaction distance are ignored. The pair potential was chosen to be the wellknown LennardJones potential function [7].
(5)  
(6)  
(7) 
where and are energy and length parameters.
The procedure produces box energies, one for each box. The set of the atomic coordinates for each box with periodic boundary conditions, and the corresponding set of surrogate target box energies become the training set.
2.3.2 Fitness Function
Fitness of a tree is directly based on the training set of boxes. The fitness function for a tree is evaluated by computing the total energy for each of the boxes using that tree,
(8) 
where is the pair potential function represented by the tree. The fitness of the tree is then given by
(9) 
where the minus sign is required to have increasing fitness correspond to decreasing error.
3 Results
The purpose of this study was to test whether the GP approach was capable of discovering accurate potential energy functions that replicate the true potential energy surface. Typically, information about the true potential energy surface is obtained by using quantum mechanics calculations to evaluate the energy of small configurations of atoms. Hence, we chose as our test case 10 boxes containing 10 randomly positioned particles. In order to construct a problem with a known global optimum, the energies of the boxes were calculated using the LennardJones pair potential, as described above. Each box had dimensions of . The particles were placed randomly in the box, but no particles were allowed to come closer than All pairs distances in the range were recorded and used to compute the target box energy, taking periodic images into account. The values of were set to unity. This resulted in about 60 pair distances per box. By varying the random number seed, we generated four independent test cases. For each test case we executed two different parallel tempering optimizations, with either or individual trees in each population. In all cases, we used 200 populations with temperatures distributed logarithmically from 0.1 to 10 (the units of temperature are the same as those of the fitness function i.e. ). All trees were required to have minimum depths of 3 and maximum depths of 4. The calculations were run on a cluster of 100 AMD Opteron 2.2 GHz processors with Quadrics interconnects. For the runs with 10,000 individuals per population, each generation required about 100 seconds. For 50,000 indiviuals, the time per generation was about 5 times longer. Most of this time was spent in the evaluation of box energies.
The results of the runs are stochastic, both in the initial conditions and the optimization method, and so we see a distribution of behaviors. However, most runs successfully found an algebraic equivalent of the original target function. Algebraic equivalence here means that the functional form can be transformed to the exact LennardJones form by a sequence of algebraic transformations. Three such algebraic equivalents are displayed in Figure 4.
Fig. 5(a) shows the average square error for the overall fittest tree for each generation. A total of 8 runs are shown. The dashed lines indicate four independent runs with individuals in each population, each using different initial populations and different test boxes. The solid lines indicate four independent runs with individuals in each population, each using different initial populations and the same test boxes used for the first four runs.
Of the four independent runs with , three of them successfully found algebraic equivalents of LennardJones function. The residual average square error of approximately can be attributed to machine error. The fourth run failed to find an algebraic equivalent, even after 400 generations. However, it did find several functions which are good approximations to the LennardJones function, but have quite different functional forms e.g. the tree shown in Figs. 4(a), 4(b), and 4(c) .
In the case of the larger populations, , all four runs found algebraic equivalents, and did so in substantially fewer generations. Clearly, the greater diversity of these larger populations improves the search efficiency.
Fig. 5(b) shows more clearly how one of the runs progressed. Initially fitness improves quite steadily, until a good approximation to the exact LennardJones function is found. After this point, further improvement occurs only sporadically. Eventually, the algebraic equivalent appears.
4 Discussion
The problem of finding the correct tree in the space of all possible trees of a given size is made difficult by the sheer size of the search space. Our algorithm, starting with an initial total populations of trees, in most cases found the global optimum in less than generations. This provides an upper bound for a number of trees surveyed of . Compare this figure to the number of possible trees of depth .
We will ignore the unary operator , and ignore are trees that are not maximal. Then, for binary operators, and a maximum depth of , the number of possible trees at operatoronly level is given by:
The number of leaves holding integer constants on the range or the input variable is given by
In our case , and so the total number of possible trees is,
Hence we find that an upper bound on the ratio of the total space to the searched space is roughly . Clearly, this is a nontrivial problem. For this reason, it is not surprising that we appear to be the first group to attempt an unsupervised automated approach to finding functional forms for forcefields. Previous efforts at unsupervised forcefield fitting have been restricted to parameter optimization, given a fixed functional form. We succeeded because we were able to use relatively large computational resources, and we used a very robust optimization method.
It is important to emphasize that while the surrogate test problem used in this study had a known solution, it nonetheless was representative of many potential energy surfaces where accurate functional forms are unknown. We deliberately chose a training set that closely resembled data generated by quantum density functional theory calculations of potential energy. Given the success of the method in the current study, we now intend to apply the method to some of the many atomic systems for which existing forcefields have been found lacking.
References
 [1] J. R. Koza. Genetic programming: routine, humancompetitive, highreturn machine intelligence. AISB Quarterly, (114):5 –, 2003.
 [2] J. Hughes. Why Functional Programming Matters. Computer Journal, 32(2):98–107, 1989.

[3]
L. Allison.
Models for machine learning and data mining in functional programming.
Journal of Functional Programming, 15:15 – 32, JAN 2005.  [4] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087 – 1092, USA 1953.
 [5] R. H. Swendsen and J. S. Wang. Replica monte carlo simulation of spinglasses. Physical Review Letters, 57(21):2607 – 9, NOV 1986.
 [6] D. J. Earl and M. W. Deem. Parallel tempering: theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7(23):3910 – 16, 2005.
 [7] J. E. LennardJones. Cohesion. Proceedings of the Physical Society, 43:461 – 482, UK 1931.