Evolutionary Model Discovery
Agent-based modeling has been criticized for its apparent inability to establish the causality of social phenomena. However, we demonstrate that when coupled with evolutionary computation techniques, agent-based models can be used to evolve plausible agent behaviors that are able to recreate patterns observed in real-world data, from which valuable insights into candidate explanations of the macro-phenomenon can be drawn. Existing methodologies have suggested the manual assembly and comparison, or automated selection, of pre-built models on their ability to fit patterns in data. We discuss the drawbacks of existing manual approaches and how evolutionary model discovery, an evolutionary approach to exploring the space of agent behaviors for plausible rule-sets, can overcome these issues. We couple evolutionary model discovery with concepts from the Agent_Zero framework, ensuring social connectivity, emotional theory components and rational mechanisms. In this study, we revisit the farm-seeking strategy of the Artificial Anasazi model, in which agents were originally designed to simply select the closest potential farm plot as their next farming location. We use evolutionary model discovery to explore plausible farm-seeking strategies, extending our previous study by testing four social connectivity strategies, four emotional theory components and five rational mechanisms for a more complex, human-like approach to farm plot selection. Our results confirm that plot quality, dryness and community presence were more important in the farm selection process of the Anasazi than distance, and discover farm selection strategies that generate simulations with a closer fit to the archaeological data.
Agent-based modeling, a generative modeling and simulation methodology, has enabled researchers and engineers alike to replicate and predict emergent properties of complex systems in a multitude of domains. Agent-based models center around the principle that macro-phenomena in complex systems are describable not merely by the sum of the individual parts, but also by the interactions between them, a dynamic easily captured through computational simulation.
Despite efforts to standardize the practice of agent-based modeling, such as the ODD protocol, stylized facts matching, and pattern-oriented modeling, there is still room for improvement in standardizing the design and development of ABMs. One criticism in particular is that agent-based models are unable to provide a complete set of explanans of social behavior (Grüne-Yanoff, 2009).
Moreover, modelers are criticized for making assumptions or testing a single or limited set of hypotheses as the cause of a macro-phenomenon. These modeled assumptions and hypotheses may be sufficient to replicate a particular phenomenon; however, they are usually only one of a large set of possible causes (or explanans (Elsenbroich, 2012)). Modelers then typically go on to validate their models through calibration. Yet these models are open to the risk that calibration may actually be fine-tuning behaviors different from the ground-truth behavior in the actual social system. Therefore, the modeler must ensure, at least to some degree of confidence, that the rule-set causing a certain agent behavior is in fact verifiable, and that they are not unknowingly ‘force-fitting’ a parameter configuration onto a model which does not embed the actual ground-truth behavior.
The deceptive nature of social entities and the difficulties of performing experiments on social systems (e.g., ethical barriers, biased sampling, lack of honest responses to data collection techniques) make it nearly impossible to determine the actual ground-truth behaviors driving social agents. However, we argue that when coupled with evolutionary computation, agent-based models that embed explanans of social phenomena can be evolved as candidate models, offering social scientists insights into the actual ground truth. We demonstrated the explanatory power of this methodology (which we refer to as evolutionary model discovery, or EMD) by learning alternate farm-seeking strategies for the Artificial Anasazi using genetic programming (Gunaratne and Garibay, 2017). By creating a set of genetic programming syntax nodes, where leaves represent sensory information about the environment and functions represent combination, negation, minimization and maximization, we were able to evolve farm-seeking strategies quite different from that used by Artificial Anasazi agents in the original model by Dean et al. (2000), where agents merely chose the next closest potential farm plot. These strategies, though sometimes counter-intuitive, achieved a fitness against the archaeological data comparable to that of the best calibrated version of the original model (Stonedahl and Wilensky, 2010b) and showed a generally better qualitative match to the real-world data.
In this paper, we extend our previous work by expanding the function set of agent behavior components used by EMD to include social connectivity, emotional theory components and rational mechanisms, as opposed to merely rational mechanisms. This decision was inspired by the Agent_Zero framework, a computational social agent architecture built on neurocognitive foundations (Epstein, 2013). EMD is then able to evolve candidates with behaviors defined by rule-sets built on appropriate combinations of social connectivity, and emotional and rational components. In particular, we explore the Artificial Anasazi farm-seeking strategy once more, but this time include four different social connectivity configurations for information on potential farm plot availability, four emotional theory components (two theories of homophily, need for community, and fleeing/migration), and the five rational comparators used in (Gunaratne and Garibay, 2017). EMD uses genetic programming to evolve combinations of these nodes and automatically generate Netlogo models of the Artificial Anasazi with modified farm-seeking strategies, selecting the fittest strategies each generation for reproduction by minimizing the L2-error between the simulation output, a time-series of population, and the actual archaeological population records. The final candidate models of this search are then calibrated with BehaviorSearch, a genetic algorithm calibration tool provided with Netlogo, similar to the calibration performed on the original model in (Stonedahl and Wilensky, 2010b).
Our results demonstrate that the farm selection strategy was more intelligent than that implemented in (Dean et al., 2000). EMD allowed us to make a component-wise analysis of impact on farm plot selection, indicating that high quality had a great positive impact on selecting a farm plot, social presence had a moderate positive influence, and dryness had a considerable negative impact on farm plot selection overall. We also see that a plurality of the candidate models from the genetic programming search suggested that full information of the Valley would have been available to the Anasazi. When calibrated, the best performing model, one that seeks to choose plots with higher quality, had significantly better fitness than the calibrated original farm selection strategy of selecting the next closest plot to farm on.
EMD demonstrated the following advantages in terms of establishing the social connectivity, emotional theory components and mechanisms of social phenomena:
1. There is no requirement for repeated implementation of models, reducing the probability of error during implementation. In pattern-oriented modeling and even model selection (Stratton, 2015), entire models or agent rule-sets have to be developed separately. This leaves room for errors and discrepancies in the code, which can potentially have large effects on the final results produced by the candidate models.
2. Generated theories consist of rule-sets built of simple comparable components. Instead of having to compare two completely different implementations, the modeler is assured that the only difference between candidate models is at the decision under doubt, to which evolutionary model discovery is applied. This ‘gap’ in the model is filled with a combination of mechanisms, social/emotional theories and operators which can easily be compared against each other. As shown in the discussion of the results of this paper, comparability of candidate models is important in drawing insights into the causes of the social phenomena being observed.
3. Replace limited hypotheses modeling and assumptions with the evolution of plausible social theories. The social plausibility discovery technique is able to discover combinations of micro-behavior which are able to generate patterns observed in data. The technique does not depend on selecting from a set of predefined models or assumptions. Instead, it is able to put together its own micro-behavior rule-sets in the form of a decision tree and
4. Efficient at searching the vast landscape of possible combinations of behavior rules. Social theories are encoded in ABMs as utility functions, thresholds, or decision trees. With a complete function and terminal set, a genetic program is able to evolve any of these program structures. Crossover and mutation operations allow genetic programming to quickly explore different possibilities in the search space of all possible rule-sets, while the fitness function maintains exploitation by selecting the fittest models in every generation to produce the next.
Parameter calibration is a common technique towards validating agent-based models. However, simply calibrating the parameters of a model does not guarantee the causality behind the social behaviors of the agents matches that of individuals in the real-world.
In pattern oriented modeling, Grimm et al. discuss the manual comparison of multiple models, embedding different agent theories, against multiple patterns of data. Models closely fitting these patterns are selected as the most plausible ones (Grimm et al., 2005).
Stylized facts matching is a similar technique, used particularly in the agent-based economics literature. It involves matching multiple economic ‘facts’, qualitatively, for example by recreating shapes of wealth distributions (Gatti et al., 2005).
More recently, automated model selection has been demonstrated using genetic algorithms by Stratton (Stratton, 2015). In his thesis, Stratton describes the requirements for converting theories into a rule-based agent-based model and using a genome to encode the theory to be deployed by the agent, the type of utility function to be used and the various exogenous parameter values of the agent. A genetic algorithm then searches for the best candidate model and corresponding parameter values.
Collectively, the existing approaches are prone to the following problems. First, models must be reimplemented, allowing for programming errors and discrepancies between implementations (differences in scheduling, path dependency and parameter limits), which can cause unfair comparison of models. Second, when implementations of candidate models or agent theories are completely independent, it becomes difficult to perform in-depth, rule-to-rule analysis between candidate models. For example, the rule component of a behavior of concern might be implemented as a utility function in one model and as a node in a decision tree in another, making it difficult to isolate and compare the importance of that particular rule to the behavior when comparing the performance of the two candidates.
This brings us to the question: can we exploit agent-based models to conduct causal analysis of social systems? Many social simulations begin with the modeler defining the rules governing the agent behavior. These rules are usually defined in reference to expert knowledge, modeling around extant theories.
However, agent-based models have a much greater potential in exploring the causality of social phenomena. Agent-based models can be constructed in such a way that the rules governing the behavior of agents are ‘pluggable’, that is, they can be combined and recombined to explore plausible rule-sets which are able to reproduce patterns similar to those seen in real-world data. Grimm and colleagues discuss this in (Grimm et al., 2005).
Genetic programming, first introduced by Koza, aims to achieve machine-driven program development and is a suitable approach towards automating the causal discovery process (Koza, 1992). Genetic programming evolves a program through crossover and mutation operators applied to a representation consisting of nodes that define program statements. Nodes are divided into function and terminal nodes and may be strongly typed, so that each node only accepts child and parent nodes that are compatible with the arguments and return type of its program statement, respectively. The syntax tree is perhaps the most common representation used in genetic programming; it arranges the nodes into a tree structure, where a node and its children define a section of the code within the program statement defined by its parent node.
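To make the syntax tree representation concrete, the following is a minimal sketch, not the ECJ implementation used in this study. Trees are nested tuples, the function and terminal names are hypothetical, and type constraints are not enforced (a strongly typed variant would additionally check argument and return types before grafting a subtree).

```python
import random

# Hypothetical function set: name -> implementation (all binary here).
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def evaluate(tree, env):
    """Recursively evaluate a syntax tree against a dict of terminal values."""
    if isinstance(tree, str):          # terminal node, e.g. "quality"
        return env[tree]
    name, *children = tree             # function node: (name, child, child)
    return FUNCTIONS[name](*(evaluate(c, env) for c in children))

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(receiver, donor, rng):
    """Subtree crossover: graft a random subtree of donor into receiver."""
    path, _ = rng.choice(list(subtrees(receiver)))
    _, graft = rng.choice(list(subtrees(donor)))
    return replace_at(receiver, path, graft)

# Example tree: quality - (dryness * distance)
tree = ("sub", "quality", ("mul", "dryness", "distance"))
env = {"quality": 0.8, "dryness": 0.5, "distance": 0.2}
```

Mutation can be expressed with the same helpers by grafting a freshly generated random subtree instead of one taken from a second parent.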
In (Gunaratne and Garibay, 2017), we demonstrated how genetic programming can be used to inform plausible comparisons of sensory data available to agents in the Artificial Anasazi model to learn plausible rule-sets for the farm plot selection sub-model. However, this study merely considered possible rational comparisons of sensory data available to Artificial Anasazi agents under full information of their environment.
Social behavior is unlikely to be completely rational. In Agent_Zero, Epstein introduces a computational social agent cognitive architecture based on neurocognitive foundations. The Agent_Zero architecture has three dimensions to the decision making process, namely social, emotional and rational thinking. Similarly, we extend the behavior discovery process to incorporate these dimensions.
Agents are considered to try to maximize or minimize a utility value or disposition calculated in the three dimensions above. The social component is expressed through the various social connectivity configurations the agent could be placed in, which define the sources of information it uses when performing an action. The emotional components encode social/emotional theories that can be used in combination to compare input from the social connections and calculate an emotional utility value. The rational component is defined by logical comparisons of sensory data from the environment and social connections, also aggregated into a utility value. However, unlike in Agent_Zero, we allow the individual emotional and rational components to interact, instead of calculating total emotional and rational utilities separately, allowing for more complex decision making. The agent utility U is therefore a single expression over the set of emotional theory components in the agent’s decision tree and the set of rational mechanisms in the agent’s decision tree, both applied to the information supplied through the social connectivity configuration. For example, an agent's utility may combine several emotional components and several rational components that all draw on the first social connectivity configuration of the possible configuration set.
The agent’s behavior can then be represented as a minimization or maximization of this utility.
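One way to write such a utility is sketched below. The notation is chosen here purely for illustration and is an assumption: $E$ denotes the set of emotional theory components, $R$ the set of rational mechanisms, $S_k$ the $k$-th social connectivity configuration, and $f$ an arbitrary combination of additions, subtractions and multiplications. The second line is a hypothetical instance in which every component draws on the first configuration $S_1$.

```latex
U = f\bigl(\{\, e(S_k) : e \in E \,\},\ \{\, r(S_k) : r \in R \,\}\bigr)

U_{\text{example}} = e_1(S_1) + r_1(S_1)\, r_2(S_1) - e_2(S_1)
```

In the example, the product $r_1 r_2$ is an interaction between two rational components, and the subtraction of $e_2$ negates an emotional component, which is what distinguishes this scheme from summing separate emotional and rational totals.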
The evolutionary model discovery process performs genetic programming to evolve combinations of emotional and rational components along with a social connectivity configuration, to generate utility maximizations/minimizations that are able to reproduce patterns seen in real-world data. The models that result from the genetic program runs are considered socially plausible candidate solutions to the problem at hand and, once calibrated, can be considered together with their parameter sets as possible explanations for the social phenomena being studied.
Netlogo is one of the most commonly used agent-based modeling platforms and is used in the experiments in this study. If an agent were to use the above utility minimization/maximization to select from a set of actions, it would look like the following pseudocode:
set selection min/max-one-of [comparators]
Where selection is the selected action and comparators is the combination of emotional theory components and rational mechanisms in the form of comparators. These comparators take a property of two or more possible actions and compare the value of this action considering its current state and social information. The comparator outputs a normalized score for the actions, between 0 and 1. An agent with a combination of comparators will combine action scores for a total normalized action score for each possible action and select the action with the maximum or minimum score accordingly as its action for the next simulation step. In particular, we allow for the addition (combination), subtraction (negation), and multiplication of comparator results.
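The scoring and selection step described above can be sketched as follows. This is an illustrative Python rendering rather than the generated Netlogo code; the min-max normalization, the property names and the example plots are assumptions.

```python
def normalize(values):
    """Min-max scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def compare(actions, prop):
    """A comparator: normalized score of one property across all candidate actions."""
    return normalize([a[prop] for a in actions])

def select(actions, score_fn, maximize=True):
    """Pick the action with the maximum (or minimum) combined score."""
    scores = score_fn(actions)
    pick = max if maximize else min
    best = pick(range(len(actions)), key=scores.__getitem__)
    return actions[best]

def quality_minus_dryness(actions):
    """Combination by subtraction (negation) of two comparator results."""
    q = compare(actions, "quality")
    d = compare(actions, "dryness")
    return [qi - di for qi, di in zip(q, d)]

# Hypothetical candidate farm plots.
plots = [
    {"id": "a", "quality": 0.9, "dryness": 0.8},
    {"id": "b", "quality": 0.6, "dryness": 0.1},
    {"id": "c", "quality": 0.2, "dryness": 0.5},
]
chosen = select(plots, quality_minus_dryness, maximize=True)
```

Here plot "a" has the best quality but is also the driest, so the combined score favors plot "b"; switching to `maximize=False` selects the plot that the same rule-set ranks worst.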
To demonstrate the evolutionary model discovery technique, we considered the Artificial Anasazi, a simulation of the Kayenta Anasazi during the years 800 AD to 1350 AD (Dean et al., 2000). This simulation was developed as part of a large effort to study the Pueblo culture in the Long House Valley region, and the archaeological efforts provided the modelers with population data as annual counts of households that would have existed in the valley during the period studied. Critics of the Artificial Anasazi have argued that the agent-based model is unable to provide a complete explanation of the behavior of the Anasazi (Grüne-Yanoff, 2009). Further, critics have pointed out that the simulation itself is but a single candidate explanation of the social phenomenon at hand, the rise and fall of the Anasazi population over time. ABMs tend to produce highly variable output that is sensitive to initial conditions, and can seem to make identification of causality more difficult by generating more explanations rather than narrowing them down. However, we view this as an advantage, as an ABM can be used as a test-bed to discover multiple plausible explanations: a vast search space of social behavior which can be traversed by a highly exploratory search algorithm such as genetic programming. We concentrated on a particular sub-model of the Artificial Anasazi, the farm selection strategy. Artificial Anasazi households perform farm selection when a new household is produced by a household that has enough resources to increase its family size, or when the current farm plot can no longer produce enough yield to satisfy the nutritional needs of the household. The original model encodes the hypothesis that the Anasazi simply select the potential farm plot nearest to the household’s current farm plot during farm plot selection. A patch must satisfy certain criteria in order to qualify as a potential farm plot (e.g., be a patch free of farms or households).
Our hypothesis was that the farm selection strategy used by the Anasazi people would have involved more complex decision making, consideration of the state of the potential farm plots available to them, and social and emotional influences of other households around them. The original farm selection strategy can be represented by the syntax tree shown in Figure 1.
Three candidate configurations of social connectivity were included in the search space in addition to full information, which was used by agents in the original version of the model. These configurations restrict the information on potential farms available to a household to particular agentsets, whereas the original model assumed that agents knew every potential farm plot in the Long House Valley. 1) Family-inherited information: agents depend solely on information spread through their "family". A family is defined as a household’s parent household, sibling households, any existing grandparents and the household itself. 2) Nearest-neighbor information: agents are restricted to information from the neighbors within a given radius of their current location. 3) Best performers: agents rely solely on the accounts of the best performing households (considered as leaders) in the Valley for information on potential farms. With full information, a household could compare data from all potential farm plots in the valley when making a decision to select a new farm plot.
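The four configurations can be read as filters on the set of potential farm plots a household is allowed to consider. The sketch below is one possible rendering under stated assumptions: plots count as "known" to a source household when they lie within a radius of it, best performers are ranked by corn stock, and all field names and values are hypothetical.

```python
import math

def known_plots(household, households, plots, config, radius=2.0, top_n=1):
    """Return the potential farm plots visible to a household under a configuration."""
    if config == "full":
        return list(plots)
    if config == "family":
        # Parent, siblings, grandparents and the household itself (ids assumed precomputed).
        sources = [h for h in households if h["id"] in household["family"]]
    elif config == "neighbors":
        sources = [h for h in households
                   if h["id"] != household["id"]
                   and math.dist(h["pos"], household["pos"]) <= radius]
    elif config == "best":
        # Assumption: performance is measured by current corn stock.
        sources = sorted(households, key=lambda h: h["corn"], reverse=True)[:top_n]
    else:
        raise ValueError(f"unknown configuration: {config}")
    return [p for p in plots
            if any(math.dist(p["pos"], s["pos"]) <= radius for s in sources)]

households = [
    {"id": 1, "pos": (0, 0), "corn": 10, "family": {1, 2}},
    {"id": 2, "pos": (10, 0), "corn": 50, "family": {1, 2}},
    {"id": 3, "pos": (1, 1), "corn": 5, "family": {3}},
]
plots = [
    {"id": "A", "pos": (0, 1)},
    {"id": "B", "pos": (10, 1)},
    {"id": "C", "pos": (30, 30)},
]
me = households[0]
visible = {c: [p["id"] for p in known_plots(me, households, plots, c)]
           for c in ("full", "family", "neighbors", "best")}
```

Because each configuration only changes which plots reach the comparators, the same emotional and rational components can be evaluated unchanged under any of the four.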
Included in the emotional component set are two theories of homophily, one of community, and one of fleeing/migration. Homophily is the tendency for social entities to congregate among those with similar traits. In our study of the Artificial Anasazi we included two homophilic behaviors. The first is homophily by age: households prefer to move towards other households of a similar age to their own, where age is measured as the number of simulation steps the household has survived since splitting from its parent. The second is homophily by corn stock, where households tend to select farm plots near other households with a corn stock similar to their own. The third is the need to move near ‘communities’: agents score potential farms with many nearby households higher than those in isolation. The fourth emotional component, a theory of fleeing/migration, gives potential farms in a completely different zone from the current one a full sub-score, while patches in the same zone receive a sub-score of zero.
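As a rough illustration of how these four components could be scored for a single potential farm plot, consider the sketch below. The similarity measure 1 / (1 + mean absolute difference), the neighborhood radius and the field names are assumptions made for the example; only the migration rule (full sub-score for a different zone, zero otherwise) is taken directly from the description above.

```python
import math

def nearby(plot, households, radius):
    """Households located within radius of the plot."""
    return [h for h in households if math.dist(plot["pos"], h["pos"]) <= radius]

def homophily(plot, me, households, radius, trait):
    """Similarity in (0, 1] between me and households near the plot; 0 if none are near."""
    near = nearby(plot, households, radius)
    if not near:
        return 0.0
    mean_diff = sum(abs(h[trait] - me[trait]) for h in near) / len(near)
    return 1.0 / (1.0 + mean_diff)

def community(plot, households, radius):
    """Raw count of households near the plot (to be normalized across plots later)."""
    return len(nearby(plot, households, radius))

def migration(plot, me):
    """Full sub-score for a plot in a different zone, zero for the same zone."""
    return 1.0 if plot["zone"] != me["zone"] else 0.0

me = {"age": 10, "corn": 100, "zone": "north"}
others = [
    {"pos": (0, 0), "age": 10, "corn": 80},
    {"pos": (1, 0), "age": 14, "corn": 120},
    {"pos": (9, 9), "age": 50, "corn": 0},
]
plot = {"pos": (0, 1), "zone": "south"}
```

Passing `trait="age"` or `trait="corn"` gives the two homophily variants from one function, which keeps the components directly comparable.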
Rational mechanisms considered for the farm selection process were logical comparisons of sensory data on the potential farm plots known to the households. These comparisons considered: quality, dryness, yield, water availability and distance to the current farm plot.
Twenty genetic programming runs were performed on the farm selection strategy of the Artificial Anasazi agents. The objective of the genetic program was to select models with the lowest L2-error between the simulated population (number of households) time-series and the archaeological population time-series data. Details on the L2-error minimization can be found in (Gunaratne and Garibay, 2017). The genetic program was run using the ECJ platform developed by George Mason University, with each run lasting 50 generations with a population of 20 individuals. Syntax trees of minimum depth 2 and maximum depth 8 were used to avoid trees exhibiting bloat. The Half-and-Half tree builder was used for initialization (Koza, 1992). The 20 genetic programming runs were distributed across two Standard DS1 V2 nodes (1 vCPU, 3.5 GB memory) provided by the Microsoft Azure cloud computing platform and took around 72 hours to complete.
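For reference, an L2-error between two equal-length household-count time-series can be computed as below. This sketch assumes the plain Euclidean norm of the year-by-year differences; the exact definition used in the experiments is the one given in (Gunaratne and Garibay, 2017), and the function name is illustrative.

```python
import math

def l2_error(simulated, observed):
    """Euclidean (L2) distance between two equal-length time-series."""
    if len(simulated) != len(observed):
        raise ValueError("time-series must have the same length")
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)))

# Tiny illustrative series of household counts (not archaeological data).
error = l2_error([3, 0], [0, 4])
```

A lower value means the simulated population curve tracks the historical one more closely, so the genetic program treats this quantity as a cost to minimize.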
Next, we selected the five runs with the lowest error after 100 generations of the genetic program for parameter calibration. Only the best five were selected due to the long time needed to calibrate all 20 models. BehaviorSearch (Stonedahl and Wilensky, 2010a), a parameter calibration tool provided with Netlogo, was used for the calibration effort. BehaviorSearch was configured to use a genetic algorithm with Gray coding to find the parameter setting for each of the five best models at which the L2-error was minimal. The GA settings were as follows: mutation rate 0.05, population size 20, crossover rate 0.7, generational population model, tournament size 3, and fixed sampling of the mean fitness over 5 runs of each individual in the GA population, over 150 generations, for a total of 15,000 model runs for each calibration.
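Two details of this setup can be illustrated briefly. The sketch below shows the standard reflected binary Gray code, in which consecutive integers differ in exactly one bit so that a single bit-flip mutation can move a parameter to an adjacent value, and reproduces the evaluation budget from the settings above. It is a generic illustration and does not reproduce BehaviorSearch's internal encoding.

```python
def to_gray(n):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bit_difference(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Calibration budget: population size x generations x replicate runs per individual.
total_model_runs = 20 * 150 * 5
```

In plain binary, moving from 7 (0111) to 8 (1000) flips four bits, whereas the corresponding Gray codes 0100 and 1100 differ in one.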
The convergence of the genetic program runs evolving farm selection strategies is displayed in Figure 2. Table 1 provides a summary of the pseudo-code and full Netlogo instructions generated by all 20 runs and the best fitness of the final models at the end of the EMD search. Keeping in mind that minimizing a negative compare node is equivalent to maximizing the positive compare node, we are able to compare and contrast the trees. Figure 3 displays the syntax tree of the model that achieved the greatest fitness among the 20 runs. The corresponding model uses full information about potential farm plots in the Valley and picks plots with high quality and low yield.
The resulting candidate models of farm plot selection included several different combinations of social connectivity strategies, emotional/social theories and rational mechanisms. Yet there were clear patterns across these combinations, with certain mechanisms being more common than others. One particular model, selecting the potential farm plot with the least dryness assuming full information on all potential farm plots in the valley, was the result of three of the genetic program runs. Similarly, selecting the farm plot with the maximum crop quality assuming full information of the valley occurred twice. The same mechanism, selecting the plot with the highest quality, was also seen with information on potential farms bounded to potential plots near the household’s neighbors in two more of the candidate runs.
Table 1 columns: Run | Max/Min | Social Connectivity | Emotional and Rational Components | Final Koza Fitness
The five best performing models chosen for calibration were models 14, 6, 3, 4 and 18, in order of descending performance. The distributions of the parameter configurations obtained are displayed in 8. After calibration, each of the five models and the original model were compared by L2-error statistics over 100 runs each. Model 14 proved to be significantly better than the original farm selection strategy at 95% confidence. Figure 4 displays the L2-errors and their 95% confidence intervals. As can be seen, models 14, 6, 18, and 3 all have lower mean L2-errors than the original as well.
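A comparison of this kind can be sketched as follows. The paper does not state how the intervals were computed, so the normal-approximation interval (mean ± 1.96 standard errors) and the non-overlap check used here are assumptions; non-overlapping intervals are a conservative indication of a significant difference.

```python
import math
import statistics

def mean_ci95(errors):
    """Approximate 95% confidence interval for the mean of a sample of L2-errors."""
    mean = statistics.mean(errors)
    half_width = 1.96 * statistics.stdev(errors) / math.sqrt(len(errors))
    return mean - half_width, mean + half_width

def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

# Illustrative values only, not results from the study.
low, high = mean_ci95([10, 12, 14, 16])
```

With 100 runs per model, as in the comparison above, the normal approximation is usually reasonable; for small samples a t-based interval would be wider.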
Of the four proposed social connectivity schemes, selecting through full information of all the potential farm plots in the valley was by far the most common among the 20 candidate solutions, occurring in nine of the twenty evolved strategies. Selecting based on information about potential farm plots close to neighboring households was the second most common social connectivity strategy, occurring five times. Third, selecting the next farm from information bounded to potential farm plots known to ‘family households’ (or inherited information) occurred in four of the resulting trees. Bounding information to potential farm plots close to best performing households was least common, occurring only twice in the candidate solution set.
Among the three theories of emotional decision making, social presence was the most common: four of the twenty candidate models prefer potential farm plots with higher social presence (or an interaction of social presence with another property). However, one model selects lower social presence among potential farm plots near neighbors’ farm plots. Homophily by age was uncommon and in fact occurred inversely in two candidate models, where moving further from farms of similar age was preferred over moving closer to them. Homophily by corn stock was also uncommon when considering the next farm plot, occurring twice, with both maximization and minimization of the property.
The most common rational property considered when selecting the next farm plot was its quality, being maximized in nine of the candidate models. Selecting the plot with the lowest dryness was also common, occurring in five of the candidate models. Four of the candidate models aimed to select plots producing minimal yield. Comparison of water levels of the potential farm plot was less straightforward, with two models maximizing and two models minimizing the property. Interestingly, none of the candidate models considered distance as a property of concern when selecting the next farm plot.
The evolutionary model discovery process provided useful insights into the possible agent behavior causing observed macro-patterns in social data. We included three types of syntax nodes, following the Agent_Zero framework (Epstein, 2013), including social connectivity configurations, emotional theory components and rational mechanisms. By analyzing the frequency with which the particular nodes appear, their signs, coefficients and degrees, we can draw useful insights on the plausible mechanisms and motivations generating the observed macro-phenomena.
Impact of the individual components is compared in Figure 5 and has been calculated as the number of times an individual component is positively or negatively used when comparing potential farm plots. Coefficients are considered in this comparison and Figure 5 displays negative and positive impact values separately. The positive impact of quality on farm plot selection is clearly observable, being associated positively 10 times with farm plot selection. Dryness is the second most influential component, with a considerable negative impact. Community Size follows closely with a considerable positive impact on farm plot selection. Figure 6 displays the same analysis of impact for component interactions. Again, quality demonstrates its importance as the interaction has a much higher positive impact than other interactions.
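The impact tally can be sketched as below. Each candidate model is represented here as a maximize/minimize flag plus a list of (component, coefficient) terms; this encoding and the example models are hypothetical. The sign flip for minimizing models follows from the equivalence between minimizing a negative compare node and maximizing the positive one.

```python
from collections import defaultdict

def component_impact(models):
    """Tally coefficient-weighted positive and negative uses of each component."""
    positive = defaultdict(int)
    negative = defaultdict(int)
    for maximize, terms in models:
        for component, coefficient in terms:
            # Under minimization, a positive term counts against the plot.
            signed = coefficient if maximize else -coefficient
            if signed > 0:
                positive[component] += signed
            elif signed < 0:
                negative[component] += -signed
    return dict(positive), dict(negative)

# Hypothetical candidate models, not the evolved ones from Table 1.
models = [
    (True, [("quality", 1)]),                   # maximize quality
    (False, [("dryness", 1)]),                  # minimize dryness
    (True, [("quality", 2), ("dryness", -1)]),  # maximize 2*quality - dryness
    (False, [("quality", -1)]),                 # minimize -quality
]
positive, negative = component_impact(models)
```

Interaction terms such as a product of two components could be tallied the same way by using a tuple of component names as the key.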
An interesting observation is that distance does not occur in any of the candidate models, neither as a component nor in interaction with other components. This indicates that distance was perhaps one of the least important factors affecting the choice of farm plot. This demonstrates the risk of insufficient exploration of plausible behaviors and alternate candidate models introduced by manual model specification, and how evolutionary model discovery can help mitigate this risk by identifying important components and component interactions.
Finally, it is interesting to see that almost half of the candidate models have full information of the potential farm plots in the Valley. This could indicate that the Anasazi had sufficient information of the potential farm plots in the valley or it could indicate that more social connectivity configurations with vaster access have to be tested.
In isolation, agent-based models may not capture the actual ground truth of the agent behavior causing the social phenomena they are intended to simulate. This is because there may exist multiple theories that can be modeled and calibrated to produce the phenomena. It is cumbersome and risky for a modeler to manually design and create these models. Further, manually designing and developing each model can make it harder to cross-compare socially plausible candidate models. This leads to modelers inevitably developing models with behaviors or assumptions that have not been checked against alternate socially plausible theories that could potentially better explain the social phenomena being studied.
We have automated this process through evolutionary model discovery, by evolving socially plausible behaviors to fit patterns in real-world social data. Using genetic programming, evolutionary model discovery explores combinations of social connectivity, emotional theory components and rational mechanisms for multiple candidate models with rule-sets or behaviors that can produce the desired phenomena. In addition, the components of the candidate models can be compared with regard to their impact on the social phenomena by analyzing the frequency, sign and coefficients with which they occur in the set of candidate models.
Extending our initial study in (Gunaratne and Garibay, 2017), we search for plausible candidate models of the farm-seeking behavior of the Artificial Anasazi model, adding multiple social connectivity configurations and emotional theory components to the search process, and as a result discover a better performing model of farm plot selection. Evaluating the occurrence of behavior components across the evolved candidate models, we discover that the original comparator, distance to the current farm plot, does not have as significant an impact as any of the other components included in the search. In fact, distance does not appear at all in any of the 20 candidate solutions. Instead, quality of the potential farm plot showed a high positive impact on farm selection. Community size and dryness had moderate positive and negative impacts on the farm selection process, respectively. There is evidence in these results that the Anasazi did perform farm plot selection more intelligently, and perhaps with social influence.