How much data are needed to calibrate and test agent-based models?

11/20/2018 ∙ by Vivek Srikrishnan, et al.

Agent-based models (ABMs) are widely used to gain insights into the dynamics of coupled natural human systems and to assess risk management strategies. Choosing a sound model structure and parameters requires careful calibration. However, ABMs are often not calibrated in a formal statistical sense. One key reason for this lack of formal calibration is the potentially large data requirements for ABMs with path-dependence and nonlinear feedbacks. Using a perfect model experiment, we examine the impact of varying data record structures on (i) model calibration and (ii) the ability to distinguish a model with agent interactions from one without. We show how limited data sets may not constrain even a model with just four parameters. This finding raises doubts about many ABMs' predictive abilities in the absence of informative priors. We also illustrate how spatially aggregate data can be insufficient to identify the correct model structure. This emphasises the need for carefully fusing independent lines of evidence, for example from judgment and decision-making experiments, to select sound and informative priors.





Agent-based models (ABMs) can be a useful tool for modeling and understanding how macro-scale/aggregate features of complex systems emerge from micro-scale/individual decisions, interactions, and feedbacks (“generative” social science[1]). As a result, they have found use in many application areas, including land use change[2, 3, 4, 5, 6, 7, 8], ecology[9, 10, 11], flood risk[12, 13, 14, 15, 16], and climate change adaptation[17, 18, 19, 20, 21].

Models can be designed to address different questions about the modeled system, including prediction, explanation, and demonstration[22]. Marks[23] proposed a classification of simulation models as demonstrative or descriptive based on the model’s purpose. Demonstrative ABMs are used to illustrate that patterns of interest can be produced through local-level rules and interactions. Descriptive ABMs are intended to reproduce observed phenomena for the purpose of explanation, prediction, or both. The descriptive model category includes both simpler, “strategic” models and more complex, “tactical” models[24]. Early ABMs, such as the pioneering work on segregation[25], were primarily demonstrative[26, 23]. Over time, there has been an increase in descriptive models[26], the most famous of which is the Artificial Anasazi Model[27].

Both demonstrative and descriptive models require tests to ensure that the model works as intended (sometimes referred to as verification[28]). Descriptive models also benefit from a comparison of model output against observations[23]. Careful validation demonstrates that the model reproduces measured data, though this is not the same thing as demonstrating that the model is a reproduction of system dynamics[28], as all models are approximations of real processes[29].

This observation that models are only capable of approximating, rather than reproducing, real system dynamics shows the importance of descriptive model calibration: the process of selecting model structures and parameter values. One common approach to calibration is to tune model parameters until model outputs are close to the empirical data[6, 30, 31], but these procedures can lead to overfitting the model to the calibration data because they neglect the conditional and stochastic aspects of data-generation and observation[4]. To avoid overfitting and account for the stochastic elements of a model, another approach is to choose a model structure and parameter values which are most probable given the observations and prior information about system dynamics[32, 33].

ABM calibration can be complicated by path-dependence and nonlinearities resulting from feedbacks. However, whether descriptive ABMs are intended to be used for explanation or prediction, these features suggest a need for quantification of model and parametric uncertainty, as observed patterns may be contingent on stochastic forcings or particular initial conditions. Here we focus on the overarching question: How much data is required to probabilistically calibrate agent-based models? We use a Bayesian approach to uncertainty quantification, based on the Bayesian interpretation of parameter values as random variables.

We use a Markov Chain Monte Carlo (MCMC) calibration method, based on the Metropolis-Hastings algorithm[34]. MCMC is an extremely general method for sampling from the posterior distribution. MCMC has been used for calibrating ABMs[35], and is the method we use to avoid approximation effects. However, MCMC may be computationally intractable for complex models featuring long runtimes or high-dimensional parameter spaces. An additional complication is the need to specify a statistical likelihood function, which may be difficult for particular applications.
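To make the sampling step concrete, the following is a minimal sketch of a random-walk Metropolis-Hastings sampler. It is not the code used in this study: the function names, the Gaussian proposal with a fixed step size, and the two-dimensional standard normal toy target are all assumptions chosen for illustration.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_iter, step, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    theta = np.asarray(theta0, dtype=float)
    current = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    accepted = 0
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        candidate = log_post(proposal)
        # Accept with probability min(1, posterior ratio); the symmetric
        # proposal density cancels out of the Hastings ratio.
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_iter

# Hypothetical toy target: an unnormalized standard normal log-density.
def toy_log_post(theta):
    return -0.5 * np.sum(theta ** 2)

rng = np.random.default_rng(0)
chain, acc_rate = metropolis_hastings(toy_log_post, [3.0, -3.0], 20000, 1.0, rng)
burned = chain[5000:]  # discard the first draws as burn-in
print(acc_rate, burned.mean(axis=0), burned.std(axis=0))
```

In a real calibration, `toy_log_post` would be replaced by the sum of the log-prior and the log-likelihood of the ABM, which is where most of the computational cost arises.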

The Metropolis-Hastings method has the ability to produce high fidelity approximations to the full joint probability density function of the model parameters[36]. In general, there is a tradeoff between computational speed and accuracy of the resulting parameter distributions. Some alternative approaches to statistical calibration of ABMs, which are aimed at reducing computational requirements or likelihood specification, include statistical emulation[37, 38], particle filtering[39], and approximate Bayesian computation[40, 41, 31, 11]. While these methods reduce the computational burden, they come at a cost of potentially severe statistical approximations that can influence the parameter estimates[42, 43, 44, 45].

We address three specific questions. (i) How much data is required to statistically calibrate an ABM? The complexity of agent decision rules (in the sense of the number of parameters) and agent-agent and agent-environment interactions and feedbacks (in the sense of emergence) can reduce the ability to constrain model parameters or test hypotheses. (ii) Can we distinguish between models with varying levels of complexity, either in terms of high-dimensional decision rules or the types of agent interactions with each other and the environment? (iii) How are calibration and hypothesis-testing affected by the use of spatially-aggregated data (as opposed to observations of individual agents), which may be all that are available due to data-collecting limitations or considerations of anonymity?

For a concrete example, we focus on the particular problem of modeling housing abandonment under flood risks, following the insightful work of Tonn & Guikema[16]. Housing abandonment poses potentially severe economic problems for settlements along rivers and coastlines[46]. Residents who haven’t experienced flooding themselves may abandon their homes if their neighbors do, due to depreciating values or anticipation of future flooding. An associated ABM, and two nested submodels with fewer interactions and feedbacks, are illustrated by the influence diagrams in Figure 1.

Figure 1: Influence diagrams of three nested ABMs for housing abandonment decisions under evolving flood pressure. The black components form a basic model without interactions (the “no interaction” model), in which abandonment decisions are based only on floods experienced by each agent. The blue and black components form a model with spatial interactions (the “spatial interactions” model), in which agents move due to experienced floods and the proportion of neighboring lots which are vacant. For context, we also show a more complex model (with the red additions) with spatial and economic interactions (the “economic interactions” model), in which housing market dynamics are affected by abandonments and floods and the abandonment decision includes housing values.

We focus on the simpler submodels to address the question of calibrating relatively simple ABMs. Using the colors in Figure 1, these are the “no interactions” model, in black, and the “spatial interactions” model, in black and blue. In both cases, agents decide to vacate their homes using a probabilistic decision process (logistic regression), as opposed to maximizing utility or using heuristics (which are more common in ABMs in certain application areas, such as land use[47]). Once a house is abandoned, there is a chance that it is occupied by a new agent in a subsequent year.

Figure 2: Flood information for the synthetic housing domain and river used in this study. Subfigure a) shows the flood return periods for each parcel. The dashed outlines correspond to the sub-domains used in experiments with differing numbers of agents: orange is the 25-agent domain, purple is the 50-agent domain, and green is the 100-agent domain. Subfigure b) shows the maximum annual river heights, for the 50-year period prior to the start of the simulations and for the 50 years used in the maximum-length simulation.

We use these models in a perfect model experiment (see, for example, Olson et al.[48] or Reed & Kollat[49]), so that the data-generating process and parameters are known. The pseudo-observations are generated using the spatial-interactions model for an artificial riparian settlement and realizations of annual flood height maxima from a generalized extreme value distribution. The parcel return periods and river heights are shown in Figure 2. Details of the data-generating process are provided in the Methods section. The additional dynamic mechanism resulting from spatial interactions leads to increased probabilities of parcel abandonment for all return periods across realizations of the stochastic process, even for parcels that are far from floods (Figure 3).

Figure 3: Occupancy probability (over 1000 sample realizations) for each parcel after a 50-year model run for the no-interactions model and the spatial-interactions model. The simulated model used observations of 100 parcels.



The structure of the data (individual-parcel versus spatially-aggregated) strongly influences the final shape of the posterior distribution, both due to the number of data points and the different likelihood function specifications. Figure 4 shows the result of updating the prior distributions (specified in Table 1) with 50 years of pseudo-observations of 100 parcels. For certain key parameters (such as the logistic regression coefficient for the local flooding frequency), aggregated data (the total number of abandoned parcels at each time) leaves the posterior close to the prior (Figure 4 b). For individual-parcel data, the marginal posterior is sharpened much further (Figure 4 a).

Figure 4: Calibration results (prior and posterior distributions) for the spatial-interactions model after assimilating 50 years of observed data with 100 observed parcels. The dashed vertical line is the value used in the data-generating process. Panel a) is after assimilating individual-parcel data, and panel b) is after assimilating aggregated data.
Figure 5: Number of vacant parcels for one hundred sample realizations from the complex model, with spatial interactions. The simulated model used observations of 100 parcels for 50 years. The black line is the realization used for that calibration experiment in this study.

While it appears from Figure 4 a that the original decision rules are not fully recovered (looking at the posterior density at the data-generating value), it is important to keep in mind the influence of stochasticity in the realized data. Running the same model with the same parameters can yield model output with very different dynamics due to stochastic forcings, particularly in the presence of high levels of path dependence and positive feedbacks (see Figure 5). Between the strong influence of the stochastic elements in the model and the relative lack of sensitivity of the logistic regression to parameter values close to the data-generating value, it is not necessarily surprising that the data-generating value is assigned a relatively low density.

Figure 6: Prior and posterior hindcasts of the number of vacant parcels for the no-interactions model with aggregated data (panel a), the spatial-interactions model with aggregated data (panel b), the no-interactions model with individual-parcel data (panel c), and the spatial-interactions model with individual-parcel data (panel d) for varying combinations of observed years and parcels.

The full posterior parameter estimate illustrates one limitation of more deterministic approaches to calibration, particularly those which emphasize qualitative parameter selection, as many parameters are highly correlated. For example, the two logistic regression coefficients for flood frequency and proportion of neighboring abandoned parcels have a correlation coefficient of -0.66: a lower sensitivity to experienced floods can be offset by an increased sensitivity to neighbor behavior. Another example is the high positive correlation between the probability of a vacant lot being re-occupied and both the logistic regression intercept term and the coefficient for neighboring parcels (r=0.73 in both cases). Similar interactions would be missed by a deterministic calibration combined with one-at-a-time sensitivity analysis[50].

To validate the calibrated model, we look at the hindcasting ability of the posterior predictive distribution (shown in Figure 6). While the three-parameter no-interactions model is well constrained by smaller data sets, the lack of fit of the posterior predictive distribution compared to the pseudo-observations for increased amounts of data reveals the missing abandonment dynamic mechanism. Without spatial interactions, the no-interactions model calibration results in a higher sensitivity to experienced flooding to account for the data, which results in an overestimate of the number of abandoned parcels in later years. Meanwhile, the spatial-interactions model, which has one additional parameter, requires more data to constrain the model (25 observed parcels is insufficient with up to 50 years of data), but, once constrained, fits the pseudo-observations better than the no-interactions model. In general, having a larger spatial domain/numbers of agents facilitates calibration more than having a longer data record.

Model Selection

More complex ABMs can be thought of as being constructed by adding new interactions and feedbacks to simpler ABMs, as illustrated in Figure 1. This allows us to view this type of model selection as hypothesis testing for the presence of additional feedback mechanisms[51]. One standard method of comparing the fit of Bayesian models to data is by computing Bayes factors[52]. The Bayes factor is the ratio of the marginal likelihoods of two models (each marginal likelihood being the integral of the data likelihood over the prior). One important consideration when using Bayes factors is the role of the prior in the computation[33], particularly when they are used for point-null hypothesis testing. Here, we use the same priors for corresponding parameters to reduce this effect.
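As a small illustration of how a Bayes factor is read once the two log-marginal likelihoods are available, the sketch below converts their difference to the 2 ln(B) scale and labels it with the evidence categories of Kass and Raftery[52]. The function name and the numerical inputs are hypothetical and are not results from this study.

```python
def evidence_category(log_ml_1, log_ml_0):
    """Classify 2*ln(B10) for model 1 versus model 0 on the Kass-Raftery scale."""
    # The log-Bayes factor is the difference of the log-marginal likelihoods.
    two_ln_b = 2.0 * (log_ml_1 - log_ml_0)
    if two_ln_b < 0.0:
        label = "favors model 0"
    elif two_ln_b < 2.0:
        label = "not worth more than a bare mention"
    elif two_ln_b < 6.0:
        label = "positive"
    elif two_ln_b < 10.0:
        label = "strong"
    else:
        label = "very strong"
    return two_ln_b, label

# Hypothetical log-marginal likelihoods, for illustration only.
print(evidence_category(-100.0, -104.0))
```

Working on the log scale avoids numerical underflow, since marginal likelihoods of long data records are typically extremely small numbers.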

Figure 7: Log-Bayes Factors when comparing the with-interactions model to the no-interactions model for 50 and 100 parcels and 10, 25, and 50 years. The thresholds for varying degrees of evidence are taken from Kass and Raftery[52]. The marginal likelihoods for each model were estimated using bridge sampling[53] with 5000 samples and a truncated multivariate normal importance density.

For our perfect model experiment, we would expect additional (in terms of the number of observations) and spatially explicit (rather than aggregated) data to improve the ability to distinguish between the data-generating spatial-interactions model and the simpler no-interactions model. In Figure 7, we show the log-Bayes factors (along with thresholds for evidence levels proposed by Kass & Raftery[52]) to summarize the evidence for the spatial-interactions model versus the no-interactions model. We neglect the case with 25 observed parcels due to unreasonably high estimates, despite the poorly constrained spatial-interactions model parameters and the resulting qualitatively better fit of the no-interactions model. For individual-parcel data, with more than 25 observed parcels, there is at least strong evidence for the spatial-interactions model no matter how long the parcels were observed, which confirms the qualitative assessment (on the summary statistic of total abandoned parcels) obtained by comparing the hindcasts in Figures 6 c and 6 d.

On the other hand, when aggregated data is used for calibration, there is essentially no quantitative evidence for the spatial-interactions model. This is the case whether we compare the models using Bayes factors or a predictive information criterion such as the Watanabe-Akaike information criterion (WAIC)[54, 55]. Predictive model comparison methods avoid the direct influence of the prior on the comparison and allow for an intuitive comparison between models which have different parameterizations[56]. The one-standard error range of the difference in WAIC between the spatial-interactions and the no-interactions model is between -2 and 2, which can be interpreted as no difference in support between the two models[57]. However, a qualitative assessment obtained by comparing Figures 6 a and 6 b might lead a modeler to conclude that the spatial-interactions model fits the observations better than the no-interactions model. This suggests that hindcasting can serve an important supporting role to quantitative model selection.
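The WAIC comparison can be sketched as follows, assuming that a matrix of pointwise log-likelihoods (posterior draws by observations) has already been computed for each model. The function names and the randomly generated input matrices are hypothetical; this uses the standard variance-based penalty and is not necessarily the implementation used in this study.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S draws x N observations) matrix of pointwise log-likelihoods."""
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density, computed stably with a max shift.
    shift = log_lik.max(axis=0)
    lppd_i = shift + np.log(np.exp(log_lik - shift).sum(axis=0)) - np.log(n_draws)
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_i = log_lik.var(axis=0, ddof=1)
    elpd_i = lppd_i - p_i
    return -2.0 * elpd_i.sum(), elpd_i

def waic_difference(log_lik_a, log_lik_b):
    """Difference in WAIC (model a minus model b) and its standard error."""
    _, elpd_a = waic(log_lik_a)
    _, elpd_b = waic(log_lik_b)
    diff_i = -2.0 * (elpd_a - elpd_b)
    se = np.sqrt(diff_i.size * diff_i.var(ddof=1))
    return diff_i.sum(), se

# Hypothetical log-likelihood matrices: 1000 draws, 50 observations each.
rng = np.random.default_rng(2)
ll_a = -1.0 + 0.1 * rng.standard_normal((1000, 50))
ll_b = -1.0 + 0.1 * rng.standard_normal((1000, 50))
print(waic_difference(ll_a, ll_b))
```

A difference whose one-standard-error range straddles zero, as reported above for the aggregated data, gives no basis for preferring either model.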


Probabilistic calibration is an important component of the descriptive agent-based modeling process due to the influence of stochastic noise via path-dependence and feedback loops (as illustrated in Figure 5). However, as our results illustrate, each additional parameter can considerably increase the calibration data requirements. Trying to include every hypothesized feedback mechanism in the final model choice, without supporting evidence, can pose problems from statistical as well as decision-theoretical points of view[29, 32, 33]. Starting with a simple model and adding complexity when supported by the data can produce more skillful hindcasts, projections, and more powerful insights[58, 24].

An additional concern is the specification of prior distributions. When less data is available (particularly in summarized or aggregated form), that data will have less power to update the prior distributions. This suggests that priors should be as informative as possible (with a strong warning that priors ought not to be more informative than can be supported). While we did not take prior correlations between parameters into account for this experiment, good priors for real-world problems will include prior information about correlations between parameters.

One approach to creating informed priors which include information about the relationships between parameters is probabilistic inversion[59, 60], in which expert assessments (or, as an alternative, the results of judgement and decision-making or economic experiments) can be used to update more generic priors in a way which is consistent with those assessments or experimental results. This allows the survey or experimental participants to provide information directly about outcomes rather than about model parameters, and allows for a separation of the data involved in the prior construction and Bayesian updating processes.



The two ABMs used in this study are represented by the influence diagram in Figure 1. The simpler model, in which the probability of housing abandonment is determined only by the frequency of experienced floods over the previous ten years, is the “no-interactions” model, and is determined by three parameters: the logistic regression intercept, the logistic regression coefficient for flood frequency, and the probability that vacant houses are filled by a new agent.

The “spatial-interactions” model includes an additional logistic regression covariate, the fraction of neighboring plots which are vacant. As a result, this model has four parameters, including the coefficient for this neighboring-vacancy covariate.
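The decision rule described above can be written compactly as a logistic regression. The notation below is assumed for illustration and does not appear in the original text: $f_{i,t}$ denotes the flood frequency experienced by parcel $i$ over the previous ten years, $v_{i,t}$ the fraction of neighboring plots which are vacant, and $\beta_0$, $\beta_f$, $\beta_v$ the intercept and the two coefficients.

$$P(\text{parcel } i \text{ abandoned in year } t) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_f f_{i,t} + \beta_v v_{i,t}\right)\right]}$$

Under this notation, the no-interactions model corresponds to fixing $\beta_v = 0$, which is what makes the two models nested.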


We generated pseudo-observations for the perfect model experiment using the model with spatial interactions, to see if we could successfully test for this effect. Parcel residency was initialized by assuming that each parcel had a 99% probability of having a resident in year 0. We used varying combinations of observed years and parcels (see Figure 2 for the observed parcel domains). The combinations were 10, 25, and 50 years, and 25, 50, and 100 parcels. Annual maxima river heights were simulated from a generalized extreme value distribution with location parameter 865, scale parameter 11, and shape parameter 0.02. Data-generating parameter values were -6 for the logistic intercept, 20 for the local-flood coefficient, 4 for the neighboring-vacancy coefficient, and 0.01 for the vacancy-fill probability.
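The data-generating process can be sketched in a few lines. This is a simplified illustration and not the study's code: the 10-by-10 grid, the parcel flood thresholds in `elevation`, the eight-cell neighborhood, and the empty flood history at the start of the run are all assumptions, and the GEV shape parameter is assumed to follow the convention in which positive values give a heavy upper tail. Only the GEV parameters, the 99% initial occupancy, and the four data-generating parameter values are taken from the text.

```python
import numpy as np

def gev_sample(rng, n, loc, scale, shape):
    """Draw GEV variates by inverting the CDF (valid for shape != 0)."""
    u = rng.uniform(size=n)
    return loc + scale * ((-np.log(u)) ** (-shape) - 1.0) / shape

def vacant_neighbor_fraction(occupied):
    """Fraction of vacant lots among each parcel's (up to 8) grid neighbors."""
    vacant = (~occupied).astype(float)
    rows, cols = vacant.shape
    pad_v = np.pad(vacant, 1)                 # zero padding outside the domain
    pad_n = np.pad(np.ones_like(vacant), 1)   # counts neighbors inside the domain
    v_sum = np.zeros_like(vacant)
    n_sum = np.zeros_like(vacant)
    for dr in (0, 1, 2):
        for dc in (0, 1, 2):
            if dr == 1 and dc == 1:
                continue  # skip the parcel itself
            v_sum += pad_v[dr:dr + rows, dc:dc + cols]
            n_sum += pad_n[dr:dr + rows, dc:dc + cols]
    return v_sum / n_sum

def simulate(rng, heights, elevation, beta0, beta_flood, beta_vac, p_fill, window=10):
    """Return the number of vacant parcels in each simulated year."""
    n_years = heights.size
    occupied = rng.uniform(size=elevation.shape) < 0.99  # 99% occupied in year 0
    flooded = np.zeros((n_years,) + elevation.shape, dtype=bool)
    vacant_counts = np.empty(n_years, dtype=int)
    for t in range(n_years):
        flooded[t] = heights[t] > elevation
        # Flood frequency over the previous `window` years (history starts empty).
        freq = flooded[max(0, t - window + 1):t + 1].sum(axis=0) / window
        logit = beta0 + beta_flood * freq + beta_vac * vacant_neighbor_fraction(occupied)
        p_leave = 1.0 / (1.0 + np.exp(-logit))
        leave = occupied & (rng.uniform(size=occupied.shape) < p_leave)
        fill = ~occupied & (rng.uniform(size=occupied.shape) < p_fill)
        occupied = (occupied & ~leave) | fill
        vacant_counts[t] = (~occupied).sum()
    return vacant_counts

rng = np.random.default_rng(1)
heights = gev_sample(rng, 50, loc=865.0, scale=11.0, shape=0.02)
# Hypothetical flood thresholds: parcels farther from the river sit higher.
elevation = np.tile(np.linspace(880.0, 930.0, 10), (10, 1))
vacant_counts = simulate(rng, heights, elevation, beta0=-6.0, beta_flood=20.0,
                         beta_vac=4.0, p_fill=0.01)
print(vacant_counts)
```

Rerunning with different seeds gives a feel for how strongly a single realization depends on the stochastic forcing, which is the effect shown in Figure 5.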

As data may not be available in individualized forms, we examine the power of data for calibration and hypothesis testing about model structures in both individual and aggregate forms. In the individual case, the data set contains observations of each observed parcel at each time. In the aggregate case, we observe the total number of abandoned parcels at each time.


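We use a Bayesian framework for model calibration, based on Bayes’ Theorem,

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta),$$

where $p(\theta \mid y)$ is the posterior density of the parameters $\theta$ given the data $y$, $p(y \mid \theta)$ is the data likelihood, and $p(\theta)$ is the prior.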

Priors are provided in Table 1. These priors were constructed using a rough understanding of the model dynamics, so that the resulting probabilities of abandonment seemed plausible. They are intentionally not centered on the known data-generating parameter values.

Parameter                   Prior Distribution
Intercept                   Normal(-7, 1)
Flood Coefficient           Normal(19, 2)
Vacancy Coefficient         Normal(5, 2)
Vacancy Fill Probability    Beta(1, 10)
Table 1: Prior distributions for each of the parameters in the no-interactions and spatial-interactions models.
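For illustration, the joint log-prior implied by Table 1 could be evaluated as below, assuming independent priors. The function names are hypothetical, and the second argument of each Normal distribution is assumed here to be a standard deviation, which the table does not state.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log-density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2

def log_prior(intercept, flood_coef, vacancy_coef, fill_prob):
    """Joint log-prior from Table 1 (second Normal argument assumed to be an sd)."""
    if not 0.0 < fill_prob < 1.0:
        return -math.inf  # the fill probability must lie strictly between 0 and 1
    # Beta(1, 10) has density 10 * (1 - p)^9 on (0, 1).
    log_beta = math.log(10.0) + 9.0 * math.log(1.0 - fill_prob)
    return (normal_logpdf(intercept, -7.0, 1.0)
            + normal_logpdf(flood_coef, 19.0, 2.0)
            + normal_logpdf(vacancy_coef, 5.0, 2.0)
            + log_beta)

# Log-prior at the data-generating values and at the prior means.
print(log_prior(-6.0, 20.0, 4.0, 0.01), log_prior(-7.0, 19.0, 5.0, 0.01))
```

For the no-interactions model, the vacancy-coefficient term would simply be dropped.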

For both individual-parcel and aggregate data, we model the probability of each parcel being vacant and compute the appropriate likelihood, treating each parcel’s vacant status at time $t$ as independent and identically distributed conditional on the state at time $t-1$. This representation (marginalizing over agent states to represent the model dynamics as a Markov chain) is common for many ABMs[62]. In the individual data case, we use a binomial likelihood for each parcel at each time, with the probability of a vacant parcel determined using the Markovian representation after marginalizing. In the aggregate data case, we use a Poisson likelihood on the expected number of vacant parcels.
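The two likelihood structures can be contrasted with a short sketch. The function names and the small input arrays are hypothetical, and the one-step transition in `vacancy_prob` is one plausible reading of the Markovian representation described above, not a transcription of the study's code.

```python
import numpy as np
from math import lgamma

def vacancy_prob(prev_vacant, p_leave, p_fill):
    """P(vacant at t | state at t-1): stay vacant, or abandon an occupied parcel."""
    return np.where(prev_vacant, 1.0 - p_fill, p_leave)

def individual_loglik(vacant, p_vacant):
    """Bernoulli (one-trial binomial) log-likelihood over parcels and years."""
    p = np.clip(p_vacant, 1e-12, 1.0 - 1e-12)
    return float(np.sum(np.where(vacant, np.log(p), np.log1p(-p))))

def aggregate_loglik(n_vacant, p_vacant):
    """Poisson log-likelihood of yearly counts; mean = sum of parcel probabilities."""
    lam = np.clip(p_vacant.sum(axis=1), 1e-12, None)
    k = np.asarray(n_vacant, dtype=float)
    log_fact = np.array([lgamma(x + 1.0) for x in k])
    return float(np.sum(k * np.log(lam) - lam - log_fact))

# Hypothetical example: 2 years (rows) by 3 parcels (columns).
prev_vacant = np.array([[False, False, True], [False, True, True]])
vacant = np.array([[False, True, True], [False, True, True]])
p_vacant = vacancy_prob(prev_vacant, p_leave=0.05, p_fill=0.01)
print(individual_loglik(vacant, p_vacant))
print(aggregate_loglik(vacant.sum(axis=1), p_vacant))
```

The individual-parcel likelihood has one term per parcel and year, while the aggregate likelihood has one term per year, which is one reason the aggregated data constrain the posterior less.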

The models described above are simple enough for us to use MCMC for Bayesian computation. We use 150,000 Metropolis-Hastings iterations after a preliminary adaptive run[63] of 30,000 iterations, which is used to estimate the jump covariance matrix and starting value of the production run. The preliminary run is initialized at the maximum-likelihood estimate. These runs took from several hours to several days, depending on the model and data structure.

Model Selection

Marginal likelihoods for each model are estimated using the method of bridge sampling[53]. The importance density is a truncated multivariate normal with mean and covariance derived from the MCMC output. The truncation occurs along the vacancy fill probability dimension, to ensure that this parameter only takes values between 0 and 1. We used 5,000 posterior and importance samples in the bridge sampling estimator, which resulted in standard errors[64] for the log-marginal likelihoods with orders of magnitude smaller than 1e-3. WAIC and the standard errors of the differences (Vehtari et al. 2016) were computed using 10,000 posterior samples.
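A minimal sketch of the iterative bridge sampling estimator is given below on a toy problem where the answer is known. Everything here is an assumption for illustration: the target is an unnormalized one-dimensional standard normal (true marginal likelihood $\sqrt{2\pi}$), the importance density is an untruncated normal fitted to the posterior draws, and the function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Normalized density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def bridge_sampling(q, post_draws, g_pdf, g_draws, n_iter=50):
    """Iterative bridge sampling estimate of the normalizing constant of q."""
    # Ratios of unnormalized posterior to importance density at both sample sets.
    l1 = q(post_draws) / g_pdf(post_draws)
    l2 = q(g_draws) / g_pdf(g_draws)
    n1, n2 = l1.size, l2.size
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    r = 1.0  # initial guess for the marginal likelihood
    for _ in range(n_iter):
        numerator = np.mean(l2 / (s1 * l2 + s2 * r))
        denominator = np.mean(1.0 / (s1 * l1 + s2 * r))
        r = numerator / denominator
    return r

rng = np.random.default_rng(3)

def q(theta):
    """Toy unnormalized posterior: exp(-theta^2 / 2)."""
    return np.exp(-0.5 * theta ** 2)

post_draws = rng.standard_normal(5000)          # stands in for MCMC output
g_mean, g_sd = post_draws.mean(), post_draws.std()
g_draws = g_mean + g_sd * rng.standard_normal(5000)
z_hat = bridge_sampling(q, post_draws, lambda x: normal_pdf(x, g_mean, g_sd), g_draws)
print(np.log(z_hat), 0.5 * np.log(2.0 * np.pi))
```

For realistic likelihoods the ratios would be computed on the log scale to avoid underflow, and the importance density would be truncated as described above so that it places no mass on invalid fill probabilities.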


The authors would like to thank Ben S. Lee, Joel Roop-Eckart, and Tony E. Wong for their valuable input and contributions. This work was partially supported by the National Science Foundation (NSF) through the Network for Sustainable Climate Risk Management (SCRiM) under NSF cooperative agreement GEO-1240507 and the Penn State Center for Climate Risk Management. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. All codes for pseudo-data generation, model analysis and figure generation can be found at

Author contributions statement

V.S. and K.K. conceptualized the research. V.S. wrote the model and analysis codes. V.S. and K.K. designed the figures and wrote the paper.

Additional information

Competing interests: The authors declare no competing interests.


  • [1] Epstein, J. M. Agent-based computational models and generative social science. Complexity 4, 41–60 (1999).
  • [2] Parker, D. C., Manson, S. M., Janssen, M. A., Hoffmann, M. J. & Deadman, P. Multi-agent systems for the simulation of land-use and land-cover change: A review. Ann. Assoc. Am. Geogr. 93, 314–337, DOI: 10.1111/1467-8306.9302004 (2003).
  • [3] Evans, T. P. & Kelley, H. Multi-scale analysis of a household level agent-based model of landcover change. J. Environ. Manage. 72, 57–72, DOI: 10.1016/j.jenvman.2004.02.008 (2004).
  • [4] Brown, D. G., Page, S., Riolo, R., Zellner, M. & Rand, W. Path dependence and the validation of agent‐based spatial models of land use. Int. J. Geogr. Inf. Sci. 19, 153–174, DOI: 10.1080/13658810410001713399 (2005).
  • [5] Evans, T. P. & Kelley, H. Assessing the transition from deforestation to forest regrowth with an agent-based model of land cover change for south-central indiana (USA). Geoforum 39, 819–832, DOI: 10.1016/j.geoforum.2007.03.010 (2008).
  • [6] Kelley, H. & Evans, T. The relative influences of land-owner and landscape heterogeneity in an agent-based model of land-use. Ecol. Econ. 70, 1075–1087, DOI: 10.1016/j.ecolecon.2010.12.009 (2011).
  • [7] Evans, M. R. et al. Do simple models lead to generality in ecology? Trends Ecol. Evol. 28, 578–583, DOI: 10.1016/j.tree.2013.05.022 (2013).
  • [8] Brown, C., Alexander, P., Holzhauer, S. & Rounsevell, M. D. A. Behavioral models of climate change adaptation and mitigation in land-based sectors. Wiley Interdiscip. Rev. Clim. Change 8, e448 (2017).
  • [9] Black, A. J. & McKane, A. J. Stochastic formulation of ecological models and their applications. Trends Ecol. Evol. 27, 337–345, DOI: 10.1016/j.tree.2012.01.014 (2012).
  • [10] Grimm, V. Ten years of individual-based modelling in ecology: what have we learned and what could we learn in the future? Ecol. Modell. 115, 129–148, DOI: 10.1016/S0304-3800(98)00188-4 (1999).
  • [11] van der Vaart, E., Johnston, A. S. A. & Sibly, R. M. Predicting how many animals will be where: How to build, calibrate and evaluate individual-based models. Ecol. Modell. 326, 113–123, DOI: 10.1016/j.ecolmodel.2015.08.012 (2016).
  • [12] Aerts, J. C. J. H. et al. Integrating human behaviour dynamics into flood disaster risk assessment. Nat. Clim. Chang. 8, 193–199, DOI: 10.1038/s41558-018-0085-1 (2018).
  • [13] Dubbelboer, J., Nikolic, I., Jenkins, K. & Hall, J. An Agent-Based model of flood risk and insurance. JASSS 20, DOI: 10.18564/jasss.3135 (2017).
  • [14] Haer, T., Botzen, W. J. W. & Aerts, J. C. J. H. The effectiveness of flood risk communication strategies and the influence of social networks—insights from an agent-based model. Environ. Sci. Policy 60, 44–52, DOI: 10.1016/j.envsci.2016.03.006 (2016).
  • [15] Jenkins, K., Surminski, S., Hall, J. & Crick, F. Assessing surface water flood risk and management strategies under future climate change: Insights from an Agent-Based model. Sci. Total Environ. 595, 159–168, DOI: 10.1016/j.scitotenv.2017.03.242 (2017).
  • [16] Tonn, G. L. & Guikema, S. D. An Agent-Based model of evolving community flood risk. Risk Anal. DOI: 10.1111/risa.12939 (2017).
  • [17] Balbi, S., Giupponi, C., Perez, P. & Alberti, M. A spatial agent-based model for assessing strategies of adaptation to climate and tourism demand changes in an alpine tourism destination. Environmental Modelling & Software 45, 29–51, DOI: 10.1016/j.envsoft.2012.10.004 (2013).
  • [18] Barthel, R. et al. An integrated modelling framework for simulating regional-scale actor responses to global change in the water domain. Environmental Modelling & Software 23, 1095–1121, DOI: 10.1016/j.envsoft.2008.02.004 (2008).
  • [19] Gerst et al. Agent-based modeling of climate policy: An introduction to the ENGAGE multi-level model framework. Environmental Modelling & Software 44, 62–75, DOI: 10.1016/j.envsoft.2012.09.002 (2013).
  • [20] Schneider, S. H., Easterling, W. E. & Mearns, L. O. Adaptation: Sensitivity to natural variability, agent assumptions and dynamic climate changes. Clim. Change 45, 203–221, DOI: 10.1023/a:1005657421149 (2000).
  • [21] Ziervogel, G., Bithell, M., Washington, R. & Downing, T. Agent-based social simulation: a method for assessing the impact of seasonal climate forecast applications among smallholder farmers. Agric. Syst. 83, 1–26, DOI: 10.1016/j.agsy.2004.02.009 (2005).
  • [22] Epstein, J. M. Why model? Journal of Artificial Societies and Social Simulation 11, 12 (2008).
  • [23] Marks, R. E. Validation and model selection: Three similarity measures compared. Complexity Economics 0, 11 (2011).
  • [24] Holling, C. S. CHAPTER 8 - the strategy of building models of complex ecological systems. In Watt, K. E. F. (ed.) Systems Analysis in Ecology, 195–214, DOI: 10.1016/B978-1-4832-3283-6.50014-5 (Academic Press, 1966).
  • [25] Schelling, T. C. Dynamic models of segregation. J. Math. Sociol. 1, 143–186, DOI: 10.1080/0022250X.1971.9989794 (1971).
  • [26] Janssen, M. & Ostrom, E. Empirically based, agent-based models. Ecol. Soc. 11, DOI: 10.5751/ES-01861-110237 (2006).
  • [27] Dean, J. S. et al. Understanding anasazi culture change through agent-based modeling. In Kohler, T. A. & Gumerman, G. J. (eds.) Dynamics in Human and Primate Societies, Santa Fe Institute Studies on the Sciences of Complexity, 179–205 (Oxford: Oxford University Press, 2000).
  • [28] Oreskes, N., Shrader-Frechette, K. & Belitz, K. Verification, validation, and confirmation of numerical models in the earth sciences. Science 263, 641–646, DOI: 10.1126/science.263.5147.641 (1994).
  • [29] Box, G. E. P. Science and statistics. J. Am. Stat. Assoc. 71, 791–799, DOI: 10.1080/01621459.1976.10480949 (1976).
  • [30] Schwarz, N. & Ernst, A. Agent-based modeling of the diffusion of environmental innovations — an empirical approach. Technol. Forecast. Soc. Change 76, 497–511, DOI: 10.1016/j.techfore.2008.03.024 (2009).
  • [31] van der Vaart, E., Beaumont, M. A., Johnston, A. S. A. & Sibly, R. M. Calibration and evaluation of individual-based models using approximate bayesian computation. Ecol. Modell. 312, 182–190, DOI: 10.1016/j.ecolmodel.2015.05.020 (2015).
  • [32] Jaynes, E. T. Probability Theory: The Logic of Science (Cambridge University Press, 2003).
  • [33] Robert, C. P. The Bayesian choice: from decision-theoretic foundations to computational implementation (Springer, New York, 2007), 2nd edn.
  • [34] Hastings, W. K. Monte carlo sampling methods using markov chains and their applications. Biometrika 57, 97–109, DOI: 10.2307/2334940 (1970).
  • [35] Keith, J. M. & Spring, D. Agent-based bayesian approach to monitoring the progress of invasive species eradication programs. Proc. Natl. Acad. Sci. U. S. A. 110, 13428–13433, DOI: 10.1073/pnas.1216146110 (2013).
  • [36] Robert, C. P. The Metropolis–Hastings algorithm. In Wiley StatsRef: Statistics Reference Online, DOI: 10.1002/9781118445112.stat07834 (John Wiley & Sons, Ltd, 2014).
  • [37] Lamperti, F., Roventini, A. & Sani, A. Agent-based model calibration using machine learning surrogates. J. Econ. Dyn. Control 90, 366–389, DOI: 10.1016/j.jedc.2018.03.011 (2018).
  • [38] Oyebamiji, O. K. et al. Gaussian process emulation of an individual-based model simulation of microbial communities. J. Comput. Sci. 22, 69–84, DOI: 10.1016/j.jocs.2017.08.006 (2017).
  • [39] Kattwinkel, M. & Reichert, P. Bayesian parameter inference for individual-based models using a particle Markov chain Monte Carlo method. Environmental Modelling & Software 87, 110–119, DOI: 10.1016/j.envsoft.2016.11.001 (2017).
  • [40] Fabretti, A. Markov chain analysis in agent-based model calibration by classical and simulated minimum distance. Knowl. Inf. Syst. DOI: 10.1007/s10115-018-1258-y (2018).
  • [41] Sirén, J., Lens, L., Cousseau, L. & Ovaskainen, O. Assessing the dynamics of natural populations by fitting individual-based models with approximate Bayesian computation. Methods Ecol. Evol. 41, 379, DOI: 10.1111/2041-210X.12964 (2018).
  • [42] Frazier, D. T., Martin, G. M., Robert, C. P. & Rousseau, J. Asymptotic properties of approximate Bayesian computation. Biometrika 105, 593–607, DOI: 10.1093/biomet/asy027 (2018).
  • [43] Künsch, H. R. Particle filters. Bernoulli 19, 1391–1403, DOI: 10.3150/12-BEJSP07 (2013). arXiv: 1309.7807.
  • [44] Robert, C. P., Cornuet, J.-M., Marin, J.-M. & Pillai, N. S. Lack of confidence in approximate Bayesian computation model choice. Proc. Natl. Acad. Sci. U. S. A. 108, 15112–15117, DOI: 10.1073/pnas.1102900108 (2011).
  • [45] Singh, R., Quinn, J. D., Reed, P. M. & Keller, K. Skill (or lack thereof) of data-model fusion techniques to provide an early warning signal for an approaching tipping point. PLoS One 13, e0191768, DOI: 10.1371/journal.pone.0191768 (2018).
  • [46] Fowler, L. B. et al. Flood mitigation for Pennsylvania’s rural communities: Community-scale impact of federal policies. Tech. Rep., The Center for Rural Pennsylvania. URL http://www.rural.palegislature.us/documents/reports/Flood-Mitigation-2017.pdf. Accessed 01-30-2018 (2017).
  • [47] Groeneveld, J. et al. Theoretical foundations of human decision-making in agent-based land use models – a review. Environmental Modelling & Software 87, 39–48, DOI: 10.1016/j.envsoft.2016.10.008 (2017).
  • [48] Olson, R. et al. What is the effect of unresolved internal climate variability on climate sensitivity estimates? J. Geophys. Res. D: Atmos. 118, 4348–4358, DOI: 10.1002/jgrd.50390 (2013).
  • [49] Reed, P. M. & Kollat, J. B. Save now, pay later? multi-period many-objective groundwater monitoring design given systematic model errors and uncertainty. Adv. Water Resour. 35, 55–68, DOI: 10.1016/j.advwatres.2011.10.011 (2012).
  • [50] Ten Broeke, G., Van Voorn, G. & Ligtenberg, A. Which sensitivity analysis method should I use for my agent-based model? Journal of Artificial Societies and Social Simulation 19, 5 (2016).
  • [51] Cottineau, C., Reuillon, R., Chapron, P., Rey-Coyrehourcq, S. & Pumain, D. A modular modelling framework for hypotheses testing in the simulation of urbanisation. Systems 3, 348–377, DOI: 10.3390/systems3040348 (2015).
  • [52] Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795, DOI: 10.1080/01621459.1995.10476572 (1995).
  • [53] Meng, X.-L. & Wong, W. H. Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Stat. Sin. 6, 831–860 (1996).
  • [54] Watanabe, S. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 11, 3571–3594 (2010).
  • [55] Vehtari, A., Gelman, A. & Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 1–20, DOI: 10.1007/s11222-016-9696-4 (2016).
  • [56] Gelfand, A. E. & Ghosh, S. K. Model choice: A minimum posterior predictive loss approach. Biometrika 85, 1–11, DOI: 10.1093/biomet/85.1.1 (1998).
  • [57] Burnham, K. P. & Anderson, D. R. Multimodel inference: Understanding AIC and BIC in model selection. Sociol. Methods Res. 33, 261–304, DOI: 10.1177/0049124104268644 (2004).
  • [58] Box, G. E. P. Robustness in the strategy of scientific model building. In Launer, R. L. & Wilkinson, G. N. (eds.) Robustness in Statistics, 201–236, DOI: 10.1016/B978-0-12-438150-6.50018-2 (Academic Press, 1979).
  • [59] Kraan, B. C. & Cooke, R. M. Uncertainty in compartmental models for hazardous materials - a case study. J. Hazard. Mater. 71, 253–268, DOI: 10.1016/S0304-3894(99)00082-5 (2000).
  • [60] Fuller, R. W., Wong, T. E. & Keller, K. Probabilistic inversion of expert assessments to inform projections about Antarctic ice sheet responses. PLoS One 12, e0190115, DOI: 10.1371/journal.pone.0190115 (2017).
  • [61] Bayes, T. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London 53, 370–418 (1763).
  • [62] Izquierdo, L. R., Izquierdo, S. S., Galán, J. M. & Santos, J. I. Techniques to understand computer simulations: Markov chain analysis. Journal of Artificial Societies and Social Simulation 12, 6 (2009).
  • [63] Vihola, M. Robust adaptive Metropolis algorithm with coerced acceptance rate. Stat. Comput. 22, 997–1008, DOI: 10.1007/s11222-011-9269-5 (2012).
  • [64] Frühwirth-Schnatter, S. Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Econom. J. 7, 143–167, DOI: 10.1111/j.1368-423X.2004.00125.x (2004).