ABM - Agent Based Modelling
ABC - Approximate Bayesian Computation
ODD - Overview, Design concepts, Details
BDI - Belief-Desire-Intention
HCC - Hamilton City Council
CBD - Central Business District in Hamilton, New Zealand
AW - Count data collected for the southwest direction at the Anglesea-Ward intersection
TR - Count data collected for the northeast direction on Ward St towards Victoria St
TA - Count data collected for the southeast direction on Worley Pl
CPS - Count data collected for the southeast direction out of Centre Place (South) to Civic Square
Agent Based Modelling refers to a class of computational models simulating the dynamic behaviour, reactions and interactions of a set of agents in a shared environment (Abar, Theodoropoulos, Lemarinier, & O’Hare, 2017). A wide range of real-world systems can be modelled using this framework. It is particularly effective in complex systems of multiple heterogeneous, interacting agents. Valuable information can be gathered by observing the aggregate behaviour of a micro-level simulation where agents interact with both other agents and the environment.
A key component of ABM research is the post-hoc analysis of the model output. Because the simulation is stochastic, there is vast flexibility in the quantity and type of data a given ABM can produce. As a result, it is necessary to output only the most relevant data and analyse it with efficient techniques in order to answer the original set of research questions effectively. In a traditional Bayesian context, there exists a set of random parameters describing the function of an ABM. These parameters are usually inferred by fitting a model to the data and calculating the probability of the parameters given the data (the posterior distribution). However, it is often the case that a complex system has an intractable likelihood, where the probability of the data given the parameters cannot be calculated.
In this setting, alternative approaches must be examined to find the posterior distribution. One such ‘likelihood free’ approach is Approximate Bayesian Computation (ABC). Where observed data can be compared against simulated data, ABC can be used to infer a set of parameters for a model. This turns out to be a suitable method for the analysis of an agent based model, where the likelihood is often intractable. Various examples of analysing an ABM using the ABC algorithm are shown, with the focus being on modelling pedestrian flows in the Hamilton CBD. ABC is used to validate the ABM by approximating the observed data, as well as to infer a set of probability parameters for the movement of pedestrians.
3 Agent Based Modelling
Agent Based Modelling (ABM) is a computational framework for simulating the behaviours and interactions of autonomous agents. The agents can represent any individual or collective entity within a complex system. They each have a set of properties and behaviours, which are carried out at each time step of the model simulation. The output of the model gives us a means of assessing how the actions of individual entities affect the system as a whole. In a complex system it is often the case that an emergent pattern is seen, but the actions of the agents making up the system are not readily visible. Constructing a model using the ABM framework allows us to understand how the properties and behaviours of individual agents develop this emergent pattern.
Joslyn & Rocha, (2000) define a complex system as “consisting of a large number of interacting components (agents, processes, etc.) whose aggregate activity is non-linear (not derivable from the summations of the activity of individual components), and typically exhibits hierarchical self-organization under selective pressures”. Such systems contain a number of complexities in state space, interactions, behaviour and spatio-temporal structure. A key property is that of emergence. When a large number of agents in a complex system interact in a spatio-temporal structure, properties can be observed which are usually not readily apparent from observing the agents independently. ABM allows us to construct a complex system in such a way that the aggregate behaviour of individual agents can be observed. By constructing such a model on a microscale level, it is often the case that more information can be extracted than viewing the model solely by its aggregate behaviour.
ABM has been used across a wide range of scientific disciplines, including ecology, economics, and biology. In the biological sciences, recent advances in computational tools have allowed for the simulation of individual cells in order to observe the aggregate behaviour. One application is the modelling of the chemotactic response of E. coli cells to chemoattractant gradients in a 3D environment (Emonet et al, 2005). Representing cells as autonomous agents competing in a spatial environment is well suited to such biological analysis, where physical observation of the process is not possible or practical. More recently, ABM has been used in the urban planning of city environments. As the dynamic structure of transportation modes evolves and the use of alternative modes increases, it is now vital to understand how traffic flows change across an urban environment. A tool developed by Aschwanden, Wullschleger, Müller, & Schmitt (2012) allowed for the simulation of a city’s transportation network to evaluate greenhouse gas emissions and pedestrian flow.
Much of the phenomena observed in the world can be modelled with an ABM. For example, consider the ecological predator-prey model. There are two popular approaches to understanding the dynamics of the change in populations between two species where one predates on the other. One is known as the Lotka–Volterra model, developed independently by Lotka (1925) and Volterra (1926). This pair of first-order non-linear differential equations describes the change in the populations of the two species over time.
$$\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = \delta x y - \gamma y$$

where $x$ = the population of the prey species,
$y$ = the population of the predator species,
$\frac{dx}{dt}$ and $\frac{dy}{dt}$ represent the instantaneous growth of the two populations,
and $\alpha$, $\beta$, $\gamma$ and $\delta$ are positive real parameters which describe the interaction between the two species. Each parameter can be adjusted and a solution yielded to find the change in population of two interacting species. While the Lotka-Volterra model has a long history of use in ecology, it is too simplistic to apply to ecosystems where predator competition is involved. In addition, it is not possible for either species in the model to saturate completely.
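The equations above can be integrated numerically to trace the two populations over time. The following sketch uses simple Euler steps; the parameter values are illustrative only, not fitted to any data set.

```python
def lotka_volterra(x0, y0, alpha, beta, gamma, delta, dt=0.001, steps=20000):
    """Euler integration of dx/dt = alpha*x - beta*x*y,
    dy/dt = delta*x*y - gamma*y."""
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        dx = (alpha * x - beta * x * y) * dt
        dy = (delta * x * y - gamma * y) * dt
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory

# Illustrative parameters; equilibrium sits at x* = gamma/delta, y* = alpha/beta.
traj = lotka_volterra(x0=5.0, y0=3.0, alpha=1.1, beta=0.4, gamma=0.4, delta=0.1)
prey_final, predator_final = traj[-1]
```

Note that, as the text observes, neither population can reach zero under this model: the trajectories oscillate around the equilibrium point indefinitely.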
An alternative method to describe predator-prey interaction is that of an agent based model. Consider the interaction between wolf and sheep in a field. First we describe this model by answering a set of key questions. Describing an agent based model in this way enables another researcher to quickly and easily understand its function.
Purpose: Under what conditions is an equilibrium of the wolf and sheep population levels obtained, when wolves predate on the sheep and sheep consume grass?
Entities: Wolves, Sheep, Grass
State variables: Wolf and Sheep: Energy, Location, Heading. Grass: Amount of grass
Behaviours: Wolf and Sheep: Move, Die, Reproduce. Wolf: Eat sheep. Sheep: Eat grass. Grass: Grow
Parameters: Number of sheep, number of wolves, movement cost, energy gain from grass, energy gain from sheep, grass regrowth rate
Scheduling: Sheep & Wolves: 1. Move, 2. Die, 3. Eat, 4. Reproduce; Grass: 5. Grow
Output: Sheep and wolf population versus time
The two species can be represented as two sets of agents. Each agent is given the properties chosen as having the most influence on the output we want to observe. Both the sheep and wolves are given three shared properties: an energy level, a location and a heading (direction). There is no limit to the number of properties, but for simplicity they should be restricted to those necessary to answer the research question.

The second step is to describe the environment in which the agents interact. Here the environment is a field with patches of grass. This could be extended to include fences, elevation and other features, but again for simplicity these are left out of the first model. When the agents reach the edge of the world, they reappear on the opposite side. This is known as a ‘torus shaped’ boundary condition, and is a common feature in ABMs.

Next, the agent behaviour is described. Both sheep and wolves have the ability to move, eat, reproduce, and die. Where they differ is in what each species eats - wolves consume sheep while sheep consume grass. Parameters are then chosen to control various attributes of the model. This allows the observation of changes in the model under varying parameters - the numbers of each population, the amount of grass available, or the reproduction rate can be changed and the aggregate effect of each observed.

At each time step (or tick, which will be used hereafter), the agents carry out their actions. Both the sheep and wolves move, die, eat, then reproduce. The grass grows at each tick. The populations of the two species are observed as the aggregate behaviour.
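The tick cycle described above can be sketched as a toy, non-spatial simulation. The probabilities, energy thresholds and grass-finding chance below are illustrative assumptions for the sketch, not the rules of the actual NetLogo model.

```python
import random

def run_wolf_sheep(n_sheep, n_wolves, grass_energy, ticks, seed=1):
    """Toy, non-spatial wolf-sheep sketch. Each agent is just an energy level;
    movement costs energy, eating restores it, and well-fed agents reproduce."""
    rng = random.Random(seed)
    sheep = [10.0] * n_sheep     # one entry per sheep
    wolves = [10.0] * n_wolves   # one entry per wolf
    history = []
    for _ in range(ticks):
        # 1. Move: every agent pays an energy cost.
        sheep = [e - 1.0 for e in sheep]
        wolves = [e - 1.0 for e in wolves]
        # 2. Eat: each sheep finds grass with an assumed 50% chance; each wolf
        # catches a sheep with probability rising in sheep density.
        sheep = [e + grass_energy if rng.random() < 0.5 else e for e in sheep]
        fed_wolves = []
        for e in wolves:
            if sheep and rng.random() < min(1.0, len(sheep) / 100.0):
                sheep.pop(rng.randrange(len(sheep)))   # one sheep is eaten
                e += 20.0
            fed_wolves.append(e)
        wolves = fed_wolves
        # 3. Die: agents with no energy are removed.
        sheep = [e for e in sheep if e > 0]
        wolves = [e for e in wolves if e > 0]
        # 4. Reproduce: agents above a threshold split energy with offspring.
        sheep += [e / 2 for e in sheep if e > 20]
        sheep = [e / 2 if e > 20 else e for e in sheep]
        wolves += [e / 2 for e in wolves if e > 40]
        wolves = [e / 2 if e > 40 else e for e in wolves]
        history.append((len(sheep), len(wolves)))
    return history

history = run_wolf_sheep(n_sheep=100, n_wolves=20, grass_energy=4.0, ticks=50)
```

The returned `history` is the aggregate output described above: the two population counts at each tick.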
3.1 Analysis of an ABM
In the above model there exists a set of parameters which achieve equilibrium in the sheep and wolf populations. This is often found through trial and error. Another, more sophisticated method involves looping through many thousands of simulations, changing the set of parameters each time. The sheep and wolf populations are analysed in each run. The output is some metric describing how close the populations are to sustaining over time. This is demonstrated below. In each run, the parameters are adjusted, the model is run for a set number of time ticks, and the final numbers of both wolf and sheep compared to the initial populations. The data can be analysed with appropriate statistical models to identify stationarity. One such method is to view the two populations as two time series. The oscillation of the species populations can be viewed as seasonality. Given enough simulation time, the difference in population numbers from the start to the end of the time series is compared to determine if the system has reached equilibrium.
The BehaviorSpace tool in the ABM software NetLogo allows a simulation to be run multiple times with changing parameters. A simple demonstration of this tool is to change the ‘energy from grass’ parameter, which determines the energy that sheep gather from eating grass. Because sheep lose energy from movement, a low amount of energy gained from grass is likely to result in a larger decrease in the sheep population than if a large amount of energy was gained. The Wolf-Sheep predation ABM allows this theory to be tested in a simulation based manner, followed by the application of robust statistical analysis methods to the output data to answer questions about the model. Keeping all other parameters fixed, the grass energy parameter was updated. The simulation was run for 1000 ticks and the population of the two species recorded at each tick.
When the grass energy is low, both species die out extremely quickly. This is due to the sheep not gaining enough energy to move, subsequently dying out. The wolves, now with no species to predate, also die out. At a grass energy of four, an equilibrium appears to be reached. The sheep can gain enough energy to continually move, which sustains the populations of the wolves predating them. If the grass energy is changed to 7, the sheep population varies far more. The population increases sharply due to the high energy levels of the sheep (who then reproduce more frequently), which results in a high level of competition for the available grass. Due to this competition, the population drops off to an extremely low level at 340 ticks. This results in the wolf population dropping to zero from the lack of prey. With no predators remaining, the sheep population increases to a high level and remains at 500 for the remainder of the simulation. In the last run the grass energy is set to 10. This has a similar effect as the previous parameter, although the sheep population reaches a higher level before remaining constant.
While this ‘brute-force’ approach is often used to analyse complex systems for the purposes of parameter inference, it is computationally inefficient. When looping through samples of each parameter, much of the simulation time is spent producing data far from the desired outcome. For a set of parameters of dimension $d$, each sampled at $N$ equal intervals, the required number of simulations $N^d$ quickly becomes prohibitively large. Consider the case of the wolf-sheep predation model with 7 parameters. If we would like to sample each parameter at 100 even intervals, the number of simulations to cover all possible combinations becomes:

$$100^7 = 10^{14}$$
Clearly this number of simulations is unreasonable. This can be shortened by reducing the number of sampled intervals for each parameter or disregarding some combinations of variables, but the issue remains of a large amount of computation time being wasted sampling parameters which do not produce simulated data close to the desired outcome of the ABM.
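The combinatorial explosion is easy to verify directly. The sketch below counts the full-factorial grid for the 7-parameter case and peeks at a single grid point via lazy enumeration.

```python
from itertools import product

n_params = 7        # parameters in the wolf-sheep model above
n_intervals = 100   # sample points per parameter

# Full-factorial sweep size: every combination of every parameter value.
total_simulations = n_intervals ** n_params
print(total_simulations)  # 100000000000000, i.e. 10^14 runs

# product() enumerates the grid lazily; even so, running this many
# simulations is infeasible - here we only peek at the first grid point.
first_point = next(iter(product(range(n_intervals), repeat=n_params)))
```

At even one millisecond per simulation, $10^{14}$ runs would take over three thousand years, which is why the text turns to more targeted sampling schemes.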
3.2 Equation Based Modelling (EBM)
While ABM is a ’bottom up’ approach focusing on the aggregate behaviour of agents, Equation Based Modelling (EBM) is a top down approach. EBM is based on an interrelation between a series of partial differential equations (Parunak, Savit & Riolo, 1998). The variability in the system is examined over time and/or space. Validation is enabled by comparing the output of the model to the real world system behaviour. In comparison, ABM allows validation of both the individual and aggregate system behaviour. EBM is suited for modelling physical processes where the behaviour of the components is not as important as that of the overall system.
Generally, EBMs are a set of differential equations and parameters. Extending the Lotka–Volterra model, time can be eliminated from each equation to produce a single equation, which can then be solved for a constant quantity $V$:

$$V = \delta x - \gamma \ln(x) + \beta y - \alpha \ln(y)$$

This model will give the numbers of the two populations $x$ and $y$ given the set of parameters $\alpha$, $\beta$, $\gamma$ and $\delta$. A major limitation of this in ecological models is that neither population can reach zero - they always recover after falling to a low level. However, the more flexible structure of an ABM allows either population to go extinct.
3.3 Comparison of ABM to EBM
While Agent Based Modelling is well suited for many classes of complex systems, it is better suited for some contexts than others. In problems with a large number of homogeneous agents, more accurate results can often be obtained in a shorter amount of time using an aggregate solution. Tracking the behaviour of individual agents is not always necessary. For example, if we are concerned with the amount of water in a container, it is more useful to track the water level directly and model it using a set of differential equations. Parameters in the equations such as temperature can be adjusted and the effects observed. Where there are only a few interacting agents in a system, a similar approach can be taken by deriving a model with differential equations. One such example is the behaviour of a tennis ball falling onto different surfaces. Here, the physical behaviours of the model are understood with a rigorous mathematical structure such as the kinematic equations. This can be used to develop a set of differential equations for effectively describing the behaviour of the ball given different materials and surfaces.
Agent Based Modelling is well suited for systems with many heterogeneous agents interacting in an environment. In many populations, individual agents possess unique properties which are not shared across the entire population. Their behaviours are guided by the environment, the history of their actions and their current properties. ABM is well suited for a system where it is useful to track individual agents and their actions; in this manner it is much more powerful than equation based modelling. Where the behaviour of agents relies on their history of interactions, the agents can be modelled as intelligent beings with the ability to learn, reason about situations, and deal with uncertainty. An example of this is the behaviour of pedestrians in a transportation network. Humans are intelligent beings, with each individual’s decision making process different from the rest. The reason one individual chooses to travel a certain route is likely different to another’s, and the long term behaviours of each pedestrian differ substantially. While it would be impractical to model such a process using EBM, pedestrian modelling is well suited to an ABM approach, allowing for individual actions carried out by many heterogeneous agents.
3.4 Agent Intelligence
Recent advances in the field of artificial intelligence have allowed for the construction of agents in an ABM, such that they are able to learn from their past and take an action based on a complex set of variables. One such influential paradigm to enable this is BDI - Belief-Desire-Intention (Singh, Padgham & Logan, 2016). BDI recognises “that an agent can be identified as having: a set of beliefs about its environment and about itself; a set of desires which are computational states which it wants to maintain, and a set of intentions which are computational states which the agent is trying to achieve” (O’Hare & Jennings, 1996). More recent developments have expanded on this philosophy, going beyond simple reactive behaviour to embedding cognitive complexity (Abrams, 2013). While many ABMs are not suited to this more complex agent intelligence, the recent advances in modelling highly complex systems allow for the embedding of more advanced levels of agent intelligence to achieve a more realistic simulation.
3.5 Mathematical Background of ABM
Although ABM is much less mathematically oriented than EBM, it is important to establish a mathematical framework for representing the system and the agent properties. There exist a number of mathematical formalisations for Agent Based Models. A series of mathematical objects can be developed to represent a theoretical abstraction of agent based simulations. These objects capture the features of simulations and allow for the development of mathematical theory to explain the output of a simulation (Laubenbacher et al, 2007). An ABM can be described as a Markov chain where the state of the system at time $t$ is given by the states of all $n$ nodes in the system: $x_t = (x_{1,t}, \dots, x_{n,t})$. The state at a given node $i$ for time $t+1$ is given by

$$x_{i,t+1} = f_i(x_t)$$

That is, the state at time $t+1$ in the system is reliant only on the previous state. This is an important feature of an ABM, allowing the model at each tick of the simulation to make agent decisions and update the environment based only on the current state. Storing every prior state and set of decisions, while updating the current state using this prior data, becomes costly in terms of computation and memory as a system increases in complexity.
Perhaps the most common framework for describing agent based simulations is Cellular Automata (CA). They are typically defined over a regular grid such as the two-dimensional $\mathbb{Z}^2$. Each grid point represents a site or node. Each node has a state $x_s^t$, where $t$ denotes a time step. The neighborhood $N(s)$ for each site $s$ consists of a set of nodes that can influence the future state of the node $s$. Based on the current state $x_s^t$ and the current states of all nodes in the neighborhood $N(s)$, a function $f_s$ computes the next state of the site $s$ at $t+1$. Laubenbacher et al (2007) defines this as:

$$x_s^{t+1} = f_s\left(\bar{x}_{N(s)}^t\right)$$

where $\bar{x}_{N(s)}^t$ denotes the tuple consisting of all the states $x_{s'}^t$ with $s' \in N(s)$.
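This local-update rule is concrete enough to sketch directly. The example below is a one-dimensional CA on a ring (a common illustrative choice, not taken from the source), with each site's next state a function of its three-cell neighborhood; the rule table is elementary Rule 110.

```python
def step(states, rule):
    """One synchronous CA update on a ring: each site's next state is a
    function of its (left, self, right) neighborhood."""
    n = len(states)
    return [rule[(states[(i - 1) % n], states[i], states[(i + 1) % n])]
            for i in range(n)]

# Elementary Rule 110 written as an explicit lookup table f : {0,1}^3 -> {0,1}.
rule110 = {
    (1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}

states = [0] * 16
states[8] = 1            # single live cell as the initial condition
for _ in range(5):
    states = step(states, rule110)
```

The synchronous loop mirrors the formal definition: every site reads the time-$t$ states of its neighborhood before any site is overwritten.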
Another modelling framework is that of finite dynamical systems. This framework represents an ABM as a time-discrete dynamical system of a finite state set. It can be seen that the representation of an ABM in this way is mathematically rich enough to allow the derivation of formal results.
Finite Dynamical Systems
A Finite Dynamical System (FDS) is an iteration of a function over a collection of variables. We denote this collection as a nonempty, finite set $X = \{x_1, \dots, x_n\}$, with a single local function $f_i$ associated with each element $x_i$. Each of these local functions takes input only from the variables in the ‘neighborhood’ of $x_i$. Laubenbacher et al (2007) defined an FDS as a sequential composition of these local functions, forming the dynamical system:

$$\Phi = f_n \circ f_{n-1} \circ \cdots \circ f_1$$
Here, $\Phi$ is an iteration unit which generates the dynamics of the system. From Laubenbacher et al (2007), when $\Phi$ is assembled from the local functions $f_i$, each of the variables can be updated simultaneously:

$$\Phi(x_1, \dots, x_n) = \left(f_1(x_1, \dots, x_n), \dots, f_n(x_1, \dots, x_n)\right)$$
In the above a parallel dynamical system is obtained. Each variable in the system is updated in parallel using a local function on its own neighborhood. Alternatively, a sequential dynamical system can be obtained by updating the variables’ states in a fixed update order (Laubenbacher et al, 2007):

$$\Phi_\pi = f_{\pi(n)} \circ \cdots \circ f_{\pi(1)}$$

where $\pi$ is a permutation on the set $\{1, \dots, n\}$.
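The difference between the parallel and sequential compositions is worth seeing concretely. In this small sketch (the local functions are an invented example over three binary variables, each the XOR of the other two), the same initial state reaches different successors under the two update schemes.

```python
def parallel_update(x, local_fns):
    """Parallel dynamical system: all coordinates updated simultaneously
    from the same time-t state."""
    return tuple(f(x) for f in local_fns)

def sequential_update(x, local_fns, order):
    """Sequential dynamical system: coordinates updated one at a time in the
    order given by the permutation `order`, each seeing earlier updates."""
    x = list(x)
    for i in order:
        x[i] = local_fns[i](tuple(x))
    return tuple(x)

# Example local functions: each coordinate becomes the XOR of the other two.
fns = [
    lambda x: x[1] ^ x[2],
    lambda x: x[0] ^ x[2],
    lambda x: x[0] ^ x[1],
]

state = (1, 0, 0)
par = parallel_update(state, fns)          # (0, 1, 1)
seq = sequential_update(state, fns, order=[0, 1, 2])  # (0, 0, 0)
```

The two results differ because the sequential scheme lets later variables see the already-updated values, which is precisely why the choice of update order matters in the formalism above.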
The dynamics of $\Phi$ can be represented as a directed graph on the vertex set of all possible system states, called the phase space of $\Phi$ (Laubenbacher et al, 2007). There is a directed edge from $v$ to $w$ if and only if $\Phi(v) = w$ (Laubenbacher et al, 2007).
An Agent Based Model can be represented using this framework. Agents can be thought of as entities in a finite dynamical system, each carrying a set of configurations, preferences and spatial and/or temporal state information. Cells are features of the environment in which the agents carry out behaviours. These cells take on one of a finite number of possible states. Agents interact with a subset of nearby agents in the environment, and update their internal state according to a local function at each time step. The neighbors of a cell in the ABM environment form an adjacency relation, yielding a dependency graph of the agents (Laubenbacher et al, 2007). For example, in a pedestrian network the state at time $t+1$ for a given intersection $v$ is dependent on the states of the intersections in the neighborhood of $v$ at time $t$.
The agents in an ABM may be updated in a number of ways. These are generally either synchronous, asynchronous, or event-driven (Laubenbacher et al, 2007). The choice of the scheduling system is highly dependent on the system that is being modelled. In the pedestrian model described previously, it is clear that at each time-step (or tick) of the simulation, all agents will update; moving through the environment and updating their internal states. This is a synchronous system. An asynchronous system is suitable for models where the actions of each agent do not depend on the current state of the agents in its local neighborhood. The choice of such a scheduling system is an important feature in the design of an ABM. The nature of dependency and interaction must be carefully considered in order to design a realistic system.
3.6 Link to statistical theory
As the foundation of Agent Based Modelling lies in simulation, informed by observed data and producing data as output, statistics plays a major role at nearly every stage of the modelling process. Perhaps the most important step in ABM based research lies in the analysis of data from the simulation to derive parameters for understanding the data generating process. Other analysis may include descriptive, predictive, or inferential statistics. These models can include, but are not limited to: spatial, network, survival, or time series analysis. Most ABM frameworks provide a method to import and export data, enabling the use of more robust statistical environments such as R or Python. These environments provide advanced modelling tools which allow the researcher to extract insights from the ABM.
3.7 Model Design
To date, there is no universally adopted format for the documentation of an ABM, and limited replicability is an issue in scientific research using ABMs. To maximise the reproducibility of an ABM, a format for its design should be adopted which is popular in the scientific community. By standardising the description of a model in this way, it can be more easily reproduced by another party or revisited by the original developer. One such format is ODD (Overview, Design concepts, Details), first proposed by Grimm et al (2006) as a standard protocol for describing agent based models. It was subsequently reviewed and updated in 2010 (Grimm et al, 2010). ODD describes a set of key aspects of the model:
Purpose
What is the purpose of the model? What are the desired outcomes?
Entities, State Variables and Scales
What entities exist in the model? Agents, environment, collectives, spatial units. How are the entities characterised? What properties do they hold? What are the spatial and temporal resolutions of the model?
Process overview and scheduling
What actions are assigned to each entity, and in which order? How are the state variables updated? How is time modelled - discretely, continuously, or as a continuum where both can occur?
Design concepts
Emergence, Adaptation, Fitness, Prediction, Sensing, Interaction, Stochasticity, Collectives, Observation.
Initialisation
What is the initial state of the model world? How many entities are created and what are their default properties?
Input data
Is external data used as input to represent the change in a process over time?
ABMs developed in the course of this dissertation will follow the ODD protocol. For each, thorough documentation is provided to enable reproduction of the model by researchers.
3.8 ABM Software
Several ABM toolkits are available to enable the development of agent based simulations. Perhaps the most popular of these is NetLogo, developed in 1999 by Uri Wilensky. It follows a philosophy of ‘low threshold, no ceiling’, with a goal of making the software easy to use for beginners while remaining flexible enough to develop advanced models. It contains a number of example models for fields such as network analysis, social science, and biology. All ABM models in this report are developed in NetLogo. The RNetLogo and pyNetLogo libraries, in R and Python respectively, allow for programmatic, headless control of a NetLogo model. Statistical models were developed in both R and Python, with the simulation element communicating with the NetLogo model to send commands and receive output.
4 Graph Theory
ABM is commonly used to design networks in a variety of fields, including power grids (Durana et al, 2014), social networks (El-Sayed et al, 2012), and transport (Huynh et al, 2015). Graph theory underlies much of the construction of these networks. The basis of graph theory is the relationships between a set of entities, or nodes. A graph $G = (V, E)$ is an ordered pair comprising a set $V$ of vertices or nodes together with a set $E$ of edges, which are two-element subsets of $V$ (Prathik et al, 2016). This turns out to be a natural way of defining an ABM. For example, a model for social network analysis could be set up with users as nodes, communication between users as links, and the direction of the communication as the direction of the links. In the ABM, agents are defined as users. Their parameters may include friend count, post rate, and other metrics related to the social network in question. The environment in which the agents interact is the graph itself. Their behaviours over a series of ticks allow output such as activity levels and friend counts to be analysed on an aggregate level.
Consider a simple stochastic ABM represented as a graph (pictured below). In this ABM, agents are placed randomly at nodes. At each tick of the simulation each agent will move to a connected node via the grey lines. The graph is undirected - agents can travel along any edge connected to a given node. There are seven nodes and six edges. By designing this ABM network using the logic of graph theory, we can answer route based questions, such as the number of ticks it can take for an agent to move from node $i$ to node $j$. Other problems we might want to solve include the shortest possible distance between two nodes, or whether it is possible to return to a given node.
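These route questions can be answered programmatically. Since the figure is not reproduced here, the edge list below is an assumed seven-node, six-edge layout; breadth-first search gives the minimum number of ticks between two nodes, and a random walk plays out one agent's tick-by-tick movement.

```python
import random
from collections import deque

# Hypothetical 7-node, 6-edge undirected graph (the actual figure's layout
# is not reproduced in the text, so this adjacency is an assumption).
edges = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5), (5, 6)]
adj = {v: [] for v in range(7)}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)

def shortest_ticks(start, goal):
    """Breadth-first search: minimum number of ticks for an agent to walk
    from `start` to `goal`, moving along one edge per tick."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            return dist[v]
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None  # goal unreachable from start

def random_walk(start, ticks, seed=0):
    """One agent moving to a uniformly random neighboring node each tick."""
    rng = random.Random(seed)
    v = start
    for _ in range(ticks):
        v = rng.choice(adj[v])
    return v
```

Because the graph is connected and undirected, every node is reachable and an agent can always return to its starting node, which answers the last two questions in the paragraph above for this particular layout.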
Constructing a network based ABM using the logic of graph theory allows for the formal analysis of the simulation. Consider an ABM consisting of a series of cities (nodes) and directed roads (links). A question that can be asked of this model is “does there exist a tour of all cities stopping at each city exactly once using the given roads?” (Altschuler & Williams, 2017). This is a route problem known as the Hamiltonian path problem, an NP-complete problem solvable in non-deterministic polynomial time. The model can be developed in an ABM structure to set up a simple rule based system for the traversal of a single agent between nodes. Various algorithms based on the graph theory analysis of the problem can be implemented as actions taken by the agent based on its current state. Setting up the model in this way allows both abstraction from its mathematical form into one which is easier to visualise, and an environment for sandbox testing of different graph structures. This simplification is a common thread in the design of ABMs and is a large reason for their widespread use in network analysis.
5 Approximate Bayesian Computation
5.1 Bayesian Inference
Bayesian inference is a method of statistical inference grounded in Bayes’ rule, which states that the probability of some event $A$, given that event $B$ has occurred, is given by:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
In the context of Bayesian parameter estimation, we aim to find the posterior distribution of the parameters $\theta$ given the data $y$:

$$\pi(\theta \mid y) = \frac{f(y \mid \theta)\,\pi(\theta)}{f(y)}$$

where $\pi(\theta \mid y)$ is the posterior distribution,
$f(y \mid \theta)$ is the likelihood,
$\pi(\theta)$ is the prior, and
$f(y)$ is the probability of the data. Often the denominator $f(y)$ involves computing an integral to normalise the posterior into a probability distribution:

$$f(y) = \int f(y \mid \theta)\,\pi(\theta)\,d\theta$$
In some cases, such as models where a conjugate prior is available, the posterior is straightforward to calculate analytically. If no conjugate prior is available, there are a wide range of computational methods for sampling from the posterior. One of these is Markov chain Monte Carlo (MCMC). MCMC constructs a Markov chain with the target posterior distribution as its equilibrium distribution. By sampling from this chain for a long period of time, a sample from the posterior is obtained. A common algorithm for this is known as Metropolis-Hastings, developed by Metropolis, Rosenbluth, Rosenbluth, Teller and Teller (1953). Formally, the algorithm uses a proposal distribution $g(x' \mid x)$ as the conditional probability of proposing a state $x'$ given the current state $x$, and an acceptance ratio $\alpha$ as the probability of accepting the proposed state $x'$.
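A minimal random-walk Metropolis sampler illustrates the accept/reject mechanism. The target here is a toy normal-mean problem with a flat prior; the data values and tuning constants are invented for illustration.

```python
import math
import random

def metropolis_hastings(log_likelihood, log_prior, theta0, n_iter,
                        proposal_sd=0.5, seed=42):
    """Random-walk Metropolis: propose theta' ~ N(theta, proposal_sd^2) and
    accept with probability min(1, posterior ratio), computed on log scale."""
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0, proposal_sd)
        log_ratio = (log_likelihood(proposal) + log_prior(proposal)
                     - log_likelihood(theta) - log_prior(theta))
        if math.log(rng.random()) < log_ratio:
            theta = proposal          # accept; otherwise keep current theta
        samples.append(theta)
    return samples

# Toy target: data assumed drawn from N(mu, 1), flat prior on mu.
data = [2.1, 1.8, 2.4, 2.0, 1.9]

def log_lik(mu):
    return sum(-0.5 * (x - mu) ** 2 for x in data)

samples = metropolis_hastings(log_lik, lambda mu: 0.0, theta0=0.0, n_iter=5000)
burned = samples[1000:]               # discard burn-in before summarising
posterior_mean = sum(burned) / len(burned)
```

After burn-in, the chain's sample mean should sit near the data mean, as expected for a normal likelihood with a flat prior.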
MCMC can be used for parameter inference in any class of statistical model where the likelihood can be evaluated. Consider a simple generalised linear model of the form

$$\text{logit}(p_i) = \beta_0 + \beta_1 x_i$$

Here the parameters are the intercept $\beta_0$ and a single coefficient $\beta_1$. The likelihood follows a Binomial distribution of the form

$$f(y \mid \beta_0, \beta_1) = \prod_{i=1}^{n} p_i^{y_i}\,(1 - p_i)^{1 - y_i}$$

The use of this likelihood in the calculation of the acceptance probability will construct a Markov chain that converges on the posterior distributions for $\beta_0$ and $\beta_1$.
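The log-likelihood that would enter that acceptance ratio can be written in a few lines. The covariate and response values below are invented for illustration.

```python
import math

def log_likelihood(beta0, beta1, xs, ys):
    """Bernoulli log-likelihood for the logistic model
    logit(p_i) = beta0 + beta1 * x_i, with responses y_i in {0, 1}."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Invented toy data: larger x values tend to give y = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1, 1]

ll_null = log_likelihood(0.0, 0.0, xs, ys)    # every p_i = 0.5
ll_fit = log_likelihood(-3.0, 2.0, xs, ys)    # slope matching the trend
```

Within a Metropolis-Hastings loop, the difference of two such log-likelihood evaluations (plus log-priors) forms the log acceptance ratio for a proposed $(\beta_0, \beta_1)$ pair.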
For a large class of statistical models, the likelihood is either prohibitively expensive to compute or intractable to calculate. This arises often in complex systems where stochastic processes are involved and as a result, methods such as MCMC cannot be used. An alternative method of approximating the posterior distribution is required for these problems, known as likelihood free inference.
5.2 Likelihood Free Inference
The likelihood in a statistical model is a function of the parameters given the data, $L(\theta \mid y)$. In simple models following a known distribution this is easily calculated. However, in complex models the likelihood often has no closed form solution allowing it to be evaluated in a finite number of operations. An example of this is the g-and-k distribution, defined through its quantile function:

$$Q(p;\, a, b, g, k) = a + b\left(1 + c\,\frac{1 - e^{-g\,z(p)}}{1 + e^{-g\,z(p)}}\right)\left(1 + z(p)^2\right)^{k} z(p)$$

where $z(p)$ is the standard normal quantile and $c$ is conventionally fixed at 0.8.

For a quantile distribution such as this, the likelihood cannot be calculated analytically, only numerically. In practice, simulating from the distribution is much more straightforward than the numerical calculation of the likelihood. We draw from the distribution by simulating standard normal values for substitution into the above function. This idea is the basis of likelihood free methods - the acceptance or rejection of samples in a simulation based model. There is ongoing development in these methods due to the advent of big data and the use of simulation models to evaluate parameters. Two such popular methods are known as Synthetic Likelihood and Approximate Bayesian Computation.
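Simulating from the g-and-k distribution is exactly this substitution. The sketch below pushes standard normal draws through the quantile function (with the conventional $c = 0.8$); the parameter values are illustrative.

```python
import math
import random

def gk_quantile(z, a, b, g, k, c=0.8):
    """g-and-k quantile function evaluated at a standard normal draw z."""
    skew = (1 - math.exp(-g * z)) / (1 + math.exp(-g * z))
    return a + b * (1 + c * skew) * (1 + z * z) ** k * z

def simulate_gk(n, a, b, g, k, seed=7):
    """Likelihood-free sampling: push n standard normal draws through Q."""
    rng = random.Random(seed)
    return [gk_quantile(rng.gauss(0, 1), a, b, g, k) for _ in range(n)]

sample = simulate_gk(1000, a=3.0, b=1.0, g=2.0, k=0.5)
sample_median = sorted(sample)[len(sample) // 2]
```

Since $Q(0.5) = a$ (the skewness term vanishes at $z = 0$), the sample median should sit near the location parameter $a$, even though the density itself is unavailable in closed form.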
5.2.1 Synthetic Likelihood
Bayesian Synthetic Likelihood uses a multivariate normal approximation with n, the number of replicated model simulations, as a single tuning parameter (Price et al, 2018). In place of the likelihood of the summary statistic it uses

p(s_obs | θ) ≈ N(s_obs; μₙ(θ), Σₙ(θ))

where μₙ(θ) and Σₙ(θ) are the mean and covariance of n summary statistics simulated from the model at θ, and s_obs is the observed summary statistic.
An alternative form proposed by Wood (2010) uses an auxiliary likelihood based on a multivariate normal approximation. Auxiliary parameters μ(θ) and Σ(θ) are used, where μ(θ) is the mean and Σ(θ) is the covariance matrix of a multivariate normal distribution. These parameters are unknown, but are generally estimated from simulations of the model at θ. The estimated auxiliary likelihood is given by Price et al (2018) as

p_A(s_obs | θ) = N(s_obs; μₙ(θ), Σₙ(θ))
Bayesian synthetic likelihood arises when this auxiliary likelihood is combined with a prior distribution on the parameter (Price et al, 2018). Following Drovandi, Pettitt, and Lee (2015), BSL samples from the following target:

π(θ | s_obs) ∝ N(s_obs; μₙ(θ), Σₙ(θ)) π(θ)

This produces an unbiased estimate of the normal density evaluated at the observed summary. The ideal BSL target is achieved as n → ∞ (Price et al, 2018).
5.2.2 Approximate Likelihood by Simulation
In comparison, ABC non-parametrically approximates the likelihood of the summary statistic as

p(s_obs | θ) ≈ (1/n) Σᵢ Kε(ρ(sᵢ, s_obs))

where Kε is a smoothing kernel with bandwidth ε, and ρ(sᵢ, s_obs) measures the distance between the observed and simulated data with some distance metric, often chosen as the Euclidean distance ‖sᵢ − s_obs‖. As with BSL, the ABC approximation converges on the true posterior, in this case as ε → 0.
5.3 ABC Overview
ABC aims to approximate a posterior distribution p(θ | y) without evaluating the likelihood function p(y | θ). In essence this is achieved by simulating data x with a set of parameters θ drawn from the prior. The simulated data is then compared with the observed data y in some capacity, either in totality or via a set of summary statistics. If the simulated data is deemed close enough to the observed data (using a defined distance measure ρ and tolerance ε), the sampled parameter θ belongs to the approximated posterior distribution. Formally this can be denoted as

p(θ | y) ≈ p(θ | ρ(x, y) ≤ ε)

where ρ(x, y) ≤ ε denotes the set of sampled points for which the distance measure meets the epsilon threshold.
While ABC algorithms are well grounded in mathematical theory, they make a number of assumptions and approximations. These need to be carefully assessed before considering the use of ABC for a given model.
As ε → 0 and the number of simulations grows, p(θ | ρ(x, y) ≤ ε) → p(θ | y). Practically, we need to set ε > 0 to approximate the posterior in a reasonable amount of time: with a high dimensional parameter space, the probability of simulating data matching the observations exactly is extremely small. As ε increases, the posterior distribution shifts towards the prior. A key component of ABC is therefore choosing ε such that the posterior is approximated both accurately and within a reasonable amount of time. One approach is to run the algorithm a large number of times and accept the proportion of sampled parameters with the smallest distances. Another is to run the algorithm until a desired number of acceptances is attained.
A second potential issue is in the use of summary statistics. In most cases the complete dataset is high in dimensionality, so when comparing the observed and simulated data directly, most samples will be rejected and the algorithm will need to run for an impractical amount of time. If the data y is reduced to a set of summary statistics S(y) such that dim(S(y)) ≪ dim(y), this curse of dimensionality is avoided. However, a poor choice of summary statistics will result in a biased posterior that is not truly representative. Additionally, the credible intervals will be inflated due to a loss of information (“Approximate Bayesian computation,” n.d.). This can be resolved by choosing a sufficient statistic, which completely summarises the original dataset using a smaller set of statistics.
5.3.1 Summary Statistics
When reducing the dimensionality of a sample with a summary statistic, care must be taken to choose summary statistics that produce an unbiased approximation of the posterior distribution. R.A. Fisher (1922) defined a summary statistic as sufficient if “no other statistic which can be calculated from the same sample provides any additional information as to the value of the parameter to be estimated”. If the summary statistic is sufficient, the resulting posterior distribution will be equivalent to the posterior distribution computed with the full dataset:

p(θ | y) = p(θ | S(y))

where S(y) is a sufficient statistic for the data y.
Let y₁, …, yₙ be a random sample from a probability distribution with some parameter θ. A summary statistic S is sufficient for the parameter θ if the conditional probability distribution of the full data y₁, …, yₙ given S does not depend on θ (Aeschbacher, Beaumont & Futschik, 2012).
In the absence of a sufficient statistic, which is often the case in a complex system, a tradeoff must be found between a loss of information and dimensionality reduction. The incorrect choice of summary statistics will result in a biased posterior which is not representative of the data. Methods exist to find summary statistics which contain the maximum amount of information about the parameters of interest. One such method is Partial least squares regression.
Partial least squares (PLS) is a projection technique which “aims to produce linear combinations of covariates which have high covariance with responses and are uncorrelated with each other” (Sisson et al, 2018). In the context of ABC, the covariates are the summary statistics S(y), and the responses are the parameters θ. The i-th PLS component uᵢ = αᵢᵀS(y) maximises

Cov(αᵢᵀS(y), θ)

subject to Cov(αᵢᵀS(y), αⱼᵀS(y)) = 0 for j < i and a normalisation constraint on αᵢ such that αᵢᵀαᵢ = 1 (Sisson et al, 2018). The resulting components can be viewed as a lower dimensional set of statistics than the original data, and can hence be used in ABC in place of the original data for more efficient inference. An advantage of projection methods such as PLS is their interpretability: it is easy to understand how projecting a high dimensional dataset into a subset of components that preserve maximal covariance with the parameters will often result in a sufficient representation of the data.
Another set of techniques to reduce the dimensionality of the data are known as subset selection methods. Joyce and Marjoram (2008) propose a step-wise selection approach, where candidate summary statistics are added to and/or removed from a subset, and the effect on the ABC posterior evaluated. The rationale is that if a set of statistics is sufficient, adding further statistics will not affect the resulting posterior, whereas removing statistics will. From Joyce and Marjoram (2008), the technique deems a change to the subset as significant if the ratio of estimated posterior densities

p̂(θ | S₁, …, Sₖ₊₁) / p̂(θ | S₁, …, Sₖ)

departs sufficiently from one, where each p̂ is an estimated posterior density based on the output from rejection ABC (Sisson et al, 2018).
There exist many other techniques to reduce a large dataset to a smaller set of summary statistics, in addition to ongoing research into new techniques. Each has strengths and weaknesses, and should therefore be carefully evaluated before use in a particular model.
5.4 Rejection ABC
The simplest ABC algorithm, rejection sampling, is a straightforward procedure of simulating data with some given set of parameters and evaluating the data against a set of observed data. The algorithm proceeds as follows:
Direct evaluation of the likelihood is not carried out in the above algorithm. For almost any model where data can be observed and simulated, and there exists a set of parameter values to be inferred, rejection ABC is a straightforward approach to approximating the posterior distribution. This works by noting that the acceptance probability of a given parameter θ is proportional to the probability of generating the observed data y under the model for that fixed parameter vector. For a fixed θ, if we generate a dataset x from the model p(x | θ), then the probability of generating our observed dataset exactly, so that x = y, is equal to p(y | θ) (Sisson, 2018).
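The rejection algorithm can be sketched for a toy problem of inferring the mean of a normal distribution; the prior, tolerance and sample sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
y_obs = rng.normal(4.0, 1.0, size=100)   # observed data, true mean 4
s_obs = y_obs.mean()                     # summary statistic

def rejection_abc(n_samples=50_000, eps=0.1):
    """Rejection ABC for the mean of a N(mu, 1) model with a N(0, 5) prior."""
    accepted = []
    for _ in range(n_samples):
        mu = rng.normal(0, 5)                            # draw from the prior
        s_sim = rng.normal(mu, 1.0, size=100).mean()     # simulate + summarise
        if abs(s_sim - s_obs) <= eps:                    # distance threshold
            accepted.append(mu)
    return np.array(accepted)

post = rejection_abc()
print(post.mean(), len(post) / 50_000)   # posterior mean near 4; low acceptance rate
```

Note the low acceptance rate: most prior draws are far from the posterior, which is precisely the inefficiency the sequential methods below address.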
5.4.1 Regression Adjustment in Rejection ABC
Regression adjustment is a technique commonly used in Bayesian methods to account for discrepancies between simulated and observed summary statistics (Blum, 2017). In the context of ABC, the accepted posterior parameters are adjusted using the relationship between the parameters and the distances between the simulated and observed summaries. This approach gives more weight to parameters which produce simulated data closer to the observed data. The parameter values are adjusted by fitting a regression model of the relationship between the posterior parameters and the simulated summaries (François, 2010):

θ = a + bᵀ(s − s_obs) + e

The linear model is fit as above, and the parameters are corrected as follows (François, 2010):

θ* = θ − b̂ᵀ(s − s_obs)
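The correction can be sketched for a one-dimensional parameter using ordinary least squares. The accepted parameters and summaries below are simulated stand-ins for a rejection ABC output:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical rejection-ABC output: accepted parameters and their
# simulated summary statistics, plus the observed summary.
theta = rng.normal(4.0, 0.5, size=500)
s_sim = theta + rng.normal(0, 0.2, size=500)   # summaries track theta
s_obs = 4.0

def regression_adjust(theta, s_sim, s_obs):
    """Linear regression adjustment: fit theta = a + b*s by least squares
    and shift each accepted theta towards the observed summary."""
    X = np.column_stack([np.ones_like(s_sim), s_sim])
    coef, *_ = np.linalg.lstsq(X, theta, rcond=None)
    b = coef[1]
    return theta - b * (s_sim - s_obs)

theta_adj = regression_adjust(theta, s_sim, s_obs)
print(theta.std(), theta_adj.std())   # the adjustment shrinks the spread
```

Because each θ is shifted by the fitted slope times its summary's discrepancy from s_obs, parameters whose simulations were far from the observed data are pulled back, reducing the spread of the accepted sample.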
A limitation of the rejection algorithm is a low acceptance rate when the prior is very different from the posterior. Beaumont et al (2009) proposed a particle filtering based method known as ABC-SMC. This algorithm alleviates the issue of low acceptance rates by gradually evolving the sampled population towards the target posterior through a sequence of filtered populations. For high dimensional data from a complex system this results in a large gain in efficiency. The algorithm proceeds as follows:
The kernel Kt in step 14 of the algorithm is chosen as a conditional density that serves as a transition kernel to “move” sampled parameters and then appropriately weight accepted values. For real-valued parameters, for example, Kt might be taken as a multivariate normal or t density centred at or near the sampled particle, with scales that may decrease as t increases (Bonassi & West, 2015). Kernel choice is discussed further in Section 5.5.1.
The algorithm constructs intermediary distributions with an increasingly small tolerance schedule ε₁ > ε₂ > … > ε_T. The tolerances are chosen such that a gradual evolution towards the target posterior is achieved. The algorithm continues until N particles have been accepted in the final population T; this final set of particles is an approximation of the posterior distribution. The inclusion of a weight calculation for each particle in a population enables sampling from the previous population with probabilities equal to the normalised weights. These weights are calculated such that particles further from the population mean are sampled and refined.
The ABC-SMC algorithm addresses the main drawback of rejection ABC: the inefficiency of sampling parameters far from the posterior, and hence rejecting a large number of samples. The repeated sampling from intermediate posterior approximations results in a distribution closer to the target posterior (Beaumont, 2010). One drawback lies in the decreasing epsilon tolerance. If the quantile of the distances used to reduce epsilon is appropriately chosen, the number of simulations required per population will not grow significantly as ε decreases. However, reducing ε too aggressively means a larger proportion of samples are rejected, eliminating a key advantage of the SMC approach. This can be addressed by either calculating ε at each population using an adaptive method (described in Section 5.5.4), or including conditional density estimation for the final population sample.
An extension to the original SMC algorithm was proposed by Bonassi & West (2015). A joint kernel Kt(θ, x | θ′, x′) is used on the joint distribution of accepted values (θ, x) to raise the importance of proposals where the simulated data is close to the observed data. This addition is known as ABC-SMC with Adaptive Weights. The algorithm proceeds as follows:
The added step of updating the data based weights adds computation time, although this is usually negligible in comparison to the simulation time (Bonassi & West, 2015). The addition of adaptive weights increases the proportion of accepted samples in comparison to Algorithm 2, and hence decreases the overall number of simulations. As the simulations tend to dominate computation time, ABC-SMC with Adaptive Weights tends to be more computationally efficient than ABC-SMC.
5.5.1 Choice of Perturbation Kernel in ABC-SMC
At each time step t, the weights at population t are calculated using a perturbation kernel:

wᵢ(t) ∝ π(θᵢ(t)) / Σⱼ wⱼ(t−1) Kt(θᵢ(t) | θⱼ(t−1))

where π(θᵢ(t)) is the probability of the sampled parameter under the prior, and the denominator Σⱼ wⱼ(t−1) Kt(θᵢ(t) | θⱼ(t−1)) is the probability of the sampled parameter given the previous population.
A balance needs to be struck in this kernel such that the parameter space is sufficiently explored, but not so extensively as to cause a low acceptance rate (Filippi, 2016). The properties of an optimal kernel are derived from sequential importance sampling theory. Similarity is assessed between the distribution of accepted particles at population t − 1 and that at population t, where population t is constructed by perturbing particles from population t − 1 and accepting them according to some threshold εt, which is reduced at each time step. Resemblance between the two distributions is the first criterion of an optimal kernel. An optimisation problem is then solved to balance sufficiently exploring the parameter space while maintaining a high acceptance rate (Filippi, 2016). The resemblance is measured in terms of the Kullback-Leibler divergence between the two distributions (Filippi et al, 2013).
There are multiple choices of kernel to achieve this balance. As with the selection of summary statistics, each should be carefully considered for the model being developed. When using ABC to validate and infer parameters of a complex system, it is vital to optimise the algorithm so that the kernel does not sample areas of the parameter space which give rise to output far from the observed data. On the other hand, an overly local perturbation kernel will not move the particles sufficiently and may therefore fail to find the true posterior.
The joint proposal distribution in the ABC-SMC algorithm corresponds to sampling a particle from the previous population and perturbing it to obtain a new particle (Filippi et al, 2013). The process typically proceeds as follows:
Sample a particle θ′ from the previous population using probabilities equal to the normalised weights
Perturb this particle to obtain θ ~ Kt(θ | θ′)
Calculate the weight for the new particle as the prior density of θ divided by the sum of the previous population weights multiplied by the kernel, comparing the newly perturbed particle to each particle in the previous population
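The three steps above can be sketched as follows; the particle values, kernel scale and prior are illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)

def propose_particle(particles, weights, sigma):
    """Step 1: sample a particle by its normalised weight.
    Step 2: perturb it with a Gaussian kernel of scale sigma."""
    theta = rng.choice(particles, p=weights)
    return theta + rng.normal(0, sigma)

def particle_weight(theta_new, particles, weights, sigma, prior_pdf):
    """Step 3: importance weight = prior density over the kernel-smoothed
    density of the previous population."""
    denom = np.sum(weights * norm.pdf(theta_new, loc=particles, scale=sigma))
    return prior_pdf(theta_new) / denom

# Hypothetical previous population of 100 equally weighted particles
particles = rng.normal(4.0, 1.0, size=100)
weights = np.full(100, 1 / 100)
prior_pdf = lambda t: norm.pdf(t, loc=0, scale=5)

theta_new = propose_particle(particles, weights, sigma=0.5)
w = particle_weight(theta_new, particles, weights, sigma=0.5,
                    prior_pdf=prior_pdf)
print(theta_new, w)
```

The returned weight is unnormalised; in the full algorithm the weights of a population are normalised to sum to one before the next population is sampled.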
5.5.2 Component-wise perturbation kernel
From Filippi et al (2013), each individual component θⱼ of the parameter vector θ is perturbed independently using a Gaussian distribution parameterised by mean θⱼ and variance σⱼ². Filippi et al (2013) show this with the form:

Kt(θ | θ′) = ∏ⱼ N(θⱼ; θ′ⱼ, σⱼ²(t))
5.5.3 Multivariate normal perturbation kernel
It is often the case in models with a large number of parameters that correlation exists between some of the parameters. When using a component-wise perturbation kernel, this can lead to an inadequate reflection of the true posterior (Filippi et al, 2013). This correlation can be taken into account by using a multivariate Gaussian kernel, which constructs a covariance matrix Σ(t) from the current population. Again, the Kullback-Leibler divergence allows for the calculation of an optimal covariance matrix; from Filippi et al (2013), when consecutive populations are similar this is approximately twice the weighted empirical covariance of the previous population.
In practice this kernel is a popular choice in sequential sampling models, although many other options exist. Consider a set of parameters which are correlated in a non-linear way. In this case we may wish to construct a covariance matrix for each particle to take the non-linear structure into account. There are multiple ways to do this; however, for simplicity we restrict ourselves to a Gaussian perturbation kernel that does not account for local correlation. For more information about local perturbation kernels, see Filippi et al (2013).
Consider a model where the parameter kernel and the data based kernel (Step 8 of Algorithm 3) are both chosen as multivariate Gaussian kernels. The data based and parameter weights respectively are updated using the multivariate normal density

N(x; μ, Σ) = (2π)^(−k/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))

where x is a vector of proposed values, μ is a vector of means of the previous population, and Σ is a positive-definite, symmetric covariance matrix.
As mentioned previously, this kernel will explore the parameter space while maintaining a high acceptance rate. Each particle is sampled from the previous population and perturbed using a normal distribution. If the perturbed particle produces simulated data close enough to the observed data, it is accepted into the current population. If the simulated data is too far away and the parameter rejected, the particle is discarded and a new particle is resampled. The result is both a reduction in variance and convergence on the posterior mean, with the decreasing epsilon threshold acting to move the particles closer to the posterior.
This is demonstrated using a simple example of inferring the mean of a Gaussian distribution. The perturbation kernel here involves finding the probability of the proposed θ under the prior, and dividing this by the weighted probabilities of the proposed θ given the mean and standard deviation of the previous population. The prior mean is defined as μ₀. The weights are updated as

w(t) ∝ φ(θ; μ₀, σ₀²) / Σⱼ wⱼ(t−1) φ(θ; θⱼ(t−1), σ²(t−1))

where φ(θ; μ₀, σ₀²) is the probability of the proposed θ given the prior mean and variance, and the denominator is a kernel function which sums the probabilities of the proposed θ given each particle in the previous population and the variance of the previous population. The function φ in these kernels is the probability density function of a normal distribution.
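The full ABC-SMC scheme for this Gaussian-mean example can be sketched as follows, with the weight update implemented as described above. The tolerance schedule, kernel scale, population size and prior settings are illustrative, not those of the reported runs:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
y_obs = rng.normal(4.0, 1.0, size=100)   # observed data, true mean 4
s_obs = y_obs.mean()

def simulate_summary(mu):
    return rng.normal(mu, 1.0, size=100).mean()

def abc_smc(n_particles=200, eps_schedule=(1.0, 0.5, 0.25, 0.1, 0.05),
            prior_mu=0.0, prior_sd=5.0, kernel_sd=0.5):
    """ABC-SMC for the mean of a N(mu, 1) model with a N(prior_mu, prior_sd) prior."""
    # Population 0: plain rejection sampling from the prior
    particles = []
    while len(particles) < n_particles:
        mu = rng.normal(prior_mu, prior_sd)
        if abs(simulate_summary(mu) - s_obs) <= eps_schedule[0]:
            particles.append(mu)
    particles = np.array(particles)
    weights = np.full(n_particles, 1 / n_particles)

    for eps in eps_schedule[1:]:
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            # Sample from the previous population and perturb
            theta = rng.choice(particles, p=weights) + rng.normal(0, kernel_sd)
            if abs(simulate_summary(theta) - s_obs) <= eps:
                # Weight: prior density over kernel-smoothed previous population
                denom = np.sum(weights * norm.pdf(theta, particles, kernel_sd))
                new_weights.append(norm.pdf(theta, prior_mu, prior_sd) / denom)
                new_particles.append(theta)
        particles = np.array(new_particles)
        weights = np.array(new_weights)
        weights /= weights.sum()   # normalise for the next population
    return particles, weights

particles, weights = abc_smc()
print(np.sum(particles * weights))   # weighted posterior mean, near 4
```

Each successive population tightens around the true mean as the tolerance decreases, mirroring the convergence behaviour reported for the example in the text.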
| Running Time | Known | Initial Epsilon | Distance Measure | Population Size |
By the seventh population the algorithm has converged on the true parameter. Each subsequent population reduces the variance.
| Iteration | Epsilon | Mean Distance | Mean Loops Per N |
5.5.4 Choice of Epsilon Threshold
The approach to reducing epsilon at each population is another detail of the ABC-SMC algorithm which impacts efficiency. Often, ε is adjusted at each population using a quantile based approach (Beaumont et al, 2009): given the vector of accepted distances at population t, set εt+1 to some chosen quantile α of those distances.
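The quantile update is a one-line computation; a small sketch with illustrative distances:

```python
import numpy as np

def next_epsilon(distances, alpha=0.5):
    """Quantile-based tolerance update: set the next epsilon to the
    alpha-quantile of the current population's accepted distances."""
    return np.quantile(distances, alpha)

distances = np.array([0.9, 0.4, 0.7, 0.2, 0.5, 0.8, 0.3, 0.6])
print(next_epsilon(distances))              # 0.55, the median distance
print(next_epsilon(distances, alpha=0.25))  # 0.375, a more aggressive cut
```

A smaller α shrinks the tolerance faster at the cost of more rejections per population; a larger α keeps acceptance high but takes more populations to reach the target.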
Both the choice of initial epsilon and alpha determine the balance between computational and statistical efficiency. As mentioned previously, ε = 0 would produce an exact posterior, but in a complex system with multiple dimensions the algorithm would run for an impractical length of time.
One drawback to the quantile based method is that for a large alpha, the algorithm may fail to converge on the posterior. A remedy for this is to use a threshold-acceptance rate curve (Filippi, 2016). This is achieved with the following algorithm:
Fit a Gaussian mixture model to the perturbed population
5.6 Parallelising ABC
There are three ways of controlling the balance between computational speed and posterior accuracy in the ABC-SMC algorithm: the choice of epsilon for each population, the size of the populations, and the number of populations. If the model complexity is small or computation time is not a concern, the epsilon schedule can be set to small values, with a large number of populations and a large number of particles in each population. For a complex system with multiple parameters and high dimensional data, the computation time of the algorithm grows rapidly. The two principal reasons for this are (1) the time to run the simulation with the prior parameters, and (2) looping until the epsilon threshold is reached for each sample of the current population. Adjusting the settings mentioned above will help to reduce the computation time.
An additional measure is parallelisation. With the advent of big data there now exist several methods for conducting complex analysis on cluster computing systems. A cluster system comprises two or more computers working together to perform tasks (ESDS, 2014). Clusters are particularly suited to complex computational tasks which can be divided into many smaller tasks, one for each computer (node). For the analysis of large datasets it is important to design statistical algorithms which are well suited to this infrastructure. In some cases this is straightforward: in the rejection ABC algorithm, the number of desired loops can be divided between the nodes and the results merged at the end of the computation. For more complex ABC algorithms with dependencies amongst steps, this is less straightforward. In each population of ABC-SMC there is a dependence on the distance, weight and parameter values from the previous population. Simulations within the current population can still be executed in parallel, with the results sent back to the master node for evaluation until the population threshold is reached. As the simulation time dominates when using ABC to infer parameters for an ABM, this method of parallelisation can result in a significant decrease in computation time.
5.7 ABC Inference of a Gaussian Distribution
Two ABC models were developed to infer the mean and standard deviation of a normal distribution - Rejection ABC and ABC-SMC with Adaptive Weights. The true mean was taken to be approximately 4. A Normal prior was set for the mean and a Gamma prior for the standard deviation.
| True Mean | True SD | Mean Prior | SD Prior |
| Iterations | Acc Rate | Running Time |
Rejection ABC with these epsilon and iteration settings results in a high rejection rate of 96.5%. This can be improved through a more informative prior, although in many cases one is not available. Another adjustment is to epsilon: if it is increased, more samples will be accepted, with the tradeoff of a less accurate posterior. The acceptance rate of 3.5% appears to give an accurate posterior.
| 5 | [16139, 17518, 18857, 20232] | [1, 0.74, 0.56, 0.44, 0.35] | 117s |
The ABC-SMC model has a longer running time, but reduces the variance of both posterior estimates. The acceptance rate at the final population is 50%, significantly higher than that of the rejection model. The increase in computation time is due to the multiple populations and the additional time spent computing weights. For a complex model, the reduced time spent simulating data will outweigh this. The posterior mean estimates are slightly closer to the true mean.
5.8 Validation of an ABM using ABC
A challenge in designing an ABM of a complex system is representing the system accurately. Where there is observed data for a system, an ABM can be said to be validated against that system if its output (simulated data) is close to the system's observed data. ABC is a suitable approach for this validation. If an ABM is controlled by a set of parameters, ABC can be used to run the simulation with parameters drawn from a prior. If the ABC model is specified correctly, it will converge on a set of parameters which produce simulated data close to the observed data from the system the ABM represents. In comparison to the ‘brute force’ method described in Section 3.1, significantly less time is spent simulating data using parameters far from the true parameters of the system. The models that follow focus on the use of ABC algorithms to infer parameters for an Agent Based Model. While the final pedestrian model is relatively simple in nature, it serves as a basis for the addition of larger amounts of data, a higher dimensional parameter space, and a more complex ABM environment.
6 ABC Inference of an ABM
A simple stochastic model was created in NetLogo to demonstrate the use of ABC within an ABM. The simulation consists of a stochastic process over seven nodes arranged in a fork pattern (described in Section 4). This is taken to be a pedestrian network where nodes represent intersections and connections between the nodes are paths. All agents begin at the tail node 6 and travel to a connected node at each step. Observed data is first collected by fixing all turning parameters, with the exception of the turn into Node 0 from Node 3: the probability of travelling vertically to Node 0 is set to 0.9. That is, when an agent has travelled from Node 5 to Node 3, it then travels to Node 0 with probability 0.9.
Observed data was taken as the average of the counts at Node 0 over ten runs, with the turning parameters used to generate it treated as known. In reality, the number of pedestrians in a given system is often not known. While the proportional differences between the node counts depend on the directional probabilities, the magnitude of the counts depends on the number of pedestrians in the system. The prior parameters are
The Beta distribution is a family of continuous probability distributions defined over the interval [0, 1]. It is characterised by two shape parameters, α and β. The expected value and variance of the distribution are

E[X] = α / (α + β),  Var[X] = αβ / ((α + β)²(α + β + 1))

It is commonly used as a prior in Bayesian statistics to describe the probability of an event, and is the conjugate prior for the Bernoulli, Binomial, Negative Binomial and Geometric distributions. A flat prior can be defined as Beta(1, 1). In the following example a wide Beta prior was used, assuming little knowledge about the true probability. The expected value and variance of this prior follow from the formulas above.
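The Beta moments above can be verified in a few lines; the parameter values are illustrative:

```python
# Mean and variance of a Beta(a, b) distribution:
# E[X] = a / (a + b),  Var[X] = ab / ((a + b)^2 (a + b + 1)).
def beta_moments(a, b):
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, var

print(beta_moments(1, 1))   # flat prior: mean 0.5, variance 1/12
print(beta_moments(2, 5))   # skewed towards low probabilities
```

Larger values of α + β concentrate the prior, so the variance formula gives a direct way to translate prior confidence into parameter choices.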
Both Rejection ABC and ABC-SMC were used to infer the Beta parameter. The following algorithms were used:
| Prior | Observed Data | Summary Statistic |
The NetLogo simulation was controlled using the pyNetLogo library in Python. The Rejection ABC algorithm was programmed in Python.
| Simulations | Accepted Samples | Mean() | Running Time (mins) |
The posterior distribution appears to converge on the true parameter as epsilon decreases. However the running time is long, and 99% of samples are rejected. The same model was run using ABC-SMC to assess the difference in efficiency. The settings of the model are
| Populations | Population Size | Initial Epsilon |
| Simulations | Accepted Samples | Mean() | Running Time (mins) |
Here the acceptance rate is much higher (6%). The algorithm converges on the posterior much more quickly than the rejection algorithm.
Extended Network Model
This model can be extended to infer parameters for a larger set of nodes. For nodes with multiple possible directions, the Dirichlet distribution can be used as a multivariate generalisation of the Beta prior. The observed data can also be extended to multiple dimensions, where counts are observed at each node in the network. Consider a network with N nodes, each with a set of edges extending from it and a vector of observed counts. The model can be set up as:
This model will serve as a framework for modelling the movement of pedestrians in such a network.
Pedestrian Model Parameters
An intuitive way to describe a pedestrian choosing a direction to travel at each intersection is the Categorical distribution. Given k possible directions, there is a set of probabilities p₁, …, pₖ for travelling in each direction. The Categorical distribution is a generalisation of the Bernoulli distribution, which has probability p of outcome 1 and 1 − p of outcome 2; the generalisation extends the model to k possible categories, with the probability of each category or outcome specified by the vector p = (p₁, …, pₖ) (Categorical distribution, n.d.).
In Bayesian statistics, the Dirichlet distribution is the conjugate prior for the Categorical distribution. In a model where each data point follows a Categorical distribution with an unknown parameter vector p, this vector is treated as a random variable with a Dirichlet prior distribution. The posterior distribution of the parameter is also Dirichlet, so knowledge of the parameter can be updated by incorporating observations one at a time, following a formally defined mathematical process. The expected value of the posterior distribution for category i is

E[pᵢ | data] = (αᵢ + cᵢ) / Σⱼ (αⱼ + cⱼ)

where cᵢ is the observed count of category i. That is, the expected probability of seeing category i is the (prior-smoothed) proportion of occurrences of i out of all occurrences.
If there are three possible directions at a given intersection, equal probabilities for each direction can be specified by setting each αᵢ to the same value, giving a symmetric Dirichlet distribution. The size of the alphas specifies the variance. For example, a symmetric Dirichlet(1, 1, 1) prior gives a mean of 1/3 and a variance of 1/18 for each direction.
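The Dirichlet mean and variance, and the conjugate count update described above, can be sketched as follows; the counts are illustrative:

```python
import numpy as np

def dirichlet_moments(alpha):
    """Mean and variance of each component of a Dirichlet(alpha) distribution:
    E[p_i] = a_i / a0,  Var[p_i] = a_i (a0 - a_i) / (a0^2 (a0 + 1))."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    mean = alpha / a0
    var = alpha * (a0 - alpha) / (a0 ** 2 * (a0 + 1))
    return mean, var

# Symmetric Dirichlet(1, 1, 1) prior over three directions
mean, var = dirichlet_moments([1, 1, 1])
print(mean)   # [1/3, 1/3, 1/3]
print(var)    # [1/18, 1/18, 1/18]

# Conjugate update: add observed direction counts to the alphas
counts = np.array([30, 50, 20])
post_mean, _ = dirichlet_moments(np.array([1, 1, 1]) + counts)
print(post_mean)   # close to the observed turning proportions
```

The posterior mean for each direction is the prior-smoothed proportion of observed turns, which is exactly how count data at an intersection would update the turning probabilities.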
7 Hamilton CBD Pedestrian Model
Traffic modelling is an important aspect of transport engineering. Modelling transport networks aids in understanding hidden phenomena of travel and allows for the testing of infrastructure design. There is a need in modern urban environments to prioritise sustainable modes of transport, of which walking plays a key role.
A pedestrian model can be viewed in the context of an Agent Based Model. At each tick, pedestrians make a decision to move to a new location using a complex internal thought process. This decision is based on attributes including, but not limited to; the desired destination, structure of the network and characteristics of the built environment.
Understanding pedestrian flows is vital in designing an environment where pedestrians reach their destinations in a safe and enjoyable manner. To aid this understanding, data first needs to be collected on where pedestrians travel in the network.
Pedestrian counters have been set up in the Hamilton CBD at 21 key locations. These counters are situated on store verandahs and use 3D StereoVision cameras, giving them the ability to distinguish the direction a pedestrian is travelling and to provide distance and height measurements. Each camera records a total count in two directions in five minute intervals. Currently no extensive analysis is conducted with the data from these sensors, apart from tracking short term and long term trends. Beyond the 21 sensors operated by HCC, there is a long term vision of extending coverage to more locations to better understand pedestrian flow. The model described below serves as an initial exploration of how a pedestrian model can be developed for Hamilton using count data. The model aims to answer a key question of pedestrian flows:
What are the probabilities of turning in each possible direction, for each intersection in the Hamilton CBD?
These probabilities are hidden parameters in the network, which give rise to the counts observed at the pedestrian counters. Using the observed data, we can infer these probability parameters using an Agent Based Model and ABC-SMC.
The model focuses on the four pedestrian counters near the Centre Place shopping centre at the intersection of Ward St and Worley Place. The system is defined as a network with entry/exit points, intersection nodes, and counting nodes. The network is loaded from a shapefile of the Hamilton City walking network (NZTM) and is geospatially consistent with the real world location. The four pedestrian counters are included as a node type which counts travel through itself. The additional counters (denoted with a dashed circle below) are not installed counters; however, to cover all entry and exit points they are included as such, with mock data created according to a best guess. The data from these counters is used to validate the ABM.
For simplicity, the network was reduced to two intersections and four counters. The entry to the northern Centre Place building was ignored, but could be added in future models, in addition to extending the network to a larger portion of the CBD when the data is available.
An Agent Based Model was developed in NetLogo to simulate the counts at the four counters, with validation carried out using an ABC-SMC with Adaptive Weights model. The data for the four existing counters was first explored to understand how the count varies over time at each location.
| Count | Counter | Date | Time |
| 14 | Anglesea Ward NE | 2018-07-07 | 09:00:00 |
| 36 | Anglesea Ward NE | 2018-07-07 | 10:00:00 |
| 40 | Anglesea Ward NE | 2018-07-07 | 11:00:00 |
| 56 | Anglesea Ward NE | 2018-07-07 | 12:00:00 |
For each counter, the count changes significantly over the course of a day. The peak is generally around lunchtime. There is a slight increase in the evening before falling to a minimum level during the early morning hours. All four counters appear to vary significantly from the weekend to Monday. The count during the lunchtime peak was investigated further.
All four locations appear to have similar counts over the course of a working week. There is a decrease in the count during Saturday and Sunday, particularly at the Anglesea/Ward counter. Sunday experiences the lowest counts for the week. In terms of magnitude, the Anglesea/Ward counter has the highest foot traffic. 12pm-1pm is a key time to understand pedestrian flows, as many people working in the CBD leave their place of work to eat or shop. Understanding the movement of pedestrians at this time will provide valuable insight into where people choose to travel.
7.1 ABM Design
The pedestrian network is represented in NetLogo as a series of nodes and connecting links. This can be viewed as an undirected, acyclic graph with six nodes and five edges. It is not possible to tra