When modelling real-world phenomena we often use the model to make forecasts, only to be later faced with new observations that weren’t available at the time of the forecast. This leads to the problem of how to “assimilate” the new observations into an updated forecast and, perhaps, another forecast for a later time. There exists an extensive literature on this problem as applied to weather forecasting models which describes how the problem can be reduced using Bayes’ rule:
is our original forecast (this is a probability distribution since there is uncertainty in the forecast)is the likelihood of the observation, , given a forecast, , and is our updated forecast. However, the algorithms developed to do this in weather forecasting models depend on the model’s governing equations having some degree of smoothness, a condition not generally satisfied by agent-based models.
If there are only a very small number of observations, particle filtering or Markov Chain Monte Carlo methods may have some success, but as the number of observations increases, these methods tend to quickly fail. This is because as the number of observations increases, the proportion of the model’s phase-space that has non-zero likelihood often shrinks exponentially, so if the dimensionality of the phase-space is not trivially small, finding a non-zero start point for a Markov chain becomes exponentially harder. Similarly for particle filters; the number of particles needed to prevent all particles ending up with zero probability, increases exponentially with the number of observations.
For these reasons, the problem of data assimilation in agent-based models remains an unsolved problem, and this imposes a severe restriction on the utility of these models when applied to real world problems.
2 Quantum field theory applied to agent-based models
In order to apply Bayes’ rule to agent-based models (ABMs) we must first devise a way to represent a probability distribution over the possible states of an agent-based model. The task is complicated somewhat by the fact that, in general, the exact number of agents in the model is unknown and variable, so the set of model states must span models with any number of agents. Luckily, we can use the mathematical formalism of quantum field theory to do this. In a quantum field there are an unknown number of interacting particles and the quantum field is represented as a quantum amplitude over the possible configurations of particles; to apply this to ABMs, we just need to interpret the particles as agents and replace the quantum amplitudes with probabilities.
2.1 The creation operator
Let represent the empty model state (i.e. the state with no agents present), and let be an operator on model states that has the effect of adding an agent in state to the model state, this is called the creation operator. So, for example, is the state with a singe agent in state . Since the result of a creation operator is itself a state, we can apply another creation operator, for example, to represent a state with two agents, one in state and one in state , or even to represent a state with two agents in state . We can go on applying operators in this way to construct any desired state.
If we now allow states to be multiplied by real numbers and added together, a vector space is induced which can represent probability distributions over model states. For examplewould represent a probability distribution with probability that there’s a single agent in state and probability that there’s a single agent in state (with no probability that both agents are present at the same time).
Since each term in a probability distribution always ends in the empty state, , we can just as easily consider the additions and multiplications to be part of the operator, so the example distribution above could be equivalently written . Given this, composure of operators works in the expected way, for example, .
With this representation, we need a way to express the fact that it doesn’t matter in what order the creation operators are applied, so . Since this is true for all states, not just , we can drop the and say111to be correct, the 0 is interpreted as the “multiply by zero” operator or alternatively where is the identity operator
This form of equation is known as a commutation relation or commutator and has a shorthand notation:
2.2 The annihilation operator
Now let be the annihilation operator which has the following properties:
where is the Kronecker delta function.
Given just these properties, we can show that
So, the annihilation operator has the effect of multiplying by the number of agents in a state, then removing one of those agents.
Notice that, using the commutation relations in this way we can transform any sequence of operations on into an equivalent form that contains no annihilation operators. We’ll call this the reduced form of the operator.
3 Equations of motion of a probabilistic ABM
Armed with just the creation and annihilation operators we now show that if the behaviour of agents in an ABM can be described as a set of propensities to act and interact at any instant then we can transform the ABM into a “Hamiltonian” operator, containing only addition, multiplication, creation and annihilation operators. The Hamiltonian operator turns a probability distribution over ABM states into a rate of change of that distribution. This is significant in that it allows us to express the equations of motion of a probability distribution as an operator, and so describe probabilistically how the ABM evolves through time. That is, we’ve defined a “probabilistic ABM” that captures all the behaviour of the original ABM in a smooth, differentiable way which, as we’ll show, makes it much easier to perform inference on the ABM.
To do this, start with the definition of an agent’s behaviour: this must be representable as a set of actions and interactions. An action consists of a pair of functions, where is a rate per unit time and is a set of agents. An agent in state is said to have behaviour if in any infinitesimal time-slice, , it replaces itself by the agents in with probability . An interaction consists of a pair of functions, such that an agent in state , in the presence of another agent in state replaces both itself and the other agent with with probability . Although this is not a common way to think about agent-based models, a very large class of models can easily be described in this way.
Without loss of generality, we assume all agents in a model have the same set of behaviours (different behaviours among agents can be achieved by putting an “agent type” marker in the agent’s state and making the behaviour rates zero for certain agent types).
Consider now an action, and let be the results of the action. The Hamiltonian for this action would be
The product term expresses the rate of increase in probability in the state resulting from the action, while the term expresses the (equal in magnitude) rate of decrease in probability in the initial state due to the agent being removed when it acts. Because the annihilation operator, , multiplies by the number of agents in a given state, the rate of change is also multiplied if there are multiple agents to which the behaviour applies (or multiplied by zero if there are no applicable agents).
Now consider an interaction . The Hamiltonian for this interaction would be
The double annihilation operator, , has the effect of multiplying by the number of agent pairs so that each agent in state gets a chance to interact with each agent in state . This also works when , since , and the number of pairs is .
This pattern can easily be extended to interactions between three or more agents, but in this paper we’ll consider only binary interactions.
If an agent has more than one behaviour, the Hamiltonian is simply the sum of the individual behaviours, so in this way we can build a Hamiltonian for a whole ABM. So, an ABM whose agents have a set of action behaviours and a set of interaction behaviours has the Hamiltonian
Although the number of terms in the Hamiltonian can become very large, we’ll see later that we only ever need to deal computationally with the much smaller commutation relations .
Given the Hamiltonian, the time evolution of a probability distribution over ABM states, , is by definition
This has the analytical solution
where is the initial state, is the state at time and the exponential of an operator, , can be defined as
where is a state at time , are the observations at time and is the probability distribution of the ABM at time given all observations up to that time.
So, we have a succinct, mathematical solution to the problem of data assimilation in an ABM. We now need to translate equation 7 into a numerical algorithm that will execute in a reasonable time.
4 The Deselby distribution
As a first step towards a practical numerical algorithm, we introduce the Deselby distribution which we define as
where is an integer, is a real number and denotes the -order falling factorial of .
The Deselby distributions have the useful property that
It can also be seen that
Now imagine a product of Deselby distributions over all agent states
so that, for any
So, in the same way as with , any sequence of operations on can be reduced to a sequence of creation operations on , i.e. a reduced form, and we can consider as a “ground” state in just the way we have been using so far.
So, if we express a probability distribution in terms of operators on we can apply the Hamiltonian to this to get a rate of change in in terms of . This will be much more computationally efficient than dealing directly with operators on .
Note also that when
, the Deselby distribution is just the Poisson distribution, so it is particularly convenient if we can express our prior beliefs about the number of agents in each state of an ABM in terms of Poisson distributions.
4.1 Data assimilation with Deselby distributions
As a concrete example, suppose there are a number of identical agents moving around a 2-dimensional grid, and we observe the number of agents in the
grid-square. To make this slightly more general, suppose our method of observation may be imperfect and there is a probability, r, that an agent is detected given that it is there, such that the likelihood function is just the Binomial distribution
where is the observed number of agents and is the actual number present. Appendix A shows that, if our prior is a Deselby distribution , then the posterior is given by
Where we introduce the operator which has the properties
where is with the component, , replaced by .
Since euqation 10 is true for any Deselby distribution, it is also true for any state described as an operator on . So, given (a distribution at time expressed as an operator on ) and an observation of agents in state at time , the postrior distribution at time is given by
Multiple observations can be treated by applying operators for each observation
4.2 Deselby state approximation
Even when working with Deselby distributions, as time progresses the size of the reduced form of the distribution will become large and sooner or later. In order to maintain computational tractability, we’ll need to somehow approximate the distribution. There are many ways of doing this, but a common way is to minimise the Kulback-Leibler divergence between the original distribution and the approximation over a family of possible approximations. This has the effect of minimising the loss of information due to the approximation.
If is the original distribution and the approximation, and we choose to minimise the resulting approximation has a particularly simple form.
By definition the KL-divergence is given by
where we define
where is the Kronecker delta function.
is defined to be the conjugate of , where the conjugation operator is defined as having the properties
and if is a constant.
If we choose to be a Deselby distribution, and minimise the KL divergence with respect to we have, for all
Solving this for gives
Where the operator
has the effect of summing over a distribution (i.e. summing the coefficients of an operator expressed in reduced form on ).
This can be interpreted as saying that the Deselby distribution, , that minimises is the one whose mean number of agents in state matches the mean of for all .
This gives us values for given . To calculate , notice that for the KL-divergence to be finite, the support of must be a subset of the support of . So if has value , the probability that there are any less than agents in state in must be zero. So, we can choose to be the highest value for which there are definitely at least that many agents in state in .
5 A practical algorithm for data assimilation
Using equations 15 and 16 we can perform a data assimilation cycle that consists of the following steps: take an initial state , integrate forward in time, apply the set of observations , then finally approximate to the Deselby distribution that minimises the KL-divergence.
Since we’ve just observed agents in state for each we can define the of the approximation to be
given this, is given by
To calculate this, we need an algorithm to calculate expressions of the form
To increase numerical stability, we transform this to
where is the identity operator. Expanding the exponential in gives
where . However,
but since is the Hamiltonian, and the Hamiltonian preserves probability mass,
for all and so we can replace by in equation 19 where and
without affecting the value of .
This is very convenient computationally, because although is in general very large, is considerably smaller so we have a much more efficient way to calculate the terms of the expansion.
The computation can be made even more efficient by noting that
for all . So when calculating each we can use the commutation relations to remove all creation operators by moving them to the left, leaving only annihilation operators.
In practice, we only need to calculate the first few terms in the expansion in 19 so we end up with
The terms, can be calculated consecutively using equation 20 with
for the numerator in equation 18 and
for the denominator.
We then just need to multiply by (i.e. the Deselby distribution from the previous assimilation cycle), convert to reduced form and sum the coefficients (in order to perform the operation)222We can also sum coefficients when working with a Deselby ground state since the sum of the coefficients of a Deselby distribution, expressed as an operator on , is 1.
Numerical experiments showed that, when calculating the terms in equation 21
, a good estimate of the error obtained by truncating the series at theterm can be calculated as
This can be calculated along with the terms of the expansion until the error estimate falls below some desired threshold.
A final computational simplification can be made by noticing that when calculating the numerator of equation 18, observations far away from the grid-square will have very little influence on its posterior (depending on and the speed of information flow through the ABM).
This intuition is formalised in Appendix B where we show that if we let be the set of all states operated on by operator , and let
denote the -fold application of a commutation, then if the observations in can be partitioned into two sets and in such a way that
(where above is the empty set, not the ground state) we can factorise out the effect of all observations in (to -order accuracy) in both the numerator and denominator so that they cancel and we can effectively remove the observations in (note that in the above equation we’ve removed the operator as it has no effect under the operation).
6 Spatial predator-prey example model
In order to provide a numerical demonstration of the above techniques, we simulated a 32x32 grid of agents. Each agent could be either a “predator” or a “prey”. At any time a prey had a propensity to move to an adjacent grid-square (i.e. up, down, left or right), to die or to give birth to another prey on the an adjacent grid-square. Predators also had propensities to move to adjacent grid-squares or die, but in the presence of a prey on an adjacent grid-square, they also had a propensity to eat the prey and reproduce into an adjacent grid-square. Perhaps surprisingly, even a model with such a simple set of behaviours exhibits quite complex emergent properties that are difficult to predict. The rate per unit time of each behaviour is shown in table 1.
|Aget type||Description||Rate per unit time|
|predation/reproduction (per prey)||0.5|
A non-probabilistic, forward simulation of the ABM was performed in order to create simulated observation data. The initial state of the simulation was constructed by looping over all grid-squares and choosing the number of predators in that square from a Poisson distribution with rate parameter and the number of prey from a Poisson distribution with . The model was then simulated forward in time and at time intervals of 0.5 units an observation of the state of the simulation was recorded. An observation was performed as follows: for each grid-square, with 0.02 probability, the number of prey was observed. If an observation was made, each prey in that grid-square was counted with a 0.9 probability. The same procedure was then repeated for predators.
Once the observations were generated, a sequence of data assimilation cycles were performed as described in equations 17 and 18, with a starting state of the Deselby ground state with for all representing a predator and for all representing a prey. The terms in the expansion were calculated until the expected truncation error, calculated using equation 22 fell below 0.2% of the sum of terms up that point.
The mutual information between the assimilated state and the real state (where the real state is represented by a Dirac delta distribution) was calculated at the end of each assimilation cycle. In order to distinguish between the information accumulated by the data assimilation cycles and the information contained in the current time’s observation and the prior at the start of the simulation, we also calculated a reference value equal to the mutual information between
and the real state (i.e. the posterior given only the initial prior and the current window’s observations).
Figure 1 shows the difference between the mutual information of the assimilated state and the reference value for each assimilation cycle of 10 separate simulations. The upward trend in this plot shows that the assimilation cycle is successfully accumulating information about the real state.
Figure 2 shows snapshots of the real state of the non-probabilistic simulation superimposed on the assimilated state, for three of the simulations after 64 assimilation cycles. Inspection of these snapshots seems to suggest that the mutual information tends to be higher when agents are arranged in clusters rather than being more scattered, which is not surprising. Also, when a grid-square containing an agent is observed but (because of the 0.9 probability of observation) the number of observed agents happens to be zero, this will reduce the mutual information.
Posterior, assimilated probability (background colour) plotted against actual agent positions (dots) for the highest (top), average (middle) and lowest (bottom) mutual information (out of 10 simulations) after 64 assimilation cycles. Red dots are real prey positions, blue dots are real predator positions. The background colour shows the posterior probability: higher intensity red corresponds to higher probability of prey, higher intensity of blue corresponds to higher probability of predator.
7 Previous work
Applying the methods of quantum field theory to other domains is not new; for example Dodd and Ferguson suggest applying it to epidemiology, O’Dwyer and Green apply it to simulations of biodiversity while Santos et.al. applies it to enzymatic reations. Abarbanel comes closest to our work in that he talks about quantum field theory in the context of data assimilation. However, to our knowledge, nobody has before developed quantum field theory to apply to such a general class of ABMs, nor have they developed it into an algorithm for data assimilation. Also, the various numerical strategies presented here that make the technique computationally tractable, including our presentation of the Deselby distribution is, we believe, new.
8 Discussion and Conclusion
This paper presents some early results of a promising new technique in data assimilation for agent based models. However, there’s plenty of work left to be done and many opportunities for further development. For example, more research into flexible ways of approximating the state or computationally efficient ways of representing a state as a data structure, would be worthwhile. The example model presented here is very small and simple, the next step would be to apply this technique to larger and more complex models. In addition, in this paper we only consider agents with a discrete set of states, the extension to states including continuous variables is fairly straightforward and work on this is currently under way.
At present there are very few techniques that can be used to successfully perform data assimilation in agent based models. We believe the results presented here provide a significant first step towards filling that gap and hope that it will form the foundation of a fruitful approach to the problem.
The code used to create the results in this paper can be found at https://github.com/deselby-research/ProbabilisticABM, where you’ll also find a library of code we’re developing in order to experiment with new techniques in this area.
-  Eugenia Kalnay. Atmospheric modeling, data assimilation and predictability. Cambridge university press, 2003.
-  Alberto Carrassi, Marc Bocquet, Laurent Bertino, and Geir Evensen. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change, 9(5):e535, 2018.
-  Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, 2009.
-  Kevin P Murphy. Machine learning: a probabilistic perspective. MIT press, 2012.
-  Peter J Dodd and Neil M Ferguson. A many-body field theory approach to stochastic models in population biology. PloS one, 4(9):e6855, 2009.
-  James P O’Dwyer and Jessica L Green. Field theory for biogeography: a spatially explicit model for predicting patterns of biodiversity. Ecology letters, 13(1):87–95, 2010.
-  Fernando AN Santos, Hermes Gadêlha, and Eamonn A Gaffney. Fock space, symbolic algebra, and analytical solutions for small stochastic systems. Physical Review E, 92(6):062714, 2015.
-  Henry Abarbanel. Predicting the future: completing models of observed complex systems. Springer, 2013.
Appendix A Appendix: Binomial observation
The product of a Deselby distribution and a binomial distribution is given by
rearranging and dropping constants
expanding the product of falling factorials
Expressing this as an operator on
But, from the commutation relations, we have the identity
Let be an operator that has the properties
Appendix B Appendix: Factorisation of compound observations
Consider an expression of the form
where , and are operators, and is a Hamiltonian.
Due to the form of a Hamiltonian, for all X, so
Where we introduce the notation
to denote the -fold application of a commutation.
where is a sequence of creation operators and is the Deselby ground state.
If we let be the set of all states operated on by operator , then if
where above is the empty set (not the ground state) then can easily be factorised into such that
But if consists only of annihilation operators (which we can arrange by using commutation relations to move all creation operators to the left and then removing them), then
but if equation 26 is satisfied then
irrespective of whether we strip the first term of creation operators, so, if 26 is satisfied
where we’ve removed the factorisation of as it doesn’t affect the sum.