Data assimilation in Agent-based models using creation and annihilation operators

Daniel Tang, et al., 10/08/2019

Agent-based models are a powerful tool for studying the behaviour of complex systems that can be described in terms of multiple, interacting “agents”. However, because of their inherently discrete and often highly non-linear nature, it is very difficult to reason about the relationship between the state of the model, on the one hand, and our observations of the real world on the other. In this paper we consider agents that have a discrete set of states and that, at any instant, act with a probability that may depend on the environment or the state of other agents. Given this, we show how the mathematical apparatus of quantum field theory can be used to reason probabilistically about the state and dynamics of the model, and describe an algorithm to update our belief in the state of the model in the light of new, real-world observations. Using a simple predator-prey model on a 2-dimensional spatial grid as an example, we demonstrate the assimilation of incomplete, noisy observations and show that this leads to an increase in the mutual information between the actual state of the observed system and the posterior distribution given the observations, when compared to a null model.


1 Introduction

When modelling real-world phenomena we often use the model to make forecasts, only to be later faced with new observations that weren’t available at the time of the forecast. This leads to the problem of how to “assimilate” the new observations into an updated forecast and, perhaps, another forecast for a later time. There exists an extensive literature on this problem as applied to weather forecasting models[1][2] which describes how the problem can be reduced using Bayes’ rule:

P(\psi|\Omega) = \frac{P(\Omega|\psi)\,P(\psi)}{P(\Omega)} \qquad (1)

Here, $P(\psi)$ is our original forecast of the model state $\psi$ (this is a probability distribution since there is uncertainty in the forecast), $P(\Omega|\psi)$ is the likelihood of the observations, $\Omega$, given a forecast state, $\psi$, and $P(\psi|\Omega)$ is our updated forecast. However, the algorithms developed to do this in weather forecasting models depend on the model’s governing equations having some degree of smoothness, a condition not generally satisfied by agent-based models.

If there are only a very small number of observations, particle filtering or Markov Chain Monte Carlo methods may have some success[3], but as the number of observations increases, these methods tend to fail quickly. This is because, as the number of observations increases, the proportion of the model’s phase-space that has non-zero likelihood often shrinks exponentially, so if the dimensionality of the phase-space is not trivially small, finding a non-zero start point for a Markov chain becomes exponentially harder. The same is true of particle filters: the number of particles needed to prevent all particles ending up with zero probability increases exponentially with the number of observations.

For these reasons, the problem of data assimilation in agent-based models remains an unsolved problem, and this imposes a severe restriction on the utility of these models when applied to real world problems.

2 Quantum field theory applied to agent-based models

In order to apply Bayes’ rule to agent-based models (ABMs) we must first devise a way to represent a probability distribution over the possible states of an agent-based model. The task is complicated somewhat by the fact that, in general, the exact number of agents in the model is unknown and variable, so the set of model states must span models with any number of agents. Luckily, we can use the mathematical formalism of quantum field theory to do this. In a quantum field there are an unknown number of interacting particles and the quantum field is represented as a quantum amplitude over the possible configurations of particles; to apply this to ABMs, we just need to interpret the particles as agents and replace the quantum amplitudes with probabilities.

2.1 The creation operator

Let $\emptyset$ represent the empty model state (i.e. the state with no agents present), and let $a^\dagger_i$ be an operator on model states that has the effect of adding an agent in state $i$ to the model state; this is called the creation operator. So, for example, $a^\dagger_i\emptyset$ is the state with a single agent in state $i$. Since the result of a creation operator is itself a state, we can apply another creation operator, for example $a^\dagger_j a^\dagger_i\emptyset$, to represent a state with two agents, one in state $i$ and one in state $j$, or even $a^\dagger_i a^\dagger_i\emptyset$ to represent a state with two agents in state $i$. We can go on applying operators in this way to construct any desired state.

If we now allow states to be multiplied by real numbers and added together, a vector space is induced which can represent probability distributions over model states. For example,

0.5\, a^\dagger_i \emptyset + 0.5\, a^\dagger_j \emptyset

would represent a probability distribution with probability 0.5 that there’s a single agent in state $i$ and probability 0.5 that there’s a single agent in state $j$ (with no probability that both agents are present at the same time).

Since each term in a probability distribution always ends in the empty state, $\emptyset$, we can just as easily consider the additions and multiplications to be part of the operator, so the example distribution above could be equivalently written $(0.5\, a^\dagger_i + 0.5\, a^\dagger_j)\emptyset$. Given this, composition of operators works in the expected way, for example $a^\dagger_k(0.5\, a^\dagger_i + 0.5\, a^\dagger_j)\emptyset = 0.5\, a^\dagger_k a^\dagger_i\emptyset + 0.5\, a^\dagger_k a^\dagger_j\emptyset$.

With this representation, we need a way to express the fact that it doesn’t matter in what order the creation operators are applied, so $a^\dagger_i a^\dagger_j \emptyset = a^\dagger_j a^\dagger_i \emptyset$. Since this is true for all states, not just $\emptyset$, we can drop the $\emptyset$ and say¹

a^\dagger_i a^\dagger_j - a^\dagger_j a^\dagger_i = 0

This form of equation is known as a commutation relation or commutator and has a shorthand notation:

[a^\dagger_i, a^\dagger_j] = 0 \qquad (2)

¹ To be correct, the 0 is interpreted as the “multiply by zero” operator or, alternatively, as $0I$ where $I$ is the identity operator.

2.2 The annihilation operator

Now let $a_i$ be the annihilation operator, which has the following properties:

a_i \emptyset = 0 \qquad (3)

[a_i, a_j] = 0 \qquad (4)

and

[a_i, a^\dagger_j] = \delta_{ij} \qquad (5)

where $\delta_{ij}$ is the Kronecker delta function.

Given just these properties, we can show that

a_i (a^\dagger_i)^n \emptyset = a^\dagger_i a_i (a^\dagger_i)^{n-1} \emptyset + (a^\dagger_i)^{n-1} \emptyset

This is a recurrence relationship which, using equations 3 and 5, can be solved to give

a_i (a^\dagger_i)^n \emptyset = n\, (a^\dagger_i)^{n-1} \emptyset

So, the annihilation operator has the effect of multiplying by the number of agents in a state, then removing one of those agents.

Notice that, using the commutation relations in this way, we can transform any sequence of creation and annihilation operations on $\emptyset$ into an equivalent form that contains no annihilation operators. We’ll call this the reduced form of the operator.
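To make this concrete, the following minimal Python sketch (our own illustration, not code from the paper; all names are ours) represents a distribution over model states as a dictionary mapping occupation-number tuples to probabilities, and implements the creation and annihilation operators as functions on such dictionaries. It checks numerically that creation operators commute and that $a_i (a^\dagger_i)^n \emptyset = n(a^\dagger_i)^{n-1}\emptyset$.

    from collections import defaultdict

    N_STATES = 3                      # number of distinct agent states
    EMPTY = (0,) * N_STATES           # the empty model state

    def create(i, dist):
        """Creation operator: add one agent in state i to every term."""
        out = defaultdict(float)
        for occ, p in dist.items():
            occ = list(occ)
            occ[i] += 1
            out[tuple(occ)] += p
        return dict(out)

    def annihilate(i, dist):
        """Annihilation operator: multiply each term by the number of
        agents in state i, then remove one of them."""
        out = defaultdict(float)
        for occ, p in dist.items():
            n = occ[i]
            if n > 0:
                occ2 = list(occ)
                occ2[i] -= 1
                out[tuple(occ2)] += n * p
        return dict(out)

    empty = {EMPTY: 1.0}

    # creation operators commute
    assert create(0, create(1, empty)) == create(1, create(0, empty))

    # annihilation multiplies by the agent count and removes one agent
    three = create(0, create(0, create(0, empty)))
    two = create(0, create(0, empty))
    assert annihilate(0, three) == {occ: 3 * p for occ, p in two.items()}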

3 Equations of motion of a probabilistic ABM

Armed with just the creation and annihilation operators we now show that if the behaviour of agents in an ABM can be described as a set of propensities to act and interact at any instant then we can transform the ABM into a “Hamiltonian” operator, containing only addition, multiplication, creation and annihilation operators. The Hamiltonian operator turns a probability distribution over ABM states into a rate of change of that distribution. This is significant in that it allows us to express the equations of motion of a probability distribution as an operator, and so describe probabilistically how the ABM evolves through time. That is, we’ve defined a “probabilistic ABM” that captures all the behaviour of the original ABM in a smooth, differentiable way which, as we’ll show, makes it much easier to perform inference on the ABM.

To do this, start with the definition of an agent’s behaviour: this must be representable as a set of actions and interactions. An action consists of a pair of functions, $(\rho, A)$, where $\rho(i)$ is a rate per unit time and $A(i)$ is a set of agents. An agent in state $i$ is said to have behaviour $(\rho, A)$ if, in any infinitesimal time-slice, $\delta t$, it replaces itself by the agents in $A(i)$ with probability $\rho(i)\delta t$. An interaction consists of a pair of functions, $(\rho, A)$, such that an agent in state $i$, in the presence of another agent in state $j$, replaces both itself and the other agent with the agents in $A(i,j)$ with probability $\rho(i,j)\delta t$. Although this is not a common way to think about agent-based models, a very large class of models can easily be described in this way.

Without loss of generality, we assume all agents in a model have the same set of behaviours (different behaviours among agents can be achieved by putting an “agent type” marker in the agent’s state and making the behaviour rates zero for certain agent types).

Consider now an action, $(\rho, A)$, and let $\{j_1 \dots j_m\} = A(i)$ be the results of the action when applied to an agent in state $i$. The Hamiltonian for this action would be

\rho(i)\left(a^\dagger_{j_1} \cdots a^\dagger_{j_m} - a^\dagger_i\right)a_i

The product term $\rho(i)\, a^\dagger_{j_1} \cdots a^\dagger_{j_m} a_i$ expresses the rate of increase in probability in the state resulting from the action, while the $-\rho(i)\, a^\dagger_i a_i$ term expresses the (equal in magnitude) rate of decrease in probability in the initial state due to the agent being removed when it acts. Because the annihilation operator, $a_i$, multiplies by the number of agents in a given state, the rate of change is also multiplied if there are multiple agents to which the behaviour applies (or multiplied by zero if there are no applicable agents).

Now consider an interaction, $(\rho, A)$, and let $\{k_1 \dots k_m\} = A(i,j)$. The Hamiltonian for this interaction would be

\rho(i,j)\left(a^\dagger_{k_1} \cdots a^\dagger_{k_m} - a^\dagger_i a^\dagger_j\right)a_j a_i

The double annihilation operator, $a_j a_i$, has the effect of multiplying by the number of agent pairs so that each agent in state $i$ gets a chance to interact with each agent in state $j$. This also works when $i = j$, since $a_i a_i (a^\dagger_i)^n \emptyset = n(n-1)(a^\dagger_i)^{n-2}\emptyset$, and the number of ordered pairs is $n(n-1)$.

This pattern can easily be extended to interactions between three or more agents, but in this paper we’ll consider only binary interactions.

If an agent has more than one behaviour, the Hamiltonian is simply the sum of the individual behaviours, so in this way we can build a Hamiltonian for a whole ABM. So, an ABM whose agents have a set of action behaviours, $\mathcal{A}$, and a set of interaction behaviours, $\mathcal{I}$, has the Hamiltonian

H = \sum_{(\rho,A) \in \mathcal{A}} \sum_i \rho(i)\left(\prod_{k \in A(i)} a^\dagger_k - a^\dagger_i\right)a_i \;+\; \sum_{(\rho,A) \in \mathcal{I}} \sum_{i,j} \rho(i,j)\left(\prod_{k \in A(i,j)} a^\dagger_k - a^\dagger_i a^\dagger_j\right)a_j a_i

Although the number of terms in the Hamiltonian can become very large, we’ll see later that we only ever need to deal computationally with much smaller commutation relations involving $H$.
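As an illustration of how such a Hamiltonian acts (a hypothetical toy example, not the paper's predator-prey model), the sketch below builds the Hamiltonian for a single agent state with two action behaviours, birth at rate b (an agent replaces itself by two agents) and death at rate d (an agent replaces itself by nothing), as a matrix over occupation numbers 0..n_max. Following the pattern above, the birth behaviour contributes $b(a^\dagger a^\dagger - a^\dagger)a$ and the death behaviour $d(1 - a^\dagger)a$; in matrix form the annihilation operator supplies the factor of n. Each column of the matrix sums to zero, reflecting the fact that the Hamiltonian conserves probability mass.

    import numpy as np

    def birth_death_hamiltonian(b, d, n_max):
        """H for a single agent state with two actions: birth (i -> {i, i}) at
        rate b and death (i -> {}) at rate d, truncated at n_max agents."""
        H = np.zeros((n_max + 1, n_max + 1))
        for n in range(n_max + 1):
            if n < n_max:
                H[n + 1, n] += b * n          # birth: probability flows from n to n+1
                H[n, n] -= b * n
            if n > 0:
                H[n - 1, n] += d * n          # death: probability flows from n to n-1
                H[n, n] -= d * n
        return H

    H = birth_death_hamiltonian(b=0.15, d=0.1, n_max=50)
    assert np.allclose(H.sum(axis=0), 0.0)        # probability mass is conserved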

Given the Hamiltonian, the time evolution of a probability distribution over ABM states, $P_t$, is by definition

\frac{dP_t}{dt} = H P_t

This has the analytical solution

P_t = e^{Ht} P_0 \qquad (6)

where $P_0$ is the initial state, $P_t$ is the state at time $t$, and the exponential of an operator, $e^{Ht}$, can be defined as

e^{Ht} = \sum_{n=0}^{\infty} \frac{(Ht)^n}{n!}

Equation 6 gives us a prior forecast which can be inserted into Bayes’ rule (equation 1) in order to do data assimilation:

P'_t(\psi) \propto P(\Omega_t|\psi)\left(e^{Ht}P_0\right)(\psi) \qquad (7)

where $\psi$ is a state at time $t$, $\Omega_t$ are the observations at time $t$ and $P'_t$ is the probability distribution of the ABM at time $t$ given all observations up to that time.

So, we have a succinct, mathematical solution to the problem of data assimilation in an ABM. We now need to translate equation 7 into a numerical algorithm that will execute in a reasonable time.
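Before introducing the machinery of the following sections, it may help to see equation 7 carried out by brute force on a deliberately tiny example (again our own hypothetical illustration, not the paper's algorithm): a single agent state governed by the birth-death Hamiltonian sketched above, a forecast obtained from a truncated exponential series (equation 6), and a binomial observation assimilated by pointwise multiplication with the likelihood followed by renormalisation.

    import numpy as np
    from math import comb, factorial

    def birth_death_hamiltonian(b, d, n_max):
        """dP/dt = H P for one agent state with birth rate b and death rate d."""
        H = np.zeros((n_max + 1, n_max + 1))
        for n in range(n_max + 1):
            if n < n_max:
                H[n + 1, n] += b * n
                H[n, n] -= b * n
            if n > 0:
                H[n - 1, n] += d * n
                H[n, n] -= d * n
        return H

    def expm_apply(H, P, t, terms=60):
        """Approximate e^{Ht} P with a truncated power series (equation 6)."""
        result, term = P.copy(), P.copy()
        for n in range(1, terms):
            term = (t / n) * (H @ term)
            result = result + term
        return result

    n_max = 60
    H = birth_death_hamiltonian(b=0.15, d=0.1, n_max=n_max)

    # prior: Poisson(5) number of agents, truncated to 0..n_max
    lam = 5.0
    prior = np.array([np.exp(-lam) * lam**n / factorial(n) for n in range(n_max + 1)])
    prior /= prior.sum()

    forecast = expm_apply(H, prior, t=0.5)          # prior forecast, equation 6

    # observe m agents, each real agent being detected with probability r
    m, r = 3, 0.9
    likelihood = np.array([comb(n, m) * r**m * (1 - r)**(n - m) if n >= m else 0.0
                           for n in range(n_max + 1)])

    posterior = likelihood * forecast               # Bayes' rule, equation 7
    posterior /= posterior.sum()

    print("forecast mean :", np.arange(n_max + 1) @ forecast)
    print("posterior mean:", np.arange(n_max + 1) @ posterior)

Of course, enumerating the full state space like this is exactly what becomes intractable for a real ABM, which is what motivates the Deselby representation introduced next.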

4 The Deselby distribution

As a first step towards a practical numerical algorithm, we introduce the Deselby distribution, which we define as

D_{\Delta\lambda} = \sum_{k} k^{\underline{\Delta}}\, \frac{\lambda^{k-\Delta} e^{-\lambda}}{k!}\, (a^\dagger)^k\, \emptyset

where $\Delta$ is an integer, $\lambda$ is a real number and $k^{\underline{\Delta}}$ denotes the $\Delta$-order falling factorial of $k$.

The Deselby distributions have the useful property that

a^\dagger D_{\Delta\lambda} = D_{(\Delta+1)\lambda}

and so

(a^\dagger)^n D_{\Delta\lambda} = D_{(\Delta+n)\lambda} \qquad (8)

It can also be seen that

a D_{\Delta\lambda} = \Delta D_{(\Delta-1)\lambda} + \lambda D_{\Delta\lambda} \qquad (9)

Now imagine a product of Deselby distributions over all agent states,

D_{\Delta\lambda} = \prod_i D_{\Delta_i \lambda_i}

so that, for any $i$,

a_i D_{\Delta\lambda} = \Delta_i D_{(\Delta - 1_i)\lambda} + \lambda_i D_{\Delta\lambda}

where $\Delta - 1_i$ denotes $\Delta$ with its $i^{th}$ component reduced by one.

So, in the same way as with $\emptyset$, any sequence of operations on $D_{\Delta\lambda}$ can be reduced to a sequence of creation operations on $D_{\Delta\lambda}$, i.e. a reduced form, and we can consider $D_{\Delta\lambda}$ as a “ground” state in just the way we have been using $\emptyset$ so far.

So, if we express a probability distribution in terms of operators on $D_{\Delta\lambda}$, we can apply the Hamiltonian to this to get a rate of change, again in terms of operators on $D_{\Delta\lambda}$. This will be much more computationally efficient than dealing directly with operators on $\emptyset$.

Note also that when $\Delta = 0$, the Deselby distribution is just the Poisson distribution, so it is particularly convenient if we can express our prior beliefs about the number of agents in each state of an ABM in terms of Poisson distributions.
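A quick numerical check of these properties (our own sketch, assuming the explicit coefficient form written above, i.e. the coefficient of $(a^\dagger)^k$ in $D_{\Delta\lambda}$ is $k^{\underline{\Delta}}\lambda^{k-\Delta}e^{-\lambda}/k!$; the function names are ours): the coefficients should sum to one, have mean $\Delta + \lambda$, and reduce to the Poisson distribution when $\Delta = 0$.

    from math import exp, factorial

    def falling(k, delta):
        """Falling factorial k (k-1) ... (k-delta+1)."""
        out = 1
        for j in range(delta):
            out *= (k - j)
        return out

    def deselby_coeff(k, delta, lam):
        """Coefficient of (a†)^k in D_{delta,lambda} (assumed form, see text)."""
        return falling(k, delta) * lam**(k - delta) * exp(-lam) / factorial(k)

    delta, lam, k_max = 2, 3.0, 60
    coeffs = [deselby_coeff(k, delta, lam) for k in range(k_max)]

    total = sum(coeffs)
    mean = sum(k * c for k, c in enumerate(coeffs))
    poisson = [lam**k * exp(-lam) / factorial(k) for k in range(k_max)]

    assert abs(total - 1.0) < 1e-9                        # coefficients sum to 1
    assert abs(mean - (delta + lam)) < 1e-9               # mean agent count is Δ + λ
    assert all(abs(deselby_coeff(k, 0, lam) - poisson[k]) < 1e-12
               for k in range(k_max))                     # Δ = 0 gives Poisson(λ)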

4.1 Data assimilation with Deselby distributions

As a concrete example, suppose there are a number of identical agents moving around a 2-dimensional grid, and we observe the number of agents in the $i^{th}$ grid-square. To make this slightly more general, suppose our method of observation may be imperfect and there is a probability, $r$, that an agent is detected given that it is there, such that the likelihood function is just the Binomial distribution

P(m|n) = \binom{n}{m} r^m (1-r)^{n-m}

where $m$ is the observed number of agents and $n$ is the actual number present. Appendix A shows that, if our prior is a Deselby distribution, $D_{\Delta\lambda}$, then the posterior is given by

(10)

where we introduce an operator which has the properties

(11)
(12)

and

(13)

where the subscript denotes $\Delta$ with its $i^{th}$ component, $\Delta_i$, replaced by a new value.

Since equation 10 is true for any Deselby distribution, it is also true for any state described as an operator on a Deselby ground state. So, given a distribution at time $t$, expressed as an operator on $D_{\Delta\lambda}$, and an observation of $m$ agents in state $i$ at time $t$, the posterior distribution at time $t$ is given by

(14)

Multiple observations can be treated by applying operators for each observation

(15)
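The structure of this update can be sanity-checked in the simplest case with a direct numerical Bayes calculation (our own illustration, not the operator derivation of Appendix A): if the prior number of agents in the observed state is Poisson($\lambda$), i.e. a Deselby distribution with $\Delta = 0$, and $m$ agents are observed with detection probability $r$, then the posterior count is $m$ plus a Poisson($\lambda(1-r)$) variable, consistent with the shifted-Poisson form of the Deselby distributions described above (with $\Delta = m$).

    import numpy as np
    from math import comb, exp, factorial

    lam, r, m, n_max = 4.0, 0.9, 2, 60
    ns = range(n_max + 1)

    # prior: Poisson(λ), i.e. the Δ = 0 Deselby distribution
    prior = np.array([exp(-lam) * lam**n / factorial(n) for n in ns])

    # binomial likelihood of observing m of the n agents present
    likelihood = np.array([comb(n, m) * r**m * (1 - r)**(n - m) if n >= m else 0.0
                           for n in ns])

    posterior = prior * likelihood
    posterior /= posterior.sum()

    # candidate posterior: n = m + Poisson(λ(1-r))
    lam2 = lam * (1 - r)
    candidate = np.array([exp(-lam2) * lam2**(n - m) / factorial(n - m) if n >= m else 0.0
                          for n in ns])

    assert np.allclose(posterior, candidate)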

4.2 Deselby state approximation

Even when working with Deselby distributions, as time progresses the size of the reduced form of the distribution will, sooner or later, become too large to handle. In order to maintain computational tractability, we’ll need to somehow approximate the distribution. There are many ways of doing this, but a common way is to minimise the Kullback-Leibler (KL) divergence between the original distribution and the approximation[4] over a family of possible approximations. This has the effect of minimising the loss of information due to the approximation.

If $P$ is the original distribution and $Q$ the approximation, and we choose to minimise $D_{KL}(P \| Q)$, the resulting approximation has a particularly simple form.

By definition the KL-divergence is given by

D_{KL}(P \| Q) = \sum_{\psi} P(\psi) \ln\frac{P(\psi)}{Q(\psi)}

where $P(\psi)$ and $Q(\psi)$ denote the probabilities that $P$ and $Q$ assign to the model state $\psi$, extracted from the operator representation using the conjugate of the corresponding state (the extraction involves the Kronecker delta function, $\delta_{ij}$). The conjugation operator is defined as having the properties

(a^\dagger_i)^* = a_i

and

(AB)^* = B^* A^*

and $c^* = c$ if $c$ is a constant.

If we choose $Q$ to be a Deselby distribution and minimise the KL divergence with respect to $\lambda_i$, we have, for all $i$,

\frac{\partial D_{KL}(P \| D_{\Delta\lambda})}{\partial \lambda_i} = 0

Solving this for $\lambda_i$ gives

\lambda_i = \frac{\Sigma(a_i P)}{\Sigma(P)} - \Delta_i \qquad (16)

where the operator $\Sigma$ has the effect of summing over a distribution (i.e. summing the coefficients of an operator expressed in reduced form on $\emptyset$).

This can be interpreted as saying that the Deselby distribution, $D_{\Delta\lambda}$, that minimises $D_{KL}(P \| D_{\Delta\lambda})$ is the one whose mean number of agents in state $i$ matches the mean of $P$, for all $i$.

This gives us values for $\lambda$ given $\Delta$. To calculate $\Delta$, notice that for the KL-divergence to be finite, the support of $P$ must be a subset of the support of $D_{\Delta\lambda}$. So if $\Delta_i$ takes a given value, the probability under $P$ that there are fewer than $\Delta_i$ agents in state $i$ must be zero. So, we can choose $\Delta_i$ to be the highest value for which there are definitely at least that many agents in state $i$ in $P$.
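As a small concrete illustration of this approximation step (our own sketch of the moment-matching rule described above; the function name is ours), given an arbitrary distribution over the number of agents in a single state, the matching Deselby parameters are $\Delta$, the largest count that is certain to be present (the minimum of the support), and $\lambda$, the mean minus $\Delta$.

    import numpy as np

    def deselby_fit(pmf):
        """Fit (delta, lam) to a distribution over agent counts by matching the
        guaranteed minimum count and the mean, as described in the text."""
        pmf = np.asarray(pmf, dtype=float)
        pmf = pmf / pmf.sum()
        support = np.nonzero(pmf > 0)[0]
        delta = int(support.min())                 # highest count certainly present
        mean = float(np.arange(len(pmf)) @ pmf)
        return delta, mean - delta

    # example: a distribution that is certain to contain at least 2 agents
    pmf = [0.0, 0.0, 0.1, 0.3, 0.4, 0.2]
    print(deselby_fit(pmf))                        # (2, 1.7): the mean is 3.7 and Δ = 2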

5 A practical algorithm for data assimilation

Using equations 15 and 16 we can perform a data assimilation cycle that consists of the following steps: take an initial state $D_{\Delta\lambda}$, integrate it forward in time, apply the set of observations, then finally approximate the result by the Deselby distribution that minimises the KL-divergence.

Since we’ve just observed $m_i$ agents in state $i$ for each observed state, we can define the $\Delta'_i$ of the approximation to be

\Delta'_i = m_i \qquad (17)

Given this, $\lambda'_i$ is given by

\lambda'_i = \frac{\Sigma\left(a_i\, \hat\Omega\, e^{Ht} D_{\Delta\lambda}\right)}{\Sigma\left(\hat\Omega\, e^{Ht} D_{\Delta\lambda}\right)} - \Delta'_i \qquad (18)

where $\hat\Omega$ denotes the product of observation operators from equation 15 and $D_{\Delta\lambda}$ is the Deselby ground state from the previous assimilation cycle.

To calculate this, we need an algorithm to calculate expressions of the form

\Sigma\left(A\, e^{Ht} D_{\Delta\lambda}\right)

for some operator $A$ (to increase numerical stability, this can first be transformed into an equivalent expression involving the identity operator, $I$). Expanding the exponential gives

\Sigma\left(A\, e^{Ht} D_{\Delta\lambda}\right) = \sum_{n=0}^{\infty} T_n \qquad (19)

where $T_n = \frac{t^n}{n!}\Sigma\left(A H^n D_{\Delta\lambda}\right)$. However, applying the full Hamiltonian $n$ times in this way would be prohibitively expensive, but since $H$ is the Hamiltonian, and the Hamiltonian preserves probability mass,

\Sigma(HX) = 0

for all $X$, and so we can replace $T_n$ by $T'_n$ in equation 19, where $A_0 = A$, $A_n = [A_{n-1}, H]$ and

T'_n = \frac{t^n}{n!}\Sigma\left(A_n D_{\Delta\lambda}\right) \qquad (20)

without affecting the value of the sum.

This is very convenient computationally because, although $H$ is in general very large, each commutator $A_n = [A_{n-1}, H]$ is considerably smaller, so we have a much more efficient way to calculate the terms of the expansion.

The computation can be made even more efficient by noting that

\Sigma\left(a^\dagger_i X\right) = \Sigma(X)

for all $X$ and all $i$. So when calculating each $A_n$ we can use the commutation relations to remove all creation operators by moving them to the left, leaving only annihilation operators.

In practice, we only need to calculate the first few terms in the expansion in 19, so we end up with

\Sigma\left(A\, e^{Ht} D_{\Delta\lambda}\right) \approx \sum_{n=0}^{N} \frac{t^n}{n!}\Sigma\left(A_n D_{\Delta\lambda}\right) \qquad (21)

The terms can be calculated consecutively using equation 20, with $A_0 = a_i\hat\Omega$ for the numerator in equation 18 and $A_0 = \hat\Omega$ for the denominator.

We then just need to multiply by $D_{\Delta\lambda}$ (i.e. the Deselby distribution from the previous assimilation cycle), convert to reduced form and sum the coefficients (in order to perform the $\Sigma$ operation).²

² We can also sum coefficients when working with a Deselby ground state, since the sum of the coefficients of a Deselby distribution, expressed as an operator on $\emptyset$, is 1.

Numerical experiments showed that, when calculating the terms in equation 21, a good estimate of the error obtained by truncating the series at the $N^{th}$ term can be calculated as

(22)

This can be calculated along with the terms of the expansion until the error estimate falls below some desired threshold.
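The stopping rule can be illustrated with a generic truncated-series evaluation in the finite matrix picture used in the earlier sketches (our own stand-in: it uses the magnitude of the latest term relative to the partial sum as the stopping criterion rather than the error estimate of equation 22, and it applies H directly instead of the commutator bookkeeping of equation 20).

    import numpy as np

    def birth_death_hamiltonian(b, d, n_max):
        H = np.zeros((n_max + 1, n_max + 1))
        for n in range(n_max + 1):
            if n < n_max:
                H[n + 1, n] += b * n
                H[n, n] -= b * n
            if n > 0:
                H[n - 1, n] += d * n
                H[n, n] -= d * n
        return H

    def series_functional(A, H, P, t, rel_tol=0.002, max_terms=200):
        """Evaluate sum_n (t^n/n!) 1ᵀ A Hⁿ P, a matrix stand-in for Σ(A e^{Ht} D),
        stopping when the latest term falls below rel_tol of the partial sum."""
        ones = np.ones(P.shape[0])
        w = P.copy()                               # w holds (t^n/n!) Hⁿ P
        total = ones @ (A @ w)
        for n in range(1, max_terms):
            w = (t / n) * (H @ w)
            term = ones @ (A @ w)
            total += term
            if abs(term) < rel_tol * abs(total):
                break
        return total

    n_max = 60
    H = birth_death_hamiltonian(0.15, 0.1, n_max)
    P = np.zeros(n_max + 1)
    P[5] = 1.0                                     # exactly 5 agents at t = 0
    A = np.diag(np.arange(n_max + 1, dtype=float)) # "count the agents", like a_i under Σ

    print(series_functional(A, H, P, t=0.5))       # ≈ 5 * exp((0.15 - 0.1) * 0.5)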

A final computational simplification can be made by noticing that when calculating the numerator of equation 18, observations far away from the $i^{th}$ grid-square will have very little influence on its posterior (depending on $t$ and the speed of information flow through the ABM).

This intuition is formalised in Appendix B, where we show the following. Let $S(A)$ be the set of all states operated on by operator $A$, and let $A^{(n)}$ denote the $n$-fold application of a commutation with $H$, i.e. $A^{(n)} = [A^{(n-1)}, H]$ with $A^{(0)} = A$. If the observations in $\hat\Omega$ can be partitioned into two sets, $\hat\Omega_1$ and $\hat\Omega_2$, in such a way that

S\!\left(\hat\Omega_1^{(n)}\right) \cap S\!\left(\hat\Omega_2^{(m)}\right) = \emptyset \quad \text{for all } n + m \le N

(where $\emptyset$ above is the empty set, not the ground state), then we can factorise out the effect of all observations in $\hat\Omega_2$ (to $N^{th}$-order accuracy) in both the numerator and denominator of equation 18, so that they cancel and we can effectively remove the observations in $\hat\Omega_2$ (in writing the above condition we have dropped operators that have no effect under the $\Sigma$ operation).

6 Spatial predator-prey example model

In order to provide a numerical demonstration of the above techniques, we simulated a 32x32 grid of agents. Each agent could be either a “predator” or a “prey”. At any time a prey had a propensity to move to an adjacent grid-square (i.e. up, down, left or right), to die, or to give birth to another prey on an adjacent grid-square. Predators also had propensities to move to adjacent grid-squares or die but, in the presence of a prey on an adjacent grid-square, they also had a propensity to eat the prey and reproduce into an adjacent grid-square. Perhaps surprisingly, even a model with such a simple set of behaviours exhibits quite complex emergent properties that are difficult to predict. The rate per unit time of each behaviour is shown in table 1.

Agent type    Description                          Rate per unit time
Prey          death                                0.1
              reproduction                         0.15
              movement                             1.0
Predator      death                                0.1
              predation/reproduction (per prey)    0.5
              movement                             1.0

Table 1: The rates of each behaviour in the predator-prey model

A non-probabilistic, forward simulation of the ABM was performed in order to create simulated observation data. The initial state of the simulation was constructed by looping over all grid-squares and choosing the number of predators in that square from a Poisson distribution with rate parameter and the number of prey from a Poisson distribution with . The model was then simulated forward in time and at time intervals of 0.5 units an observation of the state of the simulation was recorded. An observation was performed as follows: for each grid-square, with 0.02 probability, the number of prey was observed. If an observation was made, each prey in that grid-square was counted with a 0.9 probability. The same procedure was then repeated for predators.
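For readers who want to reproduce the experimental setup, here is a sketch of the forward simulation and observation process just described. It is our own illustrative reimplementation, not the published code (see the repository linked in the conclusions): it uses a small fixed time step as a first-order approximation of the continuous-time rates, wrap-around boundaries, a simplified choice of which adjacent prey is eaten, and placeholder initial Poisson intensities, since those values are not given here.

    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(0)
    SIZE, DT = 32, 0.01                      # 32x32 grid, small explicit time step

    # rates from Table 1
    PREY_DEATH, PREY_BIRTH, PREY_MOVE = 0.1, 0.15, 1.0
    PRED_DEATH, PRED_EAT, PRED_MOVE = 0.1, 0.5, 1.0

    # initial Poisson intensities per grid-square (placeholders, not the paper's values)
    LAMBDA_PRED, LAMBDA_PREY = 0.03, 0.06

    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def neighbours(x, y):
        return [((x + dx) % SIZE, (y + dy) % SIZE) for dx, dy in MOVES]

    def random_neighbour(x, y):
        return neighbours(x, y)[rng.integers(4)]

    def step(prey, pred):
        """Advance the ABM by one time step DT (first-order approximation)."""
        new_prey, new_pred = Counter(), Counter()
        for (x, y), n in prey.items():
            for _ in range(n):
                if rng.random() < PREY_DEATH * DT:
                    continue                                   # the prey dies
                if rng.random() < PREY_BIRTH * DT:
                    new_prey[random_neighbour(x, y)] += 1      # offspring next door
                pos = random_neighbour(x, y) if rng.random() < PREY_MOVE * DT else (x, y)
                new_prey[pos] += 1
        for (x, y), n in pred.items():
            for _ in range(n):
                if rng.random() < PRED_DEATH * DT:
                    continue                                   # the predator dies
                adjacent = [p for p in neighbours(x, y) if new_prey[p] > 0]
                n_adj_prey = sum(new_prey[p] for p in neighbours(x, y))
                if adjacent and rng.random() < PRED_EAT * n_adj_prey * DT:
                    new_prey[adjacent[rng.integers(len(adjacent))]] -= 1   # eat a prey
                    new_pred[random_neighbour(x, y)] += 1                  # and reproduce
                pos = random_neighbour(x, y) if rng.random() < PRED_MOVE * DT else (x, y)
                new_pred[pos] += 1
        return new_prey, new_pred

    def observe(agents, p_observe=0.02, p_detect=0.9):
        """Observe 2% of grid-squares; count each agent there with probability 0.9."""
        obs = {}
        for x in range(SIZE):
            for y in range(SIZE):
                if rng.random() < p_observe:
                    obs[(x, y)] = int(rng.binomial(agents[(x, y)], p_detect))
        return obs

    prey, pred = Counter(), Counter()
    for x in range(SIZE):
        for y in range(SIZE):
            prey[(x, y)] += int(rng.poisson(LAMBDA_PREY))
            pred[(x, y)] += int(rng.poisson(LAMBDA_PRED))

    observations = []
    for window in range(10):                       # observation windows of 0.5 time units
        for _ in range(int(0.5 / DT)):
            prey, pred = step(prey, pred)
        observations.append((observe(prey), observe(pred)))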

Once the observations were generated, a sequence of data assimilation cycles was performed as described in equations 17 and 18, with a starting state of the Deselby ground state, with one value of $\lambda$ for all states representing a predator and another for all states representing a prey. The terms in the expansion were calculated until the expected truncation error, calculated using equation 22, fell below 0.2% of the sum of the terms up to that point.

6.1 Results

The mutual information between the assimilated state and the real state (where the real state is represented by a Dirac delta distribution) was calculated at the end of each assimilation cycle. In order to distinguish between the information accumulated by the data assimilation cycles and the information contained in the current time’s observation and the prior at the start of the simulation, we also calculated a reference value equal to the mutual information between the real state and the posterior given only the initial prior and the current window’s observations.

Figure 1 shows the difference between the mutual information of the assimilated state and the reference value for each assimilation cycle of 10 separate simulations. The upward trend in this plot shows that the assimilation cycle is successfully accumulating information about the real state.

Figure 2 shows snapshots of the real state of the non-probabilistic simulation superimposed on the assimilated state, for three of the simulations after 64 assimilation cycles. Inspection of these snapshots seems to suggest that the mutual information tends to be higher when agents are arranged in clusters rather than being more scattered, which is not surprising. Also, when a grid-square containing an agent is observed but (because each agent is only detected with probability 0.9) the number of observed agents happens to be zero, this will reduce the mutual information.

Figure 1: Plot showing the increase of mutual information between the computed posterior distribution and the actual state for the first 64 assimilation windows of 10 simulations. The zero-information point is taken to be that of the posterior given the current window’s observations and the prior at the start of the simulation.
Figure 2: Posterior, assimilated probability (background colour) plotted against actual agent positions (dots) for the highest (top), average (middle) and lowest (bottom) mutual information (out of 10 simulations) after 64 assimilation cycles. Red dots are real prey positions, blue dots are real predator positions. The background colour shows the posterior probability: higher intensity of red corresponds to higher probability of prey, higher intensity of blue corresponds to higher probability of predator.

7 Previous work

Applying the methods of quantum field theory to other domains is not new; for example, Dodd and Ferguson[5] suggest applying it to epidemiology, O’Dwyer and Green[6] apply it to simulations of biodiversity, while Santos et al.[7] apply it to enzymatic reactions. Abarbanel[8] comes closest to our work in that he talks about quantum field theory in the context of data assimilation. However, to our knowledge, nobody has before developed quantum field theory to apply to such a general class of ABMs, nor developed it into an algorithm for data assimilation. Also, the various numerical strategies presented here that make the technique computationally tractable, including our presentation of the Deselby distribution, are, we believe, new.

8 Discussion and Conclusion

This paper presents some early results of a promising new technique in data assimilation for agent-based models. However, there’s plenty of work left to be done and many opportunities for further development. For example, more research into flexible ways of approximating the state, or computationally efficient ways of representing a state as a data structure, would be worthwhile. The example model presented here is very small and simple; the next step would be to apply this technique to larger and more complex models. In addition, in this paper we only consider agents with a discrete set of states; the extension to states including continuous variables is fairly straightforward and work on this is currently under way.

At present there are very few techniques that can be used to successfully perform data assimilation in agent-based models. We believe the results presented here provide a significant first step towards filling that gap and hope that it will form the foundation of a fruitful approach to the problem.

The code used to create the results in this paper can be found at https://github.com/deselby-research/ProbabilisticABM, where you’ll also find a library of code we’re developing in order to experiment with new techniques in this area.

References

  • [1] Eugenia Kalnay. Atmospheric modeling, data assimilation and predictability. Cambridge University Press, 2003.
  • [2] Alberto Carrassi, Marc Bocquet, Laurent Bertino, and Geir Evensen. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change, 9(5):e535, 2018.
  • [3] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, 2009.
  • [4] Kevin P Murphy. Machine learning: a probabilistic perspective. MIT Press, 2012.
  • [5] Peter J Dodd and Neil M Ferguson. A many-body field theory approach to stochastic models in population biology. PloS one, 4(9):e6855, 2009.
  • [6] James P O’Dwyer and Jessica L Green. Field theory for biogeography: a spatially explicit model for predicting patterns of biodiversity. Ecology letters, 13(1):87–95, 2010.
  • [7] Fernando AN Santos, Hermes Gadêlha, and Eamonn A Gaffney. Fock space, symbolic algebra, and analytical solutions for small stochastic systems. Physical Review E, 92(6):062714, 2015.
  • [8] Henry Abarbanel. Predicting the future: completing models of observed complex systems. Springer, 2013.

Appendix A: Binomial observation

The product of a Deselby distribution and a binomial distribution is given by

rearranging and dropping constants

(23)

expanding the product of falling factorials

Expressing this as an operator on

so

But, from the commutation relations, we have the identity

so

Let be an operator that has the properties

(24)
(25)

and

then

So

Appendix B: Factorisation of compound observations

Consider an expression of the form

where , and are operators, and is a Hamiltonian.

Due to the form of a Hamiltonian, $\Sigma(HX) = 0$ for all $X$, so

Where we introduce the notation

to denote the -fold application of a commutation.

So

Now let

so

So

But

so

So

So

where is a sequence of creation operators and is the Deselby ground state.

If we let $S(A)$ be the set of all states operated on by operator $A$, then if

(26)

where above is the empty set (not the ground state) then can easily be factorised into such that

But if consists only of annihilation operators (which we can arrange by using commutation relations to move all creation operators to the left and then removing them), then

but if equation 26 is satisfied then

irrespective of whether we strip the first term of creation operators, so, if 26 is satisfied

where we’ve removed the factorisation of as it doesn’t affect the sum.