Bayesian tracking and parameter learning for non-linear multiple target tracking models

10/08/2014 ∙ by Lan Jiang, et al. ∙ University of Cambridge University of Bristol 0

We propose a new Bayesian tracking and parameter learning algorithm for non-linear non-Gaussian multiple target tracking (MTT) models. We design a Markov chain Monte Carlo (MCMC) algorithm to sample from the posterior distribution of the target states, birth and death times, and association of observations to targets, which constitutes the solution to the tracking problem, as well as the model parameters. In the numerical section, we present performance comparisons with several competing techniques and demonstrate significant performance improvements in all cases.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

The multiple target tracking (MTT) problem is to infer, as accurately as possible, the states or tracks of multiple moving objects from noisy measurements. The problem is made difficult by the fact that the number of targets is unknown and changes over time due to the birth of new targets and the death of existing ones. Moreover, objects are occasionally undetected, false non-target generated measures (clutter) may be recorded and the association between the targets and the measurements is unknown.

Given observations recorded over a length of time, say from time to , our aim is to jointly infer the target tracks and the MTT model parameters. We adopt a Bayesian approach and our main contribution in this paper is a new Markov chain Monte Carlo (MCMC) algorithm to sample from the MTT posterior distribution, which is a trans-dimensional distribution with mixed continuous and discrete variables. The discrete variables are comprised of the number of targets, birth and death times, and association of observations to targets, while the continuous variables are individual target states and model parameters.

For a linear Gaussian MTT model (see Section V-A) an MCMC method for tracking, excluding parameter learning, was proposed in [1]

. This MCMC algorithm samples in a much smaller space than we have to since the continuous valued target states can be integrated out analytically; i.e. it amounts to sampling a probability mass function on a discrete space. (Their method is referred to as MCMC-DA hereinafter.) However, this model reduction cannot be done for a general non-linear and non-Gaussian MTT model, so the sampling space has to be enlarged to include the continuous state values of the targets. Despite this, our new algorithm is efficient in that it approaches the performance of MCMC-DA for the linear Gaussian MTT model, which will be demonstrated in the numerical section.

An MCMC algorithm for tracking in a non-linear non-Gaussian MTT model, but excluding parameter learning, was also recently proposed by [2]. Their method follows the MCMC-DA technique of [1]

closely. Although the likelihood of the non-linear non-Gaussian MTT model is not available when the continuous valued states of the targets are integrated out, an unbiased estimate of it can be obtained using a particle filter. The Metropolis-Hastings algorithm can indeed be applied as long as the likelihood of the Bayesian posterior can be estimated in an unbiased fashion and this has been the subject of many recent papers in Bayesian computation; e.g. see

[3, 4, 5]. This property is exploited in [2] and their MCMC sampler for tracking is essentially the MCMC-DA method combined with an unbiased estimate of the likelihood of the reduced model (i.e. continuous states integrated out) which is given by the particle filter. (In the literature on Bayesian computation, this algorithm is known as the Particle Marginal Metropolis Hastings (PMMH); see [3] for an extensive discussion in a non-MTT context.) Although appealing because it is simple to implement, the method in [2]

can result in an inefficient sampler as we show when comparing with our method. This is because the likelihood estimate has a high variance and this will reduce the overall average acceptance probability of the algorithm. When static parameters are taken into account, which

[2] did not do, the variance problem becomes far worse as many products that form the MTT likelihood would have to be simultaneously unbiasedly estimated for the acceptance probability of every proposed parameter change. An elegant solution to this problem is the Particle Gibbs (PGibbs) algorithm of [3] for parameter learning in state-space models; we extend this technique to the MTT model.

Our MCMC algorithm for tracking and parameter learning is a batch method and is suitable for applications where real-time tracking is not essential; e.g. the recent surge in the use of tracking in Single Molecule Fluorescence Microscopy [6, 7]. However, our technique can be incorporated into existing online trackers (e.g., the Multiple Hypotheses Tracking (MHT) algorithm [8], the Joint Probabilistic Data Association Filter (JPDAF) [9], and the Probability Hypothesis Density (PHD) filter [10, 11]) to correct past tracking errors in light of new observations as well as for learning the parameters. There are numerous ways to effect this, for example, by applying MCMC to tracks within a fixed window of time, which is a technique frequently used in the particle filtering literature for online inference in state-space models. See [12, 13] for more discussions on this. Note that, on-line trackers mentioned above normally ignore parameter learning problem with a few exceptions discussed in [14] where an online maximum likelihood method was proposed for calibrating linear Gaussian MTT model.

Additional contributions of this paper are several interesting comparisons with existing methods. (i) To quantify the loss of efficiency of our new algorithm compared to MCMC-DA [1] that works on a reduced sampling space, we compare them directly for linear Gaussian MTT model, and show that we do indeed perform almost comparably to MCMC-DA. (ii) A comparison with [2] is given to show that our technique outperforms theirs with much less particles. (iii) To demonstrate improvements over online tracking, we present a comparison with the MHT algorithm [8]. As mentioned before, our technique is not a competitor to online tracking but can be incorporated into such trackers to correct past errors. (iv) We compare our parameter estimates with those obtained by the approximate maximum likelihood technique in [15] which is built on the Poisson approximation of the likelihood. While ours is Bayesian, there should be, at least, agreement between the maximum likelihood estimate and the mode of the posterior. We show that some parameter estimates obtained by [15] are significantly biased.

The remainder of the paper is organised as follows. In Section II, we describe the MTT model and formulate the Bayesian target tracking and static parameter estimation problems for the MTT model. In Section III, we propose a new MCMC tracking algorithm that combines a novel extension of MCMC-DA algorithm to non-linear MTT models with a particle Gibbs move for effectively refreshing the samples for target tracks. In Section IV, we show how to do Bayesian static parameter estimation based on the MCMC tracking algorithm presented in Section III. Numerical examples are shown in Section V for the comparisons mentioned above.

Ii Multiple target tracking model

The hidden Markov model (HMM), or the state-space model (SSM), is a class of models commonly used for modelling the physical dynamics of a

single target. In an HMM, a latent discrete-time Markov process is observed through a process of observations such that


where , , and

are the dimensions of the state and observation. In this paper, a random variable (r.v.) is denoted by a capital letter, while its realisation is denoted by a small case. We call

, ,

the initial, transition, and measurement densities respectively (resp.), and they are parametrised by a real valued vector


In an MTT model, the state and the observation at each time are the random finite sets (we use bold letters to denote sets):

Each element of is the state of an individual target. The number of targets under surveillance changes over time due to the death of existing targets and the birth of new targets. Independently from other targets, a target survives to the next time with survival probability and its state evolves according to the transition density , otherwise it ‘dies’. In addition to the surviving targets, new targets are ‘born’ from a Poisson process with density and each of their states is initialised by sampling from the initial density . The hidden states of the new born targets and surviving targets from time make up . We assume that at time there are only new born targets, i.e. no surviving targets from the past. Independently from other targets, each target in is detected and generates an observation according to observation density with probability . In addition to observations generated from detected targets, false measurements (clutter) can appear from a Poisson process with the density

and are uniformly distributed over

. We denote by the superposition of clutter and measurements of the detected targets.

Ii-a The law of MTT model

In the following, we give a description of the generative model of the MTT problem, where are treated as ordered sets for convenience. A series of r.v.’s are now defined to give a precise characterisation of the MTT model. Let be a vector of ’s and ’s where ’s indicate survivals and ’s indicate deaths of targets from time . For ,

Denote the number of surviving targets at time , and the number of ‘birth’ at time . We have

At time , the surviving targets from time are re-labeled as , and the newly born targets are denoted as (according to certain numbering rule specified by users as will be addressed shortly). The order of the surviving targets at time is determined by their ancestor order at time . Specifically, we define the ancestor vector for ,

Note that denotes the ancestor of target from time , i.e., evolves to for . Next, we define to be a vector showing the target to measurement association at time . For ,

Denote the number of detected targets at time , and the number of false measurements at time . We have

where denotes the cardinality of the set. Sampling from the prior of , amounts to first sampling a binary detection vector whose element is an independent and identically distributed (i.i.d.) Bernoulli r.v. with success parameter (to decide which targets are detected, i.e, indices of non-zero entries in ), then sample a association vector to determine the association between detected targets and observations uniformly from all -permutations of , i.e, with probability (to decide specific values for non-zeros entires of ).

The main difficulty in the MTT problem is that we do not know birth-death times of targets, whether they are detected or not, and which measurement point in is associated to which detected target in . Now we define data association


to be the collection of the above mentioned unknown r.v.’s at time , and


be the vector of the MTT model parameters. Assuming survival and detection probabilities are state independent, we can write down the MTT model described literally above as


Here is used to denote a finite sequence ,

denotes the probability mass function of the Poisson distribution with mean

, is the volume (the Lebesgue measure) of , and is the indicator function of the numbering rule for the new born targets (e.g, if new-borns are ordered in an ascending order of the first component, then is the set of states satisfying ).111

is introduced here to avoid the labelling ambiguity of new born targets. The labelling ambiguity also arises in other areas, e.g. Bayesian inference of mixture distributions; see

[16] for more details.
So the joint density of all the variables of the MTT is

Finally, the marginal likelihood of the data is given by

Ii-B Two equivalent mathematical descriptions for MTT

Note that, conditional on , may be regarded as a collection of HMMs (with different starting and ending times and possible missing observations) and observations which are not relevant to any of these models. In the MTT terminology, each HMM corresponds to a target, starting and ending times of HMMs correspond to birth and death times of those targets, and missing and irrelevant observations correspond to mis-detections and clutter.

Note that, each target has a distinct label where , which is determined by its birth time and the numbering of its initial state at the birth time (dependent on the numbering rule). Let and be the birth and death time of the target with label , and denote its trajectory as

where is the -th state of target ; is the observation generated by provided detection, otherwise we take ; is its life span. In particular, form a HMM with initial and state transition densities and and observation density as in (1) with the convention that to handle mis-detections. In addition, we define that contains all irrelevant observations during time with .

To recover from , we also need to know which contains222We can write where is a vector with being the index of in (the collection of all observations at its appearing time ) if , otherwise . the information of the birth time, the death time and the indices of measurements assigned to target for . is defined for clutter so that it contains all clutter’s appearance times and their corresponding measurement indices. The point we want to make here is that given ordering rule for new born targets, we have a one-to-one mapping between the two equivalent descriptions of the MTT model, i.e.


In Figure 1, we give a realisation of the MTT model to illustrate the r.v.’s introduced in both descriptions and show the correspondence between these two descriptions. It can be seen that each target (HMM) evolves and generates observations independently, with the only dependancy introduced by the target labels dependent on the numbering rule.

[name=X11, style=Cdet] & & [name=X21, style=Cmisdet] & & [name=X31, style=Cdet] & & [name=X41, style=Cmisdet]

& [name = Y11, mnode=r] & & & & [name=Y31, mnode=r]

[name=X12, style=Cdet] & & [name=X22, style=Cdet] & & [name=X32, style=Cdet] & & [name=X42, style=Cdet]

& [name=Y12, mnode=r] & & [name=Y22, mnode=r] & & [name=Y32, mnode=r] & & [name=Y42, mnode=r]

[name=X13, style=Cmisdet] & & [name=X23, style=Cdet] & &[name=X43, style=Cmisdet]

& [name=Y13, mnode=r] & & [name=Y23, mnode=r] & & [name=Y33, mnode=r]

Fig. 1: A realisation from the MTT model: states of a targets are connected with arrows and with their observations when detected. Undetected targets are coloured grey, and false measurements are in dashed lines. For this example,
, , ,
, , ;
, , ,
, , ;
, , ;   , , .

Although it is more straightforward to write down the MTT probability model in terms of the first description, see (4)-(6), the second description here is indispensable for our MCMC moves where we first propose change to for some target or a set of targets, then we get the unique based on the equivalence of these two descriptions.

Ii-C Bayesian tracking and parameter estimation for MTT

There are two main problems we are interested in this paper: assuming is known, the first one is to estimate the data association and the states of the targets given the observations . This problem is formalised as estimating the posterior distribution


where serves as a normalising constant not depending on . We present a novel MCMC method which samples from the posterior distribution (8) for non-linear MTT models in Section III.

The second problem we are interested is the static parameter estimation problem, that is estimating from the data . We regard as a r.v. taking values in with a prior density , and our goal is to estimate the posterior distribution of given data, that is


which is intractable for MTT models in general. In Section IV, we extend our MCMC tracking method in Section III to get samples from the joint posterior distribution .

Iii Tracking with known parameters

In this section we assume the parameter of the MTT model is known and we want to estimate the posterior density defined in (8).

For a linear Gaussian MTT model, one can consider the following factorisation of the posterior density

and concentrate on sampling from , as , the likelihood of the data given the data association, can be calculated exactly. Similarly, once we have samples for , can be calculated exactly for every sample of .333Strictly speaking, the closed forms are available when we ignore the ordering rule here. This is indeed the case for the MCMC-DA algorithm of [1], which is essentially an MCMC algorithm for sampling from . However, when the MTT model is non-linear, which is the case in this paper, MCMC-DA is not applicable since is not available. [2] proposed to circumvent this by using an unbiased estimator in place of , which is obtained by running a particle filter for each target. This is essentially the PMMH algorithm of [3] applied to the MTT problem. However, this strategy mixes slowly due to the variance of the estimate of , especially when the number of particles is small, which is demonstrated in Section V-A. It is also not efficient since is only a by-product of the PMMH algorithm, and not used to propose the change of data association . In this paper, we first design an efficient sampler to change and together based on the old samples to avoid the variance problem encountered in the PMMH when the particle number is small. Then, we refresh by applying the particle Gibbs (PGibbs) algorithm proposed in [3] to accelerate mixing.

This section documents our MCMC algorithm for sampling jointly from (8). Before going into the details, it will be useful to have an insight into the distribution in (8). Notice that the dimension of is proportional to which is determined by the data association . Therefore, the posterior distribution in (8) is trans-dimensional and the standard Metropolis-Hastings (MH) algorithm is not applicable for this distribution.

A general method for sampling from a trans-dimensional distribution is the reversible jump MCMC (RJ-MCMC) algorithm of [17]. Assume we have the target distribution where is discrete, and is a vector with dimension that changes with . Here, can be considered as a model index, whose dimension is not necessarily different from for . To move a sample from to a subspace with a higher dimension, we can first propose , where is the model index such that , and are extra continuous r.v.’s such that (dimension matching). Finally the candidate sample is given by a bijection: . For the reverse move, with probability propose to move to subspace , and use the bijection to get . The acceptance probability for the proposed sample is where


where the rightmost term is the Jacobian of . The acceptance ratio of the reverse move is


In the MTT model, each data association corresponds to a model index , corresponds to the continuous variable , and corresponds to . From this perspective, we can devise a RJ-MCMC algorithm for (8) which has two main parts: (i) MCMC moves that are designed to explore the data association , followed by (ii) an MCMC move that explores the continuous states . While the later move aims to explore only, we also need to adapt to respect the adopted ordering rule of new born targets. We present a single iteration of the proposed MTT algorithm in Algorithm 1 referred to as MCMC-MTT.

Input: Current sample , data , parameter , number of inner loops ,
Output: Updated sample
for  do
      Update by one of the MCMC moves in Algorithm 2 to explore the data association
for  do
      Update by an MCMC move (Algorithm 3) to explore the continuous state space
Algorithm 1 MCMC-MTT

Algorithm 1 can be viewed as an extension of MCMC-DA [1] to the non-linear non-Gaussian case by incorporating into the sampling space. Designing the MCMC kernel for the first loop is demanding and we reserve Section III-A for the description of this kernel. The second loop uses a PGibbs kernel to refresh the samples of conditioned on the data association, which is an important factor for fast mixing when we enlarge our sampling space. The PGibbs step is standard since given the data association, the MTT model can be decoupled into a set of HMMs (as emphasised by the alternative description introduced in Section II-B).

We have found that Algorithm 1 can work properly with any initialisation for , even with the all clutter case, i.e. , hence , and , which is a convenient choice when no prior information is available. We generally take an order of magnitude larger than ( typically) as the second loop takes more time than the first one.

Iii-a MCMC to explore the data association

Algorithm 2 proposes a new data association with one of the following six moves at random:

  1. birth move: to create a new target and its trajectory;

  2. death move: to randomly delete an existing target;

  3. extension move: to randomly extend an existing track;

  4. reduction move: to randomly reduce an existing track;

  5. state move: to randomly modify the links between state variables at successive times;

  6. measurement move: to randomly modify the links between state variables and observation variables.

The first four of the moves change the dimension of , and hence they will be called trans-dimensional moves where RJ-MCMC needs to be applied. Specifically, the dimension matching here is done by introducing new states or deleting existing ones, and the bijections are such that the Jacobian in (10) is always . Reversibility is ensured by pairing the birth (resp. extension) move with the death (resp. reduction) move. The last two moves, i.e., the state move and the measurement move, leave the dimension of unchanged, so called as dimension-invariant moves, and a normal MH step can be applied. We will see later that these two moves are self-reversible, i.e., they are paired with themselves. In the following subsection, we describe the essence of each move included in Algorithm 2.

Input: Current sample , data , parameter , window parameter
Output: Updated sample
Sample randomly. switch  do
      case  do propose by the birth move  case  do propose by the death move  case  do propose by the extension move  case  do propose by the reduction move  case  do propose by the state move  case  do propose by the measurement move 
Calculate the MCMC acceptance probability for move
(See (12), (13), (15), (16), (11) for the calcatuion of ). Change with probability , otherwise reject the proposal.
Algorithm 2 MCMC moves to update data association

Iii-A1 Trans-dimensional moves

Two pairs of moves (birth/death, extension/reduction) are designed to jump between different dimensions for .

Birth and death moves

Assume the current sample of our MCMC algorithm for implies existing targets. We propose a new target with randomly chosen birth and death times and randomly assigned observations from the clutter, i.e. observations unassigned to any of the existing targets. We give a sketch of the birth move here.

We first propose a random birth time and sample death time based on (note, can be changed later during this birth process) for the new target, then extend the trajectory of the target forward in time in a recursive way until . Each extension step proceeds as follows. Assume the latest observation we assigned to the new target is observed at time . (For the first iteration, , takes the mean of the initial position.) We define the time block where

given a user defined probability (close to 1). The logic behind this is that within the next measurement would appear with a priori probability larger than . Among all the unassigned observations in this time block , we form a set of candidate observations whose distance to (which depends on both time and space) is less than a certain threshold set by users. Note should be big enough so that block contains most possible candidates. (i) With probability , we decide that the next observation to be assigned to the new target is located in and choose it randomly from the set of candidate observations with probability inversely proportional to the distance to , provided that the set is non-empty. If the set of candidate measurements is empty, however, we terminate the target either at if , or at some random time in the block (proposed by taking into account) otherwise. The termination time is the final proposed death time for the target. (ii) If (i) is not performed, i.e. with probability , we decide that the target is not detected during the whole block . Then we recommence the process above from the end of the block, unless . We refer to this iterative observation assignment procedure as grouping measurement step, at the end of which, we obtain containing the birth time, the death time and measurement indices of the new born target, and we denote the probability induced in this step. The new target’s states

are proposed by running unscented Kalman filter (UKF)

[18] followed by backwards sampling [19], which is essentially a Gaussian proposal for the target states (see Appendix A for more on UKF and backwards sampling). Denote the probability density induced in this step. The sampled hidden states will serve as dimension matching parameters of the RJ-MCMC algorithm. Given the set , new data association can be obtained deterministically by the one-to-one mapping (7) mentioned in section II-B according to the ordering rule. Finally, we get new states , where is to insert into at the corresponding positions indicated by . The resulting Jacobian is .

The death move, which is the reverse move of the birth move, is done by randomly deleting one of the existing tracks. The acceptance ratio of the birth move is


whose reciprocal is the acceptance ratio for the corresponding death move. Here, is the probability, induced by the death move. Note that, depends on and the distance between the last assigned observation of the target and all clutter in the next few time steps. Thus, in some sense, the move exploits a pseudo-posterior distribution of the life time of the target and the target-observation assignments given the unassigned data points.

Compared to the birth move in [1], our birth move allows any number of consecutive mis-detections (note the parameter ) and improves the efficiency of the target-observation assignments. Also, our birth move proposes the continuous state components of the new born target which are integrated out in [1].

Extension and reduction moves

In this move, we choose one of the existing targets, and extend its track either forwards or backwards in time. The idea of forward extension is outlined as follows, and the backward one can be executed in a similar way. First decide how long we will extend the target based on , and decide the detection at each time for the extended part, based on and the number of clutter at that time. To extend from time to , if the target is detected, we assign to it an observation chosen from the clutter at time with a probability inversely proportional to its distance to the predicted (prior) mean of the state at . (Here, we mean by the ‘distance’ between and .) Then we calculate the Gaussian approximation of the state posterior by applying the unscented transformation [18] using the chosen observation. The forward extension step is repeated forwards in time until we reach the extension length. Denote the probability induced here, where consists of the new death time and the observation information of the extended part. Then, backwards sample the extended part states by Gaussian proposals denoted by that is calculated based on the forward filtering density (the Gaussian approximation of the posteriors) used in proposing . Finally, and can be obtained similarly to the birth move based on the one-to-one mapping in (7), and the Jacobian term in (10) is .

The reduction move paired with the extension move is implemented as follows. We randomly choose target among the existing targets, then choose the reduction type and the reduction time point, either to discard part of the track, or to discard its part. Denote the probability induced here. The acceptance ratio of the extension move is


whose reciprocal is the acceptance ratio for the corresponding reduction move.

Compared to the extension/reduction move in [1], our extension/reduction move is done in both ways instead of merely forward extension. Also the extension move makes use of the hidden states to add in measurements instead of using the last assigned measurement. Again, the continuous state variables are proposed here instead of being marginalised as in [1].

Iii-A2 Dimension invariant moves

These moves leave the dimension of invariant and are dedicated to changing the links between the existing target states at successive times (state move) and the assignments between the target states and measurements (measurement move). The target state values are also modified in order to increase the acceptance rate. These two moves are specially designed here, where the state move can be considered as certain combinations of the split/merge and switch moves in [1], while the measurement move corresponds to the update move in [1], but with more choice of modification to the observation assignment. The diversity of the modification choice is enhanced by introducing the state variables into the sampling space.

(a) state move
(b) measurement move
Fig. 2: Graphical illustration of the state move and the measurement move
State Move

In this move, we randomly choose time and locally change , i.e. the links between and . Figure 1(a) is given to illustrate the move. Assume we would like to change the descendant link of . When has descendant , we can propose to change its descendant to which originally evolved from (sub-moves in Figure 1(a)), or to link to the initial state of a target born at time (sub-moves ), or to delete the link (sub-move ). Sub-moves have different arrangements for the old descendant , who becomes clutter in sub-move , or the descendant of in sub-move (i.e. switches its ancestor with ), or the new descendant of in sub-move . Sub-moves differ in a similar way in terms of the old descendant arrangement. When has no descendant, it can be merged with a new-born target at time by linking to its initial state (sub-move ), or steal another surviving target’s descendant (sub-move ). Reversibility is ensured by paring sub-moves and , and , and the remaining ones with themselves444For the reversible move of , we choose to have the descendant link changed. For the other moves, we still choose .. Note that, the new link, e.g, the one between and in sub-move , means and all its descendants together with their observations will become ’s descendants and the corresponding observations in the latter time. Essentially, by changing the step described above proposes for each target in set whose state links are modified. Denote the probability induced here, where .

Note that, when the state noise is small, the state move will mostly be rejected if we only modify the state links. Thus, local modification of is necessary to get state moves accepted. For this reason, we propose new , where is the parts of within the time window centred at with window size parameter where


by Gaussian proposals, i.e., running UKF and backward sampling for each target conditioned on its observations in and its states right before and after the window at times and resp., if they exist. Denote the probability density of proposing new local target states. After updating for each