New results on particle filters with adaptive number of particles

11/04/2019, by Victor Elvira et al.

In this paper, we present new results on particle filters with adaptive number of particles. First, we analyze a method based on generating fictitious observations from an approximated predictive distribution of the observations, where the generated observations are compared to the actual observation at each time step. We show how the number of fictitious observations is related to the number of moments matched between the approximated and the true predictive probability density function. Then, we introduce a new statistic for deciding how to adapt the number of particles in an online manner and without the need of generating fictitious observations. Finally, we provide a theoretical analysis of the convergence of a general class of particle filters with adaptive number of particles.


I Introduction

In science and engineering, many problems are studied using dynamic probabilistic models, which mathematically describe the evolution of hidden states and their relation to observations that are acquired sequentially. In many of these problems, the objective is to estimate sequentially the posterior distribution of the hidden states. A methodology that has gained considerable popularity in the last two and a half decades is particle filtering (also known as sequential Monte Carlo) [1, 2]. This is a Monte Carlo methodology that approximates the distributions of interest by means of random (weighted) samples.

Arguably, a key parameter of particle filters (PFs) is the number of particles used. A larger number of particles improves the approximation of the filter but also increases the computational complexity. However, it is impossible to know a priori the number of particles needed to achieve a desired accuracy in the estimated parameters and distributions.

I-A A brief review of the literature

Until the publication of [3], very few papers had considered the selection/adaptation of the number of particles. In [3], a methodology was introduced to address this problem with the goal of adapting the number of particles in real time. The method is based on a rigorous mathematical analysis. Other efforts toward the same goal include the use of a Kullback-Leibler divergence-based approximation error [4], where the divergence was defined between the distribution of the PF and a discrete approximation of the true distribution computed on a predefined grid. The idea from [4] was further explored in [5]. A heuristic approach based on the effective sample size was proposed in [6], [7, Chapter 4]. A disadvantage of using the effective sample size is that once a PF loses track of the hidden state, the effective sample size does not provide information for adjusting the number of particles [8]. See [9] for other problems with the effective sample size.

A theoretically-based approach for selecting the number of particles was reported in [10], where the Feynman-Kac framework was invoked [11]. In [12], an autoregressive model for the variance of the estimators produced by the PF was employed. Both methods operate only in batch mode. In a group of papers on alive PFs, the number of particles is adaptive and based on sampling schemes that ensure that a predefined number of particles have non-zero weights [13, 14, 15]. In [16], a fixed number of particles is adaptively allocated to several candidate models according to their performance. In [17], particle sets of the same size are generated until an estimation criterion for their acceptance is met.

I-B A summary of the method proposed in [3]

In [3], we introduced a methodology for assessing the convergence of PFs. The proposed method works online and is both model- and filter-independent. It is based on simulating fictitious observations from the one-step-ahead predictive distribution of the observations approximated by the PF, and comparing them with the actual observation available at each time step. In the case of one-dimensional observations, a statistic is constructed that simply counts the number of fictitious observations smaller than the actual observation. We show that when the filter has converged, the distribution of this statistic is uniform on a discrete support. We propose an algorithm for statistically assessing the uniformity of the statistic and, based on the outcome, modifying the number of particles. The method is supported by rigorous theory regarding the particle approximation of the predictive distribution of the observations and the convergence of the distribution of the statistic.

I-C Contributions

In this paper, we describe a generic framework of block-adaptive particle filters based on the philosophy from [3]. The main idea rests on a specific statistic, denoted a_t, which is computed at each time step by comparing the received observation with the predictive distribution of the observations approximated by the filter. Within this framework, we propose a novel algorithm that assesses convergence based on the correlation of the statistic a_t (we presented this algorithm in the conference paper [18]). Moreover, we propose a novel algorithm for assessing the quality of the approximation and adapting the number of particles. Unlike the algorithm from [3], it is based on a new statistic, denoted r_t, whose computation does not require the generation of fictitious observations. We analyze the choice of the key parameters of the adaptive framework, namely the number of fictitious observations, K, and the number of time steps in each window (the window width), W. We provide several theoretical results related to these parameters that not only justify the robustness of the method but also point to adequate choices of these parameters. Then, we analyze the convergence of the block-adaptive framework when the number of particles is increased. We show that, for a sufficiently large window, the errors of previous windows (with fewer particles) are forgotten, and that the convergence rate of the approximation depends on the current number of particles.

I-D Organization of the paper

In the next section, we briefly describe particle filtering as a sequential Monte Carlo methodology and we introduce our notation. In Section III, we propose a general online scheme for selecting the number of particles and for adapting the sizes of the observed data blocks for computing statistics. In the following section, we introduce the new statistic for assessing the performance of a PF. We provide convergence results in Section V. In the last two sections, we present results of numerical experiments and our conclusions, respectively.

II Particle filtering

We consider Markov dynamic systems whose state-space models are described by

x_0 ∼ p(x_0),   (1)
x_t ∼ p(x_t | x_{t-1}),  t = 1, 2, …,   (2)
y_t ∼ p(y_t | x_t),  t = 1, 2, …,   (3)

where

  • t denotes discrete time;

  • x_t is a d_x-dimensional (random) hidden process (state) at time t, where x_t ∈ R^{d_x};

  • p(x_0) is the a priori pdf of the state;

  • p(x_t | x_{t-1}) is the conditional density of x_t given x_{t-1};

  • y_t is a d_y-dimensional observation vector at time t, where y_t ∈ R^{d_y}, and is assumed conditionally independent of all the other observations given x_t;

  • p(y_t | x_t) is the conditional pdf of y_t given x_t. It is often referred to as the likelihood of x_t, when it is viewed as a function of x_t given y_t.

Based on the model and the assumptions made, we want to estimate, in a recursive manner, the posterior probability distributions p(x_t | y_{1:t}), t = 1, 2, …. We can write

p(x_t | y_{1:t}) ∝ p(y_t | x_t) ∫ p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1},

where we see how p(x_t | y_{1:t}) is related to its counterpart at the previous time instant, p(x_{t-1} | y_{1:t-1}).

We represent the filtering and the predictive posterior pdfs of the state by

π_t(x_t) := p(x_t | y_{1:t})  and  ξ_t(x_t) := p(x_t | y_{1:t-1}).   (4)

The object that plays a central role in our study is the predictive pdf of the observations, p(y_t | y_{1:t-1}), with its probability measure given by

μ_t(dy_t) := p(y_t | y_{1:t-1}) dy_t.

It is well known that the predictive pdf is instrumental for model inference [19, 20, 21, 22].

The main idea of particle filtering is to estimate sequentially the probability measures of interest from the observations y_{1:t}. The basic method for accomplishing this is known as the bootstrap filter (BF) [23]. The BF implements three steps: 1) drawing particles x̃_t^(m), m = 1, …, M, with M being the number of particles, from the conditional pdf p(x_t | x_{t-1}^(m)), where the conditioning state takes the values of previously drawn particles; 2) computation of the normalized importance weights of the particles, w_t^(m) ∝ p(y_t | x̃_t^(m)); and 3) resampling with replacement, with probabilities w_t^(m), from the set {x̃_t^(m)}_{m=1,…,M} [24], where the resampling is repeated M times.
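To make these steps concrete, here is a minimal sketch of one BF step in Python. The transition sampler and log-likelihood are assumed, user-supplied functions, and all names are illustrative rather than part of the paper:

```python
import numpy as np

def bpf_step(particles, y, propagate, loglik, rng):
    """One bootstrap-filter step: propagate, weight, resample.

    `propagate(particles, rng)` draws from p(x_t | x_{t-1}) and
    `loglik(y, particles)` evaluates log p(y_t | x_t); both are
    model-specific functions assumed to be supplied by the user.
    """
    M = len(particles)
    # 1) Draw new particles from the transition density.
    proposed = propagate(particles, rng)
    # 2) Compute normalized importance weights from the likelihood.
    logw = loglik(y, proposed)
    logw -= logw.max()            # stabilize before exponentiation
    w = np.exp(logw)
    w /= w.sum()
    # 3) Resample with replacement, M times, with probabilities w.
    idx = rng.choice(M, size=M, p=w)
    # Return the resampled set plus the weighted set (the latter is
    # what the predictive approximations below are built from).
    return proposed[idx], proposed, w
```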

Once the particles and their weights are computed, we can obtain approximations of various probability measures. The filtering measure can be approximated as

π_t^M(dx_t) = (1/M) ∑_{m=1}^M δ_{x_t^(m)}(dx_t),   (5)

where δ_x represents the Dirac delta measure at x. Further, given {x_{t-1}^(m)}_{m=1,…,M}, the predictive pdfs of x_t and of y_t, and the corresponding predictive observation measure, are approximated by

ξ_t^M(x_t) = (1/M) ∑_{m=1}^M p(x_t | x_{t-1}^(m)),   (6)
p^M(y_t | y_{1:t-1}) = (1/M) ∑_{m=1}^M p(y_t | x̃_t^(m)),   (7)
μ_t^M(dy_t) = p^M(y_t | y_{1:t-1}) dy_t.   (8)

III General online selection of the number of particles: block adaptation

  1. [Initialization]

    1. Initialize the particles and the weights of the filter as x_0^(m) ∼ p(x_0), m = 1, …, M_0, w_0^(m) = 1/M_0, m = 1, …, M_0,

    and set t = 1, n = 1.

  2. [For t = 1, 2, …, T]

    1. Bootstrap particle filter:

      • Resample M_t samples of {x_{t-1}^(m)} with weights {w_{t-1}^(m)} to obtain {x̄_{t-1}^(m)}_{m=1,…,M_t}.

      • Propagate x̃_t^(m) ∼ p(x_t | x̄_{t-1}^(m)), m = 1, …, M_t.

      • Compute the non-normalized weights w̃_t^(m) = p(y_t | x̃_t^(m)).

      • Normalize the weights to obtain w_t^(m), m = 1, …, M_t.

    2. Fictitious observations:

      • Draw ỹ_t^(k) ∼ p^{M_t}(y_t | y_{1:t-1}), k = 1, …, K.

      • Compute a_t, i.e., the position of y_t within the set of ordered fictitious observations ỹ_t^(1), …, ỹ_t^(K).

    3. If t = nW (assessment of convergence):

      • With some specific algorithm from Section III, analyze the sequence {a_j}_{j=t-W+1,…,t}.

      • Increase/decrease/keep fixed the number of particles M_{t+1}.

      • Set n = n + 1.

    4. If t < T, set t = t + 1 and go to Step 2. Otherwise, end.

TABLE I: General algorithm for adapting the number of particles (on a BPF)

The generic block-adaptive method for selecting the number of particles is summarized in Table I. We implement it on the standard BPF, but the methodology is the same for any other PF. In Step 1(a), the filter is initialized with M_0 particles. The filter operates at each time step in a standard manner, as described in Step 2(a). However, the first modification w.r.t. the BPF comes in Step 2(b), where K fictitious observations ỹ_t^(1), …, ỹ_t^(K) are simulated from the approximate predictive distribution of the observations, p^{M_t}(y_t | y_{1:t-1}) (see [3, Section IV-A]). These fictitious observations are used to compute the statistic a_t, where a_t is the position of the actual observation y_t within the set of ordered fictitious observations.

The algorithm works with windows of size W; at the end of each window (Step 2(c)), the sequence of statistics is processed to assess the convergence of the filter. The number of particles is then adapted (increased, decreased, or kept constant) based on the assessment.
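As an illustration of Step 2(b), the following sketch computes a_t for one-dimensional observations; the observation sampler is an assumed, model-specific input:

```python
import numpy as np

def fictitious_stat(y, particles, w, K, sample_obs, rng):
    """Statistic a_t in {0, ..., K} from K fictitious observations.

    `sample_obs(x, rng)` is a user-supplied (assumed) sampler for the
    observation model y_t ~ p(y_t | x_t); scalar observations assumed.
    """
    # Select K particles according to their weights ...
    idx = rng.choice(len(particles), size=K, p=w)
    # ... and simulate one fictitious observation per selected particle,
    # which yields K draws from the approximate predictive pdf (7).
    y_fict = np.array([sample_obs(particles[i], rng) for i in idx])
    # a_t = number of fictitious observations below the actual one,
    # i.e., the position of y within the ordered fictitious sample.
    return int(np.sum(y_fict < y))
```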

Under the assumption of no approximation error in the observation predictive pdf, i.e., with p^M(y_t | y_{1:t-1}) = p(y_t | y_{1:t-1}), the fictitious observations are drawn from the same pdf as the actual observation y_t. In that case, the statistic does not depend on the filter approximation, and we write simply a_t. Two preliminary theoretical results about the statistic a_t are proved in [3]:

Proposition 1.

If ỹ^(1), …, ỹ^(K), y are i.i.d. samples from a common continuous (but otherwise arbitrary) probability distribution, then the pmf of the r.v. a_t is

P(a_t = j) = 1/(K + 1),  j = 0, 1, …, K.   (9)

Proposition 2.

If the r.v.'s y_1, y_2, … are i.i.d. with common pdf p(y), then the r.v.'s in the sequence a_1, a_2, … are independent.

Since, in practice, p^M(y_t | y_{1:t-1}) is just an approximation of the predictive observation pdf p(y_t | y_{1:t-1}), it is also shown that the pmf of a_t converges to the uniform pmf (9) when M → ∞.

In the following, we describe two different assessment methods: one tests the uniformity of a_t and the other its correlation.

III-A Algorithm 1: Uniformity of the statistic a_t

The algorithm assesses the uniformity of the empirical distribution of the statistic a_t within the block, exploiting Proposition 1. Under the null hypothesis (perfect convergence of the filter), the r.v. a_t is uniform on {0, 1, …, K}, i.e., its pmf is the pmf of Eq. (9). Therefore, a statistical test is performed to check whether {a_j}_{j=t-W+1,…,t} is a sequence of samples from this uniform distribution.
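A minimal sketch of such a test in Python, using Pearson's chi-square statistic over a window of stored values of a_t (the decision rule and its thresholds are left to the user, as in the text):

```python
import numpy as np
from scipy import stats

def uniformity_pvalue(a_window, K):
    """Pearson chi-square test of uniformity of a_t on {0, ..., K}.

    A small p-value suggests a poor filter approximation, so the number
    of particles should be increased; this sketch only returns the
    p-value and leaves the thresholding rule to the caller.
    """
    counts = np.bincount(np.asarray(a_window), minlength=K + 1)
    expected = np.full(K + 1, len(a_window) / (K + 1))
    chi2, pval = stats.chisquare(counts, expected)
    return pval
```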

III-B Algorithm 2: Correlation of the statistic a_t

If the filter has converged, the samples of a_t at the end of the block are i.i.d. and distributed uniformly on {0, 1, …, K}. Since independence implies absence of correlation, we can test whether the samples of a_t are correlated [18]. We note that, in estimating the autocorrelation, longer windows (larger values of W) allow for more accurate estimation; however, we then lose responsiveness in the adaptation.
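A sketch of this check, assuming a window of stored statistics; the lag of one and any decision threshold are illustrative choices, not values prescribed by the paper:

```python
import numpy as np
from scipy import stats

def lagged_correlation(a_window, lag=1):
    """Sample Pearson correlation between a_t and a_{t-lag} in a window.

    Under perfect convergence the sequence is i.i.d., so the coefficient
    should be close to zero; a large value flags a malfunctioning filter.
    """
    a = np.asarray(a_window, dtype=float)
    r, pval = stats.pearsonr(a[:-lag], a[lag:])
    return r, pval
```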

III-C Sharing of moments of the approximating and true predictive distributions

A key parameter in the assessment framework is the number of fictitious observations, K. In the following, we show that if the pmf of a_t is uniform, then the first K moments of the approximating pdf p̃(y) := p^M(y | y_{1:t-1}) and of the true predictive pdf p(y) := p(y | y_{1:t-1}), defined by ∫ F̃(y)^n p̃(y) dy and ∫ F̃(y)^n p(y) dy, respectively, for n = 1, …, K, are identical, where F̃ denotes the cdf of p̃.

Lemma 1.

For any pdf q(y) and its associated cdf F_q(y),

∫ F_q(y)^n q(y) dy = 1/(n + 1),  n = 0, 1, 2, …   (10)

Proof:

∫ F_q(y)^n q(y) dy = ∫_0^1 u^n du = 1/(n + 1),   (11)

where we applied the change of variable u = F_q(y). ∎
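Lemma 1 is easy to verify numerically; the following snippet checks it by Monte Carlo for a standard normal q (the choice of distribution is arbitrary, since the identity holds for any continuous distribution):

```python
import numpy as np
from scipy import stats

# Monte Carlo check of Lemma 1: if Y ~ q with cdf F_q, then
# E[F_q(Y)^n] = 1/(n+1).
rng = np.random.default_rng(0)
u = stats.norm.cdf(rng.standard_normal(1_000_000))
for n in range(1, 6):
    print(f"n={n}: estimate={np.mean(u ** n):.4f}, exact={1 / (n + 1):.4f}")
```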

Lemma 2.

Assume that the pmf of a_t is uniform on {0, 1, …, K}. Then the following equalities hold:

∫ F̃(y)^n p(y) dy = 1/(n + 1),  n = 1, …, K.   (15)

Proof: The proof can be found in Appendix A.

Theorem 1.

Assume that the pmf of a_t is uniform on {0, 1, …, K}. Then p̃(y) has at least its first K moments, in the sense defined above, equal to the respective moments of the true p(y).

Proof: See Lemma 1 and Lemma 2.

Remark 1.

Theorem 1 shows that the first K noncentral moments of F̃(y) with respect to p̃(y) and with respect to p(y) are identical.
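The argument behind Theorem 1 and Remark 1 can be summarized in a single chain of equalities (a sketch in the notation introduced above):

```latex
% For every n = 1, ..., K, under the uniformity assumption on a_t:
\int \tilde{F}(y)^n\, \tilde{p}(y)\, dy
  \;\overset{\text{Lemma 1}}{=}\; \frac{1}{n+1}
  \;\overset{\text{Lemma 2}}{=}\; \int \tilde{F}(y)^n\, p(y)\, dy .
```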

In the next section we propose a new method for assessing the convergence of the filter. It is based on a new statistic, r_t, which follows the same distribution as a_t/K when K → ∞. Further, we show that the complexity of computing r_t does not grow with K but depends only on the number of particles, M.

IV A new method for assessment of convergence

We introduce a new method for assessing the convergence of a PF which does not simulate fictitious observations. It is based on a new statistic that we define below. Let us assume first that there is no approximation error, so we can write p^M(y_t | y_{1:t-1}) = p(y_t | y_{1:t-1}). We define the r.v. r_t := F_t(y_t), where F_t is the cdf of p(y_t | y_{1:t-1}). Next, we prove two propositions about r_t.

Proposition 3.

If y_t is distributed according to some continuous probability distribution with cdf F_t, then the pdf of the r.v. r_t = F_t(y_t) is uniform on [0, 1], i.e., U(0, 1).

Proof: This is known as the probability integral transform, and its simple proof is widely known; see, e.g., [25, p. 139].

Proposition 4.

The r.v.'s in the sequence r_1, r_2, … are independent.

Proof: The sequence of r.v.'s r_1, r_2, … is constructed to be independent, similarly to the sequence a_1, a_2, …, as is shown in Appendix B of [3]. ∎

Note that, in practice, the predictive pdf is approximated by the particle filter as p^M(y_t | y_{1:t-1}), with associated cdf F_t^M, i.e., the cdf of y_t is not exactly F_t. However, if the assumptions of Theorem 1 in [3] are met, that theorem guarantees the a.s. convergence of the approximate measure and enables us to describe the error between F_t^M and F_t; this error goes to zero when M → ∞. To be specific, we introduce the r.v. r_t^M := F_t^M(y_t). We make the same assumptions as in [3, Section III]. Two of these assumptions are related to the likelihood of the state: the likelihood has to be bounded, and differentiable with respect to the state with bounded derivatives up to a given order. The third assumption requires that the random probability measure produced by the filter satisfies a certain inequality. We have the following convergence result for r_t^M.

Theorem 2.

Let x_0^(m) ∼ p(x_0), m = 1, …, M, be a sample from the prior, and let the observations y_{1:t} be fixed. Further, let the assumptions of [3, Section III] hold. Then there exists a sequence of non-negative r.v.'s {ε_t^M}_{M≥1} such that ε_t^M → 0 a.s. and

|r_t^M − r_t| ≤ ε_t^M.   (17)

In particular, r_t^M → r_t a.s. Moreover, the pdf of the r.v. r_t^M converges to that of U(0, 1) when M → ∞.

Proof: The result is a consequence of Theorem 1 in [3], where it is proved that

F_t^M(y) = ∫ 1_{(−∞, y]}(y′) p^M(y′ | y_{1:t-1}) dy′ → F_t(y)  a.s.,

where 1_A is the indicator function defined as

1_A(y) = 1 if y ∈ A, and 1_A(y) = 0 otherwise.   (18)

Note that ||1_A||_∞ ≤ 1 independently of the set A and, therefore, Theorem 1 in [3] can be applied, which implies the convergence of the cdfs as M → ∞. Moreover, due to Proposition 3, r_t = F_t(y_t) ∼ U(0, 1), and hence the pdf of r_t^M converges to that of U(0, 1). ∎

Remark 2.

The computation of the statistic r_t^M requires the evaluation of the approximate predictive cdf F_t^M at y_t. This, in turn, requires the evaluation of M kernel cdfs, one per particle, which may become a heavy computational load if the number of particles is high.
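For a concrete instance of this computation, consider a Gaussian observation model; the following sketch evaluates r_t as the weighted mixture cdf at the actual observation. Both the Gaussian model and the function computing the observation means are assumptions for illustration:

```python
import numpy as np
from scipy import stats

def r_statistic(y, particles, w, obs_mean, obs_std):
    """Statistic r_t = F_t^M(y_t), computed without fictitious draws.

    Sketch for an assumed Gaussian observation model
    y_t | x_t ~ N(h(x_t), obs_std^2), where `obs_mean` computes h(x)
    for an array of particles. The approximate predictive cdf is the
    weighted mixture of the M per-particle observation cdfs, evaluated
    once at the actual observation y.
    """
    means = obs_mean(particles)
    return float(np.sum(w * stats.norm.cdf(y, loc=means, scale=obs_std)))
```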

IV-A Relation between a_t and r_t

Note that a_t represents the number of fictitious observations that are smaller than y_t, while r_t represents the probability mass of p(y_t | y_{1:t-1}) in the interval (−∞, y_t]. The connection between the two statistics is strong. Intuitively, when the number of simulated fictitious observations goes to infinity, the fraction of these observations that are smaller than y_t should tend to the probability of a fictitious observation being smaller than y_t. More precisely, we have the following result.

Theorem 3.

If we let the number of fictitious observations, K, grow to infinity, the statistic a_t divided by K converges to the statistic r_t, i.e.,

lim_{K→∞} a_t/K = r_t  a.s.   (19)

Proof: Recall that r_t = ∫ 1_{(−∞, y_t]}(y) p(y | y_{1:t-1}) dy, where 1_A is the indicator function defined in Eq. (18). It is possible to estimate this integral by drawing K samples ỹ^(1), …, ỹ^(K) from p(y | y_{1:t-1}) and building the raw Monte Carlo estimator a_t/K = (1/K) ∑_{k=1}^K 1_{(−∞, y_t]}(ỹ^(k)). This estimator is unbiased and, according to the strong law of large numbers,

lim_{K→∞} (1/K) ∑_{k=1}^K 1_{(−∞, y_t]}(ỹ^(k)) = r_t  a.s.   (20)   ∎
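The convergence in Theorem 3 is easy to visualize numerically; a small sketch, assuming a standard-normal predictive distribution:

```python
import numpy as np
from scipy import stats

# Illustration of Theorem 3: a_t/K approaches r_t = F(y_t) as K grows,
# here with an assumed standard-normal predictive distribution.
rng = np.random.default_rng(1)
y_actual = 0.7
r = stats.norm.cdf(y_actual)
for K in (10, 100, 1000, 100_000):
    a = np.sum(rng.standard_normal(K) < y_actual)
    print(f"K={K:>6}: a/K={a / K:.4f}  (r={r:.4f})")
```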

Remark 3.

Note that the cost of computing a_t grows without bound as K → ∞, while the cost of computing r_t depends only on the number of particles, M.

V Convergence of the bootstrap particle filter with adaptive number of particles

In this section we obtain explicit error rates for a general class of PFs that update the number of particles, M_n, at the end of blocks of observations of length W_n. Let

τ_t(dx_t | x_{t-1}) = p(x_t | x_{t-1}) dx_t   (21)

denote the Markov kernel that governs the dynamics of the state sequence, and write

g_t(x_t) := p(y_t | x_t)   (22)

to indicate the conditional pdf of the observations. While the notation in (21) implies that the kernel has a density w.r.t. the Lebesgue measure, the results presented in this section actually hold for a broader class of kernels.

Table II outlines the general algorithm we analyze here. This is a BPF with M_n particles in the n-th block of observations, whose length is W_n. The theoretical results we introduce are valid for variable block sizes, W_n, as well as for fixed block sizes, W_n = W for every n (i.e., Table I is a particular case of the general algorithm). Our analysis also holds independently of the update rule for M_n, as long as only positive values are permitted. Specifically, we assume that there is a positive lower bound M_min such that M_n ≥ M_min for every n. In practice, we usually have a finite upper bound as well (but this plays no role in the analysis).
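For illustration only, one admissible update rule is sketched below: a doubling/halving scheme driven by the p-value of the convergence assessment, with assumed thresholds and bounds. The analysis that follows does not depend on this choice:

```python
def update_num_particles(pval, M, M_min=64, M_max=2**20,
                         p_low=0.05, p_high=0.8):
    """One of many admissible update rules for the number of particles.

    The analysis in this section only requires M to stay positive, so
    this rule, and its thresholds, are merely an illustrative choice,
    not the rule prescribed by the paper.
    """
    if pval < p_low:      # poor fit: double the number of particles
        return min(2 * M, M_max)
    if pval > p_high:     # comfortable fit: halve to save computation
        return max(M // 2, M_min)
    return M              # otherwise keep M unchanged
```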

  1. [Initialization]

    1. Draw M_0 independent samples from the prior, x_0^(m) ∼ p(x_0), and assign uniform weights w_0^(m) = 1/M_0, m = 1, …, M_0.

    2. Set n = 0 (block counter) and choose W_0 (size of the first block).

  2. [For t = 1, 2, …]

    1. Bootstrap particle filter:

      • Propagate the particles x̃_t^(m) ∼ τ_t(dx_t | x̄_{t-1}^(m)), m = 1, …, M_n.

      • Compute normalized weights w_t^(m) ∝ g_t(x̃_t^(m)), m = 1, …, M_n.

    2. Fictitious observations:

      • Draw ỹ_t^(k) ∼ p^{M_n}(y_t | y_{1:t-1}), k = 1, …, K.

      • Compute a_t, i.e., the position of the actual observation y_t within the set of ordered fictitious observations ỹ_t^(1), …, ỹ_t^(K).

    3. Assessment of convergence: If t = W_0 + W_1 + ⋯ + W_n (end of the n-th block), then:

      • Analyze the sequence of statistics with some specific algorithm from Section III.

      • Set n = n + 1.

      • Select the number of particles M_n.

      • Select the block size W_n.

      • Resample M_n particles with replacement from the weighted set {x̃_t^(m), w_t^(m)} to obtain {x̄_t^(m)}_{m=1,…,M_n}.

      Else:

      • Resample M_n particles with replacement from the weighted set {x̃_t^(m), w_t^(m)} to obtain {x̄_t^(m)}_{m=1,…,M_n}.

TABLE II: General BPF with block-adaptive number of particles, M_n.

Intuitively, we aim at proving that the error bounds for the approximate filter change as we update the number of particles M_n. For a real r.v. Z with probability measure μ_Z, the L_p norm of Z, with p ≥ 1, is

||Z||_p := (E[|Z|^p])^{1/p} = (∫ |z|^p μ_Z(dz))^{1/p}.   (23)

Since the approximate measures π_t^{M_n} produced by the algorithm in Table II are random, the approximation error (f, π_t^{M_n}) − (f, π_t), where f is some integrable function and (f, μ) := ∫ f(x) μ(dx), is a real r.v. and we can assess its L_p norm. In the sequel, we show that by the end of the n-th block of observations, i.e., at t = T_n := W_0 + ⋯ + W_n, the upper bound of the error ||(f, π_t^{M_n}) − (f, π_t)||_p, where f is a bounded real function, depends on M_n but not on the past values M_0, …, M_{n-1} or the lower bound M_min. This is true under certain regularity assumptions that we introduce below.

We start by constructing the prediction-update (PU) operators that produce the sequence of filtering probability measures given a prior measure π_0, the sequence of kernels τ_t, and the likelihoods g_t.

Definition 1.

Let P(R^{d_x}) be the set of probability measures on the space R^{d_x} and let the sequence of operators Ψ_t : P(R^{d_x}) → P(R^{d_x}) satisfy the relationships

(f, Ψ_t(α)) = ∫ f(x) g_t(x) (τ_t α)(dx) / ∫ g_t(x) (τ_t α)(dx),  where (τ_t α)(dx) := ∫ τ_t(dx | x′) α(dx′),   (24)

for any α ∈ P(R^{d_x}) and any integrable f.

It is not hard to see that Definition 1 implies that π_t = Ψ_t(π_{t-1}). In order to represent the evolution of the sequence of filtering measures over several time steps, we introduce the composition of operators

Ψ_{t|t-k} := Ψ_t ∘ Ψ_{t-1} ∘ ⋯ ∘ Ψ_{t-k+1}.   (25)

It is apparent that π_t = Ψ_{t|t-k}(π_{t-k}). The composition operator is most useful to represent the filters obtained after k consecutive steps when we start from different probability measures at time t − k, i.e., to compare Ψ_{t|t-k}(α) and Ψ_{t|t-k}(β) for α ≠ β.

For our analysis, we assume that the sequences of probability measures generated by the operators Ψ_{t|t-k} are stable. The intuitive meaning is that such sequences "forget" their initial condition over time. This is formalized below.

Assumption 1.

For every ε > 0 and every pair of probability measures α, β ∈ P(R^{d_x}), there exists k_0 < ∞ such that

sup_{||f||_∞ ≤ 1} |(f, Ψ_{t|t-k}(α)) − (f, Ψ_{t|t-k}(β))| < ε   (26)

for every k ≥ k_0 and every t ≥ k.

The strongest assumption in our analysis is that the sequence of likelihoods is uniformly bounded away from zero (as well as upper bounded), as specified below.

Assumption 2.

There exists a constant 1 ≤ c < ∞ such that

1/c ≤ g_t(x) ≤ c   (28)

for every t ≥ 1 and every x ∈ R^{d_x}.

Note that Assumption A.2 depends not only on the form of the pdf g_t but also on the specific sequence of observations y_1, y_2, …. While A.2 may appear restrictive, it is typical in the analysis of uniform convergence of PFs [11, 26] and is expected to hold naturally when the state space is compact.

The main result in this section is stated next.

Theorem 4.

Let p ≥ 1 and let π_t^{M_n} denote the particle approximation of the filtering measure produced by the block-adaptive BPF in Table II. If Assumptions A.1 and A.2 hold, then for any ε > 0 there exists a window length W_ε < ∞ such that, whenever W_n ≥ W_ε,

||(f, π_t^{M_n}) − (f, π_t)||_p ≤ c ||f||_∞ / √M_n + ||f||_∞ O(ε)   (29)

for every t in the n-th block,

t ∈ {T_n − W_n + 1, …, T_n},  where T_n := W_0 + ⋯ + W_n,   (30)

every bounded real function f, and a constant c < ∞ independent of t, n, and M_n.

See Appendix B for a proof.

We make a few remarks about Theorem 4:

  • The theorem states that W_n can be chosen in such a way that the "inherited error" due to a low number of particles in the (n−1)-th block is forgotten when a larger number M_n is selected in the n-th block (due to the stability property of the PU operators, Assumption A.1). In particular, if M_n → ∞, then the first term on the right-hand side of (29) vanishes and only the residual term ||f||_∞ O(ε) remains.

  • The "big oh" notation indicates that, for some constant c′ < ∞,

    O(ε) ≤ c′ ε.   (31)

    In practice, the variability of the window lengths can be a drawback (or just an unwanted complication). The inequality (31) can obviously be satisfied for a constant window length as well. If we choose a constant window length, W_n = W, then the second error term in (29) has the form

    ||f||_∞ ε_W,  with ε_W → 0 as W → ∞.   (32)

    Hence, we simply need to choose W sufficiently large to ensure that ε_W is small enough (this term of the error will not vanish as M_n → ∞, however).

  • If the sequence of probability measures generated by the PU operators is exponentially stable [11] and the W_n's are suitably chosen, then the inequality (29) reduces to

    ||(f, π_t^{M_n}) − (f, π_t)||_p ≤ c ||f||_∞ / √M_n   (33)

    for some constant c < ∞.

VI Numerical experiments

In the first experiment, we show the relation between the correlation coefficient of a_t and the MSE of the estimator built from the particle approximation in a non-linear state-space model. Then, we complement the results of Section III-C, showing numerically some properties of a_t for different values of K and M, and its connection to the statistic r_t. Third, we show numerically the convergence of the block-adaptive BPF.

Fig. 1: Stochastic growth model: (a) MSE in the estimate of the posterior mean; (b) Algorithm 1 of Section III-A: p-value of the Pearson's chi-square test for assessing the uniformity of the statistic a_t; (c) Algorithm 2 of Section III-B: the computed Pearson's correlation coefficient as a function of M.

VI-A Assessing convergence from the correlation of a_t

Consider the stochastic growth model [19],

x_t = x_{t-1}/2 + 25 x_{t-1}/(1 + x_{t-1}²) + 8 cos(1.2 t) + u_t,   (34)
y_t = x_t²/20 + v_t,   (35)

where t = 1, 2, …, and u_t and v_t are independent Gaussian r.v.'s with zero mean and variances σ_u² and σ_v², respectively. We define two models, Model 1 and Model 2, which differ in their choices of σ_u² and σ_v².
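A simulation sketch of the model (34)-(35), with the variances left as free parameters so that both Model 1 and Model 2 can be instantiated:

```python
import numpy as np

def simulate_growth_model(T, sigma_u, sigma_v, rng, x0=0.0):
    """Simulate the stochastic growth model (34)-(35).

    sigma_u and sigma_v are free parameters here; the two settings
    used in the experiments (Model 1 and Model 2) correspond to two
    specific choices of these values. The initial state x0 = 0 is an
    assumption for illustration.
    """
    x = np.zeros(T + 1)
    y = np.zeros(T + 1)
    x[0] = x0
    for t in range(1, T + 1):
        x[t] = (x[t - 1] / 2 + 25 * x[t - 1] / (1 + x[t - 1] ** 2)
                + 8 * np.cos(1.2 * t) + sigma_u * rng.standard_normal())
        y[t] = x[t] ** 2 / 20 + sigma_v * rng.standard_normal()
    return x[1:], y[1:]
```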

In this example, we ran the BPF for a fixed number of time steps, always with a fixed number of particles, M, and we tested different values of M. In order to evaluate the behavior of a_t, we simulated K fictitious observations per time step. Figure 1(a) shows the mean squared error (MSE) in the estimate of the posterior mean for each value of M, which obviously decreases with M. Figure 1(b) displays the p-value of the Pearson's chi-square test for assessing the uniformity of a_t in windows of W consecutive statistics (Algorithm 1 of Section III-A; see more details in [3]). Clearly, increasing the number of particles also increases the p-value, i.e., the distribution of the statistic becomes closer to the uniform distribution. Figure 1(c) is related to Algorithm 2 of Section III-B: we show the sample Pearson correlation coefficient, computed over the whole sequence of statistics a_t with a fixed lag. All results are averaged over independent runs.

We observe that when we increase M, the correlation between consecutive statistics decreases. It is interesting to note that the curve of the correlation coefficient has a shape very similar to that of the MSE curve. While the correlation coefficient can easily be computed, the MSE is always unknown in practice, which shows the utility of Algorithm 2.

It can be seen that both algorithms can identify a malfunctioning of the filter when the number of particles is insufficient. We note that Algorithm 2 works better for Model 1 than for Model 2, because the autocorrelation of the statistics is more sensitive in detecting the malfunctioning for low M. However, Algorithm 1 works better for Model 2, because the p-value of the uniformity test is always smaller than in Model 1, i.e., it is more discriminative. Therefore, there is no clear superiority of one algorithm over the other.

VI-B The three-dimensional Lorenz system

Table III shows results for the Lorenz example described in [3, Section V-A] with a fixed number of particles, M. We show the MSE in the approximation of the posterior mean, averaged over independent runs. Here, "corr." denotes the sample Pearson correlation coefficient, computed using the whole sequence of statistics with a fixed lag, and "p-val" is the p-value of the Pearson's chi-square test for assessing the uniformity of the same set. Conclusions similar to those of the previous example can be drawn.


Fixed M   8       16      32      64      128     256     512     1024    2048    4096    8192    16384
MSE       105.63  75.56   40.19   15.69   5.90    2.90    1.77    1.55    1.53    1.52    1.52    1.52
corr.     0.6927  0.4939  0.2595  0.1132  0.0463  0.0273  0.0210  0.0190  0.0195  0.0151  0.0151  0.0192
p-val     0.0393  0.1276  0.2923  0.4279  0.4823  0.5016  0.5117  0.5106  0.4998  0.5141  0.5040  0.5181
TABLE III: Lorenz model: MSE in the approximation of the posterior mean, the averaged sample correlation coefficient, and the averaged p-value of the Pearson's chi-square test on the uniformity of a_t, for fixed M.

VI-C Behavior of r_t and its relation with a_t

In Fig. 2, we show the histograms of a_t/K and of r_t for the stochastic growth model described in (34)-(35), running the BPF with a fixed number of particles. When K grows, the pmf of a_t/K seems to converge to the pdf of r_t. In Table IV, we show the averaged absolute error (distance) between the realizations of the r.v.'s a_t/K and r_t for the stochastic growth model with fixed M. The results are averaged over all time steps in independent runs. It is clear that when K grows, the deviation between both r.v.'s, which take values in [0, 1], decreases; for K = 5000, the difference is on average 0.0043.


K                    2       3       5       7       10      20      50      100     1000    5000
avg. |a_t/K − r_t|   0.2254  0.1836  0.1409  0.1183  0.0987  0.0696  0.0435  0.0305  0.0097  0.0043
TABLE IV: Averaged absolute error between a_t/K and r_t as a function of K.