I Introduction
In science and engineering, many problems are studied by means of dynamic probabilistic models, which mathematically describe the evolution of hidden states and their relation to observations that are acquired sequentially. In many of these problems, the objective is to sequentially estimate the posterior distribution of the hidden states. A methodology that has gained considerable popularity in the last two and a half decades is particle filtering (also known as sequential Monte Carlo) [1, 2]. This is a Monte Carlo methodology that approximates the distributions of interest by means of random (weighted) samples. Arguably, a key parameter of particle filters (PFs) is the number of particles used. A larger number of particles improves the approximation of the filter but also increases the computational complexity. However, it is impossible to know a priori the number of particles needed to achieve a desired accuracy in the estimated parameters and distributions.
I-A A brief review of the literature
Until the publication of [3], very few papers had considered the selection/adaptation of the number of particles. In [3], a methodology was introduced to address this problem with the goal of adapting the number of particles in real time. The method is based on a rigorous mathematical analysis. Other efforts toward the same goal include the use of a Kullback-Leibler divergence-based approximation error [4], where the divergence was defined between the distribution of the PF and a discrete approximation of the true distribution computed on a predefined grid. The idea from [4] was further explored in [5]. A heuristic approach based on the effective sample size was proposed in [6], [7, Chapter 4]. A disadvantage of using the effective sample size is that, once a PF loses track of the hidden state, the effective sample size does not provide information for adjusting the number of particles [8]. See [9] for other problems with the effective sample size. A theoretically grounded approach for selecting the number of particles was reported in [10], where the Feynman-Kac framework was invoked [11]. In [12], an autoregressive model for the variance of the estimators produced by the PF was employed. Both methods operate only in batch mode. In a group of papers on alive PFs, the number of particles is adaptive and based on sampling schemes that ensure that a predefined number of particles have non-zero weights [13, 14, 15]. In [16], a fixed number of particles is adaptively allocated to several candidate models according to their performance. In [17], particle sets of the same size are generated until an estimation criterion for their acceptance is met.
I-B A summary of the method proposed in [3]
In [3], we introduced a methodology for assessing the convergence of PFs. The proposed method works online and is both model- and filter-independent. It is based on simulating fictitious observations from the one-step-ahead predictive distribution of the observations approximated by the PF, and comparing them with the actual observations that become available at each time step. In the case of one-dimensional observations, a statistic is constructed that simply counts the number of fictitious observations which are smaller than the actual observation. We show that when the filter has converged, the distribution of this statistic is uniform on a discrete support. We propose an algorithm for statistically assessing the uniformity of the statistic and, based on it, for modifying the number of particles. The method is supported by rigorous theory regarding the particle approximation of the predictive distribution of the observations and the convergence of the distribution of the statistic.
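To make the statistic concrete, here is a minimal Python sketch (our own illustration, not code from [3]): it counts how many of K fictitious observations fall below the actual one, and checks empirically that, when the actual and fictitious observations share the same continuous distribution, the count is uniform on {0, 1, ..., K}.

```python
import random

def rank_statistic(y_actual, fictitious):
    """Number of fictitious observations smaller than the actual one.

    Under perfect convergence this count is uniform on {0, 1, ..., K},
    where K = len(fictitious).
    """
    return sum(1 for y in fictitious if y < y_actual)

# Toy check: when the actual observation is drawn from the *same*
# continuous distribution as the fictitious ones, the empirical
# distribution of the statistic is close to uniform on {0, ..., K}.
random.seed(0)
K, n_trials = 9, 20000
counts = [0] * (K + 1)
for _ in range(n_trials):
    y = random.gauss(0.0, 1.0)
    fict = [random.gauss(0.0, 1.0) for _ in range(K)]
    counts[rank_statistic(y, fict)] += 1

freqs = [c / n_trials for c in counts]  # each entry should be close to 1/(K+1)
```

The Gaussian choice here is arbitrary; by construction the statistic is distribution-free as long as the common distribution is continuous.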
I-C Contributions
In this paper, we describe a generic framework of block-adaptive particle filters based on the philosophy of [3]. The main idea relies on a specific statistic, which is computed at each time step by comparing the received observation with the predictive distribution of the observations approximated by the filter. Within this framework, we propose a novel algorithm that assesses convergence based on the correlation of this statistic.¹ Moreover, we propose a novel algorithm for assessing the quality of the approximation and for adapting the number of particles. Unlike the algorithm from [3], this algorithm is based on a new statistic whose computation does not require the generation of fictitious observations. We analyze the choice of the key parameters of the adaptive framework: the number K of fictitious observations and the number W of time steps in each window (i.e., the window width). We provide several theoretical results related to these parameters that not only justify the robustness of the method but also point to adequate choices of these parameters. Then, we analyze the convergence of the block-adaptive framework when the number of particles is increased. We show that, for a sufficiently large window, the errors of previous windows (with fewer particles) are forgotten, and that the convergence rate of the approximation depends on the current number of particles.
¹ We presented this algorithm in the conference paper [18].
I-D Organization of the paper
In the next section, we briefly describe particle filtering as a sequential Monte Carlo methodology and we introduce our notation. In Section III, we propose a general online scheme for selecting the number of particles and for adapting the sizes of the observed data blocks for computing statistics. In the following section, we introduce the new statistic for assessing the performance of a PF. We provide convergence results in Section V. In the last two sections, we present results of numerical experiments and our conclusions, respectively.
II Particle filtering
We consider Markov dynamic systems whose state-space models are described by

    x_0 ∼ p(x_0),   (1)
    x_t ∼ p(x_t | x_{t−1}),   (2)
    y_t ∼ p(y_t | x_t),   (3)

where
- t = 1, 2, … denotes discrete time;
- x_t is a d_x-dimensional (random) hidden process (state) at time t, where x_t ∈ ℝ^{d_x};
- p(x_0) is the a priori pdf of the state;
- p(x_t | x_{t−1}) is the conditional density of x_t given x_{t−1};
- y_t is a d_y-dimensional observation vector at time t, where y_t ∈ ℝ^{d_y} and y_t is assumed conditionally independent of all the other observations given x_t;
- p(y_t | x_t) is the conditional pdf of y_t given x_t. It is often referred to as the likelihood of x_t when it is viewed as a function of x_t given y_t.
Based on the model and the adopted assumptions, we want to estimate, in a recursive manner, the posterior probability distributions p(x_t | y_{1:t}), t = 1, 2, …. We can write

    p(x_t | y_{1:t}) ∝ p(y_t | x_t) ∫ p(x_t | x_{t−1}) p(x_{t−1} | y_{1:t−1}) dx_{t−1},

where we see how p(x_t | y_{1:t}) is related to its counterpart at the previous time instant, p(x_{t−1} | y_{1:t−1}).
We represent the filtering and the predictive posterior pdfs of the state by

    π_t(x_t) := p(x_t | y_{1:t})  and  ξ_t(x_t) := p(x_t | y_{1:t−1}).   (4)
The object that plays a central role in our study is the predictive pdf of the observations,

    p(y_t | y_{1:t−1}) = ∫ p(y_t | x_t) ξ_t(x_t) dx_t,

with its probability measure denoted μ_t(dy_t). It is well known that the predictive pdf is instrumental in model inference [19, 20, 21, 22].
The main idea of particle filtering is to estimate the probability measures of interest sequentially from the observations y_1, y_2, …. The basic method for accomplishing this is known as the bootstrap filter (BF) [23]. The BF implements three steps: 1) drawing particles x̄_t^{(n)}, n = 1, …, N, with N being the number of particles, from the conditional pdf p(x_t | x_{t−1} = x_{t−1}^{(n)}), where the conditioning states take the values of previously drawn particles; 2) computation of the normalized importance weights of the particles, w_t^{(n)} ∝ p(y_t | x̄_t^{(n)}); and 3) resampling with replacement, with probabilities w_t^{(n)}, from the set {x̄_t^{(n)}}_{n=1}^{N} [24], where the resampling is repeated N times.
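The three steps above can be sketched in a few lines of Python. This is a generic, minimal illustration: `transition` and `loglik` are placeholder callables for a concrete model, and multinomial resampling is used for simplicity.

```python
import math, random

def bootstrap_filter_step(particles, y, transition, loglik, rng):
    """One step of the bootstrap filter (propagate, weight, resample).

    `transition(x, rng)` draws x_t given x_{t-1}; `loglik(y, x)` returns
    log p(y_t | x_t). Both are placeholders for a concrete model.
    """
    N = len(particles)
    # 1) propagate each particle through the transition kernel
    proposed = [transition(x, rng) for x in particles]
    # 2) importance weights proportional to the likelihood (log-sum-exp for stability)
    logw = [loglik(y, x) for x in proposed]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    s = sum(w)
    w = [wi / s for wi in w]
    # 3) multinomial resampling with replacement, N draws
    resampled = rng.choices(proposed, weights=w, k=N)
    return resampled, proposed, w

# Toy usage on a linear-Gaussian model (our example, chosen for simplicity).
rng = random.Random(1)
N = 500
parts = [rng.gauss(0.0, 1.0) for _ in range(N)]
trans = lambda x, r: 0.9 * x + r.gauss(0.0, 1.0)
ll = lambda y, x: -0.5 * (y - x) ** 2   # unit-variance Gaussian, up to a constant
for y_obs in [1.0, 1.2, 0.8, 1.1]:
    parts, _, _ = bootstrap_filter_step(parts, y_obs, trans, ll, rng)
est = sum(parts) / N   # particle estimate of the posterior mean
```

Any other resampling scheme (stratified, systematic) could be substituted in step 3 without changing the overall structure.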
Once the particles and their weights are computed, we can obtain approximations of various probability measures. The filtering measure can be approximated as

    π_t^N(dx_t) = (1/N) Σ_{n=1}^{N} δ_{x_t^{(n)}}(dx_t),   (5)

where δ_x represents the Dirac delta measure at x. Further, given the particles {x_{t−1}^{(n)}}_{n=1}^{N}, the predictive pdfs of x_t and y_t are approximated by

    ξ_t^N(dx_t) = (1/N) Σ_{n=1}^{N} δ_{x̄_t^{(n)}}(dx_t),   (6)
    p^N(y_t | y_{1:t−1}) = (1/N) Σ_{n=1}^{N} p(y_t | x̄_t^{(n)}),   (7)
    μ_t^N(dy_t) = p^N(y_t | y_{1:t−1}) dy_t,   (8)

where x̄_t^{(n)} are the particles propagated through the transition density, as in the first step of the BF.
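The observation-predictive approximation, an average of likelihood evaluations over the propagated particles, is straightforward to code. The sketch below assumes a scalar Gaussian likelihood purely for illustration:

```python
import math

def approx_obs_predictive_pdf(y, predicted_particles, sigma):
    """Particle approximation of p(y_t | y_{1:t-1}): the average of the
    likelihood p(y_t | x_t) over the propagated particles.
    A Gaussian likelihood N(y; x, sigma^2) is assumed for illustration."""
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(c * math.exp(-0.5 * ((y - x) / sigma) ** 2)
               for x in predicted_particles) / len(predicted_particles)

# Sanity check: with all particles at 0 and sigma = 1 this collapses
# to the standard normal density at y.
p0 = approx_obs_predictive_pdf(0.0, [0.0, 0.0, 0.0], 1.0)
```

Fictitious observations can then be simulated from this mixture by first picking a particle uniformly at random and then drawing from its likelihood.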
III General online selection of the number of particles: block adaptation

The generic block-adaptive method for selecting the number of particles is summarized in Table I. We implement it on the standard BPF, but the methodology is the same for any other PF. In Step 1(a), the filter is initialized with N particles. At each time step, the filter works in the standard manner, as described in Step 2(a). However, the first modification w.r.t. the BPF comes in Step 2(b), where K fictitious observations ỹ_t^{(1)}, …, ỹ_t^{(K)} are simulated from the approximate predictive distribution of the observations (see [3, Section IV-A]). These fictitious observations are used to compute the statistic a_t ∈ {0, 1, …, K}, where a_t is the position of the actual observation y_t within the set of ordered fictitious observations.
The algorithm works with windows of size W; at the end of each window (Step 2(c)), the sequence a_{t−W+1}, …, a_t is processed for assessment of the convergence of the filter. The number of particles is adapted (increased, decreased, or kept constant) based on this assessment.
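The adaptation step at the end of each window can be made concrete with a toy controller. This is our own stand-in for the rule in Table I, not the paper's exact rule: the thresholds, the doubling/halving moves, and the bounds N_min, N_max are all illustrative assumptions.

```python
def adapt_num_particles(N, p_value, p_low=0.05, p_high=0.5,
                        N_min=50, N_max=100000):
    """Toy block-adaptation rule (illustrative, not the paper's exact rule):
    increase N when the uniformity test rejects (small p-value), decrease it
    when the fit looks comfortably good, otherwise keep N."""
    if p_value < p_low:          # statistic far from uniform -> filter struggling
        return min(2 * N, N_max)
    if p_value > p_high:         # comfortably uniform -> try saving computation
        return max(N // 2, N_min)
    return N                     # inconclusive -> keep the current budget

# e.g. adapt_num_particles(1000, 0.01) -> 2000 (double on rejection)
```

Any rule that keeps N within positive bounds fits the convergence analysis of Section V.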
Under the assumption of no approximation error in the observation predictive pdf, i.e., p^N(y_t | y_{1:t−1}) = p(y_t | y_{1:t−1}), the fictitious observations are drawn from the same pdf as the actual observation y_t. In that case, the statistic does not depend on the filter approximation. Two preliminary theoretical results about the statistic are proved in [3]:
Proposition 1.
If ỹ_t^{(1)}, …, ỹ_t^{(K)} are i.i.d. samples from a common continuous (but otherwise arbitrary) probability distribution, and the actual observation y_t is drawn from the same distribution, then the pmf of the r.v. a_t is

    Q_K(n) = 1/(K + 1),  n = 0, 1, …, K.   (9)
Proposition 2.
If, at each time t, the r.v.'s y_t, ỹ_t^{(1)}, …, ỹ_t^{(K)} are i.i.d. with common pdf p(y_t | y_{1:t−1}), then the r.v.'s in the sequence a_1, a_2, … are independent.
Since, in practice, p^N(y_t | y_{1:t−1}) is just an approximation of the predictive observation pdf p(y_t | y_{1:t−1}), it is also shown in [3] that the pmf of a_t converges to the uniform pmf of Eq. (9) as N → ∞.
In the following, we describe two different assessment methods: one tests the uniformity of the statistic and the other its correlation.
III-A Algorithm 1: Uniformity of the statistic
The algorithm assesses the uniformity of the empirical distribution of the statistic within the block, exploiting Proposition 1. Under the null hypothesis (perfect convergence of the filter), the r.v. a_t is uniform on {0, 1, …, K}, i.e., its pmf is that of Eq. (9). Therefore, a statistical test is performed to check whether a_{t−W+1}, …, a_t is a sequence of samples from this uniform distribution.
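A hand-rolled Pearson chi-squared check illustrates the idea (our sketch; the paper's exact test and thresholds may differ). Under the null, the statistic below is approximately chi-squared with K degrees of freedom, so it can be compared with a tabulated critical value.

```python
import random

def chi2_uniformity_stat(samples, K):
    """Pearson chi-squared statistic of `samples` (values in {0, ..., K})
    against the discrete uniform pmf 1/(K+1)."""
    n = len(samples)
    expected = n / (K + 1)
    counts = [0] * (K + 1)
    for s in samples:
        counts[s] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

# Under the null (uniform statistic) the value is moderate; 16.92 is the
# 5% critical value of the chi-squared distribution with K = 9 d.o.f.
random.seed(3)
K, W = 9, 1000
null_samples = [random.randrange(K + 1) for _ in range(W)]
stat = chi2_uniformity_stat(null_samples, K)

# A grossly non-uniform sequence (e.g., a filter that lost track, so y_t
# always exceeds every fictitious observation) gives a huge value.
bad_samples = [K] * W
bad_stat = chi2_uniformity_stat(bad_samples, K)
```

In the degenerate case, all 1000 samples land in one of the 10 cells, so the statistic equals 9 · 100 + 900²/100 = 9000, far above any reasonable critical value.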
III-B Algorithm 2: Correlation of the statistic
If the filter has converged, the samples of a_t at the end of the block are i.i.d. and distributed uniformly on {0, 1, …, K}. Since independence implies absence of correlation, we can test whether the samples of the statistic are correlated [18]. We note that, in estimating the autocorrelation, longer windows (larger values of W) allow for more accurate estimation; however, we then lose responsiveness in the adaptation.
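A minimal version of this check computes the lag-1 sample Pearson correlation of the sequence of statistics (the choice of lag 1 here is ours, for illustration):

```python
import math, random

def lag1_pearson(seq):
    """Sample Pearson correlation between the sequence and itself shifted by one."""
    x, y = seq[:-1], seq[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

random.seed(4)
iid = [random.random() for _ in range(5000)]      # converged filter: correlation near 0
sticky = [0.0] * 2500 + [1.0] * 2500              # persistent statistics: correlation near 1
r_iid, r_sticky = lag1_pearson(iid), lag1_pearson(sticky)
```

A value of |r| far from zero over a window signals dependence among the statistics, and hence a poorly converged filter.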
III-C Sharing of moments of the approximating and true predictive distributions
A key parameter in the assessment framework is the number of fictitious observations, K. In the following, we show that if the pmf of a_t is uniform, then the first K moments of the approximate predictive distribution of the observations and of the true predictive distribution are identical.
Lemma 1.
For any pdf q(y) and its associated cdf Q(y) = ∫_{−∞}^{y} q(v) dv,

    ∫ Q(y)^k q(y) dy = 1/(k + 1),  k = 0, 1, 2, ….   (10)
Proof:

    ∫ Q(y)^k q(y) dy   (11)
     = ∫_0^1 u^k du   (12)
     = [u^{k+1}/(k + 1)]_0^1   (13)
     = 1/(k + 1),   (14)

where we applied the change of variable u = Q(y), du = q(y) dy. ∎
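Lemma 1 is easy to verify numerically. The sketch below draws from an Exponential(1) distribution (our arbitrary choice), for which Q(y) = 1 − e^{−y}, and checks that the Monte Carlo average of Q(Y)^k is close to 1/(k + 1):

```python
import math, random

# Monte Carlo check of Lemma 1: for Y with pdf q and cdf Q, E[Q(Y)^k] = 1/(k+1).
# Here q is the Exponential(1) pdf, so Q(y) = 1 - exp(-y).
random.seed(5)
n = 200000
ys = [random.expovariate(1.0) for _ in range(n)]
ests = {k: sum((1.0 - math.exp(-y)) ** k for y in ys) / n for k in (1, 2, 3)}
# ests[k] should be close to 1/2, 1/3, 1/4 for k = 1, 2, 3
```

The same check passes for any continuous distribution, which is exactly the content of the change-of-variable argument.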
Lemma 2.
Assume that the pmf of is uniform. Then the following equalities hold:
(15) 
Proof: The proof can be found in Appendix A.
Theorem 1.
Assume that the pmf of a_t is uniform. Then the approximate predictive distribution of the observations has at least its first K moments equal to the respective moments of the true predictive distribution.
Remark 1.
Theorem 1 shows that the first K non-central moments of the observations with respect to the approximate and the true predictive distributions are identical.
In the next section, we propose a new method for assessing the convergence of the filter. It is based on a new statistic, r_t, which follows the same distribution as a_t/K when K → ∞. Further, we show that the complexity of computing r_t scales linearly with the number of particles N.
IV A new method for assessment of convergence
We introduce a new method for assessing the convergence of a PF which does not require simulating fictitious observations. It is based on a new statistic that we define below. Let us assume first that there is no approximation error, so that the filter yields the exact predictive pdf p(y_t | y_{1:t−1}). We define the r.v. r_t = F_t(y_t), where F_t is the cdf associated with p(y_t | y_{1:t−1}). Next, we prove two propositions about r_t.
Proposition 3.
If y_t is distributed according to some continuous probability distribution with cdf F_t, then the pdf of the r.v. r_t = F_t(y_t) is the uniform pdf on [0, 1].
Proof: This is known as the probability integral transform, and its simple proof is widely known; see, e.g., [25, p. 139].
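A quick numerical illustration of the probability integral transform, using the standard normal distribution (our choice) and the closed-form cdf via the error function:

```python
import math, random

def std_normal_cdf(y):
    """Standard normal cdf, F(y) = (1 + erf(y / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

# If Y ~ F (continuous), then F(Y) ~ Uniform(0, 1): the transformed sample
# should have mean 1/2 and variance 1/12.
random.seed(6)
u = [std_normal_cdf(random.gauss(0.0, 1.0)) for _ in range(100000)]
mean = sum(u) / len(u)
var = sum((x - mean) ** 2 for x in u) / len(u)
```

This is the property that makes r_t usable as a convergence diagnostic: departures from uniformity indicate a mismatch between the filter's predictive cdf and the data.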
Proposition 4.
The r.v.’s in the sequence are independent.
The sequence of r.v.’s are constructed to be independent, similarly as the sequence , as is shown in Appendix B of [3]. ∎
Note that, in practice, the predictive pdf is approximated by the particle filter as p^N(y_t | y_{1:t−1}), with associated cdf F_t^N, i.e., the cdf of y_t is not exactly F_t. However, if the assumptions of Theorem 1 in [3] are met, that theorem guarantees the a.s. convergence of the approximate measure μ_t^N, and it enables us to describe the error between F_t^N and F_t; this error vanishes as N → ∞. To be specific, we introduce the r.v. r_t^N = F_t^N(y_t). We make the same assumptions as in [3, Section III]. Two of these assumptions are related to the likelihood of the state: the likelihood has to be bounded, and differentiable with respect to y_t with bounded derivatives up to a prescribed order. The third assumption requires that the random probability measure μ_t^N satisfies a certain inequality. We have the following convergence result for r_t^N.
Theorem 2.
Let y_t be a sample from p(y_t | y_{1:t−1}), and let the observations y_{1:t−1} be fixed. Further, let the assumptions of [3, Section III] hold. Then, there exists a sequence of non-negative r.v.'s {ε_t^N}_{N≥1} such that ε_t^N → 0 a.s. and

    |r_t^N − r_t| ≤ ε_t^N.   (17)

In particular, r_t^N → r_t a.s. Moreover, the pdf of the r.v. r_t^N converges to the uniform pdf on [0, 1] as N → ∞.
Proof: The result is a consequence of Theorem 1 in [3], where it is proved that (f, μ_t^N) → (f, μ_t) a.s. for bounded f. Here we take f to be the indicator function

    𝟙_{(−∞, y_t]}(y) = 1 if y ≤ y_t, and 0 otherwise,   (18)

so that r_t^N = (𝟙_{(−∞, y_t]}, μ_t^N) and r_t = (𝟙_{(−∞, y_t]}, μ_t). Note that the indicator is bounded independently of the set (−∞, y_t] and, therefore, Theorem 1 in [3] can be applied and r_t^N → r_t a.s., which implies the convergence of the cdfs as N → ∞. Moreover, due to Proposition 3, r_t ∼ U(0, 1), i.e., the pdf of r_t^N converges to the uniform pdf on [0, 1]. ∎
Remark 2.
The computation of the statistic r_t^N requires the evaluation of the approximate predictive cdf F_t^N at y_t. This requires the evaluation of N kernel cdfs, one for each particle, which may become a heavy computational load when the number of particles is high.
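The per-particle kernel-cdf evaluation can be sketched as follows. A Gaussian likelihood is assumed purely for illustration, so the approximate predictive cdf becomes a mixture of N Gaussian cdfs — one kernel evaluation per particle, i.e., cost linear in N:

```python
import math

def predictive_cdf_at(y, particles, sigma):
    """Evaluate the particle approximation of the predictive cdf of y_t at y.

    Assumes a Gaussian likelihood N(y; x, sigma^2), so the approximate
    predictive cdf is a mixture of len(particles) Gaussian cdfs:
    one kernel (cdf) evaluation per particle, hence O(N) cost.
    """
    z = [0.5 * (1.0 + math.erf((y - x) / (sigma * math.sqrt(2.0))))
         for x in particles]
    return sum(z) / len(z)

# The statistic is this cdf evaluated at the received observation; with a
# particle set symmetric around the observation it equals exactly 1/2.
particles = [-1.0, 0.0, 1.0]
r = predictive_cdf_at(0.0, particles, 1.0)
```

In higher dimensions, or with likelihoods lacking a closed-form cdf, each kernel evaluation would itself require numerical integration, making the linear-in-N cost the best case.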
IV-A Relation between a_t and r_t
Note that a_t represents the number of fictitious observations that are smaller than y_t, while r_t represents the probability mass of the predictive distribution on the interval (−∞, y_t]. The connection between the two statistics is strong. Intuitively, as the number of simulated fictitious observations goes to infinity, the fraction of those observations that are smaller than y_t should tend to the probability of a fictitious observation being smaller than y_t. More precisely:
Theorem 3.
If we let the number of fictitious observations, K, grow to infinity, the statistic a_t divided by K converges to the statistic r_t, i.e.,

    lim_{K→∞} a_t / K = r_t  a.s.   (19)
Proof: Recall that r_t = ∫ 𝟙_{(−∞, y_t]}(y) p(y | y_{1:t−1}) dy, where 𝟙 is the indicator function defined in Eq. (18). It is possible to estimate this integral by drawing K samples ỹ_t^{(1)}, …, ỹ_t^{(K)} from p(y | y_{1:t−1}) and building the raw Monte Carlo estimator (1/K) Σ_{k=1}^{K} 𝟙_{(−∞, y_t]}(ỹ_t^{(k)}) = a_t / K. This estimator is unbiased and, by the strong law of large numbers,

    lim_{K→∞} a_t / K = r_t  a.s.   (20)
∎
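The convergence in Theorem 3 is easy to observe numerically. In this sketch the predictive distribution is taken to be standard normal (our choice), so r is available in closed form and the rank statistic divided by K approaches it as K grows:

```python
import math, random

def std_normal_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

# Theorem 3, numerically: a/K -> r = F(y_actual) as K grows.
random.seed(7)
y_actual = 0.3
r = std_normal_cdf(y_actual)              # exact predictive cdf at y_actual
errors = []
for K in (10, 100, 10000):
    fict = [random.gauss(0.0, 1.0) for _ in range(K)]
    a = sum(1 for v in fict if v < y_actual)
    errors.append(abs(a / K - r))         # Monte Carlo error, shrinks like 1/sqrt(K)
```

This mirrors the behavior reported in Table IV for the stochastic growth model.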
Remark 3.
Note that the cost of computing a_t grows without bound as K → ∞, while the cost of computing r_t depends only on the number of particles N.
V Convergence of the bootstrap particle filter with adaptive number of particles
In this section, we obtain explicit error rates for a general class of PFs that update the number of particles, N_p, at the end of blocks of observations of length W_p. Let

    τ_t(dx_t | x_{t−1}) = p(x_t | x_{t−1}) dx_t   (21)

denote the Markov kernel that governs the dynamics of the state sequence, and write

    g_t(x_t) := p(y_t | x_t)   (22)

to indicate the conditional pdf of the observations.
to indicate the conditional pdf of the observations. While the notation in (21) implies that the kernel has a density w.r.t. the Lebesgue measure, the results to be presented in this section actually hold for a broader class of kernels.
Table II outlines the general algorithm we analyze here. This is a BPF with N_p particles in the p-th block of observations, whose length is W_p. The theoretical results we introduce are valid for variable block sizes as well as for fixed block sizes, W_p = W for every p (i.e., Table III is a particular case of the general algorithm). Our analysis also holds independently of the update rule for N_p, as long as only positive values are permitted. Specifically, we assume that there is a positive lower bound N_min such that N_p ≥ N_min for every p. In practice, we usually have a finite upper bound as well (but this plays no role in the analysis).

Intuitively, we aim at proving that the error bounds for the approximate filter change as we update the number of particles N_p. For a real r.v. Z with probability measure P, the L_q norm of Z, with q ≥ 1, is

    ‖Z‖_q = (E[|Z|^q])^{1/q} = (∫ |Z|^q dP)^{1/q}.   (23)

Since the approximate measures produced by the algorithm in Table II are random, the approximation error (f, π_t^N) − (f, π_t), where f is some integrable function, is a real r.v., and we can assess its norm. In the sequel, we show that, by the end of the p-th block of observations, the upper bound of the error ‖(f, π_t^N) − (f, π_t)‖_q, where f is a bounded real function, depends on N_p but not on the past values N_1, …, N_{p−1} or the lower bound N_min. This is true under certain regularity assumptions that we introduce below.
We start by constructing the prediction-update (PU) operators that produce the sequence of filtering probability measures given a prior measure π_0, the sequence of kernels τ_t, and the likelihoods g_t.
Definition 1.
Let P(ℝ^{d_x}) be the set of probability measures on the space ℝ^{d_x}, and let the sequence of operators Ψ_t : P(ℝ^{d_x}) → P(ℝ^{d_x}) satisfy the relationships

    Ψ_t(α)(dx_t) ∝ g_t(x_t) ∫ τ_t(dx_t | x_{t−1}) α(dx_{t−1})   (24)

for any α ∈ P(ℝ^{d_x}), with the proportionality constant chosen so that Ψ_t(α) is a probability measure.
It is not hard to see that Definition 1 implies that π_t = Ψ_t(π_{t−1}). In order to represent the evolution of the sequence of filtering measures over several time steps, we introduce the composition of operators

    Ψ_{t|t−k} := Ψ_t ∘ Ψ_{t−1} ∘ ⋯ ∘ Ψ_{t−k+1}.   (25)

It is apparent that π_t = Ψ_{t|t−k}(π_{t−k}). The composition operator is most useful to represent the filters obtained after k consecutive steps when we start from different probability measures at time t − k, i.e., to compare Ψ_{t|t−k}(α) and Ψ_{t|t−k}(β) for α ≠ β.
For our analysis, we assume that the sequences of probability measures generated by the operators Ψ_t are stable. The intuitive meaning is that such sequences “forget” their initial condition over time. This is formalized below.
Assumption 1.
For every ε > 0, every bounded real function f, and any α, β ∈ P(ℝ^{d_x}), there exists k_0 < ∞ such that

    |(f, Ψ_{t|t−k}(α)) − (f, Ψ_{t|t−k}(β))| < ε   (26)

for every k ≥ k_0 and every t ≥ k.
The strongest assumption in our analysis is that the sequence of likelihoods is uniformly bounded away from zero (as well as upper bounded), as specified below.
Assumption 2.
There exists a positive constant a < 1 such that

    a ≤ g_t(x_t) ≤ 1/a   (28)

for every t and every x_t ∈ ℝ^{d_x}.
Note that Assumption 2 depends not only on the form of the pdf g_t but also on the specific sequence of observations y_1, y_2, …. While Assumption 2 may appear restrictive, it is typical in the analysis of the uniform convergence of PFs [11, 26] and is expected to hold naturally when the state space is compact.
The main result in this section is stated next.
Theorem 4.
See Appendix B for a proof.
We make a few remarks about Theorem 4:

The theorem states that the window length W_p can be chosen in such a way that the “inherited error,” due to a low number of particles in the previous block, is forgotten when a larger number N_p is selected in the p-th block (due to the stability property of the PU operators Ψ_t). In particular, when the number of particles is increased from one block to the next, the error bound by the end of the new block depends only on the new number of particles.

The “big oh” notation indicates that the window lengths satisfy the inequality in (31) for some constant c > 0. In practice, the variability of the window lengths can be a drawback (or just an unwanted complication). The inequality (31) can obviously be satisfied as well by a constant window length. If we choose a constant window length, W_p = W for every p, then the second error term in (29) takes the form in (32). Hence, we simply need to choose W sufficiently large to ensure that this term is small enough (the error will not vanish as the number of particles increases, however).
VI Numerical experiments
In the first experiment, we show the relation between the correlation coefficient of the statistic and the MSE of the estimator built from the particle approximation in a nonlinear state-space model. Then, we complement the results of Section III-C, numerically showing some properties of a_t for different values of K and N, and its connection to the statistic r_t. Third, we numerically show the convergence of the block-adaptive BPF.
VI-A Assessing convergence from the correlation of the statistic
Consider the stochastic growth model [19],

    x_t = x_{t−1}/2 + 25 x_{t−1}/(1 + x_{t−1}²) + 8 cos(1.2 t) + u_t,   (34)
    y_t = x_t²/20 + v_t,   (35)
where u_t and v_t are independent Gaussian r.v.'s with zero mean and variances σ_u² and σ_v², respectively. At this point, we define two models, Model 1 and Model 2, which differ only in the values chosen for the noise variances σ_u² and σ_v².
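A simulator for this model is a few lines of Python. This sketch follows the standard form of the growth model in Eqs. (34)-(35); the variance values passed in below are our own illustrative choices, not the experiment's Model 1 / Model 2 settings.

```python
import math, random

def simulate_growth_model(T, var_u, var_v, rng):
    """Simulate the stochastic growth model:
    x_t = x_{t-1}/2 + 25 x_{t-1}/(1 + x_{t-1}^2) + 8 cos(1.2 t) + u_t,
    y_t = x_t^2 / 20 + v_t,
    with u_t ~ N(0, var_u), v_t ~ N(0, var_v), and x_0 = 0."""
    x, xs, ys = 0.0, [], []
    for t in range(1, T + 1):
        x = (0.5 * x + 25.0 * x / (1.0 + x * x) + 8.0 * math.cos(1.2 * t)
             + rng.gauss(0.0, math.sqrt(var_u)))
        y = x * x / 20.0 + rng.gauss(0.0, math.sqrt(var_v))
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(8)
xs, ys = simulate_growth_model(50, 1.0, 1.0, rng)   # illustrative variances
```

The squared observation function makes the likelihood bimodal in x, which is what makes this model a standard stress test for PFs.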
In this example, we ran the BPF, always with a fixed number of particles N, and tested different values of N. In order to evaluate the behavior of the statistic, we simulated K fictitious observations at each time step. Figure 1(a) shows the mean squared error (MSE) in the estimate of the posterior mean for each value of N which, as expected, decreases with N. Figure 1(b) displays the p-value of Pearson's test for assessing the uniformity of a_t (on the domain {0, 1, …, K}) over windows of length W (Algorithm 1 of Section III-A; see more details in [3]). Clearly, increasing the number of particles also increases the p-value, i.e., the distribution of the statistic becomes closer to the uniform distribution. Figure 1(c) is related to Algorithm 2 of Section III-B. We show the sample Pearson correlation coefficient of the whole sequence of statistics, computed at a fixed lag. All results are averaged over independent runs.
We observe that, as we increase N, the correlation between consecutive statistics decreases. It is interesting to note that the curve of the correlation coefficient has a very similar shape to the MSE curve. While the correlation coefficient can easily be computed, the MSE is always unknown, which shows the utility of Algorithm 2.
It can be seen that both algorithms can identify a malfunctioning of the filter when the number of particles is insufficient. We note that Algorithm 2 works better for Model 1 than for Model 2, because the autocorrelation of the statistics is more sensitive in detecting the malfunctioning for low N. However, Algorithm 1 works better for Model 2, because the p-value of the uniformity test is always smaller than in Model 1, i.e., it is more discriminative. Therefore, there is no clear superiority of one algorithm over the other.
VI-B The three-dimensional Lorenz system
Table III shows results for the Lorenz example described in [3, Section V-A] with a fixed number of particles N. We show the MSE in the approximation of the posterior mean, averaged over independent runs. Again, the correlation row reports the sample Pearson correlation coefficient of the whole sequence of statistics at a fixed lag, and p-val is the p-value of Pearson's test for assessing the uniformity of the same set. Conclusions similar to those of the previous example can be drawn.
Fixed N   8       16      32      64      128     256     512     1024    2048    4096    8192    16384
MSE       105.63  75.56   40.19   15.69   5.90    2.90    1.77    1.55    1.53    1.52    1.52    1.52
Corr.     0.6927  0.4939  0.2595  0.1132  0.0463  0.0273  0.0210  0.0190  0.0195  0.0151  0.0151  0.0192
p-val     0.0393  0.1276  0.2923  0.4279  0.4823  0.5016  0.5117  0.5106  0.4998  0.5141  0.5040  0.5181
VI-C Behavior of r_t and its relation with a_t
In Fig. 2, we show the histograms of a_t/K and r_t for the stochastic growth model described in (34)-(35), with the BPF run with a fixed number of particles N. As K grows, the pmf of a_t/K approaches the pdf of r_t. In Table IV, we show the averaged absolute error (distance) between the realizations of the r.v.'s a_t/K and r_t for the stochastic growth model with fixed N. The results are averaged over all time steps in independent runs. It is clear that, as K grows, the deviation between both r.v.'s, which take values in [0, 1], decreases. Thus, for K = 5000, the difference is on average 0.0043.

K                     2       3       5       7       10      20      50      100     1000    5000
Avg. |a_t/K − r_t|    0.2254  0.1836  0.1409  0.1183  0.0987  0.0696  0.0435  0.0305  0.0097  0.0043