Many long run variance estimators used in stationary time series, steady state simulation, and Markov chain Monte Carlo (MCMC) applications include a lag window. This paper proposes lugsail lag windows whose shape resembles the similarly named lugsail (a fore-and-aft, four-cornered sail that is suspended from a spar or yard). The proposed lag window is fast to calculate and general enough to include the Bartlett lag window and Bartlett flat-top lag windows (Anderson, 1971; Politis and Romano, 1995, 1996).
The distinguishing feature of lugsail windows (see Figure 1) is they increase above one before decreasing back down to zero. We are not familiar with other lag windows with this property. Berg and Politis (2009) state “…there is no benefit in allowing the window to have values larger than 1.” As we illustrate here, the benefit of values larger than one is they offset downward bias commonly encountered with other lag windows (see e.g. Chien et al., 1997; Flegal and Jones, 2010; Priestley, 1981), and hence provide more accurate estimates. In short, lugsail windows induce upward bias in finite sample settings, while remaining asymptotically unbiased as long as the truncation point is allowed to increase as the sample size increases.
Let be a stationary ergodic process with stationary distribution defined on a space . For , interest is in estimating . Let , then the ergodic average
with probability 1 as. Quality of is assessed by the long run variance, .
Most often, spectral variance (SV) methods are used to estimate by weighting and truncating the sample lag covariances. That is, let be a truncation point, be a lag window at lag , and let be the sample lag covariance matrix at lag . Then the multivariate SV estimator is,
Although is asymptotically unbiased, it can exhibit significant bias in finite samples. This bias comes from (i) ignoring lag covariances above the truncation point and (ii) downweighting lag covariances in the lag window . The flat-top lag window removes the first-order bias term due to lag windows via cancellation. However, larger order (negative) bias due to the truncation point remains unaccounted for. Simply choosing close to leads to highly undesirable finite sample properties as noted by Priestley (1981) and others. In fact, for strong consistency of , should be chosen so that both and increase to as (Damerdji, 1991).
For and , we define the lugsail lag window as
When or , (2) is the Bartlett window and when , (2) is the general flat-top lag window of Politis and Romano (1995, 1996). The key distinguishing feature of lugsail lag windows is they increase above one until , before decreasing down to zero. In short, lugsail lag windows propose a simple and novel “overweighting” of the initial lag covariances, which offsets some (if not all) of the small order bias terms in small runs while retaining asymptotic unbiasedness, mean square consistency, and strong consistency.
Downward biased long run variance estimators critically impact sequential stopping rules for computer-generated data (e.g. steady state and MCMC simulations). In these settings, simulation is terminated when is relatively small and downward bias causes early termination of simulation and under-coverage of confidence regions. An incomplete list of finite sample MCMC simulations exhibiting this phenomenon is Flegal et al. (2008), Gong and Flegal (2016), and Jones et al. (2006).
Other asymptotically conservative estimators of have been proposed by Dai and Jones (2017), Geyer (1992), and Kosorok (2000). However, these estimators are computationally intensive and retain significant asymptotic bias by design. Multivariate SV estimators also require significant computational effort compared to the multivariate batch means (BM) estimator of Chen and Seila (1987) and weighted BM estimators of Liu and Flegal (2018b).
We consider lugsail lag windows in SV and weighted BM estimators where we derive variance and bias expressions. Our results focus on Markov chains, but generalize to steady state simulations (see e.g. Damerdji, 1995)
. Our bias results significantly weaken existing conditions, that is, we require only polynomial ergodicity and a little over two finite moments of, instead of uniform ergodicity and twelve finite moments. Further, results for the lugsail BM estimator yield an expression of bias and variance for the multivariate BM estimator. This alone is a significant contribution since it proves mean square consistency for the traditional multivariate BM estimator, a problem only addressed in the univariate case under stronger assumptions (Damerdji, 1995; Flegal and Jones, 2010; Song and Schmeiser, 1995).
Lugsail estimators can be expressed as a linear combination of strongly consistent estimators. This observation yields strong consistency of lugsail estimators, a necessary condition for asymptotic validity of sequential stopping rules (Glynn and Whitt, 1992). Unfortunately, this form can lead to estimators that are not positive-definite in finite samples. We propose a slight modification ensuring positive-definiteness which retains the large sample properties. Our proposed solution is applicable for BM, weighted BM, and SV estimators, which are also not necessarily positive-definite in finite samples. This issue has not been addressed previously, since under strong consistency, the estimators yield positive-definite estimators as .
We illustrate the finite sample properties of lugsail BM estimators in three examples. First, we consider the autoregressive process of order 1 where we illustrate lugsail estimators yield closer to nominal coverage probabilities than BM and weighted BM. Our second example implements a Bayesian probit regression with the lupus data from van Dyk and Meng (2001). Again the lugsail estimators provide larger coverage probabilities, especially in small sample settings. Finally we consider a Bayesian dynamic spatial-temporal model where, due to the large lag covariances, the BM estimator significantly underestimates . The lugsail estimator in this case, improves on the weighted BM estimator with minimal cost.
2 Bias and batch means
Since our focus is on MCMC, we require Markov chain mixing conditions. For and , denote as the -step transition kernel for the -ergodic Markov chain. The chain is polynomially ergodic of order if there exists an such that and for all
is the total variation norm. A central limit theorem (CLT) holds under polynomial ergodicity with sufficient moment conditions(Jones, 2004), i.e.
Provided an estimate of is available, (3) can be used to make large sample confidence regions around .
Let , where is the number of batches and is the size of each batch. For
, the mean vector for batchof size is . The multivariate BM estimator is
where the subscript denotes the batch size used to construct the estimator. Chen and Seila (1987) first introduced the multivariate BM estimator, and it was further used in the works of Charnes (1995), Muñoz and Glynn (2001), and Vats et al. (2017). However, its bias and variance have not been studied outside of the univariate case ().
Let the th element of , , and be denoted by , , and , respectively. We first show the large sample bias for the multivariate BM estimator for a class of ergodic processes, which generalizes Proposition 1 in Song and Schmeiser (1995) who only considered the case.
If for all , , then,
Let denote Euclidean norm. We now provide a specific result for Markov chains. Flegal and Jones (2010) obtain the bias for the univariate case. However, they assume uniform ergodicity and , both of which are considerably stronger than what we require.
Suppose for some . If is a polynomially ergodic Markov chain of order for some , then , and hence
Let be a strictly stationary stochastic process on a probability space and set . Define the -mixing coefficients for as
Since the chain is polynomially ergodic, as . In addition, due to Ibragimov and Linnik (1971, Theorem 17.2.2) there exists which depends on the moments of and such that,
The diagonals of the bias matrix, , are negative in the presence of positive autocorrelation in the components of the Markov chain, so that for a fixed , on average, the diagonals are underestimated by BM. For , can be positive or negative depending on the cross-correlation in the Markov chain and target distribution. From Dai and Jones (2017, Proposition 1), is negative-definite for reversible Markov chains, .
Under the assumption of positive autocorrelation, even for non-reversible Markov chains, the diagonals of
exhibit negative bias, and thus the sum of the eigenvalues exhibit negative bias. In practice, we have found that this is enough to yield a negative determinant bias for the BM estimator. The aim of lugsail lag windows is to offset this negative bias.
The integer sequence is such that and as , and both and are nondecreasing.
3 Lugsail batch means
By construction, BM does not use a lag window. However, recently Liu and Flegal (2018b) proposed a family of weighted BM estimators that generalizes the BM estimator in (4). For , let , then the weighted BM estimator is
Under the conditions of Theorem 2,
For positive autocorrelation when , the lugsail BM yields positive bias. As , , and so the bias goes to 0. When and , the positive bias equals the negative bias of the BM estimator.
Strong consistency follows from the continuous mapping theorem. See Vats et al. (2017) for conditions for strong consistency of .
If with probability 1 as , then with probability 1 as .
We require a strong invariance principle (SIP) to hold for the ergodic process to establish variance of the lugsail BM estimator and show mean square consistency. Let be a -dimensional standard Brownian motion.
There exists a lower triangular matrix , a nonnegative increasing function
on the positive integers, a finite random variable, and a sufficiently rich probability space such that for almost all and for all , with probability 1,
Here is such that . A SIP is known to hold for many processes, including regenerative processes, -mixing, and strongly mixing processes (for discussion see Vats et al., 2018). The results of lugsail estimators hold for processes that satisfy (8), but our interest is in polynomially ergodic Markov chains.
The following theorem establishes the variance of the lugsail BM estimator, whose proof is in the appendix.
Theorem 6 provides variance expressions for the BM estimator and the weighted BM estimator with a flat-top window, both of which were unknown for the multivariate case. Specifically, for BM ( or )
and for weighted BM with a flat-top Bartlett window ()
The variance is an increasing function of both and . As in Politis and Romano (1995, 1996), we find yields a reasonable variance of the estimator. We recommend to induce finite sample over-bias. Then the ratio of the variances with , for weighted BM with the flat-top Bartlett lag window and lugsail BM estimators, is
As we show in Section 6, the variability gain is minimal compared to benefits of finite sample properties for lugsail estimators.
Since both the bias and variance go to zero as , we obtain mean square consistency of the lugsail BM estimator.
Let for some and let be a polynomially ergodic Markov chain of order for . Then for all and Assumption 2 holds with for . If in addition and as , then as
3.1 Degrees of freedom
Estimators of can be used to construct large sample confidence regions using the large sample approximation
where is a Hotelling’s distribution with dimensionality parameter. As , since , .
, the difference between the quantiles ofand grows as increases (Figure 2). Hence an estimate of is necessary, where our solution is similar in spirit to those of Chien et al. (1997) and Glynn and Whitt (1991). Under consistency of the BM estimators, the CLT in (3), and uniform integrability, is approximately a Wishart distribution for large with scale matrix and degrees of freedom. Using method of moments, we obtain the following approximation,
4 Lugsail spectral variance
The lugsail lag window most naturally fits in the class of SV estimators defined at (1). Using the lugsail lag window and a little algebra, we obtain
where and are SV estimators with a Bartlett lag window with truncation points and , respectively. That is, the lugsail SV estimator can be obtained as a linear combination of two SV estimators similar to the weighted BM estimator. An application of the continuous mapping theorem yields strong consistency, where sufficient conditions for strong consistency of can be found in Vats et al. (2018).
If with probability 1 as , then with probability 1 as .
Studying bias and variance of SV estimators is quite challenging, but recent work by Liu and Flegal (2018a) can be used. They consider an alternative formulation of that differs only in some end effects. Let for , then set
Liu and Flegal (2018a, Theorem 1) provides conditions under which with probability 1 as . In order to study the properties of , we will study the properties of with the lugsail lag window. Let and be the th element of and , respectively.
Suppose for some . If is a polynomially ergodic Markov chain of order for some , then,
For any lag window satisfying , under uniform ergodicity and , Liu and Flegal (2018a) show that since ,
By Theorem 2, under the weaker conditions described here. The lugsail lag window is piece-wise linear and continuous, so is 0 except for the end points and non-differentiable points. So and and for all other . Thus is satisfied for the lugsail lag window. ∎
Variance of the lugsail SV estimator can be obtained as follows.
As in the case of lugsail BM, the variance is an increasing function of both and . To keep the variance restricted but also introduce positive bias, we recommend and . For this setting, the multiplicative constant in the variance is . Although this is approximately a half of the constant for lugsail BM, Liu and Flegal (2018b) exhaustively show SV estimators are more computationally demanding than BM estimators. For this reason, we do not implement the SV lugsail estimators in our examples.
5 Corrections for positive-definiteness
Finite sample estimates of may not be positive-definite. For example, may be negative in the case of weighted BM estimators and need not be positive-definite in the case of SV estimators. We propose an adjustment on the lines of Jentsch and Politis (2015) to the multivariate estimators where the adjusted estimator retains the large sample properties of the original.
Let be any estimator of and , that is, is the diagonal matrix of the univariate variance estimates. Consider the correlation matrix corresponding to , denoted . Note is a symmetric matrix with real-valued entries, and hence the eigenvalue decomposition exists. Here is a orthogonal matrix and is the diagonal matrix of eigenvalues of .
If is not positive-definite, then some eigenvalues . To correct for this, define for and . Then as and due to positive-definiteness of . Let , then the adjusted estimator is
The constants and are chosen by the user. We suggest and , which work well in practice and do not require tuning for different problems. Note these choices imply as .
In the following examples we compare the performance of lugsail BM with traditional BM and weighted BM with the flat-top window. We compare the determinants of the estimators and coverage probabilities of confidence regions over independent replications. Confidence regions are constructed using (9). Lugsail and weighted BM estimators are implemented in the GitHub development version of the R package mcmcse (Flegal et al., 2017).
6.1 Vector autoregressive process
We consider the vector autoregressive process of order 1. Let be a matrix with spectral norm less than 1 and let be a positive definite matrix. For let and ,
The stationary distribution here is where . In addition, the chain is geometrically ergodic when the spectral norm of is less than 1 (Tjøstheim, 1990). We are interested in estimating . The true value of is available in closed form (Vats et al., 2018).
We set for and to be the AR correlation matrix with coefficient .9. These choices of the parameters ensure that each individual component of is a univariate autoregressive process, however,
still inherits sufficient cross-correlations for multivariate analysis of Monte Carlo error to be pertinent.
We first do a univariate analysis with varying choices of. We compare the estimators for run lengths of and , and set . Figure 3 shows for , BM estimators underestimate the variance for large values of , however the lugsail BM estimator overestimate the variance until (the left y-axis is on scale). After , the small order bias terms are significant; even still the lugsail estimators yield larger coverage probabilities than BM and weighted BM. For , all estimators perform similarly, except for , where the BM estimator exhibits under coverage.
For the multivariate process, we set and . Results are in Table 1. For small sample sizes, the lugsail estimator is conservative, as it is designed to be. However, note at small sample sizes the determinant of the estimated is closer to the truth. The conservative coverage probability is largely due to the higher variance of the lugsail estimator, yielding smaller degrees of freedom for . However, as the number of iterations increase, the coverage probabilities decreases due to consistency of the estimators. The weighted BM estimator is certainly an improvement on the BM estimator but still underestimates the true variance.
|0.761 (0.00575)||0.930 (0.00206)||0.973 (0.00083)|
|0.856 (0.00390)||0.893 (0.00302)||0.927 (0.00214)|
|0.883 (0.00327)||0.894 (0.00300)||0.905 (0.00272)|
|Determinant to the th root|
|68.1 (0.100)||78.8 (0.160)||88.4 (0.174)||89.7|
|82.7 (0.068)||86.3 (0.109)||91.1 (0.120)||89.7|
|87.4 (0.039)||88.5 (0.063)||89.9 (0.070)||89.7|
Figure 4 shows confidence ellipsoids constructed for the first and second component of , for Monte Carlo sample sizes of and using one replication. For Monte Carlo sample size , the ellipse created by the lugsail estimator is slightly larger than the truth in the principal direction, whereas the weighted BM slightly undercovers the regions and the BM estimator severely undercovers. For , the difference between the estimators is negligible.
6.2 Bayesian probit regression
Consider the lupus data from van Dyk and Meng (2001)
which deals with the occurrence of latent membranous lupus. The response variableis an indicator of disease (1 if present) and the covariates are the difference between IgG3 and IgG4 (immunoglobulin G), and IgA (immunoglobulin A), , for . We consider the model with intercept and both covariates,
Set . We assign a flat prior to , so and the resulting posterior distribution is
where . Roy and Hobert (2007) show that is proper. Our goal is to estimate the posterior expectation of , . To sample from the posterior distribution, we use the PX-DA algorithm of Liu and Wu (1999) shown to be geometrically ergodic by Roy and Hobert (2007).
We study coverage probabilities and determinants of estimators of over 1000 replications. The “truth” is declared by running the Markov chain for one long run of . We set . Figure 5 shows boxplots of the estimated determinants to the th root over 1000 replications for Monte Carlo samples sizes and . As discussed in Theorem 6, the weighted BM and lugsail BM estimators exhibit larger variability, however, the size of the estimators is uniformly larger than the BM estimator. This difference between the size of the estimators decreases for the larger Monte Carlo sample size of . The lugsail estimator yields higher coverage probabilities throughout.
6.3 Bayesian dynamic spatial-temporal model
Consider the dynamic spatial-temporal model as described by Finley et al. (2015). Let be location sites and be time-points; denotes the observed measurement at location and time . In addition, let be the vector of predictors, and be the vector of coefficients. For ,
where is a spatial Gaussian process with covariance function , is the spatial variance component, and is the correlation function with exponential decay. Both and emulate the Markovian nature of dependence in time. Prior distributions for are those described in the spDynLM function in the spBayes R package. We fit the model to a subset of NETemp dataset in the spBayes package. This dataset has monthly temperature measurements and elevation from 10 weather stations on the east coast of USA. The resulting posterior of has 185 components. Due to the Markovian time dependence in and , the posterior distribution exhibits significant correlation across its components.
We use the conditional Metropolis-Hastings sampler with default hyper parameters implemented in the spDynLM function to sample from the posterior. The rate of convergence of this sampler is unknown. Our goal is to estimate the posterior expectation of . The “true” posterior mean was calculated by running 1000 parallel chains of size
We calculate coverage probabilities and size of the estimators of with . Since is fairly large the corrections for positive-definiteness were required. Figure 6 shows boxplots of the determinant of estimators to the th root for and samples. After correcting for positive-definiteness, the weighted BM and lugsail estimators still have approximately double the variability of the BM estimator. The determinant for the weighted BM and the lugsail BM estimators is significantly larger than the BM estimator, in part also due to the corrections for positive-definiteness. The BM estimator significantly undercovers, while the lugsail estimator yields conservative coverage. Note that the average determinant for the weighted BM estimator at is almost equal to the average determinant of the lugsail BM estimator at iterations. Thus, it takes approximately times the effort for the weighted BM estimator to overcome the small order bias terms.
MCMC procedures typically involve careful tuning and analysis to ensure reasonable rates of convergence to stationarity. In addition, MCMC procedures are computationally involved and not easily parallelizable due to their sequential nature. Given this complex and involved implementation process, quantification of Monte Carlo standard errors has not received enough attention. To this end, the lugsail lag window provides fast, practical, and more accurate estimators of the Monte Carlo standard error, equipping users with the tools to assess the quality of their MCMC samples.
Computational efficiency of lugsail batch means estimators is a direct consequence of the lugsail lag window being piece-wise linear. This Bartlett-like feature is by design but could easily be replaced by non-linear windows (e.g. Tukey-Hanning or Prazen windows) with a likely loss in computationally efficiency. In essence, allowing lag windows to increase above one can now be experimented with many other existing windows.
As discussed, the lugsail lag window is appropriate for applications outside of MCMC. In time series, lag windows are used in heteroskedasticity and autocorrelation consistent covariance matrix estimation in linear and nonlinear models with dependent errors (Andrews, 1991; De Jong, 2000; Hansen, 1992). These are available in the Econometrics toolbox in Matlab and the sandwich package in R. In steady state simulations, lag windows for spectral variance estimators have long been a part of output analysis (Priestley, 1981; Welch, 1987). The Signal Processing Toolbox in Matlab and spec.pgram function in the base R package implement these variance estimators. The lugsail lag window estimators can be easily implemented by two calls of the respective functions in these packages. Specifically for MCMC, lugsail batch means estimators are available in the GitHub development version of the R package mcmcse.
Appendix A Proof of Theorem 6
If and are jointly normally distributed random variables with mean vector
are jointly normally distributed random variables with mean vector, such that
Proposition 2 (Janssen and Stoica (1988)).
If and are jointly normally distributed with mean 0, then
Recall that is a -dimensional standard Brownian motion. Let denote the th component of the vector and let , and . Recall where is the lower triangular matrix in (8). Define the -dimensional scaled Brownian motion and let be the th component of . In addition, define and .
Consider the Brownian motion equivalent of the lugsail BM estimator,
Let , so that