Let be a stationary time series, whose distribution depends on for some . Denote by the true parameter, which we want to estimate from observations of the time series. Maximum likelihood estimation (MLE) has been extensively used for parameter estimation, since under weak regularity conditions it is known to be asymptotically efficient. For many models, however, MLE is not always feasible tu carry out, due to a likelihood that may be intractable to compute, or maximization of the likelihood is difficult, or because the likelihood function is unbounded on . To overcome such problems, alternative methods have been developed, for instance, the generalized method of moments (GMM) in Hansen (1982), the quasi-maximum likelihood estimation (QMLE) in White (1982), and composite likelihood methods in Lindsay (1988).
In a similar vein, Feuerverger (1990) proposed an estimator based on matching the empirical characteristic function (chf) computed from blocks of the observed time series and the true chf. More specifically, given a fixed , the observed blocks of are
where . In that paper, a finite set of points in needs to be chosen as arguments for which the true and the empirical chf are compared. However, the practical choice of this set depends on the problem at hand and the asymptotic results derived in Feuerverger (1990) do not offer practical guidance for choosing these points. To overcome this limitation Yu (1998) and Knight and Yu (2002) considered a integrated weighted squared distance between the empirical and the true chfs.
This method has been used in a variety of applications; an interesting review paper, Yu (2004) contains a wealth of examples and references. More recent publications, where the method has been successfully applied to discrete-time models include Knight et al. (2002), Meintanis and Taufer (2012), Kotchoni (2012), Milovanovic et al. (2014), Francq and Meintanis (2016) and Ndongo et al. (2016). The method also applies to continuous-time processes after discretization and has been used prominently for Lévy-driven models. The book Belomestny et al. (2015) provides additional insight and references in this field.
All these papers assume the ideal situation that the chf has an explicit expression, as a function of . We call the corresponding parameter estimator the oracle estimator and use it for comparison with the two new estimators we propose in this paper for models whose chf is not available in closed form. Both these estimators are constructed from a functional approximation of the chf constructed from simulated sample paths of .
While much attention has been given to the choice of the integrated distance used when computing such estimators, which under some regularity conditions can achieve the Cramér-Rao efficiency bound (see eq. (2.3) of Knight and Yu (2002) and Proposition 4.2 of Carrasco et al. (2007)), the focus of our paper is on the practical and theoretical aspects that emerge when it is required to approximate the theoretical chf for parameter estimation.
Our first estimator is computed from a simple Monte Carlo approximation to replace the true, but unknown chf. This is similar to the simulated method of moments of McFadden (1989) and of the indirect inference method (Smith (1993) and Gourieroux et al. (1993)). In particular, indirect inference has been successfully applied in a variety of situations: parameter estimation of continuous time models with stochastic volatility (Bianchi and Cleur (1996), Jiang (1998), Raknerud and Skare (2012), Laurini and Hotta (2013) and Wahlberg et al. (2015)), robust estimation (de Luna and Genton (2001) and Fasen-Hartmann and Kimmig (2018)), and finite sample bias reduction (Gourieroux et al. (2000, 2010) and do Rego Sousa et al. (2018)).
More precisely, for many different , we simulate an iid sample of blocks denoted by
for , and define a simulation based parameter estimator, which minimizes the integrated weighted mean squared error, which is the integrated distance we use, between the empirical chf computed from the blocks (1.2) of the observed time series and its simulated version computed from a large number of simulated paths of the time series.
This is in contrast to the simulation based estimator defined in Section 5.2 of Carrasco et al. (2007), which is computed from one long time series path instead of the iid sample of blocks in (1.2). Since the Monte Carlo approximation of the chf here is computed from independent blocks, it should have smaller variance than the corresponding one for dependent blocks. By the same method in Carrasco et al. (2007), Forneron (2018) estimated the structural parameters and the distribution of shocks in dynamic models.
Indeed this gives a chf approximation which yields, by minimizing the integrated distance, strongly consistent and asymptotically normal parameter estimators. We also report their small sample properties for different models.
However, as the Monte Carlo approximation of the chf is computed from iid blocks from a time series, control variates techniques (see Glynn and Szechtman (2002) and Robert and Casella (2004)) provide an even more accurate approximation for the chf. Control variates techniques are classical variance reduction methods in simulation. The idea is to use a set of control variates, which are correlated with the chf. The method then approximates the joint covariance matrix of the control variates and the chf, and uses it to construct a new Monte Carlo approximation of the chf. We choose the first two terms in the Taylor expansion of the complex exponential , and for as control variates. This requires knowing the mean and covariance matrix of for .
In assessing the performance of both the Monte Carlo approximation and the control variates approximation of the chf, two trends emerge. First, both the Monte Carlo and the control variates approximations work better for small values of the argument. Second, the control variates approximation performs much better than the Monte Carlo approximation, in particular, for small values of the argument. As a consequence, we propose a control variates based parameter estimator whose integrated mean squared error distance distinguishes between small and large values of the argument.
Under regularity conditions we prove strong consistency of the proposed parameter estimators and asymptotic normality of the simulation based parameter estimator. We find that the simulation based parameter estimator is asymptotically normal with asymptotic covariance matrix equal to the one of the oracle estimator as derived in Knight and Yu (2002). From this we conclude that there cannot be any improvement in the limit law for the asymptotic normality of the control variates based estimator. However, we prove that it is computed from a better approximation of the chf. Thus, the control variates estimator improves the finite sample performance compared to the simulation based parameter estimator.
The finite sample performance of the estimators are investigated for two important models. We begin with a stationary Gaussian ARFIMA model, whose chf is explicitly known so that we can use the oracle estimator and compare its performance with the simulated based estimator. Their performance is comparable and also very close to the MLE, so in this model there is no need to use control variates. The second example is a nonlinear model for time series of counts, which has been proposed originally in Zeger (1988) and applied, for instance, for modeling disease counts (see also Campbell (1994), Chan and Ledolter (1995) and Davis et al. (1999)).
In the second example, the oracle estimator does not apply, since the chf of the vectorcannot be computed in closed form. For this model and different parameter sets, both the simulation based and the control variates based estimators perform satisfactory, and the control variates based estimator improves the performance of the simulation based estimator considerably. When compared with the composite pairwise likelihood estimator in Davis and Yau (2011), the control variates based estimator has comparable or even smaller bias.
Our paper is organized as follows. In Section 2 we present the oracle estimator, and the estimators computed from a Monte Carlo approximation and from a control variates approximation of the chf in detail. Here we also motivate the choice of the control variates used. The asymptotic properties of the two new estimators are established in Section 3. As all estimators are computed from true or approximated chf’s we assess their performance in Section 4, first for a Gaussian AR(1) process and then for the Poisson-AR process. Practical aspects of calculating the weighted least squares function are discussed in Section 5, as well as the estimation results for finite samples. In Section 5.1 we compare the oracle estimator, the simulation based parameter estimator and the MLE for a Gaussian ARFIMA model, whereas in Section 5.2 we compare the simulation based parameter estimator and the control variates based estimator for the Poisson-AR process. The proofs of Section 3 are given in the Appendix.
2 Parameter estimation based on the empirical characteristic function
Throughout we use the following notation. For we use the -norm: , where is the complex conjugate of . For and we denote by the -norm, but recall that in all norms are equivalent. Furthermore, denotes the usual Euclidean inner product in . For the symbols and denote its real and imaginary part. For a function its gradient is given by and .
2.1 The oracle estimator
Let be a stationary time series process, whose distribution depends on for some . Denote by the true parameter, which we want to estimate, and suppose that we observe . Given a fixed , define for the -dimensional blocks
where . The observed blocks corresponds to
which can be used to estimate the empirical characteristic function (chf), defined as
Under mild conditions such as ergodicity, converges a.s. pointwise to the true chf for all . We assume that is chosen in such a way that uniquely identifies the parameter of interest . The idea of estimating from a single time series observation by matching the empirical chf of blocks of the observed time series and the true one has been proposed in Yu (1998) and Knight and Yu (2002), and we use the one in Knight and Yu (2002), where the oracle estimator of is defined as
with suitable weight function such that the integral is well-defined, and chf
In an ideal situation, has an explicit expression, which is known for all .
2.2 Estimator based on a Monte Carlo approximation of
Unfortunately, a closed form expression of the chf is for many time series processes not available. However, it can be approximated by a Monte Carlo simulation, and an idea borrowed from the simulated method of moments (McFadden (1989), see also Smith (1993) and Gourieroux et al. (1993) for a similar idea in the context of indirect inference) is to replace by its functional approximation constructed from simulated sample paths of . For many different , we simulate, independent of the observed time series, an iid sample of the blocks in (2.1) denoted by
for , and define the Monte Carlo approximation of based on these simulations as
If we replace in (2.4) by , we obtain the simulation based parameter estimator
with suitable weight function such that the integral is well-defined.
An alternative estimate to (2.6) of the chf is based on generating one long time series path and use the empirical chf of the consecutive blocks of
-dimensional random variables constructed as in (2.1). While being unbiased, the estimate will generally have larger variance than the estimate proposed in (2.6) using independent blocks of random variables. Nevertheless, in some cases when it is expensive to generate realizations even of size , such as the case when a long burn-in is required to achieve stationarity, it may be computationally more efficient to generate one long series. While we do not pursue this approach here, the technical aspects of using one large realization is not much different than the estimate based on independent replicates as in (2.6).
Since is based on iid time series blocks, we can reduce its variance further using control variates to produce an even more accurate approximation for the chf. This will result in an improved version of .
2.3 Estimator based on a control variates approximation of
The estimator in (2.7) requires only that the stationary time series process can be simulated, and is therefore easily applicable to a large class of models. When computing of (2.8), it is very important that the error
in approximating the true chf is small, since it propagates to . In order to reduce the variance of the empirical chf , we use the method of control variates, as often used variance reduction technique in the context of Monte Carlo integration (Glynn and Szechtman (2002), Oates et al. (2017), Portier and Segers (2018)).
We construct a control variates approximation of from the iid sample , , as in (2.5). We also require explicit expressions for the moments for and .
Recall that for all , so that both random variables have the same moments. As in Portier and Segers (2018), we denote by the distribution of the block and by its empirical version. For example, if for , we want to provide a good approximation for
To apply the control variates technique, we need control functions, which are correlated with and whose expectations are known. We use the first two terms in the Taylor series of the complex function , which suggests the vector of control functions , where for ,
so that , the zero vector in . The Monte Carlo approximation of based on the iid sample , , is then
Since , the Monte Carlo approximation is unbiased and has variance
Then for every vector , we have that
is also an unbiased estimator of. Since , , is an independent sample,
and, if we differentiate the map with respect to and set it equal to zero, we obtain (cf. Approach 1 in Glynn and Szechtman (2002)) the theoretical optimum
provided the inverse exists. In this case, the estimator
has minimal asymptotic variance. In order to investigate the existence of the above inverse note that for each fixed and ,
Since by the Cauchy-Schwarz inequality,
it follows (see e.g. Klenke (2013), Theorem 5.8) that
for some with . As the scalar product is random, universal coefficients to satisfy the right-hand side of (2.14) exist only in degenerate cases, which we do not consider.
For the iid sample , as in (2.5) we obtain the control variates approximation of given by
Recall from (2.10) that , so we could simply replace in (2.8) by as given in (2.16). However, as we shall see in Section 4, the control variates approximation provides superior approximations of only for values of , for which is small. Thus, we replace in (2.8) by a combination of and . More precisely, we propose the following control variates based estimator:
where for appropriate ,
with suitable weight function such that the integral is well-defined. Note that
and . The choice of the indicator function is justified by the fact that, when estimating the parameter , we focus on approximations of for close to .
3 Asymptotic behavior of the parameter estimators
Before performing the parameter estimation we need to make sure that the parameters are identifiable from the model. For the estimators we propose, we require simply that the chf uniquely identifies the parameter of interest. This will always hold true for the examples we consider later on.
In the sequel, we will make various assumptions on different aspects of the underlying process, smoothness of the model, moments of the process, and properties of the weight function. We group these assumptions into the following categories.
Assumptions A (Parameter space and time series process).
is a compact subset of and , the interior of .
is a stationary and ergodic sequence.
is -mixing with rate function satisfying for some .
Assumptions B (Continuity and differentiability in ).
For each , the map is continuous on .
For each , the map is twice continuously differentiable in an open neighborhood around .
Assumptions C (Moments).
Assumptions D (Weight function).
for some .
Assumption B is indeed satisfied by many linear and non-linear time series processes, in particular, when they have a representation or for iid noise variables , and is a measurable function. Prominent examples are the MA and AR representations of a causal or invertible ARMA model (see e.g. eqs. (3.1.15) and (3.1.18) in Brockwell and Davis (2013)) or the ARCH representation of a GARCH model (see e.g. Francq and Zakoïan (2011), Theorem 2.8). In this case, assumptions 1 and 2 will hold whenever the map is continuously differentiable for . For example, if is Lipschitz-continuous for , then the continuity assumption 1 holds.
The key asymptotic properties, consistency and asymptotic normality of our estimates are stated in the following theorems. The proofs of these results are postponed to the appendix.
We formulate first the strong consistency results of the parameters.
Theorem 3.2 (Consistency of ).
The asymptotic normality of the simulation based parameter estimator reads as follows.
Theorem 3.3 (Asymptotic normality of ).
Theorem 3.3 shows that is asymptotically normal and achieves the same asymptotic efficiency as the oracle estimator from (2.3). Therefore, there cannot be any improvement in the limit law for the asymptotic normality of . However, as we show in Section 4 it is based on a better approximation of the chf than that used for . Thus, the control variates estimator improves the finite sample performance compared to the simulation based estimator .
4 Assessing the quality of the estimated chf
In this section we compare the performance of both the Monte Carlo approximation and the control variates approximation of the chf as defined in (2.6) and (2.16), respectively. We start with the following comparison of the two chf approximations.
and the almost sure convergence of both terms.
The quantities needed to compute the estimator in (2.15) are, for each :
Hence, strong consistency of follows from the SLLN. This together with implies by Theorem 1 in Glynn and Szechtman (2002) that, as ,
with as defined in (2.11). Therefore, provides an approximation of the integral in (2.4) with smaller variance than . As a consequence, this favors the control variates estimator over the simulation based estimator for large sample sizes .
For all forthcoming examples we choose and . We begin with a stationary Gaussian AR(1) process, where we know the chf explicitly, and then proceed to the Poisson-AR process, where we approximate the true unknown chf by a precise simulated version.
4.1 The AR(1) process
We start with a stationary Gaussian AR(1) process to show how the method of control variates improves the Monte Carlo approximation of its chf. Let be the AR(1) process
with parameter space being a compact subset of . Then the true chf of is given by
where the covariance matrix is explicitly known and identifies the parameter uniquely; see e.g. Brockwell and Davis (2013), Example 3.1.2. For a fixed and many we compute the absolute errors
where is the Monte Carlo approximation of the chf of and its control variates approximation. To understand how well we can approximate , we plot in Figure 1, and against for different parameters . These quantities are computed from an iid sample as in (2.5). To simulate iid observations from the model (4.3), we use the fact that the one-dimensional stationary distribution is , and then use the recursion in (4.3) to simulate and . We chose randomly generated values of from the -dimensional Laplace distribution with chf given in (5.2).
It is clear from Figure 1 that both the Monte Carlo and the control variates approximations work better when is small, and also that the control variates approximations are best for small values of . The superiority of the control variates approximation for all and all parameter settings is clearly visible, and already expected from Remark 4.1.
4.2 The Poisson-AR model
We consider a nonlinear time series process for time series of counts, which has been proposed originally in Zeger (1988). A prototypical Poisson-AR(1) model suggested in Davis and Rodriguez-Yam (2005) assumes that the observations
are independent and Poisson-distributed with meanswhere the process is a latent stationary Gaussian AR(1) process, given by the equations
with parameter space being a compact subset of . The parameter is uniquely identifiable from the second order structure, which has been computed in Section 2.1 of Davis et al. (2000).
For this model, the true chf of cannot be computed in closed form. To mimic the assessment of the errors in eq. (4.4), we simulate iid observations from by first simulating a Gaussian AR(1) process (as described in Section 4.1) and then simulating independent Poisson random variables with means , and , respectively. From this we compute the empirical characteristic function and take it as in the absolute error terms (4.4).
Now, as in Section 4.1, we compare the performance of both the Monte Carlo approximation and the control variates approximation of the chf. Figure 2 presents the results. The plots in Figure 2 are also in favor of the control variates approximation, when compared to the Monte Carlo approximation.