Local Polynomial Estimation of Time-Varying Parameters in Nonlinear Models

04/10/2019 · by Dennis Kristensen, et al. · UCL

We develop a novel asymptotic theory for local polynomial (quasi-) maximum-likelihood estimators of time-varying parameters in a broad class of nonlinear time series models. Under weak regularity conditions, we show the proposed estimators are consistent and follow normal distributions in large samples. Our conditions impose weaker smoothness and moment conditions on the data-generating process and its likelihood compared to existing theories. Furthermore, the bias terms of the estimators take a simpler form. We demonstrate the usefulness of our general results by applying our theory to local (quasi-)maximum-likelihood estimators of time-varying VARs, ARCH and GARCH models, and Poisson autoregressions. For the first three models, we are able to substantially weaken the conditions found in the existing literature. For the Poisson autoregression, existing theories cannot be applied, while our novel approach allows us to analyze it.


1 Introduction

We provide a novel asymptotic theory for local polynomial estimators of time-varying parameters in a broad class of non-linear time series models. The theory imposes very little structure on the chosen objective function used for estimation and on the underlying model being estimated. In particular, in contrast to the existing literature on kernel-based estimation of time-varying parameters, we impose substantially weaker smoothness and moment conditions on the likelihood and its derivatives. For example, in the case of local linear estimators we do not require the existence of so-called derivative processes, and for the local constant version we only need the first-order derivative process to exist, while the existing literature requires higher-order derivatives to be well-defined. Finally, again compared to existing theories, our results hold under much weaker restrictions on the bandwidth sequence used in the estimation, thereby allowing standard bandwidth selection procedures to be used. These features of our theory in turn imply that our asymptotic results take a simpler form and more closely resemble those found in the literature on local maximum likelihood estimation in a cross-sectional setting. Our theory also applies to GARCH-type models, and for this class we show that additional biases appear due to the local polynomial approximation being less precise.

We demonstrate the aforementioned attractive features of our theory in two ways: First, we revisit some specific models that have been analyzed elsewhere in the literature and show that our theory allows us to substantially weaken the existing regularity conditions under which the estimators are well-behaved. Second, we apply our theory to models that fall outside the framework of existing theories. A simulation study investigates the finite-sample performance of the estimators, and an empirical application shows the usefulness of the proposed methodology in practice.

To motivate and further discuss our results, consider the following class of models,

(1) $y_t = g\big(x_t, \varepsilon_t; \theta(t/n)\big), \qquad t = 1, \dots, n,$

where $y_t$ and $x_t$ are observed, $\varepsilon_t$ is an unobserved error, and $\theta_t = \theta(t/n)$ is a sequence of possibly time-varying parameters generated by an underlying function $\theta: [0,1] \to \Theta$. Here, $x_t$ may contain lags of $y_t$, and so the above class of models includes finite-order Markov models. However, our theory goes beyond the above and also covers many other models, such as generalized autoregressive models that include, for example, GARCH as a special case. Assuming that

$\theta(\cdot)$ is a smooth deterministic function, we develop and analyze nonparametric estimators of $\theta(u)$ for any given $u \in [0,1]$. Our proposed estimation method is based on the local maximum likelihood principle (see Tibshirani and Hastie 1987 and Fan et al. 1995): It takes as input a given (quasi-)likelihood function of the model in the stable case where $\theta$ is assumed constant. We then develop a kernel-weighted version of this objective function in which $\theta(t/n)$ is approximated by a polynomial in $t/n$ around $u$. Maximizing this w.r.t. the coefficients of the polynomial, we arrive at a local polynomial estimator of $\theta(u)$ and its derivatives.

We develop a novel asymptotic theory showing that the polynomial estimators are pointwise (in time) consistent and asymptotically normally distributed. The proof strategy pursued here differs from the standard one found in the existing literature in that we rely on an alternative expansion of the score function in order to obtain expressions for the leading bias and variance components. This allows us to obtain simpler expressions for the leading bias and variance terms under weaker regularity conditions compared to, e.g., Dahlhaus et al. (2017) and the references therein.

Our estimation method includes as special cases the local constant estimator and the local linear estimator. We find that the local constant estimator suffers from additional biases in the interior of the domain compared to the local linear estimator, with its bias involving the so-called derivative process of the stationary approximation to the data. Moreover, the local linear estimator enjoys the well-known automatic boundary adjustment property, meaning that at the beginning and end of the sample this estimator will perform better than the local constant one.

Our general theory encompasses most existing results for nonparametric estimators of models with time-varying parameters, which are mainly for local constant estimators (see, e.g., Kristensen (2012), Robinson (1989), Dahlhaus and Subba Rao (2006) and Fryzlewicz et al. (2008)), and in many cases leads to weaker conditions for existing results to hold. We demonstrate this feature by revisiting specific models analyzed in these papers and showing that their asymptotic results carry through under substantially weaker moment and parameter restrictions. Moreover, it allows us to analyze estimators of models that, as far as we can tell, cannot be handled by existing theory, such as Poisson autoregressions with time-varying parameters. Our theory also contributes to the literature on the asymptotic analysis of local polynomial estimators of varying-coefficient models by extending existing results (as in Fan et al. 1995 and Loader (2006)) to cover situations where the objective function is non-concave. This is an important extension since the quasi-likelihoods of most non-linear models are non-concave, and the analysis of this case requires some new technical tools.

The remainder of the paper is organized as follows: Framework and estimators are introduced in Section 2. Section 3 presents the asymptotic theory of the estimators. In Section 4, we extend the theory to cover GARCH-type models. We then apply our general theory to particular models in Section 5. We present the results of two simulation studies and an empirical application in Sections 6 and 7, respectively. All lemmas and proofs have been relegated to the Appendix.

2 Framework

We are given $n$ observations, $y_1, \dots, y_n$, from a nonlinear time-series model with associated (quasi-) log-likelihood $\ell_t(\theta)$, where $\theta \in \Theta \subseteq \mathbb{R}^d$. The quasi-likelihood is assumed to identify the data-generating parameters when these are in fact constant. That is, when $\theta_t = \theta$ is constant, the data-generating parameter value is the maximizer of $\mathbb{E}[\ell_t(\theta)]$. A natural estimator in the time-invariant case would then be the M-estimator maximizing the sample analogue, $n^{-1}\sum_{t=1}^{n}\ell_t(\theta)$. The choice of $\ell_t$ is, of course, model specific. For example, in a regression setting, we could choose $\ell_t$ as the (negative) least squares criterion, while in (G)ARCH models it could be the Gaussian (quasi-)log-likelihood.

Now, returning to the case where $\theta_t = \theta(t/n)$ is potentially varying over time, we wish to estimate $\theta(u)$ for some given value $u \in (0,1)$. We propose to do this using local polynomial estimators, where $\theta(v)$ is approximated by the following polynomial of order $p \geq 0$ for $v$ in a neighborhood of $u$,

(2) $\theta(v) \approx \beta_0 + \beta_1 (v - u) + \cdots + \beta_p (v - u)^p,$

where $\beta = (\beta_0', \beta_1', \dots, \beta_p')'$ with $\beta_j = \theta^{(j)}(u)/j!$ and $\theta^{(j)}(u) = d^j\theta(u)/du^j$.

Next, to control the approximation error, we introduce a kernel-weighted version of the global quasi-log-likelihood and substitute in the polynomial approximation,

$L_n(\beta; u) = \frac{1}{n}\sum_{t=1}^{n} K_b(t/n - u)\,\ell_t\big(\beta_0 + \beta_1(t/n - u) + \cdots + \beta_p(t/n - u)^p\big),$

where $K_b(z) = K(z/b)/b$, $K$ is a kernel function, and $b > 0$ a bandwidth. We then estimate the polynomial coefficients by

$\hat{\beta}(u) = \arg\max_{\beta \in \mathcal{B}_n} L_n(\beta; u),$

where the parameter space $\mathcal{B}_n$ will be specified below, so that $\hat{\theta}(u) = \hat{\beta}_0(u)$ and $\hat{\theta}^{(j)}(u) = j!\,\hat{\beta}_j(u)$, $j = 1, \dots, p$. When $p = 0$, we recover the standard local-constant estimator.
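To fix ideas, the following minimal sketch implements the estimator for a simple time-varying AR(1) model estimated by Gaussian quasi-likelihood. The model, the Epanechnikov kernel choice, and all function names are our own illustrative assumptions, not code from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def epanechnikov(z):
    """Epanechnikov kernel: compactly supported, non-negative and
    symmetric, in line with Assumption 1 below."""
    return 0.75 * (1.0 - z**2) * (np.abs(z) <= 1.0)

def local_poly_qmle(y, u, b, p=1):
    """Local polynomial Gaussian QMLE of theta(u) in the illustrative
    time-varying AR(1) model y_t = theta(t/n) y_{t-1} + eps_t.
    Returns (beta_0, ..., beta_p); beta_0 estimates theta(u) and
    j! * beta_j estimates the j-th derivative of theta at u."""
    n = len(y)
    v = np.arange(1, n) / n - u               # t/n - u for t = 1, ..., n-1
    w = epanechnikov(v / b) / b               # kernel weights K_b(t/n - u)
    V = np.vander(v, p + 1, increasing=True)  # columns 1, (t/n-u), ..., (t/n-u)^p

    def neg_local_loglik(beta):
        theta_t = V @ beta                    # polynomial approximation (2)
        resid = y[1:] - theta_t * y[:-1]
        return 0.5 * np.mean(w * resid**2)    # negated Gaussian quasi-log-lik

    return minimize(neg_local_loglik, x0=np.zeros(p + 1), method="BFGS").x

# Example: theta(u) = 0.5 sin(pi u); estimate level and slope at u = 0.5.
rng = np.random.default_rng(0)
n = 2000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * np.sin(np.pi * t / n) * y[t - 1] + rng.standard_normal()
print(local_poly_qmle(y, u=0.5, b=0.2))       # approximately (0.5, 0.0)
```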

Special care has to be taken with the implementation of local polynomial estimators when the chosen objective function is not well-defined for all values of $\theta$ and/or the parameter space $\Theta$ is compact. A simple example is ARCH models, where parameters have to remain positive for the volatility process to be well-defined. In such cases, we have to ensure that the polynomial $\beta_0 + \beta_1(t/n - u) + \cdots + \beta_p(t/n - u)^p$ satisfies these constraints for all $t$ receiving positive kernel weight. To this end, it proves useful to introduce rescaled versions of $\beta$ using the following weighting matrix,

$H = \mathrm{diag}\big(1, b, b^2, \dots, b^p\big) \otimes I_d.$

We then define $\alpha = H\beta$, which satisfies

$\beta_0 + \beta_1(t/n - u) + \cdots + \beta_p(t/n - u)^p = \alpha_0 + \alpha_1 z_t + \cdots + \alpha_p z_t^p,$

where $z_t = (t/n - u)/b$. Importantly, all components of $\alpha$ enter through the same bounded argument $z_t$, which facilitates the derivation of precise restrictions on the parameter space so that the objective function is well-defined for all relevant values of $t$. The corresponding parameter space for $\beta$ then takes the form $\mathcal{B}_n = H^{-1}\mathcal{A}$ for a fixed compact set $\mathcal{A}$, which expands as $b \to 0$. Moreover, the asymptotic analysis proves to be much simpler to carry out in terms of $\alpha$ since $H$ contains the relative rates of convergence of the components of $\hat{\beta}(u)$, as we shall see in the following section.
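The extract does not fully pin down the weighting matrix, but a minimal sketch under the standard local polynomial rescaling $H = \mathrm{diag}(1, b, \dots, b^p) \otimes I_d$ (an assumption on our part) would be:

```python
import numpy as np

def rescaling_matrix(b, p, d):
    """Sketch of the weighting matrix used to rescale the polynomial
    coefficients, assuming the standard form H = diag(1, b, ..., b^p) (x) I_d.
    The rescaled vector alpha = H @ beta then has components sharing a
    common rate of convergence."""
    return np.kron(np.diag(b ** np.arange(p + 1)), np.eye(d))

H = rescaling_matrix(b=0.1, p=1, d=2)
# H = diag(1, 1, 0.1, 0.1): the slope block is downweighted by one power of b.
```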

3 Asymptotic theory

To establish an asymptotic theory for the proposed class of local polynomial estimators, we will rely on the concept of local stationarity as introduced by Dahlhaus (1997); see also Dahlhaus and Subba Rao (2006) and Dahlhaus et al. (2017). We first generalize this concept to sequences of random functions:

Definition 1.

A triangular family of random sequences $\{X_{t,n}(\theta)\}$, $t = 1, \dots, n$, $n \geq 1$, $\theta \in \Theta$, is uniformly locally stationary (ULS) on $\Theta$ of order $q \geq 1$ if there exists a family of processes $\{X_t^*(u, \theta)\}$, $u \in [0, 1]$, such that: (i) the process $\{X_t^*(u, \theta)\}$ is stationary and ergodic for all $(u, \theta)$; (ii) for some $C < \infty$ and $\rho \in (0, 1)$,

(3) $\sup_{\theta \in \Theta}\,\big\| X_{t,n}(\theta) - X_t^*(t/n, \theta) \big\|_q \leq C\big(n^{-1} + \rho^t\big).$

Compared to existing definitions of local stationarity, we allow for an additional term to appear in the approximation error. This is needed in order to allow for the initial value of the (non-stationary) data-generating process to be arbitrary. In contrast, most of the existing literature implicitly assumes that the data-generating process has been initialized at its stationary approximation. This has the consequence that the data-generating process changes as the researcher varies $u$ in the local log-likelihood, which is a rather peculiar assumption. Moreover, in the estimation of GARCH-type models, the conditional variance process entering the likelihood is normally initialized at a fixed value, and so again an additional error term will appear when comparing this with its stationary version. The above definition again allows for this feature. To see how the additional error is generated in Markov models, we refer the reader to Theorem 7 in Appendix A.4, which allows for an arbitrary initialization of the data-generating process. The additional error term due to different initializations is here assumed to decay geometrically, and so our definition rules out long-memory type processes. This is mostly for simplicity, and we expect that most of our results can be generalized to allow for slower decay rates. Appendix A.1 contains a number of novel results for kernel weighted averages of parameter-dependent locally stationary processes which will be used in the following analysis of our polynomial estimators.
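To illustrate the definition, the following simulation (all specifics are our own illustrative choices) compares a time-varying AR(1) path, started at an arbitrary value, with its stationary approximation at a fixed rescaled time point:

```python
import numpy as np

# Illustrative tvAR(1): y_t = theta(t/n) y_{t-1} + eps_t with smooth theta(.),
# started at an arbitrary initial value, versus its stationary approximation
# at rescaled time u (constant parameter theta(u), same innovations).
rng = np.random.default_rng(1)
n = 1000
theta = lambda u: 0.3 + 0.4 * u
eps = rng.standard_normal(n)
u = 0.5

y = np.zeros(n)
y[0] = 5.0                    # arbitrary initialization of the DGP
ystar = np.zeros(n)           # stationary approximation at u
for t in range(1, n):
    y[t] = theta(t / n) * y[t - 1] + eps[t]
    ystar[t] = theta(u) * ystar[t - 1] + eps[t]

# Around t/n = u the two paths are close, while the effect of the arbitrary
# initial value dies out geometrically -- the two error components
# accommodated by the bound in eq. (3).
t0 = n // 2
print(np.max(np.abs(y[t0 - 10:t0 + 10] - ystar[t0 - 10:t0 + 10])))
```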

We will then require that the quasi-log-likelihood is ULS with stationary approximation $\ell_t^*(u, \theta)$. To illustrate, consider (1): The stationary approximation will here take the form $\ell_t^*(u, \theta) = \ell\big(y_t^*(u), x_t^*(u); \theta\big)$, where $y_t^*(u)$ is the stationary solution to the model when $\theta(\cdot)$ is held constant at $\theta(u)$,

(4) $y_t^*(u) = g\big(x_t^*(u), \varepsilon_t; \theta(u)\big).$

If the data-generating process is locally stationary, it follows under great generality that the likelihood and its derivatives are also locally stationary as shown in the following theorem:

Theorem 1.

Suppose that (i) $X_{t,n}$ is ULS with stationary approximation $X_t^*(u)$; (ii) $\varepsilon_t$ is i.i.d. and independent of $X_{t,n}$; and (iii) the transformation defining the likelihood satisfies a suitable moment and continuity condition for all $\theta$ and $u$. Then $\ell_{t,n}(\theta)$ is ULS.

This result generalizes Proposition 2.5 in Dahlhaus et al. (2017) in two directions: First, it allows the transformation to be parameter dependent and, second, it allows for an i.i.d. component, $\varepsilon_t$, to enter the transformation. Allowing for parameter dependence means we can apply the above result to GARCH-type models, among others. The reason why we allow for the presence of the additional component is best illustrated by again considering (1): In this model, we can rewrite the data and thereby the likelihood as a function of the regressors and the error term $\varepsilon_t$. Doing so allows for easier verification of local stationarity of the likelihood and its derivatives; see Section 5 for examples of this.

Under ULS, the nonstationary local likelihood function and its derivatives are well-approximated by their stationary versions. For example, $L_n(\beta; u)$ is well-approximated by $L_n^*(\beta; u)$, the version of the local log-likelihood in which each $\ell_t$ is replaced by its stationary approximation. The next step is then to develop a uniform Law of Large Numbers (ULLN) for $L_n^*(\beta; u)$. Furthermore, in order to analyze the bias properties of the local constant version, we need to be able to expand the stationary version of the score function w.r.t. $u$. To this end, we introduce the following additional concepts:

Definition 2.

A stationary process $\{Z_t(u)\}$ is said to be $L_q$-continuous w.r.t. $u$ if the following holds for all $u$: $\mathbb{E}\big[\|Z_t(u)\|^q\big] < \infty$ and $\big\|Z_t(u') - Z_t(u)\big\|_q \to 0$ as $u' \to u$.

The process is said to be $L_q$-differentiable w.r.t. $u$ if there exists a stationary and ergodic process $\{\partial_u Z_t(u)\}$ with $\mathbb{E}\big[\|\partial_u Z_t(u)\|^q\big] < \infty$ such that $\big\|Z_t(u + \Delta) - Z_t(u) - \partial_u Z_t(u)\,\Delta\big\|_q = o(|\Delta|)$ as $\Delta \to 0$.

Our definition of time differentiability is slightly different from the one found in Dahlhaus et al. (2017) and other papers, where differentiability w.r.t. $u$ has to hold almost surely; our version is slightly weaker since we only require it to hold in the $L_q$-norm. The definition of $L_q$-continuity w.r.t. $u$ is also weaker than almost sure continuity: if $Z_t(u)$ is almost surely continuous with a suitable dominating moment bound, then, by dominated convergence, the process is also $L_q$-continuous. It is easily verified that $L_q$-continuity w.r.t. $u$ implies stochastic equicontinuity of the corresponding sample averages, and so a ULLN holds, cf. Lemma 1(i) in Appendix A.1.

We are now ready to state the regularity conditions under which our estimators are consistent:

Assumption 1.

(i) $K$ has compact support and $K(z) \geq 0$ for all $z$; (ii) $K$ is symmetric around 0; (iii) for some $\bar{K} < \infty$, $|K(z) - K(z')| \leq \bar{K}|z - z'|$ for all $z, z'$.

Assumption 2.

The parameter space $\Theta \subseteq \mathbb{R}^d$ is compact. The true value satisfies $\theta(u) \in \Theta$.

Assumption 3.

(i) $\ell_{t,n}(\theta)$ is ULS for some $q \geq 1$ with stationary approximation $\ell_t^*(u, \theta)$; (ii) $\ell_t^*(u, \theta)$ is $L_1$-continuous w.r.t. $u$; (iii) $\theta \mapsto \mathbb{E}\big[\ell_t^*(u, \theta)\big]$ has a unique maximum at $\theta(u)$.

Assumption 1(i) imposes stronger than usual assumptions on $K$ and excludes, among others, the Gaussian kernel and higher-order kernels. It includes, on the other hand, the Epanechnikov and the triangular kernel. The restriction that $K \geq 0$ is used to ensure identification of the parameters when $p \geq 1$; without this, identification is not necessarily guaranteed; see below for further discussion. The compact support assumption appears to be quite important for the analysis of local polynomial estimation of non-concave models: In order to establish uniform convergence of the likelihood, we need the parameter space to be compact, as is standard in the literature. But under this restriction, for any $\beta$ with $\beta_j \neq 0$ for some $j \geq 1$, the polynomial $\beta_0 + \beta_1(v - u) + \cdots + \beta_p(v - u)^p$ eventually leaves any compact parameter space as $|v - u|$ grows. Thus, to allow for kernels with unbounded support, we would generally need the parameter space to collapse as $n \to \infty$. Such shrinking behaviour in turn means that a Taylor expansion of the objective function w.r.t. $\beta$ is not possible, and so standard arguments to establish asymptotic normality of $\hat{\beta}(u)$ cannot be applied. On the other hand, by restricting the support of $K$ to be compact, it is easily checked that, with $\Theta$ defined in Assumption 2, the local log-likelihood is well-defined for all $\beta \in \mathcal{B}_n$. Moreover, the true coefficient vector is an interior point of $\mathcal{B}_n$, and so in our analysis of $\hat{\beta}(u)$ we can employ standard arguments involving a Taylor expansion of the score function around this point. Thus, it appears as if the compact support assumption is needed for standard asymptotic arguments to apply. One could replace the definition of $\mathcal{B}_n$ with a less conservative choice, which allows for a larger parameter space in finite samples. However, the two definitions are asymptotically equivalent, and so we maintain the above definition of $\mathcal{B}_n$ for simplicity.

Assumption 3(ii)-(iii) are standard in the analysis of “global” extremum estimators of stationary models of the form $\max_{\theta \in \Theta} n^{-1}\sum_{t=1}^{n}\ell_t(\theta)$. In particular, for a given time series model, we can import existing results for verification of Assumption 3(ii)-(iii); see Section 5 for more details. Assumption 3(iii) in conjunction with the assumption that $K \geq 0$ ensures that the local polynomial estimator identifies $\theta(u)$. If we allow for kernels that take negative values, we have to replace 3(iii) with the following more abstract identification condition: the limit of the kernel-weighted population objective function has a unique maximum at the true coefficient vector. We have not been able to provide primitive conditions for this to hold when $K$ can take negative values and so instead impose the positivity constraint on $K$.

If $\theta \mapsto \ell_t(\theta)$ is concave and $\Theta$ is convex, we can replace Assumption 3(i)-(ii) with the following pointwise versions: for any given $\theta \in \Theta$, $\ell_{t,n}(\theta)$ is locally stationary and $\mathbb{E}\big[|\ell_t^*(u, \theta)|\big] < \infty$; see Theorem 2.7 in Newey and McFadden (1994). Under the above assumptions, the following consistency result holds:

Theorem 2.

Let Assumptions 1-3 hold. Then, as $n \to \infty$, $b \to 0$ and $nb \to \infty$, $\hat{\beta}_0(u) \overset{p}{\to} \theta(u)$. In particular, $\hat{\theta}(u) = \hat{\beta}_0(u)$ is consistent.

Note that the above theorem only shows consistency of $\hat{\theta}(u)$, and so at this stage we cannot make any statements regarding the derivative estimators $\hat{\theta}^{(j)}(u)$, $j = 1, \dots, p$. This is similar to other results for nonlinear extremum estimators, where parameters associated with components appearing in the objective function that grow (shrink) at a slower (faster) rate than the leading one will not be identified; see, e.g., Theorem 9 in Han and Kristensen (2014), where a global consistency result is only provided for the component with the fastest rate.

However, with some further regularity conditions on the quasi-likelihood function, we can provide a more precise analysis of the estimators. With $s_t(\theta) = \partial\ell_t(\theta)/\partial\theta$ and $h_t(\theta) = \partial^2\ell_t(\theta)/(\partial\theta\,\partial\theta')$, and $\theta_t = \theta(t/n)$, we introduce the score and Hessian of the local log-likelihood,

$S_n(\beta; u) = \frac{\partial L_n(\beta; u)}{\partial\beta}, \qquad H_n(\beta; u) = \frac{\partial^2 L_n(\beta; u)}{\partial\beta\,\partial\beta'}.$

It is easily checked that the true coefficient vector $\beta^*(u)$, with components $\beta_j^* = \theta^{(j)}(u)/j!$, belongs to the interior of $\mathcal{B}_n$ for all $n$ large enough, due to Assumption 4(ii) in conjunction with Assumption 2, and, due to the consistency result, so will $\hat{\beta}(u)$ w.p.a.1. Thus, $\hat{\beta}(u)$ will satisfy the first-order condition $S_n(\hat{\beta}(u); u) = 0$, which combined with the mean-value theorem yields

(5) $\hat{\beta}(u) - \beta^*(u) = -H_n(\bar{\beta}; u)^{-1}\, S_n(\beta^*(u); u),$

where $\bar{\beta}$ is situated on the line segment connecting $\hat{\beta}(u)$ and $\beta^*(u)$. We then decompose the score function into a bias and a variance component, $S_n(\beta^*(u); u) = B_n(u) + V_n(u)$, where

(6) $B_n(u) = \frac{1}{n}\sum_{t=1}^{n} K_b(t/n - u)\,\big[s_t\big(\theta_t^{p}(u)\big) - s_t(\theta_t)\big], \qquad V_n(u) = \frac{1}{n}\sum_{t=1}^{n} K_b(t/n - u)\, s_t(\theta_t),$

and with $\theta_t^{p}(u) = \beta_0^* + \beta_1^*(t/n - u) + \cdots + \beta_p^*(t/n - u)^p$ the polynomial approximation defined in eq. (2). This decomposition is different from the one usually employed in the analysis of kernel estimators of time-varying coefficients, where $s_t(\theta_t)$ is replaced by the stationary version of the score function evaluated at $\theta(u)$; see, e.g., Dahlhaus et al. (2017) and Dahlhaus and Subba Rao (2006). That choice has the consequence that the corresponding bias term generally involves the time derivative process of the score function, and so their analysis tends to impose stronger regularity conditions. By instead centering the analysis around $s_t(\theta_t)$, our version of the first-order bias component can be obtained through a standard Taylor expansion w.r.t. $\theta$,

(7) $B_n(u) = \frac{1}{n}\sum_{t=1}^{n} K_b(t/n - u)\, h_t(\theta_t)\,\big[\theta_t^{p}(u) - \theta_t\big] + o_p\big(b^{p+1}\big).$

Thus, our approach allows for a simpler derivation of the leading bias and variance terms under the following weak regularity conditions:

Assumption 4.

(i) $\theta \mapsto \ell_t(\theta)$ is twice continuously differentiable; and (ii) $\theta(u)$ lies in the interior of $\Theta$ and $u \mapsto \theta(u)$ is $(p+1)$ times continuously differentiable.

Assumption 5.

(i) $\{s_t(\theta_t)\}$ is a martingale difference (MGD) array w.r.t. the natural filtration; (ii) $s_{t,n}(\theta)$ is ULS for some $q \geq 2$ with $L_2$-continuous stationary approximation $s_t^*(u, \theta)$.

Assumption 6.

$h_{t,n}(\theta)$ is ULS for some $q \geq 1$ with $L_1$-continuous stationary approximation $h_t^*(u, \theta)$, and $H(u) := \mathbb{E}\big[h_t^*(u, \theta(u))\big]$ is non-singular.

Assumption 5 is non-standard compared to the existing literature (as discussed above) and allows us to apply a martingale central limit theorem for locally stationary sequences (see Lemma 1(iii) in Appendix A.1) to $V_n(u)$. The MGD assumption amounts to assuming that the time-varying model is correctly specified and has to be verified on a case-by-case basis. Finally, Assumption 6 together with the expansion in eq. (7) is used to derive the limits of $B_n(u)$ and $H_n(\bar{\beta}; u)$,

(8)
(9)

where $\mu_j = \int z^j K(z)\,dz$ and $\kappa_j = \int z^j K^2(z)\,dz$, $j \geq 0$. Combining these limit results, we obtain:

Theorem 3.

Suppose that Assumptions 1-6 hold. Then, as $n \to \infty$, $b \to 0$ and $nb \to \infty$,

$\sqrt{nb}\,\Big(H\big[\hat{\beta}(u) - \beta^*(u)\big] - b^{p+1}B(u)\Big) \overset{d}{\to} N\big(0, \Sigma(u)\big),$

where $B(u)$ depends only on $\theta^{(p+1)}(u)$ and the kernel moments, and $\Sigma(u)$ is the asymptotic variance matrix. In particular, for $j = 0, \dots, p$,

(10) $\sqrt{nb^{2j+1}}\,\Big(\hat{\theta}^{(j)}(u) - \theta^{(j)}(u) - b^{p+1-j}B_j(u)\Big) \overset{d}{\to} N\big(0, \Sigma_j(u)\big),$

where $B_j(u)$ and $\Sigma_j(u)$ denote the $j$th element of $B(u)$ and the $j$th diagonal element of $\Sigma(u)$, respectively.

Similar to existing results for local polynomial estimators in a cross-sectional setting, the bias component only depends on $\theta^{(p+1)}(u)$, and so the estimators adapt to the curvature of $\theta(\cdot)$. The asymptotic variance in Theorem 3 can be estimated using plug-in methods: It follows from the proof of Theorem 3 that the sample Hessian $H_n(\hat{\beta}(u); u)$ and the kernel-weighted outer product of the scores evaluated at $\hat{\beta}(u)$ are consistent for their population counterparts, so that the usual sandwich formula provides a consistent estimator of $\Sigma(u)$.
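As a concrete illustration, a plug-in sandwich estimator along these lines could be coded as follows; the array layout and function name are our own assumptions, and the scaling constants from Theorem 3 are omitted:

```python
import numpy as np

def sandwich_variance(scores, hessians, w):
    """Hedged sketch of a plug-in sandwich variance estimator: `scores`
    (n x d) and `hessians` (n x d x d) are assumed to hold the
    per-observation score and Hessian of the quasi-likelihood evaluated at
    theta_hat(u); `w` holds the kernel weights K_b(t/n - u)."""
    n = len(w)
    H_hat = np.einsum("t,tij->ij", w, hessians) / n           # estimates H(u)
    Omega_hat = np.einsum("t,ti,tj->ij", w**2, scores, scores) / n
    H_inv = np.linalg.inv(H_hat)
    return H_inv @ Omega_hat @ H_inv.T                        # sandwich form
```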

Comparing the above limit results and the conditions under which they are derived with the corresponding ones found in Dahlhaus et al. (2017) and the references therein, we note that our bandwidth restrictions are much weaker than theirs. In particular, standard bandwidth selection rules can be employed here but not in their set-up. Moreover, the existing literature requires time derivatives of the stationary score function to exist and be well-behaved, with these entering the bias expressions. We, on the other hand, obtain results that are analogous to the ones found in the literature on local polynomial likelihood estimators; see, e.g., Theorem 1b of Fan et al. (1995).

Equation (10) holds for any value of $p$ and $j$. However, when $p$ is even, the leading bias term $B_j(u)$ vanishes since all odd moments of $K$ are zero due to the symmetry assumption. For example, for the local constant estimator ($p = 0$), Theorem 3 only informs us that the bias component of $\hat{\theta}(u)$ is of smaller order than $b$. To obtain the leading bias term in this case, a higher-order expansion in eq. (6) is necessary. This expansion requires additional assumptions involving time derivatives of, and standard derivatives w.r.t. $\theta$ of, the stationary score function:

Assumption 7.

$s_t^*(u, \theta(u))$ is time-differentiable in the $L_1$-sense at $u$ with time-derivative $\partial_u s_t^*(u, \theta(u))$.

Assumption 8.

For each relevant derivative order: (i) the derivative of the quasi-likelihood w.r.t. $\theta$ exists and is ULS with $L_1$-continuous stationary approximation.

Assumption 9.

$\sum_{k=1}^{\infty}\big\|\mathrm{Cov}\big(\partial_u s_0^*(u, \theta(u)),\, \partial_u s_k^*(u, \theta(u))\big)\big\| < \infty.$

The time-derivative $\partial_u s_t^*(u, \theta)$ will generally involve time-derivatives of the underlying stationary approximation of the data. For example, if $s_t^*(u, \theta) = s\big(X_t^*(u); \theta\big)$, where the right-hand side is differentiable w.r.t. $X_t^*(u)$, then it takes the form

(11) $\partial_u s_t^*(u, \theta) = \frac{\partial s\big(X_t^*(u); \theta\big)}{\partial x'}\,\partial_u X_t^*(u),$

where $\partial_u X_t^*(u)$ is the time derivative of $X_t^*(u)$. Assuming in addition that $\theta(\cdot)$ is $(p+2)$ times continuously differentiable, the following asymptotic expansion of $B_n(u)$ under Assumptions 7-8 holds:

(12)

The short memory condition imposed in Assumption 9 is used to control the variance component of the first-order bias term derived in Theorem 3. A sufficient condition for this assumption to hold is that the underlying process is a geometric moment contraction, cf. Proposition 2 in Wu and Shao (2004). We then obtain the following higher-order expansion of the bias component, to be used when $p$ is even:

Theorem 4.

Suppose Assumptions 1-9 hold and $\theta(\cdot)$ is $(p+2)$ times continuously differentiable. Then, as $n \to \infty$, $b \to 0$ and $nb \to \infty$,

(13)

where the higher-order bias term now involves both $\theta^{(p+2)}(u)$ and the time-derivative process $\partial_u s_t^*(u, \theta(u))$.

Corollary 1.

The local constant estimator ($p = 0$) satisfies, as $n \to \infty$, $b \to 0$ and $nb \to \infty$,

(14)

where the leading bias term involves both the curvature $\theta^{(2)}(u)$ and the time-derivative process of the score.

To our knowledge this is the first complete characterization of the bias components of local constant estimators in general time-varying parameter models. Compared to existing results for specific models (see, e.g., Dahlhaus and Subba Rao, 2006), we see that our bias expression takes a different form. In particular, ours only involves the first-order time-derivative process, while existing results involve higher-order derivatives. This is due to the aforementioned different proof techniques. One can show that our bias expression and theirs are equivalent under their stronger regularity conditions. Comparing Theorems 3 and 4, we see that the local linear and local constant estimators share the same convergence rate and asymptotic variance, but the local constant estimator suffers from additional biases. This is consistent with the theory for local constant and local linear estimators in a cross-sectional setting. However, compared with the cross-sectional theory (as in Fan et al., 1995), our bias takes a slightly different form. This is due to the fact that the data-generating process in our setting is non-stationary, with the stationary approximation generating additional biases. Similar to the results found in a cross-sectional regression context, cf. Fan (1993), we expect the additional biases of the local constant estimator to translate into reduced precision and efficiency compared to the local linear one.

Moreover, as is well-known, local polynomial estimators have the advantage of exhibiting automatic boundary carpentry. This property also holds in our setting near the end points of the sample ($u = 0$ and $u = 1$). Formally, we analyze the properties of the estimators at $u = cb$ and $u = 1 - cb$, respectively, for some $c > 0$. The following corollary reports the properties in the first case; a similar result holds for the latter one. We leave out the proof since it follows along the same arguments as Theorems 3 and 4, except that the asymptotic bias and variance terms take a slightly different form.

Corollary 2.

Let $\hat{\theta}(cb)$ be the local polynomial estimator of order $p \in \{0, 1\}$ evaluated at the left boundary point $u = cb$. Under the same conditions as in Theorem 4, the conclusions of Theorems 3 and 4 continue to hold with the bias and variance constants redefined in terms of the one-sided kernel moments $\mu_j(c) = \int_{-c}^{1} z^j K(z)\,dz$ and $\kappa_j(c) = \int_{-c}^{1} z^j K^2(z)\,dz$, $j \geq 0$.

This corollary shows that the asymptotic biases and variances of the local constant and local linear estimators at the boundaries are different. While the difference between the two asymptotic variances is only a matter of scale, the bias of the local constant estimator vanishes at a slower rate than that of the local linear one.
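The boundary constants can be computed directly by numerical integration. The sketch below assumes a kernel supported on $[-1, 1]$ and the standard one-sided moment definitions; both are our own assumptions for illustration:

```python
import numpy as np
from scipy.integrate import quad

def epanechnikov(z):
    return 0.75 * (1.0 - z**2) * (np.abs(z) <= 1.0)

def one_sided_moment(j, c, squared=False):
    """One-sided kernel moment at a left-boundary point u = c*b, assuming
    a kernel supported on [-1, 1]: mu_j(c) = int_{-c}^{1} z^j K(z) dz,
    or the K^2 analogue when squared=True."""
    f = (lambda z: z**j * epanechnikov(z)**2) if squared \
        else (lambda z: z**j * epanechnikov(z))
    return quad(f, -c, 1.0)[0]

# In the interior (c >= 1) the first moment vanishes by symmetry; near the
# boundary (c < 1) it does not, which is what slows down the bias rate of
# the local constant estimator relative to the local linear one.
print(one_sided_moment(1, c=1.0))   # approximately 0
print(one_sided_moment(1, c=0.5))   # strictly positive
```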

4 Extension to time-varying generalized autoregressive models

The theory developed in Section 3 requires the score to be a martingale difference. This assumption is violated in time-varying GARCH-type models, as we shall see. We here demonstrate how our proof strategy can be generalized to cover the following class of generalized autoregressive models (GARs),

(15) $y_t = h(\lambda_t, \varepsilon_t), \qquad \lambda_t = g\big(y_{t-1}, \lambda_{t-1}; \theta(t/n)\big).$

This class includes GARCH and Poisson autoregressions, amongst others. Since $\lambda_t$ is not directly observed, the likelihood takes the form

(16) $\ell_t(\theta) = \ell\big(y_t, \lambda_t(\theta)\big), \qquad \lambda_t(\theta) = g\big(y_{t-1}, \lambda_{t-1}(\theta); \theta\big),$

where $\lambda_t(\theta)$ is initialized at some fixed value $\bar{\lambda}$, and $\ell$ depends on the functional form of $h$ and the assumed distribution of $\varepsilon_t$.
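To make eq. (16) concrete, the following sketch evaluates the Gaussian quasi-log-likelihood contributions of a GARCH(1,1)-type member of this class with the latent process initialized at a fixed value; the parametrization is assumed for illustration:

```python
import numpy as np

def garch_quasi_loglik(y, theta, lam0=1.0):
    """Illustrative Gaussian quasi-log-likelihood for a GARCH(1,1)-type
    member of (15)-(16) (parametrization assumed for illustration):
        y_t = sqrt(lam_t) * eps_t,
        lam_t = omega + alpha * y_{t-1}**2 + beta * lam_{t-1},
    with the unobserved lam_t initialized at the fixed value lam0.  The
    dependence on lam0 creates the geometrically decaying initialization
    error that the ULS definition explicitly accommodates."""
    omega, alpha, beta = theta
    n = len(y)
    lam = np.empty(n)
    lam[0] = lam0
    for t in range(1, n):
        lam[t] = omega + alpha * y[t - 1] ** 2 + beta * lam[t - 1]
    return -0.5 * (np.log(2 * np.pi * lam) + y**2 / lam)  # contributions l_t

# A local constant estimator then maximizes the kernel-weighted average of
# these contributions over (omega, alpha, beta), as in Section 2.
```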

We will here only provide a theory for local constant estimators since the analysis of local polynomial estimators requires a completely different proof strategy compared to the one pursued in this paper. To see the complications that arise when analyzing local polynomial estimators of GARs, first recall that we need to replace