The parametric estimators applied by rolling are commonly used for the analysis of time series with nonlinear features, such as structural change due to time varying parameters and local trends due to growth episodes, for example. The rolling approach is expected to adjust the estimates and predictors for changes in the trajectory of a process, while relying on simple computation and updating formulas.
The aim of this paper is to study the properties of rolling estimators in the class of Temporally Local Maximum Likelihood (TLML) estimators [see e.g. Nicholls, Quinn (1980), Hastie, Tibshirani (1993), Cai (2007) for TLML in linear regression models and Anderes, Stein (2011) for random fields]. We study the TLML estimators of constant parameters, stochastic and stationary parameters and parameters with the Ultra Long Run (ULR) dynamics bridging the gap between the constant and stochastic parameters.
The TLML estimators are characterized by the underlying dynamic model and the selected sequence of weights. Our model of interest is an epidemiological SIS model defined by a set of deterministic differential equations. We consider its discrete time stochastic version based on Markov chains with heterogenous transition probability and the geometric and hyperbolic weights. We study the Poisson and Poisson-Gaussian approximations of the model that lead to generalized linear models (GLIM) and facilitate the numerical implementation of the TLML approaches.
In general, the functional TLML estimator indexed by time is close to a stationary process. The limiting stationary process can degenerate to a constant process if the weights are ”local”, i.e. are sufficiently discounting observations far apart from time . We derive the condition on the weights under which this degeneracy occurs and discuss the interpretations of the limiting constant, i.e. the pseudo-true value of the time varying parameter. Then, under the no-degeneracy condition on the weights, we discuss the asymptotic normality of TLML estimators. We also consider the case of stochastic parameters, which are smoothly varying and follow an Ultra Long Run process, in order to bridge the gap between models with constant parameters and models with stationary stochastically time varying parameters.
The paper is organized as follows. The dynamic model with stochastic time varying parameters and the associated TLML estimators are introduced in Section 2. Section 3 describes the asymptotic properties of the TLML estimator under the joint stationarity assumption on the observations and stochastic parameters. We explain how these asymptotic results can be used in practice to build time varying confidence intervals for the TLML estimator. An illustration based on the Susceptible-Infected-Susceptible (SIS) epidemiological model is provided in Section 4. This Section highlights the performance of the proposed approach in an epidemiological model. Section 5 concludes. Appendix 1 provides the second-order asymptotic properties of the TLML estimator. It shows how to get a more accurate approximation of the distribution of the estimator, by adjusting the TLML estimator for bias. Appendix 2 verifies the stationarity of the discrete time stochastic SIS model. Appendix 3 provides additional information on the quasi-collinearity and on the dynamic properties of the estimation errors.
2 The Dynamic Model and the TLML Estimator
2.1 The model
The model involves -dimensional observations and -dimensional unobserved possibly time-varying parameters The model is defined by the conditional distribution of given varying). The associated conditional density is specified as :
It depends on the time varying parameter. More specifically, the above conditional density depends on the value of the parameter at time only. For expository purpose, we assume that process is a Markov 111Alternatively, the right-hand side of (2.1) can be interpreted as the conditional composite likelihood at order 1 [see Varin, Reid, Firth (2011)]. process given , with a time heterogenous transition probability.
We do not specify explicitly the dynamics of parameter . Our statistical analysis assumes that:
i) is stochastic and strictly stationary;
ii) there is no parametric specification of the dynamic of
Therefore, our approach differs from other types of nonparametric analysis which assume that is a smooth deterministic function of time [see e.g. Fan, Gijbels (1996), Cai et. al. (2000), Zhou, Wu (2010)] and from approaches that use specific state-space models, either deterministic [see e.g. Cakmakli, Simsek (2020) for an application to epidemiology], or stochastic [see Kim, Nelson (1999), Canova, Perez, Forero (2015)].
2.2 The TLML estimator
Let us introduce a sequence of weights assumed nonnegative. The TLML estimator of is defined as :
The sequence defines a nonparametric functional approximation of the stochastic process . As is random, is a predictor of . Nevertheless, we follow the common practice of referring to as an ”estimator” of , even though is not deterministic.
We assume below that the solution exists and is unique 222 is not identifiable, although the dynamic of can become identifiable, if is stochastic [see Primiceri (2005)]. .
The TLML estimator is a pseudo maximum likelihood estimator because:
i) the heterogeneity of is omitted.
ii) the dated log-likelihoods are weighted, so that higher weights are assigned to the most recent current and lagged observations whenever the sequence is decreasing in . As we consider a dynamic framework for tracing the evolution of latent stochastic parameter in real time, a one-sided kernel is selected 333Our analysis differs from the cases where the parameters vary with an exogenous variable and a continuum of values of this variable are asymptotically observed.. The analysis in real time differs from the ex-post analysis, in which a long period of time is considered to detect either a small number of switching regimes with either sudden or smooth transitions [see, e.g. Francq, Gautier (2004)], or recurrent events in cross-sectional data [see, Yu et. al. (2013)], or to develop tests of constant parameters [Fan, Zhang (2000)]. Whereas a large part of the literature assumes longitudinal data, where many subjects are observed at multiple times [see, Fan, Zhang (2000), Lin, Ying (2001)], we consider a pure time series framework with only one available realization path.
The following Sections 3 and 4 examine the impact of this double misspecification.
2.3 The weights
Let us introduce the cumulated weights and cumulated square weights :
Various weighting schemes can be considered:
Example 1 : Unweighted estimator
This case corresponds to a standard ML estimator with omitted heterogeneity. We have :
Example 2 : Rolling weighted estimator
This case arises when the weights are zero for sufficiently large: , if . Then, , if .
Example 3 : Geometric weights
Let us assume : , with We have :
Example 4 : Hyperbolic weights
If , we see that :
exists if ; otherwise, we have : if ,
In the following Sections 3.1, 3.2, we consider a fixed weighting function , which does not depend on the number of observations . In epidemiological studies, the rolling is often performed over a window of one week, i.e. days [se e.g. Shapiro et. al (2020), PHO (2021), p.13, for COVID-19 analysis]. Later on, in Sections 3.3, 3.4, we introduce a kernel which allows for a well-chosen dependence on the number of observations.
3 Asymptotic Properties of the TLML Estimator Under Stationarity
The asymptotic properties of the TLML are derived in this Section under the following stationarity assumption :
Assumption A.1 : Joint Stationarity The joint process is strictly stationary.
Assumption A.1 includes the special case of independent of and a strictly stationary process . In this special case, there is no omitted heterogeneity in the TLML approach and the likelihood is well-specified. Generally, Assumption A.1 allows for other dynamics of , such as : i) i.i.d. stochastic parameters [Nicholls, Quinn (1980)], ii) Gaussian AR(1) stochastic parameter , where are i.i.d. N(0,1), and iii) ultra long run (ULR) process , where tends to 1 and tends to 0 at appropriate speeds when tends to infinity. The ULR process accommodates small smoothed stationary deviations of the parameter from a constant path [see Gourieroux, Jasiak (2021) for ULR processes, Froeb, Koyak (1994) for the notions of smoothness in stochastic time series].
Assumption A.1 considers stationary stochastic parameters. As mentioned earlier, it excludes the parameters defined as a smooth deterministic function of time, considered in the literature on nonparametric estimators of time varying coefficients [see e.g. Cai (2007), Zhou, Wu (2010)] that introduces nonstationary features and disregards the uncertainty on the parameter in the long run.
Below, we derive the main asymptotic results, which are the conditions of consistency, i.e. convergence to a pseudo-true value, the asymptotic normality of the local estimator under consistency [see Appendix 1 for the higher order expansion]. We also discuss the stationarity of the sequence of local estimators when the consistency condition is not satisfied. We study the ULR process of to bridge the gap between the constant parameters and stationary stochastic parameters. We restrict our attention to regularity conditions, which are necessary for clarity of the results.
Suppose, the weights are fixed and do not depend on the number of observations. The consistency of the estimator is deduced from the asymptotic behaviour of the optimization criterion under a suitable normalization.
Let us introduce the weighted criterion function:
Next, we assume that the moments ofexist up to order 2. Then, the first-order moment is
and the second-order moment is:
are the expectation and variance computed from the true distribution of process. This true distribution involves both the true conditional transition (2.1) and the true stationary distribution of .
Let us assume that a geometric mixing condition holds, for ease of exposition.
Assumption A.2 : Geometric mixing
The process is geometrically mixing with geometric order .
Then, we have :
We deduce the following result:
Proposition 1 : If tends to zero when tends to infinity, then the finite sample objective function tends to the limiting objective function , where denotes the expectation taken with respect to the true joint stationary distribution of , which depends on the true underlying dynamics of the stochastic parameter.
In particular this limiting function does not depend on the sequence of weights. Then, by applying the standard Jennrich’s argument [Jennrich (1969), Andrews (1987)], the solution of the finite sample optimization problem will tend to the solution of the asymptotic problem.
Corollary 1 : If tends to zero, when tends to infinity, then (exists asymptotically and) is consistent of
Remark 1 : If the joint process is a sequence of i.i.d. variables, we have , and the sufficient condition for consistency to becomes approaching zero when tends to infinity.
Remark 2 : If there is no parameter heterogeneity then , by the property of the Kullback-Leibler information criterion. In the presence of heterogeneity, there is no notion of a true value of parameter and
is a pseudo true value of this parameter that depends on the joint distribution of process. In general, this pseudo-true value depends on both the true distribution of and the true transition (2.1). In particular is not equal, or even close, to the expected parameter value .
The condition on the weights given in Proposition 1 is easy to interpret. The TLML estimator is convergent if the observations far from are sufficiently down-weighted. More precisely, for large, the estimator is close to the virtual estimator :
computed with an infinite sum. This virtual estimator is a fixed function of the stationary process . Therefore it is also stationary. More precisely, the joint process is strictly stationary.
Proposition 2 : Under the stationarity assumption A.1, the TLML estimator is equivalent to the virtual TLML estimator . This virtual functional estimator varying) is such that is a strictly stationary process.
The following two extreme cases can be distinguished: The process is either:
i) constant; Proposition 1 provides a sufficient condition for this property to hold and shows that .
ii) or stationary, but does not degenerate to a constant.
In the second case ii), the joint process is also stationary, but does not have mean zero, in general.
Example 5 : Gaussian observations with mean heterogeneity.
To illustrate the above results, let us consider the model with the observations that are i.i.d. with distribution conditional on . The TLML estimator is a weighted average of the observations :
It can be written as :
where are i.i.d. . Therefore, conditional on , the distribution of is normal with (conditional) mean and variance .
The unconditional mean of the estimator is equal to :
Therefore, the difference between the weighted local estimate and the time varying parameter has mean zero. That implies that the functional predictor is unbiased of .
i) Rolling weighted estimator (see Example 2).
For , we have :
It follows that is a moving-average (MA) transformation of process with a finite moving average order equal to . In particular, it does not converge when tends to infinity.
ii) Geometric weights (Example 3).
We get :
which is a nondegenerate moving average MA() transformation of an infinite MA order. The weights satisfy:
which tends to , if
iii) Hyperbolic rates (Example 4)
The consistency, i.e. the convergence to a pseudo true value, can be reached with a hyperbolic weight . Indeed, we observe the following asymptotic behaviour :
The consistency is achieved when with .
In some sense, if , the TLML estimator is not sufficiently local, as it tends to a global summary of the joint distribution of , . Although the estimator does not converge when it is ”too local”, it has a non-degenerate distribution that can be used for statistical inference.
3.2 Asymptotic Normality
When the TLML estimator converges to a pseudo-true value , we can write the first-order expansion of the first-order conditions (FOC) of the objective function. The FOC are :
The FOC can be expanded in a neighbourhood of the pseudo-true value :
where is negligible in probability with respect to the components of the left hand side of the equation.
Let us introduce the following matrices and vectors:
where and are independent of both time and weights. Therefore the expansion (3.8) can be rewritten as :
where is asymptotically .
Proposition 3 : Under Assumptions A-1, A-2, we have :
This is the case of double misspecification, which results from estimating the parameter as if the parameter were constant and as if the weighted log-likelihood function with the constant parameter were well-specified. In general, this double misspecification affects the limit of the estimator, its speed of convergence and entails the sandwich form of its asymptotic variance-covariance matrix [Huber (1967), White (1982)]. Let us discuss these effects in more detail.
Without the time variation of the parameter and is a martingale difference sequence (MDS) and the FOC are based on martingale estimating equations, using the terminology of Godambe, Heyde (1987). The mixing coefficient of this sequence corresponds to and the condition for convergence is
This MDS condition implies also that if . Moreover we have . We deduce the following Corollary:
Corollary 2 : Let us assume A.1 and the absence of heterogeneity then, if , we have :
Under the assumptions of Corollary 2, the presence of local weights has no effect on the limit of the estimator. It does not entail a ”sandwich” formula of asymptotic variance, but can change the speed of convergence of the estimator : the more local the weights are, the slower the speed of convergence.
Remark 3 : The computation of the TLML estimator at date is easy, but can become costly if it has to be computed ex-post from a large number of dates. In that case, this estimator can be replaced by an Iterative Local Maximum Likelihood (ILML) estimator, which is obtained from a one-step of the Newton-Raphson maximization algorithm applied to the local log-likelihood , with starting value [see e.g. Cai, Fan, Li (2000), Section 2.2].
3.3 The Frontier Between Local and Global Analysis
The asymptotic distribution of Proposition 3 is valid when the weights are sufficiently global. Otherwise, it follows from Proposition 2 that the sequence of TLML estimators is stationary, but the distributional properties of this sequence cannot be derived analytically for a time varying stationary stochastic parameter. These distributional properties can only be explored in a Monte-Carlo experiment by using different scenarios for the dynamics of (see Section 4 for a simulation study).
It is, however, possible to examine analytically what may arise at the ”frontier” between the local and global approaches by considering ULR processes of [Gourieroux, Jasiak (2021)]. For expository purpose, we consider the example of Gaussian observations with mean heterogeneity (Example 5), where . We assume a stationary Gaussian ULR process of (a triangular array, more precisely):
If tends to tends to 0 when tends to infinity, then the local-to-unity/small sigma autoregressive dynamic process tends to a time invariant trajectory The level of this trajectory is stochastic and differs from its theoretical mean . Thus at the limit, we get a purely predictable process in the terminology of Wold decomposition. An appropriate choice of this type of triangular array can be derived from a continuous time latent dynamic. Let us introduce a latent stationary Ornstein-Uhlenbeck process such that :
where ] is a Brownian motion, are positive parameters. The stationary distribution of is Gaussian with mean and variance Then, process satisfies a discrete time stationary autoregressive dynamic (3.10) with , and the same stationary distribution as . Thus tends to 1 and tends to zero at speed . When increases there is less stochastic time heterogeneity and the analysis with a given sequence of weights will become more global.
Let us now introduce a sequence of weights also indexed by as :
Hence, at the limit we get a purely predictable process.
When increases, the sequence of weights tends to a constant infinite sequence
Then, the TLML estimator is equal to :
where for large and the are i.i.d. standard normal, independent of process . We deduce the following asymptotic behaviour :
Proposition 4 : In the Gaussian model with ULR mean process, the TLML estimator with time varying weights tends to :
Proof : The numerator and denominator of the first component in the RHS of (3.13) are Riemann sums that converge to their associated (stochastic) integrals [Hansen (1992)]. The second component is close to
, by the Law of Large Numbers.
As a consequence of weights change, the sequence of local estimators is no longer stationary. However, for large , this stochastic sequence tends to a stochastic level that depends on the weights and the dynamic of , i.e. on the long run dynamic of .
This example shows that the consistency result of Section 3.1 can be extended to a stochastic parameter following a ULR process with accordingly chosen weights. Note that the choice of weights as in (3.12) supposes a large number of weighted observations.
The analytical derivation given above has to be used with caution when applied by rolling over a rather short window. For a fixed window of width , say, and equal weights, the estimator is :
The short window modifies the distribution of , which is now equal to
This Gaussian example is somehow misleading, when the associated weighted local maximum likelihood estimators are interpreted as weighted averages. Similar results could be derived for more complicated examples, when the pseudo first-order conditions are not linear with respect to the parameters. Then, the limiting distribution of will not have mean zero.
3.4 Confidence Intervals
Let us now explain how the asymptotic results derived in Sections 3.1-3.3 can be used to build time varying confidence intervals for the TLML estimators. We first consider a constant parameter and a stationary stochastic parameter models. Next, we show how to bridge the gap between these models. Scalar parameters are assumed for ease of exposition.
3.4.1 Model with constant parameters
The constant parameter model is a standard asymptotic framework. A confidence interval (CI) for can be based on Corollary 2. It is computed as:
In practice, the same CI are used for models with time-varying parameters. This approach is valid when does not vary too much in the neighbourhood of time . It is not valid, otherwise. The practice of estimating an asymptotic CI can be partly improved by adjusting it for ”finite sample” bias [see Appendix 1 for this adjustment to the weighted (pseudo) log-likelihood].
3.4.2 Model with stationary stochastic parameter
The stationary stochastic parameter model entails the curse of dimensionality, due to the unknown distribution of process. Therefore, a reasonable CI cannot be provided. However, despite this identification issue, information on the accuracy of estimators can be revealed. Two approaches can be followed, which are described below and used in Section 4.
i) Prediction accuracy
Instead of predicting the future values of parameter , one can focus on the short-run prediction of . For a constant parameter model, can be predicted by . When is time varying, can be predicted by . The joint process is stationary by Proposition 2. Then we can estimate nonparametrically the conditional distribution of given , or given and build a conditional prediction interval.
If we focus on parameter only, we can consider different dynamic models for the stationary evolution of . As the models are easy to simulate (see the application to SIS modelling), it is possible to draw the time-varying parameters , compute corresponding to this draw and then calculate the sequence of dynamic TLML estimators . Given that varying, is stationary, we can find by averaging over a large number of replications the marginal distribution of , for example [see Figures 10-11 in the application to SIS modelling]
3.4.3 ULR dynamics for bridging the gap
The ULR dynamics introduced in Section 3.3 bridges the gap between the model with constant parameters and analytical CI formula (Section 3.4.1), and the model with stationary stochastic parameters and the curse of dimensionality (Section 3.4.2).
Proposition 4 shows that, although the TLML estimator is not consistent, its distribution is equal to the distribution of a ratio of stochastic integrals, whose dynamics depend on the underlying parameters of the Ornstein-Uhlenbeck process (3.11). Proposition 4 allows however for different choices of , i.e. of parameter . Let us compute TLML estimates corresponding to values .
For large , the joint asymptotic distribution of the -dimensional vector of TLML estimators is equal to the distribution of the vector of ratios of stochastic integrals , where is defined in Proposition 4.
This asymptotic distribution is unknown, as it depends on the two unknown parameters . These parameters can be estimated by applying the maximum likelihood method to the observed with the Ornstein-Uhlenbeck likelihood. Given these, one can estimate the limiting distribution of .
The estimators of parameters are not expected to be accurate, as they are based on a finite sample of summary statistics . The treatment of this ”finite sample” issue is out of the scope of the present paper. However, the confidence intervals can be obtained by applying a test inversion bootstrap approach [Carpenter (1999)].
4 An Illustration
To illustrate the finite sample and asymptotic properties of the TLML estimator, we consider below an epidemiological dynamic model with two compartments , susceptible individuals, , infected individuals, where after the infectious period the individual is not immune and becomes again susceptible. This type of model is known under the acronym SIS [Susceptible-Infected-Susceptible] [see e.g. Brauer et al. (2008) for a discussion of SIS models]. The SIS models are used in applications to some sexually transmitted diseases and bacterial diseases [Hethcote, Yorke (1994)], such as tuberculosis, meningitis and gonorrhea. We consider the SIS model in our illustration due to its simplicity allowing us to explain its stochastic extensions and to perform statistical inference. This model is also interesting because the trajectories of the counts of infected have nonlinear dynamic features, such as peaks and cluster effects that may render inaccurate the TLML approaches during some episodes of an epidemic.
4.1 The (Stochastic) SIS Model
4.1.1 The differential equation
In the epidemiological literature the SIS model is usually defined in continuous time, assuming a population of infinite size and deterministic dynamics. Let denote the proportion of individuals in compartment , at time . By construction . Then, the evolution of the proportion of infected individuals satisfies the differential equation :
Parameters are the infinitesimal rates of infection and recovery, respectively. They depend on the time unit, whereas the ratio is independent of the time unit. This explains the role of this ratio in the continuous time epidemiological literature. The change in proportion is due to i) the rate of new infections , that involves the proportion of susceptible with a transmission (or contact rate) parameter ii) the recoveries with recovery rate
Let us consider equation (4.1) when :
If the initial value is smaller (resp. larger) than , then starts to increase (resp. decrease), as indicated by the sign of the derivative, up to value . Thus the condition (4.2) is a stability condition for the differential condition (4.1) with an equilibrium at 444If decreases to . At the limit, the disease disappears. .
Equation (4.1) can be solved analytically. For instance, if we get :
This is a logistic pattern, which is increasing asymptotically up to . Formula (4.3) provides an alternative parametrization of the proportion of infected by means of instead of .
Remark 4 : The SIS model is quite flexible and various extensions of this model have been considered in the literature by introducing a delayed recovery [Cooke, York (1973), Greenberg, Hoppensteadt (1975)], temporary immunity [Xu, Li (2018)], nonlinear incidence rate [Das et al. (2011), Rifhat et al. (2017)], or varying population size [Hethcote, Van den Driessche (1995)].
4.1.2 Stochastic SIS Model
The continuous time deterministic model is of limited use for statistical inference for the following reasons. First, the population of interest has a finite size . Second, the available observations are often recorded daily, which is a discrete time setting. Moreover, as the model is deterministic, the statistical theory is not applicable.
A discrete time stochastic analogue of the continuous time deterministic SIS model does not suffer from these limitations. It is based on the analysis of individual histories through Markov chains [see e.g. Gourieroux, Jasiak (2020)]. Let us introduce the following variables:
, the counts of individuals in state at time with :
the counts of individuals moving from to between and .
The counts of migrations between the states of infected and susceptible are given below :
In particular, we get the ”conservation of mass idendities” [Matis, Kiffe (2000), Breto et al. (2009)] :
We get the following property :
Proposition 5 : Under the Markov chain assumption, conditional on the lagged counts the variables and
are independent with binomial distributions :
where the contagion and recovery parameters take values between 0 and 1.
Even though we are still using the notation and for the contagion and recovery parameters, these parameters are not identical in the continuous and discrete time. In the discrete time, they are interpreted as daily rates instead of instantaneous rates. This explains the different domains of these parameters, which are in continuous time and in discrete time, respectively.
It follows that :
Corollary 3: The counts of infected individuals is a Markov process with conditional distribution , where denotes the convolution of distributions.
The first stochastic feature is due to the finite size of the population. If tends to infinity, we get approximately:
which is a discrete time analogue of equation (4.1).
Let us now discuss the stationarity of process . We observe that if , then . Therefore, the state 0 is an absorbing state of the chain and its stationary distribution is the point mass at 0. However, we are mainly interested in episodes when the disease exists, i.e. (and is often large). For such episodes, we can approximate the distribution in Corollary 3 by , where denotes the binomial distribution restricted to strictly positive values. Below, denotes a Markov chain with such transition distribution.
Proposition 6: A sufficient stationarity condition for nondegenerate process is .
Proof : See Appendix 2.
Considering the process , the condition given above can be easily explained as follows.
Let us assume that the process is stationary. Since it is bounded, the moments exist and we have :
where . This equality implies : Then, the following two cases arise:
i) , or equivalently , a.s, since is nonnegative.
ii) , which is the condition of Proposition 6.
Moreover, we have: In our framework and are small. If