1 Introduction
Observation-driven models like the GARCH model of Bollerslev, where time-varying parameters are driven by functions of lagged observations, are typically viewed as data generating processes. As such, all relevant information is encoded in past observations and there is no room for using current and future observations when estimating time-varying parameters. However, they can also be viewed as predictive filters, as time-varying parameters are one-step-ahead predictable. This idea was largely exploited by Daniel B. Nelson, who explored the asymptotic properties of conditional covariances of a misspecified GARCH under the assumption that the data generating process is a diffusion; see NELSON199261, NelsonFoster, NELSON1995303 and NelsonSmooth. (The interpretation of GARCH processes as filters is well described in this statement by NELSON199261: “Note that our use of the term ‘estimate’ corresponds to its use in the filtering literature rather than the statistics literature; that is, an ARCH model with (given) fixed parameters produces ‘estimates’ of the true underlying conditional covariance matrix at each point in time in the same sense that a Kalman filter produces ‘estimates’ of unobserved state variables in a linear system”.) In particular, NelsonSmooth showed how to efficiently use information in both lagged and led GARCH residuals to estimate the unobserved states of a stochastic volatility model. Although many observation-driven models have been proposed in the econometric literature, little attention has been paid to the problem of smoothing within this class of models when they are employed as misspecified filters rather than data generating processes.
We aim to fill this gap by introducing a smoothing method for a general class of observation-driven models, namely the score-driven models of GAS1 and Harvey_2013, also known as “Generalized Autoregressive Score” (GAS) models or “Dynamic Conditional Score” (DCS) models. We show that, in the steady state, Kalman filter and smoothing recursions for linear Gaussian models can be rewritten in terms of the score of the conditional density, the Fisher information matrix and a set of static parameters. In particular, the predictive filtering recursion turns out to have the form of score-driven models. The latter can therefore be viewed as approximate filters for nonlinear non-Gaussian models. The performance of these filters has been examined by GAS3, who showed that misspecified score-driven models provide similar forecasting performance to correctly specified parameter-driven models. Based on the same logic, we build a new class of approximate nonlinear smoothers that have a similar form to Kalman backward smoothing recursions but employ the score of the non-Gaussian density. The resulting smoothing method is very general, as it can be applied to any observation density, in a similar fashion to score-driven models. We name the newly proposed methodology Score-Driven Smoother (SDS). Similarly, we introduce a Score-Driven Update (SDU) filter, which allows one to update predictive filtered estimates once new observations become available.
Smoothing with the SDS requires performing a backward recursion following the standard score-driven forward recursion that filters time-varying parameters. While going backward, the SDS updates filtered estimates by including the effect of current and future observations, leading to a more efficient reconstruction of time-varying parameters. In our experiments, we have found that, compared to filtered estimates, the SDS provides gains up to in mean square errors, for a wide class of data generating processes. Given that the likelihood of observation-driven models can typically be written down in closed form, smoothing with the SDS is particularly advantageous from a computational point of view. In contrast, the classical theory of filtering and smoothing for nonlinear non-Gaussian models requires the use of computationally demanding simulation-based techniques (DurbinKoopman). Another relevant advantage of the SDS over traditional simulation-based methods is that the extension to a setting with multiple time-varying parameters is immediate, as it maintains the same simple form as in the univariate case.
This general framework allows one to construct confidence bands around filtered and smoothed estimates. In observation-driven models, confidence bands are typically needed because static parameters are replaced by their maximum likelihood estimates. In the language of BLASQUES2016875, this is known as parameter uncertainty. However, if observation-driven models are employed as filters, the latent state variables are not completely revealed by past observations. Thus, filtering uncertainty also has to be considered when building confidence bands. While confidence bands reflecting parameter uncertainty can be built through the methods developed by BLASQUES2016875, it is less clear how one can take into account filtering uncertainty in observation-driven models. Zamojski proposed a bootstrap-based method to construct in-sample confidence bands for the GARCH. As acknowledged by the author, this method underestimates filtering uncertainty and produces overly narrow confidence bands. We show that, as a byproduct of our results, one can build both in-sample and out-of-sample confidence bands accounting for filtering uncertainty in score-driven models. We examine in detail the construction of confidence bands in the case of the GARCH model. A general and systematic treatment of filtering uncertainty in score-driven models is provided by BBCL.
Score-driven models have been successfully applied in the recent econometric literature. For instance, GAS2 developed a multivariate dynamic model for volatilities and correlations using fat-tailed distributions. HarveyLuati described a new framework for filtering with heavy tails, while OhPatton introduced high-dimensional factor copula models based on score-driven dynamics for systemic risk assessment. As shown by Blasques, in the class of observation-driven models, score-driven models are locally optimal from an information theoretic perspective. For any score-driven model, one can devise companion SDS and SDU recursions. In particular, the SDS is useful for offline signal reconstruction and analysis, while the SDU can be used for online updating of time-varying parameters. We examine in detail the companion SDS and SDU recursions of popular observation-driven models, namely the GARCH, the MEM model of EngleMEM and EngleGallo and an AR(1) model with a time-varying autoregressive coefficient. In order to show the effectiveness of the proposed methodology in a setting with multiple time-varying parameters, we consider the GAS model of GAS2 and the Wishart-GARCH model of realWishart. We show, both on simulated and empirical data, the advantages of SDS and SDU over standard filtered estimates.
A related smoothing technique for a dynamic Student’s t location model was introduced by Harvey_2013, who replaced prediction errors in the Kalman smoothing recursions with a martingale difference that is proportional to the score of the distribution. An application of this smoother can be found in Caivano2016. The main difference with our approach is that we write the Kalman recursions for the mean of time-invariant linear Gaussian models in a general form that only depends on the score and the Fisher information matrix of the observation density. The resulting smoothing recursions are different from those obtained by Harvey_2013 and are easily applicable to a generic score-driven model by replacing the Gaussian density with the observation density at hand. The SDS is also related to the “approximation via mode estimation” technique described by DurbinKoopman2000 and DurbinKoopman. These authors proved that one can find a sequence of approximating linear Gaussian models enabling the computation of the conditional mode of a non-Gaussian model via a Newton-Raphson algorithm. The main difference with our methodology is that the SDS requires a unique, nonlinear recursion rather than a sequence of Kalman recursions for approximating linear Gaussian models. In addition, in our methodology, the filter coincides with a well-known observation-driven model (e.g. GARCH, MEM, ACD, etc.) while the approximation via mode estimation technique uses a sequence of filters that are not easily interpretable as dynamic models.
By performing extensive Monte Carlo simulations of nonlinear non-Gaussian state-space models, we compare the performance of the SDS to that of correctly specified parameter-driven models. In particular, we consider two stochastic volatility models and a stochastic intensity model. Importance sampling methods allow one to evaluate the full likelihood of these models. The Quasi Maximum Likelihood (QML) method of HarveySV is also considered as a benchmark when estimating the two stochastic volatility models. Compared to correctly specified models, the losses incurred by the SDS are very small in all the simulated scenarios and are always lower, on average, than 2.5% in mean square errors. Moreover, the SDS systematically outperforms the QML. Computational times are decisively in favour of the SDS. For the models used in the simulation study, we found that smoothing with the SDS is on average 215 times faster than smoothing with efficient importance sampling techniques. The advantages of the proposed method are also shown on empirical data. Using realized covariance as a proxy for the latent covariance, we show that SDU and SDS covariance estimates obtained through the dynamic GAS model fitted on Russell 3000 stock returns are superior to standard filtered score-driven estimates. The analysis allows us to examine the informational content of present and future log-returns from a dynamic covariance modelling perspective.
The rest of the paper is organized as follows: section 2 introduces the SDS and provides the main theoretical results; section 3 describes several examples of SDS’s and discusses how to construct confidence bands; section 4 shows the results of the Monte Carlo study; in section 5 the SDS is applied in an empirical analysis involving assets of the Russell 3000 index; section 6 concludes.
2 Theoretical framework
In this section, we discuss in detail the main theoretical results leading to the formulation of our approximate, nonlinear smoothing technique. We start by showing that, in the steady state, the classical Kalman filter and smoothing recursions for linear Gaussian models can be rewritten in an alternative form that only involves the score of the conditional likelihood, the Fisher information matrix and a set of static parameters. Abstracting from the linear Gaussian setting, these recursions can be viewed as approximate filtering and smoothing recursions for a non-Gaussian model once scores and information are computed from the non-Gaussian density. We then show that filtering uncertainty in score-driven models can be evaluated as an immediate byproduct of our results.
2.1 Kalman filtering and smoothing
Let us consider a linear Gaussian state-space representation:
(1)  
(2) 
where
is a column vector of state variables and
is a column vector of observations. The parameters , , and are system matrices. Let denote the set of observations up to time , namely . We are interested in updating our knowledge of the underlying state variable when a new observation becomes available and in predicting based on the last observations . We thus define:
(3)  
(4) 
The Kalman filter allows one to recursively compute , , and . Assuming , where and are known, for , we have (Harvey, DurbinKoopman):
(5)  
(6)  
(7) 
and
(8)  
(9)  
(10) 
where . The log-likelihood can be computed in the prediction error decomposition form, namely:
(11) 
Smoothed estimates , , , can be computed through the following backward recursions:
(12)  
(13) 
and
(14)  
(15) 
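For concreteness, the forward filtering recursions and the backward smoothing recursions above can be sketched for the univariate local level model, in which all system matrices are scalars. The code below is a minimal illustration and not part of the paper's implementation; the noise variances q and h are assumed known.

```python
import numpy as np

def kalman_local_level(y, q, h, a1=0.0, p1=1e7):
    """Kalman filter and backward smoother for the local level model
    y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t (scalar state)."""
    n = len(y)
    a = np.zeros(n + 1)          # predicted state mean a_t = E[alpha_t | y_{1:t-1}]
    p = np.zeros(n + 1)          # predicted state variance
    v = np.zeros(n)              # prediction errors
    f = np.zeros(n)              # prediction error variances
    a[0], p[0] = a1, p1          # diffuse-like initialization
    loglik = 0.0
    for t in range(n):
        v[t] = y[t] - a[t]
        f[t] = p[t] + h
        k = p[t] / f[t]                      # Kalman gain (here T = Z = 1)
        a[t + 1] = a[t] + k * v[t]
        p[t + 1] = p[t] * (1.0 - k) + q
        loglik += -0.5 * (np.log(2 * np.pi * f[t]) + v[t] ** 2 / f[t])
    # backward smoothing: alpha_hat_t = a_t + p_t * r_{t-1}, starting from r_n = 0
    r = 0.0
    a_smooth = np.zeros(n)
    for t in range(n - 1, -1, -1):
        r = v[t] / f[t] + (1.0 - p[t] / f[t]) * r
        a_smooth[t] = a[t] + p[t] * r
    return a[:-1], a_smooth, loglik

# usage: the smoother reconstructs the state more accurately than the one-step filter
rng = np.random.default_rng(0)
n, q, h = 500, 0.1, 1.0
alpha = np.cumsum(rng.normal(0.0, np.sqrt(q), n))
y = alpha + rng.normal(0.0, np.sqrt(h), n)
a_pred, a_smooth, loglik = kalman_local_level(y, q, h)
```

The backward pass reuses the stored prediction errors and variances, which is the same computational pattern exploited later by the SDS.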
where , , and . The conditional distribution of
is Gaussian with mean and variance given by
, , , depending on the conditioning set.2.2 A more general representation
In Appendix A we prove the following:
Proposition 1
Note that a steady state solution exists whenever the system matrices are constant (Harvey, DurbinKoopman). In this case, the variance matrix converges to after a few time steps. The new Kalman recursions for the mean are reparameterized in terms of the score and the Fisher information matrix . This representation is equivalent to the one in equations (6), (7) and (12), (13). However, it is more general, as it only relies on the predictive density . In principle, the forward recursions (16), (17) and the backward recursions (18), (19) can be applied to any parameter-driven model for which a measurement density is defined.
2.3 SDS recursions
Note that the predictive filter (17) has an autoregressive structure and is driven by the score of the conditional likelihood, i.e. it has the form of the score-driven models of GAS1 and Harvey_2013. Thus, if one looks at score-driven models as filters, it turns out that the score-driven filter (SDF hereafter) is optimal in the case of linear Gaussian models. In the case of nonlinear non-Gaussian models, the SDF can be regarded as an approximate nonlinear filter. The main difference with the Kalman filter is that the Gaussian score is replaced by the score of the true conditional density, thus providing robustness to non-Gaussianity. As shown by GAS3, score-driven filters have similar predictive accuracy to correctly specified nonlinear non-Gaussian models, while at the same time providing significant computational gains. Indeed, the likelihood can be written in closed form and standard quasi-Newton techniques can be employed for optimization.

Based on the same principle, we introduce an approximate nonlinear smoother allowing one to estimate time-varying parameters using all available observations. In the case of linear Gaussian models, the Kalman smoother is the minimum variance linear unbiased estimator (MVLUE) of the state. Thus, we define our smoother in such a way that it coincides with the latter in this specific case. In the case of nonlinear non-Gaussian models, it maintains the same simple form as the Kalman backward smoothing recursions but replaces the Gaussian score with that of the non-Gaussian density.
Let us assume that observations , , are generated by the following observation density:
(21) 
where is a vector of time-varying parameters and is a vector of static parameters. We generalize the filtering and smoothing recursions (16)-(19) for the measurement density as:
(22)  
(23) 
and:
(24)  
(25) 
where and . The predictive filter in equation (23) has the same form as score-driven models. The term is now the score of the measurement density , namely:
(26) 
while is the information matrix, which may be time-varying. The vector and the two matrices are static parameters included in . They are estimated by maximizing the log-likelihood, namely:
(27) 
Thus, one can run the backward smoothing recursions (24), (25) after computing the forward filtering recursions (22), (23), in a similar fashion to the Kalman filter and smoothing recursions. Note that the above recursions are nonlinear, as the score of a non-Gaussian density is typically nonlinear in the observations. The filter in equation (22) allows one to update the current estimate once a new observation becomes available. While going backward, the smoothing recursions (24), (25) update the two filters and using all available observations. Smoothed estimates are generally less noisy than filtered estimates, and provide a more accurate reconstruction of the time-varying parameters.
It is standard practice in score-driven models to replace the score with the scaled score . The role of the scaling matrix is to take into account the curvature of the log-likelihood function. GAS1 discussed several choices of based on inverse powers of the information matrix . For instance, given a normal density with time-varying variance, if , one recovers the standard GARCH model. The filtering and smoothing recursions (22)-(25) are obtained if one sets equal to the identity matrix. When using a scaled score , the filtering recursions (22), (23) become:
(28)  
(29) 
Since the score is now scaled by , the term in equation (24) has to take into account the new normalization. We thus replace with . As a result, we obtain the general backward smoothing recursions:
(30)  
(31) 
Note that the second equation is unaffected, as the term already corrects for the scaling. For instance, if , we obtain:
(32)  
(33) 
that is, the information matrix disappears because its effect is already taken into account when scaling the score. If , we get:
(34)  
(35) 
From a computational point of view, the backward recursions (30), (31) are simple since and are typically available from the forward filtering recursion. We term the approximate smoother obtained through recursions (30), (31) the Score-Driven Smoother (SDS). Basically, for any score-driven model, one can devise a companion SDS recursion that only requires the , and the static parameters, as estimated through the SDF. Note that the forward recursion (28) is the analogue of recursion (6) in the Kalman filter and allows one to update SDF estimates once a new observation becomes available. We denote the approximate Score-Driven Update filter (28) by SDU. The proposed methodology can thus be schematically represented through the following procedure:

1. Estimation of static parameters:
2. Forward predictive and update filter:
3. Backward smoother:
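The estimation step exploits the closed-form likelihood of the observation-driven model. As an illustration, the static parameters of a Gaussian GARCH(1,1) can be estimated by numerically maximizing the likelihood in prediction error decomposition form; the sketch below uses illustrative data generating values and optimizer settings, and is not the paper's code.

```python
import numpy as np
from scipy.optimize import minimize

def garch_negloglik(params, y):
    """Negative Gaussian log-likelihood of a GARCH(1,1) in prediction error
    decomposition form; f is the one-step-ahead conditional variance."""
    omega, alpha, beta = params
    f = np.var(y)                 # initialize at the sample variance
    nll = 0.0
    for t in range(len(y)):
        nll += 0.5 * (np.log(2 * np.pi * f) + y[t] ** 2 / f)
        f = omega + alpha * y[t] ** 2 + beta * f
    return nll

# simulate a GARCH(1,1) path and estimate (omega, alpha, beta) by ML
rng = np.random.default_rng(1)
n, omega0, alpha0, beta0 = 3000, 0.05, 0.10, 0.85
f_t, y = omega0 / (1 - alpha0 - beta0), np.empty(n)
for t in range(n):
    y[t] = rng.normal(0.0, np.sqrt(f_t))
    f_t = omega0 + alpha0 * y[t] ** 2 + beta0 * f_t
res = minimize(garch_negloglik, x0=[0.1, 0.05, 0.8], args=(y,),
               method="L-BFGS-B",
               bounds=[(1e-6, None), (1e-6, 0.999), (1e-6, 0.999)])
omega_hat, alpha_hat, beta_hat = res.x
```

Once the static parameters are estimated, the forward SDF/SDU pass and the backward SDS pass reuse them without any simulation.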
2.4 Filtering uncertainty
The general framework developed in section 2.3 also allows one to construct in-sample and out-of-sample confidence bands around filtered and smoothed estimates. As underlined by BLASQUES2016875, confidence bands can reflect both parameter and filtering uncertainty. Parameter uncertainty is related to the fact that static parameters are replaced by their maximum likelihood estimates. Both observation-driven and parameter-driven models are affected by parameter uncertainty. In observation-driven models, confidence bands reflecting parameter uncertainty can be constructed through the methods developed by BLASQUES2016875. Filtering uncertainty is related to the fact that time-varying parameters are not completely revealed by observations. As such, it is absent in observation-driven models, where time-varying parameters are deterministic functions of past observations. However, if observation-driven models are regarded as filters, one is interested in constructing confidence bands around filtered and smoothed estimates reflecting the conditional distribution of the underlying state variable.
In linear Gaussian models, filtering uncertainty can be assessed through the variance matrices , , introduced in section 2.1, which provide the conditional variance of the unobserved state variable. It is instead less clear how one can quantify filtering uncertainty in misspecified observation-driven models. Zamojski proposed a bootstrap-based method for assessing filtering uncertainty in GARCH filters. Confidence bands constructed through this technique tend to underestimate filtering uncertainty, because they are based on bootstraps of the filter rather than the underlying state variable. In addition, the method of Zamojski does not allow one to construct out-of-sample confidence bands, which are often needed in practical applications.
In our framework, in-sample and out-of-sample confidence bands can be constructed by exploiting the relation between Kalman filter recursions and score-driven recursions. In section 2.2, we have shown that the steady state variance matrix can be expressed as:
(36) 
In the score-driven framework, the analogue of , which we denote by , is then given by:
(37) 
where the scaling matrix is introduced to take into account different normalizations of the score. From eq. (9), (A.2), the analogue of is:
(38) 
Similarly, the analogue of , from eq. (14), (15), is:
(39)  
(40) 
with and .
Confidence bands can be computed as quantiles of the conditional distribution of the state variable. For a general state-space model, the latter is non-Gaussian and is not known analytically. Assuming a Gaussian density generally leads to underestimating filtering uncertainty, as the true conditional density is typically fat-tailed. In order to construct robust confidence bands, we use a more flexible density determined by matching location and scale parameters with those of the normal density. This method is described in its full generality by BBCL. In section 3.2, we show an application to the GARCH and assess the performance of robust confidence bands in a simulation study.

3 Examples of SDS recursions
In this section we provide several examples of SDS estimates. As a first step, we focus on two volatility models that are quite popular in the econometric literature, namely the GARCH model of Bollerslev and the multiplicative error model (MEM) of EngleMEM and EngleGallo. These are score-driven models which are amenable to treatment within our framework. As a third example, we present an AR(1) model with a score-driven autoregressive coefficient. The time-varying autoregressive coefficient allows one to capture temporal variations in persistence, as well as nonlinear dependencies (BlasquesNonlinear). Autoregressive models with time-varying coefficients have been employed by DELLEMONACHE2017482 and SHARK for inflation and volatility forecasting, respectively.

One of the advantages of the SDS recursions (30), (31) is that they maintain the same simple form when the time-varying parameter is a vector containing multiple time-varying parameters. In this multivariate setting, the use of simulation-based techniques would be highly computationally demanding. In order to test the SDS in a multivariate setting, we consider the GAS model of GAS2 and the Wishart-GARCH model of realWishart. The former is a conditional correlation model for heavy-tailed returns while the latter is a joint model for the dynamics of daily returns and realized covariance matrices. In these models, the number of time-varying parameters grows as the square of the number of assets and therefore they provide an interesting multivariate framework in which to assess the performance of the SDS.
1. GARCH-SDS
Consider the model:
(41) 
The predictive density is thus:
(42) 
Setting and , equations (28), (29) reduce to:
(43)  
(44) 
In particular, the predictive filter (44) is the standard GARCH(1,1) model. The smoothing recursions (30), (31) reduce to:
(45)  
(46) 
.
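To make the mapping explicit, the following sketch verifies numerically that the score-driven recursion with scaled score s_t = y_t^2 - f_t coincides with the standard GARCH(1,1) once the coefficients are mapped as b = alpha + beta. The parameter values are illustrative.

```python
import numpy as np

def garch_standard(y, omega, alpha, beta, f1):
    """Standard GARCH(1,1): f_{t+1} = omega + alpha*y_t^2 + beta*f_t."""
    f = np.empty(len(y))
    f[0] = f1
    for t in range(len(y) - 1):
        f[t + 1] = omega + alpha * y[t] ** 2 + beta * f[t]
    return f

def garch_score_driven(y, omega, a, b, f1):
    """Score-driven recursion f_{t+1} = omega + a*s_t + b*f_t, where
    s_t = y_t^2 - f_t is the score of N(0, f_t) scaled by the inverse information."""
    f = np.empty(len(y))
    f[0] = f1
    for t in range(len(y) - 1):
        s_t = y[t] ** 2 - f[t]
        f[t + 1] = omega + a * s_t + b * f[t]
    return f

rng = np.random.default_rng(2)
y = rng.normal(size=200)
f_garch = garch_standard(y, 0.05, 0.10, 0.85, 1.0)
f_sds = garch_score_driven(y, 0.05, a=0.10, b=0.95, f1=1.0)  # b = alpha + beta
```

The two paths are identical by construction, since omega + a*(y^2 - f) + b*f = omega + a*y^2 + (b - a)*f.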
2. MEM-SDS
Consider the model:
(47) 
where has a Gamma distribution with density . The predictive density is thus given by:
(48) 
Setting and , equations (28), (29) reduce to:
(49)  
(50) 
In particular, the predictive filter (50) is the standard MEM(1,1) model. The smoothing recursions (30), (31) reduce to:
(51)  
(52) 
.
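Analogously to the GARCH case, the scaled score of the Gamma density is x_t - mu_t, so the predictive score-driven recursion reproduces the MEM(1,1). A small simulation sketch follows; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi = 2000, 4.0
omega, alpha, beta = 0.10, 0.15, 0.80

# simulate x_t = mu_t * eps_t with Gamma(phi, 1/phi) innovations (unit mean)
mu_true = np.empty(n)
x = np.empty(n)
mu_true[0] = omega / (1 - alpha - beta)
for t in range(n):
    x[t] = mu_true[t] * rng.gamma(phi, 1.0 / phi)
    if t + 1 < n:
        mu_true[t + 1] = omega + alpha * x[t] + beta * mu_true[t]

# score-driven form: the scaled score of the Gamma density is s_t = x_t - mu_t,
# so mu_{t+1} = omega + alpha*s_t + (alpha + beta)*mu_t is exactly MEM(1,1)
mu_filt = np.empty(n)
mu_filt[0] = x.mean()
for t in range(n - 1):
    s_t = x[t] - mu_filt[t]
    mu_filt[t + 1] = omega + alpha * s_t + (alpha + beta) * mu_filt[t]
```

Positivity of the filtered mean is automatic, since the recursion rewrites as omega + alpha*x_t + beta*mu_t with positive coefficients.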
3. AR(1)-SDS
Consider the model:
(53) 
The predictive density is thus given by:
(54) 
Setting and , equations (28), (29) reduce to:
(55)  
(56) 
while the smoothing recursions (30), (31) reduce to:
(57)  
(58) 
.
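A minimal sketch of the time-varying coefficient filter: the score of the Gaussian density with respect to phi_t is (y_t - phi_t*y_{t-1})*y_{t-1}/sigma^2. The static parameters below are illustrative assumptions, not estimated values.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2 = 3000, 0.5
phi_true = 0.5 + 0.3 * np.sin(2 * np.pi * np.arange(n) / n)  # slowly varying coefficient

y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true[t] * y[t - 1] + rng.normal(0.0, np.sqrt(sigma2))

# score of N(phi_t*y_{t-1}, sigma2) w.r.t. phi_t: (y_t - phi_t*y_{t-1})*y_{t-1}/sigma2
omega, a, b = 0.015, 0.02, 0.97   # illustrative; omega = (1 - b) * mean level
phi_filt = np.full(n, 0.5)
for t in range(1, n - 1):
    s_t = (y[t] - phi_filt[t] * y[t - 1]) * y[t - 1] / sigma2
    phi_filt[t + 1] = omega + b * phi_filt[t] + a * s_t
```

The filter tracks the slowly varying coefficient much better than a constant benchmark, which is the effect the SDS then sharpens further in the backward pass.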
4. GAS-SDS

Let denote a vector of demeaned daily log-returns. Consider the following observation density:
(59) 
where is a time-varying covariance matrix and is the number of degrees of freedom. Note that this is a normalized Student-t distribution such that . Applying the filtering equation (29) leads to the GAS model of GAS2. Closed-form formulas for the score and information matrix are reported in GAS2. These authors also proposed two parameterizations leading to positive-definite estimates. The first is similar to the one used in the DCC model of EngleDCC, while the second is based on hyperspherical coordinates. In the two parameterizations, the number of time-varying parameters is and , respectively.

5. Wishart-GARCH-SDS
Let us assume that, in addition to daily log-returns , we can compute realized measures from the intraday returns of the assets. Let denote a positive definite estimate of the realized covariance matrix. Let also denote the field generated by and . The observation density in the Wishart-GARCH model is:
(60)  
(61) 
where is a multivariate zero-mean normal distribution with covariance matrix and is a Wishart distribution with mean and degrees of freedom . Assuming that and are conditionally independent given , the conditional log-likelihood can be written as:
(62) 
where:
(63)  
(64) 
Here , and is the multivariate Gamma function of order . We denote the vector of time-varying covariances by , . The score and the information matrix can be computed as reported in realWishart. OpschoorHeavy proposed an alternative specification with a heavy-tailed distribution for both returns and realized measures. Similar SDS recursions can be recovered for this fat-tailed specification using our general framework.
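Under the conditional independence assumption, one term of the log-likelihood (62) can be evaluated with standard routines. The helper below is a hypothetical sketch (the function name and interface are our own); the Wishart scale is set to V_t/nu so that the realized covariance has conditional mean V_t.

```python
import numpy as np
from scipy.stats import multivariate_normal, wishart

def wishart_garch_loglik_t(r_t, RC_t, V_t, nu):
    """One term of the conditional log-likelihood: a zero-mean Gaussian density
    for the daily return plus a Wishart density for the realized covariance;
    scale V_t/nu makes the Wishart conditional mean equal to V_t."""
    ll_r = multivariate_normal(mean=np.zeros(len(r_t)), cov=V_t).logpdf(r_t)
    ll_rc = wishart(df=nu, scale=V_t / nu).logpdf(RC_t)
    return ll_r + ll_rc

# example evaluation for two assets with V_t equal to the identity matrix
val = wishart_garch_loglik_t(np.array([0.1, -0.2]), 1.1 * np.eye(2), np.eye(2), nu=5)
```

Summing such terms over time gives the closed-form likelihood maximized in the estimation step.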
Figures 1-3 show several examples of SDS estimates from the above models. The time-varying parameters follow both deterministic and stochastic patterns and are generated as described in the next paragraph.
3.1 Comparison of SDF and SDS estimates
In order to show the effectiveness of the proposed methodology, we compare SDF and SDS estimates. It is natural to expect that SDS estimates have lower estimation errors, as they use more information when reconstructing time-varying parameters. However, comparing the two allows us to provide a quantitative assessment of the benefits of using the SDS in place of standard score-driven estimates.
We first focus on the univariate models (GARCH, MEM, AR) and simulate time series of observations with different dynamic patterns for the time-varying parameters. The first 2000 observations are used to estimate the models while the remaining observations are used for testing. Let generically denote the time-varying parameters , and in the three models. We consider the following data generating processes for :

- Slow sine:
- Fast sine:
- Ramp:
- Step:
- Model:

We set , , , , . For some of these dynamic specifications, figure 1 shows examples of filtered and smoothed estimates of time-varying parameters obtained through the GARCH. As expected, SDS estimates are less noisy than filtered estimates and provide a more accurate reconstruction of time-varying parameters.
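The deterministic test patterns above can be generated as follows. The levels and frequencies are illustrative assumptions and do not reproduce the paper's exact constants, which are not shown here.

```python
import numpy as np

def make_paths(n=4000, lo=0.5, hi=2.0):
    """Deterministic test patterns for a time-varying parameter: slow/fast
    sinusoids, a repeating ramp and a single step (values are illustrative)."""
    t = np.arange(n)
    return {
        "slow_sine": lo + 0.5 * (hi - lo) * (1 + np.sin(2 * np.pi * t / n)),
        "fast_sine": lo + 0.5 * (hi - lo) * (1 + np.sin(20 * np.pi * t / n)),
        "ramp": lo + (hi - lo) * (t % (n // 2)) / (n // 2),
        "step": np.where(t < n // 2, lo, hi),
    }

paths = make_paths()
```

Feeding each path into the simulators of the previous section produces the test beds on which SDF, SDU and SDS errors are compared.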
Table 1 shows average MSE and MAE of SDF and SDS estimates, for all the patterns considered above. We also report MSE and MAE obtained through the SDU filter in equation (28). The latter updates once a new observation arrives. This translates into a slight improvement over filtered estimates. The SDS, using all available observations, significantly improves on SDF estimates, with relative gains larger than and lower than in mean square errors.
We now consider the two multivariate models, namely the GAS and the Wishart-GARCH. We compare SDF estimates to SDU and SDS estimates in a simulation setting where time series of daily realized covariance matrices and log-returns are generated as described in Appendix B. The aim of the experiment is to estimate the true covariance matrix from observations of daily returns in the GAS model and from observations of both daily returns and realized covariance matrices in the Wishart-GARCH model. We consider three scenarios where the number of assets is respectively, and thus we have time-varying covariances. (We implement the GAS model using hyperspherical coordinates, and thus we have time-varying covariances.)
For , figures 2 and 3 compare SDF and SDS estimates of the and elements of the simulated covariance matrix in the GAS and Wishart-GARCH models, respectively. As in the previous univariate cases, smoothed estimates provide a better reconstruction of the time-varying covariances. Note that, compared to the GAS model, the Wishart-GARCH provides estimates which are closer to the simulated , as they are obtained by conditioning on a larger information set.
In order to quantify estimation errors, we use the root mean square error (RMSE) and the quasi-likelihood (Qlike), which are robust loss measures for covariance estimates (Patton2011246). These are defined in Appendix B. Table 2 shows relative RMSE and Qlike gains of SDU and SDS estimates over SDF. We first note that SDU and SDS provide significantly lower RMSE. In the GAS model, the relative gain of the SDU is roughly equal to 3%, while that of the SDS is larger than 14% and lower than 19%. In the Wishart-GARCH model, the relative gain of the SDU is larger than 7% and lower than 13%, while that of the SDS is larger than 13% and lower than 19%. It is interesting to note that SDU gains are significantly larger in the Wishart-GARCH model. This is due to the fact that today’s realized covariance is a highly informative proxy of , thus leading to a drastic RMSE reduction when included in the information set. In contrast, daily returns are less informative and thus it is necessary to include all available observations to achieve a significant RMSE reduction in the GAS model. If one looks at the Qlike loss, the relative gains of SDU and SDS are moderate compared to RMSE but they are statistically significant. Even in this case, SDU gains are larger in the Wishart-GARCH model due to the highly informative content of realized covariance measures.
3.2 Confidence bands
In section 2.4, we have seen that an estimate of the conditional variance of the state variable is given by the variance matrices , and defined in eq. (37), (38), (40). As in the Kalman filter, one can use these variances to construct confidence bands around filtered and smoothed estimates. However, the conditional density of the state variable is typically fat-tailed and cannot be written in closed form. Assuming normality generally produces overly narrow confidence bands and thus underestimates filtering uncertainty.
Robust in-sample and out-of-sample confidence bands can be constructed by computing quantiles of a more flexible distribution determined by matching location and scale parameters. We illustrate here an application of this technique in the case of the GARCH and provide a systematic treatment in BBCL.
Let us consider the following stochastic volatility model:
(65)  
(66) 
We are interested in computing quantiles of the conditional density of . Filtered and smoothed estimates of the latent log-variance are recovered by computing the score-driven recursions for the following observation density:
As an outcome of this procedure, we also obtain the conditional variances , and . Let , , be the conditional distribution function of . The quantile function of is then given by:
(67) 
As a first approximation, we compute quantiles by assuming . For , , we obtain , , , respectively. These conditional densities depend on parameters which are an output of the scoredriven recursions and thus confidence bands can be easily computed through eq. (67) using the Gaussian quantile function. We then assume , i.e. a Student’s distribution with location , scale and degrees of freedom. If , we recover the Gaussian confidence bands. However, if is finite, confidence bands will be larger and provide a better approximation to the true filtering uncertainty. In this example, the parameter is chosen by fitting a distribution on the residuals of an AR model estimated on . More sophisticated techniques are developed in BBCL.
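The construction can be sketched as follows: given a conditional mean and variance from the score-driven recursions, Gaussian bands use normal quantiles, while robust bands use Student's t quantiles with the same location and scale. The function name and interface below are our own illustration, not the paper's code.

```python
import numpy as np
from scipy import stats

def state_bands(m, V, level=0.95, nu=None):
    """Confidence band for the latent state given its conditional mean m and
    variance V. nu=None uses Gaussian quantiles; a finite nu uses Student's t
    quantiles with the same location and scale, yielding wider, robust bands."""
    q = 0.5 * (1.0 + level)
    scale = np.sqrt(V)
    if nu is None:
        half = stats.norm.ppf(q) * scale
    else:
        half = stats.t.ppf(q, df=nu) * scale
    return m - half, m + half

lo_g, hi_g = state_bands(0.0, 1.0, level=0.95)          # Gaussian band
lo_t, hi_t = state_bands(0.0, 1.0, level=0.95, nu=5)    # robust band
```

As nu grows, the t quantiles converge to the Gaussian ones and the two bands coincide, matching the limiting case described in the text.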
In order to test the quality of confidence bands, we generate 1000 time series of observations of the stochastic volatility model (65), (66). The values of the static parameters are chosen in order to be similar to those obtained when estimating the model on real financial returns: , , . Figure 4 shows one of the simulated patterns, together with 95% confidence bands for filtered and smoothed estimates computed through the method described above.
We estimate the GARCH in the subsample comprising the first 2000 observations and construct in-sample confidence bands. In the remaining subsample of 2000 observations, out-of-sample bands are constructed using the previous parameter estimates. Both Gaussian and robust confidence bands are built at the 90%, 95%, 99% nominal confidence levels. We compare the nominal confidence level to the coverage, defined as the fraction of times the true variance path is inside the confidence bands. Table 3 shows average coverages for in-sample and out-of-sample SDF, SDU and SDS confidence bands. As expected, confidence bands constructed by assuming a Gaussian density provide an average coverage which is significantly lower than the nominal confidence level, meaning that they underestimate filtering uncertainty. In contrast, the average coverage of robust confidence bands is very close to the nominal level. Similar results are found when changing the variance of the latent process. In particular, for larger values of , the quality of Gaussian confidence bands further deteriorates, while robust bands still provide a good match to the nominal level. A systematic treatment of the technique described here and, more generally, of filtering uncertainty in observation-driven models can be found in BBCL.
4 Monte Carlo analysis
In this section we perform extensive Monte Carlo simulations to test the performance of the SDS under different dynamic specifications for the time-varying parameters. Since we interpret the SDS as an approximate smoother for nonlinear non-Gaussian models, we compare its performance to that of correctly specified parameter-driven models. The main idea is to examine the extent to which the approximation leads to results similar to those of correctly specified parameter-driven models. In this case, the use of the SDS would be particularly advantageous from a computational point of view, as the likelihood of score-driven models can be written in closed form and smoothing can be performed through a simple backward recursion. This analysis is similar in spirit to that of GAS3, who compared score-driven models to correctly specified parameter-driven models and found that the two classes of models have similar predictive accuracy, with very small average losses. We find a similar result for the SDS.
4.1 Linear non-Gaussian models
We first consider an AR(1) model with a Student's t distributed measurement error:
(68)  
(69) 
We choose and . The signal-to-noise ratio is defined as . The corresponding observation-driven model is a location model (Harvey_2013) with predictive density:
(70) 
Setting , equation (29) reduces to:
(71) 
while the smoothing recursions (30), (31) reduce to:
(72)  
(73) 
We compare standard Kalman filtered and smoothed estimates to SDF, SDU and SDS estimates. As in the previous simulation studies, we generate 1000 time series of 4000 observations and use the first subsample of 2000 observations for estimation and the remaining observations for testing. Table 4 shows relative MSE and MAE for different values of . Note that SDF, SDU and SDS provide better estimates than the standard Kalman filter and smoother. In particular, we observe large differences for low values of , where the distribution strongly deviates from the Gaussian, and for low values of , at which accounting for the non-normality of the measurement error becomes more important. Note also that the gains of the SDS over Kalman smoother estimates are larger than the gains of the SDF over the Kalman filter for low and . These results confirm the ability of the SDS to provide robust smoothed estimates of time-varying parameters, to the same extent as the SDF provides robust filtered estimates in the presence of a non-Gaussian prediction density.
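The robustness mechanism behind the t location model can be illustrated directly: the score of the Student's t location density is bounded in the prediction error, so a single outlier barely moves the score-driven filter, while a Gaussian-score (linear) filter of the same form is dragged away. All parameter values below are illustrative assumptions.

```python
import numpy as np

def t_score(v, nu, sigma2):
    """Score of the Student's t location density w.r.t. the location: bounded
    in the prediction error v, so large outliers are automatically downweighted."""
    return (nu + 1.0) * v / (nu * sigma2 + v ** 2)

rng = np.random.default_rng(5)
n, nu, sigma2 = 500, 3.0, 1.0
mu_true = np.zeros(n)
for t in range(1, n):
    mu_true[t] = 0.98 * mu_true[t - 1] + rng.normal(0.0, 0.1)
y = mu_true + np.sqrt(sigma2) * rng.standard_t(nu, size=n)
y[250] += 50.0                                    # inject a single large outlier

omega, a, b = 0.0, 0.3, 0.98                      # illustrative static parameters
mu_rob = np.zeros(n)                              # Student's t score (robust)
mu_lin = np.zeros(n)                              # Gaussian score (linear in v)
for t in range(n - 1):
    v_rob = y[t] - mu_rob[t]
    v_lin = y[t] - mu_lin[t]
    mu_rob[t + 1] = omega + b * mu_rob[t] + a * t_score(v_rob, nu, sigma2)
    mu_lin[t + 1] = omega + b * mu_lin[t] + a * v_lin
```

Right after the outlier, the linear filter jumps by roughly a times the outlier size, while the bounded score leaves the robust filter essentially unchanged.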
4.2 Nonlinear non-Gaussian models
We now examine the behavior of the SDS in the presence of nonlinear non-Gaussian parameter-driven models. In particular, we consider the following three specifications, which are quite popular in the econometric literature:

- Stochastic volatility model with Gaussian measurement density (the same stochastic volatility model considered in section 3.2):
- Stochastic volatility model with non-Gaussian measurement density:
- Stochastic intensity model with Poisson measurement density:
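As a concrete instance of the third specification, a Poisson stochastic intensity model and a companion score-driven filter for the log-intensity can be sketched as follows. The score of the Poisson density with respect to the log-intensity is y_t - exp(theta_t); all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, c, phi, se = 2000, 0.05, 0.95, 0.15

# stochastic log-intensity theta_t follows a Gaussian AR(1); counts are Poisson
theta = np.zeros(n)
theta[0] = c / (1 - phi)
for t in range(n - 1):
    theta[t + 1] = c + phi * theta[t] + rng.normal(0.0, se)
y = rng.poisson(np.exp(theta))

# score-driven filter for the log-intensity: the score of the Poisson density
# w.r.t. theta_t (log link) is y_t - exp(theta_t); parameters are illustrative
omega, a, b = 0.05, 0.10, 0.95
f = np.zeros(n)
f[0] = np.log(max(y.mean(), 1e-8))
for t in range(n - 1):
    f[t + 1] = omega + b * f[t] + a * (y[t] - np.exp(f[t]))
```

This misspecified observation-driven filter is the kind of baseline against which the correctly specified parameter-driven smoother is compared in the Monte Carlo study.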