A General Class of Score-Driven Smoothers

03/13/2018
by   Giuseppe Buccheri, et al.
Scuola Normale Superiore

Motivated by the observation that score-driven models can be viewed as approximate filters, we introduce a new class of simple approximate smoothers for nonlinear non-Gaussian state-space models, which we name "score-driven smoothers" (SDS). The newly proposed SDS improves on standard score-driven filtered estimates, as it is able to use all available observations when reconstructing time-varying parameters. In contrast to complex and computationally demanding simulation-based methods, the SDS has a structure similar to that of the Kalman backward smoothing recursions but uses the score of the non-Gaussian observation density. Through an extensive Monte Carlo study, we provide evidence that the performance of the approximation is very close to that of simulation-based techniques (with average differences lower than 2.5% in mean square errors), while at the same time requiring a significantly lower computational burden.



1 Introduction

Observation-driven models like the GARCH model of Bollerslev, where time-varying parameters are driven by functions of lagged observations, are typically viewed as data generating processes. As such, all relevant information is encoded in past observations and there is no room for using current and future observations when estimating time-varying parameters. However, they can also be viewed as predictive filters, as time-varying parameters are one-step-ahead predictable. This idea was largely exploited by Daniel B. Nelson, who explored the asymptotic properties of conditional covariances of a misspecified GARCH under the assumption that the data generating process is a diffusion; see NELSON199261, NelsonFoster, NELSON1995303 and NelsonSmooth. The interpretation of GARCH processes as filters is well described in this statement by NELSON199261: "Note that our use of the term 'estimate' corresponds to its use in the filtering literature rather than the statistics literature; that is, an ARCH model with (given) fixed parameters produces 'estimates' of the true underlying conditional covariance matrix at each point in time in the same sense that a Kalman filter produces 'estimates' of unobserved state variables in a linear system". In particular, NelsonSmooth showed how to efficiently use information in both lagged and led GARCH residuals to estimate the unobserved states of a stochastic volatility model. Although many observation-driven models have been proposed in the econometric literature, little attention has been paid to the problem of smoothing within this class of models when they are employed as misspecified filters rather than data generating processes.

We aim to fill this gap by introducing a smoothing method for a general class of observation-driven models, namely the score-driven models of GAS1 and Harvey_2013, also known as "Generalized Autoregressive Score" (GAS) models or "Dynamic Conditional Score" (DCS) models. We show that, in the steady state, the Kalman filter and smoothing recursions for linear Gaussian models can be re-written in terms of the score of the conditional density, the Fisher information matrix and a set of static parameters. In particular, the predictive filtering recursion turns out to have the form of score-driven models. The latter can therefore be viewed as approximate filters for nonlinear non-Gaussian models. The performance of these filters has been examined by GAS3, who showed that misspecified score-driven models provide forecasting performance similar to that of correctly specified parameter-driven models. Based on the same logic, we build a new class of approximate nonlinear smoothers that have a form similar to the Kalman backward smoothing recursions but employ the score of the non-Gaussian density. The resulting smoothing method is very general, as it can be applied to any observation density, in a similar fashion to score-driven models. We name the newly proposed methodology the Score-Driven Smoother (SDS). Similarly, we introduce a Score-Driven Update (SDU) filter, which allows updating predictive filtered estimates once new observations become available.
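To fix ideas, the predictive score-driven (GAS) recursion has the generic form f_{t+1} = ω + B f_t + A s_t, where s_t is the (scaled) score of the chosen observation density. The following is a minimal univariate sketch, not code from the paper; parameter values and function names are illustrative:

```python
import numpy as np

def score_driven_filter(y, score, f1, omega, A, B):
    """Generic univariate score-driven (GAS) predictive recursion:
        f_{t+1} = omega + B * f_t + A * s_t,
    where s_t = score(y_t, f_t) is the (scaled) score of the
    observation density evaluated at the current parameter value."""
    f = np.empty(len(y) + 1)
    f[0] = f1
    for t in range(len(y)):
        f[t + 1] = omega + B * f[t] + A * score(y[t], f[t])
    return f

# Gaussian density with time-varying variance: the scaled score y^2 - f
# turns the recursion into a GARCH(1,1) with alpha = A, beta = B - A
y = np.array([0.5, -1.2, 0.3, 2.0, -0.4])
f = score_driven_filter(y, lambda yt, ft: yt**2 - ft, 1.0, 0.05, 0.1, 0.95)
```

With these values the first update is f[1] = 0.05 + 0.95·1.0 + 0.1·(0.25 − 1.0) = 0.925, identical to the GARCH(1,1) recursion 0.05 + 0.1·y² + 0.85·f.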

Smoothing with the SDS requires performing a backward recursion after the standard score-driven forward recursion used to filter time-varying parameters. While going backward, the SDS updates filtered estimates by including the effect of current and future observations, leading to a more efficient reconstruction of time-varying parameters. In our experiments, we have found that, compared to filtered estimates, the SDS provides substantial gains in mean square errors for a wide class of data generating processes. Given that the likelihood of observation-driven models can typically be written down in closed form, smoothing with the SDS is particularly advantageous from a computational point of view. In contrast, the classical theory of filtering and smoothing for nonlinear non-Gaussian models requires the use of computationally demanding simulation-based techniques (DurbinKoopman). Another relevant advantage of the SDS over traditional simulation-based methods is that the extension to a setting with multiple time-varying parameters is immediate, as it maintains the same simple form as in the univariate case.

This general framework also allows the construction of confidence bands around filtered and smoothed estimates. In observation-driven models, confidence bands are typically needed because static parameters are replaced by their maximum likelihood estimates. In the language of BLASQUES2016875, this is known as parameter uncertainty. However, if observation-driven models are employed as filters, the latent state variables are not completely revealed by past observations. Thus, filtering uncertainty also has to be considered when building confidence bands. While confidence bands reflecting parameter uncertainty can be built through the methods developed by BLASQUES2016875, it is less clear how one can take into account filtering uncertainty in observation-driven models. Zamojski proposed a bootstrap-based method to construct in-sample confidence bands for the GARCH. As acknowledged by the author, this method underestimates filtering uncertainty and provides overly narrow confidence bands. We show that, as a byproduct of our results, one can build both in-sample and out-of-sample confidence bands accounting for filtering uncertainty in score-driven models. We examine in detail the construction of confidence bands in the case of the GARCH model. A general and systematic treatment of filtering uncertainty in score-driven models is provided by BBCL.

Score-driven models have been successfully applied in the recent econometric literature. For instance, GAS2 developed a multivariate dynamic model for volatilities and correlations using fat-tailed distributions. HarveyLuati described a new framework for filtering with heavy tails, while OhPatton introduced high-dimensional factor copula models based on score-driven dynamics for systemic risk assessment. As shown by Blasques, within the class of observation-driven models, score-driven models are locally optimal from an information-theoretic perspective. For any score-driven model, one can devise companion SDS and SDU recursions. In particular, the SDS is useful for off-line signal reconstruction and analysis, while the SDU can be used for on-line updating of time-varying parameters. We examine in detail the companion SDS and SDU recursions of popular observation-driven models, namely the GARCH, the MEM model of EngleMEM and EngleGallo, and an AR(1) model with a time-varying autoregressive coefficient. In order to show the effectiveness of the proposed methodology in a setting with multiple time-varying parameters, we consider the t-GAS model of GAS2 and the Wishart-GARCH model of realWishart. We show, on both simulated and empirical data, the advantages of SDS and SDU over standard filtered estimates.

A related smoothing technique for a dynamic Student's t location model was introduced by Harvey_2013, who replaced prediction errors in the Kalman smoothing recursions with a martingale difference that is proportional to the score of the distribution. An application of this smoother can be found in Caivano2016. The main difference with respect to our approach is that we write the Kalman recursions for the mean of time-invariant linear Gaussian models in a general form that only depends on the score and the Fisher information matrix of the observation density. The resulting smoothing recursions are different from those obtained by Harvey_2013 and are easily applicable to a generic score-driven model by replacing the Gaussian density with the observation density at hand. The SDS is also related to the "approximation via mode estimation" technique described by DurbinKoopman2000 and DurbinKoopman. These authors proved that one can find a sequence of approximating linear Gaussian models enabling the computation of the conditional mode of a non-Gaussian model via a Newton-Raphson algorithm. The main difference with respect to our methodology is that the SDS requires a unique, nonlinear recursion rather than a sequence of Kalman recursions for approximating linear Gaussian models. In addition, in our methodology the filter coincides with well-known observation-driven models (e.g. GARCH, MEM, ACD), while the approximation via mode estimation technique uses a sequence of filters that are not easily interpretable as dynamic models.

By performing extensive Monte Carlo simulations of nonlinear non-Gaussian state-space models, we compare the performance of the SDS to that of correctly specified parameter-driven models. In particular, we consider two stochastic volatility models and a stochastic intensity model. Importance sampling methods allow evaluating the full likelihood of these models. The Quasi Maximum Likelihood (QML) method of HarveySV is also considered as a benchmark when estimating the two stochastic volatility models. Compared to correctly specified models, the losses incurred by the SDS are very small in all the simulated scenarios and are always lower, on average, than 2.5% in mean square errors. Moreover, the SDS systematically outperforms the QML. Computational times are decisively in favour of the SDS. For the models used in the simulation study, we found that smoothing with the SDS is on average 215 times faster than smoothing with efficient importance sampling techniques. The advantages of the proposed method are also shown on empirical data. Using realized covariance as a proxy of latent covariance, we show that SDU and SDS covariance estimates obtained through the dynamic t-GAS model fitted on Russell 3000 stock returns are superior to standard filtered score-driven estimates. The analysis allows examining the informational content of present and future log-returns from a dynamic covariance modelling perspective.

The rest of the paper is organized as follows: section 2 introduces the SDS and provides the main theoretical results; section 3 describes several examples of SDS's and discusses how to construct confidence bands; section 4 shows the results of the Monte Carlo study; in section 5 the SDS is applied in an empirical analysis involving assets of the Russell 3000 index; section 6 concludes.

2 Theoretical framework

In this section, we discuss in detail the main theoretical results leading to the formulation of our approximate, nonlinear smoothing technique. We start by showing that, in the steady state, the classical Kalman filter and smoothing recursions for linear Gaussian models can be re-written in an alternative form that only involves the score of the conditional likelihood, the Fisher information matrix and a set of static parameters. Abstracting from the linear Gaussian setting, these recursions can be viewed as approximate filtering and smoothing recursions for a non-Gaussian model by computing scores and information based on the non-Gaussian density. We then show that filtering uncertainty in score-driven models can be evaluated as an immediate byproduct of our results.

2.1 Kalman filtering and smoothing

Let us consider a linear Gaussian state-space representation:

y_t = Z α_t + ε_t,   ε_t ∼ N(0, H) (1)
α_{t+1} = T α_t + R η_t,   η_t ∼ N(0, Q) (2)

where α_t is a column vector of state variables and y_t is a column vector of observations. The parameters Z, T, R, H and Q are system matrices. Let Y_t denote the set of observations up to time t, namely Y_t = {y_1, …, y_t}. We are interested in updating our knowledge of the underlying state variable α_t when a new observation y_t becomes available and in predicting α_{t+1} based on the last observations Y_t. We thus define:

a_{t|t} = E[α_t | Y_t],   P_{t|t} = Var[α_t | Y_t] (3)
a_{t+1} = E[α_{t+1} | Y_t],   P_{t+1} = Var[α_{t+1} | Y_t] (4)

The Kalman filter allows computing a_{t|t}, P_{t|t}, a_{t+1} and P_{t+1} recursively. Assuming α_1 ∼ N(a_1, P_1), where a_1 and P_1 are known, for t = 1, …, n we have (Harvey, DurbinKoopman):

v_t = y_t − Z a_t (5)
a_{t|t} = a_t + P_t Z' F_t^{−1} v_t (6)
a_{t+1} = T a_t + K_t v_t (7)

and

F_t = Z P_t Z' + H (8)
P_{t|t} = P_t − P_t Z' F_t^{−1} Z P_t (9)
P_{t+1} = T P_{t|t} T' + R Q R' (10)

where K_t = T P_t Z' F_t^{−1} is the Kalman gain. The log-likelihood can be computed in the prediction error decomposition form, namely:

log L = −(np/2) log 2π − (1/2) Σ_{t=1}^{n} ( log |F_t| + v_t' F_t^{−1} v_t ) (11)

where p is the dimension of y_t. Smoothed estimates α̂_t = E[α_t | Y_n] and V_t = Var[α_t | Y_n], t = 1, …, n, can be computed through the following backward recursions:

r_{t−1} = Z' F_t^{−1} v_t + L_t' r_t (12)
α̂_t = a_t + P_t r_{t−1} (13)

and

N_{t−1} = Z' F_t^{−1} Z + L_t' N_t L_t (14)
V_t = P_t − P_t N_{t−1} P_t (15)

where L_t = T − K_t Z, r_n = 0 and N_n = 0. The conditional distribution of α_t is Gaussian, with mean and variance given by (a_t, P_t), (a_{t|t}, P_{t|t}) or (α̂_t, V_t), depending on the conditioning set.
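For concreteness, the forward Kalman recursions and the backward smoothing recursions above can be sketched for a univariate local-level model. This is an illustrative sketch in standard state-space notation, not code from the paper; all parameter values are arbitrary:

```python
import numpy as np

def kalman_filter_smoother(y, Z=1.0, T=1.0, H=1.0, Q=0.1, a1=0.0, P1=10.0):
    """Univariate Kalman filter followed by the backward smoothing recursions."""
    n = len(y)
    a = np.zeros(n + 1); P = np.zeros(n + 1)
    a[0], P[0] = a1, P1
    v = np.zeros(n); F = np.zeros(n); K = np.zeros(n)
    for t in range(n):
        v[t] = y[t] - Z * a[t]                  # prediction error
        F[t] = Z * P[t] * Z + H                 # prediction error variance
        K[t] = T * P[t] * Z / F[t]              # Kalman gain
        a[t + 1] = T * a[t] + K[t] * v[t]       # predicted state mean
        P[t + 1] = T * P[t] * (T - K[t] * Z) + Q  # predicted state variance
    # backward smoothing: r and N accumulate information from future observations
    r, N = 0.0, 0.0
    alpha_hat = np.zeros(n); V = np.zeros(n)
    for t in range(n - 1, -1, -1):
        L = T - K[t] * Z
        r = Z * v[t] / F[t] + L * r
        N = Z * Z / F[t] + L * N * L
        alpha_hat[t] = a[t] + P[t] * r          # smoothed state mean
        V[t] = P[t] - P[t] * N * P[t]           # smoothed state variance
    return a[:-1], P[:-1], alpha_hat, V

# simulate a random-walk state observed with noise, then filter and smooth
rng = np.random.default_rng(0)
state = np.cumsum(rng.normal(0.0, 0.3, 200))
y = state + rng.normal(0.0, 1.0, 200)
a, P, ahat, V = kalman_filter_smoother(y, Q=0.09)
```

As expected, the smoothed variance V_t never exceeds the predictive variance P_t, and the smoothed path is closer to the simulated state than the predictive one.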

2.2 A more general representation

In Appendix A we prove the following:

Proposition 1

In the steady state, Eq. (6), (7), (12), (13) can be written as:

a_{t|t} = a_t + P̄ ∇_t (16)
a_{t+1} = T a_t + T P̄ ∇_t (17)

and

α̂_t = a_t + P̄ r_{t−1} (18)
r_{t−1} = ∇_t + (T − T P̄ Ī)' r_t (19)

where ∇_t = Z' F̄^{−1} v_t is the score of the predictive density (with v_t the prediction error), Ī = Z' F̄^{−1} Z is the Fisher information matrix, F̄ = Z P̄ Z' + H, r_n = 0, and P̄ is the steady state variance matrix, which is the solution of the matrix Riccati equation:

P̄ = T (P̄ − P̄ Z' F̄^{−1} Z P̄) T' + R Q R' (20)

Note that a steady state solution exists whenever the system matrices are constant (Harvey, DurbinKoopman). In this case, the variance matrix converges to P̄ after a few time steps. The new Kalman recursions for the mean are re-parameterized in terms of the score ∇_t and the Fisher information matrix Ī. This representation is equivalent to the one in equations (6), (7) and (12), (13). However, it is more general, as it only relies on the predictive density. In principle, the forward recursions (16), (17) and the backward recursions (18), (19) can be applied to any parameter-driven model for which a measurement density is defined.

2.3 SDS recursions

Note that the predictive filter (17) has an autoregressive structure and is driven by the score of the conditional likelihood, i.e. it has the form of the score-driven models of GAS1 and Harvey_2013. Thus, if one looks at score-driven models as filters, it turns out that the score-driven filter (SDF hereafter) is optimal in the case of linear Gaussian models. In the case of nonlinear non-Gaussian models, the SDF can be regarded as an approximate nonlinear filter. The main difference with respect to the Kalman filter is that the Gaussian score is replaced by the score of the true conditional density, thus providing robustness to non-Gaussianity. As shown by GAS3, score-driven filters have predictive accuracy similar to that of correctly specified nonlinear non-Gaussian models, while at the same time providing significant computational gains. Indeed, the likelihood can be written in closed form and standard quasi-Newton techniques can be employed for optimization.

Based on the same principle, we introduce an approximate nonlinear smoother that allows estimating time-varying parameters using all available observations. In the case of linear Gaussian models, the Kalman smoother is a minimum variance linear unbiased estimator (MVLUE) of the state. Thus, we define our smoother in such a way that it coincides with the latter in this specific case. In the case of nonlinear non-Gaussian models, it maintains the same simple form as the Kalman backward smoothing recursions but replaces the Gaussian score with that of the non-Gaussian density.

Let us assume that the observations y_t, t = 1, …, n, are generated by the following observation density:

y_t ∼ p(y_t | f_t, Y_{t−1}; θ) (21)

where f_t is a vector of time-varying parameters and θ is a vector of static parameters. We generalize the filtering and smoothing recursions (16)-(19) to this measurement density as:

(22)
(23)

and:

(24)
(25)

The predictive filter in equation (23) has the same form as score-driven models. The term ∇_t is now the score of the measurement density p(y_t | f_t, Y_{t−1}; θ), namely:

∇_t = ∂ log p(y_t | f_t, Y_{t−1}; θ) / ∂f_t (26)

while I_t is the information matrix, which may be time-varying. The vector ω and the two matrices A and B are static parameters included in θ. They are estimated by maximizing the log-likelihood, namely:

θ̂ = argmax_θ Σ_{t=1}^{n} log p(y_t | f_t, Y_{t−1}; θ) (27)

Thus, one can run the backward smoothing recursions (24), (25) after computing the forward filtering recursions (22), (23), in a similar fashion to the Kalman filter and smoothing recursions. Note that the above recursions are nonlinear, as the score of a non-Gaussian density is typically nonlinear in the observations. The filter in equation (22) allows updating the current estimate once a new observation becomes available. While going backward, the smoothing recursions (24), (25) update the two filters using all available observations. Smoothed estimates are generally less noisy than filtered estimates and provide a more accurate reconstruction of the time-varying parameters.

It is standard practice in score-driven models to replace the score ∇_t with the scaled score s_t = S_t ∇_t. The role of the scaling matrix S_t is to take into account the curvature of the log-likelihood function. GAS1 discussed several choices of S_t based on inverse powers of the information matrix I_t. For instance, given a normal density with time-varying variance, if S_t = I_t^{−1}, one recovers the standard GARCH model. The filtering and smoothing recursions (22)-(25) are obtained if one sets S_t equal to the identity matrix. When using a scaled score s_t = S_t ∇_t, the filtering recursions (22), (23) become:

(28)
(29)

Since the score is now scaled by S_t, the corresponding term in equation (24) has to take into account the new normalization. As a result, we obtain the general backward smoothing recursions:

(30)
(31)

Note that the second equation is unaffected, as its driving term already corrects for the scaling. For instance, if S_t = I_t^{−1}, we obtain:

(32)
(33)

that is, the information matrix disappears because its effect is already taken into account when scaling the score. If S_t = I_t^{−1/2}, we get:

(34)
(35)

From a computational point of view, the backward recursions (30), (31) are simple, since the score and the information matrix are typically available from the forward filtering recursion. We term the approximate smoother obtained through recursions (30), (31) the Score-Driven Smoother (SDS). Basically, for any score-driven model, one can devise a companion SDS recursion that only requires the score, the information matrix and the static parameters, as estimated through the SDF. Note that the forward recursion (28) is the analogue of recursion (6) in the Kalman filter and allows updating SDF estimates once a new observation becomes available. We denote the approximate Score-Driven Update filter (28) by SDU. The proposed methodology can thus be schematically represented through the following procedure:

  1. Estimation of static parameters:

  2. Forward predictive and update filter:

  3. Backward smoother:
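The scaling choices discussed above can be checked numerically for a normal density with time-varying variance: the score with respect to σ² is (y² − σ²)/(2σ⁴), the information is 1/(2σ⁴), and scaling by the inverse information yields the GARCH driver y² − σ². A small self-contained check (illustrative values only):

```python
import math

def log_norm_pdf(y, sigma2):
    """Log-density of y ~ N(0, sigma2)."""
    return -0.5 * (math.log(2 * math.pi * sigma2) + y * y / sigma2)

def score(y, sigma2):
    # analytic score: d log p / d sigma2 = (y^2 - sigma2) / (2 sigma2^2)
    return (y * y - sigma2) / (2 * sigma2 ** 2)

def scaled_score(y, sigma2):
    # information I = 1 / (2 sigma2^2); scaling by I^{-1} gives the GARCH driver
    return y * y - sigma2

# check the analytic score against a central finite difference
y, s2, h = 1.3, 0.8, 1e-6
num = (log_norm_pdf(y, s2 + h) - log_norm_pdf(y, s2 - h)) / (2 * h)
assert abs(num - score(y, s2)) < 1e-6
# scaled score = 2 sigma2^2 * score
assert abs(scaled_score(y, s2) - 2 * s2 ** 2 * score(y, s2)) < 1e-12
```

The second assertion makes explicit why setting S_t = I_t^{−1} turns the score-driven variance recursion into the familiar GARCH update.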

2.4 Filtering uncertainty

The general framework developed in section 2.3 also allows constructing in-sample and out-of-sample confidence bands around filtered and smoothed estimates. As underlined by BLASQUES2016875, confidence bands can reflect both parameter and filtering uncertainty. Parameter uncertainty is related to the fact that static parameters are replaced by their maximum likelihood estimates. Both observation-driven and parameter-driven models are affected by parameter uncertainty. In observation-driven models, confidence bands reflecting parameter uncertainty can be constructed through the methods developed by BLASQUES2016875. Filtering uncertainty is related to the fact that time-varying parameters are not completely revealed by the observations. As such, it is absent in observation-driven models viewed as data generating processes, where time-varying parameters are deterministic functions of past observations. However, if observation-driven models are regarded as filters, one is interested in constructing confidence bands around filtered and smoothed estimates reflecting the conditional distribution of the underlying state variable.

In linear Gaussian models, filtering uncertainty can be assessed through the predicted, updated and smoothed variance matrices introduced in section 2.1, which provide the conditional variance of the unobserved state variable. It is instead less clear how one can quantify filtering uncertainty in misspecified observation-driven models. Zamojski proposed a bootstrap-based method for assessing filtering uncertainty in GARCH filters. Confidence bands constructed through this technique tend to underestimate filtering uncertainty, because they are based on bootstraps of the filter rather than of the underlying state variable. In addition, the method of Zamojski does not allow constructing out-of-sample confidence bands, which are often needed in practical applications.

In our framework, in-sample and out-of-sample confidence bands can be constructed by exploiting the relation between the Kalman filter recursions and the score-driven recursions. In section 2.2, we have shown that the steady state variance matrix can be expressed as:

(36)

In the score-driven framework, the analogue of the steady state variance matrix is then given by:

(37)

where the scaling matrix is introduced to take into account different normalizations of the score. From eq. (9), (A.2), the analogue of the updated variance is:

(38)

Similarly, the analogue of the smoothed variance, from eq. (14), (15), is:

(39)
(40)

with and .

Confidence bands can be computed as quantiles of the conditional distribution of the state variable. For a general state-space model, the latter is non-Gaussian and is not known analytically. Assuming a Gaussian density generally leads to underestimating filtering uncertainty, as the true conditional density is typically fat-tailed. In order to construct robust confidence bands, we use a more flexible density determined by matching location and scale parameters with those of the normal density. This method is described in its full generality by BBCL. In section 3.2, we show an application to the GARCH and assess the performance of robust confidence bands in a simulation study.

3 Examples of SDS recursions

In this section we provide several examples of SDS estimates. As a first step, we focus on two volatility models that are quite popular in the econometric literature, namely the GARCH model of Bollerslev and the multiplicative error model (MEM) of EngleMEM and EngleGallo. These are score-driven models which are amenable to treatment within our framework. As a third example, we present an AR(1) model with a score-driven autoregressive coefficient. The time-varying autoregressive coefficient allows capturing temporal variations in persistence, as well as nonlinear dependencies (BlasquesNonlinear). Autoregressive models with time-varying coefficients have been employed by DELLEMONACHE2017482 and SHARK for inflation and volatility forecasting, respectively.

One of the advantages of the SDS recursions (30), (31) is that they maintain the same simple form when f_t is a vector containing multiple time-varying parameters. In this multivariate setting, the use of simulation-based techniques would be highly computationally demanding. In order to test the SDS in a multivariate setting, we consider the t-GAS model of GAS2 and the Wishart-GARCH model of realWishart. The former is a conditional correlation model for heavy-tailed returns, while the latter is a joint model for the dynamics of daily returns and realized covariance matrices. In these models, the number of time-varying parameters grows as the square of the number of assets, and therefore they provide an interesting multivariate framework in which to assess the performance of the SDS.

1. GARCH-SDS
Consider the model:

y_t = σ_t ε_t,   ε_t ∼ N(0, 1) (41)

The predictive density is thus:

p(y_t | Y_{t−1}) = N(0, σ_t²) (42)

Setting f_t = σ_t² and S_t = I_t^{−1}, equations (28), (29) reduce to:

(43)
(44)

In particular, the predictive filter (44) is the standard GARCH(1,1) model. The smoothing recursions (30), (31) reduce to:

(45)
(46)

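As noted above, the predictive filter (44) is the standard GARCH(1,1) recursion, which can be simulated and filtered directly. A minimal sketch with illustrative parameter values (not the paper's code); with the true static parameters and initial variance, the filter recovers the simulated variance path exactly:

```python
import numpy as np

def garch_filter(y, omega=0.05, alpha=0.1, beta=0.85, s2_init=1.0):
    """GARCH(1,1) predictive filter: sigma2_{t+1} = omega + alpha*y_t^2 + beta*sigma2_t.
    In score-driven form, with scaled score s_t = y_t^2 - sigma2_t, this is
    sigma2_{t+1} = omega + (alpha + beta)*sigma2_t + alpha*s_t."""
    s2 = np.empty(len(y) + 1)
    s2[0] = s2_init
    for t in range(len(y)):
        s2[t + 1] = omega + alpha * y[t] ** 2 + beta * s2[t]
    return s2

# simulate a GARCH(1,1) path and filter it back with the same parameters
rng = np.random.default_rng(1)
n = 500
s2_true = np.empty(n)
y = np.empty(n)
s2 = 1.0
for t in range(n):
    s2_true[t] = s2
    y[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = 0.05 + 0.1 * y[t] ** 2 + 0.85 * s2
s2_filt = garch_filter(y)
```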

2. MEM-SDS
Consider the model:

x_t = μ_t ε_t (47)

where ε_t has a Gamma distribution with unit mean. The predictive density is thus given by:

(48)

Setting and , equations (28), (29) reduce to:

(49)
(50)

In particular, the predictive filter (50) is the standard MEM(1,1) model. The smoothing recursions (30), (31) reduce to:

(51)
(52)

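Analogously, the predictive filter (50) is the standard MEM(1,1) recursion for a positive-valued series. A minimal sketch with illustrative parameters and unit-mean Gamma multiplicative errors (the Gamma shape value below is arbitrary):

```python
import numpy as np

def mem_filter(x, omega=0.1, alpha=0.15, beta=0.8, mu_init=1.0):
    """MEM(1,1) predictive filter: mu_{t+1} = omega + alpha*x_t + beta*mu_t."""
    mu = np.empty(len(x) + 1)
    mu[0] = mu_init
    for t in range(len(x)):
        mu[t + 1] = omega + alpha * x[t] + beta * mu[t]
    return mu

# simulate from the same MEM(1,1): multiplicative Gamma errors with mean 1
rng = np.random.default_rng(2)
n = 500
mu_true = np.empty(n)
x = np.empty(n)
mu = 1.0
shape = 4.0  # Gamma shape; scale = 1/shape gives unit-mean errors
for t in range(n):
    mu_true[t] = mu
    x[t] = mu * rng.gamma(shape, 1.0 / shape)
    mu = 0.1 + 0.15 * x[t] + 0.8 * mu
mu_filt = mem_filter(x)
```

As in the GARCH case, filtering with the true static parameters and initial value reproduces the simulated path of the time-varying mean exactly.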

3. AR(1)-SDS
Consider the model:

y_t = φ_t y_{t−1} + ε_t,   ε_t ∼ N(0, σ_ε²) (53)

The predictive density is thus given by:

p(y_t | Y_{t−1}) = N(φ_t y_{t−1}, σ_ε²) (54)

Setting and , equations (28), (29) reduce to:

(55)
(56)

while the smoothing recursions (30), (31) reduce to:

(57)
(58)


4. t-GAS-SDS
Let y_t denote a vector of demeaned daily log-returns. Consider the following observation density:

(59)

where Σ_t is a time-varying covariance matrix and ν is the number of degrees of freedom. Note that the density in (59) is a normalized Student's t distribution, such that the conditional covariance of y_t is Σ_t. Applying the filtering equation (29) leads to the t-GAS model of GAS2. Closed-form formulas for the score and information matrix are reported in GAS2. These authors also proposed two parameterizations of Σ_t leading to positive-definite estimates. The first is similar to the one used in the DCC model of EngleDCC, while the second is based on hyperspherical coordinates. In the two parameterizations, the number of time-varying parameters is and , respectively.

5. Wishart-GARCH-SDS
Let us assume that, in addition to daily log-returns, we can compute realized measures from the intraday returns of the assets, in the form of a positive-definite estimate of the realized covariance matrix. Let F_t denote the σ-field generated by past returns and realized measures. The observation density in the Wishart-GARCH model is:

(60)
(61)

where (60) is a multivariate zero-mean normal distribution with covariance matrix Σ_t and (61) is a Wishart distribution with mean Σ_t and ν degrees of freedom. Assuming that daily returns and realized measures are conditionally independent given F_t, the conditional log-likelihood can be written as:

(62)

where:

(63)
(64)

Here, the normalizing constant involves the multivariate Gamma function. We denote the vector of time-varying covariances by f_t. The score and the information matrix can be computed as reported in realWishart. OpschoorHeavy proposed an alternative specification with a heavy-tailed distribution for both returns and realized measures. Similar SDS recursions can be recovered for this fat-tailed specification using our general framework.

Figures 1 - 3 show several examples of SDS estimates from the above models. The time-varying parameters follow both deterministic and stochastic patterns and are generated as described in the next paragraph.

3.1 Comparison of SDF and SDS estimates

In order to show the effectiveness of the proposed methodology, we compare SDF and SDS estimates. It is natural to expect that SDS estimates are affected by lower estimation errors, as they use more information when reconstructing time-varying parameters. However, an explicit comparison provides a quantitative assessment of the benefits of using the SDS in place of standard score-driven estimates.

We first focus on the univariate models (GARCH, MEM, AR) and simulate time series of observations with different dynamic patterns for the time-varying parameters. The first 2000 observations are used to estimate the models, while the remaining observations are used for testing. Let f_t generically denote the time-varying parameter of each of the three models. We consider the following data generating processes for f_t:

  1. Slow sine:

  2. Fast sine:

  3. Ramp:

  4. Step:

  5. Model:

. We set , , , , . For some of these dynamic specifications, figure 1 shows examples of filtered and smoothed estimates of time-varying parameters obtained through the GARCH. As expected, SDS estimates are less noisy than filtered estimates and provide a more accurate reconstruction of time-varying parameters.

Table 1 shows the average MSE and MAE of SDF and SDS estimates for all the patterns considered above. We also report the MSE and MAE obtained through the SDU filter in equation (28). The latter updates the estimate once a new observation arrives, which translates into a slight improvement over filtered estimates. The SDS, using all available observations, significantly improves on SDF estimates, with relative gains larger than and lower than in mean square errors.

We now consider the two multivariate models, namely the t-GAS and the Wishart-GARCH. We compare SDF estimates to SDU and SDS estimates in a simulation setting where time series of daily realized covariance matrices and log-returns are generated as described in Appendix B. The aim of the experiment is to estimate the true covariance matrix from observations of daily returns in the t-GAS model, and from observations of both daily returns and realized covariance matrices in the Wishart-GARCH model. We consider three scenarios with an increasing number of assets, and thus an increasing number of time-varying covariances (we implement the t-GAS model using hyperspherical coordinates).

Figures 2 and 3 compare SDF and SDS estimates of two elements of the simulated covariance matrix in the t-GAS and Wishart-GARCH models, respectively. As in the previous univariate cases, smoothed estimates provide a better reconstruction of the time-varying covariances. Note that, compared to the t-GAS model, the Wishart-GARCH provides estimates which are closer to the simulated path, as they are obtained by conditioning on a larger information set.

In order to quantify estimation errors, we use the root mean square error (RMSE) and the quasi-likelihood (Qlike), which are robust loss measures for covariance estimates (Patton2011246); these are defined in Appendix B. Table 2 shows the relative RMSE and Qlike gains of SDU and SDS estimates over SDF. We first note that SDU and SDS provide significantly lower RMSE. In the t-GAS model, the relative gain of the SDU is roughly equal to 3%, while that of the SDS is larger than 14% and lower than 19%. In the Wishart-GARCH model, the relative gain of the SDU is larger than 7% and lower than 13%, while that of the SDS is larger than 13% and lower than 19%. It is interesting to note that SDU gains are significantly larger in the Wishart-GARCH model. This is due to the fact that today's realized covariance is a highly informative proxy of the latent covariance, thus leading to a drastic RMSE reduction when included in the information set. In contrast, daily returns are less informative, and it is thus necessary to include all available observations to achieve a significant RMSE reduction in the t-GAS model. If one looks at the Qlike loss, the relative gains of SDU and SDS are moderate compared to RMSE, but they are statistically significant. Even in this case, SDU gains are larger in the Wishart-GARCH model, due to the highly informative content of realized covariance measures.

3.2 Confidence bands

In section 2.4, we have seen that an estimate of the conditional variance of the state variable is given by the variance matrices defined in eq. (37), (38), (40). As in the Kalman filter, one can use these variances to construct confidence bands around filtered and smoothed estimates. However, the conditional density of the state variable is typically fat-tailed and cannot be written in closed form. Assuming normality generally produces overly narrow confidence bands and thus underestimates filtering uncertainty.

Robust in-sample and out-of-sample confidence bands can instead be constructed by computing quantiles of a more flexible distribution, determined by matching location and scale parameters. We illustrate this technique here for the GARCH model and provide a systematic treatment in BBCL.

Let us consider the following stochastic volatility model:

(65)
(66)

We are interested in computing quantiles of the conditional density of . Filtered and smoothed estimates of the latent log-variance are recovered by computing the score-driven recursions for the following observation density:

As an outcome of this procedure, we also obtain the conditional variances of the filtered, updated and smoothed estimates. The quantile function of the conditional distribution of the state variable is then given by:

(67)

As a first approximation, we compute quantiles under a Gaussian assumption for the filtered, updated and smoothed conditional densities. These densities depend on parameters that are an output of the score-driven recursions, so confidence bands can be easily computed through eq. (67) using the Gaussian quantile function. We then instead assume a Student's t-distribution with matched location and scale and a finite number of degrees of freedom. As the degrees of freedom tend to infinity, we recover the Gaussian confidence bands. For finite degrees of freedom, however, the bands are wider and provide a better approximation to the true filtering uncertainty. In this example, the degrees-of-freedom parameter is chosen by fitting a t-distribution to the residuals of an AR model. More sophisticated techniques are developed in BBCL.
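The location/scale-matching idea can be sketched as follows. The function names are hypothetical, and using the square root of the conditional variance directly as the t scale parameter is one convention (consistent with recovering the Gaussian band as the degrees of freedom grow), not necessarily the exact matching used in the paper.

```python
import numpy as np
from scipy.stats import norm, t as student_t

def t_band(mu, var, nu, level=0.95):
    """Confidence band from a Student's t with location mu, scale
    sqrt(var) and nu degrees of freedom. As nu -> infinity this tends
    to the Gaussian band; for finite nu it is wider."""
    alpha = 1.0 - level
    z = student_t.ppf(1.0 - alpha / 2.0, df=nu)
    s = np.sqrt(var)
    return mu - z * s, mu + z * s

def gaussian_band(mu, var, level=0.95):
    """Gaussian confidence band with the same location and scale."""
    alpha = 1.0 - level
    z = norm.ppf(1.0 - alpha / 2.0)
    s = np.sqrt(var)
    return mu - z * s, mu + z * s
```

Here `mu` and `var` would be the conditional means and variances produced by the score-driven recursions (filtered, updated or smoothed), so the bands come essentially for free once the recursions have been run.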

In order to test the quality of confidence bands, we generate 1000 time series of observations of the stochastic volatility model (65), (66). The values of static parameters are chosen in order to be similar to those obtained when estimating the model on real financial returns: , , . Figure (4) shows one of the simulated patterns, together with 95% confidence bands for filtered and smoothed estimates computed through the method described above.
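A minimal simulator for a Gaussian stochastic volatility model of this kind is sketched below. The parameter values and the exact parameterization of the log-variance recursion are illustrative assumptions, not the values used in the paper's experiment.

```python
import numpy as np

def simulate_sv(T, omega=-0.1, phi=0.98, sigma_eta=0.2, seed=0):
    """Simulate a Gaussian stochastic volatility model:
        y_t = exp(h_t / 2) * eps_t,            eps_t ~ N(0, 1)
        h_t = omega + phi * h_{t-1} + eta_t,   eta_t ~ N(0, sigma_eta^2)
    Returns the observations y and the latent log-variance path h."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    h[0] = omega / (1.0 - phi)          # start at the unconditional mean
    for t in range(1, T):
        h[t] = omega + phi * h[t - 1] + sigma_eta * rng.standard_normal()
    y = np.exp(h / 2.0) * rng.standard_normal(T)
    return y, h
```

Repeating this 1000 times with `T = 4000` reproduces the shape of the simulation design described above.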

We estimate the GARCH model on the sub-sample comprising the first 2000 observations and construct in-sample confidence bands. On the remaining sub-sample of 2000 observations, out-of-sample bands are constructed using the previous parameter estimates. Both Gaussian and robust confidence bands are built at the 90%, 95% and 99% nominal confidence levels. We compare the nominal confidence level to the coverage, defined as the fraction of times the true variance path lies inside the confidence bands. Table 3 shows average coverages for in-sample and out-of-sample SDF, SDU and SDS confidence bands. As expected, confidence bands constructed under a Gaussian density provide an average coverage that is significantly lower than the nominal confidence level, meaning that they underestimate filtering uncertainty. In contrast, the average coverage of the robust confidence bands is very close to the nominal level. Similar results are found when changing the variance of the latent process. In particular, for larger values of the latent variance, the quality of the Gaussian confidence bands deteriorates further, while the robust bands still match the nominal level well. A systematic treatment of the technique described here and, more generally, of filtering uncertainty in observation-driven models can be found in BBCL.
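The coverage statistic used in this comparison is simple to compute; a sketch (function name ours):

```python
import numpy as np

def coverage(true_path, lower, upper):
    """Fraction of time points at which the true (variance) path lies
    inside the band [lower, upper], elementwise over arrays of equal
    length."""
    inside = (true_path >= lower) & (true_path <= upper)
    return inside.mean()
```

Averaging this quantity over the 1000 simulated paths, separately for the in-sample and out-of-sample halves, yields the entries of a table like Table 3; a well-calibrated band has coverage close to its nominal level.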

4 Monte Carlo analysis

In this section we perform extensive Monte Carlo simulations to test the performance of the SDS under different dynamic specifications for the time-varying parameters. Since we interpret the SDS as an approximate smoother for nonlinear non-Gaussian models, we compare its performance to that of correctly specified parameter-driven models. The main idea is to examine the extent to which the approximation leads to results similar to those of correctly specified parameter-driven models. In that case, the use of the SDS would be particularly advantageous from a computational point of view, as the likelihood of score-driven models can be written in closed form and smoothing can be performed through a simple backward recursion. This analysis is similar in spirit to that of GAS3, who compared score-driven models to correctly specified parameter-driven models and found that the two classes of models have similar predictive accuracy, with very small average losses. We find a similar result for the SDS.

4.1 Linear non-Gaussian models

We first consider an AR(1) model with a t-distributed measurement error:

(68)
(69)

We fix the static parameters and consider different values of the signal-to-noise ratio. The corresponding observation-driven model is a t-location model (Harvey_2013) with predictive density:

(70)

With this specification, equation (29) reduces to:

(71)

while the smoothing recursions (30), (31) reduce to:

(72)
(73)

We compare standard Kalman filtered and smoothed estimates to the SDF, SDU and SDS estimates. As in the previous simulation studies, we generate 1000 time series of 4000 observations and use the first subsample of 2000 observations for estimation and the remaining observations for testing. Table 4 shows relative MSE and MAE for different parameter values. Note that SDF, SDU and SDS provide better estimates than the standard Kalman filter and smoother. In particular, we observe large differences for low values of the degrees of freedom, where the t-distribution strongly deviates from the Gaussian, and for low values of the signal-to-noise ratio, at which accounting for the non-normality of the measurement error becomes more important. Note also that the gains of the SDS over Kalman smoother estimates are larger than the gains of the SDF over the Kalman filter for low values of both parameters. These results confirm the ability of the SDS to provide robust smoothed estimates of time-varying parameters, to the same extent as the SDF provides robust filtered estimates in the presence of a non-Gaussian prediction density.
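To make the filtering step concrete, the sketch below implements a score-driven filter for a t-location model of this kind. The Harvey-type score scaling used here (a bounded, reweighted prediction error) is one common choice and may differ from the exact recursions (71)-(73) of the paper; the function name and starting value are illustrative.

```python
import numpy as np

def t_location_filter(y, omega, beta, alpha, nu, sigma2):
    """Score-driven filter for a t-location model:
        f_{t+1} = omega + beta * f_t + alpha * u_t,
    where u_t is a scaled score of the Student's t observation density
    with respect to the location f_t. Large prediction errors are
    downweighted, which makes the filter robust to outliers."""
    T = len(y)
    f = np.empty(T + 1)
    f[0] = omega / (1.0 - beta)      # unconditional mean as starting value
    for t in range(T):
        e = y[t] - f[t]
        w = (1.0 + 1.0 / nu) / (1.0 + e ** 2 / (nu * sigma2))
        u = w * e                    # robust, bounded innovation
        f[t + 1] = omega + beta * f[t] + alpha * u
    return f[:-1]                    # one-step-ahead locations
```

Because the weight `w` shrinks toward zero for large prediction errors, a single extreme observation barely moves the filtered location; this is precisely the robustness property that the Kalman filter, with its linear update, lacks under t-distributed measurement noise.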

4.2 Nonlinear non-Gaussian models

We now examine the behavior of the SDS in the presence of nonlinear non-Gaussian parameter-driven models. In particular, we consider the following three specifications, which are quite popular in the econometric literature:

  1. Stochastic volatility model with Gaussian measurement density (this is the same stochastic volatility model considered in Section 3.2):

  2. Stochastic volatility with non-Gaussian measurement density:

  3. Stochastic intensity model with Poisson measurement density:
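As an illustration of the third specification, the sketch below simulates Poisson counts driven by a stochastic intensity. A log-linear AR(1) intensity is one standard parameterization; the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_poisson_intensity(T, omega=0.0, phi=0.95, sigma_eta=0.2, seed=0):
    """Simulate a stochastic intensity model with Poisson measurements:
        y_t ~ Poisson(lambda_t),
        log lambda_t = omega + phi * log lambda_{t-1} + eta_t,
    with eta_t ~ N(0, sigma_eta^2). Returns counts y and the intensity
    path lambda."""
    rng = np.random.default_rng(seed)
    log_lam = np.empty(T)
    log_lam[0] = omega / (1.0 - phi)      # unconditional mean
    for t in range(1, T):
        log_lam[t] = omega + phi * log_lam[t - 1] \
            + sigma_eta * rng.standard_normal()
    lam = np.exp(log_lam)
    y = rng.poisson(lam)                  # counts given the intensity path
    return y, lam
```

In the Monte Carlo exercise, paths generated this way play the role of the "true" latent intensity against which filtered and smoothed score-driven estimates are scored.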