The prediction distribution of a GARCH(1,1) process

12/20/2018 ∙ by Karim M. Abadir, et al. ∙ University of Bologna ∙ Imperial College London ∙ European Union

This paper derives the analytic form of the h-step ahead prediction density of a GARCH(1,1) process under Gaussian innovations, with a possibly asymmetric news impact curve. The analytic form of the density is novel and improves on current methods based on approximations and simulations. The explicit form of the density makes it possible to compute tail probabilities and functionals, such as expected shortfall, that measure risk when the underlying asset return is generated by a GARCH(1,1). The prediction densities are derived for any finite prediction horizon h. For the stationary case, as h increases the prediction density converges to a distribution with Pareto tails whose form has already been described in the literature. The formulae in the paper characterize the degree of non-Gaussianity of the prediction distribution, and the distance between the tails of the finite horizon prediction distribution and those of the stationary distribution.




1 Introduction

Since their introduction in Engle (1982) and Bollerslev (1986), Generalised AutoRegressive Conditional Heteroskedasticity (GARCH) processes have been widely employed in financial econometrics, see e.g. Bollerslev, Russell, and Watson (2010). In their original formulation, the conditional distribution of innovations was typically assumed to be Gaussian.

Empirically, the distribution of stock returns has been studied extensively under the random walk assumption, see e.g. Fama (1965); in this literature, Gaussianity of stock returns has been questioned as too thin-tailed when compared to its empirical counterpart. Gaussian GARCH processes can generate uncorrelated, heteroskedastic returns with a stationary distribution with fatter tails than the Gaussian.

GARCH processes can include several lags of the past squared shocks and several lags of the past volatility; in practice, however, the GARCH(1,1) model with is often found to offer a good fit for asset returns, and it is usually preferred to GARCH models with more parameters, see Tsay (2010) section 3.5, or Andersen, Bollerslev, Christoffersen, and Diebold (2006), section 3.6. Moreover, many multivariate GARCH models are built on the univariate GARCH(1,1), see e.g. Engle, Ledoit, and Wolf (2017) and references therein. In this sense the GARCH(1,1) is both the prototype and the workhorse of GARCH processes in practice.

GARCH processes map shocks, i.e. news, into the conditional volatility; the function obtained by replacing past conditional volatilities with unconditional ones was called the news-impact-curve (NIC) by Engle and Ng (1993). For GARCH(1,1) processes, this curve yields the same value of volatility for positive and negative shocks, i.e. it is symmetric. Glosten, Jagannathan, and Runkle (1993) (henceforth GJR) extended the GARCH setup to allow for an asymmetric news impact curve, with a different response to negative shocks.
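The news impact curve can be illustrated with a short Python sketch that evaluates it for a symmetric GARCH(1,1) and for a GJR-type variant. The parametrisation used here (variance equal to `omega` plus `alpha` times the squared lagged shock, plus `gamma` times the squared lagged shock when that shock is negative, plus `beta` times the lagged variance), the symbol names and the numerical values are all assumptions made for the example; they need not coincide with the notation used in the rest of the paper.

```python
import numpy as np

def news_impact_curve(shock, omega, alpha, beta, gamma=0.0):
    """NIC in the sense of Engle and Ng (1993): next-period variance as a
    function of the shock, with lagged variance fixed at its unconditional level.

    Assumed (hypothetical) parametrisation:
        sigma2_t = omega + (alpha + gamma * 1[x_{t-1} < 0]) * x_{t-1}^2
                   + beta * sigma2_{t-1}
    """
    shock = np.asarray(shock, dtype=float)
    # With symmetric innovations, negative shocks occur with probability 1/2.
    persistence = alpha + 0.5 * gamma + beta
    if persistence >= 1.0:
        raise ValueError("the unconditional variance does not exist")
    sigma2_bar = omega / (1.0 - persistence)
    leverage = gamma * (shock < 0.0)  # extra weight on negative shocks only
    return omega + (alpha + leverage) * shock**2 + beta * sigma2_bar

shocks = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
nic_sym = news_impact_curve(shocks, omega=0.05, alpha=0.10, beta=0.85)
nic_gjr = news_impact_curve(shocks, omega=0.05, alpha=0.05, beta=0.85, gamma=0.10)
```

With `gamma=0` the curve is symmetric around zero; with `gamma>0` a negative shock raises next-period variance more than a positive shock of the same size.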

Many measures of risk are functions of the prediction density of asset returns. These measures include the Value at Risk, which is a quantile of the prediction distribution of the asset return, see Jorion (2006), as well as the Expected Shortfall, see Patton, Ziegel, and Chen (2017). The latter is the expected value of the prediction distribution of the asset return in the left tail, between minus infinity and the Value at Risk; this measure has been recently re-emphasised by the Third Basel Accords. Both measures are functionals of the prediction distribution of asset returns, see Arvanitis, Hallam, Post, and Topaloglou (2018).
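To make the two risk measures concrete, the following Python sketch computes the Value at Risk as a lower quantile and the Expected Shortfall as the mean of the distribution below that quantile. It uses a Gaussian return with a known standard deviation purely as a stand-in for a prediction distribution, so that the sample-based numbers can be compared with closed-form Gaussian values; the function names, the 5% level and the standard deviation are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def var_es_from_sample(returns, level=0.05):
    """Value at Risk (the `level` quantile) and Expected Shortfall
    (mean of the returns at or below that quantile) from a sample."""
    returns = np.asarray(returns, dtype=float)
    var = np.quantile(returns, level)
    es = returns[returns <= var].mean()
    return var, es

def var_es_gaussian(sigma, level=0.05):
    """Closed-form VaR and ES for a N(0, sigma^2) return."""
    q = norm.ppf(level)
    return sigma * q, -sigma * norm.pdf(q) / level

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.5, size=200_000)
var_mc, es_mc = var_es_from_sample(sample)
var_an, es_an = var_es_gaussian(1.5)
```

Both quantities are expressed here as (negative) returns in the left tail; some references report them with the opposite sign, as losses.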

The prediction distribution of a GARCH(1,1) hence plays an important role for the computation of risk measures in financial applications. This distribution is not known in analytic form beyond the one-step-ahead prediction distribution, which is given by the assumption on innovations used to build the process, see e.g. Andersen, Bollerslev, Christoffersen, and Diebold (2006), page 811.

The unknown analytic form of the prediction density of a GARCH has led econometricians to look for alternative approximate solutions. Alexander, Lazar, and Stanescu (2013) have resorted to approximations based on the first 4 moments of the prediction distribution, using the Cornish-Fisher expansion and the Johnson SU distribution with the same 4 moments; see also Baillie and Bollerslev (1992).
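As an illustration of a moment-based approximation of this kind, the sketch below implements the standard fourth-order Cornish-Fisher quantile formula, which adjusts a Gaussian quantile using skewness and excess kurtosis. It is a generic textbook version, not necessarily the exact variant used in the cited papers, and the moment values passed to it are made up for the example.

```python
from scipy.stats import norm

def cornish_fisher_quantile(p, mean, std, skew, excess_kurt):
    """Approximate p-quantile from the first four moments, using the
    fourth-order Cornish-Fisher expansion around the Gaussian quantile."""
    z = norm.ppf(p)
    z_cf = (z
            + (z**2 - 1.0) * skew / 6.0
            + (z**3 - 3.0 * z) * excess_kurt / 24.0
            - (2.0 * z**3 - 5.0 * z) * skew**2 / 36.0)
    return mean + std * z_cf

# Zero skewness and zero excess kurtosis reproduce the Gaussian quantile.
q_gauss = cornish_fisher_quantile(0.01, 0.0, 1.0, 0.0, 0.0)
# Positive excess kurtosis pushes the 1% quantile further into the left tail.
q_fat = cornish_fisher_quantile(0.01, 0.0, 1.0, 0.0, 1.5)
```

Because the expansion uses only four moments, its accuracy in the far tails depends on how well those moments summarise the distribution, which is one motivation for seeking the exact density.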

An alternative to this approach is to simulate from the prediction distribution and to estimate it non-parametrically, e.g. by kernel methods. While consistent, this estimator has the slower rate of convergence typical of kernel density estimators. More recently, Delaigle, Meister, and Rombouts (2016) have proposed a non-parametric root-T consistent estimator of the stationary distribution of the (log-)volatility process. Despite a better convergence rate, the non-parametric estimation of the density requires computing time, and does not lead to exact results.
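The simulation-based alternative can be sketched as follows: draw many h-step-ahead returns from a Gaussian GARCH(1,1) started at a known one-step-ahead variance, then smooth the draws with a kernel density estimator. The parametrisation (`omega`, `alpha`, `beta`, with `sigma2_1` the known one-step-ahead variance), the parameter values and the use of `scipy.stats.gaussian_kde` are assumptions chosen for the illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def simulate_h_step(h, sigma2_1, omega, alpha, beta, n_paths, rng):
    """Draw n_paths values of the h-step-ahead return of a symmetric
    Gaussian GARCH(1,1), given the one-step-ahead variance sigma2_1."""
    sigma2 = np.full(n_paths, float(sigma2_1))
    x = np.sqrt(sigma2) * rng.standard_normal(n_paths)
    for _ in range(h - 1):
        sigma2 = omega + alpha * x**2 + beta * sigma2
        x = np.sqrt(sigma2) * rng.standard_normal(n_paths)
    return x

rng = np.random.default_rng(1)
omega, alpha, beta, sigma2_1, h = 0.05, 0.10, 0.85, 2.0, 5
draws = simulate_h_step(h, sigma2_1, omega, alpha, beta, 100_000, rng)

# Kernel estimate of the prediction density on a grid
# (a subsample keeps the KDE evaluation cheap).
kde = gaussian_kde(draws[:20_000])
grid = np.linspace(-6.0, 6.0, 121)
density_hat = kde(grid)

# Sanity check: the prediction variance has a simple closed form.
phi = alpha + beta
var_theory = omega * sum(phi**j for j in range(h - 1)) + phi**(h - 1) * sigma2_1
```

The estimate in `density_hat` is only as accurate as the number of draws and the bandwidth allow, which is the limitation the paragraph above refers to.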

The tail behavior of the stationary distribution of the GARCH(1,1) has been studied extensively, see Mikosch and Starica (2000) and Davis and Mikosch (2009). The tails of the stationary distribution of both the volatility and of the GARCH process are of Pareto type, say. These properties are based on results for random difference equations and renewal theory obtained in Kesten (1973) and Goldie (1991).

The tail index is associated with the number of moments of the stationary distribution, which exist up to order . Larger values of are associated with thinner tails of the stationary distribution; this is interpreted here to mean that the larger the number of moments (i.e. the larger ) the smaller the distance from the Gaussian distribution, which has an infinite number of moments.

depends on the coefficient and of the GARCH(1,1) process , , as well as on the type of the one-step-ahead distribution. Examples of values of the tail index are given in Davis and Mikosch (2009).

The present paper derives the analytical form of the h-step-ahead prediction density of a GARCH(1,1), allowing for the GJR type with asymmetric NIC. Closed form expressions are given for the prediction density of a GARCH(1,1) process for Gaussian innovations. The results are obtained by marginalizing the joint density of the prediction observations, using integration and special functions, for any prediction horizon h. (Footnote 1: The one-step-ahead distribution, for h = 1, is given by construction of the process.) The formulae are valid for stationary as well as non-stationary GARCH(1,1) processes.

In the case of 2-steps-ahead, the prediction distribution is obtained without imposing constraints on the values of the and coefficients. For the h-steps-ahead prediction distribution with h > 2, a condition on is required to guarantee integrability of a certain integral; a sufficient condition for this to be satisfied is to have larger than 0.62, which is a condition often satisfied in practice.

The prediction density is found to be close to a Gaussian density (with appropriate variance) for high values of , and far from it for low values of it. Similarly, large values of are found to be associated with higher values of , i.e. smaller distance from the Gaussian distribution for the stationary distribution with Pareto tails .

The rest of the paper is organised as follows. Section 2 describes the general approach for the derivation of the integral. Section 3 states the main results, while Section 4 discusses the form of the prediction density when compared with the tails of the stationary distribution. Section 5 concludes. The Appendix contains proofs.

2 The prediction density

This section illustrates the construction used to characterise the prediction density as an integral, involving (a product of several copies of) the chosen density of innovations. Consider the asymmetric GARCH(1,1)


where , and is the indicator function for event , and is the sign of or ; these signs are the same because . The sequence is assumed to be i.i.d., centered around zero and with Gaussian p.d.f. .

Time is taken to be the starting time of the forecasts, and it is assumed that one wishes to predict for some , conditional on information set at time , taken to consist of observations of and . This information set is consistent with observing from minus infinity to time 0 under stationarity. Note also that, because and are observed, also is observed.

Throughout the paper the values taken by the random variables , , are denoted , and respectively, and sometimes the subscript is omitted if this does not cause ambiguity. The next Lemma reports consequences of the symmetry of the one-step-ahead density on relevant conditional p.d.f.s. In the Lemma, the following notation is used, , ; here , denote values of and .

Lemma 2.1 (Densities).

For symmetric , is symmetric, i.e. , , and it is given by


Moreover, and one has


where depends on the value of and the sign of for via (2.1).

Denote the set of all possible by , . Densities are first computed conditionally on and later they are marginalized with respect to it. Here, conditioning on is relevant only for the GJR case .

The basic building block is given by the expression in (2.3). This density can be marginalised with respect to as follows


Finally, the conditioning with respect to the signs is averaged across different configurations, using the mutual independence of the signs and the fact that for all , thanks to the symmetry of . One hence finds


where the sum is over , for . The prediction density is hence found by combining (2.5), (2.4), (2.3), (2.2).

The next Lemma reports a recursion for the volatility process, which turns out to be useful when solving the integral in (2.4). In the Lemma, for , let and , where denotes a value of .

Lemma 2.2 (Volatility and transformations).

The volatility process can also be written

For , has the following recursive expression in terms of ’s


with , which is measurable with respect to the information set at time . Moreover, one has


where .

3 Main results

The main results are summarised in Theorem 3.2 below. Before stating the main theorems, an auxiliary assumption is introduced. Define with and .

Assumption 3.1.

  • For , let ;

  • For let .

It can be noted that , as . In Figure 1, the area above the curve represents the set for .

Figure 1: as a function of . Blue line: . Shaded area: region , see Assumption 3.1.

In Theorem 3.2 below, is the confluent hypergeometric function of the second kind, also known as Tricomi function, see Abadir (1999) and Gradshteyn and Ryzhik (2007), section 9.21, whose integral representation is,


with Moreover, the following notation is used in summations:

Theorem 3.2 (GARCH(1,1) prediction density).

Assume that are i.i.d. and let Assumption 3.1 hold; then one has, for ,


where ,



with , and , and empty sums (respectively products) are understood to be equal to 0 (respectively equal to 1). Recall finally that also in (3.3) is a function of .


Proof. See Appendix.

Note that in the case when , equation (3.2) holds for any value of , while for it holds if and only if . For , the validity of (3.2) is guaranteed by the sufficient condition , which is, however, not necessary.

The line of proof of Theorem 3.2 is the following: for the integral is solved by substitution and by using equation (3.1). For , subsequent (negative) binomial expansions of expression (2.6) for are required, whose validity is ensured by the inequality

which is satisfied under Assumption 3.1, see Lemma 5.1 in the Appendix.
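The step "by using equation (3.1)" relies on an integral representation of the Tricomi function. The Python sketch below checks numerically the standard representation U(a, b, z) = (1/Gamma(a)) * integral over t from 0 to infinity of exp(-z t) t^(a-1) (1+t)^(b-a-1), valid for a > 0 and z > 0, against `scipy.special.hyperu`. It is assumed here, not confirmed by the text, that (3.1) is this representation up to notation; the parameter values are arbitrary.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, hyperu

def tricomi_by_integral(a, b, z):
    """Tricomi U(a, b, z) from its standard integral representation
    (requires a > 0 and z > 0)."""
    integrand = lambda t: np.exp(-z * t) * t**(a - 1.0) * (1.0 + t)**(b - a - 1.0)
    value, _ = quad(integrand, 0.0, np.inf)
    return value / gamma(a)

a, b, z = 1.5, 0.5, 2.0
u_integral = tricomi_by_integral(a, b, z)
u_library = hyperu(a, b, z)

# Special case with a known closed form: U(a, a + 1, z) = z**(-a).
u_closed = hyperu(1.5, 2.5, 2.0)
```

Recognising an integral of this shape is what allows the marginalisation integrals to be expressed through a special function rather than evaluated numerically.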

Immediate consequences of Theorem 3.2 are collected in the following corollary.

Corollary 3.3 (C.d.f. and moments).

The prediction c.d.f.s of and are given by

with moments

where , are defined in (3.3).

Note that the moment calculations involve finite sums extending to , with Tricomi functions which do not fall in the logarithmic case as in Theorem 3.2; see Abadir (1999) for the logarithmic case. In fact, implies that is a finite sum.

Some standardised densities of and the corresponding right tails are plotted in Fig. 2 for . The curve is the standard Gaussian. Figure 3 shows the predictive densities for and values of that range from to 8.5 () to 1/8.5 (). Figure 4 shows the tails for asymmetric news impact curves.

Figure 2: Prediction densities (left panel) and zoom of the right tails (right panel) for standardised , , . Computations performed in Mathematica.
Figure 3: Prediction density for standardised , varying values of

One can see that the prediction densities are more similar to a Gaussian when is large.

Figure 4: Right tail of for standardised , , in blue, red and green respectively, ( is standard Gaussian)

4 Stationary distribution

The limit representation of the random variable in the stationary case can be found in Francq and Zakoian (2010) Theorem 2.1 page 24. The tail behaviour of the limit distribution is reviewed in Mikosch and Starica (2000) and Davis and Mikosch (2009). The tails of the stationary distribution of both the volatility and of the GARCH process are of Pareto type, say, where is a tail index. These properties are based on results for random difference equations and renewal theory obtained in Kesten (1973) and Goldie (1991).

The tail index of the stationary distribution depends on the coefficient and of the GARCH(1,1) process as well as on the one-step-ahead distribution. Examples of the tail index are given in Davis and Mikosch (2009); for Gaussian innovations, for , while for .

The index is the unique solution of . When is an integer, the expression simplifies to


see Davis and Mikosch (2009) eq. (10). Substituting the moments from the distribution, and assigning values to over a grid of pre-specified values, one can solve (4.1) for , and hence for . This makes it possible to compute (values of) the surface . Figure 5 reports the level curves of as a function of and obtained in this way. The figure also reports the lines where is constant. It is seen that, for large values of , and increase roughly together. This association is not present for small values of .
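The computation of the tail index can be sketched numerically. The code below assumes the usual Kesten-type moment condition for a symmetric Gaussian GARCH(1,1), namely that the tail index kappa solves E[(alpha Z^2 + beta)^(kappa/2)] = 1 with Z standard Gaussian, and solves it by numerical integration and root finding instead of the integer-case simplification. The symbols `alpha`, `beta`, `kappa` and the exponent convention kappa/2 are assumptions about notation, and the parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

def moment_condition(s, alpha, beta):
    """E[(alpha * Z^2 + beta)^s] - 1 for Z ~ N(0, 1)."""
    integrand = lambda z: (alpha * z**2 + beta)**s * norm.pdf(z)
    value, _ = quad(integrand, 0.0, np.inf)
    return 2.0 * value - 1.0  # the integrand is even in z

def tail_index(alpha, beta, s_max=64.0):
    """Return kappa = 2 * s, where s > 0 solves moment_condition(s) = 0."""
    lo, hi = 1e-3, 1.0
    if moment_condition(lo, alpha, beta) >= 0.0:
        raise ValueError("no positive root: the process is not stationary")
    # The condition is convex in s, negative just above 0, then positive.
    while moment_condition(hi, alpha, beta) <= 0.0:
        hi *= 2.0
        if hi > s_max:
            raise ValueError("root not bracketed below s_max")
    return 2.0 * brentq(moment_condition, lo, hi, args=(alpha, beta))

kappa_igarch = tail_index(0.10, 0.90)  # alpha + beta = 1
kappa_stat = tail_index(0.10, 0.85)    # alpha + beta < 1
```

When `alpha + beta = 1` the condition holds at exponent one, so the sketch returns a tail index of about 2, consistent with a stationary distribution whose variance is infinite.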

Figure 5: Level curves of as a function of and in the Gaussian case. Dashed lines represent loci where is constant.

The relation between and fat-tailedness of the prediction density for finite horizon can be illustrated using the case . From Theorem 3.2,

where and . (Footnote 2: The quantity can be interpreted as the minimum value that can take, in the ideal case when (thus ) and is given, i.e. .)

Hence when one has with , see Abramowitz and Stegun (1964), eq. 13.1.8, so that all the Tricomi functions , for varying , tend to one. (Footnote 3: This is unlike in the case for fixed where the sequence of is decreasing from 1 to 0 for increasing .) As a result, when the prediction distribution converges to a .
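The limit used here rests on the large-argument behaviour of the Tricomi function in Abramowitz and Stegun (1964), eq. 13.1.8, which states that U(a, b, z) behaves like z^(-a) times 1 + O(1/|z|) as z grows. The Python sketch below checks that z^a U(a, b, z) approaches one. The values of `a` and `b` are arbitrary, and no assumption is made about which quantities of the paper play the roles of a, b and z.

```python
import numpy as np
from scipy.special import hyperu

a, b = 1.5, 0.5
z_values = np.array([1.0, 10.0, 100.0, 1000.0])

# Rescaled Tricomi function: tends to 1 as z grows (A&S 13.1.8).
scaled = z_values**a * hyperu(a, b, z_values)
gap = np.abs(scaled - 1.0)
```

The entries of `gap` shrink roughly in proportion to 1/z, in line with the order of the remainder term in the expansion.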

Hence in both the case of the prediction density for and the stationary distribution, the fat tailedness of the distributions is small for large values of .

5 Conclusions

This paper presents the analytical form of the prediction density of a GARCH(1,1) process. This can be used to evaluate the probability of tail events or of quantities that may be of interest for value at risk calculations. This improves on approximation methods based on moments, or on Monte Carlo simulation and estimation.

The techniques in this paper can also be applied with symmetric innovation densities other than the N(0,1). Different densities imply distinct subsequent (negative) binomial expansions of expression (2.6) for , and different auxiliary convergence conditions on the GARCH coefficients, similarly to Assumption 3.1.


  • Abadir (1999) Abadir, K. M. (1999) An introduction to hypergeometric functions for economists. Econometric Reviews, 18(3), 287–330.
  • Abramowitz and Stegun (1964) Abramowitz, M., and I. Stegun (1964) Handbook of mathematical functions. National Bureau of Standards, Applied Mathematics.
  • Alexander, Lazar, and Stanescu (2013) Alexander, C., E. Lazar, and S. Stanescu (2013) Forecasting VaR using analytic higher moments for GARCH processes. International Review of Financial Analysis, 30, 36 – 45.
  • Andersen, Bollerslev, Christoffersen, and Diebold (2006) Andersen, T., T. Bollerslev, P. F. Christoffersen, and F. X. Diebold (2006) Volatility and correlation forecasting. in Handbook of Economic Forecasting, Volume 1, ed. by G. Elliott, C. W. Granger, and A. Timmermann. Elsevier.
  • Arvanitis, Hallam, Post, and Topaloglou (2018) Arvanitis, S., M. Hallam, T. Post, and N. Topaloglou (2018) Stochastic Spanning. Journal of Business & Economic Statistics, 0(0), 1–13.
  • Baillie and Bollerslev (1992) Baillie, T. R., and T. Bollerslev (1992) Prediction in dynamic models with time-dependent conditional variances. Journal of Econometrics, 52, 91–113.
  • Bollerslev (1986) Bollerslev, T. (1986) Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31, 307–327.
  • Bollerslev, Russell, and Watson (2010) Bollerslev, T., J. R. Russell, and M. W. Watson (eds.) (2010) Volatility and time series econometrics: essays in honor of Robert F. Engle. Oxford University Press.
  • Davis and Mikosch (2009) Davis, R., and T. Mikosch (2009) Extreme Value Theory for GARCH Processes. in Handbook of Financial Time Series, ed. by T. Andersen, R. Davis, J.-P. Kreiss, and T. Mikosch, pp. 187–200. Springer.
  • Delaigle, Meister, and Rombouts (2016) Delaigle, A., A. Meister, and J. Rombouts (2016) Root-T consistent density estimation in GARCH models. Journal of Econometrics, 192(1), 55–63.
  • Engle (1982) Engle, R. (1982) Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50, 987–1007.
  • Engle and Ng (1993) Engle, R., and V. Ng (1993) Measuring and Testing the Impact of News on Volatility. Journal of Finance, 48, 1749–1778.
  • Engle, Ledoit, and Wolf (2017) Engle, R. F., O. Ledoit, and M. Wolf (2017) Large Dynamic Covariance Matrices. Journal of Business & Economic Statistics, 0(0), 1–13.
  • Fama (1965) Fama, E. F. (1965) The Behavior of Stock-Market Prices. The Journal of Business, 38(1), 34–105.
  • Francq and Zakoian (2010) Francq, C., and J.-M. Zakoian (2010) GARCH models. Wiley.
  • Glosten, Jagannathan, and Runkle (1993) Glosten, L., R. Jagannathan, and D. Runkle (1993) On the relation between the expected value and the volatility of the nominal excess return on stocks. Journal of Finance, 48, 1779–1802.
  • Goldie (1991) Goldie, C. M. (1991) Implicit Renewal Theory and Tails of Solutions of Random Equations. Annals of Applied Probability, 1, 126–166.
  • Gradshteyn and Ryzhik (2007) Gradshteyn, I., and I. Ryzhik (2007) Table of integrals, series, and products. 7th ed., Academic Press.
  • Jorion (2006) Jorion, P. (2006) Value at Risk - The New Benchmark for Managing Financial Risk. McGraw Hill, New York.
  • Kesten (1973) Kesten, H. (1973) Random difference equations and renewal theory for products of random matrices. Acta Mathematica, 131, 207–248.
  • Mikosch and Starica (2000) Mikosch, T., and C. Starica (2000) Limit Theory for the Sample Autocorrelations and Extremes of a GARCH (1,1) process. Annals of Statistics, 28(5), 1427–1451.
  • Mood, Graybill, and Boes (1974) Mood, A. M., F. A. Graybill, and D. C. Boes (1974) Introduction to the Theory of Statistics, 3rd Edition. McGraw-Hill.
  • Patton, Ziegel, and Chen (2017) Patton, A. J., J. F. Ziegel, and R. Chen (2017) Dynamic Semiparametric Models for Expected Shortfall (and Value-at-Risk). ArXiv,
  • Tsay (2010) Tsay, R. S. (2010) Analysis of financial time series. Wiley, 3rd edn.


Proof of Lemma 2.1.

Consider the transformation theorem for ; from standard results, see e.g. Mood, Graybill, and Boes (1974), page 201, Example 19, one has

where is the indicator function of the event . Because, by symmetry, one has , the expression in the previous display simplifies to or, letting indicate , and solving for , one finds , which is (2.2). Note that the expression with the absolute value is also valid for . This proves (2.2).

Eq. (2.3) follows from definitions.    

Proof of Lemma 2.2.

Consider from (2.3), and consider the transformation of from to . Observe that the domain of integration remains , that the inverse transformation is , with Jacobian , where . Hence one finds

from which (2.7) follows, as in (2.4).    

Proof of Theorem 3.2

The proof of Theorem 3.2 is based on the following Lemmas 5.1 and 5.2.

Lemma 5.1 (Conditions on ).

Let Assumption 3.1.b hold. Then, for any


Proof of Lemma 5.1.

For the inequality (5.1) reads . Solving the quadratic on the l.h.s. for one finds two roots, and , so that the quadratic is non-negative for or for . Because , this holds only when . This proves that (5.1) is valid for for and a fortiori also for .

An induction approach is used for . Assume that (5.1) is valid for some and ; it can then be shown that (5.1) is valid also replacing with . To see this, take (5.1) for and multiply by . One finds

Because , one has , so that,

Rearranging as , one finds that (5.1) holds also for . The induction step hence proves that (5.1) holds for any if .    

Lemma 5.2 (Coefficients ).

Assume that holds for ; then


equals as defined in .

Proof of Lemma 5.2.

Rewrite (5.2) setting as


Using equation (2.6), for this expression equals where

where , , , . Hence

which follows from equation (3.1). This shows that for , for .

Next consider the case , where

and one wishes to expand . Consider the inequality , and the associated quadratic equation in with solutions and as in Assumption 3.1. For one finds , which ensures that . Hence for one can expand as

Similarly, for case , one can write (2.6) as


using the following recursions


In this notation represents the terms in the inner-most parenthesis in (2.6), the terms in the second inner-most parentheses in (2.6), etc, up to . It can be shown that condition (5.1) implies that


in (5.5); in order to prove this, one can start from and proceed to show that this holds for .

Let now  and apply subsequent binomial expansions to powers of in from (5.4) and (5.5)  one finds