The expectation-maximization algorithm for autoregressive models with normal inverse Gaussian innovations

The autoregressive (AR) models are used to represent the time-varying random process in which output depends linearly on previous terms and a stochastic term (the innovation). In the classical version, the AR models are based on normal distribution. However, this distribution does not allow describing data with outliers and asymmetric behavior. In this paper, we study the AR models with normal inverse Gaussian (NIG) innovations. The NIG distribution belongs to the class of semi heavy-tailed distributions with wide range of shapes and thus allows for describing real-life data with possible jumps. The expectation-maximization (EM) algorithm is used to estimate the parameters of the considered model. The efficacy of the estimation procedure is shown on the simulated data. A comparative study is presented, where the classical estimation algorithms are also incorporated, namely, Yule-Walker and conditional least squares methods along with EM method for model parameters estimation. The applications of the introduced model are demonstrated on the real-life financial data.

READ FULL TEXT VIEW PDF

Authors

page 1

page 2

page 3

page 4

05/30/2021

Normal Inverse Gaussian Autoregressive Model Using EM Algorithm

In this article, normal inverse Gaussian (NIG) autoregressive model is i...
09/19/2018

Parameter Estimation of Heavy-Tailed AR Model with Missing Data via Stochastic EM

The autoregressive (AR) model is a widely used model to understand time ...
06/22/2015

Non-Normal Mixtures of Experts

Mixture of Experts (MoE) is a popular framework for modeling heterogenei...
03/14/2018

Joint Modelling of Location, Scale and Skewness Parameters of the Skew Laplace Normal Distribution

In this article, we propose joint location, scale and skewness models of...
05/04/2016

Sampling Requirements for Stable Autoregressive Estimation

We consider the problem of estimating the parameters of a linear univari...
05/01/2021

Autoregressive Hidden Markov Models with partial knowledge on latent space applied to aero-engines prognostics

[This paper was initially published in PHME conference in 2016, selected...
03/10/2022

Controlling the flexibility of non-Gaussian processes through shrinkage priors

The normal inverse Gaussian (NIG) and generalized asymmetric Laplace (GA...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In time series modeling, the autoregressive (AR) models are used to represent the time-varying random process in which output depends linearly on previous terms and a stochastic term also known as error term or innovation term. In the classical approach, the marginal distribution of the innovation terms is assumed to be normal. However, the application of non-Gaussian distributions in time series allows modeling the outliers and asymmetric behavior visible in many real data. In a study conducted by Nelson and Granger nelson1979 , it was shown that out of 21 real time series data, only 6 were found to be normally distributed. In addition, in financial markets the distribution of the observed time series (log-returns) are mostly non-Gaussian and have tails heavier than the normal distribution, however lighter than the power law. These kind of distributions are also called semi heavy-tailed, see, e.g., Rachev2003 ; Cont2004 ; Omeya2018 . The AR models with non-Gaussian innovations are very well studied in the literature. Sim Sim1990 considered AR model of order with Gamma process as the innovation term. For AR models with innovations following a Student’s -distribution, see, e.g., Tiku2000 ; Tarami2003 ; Christmas2011 ; Nduka2018 and references therein. Note that Student’s -distribution is used in modeling of asset returns Heyde2005 .

One of the semi heavy-tailed distributions with wide range of shapes is normal inverse Gaussian (NIG), which was introduced by Barndorff-Nielsen Barndorff1997a . NIG distributions were used to model the returns from the financial time-series aw1 ; aw2 ; aw4 ; aw5 ; aw6 ; aw7 ; aw10

. In practical situations, we come across the data with skewness, extreme values or with missing values which can be easily assimilated by the NIG distribution. The distribution has stochastic representation, i.e., it can be written as the normal variance-mean mixture where the mixing distribution is the inverse Gaussian distribution and a more general distribution known as generalised hyperbolic distribution is obtained by using generalised inverse Gaussian as the mixing distribution

Barndorff2013 . In recent years, new methods dedicated to analyze and estimate the parameters related to NIG distributions based models were introduced which is a testimony of their popularity, see, e.g. stt1 ; stt2 ; stt4 ; stt5 .

It is worth mentioning, apart from applications in economics and finance, the NIG distributions have found interesting applications in many other areas, such as computer science aw8 , energy markets aw11 , commodity markets aw14 and image analysis stt7 . The multivariate NIG distributions, counterparts of univariate NIG distributions, were considered in stt3 ; stt6 and were applied in various disciplines, see e.g. aw13 .

In the literature, there are also various models and stochastic processes that are based on NIG distribution. The very first example is the NIG Lévy process, see e.g, ma4 which was also applied to financial data modeling ma7

. There are also numerous time series models with NIG distributed innovations and their various applications. We mention here the interesting analysis of heteroscedastic models

aaw3 ; ma1 ; ma2 ; ma3 ; ma5 and autoregressive models ma8 , see also ma6 .

In this paper, we study the autoregressive model of order (an AR()) with NIG innovations. Because the NIG distribution tails are heavier than the normal one, the introduced model can capture large jumps in the real-life time-series data which are quite ubiquitous. Moreover, by applying the autoregressive filter, the considered model can capture the possible short dependence in the data that is characteristic for financial time series returns. We introduce a new estimation algorithm for the analyzed model’s parameters. A step-by-step procedure for model parameters’ estimation based on expectation-maximization (EM) algorithm is proposed. According to our knowledge, the EM algorithm was not used for the time series models with NIG distributed innovations. For EM-based approach for the NIG distributions, see, e.g. stt4 . Thus, in this sense we extend this research. The efficacy of the proposed estimation procedure is shown for the simulated data. For the comparison, the model parameters are also estimated using Yule-Walker (YW) and conditional least square (CLS) methods, the most known algorithms for the AR models with finite-variance residuals. The comparative study clearly indicates that the EM-based algorithm outperforms the other considered methods. Finally, the introduced model applications are demonstrated on NASDAQ stock market index data. The introduced model explains very well the NASDAQ index data which can not be modeled using AR model with normally distributed innovations.

The rest of the paper is organized as follows: In Section 2, first we provide the main properties of the NIG distribution and then the model with NIG innovation term is introduced. A step-by-step procedure to estimate the parameters of the introduced model based on EM algorithm is discussed in Section 3. In Section 4, the efficacy of the estimation procedure is proved for simulated data. In this section we also present the comparative study, where the new technique is compared with the YW and CLS algorithm. Finally, the applications to real financial data are demonstrated. The last section concludes the paper.

2 NIG autoregressive model

In this section, we introduce the AR() model having independent identically distributed (i.i.d.) NIG innovations. However, first we remind the definition and main properties of the NIG distribution.
NIG distribution:

A random variable

is said to have a NIG distribution which is denoted by NIG

if its probability density function (pdf) has the following form

(2.1)

where , , and denotes the modified Bessel function of the third kind of order evaluated at and is defined by

Using the asymptotic properties of the modified Bessel function, Jorgensen1982 and the fact that as , we have the following expression

From the above, one can conclude that the tail probability for NIG distributed random variable

satisfies the following

where , which shows that NIG is a semi-heavy tailed distribution Rachev2003 ; Cont2004 ; Omeya2018 . It is worth mentioning that the NIG distributed random variable can be represented in the following form

(2.2)

where is a standard normal random variable i.e. and has an inverse Gaussian (IG) distribution with parameters and denoted by IG, having pdf of the following form

(2.3)

The representation given in Eq. (2.2) is useful when we generate the NIG distributed random numbers. It is also suitable to apply the EM algorithm for the maximum likelihood (ML) estimation of the considered model’s parameters. The representation (2.2) makes it convenient to find the main characteristics of a NIG distributed random variable , namely, we have

Moreover, the skewness and kurtosis of

is given by

For , the NIG distribution is symmetric. Moreover, it is leptokurtic if while it is platykurtic in case . Note that a leptokurtic NIG distribution is characterized by larger number of outliers than we have for normal distribution and thus, it is a common tool for financial data description.

NIG autoregressive model of order : Now, we can define the AR() univariate stationary time-series , with NIG innovations

(2.4)

where is a

-dimensional column vector,

is a vector of lag terms and , are i.i.d. innovations distributed as NIG. The process is a stationary one if and only if the modulus of all the roots of the characteristic polynomial are greater than one. In this article, we assume that the error term follows a symmetric NIG distribution with mean 0 i.e. Using properties of NIG distribution (see A), the conditional distribution of given and the preceding data is given by

where is the pdf given in Eq. (2.1) and is the realization of . We have and Var, where = Var and (see B).

3 Parameter estimation using EM algorithm

In this section, we provide a step-by-step procedure to estimate the parameters of the model proposed in Eq. (2.4). The procedure is based on EM algorithm. In this paper, we provide estimates of all parameters of the introduced AR() with NIG distributed innovations. It is worth to mention that EM is a general iterative algorithm for model parameter estimation by maximizing the likelihood function in the presence of missing or hidden data. The EM algorithm was introduced in Dempster1977

and it is considered as an alternative to numerical optimization of the likelihood function. It is popularly used in estimating the parameters of Gaussian mixture models (GMMs), estimating hidden Markov models (HMMs) and model-based data clustering algorithms. Some extensions of EM include the expectation conditional maximization (ECM) algorithm

Meng1993 and expectation conditional maximization either (ECME) algorithm Liu1995 . For a detailed discussion on the theory of EM algorithm and its extensions we refer the readers to McLachlan2007 . The EM algorithm iterates between two steps, namely the expectation step (E-step) and the maximization step (M-step). In our case, the observed data is assumed to be from NIG and the unobserved data follows IG. The E-step computes the expectation of the complete data log-likelihood with respect to the conditional distribution of the unobserved or hidden data, given the observations and the current estimates of the parameters. Further, in the M-step, a new estimate for the parameters is computed which maximize the complete data log-likelihood computed in the E-step. We find the conditional expectation of log-likelihood of complete data with respect to the conditional distribution of given . For we find

in the E-step where represents the estimates of the parameter vector at -th iteration. Further, in the M-step, we compute the parameters by maximizing the expected log-likelihood of complete data found in the E-step such that

The algorithm is proven to be numerically stable McLachlan2007 . Also, as a consequence of Jensen’s inequality, log-likelihood function at the updated parameters will not be less than that at the current values . Although there is always a concern that the algorithm might get stuck at local extrema, but it can be handled by starting from different initial values and comparing the solutions. In next proposition, we provide the estimates of the parameters of the model defined in eq. (2.4) using EM algorithm.

Proposition 3.1.

Consider the AR() time-series model given in Eq. (2.4) where error terms follow NIG. The maximum likelihood estimates of the model parameters using EM algorithm are as follows

(3.5)
(3.6)

where , , , , and

Proof.

See C. ∎

4 Simulation study and applications

In this section, we illustrate the performance of the proposed model and the introduced estimation technique using simulated data sets and real time series of NASDAQ stock exchange data.

4.1 Simulation study

We discuss the estimation procedure for AR(2) and AR(1) models. The model (2.4) is simulated in two steps. In the first step, the NIG innovations are simulated using the normal variance-mean mixture form (2.2). For NIG random numbers, standard normal and IG random numbers are required. The algorithm mentioned in Devroye1986 is used to generate iid IG distributed random numbers using the following steps:

  • Step 1: Generate standard normal variate and set .

  • Step 2: Set .

  • Step 3: Generate uniform random variate .

  • Step 4: If , then ; else

Note that the substitutions for parameters as and are required in the above algorithm because the pdf taken in Devroye1986 is different from the form given in (2.3). Again we simulate a standard normal vector of size and use (2.2) with simulated IG random numbers to obtain the NIG random numbers of size . In step 2, the simulated NIG innovations and the relation given in (2.4) are used to generate AR() simulated series.

Case 1: In the simulation study, first we analyze the AR(2) model with NIG innovations. In the analysis we used trajectories of length each. The used parameters of the model are: and while the residuals were generated from NIG distribution with and . The exemplary time series data plot and scatter plot of innovation terms are shown in Fig. 1.

(a) The time series data plot.
(b) Scatter plot of innovation term.
Figure 1: The exemplary time series of length (left panel) and the corresponding innovation term (right panel) of the AR(2) model with NIG distribution. The parameters of the model are: .

Now, for each simulated trajectory, we apply the estimation algorithm presented in the previous section. For EM algorithm several stopping criteria could be used. One of the examples is the criterion based on change in the log-likelihood function which utilizes the relative change in the parameters’ values. We terminate the algorithm when the following criterion for the relative change in the parameters’ values is satisfied (this criterion is commonly used in literature)

(4.7)

The parameters’ estimates obtained from the simulated data are shown in the boxplot in Fig. 2. Moreover, we compared the estimation results with the classical YW algorithm and CLS method. We remind that the YW algorithm is based on the YW equations calculated for the considered model, and utilizes the empirical autocovariance function for the analyzed data. More details of YW algorithm for autoregressive models can be found, for instance in brockwell2016introduction . We remind, the CLS method estimates the model parameters for dependent observations by minimizing the sum of squares of deviations about the conditional expectation.

Fig. 2(a) and Fig. 2(b) represent the estimates of the model parameters and using YW, CLS and EM methods, respectively. Furthermore, using the estimated and parameters with YW and CLS methods the residuals or innovation terms are obtained and then again EM algorithm is used to estimate the remaining and parameters which are plotted in Fig. 2(c) and 2(d). Moreover, the estimates for and using directly EM algorithm given in (3.6) are also plotted in Fig. 2(c) and 2(d). From boxplots presented in Fig. 2 we observe that the estimates of and parameters using the EM algorithm have less variance in comparison to the YW and CLS algorithms. Moreover, for and parameters, we see that the means of the estimates for the three presented methods are close to the true values, but the range of outliers for the EM algorithm is comparatively less. Therefore, we can infer that EM algorithm performs better (in comparison of other considered algorithms) in the parameter estimation of AR() model with NIG innovations.

(a) Boxplots for estimates.
(b) Boxplots for estimates.
(c) Boxplots for estimates.
(d) Boxplots for estimates.
Figure 2: Boxplots of the estimates of the AR(2) model’s parameters with theoretical values: represented with blue dotted lines. The boxplots are created using trajectories each of length .

Case 2: As the second example, we analyze the AR(1) model with NIG innovations. Here we examine the trajectories of data points. The same number of data points are examined in the real data analysis demonstrated in the next subsection. This exemplary model is discussed to verify the results for the real data. The simulated errors follow from NIG distribution with parameter and while the model’s parameter is . In Fig. 3, we present the exemplary simulated trajectory and the corresponding innovation terms.

(a) Time series data plot
(b) Scatter plot of innovation terms
Figure 3: The exemplary time series plot of the first trajectory of length from AR(1) model (left panel) with the corresponding scatter plot of innovation terms NIG distribution (right panel). The parameters of the model are .

Similarly as in Case 1, the introduced EM algorithm was applied to the simulated data with the stopping criteria based on the relative change in the parameter values defined in Eq. (4.7). The boxplots of the estimated parameters for trajectories each of length are shown in Fig. 4. Similar as in the previous case, we compare the results for EM, YW and CLS algorithms.

(a) Boxplot for estimate.
(b) Boxplot for estimate.
(c) Boxplot for estimate.
Figure 4: Boxplots of the estimates of the AR(1) model’s parameters with theoretical values: represented with blue dotted lines. The boxplots are created using trajectories each of length .

From Fig. 4 one can observe that although the estimate of has more variance compared to YW and CLS methods, but the estimates and have less variance and the spread of outliers is also slightly less. The means of the estimated parameters from trajectories of length using EM algorithm are . We can conclude that the EM algorithm, also in this case, gives the better parameters’ estimates for the considered model.

4.2 Real data applications

In this part, the considered AR() model with NIG distribution is applied to the NASDAQ stock market index data, which is available on Yahoo finance nasdaq . It covers the historical prices and volume of all stocks listed on NASDAQ stock exchange from the period March 04, 2010 to March 03, 2020. The data consists of data points with features having open price, closing price, highest value, lowest value, adjusted closing price and volume of stocks for each working day end-of-the-day values. We choose the end-of-the-day adjusted closing price as a univariate time series for the analysis purpose. The innovation terms of time series data is assumed to follow NIG distribution as the general one. In Fig. 5 we represent the adjusted closing price of NASDAQ index.

Figure 5: The adjusted closing price (in$) of NASDAQ index from the period March 04, 2010 to March 03, 2020 with data points.

Observe that the original time series data has an increasing trend. Moreover, one can easily observe that the data exhibit non-homogeneous behavior. Thus, before further analysis the analyzed time series should be segmented in order to obtain the homogeneous parts. To divide the vector of observations into homogeneous parts, we applied the segmentation algorithm presented in acta , where authors proposed to use the statistics defined as the cumulative sum of squares of the data. Finally, the segmentation algorithm is based on the specific behavior of the used statistics when the structure change point exists in the analyzed time series. More precisely, in acta it was shown that the cumulative sum of squares is a piece-wise linear function when the variance of the data changes. Because in the considered time series we observe the non-stationary behavior resulting from the existence of the deterministic trend, thus, to find the structure break point, we applied the segmentation algorithm for their logarithmic returns. Finally, the algorithm indicates that the data needs to be divided into two segments, the first observations are considered as data 1 and rest all observations as - data 2.

(a) Data 1
(b) Data 2
Figure 6: The segmented data 1 (left panel) and data 2 (right panel) together with the fitted polynomials.

The trends for both data sets were removed by using the degree 6 polynomial detrending. The trend was fitted by using the least squares method. The original data sets with the fitted polynomials are shown in Fig. 6. Next, for data 1 and data 2 we analyze the detrending time series and for each of them we use the partial autocorrelation function (PACF) to recognize the proper order of AR model. It is worth mentioning the PACF is a common tool to find the optimal order of the autoregressive models brockwell2016introduction . We select the best order that is equal to the lag corresponding to the largest PACF value (except a lag equal to zero). We use the PACF plots to determine the components of AR() model. Fig. 7 shows the stationary data (after removing the trend) and corresponding PACF plots indicating the optimal model - AR(1).

(a) Stationary data 1.
(b) PACF of data 1.
(c) Stationary data 2.
(d) PACF of data 2.
Figure 7: The time-series plot of stationary data 1 and data 2 (after removing the trend) - left panel and corresponding PACF plot - right panel.

After above-described pre-processing steps, the EM algorithm is used to estimate the model’s parameters. In the proposed model, and parameters are kept fixed. The estimated values of parameters for data 1 and data 2 are summarised in Table 1.

Data 1
Data 2
Table 1: Estimated parameters of data 1 and data 2 using EM algorithm.

As the final step, we analyze the innovation terms corresponding to data 1 and data 2 to confirm they can be modeled by using NIG distribution. The Kolmogorov-Smirnov (KS) test is used to check the normality of the residuals corresponding to data 1 and data 2. The KS test is a non-parametric test which is used as goodness-of-fit test by comparing the distribution of samples with the given probability distribution (one-sample KS test) or by comparing the empirical distribution function of two samples (2-sample KS test)

ks1951 . First, we use the one-sample KS test to reject the hypothesis of normal distribution of the residuals. The -value of the KS test is

for both cases, indicating that the null hypothesis (normal distribution) is rejected for both series. Thus, we applied two-sample KS test for both innovation terms with the null hypothesis of NIG distribution. The tested NIG distributions have the following parameters:

- for residuals corresponding to data 1; and - for the residuals corresponding to data 2. The -values for 2-sample KS test are and for data 1 and data 2, respectively, which indicates that there is no evidence to reject the null hypothesis. Therefore, we assume that both the residual series follow the same distribution, implying that data 1 follows NIG and data 2 has NIG.

(a) QQ plot between innovations of data 1 and normal distribution.
(b) QQ plot between innovations of data 1 and NIG distribution.
Figure 8: QQ plots of innovation terms of data 1 compared with (a) normal distribution and (b) NIG distribution.
(a) QQ plot between innovations of data 2 and normal distribution.
(b) QQ plot between innovations of data 2 and NIG distribution.
Figure 9: QQ plots of innovation terms of data 2 compared with (a) normal distribution and (b) NIG distribution.

To confirm that NIG distributions (with fitted parameters) are acceptable for the residual series, in Fig. 8 and Fig. 9 we demonstrate the QQ plot for the residuals of AR(1) models for data 1 and data 2 and the simulated data from normal (left panels) and corresponding NIG distributions (right panels). Observe that the tail of both data 1 and data 2 deviates from the red line on the left panels, which indicates that the data does not follow the distribution.

(a) KDE plot for data 1
(b) KDE plot for data 2
Figure 10: Kernel density estimation plots for comparing the innovation terms of data 1 with NIG distribution and normal distribution (left panel) and data 2 with NIG and normal distribution (right panel).
(a) Data 1
(b) Data 2
Figure 11:

The adjusted closing price of NASDAQ index for both segments (blue lines) along with the quantile lines of

constructed base on the fitted AR(1) models with NIG distribution with added trends.

The correspondence with NIG distribution is also demonstrated in Fig. 10, where the kernel density estimation (KDE) plot is presented for the residual series and compared with the pdf of the normal and corresponding NIG distributions. A brief description of KDE method is given in D. As one can see, the KDE plots clearly indicate the NIG distribution is the appropriate one for the residual series.

Finally, to confirm that the fitted models are appropriate for data 1 and data 2, we constructed the quantile plots of the data simulated from the models for which the removed polynomials were added. The quantile lines are constructed based on simulated trajectories with the same lengths as data 1 and data 2. In Fig. 11 we present the constructed quantile lines on the levels and the adjusted closing price of the NASDAQ index.

The presented results for the real data indicate that the AR(1) model with innovation terms corresponding to NIG distribution can be useful for the financial data with visible extreme values.

5 Conclusions

The heavy-tailed and semi-heavy-tailed distributions are at the heart of the financial time-series modeling. NIG is a semi-heavy tailed distribution which has tails heavier than the normal and lighter than the power law tails. In this article, the AR models with NIG distributed innovations are discussed. We have demonstrated the main properties of the analyzed systems. The main part is devoted to the new estimation algorithm for the considered models’ parameters. The technique incorporates the EM algorithm widely used in the time series analysis. The effectiveness of the proposed algorithm is demonstrated for the simulated data. It is shown that the new technique outperforms the classical approaches based on the YW and CLS algorithms. Finally, we have demonstrated that an AR() model with NIG innovations explain well the NASDAQ stock market index data for the period March 04, 2010 to March 03, 2020. We believe that the discussed model is universal and can be used to describe various real-life time series ranging from finance and economics to natural hazards, ecology, and environmental data.

Acknowledgements: Monika S. Dhull would like to thank the Ministry of Education (MoE), Government of India, for supporting her PhD research work.

The work of A.W. was supported by National Center of Science under Opus Grant No. 2020/37/B/HS4/00120 “Market risk model identification and validation using novel statistical, probabilistic, and machine learning tools”. A.K. would like to express his gratitude to Science and Engineering Research Board (SERB), India, for the financial support under the MATRICS research grant MTR/2019/000286.

References

  • (1) H. L. Nelson(Jr.) and C. Granger, “Experience with using the Box-Cox transformation when forecasting economic time series,” J. Econom., vol. 10, no. 1, pp. 57–69, 1979.
  • (2) S. T. Rachev, Handbook of Heavy Tailed Distributions in Finance: Hand-books in Finance. Amsterdam, The Netherlands: Elsevier, 2003.
  • (3) R. Cont and P. Tankov, Financial Modeling With Jump Processes. Chapman & Hall, CRC Press, London, UK, 2004.
  • (4) E. Omeya, S. V. Gulcka, and R. Vesilo, “Semi-heavy tails,” Lith. Math. J., vol. 58, pp. 480–499, 2018.
  • (5) C. H. Sim, “First-order autoregressive models for gamma and exponential processes,” J. Appl. Prob., vol. 27, pp. 325–332, 1990.
  • (6) M. L. Tiku, W. K. Wong, D. C. Vaughan, and G. Bian, “Time series models in non-normal situations: Symmetric innovations,” J. Time Ser. Anal., vol. 21, pp. 571–596, 2000.
  • (7) B. Tarami and M. Pourahmadi, “Multi-variate autoregressions: Innovations, prediction variances and exact likelihood equations,” J. Time Ser. Anal., vol. 24, pp. 739–754, 2003.
  • (8) J. Christmas and R. Everson, “Robust autoregression: Student- innovations using variational Bayes,” IEEE Trans. Signal Process., vol. 59, pp. 48–57, 2011.
  • (9) U. C. Nduka, “EM-based algorithms for autoregressive models with -distributed innovations,” Commun. Statist. Simul. Comput., vol. 47, pp. 206–228, 2018.
  • (10) C. C. Heyde and N. N. Leonenko, “Student processes,” Adv. Appl. Probab., vol. 37, pp. 342–365, 2005.
  • (11) O. Barndorff-Nielsen, “Exponentially decreasing distributions for the logarithm of particle size,” Proc. Roy. Soc. London Ser. A., vol. 353, pp. 401–409, 1977.
  • (12) L. Alfonso, R. Mansilla, and C. A. Terrero-Escalante, “On the scaling of the distribution of daily price fluctuations in the Mexican financial market index,” Physica A, vol. 391, no. 10, pp. 2990–2996, 2012.
  • (13) K. Burnecki, J. Gajda, and G. Sikora, “Stability and lack of memory of the returns of the Hang Seng index,” Physica A, vol. 390, no. 18–19, pp. 3136–3146, 2011.
  • (14) O. Barndorff-Nielsen, “Normal inverse Gaussian distributions and stochastic volatility modelling,” Scand. J. Stat., vol. 24, pp. 1–13, 1977.
  • (15) R. Weron, Computationally intensive value at risk calculations. Springer, Berlin, 2004.
  • (16) R. Weron, K. Weron, and A. Weron, “A conditionally exponential decay approach to scaling in finance,” Physica A, vol. 264, no. 3-4, pp. 551–561, 1999.
  • (17) A. Kalemanova, B. Schmid, and R. Werner, “The normal inverse Gaussian distribution for synthetic CDO pricing,” J. Deriv., vol. 14, no. 3, pp. 80–94, 2007.
  • (18) F. E. Benth, M. Groth, and P. C. Kettler, “A quasi-Monte Carlo algorithm for the normal inverse Gaussian distribution and valuation of financial derivatives,” Int. J. Theor. Appl. Finance, vol. 9, no. 6, pp. 843–867, 2006.
  • (19) O. Barndorff-Nielsen, T. Mikosch, and S. I. Resnick, “Lévy processes: Theory and applications,” Proc. Roy. Soc. London Ser. A., 2013.
  • (20) Y.-P. Chang, M.-C. Hung, H. Liu, and J.-F. Jan, “Testing symmetry of a NIG distribution,” Commun. Stat. - Simul. Comput.®, vol. 34, no. 4, pp. 851–862, 2005.
  • (21) A. Hanssen and T. A. Oigard, “The normal inverse Gaussian distribution: A versatile model for heavy-tailed stochastic processes,” in 2001 Proc. - ICASSP IEEE Int. Conf. Acoust. Speech Signal Process. (Cat. No. 01CH37221), vol. 6, pp. 3985–3988, IEEE, 2001.
  • (22) D. Karlis, “An EM type algorithm for maximum likelihood estimation of the normal-inverse Gaussian distribution,” Statist. Probab. Lett., vol. 57, pp. 43–52, 2002.
  • (23)

    D. Karlis and J. Lillestöl, “Bayesian estimation of NIG models via Markov chain Monte Carlo methods,”

    Appl. Stoch. Models Bus. Ind., vol. 20, no. 4, pp. 323–338, 2004.
  • (24) H. Sadreazami, M. O. Ahmad, and M. N. S. Swamy, “Multiplicative watermark decoder in contourlet domain using the normal inverse Gaussian distribution,” IEEE Trans. Multimed., vol. 18, no. 2, pp. 196–207, 2016.
  • (25) F. E. Benth and J. uratė Ŝaltytė Benth, “The normal inverse Gaussian distribution and spot price modelling in energy markets,” Int. J. Theor. Appl. Finance, vol. 7, no. 2, pp. 177–192, 2004.
  • (26) Q. Z. Zou, W. Y. Dou, and J. L. Li, “Study on distribution of oil price return based on NIG distribution,” in 2010 International Conference on Management and Service Science, pp. 1–5, 2010.
  • (27) X. Zhang and X. Jing, “Image denoising in contourlet domain based on a normal inverse Gaussian prior,” Digit. Signal Process., vol. 20, no. 5, pp. 1439–1446, 2010.
  • (28) Y.-P. Chang, M.-C. Hung, S.-F. Wang, and C.-T. Yu, “An EM algorithm for multivariate NIG distribution and its application to value-at-risk,” Int. J. of Inf., vol. 21, no. 3, pp. 265–283, 2010.
  • (29) T. A. Oigard, A. Hanssen, R. E. Hansen, and F. Godtliebsen, “EM-estimation and modeling of heavy-tailed processes with the multivariate normal inverse Gaussian distribution,” Signal Process., vol. 85, pp. 1655–1673, 2005.
  • (30) K. Aas, I. H. Haff, and X. K. Dimakos, “Risk estimation using the multivariate normal inverse Gaussian distribution,” J. Risk, vol. 8, no. 2, pp. 39–60, 2005/2006.
  • (31) T. H. Rydberg, “The normal inverse Gaussian Lévy process: simulation and approximation,” Commun. Stat. Stoch. Model., vol. 13, no. 4, pp. 887–910, 1997.
  • (32) H. Albrecher and M. Predota, “On Asian option pricing for NIG Lévy processes,” J. Comput. Appl. Math., vol. 172, no. 1, pp. 153–168, 2004.
  • (33) A. Wyłomańska, “How to identify the proper model,” Acta Phys. Pol. B, vol. 43, no. 5, pp. 1241–1253, 2012.
  • (34) J. Gajda, G. Bartnicki, and K. Burnecki, “Modeling of water usage by means of ARFIMA–GARCH processes,” Physica A, vol. 512, pp. 644–657, 2018.
  • (35) R. Kiliç, “Conditional volatility and distribution of exchange rates: GARCH and FIGARCH models with NIG distribution,” Stud. Nonlinear Dyn. Econom., vol. 11, no. 3, 2007.
  • (36) L. Forsberg and T. Bollerslev, “Bridging the gap between the distribution of realized (ECU) volatility and ARCH modelling (of the Euro): the GARCH-NIG model,” J. Appl. Econom., vol. 17, no. 5, pp. 535–548, 2002.
  • (37) M. B. Jensen and A. Lunde, “The NIG-S&ARCH model: a fat-tailed, stochastic, and autoregressive conditional heteroskedastic volatility model,” Econom. J., vol. 4, no. 2, p. 319–342, 2001.
  • (38) J. Weiss, P. Bernardara, M. Andreewsky, and M. Benoit, “Seasonal autoregressive modeling of a skew storm surge series,” Ocean Model(Oxf), vol. 47, pp. 41–54, 2012.
  • (39) A. Wilhelmsson, “Value at risk with time varying variance, skewness and kurtosis—the NIG-ACD model,” Econom. J., vol. 12, no. 1, pp. 82–104, 2009.
  • (40) B. Jorgensen, “Statistical properties of the generalized inverse Gaussian distribution. Lecture Notes in Statistics,” Springer-Verlag, New York, vol. 9, 1982.
  • (41) A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via EM algorithm,” J. R. Stat. Soc. Ser. B 39, vol. 39, pp. 1–38, 1977.
  • (42) X. L. Meng and D. B. Rubin, “Maximum likelihood estimation via the ECM algorithm: A general framework,” Biometrika, vol. 80, pp. 267–278, 1993.
  • (43) C. H. Liu and D. B. Rubin, “ML estimation of -distribution using EM and its extensions, ECM and ECME,” Statist. Sinica., vol. 5, pp. 19–39, 1995.
  • (44) G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, 2ed. John Wiley & Sons, 2007.
  • (45) L. Devroye, Non-Uniform Random Variate Generation, 1ed. Springer, New York, 1986.
  • (46) P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting. Springer, 2016.
  • (47) “Historical data of NASDAQ stock exchange available on Yahoo Finance.” https://finance.yahoo.com/quote/%5EIXIC/history?period1=1267660800&period2=1583193600&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true.
  • (48) J. Gajda, G. Sikora, and A. Wyłomańska, “Regime variance testing- A quantile approach,” Acta Phys. Pol. B, vol. 44, no. 5, pp. 1015–1035, 2013.
  • (49) F. J. Massey(Jr.), “The Kolmogorov-Smirnov test for goodness of fit,” J. Am. Stat. Assoc., vol. 46, pp. 68–78, 1951.
  • (50) J. Lillestol, “Risk analysis and the NIG distribution,” J. Risk, vol. 4, p. 41–55, 2000.
  • (51) A. Elgammal, R. Duraiswami, D. Harwood, and L. Davis, “Background and foreground modeling using non-parametric kernel density estimation for visual surveillance,” Proceedings of the IEEE, vol. 90, no. 7, pp. 1151–1163, 2002.

Appendix A Additional properties of NIG distribution

The following properties of NIG distribution are presented in Lillestol2000 . Let NIG , then following holds.

  1. The moment generating function of

    is

  2. If NIG , then NIG ,  

  3. If NIG , then NIG ,  

  4. If NIG and NIG are independent then the sum NIG .

  5. If NIG , then NIG .

Appendix B Mean and variance of AR() model with NIG innovations

We have . Assuming stationarity, it follows that

Further, for , we have and

(B.8)

where = Var and Moreover,

(B.9)

For , using (B) and (B.9), it is easy to show that

where Again for , usiigarg (B) and (B.9), it follows

Appendix C Proof of Proposition 3.1

Proof.

For AR(p) model, let for denote the complete data. The observed data is assumed to be from NIG and the unobserved data follows IG. We can write the innovation terms as follows

Note that and the conditional pdf is

Now, we need to estimate the unknown parameters . We find the conditional expectation of log-likelihood of unobserved/complete data with respect to the conditional distribution of given . Since the unobserved data is assumed to be from IG therefore, the posterior distribution is generalised inverse Gaussian (GIG) distribution i.e.,

The conditional first moment and inverse first moment are as follows:

These first order moments will be used in calculating the conditional expectation of the log-likelihood function. The complete data likelihood is given by

The log likelihood function will be

Now in E-step of EM algorithm, we need to compute the expected value of complete data log likelihood known as , which is expressed as

where, and . Update the parameters by maximizing the function using the following equations

Solving the above equations, we obtain the following estimates of the parameters

(C.10)

where

Appendix D KDE method

The KDE method also known as Parzen–Rosenblatt window method, is a non-parametric approach to find the underlying probability distribution of data. It is a technique that lets one to create a smooth curve given a set of data and one of the most famous method for density estimation. For a sample having distribution function has the kernel density estimate defined as elgammal2002

where is kernel function with bandwidth such that .