 # Statistical Inference for Stable Distribution Using EM algorithm

The class of α-stable distributions has a wide range of applications in economics, telecommunications, biology, and applied and theoretical physics. This is due to the fact that it accommodates both skewness and heavy tails. Since the α-stable distribution generally lacks a closed-form expression for its density function, finding efficient estimators for its parameters has attracted a great deal of attention in the literature. Here, we propose an EM algorithm to compute the maximum likelihood estimates of the parameters of the α-stable distribution. The performance of the proposed EM algorithm is demonstrated via a comparison study with other well-known competitors and by analyzing three sets of real data.


## 1 Introduction

Despite the lack of an analytical expression for the probability density function (pdf), the class of α-stable distributions is becoming increasingly popular in fields such as economics, finance, and insurance; details on the applications of α-stable distributions in these fields can be found in the references listed at the end of this paper. The only members of this class for which the pdf has an analytical expression are the Gaussian, Lévy, and Cauchy distributions. This can be regarded as a major obstacle to using the class in practice. In contrast, the characteristic function (chf) of α-stable distributions has a closed-form expression and takes different forms in the literature. In what follows, we briefly review two important forms, known as the S₀ and S₁ parameterizations. If the random variable Y follows an α-stable distribution, then the chf of Y, i.e., φ_Y(t) = E[exp(jtY)], in the S₀ parameterization is given by

$$\varphi_Y(t)=\begin{cases}\exp\left\{-|\sigma t|^{\alpha}\left[1-j\beta\,\mathrm{sgn}(t)\tan\left(\frac{\pi\alpha}{2}\right)\right]+jt\mu_0-jt\beta\sigma\tan\left(\frac{\pi\alpha}{2}\right)\right\}, & \text{if } \alpha\neq 1,\\[6pt] \exp\left\{-|\sigma t|\left[1+j\beta\,\mathrm{sgn}(t)\frac{2}{\pi}\log|t|\right]+jt\mu_0-\frac{2}{\pi}t\beta\sigma\log\sigma\right\}, & \text{if } \alpha=1,\end{cases}\tag{1}$$

where j = √(−1) and sgn(·) is the well-known sign function. The family of stable distributions has four parameters: tail thickness α ∈ (0, 2], skewness β ∈ [−1, 1], scale σ > 0, and location μ₀ ∈ ℝ. If β = 0, we obtain the class of symmetric α-stable (SαS) distributions. If β = 1 and α < 1, we have the class of positive stable distributions; in this case, Y varies over the positive semi-axis of the real line. Also, the chf of Y in the S₁ parameterization is given by

$$\varphi_Y(t)=\begin{cases}\exp\left\{-|\sigma t|^{\alpha}\left[1-j\beta\,\mathrm{sgn}(t)\tan\left(\frac{\pi\alpha}{2}\right)\right]+jt\mu_1\right\}, & \text{if } \alpha\neq 1,\\[6pt] \exp\left\{-|\sigma t|\left[1+j\beta\,\mathrm{sgn}(t)\frac{2}{\pi}\log|t|\right]+jt\mu_1\right\}, & \text{if } \alpha=1,\end{cases}\tag{2}$$

where μ₁ is the location parameter. The pdf and chf in the S₀ parameterization are continuous functions of all four parameters, while the pdf and chf in the S₁ parameterization both have a discontinuity at α = 1. As can be seen, the parameterizations (2) and (1) differ only in the location parameter. The location parameters in the S₀ and S₁ parameterizations, shown by μ₀ and μ₁ respectively, are related as

$$\mu_1=\begin{cases}\mu_0-\beta\sigma\tan\left(\frac{\pi\alpha}{2}\right), & \text{if }\alpha\neq 1,\\[6pt] \mu_0-\beta\frac{2}{\pi}\sigma\log\sigma, & \text{if }\alpha=1.\end{cases}\tag{3}$$

Clearly, when β = 0 the two parameterizations coincide.
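The conversion (3) between the two location parameters is straightforward to implement. The sketch below is a minimal illustration; the function name `mu1_from_mu0` is ours, not the paper's:

```python
import math

def mu1_from_mu0(alpha, beta, sigma, mu0):
    """Convert the S0 location mu0 to the S1 location mu1 via (3)."""
    if alpha != 1.0:
        return mu0 - beta * sigma * math.tan(math.pi * alpha / 2.0)
    # alpha == 1 branch of (3)
    return mu0 - beta * (2.0 / math.pi) * sigma * math.log(sigma)
```

For β = 0 the two locations coincide; for α = 1.5, β = 0.5, σ = 2, μ₀ = 1 the shift is −βσ tan(3π/4) = +βσ, since tan(3π/4) = −1.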

Hereafter, we write S₀(α, β, σ, μ₀) and S₁(α, β, σ, μ₁) to denote the class of stable distributions in the S₀ and S₁ parameterizations, respectively. For the class of normal distributions with mean μ and variance σ², i.e., N(μ, σ²), the pdf at point x is shown by φ(x|μ, σ²). The generic symbol E(λ) accounts for the family of exponential distributions with rate parameter λ, and W(a, b) denotes a Weibull distribution with pdf f(w|a, b) = a b w^{a−1} exp{−b w^a}; for w > 0, shape parameter a > 0, and scale parameter b > 0. Also, f_Y(·), f_P(·), and f_V(·) are the pdfs of the distributions of Y, P, and V introduced in the sequel, respectively.

This paper is organized as follows. In the remainder of this section, the EM algorithm and its extensions are first reviewed briefly, and then some properties of α-stable distributions are given. In Section 2, the proposed EM algorithm is introduced. Model validation using simulations and real data analysis is carried out in Section 3. Some conclusions are made in Section 4.

### 1.1 EM algorithm and its extensions

The EM algorithm is the most popular approach for estimating the parameters of a statistical model in the presence of missing (or latent) observations. Assume that x = (y, z) denotes the vector of complete data with pdf f(x|Θ), in which y and z consist of the observed and missing values, respectively. Also, L_c(Θ) denotes the likelihood function of the complete data, in which Θ is the parameter vector. The EM algorithm finds the parameter vector that maximizes the conditional expectation of the log-likelihood of the complete data given the observed data and a current guess Θ^(t) of the parameter vector. Usually, the EM algorithm works as follows.

1. E-step: given y and Θ^(t), it computes Q(Θ|Θ^(t)) = E[l_c(Θ)|y, Θ^(t)].

2. M-step: it finds Θ^(t+1) such that Q(Θ^(t+1)|Θ^(t)) is maximized.

Notice that l_c(Θ) refers to the logarithm of L_c(Θ). Both steps of the EM algorithm are repeated until convergence occurs; convergence of the EM algorithm is guaranteed under mild regularity conditions. In what follows, we describe two extensions of the EM algorithm.
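As a concrete illustration of the two steps, the following sketch (our toy example, not part of the paper) runs EM for a two-component Gaussian mixture with unit variances, where the component label plays the role of the latent variable:

```python
import numpy as np

def em_gaussian_mixture(y, n_iter=200):
    """EM for a mixture pi*N(mu1, 1) + (1 - pi)*N(mu2, 1)."""
    pi_, mu1, mu2 = 0.5, np.min(y), np.max(y)      # crude initial guess
    for _ in range(n_iter):
        # E-step: responsibility (posterior probability) of component 1
        d1 = pi_ * np.exp(-0.5 * (y - mu1) ** 2)
        d2 = (1 - pi_) * np.exp(-0.5 * (y - mu2) ** 2)
        r = d1 / (d1 + d2)
        # M-step: weighted averages maximize Q(Theta | Theta^(t))
        pi_ = r.mean()
        mu1 = np.sum(r * y) / np.sum(r)
        mu2 = np.sum((1 - r) * y) / np.sum(1 - r)
    return pi_, mu1, mu2

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 700), rng.normal(5.0, 1.0, 300)])
pi_, mu1, mu2 = em_gaussian_mixture(y)
```

Each pass through the loop performs one E-step and one M-step; the responsibilities play the role of the conditional expectations computed in the E-step.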

#### 1.1.1 ECM Algorithm

When implementing the M-step of the EM algorithm is mathematically infeasible, an additional step is considered. Besides the E- and M-steps, a sequence of conditional maximizations is carried out. The new step is known as the CM-step, and the resulting algorithm is known as the ECM algorithm. In the ECM algorithm, the CM-step finds the maximum of Q(Θ|Θ^(t)) through a sequence of constrained maximizations, each over a subset of the parameters with the others held fixed.

#### 1.1.2 Stochastic EM Algorithm

For complete data of size n, assuming we are currently at the t-th iteration, each cycle of the stochastic EM (SEM) algorithm amounts to the following four-step sequence.

1. Given y and Θ^(t), a sequence of latent (or missing) variables z_i^(t) is simulated from the posterior pdf f(z|y_i, Θ^(t)); for i = 1, …, n.

2. The simulated latent realizations are substituted into the log-likelihood function of the complete data.

3. The EM algorithm is applied to the set of complete data, in which x_i = (y_i, z_i^(t)); for i = 1, …, n, is the vector of observed and latent realizations.

4. The vector of parameters is updated as Θ^(t+1), which is then used to simulate z_i^(t+1) from the posterior pdf f(z|y_i, Θ^(t+1)); for i = 1, …, n.

Under some mild regularity conditions, by repeating the above four-step process for a sufficiently large number of cycles, the sequence Θ^(t) constitutes a Markov chain that converges to a stationary distribution. Contrary to the EM algorithm, for the SEM algorithm convergence occurs in distribution, and in practice the number of cycles is determined through a graphical display. Supposing M₀ is the length of the burn-in period and M is a sufficiently large number of cycles, the SEM algorithm estimates Θ as:

$$\hat{\Theta}=\frac{1}{M-M_0}\sum_{t=M_0+1}^{M}\Theta^{(t)}.$$
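As a toy illustration of the four steps (again our construction, not the paper's), consider estimating the mean of a N(μ, 1) sample in which some observations are missing completely at random; the simulation step imputes the missing values, the M-step reduces to a sample mean, and the final estimate averages the chain after burn-in:

```python
import numpy as np

rng = np.random.default_rng(1)
y_obs = rng.normal(2.0, 1.0, 300)          # observed part of the sample
n_mis = 100                                # number of missing observations

mu, chain = 0.0, []                        # arbitrary starting value
M, M0 = 500, 100                           # cycles and burn-in length
for _ in range(M):
    z = rng.normal(mu, 1.0, n_mis)         # S-step: simulate latent values from their posterior
    mu = np.concatenate([y_obs, z]).mean() # M-step on the completed data
    chain.append(mu)

mu_hat = np.mean(chain[M0:])               # SEM estimate: average after burn-in
```

The chain fluctuates around the observed-data MLE (the mean of the observed values), which the post-burn-in average recovers.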

### 1.2 Some properties of α-stable distribution

Estimating the parameters of an α-stable distribution through the EM algorithm requires a hierarchical, or stochastic, representation. Here we give three useful results which play a large role in the next section.

###### Proposition 1.1

Suppose Y ∼ S₀(α, β, σ, μ₀), N ∼ N(0, 1), P ∼ S₁(α/2, 1, (cos(πα/4))^{2/α}, 0), and V ∼ S₁(α, 1, 1, 0). We have

$$Y\overset{d}{=}\eta\sqrt{2P}\,N+\theta V+\mu_0-\lambda,\tag{4}$$

where =d denotes equality in distribution, η = σ(1 − |β|)^{1/α}, θ = σ sgn(β)|β|^{1/α}, and λ = σβ tan(πα/2). All random variables N, P, and V are mutually independent.

###### Proposition 1.2

Let V ∼ S₁(α, 1, 1, 0) be independent of Y ∼ S₀(α, β, σ, μ₀). Then,

$$\frac{Y-\theta V-\mu_0+\lambda}{\delta}\sim S_1(\alpha,0,1,0),$$

where θ = σ sgn(β)|β|^{1/α}, λ = σβ tan(πα/2), and δ = σ(1 − |β|)^{1/α}.

###### Proposition 1.3

Let S ∼ S₁(α, 0, 1, 0) be independent of E ∼ E(1). Then,

$$\frac{S}{\sqrt{2E}}\overset{d}{=}\frac{N}{W},$$

where N ∼ N(0, 1) and W ∼ W(α, 1).
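Proposition 1.3 is easy to check by simulation. The sketch below (ours) compares sample quantiles of the two sides, drawing the symmetric stable variate with SciPy's `levy_stable`; for β = 0 the S₀ and S₁ parameterizations coincide, so no parameterization adjustment is needed:

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(2)
alpha, n = 1.5, 200_000

# Left side: S / sqrt(2E), with S ~ S1(alpha, 0, 1, 0) and E ~ E(1)
S = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
E = rng.exponential(1.0, n)
lhs = S / np.sqrt(2.0 * E)

# Right side: N / W, with N ~ N(0, 1) and W ~ W(alpha, 1)
N = rng.normal(0.0, 1.0, n)
W = rng.weibull(alpha, n)
rhs = N / W

q = [0.25, 0.5, 0.75]
print("lhs quartiles:", np.quantile(lhs, q), "rhs quartiles:", np.quantile(rhs, q))
```

The two sets of sample quartiles agree up to Monte Carlo error, as the proposition asserts.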

## 2 Proposed EM algorithm

Assume that y₁, …, y_n constitute a sequence of independent and identically distributed realizations from an α-stable distribution in the S₀ parameterization. The vector of complete data associated with (4) is shown by (y, p, v), in which p = (p₁, …, p_n) and v = (v₁, …, v_n) are vectors of realizations of the latent variables P and V, respectively. It turns out that representation (4) admits the following hierarchy:

$$Y|P=p,V=v\sim N\left(\mu_0-\lambda+\theta v,\,2p\eta^2\right),\qquad P\sim S_1\!\left(\tfrac{\alpha}{2},1,\left(\cos\tfrac{\pi\alpha}{4}\right)^{2/\alpha},0\right),\qquad V\sim S_1(\alpha,1,1,0),\tag{5}$$

where η, θ, and λ are defined after Proposition 1.1. Using hierarchy (5), the log-likelihood function of the complete data is

$$l_c(\Theta)=C-n\log\eta-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{y_i-\mu_0+\lambda-\theta v_i}{\sqrt{2p_i}\,\eta}\right)^2+\sum_{i=1}^{n}\log f_{P_i}(p_i|\alpha)+\sum_{i=1}^{n}\log f_{V_i}(v_i|\alpha),$$

where C is a constant independent of the parameter vector Θ = (α, β, σ, μ₀). The conditional expectation of l_c(Θ), i.e., Q(Θ|Θ^(t)) = E[l_c(Θ)|y, Θ^(t)], is

$$Q(\Theta|\Theta^{(t)})=C-n\log\eta-\frac{\theta^2}{4\eta^2}\sum_{i=1}^{n}E^{(t)}_{2i}+\frac{\theta}{2\eta^2}\sum_{i=1}^{n}(y_i-\mu_0+\lambda)E^{(t)}_{1i}-\frac{1}{4\eta^2}\sum_{i=1}^{n}(y_i-\mu_0+\lambda)^2E^{(t)}_{0i}+\sum_{i=1}^{n}E\left(\log f_{P_i}(p_i|\alpha)\right)+\sum_{i=1}^{n}E\left(\log f_{V_i}(v_i|\alpha)\right).$$

To complete the E-step, we need to compute the quantities E^(t)_{ri} = E(V_i^r P_i^{−1} | y_i, Θ^(t)); for r = 0, 1, 2. Now, it is straightforward to show that

$$E^{(t)}_{ri}=\frac{1}{2\eta^{(t)}\sqrt{\pi}\,f_Y\left(y_i\,\big|\,\alpha^{(t)},\beta^{(t)},\sigma^{(t)},\mu_0^{(t)}\right)}\int\!\!\int p^{-1.5}\,v^{r}\exp\left\{-\frac{1}{2}\left(\frac{y_i-\mu_0^{(t)}+\lambda^{(t)}-\theta^{(t)}v}{\sqrt{2p}\,\eta^{(t)}}\right)^2\right\}h(v|\alpha^{(t)})\,g(p|\alpha^{(t)})\,dv\,dp,\tag{6}$$

where h(·|α^(t)) and g(·|α^(t)) are the pdfs of V and P, respectively, and λ^(t), θ^(t), and η^(t) are evaluated at Θ^(t). Details for evaluating E^(t)_{ri} in (6) are described in Appendix 4. Assuming we are currently at the t-th iteration, all steps of the proposed EM algorithm, including the E-, M-, and CM-steps and the maximization of the profile log-likelihood function, are given in the following.

• E-step: Given the current guess of Θ, i.e., Θ^(t), the quantities E^(t)_{ri} are computed; for i = 1, …, n and r = 0, 1, 2.

• M-step: Given E^(t)_{ri}, the parameter vector is updated as Θ^(t+1) by maximizing Q(Θ|Θ^(t)) with respect to Θ. In the M-step, the location parameter is updated as

$$\mu_0^{(t+1)}=\frac{\sum_{i=1}^{n}\left(y_i+\lambda^{(t)}\right)E^{(t)}_{0i}+\eta^{(t)}\sum_{i=1}^{n}E^{(t)}_{1i}}{\sum_{i=1}^{n}E^{(t)}_{0i}},$$

and the updated scale parameter σ^(t+1) is the positive root of the quadratic equation aσ² + bσ + c = 0, in which a = −n,

$$b=\frac{\beta^{(t)}\tan\left(\frac{\pi\alpha^{(t)}}{2}\right)\sum_{i=1}^{n}\left(y_i-\mu_0^{(t+1)}+\lambda^{(t)}\right)E^{(t)}_{0i}}{2\left(1-|\beta^{(t)}|\right)^{\frac{2}{\alpha^{(t)}}}}-\frac{\mathrm{sgn}(\beta^{(t)})\,|\beta^{(t)}|^{\frac{1}{\alpha^{(t)}}}\sum_{i=1}^{n}\left(y_i-\mu_0^{(t+1)}+\lambda^{(t)}\right)E^{(t)}_{1i}}{2\left(1-|\beta^{(t)}|\right)^{\frac{2}{\alpha^{(t)}}}},$$

and

$$c=\frac{\sum_{i=1}^{n}\left(y_i-\mu_0^{(t+1)}+\lambda^{(t)}\right)^2E^{(t)}_{0i}}{2\left(1-|\beta^{(t)}|\right)^{\frac{2}{\alpha^{(t)}}}}.$$

The updated skewness parameter is obtained as β^(t+1) = argmin_{β∈[−1,1]} F(β), where

$$F(\beta)=-n\log\left(1-|\beta|\right)^{\frac{1}{\alpha^{(t)}}}+\frac{1}{4}\left(\frac{|\beta|}{1-|\beta|}\right)^{\frac{2}{\alpha^{(t)}}}\sum_{i=1}^{n}E^{(t)}_{2i}+\frac{1}{4\left(1-|\beta|\right)^{\frac{2}{\alpha^{(t)}}}}\sum_{i=1}^{n}\left(y_i-\mu_0^{(t+1)}+\beta\sigma^{(t+1)}\tan\left(\frac{\pi\alpha^{(t)}}{2}\right)\right)^2E^{(t)}_{0i}.\tag{7}$$
• CM-step: The tail thickness parameter is updated in the CM-step by maximizing the marginal log-likelihood function with respect to α as

$$\alpha^{(t+1)}=\underset{\alpha}{\arg\max}\ \sum_{i=1}^{n}\log f_Y\left(y_i\,\big|\,\alpha,\beta^{(t+1)},\sigma^{(t+1)},\mu_0^{(t+1)}\right).$$

For this, at the t-th iteration of the EM algorithm, we apply the SEM algorithm in the three following steps to obtain α^(t+1).

1. Let v₁, …, v_n be independent and identically distributed realizations from S₁(α^(t), 1, 1, 0). Consider the transformation

$$y_i^{*}=\frac{y_i-\theta^{(t+1)}v_i-\mu_0^{(t+1)}+\lambda^{(t+1)}}{\delta^{(t+1)}},$$

where θ^(t+1) = σ^(t+1) sgn(β^(t+1))|β^(t+1)|^{1/α^(t)}, λ^(t+1) = σ^(t+1)β^(t+1) tan(πα^(t)/2), and δ^(t+1) = σ^(t+1)(1 − |β^(t+1)|)^{1/α^(t)}; for i = 1, …, n. Setting y_i^{**} = y_i^{*}/√(2e_i), where e₁, …, e_n are independent realizations from E(1), it turns out from Propositions 1.2 and 1.3 that

$$Y_i^{**}\,\big|\,W_i=w_i\sim N\left(0,w_i^{-2}\right),\qquad W_i\sim W(\alpha,1).\tag{8}$$

Based on hierarchy (8), the log-likelihood of the complete data can be written as

$$l_c(\alpha\,|\,y^{**},w)=C+n\log\alpha-\sum_{i=1}^{n}w_i^{\alpha}+\alpha\sum_{i=1}^{n}\log w_i.\tag{9}$$
2. Considering W_i as the latent variable, we simulate w_i from the posterior distribution of W_i given Y_i^{**} = y_i^{**} and α^(t); for i = 1, …, n, using the method described in Appendix 3.

3. Substitute the simulated w_i into the right-hand side of (9) and maximize it with respect to α to obtain an updated value. We then go back to step one and repeat the CM-step for M cycles. This yields a sequence of updated tail thickness parameters α^(t+1,1), …, α^(t+1,M). Now, the tail thickness parameter is updated as

$$\alpha^{(t+1)}=\frac{1}{M-M_0}\sum_{j=M_0+1}^{M}\alpha^{(t+1,j)},$$

where M₀ is the length of the burn-in period of the SEM algorithm and α^(t+1,j) is the updated tail thickness parameter at the j-th cycle of the CM-step while we are at the t-th iteration of the EM algorithm. Once we obtain α^(t+1), we go back and perform the EM algorithm from the E-step for a sufficiently large number of iterations, say N. After a burn-in period of length N₀, the sequence of updated parameters settles around its stationary values.
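The inner maximization of (9) over α is a one-dimensional problem. The following sketch (ours) recovers the Weibull shape parameter from simulated realizations w_i by maximizing (9) numerically:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
w = rng.weibull(1.5, 5000)          # latent draws with true shape alpha = 1.5

def neg_loglik(a):
    # negative of (9), up to the constant C
    return -(len(w) * np.log(a) - np.sum(w ** a) + a * np.sum(np.log(w)))

res = minimize_scalar(neg_loglik, bounds=(0.05, 2.0), method="bounded")
alpha_hat = res.x
```

Because (9) is smooth and unimodal in α on (0, 2], a bounded scalar optimizer suffices; in the CM-step this maximization is applied to the simulated w_i at each cycle.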

Updating the skewness parameter via (7) yields estimates that tend to zero. So, we instead update β using numerical optimization tools by maximizing the profile log-likelihood function. For this, after obtaining the EM-based estimates of α, σ, and μ₀, we maximize the observed-data log-likelihood with respect to β, with the other three parameters held at their EM-based estimates, to obtain the estimate of β. Fortunately, this approach leads to satisfactory results.

## 3 Model validation using simulated and real data

Here, firstly, we perform a simulation study to compare the performance of the proposed EM algorithm with the ML approach for estimating the parameters of the class S₀(α, β, σ, μ₀). To do this, data are generated by the method of Chambers et al. (1976) for simulating α-stable random variables, and the maximum likelihood (ML) estimates are evaluated using available software. Secondly, we apply the proposed EM and ML approaches to three sets of real data.
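A sketch of the Chambers–Mallows–Stuck generator is given below (our implementation of the standard formula for the S₁ parameterization with α ≠ 1; for β = 0 it reduces to the classical symmetric formula):

```python
import numpy as np

def rstable(alpha, beta, size, rng):
    """Draw S1(alpha, beta, 1, 0) variates by the Chambers-Mallows-Stuck method (alpha != 1)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    E = rng.exponential(1.0, size)                 # unit exponential
    t = beta * np.tan(np.pi * alpha / 2)
    B = np.arctan(t) / alpha
    S = (1 + t ** 2) ** (1 / (2 * alpha))
    return (S * np.sin(alpha * (V + B)) / np.cos(V) ** (1 / alpha)
            * (np.cos(V - alpha * (V + B)) / E) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(4)
x = rstable(1.5, 0.0, 100_000, rng)   # symmetric draws for a quick sanity check
```

For β = 0 the draws can be checked against SciPy's `levy_stable` sampler, whose quartiles they match up to Monte Carlo error.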

### 3.1 Simulation study

We apply the proposed EM and ML approaches to 200 samples of size 300. For the sake of simplicity, we restrict ourselves to fixed scale and location parameters. In each iteration, a range of settings for the skewness and tail thickness parameters is considered. The results of the simulations are shown in Figure 1 and Figure 2 for the small and large scale settings, respectively. During the simulations, we set N = 140, N₀ = 100, M = 40, and M₀ = 20. This means that the proposed EM algorithm is run for 140 iterations, of which the first 100 iterations are removed as the burn-in period. Also, in each iteration, the CM-step is repeated 40 times, of which the average of the last 20 runs is considered as the updated tail thickness parameter.

From Figure 1 and Figure 2, in the sense of the square root of the mean-squared error (RMSE) criterion, the following observations can be made.

• The EM-based estimator of the scale parameter outperforms the corresponding ML-based estimator when the scale parameter is small.

• The EM-based estimators of the tail thickness, skewness, and scale parameters outperform the corresponding ML-based estimators when the scale parameter is large.

It should be noted that the proposed EM algorithm shows better performance than the sample quantile (SQ) and empirical characteristic function (CF) approaches, which were therefore eliminated from the comparisons. Also, as is known, the EM algorithm cannot outperform the exact ML method. When, as noted above, the EM algorithm nevertheless shows better performance than the ML method, this is because the software computes numerically approximated ML estimates rather than the exact ones.

### 3.2 Model validation via real data

The α-stable distribution is the most commonly used candidate for modelling the prices of speculative assets such as stock returns. Here, we consider three sets of real data to illustrate an application of the α-stable distribution. The data are daily price returns of major European stock indices: the Switzerland SMI, France CAC, and UK FTSE. Following common practice for daily closing prices x_t, we consider the transformed prices (log-returns) r_t = log(x_t/x_{t−1}) for consecutive business days. We then fitted the S₀(α, β, σ, μ₀) distribution to the transformed data by four methods: ML, EM, SQ, and CF. We obtained the three data sets from a package developed for the R (R Core Team, 2016) environment. The results of applying the above four approaches are given in Table 1. It should be noted that, for implementing the EM algorithm, the starting values of α, β, σ, and μ₀ are chosen well away from the estimated values.

It is clear from Table 1 that the proposed EM algorithm outperforms the ML, SQ, and CF approaches with respect to either the log-likelihood value or the Kolmogorov-Smirnov (KS) statistic. The time series plots of the iterations and the fitted pdf superimposed on the histogram for the returns of the CAC shares are depicted in Figure 3. We emphasize that both the log-likelihood and KS statistics reported in Table 1 are evaluated at the estimated parameters. Also, the fitted pdfs shown in Figure 3 are drawn with the parameters estimated through the EM algorithm.

## 4 Conclusion

We derived an identity for the α-stable random variable: a scale-location normal mixture representation. Based on this representation, we proposed an EM algorithm to estimate the parameters of the α-stable distribution. The proposed EM algorithm works well over the admissible ranges of the parameters, i.e., α ∈ (0, 2), β ∈ [−1, 1], σ > 0, and μ₀ ∈ ℝ. The steps of the proposed EM algorithm are: expectation, maximization, conditional maximization, and profile log-likelihood maximization. Simulation studies reveal that the EM-based estimators of the tail thickness, skewness, and scale parameters outperform the corresponding maximum likelihood (ML) estimators when the scale parameter is large. Also, the EM-based estimator of the scale parameter outperforms the corresponding ML estimator when the scale parameter is small. This is not surprising, since the software computes numerically approximated ML estimates rather than the exact ones. The sample quantile and empirical characteristic function approaches were eliminated from the comparisons since the EM approach outperforms them. The simulations reveal that the proposed EM algorithm is robust with respect to the initial values. Three sets of real data are used to demonstrate that the proposed EM estimates are close to those of the ML approach in the sense of goodness-of-fit measures, including the log-likelihood and Kolmogorov-Smirnov statistics. A great advantage of the proposed EM algorithm over the ML method is that it can be applied to estimate the parameters of mixtures of α-stable distributions. Further, since representation (4) can be adapted to the multivariate case, the proposed EM algorithm can be applied to the multivariate α-stable distribution and hence is an appropriate approach for modelling the returns of dependent assets that jointly follow a multivariate α-stable distribution. These two advantages of the proposed EM algorithm can be considered as possible future work.

## Appendix 1: Proof of Proposition 1.1

Let U ∼ S₀(α, 0, 1, 0) and V ∼ S₁(α, 1, 1, 0) denote two independent α-stable random variables. Define Y = σ(1 − |β|)^{1/α}U + σ sgn(β)|β|^{1/α}V + μ₀ − σβ tan(πα/2). We have:

$$\begin{aligned}\varphi_Y(t)&=E\exp\left\{jt\left[\sigma(1-|\beta|)^{\frac{1}{\alpha}}U+\sigma\,\mathrm{sgn}(\beta)|\beta|^{\frac{1}{\alpha}}V+\mu_0-\sigma\beta\tan\left(\tfrac{\pi\alpha}{2}\right)\right]\right\}\\&=E\exp\left\{jt\sigma(1-|\beta|)^{\frac{1}{\alpha}}U\right\}E\exp\left\{jt\sigma\,\mathrm{sgn}(\beta)|\beta|^{\frac{1}{\alpha}}V\right\}\exp\left\{jt\mu_0-jt\beta\sigma\tan\left(\tfrac{\pi\alpha}{2}\right)\right\}\\&=\exp\left\{-(1-|\beta|)|\sigma t|^{\alpha}-|\beta||\sigma t|^{\alpha}\left[1-j\,\mathrm{sgn}(t\beta)\tan\left(\tfrac{\pi\alpha}{2}\right)\right]+jt\mu_0-jt\beta\sigma\tan\left(\tfrac{\pi\alpha}{2}\right)\right\}\\&=\exp\left\{-|\sigma t|^{\alpha}\left[1-j\,\mathrm{sgn}(t)\beta\tan\left(\tfrac{\pi\alpha}{2}\right)\right]+jt\mu_0-jt\beta\sigma\tan\left(\tfrac{\pi\alpha}{2}\right)\right\},\end{aligned}$$

where, to arrive at the expression after the third equality, we use the fact that if V ∼ S₁(α, 1, 1, 0), then φ_V(t) = exp{−|t|^α[1 − j sgn(t) tan(πα/2)]}. The expression after the last equality is the chf of S₀(α, β, σ, μ₀). Now, take into account the fact that if U ∼ S₀(α, 0, 1, 0), then U can be represented as a Gaussian scale mixture, i.e., U =d √(2P)N, where P ∼ S₁(α/2, 1, (cos(πα/4))^{2/α}, 0), see [25, p. 20]. Finally, set η = σ(1 − |β|)^{1/α}, θ = σ sgn(β)|β|^{1/α}, and λ = σβ tan(πα/2) to see that Y =d η√(2P)N + θV + μ₀ − λ. The proof is complete.

## Appendix 2: Proof of Propositions 1.2 and 1.3

Suppose E ∼ E(1) and P ∼ S₁(α/2, 1, (cos(πα/4))^{2/α}, 0) are independent. Define R = E/P to see that

$$P(R\leq r)=\int_0^{\infty}P(E\leq rp)f_P(p)\,dp=1-\int_0^{\infty}\exp\{-rp\}f_P(p)\,dp,$$

where the last integral in the above, i.e., the Laplace transform of the random variable P, is exp{−r^{α/2}}; for r > 0, see [25, p. 15]. This means that P(R ≤ r) = 1 − exp{−r^{α/2}}, and consequently

$$\frac{1}{\sqrt{R}}\overset{d}{=}\frac{1}{W},\tag{10}$$

where W ∼ W(α, 1). On the other hand, as noted in Appendix 1, we can write S =d √(2P)N, where S ∼ S₁(α, 0, 1, 0), N ∼ N(0, 1), and P are independent. It follows from (10) that

$$\frac{S}{\sqrt{2E}}\overset{d}{=}\frac{N}{W}.$$

The result follows.

## Appendix 3: Simulating from the posterior distribution of W_i

As the main part of the CM-step, implementing the SEM algorithm requires simulating from the posterior distribution of W_i given Y_i^{**} = y_i^{**} and α^(t). For this purpose, we can use the Metropolis-Hastings algorithm. As the candidate, we use the Weibull distribution with shape parameter α^(t). Hence, the acceptance rate A_{w_i} becomes

$$A_{w_i}=\min\left\{1,\ \frac{w_i^{new}\exp\left\{-\frac{\left(y_i^{**}\,w_i^{new}\right)^2}{2}\right\}}{w_i^{(t)}\exp\left\{-\frac{\left(y_i^{**}\,w_i^{(t)}\right)^2}{2}\right\}}\right\}.$$

When employing the Metropolis-Hastings algorithm, the candidate has little chance of acceptance in each iteration when |y_i^{**}| gets large. Therefore, we use an acceptance-rejection sampling scheme in the j-th cycle of the SEM algorithm at the t-th iteration of the proposed EM algorithm to generate from the posterior distribution of W_i given y_i^{**} and α^(t); for i = 1, …, n. For this, we notice that the posterior pdf of W_i is proportional to f_{Y_i^{**}|W_i}(y_i^{**}|w_i) f_{W_i}(w_i|α^(t)), and the conditional pdf f_{Y_i^{**}|W_i} is bounded by some value independent of w_i, i.e.,

$$f_{Y_i^{**}|W_i}(y_i^{**}|w_i)\leq\frac{\exp\{-0.5\}}{\sqrt{2\pi}\,|y_i^{**}|}.$$

So, the acceptance-rejection sampling scheme to generate from the posterior distribution of W_i is given by the following.

1. Generate a sample from W(α^(t), 1), say w.

2. Generate a sample from the uniform distribution over (0, exp{−0.5}/(√(2π)|y_i^{**}|)), say u.

3. If u ≤ f_{Y_i^{**}|W_i}(y_i^{**}|w), then accept w as a realization of the posterior distribution of W_i; otherwise, start again from step 1.
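The three steps above can be sketched as follows (our implementation; the argument `y_ss` stands for y_i** and the candidate draws come from NumPy's unit-scale Weibull generator):

```python
import numpy as np

def sample_posterior_w(y_ss, alpha, rng):
    """Acceptance-rejection draw from the posterior of W_i given y_i** and alpha."""
    bound = np.exp(-0.5) / (np.sqrt(2 * np.pi) * abs(y_ss))  # upper bound on f(y**|w)
    while True:
        w = rng.weibull(alpha)              # step 1: candidate from W(alpha, 1)
        u = rng.uniform(0.0, bound)         # step 2: uniform over (0, bound)
        # f(y**|w): the N(0, w^-2) density of Y** given W = w
        f = w * np.exp(-0.5 * (y_ss * w) ** 2) / np.sqrt(2 * np.pi)
        if u <= f:                          # step 3: accept or retry
            return w

rng = np.random.default_rng(6)
w_draw = sample_posterior_w(1.5, 1.2, rng)
```

Since the proposal is exactly the Weibull prior of W_i and f(y**|w) is bounded by `bound`, the accepted draws follow the posterior; the sampler is most efficient for moderate to large |y_i**|, which is precisely where the Metropolis-Hastings chain mixes poorly.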

## Appendix 4: Evaluating E^(t)_{ri}

At the t-th iteration of the proposed EM algorithm, for the i-th observed value y_i, define three K × K matrices A, B, and C. The elements of matrix A are independent realizations from S₁(α^(t), 1, 1, 0) (the distribution of V), and the elements of matrix B are independent realizations from S₁(α^(t)/2, 1, (cos(πα^(t)/4))^{2/α^(t)}, 0) (the distribution of P). Assume that A_rc and B_rc, respectively, are the c-th elements of the r-th rows of matrices A and B; then the c-th element of the r-th row of matrix C, i.e., C_rc, is the conditional pdf of Y given P = B_rc and V = A_rc in hierarchy (5), evaluated at y_i under Θ^(t). Now,

$$E^{(t)}_{0i}\approx\frac{\sum_{r=1}^{K}\sum_{c=1}^{K}C_{rc}/B_{rc}}{\sum_{r=1}^{K}\sum_{c=1}^{K}C_{rc}},\qquad E^{(t)}_{1i}\approx\frac{\sum_{r=1}^{K}\sum_{c=1}^{K}A_{rc}\,C_{rc}/B_{rc}}{\sum_{r=1}^{K}\sum_{c=1}^{K}C_{rc}},\qquad E^{(t)}_{2i}\approx\frac{\sum_{r=1}^{K}\sum_{c=1}^{K}A_{rc}^{2}\,C_{rc}/B_{rc}}{\sum_{r=1}^{K}\sum_{c=1}^{K}C_{rc}},$$

which are self-normalized Monte Carlo approximations of (6). Also, f_Y is approximated as

$$f_Y\left(y_i\,\big|\,\alpha^{(t)},\beta^{(t)},\sigma^{(t)},\mu_0^{(t)}\right)\approx\frac{1}{K^{2}}\sum_{r=1}^{K}\sum_{c=1}^{K}C_{rc}.$$

It should be noted that the constant K must be large enough. Using a large K, the quantities E^(t)_{ri}; for i = 1, …, n and r = 0, 1, 2, are approximated very accurately.

## References

•  Broniatowski, M., Celeux, G., and Diebolt, J. (1983). Reconnaissance de mélanges de densités par un algorithme d'apprentissage probabiliste, Data Analysis and Informatics, 3, 359-374.
• 

Buckle, D. J. (1995). Bayesian inference for stable distributions,

Journal of the American Statistical Association, 90(430), 605-613.
•  Celeux, G. and Diebolt, J. (1985). The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for mixture problem, Computational statistics quarterly, 2 (1), 73-82.
•  Chambers, J., Mallows, C., and Stuck, B. (1976). A method for simulating stable random variables, Journal of the American Statistical Association, 71, 340–344.
•  Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B, 39, 1-38.
•  Ip, E. H. S. (1994). A stochastic EM estimator for handling missing data, Unpublished Doctoral dissertation, Department of Statistics, Stanford University.
•  Kogon, S. M. and Williams, D. B. (1998). Characteristic function based estimation of stable parameters, In: Adler R, Feldman R, Taqqu M, Eds. A Practical Guide to Heavy Tailed Data. Boston: Birkhäuser, 311-338.
•  Little, R. J. A., and Rubin, D.B. (1983). Incomplete data, in Encyclopedia of Statistical Sciences, S. Kotz and N.L. Johnson, eds., Vol. 4, John Wiley, New York, pp. 46-53.
•  Liu, C. and Rubin, D. B. (1994). The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence, Biometrika, 81, 633-648.
•  Mandelbrot, B. B. (1963). The Variation of Certain Speculative Prices. Journal of Business, 36, 394-419.
•  Mandelbrot, B. B. and Hudson, R. L. (2007). The Misbehavior of Markets: A Fractal View of Financial Turbulence, Basic Books, Annotated Edition.
•  McCulloch, J. H. (1986). Simple consistent estimators of stable distribution parameters, Communications in Statistics-Simulation and Computation, 15, 1109-1136.
•  McLachlan, G. J. and Krishnan, T. (2008). The EM Algorithm and Extensions, second edition, John Wiley.
•  Meng, X. L. and Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework, Biometrika, 80, 267-278.
•  Mittnik, S. and Paolella, M. S. (2003). Prediction of financial downside risk with heavy tailed conditional distributions, In: Rachev, S.T. (Ed.), Handbook of Heavy Tailed Distributions in Finance, Elsevier Science, Amsterdam.
•  Nolan, J. P. (1998). Parameterizations and modes of stable distributions, Statistics and Probability Letters, 38, 187-195.
•  Nolan, J. P. (2001). Maximum likelihood estimation of stable parameters, In Barndorff-Nielsen, O. E., Mikosch, T., and Resnick, I. (Eds.), Lévy Processes: Theory and Applications, 379-400, Birkhäuser, Boston.
•  Nolan, J. P. (2003). Modeling financial distributions with stable distributions, Volume 1 of Handbooks in Finance, Chapter 3, pp. 105–130. Amsterdam: Elsevier.
•  Nolan, J. P. (2013). Multivariate elliptically contoured stable distributions: theory and estimation, Computational Statistics, 28(5), 2067-2089.
•  Ortobelli, S., Rachev, S. T., and Fabozzi, F. J. (2010). Risk management and dynamic portfolio selection with stable Paretian distributions. Journal of Empirical Finance, 17(2), 195-211.
•  R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
•  Rachev, S. T. (2003). Handbook of Heavy Tailed Distributions in Finance. Amsterdam: Elsevier.
•  Rachev, S. T., Menn, C., and Fabozzi, F. J. (2005). Fat-tailed and Skewed Asset Return Distributions: Implications for Risk Management, Portfolio Selection, and Option Pricing, John Wiley, Hoboken, New Jersey.
•  Rachev, S. T. and Mittnik, S. (2000). Stable Paretian Models in Finance, John Wiley, Chichester.
•  Samorodnitsky, G. and Taqqu, M. S. (1994). Stable Non-Gaussian Random Processes: Stochastic Models and Infinite Variance, Chapman and Hall, London.
•  Zolotarev, V. M. (1986). One-Dimensional Stable Distributions, American Mathematical Society, Providence, R. I.