# The multivariate tail-inflated normal distribution and its application in finance

This paper introduces the multivariate tail-inflated normal (MTIN) distribution, an elliptical heavy-tails generalization of the multivariate normal (MN). The MTIN belongs to the family of MN scale mixtures by choosing a convenient continuous uniform as mixing distribution. Moreover, it has a closed-form for the probability density function characterized by only one additional “inflation” parameter, with respect to the nested MN, governing the tail-weight. The first four moments are also computed; interestingly, they always exist and the excess kurtosis can assume any positive value. The method of moments and maximum likelihood (ML) are considered for estimation. As concerns the latter, a direct approach, as well as a variant of the EM algorithm, are illustrated. The existence of the ML estimates is also evaluated. Since the inflation parameter is estimated from the data, robust estimates of the mean vector and covariance matrix of the nested MN distribution are automatically obtained by down-weighting. Simulations are performed to compare the estimation methods/algorithms and to investigate the ability of AIC and BIC to select among a set of candidate elliptical models. For illustrative purposes, the MTIN distribution is finally fitted to multivariate financial data where its usefulness is also shown in comparison with other well-established multivariate elliptical distributions.

## Authors

• 13 publications
• 2 publications
06/17/2020

### Family of mean-mixtures of multivariate normal distributions: properties, inference and assessment of multivariate skewness

In this paper, a new mixture family of multivariate normal distributions...
03/09/2021

### Multivariate tail covariance for generalized skew-elliptical distributions

In this paper, the multivariate tail covariance (MTCov) for generalized ...
06/21/2018

### Maximal skewness projections for scale mixtures of skew-normal vectors

Multivariate scale mixtures of skew-normal (SMSN) variables are flexible...
06/03/2019

### Unconstrained representation of orthogonal matrices with application to common principle components

Many statistical problems involve the estimation of a (d× d) orthogonal ...
10/21/2018

### Multiple Scaled Contaminated Normal Distribution and Its Application in Clustering

The multivariate contaminated normal (MCN) distribution represents a sim...
11/12/2020

### Some properties of the unified skew-normal distribution

For the family of multivariate probability distributions variously denot...
08/18/2021

### A New Asymmetric Copula with Reversible Correlations and Its Application to the EU Sovereign Debt Crisis

This paper proposes a novel asymmetric copula based upon bivariate split...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Statistical inference dealing with continuous multivariate data is commonly focused on the multivariate normal (MN) distribution, with mean vector and covariance matrix

, due to its computational and theoretical convenience. However, for many real phenomena, the tails of this distribution are lighter than required. This is often due to the presence of mild outliers

(Ritter, 2015, p. 4). Outliers are defined as “mild”, with respect to the MN distribution (which is the reference distribution), when they do not really deviate from the MN and are not strongly outlying, but rather they produce an overall distribution that is too heavy-tailed to be modeled by the MN (Mazza and Punzo, 2020, Morris et al., 2019, and Punzo and Tortora, 2019); for a discussion about the concept of reference (or target) distribution, see Davies and Gather (1993). Therefore, mild outliers can be modeled by means of a heavy-tailed elliptically symmetric (or elliptically contoured or, simply, elliptical) distribution embedding the MN distribution as a special case (Ritter, 2015, p. 79).

A classical way to define such a larger model is by means of the MN scale mixture (also known as MN variance mixture); it is a finite/continuous mixture of MN distributions on

obtained via a convenient discrete/continuous mixing distribution with positive support. The MN scale mixture simply alters the tail behavior of the MN distribution while leaving the resultant distribution elliptically contoured. This is made possible by the additional parameters of the mixing distribution which, in the scale mixture, govern the deviation from normality in terms of tail weight. Regardless of the mixing distribution considered, expressing a multivariate elliptical distribution as a MN scale mixture enables (among other): natural/efficient computation of the maximum likelihood (ML) estimates via expectation-maximization (EM) based algorithms (

Lange and Sinsheimer, 1993 and Balakrishnan et al., 2009), robust ML estimation of the parameters and based on down-weighting of mild outliers (Peel and McLachlan, 2000, Punzo and McNicholas, 2016, and Farcomeni and Punzo, 2019), and efficient Monte Carlo calculations (Relles, 1970, Andrews and Mallows, 1974, Efron and Olshen, 1978 and Choy and Chan, 2008). Moreover, robustness studies have often used MN scale mixtures for simulation and in the analysis of outlier models; see West (1984, 1987) and the references therein.

The MN scale mixture family includes many classical distributions, such as the multivariate (M; Lange et al., 1989 and Kotz and Nadarajah, 2004), obtained by using a convenient gamma as mixing distribution, the multivariate contaminated normal (MCN) of Tukey (1960, see also , ), defined under a convenient Bernoulli mixing distribution (see, e.g., Yamaguchi, 2004 and Punzo et al., 2020), and the multivariate symmetric generalized hyperbolic (MSGH; McNeil et al., 2005) obtained if a generalized inverse Gaussian is considered as mixing distribution.

In the present paper, we add a new member to the MN scale mixture family: the multivariate tail-inflated normal (MTIN) distribution. It is obtained by using a convenient continuous uniform as mixing distribution. In Section 2.1 we show that the proposed distribution has a closed-form probability density function (pdf) which is characterized by a single additional parameter, with respect to and of the nested MN, governing the tail weight. We also make explicit the representations of the MTIN distribution as MN scale mixture (Section 2.2.1) and as elliptical distribution (Section 2.2.2). In Section 2.3 we provide the first four moments of the MTIN distribution, which are those of practical interest (see also Appendix B). We estimate the parameters by the method of moments (Appendix C) and maximum likelihood (ML; Section 3); for the latter, we illustrate two alternative algorithms/methods to obtain these estimates (a direct approach in Section 3.1 and an ECME algorithm in Section 3.2). Always for the ML method, in Section 4 we discuss the existence of the estimates. The ECME algorithm allows us to highlights how the ML estimates of and are robust in the sense that mild outliers are down-weighted in the computation of these parameters (see Section 5 for details). Two simulated data analyses are designed to compare the estimation methods discussed above in terms of parameter recovery and computational time required (Section 6.1), and to evaluate the appropriateness of AIC and BIC in selecting among a set of candidate elliptical models that can be considered as natural competitors of our MTIN distribution (Section 6.2). At last, we illustrate the proposed model by analyzing financial multivariate data related to four of the publicly owned companies considered by the Dow Jones index (Section 7); here, we also compare the performance of the MTIN distribution with other heavy-tailed elliptical distributions which are well-known in the financial literature. We conclude the paper with a brief discussion in Section 8.

## 2 Multivariate tail-inflated normal distribution

### 2.1 Probability density function

###### Definition 2.1 (Probability density function).

A -dimensional random vector is said to have a -variate tail-inflated normal distribution with mean vector , scale matrix , and inflation parameter , in symbols , if its pdf is given by

 fTIN(x;μ,Σ,θ)=2π−d2|Σ|−12θ[δ(x;μ,Σ)](d2+1)[Γ(d2+1,(1−θ)δ(x;μ,Σ)2)−Γ(d2+1,δ(x;μ,Σ)2)], (1)

where is the upper incomplete gamma function, is the determinant, and denotes the squared Mahalanobis distance between and (with as the covariance matrix).

Alternative formulations of (1) are given in Appendix A.

Theorem 2.1 proves that the limiting form of (1) as is the pdf of the -variate normal distribution with mean vector and covariance matrix ; in symbols .

###### Theorem 2.1 (Limiting normal form).

The pdf of can be obtained as a special case of (1) when ; in formula

 limθ→0fTIN(x;μ,Σ,θ)=fN(x;μ,Σ). (2)
###### Proof.

To prove the theorem we work on the limit, as , of

 1θ[Γ(d2+1,(1−θ)δ(x;μ,Σ)2)−Γ(d2+1,δ(x;μ,Σ)2)], (3)

which is the part of (1) depending on . The limit of (3), as , is of the form zero over zero. Using L’Hospital’s rule, such a limit becomes

 limθ→0(1−θ)d2[δ(x;μ,Σ)2](d2+1)exp(−δ(x;μ,Σ)2)=[δ(x;μ,Σ)2](d2+1)exp(−δ(x;μ,Σ)2). (4)

When (4) is substituted in (1) instead of (3), is obtained. ∎

### 2.2 Representations

In this section we show that, if , then its distribution has a twofold representation as MN scale mixture (Section 2.2.1) and as multivariate elliptically symmetric distribution (Section 2.2.2). The properties of these families are so implicitly inherited by our model.

#### 2.2.1 Multivariate normal scale mixture representation

For robustness sake, one of the most common ways to generalize the MN distribution is represented by the MN scale mixture (MNSM), also called MN variance mixture (MNVM), with pdf

 fMNSM(x;μ,Σ,θ)=∫Sh⊆IR>0fN(x;μ,Σ/w)h(w;θ)dw, (5)

where is the mixing probability density (or mass) function — with support — depending on the parameter(s) . The pdf in (5) is unimodal, elliptically symmetric, and guarantees tails heavier than those of the MN distribution (see, e.g., Barndorff-Nielsen et al., 1982, Fang et al., 2013, Section 2.6, Yamaguchi, 2004 and McLachlan and Peel, 2000, Section 7.4). The tail weight of is governed by .

Examples of distributions belonging to the MNSM family are: the multivariate contaminated normal (MCN), the multivariate (M), the multivariate symmetric generalized hyperbolic (MSGH), the multivariate symmetric hyperbolic (MSH), the multivariate symmetric variance-gamma (MSVG), and the multivariate symmetric normal inverse Gaussian (SNIG); for details about these special cases, as well as about the properties of the MNSM family, see McNeil et al. (2005, Section 3.2.1).

In Theorem 2.2 we show that can be represented as a MNSM by considering a convenient (continuous) uniform as mixing distribution.

###### Theorem 2.2 (Scale mixture representation).

The pdf in (1) can be obtained as a special case of the pdf in (5) by considering a uniform on as mixing pdf. In formula,

 fTIN(x;μ,Σ,θ)=∫11−θfN(x;μ,Σ/w)hU(w;1−θ,1)dw, (6)

where denotes the pdf of a uniform on .

###### Proof.

The pdf in (6) can be written as

 fTIN(x;μ,Σ,θ)=1θ(2π)d2|Σ|−12∫11−θwd2exp[−w2δ(x;μ,Σ)]dw. (7)

By noting that

 (8)

the proof of the theorem is straightforward. ∎

A more direct interpretation of Theorem 2.2 can be given by the hierarchical representation of as

 W∼U(1−θ,1)X|W=w∼Nd(μ,Σ/w), (9)

where denotes a uniform distribution on . This alternative way to see the MTIN distribution is useful for random generation and for the implementation of EM-based algorithms (as we will better appreciate in Section 3).

#### 2.2.2 Elliptical representation

As well-documented in the literature, the class of elliptical distributions has several good properties (Cambanis et al., 1981, Fang et al., 2013, Chapter 2.5 and Rachev et al., 2010, p. 357) and the class of MNSMs is contained therein (Bingham et al., 2002, p. 247, McNeil et al., 2005, p. 61 and Fang et al., 2013, p. 48). Therefore, the MTIN distribution is also elliptical and Theorem 2.3 presents its elliptical representation.

###### Theorem 2.3 (Elliptical representation).

If , then is elliptically distributed with the following stochastic representation

 X=μ+ΛUT√W, (10)

where is a matrix such that , is a -variate random vector uniformly distributed on the unit hypersphere with topological dimensions ,

is a random variable having a

distribution with degrees of freedom, and . Note that , , and are independent.

###### Proof.

To find the distribution of as defined by (10) we need to derive the pdf of the random variable

 R=T√W,

and then use the relationship (see Bagnato et al., 2017)

 (11)

The random variable has density

 fV(v;θ)=2θv31I(1,11−θ)(v), (12)

where is the indicator function on the set . By definition, the density of the product of the independent random variables and is given by

 fR(r;θ) = ∫IRfV(v;θ)fT(rv)1|v|dv (13) = 21−d2rd−1θΓ(d2)∫11−θvd2exp(−v2r2)dv.

By using (13) in (11), we obtain the pdf of the MTIN distribution in (1). ∎

### 2.3 Moments

In this section we provide some of the moments of the MTIN distribution. In addition to the mean, we give the first two moments of higher even order, i.e. the covariance matrix and kurtosis, which are helpful in assessing the influence of mild outliers on the distribution (Rachev et al., 2010, p. 307); indeed, these are the moments of practical interest for people using multivariate heavy-tailed elliptical distributions. As measure of kurtosis, as usual, we consider

 Kurt(X)=E{[(X−μ)′Σ−1(X−μ)]2},

where is a random vector having mean and covariance matrix (Mardia, 1970).

###### Theorem 2.4 (MTIN: mean vector, covariance matrix and kurtosis).

If , then

 E(X) = μ, (14) Var(X) = v(θ)Σ, (15) Kurt(X) = k(θ)d(d+2), (16)

where

 v(θ) = −log(1−θ)θ, (17) k(θ) = θ2(1−θ)log2(1−θ). (18)
###### Proof.

See Appendix B. ∎

Based on (15), the covariance matrix of the MTIN distribution is proportional to the covariance matrix of the nested MN distribution, with proportionality factor depending on . In particular, the function , given in (17) and graphically represented via a solid line in Figure 1, is increasing in .

Because of the values can assume, is an inflated version of , without limits about the degree of inflation that tends to when . However, if , then ; so, under this case, tends to .

According to (18), is proportional to the kurtosis of the (nested) MN distribution, with proportionality factor depending on . In particular, the function , given in (18) and graphically represented by a dashed line in Figure 1, is increasing in . Therefore, is greater than meaning that the MTIN distribution allows for leptokurtosis. An upper bound for does not exist since when . However, if , then ; so, under this case, tends to . There is another aspect to be noted from Figure 1. While is a smoothly increasing function of , is approximately flat on the left, increasing suddenly only when is close to 1. From a practical point of view, this means that we need an high -value to have a relevant excess kurtosis.

## 3 Maximum likelihood estimation

Several estimators of the parameters of the MTIN distribution may be considered. Among them, the maximum likelihood (ML) estimator is the most attractive because of its asymptotic properties (see, e.g., Pfanzagl and Hamböker, 1994, p. 206). Naturally, other estimation methods can be used; Appendix C gives details about the method of moments (MM). Below we discuss two possible ways to obtain the ML estimates of ; for their comparison, see Section 6.1.

### 3.1 Direct approach

Given a random sample (observed data) from , the ML estimation method is based on the maximization of the (observed-data) log-likelihood function

 l(Ψ) = nlog(2)−nd2log(π)−n2log|Σ|−nlog(θ)−(d2+1)n∑i=1log[δ(xi;μ,Σ)] (19) +n∑i=1log[Γ(d2+1,(1−θ)δ(xi;μ,Σ)2)−Γ(d2+1,δ(xi;μ,Σ)2)],

which does not admit a closed form solution. Furthermore, maximizing (19) with respect to is a constrained problem due to and . To make the maximization problem unconstrained, we apply the following reparameterization for and . According to the Cholesky decomposition we write

 Σ=Ω′Ω,

where is an upper triangular matrix with real-valued entries. As concerns we write

 θ=exp(γ)1+exp(γ),

where . According to the above parametrization, the log-likelihood function in (19) can be re-written as

 l(Ψ∗) = nlog(2)−nd2log(π)−n2log∣∣Ω′Ω∣∣−n{γ−log[1+exp(γ)]} (20) −(d2+1)n∑i=1log[δ(xi;μ,Ω′Ω)] +n∑i=1log{Γ[d2+1,δ(xi;μ,Ω′Ω)2[1+exp(γ)]]−Γ[d2+1,δ(xi;μ,Ω′Ω)2]},

where . Details about the first order partial derivatives of in (20), with respect to , and , are given in Appendix F. Operationally, we perform unconstrained maximization of with respect to via the general-purpose optimizer optim() for R (R Core Team, 2018), included in the stats package. Different algorithms, such as the Nelder-Mead or BFGS, can be used for maximization. They can be passed to optim() via the argument method. Once the maximum is determined, the estimates for and are simply obtained by back-transformations. For a comparison, in terms of parameter recovery and computational times, between Nelder-Mead and BFGS algorithms applied to the direct ML estimation of , see Section 6.1.

### 3.2 ECME algorithm

To circumvent the complexity of the direct ML approach discussed in Section 3.1, we could consider the application of the expectation-maximization (EM) algorithm (Dempster et al., 1977), which is the classical approach to find ML estimates for distributions belonging to the MNSM family. However, for the MTIN, the M-step is not well-defined (for the motivations we give in Appendix D), and the EM algorithm fails to converge. This is not a serious problem because ML estimates of can be still found, much more efficiently, by using the expectation-conditional maximization either (ECME) algorithm (Liu and Rubin, 1994). The higher efficiency of the ECME algorithm, with respect to the EM algorithm, has been also discussed with reference to another member of the MNSM family, the M distribution (see Liu and Rubin, 1994, 1995 and McLachlan and Krishnan, 2007, Section 5.8).

The ECME algorithm is an extension of the expectation-conditional maximum (ECM) algorithm (Meng and Rubin, 1993) which, in turn, is an extension of the EM algorithm (see McLachlan and Krishnan, 2007, Chapter 5, for details). The ECM algorithm replaces the M-step of the EM algorithm by a number of computationally simpler conditional maximization (CM) steps. The ECME algorithm generalizes the ECM algorithm by conditionally maximizing on some or all of the CM-steps the incomplete-data log-likelihood. As for the EM and ECM algorithms, the ECME algorithm monotonically increases the likelihood and reliably converges to a stationary point of the likelihood function (see Meng and van Dyk, 1997 and McLachlan and Krishnan, 2007, Chapter 5). Moreover, Liu and Rubin (1994) found the ECME algorithm to be nearly always faster than both the EM and ECM algorithms in terms of number of iterations, and that it can be faster in total computer time by orders of magnitude (see also McLachlan and Krishnan, 2007, Chapter 5).

For the application of any variant of the EM algorithm, in the light of the hierarchical representation of the MTIN distribution given in (9), it is convenient to view the observed data as incomplete. The complete data are , where the missing variables are defined so that

 Xi|Wi=wi∼Nd(μ,Σ/wi), (21)

independently for , and

 W1,…,Wni.i.d.∼U(1−θ,1). (22)

Because of this conditional structure, the complete-data likelihood function can be factored into the product of the conditional densities of given the and the marginal densities of , i.e.

 Lc(Ψ) = n∏i=1fN(xi;μ,Σ/wi)hU(wi;1−θ,1) (23) =

Accordingly, the complete-data log-likelihood function can be written as

 lc(Ψ)=log[Lc(Ψ)]=l1c(μ,Σ)+l2c(θ), (24)

where

 l1c(μ,Σ)=−nd2log(2π)−n2log|Σ|+d2n∑i=1log(wi)−12n∑i=1wiδ(xi;μ,Σ), (25)

and

 l2c(θ)=−nlog(θ)+n∑i=1log[1I(1−θ,1)(wi)]. (26)

The ECME algorithm iterates between three steps, one E-step and two CM-steps, until convergence. The two CM-steps arise from the partition of as , where and . These steps, for the generic th iteration of the algorithm, , are detailed below.

#### 3.2.1 E-step

The E-step requires the calculation of

 Q(Ψ|Ψ(r))=Q1(μ,Σ|Ψ(r))+Q2(θ|Ψ(r)), (27)

the conditional expectation of given the observed data , using the current fit for . In (27) the two terms on the right-hand side are ordered as the two terms on the right-hand side of (24). As well-explained in McNeil et al. (2005, p. 82), in order to compute we need to replace any function of the latent mixing variables which arise in (25) and (26) by the quantities , where the expectation, as it can be noted by the subscript, is taken using the current fit for , . To calculate these expectations we can observe that the conditional pdf of satisfies , up to some constant of proportionality. In detail,

 f(wi|xi;Ψ) ∝ f(wi,xi;Ψ) (28) ∝ wd2+1−1iexp[−δ(xi;μ,Σ)2wi]1I(1−θ,1)(wi) ∝ 1η(xi;Ψ)fG(wi;d2+1,δ(xi;μ,Σ)2)1I(1−θ,1)(wi),

where

 fG(w;α,β)=βαΓ(α)wα−1exp(−βw)

denotes the pdf of a gamma distribution with parameters

and and

This means that has a doubly-truncated gamma distribution (Coffey and Muller, 2000), on the interval , with parameters and , whose pdf is given in (28); in symbols

 Wi|Xi=xi∼DTG(1−θ,1)(d2+1,δ(xi;μ,Σ)2). (29)

Now, the functions arising in (25) and (26) are , and . However, there is no need to compute the expectation of since the term is not related with the parameters; moreover, there is no need to compute the expectation of because we do not use to update . Thanks to (29) we have that

 EΨ(r)(Wi|Xi=xi)=w(r)i, (30)

where

 w(r)i= 2[Γ(d2+2,(1−θ(r))δ(xi;μ(r),Σ(r))2)−Γ(d2+2,δ(xi;μ(r),Σ(r))2)]Γ(d2+1)η(xi;Ψ(r))δ(xi;μ(r),Σ(r)) = 2δ(xi;μ(r),Σ(r))[Γ(d2+2,(1−θ(r))δ(xi;μ(r),Σ(r))2)−Γ(d2+2,δ(xi;μ(r),Σ(r))2)][Γ(d2+1,(1−θ(r))δ(xi;μ(r),Σ(r))2)−Γ(d2+1,δ(xi;μ(r),Σ(r))2)]. (31)

Then, by substituting with in the last term on the right-hand side of (25), we obtain

 Q1(μ,Σ|Ψ(r))=−n2log|Σ|−12n∑i=1w(r)iδ(xi;μ,Σ),

where the terms constant with respect to and are dropped.

#### 3.2.2 CM-step 1

The first CM-step, at the same iteration, requires the calculation of as the value of that maximizes , with respect to and , with fixed at . Theis maximization is easily implemented if we note that is a weighted log-likelihood, with weights , of independent observations from , respectively. So, the updates for and are

 μ(r+1)=1n∑i=1w(r)in∑i=1w(r)ixi (32)

and

 Σ(r+1)=1nn∑i=1w(r)i(xi−μ(r+1))(xi−μ(r+1))′. (33)

#### 3.2.3 CM-step 2

In the second CM-step, at the same iteration, is chosen to maximize the observed-data log-likelihood function , as given by (19), with fixed at . In detail, from (19) we have

 l({Ψ(r+1)1,θ})= nlog(2)−nd2log(π)−n2log∣∣Σ(r+1)∣∣−nlog(θ)−(d2+1)n∑i=1log[δ(xi;μ(r+1),Σ(r+1))] +n∑i=1log[Γ(d2+1,(1−θ)δ(xi;μ(r+1),Σ(r+1))2)−Γ(d2+1,δ(xi;μ(r+1),Σ(r+1))2)]. (34)

Thus, the second CM-step of the ECME algorithm chooses to maximize (34) with and . This implies that is a solution of the equation

 (35)

As a closed form solution for the root of (35) is not analytically available, we use the uniroot() function in the stats package of R to perform the numerical one-dimensional search of .

## 4 Existence of the ML estimates

In this section we demonstrate the existence of the ML estimates for , , and . With this aim, we consider the solutions from the ECME algorithm of Section 3.2. In line with the CM-steps of the algorithm, we first demonstrate the existence of and , given (Theorem 4.1), and then the existence of given and (Theorem 4.2).

###### Theorem 4.1.

Given and a sample , if , then there exists , being the set of positive-definite symmetric matrices, such that

 l(ˆμ,ˆΣ,θ)≥l(μ,Σ,θ),for % every (μ,Σ)∈IRd×M+d×d. (36)
###### Proof.

The pdf of can be written as

 fTIN(x;μ,Σ,θ)=|Σ|−12g[δ(x;μ,Σ)],

where

 g(t)=2π−d2θt(d2+1)[Γ(d2+1,(1−θ)t2)−Γ(d2+1,t2)].

To prove the theorem we use Proposition 2.4 in Cuesta-Albertos et al. (2008). In particular, it is sufficient to verify that the following assumptions are valid:

1. there exists a strictly decreasing sequence which converges to zero, such that for every ;

2. if , then there exists such that ;

3. is continuous on .

While c is easily verifiable, a and b need some attention. To verify a we can consider the first derivative of , i.e.

 g′(t)=−2π−d2θt(d2+2)[Γ(d2+2,(1−θ)t2)−Γ(d2+2,t2)].

It is straightforward to verify that for all ; so, a is satisfied for any strictly decreasing sequence which converges to zero.

In order to verify b, consider . Then

 rd2+1g(r) = 2π−d2θ[Γ(d2+1,(1−θ)t2)−Γ(d2+1,t2)] (37) = 2π−d2θ(∫r20exp(−x)xd2+1dx−∫(1−θ)r20exp(−x)xd2+1dx).

When , both the integrals in (37) tend to , so that . Then, b is verified for . ∎

###### Theorem 4.2.

Given , , and a sample , then there exists such that

 (38)
###### Proof.

According to (7), and given and , the log-likelihood function results

 l(μ,Σ,θ)=l(θ)=−nd2log(2π)−n2log|Σ|+n∑i=1log{1θ∫11−θwd2exp[−w2δ(xi;μ,Σ)]dw}. (39)

It is straightforward to realize that is continuous in . If the limits of , when tends to the boundaries and of the support, are both finite, then the log-likelihood function has a maximum and the theorem is proved. When we have

 limθ→1l(θ)=−nd2log(2π)−n2log|Σ|+n∑i=1log{∫10wd2exp[−w2δ(xi;μ,Σ)]dw}.

When , using the results in (3)–(4), we obtain

 limθ→0l(θ)=−nd2log(2π)−n2log|Σ|−12n∑i=1δ(xi;μ,Σ). (40)

The limit in (40) is constant and corresponds to the log-likelihood function of a -variate normal distribution computed in and . ∎

## 5 Some notes on robustness

The MTIN model allows to obtain improved (in terms of robustness) ML estimates of and , with respect to those provided by the nested (reference) MN model, in the presence of mild outliers. In detail, the influence of the observations is reduced (down-weighted) as the squared Mahalanobis distance increases. This is in line with the -estimation (Maronna, 1976) which uses a decreasing weighting function to down-weight the observations with large values. To be more precise, according to (32) and (33), and can be respectively viewed, since is estimated from the data by ML, as an adaptively weighted sample mean and sample covariance matrix, in the sense used by Hogg (1974), with weights given in (31). This approach, in addition to be a type of -estimation, follows Box (1980) and Box and Tiao (2011) in embedding the reference MN model in a larger model with one or more parameters (here ) that afford protection against non-normality.

For each possible value of the dimension , consider the weights in (31) as a bivariate (weighting) function of the squared Mahalanobis distance and of the inflation parameter , i.e.

 w(δ,θ;d)=2[Γ(d2+2,(1−θ)δ2)−Γ(d2+2,δ2)]δ[Γ(d2+1,(1−θ)δ2)−Γ(d2+1,δ2)]. (41)

It is straightforward to show that the weighting function in (41) is positive. An example of graphical representation of , in the case , is provided in Figure 2.

As concerns the limiting behavior of as , by using the fundamental theorem of calculus and l’Hspital’s rule we obtain

 limδ→∞w(δ,θ;d)=1−θ. (42)

From (42) we note that, when observations are very far from the bulk of the data () their weight in the computation of and depends by . If is close to zero (i.e. if the MTIN approaches the MN distribution), then the weight is close to one; indeed, this is what we expect under the normal case. In the other cases, the weight linearly decreases as increases, i.e. as the MTIN departs from the MN distribution. As concerns the limiting behavior of as , by using the mathematics considered for the limit in (42), we have

 limθ→0w(δ,θ;d)=1. (43)

From (43) we can observe that if , then regardless of , and this happens because we are operationally working with a MN distribution (see Theorem 2.1).

## 6 Simulation studies

In this section, we investigate various aspects related to our model through simulation studies. While the simulation study of Section 6.1 is considered to evaluate parameter recovery of the proposed estimation procedures, in addition to the computational times required by these procedures, the simulation study of Section 6.2 aims to compare classical model selection criteria in selecting among a set of well-established elliptical distributions, with differing number of parameters, including the MTIN. The whole analysis is conducted in R and the code for density evaluation, random number generation, and fitting for the MTIN distribution is available, in the form of an R package named mtin, from http://docenti.unict.it/punzo/Rpackages.htm.

### 6.1 Parameter recovery and computational times

In this section we compare two methods to estimate the parameters of the MTIN distribution, namely the method of moments (MM), discussed in Appendix C, and ML, discussed in Section 3. These methods are compared with respect to parameter recovery and to the computational time required. We use three approaches to compute ML estimates: direct approach with the Nelder-Mead algorithm, direct approach with the BFGS algorithm, and ECME algorithm (cf. Section 3). All the algorithms used to obtain ML estimates are initialized by the solution provided by the method of moments.

In this study we consider three experimental factors: the dimension (), the sample size (), and the inflation parameter (). The values of are chosen unbalanced on the right simply to have scenarios with more excess kurtosis (refer to the considerations made at the end of Section 2.3).

For each combination of , and , we sample one hundred datasets from a MTIN distribution with a zero mean vector () and an identity scale matrix (), for a total of datasets. On each generated dataset, we fit the MTIN distribution with the four approaches cited above. Computation is performed on a Windows 10 PC, with Intel i7-8550U CPU, 16.0 GB RAM, using R 64-bit, and the elapsed time (in seconds) is computed via the system.time() function of the base package. Parallel computing, using 4 cores, is considered.

We start evaluating parameter recovery by focusing on ; we limit the investigation to this parameter because, in analogy with other normal scale mixtures, the parameter(s) governing the tail-weight is (are) the most difficult to be estimated. However, although not reported here, the ranking of the estimation methods we obtain with respect to are roughly preserved when we move to and .

In Figure 3 we report the box-plots of the differences , for bias evaluation, while in Figure 4 we report the box-plots of the squared differences , for mean square error (MSE) evaluation. Each of the nine plots in these figures refers to a particular pair and shows box-plots each summarizing (for every and used method) the behavior of the considered differences with respect to the available 100 replications. As expected, the differences under evaluation improve as increases. Interestingly enough, the differences improves when either or increase. Regardless of the scenario and difference considered, BFGS and ECME algorithms for ML estimation work comparably and represent the best approaches. The worst approach is MM. The very similar behavior between BFGS and ECME algorithms can be further corroborated by looking at Table 1, where log-likelihood values are averaged over the 100 replications of each triplet . Here we can note how these algorithms return exactly the same average results, which are always slightly better than those provided by the Nelder-Mead algorithm, and this is true regardless of the triplet considered. To be more precise, looking at the whole set of obtained results (not reported here for the sake of space), BFGS and ECME algorithms provide, on each fitted model, the same maximized log-likelihood value.

As concerns the computational times, for each combination of and the estimation method considered, Figure 5 reports the box-plots of the elapsed times in each of the replications obtained marginalizing with respect to and .

The MM provides the best (lowest) elapsed times, regardless of the true underlying . Among the algorithms considered to obtain ML estimates, the BFGS is the best one, followed by the Nelder-Mead and ECME algorithms. The elapsed times from the ECME algorithm seem to be a decreasing function of , tying those of the BFGS algorithm when . Another interesting result can be noted by looking at Figure 6 where we report, analogously to Figure 5, the box plots of the elapsed times only in the case , when varying the sample size.