Blurred truncation; Cauchy distribution; heavy tails; finite moments; poly-t distribution
Heavy-tailed distributions that generalize the normal distribution abound, with the generalized hyperbolic distribution perhaps being the most all-inclusive, e.g. Barndorff-Nielsen and Stelzer (2005). However, in a finite world, heavy tails cannot continue for ever. A simple example is the Cauchy distribution (e.g. Johnson et al, 1995), somewhat alarmingly exemplified as the distribution of bullet holes made on a wall by a rotating gun. In the real world, the wall would be finite, giving sharp truncation to the Cauchy distribution. Even with an arbitrarily long wall, the bullets would not travel arbitrarily far because of air resistance and gravity, giving censoring or what one might call ‘blurred truncation’, a thinning of the tail. In general, one would expect blurred truncation of long-tailed distributions to be common in a finite world. Thus the distribution of the heights of athletes might have a long tail, but physical limits eventually impose a cut-off: the tallest man ever known of had a height of ‘only’ 2.72 metres.
Thus ideally a fat-tailed distribution with an eventual cutoff is required, because even a distribution with infinite thin tails has some probability for ‘impossible’ values of the random variate. For example, concerning athlete heights, a thin-tailed distribution has a small probability for the impossibly tall, or even for negative heights. However, estimating a cutoff parameter is difficult, and so it seems best to settle for a distribution that is thin in the extreme tails and so has only a tiny probability for impossible occurrences. The need for this more realistic modelling of long-tailed data is the motivation for this work.
There is a second reason for seeking fat-tailed distributions that are thin in the extreme tail. Besides their physical implausibility, heavy-tailed distributions have some moments undefined or infinite, which causes problems for some analyses, although usually the problem is only a minor one. Thus long-tailed distributions model financial returns such as FTSE returns much better than the normal approximation used in deriving the famous Black-Scholes option pricing formula. In fitting financial indices such as the FTSE, given successive values $S_{t-1}, S_t$, the ‘logged return’ $r_t = \ln(S_t/S_{t-1})$ can be fitted to a t-distribution. The actual return is $S_t/S_{t-1} - 1 = \exp(r_t) - 1$. Computing the expected return under some conditions then requires the existence of the means of exponentials of the random variable. Hence Cassidy et al (2010) found that an integral needed for pricing European options (which was essentially the truncated moment-generating function) then diverged and truncation of the t-distribution was needed to avoid an infinite result.
In general, many workers have cited infinite moments as a problem and have introduced and studied truncated long-tailed distributions, e.g. Nadarajah (2011). Apart from financial calculations of the kind described, where all moments must exist, in general the first four sample moments are useful for characterising distributions, especially via measures such as skewness and kurtosis. They can also be used for estimating model parameters by the method of moments, or finding good starting values for methods of estimation such as maximum-likelihood. This last cannot be done if these moments are undefined or infinite in the model.
To construct a fat-tailed distribution with all moments existing, for the bullet-holes example given, assume a Rayleigh distribution with survival function for the distance along the wall a bullet can travel in either direction. The resulting pdf would be , where the constant can be found analytically. It is here called the NC(1) distribution, and interpolates between a Cauchy distribution, as , and a normal distribution as . The t-distribution also does so, although the t-distribution can have even longer tails than the Cauchy. Here, however, all moments exist. Further, the moment generating function also exists.
This distribution is a special case of the ‘double-t’ distribution. The poly-t distribution pdf is a product of pdfs, and is discussed by Drèze (1977), who gives an account of the origin of the name. More recently Nadarajah (2009) has given the properties of the double-t distribution. However, the special cases given here are of value because they are much more mathematically tractable than the general case, have fewer parameters, and have proved useful in fitting data.
The following sections describe the tractable types of double-t distribution. Finally some data fits are briefly discussed and conclusions drawn. Many calculations have been checked via purpose-written programs, and one such was used for the data fitting, using the NAG (Numerical Algorithms Group) library of routines.
2 The double-t distribution and its special cases
The special case of the poly-t distribution considered by Nadarajah (2009) has pdf
where the mean has been taken as zero. This is the product of two t-pdfs with a common mean but different spreads and numbers of degrees of freedom. Specialising to the case where gives what one might call the N-t distribution, the general case of the distributions considered here. This is a 4-parameter distribution where the long tails of the t-distribution can eventually become thin. Generalizing the Cauchy distribution to a t-distribution, a generalization of the NC(1) pdf would have pdf
for , a t-type distribution with a Rayleigh survival function for events to be accepted. We can write , where is the number of degrees of freedom of a t-distribution, and (1) then generalizes the t-distribution. However, it can show even more extreme tail behaviour when , before the eventual thin tail. We can also take , when the distribution becomes short-tailed. It is unimodal if . With , the excess kurtosis is . This N-t distribution is then the short-tailed distribution of Tiku and Vaughan (1999). With the affine transformation , the distribution of has 4 parameters ().
The constant cannot be found analytically, and requires the evaluation of an integral, but the distribution can still be used for data fitting, evaluating by numerical quadrature. This does not become a large computational burden even when is a function of covariates, because only needs to be evaluated once per likelihood iteration as it is not a function of the mean . The parameters both control tail behaviour.
This distribution is fitted to data by maximum-likelihood later. Some special cases are mathematically more tractable, the case where , given in appendix B, and where is an integer. Calling these NC(n) distributions, for pdfs that are the product of normal and Cauchy pdfs to the power , the introductory example is an NC(1) distribution. The NC(2) distribution corresponds to a normal pdf times a t pdf, and so on. It was found as expected that the NC(1) distribution is needed for fitting very long-tailed data, but in practice the NC(2) distribution is arguably more useful, certainly for non-financial datasets. The NC(n) distributions are the main focus here, especially where is 1 or 2, but a multivariate distribution is given in appendix A, and other cases in appendix B.
3 Properties of the NC(n) distributions
3.1 Probabilistic genesis and pdf
The approach here can be thought of as taking a long-tailed distribution and censoring or thinning its tails according to the survival function of a second distribution, to obtain a new distribution with some desired properties. One probabilistic basis for the NC(1) distribution has already been alluded to: random numbers originate from a Cauchy distribution and are accepted with a probability . Alternatively, a random number could originate from a normal distribution, and be accepted with probability . Note that in the bullets on the wall example the probability of acceptance varies with the line-of-flight distance , which for a gun a distance from the wall is given by . Hence a Rayleigh distribution for still gives a term .
Another probabilistic basis is to consider a normally-distributed random variable where the inverse standard deviation is a random variable with pdf, a Gaussian truncated below at , i.e. for , where is the normal distribution function. Then the pdf is
where . The integral over the pdf (2) is given in Gradshteyn and Ryzhik (2015) as result 3.466 (1), albeit in different notation.
It is the exclusion of high variances that gives the blurred truncation to the Cauchy distribution. With the generalized hyperbolic distribution, the generalizing distribution is the generalized inverse Gaussian distribution, so this does not produce blurred truncation.
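The normalising integral can be checked numerically. The sketch below assumes the NC(1) kernel has the form exp(-a x^2)/(1 + x^2); the symbol `a` and the function names are illustrative choices, not the notation of the text. For this kernel the integral over the real line is the standard result pi * exp(a) * erfc(sqrt(a)), which the code compares with a plain trapezoidal rule.

```python
import math


def nc1_constant(a):
    """Integral of exp(-a x^2) / (1 + x^2) over the real line, for a > 0.

    Uses the closed form pi * exp(a) * erfc(sqrt(a)).
    Note: exp(a) overflows for very large a (roughly a > 700).
    """
    return math.pi * math.exp(a) * math.erfc(math.sqrt(a))


def nc1_pdf(x, a):
    """Normalised NC(1)-type density under the assumed parameterisation."""
    return math.exp(-a * x * x) / ((1.0 + x * x) * nc1_constant(a))


def trapezoid(f, lo, hi, n=20000):
    """Composite trapezoidal rule on [lo, hi] with n panels."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


if __name__ == "__main__":
    a = 0.5
    numeric = trapezoid(lambda x: math.exp(-a * x * x) / (1.0 + x * x), -12.0, 12.0)
    print("closed form:", nc1_constant(a))
    print("quadrature :", numeric)
```

The finite range [-12, 12] is adequate here only because the Gaussian factor is negligible beyond it for a = 0.5; a smaller `a` needs a wider range.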
The form (2) is useful for derivations, but for practical use it is convenient to reparameterize the distribution, setting and , to obtain the pdf
where and . Then as the Cauchy distribution is obtained, and as the distribution becomes standard normal. The random variable will then be an affine transformation of . Without this reparameterization, the standard 1-parameter distribution would interpolate between Cauchy as and a normal distribution of zero variance as .
In general, here is analogous to the number of degrees of freedom of the t-distribution. Figure 1 shows the NC(1) pdf for several values of , and figure 2 shows the t-distribution with degrees of freedom with the closest NC(1) distribution, as measured by Hellinger distance.
This was selected for illustration as the distributions are identical with and infinite, but it was not evident how close they would be in between. Except in the extreme tails, the distributions look very similar. The NC(1) distribution has a slightly higher peak, because there is less probability in the extreme tails. It has slightly thinner shoulders, and is fatter in the tail before the tail finally thins. The excess kurtosis is 6 for the t-distribution, and 1.50 for the NC(1) distribution, showing that although the distributions look similar, the NC(1) tail is eventually thin.
The ‘standard’ 1-parameter form for is described here, with for the 3-parameter distribution of with arbitrary mean and variance.
Moments exist for . The odd moments are all zero by symmetry. Evaluation of the integrals for the second and fourth moments and can be done given (3) and the normal integral. Then
The excess kurtosis goes from infinity at to zero at , as shown in figure 3.
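The limiting behaviour of the excess kurtosis can be illustrated with a short sketch. It assumes an unscaled kernel exp(-a x^2)/(1 + x^2), with `a` a hypothetical tail parameter rather than the reparameterized one used in the text; since kurtosis is scale-free, the choice of scale does not matter. The moments follow from x^2/(1+x^2) = 1 - 1/(1+x^2) and x^4/(1+x^2) = x^2 - 1 + 1/(1+x^2), together with the Gaussian integrals.

```python
import math


def nc1_moments(a):
    """Second and fourth moments of the density proportional to
    exp(-a x^2) / (1 + x^2), a > 0 (assumed parameterisation)."""
    k = math.pi * math.exp(a) * math.erfc(math.sqrt(a))  # normalising integral
    g0 = math.sqrt(math.pi / a)                          # integral of exp(-a x^2)
    g2 = math.sqrt(math.pi) / (2.0 * a ** 1.5)           # integral of x^2 exp(-a x^2)
    m2 = (g0 - k) / k
    m4 = (g2 - g0 + k) / k
    return m2, m4


def nc1_excess_kurtosis(a):
    """Excess kurtosis m4 / m2^2 - 3 (odd moments vanish by symmetry)."""
    m2, m4 = nc1_moments(a)
    return m4 / (m2 * m2) - 3.0


if __name__ == "__main__":
    for a in (0.001, 0.1, 1.0, 50.0):
        print(a, nc1_excess_kurtosis(a))
```

Small `a` corresponds to the Cauchy-like end, where the kurtosis grows without bound, and large `a` to the normal end, where it tends to zero. The differences `g0 - k` and `g2 - g0 + k` lose accuracy through cancellation when `a` is very large.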
The moment generating function is the integral
from which moments may be derived by expanding the first term in the exponent. The mgf exists for .
where denotes the generalized incomplete gamma function. When the first argument is half-integer, as here, this can be written in closed form, in terms of special functions (Chaudhry et al, 1996). It would however be easier and probably much quicker to evaluate the integral numerically.
3.3 Distribution function
This cannot be written in closed form, but must be obtained by integrating (3). A useful result follows from reversing the order of the integrations over and . Then we have that the distribution function is given by
This form is arguably less convenient for numerical quadrature, but when the integral can be done, to give the results
These results assist numerical integration, which can now be done over a smaller range.
3.4 Random numbers
These can be generated in several ways using the rejection method. A simple and fairly efficient method is to generate a random variable from a normal distribution, to generate from a truncated normal distribution, and then to set . Robert (1995) used an exponential majorizing distribution in a rejection method to generate random numbers from the truncated normal distribution. For completeness, the full method including his algorithm is given here.
generate , uniformly-distributed random numbers;
generate the shifted exponential random variable ;
if restart from step 2;
generate , a normally-distributed random number;
return the random number as .
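The steps above can be sketched as follows. The sketch assumes the kernel exp(-a x^2)/(1 + x^2), so that the inverse standard deviation is a standard normal truncated below at c = sqrt(2a); the names `a`, `c`, `truncated_normal_below` and `nc1_random` are illustrative and not from the text. The truncated normal is drawn with the exponential-proposal rejection scheme of Robert (1995), using his optimal rate (c + sqrt(c^2 + 4))/2.

```python
import math
import random


def truncated_normal_below(c, rng):
    """Draw from a standard normal restricted to [c, infinity).

    Rejection sampling with a shifted exponential proposal (Robert, 1995).
    """
    alpha = 0.5 * (c + math.sqrt(c * c + 4.0))  # optimal exponential rate
    while True:
        # Shifted exponential: c + Exp(alpha); 1 - random() lies in (0, 1].
        z = c - math.log(1.0 - rng.random()) / alpha
        # Accept with probability exp(-(z - alpha)^2 / 2).
        if rng.random() <= math.exp(-0.5 * (z - alpha) ** 2):
            return z


def nc1_random(a, rng):
    """Draw from the density proportional to exp(-a x^2) / (1 + x^2).

    Scale-mixture construction: a normal variate divided by a truncated
    normal inverse standard deviation.
    """
    lam = truncated_normal_below(math.sqrt(2.0 * a), rng)
    return rng.gauss(0.0, 1.0) / lam


if __name__ == "__main__":
    rng = random.Random(2021)
    sample = [nc1_random(1.0, rng) for _ in range(5)]
    print(sample)
```

Integrating the normal density with inverse standard deviation lam against exp(-lam^2/2) on lam > c gives exp(-c^2 (1 + x^2)/2)/(1 + x^2) up to a constant, which is why c = sqrt(2a) reproduces the assumed kernel.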
With observed, where , the pdf of is
Write , then on making observations , the log-likelihood is
the likelihood derivatives are
and the second derivatives are
This is an instance where the NC(1) distribution is more tractable than the t-distribution, because differentiating the corresponding constant for the t-distribution would require the computation of the psi (digamma) function.
3.6 NC(n) distributions
For distributions of the form
for integer , the constant can be found analytically. Such distributions can be useful in fitting e.g. financial data, and so the derivation of the constant is outlined. The generalizing function is taken as
for . This is chosen to give an integral of form (6). Then
where the last result follows on integrating by parts times.
It remains to find the constant , where . Using integration by parts yet again on , we obtain
Then for integration by parts gives the recurrence relation
Hence it is possible to find the constants from and (7) and fit these distributions to data. The corresponding number of degrees of freedom is . Again for practical use.
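As an illustration of how such constants can be built up by recurrence, the sketch below assumes kernels of the form exp(-a x^2)/(1 + x^2)^n and writes I_n for their integrals over the real line. Integrating the derivative of x exp(-a x^2)/(1 + x^2)^n gives 2n I_{n+1} = (2n - 1 - 2a) I_n + 2a I_{n-1}, started from I_0 = sqrt(pi/a) and I_1 = pi exp(a) erfc(sqrt(a)). This recurrence is derived for the assumed parameterisation and is not necessarily the same form as (7); the names are hypothetical.

```python
import math


def nc_constants(a, n_max):
    """Return [I_0, ..., I_n_max], where I_n is the integral over the real
    line of exp(-a x^2) / (1 + x^2)^n, for a > 0.

    Forward recurrence: 2n I_{n+1} = (2n - 1 - 2a) I_n + 2a I_{n-1}.
    """
    i_prev = math.sqrt(math.pi / a)                           # I_0
    if n_max == 0:
        return [i_prev]
    i_curr = math.pi * math.exp(a) * math.erfc(math.sqrt(a))  # I_1
    out = [i_prev, i_curr]
    for n in range(1, n_max):
        i_next = ((2 * n - 1 - 2.0 * a) * i_curr + 2.0 * a * i_prev) / (2 * n)
        out.append(i_next)
        i_prev, i_curr = i_curr, i_next
    return out


if __name__ == "__main__":
    print(nc_constants(0.5, 3))
```

The forward recurrence involves cancellation when `a` is large, so for large `a` or large n a quadrature check is advisable.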
Random numbers can be generated from the pdf
using the rejection method, by generating a random number from the distribution, rescaling it, and accepting it with probability . In detail:
Generate a random number from the distribution, e.g. using the method of Bailey (1994);
Rescale to ;
Generate a uniform random number and accept if , else return to step 1.
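The three steps can be sketched as below, under stated assumptions: the target kernel is taken as exp(-a x^2)/(1 + x^2)^n, the t-distribution has 2n - 1 degrees of freedom, the rescaling divides by the square root of the degrees of freedom, and the acceptance probability is exp(-a x^2). The t variate is generated as a normal over a root chi-square for brevity, rather than by the polar method of Bailey (1994); all names are illustrative.

```python
import math
import random


def t_variate(df, rng):
    """Student t variate as normal / sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(0.5 * df, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def ncn_random(n, a, rng):
    """Draw from the density proportional to exp(-a x^2) / (1 + x^2)^n."""
    df = 2 * n - 1
    while True:
        # Step 1 and 2: t variate rescaled so its density is prop. to (1 + x^2)^(-n).
        x = t_variate(df, rng) / math.sqrt(df)
        # Step 3: accept with probability exp(-a x^2), else try again.
        if rng.random() <= math.exp(-a * x * x):
            return x


if __name__ == "__main__":
    rng = random.Random(7)
    print([ncn_random(2, 0.5, rng) for _ in range(5)])
```

The acceptance rate falls as `a` grows, since more of the t proposals land where exp(-a x^2) is small.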
This algorithm can also be used for the NC(1) distribution. Timing tests showed that for the NC(1) distribution, the random-number algorithm given specifically for that distribution took only 54.2% as much time as the general algorithm here. This was based on a range of values of from 0.01 to 0.99.
3.7 The NC(2) distribution
This distribution cannot attain the Cauchy-like tails of the NC(1) distribution, but has proved useful in fitting the datasets described later; even financial datasets are not as long-tailed as the Cauchy.
The pdf is
with defined as in the previous section, i.e.
The first two even moments can be easily derived and are:
The excess kurtosis as a function of is shown in figure 3. Note that as the kurtosis is infinite for NC(1) and NC(2) distributions.
Random numbers can be generated as previously described for the NC(n) distribution.
4 Fits to data
The fitting was done using a purpose-written Fortran program and using the NAG library function minimizers and quadrature routines. Derivatives were not required by the gradient-based function minimisers used, and identical results were obtained with different minimisers. The model parameters were the mean , scale factor , and tail parameters and/or . Iteration converged with no problem using the sample mean and standard deviation as starting values for and starting estimates of at middle-of-the-range values, e.g. . However, random restarts were made by perturbing the parameter values from their fitted values and re-minimising, to check that the global maximum of the log-likelihood had been attained.
Several publicly-available datasets were fitted. There is little point showing histograms with the fitted model, as the fits are visually very similar to that of the t-distribution, as can be seen from figure 2. The first dataset was the heights in centimetres of 100 female athletes, from Cook and Weisberg (1994). The others were the 1080 monthly-average heights of the Rio Negro river at Manaus (Sternberg, 1987), the 10939 logged returns on the S & P 500 index to 20 May 2021, the 9013 logged returns on the FTSE-100 to 8 April 2021, and the 13863 logged returns on the Nikkei to 20 May 2021. The first two datasets are physical measurements, and the last three are financial. It is known that day-to-day returns are not independent, so a better fit to data can be obtained by modelling this dependence, but this was ignored here for simplicity.
Results from fitting the , NC(1), NC(2) and N-t distributions are shown in table 1.
|N-t||348.76||0.166 (.171)||0.71 (1.747)|
|River height||1974.45||-||6.43 (1.234)|
|N-t||1974.10||0.309 (.236)||1.34 (.918)|
|S & P 500||15143.32||-||2.95 (.097)|
|NOD (2)||15142.90||0.0038 (.004)||-|
|N-t||15141.37||0.0124 (.007)||2.73 (.156)|
|N-t||12778.65||0.0087 (.014)||3.40 (.211)|
|N-t||21386.32||0.0504 (.007)||1.88 (.152)|
with standard error for the 5 datasets studied. The models are the t-distribution, the Normal-Cauchy NC(1) distribution, the NC(2) distribution (labelled NOD (2) in the table) and the normal-t (N-t) distribution.
The parameters are of no interest for our purpose and are not shown. All distributions except the N-t have 3 parameters and so log-likelihoods can be compared. The N-t distribution has 4 parameters.
Overall, the NC(1) distribution fits comparably with the t-distribution, sometimes better and sometimes worse (better for the height of athletes data and the Nikkei). The NC(2) distribution fits better than the t-distribution in 4 out of the 5 cases. The N-t distribution generalizes the t-distribution and so must always fit better in the sense that the log-likelihood will be higher. The improvement is large for the Nikkei, where the log-likelihood increases by over 24. The large size of the dataset enables small differences in fit to be detected. Because is somewhat lower for the N-t fit, it seems that the Nikkei returns favour a very long-tailed distribution, but one where the tails do eventually attenuate. It is clear that the N-t distribution gives a significantly better fit than the t-distribution, so that .
The aim here was to show ‘noninferiority’, i.e. that the NC(n) distributions could fit data comparably to the t-distribution, sometimes slightly better and sometimes slightly worse. This is the case here. The tentative conclusion is that the NC(2) distribution may be more useful in practice than the NC(1) distribution. Although it does not allow such extreme tail behaviour, it is better at modelling the less extreme tail behaviour that usually occurs. However, the NC(1) distribution allows extreme tail behaviour and will always be adequate.
It is interesting to examine the excess kurtosis of the three financial indices, where with the t-distribution, the predicted kurtosis would be infinite. Data were divided into 10 tranches, and randomized, to obtain sample estimates , while predictions were made from the fitted N-t model. The results were: S & P 500: , FTSE-100 , Nikkei . This shows that the predicted kurtoses are in fair agreement with the sample kurtoses, confirming that the new models give a better description of the data than the t-distribution.
A class of fat-tailed distributions that are normal in the extreme tails has been described, with a number of generalizations and related distributions. Figure 2 shows the NC(1) distribution to be very similar in shape to the t-distribution, except in the extreme tails, and it has been demonstrated to fit datasets very comparably to the t-distribution. The data-fitting led to the recognition that the NC(2) distribution also looks promising for practical work, as it fitted 4 out of 5 datasets better than the t-distribution.
Compared to the t-distribution, the computation of the pdf is of comparable difficulty, both requiring special functions for the normalization constant. Moments are simply computed, just being somewhat messier than those of the t-distribution. The moment generating function also exists, and can be found as an integral. Random numbers can be easily generated, which is vital for many computations, e.g. Markov-chain Monte-Carlo. The distribution function must be found by numerical quadrature, whereas for the t-distribution it is available as a special function, the incomplete beta function.
One might ask, given that we have the t-distribution, why we should be interested in a similar distribution, where the computations are slightly more complex. There are at least three reasons. Firstly, from the arguments given in the introduction, we expect that these distributions will model real-world behaviour better in the extreme tails, which could be useful for predicting the probability of extreme events.
Secondly, for sensitivity analysis: to check robustness of results to model assumptions we can repeat our analysis with the NC(1) distribution replacing the t-distribution, and we should still reach similar conclusions.
Thirdly, an advantage of the NC(1) distribution as a replacement for the t-distribution is that all the moments and the moment-generating function exist. This has two benefits. One is that the method of moments can be used to find good starting values for a maximum-likelihood fit to the data, e.g. by using figure 3. The main benefit however is for cases where the mean of the exponential of the random variable must be computed. This happens with long-tailed data such as financial return data, where logarithms have been taken and the distribution is still long-tailed. The mean of the exponential then gives the mean of the returns themselves. This situation also occurs where for example BMI (body mass index) or other medical statistics might be fitted by taking logarithms. It will be the mean and variance of BMI itself, not its logarithm, that are ultimately required.
Future work in this area could be gaining experience on fitting the distribution (3) in different application areas, and investigating further related distributions. Random number generation is a practically useful topic, and the generation schemes given here could doubtless be honed.
Appendix A: a multivariate distribution
5.1 Multivariate distributions
Writing , where is the mean and the covariance matrix of the normal distribution, and with all variances scaled by as before, the p-dimensional pdf becomes
To obtain the desired pdf, take . Then for , the bivariate case, , and . The required pdf is therefore
For the trivariate case, and the constant is , where
In general, taking as the covariance matrix of a normal distribution, the covariance matrix is
Appendix B: other cases
5.2 A blurred t-distribution with zero degrees of freedom
When , a different distribution is obtained. The modified Bessel function of the second kind can be written as
Setting we obtain the pdf
This gave similar fits to (2) for the datasets fitted here, but is less tractable. Moments can be computed in terms of the Bessel functions etc.
5.3 Asymmetric distributions
To introduce asymmetry to the NC(1) distribution, a simple method is to use a 2-piece distribution:
where are scale factors. The pdf of this type of distribution and its first derivative are continuous at . The mean can be shown to be
where is the exponential integral .
For , even moments about are simply derived, e.g. the second moment is times the second moment of (3). Thus the moments can be derived in terms of special functions, albeit messily. The probability that is , so random numbers can be generated by finding which half of the range lies in, then proceeding as for the NC(1) distribution.
5.4 Survival distributions
The two distributions used in this approach can of course be survival distributions, defined for . Tebbens et al (2001) used a distribution to model the tail behaviour of the volume of seamounts.
Let . This distribution interpolates between the Lomax distribution when and the exponential when , and has three parameters, on letting be rescaled. The constant
where the incomplete gamma function is .
The author would like to thank Professors Philip Scarf and Ian McHale for helpful comments.
-  Bailey, R. W. (1994). Polar generation of random variates with the t-distribution, Mathematics of Computation, 62 (206), 779-781.
-  Barndorff-Nielsen O. E. and Stelzer, R. (2005). Absolute moments of generalized hyperbolic distributions and approximate scaling of normal inverse Gaussian Lévy processes, Scandinavian Journal of Statistics 32 (4), 617-637.
-  Cassidy, D.T., Hamp, M.J. and Ouyed, R. (2010). Pricing European options with a log Student’s t-distribution: a Gosset formula. Physica A, 389, 5736-5748.
-  Chaudhry, M. A., Temme, N. M. and Veling, E. J. M. (1996). Asymptotics and closed form of a generalized incomplete gamma function, Journal of Computational and Applied Mathematics, 67, 371-379.
-  Cook, R. D. and Weisberg, S. (1994). An Introduction to Regression Graphics. Wiley, New York.
-  Drèze, J. H. (1977). Bayesian regression analysis using poly-t densities, Journal of Econometrics, 6, 329-345.
-  Gradshteyn I. S. and Ryzhik I. M. (2015). Table of integrals, series, and products, 8th ed., Academic Press, Waltham.
-  Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995), Continuous univariate distributions vol. 1, Wiley, New York.
-  Nadarajah, S. (2009). The product t density distribution arising from the product of two Student’s t PDFs, Statistical Papers, 50, 605-615.
-  Nadarajah, S. (2011). Making the Cauchy work, Brazilian Journal of Probability and Statistics, 25 (1), 99-120.
-  Robert, C. P. (1995). Simulation of truncated normal variables, Statistics and Computing, 5, 121-125.
-  Sternberg, H. O’R. (1987) Aggravation of floods in the Amazon river as a consequence of deforestation? Geografiska Annaler, 69A, 201-219.
-  Tebbens, S. F., Burroughs, S. M., Barton, C. C. and Naar, D. F. (2001). Statistical self-similarity of hotspot seamount volumes modeled as self-similar criticality, Geophysical Research Letters, 28 (14), 2711-2714.
-  Tiku, M. L. and Vaughan, D. C. (1999). A family of short-tailed symmetric distributions. Technical report, McMaster University, Canada.