1 Introduction
Increasing triangular maps are a recent construct in probability theory that can transform any source probability density into any target probability density [3]. The Knothe–Rosenblatt transformation [30; 18], [36, Ch. 1] gives a heuristic construction of an increasing triangular map for transporting densities that is unique (up to null sets) [3]. These transformations provide a unified framework for studying popular neural density estimation methods like normalizing flows [33; 32; 29; 26; 14; 17; 35; 19], which provide a tractable method for evaluating a probability density [15]. Indeed, these methods are becoming increasingly attractive for the task of multivariate density estimation in unsupervised machine learning.
This work is devoted to studying the properties of triangular flows that learn increasing triangular transformations when the target density is a heavy-tailed distribution. Heavy-tail analysis studies phenomena governed by large movements and encompasses both statistical inference and probabilistic modelling [28]. Indeed, heavy-tail analysis is used extensively in diverse applications: in financial risk modelling, where financial returns and risk-management calculations require heavy-tailed analysis [5; 10; 21]; in data networks, where heavy-tailed distributions are observed for file sizes, transmission rates, transmission durations and network traffic [24; 8; 20]; and in modelling insurance claim sizes and frequencies in order to set premiums efficiently and quantify the risk to the company [5; 10].
Specifically, we study triangular flows that represent multivariate heavy-tailed elliptical distributions, which are often used for modelling financial data and in the theory of portfolio optimization. Indeed, the basis of modern portfolio optimization relies on the Gaussian distribution hypothesis [23; 34; 31]. However, as demonstrated by multiple studies [9; 12; 16], the Gaussian distribution hypothesis cannot be justified for financial modelling, and elliptical distributions are the suggested alternative, particularly because they allow one to retain certain desirable practical properties of the normal distribution.
We begin our exposition in §3, where we show that in one dimension, the density quantile functions of the source and the target probability density precisely characterize the slope of the (unique) increasing transformation. Subsequently, we give an exact characterisation of the degree of heavy-tailedness of a distribution based on the asymptotic properties of the density quantile function. This allows us to clearly characterize the properties of an increasing transformation required to push a source density to a target density with different tail behaviour. Finally, we make precise the connection between the asymptotics of the density quantile function and the existence of higher-order moments of a distribution. We use this to give a precise rate (which accounts for the relative heaviness of the source and target densities) at which an increasing transformation must grow to capture the tail behaviour of the target density.
In §4, we extend these results to higher dimensions. We define multivariate heavy-tailed distributions as distributions whose marginals are heavy-tailed in all directions, and show that any increasing triangular map from a light-tailed distribution to a heavy-tailed distribution must have all diagonal entries of the Jacobian matrix (and hence all eigenvalues and the determinant) unbounded. We discuss the implications of our findings for neural density estimation in §5. We highlight the trade-off between choosing an appropriate source density and the “complexity” of the transformation required to learn a target density. We provide all the proofs in §A.
We summarize our main contributions as follows: 1) we show that density quantiles precisely capture the properties of a pushforward transformation; 2) we relate the properties of density quantiles to the existence of functional moments and to tail properties, allowing us to provide asymptotic rates for transformations required to capture heavy-tailed behaviour; 3) we reveal properties of density quantiles for certain classes of distributions, both in one dimension and in higher dimensions, that might be of independent interest; 4) we precisely study the properties of increasing maps required to capture heavy-tailed behaviour; 5) we reveal the trade-off between choosing a “complex” source density and an “expressive” transformation for representing target densities, and its implications for flow-based models.
2 Preliminaries
Consider two probability density functions $f$ and $g$ (with respect to the Lebesgue measure) over the source domain $\mathcal{X} \subseteq \mathbb{R}^d$ and the target domain $\mathcal{Y} \subseteq \mathbb{R}^d$, respectively. There always exists a deterministic transformation $T: \mathcal{X} \to \mathcal{Y}$ (cf. [36, Ch. 1]) such that for every (measurable) set $B \subseteq \mathcal{Y}$,

(1) $\int_{T^{-1}(B)} f(x)\,dx = \int_{B} g(y)\,dy$.
Specifically, by using the change of variables formula, i.e. $f(x) = g(T(x))\,|\det J_T(x)|$, a diffeomorphic function $T$ can push forward a base random variable $Z \sim f$ to a target random variable $X \sim g$ such that $g$ is the pushforward of $f$, i.e. $T_\# f = g$, where $|\det J_T|$ is the absolute value of the determinant of the Jacobian of $T$. Fortuitously, it is always possible to construct such a transformation $T$ in triangular form: we call a mapping $T = (T_1, \dots, T_d)$ triangular if its $j$-th component $T_j$ only depends on the first $j$ variables $x_1, \dots, x_j$. The name “triangular” comes from the fact that the Jacobian of $T$ is a triangular matrix function. We call $T$ increasing if, for all $j$, $T_j$ is an increasing function of $x_j$.
Theorem 1 ([3]).
For any two densities $f$ and $g$ over $\mathbb{R}^d$, there exists a unique (up to null sets of $f$) increasing triangular map $T: \mathbb{R}^d \to \mathbb{R}^d$ so that $T_\# f = g$.
Before proceeding further, let us first give an example of a construction of an increasing triangular transformation to help better understand Theorem 1. This example will subsequently form the basis of our theoretical exposition in the paper.
Example 1 (Increasing Rearrangement).
Let $f$ and $g$ be univariate probability densities with distribution functions $F$ and $G$, respectively. One can define the increasing map $T$ such that $T_\# f = g$, where $G^{-1}$ is the quantile function of $G$:

(2) $T = G^{-1} \circ F$.

Indeed, if $X \sim f$, one has that $F(X) \sim \mathrm{Uniform}[0,1]$. Also, if $U \sim \mathrm{Uniform}[0,1]$, then $G^{-1}(U) \sim g$. Theorem 1 is a rigorous iteration of this univariate argument by repeatedly conditioning (a construction popularly known as the Knothe–Rosenblatt transformation [30; 18]). Note that the increasing property is essential for claiming the uniqueness of $T$.¹

¹For instance, if $f$ is symmetric around the origin, then both $x \mapsto x$ and $x \mapsto -x$ push $f$ to itself.
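The increasing rearrangement is easy to verify numerically. The sketch below is our own illustration (the standard normal source and unit-rate exponential target are arbitrary choices, not from the paper): it builds $T = G^{-1} \circ F$ from scipy's cdf and quantile routines and checks that pushed-forward samples match the target's first two moments.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Source f: standard normal (cdf F); target g: Exp(1) (quantile function G^{-1}).
source, target = stats.norm(), stats.expon()

def T(x):
    """Increasing rearrangement T = G^{-1} o F."""
    return target.ppf(source.cdf(x))

# Push source samples forward; the result should be distributed as the target.
x = source.rvs(size=100_000, random_state=rng)
y = T(x)

# Exp(1) has mean 1 and variance 1.
print(np.mean(y), np.var(y))
```

Monotonicity of $T$ is immediate here, since both $F$ and $G^{-1}$ are increasing.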
Thus, triangular mappings constitute an appealing function class with which to learn a target density. Indeed, many recent generative models in unsupervised machine learning are precisely special cases of this approach [15]. In this paper, we characterize the properties of the increasing triangular mappings required to learn a target density $g$ that is heavy-tailed from a source density $f$.
3 Properties of Univariate Transformations
The increasing rearrangement is the unique increasing transformation between two univariate densities (cf. Example 1). Conveniently, we can analyze the slope of this transformation analytically. For a probability density $f$ over a domain $\mathcal{X} \subseteq \mathbb{R}$, let $F$ denote the cumulative distribution function of $f$, let $Q$ be the quantile function given by $Q(u) = F^{-1}(u) = \inf\{x : F(x) \geq u\}$, and let $q$ be the density quantile function with the functional form $q(u) = f(Q(u))$. It is further given by the reciprocal of the derivative of the quantile function, i.e. $q(u) = 1/Q'(u)$. The slope of $T = G^{-1} \circ F$ such that $T_\# f = g$, where $f, g$ are two densities, is given by the ratio of the density quantile functions of the source and the target distribution, respectively, i.e.

(3) $T'(Q_f(u)) = \dfrac{q_f(u)}{q_g(u)}$.
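The slope formula can be checked directly. In the sketch below (our own illustration with arbitrarily chosen normal source and exponential target), a finite-difference derivative of $T = G^{-1} \circ F$ is compared against the ratio of density quantile functions.

```python
import numpy as np
from scipy import stats

source, target = stats.norm(), stats.expon()
u = np.linspace(0.05, 0.95, 10)            # interior quantile levels

# Density quantile functions q(u) = f(Q(u)) of source and target.
q_src = source.pdf(source.ppf(u))
q_tgt = target.pdf(target.ppf(u))

# Slope of T = G^{-1} o F at x = Q_f(u), by central finite differences.
T = lambda x: target.ppf(source.cdf(x))
x, h = source.ppf(u), 1e-6
slope_fd = (T(x + h) - T(x - h)) / (2 * h)

print(np.max(np.abs(slope_fd - q_src / q_tgt)))  # agreement up to FD error
```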
Theorem 2.
Let $f$ and $g$ be two densities and $T$ be an increasing map such that $T_\# f = g$. If the density quantile of $f$ shrinks to 0 at a rate slower than the density quantile of $g$, then the slope of $T$ is asymptotically unbounded.
Clearly, the density quantile functions precisely characterize the slope of an increasing transformation. Moreover, we can further characterise the asymptotic properties of an increasing transformation using the asymptotics of the density quantiles of distributions, following [27; 1], who proved that the limiting behaviour of any density quantile function as $u \to 1$ (corresponding to the right tail) is:

(4) $q(u) \sim A(1-u)^{\alpha}$ as $u \to 1$,

where $A$ is a finite constant and $\sim$ denotes asymptotic equivalence.
Example 2.
Let $f$ be the standard normal density and $g$ the standard Cauchy density. Then, $T$ such that $T_\# f = g$ is given by:

(5) $T(x) = \tan\big(\tfrac{\pi}{2}\,\mathrm{erf}(x/\sqrt{2})\big)$,

where $\mathrm{erf}$ is the error function. Furthermore, $q_f(u) \sim (1-u)\sqrt{2\log\tfrac{1}{1-u}}$ and $q_g(u) \sim \pi(1-u)^{2}$, and hence, $T'(Q_f(u)) = q_f(u)/q_g(u) \to \infty$ as $u \to 1$. Similarly, for the standard exponential target density $g$:

(6) $T(x) = -\log\big(1 - \Phi(x)\big)$, where $\Phi$ is the standard normal cdf,

and $q_g(u) = 1-u$. Therefore, $T'(Q_f(u)) = \sqrt{2\log\tfrac{1}{1-u}} \to \infty$.
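The tail asymptotics of density quantile functions can be probed numerically. The sketch below (our own computation, not from the paper) checks the medium-tail form $q(u) \sim A(1-u)[\log\frac{1}{1-u}]^{1-1/\nu}$ with $\nu = 2$ for the normal and $\nu = 1$ for the exponential distribution; the limiting constant $\sqrt{2}$ for the normal is our own calculation.

```python
import numpy as np
from scipy import stats

u = 1 - np.logspace(-2, -12, 6)   # quantile levels approaching 1

# Density quantile q(u) = f(Q(u)) for the normal and exponential distributions.
q_norm = stats.norm.pdf(stats.norm.ppf(u))
q_expo = stats.expon.pdf(stats.expon.ppf(u))

L = np.log(1.0 / (1.0 - u))
# q(u) / ((1-u) L^{1 - 1/nu}) should flatten to a constant A as u -> 1.
ratio_norm = q_norm / ((1 - u) * L ** 0.5)   # nu = 2: approaches sqrt(2)
ratio_expo = q_expo / (1 - u)                # nu = 1: exactly 1 for Exp(1)

print(ratio_norm)
print(ratio_expo)
```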
Additionally, we can also describe the limiting behaviour of the quantile function as $u \to 1$: since $Q'(u) = 1/q(u)$,

(7) $Q'(u) \sim A^{-1}(1-u)^{-\alpha}$ as $u \to 1$.

The parameter $\alpha$ is called the tail exponent and defines the (right) tail area of a distribution. Indeed, given two distributions with tail exponents $\alpha_1$ and $\alpha_2$, if $\alpha_1 > \alpha_2$, the first distribution has heavier tails relative to the other. The tail exponent allows us to classify distributions based on their degree of heaviness as follows:

(8) $\alpha < 1$: short-tailed; $\alpha = 1$: medium-tailed; $\alpha > 1$: long- (heavy-)tailed.
Following [27], if $\alpha < 1$, the distributions are short-tailed, e.g. the uniform distribution. Here, we further show that a distribution has support bounded from above if and only if the right density quantile function has tail exponent $\alpha < 1$.

Proposition 1.

Let $f$ be a density with $q(u) \sim A(1-u)^{\alpha}$ as $u \to 1$. Then, $\alpha < 1$ iff $Q(1) := \lim_{u \to 1} Q(u) < \infty$, i.e. $f$ has a support bounded from above.
The value $\alpha = 1$ corresponds to a family of distributions for which all higher-order moments exist. However, these distributions have relatively heavier tails than short-tailed distributions, and were termed medium-tailed distributions in [27], e.g. the normal and exponential distributions. Additionally, for $\alpha = 1$, a more refined description of the asymptotic behaviour of the density quantile function can be given in terms of the shape parameter $\nu$:

(9) $q(u) \sim A(1-u)\big[\log\tfrac{1}{1-u}\big]^{1 - 1/\nu}$ as $u \to 1$.

The parameter $\nu$ determines the degree of heaviness among medium-tailed distributions; the smaller the value of $\nu$, the heavier the tails of the distribution, e.g. the exponential distribution has $\nu = 1$, and the normal distribution has $\nu = 2$. Based on this, we can define

(10) $\nu > 1$: medium–light tailed; $\nu = 1$: medium tailed; $\nu < 1$: medium–heavy tailed.

Therefore, we have that the normal distribution is medium–light tailed and the exponential distribution is medium tailed. Finally, heavy-tailed distributions have $\alpha > 1$, e.g. the Cauchy distribution with $\alpha = 2$. We next give a precise characterization of the asymptotic properties of a diffeomorphic transformation from one distribution to the other with varying tail behaviour in the following corollary of Theorem 2:
Corollary 1.
Let $f$ be a source distribution with tail exponent $\alpha_f$, $g$ be a target distribution with tail exponent $\alpha_g$, and $T$ be an increasing transformation such that $T_\# f = g$. Then,

if $\alpha_f > \alpha_g$, the slope of $T$ converges asymptotically to 0;

if $\alpha_f = \alpha_g$, the slope of $T$ converges asymptotically to a finite constant;

if $\alpha_f < \alpha_g$, the slope of $T$ asymptotically diverges to infinity;

if $\alpha_f = \alpha_g = 1$, then, denoting the shape parameters by $\nu_f$ and $\nu_g$,

if $\nu_f > \nu_g$, the slope of $T$ diverges to infinity asymptotically;

if $\nu_f = \nu_g$, the slope of $T$ converges to a finite constant;

if $\nu_f < \nu_g$, the slope of $T$ converges to zero asymptotically.
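This classification can be probed numerically from Eq. (4): the slope of $\log q(u)$ against $\log(1-u)$ near $u = 1$ approximates the tail exponent $\alpha$. A crude sketch (our own illustration; the two quantile levels are arbitrary choices):

```python
import numpy as np
from scipy import stats

def tail_exponent(dist, eps=(1e-6, 1e-9)):
    """Crude estimate of alpha in q(u) = f(Q(u)) ~ A (1-u)^alpha as u -> 1."""
    u = 1 - np.array(eps)
    q = dist.pdf(dist.ppf(u))
    # Finite-difference slope of log q(u) versus log(1-u).
    return np.diff(np.log(q))[0] / np.diff(np.log(1 - u))[0]

for name, dist in [("uniform", stats.uniform()),   # short-tailed: alpha = 0
                   ("normal", stats.norm()),       # medium-tailed: alpha = 1
                   ("cauchy", stats.cauchy())]:    # heavy-tailed: alpha = 2
    print(name, round(tail_exponent(dist), 2))
```

The logarithmic correction of Eq. (9) biases the normal estimate slightly below 1; the bias vanishes as the quantile levels approach 1.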

Let us give another example to underscore the importance of using density quantiles to define tail behaviour and to analyze increasing pushforward transformations.
Example 3 (Pushing uniform to normal).
Let $f$ be uniform over $[0,1]$ and $g$ be the standard normal density. The unique increasing transformation is

(11) $T(u) = \sqrt{2}\,\mathrm{erf}^{-1}(2u-1) = \sqrt{2}\sum_{k=0}^{\infty}\dfrac{c_k}{2k+1}\Big(\dfrac{\sqrt{\pi}}{2}(2u-1)\Big)^{2k+1}$,

where $\mathrm{erf}$ is the error function, which was Taylor expanded in the last equality. The coefficients are $c_0 = 1$ and $c_k = \sum_{m=0}^{k-1}\frac{c_m\,c_{k-1-m}}{(m+1)(2m+1)}$. We observe that the derivative of $T$ is an infinite sum of squares of polynomials. Both the uniform and the normal distribution are considered “light-tailed” (all their higher moments exist and are finite). However, an increasing transformation from the uniform to the normal distribution has unbounded slope. Density quantile functions help us to reveal this precisely: $q_f(u) = 1$, whereas $q_g(u) \sim (1-u)\sqrt{2\log\tfrac{1}{1-u}}$, i.e. the normal distribution is “relatively” heavier tailed than the uniform distribution, explaining the asymptotic divergence of this transformation. Indeed, the density quantiles provide a more granular definition of heavy-tailedness based on the tail exponent $\alpha$ and the shape parameter $\nu$.
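A short check of this example (our own sketch): the map $T(u) = \sqrt{2}\,\mathrm{erf}^{-1}(2u-1)$ has slope $T'(u) = \sqrt{2\pi}\,e^{T(u)^2/2}$, which diverges as $u \to 1$ even though both densities are light-tailed.

```python
import numpy as np
from scipy.special import erfinv

# Unique increasing map from Uniform(0,1) to N(0,1).
T = lambda u: np.sqrt(2.0) * erfinv(2.0 * u - 1.0)

# Slope T'(u) = 1 / phi(T(u)) = sqrt(2*pi) * exp(T(u)^2 / 2): the ratio of
# the density quantiles q_f(u) = 1 (uniform) and q_g(u) (normal).
u = 1 - np.logspace(-1, -12, 12)
slope = np.sqrt(2 * np.pi) * np.exp(T(u) ** 2 / 2)
print(slope)   # grows without bound as u -> 1
```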
Given a random variable $X \sim f$, the expected value of a function $\phi$ can be written in terms of the quantile function as $\mathbb{E}[\phi(X)] = \int_0^1 \phi(Q(u))\,du$. This allows us to draw a precise connection between the degree of heavy-tailedness of a distribution, as given by the density quantile function (and tail exponent $\alpha$), and the existence of its higher-order moments.
Theorem 3.
Let $f$ be a distribution with $q(u) \sim A(1-u)^{\alpha}$ as $u \to 1$. Then, $\mathbb{E}[X^p]$ exists and is finite for $p > 0$ iff $\alpha < 1 + 1/p$.
Corollary 2.
If $f$ is a distribution with $q(u) \sim A(1-u)^{\alpha}$ as $u \to 1$ and $q(u) \sim A'u^{\alpha}$ as $u \to 0$,² then $\mathbb{E}[X^p]$ exists and is finite iff $\alpha < 1 + 1/p$.

²This condition takes the left tail into account as well. Note that it is not necessary for both tails to have the same behaviour, and our analysis extends easily to such cases.
Based on these observations, we can equivalently define heavytailed distributions as follows:
Definition 1.
A distribution with compact support, i.e. supported on $[a, b]$ where $a > -\infty$ and $b < \infty$, is said to be $p_0$-heavy with $p_0 = \infty$: for all $p < \infty$, $\mathbb{E}[|X|^p]$ exists and is finite.
Definition 2.
A distribution with tail exponent $\alpha > 1$ is said to be heavy tailed if for all $p < \frac{1}{\alpha - 1}$, $\mathbb{E}[|X|^p]$ exists and is finite, but for $p > \frac{1}{\alpha - 1}$, $\mathbb{E}[|X|^p]$ is infinite or does not exist.
Definition 3 (heavy tailed distributions).
A distribution with tail exponent $\alpha > 1$ is heavy tailed with degree $p_0 = \frac{1}{\alpha - 1}$ ($p_0$-heavy) if for all $p < p_0$, $\mathbb{E}[|X|^p]$ exists and is finite, but for all $p > p_0$, $\mathbb{E}[|X|^p]$ is infinite or does not exist.
These definitions allow us to finally give the rate at which an increasing transformation must grow in order to exactly represent the tail properties of a target density given some source density. Concretely,
Theorem 4.
Let $f$ be a $p_f$-heavy distribution, $g$ be a $p_g$-heavy distribution and $T$ be a diffeomorphism such that $T_\# f = g$. Then for small $\epsilon > 0$, $T(x) = \Omega\big(x^{(p_f - \epsilon)/(p_g + \epsilon)}\big)$ as $x \to \infty$.
4 Properties of Multivariate Transformations
We recall that there exists a unique bijective increasing triangular map $T: \mathbb{R}^d \to \mathbb{R}^d$ that transforms a high-dimensional joint source density $f$ to a target density $g$. The $j$-th component of $T$ is given by $T_j = G_j^{-1} \circ F_j$, where $F_j$ is the cdf of the conditional distribution of $x_j$ given $x_1, \dots, x_{j-1}$ under the source and $G_j$ is the corresponding conditional cdf under the target. Analogous to our results in §3, we shall characterise the properties of $T$ by studying the properties of its components $T_j$ required to push $f$ to $g$ with varying tail properties. Evidently, for a triangular transformation $T$, the determinant of the Jacobian, i.e. $\det J_T$, is just the product of the diagonal entries, where each diagonal entry is given by $\partial T_j / \partial x_j$.

Hence, by being able to characterize the properties of the conditional density quantiles, we shall be able to characterize the properties of $T$. However, we first define the notion of tail behaviour in multivariate distributions: a multivariate distribution is heavy-tailed if the marginal distributions in every direction on the (high-dimensional) sphere are heavy-tailed, i.e. a random vector $X$ is said to be heavy-tailed if for all vectors $w \in \mathbb{R}^d$ with $\|w\| = 1$, the projection $w^{\top}X$ is heavy-tailed. This definition automatically implies that each univariate marginal $X_j$ is heavy-tailed. In particular, we will consider the class of elliptical distributions since they admit the same tail behaviour in every direction.
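As a concrete instance of the conditional-quantile construction, the Knothe–Rosenblatt map between two Gaussians is available in closed form. The sketch below (our own example, with correlation 0.8) verifies it by sampling; note that both diagonal Jacobian entries are bounded constants, consistent with both densities being light-tailed.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8

# Knothe-Rosenblatt map from N(0, I_2) to N(0, [[1, rho], [rho, 1]]):
#   T1(x1) = x1,   T2(x1, x2) = rho * x1 + sqrt(1 - rho^2) * x2.
# T2 is G_2^{-1} o F_2, where F_2 and G_2 are the conditional cdfs.
def T(x):
    y1 = x[:, 0]
    y2 = rho * x[:, 0] + np.sqrt(1.0 - rho ** 2) * x[:, 1]
    return np.stack([y1, y2], axis=1)

x = rng.standard_normal((200_000, 2))
y = T(x)
print(np.cov(y.T))   # close to [[1, 0.8], [0.8, 1]]
```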
Definition 4 (Elliptical distribution, [4]).
A random vector $X \in \mathbb{R}^d$ is said to be elliptically distributed, denoted by $X \sim \mathcal{E}_d(\mu, \Sigma, F_R)$ with $\Sigma = AA^{\top}$, if and only if there exist $\mu \in \mathbb{R}^d$, a matrix $A \in \mathbb{R}^{d \times k}$ with maximal rank $k$, and a non-negative random variable $R$, such that $X = \mu + R\,A\,U$, where the random vector $U$ is independent of $R$ and is uniformly distributed over the unit sphere $\mathbb{S}^{k-1}$, and $F_R$ is the cumulative distribution function of the generating variate $R$.
For ease of developing our results, we consider only full-rank elliptical distributions, i.e. $k = d$. The spherical random vector $R\,U$ produces elliptically contoured density surfaces due to the transformation $A$. The density function of an elliptical distribution as defined above³ is given by:

(12) $f_X(x) = \dfrac{1}{\sqrt{\det\Sigma}}\, h_d\big((x-\mu)^{\top}\Sigma^{-1}(x-\mu)\big)$,

³The density function is defined if the distribution is absolutely continuous, which happens if the generating variate $R$ is absolutely continuous.

where the function $h_d$ is related to $f_R$, the density of $R$, by the equation $f_R(r) = S_d\, r^{d-1} h_d(r^2)$; here $S_d$ is the surface area of the unit sphere $\mathbb{S}^{d-1}$. Thus, the tail properties of a random variable with an elliptical distribution, i.e. $X \sim \mathcal{E}_d(\mu, \Sigma, F_R)$, are determined by the generating random variable $R$. Indeed, $X$ is heavy-tailed in all directions if the univariate generating variate $R$ is heavy-tailed. Define
(13) $m_p := \mathbb{E}[R^p]$.

Intuitively, $m_p$ is the $p$-th order moment of $R$, and it controls the moments of $X$ when $p$ is integer-valued. We can now generalize Definition 3 to the multivariate case: the distribution of $X$ is $p_0$-heavy iff $m_p$ is finite for all $p < p_0$ iff $R$ is $p_0$-heavy. Similarly, from Definition 2 one has that $X$ is heavy iff $R$ is heavy.
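The stochastic representation of Definition 4 gives a direct sampler, which also illustrates how the generating variate $R$ controls the tails. In the sketch below (an illustration with our own choices of $A$ and generating variates), $R \sim \chi_2$ reproduces a Gaussian, while a Pareto-type $R$ yields heavy tails in every direction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_elliptical(mu, A, sample_R, n):
    """Draw n samples of X = mu + R * A U, with U uniform on the unit sphere."""
    d = len(mu)
    U = rng.standard_normal((n, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform on S^{d-1}
    R = sample_R(n)                                 # generating variate
    return mu + R[:, None] * U @ A.T

mu = np.zeros(2)
A = np.array([[1.0, 0.0], [0.5, 1.0]])

# R ~ chi(2) makes X Gaussian with covariance A A^T; a Pareto-type R
# (tail index 2) makes every one-dimensional projection heavy-tailed.
X_light = sample_elliptical(mu, A, lambda n: np.sqrt(rng.chisquare(2, n)), 50_000)
X_heavy = sample_elliptical(mu, A, lambda n: rng.pareto(2.0, n) + 1.0, 50_000)

def excess_kurtosis(p):
    return ((p - p.mean()) ** 4).mean() / p.var() ** 2 - 3.0

w = np.array([1.0, 1.0]) / np.sqrt(2.0)   # an arbitrary direction
print(excess_kurtosis(X_light @ w), excess_kurtosis(X_heavy @ w))
```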
Elliptical distributions have certain convenient properties: an affinely transformed elliptical random vector is again elliptical. Let $X \sim \mathcal{E}_d(\mu, \Sigma, F_R)$, and let $B \in \mathbb{R}^{m \times d}$ and $b \in \mathbb{R}^m$. Consider the transformed vector $Y = b + BX$. Then, $Y \sim \mathcal{E}_m(b + B\mu,\, B\Sigma B^{\top},\, F_R)$. In particular, if $B$ is a permutation matrix then $Y$ is also elliptically distributed and belongs to the same location-scale family as $X$. Additionally, the marginal and conditional distributions of an elliptical distribution are also elliptical.
Lemma 1 (Marginal distributions of an elliptical distribution are elliptical, [11]).
Let $X \sim \mathcal{E}_d(\mu, \Sigma, F_R)$ and partition $X$ as $X = (X_1, X_2)$ with $X_1 \in \mathbb{R}^{d_1}$ and $X_2 \in \mathbb{R}^{d_2}$, $d_1 + d_2 = d$. Let $\mu = (\mu_1, \mu_2)$ and $\Sigma = \big(\begin{smallmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{smallmatrix}\big)$ be the corresponding partitions of $\mu$ and $\Sigma$, respectively. Then, $X_1 \sim \mathcal{E}_{d_1}(\mu_1, \Sigma_{11}, F_{R_1})$, where $R_1$ is the induced generating variate of the marginal.
Lemma 2 (Conditional distributions of an elliptical distribution are elliptical, [4; 11]).
Let $X \sim \mathcal{E}_d(\mu, \Sigma, F_R)$, where $\Sigma$ is p.s.d. with rank $k \leq d$. Further, partition $X$ as $X = (X_1, X_2)$ with $X_1 \in \mathbb{R}^{d_1}$ and $X_2 \in \mathbb{R}^{d_2}$, $d_1 + d_2 = d$. Let $\mu = (\mu_1, \mu_2)$ and $\Sigma = \big(\begin{smallmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{smallmatrix}\big)$ be the corresponding partitions of $\mu$ and $\Sigma$, respectively. If the conditional random vector $X_2 \mid X_1 = x_1$ exists, then

$X_2 \mid X_1 = x_1 \sim \mathcal{E}_{d_2}(\mu^{*}, \Sigma^{*}, F_{R^{*}})$, where $\mu^{*} = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1)$ and $\Sigma^{*} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$.
In our next result, we analyze the tail properties of the conditional distributions of a heavy-tailed elliptical distribution.
Proposition 2.
Under the same assumptions as in Lemma 2, if $X$ is $p_0$-heavy, then the conditional distribution of $X_2 \mid X_1 = x_1$ is $p_0$-heavy.
We now state the main result of this section: an increasing triangular map $T$ that transforms a light-tailed elliptical distribution to a heavy-tailed elliptical distribution must have all diagonal entries of $J_T$ unbounded.
Theorem 5.
Let $Z$ and $X$ be two random variables with elliptical distributions with densities $f$ and $g$, respectively, where $g$ is heavier tailed than $f$. If $T$ is an increasing triangular map such that $T_\# f = g$, then all diagonal entries of $J_T$ are unbounded. Moreover, the determinant of the Jacobian of $T$ is also unbounded.
We next give a general result for any (not necessarily triangular) transformation.

Theorem 6.

Let $Z$ be a random variable with a light-tailed density $f$ and $X$ be a target random variable with a heavy-tailed density $g$. If $T$ pushes $f$ forward to $g$, i.e. $T_\# f = g$, then there exists an index $j$ such that $\partial T_j / \partial x_j$ is unbounded.

Proof.

We provide the proof in Appendix A. ∎
5 Triangular Flows and Approximation
Neural density estimation methods like autoregressive models [25; 2; 19; 35] and normalizing flows [29; 33; 32] provide a tractable way to evaluate the exact density and are increasingly being used for multivariate density estimation in machine learning [17; 6; 7; 26; 35; 14]. Invariably, these methods aim to learn a bijective, increasing transformation from a simple, known source density to a desired target density such that the inverse and the Jacobian of the transformation are easy to compute.
As discussed in [15], most autoregressive models and normalizing flows at their core implement exactly a triangular map, i.e. they learn a transformation $T$ such that $T_\# f = g$. [17] considered the affine map $T_j(x_{1:j}) = \mu_j(x_{1:j-1}) + \sigma_j(x_{1:j-1})\,x_j$. [14] alternatively replaced the affine form of [17] with a univariate neural network, and [15] proposed to use the primitive of a univariate sum-of-squares of polynomials as the approximation of an increasing function. [13] and [26] proposed efficient implementations of these methods based on affine maps using binary masks that compute all the parameters of the transformation in a single pass of the network. Interestingly, all these methods compose several triangular maps in the hope that this composition of functions is “complex” enough to approximate any generic triangular map.

Here, we argue that there are two ways to learn a target density $g$: first, as we discussed in §§3 and 4, we can choose an appropriate base density $f$ such that the resulting triangular transformation from $f$ to $g$ can be represented using simpler triangular transformations that are Lipschitz continuous; or, we can choose a base density from a simple class of distributions (say, a Gaussian with identity covariance) and learn a “complex” triangular transformation via a composition of several triangular transformations. However, we note here that composing several triangular maps is essentially tantamount to converting the source density into a more complex base density, such that the final map in the composition transforms this base density to the target. We propose an alternative that can allow for simpler transformations to the target density by considering a more flexible class of source densities than the Gaussian distribution. One way would be to parametrize the source density as an elliptical distribution where the generating variate is from a Student-t distribution with $\nu$ degrees of freedom, where $\nu$ is a parameter to be learned along with the parameters of the transformation. It is evident from our exposition in §4 that such a model would only require a Lipschitz continuous triangular map when learning heavy-tailed distributions.
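The proposal above can be sketched in one dimension: restrict the flow to an affine (hence Lipschitz) map of a Student-t base and learn the degrees of freedom $\nu$ by maximum likelihood. Everything below is a hypothetical illustration; the data, the optimizer, and the parametrization are our own choices, not the paper's experimental setup.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)

# Heavy-tailed "target" data: Student-t with 3 degrees of freedom.
data = stats.t(df=3).rvs(size=20_000, random_state=rng)

def neg_log_lik(params):
    """NLL of x = a * z + b with z ~ t(nu); a and nu kept positive via exp."""
    log_a, b, log_nu = params
    a, nu = np.exp(log_a), np.exp(log_nu)
    z = (data - b) / a                       # inverse of the affine map
    return -(stats.t(df=nu).logpdf(z) - np.log(a)).sum()

res = optimize.minimize(neg_log_lik, x0=[0.0, 0.0, np.log(10.0)],
                        method="Nelder-Mead")
a, b, nu = np.exp(res.x[0]), res.x[1], np.exp(res.x[2])
print(round(nu, 1))   # the learned df should land near the true value 3
```

With a fixed Gaussian base, no Lipschitz map could match these tails (Theorem 5); with a learnable-$\nu$ base, an affine map suffices.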
A related question is how well one can approximate a target distribution $\nu$ with a pushforward $T_\#\mu$, where $\nu$ is heavy-tailed, $\mu$ is light-tailed, but $T$ is not flexible enough to push the tails, so that $\nu$ is heavier than $T_\#\mu$. One can consider several similarity metrics for this task. Let us start with the Wasserstein distance, most natural for flow theory: we wish to find a lower bound on the approximation error $W_p(T_\#\mu, \nu) = \big(\inf_{\pi \in \Pi} \int \|x - y\|^p\,d\pi(x, y)\big)^{1/p}$, where $\Pi$ is the set of all measures on the product space with marginals $T_\#\mu$ and $\nu$ on the first and the second factors. Here we have two situations. First, assume that $\nu$ does not have the $p$-th moment. Then, because $T_\#\mu$ is lighter than $\nu$, $T_\#\mu$ does have the $p$-th moment. And because $\nu$ does not have the $p$-th moment, $W_p(T_\#\mu, \nu) = \infty$. The only possibility to have a finite distance in this case is for $T_\#\mu$ to match the tail of $\nu$ exactly, and then the distance is zero. Alternatively, assume that $\nu$ has the $p$-th moment. Then $W_p$ is finite. The measure $\nu$ is Radon (as a finite measure on a second-countable space). Because the set of finitely-supported Radon measures is dense in the metric space of Radon measures with the $W_p$ distance [22], one can approximate $\nu$ arbitrarily well with a finitely-supported Radon measure. Hence, varying $T$, one can find $T_\#\mu$ arbitrarily close to $\nu$.
One can carry out a similar analysis with the KL divergence. The existence of the integral depends on the tail behaviour of both distributions, among other properties.⁴ However, if the integral exists and is finite, one can write it as an integral over a compact set plus an integral over the tails, and make the latter as small as desired by simply enlarging the compact set. Hence, tail heaviness determines the possibility of approximation. In the case when the target distribution has very heavy tails, the approximation problem reduces to a representation problem, and one needs a flexible enough transformation in order to make $T_\#\mu$ as heavy as $\nu$.

⁴For example, topological properties: for the KL divergence, the support of $\nu$ must be contained in the support of $T_\#\mu$.
6 Conclusion
We studied the properties of triangular flows for capturing heavy-tailed distributions. We showed that density quantile functions play a central role in characterising the properties of increasing pushforward maps. Subsequently, we proved that for a triangular flow, all eigenvalues of the Jacobian are unbounded when pushing a light-tailed distribution to a heavy-tailed distribution. We revealed properties of quantile and density quantile functions and related them to both the existence of functional moments and the heavy-tailedness of a distribution, which can be of independent interest. As a byproduct of our analysis, we demonstrated the trade-off between the complexity of the source distribution and the expressiveness of the transformation in capturing target densities in generative models. This work opens up multiple future directions: an interesting line of research will be to conduct holistic experiments to systematically analyze our results, for example by considering flexible source distributions with parameters that can be trained along with the model. Another direction is to analyze general flows that are non-triangular. Further, applying these insights to real-world problems in finance, insurance and networks might also be interesting.
References
 [1] DF Andrews et al. A general method for the approximation of tail areas. The Annals of Statistics, 1(2):367–372, 1973.
 [2] Yoshua Bengio and Samy Bengio. Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks. In NeurIPS, 1999.
 [3] Vladimir Igorevich Bogachev, Aleksandr Viktorovich Kolesnikov, and Kirill Vladimirovich Medvedev. Triangular transformations of measures. Sbornik: Mathematics, 196(3):309–335, 2005.

 [4] Stamatis Cambanis, Steel Huang, and Gordon Simons. On the theory of elliptically contoured distributions. Journal of Multivariate Analysis, 11(3):368–385, 1981.
 [5] Stuart Coles, Joanna Bawa, Lesley Trenner, and Pat Dorazio. An introduction to statistical modeling of extreme values, volume 208. Springer, 2001.
 [6] Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. In ICLR workshop, 2015.
 [7] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. In ICLR, 2017.
 [8] Diane E Duffy, Allen A McIntosh, Mark Rosenstein, and Walter Willinger. Analyzing telecommunications traffic data from working common channel signaling subnetworks. Computing Science and Statistics, pages 156–156, 1993.
 [9] Ernst Eberlein, Ulrich Keller, et al. Hyperbolic distributions in finance. Bernoulli, 1(3):281–299, 1995.
 [10] Paul Embrechts, Claudia Klüppelberg, and Thomas Mikosch. Modelling extremal events: for insurance and finance, volume 33. Springer Science & Business Media, 2013.
 [11] Gabriel Frahm. Generalized elliptical distributions: theory and applications. PhD thesis, Universität zu Köln, 2004.
 [12] Gabriel Frahm, Markus Junker, and Alexander Szimayer. Elliptical copulas: applicability and limitations. Statistics & Probability Letters, 63(3):275–286, 2003.
 [13] Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE: Masked autoencoder for distribution estimation. In ICML, pages 881–889, 2015.
 [14] Chin-Wei Huang, David Krueger, Alexandre Lacoste, and Aaron Courville. Neural Autoregressive Flows. In ICML, 2018.
 [15] Priyank Jaini, Kira A Selby, and Yaoliang Yu. SumofSquares Polynomial Flow. International Conference of Machine Learning (ICML), 2019.
 [16] Markus Junker and Angelika May. Measurement of aggregate risk with copulas. The Econometrics Journal, 8(3):428–454, 2005.
 [17] Diederik P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improved variational inference with inverse autoregressive flow. In NeurIPS, pages 4743–4751, 2016.
 [18] Herbert Knothe. Contributions to the theory of convex bodies. The Michigan Mathematical Journal, 4(1):39–52, 1957.
 [19] Hugo Larochelle and Iain Murray. The neural autoregressive distribution estimator. In AIStats, pages 29–37, 2011.
 [20] W Leland. Statistical analysis of high time resolution ethernet lan traffic measurements. Proc. 25th Interface, 1993.
 [21] Yannick Malevergne and Didier Sornette. Extreme financial risks: From dependence to risk management. Springer Science & Business Media, 2006.
 [22] R Mardare, P Panangaden, and G Plotkin. Free complete Wasserstein algebras. Logical Methods in Computer Science, 14(3), 2018.
 [23] Harry Markowitz. Portfolio selection. The journal of finance, 7(1):77–91, 1952.
 [24] Krishanu Maulik, Sidney Resnick, and Holger Rootzén. Asymptotic independence and a network traffic model. Journal of Applied Probability, 39(4):671–699, 2002.
 [25] Radford M. Neal. Connectionist learning of belief networks. Artificial Intelligence, 56(1):71–113, 1992.
 [26] George Papamakarios, Theo Pavlakou, and Iain Murray. Masked autoregressive flow for density estimation. In NeurIPS, pages 2338–2347, 2017.
 [27] Emanuel Parzen. Nonparametric Statistical Data Modeling. Journal of the American statistical association, 74(365):105–121, 1979.
 [28] Sidney I Resnick. Heavytail phenomena: probabilistic and statistical modeling. Springer Science & Business Media, 2007.
 [29] Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. In ICML, 2015.
 [30] Murray Rosenblatt. Remarks on a multivariate transformation. The annals of mathematical statistics, 23(3):470–472, 1952.
 [31] William F Sharpe. A simplified model for portfolio analysis. Management science, 9(2):277–293, 1963.
 [32] Esteban G. Tabak and Cristina V. Turner. A family of nonparametric density estimation algorithms. Communications on Pure and Applied Mathematics, 66(2):145–164, 2013.
 [33] Esteban G. Tabak and Eric VandenEijnden. Density estimation by dual ascent of the loglikelihood. Communications in Mathematical Sciences, 8(1):217–233, 2010.
 [34] James Tobin. Liquidity preference as behavior towards risk. The review of economic studies, 25(2):65–86, 1958.
 [35] Benigno Uria, MarcAlexandre Côté, Karol Gregor, Iain Murray, and Hugo Larochelle. Neural autoregressive distribution estimation. The Journal of Machine Learning Research, 17(1):7184–7220, 2016.
 [36] Cédric Villani. Optimal Transport: Old and New, volume 338. Springer, 2008.
Appendix A Proofs
Proposition 1 (restated).
Proof.
Let $u_0 \in (0, 1)$. Then,

(14) $Q(1) - Q(u_0) = \int_{u_0}^{1} Q'(u)\,du$

(15) $= \int_{u_0}^{1} \frac{du}{q(u)}$

(16) $\sim A^{-1} \int_{u_0}^{1} (1-u)^{-\alpha}\,du$, which is finite iff $\alpha < 1$.
A similar argument proves the reverse direction. ∎
Theorem 3 (restated).
Proof.
(17) $\mathbb{E}[X^p] = \int_0^1 Q(u)^p\,du$

(18) $= \int_0^{u_0} Q(u)^p\,du + \int_{u_0}^{1} Q(u)^p\,du$

The first integral is finite because the integrand is non-singular. For the second integral, we can use the asymptotic behaviour of the quantile function, $Q(u) \sim C(1-u)^{1-\alpha}$, by choosing $u_0$ very close to 1. Subsequently, the integral exists and converges if and only if $p(1-\alpha) > -1$, i.e. $\alpha < 1 + 1/p$. ∎
Theorem 4 (restated).
Proof.
The integral

(19) $\mathbb{E}_f\big[T(X)^p\big] = \int T(x)^p f(x)\,dx$

(20) $= \mathbb{E}_g\big[Y^p\big]$

converges for $p < p_g$, because $g$ is $p_g$-heavy. Because $T$ is a univariate diffeomorphism, it is a strictly monotone function. Without loss of generality, let us consider $T$ to be a positive increasing function and investigate the right tail asymptotics. Consider this function for large positive $x$. Assume there is a sequence $x_n$ such that $x_n \to \infty$ and the corresponding sequence of values does not converge to zero. In other words, there exists $\delta > 0$ such that for any $N$ there exists $n > N$ such that the value at $x_n$ exceeds $\delta$. Let us work with this infinite subsequence. Because $T$ is an increasing function, we can estimate its integral from the left by its left Riemann sum with respect to the sequence of points $x_n$:
Since $g$ is $p_g$-heavy, the series on the right-hand side diverges as a left Riemann sum of a divergent integral. But this contradicts the convergence of the integral on the left-hand side. Hence, our assumption was wrong, and for all such sequences the values converge to zero. This leads to the desired growth rate of $T(x)$. ∎
Proposition 2 (restated).
Proof.
The density function of the conditional distribution is proportional to $h_d\big(c + (x_2 - \mu^{*})^{\top}\Sigma^{*\,-1}(x_2 - \mu^{*})\big)$, where $c = (x_1 - \mu_1)^{\top}\Sigma_{11}^{-1}(x_1 - \mu_1)$ and $h_d$ is the same function as for the distribution of $X$ (see [4]). Then, because it is a $d_2$-dimensional elliptical distribution, it is $p_0$-heavy iff the moments $m_p$ of its generating variate are finite for all $p < p_0$. It is given that $X$ is $p_0$-heavy, which is equivalent to $m_p(R) < \infty$ for all $p < p_0$. Because the argument of $h_d$ is only shifted by the constant $c$, one gets that the tail of the conditional generating variate matches that of $R$; hence the conditional distribution is $p_0$-heavy. ∎
Theorem 5 (restated).
Proof.
We need to show that

(21) all diagonal entries $\partial T_j / \partial x_j$ of $J_T$ are unbounded.

Thus, all we need to show is that the generating variate of the conditional distribution for the target is heavier than the generating variate of the conditional distribution of the source. From §3, we know that the tail exponent in the asymptotics of the density quantile function characterizes the degree of heaviness. Furthermore, we also know that the asymptotic behaviour of the density quantile function is directly related to the asymptotic behaviour of the density function, since if $f$ is a density function, the cdf is given by $F(x) = \int_{-\infty}^{x} f(t)\,dt$, the quantile function therefore is $Q = F^{-1}$, and the density quantile function is the reciprocal of the derivative of the quantile function, i.e. $q(u) = 1/Q'(u) = f(Q(u))$. Hence, we need to ensure that, asymptotically, the density of the conditional generating variate of the target is heavier than that of the source. Using the result for the cdf of a conditional distribution as given by Eq. (15) in [4], we have that asymptotically
(22) 
where $d_1$ is the dimension of the partition that is being conditioned upon. Since $g$ is heavier tailed than $f$, we have that the conditional generating variate of $g$ is heavier tailed than that of $f$ for all the conditional distributions. ∎
Theorem 6 (restated).
Proof.
We will prove this by contradiction: assume that all entries of $J_T$ are bounded, and assume for simplicity that $T$ is increasing in each coordinate. Therefore, we have
(23)  
(24) 
Since $g$ is heavy-tailed, there exists $p_0 < \infty$ such that
(25)  
(26) 
We have
(27)  
(28)  
(29)  
(30)  
(31) 
Partition into sets , i.e. such that if , and , then there exists at least one index such that . Subsequently, we can rewrite the integral above as