# Statistical properties of large data sets with linear latent features

Analytical understanding of how low-dimensional latent features reveal themselves in large-dimensional data is still lacking. We study this by defining a linear latent feature model with additive noise constructed from probabilistic matrices, and analytically and numerically computing the statistical distributions of pairwise correlations and eigenvalues of the correlation matrix. This allows us to resolve the latent feature structure across a wide range of data regimes set by the number of recorded variables, observations, latent features and the signal-to-noise ratio. We find a characteristic imprint of latent features in the distribution of correlations and eigenvalues and provide an analytic estimate for the boundary between signal and noise even in the absence of a clear spectral gap.


## References

• [1] Sum of i.i.d. Beta-distributed variables. Mathematics Stack Exchange, https://math.stackexchange.com/q/3096929 (version: 2019-02-02).
• [2] G. J. Berman, D. M. Choi, W. Bialek, and J. W. Shaevitz (2014) Mapping the stereotyped behaviour of freely moving fruit flies. Journal of The Royal Society Interface 11 (99), pp. 20140672.
• [3] M. Capitaine and C. Donati-Martin (2016) Spectrum of deformed random matrices and free probability. arXiv:1607.05560.
• [4] A. I. Dell, J. A. Bender, K. Branson, I. D. Couzin, G. G. de Polavieja, L. P. J. J. Noldus, A. Pérez-Escudero, P. Perona, A. D. Straw, M. Wikelski, and U. Brose (2014) Automated image-based tracking and its application in ecology. Trends in Ecology & Evolution 29 (7), pp. 417–428.
• [5] J. A. Gallego, M. G. Perich, L. E. Miller, and S. A. Solla (2017) Neural manifolds for the control of movement. Neuron 94 (5), pp. 978–984.
• [6] H. Hotelling (1953) New light on the correlation coefficient and its transforms. Journal of the Royal Statistical Society. Series B (Methodological) 15 (2), pp. 193–232.
• [7] C. Killer, T. Bockwoldt, S. Schütt, M. Himpel, A. Melzer, and A. Piel (2016) Phase separation of binary charged particle systems with small size disparities using a dusty plasma. Physical Review Letters 116, pp. 115002.
• [8] P. Loubaton and P. Vallet (2011) Almost sure localization of the eigenvalues in a Gaussian information plus noise model. Application to the spiked models. Electronic Journal of Probability 16, pp. 1934–1959.
• [9] B. Lusch, J. N. Kutz, and S. L. Brunton (2018) Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 9 (1), pp. 1–10.
• [10] V. A. Marčenko and L. A. Pastur (1967) Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik 1 (4), pp. 457–483.
• [11] L. Meshulam, J. L. Gauthier, C. D. Brody, D. W. Tank, and W. Bialek (2017) Collective behavior of place and non-place neurons in the hippocampal network. Neuron 96 (5), pp. 1178–1191.e4.
• [12] M. C. Morrell, A. J. Sederberg, and I. Nemenman (2021) Latent dynamical variables produce signatures of spatiotemporal criticality in large biological systems. Physical Review Letters 126 (11), pp. 118302.
• [13] J. L. Natale, D. Hofmann, D. G. Hernández, and I. Nemenman (2018) Reverse-engineering biological networks from large data sets. In Quantitative Biology: Theory, Computational Methods and Examples of Models, B. Munsky, L. Tsimring, and W. S. Hlavacek (Eds.).
• [14] E. H. Nieh, M. Schottdorf, N. W. Freeman, R. J. Low, S. Lewallen, S. A. Koay, L. Pinto, J. L. Gauthier, C. D. Brody, and D. W. Tank (2021) Geometry of abstract learned knowledge in the hippocampus. Nature 595, pp. 80–84.
• [15] NOAA Physical Sciences Laboratory. Gridded climate data. https://psl.noaa.gov/data/gridded/. Accessed: 2021-06-30.
• [16] J. Page, M. P. Brenner, and R. R. Kerswell (2020) Revealing the state space of turbulence using machine learning. arXiv preprint arXiv:2008.07515.
• [17] C. Pandarinath, D. J. O'Shea, J. Collins, R. Jozefowicz, S. D. Stavisky, J. C. Kao, E. M. Trautmann, M. T. Kaufman, S. I. Ryu, L. R. Hochberg, et al. (2018) Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods 15 (10), pp. 805–815.
• [18] M. Potters and J. Bouchaud (2020) A First Course in Random Matrix Theory: for Physicists, Engineers and Data Scientists. Cambridge University Press.
• [19] D. Schanz, S. Gesemann, and A. Schröder (2016) Shake-The-Box: Lagrangian particle tracking at high particle image densities. Experiments in Fluids 57, pp. 1–27.
• [20] D. J. Schwab, I. Nemenman, and P. Mehta (2014) Zipf's law and criticality in multivariate data without fine-tuning. Physical Review Letters 113 (6), pp. 068102.
• [21] A. M. Sengupta and P. P. Mitra (1999) Distributions of singular values for some random matrices. Physical Review E 60 (3), pp. 3389.
• [22] M. Sinhuber, K. van der Vaart, R. Ni, J. G. Puckett, D. H. Kelley, and N. T. Ouellette (2019) Three-dimensional time-resolved trajectories from laboratory insect swarms. Scientific Data 6 (1), pp. 1–8.
• [23] G. J. Stephens, B. Johnson-Kerner, W. Bialek, and W. S. Ryu (2008) Dimensionality and dynamics in the behavior of C. elegans. PLoS Computational Biology 4 (4), pp. e1000028.
• [24] M. T. Valentine, P. D. Kaplan, D. Thota, J. C. Crocker, T. Gisler, R. K. Prud'homme, M. Beck, and D. A. Weitz (2001) Investigating the microenvironments of inhomogeneous soft materials with multiple particle tracking. Physical Review E 64 (6), pp. 061506.
• [25] W. Weisser, C. Roscher, S. Meyer, A. Ebeling, G. Luo, E. Allan, H. Bessler, R. Barnard, N. Buchmann, F. Buscot, et al. (2017) Biodiversity effects on ecosystem functioning in a 15-year grassland experiment: patterns, mechanisms, and open questions. Basic and Applied Ecology 23, pp. 1–73.

## Appendix A Data distribution for the latent feature model with no noise, its variance and large m limit

Each entry of the latent features data matrix is given by the sum of $m$ products of two i.i.d. Gaussian random variables $u_\mu \sim \mathcal{N}(0,\sigma_U^2)$ and $v_\mu \sim \mathcal{N}(0,\sigma_V^2)$:

$$X_{ij} \sim \sum_{\mu=1}^{m} u_\mu v_\mu. \tag{25}$$

The product, $x = u_\mu v_\mu$, is distributed according to the normal product distribution [1]:

$$\mathrm{pdf}(x) = \frac{K_0\!\left(\frac{|x|}{\sigma_U\sigma_V}\right)}{\pi\sigma_U\sigma_V}, \tag{26}$$

where $K_\nu$ is the modified Bessel function of the second kind:

$$K_\nu(x) = \frac{\Gamma\!\left(\nu+\tfrac{1}{2}\right)(2x)^\nu}{\sqrt{\pi}}\int_0^{\infty} dq\,\frac{\cos(q)}{\left(x^2+q^2\right)^{\nu+1/2}}. \tag{27}$$

To derive the probability density of the latent feature model entries $X_{ij}$, we first compute the characteristic function $\varphi_x(t)$ by taking the Fourier transform of the normal product distribution. We then use the fact that the characteristic function $\varphi_X(t)$ of the sum of $m$ products is given by $\varphi_X = (\varphi_x)^m$. The inverse Fourier transform of $\varphi_X$ then yields the sought-after probability density.

Specifically, the characteristic function of the normal product distribution is

$$\begin{aligned}
\varphi_x(t) &= \mathbb{E}\!\left(e^{itx}\right) = \int_{-\infty}^{\infty} dx\,\frac{K_0\!\left(\frac{|x|}{\sigma_U\sigma_V}\right)}{\pi\sigma_U\sigma_V}\,e^{itx} = \int_{-\infty}^{\infty} dx\,\frac{K_0(|x|)}{\pi}\,e^{it\sigma_U\sigma_V x} \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty} dx\int_0^{\infty} dq\,\frac{\cos(q)}{\sqrt{x^2+q^2}}\,e^{it\sigma_U\sigma_V x}
= \frac{1}{\pi}\int_{-\infty}^{\infty} dx\int_0^{\infty} dq\,\frac{\cos(xq)}{\sqrt{1+q^2}}\,e^{it\sigma_U\sigma_V x} \\
&= \frac{1}{\pi}\int_0^{\infty}\frac{dq}{\sqrt{1+q^2}}\int_{-\infty}^{\infty}\frac{dx}{2}\left(e^{ix(q+\sigma_U\sigma_V t)}+e^{ix(\sigma_U\sigma_V t-q)}\right) \\
&= \int_0^{\infty} dq\,\frac{\delta(\sigma_U\sigma_V t+q)+\delta(\sigma_U\sigma_V t-q)}{\sqrt{1+q^2}} = \frac{1}{\sqrt{1+\sigma_U^2\sigma_V^2 t^2}},
\end{aligned}\tag{28}$$

where the second line uses the integral representation (27) with $\nu=0$, and $\delta(\cdot)$ is the Dirac delta function.

The characteristic function of the sum of $m$ products is then

$$\varphi_X = (\varphi_x)^m = \left(1+\sigma_U^2\sigma_V^2 t^2\right)^{-\frac{m}{2}}. \tag{29}$$

Finally, performing the inverse transformation, we obtain the probability density function of the sum:

$$\begin{aligned}
\mathrm{pdf}(X) &= \int_{-\infty}^{\infty}\frac{dt}{2\pi}\,\frac{e^{-itX}}{\left(1+\sigma_U^2\sigma_V^2 t^2\right)^{\frac{m}{2}}}
= \int_{-\infty}^{\infty}\frac{dt}{2\pi}\,e^{-itX}\int_0^{\infty} dq\,\frac{\delta(\sigma_U\sigma_V t+q)+\delta(\sigma_U\sigma_V t-q)}{\left(1+q^2\right)^{\frac{m}{2}}} \\
&= \frac{1}{\sigma_U\sigma_V}\int_0^{\infty} dq\int_{-\infty}^{\infty}\frac{dt}{2\pi}\,e^{-\frac{itX}{\sigma_U\sigma_V}}\,\frac{\delta(t+q)+\delta(t-q)}{\left(1+q^2\right)^{\frac{m}{2}}}
= \frac{1}{\pi\sigma_U\sigma_V}\int_0^{\infty} dq\,\frac{\cos\!\left(\frac{q|X|}{\sigma_U\sigma_V}\right)}{\left(1+q^2\right)^{\frac{m}{2}}} \\
&= \left[\frac{|X|}{\sigma_U\sigma_V}\right]^{m-1}\frac{1}{\pi\sigma_U\sigma_V}\int_0^{\infty} dq\,\frac{\cos(q)}{\left(\frac{|X|^2}{\sigma_U^2\sigma_V^2}+q^2\right)^{\frac{m}{2}}}
= \frac{\left[\frac{|X|}{2}\right]^{\frac{m-1}{2}} K_{\frac{m-1}{2}}\!\left(\frac{|X|}{\sigma_U\sigma_V}\right)}{(\sigma_U\sigma_V)^{\frac{m+1}{2}}\sqrt{\pi}\,\Gamma\!\left(\frac{m}{2}\right)}.
\end{aligned}\tag{30}$$

Since the probability density function of $X$ is symmetric around zero, the mean of the distribution vanishes:

$$\mu_{UV} = \int_{-\infty}^{\infty} dX\, X\, \mathrm{pdf}(X) = 0. \tag{31}$$

The variance is

$$\begin{aligned}
\sigma_{UV}^2 &= \int_{-\infty}^{\infty} dX\, X^2\, \mathrm{pdf}(X)
= \frac{1}{2^{\frac{m-1}{2}}\sqrt{\pi}\,\Gamma\!\left(\frac{m}{2}\right)}\int_{-\infty}^{\infty}\frac{dX}{\sigma_U\sigma_V}\left[\frac{|X|}{\sigma_U\sigma_V}\right]^{\frac{m-1}{2}} X^2\, K_{\frac{m-1}{2}}\!\left(\frac{|X|}{\sigma_U\sigma_V}\right) \\
&= \frac{2\sigma_U^2\sigma_V^2}{2^{\frac{m-1}{2}}\sqrt{\pi}\,\Gamma\!\left(\frac{m}{2}\right)}\int_0^{\infty} dX\,|X|^{\frac{m+3}{2}}\, K_{\frac{m-1}{2}}(|X|).
\end{aligned}\tag{32}$$

The integral above can be evaluated in terms of generalized hypergeometric functions [2]. We present the calculation for the case of even $m$ in detail:

$$\int_0^{\infty} dX\, X^{\alpha-1} K_\nu(X) = \Big[\Sigma(\nu,\alpha;Z)+\Sigma(-\nu,\alpha;Z)\Big]_0^{\infty}, \tag{33}$$

where

$$\Sigma(\nu,\alpha;Z) \equiv -\frac{2^{\nu-1}\pi\, Z^{\alpha-\nu}\csc(\pi\nu)}{(\nu-\alpha)\,\Gamma(1-\nu)}\; {}_1F_2\!\left(\frac{\alpha-\nu}{2};\, 1-\nu,\, \frac{\alpha-\nu}{2}+1;\, \frac{Z^2}{4}\right) \tag{34}$$

with parameters

$$\alpha \equiv \frac{m+5}{2} \quad\text{and}\quad \nu \equiv \frac{m-1}{2}, \tag{35}$$

and ${}_1F_2$ is the generalized hypergeometric function

$${}_1F_2(a_1;b_1,b_2;z) = \sum_{k=0}^{\infty}\frac{(a_1)_k\, z^k}{(b_1)_k (b_2)_k\, k!}. \tag{36}$$

In the expression above, $(a)_k$ is the Pochhammer symbol, and $\csc$ is the cosecant. Since $m$ is even, $\nu$ is half-integer valued, so that $\csc(\pi\nu)=\pm 1$. Putting everything together, we obtain the following expression for the variance:

$$\sigma_{UV}^2 = \frac{\Big[\Sigma(\nu,\alpha;Z)+\Sigma(-\nu,\alpha;Z)\Big]_0^{\infty}}{2^{\frac{m-3}{2}}\sqrt{\pi}\,\Gamma\!\left(\frac{m}{2}\right)\sigma_U^{-2}\sigma_V^{-2}} = \lim_{Z\to\infty}\frac{\Sigma(\nu,\alpha;Z)+\Sigma(-\nu,\alpha;Z)}{2^{\frac{m-3}{2}}\sqrt{\pi}\,\Gamma\!\left(\frac{m}{2}\right)\sigma_U^{-2}\sigma_V^{-2}}, \tag{37}$$

where we have used the fact that the numerator after the first equality vanishes at $Z=0$. We evaluate the limit $Z\to\infty$ on the right-hand side numerically, as shown in Fig. S1, and find that the variance of the latent feature data values is

$$\sigma_{UV}^2 = m\,\sigma_U^2\sigma_V^2. \tag{38}$$

This is in agreement with the intuition that every latent dimension contributes its own variance to the variance of the data.

We note that, for large values of the number of latent features $m$, the distribution (30) becomes normal, in agreement with the central limit theorem:

$$\mathrm{pdf}(X_{ij}) = \frac{1}{\sqrt{2\pi\sigma_{UV}^2}}\, e^{-\frac{X^2}{2\sigma_{UV}^2}}. \tag{39}$$

Crucially, the variance of $X$ remains $m$-dependent. Figure S2 compares, for fixed $m$, the exact analytical expression for the probability distribution and its Gaussian approximation to numerical simulations.

As a final note, if we were interested in the distribution of data with noise, we would need to convolve the density in Eq. (30) with the Gaussian density of the noise.
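For readers who want to probe these results numerically, here is a minimal Python sketch (ours, not part of the original derivation; it assumes NumPy and SciPy, and the parameter values are arbitrary). It draws Monte Carlo samples of $X_{ij}$ per Eq. (25), checks the variance against Eq. (38), and exposes the exact density of Eq. (30) for comparison with a histogram:

```python
import numpy as np
from scipy.special import kv, gamma

rng = np.random.default_rng(0)
m, sU, sV = 4, 1.0, 1.0          # number of latent features and std devs
a = sU * sV

# Monte Carlo samples of X = sum_mu u_mu v_mu, Eq. (25)
u = rng.normal(0.0, sU, size=(200_000, m))
v = rng.normal(0.0, sV, size=(200_000, m))
X = (u * v).sum(axis=1)

print(X.var(), m * sU**2 * sV**2)   # Eq. (38): var(X) = m * sU^2 * sV^2

def pdf(x):
    """Exact density of the sum of m normal products, Eq. (30)."""
    x = np.abs(x)
    return ((x / 2) ** ((m - 1) / 2) * kv((m - 1) / 2, x / a)
            / (a ** ((m + 1) / 2) * np.sqrt(np.pi) * gamma(m / 2)))
```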

## Appendix B Probability density of the correlation coefficients

For our latent feature model with noise, we calculate here the probability distribution of the entries of the empirical data correlation matrix. Before doing this, a few notes are in order. First, the correlations depend on the basis in which the variables are measured, becoming a diagonal matrix in the special case when the measured variables are the principal axes of the data cloud. Thus, to make statements independent of the basis, we consider the distribution of typical correlations, i.e., correlations in a basis oriented randomly with respect to the principal axes of the data. For a given realization, the $N$-dimensional data cloud is typically anisotropic, with long directions dominated by the latent feature signal and short directions dominated by noise. When $m < N$, the principal axes of the data cloud do not align with the measured variables for the vast majority of random rotations, and correlations between any random pair of variables have contributions from all latent dimensions. Thus we expect the number of latent dimensions to be imprinted in the distribution of the elements of the correlation matrix, so that the statistics of the elements carry information about the underlying structure of the model.

### b.1 Preliminaries: Density of the correlation coefficient of two random Gaussian variables

The correlation coefficient of two independent zero-mean variables $x$ and $y$, each sampled $T$ times, is

$$r = \frac{\frac{1}{T}\sum_t x_t y_t}{\sigma_x\sigma_y}, \tag{40}$$

where the vectors’ components are mutually independent, i.i.d. random variables. The correlation coefficient is distributed according to [6]

$$\mathrm{pdf}(r) = \frac{\Gamma\!\left(\frac{T}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{T-1}{2}\right)}\left(1-r^2\right)^{\frac{T-3}{2}}. \tag{41}$$

This can be rewritten in terms of a Beta distribution,

$$\mathrm{Beta}(x;\alpha,\beta) = \frac{1}{B(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \tag{42}$$

where $x\in[0,1]$ and $B(\alpha,\beta)$ is the Beta function. Specifically, the density of correlations is given by the symmetric Beta distribution

$$\mathrm{pdf}(r) = \mathrm{Beta}(r;\alpha,\alpha;\ell=-1,s=2), \tag{43}$$

where the location $\ell$ and scale $s$ are set such that the density is defined on the interval $[-1,1]$ of correlation values, and

$$\alpha = \frac{T-1}{2}. \tag{44}$$

We also note that the variance of a symmetric Beta distribution with scale $s$ is

$$\mathrm{var} = \frac{s^2}{4(2\alpha+1)} = \frac{1}{2\alpha+1}. \tag{45}$$
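Equations (40)–(45) are easy to verify by simulation. The following sketch (ours; assumes NumPy, and the values of $T$ and the sample count are arbitrary) checks that the empirical correlation of independent Gaussian $T$-vectors has variance $1/T = 1/(2\alpha+1)$:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_pairs = 50, 100_000
alpha = (T - 1) / 2                   # Beta parameter, Eq. (44)

x = rng.normal(size=(n_pairs, T))
y = rng.normal(size=(n_pairs, T))
sx = np.sqrt((x**2).mean(axis=1))     # zero-mean variables, so no centering
sy = np.sqrt((y**2).mean(axis=1))
r = (x * y).mean(axis=1) / (sx * sy)  # correlation coefficient, Eq. (40)

print(r.var(), 1 / (2 * alpha + 1))   # Eq. (45): both are ~ 1/T
```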

### b.2 Density of correlations in the latent feature model

There are multiple contributions to the correlations among the measured variables. We compute them individually, and then combine the contributions. We find that each contribution is distributed according to a symmetric Beta distribution. To obtain the overall density, we approximate the sum of the Beta-distributed contributions by a single Beta distribution, the parameter of which is obtained by matching its variance to the sum of the variances of the individual components. In these calculations, we only keep terms to the leading order in the large-$T$ or large-$m$ limits. Further, we assume that the appropriate ratios of dimensions are small, in accordance with the classical and intensive regime limits. We first consider the pure noise contribution

$$(c_R)_{pq} = \frac{\frac{1}{T}\sum_t R^T_{pt}R_{tq}}{\sigma^n_p\,\sigma^n_q}, \tag{46}$$
$$(\sigma^n_q)^2 = \frac{1}{T}\sum_t R^T_{qt}R_{tq}. \tag{47}$$

The expression on the right-hand side is the correlation coefficient between two random Gaussian variables. Using Eq. (43), we arrive at

$$\mathrm{pdf}\big((c_R)_{pq}\big) = \mathrm{Beta}\big((c_R)_{pq};\alpha_n,\alpha_n;-1,2\big), \quad p\neq q, \tag{48}$$

with

$$\alpha_n = \frac{T-1}{2}, \tag{49}$$

and the variance of this density is

$$\mathrm{var}_n = T^{-1}. \tag{50}$$

Next we compute the density of the pure signal contribution

$$(c_{UV})_{pq} = \frac{1}{T\sigma^s_p\sigma^s_q}\sum_t\Big(\sum_\mu V_{p\mu}U_{\mu t}\Big)\Big(\sum_\nu U_{t\nu}V_{\nu q}\Big), \tag{51}$$
$$(\sigma^s_p)^2 = \frac{1}{T}\sum_t\Big(\sum_\mu V_{p\mu}U_{\mu t}\Big)\Big(\sum_\nu U_{t\nu}V_{\nu p}\Big), \tag{52}$$

and similarly for $(\sigma^s_q)^2$. Rearranging, we find

$$(c_{UV})_{pq} = \frac{1}{\sigma^s_p\sigma^s_q}\sum_{\mu\nu} V_{p\mu}V_{\nu q}\Big(\frac{1}{T}\sum_t U_{\mu t}U_{t\nu}\Big), \tag{53}$$
$$(\sigma^s_p)^2 = \sum_{\mu\nu} V_{p\mu}V_{\nu p}\Big(\frac{1}{T}\sum_t U_{\mu t}U_{t\nu}\Big). \tag{54}$$

The expression in parentheses in both of the equations above is a (co)variance of Gaussian random numbers. For $\mu=\nu$, it follows the scaled $\chi^2$ distribution with $T$ degrees of freedom. For $\mu\neq\nu$, it is given by a rescaled version of the distribution in Eq. (30), with $T$ instead of $m$. Crucially, the variance of either is $\propto T^{-1}$. Thus, in the limit $T\to\infty$, the terms in parentheses converge to $\sigma_U^2\,\delta_{\mu\nu}$, with probabilistic corrections of relative size $O(T^{-1/2})$ that will be neglected in what follows. We get

$$(c_{UV})_{pq} = \frac{\frac{1}{m}\sum_\mu V_{p\mu}V_{\mu q}}{\sqrt{\frac{1}{m}\sum_\mu V^2_{p\mu}}\sqrt{\frac{1}{m}\sum_\mu V^2_{\mu q}}}, \tag{55}$$
$$(\sigma^s_p)^2 = m\sigma_U^2\Big(\frac{1}{m}\sum_\mu V^2_{p\mu}\Big). \tag{56}$$

We see that the sought-after correlation is a correlation coefficient between Gaussian variables, but with $m$ samples instead of $T$. Using again Eq. (43), we write

$$\mathrm{pdf}\big((c_{UV})_{pq}\big) = \mathrm{Beta}\big((c_{UV})_{pq};\alpha_s,\alpha_s;-1,2\big), \quad p\neq q, \tag{57}$$

with parameter

$$\alpha_s = \frac{m-1}{2}. \tag{58}$$

We remind the reader that Eq. (57) holds up to corrections of $O(T^{-1/2})$. The variance of this density is

$$\mathrm{var}_s = m^{-1}. \tag{59}$$

This expression agrees with numerical simulations very well, cf. Fig. 1.
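The $\mathrm{var}_s = m^{-1}$ prediction can also be probed directly with a few lines of code (our sketch, assuming NumPy; parameter values are arbitrary, with $T\gg m$ so that the $O(T^{-1/2})$ corrections are small):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, m = 200, 5000, 8          # T >> m suppresses O(T^{-1/2}) corrections

V = rng.normal(size=(N, m))     # latent loadings
U = rng.normal(size=(m, T))     # latent time series
X = V @ U                       # noiseless data, X_pt = sum_mu V_pmu U_mut

c = np.corrcoef(X)              # N x N empirical correlation matrix
r = c[np.triu_indices(N, k=1)]  # off-diagonal entries, p != q
print(r.var(), 1 / m)           # Eq. (59): var_s ~ 1/m
```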

Finally, for the signal-noise cross terms in the correlation, we have

$$\big(c_{(UV)^TR}\big)_{pq} = \frac{1}{T\sigma^s_p\sigma^n_q}\sum_t\sum_\mu V_{p\mu}U_{\mu t}R_{tq} = \frac{1}{\sigma^s_p\sigma^n_q}\sum_\mu V_{p\mu}\Big(\frac{1}{T}\sum_t U_{\mu t}R_{tq}\Big). \tag{60}$$

For the quantity in parentheses in Eq. (60), we define

$$r_{\mu q} \equiv \frac{1}{T}\sum_t U_{\mu t}R_{tq}. \tag{61}$$

This is a covariance between two independent Gaussian random numbers and again follows a rescaled form of the distribution in Eq. (30), with variance $\propto T^{-1}$. Since $T$ is large, the distribution approaches a Gaussian, and we further define $r'_{\mu q} \equiv \sqrt{T}\, r_{\mu q}/(\sigma_U\sigma^n_q)$, such that $r'_{\mu q}$ is a unit Gaussian random variable. Thus we obtain

$$\big(c_{(UV)^TR}\big)_{pq} = \frac{\sigma_U m\,\sigma^n_q\, T^{-1/2}}{\sigma^s_p\sigma^n_q}\Big(\frac{1}{m}\sum_\mu V_{p\mu}r'_{\mu q}\Big) = m^{1/2}T^{-1/2}\left(\frac{\frac{1}{m}\sum_\mu V_{p\mu}r'_{\mu q}}{\sqrt{\frac{1}{m}\sum_\mu V^2_{p\mu}}}\right), \tag{62}$$

where we have extracted the factor of $m^{1/2}T^{-1/2}$ to highlight that the expression in parentheses is the correlation between Gaussian random numbers. From this, using Eq. (43), we conclude that

$$\mathrm{pdf}\big(\big(c_{(UV)^TR}\big)_{pq}\big) = \mathrm{Beta}\big(\big(c_{(UV)^TR}\big)_{pq};\alpha_{sn},\alpha_{sn};-1,2\big), \quad p\neq q, \tag{63}$$

with parameter

$$\alpha_{sn} = \frac{m^{1/2}T^{1/2}-1}{2}. \tag{64}$$

The variance of this density is

$$\mathrm{var}_{sn} = m^{-1/2}T^{-1/2} = \sqrt{\mathrm{var}_s\cdot\mathrm{var}_n}. \tag{65}$$

An analogous expression holds for the $\big(c_{R^TUV}\big)_{pq}$ contribution.

The empirical correlation matrix is given by

$$c_{pq} = \frac{\frac{1}{T}\sum_t X_{pt}X_{tq}}{\sigma^{sn}_p\sigma^{sn}_q}, \tag{66}$$

where

$$(\sigma^{sn}_p)^2 = (\sigma^s_p)^2 + (\sigma^n_p)^2. \tag{67}$$

Using Eqs. (46), (51), and (60), the correlation matrix can be written as a weighted sum of the three types of contributions:

$$c_{pq} = \frac{\sigma^s_p\sigma^s_q}{\sigma^{sn}_p\sigma^{sn}_q}(c_{UV})_{pq} + \frac{\sigma^s_p\sigma^n_q}{\sigma^{sn}_p\sigma^{sn}_q}\big(c_{(UV)^TR}\big)_{pq} + \frac{\sigma^n_p\sigma^s_q}{\sigma^{sn}_p\sigma^{sn}_q}\big(c_{R^TUV}\big)_{pq} + \frac{\sigma^n_p\sigma^n_q}{\sigma^{sn}_p\sigma^{sn}_q}(c_R)_{pq}. \tag{68}$$

Each term on the right-hand side of this equation follows a Beta distribution, as computed above. However, the parameter of each distribution is modified by the corresponding weight in the above sum. Consequently, the variance of each distribution is rescaled by the weight:

$$\mathrm{var}'_s = \frac{\sigma^s_p\sigma^s_q}{\sigma^{sn}_p\sigma^{sn}_q}\,\mathrm{var}_s, \tag{69}$$
$$\mathrm{var}'_n = \frac{\sigma^n_p\sigma^n_q}{\sigma^{sn}_p\sigma^{sn}_q}\,\mathrm{var}_n, \tag{70}$$
$$\mathrm{var}'_{sn} = \frac{\sigma^s_p\sigma^n_q}{\sigma^{sn}_p\sigma^{sn}_q}\,\mathrm{var}_{sn}. \tag{71}$$

To determine an expression for the combined distribution of signal and noise correlations, we make use of the observation that the distribution of a sum of Beta-distributed variables is well approximated by a single Beta distribution [3]. We determine the parameter of this single Beta distribution by adding the means and variances of the distributions in the sum and matching them analytically.

The means of the Beta distributions in Eqs. (48), (57), and (63) are zero, and thus the mean of the density of the combined contributions is also zero. Taking the sum of variances, we obtain

$$\mathrm{var} = \mathrm{var}'_s + \mathrm{var}'_n + \mathrm{var}'_{sn} + \mathrm{var}'_{ns}. \tag{72}$$

In the limit when $T$ and $m$ are large enough that fluctuations of the empirical variances around their expectations can be neglected, we have the following convergence of the empirical quantities:

$$(\sigma^s_p)^2 \to m\sigma_U^2\sigma_V^2, \tag{73}$$
$$(\sigma^n_p)^2 \to \sigma^2, \tag{74}$$
$$(\sigma^{sn}_p)^2,\ (\sigma^{ns}_p)^2 \to m\sigma_U^2\sigma_V^2 + \sigma^2. \tag{75}$$

Consequently, the variances of the contributions take the form

$$\mathrm{var}'_s \to \frac{m^{-1}}{1+\mathrm{SNR}^{-1}}, \tag{76}$$
$$\mathrm{var}'_n \to \frac{T^{-1}}{1+\mathrm{SNR}}, \tag{77}$$
$$\mathrm{var}'_{sn},\ \mathrm{var}'_{ns} \to \frac{m^{-1/2}T^{-1/2}}{\sqrt{1+\mathrm{SNR}}\sqrt{1+\mathrm{SNR}^{-1}}}. \tag{78}$$

Thus, in this limit, the variance of the Beta distribution, Eq. (72), becomes

$$\mathrm{var} \to \frac{m^{-1}}{1+\mathrm{SNR}^{-1}} + \frac{T^{-1}}{1+\mathrm{SNR}} + \frac{2\,m^{-1/2}T^{-1/2}}{\sqrt{1+\mathrm{SNR}}\sqrt{1+\mathrm{SNR}^{-1}}}, \tag{79}$$

which is a complete square:

$$\mathrm{var} \approx \left(\frac{m^{-1/2}}{\sqrt{1+\mathrm{SNR}^{-1}}} + \frac{T^{-1/2}}{\sqrt{1+\mathrm{SNR}}}\right)^{2}. \tag{80}$$

Finally, from the relation in Eq. (45), we obtain the parameter of the sought-after Beta distribution:

$$\alpha = \frac{\mathrm{var}^{-1}-1}{2}. \tag{81}$$

A comparison between the analytic form of the density and simulated data is shown in Fig. 1, and in Fig. S3 for finite SNR. In the extreme noise limits, the analytic form closely matches the simulation. In the large-noise limit, shown in Fig. S3(b), the density is close to a Gaussian, because the number of observations is large. In the regime of finite SNR, shown in Fig. S3(a), deviations between the analytic form and the simulation appear. We expect these deviations to disappear once the various approximations made in the analytic derivation above are removed.
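The end-to-end prediction of Eqs. (80)–(81) can be reproduced with the following sketch (ours, assuming NumPy/SciPy; parameter values are arbitrary, and the definition $\mathrm{SNR}=m\sigma_U^2\sigma_V^2/\sigma^2$ is the one implied by Eqs. (73)–(76)):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(3)
N, T, m, sU, sV, sigma = 200, 1000, 10, 1.0, 1.0, 1.0
snr = m * sU**2 * sV**2 / sigma**2          # SNR implied by Eqs. (73)-(76)

# noisy latent feature data, X = V U + sigma R
X = rng.normal(0, sV, (N, m)) @ rng.normal(0, sU, (m, T)) \
    + sigma * rng.normal(size=(N, T))
r = np.corrcoef(X)[np.triu_indices(N, k=1)]  # off-diagonal correlations

var = (m**-0.5 / np.sqrt(1 + 1 / snr) + T**-0.5 / np.sqrt(1 + snr))**2  # Eq. (80)
a = (1 / var - 1) / 2                                                   # Eq. (81)
print(r.var(), var)

# matched symmetric Beta density on [-1, 1], cf. Eq. (43):
grid = np.linspace(-0.99, 0.99, 199)
density = beta(a, a, loc=-1, scale=2).pdf(grid)  # compare to histogram of r
```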

## Appendix C Spectrum of the normalized empirical covariance matrix

To compute the eigenvalue density of the NECM $C$, we use methods of Random Matrix Theory [4]. The standard approach is to compute the finite-size Stieltjes transform

$$g^N_C(z) = \frac{1}{N}\,\mathrm{Tr}\,(zI-C)^{-1}, \tag{82}$$

where $I$ is the $N\times N$ identity matrix and $z$ is a complex variable. In the limit of large matrices — the large-$N$, or thermodynamic, limit — the finite-size Stieltjes transform becomes $g_C(z) = \lim_{N\to\infty} g^N_C(z)$. The eigenvalue density is then obtained from the imaginary part of the limit of the Stieltjes transform:

$$\rho(\lambda) = \frac{1}{\pi}\lim_{\eta\to 0^+}\mathrm{Im}\, g(z=\lambda-i\eta), \tag{83}$$

where $\mathrm{Im}$ denotes the imaginary part.
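In practice, Eqs. (82)–(83) can be evaluated at a small but finite $\eta$, which smooths the empirical spectrum. A minimal sketch (ours, assuming NumPy; the matrix sizes and $\eta$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 300, 900
X = rng.normal(size=(N, T))
lam = np.linalg.eigvalsh(X @ X.T / T)        # eigenvalues of a sample covariance

eta = 0.05                                   # small smoothing parameter
z = np.linspace(0.01, 4.0, 400) - 1j * eta   # z = lambda - i eta
g = (1 / (z[:, None] - lam[None, :])).mean(axis=1)  # g_C^N(z), Eq. (82)
rho = g.imag / np.pi                         # Eq. (83): smoothed eigenvalue density
```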

We start by writing again the definition of the normalized empirical covariance matrix (NECM), which differs from the correlation matrix only in that all variables are normalized by the same overall scale $\sigma_X$, rather than by their individual standard deviations:

$$\begin{aligned}
C &= \frac{1}{T}\tilde{X}^T\tilde{X} = \frac{1}{T}\big(\widetilde{UV}+\tilde{\sigma}R\big)^T\big(\widetilde{UV}+\tilde{\sigma}R\big) \\
&= \frac{1}{T}\Big((\widetilde{UV})^T(\widetilde{UV}) + \tilde{\sigma}^2 R^TR + \tilde{\sigma}(\widetilde{UV})^TR + \tilde{\sigma}R^T\widetilde{UV}\Big).
\end{aligned}\tag{84}$$

The NECM contains three different contributions: the term $(\widetilde{UV})^T(\widetilde{UV})$ from the pure latent feature signal, $\tilde{\sigma}^2R^TR$ from pure noise, and two terms of the type $(\widetilde{UV})^TR$, which are cross terms between the latent signal and the noise. Each contribution is an $N\times N$ random matrix. Critical to computing the eigenvalue density of random matrices is the concept of matrix freeness [5], which is the generalization of statistical independence to matrices. The eigenvalue spectrum of sums and products of free matrices can be computed from the spectra of the summands and factors using the $R$- and the $S$-transforms, which are related to the Stieltjes transform and are additive and multiplicative, respectively. The signal-signal and the noise-noise contributions in the NECM definition are certainly free with respect to each other. We will argue in Appendix C.3 that, in our regimes of interest (the zero-noise limit, the classical statistics limit from Eq. (9), and the intensive limit from Eq. (10)), the cross-term contributions are negligible, so that we can drop them and approximate the NECM as

$$C \approx \frac{(UV)^T(UV) + \sigma^2R^TR}{\sigma_X^2\,T} := C_{\widetilde{UV}} + C_{\widetilde{\sigma R}}, \tag{85}$$

so that free matrix theory applies.

### c.1 Parameterizing the random matrix problem and the large matrix limit

To calculate the spectrum of the signal-signal contribution to the NECM,

$$C_{\widetilde{UV}} = \frac{1}{\sigma_X^2 T}(UV)^T(UV), \tag{86}$$

we note that, assuming $m<N$, this matrix is of rank $m$. Thus we can work in the basis where

$$C_{\widetilde{UV}} = \begin{pmatrix} H_{\widetilde{UV}} & 0 \\ 0 & 0 \end{pmatrix}, \tag{87}$$

and

$$H_{\widetilde{UV}} = \frac{1}{\sigma_X^2 T}\,(U^TU)(VV^T). \tag{88}$$

There are $m$ non-trivial eigenvalues associated with $H_{\widetilde{UV}}$, while the remaining $N-m$ eigenvalues are zero. The finite-size Stieltjes transform, $g^N_{C_{\widetilde{UV}}}(z)$, is then of the form

$$g^N_{C_{\widetilde{UV}}}(z) = \frac{1}{N}\left(m\,\frac{1}{m}\sum_{\mu=1}^{m}\frac{1}{z-\lambda_\mu} + \frac{N-m}{z}\right) = \frac{1}{N}\left(m\,h^m_{H_{\widetilde{UV}}}(z) + \frac{N-m}{z}\right), \tag{89}$$

where $\lambda_\mu$ are the eigenvalues of $H_{\widetilde{UV}}$ and $h^m_{H_{\widetilde{UV}}}$ is its finite-size Stieltjes transform.

Now we note that $H_{\widetilde{UV}}$ in Eq. (88) is proportional to the product of two white Wishart matrices:

$$H_{\widetilde{UV}} = \frac{N}{\sigma_X^2}\,W_U W_{V^T}, \tag{90}$$

where

$$W_Y = \frac{1}{T}Y^TY \tag{91}$$

is the Wishart matrix, and $Y$ is a $T\times N$ matrix with i.i.d. standard normal entries. The key parameter characterizing such a standard Wishart matrix is the ratio of the number of columns to that of rows,

$$q \equiv \frac{N}{T}. \tag{92}$$

Since $U$ and $V^T$ are $T\times m$ and $N\times m$ matrices, respectively, a natural characterization of $H_{\widetilde{UV}}$ is then

$$q \equiv \frac{N}{T},\qquad q_U \equiv \frac{m}{T},\qquad q_{V^T} \equiv \frac{m}{N}, \tag{93}$$

with $q_U = q\,q_{V^T}$, so that there are only two independent parameters.

It is now convenient to define

$$\sigma_X^2 = m\left(\sigma_U^2\sigma_V^2 + \frac{\sigma^2}{m}\right) \equiv m\,\bar{\sigma}_X^2, \tag{94}$$

where we used Eq. (38), so that Eq. (90) becomes

$$H_{\widetilde{UV}} = \frac{1}{q_{V^T}\,\bar{\sigma}_X^2}\,W_U W_{V^T}. \tag{95}$$

In the following, we only consider the limit of large matrices, in which $N$, $T$, and $m$ go to infinity in such a way that $q$, $q_U$, $q_{V^T}$, and SNR are all constant. In this thermodynamic limit, the finite-size Stieltjes transform in Eq. (89) becomes

$$g_{C_{\widetilde{UV}}}(z) = q_{V^T}\,h(z) + \frac{1-q_{V^T}}{z}, \tag{96}$$

where $g_{C_{\widetilde{UV}}}$ and $h$ are the large-matrix limits of the Stieltjes transforms of $C_{\widetilde{UV}}$ and $H_{\widetilde{UV}}$, respectively.

### c.2 The spectrum of $C_{\widetilde{UV}}$

We now compute the eigenvalue density of $C_{\widetilde{UV}}$. The first step is to compute the Stieltjes transform $h(z)$. From Eq. (95), it is clear that this reduces to the problem of computing the eigenvalue spectrum of a product of two Wishart matrices.

The spectrum of a product of two free matrices can be computed with the help of the $S$-transform, which is defined for a random matrix $A$ as

$$S_A(t) = \frac{t+1}{t\,T_A^{-1}(t)}, \tag{97}$$

where $T_A^{-1}$ is the functional inverse of the $T$-transform $T_A$. In turn, the $T$-transform is related to the Stieltjes transform of $A$ through the relation

$$T_A(z) = z\,g_A(z) - 1. \tag{98}$$

The $S$-transform of a product of two free matrices $A$ and $B$ is multiplicative,

$$S_{AB}(t) = S_A(t)\,S_B(t), \tag{99}$$

and, furthermore, for a scalar $a$,

$$S_{aA}(t) = a^{-1}S_A(t). \tag{100}$$

For the white Wishart matrix, Eq. (91), the $S$-transform is known to be [4]

$$S_{W_Y}(t) = \frac{1}{1+qt}. \tag{101}$$

Thus we only need to use the multiplicative property of the $S$-transform to compute the signal-signal contribution to the NECM. Specifically,

$$S_{H_{\widetilde{UV}}}(t) = q_{V^T}\bar{\sigma}_X^2\, S_{W_U} S_{W_{V^T}} = \frac{q_{V^T}\bar{\sigma}_X^2}{(1+q_U t)(1+q_{V^T}t)}. \tag{102}$$

Equation (97) then yields

$$T^{-1}_{H_{\widetilde{UV}}}(t) = \frac{t+1}{t\,S_{H_{\widetilde{UV}}}(t)} = \frac{(t+1)(1+q_U t)(1+q_{V^T}t)}{t\,q_{V^T}\bar{\sigma}_X^2}. \tag{103}$$

We now solve the equation for the functional inverse using the definition of the $T$-transform, Eq. (98), and divide by a common factor of $z$. We obtain a cubic equation for the Stieltjes transform $h(z)$:

$$h^3 z^2 q_U q_{V^T} + h^2 z\big(q_{V^T}(1-q_U)+q_U(1-q_{V^T})\big) + h\big((1-q_U)(1-q_{V^T}) - z q_{V^T}\bar{\sigma}_X^2\big) + q_{V^T}\bar{\sigma}_X^2 = 0. \tag{104}$$

Finally, we divide by $q_U q_{V^T}$ to obtain

$$h^3 z^2 + h^2 z\left(\frac{1-q_U}{q_U}+\frac{1-q_{V^T}}{q_{V^T}}\right) + h\,\frac{(1-q_U)(1-q_{V^T}) - z q_{V^T}\bar{\sigma}_X^2}{q_U q_{V^T}} + \frac{\bar{\sigma}_X^2}{q_U} = 0. \tag{105}$$
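As a cross-check of Eq. (104), the cubic can be solved numerically for $h(z)$ at $z=\lambda-i\eta$, converted to $g$ via Eq. (96), and the resulting density, Eq. (83), compared against direct diagonalization. The sketch below is ours (assuming NumPy); the branch selection and all parameter values are our choices, not prescribed by the derivation.

```python
import numpy as np

rng = np.random.default_rng(6)
T, N, m = 4000, 1200, 120
qU, qVT = m / T, m / N                 # Eq. (93)
sbar2 = 1.0                            # \bar{sigma}_X^2 for sU = sV = 1, sigma = 0

def rho(lam, eta=1e-6):
    """Eigenvalue density of C_UV from Eqs. (104), (96), and (83)."""
    z = lam - 1j * eta
    coeffs = [z**2 * qU * qVT,                         # h^3
              z * (qVT * (1 - qU) + qU * (1 - qVT)),   # h^2
              (1 - qU) * (1 - qVT) - z * qVT * sbar2,  # h^1
              qVT * sbar2]                             # h^0
    h = next((r for r in np.roots(coeffs) if r.imag > 1e-10), None)
    if h is None:                      # all roots real: outside the spectral bulk
        return 0.0
    g = qVT * h + (1 - qVT) / z        # Eq. (96)
    return g.imag / np.pi              # Eq. (83)

# direct diagonalization of the signal-signal NECM for comparison
U = rng.normal(size=(T, m))
V = rng.normal(size=(m, N))
lam_emp = np.linalg.eigvalsh((U @ V).T @ (U @ V) / (m * sbar2 * T))
grid = np.linspace(1e-3, 1.1 * lam_emp.max(), 300)
dens = np.array([rho(x) for x in grid])
# compare dens to a histogram of lam_emp; the N - m zero modes sit at the origin
```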