References
[1] Sum of i.i.d. Beta-distributed variables. Mathematics Stack Exchange, https://math.stackexchange.com/q/3096929 (version: 2019-02-02).
[2] (2014) Mapping the stereotyped behaviour of freely moving fruit flies. Journal of The Royal Society Interface 11 (99), 20140672.
[3] (2016) Spectrum of deformed random matrices and free probability. arXiv:1607.05560.
[4] (2014) Automated image-based tracking and its application in ecology. Trends in Ecology & Evolution 29 (7), pp. 417–428.
[5] (2017) Neural manifolds for the control of movement. Neuron 94 (5), pp. 978–984.
[6] (1953) New light on the correlation coefficient and its transforms. Journal of the Royal Statistical Society, Series B (Methodological) 15 (2), pp. 193–232.
[7] (2016) Phase separation of binary charged particle systems with small size disparities using a dusty plasma. Phys. Rev. Lett. 116, 115002.
[8] (2011) Almost sure localization of the eigenvalues in a Gaussian information plus noise model. Application to the spiked models. Electronic Journal of Probability 16, pp. 1934–1959.
[9] (2018) Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 9 (1), pp. 1–10.
[10] (1967) Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik 1 (4), pp. 457–483.
[11] (2017) Collective behavior of place and non-place neurons in the hippocampal network. Neuron 96 (5), pp. 1178–1191.e4.
[12] (2021) Latent dynamical variables produce signatures of spatiotemporal criticality in large biological systems. Physical Review Letters 126 (11), 118302.
[13] (2018) Reverse-engineering biological networks from large data sets. In Quantitative Biology: Theory, Computational Methods and Examples of Models, B. Munsky, L. Tsimring, and W. S. Hlavacek (Eds.).
[14] (2021) Geometry of abstract learned knowledge in the hippocampus. Nature 595, pp. 80–84.
[15] Gridded climate data. https://psl.noaa.gov/data/gridded/ (accessed 2021-06-30).
[16] (2020) Revealing the state space of turbulence using machine learning. arXiv preprint arXiv:2008.07515.
[17] (2018) Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods 15 (10), pp. 805–815.
[18] (2020) A first course in random matrix theory: for physicists, engineers and data scientists. Cambridge University Press.
[19] (2016) Shake-The-Box: Lagrangian particle tracking at high particle image densities. Experiments in Fluids 57, pp. 1–27.
[20] (2014) Zipf's law and criticality in multivariate data without fine-tuning. Physical Review Letters 113 (6), 068102.
[21] (1999) Distributions of singular values for some random matrices. Physical Review E 60 (3), 3389.
[22] (2019) Three-dimensional time-resolved trajectories from laboratory insect swarms. Scientific Data 6 (1), pp. 1–8.
[23] (2008) Dimensionality and dynamics in the behavior of C. elegans. PLoS Comput Biol 4 (4), e1000028.
[24] (2001) Investigating the microenvironments of inhomogeneous soft materials with multiple particle tracking. Physical Review E 64 (6), 061506.
[25] (2017) Biodiversity effects on ecosystem functioning in a 15-year grassland experiment: patterns, mechanisms, and open questions. Basic and Applied Ecology 23, pp. 1–73.
Appendix A Data distribution for the latent feature model with no noise, its variance, and the large-$m$ limit
Each entry of the latent-feature data matrix is given by the sum of products of i.i.d. standard Gaussian random variables $g_{ik}$ and $h_{k\mu}$:
(25) $x_{i\mu} = \sum_{k=1}^{m} g_{ik}\, h_{k\mu}.$
Each product, $y = gh$, is distributed according to the normal product distribution [1]:
(26) $p_1(y) = \frac{1}{\pi}\, K_0(|y|),$
where $K_\nu$ is the modified Bessel function of the second kind:
(27) $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\, \cosh(\nu t)\, dt.$
To derive the probability density of the latent-feature model entries $x_{i\mu}$, we first compute the characteristic function $\varphi_1(t)$ by taking the Fourier transform of the normal product distribution. We then use the fact that the characteristic function of the sum of $m$ independent products is given by $\varphi_m(t) = [\varphi_1(t)]^m$. The inverse Fourier transform of $\varphi_m$ then yields the sought-after probability density. Specifically, the characteristic function of the normal product distribution is
(28) $\varphi_1(t) = \int_{-\infty}^{\infty} e^{ity}\, p_1(y)\, dy = \frac{1}{\sqrt{1+t^2}}$
for $t \in \mathbb{R}$; the inversion below uses $\frac{1}{2\pi}\int e^{i(t-t')y}\, dy = \delta(t-t')$, where $\delta$ is the Dirac delta function.
The characteristic function of the sum of $m$ products is then
(29) $\varphi_m(t) = \left(1+t^2\right)^{-m/2}.$
Finally, performing the inverse transformation, we obtain the probability density function of the sum:
(30) $p_m(x) = \frac{1}{\sqrt{\pi}\,\Gamma(m/2)} \left(\frac{|x|}{2}\right)^{(m-1)/2} K_{(m-1)/2}(|x|).$
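The density of a sum of products of standard Gaussians is easy to verify by direct sampling; the following is a minimal Python sketch (the function and variable names are ours, not from the text):

```python
import numpy as np
from scipy.special import kv, gamma

def pdf_sum_of_products(x, m):
    """Density of a sum of m products of independent standard Gaussians,
    i.e. the inverse Fourier transform of (1 + t**2)**(-m/2)."""
    ax = np.abs(x)
    return ((ax / 2.0) ** ((m - 1) / 2) * kv((m - 1) / 2, ax)
            / (np.sqrt(np.pi) * gamma(m / 2)))

rng = np.random.default_rng(0)
m, n_samples = 4, 200_000
samples = (rng.standard_normal((n_samples, m))
           * rng.standard_normal((n_samples, m))).sum(axis=1)

# Each product has unit variance, so the sum should have variance m.
print(np.var(samples))  # ~ 4

# Empirical histogram vs. the analytic density (peak is ~ 0.25 for m = 4).
hist, edges = np.histogram(samples, bins=60, range=(-8, 8), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_err = np.max(np.abs(hist - pdf_sum_of_products(centers, m)))
print(max_err)  # small compared to the peak density
```

For $m=1$ the formula reduces to $K_0(|x|)/\pi$ and for $m=2$ to the Laplace density $e^{-|x|}/2$, which are useful special-case checks.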
Since the probability density function of $x$ is symmetric around zero, the mean of the distribution vanishes:
(31) $\langle x \rangle = \int_{-\infty}^{\infty} x\, p_m(x)\, dx = 0.$
The variance is
(32) $\sigma_x^2 = \int_{-\infty}^{\infty} x^2\, p_m(x)\, dx.$
The integral above can be evaluated in terms of generalized hypergeometric functions [2]. We present the calculation in detail for the case when $m$ is even:
(33) 
where
(34) 
with parameters
(35) 
and is the generalized hypergeometric function
(36) 
In the expression above, $(\cdot)_n$ is the Pochhammer symbol, and $\csc$ is the cosecant; since $m$ is even, the cosecant factor must be handled as a limit. Putting everything together, we obtain the following expression for the variance
(37) 
where we have used the fact that the numerator after the first equality vanishes in this limit. Evaluating the right-hand side numerically, as shown in Fig. S1, we find that the variance of the latent-feature data values is
(38) $\sigma_x^2 = m.$
This agrees with the intuition that every latent dimension contributes its own variance to the variance of the data.
We note that, for a large number of latent features $m$, the distribution (30) becomes normal, in agreement with the central limit theorem:
(39) $p_m(x) \to \frac{1}{\sqrt{2\pi m}}\, e^{-x^2/(2m)} \quad \text{as } m \to \infty.$
Crucially, the variance of $x$ remains $m$-dependent. Figure S2 compares the exact analytical expression for the probability distribution and its Gaussian approximation to numerical simulations.
As a final note, if we were interested in the distribution of data with noise, we would need to convolve the density in Eq. (30) with the Gaussian density of the noise.
Appendix B Probability density of the correlation coefficients
For our latent-feature model with noise, we calculate here the probability distribution of the entries of the empirical data correlation matrix. Before doing this, a few notes are in order. First, the correlations depend on the basis in which the variables are measured, becoming a diagonal matrix in the special case when the measured variables are the principal axes of the data cloud. Thus, to make statements independent of the basis, we consider the distribution of typical correlations, i.e., correlations in a basis oriented randomly with respect to the principal axes of the data. For a given realization, the data cloud is typically anisotropic, with long directions dominated by the latent-feature signal and short directions dominated by noise. When the number of latent dimensions is smaller than the number of variables, the principal axes of the data cloud do not align with the measured variables for the vast majority of random rotations, and the correlations between any random pair of variables have contributions from all latent dimensions. We thus expect the number of latent dimensions to be imprinted in the distribution of the elements of the correlation matrix, so that the statistics of the elements carries information about the underlying structure of the model.
b.1 Preliminaries: Density of the correlation coefficient of two random Gaussian variables
The correlation coefficient of two independent zero-mean variables $x$ and $y$, each sampled $T$ times, is
(40) $\rho = \frac{\sum_{\mu=1}^{T} x_\mu y_\mu}{\sqrt{\sum_{\mu=1}^{T} x_\mu^2}\, \sqrt{\sum_{\mu=1}^{T} y_\mu^2}} = \frac{\mathbf{x}\cdot\mathbf{y}}{|\mathbf{x}|\,|\mathbf{y}|},$
where the vectors' components are mutually independent, i.i.d. Gaussian random variables. The correlation coefficient is distributed according to [6]
(41) $p(\rho) = \frac{\Gamma\!\left(\frac{T}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{T-1}{2}\right)} \left(1-\rho^2\right)^{\frac{T-3}{2}}.$
This can be rewritten in terms of a Beta distribution
(42) $\frac{1+\rho}{2} \sim \mathrm{Beta}(\beta, \beta),$
where $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the Beta function. Specifically, the density of correlations is given by the symmetric Beta distribution
(43) $p(\rho) = \frac{\left(1-\rho^2\right)^{\beta-1}}{2^{2\beta-1}\, B(\beta,\beta)},$
where the location and scale are set such that the density is defined on the interval of correlation values $[-1, 1]$, and
(44) $\beta = \frac{T-1}{2}.$
We also note that the variance of a symmetric Beta distribution with this scale is
(45) $\mathrm{var}(\rho) = \frac{1}{2\beta+1}.$
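Both the Beta form and the variance relation are straightforward to confirm numerically (a sketch; no mean subtraction is performed, consistent with the zero-mean setup above):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_trials = 50, 100_000

# Correlation coefficients of independent zero-mean Gaussian vectors,
# T samples each, without mean subtraction.
x = rng.standard_normal((n_trials, T))
y = rng.standard_normal((n_trials, T))
rho = (x * y).sum(axis=1) / (
    np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))

# A symmetric Beta on [-1, 1] with beta = (T - 1)/2 predicts
# var(rho) = 1/(2*beta + 1) = 1/T.
print(rho.mean())          # ~ 0
print(rho.var(), 1.0 / T)  # both ~ 0.02
```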
b.2 Density of correlations in the latent feature model
There are multiple contributions to the correlations among the measured variables. We compute them individually and then combine them. We find that each contribution is distributed according to a symmetric Beta distribution. To obtain the overall density, we approximate the sum of Beta-distributed terms by a single Beta distribution, the parameter of which is obtained by matching its variance to the sum of the variances of the individual components. Throughout, we keep terms only to the leading order in the large-$T$ or large-$N$ limit. Further, we work in the regime where the relevant expansion parameter is small, in accordance with the classical and intensive limits.
We start with the pure noise contribution to the correlations
(46)  
(47) 
The expression on the right-hand side is the correlation coefficient between two random Gaussian variables. Using Eq. (43), we arrive at
(48) 
with
(49) $\beta_{\mathrm{n}} = \frac{T-1}{2},$
and the variance of this density is
(50) $\mathrm{var} = \frac{1}{2\beta_{\mathrm{n}}+1} = \frac{1}{T}.$
Next we compute the density of the pure signal contribution
(51)  
(52) 
and similarly for the second variable in the correlation. Rearranging, we find
(53)  
(54) 
The expression in parentheses in both of the equations above is a (co)variance of Gaussian random numbers. For the diagonal terms, it follows a scaled $\chi^2$ distribution with $T$ degrees of freedom. For the off-diagonal terms, it is given by a rescaled version of the distribution in Eq. (30), with $T$ in place of $m$. Crucially, the variance of either is $O(1/T)$. Thus, in the limit $T \to \infty$, the terms in parentheses are $1 + O(T^{-1/2})$, where the correction is probabilistic, but will be neglected in what follows. We get
(55)  
(56) 
We see that the sought-after quantity is a correlation coefficient between Gaussian variables, but with $m$ samples instead of $T$. Using again Eq. (43), we write
(57) 
with parameter
(58) $\beta_{\mathrm{s}} = \frac{m-1}{2}.$
We remind the reader that Eq. (57) holds to leading order in $1/T$. The variance of this density is
(59) $\mathrm{var} = \frac{1}{2\beta_{\mathrm{s}}+1} = \frac{1}{m}.$
This expression agrees with numerical simulations very well, cf. Fig. 1.
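The $1/m$ scaling of the signal-contribution variance can be checked by simulating a noiseless latent-feature model (a sketch in our notation: $N$ variables, $m$ latent features, $T$ samples):

```python
import numpy as np

rng = np.random.default_rng(6)
m, N, T = 20, 300, 20_000  # large T isolates the signal contribution

G = rng.standard_normal((N, m))  # feature loadings (our notation)
H = rng.standard_normal((m, T))  # latent time courses (our notation)
X = G @ H                        # noiseless latent-feature data, N x T

Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
C = Xn @ Xn.T                    # empirical correlation matrix
off = C[np.triu_indices(N, k=1)]

# As T grows, each off-diagonal entry tends to the correlation of two
# m-dimensional Gaussian loading vectors, so var(rho) -> 1/m.
print(off.var(), 1.0 / m)  # both ~ 0.05
```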
Finally, for the signal-noise cross terms in the correlation, we have
(60) 
For the quantity in parentheses in Eq. (60), we define
(61) 
This is a covariance between two independent Gaussian random numbers and again follows a rescaled form of the distribution in Eq. (30), with variance $O(1/T)$. Since $T$ is large, this distribution approaches a Gaussian, and we rescale the variable so that it becomes a unit Gaussian random variable. Thus we obtain
(62) 
where we have extracted an overall factor to highlight that the expression in parentheses is the correlation between Gaussian random numbers. From this, using Eq. (43), we conclude that
(63) 
with parameter
(64) 
The variance of this density is
(65) 
An analogous expression holds for the other cross-term contribution.
The empirical correlation matrix is given by
(66) 
where
(67) 
Using Eqs. (46), (51), and (60), the correlation matrix can be written as a weighted sum of the three types of contributions
(68) 
Each term on the right-hand side of this equation follows a Beta distribution, as computed above. However, the parameter of each distribution is modified by the corresponding weight in the above sum. Consequently, the variance of each distribution is rescaled accordingly:
(69)  
(70)  
(71) 
To determine an expression for the combined distribution of signal and noise correlations, we make use of the observation that a sum of Beta-distributed variables can be well approximated by a single Beta distribution [3]. We determine the parameter of this Beta distribution by adding the means and variances of the distributions in the sum and matching the variance of the single Beta distribution analytically.
The means of the Beta distributions in Eq. (48), Eq. (57), and Eq. (63) are zero, and thus the mean of the combined density is also zero. Taking the sum of the variances, we obtain
(72) 
In the limit when $T$ and $N$ are large enough that contributions of order $1/T$ and $1/N$ can be neglected, we have the following convergence of the empirical quantities
(73)  
(74)  
(75) 
Consequently the variances of the contributions take the form
(76)  
(77)  
(78) 
Thus, in this limit, the variance of the Beta distribution, Eq. (72), is of the form
var  (80) 
Finally, from the relation in Eq. (45), we obtain the parameter of the sought-after Beta distribution:
(81) $\beta_* = \frac{1}{2}\left(\frac{1}{\mathrm{var}} - 1\right).$
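The variance-matching step can be illustrated numerically (a sketch; the weights and component parameters below are made-up illustration values, not taken from the model):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

def sym_beta(beta, size):
    """Symmetric Beta(beta, beta) variates rescaled to [-1, 1]."""
    return 2.0 * rng.beta(beta, beta, size) - 1.0

# Illustrative weights and parameters for three contributions
# (signal-signal, noise-noise, cross term); the numbers are made up.
weights = np.array([0.6, 0.3, 0.1])
betas = np.array([4.5, 49.5, 49.5])
combined = sum(w * sym_beta(b, n) for w, b in zip(weights, betas))

# Variance matching: each rescaled symmetric Beta has variance
# 1/(2*beta + 1), so the sum has variance sum_i w_i**2/(2*beta_i + 1);
# the matched single-Beta parameter is beta* = (1/var - 1)/2.
var = np.sum(weights**2 / (2 * betas + 1))
beta_star = 0.5 * (1.0 / var - 1.0)
print(var, combined.var())  # should agree
print(beta_star)
```

The approximation replaces the exact (non-Beta) density of the weighted sum by the single Beta density with parameter `beta_star`, which matches the first two moments.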
A comparison between the analytic form of the density and simulated data is shown in Fig. 1 and, for finite parameter values, in Fig. S3. In the extreme noise limits, the analytic form closely matches the simulation. In the large-noise limit, shown in Fig. S3(b), the density is close to a Gaussian because the number of observations is large. In the regime of finite noise, shown in Fig. S3(a), deviations between the analytic form and the simulation appear for small values of the correlation. We expect these deviations to disappear upon removing the various approximations made in the analytic derivation above.
Appendix C Spectrum of the normalized empirical covariance matrix
To compute the eigenvalue density of the NECM, we use methods of Random Matrix Theory [4]. The standard approach is to compute the finite-size Stieltjes transform
(82) $g_N(z) = \frac{1}{N}\operatorname{Tr}\left[(z\mathbb{1} - \hat{C})^{-1}\right],$
where $\mathbb{1}$ is the $N \times N$ identity matrix, and $z$ is a complex variable. In the limit of large matrices (the large-$N$, or thermodynamic, limit), the finite-size Stieltjes transform becomes $g(z) = \lim_{N\to\infty} g_N(z)$. The eigenvalue density is then obtained from the imaginary part of the limit of the Stieltjes transform:
(83) $\rho(\lambda) = \frac{1}{\pi}\lim_{\epsilon \to 0^{+}} \operatorname{Im}\, g(\lambda - i\epsilon),$
where $\operatorname{Im}$ denotes the imaginary part.
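This recipe can be illustrated on a pure-noise (white Wishart) covariance matrix, for which the finite-size Stieltjes transform evaluated just below the real axis already gives a good density estimate (a sketch; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 400, 1600  # aspect ratio q = N/T = 0.25

# White-noise benchmark: C = X X^T / T, the sample covariance of noise.
X = rng.standard_normal((N, T))
C = X @ X.T / T
eigs = np.linalg.eigvalsh(C)

def g_N(z):
    """Finite-size Stieltjes transform g_N(z) = (1/N) tr (z*1 - C)^(-1)."""
    return np.mean(1.0 / (z - eigs))

# Density estimate from the imaginary part just below the real axis.
eps = 0.02
lam = np.linspace(-1.0, 4.0, 1001)
rho = np.array([g_N(l - 1j * eps) for l in lam]).imag / np.pi

dl = lam[1] - lam[0]
print((rho * dl).sum())  # ~ 1: the estimated density integrates to one
```

For small `eps` this recovers a smoothed Marchenko-Pastur density, supported on $[(1-\sqrt{q})^2, (1+\sqrt{q})^2]$.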
We start by writing down again the definition of the normalized empirical covariance matrix (NECM), which differs from the correlation matrix only by its normalization:
(84) 
The NECM contains three different contributions: one from the pure latent-feature signal, one from pure noise, and two cross terms between the latent signal and the noise. Each contribution is an $N \times N$ random matrix. Critical to computing the eigenvalue density of such random matrices is the concept of matrix freeness [5], which is the generalization of statistical independence to matrices. The eigenvalue spectrum of sums and products of free matrices can be computed from the spectra of the summands and factors using the $R$- and the $S$-transforms, which are related to the Stieltjes transform and are additive and multiplicative, respectively. The signal-signal and the noise-noise contributions in the NECM definition are certainly free w.r.t. each other. We will argue in Appendix C.3 that, in our regimes of interest (the zero-noise limit, the classical statistics limit of Eq. (9), and the intensive limit of Eq. (10)), the cross-term contributions are negligible, so that we can drop them and approximate the NECM as
(85) 
so that free matrix theory applies.
c.1 Parameterizing the random matrix problem and the large-matrix limit
To calculate the spectrum of the signal-signal contribution to the NECM,
(86) 
we note that, assuming $m < N$, this matrix is of rank $m$. Thus we can work in the basis where
(87) 
and
(88) 
There are $m$ nontrivial eigenvalues associated with the $m \times m$ block in Eq. (88), while the remaining $N - m$ eigenvalues are zero. The finite-size Stieltjes transform is then of the form
(89) $g_N(z) = \frac{N-m}{N}\,\frac{1}{z} + \frac{m}{N}\, g_m(z), \qquad g_m(z) = \frac{1}{m}\sum_{k=1}^{m}\frac{1}{z - \lambda_k},$
where $\lambda_k$ are the eigenvalues of the $m \times m$ block and $g_m$ is its finite-size Stieltjes transform.
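The rank structure behind Eq. (89) is straightforward to verify numerically: for a noiseless latent-feature covariance, all but $m$ eigenvalues vanish, and the nontrivial ones coincide with those of an $m \times m$ product matrix (a sketch in our notation):

```python
import numpy as np

rng = np.random.default_rng(5)
N, m, T = 60, 5, 200  # hypothetical sizes with m << N

G = rng.standard_normal((N, m))   # latent-feature loadings (our notation)
H = rng.standard_normal((m, T))   # latent time courses (our notation)
C_sig = (G @ H) @ (G @ H).T / T   # N x N signal covariance, rank m

eigs_full = np.linalg.eigvalsh(C_sig)  # ascending order

# All but the top m eigenvalues vanish (up to round-off) ...
print(np.max(np.abs(eigs_full[: N - m])))  # ~ 0

# ... and the m nontrivial ones coincide with the spectrum of the
# m x m product matrix (G^T G)(H H^T) / T.
D = (G.T @ G) @ (H @ H.T) / T
eigs_small = np.sort(np.linalg.eigvals(D).real)
print(np.max(np.abs(eigs_full[N - m :] - eigs_small)))  # ~ 0
```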
Now we note that the $m \times m$ matrix in Eq. (88) is the product of two white Wishart matrices
(90) $D = W_1 W_2,$
where each factor is of the form
(91) $W = \frac{1}{T}\, X X^{\mathsf{T}},$
with $X$ a matrix with i.i.d. standard normal entries. The key parameter characterizing such a Wishart matrix is the shape ratio of $X$,
(92) $q = \frac{m}{T}$
for an $m \times T$ matrix $X$. Since the two factors of $D$ are built from $m \times N$ and $m \times T$ Gaussian matrices, respectively, a natural characterisation of $D$ is then
(93) $q_1 = \frac{m}{N}, \qquad q_2 = \frac{m}{T},$
so that there are only two independent parameters.
In the following, we only consider the limit of large matrices. Here $N$, $T$, and $m$ go to infinity in such a way that $q_1$, $q_2$, and the SNR are all constant. Then, in the thermodynamic limit, the finite-size Stieltjes transform in Eq. (89) becomes
(96) $g(z) = \left(1 - q_1\right)\frac{1}{z} + q_1\, g_D(z),$
where $g$ and $g_D$ are the large-matrix limits of the Stieltjes transforms of the full $N \times N$ matrix and of $D$, respectively.
c.2 The spectrum of $D$
We now compute the eigenvalue density of $D$. The first step is to compute the Stieltjes transform $g_D$. From Eq. (95), it is clear that this reduces to the problem of computing the eigenvalue spectrum of a product of two Wishart matrices.
The spectrum of a product of two free matrices can be computed with the help of the $S$-transform, which is defined for a random matrix $A$ as
(97) $S_A(t) = \frac{t+1}{t\, T_A^{-1}(t)},$
where $T_A^{-1}$ is the functional inverse of the $T$-transform $T_A(z)$. In turn, the $T$-transform is related to the Stieltjes transform of $A$ through the relation
(98) $T_A(z) = z\, g_A(z) - 1.$
Crucially, for free matrices $A$ and $B$, the $S$-transform is multiplicative,
(99) $S_{AB}(t) = S_A(t)\, S_B(t),$
and, furthermore, for a scalar $\alpha$,
(100) $S_{\alpha A}(t) = \frac{1}{\alpha}\, S_A(t).$
For the white Wishart matrix, Eq. (91), the $S$-transform is known to be [4]
(101) $S_W(t) = \frac{1}{1 + qt}.$
Thus we only need to use the multiplicative property of the $S$-transform to compute the signal-signal contribution to the NECM. Specifically,
(102) $S_D(t) = S_{W_1}(t)\, S_{W_2}(t) = \frac{1}{(1 + q_1 t)(1 + q_2 t)}.$
Equation (97) then yields
(103) $T_D^{-1}(t) = \frac{(t+1)(1 + q_1 t)(1 + q_2 t)}{t}.$
We now solve the equation for the functional inverse using the definition of the $T$-transform, Eq. (98), and divide by a common factor of $z$. We obtain a cubic equation for the Stieltjes transform $g_D(z)$:
(104) $z\, g_D - 1 = g_D \left[1 + q_1\left(z g_D - 1\right)\right]\left[1 + q_2\left(z g_D - 1\right)\right].$
Finally, we divide by the leading coefficient to obtain