- Sum of i.i.d. Beta-distributed variables. Mathematics Stack Exchange, https://math.stackexchange.com/q/3096929 (version: 2019-02-02).
- (2014) Mapping the stereotyped behaviour of freely moving fruit flies. Journal of The Royal Society Interface 11 (99), pp. 20140672.
- (2016) Spectrum of deformed random matrices and free probability.
- (2014) Automated image-based tracking and its application in ecology. Trends in Ecology & Evolution 29 (7), pp. 417–428.
- (2017) Neural manifolds for the control of movement. Neuron 94 (5), pp. 978–984.
- (1953) New light on the correlation coefficient and its transforms. Journal of the Royal Statistical Society, Series B (Methodological) 15 (2), pp. 193–232.
- (2016) Phase separation of binary charged particle systems with small size disparities using a dusty plasma. Phys. Rev. Lett. 116, pp. 115002.
- (2011) Almost sure localization of the eigenvalues in a Gaussian information plus noise model. Application to the spiked models. Electronic Journal of Probability 16, pp. 1934–1959.
- (2018) Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 9 (1), pp. 1–10.
- (1967) Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik 1 (4), pp. 457–483.
- (2017) Collective behavior of place and non-place neurons in the hippocampal network. Neuron 96 (5), pp. 1178–1191.e4.
- (2021) Latent dynamical variables produce signatures of spatiotemporal criticality in large biological systems. Physical Review Letters 126 (11), pp. 118302.
- (2018) Reverse-engineering biological networks from large data sets. In Quantitative Biology: Theory, Computational Methods and Examples of Models, B. Munsky, L. Tsimring, and W. S. Hlavacek (Eds.).
- (2021) Geometry of abstract learned knowledge in the hippocampus. Nature 595, pp. 80–84.
- Gridded climate data. https://psl.noaa.gov/data/gridded/. Accessed: 2021-06-30.
- Revealing the state space of turbulence using machine learning. arXiv preprint arXiv:2008.07515.
- (2018) Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods 15 (10), pp. 805–815.
- (2020) A first course in random matrix theory: for physicists, engineers and data scientists. Cambridge University Press.
- (2016) Shake-The-Box: Lagrangian particle tracking at high particle image densities. Experiments in Fluids 57, pp. 1–27.
- (2014) Zipf's law and criticality in multivariate data without fine-tuning. Physical Review Letters 113 (6), pp. 068102.
- Distributions of singular values for some random matrices. Physical Review E 60 (3), pp. 3389.
- (2019) Three-dimensional time-resolved trajectories from laboratory insect swarms. Scientific Data 6 (1), pp. 1–8.
- (2008) Dimensionality and dynamics in the behavior of C. elegans. PLoS Comput Biol 4 (4), pp. e1000028.
- (2001) Investigating the microenvironments of inhomogeneous soft materials with multiple particle tracking. Physical Review E 64 (6), pp. 061506.
- (2017) Biodiversity effects on ecosystem functioning in a 15-year grassland experiment: patterns, mechanisms, and open questions. Basic and Applied Ecology 23, pp. 1–73.
Appendix A Data distribution for the latent feature model with no noise, its variance, and the large-$q$ limit
Each entry of the latent-features data matrix is given by the sum of products of two i.i.d. standard Gaussian random variables $a_\mu$ and $b_\mu$:
\[ x = \sum_{\mu=1}^{q} a_\mu b_\mu. \]
The product, $z = a_\mu b_\mu$, is distributed according to the normal product distribution:
\[ p(z) = \frac{1}{\pi}\, K_0(|z|), \]
where $K_\nu$ is the modified Bessel function of the second kind:
\[ K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,\mathrm{d}t. \]
To derive the probability density of the latent feature model entries $x$, we first compute the characteristic function $\varphi(t)$ by taking the Fourier transform of the normal product distribution. We then use the fact that the characteristic function of the sum of $q$ independent products is given by $\varphi_q(t) = [\varphi(t)]^q$. The inverse Fourier transform of $\varphi_q$ then yields the sought-after probability density.
Specifically, the characteristic function of the normal product distribution is
\[ \varphi(t) = \int_{-\infty}^{\infty} e^{itz}\,\frac{K_0(|z|)}{\pi}\,\mathrm{d}z = \frac{1}{\sqrt{1+t^2}} \]
for $t \in \mathbb{R}$; in evaluating the transform one uses the identity $\int_{-\infty}^{\infty} e^{itz}\,\mathrm{d}z = 2\pi\,\delta(t)$, where $\delta$ is the Dirac delta function.
The characteristic function of the sum of $q$ products is then given by
\[ \varphi_q(t) = \left(1+t^2\right)^{-q/2}. \]
Finally, performing the inverse transformation, we obtain the probability density function of the sum:
\[ p_q(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\varphi_q(t)\,\mathrm{d}t = \frac{|x|^{\frac{q-1}{2}}\, K_{\frac{q-1}{2}}(|x|)}{2^{\frac{q-1}{2}}\sqrt{\pi}\,\Gamma(q/2)}. \]
Since the probability density function of $x$ is symmetric around zero, the mean of the distribution vanishes, $\langle x \rangle = 0$. The variance is
\[ \sigma_q^2 = \int_{-\infty}^{\infty} x^2\, p_q(x)\,\mathrm{d}x. \]
The integral above can be evaluated in terms of generalized hypergeometric functions ${}_pF_q$. We present the calculation for the case of even $q$ in detail. The result involves the generalized hypergeometric function
\[ {}_pF_q(a_1,\dots,a_p;\, b_1,\dots,b_q;\, z) = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_p)_k}{(b_1)_k \cdots (b_q)_k}\,\frac{z^k}{k!}, \]
where $(a)_k$ is the Pochhammer symbol; the cosecant $\csc$ also appears in the intermediate expressions, which simplify further for even $q$. Putting everything together, we obtain the following expression for the variance
where we have used the fact that the numerator after the first equality vanishes at the evaluation point. We can evaluate the limit on the right-hand side numerically, as shown in Fig. S1, and find that the variance of the latent feature data values is
\[ \sigma_q^2 = q. \]
This is in agreement with the intuition that every latent dimension contributes its own variance to the variance of the data.
We note that, for large values of the number of latent features $q$, the distribution in Eq. (30) becomes normal, in agreement with the central limit theorem:
\[ p_q(x) \xrightarrow{q\to\infty} \frac{1}{\sqrt{2\pi q}}\, e^{-x^2/(2q)}. \]
Crucially, the variance of $x$ remains $q$-dependent. Figure S2 compares the exact analytical expression for the probability distribution and its Gaussian approximation to numerical simulations.
As a final note, if we were interested in the distribution of data with noise, we would need to convolve the density in Eq. (30) with the Gaussian density of the noise.
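As a sanity check on the derivation above, the following Python sketch (with illustrative values of $q$ and the sample size, not taken from the text) compares a histogram of simulated entries against the Bessel-function density, evaluating $K_\nu$ through its integral representation, and verifies that the empirical variance is close to $q$:

```python
import numpy as np
from math import gamma, pi, sqrt

rng = np.random.default_rng(0)

q = 8                       # number of latent features (illustrative)
n = 200_000                 # number of simulated data entries
a = rng.standard_normal((n, q))
b = rng.standard_normal((n, q))
x = (a * b).sum(axis=1)     # entries of the noiseless latent-feature model

var_emp = x.var()           # should be close to q

def bessel_k(nu, xs):
    """K_nu(x) via the integral representation int_0^inf exp(-x cosh t) cosh(nu t) dt."""
    t = np.linspace(0.0, 12.0, 4000)
    dt = t[1] - t[0]
    f = np.exp(-np.outer(xs, np.cosh(t))) * np.cosh(nu * t)
    return ((f[:, :-1] + f[:, 1:]) * 0.5 * dt).sum(axis=1)

def p_q(xs):
    """Analytic density of a sum of q products of standard normal pairs."""
    ax = np.abs(xs)
    nu = (q - 1) / 2
    return ax**nu * bessel_k(nu, ax) / (2**nu * sqrt(pi) * gamma(q / 2))

hist, edges = np.histogram(x, bins=60, range=(-12, 12), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_err = np.max(np.abs(hist - p_q(centers)))
```

The maximal deviation between the histogram and the analytic density is limited by sampling noise at this sample size.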
Appendix B Probability density of the correlation coefficients
For our latent feature model with noise, we calculate here the probability distribution of the entries of the empirical data correlation matrix. Before doing this, a few notes are in order. First, the correlations depend on the basis in which the variables are measured, becoming a diagonal matrix in the special case when the measured variables are the principal axes of the data cloud. Thus, to make statements independent of the basis, we consider the distribution of typical correlations, i.e., correlations in a basis that is random with respect to the principal axes of the data. For a given realization, the data cloud is typically anisotropic, with long directions dominated by the latent feature signal and short directions dominated by noise. In such a random basis, the principal axes of the data cloud do not align with the measured variables for the vast majority of random rotations, and correlations between any random pair of variables have contributions from all latent dimensions. Thus we expect the number of latent dimensions to be imprinted in the distribution of the elements of the correlation matrix, so that the statistics of the elements carry information about the underlying structure of the model.
B.1 Preliminaries: density of the correlation coefficient of two random Gaussian variables
The correlation coefficient of two independent zero-mean variables $u$ and $v$, each sampled $T$ times, is
\[ c = \frac{\sum_{t=1}^{T} u_t v_t}{\sqrt{\sum_{t=1}^{T} u_t^2}\,\sqrt{\sum_{t=1}^{T} v_t^2}}, \]
where the vectors' components are mutually independent, i.i.d. random variables. The correlation coefficient is distributed according to
\[ p(c) = \frac{\left(1-c^2\right)^{\frac{T-3}{2}}}{B\!\left(\tfrac{1}{2}, \tfrac{T-1}{2}\right)}. \]
This can be rewritten in terms of a Beta distribution, $(c+1)/2 \sim \mathrm{Beta}(\beta,\beta)$, where $B(x,y) = \Gamma(x)\Gamma(y)/\Gamma(x+y)$ is the Beta function. Specifically, the density of correlations is given by the symmetric Beta distribution
\[ p(c) = \frac{\left(1-c^2\right)^{\beta-1}}{2^{2\beta-1}\, B(\beta,\beta)}, \]
where the location and scale are set such that the density is defined on the interval of correlation values $[-1,1]$, and
\[ \beta = \frac{T-1}{2}. \]
We also note that the variance of a symmetric Beta distribution on $[-1,1]$ with the shape parameter $\beta$ is
\[ \sigma_c^2 = \frac{1}{2\beta+1}. \]
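This preliminary result is easy to verify numerically. The Python sketch below (with illustrative values of $T$ and of the number of repetitions) builds correlation coefficients of independent zero-mean Gaussian samples, without subtracting sample means, and checks that their variance matches the symmetric-Beta prediction $1/(2\beta+1) = 1/T$:

```python
import numpy as np

rng = np.random.default_rng(1)

T = 25            # number of samples per variable (illustrative)
reps = 100_000    # number of independent correlation coefficients
u = rng.standard_normal((reps, T))
v = rng.standard_normal((reps, T))

# zero-mean variables: no sample-mean subtraction in the correlation
c = (u * v).sum(axis=1) / np.sqrt((u**2).sum(axis=1) * (v**2).sum(axis=1))

beta = (T - 1) / 2               # symmetric-Beta shape parameter
var_pred = 1 / (2 * beta + 1)    # = 1/T
var_emp = c.var()
```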
B.2 Density of correlations in the latent feature model
There are multiple contributions to the correlations among the measured variables. We compute them individually and then combine them. We find that each contribution is distributed according to a symmetric Beta distribution. To obtain the overall density, we approximate the sum of the Beta-distributed terms by a single Beta distribution, whose parameter is obtained by matching its variance to the sum of the variances of the individual components. To perform these analyses, we only keep terms to the leading order in the large-$T$ or large-$N$ limits. Further, we assume that the corresponding expansion parameter is small, in accordance with the classical and intensive regime limits.
We start with the pure noise contribution to the correlations,
\[ c^{(\xi\xi)}_{ij} = \frac{\sum_{t} \xi_{it}\,\xi_{jt}}{\sqrt{\sum_{t}\xi_{it}^2}\,\sqrt{\sum_{t}\xi_{jt}^2}}, \]
where $\xi_{it}$ is the noise. The expression on the right-hand side is the correlation coefficient between two random Gaussian variables sampled $T$ times. Using Eq. (43), we arrive at the symmetric Beta density with the parameter $\beta_\xi = (T-1)/2$, and the variance of this density is
\[ \sigma^2_{\xi\xi} = \frac{1}{T}. \]
Next we compute the density of the pure signal contribution,
\[ c^{(ss)}_{ij} = \frac{\sum_{t} s_{it}\,s_{jt}}{\sqrt{\sum_{t}s_{it}^2}\,\sqrt{\sum_{t}s_{jt}^2}}, \qquad s_{it} = \sum_{\mu=1}^{q} a_{i\mu} b_{\mu t}, \]
so that $\sum_t s_{it}s_{jt} = \sum_{\mu\nu} a_{i\mu}a_{j\nu}\left(\sum_t b_{\mu t} b_{\nu t}\right)$, and similarly for the normalization terms. The expression in parentheses in both of the equations above is a (co)variance of Gaussian random numbers. For $\mu = \nu$, it follows the scaled $\chi^2$-distribution with $T$ degrees of freedom. For $\mu \neq \nu$, it is given by a rescaled version of the distribution in Eq. (30), with $T$ instead of $q$. Crucially, the variance of either is $O(1/T)$ after normalization. Thus, in the limit $T\to\infty$, the terms in parentheses are $T\left[\delta_{\mu\nu} + O(T^{-1/2})\right]$, where the correction is probabilistic, but will be neglected in what follows. We get
\[ c^{(ss)}_{ij} \approx \frac{\sum_{\mu} a_{i\mu}a_{j\mu}}{\sqrt{\sum_{\mu}a_{i\mu}^2}\,\sqrt{\sum_{\mu}a_{j\mu}^2}}. \]
We see that the sought-after correlation is a correlation coefficient between Gaussian variables, but with $q$ samples instead of $T$. Using again Eq. (43), we write the density as the symmetric Beta distribution with the parameter $\beta_s = (q-1)/2$. We remind the reader that Eq. (57) holds to $O(T^{-1/2})$. The variance of this density is
\[ \sigma^2_{ss} = \frac{1}{q}. \]
This expression agrees with numerical simulations very well, cf. Fig. 1.
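A minimal simulation of the noiseless model supports the same conclusion. In the Python sketch below ($N$, $q$, and $T$ are illustrative, with $T \gg q$ so that finite-$T$ corrections are negligible), the off-diagonal entries of the empirical correlation matrix have variance close to $1/q$:

```python
import numpy as np

rng = np.random.default_rng(2)

N, q, T = 200, 10, 20_000   # illustrative sizes; T >> q suppresses finite-T terms
A = rng.standard_normal((N, q))
B = rng.standard_normal((q, T))
X = A @ B                   # pure-signal data, no noise

C = np.corrcoef(X)          # N x N empirical correlation matrix
off = C[np.triu_indices(N, k=1)]
var_emp = off.var()         # predicted to be ~ 1/q to leading order
```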
Finally, for the signal-noise cross terms in the correlation, we have
\[ c^{(s\xi)}_{ij} = \frac{\sum_{t} s_{it}\,\xi_{jt}}{\sqrt{\sum_{t}s_{it}^2}\,\sqrt{\sum_{t}\xi_{jt}^2}}. \]
For the quantity in parentheses in Eq. (60), we define
\[ z = \frac{1}{T}\sum_{t} s_{it}\,\xi_{jt}. \]
This is a covariance between two independent Gaussian random numbers and again follows a rescaled form of the distribution in Eq. (30), with variance $q/T$. Since $T$ is large, the distribution approaches a Gaussian, and we further define $\zeta = z\sqrt{T/q}$, such that $\zeta$ is a unit Gaussian random variable. Thus we obtain the cross term in a form where we have extracted a factor of $\sqrt{q/T}$ to highlight that the expression in parentheses is the correlation between Gaussian random numbers. From this, using Eq. (43), we conclude that the cross term is also distributed according to a symmetric Beta density. The variance of this density is
\[ \sigma^2_{s\xi} = \frac{1}{T}. \]
An analogous expression holds for the other cross-term contribution, $c^{(\xi s)}_{ij}$.
The empirical correlation matrix is given by the weighted sum of the contributions computed above, with the weights set by the relative magnitudes of the signal and noise variances. Each term in this sum follows a Beta distribution as computed above. However, the parameter of each distribution is modified by the corresponding weight in the sum; consequently, the variance of each contribution is rescaled by the square of its weight. To determine an expression for the combined distribution of signal and noise correlations, we make use of the observation that the sum of Beta-distributed variables can be well approximated by a single Beta distribution. We determine the parameter of this Beta distribution by adding the means and the variances of the distributions in the sum, and analytically matching the variance of the single Beta distribution to the total.
In the limit when $T$ and $N$ are large enough that the corresponding finite-size corrections can be neglected, we have the following convergence of the empirical quantities
Consequently the variances of the contributions take the form
Thus, in this limit, the variance of the Beta distribution, Eq. (72), is of the form
Finally, from the relation in Eq. (45), we obtain the parameter $\beta$ of the sought-after Beta distribution.
A comparison between the analytic form of the density and simulated data is shown in Fig. 1 in one limit, and in Fig. S3 for finite parameter values. In the extreme noise limits, the analytic form closely matches the simulation. In the large-noise limit, shown in Fig. S3(b), the density is close to a Gaussian, because the number of observations is large. In the regime of finite parameters, shown in Fig. S3(a), deviations between the analytic form and the simulation appear in part of the range. We expect that these deviations would disappear upon removing the various approximations made in the analytic derivation above.
Appendix C Spectrum of the normalized empirical covariance matrix
To compute the eigenvalue density of the NECM $\hat{C}$, we use the methods of Random Matrix Theory. The standard approach is to compute the finite-size Stieltjes transform
\[ g_N(z) = \frac{1}{N}\,\mathrm{Tr}\!\left[\left(z\mathbb{1} - \hat{C}\right)^{-1}\right], \]
where $\mathbb{1}$ is the identity matrix and $z$ is a complex variable. In the limit of large matrices (the large-$N$, or thermodynamic, limit), the finite-size Stieltjes transform becomes a deterministic function $g(z)$. Then the eigenvalue density is obtained as the imaginary part of the limit of the Stieltjes transform evaluated just below the real axis:
\[ \rho(\lambda) = \frac{1}{\pi}\,\lim_{\epsilon\to 0^+}\,\mathrm{Im}\; g(\lambda - i\epsilon), \]
where $\mathrm{Im}$ denotes the imaginary part.
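As an illustration of this recipe (not taken from the text), the Python sketch below computes the finite-size Stieltjes transform of a sampled white Wishart matrix just below the real axis and compares $\mathrm{Im}\,g/\pi$ to the Marchenko-Pastur density; the sizes and the evaluation point are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

N, T = 1000, 4000           # matrix size and number of samples (illustrative)
H = rng.standard_normal((N, T))
W = H @ H.T / T             # white Wishart matrix
eigs = np.linalg.eigvalsh(W)

# finite-size Stieltjes transform g_N(z) = (1/N) Tr (z*1 - W)^(-1)
def g_finite(z):
    return np.mean(1.0 / (z - eigs))

lam, eps = 1.0, 0.05
rho_num = g_finite(lam - 1j * eps).imag / np.pi   # density estimate at lam

# Marchenko-Pastur density for aspect ratio c = N/T
c = N / T
lp, lm = (1 + np.sqrt(c)) ** 2, (1 - np.sqrt(c)) ** 2
rho_mp = np.sqrt((lp - lam) * (lam - lm)) / (2 * np.pi * c * lam)
```

The small, finite $\epsilon$ acts as a Lorentzian smoothing of the empirical spectrum, so the two densities agree up to $O(\epsilon)$ and finite-size corrections.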
We start by writing again the definition of the normalized empirical covariance matrix (NECM), which differs from the correlation matrix only by its normalization, with the expected variance of the data taking the place of the empirical standard deviations:
\[ \hat{C}_{ij} = \frac{1}{T\sigma^2}\sum_{t=1}^{T} x_{it}\, x_{jt}. \]
The NECM contains three different contributions: one from the pure latent feature signal, one from pure noise, and two cross terms between the latent signal and the noise. Each contribution is an $N \times N$ random matrix. Critical to computing the eigenvalue density of sums and products of random matrices is the concept of matrix freeness, which is the generalization of statistical independence to matrices. The eigenvalue spectrum of sums and products of free matrices can be computed from the spectra of the summands and factors using the $R$- and the $S$-transforms, which are related to the Stieltjes transform and are additive and multiplicative, respectively. The signal-signal and the noise-noise contributions in the NECM definition are certainly free w.r.t. each other. We will argue in Appendix C.3 that, in our regimes of interest (the zero-noise limit, the classical statistics limit from Eq. (9), and the intensive limit from Eq. (10)), the cross-term contributions are negligible, so that we can drop them and approximate the NECM as the sum of the signal-signal and the noise-noise terms,
so that free matrix theory applies.
C.1 Parameterizing the random matrix problem and the large matrix limit
To calculate the spectrum of the signal-signal contribution to the NECM,
\[ \hat{C}^{(s)} \propto \frac{1}{T}\,(AB)(AB)^{\mathsf T}, \]
where $A$ and $B$ are the $N \times q$ and $q \times T$ latent factor matrices, we note that, assuming $q < N$, this matrix is of rank $q$. Thus we can work in the basis where it is block-diagonal, with a single non-trivial $q \times q$ block. There are $q$ non-trivial eigenvalues associated with this block, while the remaining $N - q$ eigenvalues are zero. The finite-size Stieltjes transform is then of the form
\[ g_N(z) = \frac{N-q}{N}\,\frac{1}{z} + \frac{q}{N}\, g_q(z), \qquad g_q(z) = \frac{1}{q}\sum_{k=1}^{q}\frac{1}{z-\lambda_k}, \]
where $\lambda_k$ are the eigenvalues of the non-trivial block and $g_q$ is its finite-size Stieltjes transform.
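The rank argument is easy to confirm directly. In the Python sketch below (illustrative sizes; the factor matrices are assumed to have i.i.d. standard normal entries), the signal-signal matrix built from $N \times q$ and $q \times T$ Gaussian factors has exactly $q$ numerically non-zero eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(4)

N, q, T = 300, 20, 5000     # illustrative sizes with q < N
A = rng.standard_normal((N, q))
B = rng.standard_normal((q, T))
S = A @ B
Cs = S @ S.T / T            # signal-signal contribution (unnormalized)

eigs = np.sort(np.linalg.eigvalsh(Cs))[::-1]
n_nonzero = int(np.sum(eigs > 1e-8 * eigs[0]))   # numerical rank
```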
Now we note that the non-trivial block in Eq. (88) is the product of two white Wishart matrices,
\[ W_a = \frac{1}{N}\,A^{\mathsf T} A, \qquad W_b = \frac{1}{T}\,B B^{\mathsf T}, \]
where each Wishart matrix is of the form $W = \frac{1}{T} H H^{\mathsf T}$, and $H$ is a matrix with i.i.d. standard normal entries. The key parameter characterizing such a standard Wishart matrix is the aspect ratio of $H$. Since $A$ and $B$ are $N \times q$ and $q \times T$ matrices, respectively, a natural characterization of the product is the pair of ratios
\[ c_1 = \frac{q}{N}, \qquad c_2 = \frac{q}{T}, \]
with $N/T = c_2/c_1$, so that there are only two independent parameters.
In the following, we only consider the limit of large matrices, in which $N$, $T$, and $q$ go to infinity in such a way that $c_1$, $c_2$, and the SNR are all held constant. Then, in the thermodynamic limit, the finite-size Stieltjes transform in Eq. (C.1) becomes
\[ g(z) = (1 - c_1)\,\frac{1}{z} + c_1\, g_{ab}(z), \]
where $g$ and $g_{ab}$ are the large-matrix limits of the Stieltjes transforms of the full signal-signal contribution and of the product $W_a W_b$, respectively.
C.2 The spectrum of the signal-signal contribution
We now compute the eigenvalue density of the signal-signal contribution. The first step is to compute its Stieltjes transform. From Eq. (95), it is clear that this reduces to the problem of computing the eigenvalue spectrum of a product of two Wishart matrices.
The spectrum of a product of two free matrices can be computed with the help of the $S$-transform, which is defined for a random matrix $M$ as
\[ S_M(t) = \frac{1+t}{t}\,\chi_M(t), \]
where $\chi_M$ is the functional inverse of the $\psi$-transform, $\psi_M(\chi_M(t)) = t$. In turn, the $\psi$-transform is related to the Stieltjes transform of $M$ through the relation
\[ \psi_M(z) = \frac{1}{z}\, g_M\!\left(\frac{1}{z}\right) - 1. \]
Crucially, for free matrices $M_1$ and $M_2$, the $S$-transform is multiplicative,
\[ S_{M_1 M_2}(t) = S_{M_1}(t)\, S_{M_2}(t), \]
and, furthermore, for a scalar $\alpha > 0$,
\[ S_{\alpha M}(t) = \frac{1}{\alpha}\, S_M(t). \]
Thus we only need to use the multiplicative property of the $S$-transform to compute the signal-signal contribution to the NECM. Specifically, using the $S$-transform of a white Wishart matrix, $S_W(t) = 1/(1+c\,t)$ with the appropriate aspect ratio $c$, we obtain
\[ S_{W_a W_b}(t) = S_{W_a}(t)\, S_{W_b}(t) = \frac{1}{(1 + c_1 t)(1 + c_2 t)}. \]
Equation (97) then yields
\[ \chi_{W_a W_b}(t) = \frac{t}{(1+t)(1 + c_1 t)(1 + c_2 t)}. \]
We now solve the equation for the functional inverse using the definition of the $\psi$-transform, Eq. (98), and divide by a common factor. Writing $h(z) = z\, g_{ab}(z)$, we obtain a cubic equation for the Stieltjes transform $g_{ab}$:
\[ c_1 c_2\, h^3 + \left(c_1 + c_2 - 2 c_1 c_2\right) h^2 + \left(1 - c_1 - c_2 + c_1 c_2 - z\right) h + z = 0. \]
Finally, we divide by $c_1 c_2$ to obtain the cubic in monic form.
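Under the parameterization assumed in this appendix ($c_1 = q/N$, $c_2 = q/T$, and white Wishart factors), the cubic can be solved numerically at each $z = \lambda - i\epsilon$ and the resulting density compared against sampled spectra of products of two Wishart matrices. The Python sketch below is an illustration of that consistency check with illustrative sizes, not the paper's own code; inside the bulk, the physical branch is the root with positive imaginary part of $g$:

```python
import numpy as np

rng = np.random.default_rng(5)

q, N, T = 400, 1600, 1600         # illustrative sizes
c1, c2 = q / N, q / T             # assumed aspect-ratio parameters

def rho_cubic(lam, eps=1e-3):
    """Density of eigenvalues of a product of two white Wisharts, from the
    cubic for h(z) = z*g(z); physical branch has Im g > 0 just below the axis."""
    z = lam - 1j * eps
    coeffs = [c1 * c2,
              c1 + c2 - 2 * c1 * c2,
              1 - c1 - c2 + c1 * c2 - z,
              z]
    g = np.roots(coeffs) / z
    return max(g.imag) / np.pi

# empirical spectra of W_a W_b over several realizations
eig_list = []
for _ in range(10):
    A = rng.standard_normal((N, q))
    B = rng.standard_normal((q, T))
    Wa = A.T @ A / N
    Wb = B @ B.T / T
    eig_list.append(np.linalg.eigvals(Wa @ Wb).real)
eigs = np.concatenate(eig_list)

mean_eig = eigs.mean()                               # tends to 1 for large sizes
rho_emp = np.mean((eigs > 0.9) & (eigs < 1.1)) / 0.2 # histogram density near 1
rho_th = rho_cubic(1.0)
```

The product $W_a W_b$ is not symmetric, but it is similar to a symmetric positive matrix, so its eigenvalues are real and positive; taking the real part only strips numerical round-off.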