On the largest singular values of certain large random matrices with application to the estimation of the minimal dimension of the state-space representations of high-dimension
This paper is devoted to the estimation of the minimal dimension P of the state-space realizations of a high-dimensional time series y, defined as a noisy version of a useful signal with low-rank rational spectral density, the noise being white and Gaussian. The analysis is carried out in the high-dimensional asymptotic regime where the number of available samples N and the dimension of the time series M converge towards infinity at the same rate. In the classical low-dimensional regime, P is estimated as the number of significant singular values of the empirical autocovariance matrix between the past and the future of y, or as the number of significant estimated canonical correlation coefficients between the past and the future of y. Generalizing large random matrix methods previously developed to analyze classical spiked models, the paper studies the behaviour of these singular values and canonical correlation coefficients in the high-dimensional regime. It is proved that they are smaller than certain thresholds depending on the statistics of the noise, except for a finite number of outliers that are due to the useful signal. The number of singular values of the sample autocovariance matrix above the threshold is evaluated and shown to be almost independent of P in general; it therefore cannot be used to estimate P accurately. In contrast, the number s of canonical correlation coefficients larger than the corresponding threshold is shown to be less than or equal to P, and explicit conditions under which it is equal to P are provided. Under the corresponding assumptions, s is thus a consistent estimate of P in the high-dimensional regime. The core of the paper is the development of the necessary large random matrix tools.
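The two statistics discussed above can be illustrated with a minimal NumPy sketch. It is not the paper's procedure: the function names, the number of lags `L`, the block-Hankel construction of the past and future matrices, and the absence of mean removal are all assumptions made for illustration. The abstract does not state the thresholds, so the `threshold` argument is a placeholder that the caller must supply.

```python
import numpy as np

def past_future_matrices(y, L):
    """Stack L lagged copies of y (shape M x N) into past and future matrices.

    Hypothetical construction: column n of `past` holds y[n], ..., y[n+L-1]
    and column n of `future` holds y[n+L], ..., y[n+2L-1].
    """
    M, N = y.shape
    n_cols = N - 2 * L + 1
    if n_cols <= 0:
        raise ValueError("need N >= 2 * L")
    past = np.vstack([y[:, k:k + n_cols] for k in range(L)])
    future = np.vstack([y[:, L + k:L + k + n_cols] for k in range(L)])
    return past, future

def autocov_singular_values(past, future):
    """Singular values of the empirical autocovariance between future and past."""
    n = past.shape[1]
    R_fp = future @ past.conj().T / n
    return np.linalg.svd(R_fp, compute_uv=False)

def canonical_correlations(past, future):
    """Sample canonical correlation coefficients between past and future.

    Assumes both matrices have full row rank. The coefficients are the
    singular values of Qf^* Qp, where Qp and Qf are orthonormal bases of
    the row spaces of `past` and `future`.
    """
    Qp, _ = np.linalg.qr(past.conj().T)
    Qf, _ = np.linalg.qr(future.conj().T)
    s = np.linalg.svd(Qf.conj().T @ Qp, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against round-off slightly above 1

def count_above(values, threshold):
    """Number of values strictly larger than a user-supplied threshold."""
    return int(np.sum(np.asarray(values) > threshold))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.standard_normal((4, 200))  # pure white Gaussian noise, M=4, N=200
    past, future = past_future_matrices(y, L=2)
    print(autocov_singular_values(past, future))
    print(canonical_correlations(past, future))
```

In this sketch, `count_above(canonical_correlations(past, future), threshold)` plays the role of the count s, under the assumption that `threshold` is set to the value derived in the paper.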