1. Introduction
The covariance matrix Σ of a p-dimensional random vector Y is of central importance in many statistical analysis procedures. Estimation of Σ or its inverse Ω = Σ⁻¹ (generally known as the precision matrix) is central to, for example, multivariate regression, time-series analysis, canonical correlation analysis, discriminant analysis, and Gaussian graphical modeling. Let Y₁, …, Yₙ be a realization of Y. It is well-known [50] that the sample covariance matrix S is a poor estimate of Σ when the dimension p approaches the sample size n or when p ≥ n. When p approaches n, S will tend to become ill-conditioned. When p > n, S is singular, leaving S⁻¹ undefined. However, many contemporary applications in fields such as molecular biology, neuroimaging, and finance encounter situations of interest where p > n.

The estimation of Σ can be improved by shrinking the eigenvalues of S towards a central value [e.g., 50] or by convexly combining S with some well-conditioned target matrix [e.g., 45]. These solutions define a class of estimators that can be caught under the umbrella term 'ridge-type covariance estimation'. Such estimators depend on a penalty parameter determining the rate of shrinkage, and choosing its value is of prime importance. Available procedures for choosing the penalty have some (situation-dependent) disadvantages: (a) they can be computationally expensive, (b) they can be restricted to special cases within the class of ridge-type estimators, or (c) they are not guaranteed to result in a meaningful penalty value. There is thus some demand for a generic and computationally friendly procedure on the basis of which one can (i) heuristically select an acceptable (minimal) value for the penalty parameter and (ii) assess if more formal procedures have indeed proposed an acceptable penalty value. Here such a tool is provided on the basis of the matrix condition number.
The remainder is organized as follows. Section 2 reviews the class of ridge-type estimators of the covariance matrix. In addition, penalty parameter selection is reviewed and an exposé of the matrix condition number is given. The spectral condition number is central to the introduction of the spectral condition number plot in Section 3. This graphical display is posited as an exploratory tool that may function as a fast and simple heuristic for determining the (minimum) value of the penalty parameter when employing ridge estimators of the covariance or precision matrix. Section 4 illustrates usage of the spectral condition number plot with data from the field of oncogenomics. Section 5 discourses on the software that implements the proposed graphical display. Sections 6 and 7 conclude with a discussion.
2. Eigenstructure regularization and the condition number
2.1. Ridge-type shrinkage estimation
Regularization of the covariance matrix goes back to Stein [50, 51], who proposed shrinking the sample eigenvalues towards some central value. This work spurred a large body of literature [see, e.g., 23, 24, 64, 63]. Of late, two encompassing forms of what is referred to as 'ridge-type covariance estimation' have emerged.
The first form considers convex combinations of S and some positive definite (p.d.) target matrix T [12, 34, 35, 36, 45, 18]:

(1) S(λₐ) = (1 − λₐ)S + λₐT,

with penalty parameter λₐ ∈ (0, 1]. Such an estimator can be motivated from the Steinian bias-variance tradeoff, as it seeks to balance the high-variance, low-bias matrix S with the lower-variance, higher-bias matrix T. The second form is of more recent vintage and considers the ad-hoc estimator [61, 65, 22]:

(2) S(λᵣ) = S + λᵣIₚ,

with λᵣ ∈ (0, ∞). This second form is motivated, much like how ridge regression was introduced by Hoerl and Kennard [26], as an ad-hoc modification of S in order to deal with singularity in the high-dimensional setting. van Wieringen and Peeters [58] show that both (1) and (2) can be justified as penalized maximum likelihood estimators [cf. 61]. However, neither (1) nor (2) utilizes a proper penalty in that perspective. Starting from a proper ridge-type penalty, van Wieringen and Peeters [58] derive an alternative estimator:

(3) Σ̂(λₐ) = {λₐIₚ + (1/4)(S − λₐT)²}^{1/2} + (1/2)(S − λₐT),

with λₐ ∈ (0, ∞). van Wieringen and Peeters [58] show that, when considering a p.d. T, the estimator (3) is an alternative to (1) with superior behavior in terms of risk. When considering the non-p.d. choice T = 0, they show that (3) acts as an alternative to (2), again with superior behavior.
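To make the three forms concrete, the following Python sketch (an illustration, not the paper's R implementation; the form used for estimator (3) is the penalized-ML expression as reconstructed above) computes each estimator for a singular p > n sample covariance matrix:

```python
import numpy as np

def ridge_convex(S, T, lam):
    # estimator (1): convex combination of S and a p.d. target T, lam in (0, 1]
    return (1 - lam) * S + lam * T

def ridge_adhoc(S, lam):
    # estimator (2): ad-hoc ridge modification, lam in (0, inf)
    return S + lam * np.eye(S.shape[0])

def ridge_proper(S, T, lam):
    # estimator (3), assumed form: {lam*I + (1/4)(S - lam*T)^2}^{1/2} + (1/2)(S - lam*T)
    p = S.shape[0]
    D = S - lam * T
    vals, vecs = np.linalg.eigh(lam * np.eye(p) + 0.25 * (D @ D))
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T + 0.5 * D  # symmetric square root

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 25))   # n = 10 samples, p = 25 variables
S = np.cov(X, rowvar=False)         # singular, since p > n
T = np.eye(25)

estimates = [ridge_convex(S, T, 0.5), ridge_adhoc(S, 0.5), ridge_proper(S, T, 0.5)]
for est in estimates:
    assert np.linalg.eigvalsh(est).min() > 0   # all three are p.d., unlike S
```

All three regularized estimates are invertible for any strictly positive penalty, which is the property the remainder of the paper scrutinizes more finely via the condition number.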
2.2. Penalty parameter selection
The choice of penalty value is crucial to the aforementioned estimators. Let λ denote a generic penalty. When choosing λ too small, an ill-conditioned estimate may ensue when p > n (see Section 2.3). When choosing λ too large, relevant data signal may be lost. Many options for choosing λ are available. The ridge estimators, in contrast to ℓ₁-regularized estimators of the covariance or precision matrix [e.g., 19, 3], do not generally produce sparse estimates. This implies that model-selection-consistent methods (such as usage of the BIC) are not appropriate. Rather, for ridge-type estimators, it is more appropriate to seek loss efficiency.
A generic strategy for determining an optimal value of λ that can be used for any ridge-type estimator of Section 2.1 is K-fold cross-validation (of the likelihood function). Asymptotically, such an approach can be explained in terms of minimizing Kullback-Leibler divergence. Unfortunately, this strategy is computationally prohibitive for large p and/or large n. Lian [38] and Vujačić et al. [60] propose approximate solutions to the leave-one-out cross-validation score. While these approximations imply gains in computational efficiency, they are not guaranteed to propose a reasonable optimal value for λ.

Ledoit and Wolf [36] propose a strategy to analytically determine an optimal value of λₐ under a modified Frobenius loss for the estimator (1) under certain choices of T [cf. 18]. This optimal value requires information on the unknown population matrix Σ. The optimal penalty value thus needs to be approximated with some estimate of Σ. Ledoit and Wolf [36] utilize a consistent estimator while Schäfer and Strimmer [45] use the unbiased estimate S. In practice, this may result in overshrinkage [9] or even negative penalty values [45].

Given the concerns stated above, there is some demand for a generic and computationally friendly tool for usage in the following situations of interest:

i. When one wants to speedily determine a (minimal) value of λ for which the estimate is well-conditioned;

ii. When one wants to speedily determine whether an optimal λ proposed by some other (formal) procedure indeed leads to a well-conditioned estimate;

iii. When one wants to speedily determine a reasonable minimal value of λ for usage in a search-grid (for an optimal such value) by other, optimization-based, procedures.
In Section 3 we propose such a tool based on the spectral condition number.
2.3. Spectral condition number
The estimators from Section 2.1 are p.d. when their penalty values are strictly positive. However, they are not necessarily well-conditioned for any strictly positive penalty value when p > n, especially when the penalty takes a value in the lower range. We seek to quantify the condition of the estimators w.r.t. a given penalty value. To this end we utilize a condition number [59, 57].

Consider a nonsingular matrix A as well as a matrix norm ‖·‖. The condition number w.r.t. matrix inversion can be defined as [25]
(4) κ(A) = ‖A‖ ‖A⁻¹‖,

indicating the sensitivity of inversion of A w.r.t. small perturbations ΔA. When the norm in (4) is induced by a vector norm, the condition number is characterized as [25]:

(5) κ(A) = max_{‖x‖=1} ‖Ax‖ / min_{‖x‖=1} ‖Ax‖.
For singular matrices κ(A) would equal ∞. Hence, a high condition number is indicative of near-singularity and quantifies an ill-conditioned matrix. Indeed, the inverse of the condition number gives the relative distance of A to the set of singular matrices [11]. Another interpretation can be found in error propagation. A high condition number implies severe loss of accuracy or large propagation of error when performing matrix inversion under finite-precision arithmetic. One can expect to lose at least log₁₀ κ(A) digits of accuracy in computing the inverse of A [e.g., Chapter 8 and Section 6.4 of, respectively, 7, 21]. In terms of error propagation, κ(A) is also a reasonable sensitivity measure for linear systems Ax = b [25].
We can specify (5) with regard to a particular norm. We have special interest in the condition number κ₂, usually called the spectral condition number for its relation to the spectral decomposition. When A is a symmetric p.d. matrix, it is well-known that [59, 21]

κ₂(A) = d₁(A)/dₚ(A),

where d₁(A) ≥ … ≥ dₚ(A) > 0 are the eigenvalues of A. We can connect the machinery of ridge-type regularization to this spectral condition number.
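As a quick numerical check (an illustrative Python sketch, with numpy assumed), the eigenvalue-ratio characterization coincides with the induced 2-norm condition number:

```python
import numpy as np

# For a symmetric p.d. matrix, the spectral condition number equals the
# ratio of the largest to the smallest eigenvalue; numpy's 2-norm cond agrees.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = A @ A.T + np.eye(5)          # symmetric positive definite by construction
d = np.linalg.eigvalsh(M)        # eigenvalues in ascending order
kappa2 = d[-1] / d[0]
assert np.isclose(kappa2, np.linalg.cond(M, 2))
```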
Let S = VDVᵀ be the spectral decomposition of S, with D denoting a diagonal matrix with the eigenvalues of S on the diagonal and where V denotes the matrix that contains the corresponding eigenvectors as columns. Note that the orthogonality of V implies VVᵀ = VᵀV = Iₚ. This decomposition can then be used to show that, like the Stein estimator [50], estimate (2) is rotation equivariant: S(λᵣ) = V(D + λᵣIₚ)Vᵀ. That is, the estimator leaves the eigenvectors of S intact and thus solely performs shrinkage on the eigenvalues. When choosing T = αIₚ with α ≥ 0, the estimator (3) also is of the rotation equivariant form, as we may then write:

(6) Σ̂(λₐ) = V D(λₐ) Vᵀ, with [D(λₐ)]_jj = {λₐ + (1/4)(d_j − λₐα)²}^{1/2} + (1/2)(d_j − λₐα),

where d_j denotes the jth diagonal entry of D. An analogous decomposition can be given for the estimator (1) when choosing T = αIₚ with α > 0. The effect of the shrinkage estimators can then, using the rotation equivariance property, also be explained in terms of restricting the spectral condition number. For example, using (6), we have:

κ₂(Σ̂(λₐ)) = [D(λₐ)]₁₁ / [D(λₐ)]ₚₚ.

Similar statements can be made regarding all rotation equivariant versions of the estimators discussed in Section 2.1. Similar statements can also be made when considering a target T for estimators (1) and (3) that is not of the form αIₚ, whenever this target is well-conditioned (i.e., has a lower condition number than S).
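Rotation equivariance is easy to verify numerically. The sketch below (illustrative Python, using the ad-hoc estimator (2) for simplicity) confirms that adding λIₚ shifts every eigenvalue by λ while leaving the eigenvectors of S untouched, so the regularized condition number is (d₁ + λ)/(dₚ + λ):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 20))          # p > n
S = np.cov(X, rowvar=False)
lam = 0.3
d, V = np.linalg.eigh(S)                  # ascending eigenvalues, eigenvectors
S_lam = S + lam * np.eye(20)              # estimator (2)
d_lam = np.linalg.eigvalsh(S_lam)

assert np.allclose(d_lam, d + lam)                      # eigenvalues merely shifted
assert np.allclose(S_lam, V @ np.diag(d + lam) @ V.T)   # same eigenvectors
kappa2 = (d[-1] + lam) / (d[0] + lam)
assert np.isclose(kappa2, np.linalg.cond(S_lam, 2))
```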
Example 1.
For clarification consider the following toy example. Assume we have a sample covariance matrix S whose largest eigenvalue d₁ is finite. Additionally assume that S is estimated in a situation where p > n, so that dₚ = 0 and, hence, κ₂(S) = ∞ (under the IEEE Standard for Floating-Point Arithmetic; [27]). Say we are interested in regularization using the estimator (3) with a scalar target matrix T = αIₚ, α > 0. Even using a very small penalty λₐ, it is then quickly verified from (6) that κ₂(Σ̂(λₐ)) is finite. Under a large penalty the condition number lies close to its limit: indeed, van Wieringen and Peeters [58] have shown that, in this rotation equivariant setting, the shrunken eigenvalues [D(λₐ)]_jj attain a common limit as λₐ → ∞ for all j. Hence, κ₂(Σ̂(λₐ)) → 1 as the penalty value grows very large. Section 1 of the Supplementary Material visualizes this behavior (in the given setting) for various scenarios of interest.

Let Σ̂(λ) denote a generic ridge-type estimator of the covariance matrix under generic penalty λ. We thus quantify the conditioning of Σ̂(λ) for given λ (and possibly a given T) w.r.t. perturbations with
(7) κ₂(Σ̂(λ)) = d₁(Σ̂(λ)) / dₚ(Σ̂(λ)).

A useful property is that κ₂(Σ̂(λ)) = κ₂(Σ̂(λ)⁻¹), i.e., knowing the condition of the covariance matrix implies knowing the condition of the precision matrix (so essential in contemporary topics such as graphical modeling). The condition number (7) can be used to construct a simple and computationally friendly visual tool for penalty parameter selection.
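The limiting behavior sketched in Example 1 is easily verified numerically. The Python sketch below uses the ad-hoc estimator (2), for which the shrunken eigenvalues are simply d_j + λ, so that κ₂ explodes as λ → 0 for singular S and tends to 1 as λ grows:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 30))            # p > n, so S is singular
S = np.cov(X, rowvar=False)
d = np.clip(np.linalg.eigvalsh(S), 0.0, None)   # guard tiny negative round-off

def kappa2(lam):
    # condition number of the ad-hoc estimator (2): (d1 + lam) / (dp + lam)
    return (d[-1] + lam) / (d[0] + lam)

assert kappa2(1e-8) > 1e6          # near-singular: huge condition number
assert kappa2(1e3) < 1.1           # heavy shrinkage: close to the limit of 1
assert kappa2(1.0) > kappa2(10.0)  # kappa2 decreases monotonically in lam
```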
3. The spectral condition number plot
3.1. The basic plot
As one may appreciate from the exposition in Section 2.3, when Σ̂(λ) moves away from near-singularity, small increments in the penalty can cause dramatic changes in κ₂(Σ̂(λ)). One can expect that at some point along the domain of λ, its value will be large enough for κ₂(Σ̂(λ)) to stabilize (in some relative sense). We will cast these expectations regarding the behavior in a (loose) definition:
Heuristic Definition.
Let Σ̂(λ) denote a generic ridge-type estimator of the covariance matrix under generic fixed penalty λ. In addition, let δ indicate a real perturbation in λ, as opposed to the theoretical perturbation ΔA. We will term the estimate Σ̂(λ) well-conditioned when small increments δ in λ translate to (relatively) small changes in κ₂(Σ̂(λ + δ)) vis-à-vis κ₂(Σ̂(λ)).
From experience, when considering ridge-type estimation of Σ or its inverse in p > n situations, the point of relative stabilization can be characterized by a leveling-off of the acceleration along the curve when plotting the condition number against the (chosen) domain of λ. Consider Figure 1, which is the first example of what we call the spectral condition number plot.

Figure 1 indeed plots (7) against the natural logarithm of λ. As should be clear from Section 2.3, the spectral condition number displays (mostly) decreasing, concave-upward behavior in the feasible domain of the penalty parameter, with a vertical asymptote at λ = 0 and a horizontal asymptote at the limiting value of the condition number (which amounts to 1 in case of a strictly positive scalar target). The logarithm is used on the x-axis as, especially for estimators (2) and (3), it is more natural to consider orders of magnitude for λ. In addition, usage of the logarithm 'decompresses' the lower domain of λ, which enhances the visualization of the point of relative stabilization, as it is in the lower domain of the penalty parameter where ill-conditioning usually ensues when p > n. Figure 1 uses simulated data (see Section 4 of the Supplementary Material) in a p > n setting. The estimator used is the ad-hoc ridge-type estimator given by (2). One can observe relative stabilization of the spectral condition number – in the sense of the Heuristic Definition – from a certain point in the penalty domain onward. This value can be taken as a reasonable (minimal) value for the penalty parameter. The spectral condition number plot can be a simple visual tool of interest in the situations sketched in Section 2.2, as will be illustrated in Section 4.
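The data behind such a plot can be sketched in a few lines (illustrative Python; the grid endpoints are arbitrary choices). In the rotation equivariant case a single spectral decomposition of S yields the whole solution path:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((15, 50))                 # p > n
S = np.cov(X, rowvar=False)
d = np.clip(np.linalg.eigvalsh(S), 0.0, None)

lams = np.logspace(-6, 3, 200)                    # log-equidistant penalty grid
kappas = (d[-1] + lams) / (d[0] + lams)           # path for estimator (2), all at once

assert np.all(np.diff(kappas) < 0)                # strictly decreasing path
# the curve flattens: early drops dwarf the late ones (relative stabilization)
assert (kappas[0] - kappas[1]) > 1e3 * (kappas[-2] - kappas[-1])
```

Plotting `kappas` against `np.log(lams)` reproduces the qualitative shape of Figure 1.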
3.2. Interpretational aids
The basic spectral condition number plot can be amended with interpretational aids. The software (see Section 5) can add two such aids to form a panel of plots. These aids support the heuristic choice of a penalty value and provide additional information on the basic plot.

The first aid is the visualization of log₁₀ κ₂(Σ̂(λ)) against the domain of λ. As stated in Section 2.3, log₁₀ κ₂(Σ̂(λ)) provides an estimate of the digits of accuracy one can expect to lose (on top of the digit loss due to inherent numerical imprecision) in operations based on Σ̂(λ). Note that this estimate is dependent on the norm. This aid can support choosing a (minimal) penalty value on the basis of the error propagation (in terms of approximate loss in digits of accuracy) one finds acceptable. Figure 2 gives an example.
Let f denote the curve (of the basic plot) that maps the natural logarithm of the penalty value to the condition number of the regularized precision matrix. We seek to approximate the second-order derivative of this curve (the acceleration) at given penalty values in the domain [λ_min, λ_max]. The software (see Section 5) requires the specification of λ_min and λ_max, as well as the number of steps one wants to take along the domain. Say we take B steps, such that λ ∈ {λ₁ = λ_min, …, λ_B = λ_max}. The implementation takes steps that are log-equidistant, hence h = ln(λ_{b+1}) − ln(λ_b) is constant for all b. The central finite difference approximation to the second-order derivative [see, e.g., 37] of f at ln(λ_b) then takes the following form:

f″(ln λ_b) ≈ [f(ln λ_{b+1}) − 2 f(ln λ_b) + f(ln λ_{b−1})] / h²,

which is available for b = 2, …, B − 1. The second visual aid thus plots this approximate acceleration against the feasible domain of λ (see Figure 2 for an example). The condition number always decreases along the domain of the penalty value. This decreasing behavior is not consistently concave upward for (1) (under general non-scalar targets) and (3), as there may be parts of the domain where the behavior is concave downward. This aid may then more clearly indicate inflection points in the regularization path of the condition number. In addition, this aid may put perspective on the point of relative stabilization when the y-axis of the basic plot represents a very wide range.
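The central difference above can be sketched directly (illustrative Python; the function and grid names are mine). A sanity check on a function with known second derivative confirms the formula:

```python
import numpy as np

def second_derivative(f_vals, grid):
    # central finite difference on an equidistant grid:
    # (f[b+1] - 2 f[b] + f[b-1]) / h^2, available for interior points only
    h = grid[1] - grid[0]
    return (f_vals[2:] - 2 * f_vals[1:-1] + f_vals[:-2]) / h**2

# sanity check: f(x) = x^2 has f''(x) = 2 everywhere, and the central
# difference is exact for quadratics (up to floating-point error)
x = np.linspace(0.0, 1.0, 101)
approx = second_derivative(x**2, x)
assert np.allclose(approx, 2.0)
```

For the condition number plot one would feed in `kappas` evaluated on the log-equidistant grid `np.log(lams)`.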
Software implementing the spectral condition number plot is discussed in Section 5. The following section illustrates, using oncogenomics data, the various uses of the spectral condition number plot with regard to covariance or precision matrix regularization. Section 2 of the Supplementary Material contains a second data example to further illustrate usage of the condition number plot. Section 4 of the Supplementary Material contains all R code with which these illustrations can be reproduced (including querying the data).
4. Illustration
4.1. Context and data
Various histological variants of kidney cancer are designated with the amalgamation 'renal cell carcinoma' (RCC). Chromophobe RCC (ChRCC) is a rather rare and predominantly sporadic histological variant, accounting for 4–6% of RCC cases [49]. ChRCC originates in the distal convoluted tubule [47], a portion of the nephron (the basic structural unit of the kidney) that serves to maintain electrolyte balance [52]. Often, histological variants of cancer have a distinct pathogenesis contingent upon the deregulation of certain molecular pathways. A pathway can be thought of as a collection of molecular features that work interdependently to regulate some biochemical function. Recent evidence suggests that (reactivation of) the Hedgehog (Hh) signaling pathway may support cancer development and progression in clear cell RCC (CCRCC) [13, 8], the most common subtype of RCC. The Hh-signaling pathway is crucial in the sense that it "orchestrates tissue patterning" in embryonic development, making it "critical to normal kidney development, as it regulates the proliferation and differentiation of mesenchymal cells in the metanephric kidney" [8]. Later in life, Hh-signaling is largely silenced and constitutive reactivation may elicit and support tumor growth and vascularization [13, 8]. Our goal here is to explore whether Hh-signaling might also be reactivated in ChRCC. The exploration will make use of network modeling (see Section 4.3), in which the network is taken as a representation of a biochemical pathway. This exercise hinges upon a well-conditioned precision matrix.
We attained data on RCC from The Cancer Genome Atlas Research Network [55] as queried through the Cancer Genomics Data Server [6, 20] using the cgdsr R-package [28]. All ChRCC samples were retrieved for which messenger ribonucleic acid (mRNA) data are available, giving a total of n samples. The data stem from the IlluminaHiSeq_RNASeqV2 RNA sequencing platform and consist of normalized relative gene expressions. That is, individual gene expressions are given as mRNA z-scores relative to a reference population that consists of all tumors that are diploid for the gene in question. All features were retained that map to the Hh-signaling pathway according to the Kyoto Encyclopedia of Genes and Genomes (KEGG) [31], giving a total of p gene features. Regularization of the desired precision matrix is needed as p > n. Even though features from genomic data are often measured on or mapped to (approximately) the same scale, regularization on the standardized scale is often appropriate, as the variability of the features may differ substantially when p > n; a point also made by [61]. Note that we may use the correlation matrix instead of S in equations (1) to (3) without loss of generality.

4.2. Penalty parameter selection
The precision estimator of choice is the inverse of (3). The target matrix is chosen as T = αIₚ, with α set to the reciprocal of the average eigenvalue of S. First, the approximate leave-one-out cross-validation (aLOOCV) procedure [38, 60] is used (on the negative log-likelihood) to find an optimal value of λₐ under the given target and data settings. This procedure searches for the optimal value in a given penalty domain. A relatively fine-grained grid of log-equidistant steps along this domain points to a value in the lower end of the domain as being optimal (in the chosen domain). This value seems low given the p/n ratio of the data. This calls for usage-type ii. of the condition number plot (Section 2.2), where one uses it to determine if an optimal penalty as proposed by some procedure indeed leads to a well-conditioned estimate. The condition number is plotted over the same penalty domain considered by the aLOOCV procedure. The left-hand panel of Figure 3 depicts this condition number plot. The green vertical line represents the penalty value that was chosen as optimal by the aLOOCV procedure. Clearly, the precision estimate at this penalty value is not well-conditioned in the sense of the Heuristic Definition. This exemplifies that the (essentially large-sample) approximation to the LOOCV score may not be suitable for non-sparse situations and/or for situations in which the p/n ratio grows more extreme (the negative log-likelihood term then tends to completely dominate the bias term). At this point one could use the condition number plot in accordance with usage-type i., in which one seeks a reasonable minimal penalty value. This reasonable minimal value (in accordance with the Heuristic Definition) can be found further along the penalty domain.
One could worry that the precision estimate retains too much noise under the heuristic minimal penalty value. To this end, a proper LOOCV procedure is implemented that makes use of the root-finding Brent algorithm [4]. The expectation is that the proper data-driven LOOCV procedure will find an optimal penalty value in the domain of λ for which the estimate is well-conditioned. The penalty space of search is thus constrained to the region of well-conditionedness for additional speed, exemplifying usage-type iii. of the condition number plot. Hence, the LOOCV procedure is told to search for the optimal value in the constrained domain. The optimal penalty value is indeed found to the right of the heuristic minimal value. The precision estimate at this penalty value, indicated by the red vertical line in Figure 3, is used in further analysis.
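The constrained search can be sketched as follows (illustrative Python with scipy assumed, using the ad-hoc estimator (2) and a simplified Gaussian LOOCV score; the bounds stand in for the well-conditioned region read off the plot):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_loglik(Sigma, x):
    # Gaussian negative log-likelihood of one held-out observation x
    sign, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (logdet + x @ np.linalg.solve(Sigma, x))

def loocv_score(lam, X):
    # simplified LOOCV: refit estimator (2) without sample i, score sample i
    # (centering here uses the full-sample mean for brevity)
    n, p = X.shape
    center = X.mean(axis=0)
    score = 0.0
    for i in range(n):
        S_i = np.cov(np.delete(X, i, axis=0), rowvar=False)
        score += neg_loglik(S_i + lam * np.eye(p), X[i] - center)
    return score / n

rng = np.random.default_rng(8)
X = rng.standard_normal((12, 20))
# bounded (Brent-type) scalar optimization over a well-conditioned region
res = minimize_scalar(lambda lam: loocv_score(lam, X),
                      bounds=(0.5, 10.0), method="bounded")
assert 0.5 <= res.x <= 10.0
```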
4.3. Further analysis
Biochemical networks are often reconstructed from data through graphical models. Graphical modeling refers to a class of probabilistic models that uses graphs to express conditional (in)dependence relations (i.e., Markov properties) between random variables. Let V = {1, …, p} denote a finite set of vertices that correspond to a collection of random variables Y₁, …, Yₚ with probability distribution P. Let E denote a set of edges, where edges are understood to consist of pairs of distinct vertices, such that {j, j′} ∈ E means that vertex j is connected to vertex j′. We then consider graphs G = (V, E) under the basic assumption that Y follows a multivariate normal distribution with precision matrix Ω. In this Gaussian case, conditional independence between a pair of variables corresponds to zero entries in the precision matrix. Indeed, the following relations can be shown to hold for all pairs {j, j′} with j ≠ j′ [see, e.g., 62]:

Y_j ⟂ Y_j′ | Y_{V∖{j,j′}} ⟺ Ω_{jj′} = 0 ⟺ {j, j′} ∉ E,

where {j, j′} ∉ E indicates the absence of an edge. Hence, the graphical model can be selected by determining the support of the precision matrix. For support determination we resort to the local false discovery rate procedure proposed by [45], retaining only those edges whose empirical posterior probability of being present equals or exceeds a chosen threshold. The right-hand panel of Figure 3 represents the retrieved Markov network on the basis of the regularized precision estimate. The vertex labels are curated gene names of the Hh-signaling pathway genes. The graph seems to retrieve salient features of the Hh-signaling pathway. The Hh-signaling pathway involves a cascade from the members of the Hh-family (IHH, SHH, and DHH) via the SMO gene to ZIC2 and members of the Wnt-signaling pathway. The largest connected component is indicative of this cascade, giving tentative evidence of reactivation of the Hh-signaling pathway in (at least) rudimentary form in ChRCC.

5. Software
The R-package rags2ridges [43] implements (along with functionalities for graphical modeling) the ridge estimators of Section 2.1 and the condition number plot through the function CNplot. This function outputs visualizations such as Figure 1 and Figure 2. The condition number is determined by exact calculation using the spectral decomposition. For most practical purposes this exact calculation is fast enough, especially in rotation equivariant situations, for which only a single spectral decomposition is required to obtain the complete solution path of the condition number. Additional computational speed is achieved by the implementation of core operations in C++ via Rcpp and RcppArmadillo [15, 14]. For example, producing the basic condition number plot for the estimator (3) on the basis of high-dimensional data, using a scalar target matrix and log-equidistant steps along the penalty domain, will take approximately 1.2 seconds (see Section 3 of the Supplementary Material for a benchmark exercise). The additional computational cost of the interpretational aids is linear: producing the panel of plots (including interpretational aids) for the situation just given takes approximately 1.3 seconds. When λ is very small and p > n, the exact calculation of the condition number may suffer from rounding problems (much like the solution of an imaginary linear system based on the regularized estimate), but remains adequate in its indication of ill-conditioning.
When exact computation is deemed too costly in terms of floating-point operations, or when one wants more speed in determining the condition number, the CNplot function offers the option to cheaply approximate the condition number, which amounts to approximating the ℓ₁ condition number κ₁(A) = ‖A‖₁‖A⁻¹‖₁. The κ₁ condition number is computationally less complex than the calculation of κ₂ in non-rotation equivariant settings. The machinery of ridge-type regularization is, however, less directly connected to this condition number (in comparison to the κ₂ condition number). The approximation of κ₁ uses LAPACK routines [2] and avoids overflow. This approximation is accessed through the rcond function from R [54]. The package rags2ridges is freely available from the Comprehensive R Archive Network (http://cran.r-project.org/) [54].
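The two condition numbers are equivalent up to a factor p, so the cheap κ₁ remains informative about κ₂. A short numerical illustration (Python sketch; R's rcond returns the reciprocal of such a 1-norm estimate):

```python
import numpy as np

# kappa1 = ||A||_1 * ||A^{-1}||_1 and kappa2 agree up to a factor p,
# since the 1-norm and 2-norm of a p x p matrix differ by at most sqrt(p):
# kappa1 / p <= kappa2 <= p * kappa1.
rng = np.random.default_rng(5)
p = 6
A = rng.standard_normal((p, p))
M = A @ A.T + 0.1 * np.eye(p)     # symmetric p.d. test matrix

kappa2 = np.linalg.cond(M, 2)
kappa1 = np.linalg.cond(M, 1)     # exact 1-norm condition number
assert kappa1 / p <= kappa2 <= p * kappa1
```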
6. Discussion
The condition number plot is a heuristic tool and heuristics should be handled with care. Below, some cases are presented that serve as notes of caution. They exemplify that the proposed heuristic accompanying the condition number plot should not, as with any statistical technique, be applied without proper inspection of the data.
6.1. Artificial ill-conditioning
A first concern is that, when the variables are measured on different scales, artificial ill-conditioning may ensue [see, e.g., 21]. In case one worries whether the condition number is an adequate indication of error propagation when using variables on their original scale, one can ensure that the columns (or rows) of the input matrix are on the same scale. This is easily achieved by scaling the input covariance matrix to be the correlation matrix. Another issue is that it is not guaranteed that the condition number plot will give an unequivocal point of relative stabilization for every data problem (which hinges in part on the chosen domain of the penalty parameter). Such situations can be dealt with by extending the domain of the penalty parameter or by determining the value of λ that corresponds to the loss of digits (in an imaginary linear system) one finds acceptable.
6.2. Naturally high condition numbers
Some covariance matrices may have high condition numbers as their 'natural state'. Consider the following covariance matrix: Σ = (1 − ρ)Iₚ + ρJₚ, with ρ ∈ (−(p − 1)⁻¹, 1) and where Jₚ denotes the (p × p)-dimensional all-ones matrix. The variates are thus equicorrelated with unit variance. The eigenvalues of this covariance matrix are 1 + (p − 1)ρ and (with multiplicity p − 1) 1 − ρ. Consequently, for ρ ≥ 0 its condition number equals [1 + (p − 1)ρ]/(1 − ρ). The condition number of Σ thus becomes high when the number of variates grows large and/or the (marginal) correlation ρ approaches one (or −(p − 1)⁻¹). The large ratio between the largest and smallest eigenvalues of Σ in such situations mimics a high-dimensional setting in which any nonzero eigenvalue of the sample covariance estimate is infinitely larger than the smallest (zero) eigenvalues. However, irrespective of the number of samples, any sample covariance estimate of such a Σ with large p and ρ close to unity (or −(p − 1)⁻¹) exhibits such a large ratio. Would one estimate this Σ in penalized fashion (even for reasonable sample sizes) and choose the penalty parameter from the condition number plot as recommended, then one would select a penalty parameter that yields a 'well-conditioned' estimate. Effectively, this amounts to limiting the difference between the penalized eigenvalues, which need not give a condition number representative of Σ. Thus, the recommendation to select the penalty parameter from the well-conditioned domain of the condition number plot may in some (perhaps exotic) cases lead to a choice that crushes too much relevant signal (shrinking the largest eigenvalue too much). For high-dimensional settings this may be unavoidable, but for larger sample sizes this is undesirable.
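The stated eigenvalues of the equicorrelation matrix are readily verified (illustrative Python sketch; p and ρ are arbitrary choices):

```python
import numpy as np

# Eigenvalues of (1 - rho) * I + rho * J: 1 + (p - 1) * rho once, and
# 1 - rho with multiplicity p - 1, so the condition number is naturally
# high for large p or rho close to one.
p, rho = 50, 0.9
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
d = np.linalg.eigvalsh(Sigma)

assert np.isclose(d[-1], 1 + (p - 1) * rho)   # leading eigenvalue
assert np.allclose(d[:-1], 1 - rho)           # remaining p - 1 eigenvalues
kappa2 = d[-1] / d[0]
assert np.isclose(kappa2, (1 + (p - 1) * rho) / (1 - rho))
```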
6.3. Contamination
Real data are often contaminated with outliers. To illustrate the potential effect of outliers on the usage of the condition number plot, consider data Y₁, …, Yₙ drawn from a contaminated distribution, typically modeled by a mixture distribution: Y ~ (1 − ε)N(0, Iₚ) + εN(μ, Iₚ), for a shift vector μ with ‖μ‖ = c, some positive constant c, and mixing proportion ε. Then, the expectation of the sample covariance matrix is E(S) = Iₚ + ε(1 − ε)μμᵀ. Its eigenvalues are 1 + ε(1 − ε)c², and 1 for the remaining p − 1 eigendirections. In high-dimensional settings with few samples the presence of any outlier corresponds to mixing proportions clearly deviating from zero. In combination with any substantial c, the contribution of the outlier(s) to the eigenvalues may be such that the contaminated sample covariance matrix is represented as better conditioned (vis-à-vis its uncontaminated counterpart). It is the outlier(s) that will determine the domain of well-conditionedness in such a situation. Then, when choosing the penalty parameter in accordance with the Heuristic Definition, undershrinkage may ensue. One may opt, in situations in which the results are influenced by outliers, to use a robust estimator of S in producing the condition number plot.

7. Conclusion
We have proposed a simple visual display that may be of aid in determining the value of the penalty parameter in ridge-type estimation of the covariance or precision matrix when the number of variables is large relative to the sample size. The visualization we propose plots the spectral condition number against the domain of the penalty parameter. As the value of the penalty parameter increases, the covariance (or precision) matrix will move away from (near) singularity. In some lower end of the domain this will mean that small increments in the value of the penalty parameter will lead to large decreases of the condition number. At some point, the condition number can be expected to stabilize, in the sense that small increments in the value of the penalty have (relatively) little effect on the condition number. The point of relative stabilization may be deemed to indicate a reasonable (minimal) value for the penalty parameter.
Usage of the condition number plot was exemplified in situations concerned with the direct estimation of covariance or precision matrices. The plot may also be of interest in situations in which (scaled versions of) these matrices are conducive to further computational procedures. For example, it may support the ridge approach to the regression problem y = Xβ + ε. We would then assess the conditioning of (XᵀX + λIₚ) for use in the ridge solution to the regression coefficients: β̂(λ) = (XᵀX + λIₚ)⁻¹Xᵀy.
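The regression use case follows the same eigenvalue-shift logic, as the following Python sketch (illustrative; dimensions are arbitrary) shows for a p > n design:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 15, 40                                  # p > n: X^T X is singular
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

G = X.T @ X
lam = 1.0
# ridge solution beta(lam) = (X^T X + lam * I)^{-1} X^T y
beta = np.linalg.solve(G + lam * np.eye(p), X.T @ y)

d = np.clip(np.linalg.eigvalsh(G), 0.0, None)
assert d[0] < 1e-8                             # unpenalized system is singular
assert np.isfinite(np.linalg.cond(G + lam * np.eye(p), 2))
assert beta.shape == (p,)
```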
We explicitly state that we view the proposed condition number plot as a heuristic tool. We emphasize 'tool', as it gives easy and fast access to penalty value determination without proposing an optimal (in some sense) value. Also, in the tradition of exploratory data analysis [56], usage of the condition number plot requires good judgment. Like any heuristic method, it is not without flaws.
Notwithstanding these concerns, the condition number plot gives access to a fast and generic (i.e., non-target- and non-ridge-type-specific) procedure for regularization parameter determination that is of use when analytic solutions are not available and when other procedures fail. In addition, the condition number plot may aid more formal procedures, in terms of assessing if a well-conditioned estimate is indeed obtained, and in terms of proposing a reasonable minimal value for the regularization parameter for usage in a search grid.
Acknowledgements
This research was supported by grant FP7269553 (EpiRadBio) through the European Community’s Seventh Framework Programme (FP7, 20072013).
References
 [1] American Cancer Society. URL http://www.cancer.org/cancer/prostatecancer/detailedguide/prostatecancersurvivalrates.
 Anderson et al. [1999] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users’ Guide. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 3 edition, 1999.
 Bien and Tibshirani [2011] J. Bien and R. Tibshirani. Sparse estimation of a covariance matrix. Biometrika, 98:807–820, 2011.
 Brent [1971] R. P. Brent. An algorithm with guaranteed convergence for finding a zero of a function. The Computer Journal, 14:422–425, 1971.
 Carvalho et al. [2008] C. M. Carvalho, J. Chang, J. E. Lucas, J. R. Nevins, Q. Wang, and M. West. Highdimensional sparse factor modeling: Applications in gene expression genomics. Journal of the American Statistical Association, 103:1438–1456, 2008.
 Cerami et al. [2012] E. Cerami, J. Gao, U. Dogrusoz, B. E. Gross, S. O. Sumer, B. A. Aksoy, A. Jacobsen, C. J. Byrne, M. L. Heuer, E. Larsson, Y. Antipin, B. Reva, A. P. Goldberg, C. Sander, and N. Schultz. The cBio cancer genomics portal: An open platform for exploring multidimensional cancer genomics data. Cancer Discovery, 2:401–404, 2012.
 Cheney and Kincaid [2008] W. Cheney and D. Kincaid. Numerical Mathematics and Computing. Belmont, CA, USA: Thomson Brooks/Cole, 6th edition, 2008.
 D’Amato et al. [2014] C. D’Amato, R. Rosa, R. Marciano, V. D’Amato, L. Formisano, L. Nappi, L. Raimondo, C. Di Mauro, A. Servetto, F. Fulciniti, A. Cipolletta, C. Bianco, F. Ciardiello, B. M. Veneziani, S. De Placido, and R. Bianco. Inhibition of Hedgehog signalling by NVPLDE225 (Erismodegib) interferes with growth and invasion of human renal cell carcinoma cells. British Journal of Cancer, 111:1168–1179, 2014.
 Daniels and Kass [2001] M. J. Daniels and R. E. Kass. Shrinkage estimators for covariance matrices. Biometrics, 57:1173–1184, 2001.
 de Kroon et al. [2013] A. I. P. M. de Kroon, P. J. Rijken, and C. H. de Smet. Checks and balances in membrane phospholipid class and acyl chain homeostasis, the yeast perspective. Progress in Lipid Research, 52:374–394, 2013.
 Demmel [1987] J. W. Demmel. On condition numbers and the distance to the nearest illposed problem. Numerische Mathematik, 51:251–289, 1987.

 Devlin et al. [1975] S. J. Devlin, R. Gnanadesikan, and J. R. Kettenring. Robust estimation and outlier detection with correlation coefficients. Biometrika, 62:531–545, 1975.
 Dormoy et al. [2009] V. Dormoy, S. Danilin, V. Lindner, L. Thomas, S. Rothhut, C. Coquard, J. J. Helwig, D. Jacqmin, H. Lang, and T. Massfelder. The sonic hedgehog signaling pathway is reactivated in human renal cell carcinoma and plays orchestral role in tumor growth. Molecular Cancer, 8:123, 2009.
 Eddelbuettel [2013] D. Eddelbuettel. Seamless R and C++ Integration with Rcpp. Springer-Verlag, New York, 2013.
 Eddelbuettel and François [2011] D. Eddelbuettel and R. François. Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(8), 2011.
 [16] QIAGEN Ingenuity Target Explorer. URL https://targetexplorer.ingenuity.com.
 Ferrara et al. [2003] N. Ferrara, H. P. Gerber, and J. LeCouter. The biology of VEGF and its receptors. Nature Medicine, 9:669–676, 2003.
 Fisher and Sun [2011] T. J. Fisher and X. Sun. Improved Stein-type shrinkage estimators for the high-dimensional multivariate normal covariance matrix. Computational Statistics and Data Analysis, 55:1909–1918, 2011.
 Friedman et al. [2008] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9:432–441, 2008.
 Gao et al. [2013] J. Gao, B. A. Aksoy, U. Dogrusoz, G. Dresdner, B. Gross, S. O. Sumer, Y. Sun, A. Jacobsen, R. Sinha, E. Larsson, E. Cerami, C. Sander, and N. Schultz. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Science Signaling, 6:pl1, 2013.
 Gentle [2007] J. E. Gentle. Matrix Algebra: Theory, Computations, and Applications in Statistics. New York: Springer Verlag, 2007.
 Ha and Sun [2014] M. J. Ha and W. Sun. Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation. Biometrics, 70:765–773, 2014.
 Haff [1980] L. R. Haff. Empirical Bayes estimation of the multivariate normal covariance matrix. Annals of Statistics, 8:586–597, 1980.
 Haff [1991] L. R. Haff. The variational form of certain Bayes estimators. Annals of Statistics, 19:1163–1190, 1991.
 Higham [1995] D. J. Higham. Condition numbers and their condition numbers. Linear Algebra and its Applications, 214:193–213, 1995.
 Hoerl and Kennard [1970] A. E. Hoerl and R. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12:55–67, 1970.
 IEEE Computer Society [2008] IEEE Computer Society. IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2008, pages 1–70, Aug 2008.
 Jacobsen [2015] A. Jacobsen. cgdsr: R-Based API for Accessing the MSKCC Cancer Genomics Data Server (CGDS), 2015. URL http://CRAN.R-project.org/package=cgdsr. R package version 1.2.5.
 Kaiser [1958] H. F. Kaiser. The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23:187–200, 1958.
 Kaiser [1970] H. F. Kaiser. A second generation little jiffy. Psychometrika, 35:401–415, 1970.
 Kanehisa and Goto [2000] M. Kanehisa and S. Goto. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 28(1):27–30, 2000.
 [32] KEGG. URL http://www.genome.jp/kegg-bin/show_pathway?hsa04370.
 Kutmon et al. [2016] M. Kutmon, A. Riutta, N. Nunes, K. Hanspers, E. L. Willighagen, A. Bohler, J. Mélius, A. Waagmeester, S. R. Sinha, R. Miller, S. L. Coort, E. Cirillo, B. Smeets, C. T. Evelo, and A. R. Pico. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Research, 44(D1):D488–D494, 2016.
 Ledoit and Wolf [2003] O. Ledoit and M. Wolf. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance, 10:603–621, 2003.
 Ledoit and Wolf [2004a] O. Ledoit and M. Wolf. Honey, I shrunk the sample covariance matrix. Journal of Portfolio Management, 30:110–119, 2004a.

 Ledoit and Wolf [2004b] O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88:365–411, 2004b.
 LeVeque [2007] R. J. LeVeque. Finite Difference Methods for Ordinary and Partial Differential Equations: Steady State and Time Dependent Problems. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 2007.
 Lian [2011] H. Lian. Shrinkage tuning parameter selection in precision matrices estimation. Journal of Statistical Planning and Inference, 141:2839–2848, 2011.
 Lin and Perlman [1985] S. Lin and M. Perlman. A Monte Carlo comparison of four estimators of a covariance matrix. In P. R. Krishnaiah, editor, Multivariate Analysis VI, pages 411–429. Amsterdam: North-Holland, 1985.
 Manukyan et al. [2014] A. Manukyan, E. Çene, A. Sedef, and I. Demir. Dandelion plot: a method for the visualization of R-mode exploratory factor analyses. Computational Statistics, 29:1769–1791, 2014.
 Mersmann [2014] O. Mersmann. microbenchmark: Accurate Timing Functions, 2014. URL http://CRAN.R-project.org/package=microbenchmark. R package version 1.4-2.
 Nunez et al. [2008] L. R. Nunez, S. A. Jesch, M. L. Gaspar, C. Almaguer, M. VillaGarcia, M. RuizNoriega, J. PattonVogt, and S. A. Henry. Cell Wall Integrity MAPK Pathway is essential for lipid homeostasis. Journal of Biological Chemistry, 283:34204 –34217, 2008.

 Peeters et al. [2016] C. F. W. Peeters, A. E. Bilgrau, and W. N. van Wieringen. rags2ridges: Ridge Estimation of Precision Matrices from High-Dimensional Data, 2016. URL http://cran.r-project.org/package=rags2ridges. R package version 2.1.
 Pourahmadi [2013] M. Pourahmadi. High-Dimensional Covariance Estimation. Hoboken, NJ: John Wiley & Sons, 2013.
 Schäfer and Strimmer [2005] J. Schäfer and K. Strimmer. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4:art. 32, 2005.
 Schwarz [1978] G. E. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461–464, 1978.
 Shuch et al. [2015] B. Shuch, A. Amin, A. J. Armstrong, J. N. Eble, V. Ficarra, A. Lopez-Beltran, G. Martignoni, B. I. Rini, and A. Kutikov. Understanding pathologic variants of renal cell carcinoma: Distilling therapeutic opportunities from biologic complexity. European Urology, 67:85–97, 2015.
 Siegel et al. [2016] R. L. Siegel, K. D. Miller, and A. Jemal. Cancer Statistics, 2016. CA: A Cancer Journal for Clinicians, 66:7–30, 2016.
 Stec et al. [2009] R. Stec, B. Grala, M. Ma̧czewski, L. Bodnar, and C. Szczylik. Chromophobe renal cell cancer–review of the literature and potential methods of treating metastatic disease. Journal of Experimental & Clinical Cancer Research, 28:134, 2009.
 Stein [1975] C. Stein. Estimation of a covariance matrix. Rietz Lecture. 39th Annual Meeting IMS. Atlanta, Georgia, 1975.
 Stein [1986] C. Stein. Lectures on the theory of estimation of many parameters. Journal of Mathematical Sciences, 34:1373–1403, 1986.
 Subramanya and Ellison [2014] A. R. Subramanya and D. H. Ellison. Distal convoluted tubule. Clinical Journal of the American Society of Nephrology, 9:2147–2163, 2014.
 Taylor et al. [2010] B. S. Taylor, N. Schultz, H. Hieronymus, A. Gopalan, Y. Xiao, B. S. Carver, V. K. Arora, P. Kaushik, E. Cerami, B. Reva, Y. Antipin, N. Mitsiades, T. Landers, I. Dolgalev, J. E. Major, M. Wilson, N. D. Socci, A. E. Lash, A. Heguy, J. A. Eastham, H. I. Scher, V. E. Reuter, P. T. Scardino, C. Sander, C. L. Sawyers, and W. L. Gerald. Integrative genomic profiling of human prostate cancer. Cancer Cell, 18:11–22, 2010.
 R Development Core Team [2011] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2011. URL http://www.R-project.org/. ISBN 3-900051-07-0.
 The Cancer Genome Atlas Research Network [2013] The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature, 499:43–49, 2013.
 Tukey [1977] J. W. Tukey. Exploratory data analysis. AddisonWesley, 1977.
 Turing [1948] A. M. Turing. Roundingoff errors in matrix processes. Quarterly Journal of Mechanics and Applied Mathematics, 1:287–308, 1948.
 van Wieringen and Peeters [2016] W. N. van Wieringen and C. F. W. Peeters. Ridge estimation of inverse covariance matrices from high-dimensional data. Computational Statistics & Data Analysis, 103:284–303, 2016.
 Von Neumann and Goldstine [1947] J. Von Neumann and H. H. Goldstine. Numerical inverting of matrices of high order. Bulletin of the American Mathematical Society, 53:1021–1099, 1947.
 Vujačić et al. [2015] I. Vujačić, A. Abbruzzo, and E. C. Wit. A computationally fast alternative to cross-validation in penalized Gaussian graphical models. Journal of Statistical Computation and Simulation, 85:3628–3640, 2015.
 Warton [2008] D.I. Warton. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103:340–349, 2008.
 Whittaker [1990] J. Whittaker. Graphical Models in Applied Multivariate Statistics. John Wiley & Sons Ltd., Chichester, 1990.
 Won et al. [2013] J. H. Won, J. Lim, S. J. Kim, and B. Rajaratnam. Condition-number-regularized covariance estimation. Journal of the Royal Statistical Society, Series B, 75:427–450, 2013.
 Yang and Berger [1994] R. Yang and J. O. Berger. Estimation of a covariance matrix using the reference prior. Annals of Statistics, 22:1195–1211, 1994.
 Yuan and Chan [2008] K. H. Yuan and W. Chan. Structural equation modeling with near singular covariance matrices. Computational Statistics and Data Analysis, 52:4842–4858, 2008.
1. Behavior of the largest and smallest eigenvalues
This section contains a visual impression (Figure S1) of the behavior of the largest and smallest eigenvalues along the domain of the penalty parameter. The setting is given in Example 1 of the main text.
2. A second illustration
2.1. Context and data
Prostate cancer refers to an adenocarcinoma of the prostate gland. It is the most common solid tumor diagnosed in Western men [48]. Its prognosis is largely determined by metastasis, with low survival rates for metastatic forms in comparison to organ-confined forms [1].
Vascular endothelial growth factor (VEGF) is a signal protein that supports vascularization [17]. Vascular endothelial cells are cells that line the interior of blood vessels. The formation of new blood vessels is pivotal to the growth and metastasis of solid tumors. VEGF sits at the helm of a cascade of signaling pathways that together form the VEGF-signaling pathway. Specifically, the activation of VEGF leads to the activation of the mitogen-activated protein kinase (MAPK) and phosphatidylinositol 3′-kinase (PI3K)–AKT signaling pathways [32]. The VEGF-signaling pathway thus largely consists of two subpathways. These subpathways mediate “the proliferation and migration of endothelial cells and” promote “their survival and vascular permeability” [32]. Hence, VEGF overexpression supports tumor growth and metastasis.
VEGF-signaling is likely active in metastatic prostate cancer. This contention is supported by recent evidence that changes in the PI3K pathway are present in metastatic tumor samples [53]. From a pathway perspective, cancer may be viewed as consisting of a loss of normal biochemical connections and a gain of abnormal biochemical connections. Severe deregulation would then imply the loss of normal subpathways and/or the gain of irregular subpathways. Our goal here is to explore whether the VEGF-signaling pathway in metastatic prostate cancer can still be characterized as consisting of the MAPK and PI3K–AKT subpathways. The exploration makes use of a pathway-based factor analysis in which the retained latent factors are taken to represent biochemical subpathways [cf. 5]. This exercise hinges upon a well-conditioned covariance matrix.
We obtained data on prostate cancer from the Memorial Sloan-Kettering Cancer Center [53], as queried through the Cancer Genomics Data Server [6, 20] using the cgdsr R package [28]. All metastatic samples for which messenger ribonucleic acid (mRNA) data were available were retrieved. The data stem from the Affymetrix Human Exon 1.0 ST array platform and consist of whole-transcript mRNA expression values. All Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC) curated features were retained that map to the VEGF-signaling pathway according to the Kyoto Encyclopedia of Genes and Genomes (KEGG) [31]. Regularization of the desired covariance matrix is needed, as the feature dimension exceeds the sample size. Regularization is performed (as in the illustration in the main text) on the standardized scale.
2.2. Model
We work with standardized data. Hence, let $\mathbf{x}_i$ denote a centered and scaled $p$-dimensional observation vector, available for persons $i = 1, \ldots, n$. The factor model states that
$$\mathbf{x}_i = \boldsymbol{\Lambda}\boldsymbol{\xi}_i + \boldsymbol{\varepsilon}_i,$$
where $\boldsymbol{\xi}_i$ denotes an $m$-dimensional vector of latent variables typically called ‘factors’, and where $\boldsymbol{\Lambda}$ is a $(p \times m)$ matrix whose entries $\lambda_{jk}$ denote the loading of the $j$th variable on the $k$th factor, $j = 1, \ldots, p$, $k = 1, \ldots, m$. Finally, the $\boldsymbol{\varepsilon}_i$ denote measurement errors. Hence, the model can be conceived of as a multivariate regression with latent predictors. An important assumption in this model is that $m < p$: the dimension of the latent vector is smaller than the dimension of the observation vector. The following additional assumptions are made: (i) the observation vectors are independent; (ii) $\boldsymbol{\xi}_i \sim \mathcal{N}_m(\mathbf{0}, \mathbf{I}_m)$; (iii) $\boldsymbol{\varepsilon}_i \sim \mathcal{N}_p(\mathbf{0}, \boldsymbol{\Psi})$; (iv) $\boldsymbol{\Psi}$ is a $p$-dimensional diagonal matrix with strictly positive diagonal entries $\psi_{jj}$; and (v) $\boldsymbol{\xi}_i$ and $\boldsymbol{\varepsilon}_{i'}$ are independent for all $i$ and $i'$. The preceding assumptions establish a distributional assumption on the covariance structure of the observed data vector: (vi) $\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}$.
Hence, to characterize the model, we need to estimate $\boldsymbol{\Lambda}$ and $\boldsymbol{\Psi}$ on the basis of the sample correlation matrix. As $p > n$ for the data at hand, the sample correlation matrix is singular and standard maximum likelihood estimation (MLE) is not available. Recent efforts deal with such situations by (Bayesian) sparsity modeling of the factor model [e.g., 5], i.e., by imposing sparsity constraints on the loadings matrix. Our approach differs. We first ensure that we have a well-conditioned correlation matrix by using a regularized (essentially a Bayesian) estimator. Afterwards, standard MLE techniques are employed to estimate $\boldsymbol{\Lambda}$ and $\boldsymbol{\Psi}$ on the basis of this regularized correlation matrix. Hence, we perform MLE on the basis of a regularized correlation matrix whose penalty-value is (in some sense) deemed optimal.
2.3. Penalty parameter selection
The estimator of choice is again (3) of the main text, where we replace the sample covariance matrix with the sample correlation matrix. The target matrix is chosen as the identity matrix $\mathbf{I}_p$, such that a regularized correlation matrix ensues. Again, the aLOOCV procedure was tried first in finding an optimal value of the penalty parameter $\lambda$ under the given target and data settings, searching along a domain with log-equidistant steps. Again, the procedure pointed to a very small value as being the optimal penalty (in the chosen domain), which seems low given the $p/n$ ratio of the data. The condition number plot (covering the same penalty domain considered by the aLOOCV procedure) indeed indicates that the precision estimate at this value is not well-conditioned in the sense of the Heuristic Definition, exhibiting a very large condition number (Figure S2). A reasonable minimal penalty-value (in accordance with the Heuristic Definition) can be read off the plot. This reasonable minimal value is subsequently used to constrain the search domain to the region of well-conditionedness. A root-finding (by the Brent algorithm [4]) LOOCV procedure is then set to search for the optimal value within this constrained domain. The optimal penalty-value thus found is indicated by the red vertical line in Figure S2.
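The two-stage strategy just described can be sketched generically: constrain the penalty domain using a minimal value read off the condition number plot, then optimize a leave-one-out score within that domain. The functions `ridge_est` and `loocv_score` below are illustrative stand-ins (using the simple S + λI estimator), not rags2ridges internals, and all numeric settings are assumptions:

```r
## Sketch: LOOCV search constrained to the well-conditioned region.
## `ridge_est`, `loocv_score`, and all settings are illustrative stand-ins.
set.seed(1)
n <- 15; p <- 30
X <- scale(matrix(rnorm(n * p), n, p))
ridge_est <- function(S, lambda) S + lambda * diag(nrow(S))

loocv_score <- function(lambda) {
  ## leave-one-out Gaussian negative log-likelihood (up to constants)
  sum(vapply(seq_len(n), function(i) {
    Sig <- ridge_est(cov(X[-i, , drop = FALSE]), lambda)
    xi  <- X[i, ]
    as.numeric(determinant(Sig)$modulus) + as.numeric(xi %*% solve(Sig, xi))
  }, numeric(1)))
}

lambda_min <- 0.1   # assumed minimal well-conditioned value from the plot
opt <- optimize(loocv_score, interval = c(lambda_min, 10))
opt$minimum         # optimal penalty within the constrained domain
```

`optimize` implements Brent-style one-dimensional minimization in base R, which mirrors the root-finding-within-a-constrained-domain idea of the text.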
The Kaiser–Meyer–Olkin index [30] of .98 indicates that a reasonable proportion of variance among the variables might be common variance and, hence, that the (regularized) correlation matrix is suitable for a factor analysis. The regularized estimate is used in further analysis.
2.4. Further analysis
The dimension $m$ of the factor solution is unknown. Hence, the optimal factor dimension needs to be determined in conjunction with the estimation of the model. Now, let $\{\hat{\boldsymbol{\Lambda}}_m, \hat{\boldsymbol{\Psi}}_m\}$ denote the MLE solution under $m$ factors. Then, we determine the optimal dimension (contingent upon the MLE solutions) using the Bayesian Information Criterion (BIC; [46]), which, for the problem at hand, amounts to:
$$\mathrm{BIC}(m) = -2\,\ell\big(\hat{\boldsymbol{\Lambda}}_m, \hat{\boldsymbol{\Psi}}_m\big) + |\boldsymbol{\theta}_m|\ln(n),$$
where $\ell(\cdot)$ denotes the log-likelihood and $|\boldsymbol{\theta}_m|$ indicates the number of free parameters in the $m$-factor model. Figure S3 gives the BIC scores for each dimension allowed by the existence condition $(p - m)^2 \geq p + m$. The solution with the lowest BIC score is deemed optimal, indicating that $m = 2$ is the preferred solution.
The specified FA model is inherently underidentified. Assume $\mathbf{Q}$ is an arbitrary $(m \times m)$ orthogonal matrix. Considering the implied covariance structure of the observed data we may write:
$$\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi} = \boldsymbol{\Lambda}\mathbf{Q}\mathbf{Q}^{\top}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi} = (\boldsymbol{\Lambda}\mathbf{Q})(\boldsymbol{\Lambda}\mathbf{Q})^{\top} + \boldsymbol{\Psi}.$$
This equality implies that, given $\boldsymbol{\Psi}$, there is an infinite number of alternative loading matrices that generate the same covariance structure as $\boldsymbol{\Lambda}$. Thus, in any solution, $\boldsymbol{\Lambda}$ can be made to satisfy additional conditions, which also explains the structure of the existence condition given above. Naturally, any estimation method then requires a minimum of restrictions on $\boldsymbol{\Lambda}$ to attain uniqueness (up to possible polarity reversals in the columns of $\boldsymbol{\Lambda}$). In MLE this is achieved by requiring that $\boldsymbol{\Lambda}^{\top}\boldsymbol{\Psi}^{-1}\boldsymbol{\Lambda}$ be diagonal, along with an order condition on its diagonal elements. As this is a convenience solution that has no direct interpretational meaning, a post-hoc rotation is usually applied once estimation is settled, to enhance interpretation. Here, the Varimax [29] rotation to an orthogonal simple structure was ultimately used, as oblique rotation (which allows for factor correlations) indicated a near-zero correlation between the two retained latent factors (note that any orthogonal representation has equivalent oblique representations).
The final factor solution is represented in Figure S4. This Dandelion plot [40] visualizes the magnitudes of the elements of $\boldsymbol{\Lambda}$ and $\boldsymbol{\Psi}$ and their relation to the two retained factors. The latent factors are taken to represent biochemical subpathways. Table S1 characterizes the factors according to their constituting genes that achieve the highest absolute loadings. These genes are furthermore associated with VEGF-signaling subpathways as indicated by the WikiPathways database [33]. Factor 1 consists mostly of MAPK-related genes. In addition, genes related to the glycerol phospholipid biosynthesis pathway load on factor 1. The MAPK pathway has been associated with lipid homeostasis in yeast [42] and “phospholipid composition and synthesis are similar in yeast and mammalian cells” [10]. Genes whose pathway association is indicated by ‘–’ in Table S1 (PIK3R5 and SPHK2) could not be associated with VEGF-subpathways according to the WikiPathways database. PIK3R5 is, however, related to the MAPK-signaling pathway according to the Ingenuity Target Explorer [16]. Factor 2 consists mostly of genes related to the PI3K/AKT-signaling and prostate cancer pathways. Moreover, this factor also includes general VEGF-signaling pathway genes that are conducive to activating the MAPK and PI3K/AKT cascades. These results suggest that the compositional subpathway structure of the VEGF-signaling pathway in metastatic prostate cancer can still be characterized as consisting of the MAPK and PI3K–AKT cascades. Hence, VEGF-pathway deregulation in metastatic prostate cancer is more likely characterized by intricate changes in its known subpathways than by the loss of these (normal) subpathways or the gain of (abnormal) subpathways.
Factor  HUGO Symbol  Pathway
1  CDC42  MAPK-signaling
   MAPK14  MAPK-signaling
   PIK3R5  –
   PLA2G2C  Glycerol phospholipid biosynthesis
   PLA2G3  Glycerol phospholipid biosynthesis
   PLA2G4E  Glycerol phospholipid biosynthesis
   PPP3R1  MAPK-signaling
   PRKCG  MAPK-signaling
   SPHK1  VEGF-signaling
   SPHK2  –
2  AKT1  PI3K/AKT-signaling in cancer
   AKT2  AKT-signaling / VEGF-signaling
   AKT3  AKT-signaling / VEGF-signaling
   BAD  AKT-signaling / Prostate cancer
   MAP2K2  MAPK-signaling / Prostate cancer
   MAPKAPK2  MAPK-signaling / Prostate cancer
   PIK3R2  AKT-signaling / VEGF-signaling
   PLA2G6  Glycerol phospholipid biosynthesis
   PXN  VEGF-signaling / Prostate cancer
   SHC2  VEGF-signaling
3. Benchmark
A benchmarking exercise was conducted to get an indication of the execution time of the CNplot function. The sample size is not a factor in the execution time, as the function operates on the covariance (or precision) matrix. Hence, time complexity hinges on the dimension $p$ and the number of steps taken along the (specified) domain of the penalty parameter. Execution time was thus evaluated for various combinations of the dimension and the number of steps.
Timing evaluations were done for all ridge estimators mentioned in Section 2.1 of the main text and for all combinations of the considered dimensions and numbers of steps. The basic condition number plot was produced 50 times (on simulated covariance matrices) for each combination of estimator, dimension, and number of steps. In addition, both a rotation equivariant situation (using a scalar target) and a rotation non-equivariant situation (using a non-scalar target) were considered for estimators (1) and (3) of the main text (note that estimator (2) is always rotation equivariant). All execution times of the CNplot call in R were evaluated with the microbenchmark package [41]. All timings were carried out on an Intel Core i7-4702MQ 2.2 GHz processor. The results, stated as median runtimes in seconds, are given in Tables S2 to S4. The code used in this benchmarking exercise can be found in Listing 6 in Section 4 of this Supplement.
Tables S2 to S4 indicate that, in general, the condition number plot can be produced quite swiftly. As stated in the main text: in rotation equivariant situations only a single spectral decomposition is required to obtain the complete solution path of the condition number. Hence, the time complexity of the CNplot call in these situations is approximately $\mathcal{O}(p^3)$ (the worst-case time complexity of a spectral decomposition), as, after the required spectral decomposition, the solution path can be obtained in linear time. Increasing the number of steps along the penalty domain thus comes at little additional computational cost. For the rotation non-equivariant setting the time complexity is approximately $\mathcal{O}(s\,p^3)$, with $s$ the number of steps, as a spectral decomposition is required for each step along the penalty domain. The runtime of the condition number plot function under estimator (3) of the main text is then somewhat longer than the corresponding runtime under estimator (1). This is because the spectral decomposition in the former situation is also used for computing the (relatively expensive) matrix square root. As the number of steps acts as a scaling factor in rotation non-equivariant settings, the computation time of the plot can be reduced by coarsening the search grid along the penalty domain.
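The rotation equivariant shortcut can be illustrated directly: for the archetypal estimator S + λI, one eigendecomposition yields the entire condition number path, whereas the naive route decomposes anew for every grid point yet returns the same numbers. The dimensions and grid below are illustrative assumptions:

```r
## Sketch: one eigendecomposition gives the whole path for S + lambda*I.
## Dimensions and the penalty grid are illustrative assumptions.
set.seed(1)
p <- 100
S <- cov(matrix(rnorm(50 * p), 50, p))
lambdas <- 10^seq(-4, 1, length.out = 100)

## equivariant route: one O(p^3) decomposition, then a linear-time path
d <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
kappa_fast <- (max(d) + lambdas) / (min(d) + lambdas)

## naive route: one O(p^3) decomposition per grid point
kappa_slow <- vapply(lambdas, function(l) {
  e <- eigen(S + l * diag(p), symmetric = TRUE, only.values = TRUE)$values
  max(e) / min(e)
}, numeric(1))
```

Both routes agree to numerical precision; only the work per grid point differs, which is why a finer grid is essentially free in the equivariant case.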
Rotation equivariant setting
.023  .044  .222  .998
.024  .046  .218  .965
.027  .050  .221  1.122
.035  .061  .233  1.188
Rotation non-equivariant setting
.689  3.267  18.801  139.982
1.315  6.661  39.022  278.454
2.703  13.418  76.489  553.812
4.995  26.112  152.427  1,106.002
Rotation equivariant setting
.023  .044  .214  1.179
.024  .048  .218  1.227
.027  .054  .219  1.032
.031  .054  .222  1.046
Rotation non-equivariant setting
.339  1.448  9.701  65.988
.590  2.849  18.128  129.262
1.090  5.385  35.188  281.583
2.366  12.199  75.222  522.910
Rotation equivariant setting
.026  .056  .281  1.606
.026  .059  .282  1.597
.027  .058  .261  1.600
.030  .060  .280  1.689
To put the runtimes of the CNplot call in perspective, we also conducted timing evaluations for the approximate leave-one-out cross-validation (aLOOCV) and root-finding LOOCV procedures as used in the main text (implemented in rags2ridges). Time complexity for the root-finding LOOCV procedure hinges on the dimension $p$ and the sample size $n$. In contrast, time complexity for the aLOOCV procedure depends on $p$, $n$, and the number of steps along the penalty domain. Timings were done for all combinations of the considered dimensions, sample sizes, and (when relevant) numbers of steps. Each procedure was run 50 times (on simulated data) for each such combination. For the root-finding LOOCV procedure both a rotation equivariant situation and a rotation non-equivariant situation were considered. Ridge estimator (3) of the main text was used for all timing evaluations. The results, stated as median runtimes in seconds, are given in Tables S5 and S6. The code used in this exercise can also be found in Listing 6 in Section 4 of this Supplement.
Tables S5 and S6 indicate that, even under these relatively light settings, the runtimes for the aLOOCV and root-finding LOOCV procedures far exceed the runtimes of (corresponding) calls to CNplot. The runtime for the root-finding LOOCV procedure is (given $p$) exacerbated for larger sample sizes. The runtime for the aLOOCV procedure is (given $p$) exacerbated for larger sample sizes, finer search grids, and (not shown) situations of rotation non-equivariance.
Rotation equivariant setting
36.843  186.662
83.932  438.080
Rotation non-equivariant setting
37.151  186.780
84.335  439.607
Rotation equivariant setting
53.059  393.309
100.733  802.741
Rotation equivariant setting
105.372  779.881
212.129  1561.064
4. R Codes
The R libraries needed to run the code snippets below can be found in Listing 1:
The code in Listing 2 will produce Figure 1 of the main text:
The code in Listing 3 will produce (the components of) Figure 2 of the main text:
The code in Listing 4 was used for Section 4 of the main text:
The code in Listing 5 was used for the second illustration contained in Section 2 of this supplement: