Correlated Components Analysis --- Extracting Reliable Dimensions in Multivariate Data
How does one find data dimensions that are reliably expressed across repetitions? For example, in neuroscience one may want to identify combinations of brain signals that are reliably activated across multiple trials or subjects. For a clinical assessment with multiple ratings, one may want to identify an aggregate score that is reliably reproduced across raters. The approach proposed here --- "correlated components analysis" --- is to identify components that maximally correlate between repetitions (e.g. trials, subjects, raters). This can be expressed as the maximization of the ratio of between-repetition to within-repetition covariance, resulting in a generalized eigenvalue problem. We show that covariances can be computed efficiently without explicitly considering all pairs of repetitions, that the result is equivalent to multi-class linear discriminant analysis for unbiased signals, and that the approach also maximize reliability, defined as the mean divided by the deviation across repetitions. We also extend the method to non-linear components using kernels, discuss regularization to improve numerical stability, present parametric and non-parametric tests to establish statistical significance, and provide code.
READ FULL TEXT