Eigenanalysis plays a central role in many aspects of machine learning because the natural coordinates and relative lengths of a multifactor dataset are revealed by the analysis Hastie:2001 ; Alpaydin:2010 ; Kuhn:2013 ; Scholkopf:2002 . Singular-value decomposition and eigendecomposition are two of the most common instances of eigenanalysis Strang:1988 ; Golub:2014 , yet there is a wide range of adaptation, such as eigenface FaceRecognition:2011 . The eigenanalysis ecosystem is so important and well studied that highly optimized implementations are available in core numerical libraries such as LAPACK lapack:1999 .
This article does not seek to suggest or make any improvement to the existing theoretical or numerical study of eigenanalysis as it applies to any one standalone problem. Instead, the focus of this article is on the improvement of eigenanalysis when applied repeatedly to an evolving dataset. With today’s world being awash in data, and with machine-learning techniques being applied regularly, if not continuously, on ever-expanding datasets as new data arrives, the evolution of the eigensystem ought to be included in any well-conceived study. Yet, the researcher will find that the orientation of the eigenvector basis produced by current eigenanalysis algorithms is not consistent, that it may change according to hardware and software, and that it is sensitive to data perturbations.
This paper demonstrates a method for the consistent orientation of an eigensystem, in which the orientation is made consistent after an initial call to a canonical eigenanalysis routine. This method, however, does not address the issue of a correct orientation: correctness is context specific and must be addressed in a manner exogenous to eigenanalysis. The analyst will need to know, for instance, whether a positive change in the factors aligned to particular eigenvectors should imply a positive or negative change in a dependent feature. In this article, consistency alone is sought.
As a consequence of the method detailed below, the evolution of the eigenvectors themselves can be tracked. Eigenvectors that span lie on a hypersphere , and the mean pointing direction on the sphere and dispersion about the mean are statistics of interest, as is any statistically significant drift. Basic results from directional statistics (see Mardia:2000 ; Jammal:2001 ; Ley:2017 ) are used to study the mean direction and dispersion of the common and differential modes over eigenvector evolution.
2 Inconsistent Orientation
Given a square, symmetric matrix, , with real-valued entries, , the eigendecomposition is
where the columns of are the eigenvectors of , the diagonal entries along are the eigenvalues, and entries in both are purely real. Moreover, when properly normalized, , indicating that the eigenvectors form an orthonormal basis.
To inspect how the eigenvectors might be oriented, write out as , where
is a column vector, and take the inner product two ways:
A choice of is indistinguishable from a choice of because in either case the equality in (1) holds. In practice, the eigenvectors take an inconsistent orientation, even though they will have a particular orientation once returned from a call to an eigenanalysis routine.
A similar situation holds for singular-value decomposition (SVD). For a centered panel of data where , SVD gives
where the columns in panel are the projections of the columns of onto the eigenvectors , scaled to have unit variance. The eigenvectors have captured the natural orientation of the data. Unlike the example above, the eigensystem here is positive semidefinite, so all eigenvalues in are nonnegative, and in turn a scatter plot of data in typically appears as an ellipsoid in (see figure 1). Following the thought experiment above, a sign flip of an eigenvector in flips the sign of the corresponding column in , thereby preserving the equality in (2). Consequently, the eigenvector orientations produced by SVD also have inconsistent orientation, since either sign satisfies the equations.
In isolation, such inconsistent eigenvector orientation does not matter. The decomposition is correct and is generally produced efficiently. However, in subsequent analysis of the eigensystem—principal component analysis (PCA) being a well known example Hastie:2001 —and for an evolving environment, the inconsistency leads to an inability to interpret subsequent results and the inability to track the eigenvector evolution.
For instance, a simple linear regression performed on the eigensystem (whether with reduced dimension or not) with dependent variable reads
where the weight vector is , . Each entry in corresponds to a column in , which in turn corresponds to an eigenvector in . If the sign of one of the eigenvectors is flipped (compare left and right in figure 1), the corresponding column in has its sign flipped, which flips the sign of the corresponding entry. In an evolving environment, a particular regression might be denoted by index , and the th regression reads
When tracking any particular entry in the vector, say , it will generally be found that flips sign throughout the evolution of system (see figure 2). There are two consequences: First, for a single regression, the interpretation of how a change of the th factor corresponds to a change in is unknown because the th eigenvector can take either sign. Second, as the data evolves, the sign of will generally change, even if the unsigned value remains fixed, so that the time series of cannot be meaningfully analyzed. If these sources of sign flips can be eliminated, then a time series of weights can be interpreted as we might have expected (see figure 2 lower right).
3 Toward a Consistent Orientation
To construct a consistent eigenvector basis we must begin with a reference, and SVD offers a suitable start. The data in matrix of (2) is presented with columns, where, typically, each column refers to a distinct observable or feature. Provided that the data is well formed, such as having zero mean, bound variance, and negligible autocorrelation along each column, the row-space of forms the constituent basis with basis vectors such that (see figure 1). If we label the columns in by , then the basis when materialized onto the constituent axes is simply
Now, when plotted as a scatter plot in , the data in the rows of typically form an ellipsoid in the space. When the columns of have zero linear correlation, the axes of the ellipsoid align to the constituent axes. As correlation is introduced across the columns of , the ellipsoid rotates away. Given such a case, we might seek to rotate the ellipsoid back onto the constituent axes. The constituent axes can therefore act as our reference.
This idea has good flavor, but is incomplete. An axis is different from a vector, for a vector may point along either the positive or negative direction of a particular axis. Therefore, ambiguity remains. To capture this mathematically, let us construct a rotation matrix such that rotates the data ellipse onto the constituent axes. A modified SVD will read
If it were the case that , then the goal would be achieved. However, is not true. Instead,
is the general case, where the sign on the th entry depends on the corresponding eigenvector orientation in . While the following is an oversimplification of the broader concept of orientation, it is instructive to consider that a basis in may be oriented in a right- or left-handed way. It is not possible to rotate one handedness onto the other because of the embedded reflection.
It is worth noting that an objection might be raised at this point because the eigenvector matrix
is itself a unitary matrix withand therefore serves as the sought-after rotation. Indeed, . The problem is that is quadratic, so any embedded reflections cancel. In order to identify the vector orientation within , we must move away from the quadratic form.
4 A Representation for Consistent Orientation
The representation detailed herein uses the constituent basis as the reference basis and seeks to make (5) an equality for a particular . To do so, first define the diagonal sign matrix as
The product is , and clearly . Given an entry , the product reflects the th eigenvector such that while leaving the other eigenvectors unaltered.
An equality can now be written with the representation
given suitable rotation and reflection matrices and . This representation says that a rotation matrix can be found to rotate onto the constituent basis , provided that a diagonal reflection matrix is found to selectively flip the sign of eigenvectors in to ensure complete alignment in the end. Once (7) is satisfied, the oriented eigenvector basis can be calculated in two ways:
To achieve this goal, the first step is to order the eigenvalues in descending order, and reorder the eigenvectors in accordingly. (Since the representation (7) holds for real symmetric matrices and not solely for positive semidefinite systems, the absolute value of the eigenvalues should be ordered in descending order.) Indexing into below expects that this ordering has occurred.
The representation (7) is then expanded so that each eigenvector is inspected for a reflection, and a rotation matrix is constructed for its alignment to the constituent. The expansion reads
There is a good separation of concerns here: reflections are imparted by the operators to the right of , while rotations are imparted by the operators to the left. Reflections do not preserve orientation whereas rotations do preserve orientation.
The algorithm created to attain a consistent orientation follows the pattern
For each eigenvector, a possible reflection is computed first, followed by a rotation. The result of the rotation is to align the target eigenvector with its associated constituent basis vector. The first rotation is performed on the entire space , the second is performed on the subspace so that the alignment is preserved, the third rotation operates on to preserve the preceding alignments, and so on. The solution to (7) will include reflection elements and rotation angles embedded in the rotation matrices .
Before working through the solution, it will be helpful to walk through an example of the attainment algorithm in .
5 Algorithm Example in
This example illustrates a sequence of reflections and rotations in to attain alignment between and , and along the way shows cases in which a reflection or rotation is simply an identity operator.
A example eigenvector is created with a left-hand basis, such that an alignment to the constituent basis would yield . The precondition of ordered eigenvalues has already been met. Figure 3 (a) shows the relative orientation of to .
The first step is to consider whether or not to reflect , the purpose being to bring
into the nonnegative side of the halfspace created by the hyperplane perpendicular to. To do so, the relative orientation of is measured with an inner product. The reflection entry is then determined from
(with ). From this, is formed. A consequence of the resultant orientation is that the angle subtended between and is .
The next step is to align with through a rotation : the result is shown in figure 3 (b). The principal result of is that : the basis vectors are aligned. Significantly, the remaining vectors are perpendicular to , and therefore the subsequent rotation needs only to act in the subspace.
With this sequence of operations in the full space now complete, the subspace is treated next. As before, the first step is to inspect the orientation of with . However, care must be taken because is no longer in its original orientation; instead, it was rotated with the whole basis during , as shown in figure 3 (b). It is the new orientation of that needs inspection, and that orientation is . As seen in figure (b), and point in opposite directions, so eigenvector needs to be reflected. We have
which in this example is . The updated representation of the eigenbasis is , as seen in figure 3 (c).
To complete work in , rotation is found to bring into alignment with . The sequence is illustrated in figure 3 (d).
The solution needs to work through the last subspace to align . The reflection entry is evaluated from
where in this example. From this we arrive at . Canonically, a final rotation will carry into , but since we are in the last subspace of , the operation is a tautology. Still, is included in the representation (9) for symmetry.
Lastly, the final, oriented eigenvector basis can be calculated two ways,
compare with (8).
In summary, the algorithm walks down from the full space into two subspaces in order to perform rotations that do not upset previous alignments. As this example shows, , so there are in fact two reflection operators that are simply identities. Moreover, , since the prior rotations automatically align that last basis vector to within a reflection.
6 Algorithm Details
The algorithm that implements (9) is now generalized to . There are three building blocks to the algorithm: sorting of the eigenvalues in descending order, calculation of the reflection entries, and construction of the rotation matrices. Of the three, construction of the rotation matrices requires the most work and has not been previously addressed in detail.
6.1 Sorting of Eigenvalues
The purpose of sorting the eigenvalues in descending order and rearranging the corresponding eigenvectors accordingly is to ensure that the first rotation operates on the axis of maximum variance, the second rotation on the axis of next-largest variance, and so on, the reason being that the numerical certainty of the principal ellipsoid axis is higher than that of any other axis. Since the rotations are concatenated in this method, an outsize error introduced by an early rotation persists throughout the calculation. By sorting the eigenvalues and vectors first, any accumulated errors will be minimized.
Since the algorithm works just as well for real-valued symmetric matrices, it is reasonable to presort the eigenvalues in descending order of their absolute value. From a data-science perspective, we generally expect positive semidefinite systems; therefore a system with negative eigenvalues is unlikely, and if one is encountered, there may be larger upstream issues to address.
6.2 Reflection Calculation
Reflection entries are calculated by inspecting the inner product between a rotated eigenvector and its associated constituent basis (several examples are given in the previous section). The formal reflection-value expression for the th basis vector is
Despite of the apparent complexity of these expressions, the practical implementation is simply to inspect the sign of the entry in the matrix.
6.3 Rotation Matrix Construction
One way to state the purpose of a particular rotation is that the rotation reduces the dimension of the subspace by one. For an eigenvector matrix in , denoted by , the rotation acts such that
Here, ensures that the entry on the right is . Next, the sign of the matrix entry is inspected and set accordingly. The next rotation decrements the dimension again such that
As before, ensures that the entry on the right is . The entries along the diagonal reflect the fact that a eigenvector has been aligned with its corresponding constituent basis vector. Subsequent rotations preserve the alignment by operating in an orthogonal subspace.
6.3.1 Alignment to One Constituent Basis Vector
To simplify the notation for the analysis below, let us collapse and all operations prior to the th rotation into a single working matrix, denoted by . The left-hand side of the two equations above then deals with and . The th column of the th working matrix is , and the th column entry is .
With this notation, the actions of and above on the principal working columns and are
Strictly speaking, the entries in the column vectors on the right-hand sides are , but since is expected to be orthonormal, the inner product is unity.
, and form the basis for the implementation of QR decomposition (see alsoNRC:1992 ). There are two approaches to the solution: use of Householder reflection matrices, or use of Givens rotation matrices. The Householder approach is reviewed in section 7.1 below, but in the end it is not suitable in this context. The Givens approach works well, and the solution here has an advantage over the classical formulation because we can rely on .
To simplify the explanation once again, let us focus on the four-dimensional space . In this case, needs to zero out three entries below the pivot, and likewise needs to zero out two entries below its pivot. Since one Givens rotation matrix can zero out one column entry at most, a cascade of Givens matrices is required to realize the pattern in (16) and the length of the cascade depends on the dimension of the current subspace.
A Givens rotation imparts a rotation within an plane embedded in a higher-dimensional space. While a simple two-dimensional counterclockwise rotation matrix is
an example Givens rotation matrix in is
where the dot notation on the right represents the pattern of nonzero entries. The full rotation is then decomposed into a cascade of Givens rotations:
The order of the component rotations is arbitrary and is selected for best analytic advantage. That said, a different rotation order yields different Givens rotation angles, so there is no uniqueness to the angles calculated below. The only material concern is that the rotations must be applied in a consistent order.
A advantageous cascade order of Givens rotations to expand the left-hand equation in (16) follows the pattern
where the rotation has been moved to the other side of the original equation, and denotes a vector in , in which all entries are zero except for a unit entry in the th row. Each Givens matrix moves some of the weight from the first row of into a specific row below, and otherwise the matrices are not coupled.
In the same vein, Givens rotations to represent follow the pattern
Similar to before, weight in the entry of is shifted into the third and fourth rows of the final vector. It is evident that requires only one Givens rotation, and that for
is simply the identity matrix.
6.3.2 Solution for Givens Rotation Angles
Now, to solve for the Givens angles, the cascades are multiplied through. For the cascade in (18), multiplying through yields
where entries denote row entries in and are used only to further simplify the notation. This is the central equation, and it has a straightforward solution. Yet first, there are several important properties to note:
The norms of the two column vectors are both unity.
While there are four equations, the three angles are the only unknowns. The fourth equation is constrained by the norm.
The entry (or specifically the entry) is nonnegative: . This holds by construction because the associated reflection matrix, applied previously, ensures the nonnegative value of this leading entry.
In order to uniquely take the arcsine, the rotation angles are restricted to the domain . As a consequence, the sign of each row is solely determined by the sine-function values, the cosine-function values being nonnegative on this domain: . This leads to a global constraint on the solution since could otherwise hold for pairs of negatively signed cosine-function values.
With these properties in mind, a solution to (19) uses the arcsine method, which starts from the bottom row and works upward. The sequence in this example reads
where . The sign of each angle is solely governed by the sign of the corresponding entry. Moreover, other than the first angle, the Givens angles are coupled in the sense that no one angle can be calculated from the entries of working matrix alone, and certainly not prior to the rotation of into . This shows once again that there is no shortcut to working down from the full space through each subspace until is resolved.
Past the first equation, the arguments to the arcsine functions are quotients, and it will be reassuring to verify that these quotients never exceed unity in absolute value. For , we can write
thus , so the arcsine function has a defined value. Similarly, for we have
and therefore . Again, the arcsine function has a defined value. This pattern holds in any dimension.
The preceding analysis is carried out for each eigenvector in . For each subspace there are angles to resolve. For convenience, the Givens angles can be organized into the upper right triangle of a square matrix. For instance, in ,
Lastly, it is worth noting that Dash constructed a storage matrix of angles similar to (21) when calculating the embedded angles in a correlation matrix Dash:Polar:2004 . His interest, shared here, is to depart from the Cartesian form of a matrix—whose entries in his case are correlation values and in the present case are eigenvector entries—and cast them into polar form. Polar form captures the natural representation of the system.
This section has shown that each eigenvector rotation matrix is constructed from a concatenation of elemental rotation matrices called Givens rotations. Each Givens rotation has an angle, and provided that the rotation order is always preserved, the angles are a meaningful way to represent the orientation of the eigenbasis within the constituent basis.
The representation for in (21) highlights that there are angles to calculate to complete the full basis rotation in . It will be helpful to connect the angles to a rotation matrix, so let us define a generator function such that
The job of the generator is to concatenate Givens rotations in the correct order, using the recorded angles, to correctly produce a rotation matrix, either a matrix for one vector, or the matrix for the entire basis.
7 Consideration of Alternative Approaches
The preceding analysis passes over two choices made during the development of the algorithm. These alternatives are detailed in this section, together with the reasoning for not selecting them.
7.1 Householder Reflections
Equations nearly identical to (16) appear in well-known linear algebra texts, such as Golub:2014 ; Strang:1988 ; Strang:1986 , typically in the context of QR decomposition, along with the advice that Householder reflections are preferred over Givens rotations because all entries below a pivot can be zeroed out in one step. This advice is accurate in the context of QR decomposition, but does not hold in the current context.
Recall that in order to align eigenvector to constituent basis , Givens rotations are necessary for a space of dimension . Arranged appropriately, each rotation has the effect of introducing a zero in the column vector below the pivot. A Householder reflection aligns to in a single operation by reflecting onto about an appropriately aligned hyperplane in . In general this hyperplane is neither parallel nor perpendicular to , and therefore neither are any of the other eigenvectors in .
The Householder operator is written as
where is the Householder vector that lies perpendicular to the reflecting hyperplane. The operator is Hermitian, unitary, involutory, and has . In fact, all eigenvalues are , except for one that is .
Let us reconsider the eigenvector representation (7) and apply a general unitary transform to , as in
(Unitary transform is not to be confused with the matrix of projected data in SVD equation (2).) The equality holds for any unitary operator, so Householder reflections are admitted. Let us then compare expanded representations in the form of (9) for both rotation and reflection operators to the left of :
Even though (23) holds for Householder reflections, the equality itself is not the goal; rather, the interpretability of the eigenvector orientation is paramount. In the rotations method, reflections only reflect one eigenvector at a time whereas rotations transform the basis as a whole. In contrast, the reflections method intermingles eigenvector reflections with basis reflections, thereby disrupting the interpretation of the basis orientation.
A comparison between the rotation and reflection methods is illustrated in figure 4. Here, the eigenvector basis spans with an initial orientation of
Following the upper pane in the figure, where the reflection method is illustrated, it is found that and therefore in order to reflect . Once is completed, a clockwise rotation of is imparted by to align with , thus completing . Lastly, (trivially so) and therefore . The tuple of sign reflections for the eigenvectors in is therefore .
A parallel sequence is taken along the lower pane, where a Householder reflection is used in place of rotation. Expression is as before, but to align with a reflection is used. To do so, Householder vector is oriented so that a reflection plane is inclined by from the axis. Reflection of about this plane indeed aligns to but has the side effect of reflecting too. As a consequence, when it is time to inspect the orientation of with respect to , we find that . Therefore the tuple of eigenvector signs is now .
Our focus here is not to discuss which tuple of signs is “correct,” but rather on how to interpret the representation. The choice made in this article is to use rotations instead of reflections for the basis transformations so that the basis orientation is preserved through this essential step. The use of Householder reflections, by contrast, scatters the basis orientation after each application.
7.2 Arctan Calculation Instead of Arcsine
A solution to (19) is stated above by equations (20) in terms of arcsine functions, and a requirement of this approach is that the range of angles be restricted to . While this is the preferred solution, there is another approach that uses arctan functions instead of arcsine. The arctan solution to (19) starts at the top of the column vector and walks downward, the angles being
The first angle is defined as (which is done solely for equation symmetry), and the function is the four-quadrant form , in which the admissible angular domain is . It would appear that the arctan method is a better choice.
The difficulty arises with edge cases. Let us consider the vector
The sequence of arctan angles is then
The final arctan evaluation is not numerically stable. For computer systems that support signed zero, the arctan can either be or , see numpy:arctan2 ; ieee:arctan2 . The latter case is catastrophic because product has forced the sign of to be positive, yet now flips the sign since . After is constructed, the first eigenvector is not considered thereafter, so there is no chance, unless the algorithm is changed, to correct this spurious flip.
Therefore, to avoid edge cases that may give false results, the arcsine method is preferred.
8 Application to Regression
Returning now to the original motivation for a consistently oriented eigenvector basis, the workflow for regression and prediction on the eigenbasis becomes:
SVD: Find from in-sample data such that
Orient: Find from such that
Regression: Find from such that
Prediction: Calculate from , where is out-of-sample data, with
The workflow highlights that both elements of the orientation solution, rotation and reflection , are used, although for different purposes. The regression (26) requires the reflection matrix to ensure that the signs of faithfully align to the response of . Predictive use with out-of-sample data requires the estimate, as expected, but also the rotation matrix . The rotation orients the constituent basis, which is observable, into the eigenbasis, which is not.
Without dimension reduction, the rotation in (27) is associative:
However, associativity breaks with dimension reduction. Principal components analysis, for instance, discards all but the top few eigenvector components (as ranked by their corresponding eigenvalue) and uses the remaining factors in the regression. The number of entries in equals the number of remaining components, not the number of constituent components. In this case, only can be used. Typically, online calculation of this product is simple because the calculation is updated for each new observation. Rather than being an matrix, out-of-sample updates are , which means that requires only a BLAS level-2 (vector-matrix) call on an optimized system blas:live .
9 Treatment of Evolving Data
Eigenvectors and values calculated from data are themselves sample estimates subject to uncertainties based on the particular sample at hand, statistical uncertainties based on the number of independent samples available in each dimension111See Meucci:2007 , Section 4.2.2, Dispersion and Hidden Factors. as well as on assumptions about the underlying distribution,222See Meucci:2007 , Section 4.3, Maximum Likelihood Estimators. and scale-related numerical uncertainties based on the degree of colinearity among factors. All of these uncertainties exist for a single, static dataset.
Additional uncertainty becomes manifest when data evolves because the estimate of the eigensystem will vary at each point in time, this variation being due in part to sample noise, and possibly due to nonstationarity of the underlying random processes.
In the presence of evolving data, therefore, the eigensystem fluctuates, even when consistently oriented. It is natural, then, to seek statistics for the location and dispersion of the eigensystem. To do so, two orthogonal modes of variation are identified (see figure 5): stretch variation, which is tied to change of the eigenvalues; and wobble variation, which is tied to change of the eigenvectors. Each is treated in turn.
9.1 Stretch Variation
Stretch variation is nominally simple because eigenvalues are scalar, real numbers. The mean and variance follow from the usual sample forms of the statistics. For an ensemble of samples, the average eigenvalues are simply
A challenge for the variance statistic is that eigenvalues are not independent since
How the covariance manifests itself is specific to the dataset at hand.
9.2 Wobble Variation
Quantification of wobble variation requires a different toolset because eigenvectors are vectors
, and these orthonormal vectors point onto the surface of a hypersphere such that. Figure 6 left illustrates a case in point for : a stationary variation of the eigenvectors forms point constellations on the surface of the sphere , one constellation for each vector.333Figure 6 was drawn using point constellations drawn from the von Mises–Fisher distribution Mardia:2000 with concentration . Sampling this distribution in and was done using the VonMisesFisher class in the TensorFlowProbability package tensorflow:probability:2019 , which belongs to the TensorFlow platform tensorflow:2015 . The VonMisesFisher class draws samples using the nonrejection based method detailed by Kurz and Hanebeck Kurz:2015 . The field of directional statistics informs us regarding how to define mean direction (the vector analogue of location) and dispersion in a consistent manner for point constellations such as these Mardia:2000 ; Jammal:2001 ; Ley:2017 . Directional statistics provides a guide on how to use the vector information available from and embedded angle information recorded in , see (21). Recent work focuses on the application of these statistics to machine learning Sra:Directional:2018 . Nonetheless, application of directional statistics to eigenvector systems appears to be underrepresented in the literature.
The following discussion only applies to eigensystems that have been consistent orientated.
9.2.1 Estimation of Mean Direction
Both mean direction and directional dispersion are measurable statistics for eigensystems, and determining the mean direction is the simpler of the two. The reason this is so is because the eigenvectors are orthogonal in every instance of an eigenbasis, thus the mean directions must also be orthogonal. Consequently, the eigenvectors can be treated uniformly.
It is a tenet of directional statistics that the mean direction is calculated from unit vectors, not from their angles.444Simple examples exist in the cited literature that show that an angle-based measure is not invariant to the choice a “zero” angle reference. For an ensemble of unit vectors , the mean direction is defined as
with the caveat that the mean direction is undefined for . The resultant vector is the vector sum of component vectors and has length under an norm (see figure 7). The mean direction is thus a unit-vector version of the resultant vector.
Extending this construction for mean direction to an ensemble of oriented eigenvector matrices , the mean location of the eigenvectors is
Using the methodologies above, the average basis can be rotated onto with a suitable rotation matrix , and in doing so, the Cartesian form of can be converted into polar form. Using the generator function from (22) to connect the two, we have
Note that is not the arithmetic average of angles but is exclusively derived from . In fact, an alternative way to express the basis sum in (30) is to write
and from this we see the path of analysis:
9.2.2 Estimation of Dispersion
Dispersion estimation is more involved for an ensemble of eigenbases than for an ensemble of single vectors because both common and differential modes of variation exist. Referring to figure 6 left, consider a case where the only driver of directional variation for the eigenbasis is a change of the pointing direction of . As scatters about its mean direction, vectors will likewise scatter, together forming three constellations of points, as in the figure. Thus, wobble in imparts wobble in the other vectors. Since dispersion is a scalar independent of direction, the dispersion estimates along all three directions in this case are identical.
Yet, there is another possible driver for variation that is orthogonal to , and that is motion within the plane. Such motion is equivalent to a pirouette of the plane about . Taking as a reference, variation within this subspace needs to be estimated, see figure 6 right. When viewed from , however, the point constellation distribution about is a mixture of variation drivers.
Regardless of the estimation technique for dispersion, it is clear that the pattern developed in section 6.3, Rotation Matrix Construction, for walking through subspaces of to attain occurs here as well. Thus, rather than calculate a dispersion estimate for each eigenvector, since they are mixtures, a dispersion estimate is made for each subspace.
Returning to the generator for subspace in (22), given an ensemble the ensemble of the th subspace is
where . From the th column is extracted, and the lower entries taken to produce column vector . Figuratively,
The reason behind of removing the top zero entries from the column vector is that parametric models of dispersion are isotropic in the subspace, and therefore keeping dimensions with zero variation would create an unintended distortion.
Now that the vectors of interest have been identified, their dispersions can be considered. Unlike the mean direction, there is no one measure of dispersion. Two approaches are touched upon here, one being model free and the other based on a parametric density function. Both approaches rely on the resultant vector .
Returning to figure 7, the two panels differ in that on the left there is a higher degree of randomness in the pointing directions of the unit vectors along the sequence, while on the right there is a lower degree of randomness. The consequence is that the expected lengths of the result vectors are different: the lower the randomness, the longer the expected length. Define the mean resultant length for samples as
For distributions on a circle, Mardia and Jupp define a model-free circular variance as (see Mardia:2000 ). Among other things, this variance is higher for more dispersed unit vectors and lower for less dispersed vectors. Yet, after the generalization to higher dimensions, it remains unclear how to compare the variance from one dimension to another.
As an alternative, a model-based framework starts by positing an underlying parametric probability distribution and seeks to define its properties and validate that they meet the desired criteria. In the present case, eigenvectors are directional, as opposed to axial; their evolution-induced scatter is probably unimodal, at least across short sequences; and their dimensionality is essentially arbitrary. An appropriate choice for an underlying distribution therefore is the von Mises–Fisher (vMF) distribution. The vMF density function in is Sra:Directional:2018
The single argument to the density function is the vector , and the parameters are the dimension , mean direction , and concentration . The mean direction parameter is the same as in (29): . The concentration parameter, a scalar, is like an inverse variance. For , unit vectors
are uniformly distributed on the hypersphere, whereas with , the density concentrates to the mean direction . Maximum likelihood estimation yields
where the nonlinear function is
and is the modified Bessel function of the first kind. A useful approximation is
The key aspect here is the connection of the concentration parameter to the mean resultant length from (36). Notice as well that is related to the dimensionality , indicating that concentration values across different dimensions cannot be directly compared.
Sampling from the vMF distribution is naturally desirable. Early approaches used rejection methods (see the introduction in Kurz:2015 ), these being simpler, yet the runtime is not deterministic. Kurz and Hanebeck have since reported a stochastic sampling method that is deterministic, see Kurz:2015 , and it is this method that is implemented in TensorFlow (see footnote 3 on page 3). The samples in figure 6 were calculated with TensorFlow.
To conclude this section, in order to find the underlying drivers of variation along a sequence of eigenbasis observations, concentration parameters , or at least the mean resultant lengths , should be computed for each descending subspace in . The concentration parameters might then be stored as a vector,
9.3 Rank-Order Change of Eigenvectors
Along an evolving system, the rank order of eigenvectors may change. This paper proposes no mathematically rigorous solution to dealing such change—it is unclear that one exists, nor is its absence a defect of the current work—yet there are a few remarks that may assist the analyst to determine whether rank-order change is a property of the data itself or a spurious effect that results from an upstream issue.
Eigenvectors are “labeled” by their eigenvalue: in the absence of eigenvalue degeneracy, each vector is mapped one-to-one with a value. The concept of rank-order change of eigenvectors along evolving data is only meaningful if a second label can be attached to the vectors such that the second label does not change when the first does. Intuitively (for evolving data), a second label is the pointing direction of the vectors. This choice is fraught: Two time series are not comparable if they begin at different moments along a sequence because the initial labeling may not match. Additionally, validation of secondary labeling stability needs to be performed, and revalidation is necessary for each update because stability cannot be guaranteed.
With an assumption that a second labeling type is suitably stable, rank-order changes may still be observed. Before it can be concluded that the effect is a trait of the data, modeling assumptions must be considered first. There is a significant assumption embedded in the estimation of the eigensystem itself: using an eig(..) function call presupposes that the underlying statistical distribution is Gaussian. However, data is often heavy tailed, so copula or implicit methods are appropriate since Gaussian-based methods are not robust (see Meucci:2007
). Another modeling concern is the number of independent samples per dimension used in each panel: too few samples inherently skews the eigenvalue spectrum, and when coupled with sample noise, may induce spurious rank-order changes (again, seeMeucci:2007 ). Lastly, PCA ought only be performed on homogeneous data categories because the mixing of categories may well lead to spurious rank-order changes. Better to apply PCA independently to each category and then combine.
If rank-order change is determined to be a true trait of the data, then axial statistics is used in place of directional statistics. Referring to figure 6 above, two or more constellations will have some of their points mirror-imaged through the origin, creating a “barbell” shape along a common axis. The juxtaposition of the Watson density function, which is an axial distribution, to the von Mises–Fisher distribution,
shows that the axial direction is squared in order to treat the dual-signed nature of the pointing directions Sra:Axial:2013 . With this, directional statistics can be applied once again.
9.4 Regression Revisited
Let us apply the averages developed in this section to the equations in section 8, Application to Regression. The SVD and orientation steps remain the same. The linear regression of (26) is modified to use the local average of the eigenvalues from (28),
The prediction step is also modified from (27) to read
Here the mean direction of the eigenvectors in (30) replaces the single-observation rotation matrix in the original.
Use of eigenvalue and eigenvector averages will reduce sample-based fluctuation and therefore may improve the predictions. However, from a time-series perspective, averages impart delay because an average is constructed based on a lookback interval. For stationary underlying processes the delay may not a problem, yet for nonstationary processes the lag between the average and the current state can lead to lower quality predictions.
In any event, at the very least this section has presented a methodology to decompose variation across a sequence of eigensystem observations into meaningful quantities.
10 Reference Implementation
The Python package thucyd, written by this author, is freely available from PyPi555Available at pypi.org/project/thucyd and Conda-Forge666Available at github.com/conda-forge/thucyd-feedstock and can be used directly once installed. The source code is available on at gitlab.com/thucyd-dev/thucyd.
There are two functions exposed on the interface of thucyd.eigen:
orient_eigenvectors implements the algorithm in section 6, and
, eq. (22).
The pseudocode in listing 1 outlines the reference implementation for orient_ eigenvectors that is available at the above-cited source-code repositories. Matrix and vector indexing follows the Python Numpy notation.
10.1 Eigenvector Orientation
A simple orientation example in reads as follows: