
A Consistently Oriented Basis for Eigenanalysis

12/30/2019
by   Jay Damask, et al.

Repeated application of machine-learning, eigen-centric methods to an evolving dataset reveals that eigenvectors calculated by well-established computer implementations are not stable along an evolving sequence. This is because the sign of any one eigenvector may point along either the positive or negative direction of its associated eigenaxis, and for any one eigen call the sign does not matter when calculating a solution. This work reports an algorithm that creates a consistently oriented basis of eigenvectors. The algorithm postprocesses any well-established eigen call and is therefore agnostic to the particular implementation of the latter. Once consistently oriented, directional statistics can be applied to the eigenvectors in order to track their motion and summarize their dispersion. When a consistently oriented eigensystem is applied to methods of machine-learning, the time series of training weights becomes interpretable in the context of the machine-learning model. Ordinary linear regression is used to demonstrate such interpretability. A reference implementation of the algorithm reported herein has been written in Python and is freely available, both as source code and through the thucyd Python package.


1 Overview

Eigenanalysis plays a central role in many aspects of machine learning because the natural coordinates and relative lengths of a multifactor dataset are revealed by the analysis Hastie:2001 ; Alpaydin:2010 ; Kuhn:2013 ; Scholkopf:2002 . Singular-value decomposition and eigendecomposition are two of the most common instances of eigenanalysis Strang:1988 ; Golub:2014 , yet there is a wide range of adaptation, such as eigenface FaceRecognition:2011 . The eigenanalysis ecosystem is so important and well studied that highly optimized implementations are available in core numerical libraries such as LAPACK lapack:1999 .

This article does not seek to suggest or make any improvement to the existing theoretical or numerical study of eigenanalysis as it applies to any one standalone problem. Instead, the focus of this article is on the improvement of eigenanalysis when applied repeatedly to an evolving dataset. With today’s world being awash in data, and with machine-learning techniques being applied regularly, if not continuously, on ever-expanding datasets as new data arrives, the evolution of the eigensystem ought to be included in any well-conceived study. Yet, the researcher will find that the orientation of the eigenvector basis produced by current eigenanalysis algorithms is not consistent, that it may change according to hardware and software, and that it is sensitive to data perturbations.

Figure 1: An ellipsoid-shaped point cloud typical of centered, variance-bound data having a stationary or slowly changing underlying random process. The point cloud is quoted in a constituent basis while the eigenvectors lie along the natural axes of the data. Here, the ambiguity of the direction of each eigenvector is unresolved. Left, a left-handed basis of eigenvectors. Right, a right-handed basis of eigenvectors.

This paper demonstrates a method for the consistent orientation of an eigensystem, in which the orientation is made consistent after an initial call to a canonical eigenanalysis routine. This method, however, does not address the issue of a correct orientation: correctness is context specific and must be addressed in a manner exogenous to eigenanalysis. The analyst will need to know, for instance, whether a positive change in the factors aligned to particular eigenvectors should imply a positive or negative change in a dependent feature. In this article, consistency alone is sought.

As a consequence of the method detailed below, the evolution of the eigenvectors themselves can be tracked. Eigenvectors that span $\mathbb{R}^n$ lie on the hypersphere $\mathcal{S}^{n-1}$, and the mean pointing direction on the sphere and the dispersion about the mean are statistics of interest, as is any statistically significant drift. Basic results from directional statistics (see Mardia:2000 ; Jammal:2001 ; Ley:2017 ) are used to study the mean direction and dispersion of the common and differential modes over eigenvector evolution.

2 Inconsistent Orientation

Given a square, symmetric matrix $A$ with real-valued entries, the eigendecomposition is

$A = V \Lambda V^\top \qquad (1)$

where the columns of $V$ are the eigenvectors of $A$, the diagonal entries along $\Lambda$ are the eigenvalues, and the entries in both are purely real. Moreover, when properly normalized, $V^\top V = I$, indicating that the eigenvectors form an orthonormal basis.

To inspect how the eigenvectors might be oriented, write out $V$ as $[\,v_1 \mid v_2 \mid \cdots \mid v_n\,]$, where each $v_i$ is a column vector, and take the inner product two ways:

$v_i^\top A\, v_i = \lambda_i \qquad \text{and} \qquad (-v_i)^\top A\, (-v_i) = \lambda_i .$

A choice of $+v_i$ is indistinguishable from a choice of $-v_i$ because in either case the equality in (1) holds. In practice, the eigenvectors take an inconsistent orientation, even though they will have a particular orientation once returned from a call to an eigenanalysis routine.

A similar situation holds for singular-value decomposition (SVD). For a centered panel of data $X \in \mathbb{R}^{m \times n}$, where $m > n$, SVD gives

$X = U\, \Sigma\, V^\top \qquad (2)$

where the columns in panel $U$ are the projections of the columns of $X$ onto the eigenvectors $V$, scaled to have unit variance. The eigenvectors have captured the natural orientation of the data. Unlike the example above, the eigensystem here is positive semidefinite, so all diagonal entries of $\Sigma$ are nonnegative, and in turn a scatter plot of the data in $X$ typically appears as an ellipsoid (see figure 1). Following the thought experiment above, a sign flip of an eigenvector in $V$ flips the sign of the corresponding column in $U$, thereby preserving the equality in (2). Consequently, the eigenvector orientations produced by SVD are also inconsistent, since either sign satisfies the equations.
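As a concrete illustration, the following minimal NumPy sketch (not drawn from the paper's reference implementation) verifies that jointly flipping the sign of a column of $V$ and the corresponding column of $U$ leaves the product in (2) unchanged:

import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((500, 3))
X -= X.mean(axis=0)                            # centered panel of data

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V, Sigma = Vt.T, np.diag(s)

S = np.diag([1.0, -1.0, 1.0])                  # flip the second eigenvector
U_flip, V_flip = U @ S, V @ S

print(np.allclose(U @ Sigma @ V.T, X))             # True
print(np.allclose(U_flip @ Sigma @ V_flip.T, X))   # also True: either sign is admissible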

In isolation, such inconsistent eigenvector orientation does not matter. The decomposition is correct and is generally produced efficiently. However, in subsequent analysis of the eigensystem—principal component analysis (

PCA) being a well known example Hastie:2001 —and for an evolving environment, the inconsistency leads to an inability to interpret subsequent results and the inability to track the eigenvector evolution.

Figure 2: Left, a regression along the eigenaxes yields the weight tuple $(\beta_1, \ldots, \beta_n)$. Right, top, because an eigenvector direction can flip from one eigenanalysis to the next, the signs of the $\beta$-tuple also flip, defeating any attempt at interpretability. Right, bottom, ideally there is no sign flipping of the eigenvectors and therefore no flips of the $\beta$-tuple along a sequence.

For instance, a simple linear regression performed on the eigensystem (whether with reduced dimension or not) with dependent variable $y$ reads

$y = U\, \beta + \epsilon \qquad (3)$

where the weight vector is $\beta \in \mathbb{R}^n$. Each entry in $\beta$ corresponds to a column in $U$, which in turn corresponds to an eigenvector in $V$. If the sign of one of the eigenvectors is flipped (compare left and right in figure 1), the corresponding column in $U$ has its sign flipped, which flips the sign of the corresponding $\beta$ entry. In an evolving environment, a particular regression might be denoted by index $k$, and the $k$th regression reads

$y^{(k)} = U^{(k)}\, \beta^{(k)} + \epsilon^{(k)} . \qquad (4)$

When tracking any particular entry in the $\beta$ vector, say $\beta_i$, it will generally be found that $\beta_i^{(k)}$ flips sign throughout the evolution of the system (see figure 2). There are two consequences: First, for a single regression, the interpretation of how a change of the $i$th factor corresponds to a change in $y$ is unknown because the $i$th eigenvector can take either sign. Second, as the data evolves, the sign of $\beta_i^{(k)}$ will generally change, even if the unsigned value remains fixed, so that the time series of $\beta_i^{(k)}$ cannot be meaningfully analyzed. If these sources of sign flips can be eliminated, then a time series of weights $\beta^{(k)}$ can be interpreted as we might have expected (see figure 2, lower right).
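The effect on the regression weights can be seen in a few lines of NumPy (synthetic data, hypothetical variable names): regressing the same response on factor panels that differ only by an eigenvector sign flip flips the corresponding weight.

import numpy as np

rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((200, 3)))[0]      # stand-in factor panel
y = U @ np.array([0.5, -1.2, 2.0]) + 0.01 * rng.standard_normal(200)

S = np.diag([1.0, -1.0, 1.0])                           # flip the second factor
beta = np.linalg.lstsq(U, y, rcond=None)[0]
beta_flip = np.linalg.lstsq(U @ S, y, rcond=None)[0]

print(beta)        # approximately [ 0.5, -1.2,  2.0]
print(beta_flip)   # the second entry carries the opposite sign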

3 Toward a Consistent Orientation

To construct a consistent eigenvector basis we must begin with a reference, and SVD offers a suitable start. The data in matrix $X$ of (2) is presented with $n$ columns, where, typically, each column refers to a distinct observable or feature. Provided that the data is well formed, such as having zero mean, bound variance, and negligible autocorrelation along each column, the row space of $X$ forms the constituent basis $\mathcal{E}$ with basis vectors $e_i$ such that $e_i^\top e_j = \delta_{ij}$ (see figure 1). If we label the columns in $X$ by $i = 1, 2, \ldots, n$, then the basis $\mathcal{E}$, when materialized onto the constituent axes, is simply $\mathcal{E} = [\,e_1 \mid e_2 \mid \cdots \mid e_n\,]$, or $\mathcal{E} = I$.

Now, when plotted as a scatter plot in $\mathbb{R}^n$, the data in the rows of $X$ typically forms an ellipsoid in the space. When the columns of $X$ have zero linear correlation, the axes of the ellipsoid align to the constituent axes. As correlation is introduced across the columns of $X$, the ellipsoid rotates away. Given such a case, we might seek to rotate the ellipsoid back onto the constituent axes. The constituent axes can therefore act as our reference.

This idea has good flavor, but it is incomplete. An axis is different from a vector, for a vector may point along either the positive or negative direction of a particular axis. Therefore, ambiguity remains. To capture this mathematically, let us construct a rotation matrix $R$ such that $R^\top$ rotates the data ellipse onto the constituent axes. A modified SVD will read

$X = U\, \Sigma\, \big( R^\top V \big)^\top R^\top .$

If it were the case that $R^\top V = I$, then the goal would be achieved. However, this is not true in general. Instead,

$R^\top V = \operatorname{diag}\!\big( \pm 1, \pm 1, \ldots, \pm 1 \big) \qquad (5)$

is the general case, where the sign of the $i$th entry depends on the corresponding eigenvector orientation in $V$. While the following is an oversimplification of the broader concept of orientation, it is instructive to consider that a basis in $\mathbb{R}^n$ may be oriented in a right- or left-handed way. It is not possible to rotate one handedness onto the other because of the embedded reflection.

It is worth noting that an objection might be raised at this point because the eigenvector matrix $V$ is itself a unitary matrix with $V^\top V = I$, and therefore serves as the sought-after rotation. Indeed, $V^\top A\, V = \Lambda$. The problem is that this form is quadratic in $V$, so any embedded reflections cancel. In order to identify the vector orientation within $V$, we must move away from the quadratic form.

4 A Representation for Consistent Orientation

The representation detailed herein uses the constituent basis as the reference basis and seeks to make (5) an equality for a particular reflection matrix. To do so, first define the diagonal sign matrix $S$ as

$S = \operatorname{diag}\!\big( s_1, s_2, \ldots, s_n \big), \qquad s_i \in \{+1, -1\} . \qquad (6)$

The product of interest is $V S$, and clearly $(V S)\, \Lambda\, (V S)^\top = V \Lambda V^\top = A$. Given an entry $s_i = -1$, the product reflects the $i$th eigenvector such that $v_i \to -v_i$ while leaving the other eigenvectors unaltered.

An equality can now be written with the representation

$R^\top\, V\, S = I \qquad (7)$

given suitable rotation and reflection matrices $R$ and $S$. This representation says that a rotation matrix $R$ can be found to rotate $V$ onto the constituent basis $\mathcal{E} = I$, provided that a diagonal reflection matrix $S$ is found to selectively flip the sign of eigenvectors in $V$ to ensure complete alignment in the end. Once (7) is satisfied, the oriented eigenvector basis $V_{\mathrm{or}}$ can be calculated in two ways:

$V_{\mathrm{or}} = V S \qquad \text{or} \qquad V_{\mathrm{or}} = R . \qquad (8)$

To achieve this goal, the first step is to order the eigenvalues in descending order and to reorder the eigenvectors in $V$ accordingly. (Since the representation (7) holds for real symmetric matrices and not solely for positive semidefinite systems, the eigenvalues should be ordered by descending absolute value.) Indexing into $V$ below expects that this ordering has occurred.

The representation (7) is then expanded so that each eigenvector is inspected for a reflection, and a rotation matrix is constructed for its alignment to the constituent. The expansion reads

$R_n^\top \cdots R_2^\top R_1^\top\; V\; S_1 S_2 \cdots S_n = I . \qquad (9)$

There is a good separation of concerns here: reflections are imparted by the $S_i$ operators to the right of $V$, while rotations are imparted by the $R_i^\top$ operators to the left. Reflections do not preserve orientation, whereas rotations do.

The algorithm created to attain a consistent orientation follows the pattern

$R_n^\top \Big( \cdots R_2^\top \big( R_1^\top ( V S_1 ) \big) S_2 \cdots \Big) S_n = I . \qquad (10)$

For each eigenvector, a possible reflection is computed first, followed by a rotation. The result of the rotation is to align the target eigenvector with its associated constituent basis vector. The first rotation is performed on the entire space $\mathbb{R}^n$, the second is performed on the subspace $\mathbb{R}^{n-1}$ so that the first alignment is preserved, the third rotation operates on $\mathbb{R}^{n-2}$ to preserve the preceding alignments, and so on. The solution to (7) will include $n$ reflection entries $s_i$ and $n(n-1)/2$ rotation angles embedded in the $n$ rotation matrices $R_i$.

Before working through the solution, it will be helpful to walk through an example of the attainment algorithm in $\mathbb{R}^3$.

5 Algorithm Example in $\mathbb{R}^3$

This example illustrates a sequence of reflections and rotations in $\mathbb{R}^3$ to attain alignment between $V$ and $\mathcal{E}$, and along the way shows cases in which a reflection or rotation is simply an identity operator.

An example eigenvector basis $V$ is created with a left-handed orientation, such that an alignment to the constituent basis $\mathcal{E}$ would require at least one reflection. The precondition of ordered eigenvalues has already been met. Figure 3 (a) shows the relative orientation of $V$ to $\mathcal{E}$.

Figure 3: Example of the orientation of a left-handed eigenvector basis $V$ to the constituent basis $\mathcal{E}$. a) Orthonormal eigenvectors as they are oriented with respect to the constituent basis vectors $e_1, e_2, e_3$. b) Rotation of the basis so that $v_1$ aligns with $e_1$. c) Reflection of $v_2$ so that it lies in the nonnegative halfspace of $e_2$. d) Rotation of the remaining subspace to align $v_2$ with $e_2$.

The first step is to consider whether or not to reflect $v_1$, the purpose being to bring $v_1$ into the nonnegative side of the halfspace created by the hyperplane perpendicular to $e_1$. To do so, the relative orientation of $v_1$ to $e_1$ is measured with an inner product. The reflection entry $s_1$ is then determined from

$s_1 = \operatorname{sign}\!\big( e_1^\top v_1 \big)$

(with $s_1 = +1$ in this example). From this, $S_1 = I$ is formed. A consequence of the resultant orientation is that the angle subtended between $v_1$ and $e_1$ is no greater than $\pi/2$.

The next step is to align $v_1$ with $e_1$ through a rotation $R_1^\top$; the result is shown in figure 3 (b). The principal result of $R_1^\top$ is that $v_1$ is carried onto $e_1$: the basis vectors are aligned. Significantly, the remaining vectors are now perpendicular to $e_1$, and therefore the subsequent rotation needs only to act in the remaining subspace.

With this sequence of operations in the full space $\mathbb{R}^3$ now complete, the subspace $\mathbb{R}^2$ is treated next. As before, the first step is to inspect the orientation of $v_2$ with respect to $e_2$. However, care must be taken because $v_2$ is no longer in its original orientation; instead, it was rotated with the whole basis during $R_1^\top$, as shown in figure 3 (b). It is the new orientation of $v_2$ that needs inspection, and that orientation is $R_1^\top v_2$. As seen in figure 3 (b), $R_1^\top v_2$ and $e_2$ point in opposite directions, so eigenvector $v_2$ needs to be reflected. We have

$s_2 = \operatorname{sign}\!\big( e_2^\top\, R_1^\top v_2 \big) ,$

which in this example is $s_2 = -1$. The updated representation of the eigenbasis is $R_1^\top\, V\, S_1 S_2$, as seen in figure 3 (c).

To complete the work in $\mathbb{R}^2$, rotation $R_2^\top$ is found to bring $v_2$ into alignment with $e_2$. The result is illustrated in figure 3 (d).

The solution needs to work through the last subspace, $\mathbb{R}^1$, to align $v_3$. The reflection entry $s_3$ is evaluated from

$s_3 = \operatorname{sign}\!\big( e_3^\top\, R_2^\top R_1^\top v_3 \big) ,$

where $s_3 = +1$ in this example. From this we arrive at $S_3 = I$. Canonically, a final rotation $R_3^\top$ would carry $v_3$ into $e_3$, but since we are in the last subspace of $\mathbb{R}^3$, the operation is a tautology: $R_3 = I$. Still, $R_3$ is included in the representation (9) for symmetry.

Lastly, the final, oriented eigenvector basis can be calculated in two ways,

$V_{\mathrm{or}} = V\, S_1 S_2 S_3 \qquad \text{or} \qquad V_{\mathrm{or}} = R_1 R_2 R_3 ; \qquad (11)$

compare with (8).

In summary, the algorithm walks down from the full space $\mathbb{R}^3$ into two subspaces in order to perform rotations that do not upset previous alignments. As this example shows, $s_1 = s_3 = +1$, so there are in fact two reflection operators that are simply identities. Moreover, $R_3 = I$, since the prior rotations automatically align the last basis vector to within a reflection.

6 Algorithm Details

The algorithm that implements (9) is now generalized to $\mathbb{R}^n$. There are three building blocks to the algorithm: sorting of the eigenvalues in descending order, calculation of the reflection entries, and construction of the rotation matrices. Of the three, construction of the rotation matrices requires the most work and has not been previously addressed in detail.

6.1 Sorting of Eigenvalues

The purpose of sorting the eigenvalues in descending order and rearranging the corresponding eigenvectors accordingly is to ensure that the first rotation operates on the axis of maximum variance, the second rotation on the axis of next-largest variance, and so on, the reason being that the numerical certainty of the principal ellipsoid axis is higher than that of any other axis. Since the rotations are concatenated in this method, an outsize error introduced by an early rotation persists throughout the calculation. By sorting the eigenvalues and vectors first, any accumulated errors will be minimized.

Since the algorithm works just as well for real-valued symmetric matrices, it is reasonable to presort the eigenvalues in descending order of their absolute value. From a data-science perspective, we generally expect positive semidefinite systems; therefore a system with negative eigenvalues is unlikely, and if one is encountered, there may be larger upstream issues to address.
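A minimal presort sketch in NumPy (variable names hypothetical):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
C = A @ A.T                                   # example symmetric (here PSD) matrix

eigvals, V = np.linalg.eigh(C)
order = np.argsort(np.abs(eigvals))[::-1]     # descending absolute value
eigvals_sorted, V_sorted = eigvals[order], V[:, order]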

6.2 Reflection Calculation

Reflection entries are calculated by inspecting the inner product between a rotated eigenvector and its associated constituent basis vector (several examples are given in the previous section). The formal reflection-value expression for the $i$th basis vector is

$s_i = \begin{cases} +1, & w_{i,i} \ge 0 \\ -1, & w_{i,i} < 0 \end{cases} \qquad (12)$

where $w_{i,i}$ is the $(i,i)$ entry of the working matrix

$W_i = R_{i-1}^\top \cdots R_1^\top\; V\; S_1 \cdots S_{i-1} . \qquad (13)$

Despite the apparent complexity of these expressions, the practical implementation is simply to inspect the sign of the $(i,i)$ entry in the working matrix.
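In code, the reflection test therefore reduces to a sign check on the pivot entry of the working matrix; a minimal sketch (helper name hypothetical):

def reflection_sign(Vwork, i):
    # +1 if the ith working eigenvector already points into the nonnegative
    # halfspace of its constituent axis, -1 if it must be reflected
    return 1.0 if Vwork[i, i] >= 0.0 else -1.0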

6.3 Rotation Matrix Construction

One way to state the purpose of a particular rotation is that the rotation reduces the dimension of the working subspace by one. For an eigenvector matrix $V \in \mathbb{R}^{n \times n}$, the first rotation acts such that

$R_1^\top\, V S_1 = \begin{bmatrix} 1 & 0 \\ 0 & V^{(n-1)} \end{bmatrix} , \qquad (14)$

where $V^{(n-1)}$ is an orthonormal block of dimension $n-1$. Here, $R_1^\top$ ensures that the $(1,1)$ entry on the right is $+1$. Next, the sign of the $(2,2)$ matrix entry is inspected and $s_2$ set accordingly. The next rotation decrements the dimension again such that

$R_2^\top R_1^\top\, V S_1 S_2 = \begin{bmatrix} 1 & & \\ & 1 & \\ & & V^{(n-2)} \end{bmatrix} . \qquad (15)$

As before, $R_2^\top$ ensures that the $(2,2)$ entry on the right is $+1$. The $+1$ entries along the diagonal reflect the fact that an eigenvector has been aligned with its corresponding constituent basis vector. Subsequent rotations preserve the alignment by operating in an orthogonal subspace.

6.3.1 Alignment to One Constituent Basis Vector

To simplify the notation for the analysis below, let us collapse $V$ and all operations prior to the $i$th rotation into a single working matrix, denoted by $W_i$. The left-hand sides of the two equations above then deal with $W_1$ and $W_2$. The $i$th column of the $i$th working matrix is $w_i$, and its $j$th entry is $w_{j,i}$.

With this notation, the actions of $R_1^\top$ and $R_2^\top$ above on the principal working columns $w_1$ and $w_2$ are

$R_1^\top w_1 = \big( 1, 0, 0, \ldots, 0 \big)^\top , \qquad R_2^\top w_2 = \big( 0, 1, 0, \ldots, 0 \big)^\top . \qquad (16)$

Strictly speaking, the unit entries in the column vectors on the right-hand sides are $\sqrt{w_i^\top w_i}$, but since $V$ is expected to be orthonormal, the inner product is unity.

Equations like (16) are well known in the linear algebra literature, for instance Strang:1988 ; Golub:2014 , and form the basis for the implementation of QR decomposition (see also NRC:1992 ). There are two approaches to the solution: use of Householder reflection matrices, or use of Givens rotation matrices. The Householder approach is reviewed in section 7.1 below, but in the end it is not suitable in this context. The Givens approach works well, and the solution here has an advantage over the classical formulation because we can rely on the orthonormality of $V$, that is, on $\lVert w_i \rVert_2 = 1$.

To simplify the explanation once again, let us focus on the four-dimensional space $\mathbb{R}^4$. In this case, $R_1^\top$ needs to zero out three entries below the pivot, and likewise $R_2^\top$ needs to zero out two entries below its pivot. Since one Givens rotation matrix can zero out at most one column entry, a cascade of Givens matrices is required to realize the pattern in (16), and the length of the cascade depends on the dimension of the current subspace.

A Givens rotation imparts a rotation within an $(i, j)$ plane embedded in a higher-dimensional space. While a simple two-dimensional counterclockwise rotation matrix is

$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} ,$

an example Givens rotation matrix in $\mathbb{R}^4$, acting in the $(1,3)$ plane, is

$G_{1,3}(\theta) = \begin{bmatrix} \cos\theta & 0 & -\sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} ,$

where the pattern of nonzero off-diagonal entries picks out the plane of rotation. The full rotation $R_1$ is then decomposed into a cascade of Givens rotations:

$R_1 = G_{1,2}(\theta_{1,2})\; G_{1,3}(\theta_{1,3})\; G_{1,4}(\theta_{1,4}) . \qquad (17)$

The order of the component rotations is arbitrary and is selected for best analytic advantage. That said, a different rotation order yields different Givens rotation angles, so there is no uniqueness to the angles calculated below. The only material concern is that the rotations must be applied in a consistent order.
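A Givens rotation embedded in $\mathbb{R}^n$ is simple to construct; a minimal NumPy sketch (helper name hypothetical):

import numpy as np

def givens(n, i, j, theta):
    # counterclockwise rotation by theta in the (i, j) plane of R^n (0-based indices)
    G = np.eye(n)
    G[i, i] = G[j, j] = np.cos(theta)
    G[i, j], G[j, i] = -np.sin(theta), np.sin(theta)
    return G

G13 = givens(4, 0, 2, 0.3)                    # acts in the (1, 3) plane of R^4
print(np.allclose(G13 @ G13.T, np.eye(4)))    # True: the rotation is orthogonal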

An advantageous cascade order of Givens rotations to expand the left-hand equation in (16) follows the pattern

$G_{1,2}(\theta_{1,2})\; G_{1,3}(\theta_{1,3})\; G_{1,4}(\theta_{1,4})\; e_1 = w_1 , \qquad (18)$

where the rotation $R_1$ has been moved to the other side of the original equation, and $e_1$ denotes a vector in $\mathbb{R}^4$ in which all entries are zero except for a unit entry in the first row. Each Givens matrix moves some of the weight from the first row of $e_1$ into a specific row below, and otherwise the matrices are not coupled.

In the same vein, Givens rotations to represent $R_2$ follow the pattern

$G_{2,3}(\theta_{2,3})\; G_{2,4}(\theta_{2,4})\; e_2 = w_2 .$

Similar to before, weight in the second entry of $e_2$ is shifted into the third and fourth rows of the final vector. It is evident that $R_3$ requires only one Givens rotation, and that $R_4$ is simply the identity matrix.

6.3.2 Solution for Givens Rotation Angles

Now, to solve for the Givens angles, the cascades are multiplied through. For the  cascade in (18), multiplying through yields

(19)

where entries  denote row entries in  and are used only to further simplify the notation. This is the central equation, and it has a straightforward solution. Yet first, there are several important properties to note:

  1. The $\ell_2$ norms of the two column vectors are both unity.

  2. While there are four equations, the three angles $\theta_{1,2}, \theta_{1,3}, \theta_{1,4}$ are the only unknowns. The fourth equation is constrained by the $\ell_2$ norm.

  3. The $a_1$ entry (specifically, the $w_{1,1}$ entry) is nonnegative: $a_1 \ge 0$. This holds by construction because the associated reflection matrix, applied previously, ensures the nonnegative value of this leading entry.

  4. In order to take the arcsine uniquely, the rotation angles are restricted to the domain $[-\pi/2, +\pi/2]$. As a consequence, the sign of each row is solely determined by the sine-function values, the cosine-function values being nonnegative on this domain: $\cos\theta_{1,j} \ge 0$. This leads to a global constraint on the solution, since $a_1 \ge 0$ could otherwise hold for pairs of negatively signed cosine-function values.

With these properties in mind, a solution to (19) uses the arcsine method, which starts from the bottom row and works upward. The sequence in this example reads

$\theta_{1,4} = \arcsin(a_4), \qquad \theta_{1,3} = \arcsin\!\big( a_3 / \cos\theta_{1,4} \big), \qquad \theta_{1,2} = \arcsin\!\big( a_2 / (\cos\theta_{1,3}\cos\theta_{1,4}) \big), \qquad (20)$

where $\cos\theta_{1,j} \ge 0$ throughout. The sign of each angle is solely governed by the sign of the corresponding $a_i$ entry. Moreover, other than the first angle, the Givens angles are coupled in the sense that no one angle can be calculated from the entries of the working matrix alone; each requires the angles already resolved beneath it. This shows once again that there is no shortcut to working down from the full space through each subspace until the final rotation is resolved.

Past the first equation, the arguments to the arcsine functions are quotients, and it is reassuring to verify that these quotients never exceed unity in absolute value. For $\theta_{1,3}$, we can write

$a_3^2 \le 1 - a_4^2 = \cos^2\theta_{1,4} ,$

thus $\lvert a_3 / \cos\theta_{1,4} \rvert \le 1$, so the arcsine function has a defined value. Similarly, for $\theta_{1,2}$ we have

$a_2^2 \le 1 - a_3^2 - a_4^2 = \cos^2\theta_{1,3}\cos^2\theta_{1,4} ,$

and therefore $\lvert a_2 / (\cos\theta_{1,3}\cos\theta_{1,4}) \rvert \le 1$. Again, the arcsine function has a defined value. This pattern holds in any dimension.
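The arcsine method is short to implement; the following sketch (synthetic column, hypothetical names) solves the angles of (20) for one unit column in $\mathbb{R}^4$ and verifies that the cascade of (18) reproduces the column:

import numpy as np

def givens(n, i, j, theta):                   # as sketched above
    G = np.eye(n)
    G[i, i] = G[j, j] = np.cos(theta)
    G[i, j], G[j, i] = -np.sin(theta), np.sin(theta)
    return G

w = np.array([0.5, -0.5, 0.5, 0.5])           # unit column with nonnegative pivot
angles = np.zeros(3)                          # theta_{1,2}, theta_{1,3}, theta_{1,4}
r = 1.0
for j in range(3, 0, -1):                     # bottom row upward, eq. (20)
    angles[j - 1] = np.arcsin(np.clip(w[j] / r, -1.0, 1.0))
    r *= np.cos(angles[j - 1])

# the cascade G_{1,2} G_{1,3} G_{1,4} applied to e_1 reproduces w, eq. (18)
R1 = givens(4, 0, 1, angles[0]) @ givens(4, 0, 2, angles[1]) @ givens(4, 0, 3, angles[2])
print(np.allclose(R1[:, 0], w))               # True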

The preceding analysis is carried out for each eigenvector in $V$. For the $i$th subspace there are $n - i$ angles to resolve. For convenience, the Givens angles can be organized into the upper-right triangle of a square matrix. For instance, in $\mathbb{R}^4$,

$\Theta = \begin{bmatrix} 0 & \theta_{1,2} & \theta_{1,3} & \theta_{1,4} \\ & 0 & \theta_{2,3} & \theta_{2,4} \\ & & 0 & \theta_{3,4} \\ & & & 0 \end{bmatrix} . \qquad (21)$

Lastly, it is worth noting that Dash constructed a storage matrix of angles similar to (21) when calculating the embedded angles in a correlation matrix Dash:Polar:2004 . His interest, shared here, is to depart from the Cartesian form of a matrix—whose entries in his case are correlation values and in the present case are eigenvector entries—and cast them into polar form. Polar form captures the natural representation of the system.

6.3.3 Summary

This section has shown that each eigenvector rotation matrix $R_i$ is constructed from a concatenation of elemental rotation matrices called Givens rotations. Each Givens rotation has an angle, and provided that the rotation order is always preserved, the angles are a meaningful way to represent the orientation of the eigenbasis within the constituent basis.

The representation for $\Theta$ in (21) highlights that there are $n(n-1)/2$ angles to calculate to complete the full basis rotation in $\mathbb{R}^n$. It will be helpful to connect the angles to a rotation matrix, so let us define a generator function $\mathcal{G}$ such that

$R_i = \mathcal{G}\big( \Theta, i \big) \qquad \text{and} \qquad R = \mathcal{G}(\Theta) = R_1 R_2 \cdots R_n . \qquad (22)$

The job of the generator is to concatenate Givens rotations in the correct order, using the recorded angles, to correctly produce a rotation matrix, either a matrix $R_i$ for one vector, or the matrix $R$ for the entire basis.
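A minimal sketch of such a generator (helper name hypothetical; the angle convention follows (21)) concatenates the per-subspace cascades in order:

import numpy as np

def generate_rotation(angles_mtx):
    # build R = R_1 R_2 ... R_n from the upper-triangular matrix of Givens angles
    n = angles_mtx.shape[0]
    R = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            c, s = np.cos(angles_mtx[i, j]), np.sin(angles_mtx[i, j])
            G = np.eye(n)
            G[i, i] = G[j, j] = c
            G[i, j], G[j, i] = -s, s
            R = R @ G
    return R

Applied to the angle matrix produced by the orientation algorithm, the generated rotation reproduces the oriented basis $V_{\mathrm{or}}$ of (8), which is the role played by the package's generate_oriented_eigenvectors described in section 10.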

7 Consideration of Alternative Approaches

The preceding analysis passes over two choices made during the development of the algorithm. These alternatives are detailed in this section, together with the reasoning for not selecting them.

7.1 Householder Reflections

Equations nearly identical to (16) appear in well-known linear algebra texts, such as Golub:2014 ; Strang:1988 ; Strang:1986 , typically in the context of QR decomposition, along with the advice that Householder reflections are preferred over Givens rotations because all entries below a pivot can be zeroed out in one step. This advice is accurate in the context of QR decomposition, but does not hold in the current context.

Recall that, in order to align eigenvector $v_i$ to constituent basis vector $e_i$, $n - i$ Givens rotations are necessary for a space of dimension $n$. Arranged appropriately, each rotation has the effect of introducing a zero in the column vector below the pivot. A Householder reflection aligns $v_i$ to $e_i$ in a single operation by reflecting $v_i$ onto $e_i$ about an appropriately aligned hyperplane in $\mathbb{R}^n$. In general this hyperplane is neither parallel nor perpendicular to the remaining basis vectors, and therefore, after the reflection, neither are any of the other eigenvectors in $V$.

The Householder operator is written as

$H = I - 2\, u\, u^\top ,$

where $u$ is the unit Householder vector that lies perpendicular to the reflecting hyperplane. The operator is Hermitian, unitary, involutory, and has $\det(H) = -1$. In fact, all of its eigenvalues are $+1$, except for one that is $-1$.

Let us reconsider the eigenvector representation (7) and apply a general unitary transform $U$ to $V$, as in

$U\, V\, S = I . \qquad (23)$

(Unitary transform $U$ is not to be confused with the matrix of projected data in SVD equation (2).) The equality holds for any unitary operator, so Householder reflections are admitted. Let us then compare expanded representations in the form of (9) for rotation and reflection operators to the left of $V$:

$R_n^\top \cdots R_1^\top\; V\; S_1 \cdots S_n = I \qquad \text{versus} \qquad H_n \cdots H_1\; V\; S_1 \cdots S_n = I .$

Even though (23) holds for Householder reflections, the equality itself is not the goal; rather, the interpretability of the eigenvector orientation is paramount. In the rotations method, reflections only flip one eigenvector at a time, whereas rotations transform the basis as a whole. In contrast, the reflections method intermingles eigenvector reflections with basis reflections, thereby disrupting the interpretation of the basis orientation.

Figure 4: Rotation versus reflection transformation comparison. Top, the rotation method produces the sign tuple $(-1, +1)$. Bottom, the Householder reflection method produces the sign tuple $(-1, -1)$.

A comparison between the rotation and reflection methods is illustrated in figure 4. Here, the eigenvector basis spans $\mathbb{R}^2$ with an initial orientation of

(24)

Following the upper pane in the figure, where the rotation method is illustrated, it is found that $e_1^\top v_1 < 0$, and therefore $s_1 = -1$ in order to reflect $v_1$. Once the reflection is completed, a clockwise rotation is imparted by $R_1^\top$ to align $v_1$ with $e_1$, thus completing the work in the full space. Lastly, $s_2 = +1$ (trivially so), and therefore $S_2 = I$. The tuple of sign reflections for the eigenvectors in $V$ is therefore $(-1, +1)$.

A parallel sequence is taken along the lower pane, where a Householder reflection is used in place of a rotation. The reflection $s_1 = -1$ is as before, but to align $v_1$ with $e_1$ a reflection is used. To do so, the Householder vector $u$ is oriented so that the reflection plane bisects the angle between $v_1$ and $e_1$. Reflection of $v_1$ about this plane indeed aligns $v_1$ to $e_1$ but has the side effect of reflecting $v_2$ too. As a consequence, when it is time to inspect the orientation of $v_2$ with respect to $e_2$, we find that $s_2 = -1$. Therefore the tuple of eigenvector signs is now $(-1, -1)$.

Our focus here is not on which tuple of signs is “correct,” but rather on how to interpret the representation. The choice made in this article is to use rotations instead of reflections for the basis transformations so that the basis orientation is preserved through this essential step. The use of Householder reflections, by contrast, scatters the basis orientation after each application.

7.2 Arctan Calculation Instead of Arcsine

A solution to (19) is stated above by equations (20) in terms of arcsine functions, and a requirement of this approach is that the range of angles be restricted to $[-\pi/2, +\pi/2]$. While this is the preferred solution, there is another approach that uses arctan functions instead of arcsine. The arctan solution to (19) starts at the top of the column vector and walks downward, the angles being

(25)

The first angle is defined trivially, solely for equation symmetry, and the arctan function is the four-quadrant form $\operatorname{atan2}(y, x)$, in which the admissible angular domain is $(-\pi, +\pi]$. It would appear that the arctan method is a better choice.

The difficulty arises with edge cases. Let us consider the vector

The sequence of arctan angles is then

The final arctan evaluation is not numerically stable. For computer systems that support signed zero, the arctan can evaluate to either of two angles that differ by $\pi$, see numpy:arctan2 ; ieee:arctan2 . The latter case is catastrophic because the reflection applied earlier has forced the sign of the leading entry to be positive, yet a rotation by an extra $\pi$ flips that sign again. After $R_1$ is constructed, the first eigenvector is not considered thereafter, so there is no chance, unless the algorithm is changed, to correct this spurious flip.

Therefore, to avoid edge cases that may give false results, the arcsine method is preferred.

8 Application to Regression

Returning now to the original motivation for a consistently oriented eigenvector basis, the workflow for regression and prediction on the eigenbasis becomes:

  1. SVD: Find $U$, $\Sigma$, and $V$ from in-sample data $X_{\mathrm{is}}$ such that $X_{\mathrm{is}} = U\, \Sigma\, V^\top$.

  2. Orient: Find $R$ and $S$ from $V$ such that $R^\top V S = I$.

  3. Regression: Find $\hat\beta$ from the oriented factors such that

    $y = X_{\mathrm{is}}\, V\, \Sigma^{-1} S\; \beta + \epsilon . \qquad (26)$

  4. Prediction: Calculate $\hat y$ from $x_{\mathrm{oos}}$, where $x_{\mathrm{oos}}$ is out-of-sample data, with

    $\hat y = x_{\mathrm{oos}}\, V\, \Sigma^{-1} S\; \hat\beta = x_{\mathrm{oos}}\, R\, \Sigma^{-1} \hat\beta . \qquad (27)$

The form of (26) here assumes that there is no exogenous correction to the direction of $y$ in response to a change of the factors along the eigenbasis, as discussed in section 1, Overview.

The workflow highlights that both elements of the orientation solution, rotation $R$ and reflection $S$, are used, although for different purposes. The regression (26) requires the reflection matrix $S$ to ensure that the signs of $\hat\beta$ faithfully align to the response of $y$. Predictive use with out-of-sample data requires the $\hat\beta$ estimate, as expected, but also the rotation matrix $R$. The rotation carries the constituent basis, which is observable, into the eigenbasis, which is not.

Without dimension reduction, the rotation in (27) is associative:

$x_{\mathrm{oos}}\, \big( R\, \Sigma^{-1} \hat\beta \big) = \big( x_{\mathrm{oos}}\, R \big)\, \Sigma^{-1} \hat\beta .$

However, associativity breaks with dimension reduction. Principal component analysis, for instance, discards all but the top few eigenvector components (as ranked by their corresponding eigenvalues) and uses the remaining factors in the regression. The number of entries in $\hat\beta$ equals the number of remaining components, not the number of constituent components. In this case, only $\big( x_{\mathrm{oos}}\, R \big)$, suitably truncated, can be used. Typically, online calculation of this product is simple because the calculation is updated for each new observation. Rather than being an $m \times n$ matrix, out-of-sample updates are $1 \times n$, which means that $x_{\mathrm{oos}}\, R$ requires only a BLAS level-2 (vector-matrix) call on an optimized system blas:live .
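An illustrative end-to-end sketch of this workflow follows (synthetic data; the orient call uses the interface described in section 10, and the factor scaling by $\Sigma^{-1}$ is this sketch's assumption, cf. (26) and (27)):

import numpy as np
import thucyd

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 4)) @ rng.standard_normal((4, 4))
X -= X.mean(axis=0)                                      # in-sample, centered
y = X @ np.array([1.0, -0.5, 0.2, 0.0]) + 0.1 * rng.standard_normal(1000)

# SVD and orientation
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Vor, Eor, signs, theta_mtx, sort_idx = thucyd.eigen.orient_eigenvectors(Vt.T, np.diag(s))

# regression on oriented, unit-scaled factors, cf. (26)
F = X @ Vor @ np.diag(1.0 / np.diag(Eor))
beta = np.linalg.lstsq(F, y, rcond=None)[0]

# prediction on a single out-of-sample observation, cf. (27)
x_oos = rng.standard_normal(4)
y_hat = (x_oos @ Vor @ np.diag(1.0 / np.diag(Eor))) @ beta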

9 Treatment of Evolving Data

Figure 5: Orthogonal modes of variation of an eigensystem. Left, in one variation mode, the eigenvalues of a new observation differ from those of the past, thereby stretching and/or compressing the point-cloud ellipse. Right, in another variation mode, it is the eigenvectors that vary in this way, thereby changing the orientation of the point-cloud ellipse with respect to the constituent basis.

Eigenvectors and eigenvalues calculated from data are themselves sample estimates subject to uncertainties based on the particular sample at hand: statistical uncertainties based on the number of independent samples available in each dimension (see Meucci:2007 , Section 4.2.2, Dispersion and Hidden Factors) as well as on assumptions about the underlying distribution (see Meucci:2007 , Section 4.3, Maximum Likelihood Estimators), and scale-related numerical uncertainties based on the degree of collinearity among factors. All of these uncertainties exist for a single, static dataset.

Additional uncertainty becomes manifest when data evolves because the estimate of the eigensystem will vary at each point in time, this variation being due in part to sample noise, and possibly due to nonstationarity of the underlying random processes.

In the presence of evolving data, therefore, the eigensystem fluctuates, even when consistently oriented. It is natural, then, to seek statistics for the location and dispersion of the eigensystem. To do so, two orthogonal modes of variation are identified (see figure 5): stretch variation, which is tied to change of the eigenvalues; and wobble variation, which is tied to change of the eigenvectors. Each is treated in turn.

9.1 Stretch Variation

Stretch variation is nominally simple because the eigenvalues $\lambda_i$ are scalar, real numbers. The mean and variance follow from the usual sample forms of the statistics. For an ensemble of $K$ samples, the average eigenvalues are simply

$\bar\lambda_i = \frac{1}{K} \sum_{k=1}^{K} \lambda_i^{(k)} . \qquad (28)$

A challenge for the variance statistic is that the eigenvalues are not independent, since their sum is tied to the total variance of the data, $\sum_i \lambda_i = \operatorname{tr}(A)$. How the covariance manifests itself is specific to the dataset at hand.

9.2 Wobble Variation

Quantification of wobble variation requires a different toolset because the eigenvectors are unit vectors in $\mathbb{R}^n$, and these orthonormal vectors point onto the surface of a hypersphere such that $\lVert v_i \rVert_2 = 1$. Figure 6, left, illustrates a case in point for $\mathbb{R}^3$: a stationary variation of the eigenvectors forms point constellations on the surface of the sphere, one constellation for each vector. (Figure 6 was drawn using point constellations sampled from the von Mises–Fisher distribution Mardia:2000 with a fixed concentration. Sampling this distribution in $\mathbb{R}^2$ and $\mathbb{R}^3$ was done using the VonMisesFisher class in the TensorFlow Probability package tensorflow:probability:2019 , which belongs to the TensorFlow platform tensorflow:2015 ; the class draws samples using the nonrejection-based method detailed by Kurz and Hanebeck Kurz:2015 .) The field of directional statistics informs us how to define mean direction (the vector analogue of location) and dispersion in a consistent manner for point constellations such as these Mardia:2000 ; Jammal:2001 ; Ley:2017 . Directional statistics provides a guide on how to use the vector information available from $V_{\mathrm{or}}$ and the embedded angle information recorded in $\Theta$, see (21). Recent work focuses on the application of these statistics to machine learning Sra:Directional:2018 . Nonetheless, the application of directional statistics to eigenvector systems appears to be underrepresented in the literature.

The following discussion applies only to eigensystems that have been consistently oriented.

Figure 6: Illustration of eigenvector wobble in $\mathbb{R}^3$ and in an $\mathbb{R}^2$ subspace. Left, points illustrating eigenvector wobble for a stationary process. The points lie on the hypersphere, and there is one constellation for each eigenvector. Samples were drawn from a von Mises–Fisher distribution with a fixed concentration. In this illustration the point constellations are identical, each having been rotated onto the mean direction of its vector. In general the point constellations are mixtures having different component dispersions. Right, the remaining eigenvectors may wobble in their own plane independently of the wobble of the leading eigenvector. Points here illustrate wobble in this $\mathbb{R}^2$ subspace.

9.2.1 Estimation of Mean Direction

Both mean direction and directional dispersion are measurable statistics for eigensystems, and determining the mean direction is the simpler of the two. This is because the eigenvectors are orthogonal in every instance of an eigenbasis, and thus the mean directions must also be orthogonal. Consequently, the eigenvectors can be treated uniformly.

It is a tenet of directional statistics that the mean direction is calculated from unit vectors, not from their angles. (Simple examples in the cited literature show that an angle-based measure is not invariant to the choice of a “zero” angle reference.) For an ensemble of $K$ unit vectors $x^{(k)}$, the mean direction is defined as

$\bar x = \frac{r}{\lVert r \rVert_2} , \qquad r = \sum_{k=1}^{K} x^{(k)} , \qquad (29)$

with the caveat that the mean direction is undefined for $\lVert r \rVert_2 = 0$. The resultant vector $r$ is the vector sum of the component vectors $x^{(k)}$ and has length $\lVert r \rVert_2$ under an $\ell_2$ norm (see figure 7). The mean direction is thus a unit-vector version of the resultant vector.

Extending this construction for mean direction to an ensemble of oriented eigenvector matrices $V_{\mathrm{or}}^{(k)}$, the mean location of the eigenvectors is

$\tilde V = \sum_{k=1}^{K} V_{\mathrm{or}}^{(k)} \qquad (30)$

and

$\bar V = \tilde V \operatorname{diag}\!\big( \lVert \tilde v_1 \rVert_2, \ldots, \lVert \tilde v_n \rVert_2 \big)^{-1} \qquad (31)$

for normalization.
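A sketch of this mean-direction estimate for an ensemble of oriented bases (synthetic ensemble; helper names hypothetical):

import numpy as np

def mean_direction_basis(Vor_list):
    V_sum = np.sum(Vor_list, axis=0)              # column-wise resultant vectors, eq. (30)
    return V_sum / np.linalg.norm(V_sum, axis=0)  # normalize each column, eq. (31)

def small_rotation(n, scale, rng):
    W = scale * rng.standard_normal((n, n))
    W = W - W.T                                   # skew-symmetric generator
    return (np.eye(n) - W) @ np.linalg.inv(np.eye(n) + W)   # Cayley map: rotation near I

# synthetic ensemble: one fixed basis wobbled by small random rotations
rng = np.random.default_rng(5)
base = np.linalg.qr(rng.standard_normal((3, 3)))[0]
ensemble = [base @ small_rotation(3, 0.02, rng) for _ in range(50)]
V_bar = mean_direction_basis(ensemble)            # columns are unit mean directions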

Using the methodologies above, the average basis $\bar V$ can be rotated onto $\mathcal{E}$ with a suitable rotation matrix, and in doing so, the Cartesian form of $\bar V$ can be converted into polar form. Using the generator function $\mathcal{G}$ from (22) to connect the two, we have

$\bar V = \mathcal{G}\big( \bar\Theta \big) . \qquad (32)$

Note that $\bar\Theta$ is not the arithmetic average of the angles but is exclusively derived from $\bar V$. In fact, an alternative way to express the basis sum in (30) is to write

$\tilde V = \sum_{k=1}^{K} \mathcal{G}\big( \Theta^{(k)} \big) , \qquad (33)$

and from this we see the path of analysis: from the ensemble of angle matrices $\Theta^{(k)}$ to the basis sum $\tilde V$, then to the normalized mean basis $\bar V$, and finally to its polar form $\bar\Theta$.

9.2.2 Estimation of Dispersion

Dispersion estimation is more involved for an ensemble of eigenbases than for an ensemble of single vectors because both common and differential modes of variation exist. Referring to figure 6, left, consider a case where the only driver of directional variation for the eigenbasis is a change of the pointing direction of the leading eigenvector $v_1$. As $v_1$ scatters about its mean direction, the remaining vectors will likewise scatter, together forming three constellations of points, as in the figure. Thus, wobble in $v_1$ imparts wobble in the other vectors. Since dispersion is a scalar independent of direction, the dispersion estimates along all three directions in this case are identical.

Yet there is another possible driver of variation that is orthogonal to $v_1$: motion within the $(v_2, v_3)$ plane. Such motion is equivalent to a pirouette of the plane about $v_1$. Taking $v_2$ as a reference, variation within this $\mathbb{R}^2$ subspace needs to be estimated; see figure 6, right. When viewed from the full space, however, the point-constellation distribution about $v_2$ is a mixture of variation drivers.

Regardless of the estimation technique for dispersion, it is clear that the pattern developed in section 6.3, Rotation Matrix Construction, of walking through descending subspaces of $\mathbb{R}^n$ to attain alignment occurs here as well. Thus, rather than calculate a dispersion estimate for each eigenvector, since those estimates are mixtures, a dispersion estimate is made for each subspace.

Returning to the subspace generator in (22), given an ensemble of angle matrices $\Theta^{(k)}$, the ensemble of rotations for the $i$th subspace is

$R_i^{(k)} = \mathcal{G}\big( \Theta^{(k)}, i \big), \qquad k = 1, \ldots, K , \qquad (34)$

where $\mathcal{G}(\cdot, i)$ generates the rotation associated with the $i$th subspace. From $R_i^{(k)}$ the $i$th column is extracted, and its lower entries are taken to produce a column vector $u_i^{(k)}$ that lives in the subspace. Figuratively,

$R_i^{(k)} e_i = \big( 0, \ldots, 0, \; u_i^{(k)} \big)^\top . \qquad (35)$

The reason for removing the top zero entries from the column vector is that parametric models of dispersion are isotropic in the subspace, and therefore keeping dimensions with zero variation would create an unintended distortion.

Figure 7: The resultant vector for two levels of concentration. Lower concentration, which is akin to higher dispersion, leads to a smaller expected norm of the resultant vector, whereas higher concentration leads to a larger expected norm. Sample vectors were generated from the von Mises–Fisher distribution. (Note that the difference in horizontal and vertical scales distorts the length of the unit vectors in the plots.)

Now that the vectors of interest have been identified, their dispersions can be considered. Unlike the mean direction, there is no single measure of dispersion. Two approaches are touched upon here, one being model free and the other based on a parametric density function. Both approaches rely on the resultant vector $r$.

Returning to figure 7, the two panels differ in that on the left there is a higher degree of randomness in the pointing directions of the unit vectors along the sequence, while on the right there is a lower degree of randomness. The consequence is that the expected lengths of the resultant vectors differ: the lower the randomness, the longer the expected length. Define the mean resultant length for $K$ samples as

$\bar r = \frac{1}{K} \Big\lVert \sum_{k=1}^{K} x^{(k)} \Big\rVert_2 . \qquad (36)$

For distributions on a circle, Mardia and Jupp define a model-free circular variance as $1 - \bar r$ (see Mardia:2000 ). Among other things, this variance is higher for more dispersed unit vectors and lower for less dispersed vectors. Yet, after generalization to higher dimensions, it remains unclear how to compare the variance from one dimension to another.

As an alternative, a model-based framework starts by positing an underlying parametric probability distribution and then seeks to define its properties and validate that they meet the desired criteria. In the present case, eigenvectors are directional, as opposed to axial; their evolution-induced scatter is probably unimodal, at least across short sequences; and their dimensionality is essentially arbitrary. An appropriate choice for an underlying distribution is therefore the von Mises–Fisher (vMF) distribution. The vMF density function on the unit sphere in $\mathbb{R}^d$ is Sra:Directional:2018

$f_d(x; \mu, \kappa) = C_d(\kappa)\, \exp\!\big( \kappa\, \mu^\top x \big) . \qquad (37)$

The single argument to the density function is the unit vector $x$, and the parameters are the dimension $d$, the mean direction $\mu$, and the concentration $\kappa$. The mean direction parameter is the same as in (29): $\mu = \bar x$. The concentration parameter $\kappa$, a scalar, is like an inverse variance. For $\kappa = 0$, unit vectors $x$ are uniformly distributed on the hypersphere, whereas with increasing $\kappa$ the density concentrates toward the mean direction $\mu$. Maximum likelihood estimation yields

$A_d(\hat\kappa) = \bar r ,$

where the nonlinear function $A_d$ is

$A_d(\kappa) = \frac{I_{d/2}(\kappa)}{I_{d/2 - 1}(\kappa)}$

and $I_\nu$ is the modified Bessel function of the first kind. A useful approximation is

$\hat\kappa \approx \frac{\bar r\, \big( d - \bar r^2 \big)}{1 - \bar r^2} .$

The key aspect here is the connection of the concentration parameter $\kappa$ to the mean resultant length $\bar r$ from (36). Notice as well that $\hat\kappa$ is related to the dimensionality $d$, indicating that concentration values across different dimensions cannot be directly compared.
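A sketch of these estimates from a constellation of unit vectors (helper names hypothetical; the closed-form $\hat\kappa$ uses the approximation noted above):

import numpy as np

def mean_resultant_length(unit_vectors):
    # unit_vectors: array of shape (K, d), one unit vector per row, eq. (36)
    K = unit_vectors.shape[0]
    return np.linalg.norm(unit_vectors.sum(axis=0)) / K

def kappa_estimate(unit_vectors):
    # approximate vMF concentration from the mean resultant length and dimension d
    d = unit_vectors.shape[1]
    r_bar = mean_resultant_length(unit_vectors)
    return r_bar * (d - r_bar**2) / (1.0 - r_bar**2)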

Sampling from the vMF distribution is naturally desirable. Early approaches used rejection methods (see the introduction in Kurz:2015 ), these being simpler, yet their runtime is not deterministic. Kurz and Hanebeck have since reported a stochastic sampling method that is deterministic, see Kurz:2015 , and it is this method that is implemented in TensorFlow (see the parenthetical note above). The samples in figure 6 were calculated with TensorFlow.

To conclude this section: in order to find the underlying drivers of variation along a sequence of eigenbasis observations, concentration parameters $\kappa_i$, or at least the mean resultant lengths $\bar r_i$, should be computed for each descending subspace in $\mathbb{R}^n$. The concentration parameters might then be stored as a vector,

$\boldsymbol\kappa = \big( \kappa_1, \kappa_2, \ldots \big)^\top . \qquad (38)$

9.3 Rank-Order Change of Eigenvectors

Along an evolving system, the rank order of the eigenvectors may change. This paper proposes no mathematically rigorous solution for dealing with such change (it is unclear that one exists, nor is its absence a defect of the current work), yet there are a few remarks that may assist the analyst in determining whether a rank-order change is a property of the data itself or a spurious effect that results from an upstream issue.

Eigenvectors are “labeled” by their eigenvalue: in the absence of eigenvalue degeneracy, each vector is mapped one-to-one with a value. The concept of rank-order change of eigenvectors along evolving data is only meaningful if a second label can be attached to the vectors such that the second label does not change when the first does. Intuitively (for evolving data), a second label is the pointing direction of the vectors. This choice is fraught: Two time series are not comparable if they begin at different moments along a sequence because the initial labeling may not match. Additionally, validation of secondary labeling stability needs to be performed, and revalidation is necessary for each update because stability cannot be guaranteed.

With an assumption that a second labeling type is suitably stable, rank-order changes may still be observed. Before it can be concluded that the effect is a trait of the data, modeling assumptions must be considered first. There is a significant assumption embedded in the estimation of the eigensystem itself: using an eig(..) function call presupposes that the underlying statistical distribution is Gaussian. However, data is often heavy tailed, so copula or implicit methods are appropriate where Gaussian-based methods are not robust (see Meucci:2007 ). Another modeling concern is the number of independent samples per dimension used in each panel: too few samples inherently skews the eigenvalue spectrum and, when coupled with sample noise, may induce spurious rank-order changes (again, see Meucci:2007 ). Lastly, PCA ought only to be performed on homogeneous data categories because the mixing of categories may well lead to spurious rank-order changes. It is better to apply PCA independently to each category and then combine the results.

If a rank-order change is determined to be a true trait of the data, then axial statistics is used in place of directional statistics. Referring to figure 6 above, two or more constellations will have some of their points mirror-imaged through the origin, creating a “barbell” shape along a common axis. A juxtaposition of the Watson density function, which is an axial distribution, with the von Mises–Fisher distribution shows that the axial direction is squared in order to treat the dual-signed nature of the pointing directions Sra:Axial:2013 . With this, directional statistics can be applied once again.

9.4 Regression Revisited

Let us apply the averages developed in this section to the equations in section 8, Application to Regression. The SVD and orientation steps remain the same. The linear regression of (26) is modified to use the local average of the eigenvalues from (28),

$y = X_{\mathrm{is}}\, V\, \bar\Sigma^{-1} S\; \beta + \epsilon , \qquad (39)$

where $\bar\Sigma$ is built from the averaged eigenvalues. The prediction step is also modified from (27) to read

$\hat y = x_{\mathrm{oos}}\, \bar V\, \bar\Sigma^{-1}\, \hat\beta . \qquad (40)$

Here the mean direction of the eigenvectors in (30) replaces the single-observation rotation matrix $R$ in the original.

Use of eigenvalue and eigenvector averages will reduce sample-based fluctuation and therefore may improve the predictions. However, from a time-series perspective, averages impart delay because an average is constructed over a lookback interval. For stationary underlying processes the delay may not be a problem, yet for nonstationary processes the lag between the average and the current state can lead to lower-quality predictions.

In any event, at the very least this section has presented a methodology to decompose variation across a sequence of eigensystem observations into meaningful quantities.

10 Reference Implementation

The Python package thucyd, written by this author, is freely available from PyPI (pypi.org/project/thucyd) and conda-forge (github.com/conda-forge/thucyd-feedstock) and can be used directly once installed. The source code is available at gitlab.com/thucyd-dev/thucyd.

There are two functions exposed on the interface of thucyd.eigen:

  • orient_eigenvectors implements the algorithm in section 6, and

  • generate_oriented_eigenvectors implements the generator function $\mathcal{G}$ of eq. (22).

The listing in Algorithm 1 outlines the reference implementation of orient_eigenvectors that is available at the above-cited source-code repositories; it is written here as NumPy-style Python rather than the production code. Matrix and vector indexing follows the Python NumPy notation.

import numpy as np

def orient_eigenvectors(V, E):
    # sort eigenvalues by descending absolute value; reorder eigenvectors to match
    sort_indices = np.argsort(np.abs(np.diag(E)))[::-1]
    Vsort, Esort = V[:, sort_indices], np.diag(np.diag(E)[sort_indices])
    n = Vsort.shape[0]
    Vwork, sign_flip_vec, angles_mtx = Vsort.copy(), np.ones(n), np.zeros((n, n))
    for col in range(n):
        if Vwork[col, col] < 0.0:          # reflection test on the pivot, eq. (12)
            sign_flip_vec[col] = -1.0
            Vwork[:, col] *= -1.0
        # rotation: align the reflected eigenvector with its constituent axis
        Vwork, angles_mtx[col, col + 1:] = reduce_dimension_by_one(col, Vwork)
    Vor = Vsort @ np.diag(sign_flip_vec)   # eq. (8): Vor = V S
    return Vor, Esort, sign_flip_vec, angles_mtx, sort_indices

def reduce_dimension_by_one(col, Vwork):
    n, vcol = Vwork.shape[0], Vwork[:, col]
    angles, r = np.zeros(n - col - 1), 1.0
    for j in range(n - 1, col, -1):        # arcsine method, bottom row upward, eq. (20)
        if abs(r) > 1e-12:
            angles[j - col - 1] = np.arcsin(np.clip(vcol[j] / r, -1.0, 1.0))
        r *= np.cos(angles[j - col - 1])
    R = np.eye(n)                          # cascade of Givens rotations, eq. (17)
    for j in range(col + 1, n):
        R = R @ make_givens_rotation(n, col, j, angles[j - col - 1])
    return R.T @ Vwork, angles             # R.T sends the pivot column onto its axis

def make_givens_rotation(n, i, j, theta):
    # counterclockwise rotation by theta in the (i, j) plane of R^n
    G = np.eye(n)
    G[i, i] = G[j, j] = np.cos(theta)
    G[i, j], G[j, i] = -np.sin(theta), np.sin(theta)
    return G

Algorithm 1: NumPy-style sketch of orient_eigenvectors, following the structure of the reference implementation.

10.1 Eigenvector Orientation

A simple orientation example in $\mathbb{R}^3$ reads as follows:

import numpy as np
import thucyd
# setup an example eigenvector mtx with qr
full_dim = 3; A = np.eye(full_dim)
A[:, 0] = 1. / np.sqrt(full_dim)
V, _ = np.linalg.qr(A)
# setup trivial eigenvalues, descending order
E = np.diag(np.arange(full_dim)[::-1])
# reflect the second eigenvector to build Vin
S = np.diag(np.array([1., -1., 1.]))
Vin = V.dot(S)
# orient the original matrix V
Vor, Eor, signs, theta_matrix, sort_indices = \
    thucyd.eigen.orient_eigenvectors(Vin, E)