Eigenvalue and Generalized Eigenvalue Problems: Tutorial

Benyamin Ghojogh et al., March 25, 2019

This paper is a tutorial for eigenvalue and generalized eigenvalue problems. We first introduce the eigenvalue problem, eigen-decomposition (spectral decomposition), and the generalized eigenvalue problem. Then, we mention the optimization problems which lead to the eigenvalue and generalized eigenvalue problems. We also provide examples from machine learning, including principal component analysis, kernel supervised principal component analysis, and Fisher discriminant analysis, which result in eigenvalue and generalized eigenvalue problems. Finally, we introduce the solutions to both the eigenvalue and generalized eigenvalue problems.


1 Introduction

Eigenvalue and generalized eigenvalue problems play important roles in different fields of science, especially in machine learning. In the eigenvalue problem, the eigenvectors represent the directions of the spread or variance of the data and the corresponding eigenvalues are the magnitudes of the spread in these directions (Jolliffe, 2011). In the generalized eigenvalue problem, these directions are impacted by another matrix. If the other matrix is the identity matrix, this impact is canceled and we have the eigenvalue problem, capturing the directions of maximum spread.

In this paper, we introduce the eigenvalue problem and the generalized eigenvalue problem and present their solutions. We also introduce the optimization problems which lead to the eigenvalue and generalized eigenvalue problems. Some examples of these optimization problems in machine learning are also introduced for better illustration; the examples include principal component analysis, kernel supervised principal component analysis, and Fisher discriminant analysis.

2 Introducing Eigenvalue and Generalized Eigenvalue Problems

In this section, we introduce the eigenvalue problem and generalized eigenvalue problem.

2.1 Eigenvalue Problem

The eigenvalue problem (Wilkinson, 1965; Golub & Van Loan, 2012) of a symmetric matrix $A \in \mathbb{R}^{d \times d}$ is defined as:

$A \phi_i = \lambda_i \phi_i, \quad \forall i \in \{1, \dots, d\},$   (1)

and in matrix form, it is:

$A \Phi = \Phi \Lambda,$   (2)

where the columns of $\Phi \in \mathbb{R}^{d \times d}$ are the eigenvectors and the diagonal elements of $\Lambda \in \mathbb{R}^{d \times d}$ are the eigenvalues. Note that $\phi_i \in \mathbb{R}^{d}$ and $\lambda_i \in \mathbb{R}$.

Note that, in the eigenvalue problem, the matrix $A$ can also be non-symmetric. If the matrix is symmetric, its eigenvectors are orthogonal/orthonormal; if it is non-symmetric, its eigenvectors are not necessarily orthogonal.

Eq. (2) can be restated as:

$A \Phi = \Phi \Lambda \implies A = \Phi \Lambda \Phi^{-1} \overset{(a)}{=} \Phi \Lambda \Phi^\top,$   (3)

where $(a)$ is because $\Phi^{-1} = \Phi^\top$ since $\Phi$ is an orthogonal matrix. Moreover, note that we always have $\Phi^\top \Phi = I$ for an orthogonal $\Phi$, but we only have $\Phi \Phi^\top = I$ if "all" the columns of the orthogonal $\Phi$ exist (i.e., $\Phi$ is not truncated and is a square matrix). Eq. (3) is referred to as "eigenvalue decomposition", "eigen-decomposition", or "spectral decomposition".
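As a quick numerical check of Eqs. (2) and (3), the following sketch (our own illustration using NumPy, which is not part of the paper) eigen-decomposes a small symmetric matrix and verifies the spectral decomposition and the orthonormality of the eigenvectors:

```python
import numpy as np

# A small symmetric matrix (so its eigenvectors are orthonormal).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# np.linalg.eigh is specialized for symmetric matrices and returns the
# eigenvalues in ascending order together with orthonormal eigenvectors.
eigvals, Phi = np.linalg.eigh(A)
Lam = np.diag(eigvals)

assert np.allclose(A @ Phi, Phi @ Lam)          # Eq. (2): A Phi = Phi Lambda
assert np.allclose(A, Phi @ Lam @ Phi.T)        # Eq. (3): A = Phi Lambda Phi^T
assert np.allclose(Phi.T @ Phi, np.eye(3))      # orthonormal eigenvectors
```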

2.2 Generalized Eigenvalue Problem

The generalized eigenvalue problem (Parlett, 1998; Golub & Van Loan, 2012) of two symmetric matrices $A \in \mathbb{R}^{d \times d}$ and $B \in \mathbb{R}^{d \times d}$ is defined as:

$A \phi_i = \lambda_i B \phi_i, \quad \forall i \in \{1, \dots, d\},$   (4)

and in matrix form, it is:

$A \Phi = B \Phi \Lambda,$   (5)

where the columns of $\Phi \in \mathbb{R}^{d \times d}$ are the eigenvectors and the diagonal elements of $\Lambda \in \mathbb{R}^{d \times d}$ are the eigenvalues. Note that $\phi_i \in \mathbb{R}^{d}$ and $\lambda_i \in \mathbb{R}$.

The generalized eigenvalue problem of Eq. (4) or (5) is denoted by $(A, B)$. The $(A, B)$ is called the "pair" or "pencil" (Parlett, 1998); the order in the pair matters. The $\Phi$ and $\Lambda$ are the generalized eigenvectors and eigenvalues of $(A, B)$. The $(\Phi, \Lambda)$ or $(\phi_i, \lambda_i)$ is called the "eigenpair" of the pair $(A, B)$ in the literature (Parlett, 1998).

Comparing Eqs. (1) and (4) or Eqs. (2) and (5) shows that the eigenvalue problem is a special case of the generalized eigenvalue problem where $B = I$.
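For illustration, the sketch below (our own, using scipy.linalg rather than anything prescribed by the paper) solves a small symmetric pencil with a positive definite $B$ and checks Eq. (4), including the reduction to the ordinary eigenvalue problem when $B = I$:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T                               # symmetric
B = np.eye(5) + 0.1 * np.ones((5, 5))     # symmetric positive definite

# scipy.linalg.eigh solves the symmetric-definite pencil (A, B): A phi = lambda B phi.
eigvals, Phi = linalg.eigh(A, B)

# Each column of Phi satisfies A phi_i = lambda_i B phi_i (Eq. 4).
for lam, phi in zip(eigvals, Phi.T):
    assert np.allclose(A @ phi, lam * B @ phi)

# With B = I the generalized problem reduces to the ordinary eigenvalue problem.
assert np.allclose(linalg.eigh(A, np.eye(5))[0], np.linalg.eigvalsh(A))
```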

3 Eigenvalue Optimization

In this section, we introduce the optimization problems which lead to the eigenvalue problem.

3.1 Optimization Form 1

Consider the following optimization problem with the variable $\phi \in \mathbb{R}^{d}$:

$\underset{\phi}{\text{maximize}} \quad \phi^\top A \phi,$   (6)
subject to $\quad \phi^\top \phi = 1,$

where $A \in \mathbb{R}^{d \times d}$. The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (6) is:

$\mathcal{L} = \phi^\top A \phi - \lambda (\phi^\top \phi - 1),$

where $\lambda$ is the Lagrange multiplier. Equating the derivative of the Lagrangian to zero gives us:

$\frac{\partial \mathcal{L}}{\partial \phi} = 2 A \phi - 2 \lambda \phi \overset{\text{set}}{=} 0 \implies A \phi = \lambda \phi,$

which is an eigenvalue problem for $A$ according to Eq. (1). The $\phi$ is the eigenvector of $A$ and the $\lambda$ is the eigenvalue.

As Eq. (6) is a maximization problem, the desired eigenvector is the one having the largest eigenvalue. If Eq. (6) were a minimization problem, the desired eigenvector would be the one having the smallest eigenvalue.
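The following toy check (our own NumPy sketch; the random matrix and the sampling loop are illustrative assumptions) confirms that the constrained maximum in Eq. (6) is attained by the top eigenvector and equals the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T                                # symmetric positive semi-definite

eigvals, Phi = np.linalg.eigh(A)           # eigenvalues in ascending order
phi_max = Phi[:, -1]                       # eigenvector of the largest eigenvalue

# The attained maximum of phi^T A phi over unit vectors is the largest eigenvalue.
print(phi_max @ A @ phi_max, eigvals[-1])

# No random unit vector exceeds it.
for _ in range(1000):
    v = rng.standard_normal(4)
    v /= np.linalg.norm(v)
    assert v @ A @ v <= eigvals[-1] + 1e-10
```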

3.2 Optimization Form 2

Consider the following optimization problem with the variable $\Phi \in \mathbb{R}^{d \times d}$:

$\underset{\Phi}{\text{maximize}} \quad \mathbf{tr}(\Phi^\top A \Phi),$   (7)
subject to $\quad \Phi^\top \Phi = I,$

where $A \in \mathbb{R}^{d \times d}$, $\mathbf{tr}(\cdot)$ denotes the trace of a matrix, and $I$ is the identity matrix. Note that according to the cyclic property of the trace, the objective function can be written as any of $\mathbf{tr}(\Phi^\top A \Phi) = \mathbf{tr}(A \Phi \Phi^\top) = \mathbf{tr}(\Phi \Phi^\top A)$.

The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (7) is:

$\mathcal{L} = \mathbf{tr}(\Phi^\top A \Phi) - \mathbf{tr}\big(\Lambda^\top (\Phi^\top \Phi - I)\big),$

where $\Lambda \in \mathbb{R}^{d \times d}$ is a diagonal matrix whose diagonal entries are the Lagrange multipliers.

Equating the derivative of $\mathcal{L}$ to zero gives us:

$\frac{\partial \mathcal{L}}{\partial \Phi} = 2 A \Phi - 2 \Phi \Lambda \overset{\text{set}}{=} 0 \implies A \Phi = \Phi \Lambda,$

which is an eigenvalue problem for $A$ according to Eq. (2). The columns of $\Phi$ are the eigenvectors of $A$ and the diagonal elements of $\Lambda$ are the eigenvalues.

As Eq. (7) is a maximization problem, the eigenvalues and eigenvectors in $\Lambda$ and $\Phi$ are sorted from the largest to the smallest eigenvalue. If Eq. (7) were a minimization problem, they would be sorted from the smallest to the largest eigenvalue.
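A similar sketch (again our own NumPy illustration, not from the paper) shows that the trace objective of Eq. (7), restricted to matrices with $k$ orthonormal columns, is maximized by the eigenvectors of the $k$ largest eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T                                # symmetric

eigvals, V = np.linalg.eigh(A)             # ascending eigenvalues
k = 2
Phi_best = V[:, -k:]                       # eigenvectors of the k largest eigenvalues

# The maximal trace equals the sum of the k largest eigenvalues.
print(np.trace(Phi_best.T @ A @ Phi_best), eigvals[-k:].sum())

# Any other matrix with orthonormal columns (e.g., a random one via QR) does worse.
Q, _ = np.linalg.qr(rng.standard_normal((6, k)))
assert np.trace(Q.T @ A @ Q) <= eigvals[-k:].sum() + 1e-10
```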

3.3 Optimization Form 3

Consider the following optimization problem with the variable $\phi \in \mathbb{R}^{d}$:

$\underset{\phi}{\text{minimize}} \quad \|X - \phi \phi^\top X\|_F^2,$   (8)
subject to $\quad \phi^\top \phi = 1,$

where $X \in \mathbb{R}^{d \times n}$ and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

The objective function in Eq. (8) is simplified as:

$\|X - \phi \phi^\top X\|_F^2 = \mathbf{tr}\big((X - \phi \phi^\top X)^\top (X - \phi \phi^\top X)\big) = \mathbf{tr}(X^\top X) - \mathbf{tr}(X^\top \phi \phi^\top X),$

where the last equality uses the constraint $\phi^\top \phi = 1$.

The Lagrangian (Boyd & Vandenberghe, 2004) is:

$\mathcal{L} = \mathbf{tr}(X^\top X) - \mathbf{tr}(X^\top \phi \phi^\top X) + \lambda (\phi^\top \phi - 1),$

where $\lambda$ is the Lagrange multiplier. Equating the derivative of $\mathcal{L}$ to zero gives:

$\frac{\partial \mathcal{L}}{\partial \phi} = -2 X X^\top \phi + 2 \lambda \phi \overset{\text{set}}{=} 0 \implies X X^\top \phi = \lambda \phi \overset{(a)}{\implies} A \phi = \lambda \phi,$

where $(a)$ is because we take $A := X X^\top$. The $A \phi = \lambda \phi$ is an eigenvalue problem for $A$ according to Eq. (1). The $\phi$ is the eigenvector of $A$ and the $\lambda$ is the eigenvalue.

3.4 Optimization Form 4

Consider the following optimization problem with the variable $\Phi \in \mathbb{R}^{d \times d}$:

$\underset{\Phi}{\text{minimize}} \quad \|X - \Phi \Phi^\top X\|_F^2,$   (9)
subject to $\quad \Phi^\top \Phi = I,$

where $X \in \mathbb{R}^{d \times n}$.

Similar to what we had for Eq. (8), the objective function in Eq. (9) is simplified as:

$\|X - \Phi \Phi^\top X\|_F^2 = \mathbf{tr}(X^\top X) - \mathbf{tr}(X^\top \Phi \Phi^\top X).$

The Lagrangian (Boyd & Vandenberghe, 2004) is:

$\mathcal{L} = \mathbf{tr}(X^\top X) - \mathbf{tr}(X^\top \Phi \Phi^\top X) + \mathbf{tr}\big(\Lambda^\top (\Phi^\top \Phi - I)\big),$

where $\Lambda$ is a diagonal matrix including the Lagrange multipliers. Equating the derivative of $\mathcal{L}$ to zero gives:

$\frac{\partial \mathcal{L}}{\partial \Phi} = -2 X X^\top \Phi + 2 \Phi \Lambda \overset{\text{set}}{=} 0 \implies X X^\top \Phi = \Phi \Lambda \implies A \Phi = \Phi \Lambda,$

with $A := X X^\top$, which is an eigenvalue problem for $A$ according to Eq. (2). The columns of $\Phi$ are the eigenvectors of $A$ and the diagonal elements of $\Lambda$ are the eigenvalues.

3.5 Optimization Form 5

Consider the following optimization problem with the variable $\phi \in \mathbb{R}^{d}$:

$\underset{\phi}{\text{maximize}} \quad \frac{\phi^\top A \phi}{\phi^\top \phi}.$   (10)

According to the Rayleigh-Ritz quotient method (Croot, 2005), this optimization problem can be restated as:

$\underset{\phi}{\text{maximize}} \quad \phi^\top A \phi,$   (11)
subject to $\quad \phi^\top \phi = 1.$

The Lagrangian (Boyd & Vandenberghe, 2004) is:

$\mathcal{L} = \phi^\top A \phi - \lambda (\phi^\top \phi - 1),$

where $\lambda$ is the Lagrange multiplier. Equating the derivative of $\mathcal{L}$ to zero gives:

$\frac{\partial \mathcal{L}}{\partial \phi} = 2 A \phi - 2 \lambda \phi \overset{\text{set}}{=} 0 \implies A \phi = \lambda \phi,$

which is an eigenvalue problem for $A$ according to Eq. (1). The $\phi$ is the eigenvector of $A$ and the $\lambda$ is the eigenvalue.

As Eq. (10) is a maximization problem, the desired eigenvector is the one having the largest eigenvalue. If Eq. (10) were a minimization problem, the desired eigenvector would be the one having the smallest eigenvalue.

4 Generalized Eigenvalue Optimization

In this section, we introduce the optimization problems which lead to the generalized eigenvalue problem.

4.1 Optimization Form 1

Consider the following optimization problem with the variable $\phi \in \mathbb{R}^{d}$:

$\underset{\phi}{\text{maximize}} \quad \phi^\top A \phi,$   (12)
subject to $\quad \phi^\top B \phi = 1,$

where $A \in \mathbb{R}^{d \times d}$ and $B \in \mathbb{R}^{d \times d}$. The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (12) is:

$\mathcal{L} = \phi^\top A \phi - \lambda (\phi^\top B \phi - 1),$

where $\lambda$ is the Lagrange multiplier. Equating the derivative of the Lagrangian to zero gives us:

$\frac{\partial \mathcal{L}}{\partial \phi} = 2 A \phi - 2 \lambda B \phi \overset{\text{set}}{=} 0 \implies A \phi = \lambda B \phi,$

which is a generalized eigenvalue problem $(A, B)$ according to Eq. (4). The $\phi$ is the eigenvector and the $\lambda$ is the eigenvalue for this problem.

As Eq. (12) is a maximization problem, the desired eigenvector is the one having the largest eigenvalue. If Eq. (12) were a minimization problem, the desired eigenvector would be the one having the smallest eigenvalue.

Comparing Eqs. (6) and (12) shows that the eigenvalue problem is a special case of the generalized eigenvalue problem where $B = I$.
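As an illustration of Eq. (12), the sketch below (our own assumption; scipy.linalg.eigh normalizes the eigenvectors so that $\Phi^\top B \Phi = I$) checks that the top generalized eigenvector satisfies the constraint and attains the largest eigenvalue:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)); A = A @ A.T
B = rng.standard_normal((4, 4)); B = B @ B.T + 4 * np.eye(4)   # positive definite

# Solve the pencil (A, B); the eigenvectors are normalized so that phi^T B phi = 1.
eigvals, Phi = linalg.eigh(A, B)
phi = Phi[:, -1]                    # eigenvector of the largest eigenvalue

print(phi @ B @ phi)                # 1, i.e., the constraint of Eq. (12) holds
print(phi @ A @ phi, eigvals[-1])   # the attained maximum equals the largest eigenvalue
```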

4.2 Optimization Form 2

Consider the following optimization problem with the variable $\Phi \in \mathbb{R}^{d \times d}$:

$\underset{\Phi}{\text{maximize}} \quad \mathbf{tr}(\Phi^\top A \Phi),$   (13)
subject to $\quad \Phi^\top B \Phi = I,$

where $A \in \mathbb{R}^{d \times d}$ and $B \in \mathbb{R}^{d \times d}$. Note that according to the cyclic property of the trace, the objective function can be written as any of $\mathbf{tr}(\Phi^\top A \Phi) = \mathbf{tr}(A \Phi \Phi^\top) = \mathbf{tr}(\Phi \Phi^\top A)$.

The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (13) is:

$\mathcal{L} = \mathbf{tr}(\Phi^\top A \Phi) - \mathbf{tr}\big(\Lambda^\top (\Phi^\top B \Phi - I)\big),$

where $\Lambda \in \mathbb{R}^{d \times d}$ is a diagonal matrix whose diagonal entries are the Lagrange multipliers.

Equating the derivative of $\mathcal{L}$ to zero gives us:

$\frac{\partial \mathcal{L}}{\partial \Phi} = 2 A \Phi - 2 B \Phi \Lambda \overset{\text{set}}{=} 0 \implies A \Phi = B \Phi \Lambda,$

which is a generalized eigenvalue problem $(A, B)$ according to Eq. (5). The columns of $\Phi$ are the eigenvectors and the diagonal elements of $\Lambda$ are the eigenvalues.

As Eq. (13) is a maximization problem, the eigenvalues and eigenvectors in $\Lambda$ and $\Phi$ are sorted from the largest to the smallest eigenvalue. If Eq. (13) were a minimization problem, they would be sorted from the smallest to the largest eigenvalue.

4.3 Optimization Form 3

Consider the following optimization problem with the variable $\phi \in \mathbb{R}^{d}$:

$\underset{\phi}{\text{minimize}} \quad \|X - \phi \phi^\top X\|_F^2,$   (14)
subject to $\quad \phi^\top B \phi = 1,$

where $X \in \mathbb{R}^{d \times n}$.

Similar to what we had for Eq. (8), the objective function in Eq. (14) is simplified as:

$\|X - \phi \phi^\top X\|_F^2 = \mathbf{tr}(X^\top X) - \mathbf{tr}(X^\top \phi \phi^\top X).$

The Lagrangian (Boyd & Vandenberghe, 2004) is:

$\mathcal{L} = \mathbf{tr}(X^\top X) - \mathbf{tr}(X^\top \phi \phi^\top X) + \lambda (\phi^\top B \phi - 1),$

where $\lambda$ is the Lagrange multiplier. Equating the derivative of $\mathcal{L}$ to zero gives:

$\frac{\partial \mathcal{L}}{\partial \phi} = -2 X X^\top \phi + 2 \lambda B \phi \overset{\text{set}}{=} 0 \implies X X^\top \phi = \lambda B \phi \implies A \phi = \lambda B \phi,$

with $A := X X^\top$, which is a generalized eigenvalue problem $(A, B)$ according to Eq. (4). The $\phi$ is the eigenvector and the $\lambda$ is the eigenvalue.

4.4 Optimization Form 4

Consider the following optimization problem with the variable $\Phi \in \mathbb{R}^{d \times d}$:

$\underset{\Phi}{\text{minimize}} \quad \|X - \Phi \Phi^\top X\|_F^2,$   (15)
subject to $\quad \Phi^\top B \Phi = I,$

where $X \in \mathbb{R}^{d \times n}$.

Similar to what we had for Eq. (9), the objective function in Eq. (15) is simplified as:

$\|X - \Phi \Phi^\top X\|_F^2 = \mathbf{tr}(X^\top X) - \mathbf{tr}(X^\top \Phi \Phi^\top X).$

The Lagrangian (Boyd & Vandenberghe, 2004) is:

$\mathcal{L} = \mathbf{tr}(X^\top X) - \mathbf{tr}(X^\top \Phi \Phi^\top X) + \mathbf{tr}\big(\Lambda^\top (\Phi^\top B \Phi - I)\big),$

where $\Lambda$ is a diagonal matrix including the Lagrange multipliers. Equating the derivative of $\mathcal{L}$ to zero gives:

$\frac{\partial \mathcal{L}}{\partial \Phi} = -2 X X^\top \Phi + 2 B \Phi \Lambda \overset{\text{set}}{=} 0 \implies X X^\top \Phi = B \Phi \Lambda \implies A \Phi = B \Phi \Lambda,$

with $A := X X^\top$, which is a generalized eigenvalue problem $(A, B)$ according to Eq. (5). The columns of $\Phi$ are the eigenvectors and the diagonal elements of $\Lambda$ are the eigenvalues.

4.5 Optimization Form 5

Consider the following optimization problem (Parlett, 1998) with the variable $\phi \in \mathbb{R}^{d}$:

$\underset{\phi}{\text{maximize}} \quad \frac{\phi^\top A \phi}{\phi^\top B \phi}.$   (16)

According to the Rayleigh-Ritz quotient method (Croot, 2005), this optimization problem can be restated as:

$\underset{\phi}{\text{maximize}} \quad \phi^\top A \phi,$   (17)
subject to $\quad \phi^\top B \phi = 1.$

The Lagrangian (Boyd & Vandenberghe, 2004) is:

$\mathcal{L} = \phi^\top A \phi - \lambda (\phi^\top B \phi - 1),$

where $\lambda$ is the Lagrange multiplier. Equating the derivative of $\mathcal{L}$ to zero gives:

$\frac{\partial \mathcal{L}}{\partial \phi} = 2 A \phi - 2 \lambda B \phi \overset{\text{set}}{=} 0 \implies A \phi = \lambda B \phi,$

which is a generalized eigenvalue problem $(A, B)$ according to Eq. (4). The $\phi$ is the eigenvector and the $\lambda$ is the eigenvalue.

As Eq. (16) is a maximization problem, the desired eigenvector is the one having the largest eigenvalue. If Eq. (16) were a minimization problem, the desired eigenvector would be the one having the smallest eigenvalue.

5 Examples for the Optimization Problems

In this section, we introduce some examples in machine learning which use the introduced optimization problems.

5.1 Examples for Eigenvalue Problem

5.1.1 Variance in Principal Component Analysis

In Principal Component Analysis (PCA) (Pearson, 1901; Friedman et al., 2009), if we want to project onto one vector (a one-dimensional PCA subspace), the problem is:

$\underset{u}{\text{maximize}} \quad u^\top S u,$   (18)
subject to $\quad u^\top u = 1,$

where $u \in \mathbb{R}^{d}$ is the projection direction and $S$ is the covariance matrix. Therefore, according to Eq. (6), $u$ is the eigenvector of $S$ with the largest eigenvalue.

If we want to project onto a PCA subspace spanned by several directions, we have:

$\underset{U}{\text{maximize}} \quad \mathbf{tr}(U^\top S U),$   (19)
subject to $\quad U^\top U = I,$

where the columns of $U \in \mathbb{R}^{d \times p}$ span the PCA subspace.
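The following sketch (a toy NumPy example of ours, not from the paper) computes the PCA directions of Eqs. (18) and (19) by eigen-decomposing the sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy data: 200 samples in 3 dimensions, stored as columns of X (d x n),
# with different variances along the coordinate axes.
X = rng.standard_normal((3, 200)) * np.array([[3.0], [1.0], [0.3]])
X = X - X.mean(axis=1, keepdims=True)       # center the data

S = X @ X.T / X.shape[1]                    # sample covariance matrix

eigvals, U = np.linalg.eigh(S)              # ascending eigenvalues
U, eigvals = U[:, ::-1], eigvals[::-1]      # sort descending

# The first column of U is the direction of maximum variance (Eq. 18);
# the top-k columns span the k-dimensional PCA subspace (Eq. 19).
k = 2
projections = U[:, :k].T @ X                # k x n projected data
print(eigvals)                              # variances along the principal directions
```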

5.1.2 Reconstruction in Principal Component Analysis

We can look at PCA from another perspective: PCA is the best linear projection in the sense that it has the smallest reconstruction error. If we have one PCA direction $u$, the projection of the data $X$ is $u^\top X$ and the reconstruction is $u u^\top X$. We want the error between the reconstructed data and the original data to be minimized:

$\underset{u}{\text{minimize}} \quad \|X - u u^\top X\|_F^2,$   (20)
subject to $\quad u^\top u = 1.$

Therefore, according to Eq. (8), $u$ is the eigenvector of the covariance matrix $S = X X^\top$ (the $X$ is already centered by removing its mean).

If we consider several PCA directions, i.e., the columns of $U$, the minimization of the reconstruction error is:

$\underset{U}{\text{minimize}} \quad \|X - U U^\top X\|_F^2,$   (21)
subject to $\quad U^\top U = I.$

Thus, according to Eq. (9), the columns of $U$ are the eigenvectors of the covariance matrix $X X^\top$ (the $X$ is already centered by removing its mean).
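To illustrate Eqs. (20) and (21), this sketch (again our own toy example) compares the reconstruction error of the top-$k$ eigenvectors of the covariance matrix with that of an arbitrary orthonormal basis:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((3, 200)) * np.array([[3.0], [1.0], [0.3]])
X = X - X.mean(axis=1, keepdims=True)       # centered data

eigvals, V = np.linalg.eigh(X @ X.T)
U = V[:, ::-1]                              # eigenvectors, descending eigenvalues

def reconstruction_error(W):
    """Squared Frobenius error of reconstructing X from its projection onto W."""
    return np.linalg.norm(X - W @ W.T @ X, 'fro') ** 2

k = 2
U_k = U[:, :k]                              # top-k eigenvectors of the covariance
Q, _ = np.linalg.qr(rng.standard_normal((3, k)))   # an arbitrary orthonormal basis

# The top-k eigenvectors give the smallest reconstruction error.
print(reconstruction_error(U_k), '<=', reconstruction_error(Q))
```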

5.2 Examples for Generalized Eigenvalue Problem

5.2.1 Kernel Supervised Principal Component Analysis

Kernel Supervised PCA (SPCA) (Barshan et al., 2011) uses the following optimization problem:

$\underset{\Theta}{\text{maximize}} \quad \mathbf{tr}(\Theta^\top K_x H K_y H K_x \Theta),$   (22)
subject to $\quad \Theta^\top K_x \Theta = I,$

where $K_x$ and $K_y$ are the kernel matrices over the training data and the labels of the training data, respectively, $H := I - \frac{1}{n} \mathbf{1}\mathbf{1}^\top$ is the centering matrix, and the columns of $\Theta$ span the kernel SPCA subspace.

According to Eq. (13), the solution to Eq. (22) is:

$K_x H K_y H K_x \Theta = K_x \Theta \Lambda,$   (23)

which is a generalized eigenvalue problem according to Eq. (5), where $\Theta$ and $\Lambda$ are the eigenvector and eigenvalue matrices, respectively.
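A minimal sketch of the kernel SPCA eigenproblem in Eq. (23) is given below; the RBF kernel, the random toy data, and the small ridge added to $K_x$ are illustrative assumptions of ours, not part of the original formulation:

```python
import numpy as np
from scipy import linalg

def rbf_kernel(Z, gamma=1.0):
    # An illustrative RBF kernel; the kernel choice is not prescribed here.
    sq = np.sum(Z ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * Z @ Z.T))

rng = np.random.default_rng(6)
n, d, p = 50, 5, 2                       # n samples, d features, p-dimensional subspace
X = rng.standard_normal((n, d))          # toy training data (rows are samples here)
y = rng.standard_normal((n, 1))          # toy labels

K_x = rbf_kernel(X)                      # kernel matrix over the training data
K_y = y @ y.T                            # linear kernel matrix over the labels
H = np.eye(n) - np.ones((n, n)) / n      # centering matrix

A = K_x @ H @ K_y @ H @ K_x              # left-hand matrix of the pencil in Eq. (23)
B = K_x + 1e-8 * np.eye(n)               # right-hand matrix, with a small ridge (our
                                         # own addition) for numerical stability

eigvals, Theta = linalg.eigh(A, B)       # generalized eigenvalue problem
Theta = Theta[:, ::-1][:, :p]            # top-p generalized eigenvectors span the subspace
```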

5.2.2 Fisher Discriminant Analysis

Another example is Fisher Discriminant Analysis (FDA) (Fisher, 1936; Friedman et al., 2009), in which the Fisher criterion (Xu & Lu, 2006) is maximized:

$\underset{w}{\text{maximize}} \quad \frac{w^\top S_B\, w}{w^\top S_W\, w},$   (24)

where $w$ is the projection direction and $S_B$ and $S_W$ are the between- and within-class scatter matrices:

$S_B := \sum_{j=1}^{c} n_j (\mu_j - \mu)(\mu_j - \mu)^\top,$   (25)
$S_W := \sum_{j=1}^{c} \sum_{i=1}^{n_j} (x_{j,i} - \mu_j)(x_{j,i} - \mu_j)^\top,$   (26)

where $c$ is the number of classes, $n_j$ is the sample size of the $j$-th class, $x_{j,i}$ is the $i$-th data point in the $j$-th class, $\mu_j$ is the mean of the $j$-th class, and $\mu$ is the total mean.

According to the Rayleigh-Ritz quotient method (Croot, 2005), the optimization problem in Eq. (24) can be restated as:

$\underset{w}{\text{maximize}} \quad w^\top S_B\, w,$   (27)
subject to $\quad w^\top S_W\, w = 1.$

The Lagrangian (Boyd & Vandenberghe, 2004) is:

$\mathcal{L} = w^\top S_B\, w - \lambda (w^\top S_W\, w - 1),$

where $\lambda$ is the Lagrange multiplier. Equating the derivative of $\mathcal{L}$ to zero gives:

$\frac{\partial \mathcal{L}}{\partial w} = 2 S_B\, w - 2 \lambda S_W\, w \overset{\text{set}}{=} 0 \implies S_B\, w = \lambda S_W\, w,$

which is a generalized eigenvalue problem $(S_B, S_W)$ according to Eq. (4). The $w$ is the eigenvector with the largest eigenvalue and the $\lambda$ is the corresponding eigenvalue.
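The sketch below (our own toy example with two Gaussian classes) builds the scatter matrices of Eqs. (25) and (26) and obtains the Fisher direction as the top generalized eigenvector of the pair $(S_B, S_W)$:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(7)
# Two toy classes in 2 dimensions.
X1 = rng.standard_normal((100, 2))
X2 = rng.standard_normal((100, 2)) + np.array([4.0, 1.0])

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
mu = np.vstack([X1, X2]).mean(axis=0)       # total mean

# Between-class and within-class scatter matrices (Eqs. 25 and 26).
S_B = 100 * np.outer(mu1 - mu, mu1 - mu) + 100 * np.outer(mu2 - mu, mu2 - mu)
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

# Fisher direction: generalized eigenvector of (S_B, S_W) with the largest eigenvalue.
eigvals, Phi = linalg.eigh(S_B, S_W)
w = Phi[:, -1]
print(w / np.linalg.norm(w))                # the FDA projection direction
```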

6 Solution to Eigenvalue Problem

In this section, we introduce the solution to the eigenvalue problem. Consider Eq. (1):

$A \phi = \lambda \phi \implies (A - \lambda I)\, \phi = 0,$   (28)

which is a linear system of equations. According to Cramer's rule, a linear system of equations has non-trivial solutions if and only if the determinant vanishes. Therefore:

$\det(A - \lambda I) = 0,$   (29)

where $\det(\cdot)$ denotes the determinant of a matrix. Eq. (29) gives us a $d$-degree polynomial equation (the characteristic equation) which has $d$ roots. Note that if $A$ is not full rank (i.e., it is a singular matrix), some of the roots will be zero. Moreover, if $A$ is positive semi-definite, i.e., $A \succeq 0$, all the roots are non-negative.

The roots obtained from Eq. (29) are the eigenvalues of $A$. After finding the roots, we put every root $\lambda_i$ back into Eq. (28) and solve for its corresponding eigenvector $\phi_i$. Note that putting a root in Eq. (28) gives us a vector which can be normalized, because it is the direction of the eigenvector that matters and not its magnitude; the information about the magnitude exists in the corresponding eigenvalue.
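For a small matrix, the characteristic-polynomial route of Eqs. (28) and (29) can be carried out directly. The sketch below (our own, using NumPy and SciPy helpers rather than anything prescribed by the paper) finds the roots and recovers each eigenvector from the null space of $A - \lambda I$:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Characteristic equation det(A - lambda I) = 0; np.poly gives its coefficients.
coeffs = np.poly(A)                    # here: lambda^2 - 4 lambda + 3
eigvals = np.roots(coeffs)             # roots: 3 and 1

# Put each root back into (A - lambda I) phi = 0 and take a normalized
# null-space vector as the corresponding eigenvector.
for lam in eigvals:
    phi = null_space(A - lam * np.eye(2))[:, 0]
    print(lam, phi, np.allclose(A @ phi, lam * phi))
```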

7 Solution to Generalized Eigenvalue Problem

In this section, we introduce the solution to the generalized eigenvalue problem. Recall Eq. (16) again:

$\underset{\phi}{\text{maximize}} \quad \frac{\phi^\top A \phi}{\phi^\top B \phi}.$

Let $\rho$ denote this fraction, named the Rayleigh quotient (Croot, 2005):

$\rho(\phi; A, B) := \frac{\phi^\top A \phi}{\phi^\top B \phi}.$   (30)

The $\rho$ is stationary at $\phi \neq 0$ if and only if:

$(A - \lambda B)\, \phi = 0,$   (31)

for some scalar $\lambda$ (Parlett, 1998). Eq. (31) is a linear system of equations. This system of equations can also be obtained from Eq. (4):

$A \phi = \lambda B \phi \implies (A - \lambda B)\, \phi = 0.$   (32)

As we mentioned earlier, the eigenvalue problem is a special case of the generalized eigenvalue problem (where $B = I$), which is obvious by comparing Eqs. (28) and (32).

According to Cramer's rule, a linear system of equations has non-trivial solutions if and only if the determinant vanishes. Therefore:

$\det(A - \lambda B) = 0.$   (33)

Similar to the explanations for Eq. (29), we can solve for the roots of Eq. (33). However, note that Eq. (33) is obtained from Eq. (4) or (16), where only one eigenvector is considered.

For solving Eq. (5) in the general case, there exist two solutions to the generalized eigenvalue problem, one of which is a quick and dirty solution and the other is a rigorous method. Both methods are explained in the following.

7.1 The Quick & Dirty Solution

Consider Eq. (5) again:

$A \Phi = B \Phi \Lambda.$

If $B$ is not singular (i.e., it is invertible), we can left-multiply both sides by $B^{-1}$:

$B^{-1} A \Phi = \Phi \Lambda \overset{(a)}{\implies} C \Phi = \Phi \Lambda,$   (34)

where $(a)$ is because we take $C := B^{-1} A$. Eq. (34) is the eigenvalue problem for $C$ according to Eq. (2) and can be solved using the approach of Eq. (29).

Note that even if $B$ is singular, we can use a numeric hack (which is a little dirty) and slightly strengthen its main diagonal in order to make it full rank:

$B := B + \varepsilon I,$   (35)

where $\varepsilon$ is a very small positive number, large enough to make $B$ full rank.
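The quick-and-dirty route of Eqs. (34) and (35) can be sketched as follows (our own illustration; the rank-deficient $B$, the value of $\varepsilon$, and the use of np.linalg.solve instead of an explicit inverse are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(8)
M = rng.standard_normal((4, 4)); A = M @ M.T       # symmetric
N = rng.standard_normal((4, 2)); B = N @ N.T       # rank-deficient, hence singular

eps = 1e-6
B_reg = B + eps * np.eye(4)        # Eq. (35): strengthen the diagonal to make B full rank

# Eq. (34): turn A Phi = B Phi Lambda into an ordinary eigenvalue problem for C = B^{-1} A.
# C is generally not symmetric, so we use np.linalg.eig.
C = np.linalg.solve(B_reg, A)      # numerically preferable to forming inv(B) explicitly
eigvals, Phi = np.linalg.eig(C)

# Each column approximately satisfies A phi = lambda B phi (small relative residual).
i = np.argmax(eigvals.real)
phi, lam = Phi[:, i].real, eigvals[i].real
print(np.linalg.norm(A @ phi - lam * B_reg @ phi) / np.linalg.norm(lam * B_reg @ phi))
```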

7.2 The Rigorous Solution

Consider Eq. (5) again:

$A \Phi = B \Phi \Lambda.$

There exists a rigorous method to solve the generalized eigenvalue problem (Wang, 2015), which is explained in the following.

Consider the eigenvalue problem for $B$:

$B \Phi_B = \Phi_B \Lambda_B,$   (36)

where $\Phi_B$ and $\Lambda_B$ are the eigenvector and eigenvalue matrices of $B$, respectively. Then, we have:

$\Phi_B^\top B \Phi_B = \Phi_B^\top \Phi_B \Lambda_B \overset{(a)}{=} \Lambda_B,$   (37)

where $(a)$ is because $\Phi_B$ is an orthogonal matrix (its columns are orthonormal) and thus $\Phi_B^\top \Phi_B = I$.

We multiply $\Lambda_B^{-1/2}$ into Eq. (37) from the left- and right-hand sides:

$\Lambda_B^{-1/2} \Phi_B^\top B \Phi_B \Lambda_B^{-1/2} = \Lambda_B^{-1/2} \Lambda_B \Lambda_B^{-1/2} = I \implies \widehat{\Phi}_B^\top B\, \widehat{\Phi}_B = I,$

where:

$\widehat{\Phi}_B := \Phi_B \Lambda_B^{-1/2}.$   (38)

We define $\widehat{A}$ as:

$\widehat{A} := \widehat{\Phi}_B^\top A\, \widehat{\Phi}_B.$   (39)

The $\widehat{A}$ is symmetric because:

$\widehat{A}^\top = (\widehat{\Phi}_B^\top A\, \widehat{\Phi}_B)^\top = \widehat{\Phi}_B^\top A^\top \widehat{\Phi}_B \overset{(a)}{=} \widehat{\Phi}_B^\top A\, \widehat{\Phi}_B = \widehat{A},$

where $(a)$ notices that $A$ is symmetric.

The eigenvalue problem for $\widehat{A}$ is:

$\widehat{A}\, \Phi_A = \Phi_A \Lambda_A,$   (40)

where $\Phi_A$ and $\Lambda_A$ are the eigenvector and eigenvalue matrices of $\widehat{A}$. Left-multiplying $\Phi_A^\top$ into Eq. (40) gives us:

$\Phi_A^\top \widehat{A}\, \Phi_A = \Phi_A^\top \Phi_A \Lambda_A \overset{(a)}{=} \Lambda_A,$   (41)

where $(a)$ is because $\Phi_A$ is an orthogonal matrix (its columns are orthonormal), so $\Phi_A^\top \Phi_A = I$. Note that $\Phi_A$ is an orthogonal matrix because $\widehat{A}$ is symmetric (if a matrix is symmetric, its eigenvectors are orthogonal/orthonormal). Eq. (41) is diagonalizing the matrix $\widehat{A}$.

Plugging Eq. (39) into Eq. (41) gives us:

$\Phi_A^\top \widehat{\Phi}_B^\top A\, \widehat{\Phi}_B \Phi_A = \Lambda_A \implies \Phi^\top A\, \Phi = \Lambda_A,$   (42)

where:

$\Phi := \widehat{\Phi}_B \Phi_A.$   (43)

The $\Phi$ also diagonalizes $B$ because ($I$ is a diagonal matrix):

$\Phi^\top B\, \Phi = \Phi_A^\top \widehat{\Phi}_B^\top B\, \widehat{\Phi}_B \Phi_A = \Phi_A^\top I\, \Phi_A \overset{(a)}{=} I,$   (44)

where $(a)$ is because $\Phi_A$ is an orthogonal matrix. From Eq. (44), we have:

$\Phi^\top B\, \Phi = I \implies B\, \Phi = \Phi^{-\top} \overset{(a)}{\implies} A\, \Phi = B\, \Phi \Lambda_A,$   (45)

where $(a)$ is because Eq. (42) gives $A\, \Phi = \Phi^{-\top} \Lambda_A$.

Comparing Eqs. (5) and (45) shows us:

$\Lambda = \Lambda_A,$   (46)

and the eigenvector matrix of the pair $(A, B)$ is the $\Phi$ of Eq. (43).


To summarize, for finding $\Phi$ and $\Lambda$ in Eq. (5), we do the following steps (note that $A$ and $B$ are given):

  1. From Eq. (36), we find $\Phi_B$ and $\Lambda_B$.

  2. From Eq. (38), we find $\widehat{\Phi}_B$. In case $\Lambda_B$ is singular in Eq. (38), we can use the numeric hack $\Lambda_B := \Lambda_B + \varepsilon I$, where $\varepsilon$ is a very small positive number, large enough to make $\Lambda_B$ full rank.

  3. From Eq. (39), we find $\widehat{A}$.

  4. From Eq. (40), we find $\Phi_A$ and $\Lambda_A$. From Eq. (46), $\Lambda = \Lambda_A$ is found.

  5. From Eq. (43), we find $\Phi$.

The above instructions summarize the algorithm for solving the generalized eigenvalue problem; a minimal numerical sketch follows.
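A minimal NumPy sketch of these steps is given below, assuming symmetric $A$ and $B$; the ridge fallback for a singular $B$ and its value of $\varepsilon$ are our own choices:

```python
import numpy as np

def generalized_eig(A, B, eps=1e-10):
    """Solve A Phi = B Phi Lambda by the steps above (a sketch, not the paper's code)."""
    # Step 1: eigen-decompose B (Eq. 36).
    lam_B, Phi_B = np.linalg.eigh(B)
    if np.min(lam_B) <= 0:                      # numeric hack for a singular B
        lam_B, Phi_B = np.linalg.eigh(B + eps * np.eye(B.shape[0]))
    # Step 2: Phi_hat = Phi_B Lambda_B^{-1/2}, so that Phi_hat^T B Phi_hat = I (Eq. 38).
    Phi_hat = Phi_B / np.sqrt(lam_B)
    # Step 3: A_hat = Phi_hat^T A Phi_hat, which is symmetric (Eq. 39).
    A_hat = Phi_hat.T @ A @ Phi_hat
    # Step 4: eigen-decompose A_hat (Eq. 40); its eigenvalues are Lambda (Eq. 46).
    lam, Phi_A = np.linalg.eigh(A_hat)
    # Step 5: Phi = Phi_hat Phi_A (Eq. 43).
    return lam, Phi_hat @ Phi_A

# Quick check on a random symmetric pencil with positive definite B.
rng = np.random.default_rng(9)
M = rng.standard_normal((5, 5)); A = M @ M.T
N = rng.standard_normal((5, 5)); B = N @ N.T + 5 * np.eye(5)
lam, Phi = generalized_eig(A, B)
print(np.allclose(A @ Phi, B @ Phi @ np.diag(lam)))   # A Phi = B Phi Lambda
print(np.allclose(Phi.T @ B @ Phi, np.eye(5)))        # Phi^T B Phi = I
```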

8 Conclusion

This paper was a tutorial introducing the eigenvalue and generalized eigenvalue problems. The problems were defined, the optimization problems which lead to them were presented, and some examples from machine learning were provided. Moreover, the solutions to both the eigenvalue and generalized eigenvalue problems were introduced.

References

  • Barshan et al. (2011) Barshan, Elnaz, Ghodsi, Ali, Azimifar, Zohreh, and Jahromi, Mansoor Zolghadri. Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds. Pattern Recognition, 44(7):1357–1371, 2011.
  • Boyd & Vandenberghe (2004) Boyd, Stephen and Vandenberghe, Lieven. Convex optimization. Cambridge university press, 2004.
  • Croot (2005) Croot, Ernie. The Rayleigh principle for finding eigenvalues. Technical report, Georgia Institute of Technology, School of Mathematics, 2005. Online: http://people.math.gatech.edu/ecroot/notes_linear.pdf, Accessed: March 2019.
  • Fisher (1936) Fisher, Ronald A. The use of multiple measurements in taxonomic problems. Annals of eugenics, 7(2):179–188, 1936.
  • Friedman et al. (2009) Friedman, Jerome, Hastie, Trevor, and Tibshirani, Robert. The elements of statistical learning, volume 2. Springer Series in Statistics, New York, NY, USA, 2009.
  • Golub & Van Loan (2012) Golub, Gene H. and Van Loan, Charles F. Matrix computations, volume 3. The Johns Hopkins University Press, 2012.
  • Jolliffe (2011) Jolliffe, Ian. Principal component analysis. Springer, 2011.
  • Parlett (1998) Parlett, Beresford N. The symmetric eigenvalue problem. Classics in Applied Mathematics, 20, 1998.
  • Pearson (1901) Pearson, Karl. LIII. on lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.
  • Wang (2015) Wang, Ruye. Generalized eigenvalue problem. http://fourier.eng.hmc.edu/e161/lectures/algebra/node7.html, 2015. Accessed: January 2019.
  • Wilkinson (1965) Wilkinson, James Hardy. The algebraic eigenvalue problem, volume 662. Oxford Clarendon, 1965.
  • Xu & Lu (2006) Xu, Yong and Lu, Guangming. Analysis on Fisher discriminant criterion and linear separability of feature space. In 2006 International Conference on Computational Intelligence and Security, volume 2, pp. 1671–1676. IEEE, 2006.