1 Introduction
Eigenvalue and generalized eigenvalue problems play important roles in different fields of science, especially in machine learning. In the eigenvalue problem, the eigenvectors represent the directions of the spread or variance of the data and the corresponding eigenvalues are the magnitudes of the spread in these directions (Jolliffe, 2011). In the generalized eigenvalue problem, these directions are impacted by another matrix. If the other matrix is the identity matrix, this impact is canceled and we are left with the eigenvalue problem capturing the directions of maximum spread.
In this paper, we introduce the eigenvalue problem and the generalized eigenvalue problem and present their solutions. We also introduce the optimization problems which reduce to the eigenvalue and generalized eigenvalue problems. Some examples of these optimization problems in machine learning are provided for better illustration; the examples include principal component analysis, kernel supervised principal component analysis, and Fisher discriminant analysis.
2 Introducing Eigenvalue and Generalized Eigenvalue Problems
In this section, we introduce the eigenvalue problem and generalized eigenvalue problem.
2.1 Eigenvalue Problem
The eigenvalue problem (Wilkinson, 1965; Golub & Van Loan, 2012) of a symmetric matrix $A \in \mathbb{R}^{d \times d}$ is defined as:
(1) $A \phi_i = \lambda_i \phi_i, \quad \forall i \in \{1, \dots, d\},$
and in matrix form, it is:
(2) $A \Phi = \Phi \Lambda,$
where the columns of $\Phi \in \mathbb{R}^{d \times d}$ are the eigenvectors and the diagonal elements of $\Lambda \in \mathbb{R}^{d \times d}$ are the eigenvalues. Note that $\phi_i \in \mathbb{R}^d$ and $\lambda_i \in \mathbb{R}$.
Note that for the eigenvalue problem, the matrix $A$ can also be nonsymmetric. If the matrix is symmetric, its eigenvectors are orthogonal (and can be taken to be orthonormal); if it is nonsymmetric, its eigenvectors are not necessarily orthogonal.
Eq. (2) can be restated as:
(3) $A = \Phi \Lambda \Phi^{-1} = \Phi \Lambda \Phi^\top,$
where $\Phi^{-1} = \Phi^\top$ because $\Phi$ is an orthogonal matrix. Moreover, note that we always have $\Phi^\top \Phi = I$ for an orthogonal $\Phi$, but we only have $\Phi \Phi^\top = I$ if "all" the columns of the orthogonal $\Phi$ exist (i.e., $\Phi$ is not truncated and is a square matrix). Eq. (3) is referred to as "eigenvalue decomposition", "eigendecomposition", or "spectral decomposition".
2.2 Generalized Eigenvalue Problem
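The eigendecomposition of Eq. (3) can be checked numerically. A minimal NumPy sketch (the matrix below is an arbitrary toy example, not from the paper):

```python
import numpy as np

# Eigendecomposition of a symmetric matrix with NumPy.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M + M.T  # symmetrize so the eigenvectors are orthonormal

eigvals, Phi = np.linalg.eigh(A)  # columns of Phi are the eigenvectors
Lam = np.diag(eigvals)            # diagonal eigenvalue matrix

print(np.allclose(A @ Phi, Phi @ Lam))      # Eq. (2): A Phi = Phi Lam
print(np.allclose(A, Phi @ Lam @ Phi.T))    # Eq. (3): A = Phi Lam Phi^T
print(np.allclose(Phi.T @ Phi, np.eye(4)))  # orthonormal eigenvectors
```

Note that `np.linalg.eigh` is the routine for symmetric (Hermitian) matrices; for a general nonsymmetric matrix, `np.linalg.eig` would be used and the eigenvectors need not be orthogonal.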
The generalized eigenvalue problem (Parlett, 1998; Golub & Van Loan, 2012) of two symmetric matrices $A \in \mathbb{R}^{d \times d}$ and $B \in \mathbb{R}^{d \times d}$ is defined as:
(4) $A \phi_i = \lambda_i B \phi_i, \quad \forall i \in \{1, \dots, d\},$
and in matrix form, it is:
(5) $A \Phi = B \Phi \Lambda,$
where the columns of $\Phi \in \mathbb{R}^{d \times d}$ are the eigenvectors and the diagonal elements of $\Lambda \in \mathbb{R}^{d \times d}$ are the eigenvalues. Note that $\phi_i \in \mathbb{R}^d$ and $\lambda_i \in \mathbb{R}$.
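In practice, Eq. (5) can be solved directly with SciPy. A minimal sketch with assumed toy matrices ($A$ symmetric, $B$ symmetric positive definite):

```python
import numpy as np
from scipy import linalg

# Generalized eigenvalue problem A Phi = B Phi Lam with SciPy.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M + M.T                     # symmetric
N = rng.standard_normal((4, 4))
B = N @ N.T + 4 * np.eye(4)     # symmetric positive definite

eigvals, Phi = linalg.eigh(A, B)  # generalized problem of the pair (A, B)
Lam = np.diag(eigvals)

print(np.allclose(A @ Phi, B @ Phi @ Lam))      # Eq. (5) holds
print(np.allclose(Phi.T @ B @ Phi, np.eye(4)))  # eigenvectors are B-orthonormal
```

Note that here the eigenvectors are normalized with respect to $B$ (i.e., $\Phi^\top B \Phi = I$) rather than being orthonormal in the ordinary sense.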
3 Eigenvalue Optimization
In this section, we introduce the optimization problems which reduce to the eigenvalue problem.
3.1 Optimization Form 1
Consider the following optimization problem with the variable $\phi \in \mathbb{R}^d$:
(6) $\underset{\phi}{\text{maximize}} \quad \phi^\top A \phi,$
subject to $\phi^\top \phi = 1,$
where $A \in \mathbb{R}^{d \times d}$. The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (6) is:
$\mathcal{L} = \phi^\top A \phi - \lambda (\phi^\top \phi - 1),$
where $\lambda$ is the Lagrange multiplier. Equating the derivative of the Lagrangian to zero gives us:
$\frac{\partial \mathcal{L}}{\partial \phi} = 2 A \phi - 2 \lambda \phi \overset{\text{set}}{=} 0 \implies A \phi = \lambda \phi,$
which is an eigenvalue problem for $A$ according to Eq. (1). The $\phi$ is the eigenvector of $A$ and the $\lambda$ is the eigenvalue. As Eq. (6) is a maximization problem, the eigenvector having the largest eigenvalue is the solution.
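This claim can be sanity-checked numerically. A toy sketch (arbitrary symmetric matrix): no unit vector attains a larger value of $\phi^\top A \phi$ than the top eigenvector.

```python
import numpy as np

# Under phi^T phi = 1, phi^T A phi is maximized by the eigenvector
# with the largest eigenvalue of A.
rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = M + M.T

eigvals, Phi = np.linalg.eigh(A)  # eigenvalues in ascending order
phi_max = Phi[:, -1]              # top eigenvector
best = phi_max @ A @ phi_max      # equals the largest eigenvalue

for _ in range(1000):             # random unit vectors never do better
    v = rng.standard_normal(5)
    v /= np.linalg.norm(v)
    assert v @ A @ v <= best + 1e-9

print(np.isclose(best, eigvals[-1]))  # True
```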
3.2 Optimization Form 2
Consider the following optimization problem with the variable $\Phi \in \mathbb{R}^{d \times d}$:
(7) $\underset{\Phi}{\text{maximize}} \quad \mathrm{tr}(\Phi^\top A \Phi),$
subject to $\Phi^\top \Phi = I,$
where $A \in \mathbb{R}^{d \times d}$, $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, and $I$ is the identity matrix. Note that according to the cyclic property of the trace, the objective function can be written as any of $\mathrm{tr}(\Phi^\top A \Phi) = \mathrm{tr}(A \Phi \Phi^\top) = \mathrm{tr}(\Phi \Phi^\top A)$.
The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (7) is:
$\mathcal{L} = \mathrm{tr}(\Phi^\top A \Phi) - \mathrm{tr}\big(\Lambda^\top (\Phi^\top \Phi - I)\big),$
where $\Lambda \in \mathbb{R}^{d \times d}$ is a diagonal matrix whose diagonal entries are the Lagrange multipliers.
Equating the derivative of $\mathcal{L}$ to zero gives us:
$\frac{\partial \mathcal{L}}{\partial \Phi} = 2 A \Phi - 2 \Phi \Lambda \overset{\text{set}}{=} 0 \implies A \Phi = \Phi \Lambda,$
which is an eigenvalue problem for $A$ according to Eq. (2). The columns of $\Phi$ are the eigenvectors of $A$ and the diagonal elements of $\Lambda$ are the eigenvalues.
3.3 Optimization Form 3
Consider the following optimization problem with the variable $\phi \in \mathbb{R}^d$:
(8) $\underset{\phi}{\text{minimize}} \quad \|X - \phi \phi^\top X\|_F^2,$
subject to $\phi^\top \phi = 1,$
where $X \in \mathbb{R}^{d \times n}$ and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.
The objective function in Eq. (8) is simplified as:
$\|X - \phi \phi^\top X\|_F^2 = \mathrm{tr}\big((X - \phi \phi^\top X)^\top (X - \phi \phi^\top X)\big) = \mathrm{tr}(X^\top X) - \mathrm{tr}(X^\top \phi \phi^\top X),$
where the simplification uses the constraint $\phi^\top \phi = 1$.
The Lagrangian (Boyd & Vandenberghe, 2004) is:
$\mathcal{L} = \mathrm{tr}(X^\top X) - \mathrm{tr}(X^\top \phi \phi^\top X) + \lambda (\phi^\top \phi - 1),$
where $\lambda$ is the Lagrange multiplier. Equating the derivative of $\mathcal{L}$ to zero gives:
$\frac{\partial \mathcal{L}}{\partial \phi} = -2 X X^\top \phi + 2 \lambda \phi \overset{\text{set}}{=} 0 \implies X X^\top \phi = \lambda \phi,$
which is an eigenvalue problem for $X X^\top$ according to Eq. (1). The $\phi$ is the eigenvector of $X X^\top$ and the $\lambda$ is the eigenvalue.
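A toy numeric check of this form (arbitrary data matrix, not from the paper): the rank-one reconstruction error is minimized by the top eigenvector of $X X^\top$, and the minimum error equals $\mathrm{tr}(X^\top X)$ minus the largest eigenvalue.

```python
import numpy as np

# ||X - phi phi^T X||_F^2 is minimized by the top eigenvector of X X^T.
rng = np.random.default_rng(3)
X = rng.standard_normal((4, 50))

def recon_error(phi, X):
    return np.linalg.norm(X - np.outer(phi, phi) @ X) ** 2

eigvals, Phi = np.linalg.eigh(X @ X.T)  # ascending eigenvalues
phi_star = Phi[:, -1]                   # eigenvector of the largest eigenvalue
err_star = recon_error(phi_star, X)

for _ in range(500):                    # no random unit vector does better
    v = rng.standard_normal(4)
    v /= np.linalg.norm(v)
    assert err_star <= recon_error(v, X) + 1e-9

# minimum error = tr(X^T X) - largest eigenvalue
print(np.isclose(err_star, np.trace(X.T @ X) - eigvals[-1]))  # True
```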
3.4 Optimization Form 4
Consider the following optimization problem with the variable $\Phi \in \mathbb{R}^{d \times d}$:
(9) $\underset{\Phi}{\text{minimize}} \quad \|X - \Phi \Phi^\top X\|_F^2,$
subject to $\Phi^\top \Phi = I,$
where $X \in \mathbb{R}^{d \times n}$.
Similar to what we had for Eq. (8), the objective function in Eq. (9) is simplified as:
$\|X - \Phi \Phi^\top X\|_F^2 = \mathrm{tr}(X^\top X) - \mathrm{tr}(X^\top \Phi \Phi^\top X).$
The Lagrangian (Boyd & Vandenberghe, 2004) is:
$\mathcal{L} = \mathrm{tr}(X^\top X) - \mathrm{tr}(X^\top \Phi \Phi^\top X) + \mathrm{tr}\big(\Lambda^\top (\Phi^\top \Phi - I)\big),$
where $\Lambda$ is a diagonal matrix including the Lagrange multipliers. Equating the derivative of $\mathcal{L}$ to zero gives:
$\frac{\partial \mathcal{L}}{\partial \Phi} = -2 X X^\top \Phi + 2 \Phi \Lambda \overset{\text{set}}{=} 0 \implies X X^\top \Phi = \Phi \Lambda,$
which is an eigenvalue problem for $X X^\top$ according to Eq. (2). The columns of $\Phi$ are the eigenvectors of $X X^\top$ and the diagonal elements of $\Lambda$ are the eigenvalues.
3.5 Optimization Form 5
Consider the following optimization problem with the variable $\phi \in \mathbb{R}^d$:
(10) $\underset{\phi}{\text{maximize}} \quad \frac{\phi^\top A \phi}{\phi^\top \phi}.$
According to the Rayleigh–Ritz quotient method (Croot, 2005), this optimization problem can be restated as:
(11) $\underset{\phi}{\text{maximize}} \quad \phi^\top A \phi,$
subject to $\phi^\top \phi = 1.$
The Lagrangian (Boyd & Vandenberghe, 2004) is:
$\mathcal{L} = \phi^\top A \phi - \lambda (\phi^\top \phi - 1),$
where $\lambda$ is the Lagrange multiplier. Equating the derivative of $\mathcal{L}$ to zero gives:
$\frac{\partial \mathcal{L}}{\partial \phi} = 2 A \phi - 2 \lambda \phi \overset{\text{set}}{=} 0 \implies A \phi = \lambda \phi,$
which is an eigenvalue problem for $A$ according to Eq. (1). The $\phi$ is the eigenvector of $A$ and the $\lambda$ is the eigenvalue.
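Two properties of the Rayleigh quotient can be checked on a toy symmetric matrix: its value at an eigenvector is the corresponding eigenvalue, and it is invariant to the scale of $\phi$ (which is why the constraint $\phi^\top \phi = 1$ loses no generality).

```python
import numpy as np

# Rayleigh quotient phi^T A phi / phi^T phi on a toy symmetric matrix.
rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
A = M + M.T
eigvals, Phi = np.linalg.eigh(A)

def rayleigh(phi):
    return (phi @ A @ phi) / (phi @ phi)

# at the i-th eigenvector, the quotient equals the i-th eigenvalue
for i in range(5):
    assert np.isclose(rayleigh(Phi[:, i]), eigvals[i])

# scale invariance: the quotient depends only on the direction of phi
print(np.isclose(rayleigh(3.7 * Phi[:, 0]), eigvals[0]))  # True
```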
4 Generalized Eigenvalue Optimization
In this section, we introduce the optimization problems which reduce to the generalized eigenvalue problem.
4.1 Optimization Form 1
Consider the following optimization problem with the variable $\phi \in \mathbb{R}^d$:
(12) $\underset{\phi}{\text{maximize}} \quad \phi^\top A \phi,$
subject to $\phi^\top B \phi = 1,$
where $A \in \mathbb{R}^{d \times d}$ and $B \in \mathbb{R}^{d \times d}$. The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (12) is:
$\mathcal{L} = \phi^\top A \phi - \lambda (\phi^\top B \phi - 1),$
where $\lambda$ is the Lagrange multiplier. Equating the derivative of the Lagrangian to zero gives us:
$\frac{\partial \mathcal{L}}{\partial \phi} = 2 A \phi - 2 \lambda B \phi \overset{\text{set}}{=} 0 \implies A \phi = \lambda B \phi,$
which is a generalized eigenvalue problem $(A, B)$ according to Eq. (4). The $\phi$ is the eigenvector and the $\lambda$ is the eigenvalue for this problem.
4.2 Optimization Form 2
Consider the following optimization problem with the variable $\Phi \in \mathbb{R}^{d \times d}$:
(13) $\underset{\Phi}{\text{maximize}} \quad \mathrm{tr}(\Phi^\top A \Phi),$
subject to $\Phi^\top B \Phi = I,$
where $A \in \mathbb{R}^{d \times d}$ and $B \in \mathbb{R}^{d \times d}$. Note that according to the cyclic property of the trace, the objective function can be written as any of $\mathrm{tr}(\Phi^\top A \Phi) = \mathrm{tr}(A \Phi \Phi^\top) = \mathrm{tr}(\Phi \Phi^\top A)$.
The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (13) is:
$\mathcal{L} = \mathrm{tr}(\Phi^\top A \Phi) - \mathrm{tr}\big(\Lambda^\top (\Phi^\top B \Phi - I)\big),$
where $\Lambda$ is a diagonal matrix whose diagonal entries are the Lagrange multipliers.
Equating the derivative of $\mathcal{L}$ to zero gives us:
$\frac{\partial \mathcal{L}}{\partial \Phi} = 2 A \Phi - 2 B \Phi \Lambda \overset{\text{set}}{=} 0 \implies A \Phi = B \Phi \Lambda,$
which is a generalized eigenvalue problem $(A, B)$ according to Eq. (5). The columns of $\Phi$ are the eigenvectors and the diagonal elements of $\Lambda$ are the eigenvalues.
4.3 Optimization Form 3
Consider the following optimization problem with the variable $\phi \in \mathbb{R}^d$:
(14) $\underset{\phi}{\text{minimize}} \quad \|X - \phi \phi^\top X\|_F^2,$
subject to $\phi^\top B \phi = 1,$
where $X \in \mathbb{R}^{d \times n}$.
Similar to what we had for Eq. (8), the objective function in Eq. (14) is simplified as:
$\|X - \phi \phi^\top X\|_F^2 = \mathrm{tr}(X^\top X) - \mathrm{tr}(X^\top \phi \phi^\top X).$
The Lagrangian (Boyd & Vandenberghe, 2004) is:
$\mathcal{L} = \mathrm{tr}(X^\top X) - \mathrm{tr}(X^\top \phi \phi^\top X) + \lambda (\phi^\top B \phi - 1),$
where $\lambda$ is the Lagrange multiplier. Equating the derivative of $\mathcal{L}$ to zero gives:
$\frac{\partial \mathcal{L}}{\partial \phi} = -2 X X^\top \phi + 2 \lambda B \phi \overset{\text{set}}{=} 0 \implies X X^\top \phi = \lambda B \phi,$
which is a generalized eigenvalue problem $(X X^\top, B)$ according to Eq. (4). The $\phi$ is the eigenvector and the $\lambda$ is the eigenvalue.
4.4 Optimization Form 4
Consider the following optimization problem with the variable $\Phi \in \mathbb{R}^{d \times d}$:
(15) $\underset{\Phi}{\text{minimize}} \quad \|X - \Phi \Phi^\top X\|_F^2,$
subject to $\Phi^\top B \Phi = I,$
where $X \in \mathbb{R}^{d \times n}$.
Similar to what we had for Eq. (9), the objective function in Eq. (15) is simplified as:
$\|X - \Phi \Phi^\top X\|_F^2 = \mathrm{tr}(X^\top X) - \mathrm{tr}(X^\top \Phi \Phi^\top X).$
The Lagrangian (Boyd & Vandenberghe, 2004) is:
$\mathcal{L} = \mathrm{tr}(X^\top X) - \mathrm{tr}(X^\top \Phi \Phi^\top X) + \mathrm{tr}\big(\Lambda^\top (\Phi^\top B \Phi - I)\big),$
where $\Lambda$ is a diagonal matrix including the Lagrange multipliers. Equating the derivative of $\mathcal{L}$ to zero gives:
$\frac{\partial \mathcal{L}}{\partial \Phi} = -2 X X^\top \Phi + 2 B \Phi \Lambda \overset{\text{set}}{=} 0 \implies X X^\top \Phi = B \Phi \Lambda,$
which is a generalized eigenvalue problem $(X X^\top, B)$ according to Eq. (5). The columns of $\Phi$ are the eigenvectors and the diagonal elements of $\Lambda$ are the eigenvalues.
4.5 Optimization Form 5
Consider the following optimization problem (Parlett, 1998) with the variable $\phi \in \mathbb{R}^d$:
(16) $\underset{\phi}{\text{maximize}} \quad \frac{\phi^\top A \phi}{\phi^\top B \phi}.$
According to the Rayleigh–Ritz quotient method (Croot, 2005), this optimization problem can be restated as:
(17) $\underset{\phi}{\text{maximize}} \quad \phi^\top A \phi,$
subject to $\phi^\top B \phi = 1.$
The Lagrangian (Boyd & Vandenberghe, 2004) is:
$\mathcal{L} = \phi^\top A \phi - \lambda (\phi^\top B \phi - 1),$
where $\lambda$ is the Lagrange multiplier. Equating the derivative of $\mathcal{L}$ to zero gives:
$\frac{\partial \mathcal{L}}{\partial \phi} = 2 A \phi - 2 \lambda B \phi \overset{\text{set}}{=} 0 \implies A \phi = \lambda B \phi,$
which is a generalized eigenvalue problem $(A, B)$ according to Eq. (4). The $\phi$ is the eigenvector and the $\lambda$ is the eigenvalue.
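A toy check of this form (assumed example matrices; $B$ is made positive definite so the quotient is well defined): the generalized Rayleigh quotient is maximized by the generalized eigenvector with the largest generalized eigenvalue.

```python
import numpy as np
from scipy import linalg

# Generalized Rayleigh quotient phi^T A phi / phi^T B phi.
rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4)); A = M + M.T
N = rng.standard_normal((4, 4)); B = N @ N.T + 4 * np.eye(4)  # positive definite

eigvals, Phi = linalg.eigh(A, B)  # ascending generalized eigenvalues

def quotient(v):
    return (v @ A @ v) / (v @ B @ v)

# at the top generalized eigenvector, the quotient equals the top eigenvalue
assert np.isclose(quotient(Phi[:, -1]), eigvals[-1])

for _ in range(500):  # no random direction exceeds the top eigenvalue
    v = rng.standard_normal(4)
    assert quotient(v) <= eigvals[-1] + 1e-9
```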
5 Examples for the Optimization Problems
In this section, we introduce some examples in machine learning which use the introduced optimization problems.
5.1 Examples for Eigenvalue Problem
5.1.1 Variance in Principal Component Analysis
In Principal Component Analysis (PCA) (Pearson, 1901; Friedman et al., 2009), if we want to project onto one vector (one-dimensional PCA subspace), the problem is:
(18) $\underset{u}{\text{maximize}} \quad u^\top S u,$
subject to $u^\top u = 1,$
where $u \in \mathbb{R}^d$ is the projection direction and $S$ is the covariance matrix. Comparing with Eq. (6), $u$ is the eigenvector of $S$ with the largest eigenvalue.
If we want to project onto a PCA subspace spanned by several directions, we have:
(19) $\underset{U}{\text{maximize}} \quad \mathrm{tr}(U^\top S U),$
subject to $U^\top U = I,$
where the columns of $U$ span the PCA subspace. Comparing with Eq. (7), the columns of $U$ are the eigenvectors of $S$ with the largest eigenvalues.
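A minimal PCA sketch on toy anisotropic data (the data generation is an assumed example, not from the paper): eigendecompose the sample covariance and project onto the top directions.

```python
import numpy as np

# PCA via eigendecomposition of the covariance matrix.
rng = np.random.default_rng(6)
scales = np.array([[3.0, 0, 0],
                   [0, 1.0, 0],
                   [0, 0, 0.1]])              # most spread along the first axis
X = rng.standard_normal((100, 3)) @ scales    # 100 samples, 3 features

Xc = X - X.mean(axis=0)                       # center the data
S = (Xc.T @ Xc) / (len(X) - 1)                # sample covariance matrix

eigvals, U = np.linalg.eigh(S)                # ascending eigenvalues
U = U[:, ::-1]                                # principal directions, descending
Z = Xc @ U[:, :2]                             # project onto the top-2 subspace
print(Z.shape)                                # (100, 2)
```

The first column of `U` should align (up to sign) with the axis of largest spread.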
5.1.2 Reconstruction in Principal Component Analysis
We can look at PCA with another perspective: PCA is the best linear projection which has the smallest reconstruction error. If we have one PCA direction, the projection is and the reconstruction is . We want the error between the reconstructed data and the original data to be minimized:
(20)  
subject to 
Therefore, is the eigenvector of the covariance matrix (the is already centered by removing its mean).
If we consider several PCA directions, i.e., the columns of $U$, the minimization of the reconstruction error is:
(21) $\underset{U}{\text{minimize}} \quad \|X - U U^\top X\|_F^2,$
subject to $U^\top U = I.$
Thus, according to Eq. (9), the columns of $U$ are the eigenvectors of $X X^\top$, which is proportional to the covariance matrix when $X$ is already centered by removing its mean.
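The equivalence of the variance view and the reconstruction view can be checked numerically: with $U$ holding the top-$k$ eigenvectors of $X X^\top$ (centered $X$, a toy example here), the reconstruction error equals the sum of the discarded eigenvalues.

```python
import numpy as np

# Reconstruction error of top-k PCA = sum of the discarded eigenvalues of Xc Xc^T.
rng = np.random.default_rng(7)
Xc = rng.standard_normal((4, 60))
Xc = Xc - Xc.mean(axis=1, keepdims=True)    # center over the samples

eigvals, Phi = np.linalg.eigh(Xc @ Xc.T)    # ascending eigenvalues
k = 2
U = Phi[:, -k:]                             # top-k principal directions
err = np.linalg.norm(Xc - U @ U.T @ Xc) ** 2

print(np.isclose(err, eigvals[:-k].sum()))  # True
```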
5.2 Examples for Generalized Eigenvalue Problem
5.2.1 Kernel Supervised Principal Component Analysis
Kernel Supervised PCA (SPCA) (Barshan et al., 2011) uses the following optimization problem:
(22) $\underset{\Theta}{\text{maximize}} \quad \mathrm{tr}(\Theta^\top K_x H K_y H K_x \Theta),$
subject to $\Theta^\top K_x \Theta = I,$
where $K_x$ and $K_y$ are the kernel matrices over the training data and the labels of the training data, respectively, $H = I - (1/n)\mathbf{1}\mathbf{1}^\top$ is the centering matrix, and the columns of $\Theta$ span the kernel SPCA subspace. Comparing with Eq. (13), this is a generalized eigenvalue problem of the pair $(K_x H K_y H K_x, K_x)$.
5.2.2 Fisher Discriminant Analysis
Another example is Fisher Discriminant Analysis (FDA) (Fisher, 1936; Friedman et al., 2009), in which the Fisher criterion (Xu & Lu, 2006) is maximized:
(24) $\underset{w}{\text{maximize}} \quad \frac{w^\top S_B\, w}{w^\top S_W\, w},$
where $w \in \mathbb{R}^d$ is the projection direction and $S_B$ and $S_W$ are the between- and within-class scatters:
(25) $S_B = \sum_{j=1}^{c} n_j (\mu_j - \mu)(\mu_j - \mu)^\top,$
(26) $S_W = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (x_{j,i} - \mu_j)(x_{j,i} - \mu_j)^\top,$
where $c$ is the number of classes, $n_j$ is the sample size of the $j$-th class, $x_{j,i}$ is the $i$-th data point in the $j$-th class, $\mu_j$ is the mean of the $j$-th class, and $\mu$ is the total mean.
According to the Rayleigh–Ritz quotient method (Croot, 2005), the optimization problem in Eq. (24) can be restated as:
(27) $\underset{w}{\text{maximize}} \quad w^\top S_B\, w,$
subject to $w^\top S_W\, w = 1.$
The Lagrangian (Boyd & Vandenberghe, 2004) is:
$\mathcal{L} = w^\top S_B\, w - \lambda (w^\top S_W\, w - 1),$
where $\lambda$ is the Lagrange multiplier. Equating the derivative of $\mathcal{L}$ to zero gives:
$\frac{\partial \mathcal{L}}{\partial w} = 2 S_B\, w - 2 \lambda S_W\, w \overset{\text{set}}{=} 0 \implies S_B\, w = \lambda S_W\, w,$
which is a generalized eigenvalue problem $(S_B, S_W)$ according to Eq. (4). The $w$ is the eigenvector with the largest eigenvalue and the $\lambda$ is the corresponding eigenvalue.
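A minimal FDA sketch on assumed toy two-class data (two Gaussian blobs separated along the first axis): build $S_B$ and $S_W$ from Eqs. (25) and (26) and solve the generalized eigenvalue problem of $(S_B, S_W)$.

```python
import numpy as np
from scipy import linalg

# FDA on toy two-class data: classes separated along the first feature.
rng = np.random.default_rng(8)
X1 = rng.standard_normal((50, 2))                       # class 1 around (0, 0)
X2 = rng.standard_normal((50, 2)) + np.array([4.0, 0])  # class 2 around (4, 0)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
mu = np.vstack([X1, X2]).mean(axis=0)                   # total mean

# Between-class scatter, Eq. (25), and within-class scatter, Eq. (26)
S_B = 50 * np.outer(mu1 - mu, mu1 - mu) + 50 * np.outer(mu2 - mu, mu2 - mu)
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

eigvals, W = linalg.eigh(S_B, S_W)  # generalized eigenproblem of (S_B, S_W)
w = W[:, -1]                        # direction with the largest Fisher criterion
print(abs(w[0]) > abs(w[1]))        # True: separation is along the first axis
```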
6 Solution to Eigenvalue Problem
In this section, we introduce the solution to the eigenvalue problem. Consider Eq. (1):
(28) $A \phi = \lambda \phi \implies (A - \lambda I)\, \phi = 0,$
which is a linear system of equations. According to Cramer's rule, a linear system of equations has nontrivial solutions if and only if its determinant vanishes. Therefore:
(29) $\det(A - \lambda I) = 0,$
where $\det(\cdot)$ denotes the determinant of a matrix. Eq. (29) gives us a $d$-degree polynomial equation which has $d$ roots (answers). Note that if $A$ is not full rank (i.e., it is a singular matrix), some of the roots will be zero. Moreover, if $A$ is positive semidefinite, i.e., $A \succeq 0$, all the roots are nonnegative.
The roots (answers) of Eq. (29) are the eigenvalues of $A$. After finding the roots, we put every answer $\lambda_i$ into Eq. (28) and find its corresponding eigenvector $\phi_i$. Note that putting a root into Eq. (28) gives us a vector which can be normalized, because it is the direction of the eigenvector that matters and not its magnitude; the information of magnitude exists in its corresponding eigenvalue.
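This procedure can be illustrated on a small toy matrix: form the characteristic polynomial of Eq. (29) and take its roots.

```python
import numpy as np

# Eq. (29) on a 2x2 example: det(A - lambda I) = lambda^2 - 4 lambda + 3 = 0.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

coeffs = np.poly(A)                # characteristic-polynomial coefficients
roots = np.sort(np.roots(coeffs))  # roots of the polynomial: the eigenvalues

print(np.allclose(coeffs, [1.0, -4.0, 3.0]))                # True
print(np.allclose(roots, np.sort(np.linalg.eigvals(A))))    # True: roots = eigenvalues
```

Note that solving the characteristic polynomial is how one proceeds by hand; numerical libraries use iterative factorization methods instead, since polynomial root-finding is ill-conditioned for large $d$.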
7 Solution to Generalized Eigenvalue Problem
In this section, we introduce the solution to the generalized eigenvalue problem. Recall Eq. (16):
$\underset{\phi}{\text{maximize}} \quad \frac{\phi^\top A \phi}{\phi^\top B \phi}.$
Let $\rho$ be this fraction, named the Rayleigh quotient (Croot, 2005):
(30) $\rho(\phi; A, B) = \frac{\phi^\top A \phi}{\phi^\top B \phi}.$
The $\rho$ is stationary at $\phi \neq 0$ if and only if:
(31) $(A - \lambda B)\, \phi = 0,$
for some scalar $\lambda$ (Parlett, 1998). Eq. (31) is a linear system of equations. This system of equations can also be obtained from Eq. (4):
(32) $A \phi = \lambda B \phi \implies (A - \lambda B)\, \phi = 0.$
As we mentioned earlier, the eigenvalue problem is a special case of the generalized eigenvalue problem (where $B = I$), which is obvious by comparing Eqs. (28) and (32).
According to Cramer's rule, a linear system of equations has nontrivial solutions if and only if its determinant vanishes. Therefore:
(33) $\det(A - \lambda B) = 0.$
Similar to the explanations for Eq. (29), we can solve for the roots of Eq. (33). However, note that Eq. (33) is obtained from Eq. (4) or (16), where only one eigenvector $\phi$ is considered.
For solving Eq. (5) in the general case, there exist two approaches to the generalized eigenvalue problem, one of which is a quick and dirty solution and the other is a rigorous method. Both methods are explained in the following.
7.1 The Quick & Dirty Solution
Consider Eq. (5) again:
$A \Phi = B \Phi \Lambda.$
If $B$ is not singular (i.e., it is invertible), we can left-multiply the expressions by $B^{-1}$:
(34) $B^{-1} A \Phi = \Phi \Lambda.$
Eq. (34) is the eigenvalue problem for $B^{-1} A$ according to Eq. (2) and can be solved using the approach of Eq. (29).
Note that even if $B$ is singular, we can use a numeric hack (which is a little dirty) and slightly strengthen its main diagonal in order to make it full rank:
(35) $B := B + \varepsilon I,$
where $\varepsilon$ is a very small positive number, large enough to make $B$ full rank.
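A sketch of this quick-and-dirty solution on toy matrices (the value $\varepsilon = 10^{-5}$ is an assumed choice, not from the paper):

```python
import numpy as np

# Quick & dirty: eigendecompose B^{-1} A, regularizing B if it is singular.
rng = np.random.default_rng(9)
M = rng.standard_normal((4, 4)); A = M + M.T
N = rng.standard_normal((4, 3)); B = N @ N.T     # rank 3: B is singular

eps = 1e-5                                        # assumed small regularizer
B_reg = B + eps * np.eye(4)                       # Eq. (35): make B full rank
eigvals, Phi = np.linalg.eig(np.linalg.inv(B_reg) @ A)  # Eq. (34)

# check Eq. (5) for the regularized B
print(np.allclose(A @ Phi, B_reg @ Phi @ np.diag(eigvals), atol=1e-6))  # True
```

Note the numerical drawback: forming $B^{-1} A$ explicitly destroys symmetry and amplifies error when $B$ is ill-conditioned, which is why the rigorous method below is preferred.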
7.2 The Rigorous Solution
Consider Eq. (5) again:
$A \Phi = B \Phi \Lambda.$
There exists a rigorous method to solve the generalized eigenvalue problem (Wang, 2015), which is explained in the following.
Consider the eigenvalue problem for $B$:
(36) $B \Phi_B = \Phi_B \Lambda_B,$
where $\Phi_B$ and $\Lambda_B$ are the eigenvector and eigenvalue matrices of $B$, respectively. Then, we have:
(37) $\Phi_B^\top B \Phi_B = \Lambda_B,$
which holds because $\Phi_B$ is an orthogonal matrix (its columns are orthonormal) and thus $\Phi_B^\top \Phi_B = I$.
We multiply $\Lambda_B^{-1/2}$ into Eq. (37) from the left- and right-hand sides:
$\Lambda_B^{-1/2} \Phi_B^\top B \Phi_B \Lambda_B^{-1/2} = I \implies \widehat{\Phi}_B^\top B\, \widehat{\Phi}_B = I,$
where:
(38) $\widehat{\Phi}_B := \Phi_B \Lambda_B^{-1/2}.$
We define $\widehat{A}$ as:
(39) $\widehat{A} := \widehat{\Phi}_B^\top A\, \widehat{\Phi}_B.$
The $\widehat{A}$ is symmetric because:
$\widehat{A}^\top = (\widehat{\Phi}_B^\top A\, \widehat{\Phi}_B)^\top = \widehat{\Phi}_B^\top A^\top \widehat{\Phi}_B = \widehat{\Phi}_B^\top A\, \widehat{\Phi}_B = \widehat{A},$
where the last step notices that $A$ is symmetric.
The eigenvalue problem for $\widehat{A}$ is:
(40) $\widehat{A}\, \Phi_A = \Phi_A \Lambda_A,$
where $\Phi_A$ and $\Lambda_A$ are the eigenvector and eigenvalue matrices of $\widehat{A}$. Left-multiplying $\Phi_A^\top$ into Eq. (40) gives us:
(41) $\Phi_A^\top \widehat{A}\, \Phi_A = \Lambda_A,$
which holds because $\Phi_A$ is an orthogonal matrix (its columns are orthonormal), so $\Phi_A^\top \Phi_A = I$. Note that $\Phi_A$ is an orthogonal matrix because $\widehat{A}$ is symmetric (if a matrix is symmetric, its eigenvectors are orthogonal/orthonormal). Eq. (41) is diagonalizing the matrix $\widehat{A}$.
Plugging Eq. (39) into Eq. (41) gives us:
(42) $\Phi^\top A\, \Phi = \Lambda_A,$
where:
(43) $\Phi := \widehat{\Phi}_B \Phi_A.$
The $\Phi$ also diagonalizes $B$ to the identity (which is a diagonal matrix) because:
(44) $\Phi^\top B\, \Phi = \Phi_A^\top \widehat{\Phi}_B^\top B\, \widehat{\Phi}_B \Phi_A = \Phi_A^\top \Phi_A = I,$
where the second equality uses $\widehat{\Phi}_B^\top B\, \widehat{\Phi}_B = I$ and the third holds because $\Phi_A$ is an orthogonal matrix. From Eqs. (42) and (44), we have:
(45) $A \Phi = B \Phi \Lambda_A,$
because $\Phi^\top B \Phi = I$ implies $B \Phi = \Phi^{-\top}$, so Eq. (42) gives $A \Phi = \Phi^{-\top} \Lambda_A = B \Phi \Lambda_A$. This is exactly Eq. (5) with $\Lambda = \Lambda_A$.
To summarize, for finding $\Phi$ and $\Lambda$ in Eq. (5), we do the following steps (note that $A$ and $B$ are given):
1. Compute the eigendecomposition of $B$: $B \Phi_B = \Phi_B \Lambda_B$.
2. Compute $\widehat{\Phi}_B = \Phi_B \Lambda_B^{-1/2}$.
3. Compute $\widehat{A} = \widehat{\Phi}_B^\top A\, \widehat{\Phi}_B$.
4. Compute the eigendecomposition of $\widehat{A}$: $\widehat{A}\, \Phi_A = \Phi_A \Lambda_A$.
5. Set $\Lambda = \Lambda_A$ and $\Phi = \widehat{\Phi}_B \Phi_A$.
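The rigorous method can be sketched in a few lines of NumPy, under the assumption that $B$ is symmetric positive definite (so that $\Lambda_B^{-1/2}$ exists). The matrices below are assumed toy examples.

```python
import numpy as np

def generalized_eig(A, B):
    """Solve A Phi = B Phi Lam via the rigorous whitening method."""
    lam_B, Phi_B = np.linalg.eigh(B)        # step 1: B = Phi_B Lam_B Phi_B^T
    Phi_B_hat = Phi_B / np.sqrt(lam_B)      # step 2: Phi_B Lam_B^{-1/2} (column scaling)
    A_hat = Phi_B_hat.T @ A @ Phi_B_hat     # step 3: symmetric whitened matrix
    lam, Phi_A = np.linalg.eigh(A_hat)      # step 4: eigendecompose A_hat
    Phi = Phi_B_hat @ Phi_A                 # step 5: Phi = Phi_B_hat Phi_A
    return lam, Phi

rng = np.random.default_rng(10)
M = rng.standard_normal((4, 4)); A = M + M.T
N = rng.standard_normal((4, 4)); B = N @ N.T + 4 * np.eye(4)  # positive definite

lam, Phi = generalized_eig(A, B)
print(np.allclose(A @ Phi, B @ Phi @ np.diag(lam)))  # Eq. (5) holds
print(np.allclose(Phi.T @ B @ Phi, np.eye(4)))       # Phi diagonalizes B to I
```

This only uses symmetric eigendecompositions, which is why it is better conditioned than forming $B^{-1} A$ as in the quick-and-dirty solution.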
8 Conclusion
This paper was a tutorial introducing the eigenvalue and generalized eigenvalue problems. The problems were introduced, their corresponding optimization problems were presented, and some examples from machine learning were provided for them. Moreover, the solutions to the eigenvalue and generalized eigenvalue problems were introduced.
References
 Barshan et al. (2011) Barshan, Elnaz, Ghodsi, Ali, Azimifar, Zohreh, and Jahromi, Mansoor Zolghadri. Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds. Pattern Recognition, 44(7):1357–1371, 2011.
 Boyd & Vandenberghe (2004) Boyd, Stephen and Vandenberghe, Lieven. Convex optimization. Cambridge university press, 2004.

 Croot (2005) Croot, Ernie. The Rayleigh principle for finding eigenvalues. Technical report, Georgia Institute of Technology, School of Mathematics, 2005. Online: http://people.math.gatech.edu/ecroot/notes_linear.pdf, Accessed: March 2019.
 Fisher (1936) Fisher, Ronald A. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
 Friedman et al. (2009) Friedman, Jerome, Hastie, Trevor, and Tibshirani, Robert. The elements of statistical learning, volume 2. Springer series in statistics New York, NY, USA:, 2009.
 Golub & Van Loan (2012) Golub, Gene H. and Van Loan, Charles F. Matrix computations, volume 3. The Johns Hopkins University Press, 2012.
 Jolliffe (2011) Jolliffe, Ian. Principal component analysis. Springer, 2011.
 Parlett (1998) Parlett, Beresford N. The symmetric eigenvalue problem. Classics in Applied Mathematics, 20, 1998.
 Pearson (1901) Pearson, Karl. LIII. on lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.
 Wang (2015) Wang, Ruye. Generalized eigenvalue problem. http://fourier.eng.hmc.edu/e161/lectures/algebra/node7.html, 2015. Accessed: January 2019.
 Wilkinson (1965) Wilkinson, James Hardy. The algebraic eigenvalue problem, volume 662. Oxford Clarendon, 1965.
 Xu & Lu (2006) Xu, Yong and Lu, Guangming. Analysis on Fisher discriminant criterion and linear separability of feature space. In 2006 International Conference on Computational Intelligence and Security, volume 2, pp. 1671–1676. IEEE, 2006.