Generalized two-dimensional linear discriminant analysis with regularization

Recent advances show that two-dimensional linear discriminant analysis (2DLDA) is a successful matrix-based dimensionality reduction method. However, 2DLDA may encounter the singularity issue in theory and is sensitive to outliers. In this paper, a generalized Lp-norm 2DLDA framework with regularization for an arbitrary p>0, named G2DLDA, is proposed. G2DLDA makes two main contributions. First, the G2DLDA model measures the between-class and within-class scatter with an arbitrary Lp-norm, so that a proper p can be selected to achieve robustness. Second, by introducing an extra regularization term, G2DLDA achieves better generalization performance and solves the singularity problem. In addition, G2DLDA can be solved through a series of convex problems with equality constraints, each of which has a closed-form solution. Its convergence can be guaranteed theoretically when 1≤p≤2. Preliminary experimental results on three contaminated human face databases show the effectiveness of the proposed G2DLDA.

I Introduction

Dimensionality reduction (DR) plays an important role in pattern recognition and has been studied extensively. For supervised DR, linear discriminant analysis (LDA) [1, 2] is usually employed to extract the most discriminative features. It finds the optimal discriminative directions by maximizing the between-class variance while simultaneously minimizing the within-class variance in the projected space.

However, when dealing with massive multi-dimensional data such as real-world two-dimensional (2D) face images, LDA often becomes inadequate due to the high dimensionality and the loss of useful natural structural information when converting multi-dimensional data to vectors [3]. In particular, when the data feature dimension is much larger than the number of samples, LDA may suffer from the small sample size (SSS) problem and hence encounter singularity.

To deal with these problems, matrix based LDA, i.e., 2DLDA [5, 3, 4, 6, 7, 8, 9], has been studied. Compared to LDA, 2DLDA alleviates the SSS problem when some mild conditions are satisfied [3, 4]. Even so, it may still encounter the singularity issue in theory, which in turn degrades its performance. Moreover, both LDA and 2DLDA merely maximize the between-class variance and minimize the within-class variance on the training data set, without considering the generalization ability on test data. This over-fitting phenomenon arises from the fact that classical LDA and 2DLDA contain no term controlling the confidence interval. Another problem of classical LDA and 2DLDA is that they are sensitive to the presence of outliers, because the L2-norm exaggerates the effect of outlying data samples.

For the first issue, a popular remedy is the regularization technique, which replaces the within-class covariance matrix with a ridge-like covariance estimate, as in the regularized LDA (RDA) [10, 11, 12, 13]. The regularization technique reduces the variance associated with the sample-based estimate, and hence stabilizes the estimate [10]. In fact, it has been successfully applied to solving ill-posed inverse problems [14]. Meanwhile, the introduction of the regularization term controls the model complexity and avoids over-fitting [15]. Therefore, the regularization technique not only overcomes the singularity problem, but also leads to better generalization ability. For the singularity and generalization problems in LDA, an Lp-norm-like regularization with a ridge-like covariance estimate is a good choice [10, 11, 12, 13, 14, 15].
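As an illustration of such a ridge-like covariance estimate (a sketch; the shrinkage form below is one common variant, not necessarily the exact estimator used by the cited RDA methods):

```python
import numpy as np

def ridge_covariance(S, gamma):
    """Shrink a (possibly singular) scatter/covariance estimate S towards a
    scaled identity, so that the regularized estimate is always invertible."""
    d = S.shape[0]
    return (1.0 - gamma) * S + gamma * (np.trace(S) / d) * np.eye(d)
```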

For the second issue, the sensitivity to outliers, several approaches have also been proposed. For vector based LDA, these include local Fisher discriminant analysis (LFDA) [16], the probability based minimax optimization technique [17] and the uncertainty LDA model [18]. Since the use of the L2-norm is one of the main reasons that LDA is sensitive to outliers, the L1-norm based technique has also been considered as an effective robust replacement, such as the rotational invariant L1-norm based LDA (DCL1) [19] and the L1-norm based LDA (LDA-L1) [20, 21]. We note that the solving algorithm of LDA-L1 is based on the gradient ascent (GA) technique applied to nonconvex surrogate functions, whose optimal solutions cannot be guaranteed, and a proper step size is hard to choose in practice. To tackle this problem, various methods were proposed, including non-greedy iterative algorithms for difference optimization problems [22, 23, 24], the convex surrogate technique [25], the concave-convex procedure (CCCP) [26] and the successive linear algorithm (SLA) [27]. Though the above improvements were proved to be effective, some of them still suffer from the singularity problem in practical computation, for example L1-LDA [25] and the recursive “concave-convex” Fisher linear discriminant (RPFLD) [26], as pointed out in [15] and [23], respectively. Further, the generalization of L1-norm LDA to Lp-norm LDA with arbitrary p was studied [28, 29], where the Lp-norm LDA was solved through the GA technique. For 2DLDA, LDA-L1 was further extended to its robust 2D version named L12DLDA [30, 31], and its non-greedy modification [32] was also studied.

However, the singularity problem of 2DLDA and its generalization and robustness issues have not been addressed as thoroughly as for LDA. In particular, to the best of our knowledge, the regularization technique and the application of the Lp-norm have not been studied for 2DLDA. Therefore, in this paper, to address the singularity problem of 2DLDA and improve its generalization and robustness performance, we consider a generalized Lp-norm based 2DLDA with regularization for an arbitrary p > 0, named G2DLDA. G2DLDA not only maximizes the Lp-norm between-class distance and minimizes the Lp-norm within-class distance, but also introduces a regularization term measured by the Lp-norm. Since the proposed G2DLDA involves the Lp-norm in both its numerator and denominator, we employ a simple iteration technique that converts the ratio optimization problem into a series of convex programming problems, and we also discuss its convergence. In summary, the proposed G2DLDA has the following characteristics:
(i) G2DLDA is a generalized two-dimensional linear discriminant analysis with regularization, where the between-class scatter, the within-class scatter and the regularization term are all measured by the Lp-norm with arbitrary p > 0. This allows G2DLDA not only to degenerate easily to the existing 2DLDA and L12DLDA, but also to achieve the desired performance by choosing a proper p.
(ii) The regularization term remedies the singularity problem. In addition, it controls the model complexity and avoids over-fitting, and therefore improves the generalization performance.
(iii) An effective algorithm is designed for G2DLDA. Specifically, the solution of G2DLDA is obtained by solving a series of convex problems with closed-form solutions. Moreover, the convergence of the algorithm can be ensured when 1 ≤ p ≤ 2.
(iv) Experimental results on three contaminated human face databases with different noise levels demonstrate the effectiveness of G2DLDA.

The paper is organized as follows. Section II briefly reviews LDA and 2DLDA. Section III reviews L1-LDA and L12DLDA. Section IV proposes our G2DLDA and gives the corresponding theoretical analysis. Section V experimentally compares our G2DLDA with related approaches. Finally, concluding remarks are given in Section VI.

The notations of this paper are given as follows. We consider a supervised learning problem in the m×n-dimensional matrix space R^{m×n}. The training data set is given by T = {(X_1, y_1), …, (X_N, y_N)}, where X_i ∈ R^{m×n} is the input matrix and y_i ∈ {1, 2, …, c} is the corresponding label. Assume that the i-th class contains N_i samples, so that Σ_{i=1}^{c} N_i = N, and write the j-th sample of the i-th class as X_j^{(i)}. Let X̄ denote the mean of all sample matrices and X̄_i the mean of the sample matrices in the i-th class. For a matrix X ∈ R^{m×n} and p > 0, its Lp-norm is defined as ||X||_p = (Σ_{k=1}^{m} Σ_{l=1}^{n} |x_{kl}|^p)^{1/p}. Note that when 0 < p < 1, the Lp-norm is only a quasi-norm [33]. However, this does not affect its use in this paper, and we hence call it the Lp-norm for notational unification.
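As a quick numerical illustration of this entrywise Lp-(quasi-)norm (a sketch in NumPy; the function name and example values are ours, not from the paper):

```python
import numpy as np

def lp_norm(X, p):
    """Entrywise Lp-(quasi-)norm of a matrix: (sum_ij |x_ij|^p)^(1/p)."""
    X = np.asarray(X, dtype=float)
    return float((np.abs(X) ** p).sum() ** (1.0 / p))

A = np.array([[1.0, -2.0], [0.5, 3.0]])
print(lp_norm(A, 2))    # Frobenius norm
print(lp_norm(A, 1))    # entrywise L1-norm
print(lp_norm(A, 0.5))  # quasi-norm for 0 < p < 1
```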

II L2-norm LDA and L2-norm 2DLDA

The classical L2-norm based LDA arises from Fisher’s discriminant problem and is a vector based method, i.e., each sample is a vector rather than a matrix. For the data set, define the between-class scatter matrix S_b and the within-class scatter matrix S_w as

(1)

and

(2)

LDA seeks an optimal projection matrix consisting of discriminant vectors by solving the problem

(3)

After obtaining the projection matrix, a new input is projected onto the learned discriminant subspace. The solution to the optimization problem (3) is given by the eigenvectors corresponding to the first few largest nonzero eigenvalues of the generalized eigenvalue problem S_b w = λ S_w w, provided that S_w is nonsingular. Since the rank of S_b is at most c − 1, the number of extracted features is at most c − 1. It is obvious that when S_w is not of full rank, LDA encounters the singularity problem.
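For reference, a minimal sketch of this classical L2-norm LDA solution via the generalized eigenvalue problem, using the standard scatter definitions (the code and the ridge parameter lam, which illustrates the regularization fix discussed in Section I, are ours, not the paper's):

```python
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, y, k, lam=0.0):
    """Classical LDA: top-k generalized eigenvectors of (S_b, S_w + lam*I).
    X has shape (N, d); lam > 0 gives a ridge-regularized (RDA-style) variant."""
    classes = np.unique(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)  # between-class scatter
        Sw += (Xc - mc).T @ (Xc - mc)                            # within-class scatter
    vals, vecs = eigh(Sb, Sw + lam * np.eye(d))  # fails if S_w is singular and lam = 0
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:k]]                    # at most c - 1 useful directions
```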

To deal with matrix data directly, 2DLDA has been studied. Assume the inputs are matrices as described in Section I. Then 2DLDA has the same formulation as LDA (3), where the scatter matrices are defined as in (1) and (2) but with matrix inputs. The projection vectors of 2DLDA are also obtained by solving the above generalized eigenvalue problem. 2DLDA can well extract algebraic features of the matrix input data [5]. Though the singularity problem is much alleviated for 2DLDA, it still exists in theory [4].

In addition, both LDA and 2DLDA are sensitive to the presence of outliers. In fact, using the fact that the trace of A^T A equals the squared Frobenius norm ||A||_F^2 for any matrix A, (3) can be rewritten as

(4)

As can be seen, the objective of (4) is based on the L2-norm in nature, and hence is sensitive to outliers and noise.

III L1-norm LDA and L1-norm 2DLDA

To improve the robustness of LDA and 2DLDA, L1-norm based LDA and 2DLDA (LDA-L1 and L12DLDA) are formulated by replacing the F-norm terms in (4) with L1-norm ones:

(5)

where the projection matrix is constrained to be orthonormal. When the inputs are vectors, (5) is LDA-L1 [20, 21]; for general matrix inputs, it is L12DLDA [30, 31]. Problem (5) was originally solved through the GA technique applied to a nonconvex surrogate function, and the deflation technique was used to obtain multiple discriminant directions. Note that GA needs an appropriate step size, and the optimal solution of (5) cannot be guaranteed by the GA technique [25].

To address this problem, for the vector case, [25] derived a novel L1-norm discriminant criterion coined L1-LDA under a theoretical framework of Bayes optimality. L1-LDA has the same formulation as LDA-L1 but uses the whole data scatter instead of the within-class scatter, with equal class weights, and the orthonormality constraint in (5) is replaced by an orthonormality constraint with respect to the scatter matrix. L1-LDA is solved by an iteration technique, and the solution at each iteration is given by

(6)

where the weighting quantities are determined by the current iterate, and sgn(·) denotes the sign function.

Using the above notation, the solution of (6) is given by

(7)

Though L1-LDA is derived under a rigorous theoretical framework, the matrix appearing in the solution (7) of each iteration may not be of full rank, and hence its inverse may not exist. The singularity problem occurs because the matrix in the quadratic objective of problem (6) may not be positive definite, which undermines the effectiveness of the algorithm; this phenomenon is also stated theoretically in [23]. Moreover, it can be seen that problem (5) merely minimizes the empirical error on the training data without considering the generalization property. Since L12DLDA is also solved by GA, it suffers from the same problems as LDA-L1; however, these problems have not been studied for L12DLDA. If we address them as in [25], L12DLDA can also be solved through a series of convex quadratic problems, but it will still encounter the singularity problem. It is therefore necessary to study the above problems further.

IV Generalized Lp-norm 2DLDA with regularization

IV-A Problem formulation

As seen in Sections II and III, the above L2-norm and L1-norm based ratio forms of LDA and 2DLDA face the singularity problem. Moreover, though they may perform well on the training data set, their optimization models do not consider the generalization ability. This arises from the absence of a regularization term in classical LDA and 2DLDA. Aiming to solve the singularity problem and to improve the robustness and generalization performance of 2DLDA, we here propose a generalized Lp-norm based 2DLDA with regularization for arbitrary p > 0, called G2DLDA, which is formulated as

(8)

where the projection directions are constrained to be orthonormal and λ is a nonnegative tuning parameter.

We now explain the geometric meaning of problem (8). Minimizing the first term in the numerator guarantees that each element of a class is as close as possible to its class center in the projected space, while maximizing the denominator makes each projected class center as far as possible from the overall projected center, both in the Lp-norm sense. Minimizing the second term in the numerator controls the model complexity, which generates better generalization ability. The constraint of (8) ensures that the obtained discriminant directions of G2DLDA are orthogonal to each other, which is beneficial for non-redundancy in representing the subspace.
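For concreteness, one plausible form of problem (8) consistent with the description above is the following reconstruction in our own notation; the exact weighting and normalization of the published model may differ:

\[
\min_{W^{\top} W = I}\;
\frac{\sum_{i=1}^{c}\sum_{j=1}^{N_i}\bigl\|\bigl(X_j^{(i)}-\bar{X}_i\bigr)W\bigr\|_p^p \;+\; \lambda\,\|W\|_p^p}
{\sum_{i=1}^{c} N_i\,\bigl\|\bigl(\bar{X}_i-\bar{X}\bigr)W\bigr\|_p^p},
\]

where X_j^{(i)} is the j-th sample of the i-th class, X̄_i is the i-th class mean, X̄ is the overall mean, and λ ≥ 0 is the regularization parameter.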

As can be seen from problem (8), G2DLDA makes two improvements over the existing two-dimensional LDAs: (i) it uses an arbitrary Lp-norm to measure the between-class scatter and the within-class scatter, and (ii) it minimizes an extra regularization term that is absent from 2DLDA.

Remark 1.

Problem (8) can be viewed as a general framework for two-dimensional LDA. By choosing different p and λ, we obtain different existing two-dimensional LDA models. In particular, when p = 2 and λ = 0, G2DLDA is 2DLDA; when p = 1 and λ = 0, G2DLDA is L12DLDA. Further, when the inputs are vectors, G2DLDA degenerates to the vector based LDA. In this situation, when p = 2 and λ = 0, G2DLDA is LDA; when p = 1 and λ = 0, G2DLDA is LDA-L1; when λ = 0 and p is arbitrary, G2DLDA is LDA-Lp. In particular, when p = 2 and λ > 0, G2DLDA is just RDA, which is an important improvement of LDA obtained by adding an extra regularization term. In RDA, the regularization term not only avoids the SSS problem, but also makes direct matrix operations feasible for high dimensionality [13]. Moreover, it helps control the model complexity and hence avoids the over-fitting problem [15].

However, for matrix based LDA, there is no corresponding regularized model. To give a clearer picture of the regularization term in G2DLDA, we reformulate problem (8) as:

(9)

By observing problem (9), we see that G2DLDA can be separated into two parts: the first part consists of the first term in the objective together with the constraint, which minimizes the empirical error on the training data, while the second part is the regularization term in the objective, which ensures that G2DLDA not only works on the training data, but also controls the confidence interval of the model. The regularization technique has in fact been applied extensively in many pattern recognition methods, such as several important improvements of support vector machines (SVMs) [34, 35] with L2-norm-like regularization terms, which realize structural risk minimization. For our G2DLDA, the regularization term on the one hand controls the model complexity and hence yields better generalization ability; on the other hand, as we will see in the following solving procedure, this additional regularization term conquers the singularity problem that exists in (6). The influence of different p and the effect of the regularization term on our G2DLDA will also be investigated experimentally.

IV-B Solving the proposed G2DLDA for one projection direction

Problem (8) is in ratio form, and both its numerator and denominator contain the Lp-norm. Therefore, it is hard to obtain all the projection directions at once. Hence, we first consider the corresponding problem with one projection vector

(10)

subject to the unit-norm constraint on the single projection direction w.

As in (9), we rewrite problem (10) as

(11)

The above problem is equivalent to

(12)

where each diagonal matrix has its i-th diagonal element given by the corresponding Lp-norm reweighting factor.

We now present an iterative algorithm to solve (12). Let t be the iteration counter. Denote

(13)
(14)

Then we solve the following problem to obtain the next iterate:

(15)

By its definition, the regularization term makes the corresponding matrix positive definite. Note that for the diagonal weighting matrices to be well defined, the associated residuals must be nonzero; if a zero residual occurs, we perturb the current iterate by a small random vector.

Now we solve problem (15). Let the Lagrangian of (15) be

(16)

Then the corresponding KKT conditions are

(17)

From (17), we obtain the closed-form solution

(18)

Now we summarize the above procedure in Algorithm 1.

Algorithm 1: G2DLDA solving algorithm for one direction
Input: The training data set, parameters p and λ, stopping criterion and maximum iteration number.

I. Initialization. Set the iteration number t = 0 and initialize w as the normalized all-ones vector 1/||1||, where 1 is the vector of all ones.
II. Repeat

 (a) Compute the diagonal weighting matrices according to (13) and (14), respectively.

 (b) Compute the new iterate according to (18), and normalize it to unit length.

 (c) Set t = t + 1.
Until the change between successive iterates falls below the stopping criterion or the maximum iteration number is reached.
Output: Discriminant vector w.
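Since the displayed equations (13), (14) and (18) are not reproduced above, the following is only a minimal NumPy sketch of the flavor of Algorithm 1: the Lp-norm terms are handled by iterative reweighting, so that each iteration reduces to a regularized eigenvalue problem with a closed-form step. The function name, the r**(p - 2) weights and the eigenvector update are our assumptions, not the authors' exact update rule (18).

```python
import numpy as np

def g2dlda_one_direction(X, y, p=1.0, lam=0.1, max_iter=50, tol=1e-6, eps=1e-8):
    """One G2DLDA discriminant direction via iterative reweighting.
    The weight formulas below are assumptions, not the paper's exact
    equations (13), (14) and (18)."""
    X = np.asarray(X, dtype=float)           # shape (N, m, n): N samples of m x n images
    y = np.asarray(y)
    classes = np.unique(y)
    n = X.shape[2]
    mean_all = X.mean(axis=0)                # overall mean matrix
    means = {c: X[y == c].mean(axis=0) for c in classes}

    w = np.ones(n) / np.sqrt(n)              # normalized all-ones initialization, as in Algorithm 1
    for _ in range(max_iter):
        Sw = lam * np.eye(n)                 # the regularization term keeps this matrix positive definite
        Sb = np.zeros((n, n))
        for c in classes:
            Xc = X[y == c]
            for A in Xc - means[c]:          # within-class deviations, reweighted by current residuals
                r = max(np.linalg.norm(A @ w, ord=p), eps)
                Sw += (r ** (p - 2.0)) * (A.T @ A)
            B = means[c] - mean_all          # between-class deviation of class c
            rb = max(np.linalg.norm(B @ w, ord=p), eps)
            Sb += len(Xc) * (rb ** (p - 2.0)) * (B.T @ B)
        # closed-form step: leading eigenvector of the regularized ratio problem
        vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
        w_new = np.real(vecs[:, np.argmax(np.real(vals))])
        w_new /= np.linalg.norm(w_new)
        if min(np.linalg.norm(w_new - w), np.linalg.norm(w_new + w)) < tol:
            w = w_new
            break
        w = w_new
    return w
```

For p = 2 and λ = 0 the weights in this sketch become constant and the update reduces to the ordinary 2DLDA eigenproblem, which is consistent with Remark 1.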

Next, we show that the above Algorithm 1 is convergent when 1 ≤ p ≤ 2.

Proposition 2.

When 1 ≤ p ≤ 2, Algorithm 1 monotonically decreases the objective of problem (10) in each iteration, and hence converges.

Proof: The proof is given in the supplementary material.

We now analyze the time complexity of 2DLDA, L12DLDA and the proposed algorithm when one discriminant direction is learned. Denote the number of samples, the number of classes, the feature dimensions, and the iteration number. For 2DLDA [5], the time complexity is . The time complexity of L12DLDA [30, 31] is . For our G2DLDA, the time complexity to compute the terms in (13) and (14) is and . To obtain the solution in (18), it needs . Therefore, the total complexity of G2DLDA is .

IV-C G2DLDA for multiple orthogonal projection directions

By performing Algorithm 1, we obtain one projection vector. If we want to project the data into a multi-dimensional space, more orthogonal projection vectors are needed. Suppose we have obtained the first r orthogonal discriminant vectors. To compute the next projection vector by minimizing (10), it needs to satisfy the following orthogonality constraints [36]

(19)

Denote by W the matrix whose columns are the discriminant vectors obtained so far, and let span(W) be the linear subspace they span. It is obvious that the next direction is required to be orthogonal to span(W). Let B be a basis matrix of the null space of W^T. Then, we can solve the following problem

(20)

to obtain the next direction. Note that the above problem projects the data onto the null space spanned by B, and therefore the computation is carried out in a lower-dimensional space. This makes the output orthogonal to any vector in span(W).

In practice, to obtain B and the solution of (20), we first consider solving the linear equation W^T z = 0. Its solution set is an (n − r)-dimensional subspace. We apply the Gram-Schmidt procedure to a set of its solutions, and consequently obtain an orthonormal basis B of this subspace, which satisfies W^T B = 0. After obtaining B, we update the data by right-multiplying each training sample and each mean matrix by B. Then we only need to solve the reduced problem on the updated data. Equivalently, we solve

(21)

which can be solved by Algorithm 1.

As we can see, the solution of problem (21) lies in the reduced space. We now transform it into an element of the original space. Since the reduced space and span(B) are isomorphic linear spaces, with B defining a linear isomorphism between them, each solution v of (21) corresponds to Bv in the original space, and the entries of v are exactly the representation coefficients of Bv with respect to the columns of B. This implies that multiplying the solution of (21) on the left by B gives the next discriminant direction in the original space.
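As an illustration of how these orthogonality constraints can be handled in practice (a sketch; scipy.linalg.null_space plays the role of the Gram-Schmidt step described above, and all variable names are ours):

```python
import numpy as np
from scipy.linalg import null_space

def reduced_basis(W):
    """Orthonormal basis B of the null space of W^T (so that W^T B = 0).
    W has shape (n, r) and holds the r directions obtained so far."""
    return null_space(W.T)        # shape (n, n - r), columns are orthonormal

# Usage sketch: solve the reduced problem for v in R^{n-r} (e.g. with Algorithm 1
# applied to the data right-multiplied by B); the next direction in the original
# space is then w = B @ v, which is automatically orthogonal to the columns of W.
```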

Algorithm 2: G2DLDA solving algorithm for multiple directions
Input: The training data set, parameters p and λ, and the desired number of discriminant vectors.
Process:

I. Initialization. Set the basis matrix to the identity and the set of obtained discriminant directions to be empty;

II. For to
  (a) Compute the updated data matrices and mean matrices by projecting them onto the current basis.
  (b) Apply Algorithm 1 to the updated data in (a), and obtain the optimal solution of the reduced problem.
  (c) Map the obtained solution back to the original space via the basis matrix and append it to the set of discriminant directions;
  (d) Solve the linear equation that defines the null space of the current discriminant directions, and update the basis matrix with the obtained orthonormal solution.
End
Output: W.
Proposition 3.

By performing Algorithm 2, the obtained multiple projection vectors are orthogonal to each other.

Proof: Since B is an orthonormal basis generated from the solutions of W^T z = 0, we have W^T B = 0, and hence every new direction obtained from (21), being a linear combination of the columns of B, is orthogonal to all previously obtained directions.

V Experiments

In this section, experiments are conducted on three contaminated human face databases, ORL [37], AR [38] and FERET [39], to evaluate the performance of the proposed G2DLDA compared with 2DPCA [40], 2DPCA-L1 [41], 2DLDA [5], and L12DLDA [30, 31]. We write 2DPCA-L1 as L12DPCA for notational unification. The learning parameter of L12DLDA is selected optimally from a candidate set. For our G2DLDA, p is taken from {0.5, 1, 1.5, 2, 5}, and the regularization parameter λ is chosen optimally from a candidate set; when we examine the influence of the regularization term, we also consider λ = 0. In addition, w is initialized as the normalized all-ones vector instead of a random vector in G2DLDA, where 1 is the vector of all ones, which makes sure that the comparison between our G2DLDA and the other methods is not affected by random initialization. All of our experiments are carried out on a PC with an Intel 3.30 GHz CPU and 4 GB RAM under the Matlab 2017b platform. To test the discriminant ability of the various methods, we first project the test images into the new space obtained by each dimensionality reduction method on the training samples; then we apply the nearest neighbor classifier under the F-norm metric to classify the test face images. In this section, we give the main conclusions; for more detailed descriptions, we refer the reader to the supplementary material.
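A minimal sketch of this evaluation protocol, assuming a learned projection matrix W (the helper names below are ours):

```python
import numpy as np

def project(X, W):
    """Project matrix samples X (shape N x m x n) onto the learned directions W (n x k)."""
    return X @ W                                   # shape (N, m, k)

def nn_classify(train_feats, train_labels, test_feats):
    """1-nearest-neighbor classification under the Frobenius (F-norm) metric."""
    preds = []
    for T in test_feats:
        dists = np.linalg.norm(train_feats - T, axis=(1, 2))   # Frobenius distance to each training sample
        preds.append(train_labels[int(np.argmin(dists))])
    return np.array(preds)
```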

V-A Human face databases

The ORL database contains 400 samples of 40 subjects, where each subject has 10 different images. Some of the images were taken at different times, with varying lighting, facial expressions and facial details. All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position, and each image has 92×112 pixels. We here crop and resize each image to 32×32. Six images per subject are randomly selected to form the training set, and the remaining images form the test set. Random salt and pepper noise with densities 0.1, 0.2 and 0.3, respectively, is added to each whole training image, as shown in Figure 1.

Fig. 1: Sample faces from the ORL database with random salt and pepper noise added to each whole training image, with noise densities (a) 0.1, (b) 0.2 and (c) 0.3, respectively.

The AR face database contains 120 subjects, with each subject having 26 images of size 50×40. All the images were taken with different facial expressions and illuminations, and some images are occluded with black sunglasses or scarves, as shown in Figure 2. We here use a subset of the AR database that contains 100 subjects with 12 images per subject, and further crop and resize each face image to a uniform size. Two of the 12 images are occluded with black sunglasses, and two of them are occluded with scarves. The remaining 8 images of each subject are unoccluded. For the unoccluded faces, we randomly choose 5 images of each subject, together with the naturally occluded images, as the training set, and the remaining 3 constitute the test set. The originally unoccluded training images are artificially polluted with random rectangular Gaussian noise with mean 0 and variances 0.01, 0.05 and 0.1, respectively, where the noise covers 50% of each image at a random position. The position, length and width of the noise rectangle are randomly generated. Sample face images of the polluted training data are shown in Figure 2.

Fig. 2: Sample faces from the AR database with 50% random rectangular Gaussian noise with mean 0 and variances (a) 0.01, (b) 0.05 and (c) 0.1, respectively. The last two images of each row are the naturally occluded faces.

The FERET database contains 1564 sets of images for a total of 14126 images, covering 1199 individuals and 365 duplicate sets. Each photography session used the same physical setup, and for some individuals over two years elapsed between their first and last sittings, with some subjects being photographed multiple times. We here use a subset that contains 1400 gray scale images of 200 individuals, with each image cropped and resized to 32×32. For each individual there are 7 face images with expression, illumination and age variation. Four images per subject are randomly selected to form the training set, and the remaining images form the test set. A black block is added to each training face at a random position, covering 10%, 20% and 30% of the image area, respectively, as shown in Figure 3.

Fig. 3: Sample faces from the FERET database with a random rectangular black block covering (a) 10%, (b) 20% and (c) 30% of each image, respectively.

V-B The influence of p

We first investigate the influence of p on our G2DLDA. For each p, we record its accuracy when the reduced dimension varies from 1 to 32, and the results are given in Figures S1, S2 and S3 of the supplementary material. For the ORL database, when p = 1 and p = 1.5, G2DLDA has the highest accuracy and performs most stably; in particular, when p = 1.5, its highest accuracy is 94.38%. When the noise densities are 0.2 and 0.3, the corresponding results suggest similar behavior of G2DLDA. Generally, when p = 1 and 1.5, G2DLDA behaves the best on the ORL database. For the AR and FERET databases, G2DLDA performs better overall when p = 1. In summary, when the value of p is smaller than 2, G2DLDA behaves better.

V-C The influence of the regularization parameter

Next, we study the effect of the regularization term on G2DLDA. For this purpose, we vary the regularization parameter, including the value 0. The corresponding results are illustrated in Figures S4, S5 and S6 of the supplementary material. For the ORL database, we take the same p for all the noise densities, since G2DLDA performs the best with it in all noise situations. For all the noise densities, the best accuracies with a positive regularization parameter are better than those without regularization, which demonstrates the benefit of adding the regularization term. The optimal regularization parameter is similar for noise densities 0.1 and 0.2, and shifts slightly when the noise density increases to 0.3. The above results show that, for the same data set, the optimal regularization parameters for the different noise cases lie within a certain interval. The results on the AR and FERET databases confirm the improvement brought to G2DLDA by the regularization term, and for the same database with different noise, the optimal parameter lies in a similar range.

V-D The influence of the reduced dimension

We finally examine the behavior of each method as the reduced dimension varies. The parameters for each database are taken as above. For the ORL database, the results are given in Figure 4. It is obvious that for all the methods, the overall performance deteriorates as the noise density increases. As the reduced dimension increases, all methods quickly reach their highest accuracy and then decline. This phenomenon shows that within the first few dimensions, the usable discriminant information increases as the dimension grows; however, as the dimension becomes larger, useless disturbance information may also be included, since the image data are polluted. Its influence on our G2DLDA is smaller than on the other methods. Moreover, the highest accuracies of G2DLDA for all the density levels are better than those of the other methods. To see the results more clearly, we list the highest accuracies of all the methods in Table I. From the table, we see that for all three noise densities, when p = 1 and 1.5, G2DLDA outperforms the other methods, and its best accuracy is at least 3% higher. This also shows the necessity of choosing a proper p. We also conduct similar experiments on the AR and FERET databases, and their results are given in Figures 5 and 6 and Tables II and III. The corresponding results again demonstrate that our G2DLDA has the most robust performance.

We summarize all the experimental results in the supplementary material.

Fig. 4: Accuracies of 2DPCA, L12DPCA, 2DLDA, L12DLDA and G2DLDA under different reduced dimensions on the contaminated ORL database with random salt and pepper noise on each whole training image, with densities 0.1, 0.2 and 0.3, respectively.
Method              Noise density 0.1    Noise density 0.2    Noise density 0.3
                    Acc (Dim)            Acc (Dim)            Acc (Dim)
2DPCA 91.88 (6) 91.88 (6) 83.13 (8)
L12DPCA 81.25 (2) 75.00 (2) 53.13 (2)
2DLDA 91.88 (9) 91.25 (7) 80.00 (3)
L12DLDA 91.25 (4) 78.13 (1) 73.13 (1)
G2DLDA (p=0.5) 91.88 (26) 88.75 (26) 73.75 (1)
G2DLDA (p=1) 93.75 (11) 93.75 (19) 85.00 (17)
G2DLDA (p=1.5) 94.38 (7) 94.40 (14) 87.50 (12)
G2DLDA (p=2) 92.50 (15) 91.88 (15) 77.50 (21)
G2DLDA (p=5) 91.25 (10) 92.50 (9) 76.25 (19)
TABLE I: Comparison of different methods in terms of the best accuracies (%) on the contaminated ORL database with random salt and pepper noise on each whole face image, with densities 0.1, 0.2 and 0.3, respectively. The optimal dimension is shown next to its corresponding accuracy.
Fig. 5: Accuracies of 2DPCA, L12DPCA, 2DLDA, L12DLDA and G2DLDA under different reduced dimensions on the contaminated AR database with 50% random rectangular Gaussian noise on the training data with variances 0.01, 0.05 and 0.1, respectively.
Method              Noise level 0.01    Noise level 0.05    Noise level 0.1
                    Acc (Dim)           Acc (Dim)           Acc (Dim)
2DPCA 89.00 (8) 85.33 (6) 81.33 (6)
L12DPCA 73.33 (3) 61.00 (3) 49.33 (4)
2DLDA 92.00 (7) 84.67 (3) 81.33 (2)
L12DLDA 88.33 (3) 84.33 (2) 82.33 (1)
G2DLDA (p=0.5) 92.00 (4) 81.67 (4) 79.00 (2)
G2DLDA (p=1) 92.00 (4) 88.33 (3) 88.67 (3)
G2DLDA (p=1.5) 93.00 (8) 83.67 (2) 85.67 (3)
G2DLDA (p=2) 88.33 (29) 73.67 (7) 70.00 (8)
G2DLDA (p=5) 88.67 (26) 71.67 (24) 54.67 (10)
TABLE II: Comparison of different methods in terms of the best accuracies (%) on the contaminated AR database with 50% rectangular random Gaussian noise with mean 0 and variances 0.01, 0.05 and 0.1, respectively. The optimal dimension is shown next to its corresponding accuracy.
Fig. 6: Accuracies of 2DPCA, L12DPCA, 2DLDA, L12DLDA and G2DLDA under different reduced dimensions on the contaminated FERET database with random rectangular black blocks covering 10%, 20% and 30% of each image, respectively.
Method              Noise percentage 10%    Noise percentage 20%    Noise percentage 30%
                    Acc (Dim)               Acc (Dim)               Acc (Dim)
2DPCA 13.83 (18) 10.17 (20) 6.67 (13)
L12DPCA 13.50 (30) 17.50 (2) 11.67 (10)
2DLDA 35.67 (6) 31.50 (3) 30.67 (2)
L12DLDA 34.83 (2) 28.33 (3) 23.00 (10)
G2DLDA (p=0.5) 13.67 (30) 30.83 (1) 29.00 (1)
G2DLDA (p=1) 40.50 (4) 36.00 (3) 33.17 (3)
G2DLDA (p=1.5) 37.67 (3) 32.00 (3) 32.17 (3)
G2DLDA (p=2) 24.67 (23) 17.17 (26) 14.50 (15)
G2DLDA (p=5) 29.50 (11) 18.00 (7) 12.67 (9)
TABLE III: Comparison of different methods in terms of the best accuracies (%) on the contaminated FERET database with random rectangular black blocks covering 10%, 20% and 30% of each image, respectively. The optimal dimension is shown next to its corresponding accuracy.

VI Conclusion

In this paper, a novel generalized Lp-norm (p > 0) two-dimensional linear discriminant analysis with regularization (G2DLDA) has been proposed. G2DLDA is a generalized framework and can be solved by a simple iterative algorithm. Compared to 2DLDA and L12DLDA, G2DLDA avoids the singularity problem and is more robust to outliers. Moreover, the regularization term also improves its generalization performance. Experimental results confirm its effectiveness. The corresponding G2DLDA Matlab code and slides can be downloaded from http://www.optimal-group.org/Resources/Code/G2DLDA.html.

It should be noted that the regularization term can be replaced by an Ls-norm term with s > 0, rather than using the same Lp-norm as in the between-class and within-class scatter. In the future, we plan to extend G2DLDA to its nonlinear case, and to study the convergence of the algorithm for 0 < p < 1 and p > 2. Bilateral G2DLDA is also of interest.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (No.61703370, No.61866010, No.11871183 and No.61603338), the Natural Science Foundation of Zhejiang Province (No.LQ17F030003 and No.LY18G010018), the Natural Science Foundation of Hainan Province (No.118QN181), and the Foundation of China Scholarship Council (No. 201708330179).

References

  • [1] Fisher R A. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 1936, 7(2): 179-188.
  • [2] Fukunaga K. Introduction to statistical pattern recognition, second edition. Academic Press, New York, 1991.
  • [3] Kong H, Teoh E K, Wang J G, et al. Two-dimensional Fisher discriminant analysis: forget about small sample size problem. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005: 761-764.
  • [4] Li M, Yuan B. 2D-LDA: A statistical linear discriminant analysis for image matrix. Pattern Recognition Letters, 2005, 26(5): 527-532.
  • [5] Liu K, Cheng Y Q, Yang J Y. Algebraic feature extraction for image recognition based on an optimal discriminant criterion. Pattern Recognition, 1993, 26(6): 903-911.

  • [6] Yang J, Zhang D, Yong X, Yang J Y. Two-dimensional discriminant transform for face recognition. Pattern Recognition, 2005, 38: 1125-1129.

  • [7] Xiong H, Swamy M N S, Ahmad M O. Two-dimensional FLD for face recognition. Pattern Recognition, 2005, 38(7): 1121-1124.
  • [8] Kong H, Wang L, Teoh E K, Wang J G, Venkateswarlu R. A framework of 2D Fisher discriminant analysis: application to face recognition with small number of training samples. Proceedings of Computer Vision and Pattern Recognition, 2005, 2: 1083-1088.

  • [9] Jing X Y, Wong H S, Zhang D. Face recognition based on 2D Fisherface approach. Pattern Recognition, 2006, 39(4): 707-710.
  • [10] Friedman J H. Regularized discriminant analysis. Journal of the American Statistical Association, 1989, 84(405): 165-175.
  • [11] Dudoit S, Fridlyand J, Speed T P. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 2002, 97(457): 77-87.
  • [12] Bickel P J, Levina E. Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations. Bernoulli, 2004: 989-1010.

  • [13] Guo Y, Hastie T, Tibshirani R. Regularized linear discriminant analysis and its application in microarrays. Biostatistics, 2006, 8(1): 86-100.
  • [14] O’Sullivan F. A statistical perspective on ill-posed inverse problems. Statistical Science, 1986, 1: 502-527.
  • [15] Chen X, Yang J, Mao Q, et al. Regularized least squares Fisher linear discriminant with applications to image recognition. Neurocomputing, 2013, 122: 521-534.
  • [16] Sugiyama M. Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. Journal of Machine Learning Research, 2007, 8(1): 1027-1061.

  • [17] Lanckriet G R G, Ghaoui L E, Bhattacharyya C, et al. A robust minimax approach to classification. Journal of Machine Learning Research, 2003, 3: 555-582.
  • [18] Kim S J, Magnani A, Boyd S. Robust Fisher discriminant analysis. Advances in Neural Information Processing Systems. 2005: 659-666.
  • [19] Li X, Hua W, Wang H, et al. Linear discriminant analysis using rotational invariant L1 norm. Neurocomputing, 2010, 73(13-15): 2571-2579.
  • [20] Zhong F, Zhang J. Linear discriminant analysis based on L1-norm maximization. IEEE Transactions on Image Processing, 2013, 22(8): 3018-3027.
  • [21] Wang H, Tang Q, Zheng W. L1-norm-based common spatial patterns. IEEE Transactions on Biomedical Engineering, 2012, 59(3): 653-662.
  • [22] Chen X, Yang J, Jin Z. An improved linear discriminant analysis with L1-norm for robust feature extraction. The 22nd IEEE International Conference on Pattern Recognition, 2014: 1585-1590.
  • [23] Ye Q, Yang J, Liu F, et al. L1-norm distance linear discriminant analysis based on an effective iterative algorithm. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(1): 114-129.
  • [24] Liu Y, Gao Q, Miao S, et al. A non-greedy algorithm for L1-norm LDA. IEEE Transactions on Image Processing, 2017, 26(2): 684-695.
  • [25] Zheng W, Lin Z, Wang H. L1-norm kernel discriminant analysis via Bayes error bound optimization for robust feature extraction. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(4): 793-805.

  • [26] Ye Q L, Zhao C X, Zhang H F, et al. Recursive “concave-convex” Fisher linear discriminant with applications to face, handwritten digit and terrain recognition. Pattern Recognition, 2012, 45: 54-65.
  • [27] Li C N, Zheng Z R, Liu M Z, et al. Robust recursive absolute value inequalities discriminant analysis with sparseness. Neural Networks, 2017, 93: 205-218.
  • [28] Oh J H, Kwak N. Generalization of linear discriminant analysis using Lp-norm. Pattern Recognition Letters, 2013, 34(6): 679-685.
  • [29] An L L, Xing H J. Linear discriminant analysis based on Lp-norm maximization. The 2nd International Conference on Information Technology and Electronic Commerce (ICITEC), 2014: 88-92.
  • [30] Li C N, Shao Y H, Deng N Y. Robust L1-norm two-dimensional linear discriminant analysis. Neural Networks, 2015, 65: 92-104.
  • [31] Chen S B, Chen D R, Luo B. L1-norm based two-dimensional linear discriminant analysis (In Chinese). Journal of Electronics and Information Technology, 2015, 37(6): 1372-1377.
  • [32] Li M, Wang J, Wang Q, et al. Trace ratio 2DLDA with L1-norm optimization. Neurocomputing, 2017, 266(29): 216-225.
  • [33] Pekalska E, Duin R. The dissimilarity representation for pattern recognition: foundations and applications. World Scientific, 2006.
  • [34] Shao Y H, Zhang C H, Wang X B, et al. Improvements on twin support vector machines. IEEE Transactions on Neural Networks, 2011, 22(6): 962-968.
  • [35] Shao Y H, Deng N Y, Yang Z M. Least squares recursive projection twin support vector machine for classification. International Journal of Machine Learning and Cybernetics, 2016, 7(3): 411-426.
  • [36] Yu M, Shao L, Zhen X, et al. Local feature discriminant projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(9): 1908-1914.
  • [37] Samaria F S, Harter A C, Harter A. Parameterisation of a stochastic model for human face identification. Proceedings of the Second IEEE Workshop on Applications of Computer Vision, IEEE Xplore, 1994: 138-142.
  • [38] Martinez A M, Benavente R. The AR face database. CVC Technical Report #24, 1998.
  • [39] Phillips P J, Moon H, Rizvi S A, et al. The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(10): 1090-1104.
  • [40] Yang J, Zhang D, Frangi A F, et al. Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(1): 131-137.
  • [41] Li X, Pang Y, Yuan Y. L1-norm-based 2DPCA. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2010, 40(4): 1170-1175.