Modal Regression based Atomic Representation for Robust Face Recognition

11/05/2017 ∙ by Yulong Wang, et al. ∙ University of Macau

Representation based classification (RC) methods such as sparse RC (SRC) have shown great potential in face recognition in recent years. Most previous RC methods are based on the conventional regression models, such as lasso regression, ridge regression or group lasso regression. These regression models essentially impose a predefined assumption on the distribution of the noise variable in the query sample, such as the Gaussian or Laplacian distribution. However, the complicated noises in practice may violate the assumptions and impede the performance of these RC methods. In this paper, we propose a modal regression based atomic representation and classification (MRARC) framework to alleviate such limitation. Unlike previous RC methods, the MRARC framework does not require the noise variable to follow any specific predefined distributions. This gives rise to the capability of MRARC in handling various complex noises in reality. Using MRARC as a general platform, we also develop four novel RC methods for unimodal and multimodal face recognition, respectively. In addition, we devise a general optimization algorithm for the unified MRARC framework based on the alternating direction method of multipliers (ADMM) and half-quadratic theory. The experiments on real-world data validate the efficacy of MRARC for robust face recognition.



I Introduction

Representation-based classification (RC) methods have drawn intensive interest and shown great potential in face recognition in recent years [1, 2, 3, 4]. An appealing merit of RC methods is that they can exploit the subspace structure of data in each class for classification. Concretely, RC methods are based on the observation that many real-world data in a class often approximately lie in a low-dimensional subspace, such as face images of a subject under varying illumination [5] and hand-written digit images with distinct rotations and translations [6].

In the past decades, various RC methods have been proposed for face recognition (FR). Inspired by the success of lasso regression in compressed sensing, Wright et al. [1] first developed the sparse RC (SRC) method for face recognition. To improve the efficiency of SRC, Zhang et al. [3] put forward the collaborative RC (CRC) approach, which uses ridge regression to compute the representation vector. To exploit the block structure of the dictionary, Elhamifar and Vidal [7] proposed a block sparse RC (BSRC) method that utilizes group lasso regression for FR. Zhang et al. [8] proposed a nonlinear extension of SRC by incorporating the kernel trick. Shekhar et al. [9] developed a joint sparse RC (JSRC) method for multimodal face recognition that utilizes the correlation among distinct modalities. More recent advances on RC methods can be found in [10, 11, 12, 13, 14]. Previous RC methods were devised separately based on different motivations. In our previous work [13], we developed a unified framework termed atomic representation-based classification (ARC) and showed that many important RC methods can be reformulated as special cases of ARC.

Figure 1: An example showing the mode of the noise variable. (a) a noiseless facial image; (b) a facial image of the same subject with sunglasses; (c) the noise image, i.e., the difference between (b) and (a); (d) the distribution of the pixel values in (c) and its mode.

Despite the empirical success, most previous RC methods are based on conventional regression models, such as lasso regression [15], ridge regression [16] or group lasso regression [17]. These regression models in fact impose a predefined assumption on the distribution of the noise variable, such as a Gaussian or Laplacian distribution [10, 18, 19]. Such a limitation may impede their performance when the assumptions are violated in the presence of complicated noises in real-world face recognition.

In this paper, we propose to learn the representation vector based on modal regression and atomic norm regularization. Modal regression [20] aims to reveal the relationship between the input variable and the response variable by regressing towards the conditional mode function. For a continuous random variable Z with density p_Z, the mode is defined as the value at which the density attains its peak, i.e., mode(Z) = argmax_t p_Z(t). For a set of observations, the mode is the value that appears most frequently. Fig. 1 shows the mode of the noise variable in a facial image with sunglasses. Previous research [20, 21, 19] has shown that one of the most appealing merits of modal regression is its robustness to various complex noises, including heavy-tailed noises, impulsive noises and outliers. The novelties and contributions of this work are summarized as follows:

  • We develop a general unified framework termed modal regression based atomic representation and classification (MRARC) for robust face recognition and reconstruction. Unlike previous RC methods, MRARC does not require the noise variable to follow any specific predefined distribution, which gives it the ability to handle the various complicated noises encountered in reality.

  • Using MRARC as a general platform, we propose four novel modal regression based RC methods by specifying distinct atomic sets for unimodal and multimodal face recognition, respectively.

  • We devise a general optimization algorithm for MRARC based on the Alternating Direction Method of Multipliers (ADMM) [22] and half-quadratic theory [23]. Thus, the algorithm can be applied to each method in MRARC.

The remainder of the paper is arranged as below. Section II briefly reviews previous related works. In Section III, we describe the proposed method. Section IV presents the experiments on real-world databases. Finally, Section V concludes.

II Related Work

This section briefly introduces some representative RC methods. Consider a classification problem with K classes. Let D_i ∈ R^{d×n_i} be the matrix of labeled training samples from the i-th class, for i = 1, …, K. Define D = [D_1, …, D_K] ∈ R^{d×n}, where n = n_1 + ⋯ + n_K. Table I summarizes the key notations used in this paper. Given the training data matrix D, the goal is to correctly determine the label of any new test sample y ∈ R^d.

1) SRC (Sparse Representation based Classification) [1]: The SRC method seeks the sparsest solution to the linear system of equations y = Dx for classification. To this end, it first computes the representation vector by solving the minimization problem

min_x ||x||_1  s.t.  y = Dx, (1)

where the l1 norm is defined as ||x||_1 = Σ_i |x_i|. To deal with noise, SRC solves the following minimization problem, also known as lasso regression [15]:

min_x ||y − Dx||_2^2 + λ||x||_1, (2)

where λ is a positive regularization parameter.

2) CRC (Collaborative Representation based Classification) [3]: To improve the efficiency of SRC, the CRC method computes the representation vector by solving the l2 norm based ridge regression problem

min_x ||y − Dx||_2^2 + λ||x||_2^2. (3)

The problem (3) has a closed-form solution, which can be explicitly expressed as x̂ = (DᵀD + λI)⁻¹Dᵀy, where I denotes the identity matrix.
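The ridge closed form above is easy to verify numerically. Below is a minimal NumPy sketch (the function name and toy dimensions are our own, not from the paper); the returned vector must satisfy the normal equations (DᵀD + λI)x = Dᵀy.

```python
import numpy as np

def crc_representation(D, y, lam):
    """Closed-form CRC/ridge representation: x = (D^T D + lam*I)^{-1} D^T y.
    D: (d, n) dictionary of training samples; y: (d,) test sample."""
    n = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)

# Toy usage: the solution must satisfy the normal equations.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 8))
y = rng.standard_normal(20)
x = crc_representation(D, y, lam=0.1)
assert np.allclose((D.T @ D + 0.1 * np.eye(8)) @ x, D.T @ y)
```

Unlike the iterative solvers needed for the l1-regularized problem (2), this single linear solve is what makes CRC attractive when efficiency matters.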

Notation  Description
K         number of classes
d         feature dimension
n         number of all training samples
D         matrix of all training samples
y         a new test sample
Table I: Key notations used in this paper.

3) BSRC (Block Sparse Representation based Classification) [7]: This method considers the block structure of training data, assuming that the training samples in each class form a few blocks. Let {G_1, …, G_B} be a partition of {1, …, n}, i.e., ∪_i G_i = {1, …, n} and G_i ∩ G_j = ∅ for i ≠ j. BSRC computes the representation vector based on group lasso regression

min_x ||y − Dx||_2^2 + λ Σ_{i=1}^B ||x_{G_i}||_2, (4)

where x_{G_i} denotes the subvector of x with entries indexed by G_i.

After the representation vector x̂ is obtained, RC methods compute the class-specific residuals for each class

r_i(y) = ||y − Dδ_i(x̂)||_2,  i = 1, …, K,

where δ_i(x̂) is the vector that keeps only the entries of x̂ associated with the i-th class and sets all others to zero [1]. Finally, the test sample is assigned to the class yielding the minimal residual.

In our previous work [13], we have proposed a general unified framework called atomic representation based classification (ARC). Most RC methods can be reformulated as special cases of ARC by specifying the atomic set. To review ARC, we first introduce the definition of the atomic norm.

Definition 1 ([24]). The atomic norm of x with respect to an atomic set A is defined by

||x||_A = inf{t > 0 : x ∈ t · conv(A)},

where conv(A) denotes the convex hull of the set A.

Two typical examples of the atomic norm are the l1 norm and the nuclear norm [24]. The former induces sparsity for vectors while the latter induces low-rankness for matrices. The atomic representation (AR) model is given by

min_x ||x||_A  s.t.  y = Dx. (5)

For noisy data, we consider the regularized AR model

min_x ||y − Dx||_2^2 + λ||x||_A. (6)

Then we compute the class-specific residual for each class and assign y to the class with the minimal residual.

Figure 2: As a general framework, ARC includes many RC methods as special cases.

Most previous RC methods belong to ARC as special cases with a specific atomic set A, as shown in Fig. 2. For example, if we define A = {±e_i}_{i=1}^n, where e_i is the unit vector whose only nonzero entry is a 1 in the i-th coordinate, we have ||x||_A = ||x||_1 and ARC reduces to SRC [1]. Similarly, BSRC [7] also belongs to ARC. Concretely, let {G_1, …, G_B} be a partition of {1, …, n} as mentioned before, and define the block atomic set A_B = {a : supp(a) ⊆ G_i for some i, ||a||_2 = 1}. It can be proved that ||x||_{A_B} = Σ_i ||x_{G_i}||_2, so ARC reduces to BSRC by setting A = A_B. It can also be shown that CRC belongs to ARC by using the atomic set A = {a : ||a||_2 = 1}, whose atomic norm is the l2 norm. Another example of ARC is the low-rank RC (LRRC) [25] when multiple test samples are considered simultaneously. Given test samples y_1, …, y_m, we arrange them as columns of a matrix Y. The LRRC model looks for the representation matrix with the lowest rank by

min_Z ||Y − DZ||_F^2 + λ||Z||_*, (7)

where ||Z||_* denotes the nuclear norm of Z, i.e., the sum of the singular values of Z. If we define the atomic set A = {uvᵀ : ||u||_2 = ||v||_2 = 1}, we have ||Z||_A = ||Z||_* and ARC reduces to LRRC.
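As a quick numerical sanity check on the two classic atomic norms (a small sketch of ours, not code from the paper): the l1 norm is just the sum of absolute entries, and the nuclear norm equals the sum of singular values.

```python
import numpy as np

# l1 norm: atomic set is the signed unit coordinate vectors {±e_i}.
x = np.array([0.5, -2.0, 0.0, 1.5])
l1 = np.abs(x).sum()
assert np.isclose(l1, 4.0)

# Nuclear norm: atomic set is the rank-one matrices u v^T with unit-norm u, v;
# for any matrix it equals the sum of singular values.
Z = np.array([[3.0, 0.0],
              [0.0, -1.0]])
nuclear = np.linalg.svd(Z, compute_uv=False).sum()
assert np.isclose(nuclear, 4.0)  # singular values are 3 and 1
```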

III Proposed Method

In this section, we describe the proposed modal regression based atomic representation and classification (MRARC) framework for robust face recognition and reconstruction.

III-A Modal Regression

We first introduce some basic facts about modal regression [20, 21, 19]. Denote by X and Y the input and response random variables, respectively. Consider the observation model

Y = f*(X) + ε, (8)

where f* denotes the unknown target function and ε represents the noise term. The goal of the regression problem is to approximate the unknown target function f*. Modal regression aims to recover the target function by regressing towards the following modal regression function [20].

Definition 2. The modal regression function is defined as

f_M(x) = argmax_t p_{Y|X}(t | x),

where p_{Y|X}(· | x) denotes the conditional density of Y conditioned on X = x.

If we assume the mode of the conditional distribution of the noise at any x to be zero, i.e.,

argmax_t p_{ε|X}(t | x) = 0,

then f_M(x) = f*(x) holds according to Eq. (8). Here p_{ε|X}(· | x) denotes the conditional density of the noise ε conditioned on X = x. Thus, under the zero-mode noise assumption, we have f_M = f*, and the target is converted to estimating the modal regression function f_M. To this end, we introduce the modal regression risk [19] as below.

Definition 3. For a measurable function f, its modal regression risk is defined as

R(f) = ∫_X p_{Y|X}(f(x) | x) dρ_X(x),

where ρ_X denotes the marginal distribution of X.

It can be proved that f_M is the maximizer of the risk [19], i.e., f_M = argmax_{f ∈ F} R(f), where F denotes the set of all measurable functions on X. For any measurable function f, denote by ε_f = Y − f(X) the error random variable; according to Eq. (8), ε_f = f*(X) − f(X) + ε. The density of ε_f at zero can then be formulated as

p_{ε_f}(0) = ∫_X p_{Y|X}(f(x) | x) dρ_X(x) = R(f).

Thus, the modal regression problem is converted to maximizing the value of the density of the error variable at zero. In reality, we often have only a finite sample and the density is unknown. For this reason, the Parzen window method [26] is utilized to estimate p_{ε_f}(0) as below:

p̂_{ε_f}(0) = (1/n) Σ_{i=1}^n K_σ(y_i − f(x_i)), (9)

where K_σ(t) = (1/σ) K(t/σ) for σ > 0 and K denotes a general kernel function satisfying K(t) ≥ 0 and ∫ K(t) dt = 1. Some common kernel functions include the Gaussian kernel K(t) = (1/√(2π)) exp(−t²/2) and the Epanechnikov kernel K(t) = (3/4)(1 − t²) 1_{|t|≤1}, where 1_{|t|≤1} = 1 if |t| ≤ 1 and 0 otherwise. Then we have the estimator of the modal regression function as follows:

f̂ = argmax_f (1/n) Σ_{i=1}^n K_σ(y_i − f(x_i)). (10)

Under the zero-mode noise assumption, f̂ serves as an estimator of f_M = f*.

Based on the analysis above, we can see that modal regression does not require the noise to follow any specific preset distribution, such as the Gaussian distribution required by some conventional regression models. This makes modal regression attractive for handling various complex noises in practice [19].
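The robustness claim is easy to illustrate in one dimension. The sketch below (our own toy example, not from the paper) estimates the mode of a contaminated noise sample with a Parzen-window density as in Eq. (9): the mode estimate stays near zero while the sample mean is dragged toward the outliers.

```python
import numpy as np

def kde_mode(samples, sigma, grid=None):
    """Estimate the mode by maximizing a Parzen-window (Gaussian-kernel)
    density estimate over a grid of candidate values."""
    if grid is None:
        grid = np.linspace(samples.min(), samples.max(), 2001)
    # unnormalized kernel density at each grid point
    dens = np.exp(-0.5 * ((grid[:, None] - samples[None, :]) / sigma) ** 2).sum(axis=1)
    return grid[np.argmax(dens)]

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.1, size=450)       # zero-mode noise
outliers = rng.uniform(5.0, 10.0, size=50)   # gross corruption (e.g., occluded pixels)
noise = np.concatenate([clean, outliers])

mode_est = kde_mode(noise, sigma=0.2)
assert abs(mode_est) < 0.2     # mode stays near the true value 0
assert noise.mean() > 0.5      # mean is pulled toward the outliers
```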

III-B Modal Regression based Atomic Representation (MRAR)

Define the error vector e = (e_1, …, e_n)ᵀ, where e_i = y_i − f(x_i). The problem (10) can be rewritten in the equivalent form

min_f L_σ(e), (11)

where L_σ denotes the modal regression based loss function (MRLF)

L_σ(e) = −(1/n) Σ_{i=1}^n K_σ(e_i). (12)

For simplicity, consider the linear function f(w) = wᵀx, where x is the unknown coefficient vector. Then e = y − Dx, where y = (y_1, …, y_d)ᵀ and the rows of D collect the inputs. Incorporating the MRLF into the regularized AR model (6), we have the following modal regression based atomic representation (MRAR) model

min_x L_σ(y − Dx) + λ||x||_A. (13)

The problem above is difficult to tackle due to the combination of the nonlinearity of the MRLF and the abstract atomic norm regularization. In addition, most previous optimization techniques were originally proposed for RC methods with special atomic norms, such as sparse representation, and are difficult to apply to the general MRAR framework. In this paper, we devise an effective optimization algorithm to implement the general MRAR model based on ADMM [22] and the half-quadratic (HQ) theory [23]. We first introduce an auxiliary vector e and reformulate the objective function in Eq. (13) as

min_{x, e} L_σ(e) + λ||x||_A  s.t.  e = y − Dx. (14)

The augmented Lagrangian function of Eq. (14) is

L(x, e, z) = L_σ(e) + λ||x||_A + (ρ/2)||y − Dx − e + z/ρ||_2^2 + C, (15)

where C is independent of x and e. Here z is the Lagrangian multiplier and ρ > 0 denotes a penalty parameter. Given the initialization of x, e and z, we alternately update each variable while fixing the others in each iteration.

In the first step, we update x while fixing e and z by

x^{k+1} = argmin_x λ||x||_A + (ρ/2)||y − Dx − e^k + z^k/ρ||_2^2. (16)

 

Algorithm 1 Implementation of MRAR (13)

 

Input: D, y, λ, σ and ρ.

Initialization: x⁰ = 0, e⁰ = 0, z⁰ = 0, k = 0, maximum iteration number k_max, tolerance τ > 0.

while not converged and k < k_max do

  • Update x^{k+1} by the proximity operator as in Eq. (17).

  • Update e^{k+1} by Eq. (18).

  • Update the Lagrange multiplier vector by z^{k+1} = z^k + ρ(y − Dx^{k+1} − e^{k+1}).

  • Check the convergence condition ||y − Dx^{k+1} − e^{k+1}||_2 ≤ τ; set k ← k + 1.

end while

Output: x̂ = x^k.

  The optimal solution of Eq. (16) can be written as

x^{k+1} = prox_{(λ/ρ)||·||_A}(v^k), (17)

where prox_{μ||·||_A}(v) = argmin_u (1/2)||u − v||_2^2 + μ||u||_A denotes the proximity operator with respect to ||·||_A [24], and v^k is an appropriate vector formed from the current iterates (e.g., v^k = y − e^k + z^k/ρ). Here we introduce the proximity operators for some common atomic sets:

  • For the l1 atomic set: prox_{μ||·||_1}(v) = sign(v) ⊙ max(|v| − μ, 0) (entrywise soft thresholding);

  • For the block atomic set A_B: (prox_{μ||·||_{A_B}}(v))_{G_i} = max(1 − μ/||v_{G_i}||_2, 0) · v_{G_i}, i = 1, …, B.

Here sign(v) denotes the vector of the signs of the entries of v and ⊙ represents the Hadamard product. max(t, 0) = t if t ≥ 0 and 0 otherwise. {G_i} denotes the index sets in BSRC as aforementioned. For a vector v, v_{G_i} denotes the subvector of v containing the entries indexed by the set G_i.
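The two proximity operators above can be written in a few lines of NumPy. This is a small sketch of ours (function names are hypothetical); soft thresholding handles the l1 atomic set and blockwise shrinkage handles the block atomic set.

```python
import numpy as np

def prox_l1(v, mu):
    """Entrywise soft thresholding: prox of mu * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

def prox_block(v, mu, groups):
    """Blockwise shrinkage: prox of mu * sum_i ||v_{G_i}||_2."""
    out = np.zeros_like(v)
    for g in groups:
        norm_g = np.linalg.norm(v[g])
        if norm_g > mu:
            out[g] = (1.0 - mu / norm_g) * v[g]  # shrink the block radially
    return out

v = np.array([3.0, -0.5, 1.5, -2.0])
assert np.allclose(prox_l1(v, 1.0), [2.0, 0.0, 0.5, -1.0])

groups = [np.array([0, 1]), np.array([2, 3])]
shrunk = prox_block(v, 10.0, groups)
assert np.allclose(shrunk, 0.0)  # both block norms are below mu, so all zeroed
```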

In the second step, we update the auxiliary variable e while fixing x and z by

e^{k+1} = argmin_e L_σ(e) + (ρ/2)||y − Dx^{k+1} − e + z^k/ρ||_2^2. (18)

We optimize the problem in (18) based on the half-quadratic (HQ) theory [23]. For the function −K_σ, there exists a dual convex function ψ [23] such that

−K_σ(t) = inf_{w>0} (w t² + ψ(w)),

where the infimum is reached at a kernel-dependent weight w*(t). For the Gaussian kernel K_σ(t) = exp(−t²/(2σ²)), w*(t) = K_σ(t)/(2σ²). Then the MRLF in Eq. (12) is rewritten as

L_σ(e) = min_w (1/d) Σ_{i=1}^d (w_i e_i² + ψ(w_i)),

where w = (w_1, …, w_d)ᵀ with w_i > 0.

 

Algorithm 2 MRAR based Classification

 

Input: An atomic set A, training samples D, a test sample y, and the parameter λ.

Output: identity(y).

  • Normalize the columns of D to have unit Euclidean norm.

  • Learn the representation vector x̂ via MRAR (13).

  • Calculate the residuals r_i(y) = L_σ(y − Dδ_i(x̂)) for i = 1, …, K.

  • Predict identity(y) = argmin_i r_i(y).

  Thus, the problem in (18) can be reformulated as

min_{e, w} (1/d) Σ_{i=1}^d (w_i e_i² + ψ(w_i)) + (ρ/2)||e − u^k||_2^2, (19)

where u^k = y − Dx^{k+1} + z^k/ρ. In light of the HQ theory [23], the problem (19) can be tackled by the following alternating procedure:

w_i^{t+1} = K_σ(e_i^t)/(2σ²), i = 1, …, d, (20)

e^{t+1} = argmin_e (1/d) Σ_{i=1}^d w_i^{t+1} e_i² + (ρ/2)||e − u^k||_2^2. (21)

If the Gaussian kernel is used, the scale parameter σ is often determined empirically [27]. The problem (21) has a closed-form solution

e^{t+1} = ρ ((2/d) diag(w^{t+1}) + ρ I)⁻¹ u^k,

where diag(w) denotes the square diagonal matrix with the elements of w on the main diagonal. The iterations above are guaranteed to converge according to the HQ theory [23]. Finally, the Lagrange multiplier vector is updated by z^{k+1} = z^k + ρ(y − Dx^{k+1} − e^{k+1}). Algorithm 1 summarizes the MRAR algorithm.
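The essence of the HQ alternation is iteratively reweighted least squares with Gaussian-kernel weights: entries with large residuals (outliers) receive weights near zero. The sketch below is a simplified stand-in of ours (no ADMM splitting, a plain ridge term instead of the atomic norm, and our own parameter choices), intended only to show why the reweighting confers robustness.

```python
import numpy as np

def hq_robust_ridge(D, y, lam=0.1, sigma=2.0, n_iter=30):
    """Half-quadratic sketch: alternate Gaussian-kernel weights
    w_i = exp(-e_i^2 / (2 sigma^2)) with a weighted ridge solve."""
    n = D.shape[1]
    x = np.zeros(n)
    for _ in range(n_iter):
        e = y - D @ x
        w = np.exp(-e**2 / (2 * sigma**2))   # outlier entries get weight ~ 0
        DW = D.T * w                          # D^T W without forming diag(w)
        x = np.linalg.solve(DW @ D + lam * np.eye(n), DW @ y)
    return x

rng = np.random.default_rng(1)
x_true = np.array([1.0, -2.0, 0.5])
D = rng.standard_normal((100, 3))
y = D @ x_true + 0.01 * rng.standard_normal(100)
y[:10] += 20.0                                # gross corruption on 10 entries

x_robust = hq_robust_ridge(D, y)
x_ridge = np.linalg.solve(D.T @ D + 0.1 * np.eye(3), D.T @ y)
# The reweighted solution suppresses the corrupted entries; plain ridge does not.
assert np.linalg.norm(x_robust - x_true) < np.linalg.norm(x_ridge - x_true)
assert np.linalg.norm(x_robust - x_true) < 0.2
```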

III-C MRAR based Classification

In this section, we develop the general MRAR based classification (MRARC) framework and some novel methods as special cases of MRARC.

Given the training samples D and a new test sample y, the first step is to compute the optimal coefficient vector x̂ using the MRAR model

min_x L_σ(y − Dx) + λ||x||_A. (22)

Secondly, we calculate the class-specific residuals for each class. Unlike most previous methods, which use the l2 norm, we utilize the MRLF to calculate the residuals

r_i(y) = L_σ(y − Dδ_i(x̂)), i = 1, …, K,

where δ_i(x̂) denotes the vector keeping only the entries of x̂ with respect to the i-th class. Finally, the test sample is assigned to the class yielding the minimal residual. Algorithm 2 summarizes the classification procedure of MRARC.

It is worth pointing out that MRARC is a general framework for pattern classification. We can use it to devise new classification methods by specifying the atomic set A. Concretely, we refer to MRARC with the l1, block sparse and l2 atomic sets as MRSRC, MRBSRC and MRCRC, respectively.

III-D MRARC for Multimodal Data

Assume that we have M modalities with corresponding dimensions d_1, …, d_M. Let y^{(m)} ∈ R^{d_m} and D^{(m)} be the test sample and dictionary in the m-th modality, where m = 1, …, M. For multimodal data, the MRAR model can be formulated as

min_X Σ_{m=1}^M L_σ(y^{(m)} − D^{(m)} x_m) + λ||X||_A, (23)

where x_m denotes the m-th column of the matrix X and A is an atomic set of matrices. To take advantage of the correlation among multiple modalities, we can use the joint sparsity inducing atomic set, whose atomic norm is the sum of the l2 norms of the rows, ||X||_A = Σ_{j=1}^n ||X^j||_2, where X^j denotes the j-th row of the matrix X. We refer to the MRARC method using this atomic set in Eq. (23) as modal regression based joint sparse representation classification (MRJSRC). The MRJSRC method encourages the representation vectors of a test datum (i.e., the columns of X) in distinct modalities to share the same sparsity pattern, i.e., the same locations of nonzero entries. The optimization problem in Eq. (23) can be tackled in a similar way to Algorithm 1. The main difference is that the proximity operator for this atomic set is formulated row-wise as

(prox_μ(V))^j = max(1 − μ/||V^j||_2, 0) · V^j, for j = 1, …, n. Once the solution X̂ of Eq. (23) is obtained, the class-dependent residuals are computed by

r_i = Σ_{m=1}^M L_σ(y^{(m)} − D^{(m)} δ_i(x̂_m)), i = 1, …, K.

Finally, the multimodal test sample is assigned to the class yielding the minimal residual.
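The row-wise shrinkage operator above is what couples the modalities: each row of the coefficient matrix (one training sample across all modalities) is kept or zeroed as a whole. A minimal sketch of ours:

```python
import numpy as np

def prox_row_l2(V, mu):
    """Row-wise shrinkage: prox of mu * sum_j ||V^j||_2 (joint sparsity).
    A row shared across modalities is shrunk or zeroed together."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(1.0 - mu / np.maximum(norms, 1e-12), 0.0)
    return scale * V

V = np.array([[3.0, 4.0],    # row norm 5   -> shrunk but kept in both columns
              [0.3, 0.4]])   # row norm 0.5 -> zeroed in both columns
out = prox_row_l2(V, 1.0)
assert np.allclose(out[1], 0.0)
assert np.allclose(out[0], [3.0 * 0.8, 4.0 * 0.8])  # scale factor 1 - 1/5
```

Note how the zeroed row disappears from every modality simultaneously, which is exactly the shared sparsity pattern MRJSRC encourages.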

IV Experiments

In this section, we evaluate the efficacy of the proposed MRARC framework for unimodal and multimodal face recognition against various noises.

Experiments are conducted on four publicly available databases: the Extended Yale B database (EYaleB) [28], the AR database [29], the CMU MoBo database [30] and the CMU PIE database [31]. Fig. 3 shows some sample images from these databases and Table II summarizes their details, including the number of classes and the number of instances. For unimodal face recognition, we compare MRSRC, MRBSRC and MRCRC in the MRARC framework with the three typical methods SRC, BSRC and CRC in the ARC framework. The linear regression-based classification (LRC) [32] approach is used as the baseline. For multimodal face recognition, we compare the proposed MRJSRC in MRARC with the JSRC [9] method. For ARC and MRARC methods, the regularization parameter λ is tuned over a discrete set to achieve the best possible performance. For MRARC methods, the penalty parameter ρ is fixed in all experiments.

Figure 3: Sample images from different databases. (a) the Extended Yale B database; (b) the AR database; (c) the CMU PIE database.
Database         #Class  #Instance
Extended Yale B  38      2,432 (images)
AR               100     2,600 (images)
CMU MoBo         24      96 (videos)
CMU PIE          68      41,368 (images)
Table II: Description of the public databases used.

IV-A Face Recognition With Occlusion

In this subsection, we conduct five different experiments to analyze the performance of proposed methods for face recognition and reconstruction against occlusions.

Experiment 1–Effect of percent of occlusion: In the first experiment, we evaluate the performance of the competing methods against different levels of random occlusions. For each test image, a random square region is occluded by a baboon image, as shown in Fig. 4. The Extended Yale B database is used for the experiment and the images are resized to for efficiency. In the literature of RC methods [1], many researchers manually chose a subset of images in the database with normal or moderate light conditions for training and only test images have extreme light conditions. However, in real-world scenarios both training and testing images may have different light conditions, including moderate and extreme light conditions. For this reason, we randomly select half of images (32 images) per subject for training and the rest for testing. Fig. 4 shows the recognition rates of various methods as a function of the percent of occlusion. From Fig. 4, it can be seen that most competing methods achieve high recognition rates when the occlusion level is low. However, as the occlusion level increases, the recognition rates of LRC, and the three methods (SRC, BSRC and CRC) in ARC drop rapidly. In contrast, the three methods (MRSRC, MRBSRC and MRCRC) in MRARC outperform other methods in different occlusion levels.

Figure 4: Recognition rates versus percent of occluded pixels. Recognition rates as a function of the percent of occluded pixels using the Extended Yale B Database.
Figure 5: Recognition rates versus feature dimension with random occlusion. Recognition rates as a function of the feature dimension on the Extended Yale B Database against 20 percent occlusion with images of the first subjects. (a) ; (b) ; (c) .
Methods  LRC    SRC    BSRC   CRC    MRSRC  MRBSRC  MRCRC
16 Mean  69.30  75.42  70.76  77.29  81.73  76.18   81.17
   Min   67.52  74.26  69.57  75.58  80.26  74.26   79.69
   Max   70.72  76.32  72.12  79.44  82.98  77.63   82.89
20 Mean  70.02  77.60  72.25  79.26  85.36  79.38   82.55
   Min   68.26  76.81  71.13  78.13  83.88  77.14   81.50
   Max   71.46  78.62  73.19  80.43  86.10  80.43   83.47
24 Mean  70.79  78.93  73.32  80.22  87.72  82.12   83.92
   Min   69.82  77.80  72.29  79.36  86.43  81.33   82.65
   Max   72.20  79.44  73.93  80.92  88.98  83.47   84.70
28 Mean  71.04  79.95  74.19  81.04  89.40  83.91   84.36
   Min   70.31  78.70  73.03  80.43  88.65  83.14   83.39
   Max   71.96  80.59  74.92  81.50  89.97  85.20   85.03
Table III: Recognition rates (%) using a varying number of training samples per class on the Extended Yale B database with 20 percent occlusion in each test image over ten runs. Best results are marked bold.
Methods     LRC   SRC   BSRC  CRC   MRSRC  MRBSRC  MRCRC
Sunglasses  96    87    80.5  68.5  98     88.5    94
Scarves     95.5  93.5  96    95    97.5   95.5    98.5
Table IV: Recognition rates (%) with real-world occlusions on the AR database. Some results of LRC, SRC, BSRC and CRC are copied from the corresponding papers. Best results are marked bold.

Experiment 2–Effect of feature dimension: In the second experiment, we study the effect of the feature dimension (or image size) on the recognition performance using the Extended Yale B database. To this end, we resize the images to several resolutions with corresponding downsampling ratios. Fig. 5 shows the recognition rates versus feature dimension with 20 percent occlusion. The three methods in MRARC improve over the corresponding ones in ARC across varying feature dimensions and numbers of classes. In particular, the MRSRC method significantly outperforms the other competing methods, especially when the feature dimension is low.

Experiment 3–Effect of training set size: In the third experiment, we analyze the impact of the training set size on face recognition performance with random occlusion. For each subject, we randomly select a number of images for training and perform recognition on the test images with 20% occlusion. We repeat the experiment ten times and compute the mean, minimum and maximum recognition rates of each algorithm. The results are reported in Table III, which shows that even with a small training set, the MRARC methods improve on the ARC ones by a large margin.

Experiment 4–Recognition with real-world occlusions: In the fourth experiment, we evaluate the performance of the proposed methods against real-world occlusions such as sunglasses and scarves. Fig. 3(b) shows facial images occluded by sunglasses or a scarf in the AR database, which is used for this experiment. For each subject, the eight images with varying expressions are utilized for training, and the four images with sunglasses or scarves are used for testing. Table IV reports the recognition results; some results of LRC, SRC, BSRC and CRC are copied from the corresponding papers. The results suggest that the proposed MRARC methods handle facial images with real-world occlusions well, achieving high recognition accuracy.

Figure 6: Recognition rates versus percent of corrupted pixels. Recognition rates as a function of the percent of corrupted pixels in each test image. (a) the AR database; (b) the Extended Yale B Database.

IV-B Face Recognition With Corruption

In this part, we evaluate the performance of the proposed methods for face recognition and reconstruction against random corruption.

Experiment 1–Effect of percent of corruption: In the first experiment, we study the performance of the proposed methods against different levels of random pixel corruption. To this end, a fraction of the pixels of each test image are randomly chosen and their values are replaced by random values drawn from a uniform distribution. Following the settings in Section IV-A, we randomly select half of the images per subject in the Extended Yale B database for training and use the rest for testing. For the AR database, we use the seven images with expression and illumination variations per subject in the first session for training and the seven images in the second session for testing. Fig. 6 shows the recognition rates of the competing algorithms as the percent of corruption varies from 10 to 60. The results demonstrate the superiority of the methods in MRARC over those in ARC for recognition with random corruption.


Methods  LRC    SRC    BSRC   CRC    MRSRC  MRBSRC  MRCRC
16 Mean  71.67  65.94  58.50  59.96  81.88  76.77   81.95
   Min   69.90  64.80  56.50  58.63  80.35  75.08   81.25
   Max   73.03  66.94  59.62  61.43  82.98  77.96   82.89
20 Mean  73.66  67.17  56.55  62.55  85.01  79.67   82.53
   Min   72.78  65.87  54.77  60.36  83.72  77.88   81.17
   Max   74.59  67.93  58.39  63.98  85.94  81.25   83.47
24 Mean  74.38  67.60  55.10  63.03  87.68  82.84   82.86
   Min   73.68  66.28  53.54  62.17  86.60  82.07   82.07
   Max   75.08  68.59  56.66  64.47  88.73  85.03   83.80
28 Mean  74.88  68.22  53.17  63.62  89.35  85.21   83.08
   Min   73.77  67.68  52.47  62.99  88.73  84.05   82.24
   Max   75.66  69.16  53.95  64.80  90.05  85.94   84.05
Table V: Recognition rates (%) using a varying number of training samples per class on the Extended Yale B database with 30 percent corruption in each test image over ten runs. Best results are marked bold.

Experiment 2–Effect of training set size: In the second experiment, we analyze the impact of the training set size on recognition performance with random pixel corruption. Following the settings in Section IV-A, we vary the number of training samples per class from 16 to 28. Table V reports the recognition results using test images with 30% random corruption over ten runs. The results further confirm that the MRARC methods improve on the recognition performance of the ARC methods under random corruption in most cases.


Methods LRC SRC BSRC CRC MRSRC MRBSRC MRCRC
20 Mean 56.76 67.30 67.57 69.05 87.16 87.84 87.57
Min 52.70 60.81 60.81 64.86 82.43 81.08 83.78
Max 62.16 72.97 78.38 78.38 90.54 90.54 90.54
40 Mean 61.76 76.76 72.30 70.41 90.14 90.27 91.22
Min 52.70 70.27 64.86 59.46 86.49 86.49 89.19
Max 75.68 83.78 85.14 81.08 94.59 93.24 94.59
80 Mean 65.81 77.30 73.11 73.65 92.16 91.49 92.03
Min 55.41 70.27 63.51 68.92 87.84 86.49 87.84
Max 77.03 82.43 81.08 78.38 95.95 94.59 94.59
Table VI: Recognition rates (%) of various methods on the CMU MoBo database with distinct number of frames per subject over 10 random runs. Best results are marked bold.

IV-C Results on the CMU MoBo Database

In this part, we evaluate the performance of MRARC for image set based face recognition (ISFR) using the CMU MoBo database [30]. The database consists of 96 video sequences of 24 subjects walking on a treadmill; for each subject, 4 video sequences are captured with four distinct walking patterns. The facial images detected with the Viola-Jones face detector [33] are resized, and for each facial image we extract Local Binary Pattern (LBP) features [34] and normalize each feature vector to unit Euclidean norm. To evaluate the robustness of the proposed methods against corruption, we randomly choose 10 percent of the entries of each feature vector and replace them with random values following a uniform distribution. In the experiment, we randomly select one video sequence per subject for training and use the rest for testing. To extend the ARC and MRARC methods to ISFR, we use the average class-dependent reconstruction residual of the images in the query set for recognition [35]; the query set is assigned to the class minimizing the average residual. Instead of using all frames of each video, we randomly choose a number of frames from each video to construct the training and query (testing) sets. To obtain reliable results, we repeat each test ten times and compute the mean, minimum and maximum recognition rates over the ten runs. Table VI reports the results, which show that the three MRARC methods perform similarly and outperform the other competing RC methods.

IV-D Multimodal Face Recognition

In this section, we compare the MRJSRC method in the MRARC framework with JSRC [9] for multimodal face recognition. For the comparison between JSRC and other multimodal recognition methods, see the reference [9].

In the first experiment, we extract four weak modalities from each facial image for evaluation [9]: the left and right periocular, nose, and mouth regions, as shown in Fig. 7. For the AR database, following the settings of Experiment 1 in Section IV-B, we use the seven images with expression and illumination variations per subject in the first session for training and the seven images in the second session for testing. For the EYaleB database, we randomly select half of all images per subject for training and the rest for testing. In both cases, we use the original intensity values of the images.

Figure 7: Extracted four modalities from an image. (a) AR database and (b) Extended Yale B (EYaleB) database.
Database         Algorithm  {1}    {1,2}  {1,2,3}  {1,2,3,4}
AR               JSRC       73.14  82.14  86.43    89.71
                 MRJSRC     78.71  84.71  89.43    91.14
EYaleB           JSRC       83.06  95.89  96.88    97.86
                 MRJSRC     85.36  97.70  97.78    98.68
EYaleB+10%Occlu  JSRC       69.98  88.90  90.38    95.23
                 MRJSRC     71.13  89.56  92.35    95.89
EYaleB+20%Occlu  JSRC       63.08  82.24  83.55    91.37
                 MRJSRC     64.80  84.62  85.86    92.43
Table VII: Recognition rates (%) using a distinct number of modalities. Modalities: 1. left periocular, 2. right periocular, 3. nose, 4. mouth. EYaleB+10%Occlu means that each test image in the EYaleB database has 10% of its region occluded by a baboon image as in Fig. 4. Best results are marked bold.

Table VII reports the recognition results of MRJSRC and JSRC using a varying number of modalities. Based on the results, we can draw the following conclusions.

  • First, the recognition rate of each competing method grows rapidly as the number of modalities increases from 1 to 4. This suggests that both JSRC and MRJSRC can exploit the complementarity among distinct modalities to improve recognition performance. Moreover, the proposed MRJSRC method consistently improves upon JSRC across varying numbers of modalities.

  • Second, the competing methods using more modalities are more robust against random block occlusion on the EYaleB database. For example, the recognition rate of MRJSRC using the single modality 1 (i.e., the left periocular modality) drops from 85.36% without occlusion to 64.80% with 20% occlusion, a decline of over 20%. In contrast, the recognition rate of MRJSRC using all four modalities drops from 98.68% to 92.43%, a decline of less than 7%.

(a) View 1
(b) View 2
Figure 8: Facial images in two views of a subject in the CMU PIE database. (a) View 1; (b) View 2. In each subfigure, from left to right: the original image and the noisy image with 20% random corruption.
Figure 9: Recognition rates as a function of the percentage of corrupted pixels using two views in the CMU PIE database.

In the second experiment, we evaluate the performance of the proposed method for multiview face recognition on the CMU PIE database [31]. This database consists of 41,368 images of 68 subjects, captured under 13 different poses, 43 different illumination conditions, and with 4 different expressions. Two near-frontal poses (i.e., C09 and C29) are selected to construct the multiview setting, as shown in Fig. 8. A pair of images of one subject under these two poses is thus regarded as a two-view, or two-modality, sample. Each image is resized to . In the experiment, we randomly select half of the images per subject for training and use the rest for testing. As shown in Fig. 8, a fraction of the pixels of each test image are randomly selected and replaced by random values drawn from a uniform distribution over the interval .
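The random pixel corruption applied to the test images can be sketched as follows (a hypothetical re-implementation; the function name, seed, and value range are assumptions, not the authors' code):

```python
import numpy as np

def corrupt(img, frac, low=0.0, high=255.0, seed=0):
    """Replace a random fraction of pixels with values drawn
    uniformly from [low, high)."""
    rng = np.random.default_rng(seed)
    out = img.astype(float).copy()
    k = int(round(frac * out.size))               # number of pixels to corrupt
    idx = rng.choice(out.size, size=k, replace=False)
    out.flat[idx] = rng.uniform(low, high, size=k)
    return out

img = np.zeros((32, 32))
noisy = corrupt(img, 0.2)
print(int((noisy != img).sum()))  # 205, i.e., round(0.2 * 1024) pixels replaced
```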

Fig. 9 shows the recognition rates of the competing methods as a function of the percentage of corrupted pixels in each test image. We also compare JSRC and MRJSRC using single-view and two-view samples under varying percentages of random corruption; the results are reported in Table VIII. Here, View 1 and View 2 correspond to Pose C09 and Pose C29, respectively. From Fig. 9 and Table VIII, we find that the proposed MRJSRC method consistently outperforms JSRC across all numbers of views and corruption levels. This is because MRJSRC is robust to the corruption and can effectively exploit the complementary information among multiple modalities.
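For context, the decision rule shared by these representation-based classifiers can be sketched as follows; for brevity, plain per-class least-squares coding stands in for the joint sparse (JSRC) and modal-regression (MRJSRC) solvers, and all names are illustrative:

```python
import numpy as np

def rc_classify(dicts, queries):
    """dicts[v][c]: training matrix (features x samples) of class c in view v;
    queries[v]: query vector in view v.
    Returns the class with the smallest total reconstruction residual
    accumulated over all views."""
    n_classes = len(dicts[0])
    residuals = np.zeros(n_classes)
    for D_view, y in zip(dicts, queries):
        for c in range(n_classes):
            # Least-squares coding (a simplification of the joint
            # sparse coding used by JSRC/MRJSRC).
            x, *_ = np.linalg.lstsq(D_view[c], y, rcond=None)
            residuals[c] += np.linalg.norm(y - D_view[c] @ x)
    return int(np.argmin(residuals))

# Toy check: 2 views, 2 classes; the query is built from class 1's atoms.
rng = np.random.default_rng(0)
dicts = [[rng.normal(size=(10, 5)) for _ in range(2)] for _ in range(2)]
queries = [D[1] @ np.ones(5) for D in dicts]
print(rc_classify(dicts, queries))  # 1
```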

Corruption (%)   Algorithm   Views 1+2   View 1   View 2
10               JSRC        95.34       93.26    92.52
10               MRJSRC      95.96       94.00    93.01
20               JSRC        93.87       91.91    90.69
20               MRJSRC      95.59       93.75    92.28
30               JSRC        93.01       89.22    86.76
30               MRJSRC      94.73       93.63    91.30
40               JSRC        89.09       82.23    78.55
40               MRJSRC      93.75       91.42    89.95
50               JSRC        79.29       68.75    59.93
50               MRJSRC      91.42       87.87    84.56
60               JSRC        59.56       45.71    34.93
60               MRJSRC      86.52       76.35    71.69
Table VIII: Recognition rates (%) using distinct numbers of views with varying percentages of randomly corrupted pixels. Best results are marked bold.

V Conclusion

This paper presents a novel general classification framework, termed MRARC, for robust face recognition and reconstruction. The proposed MRARC framework is based on modal regression and does not require the noise to follow any specific predefined distribution, which enables MRARC to handle various complicated noises in practice. Using MRARC as a platform, we have also developed several novel RC methods for robust unimodal and multimodal face recognition. Experiments on real-world databases demonstrate the efficacy of the proposed methods for robust face recognition and reconstruction.

References

  • [1] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 2, pp. 210–227, Jan. 2009.
  • [2] E. Elhamifar and R. Vidal, “Block-sparse recovery via convex optimization,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4094–4107, Aug. 2012.
  • [3] L. Zhang, M. Yang, and X. Feng, “Sparse representation or collaborative representation: which helps face recognition?” in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 471–478.
  • [4] R. He, W. Zheng, and B. Hu, “Maximum correntropy criterion for robust face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1561–1576, Aug. 2011.
  • [5] R. Basri and D. Jacobs, “Lambertian reflection and linear subspaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 3, pp. 218–233, Feb. 2003.
  • [6] T. Hastie and P. Simard, “Metrics and models for handwritten character recognition,” Statistical Science, vol. 13, no. 1, pp. 54–65, 1998.
  • [7] E. Elhamifar and R. Vidal, “Robust classification using structured sparse representation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2011, pp. 1873–1879.
  • [8] L. Zhang, W. Zhou, P. Chang, J. Liu, Z. Yan, T. Wang, and F. Li, “Kernel sparse representation-based classifier,” IEEE Trans. Signal Process., vol. 60, no. 4, pp. 1684–1695, Jan. 2012.
  • [9] S. Shekhar, V. Patel, N. Nasrabadi, and R. Chellappa, “Joint sparse representation for robust multimodal biometrics recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 1, pp. 113–126, Jan. 2014.
  • [10] M. Yang, L. Zhang, J. Yang, and D. Zhang, “Regularized robust coding for face recognition,” IEEE Trans. Image Process., vol. 22, no. 5, pp. 1753–1766, May 2013.
  • [11] R. He, W. Zheng, T. Tan, and Z. Sun, “Half-quadratic-based iterative minimization for robust sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 2, pp. 261–275, Feb. 2014.
  • [12] J. Yang, L. Luo, J. Qian, Y. Tai, F. Zhang, and Y. Xu, “Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 1, pp. 156–171, Jan. 2017.
  • [13] Y. Tang, Y. Wang, L. Li, and C. Chen, “Structural atomic representation for classification,” IEEE Trans. Cybernetics, vol. 45, no. 12, pp. 2905–2913, Dec. 2015.
  • [14] Y. Wang, Y. Tang, L. Li, and P. Wang, “Information-theoretic atomic representation for robust pattern classification,” in Proc. Int. Conf. Pattern Recognit., Dec. 2016, pp. 3685–3690.
  • [15] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy. Stat. Soc., Ser. B, vol. 58, no. 1, pp. 267–288, 1996.
  • [16] A. Hoerl and R. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
  • [17] M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” J. Roy. Stat. Soc., Ser. B, vol. 68, no. 1, pp. 49–67, 2006.
  • [18] D. Erdogmus and J. C. Principe, “An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems,” IEEE Trans. Signal Process., vol. 50, no. 7, pp. 1780–1786, Jul. 2002.
  • [19] Y. Feng, J. Fan, and J. Suykens, “A statistical learning approach to modal regression,” arXiv preprint arXiv:1702.05960v2, 2017.
  • [20] T. Sager and R. Thisted, “Maximum likelihood estimation of isotonic modal regression,” Ann. Stat., vol. 10, no. 3, pp. 690–707, 1982.
  • [21] H. Zhou and X. Huang, “Nonparametric modal regression in the presence of measurement error,” Electronic Journal of Statistics, vol. 10, no. 2, pp. 3579–3620, 2016.
  • [22] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, pp. 1–122, Jan. 2011.
  • [23] M. Nikolova and M. K. Ng, “Analysis of half-quadratic minimization methods for signal and image recovery,” SIAM J. Sci. Comput., vol. 27, no. 3, pp. 937–966, 2005.
  • [24] V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky, “The convex geometry of linear inverse problems,” Found. Comput. Math., vol. 12, pp. 805–849, Dec. 2012.
  • [25] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by low-rank representation,” in Proc. Int. Conf. Mach. Learn., 2010, pp. 663–670.
  • [26] E. Parzen, “On the estimation of a probability density function and the mode,” Ann. Math. Stat., vol. 33, no. 3, pp. 1065–1076, Sep. 1962.
  • [27] J. C. Principe, Information Theoretic Learning: Renyi’s Entropy and Kernel Perspectives.   New York: Springer, 2010.
  • [28] K. Lee, J. Ho, and D. Kriegman, “Acquiring linear subspaces for face recognition under variable lighting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 5, pp. 684–698, May 2005.
  • [29] A. Martinez and R. Benavente, “The AR face database,” Computer Vision Center, Tech. Rep. 24, Jun. 1998.
  • [30] R. Gross and J. Shi, “The CMU Motion of Body (MoBo) database,” Robotics Institute, Tech. Rep. CMU-RI-TR-01-18, Jun. 2001.
  • [31] T. Sim, S. Baker, and M. Bsat, “The CMU pose, illumination, and expression database,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1615–1618, Dec. 2003.
  • [32] I. Naseem, R. Togneri, and M. Bennamoun, “Linear regression for face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 11, pp. 2106–2112, Nov. 2010.
  • [33] P. Viola and M. Jones, “Robust real-time face detection,” Int. J. Comput. Vis., vol. 57, no. 2, pp. 137–154, 2004.
  • [34] T. Ahonen, A. Hadid, and M. Pietikainen, “Face description with local binary patterns: Application to face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006.
  • [35] P. Zhu, W. Zuo, L. Zhang, S. Shiu, and D. Zhang, “Image set-based collaborative representation for face recognition,” IEEE Trans. Inf. Forensics Security, vol. 9, no. 7, pp. 1120–1132, July 2014.