Biometrics attempts to recognize human beings according to their physical or behavioral features. In the past, various traits were used for biometric recognition, out of which iris and face are the most popular [32, 38, 17, 26]. Based on the pioneering work of Wright et al., sparse representation theory is emerging as a popular method in the biometrics field and is considered especially suitable for handling degraded data acquired under uncontrolled acquisition protocols [31, 37].
1.1 Sparse Representation
Model selection in high-dimensional problems has been gaining interest in the statistical signal processing community [10, 4]. Using convex optimization models, the main problem is recovering a sparse solution $x \in \mathbb{R}^n$ of an underdetermined system of the form $y = Ax$, given a vector $y \in \mathbb{R}^m$ and a matrix $A \in \mathbb{R}^{m \times n}$. There is special interest in signal recovery when the number of predictors is much larger than the number of observations ($n \gg m$). A direct approach is to select, among the signals whose measurements equal $y$, the sparsest one by solving a minimization problem based on the $\ell_0$-norm:
$$\min_{x} \|x\|_0 \quad \text{subject to} \quad y = Ax, \qquad (1)$$
where $\|x\|_0$ counts the nonzero entries of $x$. Problem (1) is NP-hard and difficult to approximate since it involves non-convex minimization. An alternative is to relax problem (1) by means of the $\ell_1$-norm ($\|x\|_1 = \sum_i |x_i|$). Hence problem (1) can be replaced by the following $\ell_1$-minimization problem:
$$\min_{x} \|x\|_1 \quad \text{subject to} \quad y = Ax,$$
which can be solved by standard linear programming methods. In practice, signals are rarely exactly sparse, and may often be corrupted by noise. Under noise, the new problem is to reconstruct a sparse signal $x$ from $y = Ax + \varepsilon$, where $\varepsilon$ is white Gaussian noise with zero mean and variance $\sigma^2$. In this case the associated $\ell_1$-minimization problem (the LASSO) adopts the form:
$$\min_{x} \|y - Ax\|_2^2 + \lambda \|x\|_1, \quad \lambda \ge 0.$$
Although sparsity of representation seems to be well established by means of the LASSO approach, some limitations were pointed out by Hastie et al. . The LASSO model tends to select at most $m$ variables before it saturates, and when predictors are highly correlated, LASSO usually selects one variable from a group, ignoring the others. In order to overcome these difficulties, Hastie et al.  proposed the elastic net (EN) model as a new regularization technique that outperforms LASSO in terms of prediction accuracy. The elastic net is characterized by the presence of a ridge regression term ($\ell_2$-norm) and is defined by the following convex minimization problem:
$$\min_{x} \|y - Ax\|_2^2 + \lambda_1 \|x\|_1 + \lambda_2 \|x\|_2^2, \qquad (3)$$
where $\lambda_1$ and $\lambda_2$ are non-negative parameters. An improvement of the EN model was proposed in  where a combination of the $\ell_2$-penalty and an adaptive version of the $\ell_1$-norm is implemented by considering the minimization problem
where the adaptive weights are computed using a solution given by the EN minimization problem (3). If we let the solution of EN to be , then the weights are given by the equation where is a positive constant. A variant of the above model was proposed in  by incorporating the adaptive weight matrix in the -penalty term:
In this paper we use a re-weighted elastic net regularization model for the periocular recognition application.
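To make the penalties discussed in this subsection concrete, the following minimal Python sketch minimizes an elastic net objective of the form $\|y - Ax\|_2^2 + \lambda_1\|x\|_1 + \lambda_2\|x\|_2^2$ by proximal gradient (ISTA) iterations. The scaling of the objective, the symbols, and the function names `soft_threshold` and `elastic_net_ista` are assumptions made for illustration only; this is not the solver used in the paper. Setting `lam2 = 0` reduces the sketch to the LASSO case.

```python
import numpy as np

def soft_threshold(v, t):
    """Component-wise soft-thresholding, the proximal map of t * |.|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def elastic_net_ista(A, y, lam1, lam2, n_iter=500):
    """Minimize ||y - A x||_2^2 + lam1 * ||x||_1 + lam2 * ||x||_2^2 (assumed scaling)."""
    x = np.zeros(A.shape[1])
    # Lipschitz constant of the gradient of the smooth part (data term + ridge term).
    L = 2.0 * (np.linalg.norm(A, 2) ** 2 + lam2)
    for _ in range(n_iter):
        grad = 2.0 * (A.T @ (A @ x - y)) + 2.0 * lam2 * x
        x = soft_threshold(x - grad / L, lam1 / L)
    return x

# Small check with A = I, where the minimizer is soft_threshold(y, lam1 / 2) / (1 + lam2).
x_hat = elastic_net_ista(np.eye(3), np.array([3.0, -0.2, 1.0]), lam1=1.0, lam2=1.0)
```

With the identity matrix the problem decouples per coordinate, which makes it easy to see both effects at once: the $\ell_1$ term sets the small coefficient to zero, and the $\ell_2$ term shrinks the surviving ones.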
1.2 Summary of Contributions
The main contribution of this paper is a re-weighted elastic net (REN) regularization model that enhances the sparsity of the solutions found. The proposed REN model is a regularization and variable selection method that enjoys sparsity of representation, particularly when the number of predictors is much larger than the number of observations. The weights are computed such that larger weights encourage small coordinates by means of the $\ell_1$-norm, and smaller weights encourage large coordinates due to the $\ell_2$-norm. Our model differs from the schemes in  and  (see equations (4) and (5) above), since the $\ell_1$ and $\ell_2$ terms are automatically balanced by weights which are continuously updated using with a positive parameter . We also provide a concise proof of the existence of a solution for the proposed model as well as its accuracy property. The numerical implementation of the REN model is presented in full: the model is reformulated as a quadratic programming (QP) problem and solved with a gradient projection (GP) method , which seeks sparse representations along certain gradient directions.
As the main application of our model, we consider the periocular recognition problem. The periocular region has been regarded as a trade-off between using the entire face and using only the iris in biometrics. It is particularly suitable for recognition under visible wavelength light and uncontrolled acquisition conditions [28, 42, 27]. We enhance periocular recognition through the sparsity-seeking property of our REN model over different periocular sectors, which are then fused according to a Bayesian decision based scheme. The main idea is to benefit from the information in each sector, which should contribute to overall recognition robustness. Two different domains are considered for this purpose: (1) geometry and (2) color. Full geometry information is accessed by decomposing a given image into its cartoon and texture components by means of a dual formulation of the weighted total variation (TV) scheme . For color, a key contribution is the use of nonlinear features such as chromaticity and hue components, which are thought to improve image geometry information according to human perception . Our methodology is inspired by two related works: 1) Wright et al. , which introduced the concept of sparse representation for classification (SRC) purposes; and 2) Pillai et al. , which used an SRC model for disjoint sectors of the iris and fused results at the score level, according to a confidence score estimated from each sector.
Our experiments are carried out on periocular images of the UBIRIS.v2 data set: images were acquired at visible wavelengths, from 4 to 8 meters away from the subjects and under uncontrolled acquisition conditions. Varying gazes, poses and amounts of occlusion (due to glasses and reflections) are evident in this data set and make the recognition task harder, see Figure 1. The results obtained using our model show consistent increases in performance when compared to the classical SRC model and other important approaches (e.g., Wright et al.  and Pillai et al. ). It should also be stressed that these increases in performance were obtained without a significant increase in the computational burden of the recognition process.
The remainder of the paper is organized as follows. Section 2 summarizes the most relevant work in the scope of this paper concerning penalized feature selection for sparse representation. The re-weighted elastic net (REN) model is introduced together with a statistical motivation ensuring high prediction rates. An algorithm based on gradient projection (GP) for the REN model is also introduced. Section 3 describes the different geometrical information extracted from periocular images for performing recognition based on cartoon - texture and chromaticity features in a total variation framework. Section 4 describes the experimental validation procedure together with comparisons against related approaches. Finally, Section 5 concludes the paper.
2 The Re-weighted Elastic Net Model for Classification
2.1 The LASSO Model for Recognition
We first briefly describe the sparse representation based classification framework, which is a precursor to our REN based approach. Given a set of labeled training samples ( samples from the $i$-th subject), they are arranged as columns of a matrix . A dictionary results from the concatenation of all samples of all classes:
The key insight is that any probe can be expressed as a linear combination of elements of . As the data acquisition process often induces noisy samples, it turns out to be practical to make use of the LASSO model. In this case it is assumed that the observation model has the form .
Classification is based on the observation that high values of the coefficients in the solution are associated with the columns of of a single class, corresponding to the identity of the probe. A residual score per class is defined: , where is an indicator function that sets the values of all coefficients to 0, except those associated with the $i$-th class. In this setting, the probe is then reconstructed by , and the minimal reconstruction error, between and , is deemed to correspond to the identity of the probe:
In  a sparsity concentration index (SCI) is used to accept/reject the response given by the LASSO model. The SCI of a coefficient vector corresponds to:
If , the computed signal is considered to be acceptably represented by samples from a single class. Otherwise, if the sparse coefficients spread evenly across all classes and a reliable identity for that probe cannot be given.
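The classification rule and the acceptance test described above can be sketched as follows. The sketch takes an already computed sparse coefficient vector `x` and uses the SCI as defined by Wright et al., $\mathrm{SCI}(x) = \big(k \max_i \|\delta_i(x)\|_1 / \|x\|_1 - 1\big)/(k-1)$ for $k$ classes. The function names, the `labels` array (one class label per dictionary column) and the `threshold` argument are hypothetical names chosen for illustration.

```python
import numpy as np

def class_residuals(D, y, x, labels):
    """Reconstruction error of y using only the coefficients of each class."""
    classes = np.unique(labels)
    res = np.array([np.linalg.norm(y - D @ np.where(labels == c, x, 0.0))
                    for c in classes])
    return classes, res

def sci(x, labels):
    """Sparsity concentration index in [0, 1]; requires at least two classes."""
    classes = np.unique(labels)
    k = len(classes)
    total = np.abs(x).sum()
    if total == 0.0:
        return 0.0  # no coefficient mass at all: treat as maximally spread
    best = max(np.abs(x[labels == c]).sum() for c in classes)
    return (k * best / total - 1.0) / (k - 1.0)

def src_decide(D, y, x, labels, threshold):
    """Return the class with minimal residual, or None if the SCI is too low."""
    if sci(x, labels) < threshold:
        return None
    classes, res = class_residuals(D, y, x, labels)
    return int(classes[np.argmin(res)])
```

An SCI of 1 means that all coefficient mass lies in a single class, while an SCI of 0 means that the mass is spread evenly over the classes, in which case the probe is rejected rather than assigned an identity.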
The recognition model proposed by Pillai et al.  obtains separate sparse representations from disjoint regions of an image and fuses them by considering a quality index from each region. Let be the number of classes with labels . A probe is divided into sectors, each one described by the SRC algorithm. SCI values are obtained over each sector, which allows sectors with quality below a threshold to be rejected. Let represent the class labels of the retained sectors, and
be the probability that the -th sector returns a label , when the true class is :
with and constants such that . According to a maximum a posteriori (MAP) estimate of the class label, the response corresponds to the class having the highest accumulated SCI:
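The fusion rule described above, in which each retained sector votes for its label with a weight equal to its SCI, can be sketched as follows. The function name `fuse_sectors` and its arguments are hypothetical, and equal class priors are assumed.

```python
def fuse_sectors(sector_labels, sector_sci, threshold):
    """Pick the class with the highest accumulated SCI over the retained sectors.

    sector_labels: class label returned by SRC for each sector
    sector_sci:    SCI value of each sector (its quality index)
    threshold:     sectors with SCI below this value are rejected
    """
    scores = {}
    for label, quality in zip(sector_labels, sector_sci):
        if quality < threshold:
            continue  # low-quality sector: discard its vote
        scores[label] = scores.get(label, 0.0) + quality
    if not scores:
        return None, scores  # every sector was rejected
    best = max(scores, key=scores.get)
    return best, scores

# Class 5 wins a plain majority vote, but class 2 has the larger accumulated SCI.
decision, scores = fuse_sectors([2, 2, 5, 5, 5], [0.9, 0.8, 0.3, 0.4, 0.05], 0.1)
```

The example illustrates why the quality index matters: three weak sectors agree on one class, yet two confident sectors outweigh them.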
2.2 The Re-weighted Elastic Net (REN) Method
The proposed REN model is a sparse representation approach that balances the LASSO shrinkage term ($\ell_1$-norm) and the strength of the quadratic regularization ($\ell_2$-norm) through the following minimization problem:
where are positive weights taking values in . The REN-penalty is strictly convex and it is a compromise between the ridge regression penalty and the LASSO. The convex combination in the REN-penalty term is natural in the sense that both the $\ell_1$ and $\ell_2$ norms are balanced by weights controlling the amount of sparsity versus smoothness expected from the minimization scheme. As in , the weights are chosen such that they are inversely related to the computed signal according to the equation with a positive parameter. Under this setting, large weights encourage small coordinates with respect to the REN-penalty term, whereas small weights allow large coordinates. The new model therefore combines continuous shrinkage with automatic variable selection. We next consider the existence of a solution and the sign recovery property of the REN model.
2.3 Existence of Solution
We state necessary and sufficient conditions for the existence of a solution for the proposed model (6). We follow the notations used in [41, 16]. In terms of the $\ell_1$ and $\ell_2$ norms, we rewrite the minimization problem in (6) as,
Let us denote by and the real and estimated solution of (7) respectively. Given , we define the block-wise form matrix
where () is a () matrix formed by concatenating the columns () and is assumed to be invertible.
First we assume that there exist satisfying (7) and . Let us define together with the set,
From the Karush-Kuhn-Tucker (KKT) conditions we obtain
which can be rewritten as,
for some and by substituting the equality . From the above Eqn. (8) the following two equations arise:
which guarantees the equality due to (13). In the same manner, we define satisfying and
implying from (14) the inequality for and therefore . From the above, we have found a point and satisfying (9) and (10) respectively, or equivalently (8). Moreover, we also have the equality . Under these assertions we can prove the sign recovery property of our model as illustrated next.
2.4 Sign Recovery Property
Under some regularity conditions on the proposed REN model, we intend to give an estimation for which the event is true. Following similar notations in [48, 46], we intend to prove that our model enjoys the following probabilistic property:
For theoretical analysis purposes, the problem (6) is written as
The following regularity conditions are also assumed:
Denoting with and
the minimum and maximum eigenvalues of a symmetric matrix, we assume the following inequalities hold:
where and are two positive constants.
By using the definitions of and , the next two inequalities arise
On the other hand
which together with the identity
allow us to prove
Let us notice that
Let and . Because of (21),
Now, we notice that
and as long as , it follows that
By using (23), we derive
Then (15) holds.
There is special interest in applying the REN model in the case the data satisfies the condition . For the LASSO model it was suggested in  to make use of the Dantzig selector, which can achieve the ideal estimation up to a factor. In  a procedure related to the Dantzig selector, called Sure Independence Screening (SIS), was introduced in order to reduce ultra-high dimensionality. We remark that the SIS technique can be combined with the REN model (6) for dealing with the case . The previous computations can then still be applied to reach the sign recovery property.
Next we describe an algorithm for the REN model allowing us to directly deal with the case . It turns out that our REN model can be expressed as a quadratic program (QP), thus allowing us to apply a gradient projection approach to perform the sparse reconstruction.
2.5 Numerical Implementation
The algorithm, which alternates between computing the signal and redefining the weights, is as follows:
1. Choose initial weights , .
2. Find the solution of the problem
3. Update the weights: for each ,
where is a positive stability parameter.
4. Terminate on convergence or when a specified number of iterations is reached. Otherwise, go to step 2.
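The alternating scheme above can be sketched in Python as follows. Because the exact penalty and weight update are not reproduced in the text here, the sketch rests on two explicit assumptions: the penalty is taken as $\lambda \sum_i \big(w_i |x_i| + (1 - w_i) x_i^2\big)$ with $w_i \in (0, 1]$, and the weights are updated as $w_i = \epsilon / (|x_i| + \epsilon)$, which is inversely related to the computed signal. The inner problem is solved by proximal gradient steps rather than by the gradient projection method of the paper, and all function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ren_inner(A, y, w, lam, x0, n_iter=200):
    """Proximal gradient for ||y - A x||^2 + lam * sum(w|x| + (1 - w) x^2), w fixed.

    The form of this penalty is an ASSUMPTION made for illustration.
    """
    x = x0.copy()
    # Lipschitz constant of the smooth part (assumes A is not the zero matrix).
    L = 2.0 * (np.linalg.norm(A, 2) ** 2 + lam * np.max(1.0 - w))
    for _ in range(n_iter):
        grad = 2.0 * (A.T @ (A @ x - y)) + 2.0 * lam * (1.0 - w) * x
        x = soft_threshold(x - grad / L, lam * w / L)
    return x

def reweighted_elastic_net(A, y, lam, eps=0.1, n_outer=5):
    n = A.shape[1]
    w = np.ones(n)           # step 1: initial weights (w = 1 gives a pure l1 penalty)
    x = np.zeros(n)
    for _ in range(n_outer):
        x = ren_inner(A, y, w, lam, x)   # step 2: solve the weighted problem
        w = eps / (np.abs(x) + eps)      # step 3: ASSUMED update, values in (0, 1]
    return x, w                          # step 4: stop after n_outer passes

x_hat, w_hat = reweighted_elastic_net(np.eye(2), np.array([3.0, 0.1]), lam=1.0, n_outer=3)
```

In this toy run the small coordinate keeps a weight of 1 and is driven exactly to zero by the $\ell_1$ part, while the large coordinate receives a small weight and is governed mostly by the quadratic part, which is the balancing behaviour described in Section 2.2.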
Note that our REN problem in (27) can also be expressed as a quadratic program , by splitting the variable into its positive and negative parts. That is, where and are the vectors that collect the positive and negative coefficients of , respectively. Then, we handle the minimization problem,
where , , and with
where is the step size computed as
The operator $\mathrm{mid}$ is defined as the middle value of its three scalar arguments, and and are two given parameters. The parameter takes the form
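The two ingredients described in this subsection, the splitting of the variable into positive and negative parts and a gradient projection step whose step size is clipped by the middle-of-three operator, can be sketched as follows. For brevity the sketch is applied to the plain problem $\tfrac{1}{2}\|y - Ax\|_2^2 + \tau \|x\|_1$ rather than to the REN objective, and it uses a simple backtracking rule; the names `mid`, `split_pos_neg`, `gp_l1`, `alpha_min` and `alpha_max` are hypothetical.

```python
import numpy as np

def mid(a, b, c):
    """Middle value of three scalars."""
    return sorted((a, b, c))[1]

def split_pos_neg(x):
    """Return (u, v) with u, v >= 0 and x = u - v."""
    return np.maximum(x, 0.0), np.maximum(-x, 0.0)

def gp_l1(A, y, tau, n_iter=200, alpha_min=1e-30, alpha_max=1e30):
    """Gradient projection on the bound-constrained QP obtained from x = u - v."""
    n = A.shape[1]
    u, v = np.zeros(n), np.zeros(n)
    b = A.T @ y

    def F(u, v):
        r = y - A @ (u - v)
        return 0.5 * (r @ r) + tau * (u.sum() + v.sum())

    for _ in range(n_iter):
        r = A.T @ (A @ (u - v)) - b
        grad_u, grad_v = tau + r, tau - r
        # Projected gradient: ignore components blocked by the constraint u, v >= 0.
        g_u = np.where((u > 0) | (grad_u < 0), grad_u, 0.0)
        g_v = np.where((v > 0) | (grad_v < 0), grad_v, 0.0)
        gg = g_u @ g_u + g_v @ g_v
        if gg < 1e-20:
            break  # stationary point reached
        Ag = A @ (g_u - g_v)
        curv = Ag @ Ag
        # Exact minimizer along the projected gradient, clipped to [alpha_min, alpha_max].
        alpha = mid(alpha_min, gg / curv, alpha_max) if curv > 0 else 1.0
        f_old = F(u, v)
        while True:  # simple backtracking: halve the step until F does not increase
            u_new = np.maximum(u - alpha * grad_u, 0.0)
            v_new = np.maximum(v - alpha * grad_v, 0.0)
            if F(u_new, v_new) <= f_old or alpha < 1e-12:
                break
            alpha *= 0.5
        u, v = u_new, v_new
    return u - v
```

The splitting turns the non-smooth $\ell_1$ term into a linear term over non-negative variables, so each iteration only needs a gradient evaluation followed by a projection onto the non-negative orthant.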
The performance of the REN minimization along with comparisons is shown in Figure 2 for a sparse signal. We want to reconstruct a length- sparse signal (in the canonical basis) from observations, with . The matrix
is built with independent samples of a standard Gaussian distribution and by ortho-normalizing the rows, while the original signal contains 160 randomly placed spikes and the observation is defined as with a Gaussian noise of variance . The reconstruction of the original signal with the REN minimization problem produces a much lower mean squared error (MSE = with being an estimate of ) equal to , while the MSE values given by the adaptive elastic net models proposed in ,  and by LASSO are , and respectively. Therefore, the proposed REN approach locates the spikes more accurately than the compared models.
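A test problem of the kind described above can be generated with the following sketch. The signal length, number of observations, number of spikes and noise variance used as defaults are hypothetical placeholders, not the values of the experiment; the spike amplitudes of $\pm 1$ and the normalization of the MSE by the signal length are likewise assumptions.

```python
import numpy as np

def make_sparse_problem(n=1024, k=256, n_spikes=40, noise_var=1e-4, seed=0):
    """Gaussian sensing matrix with orthonormal rows, spike signal, noisy data.

    All default sizes are HYPOTHETICAL and chosen only to keep the example small.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((k, n))
    Q, _ = np.linalg.qr(A.T)   # columns of Q are orthonormal (reduced QR)
    A = Q.T                    # so the rows of A are orthonormal
    x = np.zeros(n)
    support = rng.choice(n, size=n_spikes, replace=False)
    x[support] = rng.choice([-1.0, 1.0], size=n_spikes)  # assumed +/-1 spikes
    y = A @ x + np.sqrt(noise_var) * rng.standard_normal(k)
    return A, x, y

def mse(x_hat, x):
    """Mean squared error, assumed here to be ||x_hat - x||^2 / n."""
    return np.sum((x_hat - x) ** 2) / x.size

A, x_true, y = make_sparse_problem()
```

Any of the solvers compared in Figure 2 can then be run on `(A, y)` and scored with `mse` against `x_true`.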
3 Geometric and Color Spaces for Image Decomposition
3.1 Cartoon + Texture (CT) Space
The periocular images contain cartoon (smooth) and texture parts (small scale oscillations) which can be obtained using the total variation (TV)  model effectively. In this setting, the grayscale version of a periocular image is divided into two components representing the geometrical and texture parts. The TV based decomposition model is defined as an energy minimization problem,
where is the input grayscale image, and is an edge indicator type function. Following  we use a splitting with an auxiliary variable to obtain the following relaxed minimization,
After a solution is computed, we expect to obtain the representation , where the function represents the geometric cartoon part, the function contains texture information, and the function represents edges. The minimization (29) is achieved by solving the following alternating sub-problems based on the dual minimization technique:
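As a simplified illustration of a dual minimization technique for this kind of decomposition, the following sketch computes a cartoon and texture split with the unweighted TV model (edge indicator set to 1, no auxiliary variable), using Chambolle's dual projection iteration. It is a reduced stand-in for the weighted scheme described above, not the algorithm of the paper; the function names and the parameters `lam`, `tau` and `n_iter` are assumptions.

```python
import numpy as np

def grad(u):
    """Forward differences with zero at the last row/column."""
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[0, :] = px[0, :]
    dx[1:-1, :] = px[1:-1, :] - px[:-2, :]
    dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]
    dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
    dy[:, -1] = -py[:, -2]
    return dx + dy

def cartoon_texture(f, lam=0.1, tau=0.125, n_iter=100):
    """Approximately minimize TV(u) + ||u - f||^2 / (2 * lam); return (u, f - u).

    f must be a 2-D float array. tau <= 1/8 is the usual stability bound.
    """
    px, py = np.zeros_like(f), np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - f / lam)
        norm = np.sqrt(gx ** 2 + gy ** 2)
        px = (px + tau * gx) / (1.0 + tau * norm)
        py = (py + tau * gy) / (1.0 + tau * norm)
    u = f - lam * div(px, py)   # cartoon (piecewise smooth) component
    return u, f - u             # texture is the residual
```

Larger values of `lam` move more of the image into the texture component; a constant image is returned unchanged as pure cartoon.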