1 Introduction
Least squares support vector machine (LSSVM) was introduced by Suykens [Suykens1999]
and has become a powerful learning technique for classification and regression. It has been successfully applied to many real-world pattern recognition problems, such as disease diagnosis
[Duygu2011], fault detection [Long2014], image classification [Yang2015], the solution of partial differential equations
[Mehrkanoon2015] and visual tracking [Gao2016]. LSSVM minimizes the least squares error on the training samples. Compared with other SVMs, LSSVM is based on equality constraints rather than inequality ones, so it has a closed-form solution obtained by solving a system of linear equations instead of iteratively solving a quadratic programming (QP) problem as other SVMs do. The training of LSSVM is therefore simpler than that of other SVMs. However, LSSVM has two main drawbacks. One is that it is sensitive to outliers, because outliers always have large support values (the values of the Lagrange multipliers), which means that outliers have a larger influence than other samples in constructing the decision function. The other is that the solution of LSSVM lacks sparseness, which limits the method on large-scale problems.
In order to overcome the sensitivity of LSSVM to outliers, Suykens et al. [Suykens2002] proposed the weighted LSSVM (WLSSVM) model, which puts small weights on the less important samples or outliers to reduce their influence on the model. Other weight-setting strategies have also been proposed; see [Valyon2003, You2011]. Theoretical analyses and experimental results indicate that such methods are robust to outliers. However, these methods need to solve the original LSSVM first in order to set the weights, so none of them is suitable for training large-scale problems. Another technique for achieving robustness relies on nonconvex loss functions. Based on the truncated least squares loss function, Wang et al. [KuainiWang2014] and Yang et al. [XiaoweiYang2014] presented the robust LSSVM (RLSSVM) model. Experimental results show that the RLSSVM model significantly reduces the effect of outliers. However, the solutions produced by Yang's and Wang's algorithms both lack sparseness, and they need to precompute the whole kernel matrix and the inverse of the coefficient matrix, hence they are both time consuming on large-scale data sets. They are even unable to handle data sets containing more than 10,000 training samples on common computers.
There are also methods that promote the sparsity of LSSVM. Suykens et al. [Suykens2000, J.A.K.Suykens2002] proposed a pruning algorithm which iteratively removes a small fraction of the samples (5%) with the smallest support values to impose sparseness. In this pruning algorithm, LSSVM must be retrained on the reduced training set at each iteration, which leads to a large computational cost. Fixed-size least squares support vector machine (FSLSSVM) [Suykens2002] is another sparse algorithm. In this algorithm, a number of support vectors (SVs), referred to as prototype vectors, are fixed in advance, and then they are iteratively replaced by samples randomly selected from the training set according to the quadratic Rényi entropy criterion. However, in each iteration this method only computes the entropy of the samples selected in the working set rather than of the whole data set, which may yield suboptimal solutions. Jiao et al. [Jiao2007] presented the fast sparse approximation for LSSVM (FSALSSVM), in which an approximate decision function is built iteratively by adding basis functions from a kernel-based dictionary one by one until the stopping criterion is satisfied. This algorithm obtains sparse classifiers at a rather low cost, but under very sparse settings the experimental results in [sszhou2016] show that FSALSSVM performs poorly on some training data sets. Zhou [sszhou2016] proposed pivoting Cholesky of primal LSSVM (PCPLSSVM), an iterative method based on an incomplete pivoted Cholesky factorization of the kernel matrix. Theoretical analyses and experimental results indicate that PCPLSSVM can obtain acceptable test accuracy with an extremely sparse solution.
In this paper, we aim to obtain a sparse solution of the RLSSVM model so as to overcome the two drawbacks of LSSVM simultaneously. The new algorithm solves RLSSVM in the primal space, as [sszhou2016] did for LSSVM, and our main contributions can be summarized as follows:

By introducing an equivalent form of the truncated least squared loss function, we show that RLSSVM is equivalent to a reweighted LSSVM model, which explains the robustness of RLSSVM.

We illustrate that the representer theorem also holds for the nonconvex loss function, and propose the primal RLSSVM model, which has a sparse solution if the kernel matrix is low rank.

We propose the sparse RLSSVM algorithm, which obtains a sparse solution of RLSSVM by applying a low-rank approximation of the kernel matrix. The complexity of the new algorithm is lower than that of the existing nonsparse RLSSVM algorithms.

Extensive experiments demonstrate that the proposed algorithm can process large-scale problems efficiently.
The rest of the paper is organized as follows. Brief descriptions of RLSSVM and its existing algorithms are given in Section 2. In Section 3, the robustness of RLSSVM is interpreted from a reweighted viewpoint. In Section 4, primal RLSSVM and its smooth version are discussed, and the novel sparse algorithm is proposed; after that, the convergence and complexity of the new algorithm are analyzed. Section 5 presents experiments showing the efficiency of the proposed algorithm. Section 6 concludes the paper.
2 Robust LSSVM model and the existing algorithms
In this section, we briefly summarize the RLSSVM and the existing algorithms.
2.1 Robust LSSVM
Consider a training set of $n$ sample pairs $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ are the input data and $y_i \in \{-1, +1\}$ (for classification) or $y_i \in \mathbb{R}$ (for regression) are the output targets corresponding to the inputs. The classical LSSVM model is described as follows:
(1) $\min_{w,b,e} \; \frac{1}{2}\|w\|^2 + \frac{\gamma}{2}\sum_{i=1}^{n} \ell(e_i), \quad \text{s.t.}\; y_i = w^{\top}\varphi(x_i) + b + e_i,\; i = 1, \dots, n,$
where $\gamma > 0$ is the regularization parameter,
$w$ is the normal of the hyperplane,
$b$ is the bias, $\varphi(\cdot)$ is a map which maps the input into a high-dimensional feature space, especially for managing nonlinear learning problems, and $\ell(u) = u^2$ is the least squares loss with $e_i$ being the prediction error. By replacing $\ell$ in (1) with the truncated least squares loss $\ell_c$:
(2) $\ell_c(u) = \min(u^2, c),$
Wang et al. [KuainiWang2014] and Yang et al. [XiaoweiYang2014] introduced the robust LSSVM (RLSSVM):
(3) $\min_{w,b,e} \; \frac{1}{2}\|w\|^2 + \frac{\gamma}{2}\sum_{i=1}^{n} \min(e_i^2, c), \quad \text{s.t.}\; y_i = w^{\top}\varphi(x_i) + b + e_i,\; i = 1, \dots, n,$
where $c > 0$ is the truncation parameter which bounds the errors of the outliers. Fig. 2 plots the truncated least squares loss in (2), the least squares loss, and the difference between them. It is clear that the losses of the outliers (samples with larger errors) are bounded by $c$, which reduces the effect of the outliers in RLSSVM. We will investigate the robustness of RLSSVM from a reweighted viewpoint in Section 3.
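As a hedged illustration of the capping behaviour of the truncated loss (the function names, the scaling without a 1/2 factor, and the value c = 4 are assumptions for this sketch, following the form of (2)):

```python
# A minimal sketch of the truncated least squares loss used by RLSSVM,
# written as r_c(u) = min(u^2, c); `u` is the prediction error and `c`
# the truncation level. The exact scaling convention follows (2).
def truncated_ls_loss(u, c):
    """Least squares loss capped at the truncation level c."""
    return min(u * u, c)

def ls_loss(u):
    """Plain least squares loss, unbounded in the error u."""
    return u * u

# An outlier with error 10 contributes 100 to the LSSVM objective,
# but only c to the RLSSVM objective:
print(ls_loss(10.0), truncated_ls_loss(10.0, c=4.0))
```

The contrast above is the whole point of the truncation: a single gross outlier cannot dominate the objective, because its loss saturates at c.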
2.2 Existing algorithms for RLSSVM
The truncated least squares loss is nonconvex and nonsmooth, as can easily be observed from Fig. 2, but it can be expressed as the difference of two convex functions $\ell_1$ and $\ell_2$ [KuainiWang2014], where
(4) $\ell_1(u) = u^2, \qquad \ell_2(u) = \max(u^2 - c,\, 0).$
Then RLSSVM can be transformed into a difference of convex functions (DC) program:
(5) $\min_{w,b} \; \frac{1}{2}\|w\|^2 + \frac{\gamma}{2}\sum_{i=1}^{n} \ell_1(e_i) - \frac{\gamma}{2}\sum_{i=1}^{n} \ell_2(e_i), \quad e_i = y_i - w^{\top}\varphi(x_i) - b.$
Wang et al. [KuainiWang2014] and Yang et al. [XiaoweiYang2014] solve the DC program (5) by the concave-convex procedure (CCCP). Through different methods, they both focus on iteratively solving the following linear system (6):
(6) $\begin{bmatrix} K + I_n/\gamma & \mathbf{1} \\ \mathbf{1}^{\top} & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ b \end{bmatrix} = \begin{bmatrix} y - \delta^{(t)} \\ 0 \end{bmatrix},$
where $K$ is the positive semidefinite kernel matrix satisfying $K_{ij} = k(x_i, x_j) = \varphi(x_i)^{\top}\varphi(x_j)$,
$I_n$ is the $n \times n$ identity matrix,
$\mathbf{1} = (1, \dots, 1)^{\top}$, $y = (y_1, \dots, y_n)^{\top}$, and $\delta^{(t)}$ is the value of $\delta$ at the $t$-th iteration satisfying
(7) $\delta_i^{(t)} = \tfrac{1}{2}\,\ell_2'\!\left(e_i^{(t)}\right), \quad e_i^{(t)} = y_i - K_i\alpha^{(t)} - b^{(t)},$
where $K_i$ is the $i$-th row of the kernel matrix $K$.
Through iteratively solving (6) with respect to $\alpha$ and $b$ until convergence, the output decision function $f(x) = \sum_{i=1}^{n} \alpha_i k(x_i, x) + b$ is obtained.
In order to compute (7), Wang et al. [KuainiWang2014] neglect the points of nondifferentiability of $\ell_2$ and adopt the following formula:
(8) $\ell_2'(u) = \begin{cases} 2u, & u^2 > c, \\ 0, & u^2 \le c, \end{cases}$
while Yang et al. compute (7) after smoothing the function $\ell_2$ by a piecewise quadratic function [XiaoweiYang2014].
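The effect of the CCCP iteration can be seen on a toy problem. The sketch below is not the paper's algorithm: it applies the same linearize-then-solve scheme, with the subgradient rule of (8) (zero inside the truncation region), to a one-dimensional robust mean problem under the truncated least squares loss; all names and the choice of a median starting point are illustrative assumptions.

```python
def cccp_robust_mean(y, c, m0, iters=50):
    """CCCP for min_m sum_i min((y_i - m)^2, c): at each step the
    concave part max((y_i - m)^2 - c, 0) is linearized at the current
    iterate (its subgradient vanishes inside the truncation region, as
    in (8)), and the convex subproblem has a closed-form mean update."""
    n, m = len(y), m0
    for _ in range(iters):
        # samples whose squared error exceeds the truncation level c
        # are the only ones contributing to the concave part's gradient
        inlier_sum = sum(yi for yi in y if (yi - m) ** 2 <= c)
        n_out = sum(1 for yi in y if (yi - m) ** 2 > c)
        # closed-form minimizer of the linearized convex subproblem
        m = (inlier_sum + n_out * m) / n
    return m

# Data with one gross outlier; the plain mean is 26.5, while CCCP
# (started from the median, since the problem is nonconvex and the
# starting point matters) converges to the inlier mean 2.0.
m = cccp_robust_mean([1.0, 2.0, 3.0, 100.0], c=4.0, m0=2.5)
print(m)
```

The update is a fixed-point iteration whose limit averages only the samples inside the truncation region, which is exactly the robustness mechanism the loss is designed to provide.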
One limitation of these two algorithms is that the solution lacks sparseness. This is because the coefficient matrix of (6) is a nonsingular symmetric dense matrix and the vector on the right-hand side of the equations is dense. Hence the training of these two algorithms is slow, and they cannot train large-scale problems efficiently.
3 Robustness of RLSSVM from a reweighted viewpoint
Wang et al. [KuainiWang2014] illustrate the robustness of RLSSVM only through experiments. Yang et al. [XiaoweiYang2014] explain it via the relationship between the solutions of RLSSVM and WLSSVM [J.A.K.Suykens2002]. In this section, we show that RLSSVM enjoys robustness from a reweighted viewpoint [Feng2016].
By the representer theorem in Section 4.1, RLSSVM can be translated into the following model in the primal space, without the implicit feature map $\varphi$:
(9) $\min_{\alpha, b} \; \frac{1}{2}\alpha^{\top} K \alpha + \frac{\gamma}{2}\sum_{i=1}^{n} \min\!\left((y_i - K_i\alpha - b)^2,\, c\right),$
where $K$ is the kernel matrix and $K_i$ denotes its $i$-th row.
In order to explain the robustness of the preceding model (9) more clearly, we give an equivalent form of the truncated least squares loss in Lemma 1, following the ideas in [Geman1995, Nikolova2005].
Lemma 1.
The truncated least squares loss can be expressed as
(10) $\min(u^2, c) = \min_{v \in [0,1]} \left\{ v u^2 + g(v) \right\},$
where
(11) $g(v) = (1 - v)c, \quad v \in [0, 1].$
Proof.
For fixed $u$, the objective in (10) is linear in $v$ with slope $u^2 - c$, so its minimum over $[0,1]$ is attained at $v = 1$ when $u^2 \le c$ and at $v = 0$ otherwise, which yields exactly $\min(u^2, c)$. Moreover, the minimizer has the closed form
(12) $v^*(u) = \begin{cases} 1, & u^2 \le c, \\ 0, & u^2 > c. \end{cases}$
∎
By Lemma 1 and the study of reweighted LSSVM in [Brabanter2009], we have the following proposition.
Proposition 1.
Any stationary point of RLSSVM (9) can be obtained by solving an iteratively reweighted LSSVM as follows:
(13) $\left(\alpha^{(t+1)}, b^{(t+1)}\right) = \arg\min_{\alpha, b} \; \frac{1}{2}\alpha^{\top} K \alpha + \frac{\gamma}{2}\sum_{i=1}^{n} v_i^{(t)}\left(y_i - K_i\alpha - b\right)^2,$
where $v_i^{(t)}$ is the value of the weight $v_i$ at the $t$-th iteration.
Proof.
Substituting (10) into (9), we have
(14) $\min_{\alpha, b, v \in [0,1]^n} \; \frac{1}{2}\alpha^{\top} K \alpha + \frac{\gamma}{2}\sum_{i=1}^{n} \left[ v_i e_i^2 + (1 - v_i)c \right],$
where $e_i = y_i - K_i\alpha - b$. Since the problem is nonconvex, only a stationary point of the preceding minimization problem can be expected. Let $(\alpha^*, b^*)$ be one of the stationary points of (9). By the analysis above, there exists $v^*$ such that $(\alpha^*, b^*, v^*)$ is a solution of (14). On the other hand, if $(\alpha^*, b^*, v^*)$ is any stationary point of (14), then $(\alpha^*, b^*)$ also solves (9). Hence, we can iteratively solve (14) by the alternating direction method (ADM) [He2012] as follows:
(15) $\left(\alpha^{(t+1)}, b^{(t+1)}\right) = \arg\min_{\alpha, b} \; \frac{1}{2}\alpha^{\top} K \alpha + \frac{\gamma}{2}\sum_{i=1}^{n} v_i^{(t)} e_i^2,$
(16) $v^{(t+1)} = \arg\min_{v \in [0,1]^n} \; \sum_{i=1}^{n} \left[ v_i \left(e_i^{(t+1)}\right)^2 + (1 - v_i)c \right].$
Obviously, the optimization problem in (16) has the closed-form solution (12). The optimization problem in (15) is just the reweighted LSSVM (13). ∎
Since $e_i$ denotes the predicted error, similarly to the robustness analysis in [Feng2016], the larger $|e_i|$ is, the more likely the instance pair $(x_i, y_i)$ is an outlier. From (12) and (13), one observes that when $e_i^2$ is sufficiently large for an outlier instance, the corresponding weight in (13) will be 0. That is, the truncated least squares loss function eliminates the influence of samples which are far away from their true targets. This explains the robustness of RLSSVM from the reweighted viewpoint.
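A minimal sketch of the alternating scheme in the spirit of (15)-(16), again on a one-dimensional robust mean problem (an illustration under assumed names, not the paper's solver): the binary weights are updated in closed form as in (12), and the weighted least squares subproblem reduces to a weighted mean.

```python
def reweighted_robust_mean(y, c, m0, iters=20):
    """Alternating scheme in the spirit of (15)-(16): update binary
    weights from the closed form (12), then solve the weighted least
    squares subproblem (here simply a weighted mean)."""
    m = m0
    for _ in range(iters):
        # weight update (12): zero weight once the squared error
        # exceeds the truncation level c
        v = [1.0 if (yi - m) ** 2 <= c else 0.0 for yi in y]
        if sum(v) == 0:
            break  # every sample flagged as an outlier: keep the iterate
        # weighted least squares step: outliers contribute nothing
        m = sum(vi * yi for vi, yi in zip(v, y)) / sum(v)
    return m

# The outlier 100.0 receives weight 0 and the estimate settles on the
# inlier mean (started from the median, since the problem is nonconvex).
print(reweighted_robust_mean([1.0, 2.0, 3.0, 100.0], c=4.0, m0=2.5))
```

Note how the hard zero/one weights realize exactly the claim above: a sample far from its target is simply dropped from the fit.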
4 Sparse RLSSVM algorithm
In this section, we give the primal RLSSVM and propose the sparse algorithm to obtain the sparse solution of the RLSSVM.
4.1 Primal RLSSVM
If the loss function is convex, as in the LSSVM model (1), then by duality theory the optimal solution can be represented as
(17) $w = \sum_{i=1}^{n} \alpha_i \varphi(x_i),$
where $\alpha = (\alpha_1, \dots, \alpha_n)^{\top} \in \mathbb{R}^n$. If the loss function is nonconvex, strong duality does not hold, hence we cannot obtain (17) by duality. However, by the representer theorem in [Scholkopf2001, Shai2014], it is easy to prove that (17) still holds.
Theorem 1.
The optimal solution of the RLSSVM model (3) admits the representation (17).
Substituting (17) into (5), we get a DC program with respect to $\alpha$ and $b$:
(18) $\min_{\alpha, b} \; J_1(\alpha, b) - J_2(\alpha, b),$
with convex functions $J_1(\alpha, b) = \frac{1}{2}\alpha^{\top}K\alpha + \frac{\gamma}{2}\sum_{i=1}^{n} \ell_1(e_i)$ and $J_2(\alpha, b) = \frac{\gamma}{2}\sum_{i=1}^{n} \ell_2(e_i)$, where $e_i = y_i - K_i\alpha - b$. We call the model (18), or its equivalent form (9), the primal RLSSVM for convenience.
Using the CCCP method in [KuainiWang2014, XiaoweiYang2014, Yuille2003], the solution of problem (18) can be obtained by iteratively solving the following convex QP until convergence:
(19) $\left(\alpha^{(t+1)}, b^{(t+1)}\right) = \arg\min_{\alpha, b} \; \frac{1}{2}\alpha^{\top}K\alpha + \frac{\gamma}{2}\sum_{i=1}^{n}\left(y_i - \delta_i^{(t)} - K_i\alpha - b\right)^2,$
where $\delta^{(t)}$ is the same as in (7) with $w = \sum_{i=1}^{n}\alpha_i\varphi(x_i)$.
However, the computation of $\delta^{(t)}$ is not simple, since $\ell_2$ is nondifferentiable at some points. Inspired by the idea in [ShuishengZhou2013], we smooth $\ell_2$ by the entropy penalty function. Let
(20) $\tilde{\ell}_2(u) = \frac{1}{\mu}\log\!\left(1 + e^{\mu(u^2 - c)}\right),$
then $\tilde{\ell}_2(u) \to \ell_2(u)$ as $\mu \to \infty$. $\tilde{\ell}_2$ is a smooth approximation of $\ell_2$, and the upper bound of the difference between $\tilde{\ell}_2$ and $\ell_2$ is $\log 2 / \mu$. In practice, if we set $\mu$ sufficiently large, the difference between them can be neglected. Fig. 2 shows the comparison between the truncated least squares loss and its smoothed version.
Yang et al. [XiaoweiYang2014] also adopt a smoothing procedure, but their method has to tune the smoothing parameter to get the best effect, which makes the parameter adjustment procedure complex. In comparison, our smoothing strategy based on the entropy penalty function needs no such tuning: all we need to do is set the smoothing parameter in (22) to a large value.
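A hedged sketch of the smoothing idea: the plus function max(t, 0) hidden in the truncated loss is replaced by a log-exp (entropy-type) surrogate whose gap to the true plus function is at most log(2)/mu. The exact form and parameter name used by the paper may differ; the ones below are assumptions consistent with that error bound.

```python
import math

def smooth_plus(t, mu):
    """Log-exp smoothing of max(t, 0): log(1 + exp(mu*t)) / mu.
    The gap to max(t, 0) is largest at t = 0, where it equals log(2)/mu."""
    if mu * t > 30:
        # numerically stable branch for large mu*t: t + log(1+exp(-mu*t))/mu
        return t + math.log1p(math.exp(-mu * t)) / mu
    return math.log1p(math.exp(mu * t)) / mu

def smoothed_truncated_loss(u, c, mu):
    """Truncated loss min(u^2, c) = u^2 - max(u^2 - c, 0), with the
    plus part replaced by its smooth surrogate."""
    return u * u - smooth_plus(u * u - c, mu)

# With mu = 100 the surrogate is within log(2)/100 ~ 0.007 of the
# exact truncated loss everywhere:
print(smoothed_truncated_loss(10.0, c=4.0, mu=100.0))  # close to 4.0
print(smoothed_truncated_loss(0.5, c=4.0, mu=100.0))   # close to 0.25
```

The appeal of this construction, as the text notes, is that the surrogate converges uniformly as the smoothing parameter grows, so no per-dataset tuning of that parameter is needed.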
4.2 Sparse solution for Primal RLSSVM
After computing (22), the variables $\alpha$ and $b$ in (19) are the solutions of the following system of linear equations:
(23) 
At first sight, (23) seems more complicated than (6). However, the coefficient matrix of (6) is a nonsingular symmetric dense matrix, which leads to a nonsparse solution of (6). In contrast, the coefficient matrix of (23) may be low rank if the related kernel matrix is low rank or is approximated by a low-rank matrix. In this situation, (23) may have a sparse solution, which partly overcomes the limitation of the previous methods.
We now derive the sparse optimal solution of (23), assuming the kernel matrix can be approximated by a low-rank matrix.
After a simple calculation, we get the bias $b$ from (23). Eliminating $b$, (23) is simplified to the following linear equation:
(24) 
Nyström approximation is one of the most popular methods for obtaining a low-rank approximation of the kernel matrix (see [Williams2001, Petros2005, Zhang2010, Si2016] and the references therein). The low-rank approximation method itself is not the point of this paper; for simplicity, we employ Zhou's pivoted Cholesky factorization method [sszhou2016]. Let $\mathcal{B}$ denote the index set of the landmark points, $K_{\mathcal{B}}$ be the submatrix of $K$ whose elements are $K_{ij}$ for $i = 1, \dots, n$ and $j \in \mathcal{B}$, and let $K_{\mathcal{B}\mathcal{B}}$ be defined similarly. By the pivoted Cholesky factorization method in [sszhou2016], we can obtain a full column rank matrix $P$ satisfying $K \approx PP^{\top}$ as the best rank-$|\mathcal{B}|$ Nyström-type approximation of $K$ under the trace norm; in the whole process, only the selected columns and the diagonal of the kernel matrix are needed. If the approximation is obtained by some other low-rank method [Williams2001, Petros2005, Zhang2010, Si2016], we let $P$ be the corresponding factor and the following analysis is the same.
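An illustrative greedy pivoted (incomplete) Cholesky factorization of an RBF kernel matrix: a sketch in the spirit of the method described above, not Zhou's exact implementation. All names and the RBF kernel choice are assumptions; note that only the selected kernel columns and the kernel diagonal are ever formed.

```python
import numpy as np

def rbf_kernel_col(X, i, gamma):
    """The i-th column of the Gaussian kernel matrix of X."""
    return np.exp(-gamma * ((X - X[i]) ** 2).sum(axis=1))

def pivoted_cholesky(X, m, gamma=0.5):
    """Greedy pivoted (incomplete) Cholesky: K ~= P @ P.T of rank m,
    touching only m kernel columns plus the kernel diagonal."""
    n = X.shape[0]
    d = np.ones(n)                 # diagonal of an RBF kernel is all ones
    P = np.zeros((n, m))
    pivots = []
    for j in range(m):
        i = int(np.argmax(d))      # pivot with the largest residual diagonal
        pivots.append(i)
        col = rbf_kernel_col(X, i, gamma)
        P[:, j] = (col - P[:, :j] @ P[i, :j]) / np.sqrt(d[i])
        d = np.maximum(d - P[:, j] ** 2, 0.0)  # update residual diagonal
    return P, pivots
```

After m steps the residual vanishes exactly on the pivot columns, so the factorization reproduces the kernel matrix on the landmark set; the trace of the residual diagonal tracks the remaining approximation error.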
Substituting $PP^{\top}$ into (24) in place of $K$, (24) is simplified as:
(25)
where $I$ is an identity matrix of appropriate size. By permuting the rows of the matrix $P$, we may write $P = \begin{bmatrix} P_1 \\ P_2 \end{bmatrix}$, where $P_1$ is a full-rank square matrix ($P_1$ is lower triangular if $P$ is obtained as in [sszhou2016], hence its inverse can be applied at a lower cost), and $P_2$ is comprised of the remaining rows of $P$. Partitioning the variables correspondingly, we have
(26) 
is the sparse solution of (25), where
(27) 
Thus the sparse RLSSVM (SRLSSVM) algorithm is obtained by iteratively performing the following updates:
(28)  
(29) 
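To make the low-rank reduction concrete, the following hedged sketch solves a primal LSSVM-type subproblem after substituting K ≈ P Pᵀ: with the reduced coefficients beta = Pᵀ alpha, the problem shrinks to m + 1 variables, which is where the sparseness comes from. The scaling and names are assumptions for illustration, not the paper's exact system (23).

```python
import numpy as np

def lowrank_primal_lssvm(P, y, gamma):
    """With K ~= P @ P.T (P of size n-by-m, full column rank), the primal
    LSSVM objective (1/2)||beta||^2 + (gamma/2)||P beta + b - y||^2 in the
    reduced coefficients beta = P.T @ alpha and the bias b has the normal
    equations below; only m landmark directions carry coefficients."""
    n, m = P.shape
    A = np.hstack([P, np.ones((n, 1))])   # design matrix [P, 1]
    reg = np.eye(m + 1) / gamma
    reg[-1, -1] = 0.0                     # the bias b is unpenalized
    sol = np.linalg.solve(A.T @ A + reg, A.T @ y)
    return sol[:m], sol[-1]               # (beta, b)

# With P = I and a large gamma the model must (nearly) interpolate:
beta, b = lowrank_primal_lssvm(np.eye(3), np.array([1.0, 2.0, 3.0]), 1e6)
print(beta, b)
```

The linear system here is only (m+1)-dimensional, versus the (n+1)-dimensional dense system of (6), which is the source of the complexity gains claimed for SRLSSVM.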
4.3 Sparse RLSSVM algorithm
From the above analysis, our SRLSSVM algorithm is listed as Algorithm 1.
After obtaining the optimal $\alpha$ and $b$ by Algorithm 1, the decision function for regression is:
(30) $f(x) = \sum_{i=1}^{n} \alpha_i k(x_i, x) + b,$
where only the entries of $\alpha$ indexed by the landmark set are nonzero. For classification, the decision function is $\operatorname{sign}(f(x))$. We give some comments on Algorithm 1.
Comment 1. If the starting point is taken to be zero, the first cycle of Algorithm 1 is equivalent to solving the primal LSSVM (PLSSVM) problem [sszhou2016].
Comment 2. In equation (28), if we set
then and , so . In step 3 of Algorithm 1, we compute the former instead of the latter, and the cost of step 3 is decreased further. The output need only be calculated in the last round.
Comment 3. To improve computational efficiency, equation (28) can be rewritten as:
(31) 
where , , is the sparse solution of the primal LSSVM, is the index set of the nonzero elements of , is comprised of several rows of whose indices correspond to the elements in , and is the vector comprised of the nonzero elements of .
Then steps 2 and 3 in Algorithm 1 can be replaced with the following:
Step 2’: Compute and . Set ;
Comment 4. In Algorithm 1, the truncation parameter $c$ limits the upper bound of the loss function. $c$ should be set neither too large nor too small; an improper $c$ results in poor generalization performance. To overcome the sensitivity of the loss function to $c$, we can tune $c$ as follows. First, set a slightly larger $c$. Then add the following step between step 3 and step 4 of Algorithm 1: reduce $c$ when appropriate, until $c$ reaches a preset minimum value.
4.4 Convergence and Complexity analysis
CCCP is globally or locally convergent; see [Yuille2003, Tao2014, Bharath2012]. Similarly to the convergence proof of DCA (DC algorithm) for general DC programs in [PHAM1997], we have the following lemma.
Lemma 2.
If the optimal value of problem (18) is finite, and the infinite sequences $\{\alpha^{(t)}\}$ and $\{b^{(t)}\}$ are bounded, then every limit point of the sequence $\{(\alpha^{(t)}, b^{(t)})\}$ is a generalized KKT point of (18).
Obviously, the objective functions of (18) and (9) are bounded below. Assume the prediction error variables are bounded, which is reasonable in real applications; then the quantities computed by (22) are bounded. So $\alpha^{(t)}$ and $b^{(t)}$ are also bounded, by the boundedness of the quantities appearing in (28) and (29). By Lemma 2, we get the following theorem.
Theorem 2.
Every limit point of the sequence generated by Algorithm 1 is a generalized KKT point of problem (18).
For Algorithm 1, the computation costs of step 1 and step 2 are both the same as in [sszhou2016]. The complexity of iteratively solving step 3 grows linearly with the total number of SRLSSVM iterations. If we utilize the technique in Comment 3 to compute the update, the complexity of step 3 in Algorithm 1 is reduced further. In comparison, the computational complexities of Wang's and Yang's RLSSVM algorithms in [KuainiWang2014] and [XiaoweiYang2014] both involve the whole $n \times n$ kernel matrix and grow with the number of their iterations. It is clear that our method has a smaller computational complexity than the existing approaches.
Parallel computing potential. In Algorithm 1, some calculations are cheap, so serial computing suffices for them. For the costly calculations, however, parallel computing can further improve efficiency. The main computational cost of Algorithm 1 comes from a large matrix product, which can be implemented in parallel: the matrix can be partitioned into chunks by rows, and the product can then be calculated efficiently by a parallel matrix multiplication over the chunks.
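The chunked structure mentioned above can be illustrated with a row-partitioned Gram product (the names are illustrative, not the paper's matrices): each chunk's partial product is independent of the others, so the chunks can be distributed across workers and their results summed.

```python
import numpy as np

def chunked_gram(P, n_chunks=4):
    """Row-chunked Gram product: P.T @ P = sum_i P_i.T @ P_i, where the
    P_i are row blocks of P. Each partial product is independent, so in
    a parallel setting each block can be handled by a separate worker."""
    chunks = np.array_split(P, n_chunks, axis=0)   # row-wise blocks
    return sum(c.T @ c for c in chunks)            # sum of partial Grams

P = np.random.default_rng(0).standard_normal((1000, 20))
# the chunked product agrees with the direct one:
print(np.allclose(chunked_gram(P), P.T @ P))
```

This serial sketch only demonstrates the decomposition; an actual parallel implementation would dispatch the per-chunk products to threads or processes before the final summation.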
5 Numerical experiments and discussions
To examine the validity of the proposed algorithm, we compare our SRLSSVM with RLSSVMW [KuainiWang2014] (Wang's algorithm for RLSSVM), RLSSVMY [XiaoweiYang2014] (Yang's algorithm for RLSSVM), the classical LSSVM, WLSSVM [J.A.K.Suykens2002], FSLSSVM [Suykens2002] as implemented in the LSSVMlab v1.8 software [Brabanter2011_lssvmtoolbox] (codes available at http://www.esat.kuleuven.be/sista/lssvmlab/), and the SVMs (CSVC for classification and SVR for regression) implemented in the LIBSVM software (codes available at https://www.csie.ntu.edu.tw/~cjlin/libsvm/) on medium-scale data sets. For some large-scale problems, we only compare the proposed algorithm with sparse algorithms, namely PCPLSSVM [sszhou2016] (codes and article can be downloaded from http://web.xidian.edu.cn/sszhou/paper.html), FSLSSVM [Suykens2002], Cholesky with side information (CSI) [Bach_csi2005] (codes available at http://www.di.ens.fr/~fbach/csi/index.html), and CSVC for classification or SVR for regression, since the others cannot be applied in this case.
All computations are implemented in Windows 8 with Matlab R2014a. All experiments are run on a PC with an Intel Core i5-4210U CPU and a maximum of 8 GB of memory available for all processes.
We fixed the values of the smoothing parameter in SRLSSVM and of the stopping criterion, respectively. For all the data sets, we use a cross-validation procedure and grid search to find the best parameter values, including the parameter of the Gaussian kernel function and the smoothing parameter of the method RLSSVMY.
For RLSSVMW and RLSSVMY, the running times in our article are much smaller than those reported in [KuainiWang2014, XiaoweiYang2014] for the same data sets, and the total complexity is reduced accordingly: in our experiments, the coefficient matrix of (6) is decomposed by Cholesky factorization once, and the factorization is reused in every loop.
5.1 Classification experiments
In this section, we test one synthetic classification data set and several benchmark classification data sets to illustrate the effectiveness of SRLSSVM. For the benchmark data sets, each attribute of the samples is normalized, and the data sets are separated into two groups: the medium-size group and the large-scale group. All of them are downloaded from [lib]. The experimental results on the Adult data set show why we separate the data sets into two groups. Finally, we test the robustness of the proposed algorithm on large-scale data with outliers using the CodRNA data set. Outliers are generated by the following procedure: we choose 30% of the samples that are far from the decision hyperplane, then randomly sample 1/3 of them and flip their labels to simulate outliers.
5.1.1 Synthetic classification dataset experiment
To compare the robustness and sparseness of the four algorithms LSSVM, WLSSVM, RLSSVMY and SRLSSVM, we conduct an experiment on a linear binary classification data set with 60 training samples and 100 testing samples. Fig. 3 shows the experimental results. To simulate outliers, we add 4 training samples labeled with the wrong classes; they are marked separately for the positive and negative classes. The best parameter values for this data set are obtained through grid search.
Fig. 3 illustrates that the decision lines of LSSVM and WLSSVM change greatly after adding outliers, and these two methods have lower accuracies than SRLSSVM and RLSSVMY. In contrast, the decision boundaries of SRLSSVM and RLSSVMY are almost unchanged, and the accuracies of these two approaches remain stable before and after adding outliers, so SRLSSVM is insensitive to outliers. Moreover, almost all training samples are SVs for LSSVM, WLSSVM and RLSSVMY, whereas for SRLSSVM the number of support vectors is only 2 both with and without outliers. The proposed algorithm is therefore sparse, which accelerates its training on large-scale problems.
5.1.2 Mediumscale benchmark classification datasets experiments
Data (Train, Test)  Algorithms  c  smooth  Iterations  Training Time(s)  nSVs  Accuracies(%)  
Pendigits  CSVC        0.36(0.02)  433.5(7.7)  99.95(0.001)  
(1466,  LSSVM        0.12(0.01)  1466()  99.26(0.005)  
733)  WLSSVM        0.18(0.01)  1464.6(1.7)  99.92(0.002)  
FSLSSVM        0.19(0.01)  73(0)  99.90(0.001)  
RLSSVMW  1.5    16.1(1.7)  0.22(0.01)  1466(0)  99.37(0.003)  
RLSSVMY  1.5  0.25  12.2(1.5)  0.19(0.01)  1436.8(8.0)  99.09(0.005)  
SRLSSVM  1.5    8.5(0.9)  0.03()  73()  99.96(0.001)  
Protein  CSVC        55.64(0.47)  5486.8(27.1)  77.98()  
(8186,  LSSVM        22.67(0.30)  8185.9(0.32)  78.22(0.002)  
3509)  WLSSVM        27.77(0.30)  8185.7(0.67)  78.24(0.003)  
FSLSSVM        27.55(0.61)  408.1(1.20)  77.00(0.003)  
RLSSVMW  0.8    34.9(4.2)  28.56(1.10)  8184.8(0.92)  77.81(0.023)  
RLSSVMY  0.8  0.7  16.7(3.4)  25.17(0.81)  7876.7(46.6)  78.23(0.004)  
SRLSSVM  0.8    6(0)  11.65()  409()  78.04(0.002)  
Satimage  CSVC        0.25(0.01)  693.5(13.6)  99.86()  
(2110,  LSSVM        0.76(0.01)  2109.6(0.7)  99.23(0.002)  
931)  WLSSVM        0.90(0.02)  1897.5(2.1)  99.91(0.001)  
FSLSSVM        0.50(0.02)  105(0)  97.93(0.008)  
RLSSVMW  0.5    17.1(1.5)  0.88(0.03)  2107.3(1.2)  99.90(0.001)  
RLSSVMY  0.5  0.3  11.7(1.2)  0.87(0.03)  1916.5(15.1)  99.88(0.001)  
SRLSSVM  0.5    6(2.7)  0.27()  105()  99.97(0.001)  
USPS  CSVC        0.92(0.02)  646.9(14.4)  99.34()  
(2199,  LSSVM        2.53(0.01)  2198.7(0.5)  99.34(0.002)  
623)  WLSSVM        2.67(0.01)  1973.8(3.1)  99.49(0.001)  
FSLSSVM        1.21(0.01)  109(0)  98.28(0.006)  
RLSSVMW  1.1    9.3(0.84)  2.60(0.02)  2193.7(2.7)  99.52(0.000)  
RLSSVMY  1.1  0.15  10.2(0.92)  2.60(0.02)  2002.4(14.9)  99.52(0.000)  
SRLSSVM  1.1    6(0.79)  1.91()  108.7()  99.52(0.000)  
Splice  CSVC        0.12(0.01)  820.8(8.3)  76.38()  
(1000,  LSSVM        0.19(0.01)  1000(0)  75.99(0.10)  
2175)  WLSSVM        0.20(0.01)  1000(0)  76.04(0.10)  
FSLSSVM        0.42(0.02)  100(0)  76.66(0.07)  
RLSSVMW  0.9    33.8(7.2)  0.29(0.03)  1000(0)  75.15(0.14)  
RLSSVMY  0.9  0.5  15.1(2.9)  0.22(0.01)  947.6(19.5)  80.50(0.04)  
SRLSSVM  0.9    25.8(7.1)  0.19()  100()  81.27(0.03)  
Mushrooms  CSVC        2.33(0.04)  2244.8(27.1)  99.99()  
(5614,  LSSVM        5.75(0.08)  5415.8(0.4)  98.67(0.002)  
2708)  WLSSVM        7.41(0.19)  4837.8(12.2)  99.90(0.001)  
FSLSSVM        2.69(0.02)  268.9(0.7)  99.66(0.003)  
RLSSVMW  0.6    26.4(4.0)  7.57(0.22)  5381.8(6.6)  99.97(0.000)  
RLSSVMY  0.6  0.3  22.7(4.2)  7.48(0.18)  4928(16.9)  99.71(0.001)  
SRLSSVM  0.6    6(0)  1.28()  270()  100(0) 

Pendigits is a pen-based handwritten digit recognition data set with digits 0 to 9. We only classify digit 3 versus digit 4 here.

Protein is a multiclass data set with 3 classes. Here a binary classification problem is trained to separate class 1 from class 2.

Satimage comprises 6 classes. Here the task of classifying class 1 versus class 6 is trained.

USPS is a multiclass data set with 10 classes. Here a binary classification problem is trained to separate class 1 from class 2.
Table 1: Comparison of the numbers of iterations, training time (seconds), mean numbers of support vectors (denoted by nSVs) and accuracies (%) of different algorithms on benchmark classification data sets with outliers (10%). Standard deviations are given in brackets. A dash means the parameter is not used by the method. The best values are highlighted in bold.
Table 1 reports the data information, optimal parameters and experimental results for the medium-scale classification data sets with outliers. The best results are highlighted in bold. In Table 1, we use the same setting for SRLSSVM and FSLSSVM on all data sets except Splice. All the algorithms are run 10 times independently to obtain unbiased results.
Regarding accuracy, Table 1 shows that the proposed SRLSSVM has higher accuracies than the other approaches on most data sets. As for training time, our method is faster than all the other approaches except CSVC. CSVC trains quickly on some medium-scale data sets, but on the larger ones, such as Protein and Mushrooms, it runs slower than SRLSSVM, and its accuracies are also lower than those of SRLSSVM.
In terms of sparseness, SRLSSVM and FSLSSVM need far fewer support vectors than the other approaches; in other words, these two methods are sparse. But the accuracy of FSLSSVM is lower than that of SRLSSVM, and FSLSSVM spends more time than SRLSSVM on all data sets. CSVC also displays sparsity, but its number of support vectors is much larger than those of SRLSSVM and FSLSSVM, partly because outliers exist in the training set.
As for the number of iterations needed to solve the nonconvex RLSSVM program, SRLSSVM needs fewer iterations than RLSSVMW and RLSSVMY to converge to the optimal solution.
5.1.3 Adult data set experiments
To investigate the performance of each algorithm on data sets of different sizes, we randomly choose 4000, 8000, 10000, 15000, 20000 and all 32561 training samples from the training set of the Adult data set [lib]. The test set size is 16281.
Fig. 4 shows the experimental results of all the approaches on the data sets with outliers; the horizontal axis is logarithmic. As to accuracy, SRLSSVM, CSVC and FSLSSVM generally perform better than the other methods, and our method SRLSSVM performs best. In addition, from Fig. 4 we can conclude that every algorithm runs fast on the medium-scale training sets, especially those with fewer than 8000 samples. However, when the training set size exceeds 20000, LSSVM, WLSSVM, RLSSVMW and RLSSVMY cannot run on our computer due to insufficient memory. Therefore, on the large-scale benchmark data sets we do not compare SRLSSVM with LSSVM, WLSSVM, RLSSVMW and RLSSVMY. Moreover, Fig. 4 also shows that the training time of CSVC increases rapidly as the training set grows.
5.1.4 Largescale benchmark classification datasets experiments
Table 2 reports the data information, optimal parameters and experimental results for the large-scale data sets with outliers (10%). We compare our SRLSSVM with other sparse algorithms. For the Skinnonskin data set, we randomly select 2/3 of the data as training samples and the rest as testing samples; for the others, we use the default setting in [lib]. All the algorithms are run 5 times independently to obtain unbiased results for each data set. The best results are highlighted in bold.
Data Sets (train, test)  Algorithms  c  Training Time(s)  nSVs  Accuracies(%)  
Skinnonskin  CSVC    1329.4()  59910()  99.30()  
(163371, 81686)  FSLSSVM    42.6(0.7)  199.2(0.8)  99.82(0.000)  
CSI    42.9(0.7)  100(0)  99.84(0.000)  
PCPLSSVM    32.7(0.6)  400(0)  99.86(0.000)  
SRLSSVM  1.5  34.8()  400()  99.86(0.000)  
IJCNN1  CSVC    84.3()  17103()  92.16()  
(49990,91701)  FSLSSVM    34.9(0.8)  398.8(1.0)  94.17(0.014)  
CSI    63.0(1.2) 