Mixed Integer Linear Programming for Feature Selection in Support Vector Machine

08/07/2018 · by Martine Labbé et al.

This work focuses on the support vector machine (SVM) with feature selection. A MILP formulation is proposed for the problem. The choice of suitable features to construct the separating hyperplane is modelled in this formulation by including a budget constraint that sets in advance a limit on the number of features to be used in the classification process. We propose both an exact and a heuristic procedure to solve this formulation efficiently. Finally, the model is validated by testing it on several well-known datasets and comparing it with classical classification methods.


1 Introduction

In supervised classification, we are given a set of objects partitioned into classes, and the goal is to build a procedure for classifying new objects into these classes. This type of problem is faced in many fields, including insurance (to determine whether an applicant is a high insurance risk), banking (to decide whether an applicant is a credit risk), medicine (to determine whether a tumor is benign or malignant), etc. This wide field of application has attracted the attention of researchers from different areas. Currently, these problems are analyzed from different perspectives: artificial intelligence, machine learning, optimization or statistical pattern recognition, among others. In this paper we analyze these problems from the point of view of optimization and, more specifically, from the perspective of Mathematical Programming (Mangasarian, 1965, 1968).

In the partitioning process, the objects are considered as points in an n-dimensional feature space. However, the number of features is often much larger than the number of objects in the sample. Handling such a high number of features can therefore be a difficult task and, in addition, the interpretation of the results may become impossible. In this sense, feature selection consists of eliminating as many features as possible in a given problem while keeping an acceptable accuracy, so that spurious random variables are removed. Indeed, having a minimal number of features often leads to better generalization and to simpler models that can be interpreted more easily.

The support vector machine (SVM) is a mathematical programming approach to classification developed by Vapnik (1998) and by Cortes and Vapnik (1995). It has been widely studied and has become popular in many fields of application in recent years; see the introductory description of SVM by Burges (1998). The SVM is based on margin maximization, which consists of finding the separating hyperplane that is farthest from the closest objects. SVM has proven to be a very powerful tool for supervised classification. Recently, Maldonado et al. (2014) proposed two SVM-based models in which feature selection is taken into account by introducing a budget constraint in the formulation, limiting the number of features used in the model; see also Aytug (2015) and Gaudioso et al. (2017).

In this work, we propose a MILP formulation, based on the idea of Maldonado et al. (2014), to choose the best features and to obtain an adequate predictor. To solve this model efficiently, we exploit the tightening of the bounds on the separating hyperplane coefficients, which yields better solution times. Exact and heuristic solution approaches using these improvements are also presented. Lastly, the model is validated by comparing the proposed formulation with other models and with other feature selection techniques known in the literature, such as Recursive Feature Elimination (RFE) and the Fisher Criterion Score (F); see Guyon et al. (2002) and Guyon et al. (2006), respectively.
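As a point of comparison, the Fisher Criterion Score mentioned above ranks each feature independently by how well it separates the two classes. The sketch below (plain Python; the function name and the small smoothing constant are our own choices, not taken from the cited papers) computes one common form of the score, (μ₊ − μ₋)² / (σ₊² + σ₋²), per feature:

```python
def fisher_scores(X, y):
    """Fisher criterion score per feature: (mu+ - mu-)^2 / (s+^2 + s-^2).

    X: list of samples (each a list of feature values); y: labels in {+1, -1}.
    Features with a larger score separate the two classes better; a filter
    method keeps the top-ranked ones.  The 1e-12 term avoids division by zero
    and is our own smoothing choice.
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    scores = []
    for j in range(len(X[0])):
        p = [x[j] for x in pos]
        q = [x[j] for x in neg]
        mp, mq = sum(p) / len(p), sum(q) / len(q)
        vp = sum((v - mp) ** 2 for v in p) / len(p)
        vq = sum((v - mq) ** 2 for v in q) / len(q)
        scores.append((mp - mq) ** 2 / (vp + vq + 1e-12))
    return scores
```

Unlike the embedded approach studied in this paper, this score ignores interactions between features, which is precisely the limitation budget-constrained SVM models try to overcome.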

The paper is organized into 8 sections. Section 2 reviews various SVM-based formulations analyzed in the literature. Section 3 presents the model under study. In Section 4, several strategies to fix some of the “big M” parameters of the model are considered. Sections 5 and 6 develop heuristic and exact solution approaches, respectively. Section 7 presents computational results that illustrate the improvements analyzed in the paper. Section 8 is devoted to the validation of the model. Finally, some conclusions are drawn.

2 Support Vector Machine

Consider a training set Ω partitioned into two classes, where each object i is represented by a pair (x_i, y_i): x_i ∈ R^n contains the values of the n features analyzed over each element of Ω, and y_i provides the label, −1 or +1, associated with the two classes in Ω. The SVM determines a hyperplane that optimally separates the training examples. In the case of linearly separable data, this hyperplane maximizes the margin between the two data classes, i.e., it maximizes the distance between two parallel hyperplanes supporting some elements of the two classes. If the training data are not linearly separable, the constructed hyperplane instead minimizes classification errors. Thus, the classical SVM model minimizes an objective function that is a compromise between the structural risk, given by the inverse of the margin, and the empirical risk, given by the deviations of misclassified objects. Several SVM models have been proposed using different measures of margin and deviation. Among them, the standard ℓ2-SVM (Bradley and Mangasarian (1998)) uses the following formulation:

$$\min_{w,b,\xi}\ \frac{1}{2}\,\|w\|_2^2 + C\sum_{i=1}^{m}\xi_i$$
$$\text{s.t. } y_i\big(w^{\top}x_i+b\big) \ge 1-\xi_i, \quad i=1,\ldots,m, \qquad (2)$$
$$\xi_i \ge 0, \quad i=1,\ldots,m.$$

As can be observed, the ℓ2-SVM uses the ℓ2-norm to measure the margin and introduces the slack variables ξ_i to measure the deviations of misclassified elements. Additionally, a penalty parameter C that regulates the trade-off between structural and empirical risk is added. Constraints (2) are the main restrictions appearing in classical SVM: they determine whether or not the training data are separated by the classifier hyperplane.
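To make the trade-off concrete, the following sketch evaluates the ℓ2-SVM objective for a given hyperplane (w, b): the structural term ½‖w‖² plus C times the hinge deviations ξ_i = max(0, 1 − y_i(wᵀx_i + b)). This only illustrates the objective in (2); it is not a solver, and the names are ours:

```python
def svm_objective(w, b, X, y, C=1.0):
    """l2-SVM objective: 0.5*||w||^2 + C * sum of hinge deviations xi_i,
    where xi_i = max(0, 1 - y_i * (w . x_i + b)) is the slack of sample i."""
    structural = 0.5 * sum(wj * wj for wj in w)
    empirical = 0.0
    for x, label in zip(X, y):
        margin = label * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        empirical += max(0.0, 1.0 - margin)  # deviation of a misclassified
    return structural + C * empirical        # or margin-violating sample
```

A point with functional margin at least 1 contributes nothing to the empirical term; the parameter C decides how heavily violations are penalized against a wider margin.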

Bradley and Mangasarian (1998) also presented the same model but considering the ℓ1-norm instead of the ℓ2-norm for the margin. The resulting model is the following:

$$\min_{w,b,\xi}\ \|w\|_1 + C\sum_{i=1}^{m}\xi_i \quad \text{s.t. } y_i\big(w^{\top}x_i+b\big) \ge 1-\xi_i,\ \ \xi_i \ge 0,\ \ i=1,\ldots,m.$$

An equivalent linear formulation of this problem is:

$$\min\ \sum_{j=1}^{n}\beta_j + C\sum_{i=1}^{m}\xi_i$$
$$\text{s.t. } y_i\big(w^{\top}x_i+b\big) \ge 1-\xi_i, \quad i=1,\ldots,m,$$
$$-\beta_j \le w_j \le \beta_j, \quad j=1,\ldots,n, \qquad (4)$$
$$\xi_i \ge 0, \quad i=1,\ldots,m.$$

Due to constraints (4), the variables β_j represent the absolute values of the hyperplane coefficients w_j.

Often, real data are composed of few sample elements (small m), each with a large number of related features (large n). Therefore, it is essential to select a suitable set of features to construct the classifier. Among the different techniques for feature selection, we focus on embedded methods, which perform feature selection at the same time as the classifier is constructed. Specifically, we focus on SVM models that include feature selection constraints.

Maldonado et al. (2014) proposed a model inspired by the ℓ1-SVM in which feature selection is introduced through a budget constraint. Unlike the ℓ1-SVM, the objective function of this model does not consider the margin, i.e., the model focuses on minimizing the sum of the deviations. Although structural risk is not explicitly included in the objective function, it is, in some way, implicitly under control because the number of non-null w-variables is bounded by the budget constraint. The model uses a binary variable linked to each feature in order to restrict, via the budget constraint, the number of attributes used in the classifier. A cost vector c is considered, where c_j is the cost of acquiring attribute j, j = 1,…,n. The formulation is therefore given by

(MILP1)

$$\min\ \sum_{i=1}^{m}\xi_i$$
$$\text{s.t. } y_i\big(w^{\top}x_i+b\big) \ge 1-\xi_i, \quad i=1,\ldots,m,$$
$$l_j v_j \le w_j \le u_j v_j, \quad j=1,\ldots,n,$$
$$\sum_{j=1}^{n} c_j v_j \le B, \qquad (7)$$
$$\xi_i \ge 0,\ \ v_j \in \{0,1\}.$$

The linking constraints couple the v- and w-variables and enable the identification of the w-variables which are non-null, i.e., w_j can be non-null only if v_j takes value 1. In fact, these are big M constraints, where l_j and u_j correspond to lower and upper bounds on the value of w_j, j = 1,…,n. As previously mentioned, constraint (7) is the budget constraint that limits the number of non-null w-variables. Thus, an important issue for solving this model is the appropriate choice of these bounds, because the efficiency of any enumerative solution approach greatly depends on the tightness of the model’s LP relaxation.
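The interplay between the big M linking constraints and the budget constraint (7) can be illustrated by a simple feasibility check: a nonzero coefficient w_j forces v_j = 1, and the budget then caps the total (weighted) number of selected features. A minimal sketch with illustrative names:

```python
def check_budget_feasible(w, v, l, u, c, B):
    """Check the big-M linking constraints l_j*v_j <= w_j <= u_j*v_j and
    the budget constraint sum_j c_j*v_j <= B, with v_j binary.  When
    v_j = 0 the linking constraints collapse to w_j = 0, so any nonzero
    hyperplane coefficient consumes part of the budget."""
    for wj, vj, lj, uj in zip(w, v, l, u):
        if vj not in (0, 1):
            return False
        if not (lj * vj <= wj <= uj * vj):
            return False
    return sum(cj * vj for cj, vj in zip(c, v)) <= B
```

This also shows why tight bounds matter: the LP relaxation replaces v_j ∈ {0,1} by 0 ≤ v_j ≤ 1, and loose values of l_j, u_j let tiny fractional v_j satisfy the linking constraints almost for free.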

A possible criticism of this model is that, when for a particular value of the budget there are many optimal solutions with objective value 0, the model does not provide a way to choose among those hyperplanes. This situation occurs very often in datasets with many features and very few objects.

3 The model

Based on the idea introduced by Maldonado et al. (2014), we propose the extension of the ℓ1-SVM with a budget constraint, i.e., our model takes into account both the structural and the empirical risk in the objective function while performing feature selection through a budget constraint, thereby avoiding the above-mentioned criticism of MILP1. Maldonado et al. (2014) focused mainly on validating their model by contrasting it against well-known classification methods from the literature, and little attention was paid to solving the problem efficiently. Our goal in this paper, in contrast, is to provide a deep analysis of the model, allowing us to produce efficient exact and heuristic solution approaches, in addition to validating the model by comparing it with classical classification methods. Hence, the model that we propose is given by

In contrast to MILP1, the presented formulation considers the margin in the objective function. Thus, the problem looks for an optimal balance between the deviations and the margin, measured with the ℓ1-norm. In what follows and for the sake of clarity, analogously to Maldonado et al. (2014), we will assume that each cost c_j in (7) is set to 1, so the budget B represents the maximum number of features that can be selected. An equivalent formulation for this model is obtained by decomposing each unrestricted variable w_j into two non-negative variables, w_j^+ and w_j^-, with w_j = w_j^+ − w_j^- for j = 1,…,n. Taking advantage of this definition, in any optimal solution at most one of the two variables w_j^+, w_j^- is non-zero, since their sum is part of the objective function to be minimized. Consequently, the following formulation is obtained:

(FS-SVM)

$$\min\ \sum_{j=1}^{n}\big(w_j^{+}+w_j^{-}\big) + C\sum_{i=1}^{m}\xi_i$$
$$\text{s.t. } y_i\Big(\sum_{j=1}^{n}\big(w_j^{+}-w_j^{-}\big)x_{ij}+b\Big) \ge 1-\xi_i, \quad i=1,\ldots,m,$$
$$w_j^{+} \le u_j v_j, \qquad w_j^{-} \le -\,l_j v_j, \quad j=1,\ldots,n,$$
$$\sum_{j=1}^{n} v_j \le B, \qquad (14)$$
$$\xi_i \ge 0,\ \ w_j^{+}, w_j^{-} \ge 0,\ \ v_j \in \{0,1\}.$$

Note that the FS-SVM formulation presents a feature selection constraint (14) that limits the number of features selected to construct the separating hyperplane. Additionally, the constraints linking w_j^+ and w_j^- with the binary variables v_j are two sets of big M constraints.
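The decomposition behind FS-SVM can be illustrated directly: writing w_j = w_j^+ − w_j^- with both parts non-negative and at most one of them nonzero gives w_j^+ + w_j^- = |w_j|, so the ℓ1-norm of w becomes a linear objective. A minimal sketch:

```python
def split_signs(w):
    """Decompose each coefficient w_j = w_j^+ - w_j^- with w^+, w^- >= 0.
    Taking the positive and negative parts makes exactly one of the pair
    nonzero per coefficient, so sum(w^+) + sum(w^-) equals ||w||_1 and the
    nonlinear absolute value disappears from the objective."""
    w_plus = [max(wj, 0.0) for wj in w]
    w_minus = [max(-wj, 0.0) for wj in w]
    return w_plus, w_minus
```

In the MILP itself this complementarity is not imposed explicitly; it holds automatically at optimality because any common positive amount in w_j^+ and w_j^- could be subtracted from both, reducing the objective.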

A preliminary computational study to check how difficult it is to solve the aforementioned mixed integer linear formulation of FS-SVM with very conservative big M values for u_j and l_j shows that the formulation’s performance is not very good (see the columns FS-SVM of Table 7.2 for the different data sets). This encourages us to check whether strengthening the big M values might improve these computational results. In the following sections, we analyze the influence of tightening the bounds u_j and l_j of the w-variables on solving the model, and we develop different methodologies to obtain better bounds.

In addition, we also studied an alternative formulation of FS-SVM obtained by substituting conditional constraints for the big M constraints and implementing them with the CPLEX IloIfThen construct; however, we have omitted it because the computational times were very poor. Moreover, this preliminary computational analysis showed that the solution times of MILP1 and FS-SVM are similar. Only when the optimal value of MILP1 is 0 is that model much faster, but in those cases MILP1 is useless, because the data are separable for the chosen features and many separating hyperplanes can be equally valid.

4 Strategies for obtaining tightened values of the bounds

As mentioned above, in terms of developing good solution approaches to our problem, it is useful to provide tightened values of the upper/lower bounds of w_j for j = 1,…,n. It should be noted that the literature contains various methods related to bound reduction of variables in SVM. In particular, two methods are developed in Belotti et al. (2016). One of them was the origin of a CPLEX parameter, and the other is based on an iterative process that solves auxiliary MIPs to strengthen the big M values associated with certain variables. In our preliminary computational analysis we checked this parameter and it did not improve our computational results. The second approach in Belotti et al. (2016) solves a sequence of MIPs, two for each variable whose bounds are tightened; they applied it to instances with few such variables, solving four MILPs in each iteration. For the datasets analyzed in this paper, with a large number of features, this approach is impractical.

For FS-SVM, we develop two strategies to compute the bounds of w_j for any j. The first strategy is based on solving linear problems whose optimal values yield the lower/upper bounds of the variables, and the second one uses Lagrangian relaxation to tighten the bounds. In what follows, we will denote the linear relaxation of FS-SVM as LP-FS-SVM.

4.1 Strategy I

Given a subset F of features, we will denote by FS-SVM(F) the restricted problem below, derived from the original FS-SVM:

Note that in this problem only the variables w_j^+, w_j^- and v_j with j ∈ F are considered. This is equivalent to the FS-SVM where v_j = 0 for j ∉ F. Consequently, the solution of this problem is feasible for the original problem, and its objective value, called UB, is an upper bound for our model. Solving FS-SVM(F) will be necessary in the application of Strategy I, and it will also be used in the heuristic approach, as we will see in Section 5. The process given by Strategy I is described in Algorithm 1.

Data: Training sample composed by a set of elements with features.
Result: Updated values of upper bounds parameters for .
/* Step 1 */
Let an optimal solution of LP-FS-SVM be computed, and let F be the set of features with a positive v-value in that solution. Solve the restricted problem FS-SVM(F) to obtain UB.
/* Step 2 */
for j = 1 to n do
       Solve the following linear programming problem for feature j: maximize w_j^+ + w_j^- subject to the constraints of LP-FS-SVM together with the constraint that the objective function of LP-FS-SVM does not exceed UB.
Let ū_j be the optimal value of the above problem.
       if ū_j < u_j then
            u_j = ū_j, l_j = −ū_j.
      
Algorithm 1 Strategy I

Remark 4.1

Note that the bounds obtained by Algorithm 1 could be improved by modifying constraint (14) in the LP solved for each j = 1,…,n. However, a preliminary computational analysis showed that the improvement in the quality of the bounds is very small, while the running times increase when using this modification. For this reason, we decided to keep constraint (14) unchanged.

4.2 Strategy II

Unlike the previous strategy, in which bounds for w_j were computed, this strategy provides bounds for w_j^+ and w_j^- independently. In this case, the strategy is based on the results below.

Theorem 4.1

Let be an optimal solution of LP-FS-SVM; its objective value; a vector of optimal values for the dual variables associated with the constraints (14); and for some .

  • If is an optimal solution of LP-FS-SVM restricting where is a positive constant, its objective value and , then

  • If is an optimal solution of LP-FS-SVM restricting with a positive constant, its objective value and , then

Proof:

We address only statement i) here; statement ii) can be proved in a similar manner. Since we have a vector of optimal values for the dual variables associated with the family of constraints (14), it holds that

In addition, since , we have that

(15)

On the other hand, consider the LP-FS-SVM with the additional constraints , and where the family of constraints (14) has been dualized, i.e.,

where . Hence, this problem can be rewritten as follows,

Note that an optimal solution of LP-FS-SVM is feasible for the problem above whenever it satisfies the additional constraint. In addition, any feasible solution of the problem Lag-FS-SVM is feasible for the LP-FS-SVM where the family of constraints (14) has been dualized.

Hence, using (15), the optimal objective value of the above problem provides a lower bound of the optimal value of the LP-FS-SVM with the additional constraint.


Corollary 4.1

Under the hypothesis of Theorem 4.1, if we have an upper bound UB of FS-SVM, then it holds that

A detailed description of this second strategy can be found in Algorithm 2.

Data: Training data ( elements features).
Result: Tightened bounds of and .
/* Step 1 */
Solve the LP-FS-SVM, obtaining its optimal value, an optimal solution, and the dual variables associated with the family of constraints (14).
/* Step 2 */
Let UB be an upper bound of our original model (recall that an upper bound was computed in Strategy I).
Compute the candidate upper bound for w_j^+ given by Corollary 4.1,
if it improves the current bound then
       update the bound.
Compute the candidate upper bound for w_j^- given by Corollary 4.1,
if it improves the current bound then
       update the bound.
Algorithm 2 Strategy II
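The sketch below illustrates the generic reduced-cost argument behind Strategy II, under the assumption (ours, since the exact statement of the theorem is not reproduced above) that the dual value λ_j gives a per-unit lower bound on the increase of the LP optimum when w_j^+ is pushed beyond its LP value: any value further than (UB − z_LP)/λ_j above the LP value would force the objective past the incumbent UB, so the big M bound can be shrunk. All names are illustrative:

```python
def tighten_upper_bound(u_j, w_j_lp, z_lp, ub, dual_j):
    """Strategy-II-style bound update (sketch).  Assumes a sensitivity
    property: raising w_j^+ by t above its LP value w_j_lp increases the LP
    optimum z_lp by at least dual_j * t (dual_j > 0).  Then w_j^+ cannot
    exceed w_j_lp + (ub - z_lp)/dual_j in any solution better than ub, so
    the big-M bound u_j may be reduced to that quantity."""
    if dual_j <= 0:
        return u_j  # no sensitivity information; keep the old bound
    candidate = w_j_lp + (ub - z_lp) / dual_j
    return min(u_j, candidate)
```

The quality of the resulting bound improves both with a smaller gap (ub − z_lp) and with a larger dual value, which matches the observation that the strategies work best once a good UB is available.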

5 Heuristic Solution Approach: Kernel Search

Among the characteristics of the presented model, we must point out that each data feature j has an associated binary variable v_j indicating whether or not feature j is selected to construct the classifier. Therefore, the size of the problem, and consequently the time required for solving it, grows with the number of features. SVM usually works with real data involving quite a large number of features. Hence, a heuristic approach suited to the model will help us to quickly find good solutions for those cases where exact methods cannot provide solutions within an acceptable time.

Specifically, we adapt the Kernel Search (KS) proposed by Angelelli et al. (2010). The basic idea of this heuristic approach is to solve a sequence of restricted MILPs derived from the original problem, thus obtaining progressively better bounds on the solution. The KS has been successfully applied to different kinds of problems, such as portfolio optimization (Angelelli et al. (2012)) and location problems (Guastaroba and Speranza (2012)). Even though it was originally applied to pure binary formulations, it has also been used in problems with several continuous or integer variables associated with each binary variable.

Regarding our model, we observed that the continuous variables w_j^+ and w_j^- are related to the binary variables v_j by the big M constraints. By applying this heuristic approach to our problem, we solve a sequence of MILP problems with the same structure as the original one, but considering only a subset of the v-variables together with the corresponding continuous variables w_j^+ and w_j^-. Since the restricted MILPs only take into account a subset of v-variables, i.e., the remaining v-variables are fixed to 0, they will hopefully provide upper bounds in acceptable times.

In the KS, each restricted MILP of the sequence considers the v-variables that are most likely to take a value different from 0 in the optimal solution of the original problem. These variables are called promising variables, and they form the Kernel set of each restricted MILP. Detailed below is the complete KS procedure for our SVM model, including how to select the promising variables at each step and how to modify the Kernel.

5.1 Initial step

First, the feature set must be sorted according to how likely the corresponding v-variables are to take the value 1 in the optimal solution. The LP-FS-SVM is solved with this aim in mind, obtaining a solution and the reduced costs of the variables w_j^+ and w_j^- for each j. Then, features are sorted in non-decreasing order with respect to the vector σ, which is defined as:

(16)

where the components of σ are computed from the reduced costs of the variables w_j^+ and w_j^- in the LP-FS-SVM, for j = 1,…,n.

To obtain the initial Kernel set (K₀), the first features in this non-decreasing order with respect to σ are chosen. The size of K₀ is a parameter of the heuristic that can be modified.

Similar to Guastaroba and Speranza (2012), the remaining features are divided into subsets. Each subset is composed of a fixed number of features, with the last one containing the remaining features. In fact, we restrict the KS to analyze only a portion of the subsets, due to the size of the instances considered; computational experiments have shown good results when exploring a fraction of the total number of subsets.

Given the initial Kernel K₀, the upper bound (UB) of the problem is initialized by solving FS-SVM(K₀). Note that, as observed in Section 4, FS-SVM(K₀) is equivalent to solving the original problem setting v_j = 0 for j ∉ K₀. We should point out that any solution of FS-SVM(K₀) is always a feasible solution for FS-SVM, so solving FS-SVM(K₀) provides an upper bound.

5.2 Main step

In each iteration i, the heuristic considers the set of features given by the combination of the current Kernel and the features in the i-th subset. To update the UB, in each iteration the restricted problem over this combined set is solved with the following two additional constraints; the resulting problem is denoted FS-SVM(·)(17)(18):

$$\sum_{j=1}^{n}\big(w_j^{+}+w_j^{-}\big) + C\sum_{i=1}^{m}\xi_i \le \mathrm{UB}, \qquad (17)$$
$$\sum_{j \in \text{$i$-th subset}} v_j \ge 1. \qquad (18)$$

Constraint (17) restricts the objective function to take a value smaller than or equal to the current upper bound, and constraint (18) ensures that at least one feature belonging to the i-th subset is chosen. We also impose a time limit on the solution of each problem. If no feasible solution can be found within this time limit, the algorithm skips to the next iteration. Note that this problem may be infeasible due to the presence of constraints (17) and (18) together in the formulation. Otherwise, if FS-SVM(·)(17)(18) is feasible, its objective value will be at most the previous UB because of constraint (17).

5.3 Update step

If the problem FS-SVM(·)(17)(18) (i.e., FS-SVM(·) where constraints (17) and (18) have been added) is feasible, then some features from the analyzed subset are chosen in its optimal solution. They are added to the current Kernel for the next iteration, since adding these features yields an identical or better upper bound. Conversely, features of the Kernel that have not been chosen in the optimal solutions of previous iterations are removed from it. Removing features from the Kernel is essential to keep the number of binary variables considered in each FS-SVM(·)(17)(18) from growing excessively. In our case, we remove the features that were not selected in the previous two iterations. The resulting Kernel for the next iteration is obtained by adding the set of added features and deleting the set of removed features.

Conversely, if the problem is infeasible, the Kernel is not modified and the procedure skips to the next iteration. The KS for the FS-SVM model is also described in Algorithm 3.

Data: Training data composed by a set of elements with features.
Parameter is initially fixed as described in Subsection 5.1.
Result: A feasible solution of FS-SVM model.
/* Initial Step */
Solve LP-FS-SVM. Sort the features in non-decreasing order with respect to the vector σ defined in (16).
Build the initial Kernel K₀ taking the first ordered features. Set the current Kernel to K₀.
Divide the remaining sorted features into a sequence of subsets.
Select the number of subsets to analyze.
Solve FS-SVM(K₀), obtaining the initial upper bound (UB).
for each subset to be analyzed do
       /* Main Step */
       Build the union of the current Kernel and the subset.
       Solve FS-SVM(·)(17)(18).
       /* Update Step */
       if FS-SVM(·)(17)(18) is feasible and solved within the time limit then
            Update UB with its optimal value.
             Build the set of features to be added to the Kernel.
             Build the set of features to be removed from the Kernel.
             Update the Kernel.
            
      Move to the next subset.
      
Algorithm 3 Kernel Search for FS-SVM
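The loop of Algorithm 3 can be sketched as follows. This is a simplified reading, not the paper's exact procedure: solve_restricted stands in for solving the restricted problem with constraints (17) and (18) (returning None on infeasibility or timeout), and Kernel features unused in two consecutive iterations are dropped, as in the update step. All names and the toy interface are our own:

```python
def kernel_search(features, score, solve_restricted,
                  kernel_size, bucket_size, max_buckets):
    """Kernel Search skeleton.  `score` is the sigma-like ranking (smaller =
    more promising); `solve_restricted(candidates, incumbent)` returns
    (objective, selected_set) or None if infeasible / over the time limit."""
    order = sorted(features, key=lambda j: score[j])
    kernel = order[:kernel_size]
    rest = order[kernel_size:]
    buckets = [rest[i:i + bucket_size] for i in range(0, len(rest), bucket_size)]
    best_obj, best_sel = solve_restricted(kernel, None)  # initial UB
    unused = {j: 0 for j in kernel}                      # idle counters
    for bucket in buckets[:max_buckets]:
        result = solve_restricted(kernel + bucket, best_obj)
        if result is None:            # infeasible or timed out: skip bucket
            continue
        obj, selected = result
        if obj <= best_obj:           # constraint (17) guarantees obj <= UB
            best_obj, best_sel = obj, selected
        added = [j for j in bucket if j in selected]
        for j in kernel:
            unused[j] = 0 if j in selected else unused[j] + 1
        removed = {j for j in kernel if unused[j] >= 2}  # idle twice: drop
        kernel = [j for j in kernel if j not in removed] + added
        for j in added:
            unused[j] = 0
    return best_obj, best_sel, kernel
```

With a toy solver that picks the cheapest single feature among the candidates, the search recovers a good feature even when the initial ranking places it in a late bucket, which is exactly the behaviour the bucket scan is designed for.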

6 Exact procedure

This section is devoted to the description of a procedure to obtain an optimal solution of FS-SVM by solving a sequence of semi-relaxed problems. In this exact procedure, each semi-relaxed problem is associated with a subset of features, in such a way that only the v-variables with indices in that subset are considered binary, the remaining ones being relaxed. Specifically, the semi-relaxed version of the problem for a set of features is formulated as follows:

(SR-FS-SVM) — the FS-SVM formulation in which v_j ∈ {0,1} is kept only for the features of the given subset, while 0 ≤ v_j ≤ 1 for the remaining features. (20)

The optimal value of the semi-relaxed problem provides a lower bound of FS-SVM. By adding and removing certain features of the binary subset, a sequence of semi-relaxed problems is created, providing progressively better lower bounds. As we will see in the following subsections, Strategies I and II and the Kernel Search are also used in this procedure.

6.1 Initial Step

To obtain initial bounds on the objective value, the exact procedure exploits the techniques detailed previously. First, Strategies I and II are used to tighten the parameters u_j and l_j. It should be noted that the use of these strategies provides an initial upper bound (UB) and an initial lower bound (LB) for the objective value (by solving the linear relaxation). The Kernel Search is then performed in order to improve the UB given by the strategies.

6.2 Main Step

The main step of the exact procedure consists of solving a sequence of semi-relaxed problems to improve the lower bound of the objective value. To start, we must select a subset of features whose associated v-variables will be considered binary in the first semi-relaxed problem. The Kernel Search provides a subset of features that allows us to obtain a good bound on the optimal objective value. Therefore, the exact procedure considers the set provided by the heuristic as the initial binary subset and obtains an initial LB by solving the corresponding semi-relaxed problem.

Then, the binary subset is updated by adding and removing some features, improving the bound on the objective value. To this end, two sets are built in each iteration: the first consists of features whose associated variables will be considered binary in the next iteration, i.e., features that will be added to the binary subset; the second consists of features of the current subset that will no longer be considered binary. In addition, and if possible, we update the UB in the main step.

A general outline of the exact procedure is shown in Algorithm 4. Since the binary subset can be modified using different rules to improve the lower bounds, we provide three different update variants of this procedure in Algorithms 4.5, 4.6 and 4.7. In Variant I (Algorithm 4.5), the subset is updated by adding features according to the vector σ, sorted in non-decreasing order as described in the previous section. This ordering is based on the idea that the features with the biggest reduced costs of the variables w_j^+ and w_j^- are the least likely to be different from 0 in the optimal solution of FS-SVM, while the features with a positive v-value in the LP are the most likely to be different from 0.

Data: Training data composed by a set of elements with features.
Result: Optimal objective value or accurate upper and lower bounds.
/* Initial Step */
Run strategies I and II to tighten and and to obtain initial LB and UB.
Run the Kernel Search to obtain UB.
if the upper bound obtained by the Kernel Search improves UB then
       update UB and run strategies I and II again.
/* Main Step */
Let the initial binary subset be the final set obtained using the Kernel Search.
while the stopping criterion is not met do
       Solve the semi-relaxed problem SR-FS-SVM(·); let z be its objective value.
       if the running time exceeds the time limit then
            break
      Fix the binary variables to their optimal values in SR-FS-SVM(·) and solve the resulting FS-SVM; let z̃ be its objective value.
       if z improves LB then
              update LB.
      if z̃ improves UB then
              update UB.
      /* Update Step */
       Build the set of features of the binary subset that will be relaxed in the next iteration.
       Build the set of features that will be added to the binary subset in the next iteration.
       Update the binary subset.
      
      
Algorithm 4 Exact Procedure

In contrast, in Variant II (Algorithm 4.6) the binary subset is enlarged in each iteration by adding the features whose relaxed v-values exceed a given threshold in the solution of the semi-relaxed problem. Lastly, in Variant III (Algorithm 4.7) the subset is modified based on the reduced costs of the linear programming problem that results from fixing the binary variables of the semi-relaxed problem to their optimal values. We thus obtain the reduced costs of the variables w_j^+ and w_j^-, and a vector similar to σ, denoted σ̃, is created and defined as:

(21)

where the components are built from the solutions and reduced costs of the problem described above. The vector σ̃ is sorted in non-decreasing order, and the binary subset is updated by adding features in this order; a fixed number of features is added in each iteration, chosen because it provides good results. Additionally, variables that are null in two consecutive iterations are relaxed.

Using these variants, we explore different forms to improve the lower bounds and to update the initial set of binary variables. The various performances of the described procedures are analyzed in Section 7.
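Variant II's update rule can be sketched as follows; the threshold value is left as a parameter since it is our own placeholder, and the function name is illustrative:

```python
def variant2_update(binary_set, v_relaxed, threshold):
    """Variant II update (sketch): after solving the semi-relaxed problem,
    every relaxed feature whose v_j value exceeds `threshold` is promoted to
    the binary set for the next iteration.  Returns the enlarged binary set
    and the list of promoted features."""
    promoted = [j for j, vj in v_relaxed.items()
                if j not in binary_set and vj > threshold]
    return binary_set | set(promoted), promoted
```

The rationale is the usual one for semi-relaxations: a relaxed v_j that is pushed well above 0 by the LP is likely to be 1 in an optimal integer solution, so enforcing its integrality first tends to raise the lower bound fastest.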

/* Modified Main Step: Variant I */
Sort the features according to the vector σ defined in (16).
Divide them into a sequence of subsets of a certain size, considering the order given by σ.
/* Update Step: Variant I */
Build the set of features to be added following this order. Update the binary subset.
Algorithm 4. 5 Update Variant I.

/* Update Step: Variant II */
Add to the binary subset the relaxed features whose v-values exceed the threshold. Update the sets accordingly.
Algorithm 4. 6 Update Variant II

/* Update Step: Variant III */
Build the set of features that have not been selected in the solution of the last two iterations.
Solve the LP resulting from fixing the binary variables of SR-FS-SVM(·) to their optimal values.
Sort the features in non-decreasing order according to the values of the vector σ̃ in (21).
Construct the set of features to be added by selecting the first features of this ordered set.
Update the binary subset.
Algorithm 4. 7 Update Variant III

7 Computational Results

In this section, we present the results of several computational experiments. In particular, we study: i) how the use of Strategies I and II for fixing the upper and lower bounds of the w-variables reduces the computing times for our model; ii) the efficiency of the heuristic approach (Kernel Search) proposed in this paper; and iii) the results provided by the different variants of the exact solution approach.

The computational experiments were performed using CPLEX 12.6.3 on an Intel(R) Core(TM) i7-4790K CPU with 32 GB of RAM. We should also remark that the CutPass, CutsFactor, EachCutLim, FracCuts, PreInd, RinHeur, EpInt and EpRHS parameters were modified in order to give a clean comparison of the relative performance of the formulations, i.e., with these parameters we tried to avoid CPLEX internal heuristics, since they can influence the previously described solution variants differently. The computational experiments were carried out on sixteen datasets. Eight of them can be found in the UCI repository (Asuncion and Newman (2007)); see Table 7.1, where m is the number of elements, n is the number of features, and the last column shows the percentage of elements in each class. As can be observed, they contain a small number of features. The other eight datasets used in the experiments have a larger number of features (see Table 7.1). The Lepiota, Arrythmia, Madelon and MFeat datasets are also in the UCI repository. Descriptions of the remaining datasets in Table 7.1 can be found in Alon et al. (1999), Carrizosa et al. (2010), Guyon et al. (2002), Maldonado et al. (2014), Golub et al. (1999), Shipp et al. (2002) and Notterman et al. (2001).

Small number of features
Name m n Class(%)
BUPA 345 6 42/58
PIMA 768 8 65/35
Cleveland 297 13 42/58
Housing 506 13 51/49
Australian 690 14 44/56
GC 1000 24 30/70
WBC 569 30 37/63
Ionosphere 351 33 64/36
(a)
Big number of features
Small sample size Big sample size
Name m n Class(%) Name m n Class(%)
Colon 62 2000 35/65 Lepiota 1824 109 52/48
Leukemia 72 5327 47/53 Arrythmia 420 258 57/43
DLBCL 77 7129 75/25 Madelon 2000 500 50/50
Carcinoma 36 7457 53/47 Mfeat 2000 649 10/90
(b)
Table 7.1: Datasets description.

Since the resolution times for the datasets in Table 7.1(a) and for the Lepiota dataset are very short when solving the formulation with CPLEX (a few seconds), in this section we focus our attention on developing alternative solution strategies for the instances with the largest numbers of features (other than Lepiota). In Subsection 7.1, we analyze the instances with a big number of features and a small sample size (Colon, Leukemia, DLBCL, Carcinoma). Lastly, we apply the best techniques obtained to the instances with a big sample size in Subsection 7.2.

7.1 Analysis of datasets with small sample size and big number of features

In this subsection we analyze how the use of Strategies I and II affects the Colon, Leukemia, DLBCL and Carcinoma datasets. The heuristic and exact procedures applied to these datasets are also studied here.

7.1.1 FS-SVM with Strategies I and II

Section 4 described two strategies for obtaining tightened lower and upper bounds on the variables of the model. Table 7.2 reports the computational results of the proposed formulation, both with and without Strategies I and II, using small values of the budget parameter B and a time limit of two hours (instances exceeding the time limit have been highlighted with their times underlined). Since the running times are very short for the smallest values of B and most variables turn out to be null in those cases, we only report the results for the larger values of B. In this table, the column labelled “FS-SVM” shows the gaps and running times of the proposed model. The second group of columns for each dataset, titled “St.+FS-SVM”, shows the results for the model after Strategies I and II have been applied to obtain tightened lower/upper bounds. The termination gap (%) is shown in the “Gap” column; the next column gives the time required by the two strategies, followed by the running time for solving the formulation once the bounds have been fixed and by the overall process time. Lastly, the final column shows the average difference between the upper and lower bounds after the use of both strategies (initially, these bounds are set to sufficiently large values). In general, the use of the strategies yields a small average difference between the two bounds. Therefore, Strategies I and II provide tightened bounds.
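The quantities tracked in Table 7.2 can be stated precisely with two small helpers. This is a minimal sketch with hypothetical function names, assuming the standard relative-gap convention (the paper does not spell out which convention the reported gap follows):

```python
def mip_gap_percent(best_bound, incumbent):
    """Termination gap (%): relative distance between the best bound
    and the incumbent objective value (standard relative-gap convention)."""
    return 100.0 * abs(incumbent - best_bound) / max(abs(incumbent), 1e-10)

def avg_bound_width(lower, upper):
    """Average upper-minus-lower width over all variables, i.e. the
    average bound difference reported after Strategies I and II."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)
```

A small average width from `avg_bound_width` is what indicates that the strategies have produced genuinely tight bounds.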

For the Colon dataset, FS-SVM cannot be solved within the time limit for several of the tested parameter values. However, if Strategies I and II are employed before solving the model, most of these cases can be solved in a few minutes. For the remaining values, the model still cannot be solved in less than two hours even when the strategies are performed, but the gaps at termination are smaller.

The second group of columns of Table 7.2 shows the results for the Leukemia dataset. For this instance, the model with the strategies solves the same cases as the model without them. However, most of the cases that cannot be solved within two hours show smaller gaps when the strategies are used.

Table 7.2 also details the results for the DLBCL dataset. For some parameter values this instance was not solved within the time limit, but when using the strategies it can be solved in a few minutes. For other values the model cannot be solved even if the bounds are tightened with the strategies, although the gaps are once again better than when the strategies are not utilized.

The last group of columns shows the results for the Carcinoma dataset. In this case, the model cannot be solved in less than two hours for several parameter values. However, if we use both strategies, the solution times improve and only two cases remain unsolved after the time limit.

In view of the reported results, we can conclude that the use of Strategies I and II leads to a reduction in running times in most cases. Furthermore, even when the model cannot be solved for certain parameter values despite the strategies, the termination gaps are better if Strategies I and II are employed.

      Colon m=62, n=2000    Leukemia m=72, n=5327    DLBCL m=77, n=7129    Carcinoma m=36, n=7457
B/C   FS-SVM   St.+FS-SVM   (same column groups repeated for each dataset)
      Gap Time  Gap Times