New crossover operators for multiple subset selection tasks

08/06/2014 ∙ by Arnab Roy, et al. ∙ Binghamton University

We introduce two crossover operators, MMX-BLXexploit and MMX-BLXexplore, for simultaneously solving multiple feature/subset selection problems where the features may have numeric attributes and the subset sizes are not predefined. These operators differ in the level of exploration and exploitation they perform; one is designed to produce convergence controlled mutation and the other exhibits a quasi-constant mutation rate. We illustrate the characteristics of these operators by evolving pattern detectors to distinguish alcoholics from controls using their visually evoked response potentials (VERPs). This task encapsulates two groups of subset selection problems: one chooses a subset of EEG leads along with the lead-weights (features with attributes), and the other defines the temporal pattern that characterizes the alcoholic VERPs. We observed better generalization performance from MMX-BLXexplore. Perhaps MMX-BLXexploit was handicapped by not having a restart mechanism. These operators are novel and appear to hold promise for solving simultaneous feature selection problems.


1 Introduction

The feature selection (FS) literature contains a wide variety of approaches designed to extract a subset of features that optimizes a given objective function (Schaffer, 2005; Siedlecki, 1989; Kudo and Sklansky, 2000; Liu and Yu, 2005; Seok, 2004; Lucasius and Kateman, 1992; Leardi, 1992; Oliveira, 2001; Zhang and Sun, 2002; Yang and Honavar, 1998; Mathias, 2000; Bala, 1996; Radcliffe, 1992; Narendra and Fukunaga, 1977). These techniques are generally grouped into two approaches: the wrapper approach aims at choosing a feature subset that can improve the performance of a classifier, whereas the filter approach uses an objective function that exploits statistical properties of the data (e.g. information content, correlation coefficient) (Liu and Yu, 2005). Many empirical studies have reported that genetic algorithm (GA) based FS outperforms sequential search techniques for problems that involve a considerable number of features (typically 50) (Zhang and Sun, 2002; Seok, 2004). Here, we present a novel crossover operator for the GA based FS task and implement a wrapper model to illustrate its characteristics. Typically, bit-string chromosome representations have been used where the bits represent the absence/presence of a feature (Yang and Honavar, 1998; Bala, 1996; Leardi, 1992; Siedlecki, 1989; Seok, 2004); however, index representations (features encoded as numeric values) have also been implemented (Lucasius and Kateman, 1992; Radcliffe, 1992; Mathias, 2000). Lucasius and Kateman (Lucasius and Kateman, 1992) introduced a crossover operator for index representation that was built upon the preservation of four basic properties of an encoded feature subset (identity, position, order, and adjacency) while transferring information from the parents to the daughter chromosome. Radcliffe (Radcliffe, 1992) developed the RRR (Random Respectful Recombination) crossover operator for index representation and introduced the idea of respect, i.e. the child must inherit the features common to the parent chromosomes. Mathias et al. (Mathias, 2000) implemented a similar encoding scheme and introduced the MMX (Mix and Match crossover), the MMX-s (Mix and Match and sort crossover), and the MSX (Mix and Swap crossover) operators, which maintain positive and negative respect in the absence of any mutation. Positive respect requires that the child inherit the features common to both parents; negative respect requires that the child not contain features that are absent in both parents.

As a consequence of maintaining respect during the crossover, the above operators are known to produce convergence controlled variation (CCV) (Mathias, 2000); there is less variation among the chromosomes as the search converges. By introducing mutation to relax the requirement of negative respect, a convergence controlled mutation (CCM) can be achieved. For the purposes of this paper, we aim to devise two crossover operators, one that exhibits CCM and another that has a quasi-constant mutation rate. Also, in order to facilitate encoding subsets of variable lengths, we use index representation.

Some engineering applications require that multiple FS problems be solved simultaneously. For example, Roy et al. (Roy, 2013) developed a spike neural network (SNN) based classifier to characterize the alcoholic brain using visually evoked response potentials. This task involved simultaneously solving two FS problems: one that required choosing a correct subset of EEG leads along with the lead-weights (features with attributes) and another that defined the temporal pattern to be detected. Our literature review suggests that little work has been done on simultaneously solving multiple FS problems in which the features may carry numeric attributes. Here, we introduce two versions of MMX-BLX (Mix and match + blend crossover operator), which is an extension of the MMX-SSS operator (Schaffer, 2005), to accommodate these requirements. The versions vary in the level of exploration and exploitation they perform; hence we call them MMX-BLXexploit and MMX-BLXexplore. We illustrate the characteristics of these crossover operators by evolving temporal pattern detectors to characterize the alcoholic brain.

2 Crossover operators for numeric and categorical chromosomes

Mathias et al. (Mathias, 2000) introduced the concept of positive/negative respect and presented the MSX and the MMX crossover operators for subset selection problems where the subset size was predetermined. These operators were made more exploratory by compromising negative respect: the positions in the daughter chromosomes that held unmatched elements, i.e. elements not common to both parents, were mutated with a certain predefined probability. By introducing a numeric gene to encode the subset size (SSS), Schaffer et al. (Schaffer, 2005) extended the MSX and the MMX crossover operators to FS problems where the SSS was not predefined but only an upper bound on the SSS was given. In order to cross over the SSS gene, the BLX operator (Eshelman and Schaffer, 1992) was used. The SSS gene defined an acceptance boundary on the chromosome such that the encoded features on the right side of this boundary remained unexpressed, i.e. these features were not included in the subset. Therefore, in order to preserve the important features, MMX-SSS (Schaffer, 2005) copied the common genes to the offspring one position to the left of their parental positions. Here, we introduce the MMX-BLXexploit and the MMX-BLXexplore crossover operators, which are built upon the foundation of the MMX-SSS operator. We do not use an SSS gene; as a consequence, the order in which the features are encoded is irrelevant. Also, we allow the features to have multiple numeric attributes, where the attributes of the common parental features are mated using the BLX operator (Eshelman and Schaffer, 1992). Below we briefly present the BLX and the MMX-SSS crossover operators in order to introduce the concepts necessary to explain the steps involved in MMX-BLX.

Figure 1: A: The BLX-α operator generalizes the FX operator by allowing the sampling range to expand beyond the parental allele values. For α = 0, BLX-α behaves like FX. B: A 2-dimensional BLX-α and FX operation. FX samples within the area marked by the dotted rectangle and BLX-α samples within the area marked by the solid rectangle.

2.1 BLX

Radcliffe’s flat crossover (FX) operator (Radcliffe, 1990) for numeric genes produces an offspring by uniformly picking a value from the interval defined by the parental allele-values (see Figure 1a). The possible allele-values for the child are defined by a rectangular region for a 2-D problem (see Figure 1b), a region enclosed by a cuboid for a 3-D problem, and so on. Thus, the FX operator is respectful of this interval. Such a strategy is exploitative in nature and may lead to premature convergence. Eshelman and Schaffer’s (Eshelman and Schaffer, 1992) BLX-α (α ≥ 0) operator is a generalization of FX that allows the child to have allele-values in an extended region defined by the parameter α (see Figure 1a); BLX-0 is the same as FX. If a child inherits an allele-value outside of the region bounded by the parents’ allele-values, then it is considered to be a mutation event. Clearly, the BLX-α (for α > 0) operator is a crossover/mutation operator where the level of mutation is coupled with the degree to which the population has converged; as a result, it exhibits CCM. Below, we provide pseudo-code for the BLX-α operator.



Pseudo-code for BLX-α

Given:

  1. V1 = an n-dimensional vector representing parent-1 allele values

  2. V2 = an n-dimensional vector representing parent-2 allele values

  3. Vmax = a vector representing the maximum allowable allele values for the child

  4. Vmin = a vector representing the minimum allowable allele values for the child

  5. α, where α ≥ 0

BLX(V1,V2,Vmax,Vmin,α)

  1. op = () // Define an empty output list op

  2. For counter i: 1 to n

  3. Range = |V1[i] - V2[i]| // Calculate the interval

  4. R1 = Min(V1[i],V2[i]) - α·Range // Define the lower bound

  5. R2 = Max(V1[i],V2[i]) + α·Range // Define the upper bound

  6. val = Uniform-random(R1,R2) // Generate uniform random number

  7. If val > Vmax[i] then val = Vmax[i] // Check for the upper bound

  8. If val < Vmin[i] then val = Vmin[i] // Check for the lower bound

  9. Append(op,val) // Append val to the op list

  10. Return(op)

For the purposes of this paper, the BLX operation for two scalar values, x1 and x2, with upper and lower bounds, xmax and xmin, will be represented as BLX((x1),(x2),(xmax),(xmin),α), where the parentheses around the scalar values indicate that they have been converted to 1-dimensional vectors.
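The pseudo-code above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `blx`, the optional `rng` argument, and the use of `random.Random.uniform` as the uniform sampler are assumptions made for the example.

```python
import random
from typing import List, Optional, Sequence


def blx(v1: Sequence[float], v2: Sequence[float],
        vmax: Sequence[float], vmin: Sequence[float],
        alpha: float, rng: Optional[random.Random] = None) -> List[float]:
    """Blend crossover (BLX-alpha) applied element-wise to two parent vectors."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if rng is None:
        rng = random.Random()  # hypothetical default; pass a seeded rng for repeatability
    child = []
    for a, b, hi, lo in zip(v1, v2, vmax, vmin):
        interval = abs(a - b)                # distance between the parental alleles
        r1 = min(a, b) - alpha * interval    # extended lower bound
        r2 = max(a, b) + alpha * interval    # extended upper bound
        val = rng.uniform(r1, r2)
        val = min(max(val, lo), hi)          # clip to the allowable allele range
        child.append(val)
    return child
```

With `alpha = 0` every sampled value stays inside the parental interval, which corresponds to the FX behaviour; a larger `alpha` widens the sampling interval in proportion to how far apart the parents are.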

2.2 MMX-SSS

In MMX-SSS all chromosomes are of fixed length; however, the number of genes that can be expressed may vary, and this is defined using the SSS gene. The steps involved in the MMX-SSS crossover operation are as follows (Schaffer, 2005):

  1. Incest prevention: Avoid mating a pair of parent chromosomes which are too similar. Similarity is evaluated by comparing the expressed part of the chromosome that could be inherited by the offspring.

  2. Maintain positive respect: Copy all the genes (features) that are common between the parents to the offspring. These genes will be copied one position to the left of their parental position. Child-1 and child-2 will receive the common genes from parent-1 and parent-2, respectively.

  3. Maintain negative respect: The uncommon (uniques) genes between the parents are stored in a separate data-structure and sampled without replacement to fill the remaining positions in the offspring.

  4. Crossover SSS gene: Perform BLX using the parent SSS genes to generate an integer value for the child SSS-gene.

  5. Mutation: In order to allow for more exploration, the positions in the offspring occupied by the inherited unique genes will be replaced by an allele value randomly generated within an allowed range based on a predefined mutation rate. Care should be taken so that the same feature is not encoded twice.

As a consequence of maintaining positive respect (during crossover), over many generations more chromosomes will contain common features (genes). Also, as the inherited common elements in an offspring are never mutated, this will result in a CCM. Clearly, CCM is an important property that both BLX and MMX-SSS share.

2.3 MMX-BLX

The MMX-BLX crossover operator is designed for simultaneously solving multiple FS problems where the features may have multiple numeric attributes and the subset sizes are not predefined. In order to accommodate these requirements, we introduce a complex chromosome structure that consists of several sub-chromosomes, one per FS task, where sub-chromosome-k encodes a subset for the k-th FS task. The MMX-BLX operation involves 3 basic tasks: defining the length of an offspring’s sub-chromosome, modifying the attributes of the parental features (a copy of the parental features along with their attributes is stored in a separate data-structure, and the feature-attributes are modified in this data-structure before being transferred to the offspring), and defining the rules by which the offspring shall inherit features for each FS task. All the parameters that are necessary for these tasks are listed in Table 1. Below, we explain the steps for accomplishing these tasks.

Parameter Definition Constraint
Will be used for indexing the feature selection task 1
Will be used for defining the offspring’s sub-chromosomal lengths. 0
Will be used for modifying the attributes of the common parental features. 0
Will be used for the mutation operation. 0 1
Will be used for generating the attribute vector for the unique features. 0
The length of the attribute vector of the features that are part of the FS task. 0
A list defining the upper bound on the attribute values for features that are part of the FS task.
A list defining the lower bound on the attribute values for features that are part of the FS task.
A list defining the datatype of the individual elements of the attribute vector of features that are part of the FS task. Datatype can be integer or real
The max feature-index for the FS task.
The min feature-index for the FS task.
The maximum subset size for the FS task.
The minimum subset size for the FS task. ( - + 1) 0
Total number of FS tasks 0
Parent-1 chromosome.
Parent-2 chromosome.
Table 1: The above table illustrates the parameters required by the MMX-BLX crossover operator.

2.3.1 Defining the length of an offspring

The length of sub-chromosome-k of a chromosome (where k indexes the FS task) is defined by the number of features (genes) it encodes and may vary among chromosomes. The length of the offspring’s sub-chromosome-k is defined by performing a BLX operation over a range bounded by the lengths of the two parental sub-chromosomes-k (see Figure 1a). This operation is formally defined as follows:

(1)
(2)
(3)

The Round function guarantees that the output of this function will be an integer.
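The length computation can be sketched as below. This is only an illustration under stated assumptions: the names `offspring_length`, `alpha_len`, `min_size`, and `max_size` are hypothetical, and the sketch simply samples the BLX interval spanned by the two parental lengths, rounds the result, and clips it to the allowed subset-size range.

```python
import random
from typing import Optional


def offspring_length(len_p1: int, len_p2: int, min_size: int, max_size: int,
                     alpha_len: float, rng: Optional[random.Random] = None) -> int:
    """Sample an integer sub-chromosome length with BLX over the parental lengths."""
    if rng is None:
        rng = random.Random()
    spread = abs(len_p1 - len_p2)
    low = min(len_p1, len_p2) - alpha_len * spread
    high = max(len_p1, len_p2) + alpha_len * spread
    length = round(rng.uniform(low, high))        # Round yields an integer length
    return min(max(length, min_size), max_size)   # keep within the allowed subset sizes
```

When both parents have the same length the interval collapses and the offspring inherits that length, so length variation shrinks as the population converges.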

2.3.2 Modifying the attributes of the parental features

In MMX-BLX, all features that are part of the k-th FS task must have attribute vectors of the same length. For a pair of parental chromosomes, for the k-th FS task, the common, the unique, and the absent features (along with their attributes) are stored in 3 bags: bag of commons (BC-k), bag of uniques (BU-k), and bag of absents (BA-k), and their attributes are modified (only the attributes of the features stored in the bags are modified; the parental chromosomes remain unchanged). The bags contain only 1 instance of a feature; there are no duplicate copies. The process by which the attributes are modified is discussed below:

Figure 2: The process by which a new attribute-vector (the hexagonal boxes) for the child is generated from the parental attribute-vector (the circles) of a unique feature. Here the attribute-vector consists of 2 integer components and 1 real component. The dotted lines indicate the range within which the child attribute values will be generated. The range varies for different elements depending on the corresponding elements of the attribute upper-bound and lower-bound vectors and on the range-control parameter. If any element of the child attribute-vector (the highlighted hexagonal box) exceeds the maximum/minimum allowable value (the vertical dotted lines), it is set to the threshold value it violated.

  1. For a common parental feature, the two parental attribute-vectors are used for defining a region for the BLX operation to generate a new attribute-vector for the offspring (see Figure 1b). This operation is formally defined as:

    (4)

    where the feature ranges over all common features for the k-th FS task.

  2. For a unique feature, the attribute-vector is modified by performing BLX over a range (controlled by a dedicated parameter) that is defined by two vectors, which are derived by incrementing and decrementing each element of the attribute-vector by an equal amount (see Figure 2). The steps involved in this operation are shown below:

    (5)
    (6)
    (7)
    (8)

    where the feature ranges over all unique features for the k-th FS task.

  3. For an absent feature (a feature that is not present in either parent), the attribute-vector is set by performing BLX over a region that is bounded by 2 user-defined vectors (see Figure 1b), the upper-bound and the lower-bound attribute vectors. This operation is formally defined as:

    (9)

    where the feature ranges over all absent features for the k-th FS task.
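The three bags and their attribute updates can be sketched as follows. This is an illustrative sketch under several assumptions: a sub-chromosome is modelled as a dictionary from feature index to a list of real-valued attributes, the names `build_bags`, `alpha_common`, and `delta` are hypothetical, the amount by which a unique feature's attributes are incremented and decremented is passed in directly as `delta` rather than derived as in the paper's equations, and integer-typed attributes (which would need rounding) are left out.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Attrs = List[float]
SubChrom = Dict[int, Attrs]   # feature index -> attribute vector


def blx(v1, v2, vmax, vmin, alpha, rng):
    """Element-wise BLX-alpha with clipping to [vmin, vmax]."""
    out = []
    for a, b, hi, lo in zip(v1, v2, vmax, vmin):
        span = abs(a - b)
        val = rng.uniform(min(a, b) - alpha * span, max(a, b) + alpha * span)
        out.append(min(max(val, lo), hi))
    return out


def build_bags(p1: SubChrom, p2: SubChrom, all_features: Sequence[int],
               vmax: Attrs, vmin: Attrs, alpha_common: float, delta: Attrs,
               rng: Optional[random.Random] = None
               ) -> Tuple[SubChrom, SubChrom, SubChrom]:
    """Return (BC, BU, BA) with freshly generated attribute vectors."""
    if rng is None:
        rng = random.Random()
    bc: SubChrom = {}
    bu: SubChrom = {}
    ba: SubChrom = {}
    for f in all_features:
        in1, in2 = f in p1, f in p2
        if in1 and in2:
            # Common feature: blend the two parental attribute vectors.
            bc[f] = blx(p1[f], p2[f], vmax, vmin, alpha_common, rng)
        elif in1 or in2:
            # Unique feature: sample around the single parental vector.
            v = p1[f] if in1 else p2[f]
            lower = [x - d for x, d in zip(v, delta)]
            upper = [x + d for x, d in zip(v, delta)]
            bu[f] = blx(lower, upper, vmax, vmin, 0.0, rng)
        else:
            # Absent feature: sample between the user-defined bounds.
            ba[f] = blx(vmin, vmax, vmax, vmin, 0.0, rng)
    return bc, bu, ba
```

Because every bag entry is a newly built list, the parental dictionaries are never modified, and each feature appears in exactly one bag.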

2.3.3 Inheriting features

Below, we present the steps involved in the MMX-BLXexploit and the MMX-BLXexplore crossover operations for sub-chromosome-k, where k indexes the FS task. An MMX-BLX operation produces a single offspring for a given pair of parents; in order to produce 2 offspring for a given parental pair, 2 independent MMX-BLX operations should be performed (in our experiments 2 offspring are produced per parental pair). To generate an offspring, the following steps should be repeated for all sub-chromosomes.

MMX-BLXexploit

  1. Determine the target length of the offspring’s sub-chromosome-k using the process discussed above.

  2. If BC-k is not empty then choose a feature (along with its attributes) from it randomly without replacement and place it in the offspring’s sub-chromosome-k, else goto step-4.

  3. If the current length of the offspring’s sub-chromosome-k is less than the target length then goto step-2, else stop.

  4. Choose a feature (along with its attributes) randomly without replacement from BU-k or BA-k with a predefined probability and place it in the offspring’s sub-chromosome-k. This probability becomes irrelevant when either BU-k or BA-k becomes empty.

  5. If the current length of the offspring’s sub-chromosome-k is less than the target length then goto step-4, else stop.
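The exploitative inheritance steps above can be sketched as a pair of loops. This is a minimal sketch, not the authors' code: the names `inherit_exploit` and `p_absent` are hypothetical, `p_absent` is assumed to be the probability of drawing from the bag of absents rather than the bag of uniques, and the sketch is assumed to stop early if all three bags run out before the target length is reached.

```python
import random
from typing import Dict, List, Optional, Tuple

Bag = Dict[int, List[float]]   # feature index -> attribute vector


def _draw(bag: Bag, rng: random.Random) -> Tuple[int, List[float]]:
    """Remove and return a random (feature, attributes) pair from a bag."""
    feature = rng.choice(sorted(bag))
    return feature, bag.pop(feature)


def inherit_exploit(bc: Bag, bu: Bag, ba: Bag, target_len: int, p_absent: float,
                    rng: Optional[random.Random] = None) -> Bag:
    """Fill one offspring sub-chromosome, always taking common features first."""
    if rng is None:
        rng = random.Random()
    bc, bu, ba = dict(bc), dict(bu), dict(ba)   # work on copies of the bags
    child: Bag = {}
    # Steps 2-3: common features are inherited first (positive respect).
    while len(child) < target_len and bc:
        feature, attrs = _draw(bc, rng)
        child[feature] = attrs
    # Steps 4-5: fill the remainder from the uniques or the absents.
    while len(child) < target_len and (bu or ba):
        if bu and ba:
            source = ba if rng.random() < p_absent else bu
        else:
            source = bu or ba   # the probability is irrelevant once one bag is empty
        feature, attrs = _draw(source, rng)
        child[feature] = attrs
    return child
```

With `p_absent = 0` no absent feature is drawn while unique features remain, so mutation events only occur when the bag of uniques is exhausted.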

MMX-BLXexplore

  1. Determine the target length of the offspring’s sub-chromosome-k using the process discussed above.

  2. If BC-k and BA-k are both non-empty then choose a feature (along with its attributes) randomly without replacement from BC-k or BA-k with a predefined probability and place it in the offspring’s sub-chromosome-k. Then goto step-6.

  3. If BC-k is empty and BA-k is not, then choose a feature (along with its attributes) randomly without replacement from BU-k or BA-k with a predefined probability and place it in the offspring’s sub-chromosome-k. Then goto step-6.

  4. If BA-k is empty and BC-k is not, then choose a feature (along with its attributes) randomly without replacement from BC-k and place it in the offspring’s sub-chromosome-k. Then goto step-6.

  5. If both BC-k and BA-k are empty then choose a feature (along with its attributes) randomly without replacement from BU-k and place it in the offspring’s sub-chromosome-k. Then goto step-6.

  6. If the current length of the offspring’s sub-chromosome-k is less than the target length then goto step-2, else stop.
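The exploratory variant can be sketched as a single loop whose branches mirror steps 2 to 5. As before, this is an illustrative sketch: `inherit_explore` and `p_absent` are hypothetical names, `p_absent` is assumed to be the probability of drawing from the bag of absents, and two details the steps leave open are filled in by assumption (when the bag of uniques is empty in step 3 the absents are used, and the loop stops if all bags are exhausted).

```python
import random
from typing import Dict, List, Optional, Tuple

Bag = Dict[int, List[float]]   # feature index -> attribute vector


def _draw(bag: Bag, rng: random.Random) -> Tuple[int, List[float]]:
    """Remove and return a random (feature, attributes) pair from a bag."""
    feature = rng.choice(sorted(bag))
    return feature, bag.pop(feature)


def inherit_explore(bc: Bag, bu: Bag, ba: Bag, target_len: int, p_absent: float,
                    rng: Optional[random.Random] = None) -> Bag:
    """Fill one offspring sub-chromosome, letting absents compete with commons."""
    if rng is None:
        rng = random.Random()
    bc, bu, ba = dict(bc), dict(bu), dict(ba)   # work on copies of the bags
    child: Bag = {}
    while len(child) < target_len and (bc or bu or ba):
        if bc and ba:      # step 2: commons versus absents
            source = ba if rng.random() < p_absent else bc
        elif ba:           # step 3: no commons left, uniques versus absents
            source = ba if (not bu or rng.random() < p_absent) else bu
        elif bc:           # step 4: no absents left, take a common feature
            source = bc
        else:              # step 5: only uniques remain
            source = bu
        feature, attrs = _draw(source, rng)
        child[feature] = attrs
    return child
```

Unlike the exploitative variant, an absent feature can displace a common one at every draw, so the chance of a mutation event does not shrink as the parents become more alike.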

2.3.4 Summarizing MMX-BLX

The MMX-BLXexploit operator places a greater emphasis on positive respect but allows for mutation to occur by sacrificing negative respect. By setting the mutation probability to 0, negative respect can be maintained. At the other extreme, by setting it to 1, the search can be made more exploratory. Regardless of the parameter value, MMX-BLXexploit always prioritizes positive respect. This aspect of MMX-BLXexploit is very similar to MMX-SSS. MMX-BLXexplore, however, sacrifices respect (both positive and negative) in order to allow for more exploration. Consequently, MMX-BLXexploit should exhibit CCM, whereas MMX-BLXexplore should have a quasi-constant mutation rate. MMX-BLXexplore may be useful for problems that require more exploration or where the fitness function gradually changes over time. We believe that both operators can be made more effective if some form of incest prevention scheme is applied. Here, we have not implemented such a scheme, as defining a similarity measure for chromosomes that encode multiple feature subsets is not trivial and requires a separate investigation. In contrast to MMX-SSS, where all chromosomes are of fixed length and the SSS gene defines the expressible part of the chromosome, here chromosomes vary in size and there are no passive genes. Hence, memory is used more efficiently.

While simultaneously solving FS tasks, there may be an additional constraint that some of the sub-chromosomes must be of the same length, i.e. the encoded subset sizes must be equal. In order to satisfy this constraint, we generate the lengths of the offspring’s sub-chromosomes that encode these tasks using a single parental sub-chromosome that encodes one of these tasks. This is possible because all the parental sub-chromosomes encoding these tasks are of the same length.

3 The alcoholic classification task

The screening procedures for alcoholism currently used in clinics are mainly questionnaire-based tests that involve queries related to social/family problems the subject may have faced due to drinking, guilt associated with the addiction, and the pattern of alcohol consumption (Cherpitel, 1997; Kahan, 1996). These tests are not very accurate and their results may vary with gender and race (Cherpitel, 1997). Tests that are based on detecting the physiological changes associated with the disease may not only be more accurate, but may also be more informative for clinical purposes. Recent MRI, fMRI, and EEG studies have reported occurrences of structural and functional changes in the alcoholic brain (Harper, 1998, 1987, 1985; George, 2004; Mann, 2001). An EEG based alcoholic screening procedure may be clinically useful because of its portability, affordability, and good temporal resolution. Here, we evolve a temporal pattern detector (TPD) that can characterize the alcoholic brain using its visually evoked response potentials (VERPs), which are recorded using electroencephalogram (EEG). The primary goal of the evolutionary process is to select a set of EEG leads along with weights and to evolve the design specifications for the TPD. Below, we explain the VERP dataset, the TPD technology, and the steps involved in evolving the TPDs for the alcoholic classification task.

3.1 The VERP dataset

The dataset consisted of VERPs from two groups of subjects: alcoholics and controls. Each subject was exposed to a visual stimulus (a picture), thus providing 1 second of signal (256 samples) from 62 EEG leads starting at the exposure to the picture. For a given subject, multiple such trials were performed; the number of trials per subject varied from 7 to 60. For a given trial, if any of the leads generated a signal that exceeded 100 µV, the trial was discarded for containing blink artifacts. Subjects with fewer than 40 trials were not included in this study; 47 alcoholics and 31 controls were eligible. For each subject, 36 randomly chosen trials were used to calculate the average signal across each lead (62 average signals, representing 62 leads, each composed of 256 data points) for training the classifier, and another set of 36 randomly chosen trials was used to develop an average signal for testing purposes; clearly, there will be many instances of overlapping trials. In order to reduce the extent of overlap, we could have used fewer trials for averaging; however, the signals would have been too noisy for the classification task. VERPs are much weaker than the background EEG activity, and typically at least 100 trials (with no artifacts) are required to generate an averaged signal with a respectable signal-to-noise ratio. Hence, we expect our training files to be noisy, and we therefore believe it is an interesting challenge for the evolution to find a robust temporal pattern with an appropriate tolerance level that can characterize the alcoholic VERPs.

3.2 The temporal pattern detector

The temporal pattern detection task involves detecting predefined temporal structures in a time-series. Roy et al. (Roy, 2013) introduced a design rule for a spike neural network based TPD which consisted of a serial chain of sequence detectors, each designed to detect the occurrences of a predefined inter-spike interval (ISI) pattern in a serial spike train, within a fixed tolerance limit (a box-car function). The design specifications were far fewer than the number of network parameters for the TPD. This provided an opportunity for the evolutionary algorithm to learn the design specifications, instead of having to tune myriad network parameters. This idea was successfully tested on the alcoholic classification task, where characteristic temporal patterns in the alcoholic VERPs were found by first converting the time-series to a spike train (Roy, 2013) and then evolving the TPDs to detect hidden ISI patterns. We tried evolving the above TPD for a larger alcoholic dataset; however, the results were not satisfactory. We hypothesize that, due to the rigid tolerance window associated with the above TPDs, partial credit cannot be assigned to temporal patterns that do not fall within the desired specification but come close to it. This may result in a fitness landscape that is unfavorable for the evolutionary process. Here, we introduce a new temporal pattern detector in which the tolerance window for the desired temporal pattern is represented by a multi-dimensional continuous function:

(10)

This enables us to rate the temporal patterns on a continuous scale, unlike the TPD introduced by Roy et al. (Roy, 2013) where a boolean rating system was used. The function is formally explained below:

  • represents a finite length () discrete time-series, 0 , and .

  • represents a set of distinct pointers on the time-series where indexes an element, , of the set. Also, , , and 0 .

  • represents the lead pointer to a position on , such that .

(11)

where, , , 0 , and 1 .

Summarizing the function

The function is used as a tolerance function for the amplitude difference between a pair of pointer positions, measured against the desired amplitude difference (a part of the desired pattern). It reaches its maximum value, the amplitude parameter, when the observed amplitude difference equals the desired one. The cutoff and order parameters are used for controlling the width and the manner in which the function decays (see Figure 3), respectively. The shape of the function influences the partial credit assignment for the temporal patterns that deviate from the desired temporal structure. The product of the individual tolerance values is a mechanism by which we can evaluate to what degree the individual amplitude differences between the pointer positions have deviated from the desired set of amplitude differences (the desired temporal pattern). The occurrences of the desired temporal pattern in a discrete signal can be evaluated by moving the lead pointer sequentially over the discrete signal (see Figure 4). This can be formally stated as follows:

(12)
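The scanning process described above can be illustrated with the sketch below. Everything specific in it is an assumption made for illustration and is not taken from the paper's equations: the tolerance function is modelled as a Butterworth-style curve (chosen only because it peaks at `amplitude`, widens with `cutoff`, and sharpens with `order`), the trailing pointers are placed at fixed offsets behind the lead pointer, and the detector output is taken to be the sum of the tolerance products over all lead-pointer positions. The names `tolerance` and `tpd_output` are hypothetical.

```python
from typing import Sequence


def tolerance(diff: float, desired: float, amplitude: float,
              cutoff: float, order: int) -> float:
    """Assumed Butterworth-style tolerance: equals `amplitude` when diff == desired."""
    return amplitude / (1.0 + (abs(diff - desired) / cutoff) ** (2 * order))


def tpd_output(signal: Sequence[float], offsets: Sequence[int],
               desired: Sequence[float], amplitude: float,
               cutoff: float, order: int) -> float:
    """Slide the lead pointer over the signal and accumulate tolerance products."""
    reach = max(offsets)
    total = 0.0
    for lead in range(reach, len(signal)):
        score = 1.0
        for off, want in zip(offsets, desired):
            # Amplitude difference between the lead pointer and a trailing pointer.
            diff = signal[lead] - signal[lead - off]
            score *= tolerance(diff, want, amplitude, cutoff, order)
        total += score
    return total
```

A pattern that matches every desired amplitude difference contributes the full product at that position, while a near miss still contributes a reduced value, which is the partial-credit behaviour the continuous tolerance window is meant to provide.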

Figure 3: Shaping the tolerance function . A: amplitude = 10, cutoff = 2, support = 0, order= 1. B: amplitude = 10, cutoff = 15, support = 0, order= 1. C: amplitude = 10, cutoff = 2, support = 0, order= 20. D: amplitude = 10, cutoff = 15, support = 0, order= 20.

Figure 4: The above figure illustrates the process by which a TPD scans the occurrences of a desired temporal pattern in a discrete signal. For the above TPD = 2; hence the TPD consists of 3 pointers representing a trident. The vertical arrow (on the arm of the trident) represents the current value of . The horizontal arrow as well as the dotted-trident suggests that the TPD will be sequentially moved across the signal.

The elements of the pointer set and the parameters of the tolerance function are all set by the evolutionary process. For the purposes of this paper we keep the function symmetric, i.e. for every pointer the cutoff, order, and amplitude parameters are set to the same evolved values. The pointer positions are set using an indirect encoding scheme, explained later in the paper.

Figure 5: The above figure illustrates the steps involved in the creation of a composite signal. The leftmost figure represents the EEG electrodes. In order to create the composite signal, electrodes AF1, CP5, and P4 are selected. The second column shows the signal arriving from the chosen electrodes. Finally, the weighted sum of these signals, the composite signal, is shown in the last column.
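The composite signal illustrated in Figure 5 is a weighted sum of the selected lead signals, which can be sketched as follows. The function name `composite_signal`, the dictionary-based data layout, and the example weights are hypothetical choices for illustration.

```python
from typing import Dict, List, Sequence


def composite_signal(leads: Dict[str, Sequence[float]],
                     weights: Dict[str, float]) -> List[float]:
    """Weighted sum of the selected lead signals, computed sample by sample."""
    selected = list(weights)                 # only leads with a weight are used
    length = len(leads[selected[0]])
    return [sum(weights[name] * leads[name][t] for name in selected)
            for t in range(length)]


# Example with three selected electrodes and made-up two-sample signals.
example_leads = {"AF1": [1.0, 2.0], "CP5": [0.0, 1.0], "P4": [2.0, 2.0], "FZ": [9.0, 9.0]}
example_weights = {"AF1": 1.0, "CP5": -2.0, "P4": 0.5}
example_composite = composite_signal(example_leads, example_weights)
```

Leads that carry no weight (here `FZ`) do not contribute, so choosing the lead subset and choosing the weights together determine the signal the detector sees.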

3.3 The chromosome encoding

The alcoholic classification problem consists of 2 sub-tasks: the spatial task and the temporal task. The spatial task involves choosing an appropriate subset of EEG leads along with the lead-weights, from which a composite signal can be created (see Figure 5). The objective of the temporal task is to design a TPD that can find a characteristic temporal pattern in the alcoholic composite signals; these patterns should occur more frequently in the alcoholic VERPs and far less often in the control VERPs. Since there are only 62 EEG leads, a sub-chromosome is assigned to directly encode (the genotype is the phenotype) the leads along with the weights; an EEG lead is considered a feature and the corresponding weight is its numeric (real number) attribute. In contrast, the space of possible temporal patterns is overwhelmingly large. As a consequence, the TPD design specifications are indirectly encoded into 3 sub-chromosomes; the sub-chromosomes encode a teacher-id (a pointer to an alcoholic training signal) and a specific set of pointers to positions on the teacher composite signal (see Table 2). The signal amplitudes at these positions define the temporal pattern the TPD is designed to detect; this guarantees that the temporal pattern of interest exists in at least one alcoholic composite signal and may qualify as a candidate solution for the classification task. This approach involves an implicit assumption that the alcoholic signals are homogeneous. In order to accommodate the possibility of class heterogeneity, we allow a chromosome to encode at most 2 teacher signals, from which 2 TPDs can be created. This decision was based on the outcome of a preliminary investigation, where the objective was to establish a sense of the minimum number of TPDs required (a classifier composed of too many TPDs may over-fit the training set) for the classifier to distinguish the alcoholic cases with an acceptable precision and accuracy.
The information encoded by the sub-chromosomes is summarized in Table 2:

Sub-chromosome | Feature-id (Max/Min) | Attribute(s) (Max/Min/type) | Comments | Subset length (Max/Min)
1 | Sensor-id (62/1) | Sensor-weight (4.0/-4.0/Real) | | 5/1
2 | Teacher-id (47/1) | | There are 47 alcoholic training signals, each indexed by a unique id. | 2/1
3 | Reference-pointer (250/97) | Skip-length (12/1/Int) | 1. The reference pointer (RP) marks a position on the composite teacher signal w.r.t. which other pointers will be defined. The signal amplitudes at these positions will then define the temporal pattern of interest (TPI). 2. The teacher signal amplitudes at the positions RP - j × Skip-length are candidates for defining the TPI, where 1 ≤ j ≤ 8. | 2/1
4 | Qualification-id (255/1) | Cutoff (20.0/0.1/Real); Order (15/1/Int); Amplitude (1.0/0.0/Real) | 1. Qualification-id (QI) is first converted to an 8-bit binary form. If the value at bit position j is 1, then the position RP - j × Skip-length qualifies for defining the TPI; otherwise it does not. 2. The Cutoff, Order, and Amplitude values are the same for all pointers (see TPD equations). | 2/1

Table 2: The chromosome encoding scheme for the alcoholic classification task. The sub-chromosome-1, sub-chromosome-2, and sub-chromosome-3, must be of same length.
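The indirect encoding of the pointer positions can be sketched as follows. This is a hedged illustration: it assumes that the candidate positions are the reference pointer minus j times the skip-length for j = 1 to 8 (which is consistent with the Reference-pointer minimum of 97 and the Skip-length maximum of 12, since 97 - 8 × 12 = 1), and that bit position j is counted from the least significant bit of the Qualification-id. The function name `qualified_positions` is hypothetical.

```python
from typing import List


def qualified_positions(reference_pointer: int, skip_length: int,
                        qualification_id: int) -> List[int]:
    """Decode which candidate positions behind the reference pointer define the TPI."""
    if not 1 <= qualification_id <= 255:
        raise ValueError("qualification_id must fit in 8 bits and be non-zero")
    positions = []
    for j in range(1, 9):
        # Assumption: bit j is counted from the least significant bit.
        if (qualification_id >> (j - 1)) & 1:
            positions.append(reference_pointer - j * skip_length)
    return positions
```

A Qualification-id of 255 selects all eight candidates, while smaller values select sparser patterns, so a single integer gene controls both the size and the layout of the temporal pattern.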

3.4 The objective function/selection procedure

The aim of the evolutionary process is to find a TPD whose output will be much larger for the alcoholic composite signals than for the control composite signals. In order to assign a penalty-value to a chromosome, the outputs of the TPD (that it encodes) for all the training cases (47 alcoholic + 31 control cases) are stored as two distributions, one for the alcoholic cases and one for the control cases. These discrete distributions are then used for calculating the area under the receiver operating characteristic curve (AUC). The penalty is then evaluated as follows:

$\mathrm{penalty} = 1 - \mathrm{AUC}$ (13)

Thus, penalty = 0 would suggest that the output of the TPD for the alcoholic cases is always greater than for the control cases, implying that there is no overlap between the two distributions.
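As a rough illustration of how such a penalty could be computed from the two output distributions, the sketch below estimates the AUC by comparing every alcoholic output with every control output. It assumes the penalty is 1 - AUC (consistent with penalty = 0 meaning no overlap) and counts ties as half a win; both choices, and the function names, are assumptions rather than the authors' stated implementation.

```python
def auc(alcoholic_outputs, control_outputs):
    """Pairwise AUC estimate: fraction of (alcoholic, control) pairs ranked correctly."""
    wins = 0.0
    for a in alcoholic_outputs:
        for c in control_outputs:
            if a > c:
                wins += 1.0   # alcoholic case scored above the control case
            elif a == c:
                wins += 0.5   # assumption: ties count as half
    return wins / (len(alcoholic_outputs) * len(control_outputs))


def penalty(alcoholic_outputs, control_outputs):
    """Assumed penalty: 1 - AUC, so 0 means perfectly separated distributions."""
    return 1.0 - auc(alcoholic_outputs, control_outputs)


# One of the four pairs is misordered (2.0 < 3.0), so the penalty is 0.25.
print(penalty([2.0, 4.0], [1.0, 3.0]))
```

With 47 alcoholic and 31 control training cases there are 47 x 31 = 1457 pairs, so under this formulation the training penalty changes in steps of 1/1457.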

We have implemented the elitist selection procedure introduced by Eshelman (Eshelman, 1991), in which the parents and the offspring compete for a position in the population. The parents and the offspring are first sorted by penalty (in ascending order), and only the fittest chromosomes are retained, as many as the population size. Since the subset sizes for the current application were modest, no selection pressure to evolve smaller subsets was deemed necessary.
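The survivor selection step can be sketched in a few lines. The function name `elitist_select` and the `penalty_of` callback are hypothetical names introduced for illustration; the sketch covers only the merge-sort-truncate step described above, not the rest of Eshelman's CHC algorithm.

```python
def elitist_select(parents, offspring, penalty_of, population_size=None):
    """Keep the lowest-penalty chromosomes from the merged parent/offspring pool.

    Hypothetical helper: `penalty_of` maps a chromosome to its penalty.
    """
    if population_size is None:
        population_size = len(parents)  # population size stays constant
    pool = list(parents) + list(offspring)
    # sorted() is stable, so on equal penalty a parent is kept ahead of an offspring.
    return sorted(pool, key=penalty_of)[:population_size]


# Toy example with chromosomes represented by labels.
penalties = {"p1": 0.3, "p2": 0.1, "o1": 0.2, "o2": 0.4}
survivors = elitist_select(["p1", "p2"], ["o1", "o2"], penalties.get)
print(survivors)
```

Because parents are never discarded in favor of worse offspring, the best penalty in the population cannot increase from one generation to the next.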

Figure 6: Change in the population penalty over generations when the two crossover operators, MMX-BLXexploit and MMX-BLXexplore, are implemented (one operator in the left panel, the other in the right).

Figure 7: Change over generations in the number of features selected from the bag of commons, the bag of uniques, and the bag of absents when the two crossover operators, MMX-BLXexploit and MMX-BLXexplore, are implemented (one operator in the left panel, the other in the right). A mutation event involves selecting a feature from the bag of absents.

Figure 8: Emergence of sensor subsets in two evolutionary runs, one for each crossover operator (left and right panels). In both cases a sensor subset of size 5 was evolved.

4 Results

In order to characterize the behavior of MMX-BLXexploit and MMX-BLXexplore, as well as to test the repeatability of the learning paradigm, we conducted 3 independent experiments using each crossover operator. The population size was set to 50 and each experiment was run for 5000 generations. The four crossover parameters were set to 1, 1.4, 0.85, and 0.75. Figures 6, 7, and 8 illustrate the behavior of these crossover operators for only one experiment; the behavior was similar across all 3 experiments. The performance of the evolved classifier for all 3 experiments is summarized in Table 3. For both crossover operators, the population penalty fell rapidly until generation 400 (see Figure 6), during which the evolution settled the spatial aspect of the problem; chromosomes encoding undesirable combinations of EEG sensors were eliminated (see Figure 8). Even though the chosen subset of sensors varied between the experiments, the sensors primarily represented the central, the right-parietal/occipital, and the right-frontal regions of the brain. Once the spatial task was solved, from generation 400 onwards, the evolution focused on the temporal aspect of the problem. We did not predefine the order in which these tasks should be solved; it was an emergent behavior.

Both operators were able to produce a good solution. While the population penalty for MMX-BLXexploit remained unchanged after generation 2500, MMX-BLXexplore was able to find new solutions between generations 4500 and 5000, suggesting that the current task may require more exploration. For MMX-BLXexploit, the mutation events gradually decreased to 0 by generation 1500; thereafter, all features were chosen from the bag of commons (see Figure 7). This corroborates our hypothesis that MMX-BLXexploit exhibits convergence controlled mutation (CCM) and is less exploratory in nature. Figure 7 also suggests that MMX-BLXexplore has a quasi-constant mutation rate that helps maintain genetic diversity.

Table 3 shows the performance of the best chromosome in generation 5000 on both the training and the test cases. Interestingly, even though MMX-BLXexploit is less exploratory in nature, it performed slightly better than MMX-BLXexplore on the training set. However, MMX-BLXexplore was able to find more robust solutions; its performance on the test set was much better. Perhaps MMX-BLXexploit was handicapped by not having a soft-restart mechanism (Eshelman, 1991). Such a mechanism requires an incest prevention scheme based on a similarity metric, which we did not implement because the chromosome structure was complex.

Experiment No.   MMX-BLXexplore             MMX-BLXexploit
                 (Training/Test penalty)    (Training/Test penalty)
1                0.0254/0.0417              0.0254/0.1282
2                0.0357/0.0703              0.0295/0.09
3                0.046/0.0906               0.0336/0.1283

Table 3: The above table summarizes the performance of the evolved classifier.

5 Conclusion

Both MMX-BLXexploit and MMX-BLXexplore provide a mechanism to simultaneously solve multiple FS problems where the features may have numeric attribute(s) and the subset size is not predefined. Since MMX-BLXexploit prioritizes positive respect, it is able to perform a rigorous search in the region defined by the parental chromosomes. On the one hand, this feature allows it to perform better on the training set; on the other hand, it seems to yield less general solutions. A soft-restart mechanism may allow MMX-BLXexploit to perform a more exploratory search.

MMX-BLXexplore seems to be able to find more robust solutions by virtue of its quasi-constant mutation process, even though the fitness function used did not explicitly evaluate generalization. Mutation allows more exploration by sacrificing respect; this trade-off may be problem specific and requires additional investigation.

The conventional techniques used for distinguishing the alcoholic VERPs from the controls primarily consist of 2 steps: developing a set of feature vectors and training a classifier using these feature vectors (Palaniappan, 2002, 2005, 2006, 2007; Ong, 2005; Kousarrizi, 2009; Shri and Sriraam, 2012; Shahina, 2008). Most authors have used the information in the gamma band (30-50 Hz) to develop feature vectors. Using the evolutionary learning paradigm along with the TPD technology, we were able to solve this problem in 1 step; we did not make any assumptions regarding the data. The TPD technology introduced here is an extension of the pattern detector developed by Roy et al. (Roy, 2013) and provides a mechanism to assign partial credit to temporal patterns that vary from the desired specification. We believe this makes the search landscape more favorable to an effective evolutionary search. Finally, by allowing the evolution to interact with the environment (the alcoholic teacher signal), the search for a temporal pattern that can characterize the alcoholic VERP is made more manageable.

6 Acknowledgements

The data for this research was made available on the web by Henri Begleiter, Neurodynamics Laboratory, State University of New York Health Center at Brooklyn.

References

  • Bala (1996) Bala, J., De Jong, K., Huang, J., Vafaie, H., and Wechsler, H. (1996). Using learning to facilitate the evolution of features for recognizing visual concepts. Evolutionary Computation, volume 4, number 3, MIT Press, pp. 297–311.
  • Cherpitel (1997) Cherpitel, C. J. (1997). www.hawaii.edu/hivandaids/brief%20screening%20instruments%20for%20alcoholism.pdf
  • Eshelman (1991) Eshelman, L. (1991). The CHC adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination. In G. Rawlins, Editor, Foundations of Genetic Algorithms, pp. 265–283.
  • Eshelman and Schaffer (1992) Eshelman, L. J., and Schaffer, J. D. (1992). Real-Coded Genetic Algorithms and Interval-Schemata. Foundations of Genetic Algorithms, Morgan Kaufmann, pp. 187–202.
  • George (2004) George, M., Potts, G., Kothman, D., Martin, L., and Mukundan, C. (2004), Frontal deficits in alcoholism: an erp study. Brain Cogn 54, 3 , pp. 245–7.
  • Harper (1985) Harper, C., Kril, J., and Holloway, R. (1985), Brain shrinkage in chronic alcoholics: A pathological study. British Medical Journal 290 , pp. 501 – 504.
  • Harper (1987) Harper, C., Kril, J., and Daly, J. (1987), Are we drinking our neurones away? British Medical Journal 294 , pp. 534 – 536.
  • Harper (1998) Harper, C. (1998), The neuropathology of alcohol-specific brain damage, or does alcohol damage the brain? Journal of Neuropathology and Experimental Neurology 57 , pp. 101 – 110.
  • Hussain (2001) Hussain, F., Nawwaf, K., and Rabab W. (2001). Genetic Algorithms for Feature Selection and Weighting, A Review and Study. Proceedings of the Sixth International Conference on Document Analysis and Recognition, IEEE Computer Society, Washington, DC, USA, pp. 1240–1244.
  • Kahan (1996) Kahan, M. (1996). Identifying and managing problem drinkers. Canadian Family Physician 42 , pp. 661 – 671.
  • Kousarrizi (2009) Kousarrizi, M., Ghanbari, A., Gharaviri, A., Teshnehlab, M., and Aliyari, M. (2009). Classification of alcoholics and non-alcoholics via eeg using svm and neural networks. 3rd International Conference on Bioinformatics and Biomedical Engineering, pp. 1–4.
  • Kudo and Sklansky (2000) Kudo, M. and Sklansky, J. (2000). Comparison of algorithms that select features for pattern classifiers. Pattern Recogn. Lett., Elsevier Science Inc., New York, NY, USA, pp. 25–41.
  • Leardi (1992) Leardi, R., Boggia, R., and Terrile, M. (1992). Genetic algorithms as a strategy for feature selection. Journal of chemometrics, Volume 6, Issue 5, pp. 267–281.
  • Liu and Yu (2005) Liu, H. and Yu, L. (2005). Toward integrating feature selection algorithms for classification and selection. IEEE Trans. Knowledge and Data Eng., vol. 17, no. 4, pp. 491–502.
  • Lucasius and Kateman (1992) Lucasius, C., and Kateman, G. (1992). Towards Solving Subset Selection Problems with the Aid of the Genetic Algorithm. PPSN, pp. 241–250.
  • Mann (2001) Mann, K., Agartz, I., Harper, C., Shoaf, S., Rawlings, R. R., Momenan, R., Hommer, D. W., Pfefferbaum, A., Sullivan, E. V., Anton, R. F., Drobes, D. J., George, M. S., Bares, R., Machulla, H.-J., Mundle, G., Reimold, M., and Heinz, A., (2001). Neuroimaging in alcoholism: Ethanol and brain damage. Alcoholism: Clinical and Experimental Research. 25, pp. 104S–109S.
  • Mathias (2000) Mathias, K. E., Eshelman, L. J., Schaffer, J. D., Augusteijn, L., Hoogendijk, P. F., Wiel, R. (2000). Code Compaction Using Genetic Algorithms. Genetic and Evolutionary Computation Conference, pp. 710–717.
  • Narendra and Fukunaga (1977) Narendra, P. M. and Fukunaga, K. (1977). A branch and bound algorithm for feature subset selection. IEEE Transactions on Computers, IEEE, pp. 917–922.
  • Oliveira (2001) Oliveira, L.S., Benahmed, N., Sabourin, R., Bortolozzi, F., and Suen, C.Y. (2001). Feature subset selection using genetic algorithms for handwritten digit recognition. In Proceedings of the 14th Brazilian Symposium on Computer Graphics and Image Processing, Florianópolis-Brazil, IEEE Computer Society, pp. 362–369.
  • Ong (2005) Ong, K., Thung, K., Wee, C., and Paramesran, R. (2005). Selection of a subset of eeg channels using pca to classify alcoholics and non-alcoholics. Proceedings of the 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, pp. 4195–4198.
  • Palaniappan (2002) Palaniappan, R. (2002). Using genetic algorithm to identify the discriminatory subset of multi-channel spectral bands for visual response. Applied Soft Computing 2, pp. 48–60.
  • Palaniappan (2005) Palaniappan, R. (2005). Discrimination of alcoholic subjects using second order autoregressive modeling of brain signals evoked during visual stimulus perception. World Academy of Science, Engineering and Technology, pp. 282–287.
  • Palaniappan (2006) Palaniappan, R. (2006). Improved automated classification of alcoholics and non-alcoholics. Information Technology, pp. 182–186.
  • Palaniappan (2007) Palaniappan, R. (2007). Screening for chronic alcoholic subject using multiple gamma band eeg: A pilot study. Journal of Computer Science and Technology, pp. 182–185.
  • Radcliffe (1990) Radcliffe, N. J. (1990). Genetic neural networks on MIMD Computers. Ph.D. dissertation, Department of Theoretical Physics, University of Edinburgh, Edinburgh, UK.
  • Radcliffe (1992) Radcliffe, N. J. (1992). Genetic set recombination. Foundations of Genetic Algorithms, Morgan Kaufmann, pp. 203–219.
  • Roy (2013) Roy, A., Schaffer, J. D., and Laramee, C. B. (2013). Evolving spike neural network sensors to characterize the alcoholic brain using visually evoked response potential. Complex Adaptive Systems, Baltimore, MD.
  • Schaffer (2005) Schaffer, J. D., Janevski, A., and Simpson, M. (2005). A genetic algorithm approach for discovering diagnostic patterns in molecular measurement data. Symposium on Computational Intelligence in Bioinformatics and Computational Biology, IEEE, pp. 392–399.
  • Seok (2004) Seok, O., Lee, J., and Moon, B. (2004). Hybrid genetic algorithms for feature selection. IEEE Transaction on Pattern Analysis Machine Intelligence vol. 26 , no. 11, pp. 1424-1437.
  • Shahina (2008) Shahina, A., Karthikeyan, R., Rakhi, R., Gopal, M., and Khan, A. (2008). Auto-associative neural networks for discrimination of chronic alcoholics using visual evoked potentials. International Conference on Computing, Communication and Networking, pp. 1–6.
  • Shri and Sriraam (2012) Shri, T.P., and Sriraam, N. (2012). EEG based detection of alcoholics-A selective review. International Journal of Biomedical and Clinical Engineering, vol-1.
  • Siedlecki (1989) Siedlecki, W. and Sklansky, J. (1989). A Note on Genetic Algorithms for Large-scale Feature Selection. Pattern Recogn. Lett., Elsevier Science Inc., New York, NY, USA, pp. 335–347.
  • Yang and Honavar (1998) Yang, J. and Honavar, V. (1998). Feature subset selection using a genetic algorithm. IEEE Intelligent Systems, vol. 13, no. 2, pp. 44–49.
  • Zhang and Sun (2002) Zhang, H., and Sun, G. (2002). Feature selection using tabu search method. Pattern recognition, volume 35, number 3, Elsevier, pp. 701–711.