Evolutionary algorithms (EAs) Bäck (1996) have been successfully applied to many fields and can achieve extraordinary performance in addressing some real-world hard problems, particularly NP-hard problems Higuchi et al. (1999); Koza et al. (2003); Hornby et al. (2006); Benkhelifa et al. (2009). To gain an understanding of the behavior of EAs, many theoretical studies have focused on the running time required to achieve exact optimal solutions He and Yao (2001); Yu and Zhou (2008); Neumann and Witt (2010); Auger and Doerr (2011). In practice, EAs are most commonly used to obtain satisficing solutions, yet theoretical studies of the approximation ability of EAs have only emerged recently.
He and Yao (2003) first studied conditions under which the wide-gap far distance and the narrow-gap long distance problems are hard to approximate using EAs. Giel and Wegener (2003) investigated a (1+1)-EA for a maximum matching problem and found that the time taken to find exact optimal solutions is exponential but is only $O(n^{2\lceil 1/\varepsilon \rceil})$ for $(1+\varepsilon)$-approximate solutions, which demonstrates the value of EAs as approximation algorithms.
Subsequently, further results on the approximation ability of EAs were reported. For the (1+1)-EA, the simplest type of EA, two classes of results have been obtained. On one hand, it was found that the (1+1)-EA has an arbitrarily poor approximation ratio for the minimum vertex cover problem and thus also for the minimum set cover problem Friedrich et al. (2010); Oliveto et al. (2009). On the other hand, it was also found that the (1+1)-EA provides a polynomial-time randomized approximation scheme for a subclass of the partition problem Witt (2005). Furthermore, for some subclasses of the minimum vertex cover problem for which the (1+1)-EA gets stuck, a multiple restart strategy allows the EA to recover an optimal solution with high probability Oliveto et al. (2009). Another result for the (1+1)-EA is that it improves a -approximation algorithm to a -approximation on the minimum vertex cover problem Friedrich et al. (2009). This implies that it might sometimes be useful as a post-optimizer.
Recent advances in multi-objective (usually bi-objective) EAs have shed light on the power of EAs as approximation optimizers. For a single-objective problem, multi-objective reformulation introduces an auxiliary objective function for which a multi-objective EA is used as the optimizer. Scharnow et al. (2002) first suggested that multi-objective reformulation could be superior to the use of a single-objective EA. This was confirmed for various problems Neumann and Wegener (2005, 2007); Friedrich et al. (2010); Neumann et al. (2008) by showing that while a single-objective EA could get stuck, multi-objective reformulation helps to solve the problems efficiently.
Regarding approximations, it has been shown that multi-objective EAs are effective for some NP-hard problems. Friedrich et al. (2010) proved that a multi-objective EA achieves a -approximation ratio for the minimum set cover problem, and reaches the asymptotic lower bound in polynomial time. Neumann and Reichel (2008) showed that multi-objective EAs achieve a -approximation ratio for the minimum multicuts problem in polynomial time.
In the present study, we investigate the approximation ability of EAs by introducing a framework called simple evolutionary algorithm with isolated population (SEIP), which uses an isolation function to manage competition among solutions. By specifying the isolation function, SEIP can be implemented as a single- or multi-objective EA. Multi-objective EAs previously analyzed Neumann and Laumanns (2006); Friedrich et al. (2010); Neumann and Reichel (2008) can be viewed as special cases of SEIP in terms of the solutions maintained in the population. By analyzing the SEIP framework, we obtain a general characterization of EAs that guarantee approximation quality.
We then study the minimum set cover problem (MSCP), which is an NP-hard problem Feige (1998). We prove that for the unbounded MSCP, a simple configuration of SEIP efficiently obtains an $H_k$-approximation ratio (where $H_k$ is the harmonic number of the cardinality $k$ of the largest set), the asymptotic lower bound Feige (1998). For the minimum $k$-set cover problem, this approach efficiently yields an $(H_k - \frac{k-1}{8k^9})$-approximation ratio, the currently best-achievable quality Hassin and Levin (2005). Moreover, for a subclass of the minimum $k$-set cover problem, we demonstrate how SEIP, with either one-bit or bit-wise mutation, can overcome the difficulty that limits the greedy algorithm.
The remainder of the paper is organized as follows. After introducing preliminaries in Section 2, we describe SEIP in Section 3 and characterize its behavior for approximation in Section 4. We then analyze the approximation ratio achieved by SEIP for the MSCP in Section 5. In Section 6 we conclude the paper with a discussion of the advantages and limitations of the SEIP framework.
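We use bold lowercase letters such as $\mathbf{x}$ and $\mathbf{w}$ to represent vectors. We denote $[m]$ as the set $\{1, 2, \ldots, m\}$ and $2^S$ as the power set of $S$, which consists of all subsets of $S$. We denote $H_n = \sum_{i=1}^{n} \frac{1}{i}$ for the $n$th harmonic number. Note that $H_n \sim \ln n$ since $\ln n \le H_n \le \ln n + 1$.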
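As a quick numerical illustration of the harmonic number and the standard bounds $\ln n \le H_n \le \ln n + 1$, the following Python sketch computes $H_n$ directly from its definition. The function name `harmonic` is an illustrative choice and does not come from the paper.

```python
import math


def harmonic(n: int) -> float:
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n (H_0 = 0)."""
    return sum(1.0 / i for i in range(1, n + 1))


# Check the logarithmic bounds on a few sample values of n.
for n in (1, 10, 1000):
    h = harmonic(n)
    assert math.log(n) <= h <= math.log(n) + 1
```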
In this paper, we consider minimization problems as follows. [Minimization problem] Given an evaluation function and a set of feasibility constraints , find a solution that minimizes while satisfying constraints in . A problem instance can be specified by its parameters .
In the definition of the minimization problem, solutions are represented as binary vectors. When the aim of a minimization problem is to find a subset from a universal set, we can equivalently use a binary vector to represent a subset, where each element of the vector indicates the membership of a corresponding element of the universal set. For example, given a universal set , its subset can be represented by a binary vector , and we define . Considering the equivalence between sets and binary vectors, we apply set operators () to binary vectors when there is no confusion. For example, , , and . We denote as the vector corresponding to the empty set.
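The correspondence between subsets and binary vectors can be sketched in a few lines of Python. This is only an illustration under the assumption that the universal set is given as an ordered list; the helper names (`to_vector`, `to_set`, `union`, `intersection`, `size`) are hypothetical and not part of the paper.

```python
def to_vector(subset, universe):
    """Encode a subset of `universe` as a 0/1 membership vector."""
    return [1 if e in subset else 0 for e in universe]


def to_set(vector, universe):
    """Decode a 0/1 membership vector back into a set."""
    return {e for e, bit in zip(universe, vector) if bit}


def union(x, y):
    """Set union applied position-wise to two binary vectors."""
    return [a | b for a, b in zip(x, y)]


def intersection(x, y):
    """Set intersection applied position-wise to two binary vectors."""
    return [a & b for a, b in zip(x, y)]


def size(x):
    """Cardinality of the set encoded by x, i.e. the number of 1 bits."""
    return sum(x)
```

With `universe = ["a", "b", "c", "d"]`, the subset `{"a", "c"}` is encoded as `[1, 0, 1, 0]`, and the all-zero vector corresponds to the empty set.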
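We investigate the minimum set cover problem (MSCP), which is an NP-hard problem. [Minimum set cover problem (MSCP)] Given a set of $m$ elements $U = \{e_1, \ldots, e_m\}$ and a collection $C = \{S_1, \ldots, S_n\}$ of $n$ nonempty subsets of $U$, where each $S_i$ is associated with a positive cost $w(S_i)$, find a subset $X^* \subseteq C$ such that $\sum_{S \in X^*} w(S)$ is minimized with respect to $\bigcup_{S \in X^*} S = U$.
Using binary vector representation, we denote an instance of the MSCP by its parameters $\langle n, \mathbf{w}, C, U \rangle$, where $n = |C|$ and $\mathbf{w}$ is the cost vector. The MSCP involves finding a vector $\mathbf{x}_{OPT}$, which is equivalent to its set representation $X^*$, by solving a constrained optimization problem
$$\mathbf{x}_{OPT} = \arg\min_{\mathbf{x} \in \{0,1\}^n} \mathbf{w} \cdot \mathbf{x} \quad \text{s.t.} \quad \bigcup_{S \in C(\mathbf{x})} S = U,$$
where $\mathbf{w} \cdot \mathbf{x}$ is the inner product between vectors $\mathbf{w}$ and $\mathbf{x}$, and $C(\mathbf{x})$ denotes a set consisting of elements in the collection $C$ that are indicated by binary vector $\mathbf{x}$.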
[Minimum -set cover problem] An MSCP is a -set cover problem if, for some constant , it holds that for all , denoted as .
A solution is called feasible if it satisfies all the constraints in ; otherwise, it is called an infeasible solution, or a partial solution in this paper. Here, we assume that the evaluation function is defined on the full input space, that is, it evaluates all possible solutions regardless of their feasibility. For example, if we evaluate a solution of the MSCP using , this evaluation function can be calculated for any solution.
Given a minimization problem, we denote as an optimal solution of the problem, and . For a feasible solution , we regard the ratio as its approximation ratio. If the approximation ratio of a feasible solution is upper-bounded by some value , that is,
the solution is called an -approximate solution. An algorithm that guarantees to find an -approximate solution for an arbitrary problem instance in polynomial time is an -approximation algorithm.
The greedy algorithm Chvátal (1979) described here as Algorithm 2 is the best-known approximation algorithm for the MSCP. This greedy algorithm consists of a sequence of steps. The cost of a candidate set is defined as its weight divided by the number of its elements that have not been covered yet (i.e., the quantity in line 3). The algorithm then picks the candidate set with the smallest cost for the solution (line 4) and marks the newly covered elements (line 6). This simple algorithm yields an -approximation ratio or, more exactly, , where is the cardinality of the largest set Chvátal (1979). The key to the proof of the approximation ratio is the definition of the price of elements, as in line 5. The price of an element equals the cost of the set that first covers it; therefore, the total price of all elements equals the total cost of the solution. Furthermore, it should be noted that when an element is covered by a set with the lowest cost, it would also be covered by one set of an optimal solution but with a higher cost. Therefore the price of the element is upper-bounded by the optimal cost and hence the approximation ratio is upper-bounded. For a detailed proof please refer to Chvátal (1979).
[Greedy algorithm Chvátal (1979)]Given a minimum -set cover problem , the greedy algorithm consists of the following steps:
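As a rough illustration of the greedy rule described above, the following Python sketch repeatedly picks the set with the smallest weight per newly covered element and records a price for each element at the moment it is first covered. The function name `greedy_set_cover`, the data layout (a list of Python sets with a parallel list of weights), and the tie-breaking by lowest index are assumptions made for this example; the sketch does not reproduce the line numbering of the paper's algorithm.

```python
def greedy_set_cover(universe, sets, weights):
    """Greedy weighted set cover.

    Returns (chosen, price): the indices of the selected sets in selection
    order, and a dict mapping each element to the price assigned to it.
    """
    uncovered = set(universe)
    chosen, price = [], {}
    while uncovered:
        best, best_ratio = None, float("inf")
        for i, s in enumerate(sets):
            gain = len(s & uncovered)  # number of newly covered elements
            if gain and weights[i] / gain < best_ratio:
                best, best_ratio = i, weights[i] / gain
        if best is None:
            raise ValueError("the given sets do not cover the universe")
        for e in sets[best] & uncovered:
            price[e] = best_ratio  # price = cost per newly covered element
        uncovered -= sets[best]
        chosen.append(best)
    return chosen, price


sets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
chosen, price = greedy_set_cover({1, 2, 3, 4, 5}, sets, [1, 1, 1, 1])
# The prices sum to the total weight of the chosen sets.
assert abs(sum(price.values()) - len(chosen)) < 1e-9
```

In this small instance the first pick covers three elements at price 1/3 each and the second covers two elements at price 1/2 each, so the total price equals the total cost of 2, matching the accounting used in the proof idea.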
Several studies have shown that the approximation ratio of the MSCP is lower-bounded by unless that is unlikely Raz and Safra (1997); Slavík (1997); Feige (1998); Alon et al. (2006). Therefore, the greedy algorithm achieves the asymptotic lower bound of the approximation ratio for the MSCP.
Although the approximation ratio of the greedy algorithm is asymptotically tight for the unbounded MSCP, a better approximation ratio can be achieved for the minimum $k$-set cover problem, where $k$ is a constant. It has been proved that for the unweighted minimum $k$-set cover problem, an $(H_k - \frac{1}{2})$-approximation ratio can be achieved Duh and Fürer (1997) and, if $k \ge 4$, an improved ratio $H_k - \frac{196}{390}$ can be achieved Levin (2008). For the weighted minimum $k$-set cover problem, a greedy-algorithm-with-withdrawals (GAWW) was presented and achieved an $(H_k - \frac{k-1}{8k^9})$-approximation ratio Hassin and Levin (2005).
The GAWW algorithm presented here as Algorithm 2 is a modification of the greedy algorithm. In every iteration, the algorithm chooses between taking a greedy step, which is the same as in the greedy algorithm, and a withdrawal step, which replaces a set in the current solution with at most $k$ candidate sets. It evaluates the cost of candidate sets as in the greedy algorithm, and also evaluates the benefit of the withdrawal (calculated in lines 4 and 5). When the benefit of the withdrawal step is not large enough according to the criterion in line 6, the algorithm takes the greedy step; otherwise it takes the withdrawal step. To prove the approximation ratio, the price of elements is defined similarly in line 7 for the greedy step and line 10 for the withdrawal step, and is used later in the proofs in this paper.
[GAWW Hassin and Levin (2005)]Given a minimum -set cover problem , GAWW consists of the following steps.
The (1+1)-EA is the simplest EA implementation, as described in Algorithm 2. Starting from a solution generated uniformly at random, the (1+1)-EA repeatedly generates a new solution from the current one using a mutation operator, and the current solution is replaced if the new solution has better (or equal) fitness.
[(1+1)-EA]Given a minimization problem , each solution is encoded as a binary vector of length and the (1+1)-EA-minimizing consists of the following steps.
Two mutation operators are commonly used to implement the “mutate” step in line 3:
One-bit mutation: Flip one randomly selected bit position of $\mathbf{x}$ to generate $\mathbf{x}'$.
Bit-wise mutation: Flip each bit of $\mathbf{x}$ with probability $\frac{1}{n}$ to generate $\mathbf{x}'$.
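The two mutation operators and the acceptance rule of the (1+1)-EA can be sketched as follows. This is a minimal Python illustration rather than the paper's listing: the function names, the fixed step budget `steps` (the text does not specify a stopping criterion), the `seed` parameter, and the flip probability $1/n$ for bit-wise mutation are assumptions of this example.

```python
import random


def one_bit_mutation(x, rng):
    """Flip exactly one uniformly chosen bit of x."""
    y = list(x)
    i = rng.randrange(len(y))
    y[i] ^= 1
    return y


def bitwise_mutation(x, rng):
    """Flip each bit of x independently with probability 1/n."""
    n = len(x)
    return [b ^ 1 if rng.random() < 1.0 / n else b for b in x]


def one_plus_one_ea(f, n, mutate, steps, seed=0):
    """Minimize f over {0,1}^n with the (1+1)-EA for a fixed number of steps."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]  # uniformly random start
    for _ in range(steps):
        y = mutate(x, rng)
        if f(y) <= f(x):  # accept offspring with better or equal fitness
            x = y
    return x


# Example: minimize the number of 1 bits in an 8-bit vector.
best = one_plus_one_ea(sum, 8, one_bit_mutation, steps=2000)
```

One-bit mutation can only move to Hamming neighbors, whereas bit-wise mutation reaches any vector with positive probability in a single step.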
It has been shown that the (1+1)-EA has an arbitrarily poor approximation ratio for the MSCP Friedrich et al. (2010). Laumanns et al. (2002) used a multi-objective reformulation with a multi-objective EA named SEMO to achieve a -approximation ratio. The SEMO algorithm is described in Algorithm 2, where two objectives are presented as . To apply SEMO to the MSCP, let evaluate the cost of the solution and evaluate the number of uncovered elements. Thus, SEMO minimizes the cost and the number of uncovered elements of solutions simultaneously. A notable difference between SEMO and (1+1)-EA is that SEMO uses a non-dominance relationship, implemented using the function in SEMO. The population of SEMO maintains non-dominant solutions, that is, no solution is superior to another for both of the two objectives.
[SEMO Laumanns et al. (2002)]Given a two-objective minimization problem , each solution is encoded as a binary vector of length . SEMO minimization of consists of the following steps.
where the function of two solutions is defined such that is if any one of the following three rules is satisfied:
3) and and
and is otherwise.
The SEIP framework is depicted in Algorithm 3. It uses an isolation function $\mu$ to isolate solutions. For some integer $q$, the function $\mu$ maps a solution to a subset of $[q]$. If and only if two solutions $\mathbf{x}_1$ and $\mathbf{x}_2$ are mapped to subsets with the same cardinality, that is, $|\mu(\mathbf{x}_1)| = |\mu(\mathbf{x}_2)|$, the two solutions compete with each other. In that case, we say the two solutions are in the same isolation, and there are at most $q+1$ isolations since the subsets of $[q]$ have $q+1$ different cardinalities.
[SEIP]Given a minimization problem , an isolation function encodes each solution as a binary vector of length . SEIP minimization of with respect to constraint consists of the following steps.
where the function of two solutions determines whether one solution is superior to the other. This is defined as follows: is if both of the following rules are satisfied:
2) , or but
and is otherwise.
When the isolation function puts all solutions in a single isolation for a particular instance, SEIP degrades to the (1+1)-EA. The isolation function can also be configured to simulate the dominance relationship of multi-objective EAs such as SEMO/GSEMO Laumanns et al. (2002) and DEMO Neumann and Reichel (2008). If we are dealing with -objective optimization with discrete objective values, a simple approach is to use one of the objective functions, say , as the fitness function and use the combination of the values of the remaining objective functions (say ) as the isolation functions. Thus, two solutions compete (for ) only when they share the same objective values for . This simulation shows that all the non-dominant solutions of a multi-objective EA are also kept in the SEIP population, and if a non-dominant solution does not reside in the SEIP population, there must be another solution that dominates it. Hence, SEIP can be viewed as a generalization of multi-objective EAs in terms of the solutions retained. This simulation also reveals that SEIP retains more solutions than a multi-objective EA using the dominance relationship. On the one hand, this means that SEIP takes more time to manipulate a larger population than a multi-objective EA does, a cost that could be reduced by defining an isolation function that aggregates nearby solutions, as has been done for DEMO Neumann and Reichel (2008). On the other hand, SEIP has more opportunities available to find a better approximation solution, since the relationship “ dominates ” does not imply that definitely leads to a better approximation solution than .
Taking the MSCP as an example, we can use the fitness function , which is the sum of costs of the selected sets. For the isolation function, we can use as if is feasible and if is infeasible, which isolates the feasible from the infeasible solutions (and thus ); we can also use the isolation function , and thus the solutions compete only when they cover the same number of elements (and thus ).
The mutation operator can use either one-bit or bit-wise mutation. Usually, one-bit mutation is considered for local searches, while bit-wise mutation is suitable for global searches as it has a positive probability for producing any solution. We denote SEIP with one-bit mutation as LSEIP and SEIP with bit-wise mutation as GSEIP, where “L” and “G” denote “local” and “global”, respectively.
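To make the interplay of fitness function, isolation function, and mutation concrete, the following Python sketch runs an SEIP-style loop with bit-wise mutation on a set cover instance. It is a simplified illustration under several assumptions that are not taken from the paper's listing: the function name `seip_mscp`, the fixed step budget, the start from the empty solution, the isolation index taken as the number of covered elements, and the choice to keep exactly one solution per isolation, replaced only by an offspring with lower cost or with equal cost and fewer selected sets.

```python
import random


def seip_mscp(n_elements, sets, weights, steps, seed=0):
    """SEIP-style search with bit-wise mutation for weighted set cover.

    Elements are 0..n_elements-1 and `sets` is a list of Python sets.
    Returns the best full cover found (as a 0/1 tuple) or None.
    """
    rng = random.Random(seed)
    n_sets = len(sets)

    def covered(x):
        out = set()
        for s, bit in zip(sets, x):
            if bit:
                out |= s
        return out

    def key(x):
        # Lower cost wins; ties are broken by fewer selected sets.
        cost = sum(w for w, bit in zip(weights, x) if bit)
        return (cost, sum(x))

    # One slot per isolation, indexed by the number of covered elements.
    population = {0: tuple([0] * n_sets)}
    for _ in range(steps):
        parent = rng.choice(list(population.values()))
        child = tuple(b ^ 1 if rng.random() < 1.0 / n_sets else b
                      for b in parent)
        iso = len(covered(child))
        incumbent = population.get(iso)
        if incumbent is None or key(child) < key(incumbent):
            population[iso] = child
    return population.get(n_elements)


sets = [{0, 1}, {2, 3}, {0, 1, 2, 3}, {0}, {1}]
result = seip_mscp(4, sets, [1, 1, 3, 1, 1], steps=10000)
```

Because solutions only compete inside their own isolation, the empty solution and cheap partial covers are retained alongside full covers, and mutation can build on any of them.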
For convenience, SEIP is set to start from solution rather than from a random solution, as commonly done for other EAs. Under the condition that any solution will have a better fitness if any 1 bit is turned to 0, we can bound the difference between random initialization and starting from . From a random solution, SEIP takes at most expected steps to find according to the following argument. Suppose the worst case whereby random initialization generates a solution with all 1 bits; according to the fitness function condition, finding is equivalent to solving the OneMax problem using a randomized local search and the (1+1)-EA, which takes steps for both LSEIP and GSEIP Droste et al. (1998). Furthermore, note that there can be at most solutions in the population, and thus it costs SEIP expected steps to choose one particular solution from the population.
The stopping criterion is not described in the definition of SEIP, since EAs are usually used as anytime algorithms in practice. We now analyze the approximation ratio and the corresponding computational complexity of SEIP.
4 General approximation behavior of SEIP
For minimization problems, we consider linearly additive isolation functions. is a linearly additive isolation function if, for some integer ,
The quality of a feasible solution is measured in terms of the approximation ratio. To measure the quality of a partial (infeasible) solution, we define a partial reference function and partial ratio as follows.
[Partial reference function]
Given a set and a value , a function is a partial reference function if
2) for all such that . For a minimization problem with optimal cost and an isolation function mapping feasible solutions to the set , we denote a partial reference function with respect to and as . When the problem and the isolation function are clear, we omit the subscripts and simply denote the partial reference function as .
[Partial ratio] Given a minimization problem and an isolation function , the partial ratio of a (partial) solution with respect to a corresponding partial reference function is
and the conditional partial ratio of conditioned on is
where and .
The partial ratio is an extension of the approximation ratio. Note that the partial ratio for a feasible solution equals its approximation ratio. We have two properties of the partial ratio. One is that it is non-increasing in SEIP, as stated in Lemma 4, and the other is its decomposability, as stated in Lemma 4.
Given a minimization problem and an isolation function , if SEIP has generated an offspring with partial ratio with respect to a corresponding partial reference function , then there is a solution in the population such that , and the partial ratio of is at most . is put into the population after it is generated; otherwise there is another solution with and , and in this case let . The lemma is proved since and by the superior function the cost is non-increasing.
From Lemma 4, we know that the partial ratio in each isolation remains non-increasing. Since SEIP repeatedly tries to generate solutions in each isolation, SEIP can be considered as optimizing the partial ratio in each isolation.
Given a minimization problem and an isolation function , for three (partial) solutions and such that , we have
with respect to a corresponding partial reference function . Since , we have, by definition,
Thus, we have
Lemma 4 reveals that the partial ratio for a solution is related to the conditional partial ratio of a building block. This can be considered as the way in which SEIP optimizes the partial ratio in each isolation, that is, by optimizing the conditional partial ratio of each building block partial solution. We then have the following theorem.
Given a minimization problem and an isolation function mapping to subsets of , assume that every solution is encoded in an -length binary vector. For some constant with respect to a corresponding partial reference function , if
for every partial solution such that , SEIP takes as the parent solution and generates an offspring partial solution such that and in polynomial time in and ,
then SEIP finds an -approximate solution in polynomial time in and . Starting from , we can find a sequence of partial solutions , such that
because of the conditions. Note that when a partial solution is added to the solution, the offspring solution is in a different isolation to the parent solution. Since the isolation function is linearly additive, the length of the sequence cannot be greater than the number of isolations .
Let be the expected time for SEIP to generate a partial solution in the sequence from its parent solution, which is polynomial in and by the condition. It takes at most expected steps for SEIP to pick the parent, since there are at most solutions in the population. Therefore, the total time to reach a feasible solution is , that is, , which is still polynomial in and .
By Lemma 4, since the feasible solution is composed of and partial solutions , the approximation ratio for is at most as large as the maximum conditional partial ratio for the partial solutions, .
Theorem 4 reveals how SEIP can work to achieve an approximate solution. Starting from the empty set, SEIP uses its mutation operator to generate solutions in all isolations, and finally generates a feasible solution. During the process, SEIP repeatedly optimizes the partial ratio in each isolation by finding partial solutions with better conditional partial ratios. Since the feasible solution can be viewed as a composition of a sequence of partial solutions from the empty set, the approximation ratio is related to the conditional partial ratio of each partial solution.
In Theorem 4, the approximation ratio is upper-bounded by the maximum conditional partial ratio, while some building-block partial solutions may have lower conditional partial ratios but are not utilized. Moreover, in Theorem 4 we restrict SEIP to append partial solutions, while GSEIP can also remove partial solutions using bit-wise mutation. The approximation ratio can have a tighter bound if we consider these two issues. Applying the same principle as for Theorem 4, we present a theorem for GSEIP in particular that leads to a tighter approximation ratio.
[Non-negligible path] Given a minimization problem and an isolation function mapping to subsets of , assume that every solution is encoded in an -length binary vector. A set of solutions is a non-negligible path with ratios and gap if and, for every solution , there exists a solution , where the pair of solutions satisfies
if , .
Given a minimization problem and an isolation function mapping to subsets of , assume that every solution is encoded in an -length binary vector. If there exists a non-negligible path with ratios and gap , then GSEIP finds an -approximate solution in expected time . We prove the theorem by tracking the process of GSEIP over the non-negligible path. We denote by the solution we want to operate on. Initially, , and thus .
GSEIP takes at most expected steps to operate on , since there are at most solutions in the population and GSEIP selects one to operate on in each step.
According to the definition of the non-negligible path, there exists a pair of solutions with respect to such that . We denote . The probability that the mutation operator generates solution is at least , which implies expected steps.
According to the definition of the non-negligible path, suppose that ; we also have . Note that can be decomposed recursively, and thus according to the theorem conditions, we have
Let be a corresponding partial reference function. Thus, .
Given , again according to the definition of the non-negligible path, we have . Then we store solution in the population; otherwise there exists another solution with and has a smaller partial ratio than by Lemma 4 when we substitute for .
Now let be . We have .
After at most iterations of the above update of , we have , which means is feasible. Thus, the partial ratio of , , is its approximation ratio.
Thus, at most jumps are needed to reach a feasible solution, each of which takes expected steps to operate on a particular solution, and it takes expected steps to choose that particular solution. Overall, it takes expected steps to achieve the feasible solution.
Using Theorem 4 to prove the approximation ratio of GSEIP for a specific problem, we need to find a non-negligible path and then calculate the conditional evaluation function for each jump on the path. One way of finding a non-negligible path for a problem is to follow an existing algorithm for the problem. This will lead to a proof that the EA can achieve the same approximation ratio by simulating the existing algorithm. Similar ideas have been used to confirm that EAs can simulate dynamic programming Doerr et al. (2009). In addition, note that the concrete form of the partial reference function is not required in the theorem.
5 SEIP for the MSCP
To apply SEIP to the MSCP, we use the fitness function
which has the objective of minimizing the total weight. For a solution , we denote , that is, is the set of elements covered by . We use the isolation function
which, owing to the effect of the isolation function, makes two solutions compete only when they cover the same number of elements. We could regard a partial reference function of to be the minimum price that optimal solutions pay for covering the same number of elements covered by , although it is not necessary to calculate the partial reference function.
Instead of directly assessing a minimum $k$-set cover problem , we analyze EAs for the extended input Hassin and Levin (2005). The original problem is extended by first taking a closure of under the subset operation, that is, , and the weight vector is extended accordingly by if . Then if an optimal solution contains a set with fewer than $k$ elements, we construct a new problem instance in which a minimum number of dummy elements are added to the element set so that all sets of the optimal solution are filled to be $k$-sets using dummy elements while keeping their weights unchanged. Therefore, the extended problem has an optimal solution containing $k$-sets. Analysis on the extended input leads to the same result as for the original problem, as shown in Lemma 5. The lemma is derived from Lemmas 2 and 3 of Hassin and Levin (2005).
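The first step of the extension, closing the collection under the subset operation, can be sketched in Python as follows. The function name `subset_closure` is hypothetical, and the weight rule used here (each subset inherits the weight of its original superset, taking the minimum when several supersets contain it) is an assumption of this example rather than a formula from the text. The enumeration is exponential in the set size, which is acceptable only because $k$ is a constant.

```python
from itertools import combinations


def subset_closure(sets, weights):
    """Return a dict mapping every nonempty subset of an input set to a weight.

    Assumption: a subset inherits the weight of its superset, and the minimum
    is taken when it is contained in several input sets.
    """
    closed = {}
    for s, w in zip(sets, weights):
        items = sorted(s)
        for r in range(1, len(items) + 1):
            for sub in combinations(items, r):
                key = frozenset(sub)
                closed[key] = min(w, closed.get(key, w))
    return closed


closed = subset_closure([{1, 2}, {2, 3}], [3, 5])
```

Here the closure contains the five sets {1}, {2}, {3}, {1, 2}, and {2, 3}; the singleton {2} is a subset of both inputs and therefore receives the smaller weight.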
The extended input does not affect the optimal cost or the optimal solutions. We can then assume without loss of generality that an optimal solution consists of -sets and is disjoint, that is, for all and and for all .
Thus, an optimal solution can be represented as a matrix of elements, as plotted in Figure 1, where column corresponds to the elements in . Note that there are exactly rows in , since each set in an optimal solution contains exactly elements. For an element , we denote as the column to which belongs, that is, the set in that contains . We also denote as the cost of , and as the number of uncovered elements in column at the time at which is covered.
5.1 SEIP ratio for the (unbounded) MSCP
In Theorem 5.1 we show that SEIP achieves an -approximation ratio, where is the cardinality of the largest set. It has been proved that SEMO Friedrich et al. (2010) achieves an -approximation ratio for the MSCP, which is known as the asymptotic lower bound for the problem. The theorem confirms that SEIP can simulate multi-objective EAs in terms of the solution retained, so that SEIP is able to achieve the approximation ratio obtained by multi-objective EAs.
Given an MSCP where , GSEIP finds an -approximate solution in expected time , where is the size of the largest set in .
Given an MSCP and an arbitrary partial solution , let with respect to . For every element , there exists a set of an optimal solution that covers , and it holds that,
We denote as the current solution to be operated on. We find where the set minimizes with respect to . Let . Thus, we have and for partial solution .
By Lemma 5.1, for all , there exists a set of an optimal solution that covers , and suppose
In the worst case, , that is, the added set only covers one uncovered element . In this case, according to the definition of , we have . We then have
Thus, we find a non-negligible path with gap 1 and sum of ratios
where denotes the partial solution that will cover in its next step, and is the size of the largest set in .
By Theorem 4, GSEIP finds an -approximate solution. Note that the isolation function maps to at most isolations, the non-negligible path has a constant gap of 1, and the solution is encoded in an -length binary vector; thus, GSEIP takes expected time .
Given an MSCP where , LSEIP finds an -approximate solution in expected time, where is the size of the largest set in .
5.2 Ratio for GSEIP for the minimum -set cover problem
In this section, we prove in Theorem 5.2 that GSEIP achieves the same -approximation ratio as GAWW (Algorithm 2) Hassin and Levin (2005), the current best algorithm for the minimum -set cover problem. This result reveals that when the problem is bounded, which is very likely in real-world situations, GSEIP can yield a better approximation ratio than in the unbounded situation. Since the greedy algorithm cannot achieve an approximation ratio lower than , the result also implies that GSEIP has essential non-greedy behavior for approximations.
Given a minimum -set cover problem , where , we denote . GSEIP using finds an -approximate solution in expected time .
When applying the GAWW rule to select sets, we use Lemmas 5.2 and 5.2. Owing to the assignments of , the total price of elements covered by equals the cost of . We say a set is last-covered if . [Lemma 4 of Hassin and Levin (2005)] In each step of GAWW, its partial solution is disjoint. [Lemma 5 of Hassin and Levin (2005)] In each step of GAWW, we can assume without loss of generality that the set added to (i.e., or ) has no more than one element in common with every set in .
Given a minimum -set cover problem and an arbitrary partial solution , if the GAWW rule uses a withdrawal step to add sets to and withdrawal , denoting , then
Given a minimum -set cover problem and an arbitrary partial solution , if the GAWW rule selects a set that is not last-covered to add to (greedy step), and there is an element such that
Given a minimum -set cover problem and an arbitrary partial solution , if the GAWW rule selects a set that is not last-covered to add to (greedy step), followed by another set , and for all elements for which
Given a minimum -set cover problem and an arbitrary partial solution , if the GAWW rule selects a set that is last-covered to add to (greedy step), then
[Proof of Theorem 5.2] We find a path of isolations following the GAWW rule. Note that there are at most isolations.
Since the GAWW rule covers at least one uncovered element, we have .