1. Introduction

In this paper, we delve into two nature-inspired algorithms, Particle Swarm Optimization (PSO) (Eberhart and Kennedy, 1995) and Differential Evolution (DE) (Storn and Price, 1995), for solving continuous black-box optimization problems $f\colon \mathbb{R}^D \rightarrow \mathbb{R}$, which are subject to minimization without loss of generality. Here we only consider simple box constraints on the decision variables, meaning the search space is a hyper-box $[\mathbf{x}_{\min}, \mathbf{x}_{\max}] \subset \mathbb{R}^D$.
In the literature, a huge number of variants of PSO and DE have been proposed to enhance the empirical performance of the respective algorithms. Despite the empirical success of these variants, we found that most of them differ from the original PSO/DE in only one or two operators (e.g., the crossover), where usually some simple modifications are implemented. It is therefore natural to consider combinations of those variants. Following the so-called configurable CMA-ES approach (van Rijn et al., 2016, 2017), we first modularize both the PSO and DE algorithms, resulting in a modular framework in which different types of algorithmic modules are applied sequentially in each generation loop. When incorporating variants into this modular framework (the source code is available at https://github.com/rickboks/pso-de-framework), we first identify the modules at which modifications are made in a particular variant, and then treat the modifications as options of the corresponding modules. For instance, the so-called inertia weight (Shi and Eberhart, 1998), which is a simple modification to the velocity update in PSO, is considered an option of the velocity update module.
This treatment allows for combining existing variants of either PSO or DE and generating non-existing algorithmic structures. In a loose sense, it creates a space/family of swarm algorithms, which is configurable via instantiating the modules, and hence potentially primes the application of algorithm selection/configuration (Thornton et al., 2013) to swarm intelligence. More importantly, we also propose a meta-algorithm called PSODE that hybridizes the variation operators from both PSO and DE, and therefore gives rise to an even larger space of unseen algorithms. By hybridizing PSO and DE, we aim to unify the strengths of both sides, in an attempt to, for instance, improve the population diversity and the convergence rate. On the well-known Black-Box Optimization Benchmark (BBOB) (Hansen et al., 2016) problem set, we extensively tested all combinations of four different velocity updates (PSO), five neighborhood topologies (PSO), two crossover operators (DE), five mutation operators (DE), and four selection operators, leading up to $4 \times 5 \times 2 \times 5 \times 4 = 800$ algorithms. We benchmark those algorithms on all 24 test functions from the BBOB problem set and analyze the experimental results using the so-called IOHprofiler (Doerr et al., 2019), to identify algorithms that perform well on (a subset of) the 24 test functions.
This paper is organized as follows: Section 2 summarizes the related work. Section 3 reviews the state-of-the-art variants of PSO. Section 4 covers various cutting-edge variants of DE. In Section 5, we describe the novel modular PSODE algorithm. Section 6 specifies the experimental setup on the BBOB problem set. We discuss the experimental results in Section 7 and finally provide, in Section 8, the insights obtained in this paper as well as future directions.
2. Related Work
A hybrid PSO/DE algorithm has been proposed previously (Wen-Jun Zhang and Xiao-Feng Xie, 2003) to improve the population diversity and prevent premature convergence. This is attempted by using the DE mutation, instead of the traditional velocity and position update, to evolve candidate solutions in the PSO algorithm. This mutation is applied to the particle's best-found solution rather than its current position, resulting in a steady-state strategy. Another approach (Hendtlass, 2001) follows the conventional PSO algorithm, but occasionally applies the DE operator in order to escape local minima. Particles maintain their velocity after being perturbed by the DE operator. Other PSO/DE hybrids include a two-phase approach (Pant et al., 2008) and a Bare-Bones PSO variant based on DE (Omran et al., 2007), which requires little parameter tuning.
This work follows the approach of the modular and extensible CMA-ES framework proposed in (van Rijn et al., 2016), where many ES structures can be instantiated by arbitrarily combining existing variations of the CMA-ES. The authors of that work implement a Genetic Algorithm to efficiently evolve the ES structures, instead of performing an expensive brute-force search over all possible combinations of operators.
3. Particle Swarm Optimization
As introduced by Eberhart and Kennedy (Eberhart and Kennedy, 1995), Particle Swarm Optimization (PSO) is an optimization algorithm that mimics the behaviour of a flock of birds foraging for food. Particle $i$ in a swarm of size $M$ is associated with three vectors: its current position $\mathbf{x}_i$, velocity $\mathbf{v}_i$, and previous best position $\mathbf{p}_i$, where $i \in \{1, \ldots, M\}$. After the initialization, where $\mathbf{x}_i$ is initialized randomly and $\mathbf{v}_i$ is set to $\mathbf{0}$, the algorithm iteratively updates the velocity of each particle (please see the next subsection) and moves the particle accordingly:
\[ \mathbf{x}_i \leftarrow \mathbf{x}_i + \mathbf{v}_i. \tag{1} \]
To prevent the velocity from exploding, $\mathbf{v}_i$ is kept in the range $[-v_{\max}\mathbf{1}, v_{\max}\mathbf{1}]$ ($\mathbf{1}$ is a vector containing all ones). After every position update, the current position is evaluated, i.e., $f(\mathbf{x}_i)$ is computed. Here, $\mathbf{p}_i$ stands for the best solution found by particle $i$ (thus personal best), while $\mathbf{g}_i$ is used to track the best solution found in the neighborhood of particle $i$ (thus global best). Typically, the termination of PSO can be determined by simple termination criteria, such as the depletion of the function evaluation budget, as well as more complicated ones that rely on the convergence behavior, e.g., detecting whether the average distance between particles has dropped below a predetermined threshold. The pseudo-code is given in Alg. 1.
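The clamping and position update above can be sketched as follows. This is an illustrative fragment only; the helper names (`clampVelocity`, `updatePosition`) are ours and not taken from the paper's C++ framework.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative sketch of the velocity clamping and position update.
std::vector<double> clampVelocity(std::vector<double> v, double vmax) {
    for (double &vj : v)
        vj = std::clamp(vj, -vmax, vmax);  // keep each component in [-vmax, vmax]
    return v;
}

std::vector<double> updatePosition(const std::vector<double> &x,
                                   const std::vector<double> &v) {
    std::vector<double> xNew(x.size());
    for (std::size_t j = 0; j < x.size(); ++j)
        xNew[j] = x[j] + v[j];  // x <- x + v
    return xNew;
}
```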
3.1. Velocity Updating Strategies
As proposed in the original paper (Eberhart and Kennedy, 1995), the velocity vector in original PSO is updated as follows:
\[ \mathbf{v}_i \leftarrow \mathbf{v}_i + \mathbf{U}(0, \varphi_1) \otimes (\mathbf{p}_i - \mathbf{x}_i) + \mathbf{U}(0, \varphi_2) \otimes (\mathbf{g}_i - \mathbf{x}_i), \tag{2} \]
where $\mathbf{U}(0, \varphi)$ stands for a continuous uniform random vector with each component distributed uniformly in the range $[0, \varphi]$, and $\otimes$ is component-wise multiplication. Note that, henceforth, parameter settings such as $\varphi_1$ and $\varphi_2$ will be specified in the experimentation part (Section 6). As discussed before, velocities resulting from Eq. (2) have to be clamped to the range $[-v_{\max}\mathbf{1}, v_{\max}\mathbf{1}]$. Alternatively, the inertia weight $\omega$ (Shi and Eberhart, 1998) is introduced to moderate the velocity update without using $v_{\max}$:
\[ \mathbf{v}_i \leftarrow \omega\mathbf{v}_i + \mathbf{U}(0, \varphi_1) \otimes (\mathbf{p}_i - \mathbf{x}_i) + \mathbf{U}(0, \varphi_2) \otimes (\mathbf{g}_i - \mathbf{x}_i). \tag{3} \]
A large value of $\omega$ will result in an exploratory search, while a small value leads to a more exploitative behavior. It has been suggested to decrease the inertia weight over time, as it is desirable to scale down the explorative effect gradually. Here, we consider the inertia method with fixed as well as decreasing weights.
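For concreteness, the inertia-weight velocity update can be sketched as below. The uniform draws u1, u2 are passed in explicitly (instead of being sampled inside) so the arithmetic stays easy to follow, and the function name is our own:

```cpp
#include <cassert>
#include <vector>

// Sketch of the inertia-weight update: v <- w*v + phi1*u1.*(p - x)
// + phi2*u2.*(g - x), where u1 and u2 hold uniform draws from [0, 1].
std::vector<double> inertiaVelocityUpdate(
    const std::vector<double> &v, const std::vector<double> &x,
    const std::vector<double> &p, const std::vector<double> &g,
    const std::vector<double> &u1, const std::vector<double> &u2,
    double w, double phi1, double phi2) {
    std::vector<double> vNew(v.size());
    for (std::size_t j = 0; j < v.size(); ++j)
        vNew[j] = w * v[j]
                + phi1 * u1[j] * (p[j] - x[j])   // cognitive component
                + phi2 * u2[j] * (g[j] - x[j]);  // social component
    return vNew;
}
```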
Instead of only being influenced by the best neighbor, the velocity of a particle in the Fully Informed Particle Swarm (FIPS) (Mendes et al., 2004) is updated using the best previous positions of all its neighbors:
\[ \mathbf{v}_i \leftarrow \chi\Big(\mathbf{v}_i + \frac{1}{|N_i|}\sum_{k \in N_i} \mathbf{U}(0, \varphi) \otimes (\mathbf{p}_k - \mathbf{x}_i)\Big), \tag{4} \]
where $N_i$ is the set of neighbors of particle $i$ and $\chi$ is a constriction coefficient. Finally, the so-called Bare-Bones PSO (Kennedy, 2003) is a completely different approach in the sense that velocities are not used at all; instead, every component $x_{ij}$ ($j = 1, \ldots, D$) of position $\mathbf{x}_i$ is sampled from a Gaussian distribution with mean $(p_{ij} + g_{ij})/2$ and spread $|p_{ij} - g_{ij}|$, where $p_{ij}$ and $g_{ij}$ are the $j$-th components of $\mathbf{p}_i$ and $\mathbf{g}_i$, respectively:
\[ x_{ij} \sim \mathcal{N}\left(\frac{p_{ij} + g_{ij}}{2},\; |p_{ij} - g_{ij}|\right). \tag{5} \]
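A minimal sketch of the Bare-Bones sampling step is given below. Note that `std::normal_distribution` is parameterized by mean and standard deviation; following Kennedy (2003), we use $|p_{ij} - g_{ij}|$ as that spread parameter. The helper name is ours:

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Bare-Bones PSO: each position component is drawn from a normal
// distribution centred at the midpoint of the personal best p_j and the
// neighborhood best g_j, with spread |p_j - g_j| (Kennedy, 2003).
double bareBonesSample(double pj, double gj, std::mt19937 &rng) {
    std::normal_distribution<double> dist(0.5 * (pj + gj), std::fabs(pj - gj));
    return dist(rng);
}
```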
3.2. Population Topologies
Five different topologies from the literature have been implemented in the framework:
lbest (local best) (Eberhart and Kennedy, 1995) takes a ring topology and each particle is only influenced by its two adjacent neighbors.
gbest (global best) (Eberhart and Kennedy, 1995) uses a fully connected graph and thus every particle is influenced by the best particle of the entire swarm.
In the Von Neumann topology (Kennedy and Mendes, 2002), particles are arranged in a two-dimensional array and have four neighbors: the ones horizontally and vertically adjacent to them, with toroidal wrapping.
The increasing topology (Suganthan, 1999) starts with an lbest topology and gradually increases the connectivity so that, by the end of the run, the particles are fully connected.
The dynamic multi-swarm topology (DMS-PSO) (Liang and Suganthan, 2005) creates clusters consisting of three particles each, and randomly creates new clusters after a fixed number of iterations. If the population size is not divisible by three, every cluster has size three except one, which holds the remaining particles.
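To illustrate how a topology induces neighborhoods, the lbest ring topology can be sketched as follows (the function name is ours, not from the framework):

```cpp
#include <cassert>
#include <vector>

// lbest ring topology: particle i is informed by its two adjacent
// neighbours, with wrap-around at both ends of the index range.
std::vector<int> ringNeighbors(int i, int swarmSize) {
    return { (i - 1 + swarmSize) % swarmSize, (i + 1) % swarmSize };
}
```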
4. Differential Evolution
Differential Evolution (DE) is introduced by Storn and Price in 1995 (Storn and Price, 1995) and uses scaled differential vectors between randomly selected individuals for perturbing the population. The pseudo-code of DE is provided in Alg. 3.
After the initialization of the population $\{\mathbf{x}_i\}_{i=1}^{M}$ (please see the next subsection; $M$ is again the swarm size), for each individual $\mathbf{x}_i$, a donor vector $\mathbf{v}_i$ (a.k.a. mutant) is generated according to:
\[ \mathbf{v}_i = \mathbf{x}_{r_1} + F(\mathbf{x}_{r_2} - \mathbf{x}_{r_3}), \tag{6} \]
where three distinct indices $r_1, r_2, r_3 \neq i$ are chosen uniformly at random (u.a.r.). Here, $F$ is a scalar value called the mutation rate, and $\mathbf{x}_{r_1}$ is referred to as the base vector. Afterwards, a trial vector $\mathbf{u}_i$ is created by means of crossover.
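The donor construction amounts to a single vector expression; a sketch, with our own function name, taking the three selected solutions as arguments:

```cpp
#include <cassert>
#include <vector>

// DE/rand/1 donor: v = x_r1 + F * (x_r2 - x_r3).
std::vector<double> deRand1(const std::vector<double> &xr1,
                            const std::vector<double> &xr2,
                            const std::vector<double> &xr3, double F) {
    std::vector<double> donor(xr1.size());
    for (std::size_t j = 0; j < xr1.size(); ++j)
        donor[j] = xr1[j] + F * (xr2[j] - xr3[j]);  // base + scaled difference
    return donor;
}
```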
In the so-called binomial crossover, each component $u_{ij}$ ($j = 1, \ldots, D$) of $\mathbf{u}_i$ is copied from $v_{ij}$ with a probability $Cr$ (a.k.a. the crossover rate), or when $j$ equals an index $j_{\text{rand}}$ chosen u.a.r.:
\[ u_{ij} = \begin{cases} v_{ij} & \text{if } \mathrm{rand}_j \leq Cr \text{ or } j = j_{\text{rand}}, \\ x_{ij} & \text{otherwise.} \end{cases} \tag{7} \]
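Binomial crossover can be sketched as below; the per-component uniform draws are passed in explicitly so the rule is deterministic here, and the function name is our own:

```cpp
#include <cassert>
#include <vector>

// Binomial crossover: trial component j is taken from the donor v when
// u[j] <= Cr or j == jrand, and from the target x otherwise.
std::vector<double> binomialCrossover(const std::vector<double> &x,
                                      const std::vector<double> &v,
                                      const std::vector<double> &u,
                                      double Cr, std::size_t jrand) {
    std::vector<double> trial(x.size());
    for (std::size_t j = 0; j < x.size(); ++j)
        trial[j] = (u[j] <= Cr || j == jrand) ? v[j] : x[j];
    return trial;
}
```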
In exponential crossover, two integers $n, L \in \{1, \ldots, D\}$ are chosen. The integer $n$ acts as the starting point where the exchange of components begins, and is chosen uniformly at random; $L$ represents the number of components that will be inherited from the donor vector, and is chosen using Algorithm 2. The trial vector is generated as:
\[ u_{ij} = \begin{cases} v_{ij} & \text{for } j = \langle n \rangle_D, \langle n+1 \rangle_D, \ldots, \langle n+L-1 \rangle_D, \\ x_{ij} & \text{otherwise.} \end{cases} \tag{8} \]
The angular brackets $\langle \cdot \rangle_D$ denote the modulo operator with modulus $D$. Elitist selection is applied between $\mathbf{x}_i$ and $\mathbf{u}_i$, where the better one is kept for the next iteration.
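Given a start index n and length L, exponential crossover copies a contiguous (modulo-D) block from the donor; a sketch with zero-based indices and our own naming:

```cpp
#include <cassert>
#include <vector>

// Exponential crossover: starting at index n, L consecutive components
// (indices taken modulo D) are inherited from the donor v; all remaining
// components are kept from the target x.
std::vector<double> exponentialCrossover(const std::vector<double> &x,
                                         const std::vector<double> &v,
                                         std::size_t n, std::size_t L) {
    std::vector<double> trial = x;
    const std::size_t D = x.size();
    for (std::size_t k = 0; k < L; ++k)
        trial[(n + k) % D] = v[(n + k) % D];
    return trial;
}
```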
4.1. Mutation Schemes

In addition to the so-called DE/rand/1 mutation operator (Eq. 6), we also consider the following variants:

DE/best/1 (Storn and Price, 1995): the base vector is chosen as the current best solution in the population $\mathbf{x}_{\text{best}}$:
\[ \mathbf{v}_i = \mathbf{x}_{\text{best}} + F(\mathbf{x}_{r_1} - \mathbf{x}_{r_2}). \]

DE/best/2 (Storn and Price, 1995): two differential vectors calculated using four distinct solutions are scaled and combined with the current best solution:
\[ \mathbf{v}_i = \mathbf{x}_{\text{best}} + F(\mathbf{x}_{r_1} - \mathbf{x}_{r_2}) + F(\mathbf{x}_{r_3} - \mathbf{x}_{r_4}). \]

DE/target-to-best/1 (Storn and Price, 1995): the base vector is chosen as the solution on which the mutation will be applied, and the difference from the current best to this solution is used as one of the differential vectors:
\[ \mathbf{v}_i = \mathbf{x}_i + F(\mathbf{x}_{\text{best}} - \mathbf{x}_i) + F(\mathbf{x}_{r_1} - \mathbf{x}_{r_2}). \]

DE/target-to-pbest/1 (Jingqiao Zhang and Sanderson, 2007): the same as above, except that instead of the current best we take a solution $\mathbf{x}_{\text{pbest}}$ that is randomly chosen from the top $100p\%$ solutions in the population, with $p \in (0, 1]$:
\[ \mathbf{v}_i = \mathbf{x}_i + F(\mathbf{x}_{\text{pbest}} - \mathbf{x}_i) + F(\mathbf{x}_{r_1} - \mathbf{x}_{r_2}). \]

DE/2-Opt/1 (Chiang et al., 2010): identical to DE/rand/1, except that the roles of $\mathbf{x}_{r_1}$ and $\mathbf{x}_{r_2}$ are swapped when $\mathbf{x}_{r_2}$ has a better function value:
\[ \mathbf{v}_i = \begin{cases} \mathbf{x}_{r_1} + F(\mathbf{x}_{r_2} - \mathbf{x}_{r_3}) & \text{if } f(\mathbf{x}_{r_1}) < f(\mathbf{x}_{r_2}), \\ \mathbf{x}_{r_2} + F(\mathbf{x}_{r_1} - \mathbf{x}_{r_3}) & \text{otherwise.} \end{cases} \]
4.2. Self-Adaptation of Control Parameters
The performance of the DE algorithm is highly dependent on the values of the parameters $F$ and $Cr$, for which the optimal values are in turn dependent on the optimization problem at hand. The self-adaptive DE variant JADE (Jingqiao Zhang and Sanderson, 2007) has been proposed to control these parameters in a self-adaptive manner, without intervention of the user. This self-adaptive parameter scheme is used in all DE and hybrid algorithm instances in this work.
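As a rough sketch of the flavour of JADE's adaptation (a simplified fragment under our own naming, not the paper's implementation): after each generation, the location parameters used to sample new $F$ and $Cr$ values are shifted towards the successful values of the last generation, with the Lehmer mean used for $F$ to favour larger scale factors:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Lehmer mean of the successful F values, as used by JADE; it weights
// larger values more heavily than the arithmetic mean.
double lehmerMean(const std::vector<double> &s) {
    double num = 0.0, den = 0.0;
    for (double v : s) { num += v * v; den += v; }
    return num / den;  // assumes s is non-empty with positive entries
}

// Shift an adapted mean towards the mean of the successful values;
// c in (0, 1] is the adaptation rate, e.g. muF <- (1-c)*muF + c*Lehmer(S_F).
double adaptMean(double mu, double meanSuccessful, double c) {
    return (1.0 - c) * mu + c * meanSuccessful;
}
```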
5. Hybridizing PSO with DE
Here, we propose a hybrid algorithm framework called PSODE, which combines the mutation and crossover operators from DE with the velocity and position updates from PSO. This implementation allows combinations of all operators mentioned earlier in a single algorithm, creating the potential for a large number of possible hybrid algorithms. We list the pseudo-code of PSODE in Alg. 4, which works as follows.
The initial population of $M$ solutions ($M$ stands for the swarm size) is sampled uniformly at random in the search space, and the corresponding velocity vectors are initialized to zero (as suggested in (Engelbrecht, 2012)).
After evaluating the population, an offspring population is created by applying the PSO position update to each solution.
Similarly, a second offspring population is created by applying the DE mutation to each solution.
Then, a third population of size $M$ is generated by recombining information among the solutions of these two offspring populations, based on the DE crossover.
Finally, a new population is generated by selecting good solutions from the PSO offspring and the crossover offspring (please see below).
Four different selection methods are considered in this work, two of which are elitist and two non-elitist. A problem arises during the selection procedure: solutions produced by the DE branch have undergone the mutation and crossover of DE, which alter their positions but ignore their velocities, leading to unmatched pairs of positions and velocities. In this case, the velocities that these particles have inherited may no longer be meaningful, potentially breaking the inner workings of PSO in the next iteration. To solve this issue, we propose to re-compute the velocity vector according to the displacement of a particle resulting from the mutation and crossover operators, namely:
\[ \mathbf{v}_i \leftarrow \mathbf{x}_i' - \mathbf{x}_i, \]
where $\mathbf{x}_i'$ is the solution generated from $\mathbf{x}_i$ by the aforementioned mutation and crossover procedure.
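The repair itself is a one-line displacement per component; a sketch (our naming):

```cpp
#include <cassert>
#include <vector>

// Velocity repair for DE-perturbed particles: the new velocity is the
// displacement between the particle's position before and after the DE
// mutation and crossover, v = xNew - xOld.
std::vector<double> recomputeVelocity(const std::vector<double> &xOld,
                                      const std::vector<double> &xNew) {
    std::vector<double> v(xOld.size());
    for (std::size_t j = 0; j < xOld.size(); ++j)
        v[j] = xNew[j] - xOld[j];
    return v;
}
```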
A selection operator is required to select $M$ particles for the next generation. Note that the population of DE mutants is not considered in the selection procedure, as these solution vectors have already been recombined into the crossover offspring. We have implemented four different selection methods: two of them only consider the population resulting from the variation operators of PSO and the population obtained from the variation operators of DE (via crossover). This type of selection method is essentially non-elitist, allowing for deteriorations. Alternatively, the other two methods implement elitism by additionally taking the parent population into account.
We use the following naming scheme for the selection methods: [pairwise|union]/[2|3].
Using this scheme, we can distinguish the four selection methods: pairwise/2, pairwise/3, union/2, and union/3. The "pairwise" comparison method means that the $i$-th members (assuming the solutions are indexed) of each considered population are compared to each other, and the best one is chosen for the next generation. The "union" method selects the best $M$ solutions from the union of the considered populations. Here, a "2" signals the inclusion of the two offspring populations produced by PSO and DE, and a "3" indicates the further inclusion of the parent population. For example, the pairwise/2 method selects the best individual from each pair of PSO and DE offspring, while the union/3 method selects the best $M$ individuals from the union of all three populations.
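On fitness values alone (minimization), the two selection flavours can be sketched as follows; the function names are ours, and for brevity only the two-population ("/2") case is shown:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// pairwise/2 on fitness values: keep the better of the i-th pair drawn
// from the PSO offspring (fP) and the DE offspring (fD).
std::vector<double> pairwiseSelect(const std::vector<double> &fP,
                                   const std::vector<double> &fD) {
    std::vector<double> out(fP.size());
    for (std::size_t i = 0; i < fP.size(); ++i)
        out[i] = std::min(fP[i], fD[i]);
    return out;
}

// union/2 on fitness values: pool both populations, sort ascending
// (best first for minimization), and keep the best n of the 2n values.
std::vector<double> unionSelect(std::vector<double> fP,
                                const std::vector<double> &fD) {
    fP.insert(fP.end(), fD.begin(), fD.end());
    std::sort(fP.begin(), fP.end());
    fP.resize(fP.size() / 2);
    return fP;
}
```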
6. Experimental Setup

A software framework has been implemented in C++ to generate PSO, DE, and PSODE instances from all aforementioned algorithmic modules, e.g., topologies and mutation strategies. This framework is tested on IOHprofiler, which contains the 24 functions from BBOB/COCO (Hansen et al., 2016), organized in five function groups: 1) separable functions, 2) functions with low or moderate conditioning, 3) unimodal functions with high conditioning, 4) multi-modal functions with adequate global structure, and 5) multi-modal functions with weak global structure.
In the experiments conducted, a PSODE instance is considered as a combination of five modules: velocity update strategy, population topology, mutation method, crossover method, and selection method. Combining each option for each of these five modules, we obtain a total of $4 \times 5 \times 5 \times 2 \times 4 = 800$ different PSODE instances.
By combining the velocity update strategies and topologies, we obtain $4 \times 5 = 20$ PSO instances, and similarly we obtain $5 \times 2 = 10$ DE instances.
Naming Convention of Algorithm Instances
As each PSO, DE, and hybrid instance can be specified by its composing modules, it is named using the abbreviations of its modules. Hybrid instances are named as follows: H_[velocity]_[topology]_[mutation]_[crossover]_[selection]. PSO instances are named PSO_[velocity]_[topology], and DE instances are named DE_[mutation]_[crossover].
Options of all modules are listed in Table 1.
The following parameters are used throughout the experiment:
Function evaluation budget: .
Population (swarm) size: is used for all algorithm instances, due to the relatively consistent performance that instances show across different function groups and dimensionalities when using this value.
Hyperparameters in PSO: In Eq. (2) and (3), $\varphi_1$ and $\varphi_2$ are taken as recommended in (Clerc and Kennedy, 2002), and for FIPS (Eq. (4)), the setting of $\varphi$ is adopted from (Mendes et al., 2004). In the fixed inertia strategy, $\omega$ is kept constant, while in the decreasing inertia strategy, $\omega$ is linearly decreased over time. For the target-to-pbest/1 mutation scheme, the value of $p$ is chosen following the findings of (Jingqiao Zhang and Sanderson, 2007).
Hyperparameters in DE: $F$ and $Cr$ are managed by the JADE self-adaptation scheme.
Number of independent runs per function: . Note that only one function instance (instance “1”) is used for each function.
Performance measure: the expected running time (ERT) (Price, 1997), which is the total number of function evaluations an algorithm is expected to use to reach a given target function value $f_t$ for the first time. ERT is defined as $\mathrm{ERT} = \#\mathrm{FEs} / \#\mathrm{succ}$, where $\#\mathrm{FEs}$ denotes the total number of function evaluations taken to hit $f_t$ in all runs ($f_t$ might not be reached in every run), and $\#\mathrm{succ}$ denotes the number of successful runs.
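The ERT computation described above can be sketched directly (our naming):

```cpp
#include <cassert>
#include <vector>

// ERT = (total function evaluations over all runs) / (number of runs
// that reached the target). Undefined when no run succeeds.
double expectedRunningTime(const std::vector<long> &evalsPerRun,
                           const std::vector<bool> &hitTarget) {
    long totalEvals = 0;
    int successes = 0;
    for (std::size_t i = 0; i < evalsPerRun.size(); ++i) {
        totalEvals += evalsPerRun[i];
        if (hitTarget[i]) ++successes;
    }
    return static_cast<double>(totalEvals) / successes;
}
```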
To present the results, we rank the algorithm instances with regard to their ERT values. This is done by first ranking the instances on the targets of every benchmark function, and then taking the average rank across all targets per function. Finally, the presented rank is obtained by taking the average rank over all test functions. This is done for both dimensionalities. A dataset containing the running time of each independent run and the ERTs of each algorithm instance, together with the supporting scripts, is available at (Boks et al., 2020).
[velocity]
B – Bare-Bones PSO
F – Fully-informed PSO (FIPS)
I – Inertia weight
D – Decreasing inertia weight

[topology]
L – lbest (ring)
G – gbest (fully connected)
N – Von Neumann
I – Increasing connectivity
M – Dynamic multi-swarm

[mutation]
B1 – DE/best/1
B2 – DE/best/2
T1 – DE/target-to-best/1
PB – DE/target-to-pbest/1
O1 – DE/2-Opt/1

[crossover]
B – Binomial crossover
E – Exponential crossover

[selection]
U2 – Union/2
U3 – Union/3
P2 – Pairwise/2
P3 – Pairwise/3
7. Results

Figure 1 depicts the Empirical Cumulative Distribution Functions (ECDF) of the highest-ranked algorithm instances in both 5-D and 20-D; due to overlap, only a subset of these algorithms is shown. Tables 2 and 3 show the expected running times of the 10 highest-ranked instances, and the 10 ranked in the middle, in 5-D and 20-D, respectively. ERT values are normalized using the corresponding ERT values of the state-of-the-art Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
Although many more PSODE instances were tested, DE instances generally showed the best performance in both 5-D and 20-D. All PSO instances were outperformed by DE and by many PSODE instances. This is not a complete surprise, as several studies (e.g., (Vesterstrom and Thomsen, 2004; Iwan et al., 2012)) demonstrated the relative superiority of DE over PSO.
Looking at the ranked algorithm instances, it is clear that some modules are more successful than others. The (decreasing) inertia weight velocity update strategies are dominant among the top-performing algorithms, as are pairwise/3 selection and binomial crossover. Target-to-pbest/1 mutation is most successful in 5-D, while target-to-best/1 seems a better choice in 20-D. This is surprising, as one may expect the less greedy target-to-pbest/1 mutation to be more beneficial in higher-dimensional search spaces, where it is increasingly difficult to avoid getting stuck in local optima. The best choice of selection method is convincingly pairwise/3. This seems to be one of the most crucial modules for the PSODE algorithm, as most instances with any other selection method show considerably worse performance. This seemingly high importance of an elitist strategy suggests that the algorithm's convergence with non-elitist selection is too slow, which could be due to the application of two different search strategies. The instances H_I_*_PB_B_P3 and H_I_*_T1_B_P3 appear to be the most competitive PSODE instances, with the topology choice having little influence on the observed performance. The most highly ranked DE instances are DE_T1_B and DE_PB_B, in both dimensionalities. Binomial crossover seems superior to its exponential counterpart, especially in 20 dimensions.
Interestingly, the PSODE and PSO algorithms "prefer" different module options. As an example, the Fully Informed Particle Swarm works well in PSO instances, but PSODE instances perform better with the (decreasing) inertia weight. Bare-Bones PSO showed the overall poorest performance of the four velocity update strategies.
Notable is the large performance difference between the worst and best generated algorithm instances. Some combinations of modules, as to be expected while arbitrarily combining operators, show very poor performance, failing to solve even the most trivial problems. This stresses the importance of proper module selection.
8. Conclusion and Future Work
We implement an extensible and modular hybridization of PSO and DE, called PSODE, in which a large number of variants of both PSO and DE are incorporated as module options. Interestingly, a vast number of unseen swarm algorithms can easily be instantiated from this hybridization, paving the way for designing and selecting appropriate swarm algorithms for specific optimization tasks. In this work, we investigate, on the 24 benchmark functions from BBOB, 20 PSO variants, 10 DE variants, and the PSODE instances resulting from combining the variants of PSO and DE, where we identify some promising hybrid algorithms that surpass PSO but fail to outperform the best DE variants on subsets of BBOB problems. Moreover, we obtained insights into suitable combinations of algorithmic modules. Specifically, the efficacy of the target-to-(p)best mutation operators, the (decreasing) inertia weight velocity update strategies, and binomial crossover was demonstrated. On the other hand, some inefficient operators, such as Bare-Bones PSO, were identified. The neighborhood topology appeared to have the least effect on the observed performance of the hybrid algorithm.
The future work lies in extending the hybridization framework. Firstly, we are planning to incorporate the state-of-the-art PSO and DE variants as much as possible. Secondly, we shall explore alternative ways of combining PSO and DE. Lastly, it is worthwhile to consider the problem of selecting a suitable hybrid algorithm for an unseen optimization problem, taking the approach of automated algorithm selection.
Hao Wang acknowledges the support from the Paris Île-de-France Region.
References

- Boks et al. (2020). Dataset containing the running times and ERT values of all algorithm instances, with supporting scripts.
- Chiang et al. (2010). A 2-opt based differential evolution for global optimization. Applied Soft Computing 10(4), pp. 1200–1207.
- Clerc and Kennedy (2002). The particle swarm: explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation 6(1), pp. 58–73.
- Doerr et al. (2019). Benchmarking discrete optimization heuristics with IOHprofiler. Applied Soft Computing, 106027.
- Eberhart and Kennedy (1995). A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, pp. 39–43.
- Engelbrecht (2012). Particle swarm optimization: velocity initialization. In 2012 IEEE Congress on Evolutionary Computation, pp. 1–8.
- Hansen et al. (2016). COCO: a platform for comparing continuous optimizers in a black-box setting. arXiv:1603.08785.
- Hendtlass (2001). A combined swarm differential evolution algorithm for optimization problems. In Proceedings of the 14th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE '01), Berlin, Heidelberg, pp. 11–18.
- Iwan et al. (2012). Performance comparison of differential evolution and particle swarm optimization in constrained optimization. Procedia Engineering 41, pp. 1323–1328.
- Jingqiao Zhang and Sanderson (2007). JADE: self-adaptive differential evolution with fast and reliable convergence performance. In 2007 IEEE Congress on Evolutionary Computation, pp. 2251–2258.
- Kennedy and Mendes (2002). Population structure and particle swarm performance. In Proceedings of the 2002 Congress on Evolutionary Computation (CEC'02), Vol. 2, pp. 1671–1676.
- Kennedy (2003). Bare bones particle swarms. In Proceedings of the 2003 IEEE Swarm Intelligence Symposium (SIS'03), pp. 80–87.
- Liang and Suganthan (2005). Dynamic multi-swarm particle swarm optimizer. In Proceedings of the 2005 IEEE Swarm Intelligence Symposium (SIS 2005), pp. 124–129.
- Mendes et al. (2004). The fully informed particle swarm: simpler, maybe better. IEEE Transactions on Evolutionary Computation 8(3), pp. 204–210.
- Omran et al. (2007). Differential evolution based particle swarm optimization. In 2007 IEEE Swarm Intelligence Symposium, pp. 112–119.
- Pant et al. (2008). Hybrid differential evolution and particle swarm optimization algorithm for solving global optimization problems. In 2008 Third International Conference on Digital Information Management, pp. 18–24.
- Price (1997). Differential evolution vs. the functions of the 2nd ICEO. In Proceedings of the 1997 IEEE International Conference on Evolutionary Computation (ICEC '97), pp. 153–157.
- Shi and Eberhart (1998). A modified particle swarm optimizer. In 1998 IEEE International Conference on Evolutionary Computation Proceedings, pp. 69–73.
- Storn and Price (1995). Differential evolution: a simple and efficient adaptive scheme for global optimization over continuous spaces. Journal of Global Optimization 23.
- Suganthan (1999). Particle swarm optimiser with neighbourhood operator. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC99), Vol. 3, pp. 1958–1962.
- Thornton et al. (2013). Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '13), New York, NY, USA, pp. 847–855.
- van Rijn et al. (2016). Evolving the structure of evolution strategies. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8.
- van Rijn et al. (2017). Algorithm configuration data mining for CMA evolution strategies. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '17), New York, NY, USA, pp. 737–744.
- Vesterstrom and Thomsen (2004). A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In Proceedings of the 2004 Congress on Evolutionary Computation, Vol. 2, pp. 1980–1987.
- Wen-Jun Zhang and Xiao-Feng Xie (2003). DEPSO: hybrid particle swarm with differential evolution operator. In SMC'03 Conference Proceedings, 2003 IEEE International Conference on Systems, Man and Cybernetics, Vol. 4, pp. 3816–3821.