Acronyms
CEC  Congress on Evolutionary Computation 
CS  Cuckoo Search 
DE  Differential Evolution 
DEa  Adaptive Variant of DE 
GA  Genetic Algorithm 
LHS  Latin Hypercube Sampling 
PSO  Particle Swarm Optimization 
PSOw  PSO with an Inertia Weight 
1 Introduction
Many real-world optimization problems are very complex and subject to multiple nonlinear constraints. Such nonlinearity and multimodality can cause difficulties in solving these optimization problems. Both empirical observations and numerical simulations suggest that, for multimodal optimization problems, the final solution may depend on the initial starting points (Yang et al., 2018; Eskandar et al., 2012). This is especially true for gradient-based methods. In addition, for problems with nonsmooth objective functions and constraints, gradient information may not be available. Hence, most traditional optimization methods struggle to cope with such challenging issues. A good alternative is to use metaheuristic optimization algorithms, such as particle swarm optimization (PSO) and cuckoo search (CS). These metaheuristic optimizers are gradient-free and do not require any prior knowledge or rigorous mathematical properties, such as continuity and smoothness (Yang et al., 2018; Li et al., 2016).
In the past decade, various studies have shown that these metaheuristic algorithms are effective in solving different types of optimization problems, including noisy and dynamic problems (Yang et al., 2018; Sun et al., 2019; Fan et al., 2018; Cheng et al., 2018). For example, engineering design problems can be solved by an improved variant of PSO (Isiet and Gadala, 2019), and the connectivity of the Internet of Things (IoT) can be enhanced by a multi-swarm optimization algorithm (Hasan and Al-Rizzo, 2019). In addition, an optimized energy consumption model for smart homes can be achieved by differential evolution (DE) (Essiet et al., 2019), while optimal dam and reservoir operation can be achieved by a hybrid of the bat algorithm (BA) and PSO (Yaseen et al., 2019). A fuzzy-driven genetic algorithm (Jacob et al., 2009) was used to solve a sequence segmentation problem, and a fuzzy genetic clustering algorithm was used to solve a dataset partition problem (Nguyen and Kuo, 2019).
Almost all optimization algorithms require some form of initialization, where an educated guess or a set of random initial solutions is generated. Ideally, the final optimal solutions found by algorithms should be independent of these initial choices. This is only true for a few special cases such as linear programs and convex optimization; a vast majority of problems are neither linear nor convex, so such dependency can be a challenging issue. In fact, most algorithms show different degrees of dependency on their initial settings, and the actual dependency can be problem-specific and algorithm-specific (Yang, 2014; Kondamadugula and Naidu, 2016). For large-scale and multimodal problems, the effect of initialization is more obvious, and many algorithms may show different probabilities of finding global optima under different initializations (Elsayed et al., 2017). However, there is still a lack of systematic studies of initialization and of how the initial distributions may affect the performance of algorithms on a given set of problems. The good news is that researchers have started to realize the importance of initialization and to explore other possibilities with the aim of increasing the diversity of the initial population (Yang, 2014). For example, based on the guiding principle of covering the search space as uniformly as possible, some studies have preliminarily explored different initialization methods, including quasi-random initialization (Kimura and Matsumura, 2005; Ma and Vandenbosch, 2012; Kazimipour et al., 2014; Maaranen et al., 2004), chaotic systems (Gao and Liu, 2012; Alatas, 2010), anti-symmetric learning methods (Rahnamayan et al., 2008), and Latin hypercube sampling (Ran et al., 2017; Zaman et al., 2016).
In some cases, these studies have improved the performance of algorithms such as PSO and genetic algorithms (GA), but there are still some serious issues. Specifically, quasi-random initialization is simple and easy to implement, but it suffers from the curse of dimensionality (Maaranen et al., 2004). For chaos-based approaches, random sequences are generated by a few chaotic maps with few parameters (initial conditions), but they can inevitably have very sensitive dependence on those initial conditions (dos Santos Coelho and Mariani, 2008). In addition, the anti-symmetric learning method uses twice as many solution cohorts as the population size so as to select the solutions for the next generation, which doubles the computational cost. Though Latin hypercube sampling is very effective in low dimensions, its performance can deteriorate significantly for higher-dimensional problems. We will discuss this issue in more detail later in this paper.

On the other hand, some researchers have attempted to design a specific type of initialization in combination with a certain type of algorithm so as to solve a particular class of problems more efficiently. For example, Kondamadugula and Naidu (2016) used special and random sampling evolutionary algorithms to estimate parameters of digital integrated circuits; Li et al. (2015) applied knowledge-based initialization to improve the performance of the genetic algorithm for solving the traveling salesman problem; Li et al. (2019) used the degrees of nodes for initialization in the network disintegration problem; and Puralachetty and Pamula (2016) proposed a two-stage initialization approach for PID controller tuning in a coupled tank liquid system. However, these approaches do have some drawbacks. Firstly, such initialization requires sophisticated allocation of points, which may not be straightforward to implement and can thus increase the computational costs. Secondly, they may be suitable only for a particular type of problem or algorithm. Thirdly, such initialization depends largely on the experience of the user. Finally, there is no mathematical guidance on how to carry out initialization in practice.

This motivates us to carry out a systematic study of different initialization methods and their effects on algorithmic performance. The choice of 22 probability distributions is based on rigorous probability theory with an emphasis on different statistical properties. In addition, we have used five different metaheuristic optimization algorithms for this study: differential evolution (DE), particle swarm optimization (PSO), cuckoo search (CS), the artificial bee colony (ABC) algorithm and the genetic algorithm (GA). There are over 100 different algorithms and variants in the literature
(Yang et al., 2018; Eskandar et al., 2012; Zaman et al., 2016), so it is not possible to compare even a good fraction of them. Therefore, the choice of algorithms has to focus on the different search characteristics and representativeness of algorithms in the current literature. Differential evolution is a good representative of evolutionary algorithms, while particle swarm optimization is considered the main optimizer among swarm-intelligence-based algorithms. In addition, cuckoo search uses a long-tailed, Lévy-flight-based search mechanism that has been shown to be more efficient in exploring the search space. Furthermore, the artificial bee colony algorithm is used to represent bee-based algorithms, while the genetic algorithm has been a cornerstone for a vast majority of evolutionary algorithms.

Based on the simulations and analyses below, we can highlight the features and contributions of this paper as follows:

Numerical experiments show that, under the same maximum number of fitness evaluations (FEs), some algorithms require a large population size to reach the optimal solution, while others can find the optimal solution through many iterations with a small population. In this paper, we make some recommendations concerning the initial population size and the maximum number of iterations for the five algorithms.

The initialization by 22 different probability distributions and its influence on algorithm performance are studied systematically. It is found that some algorithms, such as differential evolution, are not significantly affected by initialization, while others, such as particle swarm optimization, are more sensitive to it. This may be related to the design mechanisms of these algorithms themselves, and it is also an important indicator of the robustness of an algorithm.

For the five algorithms under consideration, we have used a statistical ranking technique, together with a correlation test, to gain insight into the appropriate initialization methods for given benchmark functions.
The rest of this paper is organized as follows. Section 2 briefly introduces the fundamentals of the three main metaheuristic optimizers, with brief discussions of the other two optimizers, followed by the motivations and details of the initialization methods in Section 3. Experimental results are presented in Section 4, together with a comparison of different initialization methods on some benchmark functions, including commonly used benchmarks and some recent CEC functions. Further experiments concerning key parameters of the different algorithms are also carried out. Then, Section 5 discusses the correlation between the distributions of the initial population and the corresponding final solutions. Finally, Section 6 concludes with discussions of further research directions.
2 Metaheuristic Optimizers
Though traditional optimization algorithms can work well for local search, metaheuristic optimization algorithms have some main advantages for global optimization because they usually treat the problem as a black box and thus can be flexible and easy to use (Yang, 2014). Furthermore, such optimizers do not have strict mathematical requirements (e.g., differentiability, smoothness), so they are suitable for problems with different properties, including discontinuities and nonlinearity. Various studies have shown their effectiveness in different applications (Yang, 2014; Aljarah et al., 2020; Yin et al., 2019).
The initialization of a vast majority of metaheuristic optimization algorithms has been done using uniform distributions. Although this approach is easy to implement, empirical observations suggest that uniform distributions may not be the best option in all applications. There is thus a clear need to study initialization systematically using different probability distributions. As there are many optimization algorithms, it is not possible to study all of them. Thus, this paper will focus on five algorithms: differential evolution (DE), particle swarm optimization (PSO), cuckoo search (CS), artificial bee colony (ABC) and genetic algorithm (GA). These algorithms are representative, owing to their different search mechanisms and rich characteristics.
2.1 Differential Evolution
Differential evolution (DE) is a representative evolutionary and heuristic algorithm (Storn and Price, 1997), which has been used in many applications such as optimization, machine learning and pattern recognition (Liu and Lampinen, 2005). Though differential evolution has a strong global search capability with a relatively high convergence rate for unimodal problems, the performance of DE can depend on its parameter settings. For highly nonlinear problems, its convergence rate can be low. To overcome such limitations, various mutation strategies and adaptive parameter controls have been proposed to improve its performance (Zhang and Sanderson, 2009). In the DE algorithm, each individual is a candidate solution or a point in the $D$-dimensional search space, and the $i$th individual can be represented as $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \dots, x_{i,D})$. In essence, different mutation strategies typically generate a mutation vector $\mathbf{v}_i$ by modifying the current solution vector in different ways.

Crossover is another strategy for modifying a solution. For example, the binomial crossover is a component-wise modification, controlled by a crossover parameter $C_r$, which takes the following form:
$$u_{i,j} = \begin{cases} v_{i,j}, & \text{if } \mathrm{rand}(0,1) \le C_r \ \text{or}\ j = j_{rand},\\ x_{i,j}, & \text{otherwise}, \end{cases} \qquad (1)$$
where $x_{i,j}$ is the $j$th dimension of the $i$th individual solution. The updated vector after the mutation step can be expressed as $\mathbf{v}_i = (v_{i,1}, \dots, v_{i,D})$, and $u_{i,j}$ corresponds to the $j$th dimension of the $i$th individual after crossover.
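As a minimal sketch of Eq. (1) in Python with NumPy (the function name, seeding and array layout are our own illustration, not the paper's implementation):

```python
import numpy as np

def binomial_crossover(x, v, cr, rng=None):
    """Binomial crossover: build a trial vector u from the target x and
    the mutant v. Each component comes from v with probability cr, and
    one randomly chosen index j_rand is always taken from v, so that u
    differs from x in at least one component."""
    rng = rng or np.random.default_rng(0)
    d = len(x)
    mask = rng.random(d) <= cr
    mask[rng.integers(d)] = True          # the guaranteed j_rand component
    return np.where(mask, v, x)

x = np.zeros(5)                           # target vector
v = np.ones(5)                            # mutant vector
u = binomial_crossover(x, v, cr=0.5)      # trial vector mixing x and v
```

Every entry of the trial vector comes from either the target or the mutant, and the forced index guarantees that the trial vector is never identical to the target.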
Among various variants of DE, Qin et al. (2009) proposed a self-adaptive DE (SaDE) variant with four mutation strategies in its pool, which can be selected at different generations by a given criterion. More specifically, according to the success and failure of each mutation, a fixed learning period (LP) was used to update the probability of each mutation strategy being selected for the next generation. In addition, the mutation factor $F$ was drawn from a normal distribution, and $C_r$ was similarly drawn from a normal distribution whose mean was calculated from the previous LP generations. Though the performance of SaDE was good, its complexity had increased.

For ease of implementation and comparison, a simplified adaptive DE (DEa) algorithm, based on the idea of SaDE, is proposed in this paper. In the mutation pool, we use five mutation strategies as follows:

DE/rand/1 (Storn and Price, 1997)
$$v_{i,j} = x_{r_1,j} + F\,(x_{r_2,j} - x_{r_3,j}), \qquad (2)$$
DE/best/1
$$v_{i,j} = x_{best,j} + F\,(x_{r_1,j} - x_{r_2,j}), \qquad (3)$$
DE/current-to-best/1 (Zhang and Sanderson, 2009)
$$v_{i,j} = x_{i,j} + F\,(x_{best,j} - x_{i,j}) + F\,(x_{r_1,j} - x_{r_2,j}), \qquad (4)$$
DE/best/2
$$v_{i,j} = x_{best,j} + F\,(x_{r_1,j} - x_{r_2,j}) + F\,(x_{r_3,j} - x_{r_4,j}), \qquad (5)$$
DE/rand/2
$$v_{i,j} = x_{r_1,j} + F\,(x_{r_2,j} - x_{r_3,j}) + F\,(x_{r_4,j} - x_{r_5,j}), \qquad (6)$$
where $F$ is a parameter controlling the mutation strength, and $x_{best,j}$ is the $j$th dimension of the current best solution. Here, $r_1$, $r_2$, $r_3$, $r_4$ and $r_5$ represent 5 different individuals, which are selected randomly from the current population.
Both parameters $F$ and $C_r$ are initialized from sets of discrete values. The current mutation strategy and parameter settings are kept unchanged if better solutions are found during the iterations; otherwise, the mutation strategy and parameters are re-selected randomly from the above sets. Our simplified variant is easier to implement, and its performance is much better than that of the original DE, as observed in our simulations later. Therefore, we will use this variant in the later simulations.
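The strategy pool of Eqs. (2)-(6) can be sketched as follows (a hedged illustration in Python with NumPy; the array layout, function name and the way the indices $r_1,\dots,r_5$ are drawn are our own, not the paper's implementation):

```python
import numpy as np

def mutate(pop, i, best, strategy, f, rng):
    """Apply one of the five DEa mutation strategies, Eqs. (2)-(6), to
    individual i. pop is an (NP, D) array; best is the index of the
    current best solution; r1..r5 are distinct indices different from i."""
    r1, r2, r3, r4, r5 = rng.choice(
        [j for j in range(len(pop)) if j != i], size=5, replace=False)
    x, b = pop, pop[best]
    if strategy == "rand/1":
        return x[r1] + f * (x[r2] - x[r3])
    if strategy == "best/1":
        return b + f * (x[r1] - x[r2])
    if strategy == "current-to-best/1":
        return x[i] + f * (b - x[i]) + f * (x[r1] - x[r2])
    if strategy == "best/2":
        return b + f * (x[r1] - x[r2]) + f * (x[r3] - x[r4])
    if strategy == "rand/2":
        return x[r1] + f * (x[r2] - x[r3]) + f * (x[r4] - x[r5])
    raise ValueError(f"unknown strategy: {strategy}")

rng = np.random.default_rng(0)
pop = rng.standard_normal((10, 4))        # NP = 10 individuals in 4-D
v = mutate(pop, i=0, best=3, strategy="rand/1", f=0.5, rng=rng)
```

A self-adaptive outer loop would pick `strategy` and `f` at random from the pool whenever the previous setting fails to improve the solution.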
2.2 Particle Swarm Optimization
Particle swarm optimization (PSO) is a well-known swarm intelligence optimizer with good convergence (Clerc and Kennedy, 2002), which is widely used in many applications (Kennedy and Eberhart, 2011). However, it can suffer from premature convergence for some problems, and thus various variants have been developed to remedy this, with different degrees of improvement. Among these variants, an improved PSO with an inertia weight (PSOw), proposed by Shi and Eberhart (1998), is efficient, and its main steps can be summarized by the following update equations:
$$\mathbf{v}_i^{t+1} = w\,\mathbf{v}_i^{t} + c_1 r_1\,(\mathbf{p}_i - \mathbf{x}_i^{t}) + c_2 r_2\,(\mathbf{g} - \mathbf{x}_i^{t}), \qquad (7)$$
$$\mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \mathbf{v}_i^{t+1}, \qquad (8)$$
where $\mathbf{v}_i^t$ and $\mathbf{x}_i^t$ are the velocity vector and position vector, respectively, for particle $i$ at iteration $t$. Here, $\mathbf{p}_i$ is the individual best solution of the $i$th particle in the previous iterations, and $\mathbf{g}$ is the best solution of the current population. In Eq. (7), $c_1$ and $c_2$ are the two learning parameters, while $r_1$ and $r_2$ are two random numbers at the current iteration, drawn from a uniform distribution. In the special case when the inertia weight $w = 1$, this variant becomes the original PSO.
The value of $w$ can affect the convergence rate significantly. If $w$ is large, the algorithm can have a faster convergence rate, but it can easily fall into local optima, leading to premature convergence. Studies have shown that a $w$ adjusted dynamically with the iteration counter can be more effective. That is,
$$w = w_{\max} - (w_{\max} - w_{\min})\,\frac{t}{t_{\max}}, \qquad (9)$$
where $t_{\max}$ represents the maximum number of iterations, and $w_{\min}$ and $w_{\max}$ are the minimum and maximum inertia weights, respectively. We will use PSOw in the later experiments.
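The PSOw update can be sketched as below (a hedged illustration in Python with NumPy; the inertia bounds 0.4 and 0.9 are common defaults assumed here, not values stated by the paper, while $c_1 = c_2 = 1.5$ matches the experimental settings used later):

```python
import numpy as np

def inertia(t, t_max, w_min=0.4, w_max=0.9):
    """Linearly decreasing inertia weight, Eq. (9). The bounds 0.4/0.9
    are common defaults, assumed here rather than taken from the paper."""
    return w_max - (w_max - w_min) * t / t_max

def psow_step(x, v, pbest, gbest, w, c1=1.5, c2=1.5, rng=None):
    """One PSOw update, Eqs. (7)-(8): new velocity, then new position."""
    rng = rng or np.random.default_rng(0)
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
```

With `w = 1` and the inertia schedule removed, `psow_step` reduces to the original PSO update.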
2.3 Cuckoo Search
The cuckoo search (CS) algorithm is a metaheuristic algorithm developed by Xin-She Yang and Suash Deb (Yang and Deb, 2009), based on the behavior of some cuckoo species and their interactions with host species in terms of brood parasitism. CS also uses Lévy flights instead of isotropic random walks, which can explore large search spaces more efficiently. As a result, CS has been applied in many applications such as engineering design (Gandomi et al., 2013; Vazquez, 2011), semantic Web service composition (Chifu et al., 2011), thermodynamic calculations (Bhargava et al., 2013) and so on.

Briefly speaking, the CS algorithm consists of two parts: local search and global search. The current individual $\mathbf{x}_i^t$ is modified to a new solution $\mathbf{x}_i^{t+1}$ by using the following global random walk:
$$\mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \alpha\, L(s, \lambda), \qquad (10)$$
where $\alpha$ is a factor controlling the step sizes, and $s$ is the step size. $L(s, \lambda)$ is a random vector drawn from a Lévy distribution (Yang, 2014). That is,
$$L(s, \lambda) \sim \frac{\lambda\,\Gamma(\lambda)\,\sin(\pi \lambda / 2)}{\pi}\,\frac{1}{s^{1+\lambda}}, \qquad (s \gg 0). \qquad (11)$$
Here, '$\sim$' means that $L$ is drawn as a random-number generator from the distribution on the right-hand side of the equation. $\Gamma(\lambda)$ is the Gamma function, while $\lambda$ is a parameter. One of the advantages of using Lévy flights is that there is a small probability of long jumps, which enables the algorithm to escape from local optima and thus increases its exploration capability (Yang et al., 2018; Viswanathan et al., 1999). The local search is mainly carried out by
$$\mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \alpha\, s \otimes H(p_a - \epsilon) \otimes (\mathbf{x}_j^{t} - \mathbf{x}_k^{t}), \qquad (12)$$
where $H(u)$ is the Heaviside function. This equation modifies the solution using two other solutions $\mathbf{x}_j^t$ and $\mathbf{x}_k^t$. Here, the random number $\epsilon$ is drawn from a uniform distribution and $s$ is the step size. A switching probability $p_a$ is used to switch between these two search mechanisms, intending to balance global search and local search.
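In practice, the Lévy steps of Eq. (11) are often generated with Mantegna's algorithm, which combines two Gaussian draws to obtain the required power-law tail. The paper does not specify its generator, so the sketch below is an assumed, commonly used implementation:

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(d, lam=1.5, rng=None):
    """Draw a d-dimensional Lévy-flight step via Mantegna's algorithm.
    The step lengths follow the heavy power-law tail of Eq. (11), so
    most steps are short but occasional long jumps occur."""
    rng = rng or np.random.default_rng(0)
    sigma = (gamma(1 + lam) * sin(pi * lam / 2)
             / (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, d)         # numerator: scaled Gaussian
    v = rng.normal(0.0, 1.0, d)           # denominator: standard Gaussian
    return u / np.abs(v) ** (1 / lam)     # heavy-tailed step vector
```

The rare long jumps produced by this generator are exactly what gives CS its strong exploration capability.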
2.4 Other Optimizers
There are other optimizers that can be representative for the purpose of comparison. The genetic algorithm (GA) has been a cornerstone of almost all modern evolutionary algorithms, which consists of crossover, mutation and selection mechanisms. The GA has a wide range of applications such as pattern recognition (Pal and Wang, 2017), neural networks and control system optimization (Back and Schwefel, 1996) as well as discrete optimization problems (Guerrero et al., 2017). The literature on this algorithm is vast, thus we will not introduce it in detail here.
On the other hand, the artificial bee colony (ABC) algorithm was inspired by foraging behaviour of honey bees (Karaboga, 2005), and this algorithm has been applied in many applications (Li et al., 2017; Gao et al., 2018, 2019). A multiobjective version also exists (Xiang et al., 2015). Due to the page limit, we will not introduce this algorithm in detail. Readers can refer to the relevant literature (Karaboga and Basturk, 2007).
We will use the above five algorithms in this paper for different initialization strategies.
3 Initialization Methods
The main objective of this paper is to investigate different probability distributions for initialization and their effects on the performance of the algorithms used.
3.1 Motivations of this work
Both existing studies and empirical observations suggest that initialization can play an important role in the convergence speed and accuracy of certain algorithms. A good set of initial solutions, especially when the initial solutions happen to lie near the true optimum, can reduce the search effort and thus increase the probability of finding the true optimum. As the location of the true optimum is unknown in advance, initialization is largely uniform, in a manner similar to that of Monte Carlo simulations. However, for problems in higher dimensions, a small initial population may be biased and could lie sparsely in unpromising regions. In addition, the diversity of the initial population is also important, and different distributions may have different sampling emphases, leading to different degrees of diversity. For example, some studies concerning genetic algorithms have shown such effects of initialization (Burke et al., 2004; Chou and Chen, 2000).
Many initialization methods in the literature, such as Latin hypercube sampling (LHS), are mainly based on the idea of spreading points uniformly in the search space. They are easy to implement and can sometimes work well. For example, the two-dimensional landscape of the Bukin function is shown in Fig. 1. Within the given search domain, the PSOw algorithm with an initial population obeying a uniform distribution can find the optimal solution in a few iterations. The distribution of the particles is shown in Fig. 2. For comparison, another run with an initial beta distribution has also been carried out, as shown in Fig. 3. In these figures, the real optimal solution at (10, 1) is marked, the dots show the locations of the current population, and the best solution in the current population is also marked. Fig. 2 shows the initial population drawn from a uniform distribution in the search domain; this population converged near the optimal solution after 5 iterations of PSOw, with the current best solution of the population close to the real optimal solution. However, the initial population drawn from a beta distribution (as shown in Fig. 3) fell into a local optimum after 5 iterations. This clearly shows the effect and importance of initialization.
For the above function, initialization by a uniform distribution seems to give better results. However, for another function, uniform distributions may give worse results, even though they are widely used. As an illustrative example, the best solution of the Michalewicz function in two-dimensional space is at [2.20319, 1.57049] (see Fig. 4). If the initialization is done with a uniform distribution, it can lead to premature convergence, as shown in Fig. 5, while initialization with a beta distribution can lead to the global optimal solution after 5 iterations, as shown in Fig. 6. Clearly, uniform distributions are not the best initialization method for all functions. For the same algorithm (such as PSOw), different initialization methods can lead to different accuracies on different problems. This suggests that different initialization methods should be used for different problems. We will investigate this issue further in a more systematic way.
In order to study the effect of initialization systematically, we will use a diverse range of different initialization methods such as Latin hypercube sampling and different probability distributions. We now briefly outline them in the rest of this section.
3.2 Details of initialization methods
Before we carry out detailed simulations, we now briefly outline the main initialization methods.
3.2.1 Latin hypercube sampling
Latin hypercube sampling (LHS) is a space-filling mechanism. It creates a grid in the search space by dividing each dimension into equally sized intervals, and then generates one random point within each selected interval. It uses ancillary variables to ensure that each variable to be represented is fully stratified in the feature space (McKay et al., 1979). For example, if three sample points are needed in a two-dimensional (2D) parameter space, the three points may have four location scenarios (shown in Fig. 7). Obviously, these three points can also be scattered in the diagonal subspace of the 2D search space.
In LHS, a set of samples is distributed so that the points spread sparsely over the search space, effectively avoiding over-aggregation of sampling points. Studies show that such sampling can provide a better spread than uniform random sampling, but it does not show a distinct advantage for higher-dimensional problems, so we will investigate this issue further.
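A compact way to generate such stratified samples is sketched below (in Python with NumPy; the paper does not prescribe an implementation, so the shuffling scheme here is one common choice):

```python
import numpy as np

def latin_hypercube(n, d, rng=None):
    """n Latin hypercube samples in [0, 1)^d: each dimension is cut
    into n equal strata, one point is placed uniformly inside each
    stratum, and the strata are shuffled independently per dimension."""
    rng = rng or np.random.default_rng(0)
    # (n, d) matrix whose columns are independent permutations of 0..n-1
    strata = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T
    return (strata + rng.random((n, d))) / n
```

Scaling such samples to a search box $[lb, ub]$ is then just `lb + samples * (ub - lb)`, and every one-dimensional projection contains exactly one point per stratum.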
3.2.2 Beta distribution
A beta distribution is a continuous probability distribution over the interval (0, 1). Its probability density function (PDF) is given by
$$f(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \qquad (13)$$
where $\Gamma$ is the standard Gamma function. This distribution has two shape parameters ($\alpha$, $\beta$) that essentially control the shape of the distribution. Its notation is usually written as $B(\alpha, \beta)$. Its expected value is $\alpha/(\alpha + \beta)$ and its variance is $\alpha\beta / [(\alpha + \beta)^2 (\alpha + \beta + 1)]$.

3.2.3 Uniform distribution
Uniform distributions are widely used in initialization, and a uniform distribution on an interval $[a, b]$ is given by
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b,\\[4pt] 0, & \text{otherwise}, \end{cases} \qquad (14)$$
where $a$ and $b$ are the limits of the interval. Its expectation or mean is $(a + b)/2$, and its variance is $(b - a)^2 / 12$.
3.2.4 Normal distribution
Gaussian normal distributions are among the most widely used distributions in various applications, though they are not usually used in initialization. The probability density function of this bell-shaped distribution can be written as
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \qquad (15)$$
with mean $\mu$ and standard deviation $\sigma$. This distribution is often written as $N(\mu, \sigma^2)$, where the mean $\mu$ determines the central location of the probability curve and the standard deviation $\sigma$ determines the spread on both sides of the mean (Yang, 2014; Kızılersü et al., 2018). Normal distributions can be approximated by other distributions and are linked closely with other distributions such as the lognormal distribution, Student's $t$-distribution and the $\chi^2$ distribution.

3.2.5 Logarithmic normal distribution
Unlike the normal distribution, the logarithmic normal (lognormal) distribution is an asymmetrical distribution. Its probability density function is
$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \qquad x > 0. \qquad (16)$$
3.2.6 Exponential distribution
An exponential distribution is asymmetric with a long tail, and its probability density function can be written as
$$f(x) = \lambda e^{-\lambda x}, \qquad x \ge 0, \qquad (17)$$
where $\lambda > 0$ is a rate parameter. Its mean and standard deviation are both $1/\lambda$.
3.2.7 Rayleigh distribution
The probability density function of the Rayleigh distribution can be written as
$$f(x) = \frac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}}, \qquad x \ge 0, \qquad (18)$$
whose mean and variance are $\sigma\sqrt{\pi/2}$ and $\frac{4 - \pi}{2}\sigma^2$, respectively (Weik and Weik, 2001).
3.2.8 Weibull distribution
The Weibull distribution has a probability density function (Kızılersü et al., 2018)
$$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \qquad x \ge 0, \qquad (19)$$
where $\lambda > 0$ is a scale parameter, and $k > 0$ is a shape parameter. This distribution can be considered a generalization of a few other distributions. For example, $k = 1$ corresponds to an exponential distribution, while $k = 2$ leads to the Rayleigh distribution. Its mean and variance are $\lambda\,\Gamma(1 + 1/k)$ and $\lambda^2 \left[\Gamma(1 + 2/k) - \Gamma^2(1 + 1/k)\right]$, respectively.
Based on the above different probability distributions, we will carry out various numerical experiments in the rest of this paper.
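A practical detail is that most of these distributions are not supported on the search box, so their samples must be mapped onto it. The sketch below draws raw samples and rescales them; the min-max normalization used here is our assumption, since the paper does not state its exact mapping, and the distribution parameters are purely illustrative:

```python
import numpy as np

def init_population(n, d, lb, ub, sampler, rng=None):
    """Initialize an (n, d) population by drawing raw samples from a
    probability distribution and mapping them onto the box [lb, ub].
    Unbounded distributions are squashed into [0, 1] by min-max
    normalization first (an assumed rescaling, not the paper's)."""
    rng = rng or np.random.default_rng(0)
    raw = sampler(rng, (n, d)).astype(float)
    raw = (raw - raw.min()) / (raw.max() - raw.min())   # into [0, 1]
    return lb + raw * (ub - lb)

# Two of the distributions discussed above, with illustrative parameters:
pop_beta = init_population(30, 10, -5.0, 5.0,
                           lambda rng, shape: rng.beta(2.0, 5.0, shape))
pop_rayl = init_population(30, 10, -5.0, 5.0,
                           lambda rng, shape: rng.rayleigh(1.0, shape))
```

Because the distributions place their probability mass differently, the resulting populations cover different parts of the same box, which is exactly the effect the experiments below are designed to measure.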
4 Numerical Experiments
4.1 Experimental settings
In order to investigate the possible influence of different initialization methods on the five algorithms (PSOw, DEa, CS, ABC, GA), a series of experiments has been carried out, first using a set of nine benchmark functions as shown in Table 1. The experiments focus first on PSOw, DEa and CS, and then similar tests are carried out for ABC and GA. These benchmark functions are chosen based on their different properties, such as their modal shapes and numbers of local optima. Some of them are continuous, unimodal functions, while the others are multimodal. For example, the global minimum of the Rosenbrock function lies in a narrow, parabolic valley, which can be difficult for many traditional algorithms. Several of the multimodal functions have many widespread local minima. The bowl-shaped function has local minima but only one global optimum, while the Easom function has several local minima, and its global minimum lies in a small area of a relatively large search space. In addition, we will use 10 more recent benchmarks from CEC2014 and CEC2017, to be discussed in detail later.
Name        Opt
Rosenbrock  0
Ackley      0
Sphere      0
Rastrigin   0
Griewank    0
Zakharov    0
Alpine      0
Easom       1
Schwefel    0
For a fair comparison, we have set the same termination condition for all the algorithms, with a maximum number of function evaluations (FEs) of 600,000, and each algorithm with a given initialization method has 20 independent runs. The same dimensionality is used for all the test functions. As many sets of data are generated, we have summarized the results as the 'Best', 'Mean', 'Var' (variance) and 'Dist'. Here, 'Dist' corresponds to the mean distance from the obtained solutions to the true global optimal solution $\mathbf{x}_{opt}$. That is,
$$Dist = \frac{1}{N_r} \sum_{n=1}^{N_r} \left\| \mathbf{x}^*_n - \mathbf{x}_{opt} \right\|, \qquad (20)$$
where $N_r$ denotes the total number of runs in each set of experiments and $\mathbf{x}^*_n$ is the best solution found in run $n$. This metric measures not only the closeness of the results to the optimum, but also the stability of the obtained solutions.
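Eq. (20) can be computed directly (a small sketch in Python with NumPy; the function name and example data are our own):

```python
import numpy as np

def mean_distance(solutions, x_opt):
    """'Dist' metric of Eq. (20): mean Euclidean distance from each
    run's final solution to the known global optimum x_opt."""
    solutions = np.asarray(solutions, dtype=float)
    return float(np.mean(np.linalg.norm(solutions - np.asarray(x_opt), axis=1)))

runs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # final solutions of 3 runs
d = mean_distance(runs, [0.0, 0.0])           # (0 + 1 + 1) / 3
```

A small 'Dist' therefore requires every run to finish close to the optimum, which is why it also reflects the stability of the algorithm across runs.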
For the algorithm-dependent parameters, after some preliminary parametric studies, we have fixed the mutation strength $F$ and crossover rate $C_r$ for DEa, and the switching probability $p_a$ and step-size factor $\alpha$ for CS. In PSOw, the learning factors $c_1$ and $c_2$ are set to 1.5, and the inertia weight $w$ follows Eq. (9). In addition, the population size ($NP$) will be varied so as to see whether it has any effect on the results.
4.2 Influence of population size and number of iterations
Before comparing different initialization methods in detail, we first examine whether the population size ($NP$) and the maximum number of iterations ($t_{\max}$) have any significant effect. Many studies in the existing literature have used different population sizes and numbers of iterations (Akay and Karaboga, 2012). Though the total number of function evaluations for all functions and algorithms is set to 600,000, the maximum number of iterations varies with $NP$: obviously, a larger $NP$ leads to a smaller $t_{\max}$.
In order to make a fair comparison, all the algorithms are initialized with the same random initialization. Four functions are selected randomly to reduce the computational effort. We have carried out numerical experiments, and the results are summarized in Tables 2 to 4.
Fun  Value
Rosenbrock  Best  0  5.09e19  1.39e09  2.53  12.225  19.929  22.198  
Mean  0.0987  0.1993  0.1993  7.2212  13.632  21.224  23.878  
Var  1.5057  0.7947  0.7947  215.2  1.2934  1.9251  1.2532  
Dist  0.2  0.0999  0.1003  5.6464  14.933  22.638  25.277  
Sphere  Best  5.67e197  9.71e105  8.64e70  7.19e36  1.02e22  9.05e11  1.63e07  
Mean  1.78e187  1.57e96  4.45e65  7.74e33  1.92e19  1.72e09  2.74e06  
Var  0  3.63e191  1.99e128  2.88e64  5.98e37  1.18e17  2.39e11  
Dist  2.0039e48  2.01e48  1.48e32  2.34e16  9.17e10  1.44e4  5.88e3  
Rastrigin  Best  6.9647  18.271  91.987  113.07  112.94  130.32  140.17  
Mean  43.547  96.429  113.77  122.2  131.08  142.57  151.7  
Var  1108.7  558.39  95.447  48.535  77.995  41.862  64.614  
Dist  14.371  19.502  21.89  23.979  24.69  26.31  27.254  
Griewank  Best  0  0  0  0  0  7.21e12  7.07e09  
Mean  1.11e03  1.11e03  2.22e03  0  3.53e03  1.11e03  1.11e03  
Var  7.34e06  7.34e06  1.21e05  0  8.69e05  7.34e06  7.35e06  
Dist  1.1368  1.1368  2.2735  8.60e07  0.9227  1.1371  1.1572 
Fun  Value
Rosenbrock  Best  27.141  17.382  7.7837  17.936  13.534  16.019  14.754  
Mean  36.055  28.803  24.005  21.842  18.815  19.304  18.085  
Var  264  161.56  18.62  5.035  7.1071  6.5569  2.1004  
Dist  27.465  26.791  25.076  23.145  20.193  20.678  19.607  
Sphere  Best  2.46e04  1.69e08  9.77e16  1.33e36  4.56e28  3.91e18  4.12e14  
Mean  2.32e03  2.35e07  1.14e11  5.68e34  2.84e27  1.30e17  1.44e13  
Var  3.51e06  1.30e13  1.35e21  1.14e66  4.69e54  6.21e35  6.36e27  
Dist  1.91e01  1.78e03  7.72e06  6.91e17  2.18e13  1.54e08  1.59e06  
Rastrigin  Best  28.59  17.913  22.884  19.899  12.935  12.935  8.9567  
Mean  44.819  35.542  33.732  32.187  25.073  22.287  18.26  
Var  91.411  98.105  41.561  68.803  85.093  48.604  30.551  
Dist  27.43  23.591  23.187  21.74  18.805  17.91  15.081  
Griewank  Best  5.36e05  5.12e09  2.92e14  0  0  2.22e16  0  
Mean  1.14e03  3.25e03  5.77e03  2.34e03  3.69e04  3.70e04  3.69e04  
Var  5.25e06  1.28e04  2.76e04  1.09e04  2.74e06  2.73e06  2.73e06  
Dist  1.177  1.2848  1.4333  5.02e01  3.79e01  3.79e01  3.79e01 
Fun  Value
Rosenbrock  Best  0  3.18e13  2.76e01  7.01  12.55  31.904  92.556  
Mean  3.22e30  4.05e09  2.6204  11.9608  16.78  33.51  105.21  
Var  1.04e58  2.67e16  1.5332  7.8448  7.1492  8.1804e01  83.05  
Dist  5.55e17  1.90e05  4.3424  12.113  15.364  27.867  27.592  
Sphere  Best  1.91e139  4.48e62  1.27e32  5.91e14  1.23e08  1.42e03  7.60e02  
Mean  1.41e136  2.54e61  3.63e32  9.10e14  2.22e08  2.03e03  1.36e01  
Var  1.47e271  7.27e122  1.86e64  7.30e28  5.04e17  1.86e07  5.14e04  
Dist  3.16e68  2.03e30  8.20e16  1.30e06  6.50e04  1.97e01  1.62  
Rastrigin  Best  0  12.791  24.333  47.727  55.124  77.96  89.599  
Mean  9.45e01  16.8  34.625  57.695  68.146  89.385  102.36  
Var  8.83e01  7.4984  16.161  36.615  33.234  38.407  54.137  
Dist  9.45e01  14.626  22.947  30.64  33.247  36.986  40.746  
Griewank  Best  0  0  0  2.71e11  1.28e06  2.48e02  2.57e02  
Mean  0  0  0  1.67e10  2.49e06  3.19e02  3.66e02  
Var  0  0  0  1.48e20  2.72e12  3.27e07  2.05e05  
Dist  5.28e07  5.46e07  5.19e07  2.81e04  3.60e02  1.3211  4.5692 
Table 2 shows the experimental results of the DEa algorithm with different $NP$ and $t_{\max}$. With a small population and a large number of iterations, DEa shows better performance in most cases. This means that the accuracy of the DEa algorithm depends more heavily on the number of iterations, and it manages to find the optimal solution with a small population size.
Table 3 summarizes the results for the PSOw algorithm. We can see that PSOw performs well on the Rosenbrock, Rastrigin and Griewank functions when the population size is 3000 and the number of iterations is 200. Only for the Sphere function does PSOw achieve its highest search accuracy under a different combination of $NP$ and $t_{\max}$. The results show that the accuracy of PSOw may depend more on its population size.
Table 4 shows that the CS algorithm performs better with a small population and repeated iterations. Compared with DE, CS can find the optimal solution with an even smaller population size. This may be related to the design mechanism of the CS algorithm, which increases diversity during the iteration process; this is one of the advantages of the CS algorithm.
Based on the above experiments, it is recommended that the population size and the number of maximum iterations be set as shown in Table 5. Thus, these parameter settings will be used in all the subsequent experiments.
Algorithm  $NP$  $t_{\max}$
DEa  100  6000
PSOw  3000  200
CS  30  10000
4.3 Numerical results
In order to compare the possible effects of different initialization strategies for the first three algorithms (PSOw, DEa and CS), 22 different initialization methods have been tested, including 9 different distributions with different distribution parameters. As before, we have used the different benchmarks and have run each algorithm independently 20 times. Tables 6, 7 and 8 show the comparison of the 'Best', 'Mean', 'Var' and 'Dist' values obtained by the three algorithms.