1 Introduction
Most of the realworld optimization problems are multimodal tasgetiren2017iterated ; lin2016effective ; ciancio2016heuristic ; csevkli2017multi ; yi2016parallel ; palmieri2017comparison
, i.e., their objective functions often contain multiple global optima or local optima. If a traditional numerical method is used to solve a multimodal optimization problem, it has to try many times for locating a different optimum in each single run to pick out the global optima, which will take a lot of time and work. In such a scenario, using metaheuristic algorithms, no matter evolutionary algorithms (EAs) or swarm based algorithms, to solve these problems has become a hot research topic, as they are easy to implement and can converge to as good as possible solutions. However, many metaheuristic algorithms, such as Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Differential Evolution (DE), and so on, are primarily designed to search for a single global optimum. But, it is desirable to locate multiple global optima in order to choose the most appropriate solution for engineers. In addition, some metaheuristic algorithms are easy to fall into the local optima. Therefore, many techniques have been proposed for the metaheuristic algorithms to find as many global optima as possible. These techniques are commonly known as niching methods
li2010niching , which are committed to promoting and maintaining the formation of multiple stable subpopulations within a single population for locating multiple optima. Some representative niching methods include crowding de1975analysis , fitness sharing goldberg1987genetic , clustering yin1993fast , restricted tournament selection harik1995finding , parallelization bessaou2000island , speciation deb1989investigation , and population topologies kennedy2002population , and so on. Several of them are presented below, more references and discussions about niching methods can be seen from reference li2016seeking .Crowding was firstly proposed by De Jong de1975analysis to preserve genetic diversity, so as to improve the global search ability of the algorithm for locating multiple optima. In crowding method, the offspring with better fitness replaces the most similar individual from a subset (i.e., crowd) of the population. The similarity is generally measured by distance, including hamming distance for binary encoding and Euclidean distance for realvalued encoding thomsen2004multimodal , which means that the smaller the distance of two individuals is, the higher similarity of two individuals is. The individuals of subset are randomly selected from the population, and the size of subset is a user specified parameter called crowding factor () that is often set to 2 or 3. However, low values will lead to replacement errors, i.e., the offspring replaces another individual with little similarity, which will lower the population diversity. To reduce replacement errors, deterministic crowding mahfoud1992crowding and probabilistic crowding mengshoel1999probabilistic were proposed, Thomsen suggested simply setting equal to the population size thomsen2004multimodal , and so on.
Goldberg and Richardson goldberg1987genetic proposed fitness sharing mechanism, which develops the method of sharing functions to permit the formation of multiple subpopulations. When using this method, the shared fitness of all the individuals need to be calculated according to Eq.1 before executing the operators of an algorithm, as the operations on the individuals are based on their shared fitness.
(1) 
where, and are the original fitness and shared fitness of individual respectively; is the shared value of individual with other individuals, and is formulated as , wherein is the population size, is the sharing function over the individual and , which is calculated as follows.
(2) 
where, is a constant, and always set as 1; is the distance between the individual and ; is the sharing distance, which is always set as the value of peak radius. However, this method assumes that all the peaks have equal height and width. Obviously, a prior knowledge of the fitness landscape is required for this method to set the value of .
Speciation deb1989investigation is another popular niching technique, which is used to form parallel subpopulations, i.e., species, according to the similarity between individuals. The similarity is also measured by distance, such as Euclidean distance. This niching technique employs one userspecified parameter called species distance () to divide the population into a set of species. The detail procedure of forming species in every generation is shown below. Assuming a maximization optimization problem. The first step is to find out the species seeds that dominate their own species. Firstly, an empty set is defined to contain the species seeds. Sorting the individuals in decreasing order of fitness and adding the first individual of population after sorting to the set . Then, judging the remaining individuals one by one in order, and determining whether they are within a distance /2 of any species seed in . If no, they are added to . After all the individuals are traversed, the set has collected all the species seeds. Next comes the step of adding the individuals to their corresponding species. For each species seed in , adding the individuals that are within a distance /2 of it to its species, if an individual has been already added to a species, doing nothing. Although speciation method is able to divide the population into multiple subpopulations, it has a major shortcoming that its parameter, i.e., species distance, is hard to be set precisely for different optimization problems. In such case, inspired by the Multinational Evolutionary Algorithms ursem1999multinational , Stoean et al. stoean2007disburdening proposed “detectmultimodal” mechanism to establish species, which removes the need of distance parameter. The “detectmultimodal” mechanism utilities a set of interior points between two individuals to detect whether there is a valley between them in the fitness landscape, so as to determine whether the two individuals track different extreme points. If the fitness values of all the interior points are better than the worse fitness value among these two individuals, they are considered following the same extreme point, i.e., locating in the same peak of the fitness landscape, as shown in Fig. 1(a), wherein, > and >. On the contrary, if there exist at least one interior point whose fitness value is worse than the worse fitness value among these two individuals, at least one valley is considered existing between the two individuals, i.e., they are considered tracking different extreme points as shown in Fig. 1(b), wherein, <. Those individuals following the same extreme point are added to the same species. Although “detectmultimodal” mechanism does not utilize species distance to divide the population into multiple species, it employs another parameter called “number of gradations”, i.e., number of interior points, which also depends on the problem to be solved.
Thus it can be seen that some niching methods need to set some parameters, which require some prior knowledge of the fitness landscape, to divide the population into multiple subpopulations. However, for some realworld optimization problems, the prior knowledge of the fitness landscape are very difficult or almost impossible to obtain li2010niching . Therefore, these niching methods are difficult to be used to deal with the realworld optimization problems. In this paper, a new multimodal optimization algorithm called Whale Swarm Algorithm with Iterative Counter (WSAIC), based on our preliminary work in Zeng2017 , is proposed. By improving the iteration rule of the original WSA for multimodal optimization, WSAIC removes the need of specifying optimal parameter values for different problems to form multiple subpopulations, without introducing any niching parameter. In addition, WSAIC enables the identification of extreme point to jump out of the located extreme points during iterations.
The remainder of this paper is organized as follows. A brief overview of the multimodal optimization algorithms is presented in section 2. Section 3 describes the proposed WSAIC in sufficient detail. The experimental setup containing the algorithms compared, test functions, parameters setting and performance metrics is presented in section 4. The next section presents the experimental results and analysis to evaluate WSAIC. The last section is the conclusions and topics for further research.
2 Related works
Since increasing niching methods were put forward, a large number of multimodal optimization algorithms combining the metaheuristic algorithms with these niching methods have been proposed. In this section, a brief overview of multimodal optimization algorithms is presented. According to whether the prior knowledge of the fitness landscape is needed, these multimodal optimization algorithms are classified into prior knowledge based methods and nonprior knowledge based methods. More references and discussions about multimodal optimization algorithms can be viewed in references
li2016seeking ; das2011real .2.1 Prior knowledge based methods
Species Conserving Genetic Algorithm (SCGA) was proposed by Li et al. li2002species via introducing speciation and species conservation techniques into the classical GA. In each iteration, the current population is partitioned into multiple subpopulations (i.e., species) using the speciation technique deb1989investigation , before executing the genetic operators. Moreover, after executing the genetic operators, all the species seeds are either conserved to the next generation or replaced by better members of the same species, which can contribute significantly to the preservation of global and local optima that have been found so far. Li showed that the additional overhead of SCGA caused by these two techniques is not higher than that introduced by GA with sharing goldberg1987genetic , and SCGA performs far better than Genetic Algorithm with Sharing (SGA) in success rates of locating the global optima.
Li li2005efficient proposed Speciesbased DE (SDE) algorithm to solve multimodal optimization problems via introducing speciation technique. In SDE algorithm, when the number of member individuals of a species is less than a predefined value, the algorithm will randomly generate new individuals within the radius of species seed until the species size reaches the predefined value. Then, each identified species runs the conventional DE algorithm by itself. In addition, if the fitness of an offspring is the same as that of its species seed, this offspring will be replaced by a randomly generated new individual. These two mechanisms have improved the efficiency of SDE algorithm significantly.
The speciation technique was also introduced into the conventional PSO by Li li2004adaptively to solve multimodal optimization problems. In each iteration of Speciesbased PSO (SPSO), after the population is divided into multiple species and the species seeds are determined, each species seed is assigned to its member individuals as the . Then, each individual updates its position according to the iterative equations of velocity and position in the PSO. The experimental results showed that SPSO is comparable or better than SNGA beasley1993sequential , SCGA and NichePSO brits2002niching over a set of multimodal functions.
Stoean et al. stoean2007disburdening proposed Topological Species Conservation (TSC) algorithm, which utilities the “detectmultimodal” mechanism to remove the need of distance parameter when selecting species seeds and forming species. In TSC algorithm, all the individuals that track the same extreme point are in the same species, which corresponds the real structure of the optimization function. And the species seeds also can be conserved to the next generation. However, TSC algorithm need excessive fitness evaluations in seeds selection procedure, especially when the number of interior points get larger. For improving the computational efficiency of TSC algorithm, i.e., saving the fitness evaluations, Stoean et al. stoean2010multimodal proposed Topological Species Conservation Version 2 (TSC2) algorithm. In TSC2 algorithm, the current free individual chooses the seed one by one in ascending order of distance to it to perform the “detectmultimodal” procedure until the return value is true or this individual is considered a new seed, because the species dominated by the closer seed is more likely to track the same peak with the current free individual. With this method, TSC2 algorithm saves considerable fitness evaluations. In addition, when the optimization function has a large number of local optima, TSC algorithm might block the population into too many seeds that would only be conserved to the next generation, which would significantly reduce the search ability of TSC algorithm. And TSC2 algorithm introduced the maximum number of seeds to guarantee the algorithm’s search ability.
Deb and Saha deb2010finding firstly converted a singleobjective multimodal optimization problem into a biobjective optimization problem. Multiple global and local optima of the original problem become the members of weak Paretooptimal set of the transformed problem. One of the objectives of the transformed problem is the objective function of the original problem. With regards to another objective, the gradientbased approach is firstly employed, which is based on the property that the derivatives of objective function at the minimum points are equal to zero. However, the derivatives of objective function at the maximum and saddle points are also equal to zero, and the objective functions of some optimization problems may be nondifferentiable at the minimum points. Then, more pragmatic neighborhood count based approaches are developed for establishing the second objective, which is assigned to the count of the number of neighboring solutions that are better than the current solution in terms of their objective function values. During iterations, the nondominated ranks of different solutions rely on two parameters, i.e., and , which are used to distinguish two optima.
2.2 Nonprior knowledge based methods
Thomsen thomsen2004multimodal proposed Crowdingbased DE (CDE) algorithm by introducing crowding method into the conventional DE for multimodal function optimization. In CDE algorithm, the similarity of two individuals is measured by the Euclidean distance between two individuals. The fitness value of an offspring is only compared with that of the most similar individual in the current population, and the offspring replaces the most similar individual if it has better fitness. This replacement scheme can make the population remain diversity in the search space, which has a great contribution to the location of multiple optima. Thomsen showed that CDE algorithm performs better than a fitness sharing DE variant over a group of multimodal functions.
The History based topological speciation (HTS) was proposed by Li and Tang li2015history to incorporate into the CDE with species conservation technique for multimodal optimization. HTS is a parameterfree speciation method, which captures the landscape topography relying exclusively on search history, which avoids the additional sampling and function evaluations associated with existing topology based methods. Therefore, HTS is a parameterfree speciation method. The experimental results showed that HTS performs better than existing topologybased methods when the function evaluation budget is limited.
Liang et al. liang2006comprehensive proposed Comprehensive Learning Particle Swarm Optimizer (CLPSO) for multimodal function optimization. In CLPSO, all particles’ best previous positions can potentially be used to guide a particle’s flying, i.e., each dimension of a particle may learn from the corresponding dimension of different particle’s best previous position. The velocity updating equation of CLPSO is shown as follows.
(3) 
where, is an inertia weight, is an acceleration constant, denotes the th dimension of particle ’s position, represents the th dimension of particle ’s velocity. is a random number between 0 and 1 associated with . For particle , a set =[, , , , , ], where denotes the dimension of fitness function, is built to store the serial numbers of those particles that particle should learn from their best previous positions at the corresponding dimensions. denotes the th dimension of particle ’s best previous position. The values of elements in
depend on the learning probability
that can take different values for different particles. For example, generating a random number for assigning , if this random number is greater than , assigning to ; otherwise, assigning the serial number of a particle selected from population with tournament selection procedure to . If particle does not find a better position after a certain number of iterations called the refreshing gap , reassigning for particle .Li li2007multimodal proposed FitnessDistanceRatio based PSO (FERPSO) algorithm, which utilities FER to avoid specifying any niching parameter, for multimodal function optimization. The FER value between particle and particle is shown as follows.
(4) 
where, and are the best previous positions of particle and particle respectively; is a scaling factor and formulated as follows.
(5) 
where, and are the bestfit particle and worstfit particle in current population respectively. 
 is the size of search space, which is estimated by its diagonal distance
(where denotes the dimension of search space, i.e., the number of variables. and are the upper and lower bounds of the th variable , respectively). In every iteration, each particle needs to calculate the FER value between it and every other particle to find the neighboring point denoted by , corresponding to the maximal FER value. Then, each particle updates its velocity according to Eq. 6. Over successive iterations, some subpopulations tracking different peaks will be formed, so as to locate multiple optima.(6) 
where, and are the velocity and position of particle respectively. and
denote two vectors which comprise random values generated between 0 and
. is a positive constant. And is a constriction coefficient.The PSO niching algorithms using ring topology, such as , ,  and , were also proposed by Li li2010niching for multimodal function optimization. These ring topology based PSO niching algorithms also remove the need of specifying any niching parameter. Taking for example, a particle’s neighboring best point , shown in Eq. 6, is set as the best one among the best previous positions of its two immediate neighbors (i.e., left and right neighbors identified by population indices). With the ring topology methods, these PSO algorithms are able to form multiple subpopulations over successive iterations. Li showed that the PSO algorithms using ring topology can provide comparable or better performance than SPSO and FERPSO on some test functions.
Qu et al. qu2012differential proposed a neighborhood based mutation and integrated it with three niching DE algorithms, i.e., CDE, SDE and sharing DE thomsen2004multimodal , for multimodal function optimization. In neighborhood mutation, the subpopulations are formed, relying on the parameter neighborhood size . During iterations, each individual should calculate the Euclidean distances between it and other individuals in the population. Then, selecting the former nearest individuals to current individual to form a subpopulation. And the offspring of each individual is generated by using the corresponding DE algorithm within the subpopulation that the individual belongs to. After a certain number of iterations, some subpopulations will track different extreme points of the multimodal function to be optimized. Generally, the parameter can be set to a value between 1/20 of the population size and 1/5 of the population size.
The locally informed PSO (LIPS) algorithm was proposed by Qu et al. qu2013distance for multimodal function optimization. LIPS makes use of the local information (best previous positions of several neighbors) to guide the search of each particle. The velocity updating equation of LIPS is shown as follows.
(7) 
where, is an inertia weight, denotes the th dimension of particle ’s position, is the th dimension of particle ’s velocity. , is the neighbor size, which is dynamically increased from 2 to 5 during iterations; is a random number generated in [0, 4.1/], and ; is the best previous position of the th nearest neighbor to the th individual’s best previous position. With this technique, LIPS algorithm eliminates the requirement for specifying any niching parameter and improves the local search ability. Qu et al. showed that LIPS algorithm outperforms several wellknown niching algorithms, containing , , SPSO, FERPSO, SDE and CDE, and so on, over 30 standard benchmark functions not only on success rate but also with regard to accuracy.
Wang et al. wang2015mommop proposed Multiobjective Optimization for Multimodal Optimization Problems (MOMMOP), which transforms a Multimodal Optimization Problem (MMOP) into a Multiobjective Optimization Problem (MOP) with two conflicting objectives. So that all the global optima of the original MMOP can become the Pareto optimal solutions of the transformed problem. With MOMMOP, an MMOP is transformed into a MOP as follows.
(8) 
where, is a solution, () is the th variable, and denotes the number of variables. and are the two conflicting objectives of the transformed problem. is the objective function value of with respect to the original problem. and denote the best and worst objective function values during the evolution, respectively. and are the upper and lower bounds of the first variable, respectively. is the scaling factor, which gradually increases during the evolution. Because some optima may have the same values in certain variables, for the sake of locating multiple global optima, each variable is used to design a biobjective optimization problem similar to Eq. 8. If a solution Pareto dominates another solution on all the biobjective optimization problems, is considered to dominate . What’s more, to make the population distribute more reasonably, another comparison criterion is proposed. That is a solution dominates another solution if
(9) 
where, and are the objective function values of and , respectively, with respect to the original problem. denotes the Euclidean distance between the normalized and (i.e., =( )/( ), =( )/( ), where {1, ,}). If ( normalization(, ))<0.01, and is considered to be quite similar to each other.
2.3 Our motivations
Based on the above overview, we can find that lots of multimodal optimization algorithms need to set some niching parameters, which require some prior knowledge of the fitness landscape. However, it is very difficult or almost impossible to obtain the prior knowledge of the fitness landscape, for some realworld optimization problems. What’s more, few of existing multimodal optimization algorithms can effectively identify and get rid of the located extreme points during iterations. Because they have no idea whether a subpopulation has already located the extreme point of a peak or not, before the end of running. Therefore, lots of function evaluations will be wasted, when an extreme point has been located early. And it also restrict the global search ability of the algorithm that a subpopulation all the time tracks an extreme point located early.
So, based on the above analysis, our main motivations in this paper are summarized as follows.

Improving the iteration rule of the original WSA to remove the need of specifying optimal values of attenuation coefficient for different problems to form multiple subpopulations, without adding any niching parameter.

Enabling the identification of extreme point and jumping out of the located extreme points during iterations, relying on two new parameters named stability threshold and fitness threshold , so as to eliminate the unnecessary function evaluations and improve the global search ability.
3 Whale swarm algorithm with iterative counter
Firstly, this section introduces the traditional WSA briefly. Then, the improvements of WSA for multimodal function optimization are presented. Next, the implementation of WSAIC is described in sufficient detail. Assuming the problems to be solved by the algorithms are minimization problems. Let the fitness functions be the same as the objective functions.
3.1 Whale swarm algorithm
We proposed WSA for function optimization Zeng2017 , inspired by the whales’ behavior of communicating with each other via ultrasound for hunting. It can be seen from reference Zeng2017 , WSA performs well on maintaining population diversity and has strong local search ability, which contribute significantly to locating the global optima with high accuracy. In WSA, a whale updates its position, under the guidance of its “better and nearest” whale , according to the following equation.
(10) 
where, and denote the th elements of ’s position at and +1 iterations respectively, and represents the th element of ’s position at iteration. is the intensity of ultrasound source, which can be set to 2 for almost all the cases. denotes the natural constant. is the attenuation coefficient. And is the Euclidean distance between and . denotes a random value generated between 0 and uniformly. According to Eq. 10, a whale would move positively and randomly under the guidance of its “better and nearest” whale which is close to it, and move negatively and randomly under the guidance of that whale which is quite far away from it.
The general framework of WSA is shown in Fig. 2, where  in line 6 denotes the number of members in , namely the swarm size, and in line 7 is the th whale in . From Fig. 2, it can be seen that WSA has a fairly simple structure. In every iteration, before moving, each whale needs to find its “better and nearest” whale as shown in Fig. 3, where in line 6 is the fitness value of whale .
3.2 The improvements of WSA
3.2.1 The improvement on iteration rule of WSA
Although the original WSA performs well on forming multiple parallel subpopulations and maintaining the population diversity, it needs to specify optimal values of attenuation coefficient for different problems, which reduces the practicality of WSA. Thus, we improve the iteration rule of WSA to remove the need of specifying optimal values of attenuation coefficient for different problems, on the premise of ensuring the formation of multiple subpopulations and the ability of local exploitation, in this section. Firstly, we assume that the intensity of ultrasound is not attenuated in water, i.e., =0, which means that each whale can correctly understand the message send out by any other whale in the search area. Therefore, a whale will move positively and randomly under the guidance of its “better and nearest” whale, whether that whale is close to it or far away from it. So, when a whale and its “better and nearest” whale track different extreme points, the whale may leave far away from the extreme point tracked by it due to the guidance of its “better and nearest” whale that follows another extreme point, which will weaken WSA’s ability of local exploitation. Taking a onedimensional function optimization problem for example, as it can be seen from Fig. 4, the whale is near to an extreme point, however, its “better and nearest” whale is near to another extreme point. In this case, may move to a worse point or even go to another peak under the guidance of , which will impede the location of the extreme point tracked by previously. Obviously, this situation is not conducive to locating multiple global optima for WSA.
To solve the above problem effectively, we improved the rule of location update for each whale as follows. Firstly, generating a copy of a whale , then, moves under the guidance of ’s “better and nearest” whale according to Eq. 10. If the position of after moving is better than that of (i.e., the fitness value of after moving is less than that of ), moves to ; otherwise, remains unchanged. In a word, if a whale find a better position by Eq. 10 in an iteration, it moves to the better position; otherwise, it remains quiescent in its original position. So, when it comes to the case shown in Fig. 4, the probability of whale leaving away from the extreme point tracked by it will be reduced very much, because whale is difficult to find a better position by Eq. 10 under the guidance of its “better and nearest” whale . In other words, the whale may stay at its original position with great probability to guide the movement of other whales. When appearing at least one whale that follows the same extreme point with and is better than in the meantime, will converge to the extreme point under the guidance of the nearest one among those better whales, in next iteration. Therefore, this improvement will contribute significantly to forming multiple subpopulations and enhancing the ability of local exploitation for the improved WSA, which are very conducive to locating multiple global optima, despite =0. What’s more, this improvement does not added any niching parameter.
3.2.2 Identifying and escaping from the located extreme points during iterations
In the field of multimodal optimization, identifying the located extreme points effectively and jumping out of these extreme points for saving unnecessary function evaluations during iterations are very important for metaheuristic algorithms to locate the global optimum/optima. Although the improved WSA mentioned in section 3.2.1 can ensure the formation of multiple subpopulations and the ability of local exploitation, it cannot yet identify the located extreme points and escape from these extreme points during iterations. In such case, we propose two new parameters, i.e., stability threshold and fitness threshold , which aims to help each whale identify the located optima and jump out of these optima during iterations, so as to save unnecessary function evaluations and improve the global search ability. In this paper, is a predefined number of iterations utilized to judge whether a whale has reached steady state or not, and that a whale has reached steady state means that it has located the extreme point tracked by it. And is a predefined value utilized to judge whether a solution is a current global optimum or not. If a whale does not find a better position after successive iterations, it is considered to have already reached steady state and located an extreme point. If the difference between its fitness value and (the fitness value of the best one among the current global optima) is less than , the whale’s position is considered a current global optimum; otherwise, the whale’s position is considered a local optimum. If the whale’s position is a current global optimum, recording this optimum. Then, the whale that has reached steady state is randomly reinitialized in the search area to jump out of the located extreme point. For judging whether a whale does not find a better position after successive iterations or not, each whale keeps an iterative counter . In this paper, the improved WSA is called WSA with Iterative Counter (WSAIC).
3.3 The detailed procedure of WSAIC
Fig. 5 presents the pseudo code of WSAIC. For WSAIC, it is worth noting that the initialization of a whale contains two operations: initializing the whale’s position randomly and assigning 0 to its iterative counter. The improvement on iteration rule of WSA described in section 3.2.1 can be seen from Fig. 5. If a whale’s “better and nearest” whale exists (line 8 in Fig. 5), a copy of this whale is generated firstly (line 9 in Fig. 5), then, the copy moves under the guidance of the “better and nearest” whale according to Eq. 10 (line 10 in Fig. 5). If the position of this copy after moving is better than that of the original whale (line 12 in Fig. 5), the copy replaces the original whale (line 13 in Fig. 5).
The detail of identifying and escaping from the located extreme points during iterations for WSAIC is shown below. If a whale finds a better position (lines in Fig. 5) in an iteration, assigning 0 to its iterative counter (line 14 in Fig. 5); otherwise, the whale should check its iterative counter (lines and in Fig. 5). The detail procedure of checking a whale’s iterative counter is demonstrated in Fig. 6. As we can see from Fig. 6, firstly it should determine whether the whale’s iterative counter is equal to stability threshold or not. If the whale’s iterative counter is not equal to (line 2 in Fig. 6), its increases by 1 (line 3 in Fig. 6); otherwise, the whale is considered to have already reached steady state and located an extreme point. If the whale has reached steady state, it should determine whether the located extreme point is a current global optimum or not (line 5 in Fig. 6). If it is a current global optimum, recording this extreme point. Then, the whale that has reached steady state is randomly reinitialized (line 6 in Fig. 6), for jumping out of the located extreme point to find the global optima. Thus it can be seen that, with the parameter stability threshold , the proposed WSAIC can jump out of the located extreme points on the premise of keeping enough local search ability.
The detail procedure of judging whether a solution is a current global optimum or not is demonstrated in Fig. 7. Firstly, it should judge whether the fitness value of the solution is less than (the fitness value of the best one among the current global optima set ) or not. If the fitness value of this solution is less than (line 2 in Fig. 7), this solution must be the current global optimum. Before updating (line 6 in Fig. 7) and recording the new current global optimum (line 7 in Fig. 7), it should judge whether the optima in located before are still the current global optima or not. If the difference between and the whale’s fitness is greater than (line 3 in Fig. 7), all the elements of located before are not the current global optima, so needs to be cleared (line 4 in Fig. 7). If the fitness value of this solution is greater than (line 8 in Fig. 7), it should judge whether this solution is a current global optimum or not. If the difference between the fitness value of this solution and is not greater than (line 9 in Fig. 7), this solution is considered a current global optimum, so it is added to (line 10 in Fig. 7).
3.4 Parameters setting of WSAIC
As we can see from the detailed steps above, WSAIC contains four algorithmspecific parameters, i.e., intensity of ultrasound source , attenuation coefficient , stability threshold and fitness threshold . and are two constants, and always set to 2 and 0 respectively. should be set to a comparatively small value that is between 0 and the difference between the global second best fitness and the global best fitness, if the problem to be solved has at least one local optimum as shown in the example of an onedimensional function in Fig. 8. The and in Fig. 8 denote the global optimum and the global second best solution respectively, and the difference between their objective function values is quite small. For the function to be optimized in Fig. 8, should be set to a very small value that between 0 and . For almost all the problems, especially those problems without prior knowledge, can be set to . And for those benchmark test functions whose global optima are given, can be set to the value of the predefined fitness error (i.e., level of accuracy) that is utilized to judge whether a solution is a real global optimum. may need to be specified different values for different problems. According to a large number of experimental results, it is reasonable to set =100, where is the function dimension.
3.5 Convergence analysis of WSAIC
It can be seen from section 3.3 that if a whale’s iterative counter increases to , the whale is considered to have already reached steady state, i.e., it has converged. So, the convergence analysis of WSAIC depends on the convergence proof of position update rules of WSAIC. Based on Fig. 5 and Eq. 10, the position update equation of WSAIC can be expressed as follows.
(11) 
where, = 1rand(0, 2), = rand(0, 2). It can get that E() = 0, E() = 1 and D() D() E() 1/3.
To prove the convergence of Eq. 11
just needs to prove the convergence of expectation and variance of
. The expectation of is shown as follows.(12) 
Because the distribution of is independent to and , and can be treated as a constant. Eq. 12 is equivalent to the following equations.
(13) 
(14) 
(15) 
The sufficient and necessary condition for the convergence of is that the eigenvalue is less than 1. It can be seen from Eq. 15 that = E(). Therefore, we can conclude that will converge during iterations because E() = 0.
The variance of is shown as follows.
(16) 
Eq. 16 can be transformed as follows.
(17) 
From Eq. 17, it can get that the eigenvalue of is E(). So will converge during iterations because E() = 1/3 that is less than 1. Therefore, we can expect that during iterations of WSAIC, the whales will converge to an appropriate solution under the guidance of their “better and nearest” whales.
4 Experimental setup
The proposed WSAIC and other algorithms compared are all implemented with C++ programming language by Microsoft visual studio 2015 and executed on the PC with 3.2 GHz and 3.6 GHz Intel core i53470 processor, 4 GB RAM and 64bit Microsoft windows 10 operating system. The four niching metaheuristic algorithms compared are listed as follows.

LIPS qu2013distance : the locally informed PSO.

NSDE qu2012differential : the neighborhood based speciation DE.

NCDE qu2012differential : the neighborhood based crowding DE.

FERPSO li2007multimodal : the FitnessEuclidean distance ratio PSO.
Apart from the above niching metaheuristic algorithms, WSAIC also compares with WSA Zeng2017 . It is worth noting that the different evolutionary rules of different algorithms will result in different computational complexity. As all these algorithms compared are implemented in the same development environment, and run on the same computer, we utilize the CPU time as the stopping criterion for the sake of fairness. It is obvious that the more global optima the algorithm finds and the accuracy of these optima are higher when satisfying the stopping criterion, the better the algorithm performs.
4.1 Test functions
We use 20 multimodal benchmark functions to test these algorithms. Basic information of these test functions is summarized in Table 1, in which the symbol “” in last column corresponding to F16F20 means that these functions have many local optima, and we have not counted the number of their local optima. In Table 1, the former 15 multimodal functions come from CEC2015 Suganthan2015cec , and the latter 5 functions are the classical multimodal functions with high dimension. These CEC2015 functions can be divided into two parts. The first 8 functions are expanded scalable functions and the remaining 7 functions are composition functions. All these CEC2015 functions come with search space shift and rotation that makes them more difficult to be solved, while the latter 5 multimodal functions are only shifted. More details of these test functions are presented in the document “Definitions of CEC2015 niching benchmark 20141228” which can be download from the website shown in reference Suganthan2015cec . For functions F2, F3, F5, F6, F7, F8, F9, F11, F12 and F13 the objective is to locate all the global optima, while for the rest the target is to escape from the local optima to hunt for the global optimum. And all these test functions are minimization problems.
Fn.  Test function name  Dimensions  No. of global optima  No. of local optima 

F1  Expanded TwoPeak Trap  5  1  15 
F2  Expanded FiveUnevenPeak Trap  5  32  0 
F3  Expanded Equal Minima  4  625  0 
F4  Expanded Decreasing Minima  5  1  15 
F5  Expanded Uneven Minima  3  125  0 
F6  Expanded Himmelblau’s Function  4  16  0 
F7  Expanded SixHump Camel Back  6  8  0 
F8  Modified Vincent Function  3  216  0 
F9  Composition Function 1  10  10  0 
F10  Composition Function 2  10  1  9 
F11  Composition Function 3  10  10  0 
F12  Composition Function 4  10  10  0 
F13  Composition Function 5  10  10  0 
F14  Composition Function 6  10  1  19 
F15  Composition Function 7  10  1  19 
F16  Griewank  50  1  
F17  Ackley  100  1  
F18  Rosenbrock  100  1  
F19  Rastrigin  100  1  
F20  Expanded Scaffer’s F6  100  1  
Search range: 
4.2 Parameters setting
To compare the performance of the multimodal optimization algorithms in this paper, all the test functions should be treated as blackbox problems, though their global optima can be obtained by the method of derivation. Thus, the known global optima of these test functions cannot be used by the algorithms during iterations. After each algorithm finished the iteration, the fitness error , i.e., level of accuracy, is used to judge whether a solution is a real global optimum. If the difference between the fitness value of a solution and the fitness value of the known global optimum is lower than , this solution can be considered a real global optimum. In our experiments, the fitness error , population size and function evaluations used by these algorithms for the test functions are listed in Table 2. It is worth noting that a function which has higher dimension or more complex fitness landscape may require a larger population size or more function evaluations.
Fn.  pop. size ()  function evaluations  

F1  0.00000001  50  6.0E6 
F2  0.00000001  50  1.8E8 
F3  0.00000001  50  1.5E9 
F4  0.00000001  50  1.5E8 
F5  0.00000001  50  9.0E7 
F6  0.00000001  50  3.0E7 
F7  0.000001  50  3.0E7 
F8  0.0001  50  1.5E9 
F9  0.00000001  500  1.2E8 
F10  0.00000001  500  3.0E7 
F11  0.00000001  100  6.0E7 
F12  0.00000001  100  5.0E7 
F13  0.00000001  100  1.0E7 
F14  0.00000001  500  5.0E7 
F15  0.00000001  100  2.0E7 
F16  0.00000001  100  2.0E7 
F17  0.00000001  100  2.0E7 
F18  0.00000001  100  1.5E8 
F19  0.00000001  100  1.5E8 
F20  0.00000001  100  6.0E7 
The parameters’ values of these algorithms compared are set as same as those in their reference source respectively. Table 3 lists the values of main parameters of these algorithms.
Algorithms  Parameters 

LIPS  =0.729844, =25 
NSDE  =0.9, =0.5 
NCDE  =0.9, =0.5 
FERPSO  =0.729844, =4.1 
WSA  =2 
WSAIC  =2, =0, =100, = 

: inertia weight; : neighborhood size;

: crossover rate; : scaling factor;

: constriction factor; : coefficient;
The attenuation coefficient of WSA for these test functions are listed in Table 4. Table 5 shows the neighborhood size of NSDE and NCDE respectively.
Fn.  F1  F2  F3  F4  F5  F6  F7  F8  F9  F10 

0.0001  0.1  0.14  0.00005  0.16  0.16  0.001  0.3  0.09  0.001  
Fn.  F11  F12  F13  F14  F15  F16  F17  F18  F19  F20 
0.01  0.001  0.001  0.001  0.001  0.005  0.01  0.014  0.005  0.01 
Fn.  F1  F2  F3  F4  F5  F6  F7  F8  F9  F10  F11  F12  F13  F14  F15  F16  F17  F18  F19  F20 

NSDE  0.2  0.2  0.2  0.2  0.2  0.2  0.2  0.2  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1 
NCDE  0.2  0.2  0.2  0.2  0.2  0.2  0.2  0.2  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1 
4.3 Performance metrics
To fairly compare the performance of WSAIC with other five algorithms, we have conducted 51 independent runs for each algorithm over each test function. And the following four metrics are used to measure the performance of all the algorithms.

Success Rate (SR) li2004adaptively : the percentage of runs in which all the global optima are successfully located using the given level of accuracy.

Average Number of Optima Found (ANOF) Suganthan2015cec : the average number of global optima found over 51 runs.

Quality of optima found: the mean of fitness values of optima found over 51 runs, reflecting the accuracy of optima found.

Convergence rate: the rate of an algorithm converging to the global optimum over function evaluations.
5 Experimental results and analysis
In this section, the results of the comparative experiments are presented and analyzed in detail.
5.1 Quantity of optima found
This section presents and analyses the results of quantity of optima found by these algorithms. Firstly, all the algorithms are compared on “Success Rate”, which is the most popular metric used to test the performance of the multimodal optimization algorithms in terms of locating multiple global optima. Then, the metric “Average Number of Optima Found” is employed to further compare the performance of the algorithms on locating multiple global optima, as some algorithms can not achieve nonzero SR over some functions with multiple global optima.
5.1.1 Success Rate
The SR of each algorithm on each test function is presented in Table 6, in which each number within the parenthese denotes the rank of each algorithm on the corresponding function in terms of SR, and the bold number means the corresponding algorithm performs best on the function. The same SR value on a function means that the corresponding algorithms have the same rank for the function. The last row of Table 6 shows the total rank of each algorithm for all the test functions, which are the summation of each individual rank of the algorithm for each function. It can be seen from Table 6 that WSAIC performs best on most of the test functions in terms of SR. Especially on F3, F5 and F8 which have massive global optima, WSAIC performs far better than other algorithms, while the algorithms compared can not achieve nonzero SR values on the three functions. It is worth noting that F9F15 are composition functions with search space shift and rotation, whose global optima are more difficult to be located, so that all the algorithms can not achieve nonzero SR values on F9F14. For the composition function F15, WSAIC, LIPS, NSDE and NCDE all get the maximal SR value, i.e., 1. What’s more, for the high dimensional multimodal functions F16, F18 and F19, WSAIC can also achieve much higher SR values than most of other multimodal optimization algorithms. It also can be seen that the better performance of WSAIC in terms of SR can be supported by the total rank of WSAIC that is 25 which is much better than those achieved by other algorithms.
Fn.  LIPS  NSDE  NCDE  FERPSO  WSA  WSAIC 

F1  0.31  0.92  0.14  0.39  0.08  1 
(4)  (2)  (5)  (3)  (6)  (1)  
F2  0  0  0  0  0  1 
(2)  (2)  (2)  (2)  (2)  (1)  
F3  0  0  0  0  0  1 
(2)  (2)  (2)  (2)  (2)  (1)  
F4  0.31  (1)  (1)  0.14  0  1 
(4)  (1)  (1)  (5)  (6)  (1)  
F5  0  0  0  0  0  1 
(2)  (2)  (2)  (2)  (2)  (1)  
F6  0  0  0  0  0  1 
(2)  (2)  (2)  (2)  (2)  (1)  
F7  0  0  0.16  0  0  1 
(3)  (3)  (2)  (3)  (3)  (1)  
F8  0  0  0  0  0  1 
(2)  (2)  (2)  (2)  (2)  (1)  
F9  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  
F10  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  
F11  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  
F12  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  
F13  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  
F14  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  
F15  1  1  1  0.73  0  1 
(1)  (1)  (1)  (5)  (6)  (1)  
F16  1  1  1  0.39  0.41  0.94 
(1)  (1)  (1)  (6)  (5)  (4)  
F17  0  0.08  0  0  0  0 
(2)  (1)  (2)  (2)  (2)  (2)  
F18  0  0.82  0.88  0.24  0.57  0.88 
(6)  (1)  (1)  (5)  (4)  (1)  
F19  0  1  0  0  0  0.98 
(3)  (1)  (3)  (3)  (3)  (2)  
F20  0  0  0  0  0  0 
(1)  (1)  (1)  (1)  (1)  (1)  
Total rank  41  30  33  49  52  25 
5.1.2 Average Number of Optima Found
As the sample size in this paper is 51 that is greater than 30, we have conducted the Two Independentsamples Ztest for WSAIC to judge whether the difference between its population and the populations of other algorithms, respectively represented by their independent samples, are significant or not on each test function under the significance level 0.05, which is based on the variance between the ANOF of two independent samples. Table 7
presents the ANOF of each algorithm on each test function, and the standard deviation of the number of optima found are also listed. The symbol “+” means that the difference between the population of WSAIC and the population of the algorithm compared is significant, and WSAIC performs better than the algorithm compared, while the symbol “=” means that the difference is not significant. And the symbol “
” means that the difference is significant, and WSAIC performs worse than the algorithm compared. The bold number in Table 7 means that the corresponding algorithm performs best on the function in terms of ANOF. It can be seen form Table 7 that WSAIC has the best performance in terms of ANOF over F1F8, F15 and F18, which echoes the best SR values of WSAIC on these test functions as shown in Table 6. For the two composition functions F10 and F14 and the high dimensional function F20, all the algorithms can not get nonzero ANOF, which means that all the algorithms can not find the global optima of these functions. For the composition functions F9, F11 and F12, WSAIC performs far better than most of other algorithms compared in terms of ANOF. It also can be seen that the better performance of WSAIC in terms of the number of optima found can be supported by the total number of symbols “+”, “=” and “” in the last three rows of Table 7. As we can see from Table 7, the nonzero values of the number of symbol “” only occur when WSAIC is compared with LIPS, that is 1. And the number of symbol “+” is lager than that of symbol “=” when compared with the other algorithms. The better performance of WSAIC is firstly due to the improvement on the location update rule of WSA when =0, i.e., a whale moves to a new position under the guidance of its “better and nearest” whale if this new position is better than its original position, which can ensure the formation of multiple subpopulations and keep the ability of local exploitation. More importantly, the method of identifying and jumping out of the located extreme points during iterations can improve the global search ability as far as possible, which can contribute significantly to the location of multiple global optima.Fn.  LIPS  NSDE  NCDE  FERPSO  WSA  WSAIC  
F1  0.310.46  +  0.920.27  =  0.140.34  +  0.390.49  +  0.080.27  +  10 
F2  10.861.36  +  1.510.50  +  00  +  2.670.88  +  0.760.47  +  320 
F3  16.761.45  +  1.840.36  +  44.901.61  +  5.591.16  +  1.040.59  +  6250 
F4  0.310.46  +  10  =  10  =  0.140.34  +  00  +  10 
F5  16.801.68  +  1.980.14  +  0.221.53  +  8.611.50  +  1.270.45  +  1250 
F6  9.651.49  +  20  +  7.961.83  +  4.251.10  +  0.920.39  +  160 
F7  3.801.31  +  20  +  5.901.47  +  1.490.70  +  0.470.50  +  80 
F8  16.041.67  +  2.020.14  +  33.244.04  +  7.901.47  +  0.690.98  +  2160 
F9  6.310.67  00  +  20  +  0.820.68  +  1.510.54  +  4.531.04  
F10  00  =  00  =  00  =  00  =  00  =  00 
F11  0.510.50  +  0.020.14  +  0.840.36  =  0.020.14  +  0.330.47  +  0.820.38 
F12  0.180.38  =  00  =  00  =  00  =  00  =  0.040.19 
F13  0.960.19  =  10  =  10  =  0.760.42  =  00  +  0.900.30 
F14  00  =  00  =  00  =  00  =  00  =  00 
F15  10  =  10  =  10  =  0.730.45  +  00  +  10 
F16  10  =  10  =  10  =  0.390.49  +  0.410.49  +  0.940.24 
F17  00  =  0.080.27  =  00  =  00  =  00  =  00 
F18  00  +  0.820.38  +  0.880.32  +  0.240.42  +  0.570.50  +  0.880.32 
F19  00  +  10  +  00  +  00  +  00  +  0.980.14 
F20  00  =  00  =  00  =  00  =  00  =  00 
+  11  8  9  14  15  
=  8  12  11  6  5  
1  0  0  1  0 
5.2 Quality of optima found
This section compares the performance of these algorithms in terms of the quality of optima found. Table 8 presents the mean of fitness values of optima found over 51 runs on all these test functions, and the standard deviation of fitness values of optima found are also listed in the parentheses. It is worth noting that the fitness values of optima found by all the algorithms on the CEC2015 niching test functions (i.e., F1F15 in Table 1) have substracted 100*Fn. (Fn. denotes the serial number of a function), for comparing the performance of all the algorithms on the quality of optima found. And we have also conducted the Two Independentsamples Ztest between WSAIC and other algorithms compared. The bold number in Table 8 means that the corresponding algorithm performs best on the function in terms of the quality of optima found. It can be seen form Table 8 that WSAIC has the best performance over F1, F4, F7 and F14. What’s more, WSAIC has very balanced performanc in terms of the quality of optima found over these functions, which can be supported by the total number of symbols “+”, “=” and “” in the last three rows of Table 8, in which the number of symbol “” pertaining to different algorithms compared is much less than that of symbols “+” and “=”.
Fn.  LIPS  NSDE  NCDE  FERPSO  WSA  WSAIC  
F1  2.65E+01  +  3.14E+00  +  6.53E+00  +  2.98E+01  +  8.71E+01  +  0.00E+00 
(2.01E+01)  (1.08E+01)  (1.37E+01)  (2.73E+01)  (4.39E+01)  (0.00E+00)  
F2  0.00E+00  =  1.17E10  =  6.86E03  =  2.26E13  =  9.32E+00  +  4.88E16 
(0.00E+00)  (5.07E10)  (1.52E02)  (7.85E13)  (1.97E+01)  (2.11E15)  
F3  0.00E+00  =  3.90E15  =  1.60E11  =  8.41E13  =  1.57E01  =  3.42E16 
(0.00E+00)  (9.78E15)  (5.53E11)  (3.44E12)  (3.64E01)  (5.28E16)  
F4  7.86E02  +  0.00E+00  =  1.89E14  =  1.71E01  +  1.52E+00  +  0.00E+00 
(6.68E02)  (0.00E+00)  (3.31E14)  (1.33E01)  (5.64E01)  (0.00E+00)  
F5  0.00E+00  =  5.57E16  =  3.88E06  =  3.74E13  =  6.69E15  =  1.52E16 
(0.00E+00)  (3.94E15)  (3.38E06)  (8.30E13)  (1.83E14)  (3.89E16)  
F6  0.00E+00  =  1.11E15  =  1.53E14  =  1.19E13  =  5.84E01  +  2.37E15 
(0.00E+00)  (7.88E15)  (3.21E14)  (1.72E13)  (2.42E+00)  (5.39E15)  
F7  5.58E07  =  5.58E07  =  5.65E07  =  6.40E02  =  2.41E+00  +  5.58E07 
(8.76E15)  (0.00E+00)  (2.75E08)  (4.53E01)  (2.95E+00)  (0.00E+00)  
F8  0.00E+00  =  7.73E09  =  1.05E05  =  2.02E11  =  5.38E01  +  2.09E08 
(0.00E+00)  (5.47E08)  (6.30E06)  (1.27E10)  (1.13E+00)  (6.67E08)  
F9  1.78E13  =  1.53E+00  +  0.00E+00  =  4.81E01  +  2.02E10  =  3.77E14 
(1.15E13)  (0.00E+00)  (0.00E+00)  (7.12E01)  (5.69E10)  (2.92E14)  
F10  3.00E+01  3.00E+01  3.00E+01  9.84E+03  +  1.07E+04  +  3.82E+01  
(2.22E12)  (0.00E+00)  (5.24E05)  (6.32E+03)  (5.89E+03)  (1.34E+01)  
F11  4.61E03  =  2.04E01  +  3.00E03  =  3.78E01  +  3.44E01  +  3.78E02 
(1.44E02)  (1.07E01)  (1.02E02)  (3.52E01)  (7.03E01)  (8.59E02)  
F12  5.61E+01  +  4.69E02  8.83E02  3.27E+01  +  3.88E+02  +  4.41E+00  
(1.47E+02)  (1.86E02)  (1.07E01)  (8.17E+01)  (2.71E+02)  (2.47E+01)  
F13  9.58E+00  4.15E13  0.00E+00  6.01E+01  +  3.91E+02  +  2.28E+01  
(4.74E+01)  (1.47E13)  (0.00E+00)  (1.12E+02)  (8.24E+01)  (6.98E+01)  
F14  2.12E+02  +  1.12E+02  +  8.02E+01  +  2.13E+02  +  6.20E+02  +  5.91E+01 
(2.15E+02)  (7.36E+01)  (7.09E+00)  (1.41E+02)  (1.10E+02)  (4.16E+01)  
F15  2.54E13  =  4.99E13  =  0.00E+00  =  5.92E+01  +  2.91E+02  +  6.24E14 
(1.32E13)  (1.86E13)  (0.00E+00)  (9.71E+01)  (4.95E+01)  (2.92E13)  
F16  7.58E14  =  2.63E13  =  1.74E13  =  1.28E01  =  1.11E02  =  4.83E04 
(1.07E13)  (8.27E14)  (9.64E14)  (2.82E01)  (1.23E02)  (1.95E03)  
F17  2.11E+01  +  2.43E03  2.11E+01  +  2.00E+01  =  2.00E+01  =  2.00E+01  
(3.18E02)  (8.54E03)  (6.69E02)  (1.78E02)  (2.91E04)  (3.41E03)  
F18  1.52E+00  6.44E01  4.69E01  7.37E+07  +  1.72E+00  1.77E+02  
(4.05E+00)  (2.42E+00)  (1.28E+00)  (9.80E+07)  (1.97E+00)  (1.22E+03)  
F19  1.28E+02  +  1.01E12  3.65E+02  +  2.55E+03  +  5.69E+03  +  1.50E+00  
(2.06E+01)  (1.45E13)  (1.58E+02)  (2.61E+03)  (1.44E+03)  (1.06E+01)  
F20  3.05E+01  1.83E+00  4.43E+01  +  3.72E+01  4.44E+01  +  4.36E+01  
(2.38E+00)  (1.02E+00)  (1.75E+00)  (1.66E+00)  (7.13E01)  (7.79E01)  
+  6  4  5  11  14  
=  10  9  11  8  5  
4  7  4  1  1 
What’s more, the box plot of mean fitness values of optima found per run over 51 runs, by WSAIC, LIPS, NSDE, NCDE and FERPSO, is shown in Fig. 9. Since the quality of optima found by WSA are worse than other algorithms over most of these functions as shown in table 8, the blox plot of WSA are ignored, so as to ensure the obvious differences of other algorithms in terms of the distribution of optima found. It can be seen from Fig. 9
, the dispersion degree of mean fitness values of optima found by WSAIC are quite small on most of the test functions with respect to other algorithms compared. And WSAIC only has outliers on F11, F12, F13, F14 and F20, while most of other algorithms have more outliers over these test functions. Therefore, it can be concluded that WSAIC has good stability on the accuracy of optima found over these test functions, with respect to other algorithms compared. The better performance of WSAIC in terms of the quality of optima found is also due to the improvement on the location update rule of WSA, i.e., a whale moves to a new position under the guidance of its “better and nearest” whale if this new position is better than its original position, which can ensure the ability of local exploitation. For example, when some whales follow a same extreme point, the best whale among these whales will stay where it is with great probability to guide other whales to converge to the extreme point followed by them. Besides, the method of identifying and jumping out of the located extreme points during iterations can improve the global search ability as far as possible to find the global optima. For example, if some whales converge to a solution that near to a global optimum, with this method some other whales that have already reached steady state will be reinitialized, and they may move to the positions that near to those convergent whales, which will accelerate these whales to converge to the global optimum.
5.3 Convergence rate
From the previous two sections, it can be seen that the proposed WSAIC has better and more consistent performance than other algorithms in terms of both the quantity of optima found and the quality of optima found on most test functions. To demonstrate the efficiency of WSAIC on locating the global optima, WSAIC is compared with other algorithms excepting FERPSO and WSA (because the population of FERPSO and WSA may prematurely converge to a solution or several solutions with same fitness value and quit iteration) in terms of convergence rate in this section. Six functions (i.e., F1, F4, F9, F14, F18 and F19, wherein F9 has no local optima while others all come with local optima) are used to test these algorithms. The convergence curves of all the algorithms on these test functions are depicted in Fig. 10, in which the abscissa values represent the number of function evaluations and the ordinate values denote the mean of fitness values of the current global optima over 51 runs. It can be seen from Fig. 10(c), for function F9 without local optima, NSDE cannot converge to the global optima, and WSAIC converge to the global optima with much faster rate than that of LIPS and NCDE. Although NCDE can converge to the global optima of F9, it gets a much lower ANOF on F9 than that gained by WSAIC as shown in Table 7. What’s more, for functions F1, F4, F14, F18 and F19 that have multiple local optima, WSAIC can achieve the global optima with satisfied convergence rate on F4 and F18 as shown in Fig. 10(b) and Fig. 10(e). For F19, WSAIC only performs a little worse than NSDE and far better than other algorithms, as shown in Fig. 10(f). And WSAIC can achieve better solutions with faster convergence rate than other algorithms on F1 and F14, as shown in Fig. 10(a) and Fig. 10(d). Therefore, it can be concluded that the proposed WSAIC has excellent performance on convergence rate with respect to other algorithms.
5.4 The effect of different parameter values
As mentioned in section 3.4, the parameters and two constants, and always set to 2 and 0 respectively. For almost all the problems, especially those problems without prior knowledge, can be set to . Thus, only the parameter stability threshold may need to be specified different values for different problems. This section presents the results of ANOF obtained by WSAIC on all these test functions with different values, as shown in Table 9. And a clear visual comparison of ANOF obtained by WSAIC with different values is shown in Fig. 11, where the values of ANOF with different values on each test function are normalized, and 1 refers to the best ANOF value while 0 refers to the worst ANOF value. It can be seen from Table 9 and Fig. 11, WSAIC can achieve the best ANOF values on most test functions with =100. Therefore, the parameter can be set to 100 for almost all the continuous optimization problems.
Fn.  =20  =40  =60  =80  =100  =120  =140  =160  =180  =200 
F1  1  1  1  1  1  1  1  1  1  1 
F2  32  32  32  32  32  32  32  32  32  32 
F3  477.22  588.67  622.35  624.96  625  625  625  625  625  624.98 
F4  0.76  0.84  0.88  1  1  1  1  1  1  1 
F5  125  125  125  125  125  124.98  125  125  125  125 
F6  16  16  16  16  16  16  16  16  16  16 
F7  8  8  8  8  8  8  8  8  8  7.98 
F8  215.98  215.98  215.98  216  216  215.92  215.96  215.94  215.94  215.80 
F9  5.86  4.94  4.51  4.12  4.53  4.10  4.43  4.20  4.06  4.04 
F10  0  0  0  0  0  0  0  0  0  0 
F11  0.82  0.80  0.80  0.76  0.82  0.75  0.67  0.67  0.63  0.67 
F12  0.02  0.02  0.04  0.02  0.04  0  0.02  0  0  0 
F13  1  0.98  0.90  0.82  0.90  0.86  0.82  0.76  0.82  0.69 
F14  0  0  0  0  0  0  0  0  0  0 
F15  0.98  1  0.73  0.94  1  0.84  0.90  0.76  0.82  0.82 
F16  0.84  0.76  0.92  0.88  0.94  0.88  0.73  0.92  0.90  0.88 
F17  0  0  0  0  0  0  0  0  0  0 
F18  0.82  0.86  0.86  0.84  0.88  0.86  0.86  0.73  0.71  0.72 
F19  1  0.98  0.98  0.98  0.98  0.92  0.82  0.88  0.84  0.92 
F20  0  0  0  0  0  0  0  0  0  0 
6 Conclusions and future research
A new multimodal optimizer named Whale Swarm Algorithm with Iterative Counter (WSAIC), based on our preliminary work in Zeng2017 , is proposed in this paper. Firstly, WSAIC improves the iteration rule of the original WSA when attenuation coefficient is set to 0, i.e., a whale moves to a new position under the guidance of its “better and nearest” whale if this new position is better than its original position. As a results, WSAIC removes the need of specifying different values of for different problems to form multiple subpopulations, without introducing any niching parameter. And the ability of local exploitation is also ensured. What’s more, WSAIC enables the identification of extreme point and enables jumping out of the located extreme points during iterations, relying on two new parameters, i.e., stability threshold and fitness threshold . If a whale did not find a better position after successive iterations, it is considered to have already located an extreme point and is to be reinitialized, so as to eliminate the unnecessary function evaluations and improve the global search ability. If the difference between the fitness value of the located extreme point and (the fitness value of the best one among the current global optima) is less than , the located extreme point is considered a current global optimum. And the values of and are very easy to set for different problems. Moreover, the convergence of WSAIC is proved. The experimental results clearly show that WSAIC performs statistically better than other niching metaheuristic algorithms over most test functions on comprehensive metrics.
The main contributions of this paper are summarized into four aspects as follows.

WSAIC removes the need of specifying optimal niching parameter for different problems, which increases the practicality.

WSAIC can efficiently identify and jump out of the located extreme points during iterations, so as to locate as more global optima as possible in a single run, which further increases the practicality.

The algorithmspecific parameters of WSAIC are easy to be assigned for different problems, which also increases the practicality.

The population size of WSAIC does not need to exactly suit the number of optima of the optimization problem. Generally, WSAIC can keep a relative small population size, which contributes significantly to reducing the computation complexity.
In the future, we will focus on the following aspects.

Introducing other metaheuristic algorithms or heuristic algorithms for the best whale in each iteration to execute the neighborhood search process, so as to further improve the local search ability and the quality of optima.

Designing some new methods to escape from the located extreme points instead of random reinitialization, to make the population spread over the entire solution space as far as possible.
References
 (1) Tasgetiren MF, Kizilay D, Pan QK, Suganthan PN (2017) Iterated greedy algorithms for the blocking flowshop scheduling problem with makespan criterion. Comput Oper Res 77:111126
 (2) Lin G, Zhu W, Ali MM (2016) An effective hybrid memetic algorithm for the minimum weight dominating set problem. IEEE T Evolut Comput 20(6):892907

(3)
Ciancio C, Ambrogio G, Gagliardi F, Musmanno R (2016) Heuristic techniques to optimize neural network architecture in manufacturing applications. Neural Comput Appl 27(7):20012015
 (4) Şevkli AZ, Güler B (2017) A multiphase oscillated variable neighbourhood search algorithm for a realworld open vehicle routing problem. Appl Soft Comput 58:128144
 (5) Yi J, Li X, Chu CH, Gao L (2016) Parallel chaotic local search enhanced harmony search algorithm for engineering design optimization. J Intell Manuf :124
 (6) Raja MAZ, Ahmed U, Zameer A, Kiani AK, Chaudhary NI (2017) Bioinspired heuristics hybrid with sequential quadratic programming and interiorpoint methods for reliable treatment of economic load dispatch problem. Neural Comput Appl. doi:10.1007/s0052101730193
 (7) Li X (2010) Niching without niching parameters: particle swarm optimization using a ring topology. IEEE T Evolut Comput 14(1):150169
 (8) De Jong KA (1975) Analysis of the behavior of a class of genetic adaptive systems
 (9) Goldberg DE, Richardson J (1987) Genetic algorithms with sharing for multimodal function optimization. In: Genetic algorithms and their applications: Proceedings of the Second International Conference on Genetic Algorithms. Hillsdale, NJ: Lawrence Erlbaum, pp 4149

(10)
Yin X, Germay N (1993) A fast genetic algorithm with sharing scheme using cluster analysis methods in multimodal function optimization. In: Artificial neural nets and genetic algorithms. Springer, pp 450457
 (11) Harik GR (1995) Finding Multimodal Solutions Using Restricted Tournament Selection. In: ICGA. pp 2431
 (12) Bessaou M, P trowski A, Siarry P (2000) Island model cooperating with speciation for multimodal optimization. In: International Conference on Parallel Problem Solving from Nature. Springer, pp 437446
 (13) Deb K, Goldberg DE (1989) An investigation of niche and species formation in genetic function optimization. In: Proceedings of the 3rd international conference on genetic algorithms. Morgan Kaufmann Publishers Inc., pp 4250

(14)
Kennedy J, Mendes R (2002) Population structure and particle swarm performance. In: Evolutionary Computation, 2002. CEC’02. Proceedings of the 2002 Congress on. IEEE, pp 16711676
 (15) Li X, Epitropakis M, Deb K, Engelbrecht A (2016) Seeking multiple solutions: an updated survey on niching methods and their applications. IEEE T Evolut Comput 21(4): 518538
 (16) Thomsen R (2004) Multimodal optimization using crowdingbased differential evolution. In: Evolutionary Computation, 2004. CEC2004. Congress on. IEEE, pp 13821389
 (17) Mahfoud SW (1992) Crowding and preselection revisited. Urbana 51:61801
 (18) Mengshoel OJ, Goldberg DE (1999) Probabilistic crowding: Deterministic crowding with probabilistic replacement. In: Proc. of the Genetic and Evolutionary Computation Conference (GECCO99). p 409
 (19) Ursem RK (1999) Multinational evolutionary algorithms. In: Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on. IEEE, pp 16331640
 (20) Stoean CL, Preuss M, Stoean R, Dumitrescu D (2007) Disburdening the species conservation evolutionary algorithm of arguing with radii. In: Proceedings of the 9th annual conference on Genetic and evolutionary computation. ACM, pp 14201427
 (21) Zeng B, Gao L, Li X (2017) Whale Swarm Algorithm for Function Optimization. In: Huang DS, Bevilacqua V, Premaratne P, Gupta P (eds) Intelligent Computing Theories and Application: 13th International Conference, ICIC 2017, Liverpool, UK, August 710, 2017, Proceedings, Part I. Springer International Publishing, Cham, pp 624639. doi:10.1007/9783319633091_55
 (22) Das S, Maity S, Qu BY, Suganthan PN (2011) Realparameter evolutionary multimodal optimization A survey of the stateoftheart. Swarm Evol Compu 1(2):7188
 (23) Li JP, Balazs ME, Parks GT, Clarkson PJ (2002) A species conserving genetic algorithm for multimodal function optimization. Evol Comput 10(3):207234
 (24) Li X (2005) Efficient differential evolution using speciation for multimodal function optimization. In: Proceedings of the 7th annual conference on Genetic and evolutionary computation. ACM, pp 873880
 (25) Li X (2004) Adaptively choosing neighbourhood bests using species in a particle swarm optimizer for multimodal function optimization. In: Genetic and Evolutionary Computation CGECCO 2004. Springer, pp 105116
 (26) Beasley D, Bull DR, Martin RR (1993) A sequential niche technique for multimodal function optimization. Evol Comput 1(2):101125
 (27) Brits R, Engelbrecht AP, Van den Bergh F (2002) A niching particle swarm optimizer. In: Proceedings of the 4th AsiaPacific conference on simulated evolution and learning. Singapore: Orchid Country Club, pp 692696
 (28) Stoean C, Preuss M, Stoean R, Dumitrescu D (2010) Multimodal optimization by means of a topological species conservation algorithm. IEEE T Evolut Comput 14(6):842864
 (29) Deb K, Saha A (2010) Finding multiple solutions for multimodal optimization problems using a multiobjective evolutionary approach. In: Proceedings of the 12th annual conference on genetic and evolutionary computation. ACM, pp 447454
 (30) Li L, Tang K (2015) Historybased topological speciation for multimodal optimization. IEEE T Evolut Comput 19(1):136150
 (31) Liang JJ, Qin AK, Suganthan PN, Baskar S (2006) Comprehensive learning particle swarm optimizer for global optimization of multimodal functions. IEEE T Evolut Comput 10(3):281295
 (32) Li X (2007) A multimodal particle swarm optimizer based on fitness Euclideandistance ratio. In: Proceedings of the 9th annual conference on Genetic and evolutionary computation. ACM, pp 7885
 (33) Qu BY, Suganthan PN, Liang JJ (2012) Differential evolution with neighborhood mutation for multimodal optimization. IEEE T Evolut Comput 16(5):601614
 (34) Qu BY, Suganthan P, Das S (2013) A distancebased locally informed particle swarm model for multimodal optimization. IEEE T Evolut Comput 17(3):387402
 (35) Wang Y, Li HX, Yen GG, Song W (2015) MOMMOP: Multiobjective optimization for locating multiple optimal solutions of multimodal optimization problems. IEEE T Cybernetics 45(4):830843
 (36) http://www.ntu.edu.sg/home/EPNSugan/index_files/CEC2015/CEC2015.htm