I Introduction
Ant Colony Optimization (ACO) is a population-based metaheuristic optimization algorithm inspired by the foraging behaviour of ants. It was initially proposed for solving discrete optimization problems and was later extended to continuous optimization problems. In the present study, we examine the strengths and weaknesses of the classical ACO algorithm for function optimization. In this paper, we follow the improvised version of Ant Colony Optimization and refer to it by the abbreviation ACO. The performance of the classical ACO depends fundamentally on the mechanism used to construct new solutions on a variable-by-variable basis, where an n-dimensional continuous optimization problem has n variables. Therefore, the key to the success of ACO lies in its construction of new solutions. To construct a new solution, a variable from a solution in the available solution archive is selected for Gaussian sampling in order to obtain a new variable. The newly obtained variables together form a new solution. The other parameter influencing the performance of ACO is the distance metric, which is used to compute the average distance between a variable of a selected solution and the corresponding variables of the other solutions in the available solution set (the solution archive). In the distance computation, the pheromone evaporation rate plays an important role. In the present study, we analyse the impact of these parameters on the performance of ACO. We also analyse and compare the performance of the improvised ACO with other classical metaheuristic algorithms.
II Continuous Ant Colony Optimization (ACO)
The foraging behaviour of ants inspired the formation of a computational optimization technique popularly known as Ant Colony Optimization. Deneubourg et al. [1] illustrated that, while searching for food, ants initially explore the area around their nest (colony) at random. The ants secrete a chemical substance known as pheromone on the ground while searching for food. The secreted pheromone becomes the means of communication between the ants. The quantity of pheromone secreted may depend on the quantity of the food source found by the ants. After a successful search, ants return to their nest with a food sample. The pheromone trail left by the returning ants guides the other ants to the food source. In their well-known double bridge experiment, Deneubourg et al. demonstrated that ants prefer the shortest of the paths available between a nest and a food source. In the early 1990s, M. Dorigo and his team [2, 3] proposed the Ant Colony Optimization (ACO) algorithm inspired by this foraging behaviour. Initially, ACO was limited to discrete optimization problems [2, 4, 5]. Later, it was extended to continuous optimization problems [6]. Blum and Socha [7, 8] proposed the continuous version of ACO for the training of neural networks (NN). Continuous ACO is a population-based metaheuristic algorithm that iteratively constructs solutions. A complete sketch of ACO is outlined in Figure 2. Basically, ACO has three phases, namely pheromone representation, ant-based solution construction and pheromone update.
II-1 Pheromone Representation
The success of the ACO algorithm lies in its representation of artificial pheromone. The whole exercise of ACO is devoted to maintaining its artificial pheromone. The artificial pheromone represents a solution archive of the target problem. Socha and Dorigo [6] illustrated a typical representation of the solution archive, given in Figure 1. The solution archive shown in Figure 1 contains k solutions, each of which has n decision variables.
In the case of an n-dimensional benchmark optimization problem, the variables in a solution are the variables of the optimization problem. In the case of neural network (NN) training, on the other hand, a phenotype-to-genotype mapping is employed in order to represent the NN as a vector of synaptic weights [9, 10]. Therefore, a solution in the archive represents a vector of synaptic weights or a solution vector. A solution vector is initialized using random values chosen from a search space defined as . In the case of NN, is defined as and is defined as , where is set to 0. In the discrete version of ACO, a discrete probability mass function is used, whereas in its continuous version, a continuous probability density function is derived from the pheromone table. The probability density function is used for the construction of m new solutions. These m new solutions are appended to the initial k solutions, and then the m worst solutions are removed from the resulting k + m solutions. Thus, the size of the solution archive is maintained at k.
II-2 Ant Based Solutions Construction
A new solution is constructed on a variable-by-variable basis. The first step in the construction of a new solution is to choose a solution from the solution archive based on its probability of selection. The probability of selection is assigned to the solutions in the archive using (1) or (2). The probability of selection given in (1) is based on the rank of the solution in the archive, whereas the probability of selection given in (2) is based on the fitness value of the solution. For the construction of the i-th (i ∈ [1, n]) variable of the l-th (index into the new solution set, i.e., l ∈ [1, m]) solution, the j-th (j ∈ [1, k]) solution from the solution archive is chosen based on its probability of selection as per (1) or (2). Several selection strategies may be adopted for selecting a solution from the solution archive. The selection may rely on a probability assigned from the fitness function value or from the weight assigned to the solution according to its rank. Hence, the probability of choosing the j-th solution from the solution archive may be given as
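$$p_j = \frac{w_j}{\sum_{r=1}^{k} w_r} \qquad (1)$$
or
$$p_j = \frac{f(s_j)}{\sum_{r=1}^{k} f(s_r)} \qquad (2)$$
where $w_j$ is a weight associated with the j-th solution, computed as
$$w_j = \frac{1}{qk\sqrt{2\pi}} \exp\left(-\frac{(\mathrm{rank}(j) - 1)^2}{2q^2k^2}\right) \qquad (3)$$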
where q is a parameter of the algorithm. Since in (2) the smallest function value gets the lowest probability, further processing is required in order to assign the highest probability to the smallest function value. The mean of the Gaussian function in (3) is set to 1, so that the best solution acquires the maximum weight. The term f(s_j) indicates the function value of the j-th solution. In the case of optimization problems, the computation of the function value is straightforward, whereas in the case of neural network training, the fitness of a solution is assigned using the Root Mean Square Error (RMSE) induced on the NN for the given input training patterns (a given training dataset) [11]. The benchmark functions (including the RMSE computation on the benchmark datasets) are minimization problems; therefore, the lower the function value, the higher the rank of the solution in the solution archive. A detailed discussion of the selection methods is offered in Section III-A. Once the j-th solution is picked, the second step is to perform Gaussian sampling. A Gaussian distribution is given as
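$$g(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \qquad (4)$$
where $\mu$ is $s_j^i$, the i-th variable of the selected solution, and $\sigma$ is derived from the average distance between the i-th variable of the selected solution and the i-th variable of all the other solutions in the archive. The various distance metrics adopted for computing the distance are discussed comprehensively in Section III-B. The Gaussian sampling parameter $\sigma$ may be expressed as
$$\sigma = \xi D \qquad (5)$$
where the constant $\xi > 0$ is a parameter of the algorithm, known as the pheromone evaporation rate (learning rate), and $D$ is the computed average distance between the selected solution and all the other solutions in the archive. For example, with the Manhattan distance, $D$ may be given as
$$D = \sum_{r=1}^{k} \frac{|s_r^i - s_j^i|}{k - 1} \qquad (6)$$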
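The two construction steps described above (choosing a guiding solution, then Gaussian sampling around one of its variables) can be sketched in Python as follows. This is a minimal sketch, not the authors' Java implementation: it assumes the rank-based Gaussian weight of Socha and Dorigo [6], a Manhattan-style average distance, and an archive that is already sorted best first. The names `rank_weights` and `construct_solution` and all parameter values are illustrative assumptions.

```python
import math
import random


def rank_weights(k, q):
    """Gaussian weights over ranks 1..k with mean 1 and standard deviation q*k."""
    sd = q * k
    return [
        math.exp(-((rank - 1) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))
        for rank in range(1, k + 1)
    ]


def construct_solution(archive, q, xi, rng):
    """Build one new solution variable by variable.

    `archive` is a list of k >= 2 solutions sorted best first (rank 1 at index 0).
    """
    k, n = len(archive), len(archive[0])
    weights = rank_weights(k, q)
    new_solution = []
    for i in range(n):
        # Step 1: pick a guiding solution j with probability w_j / sum(w).
        j = rng.choices(range(k), weights=weights)[0]
        mu = archive[j][i]
        # Step 2: average absolute distance of variable i between solution j
        # and the other k - 1 solutions, scaled by the evaporation rate xi.
        dist = sum(abs(archive[r][i] - mu) for r in range(k)) / (k - 1)
        new_solution.append(rng.gauss(mu, xi * dist))
    return new_solution
```

A small q concentrates the weights on the best-ranked solutions, while a larger q spreads the choice more evenly over the archive.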
II-3 Pheromone Update
In the final phase of ACO, the m newly constructed solutions are appended to the initial k solutions. The k + m solutions are then ordered by fitness in ascending order. In the subsequent step, the m worst solutions are removed from the k + m solutions. Thus, the size of the solution archive is maintained at k. The complete discussion of ACO is summed up in the algorithm given in Figure 2.
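The archive update described above amounts to a merge, a sort and a truncation. The following Python lines are a minimal sketch under the assumption of a minimization problem; `update_archive` and `objective` are hypothetical names used only for illustration.

```python
def update_archive(archive, new_solutions, objective):
    """Append the new solutions, sort by objective value, keep the best k."""
    k = len(archive)
    merged = archive + new_solutions
    merged.sort(key=objective)  # ascending order: lowest function value first
    return merged[:k]
```

For example, with the sphere function as `objective`, an archive `[[3.0], [1.0]]` and new solutions `[[0.5], [4.0]]`, the two retained solutions are `[0.5]` and `[1.0]`.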
III Performance Evaluation
The ACO algorithm outlined in Figure 2 is implemented in the Java programming language. The performance of the ACO algorithm is observed against the tuning of its underlying parameters. The parameters taken into consideration for the performance analysis are (i) the selection strategy used to select a solution for Gaussian sampling, (ii) the distance metric and (iii) the pheromone evaporation rate. We test the algorithm on the benchmark functions listed in Table I. The expressions of the benchmark functions are as follows:
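$$f_{\text{Ackley}}(x) = -20 \exp\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\left(\frac{1}{n}\sum_{i=1}^{n} \cos(2\pi x_i)\right) + 20 + e \qquad (7)$$
$$f_{\text{Sphere}}(x) = \sum_{i=1}^{n} x_i^2 \qquad (8)$$
$$f_{\text{SumSquare}}(x) = \sum_{i=1}^{n} i\, x_i^2 \qquad (9)$$
$$f_{\text{Rosenbrock}}(x) = \sum_{i=1}^{n-1} \left[100\,(x_{i+1} - x_i^2)^2 + (x_i - 1)^2\right] \qquad (10)$$
$$f_{\text{Rastrigin}}(x) = 10n + \sum_{i=1}^{n} \left[x_i^2 - 10\cos(2\pi x_i)\right] \qquad (11)$$
$$f_{\text{Griewank}}(x) = 1 + \frac{1}{4000}\sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n} \cos\left(\frac{x_i}{\sqrt{i}}\right) \qquad (12)$$
$$f_{\text{Zakharov}}(x) = \sum_{i=1}^{n} x_i^2 + \left(\sum_{i=1}^{n} 0.5\, i\, x_i\right)^2 + \left(\sum_{i=1}^{n} 0.5\, i\, x_i\right)^4 \qquad (13)$$
$$f_{\text{DixonPrice}}(x) = (x_1 - 1)^2 + \sum_{i=2}^{n} i\,(2x_i^2 - x_{i-1})^2 \qquad (14)$$
and
$$f_{\text{RMSE}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^2} \qquad (15)$$
where $e_i$ in (15) is the difference between the target value and the predicted value for the i-th of the N patterns of a training dataset.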
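Table I. Benchmark functions.

| Function | Name | Expression | Dim. | Range | Optimum |
|---|---|---|---|---|---|
| F1 | Ackley | as per (7) | | [-15, 30] | 0.0 |
| F2 | Sphere | as per (8) | | [-50, 100] | 0.0 |
| F3 | Sum Square | as per (9) | | [-10, 10] | 0.0 |
| F4 | Dixon & Price | as per (14) | | [-10, 10] | 0.0 |
| F5 | Rosenbrock | as per (10) | | [-5, 10] | 0.0 |
| F6 | Rastrigin | as per (11) | | [-5.12, 5.12] | 0.0 |
| F7 | Griewank | as per (12) | | [-600, 600] | 0.0 |
| F8 | Zakharov | as per (13) | | [-10, 10] | 0.0 |
| F9 | abalone (RMSE) | as per (15) | 90 | [-1.5, 1.5] | 0.0 |
| F10 | baseball (RMSE) | as per (15) | 170 | [-1.5, 1.5] | 0.0 |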
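For readers who wish to reproduce the benchmark, the functions named in Table I can be sketched in Python as below. These are the standard textbook definitions, which may differ in minor details (scaling or shifts) from the forms used in the authors' implementation; the function names are illustrative.

```python
import math


def sphere(x):
    return sum(v * v for v in x)


def sum_squares(x):
    return sum(i * v * v for i, v in enumerate(x, start=1))


def rosenbrock(x):
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))


def rastrigin(x):
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


def ackley(x):
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e


def griewank(x):
    total = sum(v * v for v in x) / 4000.0
    prod = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return 1.0 + total - prod


def zakharov(x):
    s = sum(0.5 * i * v for i, v in enumerate(x, start=1))
    return sum(v * v for v in x) + s ** 2 + s ** 4


def dixon_price(x):
    # x[i - 1] is the i-th variable (1-based), x[i - 2] the one before it.
    return (x[0] - 1.0) ** 2 + sum(i * (2.0 * x[i - 1] ** 2 - x[i - 2]) ** 2
                                   for i in range(2, len(x) + 1))


def rmse(targets, predictions):
    """Root mean square error between target and predicted values."""
    errors = [t - p for t, p in zip(targets, predictions)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

All eight analytic functions have a global minimum value of 0, attained at the origin for Sphere, Sum Square, Rastrigin, Ackley, Griewank and Zakharov, and at the all-ones vector for Rosenbrock.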
III-A Selection Method
The selection of a solution is critical to the performance of the ACO algorithm provided in Figure 2. We analyse how the selection strategy influences the performance of ACO. Several selection strategies may be adopted for selecting solutions whose probability of selection is assigned as per (1) (named Weight) or (2) (named FitVal). We have the Roulette Wheel Selection (RWS), Stochastic Universal Sampling (SUS) and Bernoulli Heterogeneous Selection (BHS) strategies at hand. Each of these three strategies may be used to select an individual whose probability of selection is assigned using either of the assignment methods in (1) and (2). Therefore, six different strategies are available, namely RWS(FitVal), RWS(Weight), SUS(FitVal), SUS(Weight), BHS(FitVal) and BHS(Weight).
In Roulette Wheel Selection, each individual occupies a segment of a wheel. The size of the segment (slice) occupied by an individual is proportional to its fitness. Each time an individual is to be selected, a random number is generated and tested against the roulette wheel to determine the segment in which it falls. The individual corresponding to that segment is selected. The process is repeated until the desired number of individuals is obtained.
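A single spin of the roulette wheel can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: `roulette_wheel_select` is a hypothetical name, and the input is assumed to be a list of non-negative selection probabilities (or unnormalized segment sizes) in archive order.

```python
import random
from itertools import accumulate


def roulette_wheel_select(probabilities, rng):
    """Return the index whose wheel segment contains one random spin."""
    cumulative = list(accumulate(probabilities))
    spin = rng.random() * cumulative[-1]  # scale so unnormalized input also works
    for index, upper in enumerate(cumulative):
        if spin < upper:
            return index
    return len(probabilities) - 1  # guard against rounding at the upper edge
```

Calling the function repeatedly, once per required individual, gives the independent draws that distinguish RWS from SUS.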
Unlike roulette wheel selection, stochastic universal sampling uses a single random value to sample all of the solutions by choosing them at evenly spaced intervals. Initially, the number of candidates to be selected must be fixed. Let N be the number of solutions to be selected. For the selection of the first candidate, the random number r_1 is generated in [0, 1/N]. For the other candidates, the random number r_i = r_1 + (i - 1)/N is obtained, where i ∈ {2, ..., N} indicates the individual to be selected.
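The evenly spaced pointers of SUS can be sketched in Python as below. The sketch assumes the standard formulation, with one random offset in [0, 1/N) followed by N pointers spaced 1/N apart over the cumulative selection probabilities; `stochastic_universal_sampling` is an illustrative name.

```python
import random


def stochastic_universal_sampling(probabilities, count, rng):
    """Select `count` indices using one random offset and equal spacing."""
    total = sum(probabilities)
    step = total / count
    start = rng.random() * step  # the only random number that is drawn
    selected = []
    index = 0
    cumulative = probabilities[0]
    for i in range(count):
        pointer = start + i * step
        # Advance to the segment that contains the current pointer.
        while pointer >= cumulative and index < len(probabilities) - 1:
            index += 1
            cumulative += probabilities[index]
        selected.append(index)
    return selected
```

Because the pointers are equally spaced, an individual with selection probability p is picked either floor(N p) or ceil(N p) times, which spreads the selections over the whole archive.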
The Bernoulli Heterogeneous Selection (BHS), which depends on the Bernoulli distribution [12], may be described as follows. Consider independent variables representing the function values of the solutions in the archive. To select an individual, the BHS acts as follows. The Bernoulli distribution [12] is a discrete distribution having two possible outcomes labelled by n = 0 and n = 1, in which n = 1 ("success") occurs with probability p and n = 0 ("failure") occurs with probability 1 - p, where 0 < p < 1. Therefore, it has the probability function
$$P(n) = p^n (1 - p)^{1 - n}, \quad n \in \{0, 1\} \qquad (16)$$
Accordingly, the outline of the Bernoulli Heterogeneous Selection algorithm is proposed in Figure 3.
The parameter setting for the performance evaluation of ACO based on the selection strategy is as follows: the solution archive , , , , 1000 iterations, and the distance metric chosen is D2 (Manhattan). The experimental results of the various selection strategies are provided in Table II, where the values are the means over the functions F1 to F10 listed in Table I, where each function F has and its value is computed as an average over 20 trials. In other words, each function value is the average over 20 distinct trials/instances, and the final value given in Table II is the average over all the functions.
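Table II. Mean function value by selection method (columns) and selection probability assignment (rows).

| Selection probability assignment | RWS | SUS | BHS | Rank 1 |
|---|---|---|---|---|
| function fitness value based | 28.202 | 101.252 | 34.931 | 41.725 |
| weight computed based on rank | 35.105 | 95.216 | 36.503 | |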
Examining Table II, it may be observed that the RWS selection strategy outperforms the other selection strategies. RWS with the probability of selection computed from the function value yields the best result among the mentioned selection strategies. The performance of the BHS selection strategy is competitive with that of RWS. The results in the table indicate that SUS performs worst among all the mentioned selection methods. From Table II, it is interesting to note that the probability of selection based on the function value performs better than its rank-weight counterpart, the exception being the SUS case.
To investigate the differences in the performance of the selection strategies indicated above, three hundred selections made by each of the selection methods are plotted in Figure 5. As per the algorithm outlined in Figure 2, to construct a single solution variable by variable, we need to select a solution from the solution archive of k individuals n times, where n is the number of variables in a solution. Hence, to construct m solutions, each having n variables, m × n selections are required. Figure 5 represents the construction of 10 new solutions from a solution archive of size 10, where each solution in the archive has 30 variables, representing the dimension of function F1. Therefore, 10 × 30 = 300 selections are made in one iteration. Figure 5 illustrates the mapping of the selections made in a single iteration of the ACO algorithm used for the optimization of function F1. In Figure 5, ten concentric circles represent the solutions in the solution archive. The center of the circles (marked 0) indicates the solution with rank 1 (highest), while each subsequent outer concentric circle indicates the solution with the next rank. Therefore, the center indicates the best solution, whereas the outermost concentric circle indicates the worst solution. Hence, from the center to the outermost concentric circle, the circles of increasing diameter represent the successively lower-ranked solutions.
The six selection strategies, namely RWS(FitVal), RWS(Weight), SUS(FitVal), SUS(Weight), BHS(FitVal) and BHS(Weight), are represented by indigo, red, green, purple, blue and orange coloured lines, respectively. Examining Figures 4(a) and 5, the SUS(FitVal) and SUS(Weight) selection strategies select the solutions uniformly in the construction of each new individual. The distribution of the selections (picking a solution) is uniform (from best to worst) throughout the 300 selections, so the selection pattern repeats with a step of 30 (the dimension). It may also be observed that the coverage of the selections extends from the best to the worst solution. The results provided in Table II indicate the poor performance of the SUS selection strategy. Therefore, a uniform selection with wide coverage of ranks happens to be a poor selection strategy. In the case of RWS(FitVal) and RWS(Weight), the selection mapping provided in Figures 4(b) and 5 indicates a span of selection from rank 1 (center, marked 0) to rank 5 (outermost circle, marked 4) in the case of FitVal, and from rank 1 to rank 3 in the case of Weight. However, the selections are mostly distributed within ranks 1 to 4 in the case of FitVal and ranks 1 to 2 in the case of Weight. It is worth mentioning that, unlike the SUS case, in RWS the selections are not uniform throughout the 300 selections made for the construction of the 10 new solutions. The non-uniform selection, with coverage of an adequate range from the best to the worst solutions, helps the RWS selection strategy achieve better performance than its competitors. Similar to RWS, BHS selection also offers a non-uniform selection of individuals but, in contrast to RWS, its coverage of ranks is mostly concentrated on the fittest individuals in the archive. From Figures 4(c) and 5, it may be observed that the BHS selection is non-uniform but spans only up to rank 3 among the 10 individuals, whereas RWS spans up to rank 5.
From Figures 4(a), 4(b), 4(c) and 5, it may be observed that the probability of selection computed from the weights, indicated in purple (in Figure 4(a)), red (in Figure 4(b)) and orange (in Figure 4(c)), behaves similarly to the probability of selection computed from the function value, but it tends to prefer the best ranks. However, the results provided in Table II indicate that, in the case of SUS, preferring the better ranks offers a better result than giving preference to every individual in the archive.
III-B Distance Measure Metric
After the selection of a solution from the archive, another crucial operation in the ACO algorithm is the sampling of the selected solution. To sample the selected solution, the parameters μ and σ need to be computed. As discussed in Section II, the parameter μ is the variable of the solution selected for sampling, and the parameter σ is computed as per expression (5), where the distance (D) is computed using a distance metric listed in Table III. The distance computed between the selected solution (point) and the other solutions (points) in the solution archive is critical to the performance of the ACO algorithm. To compute the distance between two points, the Euclidean distance metric is usually used. In general, the Minkowski distance of order r is used for computing the distance between the points (x_1, x_2, ..., x_n) and (y_1, y_2, ..., y_n). The Euclidean distance is a special case of the Minkowski distance metric, where r = 2. The experimental results for all the distance metrics are given in Table III. The experimental setup is as follows: , , , and 1000 iterations. It may be noted that, for the purpose of experimentation, RWS selection with the probability of selection based on the function value is used. Examining Table III, it is found that the Squared Euclidean metric (D7 in Table III) performs better than all the other distance metrics. It may also be noted that the performance of ACO decreases as the order r of the Minkowski metric increases.
Table III. Mean function value for each distance measure metric.

| # | Metric Name | Mean Fun. Value |
|---|---|---|
| D1 | Minkowski (r = 0.5) | 28.792 |
| D2 | Manhattan (r = 1) | 33.203 |
| D3 | Euclidean (r = 2) | 44.578 |
| D4 | Minkowski (r = 3) | 45.211 |
| D5 | Minkowski (r = 4) | 51.909 |
| D6 | Minkowski (r = 5) | 53.702 |
| D7 | Squared Euclidean | 14.308 |
| D8 | Chebyshev | 93.642 |
| D9 | Bray Curtis | 98.983 |
| D10 | Canberra | 103.742 |
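The metrics named in Table III can be sketched in Python as distances between two points x and y of equal length. These are the standard definitions and are offered as an illustration only; how exactly each metric is averaged over the archive in the authors' implementation is not specified here, and the function names are assumptions.

```python
def minkowski(x, y, r):
    """Minkowski distance of order r (r = 1: Manhattan, r = 2: Euclidean)."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def bray_curtis(x, y):
    denominator = sum(abs(a + b) for a, b in zip(x, y))
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator if denominator else 0.0


def canberra(x, y):
    # Terms with a zero denominator (both coordinates zero) contribute nothing.
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)
```

Note that for r < 1 the Minkowski expression no longer satisfies the triangle inequality, and the Squared Euclidean value is not a metric in the strict sense either; both are still usable as spread measures for Gaussian sampling.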
III-C Evaporation Rate
The evaporation rate (ξ) in the ACO algorithm is treated as a learning rate. The performance evaluation of ACO based on the evaporation rate, with the parameter combination , , , and 1000 iterations, is illustrated in Figure 6. It may be noted that, for the purpose of experimentation, RWS selection with the probability of selection based on the function value is used and the Squared Euclidean distance metric (D7 in Table III) is chosen for the computation of the distance. In Figure 6, the values along the vertical axis represent the mean over the functions F1 to F10 listed in Table I (where each function value is averaged over twenty trials), while the values along the horizontal axis represent the evaporation rate ξ. The performance of ACO is evaluated by regulating the evaporation rate between 0.1 and 0.1. Investigating Figure 6, it may be observed that a valley-shaped curve is formed. Initially, for , a high mean function value is noted. On changing the value of ξ, a substantial improvement is observed in the performance of ACO. Hence, the performance of ACO is highly sensitive to the parameter ξ. It may be observed from Figure 6 that increasing the value of ξ enhances the performance of ACO. However, the performance of ACO declines slightly on increasing the value of ξ beyond 0.5. A sudden, large drop in performance is observed at the evaporation rate . The high sensitivity towards the evaporation rate can be observed from Figure 6.
III-D Comparison with Other Metaheuristics
An experiment was conducted to compare the improvised ACO with other classical metaheuristic algorithms, namely Particle Swarm Optimization (PSO) and Differential Evolution (DE).
The parameter setting adopted for ACO is as follows: the population , , and 1000 iterations, and the evaporation rate is set to 0.5. For the classical ACO, the selection method is BHS(Weight) and the distance metric is D2 (Manhattan), whereas for ACO with improvised parameters, the selection method is RWS(FitVal) and the distance metric is Squared Euclidean (D7 in Table III).
PSO [13] is a population-based metaheuristic algorithm inspired by the foraging behaviour of swarms. A swarm is basically a population of several particles. The mechanism of PSO depends on the velocity and position updates of a swarm. The velocity in PSO is updated in order to update the positions of the particles in the swarm. Therefore, the whole population moves towards an optimal solution. The influencing parameters, cognitive influence and social influence , are set to 2.0, while is set to 1.0 and is set to 0.0. The population size is set to 10 and the number of iterations is set to 1000.
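The velocity and position update described above can be sketched in Python as follows. This is the textbook PSO update, shown for illustration only; the cognitive and social coefficients default to the value 2.0 quoted in the text, while the `inertia` argument and the name `pso_step` are assumptions that do not come from the paper.

```python
import random


def pso_step(positions, velocities, personal_best, global_best, rng,
             c1=2.0, c2=2.0, inertia=1.0):
    """Update all particle velocities and positions in place (one PSO step)."""
    for p in range(len(positions)):
        for d in range(len(positions[p])):
            r1, r2 = rng.random(), rng.random()
            velocities[p][d] = (
                inertia * velocities[p][d]
                + c1 * r1 * (personal_best[p][d] - positions[p][d])  # cognitive pull
                + c2 * r2 * (global_best[d] - positions[p][d])       # social pull
            )
            positions[p][d] += velocities[p][d]
```

After each step, the personal and global bests would be refreshed from the new positions before the next call.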
DE [14, 15], inspired by the natural evolutionary process, is a popular metaheuristic algorithm for the optimization of continuous functions. The major performance-controlling parameters of DE are , which is set to 0.7, and , which is set to 0.9. In the present study, the DE version of [16] is used. The population size is set to 10 and the number of iterations is set to 1000. Examining Table IV, it may be concluded that, under the present experimental/parameter setup, the improvised version of ACO outperforms the classical metaheuristics. However, from the present study and the no free lunch theorem [17], it is evident that the mentioned metaheuristics are subject to parameter tuning. Hence, a claim of superiority of the present improvised ACO is subject to comparison under parameter tuning of the other mentioned metaheuristics.
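Table IV. Mean and variance of the function values obtained by each algorithm.

| Function | Test | ACO | ACO | PSO | DE |
|---|---|---|---|---|---|
| F1 | mean | 1.72 | 1.63 | 17.86 | 11.16 |
| | var | 0.05 | 0.01 | 3.80 | 13.92 |
| F2 | mean | 0.69 | 0.02 | 7875.01 | 1610.96 |
| | var | 0.02 | 0.00 | 1.45E+07 | 3402.52 |
| F3 | mean | 5.57 | 0.47 | 488.92 | 40.81 |
| | var | 0.73 | 4.17 | 2.00E+05 | 200.54 |
| F4 | mean | 131.42 | 65.23 | 2.20E+05 | 2763.53 |
| | var | 4501.12 | 6160.92 | 9.35E+09 | 37334.53 |
| F5 | mean | 127.56 | 32.24 | 81.27 | 22.27 |
| | var | 308.44 | 618.88 | 569.12 | 56.08 |
| F6 | mean | 0.46 | 0.06 | 62.64 | 13.22 |
| | var | 0.27 | 0.01 | 712.26 | 39.88 |
| F7 | mean | 4.93 | 12.72 | 458.32 | 44.71 |
| | var | 1.60 | 294.76 | 20947.18 | 149.05 |
| F8 | mean | 11.68 | 1.05 | 36556.80 | 2162.60 |
| | var | 324.92 | 0.03 | 4.18E+09 | 13945.71 |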
IV Discussion and Conclusion
A comprehensive performance analysis of Ant Colony Optimization is offered in the present study. Parameters such as the selection strategy, the distance measure metric and the evaporation rate are subjected to meticulous tuning. The selection of a solution in the construction of a new solution, and the way the probability of selection is assigned to the individuals, influence the performance of ACO. Analysing the results produced by the various selection strategies, it may be concluded that RWS, together with the probability of selection computed from the function value, offers better results than its counterparts. The advantage of the RWS strategy is due to its ability to maintain non-uniformity in selection while not preferring only the best solution in the population. Rather than adhering to the Manhattan distance metric only, it is interesting to test several available distance metrics for computing the average distance between the selected solution and all the other solutions. It is observed from the experiments that the Squared Euclidean metric offers the best performance among the distance metrics considered in the present study. It may also be observed from the analysis that ACO is highly sensitive to the pheromone evaporation rate, which controls the magnitude of the average distance between the selected solution (variable) and all the other solutions (individuals) in the population. A comparison with classical metaheuristics indicates the dominance of the ACO algorithm in the present experimental setup. However, as is evident from the present study and the no free lunch theorem, metaheuristic algorithms are subject to parameter tuning. Therefore, a claim of superiority of one metaheuristic algorithm over another will always remain open to question.
Acknowledgment
This work was supported by the IPROCOM Marie Curie initial training network, funded through the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007-2013/ under REA grant agreement No. 316555.
References
 [1] J. L. Deneubourg, S. Aron, S. Goss, and J. M. Pasteels, “The self-organizing exploratory pattern of the argentine ant,” Journal of Insect Behavior, vol. 3, pp. 159–169, 1990.
 [2] M. Dorigo and L. M. Gambardella, “Ant colony system: A cooperative learning approach to the traveling salesman problem,” Evolutionary Computation, IEEE Transactions on, vol. 1, no. 1, pp. 53–66, 1997.
 [3] M. Dorigo, G. Di Caro, and L. M. Gambardella, “Ant algorithms for discrete optimization,” Artificial Life, vol. 5, no. 2, pp. 137–172, 1999.
 [4] M. Dorigo, V. Maniezzo, and A. Colorni, “Ant system: Optimization by a colony of cooperating agents,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 26, no. 1, pp. 29–41, 1996.
 [5] M. Dorigo and G. Di Caro, “Ant colony optimization: a new metaheuristic,” in Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on, vol. 2, 1999, pp. –1477.
 [6] K. Socha and M. Dorigo, “Ant colony optimization for continuous domains,” European Journal of Operational Research, Elsevier, pp. 1155–1173, November 2006, DOI: 10.1016/j.ejor.2006.06.046.

 [7] C. Blum and K. Socha, “Training feedforward neural networks with ant colony optimization: An application to pattern classification,” in Hybrid Intelligent Systems, 2005. HIS’05. Fifth International Conference on. IEEE, 2005, 6 pp.
 [8] K. Socha and C. Blum, “An ant colony optimization algorithm for continuous optimization: application to feedforward neural network training,” Neural Computing and Applications, vol. 16, no. 3, pp. 235–247, 2007.
 [9] X. Yao, “A review of evolutionary artificial neural networks,” International Journal of Intelligent Systems, vol. 8, no. 4, pp. 539–567, 1993. [Online]. Available: http://dx.doi.org/10.1002/int.4550080406
 [10] A. Abraham, “Meta learning evolutionary artificial neural networks,” Neurocomputing, vol. 56, no. 0, pp. 1 – 38, 2004.
 [11] S. Haykin, Neural Networks: A Comprehensive Foundation, 1st ed. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1994.
 [12] E. W. Weisstein, “Bernoulli distribution,” MathWorld–A Wolfram Web Resource, http://mathworld.wolfram.com/BernoulliDistribution.html.
 [13] R. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” in Micro Machine and Human Science, 1995. MHS ’95., Proceedings of the Sixth International Symposium on, 1995, pp. 39–43.
 [14] R. Storn and K. Price, “Differential evolution - a simple and efficient adaptive scheme for global optimization over continuous spaces,” 1995.
 [15] ——, “Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces,” Journal of global optimization, vol. 11, no. 4, pp. 341–359, 1997.
 [16] A. K. Qin, V. L. Huang, and P. N. Suganthan, “Differential evolution algorithm with strategy adaptation for global numerical optimization,” Evolutionary Computation, IEEE Transactions on, vol. 13, no. 2, pp. 398–417, 2009.
 [17] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” Evolutionary Computation, IEEE Transactions on, vol. 1, no. 1, pp. 67–82, 1997.