I Introduction
Optimization problems with unknown input-output relations or structures too complex to capture are regarded as "black-box" problems [1]. With minimal assumptions on the problems and no necessity for domain-specific knowledge, heuristic methods show promising capabilities in addressing these problems [2, 3, 4]. Yet their performance deteriorates as the dimensionality of the problems increases [5, 6, 7]. This deterioration, widely known as the "curse of dimensionality", has become one of the key topics in Large Scale Optimization (LSO). To address the performance deterioration in LSO problems, diverse enhancement approaches have emerged.
Efforts have been put into investigating heuristics with higher performance [8]. However, empirical evidence suggests that though these approaches address the bottlenecks to some extent, their capabilities inevitably change when applied to problems with different characteristics. This no-free-lunch phenomenon directs researchers to the field of metaheuristics: higher-level procedures that find, generate, or select heuristics that may provide a sufficiently good solution to an optimization problem.
One of the most featured metaheuristics is the Multiple Offspring Sampling (MOS) algorithm [9]. MOS is a parametric framework designed for the dynamic allocation of the limited resources provided for optimization. It provides a framework in which users can focus on embedding more powerful heuristics to enhance performance. When MOS is articulated with powerful heuristics and used with appropriate hyperparameters, state-of-the-art performance is achieved.
From the perspective of dynamic multi-armed bandit problems, a simplified version of Markov Decision Processes (MDP), this paper proposes an Online Decisioning Metaheuristic framework (ODM) that addresses several concerns of current metaheuristics and can be highlighted for its potential to be used in practice.
The rest of this paper is organized as follows. In Section II, we first conduct critical reviews of the existing metaheuristics and then give clear, step-by-step motivations for this work. In Section III, we present the details of the proposed framework, ODM. In Section IV, theoretical analyses of ODM and guidelines for its use in practice are presented. In Section V, we provide empirical test results that validate the effectiveness and the potential of the framework. Finally, Section VI concludes the paper.
II Related Works & Motivation
Empirical evidence suggests that the effectiveness of heuristics changes when they are applied to problems with different characteristics. The no-free-lunch theorem tells us that there can be no universal problem solver, and thus making use of the merits of different heuristics seems a promising solution.
II-A Related Works
Under the setting of limited resources, optimization under metaheuristic frameworks turns into the question of how to implement clever resource allocation methods such that we may find objective values as small as possible.
II-A1 Allocation of Resources
The problem of metaheuristics is the art of efficient resource allocation. In the literature, there are diverse ideas about the allocation of resources.
One allocation idea is to not consider allocation at all, i.e., to fix the resources for each heuristic and carry them out alternately. Algorithms of this kind, such as [10], are often particularly designed for some class of problems and have limited generalization abilities. There are also metaheuristics implicitly using this idea [4]. For instance, knowing that one of its three articulated heuristics, LS1, behaves particularly well on the LSO08 benchmark problems, Multiple Trajectory Search (MTS) [11] always uses LS1 at the end of each iteration. Fixing the resources can be interpreted as introducing prior knowledge into metaheuristic design, which harms generalization when the method is applied to real-world scenarios.
Another idea is to dynamically adjust the resources distributed to each heuristic in each iteration. For example, MOS [9] runs each of its articulated heuristics within each iteration and dynamically adjusts the resources for each heuristic according to their performance. The adjustment of the resources, or more accurately the change of the configuration for the articulated heuristics, is problematic because heuristics often require sufficient iterations to show their full capacities for the current state of optimization [12]. If we want to use this kind of allocation, we can only use heuristics that are insensitive to the amount of provided resources, which forfeits many potentially good heuristic articulation choices.
There are also ideas that assign probabilities to each heuristic and treat the articulated heuristics as actions instead of changing their configurations. In [13], the authors proposed a greedy resource allocation method, which simply switches between the two articulated heuristics when one of them performs poorly. Yet, greedy exploitation of heuristic performance without in-time exploration is incapable of excavating collaborative synergy and may lead to premature convergence. In [14], the algorithm adjusts the probabilities of selecting the articulated local search heuristics. The key idea in this stochastic setting is to balance exploration and exploitation via the probability adjustments, which are often addressed unsatisfactorily and unsystematically.

II-A2 Use of Feedback
The modeling and utilization of feedback is another key issue, since the feedback serves as guidance for the operations of resource distribution. Proper modeling and utilization of the feedback contribute to the goal of minimizing the objective function as much as possible with the limited resources.
The modeling of feedback is the problem of deciding which information we should make use of. Some algorithms model the quality of the offspring as feedback from the optimization process and make use of it. The representative metaheuristic of this kind, MOS, employs notions of offspring quality and participation ratios to dynamically allocate resources to each articulated heuristic. The quality of offspring shares an implicit connection with maximizing the effectiveness of resource utilization. Another widely used modeling of feedback is the relative improvement, employed in the adaptive grouping method decision in [15]. However, this modeling of feedback lacks the translation invariance property and can only be used when the global minimum value of the problem is known a priori.
II-A3 Framework Interface and Simplicity
The interfaces of metaheuristic frameworks are crucial factors in their applicability to real-world scenarios. In our view, there are mainly two practical concerns for the interfaces.
The first concern is the friendliness of the interfaces, both for different types of costs and for heuristic articulation. Flexibility of the interface for costs or resources is well pursued in Bayesian optimization methods [16, 1]. Using different kinds of acquisition functions, the Bayesian optimization framework is able to minimize the objective with respect to different types of resources: for example, the number of function evaluations (often used in benchmark problems), time (often used when tuning hyperparameters of machine learning models), or money (often used in tuning expensive real-world systems). The friendliness of the interface for heuristic articulation is another practical aspect. To be applied to specific real-world problems, metaheuristic frameworks should be easy to articulate with different kinds of heuristics specifically designed for different domains. The need for easy articulation requires us to put minimal constraints on the heuristics. This further explains why we should recognize changing the configurations of the heuristics as a drawback, for it incurs additional constraints which require the heuristics to be insensitive to the change of resources.
The second concern is the variance of the metaheuristic framework. MOS obtains competitive performance via careful tuning of several hyperparameters. In practical applications where the costs must be taken seriously, tuning the hyperparameters of the metaheuristic framework is too costly to be considered: the more hyperparameters, the higher the performance variance. We should lower the performance variance of the framework and decrease its parameter sensitivity as much as possible, aiming to make the metaheuristic quasi-hyperparameter-free.
II-B Motivation
II-B1 The Ultimate Goal is Efficiency
Whether using metaheuristics or not, the ultimate pursuit of optimization should be "efficiency", in the sense that with limited computational resources, we should improve the global best fitness as much as possible. In a metaheuristic setting, calling heuristics at a particular stage of the optimization process results in local improvements of the global best fitness and local costs of some given resources. Acknowledging this, a metaheuristic should capture well the local efficiency of each call of the heuristics and try to maximize the cumulative local efficiency throughout the entire optimization process.
II-B2 Dynamic Multi-Armed Bandit
Based on both empirical evidence and theoretical analyses, decreasing the resources allocated to a certain heuristic is likely to decrease the effectiveness of the corresponding approach [12, 17]. If a certain heuristic is penalized in terms of resources for behaving badly once, it is likely to suffer long-term damage. Since we do not want to undermine the abilities of the articulated heuristics during the optimization process, we naturally turn to the idea of treating the strategies as immutable actions. More specifically, we should fix the built-in heuristics with presumably their best configurations to fully exploit their abilities and use the feedback of the optimization process to make online decisions on which action to take.
The notion of "efficiency" is a natural solution for the different costs of taking different actions, since it considers improvement and cost simultaneously. With these considerations, the metaheuristic problem is transformed into an online decisioning problem with the aim of maximizing the cumulative efficiency of taking actions. We can fit the online decision problem under the systematic framework of Markov Decision Processes (MDP), under which the goal is to maximize the expected cumulative reward. Unfortunately, the MDP of black-box optimization is very unlikely to be solved using classical approaches for solving MDPs, e.g. dynamic programming, since we cannot obtain the Markov kernels for black-box objectives and the behaviors of heuristics.
However, if we reasonably assume that within consecutive states of the MDP, the reward distribution of taking a certain action changes slowly, i.e., the state of the Markov process changes slowly, we can simplify the MDP into a dynamic multi-armed bandit problem, where the reward distribution of each arm changes gradually through time. The recognition of metaheuristics as dynamic multi-armed bandits can be traced back to [18]. In the dynamic multi-armed bandit setting, we must find an effective change point, such that the rewards after the change point can be used to well approximate the rewards of the near future. The online decisioning framework from the perspective of dynamic multi-armed bandits is presented in Fig. 1.
There are several successful algorithms for classical stochastic multi-armed bandits, e.g. softmax, ε-greedy, UCB, etc. [19]. These algorithms aim to achieve the highest cumulative reward via the trade-off between exploration and exploitation over the actions. For optimization, the type of the reward distribution (the distribution of local efficiency for taking each action) is agnostic, so a nonparametric algorithm should be used. In this paper, we choose the softmax algorithm for its simplicity (only one hyperparameter), stochasticity (non-determinism), and competitive performance (softmax performs very competitively among nonparametric algorithms on multi-armed bandit problems) [19].
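As a generic illustration of the softmax (Boltzmann) selection rule discussed above, the following sketch computes a selection distribution over per-action mean rewards and samples an action; the function names are ours and this is not the paper's exact implementation.

```python
import math
import random

def softmax_probs(mean_rewards, tau):
    """Boltzmann/softmax distribution over per-action mean rewards.

    tau is the temperature: a large tau yields near-uniform probabilities
    (exploration); a small tau yields near-greedy behavior (exploitation).
    """
    # Subtract the max for numerical stability; the distribution is unchanged.
    m = max(mean_rewards)
    exps = [math.exp((r - m) / tau) for r in mean_rewards]
    z = sum(exps)
    return [e / z for e in exps]

def softmax_select(mean_rewards, tau, rng=random):
    """Sample one action index from the softmax distribution."""
    probs = softmax_probs(mean_rewards, tau)
    return rng.choices(range(len(mean_rewards)), weights=probs, k=1)[0]
```

With a small temperature the rule is almost greedy, while a large temperature makes it almost uniform, which is exactly the exploration-exploitation dial the single hyperparameter controls.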
II-B3 Behavioral Consistency
Using heuristics on different types of problems results in feedback whose magnitudes are not necessarily the same. Also, in black-box optimization problems, it is observed that the effectiveness of heuristics generally decreases as the optimization proceeds. These phenomena indicate that the magnitudes of the rewards for the online decisions can be inconsistent, potentially resulting in inconsistent behavior of the softmax online decisions. For example, when the rewards are very small, softmax acts almost randomly; when the rewards are huge, softmax acts greedily. Normally, for multi-armed bandits, the hyperparameter of softmax, which represents the degree of greediness, is pre-tuned. For optimization, we cannot tune the parameters for each algorithm in advance. Thus, we want a reliable decisioning mechanism that behaves consistently in terms of exploration and exploitation regardless of the magnitudes of the rewards. To deal with this, we introduce normalization techniques into the decisioning such that, no matter the magnitude of the rewards, we achieve similar behavior.
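The magnitude-sensitivity issue and its fix can be demonstrated in a few lines; this is a sketch with illustrative values, where min-max normalization makes the softmax selection probabilities identical regardless of the reward scale.

```python
import math

def softmax(xs, tau):
    """Softmax distribution with temperature tau (max-shifted for stability)."""
    m = max(xs)
    e = [math.exp((x - m) / tau) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def minmax(xs):
    """Linearly rescale values into [0, 1]; degenerate windows map to 0.5."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.5] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]

tau = 0.2
tiny = [1e-6, 3e-6]   # raw softmax is nearly uniform (almost random choice)
huge = [1e6, 3e6]     # raw softmax is nearly greedy
# After min-max normalization both windows become [0.0, 1.0], so the
# selection probabilities are identical regardless of reward magnitude.
assert softmax(minmax(tiny), tau) == softmax(minmax(huge), tau)
```

On the raw values, the same temperature gives probabilities near (0.5, 0.5) for the tiny rewards and near (0, 1) for the huge ones; normalization removes this inconsistency.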
II-C Expectations of Design
After going through the motivations step by step, the idea for our metaheuristic framework can be summarized as: devising a stochastic online decisioning process that makes decisions based on the local efficiency of the calls of heuristics, with consistent behavior, expecting to maximize the overall efficiency of resource utilization.
III Online Decisioning Metaheuristic
This section gives details of the controller, an essential object that is responsible for the online decisioning of the proposed metaheuristic framework.
III-A Efficiency & Formalization
We first define local efficiency, which represents the ratio between the local improvement of the objective function and the consumed resources (local cost) of calling a heuristic at a particular stage. Once local improvement is defined, local efficiency can be easily derived as the ratio of local improvement to local cost.
In general scenarios, in contrast to solving benchmark problems, the ground-truth global minima are unknown. That is to say, a consistent definition of local improvement should be independent of the global optimum. A consistent definition of local improvement should be translation invariant, which means that the local improvement should not change if the search landscape is shifted up or down. Mathematically, let I(y, y'; a) be a translation invariant definition of local improvement when the global best fitness is improved from y to y' by taking action a, and we have

I(y + c, y' + c; a) = I(y, y'; a), for all c ∈ ℝ. (1)
Among all possible definitions satisfying Cauchy's functional equation, the simplest definition of improvement is the absolute improvement, i.e., the difference between the best known objective values before and after the call. Using absolute improvement seems naïve, since different objective functions generally have different magnitudes of absolute improvement. However, since a normalization technique will be introduced, we can use this simple definition without caring about the differences in magnitude. With the definition of local improvement, we can define local efficiency as the ratio of the absolute improvement in objective value to the cost of resources, as

e(a) = (y_b − y_a) / (r_b − r_a), (2)

where y_b and y_a are the best known objective values before and after taking the action a, and r_b and r_a are the amounts of remaining resources before and after taking the action. Since normalization will be used, r_b and r_a can be described in various units, e.g., ratios of remaining resources, absolute numbers of remaining resources, etc.
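As a minimal sketch, the local efficiency just defined can be computed directly from the before/after values; the function name and argument names are ours.

```python
def local_efficiency(y_before, y_after, r_before, r_after):
    """Ratio of absolute improvement to consumed resources.

    y_*: best known objective values (minimization) before/after the call.
    r_*: remaining resources before/after the call (any consistent unit).
    """
    improvement = y_before - y_after   # translation invariant by construction
    cost = r_before - r_after          # resources consumed by the call
    if cost <= 0:
        raise ValueError("a heuristic call must consume resources")
    return improvement / cost
```

Note that shifting both objective values by the same constant leaves the result unchanged, which is exactly the translation invariance required above.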
Using the defined efficiency, the goal is formalized as

maximize Σ_t e(a_t) subject to Σ_t c_t ≤ R, (3)

where a_t is the action taken at step t, c_t is the resource cost of that call, and R is the amount of resources provided for the whole optimization process.
A stream of five-element tuples (called the "records") is used to record the states and the actions. A record tuple looks like

(a, r_b, r_a, y_b, y_a), (4)

which denotes that action a is taken while the remaining resources decrease from r_b to r_a and the best known objective value improves from y_b to y_a. From each record, the information of improvement and cost can be easily calculated.
Online decisioning based on local efficiency can be particularly useful in practice for its flexibility. For example, in problems with a limited number of evaluations of the black box, the efficiency is defined as the ratio of absolute improvement to the number of evaluations consumed. In practical scenarios, based on the diverse types of resources, including financial costs, time costs or other forms of costs, we can use different definitions of efficiency to tackle the problems.
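The record stream described above can be sketched as follows; the field names and example values are ours, chosen to mirror the five-element tuple.

```python
from collections import namedtuple

# One record per heuristic call: which action was taken, the remaining
# resources before/after the call, and the best known objective before/after.
Record = namedtuple("Record", ["action", "r_before", "r_after",
                               "y_before", "y_after"])

def efficiency(rec):
    """Local efficiency derived from a single record."""
    return (rec.y_before - rec.y_after) / (rec.r_before - rec.r_after)

# A toy stream: resources measured in function evaluations.
stream = [
    Record("LS", 10000, 9000, 50.0, 30.0),  # large improvement, cheap call
    Record("CC", 9000, 7000, 30.0, 29.0),   # small improvement, costly call
]
```

Because improvement and cost are both stored, the same stream supports any resource unit (evaluations, seconds, monetary cost) without changing the decision logic.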
III-B Change Point, Normalization & Sliding Window
The multi-armed bandit perspective treats the local efficiency records as the rewards for taking actions, and the core problem is how we should deal with the ever-changing reward distributions. We must find an effective change point (a certain position in the record stream) such that the rewards after the change point can be used to well approximate the rewards of the near future. The change point between the "ancient" rewards and the "recent" rewards directly influences the quality of decisions.
Let us first suppose we already have a good change point: how can we make use of the information of the rewards, whose efficiency records are generally decreasing? An effective approach to offset the decreasing trend is necessary for consistent decisioning behavior. We should not try to offset the decreasing trend for each action separately, since doing so makes the performance trends incomparable and undermines the effectiveness of decisioning. A reasonable solution is to normalize all the efficiency records after the change point. This operation can be justified on three points. First, the normalization negates the influence of the diverse magnitudes of improvements across problems. Second, we will show that normalization bounds the probabilities of exploration and exploitation such that behavioral consistency can be achieved. Last, normalization gives us the flexibility of using the absolute improvement as the numerator and the resource consumption in any unit as the denominator of the defined efficiency. In this way, we offset the influence of the decreasing trends and make the efficiency records comparable for online decisioning.
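The joint normalization over all records after the change point (not per action) can be sketched as below; the function name and example values are illustrative.

```python
def normalize_window(records):
    """Min-max normalize all efficiency records after the change point
    jointly (NOT per action), keeping the actions' records comparable."""
    lo, hi = min(records), max(records)
    if hi == lo:                      # degenerate window: all equal
        return [0.5] * len(records)
    return [(r - lo) / (hi - lo) for r in records]

window = [4.0, 2.5, 1.0]             # a generally decreasing efficiency trend
assert normalize_window(window) == [1.0, 0.5, 0.0]
```

Normalizing all records together preserves the relative ordering between actions, whereas per-action normalization would erase exactly the cross-action differences the controller needs.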
Let us go back to the determination of an effective change point. Typically, in statistical analyses, change point detection techniques are used, and they seem to be a good choice. However, two considerations drove us away from these ideas. First, change point detection methods often require a modest number of points before raising an alarm; we cannot assume or guarantee that the reward distribution of taking each action changes slowly enough, so the effectiveness of change point detection techniques is likely to be unsatisfactory. Second, the distributional assumptions and the hyperparameters in these methods make the online decisioning sensitive and cause higher variance. For a robust controller, sensitive hyperparameters are intolerable.
In this paper, we use one of the simplest change point strategies to tackle the problem: fixing the distance between the current location and the dividing line, i.e., using a sliding window of constant length. Though this method indeed oversimplifies the problem, it makes the behavior convenient to analyze both theoretically and empirically.
III-C Non-Stop Exploration and Soft Decisioning
We can never say that a heuristic that has behaved badly historically will not behave well now. Thus, exploration of the rewards of each action must be guaranteed. Also, the softmax decision requires at least one reward record for each action. These two concerns can be addressed with a simple rule: if no record of an action is found, the controller chooses that action directly. This rule ensures non-stop exploration across the whole optimization process, no matter how badly the hyperparameters are chosen.
The key operation of the controller is as follows: we take the mean of the normalized rewards (efficiency records) of each action within the sliding window and apply the softmax operation on these mean normalized records to obtain the probability distribution for selecting each action. Then, we randomly sample an action from this distribution. The overall decisioning behavior is formulated as pseudocode, presented in Algorithm 1.

III-D Example
In Fig. 2, an example of online decisioning with a short sliding window and three articulated actions is given. Since no records of the three actions can be found at the start, the first three decisions simply take each of the three actions in turn. From the fourth decision onward, we linearly normalize the rewards within the window and conduct softmax decisions. At a later decision in the example, no reward of one action remains in the window, so that action is taken directly.
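The decision rule walked through above can be sketched end to end; this is our reading of the controller, with hypothetical names, combining the sliding window, the non-stop exploration rule, joint normalization, and softmax sampling.

```python
import math
import random
from collections import deque

class Controller:
    """Sketch of the ODM controller's decision rule (names are ours)."""

    def __init__(self, actions, window_len, tau, seed=0):
        self.actions = list(actions)
        self.window = deque(maxlen=window_len)   # (action, efficiency) pairs
        self.tau = tau
        self.rng = random.Random(seed)

    def record(self, action, efficiency):
        """Append a reward record; old records fall out of the window."""
        self.window.append((action, efficiency))

    def decide(self):
        seen = {a for a, _ in self.window}
        # Non-stop exploration: an action with no record is taken directly.
        for a in self.actions:
            if a not in seen:
                return a
        # Joint min-max normalization of all rewards in the window.
        rewards = [e for _, e in self.window]
        lo, hi = min(rewards), max(rewards)
        span = hi - lo if hi > lo else 1.0
        means = []
        for a in self.actions:
            vals = [(e - lo) / span for act, e in self.window if act == a]
            means.append(sum(vals) / len(vals))
        # Softmax over mean normalized rewards, then sample an action.
        m = max(means)
        weights = [math.exp((v - m) / self.tau) for v in means]
        return self.rng.choices(self.actions, weights=weights, k=1)[0]
```

With three actions and an empty window, the first three decisions deterministically pick each action once, matching the example in Fig. 2.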
IV Theoretical Analyses
IV-A Computational Complexity
For time complexity, the mean and softmax operations cost time linear in the window length, which is trivial compared to the heuristics. For space complexity, we discard the historical records outside the sliding window since they will not be used again, resulting in a space complexity likewise linear in the window length, which is also trivial. The low space and time complexities are desirable.
IV-B Probability Bounds for Behavior
The controller has only two hyperparameters: the length of the sliding window and the temperature coefficient of the softmax. With them, we can give bounds on the behavior of the controller, which will serve as guidelines for hyperparameter tuning.
Suppose that there are k actions and denote the length of the sliding window by W. The controller ensures that in each fragment of length W, there shall be at least one record of each action. Thus, even without considering the softmax decisions, the probability of greedy exploitation is bounded from above, the exploration probability is bounded from below, and each action has a guaranteed minimum probability of being chosen.
Proposition IV.1 (Exploitation Bounds for ODM).
Suppose every action corresponds to at least one efficiency record within the sliding window of size W, and let τ be the temperature parameter of the softmax decisions. Then the probability of taking the action with the highest mean normalized record (exploitation) within the window satisfies the bounds

1/k < P(exploitation) ≤ e^{1/τ} / (e^{1/τ} + k − 1), (5)

where k is the number of actions.
Proof.
After the normalization of the controller, the highest reward within the sliding window is 1 and the lowest is 0. We first stretch the reward stream fragment corresponding to the sliding window into a rectangle by putting the rewards of each action on the corresponding row, as shown in Fig. 3.
Making softmax decisions within the fragment does not depend on the order of the rewards, so an equivalence class of stretched fragments can be obtained by permutations of the fragment before stretching. Supposing a given action is the best performing one, we can show that the upper bound is obtained within the class equivalent to
the configuration where we put 1 in the yellow cells and 0 in the cyan cells, with at most one element in each column. Similarly, we can get the lower bound within the class equivalent to
the configuration where "−" represents normalized rewards that are infinitely close to 1 but still less than 1, and thus the lower bound cannot be reached.
The probability bounds for exploration can be derived directly by subtracting the exploitation bounds. ∎
The proposition gives us a measure of the behavior of the proposed framework. We can use the bounds inversely to conduct efficient hyperparameter search.
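Under our reading of the bounds, with an exploitation upper bound of e^{1/τ}/(e^{1/τ} + k − 1) and a lower bound of 1/k, the two extreme configurations in the proof can be checked numerically; this sketch assumes that reconstructed form.

```python
import math

def softmax(xs, tau):
    """Softmax distribution with temperature tau."""
    m = max(xs)
    e = [math.exp((x - m) / tau) for x in xs]
    s = sum(e)
    return [v / s for v in e]

k, tau = 3, 0.5
upper = math.exp(1 / tau) / (math.exp(1 / tau) + k - 1)  # assumed bound form
lower = 1.0 / k

# Extreme case: the best action's mean normalized reward is 1, others are 0,
# which should attain the exploitation upper bound exactly.
p_greedy = softmax([1.0, 0.0, 0.0], tau)[0]
assert abs(p_greedy - upper) < 1e-12

# Near-degenerate case: all mean normalized rewards nearly equal, so the
# exploitation probability approaches (but never reaches) the 1/k lower bound.
p_flat = softmax([1.0, 1.0 - 1e-9, 1.0 - 1e-9], tau)[0]
assert p_flat > lower and p_flat - lower < 1e-6
```

The check illustrates why the lower bound is strict: the exploitation probability only approaches 1/k as the mean normalized rewards of all actions coincide.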
IV-C Theoretically Derived Guidelines for Practical Use
To apply ODM to a practical problem, one should first consider the articulation of specialized heuristics for the corresponding domain. Then, proper hyperparameters should be set. In ODM, the search for effective choices of the only two hyperparameters, the sliding window length and the softmax temperature, can be assisted by the bounds above. If we know how many heuristics are to be articulated and have a preferred interval for the probability of exploration or exploitation, we can inversely locate the potential combinations of the hyperparameters and therefore decrease the risk of using bad hyperparameters. For example, given the number of heuristics to be articulated, we can first constrain the window length accordingly and simplify the exploitation bound as

(6)

Given a preferred exploitation probability interval, we can first solve for the temperature using the upper bound, since only the temperature is involved there. Then, we use the solved temperature and the lower bound to get the window length, taking the closest integer solutions. When we have obtained a theoretically derived hyperparameter combination, we can see from the bounds that the combinations behaving most similarly to it are its immediate neighbors. Our suggestion is, if conditions permit, to conduct hyperparameter tuning within these few combinations and see which performs the best on the problems. If we do not have the resources for hyperparameter tuning, we may just use the combination with the smallest window length, since a smaller window leads to quicker adaptivity.
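The inverse solving step for the temperature can be sketched as follows; it assumes our reconstructed upper bound e^{1/τ}/(e^{1/τ} + k − 1), and the function name is ours.

```python
import math

def tau_for_exploitation(p_target, k):
    """Solve the softmax temperature so that the assumed exploitation
    upper bound e^(1/tau) / (e^(1/tau) + k - 1) equals p_target.

    Rearranging: e^(1/tau) = p(k-1)/(1-p)  =>  tau = 1/ln(p(k-1)/(1-p)).
    Requires p_target > 1/k so the logarithm's argument exceeds 1.
    """
    if not (1.0 / k < p_target < 1.0):
        raise ValueError("target must lie strictly between 1/k and 1")
    return 1.0 / math.log(p_target * (k - 1) / (1.0 - p_target))

k = 3
tau = tau_for_exploitation(0.65, k)
# Plugging tau back into the bound recovers the target probability.
p = math.exp(1 / tau) / (math.exp(1 / tau) + k - 1)
assert abs(p - 0.65) < 1e-9
```

In practice one would round the solved values to nearby integers, as the guidelines suggest, and compare the few neighboring combinations empirically.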
V Experimental Studies
V-A Experimental Settings
V-A1 Framework Setting
To validate the effectiveness of the controller, we articulate it with three heuristic approaches: local search LS1 (LS), cooperative coevolution with random grouping (CC), and global search (GS). This is by no means the best possible combination. The reasons why we choose these heuristics are mainly threefold. First, from empirical evidence, we know that the three choices, being powerful optimization heuristics, show effectiveness on different kinds of problems respectively; combining them is potentially beneficial for dealing with the complexities of LSO problems. Second, these heuristics are used in the literature, so we can borrow the knowledge of fine-tuning them to their satisfactory configurations; this is the same routine we have suggested in the guidelines. Finally, these heuristics have relatively wide intersections with the state-of-the-art algorithms, which makes the comparison reasonable. The details of the heuristics are presented in Table I.
Name  Costs  Details  Configurations

LS  Local search strategy used in MTS [11] and MOS [9], resembling trajectory-search-based algorithms.  The same configuration as in MOS: the initial step size is derived from the lower and upper box constraints, and a minimal step size is set.
CC  Cooperative coevolution with random grouping and SaNSDE [20] as the optimizer, resembling the DECC family [21].  A robust and classical configuration: settings for the mean of NP, the group size, and the number of generations assigned to each group.
GS  Global search that applies SHADE [22] on all dimensions of the problem, resembling direct optimization strategies, e.g. CSO [23].  The same configuration as SHADE-ILS [24]: settings for NP and the number of generations per call.
Based on empirical knowledge, we want to set the probability of exploitation approximately within a preferred interval. Using the bounds, we obtain the top three integer solutions for the hyperparameter pair. In the experiments, we will first conduct sensitivity analyses on the three combinations of hyperparameters to locate the best setting.
V-A2 Compared Algorithms
In the later part of this section, the performance of the optimization process will be compared with several state-of-the-art algorithms, including CC-CMA-ES [15], DECC-D [25], DECC-DG2 [26], MTS [11] and CSO [23]. MTS updates each of the foreground individuals with the local search method with the best sampled performance [11]. CSO utilizes a pairwise competition strategy to let losers learn from the winners [23]. DECC-DG2 is a faster and more accurate version of DECC-DG [27], in which an accurate decomposition into subcomponents is first calculated via perturbations of the objective function before the subcomponents are cooperatively coevolved [26]. DECC-D is designed for non-separable LSO problems, where cooperative coevolution is conducted on the subcomponents generated by frequently applying the Delta grouping scheme [25].
V-A3 Problem Settings
There are two sets of benchmark problems used in this section.
The CEC'2013 LSO (LSO13) benchmark suite represents a wide range of real-world large-scale optimization problems. With ill-conditioned, complicated subcomponents and irregularities [28], the LSO13 suite serves nicely to test the overall capabilities of ODM. For the test cases in the LSO13 suite, the resource for optimization is concretized as the total number of objective function evaluations, and the problems are of large dimensionality.
The CEC'2008 Large Scale Optimization (LSO08) benchmark problems are naturally scalable to higher dimensions, and thus serve as the test problems for our stability analyses of the impact of problem dimensionality on the controller's behavior. For the test cases derived from the LSO08 benchmark suite, the evaluation budget scales with the dimensionality of the search space.
Additionally, the machine epsilon of IEEE 754 double precision, i.e. 2^{−52} ≈ 2.22 × 10^{−16}, is adopted for all the results. That is to say, any value less than or equal to this threshold will be regarded as 0.
V-A4 Data Source
Results of all the compared algorithms are gathered from implementations written according to the corresponding papers on the single-objective optimization experiment platform SOOPLAT^{1} (^{1}https://github.com/PwnerHarry/SOOPLAT), which provides the source code of all reproduced algorithms except MOS, which we could not satisfactorily reproduce with performance consistent with the literature; for MOS, the results are copied directly from the literature. The parameter settings are identical to those used in the papers.
V-A5 Statistical Test Settings
To statistically establish the general performance rankings of the algorithms in a comprehensive analysis, Friedman tests will be conducted in this section on the means gathered from many independent runs of every test case and every algorithm. If a Friedman test rejects the null hypothesis at the chosen significance level, we can conclude that the performance rankings it gives are statistically meaningful.
To show one-on-one performance differences of the algorithms on the test cases, paired t-tests will also be conducted at a fixed significance level. A paired t-test tells us whether we have the confidence to say one algorithm performs better than another on a certain test case. In this section, we will give collective test results in the format of a "w/t/l" string: w represents the number of test cases in which we are highly confident that the first algorithm gives smaller results than the second one; l represents the opposite of w; t represents the number of test cases in which we have no confidence in telling the difference between the results.
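The collection of per-case paired-test outcomes into such a string can be sketched as follows; the helper name and the (mean difference, p-value) inputs are our illustration, with the p-values assumed to come from paired t-tests on the two algorithms' result samples.

```python
def wtl_string(results, alpha=0.05):
    """Collapse per-test-case paired-test outcomes into a "w/t/l" string.

    `results` holds one (mean_diff, p_value) pair per test case, where
    mean_diff = mean(first algorithm) - mean(second algorithm).
    Minimization: a significantly negative mean_diff counts as a win for
    the first algorithm; non-significant cases count as ties.
    """
    win = sum(1 for d, p in results if p < alpha and d < 0)
    lose = sum(1 for d, p in results if p < alpha and d > 0)
    tie = len(results) - win - lose
    return f"{win}/{tie}/{lose}"

# Four toy test cases: one significant win, one significant loss, two ties.
cases = [(-2.0, 0.01), (0.5, 0.40), (3.0, 0.02), (-0.1, 0.30)]
assert wtl_string(cases) == "1/2/1"
```

Keeping the significance decision separate from the direction of the mean difference makes the summary reusable with any paired test that yields p-values.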
V-B Hyper-Parameter Sensitivity
Using the probability bounds and a desired interval for the probability of exploitation as the guideline for hyperparameter tuning, the search space of the hyperparameters collapses into a few distinct points. We run ODM repeatedly using these configurations on the LSO13 benchmark problems to obtain the best hyperparameters for the tests. The corresponding results are presented in Table II.
LSO13 (one mean/std column pair per hyperparameter combination)

mean  std  mean  std  mean  std
0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  
8.69E+00  2.78E+00  1.20E+01  2.54E+00  1.28E+01  4.70E+00  
9.83E-13  5.27E-14  8.51E-13  5.78E-14  1.05E-12  5.35E-14  
6.98E+08  2.51E+08  6.04E+08  2.02E+08  6.27E+08  2.99E+08  
2.68E+06  4.38E+05  2.72E+08  5.12E+05  2.70E+06  6.04E+05  
4.44E+04  3.42E+04  7.14E+04  1.92E+04  9.48E+04  1.87E+04  
1.58E+05  3.68E+04  1.34E+05  5.02E+04  2.46E+05  7.81E+04  
1.24E+11  9.29E+10  2.61E+11  1.51E+11  7.16E+11  5.66E+11  
2.36E+08  2.61E+07  2.25E+08  4.38E+07  2.52E+08  4.64E+07  
7.64E+05  5.90E+05  8.18E+05  5.79E+05  8.00E+05  2.63E+05  
3.33E+07  1.08E+07  2.76E+07  8.55E+06  4.45E+07  3.48E+07  
6.27E+02  2.11E+02  6.69E+02  2.09E+02  4.32E+02  2.57E+02  
1.14E+07  2.20E+06  1.07E+07  2.24E+06  1.05E+07  1.13E+07  
4.35E+07  6.85E+06  4.17E+07  7.80E+06  4.29E+07  1.00E+07  
3.66E+06  2.32E+05  3.63E+06  2.01E+06  4.25E+06  9.17E+05  
t-test  3/9/3  6/8/1  
Color indicators are added for each test case. The greener, the better the performance. 
From the perspective of the test results, two of the hyper-parameter pairs performed similarly well. The remaining hyper-parameter pair behaves relatively badly; we think this is because its sliding window is too long to capture the changes of the reward distributions in time. In the following parts, we stick to the best-performing setting.
V-C ODM Validation
In this section, we validate the effectiveness of the proposed ODM via comparison with several baseline algorithms, described in detail in Table III.
Baseline  Description 

LS  Uses only LS throughout the whole optimization process. 
CC  Uses only CC. 
GS  Uses only GS. 
random  Purely random choices for action decisions. A baseline for comparative study on the decision behaviors. 
BEK  The BEst Known average performance achieved by choosing actions from LS, CC and GS. The best-known decision sequences come from extensive stochastic runs using the three heuristics. For each problem, we use its unique best-known sequence in independent runs. The performance obtained using the “BEK” baseline indicates the full potential of the articulated framework and serves as an approximate lower bound on the error obtained by ODM. The quality of ODM's decision sequences can then be captured via their similarity to the BEK sequences. 
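The extraction of a BEK sequence described above can be sketched minimally; the data layout here, a list of (final_error, decision_sequence) pairs per problem, is our assumption rather than the paper's:

```python
def best_known_sequence(runs):
    """runs: list of (final_error, decision_sequence) pairs gathered from
    extensive stochastic runs on one problem.  The BEK baseline then
    replays the decision sequence of the run that reached the smallest
    final error."""
    return min(runs, key=lambda r: r[0])[1]
```

For instance, given runs `[(1.0, "LLC"), (0.5, "GGC"), (2.0, "CCC")]`, the sequence `"GGC"` of the best run is replayed.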
V-C1 Performance and Decision Quality Analysis
We run ODM and the baselines on the test cases and gather the results in Table IV, together with the decision-sequence similarity scores with respect to the sequences used in the BEK baseline. The sequence similarity scores are computed as mean values of the local alignment scores via the Smith-Waterman method [29]. The higher the similarity scores, the more similar the decision sequences are to the best-known sequence. The performance and the similarity are converted to two radar charts for easy visualization, shown in Fig. 4.
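A minimal sketch of the Smith-Waterman local-alignment score [29] applied to decision sequences follows; the scoring parameters (match/mismatch/gap) and the normalization by the reference's self-alignment score are our assumptions, since the paper does not state them:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between two action sequences, e.g.
    strings over an action alphabet such as {'L', 'C', 'G'}."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

def similarity(a, b, **kw):
    """Normalize by the self-alignment score of the reference sequence b,
    so that an identical sequence scores 100%."""
    return 100.0 * smith_waterman(a, b, **kw) / smith_waterman(b, b, **kw)
```

With this normalization, `similarity(seq, bek_seq)` reproduces the percentage-style scores reported in the tables.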
LSO13  BEK  ODM  RAN  LS  CC  GS  

mean  std  mean  std  sim  mean  std  sim  mean  std  sim  mean  std  sim  mean  std  sim  
0.00e0  0.00e0  0.00e0  0.00e0  4.41e10  1.36e9  0.00e0  0.00e0  4.77e4  1.32e3  6.85e5  1.01e5  
2.25e2  1.02e2  8.69e0  2.78e0  78.13%  2.44e2  3.83e1  31.56%  5.54e2  4.02e1  23.87%  2.40e0  1.50e0  92.25%  9.89e3  1.45e3  2.33%  
4.85e14  2.05e14  9.83e13  5.27e14  69.13%  9.06e13  6.55e14  28.57%  1.04e12  6.51e14  71.43%  1.19e0  1.11e1  31.43%  6.63e0  4.41e1  0.00%  
5.45e8  1.45e8  6.98e8  2.51e8  51.43%  1.23e9  3.90e8  16.47%  1.86e10  9.98e9  5.88%  1.22e11  6.65e10  12.25%  6.34e8  1.00e8  68.24%  
9.54e5  4.32e5  2.68e6  4.38e5  30.34%  2.52e6  9.94e5  42.15%  1.23e7  2.93e6  11.05%  1.26e7  3.42e6  20.01%  2.20e6  6.93e5  65.98%  
3.93e4  3.55e4  4.44e4  3.42e4  44.51%  5.99e4  3.35e4  46.58%  9.84e5  4.13e3  4.29%  9.82e5  4.62e3  2.17%  4.02e4  3.32e4  71.35%  
8.07e4  9.25e3  1.58e5  3.68e4  44.25%  4.10e5  4.22e5  33.15%  3.10e8  4.81e8  19.77%  3.13e9  2.20e9  24.24%  1.06e6  1.00e5  33.37%  
2.93e9  4.35e9  1.24e11  9.29e10  51.25%  1.45e12  1.42e12  44.42%  2.96e15  1.75e15  21.24%  4.93e16  3.12e16  9.13%  1.55e11  1.66e10  49.18%  
1.43e8  2.25e7  2.36e8  2.61e7  26.14%  2.08e8  4.02e7  31.15%  1.01e9  1.91e8  12.31%  9.34e8  2.05e8  8.31%  2.63e8  4.35e7  26.75%  
7.70e5  6.62e5  7.64e5  5.90e5  85.86%  9.33e5  5.28e5  42.00%  6.37e7  3.54e7  3.50%  6.46e7  3.40e7  6.87%  1.06e6  4.34e5  34.63%  
1.35e7  2.21e6  3.33e7  1.08e7  71.54%  1.84e8  9.53e7  22.78%  1.07e11  1.19e11  3.15%  1.17e11  1.14e11  9.55%  1.34e7  2.60e6  83.36%  
2.25e4  2.25e1  6.27e2  2.11e2  32.57%  8.12e2  2.69e2  18.82%  6.14e2  2.65e2  31.10%  2.42e3  4.79e2  24.56%  2.33e3  1.77e2  3.87%  
4.45e5  2.17e5  1.14e7  2.20e6  49.79%  9.28e7  1.05e8  22.53%  3.36e9  1.63e9  15.64%  5.34e9  1.90e9  19.35%  1.18e7  1.61e6  45.98%  
5.99e5  1.01e5  4.35e7  6.85e6  31.95%  5.36e7  1.12e7  19.59%  1.25e11  1.41e11  11.25%  1.60e11  9.10e10  15.25%  5.04e7  5.73e6  28.65%  
9.71e5  4.24e4  3.66e6  2.32e5  66.87%  8.78e6  2.82e6  31.66%  3.05e8  9.52e7  16.97%  2.15e9  2.77e9  22.65%  8.59e6  1.08e6  28.89%  
rank  performance/sim  1.63  4.29  2.53  3.32  3.70  1.79  4.67  2.14  2.67  3.46  
t-test  9/5/1  13/2/0  14/0/1  8/5/2  
We cannot compute similarity scores for one of the test cases since its best decision sequence is not unique.  
For each test case, the greener the indicators, the better the performance.  
The Friedman tests are conducted within ODM, RAN, LS, CC and GS. For the Friedman tests on the errors, the smaller the ranks, the better the performance. For the Friedman tests on the similarity scores, the larger the ranks, the more similar the decision sequences are to the best-known ones.  
The t-tests are conducted in a pairwise manner on ODM and RAN, ODM and LS, ODM and CC, and ODM and GS. 
By analyzing the results in the table and the radar charts, we obtain the following observations:

In terms of the objective function values, ODM achieves significantly better performance than the baselines (RAN, LS, CC and GS), according to the t-test results as well as the Friedman test results. This is strong evidence for the effectiveness of the online decisions. On four of the test cases, ODM achieves roughly equivalent performance to the best-performing heuristic. This is interesting, since the non-stop exploration of the controller should have increased the overhead of identifying the globally best action rather than minimizing it. The phenomenon indicates that non-stop exploration is beneficial even if it potentially causes overhead, and thus justifies the principles used in the controller design. On three of the test cases, ODM achieves collaborative synergy, i.e., performance that is significantly better than using any single one of the heuristics. On three of the test cases, ODM achieves statistically equivalent performance to the BEK baseline, which is the best performance that ODM can possibly achieve.

In terms of decision similarity to the best-known sequence, we can see that ODM is significantly more similar in distribution than the RAN baseline and all the single heuristics. This means that ODM is able to excavate effective sequences of actions to deal with optimization.
To intuitively show the effectiveness of online decisions, in Fig. 8 we present three sets of real-time error bands for test cases where collaborative synergy is achieved. For one of these cases, though ODM achieves faster convergence than the BEK baseline in the early stages, BEK achieves globally better performance. This means that we can still enhance ODM by designing more effective rules for the balance between exploration and exploitation.
V-C2 Stability Analysis
A stable metaheuristic framework must be able to effectively deal with LSO problems of different dimensionality. The “curse of dimensionality” reminds us that the effectiveness of the articulated heuristics inevitably deteriorates as the dimensionality of the problems increases. Thus, if the controller leads to competitive performance regardless of widely ranging problem dimensions, we can conclude that the controller has satisfactory stability and robustness. For this consideration, in this part, comparative trials on the LSO08 benchmark problems are conducted. We perform independent runs for all test cases corresponding to six problems with dimensionality 1000, 2500, 5000 and 10000. Note that for LSO08 tests, the maxFEs scales linearly with problem dimension, which is also true for our settings of the heuristics. This means that the lengths of the decision sequences on different dimensions are roughly the same and thus comparable. In Table V, the mean and std performance of ODM obtained over the independent runs on each test case is given. In Table VI, the similarity score matrices of the decision sequences are presented.
1000  2500  5000  10000  
Problem  mean  std  mean  std  mean  std  mean  std 
0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  
2.60e1  2.53e0  8.58e1  2.12e0  1.25e2  1.97e0  1.44e2  7.53e1  
3.26e0  3.67e0  5.81e2  5.99e2  1.68e3  1.18e3  1.61e3  1.84e3  
0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  
3.67e15  1.80e16  1.83e14  1.55e15  4.06e14  1.58e15  9.57e14  1.17e14  
1.04e12  5.37e14  5.50e13  2.01e14  1.15e12  3.64e14  2.70e12  1.62e13  
One problem is excluded since we do not know its ground-truth minimum. 
Problem (a)
       1000     2500     5000     10000
1000   1000     52.14%   48.34%   23.07%
2500   85.40%   2500     46.26%   31.68%
5000   78.46%   81.33%   5000     43.49%
10000  71.98%   75.47%   77.35%   10000

Problem (b)
       1000     2500     5000     10000
1000   1000     66.63%   24.13%   12.75%
2500   55.12%   2500     42.67%   19.57%
5000   43.09%   53.25%   5000     33.26%
10000  24.25%   32.75%   54.33%   10000

Problem (c)
       1000     2500     5000     10000
1000   1000     70.13%   66.75%   62.11%
2500   71.33%   2500     72.25%   69.33%
5000   51.24%   60.01%   5000     68.24%
10000  33.26%   39.66%   44.25%   10000
Each entry is the mean similarity score of decision sequences for a problem, indexed by the row dimensionality and the column dimensionality. For example, the bottom-left entry is the mean value of similarity scores of the sequences under dimensionality 10000 to the sequences under dimensionality 1000.  
Color indicators have been added for each problem. The greener the cell is shaded, the more similar the corresponding decision sequences are. The redder, the less similar. 
From the similarity matrices, it can be observed that, generally, the larger the difference in problem dimensionality, the less similar the sequences are. Though the similarity scores seem relatively small, the performance under different dimensionality is roughly equivalent (the differences in performance may seem quite significant only until compared with other algorithms) and, for most of the problems, very close to the global optimum. This indicates that, though the patterns of the sequences are not similar when the problem dimensionality changes, the quality of the decisioning remains insensitive, providing another strong piece of evidence for the effectiveness and the robustness of ODM.
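The construction of such a similarity matrix can be sketched as follows. As a self-contained stand-in for the Smith-Waterman scorer used in the paper, this sketch uses the standard library's `difflib.SequenceMatcher.ratio`; note that this stand-in is symmetric, whereas the paper's alignment-based normalization can produce asymmetric matrices:

```python
from difflib import SequenceMatcher
from itertools import product

def mean_similarity_matrix(runs_by_dim):
    """runs_by_dim: dict mapping problem dimensionality -> list of decision
    sequences (one per independent run).  Entry [di][dj] is the mean pairwise
    similarity (as a percentage) of the row-dimension sequences to the
    column-dimension sequences."""
    dims = sorted(runs_by_dim)
    matrix = {}
    for di in dims:
        matrix[di] = {}
        for dj in dims:
            pairs = list(product(runs_by_dim[di], runs_by_dim[dj]))
            total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
            matrix[di][dj] = 100.0 * total / len(pairs)
    return matrix
```

Swapping `SequenceMatcher` for a normalized Smith-Waterman score would reproduce the paper's setup more faithfully.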
V-D Comparison with State-of-the-Art Algorithms
In this subsection, we compare the performance of ODM from the previous sections with that of state-of-the-art algorithms to demonstrate the potential and the effectiveness of the proposed framework.
V-D1 Comprehensive Comparison
To demonstrate this potential, on the LSO13 benchmark problems we compare the performance of ODM with six state-of-the-art algorithms: MTS [11], MOS [9], CSO [23], CC-CMA-ES [15], DECC-DG2 [26] and DECC-D [25]. These algorithms are known for their competitive performance on the LSO benchmark problems, so this is a strict test of the capacities of the proposed framework. Under the CEC’2018 official competition standards, we run each of the algorithms independently on each test case.
LSO13  ODM  MTS  MOS  CSO  CC-CMA-ES  DECC-DG2  DECC-D  

mean  std  mean  std  mean  std  mean  std  mean  std  mean  std  mean  std  
0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  3.01e2  1.22e1  9.54e5  1.57e6  4.23e12  9.08e13  
8.69e0  2.78e0  7.98e2  5.41e1  1.93e1  4.16e0  7.25e2  2.72e1  1.97e3  2.73e2  1.38e4  1.78e3  1.16e3  2.92e1  
9.83e13  5.27e14  9.18e13  6.31e14  0.00e0  0.00e0  1.19e12  2.11e14  1.20e13  3.08e15  1.07e1  8.71e1  8.52e10  5.18e11  
6.98e8  2.51e8  2.47e10  1.32e10  1.34e10  7.69e9  1.13e10  1.29e9  1.13e10  9.91e9  5.15e8  2.39e8  3.81e10  1.61e10  
2.68e6  4.38e5  1.02e7  1.17e6  1.11e7  1.76e6  7.66e5  1.17e5  8.72e6  2.52e6  2.51e6  4.49e5  9.71e6  1.77e6  
4.44e4  3.42e4  8.84e5  1.66e5  9.85e5  3.22e3  4.36e8  1.58e9  7.55e5  3.65e5  1.25e5  2.01e4  1.57e4  2.88e4  
1.58e5  3.68e4  7.08e7  7.60e7  2.31e7  4.12e7  7.94e6  2.88e6  5.37e6  1.15e7  1.54e7  1.01e7  3.29e9  1.14e9  
1.24e11  9.29e10  1.71e15  5.60e14  1.64e15  1.66e15  3.07e14  7.64e13  5.87e14  2.02e14  9.35e13  4.28e13  1.92e15  9.47e14  
2.36e8  2.61e7  7.32e8  9.60e7  8.97e8  1.39e8  4.59e7  8.57e6  5.19e8  1.70e8  3.06e8  7.37e7  7.18e8  1.14e8  
7.64e5  5.90e5  2.16e6  2.45e6  6.05e7  2.91e7  5.35e5  2.19e4  7.11e7  2.94e7  1.43e2  1.87e1  1.03e3  1.65e3  
3.33e7  1.08e7  6.26e9  1.64e10  4.01e10  1.23e11  3.87e8  1.13e8  4.52e8  1.18e9  8.82e9  2.60e10  4.13e10  9.40e10  
6.27e2  2.11e2  1.03e3  8.70e2  8.63e1  7.71e1  1.43e3  8.80e1  1.24e3  8.61e1  1.51e8  3.64e8  1.36e3  1.32e2  
1.14e7  2.20e6  1.30e9  1.51e9  1.13e9  7.71e8  5.65e8  1.87e8  7.42e9  4.97e9  9.62e8  3.80e8  4.33e10  9.30e9  
4.35e7  6.85e6  4.11e10  7.79e10  6.89e9  1.41e10  6.62e10  1.30e10  8.06e9  2.05e10  3.39e10  2.15e9  7.86e11  2.91e11  
3.66e6  2.32e5  4.07e7  1.23e7  1.31e8  6.02e7  1.59e7  1.06e6  3.51e6  1.02e6  1.55e7  1.36e6  5.39e7  4.53e6  
t-test  11/3/1  10/3/2  10/1/4  11/3/1  11/2/2  12/1/2  
rank  2.033  4.833  4.500  2.933  4.033  4.067  5.600  
For each test case, the greener the indicator, the better the performance. The best performance is in bold type and shaded grey.  
Friedman test: the best ranking is in bold type and shaded grey.  
If ODM performs better in terms of t-test results, the corresponding “l/u/g” string is in bold type and shaded grey. 
The color indicators, the Friedman rankings and the paired t-test results unanimously show that ODM achieves the best results among the compared algorithms. Furthermore, on several test cases, ODM achieved errors at least one order of magnitude lower than all the compared algorithms. Since we do not add any additional components, such as restart mechanisms, to further enhance the performance, it can be concluded that the potential of the proposed framework ODM is highly satisfactory.
V-D2 Scalability Comparison
We also want to validate the sensitivity of performance to the problem dimensionality, i.e., the scalability of ODM. In this part, we run tests on the scalable LSO08 problems with the same compared algorithms as in the last test. For each of the six problems in the LSO08 suite, we make test cases with dimensions 1000, 2500, 5000 and 10000, which makes 24 test cases in total. For each test case, we run each algorithm independently to gather the mean and standard deviation results and use the same analyses as before. The results are presented in Table VIII.
LSO08  ODM  MTS  CSO  CC-CMA-ES  DECC-DG2  DECC-D  

D  mean  std  mean  std  mean  std  mean  std  mean  std  mean  std  
1000  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  0.00e0  1.08e2  4.40e2  0.00e0  0.00e0  
2500  0.00e0  0.00e0  0.00e0  0.00e0  1.48e20  7.71e22  0.00e0  0.00e0  1.11e5  1.25e5  0.00e0  0.00e0  
5000  0.00e0  0.00e0  0.00e0  0.00e0  5.32e18  4.63e19  0.00e0  0.00e0  6.46e5  6.31e4  0.00e0  0.00e0  
10000  0.00e0  0.00e0  0.00e0  0.00e0  1.64e17  2.02e18  0.00e0  0.00e0  6.47e7  6.94e5  0.00e0  0.00e0  
1000  2.60e1  2.53e0  1.12e2  8.86e0  8.04e1  2.43e0  1.62e2  7.74e0  7.40e1  1.66e0  5.65e1  4.63e0  
2500  8.58e1  2.12e0  1.47e2  1.30e0  4.61e1  1.28e0  1.82e2  1.14e1  1.39e2  2.96e0  6.72e1  4.56e0  
5000  1.25e2  1.97e0  1.59e2  1.12e0  8.26e1  1.30e0  1.90e2  5.81e0  1.44e2  3.78e0  8.21e1  4.23e0  
10000  1.44e2  7.53e1  1.69e2  1.32e0  1.23e2  1.48e0  1.95e2  5.27e1  1.96e2  8.58e1  8.69e1  4.21e0  
1000  3.26e0  3.67e0  1.69e2  1.27e2  1.26e3  1.42e2  1.02e3  2.87e1  5.49e6  1.71e7  1.23e3  1.10e2  
2500  5.81e2  5.99e2  7.68e2  2.50e2  2.54e3  2.48e1  2.82e3  8.96e1  7.37e9  4.93e9  3.16e3  4.91e2  
5000  1.68e3  1.18e3  1.35e3  3.55e2  5.45e3  1.36e2  5.75e3  1.89e2  2.22e11  6.49e10  6.19e3  4.41e2  
10000  1.61e3  1.84e3  2.04e3  6.95e2  1.57e4  8.75e2  1.16e4  2.97e2  8.39e13  1.21e12  1.22e4  4.06e2  
1000  0.00e0  0.00e0  0.00e0  0.00e0  7.05e2  3.00e1  1.93e3  1.29e2  4.63e3  4.91e2  5.21e2  2.27e1  
2500  0.00e0  0.00e0  7.96e1  4.45e1  1.22e3  4.07e1  5.31e3  2.48e2  1.68e4  1.63e3  1.23e3  6.12e1  
5000  0.00e0  0.00e0  3.98e0  1.22e0  2.84e3  3.29e1  1.10e4  1.39e2  4.09e4  3.73e3  2.38e3  6.84e1  
10000  0.00e0  0.00e0  8.29e0  3.91e0  9.19e3  1.63e2  2.22e4  5.49e2  2.62e5  3.92e2  4.68e3  5.64e1  
1000  3.67e15  1.80e16  3.71e15  1.37e16  2.22e16  0.00e0  2.71e3  5.92e3  4.46e1  5.78e1  1.72e15  9.17e17  
2500  1.83e14  1.55e15  1.92e14  4.78e16  4.44e16  0.00e0  1.97e3  4.41e3  1.35e3  2.05e2  3.45e3  7.70e3  
5000  4.06e14  1.58e15  6.53e14  3.24e16  6.66e16  0.00e0  2.11e14  6.97e15  1.65e4  1.23e3  1.17e14  1.45e16  
10000  9.57e14  1.17e14  7.07e5  1.58e4  1.13e15  4.97e17  4.02e14  2.48e14  5.85e5  5.46e3  2.35e14  1.57e16  
1000  1.04e12  5.37e14  9.22e13  4.28e14  1.20e12  1.61e14  1.17e13  3.25e15  1.09e1  7.98e1  1.05e13  2.65e15  
2500  5.50e13  2.01e14  2.31e12  4.63e13  2.92e12  4.44e14  3.61e0  8.07e0  1.46e1  2.93e1  2.62e13  6.16e15  
5000  1.15e12  3.64e14  3.53e12  8.94e13  4.51e11  7.07e13  1.82e1  8.90e2  1.75e1  9.26e1  5.13e13  5.55e15  
10000  2.70e12  1.62e13  5.65e12  2.53e13  1.66e10  2.62e11  1.85e1  7.29e1  2.16e1  6.60e3  1.03e12  1.29e14  
t-test  1000  2/3/1  4/1/1  3/2/1  5/1/0  3/1/2  
2500  5/1/0  4/0/2  5/1/0  6/0/0  3/1/2  
5000  4/2/0  4/0/2  4/1/1  6/0/0  2/1/3  
10000  3/3/0  4/0/2  4/1/1  6/0/0  2/1/3  
rank  1000  2.25  3.08  3.67  4.00  5.50  2.50  
2500  1.92  2.92  2.83  4.42  5.67  3.25  
5000  2.42  3.08  3.17  4.42  5.50  2.42  
10000  2.25  3.08  3.50  3.92  6.00  2.25  
MOS is excluded from this set of tests since we cannot satisfactorily reproduce the algorithm or find results of these test cases in the literature.  
For each test case, the greener the indicator, the better the performance. The best performance is in bold type and shaded grey.  
All Friedman tests reject the null hypothesis. The best ranking is in bold type and shaded grey.  
If ODM performs better in terms of t-test results, the corresponding “l/u/g” string is in bold type and shaded grey. 
In a large share of the 24 test cases, ODM achieved the best performance. In terms of the Friedman tests, ODM consistently obtained the best ranking. In terms of the t-tests, ODM has the second-best performance, following DECC-D. DECC-D performs better in the higher dimensions, whereas ODM performs better on the lower-dimensional test cases. Though for the scalability comparison tests we cannot say that ODM achieves the best performance unanimously, the potential of the proposed framework is thoroughly demonstrated.
In Fig. 13, we present the real-time bands of the compared algorithms on one problem of the LSO08 suite. MTS obtains fast convergence in the lowest-dimensional test case but exhibits significant deterioration in performance when the problem dimensionality rises. This indicates that the heuristics articulated by MTS have limited scalability. Since there is no free lunch, this is true for any heuristic. The necessity of articulating more robust heuristics into metaheuristic frameworks leads to the necessity of friendly interfaces for heuristic articulation, which is a highlight of ODM.
VI Conclusion, Introspection & Future Works
Out of empirical concerns, this paper formulates a metaheuristic framework that makes online decisions according to a controller, which decides based on the recent performance of the articulated heuristics. The controller is purposely designed to address the problems of current LSO metaheuristic frameworks, with robust interfaces for practical use, simplicity to ensure low-variance performance, and theoretically derived guidelines for hyperparameter tuning. It has shown significant capability in action decisioning when articulated with heuristics on several benchmark problems, without embedding additional performance-enhancing components such as restart mechanisms.
Frankly, there are drawbacks of the proposed controller waiting to be addressed. First and foremost, the controller is built upon the assumption that the state of the Markov process changes sufficiently slowly, which is not guaranteed for general problems, or implicitly restricts the articulated heuristics to be not too costly in terms of resources. Second, the controller itself is a heuristic approach with no theoretical guarantee of closeness to the optimal decision sequence. Third, the identification of the change point using a constant-size sliding window is, though empirically proved effective, oversimplified.
In this paper, we cast hybrid LSO problems as solving MDPs. Though there are many effective and theoretically backed approaches to solving a single MDP, such as reinforcement learning, in black-box scenarios we cannot use them until we can propose a satisfactory approach for transferring the knowledge of one MDP to other problems to achieve better results, i.e., to properly set up a transfer learning framework. We will focus our future work on this transfer learning setup, which should be built upon effective invariance relations that capture the mutual information of black-box problems.
References
 [1] R. MartinezCantin, “Funneled bayesian optimization for design, tuning and control of autonomous systems,” IEEE Trans. Cybern., pp. 1–12, 2018.

 [2] X. Ma, X. Li, Q. Zhang et al., “A survey on cooperative coevolutionary algorithms,” IEEE Trans. Evol. Comput., pp. 1–1, 2018.
 [3] R. Cheng, M. N. Omidvar, A. H. Gandomi et al., “Solving incremental optimization problems via cooperative coevolution,” IEEE Trans. Evol. Comput., pp. 1–1, 2018.

 [4] Y. Cao, H. Zhang, W. Li et al., “Comprehensive learning particle swarm optimization algorithm with local search for multimodal functions,” IEEE Trans. Evol. Comput., pp. 1–1, 2018.
 [5] F. van den Bergh and A. P. Engelbrecht, “A cooperative approach to particle swarm optimization,” IEEE Trans. Evol. Comput., vol. 8, no. 3, pp. 225–239, 2004.
 [6] H. Ge, L. Sun, G. Tan et al., “Cooperative hierarchical pso with two stage variable interaction reconstruction for large scale optimization,” IEEE Trans. Cybern., vol. 47, no. 9, pp. 2809–2823, 2017.
 [7] I. Loshchilov, T. Glasmachers, and H. Beyer, “Large scale blackbox optimization by limitedmemory matrix adaptation,” IEEE Trans. Evol. Comput., pp. 1–1, 2018.
 [8] Z. Li, Q. Zhang, X. Lin et al., “Fast covariance matrix adaptation for largescale blackbox optimization,” IEEE Trans. Cybern., pp. 1–11, 2018.
 [9] A. LaTorre, S. Muelas, and J. M. Pena, “Multiple offspring sampling in large scale global optimization,” in IEEE Congr. Evol. Comput., 2012, pp. 1–8.
 [10] A. Bolufé-Röhler, S. Fiol-González, and S. Chen, “A minimum population search hybrid for large scale global optimization,” in IEEE Congr. Evol. Comput., 2015, pp. 1958–1965.
 [11] L.Y. Tseng and C. Chen, “Multiple trajectory search for large scale global optimization,” in IEEE Congr. Evol. Comput., 2008, pp. 3052–3059.
 [12] N. Hansen and A. Auger, “Principled design of continuous stochastic search: From theory to practice,” Theory and Principled Methods for the Design of Metaheuristics, pp. 145–180, 2014.
 [13] S. Ye, G. Dai, L. Peng et al., “A hybrid adaptive coevolutionary differential evolution algorithm for largescale optimization,” in IEEE Congr. Evol. Comput., 2014, pp. 1277–1284.
 [14] D. Molina and F. Herrera, “Iterative hybridization of de with local search for the CEC’2015 special session on large scale global optimization,” in IEEE Congr. Evol. Comput., 2015, pp. 1974–1978.
 [15] J. Liu and K. Tang, “Scaling up covariance matrix adaptation evolution strategy using cooperative coevolution,” in International Conference on Intelligent Data Engineering and Automated Learning. Springer, 2013, pp. 350–357.
 [16] B. Shahriari, K. Swersky, Z. Wang et al., “Taking the human out of the loop: A review of bayesian optimization,” Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, 2016.
 [17] N. Hansen, “The CMA evolution strategy: A tutorial,” ArXiv, vol. 1604, no. 00772v1, 2016.

 [18] L. DaCosta, A. Fialho, M. Schoenauer et al., “Adaptive operator selection with dynamic multi-armed bandits,” in Conference on Genetic and Evolutionary Computation, 2008, pp. 913–920.
 [19] V. Kuleshov and D. Precup, “Algorithms for multi-armed bandit problems,” J. Mach. Learn. Res., vol. 1, pp. 1–48, 2000.
 [20] M. N. Omidvar, X. Li, Z. Yang et al., “Cooperative coevolution for large scale optimization through more frequent random grouping,” in IEEE Congr. Evol. Comput., 2010, pp. 1–8.
 [21] Z. Yang, K. Tang, and X. Yao, “Large scale evolutionary optimization using cooperative coevolution,” Info. Sci., vol. 178, no. 15, pp. 2985–2999, 2008.
 [22] R. Tanabe and A. S. Fukunaga, “Improving the search performance of shade using linear population size reduction,” in IEEE Congr. Evol. Comput., 2014, pp. 1658–1665.
 [23] R. Cheng and Y. Jin, “A competitive swarm optimizer for large scale optimization,” IEEE Trans. Cybern., vol. 45, no. 2, pp. 191–204, 2015.
 [24] D. Molina, A. LaTorre, and F. Herrera, “SHADE with iterative local search for largescale global optimization,” in IEEE Congr. Evol. Comput., 2018, pp. 1–8.
 [25] M. N. Omidvar, X. Li, and X. Yao, “Cooperative coevolution with Delta grouping for large scale nonseparable function optimization,” in IEEE Congr. Evol. Comput., 2010, pp. 1–8.
 [26] M. N. Omidvar, M. Yang, Y. Mei et al., “DG2: A faster and more accurate differential grouping for largescale blackbox optimization,” IEEE Trans. Evol. Comput., vol. 21, no. 6, pp. 929–942, 2017.
 [27] M. N. Omidvar, X. Li, Y. Mei et al., “Cooperative coevolution with differential grouping for large scale optimization,” IEEE Trans. Evol. Comput., vol. 18, no. 3, pp. 378–393, 2014.
 [28] X. Li, K. Tang, M. N. Omidvar et al., “Benchmark functions for the CEC’2013 special session and competition on largescale global optimization,” Evolutionary Computing and Machine Learning group, RMIT, Australia, Tech. Rep., 2013.
 [29] T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,” Jour. Molec. Bio., vol. 147, no. 1, pp. 195–197, 1981.