Online Decisioning Meta-Heuristic Framework for Large Scale Black-Box Optimization

by Mingde Zhao, et al.

Motivated by practical concerns and aiming at high overall efficiency of resource utilization, this paper transforms large-scale black-box optimization problems with limited resources into online decision problems from the perspective of dynamic multi-armed bandits, a simplified view of Markov decision processes. The proposed Online Decisioning Meta-heuristic framework (ODM) is particularly well suited for real-world applications: it is flexibly compatible with various kinds of costs, provides interfaces for easy heuristic articulation, and has fewer hyper-parameters, which reduces performance variance. Experimental results on benchmark functions suggest that ODM has significant capabilities for online decisioning. Furthermore, when ODM is articulated with three heuristics, it achieves competitive performance on benchmark problems with search dimensions up to 10000.






I Introduction

Optimization problems with unknown input-output relations or structures too complex to capture are regarded as “black-box” problems [1]. Requiring minimal assumptions about the problems and no domain-specific knowledge, heuristic methods show promising capabilities in addressing these problems [2, 3, 4]. Yet their performance deteriorates as the dimensionality of the problems increases [5, 6, 7]. This phenomenon, widely known as the “curse of dimensionality”, has become one of the key topics in Large Scale Optimization (LSO). To address the performance deterioration in LSO problems, diverse enhancements have emerged.

Efforts have been put into investigating heuristics with higher performance [8]. However, empirical evidence suggests that although these approaches partly address the bottlenecks, their effectiveness inevitably varies when they are applied to problems with different characteristics. This no-free-lunch phenomenon directs researchers to the field of meta-heuristics: higher-level procedures that find, generate, or select heuristics that may provide a sufficiently good solution to an optimization problem.

One of the most prominent meta-heuristics is the Multiple Offspring Sampling (MOS) algorithm [9]. MOS is a parametric framework designed to dynamically allocate the limited resources provided for optimization, within which users can focus on embedding more powerful heuristics to enhance performance. When MOS is articulated with powerful heuristics and used with appropriate hyper-parameters, it achieves state-of-the-art performance.

From the perspective of dynamic multi-armed bandit problems, a simplified version of Markov Decision Processes (MDP), this paper proposes an Online Decisioning Meta-heuristic framework (ODM) that addresses several concerns with current meta-heuristics and shows strong potential for practical use.

The rest of this paper is organized as follows. In Section II, we first critically review existing meta-heuristics and then motivate this work step by step. In Section III, we present the details of the proposed framework ODM. In Section IV, we present theoretical analyses of ODM and guidelines for its practical use. In Section V, we provide empirical results that validate the effectiveness and potential of the framework. Finally, Section VI concludes the paper.

II Related Works & Motivation

Empirical evidence suggests that the effectiveness of heuristics changes when they are applied to problems with different characteristics. The no-free-lunch theorem tells us that there can be no universal problem solver, so combining the merits of different heuristics seems a promising direction.

II-A Related Works

Under the setting of limited resources, optimization under a meta-heuristic framework becomes a matter of designing clever resource allocation methods so that the objective value found is as small as possible.

II-A1 Allocation of Resources

At its core, meta-heuristic design is the art of efficient resource allocation. The literature contains diverse ideas about how resources should be allocated.

One allocation idea is not to consider allocation at all, i.e. to fix the resources for each heuristic and run the heuristics alternately. Algorithms of this kind, such as [10], are often designed for a particular class of problems and have limited generalization ability. Some meta-heuristics also use this idea implicitly [4]. For instance, knowing that LS1, one of its three articulated heuristics, behaves particularly well on the LSO08 benchmark problems, Multiple Trajectory Search (MTS) [11] always uses LS1 at the end of each iteration. Fixing the resources can be interpreted as introducing prior knowledge into meta-heuristic design, which harms generalization when the meta-heuristic is applied to real-world scenarios.

Another idea is to dynamically adjust the resources distributed to each heuristic in each iteration. For example, MOS [9] runs each of its articulated heuristics within each iteration and dynamically adjusts the resources for each heuristic according to its performance. Adjusting the resources, or more accurately changing the configuration of the articulated heuristics, is problematic because heuristics often require sufficient iterations to show their full capacity for the current state of optimization [12]. To use this kind of allocation, we can only use heuristics that are insensitive to the amount of provided resources, which rules out many potentially good heuristic articulation choices.

There are also ideas that assign probabilities to each heuristic and treat the articulated heuristics as actions instead of changing their configurations. In [13], the authors proposed a greedy resource allocation method, which simply switches between the two articulated heuristics when one of them performs poorly. Yet greedy exploitation of heuristic performance without timely exploration cannot exploit the collaborative synergy among heuristics and may lead to premature convergence. In [14], the algorithm adjusts the probabilities of selecting the articulated local search heuristics. The key idea in this stochastic setting is to balance exploration and exploitation via the probability adjustments, which are often handled unsatisfactorily and unsystematically.

II-A2 Use of Feedback

Modeling and utilizing feedback is another key issue, since feedback guides the distribution of resources. Proper modeling and utilization of feedback contributes to the goal of minimizing the objective function as much as possible with the limited resources.

Modeling feedback is the problem of deciding which information to use. Some algorithms model the quality of the offspring as feedback from the optimization process. The representative meta-heuristic of this kind, MOS, employs the notions of offspring quality and participation ratios to dynamically allocate resources to each articulated heuristic. The quality of offspring shares an implicit connection with maximizing the effectiveness of resource utilization. Another widely used model of feedback is the relative improvement, employed for the adaptive decision among grouping methods in [15]. However, this model of feedback lacks translation invariance and can only be used when the global minimum value of the problem is known a priori.

II-A3 Framework Interface and Simplicity

The interfaces of meta-heuristic frameworks are crucial factors for their applicability to real-world scenarios. In our view, there are three main practical concerns regarding the interfaces and the simplicity of a framework.

The first concern is the friendliness of the interface for different types of costs. Flexibility of the interface for costs or resources is well pursued in Bayesian optimization methods [16, 1]. Using different kinds of acquisition functions, the Bayesian optimization framework can minimize the objective with respect to different types of resources, for example the number of function evaluations (often used in benchmark problems), time (often used when tuning hyper-parameters of machine learning models) or money (often used when tuning expensive real-world systems).

The second concern is the friendliness of the interface for heuristic articulation. To be applied to specific real-world problems, meta-heuristic frameworks should be easy to articulate with different kinds of heuristics specifically designed for different domains. The need for easy articulation requires us to put minimal constraints on the heuristics. This further explains why we regard changing the configurations of the heuristics as a drawback: it imposes an additional constraint, requiring the heuristics to be insensitive to changes in resources.

The third concern is the variance of the meta-heuristic framework. MOS obtains competitive performance via careful tuning of several hyper-parameters. In practical applications where costs must be taken seriously, tuning the hyper-parameters of the meta-heuristic framework is too costly to be considered. Thus, the more hyper-parameters a framework has, the higher its performance variance when they cannot be tuned. We should therefore lower the performance variance of the framework and decrease its parameter sensitivity as much as possible, aiming to make the meta-heuristic quasi-hyper-parameter-free.

II-B Motivation

II-B1 Ultimate Goal is Efficiency

Whether or not meta-heuristics are used, the ultimate pursuit of optimization should be “efficiency”, in the sense that, with limited computational resources, we should improve the global best fitness as much as possible. In a meta-heuristic setting, calling a heuristic at a particular stage of the optimization process results in a local improvement of the global best fitness and a local cost of some given resources. Acknowledging this, a meta-heuristic should capture the local efficiency of each call of the heuristics well and try to maximize the cumulative local efficiency throughout the entire optimization process.

II-B2 Dynamic Multi-Armed Bandit

Based on both empirical evidence and theoretical analyses, decreasing the resources allocated to a certain heuristic is likely to decrease its effectiveness [12, 17]. If a heuristic is penalized in terms of resources for behaving badly once, it is likely to suffer long-term damage. Since we do not want to undermine the abilities of the articulated heuristics during optimization, we naturally turn to the idea of treating the strategies as immutable actions. More specifically, we fix the built-in heuristics at presumably their best configurations to fully exploit their abilities, and use the feedback of the optimization process to make online decisions on which action to take.

The notion of “efficiency” naturally accommodates the different costs of taking different actions, since it considers improvement and cost simultaneously. With these considerations, the meta-heuristic problem is transformed into an online decisioning problem that aims to maximize the cumulative efficiency of taking actions. We can fit this online decision problem into the systematic framework of Markov Decision Processes (MDP), under which the goal is to maximize the expected cumulative reward. Unfortunately, the MDP of black-box optimization can hardly be solved with classical MDP approaches such as dynamic programming, since we cannot obtain the Markov kernels for black-box objectives and the behaviors of heuristics.

However, if we reasonably assume that the reward distribution of taking a certain action changes slowly across consecutive states of the MDP, i.e. the state of the Markov process changes slowly, we can simplify the MDP into a dynamic multi-armed bandit problem, where the reward distribution of each arm changes gradually through time. The view of meta-heuristics as dynamic multi-armed bandits can be traced back to [18]. In the dynamic multi-armed bandit setting, we must find an effective change point such that the rewards after the change point well approximate the rewards in the near future. The online decisioning framework from the perspective of dynamic multi-armed bandits is presented in Fig. 1.

Fig. 1: The framework of Online Decisioning Meta-heuristics (ODM)

There are several successful algorithms for classical stochastic multi-armed bandits, e.g. softmax, ε-greedy and UCB [19]. These algorithms aim to achieve the highest cumulative reward by trading off exploration and exploitation among the actions. For optimization, the type of reward distribution (the distribution of local efficiency for taking each action) is unknown, so a non-parametric algorithm should be used. In this paper, we choose the softmax algorithm for its simplicity (only one hyper-parameter), its stochasticity (non-deterministic decisions) and its competitive performance (softmax performs very competitively among non-parametric algorithms for multi-armed bandit problems) [19].

II-B3 Behavioral Consistency

Applying heuristics to different types of problems yields feedback whose magnitudes are not necessarily the same. Also, in black-box optimization problems, the effectiveness of heuristics is observed to generally decrease as the optimization proceeds. These phenomena indicate that the magnitudes of the rewards for online decisions can be inconsistent, potentially resulting in inconsistent behavior of softmax online decisions. For example, when the rewards are very small, softmax acts almost randomly; when the rewards are huge, softmax acts greedily. Normally, for multi-armed bandits, the hyper-parameter of softmax, which represents the degree of greediness, is pre-tuned. For optimization, we cannot tune this parameter for each algorithm in advance. Thus, we want a reliable decisioning mechanism that behaves consistently in exploration and exploitation regardless of the magnitudes of the rewards. To this end, we introduce normalization into the decisioning so that similar behavior is achieved no matter the magnitude of the rewards.
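To make the scale problem concrete, the following minimal Python sketch (our illustration, not the authors' code) applies softmax, assumed here to take the form exp(β·v) with a hypothetical β = 1, to the same three reward profiles at two magnitudes. Only the scale differs, yet the behavior ranges from almost uniform to almost greedy.

```python
import math

def softmax(values, beta=1.0):
    """Softmax probabilities exp(beta * v) / sum(exp(beta * v)); beta is a hypothetical name."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Mean rewards of three hypothetical actions; the third is always the best,
# only the magnitude differs between the two scenarios.
shape = [1.0, 2.0, 3.0]
tiny = [v * 1e-4 for v in shape]  # e.g. late-stage, tiny improvements
huge = [v * 1e2 for v in shape]   # e.g. early-stage, huge improvements

p_tiny = softmax(tiny)
p_huge = softmax(huge)
print([round(p, 4) for p in p_tiny])  # close to uniform: almost random decisions
print([round(p, 4) for p in p_huge])  # nearly all mass on the best action: greedy
```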

II-C Expectations of Design

After going through the motivations step-by-step, the idea for our meta-heuristic framework can be summarized as: devising a stochastic online decisioning process which makes decisions based on the local efficiency of the calls of heuristics with consistent behavior, expecting to maximize the overall efficiency of resource utilization.

III Online Decisioning Meta-Heuristic

This section gives details of the controller, an essential object that is responsible for the online decisioning of the proposed meta-heuristic framework.

III-A Efficiency & Formalization

We first define local efficiency, which represents the ratio of the local improvement of the objective function to the consumed resources (local cost) when calling a heuristic at a particular stage. Once local improvement is defined, local efficiency follows directly as the ratio of local improvement to local cost.

In general scenarios, in contrast to benchmark problems, the ground-truth global minima are unknown. That is to say, a consistent definition of local improvement should be independent of the global optimum. It should also be translation invariant, which means that the local improvement should not change if the search landscape is shifted up or down. Mathematically, let $\Delta_a(f_0, f_1)$ be a translation-invariant definition of local improvement when the global best fitness is improved from $f_0$ to $f_1$ by taking action $a$; then we have

$$\Delta_a(f_0 + c,\, f_1 + c) = \Delta_a(f_0,\, f_1), \quad \forall c \in \mathbb{R}.$$

Among all possible definitions of $\Delta_a$ satisfying this functional equation, the simplest is the absolute improvement $\Delta_a(f_0, f_1) = f_0 - f_1$. Using absolute improvement may seem naïve, since different objective functions generally have different magnitudes of absolute improvement. However, since a normalization technique will be introduced, we can use this simple definition without caring about the difference in magnitude. With this definition of local improvement, we define local efficiency as the ratio of the absolute improvement in objective values to the cost of resources:

$$e = \frac{f_0 - f_1}{r_0 - r_1},$$

where $f_0$ and $f_1$ are the best known objective values before and after taking the action, and $r_0$ and $r_1$ are the amounts of remaining resources before and after taking the action. Since normalization will be used, $r_0$ and $r_1$ can be described in various units, e.g. ratios of remaining resources or absolute numbers of remaining resources.
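As a minimal sketch of these definitions (function names are our own, hypothetical choices), the code below computes local efficiency for a minimization problem, checks that absolute improvement is unaffected by shifting the landscape, and shows that measuring resources in evaluations or as a fraction of a hypothetical budget of 10000 evaluations only rescales the efficiency by a constant factor, which the later normalization removes.

```python
import math

def absolute_improvement(f_before: float, f_after: float) -> float:
    """Local improvement f0 - f1 of the best known objective value (minimization)."""
    return f_before - f_after

def local_efficiency(f_before: float, f_after: float,
                     r_before: float, r_after: float) -> float:
    """Local efficiency: absolute improvement divided by the resources consumed."""
    cost = r_before - r_after
    if cost <= 0:
        raise ValueError("an action must consume a positive amount of resources")
    return absolute_improvement(f_before, f_after) / cost

# Translation invariance: shifting the landscape by c leaves the improvement unchanged.
f0, f1, c = 12.5, 10.0, 1e6
assert absolute_improvement(f0 + c, f1 + c) == absolute_improvement(f0, f1)

# The same call measured in two units (hypothetical budget of 10000 evaluations).
BUDGET = 10000
e_evals = local_efficiency(f0, f1, 3000, 2000)                    # per evaluation
e_ratio = local_efficiency(f0, f1, 3000 / BUDGET, 2000 / BUDGET)  # per budget fraction
print(e_evals, e_ratio, math.isclose(e_ratio, e_evals * BUDGET))
```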

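Using the defined efficiency, the goal is formalized as

$$\max \sum_{k} \left(f_0^{(k)} - f_1^{(k)}\right) = \max \sum_{k} e_k \left(r_0^{(k)} - r_1^{(k)}\right) \quad \text{s.t.} \quad \sum_{k} \left(r_0^{(k)} - r_1^{(k)}\right) \le R,$$

where $k$ indexes the successive calls of heuristics, $f_0^{(k)}, f_1^{(k)}$ and $r_0^{(k)}, r_1^{(k)}$ are the best known objective values and the remaining resources before and after the $k$-th call, $e_k$ is its local efficiency, and $R$ is the amount of resources provided for the whole optimization process.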

A stream of five-element tuples (called “records”) is used to record the states and the actions. A record looks like

$$(a,\ r_0,\ r_1,\ f_0,\ f_1),$$

which denotes that when action $a$ was taken while the remaining resources went from $r_0$ to $r_1$, the best known objective value improved from $f_0$ to $f_1$. From each record, the improvement and the cost can be calculated easily.
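The record stream can be sketched, for example, as a Python `NamedTuple`. The field names and their order are assumptions for illustration; the action labels reuse the names of the heuristics from Section V, and the numbers are made up.

```python
from typing import List, NamedTuple

class Record(NamedTuple):
    """One record (a, r0, r1, f0, f1); field names and order are assumptions."""
    action: str      # heuristic that was called
    r_before: float  # remaining resources before the call (r0)
    r_after: float   # remaining resources after the call (r1)
    f_before: float  # best known objective value before the call (f0)
    f_after: float   # best known objective value after the call (f1)

    @property
    def improvement(self) -> float:
        return self.f_before - self.f_after

    @property
    def cost(self) -> float:
        return self.r_before - self.r_after

    @property
    def efficiency(self) -> float:
        return self.improvement / self.cost

# A made-up stream: the remaining budget shrinks and the best value improves.
stream: List[Record] = [
    Record("LS", 10000, 9000, 50.0, 40.0),
    Record("CC", 9000, 7000, 40.0, 36.0),
    Record("GS", 7000, 6500, 36.0, 35.0),
]
for rec in stream:
    print(rec.action, rec.improvement, rec.cost, rec.efficiency)
```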

Online decisioning based on local efficiency is particularly useful in practice for its flexibility. For example, in problems where the number of black-box evaluations is limited, efficiency is defined as the ratio of absolute improvement to the number of evaluations consumed. In practical scenarios with other types of resources, such as financial cost, time or other forms of cost, we can use correspondingly different definitions of efficiency.

III-B Change Point, Normalization & Sliding Window

The multi-armed bandit perspective treats the local efficiency records as the rewards for taking actions, and the core problem is how to deal with the ever-changing reward distributions. We must find an effective change point (a certain position in the record stream) such that the rewards after the change point well approximate the rewards in the near future. The change point between the “ancient” rewards and the “recent” rewards directly influences the quality of decisions.

Suppose for now that we already have a good change point. How can we make use of the reward information when the efficiency records generally decrease? An effective approach to offset the decreasing trend is necessary for consistent decisioning behavior. We should not try to offset the decreasing trend for each action separately, since doing so makes the rewards of different actions incomparable and undermines the effectiveness of decisioning. A reasonable solution is to normalize all the efficiency records after the change point together. This operation can be justified by three points. First, normalization negates the influence of the diverse magnitudes of improvements across problems. Second, as we will show, normalization bounds the probabilities of exploration and exploitation so that behavioral consistency can be achieved. Last, normalization gives us the flexibility of using absolute improvement as the numerator and resource consumption in any unit as the denominator of the defined efficiency. In this way, we offset the influence of the decreasing trends and make the efficiency records comparable for online decisioning.
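A minimal sketch of this joint normalization (helper names hypothetical): all efficiency records in the window are mapped linearly to [0, 1] together, not per action, and the mean normalized reward of each action is taken afterwards. Rescaling every record by the same constant leaves the result unchanged, which is the behavioral consistency sought above. The text does not say how a window whose records are all equal is handled; mapping them to 0 is our assumption (any constant yields a uniform softmax).

```python
from typing import Dict, List, Tuple

Record = Tuple[str, float]  # simplified record: (action, local efficiency)

def normalize(rewards: List[float]) -> List[float]:
    """Map rewards linearly onto [0, 1] (min -> 0, max -> 1)."""
    lo, hi = min(rewards), max(rewards)
    if hi == lo:  # assumption: an all-equal window maps to 0
        return [0.0] * len(rewards)
    return [(r - lo) / (hi - lo) for r in rewards]

def mean_normalized(window: List[Record]) -> Dict[str, float]:
    """Normalize the whole window jointly, then average per action."""
    normed = normalize([r for _, r in window])
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for (action, _), n in zip(window, normed):
        sums[action] = sums.get(action, 0.0) + n
        counts[action] = counts.get(action, 0) + 1
    return {a: sums[a] / counts[a] for a in sums}

window = [("LS", 4.0), ("CC", 1.0), ("GS", 2.0), ("LS", 3.0), ("CC", 0.5)]
scaled = [(a, r * 1e-6) for a, r in window]  # same rewards, a million times smaller
print(mean_normalized(window))
print(mean_normalized(scaled))  # identical up to floating-point rounding
```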

Let us return to the determination of an effective change point. In statistical analysis, change point detection techniques are typically used and seem to be a good choice. However, two considerations drove us away from them. First, change point detection methods often require a moderate number of points before raising an alarm, and we cannot assume or guarantee that the reward distribution of each action changes slowly enough, so the effectiveness of these techniques is likely to be unsatisfactory. Second, the distributional assumptions and hyper-parameters of these methods make the online decisioning sensitive and cause higher variance. For a robust controller, sensitive hyper-parameters are intolerable.

In this paper, we use one of the simplest change point strategies: fixing the distance between the current position and the dividing line, i.e. using a sliding window of constant length $L$. Though this method indeed oversimplifies the problem, it makes the behavior convenient to analyze both theoretically and empirically.
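In Python, a fixed-length sliding window is conveniently expressed with `collections.deque(maxlen=L)`, which discards the oldest record automatically once L records are held, so the change point always sits L records behind the newest one. This is one possible implementation with an arbitrary, hypothetical L = 4, not necessarily how the authors store the stream.

```python
from collections import deque

L = 4  # hypothetical sliding window length
window = deque(maxlen=L)  # holds at most L records; older ones are dropped

incoming = [("LS", 0.9), ("CC", 0.5), ("GS", 0.4), ("LS", 0.3), ("CC", 0.2), ("GS", 0.1)]
for record in incoming:
    window.append(record)  # appending beyond L evicts the oldest record

print(list(window))  # only the L most recent records remain
```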

III-C Non-Stop Exploration and Soft Decisioning

We can never conclude that a heuristic that has behaved badly historically will not behave well now. Thus, exploration of the rewards of each action must be guaranteed. Also, the softmax decision requires at least one reward record for each action. Both concerns are addressed by a simple rule: if no record of an action is found within the sliding window, the controller chooses that action directly. This rule ensures non-stop exploration throughout the whole optimization process, however poorly the hyper-parameters are chosen.

The key operation of the controller is as follows: we take the mean of the normalized rewards (efficiency records) of each action within the sliding window and apply the softmax operation to these means to obtain the probability distribution of selecting each action. Then, we randomly draw an action according to this distribution (roulette-wheel selection). The overall decisioning behavior is formulated as pseudocode in Algorithm 1.


III-D Example

Fig. 2 gives an example of online decisioning with a sliding window of length $L$ and three articulated actions. Since the window initially contains no records of any of the three actions, the first three decisions simply take each of them in turn. For the subsequent decisions, we linearly normalize the rewards in the window to the interval [0, 1] and make softmax decisions. Whenever no reward of an action remains within the window, that action is taken directly.

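Input: A (set of actions, |A| = K), S (reward stream), L (hyper-parameter: sliding window size), β (hyper-parameter: greediness for softmax)
Output: a* (chosen action)
// Setting change point: obtain the fragment F of the reward stream with length L
F ← the latest L records of S
// Non-stop exploration rule: if no record of an action is found, take that action
for a ∈ A do
    if F contains no record of a then
        return a
// Reward normalization: normalize the rewards in F linearly to the interval [0, 1]
F ← (F − min F) / (max F − min F)
// Softmax decisioning: take a roulette action based on the probability distribution of actions
for a ∈ A do
    μ_a ← mean of the normalized rewards of a in F
    p_a ← exp(β μ_a) / Σ_{b ∈ A} exp(β μ_b)
draw u uniformly from [0, 1)
c ← 0
for a ∈ A do
    c ← c + p_a
    if u < c then return a
Algorithm 1 Online Decisioning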
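The following is a runnable Python sketch of Algorithm 1 under stated assumptions: records are (action, efficiency) pairs held in a `deque` window of length L, the softmax is taken as exp(β·mean), and roulette selection is delegated to `random.Random.choices`, which accepts unnormalized weights. The toy loop uses hypothetical, stationary reward levels for the three heuristic names from Section V; it illustrates the decision rule and is not the authors' implementation.

```python
import math
import random
from collections import deque
from typing import Deque, Dict, Sequence, Tuple

Reward = Tuple[str, float]  # (action, local efficiency)

def online_decision(actions: Sequence[str], window: Deque[Reward],
                    beta: float, rng: random.Random) -> str:
    """One decision of the controller, following the steps of Algorithm 1.

    `window` already holds only the latest L records (the change point is
    implied by deque(maxlen=L)); every action in it must be in `actions`.
    """
    # Non-stop exploration rule: take any action without a record in the window.
    seen = {a for a, _ in window}
    for a in actions:
        if a not in seen:
            return a
    # Reward normalization: map all rewards in the window jointly onto [0, 1].
    rewards = [r for _, r in window]
    lo, hi = min(rewards), max(rewards)
    span = hi - lo
    sums: Dict[str, float] = {a: 0.0 for a in actions}
    counts: Dict[str, int] = {a: 0 for a in actions}
    for a, r in window:
        sums[a] += (r - lo) / span if span > 0 else 0.0
        counts[a] += 1
    # Softmax over the mean normalized rewards (assumed form exp(beta * mean)).
    weights = [math.exp(beta * sums[a] / counts[a]) for a in actions]
    # Roulette-wheel selection: random.choices normalizes the weights itself.
    return rng.choices(list(actions), weights=weights, k=1)[0]

# Toy run with hypothetical, stationary mean efficiencies per heuristic.
rng = random.Random(0)
actions = ["LS", "CC", "GS"]
base = {"LS": 3.0, "CC": 1.0, "GS": 1.0}
window: Deque[Reward] = deque(maxlen=10)  # sliding window length L = 10
tally = {a: 0 for a in actions}
for _ in range(300):
    choice = online_decision(actions, window, beta=5.0, rng=rng)
    tally[choice] += 1
    window.append((choice, base[choice] + rng.gauss(0.0, 0.1)))
print(tally)
```

With these made-up rewards, LS dominates the decisions, while CC and GS are still revisited whenever they drop out of the window, which is the non-stop exploration rule at work.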

Fig. 2: Example for online decisioning.

IV Theoretical Analyses

IV-A Computational Complexity

For time complexity, the mean and softmax operations cost $O(L)$ per decision (since $K \le L$), which is trivial compared to the heuristics. For space complexity, we discard the historical records outside the sliding window since they will not be used again, resulting in a space complexity of at most $O(L)$, which is also trivial. These low space and time complexities are desirable.

IV-B Probability Bounds for Behavior

The controller has only two hyper-parameters, $L$ (length of the sliding window) and $\beta$ (temperature coefficient for softmax). With them, we can bound the behavior of the controller, which serves as a guideline for hyper-parameter tuning.

Suppose that there are $K$ actions and denote the length of the sliding window by $L$ ($L \ge K$). The controller ensures that the fragment of length $L$ contains at least one record of each action. Thus, without considering the softmax decisions, the maximum probability of greedy exploitation is $(L-K+1)/L$, the exploration probability is at least $(K-1)/L$, and the probability of choosing each action is at least $1/L$.

Proposition IV.1 (Exploitation Bounds for ODM).

Suppose every action corresponds to at least one efficiency record within the sliding window of size $L$ and $\beta$ is the parameter for softmax decisions. Then the probability of taking the action with the highest mean normalized record (exploitation) in the fragment of length $L$ satisfies the bounds


Fig. 3: Demonstration for stretching fragments.

Proof. After the controller's normalization, the highest reward within the sliding window is 1 and the lowest is 0. We first stretch the reward stream fragment corresponding to the sliding window into a rectangle by putting the rewards of each action in the corresponding row, as shown in Fig. 3.

Making softmax decisions within the fragment does not depend on the order of the rewards, so an equivalence class of stretched fragments can be obtained by permuting the fragment before stretching. Suppose that $a^*$ is the best-performing action; we can show that the upper bound is attained within the class equivalent to

where we can put in the yellow cells and in the cyan cells but at most one element in each column. Similarly, we can get the lower bound within the class equivalent to

where “” represents normalized rewards that are infinitely close to but still less than and thus the lower bound cannot be reached.

The probability bounds for exploration follow directly by subtracting the exploitation bounds from 1. ∎

The proposition gives us a measure of the behavior of the proposed framework. We can use the bounds inversely to search for hyper-parameters efficiently.

IV-C Theoretically Derived Guidelines for Practical Use

To apply ODM to a practical problem, one should first consider articulating heuristics specialized for the corresponding domain. Then, proper hyper-parameters should be set. In ODM, the search for effective choices of the only two hyper-parameters, $L$ and $\beta$, can be assisted by the bounds above. If we know how many heuristics are to be articulated and have a preferred interval for the probability of exploration or exploitation, we can inversely locate the potential combinations of hyper-parameters and thereby decrease the risk of using bad hyper-parameters. For example, if there are in total heuristics to be articulated, , we can first constrain by and simplify the exploitation bound as


Given a preferred exploitation probability interval, we can easily solve for $\beta$ using the upper bound, since only $\beta$ is involved in the upper bound. Then, we use the solved $\beta$ and $K$ to get $L$. For example, given a target exploitation interval for softmax decisions, we first solve for $\beta$ using the upper bound and take the closest integer solution; then, using this $\beta$ and $K$, we find the closest integer solution for $L$. Once a theoretically derived combination $(\beta, L)$ is obtained, the bound also shows which two neighboring combinations behave most similarly to it. Our suggestion is, if conditions permit, to conduct hyper-parameter tuning within these three combinations and see which performs best on the problems. If we do not have the resources for hyper-parameter tuning, we may simply use the combination with the smallest $L$, since a smaller $L$ leads to quicker adaptivity.
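As a hedged sketch of the first step of this procedure, the code below assumes the softmax takes the form exp(β·μ) over mean normalized rewards μ in [0, 1] with β ≥ 0. Under that assumption, the exploitation probability can never exceed e^β / (e^β + K − 1), attained when the best action's mean is 1 and all other means are 0; since this expression involves only β and K, it inverts in closed form as β = ln(p(K − 1)/(1 − p)). The lower bound, which also involves L, is not covered here, and the target p = 0.9 is purely hypothetical.

```python
import math
import random

def exploitation_upper_bound(beta: float, K: int) -> float:
    """Supremum of the best action's softmax probability when all mean
    normalized rewards lie in [0, 1] and beta >= 0 (assumed form exp(beta * mean))."""
    return math.exp(beta) / (math.exp(beta) + K - 1)

def beta_for_upper_bound(p: float, K: int) -> float:
    """Solve exp(beta) / (exp(beta) + K - 1) = p for beta; requires 1/K < p < 1."""
    if not 1.0 / K < p < 1.0:
        raise ValueError("p must lie strictly between 1/K and 1")
    return math.log(p * (K - 1) / (1.0 - p))

def best_action_probability(means, beta):
    """Softmax probability of the action with the highest mean."""
    weights = [math.exp(beta * m) for m in means]
    return max(weights) / sum(weights)

K = 3  # three articulated heuristics, as in Section V
beta = beta_for_upper_bound(0.9, K)  # hypothetical target: at most 90% exploitation
print(round(beta, 3))  # ln 18, about 2.89

# Sanity check on random means in [0, 1): the bound is never exceeded.
rng = random.Random(42)
for _ in range(10000):
    means = [rng.random() for _ in range(K)]
    assert best_action_probability(means, beta) <= exploitation_upper_bound(beta, K) + 1e-12
```

Rounding β to the closest integer, as the guideline suggests, then shifts the attainable maximum slightly away from the target, which can be checked by evaluating `exploitation_upper_bound` at the rounded value.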

V Experimental Studies

V-A Experimental Settings

V-A1 Framework Setting

To validate the effectiveness of the controller, we articulate it with three heuristic approaches, namely LS1 (LS), cooperative coevolution with random grouping (CC) and global search (GS). This is by no means the best possible combination. We choose these heuristics for three main reasons. First, empirical evidence shows that these three powerful optimization heuristics are effective on different kinds of problems, so combining them is potentially beneficial for dealing with the complexities of LSO problems. Second, these heuristics are used in the literature, so we can borrow the knowledge of fine-tuning them to satisfactory configurations, which is the same routine we suggested in the guidelines. Finally, these heuristics overlap considerably with the state-of-the-art algorithms, which makes the comparison reasonable. The details of the heuristics are presented in Table I.

Name | Costs | Details | Configurations
LS | | Local search strategy used in MTS [11] and MOS [9], resembling trajectory-search-based algorithms. | The same as the configuration in MOS: the initial step size is , where and are the lower and upper box constraints respectively. The minimal step size is .
CC | | Cooperative coevolution with random grouping and SaNSDE [20] as the optimizer, resembling the DECC family [21]. | A robust and classical configuration: the mean of NP is set to , the group size is , and generations are assigned to each group.
GS | | Global search that applies SHADE [22] to all dimensions of the problem, resembling direct optimization strategies, e.g. CSO [23]. | The same configuration as SHADE-ILS [24]: NP is set to , running for generations.
TABLE I: Three Heuristics used for Experiments

Based on empirical knowledge, we want to set the probability of exploitation approximately in the interval . Using the bounds, we can obtain that the top three integer solutions of are , and . In the experiments, we will first conduct sensitivity analyses on the three combinations of hyper-parameters to locate a best setting.

V-A2 Compared Algorithm

In the later part of this section, the performance of ODM will be compared with several state-of-the-art algorithms, including CC-CMA-ES [15], DECC-D [25], DECC-DG2 [26], MTS [11] and CSO [23]. MTS updates each of the foreground individuals with the local search method that has the best sampled performance [11]. CSO uses a pairwise competition strategy to let losers learn from winners [23]. DECC-DG2 is a faster and more accurate version of DECC-DG [27], in which an accurate decomposition into sub-components is first computed by perturbing the objective function, after which the sub-components are cooperatively coevolved [26]. DECC-D is designed for non-separable LSO problems, where cooperative coevolution is conducted on the sub-components generated by frequently applying the Delta grouping scheme [25].

V-A3 Problem Settings

There are two sets of benchmark problems used in this section.

The CEC’2013 LSO (LSO13) benchmark suite represents a wider range of real-world large-scale optimization problems. With ill-conditioned, complicated sub-components and irregularities [28], the LSO13 suite serves well to test the overall capabilities of ODM. For the test cases in the LSO13 suite, the resource for optimization is the total number of objective function evaluations, and the problem dimensions are roughly 1000.

The CEC’2008 Large Scale Optimization (LSO08) benchmark problems are naturally scalable to higher dimensions, which will serve as the test problems for our stability analyses on the impact of problem dimensionality on the controller behavior. For the test cases derived from the LSO08 benchmark suite, , where is the dimensionality of the search space.

Additionally, the default machine epsilon for IEEE-754 double precision, i.e. , is adopted as the numerical tolerance for all results: any value less than or equal to it is regarded as 0.
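The two settings above, a budget counted in objective function evaluations and a machine-epsilon cutoff for reported errors, can be sketched as a small wrapper around the black-box objective. The class and exception names are hypothetical. The cutoff is assumed to be Python's `sys.float_info.epsilon` (the IEEE-754 double machine epsilon, 2^-52), and the known optimum `f_opt` is assumed to be available, as it is for benchmark functions. The ratio of consumed evaluations corresponds to the "ratio of resource consumption" used when plotting real-time errors.

```python
import sys

EPS = sys.float_info.epsilon  # IEEE-754 double machine epsilon, 2**-52 (assumed cutoff)


class BudgetExhausted(Exception):
    """Raised when the evaluation budget (maxFEs) has been spent."""


class BudgetedObjective:
    """Wrap a black-box objective and count evaluations against maxFEs."""

    def __init__(self, f, max_fes: int, f_opt: float = 0.0):
        self.f, self.max_fes, self.f_opt = f, max_fes, f_opt
        self.used = 0
        self.best_error = float("inf")

    def __call__(self, x):
        if self.used >= self.max_fes:
            raise BudgetExhausted(f"budget of {self.max_fes} evaluations spent")
        self.used += 1
        value = self.f(x)
        error = value - self.f_opt
        if error <= EPS:  # errors at or below machine epsilon are reported as 0
            error = 0.0
        self.best_error = min(self.best_error, error)
        return value

    @property
    def consumed_ratio(self) -> float:
        """Fraction of the evaluation budget used so far."""
        return self.used / self.max_fes


def sphere(x):
    return sum(v * v for v in x)


objective = BudgetedObjective(sphere, max_fes=1000)
objective([1e-9] * 10)  # error of about 1e-17 is below epsilon, so it is stored as 0
print(objective.best_error, objective.consumed_ratio)  # 0.0 0.001
```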

V-A4 Data Source

Results of all compared algorithms except MOS are obtained from our implementations, which follow the corresponding papers, on the single-objective optimization experiment platform SOOPLAT, which hosts the source code of all reproduced algorithms. We could not satisfactorily reproduce MOS with performance consistent with the literature, so its results are copied directly from the literature. The parameter settings of all algorithms are identical to those used in the original papers.

V-A5 Statistical Test Settings

To establish statistically the overall performance rankings of the algorithms in comprehensive analyses, Friedman tests are conducted in this section with significance level on the mean results gathered from many independent runs of every algorithm on every test case. If a Friedman test gives , we can conclude that the performance rankings it produces are statistically meaningful.
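A minimal sketch of how Friedman mean ranks and the chi-square statistic can be computed from a table of mean errors, with one row per test case and one column per algorithm. It uses the textbook statistic without a tie correction, and the function names are hypothetical. The p-value, from a chi-square distribution with k - 1 degrees of freedom, is left to a statistics library (e.g. `scipy.stats.friedmanchisquare`, which also applies a tie correction). The usage example takes the mean columns of the first four rows of Table II.

```python
from typing import List, Sequence, Tuple


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank one test case's results in ascending order (1 = smallest error).
    Tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(row)), key=lambda k: row[k])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman(results: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """results[case][alg] is a mean error; returns (mean ranks, chi-square statistic)."""
    n, k = len(results), len(results[0])
    rank_rows = [average_ranks(row) for row in results]
    mean_ranks = [sum(r[a] for r in rank_rows) / n for a in range(k)]
    # Friedman statistic without tie correction, approx. chi-square with k - 1 dof.
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, chi2


# Means from the first four rows of Table II (three settings, in table column order).
table2_head = [
    [0.00e0, 0.00e0, 0.00e0],
    [8.69e0, 1.20e1, 1.28e1],
    [9.83e-13, 8.51e-13, 1.05e-12],
    [6.98e8, 6.04e8, 6.27e8],
]
print(friedman(table2_head))  # ([2.0, 1.5, 2.5], 2.0)
```

As in the tables, a smaller mean rank on errors indicates better performance.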

To show one-on-one performance differences between algorithms on individual test cases, paired t-tests are also conducted with a significance level of . A paired t-test tells us whether we can confidently say that one algorithm performs better than another on a given test case. In this section, we report collective t-test results as an “l/u/g” string: l is the number of test cases in which we are highly confident that the first algorithm gives smaller results than the second one, g is the number of test cases in which the opposite holds, and u is the number of test cases in which we cannot confidently tell the results apart.
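The “l/u/g” summary can be sketched as follows, assuming per-run errors of both algorithms are available for each test case. The function names are hypothetical. Because the significance level and number of runs are not recoverable from this text, the two-sided critical value `t_crit` is supplied by the caller; 3.182 is the two-sided 5% value for 3 degrees of freedom, matching the 4-run toy data. The run data below are toy values, not results from the paper.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic of per-run results a and b (equal length, at least 2 runs)."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:  # identical differences: t is 0 or unbounded
        m = mean(diffs)
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def lug_string(cases: List[Tuple[Sequence[float], Sequence[float]]], t_crit: float) -> str:
    """Summarize paired t-tests over test cases as "l/u/g" for (first, second) algorithm."""
    n_less = n_undecided = n_greater = 0
    for first, second in cases:
        t = paired_t(first, second)
        if t < -t_crit:
            n_less += 1  # first algorithm confidently gives smaller errors
        elif t > t_crit:
            n_greater += 1  # first algorithm confidently gives larger errors
        else:
            n_undecided += 1  # no confident difference
    return f"{n_less}/{n_undecided}/{n_greater}"


# Toy per-run errors (4 runs per test case), not data from the paper.
cases = [
    ([1, 2, 3, 4], [5, 6, 7, 9]),
    ([1, 2, 3, 4], [1, 2, 3, 4]),
    ([9, 8, 7, 6], [1, 1, 1, 2]),
]
print(lug_string(cases, t_crit=3.182))  # 1/1/1
```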

V-B Hyper-Parameter Sensitivity

Using the probability bound and the desired interval for the probability of exploitation as guidelines for hyper-parameter tuning, the search space of the hyper-parameters collapses into distinct points. We run ODM times with each of these configurations on the LSO13 benchmark problems to identify the best hyper-parameters. The corresponding results are presented in Table II.

mean std mean std mean std
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
8.69E+00 2.78E+00 1.20E+01 2.54E+00 1.28E+01 4.70E+00
9.83E-13 5.27E-14 8.51E-13 5.78E-14 1.05E-12 5.35E-14
6.98E+08 2.51E+08 6.04E+08 2.02E+08 6.27E+08 2.99E+08
2.68E+06 4.38E+05 2.72E+08 5.12E+05 2.70E+06 6.04E+05
4.44E+04 3.42E+04 7.14E+04 1.92E+04 9.48E+04 1.87E+04
1.58E+05 3.68E+04 1.34E+05 5.02E+04 2.46E+05 7.81E+04
1.24E+11 9.29E+10 2.61E+11 1.51E+11 7.16E+11 5.66E+11
2.36E+08 2.61E+07 2.25E+08 4.38E+07 2.52E+08 4.64E+07
7.64E+05 5.90E+05 8.18E+05 5.79E+05 8.00E+05 2.63E+05
3.33E+07 1.08E+07 2.76E+07 8.55E+06 4.45E+07 3.48E+07
6.27E+02 2.11E+02 6.69E+02 2.09E+02 4.32E+02 2.57E+02
1.14E+07 2.20E+06 1.07E+07 2.24E+06 1.05E+07 1.13E+07
4.35E+07 6.85E+06 4.17E+07 7.80E+06 4.29E+07 1.00E+07
3.66E+06 2.32E+05 3.63E+06 2.01E+06 4.25E+06 9.17E+05
t-test 3/9/3 6/8/1
Color indicators are added for each test case. The greener, the better performance.
TABLE II: Performance under Different Hyper-Parameters

Judging from the t-test results, the hyper-parameter pairs and perform similarly well. The pair performs relatively poorly; we attribute this to its sliding window being too long to capture changes in the reward distributions in time. In the remainder of this section, we use the setting.
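To make the window-length argument concrete, the sketch below tracks one action's reward with a fixed-length sliding-window mean, a common estimator in dynamic multi-armed bandits. It counts how many updates are needed, after an abrupt drop in reward, before the estimate falls below a threshold. The class, function and parameter names, the 0/1 rewards and the 0.5 threshold are illustrative assumptions, not ODM's actual controller.

```python
from collections import deque


class SlidingWindowMean:
    """Mean of the most recent `window` rewards observed for one action (arm)."""

    def __init__(self, window: int):
        self.rewards = deque(maxlen=window)  # oldest reward is dropped automatically

    def update(self, reward: float) -> None:
        self.rewards.append(reward)

    def value(self) -> float:
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.0


def steps_to_react(window: int, before: float = 1.0, after: float = 0.0,
                   n_before: int = 100, threshold: float = 0.5) -> int:
    """Number of post-change updates until the windowed mean drops below threshold."""
    if after >= threshold:
        raise ValueError("post-change reward must lie below the threshold")
    est = SlidingWindowMean(window)
    for _ in range(n_before):  # stationary phase: reward stays at `before`
        est.update(before)
    steps = 0
    while est.value() >= threshold:  # abrupt change: reward drops to `after`
        est.update(after)
        steps += 1
    return steps


for w in (4, 20, 80):
    print(w, steps_to_react(w))  # reaction time grows roughly as w / 2
```

Under these assumptions, a window of 4 reacts after 3 updates, whereas a window of 80 needs 41, illustrating why an overly long window tracks a changing reward distribution too slowly.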

V-C ODM Validation

In this section, we validate the effectiveness of the proposed ODM by comparing it with several baseline algorithms, described in detail in Table III.

Baseline Description
LS Uses only LS throughout the whole optimization process.
CC Uses only CC.
GS Uses only GS.
random Purely random choices for action decisions. A baseline for comparative study on the decision behaviors.
BEK The BEst Known average performance achieved with actions chosen from LS, CC and GS. The best known decision sequences come from extensive stochastic runs using the three heuristics. For each problem, we run the unique best known sequence independently. The performance obtained with the “BEK” baseline indicates the full potential of the articulated framework and serves as the approximate lower bound of the error obtained by ODM. The quality of ODM's decision sequences can be measured by their similarity to the BEK sequences.
TABLE III: Descriptions for Baseline Algorithms

V-C1 Performance and Decision Quality Analysis

We run ODM and the baselines on the test cases times and gather the results in Table IV, together with the similarity scores of the decision sequences with respect to the sequences used in the BEK baseline. The similarity scores are computed as the mean values of the local alignment scores obtained with the Smith-Waterman method [29]. The higher the similarity score, the more similar the decision sequences are to the best known sequence. The performance and similarity results are visualized as two radar charts in Fig. 4.
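A sketch of a Smith-Waterman local alignment score for decision sequences encoded as strings of action symbols. The encoding ('L', 'C', 'G' for LS, CC and GS), the scoring scheme (match +1, mismatch -1, gap -1) and the normalization by the maximum attainable score are assumptions; the text does not state which were used for Table IV. A per-case score as in the table would then be the mean of `similarity(seq, best_known)` over the runs.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between action sequences a and b."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Local alignment score normalized by its maximum, match * min(len(a), len(b))."""
    if not a or not b:
        return 0.0
    return smith_waterman(a, b, match, mismatch, gap) / (match * min(len(a), len(b)))


print(smith_waterman("GLCGL", "LCG"))    # 3
print(similarity("LLCCGG", "LCCG"))      # 1.0
```

Local (rather than global) alignment rewards a shared run of decisions even when it appears at different points of the optimization.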

Fig. 4: Radar charts for performance and decision similarity with respect to the BEK baseline. is excluded since the best known sequence is not unique. For each test case, the first radar chart is obtained by mapping the mean log errors into the interval . The second radar chart directly uses the similarity scores in Table IV.
mean std mean std sim mean std sim mean std sim mean std sim mean std sim
0.00e0 0.00e0 0.00e0 0.00e0 4.41e-10 1.36e-9 0.00e0 0.00e0 4.77e-4 1.32e-3 6.85e5 1.01e5
2.25e-2 1.02e-2 8.69e0 2.78e0 78.13% 2.44e2 3.83e1 31.56% 5.54e2 4.02e1 23.87% 2.40e0 1.50e0 92.25% 9.89e3 1.45e3 2.33%
4.85e-14 2.05e-14 9.83e-13 5.27e-14 69.13% 9.06e-13 6.55e-14 28.57% 1.04e-12 6.51e-14 71.43% 1.19e0 1.11e-1 31.43% 6.63e0 4.41e-1 0.00%
5.45e8 1.45e8 6.98e8 2.51e8 51.43% 1.23e9 3.90e8 16.47% 1.86e10 9.98e9 5.88% 1.22e11 6.65e10 12.25% 6.34e8 1.00e8 68.24%
9.54e5 4.32e5 2.68e6 4.38e5 30.34% 2.52e6 9.94e5 42.15% 1.23e7 2.93e6 11.05% 1.26e7 3.42e6 20.01% 2.20e6 6.93e5 65.98%
3.93e4 3.55e4 4.44e4 3.42e4 44.51% 5.99e4 3.35e4 46.58% 9.84e5 4.13e3 4.29% 9.82e5 4.62e3 2.17% 4.02e4 3.32e4 71.35%
8.07e4 9.25e3 1.58e5 3.68e4 44.25% 4.10e5 4.22e5 33.15% 3.10e8 4.81e8 19.77% 3.13e9 2.20e9 24.24% 1.06e6 1.00e5 33.37%
2.93e9 4.35e9 1.24e11 9.29e10 51.25% 1.45e12 1.42e12 44.42% 2.96e15 1.75e15 21.24% 4.93e16 3.12e16 9.13% 1.55e11 1.66e10 49.18%
1.43e8 2.25e7 2.36e8 2.61e7 26.14% 2.08e8 4.02e7 31.15% 1.01e9 1.91e8 12.31% 9.34e8 2.05e8 8.31% 2.63e8 4.35e7 26.75%
7.70e5 6.62e5 7.64e5 5.90e5 85.86% 9.33e5 5.28e5 42.00% 6.37e7 3.54e7 3.50% 6.46e7 3.40e7 6.87% 1.06e6 4.34e5 34.63%
1.35e7 2.21e6 3.33e7 1.08e7 71.54% 1.84e8 9.53e7 22.78% 1.07e11 1.19e11 3.15% 1.17e11 1.14e11 9.55% 1.34e7 2.60e6 83.36%
2.25e-4 2.25e1 6.27e2 2.11e2 32.57% 8.12e2 2.69e2 18.82% 6.14e2 2.65e2 31.10% 2.42e3 4.79e2 24.56% 2.33e3 1.77e2 3.87%
4.45e5 2.17e5 1.14e7 2.20e6 49.79% 9.28e7 1.05e8 22.53% 3.36e9 1.63e9 15.64% 5.34e9 1.90e9 19.35% 1.18e7 1.61e6 45.98%
5.99e5 1.01e5 4.35e7 6.85e6 31.95% 5.36e7 1.12e7 19.59% 1.25e11 1.41e11 11.25% 1.60e11 9.10e10 15.25% 5.04e7 5.73e6 28.65%
9.71e5 4.24e4 3.66e6 2.32e5 66.87% 8.78e6 2.82e6 31.66% 3.05e8 9.52e7 16.97% 2.15e9 2.77e9 22.65% 8.59e6 1.08e6 28.89%
-rank performance/sim 1.63 4.29 2.53 3.32 3.70 1.79 4.67 2.14 2.67 3.46
t-test 9/5/1 13/2/0 14/0/1 8/5/2
We cannot compute similarity scores for since the best decision sequence is not unique.
For each test case, the greener the indicators, the better the performance.
The Friedman tests are conducted among ODM, RAN, LS, CC and GS. For the Friedman tests on the errors, . The smaller the -ranks, the better the performance. For the Friedman tests on the similarity scores, . The larger the -ranks, the more similar the decision sequences are to the best known ones.
The t-tests are conducted pairwise on ODM and RAN, ODM and LS, ODM and CC, and ODM and GS.
TABLE IV: Baseline Comparison Results

By analyzing the results in the table and the radar charts, we obtain the following observations:

  1. In terms of the objective function values, ODM achieves significantly better performance than the baselines (RAN, LS, CC and GS), according to both the t-test and the Friedman test results. This is strong evidence for the effectiveness of the online decisions. On , , , and , ODM achieves performance roughly equivalent to that of the best-performing single heuristic. This is notable because the controller's non-stop exploration would be expected to add overhead in identifying the globally best action rather than to minimize it. The phenomenon indicates that non-stop exploration is beneficial even though it potentially causes overhead, which justifies the principles used in the controller design. On , , and , ODM achieves collaborative synergy, i.e. performance significantly better than that of any single heuristic. On , , and , ODM achieves performance statistically equivalent to the BEK baseline, which approximates the best performance ODM can achieve with the three articulated heuristics.

  2. In terms of decision similarity to the best known sequences, ODM's decision sequences are significantly more similar than those of the RAN baseline and of all single-heuristic baselines. This means that ODM is able to discover effective sequences of actions for the optimization.

To illustrate the effectiveness of online decisions intuitively, Fig. 8 presents three sets of real-time error bands for test cases in which collaborative synergy is achieved. For , although ODM converges faster than the BEK baseline in the early stages, BEK achieves better overall performance. This suggests that ODM can still be improved by designing more effective rules for balancing exploration and exploitation.

Fig. 8: Real-time error bands for , and of the LSO13 suite. In each diagram, the x-axes represent the ratio of resource consumption and the y-axes represent the errors (differences from the global optimum). Each band consists of the mean curve of the real-time performance data at evenly sampled x-values and the confidence interval around it, shaded in a lighter color. Each band visualizes the performance over all independent runs of a certain algorithm on a certain test case.
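A band like those in the figure can be computed from the per-run error traces sampled at common resource ratios. This is a sketch under assumptions: the function name is hypothetical, and the confidence level is not stated, so a 95% normal-approximation interval (`z = 1.96`) is used. The run data are toy values.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def error_band(runs: Sequence[Sequence[float]],
               z: float = 1.96) -> Tuple[List[float], List[float], List[float]]:
    """runs[r][t] is the error of run r at the t-th evenly sampled resource ratio.
    Returns the mean curve and the lower/upper edges of a normal-approximation CI."""
    n = len(runs)
    centre, lower, upper = [], [], []
    for column in zip(*runs):  # all runs at one sampled x-value
        m = mean(column)
        half = z * stdev(column) / math.sqrt(n) if n > 1 else 0.0
        centre.append(m)
        lower.append(m - half)
        upper.append(m + half)
    return centre, lower, upper


# Toy data: three runs, errors sampled at resource ratios 0.25, 0.5, 0.75 and 1.0.
runs = [
    [9.0, 4.0, 2.0, 1.0],
    [8.0, 5.0, 2.0, 1.0],
    [10.0, 3.0, 2.0, 1.0],
]
mean_curve, lo, hi = error_band(runs)
print(mean_curve)  # [9.0, 4.0, 2.0, 1.0]
```

Because the errors span many orders of magnitude, computing the band on log10 errors may be preferable; which variant the figures use is not stated.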

V-C2 Stability Analysis

A stable meta-heuristic framework must be able to deal effectively with LSO problems of different dimensionality. The “curse of dimensionality” reminds us that the effectiveness of the articulated heuristics inevitably deteriorates as the dimensionality of the problems increases. Thus, if the controller leads to competitive performance regardless of widely ranging problem dimensions, we can argue that it has satisfactory stability and robustness. To this end, comparative trials are conducted on the LSO08 benchmark problems to . We perform independent runs for all test cases corresponding to six problems with D ∈ {1000, 2500, 5000, 10000}. Note that for the LSO08 tests, maxFEs scales linearly with the problem dimension, and the same holds for our settings of the heuristics. Hence the lengths of the decision sequences in different dimensions are roughly the same and thus comparable. Table V gives the mean and standard deviation of the performance of ODM over independent runs on each test case, and Table VI presents the similarity score matrices of the decision sequences.
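The claim that sequence lengths are comparable across dimensions follows from a one-line calculation. Assume, purely for illustration, that the budget is maxFEs(D) = cD and that each invocation of an articulated heuristic consumes roughly kD evaluations. Here c and k are hypothetical constants, not values from the paper. The number of decisions T is then independent of D:

```latex
% c: budget per dimension, k: evaluations per decision per dimension (both hypothetical)
T(D) \approx \frac{\mathrm{maxFEs}(D)}{\mathrm{cost\ per\ decision}(D)}
     = \frac{c\,D}{k\,D} = \frac{c}{k}
```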

1000 2500 5000 10000
Problem mean std mean std mean std mean std
0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0
2.60e1 2.53e0 8.58e1 2.12e0 1.25e2 1.97e0 1.44e2 7.53e-1
3.26e0 3.67e0 5.81e2 5.99e2 1.68e3 1.18e3 1.61e3 1.84e3
0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0
3.67e-15 1.80e-16 1.83e-14 1.55e-15 4.06e-14 1.58e-15 9.57e-14 1.17e-14
1.04e-12 5.37e-14 5.50e-13 2.01e-14 1.15e-12 3.64e-14 2.70e-12 1.62e-13
is excluded since we do not know its ground truth minimum.
TABLE V: Performance on LSO08 Problems with Different Dimensions
D 1000 2500 5000 10000
1000 1000 52.14% 48.34% 23.07%
2500 85.40% 2500 46.26% 31.68%
5000 78.46% 81.33% 5000 43.49%
10000 71.98% 75.47% 77.35% 10000

D 1000 2500 5000 10000
1000 1000 66.63% 24.13% 12.75%
2500 55.12% 2500 42.67% 19.57%
5000 43.09% 53.25% 5000 33.26%
10000 24.25% 32.75% 54.33% 10000

D 1000 2500 5000 10000
1000 1000 70.13% 66.75% 62.11%
2500 71.33% 2500 72.25% 69.33%
5000 51.24% 60.01% 5000 68.24%
10000 33.26% 39.66% 44.25% 10000
Each entry is the mean similarity score of decision sequences for a problem explained by the row dimensionality indicator and the column dimensionality indicator. For example, the bottom left is the mean value of similarity scores of the sequences under to the sequences under .
Color indicators have been added for each problem. The greener the cell is shaded, the more similar the corresponding decision sequences are. The redder, the less similar.
TABLE VI: Similarity Score Matrices for LSO08 problems

From the similarity matrices, it can be observed that, generally, the larger the difference in problem dimensionality, the less similar the sequences are. Although the similarity scores seem relatively small, the performance under different dimensionalities is roughly equivalent (the differences in performance may seem quite significant until they are compared with those of other algorithms), and within of problems it is very close to the global optimum. This indicates that, although the patterns of the decision sequences change with problem dimensionality, the quality of the decisions remains insensitive to it, providing further evidence for the effectiveness and robustness of ODM.

V-D Comparison with State-of-the-Art Algorithms

In this subsection, we compare the performance of ODM reported in the previous sections with that of state-of-the-art algorithms to demonstrate the potential and effectiveness of the proposed framework.

V-D1 Comprehensive Comparison

To demonstrate this potential, on the LSO13 benchmark problems we compare the performance of ODM with six state-of-the-art algorithms: MTS [11], MOS [9], CSO [23], CC-CMA-ES [15], DECC-DG2 [26] and DECC-D [25]. These algorithms are known for their competitive performance on LSO benchmark problems, which makes this a strict test of the capacity of the proposed framework. Following the CEC’2018 official competition standards, we run each algorithm times on each test case.

mean std mean std mean std mean std mean std mean std mean std
0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 3.01e-2 1.22e-1 9.54e5 1.57e6 4.23e-12 9.08e-13
8.69e0 2.78e0 7.98e2 5.41e1 1.93e1 4.16e0 7.25e2 2.72e1 1.97e3 2.73e2 1.38e4 1.78e3 1.16e3 2.92e1
9.83e-13 5.27e-14 9.18e-13 6.31e-14 0.00e0 0.00e0 1.19e-12 2.11e-14 1.20e-13 3.08e-15 1.07e1 8.71e-1 8.52e-10 5.18e-11
6.98e8 2.51e8 2.47e10 1.32e10 1.34e10 7.69e9 1.13e10 1.29e9 1.13e10 9.91e9 5.15e8 2.39e8 3.81e10 1.61e10
2.68e6 4.38e5 1.02e7 1.17e6 1.11e7 1.76e6 7.66e5 1.17e5 8.72e6 2.52e6 2.51e6 4.49e5 9.71e6 1.77e6
4.44e4 3.42e4 8.84e5 1.66e5 9.85e5 3.22e3 4.36e-8 1.58e-9 7.55e5 3.65e5 1.25e5 2.01e4 1.57e4 2.88e4
1.58e5 3.68e4 7.08e7 7.60e7 2.31e7 4.12e7 7.94e6 2.88e6 5.37e6 1.15e7 1.54e7 1.01e7 3.29e9 1.14e9
1.24e11 9.29e10 1.71e15 5.60e14 1.64e15 1.66e15 3.07e14 7.64e13 5.87e14 2.02e14 9.35e13 4.28e13 1.92e15 9.47e14
2.36e8 2.61e7 7.32e8 9.60e7 8.97e8 1.39e8 4.59e7 8.57e6 5.19e8 1.70e8 3.06e8 7.37e7 7.18e8 1.14e8
7.64e5 5.90e5 2.16e6 2.45e6 6.05e7 2.91e7 5.35e-5 2.19e-4 7.11e7 2.94e7 1.43e2 1.87e1 1.03e3 1.65e3
3.33e7 1.08e7 6.26e9 1.64e10 4.01e10 1.23e11 3.87e8 1.13e8 4.52e8 1.18e9 8.82e9 2.60e10 4.13e10 9.40e10
6.27e2 2.11e2 1.03e3 8.70e2 8.63e1 7.71e1 1.43e3 8.80e1 1.24e3 8.61e1 1.51e8 3.64e8 1.36e3 1.32e2
1.14e7 2.20e6 1.30e9 1.51e9 1.13e9 7.71e8 5.65e8 1.87e8 7.42e9 4.97e9 9.62e8 3.80e8 4.33e10 9.30e9
4.35e7 6.85e6 4.11e10 7.79e10 6.89e9 1.41e10 6.62e10 1.30e10 8.06e9 2.05e10 3.39e10 2.15e9 7.86e11 2.91e11
3.66e6 2.32e5 4.07e7 1.23e7 1.31e8 6.02e7 1.59e7 1.06e6 3.51e6 1.02e6 1.55e7 1.36e6 5.39e7 4.53e6
t-test 11/3/1 10/3/2 10/1/4 11/3/1 11/2/2 12/1/2
-rank 2.033 4.833 4.500 2.933 4.033 4.067 5.600
For each test case, the greener the indicator, the better the performance. The best performance is in bold type and shaded grey.
Friedman test: , . The best ranking is in bold type and shaded grey.
If ODM performs better in terms of -test results, the corresponding “l/u/g” string is in bold type and shaded grey.
TABLE VII: Comparative Results on LSO13 Problems

The color indicators, the Friedman rankings and the paired t-test results unanimously show that ODM achieves the best results among the compared algorithms. Furthermore, on test cases, ODM achieves errors at least one order of magnitude lower than those of all the compared algorithms. Since no additional components, such as restart mechanisms, were added to further enhance performance, these results indicate that the proposed framework has considerable potential.
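The “at least one order of magnitude” criterion can be checked mechanically from the mean errors. The sketch below, with a hypothetical function name, counts the test cases in which ODM's mean error is at least ten times smaller than the best mean error among the compared algorithms. A zero error for ODM counts only if every competitor's error is non-zero. The usage example takes the means from rows 1, 2, 7 and 8 of Table VII (ODM first) and is not meant to reproduce the full count.

```python
import math
from typing import Sequence


def order_of_magnitude_wins(rows: Sequence[Sequence[float]], factor: float = 10.0) -> int:
    """rows: per test case, [odm_mean, competitor_mean_1, competitor_mean_2, ...]."""
    wins = 0
    for odm, *others in rows:
        best_other = min(others)
        if odm > 0:
            ratio = best_other / odm
        else:  # ODM reached 0: a win only if no competitor also reached 0
            ratio = math.inf if best_other > 0 else 1.0
        if ratio >= factor:
            wins += 1
    return wins


table7_means = [
    [0.00e0, 0.00e0, 0.00e0, 0.00e0, 3.01e-2, 9.54e5, 4.23e-12],     # row 1
    [8.69e0, 7.98e2, 1.93e1, 7.25e2, 1.97e3, 1.38e4, 1.16e3],         # row 2
    [1.58e5, 7.08e7, 2.31e7, 7.94e6, 5.37e6, 1.54e7, 3.29e9],         # row 7
    [1.24e11, 1.71e15, 1.64e15, 3.07e14, 5.87e14, 9.35e13, 1.92e15],  # row 8
]
print(order_of_magnitude_wins(table7_means))  # 2
```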

V-D2 Scalability Comparison

We also want to validate the sensitivity of performance to problem dimensionality, i.e. the scalability of ODM. In this part, we run tests on the scalable LSO08 problems with the same compared algorithms as in the last test. For each of the six problems in the LSO08 suite, we create test cases with dimensions 1000, 2500, 5000 and 10000, which makes 24 test cases in total. For each test case, we run each algorithm independent times to gather the mean and standard deviation of the results, and apply the same analyses as before. The results are presented in Table VIII.


D mean std mean std mean std mean std mean std mean std
1000 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 0.00e0 1.08e2 4.40e2 0.00e0 0.00e0
2500 0.00e0 0.00e0 0.00e0 0.00e0 1.48e-20 7.71e-22 0.00e0 0.00e0 1.11e5 1.25e5 0.00e0 0.00e0
5000 0.00e0 0.00e0 0.00e0 0.00e0 5.32e-18 4.63e-19 0.00e0 0.00e0 6.46e5 6.31e4 0.00e0 0.00e0
10000 0.00e0 0.00e0 0.00e0 0.00e0 1.64e-17 2.02e-18 0.00e0 0.00e0 6.47e7 6.94e5 0.00e0 0.00e0
1000 2.60e1 2.53e0 1.12e2 8.86e0 8.04e1 2.43e0 1.62e2 7.74e0 7.40e1 1.66e0 5.65e1 4.63e0
2500 8.58e1 2.12e0 1.47e2 1.30e0 4.61e1 1.28e0 1.82e2 1.14e1 1.39e2 2.96e0 6.72e1 4.56e0
5000 1.25e2 1.97e0 1.59e2 1.12e0 8.26e1 1.30e0 1.90e2 5.81e0 1.44e2 3.78e0 8.21e1 4.23e0
10000 1.44e2 7.53e-1 1.69e2 1.32e0 1.23e2 1.48e0 1.95e2 5.27e-1 1.96e2 8.58e-1 8.69e1 4.21e0
1000 3.26e0 3.67e0 1.69e2 1.27e2 1.26e3 1.42e2 1.02e3 2.87e1 5.49e6 1.71e7 1.23e3 1.10e2
2500 5.81e2 5.99e2 7.68e2 2.50e2 2.54e3 2.48e1 2.82e3 8.96e1 7.37e9 4.93e9 3.16e3 4.91e2
5000 1.68e3 1.18e3 1.35e3 3.55e2 5.45e3 1.36e2 5.75e3 1.89e2 2.22e11 6.49e10 6.19e3 4.41e2
10000 1.61e3 1.84e3 2.04e3 6.95e2 1.57e4 8.75e2 1.16e4 2.97e2 8.39e13 1.21e12 1.22e4 4.06e2
1000 0.00e0 0.00e0 0.00e0 0.00e0 7.05e2 3.00e1 1.93e3 1.29e2 4.63e3 4.91e2 5.21e2 2.27e1
2500 0.00e0 0.00e0 7.96e-1 4.45e-1 1.22e3 4.07e1 5.31e3 2.48e2 1.68e4 1.63e3 1.23e3 6.12e1
5000 0.00e0 0.00e0 3.98e0 1.22e0 2.84e3 3.29e1 1.10e4 1.39e2 4.09e4 3.73e3 2.38e3 6.84e1
10000 0.00e0 0.00e0 8.29e0 3.91e0 9.19e3 1.63e2 2.22e4 5.49e2 2.62e5 3.92e2 4.68e3 5.64e1
1000 3.67e-15 1.80e-16 3.71e-15 1.37e-16 2.22e-16 0.00e0 2.71e-3 5.92e-3 4.46e-1 5.78e-1 1.72e-15 9.17e-17
2500 1.83e-14 1.55e-15 1.92e-14 4.78e-16 4.44e-16 0.00e0 1.97e-3 4.41e-3 1.35e3 2.05e2 3.45e-3 7.70e-3
5000 4.06e-14 1.58e-15 6.53e-14 3.24e-16 6.66e-16 0.00e0 2.11e-14 6.97e-15 1.65e4 1.23e3 1.17e-14 1.45e-16
10000 9.57e-14 1.17e-14 7.07e-5 1.58e-4 1.13e-15 4.97e-17 4.02e-14 2.48e-14 5.85e5 5.46e3 2.35e-14 1.57e-16
1000 1.04e-12 5.37e-14 9.22e-13 4.28e-14 1.20e-12 1.61e-14 1.17e-13 3.25e-15 1.09e1 7.98e-1 1.05e-13 2.65e-15
2500 5.50e-13 2.01e-14 2.31e-12 4.63e-13 2.92e-12 4.44e-14 3.61e0 8.07e0 1.46e1 2.93e-1 2.62e-13 6.16e-15
5000 1.15e-12 3.64e-14 3.53e-12 8.94e-13 4.51e-11 7.07e-13 1.82e1 8.90e-2 1.75e1 9.26e-1 5.13e-13 5.55e-15
10000 2.70e-12 1.62e-13 5.65e-12 2.53e-13 1.66e-10 2.62e-11 1.85e1 7.29e-1 2.16e1 6.60e-3 1.03e-12 1.29e-14
t-test 1000 2/3/1 4/1/1 3/2/1 5/1/0 3/1/2
2500 5/1/0 4/0/2 5/1/0 6/0/0 3/1/2
5000 4/2/0 4/0/2 4/1/1 6/0/0 2/1/3
10000 3/3/0 4/0/2 4/1/1 6/0/0 2/1/3
-rank 1000 2.25 3.08 3.67 4.00 5.50 2.50
2500 1.92 2.92 2.83 4.42 5.67 3.25
5000 2.42 3.08 3.17 4.42 5.50 2.42
10000 2.25 3.08 3.50 3.92 6.00 2.25
MOS is excluded from this set of tests since we could not satisfactorily reproduce the algorithm or find results for these test cases in the literature.
For each test case, the greener the indicator, the better the performance. The best performance is in bold type and shaded grey.
All Friedman tests satisfy . The best ranking is in bold type and shaded grey.
If ODM performs better in terms of -test results, the corresponding “l/u/g” string is in bold type and shaded grey.
TABLE VIII: Comparative Results on LSO08 Problems
Fig. 13: Real-time error bands for of LSO08 suite with different dimensions.

ODM achieved the best performance in out of test cases. In terms of the Friedman tests, ODM consistently obtained the best or tied-best ranking. In terms of the t-tests, ODM ranks second, behind DECC-D: DECC-D performs better on the higher-dimensional test cases, whereas ODM performs better on the lower-dimensional ones. Although we cannot claim that ODM achieves the best performance unanimously in the scalability comparison, the results demonstrate the potential of the proposed framework.

In Fig. 13, we present the real-time error bands of the compared algorithms on of the LSO08 suite. MTS converges quickly in the D test case but deteriorates significantly as the problem dimensionality rises. This indicates that the heuristics articulated in MTS have limited scalability. By the no-free-lunch theorem, no heuristic can be expected to perform well on every problem, so such limitations are unavoidable. The resulting need to articulate more robust heuristics into meta-heuristic frameworks calls for friendly interfaces for heuristic articulation, which is a highlight of ODM.

VI Conclusion, Introspection & Future Work

Motivated by practical concerns, this paper formulates a meta-heuristic framework that makes online decisions via a controller, which decides based on the recent performance of the articulated heuristics. The controller is purposely designed to address the problems of current LSO meta-heuristic frameworks, offering robust interfaces for practical use, simplicity to ensure low-variance performance, and theoretically derived guidelines for hyper-parameter tuning. When articulated with three heuristics, it has shown significant capability for action decisioning on several benchmark problems, without embedding additional enhancements such as restart mechanisms.

The proposed controller has drawbacks that remain to be addressed. First and foremost, the controller is built upon the assumption that the state of the Markov process changes sufficiently slowly; this is not guaranteed for general problems, or else it implicitly restricts the articulated heuristics to be inexpensive in terms of resources. Second, the controller itself is a heuristic approach with no theoretical guarantee of closeness to the optimal decision sequence. Third, identifying change points with a constant-size sliding window, though empirically shown to be effective, is oversimplified.

In this paper, we cast hybrid LSO as solving MDPs. Although there are many effective, theoretically backed approaches to solving a single MDP, such as reinforcement learning, we cannot use them in black-box scenarios until we have a satisfactory way of transferring the knowledge learned on one MDP to other problems to achieve better results on them, i.e. until a proper transfer learning framework is set up. Our future work will focus on this transfer learning setup, which should be built upon effective invariance relations that capture the mutual information among black-box problems.


  • [1] R. Martinez-Cantin, “Funneled Bayesian optimization for design, tuning and control of autonomous systems,” IEEE Trans. Cybern., pp. 1–12, 2018.
  • [2] X. Ma, X. Li, Q. Zhang et al., “A survey on cooperative co-evolutionary algorithms,” IEEE Trans. Evol. Comput., pp. 1–1, 2018.
  • [3] R. Cheng, M. N. Omidvar, A. H. Gandomi et al., “Solving incremental optimization problems via cooperative coevolution,” IEEE Trans. Evol. Comput., pp. 1–1, 2018.
  • [4] Y. Cao, H. Zhang, W. Li et al., “Comprehensive learning particle swarm optimization algorithm with local search for multimodal functions,” IEEE Trans. Evol. Comput., pp. 1–1, 2018.
  • [5] F. van den Bergh and A. P. Engelbrecht, “A cooperative approach to particle swarm optimization,” IEEE Trans. Evol. Comput., vol. 8, no. 3, pp. 225–239, 2004.
  • [6] H. Ge, L. Sun, G. Tan et al., “Cooperative hierarchical PSO with two stage variable interaction reconstruction for large scale optimization,” IEEE Trans. Cybern., vol. 47, no. 9, pp. 2809–2823, 2017.
  • [7] I. Loshchilov, T. Glasmachers, and H. Beyer, “Large scale black-box optimization by limited-memory matrix adaptation,” IEEE Trans. Evol. Comput., pp. 1–1, 2018.
  • [8] Z. Li, Q. Zhang, X. Lin et al., “Fast covariance matrix adaptation for large-scale black-box optimization,” IEEE Trans. Cybern., pp. 1–11, 2018.
  • [9] A. LaTorre, S. Muelas, and J. M. Pena, “Multiple offspring sampling in large scale global optimization,” in IEEE Congr. Evol. Comput., 2012, pp. 1–8.
  • [10] A. Bolufé-Röhler, S. Fiol-González, and S. Chen, “A minimum population search hybrid for large scale global optimization,” in IEEE Congr. Evol. Comput., 2015, pp. 1958–1965.
  • [11] L.-Y. Tseng and C. Chen, “Multiple trajectory search for large scale global optimization,” in IEEE Congr. Evol. Comput., 2008, pp. 3052–3059.
  • [12] N. Hansen and A. Auger, “Principled design of continuous stochastic search: From theory to practice,” Theory and Principled Methods for the Design of Metaheuristics, pp. 145–180, 2014.
  • [13] S. Ye, G. Dai, L. Peng et al., “A hybrid adaptive coevolutionary differential evolution algorithm for large-scale optimization,” in IEEE Congr. Evol. Comput., 2014, pp. 1277–1284.
  • [14] D. Molina and F. Herrera, “Iterative hybridization of DE with local search for the CEC’2015 special session on large scale global optimization,” in IEEE Congr. Evol. Comput., 2015, pp. 1974–1978.
  • [15] J. Liu and K. Tang, “Scaling up covariance matrix adaptation evolution strategy using cooperative coevolution,” in International Conference on Intelligent Data Engineering and Automated Learning. Springer, 2013, pp. 350–357.
  • [16] B. Shahriari, K. Swersky, Z. Wang et al., “Taking the human out of the loop: A review of Bayesian optimization,” Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, 2016.
  • [17] N. Hansen, “The CMA evolution strategy: A tutorial,” ArXiv, vol. 1604, no. 00772v1, 2016.
  • [18] L. DaCosta, A. Fialho, M. Schoenauer et al., “Adaptive operator selection with dynamic multi-armed bandits,” in Conference on Genetic and Evolutionary Computation, 2008, pp. 913–920.
  • [19] V. Kuleshov and D. Precup, “Algorithms for multi-armed bandit problems,” J. Mach. Learn. Res., vol. 1, pp. 1–48, 2000.
  • [20] M. N. Omidvar, X. Li, Z. Yang et al., “Cooperative co-evolution for large scale optimization through more frequent random grouping,” in IEEE Congr. Evol. Comput., 2010, pp. 1–8.
  • [21] Z. Yang, K. Tang, and X. Yao, “Large scale evolutionary optimization using cooperative coevolution,” Info. Sci., vol. 178, no. 15, pp. 2985–2999, 2008.
  • [22] R. Tanabe and A. S. Fukunaga, “Improving the search performance of SHADE using linear population size reduction,” in IEEE Congr. Evol. Comput., 2014, pp. 1658–1665.
  • [23] R. Cheng and Y. Jin, “A competitive swarm optimizer for large scale optimization,” IEEE Trans. Cybern., vol. 45, no. 2, pp. 191–204, 2015.
  • [24] D. Molina, A. LaTorre, and F. Herrera, “SHADE with iterative local search for large-scale global optimization,” in IEEE Congr. Evol. Comput., 2018, pp. 1–8.
  • [25] M. N. Omidvar, X. Li, and X. Yao, “Cooperative co-evolution with Delta grouping for large scale non-separable function optimization,” in IEEE Congr. Evol. Comput., 2010, pp. 1–8.
  • [26] M. N. Omidvar, M. Yang, Y. Mei et al., “DG2: A faster and more accurate differential grouping for large-scale black-box optimization,” IEEE Trans. Evol. Comput., vol. 21, no. 6, pp. 929–942, 2017.
  • [27] M. N. Omidvar, X. Li, Y. Mei et al., “Cooperative co-evolution with differential grouping for large scale optimization,” IEEE Trans. Evol. Comput., vol. 18, no. 3, pp. 378–393, 2014.
  • [28] X. Li, K. Tang, M. N. Omidvar et al., “Benchmark functions for the CEC’2013 special session and competition on large-scale global optimization,” Evolutionary Computing and Machine Learning group, RMIT, Australia, Tech. Rep., 2013.
  • [29] T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,” Jour. Molec. Bio., vol. 147, no. 1, pp. 195–197, 1981.