1 Introduction
Runtime analysis is a rather recent and increasingly popular approach in the theory of randomized search heuristics. Typically, the aim is to analyze the (random) time until a goal of optimization (optimum found, good approximation found, etc.) is reached. This is equivalent to deriving the first hitting time of a set of states of an underlying (discrete-time) stochastic process.
Drift analysis has turned out to be one of the most powerful techniques for runtime analysis. In a nutshell, drift is the expected progress of the underlying process from one time step to the next. An expression for the drift is turned into an expected first hitting time via a drift theorem. An appealing property of such a theorem is that a local property (the one-step drift) is translated into a global property (the first hitting time).
Sasaki and Hajek (1988) introduced drift analysis to the analysis of randomized search heuristics (more precisely, of simulated annealing), and He and Yao (2001) were the first to apply drift analysis to evolutionary algorithms. The latter paper presents a drift theorem that is nowadays called additive drift. Since then, numerous variants of drift theorems have been proposed, including upper and lower bounds in the scenario of multiplicative drift (Doerr et al., 2012; Lehre and Witt, 2012), variable drift (Johannsen, 2010; Mitavskiy et al., 2009; Doerr et al., 2011; Rowe and Sudholt, 2012) and generalizations thereof, e.g., variable drift without monotonicity conditions (Doerr et al., 2012; Feldmann and Kötzing, 2013). Moreover, considerable progress was made in the development of so-called distance functions used to model the process analyzed by drift analysis (Doerr and Goldberg, 2013; Witt, 2013). The powerful drift theorems available so far allow for the analysis of randomized search heuristics, in particular evolutionary algorithms and ant colony optimization, on example problems and problems from combinatorial optimization. See also the text books by Auger and Doerr (2011), Neumann and Witt (2010) and Jansen (2013) for detailed expositions of the state of the art in runtime analysis of randomized search heuristics.
At present, the exciting and powerful research done in drift analysis is scattered over the literature. Existing formulations of similar theorems may share many details but deviate in minor conditions. Notation is not always consistent. Several existing variants of drift theorems contain assumptions that might be convenient to formulate, e.g., Markovian properties and discrete or finite search spaces; however, it was not always clear which assumptions were really needed and whether the drift theorem was general enough. This is one reason why additional effort was spent on removing the assumption of discrete search spaces from multiplicative and variable drift theorems (Feldmann and Kötzing, 2013) – an effort that, as we will show, was not really required.
Our work makes two main contributions to the area of drift analysis. The first one is a “universal” formulation of a drift theorem that strives for as much generality as possible. We can provably identify all of the existing drift theorems mentioned above as special cases. While doing this, we propose a consistent notation and remove unnecessary assumptions such as discrete search spaces and Markov processes. In fact, we even identify another famous technique for the runtime analysis of randomized search heuristics, namely fitness levels (Sudholt, 2013), as a special case of our general theorem.
Caveat. When we say “all” existing drift theorems, we exclude a specific but important scenario from our considerations. Our paper only considers the case that the drift is directed towards the target of optimization. The opposite case, i.e., scenarios where the process moves away from the target, is covered by the lower bounds from the so-called simplified/negative drift theorem (Oliveto and Witt, 2011), which states rather different conditions and implications. The conditions and generality of the latter theorem were scrutinized in a recent erratum (Oliveto and Witt, 2012).
The second contribution is represented by tail bounds, also called deviation bounds or concentration inequalities, on the hitting time. Roughly speaking, conditions are provided under which it is unlikely that the actual hitting time is above or below its expected value by a certain amount. Such tail bounds were not known before in drift analysis, except for the special case of upper tail bounds in multiplicative drift (Doerr and Goldberg, 2013). In particular, our drift theorem is the first to prove lower tails. We use these tail bounds in order to prove very sharp concentration bounds on the running time of a (1+1) EA on OneMax, general linear functions and LeadingOnes. Up to minor details, the following is shown for the running time of the (1+1) EA on OneMax (and the same holds on all linear functions): the probability that the running time deviates (from above or below) from its expectation by a given additive term decreases exponentially with the size of that term; an analogous concentration statement is proved for LeadingOnes. Such sharp-concentration results are extremely useful from a practical point of view since they reveal that the process is “almost deterministic”, such that very precise predictions of its actual running time can be made. Moreover, the concentration inequalities allow a change of perspective to tell what progress can be achieved within a certain time budget; see the recent line of work on fixed-budget computations (Jansen and Zarges, 2012; Doerr et al., 2013).
This paper is structured as follows. Section 2 introduces notation and basics of drift analysis. Section 3 presents the general drift theorem, its proof and suggestions for user-friendly corollaries. Afterwards, specializations are discussed. Section 4 shows how the general drift theorem is related to known variable drift theorems, and Section 5 specializes our general theorem into existing multiplicative drift theorems. The fitness level technique, both for lower and upper bounds, is identified as a special case in Section 6. Section 7 is devoted to the tail bounds contained in the general drift theorem. It is shown how they can directly be applied to prove sharp-concentration results on the running time of the (1+1) EA on OneMax and general linear functions. Moreover, a more user-friendly special case of the theorem with tail bounds is proved and used to show sharp-concentration results w.r.t. LeadingOnes. We finish with some conclusions.
2 Preliminaries
Stochastic process.
Throughout this paper, we analyze time-discrete stochastic processes represented by a sequence of non-negative random variables $(X_t)_{t\ge 0}$. For example, $X_t$ could represent the number of zero-bits or one-bits of a (1+1) EA at generation $t$, a certain distance value of a population-based EA from an optimal population etc. In particular, $X_t$ might aggregate several different random variables realized by a search heuristic at time $t$ into a single one. We do not care whether the state space of the process is discrete (e.g., all non-negative integers or even a finite subset thereof) or continuous. In discrete search spaces, the random variables will have a discrete support; however, this is not important for the formulation of the forthcoming theorems.
First hitting time.
We adopt the convention that the process should pass below some threshold $a \ge 0$ (“minimize” its state) and define the first hitting time $T := \min\{t \mid X_t \le a\}$. If the actual process seeks to maximize its state, typically a straightforward mapping allows us to stick to the convention of minimization. In a special case, we are interested in the hitting time of state $0$; for example, when a (1+1) EA is run on OneMax and we are interested in the first point of time where the number of zero-bits becomes zero. Note that $T$ is a stopping time and that we tacitly assume that the stochastic process is adapted to its natural filtration $\mathcal{F}_t := \sigma(X_0,\dots,X_t)$, i.e., the information available up to time $t$.
Drift.
The expected one-step change $X_t - E(X_{t+1} \mid \mathcal{F}_t)$ for $t < T$ is called drift. Note that the drift in general is a random variable since the outcomes of $X_0,\dots,X_t$ are random. Suppose we manage to bound $X_t - E(X_{t+1} \mid \mathcal{F}_t)$ from below by some $\delta$ for all possible outcomes of $\mathcal{F}_t$ with $t < T$, where $\delta > 0$. Then we know that the process decreases its state (“progresses towards $0$”) in expectation by at least $\delta$ in every step, and the additive drift theorem (see Theorem 1 below) will provide a bound on $E(T)$ that only depends on $X_0$ and $\delta$. In fact, the very natural-looking result $E(T \mid X_0) \le X_0/\delta$ will be obtained. However, bounds on the drift might be more complicated. For example, a bound on the drift might depend on $X_t$ or states at even earlier points of time, e.g., if the progress decreases as the current state decreases. This is often the case in applications to evolutionary algorithms. It is not so often the case that the whole “history” is needed. Simple evolutionary algorithms and other randomized search heuristics are Markov processes, such that simply $X_t - E(X_{t+1} \mid \mathcal{F}_t) = X_t - E(X_{t+1} \mid X_t)$. With respect to Markov processes on discrete search spaces, drift conditions traditionally use conditional expectations such as $E(X_t - X_{t+1} \mid X_t = i)$ and bound these for arbitrary states $i$ instead of directly bounding the random variable $X_t - E(X_{t+1} \mid X_t)$.
Caveat.
As pointed out, the drift in general is a random variable and should not be confused with the “expected drift” $E(X_t) - E(X_{t+1})$, which rarely is available since it averages over the whole history of the stochastic process. Drift is based on the inspection of the progress from one step to another, taking into account every possible history. This one-step inspection often makes it easy to come up with bounds on the drift. Drift theorems could also be formulated based on expected drift; however, this might be tedious to compute. See Jägersküpper (2011) for one of the rare analyses of “expected drift”, which we will not get into in this paper.
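To make the notion of one-step drift concrete, the following minimal sketch (our own illustration, not part of the formal development) estimates the drift of randomized local search (RLS) on OneMax, where $X_t$ counts zero-bits: a single uniformly chosen bit flip is accepted only if it does not worsen fitness, so $X_t$ decreases by one exactly when a zero-bit is hit, giving drift $X_t/n$. The parameters ($n = 100$, current state $50$, sample size) are arbitrary choices for the example.

```python
import random

def empirical_drift(n, x, samples, rng):
    """Estimate the one-step drift E[X_t - X_{t+1} | X_t = x] for RLS
    on OneMax, where X_t is the number of zero-bits in a length-n
    string.  Flipping one uniformly chosen bit is accepted iff the
    fitness does not worsen, so X decreases by 1 iff the flip hits
    one of the x zero-bits, which happens with probability x/n."""
    total = 0
    for _ in range(samples):
        if rng.random() < x / n:
            total += 1  # accepted improving step: X_t - X_{t+1} = 1
    return total / samples

rng = random.Random(42)
n, x = 100, 50
est = empirical_drift(n, x, 200_000, rng)
print(est)  # close to the exact drift x/n = 0.5
```

The empirical average over many one-step experiments matches the exact drift $x/n$; no knowledge of the whole history is needed, which is precisely the appeal of the one-step view.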
We now present the first formal drift theorem dealing with additive drift. It is based on a formulation by He and Yao (2001), from which we removed some unnecessary assumptions, more precisely the discrete search space and the Markov property. We only demand a bounded state space.
Theorem 1 (Additive Drift, following He and Yao (2001)).
Let $(X_t)_{t\ge 0}$ be a stochastic process over some bounded state space $S \subseteq \mathbb{R}^+_0$. Assume that $T := \min\{t \mid X_t = 0\}$ satisfies $E(T \mid X_0) < \infty$. Then:

(i) If $E(X_t - X_{t+1} \mid X_0,\dots,X_t; t < T) \ge \delta_{\mathrm{u}}$ then $E(T \mid X_0) \le X_0/\delta_{\mathrm{u}}$.

(ii) If $E(X_t - X_{t+1} \mid X_0,\dots,X_t; t < T) \le \delta_{\ell}$ then $E(T \mid X_0) \ge X_0/\delta_{\ell}$.
By applying the law of total expectation, Statement (i) implies $E(T) \le E(X_0)/\delta_{\mathrm{u}}$, and analogously Statement (ii) implies $E(T) \ge E(X_0)/\delta_{\ell}$.
For the sake of completeness, we also provide a simple proof using martingale theory, inspired by Lehre (2012). This proof is simpler than the original one by He and Yao (2001).
Proof of Theorem 1.
We prove only the upper bound since the lower bound is proven symmetrically. We define $Y_t := X_t + t\delta_{\mathrm{u}}$. Note that as long as $t < T$, $(Y_t)_{t\ge 0}$ is a supermartingale w.r.t. $(X_t)_{t\ge 0}$; more precisely, by induction
$E(Y_{t+1} \mid X_0,\dots,X_t; t < T) = E(X_{t+1} \mid X_0,\dots,X_t; t < T) + (t+1)\delta_{\mathrm{u}} \le X_t - \delta_{\mathrm{u}} + (t+1)\delta_{\mathrm{u}} = Y_t,$
where the inequality uses the drift condition. Since the state space is bounded and $E(T \mid X_0) < \infty$, we can apply the optional stopping theorem and get $E(Y_T \mid X_0) \le Y_0 = X_0$, i.e., since $X_T = 0$, $\delta_{\mathrm{u}} E(T \mid X_0) \le X_0$. Rearranging terms, the theorem follows. ∎
Summing up, additive drift is concerned with the very simple scenario that there is a progress of at least $\delta_{\mathrm{u}}$ from all non-optimal states towards the optimum in Statement (i) and a progress of at most $\delta_{\ell}$ in Statement (ii). Since the values $\delta_{\mathrm{u}}$ and $\delta_{\ell}$ are not allowed to depend on $X_t$, one has to use the worst-case drift over all non-optimal states. This might lead to very bad bounds on the first hitting time, which is why more general theorems (as mentioned in the introduction) were developed. It is interesting to note that these more general theorems are often proved based on Theorem 1 above by using an appropriate mapping from the original state space to a new one. Informally, the mapping “smoothes out” position-dependent drift into an (almost) position-independent drift. We will use the same approach in the following.
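The pessimism of the worst-case drift can be seen numerically (our own sketch; the RLS-on-OneMax model and all constants are illustrative assumptions): the drift from state $x$ is $x/n$, so the worst-case additive drift over non-optimal states is $\delta = 1/n$, and Theorem 1 only yields $E(T) \le X_0 \cdot n$, while the true expected hitting time is of order $n \ln X_0$.

```python
import random

def rls_onemax_hitting_time(n, x0, rng):
    """Steps until the number of zero-bits x reaches 0 when each step
    decreases x by one with probability x/n (RLS on OneMax)."""
    x, t = x0, 0
    while x > 0:
        t += 1
        if rng.random() < x / n:
            x -= 1
    return t

rng = random.Random(1)
n, x0, runs = 100, 50, 300
mean_t = sum(rls_onemax_hitting_time(n, x0, rng) for _ in range(runs)) / runs

# Exact expectation: a sum of geometric waiting times with success
# probabilities x0/n, (x0-1)/n, ..., 1/n.
exact = n * sum(1 / k for k in range(1, x0 + 1))
additive_bound = x0 * n  # worst-case drift delta = 1/n in Theorem 1
print(mean_t, exact, additive_bound)
```

For $n = 100$ and $X_0 = 50$, the exact expectation is about $450$, an order of magnitude below the additive bound of $5000$; variable drift theorems close exactly this gap.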
3 General Drift Theorem
In this section, we present our general drift theorem. As pointed out in the introduction, we strive for a very general statement, which is partly at the expense of simplicity. More user-friendly specializations will be proved in the following sections. Nevertheless, the underlying idea of the complicated-looking general theorem is the same as in all drift theorems. We look into the one-step drift and assume we have an (upper or lower) bound $h(X_t)$ on the drift, which (possibly heavily) depends on $X_t$. Based on $h$, a new function $g$ is defined with the aim of “smoothing out” the dependency, and the drift w.r.t. $g$ (formally, $g(X_t) - E(g(X_{t+1}) \mid \mathcal{F}_t)$) is analyzed. Statements (i) and (ii) of the following Theorem 2 provide bounds on $E(T)$ based on the drift w.r.t. $g$. In fact, $g$ is defined in a very similar way as in existing variable-drift theorems, such that Statements (i) and (ii) can be understood as generalized variable drift theorems for upper and lower bounds on the expected hitting time, respectively. Statement (ii) is also valid (but useless) if the expected hitting time is infinite. Sections 4–6 study specializations of these first two statements into existing variable and multiplicative drift theorems.
Statements (iii) and (iv) are concerned with tail bounds on the hitting time. Here moment-generating functions of the drift w.r.t. $g$ come into play; formally, $E(e^{-\lambda(g(X_t) - g(X_{t+1}))} \mid \mathcal{F}_t)$ is bounded. Again for the sake of generality, bounds on the moment-generating function may depend on the current state $X_t$, as captured by the bounds $\beta_{\mathrm{u}}(X_t)$ and $\beta_{\ell}(X_t)$. We will see an example in Section 7 where the mapping $g$ smoothes out the position-dependent drift into a (nearly) position-independent drift, while the moment-generating function of the drift w.r.t. $g$ still heavily depends on the current position $X_t$.
Theorem 2 (General Drift Theorem).
Let $(X_t)_{t\ge 0}$ be a stochastic process over some state space $S \subseteq \{0\} \cup [x_{\min}, x_{\max}]$, where $x_{\min} > 0$. Let $h \colon [x_{\min}, x_{\max}] \to \mathbb{R}^+$ be an integrable function and define $g \colon \{0\} \cup [x_{\min}, x_{\max}] \to \mathbb{R}^{\ge 0}$ by $g(x) := \frac{x_{\min}}{h(x_{\min})} + \int_{x_{\min}}^{x} \frac{1}{h(y)}\,\mathrm{d}y$ for $x \ge x_{\min}$ and $g(0) := 0$. Let $T := \min\{t \mid X_t = 0\}$. Then:

(i) If $E(g(X_t) - g(X_{t+1}) \mid \mathcal{F}_t; t < T) \ge \alpha_{\mathrm{u}}$ for some $\alpha_{\mathrm{u}} > 0$ then $E(T \mid X_0) \le \frac{g(X_0)}{\alpha_{\mathrm{u}}}$.

(ii) If $E(g(X_t) - g(X_{t+1}) \mid \mathcal{F}_t; t < T) \le \alpha_{\ell}$ for some $\alpha_{\ell} > 0$ then $E(T \mid X_0) \ge \frac{g(X_0)}{\alpha_{\ell}}$.

(iii) If there exist $\lambda > 0$ and a function $\beta_{\mathrm{u}} \colon S \to \mathbb{R}^+$ such that $E(e^{-\lambda(g(X_t) - g(X_{t+1}))} \mid \mathcal{F}_t; t < T) \le \beta_{\mathrm{u}}(X_t)$ then $\Pr(T > t \mid X_0) \le E\big(\prod_{s=0}^{t-1} \beta_{\mathrm{u}}(X_s) \mid X_0\big) \cdot e^{\lambda(g(X_0) - g(x_{\min}))}$ for $t > 0$.

(iv) If there exist $\lambda > 0$ and a function $\beta_{\ell} \colon S \to \mathbb{R}^+$ such that $E(e^{\lambda(g(X_t) - g(X_{t+1}))} \mid \mathcal{F}_t; t < T) \le \beta_{\ell}(X_t)$ then $\Pr(T < t \mid X_0) \le \sum_{s=0}^{t-1} E\big(\prod_{r=0}^{s-1} \beta_{\ell}(X_r) \mid X_0\big) \cdot e^{-\lambda g(X_0)}$ for $t > 0$.

If additionally the state $0$ is absorbing, then $\Pr(T < t \mid X_0) = \Pr(X_{t-1} = 0 \mid X_0)$.
Special cases of (iii) and (iv).
If $\beta_{\mathrm{u}}(x) \le \beta_{\mathrm{u}}$ for some position-independent $\beta_{\mathrm{u}} < 1$, then Statement (iii) boils down to $\Pr(T > t \mid X_0) \le \beta_{\mathrm{u}}^{\,t} \cdot e^{\lambda(g(X_0) - g(x_{\min}))}$; similarly for Statement (iv).
On $x_{\min}$.
Some specializations of Theorem 2 require a “gap” in the state space between optimal and non-optimal states, modelled by $x_{\min} > 0$. One example is multiplicative drift, see Theorem 7 in Section 5. Another example is the process defined by $X_0 := 1$ and $X_{t+1} := X_t/2$ for $t \ge 0$. Its first hitting time of state $0$ cannot be derived by drift arguments since the lower bound on the drift towards the optimum within the interval $(0, 1]$ has limit $0$.
Proof of Theorem 2.
The first two items follow from the classical additive drift theorem (Theorem 1) applied to the process $(g(X_t))_{t\ge 0}$. To prove the third one, we use ideas implicit in Hajek (1982) and argue
$\Pr(T > t \mid X_0) \le \Pr(g(X_t) \ge g(x_{\min}) \mid X_0) = \Pr(e^{\lambda g(X_t)} \ge e^{\lambda g(x_{\min})} \mid X_0) \le e^{-\lambda g(x_{\min})} \cdot E(e^{\lambda g(X_t)} \mid X_0),$
where the first inequality uses that $g$ is non-decreasing, the equality that $x \mapsto e^{\lambda x}$ is a (strictly increasing) bijection, and the last inequality is Markov’s inequality. Now,
$E(e^{\lambda g(X_t)} \mid X_0) = e^{\lambda g(X_0)} \cdot E\Big(\prod_{s=0}^{t-1} e^{-\lambda(g(X_s) - g(X_{s+1}))} \,\Big|\, X_0\Big),$
where the last equality follows inductively by conditioning on $\mathcal{F}_{t-1}, \dots, \mathcal{F}_0$ (note that this does not assume independence of the $X_t$). Using the prerequisite from the third item, we get
$E\Big(\prod_{s=0}^{t-1} e^{-\lambda(g(X_s) - g(X_{s+1}))} \,\Big|\, X_0\Big) \le E\Big(\prod_{s=0}^{t-1} \beta_{\mathrm{u}}(X_s) \,\Big|\, X_0\Big),$
altogether
$\Pr(T > t \mid X_0) \le E\Big(\prod_{s=0}^{t-1} \beta_{\mathrm{u}}(X_s) \,\Big|\, X_0\Big) \cdot e^{\lambda(g(X_0) - g(x_{\min}))},$
which proves the third item.
The fourth item is proved similarly as the third one. By a union bound,
$\Pr(T < t \mid X_0) \le \sum_{s=0}^{t-1} \Pr(T = s \mid X_0)$
for $t > 0$. Moreover,
$\Pr(T = s \mid X_0) \le \Pr(g(X_s) \le 0 \mid X_0) = \Pr(e^{-\lambda g(X_s)} \ge 1 \mid X_0) \le E(e^{-\lambda g(X_s)} \mid X_0),$
using again Markov’s inequality. By the prerequisites, we get
$E(e^{-\lambda g(X_s)} \mid X_0) \le E\Big(\prod_{r=0}^{s-1} \beta_{\ell}(X_r) \,\Big|\, X_0\Big) \cdot e^{-\lambda g(X_0)}.$
Altogether,
$\Pr(T < t \mid X_0) \le \sum_{s=0}^{t-1} E\Big(\prod_{r=0}^{s-1} \beta_{\ell}(X_r) \,\Big|\, X_0\Big) \cdot e^{-\lambda g(X_0)}.$
If furthermore the state $0$ is absorbing, then the event $\{T < t\}$ is equivalent to $\{X_{t-1} = 0\}$. In this case, $\Pr(X_{t-1} = 0 \mid X_0)$ obeys the same bound.
∎
Our drift theorem is very general and therefore complicated. In order to apply it, specializations based on assumptions that are typically satisfied might be welcome. The rest of this section discusses such simplifications; however, we do not yet apply them in this paper.
By making some additional assumptions on the function $h$, we get the following special cases.
Lemma 1.
Let $\lambda > 0$, let $h$ be any real-valued, differentiable function on $[x_{\min}, x_{\max}]$, and define $g$ as in Theorem 2, i.e., $g'(x) = 1/h(x)$. Then the following holds.

(1) If $h'(x) \le -\lambda$ for all $x$, then $e^{-\lambda g(x)}$ is concave.

(2) If $h'(x) \ge -\lambda$ for all $x$, then $e^{-\lambda g(x)}$ is convex.

(3) If $h'(x) \le \lambda$ for all $x$, then $e^{\lambda g(x)}$ is convex.

(4) If $h'(x) \ge \lambda$ for all $x$, then $e^{\lambda g(x)}$ is concave.
Proof.
The second derivative of $e^{-\lambda g(x)}$ is
$\frac{\mathrm{d}^2}{\mathrm{d}x^2}\, e^{-\lambda g(x)} = \frac{\lambda e^{-\lambda g(x)}}{h(x)^2}\,\big(\lambda + h'(x)\big),$
where the first factor is positive. If $h'(x) \le -\lambda$, then $\lambda + h'(x) \le 0$, and $e^{-\lambda g(x)}$ is concave. If $h'(x) \ge -\lambda$, then $\lambda + h'(x) \ge 0$, and $e^{-\lambda g(x)}$ is convex.
Similarly, the second derivative of $e^{\lambda g(x)}$ is
$\frac{\mathrm{d}^2}{\mathrm{d}x^2}\, e^{\lambda g(x)} = \frac{\lambda e^{\lambda g(x)}}{h(x)^2}\,\big(\lambda - h'(x)\big),$
where the first factor is positive. If $h'(x) \ge \lambda$, then $\lambda - h'(x) \le 0$, and $e^{\lambda g(x)}$ is concave. If $h'(x) \le \lambda$, then $\lambda - h'(x) \ge 0$, and $e^{\lambda g(x)}$ is convex. ∎
Corollary 1.
Let , be a stochastic process over some state space , where . Let be a differentiable function. Then the following statements hold for the first hitting time .

If and , then

If and , then

If and for some , then

If and for some , then
Proof.
Let $g$ be defined as in Theorem 2, and note that $g'(x) = 1/h(x)$.
For (i), it suffices to show that condition (i) of Theorem 2 is satisfied. From the assumption $h' \ge 0$, it follows that $g'' = -h'/h^2 \le 0$, hence $g$ is a concave function. Jensen’s inequality therefore implies that
$E(g(X_t) - g(X_{t+1}) \mid \mathcal{F}_t; t < T) \ge g(X_t) - g(E(X_{t+1} \mid \mathcal{F}_t; t < T)) \ge g(X_t) - g(X_t - h(X_t)) \ge 1,$
where the last inequality holds because $h$ is a non-decreasing function.
For (ii), it suffices to show that condition (ii) of Theorem 2 is satisfied. From the assumption $h' \le 0$, it follows that $g'' = -h'/h^2 \ge 0$, hence $g$ is a convex function. Jensen’s inequality therefore implies that
$E(g(X_t) - g(X_{t+1}) \mid \mathcal{F}_t; t < T) \le g(X_t) - g(E(X_{t+1} \mid \mathcal{F}_t; t < T)) \le g(X_t) - g(X_t - h(X_t)) \le 1,$
where the last inequality holds because $h$ is a non-increasing function.
For (iii), it suffices to show that condition (iii) of Theorem 2 is satisfied. By Lemma 1 and Jensen’s inequality, it holds that
where
where the last inequality holds because the function is strictly monotonically increasing.
For (iv), it suffices to show that condition (iv) of Theorem 2 is satisfied. By Lemma 1 and Jensen’s inequality, it holds that
where
where the last inequality holds because the function is strictly monotonically decreasing.
∎
4 Variable Drift as Special Case
The purpose of this section is to show that known variants of variable drift theorems can be derived from our general Theorem 2.
4.1 Classical Variable Drift and Fitness Levels
A clean form of a variable drift theorem, generalizing previous formulations by Johannsen (2010) and Mitavskiy et al. (2009), was recently presented by Rowe and Sudholt (2012). We restate their theorem in our notation and carry out two obvious generalizations: we allow for a continuous state space instead of demanding a finite one, and we do not fix $x_{\min} = 1$.
Theorem 3 (Variable Drift; following Rowe and Sudholt (2012)).
Let $(X_t)_{t\ge 0}$ be a stochastic process over some state space $S \subseteq \{0\} \cup [x_{\min}, x_{\max}]$, where $x_{\min} > 0$. Let $h$ be an integrable, monotone increasing function on $[x_{\min}, x_{\max}]$ such that $E(X_t - X_{t+1} \mid \mathcal{F}_t; t < T) \ge h(X_t)$ if $X_t \ge x_{\min}$. Then it holds for the first hitting time $T := \min\{t \mid X_t = 0\}$ that
$E(T \mid X_0) \le \frac{x_{\min}}{h(x_{\min})} + \int_{x_{\min}}^{X_0} \frac{1}{h(y)}\,\mathrm{d}y.$
Proof.
Consider $g$ as defined in Theorem 2 w.r.t. $h$. Since $h$ is monotone increasing, $1/h$ is non-increasing, hence for all states $x \ge x_{\min}$ and all $x'$ we have $g(x) - g(x') \ge (x - x')/h(x)$. Consequently, $g(X_t) - E(g(X_{t+1}) \mid \mathcal{F}_t; t < T) \ge E(X_t - X_{t+1} \mid \mathcal{F}_t; t < T)/h(X_t) \ge 1$, and Statement (i) of Theorem 2 with $\alpha_{\mathrm{u}} = 1$ yields $E(T \mid X_0) \le g(X_0)$, which is the claimed bound. ∎
Rowe and Sudholt (2012) also pointed out that variable drift theorems in discrete search spaces look very similar to bounds obtained from the fitness level technique (also called the method of $f$-based partitions, first formulated by Wegener, 2001). For the sake of completeness, we present the classical upper bounds by fitness levels w.r.t. the (1+1) EA here and prove them by drift analysis.
Theorem 4 (Classical Fitness Levels, following Wegener (2001)).
Consider the (1+1) EA maximizing some function $f$ and a partition of the search space into non-empty sets $A_1, \dots, A_m$. Assume that the sets form an $f$-based partition, i.e., for $1 \le i < j \le m$ and all $x \in A_i$, $y \in A_j$, it holds that $f(x) < f(y)$. Let $p_i$ be a lower bound on the probability that a search point in $A_i$ is mutated into a search point in $A_{i+1} \cup \dots \cup A_m$. Then the expected hitting time of $A_m$ is at most
$\sum_{i=1}^{m-1} \frac{1}{p_i}.$
Proof.
At each point of time, the (1+1) EA is in a unique fitness level. Let $i_t$ denote the current fitness level at time $t$. We consider the process defined by $X_t := m - i_t$. By definition of fitness levels and the (1+1) EA, $X_t$ is non-increasing over time. Consider $X_t = k$ for $k \ge 1$. With probability at least $p_{m-k}$, the $X$-value decreases by at least $1$. Consequently, $E(X_t - X_{t+1} \mid \mathcal{F}_t; X_t = k) \ge p_{m-k}$. We define $h(x) := p_{m-\lceil x\rceil}$, $x_{\min} := 1$ and $x_{\max} := m - 1$ and obtain an integrable, monotone increasing function on $[x_{\min}, x_{\max}]$. Hence, the upper bound on $E(T \mid X_0)$ from Theorem 3 becomes at most $\frac{1}{p_{m-1}} + \sum_{i=1}^{m-2} \frac{1}{p_i} = \sum_{i=1}^{m-1} \frac{1}{p_i}$, which completes the proof. ∎
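The similarity of the two techniques can be made concrete with a small computation (our own sketch; RLS on OneMax is used instead of the (1+1) EA so that the drift $h(x) = x/n$ and the level-leaving probabilities are exact rather than lower bounds): the variable drift bound evaluates to $n(1 + \ln X_0)$, while summing $1/p$ over the levels below the optimum gives $n \cdot H_{X_0}$, and the two differ only by $O(n)$.

```python
import math

def variable_drift_bound(n, x0):
    """x_min/h(x_min) + integral_1^{x0} 1/h(y) dy with h(y) = y/n
    and x_min = 1, i.e., n + n*ln(x0)."""
    return n + n * math.log(x0)

def fitness_level_bound(n, x0):
    """Sum of 1/p over the x0 levels below the optimum, where the
    probability of leaving the level with k zero-bits is p = k/n
    (RLS flips a uniformly random bit)."""
    return sum(n / k for k in range(1, x0 + 1))

n, x0 = 100, 50
vd = variable_drift_bound(n, x0)
fl = fitness_level_bound(n, x0)
print(vd, fl)  # both are roughly n*ln(x0); the bounds differ by O(n)
```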
4.2 Nonmonotone Variable Drift and Lower Bounds by Variable Drift
In many applications, a monotone increasing function bounds the drift from below. For example, the expected progress towards the optimum of OneMax increases with the distance of the current search point from the optimum. However, recently Doerr et al. (2012) found that certain ACO algorithms do not have this property and exhibit a non-monotone drift. To handle this case, they present a generalization of Johannsen’s drift theorem that does not require $h$ to be monotone. The most recent version of this theorem is presented in Feldmann and Kötzing (2013). Unfortunately, it turned out that the two generalizations suffer from a missing condition relating positive and negative drift to each other. Adding this condition and removing an unnecessary assumption (more precisely, the continuity of $h$), the theorem by Feldmann and Kötzing (2013) can be corrected as follows.
Theorem 5 (extending Feldmann and Kötzing (2013)).
Let $(X_t)_{t\ge 0}$ be a stochastic process over some state space $S \subseteq \{0\} \cup [x_{\min}, x_{\max}]$, where $x_{\min} > 0$. Suppose there exist two functions, one of which is integrable, and a constant such that for all

,

,

if ,

for all with , it holds .
Then it holds for the first hitting time that
Proof.
Using the definition of $g$ according to Theorem 2 and assuming $X_t \ge x_{\min}$, we compute the drift
Item (4) from the prerequisites yields if and if . Using this and , the drift can be further bounded by
where the first inequality used Item (2) from the prerequisites and the last one Item (1). Plugging this into Theorem 2 completes the proof. ∎
Finally, so far only a single variant dealing with upper bounds on variable drift, and thus lower bounds on the hitting time, seems to have been published. It was derived by Doerr, Fouz, and Witt (2011). Again, we present a variant without unnecessary assumptions; more precisely, we allow continuous state spaces and use less restricted functions.
Theorem 6 (following Doerr, Fouz, and Witt (2011)).
Let $(X_t)_{t\ge 0}$ be a stochastic process over some state space $S \subseteq \{0\} \cup [x_{\min}, x_{\max}]$, where $x_{\min} > 0$. Suppose there exist two functions $c$ and $h$ on $[x_{\min}, x_{\max}]$ such that $h$ is monotone increasing and integrable and for all $t < T$,

,

for ,

for .
Then it holds for the first hitting time that
Proof.
Using the definition of $g$ according to Theorem 2, we compute the drift