General Drift Analysis with Tail Bounds

by Per Kristian Lehre et al.

Drift analysis is one of the state-of-the-art techniques for the runtime analysis of randomized search heuristics (RSHs) such as evolutionary algorithms (EAs), simulated annealing etc. The vast majority of existing drift theorems yield bounds on the expected value of the hitting time for a target state, e.g., the set of optimal solutions, without making additional statements on the distribution of this time. We address this lack by providing a general drift theorem that includes bounds on the upper and lower tail of the hitting time distribution. The new tail bounds are applied to prove very precise sharp-concentration results on the running time of a simple EA on standard benchmark problems, including the class of general linear functions. Surprisingly, the probability of deviating by an r-factor in lower order terms of the expected time decreases exponentially with r on all these problems. The usefulness of the theorem outside the theory of RSHs is demonstrated by deriving tail bounds on the number of cycles in random permutations. All these results handle a position-dependent (variable) drift that was not covered by previous drift theorems with tail bounds. Moreover, our theorem can be specialized into virtually all existing drift theorems with drift towards the target from the literature. Finally, user-friendly specializations of the general drift theorem are given.




1 Introduction

Runtime analysis is a rather recent and increasingly popular approach in the theory of randomized search heuristics. Typically, the aim is to analyze the (random) time until one goal of optimization (optimum found, good approximation found etc.) is reached. This is equivalent to deriving the first hitting time for a set of states of an underlying (discrete-time) stochastic process.

Drift analysis has turned out to be one of the most powerful techniques for runtime analysis. In a nutshell, drift is the expected progress of the underlying process from one time step to the next. An expression for the drift is turned into an expected first hitting time via a drift theorem. An appealing property of such a theorem is that a local property (the one-step drift) is translated into a global property (the first hitting time).

Sasaki and Hajek (1988) introduced drift analysis to the analysis of randomized search heuristics (more precisely, of simulated annealing), and He and Yao (2001) were the first to apply drift analysis to evolutionary algorithms. The latter paper presents a drift theorem that is nowadays called additive drift. Since then, numerous variants of drift theorems have been proposed, including upper and lower bounds in the scenario of multiplicative drift (Doerr et al., 2012; Lehre and Witt, 2012), variable drift (Johannsen, 2010; Mitavskiy et al., 2009; Doerr et al., 2011; Rowe and Sudholt, 2012) and generalizations thereof, e. g., variable drift without monotonicity conditions (Doerr et al., 2012; Feldmann and Kötzing, 2013). Moreover, considerable progress was made in the development of so-called distance functions used to model the process analyzed by drift analysis (Doerr and Goldberg, 2013; Witt, 2013). The powerful drift theorems available so far allow for the analysis of randomized search heuristics, in particular evolutionary algorithms and ant colony optimization, on example problems and problems from combinatorial optimization. See also the text books by Auger and Doerr (2011), Neumann and Witt (2010) and Jansen (2013) for detailed expositions of the state of the art in runtime analysis of randomized search heuristics.

At present, the exciting and powerful research done in drift analysis is scattered over the literature. Existing formulations of similar theorems may share many details but deviate in minor conditions. Notation is not always consistent. Several existing variants of drift theorems contain assumptions that might be convenient to formulate, e. g., Markovian properties and discrete or finite search spaces; however, it was not always clear which assumptions were really needed and whether the drift theorem was general enough. This is one reason why additional effort was spent on removing the assumption of discrete search spaces from multiplicative and variable drift theorems (Feldmann and Kötzing, 2013) – an effort that, as we will show, was not really required.

Our work makes two main contributions to the area of drift analysis. The first one is a “universal” formulation of a drift theorem that strives for as much generality as possible. We can provably identify all of the existing drift theorems mentioned above as special cases. While doing this, we propose a consistent notation and remove unnecessary assumptions such as discrete search spaces and Markov processes. In fact, we even identify another famous technique for the runtime analysis of randomized search heuristics, namely fitness levels (Sudholt, 2013), as a special case of our general theorem. Caveat. When we say “all” existing drift theorems, we exclude a specific but important scenario from our considerations. Our paper only considers the case that the drift is directed towards the target of optimization. The opposite case, i. e., scenarios where the process moves away from the target, is covered by the lower bounds from the so-called simplified/negative drift theorem (Oliveto and Witt, 2011), which states rather different conditions and implications. The conditions and generality of the latter theorem were scrutinized in a recent erratum (Oliveto and Witt, 2012).

The second contribution is represented by tail bounds, also called deviation bounds or concentration inequalities, on the hitting time. Roughly speaking, conditions are provided under which it is unlikely that the actual hitting time is above or below its expected value by a certain amount. Such tail bounds were not known before in drift analysis, except for the special case of upper tail bounds in multiplicative drift (Doerr and Goldberg, 2013). In particular, our drift theorem is the first to prove lower tails. We use these tail bounds in order to prove very sharp concentration bounds on the running time of a (1+1) EA on OneMax, general linear functions and LeadingOnes. Up to minor details, the following is shown for the running time $T$ of the (1+1) EA on OneMax (and the same holds on all linear functions): the probability that $T$ deviates (from above or below) from its expectation by an $r$-factor in the lower-order term decreases exponentially with $r$; an analogous exponentially decreasing deviation probability is proved for LeadingOnes. Such sharp-concentration results are extremely useful from a practical point of view since they reveal that the process is “almost deterministic”, such that very precise predictions of its actual running time can be made. Moreover, the concentration inequalities allow a change of perspective to tell what progress can be achieved within a certain time budget; see the recent line of work on fixed-budget computations (Jansen and Zarges, 2012; Doerr et al., 2013).
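The flavor of these sharp-concentration claims can be checked empirically. The following sketch is our own illustration, not code from the paper: it runs a textbook (1+1) EA (standard bit mutation with probability $1/n$, elitist selection) on OneMax and compares the empirical mean hitting time over repeated runs with the leading term $e n \ln n$ of the expected runtime.

```python
import math
import random

def one_max_runtime(n, rng):
    """Hitting time of the optimum for a (1+1) EA on OneMax.

    We track X_t = number of zero-bits; the optimum is hit when X_t = 0.
    """
    x = [rng.randrange(2) for _ in range(n)]
    zeros = x.count(0)
    t = 0
    while zeros > 0:
        t += 1
        # Standard bit mutation: flip each bit independently with probability 1/n.
        y = [b ^ 1 if rng.random() < 1.0 / n else b for b in x]
        new_zeros = y.count(0)
        if new_zeros <= zeros:  # elitist selection: accept if not worse
            x, zeros = y, new_zeros
    return t

rng = random.Random(42)
n = 64
runs = [one_max_runtime(n, rng) for _ in range(30)]
mean = sum(runs) / len(runs)
prediction = math.e * n * math.log(n)  # leading term e n ln n of E[T]
print(f"empirical mean: {mean:.0f}, e n ln n: {prediction:.0f}")
```

For moderate $n$ the empirical mean already lies close to the asymptotic prediction; since the EA starts with roughly $n/2$ zero-bits, it typically finishes somewhat below $e n \ln n$.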

This paper is structured as follows. Section 2 introduces notation and basics of drift analysis. Section 3 presents the general drift theorem, its proof and suggestions for user-friendly corollaries. Afterwards, specializations are discussed. Section 4 shows how the general drift theorem is related to known variable drift theorems, and Section 5 specializes our general theorem into existing multiplicative drift theorems. The fitness level technique, both for lower and upper bounds, is identified as a special case in Section 6. Section 7 is devoted to the tail bounds contained in the general drift theorem. It is shown how they can directly be applied to prove sharp-concentration results on the running time of the (1+1) EA on OneMax and general linear functions. Moreover, a more user-friendly special case of the theorem with tail bounds is proved and used to show sharp-concentration results w. r. t. LeadingOnes. We finish with some conclusions.

2 Preliminaries

Stochastic process.

Throughout this paper, we analyze time-discrete stochastic processes represented by a sequence of non-negative random variables $(X_t)_{t \ge 0}$. For example, $X_t$ could represent the number of zero- or one-bits of a (1+1) EA at generation $t$, a certain distance value of a population-based EA from an optimal population etc. In particular, $X_t$ might aggregate several different random variables realized by a search heuristic at time $t$ into a single one. We do not care whether the state space is discrete (e. g., all non-negative integers or even a finite subset thereof) or continuous. In discrete search spaces, the random variables will have a discrete support; however, this is not important for the formulation of the forthcoming theorems.

First hitting time.

We adopt the convention that the process should pass below some threshold $a \ge 0$ (“minimize” its state) and define the first hitting time $T_a := \min\{t \mid X_t \le a\}$. If the actual process seeks to maximize its state, typically a straightforward mapping allows us to stick to the convention of minimization. In a special case, we are interested in the hitting time $T := \min\{t \mid X_t = 0\}$ of state $0$; for example, when a (1+1) EA is run on OneMax and we are interested in the first point of time where the number of zero-bits becomes zero. Note that $T$ is a stopping time and that we tacitly assume that the stochastic process is adapted to its natural filtration $(\mathcal{F}_t)_{t \ge 0}$, where $\mathcal{F}_t := \sigma(X_0, \dots, X_t)$, i. e., the information available up to time $t$.


The expected one-step change $E[X_t - X_{t+1} \mid \mathcal{F}_t]$ for $t < T$ is called drift. Note that in general the drift is a random variable since the outcomes of $X_0, \dots, X_t$ are random. Suppose we manage to bound $E[X_t - X_{t+1} \mid \mathcal{F}_t]$ from below by some $\delta > 0$ for all possible outcomes of the history where $X_t > 0$. Then we know that the process decreases its state (“progresses towards $0$”) in expectation by at least $\delta$ in every step, and the additive drift theorem (see Theorem 1 below) will provide a bound on $E[T]$ that only depends on $X_0$ and $\delta$. In fact, the very naturally looking result $E[T \mid X_0] \le X_0/\delta$ will be obtained. However, bounds on the drift might be more complicated. For example, a bound on $E[X_t - X_{t+1} \mid \mathcal{F}_t]$ might depend on $X_t$ or states at even earlier points of time, e. g., if the progress decreases as the current state decreases. This is often the case in applications to evolutionary algorithms. It is not so often the case that the whole “history” is needed. Simple evolutionary algorithms and other randomized search heuristics are Markov processes such that simply $E[X_t - X_{t+1} \mid \mathcal{F}_t] = E[X_t - X_{t+1} \mid X_t]$. With respect to Markov processes on discrete search spaces, drift conditions traditionally use conditional expectations such as $E[X_t - X_{t+1} \mid X_t = x]$ and bound these for arbitrary $x > 0$ instead of directly bounding the random variable $E[X_t - X_{t+1} \mid X_t]$.


As pointed out, the drift in general is a random variable and should not be confused with the “expected drift” $E[X_t - X_{t+1}]$, which is rarely available since it averages over the whole history of the stochastic process. Drift is based on the inspection of the progress from one step to another, taking into account every possible history. This one-step inspection often makes it easy to come up with bounds on $E[X_t - X_{t+1} \mid \mathcal{F}_t]$. Drift theorems could also be formulated based on the expected drift; however, this might be tedious to compute. See Jägersküpper (2011) for one of the rare analyses of “expected drift”, which we will not get into in this paper.
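To make the one-step view concrete, the following sketch (ours, not from the paper) estimates the conditional drift $E[X_t - X_{t+1} \mid X_t = k]$ of the (1+1) EA on OneMax, where $X_t$ counts zero-bits; since this process is Markovian, conditioning on $X_t$ suffices. The estimate is compared with the elementary lower bound $(k/n)(1 - 1/n)^{n-1}$ obtained from mutations that flip exactly one zero-bit and nothing else.

```python
import random

def empirical_drift(n, k, trials, rng):
    """Estimate E[X_t - X_{t+1} | X_t = k] for the (1+1) EA on OneMax,
    where X_t counts zero-bits (by symmetry, only the count k matters)."""
    total = 0.0
    for _ in range(trials):
        x = [0] * k + [1] * (n - k)  # a representative state with k zero-bits
        y = [b ^ 1 if rng.random() < 1.0 / n else b for b in x]
        ky = y.count(0)
        if ky <= k:  # elitist selection: keep the offspring only if not worse
            total += k - ky
    return total / trials

rng = random.Random(1)
n, k = 100, 30
est = empirical_drift(n, k, 20000, rng)
# Lower bound from flipping exactly one zero-bit and no other bit:
lower = (k / n) * (1 - 1 / n) ** (n - 1)
print(f"empirical drift: {est:.4f}, one-bit-flip lower bound: {lower:.4f}")
```

The empirical estimate exceeds the one-bit-flip bound, as expected, because mutations flipping several bits at once can also yield progress.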

We now present the first formal drift theorem dealing with additive drift. It is based on a formulation by He and Yao (2001), from which we removed some unnecessary assumptions, more precisely the discrete search space and the Markov property. We only demand a bounded state space.

Theorem 1 (Additive Drift, following He and Yao (2001)).

Let $(X_t)_{t \ge 0}$ be a stochastic process over some bounded state space $S \subseteq \mathbb{R}_{\ge 0}$, and let $T := \min\{t \mid X_t = 0\}$ denote its first hitting time of state $0$. Then:

  1. If $E[X_t - X_{t+1} \mid \mathcal{F}_t] \ge \delta_\ell$ for some $\delta_\ell > 0$ and all $t < T$, then $E[T \mid X_0] \le X_0/\delta_\ell$.

  2. If $E[X_t - X_{t+1} \mid \mathcal{F}_t] \le \delta_u$ for some $\delta_u > 0$ and all $t < T$, then $E[T \mid X_0] \ge X_0/\delta_u$.

By applying the law of total expectation, Statement 1 implies $E[T] \le E[X_0]/\delta_\ell$, and analogously Statement 2 implies $E[T] \ge E[X_0]/\delta_u$.

For the sake of completeness, we also provide a simple proof using martingale theory, inspired by Lehre (2012). This proof is simpler than the original one by He and Yao (2001).

Proof of Theorem 1.

We prove only the upper bound since the lower bound is proven symmetrically. We define $Y_t := X_t + t\delta_\ell$. Note that as long as $t < T$, the sequence $(Y_t)_{t \ge 0}$ is a supermartingale w. r. t. $(\mathcal{F}_t)_{t \ge 0}$; more precisely,

$E[Y_{t+1} \mid \mathcal{F}_t] = E[X_{t+1} \mid \mathcal{F}_t] + (t+1)\delta_\ell \le X_t - \delta_\ell + (t+1)\delta_\ell = Y_t,$

where the inequality uses the drift condition. Since the state space is bounded, we can apply the optional stopping theorem and get $E[Y_T \mid X_0] \le Y_0 = X_0$. Since $X_T = 0$, this means $\delta_\ell \, E[T \mid X_0] \le X_0$. Rearranging terms, the theorem follows. ∎

Summing up, additive drift is concerned with the very simple scenario that there is a progress of at least $\delta_\ell$ from all non-optimal states towards the optimum in Statement 1 and a progress of at most $\delta_u$ in Statement 2. Since the $\delta$-values are not allowed to depend on $X_t$, one has to use the worst-case drift over all states. This might lead to very bad bounds on the first hitting time, which is why more general theorems (as mentioned in the introduction) were developed. It is interesting to note that these more general theorems are often proved based on Theorem 1 above by using an appropriate mapping from the original state space to a new one. Informally, the mapping “smoothes out” position-dependent drift into an (almost) position-independent drift. We will use the same approach in the following.
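As a quick sanity check of the additive drift bound (our own toy example, not from the paper), consider a random walk on the non-negative integers that steps down with probability $p > 1/2$ and up otherwise, so the drift is the position-independent $\delta = 2p - 1$. For this walk the bound $E[T \mid X_0] = X_0/\delta$ is exact; note that the walk's state space is unbounded, so this only illustrates the formula, not the theorem's exact prerequisites.

```python
import random

def hitting_time(x0, p_down, rng):
    """Biased walk on the non-negative integers: step -1 w.p. p_down, else +1.

    The one-step drift is delta = 2*p_down - 1, independent of the position,
    and for p_down > 1/2 the expected hitting time of 0 is exactly x0/delta.
    """
    x, t = x0, 0
    while x > 0:
        x += -1 if rng.random() < p_down else 1
        t += 1
    return t

rng = random.Random(7)
x0, p_down = 20, 0.7
delta = 2 * p_down - 1  # 0.4
runs = [hitting_time(x0, p_down, rng) for _ in range(2000)]
mean = sum(runs) / len(runs)
print(f"empirical mean: {mean:.1f}, additive drift bound x0/delta: {x0 / delta:.1f}")
```

The empirical mean over many runs matches $X_0/\delta = 50$ closely.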

3 General Drift Theorem

In this section, we present our general drift theorem. As pointed out in the introduction, we strive for a very general statement, which is partly at the expense of simplicity. More user-friendly specializations will be proved in the following sections. Nevertheless, the underlying idea of the complicated-looking general theorem is the same as in all drift theorems. We look into the one-step drift and assume we have an (upper or lower) bound $h$ on the drift, which (possibly heavily) depends on $X_t$. Based on $h$, a new function $g$ is defined with the aim of “smoothing out” the dependency, and the drift w. r. t. $g$ (formally, $E[g(X_t) - g(X_{t+1}) \mid \mathcal{F}_t]$) is analyzed. Statements 1 and 2 of the following Theorem 2 provide bounds on $E[T]$ based on the drift w. r. t. $g$. In fact, $g$ is defined in a very similar way as in existing variable-drift theorems, such that Statements 1 and 2 can be understood as generalized variable drift theorems for upper and lower bounds on the expected hitting time, respectively. Statement 2 is also valid (but useless) if the expected hitting time is infinite. Sections 4–6 study specializations of these first two statements into existing variable and multiplicative drift theorems.

Statements 3 and 4 are concerned with tail bounds on the hitting time. Here moment-generating functions of the drift w. r. t. $g$ come into play; formally, $E[e^{-\lambda(g(X_t) - g(X_{t+1}))} \mid \mathcal{F}_t]$ is bounded. Again for the sake of generality, bounds on the moment-generating function may depend on the current state $X_t$, as captured by the bounds $\beta_u$ and $\beta_\ell$. We will see an example in Section 7 where the mapping $g$ smoothes out the position-dependent drift into a (nearly) position-independent drift, while the moment-generating function of the drift w. r. t. $g$ still heavily depends on the current position $X_t$.

Theorem 2 (General Drift Theorem).

Let $(X_t)_{t \ge 0}$ be a stochastic process over some state space $S \subseteq \{0\} \cup [x_{\min}, x_{\max}]$, where $x_{\min} > 0$. Let $h \colon [x_{\min}, x_{\max}] \to \mathbb{R}^+$ be an integrable function and define $g \colon \{0\} \cup [x_{\min}, x_{\max}] \to \mathbb{R}_{\ge 0}$ by $g(x) := \frac{x_{\min}}{h(x_{\min})} + \int_{x_{\min}}^{x} \frac{1}{h(y)}\,dy$ for $x \ge x_{\min}$ and $g(0) := 0$. Let $T := \min\{t \mid X_t = 0\}$ denote the first hitting time. Then:

  1. If $E[g(X_t) - g(X_{t+1}) \mid \mathcal{F}_t] \ge \alpha_\ell$ for some $\alpha_\ell > 0$ then $E[T \mid X_0] \le \frac{g(X_0)}{\alpha_\ell}$.

  2. If $E[g(X_t) - g(X_{t+1}) \mid \mathcal{F}_t] \le \alpha_u$ for some $\alpha_u > 0$ then $E[T \mid X_0] \ge \frac{g(X_0)}{\alpha_u}$.

  3. If there exist $\lambda > 0$ and a function $\beta_u \colon S \to \mathbb{R}^+$ such that $E[e^{-\lambda(g(X_t) - g(X_{t+1}))} \mid \mathcal{F}_t] \le \beta_u(X_t)$, then $\Pr(T > t \mid X_0) < E\bigl[\prod_{s=0}^{t-1} \beta_u(X_s)\bigr] \cdot e^{\lambda g(X_0)}$ for $t > 0$.

  4. If there exist $\lambda > 0$ and a function $\beta_\ell \colon S \to \mathbb{R}^+$ such that $E[e^{\lambda(g(X_t) - g(X_{t+1}))} \mid \mathcal{F}_t] \le \beta_\ell(X_t)$, then $\Pr(T < t \mid X_0) < \sum_{s=0}^{t-1} E\bigl[\prod_{r=0}^{s-1} \beta_\ell(X_r)\bigr] \cdot e^{-\lambda g(X_0)}$ for $t > 0$.

    If additionally the target state $0$ is absorbing, then $\Pr(T < t \mid X_0) < E\bigl[\prod_{s=0}^{t-2} \beta_\ell(X_s)\bigr] \cdot e^{-\lambda g(X_0)}$.

Special cases of $\beta_u$ and $\beta_\ell$.

If $\beta_u(x) = \beta$ for some position-independent $\beta$, then Statement 3 boils down to $\Pr(T > t \mid X_0) < \beta^t \cdot e^{\lambda g(X_0)}$; similarly for Statement 4.

On $x_{\min}$.

Some specializations of Theorem 2 require a “gap” in the state space between optimal and non-optimal states, modelled by $x_{\min} > 0$. One example is multiplicative drift, see Theorem 7 in Section 5. Another example is the process defined by $X_0 := 1$ and $X_{t+1} := X_t/2$ for $t \ge 0$. Its first hitting time of state $0$ cannot be derived by drift arguments since the lower bound on the drift towards the optimum within the interval $(0, 1]$ has limit $0$.
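A minimal numerical illustration in the spirit of this remark (our own example): the deterministic process $X_0 := 1$, $X_{t+1} := X_t/2$ keeps shrinking geometrically, its drift $X_t/2$ has infimum $0$ over $(0, 1]$, and it never reaches state $0$.

```python
# Deterministic halving: X_0 = 1, X_{t+1} = X_t / 2.
# The drift at state x is x - x/2 = x/2, so any uniform lower bound on the
# drift over (0, 1] must be 0, and the hitting time of state 0 is infinite.
x = 1.0
for _ in range(200):
    x /= 2.0
print(x)  # 2**-200, still strictly positive
```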

Proof of Theorem 2.

The first two items follow from the classical additive drift theorem (Theorem 1). To prove the third one, we use ideas implicit in Hajek (1982) and argue

where the first inequality uses that $g$ is non-decreasing, the equality that $x \mapsto e^{\lambda x}$ is a bijection, and the last inequality is Markov’s inequality. Now,

where the last equality follows inductively (note that this does not assume independence of the one-step differences). Using the prerequisite from the third item, we get


which proves the third item.

The fourth item is proved similarly to the third one. By a union bound,

for . Moreover,

using again Markov’s inequality. By the prerequisites, we get


If furthermore the state $0$ is absorbing, then $T < t$ is equivalent to $X_{t-1} = 0$. In this case,

Our drift theorem is very general and therefore complicated. In order to apply it, specializations might be welcome based on assumptions that typically are satisfied. The rest of this section discusses such simplifications; however, we do not yet apply them in this paper.

By making some additional assumptions on the function $h$, we get the following special cases.

Lemma 1.

Let $x_{\min} > 0$, $\lambda > 0$, and let $h$ be any real-valued, differentiable function. Define $g$ as in Theorem 2.

  • If $h'(x) \le -\lambda$ for all $x$, then $e^{-\lambda g(x)}$ is concave.

  • If $h'(x) \ge -\lambda$ for all $x$, then $e^{-\lambda g(x)}$ is convex.

  • If $h'(x) \le \lambda$ for all $x$, then $e^{\lambda g(x)}$ is convex.

  • If $h'(x) \ge \lambda$ for all $x$, then $e^{\lambda g(x)}$ is concave.


The second derivative of $e^{-\lambda g(x)}$ is

$\frac{d^2}{dx^2}\, e^{-\lambda g(x)} = \frac{\lambda e^{-\lambda g(x)}}{h(x)^2}\,\bigl(\lambda + h'(x)\bigr),$

where the first factor is positive. If $h'(x) \le -\lambda$, then $\lambda + h'(x) \le 0$, and $e^{-\lambda g(x)}$ is concave. If $h'(x) \ge -\lambda$, then $\lambda + h'(x) \ge 0$, and $e^{-\lambda g(x)}$ is convex.

Similarly, the second derivative of $e^{\lambda g(x)}$ is

$\frac{d^2}{dx^2}\, e^{\lambda g(x)} = \frac{\lambda e^{\lambda g(x)}}{h(x)^2}\,\bigl(\lambda - h'(x)\bigr),$

where the first factor is positive. If $h'(x) \ge \lambda$, then $\lambda - h'(x) \le 0$, and $e^{\lambda g(x)}$ is concave. If $h'(x) \le \lambda$, then $\lambda - h'(x) \ge 0$, and $e^{\lambda g(x)}$ is convex. ∎

Corollary 1.

Let $(X_t)_{t \ge 0}$ be a stochastic process over some state space $S \subseteq \{0\} \cup [x_{\min}, x_{\max}]$, where $x_{\min} > 0$. Let $h$ be a differentiable function. Then the following statements hold for the first hitting time $T := \min\{t \mid X_t = 0\}$.

  1. If and , then

  2. If and , then

  3. If and for some , then

  4. If and for some , then


Let , and note that .

For (i), it suffices to show that condition (i) of Theorem 2 is satisfied. From the assumption, it follows that $g$ is a concave function. Jensen’s inequality therefore implies that

where the last inequality holds because is a non-decreasing function.

For (ii), it suffices to show that condition (ii) of Theorem 2 is satisfied. From the assumption, it follows that $g$ is a convex function. Jensen’s inequality therefore implies that

where the last inequality holds because is a non-increasing function.

For (iii), it suffices to show that condition (iii) of Theorem 2 is satisfied for . By Lemma 1 and Jensen’s inequality, it holds that


where the last inequality holds because is strictly monotonically increasing.

For (iv) a), it suffices to show that condition (iv) of Theorem 2 is satisfied. By Lemma 1 and Jensen’s inequality, it holds that


where the last inequality holds because is strictly monotonically decreasing.

4 Variable Drift as Special Case

The purpose of this section is to show that known variants of variable drift theorems can be derived from our general Theorem 2.

4.1 Classical Variable Drift and Fitness Levels

A clean form of a variable drift theorem, generalizing previous formulations by Johannsen (2010) and Mitavskiy et al. (2009), was recently presented by Rowe and Sudholt (2012). We restate their theorem in our notation and carry out two obvious generalizations: we allow for a continuous state space instead of demanding a finite one, and we do not fix $x_{\min} := 1$.

Theorem 3 (Variable Drift; following Rowe and Sudholt (2012)).

Let $(X_t)_{t \ge 0}$ be a stochastic process over some state space $S \subseteq \{0\} \cup [x_{\min}, x_{\max}]$, where $x_{\min} > 0$. Let $h \colon [x_{\min}, x_{\max}] \to \mathbb{R}^+$ be an integrable, monotone increasing function such that $E[X_t - X_{t+1} \mid \mathcal{F}_t] \ge h(X_t)$ if $X_t \ge x_{\min}$. Then it holds for the first hitting time $T := \min\{t \mid X_t = 0\}$ that

$E[T \mid X_0] \le \frac{x_{\min}}{h(x_{\min})} + \int_{x_{\min}}^{X_0} \frac{1}{h(y)}\,dy.$


Since $h$ is monotone increasing, $1/h$ is decreasing and $g$, defined in Theorem 2, is concave. By Jensen’s inequality, we get

$E[g(X_t) - g(X_{t+1}) \mid \mathcal{F}_t] \ge g(X_t) - g(E[X_{t+1} \mid \mathcal{F}_t]) \ge g(X_t) - g(X_t - h(X_t)) = \int_{X_t - h(X_t)}^{X_t} \frac{1}{h(y)}\,dy,$

where the equality just expanded the definition of $g$. Using that $1/h$ is decreasing, it follows that

$E[g(X_t) - g(X_{t+1}) \mid \mathcal{F}_t] \ge \frac{h(X_t)}{h(X_t)} = 1.$

Plugging in $\alpha_\ell := 1$ in Statement 1 of Theorem 2 completes the proof. ∎
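As a worked instance of the bound (our own example; the drift function is the standard one for the (1+1) EA on OneMax, not quoted from this paper's text), let $X_t$ be the number of zero-bits. Flipping exactly one of $x$ zero-bits and nothing else gives $h(x) = (x/n)(1 - 1/n)^{n-1}$, so with $x_{\min} = 1$ the bound evaluates in closed form to $\frac{n}{(1-1/n)^{n-1}}(1 + \ln X_0) \le e n (1 + \ln X_0)$. The sketch below checks the closed form against a numeric quadrature of the integral.

```python
import math

n, x0 = 100, 50
c = (1 - 1 / n) ** (n - 1)  # >= 1/e, so n/c <= e*n

# Drift lower bound for the (1+1) EA on OneMax with x zero-bits:
# flip exactly one zero-bit and no other bit.
h = lambda x: (x / n) * c

# Theorem 3 bound with x_min = 1: 1/h(1) + integral_1^{x0} dy/h(y) = (n/c)*(1 + ln x0).
closed_form = (n / c) * (1 + math.log(x0))

# Cross-check the closed form with a midpoint-rule quadrature of the integral.
steps = 100000
width = (x0 - 1) / steps
quad = 1 / h(1) + sum(width / h(1 + (i + 0.5) * width) for i in range(steps))

print(f"closed form: {closed_form:.2f}, quadrature: {quad:.2f}")
```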

Rowe and Sudholt (2012) also pointed out that variable drift theorems in discrete search spaces look very similar to bounds obtained from the fitness level technique (also called the method of $f$-based partitions, first formulated by Wegener, 2001). For the sake of completeness, we present the classical upper bounds by fitness levels w. r. t. the (1+1) EA here and prove them by drift analysis.

Theorem 4 (Classical Fitness Levels, following Wegener (2001)).

Consider the (1+1) EA maximizing some function $f$ and a partition of the search space into non-empty sets $A_1, \dots, A_m$. Assume that the sets form an $f$-based partition, i. e., for $1 \le i < j \le m$ and all $x \in A_i$, $y \in A_j$, it holds that $f(x) < f(y)$. Let $p_i$ be a lower bound on the probability that a search point in $A_i$ is mutated into a search point in $A_{i+1} \cup \dots \cup A_m$. Then the expected hitting time of $A_m$ is at most

$\sum_{i=1}^{m-1} \frac{1}{p_i}.$


At each point of time, the (1+1) EA is in a unique fitness level. Let $i(t)$ denote the current fitness level at time $t$. We consider the process defined by $X_t := m - i(t)$. By definition of fitness levels and the (1+1) EA, $X_t$ is non-increasing over time. Consider $X_t = x$ for $x > 0$. With probability at least $p_{m-x}$, the $X$-value decreases by at least $1$. Consequently, $E[X_t - X_{t+1} \mid \mathcal{F}_t] \ge p_{m - X_t}$. We define $h(x) := p_{m - \lceil x \rceil}$, $x_{\min} := 1$ and $x_{\max} := m - 1$, and obtain an integrable, monotone increasing function on $[x_{\min}, x_{\max}]$. Hence, the upper bound on $E[T]$ from Theorem 3 becomes at most $\frac{1}{p_{m-1}} + \int_{1}^{m-1} \frac{1}{h(y)}\,dy \le \sum_{i=1}^{m-1} \frac{1}{p_i}$, which completes the proof. ∎
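The fitness-level bound can be evaluated numerically as well (our illustration, using the standard level probabilities for the (1+1) EA on OneMax, where level $i$ consists of search points with exactly $i$ one-bits): with $p_i = \frac{n-i}{n}(1 - \frac{1}{n})^{n-1}$, the bound $\sum_{i=0}^{n-1} 1/p_i$ equals $\frac{n}{(1-1/n)^{n-1}} H_n$, where $H_n$ is the $n$-th harmonic number.

```python
n = 100
c = (1 - 1 / n) ** (n - 1)  # probability of leaving all other bits untouched

# Level i = search points with exactly i one-bits; improving means flipping
# one of the n - i zero-bits and nothing else: p_i = ((n - i)/n) * c.
bound = sum(n / ((n - i) * c) for i in range(n))  # sum of 1/p_i over i = 0..n-1

harmonic = sum(1 / k for k in range(1, n + 1))  # H_n
print(f"fitness-level bound: {bound:.1f}, (n/c) * H_n: {(n / c) * harmonic:.1f}")
```

Since $(1-1/n)^{n-1} \ge 1/e$, the bound is at most $e n (\ln n + 1)$, matching the familiar $O(n \log n)$ runtime of the (1+1) EA on OneMax.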

Recently, the fitness-level technique was considerably refined and supplemented by lower bounds (Sudholt, 2013). We will also identify these extensions as a special case of general drift in Section 6.

4.2 Non-monotone Variable Drift and Lower Bounds by Variable Drift

In many applications, a monotone increasing function bounds the drift from below. For example, the expected progress towards the optimum of OneMax increases with the distance of the current search point from the optimum. However, Doerr et al. (2012) recently found that certain ACO algorithms do not have this property and exhibit a non-monotone drift. To handle this case, they present a generalization of Johannsen’s drift theorem that does not require the drift bound to be monotone. The most recent version of this theorem is presented in Feldmann and Kötzing (2013). Unfortunately, it turned out that the two generalizations suffer from a missing condition relating positive and negative drift to each other. Adding the condition and removing an unnecessary assumption (more precisely, a continuity requirement), the theorem by Feldmann and Kötzing (2013) can be corrected as follows.

Theorem 5 (extending Feldmann and Kötzing (2013)).

Let $(X_t)_{t \ge 0}$ be a stochastic process over some state space $S \subseteq \{0\} \cup [x_{\min}, x_{\max}]$, where $x_{\min} > 0$. Suppose there exist two functions, one of them integrable, and a constant such that for all $t$:

  1. ,

  2. ,

  3. if ,

  4. for all with , it holds .

Then it holds for the first hitting time that

It is worth noting that Theorem 3 is not necessarily a special case of Theorem 5.


Using the definition of $g$ according to Theorem 2 and assuming $X_t \ge x_{\min}$, we compute the drift

Item (4) from the prerequisites yields if and if . Using this and , the drift can be further bounded by

where the first inequality used Item (2) from the prerequisites and the last one Item (1). Plugging the result into Theorem 2 completes the proof. ∎

Finally, so far only a single variant dealing with upper bounds on variable drift, and thus lower bounds on the hitting time, seems to have been published. It was derived by Doerr, Fouz, and Witt (2011). Again, we present a variant without unnecessary assumptions; more precisely, we allow continuous state spaces and use less restricted bounding functions.

Theorem 6 (following Doerr, Fouz, and Witt (2011)).

Let $(X_t)_{t \ge 0}$ be a stochastic process over some state space $S \subseteq \{0\} \cup [x_{\min}, x_{\max}]$, where $x_{\min} > 0$. Suppose there exist two functions on $[x_{\min}, x_{\max}]$, one of which is monotone increasing and integrable, such that for all $t$,

  1. ,

  2. for ,

  3. for .

Then it holds for the first hitting time that


Using the definition of $g$ according to Theorem 2, we compute the drift