Approximating the Permanent of a Random Matrix with Vanishing Mean

11/26/2017 ∙ by Lior Eldar, et al. ∙ MIT 0

Building upon a recent approach pioneered by Barvinok [4, 5, 7, 8] we present a quasi-polynomial time algorithm for approximating the permanent of a typical n x n random matrix with unit variance and vanishing mean μ = (ln ln n)-1/6 to within inverse polynomial multiplicative error. This result counters the common intuition that the difficulty of computing the permanent, even approximately, stems merely from our inability to treat matrices with many opposing signs. We believe that our approach may have several implications to understanding the permanent in the context of computational complexity, anti-concentration statistics, and de-randomization.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

1.1 Complexity of computing the permanent

The permanent of an matrix is the following degree polynomial in the entries of :


where is the symmetric group over elements. Its computation has been the subject of intense research [5, 3, 2, 7, 8, 9] and has been connected to subjects ranging from random sampling of bi-partite matchings [2] to establishing a so-called “quantum supremacy” using linear optical experiments [3].

Much of this interest is centered around the computational complexity of computing the permanent. The permanent is known to be -hard to compute exactly [8, 10] and we only know exponential time algorithms for computing the permanent of a general matrix , the fastest of which is the Ryser formula.

Because of this hardness in the worst case, research has also focused on computing an approximation for the value of permanent: for multiplicative approximation, approximation schemes are known for several special cases of matrices. Perhaps most prominently is the work of Jerrum, Sinclair and Vigoda [2] who showed a randomized polynomial time algorithm to compute a multiplicative approximation of the permanent for matrices with non-negative entries. More recently [11] have shown how to approximate the permanent of a PSD matrix to simply exponential factor in polynomial time.

Still, if one allows the matrix to have an arbitrary number of negative values it is even -hard to compute the sign of the permanent [10], which rules out a multiplicative approximation. This suggests that part of the computational hardness of computing the permanent comes from alternating signs in a matrix. Hence, for general matrices, it seems that efficient approximation of the permanent remains well out of reach and progress is limited by a an “interference barrier” by which the positive and negative entries of a matrix may generate an intricate interference pattern that is hard to approximate.

Given the apparent difficulty in computing the permanent exactly and approximately one may ask a different question: Is the computation of permanent still hard on a faction of inputs? It turns out that it is still difficult to compute the permanent even on a small fraction of inputs. For example, it has been shown [12] that the permanent of a matrix over a finite field is -hard to compute exactly even on a fraction of such matrices. Specifically for the case of complex Gaussian matrices a somewhat weaker statement is known: It is -hard to compute exactly for a Gaussian w.p. greater than [3].

Faced with the apparent robustness of the complexity of computing the permanent against both error on a fraction of inputs and against approximation, Aaronson and Arkhipov [3] designed a beautiful paradigm called for demonstrating a so-called quantum supremacy over classical computers. It hinges upon permanent computation remaining -hard even when we simultaneously allow error on some inputs (see table 1), and allow an approximate value for the inputs for which we do handle. The intuitive difficulty of approximating the permanent on such a distribution stems from the same ”interference barrier” described above. Namely, that quantum computers, using complex amplitudes encoded in quantum states can ”bypass” this barrier naturally, whereas classical computers fall short.

The range of parameters in our result approaches the threshold of this conjectured hardness. Hence, it raises the intriguing question of whether there exists a yet undiscovered phenomenon, that occurs only for zero mean, which prohibits efficient approximation, or whether that conjecture is false. We discuss this further in Section 1.6.1.

worst case average case
exact -hard -hard
approximate -hard ?
Table 1: The computational complexity of computing the permanent of a complex Gaussian matrix. We know that: Permanent is -hard to compute exactly in the worst case or average case. We also know that permanent is -hard to compute exactly on average. However, nothing is known about the approximation of permanent in the average case. Our result demonstrates that the permanent can be approximated in quasi-polynomial time if the Gaussian ensemble has non-zero but vanishing mean.

1.2 A complex function perspective

Recent works of Barvinok [4, 5, 1, 6, 13] have outlined a new approach to computing the permanent, and in fact a large class of high-degree polynomials in matrices such as the partition function [14, 15]. His approach, stated here for the permanent of an matrix , is quite intuitive: instead of trying to compute directly, one computes an additive approximation of and then exponentiates the result. Let be the all ones matrix. The approximation of is then computed as a Taylor series approximation of the complex log of the univariate polynomial around the point , and evaluated at point . The crucial point is that one can compute the lowest derivatives of at point relatively efficiently, i.e., in time since they correspond essentially to a linear combination of the permanents of sub-matrices of of size at most . The additive approximation error of the Taylor series expansion decays exponentially fast in so choosing implies an algorithm for a multiplicative error of that runs in time at most .

In order to apply this Taylor series technique there is a significant limitation that must be carefully addressed: The Taylor approximation of about a point is valid only in a disk around that contains no poles of (the roots of

). Therefore, this approach inherently requires knowledge about the location of the roots of the univariate polynomial used for interpolating the value of the permanent from an easy-to-compute matrix to the target matrix


Using combinatorial arguments Barvinok characterized the location of the roots of for certain complex matrices: For example those that that satisfy [1] and diagonally dominant matrices [13]. This then implied quasi-polynomial time algorithms for these classes of matrices.

Hence, his approach is the first to break the ”sign barrier” for approximating the permanent - i.e. the ability to approximate the permanent for matrices that have many entries with opposing signs, thus extending our ability to compute the permanent beyond matrices with non-negative entries for which the algorithm by Jerrum, Sinclair and Vigoda [2] is famously known for. Still, this divergence from non-negative entry matrices was only quite mild: The entries of the matrix types that the algorithm handles are still highly correlated in a specific direction and so have a very negligible interference pattern.

That said, this approach opens up a wide range of possibilities for computing these otherwise intractable polynomials: Instead of thinking about them as combinatorial objects, one can ask a completely different set of questions: what can be said about the location of the roots of ? can one detour around these roots in order to reach “interesting” points where the value of is non-trivial? Yet another set of questions can then be phrased for a “random ensemble” of matrices : what is the behavior of the roots of for typical and can they be detoured “on average”, in an efficient way? Answering such questions analytically, and subsequently efficiently as an algorithm is the focus of our work.

1.3 Main results

Consider a random matrix where each entry is sampled independently from a distribution of complex valued random variables with mean

and variance . We refer to such matrix a random matrix with mean and variance . An example of such a matrix is a Bernoulli or Gaussian matrix.

Theorem (Informal statement of Theorem 18).

Let be some sufficiently large integer and . Let be an random matrix with mean and variance . There is a deterministic quasi-polynomial time algorithm that for fraction of random matrices outputs a number that is within inverse polynomial relative error of the permanent .

We note that one can also achieve a mean value parameter of with a run time that is strictly faster than for any . See Remark 20

One can ask whether perhaps allowing a non-vanishing mean devoids the permanent of its average-case hardness. Thus we show a complementary claim whose proof appears in [16]

Theorem 1 (Average-case hardness for the permanent of a nonzero mean Gaussian).

Let and let be the oracle that for any computes the permanent of fraction of matrices from the ensemble exactly. Then .

Hence our results establish a natural ensemble of matrices for which permanent approximation is efficient on average even though the exact computation is not. This should be contrasted with the central BosonSampling assumption which is that permanent approximation remains hard for -mean.

1.4 Roots of random interpolating polynomials

In this work we consider the Taylor series technique of Barvinok (see Section 1.2) in a random setting: We ask - given an ensemble of random matrices what can be said about the typical location of roots of some interpolating polynomial related to the permanent of , say ?

Notably, this question completely changes the flavor of the analysis: Instead of considering families of matrices with some fixed property (say diagonal dominance) and analyzing them combinatorially, we consider random matrices, and then treat as a random polynomial which allows us to bring to bear techniques from analysis of random polynomials.

First, to simplify matters we consider instead the polynomial


and observe that for any non-zero if is a random matrix with mean , then is (up to normalization by ) the permanent of a biased version of with mean . Given this polynomial we ask - what is the distribution of roots of ? Our goal is to show that we can find a sequence of overlapping disks, whose radii is not too small, that allows us to apply analytic continuation (see e.g. [17]) from the value of the function at to for some .

A useful technique in the analysis of random polynomials is Jensen’s formula which states that for an analytic function with zeros that is analytic in the complex disk of radius centered at the origin and we have


In order to apply Jensen’s formula we observe that the left hand side of the formula, for zero mean random matrices , and

is essentially bounded from above by the the logarithm of the second moment

. We prove in Lemma 7 (which provides a simplified version of a proof by Rempała and Wesołowski [18]) that this second moment is upper-bounded by a term that scales at most exponentially with the square of :


Together with Jensen’s formula this bound implies two interesting facts about the roots of summarized in Proposition 8: that typical matrices are such that has no roots inside the disk of radius for , and very few (say ) roots in the disk of radius .

These two facts imply together the existence of analytic curves arching from to some value , with the following property: For a typical from the distribution, these curves are at distance at least some small from any root of . These curves are depicted in Figure 2. Proposition 8 implies that most curves of this form avoid all roots of with margin at least for most ’s, so our algorithm samples such a curve at random and use it to interpolate the value of from to a large value of .

1.5 Turning into an algorithm

To recap the previous section, our overall strategy is to compute a Taylor series expansion of the logarithm of a polynomial related to the permanent of a random matrix . In order to do that, we characterize the location of the roots of for a typical matrix which allows us to find simple curves in the complex plane which are at distance at least some small but large enough from any root of for most ’s. This would imply that for such matrices the function is analytic on any point along these curves, up to radius .

However, it is not immediately clear that this analytic continuation strategy can be turned into an algorithm. Suppose is root-free within -neighborhood of each point of the segment . In his work [5] Barvinok suggested composing with an auxiliary polynomial corresponding to terms in the Taylor expansion of around . He showed that has indeed no roots inside a disk of radius . See lemma 8.1 of [5] and Section 2.2 of [19] for more details.

It is somewhat less clear however, whether one can use the auxiliary map strategy for tubes around more elaborate curves like a quadratic curve, or a piecewise linear curve (which we consider in this work), and whether such maps would result in good error parameters. In order to extend Barvinok’s approach to such curves one needs to compose with another auxiliary low-degree polynomial map which maps the neighborhood of to an neighborhood of these curves. We however use a different method which allows us to interpolation along an arbitrary (even non-differentiable) curve. We name this complementary algorithmic technique Computational Analytic Continuation.

1.5.1 Computational analytic continuation (CAC)

CAC is an algorithm that for a degree polynomial computes the value of at some value given an oracle access to derivatives of at , and a curve , that is at distance at least some from all roots of .

Let be a univariate polynomial and assume we have an oracle which computes the derivatives of this function at . Let be some curve in the complex plane that is at distance at least from any root of . We would like to approximate for some complex number , using a Taylor series expansion of order , that is small as possible. For simplicity, we assume that is piece-wise linear and divide each segment of into small intervals. We denote the entire sequence of intervals as . For each the length of is at most for all , for some .

We then use a sequence of iterations to compute many derivatives at each step: at the first step we compute some derivatives, where is suitably chosen for small approximation error. Then at the next step, we compute derivatives at point . The update rule of -th derivative at step , denoted by is merely the Taylor series approximation of the -th derivative as an analytical function to order :


In general at each step we compute derivatives where the number of derivatives compute at each step is reduced sharply with :


We choose sufficiently large so that is sufficiently large to imply a additive error in the final Taylor series approximation of . Intuitively, if is the number of overlapping disks, then we need to fix .

Since is at distance at least from any root of , and the step sizes are chosen to be sufficiently small compared to the convergence radius around the previous point

it follows that the Taylor series converges quickly to the true function value at each step. We prove a quantitative estimate:

Lemma 2.

(Sketch) Let and suppose that where is the convergence radius of around point , minimized over all . Consider the update rule in equation 5 that computes derivatives using previously computed derivatives . Suppose that for some error parameter . Then


We then show that for a specific choice of parameters , , and we get an inverse polynomial error by using a poly-logarithmic number of derivatives. Since for zero mean random matrix then is a multiplicative approximation of the same random matrix but with vanishing mean, i.e. .

Figure 1: The curve connects and some value and is at distance at least from any root of . Note that interpolation along the curve in small segments can reach a lot further than taking a single interpolation step which is bounded by blue-shaded region - dictated by the location of the nearest root to the point .

1.6 Discussion and future work

Our study extends the line of work pioneered by Barvinok that views the computation of the permanent of a matrix as a question about a related complex-valued polynomial . In this work we allow to be a random matrix and hence recast the question about the permanent of a random matrix as a question about the location of roots of a random polynomial related to . We characterize the behavior of these polynomials for some random matrices , and then provide an algorithmic technique which allows us to turn this knowledge into a quasi-polynomial algorithm for approximating the permanent of these matrices.

For a while now, it has been a folklore notion that the permanent is difficult to compute for general matrices mainly because of the ”sign problem” - namely the fact that entries with opposing signs generate intricate interference patterns. Such matrices avoid by definition the regime of matrices for which efficient algorithms are known, most prominent of which is the algorithm of Jerrum, Sinclair and Vigoda [2] for non-negative entry matrices. Our work, we believe, places this notion in serious doubt - we show natural random ensembles with very little (i.e. vanishing) correlation between any pair of entries - and yet, we are able to approximate the permanent for such matrices quite efficiently. Hence, it seems that if in fact approximation of the permanent on average is a difficult task, it must be due to another, yet uncharacterized phenomenon. Furthermore, our study makes the hardness of approximating the permanent on average to an even more intriguing problem: it seems that several natural ensembles do in fact admit an efficient approximation - but is it the case for other ensembles? Most notably, one would like to consider the case of zero mean complex Gaussian matrices, the presumed hardness of which is the basis for the paradigm, discussed in the following section.

1.6.1 Implications to

In [3] the authors consider the following computational problem:

Definition 3 (Gpe).

Given , , output a number such that with probability at least , in .

They conjecture that

Conjecture 4.

is -hard to compute.

Together with another conjecture on the anti-concentration of the permanent of complex Gaussian matrices this conjecture implies that BPP simulation of the linear-optical experiment called to within total variation distance implies collapse of the polynomial hierarchy to the third level, thereby establishing a so-called “quantum supremacy” of the outcomes of these physical experiments. Using the same anti-concentration assumption on the permanent of zero mean Gaussian matrices we explain (see appendix A) that in fact the above conjecture is true also for complex Gaussian matrices with mean :


On the other hand, our main theorem implies that


and hence is very unlikely to be -hard. This raises the following intriguing question: It seems that the hardness of the permanent of complex Gaussian matrices (or general random matrices for that matter) is not due to the common intuition that the different signs of the entries prohibits combinatorial treatment of the matrix, as a graph, in the spirit of [2]. Hence, if indeed is hard there must exist another phenomenon governing the behavior of the permanent of complex Gaussian matrices with mean values between and which makes computation intractable.

1.6.2 Reducing the mean value

A natural next step for our approach is to attempt to further increase the value of for which we evaluate . Approximating for typical at implies an approximation of the permanent for a random matrix with mean and variance .

However, one can see from the upper-bound on the interpolation error above that in order to achieve sufficiently small error the number of derivatives we initially compute must scale doubly exponentially in the ratio , namely the ratio of the interpolation length , and the step size . Since where is the number of roots in the disk of radius , then which implies that the number of required derivatives is exponential in .

Thus to improve on our technique new ideas would be required which would make a more economic use of the derivatives computed, and not “discard” at each step a fraction of the derivatives computed at that step. Another approach would be to tighten the bound on the number of roots inside the disk of a given radius for the polynomial or some other related polynomial.

1.6.3 Anti-concentration of the permanent

We point out that our algorithm, and also that of [5, 6], are not merely a sequence of computational steps, but actually show that the permanent of typical complex Gaussian (or more generally, random) matrices are well approximated by a low-degree polynomial in the entries of - in fact a polynomial of degree . Such a statement strongly indicates that for this range the permanent of complex Gaussian matrices is anti-concentrated, since it is well-known by a theorem of Carbery-Wright [20] that any polynomial of low-degree in standard i.i.d. complex Gaussian variables is anti-concentrated.

However, we have been unable to use this theorem directly to derive a formal anti-concentration statement. We note that the anti-concentration of the Gaussian permanent is also a necessary conjecture of the paradigm in [3, AC] along with the conjectured -hardness of this problem. Hence, intriguingly it seems that the conjectures on the computability, and statistics (i.e. anti-concentration) of the permanent of the complex Gaussian matrices are closely related: Our results suggest that for mean values the anti-concentration conjecture holds, but ironically this is because the second conjecture does not hold for this regime - i.e. the permanent is relatively easy to compute via a polynomial of low degree.

2 Preliminaries

2.1 Notation

denotes the complex numbers, and denotes the real numbers.

is the complex Gaussian distribution of mean

and variance . is the biased-Bernoulli random variable - it is w.p. and w.p. . is the permanent of matrix . is the natural complex logarithm of , defined up to an additive factor for integer . denotes the closed disk in the complex plane of radius around point . For computational problems we denote if there exists a poly-time reduction from to - i.e. is no harder than .

Definition 5.

Random Matrix

An matrix is called a random matrix and denoted by , if each entry of is independently drawn from some complex valued distribution with mean and variance .

The entries of in the above definition do not have to be identically distributed. We denote the distribution of complex Gaussian matrices with mean and variance with .

3 Permanent-interpolating-polynomial as a random polynomial

As described in Section 1.2 recent studies designed algorithms for evaluating multi-variate functions like the permanent or Ising model by considering a related univariate polynomial . These schemes used this polynomial to interpolate the value of at some point of interest using knowledge of the derivatives at .

For example, in his work, Barvinok [5] used the polynomial . In a more recent work [15] the authors choose a different polynomial - namely the Ising partition function. In both of these works the authors characterized the location of the roots of in order to establish that is analytical in the region of interest. Indeed, in his work, Barvinok showed that has no roots in the unit disk for any matrix that satisfies Likewise, the polynomial in the work of [15] the polynomial was shown to have no roots inside the unit disk using the Lee-Yang theorem.

In our work we consider the polynomial


and then analyze it as a random polynomial in order to gain insight into the distribution of its roots. The choice of this polynomial has a more natural interpretation in the context of random matrices: for a zero mean random matrix , the value of for nonzero is the value of the permanent of a matrix drawn from the ensemble when shifted with a mean of and variance . Another reason why we choose this polynomial is that it is easier to bound its roots compared to .

We begin with the following definition

Definition 6.

[Average sensitivity] Let be a random polynomial where is a random matrix. For any real the stability of at point is defined as


where is the expectation over

from a uniform distribution over


We begin with an upper-bound on the average sensitivity of the permanent of a random matrix that can also be derived from the work of Rempała and Wesołowski [18] (see Proposition 1).

Lemma 7.

Let . Then


A somewhat simpler and more intuitive proof of this lemma is given in [16] (see Lemma 12).

Our next result is to relate the number of roots of inside a disk of certain radius around origin to its average sensitivity around the boundary of that disk. Jensen’s formula provides this connection.

Proposition 8.



and . Then if is the number of roots of inside a disk of radius , . In particular:

  1. Let . With probability at least the polynomial has at most roots inside the disk with radius around .

  2. Let . With probability at least there are no roots inside the disk with radius around .

The above claim immediately implies:

Corollary 9.

Let be real number . For at least fraction of matrices has no roots inside and has at most roots inside .


The proof follows by a union bound on the two items in Proposition 8. In particular, the probability that is not root free is at most , and the probability that has more than roots is at most , so using union bound for the probability of error amounts to . ∎


(of Proposition 8) We use Jensen’s formula. Let be an arbitrary polynomial such that . Let be the number of zeros of inside a disk of radius , and let be these zeros. Then Jensen’s formula is


where we have used the notation


Let be a real number. We first use the following bound


We now pick and view the variables in the Jensen’s formula above as random variables depending on . By Jensen’s formula


where . Then by Lemma 7


and so in particular


So by concavity of the logarithm function


As a result, choosing and doing the change of variable ,


Thus by Markov’s inequality, when :


Next we consider the case when . In this case we can directly use the following Markov’s inequality using equation 28:


Note this bound is useful only when . ∎

3.1 Root-avoiding curves

We now use the above insight on the distribution of roots of the random polynomial to edge closer to an algorithm. We define:

Definition 10.

Root-free area

A subset is root-free w.r.t. polynomial if it contains no roots of .

Definition 11.

Tube of width around a curve

Let be a real number, and let denote some parameterized curve in the complex plane. The tube of width around denoted by , is the set of points defined as , where is the closed -ball centered around . In other words, for each point on the curve we include the -ball around it in the tube.

We will denote by the linear segment in between .

Lemma 12.

Finding a root avoiding tube

Let and . Fix . For numbers consider the following curve


There exists , and , such that the tube is root-free w.r.t. for a fraction at least over choices of . The total length of such a tube is at most .


Fix . For each define the piece-wise linear curve made of 2 segments




See figure 2.

Figure 2: The family of 2-piecewise linear curves interpolating from to some point on the real line. The curves branch out from the unit circle at an angle between and . Each curve starts out at an angle of for some integer and hits the imaginary axis at . Hence the imaginary magnitude of the end-point of the first segment in the -th curve is . After hitting the imaginary axis at , they descend back to the real line in parallel, thus hitting the real line at different points. The bottom-most curve hits the real line at . Note that by this definition tubes of weight around each curve do not overlap outside the ball of radius and in particular when they hit the imaginary axis at .

Note that specifically for a small subset of indices the value of is locked in a tight range:


Let . Then by Equation 34 the union of these tubes is contained inside a ball that is not too large:


and that by our definition of these tubes are disjoint outside a ball of radius :


Let the event of no roots at distance at most to the origin and few roots inside the disk of radius :


By Corollary 9


Condition on and for each and matrix let denote the number of roots of inside . Then by Equation 36 and the definition of we have


and so for uniform random the average number of roots is small


and so in particular this holds for average matrix (conditioned on ):


so by linearity of expectation


which implies that there exists (an index that minimizes ) such that


hence for we have


and so by the union bound with the probability of from Equation 38


Note that the total length of this tube is at most .

4 Computational analytic continuation

In this section we pose the question of analytic continuation as a computational task and devise a method to derive the value of the function at some point using it’s derivatives at , assuming that has a root-free tube around a curve , . We require the following result of Barvinok:

Lemma 13.


Efficient computation of derivatives

  1. Let be an complex matrix and let , where is all ones matrix. There exists a deterministic algorithm that runs in time and computes the -th derivative of at point .

  2. Let be a polynomial of degree . Given an algorithm that runs in time and computes the first derivatives of at point , one can compute in time the first derivatives of at .

We also need the following technical numerical result the proof of which is deferred to the appendix B.

Lemma 14.

For all and


We now present a deterministic algorithm for computing the analytic continuation (see [17]) of a degree polynomial .

Algorithm 15 (Computational analytic continuation).
  1. Input: An oracle that takes a number as input and outputs the first derivatives of at . complex numbers , a number , precision parameter , and integer - the number of derivatives computed at the -th step.

  2. Fixed parameters:

    1. .

    2. and for each

  3. Variables:

    1. for and % the -th derivative of at .

    2. for % the number of derivatives at each point .

    3. .

  4. Main:

    1. Query to obtain

    2. Using Lemma 13 and derivatives from step I compute for .

    3. For each :

      • Set:

      • Compute:

  5. Output:

    Let and return .

Prior to establishing the correctness of the algorithm, we define shifted versions of as follows:




and denote . Note ’s are defined in algorithm 15. We need the following elementary fact which we leave without proof.

Lemma 16.

If the closest root of to the point in the complex plane is , then the closest root of to is also .

We now establish correctness:

Claim 17.

Correctness of algorithm 15

Let be a polynomial of degree at most , and . Suppose the inputs to algorithm 15 satisfy the following conditions:

  1. Precision parameter: for some constant .

  2. Interpolation length: for some constant .

  3. Minimal segment length: for some constant .

  4. Number of iterations: for some constant .

  5. For each the ratio of the distance of the closest root of to to step size is at least .

  6. Number of derivatives at step : for some constant .

Then the following holds: there exists a constant such that if the number of derivatives that the algorithm queries from at step is at least


then output of the algorithm satisfies


where is a complex number.


Let . It is sufficient to show that


Let denote the approximation of the -th derivative of at point obtained by the algorithm. Using oracle for we can compute precisely the derivatives of at and hence using the first part of Lemma 13 evaluate the derivatives of precisely at :


For (in order) algorithm 15 computes the lowest derivatives using the first derivatives as follows:


By assumption 5 and Lemma 16 for each the function is analytical about point in radius . Hence, we can write the -th derivative of as the infinite Taylor series expansion of the -th derivative of evaluated at point :


Let denote the additive approximation error of the -th derivative at step and .


also let denote the worst-case error for all derivatives at step


Using the triangle inequality on equations 53 w.r.t. 54 we get: