    # Approximating the pth Root by Composite Rational Functions

A landmark result from rational approximation theory states that x^1/p on [0,1] can be approximated by a type-(n,n) rational function with root-exponential accuracy. Motivated by the recursive optimality property of Zolotarev functions (for the square root and sign functions), we investigate approximating x^1/p by composite rational functions of the form r_k(x, r_k-1(x, r_k-2( ... (x,r_1(x,1)) ))). While this class of rational functions ceases to contain the minimax (best) approximant for p≥ 3, we show that it achieves approximately pth-root exponential convergence with respect to the degree. Moreover, crucially, the convergence is doubly exponential with respect to the number of degrees of freedom, suggesting that composite rational functions are able to approximate x^1/p and related functions (such as |x| and the sector function) with exceptional efficiency.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Composing rational functions is an efficient way of generating a rational function of high degree: if each is of type , then is of type . By choosing each appropriately, one can often obtain a function that approximates a desired function in a wide domain of interest.

There is no reason to expect—and it is generally not true—that can express the minimax rational approximant of a given type, say , to a given function. However, building upon Rutishauser  and Ninomiya , Nakatsukasa and Freund  show a remarkable property of the best rational approximants to the function on for (called Zolotarev functions): appropriately composing Zolotarev functions gives another Zolotarev function of higher degree. In other words, the class of composite rational functions , with each of type , contains the type- minimax approximant to the sign function. Moreover, for a fixed , the convergence of Zolotarev functions is exponential in the degree. Since the degree is , and the number of parameters necessary to express is , it follows that the convergence is , a double-exponential convergence rate. This is so powerful that choosing and (one composition, i.e., two iterations) is enough to obtain convergence to machine precision in double precision arithmetic, with error below .

Functions related to the sign function, such as (via ) and (via then ) can similarly be approximated by composite rational functions. Gawlik  does this for the square root and shows that a composite rational function yields the minimax rational approximant (in the relative sense) on intervals , and that the approximation extends far into the complex plane. This observation generalizes earlier work on rational approximation of the square root with optimally scaled Newton iterations [2, 13, 15, 18]. Moreover, an extension was derived in , which shows that the th root can be approximated efficiently on intervals , although not with minimax quality.

Clearly, in the above papers the origin is excluded from the domain, as the functions have a singularity at . However, a landmark result from rational approximation theory [7, 16] states that the best rational approximant (in the absolute sense) of (for any real ) on can be approximated by a type- rational function with root-exponential accuracy. One might wonder, can this be done with a composite rational function? This is the question we address in this paper. We focus on the case in which with an integer.

We show that a rational function of the form can approximate on with superalgebraic accuracy, with close to th root-exponential convergence. Moreover—and crucially—the convergence is doubly exponential with respect to the number of degrees of freedom. That is, the error is for some constants , where is the number of parameters needed to express the rational function. By “number of parameters” we mean if has type for , so that reflects the cost of evaluating at a matrix argument.

Clearly, our result implies that any rational power of can be approximated by a composite rational function. Moreover, since on implies on for any , hence , our results also show that any rational power can be approximated efficiently on by a composite rational function. In addition, our approximants to immediately lead to approximants to the -sector function .

More generally, we think composite (rational) functions are a powerful tool in approximation theory, and we regard this as a contribution towards demonstrating their effectiveness and practicality. Indeed, one might say they are already used extensively in scientific computing:

1. Composite rational functions are implicitly employed in most algorithms for computing matrix functions , in which approximating a function on the spectrum of the matrix is required. For the th root, a standard algorithm [8, Ch. 7] employs Newton’s method, which ultimately approximates with a sequence of rational functions of given recursively by , . The function is composite rational and similar to the approximants we use, but not the same (it is unscaled), and it exhibits exponential rather than double-exponential convergence on . Generally speaking, Newton’s method for computing a matrix function (or more generally for various nonlinear problems, e.g. rootfinding) can often be interpreted as approximating (or the solution) by a composite rational function of .

2. The rapidly growing subject of deep learning is based on composing a large number of nonlinear activation functions

.

#### Summary of Results.

To summarize our results, let us introduce some terminology. We say that a univariate rational function is of type if and are polynomials of degrees at most and , respectively. We denote the set of all such rational functions by . We say that a bivariate rational function is of type if is of type . We say that a univariate rational function is -composite if is a composition of rational functions , , each of type :

 r(x)=rk(x,rk−1(x,rk−2(⋯(x,r1(x,1))))). (1)

Here is the main result of this paper.

###### Theorem 1.1.

Let be an integer. There exists a positive constant depending on such that for every integer , there exists a -composite rational function of type such that

 maxx∈[0,1]|r(x)−x1/p|≤exp(−bnc), (2)

where is a constant depending on and

 c=log(pp−1)log2log(2pp−1)logp. (3)

Note that when , , and as , .

Let us comment on the theorem. The bound (2) shows that by using a -composite rational function we can approximate the th root with “th root”-(nearly th root) exponential accuracy with respect to the degree, which is suboptimal unless (in which case a composite rational function on is optimal in the relative sense).

However, the result is still striking in the following sense: the number of degrees of freedom used to express is just for (see below (14)), and therefore with respect to the degrees of freedom , the convergence is

 maxx∈[0,1]|r(x)−x1/p|≤exp(−bp~cd), (4)

indicating a double-exponential convergence with respect to .

As a byproduct of our analysis, we will obtain analogous results for composite rational approximation of the -sector function on the set given by

 Sp={xe2πij/p∣x∈[0,1],j∈{1,2,…,p}}. (5)

We will also consider the subset of excluding the origin

 Sp,α={xe2πij/p∣x∈[α,1],j∈{1,2,…,p}}. (6)

We say that a -composite rational function (1) is pure if the functions appearing in (1) are univariate:

 r(x)=rk(rk−1(⋯(r1(x)))).
###### Theorem 1.2.

Let be an integer, and . There exists a positive constant depending on such that for every integer , there exist pure -composite rational functions and of type such that

 maxz∈Sp|z(r(z)−sectp(z))|≤exp(−bnc), (7)

where and are as in Theorem 1.1, and

 maxz∈Sp,α|q(z)−sectp(z)|≤exp(−ˆbnˆc), (8)

where depends on and , and .

It is worth noting that the two rational functions , are generally different—they coincide for a particular value of . The error in (7) is measured in a weighted norm, which is natural in view of the fact that is discontinuous at . When and , and , so (7) recovers the root-exponential convergence of rational approximants to on  [17, Ch. 25]. By contrast, (8) shows that a better bound holds for the absolute error if one excludes the neighborhood of the origin. When , and (8) recovers the exponential convergence of Zolotarev functions to the sign function on  [1, 3]. Our analysis will show that decays like a negative power of as .

#### Organization.

This paper is organized as follows. In Section 2, we review some theory from  concerning composite rational approximants of the th root on positive real intervals. In Section 3, we study the behavior of these approximants near the origin. We then prove Theorems 1.1 and 1.2 in Section 4, and we illustrate our results numerically in Section 5.

## 2 Composite rational approximation of the pth root

To approximate on an interval , Gawlik  considers the recursively defined rational function

 fk+1(x) =fk(x)^rm,ℓ(xfk(x)p,αk,p√⋅), f0(x) =1, (9) αk+1 =αk^rm,ℓ(αpk,αk,p√⋅), α0 =α, (10)

where is (a rescaling of) the relative minimax rational approximant of type on the interval :

 ^rm,ℓ(x,α,p√⋅)=(1+α2α)rm,ℓ(x,α,p√⋅),

where

 rm,ℓ(⋅,α,p√⋅)=argminr∈Rm,ℓmaxx∈[αp,1]∣∣ ∣∣r(x)−x1/px1/p∣∣ ∣∣. (11)

Gawlik shows that is a rapidly convergent approximant to the th root on . With recursions, the maximum relative error on decays double exponentially in : it is bounded above by for some depending on , , , and . Importantly, these constants depend very weakly on ; the analysis below will implicitly show that when , is independent of and decays like a negative power of as , just like in (8).

Given that (9) is an approximant on , which is an interval that excludes the singularity at , a natural question arises: can we approximate on ? Intuitively, the function is still continuous at (unlike e.g. the sign or sector function) with , and hence it is possible to approximate on the whole interval . Indeed Stahl  shows that on can be approximated by a type- rational function with root-exponential accuracy (we refer to [4, 14] for general results on classical rational approximation theory). Can a highly efficient rational approximant be constructed based on recursion as in (9)? It is important to note that we will necessarily switch to the (more natural) metric of absolute error rather than the relative error for this purpose.

It turns out that the rational function (9) does a good job approximating on , when is chosen carefully: when it is too small, the error is large on (in fact it is maximal at  ). Conversely if is too large, the error is large on (in fact it is at , as we show below). A major task undertaken in what follows is to choose so that the convergence is optimized, in that the error on and are balanced to be approximately the same.

Our analysis will focus on the lowest-order version of the iteration (9-10), obtained by choosing . It is shown in [5, Proposition 5] (and elsewhere [9, 11]) that for this choice of and ,

 ^r1,0(x,α,p√⋅)=1p((p−1)μ(α)+xμ(α)p−1),μ(α)=(α−αp(p−1)(1−α))1/p. (12)

Thus, when , the iteration (9-10) reads

 fk+1(x) =1p((p−1)μ(αk)fk(x)+xμ(αk)p−1fk(x)p−1), f0(x) =1, (13) αk+1 =pαk(p−1)μ(αk)+μ(αk)1−pαpk, α0 =α. (14)

Note that is -composite since it is of the form (1) with

 rj(x,y)=1p((p−1)μ(αj−1)pyp+xμ(αj−1)p−1yp−1)

for each . It follows from this observation and an inductive argument that has type for each .

We rely heavily on this explicit expression for the particular case , as it lets us analyze the functions in detail, which leads to a constructive proof for Theorem 1.1. We note that using larger values of may result in faster convergence, in particular a larger exponent than (3). In view of (4), the convergence is still doubly exponential, with an improved constant . However, we do not expect the improvement would be significant.

Moreover, composing low-degree rational functions is an extremely efficient way to construct high-degree rational functions of matrices, and we suspect that our choice would give the fastest convergence in terms of the number of matrix operations needed to evaluate at a matrix argument.

## 3 Bounding the error on [0,αp]

In this section, we analyze the absolute error committed by the function defined by (13)–(14) on the interval . It will be convenient to consider not but the scaled function

 ˜fk(x)=2αk1+αkfk(x), (15)

which has the property that [5, Theorem 2]

 maxx∈[αp,1]˜fk(x)−x1/px1/p=−minx∈[αp,1]˜fk(x)−x1/px1/p=1−αk1+αk∈(0,1). (16)

We will prove the following estimate.

###### Theorem 3.1.

Let . The function defined by (13)–(14) and (15) satisfies

 maxx∈[0,αp]|˜fk(x)−x1/p|≤2α (17)

for every .

Experiments suggest that the bound (17) could be improved to for large enough, but this does not affect what follows in any significant way.

We will prove Theorem 3.1 by a series of lemmas. Let

 gk(x)=xfk(xp).

Note that and

 gk+1(x)=xfk(xp)^r1,0(xpfk(xp)p,αk,p√⋅)=gk(x)^r1,0(gk(x)p,αk,p√⋅)=^s(gk(x),αk), (18)

where

 ^s(x,α)=x^r1,0(xp,α,p√⋅)=px(p−1)μ(α)+μ(α)1−pxp.

Also let

 H(α)=^s(α,α)=pα(p−1)μ(α)+μ(α)1−pαp,

so that .

###### Lemma 3.1.

For every and every ,

 0≤x^s′(x,α)≤^s(x,α)≤H(α),

where the prime denotes differentiation with respect to .

###### Proof.

A short calculation shows that

 x^s′(x,α)=w(x)^s(x,α),

where

 w(x)=(p−1)(1−(xμ(α))p)(p−1)+(xμ(α))p.

Since for every , it follows that

 0≤x^s′(x,α)≤^s(x,α),x∈[0,μ(α)].

In particular, the above inequalities hold on , and is nondecreasing on . Thus,

 ^s(x,α)≤^s(α,α)=H(α),x∈[0,α].

Now let be fixed.

###### Lemma 3.2.

For every and every ,

 0≤xg′k(x)≤gk(x)≤αk.
###### Proof.

Since and , the above inequalities hold when . Assume that they hold for some . Observe that

 xg′k+1(x)=xg′k(x)^s′(gk(x),αk).

Since for , Lemma 3.1 implies that . It follows from this and our inductive hypothesis that for . In addition, since and ,

 xg′k+1(x)≤^s(gk(x),αk)=gk+1(x).

Finally, since , it follows that . ∎

###### Lemma 3.3.

For every and every ,

 0<˜fk(x)≤α(1+εk),εk=1−αk1+αk.
###### Proof.

We first note that is positive and nondecreasing on . Indeed, differentiating the relation

 fk(xp)=xgk(x)

gives

 pxp−1f′k(xp)=gk(x)−xg′k(x)gk(x)2,

so Lemma 3.2 implies that for every . Evaluating the recursion (13) at gives

 fk+1(0)=fk(0)(p−1p)μ(αk),f0(0)=1,

so for every . Since is a positive multiple of , it follows that for every . Finally, taking in (16) gives . ∎

By the lemma above,

 |˜fk(x)−x1/p|≤max{|˜fk(x)|,|x1/p|}≤max{α(1+εk),α}=α(1+εk)≤2α,x∈[0,αp],

so

 maxx∈[0,αp]|˜fk(x)−x1/p|≤2α.

This completes the proof of Theorem 3.1.

An estimate for the absolute error on is now immediate: Combining the above theorem, (16), and the fact that for , we see that

 maxx∈[0,1]|˜fk(x)−x1/p|≤max{2α,1−αk1+αk}. (19)

### 3.1 Sector function approximation

We note that the function in (18) approximates the -sector function (this observation appeared in [5, Sec. 4]), and is a pure composite rational function of the form . In fact it is -composite, and an inductive argument shows that it has type . In the case, this reduces to Zolotarev’s best rational approximant to the sign function of type . That is, as in the square root approximation, the minimax rational approximant is contained in the class of (here purely) composite rational functions.

Below we derive estimates for the maximum weighted error on the sets defined in (5) and (6). As before, it will be convenient to work not with but with the rescaled function

 ˜gk(z)=21+αkgk(z)=4αk(1+αk)2z˜fk(zp).

As shown in [5, Sec. 4], the relative error is real-valued and equioscillates on each line segment , . Note that here the relative and absolute errors are the same in modulus. The asymptotic convergence rate on was analyzed in . Here we quantify the non-asymptotic convergence on .

###### Lemma 3.4.

For every ,

 maxz∈Sp|z(˜gk(z)−sectp(z))|≤max{α,1−αk1+αk}, (20)

and

 maxz∈Sp,α|˜gk(z)−sectp(z)|≤1−αk1+αk. (21)
###### Proof.

Let with and . Since and , we have

 |z(˜gk(z)−sectp(z))|=|x1/p(˜gk(x1/p)−1)|.

If , then Lemma 3.2 implies that , so

 |x1/p(˜gk(x1/p)−1)|≤x1/p≤α,x∈[0,αp].

On the other hand, if , then

 |x1/p(˜gk(x1/p)−1)|≤|˜gk(x1/p)−1|=∣∣ ∣∣4αk(1+αk)2x1/p˜fk(x)−1∣∣ ∣∣. (22)

By (16),

 ˜fk(x)x1/p∈[1−(1−αk1+αk),1+(1−αk1+αk)]=[2αk1+αk,21+αk],x∈[αp,1],

so

 x1/p˜fk(x)∈[1+αk2,1+αk2αk],x∈[αp,1],

and hence

 4αk(1+αk)2x1/p˜fk(x)−1∈[−1−αk1+αk,1−αk1+αk],x∈[αp,1]. (23)

It follows that

 |x1/p(˜gk(x1/p)−1)|≤1−αk1+αk,x∈[αp,1].

For (21), we simply start from the second expression in (22) and use (23). ∎

## 4 Proof of Theorems 1.1 and 1.2

To examine the convergence of the recursion (13)-(14) on , we first ask the question: given , what values of and are needed to get an error ? In view of (19), we must choose and large enough so that .

To determine , we select a constant (depending on ) and split the convergence of into three stages:

1. Find such that .

2. Find such that .

3. Find such that .

Clearly, the second stage is independent of and , so is a constant (depending on ).

Our choice of is described in the following lemma.

###### Lemma 4.1.

There exists a constant depending on such that

 1−H(α)1+H(α)≤p2(1−α1+α)2

for every .

###### Proof.

It is proven in [5, Theorem 2] that the iteration (10) generates an increasing sequence satisfying and

 1−αk+11+αk+1=C(m,ℓ,p)(1−αk1+αk)m+ℓ+1+o((1−αk1+αk)m+ℓ+1),

where

 C(m,ℓ,p)=pm+ℓ+1m!ℓ!(1/p)ℓ+1(1−1/p)m2m+ℓ(m+ℓ+1)!(m+ℓ)!

and denotes the rising factorial (the Pochhammer symbol): . Since , this implies that the iteration (10) with (i.e., the iteration (14)) generates satisfying

 1−αk+11+αk+1=(p−14)(1−αk1+αk)2+o((1−αk1+αk)2).

In other words,

 1−H(α)1+H(α)/(1−α1+α)2→p−14, as α↑1.

It follows that the above ratio is bounded by for close enough to . ∎

Without loss of generality, we assume

 α∗>max{1e,p−2p+2} (24)

in what follows.

#### Stage 1

We will now determine such that . We begin with a lemma.

###### Lemma 4.2.

For every ,

 H(α)>α1−1/p.
###### Proof.

We have

 H(α) =pαμ(α)p−1(p−1)μ(α)p+αp=pαμ(α)p−1α−αp1−α+αp=pαμ(α)p−1(1−α)α−αp+1 =pμ(α)p−1(1−α)1−αp=α1−1/pg(α)1−1/ph(α),

where

 g(α) =1−αp−1(p−1)(1−α)=1p−1p−2∑j=0αj, h(α) =1−αpp(1−α)=1pp−1∑j=0αj.

Since for every , it follows that

 g(α)1−1/ph(α)>g(α)h(α)