DeepAI

# Low-rank Parareal: a low-rank parallel-in-time integrator

The Parareal algorithm of Lions, Maday, and Turinici is a well-known time parallel algorithm for evolution problems. It is based on a Newton-like iteration, with cheap coarse corrections performed sequentially, and expensive fine solves performed in parallel. In this work, we apply Parareal to evolution problems that admit good low-rank approximations and for which the dynamical low-rank approximation (DLRA), proposed by Koch and Lubich, can be used as time stepper. Many discrete integrators for DLRA have recently been proposed, based on splitting the projected vector field or by applying projected Runge–Kutta methods. The cost and accuracy of these methods are mostly governed by the rank chosen for the approximation. We want to use these properties in a new method, that we call low-rank Parareal, in order to obtain a time-parallel DLRA solver for evolution problems. We propose an analysis of the algorithm on affine linear problems and illustrate our results numerically.

• 1 publication
• 13 publications
• 8 publications
02/27/2020

### Existence of dynamical low-rank approximations to parabolic problems

The existence of weak solutions of dynamical low-rank evolution for para...
01/06/2022

### A low-rank power iteration scheme for neutron transport criticality problems

Computing effective eigenvalues for neutron transport often requires a f...
10/16/2017

### Nonsmooth Frank-Wolfe using Uniform Affine Approximations

Frank-Wolfe methods (FW) have gained significant interest in the machine...
08/24/2020

### A Low-rank Method for Parameter-dependent Fluid-structure Interaction Discretizations With Hyperelasticity

In aerospace engineering and boat building, fluid-structure interaction ...
11/19/2019

### A Low-rank Approach for Nonlinear Parameter-dependent Fluid-structure Interaction Problems

Parameter-dependent discretizations of linear fluid-structure interactio...
12/08/2019

### The AZ algorithm for least squares systems with a known incomplete generalized inverse

We introduce an algorithm for the least squares solution of a rectangula...
01/31/2022

### GParareal: A time-parallel ODE solver using Gaussian process emulation

Sequential numerical methods for integrating initial value problems (IVP...

## 1 Introduction

Consider the evolution problem

 \lx@overaccentset%.X(t)=F(t,X(t)),t∈[0,T],X(0)=X0, (1)

where is a matrix of size . In particular, we are interested here in problems that admit good low-rank approximations for every time . More explicitly, for every there exists such that and . The accuracy of this approximation will depend on the choice of the rank . We emphasize that is assumed to be square for notational convenience and all our results can be easily formulated for rectangular .

### 1.1 The Parareal algorithm

The Parareal algorithm is one of the more popular algorithms for time parallelization. It has been proposed by Lions, Maday, and Turinici (2001). It is based on a Newton-like iteration, with inaccurate but cheap corrections performed sequentially, and accurate but expensive solves performed in parallel.

Below, the Parareal iteration is given for constant time step (hence ) and for autonomous . Both restrictions are not crucial but ease the presentation. The quantity is an approximation for at time and iteration . The accuracy of this approximation is expected to improve with increasing . Here, and throughout the paper, we denote dependency on the iteration index as , which should not be confused with the th power.

###### Definition 1.1 (Parareal).

The Parareal algorithm is defined by the following double iteration on and ,

 (Initial value) Xk0=X0, (2) (Initial approximation) X0n+1=Gh(X0n), (3) (Iteration) Xk+1n+1=Fh(Xkn)+Gh(Xk+1n)−Gh(Xkn). (4)

Here, represents a fine (accurate) time stepper applied to the initial value and propagated until time . Similarly, represents a coarse (inaccurate) time stepper.

Given two time steppers, the Parareal algorithm is easy to implement. A remarkable property of the Parareal algorithm is the convergence in a finite number of steps for . It is well known that Parareal works well on parabolic problems but behaves worse on hyperbolic problems; see Gander and Vandewalle (2007) for an analysis.

### 1.2 Dynamical low-rank approximation

When the dimension in problem (1) is large, solving this evolution problem is very expensive since the matrix is typically dense. Fortunately, for problems that admit good low-rank approximations, we can compute the solution more efficiently using the dynamical low-rank approximation (DLRA), proposed by Koch and Lubich (2007).

Let denote the set of matrices of rank , which is a smooth embedded submanifold in . Instead of solving (1), the DLRA solves the following projected problem:

###### Definition 1.2 (Dynamical low-rank approximation).

For a rank , the dynamical low-rank approximation of problem (1) is the solution of

 \lx@overaccentset\large.Y(t)=PY(t)F(t,Y(t)),Y(0)=Y0∈Mr, (5)

where is the -orthogonal projection onto the tangent space of at . In particular, for every .

To analyze the approximation error of DLRA, we need the following standard assumptions from Kieri et al. (2016). Here, and throughout the paper, denotes the Frobenius norm.

###### Assumption 1 (DLRA assumptions).

The function satisfies the following properties for all :

• Lipschitz with constant : .

• One-sided Lipschitz with constant : .

• Maps almost to the tangent bundle of : .

In the analysis in Section 2, it is necessary to have for convergence. This holds when is a discretization of certain parabolic PDEs, like the heat equation. In particular, for an affine function of the form , it holds ; see Hairer et al. (1987, Ch. I.10). The quantity is called the modeling error and decreases when the rank increases. For our problems of interest, this quantity is typically very small. Finally, the existence of is only needed to guarantee the uniqueness of (1) but it will actually not appear in our analysis. We can therefore allow to be large, as is the case for discretized parabolic PDEs.

Standard theory for perturbations of ODEs allows us to obtain the following error bound from the assumptions above:

###### Theorem 1.3 (Error of DLRA (Kieri et al., 2016)).

Under Assumption 1, the DLRA verifies

 ∥∥ψhr(Y0)−ϕh(X0)∥∥≤eℓt∥Y0−X0∥+εr∫t0eℓsds, (6)

where is the flow of the original problem (1) and is the flow of its DLRA (5) for rank .

Research in this field is currently very active, and several discrete integrators for DLRA have been proposed. One class of integrators consists in a clever splitting of the projector and the methods integrate this splitting. An influential example is the KSL scheme proposed in Lubich and Oseledets (2014). Other methods that require integrating parts of the vector field can be found in Khoromskij et al. (2012); Ceruti and Lubich (2022). Another approach is based on projecting standard Runge–Kutta methods (sometimes including their intermediate stages) onto the tangent space. This approach has been proposed in Feppon and Lermusiaux (2018b); Kieri and Vandereycken (2019); Rodgers et al. (2021). Most of these methods, like the method we propose in this paper, are formulated for constant rank . Rank adaptivity can be incorporated without much difficulty for splitting and for projected schemes; see Feppon and Lermusiaux (2018b); Dektor et al. (2021); Rodgers et al. (2021); Ceruti et al. (2022).

The solution of DLRA (5) is quasi-optimal with the best rank approximation. This can be seen already in Theorem 1.3

for short time intervals. Similar estimates exist for parabolic problems

(Conte, 2020)

and for longer time when there is a sufficiently large gap in the singular values and when their derivatives are bounded

(Koch and Lubich, 2007).

### 1.3 Contributions

In this paper, we propose a new algorithm, called low-rank Parareal. As far as we know, this is the first parallel-in-time integrator for low-rank approximations. We analyze the proposed algorithm when the function is affine. To this end, we extended the analysis of the classical Parareal algorithm in Gander and Hairer (2008) to a more general setup where the coarse problem is different from the fine problem. We can prove that the method converges for big steps (large ) on diffusive problems (). In numerical experiments, we confirm this behavior. In addition, the method also performs well empirically with a less strict condition on and on a non-affine problem.

## 2 Low-rank Parareal

We now present our low-rank Parareal algorithm for solving (1). Since the cost of most discrete integrators for DLRA scales quadratically111While the actual cost can be larger, at the very least the methods typically compute compact QR factorizations to normalize the approximations. This costs flops in our setting. with the approximation rank, we take the coarse time stepper as DLRA with a small rank . Likewise, the fine time stepper is DLRA with a large rank . We can even take , which corresponds to computing the exact solution as the fine time stepper since .

###### Definition 2.1 (Low-rank Parareal).

Consider two ranks . The low-rank Parareal algorithm iterates

 (Initial value) Yk0=Y0, (7) (Initial approximation) Y0n+1=ψhq∘Tq(Y0n), (8) (Iteration) Yk+1n+1=ψhr∘Tr(Ykn)+ψhq∘Tq(Yk+1n)−ψhq∘Tq(Ykn), (9)

where is the solution of (5) at time with initial value , and is the orthogonal projection onto . The notations and are similar but apply to rank .

Observe that the rank of is at most for all . The truncated SVD that is required for computing and can therefore be efficiently performed; see Section 3 for implementation details.

### 2.1 Convergence analysis

Let be the solution of the full problem (1) at time . Let be the corresponding low-rank Parareal solution at iteration . We are interested in bounding the error of the algorithm,

 Ekn=Xn−Ykn, (10)

for all relevant and . To this end, we make the following assumption:

###### Assumption 2 (Affine vector field).

The function is affine linear and autonomous, that is,

 F(X)=A(X)+B

with a linear operator and .

The following lemma gives us a recursion for the Frobenius norm of the error. This recursion will be fundamental in deriving our convergence bounds later on when we generalize the proof for standard Parareal from Gander and Hairer (2008).

###### Lemma 2.2 (Iteration of the error).

Under the Assumptions 1 and 2, the error of low-rank Parareal verifies

 (11)

The constants are defined in Assumption 1. Moreover, and are the Lipschitz constants of and .

###### Proof.

Our proof is similar to the one in Kieri and Vandereycken (2019) where first the continuous version of the approximation error of DLRA is studied. Denote by the solution of (1) at time with initial value . By definition, the discrete error is

 Ek+1n+1=ϕh(Xn)−ψhr∘Tr(Ykn)−ψhq∘Tq(Yk+1n)+ψhq∘Tq(Ykn).

We can interpret each term above as a flow from to . Denote these flows by , and with the initial values

 X(tn)=Xn, Z(tn)=Tr(Ykn), W(tn)=Tq(Yk+1n), V(tn)=Tq(Ykn). (12)

Defining the continuous error as

 E(t)=X(t)−Z(t)−W(t)+V(t),

we then get the identity .

We proceed by bounding . By definition of the flows above, we have (omitting the dependence on in the notation)

 \lx@overaccentset\large.E =F(X)−PZF(Z)−PWF(W)+PVF(V) =F(X−Z−W+V)+F(Z)−PZF(Z)−PWF(W)+F(W)−F(V)+PVF(V),

where the last equality holds since the function is affine. Using Assumption 1 and Cauchy–Schwarz, we compute

 12ddt∥E(t)∥2 =⟨E,\lx@overaccentset\large.E⟩ =⟨E,F(E)⟩+⟨E,F(Z)−PZF(Z)⟩+⟨E,F(W)−PWF(W)⟩−⟨E,F(V)−PVF(V)⟩ ≤ℓ∥E∥2+εr∥E∥+2εq∥E∥.

Since , we therefore obtain the differential inequality

 ddt∥E(t)∥≤ℓ∥E∥+εr+2εq.

Grönwall’s lemma allows us to conclude

 (13)

From (12), we get

 E(tn) =Xn−Tr(Ykn)−Tq(Yk+1n)+Tq(Ykn).

Denoting and , we get after rearranging terms

 E(tn) =T⊥r(Xn)+Tr(Xn)−Tq(Xn)−Tr(Ykn)+Tq(Ykn)+Tq(Xn)−Tq(Yk+1n)

Taking norms gives

 ∥E(tn)∥ (14)

where and are the Lipschitz constants of and respectively. Combining inequalities (13) and (14) gives the statement of the lemma. ∎

We now study the error recursion (11) in more detail. To this end, let us slightly rewrite it as

 (15)

with the non-negative constants

 α =eℓhCr,q,β=eℓhCq,γ=maxn≥0∥∥E0n∥∥, (16) κ =eℓhmaxn≥0∥Xn−Tr(Xn)∥+(2εq+εr)∫h0eℓ(h−s)ds.

Our first result is a linear convergence bound, up to the DLRA approximation error. It is similar to the linear bound for standard Parareal.

###### Theorem 2.3 (Linear convergence).

Under the Assumptions 1 and 2, and if , low-rank Parareal verifies for all the linear bound

 maxn≥0∥∥Ekn∥∥≤(α1−β)kmaxn≥0∥∥E0n∥∥+κ1−α−β, (17)

where are defined in (16).

###### Proof.

Define . Taking the maximum for of both sides of (15), we obtain

 ek+1⋆≤αek⋆+βek+1⋆+κ.

By assumption, and we can therefore obtain the recursion

 ek+1⋆≤α1−βek⋆+κ1−β,

with solution

 ek⋆≤(α1−β)ke0⋆+κ1−α−β[1−(α1−β)k].

By assumption, we also have , which allows us to obtain the statement of the theorem. ∎

Next, we present a more refined superlinear bound. To this end, we require the following technical lemma that solves the equality version of the double iteration (15). A similar result, but without the term and only as an upper bound, already appeared in Gander and Hairer (2008, Thm. 1). Our proof is therefore similar but more elaborate.

###### Lemma 2.4.

Let be any non-negative constants such that and . Let be a sequence depending on such that

 ek+1n+1=αekn+βek+1n+κ,e0n+1=γ,ek0=0. (18)

Then,

 ekn=κk−1∑j=0n−j−1∑i=0(i+ji) αjβi+{0 if n≤k,γ αk∑n−k−1i=0(i+k−1i) βi if n≥k+1. (19)
###### Proof.

The idea is to use the generating function for . Multiplying (18) by and summing over , we obtain

 ∞∑n=0ek+1n+1ξn+1=∞∑n=0αeknξn+1+∞∑n=0βek+1nξn+1+∞∑n=0κξn+1,∞∑n=0e0n+1ξn+1=∞∑n=0γξn+1.

Since for all , this gives the relations

 ρk+1(ξ)=αξρk(ξ)+βξρk+1(ξ)+κξ1−ξ,ρ0(ξ)=γξ1−ξ.

We can therefore obtain the linear recurrence

 ρk+1(ξ)=aρk(ξ)+b,where a=αξ1−βξ, b=κξ(1−ξ)(1−βξ).

Its solution satisfies

 ρk(ξ)=αkξk(1−βξ)kγξ1−ξ+k−1∑j=0αjξj(1−βξ)j+1κξ1−ξ.

It remains to compute the coefficients in the power series of the above formula since by definition of they equal the unknowns . The binomial series formula for ,

 1(1−z)k+1=∞∑i=0(i+ki)zi, (20)

together with the Cauchy product gives

 1(1−βξ)k11−ξ =∞∑i=0(i+k−1i)βiξi⋅∞∑i=0ξi=∞∑n=0(n∑ℓ=0(ℓ+k−1ℓ)βℓ)ξn 1(1−βξ)j+111−ξ =∞∑i=0(i+ji)βiξi⋅∞∑i=0ξi=∞∑n=0(n∑ℓ=0(ℓ+jℓ)βℓ)ξn.

Hence, the first term in satisfies

 αkξk(1−βξ)kγξ1−ξ =γαk∞∑n=0(n∑ℓ=0(ℓ+k−1)ℓβℓ)ξn+k+1=γαk∞∑n=k+1(n−k−1∑ℓ=0(ℓ+k−1ℓ)βℓ)ξn,

while the second term can be written as

 k−1∑j=0αjξj(1−βξ)j+1κξ1−ξ=κk−1∑j=0∞∑n=0n∑ℓ=0(ℓ+jℓ)αjβℓξn+j+1=κ∞∑n=1k−1∑j=0(n−1∑ℓ=0(ℓ+jℓ)αjβℓ)ξn+j.

Putting everything together, we have

 ∞∑n=0eknξn=γαk∞∑n=k+1(n−k−1∑ℓ=0(ℓ+k−1ℓ)βℓ)ξn+κ∞∑m=1k−1∑j=0(m−1∑ℓ=0(ℓ+jℓ)αjβℓ)ξm+j.

Finally, we can identify the coefficient in front of with those on the right-hand side. The coefficient for in the first term is clearly nonzero only when . In the second term, there is only one for every such that . Substituting allows us to identify the coefficient of . ∎

Using the previous lemma, we can obtain a convergence bound that is superlinear in .

###### Theorem 2.5 (Superlinear convergence).

Under the Assumptions 1 and 2, and if , the error of low-rank Parareal satisfies for all the bound

 (21)

where are defined in (16).

###### Proof.

Define . By Lemma 2.2, the terms verify the relation described in Lemma 2.4 with replaced by in (18). Hence, the solution (19) from Lemma 2.4 will be an upper bound for .

Since and using the binomial series formula (20), we bound the first term in (19) as

 κk−1∑j=0n−j−1∑i=0(i+ji)αjβi ≤κk−1∑j=0∞∑i=0(i+ji)αjβi=κk−1∑j=0αj1(1−β)j+1 ≤κ1−β∞∑j=0(α1−β)j=κ1−β11−α1−β=κ1−α−β.

For and , observe that

 (i+k−1)!i!=k∏j=1(i+j)≤k∏j=1(n−k−1+j)=k∏j=2(n−j).

Since , we can therefore bound the second term as

 γ αkn−k−1∑i=0(i+k−1i)βi =γ αkn−k−1∑i=0(i+k−1)!i!(k−1)!βi≤γ αk(k−1)!k∏j=2(n−j)n−k−1∑i=0βi ≤γ αk(k−1)!∏kj=2(n−j)1−β.

The conclusion now follows by the definition of . ∎

The proof above can be modified to obtain a simple linear bound that is similar but different to the one from Theorem 2.3:

###### Theorem 2.6 (Another linear convergence bound).

Under Assumptions 1 and 2, and if , the error of low-rank Parareal satisfies for all the bound

 ∥∥Ekn∥∥≤αk(1+β)n−1maxn≥0∥∥E0n∥∥+κ1−α−β, (22)

where are defined in (16).

###### Proof.

We repeat the proof for the superlinear bound but this time, the second term is bounded as

 γ αkn−k−1∑i=0(i+k−1i)βi≤γ αkn−1∑i=0(n−1i)βi=γ αk (1+β)n−1.\qed
###### Remark 2.7.

In the proof above, yet another bound based on (20) is

 γ αkn−k−1∑i=0(i+k−1i)βi≤γ αk∞∑i=0(i+k−1i)βi=γ αk 1(1−β)k.

This time we recover the linear bound from Theorem 2.3.

### 2.2 Summary of the convergence bounds

In the previous section, we have proven four upper bounds for the error of low-rank Parareal. The first is directly obtained from Lemma 2.4. It is the tightest bound but its expression is too unwieldy for practical use. The other three bounds can be summarized as

 ∥∥Ekn∥∥≤Bn,kmaxn≥0∥∥E0n∥∥+κ1−α−β, (23)

with

rate of (23) in
linear
linear
superlinear

Each of these practical bounds describes different phases of the convergence, and none is always better than the others. In Figure 1, we have plotted all four bounds for realistic values of and . We took since it only determines the stagnation of the error and would interfere with judging the transient behavior of the convergence plot. Furthermore, the errors at the start of the iteration were chosen arbitrarily since they have little influence on the results.

The bounds above depend on and , where and are the Lipschitz constants of and respectively; see (16). While it seems difficult to give a priori results on the size and , we can bound them up to first order in the theorem below. Note also that in the important case of , the constants and can be made as small as desired by taking sufficiently large.

###### Theorem 2.8 (Lipschitz constants).

Let . Then

 ∥Tq(A)−Tq(~A)∥≤σqσq−σq+1∥A−~A∥+O(∥A−~A∥2), (24)

where is the th singular value of . Moreover,

 ∥Tr,q(A)−Tr,q(~A)∥≤(σqσq−σq+1+σrσr−σr+1)∥A−~A∥+O(∥A−~A∥2). (25)
###### Proof.

For the first inequality, we refer to Breiding and Vannieuwenhoven (2021, Theorem 2) and Feppon and Lermusiaux (2018a, Theorem 24). The second inequality follows from the first by the triangle inequality,

 ∥Tr,q(A)−Tr,q(~A)∥F ≤∥Tq(A)−Tq(~A)∥+∥Tr(A)−Tr(~A)∥.\qed

In many applications with low-rank matrices, the singular values of the underlying matrix are rapidly decaying. In particular, when the singular values decay exponentially like for some , we have

 σqσq−σq+1=11−σq+1/σq≈11−e−c. (26)

This last quantity decreases quickly to when grows. Even for , it is less than . We therefore see that the constants in Theorem 2.8 are not too large in this case.

## 3 Numerical experiments

We now show numerical experiments for our low-rank Parareal algorithm. We implemented the algorithm in Python 3.10 and all computations were performed on a MacBook Pro with a M1 processor and 16GB of RAM. The complete code is available at GitHub so that all the experiments can be reproduced. The DLRA steps are solved by the second-order projector-splitting integrator from Lubich and Oseledets (2014). Since the problems considered are stiff, we used sufficiently many substeps of this integrator so that the coarse and fine solvers within low-rank Parareal can be considered exact.

### 3.1 Lyapunov equation

Consider the differential Lyapunov equation,

 \lx@overaccentset\large.X(t)=AX(t)+X(t)A+CCT,X(0)=X0, (27)

where is a symmetric matrix, and is a tall matrix for some . This initial value problem admits a unique solution for for any . The most typical example of (27) is the heat equation on a square with separable source term. Other applications can be found in Mena et al. (2018).

###### Assumption 3.

The matrix is symmetric and strictly negative definite.

Under Assumption 3, the one-sided Lipschitz constant for (27) is strictly negative. Indeed, the linear Lyapunov operator has the symmetric matrix representation

with eigenvalues

for ; see Golub and Van Loan (2013, Ch. 12.3). As in Hairer et al. (1987, Ch. I.10), we therefore get immediately that . Moreover, since is invertible, we can write the closed-form solution of (27) as

 X(t)=etA(X0)+A−1(etA(CCT)−CCT), (28)

which can be easily verified by differentiation using properties of the matrix exponential .

The following result shows that the solution of (27) can be well approximated by low rank. It is the analogue to a similar result for the algebraic Lyapunov equation . The latter result is well known, but we did not find a proof for the former in the literature.

###### Lemma 3.1 (Low-rank approximability of Lyapunov ODE).

Let be the th singular value of and likewise for . Under Assumption 3, the solution of (27) has an approximation

 Y(t) of rank at most r0+2rρ

for any with error

 ∥X(t)−Y(t)∥2≤eℓtσr0+1(X0)+(etℓ−1ℓ)(4exp(−π2ρlog(4κA))∥CCT∥2+σr+1(CCT)),

where and .

###### Proof.

The aim is to approximate the following two terms that make up the closed-form solution in (28):

 X1(t)=etA(X0),X2(t)=A−1(etA(CCT)−CCT).

The first term can be treated directly. By the truncated SVD, the initial value satisfies

 X0=Y0+E0 where rank(Y0)=r0 and ∥E0∥2=σr0+1(X0).

By Assumption 3, the operator is full rank. We therefore obtain

 X1(t)=etA(X0)=eAtY0eAt+eAtE0eAt=Y1(t)+E1(t), (29)

where and since .

Next, we focus on the second term . Like above, the source term can be decomposed as

 CCT=D+F where rank(D)=r and ∥F∥2=σr+1(CCT).

By linearity of the Lyapunov operator, we therefore obtain

Denote . By definition of the Lyapunov operator , we have

 S=A−1(M)⟺AS+SA=M.

As studied in Penzl (2000) and then improved in Beckermann and Townsend (2017), the singular values of the solution are bounded as

 σrank(M)ρ+1(S)σ1(S)≤4exp(−π2ρlog(4κA)), (31)

where and . Since by assumption on , the bound (31) then implies that

 S=Y2(t)+δS(t),

where and