1 Introduction
Consider the evolution problem
(1) $\dot{X}(t) = F(t, X(t)), \qquad X(0) = X_0,$
where $X(t)$ is a matrix of size $n \times n$. In particular, we are interested here in problems that admit good low-rank approximations for every time $t$. More explicitly, for every $t$ there exists a matrix $Y(t)$ such that $\operatorname{rank}(Y(t)) \leq r$ and $\|X(t) - Y(t)\| \leq \varepsilon$. The accuracy of this approximation will depend on the choice of the rank $r$. We emphasize that $X(t)$ is assumed to be square only for notational convenience; all our results can easily be formulated for rectangular $X(t)$.
1.1 The Parareal algorithm
The Parareal algorithm, proposed by Lions, Maday, and Turinici (2001), is one of the most popular algorithms for time parallelization. It is based on a Newton-like iteration, with inaccurate but cheap corrections performed sequentially, and accurate but expensive solves performed in parallel.
Below, the Parareal iteration is given for a constant time step $h$ (hence $t_n = nh$) and for autonomous $F$. Neither restriction is crucial, but both ease the presentation. The quantity $X_n^k$ is an approximation of the solution at time $t_n$ and iteration $k$. The accuracy of this approximation is expected to improve with increasing $k$. Here, and throughout the paper, we denote dependency on the iteration index by a superscript $k$, which should not be confused with the $k$th power.
Definition 1.1 (Parareal).
The Parareal algorithm is defined by the following double iteration on and ,
(Initial value)  $X_0^k = X_0$  (2)
(Initial approximation)  $X_{n+1}^0 = \mathcal{G}(X_n^0)$  (3)
(Iteration)  $X_{n+1}^{k+1} = \mathcal{F}(X_n^k) + \mathcal{G}(X_n^{k+1}) - \mathcal{G}(X_n^k)$  (4)
Here, $\mathcal{F}(X_n^k)$ represents a fine (accurate) time stepper applied to the initial value $X_n^k$ and propagated over one time step $h$. Similarly, $\mathcal{G}$ represents a coarse (inaccurate) time stepper.
Given two time steppers, the Parareal algorithm is easy to implement. A remarkable property of the Parareal algorithm is convergence in a finite number of iterations: $X_n^k$ is exact for $k \geq n$. It is well known that Parareal works well on parabolic problems but behaves worse on hyperbolic problems; see Gander and Vandewalle (2007) for an analysis.
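To make the double iteration concrete, the following is a minimal sketch of Parareal for a generic evolution problem. The function names `coarse` and `fine` (standing for the two time steppers) and the overall interface are our own illustrative choices, not part of the algorithm's original formulation; in practice the fine solves inside one iteration would run in parallel.

```python
import numpy as np

def parareal(x0, coarse, fine, n_steps, n_iters):
    """Sketch of the Parareal double iteration. `coarse` and `fine`
    propagate a state over one time slice of length h; returns the
    list of approximations after each iteration."""
    # Initial approximation: one sequential coarse sweep (eq. (3)).
    X = [x0.copy() for _ in range(n_steps + 1)]
    for n in range(n_steps):
        X[n + 1] = coarse(X[n])
    history = [X]
    for k in range(n_iters):
        # Expensive fine solves; independent over n, hence parallelizable.
        F = [fine(X[n]) for n in range(n_steps)]
        Xn = [x0.copy()] + [None] * n_steps
        for n in range(n_steps):
            # Cheap sequential coarse correction (eq. (4)).
            Xn[n + 1] = F[n] + coarse(Xn[n]) - coarse(X[n])
        X = Xn
        history.append(X)
    return history
```

On a scalar linear test problem with explicit Euler as the coarse stepper and the exact flow as the fine stepper, the iterate at time index $n$ matches the sequential fine solution once $k \geq n$, illustrating the finite-step convergence property.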
1.2 Dynamical low-rank approximation
When the dimension $n$ in problem (1) is large, solving this evolution problem is very expensive since the matrix $X(t)$ is typically dense. Fortunately, for problems that admit good low-rank approximations, we can compute the solution more efficiently using the dynamical low-rank approximation (DLRA), proposed by Koch and Lubich (2007).
Let $\mathcal{M}_r$ denote the set of matrices of rank $r$, which is a smooth embedded submanifold of $\mathbb{R}^{n \times n}$. Instead of solving (1), the DLRA solves the following projected problem:
Definition 1.2 (Dynamical lowrank approximation).
For a rank $r$, the dynamical low-rank approximation of problem (1) is the solution of
(5) $\dot{Y}(t) = P_{Y(t)} F(t, Y(t)), \qquad Y(0) = Y_0 \in \mathcal{M}_r,$
where $P_Y$ is the orthogonal projection onto the tangent space of $\mathcal{M}_r$ at $Y$. In particular, $Y(t) \in \mathcal{M}_r$ for every $t$.
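For a rank-$r$ matrix $Y = U S V^T$ with $U$ and $V$ having orthonormal columns, the tangent-space projection has the well-known closed form $P_Y(Z) = U U^T Z + Z V V^T - U U^T Z V V^T$. A minimal sketch (the function name is ours):

```python
import numpy as np

def tangent_projection(U, V, Z):
    """Orthogonal projection of Z onto the tangent space of the rank-r
    manifold at Y = U @ S @ V.T, where U and V have orthonormal columns:
    P_Y(Z) = U U^T Z + Z V V^T - U U^T Z V V^T."""
    UtZ = U.T @ Z
    ZV = Z @ V
    return U @ UtZ + ZV @ V.T - U @ (UtZ @ V) @ V.T
```

Note that the result is a sum of two rank-$\leq r$ matrices, so tangent vectors have rank at most $2r$, and applying the projection twice leaves the result unchanged.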
To analyze the approximation error of DLRA, we need the following standard assumptions from Kieri et al. (2016). Here, and throughout the paper, $\|\cdot\|$ denotes the Frobenius norm.
Assumption 1 (DLRA assumptions).
The function $F$ satisfies the following properties for all $X, Y \in \mathbb{R}^{n \times n}$:

Lipschitz with constant $L$: $\|F(X) - F(Y)\| \leq L \|X - Y\|$.

One-sided Lipschitz with constant $\ell$: $\langle X - Y, F(X) - F(Y) \rangle \leq \ell \|X - Y\|^2$.

Maps almost to the tangent bundle of $\mathcal{M}_r$: $\|(I - P_Y) F(Y)\| \leq \varepsilon_r$ for all $Y \in \mathcal{M}_r$.
In the analysis in Section 2, it is necessary to have $\ell < 0$ for convergence. This holds when $F$ is a discretization of certain parabolic PDEs, like the heat equation. In particular, for an affine function of the form $F(X) = \mathcal{L}(X) + C$, the constant $\ell$ equals the logarithmic norm of $\mathcal{L}$; see Hairer et al. (1987, Ch. I.10). The quantity $\varepsilon_r$ is called the modeling error and decreases when the rank $r$ increases. For our problems of interest, this quantity is typically very small. Finally, the existence of the Lipschitz constant $L$ is only needed to guarantee the uniqueness of (1); it will not actually appear in our analysis. We can therefore allow $L$ to be large, as is the case for discretized parabolic PDEs.
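As a small illustration of the condition $\ell < 0$ for a diffusive problem, the sketch below computes the one-sided Lipschitz constant of the affine map $X \mapsto LX + C$, which equals the largest eigenvalue of the symmetric part of $L$, for a standard second-order finite-difference Laplacian with Dirichlet boundary conditions (the discretization is our illustrative choice):

```python
import numpy as np

def one_sided_lipschitz(L):
    """One-sided Lipschitz constant of the affine map X -> L @ X + C:
    the logarithmic 2-norm of L, i.e. the largest eigenvalue of (L + L^T)/2."""
    return np.linalg.eigvalsh((L + L.T) / 2).max()

# Standard finite-difference Laplacian on (0, 1) with Dirichlet
# boundary conditions (illustrative diffusive example).
n = 50
dx = 1.0 / (n + 1)
L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / dx**2

ell = one_sided_lipschitz(L)
assert ell < 0  # strictly negative: the diffusive regime of the analysis
```

For this discretization the least negative eigenvalue is close to $-\pi^2$, the first eigenvalue of the continuous Laplacian, so $\ell$ stays bounded away from zero as the grid is refined.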
Standard theory for perturbations of ODEs allows us to obtain the following error bound from the assumptions above:
Theorem 1.3 (Error of DLRA (Kieri et al., 2016)).
Research in this field is currently very active, and several discrete integrators for DLRA have been proposed. One class of integrators is based on a clever splitting of the projector $P_Y$, which the methods then integrate. An influential example is the KSL scheme proposed in Lubich and Oseledets (2014). Other methods that require integrating parts of the vector field can be found in Khoromskij et al. (2012) and Ceruti and Lubich (2022). Another approach is based on projecting standard Runge–Kutta methods (sometimes including their intermediate stages) onto the tangent space. This approach has been proposed in Feppon and Lermusiaux (2018b), Kieri and Vandereycken (2019), and Rodgers et al. (2021). Most of these methods, like the method we propose in this paper, are formulated for constant rank $r$. Rank adaptivity can be incorporated without much difficulty for splitting and for projected schemes; see Feppon and Lermusiaux (2018b); Dektor et al. (2021); Rodgers et al. (2021); Ceruti et al. (2022).
The solution of DLRA (5) is quasi-optimal compared with the best rank-$r$ approximation. This can already be seen in Theorem 1.3 for short time intervals. Similar estimates exist for parabolic problems (Conte, 2020), and for longer times when there is a sufficiently large gap in the singular values and when their derivatives are bounded (Koch and Lubich, 2007).
1.3 Contributions
In this paper, we propose a new algorithm, called low-rank Parareal. As far as we know, this is the first parallel-in-time integrator for low-rank approximations. We analyze the proposed algorithm when the function $F$ is affine. To this end, we extend the analysis of the classical Parareal algorithm in Gander and Hairer (2008) to a more general setup where the coarse problem is different from the fine problem. We prove that the method converges for big steps (large $h$) on diffusive problems ($\ell < 0$). In numerical experiments, we confirm this behavior. In addition, the method also performs well empirically under a less strict condition on $h$ and on a non-affine problem.
2 Low-rank Parareal
We now present our low-rank Parareal algorithm for solving (1). Since the cost of most discrete integrators for DLRA scales quadratically with the approximation rank (while the actual cost can be larger, at the very least the methods typically compute compact QR factorizations to normalize the approximations, which costs $O(n r^2)$ flops in our setting), we take the coarse time stepper as DLRA with a small rank $q$. Likewise, the fine time stepper is DLRA with a large rank $r$. We can even take $r = n$, which corresponds to computing the exact solution as the fine time stepper since the tangent-space projection is then the identity.
Definition 2.1 (Lowrank Parareal).
Consider two ranks $q < r$. The low-rank Parareal algorithm iterates
(Initial value)  $Y_0^k = X_0$  (7)
(Initial approximation)  $Y_{n+1}^0 = \mathcal{F}^q(P^q Y_n^0)$  (8)
(Iteration)  $Y_{n+1}^{k+1} = \mathcal{F}^r(P^r Y_n^k) + \mathcal{F}^q(P^q Y_n^{k+1}) - \mathcal{F}^q(P^q Y_n^k)$  (9)
where $\mathcal{F}^r(Z)$ is the solution of (5) at time $h$ with initial value $Z$, and $P^r$ is the orthogonal projection onto $\mathcal{M}_r$. The notations $\mathcal{F}^q$ and $P^q$ are similar but apply to rank $q$.
Observe that the rank of $Y_n^k$ is at most $r + 2q$ for all $n$ and $k$. The truncated SVD that is required for computing $P^r$ and $P^q$ can therefore be performed efficiently; see Section 3 for implementation details.
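Because each iterate is a sum of a few low-rank matrices, the truncated SVD never needs the full dense matrix: it suffices to stack the factors, take two compact QR factorizations, and compute the SVD of a small core. The sketch below (names and interface are ours) implements this standard trick for a sum $\sum_i U_i V_i^T$:

```python
import numpy as np

def truncate_sum(factors, rank):
    """Best rank-`rank` approximation (truncated SVD) of sum_i U_i @ V_i.T
    without forming the full n-by-n matrix. Each element of `factors`
    is a pair (U_i, V_i) of tall matrices."""
    U = np.hstack([Ui for Ui, _ in factors])
    V = np.hstack([Vi for _, Vi in factors])
    # Compact QR of both stacked factor blocks.
    Qu, Ru = np.linalg.qr(U)
    Qv, Rv = np.linalg.qr(V)
    # SVD of the small core whose size is the total number of columns.
    W, s, Zt = np.linalg.svd(Ru @ Rv.T)
    k = rank
    # Return the truncated approximation in factored form.
    return Qu @ (W[:, :k] * s[:k]), Qv @ Zt[:k].T
```

The cost is dominated by the two compact QR factorizations, which is linear in $n$, consistent with the efficiency claim above.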
2.1 Convergence analysis
Let $X_n = X(t_n)$ be the solution of the full problem (1) at time $t_n$. Let $Y_n^k$ be the corresponding low-rank Parareal solution at iteration $k$. We are interested in bounding the error of the algorithm,
(10) $e_n^k = \|Y_n^k - X_n\|,$
for all relevant $n$ and $k$. To this end, we make the following assumption:
Assumption 2 (Affine vector field).
The function $F$ is affine linear and autonomous, that is,
$F(X) = \mathcal{L}(X) + C$
with a linear operator $\mathcal{L} \colon \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ and a constant matrix $C \in \mathbb{R}^{n \times n}$.
The following lemma gives us a recursion for the Frobenius norm of the error. This recursion will be fundamental in deriving our convergence bounds later on when we generalize the proof for standard Parareal from Gander and Hairer (2008).
Lemma 2.2 (Iteration of the error).
Proof.
Our proof is similar to the one in Kieri and Vandereycken (2019), where the continuous version of the approximation error of DLRA is studied first. Denote by the solution of (1) at time with initial value . By definition, the discrete error is
We can interpret each term above as a flow from to . Denote these flows by , and with the initial values
(12) 
Defining the continuous error as
we then get the identity .
We proceed by bounding . By definition of the flows above, we have (omitting the dependence on in the notation)
where the last equality holds since the function is affine. Using Assumption 1 and Cauchy–Schwarz, we compute
Since , we therefore obtain the differential inequality
Grönwall’s lemma allows us to conclude
(13) 
From (12), we get
Denoting and , we get after rearranging terms
Taking norms gives
(14)  
where and are the Lipschitz constants of and respectively. Combining inequalities (13) and (14) gives the statement of the lemma. ∎
We now study the error recursion (11) in more detail. To this end, let us slightly rewrite it as
(15) 
with the nonnegative constants
(16)  
Our first result is a linear convergence bound, up to the DLRA approximation error. It is similar to the linear bound for standard Parareal.
Theorem 2.3 (Linear convergence).
Proof.
Define $E^k := \max_n e_n^k$. Taking the maximum over $n$ of both sides of (15), we obtain
By assumption, and we can therefore obtain the recursion
with solution
By assumption, we also have , which allows us to obtain the statement of the theorem. ∎
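The structure of this argument can be checked numerically. Assuming the error recursion has the generic form $e_{n+1}^{k+1} = \alpha\, e_n^k + \beta\, e_n^{k+1} + \gamma$ with $e_0^k = 0$, as in (15), the sketch below (constants are hypothetical) iterates it and verifies the resulting linear bound $E^k \leq (\alpha/(1-\beta))^k E^0 + \gamma/(1-\alpha-\beta)$, where $E^k = \max_n e_n^k$:

```python
import numpy as np

# Hypothetical nonnegative constants with alpha + beta < 1.
alpha, beta, gamma = 0.3, 0.2, 1e-8
N, K = 50, 20

# Iterate the error recursion e_{n+1}^{k+1} = alpha*e_n^k + beta*e_n^{k+1} + gamma
# with zero error at n = 0 and arbitrary initial errors at k = 0.
e = np.zeros((K + 1, N + 1))
e[0, 1:] = 1.0
for k in range(K):
    for n in range(N):
        e[k + 1, n + 1] = alpha * e[k, n] + beta * e[k + 1, n] + gamma

# Linear bound: contraction factor alpha/(1-beta), stagnation gamma/(1-alpha-beta).
E0 = e[0].max()
for k in range(K + 1):
    bound = (alpha / (1 - beta)) ** k * E0 + gamma / (1 - alpha - beta)
    assert e[k].max() <= bound + 1e-15
```

The bound contracts geometrically in $k$ down to a plateau proportional to $\gamma$, which is the qualitative behavior described in this section.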
Next, we present a more refined superlinear bound. To this end, we require the following technical lemma that solves the equality version of the double iteration (15). A similar result, but without the term and only as an upper bound, already appeared in Gander and Hairer (2008, Thm. 1). Our proof is therefore similar but more elaborate.
Lemma 2.4.
Let be any nonnegative constants such that and . Let be a sequence depending on such that
(18) 
Then,
(19) 
Proof.
The idea is to use the generating function for . Multiplying (18) by and summing over , we obtain
Since for all , this gives the relations
We can therefore obtain the linear recurrence
Its solution satisfies
It remains to compute the coefficients in the power series of the above formula since by definition of they equal the unknowns . The binomial series formula for ,
(20) 
together with the Cauchy product gives
Hence, the first term in satisfies
while the second term can be written as
Putting everything together, we have
Finally, we can identify the coefficient in front of with those on the right-hand side. The coefficient for in the first term is clearly nonzero only when . In the second term, there is only one for every such that . Substituting allows us to identify the coefficient of . ∎
Using the previous lemma, we can obtain a convergence bound that is superlinear in .
Theorem 2.5 (Superlinear convergence).
Proof.
The proof above can be modified to obtain a simple linear bound that is similar to, but different from, the one in Theorem 2.3:
Theorem 2.6 (Another linear convergence bound).
Proof.
We repeat the proof of the superlinear bound, but this time the second term is bounded as
2.2 Summary of the convergence bounds
In the previous section, we proved four upper bounds for the error of low-rank Parareal. The first is obtained directly from Lemma 2.4. It is the tightest bound, but its expression is too unwieldy for practical use. The other three bounds can be summarized as
(23) 
with
bound          rate of (23) in $k$
Theorem 2.3    linear
Theorem 2.6    linear
Theorem 2.5    superlinear
Each of these practical bounds describes a different phase of the convergence, and none is always better than the others. In Figure 1, we have plotted all four bounds for realistic values of the constants. We took the modeling error to be zero since it only determines the stagnation level of the error and would interfere with judging the transient behavior of the convergence plot. Furthermore, the errors at the start of the iteration were chosen arbitrarily since they have little influence on the results.
The bounds above depend on the Lipschitz constants of the fine and coarse propagators; see (16). While it seems difficult to give a priori results on their size, we can bound them up to first order in the theorem below. Note also that in the important case of $\ell < 0$, these constants can be made as small as desired by taking $h$ sufficiently large.
Theorem 2.8 (Lipschitz constants).
Let . Then
(24) 
where is the th singular value of . Moreover,
(25) 
Proof.
In many applications with low-rank matrices, the singular values of the underlying matrix decay rapidly. In particular, when the singular values decay exponentially like for some , we have
(26) 
This last quantity decreases quickly to zero when the rank grows. Even for , it is less than . We therefore see that the constants in Theorem 2.8 are not too large in this case.
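The speed of this decay is easy to quantify for a concrete (hypothetical) decay rate: if $\sigma_k = \rho^k$ with $\rho < 1$, the tail energy $\big(\sum_{k > r} \sigma_k^2\big)^{1/2}$ equals $\rho^{r+1}/\sqrt{1 - \rho^2}$ and thus shrinks geometrically in $r$:

```python
import numpy as np

# Illustrative check with an assumed exponential decay sigma_k = rho**k.
rho = 0.5
sigma = rho ** np.arange(1, 200)
for r in (5, 10, 20):
    tail = np.sqrt(np.sum(sigma[r:] ** 2))        # (sum_{k > r} sigma_k^2)^{1/2}
    closed = rho ** (r + 1) / np.sqrt(1 - rho ** 2)  # geometric-series closed form
    assert np.isclose(tail, closed, rtol=1e-6)
```

With $\rho = 0.5$, each additional unit of rank halves the tail, so moderate ranks already make the constants in Theorem 2.8 small.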
3 Numerical experiments
We now show numerical experiments for our low-rank Parareal algorithm. We implemented the algorithm in Python 3.10, and all computations were performed on a MacBook Pro with an M1 processor and 16 GB of RAM. The complete code is available on GitHub so that all the experiments can be reproduced. The DLRA steps are solved by the second-order projector-splitting integrator from Lubich and Oseledets (2014). Since the problems considered are stiff, we used sufficiently many substeps of this integrator so that the coarse and fine solvers within low-rank Parareal can be considered exact.
3.1 Lyapunov equation
Consider the differential Lyapunov equation,
(27) $\dot{X}(t) = A X(t) + X(t) A^T + B B^T, \qquad X(0) = X_0,$
where $A \in \mathbb{R}^{n \times n}$ is a symmetric matrix and $B \in \mathbb{R}^{n \times s}$ is a tall matrix for some $s \ll n$. This initial value problem admits a unique solution for any initial value. The most typical example of (27) is the heat equation on a square with a separable source term. Other applications can be found in Mena et al. (2018).
Assumption 3.
The matrix $A$ is symmetric and strictly negative definite.
Under Assumption 3, the one-sided Lipschitz constant $\ell$ for (27) is strictly negative. Indeed, the linear Lyapunov operator $\mathcal{L}(X) = AX + XA^T$ has the symmetric matrix representation
$I \otimes A + A \otimes I$
with eigenvalues
$\lambda_i(A) + \lambda_j(A)$ for $i, j = 1, \ldots, n$;
see Golub and Van Loan (2013, Ch. 12.3). As in Hairer et al. (1987, Ch. I.10), we therefore get immediately that $\ell = 2\lambda_{\max}(A) < 0$. Moreover, since $\mathcal{L}$ is invertible, we can write the closed-form solution of (27) as
(28) $X(t) = e^{tA}\left(X_0 + \mathcal{L}^{-1}(B B^T)\right) e^{tA} - \mathcal{L}^{-1}(B B^T),$
which can be easily verified by differentiation using properties of the matrix exponential $e^{tA}$.
The following result shows that the solution of (27) can be well approximated by low rank. It is the analogue of a similar result for the algebraic Lyapunov equation $AX + XA^T + BB^T = 0$. The latter result is well known, but we did not find a proof of the former in the literature.
Lemma 3.1 (Lowrank approximability of Lyapunov ODE).
Proof.
The aim is to approximate the following two terms that make up the closedform solution in (28):
The first term can be treated directly. By the truncated SVD, the initial value satisfies
By Assumption 3, the operator is full rank. We therefore obtain
(29) 
where and since .
Next, we focus on the second term . Like above, the source term can be decomposed as
By linearity of the Lyapunov operator, we therefore obtain
(30) 
Denote . By definition of the Lyapunov operator , we have
As studied in Penzl (2000) and then improved in Beckermann and Townsend (2017), the singular values of the solution are bounded as
(31) 
where and . Since by assumption on , the bound (31) then implies that
where and
where the last inequality holds by properties of the logarithmic norm of which equals ; see Söderlind (2006, Proposition 2.2). Moreover, we can bound the last term in (30) as