Log In Sign Up

ParaDIAG: Parallel-in-Time Algorithms Based on the Diagonalization Technique

In 2008, Maday and Rønquist introduced an interesting new approach for the direct parallel-in-time (PinT) solution of time-dependent PDEs. The idea is to diagonalize the time stepping matrix, keeping the matrices for the space discretization unchanged, and then to solve all time steps in parallel. Since then, several variants appeared, and we call these closely related algorithms ParaDIAG algorithms. ParaDIAG algorithms in the literature can be classified into two groups: * ParaDIAG-I: direct standalone solvers, * ParaDIAG-II: iterative solvers, We will explain the basic features of each group in this note. To have concrete examples, we will introduce ParaDIAG-I and ParaDIAG-II for the advection-diffusion equation. We will also introduce ParaDIAG-II for the wave equation and an optimal control problem for the wave equation. We show the main known theoretical results in each case, and also provide Matlab codes for testing. The goal of the Matlab codes ( is to help the interested reader understand the key features of the ParaDIAG algorithms, without intention to be highly tuned for efficiency and/or low memory use. We also provide speedup measurements of ParaDIAG algorithms for a 2D linear advection-diffusion equation. These results are obtained on the Tianhe-1 supercomputer in China, which is a multi-array, configurable and cooperative parallel system, and we compare these results to the performance of parareal and MGRiT, two widely used PinT algorithms. In a forthcoming update of this note, we will provide more material on ParaDIAG algorithms, in particular further Matlab codes and parallel computing results,also for more realistic applications.


page 1

page 2

page 3

page 4


Global space-time Trefftz DG schemes for the time-dependent linear nonhomogeneous and anisotropic wave equation

In this paper we are concerned with Trefftz discretizations of the time-...

Fast Iterative Solver for the Optimal Control of Time-Dependent PDEs with Crank-Nicolson Discretization in Time

In this article, we derive a new, fast, and robust preconditioned iterat...

Time-parallel simulation of the Schrödinger Equation

The numerical simulation of the time-dependent Schrödinger equation for ...

A note on parallel preconditioning for the all-at-once solution of Riesz fractional diffusion equations

The p-step backwards difference formula (BDF) for solving the system of ...

Parallel numerical method for nonlocal-in-time Schrödinger equation

We present a new parallel numerical method for solving the non-stationar...

Scalable Algorithms for High Order Approximations on Compact Stencils

The recent development of parallel technologies on modern desktop comput...

Exploiting Asynchronous Priority Scheduling in Parallel Eikonal Solvers

Numerical solutions to the Eikonal equation are computed using variants ...

1 Basic idea of ParaDIAG

We start with a basic introduction to ParaDIAG algorithms. Suppose we need to solve in parallel the system of ODEs with initial value arising from the semi-discretization of a time-dependent PDE, where . For finite element discretizations, is the mass matrix and is the stiffness matrix. For finite difference discretizations,

is just an identity matrix. The classical approach for solving such systems of ODEs is to apply a time-integrator, and then solve the resulting difference equation step-by-step in time. Instead, ParaDIAG tries to solve these difference equations

all-at-once. For linear multi-step methods, the all-at-once system is of the form


where are Toeplitz matrices specified by the time-integrator and is the number of time steps111For Runge-Kutta methods, the all-at-once system is different and will be treated in a forthcoming update of this note.. All ParaDIAG algorithms focus on treating the matrices and , while keeping and unchanged. There are mainly two approaches: first, using different step sizes , e.g., a geometrically increasing sequence with , which makes the time-discretization matrices diagonalizable. This yields ParaDIAG algorithms which are direct solvers in the ParaDIAG-I group [12, 5, 7].

The second treatment is to use a uniform step size and solve the all-at-once system (1.1) iteratively, which leads to ParaDIAG algorithms in the ParaDIAG-II group. There are several variants, but the common point is to introduce the -circulant block matrix


where and are Strang type -circulant matrices constructed from and , and is a free parameter. One can then either solve (1.1) via the stationary iteration [18]


where is the iteration index, or via Krylov subspace methods (e.g., GMRES, MINRES) by solving the preconditioned system [13]


which is nothing else than the stationary iteration (1.3) written at its fixed point, i.e. at convergence.

The algorithms proposed in [15] and [8] are essentially ParaDIAG-II algorithms as well, but they are derived from a different point of view. For example, in [8] the authors introduced a Waveform Relaxation (WR) iteration , , and after a time-discretization one can show that at each iteration the all-at-once system is , where . The algorithm in [15] can be understood similarly.

For each variant of ParaDIAG-II we need to compute with

being an input vector. The reason for using

is twofold: first, since and are Strang type -circulant matrices constructed from the Toeplitz matrices and , it naturally holds that converges to as goes to zero. This implies that by using a relatively small , the ParaDIAG-II algorithms converge rapidly. The second point lies in the fact that and can be diagonalized simultaneously, as is shown in the following Lemma.

Lemma 1 (see [3])

Let (with and ) be the discrete Fourier matrix and define for any given parameter the diagonal matrix

Then the two -circulant matrices can be simultaneously diagonalized as

where and represents the first column of , .

Due to the property of the Kronecker product, we can factor and thus we can compute by performing the following three steps:


where and . Since and are given by FFT techniques, Step-(a) and Step-(c) can be computed efficiently with operations. Step-(b) can be computed in parallel since all linear systems are completely independent from each other at different time points. These three steps represent the key steps of ParaDIAG algorithms and will appear frequently in this note, although the details differ in the various cases.

For nonlinear problems with , the basic idea for applying ParaDIAG algorithms is as follows: for linear multi-step methods, the non-linear all-at-once system is


where . The Jacobian matrix of (1.6) is


where . To apply ParaDIAG, we approximate the Jacobian matrix (1.7) by

where is constructed from the values by some averaging [6], e.g., or . Then, we can solve (1.6) by the following simplified Newton iteration:


where for each iteration the increment can be obtained using a ParaDIAG algorithm performing the three steps in (1.5). If we use different step sizes as in [6], then and are already diagonalizable, and we can replace by in (1.8).

In practice, the ParaDIAG algorithms can be combined with a windowing technique: after a certain number of time steps computed in parallel in the current time window, the computation can be restarted for the next time window in a sequential way. This permits the use of a certain adaptivity in time and space.

2 ParaDIAG for Linear Advection-Diffusion Problems

To illustrate the ParaDIAG-I and ParaDIAG-II algorithms, we now use the concrete example of the advection-diffusion equation with periodic boundary conditions222We use periodic boundary condition make the advection dominated situation harder for PinT algorithms, see [4].


where . Using the method of lines and a centered finite difference scheme for the spatial derivatives, we get the system of ODEs

where the matrix is

, and the periodic boundary conditions cause a zero eigenvalue in the matrix


2.1 ParaDIAG-I

To use ParaDIAG as a direct solver, one has to use all different time steps to make the time stepping matrix diagonalizable, and one possibility is to use geometrically increasing time step sizes to discretize (2.2a) as proposed in [12],


where is free parameter and is the first step size. We use the linear -method,


We will only consider and , which corresponds to the Backward-Euler method and the Trapezoidal rule. For , the method is also called the Crank-Nicolson scheme. The difference equations (2.4) can be combined into the all-at-once system

where , is an identity matrix and are matrices representing the time-discretization, namely

The right hand-side is given by with .

Let and . Then, we can rewrite (2.5a) as


where is an identity matrix. The diagonalization of for and can be found in [5] and [7] respectively, but for the reader’s convenience, we show the details here:

Theorem 2.1 (see [5, 7])

For the geometrically increasing step sizes given by (2.3) with , the matrix can be diagonalized as , where

. The eigenvector matrix

and its inverse are Toeplitz matrices of the form


Now using the typical ParaDIAG factorization

we can solve (2.6) by performing the three steps


where and . Since and are given in closed form, we only have to do matrix vector multiplications for Step-(a) and Step-(c), or one could use a fast Toeplitz solver based on Fourier techniques. For Step-(b), the linear systems can be solved simultaneously in parallel. There is however an important issue with this direct time parallel solver ParaDIAG-I: if the time steps are very different, the truncation error of the time stepping scheme becomes worse, and if they are very close to each other, ParaDIAG-I suffers from roundoff error in the diagonalization used in Step-(a) and Step-(c). The best one can do is to balance the two errors, as a detailed analysis in [5, 7] shows, and this limits the applicability of ParaDIAG-I to shorter time intervals and few time steps: the roundoff error is proportional to the condition number of , i.e.,

If is an eigenvector matrix of , the scaled matrix with any invertible diagonal matrix is an eigenvector matrix of as well. From [5, 7], the matrix is a good choice.

To illustrate the limitations of ParaDIAG-I, we provide the Matlab code ParaDIAG_V1_for_ADE, to test it for the advection-diffusion equation. For given , and the constraint determines uniquely the value of the geometric time steps as


For the space discretization, we fix . To study the accuracy of ParaDIAG-I, we use a reference solution obtained from the Matlab ODE solver ode45 with a very small absolute and relative tolerance, AbsTol= and RelTol=, see Figure 2.1 for an illustration.

Figure 2.1: Left: numerical solution obtained with and . Right: reference solution . Here, and the Trapezoidal rule is used.

The results of ParaDIAG-I compared to a sequential time integration are shown in Figure 2.2

Figure 2.2: Left: using the geometric time steps (2.8) with 7 different the errors measured at the finial time point for two numerical solutions: obtained step by step (dash-dot lines) and obtained by (2.7) (solid lines). For with , the error is larger than and thus the corresponding solid line is not shown. Right: for the error for the two numerical solutions as varies. Here, and the Trapezoidal rule is used.

We clearly see that using the geometric time steps (2.8) degrades the accuracy of the numerical solution, and when the time steps are too similar, the roundoff error problem sets in. This phenomenon was carefully studied in [5, 7], and the best possible geometrically stretched grid was determined, which leads to precise limits of time window length and number of time steps within which ParaDIAG-I can be reliably used.

2.2 ParaDIAG-II

Instead of using ParaDIAG as a direct solver with all different time steps to make the time stepping matrix diagonalizable, we can use it iteratively and solve a nearby problem in each iteration chosen such that the time stepping matrix of the nearby problem with uniform time step size can still be diagonalized. This idea leads to ParaDIAG algorithms in the ParaDIAG-II group. Among this group, we can use ParaDIAG within a stationary iteration or a Krylov subspace method. There are so far two very different ways to use ParaDIAG within a stationary iteration, proposed in [8] and [15]. The use of ParaDIAG within a Krylov subspace method can be found in [13, 18].

2.2.1 ParaDIAG-II – Waveform Relaxation (WR) Variant

The ParaDIAG algorithm introduced in [8] is based on the Waveform Relaxation iteration


where is the iteration index and is a free parameter. Upon convergence, the tail term is canceled and thus the converged solution is the solution of (2.2a). Applying the linear -method with a uniform step size to (2.9) gives


where . We rewrite (2.10) as an all-at-once system,

where , and are given by

The matrices are so-called -circulant matrices and can be diagonalized as stated in Lemma 1, and we can again use the typical ParaDIAG factorization . Hence, similar to (2.7) we can solve (2.11a) performing the three steps


where and . In (2.7), Step-(a) and Step-(c) can be computed efficiently via FFT and Step-(b) is again highly parallel. The eigenvector matrix satisfies


and thus the conditioning is depending on the choice of . The convergence properties of this ParaDIAG-II algorithm are summarized in the following theorem.

Theorem 2.2 (see [8])

For the linear system of ODEs , suppose with being an arbitrary eigenvalue of . Let be the -th iterate of the ParaDIAG-II algorithm (2.10) with and

be the reference solution obtained by directly applying the same time-integrator to the system of ODEs. Then the linear convergence estimate

holds, where

This shows that the ParaDIAG-II algorithm (2.10) converges with a rate independent of the spectrum of the matrix and the step size of the time-discretization. The convergence factor becomes smaller when decreases, but the condition number of (cf. (2.13)) implies that can not be arbitrarily small (e.g., not of the size ), because in this case the roundoff error will pollute the accuracy. The best parameter is again the value balancing the roundoff error and the discretization error, like for the direct solver ParaDIAG-I, see [8] for more discussions. In practice, and are good choices.

We provide a Matlab code, namely ParaDIAG_V2_WR_for_ADE, to test the ParaDIAG-II algorithm (2.10). In the code, we use the fft command to obtain by just using the first columns of , instead of the entire matrices. To implement Step-(a) in (2.12), we use the fft command as follows:

b=reshape(b,Nx,Nt);  sol_stepA=fft(Gam.*(b.’)).’;

where b is the vector . Similarly, to implement Step-(c) we use the inverse FFT command ifft,


Here, Gam and invGam. With an initial guess chosen randomly as random(’unif’,-20,20,, ), the first 2 iterates of this ParaDIAG-II algorithm are shown in Figure 2.3.

Figure 2.3: Initial guess and the first two iterates generated by the ParaDIAG-II algorithm (2.10) for and .

The maximum error at each iteration is shown in Figure 2.4.

Figure 2.4: The convergence of the ParaDIAG-II algorithm (2.10) is robust with respect to , and . Here, and the Trapezoidal rule is used as the time-integrator.

We next present some parallel speedup results for the ParaDIAG-II algorithm (2.10) based on Waveform Relaxation for a time-dependent advection-diffusion problem with periodic boundary conditions in 2D,


where and . The results were obtained on the China Tianhe-1 supercomputer [19], which is a multi-array, configurable and cooperative parallel system with a theoretical peak performance of 1.372 petaflops, composed of high performance general-purpose microprocessors and a high-speed Infiniband network. We used the parallel Fortran library MUMPS (MUltifrontal Massively Parallel sparse direct Solver [2, 1]) version 4.10.0 to solve the linear systems in Step-(b) of (2.12). For Step-(a) and Step-(c), the fft and ifft commands are dissected into complex arithmetic operations. For the particular case when the source term is zero as shown in (2.11b), Step-(a) can be implemented in an economical way: only the first column of is needed to compute .

We provided the parallel codes in Fortran, which are zipped by a document labeled as ’Parallel’. For reader’s convenience, the zip document contains a README file, in which we briefly introduce how to use these codes. Note that the modification of the Fortran library MUMPS (serving the purpose of serializing it in a distributed memory environment) is due to the fact that the relevant spatial solves are currently executed only in serial. In addition, a number of note statements are appended to Fortran’s functions and subroutines. All the Fortran codes are compiled with mpich-3.1.3 using the icc compiler version 11.1.059 and -O2 optimization level.

We denote by ParaDIAG-II (B-E) the algorithm (2.10) using Backward-Euler, and by ParaDIAG-II (TR) the one using the Trapezoidal rule, and set . For comparison, we also apply the parareal algorithm and MGRiT to (2.14). The parareal algorithm is implemented using the two-level XBraid solver with F-relaxation, and MGRiT is the multilevel XBraid solver with FCF-relaxation (i.e., an initial F-relaxation followed by a C-relaxation and then a second F-relaxation). Furthermore, we skip the unnecessary work during the first XBraid down cycle for both the parareal and MGRiT algorithms, and fix the coarsening factor to 8. As shown in Table 2.1,

4 4 4 9 4 4 5 10 7 5 5 33 20 5 5 51 26 5 5 54 27 5 5 55 27
8 4 4 9 4 4 5 10 7 5 5 33 20 5 5 51 26 5 5 54 27 5 5 55 27
16 4 4 9 4 4 5 10 7 5 5 33 20 5 5 51 26 5 5 54 27 5 5 55 27
32 4 4 9 4 4 5 10 7 5 5 33 20 5 5 51 26 5 5 54 27 5 5 55 27
64 4 4 9 4 4 5 10 7 5 5 33 20 5 5 51 26 5 5 54 27 5 5 55 27
128 4 4 9 4 4 5 10 7 5 5 33 20 5 5 51 26 5 5 54 27 5 5 55 27

B-E: ParaDIAG-II (B-E), TR: ParaDIAG-II (TR), PR: parareal, MG: MGRiT

Table 2.1: Iteration numbers of ParaDIAG-II (B-E), ParaDIAG-II (TR), Parareal and MGRiT in a strong scaling study, where and . The symbol indicates the number of processors and the coarsening factor is 8 in both parareal (two time levels) and MGRiT (three time levels).

ParaDIAG-II (B-E), ParaDIAG-II (TR), parareal and MGRiT converge robustly with respect to the number of processors. ParaDIAG-II (B-E) and ParaDIAG-II (TR) lead to parameter-robust convergence, while for parareal and MGRiT the required iteration counts increase dramatically as changes from 1 to . The tolerance tol for all experiments here is set to ). In Figure 2.5

Figure 2.5: Comparison of the overall time-to-solution in a strong scaling study, where and . The coarsening factor is cf=8 for both parareal and MGRiT.

we compare the measured CPU times for these PinT algorithms. Clearly, ParaDIAG-II (B-E) and ParaDIAG-II (TR) are two optimally scaling PinT algorithms, while for MGRiT and parareal the scaling is a little bit worse (this is because of the sequential coarse-grid-correction as we will see in the next subsection). The corresponding data is given in Table 2.2.

parareal MGRiT ParaDIAG-II (B-E) ParaDIAG-II (TR)
1.94 2.28 2.49 2.69 2.93 4.33 5.28 7.00 3.96 7.54 15.20 29.03 3.90 7.73 15.13 28.47
1.95 2.30 2.53 2.80 3.03 4.72 5.57 7.41 4.03 7.86 15.53 30.93 3.89 7.75 15.03 29.00
1.96 2.32 2.56 2.83 2.99 4.58 5.62 7.35 3.90 7.76 15.23 29.00 3.91 7.77 15.20 28.84
1.97 2.33 2.57 2.85 2.99 4.58 5.61 7.37 3.87 7.33 15.23 29.85 3.87 7.72 15.14 28.63
1.95 2.31 2.55 2.82 2.95 4.55 5.61 7.33 3.89 7.74 15.17 29.57 3.91 7.72 15.29 29.95
1.94 2.30 2.54 2.81 2.99 4.46 5.64 7.35 3.56 7.75 15.15 28.37 3.89 7.72 15.15 29.26
Table 2.2: Results for the strong scaling speedup of the four PinT algorithms measured by , where is the wall-clock time using processors.

Regarding the parallel efficiency measured by [14] ( is the wall-clock time using processors), the average parallel efficiency for ParaDIAG-II (B-E) is 92.06%, for ParaDIAG-II (TR) it is 90.70%, while it is only 22.82% for MGRiT and 8.75% for parareal.

2.2.2 ParaDIAG-II – Parareal Variant

The second way to use ParaDIAG within a stationary iteration is based on formulating the coarse-grid-correction (CGC) procedure of the parareal algorithm[10, 9] as an all-at-once system and applying ParaDIAG to it. The parareal algorithm is an iterative PinT algorithm, based on the updating formula


where and are called coarse and fine propagator, specified by two time-integrators. The quantity denotes a value calculated by applying successively steps of the fine propagator to the differential equations with initial value and the fine step size . The integer is called the coarsening ratio. Let

Then, the parareal algorithm is . This is the so called CGC, which is a sequential procedure and is often the bottleneck of the parallel efficiency. In [15], the author proposed an idea to parallelize the CGC: supposing we have to solve an initial-value problem

we apply to a slightly wrong problem, namely

where is a free parameter. We use the linear case to illustrate the details of ParaDIAG-II based on the parareal algorithm (for the nonlinear case, see [15]). We also use for simplicity Backward-Euler for . Let . The quantity computed from the previous iteration is

Note that all the quantities and can be computed simultaneously in parallel. Hence, . The parareal algorithm (2.15) can be rewritten as

where , which can be represented as

This problem is now precisely of the form (2.10) for , i.e. the Backward-Euler method, and the solution can be obtained using ParaDIAG-II (cf. 2.12). The convergence rate of this ParaDIAG-II parareal variant is summarized in the following theorem.

Theorem 2.3 (see [15])

Let be the convergence factor of the parareal algorithm with sequential-in-time CGC (i.e., the classical parareal algorithm) and be the convergence factor with parallel-in-time CGC. Then, there exists some threshold of the parameter , such that

In particular, for linear systems of ODEs with