## 1 Introduction

The parareal algorithm is a time domain decomposition method aiming at solving PDEs over a splitting of the original domain into multiple subdomains, over which the equation could be solved independently. The algorithm is therefore suited for parallel programmin by distributing the computational cost over several processors. The parallelization with respect to the time variable was earlier attempted by Nievergelt [18] and by Miranker and Liniger [17]

for the ordinary differential equations. The parareal algorithm is another extension of decomposition methods in the time direction, first introduced in

[10] that has grown in popularity as it has been successfully applied to a wide variety of problems. The algorithm consists of two solvers with different time steps, the coarse and the fine, to produce a convergent iterative method for parallel computations. The coarse solver solves the equation sequentially on the coarse time step while the fine solver uses the results from the coarse solution to solve, in parallel, over the fine time steps. The solution can therefore be obtained from independent time subdomain, over which the algorithm will iteratively use a coarse and a fine solver to compute the solution on the entire domain. Several extensions have been proposed following the theoretical analysis. Bal and Maday in [6] slightly modified the algorithm, and proposed an analysis of the convergence properties of the algorithm for parabolic equations.On the other hand, the asynchronous iterative scheme appeared on the horizon since Chazan and Miranker proposed their pioneering ideas applied to the linear systems [8], leading to a prosperity to the high performance applications, such as sub-structuring methods [16] and optimized Schwarz methods [15]. We are interested in the asynchronous iterations of the parareal method. Compared to the classical parareal algorithm, the asynchronous version gives further attractive properties including high efficiency and flexibility. Nevertheless, the implementation of the asynchronous iterative methods involves huge amount of work besides the theoretical analysis, which leads to a frustrating down time for the programming activities. Fortunately, some libraries have been developed to fill this gap. JACE [5] is a multi-threaded library aiming at the execution of asynchronous iterative algorithms based on Java, followed by a centralized volatility tolerant extension named JaceV [4]. Recently, they have developed some P2P and decentralized versions to enlarge the applicable scope [3, 7]. Additionally, CRAC [9] is another library designed to build the asynchronous applications based on C++. Finally, JACK [14] is an MPI-based C++ library, which provides many advanced properties designed for parallel iterative algorithms. A new version called JACK2 has been released. It gives an improved user-friendly API and several new convergence detection methods with global residual computation.

In this paper we concentrate on the implementation of asynchronous parareal solver based on JACK2 library. Section 2 recalls the mathematical and computational framework of the parareal algorithm. In Section 3, we presents the implementation details of such methods by JACK2. In Section 4, we choose the option pricing models to illustrate the experimental results of our asynchronous solver. Finally, our conclusions and remarks follow in Section 5.

## 2 Mathematical and Computational Framework

Consider a time-dependent problem

where is a second-order linear elliptic operator. After a decomposition of the time domain, we arrive at

(1) |

where the time domain is defined in a series of , with , . It is easily seen that the problem (1) is also an exact model. Nevertheless, we often choose an accurate enough sub-interval to approximate the original problem as shown in Figure 1.

We note here that the limit of each subproblem satisfies

which enables us to execute a continuous computation. A well known model proposed in [10] can be described as the following

where is a coarse propagator, and is a fine propagator. Accordingly, represents a coarse time step, whereas is a fine time step. We notice that this is a synchronous parareal iterative scheme, over which one can come up with the algorithmic model forthwith

###### Algorithm 1.

(Classical Parareal Algorithm)

Evidently, the first processor has only successor, whereas the last one has none but predecessor.

We now turn to the asynchronous mode. Formally, we view parareal as a two-stage problem which can be formalized carefully using the asynchronous scheme, illustrated as below

where and , with the conditions , . We assume that each processor keeps updating rather than resting permanently. Furthermore, the elements used by this processor will be renewed from time to time. Similarly, such computational model can be written as

###### Algorithm 2.

(Asynchronous Parareal Algorithm)

## 3 Implementations

### 3.1 Preprocessing

The parareal algorithm is indeed a special case of domain decomposition methods, since each time frame only depends upon its predecessor, and essentially needed by its successor. We separate the neighbors into the outgoing links and the incoming links, and thus write the code as Listing 1. It is seen that sneighb_rank and rneighb_rank have respectively one element in the communication graph. Take note that the first processor has no predecessor, therefore ; while the last one has no successor, obviously, with .

The key idea behind this project is that there exist two levels over the computation process. The upper object is the instance of Parareal, which provides the interface of JACK2’s power, whereas the lower level is the PDESolver, used to solve the equations. We notice that it is easy to enlarge the number of solvers without changing the code of Parareal. Accordingly, we need two instances of PDESolver in Parareal to obtain respectively the coarse results and the fine results, illustrated in Listing 2.

coarse_vec_U is the solution of coarse_pde, while fine_vec_U is the solution of fine_pde. vec_U is the final solution, which is also the output of Parareal solver. Finally, vec_U0 is the incoming value for both coarse_pde and fine_pde.

### 3.2 Overview of the Iterations

We now present the iterative process of parareal algorithm. Notice that the beginning parts of Algorithm 1 and Algorithm 2 are the same, we can therefore write in a unified scheme in Listing 3.

Let us mention here that Listing 3 gives us the exact position of vec_U0

and the initial estimation of

. Nevertheless, we need to distinguish the synchronous and the asynchronous parareal algorithm afterwards.#### 3.2.1 Synchronous mode

We follow the scheme of Algorithm 1 except for the convergence detection mode, since there is a rule of thumb to lighten the communication. It is easily seen that the processor will stop updating after iteration. Hence, we can apply a supplementary condition to the judging area in the synchronous mode, with the final code shown as Listing 4.

We note that it is necessary to copy the values of recv_buf to vec_U0 and the values of vec_U to send_buf whenever needed. However, we can also ease the process by setting directly the two vectors as communication buffers. In practice, the choice relates to the mathematical library used throughout the project. Moreover, using the results of Listing 4, it is also shown in Listing 5 that we need to finalize the fine solution and wait for the global termination.

#### 3.2.2 Asynchronous mode

The asynchronous parareal scheme can be implemented similarly with Listing 4 except for the matching conditions. There is no more facile code in practice, and we present the process as Listing 6. From the entry of lconv_flag, JACK2 can interact with the asynchronous process and give back results as res_norm.

## 4 Experimental Results

In this section, we give the experiments for the asynchronous parareal scheme, based on the Black-scholes equation, which is a well-known time-dependent equation in the domain of option pricing. Consider the following problem

where is the option price, depending on stock price and time . Volatility and risk-free interest rate are the constant parameters. Then, we employ some trivial variable substitutions to obtain the heat equation

(2) |

where is the time to maturity, with initial and boundary conditions

where , , . Obviously, equation (2) can be applied to the classical and the asynchronous parareal scheme. We note here that the input variables are the current stock price and the exercise price . Finally, we broaden the spatial domain and let the dimension correspond to the precision of our problem.

The mathematical operations are supported by Alinea library [11], which is implemented in C++ for both central processing unit and graphic processing unit devices. It includes several linear algebra operations [1] and numerical linear algebra solvers [13], [2], [12].

The experiments are exercised on a SGI ICE X cluster connected with InfiniBand. Each node consists of two Intel Xeon E5-2670 v3 2.30 GHz CPUs with SGI-MPI 2.14 installed. We assume that , , with 250 sub-intervals. Given , , , several average results are reported in Table 1 with Approximate Option Prices , Exact Option Prices , Absolute Error and Relative Error .

Time | |||||
---|---|---|---|---|---|

0.05 | 23.9476 | 23.9426 | 0.0050 | 0.0002 | 0.547 |

0.15 | 31.5512 | 31.5477 | 0.0035 | 0.0001 | 1.663 |

0.25 | 37.7225 | 37.7192 | 0.0033 | 0.0001 | 2.881 |

0.35 | 42.9995 | 42.9960 | 0.0035 | 0.0001 | 3.830 |

0.45 | 47.6375 | 47.6339 | 0.0036 | 0.0001 | 4.814 |

It is seen that the asynchronous parareal algorithm implemented by JACK2 is efficient and accurate.

## 5 Conclusions

In this paper, we illustrated the implementation of synchronous and asynchronous parareal algorithm using JACK2, an asynchronous communication kernel library for iterative algorithms. We discussed the particularity of such time-dependent problem, which leads to a rather different configuration than the general parallel context. As expected, the experimental results at the end verified the correctness of the asynchronous scheme.

## References

- [1] A.-K. C. Ahamed and F. Magoulès. Fast sparse matrix-vector multiplication on graphics processing unit for finite element analysis. In 14th IEEE Int. Conf. on High Performance Computing and Communications, Liverpool, UK, June 25-27, 2012. IEEE, 2012.
- [2] A.-K. C. Ahamed and F. Magoulès. Iterative methods for sparse linear systems on graphics processing unit. In 14th IEEE Int. Conf. on High Performance Computing and Communications, Liverpool, UK, June 25-27, 2012. IEEE, 2012.
- [3] J. M. Bahi, R. Couturier, and P. Vuillemin. JaceP2P: an environment for asynchronous computations on peer-to-peer networks. In Proc. of 2006 IEEE Int. Conf. on Cluster Computing, pages 1–10, 2006.
- [4] J. M. Bahi, R. Couturier, and P. Vuillemin. JaceV: A programming and execution environment for asynchronous iterative computations on volatile nodes. In Proc. of 7th Int. Conf. VECPAR, Rio de Janeiro, Brazil, June 10-13, 2006, pages 79–92. Springer, 2007.
- [5] J. M. Bahi, S. Domas, and K. Mazouzi. Jace: a Java environment for distributed asynchronous iterative computations. In Proc. of 12th Euromicro Conf. on Para. Dist. and Network-Based Proc., 2004, pages 350–357, 2004.
- [6] G. Bal and Y. Maday. A “parareal” time discretization for non-linear pde’s with application to the pricing of an american put. In L. F. Pavarino and A. Toselli, editors, Recent Developments in Domain Decomposition Methods, pages 189–202. Springer, 2002.
- [7] J.-C. Charr, R. Couturier, and D. Laiymani. JACEP2P-V2: A fully decentralized and fault tolerant environment for executing parallel iterative asynchronous applications on volatile distributed architectures. In Proc. of 4th Int. Conf. GPC 2009, Geneva, Switzerland, May 4-8, 2009, pages 446–458. Springer, 2009.
- [8] D. Chazan and W. Miranker. Chaotic relaxation. Linear Algebra and its Applications, 2(2):199–222, 1969.
- [9] R. Couturier and S. Domas. CRAC: a grid environment to solve scientific applications with asynchronous iterative algorithms. In Proc. of 2007 IEEE Int. Parallel and Distributed Processing Symposium, pages 1–8, 2007.
- [10] J.-L. Lions, Y. Maday, and G. Turinici. Résolution d’EDP par un schéma en temps “pararéel”. CRAS. Série I, Mathématique, 332(7):661–668, 2001.
- [11] F. Magoulès and A.-K. C. Ahamed. Alinea: An advanced linear algebra library for massively parallel computations on graphics processing units. The International Journal of High Performance Computing Applications, 29(3):284–310, 2015.
- [12] F. Magoulès, A.-K. C. Ahamed, and R. Putanowicz. Auto-tuned Krylov methods on cluster of graphics processing unit. International Journal of Computer Mathematics, 92(6):1222–1250, 2015.
- [13] F. Magoulès, A.-K. C. Ahamed, and R. Putanowicz. Fast iterative solvers for large compressed-sparse row linear systems on graphics processing unit. Pollack Periodica, 10(1):3–18, 2015.
- [14] F. Magoulès and G. Gbikpi-Benissan. JACK: an asynchronous communication kernel library for iterative algorithms. The Journal of Supercomputing, 73(8):3468–3487, 2017.
- [15] F. Magoulès, D. B. Szyld, and C. Venet. Asynchronous optimized Schwarz methods with and without overlap. Numerische Mathematik, 137:199–227, 2017.
- [16] F. Magoulès and C. Venet. Asynchronous iterative sub-structuring methods. Mathematics and Computers in Simulation, (in press).
- [17] W. L. Miranker and W. Liniger. Parallel methods for the numerical integration of ordinary differential equations. Mathematics of Computation, 21:303–320, 1967.
- [18] J. Nievergelt. Parallel methods for integrating ordinary differential equations. Commun. ACM, 7(12):731–733, 1964.