1. Introduction
The parareal algorithm was first proposed in 2001 by Lions, Maday, and Turinici (Lions et al., 2001) as an integration technique for solving differential equations in parallel. We present a full implementation of the parareal algorithm in the Julia programming language (https://julialang.org) (Moffit and Tate, 2014) for a fully general, first-order, initial-value problem. Furthermore, we present a graphical simulation of the algorithm for purposes of pedagogy and as a tool for investigating the performance of the algorithm. Our implementation accepts both the coarse and fine integrators as functional arguments. We use Euler's method and another Runge-Kutta integration technique as the integrators in our experiments. We start with a brief introduction to the Julia programming language.
2. An Introduction to Julia: Dynamic, yet Efficient, Scientific/Numerical Programming
Julia is a multi-paradigm language designed for scientific computing; it supports multidimensional arrays, concurrency, and metaprogramming. Due to both Julia's LLVM-based just-in-time compiler and the design of the language, Julia programs run efficiently—approaching, and sometimes matching, the speed of languages such as C. See (Julia Micro-Benchmarks, 2018) for a graph depicting the relative performance of Julia compared to other common languages for scientific computing on a set of micro-benchmarks.
2.1. Coroutines and Channels in Julia
Coroutines are typically referred to as tasks in Julia, and are not scheduled to run on separate CPU cores. Channels in Julia can be either synchronous or asynchronous, and can be typed. However, if no type is specified in the definition of a channel, then values of any type can be written to that channel, much like Unix pipes. Messages are passed between coroutines through channels with the put!() and take!() functions. To add tasks to be automatically scheduled, use the schedule() function, or the @schedule and @sync macros. Coroutines have little overhead, but will always run on the same CPU core.
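For instance, a minimal sketch of task and channel communication (the channel name and values here are our own illustration, not from the original listing):

```julia
# A typed, buffered channel; an untyped channel would accept values of any type.
ch = Channel{Int}(2)

# Schedule a task (coroutine) that produces values into the channel.
@async begin
    put!(ch, 10)
    put!(ch, 20)
    close(ch)
end

# take! blocks until a value is available.
println(take!(ch) + take!(ch))  # prints 30
```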
The current version of Julia multiplexes all tasks onto a single OS thread. Thus, while tasks involving I/O operations benefit from parallel execution, compute-bound tasks are effectively executed sequentially on a single OS thread. Future versions of Julia may support scheduling of tasks on multiple threads, in which case compute-bound tasks will see the benefits of parallel execution as well (Julia Documentation, 2018).
2.2. Parallel Computing
In addition to tasks, Julia supports parallel computing—functions running on multiple CPUs or distributed computers. New processes are spawned with addprocs(n), where n is the number of processes desired. The function addprocs returns the pids of the created processes, and the function workers() returns a list of the worker processes. Alternatively, the Julia interpreter can be started with the -p n option, where n is the number of worker processes desired. For instance:
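A minimal sketch of spawning worker processes (the worker count here is our own choice):

```julia
using Distributed   # addprocs and workers live in the Distributed standard library

addprocs(2)   # spawn two worker processes; returns their pids
workers()     # in a fresh session, typically returns [2, 3]
```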
Note that the process ids start at 2 because the Julia REPL shell is process 1.
Processes in Julia, which are either locally running or remotely distributed, communicate with each other through message passing.
The function remotecall(Function, ProcessID, args...) executes Function on worker ProcessID and returns a value of the Future type, which contains a reference to a location from which the return value can be retrieved once Function has completed its execution. The value of a Future can be extracted with the function fetch(), which blocks until the result is available. Thus, the function remotecall is used to send a message, while the function fetch is used to receive a message. For instance:
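A minimal sketch (assuming a fresh session, so the first worker spawned receives pid 2):

```julia
using Distributed
addprocs(1)                 # ensure a worker process exists

r = remotecall(+, 2, 1, 2)  # ask worker 2 to compute 1 + 2; returns a Future
fetch(r)                    # blocks until the result is available; returns 3
```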
After the function remotecall is run, the worker process simply waits for the next call to remotecall.
The Julia macro @spawn simplifies this message-passing protocol for the programmer and obviates the need for explicit use of the low-level remotecall function. Similarly, the macro @parallel can be used to run each iteration of a for loop in its own process.
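For instance, a sketch of both macros (note that in Julia 1.0 the @parallel macro was renamed @distributed, which is the name used below):

```julia
using Distributed
addprocs(2)

s = @spawn sum(1:100)   # run on some worker; returns a Future
fetch(s)                # → 5050

# Each iteration runs in a worker process; (+) reduces the partial results.
total = @distributed (+) for i in 1:100
    i
end
total                   # → 5050
```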
There are also remote channels, which are writable from multiple processes, for more control over synchronizing processes.
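For example, a sketch of coordinating through a RemoteChannel (the names and values are our own illustration):

```julia
using Distributed
addprocs(1)

# A channel hosted on the master process but writable from any worker.
c = RemoteChannel(() -> Channel{Int}(1))
remote_do(ch -> put!(ch, 42), 2, c)  # ask worker 2 to write into the channel
take!(c)                             # blocks until the value arrives; returns 42
```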
2.3. Multidimensional Arrays
Julia supports multidimensional arrays, an important data structure in scientific computing applications, with a simple syntax and efficient creation and indexing over many dimensions (Bezanson et al., 2014). A function call of the form ArrayType(dimensions) creates an array, where each argument in dimensions specifies the size of the array along one dimension. Similarly, the programmer manipulates these arrays using function calls that support arrays of arbitrarily many dimensions, subject only to limitations of computational time.
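For instance (using the Julia ≥ 1.0 constructor syntax, which adds an explicit undef argument):

```julia
A = Array{Float64}(undef, 2, 3)  # uninitialized 2×3 array of Float64
B = zeros(Int, 2, 2, 2)          # 2×2×2 array of zeros

size(B)        # → (2, 2, 2)
B[1, 2, 1] = 7 # index one element across three dimensions
sum(B)         # → 7
```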
In summary, Julia incorporates concepts and mechanisms—particularly concurrency and multidimensional arrays—which support efficient scientific computing.
3. The Parareal Algorithm
The parareal algorithm is designed to perform parallel-in-time integration for a first-order initial-value problem. The algorithm involves two integration techniques, often known as the 'coarse' integrator and the 'fine' integrator. For the algorithm to be effective, the coarse integrator must be of substantially lower computational cost than the fine integrator. The reason will become apparent later in this section. Consider the differential equation given by
(1) $\frac{dy}{dt} = f(t, y), \quad t \in [a, b]$

with its associated initial-value problem

(2) $y(t^*) = y^*, \quad t^* \in [a, b].$

For simplicity, let us assume $t^* = a$, so that the solution only extends rightward. To obtain an approximate solution to equation (1) satisfying the initial condition (2), we partition our domain $[a, b]$ into $\{a = t_0 < t_1 < \cdots < t_N = b\}$ with uniform step size $\Delta T = (b - a)/N$. We now precisely define an 'integrator' as a function from $\mathbb{R}^2 \times \mathcal{R}$ to $\mathbb{R}$, where $\mathcal{R}$ is the set of all Riemann-integrable functions of two variables. For example, the integrator given by

$$I_E(t, y, f) = y + f(t, y)\,\Delta T$$

is the integrator corresponding to Euler's method with step size $\Delta T$. Let $\mathcal{C}$ and $\mathcal{F}$ be the coarse and fine integrators, respectively. Define

$$y_{0,1} = y(t_0) = y^*, \qquad y_{n+1,1} = \mathcal{C}(t_n, y_{n,1}, f).$$
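The Euler integrator just described can be expressed directly in Julia (a sketch of our own, with the step size passed explicitly rather than fixed):

```julia
# One Euler step: advance y at time t by step dt using slope f(t, y).
euler(t, y, f, dt) = y + f(t, y) * dt

euler(0.0, 1.0, (t, y) -> y, 0.1)  # → 1.1
```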
Since $y_{n+1,1}$ depends on $y_{n,1}$, this step is inherently sequential. Partition $[t_n, t_{n+1}]$ into $\{t_n = t_n^0 < t_n^1 < \cdots < t_n^m = t_{n+1}\}$ with uniform step size $\delta t = \Delta T / m$. Define

$$z_{n,1}^0 = y_{n,1}, \qquad z_{n,1}^{j+1} = \mathcal{F}(t_n^j, z_{n,1}^j, f).$$

This yields an approximate solution $z_{n,1}^j$ to equation (1) over $[t_n, t_{n+1}]$ with initial condition

$$y(t_n) = y_{n,1}.$$
Since $z_{n,1}^j$ does not depend on $z_{n',1}^j$ for $n' \neq n$, we can compute these approximations in parallel. After the last subproblem is solved, we simply combine the solutions on each subdomain to obtain a solution over the whole interval. However, our values $y_{n,1}$ are relatively inaccurate. The vertical spikes in the orange graph separating the coarse and fine predictions in Figure 1 illustrate this error. However, $z_{n,1}^m$ is a better approximation than $y_{n+1,1}$ for $\hat{y}(t_{n+1})$, where $\hat{y}$ is the exact solution to the differential equation. We use this to obtain a better set of points for the coarse approximation. We do this by first defining $y_{0,2} = y_{0,1}$ and then defining

$$y_{n+1,2} = \mathcal{C}(t_n, y_{n,2}, f) + z_{n,1}^m - y_{n+1,1}.$$

Thus, $y_{n+1,2}$ serves as a new prediction given a more accurate previous prediction, since the fine result $z_{n,1}^m$ has now been taken into account in calculating $y_{n+1,2}$. In general, we continue evaluating so that for $k \geq 1$, we have

$$z_{n,k}^0 = y_{n,k}, \qquad z_{n,k}^{j+1} = \mathcal{F}(t_n^j, z_{n,k}^j, f), \qquad y_{n+1,k+1} = \mathcal{C}(t_n, y_{n,k+1}, f) + z_{n,k}^m - \mathcal{C}(t_n, y_{n,k}, f).$$
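One iteration of this scheme—a fine sweep followed by the sequential correction—can be sketched serially in Julia as follows (helper and variable names are our own; a real implementation, such as Listing 1, runs the fine sweeps concurrently):

```julia
# One parareal iteration (serial sketch). C and F are one-step integrators of
# the form I(t, y, f, dt); ts holds the coarse points t_0, ..., t_N; yk holds
# the current iterate y_{n,k} at those points.
function parareal_iteration(yk, f, ts, ΔT, δt, C, F)
    N = length(ts) - 1
    m = round(Int, ΔT / δt)
    z = similar(yk, N)            # z[n] ≈ fine solution at t_{n+1} from y_{n,k}
    for n in 1:N                  # each n is independent: parallelizable
        zn = yk[n]
        for j in 0:m-1
            zn = F(ts[n] + j * δt, zn, f, δt)
        end
        z[n] = zn
    end
    ynew = similar(yk)
    ynew[1] = yk[1]               # the initial condition is exact
    for n in 1:N                  # sequential correction step
        ynew[n+1] = C(ts[n], ynew[n], f, ΔT) + z[n] - C(ts[n], yk[n], f, ΔT)
    end
    return ynew
end
```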
Note that since $y_{n+1,k+1}$ is dependent on $y_{n,k+1}$, this step must be done sequentially. As $k$ increases, $\mathcal{C}(t_n, y_{n,k+1}, f) - \mathcal{C}(t_n, y_{n,k}, f) \to 0$, which means that $y_{n+1,k+1}$ converges to the value that the fine integrator would predict if fine integration were simply done sequentially. Thus, each iteration entails fine integration over the whole interval. This means that the total computation performed is much greater than if fine integration were performed sequentially. However, the time efficiency of each iteration has the potential to be improved through concurrency. Since fine integration is more computationally intensive, this improvement in the runtime efficiency may compensate for the cumulative computation performed.

Let $K$ be the total number of iterations necessary to achieve a desired accuracy of solution and $N$ be the number of subintervals into which we divide $[a, b]$ according to the coarse integrator. If $K = 1$, then we achieve perfect parallel efficiency. If $K = N$, then we have likely slowed the computation down. The parareal algorithm is guaranteed to converge to the solution given by the sequential fine integrator within $N$ iterations. For a more complete treatment of this convergence analysis, we refer the reader to (Gander and Vandewalle, 2007). For fully general pseudocode, we refer the reader to (Aubanel, 2011; Nielsen, 2012).
4. Parareal Algorithm Implementation in Julia
Listing 1 presents an implementation of the parareal algorithm (from the prior section) in Julia. The @async macro within the loop causes the program to evaluate the first expression to its right (i.e., the for loop assigning values to sub) as a concurrent task. The @sync macro causes the main program thread to wait until all tasks (spawned in the first expression to its right with an @async or @parallel macro) complete. Once all concurrent tasks are complete, execution of the program proceeds sequentially. Given the semantics of these macros, the program in Listing 1 correctly performs concurrent integration. The sequential and parallel versions of this implementation exhibit no significant difference in runtime efficiency. However, if a sleep statement is placed in the argument of fineIntegrator, the parallel version runs much faster, which demonstrates that the use of these two macros does lead to concurrent program execution.
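The combined behavior of these two macros can be seen in a small sketch of our own (unrelated to Listing 1):

```julia
results = zeros(Int, 4)
@sync for i in 1:4
    @async results[i] = i^2   # each assignment runs as a concurrent task
end
# @sync returns only after all enclosed @async tasks complete.
results  # → [1, 4, 9, 16]
```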
5. Graphical Algorithm Simulation
The function simulate in Listing 2 creates a graphical simulation of the parareal algorithm. This function can be used to introduce the parareal algorithm to students in a numerical analysis course. The first line obtains the sequential solution from the fine integrator (the 'ideal' solution) and the second line obtains the history of the computations that took place during the parareal execution. The main loop over the iteration variable $k$ then displays the inner workings of the algorithm. The ideal solution is plotted, with a scatter plot of the points obtained from the coarse integrator. To simulate the parallel nature of the algorithm, random progress is made on randomly selected subdomains. Thus, the plot dynamically makes partial progress on different subdomains until all subdomains have finished the fine integration. After this, the plots are connected into the current iteration's approximation. During the next iteration, the previous guesses from the coarse integrator are displayed in red and the new guesses from the coarse integrator are displayed in green. As $k$ increases, these guesses converge to the ideal solution.


In addition to its pedagogical use, this function can be used to investigate the types of curves for which the parareal algorithm is practical. For instance, consider the differential equation
with , (10 points), and (500 points). Figure 2 shows the first and ninth iterations. The ninth iteration's large error on the right end of the interval shows that this is an example for which parareal convergence is slow—as inefficient as possible, requiring as many iterations as subdomains for the solution to converge. However, the simulation also shows that if , then the solution converges after just one iteration. These two examples show that the algorithm's efficiency can be highly dependent on the integrand. Below the simulation function are Euler's method and another Runge-Kutta integration technique that can be passed as first-class functions for the coarse or fine integration techniques to the parareal or simulate functions. A Git repository of both the implementation and the graphical simulation is available on Bitbucket at https://bitbucket.org/sperugin/pararealimplementationandsimulationinjulia. All of the graphical plots are generated with the Julia Plots package, available at https://juliaplots.github.io/. A video describing this application of Julia is available on YouTube at https://www.youtube.com/watch?v=MtgbeLO6ZM4.
References
 Julia Micro-Benchmarks (2018). Julia Micro-Benchmarks. Available: https://julialang.org/benchmarks/ [Last accessed: 14 November 2018].
 Julia Documentation (2018). Julia v1.0 Documentation: Parallel Computing. Available: https://docs.julialang.org/en/v1/manual/parallel-computing/ [Last accessed: 14 November 2018].
 Aubanel (2011) E. Aubanel. 2011. Scheduling of tasks in the parareal algorithm. Parallel Comput. 37, 3 (2011), 172–182.
 Bezanson et al. (2014) J. Bezanson, J. Chen, S. Karpinski, V. Shah, and A. Edelman. 2014. Array Operators Using Multiple Dispatch: A Design Methodology for Array Implementations in Dynamic Languages. In Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. ACM Press, New York, NY, 56–61.
 Gander and Vandewalle (2007) M.J. Gander and S. Vandewalle. 2007. Analysis of the parareal time-parallel time-integration method. SIAM Journal on Scientific Computing 29, 2 (2007), 556–578.
 Lions et al. (2001) J.L. Lions, Y. Maday, and G. Turinici. 2001. A “parareal” in time discretization of PDE’s. Comptes Rendus de l’Académie des Sciences  Series I  Mathematics 332 (2001), 661–668.
 Moffit and Tate (2014) J. Moffit and B.A. Tate. 2014. Julia. In Seven more languages in seven weeks: Languages that are shaping the future, B.A. Tate, F. Daoud, I. Dees, and J. Moffit (Eds.). Pragmatic Bookshelf, Dallas, TX, Chapter 5, 171–207.
 Nielsen (2012) A.S. Nielsen. 2012. Feasibility study of the parareal algorithm. Master’s thesis. Technical University of Denmark.