1 Introduction
Most real problems take place on arbitrary geometries and one has to account for the boundary complexity to reproduce, at the numerical level, the interactions between the interior problem and the boundary conditions. Boundary layer and turbulence are among others, examples of phenomena that are mostly driven by the boundary condition and attention would be drawn on the numerical schemes to provide the correct behaviour of the numerical approximation. Very high order methods (we mean strictly higher than the secondorder of approximation) turned out to be an excellent tool in capturing the local geometry details and improving its accuracy. The counterpart is that additional efforts have to be made to treat a domain with curved boundaries. Indeed, popular schemes are usually restricted to, at most, the secondorder case when boundary conditions are not exactly localised on the nodes of the grid and the edge of cells.
Several techniques have been recently developed in the unstructured mesh context to preserve the optimal order. We refer to [1, 2, 3] for a recent review. In the present study, we are focusing on the specific case of the finite difference method on Cartesian grids. It is a very popular discretisation technique due to the low data storage, free underlying structures, and draws some advantages due to the simplicity of the numerical schemes [4]. Since the beginning of the seventies, and after the pioneer paper of Peskin [5], finite difference method with the boundary embedded in a Cartesian grid provides superior advantages over the conventional boundaryconformal approach since the computational mesh remains unchanged with respect to the boundary.
Historically, Immersed Boundary (IB) methods were classified into two categories: continuous force and discrete force approach (see
[6, 7] for a detailed overview). Nowadays, such a classification turns to be obsolete and the discrete force approach falls into a general framework that consists in transferring information located on the boundary into information supported by some nodes of the grid. Introduced in the original work of MohdYosuf [8] and extended by the socalled ghost cell method [9], several authors have contributed to improve the accuracy and stability of the technique [10, 11, 12]. Roughly speaking, a set of cells tagged ghost cells are identified around the computational domain. For each ghost cell of centroid , the orthogonal projection point on the physical boundaryis determined together with the normal vector
. We define the image point in the physical domain by symmetry and a value is assigned using linear, bilinear or quadratic reconstructions involving neighbouring points [13, 14]. Then a simple extrapolation of the BI and values transfers the Dirichlet or Neumann condition into a equivalent Dirichlet condition at the ghost cell centroid . Extension using several points on the semiline have been proposed to provide a secondorder approximation [15, 16, 17] with the Neumann condition and fourth/fifth order reconstruction along the normal have been recently proposed [18, 19].All the previous methods deal with a twosteps strategy. First, several approximations are computed with nodes interpolation at interior points located along the normal line. Second, an extrapolation (ghost point) or an interpolation (boundary point) using the boundary condition is computed at the ghost cell centroid. An alternative approach consists in performing both the nodes and boundary interpolation in one step by computing the best polynomial function in the least squares sense. The introduction of boundary condition with a polynomial representation dates back to the papers of
Tiwar and Kuhnert (2001) [20, 21] in the pointset method context while the same idea was independently proposed by OllivierGooch and Van Altena (2002) [22] for the finite volume method. A similar technique was proposed for Cartesian grids in [23, 24, 25]. In ever, the boundary condition is taken into account in a weak sense, i.e. with a weighted least squares method, hence the resulting polynomial function does not exactly satisfy the condition, but up to the reconstruction order.We propose a different strategy to include the boundary condition located on the physical domain while preserving the optimal (very) highorder. The overall picture of the problem is, on the one hand, that the boundary data is not situated on the nodes and one has to transfer the information from the physical domain onto the computational domain. Secondly, the boundary condition treatment presents a high level of independence regarded to the interior problem. The main idea consists in elaborating a mathematical object (a polynomial function for instance) that catches all the information about the location and the boundary condition we shall insert in the numerical scheme. For example, in the finite volume context, the boundary information is converted into flux on the interface edges [1, 2]. We tag the method ”Reconstruction Offsite Data” (ROD) to highlight the transfer of information located on the physical boundary and not on the grid (Offsite Data) into a polynomial (Reconstruction). We generalise the concept with a tidy separation between the boundary treatment and the numerical scheme and adapt the technique to the Cartesian grid context using the ghost cell method.
Noticeable differences between our method and the other authors will be mentioned. A strict inclusion of the boundary condition is obtained by imposing the polynomial reconstruction to exactly satisfy the boundary condition. Moreover, the routine treats the Dirichlet and Neumann conditions as a particular case of the Robin condition without any specificity. This provides a universal framework to deal with all kind of conditions. We also stress that our method does not require the ghost cell centroid projection onto the physical boundary and it just needs a list of points that belong to the frontier. At last, the method is of arbitrary order for arbitrary geometries, depending on the polynomial degree involved in the reconstruction on the local regularity of the border. As a final note, we highlight that we restrict the study to the simple convection diffusion problem on purpose for the sake of simplicity in order to focus on the main objective of the present work: the treatment of boundary condition on arbitrary geometries.
The organisation of the paper is the following. We present in section 2 the equations and the numerical methods that we consider in this paper. The details of the Reconstruction Offsite Data procedure is explained in section 3 while section 4 is dedicated to a new ADI strategy to solve the interior problem. Section 5 is dedicated to the coupling between the interior and boundary problem and propose a study on an accelerated fixpoint solver. Some numerical results that are obtained with this methodology are presented in section 6 to check the accuracy and computational effort.
2 Convection diffusion on curved boundary domain
We introduce the basic ingredients to deal with the discretisation of the equations and the boundary. In particular, we define the solver operator that deals with the discrete convection diffusion equation, deriving form the standard finite difference method, and the boundary operator, involving a very simple discrete approximation of the boundary.
2.1 Domain discretisation
Let be an open bounded set. We consider the linear scalar convection diffusion problem for the present study: find a function on such that
(1) 
with , , equipped with the boundary condition
(2) 
with a given function on the boundary , , are real numbers and is the outward normal vector on .
We denote by the rectangle that contains the subdomain with Lipschitz, smooth piecewise, boundary . For , , two given integer numbers, we set , the mesh sizes. We adopt the following notations with and :
Moreover, , , stand for the four nodes of cell while is the centroid. Dropping the indices , , the simpler notation and is used when the cell is clearly identified. The mesh gathers the cells of domain and we define the grid associated to domain by
where stands for the numerical domain. Cells are tagged active cells since they correspond to the computational domain (see figure 1).
To define the ghost cells, we introduce the rook distance between two cells with
and the distance between a cell and domain with
The first layer of ghost cells is then characterised by cells with a distance to is equal to , namely
Straightforward extensions is made for the second layer and third layer of ghost cells.
Finally, for a given cell and , we define the stencil , as the list of the closest cells of to . Notice that a stencil is only constituted of cells from the computational domain.
2.2 Boundary discretisation
To handle the boundary at the discrete level, we consider a set of points , we denote the collar as (see figure 2). We only assume that , where by convention and is the characteristic length of the collar. We shall take to guarantee the same accuracy than the numerical schemes. In addition to listing the positions, the collar also provides the outward normal vector of the physical boundary at points .
Remark 1
There are several methods to compute the collar point in function of the boundary characterisation depending on the description of the boundary: Jordan parametric curve, level set function, polar coordinate curve. We refer to [3] for a detail presentation. Nevertheless, we do not require any analytical description of the boundary to perform the method except a list of points and normal vectors. In particular, orthogonal projection of the ghost cell centroid is not required.
2.3 Reconstruction and solver operators
Before presenting the technical aspects, we give a general view of the method by defining the main operators. Let be a function defining on the domain . We denote by the approximations of for and the ghost cell values for , that we gather in matrix , the other entries still undetermined. Adopting a new oneindex numbering , matrix gives rise to two vectors: corresponds to the values on the active cells i.e. the cells that belong to , while gathers the values on ghost cells of the different layers. The method we propose is based on two operators coupling the active cells and the ghost cells.
The linear reconstruction operator (ROD operator)
(3) 
that provides the ghost cell values, given the active cell values and the boundary condition. We define the linear solver operator
(4) 
that provide the approximation on the active cells, given the values on the ghost cells and the righthand side function. The numerical solution of the convection diffusion problem (1)(2) satisfies both the conditions
and suggest an iterative method to reach the fixpoint solution. Another approach consists in introducing the global residual operator
such that the numerical solution is given by . The second approach provides a matrixfree linear operator that could be handle by an iterative method of type GMRES or BiGCStab.
3 The Reconstruction of Offsite Data method (ROD)
We detail the polynomial reconstruction operator (3) to provide a high accurate approximations of the solution in the ghost cell for smooth curved boundaries. We recall that the frontier is only characterised by a simple list of points and normal vectors. The major difference with the reconstruction proposed in [22] is the specific treatment of the boundary condition. Indeed, we combine a least squares method over a stencil of active cells but imposing the polynomial to satisfy the general Robin condition on two collar points. In other words, we apply the least squares procedure to a convex subset of polynomials that strictly respect the boundary condition.
3.1 The constraint optimisation problem
We assume that the entries of contain an approximation of . Let be a ghost cell. We adopt the multiindex notation and for any points , we define the polynomials centred at the centroid as
The lexicographic order given by , , , , , defines a onetoone function where is the rank in the lexicographic order (for instance ). Polynomial of degree is constituted of monomial functions we rewrite under the compact form where vector collects the polynomial coefficients , while collects the monomial functions. Finally, we define the energy functional
that represents the quadratic error between the polynomial representation centred at and the approximations over the stencil constituted of active cells.
We denote by the closest point of the collar, i.e.
where we drop the indices for the sake of simplicity. Then there exist two collar points , on both sides of and we choose one of the two points, denoted that satisfies the cone condition (see Figure 3)
The Reconstruction of Offsite Data method consists in seeking the coefficients of the polynomial such that
under the restriction
3.2 Calculation of the ROD polynomial
To determine the solution, we define the Lagrangian functional
with .
Due to the locality of the minimisation problem, we introduce the index and , corresponds to cell and centroid
with the local index respectively. Tensor
of size transforms the global indices into the local index by setting if is the local index of cell , zero elsewhere. Consequently, vector gathers the components of the approximation with the local indexation.We introducing the matrix of coefficients , , with
Hence, the energy function reads
while the gradient reads
In the same way, the boundary condition reads for and ,
with vector given by the coefficients at
, .
Gathering the two vectors in the matrix , and let . The saddle point satisfies the following linear system
Notice that all the small matrices , are computed in a preprocessing stage together with matrix that we use in the determination of the coefficients.
3.3 Validation of the ROD reconstruction
Given a ghost cell of centroid , given the boundary condition on the two collar points, we deduce the polynomial representative and then compute the value at point . We perform all the reconstructions to set the values for the different layers of ghostcells. Since the polynomial coefficients linearly depends on the values in the active cells, we deduce that the vector is an linear function of vector given by relation (3).
To check the accuracy of Reconstruction Offsite Data procedure and assess the order method, we consider the function and set the exact values in the active cells with , . On the other hand, to fulfil the boundary condition at the collar points , we manufacture the values such that . The reconstruction procedure will then provide the values for the first, second and third layer of ghost cells we gather in vector .
Remark 2
To reduce the computational cost, we only evaluate the reconstructions for the ghost cells of the first layer and use the same polynomial to compute the extrapolations for the second and third layer.
Errors are estimated with the
norms for the first layer withTable 1 presents the errors and convergence order for the reconstruction of the Dirichlet condition (first column), the reconstruction of the Neumann reconstruction (second column) and the reconstruction for the Neumann condition (third column). Notice that we lost one order of magnitude with the reconstruction and Neumann condition but we manage to recover the optimal order using the reconstruction.
Dirichlet  Neumann  Neumann  
I  error  order  error  order  error  order 
80  4.15e03  —  7.98e02  —  7.49e04  — 
160  1.17e03  1.83  4.42e02  0.85  9.70e05  2.95 
320  3.26e04  1.84  2.13e02  1.05  1.22e05  2.99 
Similarly, we report in Table 2 the error and convergence order for the reconstruction both for Dirichlet and Neumann boundary condition, and the reconstruction for the Neumann case. We observe that the Neumann condition does not suffer of the lack of accuracy as in the case and already reach the optimal order.
Dirichlet  Neumann  Neumann  
I  error  order  error  order  error  order 
80  4.41e05  —  4.32e05  —  2.32e06  — 
160  3.51e06  3.65  2.97e06  3.86  1.39e07  4.06 
320  2.20e07  4.00  1.85e07  4.00  3.28e09  5.41 
We present in Table 3 the errors and convergence order for the reconstruction both with the Dirichlet and Neumann condition. An additional benchmark is given for the reconstruction with the Neumann condition. Indeed, for we reach the double precision capacity to handle the large condition number of the matrices involved in the polynomial coefficients’ calculation. The right panel provides the errors for and show that we recover the the optimal order.
Dirichlet  Neumann  Neumann  
I  error  order  error  order  error  order 
80  6.57e07  —  1.97e07  —  3.91e08  — 
160  1.28e08  5.68  3.72e09  5.73  3.35e09  3.54 
320  5.13e10  4.64  7.32e11  5.67  2.61e09  0.36 
I  Neumann  

error  order  
60  1.89e07  — 
80  3.91e08  5.48 
100  9.94e09  6.14 
4 The solver operator
Given vector , we seek vector deriving from a traditional finite difference scheme with Dirichlet condition at the ghost cells. It provides the linear solver operator (4) by solving a linear system characterised by a very specific sparse matrix. Several algorithms are considered to take advantage of the structured mesh, namely parallel solvers that we shall evaluate in terms of efficiency and robustness.
4.1 Highorder finite difference schemes
We adopt the standard finite difference schemes that we reproduce hereafter for the sake of consistency. We always consider centred schemes even for large cell Péclet number assuming that the solutions we are dealing with do not produce numerical instabilities. Of course, upwind scheme would be necessary in case of oscillations but this issue is out of the scope of the present paper.
The secondorder scheme is achieved with the 3points approximations
The fourthorder scheme derives from the 5points approximations
We shall also consider the sixthorder scheme given by the 7points approximations
We use the same discretisation for the direction. Notice that only the computational cells have to be evaluated since the values of the ghost cells were already evaluated in order to provide the necessary information to carry out the calculations. By solving the linear system, given the ghost cells vector and the righthand side term, we obtain the linear operator .
4.2 Linear solvers and dimensional splitting
Even enjoying a high degree of parallelisation, the Jacobi method converges too slowly when dealing with high conditioning number matrix and does not represent a satisfactory solution. The SOR technique converge faster but presents strong restrictions for a fully parallelisation if one aims at using a large number of cores. For example, RedBlack ordering only uses two cores and the block strategies creates strong overheads [26].
Methods based on the residual computation such as the GMRES methods are strongly parallelisable by nature but memory access represents a limitation that strongly reduces its computational efficiency. Indeed, the Krylov based method involves the orthogonalisation procedure leading to the construction of full vectors and matrices together with a increasing computational cost. The biCGStab method is an interesting alternative since no additional storage is required but the computational cost is still significant (we require twice the evaluation of the residual) and the conditioning number is higher.
We here propose a more efficient method, fully paralellisable by construction, where data are consecutive in memory so what it takes advantage of the processor’s cache hierarchy. We revisit the Alternate Direction Implicit method (ADI) proposed in the 60s by splitting the 2D system into a large number of 1D independent linear problem where the data is contiguous in memory, to leverage the cache access. Moreover, such a method does not require any additional storage or calculation (for instance the orthogonalisation). To this end, let consider the operators
We aim at seeking the solution solution of the steadystate convection diffusion problem with Dirichlet boundary condition. We slightly modify the equation we rewrite as a fix point problem by setting
where is the identity operator and a parameter. At the discrete level, we denote by and the numerical discretisation matrices associated to operator and respectively while vector is the solution of the linear problem
with the discrete version of function and
the identity matrix. The ADI method is based on the construction of the following sequence
of approximations given bywhere stands for an intermediate stage. At each stage , we solve independent tri(penta, hepta)diagonal linear system (sweep) and independent tri(penta, hepta)diagonal linear system (sweep). The loop stops when the residual norm reach a prescribed tolerance.
Remark 3
Higher order ADI would be considered to improve the iteration procedure. For example, the fourthorder ADI method proposed in [27] could be rewritten in the steadystate context.
4.2.1 Comparison with classical solvers
We compare in tables 4 and 5 the ADI method with the GMRES and biCGStab for the second and fourth order finite difference method method. We consider the academic problem with the exact solution and the corresponding Dirichlet boundary condition. We solve the linear system until it reaches a residual lower than .
We report the error and convergence order together with the number of iterations, i.e. the number of calls to the residual computation using a points grid. We obtain exactly the same errors and convergence order as expected. The number of iterations for the biCGStab is twice since we call two times the residual for each stages. The ADI method provides the lowest number of iterations and exhibit an excellent convergence. We do not present the computational times since they highly depend on the implementation of each method, the memory access, the cache memory, and the CPU Instructions Per Clock (IPC).
GMRES  biCGStab  ADI  

err  ord  itr  err  ord  itr  err  ord  itr  
10  1.01e03  —  50  1.01e03  —  104  1.01e03  —  38 
20  2.82e04  1.84  113  2.82e04  1.84  232  2.82e04  1.84  78 
40  7.47e05  1.92  220  7.47e05  1.92  460  7.47e05  1.92  155 
80  1.93e05  1.96  430  1.93e05  1.96  876  1.93e05  1.96  312 
120  8.65e06  1.97  622  8.65e06  1.97  1280  8.65e06  1.97  467 
160  4.89e06  1.98  788  4.89e06  1.98  1696  4.89e06  1.98  627 
GMRES  biCGStab  ADI  

err  ord  itr  err  ord  itr  err  ord  itr  
10  9.32e05  —  36  9.32e05  —  80  9.32e05  —  33 
20  9.46e06  3.30  121  9.48e06  3.30  232  9.48e06  3.30  76 
40  7.36e07  3.69  267  7.36e07  3.69  480  7.36e07  3.69  161 
80  5.11e08  3.85  519  5.09e08  3.85  964  5.09e08  3.85  333 
120  1.02e08  3.96  808  1.04e08  3.92  1484  1.04e08  3.92  503 
160  3.35e09  3.92  1088  3.35e09  3.94  1970  3.35e09  3.94  665 
4.2.2 Parallelism and computational efficiency
The computational efficiency is deeply related to the builtin parallelism ability of the numerical method and its scalability. The main interest of the ADI solver is to split a 2D problem on a grid into independent 1D problems on a points grid for the sweep (and the symmetric for the sweep). To this end, an OpenMP version of the code has been implemented in order to take advantage of the multithreading capabilities of modern processors. Simulations have been carried out on a dual processor Intel Xeon E52650 v2 with 16 cores @2.6GHz and 64 GB of memory.
Table 6 presents the time spent and respective speedup from 1 to 16 cores and different points grids for the centred second order scheme. We report a very good scaling when deploying up to 16 cores with a speedup close to 9. We also note that the computational cost increases as while the grid size increases as , hence the running time is proportional to the power of the number of unknowns.
Time (s)  Speedup  

I / #cores  1  2  4  8  16  1  2  4  8  16 
256  1.42  0.74  0.44  0.27  0.2  1.00  1.92  3.23  5.26  7.10 
512  12.38  6.94  4.22  2.26  1.43  1.00  1.78  2.93  5.48  8.66 
1024  113.83  64.41  37.35  20.89  12.43  1.00  1.77  3.05  5.45  9.16 
Tables 7 and 8 concern the fourthorder and the sixthorder scheme respectively. We remark that the speedup is slightly better (close to 12 for 16 cores and the 6thorder scheme). We also notice that the running time is of the same order that for the 16 cores case, in line with the observation of the 2ndorder scheme. In particular for the 16 cores and grid case, the computational time is 12.43 for the 2ndorder, 30.76 for the 4thorder (two times), and only 38.76 (three times) for the 6thorder.
Time (s)  Speedup  

I / #cores  1  2  4  8  16  1  2  4  8  16 
256  3.58  2.12  1.09  0.60  0.43  1.00  1.69  3.28  5.97  8.33 
512  38.81  19.74  12.02  6.34  3.73  1.00  1.97  3.23  6.12  10.40 
1024  344.87  193.34  104.14  55.83  30.76  1.00  1.78  3.31  6.18  11.21 
Time (s)  Speedup  

I / #cores  1  2  4  8  16  1  2  4  8  16 
256  5.77  2.65  1.58  0.86  0.58  1.00  2.18  3.65  6.71  9.95 
512  47.34  26.51  13.66  7.67  4.51  1.00  1.79  3.47  6.17  10.50 
1024  464.76  247.99  122.47  68.71  38.76  1.00  1.87  3.79  6.76  11.99 
5 The fixpoint solver
The numerical solution of the convection diffusion problem (1)(2) is provided by a sequence of general term given by the relation
The sequence is built until we reach a satisfactory approximation of the fixpoint solution . Since R and S are linear operators, the composition is also linear and reads
(5) 
where is a full matrix of size the number of ghost cells. Moreover, let denote . Then, one has .
Since is not explicit, we use a matrix free procedure to provide an approximation of the fix point solution of the problem
. It is of common knowledge that fix point method converges very slowly when matrix has some eigenvalues very close to one. We developed a new accelerator method to improve the solver’s convergence rate.
5.1 A preliminary analysis
We consider in this section the very particular case where it exists an initial condition such that with , i.e. vectors and are colinear. Then the following proposition holds
Proposition 4
For any , one has
Moreover, the exact solution is simply given by
[Proof.] Relation is obtained by applying times the operator. On the other hand, we write
(6) 
We then deduce that sequence converges and relation (5) gives
We conclude that the limit is the fix point . Passing to the limit in relation (6), provides relation .∎ The preliminary study indicates that the colinearity property between two successive iterations is a key to compute a better approximation of the solution, avoiding all the intermediate stages and saving a lot of computational effort. Of course, in general case, we do not have such an ideal situation but when colinearity is almost achieved, we shall take advantage of the property as presented in the next section.
5.2 Fix point accelerators
We drop the subscript GC for the sake of simplicity and define the two following quantities:
where and stand for the Euclidean norm and inner product between vectors and . From the definitions, we have the following result.
Proposition 5
Let be the initial condition. Then the following decomposition holds
with . Moreover, one has
Finally, the fix point solution satisfies the relation
(7) 
[Proof.] From the definitions of and , we have
Hence defining vector , the orthogonality property holds by construction.
We prove the second relation by induction. We compute
To prove the last relation, we write
Multiplying the relation with gives
Assuming that matrix in nonsingular, we obtain
Hence by dividing with the quantity , we deduce the relation (7).∎ Relation (7) states that the fix point solution is decomposed into a principal with and a complementary orthogonal part. Note that the accelerator parameter turns to be very large when are close to one (colinearity). We propose several improvements of the fix point method based on that remark.
5.2.1 First acceleration algorithm
Assume that at stage we know the approximation .

We compute two successive steps noting
and the increments

We compute the indicators

The new approximation is then given by truncation of the second term in relation (7).
(8)
5.2.2 Second acceleration algorithm
To improve the performance, we design a second algorithm by using the orthogonality property
where we set for the sake of notation. We then have the following proposition
Proposition 6
Let the initial condition. The following decomposition holds
(9) 
Comments
There are no comments yet.