1 Introduction
When trying to characterize an oil reservoir in the subsurface of a field, a goto method consists in using seismic imaging. Through this process, an echography of the subsurface is obtained by “shooting” acoustic waves into the subsurface, and then studying their reflection on the different geological interfaces composing the subsurface. In particular, complex processing methods are performed to turn the detected times of arrival of these waves after reflection on a given interface into the actual depth of this surface. The methods applied to acquire the seismic data often add noise to the data. If not removed, it is very challenging to identify zones of interest in the subsurface. Like the signal, the noise tends to be spatially correlated inside the “seismic” cube. In this context, Geostatistics provide tools and methods suited to model such spatially structured data.
Indeed, Geostatistics is the branch of Statistics that focuses on variables defined over a spatial domain (chiles1999geost)
. Such variables are usually observed on a set of fixed locations in the domain and are spatially structured. For instance, for seismic data, the variable is the measured amplitude and the spatial domain is formed by the domain of acquisition and the seismic depth. When referring to spatial structure, we mean here that the similarity of observations of the variable is linked to their locations within the spatial domain. Such variables are modeled as (realizations of) spatially correlated (Gaussian) random fields. This probabilistic model for the variable then allows, once a dependency between correlation and space is fixed, to mimic the spatial structure of the data, to perform tasks such as the simulation of variables with the same spatial structure, estimating the variable at unobserved locations or, as illustrated in this paper, filtering out a specific spatial structure of the variable
(wackernagel2013multivariate).Geostatistical filtering is a method used in many applications dealing with spatial data affected by spatially structured noises (goovaerts2004accounting, bourennane2012geostatistical), among which seismic processing (hoeber2003use, piazza2015m). It relies on the assumption that the noisy signal at hand results from a complex phenomenon that can be seen as a superposition of independent simpler phenomena. Hence the noisy signal is written as a sum of independent signal and noise structures, each one characterized by its own range of influence and spatial structure: filtering consists in extracting the signal structure (considered as the “noisefree” signal) from observations of the overall sum.
Within the geostatistical framework, the components of the observed signal are modeled as independent (realizations of) Gaussian random fields defined over the spatial domain. These random fields should mimic even complex spatial structures observed in the data. For instance, when dealing with seismic data, these structures relate either to the underlying geometry of layers composing the subsurface, or to the acquisition process of the data. Hence, suitable models are needed to produce realizations with similar properties.
Besides, even if this modeling problem is taken care of, one might be confronted to scalability issues. For instance, geostatistical filtering usually requires to build matrices as big as the number of observation points and to solve the associated linear systems (wackernagel2013multivariate). For seismic datasets, this size is that of the observation grid and therefore can easily amount to several millions of points. Hence, the filtering scheme should be scalable both memorywise and computationallywise if we are to apply them to such datasets.
In this paper, we present a full framework allowing to perform the geostatistical filtering of large complex datasets while tackling the two challenges described above. The outline is as follows. In Section 2, we review the theory surrounding geostatistical filtering, and the main modeling and computational problems it poses. We then propose in Section 3 a novel approach to geostatistical filtering, based on theoretical results obtained by pereiraPhd2019 and which can be seen as an extension of the SPDE approach initially proposed by lindgren2011explicit. We finally illustrate in Section 4 the use of this approach on two case studies: a synthetic one based on simulated data, and a real one based on actual seismic data.
2 Geostatistical filtering
2.1 Basic geostatistical modeling
2.1.1 Isotropic stationary Gaussian random fields
In the simplest geostatistical modeling, the variable of interest , defined over a spatial domain , is assumed to be a particular realization of a Gaussian random field also defined over . This means in particular that
can be entirely characterized through its first two moments. Assuming now that the expectation of
is constant across , i.e.(1) 
its spatial structure can be accurately described using a covariance function which associates to each pair of points the covariance of and :
(2)  
In this paper, only zeromean Gaussian random fields will be considered (i.e. ).
Of particular interest is the case where we further assume that for any ,
is a radial function of the separating vector
, i.e., that there exists some function satisfying(3) 
where denotes the usual Euclidean norm on . In this case, is said to be secondorder stationary (given that its first two moments are invariant by translation) and isotropic (given that these same moments are invariant by rotation). In particular, the function in (3) must be such that
still defines a covariance function. Bochner’s theorem, actually provides a characterization of continuous covariance functions as the inverse Fourier transform of positive bounded symmetric measures
(chiles1999geost). This theorem can in turn be used to yield a sufficient condition on for (3) to define a “valid” covariance function.Proposition 2.1 (Corollary of Bochner’s theorem).
If is such that the Fourier transform (on ) of the function defines a nonnegative, radial and integrable function on then the function defined through (3) is a continuous covariance function.
When the conditions of Proposition 2.1 are satisfied, the function defined by
(4) 
is called spectral density. In particular, ORMEROD1979559 gives a formula linking a radial covariance function (which should be both integrable and squareintegrable as a radial function of ) and its associated spectral density :
(5) 
where denotes the JBessel function with parameter . Conversely, the expression of the radial covariance function can be retrieved from its radial spectral density through
(6) 
Proposition 2.1 can be used to derive parametrized families of isotropic covariance functions that are classically used in geostatistical modeling. Catalogs of such functions (an associated spectral densities) can be found for instance in the books of chiles1999geost and lantuejoul2013geostatistical. Note in particular that linear combinations with positive coefficients of such functions still define appropriate covariance functions.
2.1.2 Covariance modeling
Recall now that we have at our disposal observations of a realization of at some locations . Determining a suitable combination of isotropic covariance functions to model the covariance of from this observation can be done using variogram modeling. The semivariogram function of is defined as the function such that
(7) 
Within the assumption that is secondorder stationary and isotropic, its semivariogram is once again a radial function of the separating vector, thus giving:
(8) 
Also, the isotropic semivariogram function can be linked to the isotropic covariance of through the relation
(9) 
For any , an estimator of the value of from the observation of a realization of at locations is given by the experimental semivariogram , which is defined by:
(10) 
where and . Here is a threshold chosen small in comparison to . Once the experimental semivariogram has been evaluated for some values of , curve fitting is used in order to choose a function from the catalog of possible isotropic semivariogram (or equivalently covariance) functions (see e.g. desassis2013automatic).
2.2 Geostatistical filtering
2.2.1 Presentation of the approach
Within a geostatistical framework, a noisy observed variable can be seen as the realization of a Gaussian random field defined over a spatial domain and that can be decomposed as a sum of independent (Gaussian) random fields (), each one characterized by its own covariance model:
(11) 
Note in particular that following the independence assumption, the covariance function of is simply the sum of the covariance functions of each component forming .
Filtering then consists in estimating one of these components, here denoted by , from the observation of a realization of at locations . In the remaining of this text, we will call true signal the component that we wish to extract, and noise fields the discarded components .
The factorial kriging method, proposed by matheron1982pour, solves this problem by estimating the value of the true signal using the observed noisy signal in a kriging approach (wackernagel2013multivariate). Formally, the value of the true signal at some location is estimated using a linear combination of the values taken by the noisy signal at the observation locations
, which are weighted so that the estimator is both unbiased and minimizes the variance of the resulting error.
Assuming that all the fields are zeromean, we get the following estimator for the value of at :
(12) 
where is the covariance function of , is the matrix whose entries are:
(13) 
and similarly denote the covariance matrices of respectively at the same locations. In particular, the estimation of the true signal at the observation locations using now the actual data can be obtained solving a linear system obtained from the vectorization of (12). Namely, these estimates are given by:
(14) 
If we assume that the true signal and all the noise components are secondorder stationary and isotropic, the covariance matrices in (14) are obtained by applying the corresponding isotropic covariance function to the entries of the (Euclidean) distance matrix of the observation locations . These isotropic covariance functions are obtained from the data using the variogram modeling approach described in Section 2.1. Indeed, the experimental semivariogram of is fitted by a sum of isotropic covariance functions chosen from the catalog. Each one of these functions corresponds in our model to the covariance function of one of the components composing . It then falls on the user to choose which one of them has to be considered as the model of the true signal, in order to form the covariance matrices and then solve the system (14).
2.2.2 Limits of the approach
When dealing with real datasets however, and especially with large seismic datasets, the approach outlined up until now meets two main challenges. First, note that the system (14) involves matrices where is the number of observation points. Hence a naive approach to solve (14) would require enough storage space to keep in memory the values of each matrix and about operations to compute the solution. Clearly, a problem arises when becomes large, as it is the case for seismic datasets for which can be equal to several millions of points.
Nevertheless, under the assumptions of secondorder stationarity and isotropy of the random fields, several approaches have been proposed to circumvent this “big ” problem. We can for instance cite the use of compactly supported (gneiting2002compactly) or tapered (kaufman2008covariance, furrer2006covariance) covariance functions, which results in sparse^{1}^{1}1A sparse matrix a matrix whose most of its entries are zero. covariance matrices and therefore a reduction of storage needs and computational costs using algorithms designed for sparse matrices to solve (14). Similarly, imposing that the considered random fields are Markovian ensures that the resulting precision matrices^{2}^{2}2A precision matrix is the inverse of a covariance matrix. are sparse (rue2005gaussian). Reformulating (14) in terms of precision matrices then allows to once again use sparse matrix algorithms.
Unfortunately, as one may suspect, stationarity and isotropy are strong assumptions that cannot be applied to model any spatial dataset. Dealing with data for which the highly regular spatial structure implied by the isotropic stationary assumption does not apply, requires more work. In the nonstationary case, the covariance function can no longer be expressed as a simple function of the distance between the points, but has an expression that depends on the location and relative position of the considered pair of points. Building and working with the corresponding covariance matrices becomes a complicated task that involves either nontrivial and computationally expensive methods such as Karhunen–Loève expansions (lindgren2012stationary), space deformations (sampson1992nonparametric) or convolution models (higdon1999non) (see fouedjio2017second for a complete review); or the use of a restricted class of modeling covariance functions (like for instance the compactly supported nonstationary covariance models proposed by liang2016class).
In the applications considered in this work however, we assume that some prior structural information on the behavior of the random field across the domain is available. Namely, we assume that the random field shows local anisotropies: around each point of the domain, there is a preferential direction along which the range of highly correlated values is maximal, whereas it is minimal in the crossdirection(s). In particular, the angles defining the preferential directions are called anisotropy angles and the size of the ranges are called anisotropy ranges. These anisotropy parameters are graphically represented by an ellipse whose axes length and direction are respectively given by the anisotropy ranges and angles (see Figure 1 for an example). The goal is then to come up with covariance models/matrices that honor this prior information while keeping the big problem at bay.
A commonly used solution consists in performing the estimation of the filtered output at a given point considering only a small local neighborhood of observation points around it, and building the resulting (smaller) kriging system while considering the anisotropy parameters constant within the neighborhood (magneron2009m)
. However, when applied to large seismic volumes, this method tends to be time consuming and sometimes introduces artifacts that are due to the fact that a collection of small local models are now considered instead of a big global one. This work aims at presenting an approach to geostatistical filtering that builds on the “Stochastic Partial Differential Equations (SPDE) approach” introduced by
lindgren2011explicit, and that allows filtering to be performed on a global setting while still addressing the limitations of the big problem.3 A matrixfree approach to geostatistical filtering
3.1 Starting point: the SPDE approach
The SPDE approach, introduced by lindgren2011explicit consists in viewing a (Gaussian) random field as a solution of a class of SPDEs rather than adopting the classical covariancebased approach. It is based on a result from rozanov1977 that states that any (secondorder stationary) isotropic Gaussian Markov random field defined over () is equivalently characterized by:

having a spectral density which is the inverse of a polynomial which takes strictly positive values over :
(15) 
being a stationary solution of the following SPDE:
(16) where is the pseudodifferential operator defined by
(17) and
denotes a Gaussian white noise defined over
.
Example of such fields include fields with a Matérn covariance function^{3}^{3}3The Matérn covariance function is a covariance function widely used in geostatistical applications due to its large flexibility. It is defined through three parameters: a range parameter which acts like a scaling parameter, a sill parameter which corresponds to the marginal variance of the resulting field, and a shape parameter which corresponds to the smoothness of the field. Its expression is:
The SPDE approach of lindgren2011explicit relies on this last characterization. Indeed, they propose to formulate a solution for this SPDE using the finite element method. First, the domain
is triangulated. The solution of the SPDE is then approximated by a linear combination of interpolation functions defined over
, and each associated to one of the triangulation nodes. Hence, the solution is written as(18) 
where denotes the number of triangulation nodes, () is the interpolating function associated with the th node, and () is a Gaussian weight associated with . A classical choice for the interpolation functions is piecewise linear functions, that are at a given node and at any other node. With this choice, the weight actually corresponds to the value of at the th node.
The formulation (18) allows the conversion of the SPDE into a linear system involving the random weights , which in turn provides an expression for the covariance (or precision) matrix of these weights. lindgren2011explicit actually provide the expression of the precision matrix of the random weights of the finite element approximation of the solution of (16):
(19) 
where is the diagonal matrix whose entries are
(20) 
is a symmetric positive semidefinite matrix with entries
(21) 
and denotes the usual inner product associated with squareintegrable functions of : .
Note that working with the piecewise linear functions described above yields that the matrix is actually sparse: indeed, is zero whenever the supports of the functions and are disjoint, which is the case whenever the nodes and do not belong to a common triangle. Then, the precision matrix in (19), which corresponds to the precision matrix of at the triangulation nodes, is a matrix polynomial of a sparse matrix. This means in particular that solving the SPDE using this method actually yields Markovian solutions.
This approach sparked a lot of interest for several reasons. On one hand, the precision matrix of the weights obtained by the SPDE approach being sparse, it provides a practical solution to the big problem.
On the other hand, adjustments can be made to the model to produce random fields that are both Markovian and nonstationary with locally varying anisotropies. lindgren2011explicit and then fuglstad2015exploring worked in the particular case where is the polynomial given by for some and . In this case, the pseudodifferential operator reduces to a simple differential operator and the the SPDE (16) becomes:
(22) 
They first propose to work with spatially varying parameters and in SPDE (22), which then creates globally nonstationary fields with a locally isotropic covariance. A second approach they suggest consists in defining SPDE (22) in a deformed space. Rewriting then the SPDE in the original domain using a change of variable yields an expression of the SPDE that is locally parametrized by the Jacobian of the deformation process. This deformation is chosen so that the nonstationary field with given local anisotropies in the original domain becomes a stationary and isotropic field in the deformed domain. In a sense, the ellipses of anisotropy defined on the original domain should become, after this deformation, unit circles (cf. Figure 2): hence the deformation acts locally as the composition of a rotation (with angle minus the anisotropy angle) and directional scalings (with factors over each anisotropy range).
3.2 An extension of the SPDE approach
3.2.1 Laplacianbased random fields
Several limits to the SPDE approach can be identified. First, it only deals with Markovian random fields, meaning that we are restricted to use only fields whose spectral density can be expressed as the inverse of a polynomial. Besides, it relies on the fact that, given the Markov property, the precision matrix (19) should be sparse and therefore can be built and stored. However, the sparsity of the matrix (19) is directly linked to the degree of the polynomial defining the spectral density (and the precision matrix itself as it turns out): the larger the degree, the fuller the matrix. Hence, the big problem might resurface if the degree of the polynomial is not taken small enough. Finally, we note that the nonstationary setting was outlined only in a particular case of Markovian field, which proves to be an additional limitation for our modeling purposes.
In order to work with nonMarkovian spectral densities, (pereiraPhd2019) propose to work with random fields defined using the spectral properties of the Laplace operator. Under the assumption that the domain on which the random fields are defined is bounded (with piecewise smooth boundary), it can be shown that the set of squareintegrable functions of admits a countable orthonormal basis of eigenfunctions of the Laplacian, i.e functions satisfying
(23) 
for some
(called eigenvalues) such that
, and satisfying either Dirichlet or Neumann boundary conditions (or a mix of them) (gilbarg2015elliptic, laugesen2018spectral). Hence, any satisfies the following equality:(24) 
Now let be an integrable function, and for instance the spectral density of some isotropic covariance function . Following the representation (24) of elements of , consider the
valued random variable defined by:
(25) 
where is a function such that on , is defined in (23), and is a sequence of independent standard Gaussian variables. In particular, it is straightforward to show that then, defines a Gaussian random field over with covariance function given by:
(26) 
solin2019hilbert show that, as defined by (25), actually approximates a Gaussian random field on with (radial) spectral density . They even provide a uniform bound on the error between the actual covariance function of and the covariance function associated with which proved that the approximation improves as we move further away from the boundary.
3.2.2 Numerical approximation of the resulting fields
Building on the SPDE approach presented earlier, pereiraPhd2019 then proposes to build a finite element approximation^{4}^{4}4Or rather a Ritz–Galerkin approximation (strang1973analysis, brenner2007mathematical). of the field defined by (25). Basically, is once again replaced by a linear combination of interpolation functions (related to a triangulation of ) where the weights are chosen so that the approximation coincides with a definition of through (25), but replacing now the eigenfunctions and eigenvalues of the Laplacian by those of a discretized version of the Laplacian^{5}^{5}5In particular, this discretized Laplacian is defined for any by:
(27) 
where is the inverse of the matrix introduced in (20), is once gain defined by (21) and is a matrix function defined through an eigendecomposition of as
(28) 
where is a matrix whose column are an orthonormal basis of , i.e. , and are eigenvalues of . A convergence result of this approximation towards (25) as the mesh size of the triangulation decreases is actually provided by pereiraPhd2019, thus justifying this approximation approach.
Note that in the case where is a Markovian spectral density, we retrieve exactly the expression (19) obtained from the SPDE approach by inverting (27), which bridges the gap between both approaches. The approach proposed by pereiraPhd2019 can therefore be seen as a generalization of the SPDE approach to any (radial) spectral density. In what follows we show how this new approach can easily be extended to the nonstationary case (without a restriction on the possible models) and how it paves the way for a “matrixfree” approach of Geostatistics that pushes the limits of the big problem.
3.3 Working with nonstationary models
The extension of this Laplacianbased model for stationary (isotropic) Gaussian random fields to nonstationary ones relies on the notion of Riemannian manifold (jost2008riemannian). A Riemannian manifold is the association of a manifold with a Riemannian metric. On one hand, the manifold is a set that can locally be considered as Euclidean. On the other hand, the Riemannian metric is an application that smoothly associates to each point of the manifold an inner product that redefines the notions of length and of angles for infinitely small vectors that would be attached to .
An equivalent of the usual Laplacian that takes into account both the structure of the manifold and the local redefinition of its geometry due to the metric can be defined, and is called Laplace–Beltrami operator (lablee2015spectral). When considering compact Riemannian manifolds, a spectral theorem similar to the one presented in the previous section can be stated, then yielding a representation of squareintegrable^{6}^{6}6It is important to note that now, integrals also account for the metric through an infinitesimal integration volume computed in the metricdefined geometry (which varies across the manifold). functions defined over the manifold in a basis of eigenfunctions of the Laplace–Beltrami operator (lablee2015spectral, jost2008riemannian). Hence, Gaussian fields can once again be defined by (25) by using this time the eigenfunctions and eigenvalues of the Laplace–Beltrami operator.
Going back to our initial problem of defining Gaussian fields with local anisotropies on a bounded domain , pereiraPhd2019 proposes to work with a Gaussian field defined on a problemspecific Riemannian manifold. Namely, is the manifold and a metric is defined so that at each point , the resulting inner product is the Euclidean inner product between vectors deformed by the composition of rotation and axis scaling introduced in Figure 2, and defined for the anisotropy parameters at . In other words, the metric is chosen so that it locally deforms into a domain where the anisotropy becomes isotropy.
Defining Gaussian fields through (25) using the Laplace–Beltrami operator associated with this Riemannian manifold then yields (away from boundary) Gaussian fields on whose covariance function is isotropic after a change of variables from to the locally deformed domain described above. Here, once again, the isotropic covariance in question is the covariance function associated with the spectral density used to scale the Gaussian weights in (25). Hence we have,
(29) 
where is the diagonal matrix whose entries are the inverse of the anisotropy ranges at , is the rotation matrix whose angles are the anisotropy angles at , and is linked to through (6). It is then straightforward to check that such a structure of covariance locally reproduces the anisotropic structure we were aiming at.
pereiraPhd2019 carries out the same finite element approximation scheme as the one described in the stationary case and end up once again with an explicit characterization of the weights in (18). For any , denote by the anisotropy ranges at and by the rotation matrix associated with the anisotropy angles at . Introduce then and defined as:
(30) 
and
(31) 
Then pereiraPhd2019 shows that the approximation weights are once again Gaussian, zeromean and have covariance matrix given by:
(32) 
where now, is the diagonal matrix whose entries are
(33) 
is a symmetric positive semidefinite matrix with entries
(34) 
and denotes once again the usual inner product associated with squareintegrable functions of . Note in particular that (27) and (32) actually coincide whenever , , which corresponds to isotropic case (all the anisotropy ellipses are unit circles with same radius).
3.4 A matrixfree implementation
3.4.1 The matrixfree approach
Considering (27) and (32), building the covariance matrices involved for instance to solve the filtering system (14) seems like a task particularly impacted by the big problem. Indeed, their constructions seem to rely on the full eigen decomposition of a (sparse) matrices of size , and results in generally full matrices. However, the fact that both (27) and (32) are expressed as a product of diagonal matrices with a matrix function points towards adopting a socalled matrixfree strategy aiming at working with the covariance matrices without actually building and storing them.
Indeed, getting back to (14), note that the filtered solution can be obtained in two steps:

First, solve for the linear system
(35) 
Then, return the solution as
(36)
There exists iterative algorithms that allow to solve linear systems while relying only on products between the matrix defining the linear system and vectors (nocedal2006numerical). In particular, such algorithms can be used in a matrixfree way in the sense that they are able to solve the linear system without needing to explicitly build and store the associated matrix: all that is required is a routine that performs the product between this matrix and an input vector. An example of such an algorithm is the Conjugate Gradient (CG) algorithm (nocedal2006numerical). Hence, using the CG algorithm to solve (35), we can completely solve the filtering problem (14) in a matrixfree way: all that is required is routines computing the product between any one of the matrices and an input vector. We present in Algorithm 1 this approach to geostatistical filtering.
Apart from the routines, note that each iteration of Algorithm 1 requires at most operations and memory space. Hence the computational and memory efficiency of the algorithm comes down to that of the routines used to compute the products between one of the covariance matrices and vectors. To achieve it, we propose to model the true signal and the noise components using the approach presented in the previous section. Hence, every covariance matrix in (14) can be written as (32) with specific matrices and obtained after a triangulation of the spatial domain and taking into account the local anisotropy parameters of the corresponding field. Then we use a polynomial trick to efficiently compute the product between these covariance matrices and vectors. Such an approach has been proposed by Dietrich1995 and pereira2019efficient to deal with the simulation of Gaussian random fields, and by for instance higham2008functions and hammond2011wavelets for dealing with applications involving matrix functions.
3.4.2 Polynomial trick for matrixvector products
A product between a covariance matrix of the form (32) and a vector is obtained in three steps:

Compute the product .

Compute the product .

Return the product .
On one hand, steps 1 and 3 consists in products between a diagonal matrix and a vector, and hence require operations if the diagonal entries of are stored. On the other, step 2 would in general require either to have diagonalized the matrix
at some point and to have stores its full set of eigenvectors and eigenvalues.
However, whenever is a polynomial , it is straightforward to check (via a proof by induction) that the definition of through (28) is equivalent to the traditional notion of matrix polynomial, i.e.
(37) 
where
is the identity matrix. The product of step 2 can then be computed without requiring any diagonalization as its given by
(38)  
and therefore can be computed with an iterative approach that requires at each iteration only a single product between and a vector. Hence, in the polynomial case, only needs to be stored which represents a storage need of where is the mean number of nonzero entries on a row of , and the computational cost of step 2 now amounts to operations.
Now, when is not polynomial, we propose to replace it by an appropriate polynomial . In particular, this polynomial is chosen so that it is a good approximation of over a segment that contains the eigenvalues of . This is sufficient to ensure that then , following the definition (28) of matrix functions and the fact that by definition of , , . As for the segment of approximation, it can be obtained by upperbounding the largest eigenvalue of , which can be done at a negligible cost (of order ) using either the Gershgorin circle theorem (gershgorin1931uber) or the fact that
(39) 
In practice, the polynomial is determined using Chebyshev polynomial approximation (mason2002chebyshev, press2007numerical). As outlined by pereira2019efficient, this choice guarantees (for Lipschitzcontinuous):

the uniform convergence of the approximation over the segment as the degree of the chosen polynomial increases,

the fact that at any order of approximation the polynomial is close (with respect to the uniform norm) to the best polynomial approximation of same degree,

the fact that the coefficients of the polynomial in the Chebyshev basis of polynomials can be computed very efficiently using the Fast Fourier Transform (FFT) algorithm of cooley1965algorithm, which has a complexity of for the computation of coefficients.
As for the choice of the order of approximation, pereira2019efficient
provide heuristics based on the theory of statistical tests and on the explicit derivation of numerical errors.
We now have all the ingredients to produce the missing piece of our matrixfree approach to filtering: computationally efficient routines to compute the product between covariance matrices of the form (32) and vectors. These are laid out in Algorithm 2. Note in particular that the storage needs of this type of routine are order and its computational cost is of order (at each iteration, the most costly operation is the product between and a vector). Hence these routines are highly scalable given that they scale linearly with the size of the problem.
3.4.3 Summary of the proposed implementation
We now conclude the presentation of matrixfree approach to geostatistical filter that we propose, and that we summarized in Figure 3. Starting from a noisy signal observed across a spatial domain, we first determine the anisotropy and covariance parameters characterizing each component of the signal. The anisotropy parameters can be determined using for instance gradient based algorithms on a smoothed out version of the signal, or using auxiliary variables. The covariance parameters can be determined using local variogram modeling. This first step is actually common to any geostationary filtering approach aiming at incorporating local information into the process (magneron2009m, piazza2015m). Besides, the spatial domain is triangulated at the observed locations. In the particular case of observation on a grid, the triangulation needs no work as we can simply divide the grid cells into triangles or tetrahedrons.
Then, for each component of the signal, the corresponding finite element matrices and , which incorporate the local anisotropy information, are built using (33) and (34). An interval containing all the eigenvalues of is deduced (for instance through (39)) and the coefficients of the Chebyshev polynomial approximation of the spectral spectral density of the component over are computed. Then, for each component, the routines of Algorithm 2 aiming at computing the product between the covariance matrix of the component and vectors are written. Finally, Algorithm 1 is used to compute the factorial kriging estimates of the true signal.
4 Applications
4.1 Synthetic case study
In this first case study, we simulated two Gaussian random fields on a grid, both presenting local anisotropies. We summed them to define our input (noisy) signal (cf. Figure 4). Hence, the noisy signal to be filtered is composed of:

A nonstationary field defined by a Matérn covariance function with smoothness parameter , ranges and along its principal directions, and sill . It has local anisotropies that describe a vortexlike shape. This field is the true signal we want to extract.

A nonstationary field defined by an exponential covariance function with ranges and along its principal directions, and sill . It has local anisotropies that describe a “X” shape.
The filtering process is launched on the noisy image. The covariance parameters and the angles defining the anisotropies of the true signal and of the noise are directly used to compute the factorial kriging estimate of the true signal at each point of the grid. The output obtained from the filtering process is presented in Figure 5. As we see, the filtering process successfully extracted the true signal from the noisy observation. However we seem to obtain a smoothed version of the input: this is a consequence of the fact that the true signal is estimated through a kriging approach, which tends to yield smoothed outputs (je ne suis pas d’accord avec cette dernière phrase dans le cas du factorial kriging) (wackernagel2013multivariate).
4.2 Real case study
The case study corresponds to the application of the geostatistical filtering on a vintage 2D seismic line acquired in the Amadeus basin (onshore Australia), and provided by the company CENTRAL PETROLEUM. The data was originally acquired in 1966, reprocessed in 1984 and vectorized from a hard copy in 2010. It is displayed in Figure 6. As one may notice, the image is very noisy due to its long history. Moreover, in some parts of the image, the signal is almost completely attenuated by the noise: this is the case in high dip areas, where the high slope of the geological interface made it hard to retrieve a satisfying level of signal from the seismic measurements.
The first step was to derive a generic variogram model composed of the signal and noise structures. Random and linear noises were identified and characterized through stationary covariance functions. This was done following the same approach as the one described by magneron2009m. In particular, a smooth component corresponding to the true signal was identified, and additional noisy components characterized by global geometric anisotropies were identified. This work was done by expert geophysicists, who used their prior knowledge of the dataset to separate what is supposed to be the noise from the signal (during the variogram modeling step).
Then, local dips were assigned to the signal to be consistent with the geological structure. This was done using the Paleoscan^{TM} software from the company ELIIS, which allows to identify the global shape of some geological interfaces from noisy seismic images (cf. Figure 7). The angles describing locally these interfaces were extracted and interpolated on the whole domain. They serve as local anisotropy angles defining the signal to extract with the filtering process.
Thus, with local anisotropies defined, the geostatistical filtering approach allowed the filtering out of noise while preserving the true signal. This is clearly demonstrated in Figure 8 where there is no obvious signal remaining in the noise image. More impressive was the restoration of the signal in the high dip areas, which was only possible using our filtering approach, since the linear noise interferes strongly with the signal in these parts of the image.
The results were validated by CENTRAL PETROLEUM, and it was proposed to use this approach of geostatistical filtering solution as a valid alternative to expensive full seismic reprocessing of seismic lines. Note that in this case, the number of observation locations, which corresponds to the number of grid points, amounted to more than millions of points. At this level, the matrixfree approach we propose reveals its benefits: it allows us to “virtually” work with the huge full matrices that come with the complex nonstationary models we chose to model the complexity of the dataset.
5 Conclusion
In this work, we introduced a matrixfree approach to geostatistical filtering. In this approach, the noisy signal is modeled as a sum of independent Gaussian random fields defined through an expansion in a basis of eigenvalues of the Laplace (or Laplace–Beltrami) operator, defined over the spatial domain. These random fields are then numerically approximated using a finite element method. From a modeling perspective, this approach allows to easily account for nonstationary fields characterized by local anisotropies, without having to resort to neighborhood approaches or to restrict the choice of covariance model. From a practical point of view, the recourse to finite elements allows to reduce the memory and computational needs: indeed, only a few sparse matrices need to be built and stored to run the filtering process thanks to a polynomial trick based on Chebyshev approximation.
This approach can be extended to more than filtering, and actually paves the way to matrixfree approach to Geostatistics. Indeed, the expression of (nonstationary) covariance matrices as a matrix function depending on a sparse matrix, that we leveraged for filtering, can also be exploited to perform any other task involving covariance matrices. For instance, we can cite the inference of the parameters characterizing a Gaussian field using a likelihoodbased approach, simulations of Gaussian fields on a set of locations of a domain, and the estimation of a Gaussian field from its partial observation, using a kriging approach. Matrixfree algorithms for these purposes were actually derived by pereiraPhd2019.
Acknowledgements
The authors would like to thank Central Petroleum for the authorization to publish the results related to the 1966 seismic line (Amadeus basin). They would also like to thank Estimages and the Chalmers AI Research Center (CHAIR) for their financial and material support.