 # Kriging in Tensor Train data format

Combination of low-tensor rank techniques and the Fast Fourier transform (FFT) based methods had turned out to be prominent in accelerating various statistical operations such as Kriging, computing conditional covariance, geostatistical optimal design, and others. However, the approximation of a full tensor by its low-rank format can be computationally formidable. In this work, we incorporate the robust Tensor Train (TT) approximation of covariance matrices and the efficient TT-Cross algorithm into the FFT-based Kriging. It is shown that here the computational complexity of Kriging is reduced to O(d r^3 n), where n is the mode size of the estimation grid, d is the number of variables (the dimension), and r is the rank of the TT approximation of the covariance matrix. For many popular covariance functions the TT rank r remains stable for increasing n and d. The advantages of this approach against those using plain FFT are demonstrated in synthetic and real data examples.

## Code Repositories

### TT-FFT-COV

Field generation and kriging using circulant covariance matrix and TT approximations

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Kriging is an interpolation method that makes estimates of unmeasured quantities based on (sparse) scattered measurements. It is widely applied in the estimation of some spatially distributed quantities such as daily moisture, rainfall intensities, temperatures, contaminant concentrations or hydraulic conductivities, etc.

[40, 22]. Kriging is also used as a surrogate of some complex physical models for the purpose of efficient uncertainty quantification (UQ), in which it estimates the model response under some random perturbation of the parameters. In the first case the estimation grids are usually in two or three dimensions [60, 9, 18] or four dimensions in a space-time Kriging [3, 34, 21]

, while in the latter the dimension number could be much larger (equals to the number of uncertain parameters). When considering finely resolved estimation grids (which is often the case for UQ jobs), Kriging can easily exceed the computational capacity of modern computers. In this case estimation variance of Kriging or solving the related geostatistical optimal design problems incurs even higher computational costs

[41, 43, 55]. Kriging mainly involves three computational tasks. The first is solving a system of equations to obtain the Kriging weights, where is the number of measurements. Despite its complexity this task is better manageable since is usually much smaller than the number of estimates on a fine grid, , the dimensionality, especially when the measurement is expensive like for complex physical models. The second task is to compute the

Kriging estimates by multiplying the weights vector to the

cross-covariance matrix between measurements and unknowns. The third task is to evaluate the estimation variances as the diagonal of a conditional covariance matrix. If we take the optimal design of sampling into account, there is an additional task to repeatedly evaluate the conditional covariance matrix for the purpose of a high-dimensional non-linear optimization [32, 54, 51].

Remarkable progress had been made in speeding up Kriging computations by Fast Fourier transform (FFT) . The low-rank tensor decomposition techniques brought a further possible reduction in the time cost, since -dimensional FFT on a tensor in low-rank format can be made at the cost of a series of 1-dimensional FFT’s, as exemplified in  by using canonical, Tucker and Tensor Train formats of tensors. The work in  brought a significant further reduction of computational cost for the second and third Kriging tasks as well as the task for the optimal design of sampling by applying a low-rank canonical tensor approximation to the vectors of interest.

In this paper, we enhance the methodology proposed in  by employing a more robust low-rank Tensor Train (TT) format instead of the canonical format. We apply the TT-cross algorithm for efficient approximation of tensors, which is a key improvement compared to the method introduced in  where the low-rank format of the covariance matrix was assumed to be given. We also consider a more broad Matérn class of covariance functions.

The current work improves the applicability of the use of low-rank techniques in the FFT-based Kriging. We achieve a reduction of the computational complexity of Kriging to the level of , where is the considered TT rank of the approximation, and is the number of grid points in one direction, such that is the total number of estimated points.

We assume second-order stationarity for the covariance function and simple Kriging on a rectangular, equispaced grid parallel to the axes.

We also discuss possible extensions to non-rectangular domains and to general (scattered) measurement points. In such cases, the tensor ranks may significantly increase, up to the full rank. For the cases when FFT technique is not applicable the authors of [52, 37, 35, 29] applied the hierarchical matrix technique (-matrices). A parallel implementation of Kriging was done in .

### 1.1 State of the art for FFT-based Kriging

Let us assume that the covariance function is second-order stationary and is discretized on a tensor (regular and equispaced grid) mesh with points. Then the auto-covariance matrix of the unknowns has a symmetric (block-) Toeplitz structure (Section 3.1), which can be extended to a (block-) circulant matrix by a periodic embedding in which the number of rows and columns is enlarged, for example, from to [49, 22, 31]. It is known  that only the first column of the circulant matrix has to be stored. This reduces the computing cost from quadratic to log-linear  in . The key in the FFT-based Kriging is the fact that the multiplication of a circulant matrix and a vector is a discrete convolution which can be computed swiftly through FFT algorithm so that the quadratic computational complexity is also reduced to a log-linear one .

If the measurements are given on a regular equispaced grid, the first Kriging task is solving a system also with a symmetric positive-definite Toeplitz matrix [11, 4]. Further development of methods handling measurements that are on a subset of a finer regular grid have been made in [49, 11].

The work in  combined the power of FFT and the low-rank canonical tensor decomposition. It was assumed that the covariance matrix and the vector of interest (of size ) are available in a low-rank canonical tensor format which is a sum of Kronecker products of vectors of size each, with . Separable covariance functions (e.g. Gaussian, separate exponential) can be decomposed exactly with . For smooth non-separable covariance functions, a small value can usually give a good approximation.

The canonical tensor representation can not only greatly reduce the memory storage size of the circulant matrix, but also speed up the Fourier transform since the -dimensional FFT applied on the Kronecker product of matrices can be implemented by computing the 1-dimensional FFT on the first direction of each matrix. This reduces the complexity to . For this is a significant reduction from the complexity of FFT on the full tensor, which is .

### 1.2 Goals, approach and contributions

However, converting a full tensor to a well approximating low-rank tensor format can be computationally formidable. Simply generating the full tensor itself might be beyond the memory capacity of a desktop computer. To make the low-rank FFT-based method practical, we need an efficient way to obtain a low-rank approximation directly from the multi-dimensional function that underlies the full tensor. It could be a challenging task though to approximate the first column of the Toeplitz (circulant) matrix in the canonical tensor format for . This is due to the fact that the class of rank- canonical tensors is a nonclosed set in the corresponding tensor product space (pp 91-92 in ). The Tucker format tensor decomposition [27, 17, 15] adopted in  could be too costly to use for problems with .

In this paper, we adopt an alternative tensor format, namely, the Tensor Train (TT) format [47, 17] (introduced in Section 4.1

) which can be obtained from a full tensor in a stable direct way by a sequence of singular value decompositions of auxiliary matrices, or, more importantly, it can be computed iteratively by the

TT-cross method  which has the complexity in the order of , see Section 4.2 for more details. Often this is the most time-consuming stage of Kriging operations. Once the tensors are approximated in the TT format, the FFT can be carried out with a modest complexity. This makes the overall low-rank FFT-based Kriging practical for high dimensions. We test the efficiency of the method in terms of computational time and memory usage in Section 5.

Thus, our paper is novel in three aspects: (i) we approximate the covariance matrix in the low-rank TT tensor format using only the given covariance function as a black box (this part was missing in ), (ii) we extend the methodology to Matérn, exponential and spherical covariance functions (in addition to Gaussian functions), and (iii) we demonstrate that the low-rank approach enables high-dimensional Kriging.

### 1.3 Notation

We denote vectors by bold lower-case letters (e.g., , , ) and matrices by bold upper-case letters (e.g., , , ). Letters decorated with an overbar represent the size of the tensor grid of estimates. Embedded matrices, vectors and their sizes are denoted by letters with a check accent (e.g., , , , ). stands for -dimensional Fourier transform (FT), for one-dimensional FT along the -th dimension. and are their inverse operators.

## 2 Kriging and geostatistical optimal design

Like in , we work with the function estimate form [30, 31] of Kriging (introduced in Section 2.2). We take simple Kriging in which the estimates are assumed to have zero mean.

### 2.1 Matérn covariance

A low-rank approximation of the given function or a data set is a key component of the tasks formulated above. Among of the many covariance models available, the Matérn family  is widely used in spatial statistics and geostatistics.

The Matérn covariance function is defined as

 Cν,ℓ(r)=21−νΓ(ν)(√2νrℓ)νKν(√2νrℓ). (1)

Here is the distance between two points and in ; defines the smoothness. The larger is parameter , the smoother is the random field. The parameter is called the covariance length and measures how quickly the correlation of the random field decays with distance. denotes the modified Bessel function of order . It is known that setting we obtain the exponential covariance model. The value corresponds to a Gaussian covariance model.

In , the authors provided the analytic sinc-based proof of the existence of low-rank tensor approximations of Matérn functions. They investigated numerically the behavior of the Tucker and canonical ranks across a wide range of parameters specific to the family of Matérn kernels. It could be problematic to extend the results of this work to , since one of the terms in the Tucker decomposition storage cost is growing exponentially with .

### 2.2 Computational tasks in Kriging and optimal sampling design

Task-1. Let denote a -size vector containing the sampled values, denote the auto-covariance matrix. If the measurements are not exact and the covariance matrix of the random measurement error is available, is to be added to . The first task is to solve the below system for the Kriging weights :

 Cyyξ=y (2)

Task-2. With the weights we can obtain the Kriging estimates (sized ) by a superposition of columns of the cross-covariance matrices (sized ) weighted by , i.e. the Kriging estimate is given by :

 ^s=Csyξ. (3)

Task-3. The variance of the estimates is to be obtained from the diagonal of the conditional covariance matrix :

 ^σ2s=diag(Css|y) = diag(Css−CsyC−1yyCys) (4) = diag(Css)−N∑i=1(Csyζi)∘2,

where is the -th column of with the lower triangular Cholesky factor matrix of , and the superscript denotes Hadamard square.

Task-4. The goal of geostatistical design is to optimize sampling patterns (or locations) for . There two most common objective functions to be minimized, which are also called - and - criteria of geostatistical optimal design [41, 43, 5]:

 ϕA = ¯N−1trace[Css|y] ϕC = z⊤Css|yz=z⊤(Css−CsyC−1yyCys)z, (5)

where is a data vector .

## 3 Interface from Kriging to FFT-based methods

In this section we give a brief introduction to the basics of FFT-based Kriging . We assume that the measurement points are a subset of the estimate grid points. The simplest version of Kriging is a direct injection: the estimated values are set equal to the measurement values at the corresponding locations, and to zeros at all other points. Equivalently, we say that we inject a (small) tensor of measurements into a (larger) tensor of estimations.

For the FFT-based Kriging we use a regular, equispaced grid which leads to a (block) Toeplitz covariance matrix that can be augmented to a circulant one (Section 3.1). An embedding operation augments the injected tensor to the size that is compatible with the circulant covariance matrix. The (pseudo-)inverse of embedding is called extraction (Section 3.2).

### 3.1 Embedding Toeplitz covariance to circulant matrices

A Toeplitz matrix is constant along each descending diagonal (from left to right). A block Toeplitz matrix has identical sub-matrices in each descending diagonal block and each sub-matrix Toeplitz. If the covariance function is stationary and the estimates are made on a -dimensional regular, equispaced grid, the covariance matrix is symmetric level- block Toeplitz . Since submatrices are repeating along diagonals the required storage could be reduced from to elements [61, 23].

A circulant matrix is a Toeplitz matrix that has its first column periodic. This type of matrices come from covariance functions that are periodic in the domain. A circulant matrix-vector product can be computed efficiently by FFT 

. The eigenvalues of

can be computed as the Fourier transform of its first column [58, 2, pp. 350-354]. These properties lead us to the fast FFT-based kriging methods.

A Toeplitz matrix can always be augmented to a circulant matrix . This process is called embedding. Let be the first column of . Embedding is often done by appending the second through the last but one element of to the end of in reverse order, which makes a periodic vector . For the cases , this augmentation has to be done recursively in every level for the -level Toeplitz covariance matrix. An equivalent way of doing this is to augment the domain (to be times larger) and extend the covariance function to be periodic on the domain, as illustrated in [33, 45]. In [42, 6, 45] the authors have addressed the issue of the minimum embedding size.

### 3.2 Injection, embedding and extraction of data tensors

Suppose we obtained the Kriging weights for the measurements by solving (2). The injection of means to insert it in a larger all-zero tensor that has the same size of the estimate tensor, i.e. the injected tensor has non-zero entries only at the measurement sites.

Suppose we have measurements indexed by , each associated with a weight and a site index vector , then the injection of results in a tensor with entries:

 (6)

We denote the injection operation by .

Embedding an injected weight tensor enhances its mode size from to

by padding zeros to the extra entries so that the tensor is of

times the original size. The embedded weight tensor has entries:

 (7)

We denote the embedding operation by .

The extraction is the inverse operation of embedding, we denoted it by . By we take only the first half of in every dimension, which results in a new tensor of only of the size of .

### 3.3 Matrix-vector multiplication via FFT

With the circulant covariance matrix obtained as explained in Section 3.1, the Task-2 in (3) becomes a discrete convolution which can be computed by using FFT, this is written as (e.g., Fritz, Nowak and Neuweiler, ):

 Csyξ=CssH(ξ) =M†ˇCM(H(ξ)) =M†F[−d](F[d](ˇc)∘F[d](ˇξ)). (8)

where the operation injects and embeds into . The is evaluated by the Fast Fourier Transformation (FFT) . Without using tensor approximations the computational complexity for Kriging is reduced to , and the storage size reduced to .

For the variance estimation (Task-3) in (4) the FFT method also applies. We first need to do a Cholesky decomposition , and inject and embed each column of to get the corresponding . Then (4) can be computed as

 ^σ2s= σ2s1¯N−N∑i=1[M†F[−d](F[d](ˇc)∘F[d](ˇζi))]∘2, (9)

where is the prior variance, is a -length vector of all ones.

## 4 FFT-based Kriging accelerated by low-rank tensor decomposition

In addition to the efficient FFT-based method enabled by the Teoplitz structure of covariance matrices, the Kriging process can be further sped up by low-rank representations of the embedded covariance matrices. Since the covariance functions are usually smooth, large covariance matrices could be well approximated by a low-rank tensor format. A literature survey of low-rank tensor approximation techniques is available in [27, 15].

In this section, we approximate the first column of the circulant covariance matrix in tensor train (TT) format and then rewrite 8 also in the TT format. We start with a brief reviewing of the TT technique.

### 4.1 TT decomposition

We assume that the data vectors (, , etc.) can be associated to a function discretised on a structured grid in dimensions, for example, if is sampled on a Cartesian 3-dimensional grid,

 ξ={ξ(xi1,yi2,zi3)}n1,n2,n3i1,i2,i3=1. (10)

Then we can enumerate the entries of the vector via sub-indices , thereby seeing it as a tensor with elements We approximate such tensors, and, consequently, associated data vectors, in the Tensor Train (TT) decomposition ,

 ξ(i1,i2,…,id)≈~ξ(i1,i2,…,id):=r0,…,rd∑α0,…,αd=1ξ(1)α0,α1(i1)ξ(2)α1,α2(i2)⋯ξ(d)αd−1,αd(id). (11)

Here , , are called TT blocks. Each TT block is a three-dimensional tensor of size , . The efficiency of this representation relies on the TT ranks being bounded by a moderate constant . For simplicity we can also introduce an upper bound of the univariate grid sizes . Then we can notice that the TT format (11) contains at most elements. This is much smaller than the number of entries in the original tensor which grows exponentially in . Using Kronecker products, one can rewrite (11) as follows,

 ~ξ=r0,…,rd∑α0,…,αd=1ξ(1)α0,α1⊗ξ(2)α1,α2⊗⋯⊗ξ(d)αd−1,αd,

i.e. we see each TT block as a set of vectors of length .

Of course, one can think of any other scheme of sampling a function, e.g. at random points, but the TT decomposition requires independence of sub-indices , and therefore the Cartesian product discretisation. The rationale behind using this, on the first glance excessive, scheme, is the fast convergence of the approximation error with the TT ranks. If is analytic, the TT ranks often depend logarithmically on [56, 26, 53]. Combining the TT approximation with collocation on the Chebyshev grid, which allows to take for analytic functions, one arrives at overall cost of interpolation or integration using the TT format. This can be significantly cheaper than the

cost of Monte Carlo quadrature or Radial Basis function interpolation. Moreover, TT ranks depend usually very mildly on the particular univariate discretisation scheme, provided that it can resolve the function. We can use any univariate grid in each variable instead of the Chebyshev rule. For example, a uniform grid yields Toeplitz or circulant covariance matrices, which are amenable to fast FFT-based multiplication/diagonalisation.

However, it is difficult to obtain sharp bounds for the TT ranks theoretically. Therefore, we resort to robust numerical algorithms to compute a TT approximation of given data.

### 4.2 TT-cross approximation

A full tensor can be compressed into a TT format quasi-optimally for the desired tolerance via the truncated singular value decomposition (SVD) . However, the full tensor might even be impossible to store. In this section we recall the practical TT-cross method  that computes the representation (11) using only a few entries from . It is based on the skeleton decomposition of a matrix , which represents an matrix of rank as the cross (in Matlab-like notation)

 A=A(:,J)A(I,J)−1A(I,:) (12)

of columns and rows, where and are two index sets of cardinality such that (the intersection matrix) is invertible. If , the right-hand side requires only elements of the original matrix.

In order to describe the TT-cross method, we introduce the so-called unfolding matrices , that have the first indices grouped together to index rows, and the remaining indices grouped to index columns. Let us now consider and apply the idea of the matrix cross (12). Assume that there exists a set of index tuples, , such that the -“columns” of the original tensor form a “good” basis for all columns of . The reduction (12) may be formed for rows at positions , which are now optimized by choosing the submatrix such that its volume (modulus of determinant) is maximal. This can be done by the maxvol algorithm  in operations. Now we construct the first TT block as the matrix

. In a practical algorithm, the inversion is performed via the QR-decomposition for numerical stability. Next, we reduce the tensor onto

in the first variable, and apply TT-cross inductively to .

In the -th step, assume that we are given the reduction , a “left” index set , and a “right” set . The reduced unfolding matrix is again feasible for the maxvol algorithm, which produces a set of row positions . The next left set is constructed from by replacing with the corresponding indexes from . Continuing this process until the last variable, where we just copy , we complete the induction.

This process can be also organized in a form of a binary tree, which gives rise to the so-called hierarchical Tucker cross algorithm . In total, we need evaluations of and additional operations in computations of the maximum volume matrices.

The TT-cross method requires some starting index sets . Without any prior knowledge, it seems reasonable to initialize with independent realizations of any easy to sample reference distribution (e.g. uniform or Gaussian). If the target tensor admits an exact TT decomposition with TT ranks not greater than , and all unfolding matrices have ranks not smaller than the TT ranks of , the cross iteration outlined above reconstructs exactly . However, practical tensors can usually only be approximated by a TT decomposition with low ranks. Nevertheless a slight overestimation of the ranks can deliver a good approximation, if a tensor was produced from a regular enough function [1, 7].

However, it might be necessary to refine the sets by conducting several TT cross iterations, going back and forth over the TT blocks and optimizing the sets by the maxvol algorithm. For example, after computing , we “reverse” the algorithm and apply the maxvol method to the columns of a matrix . This gives a refined set of points . The recursion continues from to , optimizing the right sets , while taking the left sets from the previous (forward) iteration. After several iterations, both and can be optimized to the particular target function, even if the starting sets were inaccurate.

This adaptation of points can be combined with the adaptation of ranks. If the initial ranks were too large, they can be reduced to quasi-optimal values for the desired accuracy via SVD. However, we can also increase the ranks by computing the unfolding matrix on an enriched index set: we take from for , and also from an auxiliary set for . This increases the -th TT rank from to . The auxiliary set can be chosen at random  or using a surrogate for the error . The pseudocode of the entire TT cross method is listed in Algorithm 1, where we let for uniformity.

Empowered with the enrichment scheme, we are not limited to just truncating ranks from above. Instead, we can start with a low-rank initial guess and increase the ranks until the desired accuracy is met.

### 4.3 TT representation of general and structured matrices

Let us now consider how the TT format (11) can be generalised to matrices , such as the matrix from (4). Using sub-indices , we can think of a matrix as a -dimensional tensor with elements . However, most matrices in our applications have full ranks, and a straightforward -dimensional TT decomposition would be inefficient. Instead, we consider a permuted, or matrix TT decomposition :

 C(i1,…,id; j1,…,jd)=R0,…,Rd∑β0,…,βd=1C(1)β0,β1(i1,j1)C(2)β1,β2(i2,j2)⋯C(d)βd−1,βd(id,jd), (13)

or in the Kronecker form,

 C=R0,…,Rd∑β0,…,βd=1C(1)β0,β1⊗C(2)β1,β2⊗⋯⊗C(d)βd−1,βd. (14)

The identity matrix can be trivially represented in matrix TT format

with . Furthermore, we can quickly assemble block Toeplitz and circulant matrices if their first column/row is given in the TT format . Let us introduce the operation which assembles a Toeplitz matrix from a vector of its first column and row stacked together, and the operation which assembles a circulant matrix from its first column. Assume that a vector of size or a vector of size are given in the TT format (11),

 c=r0,…,rd∑α0,…,αd=1c(1)α0,α1⊗⋯⊗c(d)αd−1,αd,ˇc=r0,…,rd∑α0,…,αd=1ˇc(1)α0,α1⊗⋯⊗ˇc(d)αd−1,αd (15)

Then the block Toeplitz or circulant matrix, respectively

 C=(d⨂k=1T)c,ˇC=(d⨂k=1C)ˇc,

can be written in the matrix TT formats (13) with the same TT ranks,

 C=r0,…,rd∑α0,…,αd=1(Tc(1)α0,α1)⊗⋯⊗(Tc(d)αd−1,αd),ˇC=r0,…,rd∑α0,…,αd=1(Cˇc(1)α0,α1)⊗⋯⊗(Cˇc(d)αd−1,αd).

Similarly we can apply the multivariate Fourier transform without changing TT ranks:

 (d⨂k=1F)c=r0,…,rd∑α0,…,αd=1(Fc(1)α0,α1)⊗⋯⊗(Fc(d)αd−1,αd), (16)

where is the univariate FFT. This reduces the complexity of FFT from to .

In general, the TT format allows to represent the product of any matrix given in (13) and a compatible vector given in (11) in another TT format  with multiplied ranks,

 Cξ=(r0R0),…,(rdRd)∑γ0,…,γd=1(C(1)β0,β1ξ(1)α0,α1)γ0,γ1⊗⋯⊗(C(d)βd−1,βdξ(d)αd−1,αd)γd−1,γd, (17)

where , .

### 4.4 Kriging operations in TT format

To rewrite the Kriging estimation (8) in low rank format, we first find a TT approximation (15) of by using the TT-cross algorithm introduced in Section 4.2. With the rest of the operations we can proceed in two ways.

#### 4.4.1 Small number of scattered samples

If we assume to be small, the Task-1 of computing Kriging weights, , can be computed directly at low cost. Now we inject the scattered values into a TT tensor of desired size as introduced in (6). Suppose is the position of the th sample, we can define

 Hj=d⨂k=1e(k)j,wheree(k)j(ik)={1,ik=ℓj(k)0,otherwise,

i.e. the injection operation (6) per sample. Now the injected tensor is written in the CP format as

 ¯ξ=N∑j=1ξjHj, (18)

which can be converted to TT format directly by the formula in [16, pp. 380] or using the Alternating Least Squares (ALS)  approximation.

Similarly, we can use the direct truncation or the ALS method for summing columns of with the weights in (4), as well as the summation of different vectors .

Embedding operation (7) is simpler and more efficient: we just need to pad every TT block with zeros. Assuming we are given a vector in the form (11), we construct the following new TT blocks of a vector :

 ˇξ(k)αk−1,αk(ik)={ξ(k)αk−1,αk(ik),ik=1,…,¯nk,0,ik=¯nk+1,…,nk,k=1,…,d. (19)

Similarly, Extraction operation is performed by truncating the range of in each TT block from back to . Most importantly, embedding and extraction can be performed very efficiently without changing the TT ranks, similarly to FFT (16).

Finally, we need to compute the Hadamard products of TT tensors, e.g. in (8). The Hadamard product can be constructed exactly via (17) by noticing that

 s:=c∘ξ=Cξ,forC=diag(c),

or approximately by applying the TT-Cross algorithm to a tensor given elementwise by the formula . The direct multiplication requires operations, and the truncation afterwards has an even higher cost . In contrast, the TT-Cross approach needs computing samples of the target tensor , which means taking samples of the TT decompositions for and and multiplying them. Sampling another TT tensor requires in total operations, which, assuming that the ranks are comparable, , results in a total of operations in the TT-Cross computation of Hadamard products, which is thus preferred in this paper.

For geostatistical optimal design (Task-4) we need to compute the trace of . Since in the Task-3 we obtain already the diagonal of in the TT format, the trace can be evaluated swiftly by computing a dot product with the all-ones tensor.

#### 4.4.2 Large number of structured samples

When is large, the summation (18) can be a difficult operation in the TT format, potentially leading also to the TT ranks being in the order of . However, a large number of samples usually means that these samples are distributed fairly uniformly in the domain of interest. In this case, we switch to the TT computations even before Task-1 in equation (2). First, we interpolate the given samples onto a uniform Cartesian grid with the mesh interval being in the order of the average distance between the original samples. In the remaining operations, we assume that is structured in this way, i.e. it can be seen as a tensor , , . Thus, we can approximate in the TT format.

The solution for weights (2) becomes a rather difficult operation for a large . However, given the TT decompositions for and , the linear system can be solved more efficiently by employing ALS and similar tensor algorithms [19, 8]. Similarly, we can compute for (4) by treating as the right hand side, and expanding accordingly.

If we interpolate onto a periodic uniform Cartesian grid, the matrix becomes circulant, similarly to . In this case we can approximate only its first column in the TT format, perform the Fourier transform to obtain the eigenvalues, and apply again the TT-Cross method to approximate the pointwise division .

## 5 Numerical tests

We used the Matlab package TT-Toolbox ( https://github.com/oseledets/TT-Toolbox) for Tensor Train algorithms. The codes used for numerical experiments are available at https://github.com/dolgov/TT-FFT-COV. All computations are done on a MacBook Pro produced in 2013, equipped with 16GB RAM and an 2.7 GHz Intel Core i7 CPU.

We consider three test cases: 1) a 2-dimensional problem with (it is easy to visualize); 2) a 3-dimensional problem with and 3) 10-dimensional problem with . One of these parameters could be, for example, time. The daily soil moisture data set, used below, is taken from [20, 37, 38], where only one replicate, sampled at locations, is used.

### 5.1 Kriging of daily moisture data

Numerical models play important role in climate studies. These numerical models are complicated and high-dimensional, including such variables as pressure, temperature, speed, and direction of the wind, level of precipitation, humidity, and moisture. Many parameters are uncertain or even unknown. Accurate modeling of soil moisture finds applications in the agriculture, weather prediction, early warnings of flood and in some others. Since the underlined geographical areas are usually large and high spatial resolutions are required, the involved data sets are huge. This could make the computational process in dense matrix format unfeasible or very expensive. By involving efficient low-rank tensor calculus, we can increase the spatial and time resolution and consider more parameters. It is clear that utilization of the rank tensor approximation introduces an additional numerical error in quantities of interest (QoIs). By increasing tensor ranks we reduce this approximation error.

We consider high-resolution soil moisture data from January 1, 2014, measured in the topsoil layer of the Mississippi River basin, U.S.A (Fig. 1). Figure 1: The area where the daily soil moisture data were measured, Mississippi River basin, U.S.A.

Figure 2 shows an example of daily moisture data. On the left picture we used 2000 points , for interpolation, and on the right 4000 points. The third picture shows two set of locations: one with 2000 points, marked with the blue symbol and with 4000 points, marked with red dot. Figure 2: Daily moisture data. Interpolated from (left) 2000 and (center) 40000 measurement points. (right) Two sets of sampling points, 2000 and 4000.

The spatial resolution is 0.0083 degrees, and the distance of one-degree difference in this region is approximately 87.5 km. The grid consists of locations with observations and missing values. Therefore, the available spatial data are not on a regular grid.

The tensor product Kriging is performed as described in Sec. 4.4.2. First, we interpolate the given measurements (Fig. 3, left) onto a (coarse) Cartesian grid with the mesh interval being approximately equal to the average distance between the measurements. Specifically, we ended up with a grid (Fig. 3, center). Then the tensor of values on this coarse grid is approximated into a TT decomposition. Finally, the Kriging estimate (2)–(3) on a fine grid with points (Fig. 3, right) is computed in the TT format using FFT and TT-Cross algorithms. Figure 3: (left) 64000 measurements of the moisture; (center) regression on a coarse 65×65 Cartesian mesh; (right) TT-Kriging approximation on a fine mesh.

### 5.2 High-dimensional field generation: computational benchmark

To generate the following 2D, 3D and 10D random fields we used the Matlab script testgenerateytt.m in https://github.com/dolgov/TT-FFT-COV.

2D example. In this example we generated a high-resolution 2-dimensional Matérn random field in . One realization is presented in Fig. 4. The smoothness of the Matérn field is , covariance lengths in and directions and the variance 10. This realization is computed by the following formula in the TT format

 u′=C1/2ξ=√1nF⊤Λ1/2ξ=√1nF−1(λ1/2∘ξ), (20)

where the inverse Fourier , the square root of eigenvalues , and tensor product of two Gaussian random vectors are approximated in the TT format. Particularly, is a tensor product of two Gaussian vectors. The size of the first column of is and the computing time was 1 sec. With TT procedures one can create very fine resolved random fields in large domains. For instance, generation of a random field in the domain with locations takes less than 1 minute. Figure 4: High-resolution realization of 2D Matérn random field, computed with TT tensor format in [0,2000]2.

3D example. This example is very similar to the previous 2D example. The difference is only that the domain is and the size of the first column of is . The computing time was 3 minutes.

10D example. In this example, we generated a 10-dimensional Matérn random field. One of the dimensions could be time, for example. Table 1 contains all model parameters and the number of unknowns in (hypothetical) full tensor and in the TT decomposition of the final field . In this example we computed TT approximation of the first column of the multilevel circulant covariance matrix (cf. [24, 25]). Then we diagonalized this circulant matrix via FFT and computed square root of diagonal elements. After that we generated a random field by multiplying the square root with a random vector of the following structure , where is a normal vector. We note that we never store the whole vector explicitly, but only it’s tensor components . Also, note that is not Gaussian.

The TT approximation tolerance is set to . In the 10-dimensional case above the maximal rank was 143, and the total computing time 118 sec. In the similar 8-dimensional case the maximal rank was 138, and the total computing time 96 sec. Of course, one should observe tensor ranks not only of , but of other steps such as the TT approximation of the measurement vector and of the first column of the covariance matrix. These TT ranks were smaller than the TT ranks of the final solution though.

## 6 Discussion and Conclusions

In this paper, we proposed an FFT-based Kriging that utilizes a low-rank Tensor Train (TT) approximation of the covariance matrix. We apply the TT-Cross algorithm to generate a low-rank decomposition avoiding full tensors which could be well beyond the memory capacity of a desktop PC.

The low-rank format reduces the storage of the embedded circulant covariance matrix from exponential to linear in the number of variables. The circulant matrix can be diagonalized by FFT. Furthermore, due to the linearity of the Fourier transform, the TT format allows to implement the -dimensional FFT at the cost of one-dimensional FFT operations.

We then use the same technique to generate large Matérn random fields since the diagonalized covariance matrix gives eigen pairs for the spectral expansion of the underlying random field. We show in numerical examples that this method can generate very large random fields with a commonly affordable computational resource.

We demonstrated how to utilize the TT tensor format to speed up such geostatistical tasks as the generation of large random fields, computing kriging coefficients, kriging estimates, conditional covariance, and geostatistical optimal design. We used the fact that after discretization on a tensor grid the obtained matrix could be extended to a circulant one. Then, much expensive linear algebra operation could be done via -dimensional FFT. From the definition, one can see that FFT has tensor rank 1. After approximating the first column of the circulant matrix in the TT format (we assumed that such approximation exists) we were able to apply efficient TT tensor arithmetics and speedup expensive calculations even more. Utilizing TT format in FFT calculus allowed us to decrease computational cost and storage from to , where is the tensor rank, the dimensionality of the problem and is the number of points along the single longest edge of the estimation grid.

The presented numerical techniques have memory requirements as low as . Thus, we achieved log-complexity in the total number of lattice points. The resulting methods allow much better spatial resolution and significantly reduce the computing time.

The fundamental assumptions are: the covariance matrix is separable or has a TT-rank , the interpolation grid is a rectangular tensor grid, and the measurements also lie in the tensor grid. The random vector used to generate the random field is a Kronecker product of smaller random vectors.

## Acknowledgments

The research reported in this publication was supported by funding from the Alexander von Humboldt Foundation. We also would like to thank Wolfgang Nowak for sharing his Matlab code.

## References

•  J. Ballani and L. Grasedyck. Hierarchical tensor approximation of output quantities of parameter-dependent PDEs. SIAM/ASA Journal on Uncertainty Quantification, 3(1):852–872, 2015.
•  S. Barnett. Matrices Methods and Applications. Oxford Applied Mathematics and Computing Science Series. Clarendon Press, Oxford, 1990.
•  P. Bogaert. Comparison of kriging techniques in a space-time context. Mathematical Geology, 28(1):73–86, 1996.
•  R. H. Chan and M. K. Ng. Conjugate gradient methods for Toeplitz systems. SIAM Review, 38(3):427–482, 1996.
•  O. A. Cirpka and W. Nowak. First-order variance of travel time in non-stationary formations. Water Resour. Res., 40(3):W03507, 2004.
•  C. R. Dietrich and G. N. Newsam. Fast and exact simulation of stationary Gaussian processes through: Circulant embedding of the covariance matrix. SIAM J. Sci. Comput., 18(4):1088–1107, 1997.
•  S. Dolgov and R. Scheichl. A hybrid Alternating Least Squares – TT Cross algorithm for parametric PDEs. arXiv preprint 1707.04562, 2017.
•  S. V. Dolgov and D. V. Savostyanov. Alternating minimal energy methods for linear systems in higher dimensions. SIAM J. Sci. Comput., 36(5):A2248–A2271, 2014.
•  P. A. Finke, D. J. Brus, M. F. P. Bierkens, T. Hoogland, M. Knotters, and F. De Vries. Mapping groundwater dynamics using multiple sources of exhaustive high resolution data. Geoderma, 123(1):23–39, 2004.
•  M. Frigo and S. G. Johnson. FFTW: An adaptive software architecture for the FFT. In Proc. ICASSP, volume 3, pages 1381–1384, IEEE, Seattle, WA, 1998. http://www.fftw.org.
•  J. Fritz, W. Nowak, and I. Neuweiler. Application of FFT-based algorithms for large-scale universal Kriging problems. Math. Geosci., 41(5):509–533, 2009.
•  M. G. Genton. Separable approximations of space-time covariance matrices. Environmetrics, 18:681–695, 2007.
•  S. A. Goreinov, I. V. Oseledets, D. V. Savostyanov, E. E. Tyrtyshnikov, and N. L. Zamarashkin. How to find a good submatrix. In V. Olshevsky and E. Tyrtyshnikov, editors, Matrix Methods: Theory, Algorithms, Applications, pages 247–256. World Scientific, Hackensack, NY, 2010.
•  S. A. Goreinov, E. E. Tyrtyshnikov, and N. L. Zamarashkin. A theory of pseudoskeleton approximations. Linear Algebra Appl., 261:1–21, 1997.
•  L. Grasedyck, D. Kressner, and C. Tobler. A literature survey of low-rank tensor approximation techniques. GAMM-Mitt., 36(1):53–78, 2013.
•  W. Hackbusch. Elliptic differential equations, volume 18 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin, 1992. Theory and numerical treatment, Translated from the author’s revision of the 1986 German original by Regine Fadiman and Patrick D. F. Ion.
•  W. Hackbusch. Tensor Spaces and Numerical Tensor Calculus. Springer Series in Computational Mathematics. Springer Verlag, 2012.
•  M. R. Haylock, N. Hofstra A. M. G. Klein Tank, E. J. Klok, P. D. Jones, and M. New. A european daily high-resolution gridded data set of surface temperature and precipitation for 1950–2006. J. Geophys. Res, 113:D20119, 2008.
•  S. Holtz, T. Rohwedder, and R. Schneider. The alternating linear scheme for tensor optimization in the tensor train format. SIAM J. Sci. Comput., 34(2):A683–A713, 2012.
•  H. Huang and Ying S. Hierarchical low rank approximation of likelihoods for large spatial datasets. Journal of Computational and Graphical Statistics, 27(1):110–118, 2018.
•  S. De Iaco, S. Maggio, M. Palma, and D. Posa. Toward an automatic procedure for modeling multivariate space-time data. Computers & Geosciences, 41:1–11, 2011.
•  A. G. Journel and C. J. Huijbregts. Mining Geostatistics. Academic Press, New York, 1978.
•  T. Kailath and A. H. Sayed. Displacement structure: Theory and applications. SIAM Review, 37(3):297–386, 1995.
•  V. Kazeev, B. Khoromskij, and E. Tyrtyshnikov. Multilevel Toeplitz matrices generated by tensor-structured vectors and convolution with logarithmic complexity. SIAM J. Sci. Comput., 35(3):A1511–A1536, 2013.
•  V. Khoromskaia and B. N. Khoromskij. Block circulant and toeplitz structures in the linearized hartree–fock equation on finite lattices: Tensor approach. Computational Methods in Applied Mathematics, 17(3):431–455, 02 2017.
•  B. N. Khoromskij. Structured rank- decomposition of function-related operators in . Comput. Methods Appl. Math, 6(2):194–220, 2006.
•  B. N. Khoromskij. Tensor-structured numerical methods in scientific computing: Survey on recent advances. Chemom. Intell. Lab. Syst., 110(1):1–19, 2012.
•  B. N. Khoromskij. Tensor numerical methods in scientific computing. Walter de Gruyter GmbH & Co KG, 2018.
•  B. N. Khoromskij and A. Litvinenko. Data sparse computation of the karhunen‐loève expansion. AIP Conference Proceedings, 1048(1):311–314, 2008.
•  P. K. Kitanidis. Analytical expressions of conditional mean, covariance, and sample functions in geostatistics. Stoch. Hydrol. Hydraul., 12:279–294, 1996.
•  P. K. Kitanidis. Introduction to Geostatistics. Cambridge University Press, Cambridge, 1997.
•  J. B. Kollat, P. M. Reed, and J. R. Kasprzyk. A new epsilon-dominance hierarchical bayesian optimization algorithm for large multiobjective monitoring network design problems. Adv. in Water Res., 31(5):828 – 845, 2008.
•  B. Kozintsev. Computations with Gaussian random fields. PhD thesis, Institute for Systems Research, University of Maryland, 1999.
•  P. Kyriakidis and A. Journel. Geostatistical space–time models: A review. Mathematical Geology, 31:651–684, 1999. doi:10.1023/A:1007528426688.
•  A. Litvinenko. HLIBCov: Parallel Hierarchical Matrix Approximation of Large Covariance Matrices and Likelihoods with Applications in Parameter Identification. arXiv 1709.08625, Sep 2017.
•  A. Litvinenko, D. Keyes, V. Khoromskaia, B. N. Khoromskij, and H. G. Matthies. Tucker tensor analysis of matérn functions in spatial statistics. Computational Methods in Applied Mathematics, 19(1):101–122, 2019.
•  A. Litvinenko, Y. Sun, M. G. Genton, and D. E. Keyes. Likelihood approximation with hierarchical matrices for large spatial datasets. Computational Statistics Data Analysis, 137:115 – 132, 2019.
•  A. Litvinenko, Y. Sung, H. Huang, M. G. Genton, and D. E. Keyes. Github repository: daily moisture data, https://github.com/litvinen/HLIBCov.git, 2017.
•  B. Matérn. Spatial variation. Springer, Berlin, Germany, 1986.
•  G. Matheron. The Theory of Regionalized Variables and Its Applications. Ecole de Mines, Fontainebleau, France, 1971.
•  W. G. Müller. Collecting spatial data. Optimum design of experiments for random fields. Springer, Berlin, Germany, 3 edition, 2007.
•  G. N. Newsam and C. R. Dietrich. Bounds on the size of nonnegative definite circulant embeddings of positive definite Toeplitz matrices. IEEE Transactions on Information Theory, 40(4):1218–1220, 1994.
•  W. Nowak. Measures of parameter uncertainty in geostatistical estimation and geostatistical optimal design. Math. Geosciences, 42(2):199–221, 2010.
•  W. Nowak and A. Litvinenko. Kriging and spatial design accelerated by orders of magnitude: Combining low-rank covariance approximations with fft-techniques. Mathematical Geosciences, 45(4):411–435, May 2013.
•  W. Nowak, S. Tenkleve, and O. A. Cirpka. Efficient computation of linearized cross-covariance and auto-covariance matrices of interdependent quantities. Math. Geol., 35(1):53–66, 2003.
•  I. V. Oseledets. DMRG approach to fast linear algebra in the TT–format. Comput. Meth. Appl. Math., 11(3):382–393, 2011.
•  I. V. Oseledets. Tensor-train decomposition. SIAM J. Sci. Comput., 33(5):2295–2317, 2011.
•  I. V. Oseledets and E. E. Tyrtyshnikov. TT-cross approximation for multidimensional arrays. Linear Algebra Appl., 432(1):70–88, 2010.
•  G. G. S. Pegram. Spatial interpolation and mapping of rainfall (SIMAR) Vol.3: Data merging for rainfall map production. Water Research Commission Report, (1153/1/04), 2004.
•  L. Pesquer, A. Cortés, and X. Pons. Parallel ordinary kriging interpolation incorporating automatic variogram fitting. Computers & Geosciences, 37(4):464–473, 2011.
•  P. Reed, B. Minsker, and A. J. Valocchi.

Cost-effective long-term groundwater monitoring design using a genetic algorithm and global mass interpolation.

Water Resour. Res., 36(12):3731–3741, 2000.
•  A. K. Saibaba and P. K. Kitanidis. Efficient methods for large-scale linear inversion using a geostatistical approach. Water Resour. Res., 48(W05522), 2012.
•  R. Schneider and A. Uschmajew. Approximation rates for the hierarchical tensor format in periodic sobolev spaces. Journal of Complexity, 2013.
•  R. Shah and P. M. Reed.

Comparative analysis of multiobjective evolutionary algorithms for random and correlated instances of multiobjective d-dimensional knapsack problems.

European Journal of Operational Research, 211(3):466 – 479, 2011.
•  G. Spöck and J. Pilz. Spatial sampling design and covariance-robust minimax prediction based on convex design ideas. Stochastic Environmental Research and Risk Assessment, 24:463–482, 2010.
•  E. E. Tyrtyshnikov. Tensor approximations of matrices generated by asymptotically smooth functions. Sbornik: Mathematics, 194(6):941–954, 2003.
•  C. F. van Loan. Computational Frameworks for the Fast Fourier Transform. SIAM Publications, Philadelphia, PA, 1992.
•  R. S. Varga. Eigenvalues of circulant matrices. Pacific J. Math., 4:151–160, 1954.
•  J. Vondřejc, D. Liu, M. Ladecký, and H. G. Matthies. FFT-based homogenisation accelerated by low-rank approximations. arXiv e-prints, page arXiv:1902.07455, Feb 2019.
•  S. M. Wesson and G. G. S. Pegram. Radar rainfall image repair techniques. Hydrological and Earth Systems Sciences, 8(2):8220–8234, 2004.
•  D. L. Zimmerman. Computationally exploitable structure of covariance matrices and generalized covariance matrices in spatial models. J. Stat. Comput. Sim., 32(1/2):1–15, 1989.