1 Introduction
Multivariate function approximation, used to construct a surrogate model for an underlying physical model that is expensive to evaluate, is an important topic: it has applications in optimization, reduced-order modeling, uncertainty quantification, sensitivity analysis, and many other areas. The first goal of this paper is to provide a multivariate approximation of a function of the form
(1) 
where is a hypercube, are basis functions, and are the coefficients. Such a decomposition is described as a summation of separable functions, and the existence of such an approximation depends on the properties of (such as regularity). If the functions are analytic, then the error in the best approximation decays geometrically with the degree of the polynomial trefethen2017multivariate; gass2018chebyshev; the decay is algebraic if the function is in a Sobolev space griebel2019analysis.
This paper proposes a technique for multivariate function approximation, using a combination of Chebyshev polynomial interpolation (based on function evaluations at Chebyshev interpolation nodes) and compression of the tensor (i.e., multiway array) that contains the function evaluations. We use the Tucker format for the tensor to achieve compression. Since it is a natural fit for the application at hand, we can leverage prior work on randomized low-rank tensor approximations
minster2020randomized, and we can exploit the structure of the Tucker format to our computational advantage, resulting in new algorithms. Similar functional tensor approximations have been used for trivariate functions (i.e., for ) dolgov2021functional; hashemi2017chebfun, but our approach is not limited to three dimensions. The authors in rai2019randomized also use a functional Tucker approximation, but use sparsity-promoting techniques to reduce the storage cost of the core tensor. Several authors have constructed multivariate tensor approximations using the Tensor Train format rather than the Tucker format gorodetsky2019continuous; bigoni2016spectral; however, a detailed comparison between the Tucker and Tensor Train formats is beyond the scope of this paper. In addition to tensor-based methods, there is a technique for multivariate function approximation that uses sparse grid interpolation bungartz_griebel_2004, but it makes slightly different assumptions on the regularity of the functions to be approximated. While there are alternatives to the Tucker format for function approximation, to the best of our knowledge, none of the tensor formats have been explored in the context of kernel approximations, which is the second goal of this paper. For this application, using the Tucker format has several advantages, which we exploit throughout this paper.

Kernel methods are used to model pairwise interactions among a set of points. They are popular in many applications in numerical analysis and scientific computing, such as solvers for integral equations, radial basis function interpolation, Gaussian processes, machine learning, geostatistics and spatiotemporal statistics, and Bayesian inverse problems. A major challenge in working with kernel methods is that the resulting kernel matrices are dense, and are difficult to store and compute with when the number of interaction points is large.
Many fast kernel summation methods have been developed over the years, including the Barnes-Hut algorithm barnes1986hierarchical, the Fast Multipole Method greengard1987fast, the Fast Gauss Transform greengard1991fast, etc.

Operations with kernel matrices, which are used to model pairwise interactions, can be accelerated by using low-rank matrix approximations. Over the years, there have been many efforts to compute efficient low-rank approximations in the context of kernel methods cambier2019fast; xu2018low; chen2018fast; ying2004kernel; ye2020analytical; bebendorf2009recompression. There are two approaches to generating such approximations. In the first approach, the entire matrix is approximated by a global low-rank matrix (e.g., using the Nyström approach). In the second approach, the kernel matrix is represented as a hierarchy of sub-blocks, and certain off-diagonal sub-blocks are then approximated in a low-rank format (e.g., hackbusch1999sparse; hackbusch2002data; grasedyck2003construction; borm2003hierarchical; hackbusch2015hierarchical
). The sub-blocks representing the interactions between these sets of points can be compressed using the singular value decomposition (SVD). However, the cost of computing the SVD can be prohibitively expensive when the number of source and target points is large and/or when the low-rank approximations need to be computed repeatedly over several sets of source and target points. Our goal is to avoid these expensive calculations when computing our approximations.
Our approach builds on the black-box fast multipole method (BBFMM) low-rank approximation approach in Fong and Darve fong2009black, which uses Chebyshev interpolation to approximate the kernel interactions. In this method, instead of computing the kernel interactions between every pair of given points, the interactions are only computed between points on Chebyshev grids. This significantly reduces the number of kernel evaluations needed. This work is also similar to the semi-analytic methods in cambier2019fast; xu2018low, which also use some form of Chebyshev interpolation to reduce the number of kernel interactions. Our work differs in the subsequent low-rank approximation techniques, as we use tensor-based methods instead of the matrix techniques presented in these papers.
Contributions and Contents.
This paper provides new algorithms for multivariate interpolation with an application to lowrank kernel matrix approximations. We summarize the main contributions in our paper:

We propose a method for multivariate function interpolation that combines Chebyshev polynomials and tensor compression techniques via the Tucker format. We analyze the error in the approximation, which consists of the error due to interpolation and the error due to tensor compression. The resulting function approximations can be used as surrogates in a variety of applications including, but not limited to, uncertainty quantification and reduced order modeling.

We develop three novel randomized compression algorithms for computing low-multirank approximations to tensors in the Tucker format. The first algorithm employs a randomized interpolatory decomposition for each mode unfolding; we derive error bounds, in expectation, for this compression technique. The second algorithm uses the first approach but on a subsampled tensor, and requires fewer function evaluations. The third algorithm is similar to the first but uses random matrices formed from Kronecker products, and requires less storage and fewer random numbers to be generated. We also provide a detailed analysis of the computational costs. These algorithms are of independent interest for tensor compression beyond computing functional approximations.

We use the multivariate function approximations to develop low-rank approximations to kernel matrices, which represent pairwise interactions. The resulting approximations are accurate, as demonstrated by detailed numerical experiments, and the computational costs are linear in the number of source and target points. To our knowledge, this is the first use of tensor-based function approximations for low-rank kernel matrix approximations.
We conclude this section with a brief organizational overview of the paper. In Section 2, we review material on Chebyshev interpolation, tensor compression, and randomized matrix algorithms. In Section 3, we propose a tensor-based method for approximating multivariate functions, which combines multivariate Chebyshev interpolation with randomized compression algorithms. We develop three randomized algorithms for compressing low-rank tensors and provide insight into the interpolation error. In Section 4, we apply the tensor-based algorithms developed in Section 3 to develop low-rank approximations to kernel matrices. We provide a detailed analysis of the computational cost of the resulting algorithms. In Section 5, we perform three sets of numerical experiments: the first highlights the performance of the tensor-based function approximation, the second demonstrates the accuracy of the low-rank approximations to kernel matrices with several different kernels, and the third is an application to Gaussian processes involving low-rank kernel matrices.

2 Background
In this section, we provide some background on Chebyshev interpolation as well as tensor notation and decompositions. We also discuss the randomized matrix algorithms that will be important components in our algorithms.
2.1 Chebyshev interpolation
We begin by reviewing how to interpolate a function : using Chebyshev interpolation. The first step is to construct the Chebyshev points of the first kind in the interval ; these are given by for . Using the mapping : defined as
(2) 
we obtain the Chebyshev points in the interval as for . This mapping is invertible, with the inverse . Recall that for a function with , the Chebyshev interpolation polynomial of degree is
where is the Chebyshev polynomial of degree , and are the coefficients of the Chebyshev polynomials. The coefficients can be obtained using the interpolating conditions and the discrete orthogonality of the Chebyshev polynomials. If we define an interpolating polynomial as
(3) 
then the polynomial can also be expressed as
(4) 
The idea of interpolation using Chebyshev polynomials can be extended to multivariate functions, as we explore in Section 3.1. The accuracy of Chebyshev interpolation has been studied extensively; see, for example, mason2002chebyshev; trefethen2013approximation.
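To make the construction above concrete, the following Python sketch (using NumPy; the function and variable names are our own, not the paper's) builds the Chebyshev points of the first kind on an interval and the corresponding interpolant, with coefficients computed from the discrete orthogonality relations:

```python
import numpy as np

def cheb_points(n, a=-1.0, b=1.0):
    """Chebyshev points of the first kind, mapped from [-1, 1] to [a, b]."""
    k = np.arange(n)
    x = np.cos((2 * k + 1) * np.pi / (2 * n))  # roots of T_n on [-1, 1]
    return 0.5 * (a + b) + 0.5 * (b - a) * x

def cheb_interp(f, n, a=-1.0, b=1.0):
    """Degree-(n-1) Chebyshev interpolant of f on [a, b]."""
    theta = (2 * np.arange(n) + 1) * np.pi / (2 * n)
    fx = f(cheb_points(n, a, b))
    # coefficients via discrete orthogonality: T_j(x_k) = cos(j * theta_k)
    T = np.cos(np.outer(np.arange(n), theta))
    c = (2.0 / n) * (T @ fx)
    c[0] *= 0.5
    def p(t):
        s = (2.0 * np.asarray(t, dtype=float) - (a + b)) / (b - a)  # map to [-1, 1]
        return c @ np.cos(np.outer(np.arange(n), np.arccos(np.clip(s, -1.0, 1.0))))
    return p

p = cheb_interp(np.exp, 12)
xs = np.linspace(-1.0, 1.0, 7)
err = np.max(np.abs(p(xs) - np.exp(xs)))
```

With only a dozen nodes, the interpolant of a smooth function such as the exponential is already accurate to near machine precision, illustrating the geometric convergence for analytic functions discussed above.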
2.2 Tensor Preliminaries
We now introduce notation and background information for handling tensors. See kolda2009tensor for more details. A mode tensor is denoted as with entries for , .
Matricization
We can reorder the elements of a tensor to “unfold” it into a matrix; this process is known as matricization or mode unfolding. A mode tensor (also called a th order tensor) can be unfolded into a matrix in different ways. For mode , the mode fibers of the tensor are arranged to become the columns of a matrix. (A mode fiber of a tensor is obtained by holding all indices except the th fixed; for example, the mode-1 fibers of a 3rd order tensor would be denoted .) This matrix is called the mode unfolding, and is denoted by . We also denote the unfolding of modes through as .
Tensor product
One fundamental operation with tensors is the modewise product, a way of multiplying a tensor by a matrix. For a matrix , the mode product of with is denoted . The entries of this product are
Note that we can also express the mode product as the product of two matrices, as .
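As a small illustration of this identity (the mode product expressed as an ordinary matrix product with a mode unfolding), the following NumPy sketch uses one common unfolding convention; the helper names are ours, not the paper's:

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: mode-n fibers become the columns of a matrix."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold for a target tensor of the given shape."""
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape(full), 0, n)

def mode_product(X, A, n):
    """Mode-n product X x_n A, computed as A @ X_(n) and refolded."""
    shape = list(X.shape)
    shape[n] = A.shape[0]
    return fold(A @ unfold(X, n), n, tuple(shape))

X = np.random.default_rng(0).standard_normal((3, 4, 5))
A = np.random.default_rng(1).standard_normal((2, 4))
Y = mode_product(X, A, 1)

# same result via a direct contraction of A against mode 1 of X
Z = np.einsum('jb,abc->ajc', A, X)
ok = np.allclose(Y, Z)
```

The exact column ordering of the unfolding is a convention; any ordering works as long as `fold` inverts `unfold`.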
Tucker representation
The Tucker form of an mode tensor , given a target rank , consists of a core tensor and factor matrices , with , such that . For short, we denote this form as .
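A minimal sketch of the Tucker format in NumPy (dimensions chosen arbitrarily for illustration): the tensor is reconstructed from the core and the factor matrices by mode products, written here as a single einsum contraction.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tucker format: core G of size 2 x 3 x 2, factor matrices U_k of size n_k x r_k
G = rng.standard_normal((2, 3, 2))
U1 = rng.standard_normal((6, 2))
U2 = rng.standard_normal((7, 3))
U3 = rng.standard_normal((5, 2))

# reconstruct X = G x_1 U1 x_2 U2 x_3 U3 as a single contraction
X = np.einsum('pqr,ip,jq,kr->ijk', G, U1, U2, U3)

# storage: the Tucker factors keep 55 entries here, versus 210 for the full tensor
tucker_entries = G.size + U1.size + U2.size + U3.size
```

The storage savings grow rapidly with the tensor dimensions when the multirank is small relative to the mode sizes.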
A popular algorithm for computing a Tucker representation of a tensor is the Higher-order SVD (HOSVD) de2000multilinear. In this algorithm, each mode of the tensor is processed separately and independently. For mode , the factor matrix is computed by taking the first left singular vectors of , where is the target rank. Then, the core tensor is formed as . The tensor compression algorithms we will present in Section 3 are related to the HOSVD algorithm. The HOSVD can be computationally expensive for large-scale problems, so we propose new randomized algorithms in Section 3, extending some of the methods proposed in minster2020randomized.

2.3 Randomized Matrix Algorithms
In this section, we review some of the randomized algorithms (for matrices) that will be used in our later algorithms. The first of these is the randomized range finder, popularized in halko2011finding. Given a matrix , this algorithm finds a matrix that estimates the range of , i.e., . This is accomplished by drawing a Gaussian random matrix , where is the desired target rank and is an oversampling parameter. We then form the product , which consists of random linear combinations of the columns of . We next compute a thin QR factorization of the result, , to obtain the matrix such that is a good approximation to . Taken together, this process gives the approximation . See steps 1-3 in Algorithm 1 for the details.

The randomized range finder can be combined with subset selection (specifically, pivoted QR) to form what we call the randomized row interpolatory decomposition (RRID). This algorithm, similar to the randomized SVD with row extraction (halko2011finding, Algorithm 5.2), produces a low-rank approximation to a matrix that exactly reproduces rows of . After the matrix is computed, we use pivoted QR on to pick out indices of well-conditioned rows of ; denote this index set by . We then form the matrix , which gives the low-rank approximation to as . In practice, we use the Businger-Golub pivoting algorithm (golub2013matrix, Algorithm 5.4.1), but in Theorem 3.1 we use the strong rank-revealing QR factorization gu1996efficient, since among the rank-revealing QR algorithms it has the best known theoretical guarantees. The details are presented in Algorithm 1, which will be used later in our tensor compression algorithms.
3 Tensorbased functional approximation
In this section, we present our tensor-based approach to approximating functions. First, we explain our approach in general, which combines multivariate Chebyshev interpolation and tensor compression techniques to approximate a function . We then propose three tensor compression methods that fit within our framework and describe the natural application of our approach to kernel approximations. Finally, we analyze both the computational cost and the accuracy of our approach.
3.1 Multivariate functional interpolation
Let us denote the Cartesian product of the intervals as
and let be a function that we want to approximate using Chebyshev interpolation. We can expand the function using Chebyshev polynomials as
where are the Chebyshev points of the first kind in the interval for and the interpolating polynomials have been defined in (3). Here we have assumed the same number of Chebyshev points in each dimension for ease of discussion, but this is straightforward to generalize. To match notation with (1), we have and .
For a fixed , we can express this in tensor form as
where the entries of are and
For a new point , we need only compute the factor matrices . The core tensor need not be recomputed. However, storing the core tensor can be expensive, especially in high dimensions, since the storage cost is entries. Therefore, we propose to work with a compressed representation of the core tensor . In (dolgov2021functional, Section 2.5), the authors argue that for certain functions, the degree of the multivariate polynomial required for approximating a function to a specified tolerance can be larger than the multilinear rank (in each dimension). For these functions, this justifies a compression of the tensor, which has a smaller memory footprint but is nearly as accurate as the multivariate polynomial interpolant.
Using the Tucker format, we can approximate where and for . Then,
where for with entries
The compressed Tucker representation can be obtained using any appropriate tensor compression technique (e.g., HOSVD, sequentially truncated HOSVD). However, computing this low-rank approximation using deterministic techniques is computationally expensive and requires many function evaluations. We propose randomized approaches to computing low-multirank Tucker approximations, in order to reduce both the computational costs and the number of function evaluations associated with non-randomized approaches.
3.2 Tensor compression techniques
We propose three new tensor compression methods in the context of compressing the core tensor defined above. The first uses the Randomized Row Interpolatory Decomposition (RRID), i.e., Algorithm 1, on each mode unfolding of the tensor to obtain a low-rank approximation for each mode; we also provide an analysis for this method. The second tensor compression method is similar to the first, but works with a subsampled tensor instead, thus reducing the computational cost. Our third method uses a Kronecker product of Gaussian random matrices instead of a single Gaussian random matrix while compressing , which reduces the number of random entries needed. All three methods produce structure-preserving approximations in the Tucker format, as described in minster2020randomized; this means that the core has entries from the original tensor and the decomposition (in exact arithmetic) reproduces certain entries of the tensor exactly.
3.2.1 Method 1: Randomized Interpolatory Tensor Decomposition
The first compression algorithm we present uses a variation of the structure-preserving randomized algorithm that we previously proposed in (minster2020randomized, Algorithm 5.1) to compress the core tensor . In mode 1, we apply Algorithm 1 with target rank and oversampling parameter to the mode-1 unfolding to obtain the low-rank approximation

Here has columns from the identity matrix corresponding to the index set . This process is repeated across each mode to obtain the other factor matrices and the index sets . Finally, we have the low-multirank Tucker representation

In contrast to (minster2020randomized, Algorithm 5.1), in which we used sequential truncation, our low-rank approach can be described as a structure-preserving HOSVD algorithm. To analyze the computational cost of this algorithm, assume that the target tensor rank is . First, we discuss the cost of tensor compression. We apply the RRID algorithm to each mode unfolding of size , which costs flops per mode. The total cost is flops.
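The following NumPy/SciPy sketch illustrates the idea of Method 1 (all names are ours, and we use ordinary pivoted QR in place of strong rank-revealing QR): an RRID is applied to each mode unfolding independently, and the core is a subtensor of original entries selected by the index sets.

```python
import numpy as np
from scipy.linalg import qr

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def rrid_factor(Xn, r, p, rng):
    """Factor W and index set J with X_(n) ~= W @ X_(n)[J, :]."""
    l = r + p
    Omega = rng.standard_normal((Xn.shape[1], l))
    Q, _ = np.linalg.qr(Xn @ Omega)
    _, _, piv = qr(Q.T, pivoting=True, mode='economic')
    J = piv[:l]
    return Q @ np.linalg.inv(Q[J, :]), J

def method1(X, r, p=5, seed=0):
    """Structure-preserving sketch: one RRID per mode; core = X[J1, J2, J3]."""
    rng = np.random.default_rng(seed)
    Ws, Js = [], []
    for n in range(X.ndim):
        W, J = rrid_factor(unfold(X, n), r, p, rng)
        Ws.append(W); Js.append(J)
    core = X[np.ix_(*Js)]                 # core holds entries of the original tensor
    return core, Ws

rng = np.random.default_rng(1)
# test tensor with exact multirank (4, 4, 4)
G = rng.standard_normal((4, 4, 4))
U = [rng.standard_normal((20, 4)) for _ in range(3)]
X = np.einsum('pqr,ip,jq,kr->ijk', G, *U)

core, Ws = method1(X, r=4, p=3)
Xhat = np.einsum('pqr,ip,jq,kr->ijk', core, *Ws)
relerr = np.linalg.norm(Xhat - X) / np.linalg.norm(X)
```

For a tensor of exact multirank at most the target rank, the reconstruction is exact up to roundoff, consistent with the structure-preserving property.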
Error Analysis
We now provide a result on the expected error from compressing a tensor using Algorithm 2. Note that this theorem assumes we are using strong rank-revealing QR (sRRQR) in the RRID algorithm instead of pivoted QR. This proof is very similar to the proof of (minster2020randomized, Theorem 5.1), so we leave it to Appendix A.
Theorem 3.1
Let be the rank approximation to the mode tensor which is the output of Algorithm 2. We use an oversampling parameter such that , and the strong rank-revealing QR factorization (gu1996efficient, Algorithm 4) with parameter . Then the expected approximation error is
where .
The interpretation of this theorem is that if the singular values of the mode unfoldings decay rapidly beyond the target rank, then the low-rank tensor approximation is accurate. See (minster2020randomized, Section 5.2) for further discussion.

3.2.2 Method 2: Randomized Interpolatory Tensor Decomposition with Block Selection
Our second compression algorithm is a variant of the previous approach that reduces the number of function evaluations and the computational cost by working with a subsampled tensor rather than the entire tensor. Recall that we have assumed that the number of Chebyshev grid points in each spatial dimension is the same. Suppose we are given an index set with cardinality representing the subsampled fibers. In the first step of the algorithm, we process mode . We consider the subsampled tensor and apply the RRID algorithm to obtain the matrix and the index set . We then have a low-rank approximation to as

Here has columns from the identity matrix corresponding to the index set . This process is repeated across each mode to obtain the other factor matrices and the index sets . Finally, we have the low-rank Tucker representation
All the steps of this second variation are summarized in Algorithm 3.
Compared with Method 1, in each mode we work with a subsampled version of the mode unfolding rather than the entire mode unfolding. In other words, by setting we can recover Method 1. To subsample, we use the nested property of the Chebyshev nodes xu2016chebyshev. More precisely, let the Chebyshev nodes
Then ; i.e., the th point is the th point . Therefore, assuming that is a multiple of , to subsample by 1 level we take . To subsample more aggressively, by levels, we take the indices to be and .
Now consider the computational cost. At the first step, the unfolded matrix has dimensions . Applying RRID for each mode requires flops; the total cost for the entire tensor decomposition is therefore . The number of function evaluations is ; in practice, this is much smaller than since we choose .
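The nesting property used for the subsampling can be checked numerically: for Chebyshev points of the first kind, tripling the number of nodes reproduces the coarse grid at every third index (starting from the second). A small sketch in our own notation:

```python
import numpy as np

def cheb_pts(n):
    k = np.arange(n)
    return np.cos((2 * k + 1) * np.pi / (2 * n))

# Chebyshev points of the first kind are nested under tripling:
# the (3k+1)-th point of the 3n-point grid is the k-th point of the n-point grid
n = 9
coarse = cheb_pts(n)
fine = cheb_pts(3 * n)
nested = np.allclose(fine[1::3], coarse)

# one level of subsampling: keep every third index, starting from index 1
idx = np.arange(1, 3 * n, 3)
same = np.array_equal(fine[idx], fine[1::3])
```

This is why increasing the number of nodes by factors of three lets the algorithm reuse previously computed function values.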
3.2.3 Method 3: Randomized Kronecker Product
In the third compression algorithm, we essentially compute an RRID along each mode for the tensor compression step; the main difference is that the random matrix is generated as the Kronecker product of Gaussian random matrices. More specifically, consider independent standard Gaussian random matrices . Here is the target tensor rank and is the oversampling parameter. In mode 1, we form the tensor
and compute the left singular vectors of the mode-1 unfolding to obtain the factor matrix . We call this the Randomized Kronecker Product approach since, using the definition of mode products,
To obtain the factor matrices , we follow a similar procedure along modes through . Finally, to obtain the core tensor we compute . The rest of the algorithm resembles Method 1 and is detailed in Algorithm 4. The asymptotic cost of Method 3 is the same as that of Method 1. However, a potential advantage over Method 1 is the reduction in the number of random entries generated: in Method 1, using an RRID along each mode, we need to generate random numbers, whereas in Method 3 we only need to generate random numbers.
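The equivalence between sketching with mode products and multiplying an unfolding by a Kronecker-product test matrix can be verified directly. A NumPy sketch (our own notation; with the C-order unfolding used here, the Kronecker factors appear in mode order):

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 7, 8))
l = 4                                # sketch size (target rank plus oversampling)
G2 = rng.standard_normal((l, 7))     # small Gaussian matrices for modes 2 and 3
G3 = rng.standard_normal((l, 8))

# sketch mode 1 by contracting the other modes against Gaussian matrices
Y = np.einsum('ijk,pj,qk->ipq', X, G2, G3)

# equivalent matrix view: X_(1) times the transpose of a Kronecker test matrix
Z = unfold(X, 0) @ np.kron(G2, G3).T
ok = np.allclose(unfold(Y, 0), Z)

# random entries generated: l*7 + l*8 for the Kronecker construction,
# versus (7*8)*l**2 for an unstructured Gaussian test matrix of the same size
```

The savings in random numbers (and in storage for the test matrix) grow with the number of modes.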
3.3 Error analysis
We now analyze the accuracy of the proposed algorithmic framework. Since there are several approximations (interpolation error, and errors due to tensor compression) involved, it is difficult to give sharp estimates of the error, but we aim to provide some insight into the error analysis.
We first want to understand the effect of the tensor compression on the overall approximation error. To this end, we use the triangle inequality to bound
The first term in the error is due to Chebyshev polynomial interpolation and can be bounded using classical results from multivariate approximation theory. Assuming that the function satisfies the assumptions in (gass2018chebyshev, Theorem 2.2), then by Corollary 2.3 of that same paper, , where and are positive constants with . To bound the second term, we use the Cauchy-Schwarz inequality to obtain
We then use the vector inequality . The terms can also be bounded using results from interpolation: since they coincide with the Lagrange basis functions defined at the appropriate interpolation nodes, we can bound
where is the Lebesgue constant for interpolation at the nodes . For Chebyshev interpolation points, as in our application, it is well known (see (trefethen2013approximation, Chapter 15)) that for the Chebyshev points for . This gives , from which we can derive the error bound
The important point to note is that there are two sources of error: the error due to Chebyshev interpolation, which decreases geometrically with an increasing number of Chebyshev nodes, and the error due to tensor compression, , which is amplified by the term . In our numerical experiments, we observed that the error due to tensor compression was sufficiently small, so the amplification factor did not appear to have deleterious effects.
It is desirable to have an adaptive approach that produces an approximation for a given accuracy; we briefly indicate a few ideas for developing one, but we do not pursue the adaptive case further here and leave it for future work. If the number of Chebyshev points is not known in advance, an adaptive approach can be derived that combines the methods proposed here with the computational phases described in hashemi2017chebfun; dolgov2021functional. Since the Chebyshev points of the first kind are nested, we can start with a small value of (say ) and increase it by multiples of until a desired accuracy is achieved (by evaluating the accuracy at a small number of random points); because the points are nested, this avoids recomputing a fraction of the function evaluations. If the target rank is not known in advance, we can use the techniques described in (minster2020randomized, Section 4) to estimate the target rank such that the low-rank decomposition satisfies a relative error bound involving a user-specified tolerance.
4 Application: Low-rank Kernel approximations
In this section, we use the functional tensor approximations developed in Section 3 to compute low-rank approximations to kernel matrices.
4.1 Problem Setup and Notation
Rank-structured matrices are a class of dense matrices that are useful for efficiently representing kernel matrices. In this approach, the kernel matrix is represented as a hierarchy of sub-blocks, and certain off-diagonal sub-blocks are approximated in a low-rank format. There are several different types of hierarchical formats for rank-structured matrices, including matrices, matrices, hierarchically semiseparable (HSS) matrices, and hierarchically off-diagonal low-rank (HODLR) matrices. For more details on hierarchical matrices, see hackbusch1999sparse; hackbusch2002data; grasedyck2003construction; borm2003hierarchical; hackbusch2015hierarchical; sauter2010boundary. If the number of interaction points is , then the matrix can be stored approximately using entries rather than entries, and the cost of forming matrix-vector products (matvecs) is flops, where are nonnegative integers depending on the format and the algorithm for computing the rank-structured matrix. The dominant cost in working with rank-structured matrices is computing the low-rank approximations corresponding to the off-diagonal blocks, which we address here using novel tensor-based methods.
Before showing how to compute these off-diagonal blocks efficiently, we define the problem setup that we will use in the rest of this paper. Let and be and source and target points, respectively. These points are assumed to be enclosed by two boxes, , where is the dimension, defined as
(5)  
such that and . These bounding boxes are plotted in Figure 1 for visualization in two spatial dimensions (i.e., ).
Then, for the kernel , let the interaction matrix be defined as
(6) 
We can represent blocks in low-rank form only if those blocks are admissible. Note that the diameters of the bounding boxes and are
(7) 
We define the distance between these blocks as the minimum distance between any two points from and . Specifically,
(8) 
The domains and are considered strongly admissible if they satisfy
(9) 
where is the admissibility parameter. That is, two sets of points are strongly admissible if they are sufficiently well-separated. Two blocks are weakly admissible if there is no overlap between them. Typically, we will assume that the blocks and are either strongly or weakly admissible. However, in Section 5.3 we discuss how to compute a global low-rank approximation assuming only that and that the kernel is sufficiently smooth (and that the blocks are not admissible).
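As a hedged sketch of these geometric quantities, the Python code below computes box diameters, the minimum distance between two axis-aligned boxes, and a strong-admissibility check. The specific comparison max(diam) <= eta * dist is one common variant of such conditions and is an assumption on our part; the precise form of condition (9) may differ.

```python
import numpy as np

def diam(lo, hi):
    """Diameter of the axis-aligned box [lo_1, hi_1] x ... x [lo_d, hi_d]."""
    return np.linalg.norm(np.asarray(hi, float) - np.asarray(lo, float))

def dist(lo1, hi1, lo2, hi2):
    """Minimum distance between two axis-aligned boxes (0 if they overlap)."""
    lo1, hi1 = np.asarray(lo1, float), np.asarray(hi1, float)
    lo2, hi2 = np.asarray(lo2, float), np.asarray(hi2, float)
    gap = np.maximum(lo2 - hi1, 0.0) + np.maximum(lo1 - hi2, 0.0)
    return np.linalg.norm(gap)

def strongly_admissible(lo1, hi1, lo2, hi2, eta=1.0):
    """Assumed form of the condition: max box diameter <= eta * distance."""
    return max(diam(lo1, hi1), diam(lo2, hi2)) <= eta * dist(lo1, hi1, lo2, hi2)

# two unit squares: separated by a gap of 2 (admissible), or by only 0.5 (not)
far = strongly_admissible([0, 0], [1, 1], [3, 0], [4, 1], eta=1.0)
near = strongly_admissible([0, 0], [1, 1], [1.5, 0], [2.5, 1], eta=1.0)
```

For the first pair, each diameter is sqrt(2) and the distance is 2, so the condition holds; for the second, the distance is only 0.5 and it fails.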
Given points and , we seek to compute the pairwise kernel interactions between the sets of points, i.e., . This is expensive from both a computational and a storage perspective: if we simply compute the kernel interaction for each pair of points, this requires kernel evaluations, and the cost of storage is . As and can be quite large, computing can become computationally challenging. Assuming that the points are well-separated, so that the kernel interactions between the sources and targets can be treated as a smooth function, the matrix can be approximated in a low-rank format. However, computing this low-rank approximation is itself computationally expensive. Our goal, therefore, is to compute an approximation to with a cost that is linear in and , which we accomplish using multivariate Chebyshev interpolation.
4.2 Low-rank approximation of
Suppose we have the boxes , denoting the bounding boxes for the sources and the targets, respectively, that are well-separated so that the kernel function can be viewed as a smooth function of the form
That is, we identify and for . Similarly, and for . This allows us to approximate the function , and, therefore, the kernel by a multivariate Chebyshev polynomial approximation
following the notation in Subsection 3.1 for . We can use the multidimensional Chebyshev approximation to obtain a low-rank representation for as follows.
Low-rank approximation
We are now given the set of points and . Let us define the sequence of matrices and such that
for . Let denote the row-wise Khatri-Rao product. Using these sequences, we define the factor matrices to be the row-wise Khatri-Rao products of the matrices and in reverse order
This gives the low-rank approximation (see Figure 3 for a visual representation)
(10) 
where is an unfolding of the tensor . The storage cost of this representation is , assuming and are stored in terms of the factors , and are not formed explicitly.
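To illustrate the construction, here is a hedged NumPy sketch for two spatial dimensions with a 1/r kernel (all names, the specific kernel, and the box geometry are our own choices): the tensor of kernel values on the Chebyshev grids is unfolded into an n^2-by-n^2 matrix, and the factor matrices are row-wise Khatri-Rao products of one-dimensional Chebyshev basis evaluations at the source and target points.

```python
import numpy as np

def cheb_pts(n, a, b):
    k = np.arange(n)
    x = np.cos((2 * k + 1) * np.pi / (2 * n))
    return 0.5 * (a + b) + 0.5 * (b - a) * x

def basis_matrix(pts, nodes):
    """S[i, m] = S_m(pts[i]): barycentric Lagrange basis at Chebyshev nodes."""
    n = len(nodes)
    w = (-1.0) ** np.arange(n) * np.sin((2 * np.arange(n) + 1) * np.pi / (2 * n))
    d = pts[:, None] - nodes[None, :]
    d[d == 0] = 1e-300                       # guard against division by zero
    t = w / d
    return t / t.sum(axis=1, keepdims=True)

def rowwise_khatri_rao(A, B):
    """Row i of the result is the Kronecker product of row i of A and row i of B."""
    return (A[:, :, None] * B[:, None, :]).reshape(A.shape[0], -1)

kernel = lambda dx, dy: 1.0 / np.hypot(dx, dy)   # 1/r kernel in 2-D

rng = np.random.default_rng(0)
N, M, n = 300, 250, 10
X = rng.uniform(0, 1, (N, 2))                    # sources in [0, 1]^2
Y = rng.uniform(0, 1, (M, 2)) + [3.0, 0.0]       # well-separated targets

sx = [cheb_pts(n, 0, 1), cheb_pts(n, 0, 1)]      # per-coordinate Chebyshev grids
sy = [cheb_pts(n, 3, 4), cheb_pts(n, 0, 1)]

# 4-way tensor of kernel values on the grids, unfolded to an n^2 x n^2 matrix
A, B, C, D = np.meshgrid(sx[0], sx[1], sy[0], sy[1], indexing='ij')
F = kernel(A - C, B - D).reshape(n * n, n * n)

U = rowwise_khatri_rao(basis_matrix(X[:, 0], sx[0]), basis_matrix(X[:, 1], sx[1]))
V = rowwise_khatri_rao(basis_matrix(Y[:, 0], sy[0]), basis_matrix(Y[:, 1], sy[1]))

K = kernel(X[:, 0][:, None] - Y[:, 0][None, :],
           X[:, 1][:, None] - Y[:, 1][None, :])
err = np.linalg.norm(U @ F @ V.T - K) / np.linalg.norm(K)
```

Only the n^2 x n^2 grid of kernel values is evaluated exactly, rather than all N x M pairs, and the factors U and V cost only one-dimensional basis evaluations per point.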
Compressed low-rank representation
Suppose we compress using the methods discussed in Section 3 to obtain with multirank (here ). Then we can write the kernel approximation as