1 Introduction
Identification of sufficient sampling density is an essential and ongoing challenge in data science and function modeling. For any problem context, too little data raises concerns of overfitting while too much data risks underfitting and inefficient computational pipelines. While theorems and error estimates can provide rough bounds on requisite sampling density, more often density is selected by heuristics, trial and error, or rules of thumb.
In this work, we introduce the Delaunay density diagnostic, a computational technique that can help identify the presence and scale of geometric features in a function as values of are iteratively collected. By geometric features we mean any nonlinear behavior that is not noise, and hence the presence of a feature requires sufficient sampling density to distinguish it from noise. More formally, the goal of the diagnostic is to provide a robust computation of the rate at which a piecewise linear interpolant of changes as additional sample points are incorporated in batches of userspecified size.
A simple illustration of the challenges present in this effort can be seen by examining the Griewank function on (), defined by
At a large scale, e.g. , the quadratic term is dominant and only an extremely dense sampling of the interval would be able to distinguish the cosine term from random noise. At a small scale, e.g. , the cosine term is dominant and even a very dense sampling of the interval would be insufficient to pick up the quadratic term behavior. At intermediate scales, the interplay between sampling density and feature representation is more subtle, motivating the need for a computational approach to the problem.
We first describe the Delaunay density diagnostic in the simplest context and in a manner slightly distinct from how it is actually implemented. Assume we can rapidly compute the output of a function for any . For now, we assume there is no uncertainty or significant numerical error in the output computation, as can be the case when
is given by an analytical formula or a trained neural network of mild complexity. The following must then be specified by the user:

A set of query points .

A dimensional bounding box , such that .

An initial set of sample points , drawn randomly from .

An upsampling growth factor .
In the simplest case, we choose the query points to be a regular, axisaligned lattice of points forming a cube in , for some , and to be a box with the same center as the lattice but larger diameter. The main computation is an iterative procedure:

For each query point , compute and store , the unique Delaunay piecewiselinear interpolant defined by , evaluated at .

Let . Let be the integer closest to . Randomly sample more points from and add these points to the set of samples so that .

Repeat until exceeds a specified threshold.
If some lies outside the convex hull of the at a step of the process, the Delaunay interpolant is not defined and we just store a nan value. This case can be avoided by increasing the diameter of or increasing the initial number of points in .
Observe that the values of may change at each step, since the introduction of new samples may change the sample points defining the computation of . We use this observation to compute an approximate rate of convergence of the interpolants to the true function over the region defined by the query points. Let denote the interpolated value at at the th step of the iterative procedure. For each , we compute a rate defined by
Here, denotes the mean squared difference over the index , which provides a measure of the difference between successive interploants in the procedure; the square root of the mean squared difference is a discretization of the norm. The definition of is inspired by the computation of convergence rates for finite element methods over a sequence of increasingly refined meshes [4]. We now make the following claims, which will be supported by numerical and analytical results later in the paper:
Claim 1
The mean squared difference rate will be approximately if and only if the set used to define contains sufficient information to reconstruct the geometric features present in over the set of query points .
Claim 2
The mean squared difference rate will be approximately if and only if the set used to define cannot distinguish geometric features from random noise over the set of query points .
We envision the results of these claims being relevant to the multitude of scientific machine learning problems in contemporary literature where the goal is to approximate some unknown function based on unstructured numerical data. Many such problems have inputs in
for , making visualization difficult, but not, making techniques designed for highdimensional data not necessarily applicable. In such settings, there are few computational techniques for robust identification of sufficient sampling density and hence the proposed diagnostic could aid in determining whether a given sampling procedure is insufficient, sufficient, or excessive in the context of a specific problem or applicaiton. We note, however, that sampling density is important throughout computational mathematics and the technique has no intrinsic relation to machine learning.
2 Context in the literature
Techniques for fitting a function to unstructured numerical data has been studied in multiple disciplines for many decades. Some prominent examples include the Kriging interpolation method [25, 26, 27]
[5, 6], response surface methodology [15], and generalized linear models [21, 19]. These methods were developed to serve specific needs in engineering and statistics communities but have trouble scaling to the size and dimension of modern datasets.Perhaps the most closely related research area—and in fact the inspiration for this work—is the notion of data oscillation in finite element methods (FEM); see e.g. [20]. In FEM, the goal is to approximate the solution
to a partial differential equation
by creating a piecewise polynomial with respect to a mesh of the domain. Data oscillation, , is a quantity that measures the variation of over a fixed mesh . If is large, has fine scale geometric structures that cannot be resolved by approximation with respect to , meaning a finer mesh must be used. Hence, attaining optimal rates of convergence in adaptive FEM (in which meshes are partially refined in an iterative process) depends on controlled data oscillation; control can be attained by assumption, by luck, or by computational detection and mesh refinement, often based on the classical theory of Richardson extrapolation [23, 24].The issue of functiondependent sampling density requirements is by no means restricted to the finite element world. Metrics to assess data variation with respect to sample points have been devised for applicationspecific contexts but have no standard nomenclature. These include the “grid convergence index”[7] and “index of resolution quality” [8] for computational fluid dynamics, “local feature size” for homeomorphic surface reconstruction from point clouds [2], and the “coefficient of variation” for LIDAR sampling [14]. The list could go on.
This work introduces a methodology for assessing data variation on unstructured numerical data with input dimension up to
. Our approach is not meant to generalize or replace the applicationspecific techniques and metrics described above. Rather, we aim to provide users working with high volume but relatively lowdimensional numerical datasets a means to assess data variation and sampling density in a rigorous, applicationagnostic framework. Such datasets are now ubiquitous in scientific disciplines, but analyzing function variation with respect to a mesh is often stymied by the socalled “curse of dimensionality.” For
, it quickly becomes infeasible to compute, store, or manipulate the complete mesh structure of a collection of unstructured data points.As we will demonstrate, the lack of scalability of mesh management can be circumvented for interpolation tasks if the interpolated value can be determined using only a sparse subset of an implied—but not computed—mesh data structure. Delaunay theory provides the requisite mathematical results for an implied mesh structure and the recently developed algorithm DelaunaySparse provides a practical tool for such computations.
2.1 Delaunay interpolation
Let be a multivariate function whose outputs are known at a collection of data points . Assume that is truly a
dimensional sample in the sense that it does not lie entirely in a hyperplane of
; we remark that if there was such a hyperplane, the interpolation algorithm would detect this. Then, the convex hull of , denoted , is an dimensional, flatfaced, convex region in . The Delaunay triangulation, denoted , is an unstructured mesh of consisting of simplices with vertices in that satisfy the open circumball property, which is described in the caption of Figure 1. The Delaunay triangulation exists and is unique, except for some corner cases, which can still be handled robustly.^{1}^{1}1For instance, data stored on a rectilinear lattice does not have a unique Delaunay triangulation, but in such situations, the geometry of the data is known a priori. For a randomly generated data set in, as is employed here, the probability of a unique Delaunay triangulation is 1.
The Delaunay triangulation can be used to define a unique piecewise linear interpolant, called the Delaunay interpolant, of the values . The Delaunay triangulation is widely considered to be an optimal simplicial mesh for the purposes of multivariate function interpolation [11, 22], and the Delaunay interpolant (defined below) has been studied in the context of both regression [16, 17] and classification [3] problems.
Given any point , the Delaunay interpolant at is defined as follows. Let be the simplex in with vertices , , such that . Then there exist nonnegative weights , , such that and . The value of the Delaunay interpolant at is given by
(1) 
The approximation defined by is a continuous, piecewise linear interpolant of for . In particular, if lies at the interface of multiple simplices in , the value of is not dependent on the choice of simplex used to compute it.
2.2 The DelaunaySparse algorithm
While computing a data structure for the full is not computationally feasible beyond very low dimensions, the value of only requires detection of a simplex in that contains . DelaunaySparse is a recently developed algorithm and software package that exploits this observation to provide efficient computation of the Delaunay interpolant in high dimensions at a userprovided set of input points. Algorithm 1 outlines the general strategy of DelaunaySparse; further details can be found in [9, 10].
Note that the cost to build the initial seed simplex is , where , and the cost to compute each subsequent simplex is . Therefore, the total cost of Algorithm 1 is , where
is the number of flips required. Empirically, for uniformly distributed
, tends to be a superlinear but subquadratic function of and independent of [10]. In particular, typically so that Algorithm 1 is effectively .While Algorithm 1 only allows interpolation points in the geometric sense, it can also extrapolate a value for a point outside . For this, the point is projected onto the convex hull by solving a quadratic programming problem, and the resulting projection is interpolated on the face of a simplex in via Algorithm 1. In this work, we detect and flag when a query point corresponds to geometric extrapolation for a given sample set, but intentionally set up experiments so that extrapolation does not occur.
3 Computing the Delaunay density diagnostic
We now describe in detail how the Delaunay density diagnostic is computed. As outlined in Section 1, the user specifies the following: a set of query points ; a dimensional bounding box , such that ; an initial set of sample points , drawn randomly from ; and an upsampling growth factor . In addition, a stopping criterion should be specified, in the form of a maximum number of upsampling iterations and/or a maximum size for the set of sample points. The algorithm is as follows:
Each output from Algorithm 2 is an estimate of the rate () at which piecewise linear interpolants are converging to at the query points when samples are drawn from . Since Algorithm 2 involves random sampling, we can run it multiple times with different initial random seeds to generate a distribution of outputs. A collection of outputs , , can then be used to assess the accuracy of the estimates and test sensitivity to the randomized aspect of the algorithm.
Computing rates in other norms
To compute the rate in a different norm than mean squared difference (MSD), only a few lines from Algorithm 2 need to be modified. An informative alternate rate to consider is the mean squared difference of the gradients of the interpolants, presented in Algorithm 3. We call this rate gradMSD, for clarity, but note that it is essentially a discrete version of the Sobolev seminorm , just as the square root of MSD is a discrete version of the norm.
We briefly comment on two key changes from Algorithm 2 to Algorithm 3. First, the calculation of in line 4 can be done accurately and robustly since is piecewise linear. Details of this computation are provided in Appendix A. Second, we interpret the term norm used in line 9
as the regular Euclidean norm for vectors in
, which is available as numpy.linalg.norm in Python.We have equivalent claims to creftype 1 and creftype 2 for the gradMSD rate; the only change is the expected value for in each case, namely, 1 replaces 2 and 1 replaces 0.
Claim 3
The gradMSD rate will be approximately if and only if the set used to define contains sufficient information to reconstruct the geometric features present in over the set of query points .
Claim 4
The gradMSD rate will be approximately if and only if the set used to define cannot distinguish geometric features from random noise over the set of query points .
Other norms could be considered as well. For instance, computing rates in for could be assessed by replacing MSD with “mean th power differences” and the square roots with th roots in line 9 from Algorithm 2. The rate could be similarly accomodated. Likewise, convergence in the Sobolev norm can be attained by using the norm instead of the norm for the term norm used in line 9 of Algorithm 3. We leave study of these possibilities for future work.
4 Numerical results
We have implemented and tested the feasibility of the Delaunay density diagnostic in various scenarios for data in from to . After collecting the data, we convert the number of samples at the th step, , to average sample spacing by
(2) 
where is the average side length of the bounding box .^{2}^{2}2For simplicity, we have always taken to be a dimensional cube in ; the definition of could be modified if a more complicated choice of was desired. The values of average sample spacing are used for the horizontal axes in our figures. Thus, the smallest average sample spacing in each plot corresponds to the largest value attained before the stopping criteria.
To assess the sensitivity of Algorithm 2 and Algorithm 3 to the location of points , we use a seed to initialize a random number generator. At the start of the code, we set rng=numpy.random.default_rng(globalseed), where globalseed is a userspecified integer; we then use rng to generate samples
subsequently in the code. After running the code with multiple seeds, we report the “mean rate” computed over all seeds, as well as the interquartile and interdecile range for the computed rates, thereby providing an estimate of uncertainty in the computation.
(a)  (b)  (c)  (d) 
Distinguishing features from noise
We begin by demonstrating that the rate computations can reliably distinguish features from noise for analytic functions , shown in Figure 2. For these examples, we set the query points to be a uniformly spaced grid including the corners of for (a) and (b), for (c), and for (d). The bounding box is taken to be a square centered at the origin with side length 25% longer than the query point lattice, e.g., for (a). The upsampling growth factor is 1.4641. We initialize Algorithm 2 and Algorithm 3 with and use a stopping criterion of .
When calling DelaunaySparse (Algorithm 1), we compute and pass values of for the current collection of sample points . In (a), we assign to be a random number drawn uniformly from . In (b)(d), we assign , where is the Griewank function [13] on , given by:
(3) 
The top row of Figure 2 shows a visualization of for each case, over a domain matching the region defined by the query points.
The middle and bottom rows show the MSD and gradMSD rates, respectively, as a function of average sample spacing.
Here, we have filtered out outputs from the code with , i.e. larger average sample spacings.
These outputs had larger variations in values due to the small number of points involved, which distracted from the success of the method for larger values.
We will discuss this issue further later.
Thus, in Figure 2, the average sample spacing values correspond to ,
from left to right, in each graph.
The caption of Figure 2 describes high level takeaway messages from each experiment.
Ackley function 
Effect of the upsampling factor
We next examine the effect of varying , the parameter that controls the rate at which samples are added during each iteration (line 11 from Algorithm 2). For this experiment, we fix to be the Ackley function [1], visualized in Figure 3, given by the formula
Like the Griewank function, the Ackley function is commonly used to test optimization algorithms due to its dense collection of local extrema at a fine scale. We again fix the query points to be a uniformly spaced grid including the corners of and set . We initialize each trial with and use a stopping criterion of .
We carried out 100 trials for three different values of the upsampling growth factor : 1.1, 1.21, and 1.4641, shown in Figure 3.
As in the previous example, we filter out results with very small values (in this case ) due to the wide variation in values, as evidenced by the larger error bars on the right side of each graph.
Regardless, the trend picked up by the mean rate in each experiment is quite clear.
Recall the definition of from Eq. 2.
Features of the Ackely function are recoverable for , fine scale features register as noise for , and are negligible for .
Since the period of the oscillations is 1 in each coordinate, these findings are consistent with the expected recovery of oscillations from sampling each oscillation densely (), sampling each oscillation very sparsely (), or not sampling most oscillations ().
Experiments in higher dimensions
While many Delaunay methods for computational geometry apply exclusively to data in or , the Delaunay density diagnostic algorithms have no formal restriction on input dimension . We explore practical considerations of assessing data sets with dimension 4 or higher by a series of experiments with the Griewank functions, defined in (3). While multiple parameters must be selected in order to run the code, each parameter has a clear geometric interpretation of how it affects the computational cost and accuracy of the result, as we will explain. The results of the experiments are shown in Figure 4; the parameters used are given in Table 1 and explained below.
For , 3, 4, we fix a uniformly spaced dimensional lattice of query points, centered at the origin in . For , we use and for we use . Note that can be any positive integer ( corresponds to a single query point), however, the size of will become a main driver of computational cost as increases. Letting denote the side length of the query point lattice, and the side length of the bounding box to be used (as in Eq. 2), we define
For , , we use and for we use , with the bounding box always centered at the origin. The purpose in decreasing qpdf for larger is to reduce the probability that a query point lies outside the convex hull of the samples , a case we exclude from consideration in this work, per the discussion at the end of Section 2.2.
We select as indicated in Table 1. The effect of on accuracy was discussed above and in Figure 3. We initialize each trial with for and 5000 for and set a stopping criteria of or . After collecting results, we filter out small values to produce the min and max values indicated in Table 1. From these values, we compute and then determine the number and extent of scaling needed so that will have range across all experiments.
dim  qpdf  # scales  # trials  

2  1.4641  400  0.8  3,947  173,832  4  100 
3  1.2  8,000  0.8  8,400  69,321  10  25 
4  1.3  10,000  0.6  33,423  89,109  14  25 
We use high performance computing resources at Lawrence Livermore National Laboratory to compute all of our examples. Each trial was run on a single node with 64 GB memory, consisting of 16 cores. We call the builtin “level 1 parallelism” feature of DelaunaySparse, which exploits a speedup strategy on the loop over query points; this approach temporarily stores discovered parts of the Delaunay mesh structure to accelerate subsequent interpolation queries. With these resources and parameter choices, the wall clock time per trial was 45 seconds, 8.5 minutes, and 11.1 minutes in , 3, and 4, respectively. Thus, the cumulative compute time to produce Figure 4 was approximately 5 hours, 35 hours, and 65 hours for , 3, and 4.
The reported run times indicate the feasibility of the algorithm in higher dimensions, not an actual assessment of its scalability with respect to dimension. We did not attempt to optimize the code for speed. In practice, the dimension of the data is fixed by the application context. A user would only need to adjust the other parameters—i.e., the column labels of Table 1—to ensure the error bounds met their requirements and the compute time fit within their resource capabilities.
Fixed dataset in
We now demonstrate how the algorithm can be modified to assess sampling density of static, existing datasets. At a high level, the only major change required to employ Algorithm 2 or Algorithm 3 on a static dataset is a change to the random point generation process. No bounding box is used. Instead, the index for the static dataset is randomly shuffled and the initial collection of sample points is defined to be the first points indicated by the shuffled index. Subsequent additions to are attained by including the next points according to the shuffled index. This process emulates the random selection of points, avoids drawing duplicates from the dataset, and is limited by the size of the dataset. We summarize this modification in Algorithm 4.
We apply Algorithm 4 to a dataset consisting of input–output pairs from HYDRA, a multiphysics simulation code developed at LLNL over the past twenty years that informs experiments at the National Ignition Facility [18]. In the dataset, there are 17,450 pairs where and is a quantity of interest. Physicists familiar with the problem context have indicated that the response of is primarily dependent on only five of the eight inputs. Thus, we filter out the three lessimportant input variables, reducing inputs to . The filtering process creates “nearduplicates” in the dataset, i.e., points and where but , with small enough that DelaunaySparse marks them as identical. We fix a tolerance level , then identify clusters of points within the dataset such that each point in a cluster is within of some other point in the cluster. For clusters with more than one point, we keep the mean of the points and values as a “new” data point and discard the points defining the cluster.^{3}^{3}3Implementing the averaging is a simple groupby(...).mean() operation using the pandas library. After these operations, we have reduced the dataset to 13,016 points, whose collection of inputs are sufficiently distinct for calls to DelaunaySparse.
To construct a query point lattice that lies inside the convex hull of the input points, we compute the 25th and 75th percentiles for each of the five input coordinates. These percentiles define an interval for each dimension, from which we can build a lattice centered around the mean of the dataset. We use six points per dimension, for a total of query points. The notion of qpdf does not extend directly to the static data case since there is no bounding box, however, we believe that selecting the 25th and 75th percentiles as bounds yields for this data set.
In Figure 5, we show the results of our experiments for , , , and # seeds = 20. With these options, the wall clock time was approximately 13.5 minutes per trial using the same compute resources that produced Figure 4. While the MSD rate has relatively large variation—due in part to the small value of —the gradMSD rate displays a trend similar to the Ackley function from Figure 3. In particular, the experiments suggest that the data set has sufficient density at the smallest possible value, corresponding to using all of the 13,016 points available.
Also of note, the largest value shown in Figure 5, which corresponds to , has many trials in which the computed MSD rate is significantly larger that the dashed green line for the recoverable feature rate. From other experiments, we found that this phenomenon occurs when the data collection is very close to linear (or piecewise linear, with large pieces). Such a finding is not surprising: with 211 points in , we have roughly points “per dimension” from the samples, which is insufficiently dense to reliably capture any significant features in the data. Accordingly, as increases (equivalently decreases), the features begin to be detected as noise and eventually, for the smallest value, as recovered features.
5 Analytical results
Our numerical results provide evidence for creftype 1, creftype 2, creftype 3, and creftype 4. We now sketch out theoretical support for these claims by employing orderofmagnitude analysis for the quantities appearing in Algorithm 2
We start with the case of smooth functions, where the sampling resolves all of their features (creftype 1). For such functions, the linear interpolant is a reasonable approximation up to quadratic order. In particular the difference between successive interpolants will behave as , where refers to a discrete approximation to any norm (e.g., ) and is the average sample spacing (the subscript reminds us of the iteration for which it is computed). Taking ratios of norms on successive iterations gives us
(4) 
where the last step uses the approximate relation between the number of points and the upsample rate (i.e., line 11 from Algorithm 2). Taking of the expressions in (4) gives , verifying Claim 1.
With additional notation, we can further formalize the above argument. We write the difference between the linear interpolant (at iteration ) and the true function as
(5) 
Here the point is in the dimensional simplex of the Delaunay triangulation, the discrepancy coefficients depend implicitly on the simplex containing , and the average edge length in , , has been factored out. This construction guarantees (in a suitable average sense) that the depend only weakly on the refinement iteration , .
We now write , which introduces the weight function that connects the overall average edge length in the Delaunay mesh to the local average edge length within . The weight function will be larger in regions of sparse sampling (large simplices) and smaller in regions of dense sampling (small simplices), averaging out to , in a suitable sense.
We are now ready to construct a discrete distance between successive linear interpolants using the query points ; a similar argument holds for any discrete norm. We have:
(6) 
where we have used . Since and (on average) depend only weakly on , we expect the only dependence of to come from . This can be used to more rigorously define the average sense in which and are independent of . If we further assume the term dominates the inner sum, we can conclude that
(7) 
confirming our initial estimate from Equation (4).
A similar argument can be made using the gradient of the linear interpolant (and its corresponding norm). The discrepancy between linear interpolants would scale linearly rather than quadratically. This causes the ratio of norms to be approximately , and the lograte to be , as required for creftype 3.
We now turn to the case of noisy or highlyoscillatory functions (creftype 2). With insufficient sampling, the linear interpolant will be a poor approximation to the function and will oscillate from iteration to iteration with the amplitude of the noise, which we write as :
(8) 
Taking the log of the above expression to compute the rate gives us a rate of , verifying creftype 2.
A similar argument holds for the rate of the gradient norm.
In this case, the gradient estimate gets worse with refinement, scaling as .
Thus, the ratio of successive norms will go as and of the ratio will go to , as required for creftype 4.
Analysis of upsampling factor
Finally, we comment on the trends with upsampling uncovered in Figure 3. Equation (7) deviates from due to the presence of cubic (and higher order) terms in (5). If we write this deviation as , then the rate becomes
(9) 
where we have made the approximation that the deviation is small.
From (9), we see that the deviation from the recoverable features MSDrate (i.e., ) gets worse as . This causes stronger fluctuations around the estimated rate for smaller , as seen in Figure 3. We can approximate how much additional sampling would be needed to meaningfully reduce these fluctuations. Suppose we are upsampling near , so that . Since the deviation is sourced by a cubic correction to the rate, we can estimate it as (the leading order contribution, , already involved two powers of , leaving just one power remaining in ). If we want small fluctuations around for the rate, we would need , which simplifies approximately to . So, reducing by a factor of two would require a factor of greater total sampling (halving the average sample spacing ) to get the same level of fluctuations around the target rate. Thus, our analysis confirms that fluctuations in the computed rates can be controlled by increasing , increasing , or both.
6 Conclusions and extensions
We have demonstrated in this paper how the convergence rate of iteratively refined, piecewise linear approximations to a function can be used to assess if the function has been sampled densely enough relative to its variation. Our computational technique eschews nearly any assumption on as it detects pure noise and undersampled oscillations as equivalent phenomena. Many extensions of the approach are plausible, including assessing convergence in other norms (as mentioned previously) and consideration of higher dimensional data sets, additional static data sets, discontinuous functions, functions with singularities, timedependent functions, and so forth.
To aid any interested parties in exploring these and other directions, we will soon release Python code that replicates the numerical results shown in Figure 4 for . The parameters will be adjusted so that the requisite data and figure can be generated in minutes using a typical laptop with a standard modern python environment. The code will be linked from the first author’s web page [12]. We anticipate releasing a more generalpurpose codebase in the future.
Acknowledgments
We would like to thank PeerTimo Bremer, Luc Peterson, Brian Spears, and many other colleagues at Lawrence Livermore National Laboratory for providing feedback, encouragement, and financial support for this research. We would also like to thank Tyler Chang at Argonne National Laboratory for many helpful discussions regarding the DelaunaySparse software.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE–AC52–07NA27344 and the LLNLLDRD Program under Project tracking No. 21–ERD–028. This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes. Release number LLNL–JRNL–832470.
References
 [1] D. Ackley, A connectionist machine for genetic hillclimbing, vol. 28, Kluwer Academic Publishers, 1987.
 [2] N. Amenta, S. Choi, T. K. Dey, and N. Leekha, A simple algorithm for homeomorphic surface reconstruction, in Proceedings of the sixteenth annual symposium on Computational geometry, 2000, pp. 213–222.
 [3] M. Belkin, D. J. Hsu, and P. Mitra, Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate, in Advances in Neural Information Processing Systems 31, Curran Associates, Inc., 2018, pp. 2300–2311.
 [4] S. C. Brenner and L. R. Scott, Polynomial approximation theory in Sobolev spaces, in The Mathematical Theory of Finite Element Methods, Springer New York, New York, NY, 1994, pp. 91–122.
 [5] M. D. Buhmann, Radial basis functions, Acta numerica, 9 (2000), pp. 1–38.
 [6] M. D. Buhmann, Radial basis functions: Theory and implementations, vol. 12, Cambridge University Press, 2003.
 [7] I. Celik and O. Karatekin, Numerical experiments on application of Richardson extrapolation with nonuniform grids, ASME Journal on Fluids Engineering, 119 (1997), pp. 584–590.
 [8] I. B. Celik, Z. N. Cehreli, and I. Yavuz, Index of resolution quality for large eddy simulations, Journal of Fluids Engineering, (2005), pp. 949–958.
 [9] T. H. Chang, L. T. Watson, T. C. Lux, A. R. Butt, K. W. Cameron, and Y. Hong, Algorithm 1012: DELAUNAYSPARSE: Interpolation via a sparse subset of the Delaunay triangulation in medium to high dimensions, ACM Transactions on Mathematical Software (TOMS), 46 (2020), pp. 1–20.
 [10] T. H. Chang, L. T. Watson, T. C. H. Lux, B. Li, L. Xu, A. R. Butt, K. W. Cameron, and Y. Hong, A polynomial time algorithm for multivariate interpolation in arbitrary dimension via the Delaunay triangulation, in Proceedings of the ACMSE 2018 Conference, Richmond, Kentucky, 2018, ACM, pp. 12:1–12:8.
 [11] L. Chen and J. Xu, Optimal Delaunay triangulations, Journal of Computational Mathematics, 18 (2004), pp. 299–308.
 [12] A. Gillette, github.io webpage. https://akgillette.github.io/, 2022.
 [13] A. O. Griewank, Generalized descent for global optimization, Journal of optimization theory and applications, 34 (1981), pp. 11–39.
 [14] Q. Guo, W. Li, H. Yu, and O. Alvarez, Effects of topographic variability and LIDAR sampling density on several DEM interpolation methods, Photogrammetric Engineering & Remote Sensing, 76 (2010), pp. 701–712.
 [15] A. I. Khuri and S. Mukhopadhyay, Response surface methodology, Wiley Interdisciplinary Reviews: Computational Statistics, 2 (2010), pp. 128–149.
 [16] Y. Liu and G. Yin, Nonparametric functional approximation with Delaunay triangulation learner, in 2019 IEEE International Conference on Big Knowledge (ICBK), Beijing, China, 2019, IEEE, pp. 167–174.
 [17] T. C. H. Lux, L. T. Watson, T. H. Chang, Y. Hong, and K. Cameron, Interpolation of sparse highdimensional data, Numerical Algorithms, 88 (2021), pp. 281–313.
 [18] M. M. Marinak, G. Kerbel, N. Gentile, O. Jones, D. Munro, S. Pollaine, T. Dittrich, and S. Haan, Threedimensional HYDRA simulations of National Ignition Facility targets, Physics of Plasmas, 8 (2001), pp. 2275–2280.
 [19] P. McCullagh and J. A. Nelder, Generalized linear models, Routledge, 2019.
 [20] P. Morin, R. H. Nochetto, and K. G. Siebert, Data oscillation and convergence of adaptive FEM, SIAM Journal on Numerical Analysis, 38 (2000), pp. 466–488.
 [21] J. A. Nelder and R. W. Wedderburn, Generalized linear models, Journal of the Royal Statistical Society: Series A (General), 135 (1972), pp. 370–384.
 [22] V. Rajan, Optimality of the Delaunay triangulation in , Discrete & Computational Geometry, 12 (1994), pp. 189–202.
 [23] L. F. Richardson, The approximate arithmetical solution by finite differences of physical problems involving differential equations, with an application to the stresses in a masonry dam, Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 210 (1911), pp. 307–357.
 [24] L. F. Richardson and J. A. Gaunt, The deferred approach to the limit, Philosophical Transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character, 226 (1927), pp. 299–361.
 [25] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn, Design and analysis of computer experiments, Statistical science, 4 (1989), pp. 409–423.
 [26] T. W. Simpson, T. M. Mauery, J. J. Korte, and F. Mistree, Kriging models for global approximation in simulationbased multidisciplinary design optimization, AIAA journal, 39 (2001), pp. 2233–2241.
 [27] W. C. Van Beers and J. P. Kleijnen, Kriging interpolation in simulation: a survey, in Proceedings of the 2004 Winter Simulation Conference, 2004., vol. 1, IEEE, 2004.
Appendix A Computing the gradient of the Delaunay interpolant
We can compute in Algorithm 3 by exploiting some standard techniques of differential geometry and linear algebra. Since , and , we can write
where are standard Euclidean coordinates for . Thus,
(10) 
defines an implicit dimensional surface in that is piecewise flat. Define by implicit differentiation as
(11) 
Observe is a unit normal vector to the piecewise flat implicit surface (10).
Now, near , the surface (10) is determined by the values of the Delaunay simplex that contains . Let denote the vertices of , which are found during the computation of by Algorithm 1. Recall that each , meaning is known by assumption. Thus, is a collection of points in lying on the surface (10) near .
Set to be the matrix whose rows are formed by the vectors . Subtract the columnwise average of from each row of , which has the effect of translating the barycenter of to the origin. Let be the SVD of . Then is an orthonormal set whose first vectors form a basis for (10), meaning the last vector, call it , is a unit normal to (10). By scaling so that its last coordinate is , we have found and can recover from (11).
We remark briefly on the case where lies at the interface of one or more Delaunay simplices. The gradient is not continuous across mesh elements (since the function is piecewise flat) and thus has no unique definition. In practice, however, this is very unlikely to occur and such instances could be detected and managed robustly in a number of ways.