1 Introduction
The motivation of this research mainly comes from applications to uncertainty quantification for ordinary or partial differential equations with random coefficients. The problem we are interested in is to estimate the expectation (integral)
$$I(f) := \int_{D^s} f(\boldsymbol{x}) \prod_{j=1}^{s} \varphi(x_j)\, \mathrm{d}\boldsymbol{x}$$
for large $s$, with $\varphi$ being a univariate probability density function defined over a domain $D \subseteq \mathbb{R}$. In some applications, the integrand is of the form
$$f(\boldsymbol{x}) = g(\boldsymbol{x}A)$$
for a matrix $A \in \mathbb{R}^{s \times t}$ and a function $g \colon \mathbb{R}^{t} \to \mathbb{R}$; see Dick et al. (2015). Here we note that $\boldsymbol{x} = (x_1, \ldots, x_s)$ is defined as a row vector. Typically, $\varphi$ is given by the uniform distribution on the unit interval $[0,1]$, or by the standard normal distribution on the real line $\mathbb{R}$.
The standard Monte Carlo method approximates $I(f)$ as follows: we first generate a sequence of i.i.d. samples of the random vector $\boldsymbol{x} = (x_1, \ldots, x_s)$:
$$\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_N,$$
and then approximate $I(f)$ by
$$\hat{I}_N(f) := \frac{1}{N} \sum_{n=1}^{N} f(\boldsymbol{x}_n). \qquad (1)$$
It is well known that
$$\mathbb{E}\left[\hat{I}_N(f)\right] = I(f)$$
and
$$\mathbb{V}\left[\hat{I}_N(f)\right] = \frac{\sigma^2(f)}{N}, \qquad \sigma^2(f) := \mathbb{V}[f(\boldsymbol{x})],$$
which ensures the canonical “one over square root of $N$” convergence.
Now let us consider a situation where computing $\boldsymbol{x}A$ takes a significant amount of time in the computation of $f(\boldsymbol{x}) = g(\boldsymbol{x}A)$. In general, if the matrix $A$ does not have any special structure such as circulant, Hankel, Toeplitz, or Vandermonde, then fast matrix-vector multiplication is not available and the computation of $\boldsymbol{x}A$ requires $O(st)$ arithmetic operations. Some examples where a fast matrix-vector multiplication has been established are the following: in Feischl et al. (2018) the authors use $\mathcal{H}$-matrices to obtain an approximation of a covariance matrix which also permits a fast matrix-vector multiplication; in Giles et al. (2008) the authors show how a (partially) fast matrix-vector product can be implemented for multi-asset pricing in finance; Brownian bridge and principal component analysis factorizations of the covariance matrix in finance also permit a fast matrix-vector multiplication (Giles et al. 2008). Here we consider the case where either a fast matrix-vector product is not available, or one wants to avoid $\mathcal{H}$-matrices and particular covariance factorizations, since we do not impose any restrictions on $A$.
In order to reduce this computational cost, we propose an alternative, novel Monte Carlo estimator in this paper. Instead of generating a sequence of i.i.d. samples of the vector $\boldsymbol{x}$, we generate a sequence of i.i.d. samples of a single random variable following $\varphi$, denoted by $x_1, x_2, \ldots, x_{N+s-1}$, and then approximate $I(f)$ by
$$\hat{I}^{\mathrm{T}}_N(f) := \frac{1}{N} \sum_{n=1}^{N} f(\boldsymbol{x}_n)$$
with
$$\boldsymbol{x}_n := (x_{n+s-1}, x_{n+s-2}, \ldots, x_n).$$
The computation of $\hat{I}^{\mathrm{T}}_N(f)$ can be done as follows:
Algorithm 1

For $n = 1, \ldots, N+s-1$, let $x_n$ be i.i.d. samples of a random variable following $\varphi$.

Define $X \in \mathbb{R}^{N \times s}$ by
$$X := \begin{pmatrix} x_s & x_{s-1} & \cdots & x_1 \\ x_{s+1} & x_s & \cdots & x_2 \\ \vdots & \vdots & \ddots & \vdots \\ x_{N+s-1} & x_{N+s-2} & \cdots & x_N \end{pmatrix},$$
i.e., the $n$-th row of $X$ is $\boldsymbol{x}_n$. Note that $X$ is a Toeplitz matrix.

Compute $Y := XA \in \mathbb{R}^{N \times t}$ and denote the $n$-th row of $Y$ by $\boldsymbol{y}_n$.

Then $\hat{I}^{\mathrm{T}}_N(f)$ is given by
$$\hat{I}^{\mathrm{T}}_N(f) = \frac{1}{N} \sum_{n=1}^{N} g(\boldsymbol{y}_n).$$
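As a concrete (and deliberately naive) sketch of Algorithm 1, the following Python snippet draws the scalar stream, forms the sliding-window rows of $X$, and averages $g$ over the rows of $Y = XA$. This is our own illustration, not the authors' code: the density $\varphi$ is taken to be the standard normal (an assumption), the function name `tmc_estimate` is hypothetical, and the product $XA$ is computed directly rather than with the FFT.

```python
import numpy as np

def tmc_estimate(g, A, N, rng):
    """Toeplitz Monte Carlo sketch (hypothetical helper, phi = standard normal):
    draw N+s-1 scalars, form the N x s Toeplitz matrix X whose n-th row is
    (x_{n+s-1}, ..., x_n), and average g over the rows of Y = X A."""
    s, t = A.shape
    x = rng.standard_normal(N + s - 1)
    # Sliding windows: row n is x[n : n+s] reversed, so X has constant diagonals.
    X = np.array([x[n:n + s][::-1] for n in range(N)])
    Y = X @ A  # in practice this product is computed column-by-column via FFT
    return np.mean([g(y) for y in Y])
```

For a constant integrand the estimator returns that constant exactly; for integrands with mean zero it fluctuates around zero at the usual Monte Carlo scale.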
The idea behind introducing this algorithm comes from a recent paper by Dick et al. (2015), who consider replacing the point set used in the standard Monte Carlo estimator (1) with a special class of quasi-Monte Carlo point sets which permits a fast matrix-vector multiplication. This paper considers a sampling scheme different from that of Dick et al. (2015) while still allowing for a fast matrix-vector multiplication.
When $s$ is quite large, say in the thousands or millions, $N$ often has to be set significantly smaller than $s$. Throughout this paper we consider the case where $s = cN$ for some constant $c \ge 1$. Since the matrix-vector multiplication between the Toeplitz matrix $X$ and each column vector of $A$ can be done with $O(s \log s)$ arithmetic operations by using the fast Fourier transform (Frigo and Johnson 2005), the matrix-matrix multiplication $Y = XA$ appearing in Algorithm 1 can be done with $O(st \log s)$ arithmetic operations. This way the necessary computational cost can be reduced from $O(Nst)$ to $O(st \log s)$, which is the major advantage of using $\hat{I}^{\mathrm{T}}_N(f)$. In this paper we call $\hat{I}^{\mathrm{T}}_N(f)$ a Toeplitz Monte Carlo (TMC) estimator of $I(f)$, as we rely on the Toeplitz structure of $X$ to achieve a faster computation.^{1} In the remainder of this paper, we study the variance of the TMC estimator and its dependence on the dimension $s$, and also examine the practical efficiency of the TMC estimator by carrying out numerical experiments for applications from ordinary/partial differential equations with random coefficients.

^{1} If the sample nodes are given by $\boldsymbol{x}_n = (x_n, x_{n+1}, \ldots, x_{n+s-1})$ instead of $(x_{n+s-1}, x_{n+s-2}, \ldots, x_n)$, the matrix $X$ becomes a Hankel matrix, which also allows for a fast matrix-vector multiplication. Therefore we could call our proposal a Hankel Monte Carlo (HMC) estimator instead. However, in the context of Monte Carlo methods, HMC often refers to the Hamiltonian Monte Carlo algorithm, and we would like to avoid duplication of abbreviations by coining the name Toeplitz Monte Carlo.
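The FFT-based product mentioned above can be sketched via the standard circulant-embedding trick: a Toeplitz matrix is embedded into a circulant matrix, whose action is a circular convolution and hence a pointwise product in the Fourier domain. The snippet below is a minimal illustration of ours (the function name is hypothetical), costing $O(m \log m)$ operations with $m$ equal to the number of rows plus columns minus one.

```python
import numpy as np

def toeplitz_matvec(c, r, v):
    """Multiply the (len(c) x len(r)) Toeplitz matrix with first column c and
    first row r (c[0] == r[0] assumed) by the vector v, in O(m log m) time
    via circulant embedding, where m = len(c) + len(r) - 1."""
    n_rows, n_cols = len(c), len(r)
    # First column of the embedding circulant: c followed by r reversed (tail).
    circ_col = np.concatenate([c, r[:0:-1]])
    m = len(circ_col)
    v_pad = np.concatenate([v, np.zeros(m - n_cols)])
    # Multiplying by a circulant matrix is a circular convolution,
    # i.e. a pointwise product in the Fourier domain.
    res = np.fft.ifft(np.fft.fft(circ_col) * np.fft.fft(v_pad))
    return res[:n_rows].real
```

Applying this routine to each of the $t$ columns of $A$ yields the product $XA$ used in Algorithm 1.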
2 Theoretical results
2.1 Variance analysis
In order to study the variance of $\hat{I}^{\mathrm{T}}_N(f)$, we introduce the concept of the analysis-of-variance (ANOVA) decomposition of multivariate functions (Hoeffding 1948, Kuo et al. 2010, Sobol’ 1993). In what follows, for simplicity of notation, we write $[s] := \{1, 2, \ldots, s\}$. For a subset $u \subseteq [s]$, we write $\boldsymbol{x}_u := (x_j)_{j \in u}$ and denote the cardinality of $u$ by $|u|$. Let $f$ be a square-integrable function, i.e., $\int_{D^s} f^2(\boldsymbol{x}) \prod_{j=1}^{s} \varphi(x_j)\, \mathrm{d}\boldsymbol{x} < \infty$. Then $f$ can be decomposed into
$$f(\boldsymbol{x}) = \sum_{u \subseteq [s]} f_u(\boldsymbol{x}_u),$$
where we write $f_\emptyset := I(f)$ and each summand is defined recursively by
$$f_u(\boldsymbol{x}_u) := \int_{D^{s-|u|}} f(\boldsymbol{x}) \prod_{j \in [s] \setminus u} \varphi(x_j)\, \mathrm{d}\boldsymbol{x}_{[s] \setminus u} \; - \; \sum_{v \subsetneq u} f_v(\boldsymbol{x}_v)$$
for $\emptyset \neq u \subseteq [s]$. Regarding this decomposition of multivariate functions, the following properties hold. We refer to Lemmas A.1 & A.3 of Owen (2019) for the proof in the case where $\varphi$ is the uniform distribution over the unit interval $[0,1]$.
Lemma 1
With the notation above, we have:

For any nonempty $u \subseteq [s]$ and $j \in u$, we have
$$\int_{D} f_u(\boldsymbol{x}_u)\, \varphi(x_j)\, \mathrm{d}x_j = 0.$$

For any $u, v \subseteq [s]$ with $u \neq v$, we have
$$\int_{D^s} f_u(\boldsymbol{x}_u)\, f_v(\boldsymbol{x}_v) \prod_{j=1}^{s} \varphi(x_j)\, \mathrm{d}\boldsymbol{x} = 0.$$
It follows from the second assertion of Lemma 1 that
$$\mathbb{V}[f(\boldsymbol{x})] = \mathbb{E}\left[\left(\sum_{\emptyset \neq u \subseteq [s]} f_u(\boldsymbol{x}_u)\right)^2\right] = \sum_{\emptyset \neq u \subseteq [s]} \mathbb{E}\left[f_u^2(\boldsymbol{x}_u)\right].$$
This equality means that the variance of $f$ can be expressed as a sum of the variances of the lower-dimensional functions $f_u$:
$$\sigma^2(f) = \sum_{\emptyset \neq u \subseteq [s]} \sigma_u^2(f), \qquad (2)$$
where we write $\sigma_u^2(f) := \mathbb{V}[f_u(\boldsymbol{x}_u)] = \mathbb{E}[f_u^2(\boldsymbol{x}_u)]$.
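To make the decomposition concrete, the following snippet computes the ANOVA terms of the hypothetical two-dimensional function $f(x_1, x_2) = x_1 + x_1 x_2$ (our choice, not an example from the paper) with $\varphi$ uniform on $[0,1]$, using a midpoint-rule grid, and checks the equality (2) numerically.

```python
import numpy as np

# Midpoint-rule grid on [0,1]^2; with a uniform density, integrals are plain means.
n = 400
u = (np.arange(n) + 0.5) / n
x1, x2 = np.meshgrid(u, u, indexing="ij")

# Hypothetical test function (our choice): f(x1, x2) = x1 + x1*x2.
f = x1 + x1 * x2

f_empty = f.mean()                                  # f_0      = 3/4
f_1 = f.mean(axis=1) - f_empty                      # f_{1}    = (3/2) x1 - 3/4
f_2 = f.mean(axis=0) - f_empty                      # f_{2}    = x2/2 - 1/4
f_12 = f - f_empty - f_1[:, None] - f_2[None, :]    # f_{1,2}  = (x1-1/2)(x2-1/2)

# Component variances and the decomposition (2):
var_1 = np.mean(f_1 ** 2)              # = 3/16
var_2 = np.mean(f_2 ** 2)              # = 1/48
var_12 = np.mean(f_12 ** 2)            # = 1/144
total = np.mean((f - f_empty) ** 2)    # sigma^2(f) = 31/144
```

The exact values are $\sigma_{\{1\}}^2 = 3/16$, $\sigma_{\{2\}}^2 = 1/48$, $\sigma_{\{1,2\}}^2 = 1/144$, which indeed sum to $\sigma^2(f) = 31/144$.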
Using these facts, the variance of the TMC estimator can be analyzed as follows:
Theorem 2.1
We have
$$\mathbb{E}\left[\hat{I}^{\mathrm{T}}_N(f)\right] = I(f)$$
and
$$\mathbb{V}\left[\hat{I}^{\mathrm{T}}_N(f)\right] = \frac{\sigma^2(f)}{N} + \frac{2}{N} \sum_{i=1}^{\min(s,N)-1} \left(1 - \frac{i}{N}\right) \sum_{\emptyset \neq u \subseteq [s-i]} \mathbb{E}\left[f_u\, f_{u+i}\right],$$
where we write $u + i := \{j + i : j \in u\}$.
Note that, in the theorem, we write
$$\mathbb{E}\left[f_u\, f_{u+i}\right] := \int_{D^{|u|}} f_u(\boldsymbol{y})\, f_{u+i}(\boldsymbol{y}) \prod_{j=1}^{|u|} \varphi(y_j)\, \mathrm{d}\boldsymbol{y},$$
i.e., both ANOVA terms are evaluated at the same set of variables, reflecting the overlap of the components between $\boldsymbol{x}_n$ and $\boldsymbol{x}_{n+i}$.
The reader should not confuse this with
$$\int_{D^{s}} f_u(\boldsymbol{x}_u)\, f_{u+i}(\boldsymbol{x}_{u+i}) \prod_{j=1}^{s} \varphi(x_j)\, \mathrm{d}\boldsymbol{x},$$
which is 0 by the second assertion of Lemma 1.
Proof
The first assertion follows immediately from the linearity of expectation and the trivial equality $\mathbb{E}[f(\boldsymbol{x}_n)] = I(f)$ for each $n$. For the second assertion, by using the ANOVA decomposition of $f$, we have
It follows from the first assertion of Lemma 1 that the second term on the rightmost side above becomes
where we reordered the sum over $n$ and $m$ with respect to the difference $i = m - n$ in the last equality.
If $i \ge s$, there is no overlap of the components between $\boldsymbol{x}_n$ and $\boldsymbol{x}_{n+i}$. Because of the independence of the samples, it follows from the first assertion of Lemma 1 that the inner sum over $u$ and $v$ above is given by
If $i < s$, on the other hand, we have $x_{n+s-j} = x_{(n+i)+s-(j+i)}$ for any $j \in [s-i]$, that is, the $j$-th component of $\boldsymbol{x}_n$ coincides with the $(j+i)$-th component of $\boldsymbol{x}_{n+i}$. With this equality and the first assertion of Lemma 1, the inner sum over $u$ and $v$ becomes
Altogether we obtain
Thus we are done.∎
As is clear from Theorem 2.1, the TMC estimator is unbiased and maintains the canonical “one over square root of $N$” convergence. Moreover, the TMC estimator can be regarded as a variance reduction technique, since the second term in the variance can be negative, depending on the function.
Example 1
To illustrate the last comment, let us consider a simple test function given by
and let the components $x_1, x_2, \ldots$ be independent normally distributed random variables with mean 0 and variance 1. It is easy to see that
Then it follows that
whereas, for the TMC estimator, we have
Therefore the variance of the TMC estimator is almost one-third of the variance of the standard Monte Carlo estimator.
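As a complementary numerical illustration, here is a hypothetical test function of our own (not the one in the example above): $f(x_1, x_2) = x_1 - x_2$ with $s = 2$ and standard normal $\varphi$. The overlapping TMC windows make consecutive evaluations negatively correlated (the sum telescopes), so the covariance term in Theorem 2.1 is negative and the TMC variance falls far below the Monte Carlo variance $2/N$.

```python
import numpy as np

def mc_estimate(N, rng):
    # Standard Monte Carlo: N independent 2-d points, f(x) = x1 - x2.
    x = rng.standard_normal((N, 2))
    return np.mean(x[:, 0] - x[:, 1])

def tmc_estimate(N, rng):
    # TMC: N + 1 scalars, windows x_n = (x_{n+1}, x_n), so the sum telescopes
    # to (x_{N+1} - x_1) / N and the variance drops from 2/N to 2/N^2.
    x = rng.standard_normal(N + 1)
    return np.mean(x[1:] - x[:-1])

rng = np.random.default_rng(0)
mc_var = np.var([mc_estimate(100, rng) for _ in range(2000)])
tmc_var = np.var([tmc_estimate(100, rng) for _ in range(2000)])
```

With $N = 100$ the empirical MC variance is close to $2/N = 0.02$, while the TMC variance is close to $2/N^2 = 0.0002$, a dramatic (if contrived) example of the possible variance reduction.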
It is also possible that the variance of the TMC estimator increases compared to standard Monte Carlo; however, we show below that this increase is bounded.
2.2 Weighted space and tractability
Here we study the dependence of the variance on the dimension . For this purpose, we first give a bound on . For , let
Then it follows from the decomposition (2) that
resulting in an equality
Using Theorem 2.1, the variance is bounded above as follows.
Corollary 1
We have
$$\mathbb{V}\left[\hat{I}^{\mathrm{T}}_N(f)\right] \;\le\; \frac{\sigma^2(f)}{N} + \frac{2}{N} \sum_{i=1}^{\min(s,N)-1} \left(1 - \frac{i}{N}\right) \sum_{\emptyset \neq u \subseteq [s-i]} \sigma_u(f)\, \sigma_{u+i}(f),$$
where we write $\sigma_u(f) := \sqrt{\sigma_u^2(f)}$.
Proof
For any $\emptyset \neq u \subseteq [s-i]$, the Cauchy-Schwarz inequality leads to
$$\left|\mathbb{E}\left[f_u\, f_{u+i}\right]\right| \;\le\; \sigma_u(f)\, \sigma_{u+i}(f).$$
Applying this bound to the second assertion of Theorem 2.1, we obtain the result.
∎
Using this result, we have
$$\mathbb{V}\left[\hat{I}^{\mathrm{T}}_N(f)\right] \;\le\; \frac{\sigma^2(f)}{N} + \frac{2}{N} \sum_{i=1}^{s-1} \sum_{\emptyset \neq u \subseteq [s-i]} \sigma_u(f)\, \sigma_{u+i}(f) \;\le\; (2s-1)\, \frac{\sigma^2(f)}{N},$$
wherein, for the second inequality, we applied the Cauchy-Schwarz inequality for sums together with $\sum_{\emptyset \neq u \subseteq [s-i]} \sigma_u^2(f) \le \sigma^2(f)$. Therefore, when we fix the number of samples, the variance of the TMC estimator can be at most $2s-1$ times larger than the variance of the standard Monte Carlo estimator.
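Reading the corollary as a bound of the form $\mathbb{V}[\hat{I}^{\mathrm{T}}_N(f)] \le (2s-1)\,\sigma^2(f)/N$ (which follows from bounding each covariance term by $\sigma^2(f)$), the following experiment with the hypothetical integrand $f(\boldsymbol{x}) = x_1 + \cdots + x_s$ (our choice) shows a case where the TMC variance genuinely increases, yet stays within that factor. The window orientation is immaterial here, since the samples are i.i.d. and $f$ is symmetric.

```python
import numpy as np

s, N, reps = 4, 100, 4000
rng = np.random.default_rng(1)

def mc_once():
    # Standard MC estimate with the hypothetical integrand f(x) = x_1 + ... + x_s.
    return rng.standard_normal((N, s)).sum(axis=1).mean()

def tmc_once():
    # TMC estimate: overlapping windows drawn from one scalar stream; here the
    # windows are strongly positively correlated, so the variance goes up.
    x = rng.standard_normal(N + s - 1)
    windows = np.lib.stride_tricks.sliding_window_view(x, s)
    return windows.sum(axis=1).mean()

mc_var = np.var([mc_once() for _ in range(reps)])
tmc_var = np.var([tmc_once() for _ in range(reps)])
```

The true variances are $s/N = 0.04$ for standard Monte Carlo and about $0.158$ for TMC, a ratio near $s = 4$, comfortably below the worst-case factor $2s - 1 = 7$.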
Now let us consider the case $s = cN$ and assume, as discussed in the first section, that the computational time for the standard Monte Carlo estimator is proportional to $Nst$, whereas the computational time for the TMC estimator is proportional to $st \log s$ (assuming that the main cost in evaluating $f$ lies in the computation of $\boldsymbol{x}A$). When we fix the cost instead of the number of samples, we have
$$N'\, s\, t \asymp s\, t \log s,$$
where $\asymp$ indicates that the terms should be of the same order and $N'$ denotes the number of samples affordable for the standard Monte Carlo estimator, and so $N' \asymp \log s$.
Thus, the variance of the TMC estimator for a given cost is at most $\log s$ times as large as that of the standard Monte Carlo estimator (up to some constant factor). On the other hand, if there is some decay of the importance of the ANOVA terms as the indices of the variables increase, for instance, if the first few terms dominate the sum in the bound of Corollary 1, then the ratio can be bounded independently of $s$, leading to a gain in the efficiency of the TMC estimator. We observe such a behaviour in our numerical experiments below.
Following the idea from Sloan and Woźniakowski (1998), we now introduce the notion of a weighted space. Let $\boldsymbol{\gamma} = (\gamma_u)_{u \subset \mathbb{N},\, |u| < \infty}$ be a sequence of nonnegative real numbers called weights. Then the weighted space $F_{\boldsymbol{\gamma}}$ is defined by
$$F_{\boldsymbol{\gamma}} := \left\{ f \in L^2 \;:\; \|f\|_{\boldsymbol{\gamma}} < \infty \right\},$$
where
$$\|f\|_{\boldsymbol{\gamma}} := \left( \sum_{\substack{u \subset \mathbb{N},\, |u| < \infty \\ u \neq \emptyset}} \frac{\sigma_u^2(f)}{\gamma_u} \right)^{1/2}.$$
For any subset $u$ with $\gamma_u = 0$, we assume that the corresponding ANOVA term $f_u$ is identically 0 and we formally set $\sigma_u^2(f)/\gamma_u := 0$.
For a randomized algorithm using $N$ function evaluations of $f$ to estimate $I(f)$, which we denote by $\hat{I}_N(f)$, let us consider the minimal cost to estimate $I(f)$ with mean square error at most $\varepsilon^2$ for any $f$ in the unit ball of $F_{\boldsymbol{\gamma}}$:
$$N(\varepsilon, s) := \min\left\{ N \in \mathbb{N} \;:\; e(N, s) \le \varepsilon \right\},$$
where
$$e(N, s) := \sup_{\substack{f \in F_{\boldsymbol{\gamma}} \\ \|f\|_{\boldsymbol{\gamma}} \le 1}} \left( \mathbb{E}\left[ \left( I(f) - \hat{I}_N(f) \right)^2 \right] \right)^{1/2}.$$
We say that the algorithm $\hat{I}_N$ is

a weakly tractable algorithm if
$$\lim_{s + \varepsilon^{-1} \to \infty} \frac{\ln N(\varepsilon, s)}{s + \varepsilon^{-1}} = 0,$$

a polynomially tractable algorithm if there exist nonnegative constants $C$, $p$ and $q$ such that
$$N(\varepsilon, s) \le C\, \varepsilon^{-p}\, s^{q}$$
holds for all $s \in \mathbb{N}$ and $\varepsilon \in (0,1)$, where $p$ and $q$ are called the $\varepsilon$-exponent and the $s$-exponent, respectively,

a strongly polynomially tractable algorithm if it is a polynomially tractable algorithm with the $s$-exponent 0.
We refer to Novak and Woźniakowski (2010) for more information on the notion of tractability.
For instance, the standard Monte Carlo estimator is a strongly polynomially tractable algorithm with the $\varepsilon$-exponent 2 if
$$\sup_{s \in \mathbb{N}} \sum_{\emptyset \neq u \subseteq [s]} \gamma_u < \infty \qquad (3)$$
holds. This claim can be proven as follows: for any $f \in F_{\boldsymbol{\gamma}}$ with $\|f\|_{\boldsymbol{\gamma}} \le 1$, we have
$$\sigma^2(f) = \sum_{\emptyset \neq u \subseteq [s]} \gamma_u\, \frac{\sigma_u^2(f)}{\gamma_u} \;\le\; \|f\|_{\boldsymbol{\gamma}}^2 \sum_{\emptyset \neq u \subseteq [s]} \gamma_u \;\le\; \sum_{\emptyset \neq u \subseteq [s]} \gamma_u.$$
It follows that, in order to have
$$\mathbb{V}\left[\hat{I}_N(f)\right] = \frac{\sigma^2(f)}{N} \le \varepsilon^2$$
for any $f$ with $\|f\|_{\boldsymbol{\gamma}} \le 1$, it suffices to take $N \ge \varepsilon^{-2} \sum_{\emptyset \neq u \subseteq [s]} \gamma_u$. Thus the minimal cost is bounded above by
$$N(\varepsilon, s) \le 1 + \varepsilon^{-2} \sum_{\emptyset \neq u \subseteq [s]} \gamma_u.$$
Given the condition (3), we see that $\varepsilon^{2} N(\varepsilon, s)$ is bounded independently of the dimension $s$, and the algorithm is strongly polynomially tractable with the $\varepsilon$-exponent 2.
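As an illustration of condition (3), consider hypothetical product weights $\gamma_u = \prod_{j \in u} j^{-2}$ (our example, not from the paper). For product weights one has $\sum_{\emptyset \neq u \subseteq [s]} \gamma_u = \prod_{j=1}^{s} (1 + j^{-2}) - 1$, which stays bounded as $s \to \infty$ precisely because $\sum_j j^{-2} < \infty$:

```python
import math

# Hypothetical product weights gamma_u = prod_{j in u} j^{-2}, chosen only to
# illustrate condition (3). The subset sum collapses to a product:
#   sum_{emptyset != u subseteq [s]} gamma_u = prod_{j=1}^{s} (1 + j^{-2}) - 1.

def subset_weight_sum(s):
    prod = 1.0
    for j in range(1, s + 1):
        prod *= 1.0 + 1.0 / j**2
    return prod - 1.0

def subset_weight_sum_bruteforce(s):
    # Direct sum over all 2^s - 1 nonempty subsets, for a small sanity check.
    total = 0.0
    for mask in range(1, 2**s):
        w = 1.0
        for j in range(1, s + 1):
            if mask & (1 << (j - 1)):
                w *= 1.0 / j**2
        total += w
    return total

totals = [subset_weight_sum(s) for s in (1, 10, 100, 1000)]
```

The totals increase monotonically toward $\sinh(\pi)/\pi - 1 \approx 2.676$, so the supremum in (3) is finite and standard Monte Carlo is strongly polynomially tractable on this particular weighted space.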
The following theorem gives sufficient conditions on the weights for the TMC estimator to be a weakly tractable algorithm, a polynomially tractable algorithm, or a strongly polynomially tractable algorithm.
Theorem 2.2
The TMC estimator is

a weakly tractable algorithm if

a polynomially tractable algorithm with the $\varepsilon$-exponent 2 if there exists a constant such that

a strongly polynomially tractable algorithm with the $\varepsilon$-exponent 2 if
Proof
It follows from Corollary 1 and Hölder’s inequality for sums that