The motivation for this research comes mainly from applications to uncertainty quantification for ordinary or partial differential equations with random coefficients. The problem we are interested in is to estimate an expectation (integral)
$$\mu = \int_{D^s} f(\boldsymbol{x})\,\rho(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}$$
for large $s$, with
$$\rho(\boldsymbol{x}) = \prod_{j=1}^{s}\varphi(x_j),$$
$\varphi$ being the univariate probability density function defined over $D$. In some applications, the integrand is of the form
$$f(\boldsymbol{x}) = g(\boldsymbol{x}A)$$
for a matrix $A \in \mathbb{R}^{s\times t}$ and a function $g:\mathbb{R}^t\to\mathbb{R}$; see Dick et al. (2015). Here we note that $\boldsymbol{x} = (x_1,\ldots,x_s)$ is defined as a row vector. Typically, $\varphi$ is given by the uniform distribution on the unit interval, $D = [0,1]$, or by the standard normal distribution on the real line, $D = \mathbb{R}$.
The standard Monte Carlo method approximates $\mu$ as follows: we first generate a sequence of i.i.d. samples of the random variables $x_1,\ldots,x_s$:
$$\boldsymbol{x}_n = (x_{n,1},\ldots,x_{n,s}), \qquad n = 1,2,\ldots,N,$$
and then approximate $\mu$ by
$$\hat{\mu}_{\mathrm{MC}} = \frac{1}{N}\sum_{n=1}^{N} f(\boldsymbol{x}_n). \qquad (1)$$
It is well known that
$$\mathbb{E}[\hat{\mu}_{\mathrm{MC}}] = \mu \quad\text{and}\quad \mathbb{V}[\hat{\mu}_{\mathrm{MC}}] = \frac{\sigma^2(f)}{N}, \quad\text{with } \sigma^2(f) := \int_{D^s}\big(f(\boldsymbol{x})-\mu\big)^2\rho(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x},$$
which ensures the canonical “one over square root of $N$” convergence.
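As a minimal illustration (our own sketch, with a hypothetical integrand $f(x) = x_1 x_2$ on $[0,1]^2$, whose true mean is $1/4$), the standard estimator is just an average over i.i.d. draws:

```python
import random

def standard_mc(f, s, N, rng):
    """Plain Monte Carlo: average f over N i.i.d. points drawn from the density."""
    total = 0.0
    for _ in range(N):
        x = [rng.random() for _ in range(s)]  # one sample of (x_1, ..., x_s), uniform on [0,1]^s
        total += f(x)
    return total / N

# Demo with a hypothetical integrand f(x) = x_1 * x_2, whose true mean is 1/4.
rng = random.Random(42)
est = standard_mc(lambda x: x[0] * x[1], s=2, N=200_000, rng=rng)
```

Note that each of the $N$ samples requires $s$ fresh random draws; the sampling scheme proposed below reduces this.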
Now let us consider a situation where computing $\boldsymbol{x}_n A$ for $f(\boldsymbol{x}_n) = g(\boldsymbol{x}_n A)$ takes a significant amount of time in the computation of $\hat{\mu}_{\mathrm{MC}}$. In general, if the matrix $A$ does not have any special structure such as circulant, Hankel, Toeplitz, or Vandermonde, then fast matrix-vector multiplication is not available and the computation of $\boldsymbol{x}_n A$ requires $O(st)$ arithmetic operations. Some examples where a fast matrix-vector multiplication has been established are the following: in Feischl et al. (2018) the authors use $\mathcal{H}$-matrices to obtain an approximation of a covariance matrix which also permits a fast matrix-vector multiplication; in Giles et al. (2008) the authors show how a (partially) fast matrix-vector product can be implemented for multi-asset pricing in finance; Brownian bridge and principal component analysis factorizations of the covariance matrix in finance also permit a fast matrix-vector multiplication (Giles et al. 2008). Here we consider the case where either a fast matrix-vector product is not available, or one wants to avoid $\mathcal{H}$-matrices and particular covariance factorizations, since we do not impose any restrictions on $A$.
In order to reduce this computational cost, we propose an alternative, novel Monte Carlo estimator in this paper. Instead of generating a sequence of i.i.d. samples of the vector $(x_1,\ldots,x_s)$, we generate a sequence of i.i.d. samples $y_1, y_2, \ldots, y_{N+s-1}$ of a single random variable, denoted by $y$, and then approximate $\mu$ by
$$\hat{\mu}_{\mathrm{TMC}} = \frac{1}{N}\sum_{n=1}^{N} f(\boldsymbol{x}_n), \quad\text{with}\quad \boldsymbol{x}_n := (y_{n+s-1}, y_{n+s-2}, \ldots, y_n).$$
When $f(\boldsymbol{x}) = g(\boldsymbol{x}A)$, the computation of $\hat{\mu}_{\mathrm{TMC}}$ can be done as follows:

Algorithm 1
1. For $n = 1,\ldots,N+s-1$, let $y_n$ be i.i.d. samples of a random variable following $\varphi$, and set
$$X = \begin{pmatrix} y_s & y_{s-1} & \cdots & y_1 \\ y_{s+1} & y_s & \cdots & y_2 \\ \vdots & \vdots & \ddots & \vdots \\ y_{N+s-1} & y_{N+s-2} & \cdots & y_N \end{pmatrix} \in \mathbb{R}^{N\times s},$$
whose $n$-th row is $\boldsymbol{x}_n$. Note that $X$ is a Toeplitz matrix.
2. Compute the matrix-matrix product $XA$.
3. Then $\hat{\mu}_{\mathrm{TMC}}$ is given by
$$\hat{\mu}_{\mathrm{TMC}} = \frac{1}{N}\sum_{n=1}^{N} g\big((XA)_n\big),$$
where $(XA)_n$ denotes the $n$-th row of $XA$.
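To make the sampling scheme concrete, here is a minimal sketch (ours, not the authors' code) of how the nodes are built from a single i.i.d. stream: each node is a reversed sliding window of length $s$, so the stacked nodes form a Toeplitz matrix and only $N + s - 1$ scalar draws are needed.

```python
import random

def tmc_nodes(y, N, s):
    """Sample nodes x_n = (y_{n+s-1}, ..., y_n) built from a single i.i.d. stream y.
    Stacked as rows, they form an N x s Toeplitz matrix (constant diagonals)."""
    return [y[n:n + s][::-1] for n in range(N)]  # 0-based reversed windows

def tmc_estimate(f, s, N, rng):
    # Only N + s - 1 scalar draws are needed, instead of N * s for standard MC.
    y = [rng.gauss(0.0, 1.0) for _ in range(N + s - 1)]
    return sum(f(x) for x in tmc_nodes(y, N, s)) / N

# Toeplitz structure check on a deterministic stream y = (1, 2, ..., 6) with N = 4, s = 3:
X = tmc_nodes([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], N=4, s=3)

# Unbiasedness demo with a simple additive integrand (true mean 0):
rng = random.Random(1)
est = tmc_estimate(lambda x: sum(x), s=3, N=5_000, rng=rng)
```

The entries of `X` are constant along each diagonal, which is exactly the structure the fast multiplication below exploits.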
The idea behind introducing this algorithm comes from a recent paper by Dick et al. (2015), who consider replacing the point set used in the standard Monte Carlo estimator (1) with a special class of quasi-Monte Carlo point sets which permit a fast matrix-vector multiplication for computing $\boldsymbol{x}_n A$. This paper considers a sampling scheme different from that of Dick et al. (2015), while still allowing for a fast matrix-vector multiplication.
When $s$ is quite large, say in the thousands or millions, the cost of computing the products $\boldsymbol{x}_n A$ dominates the total computation. Throughout this paper we consider the case where $N \geq s$. Since the matrix-vector multiplication between an $N\times s$ Toeplitz matrix and each column vector of $A$ can be done with $O((N+s)\log(N+s))$ arithmetic operations by using the fast Fourier transform (Frigo and Johnson 2005), the matrix-matrix multiplication $XA$ appearing in the second item of Algorithm 1 can be done with $O((N+s)t\log(N+s))$ arithmetic operations. This way the necessary computational cost can be reduced from $O(Nst)$ to $O((N+s)t\log(N+s))$, which is the major advantage of using $\hat{\mu}_{\mathrm{TMC}}$.
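One standard way to realize the fast multiplication (a sketch using the usual circulant-embedding approach, not necessarily the authors' implementation) is to embed the $N\times s$ Toeplitz matrix into a circulant matrix of power-of-two size $m \geq N+s-1$, whose action the FFT diagonalizes; a textbook radix-2 FFT stands in below for an optimized library such as FFTW (Frigo and Johnson 2005).

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.
    The inverse transform is unnormalized (caller divides by the length)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1j if inverse else -1j
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2 * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def toeplitz_matvec(col, row, x):
    """Multiply the Toeplitz matrix with first column `col` (length N) and first
    row `row` (length s, row[0] == col[0]) by the vector x (length s), via
    circulant embedding and the FFT: O((N + s) log(N + s)) operations."""
    N, s = len(col), len(row)
    m = 1
    while m < N + s - 1:
        m *= 2
    # First column of the embedding circulant: col, zero padding, then the
    # first row reversed (without its leading entry).
    c = list(col) + [0.0] * (m - N - s + 1) + list(row[1:][::-1])
    xp = list(x) + [0.0] * (m - s)
    fc, fx = fft(c), fft(xp)
    y = fft([a * b for a, b in zip(fc, fx)], inverse=True)
    return [v.real / m for v in y[:N]]  # unnormalized inverse FFT needs the 1/m factor

# Check against the naive O(N*s) product for a small Toeplitz matrix.
col = [1.0, 2.0, 3.0, 4.0]   # first column (N = 4)
row = [1.0, 5.0, 6.0]        # first row (s = 3), row[0] == col[0]
x = [1.0, 2.0, -1.0]
fast = toeplitz_matvec(col, row, x)
naive = [sum((col[i - j] if i >= j else row[j - i]) * x[j] for j in range(3)) for i in range(4)]
```

Applying `toeplitz_matvec` once per column of $A$ gives the $O((N+s)t\log(N+s))$ bound quoted above.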
In this paper we call $\hat{\mu}_{\mathrm{TMC}}$ a Toeplitz Monte Carlo (TMC) estimator of $\mu$, as we rely on the Toeplitz structure of $X$ to achieve a faster computation.^1 In the remainder of this paper, we study the variance of the TMC estimator and its dependence on the dimension $s$, and we also examine the practical efficiency of the TMC estimator by carrying out numerical experiments for applications from ordinary/partial differential equations with random coefficients.

^1 If the sample nodes are given by $(y_n, y_{n+1}, \ldots, y_{n+s-1})$ instead of $(y_{n+s-1}, \ldots, y_n)$, the matrix $X$ becomes a Hankel matrix, which also allows for a fast matrix-vector multiplication. Therefore we could call our proposal a Hankel Monte Carlo (HMC) estimator instead. However, in the context of Monte Carlo methods, HMC often refers to the Hamiltonian Monte Carlo algorithm, and we would like to avoid duplicating abbreviations by coining the name Toeplitz Monte Carlo.
2 Theoretical results
2.1 Variance analysis
In order to study the variance of $\hat{\mu}_{\mathrm{TMC}}$, we introduce the concept of the analysis-of-variance (ANOVA) decomposition of multivariate functions (Hoeffding 1948, Kuo et al. 2010, Sobol’ 1993). In what follows, for simplicity of notation, we write $\{1{:}s\} = \{1,2,\ldots,s\}$. For a subset $u \subseteq \{1{:}s\}$, we write $\boldsymbol{x}_u = (x_j)_{j\in u}$ and denote the cardinality of $u$ by $|u|$. Let $f: D^s\to\mathbb{R}$ be a square-integrable function, i.e., $\int_{D^s} |f(\boldsymbol{x})|^2\rho(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} < \infty$. Then $f$ can be decomposed into
$$f(\boldsymbol{x}) = \sum_{u \subseteq \{1{:}s\}} f_u(\boldsymbol{x}_u), \qquad (2)$$
where we write $f_{\emptyset} = \mu$ and each summand $f_u$ is defined recursively by
$$f_u(\boldsymbol{x}_u) = \int_{D^{s-|u|}} f(\boldsymbol{x}) \prod_{j\notin u}\varphi(x_j)\,\mathrm{d}\boldsymbol{x}_{\{1:s\}\setminus u} - \sum_{v \subsetneq u} f_v(\boldsymbol{x}_v)$$
for $\emptyset \neq u \subseteq \{1{:}s\}$. Regarding this decomposition of multivariate functions, the following properties hold. We refer to Lemmas A.1 & A.3 of Owen (2019) for the proof of the case where $\varphi$ is the uniform distribution over the unit interval $[0,1]$.
Lemma 1. With the notation above, we have:
1. For any non-empty $u \subseteq \{1{:}s\}$ and $j \in u$,
$$\int_{D} f_u(\boldsymbol{x}_u)\,\varphi(x_j)\,\mathrm{d}x_j = 0.$$
2. For any $u \neq v$ with $u, v \subseteq \{1{:}s\}$,
$$\int_{D^s} f_u(\boldsymbol{x}_u)\,f_v(\boldsymbol{x}_v)\,\rho(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} = 0.$$
It follows from the second assertion of Lemma 1 that
$$\int_{D^s}\big(f(\boldsymbol{x})-\mu\big)^2\rho(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} = \sum_{\emptyset\neq u\subseteq\{1{:}s\}} \int_{D^{|u|}} f_u^2(\boldsymbol{x}_u)\prod_{j\in u}\varphi(x_j)\,\mathrm{d}\boldsymbol{x}_u.$$
This equality means that the variance of $f$ can be expressed as a sum of the variances of the lower-dimensional functions $f_u$:
$$\sigma^2(f) = \sum_{\emptyset\neq u\subseteq\{1{:}s\}} \sigma_u^2(f), \quad\text{where}\quad \sigma_u^2(f) := \int_{D^{|u|}} f_u^2(\boldsymbol{x}_u)\prod_{j\in u}\varphi(x_j)\,\mathrm{d}\boldsymbol{x}_u.$$
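As a small, self-contained illustration of this identity (our toy example, not one from the paper), take $f(x_1, x_2) = x_1 x_2$ with $\varphi$ uniform on $[0,1]$: the ANOVA terms are available in closed form, they reassemble $f$ exactly, and the component variances sum to the total variance $\sigma^2(f) = 7/144$.

```python
# Closed-form ANOVA terms of f(x1, x2) = x1 * x2 under the uniform density on [0,1]^2
# (our own toy example): f_empty = 1/4, f_{1}(x1) = x1/2 - 1/4, f_{2}(x2) = x2/2 - 1/4,
# f_{1,2}(x1, x2) = (x1 - 1/2)(x2 - 1/2).
mu = 0.25
f_1 = lambda x: x / 2 - 0.25
f_2 = lambda x: x / 2 - 0.25
f_12 = lambda x, y: (x - 0.5) * (y - 0.5)

def integrate2d(g, n=200):
    """Midpoint rule on an n-by-n grid over the unit square."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(g(x, y) for x in pts for y in pts) * h * h

# The ANOVA terms reassemble f exactly:
recon_err = integrate2d(lambda x, y: abs(x * y - (mu + f_1(x) + f_2(y) + f_12(x, y))))
# Total variance equals the sum of the component variances (= 7/144 here):
sigma2 = integrate2d(lambda x, y: (x * y - mu) ** 2)
parts = integrate2d(lambda x, y: f_1(x) ** 2 + f_2(y) ** 2 + f_12(x, y) ** 2)
```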
Using these facts, the variance of the TMC estimator can be analyzed as follows:

Theorem 2.1. For the TMC estimator $\hat{\mu}_{\mathrm{TMC}}$, the following holds:
1. $\mathbb{E}[\hat{\mu}_{\mathrm{TMC}}] = \mu$.
2. $$\mathbb{V}[\hat{\mu}_{\mathrm{TMC}}] = \frac{\sigma^2(f)}{N} + \frac{2}{N^2}\sum_{h=1}^{\min(s,N)-1}(N-h)\sum_{\emptyset\neq u\subseteq\{1{:}s-h\}} \mathbb{E}[f_u f_{u+h}],$$
where we write $u + h := \{j + h : j \in u\}$.
Note that, in the theorem, we write
$$\mathbb{E}[f_u f_{u+h}] = \int_{D^{|u|}} f_u(\boldsymbol{w})\,f_{u+h}(\boldsymbol{w})\prod_{j=1}^{|u|}\varphi(w_j)\,\mathrm{d}\boldsymbol{w},$$
that is, both ANOVA terms are evaluated at the same set of variables. The reader should not confuse this with
$$\int_{D^s} f_u(\boldsymbol{x}_u)\,f_{u+h}(\boldsymbol{x}_{u+h})\,\rho(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x},$$
which equals 0 by the second assertion of Lemma 1.
Proof. The first assertion follows immediately from the linearity of expectation and the trivial equality $\mathbb{E}[f(\boldsymbol{x}_n)] = \mu$. For the second assertion, by using the ANOVA decomposition of $f$, we have
$$\mathbb{V}[\hat{\mu}_{\mathrm{TMC}}] = \frac{1}{N^2}\sum_{m=1}^{N}\sum_{n=1}^{N}\mathbb{E}\big[(f(\boldsymbol{x}_m)-\mu)(f(\boldsymbol{x}_n)-\mu)\big] = \frac{\sigma^2(f)}{N} + \frac{1}{N^2}\sum_{\substack{m,n=1\\ m\neq n}}^{N}\sum_{\substack{u,v\subseteq\{1{:}s\}\\ u,v\neq\emptyset}}\mathbb{E}\big[f_u(\boldsymbol{x}_{m,u})\,f_v(\boldsymbol{x}_{n,v})\big].$$
It follows from the first assertion of Lemma 1 that the second term on the right-most side above becomes
$$\frac{1}{N^2}\sum_{\substack{m,n=1\\ m\neq n}}^{N}\sum_{\substack{u,v\subseteq\{1{:}s\}\\ u,v\neq\emptyset}}\mathbb{E}\big[f_u(\boldsymbol{x}_{m,u})\,f_v(\boldsymbol{x}_{n,v})\big] = \frac{2}{N^2}\sum_{h=1}^{N-1}(N-h)\sum_{\substack{u,v\subseteq\{1{:}s\}\\ u,v\neq\emptyset}}\mathbb{E}\big[f_u(\boldsymbol{x}_{1,u})\,f_v(\boldsymbol{x}_{1+h,v})\big],$$
where we reordered the sum over $m$ and $n$ with respect to the difference $h = |n-m|$ in the last equality.
If $h \geq s$, there is no overlapping of the components between $\boldsymbol{x}_1$ and $\boldsymbol{x}_{1+h}$. Because of the independence of the samples, it follows from the first assertion of Lemma 1 that the inner sum over $u$ and $v$ above is given by
$$\sum_{\substack{u,v\subseteq\{1{:}s\}\\ u,v\neq\emptyset}}\mathbb{E}\big[f_u(\boldsymbol{x}_{1,u})\big]\,\mathbb{E}\big[f_v(\boldsymbol{x}_{1+h,v})\big] = 0.$$
If $h < s$, on the other hand, we have $x_{1+h,\,j+h} = x_{1,\,j}$ for any $j \in \{1{:}s-h\}$. With this equality and the first assertion of Lemma 1, the inner sum over $u$ and $v$ becomes
$$\sum_{\substack{u,v\subseteq\{1{:}s\}\\ u,v\neq\emptyset}}\mathbb{E}\big[f_u(\boldsymbol{x}_{1,u})\,f_v(\boldsymbol{x}_{1+h,v})\big] = \sum_{\emptyset\neq u\subseteq\{1{:}s-h\}}\mathbb{E}[f_u f_{u+h}].$$
Altogether we obtain
$$\mathbb{V}[\hat{\mu}_{\mathrm{TMC}}] = \frac{\sigma^2(f)}{N} + \frac{2}{N^2}\sum_{h=1}^{\min(s,N)-1}(N-h)\sum_{\emptyset\neq u\subseteq\{1{:}s-h\}}\mathbb{E}[f_u f_{u+h}].$$
Thus we are done. ∎
As is clear from Theorem 2.1, the TMC estimator is unbiased and maintains the canonical “one over square root of $N$” convergence. Moreover, the TMC estimator can be regarded as a variance reduction technique, since the second term in the variance can be negative, depending on the function.
To illustrate the last comment, let us consider a simple test function given by
$$f(\boldsymbol{x}) = x_1 - x_2 + x_3,$$
and let $x_1, x_2, x_3$ be normally distributed independent random variables with mean 0 and variance 1. It is easy to see that $\mu = 0$ and that the only non-vanishing ANOVA terms are $f_{\{1\}}(x_1) = x_1$, $f_{\{2\}}(x_2) = -x_2$ and $f_{\{3\}}(x_3) = x_3$, so that $\sigma^2(f) = 3$. Then it follows that
$$\mathbb{V}[\hat{\mu}_{\mathrm{MC}}] = \frac{3}{N},$$
whereas, for $N \geq 3$, we have
$$\mathbb{V}[\hat{\mu}_{\mathrm{TMC}}] = \frac{3}{N} + \frac{2}{N^2}\Big[(N-1)\big(\mathbb{E}[f_{\{1\}}f_{\{2\}}] + \mathbb{E}[f_{\{2\}}f_{\{3\}}]\big) + (N-2)\,\mathbb{E}[f_{\{1\}}f_{\{3\}}]\Big] = \frac{3}{N} - \frac{2}{N} = \frac{1}{N}.$$
Therefore the variance of the TMC estimator is one-third of the variance of the standard Monte Carlo estimator.
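A short simulation can make the comparison tangible. The sketch below uses the simple test function $f(\boldsymbol{x}) = x_1 - x_2 + x_3$ with independent standard normal inputs (our illustrative choice); for this $f$, the standard Monte Carlo variance is $3/N$ while the TMC variance works out to $1/N$, a ratio of one-third.

```python
import random

S = 3
def f(x):
    """Test function x1 - x2 + x3 (our illustrative choice, standard normal inputs)."""
    return x[0] - x[1] + x[2]

def mc_estimate(N, rng):
    # Standard MC: N independent points, each needing S fresh draws.
    return sum(f([rng.gauss(0.0, 1.0) for _ in range(S)]) for _ in range(N)) / N

def tmc_estimate(N, rng):
    # TMC: one stream of N + S - 1 draws, nodes are reversed sliding windows.
    y = [rng.gauss(0.0, 1.0) for _ in range(N + S - 1)]
    return sum(f(y[n:n + S][::-1]) for n in range(N)) / N

def empirical_var(estimator, N, reps, rng):
    vals = [estimator(N, rng) for _ in range(reps)]
    m = sum(vals) / reps
    return sum((v - m) ** 2 for v in vals) / (reps - 1)

rng = random.Random(7)
v_mc = empirical_var(mc_estimate, 64, 4000, rng)    # theory: 3/64
v_tmc = empirical_var(tmc_estimate, 64, 4000, rng)  # theory: 1/64
```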
It is also possible that the variance of the TMC estimator increases compared to that of standard Monte Carlo; however, we show below that this increase is bounded.
2.2 Weighted space and tractability
Here we study the dependence of the variance on the dimension $s$. For this purpose, we first give a bound on the terms $\mathbb{E}[f_u f_{u+h}]$ appearing in Theorem 2.1. For $\emptyset \neq u \subseteq \{1{:}s\}$, let
$$\sigma_u(f) := \left(\int_{D^{|u|}} f_u^2(\boldsymbol{x}_u)\prod_{j\in u}\varphi(x_j)\,\mathrm{d}\boldsymbol{x}_u\right)^{1/2}.$$
Then it follows from the decomposition (2) that
$$f(\boldsymbol{x}) - \mu = \sum_{\emptyset\neq u\subseteq\{1{:}s\}} f_u(\boldsymbol{x}_u),$$
resulting in an equality
$$\sigma^2(f) = \sum_{\emptyset\neq u\subseteq\{1{:}s\}} \sigma_u^2(f).$$
Using Theorem 2.1, the variance is bounded above as follows.

Corollary 1. We have
$$\mathbb{V}[\hat{\mu}_{\mathrm{TMC}}] \leq (2s-1)\,\frac{\sigma^2(f)}{N}.$$

Proof. For any $h$ and any $\emptyset \neq u \subseteq \{1{:}s-h\}$, the Cauchy-Schwarz inequality and the inequality of arithmetic and geometric means lead to
$$\big|\mathbb{E}[f_u f_{u+h}]\big| \leq \sigma_u(f)\,\sigma_{u+h}(f) \leq \frac{\sigma_u^2(f) + \sigma_{u+h}^2(f)}{2}.$$
Applying this bound to the second assertion of Theorem 2.1, we obtain
$$\mathbb{V}[\hat{\mu}_{\mathrm{TMC}}] \leq \frac{\sigma^2(f)}{N} + \frac{2}{N^2}\sum_{h=1}^{\min(s,N)-1}(N-h)\sum_{\emptyset\neq u\subseteq\{1{:}s-h\}}\frac{\sigma_u^2(f) + \sigma_{u+h}^2(f)}{2} \leq \frac{\sigma^2(f)}{N} + \frac{2(s-1)}{N}\,\sigma^2(f) = (2s-1)\,\frac{\sigma^2(f)}{N}. \qquad\Box$$
Using this result, we have
$$\frac{\mathbb{V}[\hat{\mu}_{\mathrm{TMC}}]}{\mathbb{V}[\hat{\mu}_{\mathrm{MC}}]} \leq 2s-1.$$
Therefore, when we fix the number of samples $N$, the variance of the TMC estimator can be at most $2s-1$ times as large as the variance of the standard Monte Carlo estimator.
Now let us consider the case $N \geq s$ and assume, as discussed in the first section, that the computational time for the standard Monte Carlo estimator is proportional to $N_{\mathrm{MC}}\,s\,t$, whereas the computational time for the TMC estimator is proportional to $N_{\mathrm{TMC}}\,t\log N_{\mathrm{TMC}}$ (assuming that the main cost in evaluating $f$ lies in the computation of $\boldsymbol{x}_n A$). When we fix the cost instead of the number of samples, we have
$$N_{\mathrm{MC}}\,s \asymp N_{\mathrm{TMC}}\log N_{\mathrm{TMC}},$$
where $\asymp$ indicates that the terms should be of the same order, and so
$$\frac{\mathbb{V}[\hat{\mu}_{\mathrm{TMC}}]}{\mathbb{V}[\hat{\mu}_{\mathrm{MC}}]} \leq (2s-1)\,\frac{N_{\mathrm{MC}}}{N_{\mathrm{TMC}}} \asymp \log N_{\mathrm{TMC}}.$$
Thus, the variance of the TMC estimator for a given cost is at most $\log N_{\mathrm{TMC}}$ times as large as that of the standard Monte Carlo estimator (up to some constant factor). On the other hand, if there is some decay of the importance of the ANOVA terms as the index of the variable increases, for instance, if the first few terms in $\sum_{\emptyset\neq u}\sigma_u^2(f)$ dominate the sum, then the ratio $\mathbb{V}[\hat{\mu}_{\mathrm{TMC}}]/\mathbb{V}[\hat{\mu}_{\mathrm{MC}}]$ can be bounded independently of $s$, leading to a gain in the efficiency of the TMC estimator. We observe such behaviour in our numerical experiments below.
Following the idea from Sloan and Woźniakowski (1998), we now introduce the notion of a weighted space. Let $\gamma = (\gamma_u)_{u\subseteq\{1{:}s\}}$ be a sequence of non-negative real numbers called weights. Then the weighted space is defined by
$$F_\gamma := \left\{ f: D^s\to\mathbb{R} \;\middle|\; \|f\|_\gamma^2 := \sum_{\emptyset\neq u\subseteq\{1{:}s\}} \gamma_u^{-1}\,\sigma_u^2(f) < \infty \right\}.$$
For any subset $u$ with $\gamma_u = 0$, we assume that the corresponding ANOVA term $f_u$ is 0, and we formally set $0/0 = 0$.
For a randomized algorithm $\hat{\mu}$ using $N$ function evaluations of $f$ to estimate $\mu$, let us consider the minimal cost to estimate $\mu$ with mean square error at most $\varepsilon^2$ for any $f$ in the unit ball of $F_\gamma$:
$$N_{\min}(\varepsilon, s) := \min\left\{ N \in \mathbb{N} \;:\; \sup_{\|f\|_\gamma \leq 1} \mathbb{E}\big[(\hat{\mu} - \mu)^2\big] \leq \varepsilon^2 \right\}.$$
We say that the algorithm $\hat{\mu}$ is
• a weakly tractable algorithm if
$$\lim_{s+\varepsilon^{-1}\to\infty} \frac{\log N_{\min}(\varepsilon, s)}{s + \varepsilon^{-1}} = 0,$$
• a polynomially tractable algorithm if there exist non-negative constants $C, p, q$ such that
$$N_{\min}(\varepsilon, s) \leq C\,\varepsilon^{-p}\,s^{q}$$
holds for all $\varepsilon \in (0,1)$ and $s \in \mathbb{N}$, where $p$ and $q$ are called the $\varepsilon$-exponent and the $s$-exponent, respectively,
• a strongly polynomially tractable algorithm if $\hat{\mu}$ is a polynomially tractable algorithm with the $s$-exponent 0.
We refer to Novak and Woźniakowski (2010) for more information on the notion of tractability.
For instance, the standard Monte Carlo estimator is a strongly polynomially tractable algorithm with $\varepsilon$-exponent 2 if
$$\sup_{s\in\mathbb{N}}\,\max_{\emptyset\neq u\subseteq\{1{:}s\}}\gamma_u < \infty \qquad (3)$$
holds. This claim can be proven as follows: for any $f \in F_\gamma$ we have
$$\mathbb{V}[\hat{\mu}_{\mathrm{MC}}] = \frac{\sigma^2(f)}{N} = \frac{1}{N}\sum_{\emptyset\neq u\subseteq\{1{:}s\}}\gamma_u\,\frac{\sigma_u^2(f)}{\gamma_u} \leq \frac{1}{N}\left(\max_{\emptyset\neq u\subseteq\{1{:}s\}}\gamma_u\right)\|f\|_\gamma^2.$$
It follows that, in order to have
$$\mathbb{E}\big[(\hat{\mu}_{\mathrm{MC}} - \mu)^2\big] \leq \varepsilon^2$$
for any $f$ with $\|f\|_\gamma \leq 1$, it suffices to take $N \geq \varepsilon^{-2}\max_{\emptyset\neq u\subseteq\{1{:}s\}}\gamma_u$. Thus the minimal cost is bounded above by
$$N_{\min}(\varepsilon, s) \leq \left\lceil \varepsilon^{-2}\max_{\emptyset\neq u\subseteq\{1{:}s\}}\gamma_u \right\rceil.$$
Given the condition (3), we see that $N_{\min}(\varepsilon, s)$ is bounded independently of the dimension $s$, and the algorithm is strongly polynomially tractable with the $\varepsilon$-exponent 2.
The following theorem gives sufficient conditions on the weights for the TMC estimator to be a weakly tractable algorithm, a polynomially tractable algorithm, or a strongly polynomially tractable algorithm.
Theorem 2.2. The TMC estimator is
• a weakly tractable algorithm if
$$\lim_{s\to\infty} \frac{1}{s}\log\left(s\,\max_{\emptyset\neq u\subseteq\{1{:}s\}}\gamma_u\right) = 0,$$
• a polynomially tractable algorithm with the $\varepsilon$-exponent 2 if there exists $q \geq 0$ such that
$$\sup_{s\in\mathbb{N}}\, s^{1-q}\,\max_{\emptyset\neq u\subseteq\{1{:}s\}}\gamma_u < \infty,$$
• a strongly polynomially tractable algorithm with the $\varepsilon$-exponent 2 if
$$\sup_{s\in\mathbb{N}}\, s\,\max_{\emptyset\neq u\subseteq\{1{:}s\}}\gamma_u < \infty.$$
Proof. It follows from Corollary 1 and Hölder’s inequality for sums that
$$\mathbb{V}[\hat{\mu}_{\mathrm{TMC}}] \leq (2s-1)\,\frac{\sigma^2(f)}{N} \leq \frac{2s-1}{N}\left(\max_{\emptyset\neq u\subseteq\{1{:}s\}}\gamma_u\right)\|f\|_\gamma^2.$$