# Sparse Polynomial Chaos Expansions via Compressed Sensing and D-optimal Design

In the field of uncertainty quantification, sparse polynomial chaos (PC) expansions are commonly used by researchers for a variety of purposes, such as surrogate modeling. Ideas from compressed sensing may be employed to exploit this sparsity in order to reduce computational costs. A class of greedy compressed sensing algorithms use least squares minimization to approximate PC coefficients. This least squares problem lends itself to the theory of optimal design of experiments (ODE). Our work focuses on selecting an experimental design that improves the accuracy of sparse PC approximations for a fixed computational budget. We propose DSP, a novel sequential design, greedy algorithm for sparse PC approximation. The algorithm sequentially augments an experimental design according to a set of the basis polynomials deemed important by the magnitude of their coefficients, at each iteration. Our algorithm incorporates topics from ODE to estimate the PC coefficients. A variety of numerical simulations are performed on three physical models and manufactured sparse PC expansions to provide a comparative study between our proposed algorithm and other non-adaptive methods. Further, we examine the importance of sampling by comparing different strategies in terms of their ability to generate a candidate pool from which an optimal experimental design is chosen. It is demonstrated that the most accurate PC coefficient approximations, with the least variability, are produced with our design-adaptive greedy algorithm and the use of a studied importance sampling strategy. We provide theoretical and numerical results which show that using an optimal sampling strategy for the candidate pool is key, both in terms of accuracy in the approximation, but also in terms of constructing an optimal design.

## Authors

• 1 publication
• 13 publications
• 5 publications
07/02/2014

### Info-Greedy sequential adaptive compressed sensing

We present an information-theoretic framework for sequential adaptive co...
04/25/2018

### Improved Algorithms for Adaptive Compressed Sensing

In the problem of adaptive compressed sensing, one wants to estimate an ...
12/16/2019

### Robust Adaptive Least Squares Polynomial Chaos Expansions in High-Frequency Applications

We present an algorithm for computing sparse, least squares-based polyno...
10/28/2021

### Iterative and greedy algorithms for the sparsity in levels model in compressed sensing

Motivated by the question of optimal functional approximation via compre...
06/18/2018

### Gradient Descent-based D-optimal Design for the Least-Squares Polynomial Approximation

In this work, we propose a novel sampling method for Design of Experimen...
11/05/2015

### Sparse approximation by greedy algorithms

It is a survey on recent results in constructive sparse approximation. T...
04/28/2016

### Sequential Bayesian optimal experimental design via approximate dynamic programming

The design of multiple experiments is commonly undertaken via suboptimal...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Our understanding of complex scientific and engineering problems often stems from a general Quantity of Interest (QoI). Practical analysis, design, and optimization of complex engineering systems requires modeling physical processes and accounting for how uncertainties impact QoIs. Uncertainties may arise from variations in model inputs, measurements and data, or boundary and operating conditions. Much research has been done to quantify how the presence of uncertainty within a model manifests changes in a QoI ghanem2003stochastic ; le2010spectral ; xiu2010numerical . This problem is often studied in the field of Uncertainty Quantification (UQ).

A common approach in UQ for problems with random inputs involves expanding the QoI in a polynomial basis, referred to as a polynomial chaos expansion ghanem2003stochastic ; xiu2002wiener . One way to construct a PC expansion is to form a regression problem using Monte Carlo samples of the QoI. Often QoIs in scientific and engineering applications admit sparse PC expansions, i.e., the QoI can be approximated by a small subset of the polynomial basis functions which capture important features of the model. This work focuses on QoIs which admit sparse PC expansions as detailed below. Sparsity may be exploited to regularize the regression problem; a concept studied in the context of compressed sensing candes2008introduction ; donoho2006compressed ; candes2006robust ; elad2010sparse ; eldar2012compressed . In UQ, sparse PC expansions have been applied for a variety of different purposes doostan2011non ; blatman2011adaptive ; mathelin2012compressed ; jones2015postmaneuver ; sargsyan2014dimensionality ; yan2012stochastic ; yang2013reweighted ; peng2014weighted ; schiavazzi2014sparse ; west2014uncertainty ; jakeman2015enhancing ; hampton2015coherence ; bouchot2015compressed ; peng2016polynomial ; chkifa2016polynomial ; winokur2016sparse ; yang2016enhancing ; adcock2017infinite ; jakeman2017generalized .

Assume that the input parameters of our model are represented by a

-dimensional random vector

with independent, identically distributed entries and with some joint probability density function

. We wish to approximate an unknown scalar QoI, with finite variance, denoted by

. Let represent a multivariate orthogonal polynomial, then we may write our QoI using a PC expansion as

 u(Ξ)=∞∑k=0ckψk(Ξ). (1)

We truncate the expansion in (1) for computation, i.e., let so that

 u(Ξ)=P∑k=1ckψk(Ξ)+ϵ(Ξ)≈P∑k=1ckψk(Ξ), (2)

where represents the truncation error introduced by truncating the expansion to a finite number of terms. Often, in practice, many of the coefficients are negligible and thus admits a sparse representation of the form

 u(Ξ)≈∑k∈Cckψk(Ξ), (3)

where the index set has few elements, say , and we say that our QoI is approximately sparse in the polynomial basis.

The polynomials are selected with respect to the probability measure so that they are orthogonal, e.g., when

obeys a jointly uniform or Gaussian distribution (with independent components),

are multivariate Legendre or Hermite polynomials, respectively xiu2002wiener . We assume

is obtained by the tensorization of univariate polynomials orthogonal with respect to the probability density function of the coordinates of

, and that is of total order less than or equal to . This formulation implies that there are basis polynomials. Furthermore, we assume that are normalized such that , where denotes the mathematical expectation operator.

For , where is the number of independent samples considered, the computational model is evaluated for each realization of , which we denote , and yields a corresponding value of the QoI . The coefficients are approximated using an experimental design consisting of samples and their corresponding QoIs , which are related by the linear system , where

 Ψ(i,j):=ψj(ξi) and u:=[u(ξ1),⋯,u(ξN)]T. (4)

Further, let be a diagonal positive-definite weight matrix such that is a function of , which depends on the sampling strategy described in Section 2. Let and . Under this sampling strategy, we consider the linear system

 v≈Φc. (5)

In compressed sensing a sparse approximation of is obtained by solving the optimization problem

 ^c=argminc∥c∥0 subject to ∥v−Φc∥2≤δ, (6)

where measures the sparsity of . In (6), is a tolerance of solution inaccuracy due to the truncation of the expansion. While, the problem (6) is NP-hard to solve, approximate solutions may be obtained in polynomial time using a variety of greedy algorithms including orthogonal matching pursuit (OMP) tropp2005signal ; tropp2007signal ; needell2010signal ; davenport2010analysis , compressive sampling matching pursuit (CoSaMP) needell2009cosamp ; pal2016stochastic , and subspace pursuit (SP) dai2009subspace , or convex relaxation via -minimization candes2008introduction ; donoho2006compressed . The key advantage of an approximation via compressed sensing is that, if the QoI is approximately sparse, stable and convergent approximations of can be obtained using random samples of , as long as satisfies certain conditions candes2008introduction ; donoho2006compressed ; doostan2011non ; rauhut2012sparse ; hampton2015coherence ; adcock2017infinite . Reducing the number of QoI samples, while maintaining solution accuracy and stability, is one of the main goals in UQ, and motivates the present study.

We use the linear system to approximate the vector of coefficients , by first generating samples of , namely, and simulating the corresponding QoIs . Imagine that each simulation is costly, thereby limiting the number of simulations we are allowed to perform. Under these circumstances, a natural question to ask is, can we choose our samples in a strategic way that improves the approximation of ? This paper aims to answer this question by focusing on the construction of the weighted measurement matrix for the purpose of solving the optimization problem in (6). While our results are based on SP, the general ideas may be extended to, e.g., OMP or CoSaMP.

Let us briefly review the original SP algorithm (see Algorithm 1) with the specific focus of solving (6). Define to be the projection of onto the column space of , where is an estimate of the support set of , and , i.e., is the submatrix made from columns of corresponding to the set . Further, define to be the residual vector and to be an approximate upper bound on the number of non-zero coefficients in . Note that in practice the optimal value of (or the tolerance parameter ) is not known a priori and should be estimated, for instance, using a cross-validation procedure (doostan2011non, , Section 3.5) as discussed in Section 4.4.

A key step in SP algorithm (as well as in OMP and CoSaMP) involves solving an over-determined least squares problem corresponding to a subset of columns of . In SP, this problem is represented by step 2) of the iteration. This over-determined least squares problem lends itself to optimization techniques from optimal design of experiments (ODE) pukelsheim2006optimal ; sinha2014optimal . We propose the use of an alphabetic optimality criterion from ODE to sequentially augment the experimental design according to the support set estimate on any given iteration, which determines the construction of the weighted measurement matrix . The sequential augmentation is done such that once a sample is selected to be a member of the design it is never removed, i.e., samples and their corresponding QoI evaluations are never discarded while augmenting the experimental design. This constraint is necessary due to the computational cost of evaluating the QoI.

Compared to PC approximations via over-determined least squares approximation (LSA), where the use of ODE has been well explored, ODE for compressed sensing has received less attention. The idea of using ODE for the purpose of compressed sensing, specifically when prior information about the sparsity is known in advance, is not necessarily new davenport2015constrained ; chepuri2014compression . For instance, in davenport2015constrained , the authors show that improvements in signal reconstruction accuracy are possible under the assumption that the support (the locations of the non-zero entries) of a signal, with respect to a given basis, is either known or can be estimated. This was the case in their example dealing with Fourier measurements of Wavelet sparse signals (davenport2015constrained, , Section 4), where it was shown that using an alphabetic optimality criterion from ODE to sequentially choose measurements reduced the mean-squared error of signal reconstructions. However, the assumption that prior information regarding the support of is known in advance is often too restrictive in the context of PC expansions. In fact, for most scientific and engineering applications none of the coefficients are zero and the locations of the largest in magnitude coefficients are unknown. Hence, the motivation for our approach.

### 1.1 Contributions of this paper

We propose a quasi-random sampling strategy for building experimental designs which uses a studied optimal importance sampling and an alphabetic optimality criterion from ODE as described in Sections 2.3 and 3, respectively. This strategy is exploited by a modified SP algorithm which builds an experimental design sequentially and chooses input samples from a candidate pool while simultaneously constructing an approximation of the PC coefficients at each iteration. A similar idea was presented in fajraoui2017optimal , where alphabetic optimal criteria from ODE were employed to sequentially build experimental designs with least angle regression (LAR) for PC expansions.

Our work differs from fajraoui2017optimal where the authors concluded that sequential design augmentation using -optimality (an alphabetic optimal criterion) showed poor performance relative to other sampling techniques, whereas our results indicate otherwise. Further, our work is based on the SP algorithm rather than LAR. This work also introduces a novel method for sequential design augmentation based on the ideas presented in seshadri2016effectively . The method uses a QR with column pivoting algorithm to construct and augment designs which allows for simple numerical implementation compared to existing greedy or exchange algorithms. We show that our modified SP algorithm reduces relative error and variability in the approximated PC coefficients compared to the standard SP algorithm. We provide evidence to support this claim in Section 5 by investigating manufactured sparse PC expansions with additive noise, a mathematical model for a Duffing oscillator, the Ishigami function, and a wing weight function. Finally, we provide theoretical and numerical results in Sections 3.4 and 5, respectively, which show that using an optimal sampling strategy for the candidate pool is beneficial, in terms of constructing an optimal design and the solution accuracy.

## 2 Sampling

In this section we outline the sampling method used in this work. First, we highlight some preliminary sampling definitions. Second, we consider sampling according to the random variables defined by the orthogonality measure

, which is often used in PC approximation and referred to as standard Monte Carlo sampling; see hosder2006non ; le2010spectral ; doostan2011non ; mathelin2012compressed . Third, we outline the coherence-optimal sampling strategy, as it has been shown to improve the stability and accuracy of over-determined least squares PC approximations relative to standard Monte Carlo sampling in either the Legendre or Hermite bases hampton2015coherence . This discussion on sampling methods is motivated by the over-determined least squares problems solved within SP as mention in Section 1 and specified in Step 2) of the iteration in Algorithm 1.

### 2.1 Sampling Preliminaries

It is necessary here to outline some preliminary definitions for the topics in Section 2.3. We follow the discussion and notation of (hampton2015coherence, , Section 2.1) but limit our presentation to polynomials bounded over their domain of orthogonality measure. Extensions to unbounded polynomials, e.g., for Gaussian inputs and Hermite polynomials, can be found in hampton2015coherence . Recall the set of basis polynomials defined in Section 1 and define to be

 B(ξ):= ⎷P∑k=1|ψk(ξ)|2. (7)

Here, represents a uniformly least upper bound on the sum of squares of the basis polynomials considered. A bound on may be attained from

 B2(ξ)≤Psupk=1:P|ψk(ξ)|2, (8)

where bounds on are known for certain types of orthogonal polynomials hampton2015compressive ; rauhut2012sparse ; szeg1939orthogonal ; dominici2007asymptotic ; askey1965mean ; muckenhoupt1970asymptotic ; nevai1994generalized . Hence, and

 c=(∫f(ξ)B2(ξ))−1/2 (9)

is such that

 c2∫f(ξ)B2(ξ)dξ=1, (10)

and

 fY(ξ):=c2f(ξ)B2(ξ), (11)

defines a probability density for the variable .

This formulation is designed to identify distributions for . However, under these conditions we can no longer guarantee that , in which case the polynomials are not necessarily orthogonal. If we let

 w(Y):=1cB(Y), (12)

then the weight function ensures that are orthonormal random variables. This construction motivates how we define the diagonal positive-definite weight matrix from (6), i.e.,

 W(i,i)=w(ξi), (13)

where is the th realization of . We denote all realized random vectors by without regard for the corresponding sampling distribution employed and we note that the weight function depends on the sampling strategy.

While our work focuses on compressed sensing, where , we first highlight some important, overlapping concepts from LSA. In LSA of PC expansions, typically , and the approximated PC coefficients are given by

 ^c=argminc∥Φc−v∥2 (14)

In this case, is computed by solving the system of normal equations

 ΦTΦ^c=ΦTv. (15)

Even though (6) differs from the LSA in (14), greedy methods like SP for solving (6) involve an over-determined LSA to construct . This fact motivates our interest in optimizing the LSA in SP.

### 2.2 Standard sampling

Standard Monte Carlo sampling involves constructing the samples according to , the orthogonality measure for a given PC basis. This sampling strategy implies equal weighting, i.e., and . In the case of

-dimensional Legendre polynomials, the standard method is to sample independently from the uniform distribution on

. In the case of -dimensional Hermite polynomials, the standard method is to sample each of the coordinates as an independent standard normal random variable.

### 2.3 Coherence optimal sampling

The coherence parameter , defined in hampton2015coherence as

 μ(Y):=supξP∑j=1|w(ξ)ψj(ξ)|2, (16)

plays a key role in the stability and convergence of least squares PC approximation cohen2013stability ; hampton2015coherence ; cohen2016optimal . A smaller results in more stable and accurate LSAs (hampton2015coherence, , Theorem 2.1). This result motivated the design of a random sampling strategy referred to as coherence-optimal sampling hampton2015coherence . Coherence-optimal sampling seeks to find a sampling measure to minimize . This strategy involves sampling according to the distribution defined by (11) for a normalizing constant . We mention that, is the measure for which the basis polynomials are naturally orthogonal. We define the weight function as

 w(ξ):=1B(ξ). (17)

In the case where LSA is applied to all basis funcitons, and with the weight function as in (17), sampling from (11) leads to the minimum possible compared to any other sampling measure (hampton2015coherence, , Theorem 3.3)

. Coherence-optimal sampling is performed with a Markov Chain Monte Carlo (MCMC) sampler to minimize the coherence parameter defined by (

16); see (hampton2015coherence, , Section 4.3.1). To generate coherence-optimal samples, we use the Matlab code available at www.github.com/CU-UQ.

The MCMC sampler must be given a proposal distribution. When , we use as proposal distributions, standard normal in the case of Hermite polynomials, and uniform over in the case of Legendre polynomials, where samples are drawn independently. When , samples are independently drawn from a uniform distribution on a -dimensional ball of radius for Hermite polynomials as in (hampton2015coherence, , Section 3.2), and a -dimensional Chebyshev distribution for Legendre polynomials. For more information on these proposal distributions see hampton2016compressive ; dominici2007asymptotic . When the cost of evaluating the QoI is expensive, the construction of the samples is not typically a computational bottleneck. Hence, the extra cost of the MCMC sampling is justifiable.

Coherence-optimal sampling ensures stable computation of via (15), with a number of QoI computations that depends linearly (up to a logarithmic factor) on the number of coefficients, i.e., (hampton2015coherence, , Theorem 2.2). Further, as this approach was applied to various numerical examples, it was empirically demonstrated that coherence-optimal sampling leads to either similar or considerably more accurate LSAs in comparison to sampling from hampton2015coherence ; hadigol2017least . In Section 5.2, we also demonstrate this improvement in accuracy for the case of when using SP.

## 3 Optimal design of experiments

Constructing a surrogate model requires sampling the input parameter space and performing experiments, whether physical or computational. The planning of this experimental procedure prior to conducting the experiment is referred to as design of experiments (DoE). Often, experiments are expensive and time-consuming and inputs should be selected in order to extract as much information as possible for a given amount of experimental effort, this is the study of ODE. Historically, work on ODE dates back to 1918 smith1918standard , which was expanded upon a few decades later in kiefer1985collected . ODEs are commonly used in the context of least squares regression box2005statistics ; atkinson2007optimum ; fedorov1972theory ; fedorov2012model ; pukelsheim2006optimal . For a brief review and interpretation of a major class of ODE, known as alphabetic optimal design, related to least squares PC approximation, the interested reader is referred to (hadigol2017least, , Section 4.5).

In this work, we seek to construct surrogate models in the context of sparse PC expansions. This construction is performed using a greedy compressed sensing algorithm, and a key feature of this algorithm is the solution of over-determined least squares sub-problems. Our approach is to apply ODE to these sub-problems. To explain this ODE strategy, in Section 3.1 we briefly review the -optimality criterion, an alphabetic optimality criterion that is widely used in ODE and exclusively focused on in this work. Section 3.2 describes some conventional methods used to construct -optimal designs. The primary method for constructing -optimal designs in this work is a QR factorization with column pivoting algorithm which is outlined in Section 3.3. We conclude by providing some theoretical results relevant to -optimal designs in Section 3.4.

### 3.1 D-optimal designs

Typically in ODE, the design points are chosen according to an alphabetic optimality criterion, which is a scalar function of the so-called information matrix defined as

 M:=1NΦTΦ. (18)

The matrix

plays an important role in the stability of LSAs, described by its deviation from the identity matrix

hampton2015coherence ; hadigol2017least . An important observation is that does not depend on any realized values of the QoI , and this means that different designs may be compared in terms of to judge their relative optimality prior to any simulations of the QoI.

-optimal (or determinant optimal) designs are obtained by maximizing the determinant of the information matrix, i.e., maximizing

 ϕD(M):=|M|1/P, (19)

where the factorization is a convenient normalization that has been used in the literature. -optimal designs fall into the category of estimation-oriented optimal designs jones2012optimal , which focus on the precise estimation of the coefficients , thereby improving surrogate accuracy. An equivalent formulation involves minimizing the determinant of the inverse information matrix, i.e., minimizing fedorov2012model .

###### Remark 3.1.

In fajraoui2017optimal , -optimal designs were employed in the context of sequential sparse PC approximation, and compared to a closely related criterion based on the objective function

 ϕS(Φ):=⎛⎜ ⎜⎝√|ΦTΦ|∏Pi=1∥Φ(i)∥2⎞⎟ ⎟⎠1/P, (20)

where represents the th column of . The -optimality criterion given by (20), was originally presented in shin2016nonadaptive as a method of point selection for LSA.

### 3.2 Construction of alphabetic optimal designs

Unlike the random sampling strategies of Sections 2.2 and 2.3, ODE offers deterministic sampling methods to improve the PC approximation of . In general, however, the alphabetic optimal designs are constructed by generating, either randomly or deterministically, a large number of candidate samples such that the selected optimal design depends on the choice of candidates. In this regard, we here consider ODE as quasi-random sampling. As we shall justify in Section 3.4, this work proposes using coherence-optimal sampling to generate candidate samples for ODE.

Let and be a pool of candidate samples generated with respect to the orthogonality measure in the case of standard Monte Carlo sampling, or (11) in the case of coherence-optimal sampling. Let represent the matrix of basis polynomials corresponding to the candidate samples and let be the appropriate weight matrix, then we define the candidate measurement matrix as . As previously mentioned, the information matrix does not involve evaluating the QoI . This fact implies that when the computational bottleneck is the evaluation of the QoI for any given realization , the additional computational cost of constructing an optimal design is justifiable. Hence, this pre-processing of the candidate sample pool can improve PC approximation in a cost-effective manner.

The problem of finding an exact -optimal design, i.e., choosing out of rows of maximizing , is NP-hard, and this fact has motivated the development of relaxation techniques. Two common methods for constructing alphabetic optimal designs are exchange mandal2015algorithmic ; smucker2010design ; fedorov1972theory ; cook1980comparison ; mitchell1974algorithm ; wynn1970sequential ; johnson1983some ; atkinson1989construction ; meyer1995coordinate and greedy dykstra1971augmentation ; song2009netquest ; shin2016nonadaptive ; gammerman2016conformal

algorithms. Heuristic exchange algorithms were among the earliest search methods proposed for the construction of optimal designs

mandal2015algorithmic ; smucker2010design . These exchange algorithms were developed originally for -optimal designs because they were computationally more feasible in comparison to other criteria fedorov2012model , and it has been shown that -optimal designs perform well compared to other criteria atkinson2007optimum . For comparisons of performance between different exchange algorithms see cook1980comparison ; johnson1983some ; pronzato2008optimal ; nguyen1992review . Greedy algorithms such as (hadigol2017least, , Algorithm 1) involve starting with a random seed, i.e., a random row of the candidate matrix, then iteratively and exhaustively searching the entire remaining candidates to build the experimental design row-by-row. Both exchange and greedy algorithms involve exhaustive searches of the candidate matrix at each iteration and can be computationally expensive when is large. To avoid the computational cost of exhaustive searches, we instead employ another greedy approach based on QR factorizations with column pivoting to build -optimal designs which is discussed further in Section 3.3.

### 3.3 QR factorization with column pivoting on ΦTc

In this work, we use QR factorizations with column pivoting to construct -optimal designs. This idea is based on the work of sommariva2009computing who originally used QR factorization with column pivoting to approximate Fekete points on compact multivariate domains, and more recently seshadri2016effectively , where QR factorizations with column pivoting were used to sub-sample design points from a tensor grid for generating least squares polynomial approximations. In this section, we briefly review QR factorization with column pivoting and the subsampling method proposed in seshadri2016effectively with the specific purpose of subsampling rows of to generate -optimal designs.

QR factorization with column pivoting is a greedy heuristic commonly used for solving rank deficient problems. This method works by (i) determining the numerical rank of a matrix, and (ii) permuting the columns of the matrix such that the first columns are linearly independent hansen2012least ; golub2012matrix ; gu1996efficient . For this reason, QR with column pivoting is sometimes referred to as rank revealing QR (RRQR). Consider applying this heuristic to sub-select rows of . Let , then there exists a RRQR factorization

 ΦTcP=Q(R1R2), (21)

where is orthogonal; is nonsingular, well conditioned, and upper-triangular; and gu1996efficient . In (21), is a permutation matrix that permutes the columns of such that the absolute value of the diagonal entries of are in descending order. Let be a vector that converts the pivots encoded in the matrix to the specific rows of , i.e., determines which rows to select from with

 π:=PTn, (22)

where and the vector contains ordered indices corresponding to rows of that are subselected via the RRQR factorization. Let be the first entries of , then we define the -point, -optimal design as

 ΦN=Φc(πN,:), (23)

where is the submatrix of constructed by selecting the rows of indexed by .

Subset selection is a similar process where the aim is to produce a well-conditioned submatrix of with linearly independent rows. This process can produce a submatrix with smaller condition number compared to that given by RRQR alone seshadri2016effectively ; golub2012matrix

. Subset selection can be accomplished in two steps. The first step is to compute the singular value decomposition (SVD) of

. The second step is to compute the RRQR decomposition of the transpose of the first right-singular vectors of , i.e.,

 V(:,1:P)TPT=QR, (24)

where the columns of encode the permutations. Once this computation is performed, equations (22) and (23) may be used with to determine . It should be mentioned, however, that subset selection is more expensive as it requires both an SVD and QR computation. The additional cost is justified when the QoI evaluations are computationally expensive.

### 3.4 Theory relevant for the design of experiments

We conclude this section by proposing two key heuristics related to the -optimality criterion. The first heuristic we consider is that a candidate set generated with coherence-optimal samples, as described in Section 2.3, will likely produce designs with larger values of compared to a candidate set that is constructed via standard Monte Carlo sampling. This is demonstrated empirically in Section 5.1. We justify the use of coherence-optimal samples relevant to the -optimality of arbitrary sub-matrices of corresponding to basis functions. As we shall explain in Section 4.3, our modified SP algorithm selects designs from such sub-matrices of . To investigate -optimality of these matrices, and following hampton2017basis , we consider a coherence parameter

 μK:=supξ∑j∈S|w(ξ)ψj(ξ)|2,

associated with any set of size . We mention that does not exceed the coherence parameter associated with the basis polynomials, i.e., where is defined as in (16). This bound on is a consequence of (hampton2015coherence, , Theorem 3.3).

###### Theorem 3.1.

Consider a basis of polynomials. Let be an information matrix associated with a set of coherence-optimal samples, and their associated weights, with a coherence . For

 P[ϕD(M)≤t]≤2Kexp[−cMμ−1K(1−t)], (25)

where is defined in (19).

Proof: See Appendix B for a proof of this theorem.

###### Remark 3.2.

We emphasize that the coherence parameter does depend on the polynomial basis functions. Particularly, for the SP algorithm, the set of polynomials is not determined at the time of sampling, so the coherence in (25) is not known a priori. However, the bound on the probability given by (25) decreases exponentially as decreases. This decay is significant because the coherence-optimal sampling strategy of Section 2.3 specifically minimizes the coherence parameter , which in turn bounds .

The second heuristic is that the RRQR method described in Section 3.3 identifies a subset of samples to form a design such that the corresponding information matrix has a large determinant. Consider the partial QR factorization and let be the rank of , where

 R=(ABC), (26)

denotes the triangular matrix in the partial QR factorization of , such that has positive diagonal elements. Let identify the ,th entry of a matrix. The second heuristic is justified in part by (gu1996efficient, , Lemma 3.1), which we restate here as Theorem 3.2.

###### Theorem 3.2.

(gu1996efficient, , Lemma 3.1) If denotes the triangular matrix after a pivot exchange that interchanges the th and th columns of such that

 ¯R=(¯A¯B¯C), (27)

then

 det(¯A)det(A)=√(A−1B)i,j+(∥C(:,j)∥2∥A−1(i,:)∥2)2. (28)

Theorem 3.2 shows that a RRQR algorithm, such as (gu1996efficient, , Algorithm 3), actively insures that the is monotonically increasing with repeated permutations, as the exchanges done for pivoting at each iteration may be chosen to maximize (28). Consequently, Theorem 3.2 shows that the columns of may be selected by the RRQR pivots as in (23) to produce a design with non-decreasing values of . Notice that as RRQR is a greedy algorithm and only achieves a local minima. That is, upon completion of the column pivoting, Theorem 3.2 does not necessarily guarantee that the constructed -optimal designs have larger value of than a design constructed by randomly selecting columns of . However, in Section 5, we do show that the PC approximations corresponding to designs selected by RRQR via (23) consistently outperform approximations where designs are constructed by randomly selecting columns of , for a variety of QoIs. We also mention that a RRQR algorithm such as (gu1996efficient, , Algorithm 3) may be used to efficiently construct -optimal designs as it requires in the worst case scenario floating-point operations.

## 4 Improving Subspace Pursuit

In this section, we propose two methods for improving PC approximation with SP. These methods are investigated numerically in Section 5. The first method focuses exclusively on the sampling strategy employed to select input samples for the standard SP algorithm. The second method is the main contribution of this work and uses the same sampling strategy as the first but involves modifying the standard SP algorithm to sequentially construct an experimental design based on the current PC approximation given at any iteration of the algorithm, as opposed to constructing the entire design before executing SP.

The motivation for building an experimental design sequentially is simple. The sparsity support set of is a priori unknown, but is needed for experimental design. In many iterative compressed sensing algorithms, an estimate of is obtained at each iteration, which can be used to generate, or more precisely augment, a design. Some examples of these algorithms include OMP, CoSaMP, and SP tropp2005signal ; needell2009cosamp ; dai2009subspace . Ideally, would contain the indices corresponding to the largest, in magnitude, coefficients in . In practice, once a sample point is added to the experimental design and its corresponding QoI has been computed, it should not be removed as it provides valuable information regarding the QoI. For computationally expensive models, we aim to show that it is advantageous to start with a relatively small experimental design to construct a PC approximation, then sequentially add new samples to the design and update the approximation. To perform this task we introduce a sequential strategy which augments the experimental design based on the current estimated support set .

This portion of the paper is organized as follows. In Section 4.1, we introduce an improved sampling strategy for SP. In Section 4.2 we present a QR-based approach for sequential design augmentation. In Section 4.3, we outline the modified SP algorithm with sequential sampling. Finally, in Section 4.4, we describe a simple cross-validation procedure for , an approximate upper bound on the sparsity of , which is necessary for the numerical experiments presented in Section 5.

### 4.1 Subspace pursuit with D-coherence-optimal sampling

The coherence-optimal sampling strategy presented in Section 2.3 is known to produce at least as accurate as PC approximations compared to standard Monte Carlo sampling from for both LSA and -minimization hampton2015coherence ; hampton2015compressive ; hadigol2017least ; alemazkoor2017near ; hampton2017basis . Further, constructing -optimal designs from a large pool of candidate samples, regardless of how the candidate samples are generated, can improve least squares PC approximation accuracy compared to designs constructed randomly jones2012optimal ; hadigol2017least . In this work, we combine these two sampling strategies to create a random, then deterministic sampling strategy which we call D-coherence-optimal sampling. Specifically, -coherence-optimal sampling involves first generating a large pool of coherence-optimal samples by sampling from (11), then constructing an -point, -optimal design from the candidate pool according to (23). In Section 5, we demonstrate that SP with -coherence-optimal sampling outperforms coherence-optimal sampling in a variety of problems.

### 4.2 Design adaptation using RRQR

As mentioned in the introduction to this section, once a sample point is added to an experimental design and its corresponding QoI value has been computed, the sample point should not be removed from the design as it provides valuable information regarding the QoI. At each iteration of SP, an approximate coefficient vector is obtained via LSA; see step 2) of the iteration in Algorithm 1. This LSA is computed by selecting columns of which are most correlated, in magnitude, with the residual vector; see step 1) of the iteration in Algorithm 1. The indices of these columns are represented by the estimated support set , where . The goal of design adaptation is to add a sample point from the candidate pool to the experimental design such that the sample is chosen via -optimality from the set according to , which is equivalent to selecting a row from . It is important to point out that instead of considering the full candidate matrix, the adaptation considers the submatrix consisting of columns of corresponding to . This process is performed such that each point in the final experimental design is a unique member of the candidate pool . In this section, we outline a QR-based approach for performing the design adaptation we have described. Because this approach uses RRQR, it avoids calculating determinants explicitly which increase rapidly for large values of and , and can be problematic for constructing -optimal designs, as described in Section 3.2. Instead of computing determinants explicitly, relative changes in determinants as in (28), which are less expensive to compute, are all that is required.

Let be an -point, -optimal design as in (23) for some candidate matrix , and let be the submatrix of which is constructed from the columns of indexed by . Similarly, let be the submatrix of which is constructed from the columns of indexed by . For our design adaptation, it is necessary to construct an approximation of using . We may write this approximation as

 ~ΦTNΠ≈~ΦTc, (29)

where and rank min. When using SP, dai2009subspace , therefore rank. The matrix may be solved for using least squares regression where . Now, let the RRQR factorization of be

 (~ΦTc−~ΦTNΠ)~P=~Q(~R1~R2). (30)

With defined as in (22), we let

 ~π:=~PTn, (31)

and the adapted design may be represented by

 πN+n=πN∪{the first n elements of ~π (that are not in πN)}. (32)

Hence, the augmented design matrix may be expressed as

 ΦN+n=Φc(πN+n,:). (33)

### 4.3 D-optimal Subspace Pursuit

We now have all of the necessary material to present the -optimal subspace pursuit (DSP) algorithm, which exploits the benefits of -optimality to sequentially build experimental designs and perform PC approximation. DSP works by selecting rows of a candidate matrix with the methods discussed in Sections 3.3 and 4.2. Assume we are allowed a fixed computational budget , which is the maximum number of QoI evaluations to be used in order to approximate . DSP first constructs an initial -optimal design using the RRQR method via (23) with subset selection. The initial design is constructed using fewer than samples from a candidate pool . At each iteration, DSP updates the estimated support set , the design is sequentially augmented according to by (33), and one QoI evaluation is performed. Once samples have been selected and their corresponding QoIs have been computed, DSP is designed to approximate in exactly the same way as SP, i.e., DSP always performs SP iterations once . Note that only once samples are placed in the experimental design is the complete measurement matrix formed. SP in contrast, takes as inputs a complete measurement matrix and the corresponding QoIs, which are computed a priori. The main steps of DSP are detailed in Algorithm 2.

As mentioned in Section 1, generally speaking, the optimal value of is not known in advance and can be estimated using a cross-validation procedure. We propose one such procedure for in Section 5; see Algorithm 3. If the optimal value of is unknown one should employ a slightly modified version of DSP; see Algorithm 4 in Appendix A.

In this work, we chose the size of the initial design to be of size max as an educated guess and do not claim that it is in any way optimal. In the context of sequential sampling, existing literature does not have a well established rule of thumb for the number of initial samples. In bernardo1992integrated , the authors suggest that the initial design should contain at least three observations per input variable. In jones1998efficient , it is recommended that the initial design be larger using up to ten observations per input variable. The choice of is based on fajraoui2017optimal , where it was argued that the optimal initial design size was likely to be problem dependent for design adaptive PC approximation using LAR, but the best performance is achieved with a relatively large initial experimental design. For DSP, the initial design must be at least of size , otherwise the LSA in step 2) of the iteration may not be well-posed. However, in most of the numerical examples presented in Section 5, and so the initial design consists of rows of the candidate matrix. Ideally, the initial design would be relatively small to obtain initial information regarding the support estimate , and this information would be iteratively updated. If the initial design is too small, however, the support estimate could be inaccurate, leading to poor optimization of the -optimal criterion in the early iterations. Even if a support index is falsely identified, DSP, SP, and CoSaMP allow for it to be removed from in later iterations dai2009subspace ; needell2009cosamp . This feature is in contrast to the widely used OMP algorithm, which does not allow for support indices to be removed once they are added to the set tropp2007signal .

We recommend using the subset selection method as described in Section 3.3 for constructing the initial design; see step 2) of the initialization in Algorithm 2. Recall that subset select involves both SVD and RRQR computations, but can result in a designed measurement matrix with a smaller condition number compared to using RRQR alone seshadri2016effectively ; golub2012matrix .

### 4.4 Cross validation for K

In most applications, the true coefficient vector is not likely to be exactly sparse. It may be the case that many components of are close to, but not exactly zero. We assume that QoI admits an approximately sparse PC expansion, and in that regard we don’t know exactly which value of results in the most accurate PC approximation. This problem affects both SP and DSP. Assume that we have a fixed number of QoI evaluations we are allowed to perform, we propose the use of a cross-validation scheme similar to that presented in (doostan2011non, , Section 3.3) to approximate the optimal value of . Let and , then we can estimate the optimal value of via the following cross-validation procedure outlined in Algorithm 3.111In all of the numerical examples presented in Section 5, , , and .

## 5 Numerical Experiments

Here we consider several numerical experiments to test the ideas presented in this work. In Section 5.1, a comparison between coherence-optimal sampling and standard Monte Carlo sampling is performed to judge the quality of -optimal designs obtained from each strategy. In Sections 5.2-5.5, we compare three different strategies with coherence-optimal samples in terms of their ability to construct a PC approximations of a given QoI. These three strategies are coherence-optimal sampling, coherence-optimal sampling with a -optimal design, and sequential coherence-optimal sampling via DSP which we denote as Coh-Opt, D-Coh-Opt, and Seq-D-Coh-Opt, respectively. We examine four different model QoIs. In Section 5.2 we model a QoI with manufactured sparse functions. In Section 5.3 we examine the model of a nonlinear Duffing oscillator. In Section 5.4 we consider a model for the wing weight of a light aircraft. Finally, in Section 5.5 we discuss the Ishigami function.

In Sections 5.2-5.5, we begin by creating a large pool of candidate samples, evaluating the polynomial basis, and computing the corresponding weights to yield a weighted candidate design matrix . Unless otherwise stated, the candidate pool is generated using coherence-optimal sampling. The coherence-optimal sampling strategy denoted Coh-Opt involves randomly selecting rows from to form a measurement matrix then using the corresponding QoI values with SP. The strategy denoted D-Coh-Opt involves using SP with an -point, -optimal design, namely as in (23). The sequential coherence-optimal sampling denoted Seq-D-Coh-Opt is implemented with DSP. The RRQR factorizations are computed using the matlab qr() function, which ensures that the absolute value of the diagonal entries of are in non-increasing order.

For each PC coefficient approximation using the Coh-Opt and D-Coh-Opt strategies, Algorithm 3 is performed first to estimate the value of , then is constructed via Algorithm 1. However, since the size of the measurement matrix can change on each iteration of DSP, a modified algorithm which performs cross-validation on at each iteration is employed. For a detailed description of this modified algorithm see Algorithm 4 in Appendix A. It should be mentioned that the computational cost associated with cross-validation of would typically be negligible compared to the cost associated with computing expensive QoIs, and it is therefore justifiable for each of the three strategies.

For the Duffing oscillator, the wing weight model, and the Ishigami function, we assess the accuracy of the PC approximations in terms of their ability to approximate the QoI for a large number of independent validation samples. Specifically, we compute the relative validation error as

 erel=∥Ψval^c−uval∥2∥uval∥2, (34)

where is constructed by evaluating the polynomial basis using randomly sampled validation inputs sampled from , is constructed by evaluating the QoI using the validation inputs, and is approximated with a separate set of reconstruction inputs and their corresponding QoI values. Let (in the case of DSP ) be the maximum number of samples to be used in computing , and be a weighted measurement matrix whose reconstruction inputs are sampled via coherence-optimal sampling according to (11). The candidate design matrix is chosen by randomly selecting rows from without replacement. The reason for constructing the candidate matrix in this manner is that we wish to repeat the PC coefficient approximation for each value of , times say, for each of the three sampling strategies. Changing the candidate matrix on each repetition is necessary otherwise the designs constructed by the D-Coh-Opt and Seq-D-Coh-Opt strategies will not change. After

repetitions, the mean and standard deviation of the

values of are computed. In all of the numerical results presented, and the number of validation samples is .

### 5.1 The effects of sampling on D-optimal designs

As mentioned in Section 3.4, we expect that a candidate pool of coherence-optimal samples will produce designs with larger values of compared to a candidate pool that is constructed via standard Monte Carlo sampling. In this experiment, -optimal designs are computed via (23) with subset selection and we consider values of , where , and is defined as in (18). Note that is similar to the -optimality criterion, and that larger values of correspond to better -optimal designs. The division by the Frobenius norm is to normalize so that each information matrix has the same average singular value and values of can be compared fairly regardless of the sampling method and weights used to generate the candidate pool.

This experiment involves constructing 1000 -optimal designs comprising of samples from candidate pools of samples. For each design, new candidate pools are constructed independently and the value of is recorded. In this manner, is considered as a random variable. For a fixed value of and

, comparing the empirical cumulative distribution functions (CDFs) of

provides qualitative information regarding each sampling strategy’s impact on the -optimal designs constructed. Specifically, given two sampling strategies MC (Monte Carlo) and Coh-Opt (coherence-optimal), we say that Coh-Opt first-order stochastically dominates MC if for all with strict inequality at some value of , where denotes a CDF hadar1969rules .

We consider cases for and which correspond to basis functions for both Legendre and Hermite polynomials. Figure 3 shows the empirical CDFs of computed using standard Monte Carlo sampling and coherence-optimal sampling for with Legendre and Hermite polynomials.

We see that in this case, coherence-optimal sampling produces significantly better experimental designs than standard Monte Carlo sampling. This claim is evident due to the position of the CDFs, for which coherence-optimal sampling produces larger values of with higher probability than standard Monte Carlo sampling. Figure 6 shows the empirical CDFs of computed using standard Monte Carlo sampling and coherence-optimal sampling for with Legendre and Hermite polynomials.

The results for indicate that coherence-optimal sampling provides slightly better -optimal designs. The results of this numerical experiment are consistent with those described in Section 5.2, where we see that coherence-optimal sampling is particularly advantageous for the case of , and in the case of Hermite polynomials standard Monte Carlo sampling performs significantly worse than coherence-optimal sampling. The greatest improvement is seen for Hermite polynomials when .

### 5.2 Manufactured sparse PC expansions

In this problem, we consider PC expansions where the vector of exact coefficients is manufactured. In our examples of manufactured sparse functions, the basis functions are multivariate Hermite polynomials, and the vector of exact coefficients has sparsity . The nonzero coefficient indices are drawn uniformly without replacement from the set and their values are independent identically distributed (i.i.d.) standard normal random variables. For any given , let be the size of the candidate pool, and . Let be the weighted candidate matrix corresponding to the pool of candidate samples which are drawn according to either the standard Monte Carlo or coherence-optimal sampling schemes described in Section 2. The weighted QoIs are constructed as

 v=Wu=W(Ψcc+ϵ), (35)

where , is the th row of , and . In our examples, the noise level .

Cross validation on is performed, even if is known in advance as it is in these examples. If one were to set , then only in the ideal scenario – where the support set estimate is perfect – would the SP or DSP algorithms be able to approximate each of the non-zero coefficients. The reason we should allow is due to the fact that the QoI values, are corrupted with noise. This noise can lead to the inclusion of erroneous indices in , and letting may allow for the correct non-zero coefficients to be estimated even if erroneous indices are included in .

Given the manufactured system from (35), we obtain an approximation of the exact coefficients using the Coh-Opt, D-Coh-Opt, and Seq-D-Coh-Opt strategies. This process is repeated times for each value of , i.e., . For each repetition, we manufacture a new reference vector of coefficients and compute the QoIs corresponding to the candidate pool, then we compute the relative error of the PC approximations as . An oracle solution is also computed for each repetition to gauge the performance of each sampling strategy. This oracle solution is constructed by solving an over-determined LSA using an -point, -optimal design with respect to the exact support set of , and it represents a limit of the solution accuracy. After repetitions are complete, we compute the mean and sample standard deviation of the relative errors.

Figure 9 depicts the mean relative error in approximating the exact coefficient vector for and , both of which correspond to basis functions. Figure 12 shows the standard deviation of the relative errors, and Figure 15 depicts the percentage of the support set which is correctly identified, on average. By each of the three metrics, Seq-D-Coh-Opt shows the best performance of the three strategies, Coh-Opt performs the worst, and D-Coh-Opt shows intermediate performance. In the low-dimension, high-order case, D-Coh-Opt and Seq-D-Coh-Opt show greater improvements in accuracy compared to Coh-Opt than they do in the high-dimension, low-order case. This result is consistent with the findings of Section 5.1.

To demonstrate the benefits of coherence-optimal sampling, we repeat the experiment for , this time using a candidate pool of standard Monte Carlo samples, i.e., sampling according to with . The results of this experiment are presented in Figure 19, where MC denotes standard Monte Carlo sampling with SP, D-MC denotes an -point, -optimal design of Monte Carlo samples with SP, and Seq-D-MC denotes sequential Monte Carlo sampling via DSP. We observe that each of the three strategies fail to construct accurate, stable PC coefficient approximations when standard Monte Carlo sampling is used. This finding is not particularly surprising given the results of Section 5.1, which indicate the standard Monte Carlo sampling for large results in very poor experimental designs for Hermite polynomials. Further, these results are fundamentally different to those presented in Figure 9, where coherence-optimal sampling yields accurate and stable PC coefficient approximations.

### 5.3 A Nonlinear Duffing Oscillator

Here we consider the displacement solution

of a nonlinear single-degree-of-freedom Duffing oscillator

mai2015polynomial ; hadigol2017least under free vibration described by

 ¨u(Ξ,t)+2ω1ω2˙u(Ξ,t)+ω21(u(Ξ,t)+ω3u3(Ξ,t))=0, (36) u(Ξ,0)=1,˙u(Ξ,0)=0,

and with uncertain input parameters such that

 ω1 =2π(1+0.2Ξ1), (37) ω2 =0.05(1+0.05Ξ2), ω3 =−0.5(1+0.5Ξ3),

where . Our QoI is . Although this is a relatively low-dimensional problem, due to the stochastic frequency of oscillations, high-order PC expansions are required to maintain a fixed level of accuracy at large time instances hadigol2017least . We consider for this problem to investigate the performance of our sampling strategies. The pair leads to PC coefficients to be approximated, whereas the pair leads to PC coefficients. For any given , let be the size of the candidate pool and .

Figures 22 and 25 demonstrate the mean and standard deviation of the relative validation errors, respectively.

We see that D-Coh-Opt and Seq-D-Coh-Opt clearly result in more accurate PC coefficient approximations, with less variability, compared to Coh-Opt. Further, these figures demonstrate that Seq-D-Coh-Opt results in slightly more accurate approximations than the non-sequential D-Coh-Opt strategy. These results are in agreement with those presented in Section 5.2. As one might expect, the PC approximations using achieve more accurate results than do those constructed using .

### 5.4 The Wing Weight Function

In this problem, we investigate a wing weight function which models a light aircraft wing. Details of the model and the associated Matlab code are available at https://www.sfu.ca/~ssurjano/wingweight.html. The wing weight model also appears in forrester2008engineering ; moon2010design ; moon2012two as a test model for statistical screening. The model response is the wing’s weight , given by the nonlinear expression

 u(Ξ)=0.036S0.758wW0.0035fw(Acos2(Λ))0.6q0.006λ0.04(100tccos(Λ))−0.3(NzWdg)0.49+SwWp, (38)

where each of the 10 input parameters’ descriptions, ranges, and units are described in Table 1.

In this example we choose which corresponds to basis functions. For any given , let be the size of the candidate pool and