# Ascent with Quadratic Assistance for the Construction of Exact Experimental Designs

In the area of experimental design, there is a large body of theoretical knowledge and computational experience concerning so-called optimal approximate designs. However, for an approximate design to be executed in a practical setting, it must be converted into an exact design, which is usually done via rounding procedures. Although generally rapid, rounding procedures have several drawbacks; in particular, they often yield worse exact designs than heuristics that do not require approximate designs at all. In this paper, we propose an alternative method of utilizing optimal approximate designs for the computation of optimal, or nearly-optimal, exact designs. The proposed method, which we call ascent with quadratic assistance (AQuA), is a hill-climbing method in which the information matrix of an optimal approximate design is used to formulate a quadratic approximation of the design criterion. To this end, we present quadratic approximations of Kiefer's criteria with an integer parameter, and we prove the low-rank property of the associated quadratic forms. We numerically demonstrate the generally superior performance of AQuA relative to both rounding procedures and standard heuristics for a problem of optimal mixture experimental design as well as for randomly generated regression models.

## Authors

• 3 publications
• 6 publications
02/25/2021

### Efficient computational algorithms for approximate optimal designs

In this paper, we propose two simple yet efficient computational algorit...
09/07/2017

### Computing optimal experimental designs with respect to a compound Bayes risk criterion

We consider the problem of computing optimal experimental design on a fi...
04/18/2018

### E- and R-optimality of block designs for treatment-control comparisons

We study optimal block designs for comparing a set of test treatments wi...
09/20/2018

### Locally D-optimal Designs for a Wider Class of Non-linear Models on the k-dimensional Ball with applications to logit and probit models

In this paper we extend the results of Radloff and Schwabe (2018), which...
11/15/2019

### Weak approximate unitary designs and applications to quantum encryption

Unitary t-designs are the bread and butter of quantum information theory...
12/22/2020

### Refined bounds for randomized experimental design

Experimental design is an approach for selecting samples among a given s...
08/10/2018

### On the optimal designs for the prediction of complex Ornstein-Uhlenbeck processes

Physics, chemistry, biology or finance are just some examples out of the...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Suppose that we intend to perform an experiment consisting of a set of trials. Assume that the observed response in each trial depends on a design point chosen from a finite design space . For instance, may be the set of all permissible combinations of levels of several discrete factors.

An “exact” design (ED) is a selection of design points, not necessarily distinct, to be used for individual trials. We will formalize an exact design

as an integer-valued vector

,111The symbols , , , and denote the sets of real, non-negative real, natural, and non-negative integer numbers, respectively. where , called the -th weight, represents the number of trials to be performed at the design point , .222Therefore, we implicitly we assume that reordering of the trials does not influence the experimental design. An “approximate” design (AD), , is more generally allowed to have non-negative components, which means that the weight is a continuous relaxation of the integer number of trials to be performed at , .333Approximate designs are sometimes also called “continuous” designs. Thus, an AD must be converted into an ED prior to its application in a real experiment.

Let be a set of permissible EDs, where is the required size of the experiment. Suppose that the information gained from an experiment based on can be represented by a matrix . For instance, may be the Fisher information matrix for the unknown parameters of an underlying statistical model. In optimal experimental design, it is usual to select a concave function with a target set to quantify the information content of . Such an optimality criterion allows an experimenter to compare different designs and, in principle, to select a -optimal ED, i.e., a design that maximizes444Alternatively, it is possible to select a convex criterion such that can be interpreted as a loss from the experiment that depends on the design . In this case, the optimal design would minimize over . Note that experimental design criteria also exist that do not depend on the design via its information matrix. We will not discuss such criteria in this paper. over the discrete set . In many cases, the information matrix can be consistently extended to ADs. Then, maximizing over the convex set results in the so-called optimal AD.

The construction of optimal EDs is typically a difficult problem of discrete optimization. There are two general approaches to computing optimal or nearly-optimal EDs (see Mandal et al. (2015) for a recent survey):

1. Convex computational methods, in which an optimal AD is first determined and a process called “rounding” is then used to obtain an ED;

2. Direct computational methods, including complete or partial enumeration methods, as well as various heuristics, such as exchange methods, tabu-search methods, simulated annealing and genetic algorithms.

It is generally much simpler to determine an optimal AD than an optimal ED, both analytically and computationally. Therefore, a large part of the literature is concerned only with approximate designs (e.g., Pukelsheim (2006), Pázman (1986), Atkinson et al. (2007)). Although ADs cannot be directly used for conducting experiments, relatively little attention has been paid to their conversion into efficient EDs.

The standard methods for converting an AD into an ED are called rounding algorithms. A rounding algorithm begins with an AD and extracts a set of positive weights, where is the support of and is the size of the support. Then, typically via simple rules, the algorithm converts into a rounded set such that and and, finally, transforms the set of rounded weights into an ED that belongs to .

The first notable rounding method was suggested by Kiefer (1974), who formulated the rounding problem as the minimization of the maximum of the difference between the exact and approximate design weights. By using techniques similar to those applied in voting apportionment, Pukelsheim and Rieder (1992) arrived at a criterion-independent rounding algorithm known as efficient rounding (ER). This method was further improved by Dorfleitner and Klein (1999), who considered a larger class of rounding functions. More recent proposals include randomized rounding heuristics, e.g., proportional and pipage rounding, as well as incremental rounding, and bounds on the approximation ratios of the resulting designs have been presented (see Bouhtou et al. (2010) and Sagnol (2013)). However, these methods are only applicable if the criterion function is submodular (e.g., -optimality).

ER and its variants, although prevalent to this day, have several major drawbacks. First, for any positive coordinate of the initial AD, the value of the corresponding coordinate of the resulting ED is forced to be at least . This implies the restriction , which can completely prevent the application of ER in some cases. Meanwhile, from the opposite perspective, if a design point is not present in the support of the AD, then ER cannot add a corresponding design point into the resulting ED. Moreover, ER does not account for any design criterion nor any underlying statistical model, although it is based on an optimal AD, which itself strongly depends on the adopted criterion and model. In addition, for many statistical models, an infinite number of optimal ADs exist, and it is unclear which of them should be used for the rounding operation. All of these disadvantages generally make approach (ii) preferable to approach (i) in practice; see, e.g., the practical examples in Goos and Jones (2011).

Harman and Filová (2014) proposed a substantially different approach to the use of an optimal AD for ED construction. It is based on a second-order approximation of the -criterion in the neighborhood of the -optimal approximate information matrix, and to arrive at an ED, it employs enumeration solvers for integer quadratic programming. However, although this method is useful for small-size problems and especially for problems with non-standard linear constraints on the design weights, the use of mathematical programming solvers often takes an unacceptably long time.

Alternatively, it is also possible to compute EDs via direct heuristics, such as the KL exchange algorithm (Atkinson et al. (2007), Chapter 12). This method does not require any information from an optimal AD. Instead, it searches for an ED by greedily generating an -point design and attempting exchanges between the support points and other points from the design space in the hope of improving the value of the criterion function.

The paper is organized as follows: In Section 2, we present two versions of Kiefer’s -criteria, and subsequently, in Section 3, we demonstrate how to compute quadratic approximations of these criteria and how to speed up these computations by updating low-rank quadratic forms. This leads to the main result of this paper, a hill-climbing algorithm in which the information matrix of an optimal AD is used to formulate a quadratic approximation of the design criterion; the proposed algorithm is presented in Section 4. Section 5 provides various notes on the application of the proposed method. In particular, we briefly discuss how the method can also be used in the situations with continuous design spaces. Finally, Section 6 presents numerical comparisons of our algorithm with ER and the KL exchange algorithm when applied to two problems: an -optimal design for a mixture model and a -optimal design for a class of random models.

## 2 The model and Kiefer’s criteria

For a trial in , , the observed response is an

-dimensional random vector that is assumed to satisfy the linear regression model

, where is a vector of unknown parameters and

is a known matrix. For different observations, the observation errors are assumed to be independent and identically distributed with a finite and non-zero variance. The information matrix associated with a design

on , either exact or approximate, is , where the , , are non-negative definite matrices with dimensions of .666For brevity, we will henceforth use , , and to denote the sets of all symmetric, non-negative definite and positive definite matrices, respectively.

Let be a continuous optimality criterion that attains its smallest possible value for singular non-negative definite matrices. Note that an exact -optimal design exists because is finite and non-empty. Since is a non-empty compact set, an approximate -optimal design is also guaranteed to exist. If is a -optimal approximate (exact) design, then is called a -optimal approximate (exact) information matrix. We will assume that there exists a such that is non-singular, which means that both the approximate and exact -optimal information matrices are non-singular. We will also assume that is twice differentiable in and that there exists a version777By two versions of a criterion, we mean two criteria that induce the same ordering on . of that is strictly concave in the optimal approximate information matrix . This assumption is satisfied for most models and standard optimality criteria, and it implies that the -optimal approximate information matrix is unique.

All properties stated above are satisfied by Kiefer’s criteria, which are commonly used in practice. In the optimal design literature, several versions of Kiefer’s criteria for -optimality appear, and usually, the choice of the particular version does not affect the strength of the theoretical or computational results. However, it turns out that in general, criterion-approximation methods do depend on the particular version of the criterion that is chosen. Therefore, we will consider two concave versions of criteria, as follows.

The “positive” version (cf. Pukelsheim (2006)): For and , let

 Φ+p(M)=(1mtr(M−p))−1/p (1)

and for a singular matrix . In particular, for , we obtain the criterion of -optimality. The corresponding criterion of -optimality is defined as for all .

The “negative” version (cf. Pázman (1986), Section IV.2.7): For and , let

 Φ−p(M)=−(1mtr(M−p))1/p (2)

and for a singular matrix . In particular, is a version of the criterion, and the corresponding criterion is for or for a singular .

Another commonly used concave version of the -optimality criterion is for all (); cf. Pázman (1986). This is the version of -optimality used in Harman and Filová (2014).

Note that both versions of the criterion are smooth on the set of positive definite matrices, and the gradients are

 ∇MΦ±p(M)=±Φ±p(M)tr(M−p)M−p−1.

It is customary to evaluate the quality of a design with respect to the optimal AD . Let be the optimal AD, and let be a non-negative, positively homogeneous optimality criterion that is not constantly equal to zero (these conditions are satisfied by the positive version of Kiefer’s criteria). Then, the -efficiency of a design is defined as .

## 3 Quadratic approximation of the design criterion

Suppose that we have a quadratic approximation of a concave design criterion in the neighborhood of . Our experience shows that in most optimal design problems, the ordering on that is induced by largely coincides with the ordering induced by the original criterion . At the same time, the quadratic approximation criterion can be evaluated (or updated) much more rapidly than itself. This can be exploited for various efficient methods of pre-selecting candidate designs for iterative computational algorithms. One possible pre-selection method constitutes the key component of the AQuA algorithm, which we make explicit in Section 4.

Because the mapping is linear, is a quadratic function on ; i.e., for some and , where

 ϕh,Q(ξ)=hTξ−ξTQξ,ξ∈ΞAN. (3)

For a general criterion, there are several possible ways of constructing the appropriate vector and matrix , for instance, through the use of standard numerical differentiation techniques. However, for Kiefer’s criteria with , it is simple to derive analytical forms for and , as we show next.

### 3.1 Quadratic approximations of Kiefer’s criteria

Let denote the approximate -optimal information matrix,888Note that the approximate optimal information matrix with respect to and is non-singular for any . and let be twice differentiable in . Then, a second-order Taylor approximation of in terms of can be written as follows (see, e.g., Dattorro (2010), Appendix D)

 Φ(M)≈Φ(M∗)+∂Φ(M∗,M−M∗)+12∂2Φ(M∗,M−M∗), (4)

where denotes the directional derivative at the point in the direction , i.e., , and denotes the second directional derivative at the point in the direction , i.e., , with denoting the gradient with respect to .

For as defined in (1) with , we obtain

 ∂Φ+p(M∗,M−M∗)=Φ+p(M∗)tr(M−p∗)tr(M−p−1∗(M−M∗))

and

 ∂2Φ+p(M∗,M−M∗)=Φ+p(M∗)tr(M−p∗)[(p+1)tr2(M−p−1∗M)tr(M−p∗)−Fp(M∗,M,M)],

where

 Fp(M∗,M1,M2)=p+1∑r=1tr(M−r∗M1M−p−2+r∗M2).

Note that in particular, and
. According to (4), we have , where is the second-order approximation of the criterion of -optimality and is given by

 Φ+pQ(M)=Φ+p(M∗)tr(M−p∗)(tr(M−p−1∗M)+p+12tr2(M−p−1∗M)tr(M−p∗)−12Fp(M∗,M,M)). (5)

We will call the criterion of -optimality.

Similar computations can be performed for , , as defined in (2), leading to the quadratic approximation

 (6)

We remark that it is also possible to compute based on the formulas for the Hessian of a modified version of that were derived by Yang et al. (2013) (for integer values of ) and Stufken and Yang (2012) (for ). We will call the criterion of -optimality.

Formally, the problem of -optimal exact design is expressed as

 maxξϕh,Q(ξ),subject toξ∈ΞEN. (7)

Consider an exact design with the information matrix . Clearly, , and for ,

 Fp(M∗,M,M)=n∑i,j=1ξiξjFp(M∗,Hi,Hj);

therefore, the maximization of over is equivalent to the integer quadratic optimization problem expressed in (7), where has the components , , and the matrix has the elements999Note that the matrix is symmetric, as is the matrix defined below, because for the symmetric non-negative definite matrices , , , and .

Similarly, the maximization of over is equivalent to the integer quadratic optimization problem expressed in (7), where has the components , , and the matrix has the elements

### 3.2 Efficient updating of low-rank quadratic forms

We can use quadratically approximated design criteria in combination with many standard optimal design algorithms (e.g., Atkinson et al. (2007), Dykstra (1971), Haines (1987)). To do so, we must be able to rapidly compute the values of the quadratic function given in (3) for a fixed vector , a fixed matrix , and a multitude of designs , as required by the algorithm.

The ability to efficiently numerically evaluate multivariate quadratic functions of the form generally depends on various specifics of the problem at hand, the known theoretical properties of and , the selected optimization algorithm, and the available hardware. Here, we will consider problems that are typical of optimal experimental design. In particular, we will assume that is a small number (usually less than ), whereas is a much larger number, possibly ranging from the order of tens to hundreds of thousands or more.

We will focus on the possibility of maximizing using heuristics that sequentially search a set of designs by adding, removing, or changing the position of a single trial, i.e., adding , subtracting or adding to or from the current design , where .

Let the function be based on the quadratic approximation of an optimality criterion defined on the set of information matrices. That is, , where is a quadratic function of the elements of . For a design , the most problematic part of computing is the evaluation of the quadratic form for the matrix , because is often very large. However, as we will show, we can construct a matrix with dimensions of , where , such that . Importantly, we can construct without computing ; i.e., we can completely avoid working with potentially enormous matrices.

To this end, let the function be represented in the form

 ΦQ(M)=~hTvech(M)−(vech(M))T~Qvech(M),

where is an -dimensional vector and is an non-negative definite matrix. Let be a unique duplication matrix that relates the and operators101010The symbols and denote the vectorization and half-vectorization of a matrix, respectively.; i.e., . Then, the versions of Kiefer’s criteria defined in the previous section can be represented using Theorem 16.2.2. from Harville (1997) and the formulas

 tr(NM) = tr2(NM) = vech(M)TGTmvec(N)(vec(N))TGmvech(M), tr(N1MN2M) = vech(M)TGTm(N2⊗N1)Gmvech(M),

which are valid for all ; thus, we obtain

where and

 ~Q+p = GTm[−1+p2vec(M−p−1∗)(vec(M−p−1∗))Ttr(M−p∗)+12p+1∑r=1M−p−2+r∗⊗M−r∗]Gm, ~Q−p = GTm[1−p6vec(M−p−1∗)(vec(M−p−1∗))Ttr(M−p∗)+16p+1∑r=1M−p−2+r∗⊗M−r∗]Gm.

Next, we construct a decomposition such that the matrix is of rank ,111111Note that can be a singular non-negative definite matrix; therefore, can be even smaller than ., using, for instance, the Cholesky algorithm or Schur decomposition. We have

 ϕh,Q(ξ) = ΦQ(n∑i=1ξiHi) (8) = n∑i=1ξi~hTvech(Hi)−n∑i=1n∑j=1ξiξj(vech(Hi))T~Qvech(Hj) = ξT(H~h)T−n∑i=1n∑j=1ξiξj(H~C~CTHT)i,j=hTξ−∥\SSTξ∥2,

where is an matrix, , and . Equation (8) allows us to compute without evaluating and storing . An added benefit is that with the use of , can be rapidly evaluated; for instance, the exchange step in an optimal design algorithm can be performed based on the equation

 ϕh,Q(ξ+el−ek)=ϕh,Q(ξ)+hl−hk−2(\SSTξ)T[\SSl⋅−\SSk⋅]−∥\SSl⋅∥2+2(\SSl⋅)T\SSk⋅−∥\SSk⋅∥2, (9)

where is updated as follows: . If the values of , , are precomputed and stored in memory, then each update involves only multiplications and subtractions or additions.

## 4 Ascent with Quadratic Assistance

The results of the previous section enable us to formulate a method for computing efficient exact designs using the information matrix of an optimal AD. The general idea of the procedure, which is outlined in Algorithm 1, is based on computing the quadratic approximation of the optimality criterion in terms of the optimal approximate information matrix, because this is easier to evaluate than the original criterion, which is generally a complicated nonlinear function of the design. The computation speed is further improved by the low-rank property of the quadratic forms that arise from the approximation, as seen in (8).

From the technical perspective, the algorithm starts by fully randomly generating an -point exact design, in contrast to the KL algorithm, in which only a few points are generated randomly and the rest are computed using a forward greedy heuristic. The model and the optimal approximate information matrix are used to compute , which is a quadratic approximation of in .

Then, in every iteration, we find the design that maximizes in some neighborhood . The choice of depends on the method used. For instance, if we apply the “KL exchange” principle, is the set of all designs formed from by exchanging trials between a set of trials from the support of and a set of trials from the entirety of . The neighborhood of in is chosen to maximize a function on , which is taken to be the gradient of , or, equivalently, the vector of directional derivatives in the directions of the singular one-point designs . In this step, searching for the best design using the function instead of the original criterion function is the most important factor responsible for the steep increase in the number of iterations that AQuA can complete in a given time (see Section 6). The computation of AQuA is restarted after the iteration process has terminated, i.e., when the algorithm stops at a local optimum. These restarts are performed until the allotted time has expired; the output of the algorithm is then the exact design with the best value of encountered throughout the computation.

Note that a good initialization may be important because a poor initial design may cause the quadratic ascent process to immediately terminate. However, from empirical experience, it seems that completely uniformly random initialization is fastest, is sufficient, and permits a fair comparison of AQuA with the KL algorithm. The large degree of “exploration” provided by random initial designs seems to compensate for the larger degree of “exploitation” achieved when the initial design is constructed via a greedy forward procedure.

The general structure outlined in Algorithm 1 allows for several modifications. First, the quadratic approximation is not unique, and one can experiment to find the most suitable one (see Section 5.3). Specifically, for the -optimality criterion, the model can be pre-orthogonalized, as shown in Section 5.2. It is also possible to skip the check for monotonicity with respect to the original optimality criterion , i.e., replace the condition in Algorithm 1 with . This change may lead to a violation of the monotonicity of the algorithm with respect to , but it makes the exchanges even more rapid. In some cases, this may lead to faster identification of an efficient design. Finally, various analogues to the so-called variance (or sensitivity) function can be used; see, e.g., Atkinson et al. (2007). This function is used to select support points for potential removal from and points from for potential addition to the exact design.

Optimal approximate designs can be computed very rapidly using modern methods (see, e.g. Yu (2011), Yang et al. (2013), Harman et al. (2018)). Importantly, the pre-computation of an optimal AD (or its information matrix) is very useful even if we are using only standard exact design heuristics that do not use ADs; this is because an optimal AD allows us to find a lower bound on the efficiency of the EDs produced by such a heuristic, which can be very informative.

In a sense, AQuA is a modification of the KL exchange algorithm; however, the same idea can be used in a similar way to improve many other procedures for computing exact designs.

Similarly to the KL algorithm, the proposed algorithm can be used without any special enumeration solvers, making it easy to implement in various environments. It can be used for most combinations of , and , unlike ER, which is applicable only when and is vastly inefficient for small (see Section 6), and unlike enumeration solvers, which produce results in a reasonable amount of time only for small models (e.g., ).

AQuA is also robust with respect to imperfections in the computation of ; this can prove to be greatly beneficial in cases when an optimal AD cannot be computed with high accuracy.

The proposed method can be used in combination with any criterion that is twice differentiable with respect to the optimal approximate information matrix. This fact extends its applicability beyond Kiefer’s criteria with an integer parameter (note also that there is no need to use criteria with explicit formulas such as those for Kiefer’s criteria). In particular, the method can also be used to compute

-optimal designs and designs with respect to criteria for the estimation of linear parameter subsystems, provided that the criterion used is twice differentiable in the optimal approximate information matrix.

### 5.1 Continuous design spaces

In some application, it is natural to use a continuous design space , instead of a finite one. This is typical of factor experiments, if the levels of each factor can be any real number in a given interval, without a practical discretization. In such cases, AQuA cannot be directly applied121212Of course, the same is true for the multitude of other popular design algorithms which work only on finite design spaces.. However, a straightforward strategy is to first apply AQuA to a finite grid in , and then use its result as an initial design for any constrained continuous optimization method which adjusts the positions of the support points within . Note that the search for optimal positions of design points in a continuous space is generally a highly non-convex problem, and a good initial feasible solution can make a crucial difference.

For instance, consider the full quadratic model with factors (i.e., parameters) in the range , and let us choose the criterion of -optimality. For the application of AQuA, we discretized each factor to equidistant levels, leading to design points. We compared the following two approaches.

1. We uniformly independently randomly chose design points in to initiate a continuous optimization algorithm based on the box-constrained BFGS method.131313

An attractive and possibly more efficient method would be the particle swarm optimization, see, e.g.

Mandal et al. (2015), Section 6.3. More precisely, we cyclically fine-tuned the positions of individual design points within until the convergence to a local optimum. Within allotted seconds, the algorithm could be re-run approximately times.

2. The same as Approach 1, but the initial design was computed by a seconds run of AQuA. In the time span of seconds, this approach produced designs.

The criterion values of the best designs obtained by Approach 1, and of all designs obtained by Approach 2 are visualized in Figure 1. Clearly, Approach 2 is superior to Approach 1. The best design obtained by AQuA on the grid (-criterion value ), and the best design on the continuous design space obtained by Approach 2 (-criterion value ) is depicted in Figure 2.

There are many other theoretical results and computational strategies related to using algorithms for finitely supported designs for the case of a continuous design spaces, see for instance Yang et al. (2013), Section 3.2 and Stufken and Yang (2012), Section 6. Most of these results relate to the approximate design theory. A detailed study of the application of finite-space algorithms to the computation of continuous-space exact designs requires a separate research.

### 5.2 Orthogonal reparametrization of the model for D-optimality

It is worth noting that the -optimal design does not change under non-singular linear reparametrizations of the underlying model. Specifically, if is the (known and non-singular) -optimal approximate information matrix, we can perform the following transformation (which can be called orthogonalization, see, e.g., Harman (2008), Section 3.5):

 ~Hi:=M−1/2∗Hi,i=1,…,n.

In the new model, the optimal approximate information matrix is . Note that for and , the formulas for have the following simple form:

 ~Q+p = ~Q−p = GTm[16mvec(Im)(vec(Im))T+16Im2]Gm.

Although it can be shown that the quadratic approximation of the -optimality criterion does not change after regular reparametrization, we have observed that for some models, the computation of the matrix is more numerically stable when the model is orthogonal.

### 5.3 Quadratic approximations of different versions of the same criteria

We can regard Kiefer’s criteria and as part of a larger class of concave optimality criteria: for , , and , we define

 Φ(γ)p(M):=1+γ2Φ+p(M)+1−γ2Φ−p(M), (10)

where we set . Thus, and for all . Clearly, is a concave version of the same criterion for all .

Note that for (i.e., -optimality) and in particular, we obtain

 Φ(0)DQ=det(M∗)m(−3m2+2tr(M−1∗M)−12tr[(M−1∗M)2]), (11)

which corresponds to the quadratic approximation of the version of the -optimality criterion with the form , as used in Harman and Filová (2014). Hence, the formulas for and , which are analogous to the formulas for , , and , have the following forms:

 ~h00=2GTmvec(M−1∗)

and

 ~Q00=GTm[12mM−1∗⊗M−1∗]Gm.

From the above, one can see that different versions of the same criterion can lead to different quadratic approximations, with slight differences in the resulting efficiency. However, analyzing the performance of these versions is beyond the scope of this paper.

### 5.4 Generalization of I-optimality and its conversion into A-optimality

As mentioned in Section 2, the case of in Kiefer’s criteria (1) and (2) corresponds to -optimality. Hence, it is easy to see that the matrices for the -optimality criterion take the following form:

 ~Q+1 = GTm[vec(M−2∗)(vec(M−2∗))Ttr(M−1∗)+M−1∗⊗M−2∗]Gm, ~Q−1 = 13GTm[M−1∗⊗M−2∗]Gm.

The results for -optimality can be easily adapted to compute -optimal designs.141414Such designs are also sometimes called - or -optimal designs (see Section 10.6 in Atkinson et al. (2007)). Standard -optimal designs are applied to models with one-dimensional observations (), and they minimize the integral of the variances of the BLUEs of the response surface over a region with respect to some measure. We will generalize the notion of -optimal design to potentially multivariate observations and show that -optimal designs are -optimal in a transformed model, giving us the possibility to use the theory and algorithms developed for -optimality.

Let be a measurable set representing a region of prediction interest, and let be a measure on . Suppose that for each , there is a matrix such that represents the total variance of the response surface estimator in , provided that the information matrix for the parameters is . For a positive definite , we can define a (generalized) -optimality criterion

 ΦI(M)=−∫x∈Ytr(M−1V(x))dη(x)=−tr(M−1L),

where , and for a non-regular , we can set . Suppose that is non-singular. Then, clearly, a design is -optimal if and only if it is -optimal in the model given by the elementary information matrices . The standard situation corresponds to , , , and being a uniform measure on .

### 5.5 Optimal design augmentation

The methods developed in this paper can also be applied to the problem of optimal design augmentation. Indeed, suppose that a preliminary experiment has been performed. For the -optimal -point augmentation of , the objective is to find a design that maximizes .

However, note that ; therefore, the problem of -optimal design augmentation is simply a special case of the standard problem of -optimal design with the elementary information matrices

 ~Hi:=Hi+1NM(ξ0), i=1,…,n.

### 5.6 Preliminary reduction of the design space

It turns out that one of the major factors affecting the complexity of the optimal design problem is the size of the design space. Therefore, to reduce the complexity, we can attempt to reduce the design space, and this reduction can also be informed by the optimal approximate information matrix , similarly to the quadratic criterion approximation. More precisely, consider the directional derivative of at the point in the direction :

 ∂Φ(M∗,N)=tr(∇MΦ(M∗)(N−M∗)),

where is the gradient of at . Define the essential support as the set of all points , , in such that the directional derivative is maximal, i.e., equal to . Clearly, the essential support is uniquely defined, and a consequence of the general equivalence theorem (Kiefer (1974)) for -optimality is that it contains the support of every optimal AD. Hence, it is reasonable to expect that for all from the support of an optimal ED, the directional derivative in the direction of is also equal, or very close, to . Thus, our suggestion is to select a number and restrict the design space to

 (12)

Note that if , then , whereas if , then is equal to the essential support of the optimal approximate designs. Therefore, serves as a tuning parameter for the magnitude of the restriction. However, this design space restriction should be informed by the deletion rules (see Harman and Pronzato (2007) and Pronzato (2013)), especially in the case when only a near-optimal approximate design is available, and one should be careful not to delete support points of the optimal AD.

## 6 Numerical comparisons

In this section, we demonstrate the performance of the basic procedures in the R computing environment (R Development Core Team (2011)). The examples were computed on a 64-bit Windows 7 system with an Intel Core i5-2400 processor at 3.10 GHz and 4 GB of RAM. The code is available at

### 6.1 Exact I-optimal designs for a mixture model

To illustrate the performance of AQuA, consider a mixture of five components with a typical ratio of 1:1:1:1:1. The experimenter wishes to determine the change in the response that occurs upon a slight modification of the percentages of the components in the mixture. Suppose that the allowed range for each component is between 10 and 30% in increments of 1%. The response can be modeled by a quadratic Scheffe mixture model given by

 E(Y(x))=5∑j=1βjxj+∑u

where , , . Hence, the model contains unknown parameters, and, as can be easily calculated, the dimensionality of the design space is .

We wish to compute exact -optimal designs (see Section 5.4), i.e., designs that minimize the integrated variance of the response surface estimation over the region . For a more detailed treatment of -optimal designs in mixture experiments, see Goos et al. (2016). The results of the efficient rounding (ER) and KL exchange (KL) algorithms are considered for comparison with those of AQuA+ and AQuA-, based on the quadratic approximations of positive and negative version of the -optimality criterion, respectively.

The AQuA+, AQuA-, and KL algorithms were run 3 times for 200 seconds each, and the time profiles of the efficiencies of the resulting exact designs were recorded. The optimal AD was precomputed prior to running the comparisons, and this optimal result was used to compute the efficiencies of the exact designs and as an input to ER and AQuA. Figure 3 shows the results for two design sizes: and . Note that in both cases, both versions of AQuA show uniformly superior performance compared with the KL exchange algorithm, and this difference is more pronounced for . Because the ER algorithm is independent of both the model and the criterion, it yields the resulting exact design almost immediately. However, the -optimal approximate design in this model is supported on more than 30 points. Thus, for , the application of the ER is impossible.

### 6.2 Exact D-optimal designs for random models

For a more comprehensive comparison, we have also compared the performances of the ER algorithm (where applicable), the KL exchange algorithm and the two versions of the AQuA algorithm for the -optimality criterion on four models with regressors independently and randomly drawn from the

-dimensional normal distribution

. The dimensions of the models are detailed in Table 1.

The results depicted in Figure 4 show that when the design size is small, KL fares slightly better than AQuA for larger models. However, as increases, KL becomes considerably slower, whereas AQuA finds exact designs with very high efficiency in a relatively short time (see the third column of Figure 4), and the results of ER slightly improve. Additionally, it is not always possible to obtain results from the ER, as in several cases, the -optimal approximate design is supported on more than points.

Upon closer inspection, the results show several favorable properties of AQuA. First, AQuA yields exact designs with high efficiency compared with those found using the competing KL and ER methods, and the difference grows with increasing and . Second, the initial improvements in AQuA are very fast compared with those in KL. This can be especially beneficial in the case of very large , for which the forward phase of KL is prohibitively time-consuming. Note that the performances of all algorithms are only slightly affected by the magnitude of (compare the results for models R1 and R2 and those for models R3 and R4). Furthermore, the iterations of AQuA yield a non-decreasing sequence of criterion values, provided that we perform the monotonicity check (this check is usually not necessary to obtain good results, although it might be dangerous to omit it). Note also that although pre-computing the optimal AD is necessary, unless it is already known (e.g., for linear and quadratic models in conventional design regions), the method itself depends only on the approximate optimal information matrix, not on the choice of the optimal design, which is often not unique (in contrast to ER).

In general, if the size of the design space is very small, e.g., less than , one should use an enumeration method, either complete (in the case of very small problems) or partial, such as the one based on MISOCP (Sagnol and Harman (2015)). If the design space is of moderate or large size, it is usually infeasible to use an enumeration method, and one is forced to use a heuristic. These comparisons show that AQuA can be recommended in a majority of cases, perhaps with the exception of very small , for which KL is preferable. Indeed, in some rare pathological models, mainly those with small , the design that maximizes the quadratic approximation function might be inefficient with respect to the original criterion, although if the monotonicity check is applied, the quality of the initial design will not deteriorate. Finally, as is true with all heuristic procedures, it is recommended to use the resulting design as a starting point for an alternative heuristic to see whether it can be further improved. For the most part, constructing efficient exact designs requires a toolbox of diverse algorithms based on differing principles and the ability to combine them in a creative way, as there is no universal approach that is preferred in all cases.

## 7 Conclusions

We show that the method of efficient rounding presented in Pukelsheim and Rieder (1992), which is widely used in optimal experimental design, is sometimes a very inefficient means of solving the problem at hand, although its results will always be meaningful; for instance, it can generate a good initial design for further improvement. Moreover, if the support of the approximate design contains more points than the required size of the experiment, the ER cannot be applied at all. Similarly, using the pure KL exchange heuristic without information from an approximate design is sub-optimal, especially for large . Using discrete optimization tools informed by an approximate design is a much better and more efficient approach.

We propose to modify optimal design heuristics by employing an approximation of the criterion function in the form of a quadratic function. To this end, we provide quadratic approximations for two versions of -optimality criteria, which, for and , lead to well-known and widely used criteria of - and -optimality as well as, after a simple transformation, a criterion of -optimality. The suggested simplifications are proven to be beneficial for design problems with various combinations of the design space size , the number of unknown model parameters and the design size .

Note that the proposed method presumes knowledge of the optimal approximate information matrix. However, because the algorithms for computing approximate optimal designs are well developed and very fast, this is not considered to be a drawback. Moreover, there is a large body of literature that provides theorems that explicitly yield optimal approximate designs. With the efficient rounding procedure alone, the practical value of approximate designs is weaker since direct heuristic computational methods can often find more efficient practical designs, entirely circumventing approximate design theory and computational methods. We show that approximate designs carry much more useful information for the construction of exact designs than is utilized by efficient rounding.

Acknowledgments The work was supported by Grant Number 1/0521/16 from the Slovak Scientific Grant Agency (VEGA).

## References

• Atkinson et al. (2007) Atkinson AC, Donev AN, Tobias RD (2007): “Optimum Experimental Designs, with SAS”, Oxford University Press, Oxford
• Bouhtou et al. (2010) Bouhtou M, Gaubert S, Sagnol G (2010): ”Submodularity and randomized rounding techniques for optimal experimental design.” Electronic Notes in Discrete Mathematics 36, pp. 679-686.
• Cook and Thibodeau (1980) Cook, RD, Thibodeau LA (1980): ”Marginally restricted D-optimal designs.” Journal of the American Statistical Association 75.370, pp. 366-371.
• Dattorro (2010) Dattorro, J (2010): ”Convex optim