    # On I-Optimal Designs for Generalized Linear Models: An Efficient Algorithm via General Equivalence Theory

The generalized linear model plays an important role in statistical analysis and the related design issues are undoubtedly challenging. The state-of-the-art works mostly apply to design criteria on the estimates of regression coefficients. It is of importance to study optimal designs for generalized linear models, especially on the prediction aspects. In this work, we propose a prediction-oriented design criterion, I-optimality, and develop an efficient sequential algorithm of constructing I-optimal designs for generalized linear models. Through establishing the General Equivalence Theorem of the I-optimality for generalized linear models, we obtain an insightful understanding for the proposed algorithm on how to sequentially choose the support points and update the weights of support points of the design. The proposed algorithm is computationally efficient with guaranteed convergence property. Numerical examples are conducted to evaluate the feasibility and computational efficiency of the proposed algorithm.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The generalized linear model (GLM) is a flexible generalization of linear models that allow the response being related to the predictor variables through a link function. It was first formulated by Nelder and Wedderburn



as a generalization of various statistical models, including linear regression, logistic regression and Poisson regression. The GLMs are used widely in many statistical analysis with different applications including business analytics, image analysis, bioinformatics, etc. From a data collection perspective, it is of importance to understand what is an optimal design for the generalized linear model, especially on the prediction aspects.

Suppose that an experiment has input factors , and let be a measurable set of all possible levels of the th input factor. Common examples of are and . The experimental region, , is some measurable subset of

. In a generalized linear model, the response variable

is assumed to follow a particular distribution in the exponential family, which includes normal, binomial, Poisson and gamma distribution, etc. A link function provides the relationship between the linear predictor,

, and the mean of the response ,

 μ(x)=E[Y(x)]=h−1(βTg(x)), (1)

where are the basis and are the corresponding regression coefficients. Here is the link function, and is the inverse function of .

In this work, we consider a continuous design as:

 ξ={x1,...,xnλ1,...,λn}

with if . The represents the fraction of experiments that is to be carried out at design point and . The Fisher information matrix of the generalized linear model in (1) is:

 I(ξ,β)=n∑i=1λig(xi)w(xi)gT(xi), (2)

where . The notation emphasizes the dependence on the design and regression coefficient .

The design issues of linear models have been well studied due to its simplicity and practicality . The design issues of generalized linear models tend to be much more challenging due to the complicated Fisher information matrix. In GLMs, the Fisher information matrix often depends on the regression coefficient through the function in (2). As most optimal design criteria can be expressed as a function of the Fisher information matrix, a scientific understanding of a locally optimal design is often conducted under the assumption that some initial estimates of the model coefficients are available. Khuri et al. surveyed design issues for generalized linear models and noted that “the research on designs for generalized linear models is still very much in developmental stage. Not much work has been accomplished either in terms of theory or in terms on computational methods to evaluate the optimal design when the dimension of the design space is high.”

Yang and Stufken  , Yang , Dette and Melas , and Wu and Stufken  obtained a series of theoretical results for several optimality criteria with univariate nonlinear models and univariate generalized linear models. On the computational aspect, most literature focuses on the D-optimality. Popular algorithms include modified versions of Fedorov-Wynn algorithm proposed by Wynn  and Fedorov  that updates the support of the design and the multiplicative algorithm proposed by Silvey et al.  that updates the weights of candidate points from a large candidate pool. Chernoff  expressed concerns about the concentration of study on D-optimality, advocating the study of other criteria. Research works on algorithms of finding optimal design under other criteria are sparse. Yang et al.  proposed a multistage algorithm to construct optimal design by sequentially updating the support of the design and optimizing the weights using the Newton’s method in each iteration.

A practical objective of the experimental design often focuses on the accuracy of the model prediction. Under such a consideration, the corresponding criterion, I-optimality, aims at minimizing the integrated mean squared error of the prediction. There are some literatures on I-optimality for linear regression models. Haines proposed a simulated annealing algorithm to obtain exact I-optimal design. Meyer and Nachtsheim used simulated annealing and coordinate-exchange algorithms to construct exact I-optimal design. Recently, Goos et al. proposed a modified multiplicative algorithm to find I-optimal designs for the mixture experiments.

However, based on our best knowledge, there are few literatures on defining and obtaining I-optimal designs for generalized linear models (GLMs). The contribution of this work aims to fill the gap in the theory of I-optimal designs for GLMs and develop an efficient algorithm of constructing I-optimal designs for GLMs. Specifically, we first establish the General Equivalence Theorem of the I-optimality for GLMs. The established theoretical results shed some insightful lights on how to develop efficient algorithms on constructing the I-optimal design for GLMs. The General Equivalence Theorem is well established for linear regression models, but not for GLMs. Kiefer and Wolfowitz  derived an equivalence theorem on D-optimality for linear models, and White  extended the theorem on D- and G-optimalities to nonlinear models. Later, Kiefer  generalized General Equivalence Theorem (GET) to -optimality for linear regression models. Since the I-optimality is not a member of -optimality, the General Equivalence Theorem on I-optimality for GLMs is not thoroughly investigated. After establishing the General Equivalence Theorem of I-optimality for generalized linear models, we further develop an efficient sequential algorithm of constructing I-optimal designs for GLMs. The proposed algorithm is able to sequentially add points into design and update their weights simultaneously by using a modified multiplicative algorithm . We also establish the convergence properties of the proposed algorithm. The multiplicative algorithm, first proposed by Titterington  and Silvey et al., has been used widely in finding optimal designs of linear regression models. The original multiplicative algorithm often requires a large candidate pool and updating the weights of all candidate points simultaneously can be computationally expensive. For our proposed sequential algorithm, we have modified the multiplicative algorithm and use it on a much smaller set of points. The proposed algorithm can significantly enhance the computation efficiency because of the sequential nature of the algorithm. It is worth pointing out that the proposed algorithm can be easily extended to construct optimal design under other optimality criteria.

The remainder of this work is organized as follows. We introduce the I-optimality criterion for the generalized linear model in Section 2. In Section 3, we derive the General Equivalence Theorem on the I-optimality for the generalized linear models. We detail the proposed sequential algorithm and establish its the convergence properties in Sections 4-5. Numerical examples are conducted in Section 6 to evaluate the performance of the proposed algorithm. We conclude this work with some discussion in Section 7.

## 2 The I-Optimality Criterion for GLM

For the generalized linear models, one practical goal is to make prediction of response at given input . Thus it is important to adopt the predication oriented criterion for considering the optimal design. The integrated mean squared error (IMSE) is defined as the expected value of mean squared error between fitted mean response and true mean response over the domain of input with distribution of input as on the experimental region , i.e.,

Under the context of generalized linear models, the fitted mean response can be expanded around the true mean response using Taylor expansion, which is

 ^μ(x)−μ(x)=h−1(^βTg(x))−h−1(βTg(x))≈c(x)T(^β−β),

with . Here is the linear predictor. Then, using the above first-order Taylor expansion, the integrated mean squared error could be approximated with,

 IMSE(ξ,β)=E[∫Ωc(x)T(^β−β)(^β−β)Tc(x)dFIMSE].

In numerical analysis, the first-order Taylor expansion is commonly used to approximate the difference of a nonlinear function when evaluated at two close points. Considering the design issue for generalized linear models, similar approaches was commonly used in the literature, such as the work in Schein and Ungar  for the logistic regression models.

###### Lemma 1.

The could be expressed as the trace of the product of two matrices,

 IMSE(ξ,β)=tr(AI(ξ,β)−1),

where matrix depends only on the regression coefficients and basis functions, but not the design, and the Fisher information matrix , defined in equation (2), depends on both the regression coefficients and the design.

###### Proof.
 IMSE(ξ,β) = E[∫ΩcT(^β−β)(^β−β)TcdFIMSE] = ∫ΩcTE[(^β−β)(^β−β)T]cdFIMSE ≈ ∫Ωtr(ccTI(ξ,β)−1)dFIMSE = tr[(∫ΩccTdFIMSE)I(ξ,β)−1] = tr(AI(ξ,β)−1).

The approximation is provided by the fact that the estimated regression coefficient follows asymptotically. ∎

We call the I-optimality as the corresponding optimality criterion aiming at minimizing

 IMSE(ξ,β)=tr(AI(ξ,β)−1).

A design is called an I-optimal design if it minimizes . The I-optimality for linear regression was first proposed by Box and Draper  back in 1959, and has been studied extensively for decades. The I-optimality for generalized linear models defined above follows a similar setup.

In this article, we focus on local I-optimal design with a given regression coefficient . To simplify the notation, in the rest of the article, is omitted from the notation of the Fisher information matrix, and is used instead. The following lemma shows the convexity of the , which provides a key fact that facilitates the proof of the General Equivalence Theorem in Section 3.

###### Lemma 2.

The is a convex function of the design .

###### Proof.

Consider two designs and , and define . With , is still a feasible design. The Fisher information matrix of could be expressed as

 I(ξ)=(1−a)I(ξ1)+aI(ξ2).

Since the three Fisher information matrices are positive definite and the inverse of positive definite matrix is convex , we have:

 I(ξ)−1⪯(1−a)I(ξ1)−1+aI(ξ2)−1.

Since is symmetric and positive semi-definite, there exists a unique positive semi-definite matrix with . As is symmetric, we have

As a result,

 tr(AI(ξ)−1)=tr(BI(ξ)−1B)≤tr((1−a)BI(ξ1)−1B+aBI(ξ2)−1B)=(1−a)tr(AI(ξ1)−1)+atr(AI(ξ2)−1).

That means . Thus is a convex function of design . ∎

## 3 General Equivalence Theorem of I-Optimality for GLM

In this section, we will derive the General Equivalence Theorem of I-optimality for generalized linear models. The established theoretical results facilitate the sequential choice of support points as that of Fedorov-Wynn algorithm [27, 7].

We first investigate the directional derivative of with respect to . Given two designs and , let the design be constructed as

 ~ξ=(1−α)ξ+αξ′.

Then, the derivative of in the direction of design is

 ϕ(ξ′,ξ)=limα→0+IMSE(~ξ,β)−IMSE(ξ,β)α. (3)
###### Theorem 1.

The directional derivative of in the direction of any design is,

 ϕ(ξ′,ξ)=limα→0+IMSE(~ξ,β)−IMSE(ξ,β)α=tr(I(ξ)−1A)−tr(I(ξ′)I(ξ)−1AI(ξ)−1).
###### Proof.

Given , we have . For any matrix as a function of , the derivative of its inverse can be calculated as . So, the derivative of with respect to can be expressed as,

 ∂[I(~ξ)−1A]∂α=∂I(~ξ)−1∂αA=−I(~ξ)−1[I(ξ′)−I(ξ)]I(~ξ)−1A. (4)

Then, the directional derivative of is

 ϕ(ξ′,ξ) = ∂tr[I(~ξ)−1A]∂α∣∣ ∣∣α=0 = tr[∂I(~ξ)−1A∂α∣∣ ∣∣α=0] = tr(−I(~ξ)−1[I(ξ′)−I(ξ)]I(~ξ)−1A)∣∣α=0 = tr(I(ξ)−1A)−tr(I(ξ)−1I(ξ′)I(ξ)−1A).

###### Corollary 1.

The directional derivative of in the direction of a single point is given as,

 ϕ(x,ξ)=tr(I(ξ)−1A)−w(x)g(x)TI(ξ)−1AI(ξ)−1g(x).
###### Proof.

Consider as as design with one design point whose weight is 1. ∎

Based on Lemma 2 and the above Corollary 1, we can derive the following General Equivalence Theorem of I-optimality for generalized linear models. This theorem provides an intuitive understanding on how to choose support points sequentially and check the optimality of a design.

###### Theorem 2 (General Equivalence Theorem).

The following three conditions of are equivalent:

1. The design minimizes

 IMSE(ξ,β)=tr(AI(ξ)−1).
2. The design minimizes

 supx∈Ωw(x)gT(x)I(ξ)−1AI(ξ)−1g(x).
3. holds over the experimental region , and the equality holds only at the points of support of the design .

###### Proof.
1. :

As proved in Lemma 2, the is a convex function of the design , and since minimizes , the directional derivative for all , i.e.,

 tr(I(ξ∗)−1A)≥w(x)g(x)TI(ξ∗)−1AI(ξ∗)−1g(x),

and equality holds only for support points of .

2. :

This could be proved by contradiction. Suppose that is the minimizer of , but is not the minimizer of . Then there exists a design different from that minimizes

 supx∈Ωw(x)gT(x)I(ξ)−1AI(ξ)−1g(x),

i.e.,

 supx∈Ωw(x)gT(x)I(ξ∗∗)−1AI(ξ∗∗)−1g(x)

As just proved above in (i), holds over the experimental region , we have

 supx∈Ωw(x)gT(x)I(ξ∗)−1AI(ξ∗)−1g(x)≤tr(I(ξ∗)−1A). (6)

Combining the results in (5) and (6), it is easy to see that

 supx∈Ωw(x)gT(x)I(ξ∗∗)−1AI(ξ∗∗)−1g(x)

Recall that the weight of the th point in design is denoted as . Then

 supx∈Ωw(x)gT(x)I(ξ∗∗)−1AI(ξ∗∗)−1g(x) ≥ supxi∈ξ∗∗w(xi)gT(xi)I(ξ∗∗)−1AI(ξ∗∗)−1g(xi) ≥ ∑jλjw(xj)gT(xj)I(ξ∗∗)−1AI(ξ∗∗)−1g(xj) = tr[∑jλjw(xj)g(xj)gT(xj)I(ξ∗∗)−1AI(ξ∗∗)−1] = tr(I(ξ∗∗)−1A)=IMSE(ξ∗∗,β)

Comparing the left and right hand sides of equation (7), we have

 IMSE(ξ∗∗,β)

which contradicts with condition 1. Thus, the design must be the minimizer of .

3. :

This statement can also be proved by contradiction. Suppose the design is the minimizer of , but not the minimizer of . Then there exists a design different from that minimizes , i.e.,

 IMSE(ξ∗∗,β)

and as a result from (i), for all .

Based on Corollary 1 and the assumption that is the minimizer of

 supx∈Ωw(x)gT(x)I(ξ)−1AI(ξ)−1g(x),

we thus have,

 tr(I(ξ∗∗)−1A) ≥ supx∈Ωw(x)g(x)TI(ξ∗∗)−1AI(ξ∗∗)−1g(x) (8) ≥ supx∈Ωw(x)g(x)TI(ξ∗)−1AI(ξ∗)−1g(x).

Since as defined, together with equation (8), we have

 supx∈Ωw(x)g(x)TI(ξ∗)−1AI(ξ∗)−1g(x)≤tr(I(ξ∗∗)−1A)

So, for all in . Together with Corollary 1, for any point in the design , we have

 λitr(I(ξ∗)−1A)>λiw(xi)g(x)TI(ξ∗)−1AI(ξ∗)−1g(xi),i=1,2,...,n.

Summing over all design points in , we have

 tr(I(ξ∗)−1A)>∑iλiw(xi)g(x)TI(ξ∗)−1AI(ξ∗)−1g(xi)=tr(I(ξ∗)−1A),

4. :

Suppose that holds over the experimental region , and the equality holds only at the points of support of the design , then the directional derivative for all , and for all . Thus, by the convexity of , minimizes , and condition holds.

Theorem 2 provides an intuitive way for choosing the support points to construct an I-optimal design in a sequential fashion. According to Theorem 2, for an optimal design , the directional derivative is non-negative for any . It implies that for any non-optimal design, there will be some directions in which the directional derivative . Given a current design , to gain the maximal decrease in the I-optimality criterion, we would choose a new support point to be added into the design, if . Then one can optimize the weights of all support points in the updated design, which will be described in Algorithm 1 in Section 4. By iterating the selection of support point and the weight update of all support points, this two-step iterative procedure can be continued until for all , which means the updated design is I-optimal. Such a sequential algorithm of constructing optimal designs, as described in Algorithm 2 in Section 5, follows similar spirits in the widely-used Fedorov-Wynn algorithm [27, 28]. Note that the I-optimal designs are not necessarily unique. More details will be discussed in Section 6.2 after the proposed algorithms are introduced.

In next section, we will first discuss the details on how to optimize the weights of supports point for a given design. Then we will describe Algorithm 2 in details in Section 5 on the sequential algorithm of constructing I-optimal designs for the generalized linear models.

## 4 Optimal Weights Given the Design Points

In this section, we will investigate how to assign optimal weights to support points that minimize when the design points are given. In Wynn’s work [27, 28], equal weights are assigned to all design points after a new design point is added. Whittle  made improvement by choosing the weight of the newly added point as the one that minimizes the optimality criterion and the weights of existing points proportional to their current weights. Yang et al.  proposed using a Newton-Raphson method to find the optimal weights, which solves for the roots of first order partial derivatives of the criterion with respect to the weights. Note that the Newton-Raphson method may get negative weights and thus more efforts need to be taken to eliminate those negative weights. Also when the first order partial derivatives does not exist, such as under the -optimality criterion, Yang et al.’s algorithm  is not applicable.

To address the above challenges, we develop a modified multiplicative algorithm for finding the optimal weights of support points. The developed algorithm does not require the partial derivative and Hessian matrix inversion, which can be computationally efficient.

### 4.1 Condition of Optimal Weights

Given the design points as fixed, denote

to be the weight vector with

as the weight for the corresponding design point . We write the corresponding design as

 ξλ={x1,...,xnλ1,...,λn}.

A superscript is added to emphasize that only the weight vector is changeable in the design under this situation. The optimal weight vector should be the one that minimizes with design points fixed.

We can still obtain the directional directional derivative of with respect to the weight vector . Consider another weight vector . Then the convex combination with is also a weight vector. Define the directional derivative of in the direction of a new weight vector as:

 ψ(Δλ,λ)=limα→0+IMSE(ξ~λ,β)−IMSE(ξλ,β)α.
###### Lemma 3.

The is a convex function of weight vector .

###### Proof.

Since , we have . Then, the rest of the proof is similar to that of Lemma 2. ∎

###### Lemma 4.

The directional derivative of in the direction of a new weight vector ,

###### Proof.

This is a special case of the general directional derivative in equation 3, where the design points in and are the same, but with different weights. Thus, Theorem 1 still holds and

 ψ(Δλ,λ) = tr(I(ξ)−1A)−tr(I(ξ′)I(ξ)−1AI(ξ)−1) = tr(I(ξλ)−1A)−n∑i=1Δλiw(xi)g(xi)TI(ξλ)−1AI(ξλ)−1g(xi).

The following result provides a necessary and sufficient condition of optimal weight vector to minimize when the design points are fixed.

###### Theorem 3.

Given a fixed set of design points , the following two conditions on the weight vector are equivalent:

1. The weight vector minimizes .

2. , for .

###### Proof.
1. Since minimizes , and is a convex function of weight vector as proved in Lemma 3,

 ψ(Δλ,λ∗) = tr(I(ξλ∗)−1A)−n∑i=1Δλiw(xi)g(xi)TI(ξλ∗)−1AI(ξλ∗)−1g(xi) = n∑i=1Δλi[tr(I(ξλ∗)−1A)−w(xi)g(xi)TI(ξλ∗)−1AI(ξλ∗)−1g(xi)]≥0,

for all feasible weight vector .

Thus, for .

Now we will show that actually,

 tr(I(ξλ∗)−1A)−w(xi)g(xi)TI(ξλ∗)−1AI(ξλ∗)−1g(xi)=0fori=1,...,n.

Suppose there exists at least one , such that

 tr(I(ξλ∗)−1A)−w(xi)g(xi)TI(ξλ∗)−1AI(ξλ∗)−1g(xi)>0.

Then,

 tr(I(ξλ∗)−1A) = n∑i=1λ∗itr(I(ξλ∗)−1A) > n∑i=1λ∗iw(xi)g(xi)TI(ξλ∗)−1AI(ξλ∗)−1g(xi) = tr(I(ξλ∗)−1A),

 tr(I(ξλ∗)−1A)−w(xi)g(xi)TI(ξλ∗)−1AI(ξλ∗)−1g(xi)=0,fori=1,...,n.
2. It is obvious since for all feasible weight vector and is a convex function of weight vector.

###### Remark 1.

The proof of in Theorem 3 can also be derived by the method of Lagrange multipliers.

The results in Theorem 3 provides a useful gateway to design an effective algorithm for finding the optimal weights given the design points.

### 4.2 Multiplicative Algorithm for Optimal Weights

Base on Theorem 3 for given design points, the optimal weight vector that aims at minimizing should be the solution of a system of equations as

 tr(I(ξλ∗)−1A)=w(xi)g(xi)TI(ξλ∗)−1AI(ξλ∗)−1g(xi),i=1,...,n. (9)

The weight of a design point should be adjusted according to the values of the two sides of equation (9). Note that for any weight vector ,

 tr(I(ξλ)−1A)=n∑i=1λiw(xi)g(xi)TI(ξλ)−1AI(ξλ)−1g(xi).

Then, our strategy of finding optimal weight would be, for a design point ,

• if , the weight of should be increased;

• if , the weight of should be decreased.

Thus, the ratio would indicate a good adjustment of the current weight of a design point .

Encouraged by the promising performance of the multiplicative algorithm recently used in Goos and Jones , we consider a modified multiplicative algorithm to find optimal weights based on Theorem 3. The detailed algorithm is described in Algorithm 1 as follows.

In Algorithm 1, the value of is the convergence parameter chosen by the user. Following the suggestion by Fellman  and Torsney , we use . The following lemma provides a justification that every iteration in Algorithm 1 is always feasible.

###### Lemma 5.

A positive definite would ensure the iteration in equation (10) to be always feasible, i.e., the denominator

 n∑i=1λki[w(xi)gT(xi)I(ξλk)−1AI(ξλk)−1g(xi)]δ

is always positive.

Given and