# Random Norming Aids Analysis of Non-linear Regression Models with Sequential Informative Dose Selection

A two-stage adaptive optimal design is an attractive option for increasing the efficiency of clinical trials. In these designs, the locally optimal dose is chosen, based on interim data, for further exploration, which induces dependencies between data from the two stages. When the maximum likelihood estimator (MLE) is used under nonlinear regression models with independent normal errors in a pilot study where the first stage sample size is fixed and the second stage sample size is large, the Fisher information fails to normalize the estimator adequately asymptotically because of these dependencies. In this situation, we present three alternative random information measures and show that they provide better asymptotic normalization of the MLE. The performance of the random information measures is investigated in simulation studies, and the results suggest that the observed information performs best when the sample size is small.

## 1 Introduction

Two-stage designs are used for many purposes, including enrichment, sample size re-estimation, and modification of randomization probabilities to improve the efficiency and/or efficacy of estimators. All these procedures use accumulated data to change the operation of the experimental design, which induces dependencies between the first and second stage data. Our interest lies in the effects of such dependencies on inference at the end of a pilot study in which the first stage sample size is fixed and the second stage sample size is large.

In two-stage enrichment designs, patients more likely to benefit from the treatment are identified based on data from the first stage, and second stage trials are conducted in the identified subpopulation [e.g., Simon and Maitournam [24], Ivanova and Tamura [11], Rosenblum and van der Laan [20], Trippa et al. [28], Zang and Guo [29]]. Two-stage sample size re-estimation methods are conducted by revising the final sample size with parameter estimation from the first stage [e.g., Stein [26], Proschan [18], Shih [23], Schwartz and Denne [21], Zhong et al. [30], Tarima et al. [27], Broberg and Miller [3]]. In two-stage adaptive optimal designs, information from the first stage is used to estimate optimal treatment assignment probabilities for the second stage [e.g., Haines et al. [7], Lane and Flournoy [13], Englert and Kieser [5], Lane et al. [14], Shan et al. [22]].

Lane and Flournoy [13] studied asymptotic distributional properties of the maximum likelihood estimator for nonlinear regression models with independent normal errors. In their study, they used the Fisher information to norm the score function when taking limits, obtaining a limiting distribution for the maximum likelihood estimator that is a random scale mixture of normal random variables. Use of this result requires knowledge of the distribution of the limiting scaling random variable.

Lane and Flournoy found this distribution in the special case of an exponential mean function. But the method used is not generalizable, and so their result is informative, but not generally useful in practice.

In their review paper on likelihood theory for stochastic processes, Barndorff-Nielsen and Sørensen [2] describe conditions under which maximum likelihood estimators normed with the Fisher information converge to randomly scaled mixtures of normal distributions, as was the case in Lane and Flournoy [13]. Limiting random mixtures of normal random variables also arise in Ivanova et al. [12], Ivanova and Flournoy [10], and May and Flournoy [16]. But Barndorff-Nielsen and Sørensen describe a solution to this problem. Namely, they describe how using a random norming in lieu of the Fisher information can lead to a standard normal distribution instead.

This paper examines the use of random normings in a practical situation. In particular, we evaluate these alternative random norms in the same context as in Lane and Flournoy [13] and Lane et al. [14], and show how to apply them to obtain the more useful standard normal distribution. Then we compare the rates of convergence and efficiencies of the different norming alternatives.

Accordingly, this paper is organized as follows. In Section 2, we present the model to be studied in this paper. In Section 3, we describe stable and mixing convergences, which are needed, and a generalized version of the Cramér-Slutzky theorem. In Section 4, we present the main asymptotic results for maximum likelihood estimators with random normings. We conduct simulation studies to compare the efficiencies obtained with these normings for exponential and logistic models in Section 5.

## 2 The Model

Let $y_{ij}$, $j=1,\ldots,n_i$, be observations from a two-stage adaptive design, where $n_i$ is the number of observations and $x_i$ is the single dose used for the $i$th stage, $i=1,2$. To avoid degenerate cases, we assume $n_1, n_2 \geq 1$, and set $n = n_1 + n_2$. We consider a general regression model with independent normal errors:

$$y_{ij}=\eta(x_i,\theta)+\epsilon_{ij},\qquad \epsilon_{ij}\sim N(0,\sigma^2), \tag{1}$$

where $\eta(x,\theta)$ is some (possibly) nonlinear mean function, twice differentiable in $\theta$; $\sigma^2$ is given; and for simplicity, $\theta$ is a 1-dimensional parameter. In addition, adaptation is restricted to the choice of $x_2$, which depends on stage 1 data only through sufficient statistics from stage 1. More specifically, $x_2=x_2(\bar y_1)$ is a random function of the stage 1 mean response $\bar y_1=n_1^{-1}\sum_{j=1}^{n_1}y_{1j}$. Define $\bar y_2=n_2^{-1}\sum_{j=1}^{n_2}y_{2j}$. The stage 1 responses $y_{1j}$ are i.i.d. $N(\eta(x_1,\theta),\sigma^2)$. But the stage 2 responses $y_{2j}$ are normally distributed only conditionally on $\bar y_1$: given $\bar y_1$, they are i.i.d. $N(\eta(x_2,\theta),\sigma^2)$.

Let $\hat\theta_{n_i}$ denote the maximum likelihood estimator of $\theta$ based on stage $i$ data, $i=1,2$, and let $\hat\theta_n$ denote the maximum likelihood estimator of $\theta$ based on all $n$ trials. Since maximum likelihood estimators (MLEs) are functions of sufficient statistics, $\hat\theta_{n_1}$ is a function of the first stage mean response $\bar y_1$, and both $\hat\theta_{n_2}$ and $\hat\theta_n$ are functions of $(\bar y_1,\bar y_2)$.

Then the likelihood function is

$$\begin{aligned}
L_n(\theta\,|\,y_{11},\ldots,y_{1,n_1},y_{21},\ldots,y_{2,n_2})
&= f_n(y_{11},\ldots,y_{1,n_1},y_{21},\ldots,y_{2,n_2}\,|\,\theta)\\
&= f_{n_1}(y_{11},\ldots,y_{1,n_1}\,|\,\theta)\,f_{n_2}(y_{21},\ldots,y_{2,n_2}\,|\,\theta,y_{11},\ldots,y_{1,n_1})\\
&\propto \exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n_1}[y_{1,i}-\eta(x_1,\theta)]^2-\frac{1}{2\sigma^2}\sum_{j=1}^{n_2}[y_{2,j}-\eta(x_2,\theta)]^2\Big\}\\
&\propto \exp\Big\{-\frac{n_1}{2\sigma^2}[\bar y_1-\eta(x_1,\theta)]^2-\frac{n_2}{2\sigma^2}[\bar y_2-\eta(x_2,\theta)]^2\Big\}.
\end{aligned}$$

Letting $\bar y_i=n_i^{-1}\sum_{j=1}^{n_i}y_{ij}$ for $i=1,2$, and $\dot\eta(x,\theta)=\partial\eta(x,\theta)/\partial\theta$, the score function can be written as

$$S_n(\theta)=\frac{d}{d\theta}\log L_n(\theta)=\frac{n_1}{\sigma^2}[\bar y_1-\eta(x_1,\theta)]\,\dot\eta(x_1,\theta)+\frac{n_2}{\sigma^2}[\bar y_2-\eta(x_2,\theta)]\,\dot\eta(x_2,\theta).$$
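As a concrete illustration, the score above can be evaluated numerically. The sketch below assumes an exponential mean function $\eta(x,\theta)=e^{-\theta x}$ purely for illustration (the model allows any twice-differentiable $\eta$); the function names are our own.

```python
import numpy as np

# Two-stage score function S_n(theta) under model (1), for the
# illustrative mean eta(x, theta) = exp(-theta * x); x1, x2, n1, n2,
# and sigma2 are assumed known.
def eta(x, theta):
    return np.exp(-theta * x)

def eta_dot(x, theta):
    # derivative of eta(x, theta) with respect to theta
    return -x * np.exp(-theta * x)

def score(theta, x1, y1bar, n1, x2, y2bar, n2, sigma2=1.0):
    return (n1 / sigma2) * (y1bar - eta(x1, theta)) * eta_dot(x1, theta) \
         + (n2 / sigma2) * (y2bar - eta(x2, theta)) * eta_dot(x2, theta)
```

When both stage means equal their model values, the score vanishes at the true $\theta$; the MLE solves $S_n(\theta)=0$.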

## 3 Stable and Mixing Convergence

### 3.1 Motivation and Definitions

Let $X$ and $Y$ be real random variables defined on some probability space $(\Omega,\mathcal{F},P)$, and let $\mathcal{G}\subseteq\mathcal{F}$ be a sub-$\sigma$-field. Given sequences of random variables $(X_n)_{n\ge1}$ and $(Y_n)_{n\ge1}$, suppose one wants to obtain the limiting distribution of the product $X_nY_n$. If $Y_n$ converges in probability to a constant $c$ and $X_n$ converges in distribution to $X$, then $X_nY_n\xrightarrow{d}cX$ by the Cramér-Slutzky theorem [25]. However, Lane and Flournoy [13] showed for model (1) that if $n_1$ is small relative to $n$ (and provided common regularity conditions hold), then

$$\sqrt{n}\,(\hat\theta_n-\theta)\approx U V_{n_2}, \tag{2}$$

where $V_{n_2}\sim N(0,1)$ for every $n_2$ and $U=\sigma/|\dot\eta(x_2,\theta)|$, with $x_2$ the adaptively chosen (hence random) stage 2 dose; and $V_{n_2}$ is independent of $\bar y_1$ and $U$. Since Equation (2) holds for all $n_2$, it holds in the limit as $n_2\to\infty$ with $n_1$ fixed. That is, $\sqrt{n}\,(\hat\theta_n-\theta)\xrightarrow{d}UZ$ as $n_2\to\infty$ with $n_1$ fixed, where $Z\sim N(0,1)$ is independent of $U$. But $U$ is a random function of $\bar y_1$ that does not converge to a constant when $n_1$ is held fixed. So one cannot divide both sides of Equation (2) by $U$ and apply the classical Cramér-Slutzky theorem to obtain a limit.

To obtain a standard normal limit instead of the normal mixture in Equation (2) requires a generalized version of the Cramér-Slutzky theorem, which is given in Lemma 4.2 below. The generalized Cramér-Slutzky theorem requires the concepts of stable and mixing convergence, which were introduced by Rényi [19]. So before proceeding, we recall these concepts. A thorough description of stable and mixing convergence can be found in Häusler and Luschgy [9].

Let $P_E$ denote the conditional probability given the event $E$. We say that $X_n$ converges stably to $X$ as $n\to\infty$ if

$$X_n\xrightarrow{d}X \text{ under } P_E \text{ for every event } E\in\mathcal{G} \text{ with } P(E)>0. \tag{3}$$

Stable convergence is stronger than convergence in distribution, but not as strong as convergence in probability. If $X$ is independent of $\mathcal{G}$, then the limit is said to be mixing.

### 3.2 Stable Convergence Under Model (1).

Under model (1), $\mathcal{F}_{n_1}=\sigma(y_{11},\ldots,y_{1,n_1})$ is a sub-$\sigma$-field of $\mathcal{F}$. In Lemma 3.1, we show that the convergence given in (2) is, in fact, stable convergence. In the context of Equation (3), take $\mathcal{G}=\mathcal{F}_{n_1}$.

###### Lemma 3.1.

Under model (1) and the regularity conditions of Lane and Flournoy [13], $\sqrt{n}\,(\hat\theta_n-\theta)\to UZ$ $\mathcal{F}_{n_1}$-stably, with $Z\sim N(0,1)$ independent of $U$, as $n_2\to\infty$ while $n_1$ is fixed.

###### Proof.

Because of Equation (2) we only have to show that

$$UV_{n_2}\to UZ\quad \mathcal{F}_{n_1}\text{-stably as } n_2\to\infty \text{ with } n_1 \text{ fixed.}$$

Let $E\in\mathcal{F}_{n_1}$ be an event with $P(E)>0$. By the $\mathcal{F}_{n_1}$-measurability of $U$ and the independence of $V_{n_2}$ and $\mathcal{F}_{n_1}$, and of $Z$ and $\mathcal{F}_{n_1}$, in combination with $V_{n_2}\xrightarrow{d}Z$, the convergence holds under $P_E$ for every such $E$. In particular,

$$UV_{n_2}\xrightarrow{d}UZ \text{ under } P_E \text{ as } n_2\to\infty.$$

This proves the lemma in view of the definition of stable convergence in Equation (3). ∎

## 4 Standard normal limits with random norming

### 4.1 Random Norms and Their Limits under Model (1)

Barndorff-Nielsen and Sørensen [2] describe random measures of information that can be used as norms for estimators and test statistics, and sometimes yield a more useful limit (e.g., standard normal) for MLEs. Following Barndorff-Nielsen and Sørensen [2], we call them the observed, incremental observed, and incremental expected information measures. In the two-stage setting, it makes sense to define increments in the log-likelihood not only between individual subjects but also between stages, because sufficient statistics are stage-wise data summaries. We examine both.

First we formally define these measures, together with the expected (Fisher) information, and then we evaluate them under model (1):

1. The observed information is the negative derivative of the score function:

$$j_n(\theta)=-\dot S_n(\theta).$$

Barndorff-Nielsen and Sørensen [2] and others have considered the observed information to be a standard with which the other information measures are compared.

2. The Fisher information is the variance of the score function. Assuming the integral and derivatives exist and are interchangeable, it is given by

$$i_n(\theta)\equiv \operatorname{Var}[S_n(\theta)]=E\big[S_n(\theta)^2\big]=E\big[-\dot S_n(\theta)\big].$$

Efron and Hinkley [4] studied the trade-off between the observed and expected (Fisher) information. They argue for using the observed information for data analysis after a study is completed, and they express a preference for using the expected information to design an experiment. Barndorff-Nielsen and Sørensen [2] state that the difference (between the observed and expected information) is due, essentially, to the high content of ancillary information carried by the observed information. Pierce [17] and Firth [6] showed the observed information is larger than the Fisher information by an amount of order $O_p(n^{1/2})$.

To define the incremental information in general, suppose a study is conducted in $K$ stages with $n_k$ subjects in stage $k$, $k=1,\ldots,K$, and $n=\sum_{k=1}^K n_k$. Then the score function can be written in increments as $S_n(\theta)=\sum_{i=1}^n D_i(\theta)=\sum_{k=1}^K D_k(\theta)$, where, with a slight abuse of notation, $D_i(\theta)$ denotes the $i$th subject-wise increment in the first sum and $D_k(\theta)$ the $k$th stage-wise increment in the second, $D_k(\theta)$ being the sum of the subject-wise increments from stage $k$.

3. The incremental expected information was introduced as the conditional variance by Lévy and Borel [15] in an early version of the martingale central limit theorem. Let $\mathcal{F}_i$ denote the history of the experiment up through the trial for subject $i$, $i=1,\ldots,n$; and let $\mathcal{F}_0$ be the trivial $\sigma$-field. Then $\{\mathcal{F}_i\}_{i=0}^n$ is a filtration of $\mathcal{F}$, i.e., $\mathcal{F}_0\subseteq\mathcal{F}_1\subseteq\cdots\subseteq\mathcal{F}_n\subseteq\mathcal{F}$. Using subject-wise and stage-wise increments in $S_n(\theta)$, we obtain the subject-wise and stage-wise incremental norms:

$$I^d_n(\theta)=\sum_{i=1}^{n}E_\theta\big[D_i(\theta)^2\,\big|\,\mathcal{F}_{i-1}\big];\qquad I^D_n(\theta)=\sum_{k=1}^{K}E_\theta\big[D_k(\theta)^2\,\big|\,\mathcal{F}_{n_{k-1}}\big].$$

The incremental expected information is also called the quadratic characteristic of the score martingale.

4. The incremental observed information is given by

$$J^d_n(\theta)=\sum_{i=1}^{n}D_i(\theta)^2;\qquad J^D_n(\theta)=\sum_{k=1}^{K}D_k(\theta)^2.$$

In the terminology of martingale theory, it is called the quadratic variation of the score martingale [e.g., Barndorff-Nielsen and Sørensen [2]] and squared variation [e.g., Hall and Heyde [8]]. Barndorff-Nielsen and Sørensen show that use of the incremental observed information may improve the robustness of estimators.

It is common for the random information measures to converge to the Fisher information. However, there can be substantial differences with small sample sizes. Note that only observed and expected information are defined solely in terms of the likelihood function and its distribution law. The incremental observed and expected information require knowledge of how the log-likelihood function increases from one subject or one stage to the next.

We now evaluate the random information norms that we will use to obtain standard normal limits for $\hat\theta_n$. Under model (1), with $K=2$ stages, the observed information is

$$j_n(\theta)=\frac{n_1}{\sigma^2}[\dot\eta(x_1,\theta)]^2-\frac{n_1}{\sigma^2}[\bar y_1-\eta(x_1,\theta)]\,\ddot\eta(x_1,\theta)+\frac{n_2}{\sigma^2}[\dot\eta(x_2,\theta)]^2-\frac{n_2}{\sigma^2}[\bar y_2-\eta(x_2,\theta)]\,\ddot\eta(x_2,\theta). \tag{4}$$

The subject-wise and stage-wise incremental observed information are, respectively,

$$J^d_n(\theta)=\sum_{i=1}^{n}[D_i(\theta)]^2=\sum_{i=1}^{n_1}\frac{[y_i-\eta(x_1,\theta)]^2}{\sigma^4}[\dot\eta(x_1,\theta)]^2+\sum_{i=n_1+1}^{n}\frac{[y_i-\eta(x_2,\theta)]^2}{\sigma^4}[\dot\eta(x_2,\theta)]^2 \tag{5}$$

and

$$J^D_n(\theta)=[D_1(\theta)]^2+[D_2(\theta)]^2=\frac{n_1^2[\bar y_1-\eta(x_1,\theta)]^2}{\sigma^4}[\dot\eta(x_1,\theta)]^2+\frac{n_2^2[\bar y_2-\eta(x_2,\theta)]^2}{\sigma^4}[\dot\eta(x_2,\theta)]^2. \tag{6}$$

The subject-wise and stage-wise incremental expected information are the same:

$$\begin{aligned}
I^d_n(\theta)&=\sum_{i=1}^{n_1}E_\theta\big[D_i(\theta)^2\,\big|\,\mathcal{F}_{i-1}\big]+\sum_{i=n_1+1}^{n}E_\theta\big[D_i(\theta)^2\,\big|\,\mathcal{F}_{i-1}\big]\\
&=\sum_{i=1}^{n_1}E_\theta\Big\{\frac{(y_i-\eta(x_1,\theta))^2}{\sigma^4}[\dot\eta(x_1,\theta)]^2\,\Big|\,\mathcal{F}_{i-1}\Big\}+\sum_{i=n_1+1}^{n}E_\theta\Big\{\frac{(y_i-\eta(x_2,\theta))^2}{\sigma^4}[\dot\eta(x_2,\theta)]^2\,\Big|\,\mathcal{F}_{i-1}\Big\}\\
&=\frac{n_1}{\sigma^2}[\dot\eta(x_1,\theta)]^2+\frac{n_2}{\sigma^2}[\dot\eta(x_2,\theta)]^2;
\end{aligned}\tag{7}$$

$$\begin{aligned}
I^D_n(\theta)&=E_\theta\big\{D_1(\theta)^2\,\big|\,\mathcal{F}_0\big\}+E_\theta\big\{D_2(\theta)^2\,\big|\,\mathcal{F}_{n_1}\big\}\\
&=n_1^2\,E_\theta\Big\{\frac{[\bar y_1-\eta(x_1,\theta)]^2}{\sigma^4}[\dot\eta(x_1,\theta)]^2\,\Big|\,\mathcal{F}_0\Big\}+n_2^2\,E_\theta\Big\{\frac{[\bar y_2-\eta(x_2,\theta)]^2}{\sigma^4}[\dot\eta(x_2,\theta)]^2\,\Big|\,\mathcal{F}_{n_1}\Big\}\\
&=\frac{n_1}{\sigma^2}[\dot\eta(x_1,\theta)]^2+\frac{n_2}{\sigma^2}[\dot\eta(x_2,\theta)]^2.
\end{aligned}\tag{8}$$
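The information measures above are easy to compute from stage means. The sketch below again assumes the illustrative exponential mean $\eta(x,\theta)=e^{-\theta x}$ (any twice-differentiable mean works); all function names are our own.

```python
import numpy as np

# Numerical sketch of the information measures in equations (4), (6), and
# (7)/(8) for the illustrative mean eta(x, theta) = exp(-theta * x);
# x1, x2, n1, n2, and sigma2 are assumed known.
def eta(x, th):      return np.exp(-th * x)
def eta_dot(x, th):  return -x * np.exp(-th * x)
def eta_ddot(x, th): return x * x * np.exp(-th * x)

def observed_info(th, x1, y1bar, n1, x2, y2bar, n2, s2=1.0):
    """Equation (4): observed information j_n(theta)."""
    return ((n1 / s2) * eta_dot(x1, th) ** 2
            - (n1 / s2) * (y1bar - eta(x1, th)) * eta_ddot(x1, th)
            + (n2 / s2) * eta_dot(x2, th) ** 2
            - (n2 / s2) * (y2bar - eta(x2, th)) * eta_ddot(x2, th))

def stagewise_incr_observed(th, x1, y1bar, n1, x2, y2bar, n2, s2=1.0):
    """Equation (6): stage-wise incremental observed information."""
    return (n1 ** 2 * (y1bar - eta(x1, th)) ** 2 / s2 ** 2 * eta_dot(x1, th) ** 2
            + n2 ** 2 * (y2bar - eta(x2, th)) ** 2 / s2 ** 2 * eta_dot(x2, th) ** 2)

def incr_expected(th, x1, n1, x2, n2, s2=1.0):
    """Equations (7)-(8): incremental expected information (subject- and stage-wise)."""
    return (n1 / s2) * eta_dot(x1, th) ** 2 + (n2 / s2) * eta_dot(x2, th) ** 2
```

At zero residuals the second and fourth terms of (4) vanish, so the observed information coincides with the incremental expected information, while the stage-wise incremental observed information is zero.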

Lemma 4.1 provides convergence results for the random normings that are then used to obtain the desired standard normal limit for $\hat\theta_n$.

###### Lemma 4.1.

Under model (1) and the regularity conditions of Lane and Flournoy [13], as $n_2\to\infty$ with $n_1$ fixed,

1. $n^{-1}j_n(\hat\theta_n)\xrightarrow{p}U^{-2}$;
2. $n^{-1}J^d_n(\hat\theta_n)\xrightarrow{p}U^{-2}$;
3. $n^{-1}J^D_n(\hat\theta_n)\xrightarrow{d}U^{-2}W$;
4. $n^{-1}I^d_n(\hat\theta_n)=n^{-1}I^D_n(\hat\theta_n)\xrightarrow{p}U^{-2}$;

where $U=\sigma/|\dot\eta(x_2,\theta)|$ and $W\sim\chi^2_1$.

###### Proof.
1. The first two terms of equation (4) go to 0 when divided by $n$. Note in the fourth term that

$$\bar y_2-\eta(x_2,\theta)\xrightarrow{p}E[y_{2,j}-\eta(x_2,\theta)]=E_{\bar y_1}\big\{E[y_{2,j}-\eta(x_2,\theta)\,|\,\bar y_1]\big\}=0,\quad j=1,\ldots,n_2,$$

by the weak law of large numbers. As $n_2\to\infty$, $n_2/n\to1$ and $\hat\theta_n\xrightarrow{p}\theta$, and so $n^{-1}j_n(\hat\theta_n)\xrightarrow{p}[\dot\eta(x_2,\theta)]^2/\sigma^2=U^{-2}$.

2. The first term of equation (5) tends to 0 when divided by $n$. In the second term, by the weak law of large numbers,

$$\frac{1}{n_2}\sum_{i=n_1+1}^{n}\frac{[y_i-\eta(x_2,\theta)]^2}{\sigma^2}\xrightarrow{p}\frac{1}{\sigma^2}E_{\bar y_1}E\big\{[y_{n_1+1}-\eta(x_2,\theta)]^2\,\big|\,\bar y_1\big\}=1.$$

As $n_2\to\infty$, $n_2/n\to1$ and $\hat\theta_n\xrightarrow{p}\theta$, and so $n^{-1}J^d_n(\hat\theta_n)\xrightarrow{p}[\dot\eta(x_2,\theta)]^2/\sigma^2=U^{-2}$.

3. The first term of equation (6) goes to 0 when divided by $n$. In the second term, $\sqrt{n_2}\,[\bar y_2-\eta(x_2,\theta)]/\sigma$ is distributed as $N(0,1)$ for every $n_2$ conditionally on $\bar y_1$, so $n_2[\bar y_2-\eta(x_2,\theta)]^2/\sigma^2\xrightarrow{d}W\sim\chi^2_1$ as $n_2\to\infty$. And $W$ is independent of $\bar y_1$. Therefore,

$$n^{-1}J^D_n(\hat\theta_n)\xrightarrow{d}\frac{[\dot\eta(x_2,\theta)]^2}{\sigma^2}\,W=U^{-2}W,$$

where $W\sim\chi^2_1$ and $W$ is independent of $U$.

4. Again, the first terms of equations (7) and (8) go to 0 when divided by $n$. As $n_2\to\infty$, $n_2/n\to1$ and $\hat\theta_n\xrightarrow{p}\theta$; evaluated at $\hat\theta_n$, the common limit of $n^{-1}I^d_n$ and $n^{-1}I^D_n$ is $[\dot\eta(x_2,\theta)]^2/\sigma^2=U^{-2}$. ∎

### 4.2 The Generalized Cramér-Slutzky theorem and Its Application

Now we introduce the generalized Cramér-Slutzky theorem in order to obtain the main theoretical results in Theorem 4.3, that is, standard normal limits for $\hat\theta_n$ using random norms. According to Lemma 4.1, the observed information $j_n$, the stage-wise and subject-wise incremental expected information $I^D_n$ and $I^d_n$, and the subject-wise incremental observed information $J^d_n$ can be used to normalize the MLE via the generalized Cramér-Slutzky theorem, while the stage-wise incremental observed information $J^D_n$ cannot.

###### Lemma 4.2.

The Generalized Cramér-Slutzky Theorem [1]. Suppose that $X_n\xrightarrow{d}X$ $\mathcal{G}$-stably. Let $g(x,y)$ be a continuous function of two variables, and suppose $Y_n\xrightarrow{p}Y$, where $Y$ is a $\mathcal{G}$-measurable random variable. Then

$$g(X_n,Y_n)\xrightarrow{d}g(X,Y)\quad\mathcal{G}\text{-stably}.$$
###### Theorem 4.3.

Under model (1),

$$j_n(\hat\theta_n)^{1/2}(\hat\theta_n-\theta)\xrightarrow{d}N(0,1)\quad\text{(mixing)},$$
$$J^d_n(\hat\theta_n)^{1/2}(\hat\theta_n-\theta)\xrightarrow{d}N(0,1)\quad\text{(mixing)},$$
$$I^d_n(\hat\theta_n)^{1/2}(\hat\theta_n-\theta)=I^D_n(\hat\theta_n)^{1/2}(\hat\theta_n-\theta)\xrightarrow{d}N(0,1)\quad\text{(mixing)},$$

as $n_2\to\infty$ with $n_1$ fixed.

###### Proof.

Defining $g(x,y)=xy$, $g$ is a continuous function of two variables. Let $X_n=\sqrt{n}\,(\hat\theta_n-\theta)$ and $Y_n=n^{-1/2}j_n(\hat\theta_n)^{1/2}$. Then $X_n\xrightarrow{d}UZ$ $\mathcal{F}_{n_1}$-stably by Lemma 3.1, and $Y_n\xrightarrow{p}U^{-1}$ by Lemma 4.1. Because $U$ is $\mathcal{F}_{n_1}$-measurable, so is $U^{-1}$. Now by Lemma 4.2,

$$g(X_n,Y_n)=\sqrt{n}\,(\hat\theta_n-\theta)\cdot n^{-1/2}j_n(\hat\theta_n)^{1/2}=j_n(\hat\theta_n)^{1/2}(\hat\theta_n-\theta)\xrightarrow{d}UZU^{-1}=Z\quad\mathcal{F}_{n_1}\text{-stably}.$$

Since $Z$ is independent of $\mathcal{F}_{n_1}$,

$$j_n(\hat\theta_n)^{1/2}(\hat\theta_n-\theta)\xrightarrow{d}N(0,1)\quad\text{(mixing)}.$$

Similarly,

$$J^d_n(\hat\theta_n)^{1/2}(\hat\theta_n-\theta)\xrightarrow{d}N(0,1)\quad\text{(mixing)},$$
$$I^d_n(\hat\theta_n)^{1/2}(\hat\theta_n-\theta)=I^D_n(\hat\theta_n)^{1/2}(\hat\theta_n-\theta)\xrightarrow{d}N(0,1)\quad\text{(mixing)}. \;∎$$

## 5 Adaptive Optimal Design Examples

In this section, we apply Theorem 4.3 to normalize MLEs following an adaptive optimal design under logistic and exponential (location and scale) regression models. Then we compare their tail probabilities and the differences between cumulative distribution functions obtained using the random norms and the Fisher information. For all models, the dose in the first stage is fixed at $x_1$, while the dose for stage 2 is selected from the range $[a,b]$ based on stage 1 data. The possible divergence of the MLE of $\theta$ to infinity necessitates restricting the search to some finite interval $[a,b]$, which is held fixed throughout this section. All simulations assume a known true parameter value $\theta$ and known variance $\sigma^2$.

The stage-two dose that maximizes the increase in information on the unknown parameter is

$$x_2(\theta)=\operatorname*{arg\,max}_{x_2\in[a,b]}\ \frac{1}{n_2}\operatorname{Var}(S_n-S_{n_1})=\operatorname*{arg\,max}_{x_2\in[a,b]}\ [\dot\eta(x_2,\theta)]^2. \tag{9}$$

The two-stage adaptive optimal design is $\xi_A=\{(x_1,n_1),(\hat x_2,n_2)\}$, where $\hat x_2$ is selected adaptively as given by (9) with $\theta$ replaced by $\hat\theta_{n_1}$, i.e.,

$$\hat x_2=\begin{cases}a & \text{if } x_2(\hat\theta_{n_1})\le a;\\ x_2(\hat\theta_{n_1}) & \text{if } x_2(\hat\theta_{n_1})\in[a,b];\\ b & \text{if } x_2(\hat\theta_{n_1})\ge b;\end{cases}$$

and $n_2=n-n_1$. For each model, we evaluate the performance of the MLE norms for several fixed values of $n_1$, including a locally optimal stage 1 sample size [14]:

$$n_1^*(\theta)\equiv\operatorname*{arg\,max}_{n_1\in\{1,\ldots,n\}} i(\xi_A,\theta),$$

where the notation $i(\xi_A,\theta)$ makes the Fisher information's dependence on the design explicit. To provide an ideal benchmark, $n_1^*(\theta)$ is evaluated at the true value of $\theta$ for all models. A practical method to approximate the locally optimal stage 1 sample size is discussed by Lane et al. [14].
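The truncated dose rule above can be sketched with a simple grid search. For the illustrative exponential mean $\eta(x,\theta)=e^{-\theta x}$ (our assumption, not a model from this section), $[\dot\eta(x,\theta)]^2=x^2e^{-2\theta x}$ is maximized at $x=1/\theta$, so the rule reduces to clipping $1/\hat\theta_{n_1}$ to $[a,b]$; the generic grid search below follows the definition in (9).

```python
import numpy as np

# Truncated adaptive stage-two dose rule: maximize [eta_dot(x, theta_hat)]^2
# over a grid on [a, b]. Uses the illustrative mean eta(x, theta) = exp(-theta*x).
def eta_dot(x, th):
    return -x * np.exp(-th * x)

def adaptive_dose(theta_hat, a, b, grid_size=20001):
    grid = np.linspace(a, b, grid_size)           # search restricted to [a, b]
    return grid[np.argmax(eta_dot(grid, theta_hat) ** 2)]
```

When the unconstrained optimum $1/\hat\theta_{n_1}$ lies outside $[a,b]$, the grid search automatically returns the nearer boundary, mirroring the piecewise definition of $\hat x_2$.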

### 5.1 Logistic Regression Models

We explore the sample sizes needed to obtain approximately normal tail probabilities for the location-parameter and scale-parameter logistic regression models separately.

#### 5.1.1 The Logistic-Location Model

Consider the Logistic-Location Model with independent normal errors:

$$y=[1+e^{x-\theta}]^{-1}+\epsilon,\qquad \epsilon\sim N(0,\sigma^2),\quad x\in[a,b],\quad -\infty<\theta<\infty.$$

Maximizing the first-stage likelihood function,

$$L_{n_1}(\theta\,|\,y_{11},\ldots,y_{1,n_1})\propto\exp\Big\{-\frac{n_1}{2\sigma^2}\big[\bar y_1-(1+e^{x_1-\theta})^{-1}\big]^2\Big\},$$

yields the first-stage MLE $\hat\theta_{n_1}=x_1+\log[\bar y_1/(1-\bar y_1)]$ for $\bar y_1\in(0,1)$.
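A quick numerical check that inverting the logistic-location mean reproduces the first-stage MLE: solving $\bar y_1=(1+e^{x_1-\theta})^{-1}$ for $\theta$ gives $\theta=x_1+\log[\bar y_1/(1-\bar y_1)]$ whenever $\bar y_1$ lies strictly between 0 and 1 (a closed form implied by the single-dose likelihood above; the function names are our own).

```python
import numpy as np

# Logistic-location mean and its inversion at the stage 1 mean response.
def eta(x, theta):
    return 1.0 / (1.0 + np.exp(x - theta))

def theta_hat_stage1(x1, y1bar):
    # requires 0 < y1bar < 1; outside that range the MLE is truncated
    return x1 + np.log(y1bar / (1.0 - y1bar))
```

By construction, plugging the estimate back into the mean recovers $\bar y_1$, and $\bar y_1=1/2$ gives $\hat\theta_{n_1}=x_1$.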

Adaptively selecting the second-stage dose to be

$$\hat x_2=\begin{cases}\hat\theta_{n_1} & \text{if } \bar y_1\in\big[(1+e^{x_1-a})^{-1},(1+e^{x_1-b})^{-1}\big],\\ b & \text{if } \bar y_1\ge(1+e^{x_1-b})^{-1},\\ a & \text{if } \bar y_1\le(1+e^{x_1-a})^{-1},\end{cases}$$

the likelihood given data from both stages is

$$L_n(\theta\,|\,y_{11},\ldots,y_{1,n_1},y_{21},\ldots,y_{2,n_2})\propto\exp\Big\{-\frac{n_1}{2\sigma^2}\big[\bar y_1-(1+e^{x_1-\theta})^{-1}\big]^2-\frac{n_2}{2\sigma^2}\big[\bar y_2-(1+e^{\hat x_2(\bar y_1)-\theta})^{-1}\big]^2\Big\},$$

and the MLE based on all data is

$$\hat\theta_n=\begin{cases}\theta'_n & \text{if } \theta'_n\in(0,1/a),\\ 0 & \text{if } \theta'_n\le 0,\\ 1/a & \text{if } \theta'_n\ge 1/a,\end{cases}$$

where $\theta'_n$ maximizes $L_n(\theta\,|\,y_{11},\ldots,y_{1,n_1},y_{21},\ldots,y_{2,n_2})$. The average Fisher information given data from both stages is a weighted average over the three possible values of $\hat x_2$, where the weights include the probabilities that $\hat x_2$ falls on the boundaries $a$ and $b$, respectively.

According to equations (4), (5) and (7), respectively, the observed information is

$$j_n(\theta)=\frac{n_1}{\sigma^2}(1+e^{x_1-\theta})^{-4}e^{2(x_1-\theta)}-\frac{n_1}{\sigma^2}\big[\bar y_1-(1+e^{x_1-\theta})^{-1}\big](1+e^{x_1-\theta})^{-3}e^{x_1-\theta}(e^{x_1-\theta}-1)+\frac{n_2}{\sigma^2}(1+e^{x_2-\theta})^{-4}e^{2(x_2-\theta)}-\frac{n_2}{\sigma^2}\big[\bar y_2-(1+e^{x_2-\theta})^{-1}\big](1+e^{x_2-\theta})^{-3}e^{x_2-\theta}(e^{x_2-\theta}-1);$$

the subject-wise incremental observed information is

$$J^d_n(\theta)=\sum_{i=1}^{n_1}\frac{\big[y_i-(1+e^{x_1-\theta})^{-1}\big]^2}{\sigma^4}(1+e^{x_1-\theta})^{-4}e^{2(x_1-\theta)}+\sum_{i=n_1+1}^{n}\frac{\big[y_i-(1+e^{x_2-\theta})^{-1}\big]^2}{\sigma^4}(1+e^{x_2-\theta})^{-4}e^{2(x_2-\theta)};$$

and the stage-wise and subject-wise incremental expected information are

$$I^D_n(\theta)=I^d_n(\theta)=\frac{n_1}{\sigma^2}(1+e^{x_1-\theta})^{-4}e^{2(x_1-\theta)}+\frac{n_2}{\sigma^2}(1+e^{x_2-\theta})^{-4}e^{2(x_2-\theta)}.$$
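The common factor $(1+e^{x-\theta})^{-4}e^{2(x-\theta)}$ in these formulas is just $[\dot\eta(x,\theta)]^2$ for the logistic-location mean, which can be verified numerically with a finite difference (the helper names below are our own):

```python
import numpy as np

# Check of the derivative behind the logistic-location information formulas:
# for eta(x, theta) = (1 + e^{x-theta})^{-1}, the squared derivative in theta
# is [eta_dot]^2 = e^{2(x-theta)} (1 + e^{x-theta})^{-4}.
def eta(x, th):
    return 1.0 / (1.0 + np.exp(x - th))

def eta_dot_sq(x, th):
    return np.exp(2.0 * (x - th)) * (1.0 + np.exp(x - th)) ** (-4)

def numeric_dot(x, th, h=1e-6):
    # central finite difference in theta
    return (eta(x, th + h) - eta(x, th - h)) / (2.0 * h)
```

Note that $[\dot\eta(x,\theta)]^2$ is maximized at $x=\theta$, where it equals $1/16$, which is what makes $x_2(\theta)=\theta$ the locally optimal stage-two dose for this model.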