# A Unified Approach for Constructing Confidence Intervals and Hypothesis Tests Using h-function

We introduce a general method, named the h-function method, to unify the constructions of level-$\alpha$ exact tests and $1-\alpha$ exact confidence intervals. Using this method, any confidence interval is improved as follows: i) an approximate interval, including a point estimator, is modified to an exact interval; ii) an exact interval is refined to an interval that is a subset of the previous one. Two real datasets are used to illustrate the method.

## Authors


## 1 Introduction

It is well known that there is a one-to-one mapping between a family of tests and a confidence set. If one of the two is given in advance, then the other can be derived as follows.

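Let $\Theta$ be the range of a parameter of interest $\theta$ and $S$ be a sample space. For each $\theta_0 \in \Theta$, let $A(\theta_0)$ be the acceptance region of a level-$\alpha$ test of $H_0: \theta = \theta_0$. Then,

$$C(\underline{x}) = \{\theta_0 \in \Theta : \underline{x} \in A(\theta_0)\} \tag{1}$$

is a $1-\alpha$ confidence set. Conversely, let $C(\underline{x})$ be a $1-\alpha$ confidence set. Define

$$A(\theta_0) = \{\underline{x} \in S : \theta_0 \in C(\underline{x})\}. \tag{2}$$

Then $A(\theta_0)$ is the acceptance region of a level-$\alpha$ test of $H_0: \theta = \theta_0$.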

An interval is typically derived from the tests since tests are easier to construct. However, solving $C(\underline{x})$ from $A(\theta_0)$ as in (1) is a complicated process. The first goal of this paper is to simplify this process. This is different from the existing methods described in Casella and Berger (2002), where the test and interval are constructed separately.

The key feature of a confidence interval is that its confidence coefficient, which is defined to be the infimum coverage probability (ICP) over the entire parameter space (Casella and Berger, 2002), should be no smaller than the nominal level $1-\alpha$. To avoid ambiguity in the discussion that follows, a $1-\alpha$ exact confidence interval means an interval with an ICP no less than $1-\alpha$, while a $1-\alpha$ confidence interval only means that the nominal level of the interval is set to be $1-\alpha$ but its ICP can be any number in $[0,1]$. For example, a 95% Wald interval for a proportion is not necessarily a 95% exact interval. In fact, a Wald interval is a zero exact interval for any sample size and any $\alpha$ in $(0,1)$ (Brown, Cai and DasGupta, 2001; Agresti, 2013).

The requirement of an ICP no smaller than $1-\alpha$ is violated for an approximate interval. Thus, the major concern is whether an inferential conclusion drawn from the interval is reliable, as its ICP is seldom reported and can be much smaller than the nominal level $1-\alpha$. Huwang (1995) proved this for the Wilson interval (1927), even for large samples. On the other hand, an approximate interval is attractive in practice since it is easy to derive. Therefore, it is of great interest to build an exact interval from a given approximate interval, which is the second goal of the paper. To the best of our knowledge, limited research has been conducted on this issue.

One common way to obtain a $1-\alpha$ exact two-sided interval is to take the intersection of two one-sided intervals, for example, the Clopper-Pearson interval (1934). Such intervals may be conservative. Hence, it is also of great interest to shrink any given exact interval, which is the third goal of the paper. Casella (1986) and Wang (2014) refine exact confidence intervals for a single proportion. However, their methods cannot be applied to the general case when there exist nuisance parameters.

We address the above three problems by introducing an h-function that is closely related to the p-value. The computation of p-values was originally done by Arbuthnot (1710) in a study comparing the probabilities of male and female births. Laplace addressed the same problem later using binomial distributions (Stigler, 1986). The concept of the p-value was first formally introduced by Pearson (1900), and the use of the p-value in statistics was popularized by Fisher (1925), where he proposed the level $p = 0.05$ as a limit for statistical significance. Now, the p-value method is a commonly used method for test construction, where the p-value is treated as a function of the random observation. In this paper, however, the p-value is also treated as a function of the parameter of interest for confidence interval construction. It can modify and/or refine any interval, which offers a solution to the challenging problem of improving an interval when a nuisance parameter exists.

The paper is organized as follows. In Section 2, we describe the h-function method that yields both level-$\alpha$ exact tests and $1-\alpha$ exact confidence intervals. Section 3 discusses how to modify any two-sided confidence interval to an exact interval. Section 4 improves any exact two-sided interval. Section 5 modifies any one-sided interval to the smallest one. We provide discussions in Section 6. All proofs are given in the Appendix.

## 2 A general method

Suppose $\underline{X}$ is observed from a distribution with a joint cumulative distribution function (CDF) specified by a parameter vector $(\theta, \underline{\eta})$ in a parameter space $H$. Here $\theta$ is the parameter of interest and $\underline{\eta}$ is the nuisance parameter vector. The null hypothesis $H_0$ is one of the three forms: $\theta = \theta_0$, $\theta \le \theta_0$ and $\theta \ge \theta_0$, for a fixed value $\theta_0$. Each form corresponds to two-sided, lower and upper one-sided intervals for $\theta$, respectively. We next introduce the h-function method and illustrate its basic usage through two simple cases.

### 2.1 The h-function method

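A p-value $p(\underline{X})$ is a statistic satisfying $0 \le p(\underline{x}) \le 1$ for every sample point $\underline{x}$. A small $p(\underline{x})$ provides evidence that $H_0$ is false. Following Casella and Berger (2002), a p-value is valid if, for every $\alpha \in [0,1]$,

$$\sup_{\theta \in H_0} P_{(\theta,\underline{\eta})}\big(p(\underline{X}) \le \alpha\big) \le \alpha. \tag{3}$$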

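For simplicity, we drop the subscript $(\theta,\underline{\eta})$ in the future discussion. In many cases, a valid p-value at $\underline{x}$ can be defined through a test statistic $T(\underline{X})$ using, for example,

$$p(\underline{x}) = \sup_{\theta \in H_0} P\big(T(\underline{X}) \le T(\underline{x})\big), \tag{4}$$

when a small value of $T(\underline{X})$ is not in favor of $H_0$. This includes the case that $T(\underline{X})$ is the likelihood ratio test statistic for $H_0$. For any $p(\underline{X})$ satisfying (3), a level-$\alpha$ acceptance region for $H_0$ is equal to

$$\{\underline{x} : p(\underline{x}) > \alpha\}.$$

Note that $H_0$ is equal to a single point or a one-sided interval with the boundary point $\theta_0$. The p-value indeed depends on both $\underline{x}$ and $\theta_0$. This is the key fact for the future theoretical development. Thus, we rewrite the p-value as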

$$h(\underline{X},\theta_0) \overset{\text{def}}{=} p(\underline{X}) \tag{5}$$

and call the left-hand side the h-function. We emphasize that the h-function is closely related to but different from the p-value. The former is a function of both the random observation and the parameter of interest, while the latter is typically treated as a function of the random observation only. Based on this $h$, the level-$\alpha$ acceptance region for $H_0$ and the $1-\alpha$ exact confidence set for $\theta$ are given by

$$A(\theta_0) = \{\underline{x} : h(\underline{x},\theta_0) > \alpha\} \quad \text{and} \quad C(\underline{x}) = \{\theta_0 : h(\underline{x},\theta_0) > \alpha\}, \tag{6}$$

respectively. Both $A(\theta_0)$ and $C(\underline{x})$ are obtained by solving the same inequality but in terms of two different arguments, $\underline{x}$ and $\theta_0$, respectively. In this sense, the constructions of test and confidence set are unified. We do not obtain $A(\theta_0)$ or $C(\underline{x})$ from each other following (1) and (2). Instead, we use the intermediary h-function in (5) and name this the h-function method. In fact, Blaker (2000) used a special $h$ in (6) to derive confidence intervals for four discrete distributions of a single parameter. Here, the h-function method is applied to a general case with or without nuisance parameters. In particular, the Blaker interval (2000) can be uniformly shortened by this method as shown in Table 4.

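It is known that $C(\underline{x})$ may not be an interval. So, we use $\overline{C(\underline{x})}$. Here, $\overline{B}$ denotes the smallest simply connected set containing set $B$. Therefore, $\overline{C(\underline{x})}$ is always an interval and its infimum coverage probability (ICP) over the entire parameter space is at least $1-\alpha$ because $\overline{C(\underline{x})}$ contains $C(\underline{x})$. That is,

$$\mathrm{ICP}\big(\overline{C(\underline{X})}\big) = \inf_{(\theta,\underline{\eta}) \in H} P\big(\theta \in \overline{C(\underline{X})}\big) \ge \inf_{(\theta,\underline{\eta}) \in H} P\big(\theta \in C(\underline{X})\big) = \mathrm{ICP}\big(C(\underline{X})\big) \ge 1-\alpha.$$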

The ICP is also called the confidence coefficient (Casella and Berger, 2002). Throughout the paper, we use to denote and if it causes no confusion.

In general, a test statistic may also depend on , for example, the t-statistic, and thus has a form of . When a small value of is in favor of , let be a subset of the sample space, then

 (7)

where

is either the joint probability mass function (PMF) or probability density function (PDF) of

.

For illustration of the h-function method, we next consider two classic problems: i) estimating the proportion $p$ based on a binomial observation $X \sim \mathrm{Bin}(n,p)$; ii) estimating the difference of two proportions $d = p_1 - p_2$ based on two independent binomials $X \sim \mathrm{Bin}(n_1,p_1)$ and $Y \sim \mathrm{Bin}(n_2,p_2)$. These are widely used in practice, including clinical trials. The two problems are still open since the best intervals for $p$ and $d$ have not been recognized yet. Let $p_B(x,n,p)$ and $F_B(x,n,p)$ denote the PMF and CDF of $\mathrm{Bin}(n,p)$.

### 2.2 Two-sided confidence intervals for a proportion

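For a fixed $p_0 \in [0,1]$, consider the hypotheses $H_0: p = p_0$ vs $H_A: p \ne p_0$. Suppose $T_p(X,p_0)$ is a test statistic satisfying

$$T_p(x,p_0) = T_p(n-x, 1-p_0), \quad \forall x \in [0,n], \tag{8}$$

and a small value of $T_p$ supports $H_A$. Hence, the h-function based on $T_p$ is

$$h_p(x,p_0) = \sum_{\{y \in [0,n] \,:\, T_p(y,p_0) \le T_p(x,p_0)\}} p_B(y,n,p_0).$$

The acceptance region of the level-$\alpha$ test for $H_0$ and the $1-\alpha$ exact confidence interval for $p$ are

$$A_p(p_0) = \{x : h_p(x,p_0) > \alpha\} \quad \text{and} \quad C_p(x) = \overline{\{p_0 : h_p(x,p_0) > \alpha\}}.$$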

Now we state a fact that the upper limit of interval $C_p(x) = [L_p(x), U_p(x)]$ can be obtained from the lower limit and vice versa. This halves the work of interval construction.

###### Proposition 1

For a test statistic $T_p$ satisfying (8), we have

$$U_p(x) = 1 - L_p(n-x), \quad \forall x \in [0,n]. \tag{9}$$

Define the first h-function

$$h_{p1}(x,p_0) = \min\big\{2\min\{P_{p_0}(X \le x),\, P_{p_0}(X \ge x)\},\, 1\big\}.$$

This yields the two-sided Clopper-Pearson interval (1934) for $p$, denoted by $C_{p1}$. The interval also satisfies (9) since $h_{p1}(x,p_0) = h_{p1}(n-x, 1-p_0)$.
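As a rough numerical illustration, the Python sketch below evaluates the first h-function on a grid of $p_0$ values and takes the smallest and largest accepted grid points as the interval limits. The function names, the grid resolution and the choice $n = 10$ are assumptions made for this example only, and the grid only approximates the exact limits.

```python
from math import comb


def binom_pmf(y, n, p):
    """PMF p_B(y, n, p) of Bin(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def h_p1(x, n, p0):
    """First h-function: min{2 min{P(X <= x), P(X >= x)}, 1} under p = p0."""
    lower = sum(binom_pmf(y, n, p0) for y in range(x + 1))
    upper = sum(binom_pmf(y, n, p0) for y in range(x, n + 1))
    return min(2 * min(lower, upper), 1.0)


def h_interval(h, x, n, alpha=0.05, grid_size=2001):
    """Grid approximation (assumed resolution) of the closure of {p0 : h > alpha}."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    accepted = [p0 for p0 in grid if h(x, n, p0) > alpha]
    return min(accepted), max(accepted)


if __name__ == "__main__":
    n = 10  # illustrative sample size
    for x in range(n + 1):
        low, up = h_interval(h_p1, x, n)
        print(f"x={x:2d}  [{low:.4f}, {up:.4f}]")
```

The same inequality `h_p1(x, n, p0) > alpha` is solved in `p0` here; solving it in `x` for a fixed `p0` would give the acceptance region instead, which is the sense in which the two constructions are unified.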

Define the second h-function based on a test statistic $T_{p2}$,

$$h_{p2}(x,p_0) = \sum_{\{y \in [0,n] \,:\, T_{p2}(y,p_0) \le T_{p2}(x,p_0)\}} p_B(y,n,p_0),$$

where $T_{p2}(x,p_0) = \min\{P_{p_0}(X \le x),\, P_{p_0}(X \ge x)\}$. This yields the Blaker interval (2000), denoted by $C_{p2}$, for $p$. Since $T_{p2}$ satisfies (8), interval $C_{p2}$ satisfies (9). As discussed in Casella and Berger (2002) and Agresti (2013), $C_{p2}$ has a nesting property: an interval with a higher confidence level always contains the one with a lower level. Also, $C_{p2}$ is a subset of $C_{p1}$. But $C_{p2}$ can still be shortened uniformly, as shown in Section 4.
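The subset relation between the Blaker and Clopper-Pearson intervals can be checked numerically. The sketch below is illustrative only: it assumes the tail-probability form of the Blaker statistic, $\min\{P_{p_0}(X \le x), P_{p_0}(X \ge x)\}$, uses a grid approximation of the limits, and adds a small tolerance `tol` (an assumption of this example) so that ties between statistic values are not lost to floating-point rounding.

```python
from math import comb


def binom_pmf(y, n, p):
    """PMF p_B(y, n, p) of Bin(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def tails(n, p0):
    """Return the PMF and both tail probabilities for every y in 0..n."""
    pmf = [binom_pmf(y, n, p0) for y in range(n + 1)]
    lower = [sum(pmf[: y + 1]) for y in range(n + 1)]  # P(X <= y)
    upper = [sum(pmf[y:]) for y in range(n + 1)]       # P(X >= y)
    return pmf, lower, upper


def h_cp(x, n, p0):
    """Clopper-Pearson h-function: min{2 min{P(X <= x), P(X >= x)}, 1}."""
    _, lower, upper = tails(n, p0)
    return min(2 * min(lower[x], upper[x]), 1.0)


def h_blaker(x, n, p0, tol=1e-12):
    """Blaker h-function: total probability of y with T(y) <= T(x)."""
    pmf, lower, upper = tails(n, p0)
    t = [min(lo, up) for lo, up in zip(lower, upper)]
    return sum(pmf[y] for y in range(n + 1) if t[y] <= t[x] + tol)


def h_interval(h, x, n, alpha=0.05, grid_size=1001):
    """Grid approximation of the closure of {p0 : h(x, n, p0) > alpha}."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    accepted = [p0 for p0 in grid if h(x, n, p0) > alpha]
    return min(accepted), max(accepted)


if __name__ == "__main__":
    n = 10  # illustrative sample size
    for x in range(n + 1):
        print(x, h_interval(h_cp, x, n), h_interval(h_blaker, x, n))
```

Because the Blaker h-function never exceeds the Clopper-Pearson one at the same $(x, p_0)$, every grid point accepted by the former is accepted by the latter, which is what the printed limits reflect.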

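The third h-function based on the likelihood ratio test statistic $T_{p3}$ is

$$h_{p3}(x,p_0) = \sum_{\{y \in [0,n] \,:\, T_{p3}(y,p_0) \le T_{p3}(x,p_0)\}} p_B(y,n,p_0),$$

where

$$T_{p3}(x,p_0) = \frac{p_0^{x}(1-p_0)^{n-x}}{\hat{p}^{x}(1-\hat{p})^{n-x}}$$

for $\hat{p} = x/n$. This generates a confidence interval $C_{p3}$. The upper limit of $C_{p3}$ can also be determined by its lower limit following (9) because $T_{p3}$ satisfies (8).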

We report in Table 1 the infimum coverage probability (ICP) and total interval length (TIL) over all sample points for the above intervals for when is from 16 to 100. Their ICPs are all no smaller than 0.95. Due to Wang (2007), the ICP of an interval for with a nondecreasing lower confidence limit is achieved at one of the values for , where denotes the left limit of when approaches . Thus, the ICP of can be computed precisely. Interval is the winner among the three due to its small TIL. In Section 4, we see that these three intervals are uniformly shorter, as shown in Table 4, than their modified intervals . For comparison purpose, the interval in Wang (2014) is also reported. This interval, derived by an iterative algorithm, is admissible and is shortest among the four intervals. Here, a exact confidence interval is admissible if any interval , which is contained in but not equal to , has an ICP less than .

### 2.3 Confidence intervals for the difference of two proportions

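There are three commonly used measurements, the difference $d = p_1 - p_2$, the relative risk and the odds ratio, for comparison of the two proportions. Here we focus on the difference. Consider the following hypotheses for any fixed $d_0 \in [-1,1]$: $H_0: d = d_0$ vs $H_A: d \ne d_0$. Under $H_0$, $p_1 = p_2 + d_0$ for $p_2 \in D(d_0)$, where

$$D(d_0) = \begin{cases} [0,\, 1-d_0], & \text{if } d_0 \in [0,1]; \\ [-d_0,\, 1], & \text{if } d_0 \in [-1,0). \end{cases}$$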

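Suppose $T_d(X,Y,d_0)$ is a test statistic satisfying

$$T_d(x,y,d_0) = T_d(n_1-x,\, n_2-y,\, -d_0), \quad \forall (x,y) \in S_d, \tag{10}$$

for the sample space $S_d$ of $(X,Y)$, and a small value of $T_d$ supports $H_A$. The h-function based on $T_d$ is

$$h_d(x,y,d_0) = \sup_{p_2 \in D(d_0)} \sum_{\{(u,v) \in S_d \,:\, T_d(u,v,d_0) \le T_d(x,y,d_0)\}} p_B(u,n_1,p_2+d_0)\, p_B(v,n_2,p_2). \tag{11}$$

The acceptance region of the level-$\alpha$ test for $H_0$ and the $1-\alpha$ exact confidence interval for $d$ are

$$A_d(d_0) = \{(x,y) : h_d(x,y,d_0) > \alpha\} \quad \text{and} \quad C_d(x,y) = \overline{\{d_0 : h_d(x,y,d_0) > \alpha\}}. \tag{12}$$

Similar to Proposition 1, we state a fact that determines the upper limit of $C_d(x,y) = [L_d(x,y), U_d(x,y)]$ from its lower limit and vice versa. This again simplifies interval calculation.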

###### Proposition 2

For a test statistic $T_d$ satisfying (10), we have

$$U_d(x,y) = -L_d(n_1-x,\, n_2-y), \quad \forall (x,y) \in S_d. \tag{13}$$

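First, introduce the likelihood ratio test statistic:

$$T_{d1}(x,y,d_0) = \left(\frac{\hat{p}_{1d}}{\hat{p}_1}\right)^{x} \left(\frac{1-\hat{p}_{1d}}{1-\hat{p}_1}\right)^{n_1-x} \left(\frac{\hat{p}_{2d}}{\hat{p}_2}\right)^{y} \left(\frac{1-\hat{p}_{2d}}{1-\hat{p}_2}\right)^{n_2-y},$$

where

$$\hat{p}_{2d}(x,y,d_0) = \arg\max_{p_2 \in D(d_0)} p_B(x,n_1,p_2+d_0)\, p_B(y,n_2,p_2),$$

$$\hat{p}_{1d}(x,y,d_0) = \hat{p}_{2d}(x,y,d_0) + d_0, \quad \hat{p}_1 = x/n_1, \quad \hat{p}_2 = y/n_2.$$

It can be shown that $T_{d1}$ satisfies (10). The h-function $h_{d1}$ based on $T_{d1}$ follows (11). Then, the acceptance region of the level-$\alpha$ test and the $1-\alpha$ confidence interval for $d$ are given in (12) and are denoted by $A_{d1}$ and $C_{d1}$, respectively. The upper limit of $C_{d1}$ is determined by the lower limit through (13).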

Secondly, the score test statistic, see Agresti and Min (2001) and Fay (2010), is

$$T_{d2}(x,y,d_0) = -\frac{|\hat{p}_1 - \hat{p}_2 - d_0|}{\sqrt{\dfrac{\hat{p}_{1d}(x,y,d_0)\big(1-\hat{p}_{1d}(x,y,d_0)\big)}{n_1} + \dfrac{\hat{p}_{2d}(x,y,d_0)\big(1-\hat{p}_{2d}(x,y,d_0)\big)}{n_2}}}.$$

When the above ratio takes the form 0/0, it is defined to be 0. We obtain the h-function $h_{d2}$ following (11) and then derive the acceptance region $A_{d2}$ of the level-$\alpha$ test and the $1-\alpha$ exact confidence interval $C_{d2}$ following (12). Since $T_{d2}$ satisfies (10), the upper limit of $C_{d2}$ is determined by the lower limit through (13).

Table 2 reports the ICP and TIL over all sample points in $S_d$ for $C_{d1}$ and $C_{d2}$, and for other intervals to be discussed later, when $1-\alpha = 0.95$ and $(n_1,n_2)$ varies. The ICPs of $C_{d1}$ and $C_{d2}$ are at least 0.95. Each value is computed by a grid search: select pairs $(p_1,p_2)$, where both $p_1$ and $p_2$ are multiples of 0.005; compute the coverage probability at each pair; use the minimum of these coverage probabilities as the ICP.
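The grid search described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind Table 2: the function names are hypothetical, and a Wald-type interval for $d$ is used only as a cheap stand-in for the interval under study, so the exact intervals would be passed through the same `interval(x, y, n1, n2)` interface. A coarser `step` than 0.005 is used in the demo call purely to keep it fast.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(y, n, p):
    """PMF p_B(y, n, p) of Bin(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def wald_diff(x, y, n1, n2, alpha=0.05):
    """Wald-type interval for d = p1 - p2 (stand-in interval for this sketch)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1, p2 = x / n1, y / n2
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - half, p1 - p2 + half


def til(interval, n1, n2):
    """Total interval length over all sample points (x, y)."""
    total = 0.0
    for x in range(n1 + 1):
        for y in range(n2 + 1):
            low, up = interval(x, y, n1, n2)
            total += up - low
    return total


def icp_grid(interval, n1, n2, step=0.005):
    """Minimum coverage probability over pairs (p1, p2) that are multiples of step."""
    m = round(1 / step)
    limits = {(x, y): interval(x, y, n1, n2)
              for x in range(n1 + 1) for y in range(n2 + 1)}
    pmf1 = [[binom_pmf(x, n1, i / m) for x in range(n1 + 1)] for i in range(m + 1)]
    pmf2 = [[binom_pmf(y, n2, j / m) for y in range(n2 + 1)] for j in range(m + 1)]
    icp = 1.0
    for i in range(m + 1):
        for j in range(m + 1):
            d = i / m - j / m  # true difference at this grid pair
            cover = sum(pmf1[i][x] * pmf2[j][y]
                        for (x, y), (low, up) in limits.items() if low <= d <= up)
            icp = min(icp, cover)
    return icp


if __name__ == "__main__":
    print(icp_grid(wald_diff, 5, 5, step=0.05), til(wald_diff, 5, 5))
```

A grid minimum is only an upper bound on the true ICP, since the infimum may be attained between grid points.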

## 3 Modifying any given two-sided confidence interval

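For convenience, we consider the closed interval $C_0(\underline{X}) = [L_0(\underline{X}), U_0(\underline{X})]$ for $\theta$ if the confidence limits are not infinity. For a given value $\theta_0$, consider

$$H_0: \theta = \theta_0 \quad \text{vs} \quad H_A: \theta \ne \theta_0. \tag{14}$$

Define a test statistic through interval $C_0(\underline{X})$:

$$T_2(\underline{X},\theta_0) = \min\{\theta_0 - L_0(\underline{X}),\; U_0(\underline{X}) - \theta_0\}. \tag{15}$$

Clearly, a small $T_2$ provides strong evidence for establishing $H_A$, and

$$T_2(\underline{x},\theta_0) \ge 0 \ \text{ if and only if } \ \theta_0 \in C_0(\underline{x}). \tag{16}$$

The h-function based on $T_2$ is

$$h_2(\underline{x},\theta_0) = \sup_{H_0} P\big(T_2(\underline{X},\theta_0) \le T_2(\underline{x},\theta_0)\big). \tag{17}$$

Following (6), the level-$\alpha$ acceptance region for $H_0$ and the $1-\alpha$ exact two-sided confidence interval for $\theta$ are

$$A_2(\theta_0) = \{\underline{x} : h_2(\underline{x},\theta_0) > \alpha\} \quad \text{and} \quad C_0^{M}(\underline{x}) = \overline{\{\theta_0 : h_2(\underline{x},\theta_0) > \alpha\}}. \tag{18}$$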

In the rest of the paper, denotes the smallest closed simply connected set that contains set . The superscript “M” refers a modification. In general, . For a positive integer , is the resultant interval after the modification process of (15) through (18) is applied to for consecutive times. For example, when , , and when ,

###### Theorem 1

For any given confidence interval $C_0(\underline{X})$, consider the hypotheses (14).

i) The h-function $h_2(\underline{x},\theta_0)$ in (17) is a valid p-value for statistic $T_2$ in (15) at observation $\underline{x}$.

ii) Interval $C_0^{M}(\underline{X})$ given in (18) is of level $1-\alpha$, i.e., $\mathrm{ICP}\big(C_0^{M}(\underline{X})\big) \ge 1-\alpha$.

The theorem modifies a confidence interval $C_0$ of any level, including a point estimator (a zero exact confidence interval), to a $1-\alpha$ exact confidence interval $C_0^{M}$. Also, when using $C_0$ to conduct a level-$\alpha$ test for $H_0$, one simply rejects $H_0$ if $\theta_0$ falls outside the interval but is unable to report a p-value. Now the p-value is given by $h_2$ in (17). The test of rejecting $H_0$ if this p-value is no larger than $\alpha$ is an exact test of level $\alpha$. In contrast, the test of rejecting $H_0$ if $\theta_0$ falls outside $C_0$ is not of level $\alpha$ if $C_0$ is not a $1-\alpha$ exact interval.
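The modification process of (15) through (18) can be sketched for a single binomial proportion, where there is no nuisance parameter and the null hypothesis is a single point, so the supremum in (17) reduces to one probability. The code below is an illustrative sketch under these assumptions: the function names, the grid approximation of the interval limits and the tie tolerance `tol` are not from the paper. It modifies both the sample proportion (a zero-length interval) and a Wald interval.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # upper 2.5% point, for a nominal 95% Wald interval


def binom_pmf(y, n, p):
    """PMF p_B(y, n, p) of Bin(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def point_estimator(x, n):
    """Sample proportion, viewed as a zero-length interval."""
    return x / n, x / n


def wald(x, n):
    """Nominal 95% Wald interval for a proportion."""
    p_hat = x / n
    half = Z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def t2(interval0, x, n, theta0):
    """Statistic (15): distance from theta0 to the nearer limit, negative if outside."""
    low, up = interval0(x, n)
    return min(theta0 - low, up - theta0)


def h2(interval0, x, n, theta0, tol=1e-12):
    """h-function (17) for a point null with no nuisance parameter."""
    t_obs = t2(interval0, x, n, theta0)
    return sum(binom_pmf(y, n, theta0) for y in range(n + 1)
               if t2(interval0, y, n, theta0) <= t_obs + tol)


def modified_interval(interval0, x, n, alpha=0.05, grid_size=1001):
    """Grid approximation of (18): hull of {theta0 : h2 > alpha}."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    accepted = [t for t in grid if h2(interval0, x, n, t) > alpha]
    return min(accepted), max(accepted)


if __name__ == "__main__":
    n = 10  # illustrative sample size
    for x in range(n + 1):
        print(x, modified_interval(point_estimator, x, n),
              modified_interval(wald, x, n))
```

At every grid value of $\theta_0$, the coverage of the modified interval is at least $1-\alpha$ by part i) of the theorem, even though both starting intervals have zero ICP.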

Example 1. (more confidence intervals for a proportion) Interval estimation of $p$ based on $X \sim \mathrm{Bin}(n,p)$ is one of the basic inferential problems. Asymptotic intervals play an important role but may not be reliable. We apply the modification process of (15) through (18) to several commonly used intervals to obtain exact intervals.

Let $z_{\alpha}$ denote the upper $\alpha$th percentile of the standard normal distribution. It is known that the Wald interval

$$C_{p5}(X) = \left[\hat{p} \mp z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$$

has an ICP zero for any and because it shrinks to a point when or (Agresti, 2013). We now modify it to a exact interval using the modification process. Table 3 contains 95% , , and others when . For example, has , and . In order to assure the computed ICP for an exact interval to be at least , the lower limits for a confidence interval round down but the upper limits round up at the fourth decimal place. Although has an zero ICP, its modification has a correct ICP 95%. In return, becomes wider and does not make the lower limits at larger than zero. This is due to the negative values of the lower limits of at . We further apply the modification process to for more times. The 22nd modification has an ICP 95% and is a subset of . It cannot be shortened by the modification process any further, however, can be shortened at following Casella (1986) or Wang (2014).

We also report, in Table 3, the result of modifying the 95% Wilson interval (1927). Huwang (1995) showed that its ICP is much lower than the nominal level for any . However, the modified has a correct ICP 95% and is admissible here. To see the admissibility, we apply the refining approach in Wang (2014) to and find that the refinement of is equal to . Since Wang’s approach produces admissible intervals, we conclude admissible when .

Here is an interesting part. Can we apply the modification process to a point estimator to obtain a exact interval? The answer is yes. Table 3 reports the sample proportion (a zero-length interval estimator of level zero) and its modification . The latter turns out to be admissible following the same argument in the previous paragraph. i.e., the refinement of from Wang’s approach is equal to . We try another (arbitrary) point estimator , which satisfies (9) and has a nondecreasing confidence limits, see Table 3. The modified has an ICP 95% and can be shortened further because is a subset of .

Example 2. (more confidence intervals for the difference of two proportions) We now modify two approximate intervals for $d$ based on the two independent binomials $X$ and $Y$ in Section 2.3. The first is the Wald interval for $d$:

$$C_{d3}(X,Y) = \left[\hat{p}_1 - \hat{p}_2 \mp z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\right].$$

This interval has a zero ICP for any $n_1$, $n_2$ and $\alpha$ (Wang and Zhang, 2014). The second is the MLE of $d$, treated as a confidence interval:

$$C_{d4}(X,Y) = [\hat{p}_1 - \hat{p}_2 \mp 0].$$

Clearly, this interval has a zero ICP and a zero length. Theorem 1 is applied to these two approximate intervals to obtain two exact confidence intervals, $C_{d3}^{M}$ and $C_{d4}^{M}$. The modified intervals have a larger TIL over all sample points than the original intervals because they have a correct ICP that is at least $1-\alpha$. Table 2 contains several examples. Theorem 1 is also applied to three exact intervals: $C_{d1}$ and $C_{d2}$, discussed in Section 2.3, and the two-one-sided interval in Wang (2010). This time the TILs of all modified intervals decrease rather than increase. See Table 2.

Example 3. (the z-interval and its variants) In this example, we investigate the effect of choosing a point estimator on the modified confidence interval. Suppose $X_1, \ldots, X_n$ is a random sample from a normal population $N(\mu, \sigma^2)$ for known $\sigma$. Confidence intervals for $\mu$ based on the minimal sufficient statistic $\bar{X}$, including the z-interval, are of interest. Consider $H_0: \mu = \mu_0$ vs $H_A: \mu \ne \mu_0$. If $\mu$ is estimated by a point estimator $a\bar{X} + b$ for two given constants $a$ and $b$, then, following (15), define a test statistic

$$T_{zab}(\bar{X},\mu_0) = \min\{\mu_0 - a\bar{X} - b,\; a\bar{X} + b - \mu_0\} = -|\mu_0 - a\bar{X} - b|$$