# A computational approach to the Kiefer-Weiss problem for sampling from a Bernoulli population

We present a computational approach to the solution of the Kiefer-Weiss problem. Algorithms for construction of the optimal sampling plans and evaluation of their performance are proposed. In the particular case of Bernoulli observations, the proposed algorithms are implemented in the form of R program code. Using the developed computer program, we numerically compare the optimal tests with the respective sequential probability ratio test (SPRT) and the fixed sample size test, for a wide range of hypothesized values and type I and type II errors. The results are compared with those of D. Freeman and L. Weiss (Journal of the American Statistical Association, 59(1964)). The R source code for the algorithms of construction of optimal sampling plans and evaluation of their characteristics is available at https://github.com/tosinabase/Kiefer-Weiss.


## 1 Introduction

In a sequential statistical experiment, a sequence of random variables X1, X2, …, is potentially available to the statistician on a one-by-one basis. The observed data bear the information about the underlying distribution Pθ, θ being an unknown parameter whose true value is of interest to the statistician. In this paper, we are concerned with testing a simple hypothesis H0: θ = θ0 against a simple alternative H1: θ = θ1, which is a classical problem of sequential analysis (see, e. g., Wald and Wolfowitz, 1948).

In its simplest form, a sequential hypothesis test is a pair ⟨τ,δ⟩ consisting of a stopping time τ and a (terminal) decision rule δ. Formally, it is required that the event {τ = n} and the decision taken on {τ = n} depend on the data only through X1, …, Xn, for any natural n. The performance characteristics of a sequential test are the type I and type II error probabilities, α(τ,δ) and β(τ,δ), and the average sample number, Eθτ.

The Kiefer-Weiss problem is to find a test with a minimum value of supθEθτ, among all the tests satisfying the constraints on the type I and type II error probabilities:

 α(τ,δ)≤α and β(τ,δ)≤β. (1.1)

Kiefer and Weiss (1957) noted that in some cases the solution of this problem can be obtained through the solution of a much simpler one (known nowadays as the modified Kiefer-Weiss problem). To be more specific, suppose that there exists a test ⟨τ∗,δ∗⟩ that minimizes Eθ∗τ, for some fixed θ∗, among all the tests satisfying (1.1), and that this θ∗ is the “least favorable” for the stopping time τ∗ in the sense that

 supθEθτ∗=Eθ∗τ∗. (1.2)

Then it is easy to see that ⟨τ∗,δ∗⟩ minimizes supθEθτ, among all the tests satisfying (1.1), that is, it solves the original Kiefer-Weiss problem.

In this way, Weiss (1962) used the Bayesian approach for the solution of the Kiefer-Weiss problem, with θ∗ = (θ0+θ1)/2, for two special cases: 1) the observations are independent and identically distributed (i.i.d.) with a normal distribution with an unknown mean θ and a known variance, and 2) the observations are i.i.d. with a Bernoulli distribution with an unknown success probability θ, under the additional assumption that θ0+θ1 = 1. In neither case was the Bayesian test easy to find. Even the simplest Bernoulli case “requires heavy computing” (as stated by Weiss (1962), p. 565).

Freeman and Weiss (1964) applied the same technique to the more general case of the Bernoulli model with non-symmetric hypotheses; they proposed a scheme for finding an approximate numerical solution to the Kiefer-Weiss problem by making supθEθτ close enough to Eθ∗τ (cf. (1.2)).

Lorden (1980) characterized the structure of the optimal tests in the modified Kiefer-Weiss problem for the particular case of one-parametric Koopman-Darmois families of distributions and showed that, for θ∗ strictly between θ0 and θ1, the stopping times of these tests are almost surely bounded, and gave this bound.

Many other results concerning approximate or asymptotic solutions of the Kiefer–Weiss problem, and especially the modified Kiefer–Weiss problem, can be found in the literature. The modern viewpoint on the Kiefer-Weiss problem and related topics can be found in the monograph Tartakovsky et al. (2014). With respect to the exact solutions to the Kiefer-Weiss problem, which are the point of our interest in this paper, the authors state that “finding the exact solution involves quite heavy computation” (Tartakovsky et al., 2014, p. 228), which generally concurs with the opinions expressed earlier by Weiss (1962), Freeman and Weiss (1964), and Lorden (1980), among others, with respect to the particular cases they studied.

In this paper, we propose a computational approach to the solution of the Kiefer-Weiss problem, which seems to be general enough to provide an exact solution to virtually any particular case of the Kiefer-Weiss problem, but generally does not require extreme computational power and is within the reach of modern computer capabilities.

We call this approach “computational”, because it essentially relies on computer algorithms as opposed to demonstrated properties of mathematical objects.

In Section 2, we bring together theoretical results our method is based on, and give a general formulation of the method.

In Section 3, we concretize the general method in the case of Bernoulli observations, providing all the necessary formulas ready for their computer implementation. The R source code (R Core Team, 2013) implementing the calculations in accordance with the formulas can be found in Novikov et al. (2021).

Using the computer code, we calculate, for a range of hypothesized values and/or error probabilities, the parameters of the optimal sampling plans and their characteristics, as well as those of the Wald sequential probability ratio test (SPRT) and the fixed-sample-size test (FSST) with the same levels of type I and type II error probabilities. The results are presented in the form of tables and graphs and discussed in Section 4.

## 2 Kiefer-Weiss problem: general results

In this section, we formulate the general results on the structure of the optimal tests in the Kiefer-Weiss problem and its modified version.

### 2.1 Theoretical basis

Throughout this paper, we will use the notation of Novikov (2009) and follow its general assumptions. The assumptions are notably more general than we actually need in this paper, but keeping in mind the possible extensions and generalizations, we consider it convenient to stick to the same framework as in Novikov (2009).

In particular, we consider randomized sequential tests ⟨ψ,ϕ⟩, with ψ = (ψ1, ψ2, …) being a stopping rule, and ϕ = (ϕ1, ϕ2, …) being a (terminal) decision rule. It is always assumed that ψn = ψn(x1,…,xn) and ϕn = ϕn(x1,…,xn) are measurable functions with values in [0,1], whose values are interpreted as the conditional probabilities, given the data x1,…,xn, to stop at stage n, and, respectively, those to reject H0, after stopping has been decided on, for any n = 1, 2, …

Denote sψn = (1−ψ1)…(1−ψn−1)ψn, n = 1, 2, …, and let Eθ stand for the expectation when the true parameter value is θ, so for any test ⟨ψ,ϕ⟩

 α(ψ,ϕ)=∞∑n=1Eθ0sψnϕn (2.1)

is the type I error probability,

 β(ψ,ϕ)=∞∑n=1Eθ1sψn(1−ϕn) (2.2)

is the type II error probability, and

 N(θ;ψ)=∞∑n=1nEθsψn (2.3)

is the average sample number when the true parameter value is θ (provided that ∑∞n=1Eθsψn = 1 – otherwise it is infinite).

It is common that the error probabilities are expressed through the operating characteristic function defined as

 OCθ(ψ,ϕ)=∞∑n=1Eθsψn(1−ϕn): (2.4)

in particular, α(ψ,ϕ) = 1 − OCθ0(ψ,ϕ) and β(ψ,ϕ) = OCθ1(ψ,ϕ).

Let F(α,β) be the set of all tests ⟨ψ,ϕ⟩ such that

 α(ψ,ϕ)≤α,β(ψ,ϕ)≤β, (2.5)

where α, β ∈ (0,1) are some fixed real numbers.

We are interested in finding the tests that minimize supθN(θ;ψ) over all the tests in F(α,β). This problem is known as the Kiefer-Weiss problem.

The respective modified Kiefer-Weiss problem is to minimize N(θ∗;ψ) over all ⟨ψ,ϕ⟩ ∈ F(α,β), for a given fixed value θ∗ of the parameter.

To date, all the known solutions to the Kiefer-Weiss problem, for particular models, are obtained through the modified version of it (see Kiefer and Weiss, 1957; Weiss, 1962; Freeman and Weiss, 1964; Lai, 1973; Lorden, 1980; Huffman, 1983; Zhitlukhin et al., 2013; Tartakovsky et al., 2014).

The solutions to the modified Kiefer-Weiss problem can be obtained, at least in theory, in a very general situation using the following variant of the Lagrange multipliers method. Let us start with this.

Let

 L(ψ,ϕ)=N(θ∗;ψ)+λ0α(ψ,ϕ)+λ1β(ψ,ϕ), (2.6)

where θ∗ is some fixed value of the parameter and λ0, λ1 are some nonnegative constants (called Lagrange multipliers).

Then the tests minimizing N(θ∗;ψ) subject to (2.5) can be obtained through an unconstrained minimization of L(ψ,ϕ) over all ⟨ψ,ϕ⟩, using an appropriate choice of the Lagrange multipliers (see Novikov, 2009, Section 2).

Lorden (1980) shows that in the case of i.i.d. observations the problem of minimizing the Lagrangian function is reduced to an optimal stopping problem for a Markov process.

It is easy to see that finding Bayesian tests used in Kiefer and Weiss (1957) is mathematically equivalent to the minimization of (2.6).

To construct the optimal tests we need an additional assumption on the distribution of the observations. Let fnθ = fnθ(x1,…,xn) be the Radon–Nikodym derivative of the distribution of (X1,…,Xn) with respect to a product measure μn (μ multiplied n times by itself), n = 1, 2, …

In Lorden (1980), it is shown that, in the case of i.i.d. observations following a distribution from a Koopman-Darmois family, the tests giving solution to the modified problem have stopping times bounded with probability one when θ∗ is strictly between θ0 and θ1.

Let us describe the construction of tests minimizing the Lagrangian function L(ψ,ϕ) calculated at some θ∗, λ0, λ1, over all truncated tests, i.e. those not taking more than a fixed number H of observations (H is also called the horizon in this case).

Formally, let FH be the class of all such tests. Let us define

 VHH=min{λ0fHθ0,λ1fHθ1}, (2.7)

and, recursively over ,

 VHn=min{λ0fnθ0,λ1fnθ1,fnθ∗+IVHn+1}, (2.8)

being

 IVHn+1=(IVHn+1)(x1,…,xn)=∫VHn+1(x1,…,xn+1)dμ(xn+1). (2.9)
###### Remark 2.1

The functions VHn defined above, as well as L(ψ,ϕ) in (2.6), implicitly depend on θ∗, λ0 and λ1.

From the results of Section 3.1 in Novikov (2009), we easily obtain the following characterization of all the truncated sequential tests minimizing .

###### Proposition 1

For all ⟨ψ,ϕ⟩ ∈ FH

 L(ψ,ϕ)≥1+IVH1. (2.10)

There is an equality in (2.10), for a test ⟨ψ,ϕ⟩ ∈ FH, if and only if it satisfies the following two conditions:

a)

for all n = 1, …, H−1,

 I{min{λ0fnθ0,λ1fnθ1}<fnθ∗+IVHn+1}≤ψn≤I{min{λ0fnθ0,λ1fnθ1}≤fnθ∗+IVHn+1} (2.11)

μn-a.e., and

b)

for all n = 1, …, H,

 I{λ0fnθ0<λ1fnθ1}≤ϕn≤I{λ0fnθ0≤λ1fnθ1} (2.12)

μn-a.e.

Under very mild conditions, the optimal non-truncated tests are obtained on the basis of the limits IVn+1 = limH→∞IVHn+1. Optimal stopping rules for the non-truncated tests are obtained by substituting IVn+1 for IVHn+1 in (2.11), for all n ≥ 1, leaving the decision rules the same, just applying (2.12) for all n ≥ 1 (see Section 3 in Novikov (2009)).

Let us denote MH(θ∗,λ0,λ1) the class of all tests satisfying conditions a) and b) of Proposition 1, and let M(θ∗,λ0,λ1) be the class of all (non-truncated) tests which satisfy (2.11) with IVHn+1 replaced by IVn+1, for all natural n, and satisfy (2.12) for all natural n.

The following proposition shows how the Kiefer-Weiss problem and its modified version can be related (cf. also Section 3 of Freeman and Weiss (1964)).

###### Proposition 2

Let ⟨ψ∗,ϕ∗⟩ ∈ MH(θ∗,λ0,λ1) be such that α(ψ∗,ϕ∗) = α and β(ψ∗,ϕ∗) = β, and let

 supθN(θ;ψ∗)−N(θ∗;ψ∗)=Δ(ψ∗). (2.13)

Then

 supθN(θ;ψ∗)≤inf supθN(θ;ψ)+Δ(ψ∗), (2.14)

where the infimum is taken over all tests ⟨ψ,ϕ⟩ satisfying (2.5).

Proof. Let ⟨ψ∗,ϕ∗⟩ be a test satisfying the conditions of Proposition 2, and let ⟨ψ,ϕ⟩ be any test satisfying (2.5). Then

 N(θ∗;ψ∗)+λ0α+λ1β=N(θ∗;ψ∗)+λ0α(ψ∗,ϕ∗)+λ1β(ψ∗,ϕ∗)
 ≤N(θ∗;ψ)+λ0α(ψ,ϕ)+λ1β(ψ,ϕ)≤N(θ∗;ψ)+λ0α+λ1β,

where the first inequality is due to Proposition 1. Therefore, N(θ∗;ψ∗) ≤ N(θ∗;ψ), so

 supθN(θ;ψ∗)=N(θ∗;ψ∗)+Δ(ψ∗)≤N(θ∗;ψ)+Δ(ψ∗)≤supθN(θ;ψ)+Δ(ψ∗),

and (2.14) follows.

Obviously, if Δ(ψ∗) = 0 in Proposition 2, then ⟨ψ∗,ϕ∗⟩ solves the original Kiefer-Weiss problem with α = α(ψ∗,ϕ∗) and β = β(ψ∗,ϕ∗). In this way, Proposition 2 reduces the original Kiefer-Weiss problem to its modified variant: one has to seek a solution to the modified Kiefer-Weiss problem, with some θ∗, for which N(θ∗;ψ∗) is the maximum of N(θ;ψ∗) over all θ.

The treatment of the Kiefer-Weiss problem by Weiss (1962), in the particular symmetric cases of sampling from normal and Bernoulli distributions, in essence makes use of this proposition.

Freeman and Weiss (1964) propose, for the non-symmetric Bernoulli case, a choice of θ∗ making Δ(ψ∗) relatively small, which gives nearly optimal tests in the Kiefer-Weiss problem.

### 2.2 The proposed computational method

The method is based on Propositions 1 and 2.

Proposition 1 is used for the Lagrangian minimization when seeking for solutions to the modified problem potentially solving the Kiefer-Weiss problem by virtue of Proposition 2.

Both parts are essentially numerical, and we will show in the next Section, in the case of sampling from a Bernoulli population, how they can be implemented using the modern statistical software.

In what remains of this Section we want to give a general description of the proposed method.

The method deals with an appropriate choice of the constants θ∗, λ0, λ1. From any class MH(θ∗,λ0,λ1) let us choose one test (for example, the easiest for implementation) – in this way, the triple (θ∗,λ0,λ1) will identify a specific test.

We assume that a computer procedure is available which calculates, for any (θ∗,λ0,λ1), the whole set of test characteristics: the error probabilities and the average sample number, whatever be the true value of θ. Derived from these, we will also assume that computing Δ(ψ∗) (see (2.13)) is available as well.

We know from Lorden (1980) that in the case of a one-parametric Koopman-Darmois family of i.i.d. observations there is an upper bound H(θ∗,λ0,λ1) on the horizon of the optimal test in the modified Kiefer-Weiss problem when θ∗ is between θ0 and θ1. Generally, it depends on θ∗, λ0 and λ1 (see Lorden (1980)). Because λ0 and λ1 will not be moved within the algorithm, let us only retain θ∗ in the notation, supposing H = H(θ∗) is available for computation in such a case.

Obviously, there can be cases where the bound does not exist, or is not available for computing for some reason, or we just do not want to use it. We want the method to be applicable in both cases; more precisely, we will treat these cases as two variants of the method, because they are based on the same principles.

The proposed method

Option 1. Supposing the bound H(θ∗) is available.

1. Fix some initial values of λ0 and λ1.

2. Seek a θ∗ such that the test ⟨ψ∗,ϕ∗⟩ with H = H(θ∗) has a minimum value of Δ(ψ∗) over all θ∗:

 Δ(ψ∗)≤Δ(ψ) for ⟨ψ,ϕ⟩∈MH(θ,λ0,λ1) (2.15)

with H = H(θ), whatever θ.

3. Evaluate α(ψ∗,ϕ∗) and β(ψ∗,ϕ∗). If they do not comply with the requirements on the error probabilities, repeat steps 2 and 3 with other λ0 and λ1.

If Δ(ψ∗) = 0 (because of the computational nature of the method, this should in fact mean that it is close to 0), the solution to the Kiefer-Weiss problem for the given α and β is found. If Δ(ψ∗) is not very close to 0, it could be informative anyway, because it shows how good the solution of the modified version is for the Kiefer-Weiss problem.

We implement this method in the next Section for the general Bernoulli case and obtain, for a series of particular hypothesized values of parameters and probability errors, numerical results showing that the minimum value of Δ(ψ∗) is 0 in each case.

In this way, we may speak about numerical solution of the Kiefer-Weiss problem for the Bernoulli observations. We provide full computational algorithms which make our results completely verifiable.

At the same time, we are rather pessimistic about the possibility of a strictly mathematical proof that this will always be the case, even for Koopman-Darmois families, and even for the Bernoulli case.

Option 2. Not using bounds for the maximum sample size.

1. Fix some initial values of λ0 and λ1.

2. For an increasing sequence of truncation levels H, repeat:

3. Seek a θ∗ such that the test ⟨ψ∗,ϕ∗⟩ has a minimum value of Δ(ψ∗) over all θ∗:

 Δ(ψ∗)≤Δ(ψ) for ⟨ψ,ϕ⟩∈MH(θ,λ0,λ1), (2.16)

whatever θ. Evaluate Δ(ψ∗).

4. Stop repeating when the successive values of Δ(ψ∗) are close to each other.

5. Evaluate α(ψ∗,ϕ∗) and β(ψ∗,ϕ∗). If they do not comply with the requirements on the error probabilities, repeat steps 2 through 5 with other λ0 and λ1.

Option 2, for any given horizon H, tries to find a solution to the Kiefer-Weiss problem in the class of all sequential tests truncated at level H. If the solution is a truncated test, it will be found when H is sufficiently large. If the optimal test is not truncated, the algorithm tries to find a good approximation to it in the class of the truncated tests, with high levels of truncation, which may be considered a numerical solution of the Kiefer-Weiss problem, at some precision level.

To see the particular usefulness of this approach, one can have a look at the example of the uniform distribution (see Section 3.1 of Novikov and Palacios-Soto, 2020, starting from p. 148). It is easily seen that in this case the algorithm applies exactly (not numerically), and that for any fixed H it readily gives an optimal truncated test, whose characteristics converge, as H → ∞, to those of the optimal non-truncated test, solving the Kiefer-Weiss problem for this particular model. Interestingly, the solution to the Kiefer-Weiss problem is an SPRT in this case.

The virtue of this example is more theoretical than practical, though.

There are other examples (rather artificial as well) of the modified Kiefer-Weiss problem, where the optimal stopping rule has a non-bounded stopping time (Hawix and Schmitz, 1998), but there is no Kiefer-Weiss problem it is related to. Should a general context for this example exist, Option 2 should be applicable for its numerical solution. We are quite sure that the numerical optimization in step 3 can be implemented on a computer much more generally than only in the uniform and the Bernoulli cases.

Option 1 is preferable (when applicable), because it avoids the iteration cycle over the truncation level H in Option 2. But Option 2 is more general and is applicable to virtually any Kiefer-Weiss problem for which the computation in step 3 is feasible. Numerical experiments in the example case of the next Section, where both Options are applicable, show that both give the same results.

## 3 Kiefer-Weiss problem for sampling from a Bernoulli population

In this section, we obtain formulas for solving the modified Kiefer-Weiss problem and implement the computational algorithms of the preceding Section for solving the Kiefer-Weiss problem in the particular case of sampling from a Bernoulli population. On the basis of this, we obtain and analyze numerical results for the efficiency of the numerical solution to the Kiefer-Weiss problem with respect to the sequential probability ratio test and to the fixed-sample-size test with the same error probabilities.

Let the observations be independent identically distributed Bernoulli random variables with Pθ(Xi = 1) = 1 − Pθ(Xi = 0) = θ, for i = 1, 2, …, where θ ∈ (0,1). Then the joint probability

 fnθ(x1,…,xn)=gnθ(sn)=θsn(1−θ)n−sn,

where sn = x1 + ⋯ + xn, n = 1, 2, …
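As a trivial check that the joint probability depends on the sample only through sn, one might write (a minimal Python sketch; the function names and parameter values are ours, for illustration only):

```python
def f(x, th):
    """Joint probability f_n^theta(x_1,...,x_n) of a Bernoulli sample x."""
    return th**sum(x) * (1 - th)**(len(x) - sum(x))

def g(n, s, th):
    """The same probability written as g_n^theta(s_n)."""
    return th**s * (1 - th)**(n - s)
```

Any two samples of the same length and the same number of successes receive the same probability.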

### 3.1 Construction of tests

Let θ0 < θ1 be two fixed hypothesized parameter values.

Let us construct, using Proposition 1, a solution to the modified Kiefer-Weiss problem for a given θ∗ ∈ (θ0, θ1) (see (2.7) – (2.9)).

In this Bernoulli model, it is not difficult to see, by induction, that

 VHn(x1,…,xn)=UHn(n∑i=1xi),

for n = 1, 2, …, H, where

 UHH(s)=min{λ0gHθ0(s),λ1gHθ1(s)},s=0,1,…,H (3.1)

and, recursively over n = H−1, H−2, …, 1,

 UHn(s)=min{λ0gnθ0(s),λ1gnθ1(s),gnθ∗(s)+UHn+1(s+1)+UHn+1(s)}, (3.2)

s = 0, 1, …, n.

Let us consider a non-randomized test ⟨ψ∗,ϕ∗⟩ which, at any stage n = 1, …, H−1, with sn = s observed,

• (a) stops and accepts H0, if λ1gnθ1(s) ≤ min{λ0gnθ0(s), gnθ∗(s)+UHn+1(s+1)+UHn+1(s)} (in which case ψ∗n(s) = 1 and ϕ∗n(s) = 0),

• (b) stops and rejects H0, if λ0gnθ0(s) ≤ min{λ1gnθ1(s), gnθ∗(s)+UHn+1(s+1)+UHn+1(s)} (giving preference to (a) if both (a) and (b) apply), being in this case ψ∗n(s) = 1 and ϕ∗n(s) = 1, and

• (c) continues to the next stage, if neither (a) nor (b) applies (being ψ∗n(s) = 0 in this case);

and, at stage H, stops and accepts H0 if λ1gHθ1(s) ≤ λ0gHθ0(s) (ϕ∗H(s) = 0) and rejects H0 otherwise (ϕ∗H(s) = 1).
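For illustration, the backward recursion (3.1)–(3.2) and the resulting sampling plan can be sketched in Python along the following lines (the paper's actual implementation is in R, in the repository cited above; the hypothesized values, multipliers and horizon below are arbitrary examples, not taken from the paper's tables):

```python
th0, th1, thstar = 0.3, 0.7, 0.5   # hypothesized values and theta* (arbitrary)
lam0 = lam1 = 100.0                # example Lagrange multipliers
H = 20                             # truncation level (horizon)

def g(n, s, th):
    """g_n^theta(s) = theta^s (1-theta)^(n-s), as defined above."""
    return th**s * (1 - th)**(n - s)

# U[n][s] for n = 1..H, s = 0..n, computed backwards by (3.1)-(3.2)
U = {H: [min(lam0 * g(H, s, th0), lam1 * g(H, s, th1)) for s in range(H + 1)]}
for n in range(H - 1, 0, -1):
    U[n] = [min(lam0 * g(n, s, th0), lam1 * g(n, s, th1),
                g(n, s, thstar) + U[n + 1][s] + U[n + 1][s + 1])
            for s in range(n + 1)]

def rules(n, s):
    """(stop, accept H0) for the non-randomized test at stage n with s successes."""
    accept_cost = lam1 * g(n, s, th1)   # cost of stopping and accepting H0
    reject_cost = lam0 * g(n, s, th0)   # cost of stopping and rejecting H0
    if n == H:
        return True, accept_cost <= reject_cost
    cont_cost = g(n, s, thstar) + U[n + 1][s] + U[n + 1][s + 1]
    return min(accept_cost, reject_cost) <= cont_cost, accept_cost <= reject_cost
```

Ties are resolved in favor of accepting H0, in accordance with the preference to (a) above; by (2.11)–(2.12) either resolution is admissible.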

### 3.2 Operating characteristic, average sample number and related formulas

Let

 aHθ(s)=gHθ(s)(1−ϕ∗H(s)),s=0,1,…,H, (3.3)

and, recursively over n = H−1, H−2, …, 1,

 anθ(s)=gnθ(s)ψ∗n(s)(1−ϕ∗n(s))+(an+1θ(s)+an+1θ(s+1))(1−ψ∗n(s)),s=0,1,…,n. (3.4)

Then the error probability of type II is β(ψ∗,ϕ∗) = a1θ1(0) + a1θ1(1).

To understand this, one can trace the appearance, in (3.1)–(3.2), of all the terms containing λ1. The terms having λ1 as a coefficient are exactly those in (3.3)–(3.4) (with θ = θ1). Now, take into account that, by Proposition 1, 1 + IVH1 coincides with

 λ0α(ψ∗,ϕ∗)+λ1β(ψ∗,ϕ∗)+N(θ∗;ψ∗),

and the term in 1 + IVH1 having λ1 as a coefficient is a1θ1(0)+a1θ1(1), so it is equal to β(ψ∗,ϕ∗).

Because the recursions (3.3)–(3.4) apply with an arbitrary θ, we conclude that OCθ(ψ∗,ϕ∗) = a1θ(0)+a1θ(1) for any θ. And finally α(ψ∗,ϕ∗) = 1 − OCθ0(ψ∗,ϕ∗).

In a similar way, let

 bHθ(s)=0,s=0,1,…,H, (3.5)

and, recursively over n = H−1, H−2, …, 1,

 bnθ(s)=(gnθ(s)+bn+1θ(s)+bn+1θ(s+1))(1−ψ∗n(s)),s=0,1,…,n. (3.6)

Then the average sample number N(θ;ψ∗) = 1 + b0θ, where b0θ = b1θ(0) + b1θ(1).
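The recursions above can be cross-checked against a brute-force enumeration of all 2^H sample paths. Below is a minimal Python sketch of such a check (the paper's implementation is in R; the parameter values here are arbitrary, and the horizon is kept small so that enumeration is feasible):

```python
from itertools import product

th0, th1, thstar = 0.3, 0.7, 0.5   # hypothetical parameter values
lam0 = lam1 = 20.0                 # example Lagrange multipliers
H = 8                              # small horizon so all paths can be enumerated

def g(n, s, th):
    """g_n^theta(s) = theta^s (1-theta)^(n-s)."""
    return th**s * (1 - th)**(n - s)

# backward recursion (3.1)-(3.2)
U = {H: [min(lam0 * g(H, s, th0), lam1 * g(H, s, th1)) for s in range(H + 1)]}
for n in range(H - 1, 0, -1):
    U[n] = [min(lam0 * g(n, s, th0), lam1 * g(n, s, th1),
                g(n, s, thstar) + U[n + 1][s] + U[n + 1][s + 1])
            for s in range(n + 1)]

def psi(n, s):
    """Stopping rule of the constructed test (1 = stop)."""
    if n == H:
        return 1
    cont = g(n, s, thstar) + U[n + 1][s] + U[n + 1][s + 1]
    return 1 if min(lam0 * g(n, s, th0), lam1 * g(n, s, th1)) <= cont else 0

def phi(n, s):
    """Decision rule (1 = reject H0); ties resolved in favor of accepting."""
    return 0 if lam1 * g(n, s, th1) <= lam0 * g(n, s, th0) else 1

def oc_and_asn(th):
    """OC and ASN by the recursions (3.3)-(3.6)."""
    a = [g(H, s, th) * (1 - phi(H, s)) for s in range(H + 1)]
    b = [0.0] * (H + 1)
    for n in range(H - 1, 0, -1):
        a = [g(n, s, th) * psi(n, s) * (1 - phi(n, s))
             + (a[s] + a[s + 1]) * (1 - psi(n, s)) for s in range(n + 1)]
        b = [(g(n, s, th) + b[s] + b[s + 1]) * (1 - psi(n, s))
             for s in range(n + 1)]
    return a[0] + a[1], 1 + b[0] + b[1]

def oc_and_asn_brute(th):
    """The same quantities by enumerating all 2^H sample paths."""
    oc = asn = 0.0
    for x in product((0, 1), repeat=H):
        s = 0
        for n in range(1, H + 1):
            s += x[n - 1]
            if psi(n, s):
                break
        p = th**sum(x) * (1 - th)**(H - sum(x))  # probability of the full path
        oc += p * (1 - phi(n, s))
        asn += p * n
    return oc, asn
```

Summing the full-path probabilities is legitimate here because the continuations of a stopped prefix partition its probability.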

When high levels of truncation are needed, the following variant of (3.3)–(3.4) seems to work computationally better.

Let us denote Gnθ(s) = C(n,s)gnθ(s), s = 0, 1, …, n, where C(n,s) = n!/(s!(n−s)!) is the binomial coefficient.

Then, define

 AHθ(s)=GHθ(s)(1−ϕ∗H(s)),s=0,1,…,H, (3.7)

and, recursively over n = H−1, H−2, …, 1,

 Anθ(s)=Gnθ(s)ψ∗n(s)(1−ϕ∗n(s))+(An+1θ(s)(n+1−s)/(n+1)+An+1θ(s+1)(s+1)/(n+1))(1−ψ∗n(s)), (3.8)

s = 0, 1, …, n.

It is easy to see, by induction, that Anθ(s) = C(n,s)anθ(s), n = 1, …, H, s = 0, 1, …, n, so

 OCθ(ψ∗,ϕ∗)=A0θ=A1θ(0)+A1θ(1). (3.9)

Analogously, let

 BHθ(s)=0,s=0,1,…,H, (3.10)

and, recursively over n = H−1, H−2, …, 1,

 Bnθ(s)=(Gnθ(s)+Bn+1θ(s)(n+1−s)/(n+1)+Bn+1θ(s+1)(s+1)/(n+1))(1−ψ∗n(s)),s=0,1,…,n. (3.11)

Again, Bnθ(s) = C(n,s)bnθ(s), n = 1, …, H, s = 0, 1, …, n, and the average sample number

 N(θ;ψ∗)=1+B0θ,whereB0θ=B1θ(0)+B1θ(1). (3.12)

We implement, in the R program code, versions (3.7)-(3.8) and (3.10)-(3.11), respectively, as routines for calculation of the operating characteristic function (3.9) and the average sample number (3.12).

It is very likely that the algorithms above in this subsection are applicable not only to the tests obtained through the Lagrange minimization but also generally to any truncated test. We defer the strict proof of this fact to a later occasion.
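This can be checked numerically on an example. The following Python sketch applies both the plain recursions (3.3)–(3.6) and the normalized versions (3.7)–(3.8), (3.10)–(3.11) to a hypothetical truncated threshold test (not obtained through the Lagrange minimization; the thresholds are arbitrary) and compares the results:

```python
from math import comb

H = 30   # truncation level of the example test

def psi(n, s):
    """A hypothetical truncated test: stop when s_n leaves a band around n/2."""
    return 1 if s <= 0.3 * n - 1 or s >= 0.7 * n + 1 else 0

def phi(n, s):
    """Hypothetical decision rule: reject H0 for large s_n."""
    return 1 if s >= n / 2 else 0

def g(n, s, th):
    return th**s * (1 - th)**(n - s)

def G(n, s, th):
    return comb(n, s) * g(n, s, th)   # G_n^theta(s) = C(n,s) g_n^theta(s)

def plain(th):
    """OC and ASN via (3.3)-(3.6)."""
    a = [g(H, s, th) * (1 - phi(H, s)) for s in range(H + 1)]
    b = [0.0] * (H + 1)
    for n in range(H - 1, 0, -1):
        a = [g(n, s, th) * psi(n, s) * (1 - phi(n, s))
             + (a[s] + a[s + 1]) * (1 - psi(n, s)) for s in range(n + 1)]
        b = [(g(n, s, th) + b[s] + b[s + 1]) * (1 - psi(n, s)) for s in range(n + 1)]
    return a[0] + a[1], 1 + b[0] + b[1]

def normalized(th):
    """OC and ASN via the normalized recursions (3.7)-(3.8), (3.10)-(3.11)."""
    A = [G(H, s, th) * (1 - phi(H, s)) for s in range(H + 1)]
    B = [0.0] * (H + 1)
    for n in range(H - 1, 0, -1):
        A = [G(n, s, th) * psi(n, s) * (1 - phi(n, s))
             + (A[s] * (n + 1 - s) / (n + 1) + A[s + 1] * (s + 1) / (n + 1))
             * (1 - psi(n, s)) for s in range(n + 1)]
        B = [(G(n, s, th) + B[s] * (n + 1 - s) / (n + 1)
              + B[s + 1] * (s + 1) / (n + 1)) * (1 - psi(n, s))
             for s in range(n + 1)]
    return A[0] + A[1], 1 + B[0] + B[1]
```

The equality of the two versions follows from the identities C(n+1,s)(n+1−s)/(n+1) = C(n,s) and C(n+1,s+1)(s+1)/(n+1) = C(n,s).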

We will use Option 1 of the method in Section 2, in view of the intensity of calculations we need for obtaining the massive numerical results for the analysis below in this Section. Option 2 gives the same results but is somewhat slower.

Due to Lorden (1980) (see also Hawix and Schmitz (1998)), the test minimizing the Lagrangian function, given θ∗, λ0, λ1, can be found among the tests truncated at a level H not exceeding

 H(θ∗,λ0,λ1)=inf{n≥1: a log λ0+b log λ1−n≤(a+b) log w0}, (3.13)

where w0 is a constant defined in Lorden (1980), and a and b are determined from

 a log(fθ∗(X)/fθ0(X))+b log(fθ∗(X)/fθ1(X))≡1.

Thus, the computer implementation of (3.13) is straightforward.

At last, for the minimization step at stage 2 of Option 1 of the method in the preceding Section, we use R's optimize function: first, to find the maximum of N(θ;ψ∗) over θ, given θ∗, then the minimum of Δ(ψ∗) over all θ∗. We use the default tolerance parameter of optimize, which is approximately 0.00012 in our case. Using lower values of the tolerance parameter, better approximation can be achieved.

The details of the implementation can be consulted in Novikov et al. (2021).
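For illustration only, this two-level optimization might be sketched in Python as follows; a fixed horizon H stands in for the bound H(θ∗), crude grid searches stand in for R's optimize, λ0, λ1 are held fixed, and all numerical values are arbitrary:

```python
th0, th1 = 0.3, 0.7        # hypothesized values (arbitrary example)
lam0 = lam1 = 50.0         # Lagrange multipliers, kept fixed in this sketch
H = 20                     # fixed horizon standing in for the bound H(theta*)

def g(n, s, th):
    return th**s * (1 - th)**(n - s)

def build_stopping_rule(thstar):
    """Backward recursion (3.1)-(3.2); returns the stopping rule psi*."""
    U = {H: [min(lam0 * g(H, s, th0), lam1 * g(H, s, th1)) for s in range(H + 1)]}
    for n in range(H - 1, 0, -1):
        U[n] = [min(lam0 * g(n, s, th0), lam1 * g(n, s, th1),
                    g(n, s, thstar) + U[n + 1][s] + U[n + 1][s + 1])
                for s in range(n + 1)]
    def psi(n, s):
        if n == H:
            return 1
        cont = g(n, s, thstar) + U[n + 1][s] + U[n + 1][s + 1]
        return 1 if min(lam0 * g(n, s, th0), lam1 * g(n, s, th1)) <= cont else 0
    return psi

def asn(psi, th):
    """Average sample number N(theta; psi*) via the recursion (3.5)-(3.6)."""
    b = [0.0] * (H + 1)
    for n in range(H - 1, 0, -1):
        b = [(g(n, s, th) + b[s] + b[s + 1]) * (1 - psi(n, s))
             for s in range(n + 1)]
    return 1 + b[0] + b[1]

theta_grid = [i / 50 for i in range(1, 50)]  # stand-in for maximization over theta

def delta(thstar):
    """Delta(psi*) of (2.13), approximated on the grid."""
    psi = build_stopping_rule(thstar)
    return max(asn(psi, th) for th in theta_grid + [thstar]) - asn(psi, thstar)

# stand-in for R's optimize(): minimize Delta over a grid of theta* values
candidates = [th0 + (th1 - th0) * i / 26 for i in range(1, 26)]
best = min(candidates, key=delta)
```

In the paper's own implementation, the grid searches are replaced by optimize, the horizon by the bound (3.13), and an outer loop over λ0, λ1 matches the prescribed error probabilities.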

### 3.3 Numerical results

The main goal of this part is to illustrate the use of the developed R code on concrete examples based on the Bernoulli sampling, to show that the obtained sequential tests in any one of the examples provide numerical solutions to the Kiefer-Weiss problem, and to analyze the efficiency of the obtained tests with respect to the classical sequential probability ratio tests and the fixed-sample-size tests provided these have the same level of error probabilities.

We use a series of examples from Freeman and Weiss (1964), specified by 5 pairs of hypothesized success probabilities (θ0, θ1). In each one, we employ a range of error probabilities commonly used in practice: 0.1, 0.05, 0.025, 0.01, 0.005, 0.001 and 0.0005.

For each combination of (θ0, θ1) and (α, β), we ran the computer code corresponding to the implementation of the method of Section 2 (Option 1), seeking a test with the values of the error probabilities closest to their nominal values. In each case, the real and the nominal error probabilities are within 0.001 of relative distance of each other.

The respective results are presented in Tables 1 – 5. For each test, the tables report its corresponding values of λ0, λ1 and θ∗, as well as its average sample number and the 0.99-quantile Q of the distribution of the sample number, both under θ∗. We also present the maximum sample number the test actually takes (this is not the upper bound (3.13)).

In the second part of each table, there are the calculated characteristics of the corresponding SPRT with the values of α and β closest to the nominal ones. Those are the values of the average sample number and the 0.99-quantile Q of the distribution of the sample number, both calculated at θ∗. They are calculated using the exact formulas in Young (1994), and not through the Wald approximations, as in Freeman and Weiss (1964). Also given are the endpoints of the continuation interval of the corresponding SPRT.

At last, FSS is the minimum value of the sample number required by the optimal fixed-sample-size test with error probabilities not exceeding α and β. It is calculated using the binomial distribution of the test statistic rather than its normal approximation used in Freeman and Weiss (1964).

In the last part of each table, there are calculated values of the efficiency of each test with respect to the FSST. The efficiency is calculated as the ratio of FSS to the respective characteristic of the test: to the average sample number and to the quantile Q, both for the optimal Kiefer-Weiss test and for the Wald SPRT. For example, an efficiency of 1.5 means the optimal Kiefer-Weiss test takes 1.5 times fewer observations, on the average, than the corresponding fixed-sample-size test.

There is no SPRT part in Table 5. The reason for this is that in the symmetric case (when θ0 + θ1 = 1), unlike other cases, it is generally impossible to find an SPRT matching, at least approximately, the nominal values of α and β. This is because, in the symmetric case, there is a restricted number of available values of α = β; for the hypotheses of Table 5, these are 0.0991, 0.0826, 0.0686, 0.0568, 0.0470, etc., none of them exactly matching the values used for the other hypothesis pairs. In the symmetric case, the SPRT's decision-making process is equivalent to a random walk between two bounds formed by two horizontal straight lines, so the values of α given above were obtained from the well-known formula for the Gambler's Ruin probability when the lines are symmetric with respect to the horizontal axis.

To evaluate the performance of the optimal tests, we ran the R computer code on a grid of points of α and β equidistant on the logarithmic scale, for each pair of θ0 and θ1, finding, at each point of the grid, the optimal test for the Kiefer-Weiss problem. In each case, Δ(ψ∗) was found to be 0 (within the given precision of calculations).

For each pair of α and β, we calculated the error probabilities α(ψ∗,ϕ∗) and β(ψ∗,ϕ∗) and the average sample numbers N(θ0;ψ∗), N(θ1;ψ∗) and N(θ∗;ψ∗) corresponding to the optimal test ⟨ψ∗,ϕ∗⟩, as well as an estimate of the fixed sample size required to achieve the same α and β. This time we use an approximate formula for the FSS based on the normal approximation

 FSS≈((zα√(θ0(1−θ0))+zβ√(θ1(1−θ1)))/(θ1−θ0))²,

where zγ = Φ⁻¹(1−γ), Φ being the cumulative distribution function of the standard normal distribution. This form is preferable from the point of view of smoothness of the graphical representation below, while the relative error, in comparison with the exact value based on the binomial distribution, is within 5%, for the range of the α and β calculated.
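A minimal sketch of this approximation, using only the Python standard library (the function name and the example arguments are ours):

```python
from math import sqrt
from statistics import NormalDist

def fss_approx(th0, th1, alpha, beta):
    """Normal-approximation formula for the fixed sample size (illustrative)."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha), z(1 - beta)
    return ((z_a * sqrt(th0 * (1 - th0)) + z_b * sqrt(th1 * (1 - th1)))
            / (th1 - th0)) ** 2
```

For instance, fss_approx(0.3, 0.7, 0.01, 0.01) gives around 28 observations; rounding up to the next integer yields the approximate sample size.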

In Figures 1 – 3, we present the results of the performance evaluation for three of the hypothesis pairs.

Each graph depicts one performance characteristic (z-coordinate) as a function of α (x-coordinate) and β (y-coordinate). For α and β, the scale of decimal logarithms is used. In each Figure, the upper left graph represents the same efficiency used in the Tables, defined as FSS/N(θ∗;ψ∗). The upper right graph depicts the average sample number N(θ∗;ψ∗) as a function of α and β. The two lower graphs in each Figure represent the performance of the optimal tests under θ0 and θ1 (very much analogously to the first graph), depicting the efficiency defined as FSS/N(θ0;ψ∗) and FSS/N(θ1;ψ∗), respectively.

The graphs are prepared using the 3D visualization package rgl (Murdoch and Adler, 2021). Each graph is based on a grid of 25×25 equidistant points of α and β, both within a range of 6 to 13 on the logarithmic scale.

There are interactive versions of the graphs in Novikov et al. (2021).