    # Bayesian sequential composite hypothesis testing in discrete time

We study the sequential testing problem of two alternative hypotheses regarding an unknown parameter in an exponential family when observations are costly. In a Bayesian setting, the problem can be embedded in a Markovian framework. Using the conditional probability of one of the hypotheses as the underlying spatial variable, we show that the cost function is concave and that the posterior distribution becomes more concentrated as time goes on. Moreover, we study time monotonicity of the value function. For a large class of model specifications, the cost function is non-decreasing in time, and the optimal stopping boundaries are thus monotone.

## 1. Introduction

Assume that a sequence of random variables X1,X2,… is observed sequentially, and that the sequence is drawn from a one-parameter family of distributions depending on a real-valued random variable Θ in such a way that X1,X2,… are independent conditional on Θ. Consider a tester who wants to test the two alternative hypotheses

 H0: Θ≤θ0, H1: Θ>θ0,

where θ0 is a given constant (the 'threshold'). In the presence of an observation cost, a tradeoff between statistical precision and costly observation arises.

In a Bayesian formulation of the problem, the tester's initial belief is described by a prior distribution μ for the unknown parameter Θ. Denote by T the set of FX-stopping times with values in N0={0,1,2,…}, where FX is the filtration generated by the observation process X. Given a stopping time τ∈T, let Dτ be the set of FXτ-measurable random variables d with values in {0,1}. The random variable d here represents the decision of the tester, with 'd=i' representing that hypothesis Hi is accepted. We define the cost

 (1) V:=infτ∈Tinfd∈Dτ{P(d=1,Θ≤θ0)+P(d=0,Θ>θ0)+cE[τ]},

where c>0 is a given and fixed cost per observation.

The case when μ is a two-point distribution with

 P(Θ=θ2) =π, P(Θ=θ1) =1−π,

where θ1≤θ0<θ2 and π∈(0,1), was studied in the classical reference; see also [19, Chapter 4.1]. It turns out that the statistical problem (1) can be reduced to an optimal stopping problem in terms of the posterior probability process Πn:=P(Θ=θ2|FXn), and since Π in this case is a (time-homogeneous) Markov process, the stopping problem can be embedded in a Markovian framework. It is shown there that the cost function is concave in the prior belief π; as a consequence, the continuation region is an interval, and the optimal stopping time is the first exit time from this interval (the latter property was also obtained in earlier work).
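In the two-point case, the posterior probability admits a simple recursive Bayes update. The following minimal sketch illustrates this for Bernoulli observations in natural parametrisation; the distributional choice and the function `posterior_update` are illustrative, not taken from the literature cited above.

```python
import math

def posterior_update(pi, x, theta1, theta2):
    """One Bayes update of pi = P(Theta = theta2 | data) after observing x.

    Illustrative specification: Bernoulli observations with
    P(X = 1 | Theta = u) = e^u / (1 + e^u).
    """
    def lik(u):
        p = math.exp(u) / (1.0 + math.exp(u))
        return p if x == 1 else 1.0 - p
    num = pi * lik(theta2)
    den = num + (1.0 - pi) * lik(theta1)
    return num / den

# Observing x = 1 pushes the belief towards the larger parameter theta2.
pi1 = posterior_update(0.5, 1, -1.0, 1.0)
```

Iterating this map over successive observations produces the (time-homogeneous) Markov process Π used in the classical analysis.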

In the current article we relax the assumption of a two-point prior distribution and study the sequential analysis problem (1) in a Bayesian set-up for a general prior distribution μ. To do that, we impose a one-dimensional exponential structure on the distribution of the observations. As in the two-point case, the conditional probability process Π is then still Markovian; however, Π is in general time-inhomogeneous, which leads to time-dependence in the cost function, and the study of optimal strategies is more involved. In the absence of explicit solutions for the cost and the optimal strategy, we focus on structural properties of the solution. In particular, we prove that spatial concavity of the cost function holds regardless of the prior distribution. We also show a concentration result for the posterior distribution, which combined with the concavity result has implications for the monotonicity of the cost with respect to the time parameter.

### 1.1. Literature review

The problem of sequential testing of an unknown parameter has attracted much attention in the statistical literature, with early references covering the case of two simple hypotheses and independent and identically distributed observations. Sequential testing of composite hypotheses in a discrete-time setting with Bernoulli distributed observations has been studied with a linear penalty for wrong decisions, relying on a conjugate prior for the unknown parameter. Sobel studies sequential testing of composite hypotheses for an arbitrary class of distributions in the exponential family and with a general prior distribution of the unknown parameter. In a key result, he establishes the existence of two stopping boundaries beyond which it is optimal to stop. There is also related literature in discrete time focusing on the case of sequential estimation.

Another strand of literature has focused on continuous-time approximations of sequential testing problems and their connections with free-boundary problems. For the sequential testing of two simple hypotheses, classical works solved the problem of determining the unknown drift of a Brownian motion and the corresponding sequential testing problem of determining an unknown intensity of a Poisson process. A problem with composite hypotheses was studied in continuous time for a normal prior distribution, with a '0-1' loss function for wrong decisions (as in (1)), and in a series of papers Chernoff studied the same problem but with linear penalty functions. In the case of sequential composite hypothesis testing, explicit solutions are rare, and a main focus in this literature is on deriving asymptotics of the problem as the cost of observation tends to zero, on asymptotically optimal solutions, and on deriving bounds for the stopping boundaries.

More recent literature has focused on different variants of these continuous-time problems, including versions with a finite horizon, settings with combined learning from several Brownian motions and compound Poisson processes, and Wiener sequential testing in a multi-dimensional set-up. All these papers study simple hypotheses, i.e. set-ups in which the unknown parameter can take only two possible values. Beyond this, a hypothesis testing problem with three possible drifts has been examined, and a composite hypothesis problem for the drift of a Wiener process has been studied with a general prior distribution, as has a sequential estimation problem for a Wiener process in the same set-up. Key to the analysis in the latter works is the choice of appropriate variables. In fact, it has been shown that if, instead of the observation process, one uses the conditional probability as state variable, then the corresponding continuation region is shrinking in time; a similar result holds for sequential least-squares estimation if one uses the conditional expectation as state variable.

### 1.2. Our contribution

In the current article, we study the sequential composite hypothesis testing problem (1) using a Markovian approach. Our analysis is general in the sense that we treat the whole one-parameter exponential family with an arbitrary prior distribution; in particular, we do not rely on conjugate priors. Following the approach described above, we use the conditional probability process as the underlying state variable, and we show that a concavity result holds in these coordinates. We also use these coordinates to obtain a concentration result for the posterior distribution, which is then used to show that spatial concavity is intimately connected with monotonicity with respect to time. In particular, we provide a condition under which the continuation region is non-increasing in time. In principle, translating back to the observation coordinates, this gives an upper bound on the growth of the stopping boundaries.

The paper is organised as follows. In Section 2 we recall some basic properties of statistical inference in the exponential family, and we introduce the notion of π-level curves, along which the value of the conditional probability is constant. In Section 3, we provide a Markovian embedding of (1), and we prove that the embedded cost function is spatially concave. In Section 4 we prove that the posterior distribution becomes more concentrated about the threshold along level curves. Sections 5–6 deal with the question of whether the value function is monotone with respect to the time parameter.

## 2. Preliminaries on the exponential family

In this article, we will consider the case of a one-dimensional exponential family of distributions for the observations X1,X2,…. More precisely, let ν be a σ-finite measure on R, and define

 B(u):=log(∫Rexp{ux}ν(dx))

and

 N={u∈R:∫Rexp{ux}ν(dx)<∞}

so that

 B(u)<∞

for u∈N. For u∈N, let

 (2) pu(x):=exp{ux−B(u)}

so that ∫Rpu(x)ν(dx)=1. We assume that the distribution of Xk, conditional on Θ=u, is

 (3) P(Xk∈A|Θ=u)=∫Apu(x)ν(dx).
###### Remark 2.1.

In some literature, the notion of an exponential family allows for densities of the form exp{η(u)T(x)−B(u)}h(x), and the case (3), in which η(u)=u, T(x)=x and h≡1, is then referred to as a natural exponential family. Using the transformed variables η(u) and T(x) (and absorbing h into the reference measure ν), an exponential family can be transformed into a natural one, so we may consider the natural form (as above) without loss of generality.
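As a concrete instance of the natural form, take ν to be the counting measure on {0,1}; then B(u)=log(1+e^u) and pu is the Bernoulli pmf with success probability e^u/(1+e^u). The sketch below (an illustrative choice of ours) checks numerically that B′(u)=E[X1|Θ=u], as stated in Lemma 2.2 below.

```python
import math

# Illustrative choice: nu = counting measure on {0,1}, so that
# B(u) = log(1 + e^u) and p_u is the Bernoulli(e^u / (1 + e^u)) pmf.
def B(u):
    return math.log(1.0 + math.exp(u))

def p(u, x):
    return math.exp(u * x - B(u))

u = 0.7
mean = sum(x * p(u, x) for x in (0, 1))        # E[X1 | Theta = u]
deriv = (B(u + 1e-6) - B(u - 1e-6)) / 2e-6     # numerical B'(u)
```

Here the agreement of `mean` and `deriv` is the identity B′(u)=E[X1|Θ=u] in this particular family.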

###### Lemma 2.2.

We have that

• (i) B is convex, and N is an interval.

Denote by N∘ the interior of N. Then

• (ii) all derivatives of B exist on N∘, and they are given by the expressions obtained by formally differentiating inside the integral. In particular,

 B′(u)=∫Rxexp{ux}ν(dx)∫Rexp{ux}ν(dx)=E[X1|Θ=u];
• (iii) the function u↦E[G(X1)|Θ=u] is non-decreasing on N∘ for any non-decreasing function G.

###### Proof.

For (i) and (ii) we refer to [5, Theorem 1.13] and [5, Theorem 2.2], respectively. For (iii), we have

 ∂∂uE[G(X1)|Θ=u] = ∂∂u∫RG(x)pu(x)ν(dx)=∫RG(x)(x−B′(u))pu(x)ν(dx) = E[G(X1)X1|Θ=u]−E[G(X1)|Θ=u]E[X1|Θ=u]≥0,

where the final inequality is due to the fact that the covariance of two non-decreasing functions evaluated at the same random variable is non-negative. ∎
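The covariance fact used in the last step (Chebyshev's correlation inequality) is easy to illustrate by simulation; the clipped function G below is an arbitrary non-decreasing choice of ours.

```python
import random

# Monte Carlo illustration of the covariance fact: for a non-decreasing
# function G, Cov(G(X), X) >= 0.  G here is an arbitrary clipped identity.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(20000)]
G = lambda t: min(max(t, -1.0), 1.0)
mx = sum(xs) / len(xs)
mg = sum(G(t) for t in xs) / len(xs)
cov = sum((G(t) - mg) * (t - mx) for t in xs) / len(xs)
```

The empirical covariance `cov` is positive, in line with the inequality used in the proof.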

We use a Bayesian set-up in which the unknown parameter Θ has a given prior distribution μ; we assume that μ is a probability measure on R, and we denote the support of μ by S. Moreover, denote

 S+=S∩(θ0,∞)&S−=S∩(−∞,θ0]=S∖S+.

Naturally, to avoid degenerate cases we assume that μ(S+)∈(0,1).

Next, by standard means, the optimization problem (1) can be reduced to an optimal stopping problem, i.e. a problem in which only one optimization (namely over τ) takes place. In fact, given a stopping time τ, an optimal decision rule is given by

 d={0if Πτ≤1/21if Πτ>1/2,

where the posterior probability process is given by

 Πn:=P(Θ>θ0|FXn).

Consequently,

 V=infτ∈TE[Πτ∧(1−Πτ)+cτ],

where a∧b:=min{a,b}. To derive an expression for Πn, note that

 P(Θ>θ0|X1=x1)=∫S+pu(x1)μ(du)∫Spu(x1)μ(du),

so

 Π1=∫S+pu(X1)μ(du)∫Spu(X1)μ(du).

More generally, at time n, given observations X1=x1,…,Xn=xn, we have by conditional independence

 P(Θ>θ0|X1=x1,…,Xn=xn) =∫S+∏ni=1pu(xi)μ(du)∫S∏ni=1pu(xi)μ(du) =∫S+exp{u∑ni=1xi−nB(u)}μ(du)∫Sexp{u∑ni=1xi−nB(u)}μ(du).

Thus, denoting

 Yn:=n∑i=1Xi

we have

 Πn=q(n,Yn),

where

 q(n,y):=∫S+euy−nB(u)μ(du)∫Seuy−nB(u)μ(du).
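For a prior μ with finite support, q(n,y) can be computed directly from this definition. The sketch below uses the Bernoulli family with B(u)=log(1+e^u) and a discrete prior; both are illustrative choices of ours, not specifications from the paper.

```python
import math

# q(n, y) = P(Theta > theta0 | Y_n = y) for a discrete prior mu.
support = [-1.0, -0.3, 0.5, 1.2]   # illustrative support of mu
weights = [0.3, 0.2, 0.3, 0.2]     # illustrative prior weights
theta0 = 0.0

def B(u):
    return math.log(1.0 + math.exp(u))

def q(n, y):
    """Posterior probability P(Theta > theta0 | Y_n = y)."""
    terms = [w * math.exp(u * y - n * B(u)) for u, w in zip(support, weights)]
    num = sum(t for u, t in zip(support, terms) if u > theta0)
    return num / sum(terms)
```

At n=0 and y=0 the formula reduces to the prior mass above θ0, and for fixed n the map y↦q(n,y) is increasing, in line with Lemma 2.4 below.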
###### Remark 2.3.

The fact that Yn is a sufficient statistic in any exponential family is well known. Moreover, a converse also holds: under some mild conditions, any family of distributions that admits a real-valued sufficient statistic for sample sizes larger than one is a one-parameter exponential family (the Pitman–Koopman–Darmois theorem).

We denote by

 μn,y(du):=euy−nB(u)μ(du)∫Seuy−nB(u)μ(du)

the posterior distribution of Θ at time n conditional on Yn=y. Note that the prior distribution satisfies μ=μ0,0; however, for reasons of Markovian embedding, below we will simultaneously consider the whole family {μn,y} of alternative prior distributions.

###### Lemma 2.4.

The function y↦q(n,y) is an increasing bijection from R to (0,1) for each fixed n≥0.

###### Proof.

We have

 ∂q(n,y)∂y = ∫S+ueuy−nB(u)μ(du)∫Seuy−nB(u)μ(du)−∫Sueuy−nB(u)μ(du)∫S+euy−nB(u)μ(du)(∫Seuy−nB(u)μ(du))2 = E[Θ1{Θ>θ0}|Yn=y]−P(Θ>θ0|Yn=y)E[Θ|Yn=y].

Since μ assigns positive mass on each side of the threshold θ0, the above covariance is strictly positive. Thus ∂q(n,y)/∂y>0, so y↦q(n,y) is strictly increasing. Moreover,

 ∫S+euy−nB(u)μ(du)∫S−euy−nB(u)μ(du)≥∫S+e(u−θ0)y−nB(u)μ(du)∫S−e−nB(u)μ(du)→∞

as y→∞, so q(n,y)→1 as y→∞. A similar argument shows that q(n,y)→0 as y→−∞, so q(n,⋅) is surjective onto (0,1). ∎

For each fixed value π∈(0,1), denote by y(n,π) the unique value such that q(n,y(n,π))=π. The set {(n,y(n,π)):n∈N0} consists of all points at which Πn=π, and is referred to as the π-level curve. Since the function y↦q(n,y) is a bijection, two level curves with different π-values never intersect. Furthermore, they are ordered so that if π1<π2, then y(n,π1)<y(n,π2) for all n.

## 3. Markovian embedding

It follows from Lemma 2.4 that the process Π is a (time-inhomogeneous) Markov process, and we can write the Π-process in terms of the posterior distributions μn,y as

 Πn=∫S+μn,Yn(du)=∫S+pu(Xn)μn−1,Yn−1(du)∫Spu(Xn)μn−1,Yn−1(du).

Furthermore, this allows us to embed the optimal stopping problem (1) as a time-dependent problem in terms of the Markov process Π as

 (5) V(n,π)=infτ∈TEn,π[Πτ+n∧(1−Πτ+n)+cτ].

Here Pn,π is the probability measure under which Θ has distribution μn,y(n,π). We emphasize that π∈(0,1) is arbitrary, i.e. the prior belief can take any value in (0,1).

###### Lemma 3.1.

The value function V satisfies

 V(n−1,π)=min{π∧(1−π),c+En−1,π[V(n,Πn)]}.
###### Proof.

This follows directly from the Markovian structure of the process Π. ∎
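The recursion in Lemma 3.1 lends itself to finite-horizon backward induction. The sketch below works in the coordinates of the sufficient statistic Yn, for Bernoulli observations and a two-point prior; the whole specification (prior, threshold θ0=0, cost c, horizon N) is an illustrative choice of ours, not taken from the paper.

```python
import math

# Finite-horizon backward induction for (5) via Lemma 3.1, carried out in
# the y-coordinates of Y_n.  Illustrative specification: Bernoulli
# observations, two-point prior, theta0 = 0.
support = [-1.0, 1.0]
weights = [0.5, 0.5]
theta0, c, N = 0.0, 0.01, 30

B = lambda u: math.log(1.0 + math.exp(u))

def posterior(n, y):
    t = [w * math.exp(u * y - n * B(u)) for u, w in zip(support, weights)]
    s = sum(t)
    return [ti / s for ti in t]

def q(n, y):                     # P(Theta > theta0 | Y_n = y)
    return sum(p for u, p in zip(support, posterior(n, y)) if u > theta0)

def predictive(n, y):            # P(X_{n+1} = 1 | Y_n = y)
    return sum(p * math.exp(u - B(u)) for u, p in zip(support, posterior(n, y)))

V = {}                           # V[(n, y)]; Y_n takes values 0, ..., n
for y in range(N + 1):
    V[(N, y)] = min(q(N, y), 1 - q(N, y))
for n in range(N - 1, -1, -1):
    for y in range(n + 1):
        g = min(q(n, y), 1 - q(n, y))
        p1 = predictive(n, y)
        cont = c + p1 * V[(n + 1, y + 1)] + (1 - p1) * V[(n + 1, y)]
        V[(n, y)] = min(g, cont)
```

Truncating at a finite horizon N corresponds to the approximating value functions used in the proof of Theorem 3.3 below.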

###### Lemma 3.2.

Let f:[0,1]→R be a concave function. Then π↦En,π[f(Πn+1)] is concave on (0,1).

###### Proof.

To simplify the notation, we prove the statement for n=0. Moreover, we will assume that f is twice continuously differentiable; the general case follows readily by approximation.

First note that

 E0,π[f(Π1)]=∫Rf(α(x,π)β(x,π))β(x,π)dx,

where

 α(x,π) =∫S+pu(x)μ0,y(0,π)(du), β(x,π) =∫Spu(x)μ0,y(0,π)(du).

Define

 H1(z):=f(z)+(1−z)f′(z)

and

 H2(z):=f(z)−zf′(z).

Straightforward differentiation yields

 ∂2E0,π[f(Π1)]∂π2 =∫R(f(αβ)βππ+f′(αβ)(βαππ−αβππ)2β+f′′(αβ)(βαπ−αβπ)2β3)dx ≤∫R(f(αβ)βππ+f′(αβ)(βαππ−αβππ)2β)dx =∫R(αππH1(αβ)+(β−α)ππH2(αβ))dx =I1+I2,

where

 I1:=∫RαππH1(αβ)dx&I2:=∫R(β−α)ππH2(αβ)dx

Note that, by concavity of f, H1 is non-increasing on [0,1] and H2 is non-decreasing on [0,1]. Furthermore, by Lemma 2.4, x↦α(x,π)/β(x,π) is increasing.

We will show that

 I1≤0&I2≤0.

To do that, first note that

 α(x,π)=∫S+pu(x)euy(0,π)∫Reuy(0,π)μ(du)μ(du),

so

 I1 =∫S+(euy(0,π)∫Reuy(0,π)μ(du))ππ∫Spu(x)H1(α(x,π)β(x,π))dxμ(du) (6) =∫S+(euy(0,π)∫Reuy(0,π)μ(du))ππE[H1(α(X1,π)β(X1,π))|Θ=u]μ(du).

By Lemma 2.2, the function

 (7) u↦E[H1(α(X1,π)β(X1,π))|Θ=u]

is non-increasing.

To study the first factor of the integrand in (6), denote g(y):=∫Sexp{uy}μ(du) and note that

 ∂∂π(euy(0,π)∫Seuy(0,π)μ(du))=∂∂y(euyg(y))π′(y)∣∣ ∣ ∣∣y=y(0,π),

where

 π(y):=∫S+euyμ(du)∫Seuyμ(du).

Consequently,

 ∂2∂π2(euy(0,π)∫Reuy(0,π)μ(du)) = π′(y)∂2∂y2(euyg(y))−π′′(y)∂∂y(euyg(y))π′(y)3∣∣ ∣ ∣∣y=y(0,π)

Using

 ∂∂yeuyg=euyg2(ug−g′)

and

 π′=g∫S+ueuyμ(du)−g′∫S+euyμ(du)g2,

straightforward calculations show that

 ∂2∂π2(euy(0,π)∫Seuy(0,π)μ(du)) =euy(0,π)F(u)(π′)3g3,

where

 F(u)= u2(g∫S+ueuyμ(du)−g′∫S+euyμ(du)) +u(g′′∫S+euyμ(du)−g∫S+u2euyμ(du)) +g′∫S+u2euyμ(du)−g′′∫S+ueuyμ(du).

Note that F is a quadratic function in u, and that the coefficient of u2 is positive since

 g∫S+ueuyμ(du)−g′∫S+euyμ(du)=g2Cov0,π(Θ,1{Θ>θ0})>0.

Consequently, the set {u:F(u)<0} is a bounded interval (possibly empty). Moreover, since

 π=∫S+euy(0,π)μ(du)g(y(0,π))=1−∫S−euy(0,π)μ(du)g(y(0,π)),

we have

 (8) ∫S+∂2∂π2(euy(0,π)∫Reuy(0,π)μ(du))μ(du)=∫S−∂2∂π2(euy(0,π)∫Reuy(0,π)μ(du))μ(du)=0.

Therefore we must have

 F(θ0)<0,

so the interval {u:F(u)<0} is non-empty. Denote the end-points of this interval by u0 and u1, respectively, so that F<0 on (u0,u1), with u0<θ0<u1. Then, using (7) we find that

 I1 = ∫S∩(−∞,u1)euy(0,π)F(u)(π′(y(0,π)))3g3(y(0,π))E[H1(α(X1,π)β(X1,π))∣∣ ∣∣Θ=u]μ(du) +∫S∩[u1,∞)euy(0,π)F(u)(π′(y(0,π)))3g3(y(0,π))E[H1(α(X1,π)β(X1,π))∣∣ ∣∣Θ=u]μ(du) ≤ E[H1(α(X1,π)β(X1,π))∣∣ ∣∣Θ=u1]∫(θ0,u1)euy(0,π)F(u)(π′(y(0,π)))3g3(y(0,π))μ(du) +E[H1(α(X1,π)β(X1,π))∣∣ ∣∣Θ=u1]∫[u1,∞)euy(0,π)F(u)(π′(y(0,π)))3g3(y(0,π))μ(du) = 0,

where we used (8) in the last equality.

Similarly, the function u↦E[H2(α(X1,π)/β(X1,π))|Θ=u] is non-decreasing in u, so

 I2 = ∫S+∩(−∞,u0]euy(0,π)F(u)(π′(y(0,π)))3g3(y(0,π))E[H2(α(X1,π)β(X1,π))∣∣ ∣∣Θ=u]μ(du) +∫S−∩(u0,∞)euy(0,π)F(u)(π′(y(0,π)))3g3(y(0,π))E[H2(α(X1,π)β(X1,π))∣∣ ∣∣Θ=u]μ(du) ≤ E[H2(α(X1,π)β(X1,π))∣∣ ∣∣Θ=u0]∫S−euy(0,π)F(u)(π′(y(0,π)))3g3(y(0,π))μ(du) = 0.

Thus π↦E0,π[f(Π1)] is concave. ∎

###### Theorem 3.3.

The function π↦V(n,π) is concave on (0,1) for each fixed n∈N0.

###### Proof.

Define the cost function VN as in (5), but with the infimum being taken over stopping times τ≤N (VN is then the value function in a problem with a finite horizon). By an iterated use of Lemma 3.1 and Lemma 3.2 and the fact that the minimum of two concave functions is concave, VN is concave. Moreover, it is straightforward to check that VN→V as N→∞, and since the pointwise limit of concave functions is concave, the result follows. ∎

So far we have been working under the assumption that π∈(0,1). One can further extend the value function to the boundary points by setting V(n,0)=V(n,1):=0 for all n. In this way, V(n,⋅) is defined on all of [0,1] and the concavity is preserved.

In accordance with standard stopping theory, we introduce the continuation region by

 C:={(n,π)∈N0×[0,1]:V(n,π)<π∧(1−π)},

and the stopping region by

 D:={(n,π)∈N0×[0,1]:V(n,π)=π∧(1−π)}.

The stopping time

 τ∗:=inf{k≥0:(n+k,Πn+k)∈D}

is an optimal strategy for our testing problem.

The concavity of the value function has important implications for the structure of the continuation region.

###### Corollary 3.4.

There exist functions b1,b2:N0→[0,1] with b1(n)≤b2(n) such that

 C={(n,π)∈N0×[0,1]:b1(n)<π<b2(n)}.
###### Proof.

Since V(n,0)=V(n,1)=0, the points (n,0) and (n,1) belong to the stopping region D. The result then follows from the concavity of π↦V(n,π) and the piecewise linearity of π↦π∧(1−π). ∎

###### Remark 3.5.

In view of the bijection in Lemma 2.4, the fact that time sections of the continuation region are intervals in the π-coordinate implies that time sections of the continuation region expressed in the y-coordinate are also intervals. This is a well-known result, obtained under somewhat different assumptions in the earlier literature.

## 4. Concentration of the posterior distribution

Recall that the mass above θ0 of the posterior distribution remains constantly equal to π along a π-level curve. In this section we show that the posterior distribution becomes more concentrated around θ0 along a level curve. This result, however natural it appears, seems to be new in the literature; related results in this direction show that the conditional variance of the mean-square estimate is a supermartingale.

###### Theorem 4.1.

If a≤θ0≤b, then

 n↦Pn,π(Θ≤a)&n↦Pn,π(Θ>b)

are decreasing.
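Before turning to the proof, the statement can be illustrated numerically: along a π-level curve, the posterior mass below a point a≤θ0 shrinks as n grows. The sketch below uses a discrete prior and the Bernoulli family (illustrative choices of ours, not from the paper), locating y(n,π) by bisection, which is justified by the bijection in Lemma 2.4.

```python
import math

# Numerical check of Theorem 4.1 along a pi-level curve, for an
# illustrative discrete prior and the Bernoulli family B(u) = log(1+e^u).
support = [-1.0, -0.5, 1.0]
weights = [0.25, 0.25, 0.5]
theta0, a = 0.0, -0.75          # a <= theta0

B = lambda u: math.log(1.0 + math.exp(u))

def posterior(n, y):
    t = [w * math.exp(u * y - n * B(u)) for u, w in zip(support, weights)]
    s = sum(t)
    return [ti / s for ti in t]

def q(n, y):
    return sum(p for u, p in zip(support, posterior(n, y)) if u > theta0)

def level_y(n, pi, lo=-100.0, hi=100.0):
    """Find y(n, pi) with q(n, y) = pi by bisection (Lemma 2.4)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if q(n, mid) < pi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

pi = 0.5
tail = [sum(p for u, p in zip(support, posterior(n, level_y(n, pi)))
            if u <= a) for n in range(4)]
```

In accordance with the theorem, the sequence `tail` of masses Pn,π(Θ≤a) is non-increasing in n, even though the mass above θ0 stays fixed at π along the level curve.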

###### Proof.

For the first claim, it suffices to show that

 (9) P0,π(Θ≤a)≥P1,π(Θ≤a)

for an arbitrary π∈(0,1); the general claim then follows by the Markov property. Moreover, without loss of generality, we may assume that y(0,π)=0, so that μ0,y(0,π)=μ. Let

 f(u):=euy(1,π)−B(u),

and let Sa:=S∩(−∞,a]. Note that

 P0,π(Θ≤a)=∫Saμ(du)

and

 P1,π(Θ≤a)=∫Saf(u)μ(du)∫Sf(u)μ(du).

Also note that

 ∂f(u)∂u=f(u)(y(1,π)−B′(u)).

Therefore, since B is convex, f changes its monotonicity (from increasing to decreasing) at most once. Now we consider two separate cases:

• f(a)≤f(θ0);

• f(a)>f(θ0).

If (i) holds, then f(u)≤f(a) for u∈Sa and f(u)≥f(a) for u∈S−∖Sa (since f changes its monotonicity at most once). Consequently, if μ(S−∖Sa)>0, then

 (10) ∫Saf(u)μ(du)∫S−∖Saf(u)μ(du)≤f(a)∫Saμ(du)f(a)∫S−∖Saμ(du)=∫Saμ(du)∫S−∖Saμ(du).

(if μ(S−∖Sa)=0, then (9) holds trivially with equality). Since

 ∫S−f(u)μ(du)∫Sf(u)μ(du)=1−π=∫S−μ(du),

(10) implies that

 P1,π(Θ≤a) = ∫Saf(u)μ(du)∫Sf(u)μ(du)=∫Saf(u)μ(du)∫Saμ(du)∫S−f(u)μ(du) = ∫Saf(u)μ(du)∫S−∖Saf(u)μ(du)∫S−μ(du)1+∫Saf(u)μ(du)∫S−∖Saf(u)μ(du)≤∫Saμ(du)=P0,π(Θ≤a),

so (9) holds.

On the other hand, if (ii) holds, then the fact that f changes its monotonicity at most once gives that f(u)≥f(θ0) for u∈S−∖Sa and f(u)≤f(θ0) for all u∈S+. Consequently,

 (11) ∫S−∖Saf(u)μ(du)∫S+f(u)μ(du)≥f(θ0)∫S−∖Saμ(du)f(θ0)∫S+μ(du)=∫S−∖Saμ(du)∫S+μ(du).

Since

 ∫S+f(u)μ(du)∫Sf(u)μ(du)=π=∫S+μ(du),

the inequality (11) yields

 P1,π(Θ>a)