Bayesian sequential composite hypothesis testing in discrete time

08/16/2021 ∙ by Erik Ekström, et al.

We study the sequential testing problem of two alternative hypotheses regarding an unknown parameter in an exponential family when observations are costly. In a Bayesian setting, the problem can be embedded in a Markovian framework. Using the conditional probability of one of the hypotheses as the underlying spatial variable, we show that the cost function is concave and that the posterior distribution becomes more concentrated as time goes on. Moreover, we study time monotonicity of the value function. For a large class of model specifications, the cost function is non-decreasing in time, and the optimal stopping boundaries are thus monotone.




1. Introduction

Assume that a sequence of random variables $X_1, X_2, \dots$ is observed sequentially, and that the sequence is drawn from a one-parameter family of distributions depending on a real-valued random variable $\Theta$ in such a way that $X_1, X_2, \dots$ are independent (conditional on $\Theta$). Consider a tester who wants to test the two alternative hypotheses

$$H_0 : \Theta \le \theta_0 \qquad \text{versus} \qquad H_1 : \Theta > \theta_0,$$

where $\theta_0$ is a given constant (the ’threshold’). In the presence of an observation cost, a tradeoff between statistical precision and costly observation arises.

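In a Bayesian formulation of the problem, the tester’s initial belief is described by a prior distribution $\mu$ for the unknown parameter $\Theta$. Denote by $\mathcal{T}$ the set of $\mathbb{F}^X$-stopping times with values in $\{0, 1, 2, \dots\}$, where $\mathbb{F}^X = (\mathcal{F}^X_n)_{n \ge 0}$ is the filtration generated by the observation process $X$. Given a stopping time $\tau \in \mathcal{T}$, let $\mathcal{D}_\tau$ be the set of $\mathcal{F}^X_\tau$-measurable random variables with values in $\{0, 1\}$. The random variable $d \in \mathcal{D}_\tau$ here represents the decision of the tester, with ’$d = i$’ representing that hypothesis $H_i$ is accepted. We define the cost

$$V = \inf_{\tau \in \mathcal{T}} \inf_{d \in \mathcal{D}_\tau} \mathbb{E}\Big[\mathbb{1}_{\{d = 0,\ \Theta > \theta_0\}} + \mathbb{1}_{\{d = 1,\ \Theta \le \theta_0\}} + c\,\tau\Big], \qquad (1)$$

where $c > 0$ is a given and fixed cost of each observation.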

The case when $\mu$ is a two-point distribution with

$$\mathbb{P}(\Theta = \theta_2) = \pi = 1 - \mathbb{P}(\Theta = \theta_1),$$

where $\theta_1 \le \theta_0 < \theta_2$ and $\pi \in (0,1)$, was studied in the classical reference [22], see also [19, Chapter 4.1]. It turns out that the statistical problem (1) can be reduced to an optimal stopping problem in terms of the posterior probability process

$$\Pi_n = \mathbb{P}(\Theta = \theta_2 \mid \mathcal{F}^X_n),$$

and since in this case $\Pi$ is a (time-homogeneous) Markov process, the stopping problem can be embedded in a Markovian framework. It is shown in [19] that the cost function is concave in the prior belief $\pi$; as a consequence, the continuation region is an interval, and the optimal stopping time is the first exit time from this interval (the latter property was also obtained in [22]).

In the current article we relax the assumption of a two-point prior distribution and study the sequential analysis problem (1) in a Bayesian set-up for general prior distributions $\mu$. To do that, we impose a one-dimensional exponential structure on the distribution of $X_i$ given $\Theta$. As in [19], the conditional probability process $\Pi_n = \mathbb{P}(\Theta > \theta_0 \mid \mathcal{F}^X_n)$ is then still Markovian; however, $\Pi$ is in general time-inhomogeneous, which leads to time-dependence in the cost function, and the study of optimal strategies is more involved. In the absence of explicit solutions for the cost and the optimal strategy, we focus on structural properties of the solution. In particular, we prove that spatial concavity of the cost function holds regardless of the prior distribution. We also show a concentration result for the posterior distribution, which, combined with the concavity result, has implications for the monotonicity of the cost with respect to the time parameter.

1.1. Literature review

The problem of sequential testing of an unknown parameter has attracted much attention in the statistical literature, with [22] as an early reference covering the case of two simple hypotheses and independent and identically distributed observations. Sequential testing of composite hypotheses in a discrete time setting with Bernoulli distributed observations is studied in [15] and [16], with a linear penalty for wrong decisions and relying on a conjugate prior for the unknown parameter. In [21], Sobel studies sequential testing of composite hypotheses for an arbitrary class of distributions in the exponential family and with a general prior distribution of the unknown parameter. In a key result, he establishes the existence of two stopping boundaries beyond which it is optimal to stop. Related literature in discrete time, but more focused on the case of sequential estimation, includes [1] and [6].

Another strand of literature has focused on continuous-time approximations of sequential testing problems and their connections with free-boundary problems. For the sequential testing of two simple hypotheses, [20] solved the problem of determining the unknown drift of a Brownian motion, and [17] solved the corresponding sequential testing problem of determining an unknown intensity of a Poisson process. In [2], a problem with composite hypotheses was studied in continuous time and for a normal prior distribution, with a ’0-1’ loss function for wrong decisions (as in (1)), and in a series of papers (see [7] and the references therein), Chernoff studied the same problem but with linear penalty functions. In the case of sequential composite hypothesis testing, explicit solutions are rare, and the main focus in this literature is on asymptotics of the problem as the cost of observation tends to zero, on asymptotically optimal solutions (e.g. [3], [14] and [18]), and on bounds for the stopping boundaries.

More recent literature has focused on different variants of these continuous-time problems. To mention a few, [12] studies a version with finite horizon, [8] studies a setting with combined learning from several Brownian motions and compound Poisson processes, and [11] studies Wiener sequential testing in a multi-dimensional set-up. All these papers study simple hypotheses, i.e. set-ups where the unknown parameters can take only two possible values. In [23], a hypothesis testing problem for a case with three possible drifts is examined, and in [10] a composite hypothesis problem for the drift of a Wiener process is studied with a general prior distribution. Moreover, [9] studies a sequential estimation problem for a Wiener process in the same set-up. Key to the analysis in [10] and [9] is the choice of appropriate state variables. In fact, in [10] it is shown that if one uses the conditional probability instead of the observation process as state variable, then the corresponding continuation region shrinks in time; a similar result holds for sequential least-squares estimation if one uses the conditional expectation as state variable.

1.2. Our contribution

In the current article, we study the sequential composite hypothesis testing problem (1) using a Markovian approach. Our analysis is general in the sense that we treat the whole one-parameter exponential family with arbitrary prior distribution, and we thus do not rely on conjugate priors. Following [10], we use the conditional probability process as the underlying state variable, and we show that a concavity result holds in these coordinates. We also use these coordinates to obtain a concentration result for the posterior distribution, which is then used to show that spatial concavity is intimately connected with monotonicity with respect to time. In particular, we provide a condition under which the continuation region is non-increasing in time. In principle, translating back to the observation coordinates, this would give an upper bound on the growth of the stopping boundaries.

The paper is organised as follows. In Section 2 we recall some basic properties of statistical inference in the exponential family, and we introduce the notion of $\pi$-level curves along which the value of the conditional probability is constant. In Section 3, we provide a Markovian embedding of (1), and we prove that the embedded cost function is spatially concave. In Section 4 we prove that the posterior distribution becomes more concentrated about the threshold along level curves. Sections 5-6 deal with the question of whether the value function is monotone with respect to the time parameter.

2. Preliminaries on the exponential family

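In this article, we will consider the case of a one-dimensional exponential family of distributions for $X_i$, $i = 1, 2, \dots$. More precisely, let $\nu$ be a $\sigma$-finite measure on $\mathbb{R}$, and define

$$\mathcal{N} := \Big\{\theta \in \mathbb{R} : \int_{\mathbb{R}} e^{\theta x}\,\nu(dx) < \infty\Big\},$$

so that

$$\psi(\theta) := \log \int_{\mathbb{R}} e^{\theta x}\,\nu(dx) < \infty$$

for $\theta \in \mathcal{N}$. For $\theta \in \mathcal{N}$, let

$$f_\theta(x) := e^{\theta x - \psi(\theta)},$$

so that $\int_{\mathbb{R}} f_\theta\,d\nu = 1$. We assume that the distribution of $X_i$, conditional on $\Theta = \theta$, is

$$f_\theta(x)\,\nu(dx). \qquad (3)$$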

Remark 2.1.

In some literature, the notion of an exponential family allows for densities of the form $c(\theta)\,h(x)\,e^{\eta(\theta) T(x)}$, and the case (3), in which $\eta(\theta) = \theta$ and $T(x) = x$, is then referred to as a natural exponential family. Using the transformed variables $\eta(\theta)$ and $T(x)$, an exponential form can be transformed into a natural form, so we may consider the natural form (as above) without loss of generality.

We start with some well-known results.

Lemma 2.2.

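We have that

  • (i) $\psi$ is convex, and $\mathcal{N}$ is an interval.

Denote by $\mathcal{N}^\circ$ the interior of $\mathcal{N}$. Then

  • (ii) all derivatives of $\psi$ exist on $\mathcal{N}^\circ$, and they are given by the expressions obtained by formally differentiating inside the integral. In particular, $\psi'(\theta) = \mathbb{E}_\theta[X_1]$ and $\psi''(\theta) = \operatorname{Var}_\theta(X_1)$;

  • (iii) the function $\theta \mapsto \mathbb{E}_\theta[g(X_1)]$ is non-decreasing on $\mathcal{N}^\circ$ for any non-decreasing function $g$.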


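For (i) and (ii) we refer to [5, Theorem 1.13] and [5, Theorem 2.2], respectively. For (iii), we have

$$\frac{d}{d\theta}\mathbb{E}_\theta[g(X_1)] = \mathbb{E}_\theta[g(X_1)X_1] - \mathbb{E}_\theta[g(X_1)]\,\psi'(\theta) = \operatorname{Cov}_\theta\big(g(X_1), X_1\big) \ge 0,$$

where the final inequality is due to the fact that the covariance of two non-decreasing functions evaluated at the same random variable is non-negative. ∎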
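As a worked illustration of Lemma 2.2 (our own example, not one treated in the text), take Bernoulli observations, i.e. let the dominating measure be $\delta_0 + \delta_1$, the counting measure on $\{0,1\}$, and write $\psi$ for the log-normalizer $\log \int e^{\theta x}\,(\delta_0 + \delta_1)(dx)$. Every quantity in the lemma is then explicit:

$$\int_{\mathbb{R}} e^{\theta x}\,(\delta_0 + \delta_1)(dx) = 1 + e^{\theta} < \infty \ \text{ for all } \theta \in \mathbb{R}, \qquad \psi(\theta) = \log\big(1 + e^{\theta}\big),$$

$$\mathbb{P}_\theta(X_1 = 1) = e^{\theta - \psi(\theta)} = \frac{e^{\theta}}{1 + e^{\theta}} =: p(\theta), \qquad \psi'(\theta) = p(\theta) = \mathbb{E}_\theta[X_1], \qquad \psi''(\theta) = p(\theta)\big(1 - p(\theta)\big) = \operatorname{Var}_\theta(X_1),$$

$$\mathbb{E}_\theta[g(X_1)] = g(0) + \big(g(1) - g(0)\big)\,p(\theta).$$

Here the parameter set is all of $\mathbb{R}$, $\psi'' > 0$ gives the convexity in (i), and since $p$ is increasing, $\theta \mapsto \mathbb{E}_\theta[g(X_1)]$ is non-decreasing whenever $g(1) \ge g(0)$, in line with (iii).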

We use a Bayesian set-up in which the unknown parameter has a given prior distribution ; we assume that is a measure on , and we denote the support of by . Moreover, denote

Naturally, to avoid degenerate cases we assume that .

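Next, by standard means, the optimization problem (1) can be reduced to an optimal stopping problem, i.e. a problem in which only one optimization (namely over $\tau$) takes place. In fact, given a stopping time $\tau$, an optimal decision rule is given by

$$d = \mathbb{1}_{\{\Pi_\tau \ge 1/2\}},$$

where the posterior probability process is given by

$$\Pi_n := \mathbb{P}(\Theta > \theta_0 \mid \mathcal{F}^X_n), \qquad n \ge 0,$$

so that $\Pi_0 = \mu((\theta_0, \infty))$. To derive an expression for $\Pi_n$, note that

$$\mathbb{P}(\Theta \in d\theta \mid X_1 = x) = \frac{e^{\theta x - \psi(\theta)}\,\mu(d\theta)}{\int e^{\theta' x - \psi(\theta')}\,\mu(d\theta')}.$$

More generally, at time $n$, given observations $X_1 = x_1, \dots, X_n = x_n$, we have by independence

$$\mathbb{P}(\Theta \in d\theta \mid X_1 = x_1, \dots, X_n = x_n) = \frac{e^{\theta \sum_{i=1}^n x_i - n\psi(\theta)}\,\mu(d\theta)}{\int e^{\theta' \sum_{i=1}^n x_i - n\psi(\theta')}\,\mu(d\theta')}.$$

Thus, denoting

$$Y_n := \sum_{i=1}^n X_i,$$

we have

$$\Pi_n = \frac{\int_{(\theta_0, \infty)} e^{\theta Y_n - n\psi(\theta)}\,\mu(d\theta)}{\int e^{\theta Y_n - n\psi(\theta)}\,\mu(d\theta)}.$$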
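The formula for the posterior probability can be checked numerically. The following is a minimal Python sketch under assumptions that are ours, not the paper's: Bernoulli observations in natural form with log-normalizer $\psi(\theta) = \log(1 + e^\theta)$, and a prior supported on a finite grid of $\theta$-values. It computes $\mathbb{P}(\Theta > \theta_0 \mid X_1, \dots, X_n)$ from the pair $(n, Y_n)$ alone and compares it with one-observation-at-a-time Bayesian updating. All function names are hypothetical.

```python
import numpy as np


def psi(theta):
    """Log-normalizer of the Bernoulli family in natural form: log(1 + e^theta)."""
    return np.logaddexp(0.0, theta)


def posterior_weights(theta, prior, n, y):
    """Posterior weights on the grid `theta`, using only (n, Y_n = y)."""
    logw = np.log(prior) + theta * y - n * psi(theta)
    w = np.exp(logw - logw.max())  # shift by the max to avoid overflow
    return w / w.sum()


def pi_from_statistic(theta, prior, theta0, xs):
    """P(Theta > theta0 | X_1, ..., X_n), computed from the sufficient statistic Y_n."""
    w = posterior_weights(theta, prior, len(xs), float(np.sum(xs)))
    return float(w[theta > theta0].sum())


def pi_sequential(theta, prior, theta0, xs):
    """The same probability, obtained by updating the prior one observation at a time."""
    w = np.asarray(prior, dtype=float).copy()
    for x in xs:
        w *= np.exp(theta * x - psi(theta))  # likelihood factor e^{theta x - psi(theta)}
        w /= w.sum()
    return float(w[theta > theta0].sum())


theta = np.linspace(-2.0, 2.0, 9)                       # hypothetical prior grid
prior = np.full(theta.size, 1.0 / theta.size)           # uniform prior on the grid
theta0 = 0.0                                            # threshold
xs = np.random.default_rng(0).integers(0, 2, size=25)   # simulated 0/1 observations
print(pi_from_statistic(theta, prior, theta0, xs), pi_sequential(theta, prior, theta0, xs))
```

Because only $(n, Y_n)$ enters `pi_from_statistic`, reordering the observations leaves its output unchanged, which reflects the sufficiency of $Y_n$.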


Remark 2.3.

The fact that $Y_n$ is a sufficient statistic in any exponential family is well-known. Moreover, a converse also holds: under some mild conditions, any family of distributions that admits a real-valued sufficient statistic for sample size larger than one is a one-parameter exponential family, see e.g. [4] and [13].

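We denote by

$$\mu_{n,y}(d\theta) := \frac{e^{\theta y - n\psi(\theta)}\,\mu(d\theta)}{\int e^{\theta' y - n\psi(\theta')}\,\mu(d\theta')}$$

the posterior distribution of $\Theta$ at time $n$ conditional on $Y_n = y$. Note that the prior distribution satisfies $\mu = \mu_{0,0}$; however, for reasons of Markovian embedding, below we will consider simultaneously the whole family $\{\mu_{n,y}\}$ of alternative prior distributions.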

Lemma 2.4.

The function $y \mapsto \pi(n,y) := \mu_{n,y}((\theta_0, \infty))$ is an increasing bijection onto $(0,1)$ for each fixed $n$.


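We have

$$\frac{\partial}{\partial y}\pi(n,y) = \int_{(\theta_0, \infty)} \theta\,\mu_{n,y}(d\theta) - \pi(n,y) \int \theta\,\mu_{n,y}(d\theta) = \operatorname{Cov}_{\mu_{n,y}}\big(\mathbb{1}_{\{\Theta > \theta_0\}}, \Theta\big).$$

Since $\mu_{n,y}$ assigns positive mass to each side of the threshold $\theta_0$, the above covariance is strictly positive. Thus $\partial_y \pi(n,y) > 0$, so $y \mapsto \pi(n,y)$ is strictly increasing. Moreover,

$$\frac{1 - \pi(n,y)}{\pi(n,y)} = \frac{\int_{(-\infty, \theta_0]} e^{\theta y - n\psi(\theta)}\,\mu(d\theta)}{\int_{(\theta_0, \infty)} e^{\theta y - n\psi(\theta)}\,\mu(d\theta)} \to 0$$

as $y \to \infty$, so $\pi(n,y) \to 1$ as $y \to \infty$. A similar argument shows that $\pi(n,y) \to 0$ as $y \to -\infty$, so $y \mapsto \pi(n,y)$ is surjective onto $(0,1)$. ∎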

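For each fixed value $\pi \in (0,1)$, denote by $y(n,\pi)$ the unique value such that $\pi(n, y(n,\pi)) = \pi$. The set $\{(n, y(n,\pi)) : n \ge 0\}$ consists of all points $(n,y)$ with $\pi(n,y) = \pi$, and is referred to as the $\pi$-level curve. Since the function $y \mapsto \pi(n,y)$ is a bijection, two level curves with different $\pi$-values never intersect. Furthermore, they are ordered so that if $\pi_1 < \pi_2$, then $y(n,\pi_1) < y(n,\pi_2)$.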
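Level curves are easy to trace numerically because of the monotonicity in Lemma 2.4. The hedged sketch below works under our own illustrative assumptions (Bernoulli observations with $\psi(\theta) = \log(1 + e^\theta)$, a prior on a finite grid, and a hand-picked bisection bracket `[lo, hi]`) and computes $y(n,\pi)$ by bisection; `pi_of` and `level_point` are hypothetical names.

```python
import numpy as np


def pi_of(theta, prior, theta0, n, y):
    """pi(n, y): posterior mass above theta0 given Y_n = y (y may be any real number)."""
    logw = np.log(prior) + theta * y - n * np.logaddexp(0.0, theta)
    w = np.exp(logw - logw.max())
    return w[theta > theta0].sum() / w.sum()


def level_point(theta, prior, theta0, n, pi, lo=-50.0, hi=50.0, tol=1e-10):
    """y(n, pi): the unique y with pi_of(..., n, y) == pi, found by bisection.

    Bisection is justified because y -> pi(n, y) is increasing (Lemma 2.4);
    the default bracket [lo, hi] is an assumption that suits the grid below."""
    if not pi_of(theta, prior, theta0, n, lo) < pi < pi_of(theta, prior, theta0, n, hi):
        raise ValueError("bracket [lo, hi] does not contain the level point")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pi_of(theta, prior, theta0, n, mid) < pi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta = np.linspace(-2.0, 2.0, 9)               # hypothetical prior grid
prior = np.full(theta.size, 1.0 / theta.size)
theta0 = 0.0
for n in (0, 5, 20):
    # points (n, y(n, pi)) on three level curves; each row is increasing in pi
    print(n, [round(level_point(theta, prior, theta0, n, p), 4) for p in (0.2, 0.5, 0.8)])
```

Collecting `level_point(..., n, pi)` over `n` for a fixed `pi` traces one $\pi$-level curve in the $(n, y)$-plane.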

3. Markovian embedding

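It follows from Lemma 2.4 that the process $\Pi_n = \pi(n, Y_n)$ is a (time-inhomogeneous) Markov process, and we can write the $Y$-process in terms of $\Pi$ as

$$Y_n = y(n, \Pi_n).$$

Furthermore, this allows us to embed the optimal stopping problem (1) as a time-dependent problem in terms of the Markov process $\Pi$ as

$$V(n,\pi) = \inf_{\tau} \mathbb{E}_{n,\pi}\Big[\min\{\Pi_{n+\tau}, 1 - \Pi_{n+\tau}\} + c\,\tau\Big]. \qquad (5)$$

Here $\mathbb{P}_{n,\pi}$ is the probability measure under which $\Theta$ has distribution $\mu_{n,y(n,\pi)}$. We emphasize that $(n,\pi) \in \{0, 1, 2, \dots\} \times (0,1)$, i.e. $\pi$ can take any value in $(0,1)$.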
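To illustrate the embedding, the sketch below (same illustrative assumptions as before: Bernoulli observations, a grid prior, hypothetical function names) computes the one-step law of $\Pi_{n+1}$ under $\mathbb{P}_{n,\pi}$, indexing the state by $y = y(n,\pi)$ instead of $\pi$. With 0/1 observations there are only two successor states. As a sanity check it confirms the standard fact that the posterior probability is a martingale, $\mathbb{E}_{n,\pi}[\Pi_{n+1}] = \pi$.

```python
import numpy as np


def posterior(theta, prior, n, y):
    """mu_{n,y} on the grid `theta` (Bernoulli model: psi(theta) = log(1 + e^theta))."""
    logw = np.log(prior) + theta * y - n * np.logaddexp(0.0, theta)
    w = np.exp(logw - logw.max())
    return w / w.sum()


def one_step(theta, prior, theta0, n, y):
    """Law of Pi_{n+1} under P_{n,pi}, where pi = pi(n, y).

    With 0/1 observations, Y moves to y + 1 (a success) or stays at y.
    Returns (pi_now, [(prob_up, pi_up), (prob_down, pi_down)])."""
    w = posterior(theta, prior, n, y)
    above = theta > theta0
    success = 1.0 / (1.0 + np.exp(-theta))   # P_theta(X = 1)
    q = float(w @ success)                    # predictive P(X_{n+1} = 1 | Y_n = y)
    pi_up = posterior(theta, prior, n + 1, y + 1.0)[above].sum()
    pi_down = posterior(theta, prior, n + 1, y)[above].sum()
    return w[above].sum(), [(q, pi_up), (1.0 - q, pi_down)]


theta = np.linspace(-2.0, 2.0, 9)             # hypothetical prior grid
prior = np.full(theta.size, 1.0 / theta.size)
pi_now, law = one_step(theta, prior, 0.0, n=4, y=1.7)
mean_next = sum(prob * nxt for prob, nxt in law)
print(pi_now, law, mean_next)                  # mean_next equals pi_now up to rounding
```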

Lemma 3.1.

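The value function satisfies

$$V(n,\pi) = \min\Big\{\min\{\pi, 1 - \pi\},\; c + \mathbb{E}_{n,\pi}\big[V(n+1, \Pi_{n+1})\big]\Big\}.$$

This follows directly from the Markovian structure of the process $\Pi$. ∎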
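The recursion in Lemma 3.1 suggests computing a truncated value function by backward induction. The Python sketch below is one way to do this under assumptions that are ours, not the paper's: Bernoulli observations, a prior on a finite grid, and stopping forced at a fixed time `horizon`. States are indexed by $(n, y)$, which is equivalent to $(n, \pi)$ by Lemma 2.4, and memoisation keeps the number of evaluated states quadratic in the horizon. The name `make_value_function` is hypothetical.

```python
import numpy as np
from functools import lru_cache


def make_value_function(theta, prior, theta0, c, horizon):
    """Return V(n, y) for the truncated problem (stopping forced at n = horizon).

    Assumptions, not from the paper: Bernoulli observations in natural form
    and a prior on the finite grid `theta`; y is the running sum Y_n."""
    theta = np.asarray(theta, dtype=float)
    log_prior = np.log(np.asarray(prior, dtype=float))
    psi = np.logaddexp(0.0, theta)                 # log(1 + e^theta)
    success = 1.0 / (1.0 + np.exp(-theta))         # P_theta(X = 1)
    above = theta > theta0

    def posterior(n, y):
        logw = log_prior + theta * y - n * psi
        w = np.exp(logw - logw.max())
        return w / w.sum()

    @lru_cache(maxsize=None)
    def V(n, y):
        w = posterior(n, y)
        pi = float(w[above].sum())
        stop = min(pi, 1.0 - pi)                   # error probability if we stop now
        if n >= horizon:
            return stop
        q = float(w @ success)                     # predictive P(X_{n+1} = 1)
        cont = c + q * V(n + 1, y + 1.0) + (1.0 - q) * V(n + 1, y)
        return min(stop, cont)                     # Lemma 3.1: stop, or pay c and continue

    return V


theta = np.linspace(-2.0, 2.0, 9)                  # hypothetical prior grid
prior = np.full(theta.size, 1.0 / theta.size)
V = make_value_function(theta, prior, theta0=0.0, c=0.01, horizon=30)
print(V(0, 0.0))  # truncated Bayes cost when starting from the prior itself (y = 0)
```

Increasing `horizon` can only lower the computed value, since more stopping times become admissible.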

Lemma 3.2.

Let $g : [0,1] \to \mathbb{R}$ be a concave function. Then $\pi \mapsto \mathbb{E}_{n,\pi}[g(\Pi_{n+1})]$ is concave on $(0,1)$.


To simplify the notation, we prove the statement for $n = 0$. Moreover, we will assume that $g$ is twice continuously differentiable; the general case follows readily by approximation.

First note that




Straightforward differentiation yields


Note that is decreasing on , and is increasing. Furthermore, by Lemma 2.4, increases in .

We will show that

To do that, first note that



By Lemma 2.2, the function


is non-increasing.

To study the first factor of the integrand in (3), denote and note that





straightforward calculations show that


Note that is a quadratic function in , and that the coefficient of is positive since

Consequently, the set is a bounded interval (possibly empty). Moreover, since

we have


Therefore we must have

so the interval . Denote the end-points of this interval by and , respectively, so that , with . Then, using (7) we find that

where we used (8) in the last equality.

Similarly, increases in , so

Thus is concave. ∎
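Lemma 3.2 can be probed numerically. Under the same illustrative Bernoulli/grid-prior assumptions (hypothetical function names, and a tolerance `tol` chosen by us), the sketch below evaluates $\pi \mapsto \mathbb{E}_{n,\pi}[g(\Pi_{n+1})]$ for the concave function $g(p) = \min\{p, 1 - p\}$ on a grid and checks that the chord slopes are non-increasing. This is a spot check for one prior, not a proof.

```python
import numpy as np


def transition_law(theta, prior, theta0, n, y):
    """(pi(n, y), q, pi_up, pi_down) in the Bernoulli model with a grid prior."""
    psi = np.logaddexp(0.0, theta)
    above = theta > theta0

    def post(m, z):
        logw = np.log(prior) + theta * z - m * psi
        w = np.exp(logw - logw.max())
        return w / w.sum()

    w = post(n, y)
    q = float(w @ (1.0 / (1.0 + np.exp(-theta))))   # predictive P(X_{n+1} = 1)
    return w[above].sum(), q, post(n + 1, y + 1.0)[above].sum(), post(n + 1, y)[above].sum()


def one_step_expectation(g, theta, prior, theta0, n, ys):
    """Points (pi, E_{n,pi}[g(Pi_{n+1})]) for a grid of y-values, sorted by pi."""
    pts = []
    for y in ys:
        pi, q, up, down = transition_law(theta, prior, theta0, n, y)
        pts.append((pi, q * g(up) + (1.0 - q) * g(down)))
    return np.array(sorted(pts))


def is_concave(points, tol=1e-9):
    """Discrete concavity test at increasing abscissae: chord slopes must not increase."""
    x, h = points[:, 0], points[:, 1]
    slopes = np.diff(h) / np.diff(x)
    return bool(np.all(np.diff(slopes) <= tol))


def g(p):
    """A concave test function on [0, 1]: the cost of stopping."""
    return min(p, 1.0 - p)


theta = np.linspace(-2.0, 2.0, 9)                    # hypothetical prior grid
prior = np.full(theta.size, 1.0 / theta.size)
pts = one_step_expectation(g, theta, prior, 0.0, n=3, ys=np.linspace(-3.0, 6.0, 181))
pts = pts[(pts[:, 0] > 1e-4) & (pts[:, 0] < 1 - 1e-4)]  # drop tails dominated by rounding
print(is_concave(pts))
```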

Theorem 3.3.

The function $\pi \mapsto V(n,\pi)$ is concave for each fixed $n$.


Define the cost function $V^N$ as in (5), but with the infimum being taken over stopping times $\tau \le N$ ($V^N$ is then the value function in a problem with a finite horizon). By an iterated use of Lemma 3.1 and Lemma 3.2 and the fact that the minimum of two concave functions is concave, $\pi \mapsto V^N(n,\pi)$ is concave. Moreover, it is straightforward to check that $V^N \to V$ as $N \to \infty$, and since the pointwise limit of concave functions is concave, the result follows. ∎
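A numerical counterpart of the finite-horizon step in this proof: the sketch below, under our own assumptions (Bernoulli model, grid prior, stopping forced at time $N = 20$, cost $c = 0.01$; function names hypothetical), computes the truncated value function at time 0 by backward induction and checks discrete concavity in the $\pi$-coordinate. Points with $\pi$ extremely close to 0 or 1 are dropped because chord slopes there are dominated by rounding error.

```python
import numpy as np
from functools import lru_cache


def make_value_function(theta, prior, theta0, c, horizon):
    """Truncated value function V(n, y) and the map pi_of(n, y) (Bernoulli model, grid prior)."""
    theta = np.asarray(theta, dtype=float)
    log_prior = np.log(np.asarray(prior, dtype=float))
    psi = np.logaddexp(0.0, theta)
    success = 1.0 / (1.0 + np.exp(-theta))
    above = theta > theta0

    def posterior(n, y):
        logw = log_prior + theta * y - n * psi
        w = np.exp(logw - logw.max())
        return w / w.sum()

    @lru_cache(maxsize=None)
    def V(n, y):
        w = posterior(n, y)
        pi = float(w[above].sum())
        stop = min(pi, 1.0 - pi)
        if n >= horizon:
            return stop
        q = float(w @ success)
        return min(stop, c + q * V(n + 1, y + 1.0) + (1.0 - q) * V(n + 1, y))

    def pi_of(n, y):
        return float(posterior(n, y)[above].sum())

    return V, pi_of


def slopes_nonincreasing(x, v, tol=1e-9):
    """Discrete concavity test for points with strictly increasing abscissae x."""
    s = np.diff(v) / np.diff(x)
    return bool(np.all(np.diff(s) <= tol))


theta = np.linspace(-2.0, 2.0, 9)               # hypothetical prior grid
prior = np.full(theta.size, 1.0 / theta.size)
V, pi_of = make_value_function(theta, prior, theta0=0.0, c=0.01, horizon=20)
ys = [float(y) for y in np.linspace(-3.0, 6.0, 91)]
pis = np.array([pi_of(0, y) for y in ys])       # increasing in y by Lemma 2.4
vals = np.array([V(0, y) for y in ys])
keep = (pis > 1e-4) & (pis < 1 - 1e-4)          # avoid near-degenerate tails
print(slopes_nonincreasing(pis[keep], vals[keep]))
```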

So far we have been working under the assumption that $\pi \in (0,1)$. One can further extend the value function to the boundary points by setting $V(n,0) = V(n,1) = 0$ for all $n$. In this way, $V$ is defined for every $(n,\pi) \in \{0, 1, 2, \dots\} \times [0,1]$, and the concavity is preserved.

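In accordance with standard stopping theory, we introduce the continuation region by

$$\mathcal{C} := \{(n,\pi) : V(n,\pi) < \min\{\pi, 1 - \pi\}\}$$

and the stopping region by

$$\mathcal{S} := \{(n,\pi) : V(n,\pi) = \min\{\pi, 1 - \pi\}\}.$$

The stopping time

$$\tau^* := \inf\{k \ge 0 : (n + k, \Pi_{n+k}) \in \mathcal{S}\}$$

is an optimal strategy for our testing problem.

The concavity of the value function has important implications for the structure of the continuation region.

Corollary 3.4.

There exist functions $b$ and $B$ such that

$$\mathcal{C} = \{(n,\pi) : b(n) < \pi < B(n)\}.$$

Since $V(n,0) = V(n,1) = 0$, we have $(n,0), (n,1) \in \mathcal{S}$. The result then follows from concavity of $\pi \mapsto V(n,\pi)$ and the piecewise linearity of $\pi \mapsto \min\{\pi, 1 - \pi\}$. ∎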
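The interval structure can be observed in the same illustrative setting. The sketch below (assumptions: Bernoulli model, grid prior, finite horizon, a grid of $y$-values; all names hypothetical) marks the grid points where the truncated value function is strictly below the stopping cost $\min\{\pi, 1 - \pi\}$ and reports the smallest and largest such $\pi$, which approximate the endpoints of the continuation interval from inside.

```python
import numpy as np
from functools import lru_cache


def continuation_interval(theta, prior, theta0, c, horizon, n, ys):
    """Approximate continuation interval at time n on a grid of y-values.

    Returns ((pi_lo, pi_hi), flags), where flags[i] says whether ys[i] is a
    continuation point; returns (None, flags) if no grid point continues."""
    theta = np.asarray(theta, dtype=float)
    log_prior = np.log(np.asarray(prior, dtype=float))
    psi = np.logaddexp(0.0, theta)
    success = 1.0 / (1.0 + np.exp(-theta))
    above = theta > theta0

    def posterior(m, y):
        logw = log_prior + theta * y - m * psi
        w = np.exp(logw - logw.max())
        return w / w.sum()

    @lru_cache(maxsize=None)
    def V(m, y):
        w = posterior(m, y)
        pi = float(w[above].sum())
        stop = min(pi, 1.0 - pi)
        if m >= horizon:
            return stop
        q = float(w @ success)
        return min(stop, c + q * V(m + 1, y + 1.0) + (1.0 - q) * V(m + 1, y))

    pis, flags = [], []
    for y in ys:
        y = float(y)
        pi = float(posterior(n, y)[above].sum())
        pis.append(pi)
        flags.append(V(n, y) < min(pi, 1.0 - pi))   # strict inequality: keep observing
    idx = np.flatnonzero(flags)
    if idx.size == 0:
        return None, flags
    return (pis[idx[0]], pis[idx[-1]]), flags


theta = np.linspace(-2.0, 2.0, 9)                  # hypothetical prior grid
prior = np.full(theta.size, 1.0 / theta.size)
ys = np.linspace(-3.0, 6.0, 91)
for n in (0, 3):
    interval, _ = continuation_interval(theta, prior, 0.0, 0.01, 20, n, ys)
    print(n, interval)
```

Since $y \mapsto \pi(n,y)$ is increasing, the continuation points on the $y$-grid should form a single consecutive block, matching the corollary.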

Remark 3.5.

In view of the bijection in Lemma 2.4, the fact that time sections of the continuation region are intervals in the $\pi$-coordinates implies that time sections of the continuation region expressed in $y$-coordinates are also intervals. This is a well-known result, see [21] (under somewhat different assumptions).

4. Concentration of the posterior distribution

Recall that the mass above $\theta_0$ of the posterior distribution remains constantly equal to $\pi$ along a $\pi$-level curve. In this section we show that the posterior distribution becomes more concentrated around $\theta_0$ along a level curve. This result, however natural it appears, seems to be new in the literature; for related results showing that the conditional variance of the mean-square estimate is a supermartingale, see


Theorem 4.1.

If , then

are decreasing.


For the first claim, it suffices to show that


where . Moreover, without loss of generality, we may assume that so that . Let

and let . Note that


Also note that

Therefore, since is convex, we have that changes its monotonicity (from increasing to decreasing) at most once. Now we consider two separate cases:


  • .

If (i) holds, then for (since changes its monotonicity at most once). Consequently, if , then


(if , then (9) holds trivially with equality). Since

(10) implies that

so (9) holds.

On the other hand, if (ii) holds, then the fact that changes its monotonicity at most once gives that for all . Consequently,



the inequality (11) yields