1 Introduction
The problem of timely risk assessment and decision-making based on a sequentially observed time series is ubiquitous, with applications in finance, medicine, cognitive science and signal processing [17]. A common setting that arises in all these domains is that a decision-maker, provided with sequential observations of a time series, needs to decide whether or not an adverse event (e.g. financial crisis, clinical acuity for ward patients, etc.) will take place in the future. The decision-maker’s recognition of a forthcoming adverse event needs to be timely, since a delayed decision may hinder effective intervention (e.g. delayed admission of clinically acute patients to intensive care units can lead to mortality [5]). In the context of cognitive science, this decision-making task is known as the two-alternative forced choice (2AFC) task [15]. Insightful structural solutions for the optimal Bayesian 2AFC decision-making policies have been derived in [9-16], most of which are inspired by the classical work of Wald on sequential probability ratio tests (SPRT) [8].
In this paper, we present a Bayesian decision-making model in which a decision-maker adaptively decides when to gather (costly) information from an underlying time series in order to accumulate evidence on the occurrence/non-occurrence of an adverse event. The decision-maker operates under time pressure: occurrence of the adverse event terminates the decision-making process. Our abstract model is motivated and inspired by many practical decision-making tasks such as: constructing temporal patterns for gathering sensory information in perceptual decision-making [1], scheduling lab tests for ward patients in order to predict clinical deterioration in a timely manner [3, 5], designing breast cancer screening programs for early tumor detection [7], etc.
We characterize the structure of the optimal decision-making policy that prescribes when the decision-maker should acquire new information, and when she should stop acquiring information and issue a final prediction. We show that the decision-maker’s posterior belief process, based on which policies are prescribed, is a supermartingale that reflects the decision-maker’s tendency to deny the occurrence of an adverse event in the future as she observes the survival of the time series for longer time periods. Moreover, the information acquisition policy has a “rendezvous” structure; the optimal “date” for acquiring the next information sample can be computed given the current sample. The optimal schedule for gathering information over time balances the information gain (surprise) obtained from acquiring new samples, and the probability of survival for the underlying stochastic process (suspense). Finally, we characterize the continuation and stopping regions in the decision-maker’s state-space and show that, unlike previous models, they depend on the time series “context” and not just the decision-maker’s beliefs.
Related Works Mathematical models and analyses for perceptual decision-making based on sequential hypothesis testing have been developed in [9-17]. Most of these models use tools from sequential analysis developed by Wald [8] and Shiryaev [21, 22]. In [9, 13, 14], optimal decision-making policies for the 2AFC task were computed by modelling the decision-maker’s sensory evidence using diffusion processes [20]. These models assume an infinite time horizon for the decision-making policy, and an exogenous supply of sensory information.
The assumption of an infinite time horizon was relaxed in [10] and [15], where decision-making is assumed to be performed under the pressure of a stochastic deadline; however, these deadlines were considered to be drawn from known distributions that are independent of the hypothesis and the realized sensory evidence, and the assumption of an exogenous information supply was maintained. In practical settings, the deadlines would naturally be dependent on the realized sensory information (e.g. patients’ acuity events are correlated with their physiological information [5]), which induces more complex dynamics in the decision-making process. Context-based decision-making models were introduced in [17], but these also assume an exogenous information supply and an infinite time horizon.
The notions of “suspense” and “surprise” in Bayesian decision-making have also been recently introduced in the economics literature (see [18] and the references therein). These models use measures for Bayesian surprise, originally introduced in the context of sensory neuroscience [19], in order to model the explicit preference of a decision-maker for non-instrumental information. The goal there is to design information disclosure policies that are suspense-optimal or surprise-optimal. Unlike our model, such models impose suspense (and/or surprise) as a (behavioral) preference of the decision-maker, and hence these quantities do not emerge endogenously by virtue of rational decision-making.
2 Timely Decision Making with Endogenous Information Acquisition
Time Series Model The decision-maker has access to a time series modeled as a continuous-time stochastic process that takes values in , and is defined over the time domain , with an underlying filtered probability space . The process is naturally adapted to , and hence the filtration abstracts the information conveyed in the time series realization up to time . The decision-maker extracts information from to guide her actions over time.
We assume that is a stationary Markov process (footnote 1: Most of the insights distilled from our results would hold for more general dependency structures. However, we keep this assumption to simplify the exposition and maintain the tractability and interpretability of the results.), with a stationary transition kernel , where
is a realization of a latent Bernoulli random variable
(unobservable by the decision-maker), with . The distributional properties of the paths of are determined by , since the realization of decides which Markov kernel ( or ) generates . If the realization is equal to , then an adverse event occurs almost surely at a (finite) random time , the distribution of which is dependent on the realization of the path . The decision-maker’s ultimate goal is to sequentially observe , and infer before the adverse event happens; inference is obsolete if it is declared after . Since is latent, the decision-maker is unaware whether the adverse event will occur or not, i.e. whether her access to is temporary ( for ) or permanent ( for ). In order to model the occurrence of the adverse event, we define as an stopping time for the process , for which we assume the following:

The stopping time is finite almost surely, whereas is infinite almost surely, i.e. , and .

The stopping time is accessible (footnote 2: Our analyses hold if the stopping time is totally inaccessible.), with a Markovian dependency on history, i.e. , where is an injective map from to and is non-decreasing in .
Thus, unlike the stochastic deadline models in [10] and [15], the decision deadline in our model (i.e. occurrence of the adverse event) is context-dependent as it depends on the time series realization (i.e. is not independent of as in [15]). We use the notation where to denote the stopped process to which the decision-maker has access. Throughout the paper, the measures and assign probability measures to the paths and respectively, and we assume that (footnote 3: The absolute continuity of with respect to means that no sample path of should be fully revealing of the realization of .).
Information The decision-maker can only observe a set of (costly) samples of rather than the full continuous path. The samples observed by the decision-maker are captured by partitioning over specific time intervals: we define with , as a size partition of over the interval , where is the total number of samples in the partition . The decision-maker observes the values that takes at the time instances in ; thus the sequence of observations is given by the process where is the Dirac measure. The space of all partitions over the interval is denoted by . We denote the probability measures for partitioned paths generated under and with a partition as and respectively.
Since the decision-maker observes through the partition , her information at time is conveyed in the σ-algebra . The stopping event is observable by the decision-maker even if . We denote the σ-algebra generated by the stopping event as . Thus, the information that the decision-maker has at time is expressed by the filtration , and it follows that any decision-making policy needs to be measurable.
Figure 1 depicts a Brownian path (a sample path of a Wiener process, which satisfies all the assumptions of our model) (footnote 4: In Figure 1, the stopping event was simulated as a totally inaccessible first jump of a Poisson process.), with an exemplary partition over the time interval . The decision-maker observes the samples in sequentially, and reasons about the realization of the latent variable based on these samples and the process survival, i.e. at , the decision-maker’s information resides in the σ-algebra generated by the samples in , and the σ-algebra generated by the process’ survival .
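The setting of Figure 1 can be illustrated with a minimal simulation sketch. It assumes a discretized Wiener path, a stopping time drawn as the first jump of a homogeneous Poisson process (as in the footnote on Figure 1), and a fixed partition of sampling times. The function name `simulate_stopped_samples` and the parameters `horizon`, `n_steps` and `stop_rate` are hypothetical and are not taken from the paper.

```python
import math
import random


def simulate_stopped_samples(partition, horizon=1.0, n_steps=1000,
                             stop_rate=1.0, theta=1, seed=0):
    """Return (tau, observed) for one simulated path.

    observed holds the (time, value) pairs at the partition times that
    precede the stopping time tau. All parameter names are illustrative.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    # Discretized Wiener path: X(0) = 0 with independent N(0, dt) increments.
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # Assumption: the first jump of a Poisson process with constant rate is
    # exponentially distributed; with theta == 0 the process never stops.
    tau = rng.expovariate(stop_rate) if theta == 1 else math.inf
    observed = []
    for t in partition:
        # A sample is only available while the process has survived.
        if 0.0 <= t <= horizon and t < tau:
            idx = round(t / dt)
            observed.append((idx * dt, path[idx]))
    return tau, observed
```

With `theta=0` every partition time inside the horizon yields a sample, whereas with `theta=1` the samples scheduled after the simulated stopping time are never observed.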
Policies and Risks The decision-maker’s goal is to come up with a (timely) decision , that reflects her prediction for whether the actual realization is or , before the process potentially stops at the unknown time . The decision-maker follows a policy: a (continuous-time) mapping from the observations gathered up to every time instance to two types of actions:

A sensing action : if , then the decision-maker decides to observe a new sample from the running process at time .

A continuation/stopping action : if , then the decision-maker decides to stop gathering samples from , and declares a final decision (estimate) for . Whenever the decision-maker continues observing and postpones her declaration for the estimate of .
A policy is a (measurable) mapping rule that maps the information in to an action tuple at every time instance . We assume that every single observation that the decision-maker draws from entails a fixed cost, hence the process has to be a point process under any optimal policy (footnote 5: Note that the cost of observing any local continuous path is infinite, hence under any optimal policy must be a point process to keep the number of observed samples finite.). We denote the space of all such policies by .
A policy generates the following random quantities as a function of the paths on the probability space :
1 A stopping time : The first time at which the decision-maker declares her estimate for , i.e. .
2 A decision (estimate of ) : Given by .
3 A random partition : A realization of the point process , comprising a finite set of strictly increasing stopping times at which the decision-maker decides to sample the path .
A loss function is associated with every realization of the policy , representing the overall cost incurred when following that policy for a specific path . The loss function is given by
(1)
where is the cost of type I error (failure to anticipate the adverse event), is the cost of type II error (falsely predicting that an adverse event will occur), is the cost of the delay in declaring the estimate , is the cost incurred when the adverse event occurs before an estimate is declared (cost of missing the deadline), and is the cost of every observation sample (cost of information). The risk of each policy is defined as its expected loss
(2)
where the expectation is taken over the paths of . In the next section, we characterize the structure of the optimal policy .
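To make the five cost components concrete, the following is a rough sketch of one plausible additive loss and its Monte Carlo risk estimate. It is an assumption-laden illustration rather than a transcription of Eq. (1): it assumes a delay cost that is linear in the decision time, error costs that apply only when an estimate is declared before the adverse event, and a per-sample cost charged in all cases. All function and parameter names are hypothetical.

```python
def policy_loss(theta, theta_hat, decision_time, tau, n_samples,
                c_type1, c_type2, c_delay, c_deadline, c_sample):
    """Assumed additive loss for one realized path (illustrative only)."""
    # Cost of information: every observed sample has a fixed price.
    sampling_cost = c_sample * n_samples
    if decision_time > tau:
        # The adverse event occurred before any estimate was declared.
        return c_deadline + sampling_cost
    # Assumption: the delay cost grows linearly with the decision time.
    loss = c_delay * decision_time + sampling_cost
    if theta == 1 and theta_hat == 0:
        loss += c_type1  # type I error: failed to anticipate the event
    elif theta == 0 and theta_hat == 1:
        loss += c_type2  # type II error: falsely predicted the event
    return loss


def empirical_risk(losses):
    """Monte Carlo estimate of the risk: the average loss over paths."""
    return sum(losses) / len(losses)
```

In such a sketch, the risk of a policy would be approximated by simulating many paths, recording the loss of each, and averaging them with `empirical_risk`.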
3 Structure of the Optimal Policy
Since the decision-maker’s posterior belief at time , defined as , is an important statistic for designing sequential policies [10, 21-22], we start our characterization for by investigating the belief process .
3.1 The Posterior Belief Process
Recall that the decision-maker distills information from two types of observations: 1) the realization of the partitioned time series (i.e. the information in ), and 2) the survival of the process up to time (i.e. the information in ). In the following Theorem, we study the evolution of the decision-maker’s beliefs as she integrates these pieces of information over time (footnote 6: All proofs are provided in the supplementary material.).
Theorem 1 (Information and beliefs). Every posterior belief trajectory associated with a policy that creates a partition of is a càdlàg path given by
where is the Radon-Nikodym derivative (footnote 7: Since we impose the condition and fix a partition , then the Radon-Nikodym derivative exists.) of the measure with respect to , and is given by the following elementary predictable process
for and for . Moreover, the path has exactly jumps at the time indexes in .
Proof: The posterior belief process is given by
(3) 
where we have used the fact that in (a), and the fact that the event is measurable in (b), and hence . Therefore, we can write the posterior belief process in the following form
Now we focus on computing . Note that using Bayes’ rule, we have that
(4) 
where the existence of the Radon-Nikodym derivative follows from the fact that . Hence, we have that
Now we focus on evaluating . Using a further application of Bayes’ rule we have that
(5) 
where we have used the fact that . For any partition , the likelihood ratio is an elementary predictable process that takes an initial value that is equal to the prior (when no samples are initially observed), and then takes constant values of in the interval between any two samples in the partition (only when a new sample is observed, the likelihood is updated). Hence, we have that
The process is predictable since the likelihood remains constant as long as no new samples are observed. Modulated by the survival probability, can be written as
Under usual regularity conditions on it is easy to see that will have jumps only at the time instances in the partition and at the stopping time , i.e. a total of jumps at the time indexes in .
Theorem 1 says that every belief path is right-continuous with left limits, and has jumps at the time indexes in the partition , whereas between each two jumps, the paths are predictable (i.e. they are known ahead of time once we know the magnitudes of the jumps preceding them). This means that the decision-maker obtains “active” information by probing the time series to observe new samples (i.e. the information in ), inducing jumps that revive her beliefs, whereas the progression of time without witnessing a stopping event offers the decision-maker “passive” information that is distilled just from the costless observation of the process’ survival. Both sources of information manifest themselves in terms of the likelihood ratio, and the survival probability in the expression of above.
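The interplay between active and passive information can be illustrated with a short sketch based on the generic form of Bayes’ rule: the prior odds are scaled by the likelihood ratio of the observed samples and by the probability of surviving under the adverse-event hypothesis (survival is certain under the alternative, since the stopping time is then infinite almost surely). This is not necessarily the exact expression of Theorem 1. The function names and the constant-hazard survival curve are assumptions made purely for illustration.

```python
import math


def posterior_belief(prior, likelihood_ratio, survival_prob):
    """Posterior probability of the adverse-event hypothesis.

    prior            : prior probability of the adverse-event hypothesis, in (0, 1)
    likelihood_ratio : likelihood ratio of the observed samples (1.0 if none)
    survival_prob    : probability of surviving so far under that hypothesis
    """
    weight = prior * likelihood_ratio * survival_prob
    # Under the no-event hypothesis the process survives with probability 1.
    return weight / (weight + (1.0 - prior))


def wait_and_watch_path(prior, hazard, times):
    """Belief path of a policy that never samples (illustrative only).

    Assumption: survival decays as exp(-hazard * t), i.e. a constant hazard.
    The likelihood ratio stays at 1 because no samples are observed.
    """
    return [posterior_belief(prior, 1.0, math.exp(-hazard * t)) for t in times]
```

A new sample changes only the likelihood ratio argument and therefore produces a jump, while the mere passage of time lowers the survival probability and pulls the belief downwards between jumps.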
In Figure 2, we plot the càdlàg belief paths for policies and where (i.e. policy observes a subset of the samples observed by ). We also plot the (predictable) belief path of a wait-and-watch policy that observes no samples. We can see that , which has more jumps of “active information”, approaches the truthful belief faster over time. Between each two jumps, the belief process exhibits a non-increasing predictable path until fed with a new piece of information. The wait-and-watch policy has its belief drifting away from the prior towards the wrong belief since it only distills information from the process survival, which favors the hypothesis . This discussion motivates the introduction of the following key quantities.
Information gain (surprise) : The amount of drift in the decision-maker’s belief at time with respect to her belief at time , given the information available up to time , i.e. .
Posterior survival function (suspense) : The probability that a process generated with survives up to time given the information observed up to time , i.e. . The function is a non-increasing function in , i.e. .
That is, the information gain is the amount of “surprise” that the decision-maker experiences in response to a new information sample, expressed in terms of the change in her belief, i.e. the jumps in , whereas the survival probability (suspense) is her assessment of the risk of the adverse event taking place in the next time interval. As we will see in the next subsection, the optimal policy balances the two quantities when scheduling the times to sense .
We conclude our analysis for the process by noting that the lack of information samples creates bias towards the belief that (e.g. see the belief path of the wait-and-watch policy in Figure 2). We formally express this behavior in the following Corollary.
Corollary 1 (Leaning towards denial). For every policy , the posterior belief process is a supermartingale with respect to , where
Proof: Recall that from Theorem 1, we know that the posterior belief process can be written as
Hence, the expected posterior belief at time given the information in the filtration can be written as
(6) 
and hence can be written as
which is equivalent to
(7) 
Furthermore, the term in the expression above can be expressed as
(8)  
Therefore, can be written as
(9) 
Now it remains to evaluate the term in order to find . We first note that
We start evaluating the above by first looking at the term . Using Bayes’ rule, we have that
(10) 
where can be expanded using successive applications of Bayes’ rule as
which is equivalent to
(11) 
Similarly, it is easy to see that
(12) 
where again, we have used the fact that . Now we reformulate (10) using Bayes’ rule to arrive at the following
(13) 
then using (11) and (12), (13) can be further reduced to
(14) 
Finally, we use the expression in (14) to evaluate the term as follows
which, using (14), can be written as
Since
then the integral above reduces to