
# Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition

We develop a Bayesian model for decision-making under time pressure with endogenous information acquisition. In our model, the decision maker decides when to observe (costly) information by sampling an underlying continuous-time stochastic process (time series) that conveys information about the potential occurrence or non-occurrence of an adverse event which will terminate the decision-making process. In her attempt to predict the occurrence of the adverse event, the decision-maker follows a policy that determines when to acquire information from the time series (continuation), and when to stop acquiring information and make a final prediction (stopping). We show that the optimal policy has a rendezvous structure, i.e. a structure in which whenever a new information sample is gathered from the time series, the optimal "date" for acquiring the next sample becomes computable. The optimal interval between two information samples balances a trade-off between the decision maker's surprise, i.e. the drift in her posterior belief after observing new information, and suspense, i.e. the probability that the adverse event occurs in the time interval between two information samples. Moreover, we characterize the continuation and stopping regions in the decision-maker's state-space, and show that they depend not only on the decision-maker's beliefs, but also on the context, i.e. the current realization of the time series.


## 1 Introduction

The problem of timely risk assessment and decision-making based on a sequentially observed time series is ubiquitous, with applications in finance, medicine, cognitive science and signal processing [1-7]. A common setting that arises in all these domains is that a decision-maker, provided with sequential observations of a time series, needs to decide whether or not an adverse event (e.g. financial crisis, clinical acuity for ward patients, etc.) will take place in the future. The decision-maker’s recognition of a forthcoming adverse event needs to be timely, since a delayed decision may hinder effective intervention (e.g. delayed admission of clinically acute patients to intensive care units can lead to mortality [5]). In the context of cognitive science, this decision-making task is known as the two-alternative forced choice (2AFC) task [15]. Insightful structural solutions for the optimal Bayesian 2AFC decision-making policies have been derived in [9-16], most of which are inspired by the classical work of Wald on sequential probability ratio tests (SPRT) [8].

In this paper, we present a Bayesian decision-making model in which a decision-maker adaptively decides when to gather (costly) information from an underlying time series in order to accumulate evidence on the occurrence/non-occurrence of an adverse event. The decision-maker operates under time pressure: occurrence of the adverse event terminates the decision-making process. Our abstract model is motivated and inspired by many practical decision-making tasks such as: constructing temporal patterns for gathering sensory information in perceptual decision-making [1], scheduling lab tests for ward patients in order to predict clinical deterioration in a timely manner [3, 5], designing breast cancer screening programs for early tumor detection [7], etc.

We characterize the structure of the optimal decision-making policy that prescribes when the decision-maker should acquire new information, and when she should stop acquiring information and issue a final prediction. We show that the decision-maker’s posterior belief process, based on which policies are prescribed, is a supermartingale that reflects the decision-maker’s tendency to deny the occurrence of an adverse event in the future as she observes the survival of the time series for longer time periods. Moreover, the information acquisition policy has a “rendezvous” structure: the optimal “date” for acquiring the next information sample can be computed given the current sample. The optimal schedule for gathering information over time balances the information gain (surprise) obtained from acquiring new samples, and the probability of survival for the underlying stochastic process (suspense). Finally, we characterize the continuation and stopping regions in the decision-maker’s state-space and show that, unlike previous models, they depend on the time series “context” and not just the decision-maker’s beliefs.

Related Works  Mathematical models and analyses for perceptual decision-making based on sequential hypothesis testing have been developed in [9-17]. Most of these models use tools from sequential analysis developed by Wald [8] and Shiryaev [21, 22]. In [9,13,14], optimal decision-making policies for the 2AFC task were computed by modelling the decision-maker’s sensory evidence using diffusion processes [20]. These models assume an infinite time horizon for the decision-making policy, and an exogenous supply of sensory information.

The assumption of an infinite time horizon was relaxed in [10] and [15], where decision-making is assumed to be performed under the pressure of a stochastic deadline; however, these deadlines were considered to be drawn from known distributions that are independent of the hypothesis and the realized sensory evidence, and the assumption of an exogenous information supply was maintained. In practical settings, the deadlines would naturally be dependent on the realized sensory information (e.g. patients’ acuity events are correlated with their physiological information [5]), which induces more complex dynamics in the decision-making process. Context-based decision-making models were introduced in [17], but assuming an exogenous information supply and an infinite time horizon.

The notions of “suspense” and “surprise” in Bayesian decision-making have also been recently introduced in the economics literature (see [18] and the references therein). These models use measures for Bayesian surprise, originally introduced in the context of sensory neuroscience [19], in order to model the explicit preference of a decision-maker for non-instrumental information. The goal there is to design information disclosure policies that are suspense-optimal or surprise-optimal. Unlike our model, such models impose suspense (and/or surprise) as a (behavioral) preference of the decision-maker; suspense and surprise do not emerge there endogenously by virtue of rational decision-making.

## 2 Timely Decision Making with Endogenous Information Acquisition

Time Series Model  The decision-maker has access to a time series $X(t)$ modeled as a continuous-time stochastic process that takes values in $\mathbb{R}$, and is defined over the time domain $t \in \mathbb{R}_+$, with an underlying filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \in \mathbb{R}_+}, \mathbb{P})$. The process $X(t)$ is naturally adapted to $\{\mathcal{F}_t\}_{t \in \mathbb{R}_+}$, and hence the filtration $\mathcal{F}_t$ abstracts the information conveyed in the time series realization up to time $t$. The decision-maker extracts information from $X(t)$ to guide her actions over time.

We assume that $X(t)$ is a stationary Markov process (most of the insights distilled from our results would hold for more general dependency structures; we keep this assumption to simplify the exposition and maintain the tractability and interpretability of the results), with a stationary transition kernel $\mathbb{P}_\theta$, where $\theta$ is a realization of a latent Bernoulli random variable $\Theta \in \{0, 1\}$ (unobservable by the decision-maker), with $\mathbb{P}(\Theta = 1) = p$. The distributional properties of the paths of $X(t)$ are determined by $\theta$, since the realization of $\theta$ decides which Markov kernel ($\mathbb{P}_o$ or $\mathbb{P}_1$) generates $X(t)$. If the realization $\theta$ is equal to $1$, then an adverse event occurs almost surely at a (finite) random time $\tau$, the distribution of which is dependent on the realization of the path $(X(t))_{0 \le t \le \tau}$.

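The decision-maker’s ultimate goal is to sequentially observe $X(t)$, and infer $\theta$ before the adverse event happens; inference is obsolete if it is declared after $\tau$. Since $\Theta$ is latent, the decision-maker is unaware whether the adverse event will occur or not, i.e. whether her access to $X(t)$ is temporary ($\tau < \infty$ for $\Theta = 1$) or permanent ($\tau = \infty$ for $\Theta = 0$). In order to model the occurrence of the adverse event, we define $\tau$ as an $\mathcal{F}$-stopping time for the process $X(t)$, for which we assume the following:

• The stopping time $\tau \mid \Theta = 1$ is finite almost surely, whereas $\tau \mid \Theta = 0$ is infinite almost surely, i.e. $\mathbb{P}(\tau < \infty \mid \Theta = 1) = 1$, and $\mathbb{P}(\tau = \infty \mid \Theta = 0) = 1$.

• The stopping time $\tau \mid \Theta = 1$ is accessible (our analyses also hold if the stopping time is totally inaccessible), with a Markovian dependency on history, i.e. $\mathbb{P}(\tau < t \mid \mathcal{F}_t) = \mathbb{P}(\tau < t \mid X(t))$, where $\mathbb{P}(\tau < t \mid X(t))$ is an injective map from $\mathbb{R}$ to $[0, 1]$ and is non-decreasing in $X(t)$.

Thus, unlike the stochastic deadline models in [10] and [15], the decision deadline in our model (i.e. occurrence of the adverse event) is context-dependent as it depends on the time series realization (i.e. $\tau$ is not independent of $X(t)$ as in [15]). We use the notation $X^\tau(t) = X(t \wedge \tau)$, where $t \wedge \tau = \min\{t, \tau\}$, to denote the stopped process to which the decision-maker has access. Throughout the paper, the measures $\mathbb{P}_o$ and $\mathbb{P}_1$ assign probability measures to the paths $X^\tau(t) \mid \Theta = 0$ and $X^\tau(t) \mid \Theta = 1$ respectively, and we assume that $\mathbb{P}_o \ll \mathbb{P}_1$ (the absolute continuity of $\mathbb{P}_o$ with respect to $\mathbb{P}_1$ means that no sample path of $X^\tau(t) \mid \Theta = 0$ should be fully revealing of the realization of $\Theta$).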

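Information  The decision-maker can only observe a set of (costly) samples of $X^\tau(t)$ rather than the full continuous path. The samples observed by the decision-maker are captured by partitioning $X^\tau(t)$ over specific time intervals: we define $\mathcal{P}_t = \{t_o, t_1, \ldots, t_{N(\mathcal{P}_t)-1}\}$, with $0 \le t_o < t_1 < \cdots < t_{N(\mathcal{P}_t)-1} \le t$, as a size-$N(\mathcal{P}_t)$ partition of $X^\tau(t)$ over the interval $[0, t]$, where $N(\mathcal{P}_t)$ is the total number of samples in the partition $\mathcal{P}_t$. The decision-maker observes the values that $X^\tau(t)$ takes at the time instances in $\mathcal{P}_t$; thus the sequence of observations is given by the process $X(\mathcal{P}_t) = \sum_{i=0}^{N(\mathcal{P}_t)-1} X(t_i)\, \delta_{t_i}$, where $\delta_{t_i}$ is the Dirac measure. The space of all partitions over the interval $[0, t]$ is denoted by $\mathbb{P}_t$. We denote the probability measures for partitioned paths generated under $\Theta = 0$ and $\Theta = 1$ with a partition $\mathcal{P}_t$ as $\tilde{\mathbb{P}}_o(\mathcal{P}_t)$ and $\tilde{\mathbb{P}}_1(\mathcal{P}_t)$ respectively.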

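Since the decision-maker observes $X^\tau(t)$ through the partition $\mathcal{P}_t$, her information at time $t$ is conveyed in the $\sigma$-algebra $\sigma(X^\tau(\mathcal{P}_t))$. The stopping event is observable by the decision-maker even if $\tau \notin \mathcal{P}_\tau$. We denote the $\sigma$-algebra generated by the stopping event as $\mathcal{S}_t = \sigma(\mathbf{1}_{\{t \ge \tau\}})$. Thus, the information that the decision-maker has at time $t$ is expressed by the filtration $\tilde{\mathcal{F}}_t = \sigma(X^\tau(\mathcal{P}_t)) \vee \mathcal{S}_t$, and it follows that any decision-making policy needs to be $\tilde{\mathcal{F}}_t$-measurable.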

Figure 1 depicts a Brownian path (a sample path of a Wiener process, which satisfies all the assumptions of our model; in Figure 1, the stopping event was simulated as a totally inaccessible first jump of a Poisson process), with an exemplary partition $\mathcal{P}_t$ over the time interval $[0, t]$. The decision-maker observes the samples in $X(\mathcal{P}_t)$ sequentially, and reasons about the realization of the latent variable $\Theta$ based on these samples and the process survival, i.e. at time $t$, the decision-maker’s information resides in the $\sigma$-algebra $\sigma(X^\tau(\mathcal{P}_t))$ generated by the samples in $\mathcal{P}_t$, and the $\sigma$-algebra $\mathcal{S}_t$ generated by the process’ survival.
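The sampling setup described above can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation behind Figure 1: the function name `simulate_sampled_path`, the drift values, and the constant hazard rate are assumptions introduced here, and the stopping time is drawn independently of the path (first jump of a Poisson process), which is simpler than the context-dependent deadline of the model.

```python
import math
import random

def simulate_sampled_path(partition, p=0.5, drift=(0.0, 0.5), hazard=0.3, rng=None):
    """Sample a Brownian path at the times in `partition`, stopped at tau.

    Assumptions of this sketch (not taken from the paper): the drift under
    each hypothesis, and an exponential stopping time with constant rate
    `hazard` when theta = 1.
    """
    rng = rng or random.Random()
    # Latent Bernoulli variable: theta = 1 with probability p.
    theta = 1 if rng.random() < p else 0
    # tau is finite only when theta = 1 (first jump of a Poisson process).
    tau = rng.expovariate(hazard) if theta == 1 else math.inf
    samples = []
    x, t_prev = 0.0, 0.0
    for t in sorted(partition):
        if t >= tau:
            break  # the process has stopped; later samples are unavailable
        dt = t - t_prev
        # Brownian increment with theta-dependent drift: Normal(drift * dt, dt).
        x += drift[theta] * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        samples.append((t, x))
        t_prev = t
    return theta, tau, samples
```

Calling `simulate_sampled_path([0.5, 1.0, 2.0, 4.0])` returns the latent realization, the stopping time, and only those samples that fall strictly before the stopping time, which mirrors the fact that the decision-maker's access to the series ends when the adverse event occurs.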

Policies and Risks  The decision-maker’s goal is to come up with a (timely) decision $\hat{\theta} \in \{0, 1\}$ that reflects her prediction for whether the actual realization $\theta$ is $0$ or $1$, before the process $X^\tau(t)$ potentially stops at the unknown time $\tau$. The decision-maker follows a policy: a (continuous-time) mapping from the observations gathered up to every time instance $t$ to two types of actions:

• A sensing action $\delta_t \in \{0, 1\}$: if $\delta_t = 1$, then the decision-maker decides to observe a new sample from the running process $X^\tau(t)$ at time $t$.

• A continuation/stopping action $\hat{\theta}_t \in \{\emptyset, 0, 1\}$: if $\hat{\theta}_t \in \{0, 1\}$, then the decision-maker decides to stop gathering samples from $X^\tau(t)$, and declares a final decision (estimate) for $\theta$. Whenever $\hat{\theta}_t = \emptyset$, the decision-maker continues observing $X^\tau(t)$ and postpones her declaration for the estimate of $\theta$.

A policy $\pi = (\pi_t)_{t \in \mathbb{R}_+}$ is a ($\tilde{\mathcal{F}}_t$-measurable) mapping rule that maps the information in $\tilde{\mathcal{F}}_t$ to an action tuple $\pi_t = (\delta_t, \hat{\theta}_t)$ at every time instance $t$. We assume that every single observation that the decision-maker draws from $X^\tau(t)$ entails a fixed cost, hence the process $(\delta_t)_{t \in \mathbb{R}_+}$ has to be a point process under any optimal policy (the cost of observing any local continuous path is infinite, hence any optimal policy must have $(\delta_t)_{t \in \mathbb{R}_+}$ being a point process to keep the number of observed samples finite). We denote the space of all such policies by $\Pi$.

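A policy $\pi$ generates the following random quantities as a function of the paths $X^\tau(t)$ on the probability space $(\Omega, \mathcal{F}, \{\tilde{\mathcal{F}}_t\}_{t \in \mathbb{R}_+}, \mathbb{P})$:

1- A stopping time $T_\pi$: The first time at which the decision-maker declares her estimate for $\theta$, i.e. $T_\pi = \inf\{t \in \mathbb{R}_+ : \hat{\theta}_t \in \{0, 1\}\}$.

2- A decision (estimate of $\theta$) $\hat{\theta}_\pi$: Given by $\hat{\theta}_\pi = \hat{\theta}_{T_\pi \wedge \tau}$.

3- A random partition $\mathcal{P}^\pi_{T_\pi}$: A realization of the point process $(\delta_t)_{t \in \mathbb{R}_+}$, comprising a finite set of strictly increasing $\mathcal{F}$-stopping times at which the decision-maker decides to sample the path $X^\tau(t)$.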

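A loss function is associated with every realization of the policy $\pi$, representing the overall cost incurred when following that policy for a specific path $X^\tau(t)$. The loss function is given by

$$\ell(\pi; \Theta) \triangleq \left(C_1 \mathbf{1}_{\{\hat{\theta}_\pi = 0,\, \theta = 1\}} + C_o \mathbf{1}_{\{\hat{\theta}_\pi = 1,\, \theta = 0\}} + C_d\, T_\pi\right) \mathbf{1}_{\{T_\pi \le \tau\}} + C_r \mathbf{1}_{\{T_\pi > \tau\}} + C_s\, N(\mathcal{P}^\pi_{T_\pi \wedge \tau}), \tag{1}$$

where $C_1$ is the cost of type I error (failure to anticipate the adverse event), $C_o$ is the cost of type II error (falsely predicting that an adverse event will occur), $C_d$ is the cost of the delay in declaring the estimate $\hat{\theta}_\pi$, $C_r$ is the cost incurred when the adverse event occurs before an estimate is declared (cost of missing the deadline), and $C_s$ is the cost of every observation sample (cost of information). The risk of each policy is defined as its expected loss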

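$$R(\pi) \triangleq \mathbb{E}\left[\ell(\pi; \Theta)\right], \tag{2}$$

where the expectation is taken over the paths of $X^\tau(t)$. In the next section, we characterize the structure of the optimal policy $\pi^* = \arg\inf_{\pi \in \Pi} R(\pi)$.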
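A minimal sketch of how the five cost terms described above could be combined for one realization, and how the risk in (2) could be approximated by an empirical average. The additive form used here, the names `policy_loss` and `empirical_risk`, and all numeric cost values are assumptions of this sketch and are chosen purely for illustration.

```python
import math

def policy_loss(theta, theta_hat, T, tau, n_samples,
                C1=10.0, Co=5.0, Cd=0.1, Cr=20.0, Cs=0.5):
    """Loss of one policy realization; the cost values are placeholders.

    theta     : true latent realization (0 or 1)
    theta_hat : declared estimate (ignored if the deadline is missed)
    T, tau    : declaration time and adverse-event time (math.inf if none)
    n_samples : number of samples observed up to min(T, tau)
    """
    if T <= tau:
        # Estimate declared in time: pay for delay and for any error.
        loss = Cd * T
        if theta == 1 and theta_hat == 0:
            loss += C1  # type I error: adverse event not anticipated
        elif theta == 0 and theta_hat == 1:
            loss += Co  # type II error: false alarm
    else:
        # The adverse event occurred before an estimate was declared.
        loss = Cr
    return loss + Cs * n_samples  # every sample has a fixed cost

def empirical_risk(realizations, **costs):
    """Average loss over (theta, theta_hat, T, tau, n_samples) tuples."""
    losses = [policy_loss(*r, **costs) for r in realizations]
    return sum(losses) / len(losses)
```

For instance, `policy_loss(1, 0, 2.0, 5.0, 3)` charges the delay up to time 2, a type I error and three samples, whereas `policy_loss(1, None, 6.0, 5.0, 2)` charges only the missed-deadline cost and two samples.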

## 3 Structure of the Optimal Policy

Since the decision-maker’s posterior belief at time $t$, defined as $\mu_t = \mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_t)$, is an important statistic for designing sequential policies [10, 21-22], we start our characterization of $\pi^*$ by investigating the belief process $(\mu_t)_{t \in \mathbb{R}_+}$.

### 3.1 The Posterior Belief Process

Recall that the decision-maker distills information from two types of observations: 1) the realization of the partitioned time series $X^\tau(\mathcal{P}_t)$ (i.e. the information in $\sigma(X^\tau(\mathcal{P}_t))$), and 2) the survival of the process up to time $t$ (i.e. the information in $\mathcal{S}_t$). In the following Theorem, we study the evolution of the decision-maker’s beliefs as she integrates these pieces of information over time. (All proofs are provided in the supplementary material.)

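Theorem 1 (Information and beliefs).   Every posterior belief trajectory $(\mu_t)_{t \in \mathbb{R}_+}$ associated with a policy $\pi \in \Pi$ that creates a partition $\mathcal{P}^\pi_t$ of $X^\tau(t)$ is a càdlàg path given by

$$\mu_t = \begin{cases} 1, & \text{for } t \ge \tau, \\[4pt] \left(1 + \dfrac{1-p}{p}\, \dfrac{d\tilde{\mathbb{P}}_o(\mathcal{P}^\pi_t)}{d\tilde{\mathbb{P}}_1(\mathcal{P}^\pi_t)}\right)^{-1}, & \text{for } 0 \le t < \tau, \end{cases}$$

where $\frac{d\tilde{\mathbb{P}}_o(\mathcal{P}^\pi_t)}{d\tilde{\mathbb{P}}_1(\mathcal{P}^\pi_t)}$ is the Radon-Nikodym derivative of the measure $\tilde{\mathbb{P}}_o(\mathcal{P}^\pi_t)$ with respect to $\tilde{\mathbb{P}}_1(\mathcal{P}^\pi_t)$ (since we impose the condition $\mathbb{P}_o \ll \mathbb{P}_1$ and fix a partition $\mathcal{P}^\pi_t$, the Radon-Nikodym derivative exists), and its reciprocal is given by the following elementary predictable process

$$\left(\frac{d\tilde{\mathbb{P}}_o(\mathcal{P}^\pi_t)}{d\tilde{\mathbb{P}}_1(\mathcal{P}^\pi_t)}\right)^{-1} = \sum_{k=1}^{N(\mathcal{P}^\pi_t)-1} \underbrace{\frac{\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 1)}{\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 0)}}_{\text{Likelihood ratio}}\; \underbrace{\mathbb{P}(\tau > t \mid \sigma(X(\mathcal{P}^\pi_t)), \Theta = 1)}_{\text{Survival probability}}\; \mathbf{1}_{\{\mathcal{P}^\pi_t(k) \le t \le \mathcal{P}^\pi_t(k+1)\}},$$

for $t \ge \mathcal{P}^\pi_t(1)$, and $p\, \mathbb{P}(\tau > t \mid \Theta = 1)$ for $t < \mathcal{P}^\pi_t(1)$. Moreover, the path $(\mu_t)_{t \in \mathbb{R}_+}$ jumps exactly at the time indexes in $\mathcal{P}^\pi_t \cup \{\tau\}$.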

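Proof: The posterior belief process $(\mu_t)_{t \in \mathbb{R}_+}$ is given by

$$\begin{aligned} \mu_t &= \mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_t) \\ &\overset{(a)}{=} \mathbb{P}(\Theta = 1 \mid \sigma(X(\mathcal{P}^\pi_t)), \mathcal{S}_t) \\ &= \mathbf{1}_{\{t \ge \tau\}} \cdot \mathbb{P}(\Theta = 1 \mid \sigma(X(\mathcal{P}^\pi_t)), t \ge \tau) + \mathbf{1}_{\{t < \tau\}} \cdot \mathbb{P}(\Theta = 1 \mid \sigma(X(\mathcal{P}^\pi_t)), t < \tau) \\ &\overset{(b)}{=} \mathbf{1}_{\{t \ge \tau\}} + \mathbf{1}_{\{t < \tau\}} \cdot \mathbb{P}(\Theta = 1 \mid \sigma(X(\mathcal{P}^\pi_t)), t < \tau), \end{aligned} \tag{3}$$

where we have used the fact that $\tilde{\mathcal{F}}_t = \sigma(X(\mathcal{P}^\pi_t)) \vee \mathcal{S}_t$ in (a), and the fact that the event $\{t \ge \tau\}$ is $\tilde{\mathcal{F}}_t$-measurable in (b), and hence $\mathbb{P}(\Theta = 1 \mid \sigma(X(\mathcal{P}^\pi_t)), t \ge \tau) = 1$. Therefore, we can write the posterior belief process in the following form

$$\mu_t = \begin{cases} 1, & \text{for } t \ge \tau, \\ \mathbb{P}(\Theta = 1 \mid \sigma(X(\mathcal{P}^\pi_t)), t < \tau), & \text{for } 0 \le t < \tau. \end{cases}$$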

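Now we focus on computing $\mathbb{P}(\Theta = 1 \mid \sigma(X(\mathcal{P}^\pi_t)), t < \tau)$. Note that using Bayes’ rule, we have that

$$\begin{aligned} \mathbb{P}(\Theta = 1 \mid \sigma(X(\mathcal{P}^\pi_t)), t < \tau) &= \frac{\mathbb{P}(\Theta = 1, \sigma(X(\mathcal{P}^\pi_t)), t < \tau)}{\mathbb{P}(\sigma(X(\mathcal{P}^\pi_t)), t < \tau)} \\ &= \frac{\mathbb{P}(\Theta = 1, \sigma(X(\mathcal{P}^\pi_t)), t < \tau)}{\sum_{\theta \in \{0,1\}} \mathbb{P}(\Theta = \theta, \sigma(X(\mathcal{P}^\pi_t)), t < \tau)} \\ &= \frac{d\mathbb{P}(\sigma(X(\mathcal{P}^\pi_t)), t < \tau \mid \Theta = 1)\, \mathbb{P}(\Theta = 1)}{\sum_{\theta \in \{0,1\}} d\mathbb{P}(\sigma(X(\mathcal{P}^\pi_t)), t < \tau \mid \Theta = \theta)\, \mathbb{P}(\Theta = \theta)} \\ &= \frac{p\, d\mathbb{P}(\sigma(X(\mathcal{P}^\pi_t)), t < \tau \mid \Theta = 1)}{(1-p)\, d\mathbb{P}(\sigma(X(\mathcal{P}^\pi_t)), t < \tau \mid \Theta = 0) + p\, d\mathbb{P}(\sigma(X(\mathcal{P}^\pi_t)), t < \tau \mid \Theta = 1)} \\ &= \left(1 + \frac{1-p}{p} \cdot \frac{d\mathbb{P}(\sigma(X(\mathcal{P}^\pi_t)), t < \tau \mid \Theta = 0)}{d\mathbb{P}(\sigma(X(\mathcal{P}^\pi_t)), t < \tau \mid \Theta = 1)}\right)^{-1} \\ &= \left(1 + \frac{1-p}{p} \cdot \frac{d\tilde{\mathbb{P}}_o(\mathcal{P}^\pi_t)}{d\tilde{\mathbb{P}}_1(\mathcal{P}^\pi_t)}\right)^{-1}, \end{aligned} \tag{4}$$

where the existence of the Radon-Nikodym derivative follows from the fact that $\mathbb{P}_o \ll \mathbb{P}_1$. Hence, we have that

$$\mu_t = \begin{cases} 1, & \text{for } t \ge \tau, \\[4pt] \left(1 + \dfrac{1-p}{p} \cdot \dfrac{d\tilde{\mathbb{P}}_o(\mathcal{P}^\pi_t)}{d\tilde{\mathbb{P}}_1(\mathcal{P}^\pi_t)}\right)^{-1}, & \text{for } 0 \le t < \tau. \end{cases}$$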

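Now we focus on evaluating the reciprocal of $\frac{d\tilde{\mathbb{P}}_o(\mathcal{P}^\pi_t)}{d\tilde{\mathbb{P}}_1(\mathcal{P}^\pi_t)}$. Using a further application of Bayes’ rule we have that

$$\begin{aligned} \left(\frac{d\tilde{\mathbb{P}}_o(\mathcal{P}^\pi_t)}{d\tilde{\mathbb{P}}_1(\mathcal{P}^\pi_t)}\right)^{-1} &= \frac{d\mathbb{P}(\sigma(X(\mathcal{P}^\pi_t)), t < \tau \mid \Theta = 1)}{d\mathbb{P}(\sigma(X(\mathcal{P}^\pi_t)), t < \tau \mid \Theta = 0)} \\ &= \frac{\mathbb{P}(t < \tau \mid X(\mathcal{P}^\pi_t), \Theta = 1) \cdot d\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 1)}{\mathbb{P}(t < \tau \mid X(\mathcal{P}^\pi_t), \Theta = 0) \cdot d\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 0)} \\ &= \frac{d\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 1)}{d\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 0)} \cdot \mathbb{P}(t < \tau \mid X(\mathcal{P}^\pi_t), \Theta = 1), \end{aligned} \tag{5}$$

where we have used the fact that $\mathbb{P}(t < \tau \mid X(\mathcal{P}^\pi_t), \Theta = 0) = 1$. For any partition $\mathcal{P}^\pi_t$, the likelihood ratio is an elementary predictable process that takes an initial value that is equal to the prior $p$ (when no samples are initially observed), and then takes constant values of $\frac{\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 1)}{\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 0)}$ in the interval between any two samples in the partition (the likelihood is updated only when a new sample is observed). Hence, we have that

$$\frac{d\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 1)}{d\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 0)} = p\, \mathbf{1}_{\{t = 0\}} + \sum_{k=1}^{N(\mathcal{P}^\pi_t)-1} \frac{\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 1)}{\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 0)}\, \mathbf{1}_{\{\mathcal{P}^\pi_t(k-1) \le t \le \mathcal{P}^\pi_t(k)\}}.$$

The process is predictable since the likelihood remains constant as long as no new samples are observed. Modulated by the survival probability, the reciprocal of $\frac{d\tilde{\mathbb{P}}_o(\mathcal{P}^\pi_t)}{d\tilde{\mathbb{P}}_1(\mathcal{P}^\pi_t)}$ can be written as

$$p\, \mathbb{P}(\tau > t \mid \Theta = 1)\, \mathbf{1}_{\{t < \mathcal{P}^\pi_t(1)\}} + \sum_{k=1}^{N(\mathcal{P}^\pi_t)-1} \frac{\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 1)}{\mathbb{P}(X(\mathcal{P}^\pi_t) \mid \Theta = 0)}\, \mathbb{P}(\tau > t \mid \sigma(X(\mathcal{P}^\pi_t)), \Theta = 1)\, \mathbf{1}_{\{\mathcal{P}^\pi_t(k) \le t \le \mathcal{P}^\pi_t(k+1)\}}.$$

Under the usual regularity conditions on $\mathbb{P}(\tau > t \mid \sigma(X(\mathcal{P}^\pi_t)), \Theta = 1)$, it is easy to see that $\mu_t$ will have jumps only at the time instances in the partition $\mathcal{P}^\pi_t$ and at the stopping time $\tau$, i.e. only at the time indexes in $\mathcal{P}^\pi_t \cup \{\tau\}$.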

Theorem 1 says that every belief path is right-continuous with left limits, and has jumps at the time indexes in the partition $\mathcal{P}^\pi_t$, whereas between any two consecutive jumps, the paths are predictable (i.e. they are known ahead of time once we know the magnitudes of the jumps preceding them). This means that the decision-maker obtains “active” information by probing the time series to observe new samples (i.e. the information in $\sigma(X^\tau(\mathcal{P}^\pi_t))$), inducing jumps that revive her beliefs, whereas the progression of time without witnessing a stopping event offers the decision-maker “passive” information that is distilled just from the costless observation of the process’ survival. The two sources of information manifest themselves as the likelihood ratio and the survival probability, respectively, in the expression of $\mu_t$ above.
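The way the likelihood ratio and the survival probability enter the belief in Theorem 1 can be illustrated with a short Python sketch. It assumes a constant hazard rate for the adverse event (so the survival probability under $\Theta = 1$ is exponential) and takes the likelihood ratio to be 1 while no sample has been observed; both choices, as well as the function names, are assumptions of this sketch rather than part of the model.

```python
import math

def posterior_belief(p, likelihood_ratio, survival_prob):
    """Belief that Theta = 1 before the stopping time.

    likelihood_ratio * survival_prob plays the role of the reciprocal of
    the Radon-Nikodym derivative that appears in Theorem 1.
    """
    odds_1 = likelihood_ratio * survival_prob
    if odds_1 == 0.0:
        return 0.0  # the evidence rules out Theta = 1
    return 1.0 / (1.0 + (1.0 - p) / p / odds_1)

def exponential_survival(t, hazard=0.3):
    """Assumed survival probability P(tau > t | Theta = 1)."""
    return math.exp(-hazard * t)

# Wait-and-watch policy: no samples, so only "passive" information arrives.
wait_and_watch = [
    posterior_belief(0.5, 1.0, exponential_survival(t))
    for t in (0.0, 1.0, 2.0, 4.0)
]
```

With these assumptions the list `wait_and_watch` starts at the prior and decreases with time, which is the drift towards $\Theta = 0$ that pure survival induces; a sample whose likelihood ratio exceeds 1 moves the belief back up.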

In Figure 2, we plot the càdlàg belief paths for policies $\pi_1$ and $\pi_2$, where $\mathcal{P}^{\pi_1} \subset \mathcal{P}^{\pi_2}$ (i.e. policy $\pi_1$ observes a subset of the samples observed by $\pi_2$). We also plot the (predictable) belief path of a wait-and-watch policy that observes no samples. We can see that $\pi_2$, which has more jumps of “active information”, approaches the truthful belief faster over time. Between any two consecutive jumps, the belief process exhibits a non-increasing predictable path until fed with a new piece of information. The wait-and-watch policy has its belief drifting away from the prior $p$ towards the wrong belief, since it only distills information from the process survival, which favors the hypothesis $\Theta = 0$. This discussion motivates the introduction of the following key quantities.

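Information gain (surprise) $I_t(\Delta t)$: The amount of drift in the decision-maker’s belief at time $t + \Delta t$ with respect to her belief at time $t$, given the information available up to time $t$, i.e. $I_t(\Delta t) = (\mu_{t+\Delta t} - \mu_t \mid \tilde{\mathcal{F}}_t)$.

Posterior survival function (suspense) $S_t(\Delta t)$: The probability that a process generated with $\Theta = 1$ survives up to time $t + \Delta t$ given the information observed up to time $t$, i.e. $S_t(\Delta t) = \mathbb{P}(\tau > t + \Delta t \mid \tilde{\mathcal{F}}_t, \Theta = 1)$. The function $S_t(\Delta t)$ is non-increasing in $\Delta t$, i.e. $\frac{\partial S_t(\Delta t)}{\partial \Delta t} \le 0$.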

That is, the information gain is the amount of “surprise” that the decision-maker experiences in response to a new information sample, expressed in terms of the change in her belief, i.e. the jumps in $\mu_t$, whereas the survival probability (suspense) is her assessment of the risk of the adverse event taking place in the next time interval. As we will see in the next subsection, the optimal policy balances the two quantities when scheduling the times to sense $X^\tau(t)$.
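To make the two quantities concrete, the following Python sketch computes a suspense value and a realized surprise for a single waiting interval. It assumes a constant hazard rate under $\Theta = 1$ and Gaussian increments whose drift depends on $\Theta$; these modelling choices, the parameter values, and the helper names are assumptions of this sketch, and the belief passed in must lie strictly between 0 and 1.

```python
import math

def suspense(dt, hazard=0.3):
    """Assumed S_t(dt): chance that a Theta = 1 process survives a further dt."""
    return math.exp(-hazard * dt)

def passive_update(mu, s):
    """Belief after observing survival only (no new sample)."""
    return mu * s / (mu * s + (1.0 - mu))

def active_update(mu, increment, dt, drift=(0.0, 0.5)):
    """Belief after observing an increment ~ Normal(drift[theta] * dt, dt)."""
    def log_density(mean):
        # Gaussian log-density up to a constant shared by both hypotheses.
        return -((increment - mean) ** 2) / (2.0 * dt)
    log_lr = log_density(drift[1] * dt) - log_density(drift[0] * dt)
    odds = mu / (1.0 - mu) * math.exp(log_lr)
    return odds / (1.0 + odds)

def surprise(mu_t, increment, dt, hazard=0.3):
    """Realized change in belief after waiting dt and then sampling."""
    mu_survived = passive_update(mu_t, suspense(dt, hazard))
    return active_update(mu_survived, increment, dt) - mu_t
```

Waiting longer lowers `suspense(dt)`, i.e. it raises the chance that the adverse event pre-empts the next sample, while the sample obtained after a longer wait can move the belief further. This is the trade-off that the schedule of sampling times has to balance.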

We conclude our analysis of the process $\mu_t$ by noting that the lack of information samples creates bias towards the belief that $\Theta = 0$ (e.g. see the belief path of the wait-and-watch policy in Figure 2). We formally express this behavior in the following Corollary.

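Corollary 1 (Leaning towards denial).   For every policy $\pi \in \Pi$, the posterior belief process $(\mu_t)_{t \in \mathbb{R}_+}$ is a supermartingale with respect to $\tilde{\mathcal{F}}_t$, where

$$\mathbb{E}\left[\mu_{t+\Delta t} \mid \tilde{\mathcal{F}}_t\right] = \mu_t - \mu_t^2\, S_t(\Delta t)\left(1 - S_t(\Delta t)\right) \le \mu_t, \quad \forall \Delta t \in \mathbb{R}_+.$$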

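Proof: Recall that from Theorem 1, we know that the posterior belief process can be written as

$$\mu_t = \mathbf{1}_{\{t \ge \tau\}} + \mathbf{1}_{\{t < \tau\}}\, \mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_t).$$

Hence, the expected posterior belief at time $t + \Delta t$ given the information in the filtration $\tilde{\mathcal{F}}_t$ can be written as

$$\begin{aligned} \mathbb{E}\left[\mu_{t+\Delta t} \mid \tilde{\mathcal{F}}_t\right] &= \mathbb{E}\left[\mathbf{1}_{\{t+\Delta t \ge \tau\}} + \mathbf{1}_{\{t+\Delta t < \tau\}}\, \mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_{t+\Delta t}) \,\middle|\, \tilde{\mathcal{F}}_t\right] \\ &= \mathbb{P}(\Theta = 1, t + \Delta t \ge \tau \mid \tilde{\mathcal{F}}_t) + \mathbb{P}(t + \Delta t < \tau \mid \tilde{\mathcal{F}}_t) \cdot \mathbb{E}\left[\mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_{t+\Delta t}) \,\middle|\, \tilde{\mathcal{F}}_t \vee \{t + \Delta t < \tau\}\right], \end{aligned} \tag{6}$$

and hence $\mathbb{E}[\mu_{t+\Delta t} \mid \tilde{\mathcal{F}}_t]$ can be written as

$$\mathbb{P}(t + \Delta t \ge \tau \mid \tilde{\mathcal{F}}_t, \Theta = 1) \cdot \mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_t) + \mathbb{P}(t + \Delta t < \tau \mid \tilde{\mathcal{F}}_t) \cdot \mathbb{E}\left[\mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_{t+\Delta t}) \,\middle|\, \tilde{\mathcal{F}}_t \vee \{t + \Delta t < \tau\}\right],$$

which is equivalent to

$$\mathbb{E}\left[\mu_{t+\Delta t} \mid \tilde{\mathcal{F}}_t\right] = (1 - S_t(\Delta t)) \cdot \mu_t + \mathbb{P}(t + \Delta t < \tau \mid \tilde{\mathcal{F}}_t) \cdot \mathbb{E}\left[\mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_{t+\Delta t}) \,\middle|\, \tilde{\mathcal{F}}_t \vee \{t + \Delta t < \tau\}\right]. \tag{7}$$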

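Furthermore, the term $\mathbb{P}(t + \Delta t < \tau \mid \tilde{\mathcal{F}}_t)$ in the expression above can be expressed as

$$\begin{aligned} \mathbb{P}(t + \Delta t < \tau \mid \tilde{\mathcal{F}}_t) &= \mathbb{P}(t + \Delta t < \tau \mid \tilde{\mathcal{F}}_t, \Theta = 1) \cdot \mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_t) + \mathbb{P}(t + \Delta t < \tau \mid \tilde{\mathcal{F}}_t, \Theta = 0) \cdot \mathbb{P}(\Theta = 0 \mid \tilde{\mathcal{F}}_t) \\ &= S_t(\Delta t) \cdot \mu_t + (1 - \mu_t). \end{aligned} \tag{8}$$

Therefore, $\mathbb{E}[\mu_{t+\Delta t} \mid \tilde{\mathcal{F}}_t]$ can be written as

$$\mathbb{E}\left[\mu_{t+\Delta t} \mid \tilde{\mathcal{F}}_t\right] = (1 - S_t(\Delta t)) \cdot \mu_t + \left(1 - \mu_t + S_t(\Delta t) \cdot \mu_t\right) \cdot \mathbb{E}\left[\mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_{t+\Delta t}) \,\middle|\, \tilde{\mathcal{F}}_t \vee \{t + \Delta t < \tau\}\right]. \tag{9}$$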

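Now it remains to evaluate the term $\mathbb{E}[\mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_{t+\Delta t}) \mid \tilde{\mathcal{F}}_t \vee \{t + \Delta t < \tau\}]$ in order to find $\mathbb{E}[\mu_{t+\Delta t} \mid \tilde{\mathcal{F}}_t]$. We first note that

$$\mathbb{E}\left[\mathbb{P}(\Theta = 1 \mid \tilde{\mathcal{F}}_{t+\Delta t}) \,\middle|\, \tilde{\mathcal{F}}_t \vee \{t + \Delta t < \tau\}\right] = \mathbb{E}\left[\mathbb{P}(\Theta = 1 \mid \sigma(X^\tau(\mathcal{P}^\pi_{t+\Delta t})), t + \Delta t < \tau) \,\middle|\, \tilde{\mathcal{F}}_t\right].$$

We start evaluating the above by first looking at the term $\mathbb{P}(\Theta = 1 \mid X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau)$. Using Bayes’ rule, we have that

$$\mathbb{P}(\Theta = 1 \mid X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau) = \frac{\mathbb{P}(\Theta = 1, X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau)}{\mathbb{P}(X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau)}, \tag{10}$$

where $\mathbb{P}(\Theta = 1, X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau)$ can be expanded using successive applications of Bayes’ rule as

$$\mathbb{P}(\Theta = 1 \mid X^\tau(\mathcal{P}^\pi_t), t < \tau) \cdot \mathbb{P}(X^\tau(\mathcal{P}^\pi_t), t < \tau) \cdot \mathbb{P}(t + \Delta t < \tau \mid \Theta = 1, X^\tau(\mathcal{P}^\pi_t), t < \tau) \cdot d\mathbb{P}(X^\tau(t + \Delta t) \mid \Theta = 1, X^\tau(\mathcal{P}^\pi_t), t + \Delta t < \tau),$$

which is equivalent to

$$\mathbb{P}(\Theta = 1, X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau) = \mu_t \cdot S_t(\Delta t) \cdot \mathbb{P}(X^\tau(\mathcal{P}^\pi_t), t < \tau) \cdot d\mathbb{P}(X^\tau(t + \Delta t) \mid \Theta = 1, X^\tau(\mathcal{P}^\pi_t), t + \Delta t < \tau). \tag{11}$$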

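Similarly, it is easy to see that

$$\mathbb{P}(\Theta = 0, X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau) = (1 - \mu_t) \cdot \mathbb{P}(X^\tau(\mathcal{P}^\pi_t), t < \tau) \cdot d\mathbb{P}(X^\tau(t + \Delta t) \mid \Theta = 0, X^\tau(\mathcal{P}^\pi_t), t + \Delta t < \tau), \tag{12}$$

where again, we have used the fact that $\mathbb{P}(t + \Delta t < \tau \mid \Theta = 0, X^\tau(\mathcal{P}^\pi_t), t < \tau) = 1$. Now we re-formulate (10) using Bayes’ rule to arrive at the following

$$\mathbb{P}(\Theta = 1 \mid X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau) = \frac{\mathbb{P}(\Theta = 1, X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau)}{\sum_{\theta \in \{0,1\}} \mathbb{P}(\Theta = \theta, X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau)}, \tag{13}$$

then using (11) and (12), (13) can be further reduced to

$$\frac{\mu_t \cdot S_t(\Delta t) \cdot d\mathbb{P}(X^\tau(t + \Delta t) \mid \Theta = 1, X^\tau(\mathcal{P}^\pi_t), t + \Delta t < \tau)}{\mu_t \cdot S_t(\Delta t) \cdot d\mathbb{P}(X^\tau(t + \Delta t) \mid \Theta = 1, X^\tau(\mathcal{P}^\pi_t), t + \Delta t < \tau) + (1 - \mu_t) \cdot d\mathbb{P}(X^\tau(t + \Delta t) \mid \Theta = 0, X^\tau(\mathcal{P}^\pi_t), t + \Delta t < \tau)}. \tag{14}$$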

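Finally, we use the expression in (14) to evaluate the term $\mathbb{E}[\mathbb{P}(\Theta = 1 \mid \sigma(X^\tau(\mathcal{P}^\pi_{t+\Delta t})), t + \Delta t < \tau) \mid \tilde{\mathcal{F}}_t]$ as follows

$$\mathbb{E}\left[\mathbb{P}(\Theta = 1 \mid \sigma(X^\tau(\mathcal{P}^\pi_{t+\Delta t})), t + \Delta t < \tau) \,\middle|\, \tilde{\mathcal{F}}_t\right] = \sum_{\theta \in \{0,1\}} \int \mathbb{P}(\Theta = 1 \mid X^\tau(\mathcal{P}^\pi_{t+\Delta t}), t + \Delta t < \tau) \cdot d\mathbb{P}(X^\tau(t + \Delta t) \mid \Theta = \theta, X^\tau(\mathcal{P}^\pi_t), t + \Delta t < \tau),$$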

which, using (14), can be written as

Since

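$$\sum_{\theta \in \{0,1\}} d\mathbb{P}(X^\tau(t + \Delta t) \mid \Theta = \theta, X^\tau(\mathcal{P}^\pi_t), t + \Delta t < \tau) = \mu_t \cdot S_t(\Delta t) \cdot d\mathbb{P}(X^\tau(t + \Delta t) \mid \Theta = 1, X^\tau(\mathcal{P}^\pi_t), t + \Delta t < \tau) + (1 - \mu_t) \cdot d\mathbb{P}(X^\tau(t + \Delta t) \mid \Theta = 0, X^\tau(\mathcal{P}^\pi_t), t + \Delta t < \tau),$$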

then the integral above reduces to

 ∫μt⋅St(Δt)⋅dP(Xτ(t+Δt)|Θ=θ,Xτ(Pπt),t+Δt<τ)=μt⋅St(Δt)⋅∫dP(Xτ(t+Δt)|Θ=θ,Xτ(Pπt),t+Δt