1 Introduction
While statisticians are trained to be aware of multiple testing issues, temporal multiplicity is often easy to miss. Let us examine the following simplified situation alluded to in the abstract. Consider a team of statisticians at a pharmaceutical company who test a new drug every week of the year. In week , a new drug is under consideration, and to assess its treatment effect , the team conducts a new randomized clinical trial with new participants. Suppose that the data, such as the normalized empirical difference in means between the treatment and control groups, can be summarized by the observation , independent of all the previous .
Now consider the following selection rule: if , then the statisticians simply ignore drug , and if , then the team reports the two-sided marginal CI for to the management (who may then decide to run a much larger second-phase clinical trial since the CI does not contain 0). This may initially seem like an innocuous situation: each drug is different and has a different treatment effect , the data is always fresh and independent, the decision of whether or not to construct the CI for depends only on and is independent of all other , and so is the interval if constructed.
Nonetheless, the combination of multiplicity and selection is a cause for concern already in the offline setting, as was insightfully pointed out by Benjamini and Yekutieli (2005). In the online case, when there is an infinite sequence of parameters, it is even easier to construct an example where ignoring selection has undesirable consequences. Indeed, consider the special case where for all , in other words, every tested drug is equivalent to a placebo. In this situation, every single CI that is reported to the management is incorrect, since it does not contain zero. Because a selection will eventually occur, among constructed CIs the proportion of non-covering CIs (later formally defined as the false coverage proportion, FCP) will equal one from this point on. Thus, the FCR, which is the expectation of the FCP, is not controlled. Of course, the second phase of the trial will rectify this error, but at a huge cost of time and money, and loss of faith in the team of statisticians.
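To make the placebo example concrete, the following minimal Python sketch simulates it. The selection threshold (3), the 95% level, the number of weeks, and the function name are illustrative assumptions rather than values taken from the text; the only structural requirement is that the threshold exceeds the half-width of the marginal interval.

```python
import random
from statistics import NormalDist


def simulate_placebo_drugs(n_weeks=20000, threshold=3.0, level=0.95, seed=0):
    """All drugs are placebos (theta_i = 0); a marginal CI is reported only if |X_i| > threshold.

    threshold, level and n_weeks are hypothetical choices made for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # half-width of the two-sided marginal CI
    reported = 0
    noncovering = 0
    for _ in range(n_weeks):
        x = rng.gauss(0.0, 1.0)        # X_i ~ N(theta_i, 1) with theta_i = 0
        if abs(x) > threshold:         # selection rule
            reported += 1
            lo, hi = x - z, x + z      # marginal CI, ignoring selection
            if not (lo <= 0.0 <= hi):  # the CI misses the true value 0
                noncovering += 1
    fcp = noncovering / reported if reported else 0.0
    return reported, noncovering, fcp


reported, noncovering, fcp = simulate_placebo_drugs()
print(reported, noncovering, fcp)
```

Because the assumed threshold is larger than the half-width of the interval, every reported interval in this sketch lies entirely on one side of zero, so the false coverage proportion among reported intervals equals one as soon as a selection occurs.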
One natural solution is provided by conditional post-selection inference: instead of a marginal CI, we may construct a conditional interval, where we condition on the event that , leading to inference based on a truncated Gaussian likelihood in the above setting. Confidence intervals based on a truncated normal observation were proposed by Zhong and Prentice (2008) and Weinstein et al. (2013) to counteract the selection effect when providing inference after hypothesis testing. While these works consider the batch (offline) setting, in our simple example constructing such conditional CIs (together with the selection rule) is a legitimate online CI procedure. Furthermore, this controls the FCR; in fact, as will be discussed in Section 3 and demonstrated in our simulations, constructing conditional intervals provides unnecessarily strong guarantees that come at a price.
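As an illustration of inference based on a truncated Gaussian likelihood, the sketch below inverts the conditional distribution of a unit-variance normal observation given that its absolute value exceeds a cutoff. It is an equal-tailed construction chosen for simplicity, not the shortest-acceptance-region construction of Weinstein et al. (2013); the cutoff c = 3, the level 0.05, the bisection bracket, and all function names are assumptions made for this example.

```python
import math


def phi(z):
    """Standard normal CDF, written with erfc so that far tails stay accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_cdf(x, theta, c):
    """P(X <= x | |X| > c) for X ~ N(theta, 1), evaluated at an observed x with |x| > c."""
    left = phi(-c - theta)   # P(X < -c)
    right = phi(theta - c)   # P(X > c)
    if x < -c:
        num = phi(x - theta)
    else:
        num = left + phi(x - theta) - phi(c - theta)
    return num / (left + right)


def solve(x, c, target, lo, hi, iters=200):
    """Bisection for theta with truncated_cdf(x, theta, c) == target.

    Relies on the conditional CDF being decreasing in theta.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if truncated_cdf(x, mid, c) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conditional_ci(x, c=3.0, alpha=0.05, width=12.0):
    """Equal-tailed conditional CI for theta given selection |X| > c (hypothetical defaults)."""
    lower = solve(x, c, 1.0 - alpha / 2.0, x - width, x + width)
    upper = solve(x, c, alpha / 2.0, x - width, x + width)
    return lower, upper


lower, upper = conditional_ci(3.2)
print(lower, upper)
```

For an observation just above the cutoff, such as x = 3.2 here, the conditional interval stretches back across zero, whereas the marginal interval 3.2 ± 1.96 does not; for observations far from the cutoff the two nearly coincide.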
In this paper, we propose a new approach for online FCR control that is very different from the aforementioned conditional approach. Informally, in order to achieve FCR control at level , instead of constructing conditional CIs, we construct marginal CIs for some . The algorithm that sets these levels is inspired by recent advances in the online false discovery rate (FDR) control literature, specifically recent work by the first author (Ramdas et al., 2017). The new CI procedure works in much more generality than the simple example described above, that is, when are multidimensional, the data is not necessarily Gaussian, and so on (cases in which constructing a conditional CI may be substantially harder, if at all possible).
Even more importantly, by constructing marginal instead of conditional CIs, we leave open the possibility of using the candidate CI itself as a criterion for selection. For example, the rule may entail constructing the candidate marginal CI only if it does not include values of opposite signs. Thus, returning to the motivating example, this allows the team of statisticians to ensure that each reported CI is conclusive about the direction of the treatment effect, while the FCR is controlled. With such situations in mind, we instantiate our marginal CI procedure to propose a confidence interval-driven procedure that constructs sign-determining CIs and can be seen as an online adaptation of the ideas of Weinstein and Yekutieli (2019). Every such sign-determining CI procedure corresponds to an online sign-classification procedure that controls the false sign rate (FSR). As a special case, we show that for some recently proposed online testing procedures, supplementing rejections based on two-sided p-values with directional decisions suffices to control the FSR.
The rest of this paper is organized as follows. Section 2 sets up the problem formally and introduces necessary notation. In Section 3 we discuss a conditional solution to the online FCR problem. A new online procedure that adjusts marginal confidence intervals is presented in Section 4. In Section 5 we show how our marginal CI procedure can be used to solve a general online localization problem, and study the special case of the online sign-classification problem. Simulation results comparing the marginal approach and the conditional approach are reported in Section 6. We end with a brief discussion in Section 7, where we mention how all of our results also hold for prediction intervals for unseen responses, with further details furnished in Appendix B.
2 Problem Setup
Let be a fixed sequence of fixed unknown parameters, where the domain of is arbitrary, but common examples may include or . Let denote the set of all measurable subsets of , in other words it is any acceptable confidence set for . In our setup, at each time step , we observe an independent observation (or summary statistic) , where the distribution of depends on (and possibly other parameters). For example when , we may have . Let denote the selection rule that indicates whether or not the user wishes to report a confidence set for . Explicitly, letting be the indicator for selection, where means that the user will report a confidence set for . Let the filtration formed by the sequence of selection decisions be denoted by
Next, let be the rule for constructing the confidence set for , the second argument allowing to take as an input a “confidence level”. We denote . Thus, may be a marginal or a conditional confidence set for as discussed later, but in general it is no more than a map from as described above. For simplicity, in the rest of the paper we refer to as a confidence interval (CI), as it usually would be if , but with the understanding that everything discussed in this paper applies to the more general case of arbitrary confidence sets.
In our setup, the above rules are all required to be predictable, meaning that
and we write
. Naturally, the instantiated random variables
both depend on . However, the rules must be measurable, hence specified before observing . We emphasize that the requirement to be measurable also prevents the rules from depending on unless it is through . Importantly, can depend on because both are predictable, and hence can depend on (for example, on whether or not looks “favorable”), a point to which we will return in later sections.
Using these definitions, we now define an online selective-CI procedure. In the rest of the paper, we omit the term “selective”, but this is done only for the sake of readability. Thus, an online CI protocol proceeds as follows:

At time , first commit to .

Then, observe . Decide whether or not is selected for coverage by setting .

Report if . Then, increment , and go back to step 1.
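The three steps above can be sketched as a simple loop. In this sketch the rules are passed as ordinary Python functions that see only the list of past selection decisions, which is one way to mimic predictability; the function and argument names, the fixed level 0.05, and the cutoff 2 in the usage example are all hypothetical.

```python
from statistics import NormalDist


def run_online_ci_protocol(observations, level_rule, selection_rule, ci_rule):
    """Generic online CI loop: commit to a level, observe, select, and possibly report."""
    history = []   # past selection decisions (0 or 1)
    reports = []   # pairs (time index, reported CI)
    for i, x in enumerate(observations, start=1):
        # Step 1: commit to the level using only the past selection decisions.
        alpha_i = level_rule(history)
        # Step 2: observe x and decide whether the parameter is selected.
        s_i = 1 if selection_rule(x, alpha_i, history) else 0
        # Step 3: report the CI only if selected, then move on to the next time step.
        if s_i:
            reports.append((i, ci_rule(x, alpha_i)))
        history.append(s_i)
    return history, reports


def symmetric_ci(x, alpha):
    """Two-sided marginal CI for a unit-variance normal observation."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (x - z, x + z)


history, reports = run_online_ci_protocol(
    [0.5, 2.5, -3.0],
    level_rule=lambda past: 0.05,                    # constant level, for illustration only
    selection_rule=lambda x, a, past: abs(x) > 2.0,  # hypothetical selection rule
    ci_rule=symmetric_ci,
)
print(history, reports)
```

A constant level, as used in this usage example, carries no FCR guarantee under selection; it only serves to exercise the loop.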
We next discuss the metrics used to evaluate the errors made by an online CI protocol.
2.1 Error metrics
Let the unknown false coverage indicator be denoted . Hence, implies that we intended to cover but our reported CI failed to do so. Using the aforementioned terminology, define the false coverage proportion up to time as
where per standard convention (i.e., if no intervals are constructed, then the false coverage proportion is trivially zero). The false coverage rate (FCR) and the modified FCR are defined, respectively, as
Along the way, we will consider the relationship of the FCR to other error metrics like the positive FCR (pFCR), the false sign rate (FSR) and the well-known false discovery rate (FDR).
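For concreteness, the three quantities of this subsection can be written as follows. The symbol names are assumptions made here for illustration: V_i denotes the false coverage indicator, S_i the selection indicator, and T the time horizon; a false coverage can only occur at a selected time, so V_i = 1 implies S_i = 1.

```latex
\mathrm{FCP}(T) \;=\; \frac{\sum_{i \le T} V_i}{\sum_{i \le T} S_i}
\quad \text{(with the convention } 0/0 = 0\text{)},
\qquad
\mathrm{FCR}(T) \;=\; \mathbb{E}\big[\mathrm{FCP}(T)\big],
\qquad
\mathrm{mFCR}(T) \;=\; \frac{\mathbb{E}\big[\sum_{i \le T} V_i\big]}{\mathbb{E}\big[\sum_{i \le T} S_i\big]}.
```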
2.2 Main objective
The main objective of this paper is to develop and compare algorithms to specify and such that FCR or mFCR control is guaranteed at any time regardless of the choice of , that is,
Specifically, we explore the following two avenues for constructing the CIs:

Marginal CI: this has the guarantee that for any , we have
(1) where the probability is taken only over the marginal measure of
, because the rule is predictable. 
Conditional CI: this has the property that for any , we have
(2) where the probability is taken over the measure of conditional on , because is predictable. (Footnote 1: In defining a conditional CI, one may consider requiring only that . This weaker condition will suffice for mFCR control, as can be seen from the proofs of our theorems. We chose to use the stronger requirement (2) partly because it is more natural to construct a conditional CI when conditioning on along with ; indeed, our simulations include a typical example where we do not know how to construct a conditional CI with the weaker property, but it is easy to construct one with the stronger property (2).)
For either choice, we must specify the level to use with if is selected for coverage.
On accomplishing this main objective, we detail in Section 5 exactly how it enables us to solve several other practical problems of interest, such as controlling the false sign rate. As mentioned in the end of the discussion in Section 7, the entire setup of this paper applies equally well to prediction intervals instead of CIs.
3 A method based on conditional inference
A conceptually straightforward method to control the mFCR is to construct conditional CIs at the nominal level . This trivially controls the mFCR at level , as seen by the following argument.
Theorem 1.
Constructing a conditional CI after every selection ensures that .
Proof.
From the definition (2) of a conditional CI it follows immediately that
Together with the fact that , we have
and hence,
Rearranging the first and last displays above yields the desired result. ∎
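The chain of inequalities behind this argument can be sketched as follows, under assumed notation: V_i is the false coverage indicator, S_i the selection indicator, I_i the reported interval, \mathcal{F}^{i-1} the information in the past selection decisions, and \alpha the nominal level. This is a reconstruction of the reasoning from the surrounding text, not a verbatim copy of the displays.

```latex
\begin{aligned}
\mathbb{E}\big[V_i \mid \mathcal{F}^{i-1}\big]
  &= \Pr\big(\theta_i \notin I_i \mid S_i = 1, \mathcal{F}^{i-1}\big)\,
     \Pr\big(S_i = 1 \mid \mathcal{F}^{i-1}\big)
   \;\le\; \alpha\, \mathbb{E}\big[S_i \mid \mathcal{F}^{i-1}\big], \\
\mathbb{E}\Big[\sum_{i \le T} V_i\Big]
  &\le \alpha\, \mathbb{E}\Big[\sum_{i \le T} S_i\Big]
  \quad\Longrightarrow\quad
  \mathrm{mFCR}(T) \le \alpha .
\end{aligned}
```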
Constructing conditional CIs at the nominal level also ensures that the FCR is controlled. As a matter of fact, even the conditional expectation of the FCP given that at least one selection is made,
is controlled when using conditional CIs. We call the above the positive FCR, in analogy to the positive FDR (Storey et al., 2003).
Theorem 2.
Constructing a conditional CI after every selection ensures that
Proof.
Consider any sequence such that . We have
where equality uses the fact that the selection decisions are independent of given because the selection rules are predictable. The original claim follows by taking expectation over the conditional distribution of given that . ∎
We immediately conclude that with conditional CIs we also have
Control of the pFCR (and hence FCR) may seem pleasant, but in fact this strong guarantee has a price. Our two main criticisms of the conditional approach are:

Incompatibility. Conditional CIs are not able to ensure compatibility between selection decisions and the reported CI. For example, it is impossible to ensure that all selected CIs are sign-determining, meaning that it is impossible to select only those confidence intervals that do not contain 0. This is discussed further and explicitly demonstrated in Subsection 6.2.

Intractability. The conditional distribution of given and the event is the distribution resulting from restricting to some subset of , which may be intractable to compute in general. At the very least, the conditional approach requires a case-by-case treatment; depending on the marginal distribution of and the selection rules , computing the conditional distribution may be far from trivial.
In the next sections, we describe a marginal approach to controlling the FCR, and elaborate on its various advantages with respect to the aforementioned conditional approach.
4 Adjusting marginal intervals: the LORDCI procedure
In what follows, an algorithm is a sequence of mappings from past selection decisions to confidence levels, meaning that it maps to . By definition, such an is measurable, hence a procedure that constructs a marginal confidence interval at level whenever is a legitimate online CI protocol. We will refer to such a procedure as a marginal online CI protocol/procedure. A trivial marginal online CI protocol can be obtained by taking any fixed sequence of such that the series ; this procedure is called alpha-spending in the context of online FDR control by Foster and Stine (2008), and controls the familywise error rate (which in our context is the probability of even a single miscoverage event). Naturally, this is a much more stringent notion of error, and hence the resulting selected CIs will be excessively wide. The question we will address below is the following: is there a nontrivial algorithm to set the so that FCR is controlled?
4.1 mFCR control for arbitrary selection rules
Our first result identifies a sufficient condition for an algorithm to imply mFCR control. To this end, we first associate any algorithm with an estimated false coverage proportion,
We may then define the following procedure for online FCR control.
Definition 1 (LORDCI procedure).
A LORDCI procedure is any online protocol that constructs marginal confidence intervals, where are defined in a predictable fashion to maintain the invariant
(3) 
regardless of the selection rules .
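Written out under assumed notation (\alpha_i for the level used at time i, S_i for the selection indicator, \alpha for the target level), the estimated false coverage proportion and the invariant can be sketched as below. The form of the denominator is inferred from the later remark that the invariant holds even before the first selection, and should be read as a reconstruction.

```latex
\widehat{\mathrm{FCP}}(T) \;=\; \frac{\sum_{i \le T} \alpha_i}{\big(\sum_{i \le T} S_i\big) \vee 1}
\;\le\; \alpha
\qquad \text{for all } T \ge 1 .
```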
Any LORDCI procedure comes with the following theoretical guarantee.
Theorem 3.
Given an arbitrary sequence of selection rules made by the user, any LORDCI procedure has the guarantee that .
Proof.
If one really insisted on requiring FCR control as opposed to mFCR control, we provide a guarantee for a subclass of “monotone” selection rules, as introduced below.
4.2 Monotonicity of algorithms, intervals and selection rules
An online FCR algorithm is called monotone if for any two vectors , we have . Equivalently, an online FCR algorithm is monotone if
(4) 
where is the level produced by the online FCR algorithm, when presented with the history of selection decisions . We say that a CI rule is monotone if
Monotonicity is satisfied for most natural (even nonequivariant) CI constructions, and thus we do not view this as a restriction. Irrespective of whether the online FCR algorithm and CI rule are monotone, we say that a selection rule is monotone if
(5) 
where, as before, is used to denote the selection decision at time , for the same observation , but for a different history .
As a simple special case, if each rule is independent of , then such a selection rule is trivially monotone, even if the underlying online FCR algorithm is not. In other words, if the final decision is based only on and on none of the past decisions, then such a rule is monotone. For example, setting for every constitutes a trivial monotone selection rule.
4.3 FCR control for monotone selection rules
We can provide the following guarantee for the nontrivial class of monotone selection rules.
Theorem 4.
Given an arbitrary sequence of monotone selection rules chosen by the user, any LORDCI procedure that maintains the invariant (3) also satisfies that .
Proof.
The critical step in the aforementioned argument is the invocation of the following powerful lemma.
Lemma 1.
Given an arbitrary sequence of monotone selection rules, we have
Intuitively, the statement of the above lemma is obvious if the expectation could be taken separately in the numerator, as if it was independent of the denominator, because by construction (1). The following proof demonstrates that monotonicity allows us to formally perform such a step.
Proof.
Without loss of generality, we can ignore the case when almost surely for some ; in other words, if we would never select , then almost surely, and we can just ignore the time instant . Hence, we only consider the case when at least one value of leads to selection.
To derive a bound on , consider the following thought experiment. Let us hallucinate what selection decisions would have occurred under a slightly different series of observations, namely
where is any value that would have led to selection of , which is a predictable choice, because it can be made based on only the predictable selection rule . Let the sequence of selection decisions made by the same algorithm on be denoted , the levels be denoted , and the constructed intervals be . We then claim that
where we have intentionally altered only the denominator. To see that the above equality holds, first note that if , then . Then note that if , then for all . Indeed, because , the first selection decisions are identical by construction; then if (and by construction), then , and so every future selection decision is also identical (and also the constructed CIs, at levels ). Hence,
where inequality holds because , equality follows because is measurable because by construction, inequality holds by definition (1) of a marginal CI, and inequality holds because for all by the monotonicity of selection rules. This completes the proof of the lemma. ∎
The above is a generalization of lemmas that have been proved in the context of online FDR control by Javanmard and Montanari (2018); Ramdas et al. (2017), since the selection event may or may not be associated with the miscoverage event , but in online FDR control, the rejection event is obviously directly related to the false discovery event . We will later see that online FCR control captures online FDR control as a special case.
4.4 An explicit monotone online FCR algorithm
By the theorems above, the class of procedures in Definition 1 yields mFCR (FCR) control. To obtain a specific procedure, we use the LORD++ online FDR algorithm (Ramdas et al., 2017) to set the sequence of the . LORD++ was originally designed to maintain the invariant (3) in the context of testing, i.e., when stands for rejection of the i-th null hypothesis. In the absence of p-values, our algorithm instead substitutes rejection events by arbitrary selection events (). We call the aforementioned adaptation of LORD++ to the context of CIs the LORDCI algorithm, and refer to the corresponding marginal online CI protocol as the LORDCI procedure. In the sequel, unless indicated otherwise, whenever we refer to a LORDCI procedure (or simply LORDCI), we mean the LORDCI procedure, that is, the protocol utilizing LORD++. An explicit description of LORDCI is given in Protocol 1 below.
In Protocol 1, is a deterministic nonincreasing sequence of positive constants summing to one that is specified in advance; is a prespecified constant; is a sequence of arbitrary predictable selection rules; and is now a sequence of marginal CIs, that is, each has the property (1). On implementing Protocol 1, set whenever (this happens for ). It is easy to verify that (see line 7 in Protocol 1) is monotone, because it has an additional nonnegative term in the summation with every new selection. One may also verify that satisfies the invariant (3), because is always less than .
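The level update can be sketched in Python using the LORD++ formula of Ramdas et al. (2017), with rejections replaced by selections. The particular sequence gamma_t = 6 / (pi^2 t^2), the constants alpha = 0.05 and w0 = 0.025, the random stand-in for selection decisions, and all names are assumptions made for this illustration and need not match the choices in Protocol 1.

```python
import math
import random


def gamma(t):
    """Hypothetical nonincreasing sequence summing to one over t >= 1; zero for t <= 0."""
    return 6.0 / (math.pi ** 2 * t ** 2) if t >= 1 else 0.0


def lord_ci_level(t, selection_times, alpha=0.05, w0=0.025):
    """Level for time t given the increasing list of earlier selection times.

    LORD++ update: the initial budget w0 is spread by gamma, the first selection
    earns (alpha - w0), and every later selection earns alpha.
    """
    level = gamma(t) * w0
    for k, tau in enumerate(selection_times):
        weight = (alpha - w0) if k == 0 else alpha
        level += weight * gamma(t - tau)  # gamma is zero unless tau < t
    return level


rng = random.Random(1)
alpha = 0.05
selection_times, levels = [], []
invariant_ok = True
for t in range(1, 501):
    levels.append(lord_ci_level(t, selection_times, alpha))
    if rng.random() < 0.1:  # arbitrary stand-in for a predictable selection decision
        selection_times.append(t)
    # The sum of levels must stay below alpha times max(number of selections, 1).
    invariant_ok = invariant_ok and sum(levels) <= alpha * max(len(selection_times), 1)
print(invariant_ok)
```

Each new selection adds a nonnegative term to later levels, which is the monotonicity property used in Section 4.3; the running check inside the loop is a numerical illustration of the invariant, not a proof.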
5 Selections that depend on the candidate CIs
In the LORDCI procedure, the predictable sequences of selection rules and marginal CI rules are both arbitrary, and these may be specified independently of each other. In this section we demonstrate how tying the selection rule to the confidence interval rule, by letting the (candidate) marginal CI determine whether is selected or not, can lead to many instantiations of the LORDCI procedure that are of practical interest.
Informally, the idea is as follows. Suppose that we made some choice in advance for the marginal CI rule. Suppose also that we have in mind a criterion for what constitutes a “good” reported CI. For example, when , we might consider a reported CI “good” if it excludes zero. Then at each step , upon observing , we pretend that we were to construct where are set by the LORDCI algorithm, but we only actually select and report it if it is “good”. By design, then, we only report “good” intervals. Note that because is predictable, choosing to report an interval only if it is “good” is a predictable selection rule. Therefore we may use LORDCI to determine levels of coverage and immediately be guaranteed FCR control. This is formalized in the definition below. We remark that these ideas appeared first in Weinstein and Yekutieli (2019), but their treatment is rather informal, and, importantly, their proposed procedure is not an online procedure.
For the remainder of this section, whenever we speak of a CI rule, it will be assumed to be monotone. Again, we do not view this as a restriction.
5.1 From coverage to localization
Suppose that for each we have a collection of prespecified disjoint subsets of . Being able to say that for exactly one qualifies as having “localized” the signal. (Footnote 2: If the sets are not disjoint a priori, one may either create a new set for the intersection, or generalize the definition of localization to allow for the reporting of multiple sets.) On observing , we must either localize by specifying which of it belongs to, or refrain from making any claim at all about (the latter reflecting the decision “not enough evidence to decide”). The corresponding natural notion of error for a given procedure is a false localization rate (FLR),
As we will see below, the false localization rate generalizes the false discovery rate.
Definition 2 (LORDCI for localization).
Let be an arbitrary prespecified monotone marginal CI rule for , and define as follows:
Then LORDCI for localization is the online CI protocol that applies LORDCI to the above selection rule, and when , it outputs the unique index such that .
The above procedure comes with the following guarantee.
Theorem 5.
The LORDCI for localization procedure (Definition 2) satisfies for any .
Proof.
Note that the selection rule in Definition 2 can be rewritten as
(7) 
which defines a predictable selection rule because are predictable. Thus, the procedure in Definition 2 is LORDCI for a predictable selection rule. Because the CI rules are monotone, and the output by the LORDCI algorithm are also monotone by construction, we conclude that the selection rule (7) is also monotone according to condition (5). Hence, the procedure in Definition 2 is now the LORDCI procedure for a predictable and monotone selection rule, which controls the FCR by Theorem 4. The last step is to observe that a false localization event implies a false coverage event (but not necessarily the other way around), and hence . ∎
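The selection rule used for localization can be sketched for real-valued parameters as follows. The two regions, the representation of regions as closed intervals, the fixed level 0.05 (which in the actual procedure would be supplied by the LORDCI algorithm), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def symmetric_ci(x, alpha):
    """Two-sided marginal CI for a unit-variance normal observation."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (x - z, x + z)


def localize(ci, regions):
    """Return the index of the unique region containing the whole CI, or None.

    Returning None corresponds to not selecting the parameter at this time step.
    """
    lo, hi = ci
    hits = [k for k, (a, b) in enumerate(regions) if a <= lo and hi <= b]
    return hits[0] if len(hits) == 1 else None


# Hypothetical disjoint regions: "clearly negative" and "clearly positive" effects.
regions = [(-math.inf, -0.5), (0.5, math.inf)]

for x in (3.0, 1.0, -4.0):
    print(x, localize(symmetric_ci(x, 0.05), regions))
```

Because a parameter is reported only when its candidate interval sits inside a single region, a wrong localization can happen only if that interval fails to cover the parameter, which is the last step of the proof above.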
Next we consider some special cases of localization and their implications.
5.2 Online composite hypothesis testing with FDR control
Suppose that we have a sequence of composite null hypotheses that we wish to test:
where . For any online testing procedure, let and define
which reduces to the usual definition of the FDR when include a single value, i.e., when testing point null hypotheses. We can use the procedure of Definition 2 to devise an online testing protocol that controls the FDR.
Definition 3 (LORDCI for composite testing).
Consider an arbitrary marginal CI rule for each parameter . We reject the th composite null hypothesis and set , if and only if
(8) 
where is determined by the LORDCI procedure using .
The procedure in the definition above comes with the following guarantee.
Corollary 1.
The LORDCI procedure for composite testing (Definition 3) enjoys for any .
Proof.
Before proceeding, we would like to point out a connection to existing online FDR testing protocols. We can define a p-value for testing by
where is any monotone CI for . Indeed, if , then for any we have
We can therefore apply an existing online FDR protocol using this definition for a p-value. Note that while the computation of the p-value above might not be trivial, we are really only required at each step to check if , which is equivalent to rejecting when . In fact, if we use the same CI rules and the same algorithm to set the as in the CI procedure employed in Definition 3, we obtain exactly the composite testing procedure of Definition 3.
5.3 Online sign-classification with FSR control
Sometimes we would like to ask about the direction of the effect rather than test a two-sided null hypothesis. As argued in Gelman et al. (2012); Gelman and Tuerlinckx (2000), this often makes a more sensible question than asking whether a parameter is equal to zero. In fact, even statisticians who use a two-sided test of a point null hypothesis tend to supplement (perhaps with a leap of faith) a rejection of the null with a claim about the sign of the parameter (Goeman et al., 2010, call this post hoc inference of the sign). Inferring the signs of multiple parameters simultaneously was considered at least as early as Bohrer (1979); Bohrer and Schervish (1980); Hochberg (1986). In the story of Section 1, the management might be interested primarily in identifying which drugs have a positive treatment effect and which drugs have a nonpositive treatment effect. Throughout this subsection suppose that , and that for some common likelihood function , so that a common CI rule can be used at all times. (Footnote 3: Note that a CI rule depends on the likelihood function only, not on the true value of ; for lack of a better phrase, we call this situation the “common likelihood” case.)
When considering a sign-classification procedure, we will aim to control, in analogy to the FDR, the expected ratio of the number of incorrect directional decisions to the total number of directional decisions made. Throughout this paper, to make a directional decision means to classify a parameter as strictly greater than zero (positive) or as at most zero (nonpositive); because zero is included on one side, this can be considered a weak sign-classification (although the definition is not symmetric, zero can just as well be appended to the positive side instead of the negative side). Hence, a sign-classification protocol is an online procedure that outputs
Borrowing a term from Stephens (2016), we define the false sign rate as
It is worth noting that this is slightly different from the definitions of Benjamini et al. (1993), who consider procedures that classify parameters as strictly positive or as strictly negative; however, if there are no parameters that equal zero exactly, then all definitions coincide (which is the case in virtually all realistic situations, see, e.g., Tukey, 1991). A natural procedure to consider is applying any online FDR protocol to test the hypotheses that each parameter equals zero, and then classifying each rejection according to the sign of an unbiased estimate of the parameter (so, for example, when , a rejected null with entails ). We will see later that, for example, applying LORD++ to the usual two-sided p-values indeed works; however, FSR control is not automatically guaranteed, i.e., this requires a proof (see Gelman and Tuerlinckx, 2000, who point out caveats in replacing rejections with statements about the signs of the parameters).
We can rely again on the procedure of Definition 2 to devise a sign-classification protocol that controls the FSR. Thus, suppose that we have an arbitrary (common) marginal CI rule . Now specialize the prescription in Definition 2 by taking , . In words, this is the LORDCI procedure that reports whenever it includes either only positive or only nonpositive values. This special case of the procedure in Definition 2 is central enough to merit a separate definition.
Definition 4 (Sign-determining LORDCI procedure).
Suppose that we are in the “common likelihood” case, and let be any marginal confidence interval procedure, i.e., for any . Assume that for any parameter there is a corresponding “null” value . The sign-determining LORDCI procedure associated with is defined to be the LORDCI procedure that utilizes the selection rules
(9) 
and constructs if .
For simplicity, assume from now on that . In that case the sign-determining LORDCI procedure constructs if and only if this interval is sign-determining, meaning that it includes only positive or only nonpositive values.
Returning to the FSR problem, apply now the CI procedure from Definition 4 with an arbitrary choice of , and set
(10) 
Then we have the following result:
Corollary 2.
The sign-classification procedure given by (10) enjoys for any .
Proof.
We have because a wrong decision on the sign of a parameter necessarily implies that a non-covering CI was constructed. Here, the left-hand side of the inequality is the false sign rate associated with the sign-classification procedure in (10), and the right-hand side is the false coverage rate associated with the sign-determining LORDCI procedure of Definition 4 (using ). On the other hand, is controlled as a special case of the procedure of Definition 2, because we assumed that is monotone. ∎
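Putting the pieces together, a sign-determining procedure with the usual symmetric interval can be sketched end to end as follows. The gamma sequence, the constants alpha = 0.05 and w0 = 0.025, the null value zero, the unit-variance normal model, and the function names are assumptions of this sketch; the levels follow the LORD++ update of Ramdas et al. (2017) with selections in place of rejections.

```python
import math
from statistics import NormalDist


def gamma(t):
    """Hypothetical nonincreasing sequence summing to one over t >= 1; zero for t <= 0."""
    return 6.0 / (math.pi ** 2 * t ** 2) if t >= 1 else 0.0


def lord_ci_level(t, selection_times, alpha, w0):
    """LORD++ level at time t, driven by earlier selection times."""
    level = gamma(t) * w0
    for k, tau in enumerate(selection_times):
        level += ((alpha - w0) if k == 0 else alpha) * gamma(t - tau)
    return level


def sign_determining_lord_ci(observations, alpha=0.05, w0=0.025):
    """Report a symmetric CI, and the implied sign, only when the CI is sign-determining."""
    selection_times = []
    decisions = {}  # time -> (sign decision, reported CI)
    for t, x in enumerate(observations, start=1):
        a_t = lord_ci_level(t, selection_times, alpha, w0)
        z = NormalDist().inv_cdf(1.0 - a_t / 2.0)
        lo, hi = x - z, x + z          # candidate open interval (lo, hi)
        if lo >= 0.0:                  # only positive values: classify as positive
            decisions[t] = (+1, (lo, hi))
            selection_times.append(t)
        elif hi <= 0.0:                # only nonpositive values: classify as nonpositive
            decisions[t] = (-1, (lo, hi))
            selection_times.append(t)
    return decisions


decisions = sign_determining_lord_ci([0.3, 5.0, -4.5, 1.0, 6.0])
print({t: d[0] for t, d in decisions.items()})
```

The candidate interval is computed at every step, but it is reported, and counted as a selection by the level update, only when it determines the sign.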
Remark 1.
It is easy to see that we could drop the assumption on monotonicity of the CI rules in this section and still be guaranteed control of the respective modified error rates, for example mFDR in Subsection 5.2 and mFSR in Subsection 5.3 (which would now be implied by mFCR control for the corresponding CI procedure).
5.4 Configuring the sign-determining LORDCI procedure
The sign-determining LORDCI procedure was used in the previous subsection as a “wrapper” device to control the FSR for any definition of the CI rules , but it can be of interest to design specific rules since we know that the FSR procedure will only select sign-determining CIs. Indeed, in most realistic situations, it is useful to supplement a directional decision with confidence bounds that are consistent with that decision. For example, if the team of statisticians declares a specific drug to have a positive effect, the management will likely want to know how large the effect is at the very least, as would be quantified by a nonnegative lower endpoint of a CI.
Thus, ideally, the sign-determining LORDCI procedure selects and constructs a large number of CIs (meaning that it is “powerful” when translated to an FSR protocol as in Section 5.3) while the lower endpoint for a positive interval is as far away from zero as possible (and, similarly, the upper endpoint for a nonpositive interval is as far away from zero as possible). Unfortunately, these two goals are conflicting in general; see Benjamini et al. (1998), who study the single-parameter case. The tradeoff between these two properties will be controlled here through the choice of the marginal CI rule ; thus, the corresponding sign-determining LORDCI procedure may be seen as the online counterpart of the offline sign-determining multiple testing procedure of Weinstein and Yekutieli (2019). Below, we point out a few concrete examples of CI rules . For the rest of this section assume that , though the constructions below can be extended beyond the normal case.

is the usual symmetric interval, .
It can be easily verified that the sign-determining LORDCI procedure with this choice for selects exactly the set of parameters rejected by the LORD++ online FDR procedure using the usual two-sided p-values
A constructed CI has length , and is guaranteed to be sign-determining. As a byproduct, if we translate this into a sign-classification procedure as explained in Section 5.3, we have as a conclusion that selecting with LORD++ and classifying according to the sign of controls the FSR. In fact, this conclusion still holds if is interpreted as declaring that is strictly negative rather than nonpositive, because the usual symmetric interval is open.

is the “one-sided” interval given by
It can be verified that the sign-determining LORDCI procedure with this choice for selects exactly the set of parameters rejected by the LORD++ online FDR procedure using “one-sided” p-values
hence is much more powerful when translated into a sign-classification procedure. However, constructed intervals do not have an a priori bound on their length, since they take the form (if the observation is positive) or (if the observation is negative). Perhaps more seriously, a reported interval necessarily touches zero, thus failing to address our follow-up question of how large the effect is at the very least (a nonpositive interval even includes zero).

is the Modified Quasi-Conventional (MQC) CI of Weinstein and Yekutieli (2019).
The MQC confidence interval is in a sense a compromise between the two choices of presented above: it determines the sign earlier than the two-sided interval but not as early as the “one-sided” interval. In turn, it leads to more power than LORD++ applied to two-sided p-values when interpreted as a sign-classification procedure, and at the same time separates from zero for large enough . A mathematical definition of the MQC interval is given in Weinstein and Yekutieli (2019), where its properties are further explained; we include a figure instead to illustrate the properties of that interval. In Figure 1 the endpoints of the MQC interval are shown in solid lines as a function of the observation for . The potential gain in power from using the MQC interval instead of the symmetric interval is demonstrated in Section 6.
6 Numerical experiments
6.1 Simulations
To examine how the LORDCI procedure compares to conditional CIs, we carry out numerical experiments where online confidence intervals are constructed under different (predictable) selection schemes. We set and in each of simulation runs, we draw parameters i.i.d. from a mixture
where . The mass at represents the “null” component (essentially zero), while the “non-nulls” are drawn so that large effects are rare. The observations are then drawn as . The are revealed one by one, and a confidence interval is to be quoted whenever a parameter is selected. The LORDCI procedure uses the sequence of specified by the LORD++ procedure (Ramdas et al., 2017) with “default” choices and , as used in the experiments of Javanmard and Montanari (2018) and Ramdas et al. (2017). If not indicated otherwise, the marginal CI used for LORDCI is the symmetric two-sided interval, and the conditional CI used is the construction from Weinstein et al. (2013, Section 2) obtained by inverting shortest acceptance regions. Table 1 gives quantitative summary statistics (averaged over the replications) for the three simulation examples. Below, we examine the output from a single realization of the experiment for each of the examples.
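To make the mechanics of this setup concrete, the following Python sketch computes LORD++ levels and builds symmetric intervals for selected observations. It rests on several assumptions that are not taken from the experiments: the sequence `gamma_assumed` (proportional to 1/j², summing to one), the initial wealth `w0 = alpha / 2`, unit-variance normal observations, and the premise that selection times play the role that rejection times play in LORD++. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def gamma_assumed(j):
    """Assumed nonnegative sequence summing to 1 (not the paper's default)."""
    return 6.0 / (math.pi ** 2 * j ** 2)


def lordpp_level(t, selection_times, alpha=0.05, w0=0.025):
    """LORD++ level at step t (1-indexed), given earlier selection times."""
    past = sorted(s for s in selection_times if s < t)
    level = gamma_assumed(t) * w0
    if past:
        # The first selection returns (alpha - w0); later ones return alpha each.
        level += (alpha - w0) * gamma_assumed(t - past[0])
        level += alpha * sum(gamma_assumed(t - s) for s in past[1:])
    return level


def run_lordci(ys, select, alpha=0.05, w0=0.025):
    """Return (t, lower, upper) for each selected observation, Y ~ N(theta, 1)."""
    selection_times, intervals = [], []
    for t, y in enumerate(ys, start=1):
        a_t = lordpp_level(t, selection_times, alpha, w0)  # fixed before y is used
        if select(y):
            z = STD.inv_cdf(1 - a_t / 2)
            intervals.append((t, y - z, y + z))
            selection_times.append(t)
    return intervals


if __name__ == "__main__":
    ys = [0.3, 3.5, -0.2, 4.1]
    # Hypothetical fixed-threshold selection rule; the cutoff 3 is illustrative.
    for row in run_lordci(ys, select=lambda y: abs(y) > 3):
        print(row)
```

Each level is computed from past selections only, so it is predictable; a selection replenishes the budget, which is why later intervals can be narrower than earlier ones.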
We begin with a simple selection rule, where a CI is constructed when , i.e., when the size of the current observation exceeds a fixed threshold. Figure 2 shows conditional CIs (red) versus LORDCI intervals (black) for a single realization. The conditional CIs are considerably shorter than LORDCI, which seems to be conservative with as compared to about 0.1 for conditional. In particular, the conditional CIs become closer to the marginal two-sided interval as the observation size increases, which would in that sense resemble Bayesian credible intervals for our example (these are not shown in the figure). Both the conditional and LORDCI intervals may cross zero, as can be seen in the plot. In fact, as many as 53% of the conditional CIs cross zero, and 38% of LORDCI intervals cross zero. Note that the lower endpoint of the CI is monotone nondecreasing for the conditional intervals, but not for LORDCI. The conditional intervals seem preferable in this situation.
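The qualitative behavior of conditional intervals under a fixed-threshold rule can be illustrated with a simpler construction than the one used in the experiments. The Python sketch below builds an equal-tailed conditional interval by inverting the CDF of a normal observation truncated to the selection event; this is not the shortest-acceptance-region interval of Weinstein et al. (2013). It assumes Y ~ N(θ, 1) and selection when |Y| > c, where the cutoff `c`, the level `alpha`, and all function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def cond_cdf(y, theta, c):
    """P_theta(Y <= y | |Y| > c) for Y ~ N(theta, 1) and an observed y > c."""
    selected = STD.cdf(-c - theta) + STD.cdf(theta - c)  # P(|Y| > c)
    upper_tail = STD.cdf(theta - y)                      # P(Y > y)
    return 1.0 - upper_tail / selected


def solve_theta(y, c, target, tol=1e-10):
    """Bisection for theta with cond_cdf = target (the CDF decreases in theta)."""
    lo, hi = y - 20.0, y + 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cond_cdf(y, mid, c) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conditional_ci(y, c, alpha=0.05):
    """Equal-tailed (1 - alpha) interval for theta, conditional on |Y| > c."""
    if abs(y) <= c:
        raise ValueError("observation was not selected")
    if y < 0:  # use the symmetry of the problem around zero
        lo, hi = conditional_ci(-y, c, alpha)
        return -hi, -lo
    return solve_theta(y, c, 1 - alpha / 2), solve_theta(y, c, alpha / 2)


if __name__ == "__main__":
    for y in (3.1, 4.0, 8.0):
        print(y, conditional_ci(y, c=3.0))
```

For an observation just above the cutoff the interval stretches across zero, while for a large observation it is close to the marginal two-sided interval, in line with the behavior described for Figure 2.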
The second simulation example illustrates a situation where we are interested first in detecting the sign of the parameters, and second in supplementing a directional decision with confidence bounds. For this we implement the sign-determining LORDCI procedure of Section 5, in other words, is selected whenever the candidate (symmetric) LORDCI interval excludes zero. Because this amounts to selecting when , and because are independent and predictable, the conditional distribution of given and , is that of a truncated normal, and we can use again the intervals of Weinstein et al. (2013) (the cutoff will now be different for every selection, as opposed to the previous example where it was