Selective Monitoring

06/15/2018 ∙ by Radu Grigore, et al.

We study selective monitors for labelled Markov chains. Monitors observe the outputs that are generated by a Markov chain during its run, with the goal of identifying runs as correct or faulty. A monitor is selective if it skips observations in order to reduce monitoring overhead. We are interested in monitors that minimize the expected number of observations. We establish an undecidability result for selectively monitoring general Markov chains. On the other hand, we show for non-hidden Markov chains (where any output identifies the state the Markov chain is in) that simple optimal monitors exist and can be computed efficiently, based on DFA language equivalence. These monitors do not depend on the precise transition probabilities in the Markov chain. We report on experiments where we compute these monitors for several open-source Java projects.




1 Introduction

Consider an MC (Markov chain) whose transitions are labelled with letters, and a finite automaton that accepts languages of infinite words. Computing the probability that the random word emitted by the MC is accepted by the automaton is a classical problem at the heart of probabilistic verification. A finite prefix may already determine whether the random infinite word is accepted, and computing the probability that such a deciding finite prefix is produced is a nontrivial diagnosability problem. The theoretical problem we study in this paper is how to catch deciding prefixes without observing the whole prefix; i.e., we want to minimize the expected number of observations and still catch all deciding prefixes.


In runtime verification a program sends messages to a monitor, which decides if the program run is faulty. Usually, runtime verification is turned off in production code because monitoring overhead is prohibitive. QVM (quality virtual machine) and ARV (adaptive runtime verification) are existing pragmatic solutions to the overhead problem, which perform best-effort monitoring within a specified overhead budget [1, 3]. ARV relies on RVSE (runtime verification with state estimation) to also compute a probability that the program run is faulty [21, 15]. We take the opposite approach: we ask for the smallest overhead achievable without compromising precision at all.

Previous Work.

Before worrying about the performance of a monitor, one might want to check if faults in a given system can be diagnosed at all. This problem has been studied under the term diagnosability, first for non-stochastic finite discrete event systems [19], which are labelled transition systems. It was shown in [14] that diagnosability can be checked in polynomial time, although the associated monitors may have exponential size. Later the notion of diagnosability was extended to stochastic discrete-event systems, which are labelled Markov chains [22]. Several notions of diagnosability in stochastic systems exist, and some of them have several names, see, e.g., [20, 4] and the references therein. Bertrand et al. [4] also compare the notions. For instance, they show that for one variant of the problem (referred to as A-diagnosability or SS-diagnosability or IF-diagnosability) a previously proposed polynomial-time algorithm is incorrect, and prove that this notion of diagnosability is PSPACE-complete. Indeed, most variants of diagnosability for stochastic systems are PSPACE-complete [4], with the notable exception of AA-diagnosability (where the monitor is allowed to diagnose wrongly with arbitrarily small probability), which can be solved in polynomial time [5].

Selective Monitoring.

In this paper, we seem to make the problem harder: since observations by a monitor come with a performance overhead, we allow the monitor to skip observations. In order to decide how many observations to skip, the monitor employs an observation policy. Skipping observations might decrease the probability of deciding (whether the current run of the system is faulty or correct). We do not study this tradeoff: we require policies to be feasible, i.e., the probability of deciding must be as high as under the policy that observes everything. We do not require the system to be diagnosable; i.e., the probability of deciding may be less than 1. Checking whether the system is diagnosable is PSPACE-complete ([4], Theorem 8).

The Cost of Decision in General Markov Chains.

The cost (of decision) is the number of observations that the policy makes during a run of the system. We are interested in minimizing the expected cost among all feasible policies. We show that if the system is diagnosable then there exists a policy with finite expected cost, i.e., the policy may stop observing after finite expected time. (The converse is not true.) Whether the infimum cost (among feasible policies) is finite is also PSPACE-complete (Theorem 14). Whether there is a feasible policy whose expected cost is smaller than a given threshold is undecidable (Theorem 15), even for diagnosable systems.

Non-Hidden Markov Chains.

We identify a class of MCs, namely non-hidden MCs, where the picture is much brighter. An MC is called non-hidden when each label identifies the state. Non-hidden MCs are always diagnosable. Moreover, we show that maximally procrastinating policies are (almost) optimal (Theorem 27). A policy is called maximally procrastinating when it skips observations up to the point where one further skip would put a decision on the current run in question. We also show that one can construct an (almost) optimal maximally procrastinating policy in polynomial time. This policy does not depend on the exact probabilities in the MC, although the expected cost under that policy does. That is, we efficiently construct a policy that is (almost) optimal regardless of the transition probabilities of the MC. We also show that the infimum cost (among all feasible policies) can be computed in polynomial time (Theorem 28). Underlying these results is a theory based on automata, in particular, checking language equivalence of DFAs.


We evaluated the algorithms presented in this paper by implementing them in Facebook Infer, and trying them on several of the most forked Java projects on GitHub. We found that, on average, selective monitoring can reduce the number of observations by half.

2 Preliminaries

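Let $S$ be a finite set. We view elements of $\mathbb{R}^S$ as vectors, more specifically as row vectors. We write $\mathbf{1}$ for the all-1 vector, i.e., the element of $\{1\}^S$. For a vector $\mu \in \mathbb{R}^S$, we denote by $\mu^\top$ its transpose, a column vector. A vector $\mu \in [0,1]^S$ is a distribution over $S$ if $\mu \mathbf{1}^\top = 1$. For $s \in S$ we write $e_s$ for the (Dirac) distribution over $S$ with $e_s(s) = 1$ and $e_s(t) = 0$ for $t \in S \setminus \{s\}$. We view elements of $\mathbb{R}^{S \times S}$ as matrices. A matrix $M \in [0,1]^{S \times S}$ is called stochastic if each row sums up to one, i.e., $M \mathbf{1}^\top = \mathbf{1}^\top$.

For a finite alphabet $\Sigma$, we write $\Sigma^*$ and $\Sigma^\omega$ for the finite and infinite words over $\Sigma$, respectively. We write $\varepsilon$ for the empty word. We represent languages $L \subseteq \Sigma^\omega$ using deterministic finite automata, and we represent probability measures $\Pr$ over $\Sigma^\omega$ using Markov chains.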

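A (discrete-time, finite-state, labelled) Markov chain (MC) is a quadruple $\mathcal{M} = (S, \Sigma, M, s_0)$ where $S$ is a finite set of states, $\Sigma$ a finite alphabet, $s_0 \in S$ an initial state, and $M : \Sigma \to [0,1]^{S \times S}$ specifies the transitions, such that $\sum_{a \in \Sigma} M(a)$ is a stochastic matrix. Intuitively, if the MC is in state $s$, then with probability $M(a)(s,s')$ it emits $a$ and moves to state $s'$. For the complexity results in this paper, we assume that all numbers in the matrices $M(a)$ for $a \in \Sigma$ are rationals given as fractions of integers represented in binary. We extend $M$ to the mapping $M : \Sigma^* \to [0,1]^{S \times S}$ with $M(a_1 \cdots a_k) = M(a_1) \cdots M(a_k)$ for $a_1, \ldots, a_k \in \Sigma$. Intuitively, if the MC is in state $s$ then with probability $M(u)(s,s')$ it emits the word $u \in \Sigma^*$ and moves (in $|u|$ steps) to state $s'$. An MC is called non-hidden if for each $a \in \Sigma$ all non-zero entries of $M(a)$ are in the same column. Intuitively, in a non-hidden MC, the emitted letter identifies the next state. An MC defines the standard probability measure $\Pr$ over $\Sigma^\omega$, uniquely defined by assigning probabilities to cylinder sets $\{u\}\Sigma^\omega$, with $u \in \Sigma^*$, as follows:

$$\Pr(\{u\}\Sigma^\omega) = e_{s_0} M(u) \mathbf{1}^\top$$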
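As an illustration of the definitions in this paragraph, the following minimal sketch computes the probability of a cylinder set as a product of the letter matrices and checks the non-hidden property. The nested-dictionary encoding, the two-state example chain, and the function names are assumptions made for this example only; they do not come from the paper.

```python
from fractions import Fraction

# Hypothetical encoding: M[a][s][t] is the probability that the chain,
# being in state s, emits letter a and moves to state t.
M = {
    "a": {0: {0: Fraction(1, 2)}, 1: {0: Fraction(1, 3)}},
    "b": {0: {1: Fraction(1, 2)}, 1: {1: Fraction(2, 3)}},
}
STATES = [0, 1]


def word_probability(M, states, start, word):
    """Probability that the chain started in `start` emits `word` first."""
    dist = {s: Fraction(0) for s in states}
    dist[start] = Fraction(1)  # Dirac distribution on the start state
    for a in word:
        nxt = {s: Fraction(0) for s in states}
        for s, p in dist.items():
            for t, pr in M[a].get(s, {}).items():
                nxt[t] += p * pr  # one row-vector times matrix step
        dist = nxt
    return sum(dist.values())  # multiply by the all-1 column vector


def is_non_hidden(M):
    """True if, for each letter, all non-zero entries lie in one column."""
    for rows in M.values():
        targets = {t for row in rows.values() for t, pr in row.items() if pr > 0}
        if len(targets) > 1:
            return False
    return True
```

In this toy chain the letter a always leads to state 0 and the letter b always leads to state 1, so the chain is non-hidden.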

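A deterministic finite automaton (DFA) is a quintuple $\mathcal{A} = (Q, \Sigma, \delta, q_0, F)$ where $Q$ is a finite set of states, $\Sigma$ a finite alphabet, $\delta : Q \times \Sigma \to Q$ a transition function, $q_0 \in Q$ an initial state, and $F \subseteq Q$ a set of accepting states. We extend $\delta$ to $\delta : Q \times \Sigma^* \to Q$ as usual. A DFA defines a language $L \subseteq \Sigma^\omega$ as follows:

$$L = \{ w \in \Sigma^\omega \mid \delta(q_0, u) \in F \text{ for some prefix } u \text{ of } w \}$$

Note that we do not require accepting states to be visited infinitely often: just once suffices. Therefore we can and will assume without loss of generality that there is $f \in Q$ with $F = \{f\}$ and $\delta(f, a) = f$ for all $a \in \Sigma$.

For the rest of the paper we fix an MC $\mathcal{M} = (S, \Sigma, M, s_0)$ and a DFA $\mathcal{A} = (Q, \Sigma, \delta, q_0, F)$. We define their composition as the MC $\mathcal{M} \times \mathcal{A} = (S \times Q, \Sigma, M', (s_0, q_0))$ where $M'(a)((s,q),(s',q'))$ equals $M(a)(s,s')$ if $q' = \delta(q,a)$ and $0$ otherwise. Thus, $\mathcal{M}$ and $\mathcal{M} \times \mathcal{A}$ induce the same probability measure $\Pr$.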
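The composition can be sketched as follows. This is only an illustration under assumed encodings: the MC is a nested dictionary, the DFA transition function is a dictionary keyed by (state, letter), and the names `compose`, `M`, `delta` and the example automaton are hypothetical.

```python
from fractions import Fraction


def compose(M, delta, dfa_states):
    """Build the letter matrices of the composition of an MC and a DFA.

    M[a][s][t] is a transition probability of the MC and delta[(q, a)] is
    the DFA successor. Product states are pairs (s, q); the DFA component
    moves deterministically, so probabilities are copied unchanged.
    """
    product = {}
    for a, rows in M.items():
        product[a] = {}
        for s, row in rows.items():
            for q in dfa_states:
                q_next = delta[(q, a)]
                product[a][(s, q)] = {(t, q_next): p for t, p in row.items()}
    return product


# Example MC with two states, and a DFA that accepts once a 'b' is seen.
M = {
    "a": {0: {0: Fraction(1, 2)}, 1: {0: Fraction(1, 3)}},
    "b": {0: {1: Fraction(1, 2)}, 1: {1: Fraction(2, 3)}},
}
delta = {("q0", "a"): "q0", ("q0", "b"): "f", ("f", "a"): "f", ("f", "b"): "f"}
PRODUCT = compose(M, delta, ["q0", "f"])
```

The accepting state `f` is a sink here, matching the assumption made above without loss of generality.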

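An observation is either a letter or the special symbol $\bot \notin \Sigma$, which stands for ‘not seen’. An observation policy $\rho : (\Sigma \cup \{\bot\})^* \to \{0,1\}$ is a (not necessarily computable) function that, given the observations made so far, says whether we should observe the next letter. An observation policy $\rho$ determines a projection $\pi_\rho : \Sigma^\omega \to (\Sigma \cup \{\bot\})^\omega$: we have $\pi_\rho(a_1 a_2 \cdots) = o_1 o_2 \cdots$ when

$$o_{n+1} = \begin{cases} a_{n+1} & \text{if } \rho(o_1 \cdots o_n) = 1 \\ \bot & \text{if } \rho(o_1 \cdots o_n) = 0 \end{cases} \qquad \text{for all } n \ge 0$$

We denote the see-all policy by $\bullet$; thus, $\pi_\bullet(w) = w$.

In the rest of the paper we reserve $a$ for letters, $o$ for observations, $u$ for finite words, $w$ for infinite words, $\upsilon$ for finite observation prefixes, $s$ for states from an MC, and $q$ for states from a DFA. We write $o_1 \sim o_2$ when $o_1$ and $o_2$ are the same or at least one of them is $\bot$. We lift this relation to (finite and infinite) sequences of observations (of the same length). We write $w \gtrsim \upsilon$ when $u \sim \upsilon$ holds for the length-$|\upsilon|$ prefix $u$ of $w$.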
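The projection induced by a policy and the compatibility relation between observation sequences can be sketched on finite words as follows. The use of `None` for the 'not seen' symbol, the function names, and the `every_other` policy are assumptions of this example.

```python
SKIP = None  # stands for the 'not seen' symbol


def project(policy, word):
    """Apply an observation policy to a finite word.

    policy(prefix) receives the tuple of observations made so far and
    returns True if the next letter should be observed.
    """
    observed = []
    for letter in word:
        observed.append(letter if policy(tuple(observed)) else SKIP)
    return observed


def compatible(xs, ys):
    """Pointwise compatibility of two observation sequences of equal length."""
    return len(xs) == len(ys) and all(
        x == y or x is SKIP or y is SKIP for x, y in zip(xs, ys)
    )


def see_all(prefix):
    """The policy that observes every letter."""
    return True


def every_other(prefix):
    """A hypothetical policy that skips every letter at an even position."""
    return len(prefix) % 2 == 1
```

For instance, `project(every_other, "abab")` yields `[None, "b", None, "b"]`, which is compatible with the word abab but not with abaa.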

We say that $\upsilon$ is negatively deciding when $\Pr(\{w \gtrsim \upsilon \mid w \in L\}) = 0$. Intuitively, $\upsilon$ is negatively deciding when $\upsilon$ is incompatible (up to a null set) with $L$. Similarly, we say that $\upsilon$ is positively deciding when $\Pr(\{w \gtrsim \upsilon \mid w \notin L\}) = 0$. An observation prefix $\upsilon$ is deciding when it is positively or negatively deciding. An observation policy $\rho$ decides $w$ when $\pi_\rho(w)$ has a deciding prefix. A monitor is an interactive algorithm that implements an observation policy: it processes a stream of letters and, after each letter, it replies with one of ‘yes’, ‘no’, or ‘skip $n$ letters’, where $n \in \mathbb{N} \cup \{\infty\}$.

Lemma 1.

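For any $w$, if some policy decides $w$ then $\bullet$ decides $w$.

Proof.

Let $\rho$ decide $w$. Then there is a deciding prefix $\upsilon$ of $\pi_\rho(w)$. Suppose $\upsilon$ is positively deciding, i.e., $\Pr(\{w' \gtrsim \upsilon \mid w' \notin L\}) = 0$. Let $u$ be the length-$|\upsilon|$ prefix of $w$. Then $\Pr(\{w' \gtrsim u \mid w' \notin L\}) = 0$, since $\upsilon$ can be obtained from $u$ by possibly replacing some letters with $\bot$. Hence $u$ is also positively deciding. Since $u$ is a prefix of $w = \pi_\bullet(w)$, we have that $\bullet$ decides $w$. The case where $\upsilon$ is negatively deciding is similar. ∎

It follows that $\max_\rho \Pr(\{w \mid \rho \text{ decides } w\}) = \Pr(\{w \mid \bullet \text{ decides } w\})$. We say that a policy $\rho$ is feasible when it also attains the maximum, i.e., when

$$\Pr(\{w \mid \rho \text{ decides } w\}) = \Pr(\{w \mid \bullet \text{ decides } w\})$$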

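Equivalently, $\rho$ is feasible when $\Pr(\{w \mid \bullet \text{ decides } w \text{ implies } \rho \text{ decides } w\}) = 1$, i.e., almost all words that are decided by the see-all policy are also decided by $\rho$. If $\upsilon$ is the shortest prefix of $\pi_\rho(w)$ that is deciding, then the cost of decision $C_\rho(w)$ is the number of non-$\bot$ observations in $\upsilon$. This paper is about finding feasible observation policies $\rho$ that minimize $\mathrm{Ex}(C_\rho)$, the expectation of the cost of decision with respect to $\Pr$.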

3 Qualitative Analysis of Observation Policies

In this section we study properties of observation policies that are qualitative, i.e., not directly related to the cost of decision. We focus on properties of observation prefixes that a policy may produce.

Observation Prefixes.

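We have already defined deciding observation prefixes. We now define several other types of prefixes: enabled, confused, very confused, and finitary. A prefix $\upsilon$ is enabled if it occurs with positive probability, $\Pr(\{w \gtrsim \upsilon\}) > 0$. Intuitively, the other types of prefixes $\upsilon$ are defined in terms of what would happen if we were to observe all from now on: if it is not almost sure that eventually a deciding prefix is reached, then we say $\upsilon$ is confused; if it is almost sure that a deciding prefix will not be reached, then we say $\upsilon$ is very confused; if it is almost sure that eventually a deciding or very confused prefix is reached, then we say $\upsilon$ is finitary. To say this formally, let us make a few notational conventions: for an observation prefix $\upsilon$, we write $\Pr(\upsilon)$ as a shorthand for $\Pr(\{w \gtrsim \upsilon\})$; for a set $\Upsilon$ of observation prefixes, we write $\Pr(\Upsilon)$ as a shorthand for $\Pr(\bigcup_{\upsilon \in \Upsilon} \{w \gtrsim \upsilon\})$. With these conventions, we define, with $u$ ranging over $\Sigma^*$:

  1. $\upsilon$ is confused when $\Pr(\{\upsilon u \mid \upsilon u \text{ deciding}\}) < \Pr(\upsilon)$

  2. $\upsilon$ is very confused when $\Pr(\{\upsilon u \mid \upsilon u \text{ deciding}\}) = 0$

  3. $\upsilon$ is finitary when $\Pr(\{\upsilon u \mid \upsilon u \text{ deciding or very confused}\}) = \Pr(\upsilon)$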

Observe that (a) confused implies enabled, (b) deciding implies not confused, and (c) enabled and very confused implies confused. The following are alternative equivalent definitions:

  1.  is confused when

  2.  is very confused when is non-deciding for all enabled

  3.  is finitary when

Example 2.

Consider the MC and the DFA depicted here:

All observation prefixes that do not start with  are enabled. The observation prefixes and and, in fact, all observation prefixes that contain , are positively deciding. For all we have and , so  is not deciding. If the MC takes the right transition first then almost surely it emits  at some point. Thus . Hence  is confused. In this example only non-enabled observation prefixes are very confused. It follows that  is not finitary.


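For any $s \in S$ we write $\Pr_s$ for the probability measure of the MC $\mathcal{M}_s$ obtained from $\mathcal{M}$ by making $s$ the initial state. For any $q \in Q$ we write $L_q$ for the language of the DFA $\mathcal{A}_q$ obtained from $\mathcal{A}$ by making $q$ the initial state. We call a pair $(s,q) \in S \times Q$ negatively deciding when $\Pr_s(L_q) = 0$; similarly, we call $(s,q)$ positively deciding when $\Pr_s(L_q) = 1$. A subset $B \subseteq S \times Q$ is called belief. We call a belief negatively (positively, respectively) deciding when all its elements are. We fix the notation $B_0 := \{(s_0, q_0)\}$ (for the initial belief) for the remainder of the paper. Define the belief NFA as the NFA $\mathcal{B}$ with state set $S \times Q$, alphabet $\Sigma \cup \{\bot\}$, initial state $(s_0, q_0)$, and transition function $\Delta$ with:

$$\Delta((s,q), a) = \{(s', q') \mid M(a)(s,s') > 0,\ \delta(q,a) = q'\} \ \text{ for } a \in \Sigma, \qquad \Delta((s,q), \bot) = \bigcup_{a \in \Sigma} \Delta((s,q), a)$$

We extend the transition function $\Delta$ to beliefs and observation prefixes, $\Delta : 2^{S \times Q} \times (\Sigma \cup \{\bot\})^* \to 2^{S \times Q}$, in the way that is usual for NFAs. Intuitively, if belief $B$ is the set of states where the product $\mathcal{M} \times \mathcal{A}$ could be now, then $\Delta(B, \upsilon)$ is the belief adjusted by additionally observing $\upsilon$. To reason about observation prefixes $\upsilon$ algorithmically, it will be convenient to reason about the belief $\Delta(B_0, \upsilon)$.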
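Tracking a belief along a sequence of observations can be sketched as a subset construction, where a skipped observation allows every letter. The encodings, the example chain and automaton, and the names `step` and `run` are assumptions of this sketch, not the paper's notation.

```python
def step(M, delta, belief, obs):
    """Update a belief (a set of (MC state, DFA state) pairs) by one observation.

    obs is a letter, or None if the letter was not seen; in the latter case
    every letter is considered possible.
    """
    letters = list(M.keys()) if obs is None else [obs]
    result = set()
    for (s, q) in belief:
        for a in letters:
            for t, pr in M[a].get(s, {}).items():
                if pr > 0:  # only positive-probability transitions count
                    result.add((t, delta[(q, a)]))
    return frozenset(result)


def run(M, delta, belief, observations):
    """Update a belief by a finite sequence of observations."""
    for obs in observations:
        belief = step(M, delta, belief, obs)
    return belief


# Hypothetical example: a two-state MC and a DFA that accepts once 'b' occurs.
M = {
    "a": {0: {0: 0.5}, 1: {0: 1 / 3}},
    "b": {0: {1: 0.5}, 1: {1: 2 / 3}},
}
delta = {("q0", "a"): "q0", ("q0", "b"): "f", ("f", "a"): "f", ("f", "b"): "f"}
B0 = frozenset({(0, "q0")})
```

An empty result means that the observation prefix is not enabled.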

We define confused, very confused, and finitary beliefs as follows:

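  1. $B$ is confused when $\Pr_s(\{uw \mid \Delta(B,u) \text{ is deciding}\}) < 1$ for some $(s,q) \in B$

  2. $B$ is very confused when $\Delta(B,u)$ is empty or not deciding for all $u \in \Sigma^*$

  3. $B$ is finitary when $\Pr_s(\{uw \mid \Delta(B,u) \text{ is deciding or very confused}\}) = 1$ for all $(s,q) \in B$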

Example 3.

In 2 we have , and for all , and , and , and for all that contain . The latter belief is positively deciding. We have , so any belief that contains is confused. Also, is confused as .

Relation Between Observation Prefixes and Beliefs.

By the following lemma, the corresponding properties of observation prefixes and beliefs are closely related.

Lemma 4.

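Let $\upsilon$ be an observation prefix.

  1. $\upsilon$ is enabled if and only if $\Delta(B_0, \upsilon) \ne \emptyset$.

  2. $\upsilon$ is negatively deciding if and only if $\Delta(B_0, \upsilon)$ is negatively deciding.

  3. $\upsilon$ is positively deciding if and only if $\Delta(B_0, \upsilon)$ is positively deciding.

  4. $\upsilon$ is confused if and only if $\Delta(B_0, \upsilon)$ is confused.

  5. $\upsilon$ is very confused if and only if $\Delta(B_0, \upsilon)$ is very confused.

  6. $\upsilon$ is finitary if and only if $\Delta(B_0, \upsilon)$ is finitary.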

The following lemma gives complexity bounds for computing these properties.

Lemma 5.

Let $\upsilon$ be an observation prefix, and $B$ a belief.

  1. Whether $\upsilon$ is enabled can be decided in P.

  2. Whether $\upsilon$ (or $B$) is negatively deciding can be decided in P.

  3. Whether $\upsilon$ (or $B$) is positively deciding can be decided in P.

  4. Whether $\upsilon$ (or $B$) is confused can be decided in PSPACE.

  5. Whether $\upsilon$ (or $B$) is very confused can be decided in PSPACE.

  6. Whether $\upsilon$ (or $B$) is finitary can be decided in PSPACE.

Proof sketch.

The belief NFA $\mathcal{B}$ and the MC $\mathcal{M} \times \mathcal{A}$ can be computed in polynomial time (even in deterministic logspace). For items 1–3, there are efficient graph algorithms that search these product structures. For instance, to show that a given pair $(s,q)$ is not negatively deciding, it suffices to show that $\mathcal{M} \times \mathcal{A}$ has a path from $(s,q)$ to a state $(s',f)$ for some $s' \in S$. This can be checked in polynomial time (even in NL).

For items 4–6, one searches the (exponential-sized) product of $\mathcal{M}$ and the determinization of $\mathcal{B}$. This can be done in PSPACE. For instance, to show that a given belief $B$ is confused, it suffices to show that there are $(s,q) \in B$ and $u_1 \in \Sigma^*$ and $s_1 \in S$ such that $\mathcal{M}$ has a $u_1$-labelled path from $s$ to $s_1$ such that there do not exist $u_2 \in \Sigma^*$ and $s_2 \in S$ such that $\mathcal{M}$ has a $u_2$-labelled path from $s_1$ to $s_2$ such that $\Delta(B, u_1 u_2)$ is deciding. This can be checked in NPSPACE = PSPACE by nondeterministically guessing paths in the product of $\mathcal{M}$ and the determinization of $\mathcal{B}$. ∎
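The graph searches behind items 2 and 3 can be sketched as follows. The sketch assumes that the positive-probability transitions of the composition are given as a successor map `succ` and that `accepting` is the set of product states whose DFA component is the accepting sink; both names, and the three-node example, are hypothetical. It relies on the standard fact that in a finite MC a closed target set is reached almost surely exactly when it stays reachable from every reachable state.

```python
from collections import deque


def reachable(succ, start):
    """Set of nodes reachable from start (including start) by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in succ.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def negatively_deciding(succ, pair, accepting):
    """No accepting product state is reachable, so acceptance has probability 0."""
    return not any(x in accepting for x in reachable(succ, pair))


def positively_deciding(succ, pair, accepting):
    """From every reachable state an accepting state remains reachable."""
    return all(
        any(y in accepting for y in reachable(succ, x))
        for x in reachable(succ, pair)
    )


# Toy product graph: x may move to y (accepting sink) or to z (rejecting sink).
succ = {"x": ["x", "y", "z"], "y": ["y"], "z": ["z"]}
accepting = {"y"}
```

Here `x` is neither positively nor negatively deciding, `y` is positively deciding, and `z` is negatively deciding.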


We call a policy a diagnoser when it decides almost surely.

Example 6.

In Example 2 a diagnoser does not exist. Indeed, the policy $\bullet$ does not decide when the MC takes the left transition, and decides (positively) almost surely when the MC takes the right transition in the first step. Hence $\Pr(\{w \mid \bullet \text{ decides } w\}) < 1$. So $\bullet$ is not a diagnoser. By Lemma 1, it follows that there is no diagnoser.

Diagnosability can be characterized by the notion of confusion:

Proposition 7.

There exists a diagnoser if and only if the empty observation prefix $\varepsilon$ is not confused.

The following proposition shows that diagnosability is hard to check.

Theorem 8 (cf. [4, Theorem 6]).

Given an MC $\mathcal{M}$ and a DFA $\mathcal{A}$, it is PSPACE-complete to check if there exists a diagnoser.

Theorem 8 essentially follows from a result by Bertrand et al. [4]. They study several different notions of diagnosability; one of them (FA-diagnosability) is very similar to our notion of diagnosability. There are several small differences; e.g., their systems are not necessarily products of an MC and a DFA. Therefore we give a self-contained proof of Theorem 8.

Proof sketch.

By Proposition 7 it suffices to show PSPACE-completeness of checking whether the empty observation prefix $\varepsilon$ is confused. Membership in PSPACE follows from Lemma 5.4. For hardness we reduce from the following problem: given an NFA where all states are initial and accepting, does it accept all (finite) words? This problem is PSPACE-complete [16, Lemma 6]. ∎

Allowing Confusion.

We say an observation policy allows confusion when, with positive probability, it produces an observation prefix $\upsilon\bot$ such that $\upsilon\bot$ is confused but $\upsilon$ is not.

Proposition 9.

A feasible observation policy does not allow confusion.

Hence, in order to be feasible, a policy must observe when it would get confused otherwise. In § 5 we show that in the non-hidden case there is almost a converse of Proposition 9; i.e., in order to be feasible, a policy need not do much more than not allow confusion.

4 Analyzing the Cost of Decision

In this section we study the computational complexity of finding feasible policies that minimize the expected cost of decision. We focus on the decision version of the problem: Is there a feasible policy whose expected cost is smaller than a given threshold? Define the infimum cost:

$$c_{\mathit{inf}} := \inf_{\text{feasible } \rho} \mathrm{Ex}(C_\rho)$$

Since the see-all policy  never stops observing, we have , so . However, once an observation prefix  is deciding or very confused, there is no point in continuing observation. Hence, we define a light see-all policy , which observes until the observation prefix  is deciding or very confused; formally, if and only if  is deciding or very confused. It follows from the definition of very confused that the policy  is feasible. Concerning the cost  we have for all 


where if the length- prefix of  is deciding or very confused, and otherwise. The following results are proved in the appendix:

Lemma 10.

If the empty observation prefix $\varepsilon$ is finitary then the expected cost of the light see-all policy is finite.

Lemma 11.

Let be a feasible observation policy. If then  is finitary.

Proposition 12.

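The infimum cost $c_{\mathit{inf}}$ is finite if and only if the empty observation prefix $\varepsilon$ is finitary.

Proposition 13.

If a diagnoser exists then $c_{\mathit{inf}}$ is finite.

Theorem 14.

It is PSPACE-complete to check if $c_{\mathit{inf}} < \infty$.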

Lemma 10 holds because a bottom strongly connected component is reached in expected finite time. Lemma 11 says that a kind of converse holds for feasible policies. Proposition 12 follows from Lemmas 11 and 10. Proposition 13 follows from Propositions 12 and 7. To show Theorem 14, we use Proposition 12 and adapt the proof of Theorem 8.

The main negative result of the paper is that one cannot compute the infimum cost $c_{\mathit{inf}}$:

Theorem 15.

It is undecidable to check if $c_{\mathit{inf}}$ is smaller than a given threshold, even when a diagnoser exists.

Proof sketch.

By a reduction from the undecidable problem whether a given probabilistic automaton accepts some word with probability . The proof is somewhat complicated. In fact, in the appendix we give two versions of the proof: a short incorrect one (with the correct main idea) and a long correct one. ∎

5 The Non-Hidden Case

Now we turn to positive results. In the rest of the paper we assume that the MC  is non-hidden, i.e., there exists a function such that implies . We extend  to finite words so that . We write to indicate that there is  with .

Example 16.

Consider the following non-hidden MC and DFA:

 is the initial belief. The beliefs and  are not confused: indeed, is negatively deciding, and is positively deciding. The belief  is confused, as there is no for which is deciding. Finally,  is very confused.

We will show that in the non-hidden case there always exists a diagnoser (Lemma 23). It follows that feasible policies need to decide almost surely and, by Proposition 13, that the infimum cost $c_{\mathit{inf}}$ is finite. We have seen in Proposition 9 that feasible policies do not allow confusion. In this section we construct policies that procrastinate so much that they avoid confusion just barely. We will see that such policies have an expected cost that comes arbitrarily close to $c_{\mathit{inf}}$.

Language Equivalence.

We characterize confusion by language equivalence in a certain DFA. Consider the belief NFA $\mathcal{B}$. In the non-hidden case, if we disallow $\bot$-transitions then $\mathcal{B}$ becomes a DFA $\mathcal{B}'$. For $\mathcal{B}'$ we define a set of accepting states by $F' := \{(s,q) \mid \Pr_s(L_q) = 1\}$.

Example 17.

For the previous example, a part of the DFA  looks as follows:

States that are unreachable from are not drawn here.

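We associate with each $(s,q)$ the language $L_{s,q} \subseteq \Sigma^*$ that $\mathcal{B}'$ accepts starting from initial state $(s,q)$. We call $(s,q), (s',q')$ language equivalent, denoted by $(s,q) \approx (s',q')$, when $L_{s,q} = L_{s',q'}$.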

Lemma 18.

One can compute the relation $\approx$ in polynomial time.

Proof.

For any $(s,q)$ one can use standard MC algorithms to check in polynomial time if $\Pr_s(L_q) = 1$ (using a graph search in the composition $\mathcal{M} \times \mathcal{A}$, as in the proof of Lemma 5.3). Language equivalence in the DFA $\mathcal{B}'$ can be computed in polynomial time by minimization. ∎
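Language equivalence of DFA states can be computed, for example, by Moore-style partition refinement, which is one standard way to carry out minimization; the paper does not say which minimization algorithm is used. The sketch below assumes a possibly partial transition map keyed by (state, letter), where a missing transition leads to an implicit rejecting sink. All names and the four-state example are hypothetical.

```python
def language_equivalence(states, alphabet, delta, accepting):
    """Map each state to a class id; equal ids mean equal accepted languages."""
    sink = object()  # implicit rejecting sink for missing transitions
    all_states = list(states) + [sink]
    # Start from the partition into accepting and non-accepting states.
    block = {x: (x in accepting) for x in all_states}
    while True:
        # A state's signature is its own block plus the blocks of its successors.
        signature = {
            x: (block[x],) + tuple(block[delta.get((x, a), sink)] for a in alphabet)
            for x in all_states
        }
        ids = {}
        new_block = {x: ids.setdefault(signature[x], len(ids)) for x in all_states}
        if len(ids) == len(set(block.values())):
            # No block was split, so the partition is stable.
            return {x: new_block[x] for x in states}
        block = new_block


# States 1 and 2 both accept exactly the word "a"; state 0 accepts "aa" and "ba".
EQ = language_equivalence(
    states=[0, 1, 2, 3],
    alphabet="ab",
    delta={(0, "a"): 1, (0, "b"): 2, (1, "a"): 3, (2, "a"): 3},
    accepting={3},
)
```

Each round refines the previous partition, so the loop stops after at most as many rounds as there are states, which keeps the computation polynomial.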

We call a belief $B$ settled when all $(s,q) \in B$ are language equivalent.

Lemma 19.

A belief $B$ is confused if and only if there is $a \in \Sigma$ such that $\Delta(B, a)$ is not settled.

It follows that one can check in polynomial time whether a given belief is confused. We generalize this fact in Lemma 22 below.

Example 20.

In 16 the belief is not settled. Indeed, from the DFA in 17 we see that . Since , by 19, the belief  is confused.


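For a belief $B$ and $k \in \mathbb{N}$, if $\Delta(B, \bot^k)$ is confused then so is $\Delta(B, \bot^{k+1})$. We define:

$$\mathit{cras}(B) := \sup \{ k \in \mathbb{N} \mid \Delta(B, \bot^k) \text{ is not confused} \}$$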

We set if is confused. We may write for .

Example 21.

In 16 we have and and and .

Lemma 22.

Given a belief , one can compute  in polynomial time. Further, if is finite then .


Let . By 19, is confused if and only if:

This holds if and only if there is with such that:

Let  be the directed graph with nodes in and edges

Also define the following set of nodes:

By 18 one can compute  in polynomial time. It follows from the argument above that is confused if and only if there are such that there is a length- path in  from to a node in . Let be the length of the shortest such path, and set if no such path exists. Then  can be computed in polynomial time by a search of the graph , and we have . ∎

The Procrastination Policy.

For any belief $B$ and any observation prefix $\upsilon$, the language equivalence classes represented in $\Delta(B, \upsilon)$ depend only on $\upsilon$ and the language equivalence classes in $B$. Therefore, when tracking beliefs along observations, we may restrict $B$ to a single representative of each equivalence class. We denote this operation by $[B]$. A belief $B$ is settled if and only if $|[B]| \le 1$.

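A procrastination policy $\rho_{\mathit{pro}}(K)$ is parameterized with (a large) $K \in \mathbb{N}$. Define (and precompute) $k(s,q) := \min\{\mathit{cras}(s,q), K\}$ for all $(s,q)$. We define $\rho_{\mathit{pro}}(K)$ by the following monitor that implements it, starting with $(s,q) = (s_0, q_0)$:

  1. while $(s,q)$ is not deciding:

    1. skip $k(s,q)$ observations, then observe a letter $a$

    2. $\{(s',q')\} := [\Delta(\{(s,q)\}, \bot^{k(s,q)} a)]$;

    3. $(s,q) := (s',q')$;

  2. output yes/no decision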

It follows from the definition of and 19 that is indeed a singleton for all . We have:
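The monitor loop above can be sketched as follows. Everything the loop depends on is passed in as a parameter and is assumed to be precomputed elsewhere: `skip` (the number of letters to skip per pair), `successor` (the single representative pair after the skipped letters and the observed one), and `verdict` (the decision for deciding pairs). These names and the toy instance are hypothetical, and the letter stream is assumed to be long enough for a decision to be reached.

```python
def procrastination_monitor(letters, start, skip, successor, verdict):
    """Run a procrastinating monitor on an iterator of letters.

    skip[pair]            number of letters to let pass unobserved
    successor(pair, k, a) pair reached after k skipped letters and observed a
    verdict(pair)         True or False if the pair is deciding, else None
    Returns the decision and the number of observations made.
    """
    pair, observations = start, 0
    while verdict(pair) is None:
        k = skip[pair]
        for _ in range(k):
            next(letters)  # the letter passes without being observed
        a = next(letters)  # the single observation of this round
        observations += 1
        pair = successor(pair, k, a)
    return verdict(pair), observations


# Toy instance: skip two letters, then the third letter decides the run.
toy_skip = {"start": 2}
toy_verdict = {"start": None, "yes": True, "no": False}.get


def toy_successor(pair, k, a):
    return "yes" if a == "a" else "no"
```

On the stream b, b, a, ... the toy monitor skips two letters, observes a, and answers yes after a single observation.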

Lemma 23.

For all the procrastination policy is a diagnoser.


For a non-hidden MC  and a DFA , there is at most one successor for on letter  in the belief NFA , for all . Then, by 19, singleton beliefs are not confused, and in particular the initial belief  is not confused. By 4.4, is not confused, which means that . Since almost surely a deciding word  is produced and since whenever , it follows that eventually an observation prefix  is produced such that contains a deciding pair . But, as remarked above, is settled, so it is deciding. ∎

The Procrastination MC .

The policy  produces a (random, almost surely finite) word with . Indeed, the observations that  makes can be described by an MC. Recall that we have previously defined a composition MC . Now define an MC where is a fresh letter and the transitions are as follows: when is deciding then , and when is not deciding then

where the matrix is powered by . The MC  may not be non-hidden, but could be made non-hidden by (i) collapsing all language equivalent in the natural way, and (ii) redirecting all -labelled transition to a new state  that has a self-loop. In the understanding that

indicates ‘decision made’, the probability distribution defined by the MC 

coincides with the probability distribution on sequences of non- observations made by .

Example 24.

For 16 the MC  for is as follows:

Here the lower number in a state indicate the number. The left state is negatively deciding, and the right state is positively deciding. The policy  skips the first observation and then observes either or , each with probability , each leading to a deciding belief.

Maximal Procrastination is Optimal.

The following lemma states, loosely speaking, that when a belief with is reached and  is large, then a single further observation is expected to suffice for a decision.

Lemma 25.

Let denote the expected cost of decision under starting in . For each there exists such that for all with we have .

Proof sketch.

The proof is a quantitative version of the proof of 23. The singleton belief is not confused. Thus, if is large then with high probability the belief (for the observed next letter ) contains a deciding pair . But if then, by 19, is settled, so if contains a deciding pair then is deciding. ∎

Example 26.

Consider the following variant of the previous example:

The MC  for is as follows:

The left state is negatively deciding, and the right state is positively deciding. We have and .

Now we can prove the main positive result of the paper:

Theorem 27.

For any feasible policy  there is such that:

Proof sketch.

Let  be a feasible policy. We choose , so, by 22,  coincides with until time, say, when encounters a pair with . (The time  may, with positive probability, never come.) Let us compare with  up to time . For , define and as the observation prefixes obtained by and , respectively, after  steps. Write and for the number of non- observations in and , respectively. For beliefs we write when for all there is with . One can show by induction that we have for all :

If time  does not come then the inequality from above suffices. Similarly, if at time  the pair is deciding, we are also done. If after time  the procrastination policy  observes at least one more letter then  also observes at least one more letter. By 25, one can choose large so that for one additional observation probably suffices. If it is the case that  almost surely observes only one letter after , then also needs only one more observation, since it has observed at time . ∎

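It follows that, in order to compute the infimum cost $c_{\mathit{inf}}$, it suffices to analyze the expected cost of the procrastination policy for large $K$. This leads to the following theorem:

Theorem 28.

Given a non-hidden MC $\mathcal{M}$ and a DFA $\mathcal{A}$, one can compute $c_{\mathit{inf}}$ in polynomial time.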


For each define as in 25, and define . By 25, for each non-deciding with we have . Hence the satisfy the following system of linear equations where some coefficients come from the procrastination MC :