We study selective monitors for labelled Markov chains. Monitors observe the outputs that are generated by a Markov chain during its run, with the goal of identifying runs as correct or faulty. A monitor is selective if it skips observations in order to reduce monitoring overhead. We are interested in monitors that minimize the expected number of observations. We establish an undecidability result for selectively monitoring general Markov chains. On the other hand, we show for non-hidden Markov chains (where any output identifies the state the Markov chain is in) that simple optimal monitors exist and can be computed efficiently, based on DFA language equivalence. These monitors do not depend on the precise transition probabilities in the Markov chain. We report on experiments where we compute these monitors for several open-source Java projects.
Consider an MC (Markov chain) whose transitions are labelled with letters, and a finite automaton that accepts languages of infinite words. Computing the probability that the random word emitted by the MC is accepted by the automaton is a classical problem at the heart of probabilistic verification. A finite prefix may already determine whether the random infinite word is accepted, and computing the probability that such a deciding finite prefix is produced is a nontrivial diagnosability problem. The theoretical problem we study in this paper is how to catch deciding prefixes without observing the whole prefix; i.e., we want to minimize the expected number of observations and still catch all deciding prefixes.
In runtime verification a program sends messages to a monitor, which decides if the program run is faulty. Usually, runtime verification is turned off in production code because monitoring overhead is prohibitive. QVM (quality virtual machine) and ARV (adaptive runtime verification) are existing pragmatic solutions to the overhead problem, which perform best-effort monitoring within a specified overhead budget [1, 3]. ARV relies on RVSE (runtime verification with state estimation) to also compute a probability that the program run is faulty [21, 15]. We take the opposite approach: we ask for the smallest overhead achievable without compromising precision at all.
Before worrying about the performance of a monitor, one might want to check if faults in a given system can be diagnosed at all. This problem has been studied under the term diagnosability, first for non-stochastic finite discrete event systems, which are labelled transition systems. It was shown that diagnosability can be checked in polynomial time, although the associated monitors may have exponential size. Later the notion of diagnosability was extended to stochastic discrete-event systems, which are labelled Markov chains. Several notions of diagnosability in stochastic systems exist, and some of them have several names; see, e.g., [20, 4] and the references therein. Bertrand et al. also compare the notions. For instance, they show that for one variant of the problem (referred to as A-diagnosability or SS-diagnosability or IF-diagnosability) a previously proposed polynomial-time algorithm is incorrect, and prove that this notion of diagnosability is PSPACE-complete. Indeed, most variants of diagnosability for stochastic systems are PSPACE-complete, with the notable exception of AA-diagnosability (where the monitor is allowed to diagnose wrongly with arbitrarily small probability), which can be solved in polynomial time.
In this paper, we seem to make the problem harder: since observations by a monitor come with a performance overhead, we allow the monitor to skip observations. In order to decide how many observations to skip, the monitor employs an observation policy. Skipping observations might decrease the probability of deciding (whether the current run of the system is faulty or correct). We do not study this tradeoff: we require policies to be feasible, i.e., the probability of deciding must be as high as under the policy that observes everything. We do not require the system to be diagnosable; i.e., the probability of deciding may be less than 1. Checking whether the system is diagnosable is PSPACE-complete (8).
The cost (of decision) is the number of observations that the policy makes during a run of the system. We are interested in minimizing the expected cost among all feasible policies. We show that if the system is diagnosable then there exists a policy with finite expected cost, i.e., the policy may stop observing after finite expected time. (The converse is not true.) Whether the infimum cost (among feasible policies) is finite is also PSPACE-complete (14). Whether there is a feasible policy whose expected cost is smaller than a given threshold is undecidable (15), even for diagnosable systems.
We identify a class of MCs, namely non-hidden MCs, where the picture is much brighter. An MC is called non-hidden when each label identifies the state. Non-hidden MCs are always diagnosable. Moreover, we show that maximally procrastinating policies are (almost) optimal (27). A policy is called maximally procrastinating when it skips observations up to the point where one further skip would put a decision on the current run in question. We also show that one can construct an (almost) optimal maximally procrastinating policy in polynomial time. This policy does not depend on the exact probabilities in the MC, although the expected cost under that policy does. That is, we efficiently construct a policy that is (almost) optimal regardless of the transition probabilities of the MC. We also show that the infimum cost (among all feasible policies) can be computed in polynomial time (28). Underlying these results is a theory based on automata, in particular, checking language equivalence of DFAs.
We evaluated the algorithms presented in this paper by implementing them in Facebook Infer and trying them on some of the most forked Java projects on GitHub. We found that, on average, selective monitoring can reduce the number of observations to about half.
Let S be a finite set. We view elements of ℝ^S as vectors, more specifically as row vectors. We write 1 for the all-1 vector, i.e., the element (1, …, 1) of ℝ^S. For a vector x ∈ ℝ^S, we denote by xᵀ its transpose, a column vector. A vector μ ∈ [0,1]^S is a distribution over S if μ · 1ᵀ = 1. For s ∈ S we write δ_s for the (Dirac) distribution over S with δ_s(s) = 1 and δ_s(t) = 0 for t ≠ s. We view elements of ℝ^{S×S} as matrices. A matrix M ∈ [0,1]^{S×S} is called stochastic if each row sums up to one, i.e., M · 1ᵀ = 1ᵀ.
For a finite alphabet Σ, we write Σ* and Σ^ω for the finite and infinite words over Σ, respectively. We write ε for the empty word. We represent languages L ⊆ Σ^ω using deterministic finite automata, and we represent probability measures over Σ^ω using Markov chains.
A (discrete-time, finite-state, labelled) Markov chain (MC) is a quadruple M = (S, Σ, s₀, M) where S is a finite set of states, Σ a finite alphabet, s₀ ∈ S an initial state, and M : Σ → [0,1]^{S×S} specifies the transitions, such that ∑_{a∈Σ} M(a) is a stochastic matrix. Intuitively, if the MC is in state s, then with probability M(a)(s, t) it emits a and moves to state t. For the complexity results in this paper, we assume that all numbers in the matrices M(a) for a ∈ Σ are rationals given as fractions of integers represented in binary. We extend M to the mapping M : Σ* → [0,1]^{S×S} with M(a₁ ⋯ aₖ) = M(a₁) ⋯ M(aₖ) for a₁, …, aₖ ∈ Σ. Intuitively, if the MC is in state s then with probability M(u)(s, t) it emits the word u and moves (in |u| steps) to state t. An MC is called non-hidden if for each a ∈ Σ all non-zero entries of M(a) are in the same column. Intuitively, in a non-hidden MC, the emitted letter identifies the next state. An MC defines the standard probability measure Pr over Σ^ω, uniquely defined by assigning probabilities to cylinder sets uΣ^ω, with u ∈ Σ*, as follows: Pr(uΣ^ω) = δ_{s₀} M(u) 1ᵀ.
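The definitions above can be sketched directly in code. The following is a minimal illustration (not the paper's implementation; all names such as `MC` and `prob_prefix` are made up here): for each letter a, `M[a][s][t]` holds the probability of emitting a and moving from s to t, and the probability of a cylinder set uΣ^ω is the total mass left after multiplying out the word u. Every state must carry a (possibly empty) row for each letter.

```python
class MC:
    """A labelled Markov chain with states 0..n-1 (illustrative sketch)."""

    def __init__(self, matrices, init):
        self.M = matrices   # dict: letter -> {s: {t: prob}}
        self.init = init    # initial state s0
        states = set(matrices[next(iter(matrices))])
        for s in states:    # check that sum_a M(a) is stochastic
            total = sum(p for Ma in matrices.values() for p in Ma[s].values())
            assert abs(total - 1.0) < 1e-9, "rows must sum to 1 over all letters"

    def dist_after(self, word):
        """Sub-distribution over states after emitting `word` from s0."""
        dist = {self.init: 1.0}
        for a in word:
            Ma, nxt = self.M[a], {}
            for s, p in dist.items():
                for t, q in Ma.get(s, {}).items():
                    nxt[t] = nxt.get(t, 0.0) + p * q
            dist = nxt
        return dist

    def prob_prefix(self, word):
        """Probability of the cylinder set word . Sigma^omega."""
        return sum(self.dist_after(word).values())
```

For example, a two-state non-hidden MC where state 0 emits a (to state 1) or b (staying) with probability 1/2 each, and state 1 emits a forever, assigns probability 1/4 to the cylinder baΣ^ω.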
A deterministic finite automaton (DFA) is a quintuple A = (Q, Σ, δ, q₀, F) where Q is a finite set of states, Σ a finite alphabet, δ : Q × Σ → Q a transition function, q₀ ∈ Q an initial state, and F ⊆ Q a set of accepting states. We extend δ to δ : Q × Σ* → Q as usual. A DFA defines a language L ⊆ Σ^ω as follows: L = {a₁a₂⋯ ∈ Σ^ω : δ(q₀, a₁ ⋯ aₙ) ∈ F for some n}.
Note that we do not require accepting states to be visited infinitely often: just once suffices. Therefore we can and will assume without loss of generality that there is q_f ∈ Q with F = {q_f} and δ(q_f, a) = q_f for all a ∈ Σ.
For the rest of the paper we fix an MC M = (S, Σ, s₀, M) and a DFA A = (Q, Σ, δ, q₀, F). We define their composition as the MC M ⊗ A = (S × Q, Σ, (s₀, q₀), M⊗), where M⊗(a)((s,q),(s′,q′)) equals M(a)(s,s′) if δ(q,a) = q′ and 0 otherwise. Thus, M and M ⊗ A induce the same probability measure Pr.
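The composition can be sketched as a small function (again an illustrative encoding, not the paper's code): a product state is a pair (s, q), and letter a moves (s, q) to (s′, δ(q, a)) with the MC's probability M(a)(s, s′).

```python
def compose(mc_M, mc_init, dfa_delta, dfa_init):
    """Product MC on pairs (s, q): matrices and initial state (sketch).

    mc_M: dict letter -> {s: {t: prob}}; dfa_delta: dict (q, letter) -> q'.
    """
    states = set(next(iter(mc_M.values())))
    dfa_states = {q for (q, _) in dfa_delta} | set(dfa_delta.values())
    prod = {a: {} for a in mc_M}
    for a, Ma in mc_M.items():
        for s in states:
            for q in dfa_states:
                # DFA component moves deterministically on the emitted letter.
                prod[a][(s, q)] = {(t, dfa_delta[(q, a)]): p
                                   for t, p in Ma.get(s, {}).items()}
    return prod, (mc_init, dfa_init)
```

Since the DFA is deterministic and total, each product row still sums to one over all letters, so the product is again an MC inducing the same measure.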
An observation is either a letter a ∈ Σ or the special symbol ∗, which stands for ‘not seen’; write Ω := Σ ∪ {∗} for the set of observations. An observation policy is a (not necessarily computable) function π : Ω* → {yes, no} that, given the observations made so far, says whether we should observe the next letter. An observation policy π determines a projection p_π : Σ^ω → Ω^ω: we have p_π(a₁a₂⋯) = o₁o₂⋯ when, for all i, oᵢ = aᵢ if π(o₁ ⋯ o_{i−1}) = yes and oᵢ = ∗ otherwise.
We denote the see-all policy by π_all; thus, p_{π_all}(w) = w for all w ∈ Σ^ω.
In the rest of the paper we reserve a, b for letters, o for observations, u, v for finite words, w for infinite words, α for finite observation prefixes, s, t for states from an MC, and q for states from a DFA. We write o ≈ o′ when the observations o and o′ are the same or at least one of them is ∗. We lift this relation to (finite and infinite) sequences of observations (of the same length). We write α ≈ w when α ≈ u holds for the length-|α| prefix u of w.
We say that α is negatively deciding when Pr({w ∈ L : w ≈ α}) = 0. Intuitively, α is negatively deciding when α is incompatible (up to a null set) with L. Similarly, we say that α is positively deciding when Pr({w ∈ Σ^ω \ L : w ≈ α}) = 0. An observation prefix is deciding when it is positively or negatively deciding. An observation policy π decides w when p_π(w) has a deciding prefix. A monitor is an interactive algorithm that implements an observation policy: it processes a stream of letters and, after each letter, it replies with one of ‘yes’, ‘no’, or ‘skip the next n letters’.
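How a policy turns an emitted word into observations can be sketched in a few lines (an illustrative sketch: the symbol `'_'` here plays the role of the ‘not seen’ observation, and the two example policies are made up):

```python
SKIP = '_'   # stands for the 'not seen' observation

def project(word, policy):
    """Apply an observation policy to a finite word, producing observations.

    `policy` maps the tuple of observations made so far to True (observe
    the next letter) or False (skip it).
    """
    obs = []
    for a in word:
        obs.append(a if policy(tuple(obs)) else SKIP)
    return ''.join(obs)

# Two example policies: observe everything, or observe every other letter.
see_all = lambda obs: True
every_other = lambda obs: len(obs) % 2 == 0
```

For instance, `project("abab", every_other)` yields `"a_a_"`, while the see-all policy reproduces the word unchanged.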
For any w ∈ Σ^ω, if some policy π decides w then π_all decides w.
Let π decide w. Then there is a deciding prefix α of p_π(w). Suppose α is positively deciding, i.e., Pr({w′ ∈ Σ^ω \ L : w′ ≈ α}) = 0. Let u be the length-|α| prefix of w. Then {w′ ∈ Σ^ω \ L : w′ ≈ u} ⊆ {w′ ∈ Σ^ω \ L : w′ ≈ α}, since α can be obtained from u by possibly replacing some letters with ∗. Hence u is also positively deciding. Since u is a prefix of p_{π_all}(w) = w, we have that π_all decides w. The case where α is negatively deciding is similar. ∎
It follows that Pr({w : π decides w}) ≤ Pr({w : π_all decides w}) for every policy π. We say that a policy π is feasible when it attains this maximum, i.e., when Pr({w : π decides w}) = Pr({w : π_all decides w}).
Equivalently, π is feasible when almost all words that are decided by the see-all policy are also decided by π. If α is the shortest prefix of p_π(w) that is deciding, then the cost of decision on w is the number of non-∗ observations in α. This paper is about finding feasible observation policies that minimize the expectation of the cost of decision with respect to Pr.
In this section we study properties of observation policies that are qualitative, i.e., not directly related to the cost of decision. We focus on properties of observation prefixes that a policy may produce.
We have already defined deciding observation prefixes. We now define several other types of prefixes: enabled, confused, very confused, and finitary. A prefix α is enabled if it occurs with positive probability, Pr({w : w ≈ α}) > 0. Intuitively, the other types of prefixes are defined in terms of what would happen if we were to observe all from now on: if it is not almost sure that eventually a deciding prefix is reached, then we say α is confused; if it is almost sure that a deciding prefix will not be reached, then we say α is very confused; if it is almost sure that eventually a deciding or very confused prefix is reached, then we say α is finitary. To state this formally, we write Pr(· | α) for the probability conditioned on the set {w : w ≈ α} of words consistent with α. With this convention, we define:
α is confused when Pr(some deciding prefix is eventually reached | α) < 1
α is very confused when Pr(some deciding prefix is eventually reached | α) = 0
α is finitary when Pr(some deciding or very confused prefix is eventually reached | α) = 1
Observe that (a) confused implies enabled, (b) deciding implies not confused, and (c) enabled and very confused implies confused. The following are alternative equivalent definitions:
α is confused when, with positive probability conditioned on α, no extension of α along the emitted letters is deciding
α is very confused when αu is non-deciding for all u ∈ Σ* such that αu is enabled
α is finitary when, almost surely conditioned on α, some extension of α along the emitted letters is deciding or very confused
Consider the MC and the DFA depicted here (figure omitted):
All observation prefixes that do not start with are enabled. The observation prefixes and and, in fact, all observation prefixes that contain , are positively deciding. For all we have and , so is not deciding. If the MC takes the right transition first then almost surely it emits at some point. Thus . Hence is confused. In this example only non-enabled observation prefixes are very confused. It follows that is not finitary.
For any s ∈ S we write Pr_s for the probability measure of the MC obtained from M by making s the initial state. For any q ∈ Q we write L_q for the language of the DFA obtained from A by making q the initial state. We call a pair (s, q) ∈ S × Q negatively deciding when Pr_s(L_q) = 0; similarly, we call (s, q) positively deciding when Pr_s(Σ^ω \ L_q) = 0. A subset of S × Q is called a belief. We call a belief negatively (positively, respectively) deciding when all its elements are. We fix the notation B₀ := {(s₀, q₀)} for the initial belief for the remainder of the paper. Define the belief NFA B as the NFA with state set S × Q and alphabet Ω = Σ ∪ {∗}: on a letter a, a state (s, q) has a transition to (s′, δ(q, a)) whenever M(a)(s, s′) > 0, and on ∗ it has a transition to every state reachable on some letter.
We extend the transition function of B to sets of states and sequences of observations in the way that is usual for NFAs; we write δ_B for this extension. Intuitively, if belief B is the set of states where the product M ⊗ A could be now, then δ_B(B, o) is the belief adjusted by additionally observing o. To reason about observation prefixes α algorithmically, it will be convenient to reason about the belief δ_B(B₀, α).
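The belief update is simple enough to state as code (an illustrative sketch in the same dict encoding as before; `'_'` plays the role of ∗, and all function names are made up):

```python
SKIP = '_'   # the 'not seen' observation

def belief_step(B, o, mc_M, dfa_delta):
    """One step of the belief NFA on observation o."""
    letters = mc_M.keys() if o == SKIP else [o]
    return frozenset((t, dfa_delta[(q, a)])
                     for a in letters
                     for (s, q) in B
                     for t, p in mc_M[a].get(s, {}).items()
                     if p > 0)

def belief_after(B, prefix, mc_M, dfa_delta):
    """Extend the step function to observation prefixes, as usual for NFAs."""
    B = frozenset(B)
    for o in prefix:
        B = belief_step(B, o, mc_M, dfa_delta)
    return B
```

Note how a skipped observation fans the belief out over all letters, while an observed letter keeps it narrow.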
We define confused, very confused, and finitary beliefs as follows:
B is confused when, starting from B and observing every letter from now on, it is not almost sure that a deciding belief is eventually reached
B is very confused when δ_B(B, u) is empty or not deciding for all u ∈ Σ*
B is finitary when, starting from B, it is almost sure that a deciding or very confused belief is eventually reached
In 2 we have , and for all , and , and , and for all that contain . The latter belief is positively deciding. We have , so any belief that contains is confused. Also, is confused as .
By the following lemma, the corresponding properties of observation prefixes and beliefs are closely related.
Let α be an observation prefix, and let B = δ_B(B₀, α).
α is enabled if and only if B ≠ ∅.
α is negatively deciding if and only if B is negatively deciding.
α is positively deciding if and only if B is positively deciding.
α is confused if and only if B is confused.
α is very confused if and only if B is very confused.
α is finitary if and only if B is finitary.
The following lemma gives complexity bounds for computing these properties.
Let α be an observation prefix, and B a belief.
Whether α is enabled can be decided in P.
Whether α (or B) is negatively deciding can be decided in P.
Whether α (or B) is positively deciding can be decided in P.
Whether α (or B) is confused can be decided in PSPACE.
Whether α (or B) is very confused can be decided in PSPACE.
Whether α (or B) is finitary can be decided in PSPACE.
The belief NFA B and the MC M ⊗ A can be computed in polynomial time (even in deterministic logspace). For items 1–3, there are efficient graph algorithms that search these product structures. For instance, to show that a given pair (s, q) is not negatively deciding, it suffices to show that M ⊗ A has a path from (s, q) to a state (s′, q_f) for some s′ ∈ S. This can be checked in polynomial time (even in NL).
For items 4–6, one searches the (exponential-sized) product of M ⊗ A and the determinization of B. This can be done in PSPACE. For instance, to show that a given belief B is confused, it suffices to show that there are a word u and a belief B′ such that the determinization of B has a u-labelled path from B to B′, and there do not exist a word v and a belief B″ such that the determinization of B has a v-labelled path from B′ to B″ and B″ is deciding. This can be checked in NPSPACE = PSPACE by nondeterministically guessing paths in the product of M ⊗ A and the determinization of B. ∎
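The polynomial-time check for item 2 is a plain graph search, which can be sketched as follows (same illustrative dict encoding as before; recall that the accepting state is absorbing, so reaching it once settles the question):

```python
def negatively_deciding(s, q, mc_M, dfa_delta, accepting):
    """Graph search in the product: (s, q) is negatively deciding iff no
    state with an accepting DFA component is reachable from it along
    positive-probability edges. Illustrative sketch."""
    seen, stack = {(s, q)}, [(s, q)]
    while stack:
        s1, q1 = stack.pop()
        if q1 in accepting:
            return False          # accepting product state reachable
        for a, Ma in mc_M.items():
            for t, p in Ma.get(s1, {}).items():
                nxt = (t, dfa_delta[(q1, a)])
                if p > 0 and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return True                   # accepting component unreachable
```

With the running two-state MC and a DFA accepting words containing b, the pair (state emitting only a, initial DFA state) is negatively deciding, while the initial pair is not.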
We call a policy a diagnoser when it decides almost surely.
Diagnosability can be characterized by the notion of confusion:
There exists a diagnoser if and only if the empty observation prefix ε is not confused.
The following proposition shows that diagnosability is hard to check.
Given an MC M and a DFA A, it is PSPACE-complete to check if there exists a diagnoser.
8 essentially follows from a result by Bertrand et al. They study several different notions of diagnosability; one of them (FA-diagnosability) is very similar to our notion of diagnosability. There are several small differences; e.g., their systems are not necessarily products of an MC and a DFA. Therefore we give a self-contained proof of 8.
By 7 it suffices to show PSPACE-completeness of checking whether the empty observation prefix is confused. Membership in PSPACE follows from 5.4. For hardness we reduce from the following problem: given an NFA in which all states are initial and accepting, does the NFA accept all (finite) words? This problem is PSPACE-complete [16, Lemma 6]. ∎
We say an observation policy allows confusion when, with positive probability, it produces an observation prefix αo such that αo is confused but α is not.
A feasible observation policy does not allow confusion.
In this section we study the computational complexity of finding feasible policies that minimize the expected cost of decision. We focus on the decision version of the problem: Is there a feasible policy whose expected cost is smaller than a given threshold? Define the infimum cost as the infimum, over all feasible policies, of the expected cost of decision.
Since the see-all policy never stops observing, its cost is infinite on every run that is never decided, so its expected cost may be infinite. However, once an observation prefix is deciding or very confused, there is no point in continuing observation. Hence, we define a light see-all policy, which observes until the observation prefix is deciding or very confused; formally, the policy observes the next letter if and only if the observation prefix so far is neither deciding nor very confused. It follows from the definition of very confused that this policy is feasible. Concerning the cost, for every word the cost of decision under the light see-all policy is the least n such that the length-n prefix of the word is deciding or very confused (and infinite if there is no such n). The following results are proved in the appendix:
If the empty observation prefix is finitary then the expected cost under the light see-all policy is finite.
Let π be a feasible observation policy. If the expected cost under π is finite then the empty observation prefix is finitary.
The infimum cost is finite if and only if the empty observation prefix is finitary.
If a diagnoser exists then the infimum cost is finite.
It is PSPACE-complete to check if the infimum cost is finite.
10 holds because, in the MC, a bottom strongly connected component is reached in expected finite time. 11 says that a kind of converse holds for feasible policies. 12 follows from Lemmas 11 and 10. 13 follows from Propositions 12 and 7. To show 14, we use 12 and adapt the proof of 8.
The main negative result of the paper is that one cannot compute the infimum cost:
It is undecidable to check if the infimum cost is smaller than a given threshold, even when a diagnoser exists.
By a reduction from the undecidable problem whether a given probabilistic automaton accepts some word with probability exceeding a given threshold. The proof is somewhat complicated. In fact, in the appendix we give two versions of the proof: a short incorrect one (with the correct main idea) and a long correct one. ∎
Now we turn to positive results. In the rest of the paper we assume that the MC M is non-hidden, i.e., there exists a function f : Σ → S such that M(a)(s, t) > 0 implies t = f(a). We extend f to nonempty finite words so that f(ua) = f(a). We write s → t to indicate that there is a ∈ Σ with M(a)(s, t) > 0.
Consider the following non-hidden MC and DFA (figure omitted):
is the initial belief. The beliefs and are not confused: indeed, is negatively deciding, and is positively deciding. The belief is confused, as there is no for which is deciding. Finally, is very confused.
We will show that in the non-hidden case there always exists a diagnoser (23). It follows that feasible policies need to decide almost surely and, by 13, that the infimum cost is finite. We have seen in 9 that feasible policies do not allow confusion. In this section we construct policies that procrastinate so much that they avoid confusion just barely. We will see that such policies have an expected cost that comes arbitrarily close to the infimum cost.
We characterize confusion by language equivalence in a certain DFA. Consider the belief NFA B. In the non-hidden case, if we disallow ∗-transitions then B becomes a DFA D, as each state has at most one successor per letter. We equip D with the set of accepting states consisting of those pairs (s, q) whose DFA component q is accepting.
For the previous example, a part of the DFA D looks as follows (figure omitted). States that are unreachable from the initial state are not drawn there.
We associate with each state of D the language that D accepts when started from that state. We call two states language equivalent, denoted by ∼, when their languages coincide.
One can compute the relation ∼ in polynomial time.
For any state one can use standard MC algorithms to check in polynomial time whether it is deciding (using a graph search in the composition M ⊗ A, as in the proof of 5.3). Language equivalence in the DFA D can be computed in polynomial time by minimization. ∎
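Language equivalence of DFA states can be computed by partition refinement (Moore's algorithm), which the following self-contained sketch illustrates on a generic DFA with a total transition function (names are illustrative):

```python
def language_classes(states, letters, delta, accepting):
    """Partition refinement: two states get the same class id iff they are
    language equivalent. delta: dict (state, letter) -> state (total)."""
    # Start from the accepting / non-accepting split and refine until stable.
    block = {s: 1 if s in accepting else 0 for s in states}
    while True:
        # A state's signature: its own block plus the blocks of its successors.
        sig = {s: (block[s], tuple(block[delta[(s, a)]] for a in letters))
               for s in states}
        ids = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        new_block = {s: ids[sig[s]] for s in states}
        if new_block == block:
            return block
        block = new_block
```

Each round only splits blocks, so the loop terminates after at most |states| refinements, giving the polynomial bound mentioned above.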
We call a belief settled when all its elements are language equivalent.
A belief B is confused if and only if there is a word u ∈ Σ* such that the belief reached from B by reading u is not settled.
It follows that one can check in polynomial time whether a given belief is confused. We generalize this fact in 22 below.
For a belief B and k ∈ ℕ, if the belief reached from B by k skips is confused then so is the belief reached by k + 1 skips. We define: k(B) as the largest k (possibly ∞) such that the belief reached from B by k skips is not confused. We leave k(B) undefined if B itself is confused. We may write k(α) for k(δ_B(B₀, α)).
In 16 we have and and and .
Given a belief B, one can compute k(B) in polynomial time. Further, if k(B) is finite then it is bounded by a polynomial in the sizes of M and A.
Let B be a belief. By 19, the belief reached from B by k skips is confused if and only if some word of letters leads from it to an unsettled belief. This holds if and only if, from some pair of states of B, a pair of language-inequivalent states of D can be reached by stepping both components simultaneously (with arbitrary letters during the skips and common letters afterwards). Let G be the directed graph whose nodes are pairs of states of D, with the corresponding edges, and define the set of target nodes as the pairs that witness confusion in this sense. By 18 one can compute this set in polynomial time. It follows from the argument above that the belief reached from B by k skips is confused if and only if there is a length-k path in G from a pair of states of B to a target node. The length of the shortest such path (∞ if no such path exists) can be computed in polynomial time by a search of the graph G, and k(B) is obtained from it. ∎
For any belief B and any observation prefix α, the language equivalence classes represented in δ_B(B, α) depend only on α and the language equivalence classes represented in B. Therefore, when tracking beliefs along observations, we may restrict a belief to a single representative of each equivalence class. We denote this operation by B ↦ B↓. A belief B is settled if and only if B↓ has at most one element.
A procrastination policy ρ_ℓ is parameterized with (a large) ℓ ∈ ℕ. Define (and precompute) k(B) for all reachable beliefs B. We define ρ_ℓ by the following monitor that implements it:
while the current belief B is not deciding:
skip min{k(B), ℓ} observations, then observe a letter
output yes/no decision
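The monitor loop above can be sketched as a self-contained toy (all names are illustrative; the skip counts k(B), the belief update `step`, and the verdict function `decide` are assumed given, e.g. precomputed as discussed earlier, and `'_'` is the skipped observation):

```python
def run_monitor(stream, B0, ell, k, step, decide):
    """Consume letters from `stream`; return (verdict, observations made).

    k: dict belief -> skip bound; step(B, o): belief update;
    decide(B): 'yes'/'no' if B is deciding, else None.
    """
    B, cost, letters = B0, 0, iter(stream)
    while decide(B) is None:
        for _ in range(min(k[B], ell)):
            next(letters)                  # letter emitted but skipped: free
            B = step(B, '_')
        B = step(B, next(letters))         # observed letter: one unit of cost
        cost += 1
    return decide(B), cost
```

The returned cost counts only observed letters, matching the cost of decision: skipped letters are consumed from the stream but are free.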
It follows from the definition of k and 19 that the belief tracked by the monitor, reduced under ↓, is indeed a singleton after every observed letter. We have:
For all ℓ the procrastination policy ρ_ℓ is a diagnoser.
For a non-hidden MC M and a DFA A, each state has at most one successor on each letter in the belief NFA B. Then, by 19, singleton beliefs are not confused, and in particular the initial belief is not confused. By 4.4, the empty observation prefix is not confused, which means that the see-all policy decides almost surely. Since almost surely a deciding word is produced, and since ρ_ℓ never skips beyond the point where confusion would arise, it follows that eventually an observation prefix is produced whose belief contains a deciding pair. But, as remarked above, that belief is settled, so it is deciding. ∎
The procrastination policy ρ_ℓ produces a (random, almost surely finite) word of observations. Indeed, the observations that ρ_ℓ makes can be described by an MC. Recall that we have previously defined a composition MC M ⊗ A. Now define an MC in which τ is a fresh letter and whose transitions are as follows: when the current state is deciding, it emits τ and stays put; when the current state is not deciding, the transition matrix for the next observed letter is obtained from the matrices of M ⊗ A, raised to a power that accounts for the skipped observations. This MC may not be non-hidden, but could be made non-hidden by (i) collapsing all language equivalent states in the natural way, and (ii) redirecting all τ-labelled transitions to a new state that has a self-loop. In the understanding that τ indicates ‘decision made’, the probability distribution defined by this MC coincides with the probability distribution on sequences of non-τ observations made by ρ_ℓ.
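Once the observations of ρ_ℓ are packaged as an absorbing MC in this way, its expected cost of decision is the expected number of steps to absorption, which solves the standard linear system (I − Q)x = 1 with Q the transient-to-transient block. The following self-contained sketch (illustrative names; a tiny Gauss–Jordan elimination avoids external dependencies) computes it:

```python
def expected_steps(P, transient):
    """Expected steps to absorption from each transient state.

    P: dict s -> {t: prob}; `transient` lists the non-absorbing states.
    Solves (I - Q) x = 1 by Gauss-Jordan elimination with partial pivoting.
    """
    idx = {s: i for i, s in enumerate(transient)}
    n = len(transient)
    # Augmented system: rows of (I - Q) with right-hand side 1.
    A = [[(1.0 if i == j else 0.0)
          - P.get(transient[i], {}).get(transient[j], 0.0)
          for j in range(n)] + [1.0] for i in range(n)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(n):
            if r != i and A[r][i]:
                f = A[r][i] / A[i][i]
                A[r] = [x - f * y for x, y in zip(A[r], A[i])]
    return {s: A[idx[s]][n] / A[idx[s]][idx[s]] for s in transient}
```

For instance, a state that is absorbed with probability 1/2 per step has expected absorption time 2.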
For 16 the MC for ρ_ℓ looks as follows (figure omitted). There, the lower number in a state indicates its value of k. The left state is negatively deciding, and the right state is positively deciding. The policy skips the first observation and then observes one of two letters, each with probability 1/2, each leading to a deciding belief.
The following lemma states, loosely speaking, that when a belief B with infinite k(B) is reached and ℓ is large, then a single further observation is expected to suffice for a decision.
Let E_ℓ(B) denote the expected cost of decision under ρ_ℓ starting in B. For each ε > 0 there exists ℓ such that for all beliefs B with infinite k(B) we have E_ℓ(B) ≤ 1 + ε.
Consider the following variant of the previous example (figure omitted):
The MC for ρ_ℓ is as follows (figure omitted):
The left state is negatively deciding, and the right state is positively deciding. We have and .
Now we can prove the main positive result of the paper:
For any feasible policy π and any ε > 0 there is ℓ such that the expected cost of decision under ρ_ℓ exceeds the expected cost under π by at most ε.
Let π be a feasible policy. We choose ℓ large, so that, by 22, ρ_ℓ coincides with a maximally procrastinating policy until the time, say T, at which it encounters a belief with infinite k. (The time T may, with positive probability, never come.) Let us compare π with ρ_ℓ up to time T. For each number of steps, consider the observation prefixes obtained by π and by ρ_ℓ, and count for each the number of non-∗ observations made. For beliefs B, B′ we write B ⊑ B′ when for every element of B there is a language equivalent element of B′. One can show by induction that, at every step up to time T, ρ_ℓ has made at most as many observations as π, and its belief is at least as informed.
If time T does not come then the inequality from above suffices. Similarly, if at time T the belief is deciding, we are also done. If after time T the procrastination policy observes at least one more letter then π also observes at least one more letter. By 25, one can choose ℓ large so that for ρ_ℓ a single additional observation probably suffices. If π almost surely observes only one more letter after T, then ρ_ℓ also needs only one more observation, since its belief at time T is at least as informed. ∎
It follows that, in order to compute the infimum cost, it suffices to analyze ρ_ℓ for large ℓ. This leads to the following theorem:
Given a non-hidden MC and a DFA, one can compute the infimum cost in polynomial time.