The operation and control of complex cyber-physical systems often requires the choice of a sequence of decisions under partial observability. This problem of adaptive decision-making appears in various applications, both in the context of stochastic control (e.g., in robot navigation [kaelbling1998, sutton1998]) and stochastic state estimation (e.g., in information gathering in robotics [singh2007, javdani2014], fault diagnosis in nuclear plants [santoso1999] and sensor placement [golovin2011, debouk2002]). In particular, the research area of active diagnosis and active learning, i.e., the problem of discrete object or state identification through a minimal number of sequential actions, has many real-world applications in medical diagnosis and emergency response [jaakkola1998, bellala2012], electrical systems diagnosis [maillet2013], scheduling [kosaraju1999], etc. Hence, progress in this direction can have a significant impact on a broad range of problems in robotics, cyber-physical systems and artificial intelligence.
Finding optimal policies for general partially observable stochastic estimation problems is often an intractable combinatorial optimization problem. Hence, researchers, e.g.,[summers2014, tzoumas2016] in the context of actuator and sensor placement, have appealed to a diminishing returns property known as submodularity, which plays a similar role in combinatorial optimization as convexity in continuous optimization. Moreover, a recent paper [golovin2011] introduced the notion of adaptive submodularity for set functions that extends submodularity to the adaptive setting, i.e., when actions are sequentially chosen based on observations, and provided near-optimal performance guarantees for adaptive greedy policies. However, there are many applications in which greedy policies perform very well even when the objective function in question is not adaptive submodular. This was observed in [das2011] in the non-adaptive setting, where “near-submodular” set functions are shown to also lead to near-optimal performance guarantees. The paper [kusner2014] hinted that this will hold in the adaptive case, but without providing formal justification. Thus, there is still a need for a rigorous definition of “near-submodular” set functions in the adaptive setting, and for the derivation of performance guarantees for adaptive decision-making problems with such functions.
For the problem of active diagnosis (a.k.a. adaptive stochastic maximization), where actions or queries are selected sequentially to maximize the accuracy of diagnosis with a given budget of actions, [golovin2011] showed that the reward function for this problem is adaptive monotone submodular when outcomes are not corrupted by noise or faults. Hence, an adaptive greedy policy yields the best polynomial-time approximation algorithm [golovin2011]. However, the case when the outcomes are corrupted by noise or faults is less well studied. The papers [bellala2013, zheng2012] considered this problem when the fault is stochastic/non-persistent, i.e., the case when repeated measurements may yield different outcomes, but the proposed algorithms have no provable performance guarantees. To the best of our knowledge, the active diagnosis problem with persistent noise or faults has not been considered, although persistent noise has been investigated for the complementary problem known as active learning in [bellala2012, golovin2010]. In these approaches, proxy set functions were found to indirectly prove the near-optimality of the original problem. However, corresponding specialized algorithms were needed and it is unclear how these apply to our active diagnosis problem. In fact, these papers [bellala2012, golovin2010] considered a more general problem of group-based active learning, also studied in [javdani2014], of which active learning with persistent noise is a special case. The objective of group-based active learning is to learn, with a minimum number of actions or queries, the group to which an object or objects belong, as opposed to the objects themselves. On the other hand, the problem of group-based active diagnosis has, to the best of our knowledge, not been investigated, despite its potentially broad applicability.
Contributions. To close the gap between non-adaptive and adaptive settings, we rigorously define the concept of weak adaptive submodularity for set functions, as a generalization of (strong) adaptive submodularity, and prove that an adaptive greedy policy yields competitive near-optimal performance guarantees for the active diagnosis problem, when the objective function is adaptive monotone and weakly adaptive submodular. We observe that the adaptive submodularity factor that characterizes nearness to (strong) adaptive submodularity affects the performance bound in the same way as when approximate greedy policies are employed.
Then, we consider the group-based active diagnosis problem and show that the reward function corresponding to group identification is both adaptive monotone and weakly adaptive submodular; thus we also obtain competitive performance guarantees for this problem when using a simple greedy policy, without the need for proxy set functions or specialized algorithms. We believe that our approach is novel and valuable, especially for its simplicity. Moreover, the adaptive submodularity factor also offers an alternative explanation for the deterioration of the performance guarantees that was observed in group-based active learning when proxy set functions are introduced (cf. [bellala2012, golovin2010, javdani2014]).
Furthermore, we demonstrate that the problem of active diagnosis with persistent noise or fault is equivalent to the group-based active diagnosis problem; hence, once again, we have provably competitive performance guarantees for the adaptive greedy policy and do not require proxy set functions and algorithms. In addition, our general formulation allows for state-dependent sensor noise or fault, and also complex faults, such as faults that give the same outcome regardless of the system states, or faults that always give the opposite outcome. Our experiments on aircraft electrical systems show that the simple adaptive greedy policy performs just as well as a brute-force policy that tries all possible actions, while only having a polynomial-time computational complexity.
II Motivation: Aircraft Electrical System with Persistent Sensor Faults
We are motivated by the discrete state estimation problem of an aircraft electrical system via active sensing, first studied in [maillet2013] (see Figure 1 for an example of a simple circuit; readers are referred to [maillet2013] for a detailed description of the electrical circuits). As systems become increasingly dependent on electric power, this problem is crucial to their safety. In the above work, the sensors are assumed to be healthy or faultless. But in reality, sensors can be faulty, which means that they can provide incorrect information about the unknown discrete state (i.e., operating condition or status) of the electrical components. The faults can be either stochastic or persistent, and in this paper, we are mainly interested in persistent faults. One such fault is when a faulty sensor persistently gives the opposite reading, and another is when a faulty sensor always gives a constant reading (which is therefore sometimes correct and sometimes incorrect).
Despite persistent faults, our goal is similar to that in [maillet2013], i.e., to design a policy that can adaptively estimate the discrete state of the circuit by taking “actions” (i.e., opening or closing controllable contactors) and observing the sensor measurements. Note that we are only interested in estimating the unknown discrete state of the system and not the true failure mode of the sensors. Hence, if we group all possible sensor failure modes that correspond to each state, our problem of interest is one of group-based active diagnosis, i.e., we aim to adaptively identify or estimate which group (i.e., state) is compatible with our sensor measurements. However, the reward function for this problem is not adaptive submodular, which motivates the need for the concept of weak adaptive submodularity, discussed next.
III Weak Adaptive Submodularity
In this section, we propose a generalization of the concept of adaptive submodularity and show that adaptive greedy policies achieve near-optimal performance for objective functions with such a property. These results will be useful for group-based active diagnosis, considered in the next section.
III-A Mathematical Preliminaries and Definitions
We begin by introducing notation for the adaptive decision-making problem we consider. Given a finite set of actions (or queries), , our objective is to adaptively select actions to estimate a fixed but hidden state, . We consider a Bayesian approach and model the state as a random variable, , with a given prior probability distribution on . When we take an action , we observe the (sensor) measurement or outcome , which we will use to sequentially make a decision about the next action. We represent the pairs of actions and observed outcomes up to time by a partial realization . Given two partial realizations and , we call a subrealization of if .
To evaluate “progress” in the decision-making process, we define an objective function that maps the set of actions under state to reward . A policy is a function from a set of partial realizations to actions, which specifies the action to choose next after observing , i.e., , and denotes the set of all actions under policy . Randomized policies that specify a distribution on actions are also allowed.
Next, we provide definitions specific to set functions in [golovin2011], followed by the definition of weak adaptive submodularity.
Definition 1 (Conditional Expected Marginal Benefit).
Given an objective function , an action and a partial realization , the conditional expected marginal benefit of conditioned on having observed is defined as
with the expectation taken with respect to . Similarly, the conditional expected marginal benefit of a policy is
The following definition of adaptive monotonicity has the interpretation that the conditional expected marginal benefit of any action is non-negative.
Definition 2 (Adaptive Monotonicity).
A function is adaptive monotone with respect to distribution if for all and with ,
Next, we define a new property that has the interpretation that the conditional expected marginal benefit of any fixed action (or query) does not increase “too much” as more actions are performed and their measurements are observed.
Definition 3 (-Weak Adaptive Submodularity).
A function is -weakly adaptive submodular with respect to distribution if for all such that is a subrealization of , i.e., , and for all ,
for some constant that we refer to as the adaptive submodularity factor. Additionally, let be the smallest such , referred to as the best adaptive submodularity factor.
From the above definition, it is clear that for any , if is -weakly adaptive submodular, then is also -weakly adaptive submodular. Further, (strongly) adaptive submodular functions (with ) are also -weakly adaptive submodular for any ; hence, this is a generalization of (strong) adaptive submodularity. In fact, any set function is -weakly adaptive submodular with some (possibly infinite) . Note that our definition differs from approximate adaptive submodularity that was defined in [kusner2014] without justification.
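For reference, Definitions 1 to 3 can be written out compactly as below. This is a sketch under stated assumptions rather than a verbatim restatement: the symbols follow the conventions of [golovin2011] ($\Delta$ for the conditional expected marginal benefit, $\psi$ and $\psi'$ for partial realizations, $\mathrm{dom}(\psi)$ for the actions already taken), while $X$ for the random state, $\mathcal{V}$ for the action set and $\zeta$ for the adaptive submodularity factor are names assumed here for illustration.

```latex
% Definition 1 (conditional expected marginal benefit), assumed notation
\Delta(a \mid \psi) = \mathbb{E}\big[\, f(\mathrm{dom}(\psi) \cup \{a\}, X) - f(\mathrm{dom}(\psi), X) \;\big|\; \psi \,\big]

% Definition 2 (adaptive monotonicity)
\Delta(a \mid \psi) \ge 0

% Definition 3 (\zeta-weak adaptive submodularity), for \psi \subseteq \psi'
% and every action a not yet taken in \psi'
\Delta(a \mid \psi') \le \zeta \, \Delta(a \mid \psi), \qquad \zeta \ge 1
```

With $\zeta = 1$ the last inequality reduces to (strong) adaptive submodularity, which matches the remark that weak adaptive submodularity is a generalization.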
III-B Adaptive Greedy Policies
Weak adaptive submodularity resembles a diminishing returns property for policies, which suggests that an adaptive greedy policy can provide nice performance guarantees. Thus, we consider an adaptive greedy policy that, at each iteration, myopically follows the greedy heuristic of maximizing its expected gain given its current observations:
Such greedy strategies do not in general produce an optimal solution when compared to clairvoyant algorithms, but are oftentimes near-optimal in practice and have the advantage of having a polynomial-time computational complexity.
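The greedy rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: outcomes are taken to be deterministic functions of the action and the hidden state, and the names `posterior`, `expected_gain`, `adaptive_greedy`, as well as the toy reward and the two-bit example, are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

History = List[Tuple[Hashable, Hashable]]  # observed (action, outcome) pairs


def posterior(prior: Dict, outcome_of: Callable, history: History) -> Dict:
    """Condition the prior on the history, assuming deterministic outcomes."""
    weights = {
        s: p for s, p in prior.items()
        if all(outcome_of(a, s) == o for a, o in history)
    }
    total = sum(weights.values())
    return {s: p / total for s, p in weights.items()}


def expected_gain(action, history: History, prior: Dict,
                  outcome_of: Callable, reward: Callable) -> float:
    """Conditional expected marginal benefit of `action` given `history`."""
    taken = frozenset(a for a, _ in history)
    post = posterior(prior, outcome_of, history)
    return sum(
        p * (reward(taken | {action}, s) - reward(taken, s))
        for s, p in post.items()
    )


def adaptive_greedy(actions: Sequence, prior: Dict, outcome_of: Callable,
                    reward: Callable, true_state, budget: int) -> History:
    """Pick, at each step, the unused action with the largest expected gain."""
    history: History = []
    for _ in range(budget):
        used = {a for a, _ in history}
        remaining = [a for a in actions if a not in used]
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda a: expected_gain(a, history, prior, outcome_of, reward),
        )
        # The true state is only used to simulate the observed outcome.
        history.append((best, outcome_of(best, true_state)))
    return history


# Toy example (hypothetical): four states encoded by two bits.
toy_prior = {s: 0.25 for s in range(4)}
toy_actions = ["const", "bit0", "bit1"]


def toy_outcome(action, state):
    if action == "const":
        return 0  # uninformative query
    return (state >> int(action[-1])) & 1


def toy_reward(taken, state):
    """Prior mass of the states ruled out by the outcomes of `taken`."""
    consistent = sum(
        p for t, p in toy_prior.items()
        if all(toy_outcome(a, t) == toy_outcome(a, state) for a in taken)
    )
    return 1.0 - consistent


toy_history = adaptive_greedy(
    toy_actions, toy_prior, toy_outcome, toy_reward, true_state=2, budget=2
)
```

In this toy problem the policy skips the uninformative `const` query and spends its two actions on the two bit queries, which identifies the state. Each greedy step evaluates every remaining action once against the current posterior, so the cost per step is polynomial in the number of actions and states.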
III-C Performance Guarantees for Active Diagnosis
We consider an important class of problems as follows:
Problem 1 (Active Diagnosis).
Given a reward function , find a policy with a budget of actions such that
with expectation taken with respect to and is the set of all actions under policy when the state is .
It turns out that for this class of problems, the weak adaptive submodularity property offers a strong performance guarantee when used in conjunction with adaptive greedy policies, which we prove in Appendix A.
Theorem 1. Fix any . Let the greedy policy be run for iterations (so that it selects actions), and let be any policy selecting at most actions for any realization . Then for adaptive monotone and -weakly adaptive submodular ,
where is the expected reward of the policy with respect to the distribution .
A corollary from the above theorem is that our greedy policy that selects actions obtains at least of the value of the optimal policy that selects actions. Note that if is large, then the performance bound can be poor.
The adaptive submodularity factor affects the performance guarantee in Theorem 1 in exactly the same way as the approximation factor of -approximate greedy policies. Just as approximate greedy policies suggest that greedy policies are robust to incorrect priors by a factor (cf. [golovin2011, Section 4.3] for a discussion), the -weak adaptive submodularity property is “robust” to the deviation from (strong) adaptive submodularity by a factor .
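To make the effect of the factor concrete, the sketch below evaluates the guaranteed fraction of the optimal value numerically. It assumes the bound has the same form as the guarantee for approximate greedy policies in [golovin2011], namely $1 - e^{-\ell/(\zeta k)}$ for a greedy policy with $\ell$ actions compared against an optimal policy with $k$ actions; the symbol $\zeta$ and the function name `greedy_guarantee` are assumptions made for illustration.

```python
import math


def greedy_guarantee(num_greedy_actions: int, num_optimal_actions: int,
                     zeta: float = 1.0) -> float:
    """Guaranteed fraction of the optimal value, assuming the bound
    1 - exp(-l / (zeta * k)) with adaptive submodularity factor zeta."""
    if zeta <= 0 or num_optimal_actions <= 0:
        raise ValueError("zeta and the number of optimal actions must be positive")
    return 1.0 - math.exp(-num_greedy_actions / (zeta * num_optimal_actions))


# Same budget for both policies: strong case versus a factor of two.
strong_case = greedy_guarantee(5, 5, zeta=1.0)
weak_case = greedy_guarantee(5, 5, zeta=2.0)
# Doubling the greedy budget compensates for a factor of two.
compensated = greedy_guarantee(10, 5, zeta=2.0)
```

Under this assumed form, equal budgets give roughly 63% of the optimal value in the strongly adaptive submodular case and roughly 39% when the factor is 2, and doubling the number of greedy actions recovers the 63% level.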
IV Near-Optimal Group-Based Active Diagnosis
IV-A Problem Setup
The objective of group-based active diagnosis is not to determine the unknown object but rather the group to which the object belongs via active sensing. For instance, in medical diagnosis, one is more interested in diagnosing the disease that a patient has without necessarily identifying all its symptoms, while in movie recommendation systems, it may be of more interest to identify the genre that a user prefers as opposed to the exact scene preferences. As also noted in [bellala2012], the group-based diagnosis problem cannot be simply reduced to an active diagnosis problem with groups as “meta-states”, since the objects within a group do not generally yield the same observations. In our motivating example, our goal is primarily to estimate the hidden system state, with no emphasis on finding the hidden persistent sensor faults. Further, under the hypotheses of different sensor failure modes for the same system state, the sensor measurements would be different.
IV-A1 Mathematical formulation
In this section, we further introduce notations that are specific to the group-based active diagnosis problem. We will only consider non-overlapping groups in this paper; thus, to make the connection to active diagnosis with persistent faults clear, we will refer to groups as (system) states and objects in the groups as (sensor) modes . We assume that the true state and the true mode are fixed but unknown. As before, we will consider a Bayesian approach and model the state and mode as random variables with a given prior joint probability distribution on . Note that the modes need not be the same for each state. can be taken as the union of all modes and if a particular mode does not belong to a state , then .
Given the true pair of state and mode , is the unique outcome of performing action . Further, we define as the set of states that gives the same outcome under the action if were the true mode. We then define for each iteration , as the set of all compatible states with the hypothesis that is the true mode, i.e., the set of all states that produce the same set of outcomes under the set of actions , which can also be written as
Since only intersections are taken, the order of actions does not matter.
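The construction of the compatible-state sets can be illustrated with a short Python sketch. It is an illustration under assumptions: outcomes are deterministic given the state and mode, and the names `same_outcome_states`, `compatible_states`, the two toy modes and the bit-valued outcome function are hypothetical.

```python
from functools import reduce


def same_outcome_states(action, observed, mode, states, outcome_of):
    """States that would produce `observed` under `action` if `mode` were true."""
    return frozenset(s for s in states if outcome_of(action, s, mode) == observed)


def compatible_states(history, mode, states, outcome_of):
    """Intersect the per-action sets over all (action, outcome) pairs seen so far."""
    per_action = (
        same_outcome_states(a, o, mode, states, outcome_of) for a, o in history
    )
    return reduce(frozenset.intersection, per_action, frozenset(states))


# Toy example (hypothetical): four states, action i reads bit i of the state,
# and the "flip" mode inverts every reading.
toy_states = range(4)


def toy_outcome(action, state, mode):
    bit = (state >> action) & 1
    return 1 - bit if mode == "flip" else bit


toy_history = [(0, 1), (1, 0)]
healthy_set = compatible_states(toy_history, "healthy", toy_states, toy_outcome)
flipped_set = compatible_states(toy_history, "flip", toy_states, toy_outcome)
reordered = compatible_states(toy_history[::-1], "healthy", toy_states, toy_outcome)
```

Here the same two observations are explained by state 1 under the healthy hypothesis and by state 2 under the flipped hypothesis, so neither state can be eliminated without further information about the mode. Reversing the history leaves the result unchanged, as expected from a pure intersection.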
IV-A2 Problem statement
The objective of group-based active diagnosis is to eliminate as many of the states as possible that are incompatible with observed outcomes, i.e., the uncertainty of represented by . For the group-based active diagnosis problem, this means that at any step , we cannot eliminate any state that is compatible with the hypothesis of one or more modes, i.e., . Thus, we define a reward function as follows
with . Note that only the final term in (5) is essential for the greedy algorithms that we propose. If is uniformly distributed on , then this term is proportional to the size of . The constant is added such that the reward function is non-negative, and is equal to 0 when no action has been taken, i.e., .
We also assume that we are allocated a budget on the maximum number of actions taken, . To achieve the goal, we want to design a policy that finds the “best expected estimate” of the state .
IV-A3 Active diagnosis with persistent faults (special case)
To consider this problem as a group-based active diagnosis problem (see also [bellala2012, golovin2010] for a similar characterization), we create groups of objects where each group corresponds to a state and the objects within the group are copies of the state for each possible mode of sensor faults. For example, we consider a system with states, each denoted by . If we have sensors that can be faulty and there are ways a sensor fault can manifest itself, then we have possible combinations of functioning and faulty sensor modes, denoted by , with cardinality . Next, we define an object as a pair of a state and a mode, denoted as . In other words, for each group/state , all pairs of form the objects corresponding to this group.
Next, to introduce the types of sensor faults we consider, we define the function as the unique outcome under action and true state when all sensors are healthy and the function as the corruption of the ‘healthy’ sensor outcome to the ‘faulty’ sensor outcome when the failure mode is . Hence, . Further, we define with as the set of states that gives the same ‘healthy’ sensor measurement under the action when is the actual system state. Thus, , where the inverse function is such that implies that .
Note that the fault function is surjective, whereas is an injective partial function, as can be seen with the types of sensor faults we consider in this paper. First, a Type 1 fault is when a faulty sensor persistently outputs the opposite outcome. Another type of sensor fault is when the faulty sensor always gives a constant reading, e.g., ‘0’ or ‘1’ (Type 2 fault). For a single sensor with a Type 1 fault, , whereas for one with a Type 2 fault, or . To further illustrate this, consider a simple example with three sensors: , where only is faulty, i.e., , and the ‘faulty’ sensor measurement is . If the fault is of Type 1, we have and , whereas if the fault is of Type 2 (with persistent outcome ‘1’), , with and , and . On the other hand, if is faulty of Type 2 (with persistent outcome ‘0’) and if should always be 1 for all , then is undefined and by definition, .
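The two fault types and the associated inverse map can be sketched in Python for binary sensor readings. This is an illustrative model under assumptions: a mode is represented as one label per sensor, the labels `"ok"`, `"flip"`, `"stuck0"`, `"stuck1"` and the function names `corrupt`, `preimage`, `enumerate_modes` are hypothetical, and the inverse is returned as a set of candidate healthy readings so that an undefined inverse shows up as the empty set.

```python
from itertools import product

FAULT_KINDS = ("ok", "flip", "stuck0", "stuck1")


def corrupt(healthy, mode):
    """Map a healthy reading (tuple of bits) to the reading seen under `mode`."""
    observed = []
    for bit, fault in zip(healthy, mode):
        if fault == "ok":
            observed.append(bit)
        elif fault == "flip":      # Type 1: persistently opposite outcome
            observed.append(1 - bit)
        elif fault == "stuck0":    # Type 2: constant reading 0
            observed.append(0)
        elif fault == "stuck1":    # Type 2: constant reading 1
            observed.append(1)
        else:
            raise ValueError(f"unknown fault kind: {fault}")
    return tuple(observed)


def preimage(faulty, mode):
    """All healthy readings that `mode` would turn into `faulty`."""
    candidates = product((0, 1), repeat=len(faulty))
    return {h for h in candidates if corrupt(h, mode) == faulty}


def enumerate_modes(num_sensors, kinds=FAULT_KINDS):
    """Every combination of per-sensor fault kinds."""
    return list(product(kinds, repeat=num_sensors))


# Three sensors, only the second one faulty, observed reading (1, 1, 1).
type1 = preimage((1, 1, 1), ("ok", "flip", "ok"))
type2_stuck1 = preimage((1, 1, 1), ("ok", "stuck1", "ok"))
type2_stuck0 = preimage((1, 1, 1), ("ok", "stuck0", "ok"))
```

A Type 1 fault leaves a single candidate healthy reading, a stuck-at-1 fault leaves two candidates because the faulty sensor carries no information, and a stuck-at-0 sensor cannot produce a 1, so the set of candidates is empty. With four kinds per sensor, `enumerate_modes(3)` lists 64 modes, which illustrates the exponential growth in the number of fault-prone sensors.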
IV-B Adaptive Greedy Policy: Group-Based Active Diagnosis
The optimal policy for Problem 2 may only be obtained by a clairvoyant algorithm. Moreover, the complexity of planning ahead for steps scales exponentially with . Hence, we resolve to find a scalable (polynomial-time) algorithm using an adaptive greedy strategy, i.e., to select the action at time that maximizes the expected one-step reward given the available information , denoted by and given in Definition 1, where the expectation is taken with respect to . This probability measure on the set for each can be obtained via Bayes’ rule:
Note that we substituted if because the measurement process is deterministic, and otherwise. Moreover, since , the normalization term can be computed as
Thus, at each iteration or step , the policy consists of choosing the next best action that maximizes the gain in uncertainty reduction as defined in Definition 1, which can be rewritten for the reward function in (5) as
where . Since the first term does not depend on the choice of action , the above expression can be equivalently written as
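The Bayes update used by the greedy policy can be sketched as follows. Because the measurement process is deterministic, the likelihood of a history is either 1 or 0, so the posterior is the prior restricted to the consistent state and mode pairs and renormalized. The function names `posterior_pairs` and `state_marginal`, and the single-sensor example with its prior values, are hypothetical; the sketch covers only the update, not the reward in (5).

```python
def posterior_pairs(prior, history, outcome_of):
    """Posterior over (state, mode) pairs given observed (action, outcome) pairs.

    `prior` maps (state, mode) to its prior probability; the likelihood is an
    indicator because outcomes are deterministic given the pair.
    """
    consistent = {
        pair: p for pair, p in prior.items()
        if all(outcome_of(a, pair[0], pair[1]) == o for a, o in history)
    }
    normalizer = sum(consistent.values())
    if normalizer == 0:
        raise ValueError("the observed history has zero prior probability")
    return {pair: p / normalizer for pair, p in consistent.items()}


def state_marginal(pair_posterior):
    """Sum the pair posterior over modes to get the belief on states."""
    belief = {}
    for (state, _mode), p in pair_posterior.items():
        belief[state] = belief.get(state, 0.0) + p
    return belief


# Toy example (hypothetical): one binary state read by one sensor that is
# healthy with probability 0.8 and flipped with probability 0.2.
toy_prior = {
    (0, "ok"): 0.4, (0, "flip"): 0.1,
    (1, "ok"): 0.4, (1, "flip"): 0.1,
}


def toy_outcome(action, state, mode):
    return 1 - state if mode == "flip" else state


toy_posterior = posterior_pairs(toy_prior, [("read", 1)], toy_outcome)
toy_belief = state_marginal(toy_posterior)
```

After observing a reading of 1, only the pairs (1, healthy) and (0, flipped) remain, and the belief on the state follows the prior odds of the two modes. The normalizer is simply the prior mass of the consistent pairs.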
IV-C Near-Optimality Performance Guarantees
Proposition 1 (Adaptive Monotonicity).
The reward function in (5) is adaptive monotone.
Proposition 2 (-Weak Adaptive Submodularity).
The reward function in (5) is -weakly adaptive submodular with
where and are defined as
Both and can be computed algorithmically, but since we are interested in a smaller , we only provide an algorithm for computing in Algorithm 1. is of interest because it can be easily derived analytically for some special cases. Specifically, for a group-based active diagnosis problem with uniform for each , which is the case for active sensing with no informative priors on the sensor faults, we have . In this case, . Note that these factors are only required for finding the performance guarantee, and are not needed for the adaptive greedy strategy we propose.
Since the group-based active diagnosis problem (Problem 2) is equivalent to the active diagnosis problem with an adaptive monotone and weakly adaptive submodular reward function, by Propositions 1 and 2, as well as Theorem 1, we have the following performance guarantee.
Hence, in the absence of informative priors on the sensor faults (i.e., when ), our adaptive greedy policy that selects actions obtains at least of the value of the optimal strategy that selects actions. To our best knowledge, this result is novel. Group identification for the complementary problem of adaptive stochastic minimum cost cover, a.k.a. Bayesian active learning has been considered in [golovin2010, bellala2012, javdani2014], but not the group-based active diagnosis problem we consider in this paper. Interestingly, the performance bounds given in [golovin2010, bellala2012] have an additional factor of 2, which is equal to since they had exactly 2 modes for each persistent sensor fault, while [javdani2014] also has a factor that resembles .
We tested the adaptive greedy policy on two electric power system circuits, one small and one large. For these experiments, the state is uniformly distributed on , and for any sensor measurement that is prone to faults, for all , the conditional probabilities that faults of Type 1 and Type 2 (with outcome ‘stuck’ at ‘1’) persistently occur are and , respectively. The conditional probability for each state and mode pair is then . For simplicity, we only allow two states for the system components, i.e., they are either healthy or faulty. Correspondingly, the sensor readings can only take two values, i.e., proper voltage or improper voltage. We do not assume an initial action , so we initialize our experiments with for each .
For many practical cases, it is not possible to reduce the size of the compatible states to one because the number and positioning of sensors are often limited, and more so when the sensors can be faulty. Thus, we compare the performance of our adaptive greedy policy with the results of a brute force policy that exhaustively takes every action (thus, violating the given budget on the number of actions). The brute force policy represents the best achievable outcome, which we use as a benchmark. Both policies are run in Python on an Intel® Core i5-4300U 64-bit CPU@1.90GHz 2.50GHz with 8.00GB RAM, and the experiments are performed for all possible system states and sensor modes .
V-A Tests on a small-size circuit
We first tested the proposed adaptive greedy policy on a small circuit with 13 components, as shown in Figure 1. By using four controllable contactors (thus, actions) and observing the measurements of sensors , which may be faulty, the objective is to estimate the unknown states of the components .
Simulation result: The adaptive greedy policy that takes 6 actions performs equally well as the brute force policy that takes all 16 actions, i.e., the values of the objective function are equal for all possible states and failure modes .
V-A1 Average execution time
We recorded the average execution times to find the next best action, and the results are shown in Figure 2 for two representative examples for different true sensor modes . It takes about 25 milliseconds to compute the next best action when all the sensors are healthy, and we observed that the average execution time increases linearly in the number of modes , i.e., exponentially with the number of fault-prone sensors .
V-A2 Maximal number of indistinguishable states
Figure 3 compares the maximal numbers of indistinguishable states (corresponding to the minimal values of the reward function ) after 6 actions for all and two illustrative examples of . The ordinate shows the fraction of system states (out of ) that obtains the number of indistinguishable states, , that is less than or equal to the maximal number (on the abscissa). Since the adaptive greedy strategy performs equally well as the brute force method, we omit the plots for the brute force policy that lie exactly on top of the plots for the adaptive greedy policy. Moreover, comparing Figures 2(a) and 2(b), we observe that the maximal number of indistinguishable states is larger when more sensors are faulty, as expected. Furthermore, for most system states , the best reward is already obtained by our adaptive greedy policy within 2 actions.
V-B Tests on a larger circuit
We also tested our greedy policy on a larger circuit with more sensors and components (cf. [maillet2013, Section V-B] for details), which is more representative of aircraft power distribution systems. When all sensors are healthy, the average execution time for obtaining the next best action is about 0.22 seconds. As before, the average execution time increases linearly with the number of sensor modes . Similarly, the performance of our adaptive greedy policy with actions is as good as the brute force policy with actions.
VI Conclusions and Future Directions
In this paper, we considered stochastic state estimation problems with partial observations via active sensing. First, we introduced a property for set functions, called weak adaptive submodularity, that generalizes the concept of adaptive submodularity. Then, for the active diagnosis problem, we showed that adaptive greedy policies are near-optimal when the reward function is adaptive monotone and weakly adaptive submodular. Next, we considered the group-based active diagnosis problem, for which a special case is the active diagnosis problem with persistent sensor noise or faults, and proved that the group-based reward function is weakly adaptive submodular; hence the group-based active diagnosis problem can be solved using a simple adaptive greedy policy with guaranteed competitive performance when compared to the optimal adaptive policy. Our state estimation experiments with aircraft electrical systems plagued by persistent sensor faults demonstrated that the adaptive greedy policy performs just as well as a brute-force policy.
Future work will consider group-based active learning for decision-making using weak adaptive submodularity with the hope of removing the need for proxy set functions and specialized algorithms, while preserving the provable competitiveness of adaptive greedy policies. We will also consider the general active diagnosis problem where the sensor noise and faults can be persistent and/or non-persistent.