Controlling and monitoring physical processes via sensed data are integral parts of the internet of things (IoT), cyber-physical systems, and defense, with applications to industrial process monitoring and control, localization, tracking of mobile objects, environmental monitoring, system identification, and disaster management. In such applications, sensors are simultaneously resource constrained (power and/or bandwidth) and tasked with achieving high-performance sensing, control, communication, and tracking. Wireless sensor networks must further contend with interference and fading. One strategy for balancing resource use against performance is to activate only a subset of the available sensors, limiting both computation and bandwidth use.
Herein, we address the fundamental problem of optimal dynamic sensor subset selection for tracking a time-varying stochastic process. We first examine centralized tracking of an i.i.d. process with an unknown, parametric distribution; this serves as a benchmark for our first extension, decentralized tracking of the same process. For both proposed algorithms, almost sure convergence to local optima can be proven. Next, a distributed algorithm for tracking a Markov process with a known probability transition matrix is developed. All algorithms are numerically validated.
I-A Related Literature
Optimal sensor subset selection problems can be broadly classified into two categories: (i) optimal sensor subset selection for static data with a known prior distribution but unknown realization, and (ii) dynamic sensor subset selection to track a time-varying stochastic process. There have been several recent attempts to solve the first problem; see [3, 1]. This problem poses two major challenges: (i) computing the estimation error given the observations from a subset of sensors, and (ii) finding the optimal sensor subset among exponentially many candidate subsets. In , a tractable lower bound on performance addressed the first challenge and a greedy algorithm addressed the second. In , only the second challenge is addressed, via a Gibbs sampling approach.
Dynamic sensor subset selection for a time-varying stochastic process is considered in, e.g., [4, 5, 6, 7, 8, 9]. Markov process tracking with centralized controllers, for single () or multiple sensor selection with energy constraints and sequential decision making ([4, 5]), has been studied previously. The optimal policy and its structural properties for a special case of dynamic sensor selection over an infinite horizon were examined in . Single sensor selection with broadcast of sensor data is studied in . To our knowledge, the combination of Gibbs sampling (e.g., ) and stochastic approximation (e.g., ) has not previously been applied to i.i.d. process tracking as we do herein. The paper , using Thompson sampling, solves the problem of centralized tracking of a linear Gaussian process (with unknown noise statistics) via active sensing.
Given our consideration of dynamic sensor subset selection for distributed tracking of a Markov process, traditional Kalman filtering is not applicable; however, there is considerable recent work on the development of consensus-based Kalman filtering [13, 14, 15, 16, 17], which we further adapt to work with our Gibbs sampling/stochastic approximation approach for the i.i.d. case. This distributed tracking problem does not appear to be heavily studied. In , perfect information sharing between sensors is assumed, which is impractical, and  assumes that estimation is done in a centralized fashion, while sensors make decentralized decisions about whether to sense or communicate to the fusion center. In , the sparsity of the Markov process is also exploited.
I-B Our Contribution
In this paper, we make the following contributions:
A centralized tracking and learning algorithm for an i.i.d. process with an unknown, but parametric, distribution is developed. In particular, Gibbs sampling keeps the computational complexity low, while stochastic approximation enforces the constraint on the mean number of activated sensors. Furthermore, simultaneous perturbation stochastic approximation (SPSA) is employed for parameter estimation, obviating the need for the expectation-maximization algorithm. A challenge we overcome in the analysis is handling updates at different time scales. As a precursor to the algorithm for unknown parameters, algorithms are developed for a simpler version of the problem in which the parameter of the distribution is known.
The centralized algorithm, which serves as a benchmark, is adapted to the distributed case by exploiting partial consensus. Our partial consensus is novel in that SPSA is again employed to learn the optimal consensus gains adaptively.
A trick for ensuring that all sensors employ identical sampling strategies is to have each sensor use the same seed in its random number generator.
For both the centralized and distributed algorithms, we can prove almost sure convergence to local optima. Furthermore, we can prove that the resources needed for communication and learning can be made arbitrarily small by exploiting properties of the multi-scale updates.
Our final generalization is to develop an algorithm for sensor subset selection for the decentralized tracking of a Markov chain with a known probability transition matrix. We adapt methods for Kalman consensus filtering to our framework with Gibbs sampling and stochastic approximation.
Numerical results show that the decentralized scheme for a Markov chain performs close to the centralized scheme. Numerical results also show a tradeoff between performance and message exchange for learning. Furthermore, the numerical results show that global (not merely local) optima are achieved in tracking the i.i.d. process.
The rest of the paper is organized as follows. The system model is described in Section II. Centralized tracking of an i.i.d. process with known distribution is described in Section III. Section IV deals with centralized tracking of an i.i.d. process having a parametric distribution with unknown parameters. Distributed tracking of the i.i.d. process is discussed in Section V. Distributed tracking of a Markov chain is described in Section VI. Numerical results are presented in Section VII, followed by the conclusion in Section VIII.
II System Model
II-A Network, data and sensing model
We consider a connected, single- or multi-hop wireless sensor network. The sensor nodes are denoted by the set . Connectivity in the network might also be maintained via a few relay nodes, but we ignore this possibility for ease of analysis. There can be a fusion center (connected to all sensors via single-hop wireless links) responsible for all control and estimation operations in the network; alternatively, the sensors can operate autonomously in a multi-hop mesh network.
The physical process under measurement is denoted by , where is a discrete time index and . We consider two models for the evolution of :
IID model: is an i.i.d. process with a parametric distribution , where the unknown parameter needs to be learnt via the measurements. lies in the interior of a compact subset .
Markov model: is a finite-state ergodic Markov chain with known transition probability matrix.
At time , if a sensor is used to sense the process, then the observation at sensor is given by a -dimensional column vector , where is an observation matrix of appropriate dimension and is a Gaussian vector; the observation noise is assumed to be independent across and i.i.d. across .
Let be a vector where is the indicator that the sensor is activated at time ; if sensor is active at time , or else . The decision to activate any sensor for sensing and communicating the observation is taken either by the fusion center or by the sensor itself in the absence of a fusion center. We denote by the set of all possible configurations in the network, and by a generic configuration. Clearly, . Each configuration represents a set of activated sensors. is used to represent the configuration with its -th entry removed.
The observation made by sensor at time is . We define .
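As a concrete illustration, the linear Gaussian sensing model above can be simulated as follows. All dimensions, the per-sensor observation matrices, and the noise covariance are hypothetical stand-ins for the paper's (elided) symbols, chosen only to make the sketch runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d-dimensional process, q-dimensional observation
# per sensor, n sensors (illustrative values, not from the paper).
n, d, q = 5, 2, 3

# One observation matrix per sensor, plus i.i.d. Gaussian observation noise.
A = [rng.standard_normal((q, d)) for _ in range(n)]
noise_cov = 0.1 * np.eye(q)

def sense(x, b, rng):
    """Return observations {k: y_k} from the sensors activated by the
    binary activation vector b, with y_k = A_k x + w_k, w_k ~ N(0, R)."""
    obs = {}
    for k in range(n):
        if b[k] == 1:
            w = rng.multivariate_normal(np.zeros(q), noise_cov)
            obs[k] = A[k] @ x + w
    return obs

x = rng.standard_normal(d)        # current process realization
b = np.array([1, 0, 1, 1, 0])     # activation vector: sensors 0, 2, 3 active
y = sense(x, b, rng)
```

Only the activated sensors produce (and would communicate) observations, which is precisely what the activation vector controls.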
II-B Centralized estimation problem
The estimate of at the fusion center (connected to all sensors via direct wireless links) is denoted by . We denote by the information available at time at the fusion center about the history of observations, activations, and estimates up to time , before the fusion center determines . On the other hand, we define , where is the current estimate of ; denotes the information used by the fusion center at time to estimate . For an i.i.d. time-varying process, is sufficient to estimate and obtain , and will be available only after the activation vector is decided and all observations are collected. In order to optimally decide , the fusion center needs knowledge of the past performance of all configurations. Hence, and have two different information structures. However, we will see that our Gibbs sampling algorithm determines by using only a sufficient statistic calculated iteratively in each slot.
The information structure used to track a Markov chain will be different, as we will see in Section VI.
We define a policy as a pair of mappings, where and . Our first goal is to solve the following centralized problem of minimizing the time-average mean squared error (MSE) subject to a constraint on the mean number of sensors active per unit time:
II-C Distributed estimation problem
In the absence of a fusion center in a multi-hop network, the estimate of at sensor is denoted by . We denote by the information available at time at sensor about the history before the sensor determines , and by the information available at sensor at time just before is estimated.
We define a policy , where and .
We seek to solve the following distributed problem:
III Centralized Tracking of an IID Process: Known Parameter
In this section, we provide an algorithm for solving the centralized problem (P1) when the process is i.i.d. This is done by relaxing (P1) via a Lagrange multiplier. Though our final goal is to track an i.i.d. process with unknown parameter, we discuss algorithms for the known-parameter case in this section as a precursor to the algorithms developed in subsequent sections for the unknown-parameter case and for tracking a Markov chain. The extension to distributed tracking will be discussed in Section V for the unknown-parameter case, and hence is omitted in this section.
III-A The relaxed version of the constrained problem
We relax (P1) by using a Lagrange multiplier and obtain the following unconstrained problem:
The multiplier can be viewed as the cost incurred for activating a sensor at any time instant. Now, since (P3) is an unconstrained problem and the process is i.i.d. across , there exists one optimizer (not necessarily unique) for problem (P3); if the configuration is chosen at each , the minimum cost of (P3) can be achieved by the law of large numbers. Hence, for known , the problem (P3) can be equivalently written as:
The following result tells us how to choose the optimal to solve (P1).
Theorem 1. Consider problem (P1) and its relaxed version (P4). If there exists a Lagrange multiplier and a , such that an optimal configuration for (P4) under is , and the constraint in (P1) is satisfied with equality under the pair , then is an optimal configuration for (P1). In case there exist multiple configurations , a multiplier , and a probability mass function such that (i) each of is optimal for problem (P4) under , and (ii) , then an optimal solution for (P1) is to choose one configuration from with probability mass function .
III-B Basics of Gibbs sampling for known parameter
Finding the optimal solution of (P4) requires a search over possible configurations and computation of the MMSE for each of these configurations. Hence, we propose Gibbs sampling based algorithms to avoid this computation.
Let us define the probability distribution over as (with ):
Following the terminology in statistical physics, we call the inverse temperature, and the partition function. is viewed as the energy under configuration . Now, . Hence, if a configuration is selected at each time with probability distribution for sufficiently large , then will belong to the set of minimizers of (P3) with high probability. However, computing requires addition operations; hence, we use a sequential subset selection algorithm based on Gibbs sampling (see [10, Chapter ]) in order to avoid explicit computation of while picking .
Algorithm 1. Start with an initial configuration . At time , pick a random sensor uniformly from the set of all sensors. Choose with probability and choose with probability . For , choose . Activate the sensors according to .
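A minimal sketch of Algorithm 1's single-site update, assuming a black-box `energy` function standing in for the per-configuration cost (MSE plus activation cost); the function names and the toy energy below are illustrative, not the paper's.

```python
import numpy as np

def gibbs_step(b, energy, beta, rng):
    """One single-site Gibbs update: pick a random coordinate of the binary
    activation vector b and resample it from its conditional distribution
    under pi_beta(B) proportional to exp(-beta * energy(B))."""
    b = b.copy()
    i = rng.integers(len(b))
    b0, b1 = b.copy(), b.copy()
    b0[i], b1[i] = 0, 1
    # P(b_i = 1 | rest) = 1 / (1 + exp(-beta * (energy(b0) - energy(b1))))
    p1 = 1.0 / (1.0 + np.exp(-beta * (energy(b0) - energy(b1))))
    b[i] = 1 if rng.random() < p1 else 0
    return b

# Toy energy: illustrative stand-in for MSE(B) + lambda * |B|.
lam = 0.5
energy = lambda b: 1.0 / (1.0 + b.sum()) + lam * b.sum()

rng = np.random.default_rng(1)
b = np.zeros(4, dtype=int)
for _ in range(200):
    b = gibbs_step(b, energy, beta=5.0, rng=rng)
```

For large `beta`, the chain concentrates on low-energy configurations; the point of the sampler is that each step needs only two energy evaluations, never the partition function.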
Theorem 2. Under Algorithm 1, is a reversible, ergodic, time-homogeneous Markov chain with stationary distribution .
Theorem 2 tells us that if the fusion center runs Algorithm 1 and reaches the steady-state distribution of the Markov chain , then the configuration chosen by the algorithm will have distribution . For very large , if one runs the chain for a sufficiently long, finite time , then the terminal state will belong to  with high probability. Also, by the ergodicity of , the time-average occurrence rates of all configurations match the distribution almost surely.
III-C The exact solution
Algorithm 1 is operated with a fixed ; but the optimal solution of the unconstrained problem (P3) can only be obtained with ; this is achieved by updating at a slower time-scale than the iterates of Algorithm 1.
Algorithm 2. This algorithm is the same as Algorithm 1, except that at time , we use to compute the update probabilities, where , , and .
Theorem 3. Under Algorithm 2, the Markov chain is strongly ergodic, and the limiting probability distribution satisfies .
For an i.i.d. time-varying process with known joint distribution, we can either: (i) find the optimal configuration using Algorithm 2 and use it forever, or (ii) run Algorithm 2 at the same timescale as the process, and use the running configuration for sensor activation; both schemes will minimize the cost in (P3). By the strong ergodicity of , the optimal cost will be achieved for (P3) under Algorithm 2.
III-C1 Convergence rate of Algorithm 1
Let denote the probability distribution of under Algorithm 1. Let us consider the transition probability matrix of the Markov chain with , under Algorithm 1. Let us recall the definition of Dobrushin’s ergodic coefficient from [10, Chapter , Section ] for the matrix ; using a method similar to that of the proof of Theorem 3, we can show that . Then, by [10, Chapter , Theorem ], we can say that under Algorithm 1, we have . We can prove similar bounds for any , where .
Unfortunately, we are not aware of such a closed-form bound for Algorithm 2.
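For concreteness, Dobrushin's ergodic coefficient of a row-stochastic matrix, and the geometric total-variation contraction it implies, can be computed as below; the toy two-state transition matrix is illustrative, not the sampler's actual kernel.

```python
import numpy as np

def dobrushin(P):
    """Dobrushin's ergodic coefficient of a row-stochastic matrix P:
    delta(P) = (1/2) * max_{i,j} sum_k |P[i,k] - P[j,k]|."""
    n = P.shape[0]
    return 0.5 * max(
        np.abs(P[i] - P[j]).sum() for i in range(n) for j in range(n)
    )

# Toy 2-state chain: the TV distance between two initial distributions
# contracts by at least a factor delta(P) per step.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
delta = dobrushin(P)  # 0.5 * (|0.9 - 0.2| + |0.1 - 0.8|) = 0.7

mu = np.array([1.0, 0.0])
nu = np.array([0.0, 1.0])
tv_after = 0.5 * np.abs(mu @ P - nu @ P).sum()
```

Iterating, the distance to stationarity decays like `delta ** t`, which is the closed-form rate available for the fixed-temperature sampler (Algorithm 1) but, as noted above, not for the annealed Algorithm 2.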
III-D Gibbs sampling and stochastic approximation based approach to solve the constrained problem
In Section III-B and Section III-C, we presented Gibbs sampling based algorithms for (P3). Now we provide an algorithm that updates with time in order to meet the constraint in (P1) with equality, and thereby solves (P1) (via Theorem 1).
Lemma 1. For the unconstrained problem (P4), the optimal mean number of active sensors, , decreases with . Similarly, the optimal error, , increases with .
The optimal mean number of active sensors, , for the unconstrained problem (P4) is a decreasing staircase function of , where each point of discontinuity is associated with a change in the optimizer .
Lemma 1 provides an intuition about how to update in Algorithm 1 or in Algorithm 2 in order to solve (P1). We seek to provide an algorithm which updates at each time instant, based on the number of active sensors in the previous time instant. In order to maintain the necessary timescale difference between the process and the update process, we use stochastic approximation () based update rules for . But the above remark tells us that the optimal solution of the constrained problem (P1) requires us to randomize between two values of in case the optimal as in Theorem 1 belongs to the set of such discontinuities. However, this randomization would require us to update a randomization probability at yet another timescale, and running stochastic approximation at multiple timescales leads to slow convergence. Hence, instead of using a varying , we use a fixed but large , and update in an iterative fashion using stochastic approximation. Our proposed Algorithm 3 updates iteratively in order to solve (P1).
Algorithm 3. Choose any initial and . At each discrete time instant , pick a random sensor independently and uniformly. For sensor , choose with probability and choose with probability . For , we choose . After this operation, before the decision instant, update at each node as follows.
The update of in Algorithm 3 is inspired by the following result, which is crucial to the convergence proof.
Lemma 2. Under Algorithm 1, is a Lipschitz continuous and decreasing function of .
Discussion of Algorithm 3:
If is more than , then is increased with the hope that this will reduce the number of active sensors in subsequent iterations, as suggested by Lemma 2.
The and processes run on two different timescales; runs on the faster timescale, whereas runs on the slower timescale. This can be understood from the fact that the stepsize in the update process decreases with time . Here the faster timescale iterate views the slower timescale iterate as quasi-static, while the slower timescale iterate views the faster timescale iterate as almost equilibrated. This is reminiscent of two-timescale stochastic approximation (see [11, Chapter ]).
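A hedged sketch of the multiplier update logic discussed above: by Lemma 2, raising the sensing price reduces the mean number of active sensors, so the price rises when more sensors were active than the target and falls otherwise, with a diminishing Robbins–Monro stepsize. The sampler's response curve to the price below is a made-up placeholder, not the paper's.

```python
import numpy as np

def update_lambda(lam, n_active, n_target, step):
    """One stochastic-approximation step on the sensing price lam:
    increase it when n_active exceeded n_target, decrease it otherwise,
    projected to stay nonnegative."""
    return max(0.0, lam + step * (n_active - n_target))

# Illustrative closed loop: pretend the Gibbs sampler activates about
# n / (1 + lam) sensors on average (a made-up response curve).
rng = np.random.default_rng(2)
n, n_target = 10, 4
lam = 0.0
for t in range(1, 5000):
    mean_active = n / (1.0 + lam)
    n_active = rng.binomial(n, min(1.0, mean_active / n))
    lam = update_lambda(lam, n_active, n_target, step=1.0 / t)
```

The `1/t` stepsize gives the slow timescale: the activation process fluctuates every slot, while the price drifts toward the value at which the constraint is met on average.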
We make the following feasibility assumption for (P1), under the chosen .
By Lemma 2, decreases continuously in . Hence, if is feasible, then such a must exist by the intermediate value theorem.
Let us define: . Let denote under .
III-E A hard constraint on the number of activated sensors
Let us consider the following modified constrained problem:
It is easy to see that (MCP) can be solved using Gibbs sampling algorithms similar to those in Section III, with the Gibbs sampler run only over the set of configurations that activate exactly  sensors. Thus, as a by-product, we have also proposed a methodology for the problem in , though our framework is more general than .
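One natural way to restrict the sampler to configurations with a fixed number of active sensors is a swap move: exchange one active and one inactive sensor, accepting the swap with the conditional Gibbs probability on the two-point set. The swap mechanism and the toy energy below are our illustrative choices, not taken from the paper.

```python
import numpy as np

def swap_gibbs_step(b, energy, beta, rng):
    """Resample within the set of configurations with exactly sum(b) active
    sensors: pick one active and one inactive sensor and decide whether to
    swap them, via the Gibbs probability on {b, b_swapped}."""
    b = b.copy()
    on = np.flatnonzero(b == 1)
    off = np.flatnonzero(b == 0)
    if len(on) == 0 or len(off) == 0:
        return b  # nothing to swap
    i, j = rng.choice(on), rng.choice(off)
    b_swap = b.copy()
    b_swap[i], b_swap[j] = 0, 1
    p_swap = 1.0 / (1.0 + np.exp(-beta * (energy(b) - energy(b_swap))))
    return b_swap if rng.random() < p_swap else b

rng = np.random.default_rng(3)
# Toy energy favouring low-index sensors; the hard constraint is 2 active.
energy = lambda b: float(np.flatnonzero(b).sum())
b = np.array([0, 0, 1, 1, 0])
for _ in range(300):
    b = swap_gibbs_step(b, energy, beta=10.0, rng=rng)
```

The swap preserves the active-sensor count by construction, so the hard constraint in (MCP) holds at every step of the chain.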
IV Centralized Tracking of an IID Process: Unknown Parameter
In Section III, we described algorithms for centralized tracking of an i.i.d. process with known parameter. In this section, we deal with centralized tracking of an i.i.d. process whose parameter is unknown; in this case, the parameter has to be learnt over time through observations, which creates many nontrivial issues that must be addressed before Gibbs sampling can be used for sensor subset selection.
IV-A The proposed algorithm for unknown parameter
Since is unknown, its estimate has to be updated over time using the sensor observations. On the other hand, to solve the constrained problem (P1), we need to update over time so as to attain the optimal of Theorem 1 iteratively. Further, (MSE under configuration ) in (P4) is unknown since is unknown, and has to be learnt over time using the sensor observations. Hence, we combine the Gibbs sampling algorithm with update schemes for , and using stochastic approximation (see ).
The algorithm also requires a sufficiently large positive number and a large integer as input.
Let denote the indicator that time is an integer multiple of . Define .
We first describe some key features and steps of the algorithm, and then provide a brief summary.
IV-A1 Step size
For the stochastic approximation updates, the algorithm uses step sizes which are nonnegative sequences , , , such that:
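The exact conditions are elided above; for reference, standard multi-timescale stochastic approximation imposes Robbins–Monro conditions of the following form on each stepsize sequence, together with timescale separation (this is the generic textbook form, stated here as an assumption, not a transcription of the paper's conditions):

```latex
\sum_{t=1}^{\infty} a(t) = \infty, \qquad \sum_{t=1}^{\infty} a(t)^2 < \infty
\quad \text{(and similarly for } b(t),\, c(t),\, d(t)\text{)},
```

with separation between consecutive timescales, e.g.

```latex
\lim_{t \to \infty} \frac{b(t)}{a(t)} = 0, \qquad
\lim_{t \to \infty} \frac{c(t)}{b(t)} = 0, \qquad
\lim_{t \to \infty} \frac{d(t)}{c(t)} = 0.
```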
IV-A2 Gibbs sampling step
The algorithm also maintains a running estimate of for all . At time , it selects a random sensor uniformly and independently, and sets with probability and with probability (similar to Algorithm 1). For , it sets . This operation can even be repeated multiple times. The sensors are activated according to , the observations are collected, and then the algorithm declares .
IV-A3 Occasional reading of all the sensors
If , the fusion center reads all sensors and obtains . This is required primarily because we seek to update iteratively and reach a local maximum of the function .
IV-A4 update when
Since we seek to reach a local maximum of , a gradient ascent scheme needs to be used. The gradient of along any coordinate can be computed by perturbing in two opposite directions along that coordinate and evaluating the difference of at those two perturbed values. However, if is high-dimensional, then estimating this gradient along each coordinate is computationally intensive. Moreover, evaluating for any requires us to compute an expectation, which might also be expensive. Hence, we perform a noisy gradient estimation for by simultaneous perturbation stochastic approximation (SPSA) as in . Our algorithm generates uniformly over all sequences, and perturbs the current estimate by a random vector in two opposite directions, and estimates each component of the gradient from the difference ; this estimate is noisy because (i) and are random, and (ii) .
The -th component of is updated as follows:
The iterates are projected onto the compact set to ensure boundedness. The diminishing sequence ensures that the gradient estimate becomes more accurate with time.
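A self-contained sketch of an SPSA iteration of the kind described above, applied to a toy concave objective. The Rademacher perturbation and the two-sided difference match the generic SPSA recipe; the gain exponents (Spall's commonly used 0.602 and 0.101) and the objective itself are illustrative stand-ins, not the paper's log-likelihood.

```python
import numpy as np

def spsa_step(theta, f, a, c, rng):
    """One SPSA ascent step: perturb theta by +/- c*Delta with Delta a random
    Rademacher (+/-1) vector, estimate each gradient component from the single
    scalar difference, and move along the estimate."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    # ghat_i = (f(theta + c*Delta) - f(theta - c*Delta)) / (2 c Delta_i)
    g_hat = (f(theta + c * delta) - f(theta - c * delta)) / (2.0 * c) * (1.0 / delta)
    return theta + a * g_hat

# Toy concave objective with maximum at theta* = (1, -2).
f = lambda th: -((th[0] - 1.0) ** 2 + (th[1] + 2.0) ** 2)

rng = np.random.default_rng(4)
theta = np.zeros(2)
for t in range(1, 3000):
    theta = spsa_step(theta, f, a=0.1 / t ** 0.602, c=0.1 / t ** 0.101, rng=rng)
```

The key economy is visible in the code: two evaluations of `f` per iteration yield a (noisy, unbiased-in-expectation) estimate of the full gradient, regardless of the dimension of `theta`.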
Since is not known initially, the true value of is not known; hence, the algorithm updates an estimate using the sensor observations. If , the fusion center obtains by reading all sensors. The goal is to obtain an estimate of the MSE under a configuration by using these observations, and to update using . However, since is unknown and only is available, as an alternative to the MSE under configuration , the fusion center uses the trace of the conditional covariance matrix of given , assuming that . Hence, we define a random variable for all , where is the MMSE estimate declared by under configuration , and is the observation made by the active sensors determined by . Clearly, is a random variable with the randomness coming from two sources: (i) randomness of , and (ii) randomness of , which has a distribution since the original process that yields has a distribution . Computation of is simple for Gaussian and the MMSE estimator, since closed-form expressions are available to compute .
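For the linear Gaussian observation model, the trace of the conditional covariance given the active sensors' observations has the standard closed form used below (information-form fusion). The prior, observation matrices, and dimensions are hypothetical placeholders.

```python
import numpy as np

def posterior_cov_trace(Sigma, A_list, R, b):
    """Trace of the conditional covariance of x given the observations of the
    sensors activated by b, for y_k = A_k x + w_k with w_k ~ N(0, R):
    Sigma_post = (Sigma^{-1} + sum_{k active} A_k' R^{-1} A_k)^{-1}."""
    info = np.linalg.inv(Sigma)
    for k, active in enumerate(b):
        if active:
            info = info + A_list[k].T @ np.linalg.inv(R) @ A_list[k]
    return float(np.trace(np.linalg.inv(info)))

rng = np.random.default_rng(5)
d, q, n = 2, 2, 4
A_list = [rng.standard_normal((q, d)) for _ in range(n)]
Sigma, R = np.eye(d), 0.5 * np.eye(q)

t_none = posterior_cov_trace(Sigma, A_list, R, [0, 0, 0, 0])  # prior trace
t_all = posterior_cov_trace(Sigma, A_list, R, [1, 1, 1, 1])
```

Because each active sensor adds a positive semidefinite term to the information matrix, activating more sensors can only reduce the posterior trace, which is exactly the quantity traded off against the activation cost.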
Using , the following update is made for all :
The iterates are projected onto to ensure boundedness. The goal here is that, if , then will converge to , which is equal to under .
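The update (4) itself is a plain stochastic-approximation averaging step; a sketch on a made-up noisy sample stream for a single configuration (names and noise model are illustrative):

```python
import numpy as np

def update_mse_estimate(h, B, g, step):
    """Stochastic-approximation update of the running MSE estimate for
    configuration B: h[B] <- h[B] + step * (g - h[B])."""
    h = dict(h)
    h[B] = h[B] + step * (g - h[B])
    return h

# Illustrative run: noisy samples g_t with mean 3.0 for one configuration.
rng = np.random.default_rng(6)
B = (1, 0, 1)
h = {B: 0.0}
for t in range(1, 4000):
    g = 3.0 + rng.standard_normal()
    h = update_mse_estimate(h, B, g, step=1.0 / t)
```

With a `1/t` stepsize this reduces to the running sample mean, so the estimate converges to the expected value of the noisy samples, mirroring the claim that the iterate converges to the true conditional-covariance trace.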
We will later argue that this occasional computation for all can be avoided, but convergence will be slow.
IV-A7 The algorithm
IV-A8 Multiple timescales in Algorithm 4
Algorithm 4 has multiple iterations running at multiple timescales (see [11, Chapter ]). The process runs at the fastest timescale, whereas the update scheme runs at the slowest timescale. The basic idea is that a faster timescale iterate views a slower timescale iterate as quasi-static, whereas a slower timescale iterate views a faster timescale iterate as almost equilibrated. For example, since , the iterates vary very slowly compared to the iterates; as a result, the faster iterates view as quasi-static.
IV-B Complexity of Algorithm 4
IV-B1 Sampling and communication complexity
Since all sensors are activated when , the mean number of additional active sensors per unit time is ; these observations need to be communicated to the fusion center. This overhead can be made arbitrarily small by choosing  large enough.
IV-B2 Computational complexity
The computation of in (4) for all requires computations whenever . However, if one chooses large (e.g., ), then this additional computation per unit time will be small. If one wants to avoid that computation entirely, one can simply compute and update alone, instead of doing so for all configurations . In that case the stepsize sequence cannot be used; instead, a stepsize has to be used when and is updated using (4), where is the number of times configuration was chosen up to time . The convergence result (Theorem 5) on Algorithm 4 will still hold; however, the proof will require a technical condition almost surely for all , which is satisfied by the Gibbs sampler for finite and bounded . We discuss only the (4) update in this paper for simplicity of the convergence proof, since the technical details of asynchronous stochastic approximation required for this variant are not the main theme of this paper.
When , one can avoid computation of for all in Step of Algorithm 4. Instead, the fusion center can update only , and at time , since only these iterates are required in the Gibbs sampling.
IV-C Convergence of Algorithm 4
IV-C1 List of assumptions
The distribution and the mapping (or for the distributed case) as defined before are Lipschitz continuous in .
is known to the fusion center (centralized case), and are known to all sensors (distributed).
Let us consider with fixed in Algorithm 4. Suppose that one uses Algorithm 1 to solve (P3) for a given but with the MSE replaced by in the objective function of (P3), and then finds the as in Theorem 1 to meet the constraint . We assume that, for the given and , and for each , there exists such that the optimal Lagrange multiplier to relax this new unconstrained problem is (Theorem 1). Also, is Lipschitz continuous in .
Let us define the function .
Consider the function ; this is the expected conditional log-likelihood function of conditioned on , given that and . We assume that the ordinary differential equation has a globally asymptotically stable solution in the interior of . Also, is Lipschitz continuous in .
One can show that the iteration (2) asymptotically tracks this ordinary differential equation inside the interior of . In fact, when lies inside the interior of . The condition on is required to ensure that the iteration does not converge to some unwanted point on the boundary of due to the forced projection. The assumption on ensures that the iteration converges to .