# Distributed Detection and Mitigation of Biasing Attacks over Multi-Agent Networks

This paper proposes a distributed attack detection and mitigation technique based on distributed estimation over a multi-agent network, where the agents take partial system measurements susceptible to (possible) biasing attacks. In particular, we assume that the system is not locally observable via the measurements in the direct neighborhood of any agent. First, for performance analysis in the attack-free case, we show that the proposed distributed estimation is unbiased with bounded mean-square deviation in steady-state. Then, we propose a residual-based strategy to locally detect possible attacks at agents. In contrast to the deterministic thresholds in the literature assuming an upper bound on the noise support, we define the thresholds on the residuals in a probabilistic sense. After detecting and isolating the attacked agent, a system-digraph-based mitigation strategy is proposed to replace the attacked measurement with a new observationally-equivalent one to recover potential observability loss. We adopt a graph-theoretic method to classify the agents based on their measurements, to distinguish between the agents recovering the system rank-deficiency and the ones recovering output-connectivity of the system digraph. The attack detection/mitigation strategy is specifically described for each type, which is of polynomial-order complexity for large-scale applications. Illustrative simulations support our theoretical results.

## 1 Introduction

Data (or measurements) regarding many real-world systems, such as wireless sensor networks, multi-agent robotic systems, block-chain and cloud-computing, smart energy networks, are naturally distributed over large geographical regions [2, 52]. Collecting all these to a central coordinator (or a fusion center) for the purposes of processing and learning is tedious and impractical in many applications. Distributed learning or inference is thus typically preferred, due to the fact that it does not require long-range communication to a central unit. The corresponding distributed strategies are practically feasible as they rely on local data processing and local communication only among the neighboring agents. However, such decentralized strategies are vulnerable to malicious attacks. In this paper, we consider distributed detection and mitigation of biasing attacks at sensors/agents performing distributed estimation over a large-scale dynamical system. Potential applications include secure distributed estimation over Cyber-Physical-Systems (CPS) [97, 24, 35, 96, 77, 83, 26], Internet-of-Things (IoT) [14, 38, 75], smart cities [44], social networks [78, 34, 20], and power-grid monitoring systems [18, 16, 80, 52, 39, 7, 66, 3, 64, 9] among others.

In distributed estimation (or filtering) applications [56, 6, 73], a multi-agent network refers to a group of agents with sensing, data-processing, and communication capabilities, which take (noisy) outputs or measurements of the dynamical system, share their information over a network, and process the received data locally to track the system state. In case of erroneous or biased data [17, 68], the distributed estimation performance is significantly degraded if the biased measurements are necessary for observability. Recall that observability refers to the possibility of inferring the (entire) states of the dynamical system by tracking outputs/measurements of a subset of states over a finite time. The problem is more challenging in single time-scale estimation, with only one step of data-fusion between every two consecutive time-steps of system dynamics and with no local observability (i.e., the system is not observable in the neighborhood of any agent) [56, 6, 11, 13, 4, 87, 76]. This differs from double time-scale estimation, where all necessary information for observability is directly communicated to every agent from its neighbors, which requires considerably more communication traffic and information exchange over the network. Without local observability, the biased (attacked) measurement affects the residual (defined as the deviation of the estimated/expected output from the original system output [42]) at more agents, making it harder to locally isolate the faulty sensor. Such additive bias could be, for example, due to false-data injection attacks [43]. The general idea in this work is to locally detect and isolate such attacks and, further, to reconfigure the multi-agent network using substitute measurements to recover the (potential) loss of observability.

The distributed estimator in this paper performs consensus (on the received data) at the same time-scale as the underlying system (single time-scale); see, e.g., [55, 30] for details. We use structured systems theory [19, 30, 31] to guarantee generic or structural observability. This helps to partition the system outputs (fed to the agents) into certain observationally-equivalent classes [27], which gives the set of necessary agents for estimation (whose removal makes the system unobservable) and the set of redundant agents (whose removal results in no observability loss). Subsequently, different strategies are used to substitute the faulty sensor and design inter-agent communications. We propose our attack detection and mitigation strategy based on this specific agent classification. In particular, we show that isolation of the attacks related to system rank-deficiency is more challenging and requires a certain constrained gain design. Recall that system rank refers to the rank of the matrix associated with the linear system of differential equations (in the state-space representation); see Section 2.1 for more details.

Comparison with related literature: this work develops a joint distributed estimation and attack detection/isolation technique, and extends the prior works on resilient distributed estimation subject to unreliable sensor measurements [79, 36] and adversarial attacks [40, 69, 72, 93, 98, 86, 63, 92, 13]. These works do not detect/isolate the attack, but estimate the system in the presence of (specific) attacks with bounded (steady-state) error, while making simplifying assumptions, e.g., a noise-free model. Our work extends [79, 36, 40, 69, 72, 93, 98, 86, 63, 92, 13] by further considering distributed/localized techniques to locate the attacked sensor. Further, this work differs from many works on distributed estimation in the literature by relaxing the observability assumption; for example, [56, 6, 11, 13, 4, 87, 76] assume local observability at some (or all) agents. In contrast, and similar to [47, 48, 73], our work makes no such restrictive assumption. However, [47, 48, 73] perform many iterations of data-fusion (consensus) between two consecutive system steps (double time-scale estimation), requiring a much faster data-processing/communication rate.

In the context of adversarial attacks, most observer-based detection scenarios assume system and/or measurement noise with bounded support, i.e., they consider an upper bound on the noise variable [59, 74, 15, 61, 84]. In this paper, we make no such assumption; instead, the noise is assumed to be of infinite support (i.e., it can take any arbitrarily large value with bounded second-order moment). Therefore, we propose probabilistic attack-detection thresholds, in contrast to the deterministic threshold design (or flag value) in observer-based detection methods [59, 74, 15, 61, 84]. In another line of research [50, 91, 46, 10, 81, 85, 51, 99], distributed attack detection without observer/estimator design is considered. These works consider a multi-agent network aiming to detect (typically Byzantine) attacks in a sensed signal in a distributed way, with no estimation purpose (due to unknown system model). For example, [77] uses innovation variance to detect attacks (component malfunctions) in linear-quadratic-Gaussian (LQG) CPS models. However, our main goal is to detect the attacks (in the form of biasing anomalies changing the true output values [68]) deteriorating distributed estimation performance, and, further, to provide a mitigation strategy to restore observability (more precisely, distributed observability [25]). In this regard, this paper performs simultaneous distributed estimation and attack-detection, which makes it different from [50, 91, 46, 10, 81, 85, 51, 99, 77] performing only detection.

Of relevance are also watermarking strategies [70, 82], which inject a known input signal (watermark) into the system and track this watermark in the outputs using Chi-square testing ($\chi^2$-detector). Such input injection is not possible for tracking autonomous systems, and thus, physical watermarking is impractical in such cases. The distributed strategy in this work is not limited to full-rank LTI systems, in contrast to distributed estimators in [17, 4, 87, 76, 32] over strongly-connected (SC) sensor-networks. Further, unlike the static parameter estimation in [12] and noiseless centralized attack-detection/estimation in [11], this work is based on distributed estimation of noise-corrupted linear systems. Another relevant topic is compressive sensing [49, 88, 62, 67, 45, 95], which translates the data into a compressed dimension, shares and combines the data, reconstructs it to the full dimension, and performs a diffusion-based [95] or least mean square (LMS) update [62, 45, 67] to estimate the original signal. Although the compressed transmission of data is applicable in our work (to reduce the communication burden), distributed dynamic observability makes our work different from [62, 95, 67, 45, 9], which are based on static observability irrespective of the dynamic system model. Recall that this is referred to as the Static Linear State-Space (SLS) model in detection literature [42] and differs from our solution considering the Linear Dynamical State-Space (LDS) model.¹ Similarly, this work differs from centralized estimation in [49, 88] with certain assumptions on the sparsity of the initial states [49, 88] or system rank [49]. Autoencoder-based learning is used in some works [89, 90, 94, 58] to distinguish (classify) faulty/attacked data from non-attacked measurement data. In smart-grid applications, the PMU measurements are used to train the detector via either supervised learning [89, 90][94]. No dynamics is considered in these works (SLS model), contrasting our (distributed) observability-based LDS model. Further, [89, 90, 94] only perform detection with no aim of estimation in the absence of attacks, while some works (see references in [58]) only perform learning-based estimation with no possibility of detection. Recall that noise (in system dynamics and/or output) plays a key role in the LDS detection. As mentioned before, the assumption on the noise support (finite or infinite) and its value in the finite case affect the performance of the detection mechanism [59, 74, 15, 61, 84]. Similarly, noise in the output data affects the SLS detection performance, e.g., in power-system applications [62, 95, 67, 45, 9]. See more details along with a review of centralized physics-based detection mechanisms in [42].

¹ Using the dynamic model of the system (LDS case), fewer outputs are needed to reconstruct the full state of the system (dynamic observability), while in the static or SLS case (with no information of system dynamics) in general more outputs (as many as system states) are needed. Having fewer outputs in the SLS case results in an under-determined system of linear equations (unobservability), which mandates substitute recovering solutions such as compressive-sensing or auto-encoder neural networks [1]. A compressive-sensing-based example for the smart-grid application is given in [9], which requires no rank condition on the SLS model.

Main contributions: (i) Our observer-based detection strategy is localized and distributed over the multi-agent network with no local observability assumption at any agent, but global observability at the group of agents. This is key in large-scale applications, as it enables each agent to detect a (possible) attack on its received output with no central coordination, in contrast to centralized detection scenarios. (ii) Using a certain agent classification based on system-rank, we develop detection and attack isolation strategies which are specific to the measurement types based on the system dynamics (LDS model) (see Section 2.2 for a detailed explanation). (iii) The noise is considered over an infinite range with no constraint/bound on its support, which is more realistic for real-world applications (see Remark 1). In this sense, our attack detection and mitigation is categorized as probabilistic (vs. deterministic) thresholding. (iv) In order to prevent repetitive attacks at the same agent by the adversary, we consider an attack mitigation strategy to replace the biased measurement with an observationally-equivalent one (borrowing results from [27, 37]). We emphasize that the proposed algorithms for threshold design, agent classification, and mitigation via observational equivalency are of polynomial-order complexity.

Notation: Throughout this paper, scalar and (column) vector variables are respectively represented by lower-case and bold lower-case letters. Further, capital letters represent matrices. The induced $2$-norm of a matrix $M$ is defined as $\|M\|_2 = \sqrt{\rho(M^\top M)}$, where $\rho(\cdot)$ denotes the spectral radius of a matrix. Further, for a vector, $\|\cdot\|_2$ denotes the Euclidean norm. Table I summarizes the notation in this paper.

## 2 Problem Setup

### 2.1 Linear Dynamical System

Following the discussions in Section 1, we consider noise-corrupted linear discrete-time systems (LDS model [42]) as,

$$\mathbf{x}_{k+1} = A\mathbf{x}_k + \boldsymbol{\nu}_k, \tag{1}$$

with $\mathbf{x}_k \in \mathbb{R}^n$ as the column-vector of states at time $k$, $A \in \mathbb{R}^{n \times n}$ as the system matrix, and $\boldsymbol{\nu}_k$ as the system noise vector. Throughout the paper, the system-rank refers to the rank of the system matrix $A$. Consider a group of $N$ agents with scalar outputs given by $y^i_k = \mathbf{c}_i^\top \mathbf{x}_k + \zeta^i_k + \tau^i_k$ and the vector form as,

$$\mathbf{y}_k = C\mathbf{x}_k + \boldsymbol{\zeta}_k + \boldsymbol{\tau}_k, \tag{2}$$

with $\mathbf{y}_k = [y^1_k; \dots; y^N_k]$ as the column-vector of state measurements (or system outputs), $\boldsymbol{\zeta}_k$ as the measurement noise vector, and $\boldsymbol{\tau}_k$ as the column-vector of biasing attacks at the agents. We assume an arbitrary attack by the adversary, e.g., both fixed stationary attacks and non-stationary attacks are considered for simulation (Section 5). Further, the measurement matrix $C = [\mathbf{c}_1^\top; \dots; \mathbf{c}_N^\top]$ is the column concatenation of the row-vectors $\mathbf{c}_i^\top$ associated with agent $i$ (with “;” as column concatenation). Standard assumptions on Gaussianity and independence of noise terms are considered. For example, it is typical to assume that the sensor measurements are independent, making the measurement noise covariance matrix diagonal.
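As an illustration of the model (1)-(2), the following minimal sketch simulates a small noisy system with an additive measurement bias. The function name `simulate_lds`, the example matrices, the noise levels, and the step-bias attack are all hypothetical choices made for this sketch and are not taken from the paper's simulations.

```python
import numpy as np

def simulate_lds(A, C, x0, steps, nu_std, zeta_std, attack=None, seed=0):
    """Simulate x_{k+1} = A x_k + nu_k and y_k = C x_k + zeta_k + tau_k.

    `attack` is a hypothetical callable k -> tau_k (length-N vector);
    None means the attack-free case (tau_k = 0).
    """
    rng = np.random.default_rng(seed)
    n, N = A.shape[0], C.shape[0]
    x = np.asarray(x0, dtype=float)
    xs, ys = [], []
    for k in range(steps):
        tau = np.zeros(N) if attack is None else np.asarray(attack(k), dtype=float)
        # Measurement equation (2): Gaussian noise plus additive bias.
        y = C @ x + zeta_std * rng.standard_normal(N) + tau
        xs.append(x.copy())
        ys.append(y)
        # State equation (1): Gaussian process noise.
        x = A @ x + nu_std * rng.standard_normal(n)
    return np.array(xs), np.array(ys)

def step_bias(k):
    """Illustrative attack: agent 1 is biased by +5 from time-step 20 on."""
    return np.array([0.0, 5.0]) if k >= 20 else np.zeros(2)

# Illustrative 3-state system observed by N = 2 agents (assumed values).
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.7]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

xs, ys_clean = simulate_lds(A, C, np.ones(3), 40, 0.1, 0.1)
_, ys_attacked = simulate_lds(A, C, np.ones(3), 40, 0.1, 0.1, attack=step_bias)
```

Because both runs share the same seed, the attacked and attack-free outputs differ only by the injected bias, which makes the effect of $\boldsymbol{\tau}_k$ easy to inspect.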

###### Remark 1.

Several papers in the literature (e.g., [59, 74, 15, 61, 84]) assume constrained noise and/or , where the upper bound on the noise support sets the deterministic thresholds for attack detection. For example, in [74] the deterministic threshold at sensor is defined as with and as the 2-norm of the observability Grammian and the state-estimation error, respectively. In contrast, we make no such finite support assumption (loosely speaking, ), while it is standard to assume that the second moments of the noise terms are finite, i.e., and . Assuming unbounded , the deterministic threshold, for example in [74], also goes unbounded (), and thus, no attack can be detected. Similar arguments hold for [59, 15, 61, 84].

###### Remark 2.

Note that noise Gaussianity is a standard assumption in most distributed estimation/filtering and attack detection literature, e.g., see [73, 68, 97, 24, 35, 96, 83, 18, 16, 80, 52, 39, 79, 36, 72, 93, 50, 10, 99, 70, 82, 32, 11, 23, 22, 54].

### 2.2 Agent Classification based on Structural Analysis

The notion of observability used throughout this paper is structural [22, 19, 65] and the theory is built on this notion. It is known that the rank deficiency of matrix $A$ and strong-connectivity of the system digraph $\mathcal{G}_A$ affect its structural observability properties, and further, its estimation performance. In this direction, using structured systems theory and generic analysis [19, 65], we propose a specific sensor/agent classification based on the structure (zero-nonzero pattern) of the system matrix $A$ and system digraph $\mathcal{G}_A$. Using the theory developed in [27, 35], the agents are partitioned into different classes based on their state-measurements. We specifically show in Section 4.2 that the detection and mitigation logic differs for each class. First, we describe some relevant graph-theoretic notions. In $\mathcal{G}_A$, every node represents a state and every link represents a fixed non-zero entry of $A$ ($A_{ij} \neq 0$ implies $j \rightarrow i$ as a link from node $j$ to node $i$). In $\mathcal{G}_A$, a strongly-connected-component (SCC) is a component in which every node is connected to every other node via a path. Define a parent SCC as an SCC with no out-going links to other SCCs. Further, a contraction is a component $\mathcal{S}$ for which $|\mathcal{N}(\mathcal{S})| < |\mathcal{S}|$, with $\mathcal{N}(\mathcal{S})$ as the set of nodes that the nodes in $\mathcal{S}$ link to and $|\cdot|$ as the set cardinality. Based on these graph components, three types of agents are defined as follows,

• $\alpha$-agent is an agent with measurement of a state node in a contraction.

• $\beta$-agent is an agent with measurement of a state node in a parent SCC.

• $\gamma$-agent is any agent which is neither type $\alpha$ nor $\beta$.

An example of such classification is given in Section 5. This partitioning has two advantages: (i) it allows using a different communication topology for different types of agents and simpler topology design when one or the other type of agents is not present; and (ii) it allows for the attack detection and mitigation strategy to be specifically defined for each type (see details in Section 4.2). In particular, following [23], it can be shown that any $\alpha$-agent recovers the (structural) rank condition for observability, while the $\beta$-agent recovers the output-connectivity of the system digraph [22]. Therefore, both $\alpha$ and $\beta$-agents are necessary for observability, while removing (redundant) $\gamma$-agents has no effect on system observability. Recall that the structural properties are irrespective of the numerical values of system parameters [19]; therefore, for a structure-invariant matrix $A$ the proposed classification is fixed and time-invariant.
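The graph notions behind this classification can be sketched in a few lines. The code below is only an illustration, not the paper's classification algorithm: it finds the parent SCCs (candidate locations for output-connectivity measurements) with Kosaraju's algorithm and computes the structural rank of the zero-nonzero pattern by maximum bipartite matching, so that the rank deficiency indicates how many rank-recovering measurements are needed. The edge convention (a non-zero entry in row `i`, column `j` is a link from node `j` to node `i`), the function names, and the 4-state pattern are assumptions of this sketch.

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm; adj[u] lists the nodes v with a link u -> v."""
    n = len(adj)
    seen, order = [False] * n, []

    def visit(u):
        seen[u] = True
        for v in adj[u]:
            if not seen[v]:
                visit(v)
        order.append(u)  # post-order finishing time

    for u in range(n):
        if not seen[u]:
            visit(u)

    radj = [[] for _ in range(n)]  # reversed graph
    for u in range(n):
        for v in adj[u]:
            radj[v].append(u)

    comp = [-1] * n

    def assign(u, label):
        comp[u] = label
        for v in radj[u]:
            if comp[v] == -1:
                assign(v, label)

    label = 0
    for u in reversed(order):
        if comp[u] == -1:
            assign(u, label)
            label += 1
    return comp

def parent_sccs(pattern):
    """SCCs with no out-going links to other SCCs.

    Assumed convention: pattern[i][j] != 0 means a link from node j to node i.
    """
    n = len(pattern)
    adj = [[i for i in range(n) if pattern[i][j]] for j in range(n)]
    comp = strongly_connected_components(adj)
    has_out = {comp[u] for u in range(n) for v in adj[u] if comp[v] != comp[u]}
    groups = {}
    for u in range(n):
        groups.setdefault(comp[u], []).append(u)
    return [nodes for c, nodes in sorted(groups.items()) if c not in has_out]

def structural_rank(pattern):
    """Size of a maximum row-column matching of the zero-nonzero pattern."""
    n = len(pattern)
    match_col = [-1] * n  # match_col[j] = row currently matched to column j

    def augment(i, visited):
        for j in range(n):
            if pattern[i][j] and not visited[j]:
                visited[j] = True
                if match_col[j] == -1 or augment(match_col[j], visited):
                    match_col[j] = i
                    return True
        return False

    return sum(augment(i, [False] * n) for i in range(n))

# Hypothetical 4-state pattern: links 1->0, 0->1, 1->2, 2->2, 3->2.
pattern = [[0, 1, 0, 0],
           [1, 0, 0, 0],
           [0, 1, 1, 1],
           [0, 0, 0, 0]]
parents = parent_sccs(pattern)
rank_deficiency = len(pattern) - structural_rank(pattern)
```

In this toy pattern, node 2 forms the only parent SCC, and nodes 2 and 3 both link only to node 2, which is why the structural rank falls short of the number of states by one.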

### 2.3 Problem Statement

This paper considers a group of sensors/agents taking noise-corrupted measurements in the form (2) of a dynamical system (e.g., social network or power grid) in the form (1) represented by a system digraph , see Fig. 2. The agents perform distributed estimation over a network, denoted by to track the state of the noisy dynamical system (1). Note that the networks , , and their union include all the agents of type , , and . It is assumed that an adversarial attacker aims to add an arbitrary value (at any time ) to make the measurement at (one or more) agent biased from its original value. Since the dynamical system is not necessarily observable at any agent, the biased measurements (at -agents) affect the estimation error at all agents and result in the degradation of the distributed estimation performance. The problem here is to find a strategy to detect (and isolate) such instantaneous attacks locally at each agent. In particular, we propose a probabilistic detection strategy that returns the probability of attack (at each agent), instead of deterministic strategies returning 0-1 (NoAttack-Attack). The next question addressed in this paper is how to recover the potential loss of observability due to removing the attacked measurement depending on its type (, , or ). Such countermeasures prevent the same adversarial attack by removing the attacked agent/measurement. As explained in Section 4.2, the attacked measurement can be replaced with a new observationally-equivalent one to avoid possible repetitive attacks at the same agent.

### 2.4 Assumptions

1. The pair $(A, C)$ is observable. However, the system is not necessarily observable via the measurement of any single agent $i$ or via the measurements in its neighborhood (see details in Section 3).

2. The noise terms $\boldsymbol{\nu}_k$, $\boldsymbol{\zeta}_k$ are iid Gaussian, see Remark 1.

3. The known system matrix $A$ is not necessarily stable, i.e., its spectral radius can be potentially greater than $1$. In other words, this paper applies to both stable and unstable systems.

4. The adversary can manipulate the state measurements at a subset of sensors by adding an erroneous additive term $\tau^i_k$ at any time $k$. For example, $\tau^i_k$ can be drawn from a uniform distribution or can be a fixed value. In general, the term $\tau^i_k$ may be non-zero at some time-instants (instantaneous attack) and zero at some other times.

## 3 Distributed Estimation under Possible Measurement Attacks

In this section, we propose a consensus-based distributed estimation (filtering) protocol over the multi-agent network. The proposed protocol performs one iteration of information sharing and consensus between every two consecutive steps of system dynamics as follows:

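$$\hat{\mathbf{x}}^i_{k|k-1} = \sum_{j \in \mathcal{N}_\beta(i)} W_{ij} A \hat{\mathbf{x}}^j_{k-1|k-1}, \tag{3}$$

$$\hat{\mathbf{x}}^i_{k|k} = \hat{\mathbf{x}}^i_{k|k-1} + K_i \sum_{j \in \mathcal{N}_\alpha(i)} \mathbf{c}_j \left(y^j_k - \mathbf{c}_j^\top \hat{\mathbf{x}}^i_{k|k-1}\right), \tag{4}$$

where $y^j_k$ is the measurement of agent $j$ at time $k$ that could be attack-corrupted (or biased), $\mathcal{N}_\beta(i)$ and $\mathcal{N}_\alpha(i)$ are the neighborhoods of agent $i$, respectively, over the networks $\mathcal{G}_\beta$ and $\mathcal{G}_\alpha$, $K_i$ is the local feedback gain (or the observer gain) matrix at agent $i$, and $\hat{\mathbf{x}}^i_{k|k-1}$ and $\hat{\mathbf{x}}^i_{k|k}$ are the (column-vector of) estimates of the system state at agent $i$ given the measurements, respectively, up to time $k-1$ and $k$. In fact, $\hat{\mathbf{x}}^i_{k|k-1}$ is the a-priori estimate (or prediction) and $\hat{\mathbf{x}}^i_{k|k}$ is the posteriori estimate after the measurement-update at time-step $k$.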
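One iteration of the protocol (3)-(4) can be sketched as follows. This is a minimal illustration under assumed data structures: estimates are stored row-wise in an array, the neighborhoods $\mathcal{N}_\alpha(i)$ are given as index lists, and the neighborhoods $\mathcal{N}_\beta(i)$ are encoded by the non-zero pattern of $W$. The function name `estimator_step` and the example values are hypothetical.

```python
import numpy as np

def estimator_step(x_hat, y, A, C, W, K, N_alpha):
    """One iteration of protocol (3)-(4) for all agents.

    x_hat:   (N, n) array; row i is agent i's posterior estimate at time k-1
    y:       (N,) measurements at time k (possibly biased)
    C:       (N, n) array; row j is c_j^T
    W:       (N, N) row-stochastic weights; W[i, j] > 0 iff j is in N_beta(i)
    K:       list of N local gain matrices, each (n, n)
    N_alpha: list of index lists; N_alpha[i] are the agents whose
             measurements are available to agent i
    """
    N, n = x_hat.shape
    # (3): consensus on the one-step predictions of the neighbors' estimates.
    x_prior = W @ (x_hat @ A.T)
    x_post = np.empty_like(x_prior)
    for i in range(N):
        innovation = np.zeros(n)
        for j in N_alpha[i]:
            innovation += C[j] * (y[j] - C[j] @ x_prior[i])
        # (4): local measurement update with gain K_i.
        x_post[i] = x_prior[i] + K[i] @ innovation
    return x_prior, x_post

# Illustrative 2-state, 2-agent example (assumed values).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
C = np.eye(2)
W = np.array([[0.5, 0.5],
              [0.5, 0.5]])
K = [0.3 * np.eye(2), 0.3 * np.eye(2)]
N_alpha = [[0, 1], [1]]

x_true = np.array([1.0, 2.0])
x_hat = np.tile(x_true, (2, 1))      # all agents start with the exact state
y = C @ (A @ x_true)                 # noise-free, attack-free measurements
x_prior, x_post = estimator_step(x_hat, y, A, C, W, K, N_alpha)
```

With exact initial estimates and noise-free outputs, the innovation terms vanish and every agent's posterior estimate coincides with $A\mathbf{x}$, which is a quick sanity check of the update.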

###### Remark 3.

In this work, the combination of the following two graphs forms the multi-agent network: (i) over which agents share the estimates , and (ii) over which agents share their measurements . Define matrices and as the associated matrices to the graphs and , respectively. The matrix is the 0–1 adjacency matrix of , with associated to the link in from -agent to every agent . The non-zero entries of take values in the range associated to the link in .

Matrix $W$ is row-stochastic to ensure consensus on a-priori estimates, i.e., $\sum_{j} W_{ij} = 1$ for all $i$. Such a matrix (and the associated graph) can be formed via the distributed algorithms in [8]. The structure of the two graphs (and the associated matrices) needs to be designed properly for bounded steady-state estimation error, see Section 3.1.

###### Remark 4.

The proposed protocol (3)-(4) is a single time-scale distributed estimator, where the estimation is performed at the same time-scale of the system dynamics. This is in contrast to the double time-scale protocols [47, 48, 73], which require much faster estimation and communication rate than the sampling rate of the system dynamics, and, therefore, demand more costly communication and processing equipment. However, the observability assumption in [47, 48, 73] is similar to Assumption (ii), which makes such scenarios suitable for large-scale applications as the proposed protocol (3)-(4); see examples in Section 5.

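Denote the estimation error at agent $i$ at time $k$ by $\mathbf{e}^i_k = \mathbf{x}_k - \hat{\mathbf{x}}^i_{k|k}$ and let $\mathbf{e}_k = [\mathbf{e}^1_k; \dots; \mathbf{e}^N_k]$ be the global or collective error. Then, the following proposition defines the error dynamics of the protocol (3)-(4).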

###### Proposition 1.

The global error dynamics for protocol (3)-(4) is,

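$$\mathbf{e}_k = \left(W \otimes A - K D_C (W \otimes A)\right)\mathbf{e}_{k-1} + \boldsymbol{\eta}_k = \hat{A}\mathbf{e}_{k-1} + \boldsymbol{\eta}_k, \tag{5}$$

$$\boldsymbol{\eta}_k = 1_N \otimes \boldsymbol{\nu}_{k-1} - K D_C (1_N \otimes \boldsymbol{\nu}_{k-1}) - K \bar{D}_C \boldsymbol{\zeta}_k - K \bar{D}_C \boldsymbol{\tau}_k, \tag{6}$$

where $\boldsymbol{\eta}_k$ collects the noise terms, $K = \text{blockdiag}[K_i]$, $D_C = \text{blockdiag}\big[\sum_{j \in \mathcal{N}_\alpha(i)} \mathbf{c}_j \mathbf{c}_j^\top\big]$, $1_N$ is the all-ones column-vector of size $N$, and $\bar{D}_C$ is the $Nn \times N$ matrix whose $(i,j)$-th block is $\mathbf{c}_j$ if $j \in \mathcal{N}_\alpha(i)$ and $\mathbf{0}_n$ otherwise, with “$\odot$” and “$\otimes$”, respectively, as the entrywise (Hadamard) and Kronecker product.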

###### Proof.

The error at each agent is as follows,

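$$\mathbf{e}^i_k = \mathbf{x}_k - \Big(\sum_{j \in \mathcal{N}_\beta(i)} W_{ij} A \hat{\mathbf{x}}^j_{k-1|k-1} + K_i \sum_{j \in \mathcal{N}_\alpha(i)} \mathbf{c}_j \big(y^j_k - \mathbf{c}_j^\top \sum_{l \in \mathcal{N}_\beta(i)} W_{il} A \hat{\mathbf{x}}^l_{k-1|k-1}\big)\Big).$$

Recalling the row-stochasticity of the $W$ matrix, we have $A\mathbf{x}_{k-1} = \sum_{j \in \mathcal{N}_\beta(i)} W_{ij} A \mathbf{x}_{k-1}$. Substituting this along with equations (1)-(2),

$$\begin{aligned} \mathbf{e}^i_k &= \sum_{j \in \mathcal{N}_\beta(i)} W_{ij} A \mathbf{x}_{k-1} - \sum_{j \in \mathcal{N}_\beta(i)} W_{ij} A \hat{\mathbf{x}}^j_{k-1|k-1} - K_i \sum_{j \in \mathcal{N}_\alpha(i)} \mathbf{c}_j \mathbf{c}_j^\top \sum_{l \in \mathcal{N}_\beta(i)} W_{il} A \big(\mathbf{x}_{k-1} - \hat{\mathbf{x}}^l_{k-1|k-1}\big) + \boldsymbol{\eta}^i_k \\ &= \sum_{j \in \mathcal{N}_\beta(i)} W_{ij} A \mathbf{e}^j_{k-1} - K_i \sum_{j \in \mathcal{N}_\alpha(i)} \mathbf{c}_j \mathbf{c}_j^\top \sum_{l \in \mathcal{N}_\beta(i)} W_{il} A \mathbf{e}^l_{k-1} + \boldsymbol{\eta}^i_k, \end{aligned} \tag{7}$$

with $\boldsymbol{\eta}^i_k = \boldsymbol{\nu}_{k-1} - K_i \sum_{j \in \mathcal{N}_\alpha(i)} \big(\mathbf{c}_j \mathbf{c}_j^\top \boldsymbol{\nu}_{k-1} + \mathbf{c}_j \zeta^j_k + \mathbf{c}_j \tau^j_k\big)$. Using the definition of Kronecker and entrywise products, the collective error and noise term follow Eq. (5)-(6). ∎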
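The collective error matrix in (5) can be assembled numerically from the per-agent quantities in (7). The sketch below assumes that $K$ and $D_C$ are block-diagonal with $i$-th blocks $K_i$ and $\sum_{j \in \mathcal{N}_\alpha(i)} \mathbf{c}_j\mathbf{c}_j^\top$, which is how the per-agent recursion stacks into the Kronecker form; the function name and the example values are hypothetical.

```python
import numpy as np

def global_error_matrix(A, C, W, K, N_alpha):
    """Assemble A_hat = W (x) A - K D_C (W (x) A) from per-agent data.

    C is (N, n) with row j equal to c_j^T, K is a list of N gain
    matrices of size (n, n), and N_alpha[i] lists the agents whose
    measurements reach agent i.
    """
    N, n = C.shape
    D_C = np.zeros((N * n, N * n))
    K_blk = np.zeros((N * n, N * n))
    for i in range(N):
        blk = slice(i * n, (i + 1) * n)
        # i-th diagonal block of D_C: sum of c_j c_j^T over N_alpha(i).
        D_C[blk, blk] = sum(np.outer(C[j], C[j]) for j in N_alpha[i])
        K_blk[blk, blk] = K[i]
    WA = np.kron(W, A)
    return WA - K_blk @ D_C @ WA

# Illustrative 2-state, 2-agent example (assumed values).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
C = np.eye(2)
W = np.array([[0.5, 0.5],
              [0.5, 0.5]])
N_alpha = [[0, 1], [0, 1]]
K = [0.5 * np.eye(2), 0.5 * np.eye(2)]

A_hat = global_error_matrix(A, C, W, K, N_alpha)
spectral_radius = np.max(np.abs(np.linalg.eigvals(A_hat)))
```

With all gains set to zero the matrix reduces to $W \otimes A$, i.e., pure consensus on predictions with no measurement correction.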

### 3.1 Error Stability

The following lemma establishes the stability condition of the error dynamics (5)-(6).

###### Lemma 1.

A necessary condition for the error dynamics (5)-(6) to be stable is that the pair $(W \otimes A, D_C)$ is observable.

###### Proof.

The proof follows from the Kalman stability theorem applied to the error dynamics (5). More information on error stability of linear observer design can be found in [5, 22, 53]. ∎

Note that $(W \otimes A, D_C)$-observability is also referred to as distributed observability [25]. Using structured system theory (generic analysis), distributed observability can be formulated as the observability of the Kronecker product of the graphs $\mathcal{G}_\beta$ and $\mathcal{G}_A$. Following the observability analysis of Kronecker composite networks in [29], the following lemma determines the sufficient connectivity of $\mathcal{G}_\beta$ and $\mathcal{G}_\alpha$.

###### Lemma 2.

The pair $(W \otimes A, D_C)$ is observable if and only if the following conditions hold:

1. $\mathcal{G}_\beta$ is strongly-connected (SC) with a self-link at each agent, which further implies that $W$ is irreducible.

2. $\mathcal{G}_\alpha$ is a hub-network in which every $\alpha$-agent is a hub, i.e., there is a directed link from every $\alpha$-agent to every other agent in $\mathcal{G}_\alpha$. Further, $i \in \mathcal{N}_\alpha(i)$ for every agent $i$.

###### Proof.

We provide a sketch of the proof here and refer the interested reader to [29] for more details. For (structural) observability, two conditions on the associated composite graph need to be satisfied [22, 65]: (i) the output connectivity condition, implying the existence of a directed path from every state node in the system graph to an agent (output), and (ii) the rank condition, implying a direct output of (at least) one state node in every contraction in $\mathcal{G}_A$ for system-output rank recovery. In this work, the global system graph associated with $W \otimes A$ is the Kronecker-product of $\mathcal{G}_\beta$ and $\mathcal{G}_A$. Recall that for $(W \otimes A, D_C)$-observability (or distributed observability) the global system state must be observable to every agent. Therefore, to satisfy condition (i), every state node needs to be connected via a directed path to every agent, which justifies the strong-connectivity of $\mathcal{G}_\beta$. On the other hand, to satisfy condition (ii), the outputs from state nodes measured by all $\alpha$-agents (including one node in every contraction) need to be directly shared among all agents to recover their system-output rank. This implies that for any $\alpha$-agent $j$, we have $j \in \mathcal{N}_\alpha(i)$ for every agent $i$. This justifies the connectivity of $\mathcal{G}_\alpha$, and completes the proof. ∎

With $\mathcal{G}_\beta$ and $\mathcal{G}_\alpha$ satisfying the conditions in Lemma 2, the block-diagonal gain matrix $K$ can be designed such that $\rho(\hat{A}) < 1$, i.e., $\hat{A}$ is a Schur matrix. In fact, the gain matrix $K$ is known to be the solution to the Linear-Matrix-Inequality (LMI) $X - \hat{A}^\top X \hat{A} \succ 0$ or equivalently,

$$\begin{pmatrix} X & \hat{A}^\top X \\ X\hat{A} & X \end{pmatrix} \succ 0, \tag{8}$$

for some $X \succ 0$ (where “$\succ$” denotes positive-definiteness). However, to satisfy the distributed condition, $K$ needs to be further block-diagonal in order to satisfy information locality. Following [41, 53], an iterative cone-complementarity optimization method is adopted to design the proper matrix $K$ with polynomial-order complexity. Applying such a matrix $K$, we have $\rho(\hat{A}) < 1$, which implies stability and steady-state boundedness of the error in the attack-free case.
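For a given closed-loop matrix, the Schur and LMI conditions above can be verified numerically. The sketch below does not implement the iterative cone-complementarity gain design of [41, 53]; it only checks a candidate $\hat{A}$ by solving the discrete Lyapunov equation $X - \hat{A}^\top X \hat{A} = I$ through vectorization and then testing positive-definiteness of the block matrix in (8). The function names and the example matrix are assumptions of this sketch.

```python
import numpy as np

def is_schur(M):
    """True if the spectral radius of M is strictly below one."""
    return np.max(np.abs(np.linalg.eigvals(M))) < 1.0

def lyapunov_certificate(A_hat):
    """Solve X - A_hat^T X A_hat = I for X via column-major vectorization.

    Uses vec(A^T X A) = (A^T kron A^T) vec(X); the linear system is
    solvable whenever A_hat is Schur.
    """
    m = A_hat.shape[0]
    lhs = np.eye(m * m) - np.kron(A_hat.T, A_hat.T)
    rhs = np.eye(m).reshape(-1, order="F")
    X = np.linalg.solve(lhs, rhs).reshape(m, m, order="F")
    return 0.5 * (X + X.T)  # remove round-off asymmetry

def lmi_block(X, A_hat):
    """Block matrix on the left-hand side of (8)."""
    return np.block([[X, A_hat.T @ X],
                     [X @ A_hat, X]])

# Hypothetical closed-loop matrix, assumed already designed.
A_hat = np.array([[0.5, 0.2],
                  [0.0, 0.3]])
X = lyapunov_certificate(A_hat)
lmi_min_eig = np.linalg.eigvalsh(lmi_block(X, A_hat)).min()
```

A strictly positive `lmi_min_eig` together with a positive-definite `X` certifies that this particular $\hat{A}$ satisfies (8); the structural (block-diagonal) constraint on $K$ has to be enforced separately in the design step.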

### 3.2 Performance Analysis in the Attack-free Case

Next, we provide the performance analysis of the proposed distributed estimator (filter) (3)-(4) in the attack-free case. Following the same line of analysis as in [56, 6, 73, 54], we analyze the mean performance and mean-square performance of the protocol (3)-(4) for $\boldsymbol{\tau}_k = \mathbf{0}$.

###### Lemma 3.

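Let $\mathbf{e}_\infty$ denote the steady-state error of the proposed estimator (3)-(4). Then, $\mathbb{E}(\mathbf{e}_\infty) = \mathbf{0}$.

###### Proof.

Taking expectation of the error dynamics (5),

$$\mathbb{E}(\mathbf{e}_k) = \hat{A}\,\mathbb{E}(\mathbf{e}_{k-1}) + \mathbb{E}(\boldsymbol{\eta}_k). \tag{9}$$

Recall from Section 3.1 that $\rho(\hat{A}) < 1$ and following from [56, 54], it is clear that the first term in (9) vanishes asymptotically. Then, from (6) in the attack-free case ($\boldsymbol{\tau}_k = \mathbf{0}$),

$$\mathbb{E}(\mathbf{e}_\infty) = \mathbb{E}(\boldsymbol{\eta}_\infty) = 1_N \otimes \mathbb{E}(\boldsymbol{\nu}_\infty) - K D_C \big(1_N \otimes \mathbb{E}(\boldsymbol{\nu}_\infty)\big) - K \bar{D}_C\, \mathbb{E}(\boldsymbol{\zeta}_\infty).$$

Recall from Section 2.1 that $\mathbb{E}(\boldsymbol{\nu}_k) = \mathbf{0}$ and $\mathbb{E}(\boldsymbol{\zeta}_k) = \mathbf{0}$. This implies that $\mathbb{E}(\mathbf{e}_\infty) = \mathbf{0}$ and the lemma follows. ∎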

###### Lemma 4.

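Define $E = \mathbb{E}(\boldsymbol{\nu}_k \boldsymbol{\nu}_k^\top)$ and $R = \mathbb{E}(\boldsymbol{\zeta}_k \boldsymbol{\zeta}_k^\top)$. Let $Q_\infty$ denote the collective error covariance at the steady-state. For error dynamics (5) in the attack-free case,

$$\|Q_\infty\|_2 \leq \frac{a_1 N \|E\|_2 + a_2 \|\bar{R}\|_2}{1 - b^2}, \tag{10}$$

with $a_1 = \|I_{Nn} - K D_C\|_2^2$, $a_2 = \|K\|_2^2$, and $b = \|\hat{A}\|_2$, $\bar{R} = \bar{D}_C R \bar{D}_C^\top$.

###### Proof.

Following [54] with $\Phi = \mathbb{E}(\boldsymbol{\eta}_k \boldsymbol{\eta}_k^\top)$,

$$\|Q_\infty\|_2 \leq \frac{\|\Phi\|_2}{1 - b^2}. \tag{11}$$

From (6) we have (up to cross terms, which vanish in expectation due to the independence of the zero-mean noise terms),

$$\boldsymbol{\eta}_k \boldsymbol{\eta}_k^\top = (I_{Nn} - K D_C)(1_{NN} \otimes \boldsymbol{\nu}_{k-1} \boldsymbol{\nu}_{k-1}^\top)(I_{Nn} - K D_C)^\top + (K \bar{D}_C) \boldsymbol{\zeta}_k \boldsymbol{\zeta}_k^\top (K \bar{D}_C)^\top, \tag{12}$$

where $1_{NN}$ denotes the $N \times N$ all-ones matrix. Then, taking the expectation of (12),

$$\|\Phi\|_2 \leq \|(I_{Nn} - K D_C)(1_{NN} \otimes E)(I_{Nn} - K D_C)^\top\|_2 + \|(K \bar{D}_C) R (K \bar{D}_C)^\top\|_2.$$

Using the fact that $\|1_{NN} \otimes E\|_2 = N \|E\|_2$,

$$\|\Phi\|_2 \leq \|I_{Nn} - K D_C\|_2^2\, N \|E\|_2 + \|K\|_2^2\, \|\bar{R}\|_2, \tag{13}$$

and applying equation (11) results in (10). ∎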

In fact, Lemma 3 implies that the estimator (3)-(4) is unbiased in the absence of attacks, while Lemma 4 states that its mean-square estimation error (also known as mean-square deviation [6]) is bounded in steady-state.
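The norm bound (11) can be checked numerically on a small example. The sketch below assumes the standard covariance recursion $Q_k = \hat{A} Q_{k-1} \hat{A}^\top + \Phi$ for the error dynamics (5) with white noise (an assumption of this sketch, consistent with but not spelled out in the text), approximates its fixed point by a truncated series, and compares its induced 2-norm with $\|\Phi\|_2 / (1 - b^2)$ where $b = \|\hat{A}\|_2$. The function names and example matrices are hypothetical.

```python
import numpy as np

def steady_state_covariance(A_hat, Phi, terms=500):
    """Approximate Q_inf = sum_k A_hat^k Phi (A_hat^T)^k by a truncated series."""
    Q = np.zeros_like(Phi, dtype=float)
    term = Phi.astype(float)
    for _ in range(terms):
        Q += term
        term = A_hat @ term @ A_hat.T
    return Q

def norm_bound(A_hat, Phi):
    """Right-hand side of (11), valid when b = ||A_hat||_2 is below one."""
    b = np.linalg.norm(A_hat, 2)
    if b >= 1.0:
        raise ValueError("the bound needs the induced 2-norm of A_hat below 1")
    return np.linalg.norm(Phi, 2) / (1.0 - b ** 2)

# Hypothetical closed-loop matrix and noise covariance.
A_hat = np.array([[0.5, 0.1],
                  [0.0, 0.4]])
Phi = np.eye(2)

Q_inf = steady_state_covariance(A_hat, Phi)
q_norm = np.linalg.norm(Q_inf, 2)
bound = norm_bound(A_hat, Phi)
```

Note that the bound requires $\|\hat{A}\|_2 < 1$, which is stronger than Schur stability alone; the helper raises an error otherwise.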

## 4 Main Algorithm

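We now describe the attack detection logic. Define the residual at every agent $i$ as the absolute value of the difference between the original output and the estimated output,

$$r^i_k \triangleq |y^i_k - \hat{y}^i_k| = |\mathbf{c}_i^\top \hat{A}_i \mathbf{e}_{k-1} + \mathbf{c}_i^\top \boldsymbol{\eta}^i_k + \zeta^i_k + \tau^i_k|, \tag{14}$$

with $\hat{y}^i_k = \mathbf{c}_i^\top \hat{\mathbf{x}}^i_{k|k}$ and $\hat{A}_i$ as the $i$-th block-row of $\hat{A}$. Note that the residual defined above based on the absolute value is a standard definition, which is irrespective of the attack being positive ($\tau^i_k > 0$) or negative ($\tau^i_k < 0$) and works for both sign-preserving and sign-changing attacks. As shown in Lemmas 3 and 4, in the attack-free case with $\boldsymbol{\tau}_k = \mathbf{0}$, the estimation error $\mathbf{e}_k$, and therefore the residual $r^i_k$, is bounded steady-state stable and unbiased at all agents. Note that the contribution of the first term in (14) is governed by the Schur stability of $\hat{A}$, while the second term in (14) is,

$$\mathbf{c}_i^\top \boldsymbol{\eta}^i_k = \mathbf{c}_i^\top \boldsymbol{\nu}_{k-1} - \mathbf{c}_i^\top K_i \sum_{j \in \mathcal{N}_\alpha(i)} \big(\mathbf{c}_j \zeta^j_k + \mathbf{c}_j \tau^j_k + \mathbf{c}_j \mathbf{c}_j^\top \boldsymbol{\nu}_{k-1}\big). \tag{15}$$

In case of an attack on agent $i$, i.e., $\tau^i_k \neq 0$, the residual $r^i_k$ is biased at agent $i$. This biased residual can be used to find (isolate) the attacked agent. In this sense, first, we need to define a threshold on the residuals to distinguish between the effect of noise terms (in the absence of attacks) and the biasing attacks.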

### 4.1 Probabilistic Threshold Design

Here, the probabilistic detection thresholds are defined based on $\|Q_\infty\|_2$ in (10). For each agent $i$ define,

$$\frac{\|Q_\infty\|_2}{N} \leq \frac{a_1 N \|E\|_2 + a_2 \|\bar{R}\|_2}{N(1 - b^2)} =: \Theta_1. \tag{16}$$

Then, for specific false alarm rates and attack detection probabilities, one can consider different detection-levels $\kappa$ as described in Fig. 3. A detection-level represents a specific probability threshold associated with the Gaussian PDF of the estimation error in the attack-free case. Then, the thresholds are designed as follows.

###### Lemma 5.

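Following the assumptions in Section 2.4, given the noise covariances $E$ and $R$ and the residuals from Eq. (14), the attack detection threshold for a detection-level $\kappa$ is,

$$\theta_\kappa := \kappa\sqrt{\Theta^i_2}, \qquad \Theta^i_2 := |\mathbf{c}_i^\top| \Theta_1 + R_{ii}, \tag{17}$$

where $\operatorname{erf}(\kappa/\sqrt{2})$ is the detection probability (with $\operatorname{erf}(\cdot)$ as the Gauss error function), $\mathbf{c}_i$ is the measurement column-vector at agent $i$, and $\Theta_1$ follows (16).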

###### Proof.

The proof directly follows from Lemmas 3 and 4 and the results in [54]. From Lemmas 3 and 4, for the attack-free case, and following the zero-mean Gaussian distribution of the noise terms in $\boldsymbol{\eta}_k$ (including $\boldsymbol{\nu}_k$ and $\boldsymbol{\zeta}_k$) and linearity of the error dynamics (5)-(6) and the protocol (3)-(4), it is straightforward to see that $\mathbf{e}_k$ and $y^i_k - \hat{y}^i_k$ are Gaussian; see details in [54]. Then, from standard textbooks on Gaussian distribution (e.g., [60]) and Eq. (14) in the attack-free case, the probability of $r^i_k \leq \theta_\kappa$ with $\theta_\kappa = \kappa\sqrt{\Theta^i_2}$ is determined via the value of the normal deviate less than $\kappa$, i.e., $\operatorname{erf}(\kappa/\sqrt{2})$. Recall that $\Theta^i_2$ is the residual variance and $R_{ii}$ is the measurement noise variance at agent $i$. Then, in the presence of an attack, both error and residual are biased by some products of $\boldsymbol{\tau}_k$ (due to linearity). In this case, the residual follows a biased Gaussian distribution with non-zero mean. Following statistical hypothesis testing for the two Gaussian distributions with equal variance (assuming equally likely a-priori hypotheses), if the residual $r^i_k$ is greater than $\theta_\kappa$ then the probability of attack is $\operatorname{erf}(\kappa/\sqrt{2})$ and the probability of false alarm is $1 - \operatorname{erf}(\kappa/\sqrt{2})$. This justifies the probability thresholds (as illustrated in Fig. 3) and completes the proof. ∎

The parameter in (17) and Lemma 5 can take any real (or integer) value in . Some typical threshold probability values for integer values of are given in Table II. Clearly, higher values of (and ) imply lower false alarm rates.

###### Remark 5.

A straightforward sequel to Lemma 5 is that one can design the threshold for a given false-alarm rate as .
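The relation between a detection-level and its probability, and the inverse design from a target false-alarm rate, can be sketched numerically. This is a minimal illustration, assuming the residual is normalized to a zero-mean, unit-variance Gaussian in the attack-free case and that the false-alarm rate is the two-sided tail mass beyond the detection-level; the function names `level_probability` and `level_for_false_alarm` are hypothetical and not from the paper.

```python
import math
from statistics import NormalDist


def level_probability(kappa: float) -> float:
    """P(|z| < kappa) for a standard Gaussian z (assumed normalization)."""
    return math.erf(kappa / math.sqrt(2.0))


def level_for_false_alarm(p_fa: float) -> float:
    """Detection-level whose two-sided Gaussian tail mass equals p_fa.

    Assumption: the false-alarm rate is 1 - erf(kappa / sqrt(2)).
    """
    if not 0.0 < p_fa < 1.0:
        raise ValueError("p_fa must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p_fa / 2.0)


if __name__ == "__main__":
    for kappa in (1, 2, 3):
        p = level_probability(kappa)
        print(f"kappa={kappa}: p={p:.4f}, false alarm={1.0 - p:.4f}")
    print(f"kappa for a 5% false-alarm rate: {level_for_false_alarm(0.05):.3f}")
```

The two functions are inverses of each other under the stated assumption, so a designer can start either from a detection-level or from a tolerated false-alarm rate.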

###### Remark 6.

The magnitude of the residual is tightly related to the magnitude of the biasing attack . In other words, greater measurement bias results in greater residual exceeding the threshold with higher attack probability and lower probability of false alarm .

Recall from Remark 1 that, unlike [59, 74, 15, 61, 84] considering a fixed (deterministic) threshold based on the upper bound on , Eq. (17) assigns probability to the threshold with no such upper bound assumption on the noise terms, implying the probabilistic threshold design.

### 4.2 Attack Detection and Mitigation Logic

Recall that, following Lemma 2, the connectivity of the , , and -agents over and results in the next lemma.

###### Lemma 6.

Following the connectivity condition in Lemma 2 and residual formulations in (14)-(15),

1. In the absence of any -agent, an attack at any or -agent is isolated. (Footnote 2: The number of -agents is equal to the rank-deficiency of the system matrix [23]. Therefore, for a full-rank system, the associated distributed estimator has no -agent [32].)

2. For isolation of attack in presence of an -agent , the gain matrix needs to satisfy,

$$\left| \frac{c_i^\top K_i c_j}{c_j^\top K_j c_j - 1} \right| \le \epsilon, \quad \text{for } i \ne j, \tag{18}$$

where is a pre-specified constant determining the residual ratio.

###### Proof.

From Lemma 2, in absence of any -agent, for any agent of type and . Thus, from (14)-(15), biasing attack at a or -agent only affects the residual . This implies that is biased while () is unbiased, implying that attack is isolated at any /-agent. On the other hand, in the presence of an -agent subject to attack , Eq. (14)-(15) implies that the residual at every agent is affected by the attack at agent via the term , while the residual at -agent is affected by the factor . Therefore, Eq. (18) ensures that (for ), implying greater residual at -agent by factor . This constraint ensures that the attack can be isolated at every -agent . ∎
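As a rough numerical check of the gain condition, the sketch below evaluates the residual ratio for every non-attacked agent. It assumes that (18) bounds the absolute ratio of c_i^T K_i c_j to (c_j^T K_j c_j - 1), that each gain block K_i is a square matrix, and that measurement vectors are given as plain lists; the helper names are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear(u: Vector, m: Matrix, v: Vector) -> float:
    """Return u^T M v for list-based vectors and matrices."""
    return sum(u[r] * m[r][c] * v[c]
               for r in range(len(u)) for c in range(len(v)))


def residual_ratio(c_i: Vector, k_i: Matrix, c_j: Vector, k_j: Matrix) -> float:
    """Assumed reading of (18): |c_i^T K_i c_j / (c_j^T K_j c_j - 1)|."""
    denom = bilinear(c_j, k_j, c_j) - 1.0
    if denom == 0.0:
        raise ZeroDivisionError("c_j^T K_j c_j equals 1; ratio undefined")
    return abs(bilinear(c_i, k_i, c_j) / denom)


def satisfies_isolation(c: Sequence[Vector], k: Sequence[Matrix],
                        j: int, eps: float) -> bool:
    """True if the ratio stays within eps for all agents i != j."""
    return all(residual_ratio(c[i], k[i], c[j], k[j]) <= eps
               for i in range(len(c)) if i != j)


if __name__ == "__main__":
    c = [[1.0, 0.0], [0.0, 1.0]]
    k = [[[0.5, 0.1], [0.0, 0.5]], [[0.5, 0.0], [0.0, 0.5]]]
    print(satisfies_isolation(c, k, j=1, eps=0.25))
```

A smaller bound forces a larger gap between the residual at the attacked agent and the residuals elsewhere, at the price of a tighter constraint on the gain design.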

Following Lemmas 5 and 6, for the attacked agent (of any type) the residual is (more) biased over in (17), while the residuals at other agents are less biased (or unbiased). The largest such that declares the probability of attack (or probability of false alarm ). Likewise, from Remarks 5 and 6, the attack detection logic can be designed for a given false alarm rate (and probabilistic threshold ) at sensor . Then, similar to the deterministic case, the following hypothesis testing locally declares “Attack” or “No-Attack” at sensor (under a certain false alarm rate ),

 (19)
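The local decision rule can be sketched as follows. This is an illustrative sketch only, assuming the threshold scales linearly with the detection-level as in (17) and that an attack is declared when the residual strictly exceeds the threshold; `declare` and `largest_exceeded_level` are hypothetical names.

```python
from typing import Iterable, Optional


def declare(residual: float, threshold: float) -> str:
    """Local hypothesis test: compare one residual with one threshold."""
    return "Attack" if residual > threshold else "No-Attack"


def largest_exceeded_level(residual: float, scale: float,
                           levels: Iterable[float]) -> Optional[float]:
    """Largest detection-level whose threshold (level * scale) is exceeded.

    Returns None when the residual stays below every threshold.
    """
    exceeded = [level for level in levels if residual > level * scale]
    return max(exceeded) if exceeded else None


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the logic.
    print(declare(residual=0.9, threshold=0.5))
    print(largest_exceeded_level(residual=2.5, scale=1.0, levels=[1, 2, 3]))
```

Each agent only needs its own residual and its stored thresholds to run this test, which matches the local nature of the detection.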
###### Remark 7.

A relevant concept is nodal/local consistency of measurement/prediction information (data) set at agent and at every time , denoted by [57]. Recall that nodal consistency checks the statistical consistency of with the information over a sliding time-window , declaring that is trustable or not. In this direction, one can track the information over such time-window and apply, for example, a chi-square detector on the residuals over [20] instead of instantaneous residuals (14). Local consistency, on the other hand, checks the statistical consistency of the common information (e.g., on the shared observable subspace) between and received information , , and declares if is trustable or not. Note that for (necessary) /-agents, weak local consistencies imply certain loss of observability information and degradation of estimation performance.

###### Remark 8.

(Attack mitigation) From Section 2.2, /-agents are necessary for observability; therefore, in case of attacks, the erroneous information on their observable subsystems makes those subsystems unobservable to all agents, causing unstable estimation error. To recover the loss of observability, recall that the states in the same parent SCC and in the same contraction are observationally-equivalent, in the sense that measurements of two states in or in provide information on the same observable subsystem. In other words, the information offered by two state measurements (agents ) is said to be observationally-equivalent if they equally contribute to the rank recovery of the observability Gramian (see detailed definition in [27, 37]). In this regard, for attack mitigation, the biased measurement can be replaced with a new measurement of an observationally-equivalent state in or . Note that, after mitigating the attacks, the performance analysis follows as in Section 3.2.

###### Remark 9.

(Cost-optimal mitigation) Given an observationally-equivalent set of state nodes or , the substitute/replacement state measurement can be chosen based on its sensing cost. Combinatorial optimization strategies [28], e.g., the well-known Hungarian algorithm, can be adopted to find the minimal-cost equivalent measurement to reduce the overall sensing cost. Similar arguments hold for cost-optimal design of the multi-agent network , e.g., using the so-called minimum spanning strong sub-graph algorithm [33].
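For a single attacked measurement, the cost-optimal choice reduces to picking the cheapest remaining state in the observationally-equivalent set. The sketch below illustrates only this simplified case; it is not the Hungarian algorithm, which would become relevant when several replacements must be assigned jointly. The state labels, the cost table, and the function name `cheapest_substitute` are hypothetical.

```python
from typing import Hashable, Iterable, Mapping, Optional


def cheapest_substitute(equivalent_states: Iterable[Hashable],
                        attacked_state: Hashable,
                        cost: Mapping[Hashable, float]) -> Optional[Hashable]:
    """Pick the lowest-cost state equivalent to the attacked one.

    Returns None if the equivalent set offers no alternative state.
    """
    candidates = [s for s in equivalent_states if s != attacked_state]
    if not candidates:
        return None
    return min(candidates, key=lambda s: cost[s])


if __name__ == "__main__":
    # Hypothetical equivalent set and sensing costs.
    states = ["x1", "x2", "x3"]
    costs = {"x1": 4.0, "x2": 1.5, "x3": 2.0}
    print(cheapest_substitute(states, attacked_state="x2", cost=costs))
```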

Remark 8, along with Lemmas 5 and 6, results in Algorithm 1.

Note that the terms in (5) and in (16) are defined locally, i.e., the -th diagonal block of and related to agent are defined based on received measurement information and from its direct neighbors (summation is over ). Therefore, the calculations of these terms are distributed and localized over the network. The thresholds in (17), agent types, and the sets of observationally-equivalent states in the system digraph are determined by a central entity once off-line and then broadcast to every agent. This procedure is done once and the information is stored at all agents; then, the agents can perform estimation and detect the attack locally with no further role of the centralized entity. See similar assumptions in [53, 54] for distributed estimation/filtering.

###### Remark 10.

The DM (Dulmage-Mendelsohn) decomposition and DFS (depth-first-search) or Kosaraju-Sharir algorithms can be used, respectively, to find contractions and SCCs (along with their topological order) with computational complexity and [71]. The residual calculation at agents is of complexity, while the complexity of the threshold design based on -norm calculation is . Overall, the complexity of Algorithm 1 is . This polynomial order complexity suits large-scale applications.
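The SCC step can be illustrated with a compact Kosaraju-style two-pass search, which runs in time linear in the number of nodes plus links. This is a generic sketch, not the authors' implementation; the adjacency-dictionary input format and the function name are assumptions, and the classification of SCCs into parent/child types is left out.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]


def strongly_connected_components(graph: Graph) -> List[Set[Hashable]]:
    """Kosaraju two-pass DFS; components come out in topological order
    of the condensation (source components first)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Pass 1: iterative DFS on the graph, recording finishing order.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:  # all successors explored: the node is finished
                order.append(node)
                stack.pop()

    # Build the reversed graph.
    reverse = {n: [] for n in nodes}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)

    # Pass 2: DFS on the reversed graph in decreasing finishing time.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


if __name__ == "__main__":
    # Hypothetical digraph with two SCCs: {1, 2, 3} and {4, 5}.
    print(strongly_connected_components({1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}))
```

The iterative form avoids recursion limits on large digraphs, which matters for the large-scale setting discussed in the remark.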

## 5 Simulation

For simulation, we consider a dynamical system with states associated with the system digraph in Fig. 4-(Left). The link weights in are chosen randomly (such that ).

Following Remark 10, the contractions and parent SCCs in are: , , and . From Section 2.2, one output from each of these node sets ensures observability of . As shown in Fig. 4-(Left), agents , , and take output of state , , and , respectively, along with a redundant agent with output of state (which is not necessary for observability). Following Section 3, the network is considered as a cycle, while in agents and are two hubs of the network. Each agent adopts the proposed protocol (3)-(4) to estimate all system states (with partial observability via its measurement and neighboring information). The link weights in (the nonzero s) are chosen randomly such that is row-stochastic. The noise terms follow and . The block-diagonal gain is determined via heuristic LMIs such that, for example: , , , , satisfying Lemma 6 for any with as agent in (18). Likewise, for agent , implying that, for this given , the attack-related portion of the residual at the attacked agent is almost times greater than the residuals at other (non-attacked) agents. Therefore, any attack at agents can be isolated. The parameters in Eq. (16) are , , , which result in and . We consider a fixed attack at agent (following Assumption (iv)) along with an auto-regressive non-stationary attack for at agent in the form with and as a uniform random variable. The residuals (14) (shown in Fig. 4-(Right)) at the attacked agents and are biased, respectively, over and , implying false alarm probabilities approximately less than and . (Footnote 3: The auto-regressive attack is given as an example of a possible extension of the results to the case of non-stationary attacks, where the attack probabilities can be approximated by Lemma 5.)

Comparison with recent literature: next, we use the estimation and detection strategy in [47, 48] for comparison. Recall from Remark 4 that the distributed observer in [47, 48] is a double time-scale protocol, which requires many iterations of consensus between every two time-steps of the system dynamics. Therefore, it needs a much faster information sharing/processing rate than the proposed protocol (3)-(4). The reason for choosing [47, 48] for the comparison study is that double time-scale protocols make a similar relaxed observability assumption to Assumption (ii) in Section 2.4 (irrespective of system rank-deficiency). This is in contrast to many existing single time-scale protocols, e.g., [56, 6, 11, 13, 4, 87, 76], which assume that the underlying system is observable in the neighborhood of each agent and/or is full-rank. In other words, the mentioned references generally require more network connectivity, and therefore, do not result in steady-state stable error over the given and networks in Fig. 4-(Left). We set the parameters in [47, 48] as in Table III (which seem to provide the best outcome).