Security against false data injection attack in cyber-physical systems

07/31/2018 ∙ by Arpan Chattopadhyay, et al. ∙ University of Southern California

In this paper, secure, remote estimation of a linear Gaussian process via observations at multiple sensors is considered. Such a framework is relevant to many cyber-physical systems and internet-of-things applications. Sensors make sequential measurements that are shared with a fusion center; the fusion center applies a certain filtering algorithm to make its estimates. The challenge is the presence of a few unknown malicious sensors which can inject anomalous observations to skew the estimates at the fusion center. The set of malicious sensors may be time-varying. The problems of malicious sensor detection and secure estimation are considered. First, an algorithm for secure estimation is proposed. The proposed estimation scheme uses a novel filtering and learning algorithm, where an optimal filter is learnt over time by using the sensor observations in order to filter out malicious sensor observations while retaining other sensor measurements. Next, a novel detector to detect injection attacks on an unknown sensor subset is developed. Numerical results demonstrate up to 3 dB gain in the mean squared error and up to 75% higher attack detection probability under a small false alarm rate constraint, against a competing algorithm that requires additional side information.


I Introduction

Cyber-physical systems (CPS) combine the cyber world and the physical world via seamless integration of sensing, control, communication and computation. CPS has widespread applications such as networked monitoring and control of industrial processes, intelligent transportation systems, smart grid, and environmental monitoring. Most of these applications critically depend on reliable remote estimation of a physical process via multiple sensors over a wireless network. Hence, any malicious attack on sensors can have a catastrophic impact. We focus on false data injection (FDI) attacks which can be characterized as integrity or deception attacks where the attacker modifies the information sent to the fusion center [2, 3]. This is in contrast to a denial-of-service attack where the attacker attempts to block resources for the system (e.g., wireless jamming attack to block bandwidth usage [4]). FDI attacks modify the information either by breaking the cryptography of the data packets or by physical manipulation of the sensors (e.g., putting a heater near a temperature sensor).

The problem of FDI attack and its detection has received recent attention. In [5], conditions for undetectable FDI attack are developed, and the minimum number of sensors to be attacked to ensure undetectability is computed. In [6], a linear deception attack scheme that can fool the popular $\chi^2$ detector is provided. Later, a new detection algorithm against such linear deception attacks is designed in [7], where observations are available from a few known safe sensor nodes. Efficient attack detection and secure estimation schemes for linear Gaussian systems under cyber attack on a static, unknown sensor subset have been developed in [8], but this detector is not designed to tackle the linear deception attack of [6]. The optimal attack strategy to steer the control of CPS to a target value is provided in [9], while ensuring a constraint on the attack detection probability. Centralized and decentralized attack detection schemes for noiseless systems have been developed in [10]. Coding of sensor output for efficient attack detection using a $\chi^2$ detector is proposed in [11]. Attack-resilient state estimation of a dynamical system with only bounded noise has been discussed in [12]. Sparsity models to characterize the switching location attack in a noiseless linear system and state recovery constraints for various attack modes have been described in [13]. Attack detection, secure estimation and control in the presence of FDI attack for power systems are addressed in [14, 15, 16].

In contrast to the prior literature, our current paper addresses the problem of attack detection and secure remote estimation of a linear system with unbounded Gaussian noise, in the presence of an FDI attack (which could be the FDI attack of [6]), when no safe sensor subset is available. In this paper, we make the following contributions. (i) We develop a learning algorithm that learns a Kalman-like filter over time for secure estimation in the presence of FDI attacks. The filter gain matrix is updated iteratively over time via simultaneous perturbation stochastic approximation (SPSA), in order to minimize a combination of the estimation error in the absence of attack and the anomaly in estimates returned by various sensor subsets. The convergence of the learning algorithm is proved by exploiting the properties of the attack scheme, the filtering scheme and SPSA. To our knowledge, this is a novel contribution to the adaptive filtering literature as well. This algorithm later motivates another low-complexity heuristic which offers up to 3 dB improvement in mean squared error (MSE) against another competing algorithm from [7] that additionally requires a subset of safe sensors. The algorithms are extended to handle random packet loss between the sensors and the fusion center. (ii) We propose an algorithm for FDI attack detection that can offer up to 75% improvement in attack detection probability under a small false alarm constraint, against the detection scheme of [7] which additionally requires a subset of safe sensors. Our detection algorithm detects an attack via anomaly detection between estimates made by various sensor subsets. We also provide a learning scheme for optimization of our detector subject to a constraint on false alarm.

The rest of the paper is organized as follows. Preliminaries are discussed in Section II. The secure estimation algorithm to combat the FDI attack is described in Section III. The attack detection scheme is described in Section IV. Numerical results are provided in Section V, followed by the conclusions in Section VI. All proofs are provided in the appendices.

II Background

Throughout this paper, bold capital letters, bold small letters and capital letters with calligraphic font will denote matrices, vectors and sets respectively.

II-A Sensing and remote estimation model

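We consider a set of smart sensors $\mathcal{N} = \{1, 2, \dots, N\}$, which are sensing a discrete-time stochastic process $\{\mathbf{x}(t)\}_{t \geq 0}$. The sensors send their observations directly to a fusion center via error-free wireless links so that the fusion center can estimate $\mathbf{x}(t)$ at each time $t$. The physical process $\{\mathbf{x}(t)\}_{t \geq 0}$ (where $\mathbf{x}(t)$ is a real column vector) is a linear Gaussian process that evolves according to the following equation:

$$\mathbf{x}(t+1) = \mathbf{A}\,\mathbf{x}(t) + \mathbf{w}(t) \tag{1}$$

where $\mathbf{w}(t)$ is a zero-mean Gaussian noise vector with covariance matrix $\mathbf{Q}$, and is i.i.d. across $t$. The scalar or vector observation made by sensor $i$ is given by the following observation equation if sensor $i$ is used in sensing:

$$\mathbf{y}_i(t) = \mathbf{H}_i\,\mathbf{x}(t) + \mathbf{v}_i(t) \tag{2}$$

where $\mathbf{H}_i$ is a matrix of appropriate dimension and $\mathbf{v}_i(t)$ is a Gaussian observation noise with covariance matrix $\mathbf{R}_i$. Observation noise is assumed to be independent across sensors and i.i.d. across time. The pair $(\mathbf{A}, \mathbf{Q}^{1/2})$ is assumed to be stabilizable and the pair $(\mathbf{A}, \mathbf{H}_i)$ is assumed to be detectable for all $i \in \mathcal{N}$.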

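The goal of the fusion center is to minimize the time-average expected mean squared error (MSE) of its estimate $\hat{\mathbf{x}}(t)$:

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\,\|\hat{\mathbf{x}}(t) - \mathbf{x}(t)\|^2 \tag{3}$$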

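If all sensors send their observations to the fusion center in real time, then the system is equivalent to a single sensor and a remote estimator with real-time communication. The sensing and observation models can be rewritten as:

$$\mathbf{x}(t+1) = \mathbf{A}\,\mathbf{x}(t) + \mathbf{w}(t), \qquad \mathbf{y}(t) = \mathbf{H}\,\mathbf{x}(t) + \mathbf{v}(t) \tag{4}$$

where $\mathbf{v}(t)$ is the observation noise, with covariance matrix $\mathbf{R}$, and $\mathbf{y}(t)$ is the complete observation vector obtained by stacking the observations of all sensors. The minimum mean-squared error (MMSE) estimator in this case is a linear filter called the Kalman filter (see [17]):

$$\begin{aligned}
\hat{\mathbf{x}}(t+1|t) &= \mathbf{A}\,\hat{\mathbf{x}}(t) \\
\mathbf{P}(t+1|t) &= \mathbf{A}\,\mathbf{P}(t)\,\mathbf{A}' + \mathbf{Q} \\
\mathbf{K}(t+1) &= \mathbf{P}(t+1|t)\,\mathbf{H}' \left(\mathbf{H}\,\mathbf{P}(t+1|t)\,\mathbf{H}' + \mathbf{R}\right)^{-1} \\
\hat{\mathbf{x}}(t+1) &= \hat{\mathbf{x}}(t+1|t) + \mathbf{K}(t+1)\left(\mathbf{y}(t+1) - \mathbf{H}\,\hat{\mathbf{x}}(t+1|t)\right) \\
\mathbf{P}(t+1) &= \left(\mathbf{I} - \mathbf{K}(t+1)\,\mathbf{H}\right)\mathbf{P}(t+1|t)
\end{aligned} \tag{5}$$

where $\hat{\mathbf{x}}(t)$ is the MMSE estimate and $\mathbf{P}(t)$ is the error covariance matrix for the estimate $\hat{\mathbf{x}}(t)$, provided that the iteration starts from the prior mean and covariance of the initial state. It has been shown in [17] that $\bar{\mathbf{P}} = \lim_{t \to \infty} \mathbf{P}(t+1|t)$ exists and is the unique fixed point of the iteration called the Riccati equation. Another quantity of interest is the innovation sequence $\mathbf{z}(t) = \mathbf{y}(t) - \mathbf{H}\,\hat{\mathbf{x}}(t|t-1)$; it was proved in [17] that $\{\mathbf{z}(t)\}$ is a zero-mean Gaussian sequence which is pairwise independent across time and whose covariance matrix in the steady state is $\mathbf{H}\,\bar{\mathbf{P}}\,\mathbf{H}' + \mathbf{R}$.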
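To make the predict/update cycle of the Kalman filter concrete, here is a minimal sketch for a scalar state and a single scalar sensor. The names `kalman_step` and `simulate` and all parameter values (`a`, `h`, `q`, `r`) are illustrative assumptions, not taken from the paper, whose model is vector-valued.

```python
import random

def kalman_step(x_hat, P, y, a, h, q, r):
    """One predict/update step of a scalar Kalman filter."""
    x_pred = a * x_hat                 # predicted state
    P_pred = a * P * a + q             # predicted error variance
    innovation = y - h * x_pred        # part of y not explained by the prediction
    S = h * P_pred * h + r             # innovation variance
    K = P_pred * h / S                 # Kalman gain
    x_new = x_pred + K * innovation    # corrected estimate
    P_new = (1.0 - K * h) * P_pred     # corrected error variance
    return x_new, P_new, innovation, S

def simulate(steps=2000, a=0.9, h=1.0, q=1.0, r=1.0, seed=0):
    """Simulate the process, filter it, and return (empirical MSE, final P)."""
    rng = random.Random(seed)
    x, x_hat, P = 0.0, 0.0, 1.0
    sq_err = 0.0
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, q ** 0.5)   # process evolution
        y = h * x + rng.gauss(0.0, r ** 0.5)   # noisy observation
        x_hat, P, _, _ = kalman_step(x_hat, P, y, a, h, q, r)
        sq_err += (x - x_hat) ** 2
    return sq_err / steps, P

mse, P_steady = simulate()
```

In this toy setting the error variance settles at the fixed point of the scalar Riccati iteration, and the empirical MSE should land close to it.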

II-B False data injection (FDI) attack

Any unknown subset of sensors can be under attack at time . Any sensor  sends an observation according to the following attack equation:

(6)

where is an error term injected by the attacker. The goal of the attacker is to insert the false data sequence so as to maximize the MSE given by (3). If for all , then the attack is called a static attack, otherwise the attack is called a switching location attack. We assume that, at most sensors can be under attack at a time.

The $\chi^2$ detector: Since under steady state when there is no attack, a natural technique (see [6], [7]) to detect any FDI attack is to detect any anomaly in . This is done by observing the innovation sequence over a pre-specified window of time-slots, and declaring an attack at time if , where is a pre-specified threshold used to tune the false alarm probability.
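A windowed test on the innovation sequence can be sketched as follows for scalar innovations. The class name `ChiSquareDetector`, the window length and the threshold are hypothetical choices for illustration; the statistic is the sum of squared innovations over the window, normalized by the assumed no-attack innovation variance.

```python
from collections import deque

class ChiSquareDetector:
    """Windowed detector on scalar innovations (illustrative sketch)."""

    def __init__(self, window, threshold, innovation_var):
        self.terms = deque(maxlen=window)   # keeps only the last `window` terms
        self.threshold = threshold
        self.innovation_var = innovation_var

    def update(self, innovation):
        """Add one innovation; return True if an attack is declared."""
        self.terms.append(innovation ** 2 / self.innovation_var)
        full = len(self.terms) == self.terms.maxlen
        return full and sum(self.terms) > self.threshold

# 7.81 is roughly the 95% quantile of a chi-square law with 3 degrees of freedom.
det = ChiSquareDetector(window=3, threshold=7.81, innovation_var=1.0)
quiet = [det.update(z) for z in (0.5, 0.5, 0.5)]
alarm = det.update(3.0)
```

Raising the threshold lowers the false alarm probability at the cost of a lower detection probability.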

Figure 1: False data injection attack in remote estimation.

In [6], a linear injection attack to fool the detector is constructed; at time , the malicious sensor(s) modifies the innovation vector as , where is a square matrix and is i.i.d. Gaussian. It was shown in [6] that where . Hence, if we ensure , then will have the same distribution as , and hence the detection probability of the detector will remain unaffected even under the attack. The estimation error is maximized when the attacker just inverts the innovation sequence (i.e., when and ). However, the authors of [7] proposed another efficient scheme to detect such attack. The detection algorithm in [7] assumed the presence of a few safe sensors; an attack is detected by exploiting any anomaly between the observations made by the safe sensors and other sensors. The assumption of the existence of a set of safe sensors is restrictive, and the design of efficient attack detection and secure estimation schemes in the absence of such safe sensors is the topic of our current paper.
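The innovation-inversion attack described above can be illustrated in the scalar case. The helper names `flip_innovation` and `chi_square_stat` are assumptions made for this sketch, and the attacker is assumed to know the estimator's prediction `x_pred`. The point is that the squared, normalized innovation is unchanged by a sign flip, so a detector based on it sees nothing unusual.

```python
def flip_innovation(y, x_pred, h):
    """Return a fake observation whose innovation is the negative of the true one."""
    innovation = y - h * x_pred
    return h * x_pred - innovation

def chi_square_stat(innovations, var):
    """Sum of squared innovations normalized by their no-attack variance."""
    return sum(z * z / var for z in innovations)

y_fake = flip_innovation(y=2.0, x_pred=1.5, h=1.0)   # true innovation is 0.5
true_stat = chi_square_stat([0.5, -1.2], var=2.0)
fake_stat = chi_square_stat([-0.5, 1.2], var=2.0)    # same value as true_stat
```

Although the detection statistic is identical, the estimator is pushed in the wrong direction at every step, which is why this attack is damaging.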

Stationary attacks: In this paper, we consider a special class of attack schemes called stationary attacks, where, at time , the injected error is independently chosen from a distribution . Note that this attack class contains the class of linear attacks where the attacked sensor subset is either static, or varying with time according to an i.i.d. process or a time-homogeneous Markov chain.
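The two time-varying cases mentioned above, an i.i.d. attacked subset and a Markov-chain attacked subset, can be sketched as follows. The function names, the candidate subsets and the `stay_prob` parameter are hypothetical; the Markov chain here simply keeps the current subset with probability `stay_prob` and otherwise redraws uniformly.

```python
import random

def iid_attacked_subsets(subsets, steps, rng):
    """Attacked subset drawn independently and uniformly at each time step."""
    return [rng.choice(subsets) for _ in range(steps)]

def markov_attacked_subsets(subsets, stay_prob, steps, rng):
    """Time-homogeneous Markov chain over candidate subsets (assumed kernel)."""
    current = rng.choice(subsets)
    path = []
    for _ in range(steps):
        path.append(current)
        if rng.random() > stay_prob:      # leave the current subset
            current = rng.choice(subsets)
    return path

rng = random.Random(0)
candidates = [(0,), (1,), (2,)]
static_path = markov_attacked_subsets(candidates, stay_prob=1.0, steps=50, rng=rng)
iid_path = iid_attacked_subsets(candidates, 50, rng)
```

Setting `stay_prob=1.0` recovers the static attack as a special case.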

III Secure estimation

In this section, we will provide an algorithm to obtain a reliable estimate in the presence of FDI attacks, without explicitly detecting the malicious sensor subset. This algorithm is useful when it is not possible for the system administrator to take the necessary measures even upon the detection of an attack (e.g., if a heater is deliberately kept by an attacker near a temperature sensor, it may not always be possible to physically remove the heater).

III-A Formulation as an on-line optimization problem

Note that, any sensor observation is ignored if the corresponding entries in the Kalman gain matrix in (5) are set to zero. Ideally, one should compensate for the errors introduced by the malicious sensors, and if done so, the anomalies in estimates from various sensor subsets will be small. However, since the estimation error depends collectively on the process noise, the observation noise and the noise injected by the attacker, a reasonable technique for reliable estimation would be to dynamically learn an optimal gain matrix that minimizes the anomalies in estimates returned by various sensor subsets subject to a constraint on the MSE in the absence of attack.
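The remark that a zero gain entry silences a sensor can be checked directly. The sketch below uses a scalar state and scalar sensors; `linear_update` and the numbers are illustrative assumptions.

```python
def linear_update(x_pred, gains, observations, h):
    """x_hat = x_pred + sum_i K_i * (y_i - h_i * x_pred) for a scalar state."""
    return x_pred + sum(
        k * (y - hi * x_pred) for k, y, hi in zip(gains, observations, h)
    )

# Sensor 1 has gain 0, so its (possibly malicious) reading has no effect.
honest = linear_update(0.5, [0.4, 0.0], [1.0, -5.0], [1.0, 1.0])
attacked = linear_update(0.5, [0.4, 0.0], [1.0, 1000.0], [1.0, 1.0])
```

Both calls return the same estimate, since only sensor 0 contributes to the correction term.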

Now, let us assume that the attack is a stationary attack (a static attack is a special case). We restrict the discussion to the class of linear estimators. The estimator we consider is similar to the Kalman filter in (5), except that the Kalman gain matrix is learnt via a stochastic gradient descent scheme to solve the following constrained optimization problem:

(7)

Here represents the estimate of the error covariance matrix if a gain matrix sequence is used for estimation, when there is no attack. Note that, can be calculated iteratively. The objective function captures the anomaly between the estimates and coming from two different sensor subsets and , when the restrictions and of to the subsets and are used as gain matrices applied to the observations coming from these sensor subsets. The constraint can be chosen to be a certain multiple of the time-average MMSE of the system under no attack (which can be computed by running an optimal Kalman filter).

The above constrained problem can be relaxed by a Lagrange multiplier as follows:

(8)

where is the single-stage cost at time .

The following result tells us how to choose .

Lemma 1.

If there exists a such that the optimal solution of the unconstrained problem (8) under meets the constraint in (7) with equality, then the optimal solution of the unconstrained problem (8) under is optimal for the constrained problem (7) as well.

The proof is similar to that of [18, Theorem ].

III-B The proposed learning algorithm

In this subsection, we propose an iterative learning algorithm to solve (7). Our proposed algorithm is based on multi-timescale stochastic approximation (see [19, Chapter ]). Note that, Lemma 1 suggests solving (8) for various values of and then choosing a suitable that meets the constraint in (7) with equality; we can solve (8) in an inner loop and then vary in an outer loop to converge to . This is achieved by varying at a slower timescale as the iterations progress, and solving the unconstrained problems at faster timescales.

Let us consider (8) for a fixed . Due to the unavailability of any closed-form expression of the cost function in (8), direct computation of a gradient estimate w.r.t. is not possible. However, we will minimize the cost function (8) by iteratively learning an optimal gain matrix over time, via a stochastic gradient descent (SGD) algorithm. Hence, we employ simultaneous perturbation stochastic approximation (SPSA, see [20]) for this optimization problem. In SPSA, all elements of are perturbed simultaneously by a random vector in two opposite directions, the single stage cost function is evaluated for these two perturbed gain matrices and , and a noisy estimate of the gradient of the single stage cost is obtained from this. This noisy estimate of the gradient is then used in SGD for asymptotically minimizing .
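The SPSA idea described above (perturb all coordinates at once in two opposite directions, then form a gradient estimate from just two cost evaluations) can be sketched on a toy problem. The cost `toy_cost`, the step-size schedule and the perturbation schedule are assumptions chosen for illustration; they are not the paper's cost function or sequences.

```python
import random

def toy_cost(theta):
    """Hypothetical noise-free quadratic cost with minimizer (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2

def spsa_minimize(cost, theta, steps=3000, seed=0):
    rng = random.Random(seed)
    theta = list(theta)
    for t in range(1, steps + 1):
        a_t = 1.0 / (t + 10)          # step size (assumed schedule)
        c_t = 0.1 / t ** 0.25         # perturbation size (assumed schedule)
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [th + c_t * d for th, d in zip(theta, delta)]
        minus = [th - c_t * d for th, d in zip(theta, delta)]
        diff = cost(plus) - cost(minus)   # only two cost evaluations per step
        # Per-coordinate gradient estimate: diff / (2 * c_t * delta_i)
        theta = [th - a_t * diff / (2.0 * c_t * d) for th, d in zip(theta, delta)]
    return theta

theta_hat = spsa_minimize(toy_cost, [0.0, 0.0])
```

The number of cost evaluations per iteration stays at two regardless of the number of parameters, which is the main appeal over coordinate-wise finite differences.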

The proposed SEC (secure estimation) algorithm uses three positive sequences , and that satisfy the following conditions: (i) , (ii) , , (iii) , (iv) and (v) .

The first two conditions are standard for stochastic approximation (see [19]). The third condition ensures that the gradient estimate is asymptotically unbiased. The fourth condition is a technical condition required for the convergence of SPSA (see [20]). The fifth condition ensures that, in our proposed algorithm, the iteration using step size runs at a faster timescale than the iteration using step size .

The algorithm requires a small . Let us define where is the maximum eigenvalue of the matrix . The algorithm also requires a large number .

We define the gain matrix which is the same as except that the entries corresponding to the sensors from are set to , and vice versa for the definition of .

Let and denote the estimates obtained by using Kalman filters with constant gain matrices and applied to the observed sequence over the time period . Let us similarly define and . In particular, is computed recursively as follows for :

(9)

and setting .

The SEC algorithm is described below.

 The SEC algorithm   Input: , , , , ,

Initialization: , , , for all .

For :

  1. Collect the observation vector from all sensors.

  2. Compute the estimate .

  3. Pick a random matrix such that each entry in is chosen from $\{-1, +1\}$ independently with probability $1/2$.

  4. Compute and .

  5. Compute , , and for all subsets of size , using (9) or similar update equations.

  6. Compute and similarly . Compute .

  7. Compute , and similarly .

  8. For each entry , do the following SPSA update:

    (10)

    and project onto in order to obtain .

  9. Update .

end  

Note that, is projected onto a compact interval to ensure stability of the iteration (10). Similarly, is projected onto a compact interval .

The spectral radius of is maintained to be less than to ensure that the error covariance matrix remains bounded. A standard result says that, the covariance matrix of the estimation error varies according to the following recursive equation:

(11)

when the gain matrix is chosen arbitrarily (not optimally as in (5)). Step  of SEC is motivated by the above expression.
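For intuition, the error variance recursion under an arbitrary constant gain can be written out in the scalar case. The sketch below uses the standard Joseph-form update, which may differ in notation from the paper's equation (11); the function names and parameter values are assumptions.

```python
def error_variance_step(P, k, a, h, q, r):
    """Error variance after one step with an arbitrary (not necessarily optimal) gain k."""
    P_pred = a * a * P + q
    return (1.0 - k * h) ** 2 * P_pred + k * k * r

def steady_state_variance(k, a=0.9, h=1.0, q=1.0, r=1.0, iters=500):
    """Iterate the recursion; it stays bounded when |(1 - k*h) * a| < 1."""
    P = 1.0
    for _ in range(iters):
        P = error_variance_step(P, k, a, h, q, r)
    return P

best = steady_state_variance(0.5974)    # close to the optimal Kalman gain here
too_small = steady_state_variance(0.2)  # trusts the observation too little
too_large = steady_state_variance(0.9)  # trusts the observation too much
```

With these toy parameters the optimal gain gives the smallest steady-state variance, and any other stable gain gives a larger one.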

Equation (10) is an SGD algorithm where a noisy estimate of the gradient of w.r.t. is used instead of the true gradient. The noisy gradient estimate is . Note that, SPSA requires computation of and only for two different gain matrices and ; this allows us to avoid the heavy computation involved in gradient estimation using coordinate-wise perturbation.

III-C Convergence of SEC

Let us consider the problem in (8). We define a function which is the time-averaged MSE (and also the limiting MSE) achieved if a Kalman-like linear estimator is used with a constant gain matrix for all , when there is no attack. Also, for a constant gain matrix used at all time , let us define by the limiting (and also time-average) value of the anomaly , when there is a possible attack. We define . Let us also define .

Assumption 1.

For any fixed , the function is Lipschitz continuous in .

Let us recall that,

Since the spectral radius of is less than or equal to , the iteration converges almost surely, and hence the MSE under SEC is uniformly bounded across sample paths. If a constant gain matrix is used, it is still easy to prove that is Lipschitz continuous in . Thus, Assumption 1 is specifically required for .

Theorem 1.

Under SEC with a fixed (i.e., for all ) and Assumption 1, the iterates converge almost surely to the set , provided that each such stationary point belongs to the interior of .

Proof.

See Appendix A. ∎

Theorem 1 says that converges to the set of local minima of (i.e. for a given ) in case there is no saddle point that is not a local minimum.

However, SEC varies at a slower timescale in order to solve the constrained problem (7). The next theorem provides our main result on the convergence of SEC in its original form, to the desired solution set of the constrained problem (7).

Let denote the closure of the convex hull of the set . We define by the collection of closed connected internally chain transitive invariant sets of the following differential inclusion (reference [21]):

Theorem 2.

Under SEC and Assumption 1, the sequence almost surely converges to the set .

Proof.

See Appendix B. ∎

The proof of Theorem 2 suggests that the update equation asymptotically behaves like a stochastic recursive inclusion (see [19, Chapter ]), where at each time, the iteration involves a set . However, the proof requires that the set should be convex and compact, which may not be true in general. Hence, we consider the set , which results in a weaker result on the convergence of SEC in Theorem 2.

III-D Complexity issues and a heuristic

The SEC algorithm requires us to compute and also and . The matrix can be computed iteratively using (11). However, ideally we should use instead of , and instead of , in the SPSA update (10). Computing will require us to run the iteration (11) times at iteration ; i.e., the computational complexity of grows with . But, in the proof of Theorem 1, we can show that and . This allows us to use and matrices in (10), whose computation can be done recursively as in step  of SEC. Thus, (10) is not a standard SPSA scheme.

However, the computation of for all of cardinality requires computation at time ; this happens because at each time we obtain two new matrices and , and the estimates for all need to be computed using constant gain matrices and applied to the history of observations . This restricts the possibility of using SEC in practical applications, since IoT applications will require low-complexity solutions.

One possible heuristic to alleviate this problem is to update only up to some fixed time steps, and afterwards use this constant matrix for estimation forever. However, this results in the loss of the very essence of SEC. SEC does observation-driven gain adjustment to tackle FDI attacks; if an attack starts beyond time , this modified security algorithm will miss the attack. Another possible strategy could be to run the SPSA sequence for time steps, and then repeat the procedure for time with the same step size sequence , but on the observation sequence . This can at best ensure convergence within a neighbourhood of the solution set. However, choice of a large still results in large computational complexity.

Now we present an alternative low-complexity version of SEC called SEC-L, which recursively computes for all of cardinality , which are (suboptimal) proxies for .

SEC-L algorithm: This algorithm is the same as SEC, except that, at step , we compute: . The estimates for all subsets of size are also calculated similarly.∎

Clearly, SEC-L reduces the complexity for computing and to , since SEC-L does not involve filtering over the entire observation history.

III-E Packet loss

So far we have assumed that all observations reach the fusion center without error. But, in practice, the links between a sensor and the fusion center can be unreliable, and hence some of the observation packets sent from the sensors to the fusion center might be lost. Let us assume that the observation packet sent by sensor  at time is lost with a known probability ; packet loss is assumed to be i.i.d. across time and independent across sensors. In this case, at each time, the fusion center will receive an observation vector of variable size depending on the lost observation packets. However, since the fusion center knows the sensors from which observations are received at the current time step, the fusion center can simply restrict the observation matrix at time to the set of observed sensors, and update only that part of (via SPSA) which corresponds to the observed sensors. However, since various submatrices of the gain matrix are updated at various time instants, we need to update them using asynchronous stochastic approximation ([19, Chapter ]) in the SPSA step. This requires us to maintain a counter for each sensor subset ; is the number of times till when observations came only from sensor subset . Now, SEC can be adapted to this case in the following way. At time , let the observed sensor subset be . Then, estimation is done by the following rule:

and the update equation is modified as:

where is the restriction of to the sensor subset , and is the restriction of to subset . It is easy to adapt our convergence proofs for Theorem 1 and Theorem 2 to this modified algorithm.
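The idea of restricting the update to the sensors whose packets actually arrived can be sketched for a scalar state with scalar sensors. This sketch uses the standard Kalman gain (processed one sensor at a time) rather than the learned gain of SEC, so it only illustrates the restriction step; `update_with_received` and its argument layout are assumptions.

```python
def update_with_received(x_pred, P_pred, observations, h, r):
    """Measurement update using only sensors whose packets arrived.

    observations maps sensor index -> received value; lost packets are absent.
    Sensor noises are assumed independent, so sensors can be processed in turn.
    """
    x_hat, P = x_pred, P_pred
    for i in sorted(observations):
        S = h[i] * P * h[i] + r[i]          # innovation variance for sensor i
        K = P * h[i] / S                    # gain for sensor i
        x_hat += K * (observations[i] - h[i] * x_hat)
        P = (1.0 - K * h[i]) * P
    return x_hat, P

h = [1.0, 1.0, 1.0]
r = [1.0, 1.0, 1.0]
nothing = update_with_received(0.0, 2.0, {}, h, r)                # all packets lost
partial = update_with_received(0.0, 2.0, {0: 1.0, 2: 1.0}, h, r)  # sensor 1 lost
```

When every packet is lost, the prediction is returned unchanged; each received observation reduces the error variance further.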

IV Attack detection

In this section, we develop an efficient detection algorithm for the FDI attack on a static sensor subset, though the proposed detector can be heuristically used to detect a general stationary attack. This algorithm is only meant for attack detection, with the assumption that necessary measures will be taken if an attack is detected. The detection problem is mathematically represented as a hypothesis testing problem on the two hypotheses:

  • : there is no attack; ,

  • : there is an attack; for some

with observation sequence . Note that, due to the complicated dynamics involved in Kalman filtering, it is difficult to carry out standard hypothesis testing schemes. Also, due to the unavailability of any known safe sensor, we cannot compare the innovation sequence against any reliable quantity. However, if a subset of sensors is under attack, then the process estimate obtained only from these sensor observations is likely to have high error, and hence should be significantly different from the estimates made by other sensor observations. We exploit this fact to develop an attack detector.

Let us denote by the process estimate returned by an optimal Kalman filter that uses observations made by the sensor subset only (see (5)). Let us denote the covariance matrix of the anomaly by under steady state, when there is no attack. Clearly, if there is no attack, then, under steady state, since the error and are zero-mean Gaussian. Hence, one can detect an attack by checking whether is coming from the distribution for each subset of size . The covariance matrix can be pre-computed by simulating the process beforehand.

The algorithm requires a positive integer as an observation window, and a threshold for attack detection.

The algorithm to detect and localize an attack is given below. We call this algorithm DETECT.

 The DETECT algorithm  Input: , .
Off-line pre-computation: Simulate off-line using (1). In this simulated process, compute and via optimal Kalman filtering for each sensor subset of size , by suitable adaptation of (5). Compute .

Attack detection in the physical process:

For

  1. Use the optimal Kalman filter (5) to compute .

  2. Compute and via optimal Kalman filtering for each sensor subset of size , by suitable adaptation of (5). Compute for each sensor subset of size .

  3. Declare that an attack has happened if:

  4. If an attack is declared, identify the following maximizing subset as the attacked sensor subset:

end  

The detection step is similar to the standard $\chi^2$ test used to check whether a sequence of random vectors is drawn from a desired Gaussian distribution, except that this test is conducted on all possible subsets of size , and hence the operation is needed. The false alarm probability can be controlled via selection of the threshold .
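The detection and localization steps of DETECT can be sketched as follows for scalar anomalies. This is an illustration under assumptions: the anomaly of a subset is taken to be a scalar difference between two estimates, the no-attack anomaly variances are given, and the function name `detect`, the subset size and the threshold are hypothetical.

```python
from itertools import combinations

def detect(anomalies, variances, threshold):
    """Return (attack_declared, most_suspicious_subset).

    anomalies: dict mapping subset -> list of scalar anomalies over the window.
    variances: dict mapping subset -> anomaly variance when there is no attack.
    """
    stats = {
        s: sum(e * e for e in window) / variances[s]
        for s, window in anomalies.items()
    }
    worst = max(stats, key=stats.get)      # subset with the largest statistic
    return stats[worst] > threshold, worst

subsets = list(combinations(range(3), 1))  # all subsets of size 1 of 3 sensors
anomalies = {
    (0,): [0.1, -0.2, 0.1],
    (1,): [2.5, 3.0, -2.8],                # this subset disagrees with the rest
    (2,): [0.3, 0.0, -0.1],
}
variances = {s: 1.0 for s in subsets}
declared, suspect = detect(anomalies, variances, threshold=7.81)
```

Because the statistic is maximized over subsets, the same computation both triggers the alarm and points at the subset most likely to be under attack.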

Now we provide a learning scheme to find the optimal for a given target on the false alarm probability. The false alarm probability under DETECT is defined as:

In order to satisfy the constraint with equality, we need to choose an optimal threshold in DETECT. The optimal can be computed by using the following LEARN algorithm (a stochastic approximation step) in the off-line pre-computation phase of DETECT.

The LEARN algorithm requires a positive sequence such that and . LEARN also requires for all subsets of size as input. The algorithm simulates the process off-line, and maintains a detector as in DETECT with an initial threshold . Let us denote the number of false alarm triggers made by this detector up to time in the simulated process by .

  The LEARN algorithm   Input: , , for all subsets of size . Initialization: ,

For in the simulated process:

  1. Check if .

  2. If this condition is satisfied, update , else .

  3. Update the threshold .

end

  

The update scheme is a stochastic approximation algorithm (see [19]). The goal of the update scheme is to meet the false alarm probability constraint with equality. If a false alarm is triggered at time in the simulation, is increased; else, is decreased. By the theory of [19], it is straightforward to show that almost surely, and . is a large positive number such that . The projection operation is used to ensure boundedness of the iterates.
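The threshold update can be illustrated with a small stochastic approximation sketch: raise the threshold after a false alarm, lower it otherwise, so that the alarm rate drifts toward the target. The stand-in statistic (a squared standard Gaussian), the step-size schedule `20 / (t + 20)` and the projection bound `eta_max` are assumptions for illustration, not the paper's simulated process.

```python
import random

def learn_threshold(target_rate, steps=100_000, eta_max=50.0, seed=1):
    """Tune a threshold so that P(statistic > threshold) is close to target_rate."""
    rng = random.Random(seed)
    eta = 1.0
    for t in range(1, steps + 1):
        stat = rng.gauss(0.0, 1.0) ** 2        # stand-in no-attack statistic
        alarm = 1.0 if stat > eta else 0.0     # false alarm indicator
        step = 20.0 / (t + 20)                 # decreasing step size (assumed)
        eta += step * (alarm - target_rate)    # up on alarm, down otherwise
        eta = min(max(eta, 0.0), eta_max)      # projection keeps iterates bounded
    return eta

eta_star = learn_threshold(target_rate=0.1)
```

For this stand-in statistic the 90% quantile is about 2.71, so the learned threshold should settle near that value.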

V Numerical results

In this section, we numerically demonstrate the efficacy of SEC-L and DETECT. For attack detection, we compare the performance of DETECT with the traditional detector, and also with the detector of [7]. The algorithm of [7] assumes the availability of a set of safe sensors (SAFE). In SAFE, at each time , observations are collected from all sensors, but the safe sensor observations are used by the Kalman filter to generate an initial estimate. Then, the observations from potentially unsafe sensors are passed through a detector, and those observations are used in a Kalman filter to obtain if and only if the detector is not triggered.

For secure estimation, we also compare the performance of SEC-L with a blind Kalman filter oblivious to cyber-attack (KALMAN), and a Kalman filter which perfectly knows (genie-aided) the malicious sensors and ignores their observations (we call this estimator GENIE). Note that, we do not investigate the performance for the original SEC algorithm in order to avoid the huge computational burden that grows with time, but we recall that SEC-L is motivated by SEC.

In each case, we consider an independent realization of a system with the following parameters. The state transition matrix is taken as a randomly generated stochastic matrix multiplied by . State noise covariance matrix is chosen to be a positive semidefinite (PSD) matrix such that where . The matrix is also chosen similarly. Observation matrix is chosen randomly, with ; the observation made by each sensor is a -dimensional column vector. We also assume that at most sensors can be attacked at a time. The attacker inverts the sign of the innovation vectors coming from the malicious sensors.
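A system of the kind described above can be generated as in the following sketch. The scaling factor `0.9`, the dimension and the way the PSD matrix is built (as a product of a random matrix with its transpose) are assumptions, since the exact values and constraints are not recoverable from the text.

```python
import random

def scaled_stochastic_matrix(n, scale, rng):
    """Random row-stochastic matrix multiplied by `scale`.

    Every row sums to `scale`, so the spectral radius equals `scale`
    and the process is stable whenever scale < 1.
    """
    rows = []
    for _ in range(n):
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        rows.append([scale * v / total for v in raw])
    return rows

def random_psd_matrix(n, rng):
    """Random positive semidefinite matrix built as B * B^T."""
    b = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return [
        [sum(b[i][k] * b[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

rng = random.Random(0)
A = scaled_stochastic_matrix(4, 0.9, rng)   # hypothetical state transition matrix
Q = random_psd_matrix(4, rng)               # hypothetical noise covariance
```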

We consider two situations: (i) the attacker knows the estimate made by the remote estimator, and (ii) the attacker can only run a Kalman filter in order to guess the estimate. Let us denote a matrix which is the same as , except that the entries corresponding to the benign (not malicious) sensors at time are set to . Similarly, let be the same as , except that the entries corresponding to the benign sensors are .

When the attacker knows the estimate made by the remote estimator, the received observation at time at the remote estimator becomes ; this is equivalent to inverting the sign of the innovation vector. We call the corresponding variants of SEC-L, KALMAN and GENIE by SEC-L-K, KALMAN-K and GENIE-K (with knowledge of the estimate).

However, if is not known to the attacker, then the attacker can run a Kalman filter on the received observations at the estimator, in order to maintain a proxy for . In this case, the received observation at time at the remote estimator is . We call the corresponding variants of SEC-L, KALMAN and GENIE by SEC-L-NK, KALMAN-NK and GENIE-NK (no knowledge of estimate).

Figure 2: Performance of SEC-L under static attack.
Figure 3: Performance of SEC-L under switching location attack.