Distorting an Adversary's View in Cyber-Physical Systems

09/12/2018
by   Gaurav Kumar Agarwal, et al.

In Cyber-Physical Systems (CPSs), inference based on communicated data is of critical significance as it can be used to manipulate or damage the control operations by adversaries. This calls for efficient mechanisms for secure transmission of data since control systems are becoming increasingly distributed over larger geographical areas. Distortion based security, recently proposed as one candidate for CPSs security, is not only more appropriate for these applications but also quite frugal in terms of prior requirements on shared keys. In this paper, we propose distortion-based metrics to protect CPSs communication and show that it is possible to confuse adversaries with just a few bits of pre-shared keys.

I Introduction

It is well recognized that wireless networking is essential to realize the potential of new CPS applications, and it is equally well recognized that private and secure exchange of information is a necessary and not simply a desirable condition for the CPS ecosystem to thrive. For instance, personal health data in assisted environments, car positions and trajectories, and proprietary interests all need to be protected. We introduce a new approach to CPS security that aims to distort an adversary’s view of a control system’s states.

Our starting observation is that information security measures (cryptographic and information theoretic secrecy) are not well matched to CPS applications as they impose unnecessary requirements, such as protecting all the raw data, and thus can cause high operational costs. Cryptographic methods rely on computational complexity: they require short keys, but high complexity at the communicating nodes (that can be simple sensors in some cases), and can impose a significant overhead on short packet transmissions, therefore increasing delay [1, 2, 3, 4]. Information theoretic methods rely on keys: they have low complexity and do not add packet overhead, but require the communicating nodes to share large keys: every communication link needs to use a shared secret key (for a one-time pad) of length equal to the entropy (effectively length) of the transmitted data [5]. These costs accumulate rapidly given that large CPS applications can have dense communication patterns.

Instead, we propose a lightweight approach that uses small amounts of key and low complexity operations, and builds around a distortion measure. To illustrate, consider the following simple example of a drone flying inside a square, depicted in Fig. 1(a). (Although we illustrate our approach for a specific simple example, it extends to protecting general system states.) The drone starts at any position, and moves between adjacent points in the grid. It regularly communicates its location to a legitimate receiver, Bob; a passive eavesdropper, Eve, wishes to infer the drone’s locations, and can perfectly overhear all the transmissions the drone makes. We assume the drone and Bob share just one bit of key, that is secret from Eve, and ask: what is the best use we can make of the key?

Using the one bit of shared key to protect the most significant bit (MSB) is not a good solution. As shown in Fig. 1(a), the adversary can discover the fake trajectory after a few time steps, since this scheme can lead to trajectories that do not adhere to the dynamics or environment constraints. At this point, she can learn the real trajectory by flipping back the MSB (we assume that the scheme in use is known to everyone). Similar attacks can be made if we use a one-time pad [5] with the same keys over time: as time progresses, more fake trajectories can be discovered and discarded.
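The weakness of MSB protection can be sketched in a few lines. This is only an illustration under assumptions that are not in the paper: a hypothetical one-dimensional grid of 8 cells with 3-bit coordinates, and the helper names `flip_msb` and `violates_dynamics`. Flipping the MSB turns an adjacent move across the midpoint into a long jump, which Eve can flag as inconsistent with the dynamics.

```python
def flip_msb(coord, bits=3):
    """XOR the most significant bit of a `bits`-bit coordinate."""
    return coord ^ (1 << (bits - 1))


def violates_dynamics(trajectory):
    """True if some step moves farther than one cell (not an adjacent move)."""
    return any(abs(b - a) > 1 for a, b in zip(trajectory, trajectory[1:]))


# Hypothetical true path on cells 0..7: adjacent moves only.
true_path = [2, 3, 4, 5]

# Key bit equal to one: the MSB of every transmitted coordinate is flipped.
sent_path = [flip_msb(c) for c in true_path]

# Eve checks the overheard path against the known dynamics; if it is
# infeasible she flips the MSB back and recovers the true path.
eve_guess = [flip_msb(c) for c in sent_path] if violates_dynamics(sent_path) else sent_path
```

In this toy run the overheard path crosses the grid in a single step, so the single key bit gives no lasting protection once the drone crosses the midpoint.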

Figure 1: Example of drone motion: (a) protection of the most significant bit; (b) mirroring based scheme.

Conventional entropy measures also fail to provide insights on how to use the key. For instance, assume we label the squares in Fig. 1(a) sequentially row by row, and consider two cases: in case I, Eve learns that the drone is in one of two neighboring squares, each with probability 1/2. In case II, Eve knows that the drone is in one of two squares that are far apart, again each with probability 1/2. Both cases are equivalent from an information security perspective, since in both cases Eve’s uncertainty is a set of 2 equiprobable elements and hence its entropy is 1 bit. However, the security risk in each situation is different. For example, if Eve aims to take a photo of the drone, in the first case she knows where to turn her camera (the two squares are close by) while in the second case she does not (the two squares are far apart).
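The contrast between the two cases can be made concrete with a small sketch. The 4x4 grid, the zero-based row-by-row labels and the specific cell pairs below are illustrative assumptions, since the labels used in the original figure are not available here.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def cell_center(label, width):
    """Center (x, y) of a cell labelled row by row, starting at 0."""
    row, col = divmod(label, width)
    return (col + 0.5, row + 0.5)


def separation(cell_a, cell_b, width):
    """Euclidean distance between the centers of two cells."""
    return math.dist(cell_center(cell_a, width), cell_center(cell_b, width))


WIDTH = 4          # hypothetical 4x4 grid
near = (0, 1)      # case I: two neighboring cells
far = (0, 15)      # case II: two opposite corners

same_entropy = entropy_bits([0.5, 0.5])   # identical in both cases
near_gap = separation(*near, WIDTH)
far_gap = separation(*far, WIDTH)
```

Both cases yield one bit of entropy, while the distance between the candidate cells differs by more than a factor of four, which is the gap a distance-based metric is meant to capture.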

Instead, we propose to use a Euclidean distance distortion measure: how far (in Euclidean distance) is Eve’s estimate from the actual location. We then propose encoding/decoding schemes which utilize the shared key to maximize this distance. We first consider a distortion measure averaged over time and trajectories, as we formally define later. Note that if Eve had not received any of the drone transmissions, then the best (adversarial) estimate of the drone’s location at any given time is the center point of the confined region in Fig. 1(a). Therefore, a good encryption scheme would strive to keep Eve’s estimate as close to the center point as possible; we achieve the maximum possible distortion if, after overhearing the drone’s transmissions, Eve’s best estimate still remains the center point.

The following scheme can achieve this maximum distortion by using exactly one bit of shared secret key. When encoding, the drone either sends its actual trajectory, or a “mirrored” version of it, depending on the value of the secret key. The mirrored trajectory is obtained by reflecting the actual trajectory across a mirroring point in space; in this example, the mirroring point is the center point as shown in Fig. 1(b). Since Eve does not know the value of the shared key, her best estimate of the drone’s location, after receiving the drone’s transmissions, would be the average location given the trajectory and its mirrored version, which is exactly the center point. Our results in Section III extend this idea of mirroring to dynamical systems in higher dimensional spaces, and theoretically analyze the performance in terms of average distortion for a larger variety of distributions (with certain symmetry conditions).
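A minimal sketch of the one-bit mirroring idea follows, assuming a hypothetical 5x5 grid with integer coordinates 0..4 and a prior that is symmetric about the center (so both key values are equally likely a posteriori). The function names are illustrative, not taken from the paper.

```python
def mirror(point, center):
    """Reflect a point through `center` (point reflection)."""
    return tuple(2 * c - p for p, c in zip(point, center))


def encode(trajectory, key_bit, center):
    """Send the true trajectory if key_bit == 0, else its mirror image."""
    if key_bit == 0:
        return list(trajectory)
    return [mirror(p, center) for p in trajectory]


def decode(received, key_bit, center):
    """Mirroring is an involution, so decoding reuses the encoder."""
    return encode(received, key_bit, center)


def eve_estimate(received, center):
    """Eve's posterior mean when the received point and its mirror image
    are equally likely (assumes a prior symmetric about `center`)."""
    return [tuple((a + b) / 2 for a, b in zip(p, mirror(p, center)))
            for p in received]


CENTER = (2.0, 2.0)                        # center of the hypothetical grid
path = [(0, 0), (0, 1), (1, 1), (2, 1)]    # adjacent moves only

sent = encode(path, key_bit=1, center=CENTER)
recovered = decode(sent, key_bit=1, center=CENTER)
estimates = eve_estimate(sent, CENTER)
```

Unlike MSB flipping, the mirrored path still consists of adjacent moves inside the grid, so Eve cannot rule it out from the dynamics, and her estimate stays at the center at every time step.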

Next, we consider a worst-case distortion-based metric. In this case, our security metric is “in the worst case, how far is Eve’s estimate from the actual location?” That is, the adversary’s distortion may be different for different time instances and different instances of the actual trajectory, and we are interested in the minimum among these. In Section IV we provide encryption schemes that are suitable for maximizing this distortion metric and show that with 3 bits of shared key per dimension (i.e., 9 bits for three dimensional motion), our schemes achieve near-perfect worst case distortion. Our main contributions are as follows:
• We define security measures that are based on assessing the distortion: in the average sense over time and over data, and in the worst-case sense, providing such guarantees at any time and for any particular instances of data.
• For the average distortion, we develop a mirroring based scheme which uses one bit of key and achieves the maximum possible distortion (equivalent to Eve with no observations) in some cases. We also discuss the cases where it is sub-optimal and analytically characterize the attained distortion.
• For the worst case distortion, we design a scheme that uses 3 bits of key per dimension and prove it achieves near the maximum possible distortion (equivalent to Eve having no observations) when the inputs to the system are independent from the previous states.

Related Work. Secure data communication where the adversary has unlimited computational power has been studied through the lens of information theory, most notably by Shannon [5] and Wyner [6]. The study of secure communication from a distortion angle is relatively new and was first undertaken by Yamamoto [7], where the goal is to maximize the distortion of an eavesdropper’s estimate of a message. Schieler and Cuff [8] later showed that, in the limit of an infinite block length () code, only bits of secret keys are needed to achieve the maximum possible distortion. Schemes for single shot communication were considered in [9], and exponential benefits for each additional bit of shared key were discussed. However, the above schemes do not directly translate to scenarios where one has to communicate correlated temporal data, like the state of a control system.

Secure communication in control systems is studied in [10, 11, 12, 13, 14, 15]. These works either provide distortion only at the steady state or use measures like differential privacy (which does not use keys) and weak information theoretic security; they sometimes also assume that Eve gets different information (a subset of the information) than Bob.

II System Model

Notation. denote a column vector, and for and ; denotes the probability density function of a random vector ; for any random vector , we denote the mean vector and covariance matrix of by and respectively, thus for example, the mean and the covariance matrix of will be denoted by and respectively; for a matrix , denotes the transpose of and denotes the -th power of ; where .

System Dynamics. We consider the linear dynamical system,

$$x_{t+1} = A x_t + B u_t + w_t, \qquad y_t = C x_t + v_t, \tag{1}$$

where $x_t$ is the state of the system at time $t$, $u_t$ is the input to the system at time $t$, $w_t$ is the process noise, $y_t$ are the system observations, and $v_t$ is the observation noise. Let , and . Based on the initial and target states, the controller computes the input sequence which moves the system from the initial state to the target state.
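To make the state recursion concrete, here is a minimal noise-free rollout, assuming the standard linear form x_{t+1} = A x_t + B u_t with full state observation. The double-integrator matrices and the input sequence are hypothetical choices for illustration only; they are not the system studied in the paper.

```python
def mat_vec(M, v):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def simulate(A, B, x0, inputs):
    """Noise-free rollout of x_{t+1} = A x_t + B u_t; returns [x_1, ..., x_T]."""
    states, x = [], list(x0)
    for u in inputs:
        x = [a + b for a, b in zip(mat_vec(A, x), mat_vec(B, u))]
        states.append(x)
    return states


# Hypothetical discrete-time double integrator: state = (position, velocity).
A = [[1.0, 1.0],
     [0.0, 1.0]]
B = [[0.5],
     [1.0]]

# Accelerate, coast, then brake to rest.
trajectory = simulate(A, B, x0=[0.0, 0.0], inputs=[[1.0], [0.0], [-1.0]])
```

The rollout shows why the states are strongly correlated over time: each state is a deterministic function of the initial state and the inputs so far, which is what an eavesdropper who knows the model can exploit.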

Communication and Adversary Models. At each time instance the system transmits information about its state to a legitimate receiver, referred to as Bob, via a noiseless link. This situation occurs, for example, when Bob is remotely monitoring the execution of the system, as in a Supervisory Control and Data Acquisition (SCADA) system. A malicious receiver, Eve, is assumed to eavesdrop on the communication between the system and Bob and is able to receive all transmitted signals. Eve is assumed to be passive: she does not actively communicate but is interested in learning the system’s states from to . We assume that the system and Bob have a shared -bit key which they use to encode/decode the transmitted messages.

Inputs and States Random Process Model. We assume that both Eve and Bob are aware of the system model, the matrices and the statistics of noises. From the perspective of Eve, the input and output sequences have random distributions which depend on and the statistics of the noise. In addition to the process noise , the joint distribution depends on the initial and target states and the control law of the system. So, even in noiseless systems, and possess inherent randomness from Eve’s perspective due to her lack of knowledge about the control law and the initial and target states. In general, the control inputs can be dependent on the system states . However, knowing the marginal distribution of in noiseless systems can specify the marginal distribution of . This follows by noting that , where and are lower triangular block matrices with the th block submatrices, , being and respectively, and . This implies that for noiseless systems, the marginal distribution of would imply the marginal distribution of for a given initial state and thus the marginal distribution of . For a given , the mean vector and covariance matrix of become and .

Encoding Model. The system transmits a packet at each time step . The -th transmitted packet can be a function of all previous observations and the shared keys, thus, , where is the encoding function used at time . We will denote by .
Bob/Eve Models of Decoding. Bob noiselessly receives the transmitted packets from the system, and decodes them using the shared key. Then, using the decoded information, it generates an estimate of the state transmissions of the system at times . We require Bob to decode losslessly (i.e., with zero distortion). Formally, , where is the Shannon entropy [5].

Similarly, Eve also receives all transmissions from the system. However, unlike Bob, she does not have the key . Therefore, Eve’s estimate of is , where is the decoding function used by Eve at time .

Distortion Metrics. We consider a distortion-based security metric which captures how far (in Euclidean distance) an estimate is from the actual value. More formally, for a given time instance and a transmitted codeword , we define

(2)

where (2) captures the distortion incurred by Eve’s estimate of . Equality in (a) follows because the best (minimizing) estimates of Eve at time are, This implies that Eve’s state estimation is the optimal one given the observations . In general, this state estimate is dependent on the time instance. In other words, unless it happens to be the optimal estimate, making a constant estimation of the state hoping that it matches the actual state at some time will lead to high distortion values. Because Bob is required to successfully decode - for a given realization of the key, the encoding function can only map one and that key realization to each value of . Therefore Eve realizes that only trajectories from a particular subset can be the true trajectory for a given : those are the ones which correspond to each key realization. The expectation in (2) is in fact taken over the randomness in the key taking into account posterior probabilities given . If Eve does not have observations, the expectation is taken over with prior distribution and will get .

As is a function of time and the transmitted sequence , we consider two overall distortion metrics: the average case distortion (denoted by ) where we take the expectation over all possible averaged out over time; and the worst case distortion (denoted by ) where we take the minimum over all possible and time instances.

(3)
(4)

Note that can be defined even when there is no prior distribution on . However, to provide a baseline comparison with the case when the adversary has no observations, we assume that always have a known prior distribution.
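The difference between the two metrics can be illustrated on a toy case. The sketch below assumes a single time step, a hypothetical scalar source uniform on {-2, -1, 1, 2}, and mirroring about zero with one key bit; the per-observation distortion is computed as the trace of Eve's posterior covariance, and the function names are illustrative.

```python
def posterior_distortion(candidates):
    """Trace of the posterior covariance, E||x - E[x]||^2,
    for a list of (probability, point) pairs."""
    dim = len(candidates[0][1])
    mean = [sum(p * x[i] for p, x in candidates) for i in range(dim)]
    return sum(p * sum((x[i] - mean[i]) ** 2 for i in range(dim))
               for p, x in candidates)


def average_and_worst(observations):
    """observations: list of (probability of the observation, candidates).
    Returns (expected distortion, minimum distortion over observations)."""
    per_obs = [(pz, posterior_distortion(c)) for pz, c in observations]
    average = sum(pz * dz for pz, dz in per_obs)
    worst = min(dz for _, dz in per_obs)
    return average, worst


# Each observed value z is consistent with the true values z and -z,
# equally likely because the assumed prior is symmetric about zero.
obs = [(0.25, [(0.5, (z,)), (0.5, (-z,))]) for z in (-2, -1, 1, 2)]
avg, worst = average_and_worst(obs)
prior_variance = sum(0.25 * z * z for z in (-2, -1, 1, 2))
```

In this toy case the average distortion equals the prior variance, so mirroring is optimal on average, yet the worst case is noticeably smaller because observations near the mirroring point leave Eve with two nearby candidates.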

Design Goals. Our goal is to choose the encoding function, , so that Bob can decode losslessly while the distortion is maximized for Eve’s estimate. In addition, we seek to achieve this with the minimum amount of shared key . In the absence of any observation by Eve, these distortions will be, and . These provide upper bounds as,

(5)
(6)

where (a) and (b) follow by noting that the trace of the conditional covariance matrix is a quadratic (convex) function in and therefore we can use Jensen’s inequality.

III Optimizing The Average Distortion

In this section, we assume that the control system in (1) is noise free, that is . Although our results can be extended to an arbitrary observable pair (A,C) in (1), to simplify the exposition we assume the state can be directly measured (C = I). We now discuss our proposed scheme that uses one bit of shared key and show how the achieved distortion compares to the upper bound in (5). As we show later (Corollary III.3), this scheme is optimal when the prior distribution on the state has a point of symmetry.

Mirroring Scheme. Let be the state vector , mirrored across an affine subspace , where and . This scheme works as follows:

(7)

where is the shared bit. Since every affine subspace can be written in terms of orthogonal vectors, we assume that . It is easy to show that the mirrored point is and thus the encoding/decoding complexity of our scheme is .
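Reflection across an affine subspace can be sketched directly from its geometric definition: project the point onto the subspace, then move the same distance past the projection. The representation below (an `offset` point plus a list of orthonormal `basis` vectors) and the function name `reflect` are assumptions of this sketch and need not match the paper's parametrization.

```python
def reflect(x, basis, offset):
    """Mirror x across the affine subspace {offset + span(basis)}.

    `basis` is a list of orthonormal vectors (an assumption of this sketch);
    an empty basis gives a point reflection through `offset`.
    """
    d = [xi - oi for xi, oi in zip(x, offset)]
    proj = [0.0] * len(x)
    for v in basis:
        coeff = sum(vi * di for vi, di in zip(v, d))
        proj = [pi + coeff * vi for pi, vi in zip(proj, v)]
    # The foot of the perpendicular is offset + proj; the image is 2*foot - x.
    return [2 * (oi + pi) - xi for oi, pi, xi in zip(offset, proj, x)]


# Reflect across the horizontal line y = 1 in the plane.
image = reflect([3.0, 4.0], basis=[[1.0, 0.0]], offset=[0.0, 1.0])

# Applying the reflection twice returns the original point, which is why
# the same operation can serve for both encoding and decoding.
back = reflect(image, basis=[[1.0, 0.0]], offset=[0.0, 1.0])
```

The cost per state vector is one inner product and one vector update per basis vector, so the operation stays lightweight for low-dimensional states.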

Example. Consider where and . Then corresponds to reflecting across a line that passes through the origin with a angle.

The performance of our scheme is as follows.

Theorem III.1

(Proof in Appendix V-A) The mirroring scheme with matrices and allows Bob to perfectly estimate , and the distortion for Eve is,

(8)

where is the mirrored version of .

Assuming that is known, then Theorem III.1 provides a closed-form characterization of the achieved average distortion for any mirroring scheme with matrices and . Moreover, under some symmetry conditions on , the expression in (8) simplifies and gives insights on the maximum achievable distortion. This is shown in Corollary III.2.

Corollary III.2

(Proof in Appendix V-A) If the mirroring scheme matrices and in Theorem III.1 are selected such that , then (8) becomes,

(9)

Note that implies . We can interpret (9) as follows. Assuming that is met, then the distortion becomes . The achieved distortion therefore depends on the choice of : if then the maximum distortion can be achieved by our mirroring scheme. However, such a choice of may not be able to ensure that is met, as we will see in some of the following examples. One case for which satisfies and allows maximum distortion is when is symmetrically distributed around a point. We show this in the next corollary.

Corollary III.3

For a random vector , if there exists a point for which , , then .

Since and have the same distribution, they will have the same mean. This implies that . We then use the following mirroring scheme: , for . With this, we get , and thus where and . This implies, . Therefore the distortion is .
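The symmetric case can be checked numerically. The sketch below assumes a hypothetical two-dimensional Gaussian prior with independent components and mirroring through its mean; since the density is symmetric about the mean, the received point and its mirror image are equally likely, and the posterior variance of two equiprobable points is one quarter of their squared distance. Sample size and seed are arbitrary choices.

```python
import random


def mirrored_distortion(x, mu):
    """Posterior variance when Eve sees {x, 2*mu - x} with equal probability."""
    mirror = [2 * m - xi for xi, m in zip(x, mu)]
    return 0.25 * sum((xi - mi) ** 2 for xi, mi in zip(x, mirror))


def estimate_average_distortion(mu, sigmas, samples=20000, seed=0):
    """Monte Carlo estimate of the average distortion under mirroring."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = [rng.gauss(m, s) for m, s in zip(mu, sigmas)]
        total += mirrored_distortion(x, mu)
    return total / samples


mu, sigmas = [1.0, -2.0], [1.0, 2.0]            # hypothetical Gaussian prior
trace_of_covariance = sum(s * s for s in sigmas)
estimate = estimate_average_distortion(mu, sigmas)
```

Up to sampling error, the estimate should agree with the trace of the prior covariance, which is the distortion Eve would face with no observations at all.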

We now illustrate our results for a few examples.
Example 1. Assume is distributed as Gaussian with mean and covariance matrix . Then for a zero initial state, is also Gaussian distributed with mean and covariance , as we assume the noise to be zero. A Gaussian random vector satisfies the conditions in Corollary III.3, and therefore we can get maximum distortion by setting and .

The next example is based on a Markov-based model for the dynamical system and uses the following lemma.

Lemma III.4

Consider the random vectors where the following conditions hold: 1) and 2) . Then for this case, , where . Therefore, by virtue of Corollary III.3, mirroring schemes can achieve the maximum distortion.

Example 2. Consider the following random walk mobility model. Let , and be its location at time , then,

One can see that these distributions satisfy the conditions in Lemma III.4. Therefore, one can set and , which will achieve maximum distortion of .

Example 3. Here we provide a numerical example which shows how our mirroring scheme performs for situations where we do not have an analytical handle on the state distributions. We assume the quadrotor dynamical system provided in [16]. The quadrotor moves in a 3-dimensional cubed space with a width, length and height of 2 meters, where the origin is the center point of the space. The quadrotor starts its trajectory from an initial point and finishes its trajectory at a target point after time steps, where the points are picked uniformly at random in . We assume that time steps, and that the continuous model in [16] is discretized with a sample time of seconds. We assume that the quadrotor encodes and transmits only the states which contain the location information (first three elements of the state vector ). The quadrotor is equipped with an LQR controller which designs the input sequence which minimizes while ensuring that is equal to the target state. We perform numerical simulation of the aforementioned setup: we run millions iterations, where in each iteration a new initial and target points are picked, and the resultant trajectory is recorded. Based on the recorded data, we consider different mirroring schemes and numerically evaluate the attained distortion. To facilitate numerical evaluations, the simulation space is gridded into bins with meters of separation, and the location of the drone at each trajectory is approximated to the nearest bin.

Figure 2: An illustration of some trajectories. The reflection plane is shown as a dashed-black line. One trajectory (solid-black) is shown along with its mirrored image (dotted-black).

Figure 2 shows some of the drone trajectories obtained from our numerical simulation. It is clear that not all trajectories are equiprobable, and therefore the distribution of is not uniform across all bins in space. However, the computation of shows the expected value of the position to be the origin. Moreover, since the motion of the drone is mainly progressive in the positive x-axis direction, reflection across the origin results in mirrored trajectories that are progressing in the opposite direction, and therefore are identified to be fake automatically. Therefore, mirroring across a point here is useless: the numerically computed distortion for this scheme is equal to zero.

Next we consider mirroring across the reflection plane shown in Figure 2, where and . As can be seen from the figure, the reflection plane is indeed an axis of symmetry for the distribution of the drone’s trajectories, and is therefore expected to provide high distortion values. We numerically evaluate the distortion attained by this scheme using equation (8), which evaluates to . This is slightly less than .

IV Optimizing The Worst Case Distortion

Figure 3: Vs Z for mirroring+shift based scheme with ; .

Figure 4: (a) Transparent shapes represent true values and solid shapes represent their respective mapping when two bit key is and respectively. (b) , as a function of number of keys for optimal choice of .

The expected distortion metric might not be well-suited for some applications (for example, if an adversary wants to shoot a drone). In this case, the adversary’s estimate needs to be far from the actual state at all time instances. Therefore, a more appropriate metric would be the worst case distortion for the adversary. Consider for example the scheme in Fig. 1(b). Here, the adversary’s estimate is always the center point and the maximum expected distortion is achieved. However, when the drone is very close to the center, its mirror image will also be close to the center. At this particular time instance, the adversary’s distortion will be very small and thus the adversary will essentially know the position.

In this section, we present an encryption scheme that attempts to maximize the worst case distortion for Eve. The scheme obfuscates the initial state such that, even if Eve optimally uses her observations and knowledge about the dynamics, her best estimate attains maximal distortion. We start by studying the case of single shot transmission (Theorems IV.2 and IV.3), which form the basis for maximizing the worst case distortion of a trajectory (Theorem IV.4).

IV-A Building Step: Scalar Case

Consider the case where the system wants to communicate a single scalar random variable to Bob by transmitting . The worst case distortion for Eve will be . Note that if Eve does not overhear , Eve uses the minimum mean square error estimate (i.e., the mean value) as her estimate, and thus experiences a worst case distortion equal to the variance of .

We first assume that , and thus, the worst case distortion cannot be larger than by (6). We next develop our scheme progressively, from simple to more sophisticated steps. We will also use the following lemma.

Lemma IV.1

The variance of two real numbers $a$ and $b$ with probabilities $p$ and $1-p$ is given by $p(1-p)(a-b)^2$.
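The two-point variance identity is easy to verify numerically. The sketch below compares the definition of variance with the closed form p(1-p)(a-b)^2; the symbol names a, b and p are chosen for this illustration. It also evaluates the special case of mirroring about the origin with equal probabilities, where the two candidates are x and -x.

```python
def two_point_variance(a, b, p):
    """Variance of a variable equal to a with probability p and b otherwise,
    computed directly from the definition."""
    mean = p * a + (1 - p) * b
    return p * (a - mean) ** 2 + (1 - p) * (b - mean) ** 2


def closed_form(a, b, p):
    """Closed form p(1-p)(a-b)^2 for the same quantity."""
    return p * (1 - p) * (a - b) ** 2


# Mirroring about the origin with a symmetric prior: candidates x and -x,
# each with probability 1/2, so Eve's distortion is x squared.
small_x = two_point_variance(0.1, -0.1, 0.5)
large_x = two_point_variance(3.0, -3.0, 0.5)
```

The special case makes the weakness of pure mirroring visible: the distortion shrinks quadratically as the true value approaches the mirroring point.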

Mirroring or Shifting. Reflecting around the origin (as proposed in Section III) is not suitable for maximizing : indeed, using Lemma IV.1, evaluates to when and attains limited distortion for small values of . Another scheme consists of “shifting” by a constant whenever the shared key bit is one. Differently, this scheme admits which decreases fast for large values of .

Shifting+Mirroring. We here combine both schemes in order to achieve a good performance for both small and large values of . We start from the case where we have bit of key and then go to the case .
We select a that determines a window size ( is public and known by Eve). The encoding function is

We note that there is one particular value of , , which we do not transmit. Since this is of zero probability measure, it can be safely ignored. Given , there are two possibilities for : for ; for ; for . Using the fact that , we can calculate the posterior probabilities and use Lemma IV.1 to compute . Fig. 3 plots for . The worst case distortion in this case becomes , which is the best we can hope for if we have only one bit of shared key. This follows because for any mapping from to , a transmitted symbol can have at most two pre-images (as Bob needs to reliably decode with one bit of key), and if one of these is , then no matter what the second one is, the distortion corresponding to will be at most . Equality occurs when the second pre-image of is . Note that our scheme also maps to (for ).
. For , we use the following encoding:

where the optimal value of the constant depends on the number of keys we have, is the decimal equivalent of a binary string of length , and is such that is an integer and for . Intuitively, if then for half of the keys, we reflect across the origin and for the other half we do nothing; if , we divide this window of size into equal size windows and shift a point from one window to another by jumping (in decimal) windows. An example for is shown in Fig. 4(a) for the key values and . Fig. 4(b) plots as a function of the number of keys . Using and we achieve which is very close to , the best we could hope for.

Theorem IV.2

A Gaussian random variable with mean and variance can be near perfectly ( times the perfect distortion) distorted in the worst case settings by just using three bits of shared keys.

Generate the random variable as and encrypt it using key bits and the previously described scheme. For we have . Remark: We optimized the parameter of our scheme assuming Gaussian distribution. For other distributions, the optimal choice of and the corresponding worst case distortion would be different.

IV-B Vector Case and Time Series

Theorem IV.3

(Proof in Appendix V-B) For a Gaussian random vector with mean and a diagonal covariance matrix we can achieve within of the optimal by using bits of shared keys.

This theorem uses our 3-bit encryption for each element in the vector. Assume now that this vector captures the probability distribution of the initial state of dynamical system; by encrypting this state we can guarantee the following.

Theorem IV.4

(Complete Proof in Appendix V-C) Using bits of shared keys we can achieve within of the optimal for the dynamical systems (1) with , , singular values of more than , and initial state , where is diagonal covariance matrix, and and are independent of .

Remark: Although the independence assumption on the inputs is rather restrictive, the result serves as a stepping stone towards understanding general cases.

The system transmits where is the encoding in Theorem IV.3, and

Bob can decode using and . Then:

Eve’s distortion is calculated in Appendix V-C.

Complexity: per time for both encoding & decoding.

V Appendices

V-A Proof of Theorem III.1 and Corollary III.2

We start by computing . Note that given a sequence of transmitted symbol there are two possible values of sequence of message symbols which are and , where is the image of across the affine subspace given by .

With this, the posterior probability of given i.e., will be equal to . We note that , where . Then ,

Now, is the transmitted symbols if and key was zero or if and key was one. So . Thus ,

which proves (8). Again, if we can choose ’s and ’s such that,

the distortion becomes,

which proves (9).

V-B Proof for Theorem IV.3

Let the shared key be where all ’s are i.i.d. and uniformly distributed in . Let us also assume that , where each . Similar to the scheme for scalar case, we create a random vector where,

and encode using key as in the case of a scalar for all . Thus, the distortion will be,

where . And since is the expected distortion even when the adversary has no observations, and as we cannot beat this by (6), this is optimal.

V-C Proof for Theorem IV.4

Distortion at the adversary’s end. Based on the coding scheme we can see that the adversary gets by just subtracting from for . So the adversary’s information is given by the following set:

Thus,

Let’s first compute ,