Real-time Wireless Transmitter Authorization: Adapting to Dynamic Authorized Sets with Information Retrieval

11/04/2021
by Samurdhi Karunaratne, et al.

As the Internet of Things (IoT) continues to grow, ensuring the security of systems that rely on wireless IoT devices has become critically important. Deep learning-based passive physical layer transmitter authorization systems have been introduced recently for this purpose, as they accommodate the limited computational and power budget of such devices. These systems have been shown to offer excellent outlier detection accuracies when trained and tested on a fixed authorized transmitter set. However, in a real-life deployment, a need may arise for transmitters to be added and removed as the authorized set of transmitters changes. In such cases, the system could experience long down-times, as retraining the underlying deep learning model is often a time-consuming process. In this paper, we draw inspiration from information retrieval to address this problem: by utilizing feature vectors as RF fingerprints, we first demonstrate that training could be simplified to indexing those feature vectors into a database using locality sensitive hashing (LSH). Then we show that approximate nearest neighbor search could be performed on the database to perform transmitter authorization that matches the accuracy of deep learning models, while allowing for more than 100x faster retraining. Furthermore, dimensionality reduction techniques are used on the feature vectors to show that the authorization latency of our technique could be reduced to approach that of traditional deep learning-based systems.


I Introduction

With the rapid proliferation of the Internet of Things (IoT), the task of securing IoT networks has become more challenging. Wireless devices in these networks, such as sensors, are typically constrained in power and computational capability, rendering traditional cryptography-based authentication systems unsuitable. To address this, passive Physical Layer Authentication (PLA) has been proposed, since it does not impose any overhead on the transmitter [1]. To identify transmitters, PLA uses channel state information and fingerprints embedded in transmitted signals due to hardware impairments.

Typically, such an authentication system needs to differentiate among transmitters in the authorized set while rejecting unauthorized transmitters (outliers). Since the unauthorized set is practically infinite, this problem has been posed as open-set classification, as opposed to closed-set classification where all classes are known. Recently, a number of efforts have evaluated open-set classification models based on deep learning (DL) in this regard [2], [3]. They have become the state of the art in PLA, owing to their high accuracy and reasonable robustness to channel variations [3].

To the best of our knowledge, these authentication systems have all been evaluated with a static authorized set, meaning that the authorized set of transmitters was assumed to be fixed during training, testing and deployment. However, in most practical situations, needs change after deployment, resulting in changes to the authorized set: some authorized transmitters might need to be invalidated while others might need to be added. For example, a malfunctioning sensor in an IoT network might need to be replaced with a new sensor. In such cases, it is critical that the authentication system be adapted quickly to the updated authorized set to avoid long down-times. Despite the existence of efficient strategies for retraining DL models, they are still too time-intensive for critical real-time applications like authorization, especially in situations where high availability is key.

In this paper, we propose to use similarity search techniques from information retrieval for open-set transmitter authorization. The neural network (NN) of a DL-based authenticator is used to extract feature vectors from a training dataset consisting of authorized and, possibly, unauthorized signals. Using the feature vector of each signal sample as its RF fingerprint, we formulate the task of authenticating a query signal as a nearest-neighbor search over the database of RF fingerprints. Since the inference latency associated with an exact nearest-neighbor search is prohibitive for real-time authentication, locality-sensitive hashing (LSH) is used to partition the database, allowing a much faster approximate nearest neighbor (ANN) search to be performed. This authorization scheme by design allows new authorized transmitters to be added by simply indexing signal samples from those transmitters into the database. Removing authorized transmitters can be accommodated without requiring any changes to the database. Our results show that the proposed LSH scheme is able to achieve retraining times orders of magnitude lower than DL models, with a negligible impact on outlier detection accuracy and inference latency.

Several previous works have used hashing methods to solve the open-set face recognition problem. In [4], the authors paired LSH with fully-connected neural networks, but their approach differs significantly from ours since LSH was used for model selection rather than for nearest-neighbor search. The closest approach to ours is [5], where LSH was used to identify the most similar faces and thereby solve open-set face identification. However, neither of these approaches considered a dynamic authorized set.

The rest of the paper is organized as follows: we start by formulating the problem in Section II. Section III discusses how state-of-the-art DL models could be adapted to changes in the authorized set. Section IV presents our LSH-based authorization scheme. An empirical validation of the proposed methods is included in Section V. Section VI concludes the paper.

II System Model and Problem Formulation

Fig. 1: System model: must determine whether the received signal originated from an authorized transmitter in , or from an unauthorized transmitter in , some of which may be known to .

We consider a finite set of transmitters that are authorized to access a system through receiver . The signal received at when some transmitter sends a set of symbols is ; models the channel effect, as well as the transmitter fingerprint imprinted on by due to the variability of its internal circuitry. The authentication problem can then be formulated as the following binary hypothesis test: based on , should determine whether belongs to the authorized set () or to the set of outliers (). This is visualized in Fig. 1.
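For concreteness, this test can be sketched as below; the symbols used here (the authorized set, the outlier set, the transmitting device and the received signal) are introduced only for illustration and need not match the paper's original notation.

```latex
% Illustrative notation: t = transmitting device, y = received signal,
% A = authorized set, O = set of outliers.
\mathcal{H}_0:\; t \in \mathcal{A}\ \ (\text{accept } \mathbf{y})
\qquad \text{vs.} \qquad
\mathcal{H}_1:\; t \in \mathcal{O}\ \ (\text{reject } \mathbf{y})
```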

An additional set , where , of known outliers may be used to improve the outlier detection [3]. So typically, a dataset of signal samples captured from transmitters in and a similar dataset captured from transmitters in will be used during training to assist the outlier detector in differentiating between authorized and non-authorized transmitters.

Our task is to adapt to a change in after deploying the authentication system, as quickly as possible. Denote by , and , respectively, the initial value of , and . Then, some set of transmitters could be added to or some set could be removed from and added to (although both an addition and removal from could happen, this could be thought of as an addition followed by a removal).

III Adapting deep learning-based classifiers

In [3], we explored several neural network architectures that could be used for the authentication problem such as Disc, DClass and OvA. In this section, we demonstrate how each of these architectures could be adapted to accommodate changes in , without entirely retraining the underlying model from scratch.

TABLE I: Adapting DL models to additions to the authorized set (columns: Disc, DClass, OvA). Dashed boxes indicate the base model.

The high-level architectures of Disc, DClass and OvA are given in Table I (within dashed boxes), where each can be broken into three building blocks: input, feature extractor and output. The input and feature extractor blocks are similar in all three architectures. In Disc, the output block produces a scalar output through a sigmoid activation indicating its binary authentication decision. OvA has parallel output blocks, each identical to the output block in Disc, where the -th block is tasked with independently determining whether the input signal belongs to . DClass has one output block with outputs emerging through a softmax activation: the first outputs correspond to authorized transmitters while the last output corresponds to outliers.

Adding transmitters to the authorized set requires a modification of the output block in some form for all three architectures, as summarized in Table I. If is the set of newly added transmitters to , in the case of OvA, this modification could be achieved by adding more output blocks in parallel, and retraining the new output blocks while keeping the rest of the NN frozen. Since there is only a single scalar output block in Disc, we could simply retrain that output block. With DClass, a similar approach to Disc is possible, where a new output block with outputs could be trained; however, a more efficient approach would be to utilize the cascaded architecture shown in Table I. First we train a secondary network, using as the authorized set, with the same input and feature extractor blocks as the original network, but with a new output block with outputs. A query signal is then judged to be unauthorized only if it is rejected by both NNs. The transmitter-level granularity of OvA and DClass output blocks makes removing transmitters from relatively straightforward: during inference, we simply need to treat the outputs corresponding to the invalidated transmitters as unauthorized. However, Disc does not offer this flexibility, requiring a retraining of the output block as in Table I.
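As an illustration of this freeze-and-train strategy, the sketch below builds a secondary DClass-style head on top of a frozen feature extractor. It is a minimal Keras example under assumed names (the layer name "features", the function adapt_dclass and the shapes are ours, not the paper's exact configuration).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def adapt_dclass(trained_model, n_new_authorized, feature_layer="features"):
    """Secondary DClass-style network reusing a frozen input/feature extractor.

    `trained_model` is assumed to be the initial DClass network and
    `feature_layer` the (hypothetical) name of its feature-extractor output.
    """
    extractor = models.Model(trained_model.input,
                             trained_model.get_layer(feature_layer).output)
    extractor.trainable = False          # freeze input + feature extractor blocks

    # New output block: one output per newly added transmitter plus one for outliers.
    new_head = layers.Dense(n_new_authorized + 1, activation="softmax",
                            name="new_output_block")(extractor.output)
    return models.Model(extractor.input, new_head)

# Only the new output block is trained; a query is rejected only if both the
# original and the secondary network reject it.
# secondary = adapt_dclass(initial_dclass, n_new_authorized=5)
# secondary.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# secondary.fit(x_new, y_new, epochs=10)
```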

Note that except in the case of Disc, it could be potentially very expensive computationally to adapt the NN models to additions to the authorized set, especially for large , even with the strategies highlighted in Table I (we will demonstrate this empirically in Section V). This is our motivation to explore alternative authorization schemes that are more adept at efficiently adapting to changes in .

IV Information retrieval-based transmitter authorization

Fig. 2: An overview of the indexing and query processing procedure in the LSH authorization scheme

Information retrieval is a broad term that refers to the organization, storage and retrieval of information with respect to a repository of data objects such as signals, documents or images. A typical use case is the task of finding a similar object to a given query object in a repository of objects. More formally, assume we have a repository of objects, each of dimensionality ; given a query object , the task is to find an object similar to , based on some similarity metric, as efficiently as possible. In practice, evaluating similarity between objects in the raw data space is ineffective as proximity in data space does not typically correspond to semantic similarity. Therefore, a mapping is done from each data object to a feature vector where the similarity search could be achieved by performing a nearest-neighbor search over a database consisting of those feature vectors.

Assuming we have a training dataset containing sufficient signals from both and , a simple algorithm to solve the open-set transmitter authorization problem is to find the most similar signal to the query signal : if we can infer , and if . A straightforward solution to the similarity problem is to perform an exact nearest-neighbor search over the entire database. If the distance between two feature vectors can be computed in time (e.g., Euclidean distance), this process takes time, i.e., linear in . Assuming that the open-set transmitter authorization problem could be solved by performing such a nearest-neighbor search, a per-query linear-time solution is prohibitive, considering that such an authorization system is expected to serve multiple authorization requests per second. Therefore, a sub-linear-time search is required.
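As a point of reference, a brute-force exact nearest-neighbor search over the fingerprint database can be sketched as follows (a minimal NumPy example with illustrative names); its per-query cost grows linearly with the database size, which is what the approximate search below avoids.

```python
import numpy as np

def exact_nearest_neighbor(database, labels, query):
    """Linear-scan nearest-neighbor search over RF fingerprints.

    database: (N, d) array of feature vectors, labels: length-N transmitter labels,
    query: (d,) feature vector of the signal to authenticate.
    """
    dists = np.linalg.norm(database - query, axis=1)   # N distance computations, O(N*d)
    idx = int(np.argmin(dists))
    return labels[idx], float(dists[idx])
```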

Approximate nearest-neighbor search algorithms allow us to perform the similarity search in sub-linear time by making the compromise that the returned item need not be the strictly nearest neighbor, but only one whose distance to the query object is sufficiently close to that of the strictly nearest neighbor. A common approach to achieving sub-linearity is to eliminate the need for an exhaustive search by partitioning the database into some “buckets” such that and its true nearest neighbor are in the same bucket with high probability; then, the exhaustive search for need only be done inside that bucket, and not over the entire database. Locality Sensitive Hashing (LSH) [6] can be used to perform the partitioning such that this property holds.

Cryptographic hash functions (CHFs) attempt to create a large deviation in the hash value when there is a slight deviation in the input; conversely, LSH functions try to create hash values that preserve locality. In particular, LSH functions ensure that inputs that are close in the input space receive the same hash value with high probability. Although there are a number of LSH functions proposed in the literature, in this paper we chose the function based on random projections, mainly due to its simplicity and ease of implementation. For an input , the hash value is a binary string calculated as follows: hyperplanes are randomly generated, where ; then, the -th bit of is set to 1 or 0 depending on whether the point is above or below the hyperplane in space. Here, is the length of the hash value, called the hash size (note that there are possible hash values). With defined this way, the indexing process is simply to place each signal in the bucket labeled with hash value , as visualized in Fig. 2.
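A minimal sketch of this random-projection hash and the indexing step, assuming hyperplanes through the origin with normals drawn from a standard normal distribution (one common choice; the class and variable names are ours):

```python
import numpy as np
from collections import defaultdict

class LSHTable:
    """One LSH database: hash_size random hyperplanes partition R^dim into buckets."""

    def __init__(self, dim, hash_size, rng=None):
        rng = rng or np.random.default_rng()
        # Row i is the normal vector of the i-th random hyperplane.
        self.hyperplanes = rng.standard_normal((hash_size, dim))
        self.buckets = defaultdict(list)     # hash value -> list of (fingerprint, label)

    def hash(self, x):
        # Bit i is 1 if x lies on the positive side of hyperplane i, else 0.
        bits = (self.hyperplanes @ x) > 0
        return tuple(bits.tolist())

    def index(self, fingerprints, labels):
        # Indexing: place each fingerprint in the bucket labeled with its hash value.
        for x, y in zip(fingerprints, labels):
            self.buckets[self.hash(x)].append((np.asarray(x), y))
```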

IV-A Using the LSH database to perform authorization

Assume we have indexed a set of training signals into an LSH database; includes signal samples from and possibly samples from . For a query signal , we can use the LSH database to determine whether or not in a two-step inference process:

  • Step 1: Determine , the approximate nearest-neighbor of . If does not exist, we infer that . Otherwise, we move to the next step.

  • Step 2: Let . If , we infer that , and that otherwise.

Note that the existence of in Step 1 is not guaranteed since the randomization involved means that similar items are not guaranteed to be grouped correctly. This shortcoming could be overcome by creating LSH databases instead of one, where the set of hyperplanes is generated independently in each case. Here, the exact nearest-neighbor search is performed on all buckets mapped to over all databases, increasing the chance that a nearest-neighbor is found. Furthermore, it should be noted that the two-step process above does not require to contain samples from ; in that case, intuitively should not exist as long as is large enough (there are enough buckets).
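Continuing the sketch above, the two-step inference over several independently generated LSH tables might look as follows (authorized_ids plays the role of the current authorized set; all names are illustrative):

```python
import numpy as np

def authorize(tables, query, authorized_ids):
    """Two-step inference over a list of LSHTable objects (see earlier sketch).

    Step 1: collect candidates from the query's bucket in every table and find
            the approximate nearest neighbor; reject if every bucket is empty.
    Step 2: accept only if that neighbor was emitted by an authorized transmitter.
    """
    candidates = []
    for table in tables:
        candidates.extend(table.buckets.get(table.hash(query), []))
    if not candidates:                          # Step 1 failed: treat as outlier
        return False
    feats = np.stack([c[0] for c in candidates])
    labels = [c[1] for c in candidates]
    nearest_label = labels[int(np.argmin(np.linalg.norm(feats - query, axis=1)))]
    return nearest_label in authorized_ids      # Step 2
```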

IV-B Feature extraction

It has been shown that the activations produced by deeper layers of convolutional neural networks trained for image classification tasks can be used as a high-level image descriptor [7]. Inspired by this, we propose to use the activations produced by the feature extractor block of a trained transmitter authorization NN model as the feature vector for a signal in our LSH authorization scheme. We call this NN our embedding model, since it is used to extract a feature vector or embedding. Although this creates a dependence on a standard DL-based classifier, the expectation is that as long as the initial embedding model is expressive enough (trained on a sufficiently large dataset), it does not need to be retrained when the authorized set changes.
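A minimal sketch of building such an embedding model by truncating a trained authorization network at its feature-extractor block (Keras; the layer name "features" is an assumed placeholder):

```python
from tensorflow.keras import models

def build_embedding_model(trained_nn, feature_layer="features"):
    """Expose the feature-extractor activations as the RF fingerprint of a signal."""
    embedder = models.Model(trained_nn.input,
                            trained_nn.get_layer(feature_layer).output)
    embedder.trainable = False     # the embedding model stays frozen after deployment
    return embedder

# fingerprints = build_embedding_model(dclass_model).predict(signal_batch)
```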

IV-C Adapting to changes in

With the authorization scheme described above, it is straightforward to adapt to changes in . If transmitters in are added to , then we simply need to index signal samples collected from transmitters in to the LSH database. If some transmitters are removed from , then no modification to the LSH database is necessary: during Step 2 of the inference process, it should simply be noted that if , then in fact .
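Using the same illustrative sketches, adapting to changes in the authorized set reduces to the two helpers below: additions only index new fingerprints, removals only shrink the set of IDs accepted in Step 2 of inference (authorized_ids is assumed to be a Python set of transmitter labels).

```python
def add_transmitters(tables, embedder, new_signals, new_labels, authorized_ids):
    """Adding transmitters: index their fingerprints into every LSH table.
    Neither the embedding model nor the existing database entries are touched."""
    fingerprints = embedder.predict(new_signals)
    for table in tables:
        table.index(fingerprints, new_labels)
    return authorized_ids | set(new_labels)

def remove_transmitters(authorized_ids, removed_ids):
    """Removing transmitters: the database is untouched; during inference, a
    nearest neighbor with a removed label is simply treated as unauthorized."""
    return authorized_ids - set(removed_ids)
```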

IV-D Computational complexity and feature vector compression

It is easy to see that the indexing process has a cost of (cost of -dimensional dot products for the data-points, repeated for all databases). Since , the computational complexity of the two-step inference process is essentially the same as that of the operation (a symbolic summary of these costs is sketched after the list below):

  1. Calculating has a cost of , since a -dimensional dot product needs to be calculated times.

  2. If all data-points are distributed evenly over the buckets, then the exact nearest-neighbor search will constitute calculating the distance metric over data-points for a total cost of .

  3. Since is evaluated over databases, the total inference cost is .
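Writing the quantities above with explicit symbols introduced only for illustration (N indexed signals of dimensionality d, hash size k, L databases), the costs described in the list take the form:

```latex
\text{Indexing:}\quad O\!\left(L\,N\,k\,d\right)
\qquad\qquad
\text{Inference (per query):}\quad O\!\left(L\left(k\,d + \frac{N}{2^{k}}\,d\right)\right)
```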

Note that both the indexing cost and the inference cost have a linear dependence on the dimensionality of the feature vectors. Therefore, we could also attempt to add a dimensionality-reduction step during indexing as well as during inference; in this paper, we tested the use of an auto-encoder model for this purpose. Note that, similar to the embedding model used for feature extraction, the encoder does not need to be retrained when changes, as long as the initial auto-encoder was trained on a sufficiently large dataset.
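A minimal sketch of such an auto-encoder in Keras; the layer widths and the reduced dimensionality are arbitrary assumptions for illustration, and only the frozen encoder is kept for compressing fingerprints.

```python
from tensorflow.keras import layers, models

def build_autoencoder(input_dim, reduced_dim=32):
    """Auto-encoder trained once on a large fingerprint dataset; afterwards the
    encoder compresses fingerprints before indexing and before each query."""
    inputs = layers.Input(shape=(input_dim,))
    h = layers.Dense(128, activation="relu")(inputs)
    code = layers.Dense(reduced_dim, activation="relu", name="code")(h)
    h = layers.Dense(128, activation="relu")(code)
    outputs = layers.Dense(input_dim, activation="linear")(h)

    autoencoder = models.Model(inputs, outputs)
    encoder = models.Model(inputs, code)        # shares weights with the autoencoder
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

# autoencoder, encoder = build_autoencoder(input_dim=fingerprints.shape[1])
# autoencoder.fit(fingerprints, fingerprints, epochs=20, batch_size=64)
# compressed = encoder.predict(fingerprints)    # used for indexing and for queries
```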

V Experimental Evaluation

Auth. scheme | Description | Trained on | Retrained on
DClass | Initial DClass model | and | and (adapted as in Table I)
DClass sep | Initial DClass model retrained from scratch | and | and
LSH | Standard LSH scheme | |
LSH small | LSH scheme with a smaller database | 300 samples from | 300 samples from
LSH dim-red | LSH scheme with dimensionality-reduced feature vectors | |
LSH dim-red small | Similar to LSH dim-red but with a smaller database | 300 samples from | 300 samples from
TABLE II: Different authorization schemes considered in the experimental evaluation

We start by introducing the dataset and evaluation procedure, and discuss results obtained for different experiments.

V-A Dataset

A dataset consisting of 71 transmitters was captured on the Orbit testbed [8]. The receiver was a software defined radio (USRP N210) and each transmitter was an off-the-shelf Atheros WiFi module allowed to transmit over Channel 11 (with a center frequency of 2462 MHz and bandwidth of 20 MHz). Energy detection was used to extract packets after an IQ capture at a rate of 25 Msps for 1 second. Without any synchronization or further preprocessing, we used the first 256 IQ samples of each packet, containing the preamble, as the signal sample.

V-B Evaluation Procedure

Fig. 3: Performance of different authorization schemes against : (a) Retraining time, (b) Accuracy, (c) Inference latency.
Fig. 4: Performance of LSH with the variation of and : (a) Accuracy, (b) Precision, (c) Recall, (d) Inference latency (ms).

As explained in Section III, removing transmitters from the authorized set is a relatively inexpensive procedure for all the NN architectures in Table I; therefore, we will only focus on the case of adding transmitters to the authorized set. Also, we will only use DClass for comparisons with the LSH scheme, since it has better outlier detection accuracy than Disc while being less computationally intensive to train than OvA [3], offering a fairer comparison.

, and will be chosen randomly, subject to the constraints specified for each evaluation; however, when comparing different authorization schemes, the same , and will be kept. For chosen , and , the dataset split will be as follows: for the training dataset and the validation dataset , we use 70% of the samples belonging to , and all the samples belonging to . The shuffled combination of this data is split into 80% for and 20% for . The test set contains all samples from and the remaining 30% of . We will define this method of splitting the dataset for some , and as , where .
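A sketch of this split procedure using NumPy and scikit-learn; the array arguments (authorized, known-outlier and unknown-outlier samples) and the function name are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_dataset(auth_x, auth_y, known_out_x, known_out_y, out_x, out_y, seed=0):
    """70% of the authorized samples plus all known-outlier samples are shuffled
    and split 80/20 into training/validation; the test set holds all samples from
    the unknown outliers plus the remaining 30% of the authorized samples."""
    auth_tr_x, auth_te_x, auth_tr_y, auth_te_y = train_test_split(
        auth_x, auth_y, test_size=0.30, random_state=seed)
    pool_x = np.concatenate([auth_tr_x, known_out_x])
    pool_y = np.concatenate([auth_tr_y, known_out_y])
    train_x, val_x, train_y, val_y = train_test_split(
        pool_x, pool_y, test_size=0.20, random_state=seed, shuffle=True)
    test_x = np.concatenate([out_x, auth_te_x])
    test_y = np.concatenate([out_y, auth_te_y])
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```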

For each , and , we start with a DClass model trained on and , and an LSH authorization scheme where is used to create the initial LSH database. The composition of the DClass feature extractor block was the same as that used in [3]. A frozen copy of the initial DClass model will be used as the embedding model for any LSH authorization schemes. An auto-encoder is also trained on ; the resulting encoder is isolated and frozen to be used as the encoder for any dimensionality reduction. Then for a given value of , a set of transmitters will be randomly chosen from as and the dataset will be split again to form and .

Table II details the set of authorization schemes we use in our experiments, including the datasets they are trained and retrained on. The small datasets are considered because the inference cost is positively correlated with , and a smaller database should therefore help reduce the inference latency. Note that Euclidean distance was used as the distance metric for the LSH schemes.

Different authorization schemes will be evaluated on with respect to:

  • Accuracy: Outlier detection accuracy on

  • Inference latency: The time to output the authorization decision per query signal, averaged across

  • Retraining time: The total time required to adapt the deployed authorization system to the change in .

It should be stressed that as long as the LSH scheme does not significantly compromise the accuracy and inference latency compared to DClass, retraining time is the critical metric of interest. Training time, which is the total time required to train each authorization system, is not analyzed, as it is predictably higher for LSH schemes due to the indexing overhead; this is, however, a good compromise to make, as the training phase occurs before the deployment of the authorization system.

V-C Adding transmitters to

In this experiment, we fix and start with , , and then add transmitters to from . The variation of retraining time, outlier detection accuracy and inference latency versus is given in Fig. 3. Fig. 3(a) provides strong evidence that LSH authorization schemes are able to adapt to a change in the authorized set much faster than the DL models; in particular, we see a roughly 100x improvement in retraining time (note that the time axis is on a logarithmic scale). Furthermore, from Fig. 3(b) we can immediately see that LSH schemes are able to match or even outperform the DClass models in terms of outlier detection accuracy. Also note that DClass matches the performance of DClass sep, justifying the freeze-and-train method proposed in Table I. Fig. 3(c) paints a contrasting picture: DClass models are able to make authorization decisions much faster than the standard LSH scheme. This justifies opting to build smaller LSH databases with dimensionality-reduced features. Note in particular that LSH dim-red small is able to match the latency of DClass while still slightly outperforming it in accuracy. Therefore, it is clear that LSH authorization schemes are a viable alternative to DL models, especially when is expected to evolve over the lifetime of the authorization system.

V-D Effect of and

Understanding the performance impact of the two hyper-parameters and (number of LSH databases and hash size) can help design LSH authorization systems to fit individual needs and flexibilities. To evaluate this, we fixed , , , and varied , to obtain the results in Fig. 4.

Recall from Section IV-D that the indexing cost is directly proportional to both and ; therefore, as expected, we observed that the retraining time grew with both and (not displayed in Fig. 4 for the sake of brevity). More interestingly, from Fig. 4(b) and Fig. 4(c) we see that for large , as is increased, the precision increases but the recall decreases. Increasing amortizes the effect of bad hyperplane selections, ensuring that true nearest neighbors “collide” (fall into the same bucket) in at least one of the databases. This results in a decrease of false positives (authorized signals being flagged as unauthorized) and hence an increase in precision, as it prevents Step 1 of the two-step inference process from failing erroneously. However, increasing also has the side-effect of increasing the probability that an unauthorized signal collides with authorized signals (imagine a case where exclusively has samples from ), thereby increasing false negatives and hence decreasing the recall. Decreasing increases the likelihood of false collisions, resulting in increased false negatives and hence lower recall, as seen in Fig. 4(c). However, if is too high at low , it could result in similar points not colliding, producing false positives and hence low precision; this can actually be seen in Fig. 4(b), where for , the precision increases at first but then decreases. Since the false negatives vary over a larger range (larger range of recall in Fig. 4(c)) than the false positives (smaller range of precision in Fig. 4(b)), it is unsurprising that in Fig. 4(a) the accuracy follows the same trend as the recall.

Arguably the most surprising result in Fig. 4 is that the higher accuracy in Fig. 4(a) does not come at the cost of higher latency in Fig. 4(d); in fact, it seems that higher accuracy is attainable with lower latency. Although this might seem counter-intuitive, it is explainable from the inference cost formula we derived in Section IV-D: . As it dictates, we can clearly see the linear variation of with in Fig. 4(d). However, the variation of with very much depends on the particular value of ; in fact, assuming the formula for holds, it can be theoretically shown that minimizes . In our case, so should have been ideal, which is seemingly contradicted in Fig. 4(d) by the latency continuing to drop as is increased up to 25. This discrepancy is most likely due to the assumption, made in deriving , that data-points in the LSH database are evenly divided across the buckets, which may not hold due to the nature of the data involved. In fact, as is increased further beyond 25, the latency starts to increase as the cost of calculating becomes too prohibitive.
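For reference, minimizing the per-query cost model assumed in Section IV-D over the hash size gives the kind of optimum alluded to above (symbols as in the earlier cost sketch; this is our reconstruction under that model, not necessarily the paper's exact expression):

```latex
C(k) = L\left(k\,d + \frac{N}{2^{k}}\,d\right),
\qquad
\frac{dC}{dk} = L\,d\left(1 - \frac{N\ln 2}{2^{k}}\right) = 0
\;\Longrightarrow\;
k^{*} = \log_{2}\!\left(N \ln 2\right).
```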

The takeaway from this experiment is that the performance impact of and is hard to predict due to dependence on factors like , composition of and the nature of the data involved. Therefore, it would be advisable to use a validation split of the dataset to calibrate them to the specific use case.

VI Conclusion

In this paper, we considered the problem of adapting to a dynamic authorized set in RF transmitter authorization. First, we demonstrated how state-of-the-art DL models could be adapted to changes in the authorized set. Then, we described how locality-sensitive hashing, a technique from information retrieval, could be used to build an LSH database supporting approximate nearest-neighbor search and thereby solve the transmitter authorization problem. With this approach, incorporating changes to the authorized set, both additions and removals, was shown to be manageable with simple changes to the underlying LSH scheme. Our empirical results showed that LSH schemes offer dramatically reduced retraining times compared to DL models when is changed, while matching their accuracy; although LSH schemes tended to have higher inference latencies, it was shown that the latency gap could be bridged by building smaller databases with dimensionality-reduced features. Furthermore, we showed how the number of LSH databases and the hash size interplay to trade off precision, recall and latency.

Fig. 5: Using LSH-based authorization as backup for DClass

Even though we demonstrated many promising features of LSH-based authorization, these results are preliminary, and hence our message is not that it should replace DL models as the state of the art. Since the LSH scheme we evaluated relies on a DL-based authenticator as its feature extractor by design, our proposition is that it be used as a quick-adapting backup to DL models in the face of sudden changes in the authorized set: this is depicted in Fig. 5. As the LSH scheme can be adapted quickly, we can use it as a backup authenticator while the DL model is down, and retrain the DL model in the background. This ensures that the authorization system experiences minimal downtime while not compromising much in terms of accuracy or latency.

References

  • [1] W. Wang, Z. Sun, S. Piao, B. Zhu, and K. Ren, “Wireless Physical-Layer Identification: Modeling and Validation,” IEEE Transactions on Information Forensics and Security, vol. 11, pp. 2091–2106, Sept. 2016.
  • [2] S. Riyaz, K. Sankhe, S. Ioannidis, and K. Chowdhury, “Deep Learning Convolutional Neural Networks for Radio Identification,” IEEE Communications Magazine, vol. 56, pp. 146–152, Sept. 2018.
  • [3] S. Hanna, S. Karunaratne, and D. Cabric, “Open Set Wireless Transmitter Authorization: Deep Learning Approaches and Dataset Considerations,” IEEE Transactions on Cognitive Communications and Networking, vol. 7, pp. 59–72, Mar. 2021.
  • [4] R. Vareto, S. Silva, F. Costa, and W. R. Schwartz, “Towards open-set face recognition using hashing functions,” in 2017 IEEE International Joint Conference on Biometrics (IJCB), pp. 634–641, Oct. 2017.
  • [5] X. Dong, S. Kim, Z. Jin, J. Y. Hwang, S. Cho, and A. B. J. Teoh, “Open-set face identification with index-of-max hashing by learning,” Pattern Recognition, vol. 103, p. 107277, July 2020.
  • [6] A. Gionis, P. Indyk, and R. Motwani, “Similarity Search in High Dimensions via Hashing,” in Proceedings of the 25th International Conference on Very Large Data Bases, VLDB ’99, (San Francisco, CA, USA), pp. 518–529, Morgan Kaufmann Publishers Inc., Sept. 1999.
  • [7] A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, “Neural Codes for Image Retrieval,” in Computer Vision – ECCV 2014 (D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, eds.), Lecture Notes in Computer Science, (Cham), pp. 584–599, Springer International Publishing, 2014.
  • [8] D. Raychaudhuri, I. Seskar, M. Ott, S. Ganu, K. Ramachandran, H. Kremo, R. Siracusa, H. Liu, and M. Singh, “Overview of the ORBIT radio grid testbed for evaluation of next-generation wireless network protocols,” in Wireless Communications and Networking Conference, 2005 IEEE, vol. 3, pp. 1664–1669, IEEE, 2005.