Federated Mimic Learning for Privacy Preserving Intrusion Detection

12/13/2020 ∙ by Noor Ali Al-Athba Al-Marri, et al. ∙ Hamad Bin Khalifa University

Internet of Things (IoT) devices are prone to attacks because of the limitations of their privacy and security components. These attacks range from exploiting backdoors to disrupting the devices' communication network. Intrusion Detection Systems (IDS) play an essential role in protecting the information privacy and security of IoT devices against these attacks. Recently, deep learning-based IDS techniques have become more prominent because of their high classification accuracy. However, conventional deep learning techniques jeopardize user privacy because user data is transferred to a centralized server. Federated learning (FL) is a popular privacy-preserving decentralized learning method. FL trains models locally at the edge devices and transfers the local models, instead of sensitive data, to a centralized server. Nevertheless, FL can suffer from reverse engineering ML attacks that learn information about the user's data from the model. Mimic learning is another way to preserve the privacy of ML-based IDS and can overcome the problem of reverse engineering. In mimic learning, a student model is trained on a public dataset that is labeled by a teacher model, which is in turn trained on sensitive user data. In this work, we propose a novel approach, federated mimic learning, that combines the advantages of FL and mimic learning to create a distributed IDS while minimizing the risk of jeopardizing users' privacy, and we benchmark its performance against other ML-based IDS techniques using the NSL-KDD dataset. Our results show that we can achieve 98.11% detection accuracy with federated mimic learning.




I Introduction

Smart, interconnected Internet of Things (IoT) devices have become prominent in daily life because they provide users with vital services through human-to-machine or machine-to-machine communication [Mosenia2017Comprehensive]. In smart homes, such devices let users turn on the heater, lock the doors, and monitor the home through cameras and smart alarms to ensure its safety and security. Because IoT devices are highly connected, adversaries can launch intrusion attacks on them to control the devices or to monitor user behavior [Bugeja2017analysis]. Safeguarding IoT devices against these attacks is therefore vital.

Researchers have proposed various solutions to prevent intrusion attacks, which rely on analyzing the network. These solutions are generally called Intrusion Detection Systems (IDS). Recently, with the growth in data availability and processing power, machine learning (ML) based techniques have become prominent in IDS. However, ML-based approaches jeopardize the privacy of end users [liu2017smart]. User privacy is endangered because existing ML-based techniques transfer user data, which contains sensitive information about the user's behavior, to a centralized server for processing and for training ML models. Hence, preserving the privacy of user data is becoming the focus of ML-based IDS research.

Federated learning (FL) is a distributed machine learning technique that utilizes the computational power of edge devices [mcmahan2016communication] without exchanging users' data samples. Local models are trained on user data at the device, and these models are transmitted to the centralized server. Hence, FL is partially privacy-preserving and communication-efficient, since it avoids the transmission of vast amounts of data [park2019wireless, niknam2019federated]. However, the privacy preservation of FL may suffer from reverse engineering, because the transmitted models are trained on sensitive user data.

Recently, mimic learning was proposed by [Shafee2020Mimic] as an alternative solution to the privacy problem of end users. That work uses mimic learning to transfer intrusion detection knowledge from a teacher model to a student model. The authors report that the performance of the teacher and student models is nearly identical, although the two models are trained on different datasets. This indicates that an unlabeled dataset that has been labeled by the teacher model can be used to transfer knowledge to the student model without disclosing any sensitive information. Nevertheless, mimic learning provides a model trained by a single user; hence it is not distributed and may not provide a broad solution.

In this paper, we propose a novel solution that integrates FL with mimic learning to develop an ML-based IDS. Utilizing the advantages of both techniques, we create a privacy-preserving distributed ML-based IDS. We use the NSL-KDD dataset to benchmark our proposed system against existing solutions. In our system model, we preprocess the dataset and apply feature selection to reduce the computational load on the edge devices. Our machine learning model is based on a Multilayer Perceptron (MLP), and the same model with the same parameters was used in all scenarios for a fair benchmark. To the best of our knowledge, we are the first to apply federated mimic learning for privacy-preserving purposes, which will help in providing up-to-date intrusion detection systems.

The main contributions of this study are listed below:

  • We have developed a novel federated mimic learning-based IDS technique that takes advantage of both FL and mimic learning to minimize the possibility that reverse engineering attacks on the student model reveal any sensitive data.

  • We have implemented the system in Python on Google Colab and carried out simulations on a real-world dataset (NSL-KDD) to benchmark our proposed model against centralized and federated ML-based IDS.

This paper is structured as follows. In Section II, a brief overview of existing literature on IDS is presented. We explain our proposed method of federated mimic learning for the IDS in Section III. The dataset and the preprocessing techniques used in our simulations are explained in Section IV. Subsequently, we analyze the numerical results for the performance benchmark of our proposed method in Section V. Finally, we present our concluding remarks in Section VI.

II Literature Review

Machine learning techniques are becoming pre-eminent in the design of IDS because of their capability to extract hidden connections in the data. In this section, we review several ML-based IDS from the literature as well as studies of the datasets available for IDS. In [sikder20176thsense], the authors proposed '6th Sense', an IDS that strengthens IoT security by detecting changes in sensor data. It creates a relative model to differentiate between malicious and non-malicious behaviors of the target sensors. The proposed system uses three types of ML-based detection mechanisms: Markov Chain, Naive Bayes, and Logistic Model Tree. It collects sensor data on user activity through custom-made applications, and the final result of the data analysis shows whether the current state of the device is considered malicious. The proposed solution achieved an accuracy of over 95% with the three ML methods, which makes it effective at detecting sensor-based attacks.

As a solution to privacy problems, the authors in [meidan2019privacy] investigate how to preserve the privacy of users' behavioral information and traffic data when an ML-based IDS is deployed behind a network address translation (NAT) device in a smart home. The local detector, placed between the router and the optical network terminal, monitors the network address translated (NATed) traffic emerging from the home network, and a pre-trained classifier is applied to detect IoT devices. Results show that the proposed solution allows service providers to detect 73% of NATed IoT devices. However, this approach requires manual updating of the classifiers.

A study by [rawat2019intrusion] analyzes the NSL-KDD dataset with ML techniques and deep learning algorithms to implement and evaluate IDS in networks. A deep neural network was trained, and the following scenarios were considered to compare the models on NSL-KDD:

  • Classifying the records of network connections either as a normal connection or an attack according to the features existing in the NSL-KDD dataset.

  • Classifying the records of network connections either as a normal connection or an attack according to the minimum number of features in the NSL-KDD dataset.

A study by [nkiama2016subset] proposed a feature selection mechanism that excludes non-relevant features and identifies the features that contribute to improving the detection rate, based on the performance of each feature during the selection process. A recursive feature elimination procedure was used to identify the relevant features, which were then combined with a decision tree-based classifier.

A survey paper by [sherasiya2016survey] analyzed the different types of existing IDS. A signature-based IDS analyzes the network traffic and compares its signature against predefined attack patterns. An anomaly-based IDS learns what is considered normal behavior in the network and flags behavior that differs from it as an intrusion attack; it is considered more efficient than the signature-based approach in terms of detection. Lastly, a specification-based IDS relies on a manually defined specification of normal behavior, which reduces the false-positive rate.

In [Mourabit2014Wireless], the authors proposed an anomaly detection-based approach built around a mobile agent-based intrusion detection program for Wireless Sensor Networks (WSN). It uses multi-agent, classification-based intrusion detection. The proposed system uses few parameters to characterize attacks, so the work could be enhanced by creating more complex detection parameters and by using statistical anomaly detection to enable the creation of attack signatures.

The authors of [Karuppiah2014Novel] proposed a hierarchical energy-efficient IDS to detect Sybil nodes in WSN. The proposed system distinguishes two cases. In the first case, a centralized approach is developed to send acknowledged query data packets, and the cluster head maintains a table that stores the names and locations of all nodes. In the second case, all valid nodes respond to the cluster head with their identities and current position coordinates. A Sybil node also sends its identities and current position, so the cluster head matches this data against the table of valid nodes. If a mismatch occurs, the Sybil node is identified. The simulation results show that the proposed system increases energy efficiency and reliably detects the Sybil node.

In [Shafee2020Mimic], mimic learning is used to deploy a privacy-preserving ML-based IDS. The teacher model is trained on private labeled data with four different types of classifiers: Decision Tree Induction (DTI), Random Forest (RF), Support Vector Machine (SVM), and Naïve Bayes (NB). The classifier with the highest result is selected as the teacher model, which is then used to label an unlabeled public dataset. The newly labeled public dataset is used to train the same four types of classifiers to generate a student model for each. The student model serves as a privacy-preserving means of knowledge transfer. Results show that the RF classifier had the highest detection accuracy for both the teacher and the student model, while NB had the lowest.

Fig. 1: Feedforward Multilayer Perceptron Architecture.

Fig. 2: System Models for (a) Federated Teacher Mimic Learning (FTML) and (b) Federated Student Mimic Learning (FSML).

III Federated Mimic Learning for Privacy-Preserving Intrusion Detection

Feedforward Multilayer Perceptrons (MLPs) are used as the neural network architecture for the IDS, as shown in Fig. 1. The network has two hidden layers with 256 units each and uses the Rectified Linear Unit (ReLU) as the activation function. A dropout layer with a rate of 0.4 follows every hidden layer for regularization. Dropout helps to control over-fitting by randomly removing individual units with the given probability while the model is trained. The softmax activation function is used for the output layer of the classifier.
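The architecture described above can be sketched as a forward pass in plain Python. This is only an illustration, not the authors' TensorFlow/Keras implementation: the 42 inputs and 5 outputs are taken from the feature selection and class mapping described in Section IV, while the He-style random initialization, the inverted-dropout scaling, and all function names are assumptions made for this sketch.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def dropout(v, rate, rng, training):
    # Inverted dropout: active only during training, rescales kept units.
    if not training:
        return v
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]

def dense(v, weights, biases):
    # weights holds one row of input weights per output unit.
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)  # assumed He-style initialization
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_mlp(n_features=42, hidden=256, n_classes=5, seed=0):
    rng = random.Random(seed)
    sizes = [n_features, hidden, hidden, n_classes]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def forward(layers, x, rate=0.4, training=False, rng=None):
    rng = rng or random.Random(0)
    h = x
    for weights, biases in layers[:-1]:
        # Hidden layer: dense -> ReLU -> dropout
        h = dropout(relu(dense(h, weights, biases)), rate, rng, training)
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))

model = build_mlp()
probs = forward(model, [0.5] * 42)  # class probabilities for one sample
```

The output is a probability vector over the five classes; with `training=True` the same call applies dropout with rate 0.4 after each hidden layer.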

III-A Federated Mimic Learning

The implementation of federated mimic learning builds on FL, as shown in Fig. 2. The simulation and learning parameters are kept the same as for deep learning and FL for a fair benchmark. Each user's teacher model is trained on that user's private training dataset and is then used to label the unlabeled public dataset at that user. The labeled public dataset of each user is used to train the user's student model. The student models of all users are then transferred to the centralized server, which applies federated averaging to create the new global model. There are two methods for implementing federated mimic learning. If the averaged global model is returned to the user and included in the training loop of the teacher model, the method is called Federated Teacher Mimic Learning (FTML). If the global model is included in the training loop of the student model, it is called Federated Student Mimic Learning (FSML). The pseudocode of FTML and FSML is given in Algorithm 1 and Algorithm 2, respectively.
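The two variants can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1 or Algorithm 2: a tiny logistic-regression model stands in for the MLP, the labels are binary (1 = attack, 0 = normal), and the helper names `train`, `predict`, `average`, and `federated_mimic` are hypothetical. The sketch further assumes that in FTML the teacher is retrained from the global model in every round and the student starts from zero weights, whereas in FSML the teacher is trained once and only the student continues from the global model. That is one reading consistent with the statement in Section V that FSML trains a local model once per round and FTML twice.

```python
import math

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    # w holds one weight per feature followed by a bias term.
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

def predict(w, x):
    return 1 if score(w, x) > 0 else 0

def train(w, xs, ys, lr=0.5, epochs=20):
    """Stochastic gradient descent on the logistic loss, starting from w."""
    w = list(w)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(score(w, x)) - y
            for j, xj in enumerate(x):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w

def average(models):
    """Unweighted federated averaging of flat weight vectors."""
    return [sum(ws) / len(models) for ws in zip(*models)]

def federated_mimic(users, public_x, rounds, variant):
    dim = len(public_x[0]) + 1
    zeros = [0.0] * dim
    global_w = list(zeros)
    # Assumption: FSML trains each teacher once, before the first round.
    teachers = [train(zeros, xs, ys) if variant == "FSML" else None
                for xs, ys in users]
    for _ in range(rounds):
        students = []
        for i, (xs, ys) in enumerate(users):
            if variant == "FTML":
                # Teacher training loop includes the global model.
                teachers[i] = train(global_w, xs, ys)
                student_init = zeros
            else:
                # Student training loop includes the global model.
                student_init = global_w
            # The teacher labels the public data; private labels never leave.
            pseudo = [predict(teachers[i], x) for x in public_x]
            students.append(train(student_init, public_x, pseudo))
        global_w = average(students)  # only student models reach the server
    return global_w

# Toy data: one feature, attack when the feature is positive.
users = [([[-1.0], [1.0]], [0, 1]), ([[-0.8], [0.8]], [0, 1])]
public_x = [[-0.9], [-0.5], [0.5], [0.9]]
ftml_model = federated_mimic(users, public_x, rounds=3, variant="FTML")
fsml_model = federated_mimic(users, public_x, rounds=3, variant="FSML")
```

In both variants the server only ever receives student weights, which were fitted to teacher-generated labels on public data rather than to the private samples themselves.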

IV Dataset and Preprocessing

IV-A NSL-KDD Dataset

Label         # of Training Samples    # of Test Samples
(0) DoS       41,334                   4,592
(1) Normal    60,608                   6,734
(2) Probe     10,490                   1,165
(3) R2L       895                      99
(4) U2R       46                       5
TABLE I: Number of samples in training set and test set.

The NSL-KDD dataset is one of the most widely used traffic datasets for developing IDS. The number of samples of each type is summarized in Table I. It is an enhanced version of the KDD'99 dataset, which suffered from a massive number of repeated records and from dataset imbalance, both of which made attack classes easy to detect. The NSL-KDD dataset was introduced to overcome these drawbacks. It includes 41 features, which belong to three major families:

  • Basic Features: Features associated with connection information such as hosts, ports, protocols, and services used.

  • Traffic Features: Features that are computed in aggregate over a window interval.

  • Content Features: Features that are obtained from data packet or payload and are related to a certain protocol or application used.

Every row (sample) in the NSL-KDD dataset contains a label to indicate whether it is normal connectivity or a particular type of attack. The dataset includes four distinctive attack classes:

  • Denial of Service (DoS): A cyber-attack that targets a device or a machine to make it unavailable to its intended users, either temporarily or indefinitely, by disrupting its services.

  • Probe: The initial step of an attack. The attacker gathers information on web applications, operating systems, databases, networks, and the devices connecting to it. Attackers scan them to identify both known and unknown vulnerabilities.

  • User to Root (U2R): An attack that allows an attacker to gain root privileges when accessing a machine.

  • Remote to Local (R2L): An unauthorized access from a remote device to a local device.

We mapped each label in the dataset to one of five classes: the four attack classes described above, or normal. After the mapping, we added an output column called "Attack" that indicates the type of attack or normal connectivity.
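The mapping step can be sketched as a lookup table. The attack names below are a subset of the labels that appear in NSL-KDD and are listed only for illustration; the class ids follow Table I, the output column name "Attack" follows the text, and the input key `label` and the function name are assumptions.

```python
# Subset of NSL-KDD attack names grouped by category (illustrative, not complete).
ATTACK_CATEGORY = {
    "normal": "Normal",
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L",
    "warezclient": "R2L", "warezmaster": "R2L",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}

# Integer ids as listed in Table I.
CLASS_ID = {"DoS": 0, "Normal": 1, "Probe": 2, "R2L": 3, "U2R": 4}

def add_attack_column(rows, label_key="label"):
    """Return copies of the rows with an added integer "Attack" column."""
    out = []
    for row in rows:
        category = ATTACK_CATEGORY[row[label_key]]
        out.append({**row, "Attack": CLASS_ID[category]})
    return out

sample = [{"label": "neptune"}, {"label": "normal"}, {"label": "rootkit"}]
mapped = add_attack_column(sample)
```

A label missing from the table raises a `KeyError`, which makes unmapped attack names visible instead of silently assigning them to a class.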

IV-B One-Hot Encoding for Categorical Data

There are three kinds of features in the NSL-KDD dataset: nominal, binary, and numeric. Binary features take the value 0 or 1 to indicate the absence or presence of a property. Nominal features hold categorical values instead of numeric values. Deep learning techniques cannot operate on such data directly; therefore, one-hot encoding is applied to the nominal features to turn them into numeric features.

With one-hot encoding, the "service" feature is transformed into 70 new features, while the "flag" feature is transformed into 11 new features. As a result, the original 41 features become 127 features in the dataset.

After transforming all nominal features into numeric features with one-hot encoding, we normalize the values so that every feature lies in the same fixed range. This prevents features with very large values from dominating the others and degrading model accuracy.
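Both preprocessing steps can be sketched in a few lines of plain Python. The example is illustrative only: `SF`, `S0`, and `REJ` are values of the "flag" feature and `src_bytes` is a numeric NSL-KDD feature, but the sample values are made up, and scaling to the range [0, 1] is an assumption because the bounds are not recoverable from the text.

```python
def one_hot(values):
    """Encode categorical values as 0/1 vectors, one column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded

def min_max(values):
    """Scale numeric values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

flags = ["SF", "S0", "REJ", "SF"]       # nominal feature
categories, flag_columns = one_hot(flags)

src_bytes = [0, 491, 146, 982]          # numeric feature (made-up values)
scaled_bytes = min_max(src_bytes)
```

Each nominal feature contributes as many new columns as it has distinct values, which is why the feature count grows when the nominal columns are replaced by their encoded versions.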

IV-C Feature Elimination

Feature selection identifies the critical features in a dataset and is a crucial step when tuning a model: reducing the dimensionality of the dataset improves the performance and speed of the model. We use recursive feature elimination (RFE) with logistic regression as the estimator. RFE repeatedly fits the estimator, ranks the features by importance, and removes the least important ones until the desired number of features remains. To select the critical features for each attack class and for the normal class, we set the output of the training samples to 1 for the class of interest and 0 for all others, and ran RFE to obtain the top 20 features for detecting each class. Out of 127 features, 42 features were selected as a result. We then fed these selected features into our model for both training and testing.
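The selection procedure can be sketched without external libraries. The following is a simplified illustration, not the authors' code: the logistic regression is fitted with plain full-batch gradient descent, one feature is removed per iteration based on the smallest absolute coefficient, and the per-class results are combined by taking their union, which is an assumed reading of how 42 features result from a top-20 selection per class. All function names are hypothetical.

```python
import math

def fit_logistic(xs, ys, lr=0.1, steps=200):
    """Full-batch gradient descent on the logistic loss; returns (weights, bias)."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def rfe(xs, ys, n_select):
    """Refit and drop the feature with the smallest |coefficient| until n_select remain."""
    active = list(range(len(xs[0])))
    while len(active) > n_select:
        sub = [[x[j] for j in active] for x in xs]
        w, _ = fit_logistic(sub, ys)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active

def select_features(xs, labels, classes, n_per_class):
    """Run one-vs-rest RFE per class and return the union of selected columns."""
    selected = set()
    for c in classes:
        one_vs_rest = [1 if label == c else 0 for label in labels]
        selected.update(rfe(xs, one_vs_rest, n_per_class))
    return sorted(selected)

# Toy data: column 0 separates the classes, columns 1 and 2 carry no signal.
xs = [[-1, -1, -1], [-1, 1, -1], [-1, -1, 1], [-1, 1, 1],
      [1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]
labels = ["normal"] * 4 + ["attack"] * 4
chosen = select_features(xs, labels, ["normal", "attack"], n_per_class=1)
```

Because the union is taken over classes, the final feature count can be much smaller than the number of classes times the per-class budget when the classes share informative features.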

Parameter                  Value
Deep Learning Libraries    Google TensorFlow & Keras
Optimizer                  Adam
Learning Rate
Number of Hidden Layers    2
Number of Hidden Nodes     256 nodes in each hidden layer
Activation Function        ReLU
Loss Function              Mean Absolute Error
Batch Size
Number of Epochs
Number of Rounds
TABLE II: System parameters for the simulations.

V Numerical Results

The preprocessed NSL-KDD dataset was used as the input for our deep learning model. We apply a cross-validation method to evaluate model performance. The dataset is split into approximately 90% for the training set and 10% for the testing set, which corresponds to 113,373 training samples and 12,595 test samples (the column totals of Table I). The Adam optimizer is used for training the model. Our model and learning algorithms are implemented with a TensorFlow-based architecture in a Python environment, and the code was run on Google Colab with TPU acceleration. The simulation parameters are shown in Table II.

Label Accuracy Precision Recall FalseAlarm F-Score
Normal 98.6 98 100 2.86 99
DoS 98.89 99 98 0.54 98
Probe 99.8 100 98 0.03 99
R2L 99.26 0 0 0 0
U2R 99.95 0 0 0 0
TABLE III: DL-based classifier accuracy results.
Fig. 3: Confusion matrix for DL-based classifier.

The results indicate that our deep learning model detects attacks on IoT device traffic with a training accuracy of 98.15% and a test accuracy of 98.28%. The results of the centralized deep learning-based IDS are summarized in Table III, and Fig. 3 shows the confusion matrix for the centralized deep learning-based classifier.
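The per-class columns of such a table can be derived from a confusion matrix with the standard one-vs-rest definitions, as sketched below. Whether the paper uses exactly these definitions is an assumption, and the matrix in the example is made up. The sketch also shows why a rare class that the classifier never predicts can still reach a very high accuracy while its precision and recall are 0.

```python
def per_class_metrics(cm, k):
    """One-vs-rest metrics for class k; cm[i][j] counts true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # class k samples predicted as another class
    fp = sum(row[k] for row in cm) - tp     # other samples predicted as class k
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "false_alarm": fp / (fp + tn) if fp + tn else 0.0,
        "f_score": f_score,
    }

# Made-up 3-class matrix in which the rare class 2 is never predicted.
toy_cm = [
    [90, 10, 0],
    [5, 95, 0],
    [2, 3, 0],
]
rare = per_class_metrics(toy_cm, 2)
common = per_class_metrics(toy_cm, 0)
```

For the rare class, all 200 samples of the other classes count as true negatives, so the one-vs-rest accuracy is 200/205 even though none of its five samples is detected.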

As a second step, the FL-based solution is implemented to benchmark its performance against centralized deep learning. In this case, the users' data is not transmitted to the centralized intrusion detection provider. Instead, 5000 data points are distributed over ten users and used to train a local model for each user. Each user then provides its trained local model to the centralized server of the intrusion detection system. The centralized server applies model averaging to the local models to create a new global model, which is returned to the individual users for further federated training. This loop is considered one round. After 20 rounds, we achieved a detection accuracy of 98.61% with FL while preserving user privacy. This result shows that user privacy can be enhanced by utilizing edge resources without sacrificing accuracy. Table IV and Fig. 4 below summarize the results for each attack category and the confusion matrix of the FL-based classifier, respectively.
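The round structure described above can be sketched as follows. This is a minimal illustration, assuming that each model is represented as a flat list of weights; the optional weighting by local sample count follows the usual federated averaging formulation and is an assumption here, as are the names `federated_average`, `run_rounds`, and the `local_update` callback.

```python
def federated_average(local_models, sample_counts=None):
    """Element-wise mean of flat weight vectors, optionally weighted by sample count."""
    if sample_counts is None:
        sample_counts = [1] * len(local_models)
    total = sum(sample_counts)
    return [
        sum(n * w for n, w in zip(sample_counts, ws)) / total
        for ws in zip(*local_models)
    ]

def run_rounds(global_model, user_datasets, local_update, rounds=20):
    """One round: every user trains from the global model, then the server averages."""
    for _ in range(rounds):
        local_models = [local_update(list(global_model), data)
                        for data in user_datasets]
        counts = [len(data) for data in user_datasets]
        global_model = federated_average(local_models, counts)
    return global_model

# Toy usage: the "model" is a single number and local training returns the local mean.
datasets = [[1.0, 1.0], [4.0, 4.0]]
final_model = run_rounds([0.0], datasets,
                         lambda w, data: [sum(data) / len(data)], rounds=2)
```

Only the weight vectors cross the network in this loop; the per-user datasets are read exclusively inside `local_update`.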

Label Accuracy Precision Recall FalseAlarm F-Score
Normal 98.62 98 100 2.93 99
DoS 99.13 100 98 0.006 99
Probe 99.9 100 99 0.02 100
R2L 99.59 100 0.45 0 0.62
U2R 99.95 0 0 0 0
TABLE IV: FL-based classifier accuracy results.
Fig. 4: Confusion matrix for FL-based classifier.

To further enhance data privacy, we implemented federated mimic learning, as explained in Section III. We divided the training data as follows: 60% of the dataset is used as the private dataset for the teacher models, while the remaining 40% is used as the unlabeled public dataset for the student models. The teacher models are trained at ten users, and these models are used to label the unlabeled public dataset at each user. Each user trains its student model on the public dataset labeled by its teacher model. The student models are then averaged to obtain the global model. The global model is returned to the training of either the teacher model, as in FTML, or the student model, as in FSML. The proposed method was evaluated on the test dataset. With FTML, we achieved a detection accuracy of 98.118%. Table V and Fig. 5 below summarize the accuracy and the confusion matrix for FTML, respectively.

Label Accuracy Precision Recall FalseAlarm F-Score
Normal 98.14 97 100 3.93 98
DoS 99.09 100 98 0.03 99
Probe 99.87 100 99 0.02 99
R2L 99.14 0 0 0 0
U2R 99.98 0 0 0 0
TABLE V: FTML-based classifier accuracy results.
Fig. 5: Confusion matrix for FTML-based classifier.

In addition, we tested the detection accuracy of FSML. When the global model is included in the training process of the student model, we achieved a detection accuracy of 98.110%. Table VI and Fig. 6 below summarize the results for each attack category and the confusion matrix for FSML, respectively.

Label Accuracy Precision Recall FalseAlarm F-Score
Normal 98.13 97 100 3.95 98
DoS 99.08 100 98 0.03 99
Probe 99.87 100 99 0.02 99
R2L 99.14 0 0 0 0
U2R 99.98 0 0 0 0
TABLE VI: FSML-based classifier accuracy results.
Fig. 6: Confusion matrix for FSML-based classifier.

Note that FSML requires half the computational cost of FTML, the same as FL, since it requires the device to train a local model only once per round instead of twice as in FTML. As a result, FSML achieves performance close to that of centralized deep learning and FTML while significantly improving the privacy preservation of user data.

Classification Algorithm    Class Name    Accuracy (%)
Random Forest               Normal        99.1
Random Forest               DoS           98.7
Random Forest               Probe         97.6
Random Forest               R2L           96.8
Random Forest               U2R           97.5
Naive Bayes                 Normal        70.3
Naive Bayes                 DoS           72.7
Naive Bayes                 Probe         70.9
Naive Bayes                 R2L           69.8
Naive Bayes                 U2R           70.7
FTML                        Normal        98.14
FTML                        DoS           99.09
FTML                        Probe         99.87
FTML                        R2L           99.14
FTML                        U2R           99.98
FSML                        Normal        98.13
FSML                        DoS           99.08
FSML                        Probe         99.87
FSML                        R2L           99.14
FSML                        U2R           99.98
TABLE VII: Results of proposed FTML and FSML model compared to Random Forest and Naive Bayes [Revathi2013ADA].

A study by [Revathi2013ADA] analyzes various ML techniques and their detection performance for IDS using the NSL-KDD dataset. According to the authors, the Random Forest (RF) classifier achieved a detection accuracy of 99.1% for normal traffic, 98.7% for DoS attacks, 97.6% for Probe, 96.8% for R2L, and 97.5% for U2R. Naive Bayes achieved a detection accuracy of 70.3% for normal traffic, 72.7% for DoS attacks, 70.9% for Probe, 69.8% for R2L, and 70.7% for U2R. Compared with these results, both FTML and FSML have a higher detection accuracy than RF for all four attack classes, while RF is higher for normal traffic (99.1% versus 98.14% and 98.13%). Naive Bayes has the lowest detection accuracy by a large margin in every class, as shown in Table VII.

VI Conclusion

In this study, we propose an ML-based IDS for IoT devices that uses federated mimic learning to preserve user privacy. The work covered three implementations: first, we implemented a centralized deep learning model of the IDS, and then FL. After that, we implemented the proposed federated mimic learning method, covering both federated teacher mimic learning and federated student mimic learning, as an ML-based IDS. Results show that federated mimic learning maintains privacy while providing a detection accuracy similar to that of the deep learning and FL models, with 98.118% for federated teacher mimic learning (FTML). Additionally, federated student mimic learning (FSML) obtained a nearly identical detection accuracy of 98.11% at half the computational cost of FTML. To the best of our knowledge, we are the first to apply federated mimic learning for privacy-preserving purposes, which will help in providing up-to-date intrusion detection systems.